Architectures for Medical Image Segmentation [Part 2: Attention UNet]

Shambhavi Malik
Published in CodeX · Jun 19, 2021


Hey, y’all! I started writing about network architectures useful for medical image segmentation, namely UNet and its variants. In the first article, I covered the basic UNet and 3D UNet; you can find it here. In this article, I’ll go over Attention UNet.


Attention UNet

Fully convolutional neural networks (FCNNs) like UNet outperform traditional approaches in medical image analysis. This is mainly attributed to the fact that (i) domain-specific image features are learned using stochastic gradient descent (SGD) optimization, (ii) learned kernels are shared across all pixels, and (iii) image convolution operations exploit the structural information in medical images well. Convolutional layers progressively extract higher-dimensional image representations by processing local information layer by layer.

Attention Gates for Image Analysis

Schematic of the proposed additive attention gate (AG) [image by Oktay, Ozan, et al.]

Attention coefficients, αi ∈ [0, 1], identify salient image regions and prune feature responses to preserve only the activations relevant to the specific task. A gating vector gi ∈ R^Fg is used for each pixel i to determine focus regions; it contains contextual information used to prune lower-level feature responses. Additive attention, followed by a sigmoid activation, is used to obtain the gating coefficients.
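To make the additive gate concrete, here is a minimal PyTorch sketch of the AG in the schematic above. This is my own illustration, not the authors' code; the module names, channel sizes, and the stride-2 downsampling of x to the gating resolution are assumptions for a 2D setting (the paper works in 3D).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionGate(nn.Module):
    """Additive attention gate: alpha = sigmoid(psi(relu(W_x x + W_g g)))."""

    def __init__(self, in_channels, gating_channels, inter_channels):
        super().__init__()
        # 1x1 convolutions (no spatial support) map both inputs to a
        # common lower-dimensional space of inter_channels features.
        # Stride 2 downsamples x to the (coarser) gating resolution.
        self.W_x = nn.Conv2d(in_channels, inter_channels, kernel_size=1, stride=2)
        self.W_g = nn.Conv2d(gating_channels, inter_channels, kernel_size=1)
        self.psi = nn.Conv2d(inter_channels, 1, kernel_size=1)

    def forward(self, x, g):
        # x: skip-connection features; g: gating signal at half x's resolution
        q = F.relu(self.W_x(x) + self.W_g(g))
        alpha = torch.sigmoid(self.psi(q))              # coefficients in [0, 1]
        alpha = F.interpolate(alpha, size=x.shape[2:])  # resample to x's resolution
        return x * alpha                                # prune irrelevant activations
```

For example, gating 64-channel skip features at 32×32 with a 128-channel signal at 16×16 returns a tensor of the same shape as the skip features, with background responses suppressed.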

A block diagram of the proposed Attention U-Net segmentation model. The input image is progressively filtered and downsampled by a factor of 2 at each scale in the encoding part of the network (e.g. H4 = H1/8). Nc denotes the number of classes. Attention gates (AGs) filter the features propagated through the skip connections. Feature selectivity in AGs is achieved by the use of contextual information (gating) extracted in coarser scales. (image by Oktay, Ozan, et al.)

The proposed AGs are incorporated into the standard U-Net architecture to highlight salient features that are passed through the skip connections. Information extracted at coarse scales is used in the gating to disambiguate irrelevant and noisy responses in the skip connections. This is performed right before the concatenation operation, so that only relevant activations are merged.

Additionally, AGs filter the neuron activations during the forward pass as well as during the backward pass: gradients originating from background regions are down-weighted. This allows model parameters in shallower layers to be updated mostly based on spatial regions that are relevant to the given task. In each sub-AG, complementary information is extracted and fused to define the output of the skip connection.

To reduce the number of trainable parameters and the computational complexity of the AGs, the linear transformations are performed without any spatial support (1×1×1 convolutions), and the input feature maps are downsampled to the resolution of the gating signal, similar to a non-local block. These linear transformations decouple the feature maps and map them to a lower-dimensional space for the gating operation.
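The backward-pass effect is easy to verify: because the attention coefficients act multiplicatively, they rescale the gradients flowing back through the skip connection by the same factor, so regions where α ≈ 0 contribute nothing to the update. A tiny standalone check (my own illustration, with made-up coefficient values):

```python
import torch

# Stand-in for skip-connection features and their attention coefficients.
x = torch.randn(4, requires_grad=True)
alpha = torch.tensor([1.0, 0.5, 0.1, 0.0])

# Gating multiplies features by alpha; backprop through the product
# scales each feature's gradient by its coefficient.
(x * alpha).sum().backward()
print(x.grad)  # equals alpha: background positions (alpha ~ 0) get no gradient
```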

Code

The schematics of the proposed Attention-Gated Sononet (image by Oktay, Ozan, et al.)

The original code can be found here. Below is a simplified version.
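The embedded snippet did not survive here, so the following is my own minimal 2D sketch of an Attention UNet rather than the original code. As one common simplification, the gating signal is upsampled first so that both AG inputs share a resolution; all layer names and channel widths are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(in_c, out_c):
    """Two 3x3 conv + BN + ReLU layers, as in the standard UNet."""
    return nn.Sequential(
        nn.Conv2d(in_c, out_c, 3, padding=1), nn.BatchNorm2d(out_c), nn.ReLU(inplace=True),
        nn.Conv2d(out_c, out_c, 3, padding=1), nn.BatchNorm2d(out_c), nn.ReLU(inplace=True),
    )

class AttentionGate(nn.Module):
    """Additive attention gate over skip features x, gated by signal g."""

    def __init__(self, f_l, f_g, f_int):
        super().__init__()
        self.w_x = nn.Conv2d(f_l, f_int, 1)   # 1x1 convs: no spatial support
        self.w_g = nn.Conv2d(f_g, f_int, 1)
        self.psi = nn.Conv2d(f_int, 1, 1)

    def forward(self, x, g):
        alpha = torch.sigmoid(self.psi(F.relu(self.w_x(x) + self.w_g(g))))
        return x * alpha                       # keep only relevant activations

class AttentionUNet(nn.Module):
    def __init__(self, in_ch=1, n_classes=2, base=32):
        super().__init__()
        # Encoder: filter and downsample by 2 at each scale.
        self.enc1 = conv_block(in_ch, base)
        self.enc2 = conv_block(base, base * 2)
        self.enc3 = conv_block(base * 2, base * 4)
        self.bottleneck = conv_block(base * 4, base * 8)
        # Decoder: upsample, gate the skip features, then concatenate.
        self.up3, self.ag3 = nn.ConvTranspose2d(base * 8, base * 4, 2, 2), AttentionGate(base * 4, base * 4, base * 2)
        self.dec3 = conv_block(base * 8, base * 4)
        self.up2, self.ag2 = nn.ConvTranspose2d(base * 4, base * 2, 2, 2), AttentionGate(base * 2, base * 2, base)
        self.dec2 = conv_block(base * 4, base * 2)
        self.up1, self.ag1 = nn.ConvTranspose2d(base * 2, base, 2, 2), AttentionGate(base, base, base // 2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, n_classes, 1)  # per-pixel class logits

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(F.max_pool2d(e1, 2))
        e3 = self.enc3(F.max_pool2d(e2, 2))
        b = self.bottleneck(F.max_pool2d(e3, 2))
        d3 = self.up3(b)   # coarser features act as the gating signal
        d3 = self.dec3(torch.cat([d3, self.ag3(e3, d3)], dim=1))
        d2 = self.up2(d3)
        d2 = self.dec2(torch.cat([d2, self.ag2(e2, d2)], dim=1))
        d1 = self.up1(d2)
        d1 = self.dec1(torch.cat([d1, self.ag1(e1, d1)], dim=1))
        return self.head(d1)
```

A 64×64 single-channel input produces per-pixel logits of shape (batch, n_classes, 64, 64); swapping the 2D layers for their 3D counterparts recovers the volumetric setting of the paper.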

Attention UNet has been applied to problems such as ocular disease diagnosis, melanoma, lung cancer, cervical cancer, abdominal structure segmentation, fetal development, and brain tissue quantification.

Stay tuned for the next article on Residual UNet. Also, find the previous article on basic UNet here.

References

Siddique, Nahian, et al. “U-Net and its variants for medical image segmentation: theory and applications.” arXiv preprint arXiv:2011.01118 (2020).

Oktay, Ozan, et al. “Attention U-Net: Learning where to look for the pancreas.” arXiv preprint arXiv:1804.03999 (2018).
