Architectures for Medical Image Segmentation [Part 3: Residual UNet]

Shambhavi Malik
Published in CodeX · Jun 27, 2021

In this article, I will summarise the Residual UNet: its architecture and its applications. Previously, I covered the basic UNet, 3D UNet, and Attention UNet, all of which can be found here.

Image from Fusion Alliance

Medical image segmentation, e.g. lung segmentation from CT images, presents many challenges: poor illumination, irregular shapes, and fuzzy boundaries. Morphological techniques such as thresholding have been used for automated segmentation, but they generalize poorly under uneven illumination and highly variable lung morphology. Fuzzy-connectedness methods determine region adjacency for segmentation and use rib-cage information for refinement. Region-growing methods, which depend on seed selection, have also been applied, but they suffer from over-segmentation and leakage.

In contrast to traditional image-processing techniques, researchers have also used supervised approaches, such as shape-based and atlas-based models. These require prior anatomical knowledge and can be computationally expensive. Recently, deep learning methods have significantly improved the state of the art in object recognition and segmentation because of their ability to learn directly from data: very complex functions can be learned through multiple levels of transformation.

Residual UNet

Recent studies have demonstrated techniques, such as batch normalization, that make very deep networks easier to train. These techniques, however, do not solve the degradation problem: as a plain network grows deeper, its training accuracy saturates and then degrades. To handle this, researchers introduced residual learning, which addresses degradation through shortcut connections (SC).

Overview of the proposed Residual U-Net based CNN architecture for lung CT segmentation. [image by Khanna, Anita, et al.]

Residual Unit

The residual technique improves the flow of information through the network. It reformulates the layers as learning residual functions with reference to the layer inputs, thereby overcoming the degradation problem of deeper networks. A deep residual network contains a set of residual blocks, each of which consists of stacked layers: batch normalization (BN), ReLU activation, and a weight layer (i.e. a convolutional layer). Shortcut connections are connections that skip one or more layers in the network. When the shortcut must match the dimensions of the main convolutional block's output, it passes through a convolutional layer of its own; this variant is denoted SC(conv). Once the residual unit is constructed, a very deep convolutional encoder-decoder can be built by stacking residual units.
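The residual unit described above can be sketched in PyTorch as follows. This is a minimal illustration of the BN → ReLU → conv stacking with an SC(conv) shortcut, not the authors' exact implementation; kernel sizes and the pre-activation ordering are assumptions.

```python
import torch
import torch.nn as nn


class ResidualUnit(nn.Module):
    """Residual unit: (BN -> ReLU -> conv) stacked twice, plus a shortcut.

    When the channel count or stride changes, the shortcut goes through a
    1x1 convolution -- the SC(conv) variant -- so its output dimensions
    match the main branch; otherwise it is an identity mapping.
    """

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        )
        if in_ch != out_ch or stride != 1:
            # SC(conv): 1x1 convolution keeps the shortcut's shape
            # consistent with the main convolutional block's output.
            self.shortcut = nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=stride)
        else:
            self.shortcut = nn.Identity()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Learn a residual function and add it back onto the input.
        return self.body(x) + self.shortcut(x)
```

A stage of the encoder is then just several of these units stacked; the first unit of a stage typically changes the channel count and stride, and the rest use the identity shortcut.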

Architecture

Fundamental building blocks of the two networks: (a) the conventional feed-forward unit used in U-Net; (b) the residual unit with an identity mapping used in Residual U-Net. [image by Khanna, Anita, et al.]

The encoder and decoder paths each contain four stages, and each stage is built from repeated residual units. In the encoder path, stage 1 has 3 units, stages 2 and 3 have 4 and 6 units respectively, and the final stage has 3 units.

  • Encoder Path — The encoder contains a total of 50 convolutional layers, including the shortcut connections that pass through convolutional layers. The input image is resized to 128 × 128 and batch-normalized, after which each block applies 2D convolutions with 3 × 3 filters.
  • Decoder Path — Each decoder stage of the Residual U-Net consists of an upsampling layer and a concatenation layer, followed by a stack of convolution, BN, and ReLU activation. Upsampling, implemented here with transpose convolution, restores the original activation size and produces a dense activation map. Finally, a 1 × 1 convolutional layer followed by a sigmoid activation converts the single-channel feature representation into the desired segmentation map, giving a per-pixel probability score at the output.
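A single decoder stage and the output head can be sketched as below. This is a hedged illustration of the upsample → concatenate → conv/BN/ReLU pattern and the 1 × 1 conv + sigmoid head described above; the channel counts and the plain (non-residual) conv stack are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class DecoderBlock(nn.Module):
    """One decoder stage: transpose-conv upsampling, concatenation with
    the matching encoder skip feature, then conv -> BN -> ReLU."""

    def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        # Transpose convolution doubles the spatial resolution.
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(out_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        x = self.up(x)                   # reconstruct spatial size
        x = torch.cat([x, skip], dim=1)  # merge encoder features
        return self.conv(x)              # dense activation map


# Output head: a 1x1 convolution followed by sigmoid turns the feature
# map into a per-pixel probability score (the segmentation map).
head = nn.Sequential(nn.Conv2d(64, 1, kernel_size=1), nn.Sigmoid())
```

The concatenation is what makes this a U-Net rather than a plain encoder-decoder: each decoder stage sees both the upsampled coarse features and the fine-grained encoder features at the same resolution.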

Deep residual U-Nets have been used to great effect in many biomedical imaging applications, including nuclei segmentation, brain tissue quantification, brain structure mapping, retinal vessel segmentation, breast cancer, liver cancer, prostate cancer, endoscopy, melanoma, osteosarcoma, bone structure analysis, and cardiac structure analysis. They are well suited to complex image analysis tasks.

Stay tuned for the next article. Also, find the previous article on Attention UNet here.

References

Siddique, Nahian, et al. “U-Net and its variants for medical image segmentation: theory and applications.” arXiv preprint arXiv:2011.01118 (2020).

Khanna, Anita, et al. “A deep Residual U-Net convolutional neural network for automated lung segmentation in computed tomography images.” Biocybernetics and Biomedical Engineering 40.3 (2020): 1314–1327.
