Sustainability Journal (MDPI)
2009 | 1,010,498,008 words
Sustainability is an international, open-access, peer-reviewed journal focused on all aspects of sustainability—environmental, social, economic, technical, and cultural. Publishing semimonthly, it welcomes research from natural and applied sciences, engineering, social sciences, and humanities, encouraging detailed experimental and methodological r...
A Review of Deep-Learning-Based Medical Image Segmentation Methods
Xiangbin Liu
Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Hunan Normal University, Changsha 410000, China
Liping Song
Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Hunan Normal University, Changsha 410000, China
Shuai Liu
Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Hunan Normal University, Changsha 410000, China
Yudong Zhang
School of Informatics, University of Leicester, Leicester LE1 7RH, UK
Year: 2021 | Doi: 10.3390/su13031224
Copyright (license): Creative Commons Attribution 4.0 International (CC BY 4.0) license.
[[[ p. 1 ]]]
[Summary: This page introduces a review of deep-learning-based medical image segmentation methods. It highlights the importance of medical image segmentation in sustainable medical care and its growing significance in computer vision. The paper focuses on deep learning's role, introducing basic ideas, research status, and limitations, and discusses different pathological tissues and organs.]
sustainability

Review

A Review of Deep-Learning-Based Medical Image Segmentation Methods

Xiangbin Liu 1,2,3, Liping Song 1,2,3, Shuai Liu 1,2,3,* and Yudong Zhang 4,*

1 Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Hunan Normal University, Changsha 410000, China; xbliufrank@hunnu.edu.cn (X.L.); song@smail.hunnu.edu.cn (L.S.)
2 College of Information Science and Engineering, Hunan Normal University, Changsha 410000, China
3 Xiangjiang Institute of Artificial Intelligence, Changsha 410000, China
4 School of Informatics, University of Leicester, Leicester LE1 7RH, UK
* Correspondence: liushuai@hunnu.edu.cn (S.L.); yudongzhang@ieee.org (Y.Z.)

Citation: Liu, X.; Song, L.; Liu, S.; Zhang, Y. A Review of Deep-Learning-Based Medical Image Segmentation Methods. Sustainability 2021, 13, 1224. https://doi.org/10.3390/su13031224
Academic Editors: Jordi Colomer Feliu and Daniel Burgos
Received: 10 December 2020; Accepted: 21 January 2021; Published: 25 January 2021
Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Abstract: As an emerging biomedical image processing technology, medical image segmentation has made great contributions to sustainable medical care. Now it has become an important research direction in the field of computer vision. With the rapid development of deep learning, medical image processing based on deep convolutional neural networks has become a research hotspot. This paper focuses on the research of medical image segmentation based on deep learning. First, the basic ideas and characteristics of medical image segmentation based on deep learning are introduced.
Next, by explaining the research status and summarizing the three main methods of medical image segmentation and their limitations, future development directions are outlined. Based on a discussion of different pathological tissues and organs, their specific characteristics and their classic segmentation algorithms are summarized. Despite the great achievements of recent years, medical image segmentation based on deep learning still faces difficulties. For example, the segmentation accuracy is not high, the number of medical images in the data sets is small, and the resolution is low. Inaccurate segmentation results cannot meet actual clinical requirements. To address these problems, a comprehensive review of current medical image segmentation methods based on deep learning is provided to help researchers solve existing problems.

Keywords: image segmentation; deep learning; convolutional neural network; medical image

1. Introduction

Image segmentation is an important and difficult part of image processing. It has become a hotspot in the field of image understanding, and it is also a bottleneck that restricts the application of 3D reconstruction and other technologies. Image segmentation divides the entire image into several regions, each of which shares similar properties. Simply put, it separates the target from the background in an image. At present, image segmentation methods are becoming faster and more accurate. By combining various new theories and technologies, researchers are seeking a general segmentation algorithm that can be applied to all kinds of images [1].

With the advancement of medical treatment, all kinds of new medical imaging equipment are becoming more and more popular. The types of medical imaging widely used in clinical practice are mainly computed tomography (CT), magnetic resonance imaging (MRI), positron-emission tomography (PET), X-ray and ultrasound imaging (UI).
In addition, some common RGB images are also used, such as microscopy and fundus retinal images. Medical images contain very useful information. Doctors use CT and other medical images to judge the patient's condition, and these images have gradually become the main basis for
[[[ p. 2 ]]]
[Summary: This page elaborates on the advantages of deep learning in medical image segmentation, specifically in accuracy and speed, aiding doctors in tumor size confirmation and treatment evaluation. It outlines the methodology for literature selection, emphasizing deep learning methods and verified results. It also details the paper's structure, covering medical image segmentation, deep learning concepts, and network structures.]
doctors' clinical diagnosis [2]. Therefore, research on medical image processing has become a focus of attention in the field of computer vision.

With the rapid development of artificial intelligence, especially deep learning (DL) [3], image segmentation methods based on deep learning have achieved good results. Compared with traditional machine learning and computer vision methods, deep learning has certain advantages in segmentation accuracy and speed. Therefore, using deep learning to segment medical images can effectively help doctors confirm the size of diseased tumors and quantitatively evaluate the effect before and after treatment, greatly reducing the workload of doctors.

In order to better summarize the various methods, we searched the keywords "medical image processing" or "deep learning" on Google Scholar and ArXiv to obtain the latest literature. In addition, the top medical image processing conferences are also good sources of material, such as MICCAI (Medical Image Computing and Computer Assisted Intervention), ISBI (International Symposium on Biomedical Imaging), and IPMI (Information Processing in Medical Imaging). The papers we selected are mainly based on deep learning methods, and we ensured that the results of all selected papers are verified. Different from the existing reviews [4–6], this survey reviews the recent progress, advantages, and disadvantages in the field of medical image segmentation from the perspective of deep learning. It compares and summarizes related methods, and identifies the challenges that future work must address to apply deep learning successfully to medical image segmentation. In this paper, we conduct a comprehensive review of medical imaging DL technology in recent years, mainly focusing on the latest methods published in the past three years and the classic methods of the past.
First, we focus on the application of deep learning technology in medical image segmentation in the past three years. We study its network structures and methods in more depth and analyze their strengths and weaknesses. Second, some state-of-the-art segmentation methods are summarized according to the characteristics of different organs and tissues. Third, we share many evaluation metrics and data sets of medical image segmentation that readers can use to evaluate and train networks.

The structure of the article is as follows: Section 2 examines what medical image segmentation is. Section 3 explains the concept of deep learning and its applications. Sections 4 and 5 are the main body of the reviewed literature. Section 4 introduces the three network structures used for deep-learning-based medical image segmentation: FCN (fully convolutional network), U-Net and GAN (generative adversarial network). Section 5 introduces the segmentation methods for different organs and tissues. Section 6 shares evaluation metrics and data sets, which are derived from influential medical image analysis challenges. The summary and outlook of the article are in Section 7.

2. Medical Image Segmentation

2.1. Problem Definition

Image segmentation based on medical imaging is the use of computer image processing technology to analyze and process 2D or 3D images to achieve segmentation, extraction, three-dimensional reconstruction [7] and three-dimensional display of human organs, soft tissues and diseased bodies. It divides the image into several regions based on the similarity or difference between regions. Doctors can perform qualitative or even quantitative analysis of lesions and other regions of interest through this method, thereby greatly improving the accuracy and reliability of medical diagnosis.
Currently, the main segmentation objects are the various cells, tissues and organs in the image. Generally, medical image segmentation can be described by a set theory model: given a medical image I and a set of similarity constraints C_i (i = 1, 2, ...), the segmentation of I is to obtain a division of it, namely:

$$\bigcup_{x=1}^{N} R_x = I, \qquad R_x \cap R_y = \emptyset, \quad \forall x \neq y, \; x, y \in [1, N] \tag{1}$$
[[[ p. 3 ]]]
[Summary: This page defines medical image segmentation as the use of computer image processing to analyze images for segmentation, extraction, and 3D reconstruction of human organs and tissues. It outlines the stages of medical image segmentation, including data acquisition, preprocessing, segmentation, and performance evaluation. It also explains the importance of setting effective performance indicators.]
where R_x is the set of all connected pixels that satisfy the similarity constraints C_i (i = 1, 2, ...), i.e., an image region. The same is true for R_y; x and y are used to distinguish the different regions. N is a positive integer not less than 2, indicating the number of regions after division.

The process of medical image segmentation can be divided into the following stages:

1. Obtain the medical imaging data set, generally including a training set, a validation set, and a test set. When using machine learning for image processing, the data set is often divided into these three parts. The training set is used to train the network model, the validation set is used to adjust the hyperparameters of the model, and the test set is used to verify the final effect of the model.
2. Preprocess and expand the images, generally including standardization of the input image and random rotation and random scaling of the input image to increase the size of the data set.
3. Use an appropriate medical image segmentation method to segment the medical images, and output the segmented images.
4. Evaluate performance. In order to verify the effectiveness of medical image segmentation, effective performance indicators need to be set. This is an integral part of the process.

2.2. Image Segmentation

Image segmentation is a classic problem in computer vision research and has become a hotspot in the field of image understanding. Image segmentation refers to the division of an image into several disjoint areas according to features such as grayscale, color, spatial texture, and geometric shape, so that these features are consistent or similar within the same area but clearly different between areas. According to the granularity of the segmentation, image segmentation is divided into semantic segmentation, instance segmentation and panoptic segmentation.
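The division into disjoint areas described here is the partition of Equation (1), and it can be checked mechanically on a label map. The following is a minimal sketch, assuming the segmentation is stored as an integer NumPy array in which each pixel holds the index of its region; the function names `regions_from_labels` and `is_valid_partition` are hypothetical and chosen only for illustration. The sketch checks coverage and disjointness only, not the connectivity or similarity constraints C_i.

```python
import numpy as np

def regions_from_labels(label_map):
    """Return {label: boolean mask} for every label present in the map."""
    return {int(x): label_map == x for x in np.unique(label_map)}

def is_valid_partition(regions, image_shape):
    """Check Equation (1): the regions cover the image and do not overlap."""
    coverage = np.zeros(image_shape, dtype=int)
    for mask in regions.values():
        coverage += mask.astype(int)
    # Every pixel must belong to exactly one region R_x.
    return bool(np.all(coverage == 1))

# A toy 3 x 3 "image" segmented into N = 3 regions (illustrative data only).
label_map = np.array([[0, 0, 1],
                      [0, 2, 1],
                      [2, 2, 1]])
regions = regions_from_labels(label_map)
print(len(regions), is_valid_partition(regions, label_map.shape))
```

A label map always yields a valid partition by construction; the check becomes useful when regions come from separate binary masks (for example, one mask per organ) that may overlap or leave pixels unassigned.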
Segmentation of medical images is regarded as a semantic segmentation task. At present, there are more and more research branches of image segmentation, such as satellite image segmentation, medical image segmentation, and autonomous driving [8,9]. With the large increase in proposed network structures, image segmentation methods have improved step by step and deliver more and more accurate results. However, there is still no universal segmentation algorithm that is suitable for all images.

Traditional image segmentation methods can no longer match deep-learning-based segmentation methods in effect, but their ideas are still worth learning [10–12]. Examples include the threshold-based segmentation method [13], the region-based image segmentation method [14], and the edge-detection-based segmentation method [15]. These methods use the knowledge of digital image processing and mathematics to segment the image. The calculation is simple and the segmentation is fast, but the accuracy of the segmentation cannot be guaranteed in terms of details.

At present, methods based on deep learning have made remarkable achievements in the field of image segmentation, and their segmentation accuracy has surpassed that of traditional methods. The fully convolutional network was the first to successfully use deep learning for semantic image segmentation; this pioneering work introduced the concept of the fully convolutional network. It was followed by outstanding segmentation networks such as U-Net, Mask R-CNN [16], RefineNet [17], and DeconvNet [18], which have a strong advantage in processing fine edges.

3. Deep Learning

3.1. Overview of Deep Learning Network

Deep learning is a research trend in the rise of machine learning and artificial intelligence.
It uses deep neural networks to simulate the learning process of the human brain and extract features from large-scale data (sound, text, images, etc.) in an unsupervised
[[[ p. 4 ]]]
[Summary: This page describes deep learning as a research trend in machine learning that uses deep neural networks to simulate the human brain's learning process. It highlights the role of neural networks in enabling end-to-end image processing and discusses the use of deep learning in computer vision tasks such as image recognition, repair, segmentation, and object tracking.]
manner [19]. A neural network is composed of many neurons. Each neuron can be regarded as a small information-processing unit. The neurons are connected to each other in a certain way to form the entire deep neural network. The emergence of neural networks makes end-to-end image processing possible. When the network has multiple hidden layers, it is called deep learning. Layer-by-layer initialization and batch processing were introduced to solve the difficulty of training deep networks, which made deep learning a leading research topic.

In the field of computer vision, deep learning is mainly used in data dimensionality reduction, handwritten digit recognition, pattern recognition and other areas, such as image recognition, image repair, image segmentation, object tracking, and scene analysis, where it has shown very high effectiveness [20].

3.2. Convolutional Neural Networks

The convolutional neural network (CNN) [21] is a classic model produced by the combination of deep learning and image-processing technology. As one of the most representative neural networks in deep learning, it has made many breakthroughs in the field of image analysis and processing. On ImageNet, the standard annotated image set commonly used in academia, many achievements have been made with convolutional neural networks, including image feature extraction and classification, pattern recognition, etc. The convolutional neural network is a deep model with supervised learning.
The basic idea is to share the weights of the feature mapping across different positions of the previous layer, and to reduce the number of parameters by exploiting spatial relationships, which improves training performance.

From its proposal to its current wide application, the convolutional neural network has roughly gone through the stages of theoretical budding, experimental development, and large-scale application and in-depth research. The concepts of the receptive field and the neocognitron are the important theories of the theoretical budding stage. In 1962, Hubel et al. [22] showed through biological research that the transmission of visual information from the retina to the brain is accomplished through multilevel receptive field excitation. This was the first time the concept of the receptive field was proposed. In 1980, Fukushima [23] proposed the neocognitron based on the concept of receptive fields. It is regarded as the first implementation of a convolutional neural network. In 1998, LeCun et al. [24] proposed LeNet-5, which used a gradient-based backpropagation algorithm for supervised training of the network; this marked the start of the experimental development stage. The academic community's attention to convolutional neural networks also began with LeNet-5, which was successfully applied to handwriting recognition. After LeNet-5, the convolutional neural network remained in the experimental development stage. It was not until the introduction of AlexNet in 2012 that the position of convolutional neural networks in deep learning applications was established. AlexNet, proposed by Krizhevsky et al. [25], achieved the best image classification results on ImageNet, making convolutional neural networks a key research object in computer vision, and this research continues to deepen.

3.2.1. 2D CNN

A CNN consists of an input layer, an output layer, and several hidden layers. Each hidden layer performs a specific operation, such as convolution, pooling, or activation. The input layer is connected to the input image, and the number of neurons in this layer equals the number of pixels of the input image. The convolutional layers in the middle perform feature extraction on the input data through a convolution operation to obtain a feature map. The result of the convolution operation depends on the parameters of the convolution kernel. The pooling layer behind the convolutional layer filters and selects feature maps, reducing the computational complexity of the entire network. In the fully connected layer, each neuron is connected to all neurons in the previous layer. The obtained output value is sent to the classifier, which gives the classification result. The general convolutional neural network is
[[[ p. 5 ]]]
[Summary: This page introduces the convolutional neural network (CNN) as a classic model combining deep learning and image processing, noting its breakthroughs in image analysis. It details the basic components of a CNN, including the input, hidden, and output layers, and explains the functions of convolutional and pooling layers. It further explains the difference between 2D and 3D CNNs.]
a 2D CNN. Its input image is 2D and its convolution kernel is 2D, as in ResNet [26] and VGG (Visual Geometry Group) [27]. Suppose the input image has size H × W with three channels (RGB). A convolution kernel of size (c, h, w) slides over the spatial dimensions of the input image, where c, h, and w denote the number of channels, the height and the width of the convolution kernel, respectively. At each position, the image values under the h × w window are multiplied by the kernel weights on each channel and summed to obtain a single value. The process of 2D CNN convolution is shown in Figure 1.

Figure 1. Two-dimensional convolutional neural network (2D CNN) convolution.

3.2.2. 3D CNN

Most medical images, such as CT and MRI, are 3D. Although the CT image we usually see is a 2D image, it is just one slice of the volume. Therefore, segmenting diseased tissue in the full volume calls for a 3D convolution kernel. For example, the convolution kernel used by the segmentation network 3D U-Net is 3D. It replaces the 2D convolution kernel in the U-Net network with a 3D convolution kernel, which is suitable for 3D medical image segmentation [28]. A 3D CNN can extract a more powerful volumetric representation along the three axes X, Y, and Z. Using three-dimensional information in segmentation takes full advantage of spatial information. Compared with the 2D convolution kernel, the 3D convolution kernel has one more dimension, depth, which corresponds to the number of 2D slices of the medical image. Consider a 3D image of size C × N × H × W, where C, N, H and W denote the number of channels, the number of slices, the height and the width, respectively. As in the 2D convolution operation, a value is obtained by sliding the window over the height, width, and slices on each channel. The process of 3D CNN convolution is shown in Figure 2.

Figure 2. 3D CNN convolution.
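The sliding-window descriptions of 2D and 3D convolution can be written out directly. The following is a minimal NumPy sketch for a single kernel, assuming stride 1 and no padding; the function names `conv2d_single` and `conv3d_single` are hypothetical, and the explicit loops favor readability over speed. As in most deep learning libraries, the kernel is not flipped, so the operation is technically a cross-correlation.

```python
import numpy as np

def conv2d_single(image, kernel):
    """Slide one (C, h, w) kernel over a (C, H, W) image; returns (H-h+1, W-w+1)."""
    C, H, W = image.shape
    c, h, w = kernel.shape
    assert c == C, "kernel and image must have the same number of channels"
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Multiply the window by the kernel on every channel and sum to one value.
            out[i, j] = np.sum(image[:, i:i + h, j:j + w] * kernel)
    return out

def conv3d_single(volume, kernel):
    """Slide one (C, n, h, w) kernel over a (C, N, H, W) volume."""
    C, N, H, W = volume.shape
    c, n, h, w = kernel.shape
    assert c == C, "kernel and volume must have the same number of channels"
    out = np.zeros((N - n + 1, H - h + 1, W - w + 1))
    for k in range(out.shape[0]):          # slide along the slice axis
        for i in range(out.shape[1]):      # slide along the height
            for j in range(out.shape[2]):  # slide along the width
                out[k, i, j] = np.sum(volume[:, k:k + n, i:i + h, j:j + w] * kernel)
    return out

# Illustrative inputs: an RGB image and a single-channel volume of four slices.
rgb = np.ones((3, 5, 5))
out2d = conv2d_single(rgb, np.ones((3, 3, 3)))
ct = np.ones((1, 4, 5, 5))
out3d = conv3d_single(ct, np.ones((1, 3, 3, 3)))
print(out2d.shape, out3d.shape)
```

The only structural difference between the two functions is the extra loop over the slice axis, which is what lets a 3D kernel combine information from neighboring slices.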
[[[ p. 6 ]]]
[Summary: This page discusses the architecture of segmentation networks, noting that they often use CNNs like VGG and ResNet as well as encoder-decoder structures. It highlights the evolution of CNNs, from AlexNet to GoogleNet, and addresses issues like overfitting and vanishing gradients. It also explains the encoder-decoder architecture and its role in semantic segmentation.]
3.2.3. Basic Deep Learning Architectures for Segmentation

Segmentation networks are also derived from common CNN structures. The first segmentation network replaced the final fully connected layers of a classification network with convolutional layers. The backbone of a medical image segmentation network is based on deep structures such as VGG and ResNet, as well as the encoder-decoder structure. LeNet and AlexNet are early network models. The two network structures are relatively similar and belong to shallow networks. AlexNet has many more parameters than LeNet. Its idea of adding a pooling layer after the convolutional layer is still popular now. An improvement of VGG over AlexNet is the increased number of network layers. It uses several consecutive 3 × 3 convolution kernels to replace the larger convolution kernels in AlexNet. While keeping the same receptive field, this increases the depth of the network and improves feature extraction. The structure of VGG is simple and neat. The entire network uses the same convolution kernel size and maximum pooling size, verifying that performance can be improved by continuously deepening the network structure. All the networks mentioned above obtain better training effects by increasing the number of network layers. But this can also cause problems, such as overfitting and vanishing gradients. In response to these problems, GoogleNet [29] took another approach and organized the network structure into modules. The inception structure is proposed to increase the depth and width of the network while reducing the number of parameters. Inception uses multiple convolution kernels of different sizes in parallel and adds pooling. The convolution and pooling results are then concatenated. The depth of the entire network reached 22 layers.
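The statement that several consecutive 3 × 3 kernels keep the receptive field of a larger kernel can be checked with a little arithmetic. The sketch below assumes stride-1 convolutions with the same number of input and output channels in every layer; the channel count of 64 and the function names are illustrative assumptions, and biases are ignored.

```python
def stacked_receptive_field(kernel_size, num_layers):
    """Receptive field of num_layers stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)

def conv_weights(kernel_size, channels, num_layers=1):
    """Weight count (biases ignored) with equal input and output channels."""
    return num_layers * kernel_size * kernel_size * channels * channels

channels = 64  # assumed channel count, for illustration only

rf_two = stacked_receptive_field(3, 2)        # two 3 x 3 layers
rf_three = stacked_receptive_field(3, 3)      # three 3 x 3 layers
stacked_params = conv_weights(3, channels, 2) # two 3 x 3 layers
single_params = conv_weights(5, channels)     # one 5 x 5 layer
print(rf_two, rf_three, stacked_params, single_params)
```

Under these assumptions, two 3 × 3 layers see the same 5 × 5 window as a single 5 × 5 layer while using 18/25 of the weights, and they apply a nonlinearity twice instead of once.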
CNNs have developed from the eight layers of AlexNet to the 19 layers of VGG, followed by the 22 layers of GoogleNet. Once the depth reaches a certain number of layers, a further increase no longer improves classification performance and instead causes the network to converge slowly. In order to train a deeper network with good results, He et al. [26] proposed a new 152-layer network structure, ResNet. ResNet solves this problem with shortcut connections and is composed of many residual blocks. Each block consists of a number of consecutive layers and a shortcut. The shortcut connects the input and output of the block, adding them together before the ReLU (rectified linear unit) activation. The resulting sum is then sent to the ReLU activation function to generate the output of the block. In addition, there are structural units such as squeeze-and-excitation blocks, which improve the representational power of the network by modeling the relationships between channels [30].

Combining a CNN encoder at the front end with a decoder at the back end gives the encoder-decoder architecture, which is also the basic structure of a semantic segmentation network. Encoders in segmentation tasks are similar to one another, and most are CNNs designed for classification tasks. The encoder extracts image features from the input image and compacts them by encoding to produce a low-resolution feature map. The decoder maps the low-resolution discriminative feature map learned by the encoder to the high-resolution pixel space to assign a category label to each pixel. SegNet [31] is a classic encoder-decoder structure. Its encoder and decoder correspond one-to-one, with the same spatial size and number of channels. Innovation in semantic segmentation networks mainly comes from the continuous optimization of the encoder and decoder structure and the improvement of its efficiency.
In particular, the effectiveness and complexity of the decoder have a large influence on the result of the entire segmentation network.

3.3. Application of Deep Learning in Image Segmentation

Deep learning has been driving the development of the image field, including image classification and image segmentation. Image segmentation is different from image classification. Image classification only indicates which class or classes the entire image belongs to, while image segmentation needs to identify the class of each pixel in the image.
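The difference between the two outputs, and the way an encoder-decoder moves between low and high resolution, can be illustrated with shapes alone. The following is a minimal NumPy sketch with no learned weights: max pooling stands in for the encoder's downsampling and nearest-neighbor upsampling stands in for the decoder, both of which are simplifying assumptions rather than the design of any network cited here. The function names and the random score map are hypothetical.

```python
import numpy as np

def max_pool2x2(x):
    """Halve the spatial size of a (C, H, W) map; H and W must be even."""
    C, H, W = x.shape
    return x.reshape(C, H // 2, 2, W // 2, 2).max(axis=(2, 4))

def upsample2x(x):
    """Nearest-neighbor upsampling that doubles H and W."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def classify(scores):
    """Image classification: one label for the whole image."""
    return int(scores.mean(axis=(1, 2)).argmax())

def segment(scores):
    """Semantic segmentation: one label per pixel."""
    return scores.argmax(axis=0)

rng = np.random.default_rng(0)
scores = rng.random((3, 8, 8))                 # 3 classes on an 8 x 8 image (toy data)

encoded = max_pool2x2(max_pool2x2(scores))     # low-resolution map, (3, 2, 2)
decoded = upsample2x(upsample2x(encoded))      # back to pixel space, (3, 8, 8)

image_label = classify(decoded)                # a single integer
label_map = segment(decoded)                   # an (8, 8) array of class indices
print(encoded.shape, decoded.shape, label_map.shape)
```

Because the upsampling here only copies values, each 4 × 4 block of the label map shares one label; this coarseness is the reason real decoders use learned upsampling and features passed from the encoder.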
[[[ p. 7 ]]]
[Summary: This page discusses the application of deep learning in image segmentation, distinguishing it from image classification. It highlights the fully convolutional network (FCN) as the first article to successfully apply deep learning to image segmentation, and mentions other models like U-Net and Mask R-CNN. It also briefly mentions other construction methods such as RNN and weakly-supervised methods.]
The fully convolutional network (FCN) study [32] on semantic segmentation was the first work to apply deep learning to image segmentation with outstanding results. Since then, many image segmentation models have borrowed from FCN. The network is inspired by the VGG network structure. FCN places no requirement on the size of the input image, and its novelty is that all layers are convolutional. However, the segmentation result obtained by FCN is still not fine enough: it is relatively blurry and smooth, and it is not sensitive to details in the image. Later, Ronneberger et al. [33] proposed U-Net to cope with the shortage of training images in biomedical imaging. This network has two advantages. First, the output can localize the target category. Second, the training inputs are patches, which is equivalent to data augmentation and alleviates the problem of having only a small number of biomedical images. SegNet [31] builds a symmetric encoder-decoder structure on top of FCN's approach to semantic segmentation to achieve end-to-end pixel-level image segmentation. Zhao et al. [34] proposed the pyramid scene parsing network (PSPNet), which uses a pyramid pooling module to aggregate context from different regions and thereby exploit global context information. Another important segmentation model is Mask R-CNN. Faster R-CNN [35] is a popular object detection framework, and Mask R-CNN extends it to an instance segmentation framework. These are classic network models for image segmentation. Furthermore, there are other ways of constructing segmentation models, such as those based on the recurrent neural network (RNN) and the more meaningful weakly supervised methods.
4. Medical Image Segmentation Based on Deep Learning
Convolutional neural networks have excellent feature extraction and feature expression capabilities for image segmentation. They do not require manual extraction of image features or excessive preprocessing of images. Therefore, CNNs have been used in medical image segmentation in recent years and have achieved great success in this field and in auxiliary diagnosis. This section summarizes the existing classic research results and divides existing deep-learning-based medical image segmentation methods into three categories: FCN, U-Net, and GAN. Each category is introduced separately, and the advantages and disadvantages of each method are compared.
4.1. Fully Convolutional Neural Networks
FCN is the pioneering work among the most successful and advanced deep learning techniques for semantic segmentation. This section introduces the advantages and limitations of FCN and presents its variants and their applications.
4.1.1. FCN
General classification CNNs, such as VGG and ResNet, add some fully connected layers at the end of the network. Category probabilities can be obtained after the softmax layer, but this probability information is one-dimensional. That is, only the category of the entire image can be identified, not the category of each pixel, so this fully connected design is not suitable for image segmentation. Long et al. [32] proposed the fully convolutional network in response to this problem. In the usual CNN structure, the first five layers are convolutional layers. The sixth and seventh layers are fully connected layers of length 4096 (one-dimensional vectors). The eighth layer is a fully connected layer of length 1000, corresponding to the probabilities of 1000 categories.
FCN converts the three fully connected layers (layers 6 to 8) into convolutional layers with kernel sizes of 7 × 7, 1 × 1, and 1 × 1, so that the network outputs a two-dimensional map of class scores instead of a single vector. A softmax layer then yields the classification information for each pixel, which solves the segmentation problem. The fully convolutional network can accept input images of any size. FCN uses a deconvolution layer to upsample the feature map of the last convolutional layer and restore it to the size of the input image. Thus, a prediction can be generated for each pixel while retaining the spatial
[[[ p. 8 ]]]
[Summary: This page introduces Fully Convolutional Neural Networks (FCNs) as a pioneering technology for semantic segmentation. It explains how FCNs address the limitations of traditional CNNs by converting fully connected layers into convolutional layers, allowing for pixel-by-pixel classification. It also discusses FCN's ability to accept input images of any size and its use of deconvolution layers for upsampling.]
information in the original input image. Finally, pixel-by-pixel classification is performed on the upsampled feature map to complete the image segmentation. According to the upsampling factor, the variants are called FCN-32s, FCN-16s, and FCN-8s. The network structure of FCN is shown in Figure 3.
Figure 3. The structure of the fully convolutional network (FCN) [32].
4.1.2. DeepLab v1
However, the shortcomings of FCN are also prominent. First, its upsampled results are relatively fuzzy and insensitive to image details, so the segmentation is not fine enough. Second, it essentially classifies each pixel independently, without fully considering the relationships between pixels, and therefore lacks spatial consistency. To obtain a denser score map, the FCN authors padded the first convolutional layer with a padding size of 100, which introduces a lot of noise. Chen et al. [36] proposed DeepLab v1, which changed the pooling stride from 2 to 1 and the padding size from 100 to 1. In this way, pooling no longer reduces the size of the feature map, and the resulting score map is denser than that of FCN. DeepLab v1 is built on the VGG-16 network, removing the last fully connected layers of VGG and using convolutional layers instead. Because too many pooling layers make the feature maps too small and the features they contain too sparse, which is not conducive to semantic segmentation, the authors removed the last two pooling layers and added atrous convolution.
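The idea of atrous (dilated) convolution can be sketched in one dimension with plain Python. This is an illustrative sketch, not the DeepLab implementation: the function names, the toy signal, and the kernel are assumptions. It shows that spacing the kernel taps `rate` samples apart widens the span of input covered by each output while the number of weights stays the same.

```python
def atrous_conv1d(signal, kernel, rate=1):
    """1-D 'valid' convolution (cross-correlation) with dilation `rate`.

    Kernel taps are spaced `rate` samples apart, so a kernel with k weights
    covers a span of (k - 1) * rate + 1 input samples.
    """
    span = (len(kernel) - 1) * rate + 1
    return [
        sum(w * signal[start + t * rate] for t, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


def receptive_field(kernel_size, rate):
    """Number of input samples seen by one output of a single atrous layer."""
    return (kernel_size - 1) * rate + 1


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]                               # three weights in both cases

dense = atrous_conv1d(signal, kernel, rate=1)     # each output spans 3 samples
atrous = atrous_conv1d(signal, kernel, rate=2)    # each output spans 5 samples
```

With `rate=1` the function reduces to an ordinary convolution; with `rate=2` the same three weights cover five input samples, which is the effect exploited in place of additional pooling.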
Compared with standard convolution, atrous convolution expands the receptive field without increasing the amount of computation and increases the density of the features. Finally, DeepLab v1 uses a conditional random field (CRF) [37] to improve the accuracy of segmentation boundaries.
4.1.3. DeepLab v2
DeepLab v2 [38] improves on DeepLab v1. It addresses the difficulty of segmenting objects that appear at different scales. When the same kind of object has different sizes in the same image or in different images, the traditional method is to force the images to the same size by resizing. But this will cause
[[[ p. 9 ]]]
[Summary: This page discusses DeepLab v1, an improvement over FCN that addresses issues like fuzzy upsampling and lack of spatial consistency. It explains how DeepLab v1 modifies the pooling stride and padding size to obtain a denser score map. It also discusses the use of atrous convolution to expand the receptive field and conditional random field (CRF) to improve segmentation boundaries.]
some features to be distorted or to disappear. The contribution of DeepLab v2 lies in a more flexible use of atrous convolution in the form of atrous spatial pyramid pooling (ASPP). Inspired by spatial pyramid pooling (SPP), ASPP applies parallel atrous convolutions with different sampling rates to a given input, which is equivalent to capturing image context at multiple scales. In DeepLab v2, the authors switched to the more complex and expressive ResNet-101 network. The repeated pooling and downsampling of a deep convolutional neural network (DCNN) reduce the resolution. DeepLab v2 removes the downsampling in the last few max pooling layers and instead uses atrous convolution to compute feature maps with a higher sampling density. The authors also removed the fully connected layers of the network and replaced them with convolutional layers. In addition, DeepLab v2 uses a fully connected CRF to improve the accuracy of segmentation boundaries, refining the local classification results with low-level detail. A deep neural network classifies with high accuracy, which means that it has obvious advantages in high-level semantics. However, pixel-level classification relies on low-level information, so its results appear vague in local details. Therefore, the authors use the CRF to refine this detailed information.
4.1.4. DeepLab v3 and DeepLab v3+
DeepLab v3 [39] continued to use the ResNet-101 network. To handle multiscale object segmentation, it designs atrous convolution modules arranged in cascade or in parallel, which adopt multiple atrous rates to capture multiscale context. In addition, the authors included the previously proposed ASPP module.
This module probes convolutional features at multiple scales and uses image-level features to encode the global context, which further improves performance. Finally, DeepLab v3 drops the CRF. The experimental results showed that the model improves significantly over the previous DeepLab versions. However, DeepLab v3 also has some shortcomings. For example, the upsampled output is coarse and carries too little detail. DeepLab v3+ [40] extended DeepLab v3 by adding a simple and effective decoder module to refine the segmentation results, especially along object boundaries. To improve the quality of the output, DeepLab v3+ uses a feature map from an intermediate layer when enlarging the output. The Xception model is adapted for the semantic segmentation task, and depthwise separable convolution is applied in the ASPP and decoder modules to improve the speed and robustness of the encoder-decoder network.
4.1.5. SegNet
SegNet [31] builds a symmetric encoder-decoder structure on top of FCN's approach to semantic segmentation to achieve end-to-end pixel-level image segmentation. The network is mainly composed of two parts: the encoder and the decoder. The encoder reuses the VGG-16 network and is mainly responsible for analyzing object information. The decoder maps the parsed information back to the final image form, that is, each pixel is represented by the color or label corresponding to its object class. The novelty lies in the way the decoder upsamples its lower-resolution input feature map. FCN uses a deconvolution operation to upsample. In contrast, the SegNet decoder uses the max pooling indices (positions) passed from the encoder to upsample its input nonlinearly, so that upsampling requires no learning and produces a sparse feature map. Then, trainable convolution kernels are applied to generate a dense feature map.
When the feature maps have been restored to the original resolution, they are sent to a softmax classifier for pixel-level classification. This design helps preserve high-frequency information, improves boundary delineation, and reduces the number of trainable parameters. However, unpooling low-resolution feature maps also ignores information from neighboring pixels.
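The index-based upsampling used by the SegNet decoder can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the function names and the toy 4 × 4 feature map are hypothetical, the pooling windows are 2 × 2 and non-overlapping, and the map size is assumed to be divisible by the window size. The encoder side records where each maximum came from, and the decoder side writes each value back to that position without any learned weights.

```python
def max_pool_with_indices(fmap, size=2):
    """Non-overlapping max pooling that also records the position of each maximum."""
    pooled, indices = [], []
    for i in range(0, len(fmap), size):
        pooled_row, index_row = [], []
        for j in range(0, len(fmap[0]), size):
            window = [(fmap[i + di][j + dj], (i + di, j + dj))
                      for di in range(size) for dj in range(size)]
            value, position = max(window)      # largest value in the window
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool(pooled, indices, height, width):
    """Write each pooled value back to its recorded position; zeros elsewhere."""
    out = [[0] * width for _ in range(height)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out


encoder_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 1],
]
pooled, indices = max_pool_with_indices(encoder_map)
sparse = max_unpool(pooled, indices, 4, 4)     # sparse map, later densified by convolution
```

The unpooled map is sparse, which is why SegNet follows it with trainable convolutions; the zeros around each restored value also show how information from neighboring positions is lost.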
[[[ p. 10 ]]]
[Summary: This page discusses SegNet, which builds an encoder-decoder symmetric structure based on FCN to achieve end-to-end pixel-level image segmentation. It explains that the encoder analyzes object information, while the decoder corresponds the parsed information into the final image form. The novelty lies in the way that the decoder upsamples its input feature map with lower resolution.]
4.1.6. Other FCN Structures
Zhou et al. [41] used FCN in a 2.5D approach to segment 19 organs in 3D CT images. The technique trains pixel-to-label mappings on two-dimensional slices of the three-dimensional volume and designs a separate FCN for each two-dimensional sectional view (three FCNs in total). Finally, the segmentation result for each pixel is fused with the results of the other FCNs to obtain the final segmentation output. The accuracy of this technique is higher on large organs such as the liver than on small organs such as the pancreas. Christ et al. [42] proposed stacking a series of FCNs, in which each model uses context features extracted from the prediction map of the previous model to improve segmentation accuracy. This method is called cascaded FCN (CFCN). Zhou et al. [43] proposed applying focal loss to FCN to reduce the number of false positives caused by the imbalance between background and foreground pixels in medical images.
4.2. U-Net
4.2.1. 2D U-Net
Based on FCN, Ronneberger et al. [33] designed the U-Net network for biomedical images, and it has been widely used in medical image segmentation since it was proposed. Owing to its excellent performance, U-Net and its variants have also been widely used in various subfields of computer vision (CV). The approach was presented at the 2015 MICCAI conference and has been cited more than 4000 times. U-Net has since had many variants. Many new convolutional neural network designs have appeared, but many of them still build on the core idea of U-Net, adding new modules or integrating other design concepts. The U-Net network is composed of a U-shaped channel and skip connections. The U-shaped channel is similar to the encoder-decoder structure of SegNet. The encoder has four submodules, each of which contains two convolutional layers. Each submodule is followed by a max pooling layer that performs downsampling. The decoder contains four submodules.
The resolution is increased successively by upsampling, and the network then gives a prediction for each pixel. The network structure is shown in Figure 4. The input is 572 × 572, and the output is 388 × 388. The output is smaller than the input mainly because the network uses unpadded convolutions, so that predictions are made only for pixels whose full context is available in the input, which suits the accuracy requirements of segmentation in the medical field. It can be seen from the figure that this network has no fully connected layer, only convolution, downsampling, and upsampling layers. The network also uses skip connections: the upsampling result is concatenated with the output of the encoder submodule that has the same resolution, and the result serves as the input of the next submodule in the decoder.
Figure 4. The structure of the U-Net [33].
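The relationship between the 572 × 572 input and the 388 × 388 output can be checked with a short calculation. The sketch below is an illustration under stated assumptions: it assumes four resolution levels, two unpadded 3 × 3 convolutions per submodule, 2 × 2 pooling, and upsampling by a factor of 2, and the function name `unet_output_size` is hypothetical.

```python
def unet_output_size(input_size, depth=4, convs_per_block=2, kernel=3):
    """Spatial size at the output of a U-Net built from unpadded convolutions."""
    # Each unpadded convolution removes (kernel - 1) pixels per dimension.
    shrink = convs_per_block * (kernel - 1)
    size = input_size
    for _ in range(depth):                 # contracting path
        size -= shrink                     # two 3 x 3 convolutions
        if size % 2:
            raise ValueError("feature map size must be even before 2 x 2 pooling")
        size //= 2                         # 2 x 2 max pooling
    size -= shrink                         # bottom submodule
    for _ in range(depth):                 # expanding path
        size = size * 2 - shrink           # upsample by 2, then two convolutions
    return size


output_size = unet_output_size(572)
```

Following the sizes level by level (572, 568, 284, 280, 140, ... , 392, 388) shows that the loss of border pixels comes entirely from the unpadded convolutions; the final 1 × 1 convolution does not change the spatial size.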
[[[ p. 11 ]]]
[Summary: This page discusses 2D U-Net, a network designed for biomedical images, widely used in medical image segmentation after its proposal. It explains that the U-Net network is composed of a U channel and skip-connection. The U channel is similar to the encoder-decoder structure of SegNet. The encoder has four submodules, each of which contains two convolutional layers.]
The reason why U-Net is suitable for medical image segmentation is that its structure can combine low-level and high-level information at the same time. The low-level information helps to improve accuracy, and the high-level information helps to extract complex features.
4.2.2. 3D U-Net
Improving U-Net has become a research hotspot in medical image segmentation, and many variants have been developed on this basis. Çiçek et al. [44] proposed the 3D U-Net model, which aims to give the U-Net structure richer spatial information. Its network structure is shown in Figure 5. The structure is similar to U-Net, with one encoding path and one decoding path, each with four resolution levels. Each layer in the encoding path contains two 3 × 3 × 3 convolutions, each followed by a ReLU layer, and then a max pooling layer that reduces the resolution. In the decoding path, each layer contains a 2 × 2 × 2 deconvolution with a stride of 2, followed by two 3 × 3 × 3 convolutions, each followed by a ReLU layer. Through shortcut connections, the layers of the same resolution in the encoding path are passed to the decoding path, providing it with the original high-resolution features. The network realizes 3D image segmentation by taking a sequence of consecutive 2D slices of a 3D image as input. It can be trained on a sparsely labeled data set to predict the unlabeled parts of the same data set, or be trained on multiple sparsely labeled data sets and then predict new data. In contrast to the U-Net input, the input is a volume of 132 × 132 × 116 voxels with three channels, and the output size is 44 × 44 × 28. 3D U-Net retains the strengths of FCN and U-Net, and it is of great help for volumetric images.
Figure 5. The structure of the 3D U-Net [44].
4.2.3. V-Net
Milletari et al. [45] proposed V-Net, a 3D variant of the U-Net structure. Its network structure is shown in Figure 6. V-Net uses a loss function based on the Dice coefficient instead of the traditional cross-entropy loss. It convolves the image with 3D convolution kernels and reduces the channel dimension with a 1 × 1 × 1 convolution kernel. The left side of the network is a compression path that is divided into several stages. Each stage contains one to three convolutional layers.
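The Dice-based objective mentioned above can be sketched in plain Python. This is a minimal illustration, not the V-Net implementation: the function names, the smoothing constant `eps`, and the convention of minimizing 1 − Dice are assumptions, and the volume is flattened to a list of foreground probabilities. The Dice coefficient is computed here as twice the overlap divided by the sum of squared predictions and squared labels.

```python
def soft_dice(pred, target, eps=1e-7):
    """Dice coefficient between foreground probabilities and a binary mask.

    D = 2 * sum(p * g) / (sum(p^2) + sum(g^2)); eps (an assumed smoothing
    constant) avoids division by zero when both volumes are empty.
    """
    intersection = sum(p * g for p, g in zip(pred, target))
    denominator = sum(p * p for p in pred) + sum(g * g for g in target)
    return (2.0 * intersection + eps) / (denominator + eps)


def dice_loss(pred, target):
    """Loss to minimize; equivalent to maximizing the Dice coefficient."""
    return 1.0 - soft_dice(pred, target)


# Toy flattened volume with strong class imbalance: 2 foreground voxels out of 8.
target = [0, 0, 0, 0, 0, 0, 1, 1]
perfect = [0, 0, 0, 0, 0, 0, 1, 1]
half = [0, 0, 0, 0, 0, 0, 1, 0]      # finds one of the two foreground voxels
empty = [0, 0, 0, 0, 0, 0, 0, 0]     # predicts background everywhere

loss_perfect = dice_loss(perfect, target)
loss_half = dice_loss(half, target)
loss_empty = dice_loss(empty, target)
```

In this toy case the all-background prediction labels 75% of the voxels correctly, yet its Dice loss is close to 1, which illustrates why an overlap-based objective is attractive when the foreground occupies only a small part of the volume.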
So that each stage learns a residual function, the input of each stage is added to its output. The convolution kernels used in each stage have a size of 5 × 5 × 5. The convolutions extract features from the data, and at the end of each stage a convolution with an appropriate stride reduces the resolution. On the right side of
[[[ p. 12 ]]]
[Summary: This page discusses 3D U-Net, a model aiming to enrich the spatial information of the U-Net structure. It explains that the network structure is similar to U-Net, with one encoding path and one decoding path. Each path has four resolution levels. It realizes 3D image segmentation by inputting a continuous 2D slice sequence of 3D images.]
the network is a decompression path. It extracts features and expands the spatial support of the lower-resolution feature maps in order to gather and combine the information needed to output a two-channel volumetric segmentation. The final output of the network has the same size as the original input.
Figure 6. The structure of the V-Net [45].
4.2.4. Other U-Net Structures
Res-UNet (weighted Res-UNet) [46] and H-DenseUNet (hybrid densely connected UNet) [47] are inspired by residual connections and dense connections, respectively: each submodule of U-Net is replaced with a residual or densely connected block. Res-UNet is used to segment retinal blood vessels. Retinal vessel segmentation often suffers from missed small vessels and poor segmentation of the optic disc. The structure of retinal blood vessels resembles the branching structure of a tree, and this structure is difficult to preserve when vessels are too thin to detect. To address these challenges, Xiao et al. proposed a weighted Res-UNet, which adds a weighted attention mechanism to the original U-Net model. This allows the model to learn more discriminative features for separating vessel pixels from non-vessel pixels and to better preserve the retinal vessel tree structure. H-DenseUNet is used to segment the liver and liver tumors from contrast-enhanced CT volumes. The network takes each 3D input and transforms the 3D volume into adjacent 2D slices through the transformation function F proposed in the article. These 2D slices are then sent to a 2D DenseUNet to extract intraslice features. The original 3D input and the transformed prediction of the 2D DenseUNet are concatenated and sent to a 3D network to extract interslice features. Finally, the two sets of features are fused, and the result is predicted through the HFF layer. Ibtehaz et al. [48] analyzed the U-Net architecture for probable scopes for improvement and proposed MultiResUNet. The authors proposed a MultiRes block to replace the sequence of two convolutional layers. In addition, the plain shortcut connection is replaced with the proposed Res path. Finally, the authors conducted experiments on public medical image data sets of different modalities.
The results showed that MultiResUNet achieves high accuracy. The organs or tissues to be segmented in medical images vary in shape and size, which is one of the difficulties that medical image segmentation must solve. Oktay et al. [49] introduced the attention mechanism into U-Net and proposed Attention U-Net. Before concatenating the encoder features at each resolution with the corresponding features in the decoder, they use an attention module to readjust the encoder's output features. In U-Net, the encoder consists of several convolutional
[[[ p. 13 ]]]
[Summary: This page discusses V-Net, a 3D deformation structure of the U-Net network structure. The V-Net structure uses the Dice coefficient loss function instead of traditional cross-entropy loss function. It uses a 3D convolution kernel to convolve image and reduces the channel dimension through a 1 × 1 × 1 convolution kernel.]
layers and pooling layers. Since these are all local operations, each layer sees only local information, so long-range information has to be extracted by stacking multiple layers. This approach is relatively inefficient, requiring a large number of parameters and a large amount of computation. Wang et al. [50] proposed a new U-Net model based on self-attention, called nonlocal U-Nets. They introduced a new up/down sampling method, the global aggregation block, which combines self-attention with up/down sampling. It takes the full image information into account while up/down sampling, so as to obtain a more accurate segmentation while reducing the number of parameters.

4.3. Generative Adversarial Network

A new way of training generative models, the generative adversarial network (GAN), has recently been introduced. Goodfellow et al. [51] proposed this adversarial method in 2014 to learn a deep generative model. Its structure is shown in Figure 7 and consists of two parts. The first part is the generator network, which receives random noise z and generates an image from it. The second part is the adversarial (discriminator) network, which judges whether an image is "real". Its input is an image x, and its output D(x) represents the probability that x is a real image. Simply put, the two networks are trained to compete with each other: the generator network produces fake data, and the adversarial network acts as a discriminator that determines authenticity. The aim is for the data produced by the generator to become indistinguishable from real data.

Figure 7. The structure of the GAN.

4.3.1. First GAN for Segmentation

Combining the requirements of semantic segmentation with the characteristics of GAN, Luc et al. [52] trained a convolutional semantic segmentation network together with an adversarial network. This paper was the first to apply GAN ideas to semantic segmentation. The loss function of this network is:

$$\ell(\theta_s, \theta_a) = \sum_{n=1}^{N} \ell_{mce}\big(s(x_n), y_n\big) - \lambda \Big[ \ell_{bce}\big(a(x_n, y_n), 1\big) + \ell_{bce}\big(a(x_n, s(x_n)), 0\big) \Big] \qquad (2)$$

Here, $\theta_s$ and $\theta_a$ denote the parameters of the segmentation model and the adversarial model, respectively. $N$ is the size of the data set, $x_n$ are the training images, and $y_n$ are the corresponding label maps. $a(x, y)$ is the scalar probability with which the adversarial model predicts that $y$ is the ground truth label map of $x$, and $s(x)$ is the label map produced by the segmentation model. $\ell_{bce}$ and $\ell_{mce}$ are the binary and multiclass cross-entropy losses, respectively, and $\lambda$ weights the adversarial term.

The segmentor is a traditional CNN-based segmentation network, which attempts to generate a segmentation map that is close to the ground truth so that it looks more realistic. The adversarial network is the discriminator in GAN. Training follows the classic game idea: the two networks improve each other, which raises both segmentation accuracy and discrimination ability.
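As an illustration of how the terms of Equation (2) combine, the following is a minimal NumPy sketch for a single training pair. The function names (`bce`, `mce`, `hybrid_loss`) are hypothetical, and the adversarial outputs $a(x_n, y_n)$ and $a(x_n, s(x_n))$ are passed in as plain probabilities rather than computed by a real network; it is a sketch under these assumptions, not the implementation of Luc et al.

```python
import numpy as np

EPS = 1e-12  # numerical guard against log(0); an implementation detail, not from the paper


def bce(prob, target):
    """Binary cross-entropy for a scalar probability and a 0/1 target."""
    prob = np.clip(prob, EPS, 1.0 - EPS)
    return float(-(target * np.log(prob) + (1 - target) * np.log(1.0 - prob)))


def mce(pred, onehot):
    """Multiclass cross-entropy summed over pixels.

    pred, onehot: arrays of shape (H, W, C); pred holds class probabilities.
    """
    pred = np.clip(pred, EPS, 1.0)
    return float(-np.sum(onehot * np.log(pred)))


def hybrid_loss(seg_pred, label, a_real, a_fake, lam):
    """One summand of Eq. (2).

    a_real stands for a(x_n, y_n) and a_fake for a(x_n, s(x_n)); both are
    supplied as scalars here (assumption: no real adversarial model is run).
    """
    return mce(seg_pred, label) - lam * (bce(a_real, 1) + bce(a_fake, 0))


# Toy example: a 2 x 2 image with 3 classes and a uniform prediction.
label = np.zeros((2, 2, 3))
label[..., 0] = 1.0
uniform = np.full((2, 2, 3), 1.0 / 3.0)
print(hybrid_loss(uniform, label, a_real=0.9, a_fake=0.2, lam=1.0))
```

In this reading, the segmentation parameters would be updated to decrease the value, while the adversarial parameters would be updated to increase it, which amounts to decreasing the two binary cross-entropy terms.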
[[[ p. 14 ]]]
[Summary: This page discusses Res-UNet and H-DenseUNet, which are inspired by residual connections and dense connections, respectively. Each submodule of U-Net is replaced with a residual connection and dense connection. Res-UNet is used for image segmentation about retinal blood vessels. H-DenseUNet is used to segment liver and liver tumor from the contrast-enhanced CT volumes.]
4.3.2. Segmentation Adversarial Network (SegAN)

Xue et al. [53] proposed the segmentation adversarial network (SegAN), which uses a U-Net structure as the generator of a GAN. For medical image segmentation, U-Net cannot effectively solve the problem of unbalanced pixel categories in the image. To address this problem, the authors designed a new segmentation network based on the ideas of GAN and proposed a multiscale L1 loss to optimize it. The network is divided into two parts: a segmentor network S and a critic network C. In the min-max game, the segmentor and the critic are trained alternately until a model with good performance is obtained. The training goal of S is to minimize the multiscale L1 loss, while the training goal of C is to maximize it. The segmentor network S is a common U-Net structure. It uses convolutional layers with kernel size 4 × 4 and stride 2 for downsampling, and performs upsampling with an image resize layer with a factor of 2 followed by a convolutional layer with kernel size 3 × 3 and stride 1. The critic network is fed two inputs: original images masked by ground truth label maps, and original images masked by the label maps predicted by S. Experiments on the BRATS (brain tumor segmentation) data set showed that SegAN is more effective and stable for the segmentation task. Compared with a single-scale loss function, the multiscale L1 loss proposed by the authors optimizes the entire segmentation network.

4.3.3. Structure Correcting Adversarial Network (SCAN)

Chest X-ray (CXR) is the most common X-ray examination used to diagnose various cardiopulmonary abnormalities in daily clinical practice. Owing to the low cost and low radiation dose of CXR, it accounts for more than 55% of the total number of medical images. Therefore, it is important to develop computer-aided detection methods for chest X-rays to support clinicians.
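To make the multiscale L1 loss of SegAN (Section 4.3.2) more concrete, the sketch below compares critic features of the two masked inputs at several scales and averages the absolute differences. The critic is replaced by a hypothetical stand-in (`toy_critic_features`) that simply returns the input at successively halved resolutions; a real critic would return learned convolutional feature maps, so this illustrates only the shape of the loss, not the method of Xue et al.

```python
import numpy as np


def avg_pool2(img):
    """2 x 2 average pooling; assumes even height and width."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def toy_critic_features(img, levels=3):
    """Hypothetical stand-in for the critic C: the input at several scales."""
    feats = [img]
    for _ in range(levels - 1):
        feats.append(avg_pool2(feats[-1]))
    return feats


def multiscale_l1(image, pred_mask, true_mask, levels=3):
    """Mean absolute difference between the features of the two masked inputs,
    averaged over all scales."""
    f_pred = toy_critic_features(image * pred_mask, levels)
    f_true = toy_critic_features(image * true_mask, levels)
    per_scale = [np.mean(np.abs(p - t)) for p, t in zip(f_pred, f_true)]
    return float(np.mean(per_scale))


image = np.ones((8, 8))
true_mask = np.ones((8, 8))
print(multiscale_l1(image, true_mask, true_mask))         # identical masks give 0.0
print(multiscale_l1(image, np.zeros((8, 8)), true_mask))  # fully wrong mask gives 1.0
```

In the alternating scheme described above, S would be updated to lower this value and C to raise it.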
Dai et al. [54] proposed a structure correcting adversarial network (SCAN) to segment the lung fields and the heart in CXR images. This network adopted the idea of Luc et al., who first used GAN for image segmentation. The difference is that both the segmentation network and the critic network are fully convolutional networks (FCNs); this was the first time that fully convolutional networks were used for both segmentation and critic. Under the strict constraint of a very limited training set of 247 images, the FCN is applied to grayscale CXR images. The FCN here departs from the usual VGG architecture and can be trained without transfer learning from existing models. The critic network imposes structural regularities derived from human physiology on the convolutional segmentation network. During training, the critic network learns to distinguish ground truth organ annotations from masks synthesized by the segmentation network. Through this adversarial process, the critic network learns higher-order structures and guides the segmentation model toward realistic segmentation results. In addition, SCAN simplifies the downsampling module to suit the particular characteristics of CXR images.

4.3.4. Projective Adversarial Network (PAN)

Three-dimensional medical image segmentation remains an open problem. Khosravan et al. [55] proposed a new segmentation network, PAN, to capture 3D semantics in a computationally efficient way. PAN integrates high-level 3D information through 2D projections, without relying on 3D images or increasing the complexity of segmentation. The network consists of a segmentor and two adversarial networks. The segmentor contains 10 convolution layers in the encoder and 10 convolution layers in the decoder. The input of the segmentor is a two-dimensional grayscale image, and the output is a pixel-level probability map. The adversarial networks are designed to compensate for missing global relations and to correct the high-order inconsistencies caused by a single pixel-level loss. These networks generate an adversarial signal that is applied to the segmentor as part of the overall loss function. The adversarial networks are used only in the training phase, so they improve the performance of the segmentor without increasing its complexity. The first adversarial network captures continuity of high-level
[[[ p. 15 ]]]
[Summary: This page introduces Generative Adversarial Networks (GANs), consisting of a generation network that creates images from random noise and an adversarial network that judges image authenticity. The goal is for the generator to produce data that the discriminator can't distinguish from real data, thus improving the generator's output quality.]
spatial labels. The second adversarial network uses a 2D projection learning strategy to enhance 3D semantics. This is equivalent to adding a high-dimensional constraint through GAN, although it is not as direct as 3D U-Net. PAN can be applied to any 3D object segmentation problem and is not specific to a single application.

4.3.5. Distributed Asynchronized Discriminator GAN (AsynDGAN)

GAN can not only improve the performance of medical image segmentation but also contribute to its data processing. The privacy of medical data is a very important issue, and as a result very few medical data sets are available. However, training a successful deep learning algorithm for medical image segmentation requires sufficient data. Data augmentation can alleviate this problem to some extent, and GAN-based data augmentation can be used as a data expansion method for medical image segmentation. At CVPR 2020, Chang et al. [56] proposed a privacy-preserving and communication-efficient distributed GAN learning framework named distributed asynchronized discriminator GAN (AsynDGAN). AsynDGAN is composed of a central generator and multiple distributed discriminators located in different medical entities. The central generator accepts a task-specific input and generates a synthetic image to fool the discriminators. It is an encoder-decoder network that includes two convolutional layers with a stride of 2 for downsampling, nine residual blocks, and two transposed convolutions. Each discriminator learns to distinguish real images from the synthetic images generated by the central generator. AsynDGAN does not need to share data, which protects data security, and it realizes a communication-efficient distributed GAN learning framework in which distributed discriminators are used to train a central generator. The generated data can be used to train a segmentation model, which improves segmentation accuracy.

4.3.6. Other GAN Structures

Zhao et al. [57] proposed Deep-supGAN to map 3D MR data of the head to its CT image in order to facilitate segmentation of craniomaxillofacial bony structures. To obtain better conversion results, they proposed a deep-supervision discriminator, which uses the feature representations extracted by a pretrained VGG-16 model to distinguish between real and synthetic CT images and provides gradient updates to the generator. The first block in the structure generates high-quality CT images from MRI. The second block segments bony structures from the MRI and the generated CT images.

When segmenting 3D multimodal medical images, as with the PAN mentioned earlier, there are often very few labeled examples available for training, resulting in insufficient model training. Applying adversarial learning to semisupervised segmentation, Arnab et al. [58] proposed using generative adversarial learning for few-shot 3D multimodal medical image segmentation. Building on the advantages of combining adversarial learning with semisupervised segmentation, a new generative adversarial network method is used to train segmentation models with both labeled and unlabeled images. Compared with advanced segmentation networks trained in a fully supervised manner, the performance of this network is greatly improved.

Training an effective segmentation model using unannotated images is worth studying. Zhang et al. [59] proposed a new deep adversarial network (DAN) for medical image segmentation, with the goal of obtaining good segmentation results on both annotated and unannotated images. The network includes a segmentation network and an evaluation network, and can effectively use unannotated image data to obtain better segmentation results. Other papers have also successfully applied adversarial learning to medical image segmentation. Yang et al. [60] proposed GANs that use U-Net as the generator to segment the liver in three-dimensional CT images of the abdomen.

In addition to segmentation, generative adversarial networks also play an important role in medical image enhancement. When training medical image segmentation models, the model often overfits because the data set is insufficient. This problem is very common in medical image analysis. A solution to insufficient training data
[[[ p. 16 ]]]
[Summary: This page discusses the first GAN for segmentation, where a convolutional semantic segmentation network and an adversarial network are trained. The loss function of this network is a combination of cross-entropy losses. The training process follows the classic game idea, mutually improving the network's ability to improve segmentation accuracy and discrimination ability.]
set is data augmentation. GAN-based data augmentation for segmentation tasks is widely used on different kinds of medical images. Conditional GANs (cGAN) [61] and CycleGANs [62] have been used in various ways to synthesize certain types of medical images. Bayramoglu et al. [63] used cGANs to stain unstained hyperspectral lung histopathology images so that they look like their H&E (hematoxylin and eosin) stained versions. Dar et al. [64] proposed a new method of multicontrast MRI synthesis based on conditional generative adversarial networks. Wolterink et al. [65] used CycleGAN to convert 2D MR images into CT images. No paired images are required, and this training brings better results.

5. Segmentation Methods for Various Human Organ Areas

The human body has multiple organs and tissues, and different parts have their own characteristics. For example, the segmentation area for diagnosing brain tumors and lung nodules is relatively large, while retinal images require segmentation of blood vessels, which demands higher segmentation accuracy. Researchers draw on these characteristics to design segmentation algorithms for different organs and improve segmentation accuracy. Segmentation methods for different organs are introduced below. Based on the literature, we summarize the segmentation methods for the brain, eyes, chest, abdomen, heart and other parts in Tables 1–6.

5.1. Brain

The analysis of brain-related diseases generally requires MRI. Brain imaging analysis is widely used to study brain diseases such as Alzheimer's disease [66], epilepsy, schizophrenia, multiple sclerosis, cancer, and neurodegenerative diseases. Myronenko et al. [67] proposed a deep learning network for 3D MRI brain tumor segmentation based on an asymmetric FCN combined with residual learning. It won first place in the BRATS 2018 challenge. Nie et al. [68] acquired T1, T2 and diffusion-weighted multimodal brain images of 11 healthy infants. The authors optimized the network by integrating contextual semantic information and fusing features of different scales, and segmented the multimodal brain MRI images using a 3D FCN. Wang et al. [69] proposed a CRF-based edge-aware FCN, which achieved more accurate edge segmentation by adding edge information to the loss function. The accuracy of the model reached 87.31%, far higher than that of FCN-8s and other basic semantic segmentation networks. Borne et al. [70] selected 62 healthy brain images from heterogeneous databases as the training set and segmented them using 3D U-Net, with a reported accuracy of 85%. Casamitjana et al. [71] proposed a cascaded V-Net for brain tumor segmentation, dividing the problem into two simpler tasks: segmentation of the entire tumor and division into different tumor regions.

Many segmentation methods also use GAN. For example, Moeskops et al. [72] used adversarial training to improve the brain MRI segmentation performance of a fully convolutional network and of a network with dilated convolutions. Rezaei et al. [73] used cGAN to train a semantic segmentation convolutional neural network, which showed superior ability for brain tumor segmentation. Focusing on MRI brain tumor segmentation, Giacomello et al. [74] proposed SegAN-CAT, a deep learning architecture based on a generative adversarial network, and applied the trained model to different modalities through transfer learning. SegAN-CAT differs from SegAN in two ways: the loss function is extended with a dice loss term, and the input of the discriminator network is the concatenation of the MRI image and the segmentation. Several brain tumor segmentation models were trained and tested on the BRATS 2015 and BRATS 2019 data sets, where SegAN-CAT performed better than SegAN.
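The difference between the two discriminator inputs can be illustrated with a small NumPy sketch: SegAN masks the image with the label map, whereas SegAN-CAT concatenates the two. The channel-first array layout, the four-modality input and the function names are assumptions made only for this illustration.

```python
import numpy as np


def masked_input(image, mask):
    """SegAN-style input: each image channel multiplied by the label map.

    image: (C, H, W) array of modalities; mask: (H, W) label map in [0, 1].
    """
    return image * mask[np.newaxis, :, :]


def concatenated_input(image, mask):
    """SegAN-CAT-style input: the label map appended as an extra channel."""
    return np.concatenate([image, mask[np.newaxis, :, :]], axis=0)


image = np.random.default_rng(0).random((4, 8, 8))  # e.g., four MRI modalities (assumed)
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0

print(masked_input(image, mask).shape)        # (4, 8, 8)
print(concatenated_input(image, mask).shape)  # (5, 8, 8)
```

With masking, image content outside the label map is zeroed out before it reaches the discriminator; with concatenation, the full image stays available alongside the segmentation.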
[[[ p. 17 ]]]
[Summary: This page introduces Segmentation Adversarial Network (SegAN), which uses the U-Net structure as the generator of GAN. The authors designed a new segmentation network based on the ideas of GAN, and proposed a multiscale L1 loss to optimize the segmentation network. Its network structure is divided into two parts: segmentor network S and critic network C.]
Table 1. CNN-based segmentation methods for the brain.

| Reference | Object | Modalities | Network Type | Data Set |
|---|---|---|---|---|
| Myronenko et al. [67] | Brain | MRI | FCN | BRATS 2018 |
| Nie et al. [68] | Brain | MRI | 3D FCN | Infant brain images |
| Wang et al. [69] | Brain | MRI | FCN | ANDI data set and NITRC data set |
| Borne et al. [70] | Brain | MRI | 3D U-Net | 62 healthy brain images |
| Casamitjana et al. [71] | Brain | MRI | V-Net | BRATS 2017 |
| Moeskops et al. [72] | Brain | MRI | GAN | MRBrainS13 |
| Rezaei et al. [73] | Brain | MRI | cGAN | BRATS 2017 |
| Giacomello et al. [74] | Brain | MRI | SegAN-CAT | BRATS 2015, BRATS 2019 |

Table 2. CNN-based segmentation methods for the eye.

| Reference | Object | Modalities | Network Type | Data Set |
|---|---|---|---|---|
| Leopold et al. [75] | Eye | Funduscopy | PixelBNN | DRIVE, STARE, CHASE_DB1 |
| Zhang et al. [76] | Eye | Funduscopy | U-Net | DRIVE, STARE, CHASE_DB1 |
| Jaemin et al. [77] | Eye | Funduscopy | GAN | DRIVE, STARE |
| Edupuganti et al. [78] | Eye | Funduscopy | FCN | Drishti-GS data set |
| Shankaranarayana et al. [79] | Eye | Funduscopy | FCN | RIM-ONE |
| Xiao et al. [46] | Eye | Funduscopy | Res-UNet | DRIVE |

5.2. Eye

Retinal vessel segmentation is a challenging subject in retinal pathology research. The problems of missing small, weak blood vessels and of oversegmentation have not been solved. Deep-learning-based methods have even outperformed human experts in retinal vessel segmentation. Leopold et al. [75] proposed a fast architecture for retinal vessel segmentation, the fully-residual autoencoder batch normalization network (PixelBNN). It is based on U-Net and PixelCNN, and uses skip connections and batch normalization within an FCN. The model was trained, tested and cross-tested on the DRIVE (Digital Retinal Images for Vessel Extraction), STARE (STructured Analysis of the Retina) and CHASE_DB1 (Child Heart and Health Study in England) retinal vessel segmentation data sets, with relatively good test time and performance. Zhang et al. [76] used U-Net with residual connections to detect vessels, and introduced an edge-aware mechanism that adds additional labels to the boundary areas to improve accuracy. They conducted experiments on STARE, CHASE_DB1 and DRIVE. Jaemin et al. [77] proposed a method that uses generative adversarial training to produce precise segmentations of retinal blood vessels. The vessels segmented by this method are clear and sharp, with fewer false positives, and it achieved state-of-the-art performance on the two public data sets DRIVE and STARE. In Section 4, we introduced Res-UNet, which can also be used for retinal vessel segmentation. It focuses on the target ROI (region of interest) and discards irrelevant noise, which reduces the strong influence of noise on vessel shape. Optic disc and cup segmentation provides one of the important parameters for glaucoma screening. Edupuganti et al. [78] used FCN to segment the optic disc and cup in fundus images to assist the diagnosis of glaucoma. Using the concept of residual learning, Shankaranarayana et al. [79] proposed an improved architecture based on FCNs and used adversarial training to improve the segmentation results.

5.3. Chest

Because chest X-ray examination is quick and easy, it is the most common imaging examination in medicine. Chest X-rays use very small doses of radiation to produce images of the chest. The lung area can be segmented in chest X-rays [80], which helps to diagnose and monitor various lung diseases, such as pneumonia and lung cancer. The SCAN mentioned in Section 4 is used for lung field and heart segmentation in chest X-rays. The proposed framework was extensively evaluated on the JSRT (Japanese
[[[ p. 18 ]]]
[Summary: This page discusses the Structure Correcting Adversarial Network (SCAN) for segmenting the lung field and heart in CXR images. This network adopts the idea that Luc et al. first used GAN for image segmentation. Both the segmentation network and discriminant network use a fully convolutional network.]
Society of Radiological Technology) and Montgomery data sets, and was shown to produce high-precision and realistic segmentations of the lung fields and the heart in CXR images. Novikov et al. [81] modified U-Net to address overfitting and the number of parameters, and proposed an all-convolutional modification of the original U-Net. By replacing pooling with strided convolutions, which simplifies the network, the number of parameters is reduced by about ten times while accuracy is maintained and better results are achieved. The models were trained and tested on the JSRT database, and the performance exceeded that of expert observers on the lungs and heart. Among studies on chest CT and MRI images, Anthimopoulos et al. [82] used an FCN with atrous convolutions and multiscale feature fusion to segment lung parenchyma, healthy tissue, micronodules and honeycomb structures in lung CT images. The method was validated on 172 high-resolution CT images collected from multiple medical institutions. Fully convolutional networks have also been used to construct shared representations between CT and MRI. Jue et al. [83] developed a cross-modality learning method in which MRI hallucinated from CT supplies MR information to improve CT segmentation.

Table 3. CNN-based segmentation methods for the chest.

| Reference | Object | Modalities | Network Type | Data Set |
|---|---|---|---|---|
| Dai et al. [54] | Chest | CXR | SCAN | JSRT, Montgomery |
| Novikov et al. [81] | Chest | CXR | U-Net | JSRT |
| Anthimopoulos et al. [82] | Chest | CT | FCN | A data set of 172 sparsely annotated CT scans |
| Jue et al. [83] | Chest | CT, MRI | U-Net, dense-FCN | TCIA, NSCLC |

Table 4. CNN-based segmentation methods for the abdomen.

| Reference | Object | Modalities | Network Type | Data Set |
|---|---|---|---|---|
| Christ et al. [84] | Liver | CT, MRI | FCN | 3DIRCADb and other |
| Han et al. [85] | Liver | CT | DCNN | LiTS |
| Oktay et al. [49] | Pancreas | CT | Attention U-Net | TCIA |
| Yang et al. [60] | Liver | CT | DI2IN-AN | 1000 CT volumes |
| Huo et al. [86] | Spleen | MRI | SSNet | 60 clinically acquired abdominal MRI scans |

Table 5. CNN-based segmentation methods in cardiology.

| Reference | Object | Modalities | Network Type | Data Set |
|---|---|---|---|---|
| Tran et al. [87] | Left and right ventricles | MRI | FCN | SCD, LVSC, RVSC |
| Xu et al. [88] | The whole heart | CT | CFUN | MM-WHS 2017 |
| Dong et al. [89] | Left ventricles | 3D echocardiography | VoxelAtlasGAN | 60 subjects on 3D echocardiography |
| Zhang et al. [90] | Cardiac | MRI | LU-Net | ACDC Stacom 2017 |
| Ye et al. [91] | The whole heart | CT | 3D U-Net | MICCAI 2017 whole-heart |
| Xia et al. [92] | Left atrium | MRI | 3D U-Net | LASC 2018 |

5.4. Abdomen

In abdominal CT and MRI images, the liver, spleen, kidneys and other organs can be segmented. Christ et al. [84] proposed cascaded fully convolutional neural networks (CFCNs) to automatically segment the liver and its lesions in abdominal CT or MRI images. The network is composed of two cascaded FCNs. The first FCN segments the liver ROI, which is used as the input of the second FCN. The second FCN segments only the lesions within the liver ROI predicted by the first FCN. The experiments were carried out on an abdominal CT data set comprising 100 hepatic tumor volumes and on the 3DIRCADb data set. Han et al. [85] developed a deep convolutional neural network (DCNN) method, which belongs to the category of "fully convolutional neural networks". The DCNN model takes a stack of adjacent slices as input and generates a segmentation map corresponding to the central slice, so it works in
[[[ p. 19 ]]]
[Summary: This page discusses the Projective Adversarial Network (PAN), a segmentation network designed to capture 3D semantics through 2D projection. PAN integrates high-level 3D information through 2D projection, without relying on 3D images or enhancing the complexity of segmentation. The network backbone is a segmentor and two adversarial networks.]
2.5D. Oktay et al. [49] extended the U-Net model to an attention U-Net for pancreas segmentation by introducing an attention gate. They used 120 CT images as the training set and 30 images as the test set. The model scored 2% to 3% higher than other models on the dice score. Liver segmentation in 3D medical images is essential in many clinical applications. GAN is also widely used for segmenting abdominal organs. Yang et al. [60] proposed a liver segmentation method using an adversarial image-to-image network (DI2IN-AN). The generator produces segmentation predictions, and the discriminator classifies predictions and ground truth during training. When segmenting the spleen in MRI images, the variation in the size and shape of the spleen causes many false positive and false negative labels. Huo et al. [86] proposed the splenomegaly segmentation network (SSNet) for this purpose. SSNet introduces the cGAN framework: in order to reduce false negatives and false positives, the generator uses a global convolutional network (GCN), and a Markovian discriminator (PatchGAN) replaces the conventional discriminator.

5.5. Cardiology

The heart is an important organ, and various heart diseases seriously threaten many lives. Automatic segmentation of the heart region is needed to solve practical problems in cardiac medicine. Tran et al. [87] were the first to apply a fully convolutional neural network architecture to pixel-wise classification in cardiac magnetic resonance imaging. The proposed FCN architecture achieved state-of-the-art semantic segmentation in short-axis cardiac MRI. The authors conducted experiments on segmenting the left and right ventricles on the SCD (Sunnybrook Cardiac Data), LVSC (Left Ventricle Segmentation Challenge), and RVSC (Right Ventricle Segmentation Challenge) data sets. Xu et al. [88] combined Faster R-CNN, with its fast detection capability, and 3D U-Net, with its powerful segmentation capability, into CFUN for whole-heart segmentation. The authors selected 60 heart CT images from the MM-WHS 2017 challenge, comprising 20 training volumes and 40 test volumes. Dong et al. [89] proposed VoxelAtlasGAN based on the cGAN framework, using V-Net atlas-based segmentation in the generator. This was the first time that cGAN was used for 3D left ventricle segmentation on echocardiography. Zhang et al. [90] proposed an improved U-Net named LU-Net to address U-Net's low accuracy in cardiac ventricular segmentation. LU-Net improves on three aspects: the effectiveness of extracting original image features, the degree of pixel location information loss, and the segmentation accuracy of the traditional U-Net. To obtain a finer whole-heart segmentation, Ye et al. [91] proposed a new deeply supervised 3D U-Net, in which supervision is applied to the original network at multiple depths to better extract context information. Xia et al. [92] proposed a fully automated two-stage segmentation framework: a first 3D U-Net roughly locates the atrial center from downsampled images, and a second 3D U-Net accurately segments the left atrium in the original images at full resolution. The current state of the art for deep-learning-based cardiac image segmentation is summarized in the review [93].

5.6. Other Organs and Lesion Segmentation

CNN-based semantic segmentation networks also have important applications in other biomedical image segmentation fields [94,95]. Liu et al. [96] used the SegNet structure as the core network to segment muscles, cartilage and bones from 100 sets of labeled knee MRI images in the MICCAI Challenge data set, providing a rapid and accurate segmentation method for cartilage and other tissues in clinical osteoarthritis research.
In addition, SegNet is also used for cell segmentation under the microscope. Tran et al. [97] used the SegNet structure to segment red blood cells and white blood cells in microscopic blood smear images. Sekuboyina et al. [98] adapted GAN to the structure of the spine and proposed a butterfly-shaped GAN model, Btrfly Net. Similarly, Han et al. [99] proposed applying Spine-GAN to multi-task, multi-target bone marrow segmentation. V-Net [45] combines MRI images acquired with different equipment to achieve an end-to-end
[[[ p. 20 ]]]
[Summary: This page discusses the Distributed Asynchronized Discriminator GAN (AsynDGAN), which addresses the privacy concerns and data scarcity in medical image segmentation. AsynDGAN is composed of a central generator and multiple distributed discriminators located in different medical entities. The central generator accepts the input of a specific task and generates a composite image to fool the discriminator.]
prostate segmentation process. The network outputs segmentation results while calculating the prostate volume for subsequent clinical analysis. Rundo et al. [30] proposed incorporating squeeze-and-excitation (SE) blocks into U-Net to form a new convolutional neural network, USE-Net. This structure is intended to enhance representational power by modeling the channel dependencies of convolutional features. The authors conducted experiments on multiple heterogeneous prostate MRI data sets, which showed that the model improves segmentation performance and generalization ability. Kohl et al. [100] proposed a fully convolutional network to detect aggressive prostate cancer. Unlike the general FCN, the authors used an adversarial network that distinguishes between expert annotations and generated annotations to train FCNs for semantic segmentation. MRI images of 152 patients were used to segment aggressive prostate cancer, and good detection sensitivity and dice scores were achieved. Taha et al. [101] proposed a convolutional neural network called Kid-Net for segmenting kidney vessels, namely arteries, veins and the collecting system. This segmentation can help doctors make medical decisions before surgical incisions. At the same time, high-resolution segmentation is achieved by reducing false positives in imbalanced data. Izadi et al. [102] proposed a new method to segment skin lesions using a generative adversarial network, in which the input image is divided into two classes: lesion and background. Mirikharaji et al. [103] proposed an end-to-end trainable fully convolutional network framework that won first place in the ISBI 2017 skin segmentation challenge. Wang et al. [104] modified a contour segmentation deep learning model by adopting an adversarial training strategy, and proposed a basal membrane segmentation method for the diagnosis of cervical cancer.

Table 6. Other CNN-based segmentation methods.

| Reference | Object | Modalities | Network Type | Data Set |
|---|---|---|---|---|
| Liu et al. [96] | Musculoskeletal | MRI | SegNet | MICCAI Challenge data set |
| Tran et al. [97] | Cell | Microscopic | SegNet | ALL-IDB1 database |
| Sekuboyina et al. [98] | Spines | CT | Btrfly Net | 302 CT scans |
| Han et al. [99] | Spines | MRI | Spine-GAN | 253 multicenter clinical patients |
| Milletari et al. [45] | Prostate | MRI | V-Net | PROMISE 2012 |
| Rundo et al. [30] | Prostate | MRI | USE-Net | Three T2-weighted MRI data sets |
| Kohl et al. [100] | Prostate | MRI | FCN | MRI images of 152 patients |
| Taha et al. [101] | Kidney | CT | Kid-Net | 236 subjects |
| Izadi et al. [102] | Skin | Dermoscopy | GAN | DermoFit |
| Mirikharaji et al. [103] | Skin | Dermoscopy | FCN | ISBI 2017 |
| Wang et al. [104] | Basal membrane | Histopathology | GAN | IPMCH |

6. Segmentation Evaluation Metrics and Data Sets

6.1. Evaluation Metrics

Evaluating the quality of an algorithm requires a sound objective indicator. For medical segmentation algorithms, doctors' hand-drawn annotations are usually used as the gold standard (ground truth, GT for short), and the output of the segmentation algorithm is the prediction result (Rseg, SEG for short). Segmentation evaluation of medical images is divided into pixel-based and overlap-based methods.

Dice index: The dice coefficient is a function for evaluating similarity, usually used to calculate the similarity or overlap between two samples. It is also the most frequently used metric. Its value ranges from 0 to 1; the closer the value is to 1, the better the segmentation. Given two sets A and B, the metric is defined as:

$$\mathrm{Dice}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|} \qquad (3)$$
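As a small worked example of Equation (3), the sketch below computes the dice coefficient for two binary masks with NumPy. The function name and the convention of returning 1.0 for two empty masks are assumptions, since Equation (3) is undefined when both sets are empty.

```python
import numpy as np


def dice(seg, gt):
    """Dice coefficient of two binary masks, following Eq. (3).

    Returns 1.0 when both masks are empty (an assumed convention).
    """
    seg = np.asarray(seg, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    total = seg.sum() + gt.sum()
    if total == 0:
        return 1.0
    return float(2.0 * np.logical_and(seg, gt).sum() / total)


gt = np.array([[1, 1, 0],
               [0, 1, 0]])
seg = np.array([[1, 0, 0],
                [0, 1, 1]])
print(dice(seg, gt))  # 2 * 2 / (3 + 3), about 0.667
```

Here the two masks share two pixels and each contains three, so the score is 4/6.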
[[[ p. 21 ]]]
[Summary: This page discusses Deep-supGAN to map the 3D MR data of the head to its CT image to facilitate segmentation of craniomaxillofacial bony structure. The first block in the structure is used to generate high-quality CT images from MRI. The second block is used to segment bone structures from MRI and generated CT images.]
Jaccard index: The Jaccard index is similar to the Dice coefficient. Given two sets A and B, the metric is defined as:

$$\mathrm{Jaccard}(A, B) = \frac{|A \cap B|}{|A \cup B|} \tag{4}$$

Segmentation accuracy (SA): The percentage of the real area in the GT image that is accurately segmented. Here, $R_s$ represents the reference area of the segmented image manually drawn by the expert, $T_s$ represents the real area of the image obtained by the algorithm segmentation, and $|R_s - T_s|$ indicates the number of pixels that are incorrectly segmented.

$$\mathrm{SA} = \left(1 - \frac{|R_s - T_s|}{R_s}\right) \times 100\% \tag{5}$$

Oversegmentation rate (OR): The ratio of pixels that are segmented outside the reference area of the GT image, calculated as follows:

$$\mathrm{OR} = \frac{O_s}{R_s + O_s} \tag{6}$$

The pixels in $O_s$ appear in the actual segmented image but do not appear in the theoretical segmented image $R_s$, where $R_s$ represents the reference area of the segmented image manually drawn by the expert.

Undersegmentation rate (UR): The ratio of pixels in the GT image that are missing from the segmentation result, calculated as follows:

$$\mathrm{UR} = \frac{U_s}{R_s + O_s} \tag{7}$$

The pixels in $U_s$ appear in the theoretical segmented image $R_s$ but do not appear in the actual segmented image. $R_s$ and $O_s$ have the same meaning as above.

Hausdorff distance: This measures the degree of similarity between two sets of points, that is, the distance between the boundary of the ground truth and the boundary of the segmentation result produced by the network. It is sensitive to the segmented boundary.

$$H = \max\left( \max_{i \in \mathrm{seg}} \min_{j \in \mathrm{gt}} d(i, j),\ \max_{j \in \mathrm{gt}} \min_{i \in \mathrm{seg}} d(i, j) \right) \tag{8}$$

where i and j are points belonging to different sets and d represents the distance between i and j.

6.2. Data Sets for Medical Image Segmentation

For any segmentation model based on deep learning, it is crucial to collect enough data for the data set.
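Before turning to the individual data sets, the remaining metrics of Section 6.1, Equations (4) to (8), can be sketched over sets of pixel coordinates. This is an illustrative sketch under stated assumptions, not a reference implementation: all function names are hypothetical, $|R_s - T_s|$ is read as the number of pixels on which the two masks disagree, d(i, j) is taken to be the Euclidean distance, and the reference set is assumed to be non-empty.

```python
import math
from typing import Set, Tuple

Pixel = Tuple[int, int]  # (row, column) of a foreground pixel


def jaccard(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Equation (4): |A ∩ B| / |A ∪ B|."""
    union = a | b
    if not union:
        return 1.0  # assumption: two empty masks count as a perfect match
    return len(a & b) / len(union)


def segmentation_accuracy(ref: Set[Pixel], seg: Set[Pixel]) -> float:
    """Equation (5), in percent. |R_s - T_s| is read here as the number of
    pixels on which the two masks disagree (an assumption)."""
    wrong = len(ref ^ seg)
    return (1 - wrong / len(ref)) * 100


def over_under_rates(ref: Set[Pixel], seg: Set[Pixel]) -> Tuple[float, float]:
    """Equations (6) and (7): (OR, UR), sharing the denominator R_s + O_s."""
    o_s = len(seg - ref)  # segmented, but outside the reference area
    u_s = len(ref - seg)  # inside the reference area, but missed
    denom = len(ref) + o_s
    return o_s / denom, u_s / denom


def hausdorff(seg: Set[Pixel], gt: Set[Pixel]) -> float:
    """Equation (8) with Euclidean d(i, j); brute force, O(|seg| * |gt|)."""
    if not seg or not gt:
        raise ValueError("both point sets must be non-empty")

    def directed(src: Set[Pixel], dst: Set[Pixel]) -> float:
        # Largest distance from a point in src to its nearest point in dst.
        return max(
            min(math.hypot(i[0] - j[0], i[1] - j[1]) for j in dst)
            for i in src
        )

    return max(directed(seg, gt), directed(gt, seg))


# Hypothetical ground truth and prediction, given as foreground pixel sets.
gt = {(0, 1), (0, 2), (1, 1), (1, 2)}
seg = {(0, 2), (1, 1), (1, 2), (2, 2)}
print(jaccard(gt, seg))                # 3 shared pixels / 5 in the union
print(segmentation_accuracy(gt, seg))  # 2 disagreeing pixels out of 4
print(over_under_rates(gt, seg))       # one extra pixel, one missed pixel
print(hausdorff(seg, gt))              # worst-case nearest-neighbour distance
```

The Hausdorff function accepts any two point sets; to follow the boundary-based description in the text, it would be called on boundary pixels only, which also reduces the cost of the brute-force search.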
The quality of a segmentation algorithm depends on high-quality image data provided by experts and on correspondingly standardized labels, which enable fair comparison between systems. This section introduces some public data sets frequently used in the field of medical image segmentation.

Medical segmentation decathlon (MSD): Simpson et al. [105] created a large, open-source, hand-annotated medical image data set covering various anatomical sites. The data set allows general segmentation methods to be evaluated objectively through comprehensive benchmarks and makes medical image data publicly accessible. It contains a total of 2633 three-dimensional medical images from real clinical applications, involving multiple anatomical structures, multiple modalities, and multiple sources (or institutions). It is divided into ten categories:

1. Task01_BrainTumour: There are 750 images in total, and the labels are divided into two categories: glioma (necrotic/active tumor) and edema. The images are MRI scans obtained in routine clinical practice.
[[[ p. 22 ]]]
[Summary: This page discusses the segmentation method for various human organ areas. Different parts have their specificities. For example, the segmentation area for diagnosing brain tumors and lung nodules is relatively large, while retinal blood images require segmentation of blood vessels. The latter requires higher segmentation accuracy.]
2. Task02_Heart: There are 30 images in total, and the label is the left atrium. These data come from the Left Atrial Segmentation Challenge (LASC). Images were obtained on a 1.5 T Achieva scanner with a voxel resolution of 1.25 × 1.25 × 2.7 mm³.
3. Task03_Liver: There are 201 images in total, with labels divided into liver and tumors. The type of imaging is CT. The images were provided with an in-plane resolution of 0.5 to 1.0 mm and a slice thickness of 0.45 to 6.0 mm.
4. Task04_Hippocampus: There are 394 images in total, and the labels are the hippocampus head and body. The type of imaging is MRI. The data set consists of MRI acquired in 90 healthy adults and 105 adults with a nonaffective psychotic disorder.
5. Task05_Prostate: There are 48 images in total, and the labels are the prostate central gland and peripheral zone. The type of imaging is MRI. The data set consists of 48 multiparametric MRI studies provided by Radboud University (The Netherlands) and reported in a previous segmentation study.
6. Task06_Lung: There are 96 images in total, and the label is lung tumor. The type of imaging is CT. The data set comprises patients with non-small-cell lung cancer from Stanford University. The tumor region was denoted by an expert thoracic radiologist on a representative CT cross section using OsiriX.
7. Task07_Pancreas: There are 420 images in total, with labels divided into pancreas and pancreatic mass (cyst or tumor). The type of imaging is CT. The data set consists of patients whose pancreatic masses were removed.
8. Task08_HepaticVessel: There are 443 images in total, and the label is liver vessels. The type of imaging is CT. This second liver data set consists of patients with various primary and metastatic liver tumors.
9. Task09_Spleen: There are 61 images in total, and the label is the spleen. The type of imaging is CT.
The spleen data set comprises patients undergoing chemotherapy treatment for liver metastases at Memorial Sloan Kettering Cancer Center.
10. Task10_Colon: There are 190 images in total, and the label is colon cancer. The type of imaging is CT.

Segmentation in Chest Radiographs (SCR): All chest radiographs are taken from the JSRT database. The SCR database was created to facilitate comparative studies of lung field, heart and clavicle segmentation in standard posterior-anterior chest radiographs [106]. All data in the database are manually segmented to provide a reference standard. The images were scanned from film to 2048 × 2048 pixels, with a spatial resolution of 0.175 mm/pixel and a gray scale of 12 bits. Of the 247 images, 154 contain a lung nodule and the other 93 do not.

Brain tumor segmentation (BRATS): This is the data set of a brain tumor segmentation competition held in conjunction with the MICCAI conference [107]. The competition has been held every year since 2012 to evaluate and compare the best brain tumor segmentation methods, and the data set is published for this purpose. There are five types of labels: healthy brain tissue, necrotic area, edema area, enhancing tumor area and nonenhancing tumor area. New training sets are added every year.

Digital database for screening mammography (DDSM): DDSM [108] is a resource widely used by the mammography image analysis research community. The database contains approximately 2500 studies. Each study includes two images of each breast, as well as some relevant patient information and image information.

Ischemic stroke lesion segmentation (ISLES): This provides MRI scans of a large number of acute stroke cases together with related clinical parameters.
This challenge is organized to evaluate stroke pathology and clinical outcome prediction from acute MRI scans.

Liver tumor segmentation (LiTS): These data and segmentations are provided by different clinical sites around the world for the segmentation of the liver and liver tumors. The training data set contains 130 CT scans, and the test data set contains 70 CT scans [109].

Prostate MR image segmentation (PROMISE12): This data set is used for prostate segmentation. These data include patients with benign diseases (such as benign prostatic
[[[ p. 23 ]]]
[Summary: This page describes the application of CNN-based methods for brain segmentation. Brain imaging analysis is widely used to study brain diseases such as Alzheimer’s disease, epilepsy, schizophrenia, multiple sclerosis, cancer, and neurodegenerative diseases. It mentions several studies that have used different CNN architectures for brain MRI segmentation.]
hyperplasia) and prostate cancer. Each case includes a transversal T2-weighted MR image of the prostate.

Lung image database consortium image collection (LIDC-IDRI): The data set is composed of chest medical image files (such as CT and X-ray) and the corresponding lesion annotations from diagnosis. Its purpose is to support the study of early cancer detection in high-risk populations. A total of 1018 cases are included. For the images in each case, four experienced thoracic radiologists performed a two-stage diagnosis and annotation [110].

Open Access Series of Imaging Studies (OASIS): This is a project aimed at making brain MRI data sets freely available to the scientific community. A third generation has been released. OASIS-3 is a retrospective compilation of data from more than 1000 participants, collected across several ongoing projects through the WUSTL Knight ADRC over the past 30 years. OASIS-3 is a longitudinal neuroimaging, clinical, cognitive, and biomarker data set for normal aging and Alzheimer’s disease. Participants included 609 cognitively normal adults and 489 people at various stages of cognitive decline, aged 42 to 95 [111].

Digital retinal images for vessel extraction (DRIVE): This data set is used to compare methods for segmenting blood vessels in retinal images. The photos in the DRIVE database came from a diabetic retinopathy screening project in the Netherlands, from which 40 photos were randomly selected. Among them, 33 showed no signs of diabetic retinopathy and seven showed signs of mild early diabetic retinopathy. Each image was captured at 768 × 584 pixels with 8 bits per color plane. The field of view of each image is circular with a diameter of approximately 540 pixels. Figure 8 shows a sample from the DRIVE data set and its ground truth [112].

Figure 8. Digital retinal images for vessel extraction (DRIVE) sample and manual labeling samples. (a) The blood vessels in a retinal RGB image; (b) manual annotation 1 of the sample; (c) manual annotation 2 of the sample.

Mammographic Image Analysis Society (MIAS): MIAS is a breast cancer X-ray image database created by a British research organization in 1995.
Each pixel has a grayscale of 8 bits. The MIAS database contains left and right breast images of 161 patients, 322 images in total, including 208 healthy, 63 benign and 51 malignant images. The boundary of the lesion area has also been annotated by experts [113].

Sunnybrook cardiac data (SCD): Also known as the 2009 cardiac MR left ventricle segmentation challenge data, it consists of 45 cine-MRI images from a mix of patients and pathologies: healthy, hypertrophy, heart failure with infarction and heart failure without infarction [114].

In addition to the data sets commonly used for medical image segmentation described above, well-known medical image challenges provide many competition data sets on which the performance of algorithms can be verified.

Grand Challenges in Biomedical Image Analysis: It was designed to help people solve global health and development issues. It covers challenges across the field of medical image analysis, including medical image processing. It is also the largest collection of challenges in the field of medical image processing, and many excellent algorithms have emerged from it.
[[[ p. 24 ]]]
[Summary: This page focuses on eye segmentation, particularly retinal blood vessel segmentation. It mentions that methods based on deep learning are even better than human experts in retinal vessel segmentation. The problem of missing small and weak blood vessels or oversegmentation has not been solved. Several CNN-based methods for eye segmentation are discussed.]
Liver Tumor Segmentation Challenge: The purpose of this competition is to encourage researchers to study liver lesion segmentation methods. The data and segmentations of the challenge are provided by different clinical sites around the world. The training data set contains 130 CT scans, and the test data set contains 70 CT scans.

2019 Kidney and Kidney Tumor Segmentation Challenge (KiTS19): The KiTS19 challenge concerns the semantic segmentation of kidneys and kidney tumors in contrast-enhanced CT scans. The data set consists of 300 patients with preoperative arterial-phase abdominal CTs annotated by experts. Two hundred and ten (70%) of these were released as a training set and the remaining 90 (30%) were held out as a test set. Table 7 lists the medical image data sets for segmentation.

Table 7. Medical image data sets for segmentation.

| Data Set | Modalities | Objects | URL |
|---|---|---|---|
| MSD | MRI, CT | Various | http://medicaldecathlon.com/ |
| BRATS | MRI | Brain | https://www.med.upenn.edu/sbia/brats2018/data.html |
| DDSM | Mammography | Breast | http://www.eng.usf.edu/cvprg/Mammography/Database.html |
| ISLES | MRI | Brain | http://www.isles-challenge.org/ |
| LiTS | CT | Liver | https://competitions.codalab.org/competitions/17094 |
| PROMISE12 | MRI | Prostate | https://promise12.grand-challenge.org/ |
| LIDC-IDRI | CT | Lung | https://wiki.cancerimagingarchive.net/display/Public/LIDC-IDRI |
| OASIS | MRI, PET | Brain | https://www.oasis-brains.org/ |
| DRIVE | Funduscopy | Eye | https://drive.grand-challenge.org/ |
| STARE | Funduscopy | Eye | http://homes.esat.kuleuven.be/~mblaschk/projects/retina/ |
| CHASEDB1 | Funduscopy | Eye | https://blogs.kingston.ac.uk/retinal/chasedb1/ |
| MIAS | X-ray | Breast | https://www.repository.cam.ac.uk/handle/1810/250394?show=full |
| SCD | MRI | Cardiac | http://www.cardiacatlas.org/studies/ |
| SKI10 | MRI | Knee | http://www.ski10.org/ |
| HVSMR 2018 | CMR | Heart | http://segchd.csail.mit.edu/ |

7. Conclusions and Future Directions

Although research into medical image segmentation has made great progress, segmentation performance still cannot meet the needs of practical applications. The main reason is that current research still faces the following difficulties and challenges:

1. Medical image segmentation is an interdisciplinary field spanning medicine and artificial intelligence. Clinical pathological conditions are complex and diverse. However, artificial intelligence scientists often do not understand clinical needs, and clinicians often do not understand the specific technology of artificial intelligence. As a result, artificial intelligence cannot yet meet specific clinical needs well. To promote the application of artificial intelligence in the medical field, extensive cooperation between clinicians and machine learning scientists should be strengthened. This cooperation would address the problem that machine learning researchers cannot obtain medical data. It can also help machine learning researchers develop deep learning algorithms more in line with clinical needs and apply them to computer-aided diagnosis equipment, thereby improving diagnostic efficiency and accuracy.
2. Medical images are different from natural images, and different types of medical images also differ from one another. These differences affect the adaptability of deep learning models during segmentation. The noise and artifacts of medical images are also a major problem in data preprocessing.
3. Limitations of existing medical image data sets. The existing medical image data sets are small in scale, whereas training deep learning algorithms requires a large amount of data; this mismatch leads to overfitting during training. One way to address the insufficient amount of training data is data augmentation, such as geometric transformation and color space augmentation.
[[[ p. 25 ]]]
[Summary: This page discusses the application of CNN-based methods for chest segmentation. Chest X-ray examination is quick and easy, it is the most common medical image in medicine. It can be used to help diagnose and monitor various lung diseases, such as pneumonia and lung cancer. Several CNN-based methods for chest segmentation are discussed.]
GANs can also be used to synthesize new data from the original data. Another approach is to use a meta-learning model to study medical image segmentation under small-sample conditions.
4. Deep learning models have their own limitations, which mainly concern three aspects: network structure design, 3D data segmentation model design and loss function design. The design of the network structure is worth exploring: modifying the network structure has a significant effect and transfers easily to other tasks. 3D medical data can capture the geometric information of the target more accurately, and this information may be lost when the 3D data are processed slice by slice. Therefore, a promising research direction is the design of 3D convolution models for processing 3D medical image data. The design of the loss function has always been a difficult point in deep learning research.

For medical image segmentation, deep learning has performed very well. More and more new methods are being used to continuously improve the accuracy and robustness of segmentation. Diagnosing various diseases through artificial intelligence realizes the idea of sustainable medical care and provides a powerful tool for clinicians. However, medical image segmentation remains an open problem, so we can expect a series of innovations and research results in the next few years.

Author Contributions: Methodology, X.L.; writing (original draft preparation), L.S.; writing (review and editing), S.L.; funding acquisition, Y.Z.
All authors have read and agreed to the published version of the manuscript.

Funding: This research was funded by the Natural Science Foundation of Hunan Province (No. 2020JJ4434), the Key Scientific Research Projects of the Department of Education of Hunan Province (No. 19A312), the Hunan Provincial Science & Technology Project Foundation (2018TP1018, 2018RS3065), and the Scientific Research Fund of Hunan Provincial Education (14C0710).

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.

Data Availability Statement: Not applicable.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Lateef, F.; Ruichek, Y. Survey on semantic segmentation using deep learning techniques. Neurocomputing 2019, 338, 321–348.
2. Shen, D.; Wu, G.; Suk, H.I. Deep learning in medical image analysis. Annu. Rev. Biomed. Eng. 2017, 19, 221–248.
3. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
4. Almeida, G.; Tavares, J.M.R.S. Deep learning in radiation oncology treatment planning for prostate cancer: A systematic review. J. Med. Syst. 2020, 44, 1–15.
5. Hesamian, M.H.; Jia, W.; He, X.; Kennedy, P. Deep learning techniques for medical image segmentation: Achievements and challenges. J. Digit. Imaging 2019, 32, 582–596.
6. Altaf, F.; Islam, S.M.S.; Akhtar, N.; Nanjua, N.K. Going deep in medical image analysis: Concepts, methods, challenges, and future directions. IEEE Access 2019, 7, 99540–99572.
7. Hu, P.; Cao, Y.; Wang, W.; Wei, B. Computer Assisted Three-Dimensional Reconstruction for Laparoscopic Resection in Adult Teratoma. J. Med. Imaging Health Inform. 2019, 9, 956–961.
8. Ess, A.; Müller, T.; Grabner, H.; Van Gool, L. Segmentation-Based Urban Traffic Scene Understanding. BMVC 2009, 1, 2.
9. Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The kitti vision benchmark suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3354–3361.
10. Ma, Z.; Tavares, J.M.R.S.; Jorge, R.M.N. A review on the current segmentation algorithms for medical images. In Proceedings of the 1st International Conference on Imaging Theory and Applications, Lisbon, Portugal, 5–8 February 2009.
11. Ferreira, A.; Gentil, F.; Tavares, J.M.R.S. Segmentation algorithms for ear image data towards biomechanical studies. Comput. Methods Biomech. Biomed. Eng. 2014, 17, 888–904.
12. Ma, Z.; Tavares, J.M.R.S.; Jorge, R.N.; Mascarenhas, T. A review of algorithms for medical image segmentation and their applications to the female pelvic cavity. Comput. Methods Biomech. Biomed. Eng. 2010, 13, 235–246.
13. Xu, A.; Wang, L.; Feng, S.; Qu, Y. Threshold-based level set method of image segmentation. In Proceedings of the Third International Conference on Intelligent Networks and Intelligent Systems, Shenyang, China, 1–3 November 2010; pp. 703–706.
14. Cigla, C.; Alatan, A.A. Region-based image segmentation via graph cuts. In Proceedings of the 2008 15th IEEE International Conference on Image Processing, San Diego, CA, USA, 12–15 October 2008; pp. 2272–2275.
[[[ p. 26 ]]]
[Summary: This page discusses the application of CNN-based methods for abdomen segmentation. In CT and MRI abdomen images, we can segment the liver, spleen, kidney and other organs. Several CNN-based methods for abdomen segmentation are discussed. GAN is also used more in the segmentation of organs about the abdomen.]
15. Yu-Qian, Z.; Wei-Hua, G.; Zhen-Cheng, C.; Tang, J.-T.; Li, L.-Y. Medical images edge detection based on mathematical morphology. In Proceedings of the 2005 IEEE Engineering in Medicine and Biology 27th Annual Conference, Shanghai, China, 17–18 January 2006; pp. 6492–6495.
16. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
17. Lin, G.; Milan, A.; Shen, C.; Reid, I. Refinenet: Multi-path refinement networks for high-resolution semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1925–1934.
18. Noh, H.; Hong, S.; Han, B. Learning deconvolution network for semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision, Las Condes, Chile, 11–18 December 2015; pp. 1520–1528.
19. Zhu, X.J. Semi-Supervised Learning Literature Survey; University of Wisconsin: Madison, WI, USA, 2005.
20. Zeiler, M.D.; Fergus, R. Visualizing and understanding convolutional networks. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 818–833.
21. Gu, J.; Wang, Z.; Kuen, J.; Ma, L.; Shshroudy, A.; Shuai, B.; Liu, I.; Wang, X.; Wang, G.; Cai, J.; et al. Recent advances in convolutional neural networks. Pattern Recognit. 2018, 77, 354–377.
22. Hubel, D.H.; Wiesel, T.N. Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J. Physiol. 1962, 160, 106.
23. Fukushima, K.; Miyake, S. Neocognitron: A self-organizing neural network model for a mechanism of visual pattern recognition. In Competition and Cooperation in Neural Nets; Springer: Berlin, Germany, 1982; pp. 267–285.
24. Lécun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
25. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 60, 1097–1105.
26. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
27. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
28. Qiu, Z.; Yao, T.; Mei, T. Learning spatio-temporal representation with pseudo-3d residual networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5533–5541.
29. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
30. Rundo, L.; Han, C.; Nagano, Y.; Zhang, J.; Hataya, R.; Militello, C.; Tangherloni, A.; Nobile, M.S.; Ferreti, C.; Besozzi, D.; et al. USE-Net: Incorporating Squeeze-and-Excitation blocks into U-Net for prostate zonal segmentation of multi-institutional MRI datasets. Neurocomputing 2019, 365, 31–43.
31. Badrinarayanan, V.; Kendall, A.; Cipolla, R. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
32. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
33. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241.
34. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890.
35. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 39, 91–99.
36. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv 2014, arXiv:1412.7062.
37. Krähenbühl, P.; Koltun, V. Efficient inference in fully connected crfs with gaussian edge potentials. Adv. Neural Inf. Process. Syst. 2011, 24, 109–117.
38. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848.
39. Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587.
40. Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
41. Zhou, X.; Takayama, R.; Wang, S.; Hara, T.; Fujita, H. Deep learning of the sectional appearances of 3D CT images for anatomical structure segmentation based on an FCN voting method. Med. Phys. 2017, 44, 5221–5233.
[[[ p. 27 ]]]
[Summary: This page discusses the application of CNN-based methods in cardiology. The heart is an important organ in our body. However, various heart diseases also seriously threaten the lives of many people. It is necessary to realize automatic segmentation of the heart region to solve practical problems in the field of cardiac medical treatment.]
42. Christ, P.F.; Elshaer, M.E.A.; Ettlinger, F.; Tatavarty, S.; Bickel, M.; Bilic, P.; Rempfler, M.; Armbruster, M.; Hoffman, F.; D’Anastasi, M.; et al. Automatic liver and lesion segmentation in CT using cascaded fully convolutional neural networks and 3D conditional random fields. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Athens, Greece, 17–21 October 2016; pp. 415–423.
43. Zhou, X.Y.; Shen, M.; Riga, C.; Yang, G.-Z.; Lee, S.-L. Focal fcn: Towards small object segmentation with limited training data. arXiv 2017, arXiv:1711.01506.
44. Çiçek, Ö.; Abdulkadir, A.; Lienkamp, S.S.; Brox, T.; Ronneberger, O. 3D U-Net: Learning dense volumetric segmentation from sparse annotation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Athens, Greece, 17–21 October 2016; pp. 424–432.
45. Milletari, F.; Navab, N.; Ahmadi, S.-A. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016; pp. 565–571.
46. Xiao, X.; Lian, S.; Luo, Z.; Li, S. Weighted Res-UNet for high-quality retina vessel segmentation. In Proceedings of the 9th International Conference on Information Technology in Medicine and Education (ITME), Hangzhou, China, 19–21 October 2018; pp. 327–331.
47. Li, X.; Chen, H.; Qi, X.; Dou, Q.; Fu, C.-W.; Heng, P.A. H-DenseUNet: Hybrid densely connected UNet for liver and tumor segmentation from CT volumes. IEEE Trans. Med. Imaging 2018, 37, 2663–2674. [CrossRef] [PubMed]
48. Ibtehaz, N.; Rahman, M.S. MultiResUNet: Rethinking the U-Net architecture for multimodal biomedical image segmentation. Neural Netw. 2020, 121, 74–87. [CrossRef] [PubMed]
49. Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; Mcdonagh, S.; Hammerla, N.Y.; Kainz, B.; et al. Attention u-net: Learning where to look for the pancreas. arXiv 2018, arXiv:1804.03999.
50. Wang, Z.; Zou, N.; Shen, D.; Ji, S. Non-Local U-Nets for Biomedical Image Segmentation. In Proceedings of the AAAI, New York, NY, USA, 7–12 February 2020; pp. 6315–6322.
51. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27, 2672–2680.
52. Luc, P.; Couprie, C.; Chintala, S.; Verbeek, J. Semantic segmentation using adversarial networks. arXiv 2016, arXiv:1611.08408.
53. Xue, Y.; Xu, T.; Zhang, H.; Long, L.R.; Huang, X. SegAN: Adversarial Network with Multi-scale L1 Loss for Medical Image Segmentation. Neuroinformatics 2018, 16, 383–392. [CrossRef]
54. Dai, W.; Dong, N.; Wang, Z.; Liang, X.; Zhang, H.; Xing, E.P. Scan: Structure correcting adversarial network for organ segmentation in chest x-rays. In Mining Data for Financial Applications; Springer: Cham, Switzerland, 2018; pp. 263–273.
55. Khosravan, N.; Mortazi, A.; Wallace, M.; Bagci, U. Pan: Projective adversarial network for medical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Shenzhen, China, 13–18 October 2019; pp. 68–76.
56. Chang, Q.; Qu, H.; Zhang, Y.; Sabuncu, M.; Chen, C.; Zhang, T.; Metaxas, D.N. Synthetic Learning: Learn from Distributed Asynchronized Discriminator GAN Without Sharing Medical Image Data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 14–19 June 2020; pp. 13856–13866.
57. Zhao, M.; Wang, L.; Chen, J.; Nie, D.; Cong, Y.; Ahmad, S.; Ho, A.; Yuan, P.; Fung, S.H.; Deng, H.H.; et al. Craniomaxillofacial bony structures segmentation from MRI with deep-supervision adversarial learning. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Granada, Spain, 16–20 September 2018; pp. 720–727.
58. Mondal, A.K.; Dolz, J.; Desrosiers, C. Few-shot 3d multi-modal medical image segmentation using generative adversarial learning. arXiv 2018, arXiv:1810.12241.
59. Zhang, Y.; Yang, L.; Chen, J.; Fredericksen, M.; Hughes, D.P.; Chen, D.Z. Deep adversarial networks for biomedical image segmentation utilizing unannotated images. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Quebec City, QC, Canada, 10–14 September 2017; pp. 408–416.
60. Yang, D.; Xu, D.; Zhou, S.K.; Georgescu, B.; Chen, M.; Grbic, S.; Metaxas, D.; Comaniciu, D. Automatic liver segmentation using an adversarial image-to-image network. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Quebec City, QC, Canada, 10–14 September 2017; pp. 507–515.
61. Mirza, M.; Osindero, S. Conditional generative adversarial nets. arXiv 2014, arXiv:1411.1784.
62. Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232.
63. Bayramoglu, N.; Kaakinen, M.; Eklund, L.; Heikkila, J. Towards virtual h&e staining of hyperspectral lung histology images using conditional generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 64–71.
64. Dar, S.U.H.; Yurt, M.; Karacan, L.; Erdem, A.; Erdem, E.; Çukur, T. Image synthesis in multi-contrast MRI with conditional generative adversarial networks. IEEE Trans. Med. Imaging 2019, 38, 2375–2388. [CrossRef]
65. Wolterink, J.M.; Dinkla, A.M.; Savenije, M.H.F.; Seevinck, P.R.; van den Berg, C.A.; Išgum, I. Deep MR to CT synthesis using unpaired data. In Proceedings of the International Workshop on Simulation and Synthesis in Medical Imaging, Quebec City, QC, Canada, 10 September 2017; pp. 14–23.
[[[ p. 28 ]]]
66. Tuan, T.A.; Pham, T.B.; Kim, J.Y.; Tavares, J.M.R. Alzheimer’s diagnosis using deep learning in segmenting and classifying 3D brain MR images. Int. J. Neurosci. 2020, 1–10. [CrossRef] [PubMed]
67. Myronenko, A. 3D MRI brain tumor segmentation using autoencoder regularization. In Proceedings of the International MICCAI Brainlesion Workshop, Shenzhen, China, 17 October 2018; pp. 311–320.
68. Nie, D.; Wang, L.; Adeli, E.; Lao, C.; Lin, W.; Shen, D. 3-D fully convolutional networks for multimodal isointense infant brain image segmentation. IEEE Trans. Cybern. 2019, 49, 1123–1136. [CrossRef] [PubMed]
69. Wang, S.; Yi, L.; Chen, Q.; Meng, Z.; Dong, H.; He, Z. Edge-aware Fully Convolutional Network with CRF-RNN Layer for Hippocampus Segmentation. In Proceedings of the 2019 IEEE 8th Joint International Information Technology and Artificial Intelligence Conference (ITAIC), Chongqing, China, 24–26 May 2019; pp. 803–806.
70. Borne, L.; Rivière, D.; Mangin, J.F. Combining 3D U-Net and bottom-up geometric constraints for automatic cortical sulci recognition. In Proceedings of the International Conference on Medical Imaging with Deep Learning, London, UK, 8–10 July 2019.
71. Casamitjana, A.; Catà, M.; Sánchez, I.; Combalia, M.; Vilaplana, V. Cascaded V-Net using ROI masks for brain tumor segmentation. In Proceedings of the International MICCAI Brainlesion Workshop, Quebec City, QC, Canada, 14 September 2017; pp. 381–391.
72. Moeskops, P.; Veta, M.; Lafarge, M.W.; Eppenhof, K.A.J.; Pluim, J.P.W. Adversarial training and dilated convolutions for brain MRI segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support; Springer: Cham, Switzerland, 2017; pp. 56–64.
73. Rezaei, M.; Harmuth, K.; Gierke, W.; Kellermeier, T.; Fischer, M.; Yang, H.; Meinel, C. A conditional adversarial network for semantic segmentation of brain tumor. In Proceedings of the International MICCAI Brainlesion Workshop, Quebec City, QC, Canada, 14 September 2017; pp. 241–252.
74. Giacomello, E.; LoIacono, D.; Mainardi, L. Brain MRI Tumor Segmentation with Adversarial Networks. arXiv 2019, arXiv:1910.02717.
75. Leopold, H.A.; Orchard, J.; Zelek, J.S.; Lakshminarayanan, V. Pixelbnn: Augmenting the pixelcnn with batch normalization and the presentation of a fast architecture for retinal vessel segmentation. J. Imaging 2019, 5, 26. [CrossRef]
76. Zhang, Y.; Chung, A.C.S. Deep supervision with additional labels for retinal vessel segmentation task. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Granada, Spain, 16–20 September 2018; pp. 83–91.
77. Son, J.; Park, S.J.; Jung, K.H. Retinal vessel segmentation in fundoscopic images with generative adversarial networks. arXiv 2017, arXiv:1706.09318.
78. Edupuganti, V.G.; Chawla, A.; Amit, K. Automatic optic disk and cup segmentation of fundus images using deep learning. In Proceedings of the 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 2227–2231.
79. Shankaranarayana, S.M.; Ram, K.; Mitra, K.; Sivaprakasam, M. Joint optic disc and cup segmentation using fully convolutional and adversarial networks. In Fetal, Infant and Ophthalmic Medical Image Analysis; Springer: Cham, Switzerland, 2017; pp. 168–176.
80. Bhandary, A.; Prabhu, G.A.; Rajinikanth, V.; Thanaraj, K.P.; Satapathy, S.C.; Robbins, D.E.; Shasky, C.; Zhang, Y.-D.; Tavares, J.M.R.; Raja, N.S.M. Deep-learning framework to detect lung abnormality—A study with chest X-Ray and lung CT scan images. Pattern Recognit. Lett. 2020, 129, 271–278. [CrossRef]
81. Novikov, A.A.; Lenis, D.; Major, D.; Hladůvka, J.; Wimmer, M.; Bühler, K. Fully convolutional architectures for multiclass segmentation in chest radiographs. IEEE Trans. Med. Imaging 2018, 37, 1865–1876. [CrossRef]
82. Anthimopoulos, M.M.; Christodoulidis, S.; Ebner, L.; Geiser, T.; Christe, A.; Mougiakakou, S. Semantic Segmentation of Pathological Lung Tissue with Dilated Fully Convolutional Networks. IEEE J. Biomed. Health Inform. 2019, 23, 714–722. [CrossRef]
83. Jue, J.; Jason, H.; Neelam, T.; Andreas, R.; Sean, B.L.; Joseph, D.O.; Harini, V. Integrating cross-modality hallucinated MRI with CT to aid mediastinal lung tumor segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Shenzhen, China, 13–18 October 2019; pp. 221–229.
84. Christ, P.F.; Ettlinger, F.; Grün, F.; Elshaera, M.E.A.; Lipkova, J.; Schlecht, S.; Ahmaddy, F.; Tatavarty, S.; Bickel, M.; Bilic, P.; et al. Automatic liver and tumor segmentation of CT and MRI volumes using cascaded fully convolutional neural networks. arXiv 2017, arXiv:1702.05970.
85. Han, X. Automatic liver lesion segmentation using a deep convolutional neural network method. arXiv 2017, arXiv:1704.07239.
86. Huo, Y.; Xu, Z.; Bao, S.; Bermudez, C.; Plassard, A.J.; Yao, Y.; Liu, J.; Assad, A.; Abramson, R.G.; Landman, B.A. Splenomegaly segmentation using global convolutional kernels and conditional generative adversarial networks. Med. Imaging 2018, 10574, 1057409.
87. Tran, P.V. A fully convolutional neural network for cardiac segmentation in short-axis MRI. arXiv 2016, arXiv:1604.00494.
88. Xu, Z.; Wu, Z.; Feng, J. CFUN: Combining faster R-CNN and U-net network for efficient whole heart segmentation. arXiv 2018, arXiv:1812.04914.
89. Dong, S.; Luo, G.; Wang, K.; Cao, S.; Mercado, A.; Shmuilovich, O.; Zhang, H.; Li, S. VoxelAtlasGAN: 3D left ventricle segmentation on echocardiography with atlas guided generation and voxel-to-voxel discrimination. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Granada, Spain, 16–20 September 2018; pp. 622–629.
90. Zhang, J.; Du, J.; Liu, H.; Hou, X.; Zhao, Y.; Ding, M. LU-NET: An Improved U-Net for Ventricular Segmentation. IEEE Access 2019, 7, 92539–92546. [CrossRef]
[[[ p. 29 ]]]
91. Ye, C.; Wang, W.; Zhang, S.; Wang, K. Multi-depth fusion network for whole-heart CT image segmentation. IEEE Access 2019, 7, 23421–23429. [CrossRef]
92. Xia, Q.; Yao, Y.; Hu, Z.; Hao, A. Automatic 3D atrial segmentation from GE-MRIs using volumetric fully convolutional networks. In Proceedings of the International Workshop on Statistical Atlases and Computational Models of the Heart, Granada, Spain, 16 September 2018; pp. 211–220.
93. Chen, C.; Qin, C.; Qiu, H.; Tarroni, G.; Duan, J.; Bai, W.; Rueckert, D. Deep Learning for Cardiac Image Segmentation: A Review. Front. Cardiovasc. Med. 2020, 7, 25. [CrossRef]
94. Arshad, H.; Khan, M.A.; Sharif, M.I.; Yasmin, M.; Tavares, J.M.R.; Zhang, Y.D.; Satapathy, S.C. A multilevel paradigm for deep convolutional neural network features selection with an application to human gait recognition. Expert Syst. 2020, e12541. [CrossRef]
95. Wang, Y.; Chen, Y.; Yang, N.; Zheng, L.; Dey, N.; Ashour, A.S.; Rajinikanth, V.; Tavares, J.M.R.S.; Shi, F. Classification of mice hepatic granuloma microscopic images based on a deep convolutional neural network. Appl. Soft Comput. 2019, 74, 40–50. [CrossRef]
96. Liu, F.; Zhou, Z.; Jang, H.; Samsonov, A.; Zhao, G.; Kijowski, R. Deep convolutional neural network and 3D deformable approach for tissue segmentation in musculoskeletal magnetic resonance imaging. Magn. Reson. Med. 2018, 79, 2379–2391. [CrossRef] [PubMed]
97. Tran, T.; Kwon, O.H.; Kwon, K.R.; Lee, S.H.; Kang, K.W. Blood cell images segmentation using deep learning semantic segmentation. In Proceedings of the IEEE International Conference on Electronics and Communication Engineering, Essex, UK, 16–17 August 2018; pp. 13–16.
98. Sekuboyina, A.; Rempfler, M.; Kukačka, J.; Tetteh, G.; Valentinitsch, A.; Kirschke, J.; Menze, B.H. Btrfly net: Vertebrae labelling with energy-based adversarial learning of local spine prior. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Granada, Spain, 16–20 September 2018; pp. 649–657.
99. Han, Z.; Wei, B.; Mercado, A.; Leung, S.; Li, S. Spine-GAN: Semantic segmentation of multiple spinal structures. Med. Image Anal. 2018, 50, 23–35. [CrossRef]
100. Kohl, S.; Bonekamp, D.; Schlemmer, H.P.; Yaqubi, K.; Hohenfellner, M.; Hadaschik, B.; Radtke, J.P.; Maier-Hein, K. Adversarial networks for the detection of aggressive prostate cancer. arXiv 2017, arXiv:1702.08014.
101. Taha, A.; Lo, P.; Li, J.; Zhao, T. Kid-net: Convolution networks for kidney vessels segmentation from ct-volumes. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Granada, Spain, 16–20 September 2018; pp. 463–471.
102. Izadi, S.; Mirikharaji, Z.; Kawahara, J.; Hamarneh, G. Generative adversarial networks to segment skin lesions. In Proceedings of the IEEE 15th International Symposium on Biomedical Imaging, Washington, DC, USA, 4–7 April 2018; pp. 881–884.
103. Mirikharaji, Z.; Hamarneh, G. Star shape prior in fully convolutional networks for skin lesion segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Granada, Spain, 16–20 September 2018; pp. 737–745.
104. Wang, D.; Gu, C.; Wu, K.; Guan, X. Adversarial neural networks for basal membrane segmentation of microinvasive cervix carcinoma in histopathology images. In Proceedings of the 2017 International Conference on Machine Learning and Cybernetics, Ningbo, China, 9–12 July 2017.
105. Simpson, A.L.; Antonelli, M.; Bakas, S.; Bilello, M.; Farahani, K.; Van Ginneken, B.; Kopp-Schneider, A.; Landman, B.A.; Litjens, G.; Menze, B.; et al. A large annotated medical image dataset for the development and evaluation of segmentation algorithms. arXiv 2019, arXiv:1902.09063.
106. Van Ginneken, B.; Stegmann, M.B.; Loog, M. Segmentation of anatomical structures in chest radiographs using supervised methods: A comparative study on a public database. Med. Image Anal. 2006, 10, 19–40. [CrossRef] [PubMed]
107. Menze, B.H.; Jakab, A.; Bauer, S.; Kalpathy-Cramer, J.; Farahani, K.; Kirby, J.; Burren, Y.; Porz, N.; Slotboom, J.; Wiest, R.; et al. The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans. Med. Imaging 2014, 34, 1993–2024. [CrossRef]
108. Heath, M.; Bowyer, K.; Kopans, D.; Kegelmeyer, P.; Moore, R.; Chang, K.; Munishkumaran, S. The digital database for screening mammography. In Proceedings of the 5th International Workshop on Digital Mammography, Toronto, ON, Canada, 11–14 June 2000; pp. 212–218.
109. Bilic, P.; Christ, P.F.; Vorontsov, E.; Chlebus, G.; Chen, H.; Dou, Q.; Fu, C.-W.; Han, X.; Heng, P.-A.; Hesser, J.; et al. The liver tumor segmentation benchmark (lits). arXiv 2019, arXiv:1901.04056.
110. Armato, S.G., III; McLennan, G.; Bidaut, L.; McNitt-Gray, M.F.; Meyer, C.R.; Reeves, A.P.; Zhao, B.; Aberle, D.A.; Henschke, C.I.; Hoffman, E.A.; et al. The lung image database consortium (LIDC) and image database resource initiative (IDRI): A completed reference database of lung nodules on CT scans. Med. Phys. 2011, 38, 915–931. [CrossRef] [PubMed]
111. Marcus, D.S.; Fotenos, A.F.; Csernansky, J.G.; Morris, J.C.; Buckner, R.L. Open access series of imaging studies: Longitudinal MRI data in nondemented and demented older adults. J. Cogn. Neurosci. 2010, 22, 2677–2684. [CrossRef]
112. Staal, J.; Abràmoff, M.D.; Niemeijer, M.; Viergever, M.A.; Van Ginneken, B. Ridge-based vessel segmentation in color images of the retina. IEEE Trans. Med. Imaging 2004, 23, 501–509. [CrossRef]
113. Suckling, J.P. The mammographic image analysis society digital mammogram database. Digit. Mammo. 1994, 17, 375–386.
114. Fonseca, C.G.; Backhaus, M.; Bluemke, D.A.; Britten, R.D.; Chung, J.D.; Cowan, B.R.; Dinov, I.D.; Finn, J.P.; Hunter, P.J.; Kadish, A.H.; et al. The Cardiac Atlas Project—An imaging database for computational modeling and statistical atlases of the heart. Bioinformatics 2011, 27, 2288–2295. [CrossRef] [PubMed]
