Artificial intelligence to evaluate diagnosed COVID-19 chest radiographs



INTRODUCTION
This section presents the two main topics to which this study strictly relates: COVID-19 and the use of Artificial Intelligence in healthcare, with special attention to the computer vision techniques that aid radiologists in diagnosis through radiographs by training deep convolutional neural network models [1]. In addition, this work presents the authors' initial motivations, state-of-the-art studies on similar topics, a preliminary hypothesis for the results, the bureaucratic obstacles encountered along the way, and the reasons to continue the study notwithstanding the mass vaccination policies currently in place.

COVID-19
In December 2019, a new coronavirus named Severe Acute Respiratory Syndrome Coronavirus 2, or SARS-CoV-2, was identified as the agent of a disease outbreak in Wuhan, China. On March 11th, 2020, the World Health Organization (WHO) classified the infection caused by the coronavirus as a global pandemic, given its fast spread [2]. By August 10th, 2022, there was a total of 584,065,952 cases of infection by the new SARS-CoV-2 and a total of 6,418,958 deaths worldwide [3].
When SARS-CoV-2 infects a human body, it might cause few or no symptoms. In at-risk cases, patients show severe symptoms of pulmonary pathologies leading to acute respiratory failure [2], along with several other clinical manifestations such as fever, cough, fatigue, anorexia, myalgia, dyspnea, and phlegm production [4].
Given their relatively short acquisition time, chest radiographs can be used to diagnose pulmonary pathologies in people with symptomatic, pre-symptomatic, or asymptomatic COVID-19 [5].
Although chest radiographs have low sensitivity for identifying lung changes in the early stages of COVID-19, they show opacities in the lung region in advanced stages of the illness [2]. This work evaluates the identification of possible changes that the coronavirus may cause in human lungs by training a deep learning model on chest X-ray radiographs from retrospective COVID-19 patients. The outcomes of the model were compared to those for other related pulmonary diseases.

Artificial intelligence in diagnostic imaging
AI applications in lung X-ray images include the classification of pulmonary pathologies at a level comparable to diagnoses made by radiologists in the third year of medical residency [6], so AI might be useful to physicians as a complementary aid to decision making.
More specifically, machine learning algorithms used to diagnose COVID-19 have shown positive results, with sensitivity and specificity reaching 85.35% and 92.18%, respectively, when diagnosing signs of pneumonia present in radiographs of patients with COVID-19 [7]. On the other hand, these high-scoring results might stem from an overfitting problem: the trained model might be biased toward the validation data subset and thus carry biased information such as temporality, locality, equipment characteristics, and acquisition techniques. These biases can be minimized by applying some transformations to the radiographic images; the procedures are detailed in the next section.
Other recent studies [8][9][10][11] point out that many related works often do not obtain accurate results, which reinforces the importance of acquiring additional information beyond the radiographs themselves to assist clinical diagnosis, such as location, time, and symptoms. In contrast to studies that use Computed Tomography (CT) scans, the present study analyzes ordinary X-ray radiographs because they are accurate for determining the disease and also demand less time and fewer resources [10]. Despite the limitations of AI in clinical application, further developments in this field are expected optimistically [12].

MATERIALS AND METHODS
The data was acquired following the processes required by the hospital that supplied the images in Digital Imaging and Communications in Medicine (DICOM) format. The parameters needed to reproduce this work are presented in this section.

Ethical aspects
All chest radiographs classified as COVID-19 were provided by the Hospital de Clínicas de Porto Alegre (HCPA), following ethical requirements regarding patient anonymity and confidentiality. Process CAAE No. 35219020.110000.5327 was registered on the Brazil Platform and approved by the Comitê de Ética em Pesquisa (CEP) of the Hospital de Clínicas de Porto Alegre, Brazil.

Data acquisition
Radiologists took the COVID-19 radiographs at HCPA, totaling a set of 942 image files with additional information such as the patient's age and whether the patient was lying down or seated in the anteroposterior (AP), posteroanterior (PA), or another informed position when the radiograph was taken.

Data preparation
The algorithm, model, and images used in neural network training and validation were placed in a dataset folder; the setup is shown in Figure 1. The code is written in Python in a Jupyter Notebook run from jupyter-lab, and the full code is available on GitHub [13]. All globally installed packages are listed in a file named 'global_requirements.txt' on the GitHub page.
The image size was chosen to be 256x256 pixels, in 3 distinct channels, one for each scale of the RGB color model. Every pixel in each channel had a value from 0 to 255, which was normalized by dividing it by 255. Some transformations were also applied to the images in this work, and they are presented further below.
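As a minimal illustration of the normalization step just described (the function name and use of NumPy are our own; the study's actual code is on GitHub [13]):

```python
import numpy as np

def normalize_image(img: np.ndarray) -> np.ndarray:
    """Scale a 256x256 RGB image with values in [0, 255] to floats in [0, 1]."""
    if img.shape != (256, 256, 3):
        raise ValueError("expected a 256x256 RGB image")
    return img.astype(np.float32) / 255.0

# Example: a synthetic all-white image maps to all ones.
white = np.full((256, 256, 3), 255, dtype=np.uint8)
assert normalize_image(white).max() == 1.0
```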
Moreover, the proposed model compares the COVID-19 images to a public dataset from the NIHCC, the Clinical Center of the National Institutes of Health [14], and it is trained to identify image characteristics that can be evaluated as associated with COVID-19 or not. Note that it was not necessary to preprocess these public images.
The COVID-19 dataset was then investigated by making comparisons to other labeled data, namely Pneumonia, Edema, Mass, Consolidation, and Fibrosis. The other diseases had more images than the COVID-19 dataset, so the results found were compared for both balanced and imbalanced class models.
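The text does not state the exact balancing mechanism; a plausible sketch, assuming simple random undersampling of the larger class (the function name and seed are our assumptions), could look like this:

```python
import random

def balance_classes(covid, other, seed=0):
    """Undersample the larger class so both classes have equally many images."""
    rng = random.Random(seed)
    n = min(len(covid), len(other))
    return rng.sample(covid, n), rng.sample(other, n)

# E.g. 942 COVID images vs. 1234 Mass images -> 942 of each.
covid_set, mass_set = balance_classes(list(range(942)), list(range(1234)))
```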
The model used two distinct labels to categorize the whole dataset: COVID was labeled 1, while Pneumonia, Edema, Mass, Consolidation, and Fibrosis were labeled 0. These labels are assigned by keeping the images in a folder named after the labeled disease, so the folder's path in the algorithm associates an image with its labeled disease and appends that label to a list of labels.
In this AI algorithm, the word class represents a concept similar to that expressed by a label, with the difference that a class is the proper classification of an image after it is inputted to the trained algorithm, which considers all possible labels an image might be assigned and then gives a prediction. The images are fed into a model trained with labeled data, 1 for COVID and 0 for Non-COVID, so they are classified according to whether they are likely to be related to the disease or not.
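A minimal sketch of the folder-based labeling described above, with hypothetical paths and a helper name of our own choosing:

```python
from pathlib import Path

def label_from_path(path: Path) -> int:
    """Label 1 if the image sits under a 'COVID' folder, 0 otherwise."""
    return 1 if "COVID" in {p.name for p in path.parents} else 0

# Hypothetical folder layout mirroring the dataset structure.
assert label_from_path(Path("dataset/COVID/img_001.png")) == 1
assert label_from_path(Path("dataset/Pneumonia/img_002.png")) == 0
```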

Data processing
This subsection discusses the details of the processes applied to the images when fed to the neural network; these are the parameters a typical network architecture needs to start training. It succinctly details the convolutional neural network used and the choices for the functions' arguments, also called parameters and hyper-parameters of the deep learning model.
The trained model is a Convolutional Neural Network, the state of the art in image classification methods. This work uses a ResNet-18 architecture, which previously showed fair results for the COVID-19 diagnostic task, with accuracies from 86% to 98% and a sensitivity of 90% [15]. Some of the parameters to be set were: • Image Size: reduced from non-standardized dimensions to 256x256 pixels for each of the 3 RGB channels. • Transformations: although the clinical procedures follow protocols, operational variations during image acquisition might occur, such as positioning, X-ray energy, exposure time, and additional software parameters. Thus, some transformations were applied to each image of the set; they are presented as follows, and a visual example can be seen in Figure 2: • Rotation: randomly rotate at an angle between -30º and 30º.
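The rotation transformation can be sketched directly with PIL (the pipeline presumably uses torchvision's equivalent `RandomRotation`; this standalone version, including the function name, is ours):

```python
import random
from PIL import Image

def random_rotation(img: Image.Image, max_deg: float = 30.0) -> Image.Image:
    """Rotate the image by a random angle drawn from [-max_deg, +max_deg]."""
    angle = random.uniform(-max_deg, max_deg)
    return img.rotate(angle)

# The output keeps the 256x256 canvas; corners exposed by the
# rotation are filled with black.
img = Image.new("RGB", (256, 256))
out = random_rotation(img)
assert out.size == (256, 256)
```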

Loss Function and Optimizer
The prediction and the actual class of a given sample are evaluated by a Loss Function, which measures the distance between the predicted and actual values. In this study, the Cross-Entropy Loss Function was used, given by the CrossEntropyLoss class from PyTorch, and the Adam optimizer was used instead of the default Stochastic Gradient Descent.
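To make the loss concrete, the following hand-rolled function reproduces what PyTorch's `CrossEntropyLoss` computes for a single sample of raw logits (the function itself is our illustration, not the study's code):

```python
import math

def cross_entropy(logits, target: int) -> float:
    """Negative log-softmax of the target class, as torch.nn.CrossEntropyLoss
    computes per sample from raw (unnormalized) logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

# A confident, correct prediction yields a much smaller loss
# than an undecided one.
assert cross_entropy([5.0, -5.0], target=0) < cross_entropy([0.0, 0.0], target=0)
```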

Evaluation and metrics
The model is evaluated using some key metrics, all grounded in how well the model can correctly predict a condition, that is, the presence or absence of the disease. From these rates, one can define other evaluation metrics that summarize the results and aid a radiologist in making a diagnosis. The metric used in this study was accuracy, which represents the proportion of correctly predicted values among the total number of predictions.
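Accuracy can be computed directly from the predicted and true labels; a minimal sketch:

```python
def accuracy(predictions, targets) -> float:
    """Proportion of correct predictions among all predictions."""
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)

# Three of four binary predictions match the targets -> 0.75.
assert accuracy([1, 0, 1, 1], [1, 0, 0, 1]) == 0.75
```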

Code availability
To make the algorithm clearer and easier to inspect, it was uploaded to a GitHub repository available at [13]. Note that there are two files in .ipynb format, each leading to a Jupyter notebook: one is the balanced model and the other is the imbalanced model. The most recent model is named Covid.ipynb. Data containing patient information is not uploaded to this repository due to ethical aspects involving sensitive patient data.

RESULTS AND DISCUSSION
The first step was to analyze the dataset. A total of 942 DICOM files of COVID-19-diagnosed patients were provided by the HCPA. Some of the radiographs had different characteristics than the others due to the radiologists' different image acquisition techniques and, as formerly mentioned, it was decided to keep all images of the whole set so the model would be better trained when presented with brand new radiographs.
Firstly, the information in the .dcm-formatted DICOM files was studied, and we tried to convert them to Portable Network Graphics (PNG). To do that, the Python Imaging Library (PIL) was imported and used, but it outputted images with greatly lowered brightness and contrast. In some cases, the images were returned with only black pixels, so this image conversion method was not used. Because of this, in a preprocessing step, the .dcm files were manually cropped and converted using Adobe Photoshop CC (Figure 1); this procedure was not needed for the publicly available images of the other pathologies, as they were already PNG files with 256x256 dimensions. This work studied all the images at this same dimension of 256x256 pixels to mitigate possible biases from image characteristics that only appear at different resolutions.
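The dark or all-black outputs are consistent with 12- or 16-bit DICOM intensities being written directly into an 8-bit image. A minimal min-max rescaling sketch of our own (the study instead converted the files manually; the function name is an assumption, and the array math is the point):

```python
import numpy as np

def dicom_to_uint8(pixel_array: np.ndarray) -> np.ndarray:
    """Rescale a high-bit-depth pixel array to the full 8-bit range.

    Without this step, 12- or 16-bit intensities saved directly as
    8-bit PNGs appear almost entirely black.
    """
    arr = pixel_array.astype(np.float64)
    lo, hi = arr.min(), arr.max()
    if hi == lo:
        return np.zeros_like(arr, dtype=np.uint8)
    return ((arr - lo) / (hi - lo) * 255.0).round().astype(np.uint8)

# A 12-bit ramp (0..4095) is stretched to cover the full 0..255 range.
ramp = np.arange(0, 4096).reshape(64, 64)
out = dicom_to_uint8(ramp)
assert out.min() == 0 and out.max() == 255
```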
As the radiographs were taken by different radiologists, notable differences between acquisition procedures were identified when analyzing the whole set, which implied minor variations in the images. These variations should reduce the overall accuracy but might be useful for giving more realistic results, as they prevent overfitting in the deep learning model. Once the trained neural network carries information regarding various acquisition methods, it is suitable for use in distinct situations or environments.
All images contained in the dataset must have the same dimensions and 3 channels, one for each RGB color, because this standardizes the model: the corresponding connected neurons from one image to another refer to the same pixel. The training methodology includes PyTorch tensor comparisons, and changes in the dimensions would imply changes in the network. Otherwise, this technique would demand variable tensor sizes, which would imply the need for more computational power. In addition, to reduce the influence of diverging positioning and radiation exposure among patients, the images passed through several transformations that simulated operational procedures which might or might not be strictly followed by all radiologists.
This study analyzed two distinct types of training for each pathology, one with balanced classes and the other with imbalanced classes. The balanced analysis might help prevent class imbalance, characterized by the low accuracy of models at identifying images that are not common in the training dataset, such as rare diseases or diseases with fewer inputted samples, despite lowering the number of images in the whole set. The low number of samples might become an overfitting issue, which is something this work wants to avoid. For all models, the dataset was split into 80% for training and 20% for validation; this separation is made so the trained model makes predictions on images to which it was not beforehand presented, which also prevents overfitting.
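The 80/20 split can be sketched in plain Python (the actual code likely relies on PyTorch's dataset utilities; the function name and fixed seed here are assumptions):

```python
import random

def train_val_split(items, val_frac=0.2, seed=42):
    """Shuffle a list of samples and split it into training and
    validation subsets of sizes (1 - val_frac) and val_frac."""
    rng = random.Random(seed)
    shuffled = items[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    return shuffled[n_val:], shuffled[:n_val]

# 942 COVID-19 images -> 754 for training, 188 for validation.
train, val = train_val_split(list(range(942)))
assert len(train) == 754 and len(val) == 188
```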
The first model to be analyzed was the imbalanced class model, meaning that the number of images was not equally divided; each pathology was compared with the 942 images of COVID-19 radiographs. The image counts for the diseases are as follows: Consolidation (791 images), Edema (374 images), Fibrosis (559 images), Mass (1234 images), and Pneumonia (192 images). We ran the deep learning model for 25, 50, 75, and 100 epochs to analyze the model's efficiency by measuring its accuracy, and then saved the best model's accuracy and epoch number.
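The bookkeeping for saving the best epoch can be sketched as follows (a simplification of the training loop; the helper name is ours):

```python
def track_best(accuracies):
    """Given per-epoch validation accuracies, return the 1-indexed epoch
    with the highest accuracy and that accuracy, i.e. the checkpoint
    that would be saved as the best model."""
    best_idx = max(range(len(accuracies)), key=lambda i: accuracies[i])
    return best_idx + 1, accuracies[best_idx]

# Accuracy peaks at epoch 2 in this toy run.
assert track_best([0.80, 0.91, 0.87]) == (2, 0.91)
```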
For the imbalanced class models, the epochs that showed the best accuracy were, for each pathology: Consolidation (61 epochs with 87.90% accuracy), Edema (56 epochs with 85.98% accuracy), Fibrosis (81 epochs with 92.69% accuracy), Mass (86 epochs with 93.12% accuracy), and Pneumonia (35 epochs with 88.55% accuracy). Table 1 gives an overview of these results. By analyzing Figure 3 and Figure 4, it can be deduced that there is no need to train for too many epochs; the average ideal number of epochs for training is 63.8 ± 20.5 for the imbalanced models. For comparison, related work reports an accuracy of 90.3% for an ensemble of architectures including VGG16 but excluding CheXpert; in a similar approach, another study [17] trained three distinct neural networks with 404 images of five distinct classes.

CONCLUSION
There is a small but significant difference in the accuracy when comparing the imbalanced model to the balanced model, especially if one takes into account that these results are to be used in medicine.
This work points out that the number of images used to train the model should be high if one wants a better-trained model, and that trained models available from public datasets are a good starting point, but it is important to be aware of the locality and chronological differences that influence the training set. Although this work did not achieve better results than some found in other studies, it draws attention to the methodological points one needs to be aware of to obtain realistic results, such as class balancing, data pre-processing, and the right set of hyperparameters.
This study had several limitations. The comparison was made using images from two distinct datasets, and it could be relevant to use only one source of radiographs for the first training and, afterward, test it on other available data. To produce a more efficient algorithm, this research will need new sets of DICOM files from HCPA diagnosed as not due to COVID-19, so the model can be fed a dataset with non-COVID labels. By doing this, it is expected to reduce the effects of locality and temporality in the trained model. A binary classification methodology was also used to compare radiological findings of COVID to each of the diseases because it demands fewer computational resources; another approach could be a multiclass classification, which would give information about how the network understands the relation between each of the six diseases. Lastly, further studies include the creation of a mobile-friendly app to aid radiologists and the investigation of differences in chest radiographs of COVID-19 strains available in public datasets worldwide, using this same algorithm to make the comparison.

Figure 1 :
Figure 1: Data Set setup structure used to run the Python algorithm.

The images have 3 RGB channels. The model uses a tensor with the (bs, C, x_size, y_size) shape, where bs stands for batch size, C is the channel, which can be 0, 1, or 2, and x_size and y_size are the image dimensions read as a pixel array. Batch Size, number of Epochs, and Learning Rate: the batch size for training was chosen to be 32, and the number of epochs for training the model was 25, 50, 75, and 100. The learning rate was chosen as the one that gives a minimum gradient of the losses; this work uses a cyclical learning rate method, which selects the minimum gradient as a function of the learning rate during one epoch. Training and Validation sets: 80% of the set was used for training and 20% was reserved for validation.
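The "minimum gradient" selection resembles the learning-rate range test popularized alongside cyclical learning rates: sweep the learning rate, record the loss, and pick the rate where the loss falls fastest. Under that assumption, a sketch (function name and toy numbers are ours):

```python
import math

def steepest_lr(lrs, losses):
    """Pick the learning rate at which the loss drops fastest per
    log-LR step (the most negative finite-difference slope), as in
    the LR range test."""
    slopes = [
        (losses[i + 1] - losses[i])
        / (math.log10(lrs[i + 1]) - math.log10(lrs[i]))
        for i in range(len(lrs) - 1)
    ]
    i_min = min(range(len(slopes)), key=lambda i: slopes[i])
    return lrs[i_min]

# Loss falls fastest between 1e-3 and 1e-2, then diverges at 1e-1.
lrs = [1e-4, 1e-3, 1e-2, 1e-1]
losses = [0.70, 0.65, 0.40, 0.80]
assert steepest_lr(lrs, losses) == 1e-3
```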

Figure 3
Figure 3 shows the variation during training of Loss and Accuracy through 100 epochs of a binary supervised imbalanced classification model for the pathology Mass, which consists of 1234 images plus 942 images of COVID-19 radiographs, totaling 2176 images.

Figure 3 :
Figure 3: Loss (line) and Accuracy (line-dots) of the training (blue) and validation (orange) sets regarding the pathology Mass, with imbalanced classes and through 100 epochs of training. Source: authors

Figure 4
Figure 4 shows the variation of Loss and Accuracy during training through 75 epochs of a binary supervised balanced classification model for the pathology Consolidation, which consists of 791 images plus 791 images of COVID-19 radiographs, totaling 1582 images.

Figure 4 :
Figure 4: Loss (line) and Accuracy (line-dots) for the training (blue) and validation (red) sets regarding the pathology Consolidation, with balanced classes and through 75 epochs of training. Source: authors

Table 1 :
Accuracy results for a given pathology and their respective learning rates used to train the model, number of images of the dataset, and the epoch where the model was saved corresponding to the lowest loss function for the imbalanced classes approach.

Table 2 :
Accuracy results for a given pathology and their respective learning rates used to train the model, number of images of the dataset, and the epoch where the model was saved corresponding to the lowest loss function for the balanced classes approach.