Spatially informed Bayesian neural network for neurodegenerative diseases classification

Magnetic resonance imaging (MRI) plays an increasingly important role in the diagnosis and prognosis of neurodegenerative diseases. One field of extensive clinical use of MRI is the accurate and automated classification of degenerative disorders. Most of current classification studies either do not mirror medical practice where patients may exhibit early stages of the disease, comorbidities, or atypical variants, or they are not able to produce probabilistic predictions nor account for uncertainty. Also, the spatial heterogeneity of the brain alterations caused by neurodegenerative processes is not usually considered, despite the spatial configuration of the neuronal loss is a characteristic hallmark for each disorder. In this article, we propose a classification technique that incorporates uncertainty and spatial information for distinguishing between healthy subjects and patients from four distinct neurodegenerative diseases: Alzheimer's disease, mild cognitive impairment, Parkinson's disease, and Multiple Sclerosis. We introduce a spatially informed Bayesian neural network (SBNN) that combines a three‐dimensional neural network to extract neurodegeneration features from MRI, Bayesian inference to account for uncertainty in diagnosis, and a spatially informed MRI image using hidden Markov random fields to encode cerebral spatial information. The SBNN model demonstrates that classification accuracy increases up to 25% by including a spatially informed MRI scan. Furthermore, the SBNN provides a robust probabilistic diagnosis that resembles clinical decision‐making and can account for the heterogeneous medical presentations of neurodegenerative disorders.

growing, neurodegenerative disorders will become the second leading cause of death after cardiovascular disease. 2 Thus, identifying and classifying these disorders is not only crucial in clinical practice but also beneficial for developing oriented treatments and enriching clinical research.
Magnetic resonance imaging (MRI) has become a reliable tool for the diagnosis and assessment of neurodegenerative diseases. In MRI data, neuronal damage caused by neurodegeneration is represented as atrophies or lesions whose intensities differ from those in surrounding apparently healthy tissue. 3 Measures of abnormalities derived from structural MRI data are markedly useful in the characterization of neurodegenerative chronic disorders such as Parkinson's, Alzheimer's, and Multiple Sclerosis. 1 Due to the availability of MRI data, there has been an inherent interest in applying analytical methods to provide a differential diagnosis of neurodegenerative disorders. Many studies have focused on measuring the volume of regions of interest, 4,5 the volume of cortical thickness 6 and the concentration of gray matter (voxel-based morphometry) 7 as distinctive indicators of neurodegenerative conditions. However, these methods measure group differences and are not extendable to individual diagnoses. Other studies based on independent component analysis (ICA) 8 and principal component analysis (PCA) 9 have uncovered neurodegeneration hallmarks and relate them to a neurodegenerative disorder. Although these techniques have sounded in neurodegenerative diseases research, they are bounded by a modest number of patients and intolerable classification accuracy from a medical diagnosis scope.
In recent years, advances in machine learning and pattern recognition methods capable of handling high-dimensional data have enabled the development of new diagnostic tools derived from morphological analyses of MRI. One prominent approach is imaging feature extraction for classification and prediction using deep learning. Multiple deep learning studies have been successful in classifying and predicting neurodegenerative diseases by using supported vector machine (SVM) 10 and convolutional neural networks (CNNs). 11,12 These approaches have achieved high accuracy in binary (over 93%) and multi-class (over 79%) disease classifications. Nonetheless, most of the studies focus on the distinction of different stages within the same disorder or the differentiation between healthy and pathological subjects. Such distinction does not mirror clinical practice, where brain abnormalities that define neurodegenerative diseases can be present before the onset of clinical features, and more than one neurodegeneration process can be found in an individual. 1 Studies in the classification of multiple neurodegenerative diseases are scarce since it entails combining and standardizing diverse MRI sources. Furthermore, predictions in deep learning methods are only partly suited for clinical diagnosis since they cannot produce probabilistic predictions nor account for uncertainty. This is particularly alarming in medical diagnostic applications, where silent failure can lead to dramatic results. 13 There is thus a need for classification methods that resemble clinical decision-making and account for atypical, multiple, and early presentations of neurodegenerative disorders.
Additionally, classification MRI-based studies ignore the spatial heterogeneity of the brain alterations caused by neurodegenerative processes. The spatial configuration of the neuronal loss is a characteristic hallmark for each disorder. In a number of studies mainly using statistical methods, [14][15][16] it has been shown that coupling the MRI data with spatial information significantly increases both the detection of and discrimination of neurodegenerative diseases. Novel approaches have proved that incorporating spatial components, such as geographic coordinates and region labels, into deep learning techniques for surface interpolation, 17 image segmentation and image recognition 18 can enhance model stability and classification accuracy.
This article aims to develop a classification technique that incorporates pattern recognition, uncertainty, and spatial information for distinguishing between healthy subjects and patients from four distinct neurodegenerative diseases: Alzheimer's disease, mild cognitive impairment (MCI), Parkinson's disease, and Multiple Sclerosis. In this work, a spatially informed Bayesian neural network (SBNN) is introduced to produce automated, probabilistic, and accurate predictions that resemble clinical diagnosis. A CNN is developed to detect a neurodegenerative process based on T 1 -weighted MRI scans from patients and healthy controls, then Bayesian inference is incorporated into the CNN parameters to quantify uncertainty and produce probabilistic predictions and finally, a spatially informed MRI scan using hidden Markov random fields (HMRFs) is included as a CNN input to improve feature detection and classification accuracy. To illustrate the proposed method, this study uses 330 T1-weighted MRI scans from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database, the Parkinson's Progression Markers Initiative (PPMI) database and the Multiple Sclerosis database of the University Medical Center Ljubljana.

Study population
The dataset for this study was obtained from three different sources: the ADNI database, the PPMI database and the University Medical Center Ljubljana (UMCL). 19

Image sequences
Subjects were scanned on either 1.5 T or 3.0 T MRI devices according to the three studies' protocols. All scans include a 3-dimensional MP RANGE T1-weighted gradient-echo sequence. The voxel size of the T1-images oscillates between 0.42×0.42×3.3 mm 3 and 1.0×1.0× 1.2 mm 3 . 135 and 195 patients were imaged using 1.5 T and 3.0 T devices, respectively.

Image preprocessing
The N4 inhomogeneity correction (improved N3 bias correction) algorithm 20 was applied to each MRI scan. As this study involves a population-level analysis, every image must have the same conditions in resolution, size, and spatial distribution. A non-linear spatial registration is performed to the N4-corrected T1-weighted scans to transform them into an unbiased high-resolution image with 2 mm isotropic voxels. The symmetric diffeomorphic image registration algorithm 21 implemented in the Advanced Normalization Tools (ANTs) 22 is used for this procedure. The use of a non-linear registration method guarantees the alignment of corresponding brain regions across subjects and allows the comparison of neural loss that characterizes each degenerative disorder. To remove non-cerebral tissue such as the skull and neck from the N4-corrected and registered images, a brain mask of the atlas template was applied. The Removal of Artificial Voxel Effect by Linear regression (RAVEL) method 23 was employed to normalize the image voxel intensities since these were collected on different scanners across different sites as part of different studies. RAVEL removes present unwanted variation by adopting control voxels unassociated with disease status and clinical covariates. 3-class tissue segmentation is performed using the FSL FAST segmentation algorithm 24 to obtain the control voxels within the cerebrospinal fluid (CSF) structure. Age and sex are used as clinical covariates. Finally, to correct the scanner and protocol effects and to preserve biological associations on the MRI scans, the ComBat image harmonization algorithm 25 was used. The cortical thicknesses of the 330 N4-corrected, registered, skull stripped and normalized images were harmonized. The steps mentioned above were integrated into an automatic pipeline to automate MRI preprocessing. We developed Neurorom, an R package to prepare structural MRI scans for analysis. Neurorom is an easily accessible package that allows users to preprocess multiple MRI scans in an automated fashion using minimal user input. The output of Neurorom is preprocessed MRI ready to use in our approach (see methodology). The reader is referred to Section 7 for further details on accessing the package and the raw code.   Figure 1 illustrates how preprocessing allows comparison and how spatially distributed features in the brain anatomy inform of a particular neurodegenerative disease. The differences are visually evident in Figure 1 as we deliberately selected MRI scans for patients in advanced stages.

Spatially informed Bayesian neural network
Bayesian neural networks (BNN) combine neural network and stochastic models with the latter forming the core of this integration. 26 BNN introduces stochastic components in a neural network to simulate multiple possible models with an associated probability distribution. This allows explicit uncertainty inference in the underlying processes of the neural network. 13 BNN can produce uncertainties on the predictions and draw the distributions of the parameters learned by the model. The SBNN is an extension of a BNN. The SBNN combines a CNN, a stochastic variational model, and a spatially informed input to produce classification predictions. Figure 2 displays the workflow to build the SBNN.
Since one of the main goals of the SBNN is to prove that the inclusion of an explicit spatial input in a BNN model improves classification accuracy, the SBNN model is fed not only by standard MRI data but also by spatially informed data. The standard data corresponds to the preprocessed MRI scans (x ). The spatially informed data (x ) is obtained after applying a HMRF model proposed by Zhange et al 24 to the preprocessed MRI scans. The HMRF model informs about the spatial structure using a neighboring context which allows forming mutually exclusive classes that depend on the spatial similarities of the entities. In brain imaging context, the HMRF segments MRI scans into classes that encode anatomical spatial relations. The labeling of each MRI scan voxel depends on the probability of that voxel's intensity belonging to a spatial neighborhood of similar voxel intensities. The HMRF model transforms the voxels' intensity response into a measure of spatial correlation. For further details, please refer to the aforementioned paper. In this study, three classes ( = 3) corresponding to the white matter, grey matter, and cerebrospinal fluid are selected. These structures are chosen as their spatial distribution and heterogeneity over the brain and volumes are highly correlated to the brain damage caused by neurodegenerative diseases such as Alzheimer's, Parkinson's, Multiple Sclerosis, and MCI. 27 The output spatially informed MRI scans summarize implicitly the neighborhood structures among the voxel intensities and, evidence abnormal changes-neuronal damage or death-in the three selected classes.
Both the preprocessed (x ) and the spatially informed MRI scans (or spatially informed input) (x ) are split into training, validation, and test datasets. Suppose we obtain observations from n subjects and let (x i , x i ) denotes the preprocessed F I G U R E 2 Pipeline to build, train, and test the spatially informed Bayesian neural network. Further details of the formalities are given in the text and spatially informed MRI scans, and y i denotes the label of the true ground disease k for the ith subject; k refers to one of the K classes, that is, one of the four neurodegenerative diseases or the healthy state. For training and validation purposes, each pair of MRI scans (x i , x i ) is accompanied by its corresponding true ground disease (y i ). In the case of model testing, true classes y are only used to evaluate the predicted classesŷ of the test dataset. The training dataset (D)-composed by x and x -is the input of the SBNN model. A neural network model (Φ ) is chosen, in particular, we use a CNN. This study selected 4 brain classification state-of-the-art neural network architectures and, we additionally proposed a new one. The 5 neural networks are trained to detect the architecture that best represents and generalizes the MRI data and to evaluate the effect of the spatially informed input, that is, the spatially informed MRI scans. The next step is the choice of a stochastic model that will provide a prior distribution p( ) for the bias parameters over the CNN weights and also for the variational posterior distribution q Φ ( ) to approximate the exact posterior probability distribution. The selected CNN is then trained again considering the introduced stochastic parameters and probabilistic loss function (see Section 3.4). The output is based on the posterior distribution over the model coefficients p( |D) needed to further obtain p(y = k|x, D) which provides, given MRI scans x and the trained dataset D, the distribution ofŷ, the classification predictions. The predictions are summarized to find the final prediction-the highest averaged probability-and associated uncertainty.
Data augmentation for the input x was performed by rotating, translating, and flipping the MRI scans. However, it did not affect the models' performance and prediction.

CNN architecture
The aim of a neural network model Φ is to produce predictions by representing an arbitrary function. 13 Neural networks' architecture is built using a sequence of input layers, hidden layers, and one output layer composed of a linear transformation and an activation function. Each layer depends on a parameter which represents the weights W and bias b of the network. In a Φ , is optimized to minimize the differences between the predicted and real values given the training data D.
A 3D CNN is a special kind of neural network applied to analyze 3D visual imagery. It exploits the image data topology and reduces it to its key features which are used as classification criteria. The neural network Φ used in this article is a 3D CNN with two inputs and two branches. The first branch is designed to extract high-level features from the preprocessed MRI scans. The second branch extracts key characteristics from the spatially informed images. Both branches have identical architectures to maximize feature detection and guarantee common output dimensions. A fusion layer combines the output semantic features of both CNN branches which are passed on to a set of fully connected layers (FC). The FC layers and a SoftMax activation at the last layer are trained to assign a predefined class to the input 3D images, that is, to predict the neurodegenerative disease for each patient. TA B L E 2 State-of-the-art CNN-based methods for neuroimaging multi-class classification Fully connected (ReLU) ---128 Fully connected (SoftMax) ---5 Five 3D CNN architectures are used for the neural network model branches. We use four 3D CNN architectures based on related state-of-the-art brain multi-class classification and we additionally propose a new one.
The first state-of-the-art 3D CNN architecture 28 uses three simple convolutional layers and two fully connected layers to classify computed tomography brain scans into healthy scans and abnormal scans containing subarachnoid hemorrhage, intraparenchymal hemorrhage, acute subdural hemorrhage, and brain polytrauma hemorrhage. The second architecture, proposed by Payan and Montana, 29 consists of three convolutional layers and three FC layers. This architecture differentiates between Alzheimer's disease, MCI, and healthy subjects. In a third architecture, Mzoughi et al 30 propose a complex CNN architecture to distinguish between low-grade and high-grade gliomas: eight convolutional layers with three fully connected layers. Finally, another multi-class classification approach is suggested by Parmar et al. 31 In their study, they classify four Alzheimer's disease stages using fMRI scans and a set of five convolutional layers and three FC ones. Table 2 shows the details of the used state-of-the-art 3D CNN architectures for brain imaging classification. Further details of the models can be found in the mentioned studies.
The architecture we propose has four convolutional layer groups and three fully-connected layers. The detailed configuration of the network is shown in Table 3. The input to the CNN is a 3D preprocessed and spatially informed MRI scan with 91 × 109 × 91 dimensions. The convolutional layers compute their outputs by applying the convolutional operations and ReLU activations. The results are followed by max-pooling and batch normalization. The last three layers in the CNN are fully connected. These FC layers include the neurons connected to all outputs of their precedent layers. The last FC layer with a SoftMax activation has 5 neurons, which represent the probabilities of a patient to belong to one of the five classes (healthy, AD, MCI, PD, MS).
We proposed the architecture in Table 3 responding to the aspect and dimensions of the brain changes caused by neurodegenerative diseases. Initially, our network uses smaller filters to collect as much local information as possible, for example, the transition from one bran tissue to another. Then, we gradually increase the filters' width to reduce the generated feature space to represent more general, high-level and representative information such as brain lesions, abrupt tissue changes, and anatomical distortions.
The first convolutional layer easily captures minute details. Once the layer deepens, we accumulate and aggregate the features by increasing the layer window size. We also increase the max-pool accordingly to reduce redundancies in the features. We gradually increased our filter size by two strides in each successive layer so that the detected feature will be extracted at a low, intermediate, and high level with a bigger window for the successive layers. We call this a hierarchical network because the size of the kernel increases as the stride does. Nonetheless, we keep constant the number of filters in each as we seek to retain details in more expansive windows, that is, account for brain changes in extensive regions of the brain.

Bayesian inference
The Bayesian paradigm in statistics is based on two simple ideas. The first is that probability is a measure of belief in the occurrence of events, and the second idea is that prior beliefs influence posterior beliefs. Bayes' theorem summarizes this interpretation, stating that Here, P(D|H) is the likelihood whereas P(H) is the prior and P(D) the evidence. Using Bayes' formula to train a predictor can be understood as learning from the data D. BNNs are stochastic neural networks trained using a Bayesian approach. The goal of artificial neural networks (ANNs) is to represent an arbitrary function y = Φ(x). ANNS are built using one input layer, a succession of hidden layers and one output layer, where each layer is represented as a linear transformation, followed by a nonlinear operation (activation function). In this process, we have a set of parameters of the network formed by the weights W of the network connections and the biases b of the network layers. Deep learning is the process of regressing the parameter vector from the training data D, with D composed of a series of input x and their corresponding labels y. In addition, the stochastic model assumes that is drawn from a probability distribution. The main goal in using this stochastic framework is to obtain a better idea of the uncertainty associated to the underlying processes. This can be done by comparing the predictions of multiple sampled model parametrizations . This process can be formalized as ∼ p( ), and y = Φ (x) + , with a random noise. Thus, we need a prior distribution over the possible model parametrization p( ), and a prior confidence in the predictive power of the model p(y = k|x, ), where k is one possible label of the response y.
We opt here for a variational inference method, following. 32 This method performs a Monte Carlo approximation of the posterior distribution by integrating over the kernel and bias. In this case, a surrogate posterior q ( ) (parametrized by to match the exact posterior), and divergence for both the weights and bias distributions are to be specified.
The prior p( ) and variational posterior q ( ) are taken as Gaussian distributions with initialized mean = 0 and a small variance 2 . The choice of using a normal prior is not arbitrary. Assuming a known normal prior is a common practice to not over-parametrized neural networks. 32 Besides the ideal mathematical properties of a Gaussian process and the simple formulation of its log, which is used in most of the learning algorithms, a multivariate normal surrogate posterior q ( ) dramatically simplifies the probabilistic loss function (ELBO), leading to a generalization of the expectation-maximization algorithm. 13 In Vladimirova et al, 33 the author proved that Gaussian priors also avoid overfitting and improve generalization. Malinin et al 34 also showed that Gaussian priors in multi-layered architectures perform comparatively well against other distributions such as the (non-informative) Uniform and the Weibull. Furthermore, as the number of hidden layers increases in a neural network, the prior will converge to a Gaussian process regardless of the defined initial distribution. 35 Assuming the training set is D, we denote by D x the training inputs, and by D y the training labels. Thus, applying Bayes' theorem, the resulting posterior distribution of , given the training data D, can be written as Note that the marginal probability distribution p(y|x, D) quantifies the model's uncertainty on its prediction, and it is our quantity of interest. Given p( |D), it can be calculated for each k = 1, … , K as In practice, a large number of j are sampled from p( |D), and used to compute the y j from the equation y = Φ (x) + , taking samples from p(y = k|x, D). The output becomes Y = {y j |j ∈ [1, N]} a set of samples from p(y|x, D), and Θ = { j |j ∈ [1, N]}, where N represents the Monte Carlo samples.
For classification purposes, the average model prediction will give the relative probability of each class k, considered a measure of uncertainty, 13 and it is given bŷ where I is the indicator function, and N = |Θ|. The final prediction is taken as the most likely class of the set

Training, inference, and prediction
The SBNN is trained using variational inference given the training dataset D. The variational inference samples the distribution q ( ) parametrized by parameters . The SBNN learns the values of such that q ( ) resembles closely the exact posterior, and the Kullback-Leibler divergence measures this matching. In terms of , the Kullback-Leibler divergence is defined as In order to compute D KL ( q ||p ) , p( ′ |D) has also to be computed. To avoid this problem, the evidence lower bound (ELBO) is defined as Minimizing D KL ( q ||p ) is equivalent to maximizing the ELBO given that log(p(D)) only depends on the prior. The ELBO, as any other loss function in CNN, can be optimized by applying stochastic gradient descent to variational inference. 36 The trained SBNN is used to produce predictions over the test dataset. Monte-Carlo inference is applied to each input of the test dataset to get the corresponding probability distribution p(y|x, D) depending on the estimated output posterior P( |D). This is performed by executing the SBNN n times (or iterations) for a test dataset input x.
A vector of relative probabilities (uncertainties) per disease is obtained by averaging the n predictions, that is, the mean value of the probability distribution. The final predicted classŷ is taken as the most likely class.

Evaluation metrics
The SBNN performance is measured in terms of accuracy, sensitivity, specificity, and Accuracy measures the correctly identified MRI scans over the total classification. Sensitivity describes the portion of MRI scans correctly classified in the correct class, whereas specificity refers to the ratio of MRI scans incorrectly classified within the incorrect class. F 1 -score measures the similarity between the predictedŷ classes and the target y classes.

Experimental scheme
To train, validate, and test the SBNN model, the 330 MRI scans were split using a simple random sampling technique. The training dataset, used by the SBNN model to learn the features of each disease, consists of 231 (70%) T 1 -weighted preprocessed scans. The validation dataset and test dataset are composed of 51 (15.4%) and 49 (14.6%) images, respectively. The full dataset was split per disease to guarantee a balanced distribution in each subset dataset. The CNN architectures are trained for a maximum of 100 iterations (epochs) using the Adam optimizer with an exponential decay function of initial rate = 1 × 10 −5 . For fine-tuning the CNNs, we include L2 regularizations and dropouts of 0.5 after each dense layer. We also added batch normalization layers. The batch size is set to a default value of 32 training samples. The input training and validation sets are shuffled upon generation. The CNN architectures were trained using six random dataset splits to assess their robustness in the presence of different data partitions.
The 3D CNN with the highest accuracy over the test dataset is chosen and transformed into a SBNN using variational inference. Then, the SBNN is trained, validated and tested using the same parameters mentioned above, except the loss function, which in a BNN can be defined as in expression (5). The bias and weights defining the parameter are given a Gaussian distribution for both the prior and variational posterior. The mean and 2 variance are initially set to 0 and 0.1 respectively. The Kullback-Leibler divergence function is weighted by 231, the number of training samples.
The Monte-Carlo number of iterations is set to n = 1000. Probability distributions per disease are obtained for each of the 49 test samples. For one sample, the mean value of every disease probabilities is chosen to obtain the predicted relative probabilities. The disease with the highest relative probability is taken as the final prediction.

RESULTS
In this section, the performance of the proposed SBNN for the classification of neurodegenerative disorders is evaluated. First, the overall classification accuracy of the five CNN architectures with and without the explicit spatially informed input is measured to quantify the spatial information incidence and to select a suitable architecture for the current application. The trained SBNN is used to predict the unknown testing dataset with 49 preprocessed and spatially informed MRI scans. The performance of the SBNN is discussed based on classification evaluation metrics. Since uncertainty is a relevant aspect in clinical diagnosis, confidence intervals are conducted for each neurodegenerative disorder given the test dataset. Finally, the predictions of three MRI scans from three different AD patients are constrained with the patients' clinical evaluation to assess the SBNN predictive properties. The proposed SBNN is implemented with the high-level neural networks API Keras using Python 3.7. The networks are trained on the Google Colab platform equipped with 12 GB NVIDIA Tesla K80 GPU and Xeon Processors CPU@2.3 GHz.

Spatially informed input
The five CNN architectures were trained using the same model hyper-parameters, training, validation, and test datasets. The accuracy for the testing dataset defined in Equation (8) is used to measure the performance of each CNN architecture and the effect of the spatially informed input on them. Table 4 shows the accuracy results of the five networks for one random split of the dataset including and excluding the spatially informed input. For the five architectures, omitting the spatially informed input, the accuracy oscillates between 49% to 64%. Among the CNN architectures, the lowest classification accuracy (49%) was obtained from the model of Reference 28. The highest accuracy for this group corresponds to our proposed architecture with a value of 64%. Although it represents a relatively high value compared to other architectures, it is considered a low value for classification purposes.
Furthermore, the accuracy for the five CNNs architectures with the spatially informed input ranges from 68% to 83%. Our architecture outperforms the state-of-the-art ones in terms of the accuracy of the testing dataset (83%). The architecture presented by Mzoughi et al 30 shows the second-highest accuracy among the five models. The CNN configuration from Reference 31 as well as the one from Reference 29 exhibit the lowest accuracy with a same value of 68%. A similar scenario occurred with other five random partitions of the dataset, that were considered to check for robustness in the results against the selected starting partition. Our architecture displays the highest testing accuracy in all splits ranging from 76.5% to 87.4%.
In each architecture, the accuracy for the testing dataset improved by adding the spatially informed input. Overall, the accuracies enhanced between 8% and 25%, four out of the five architectures presented an increment of more than 10%.
To visualize how the spatially informed input is improving the architectures accuracy, slices of a feature map for the first convolutional layer of our proposed architecture are shown in Figure 3. The feature map corresponds to a MRI scan of a MS patient. Typically, the areas colored in yellow and red in the feature map are called activated regions which represent the extracted key features from the input data. In the brain imaging context of neurodegenerative conditions, the activated regions express characteristic features associated with a neurodegenerative process, that is, brain regions affected by the neuronal damage.
It can be seen that the activated regions in the preprocessed MRI scan ( Figure 3A) are mainly located in the outer parts of the occipital and parietal lobes, which are unassociated manifestations in the brain produced by MS. 1 Contrarily, the activated regions in the spatially informed scan ( Figure 3B) are distributed over the brain. Some activations are found in the front and parietal lobes as small polygons which represent the white matter lesions caused by the disease. Likewise, other activated regions emphasize the corpus callosum and the cerebellum. One manifestation of MS is the slimming of these structures. Other activated areas might reflect possible cerebral atrophy in most parts of the lobes.
The combination of the preprocessed and spatially informed MRI scan boosts the CNN detection of physiological changes in the brain produced by a neurodegenerative disease. Note that abnormalities identified by the CNN only inform about the current type of degenerative disorder but do not provide any insight about disease staging or prognosis. Furthermore, although activation regions in the feature maps matched some known neurodegenerative disorders biomarkers, these cannot be directly considered informative neuroimaging features.

Model evaluation
As our proposed CNN architecture outperformed the four state-of-the-art ones, it was transformed using variational inference to include uncertainty (our SBNN approach). Initially, the CNN model without Bayesian inference was trained and tested. It showed an overall (testing) accuracy of 83%. When the stochastic model, that is, the prior and surrogate posterior over the CNN parameters were incorporated in the CNN (Bayesian CNN), the accuracy decreased to 81.5%. A loss of 1.5% occurred by introducing the stochastic model to the neural network. Although the CNN accuracy presented a reduction, confidence in the predictions can be inferred from the hybrid model. It is a fair trade-off between accuracy and inclusion of uncertainty.
Since now we have access to the probability distributions associated with the model parameters, we evaluated the model with 1000 trials and the 48 MRI scans of the test dataset to obtain an average accuracy of 81.5% with a SE of 1.5%. Evaluation metrics for each disease were calculated based on the averaged multi-class confusion matrix for the 1000 trials. Macro average and weighted average metrics are also provided to evaluate the model performance with consideration to the unbalanced number of samples per class. Table 5 shows the performance evaluation for the entire model and for each class.
As observed in Table 5, the highest sensitivity with a perfect score of 1.00 was obtained for the classification of the diseases MS and PD. Healthy and AD sensitivities are highly accurate with values of 88% and 76%, respectively. MCI presents the lowest sensitivity (25%) of the group; only one out of four MCI MRI scans is correctly classified in the MCI class. Specificity for the five diseases varies from 50% to 100%. In MS, all the images were correctly classified: the specificity and sensitivity are perfect. In the case of a healthy subject, there is only a 14% (1-0.86) chance of a MRI scan being assigned to another disease. F 1 -score might be considered as an alternative precision measure since there TA B L E 6 95% confidence intervals for the mean of the predicted probabilities for each disease is an uneven class distribution in the test dataset. In terms of the F 1 -score, most of the diseases perform well except for MCI. MS exhibits a perfect score and both PD and healthy scores are considerably high. AD F 1 -score is lower than their counterparts sensitivity and specificity due to the wrong classification of MCI MRI scans into the category. Most of the classes met the criteria considered suitable to distinguish between the neurodegenerative diseases studied in this article.
Macro average resumes the evaluation metrics for all diseases assuming that the classes are balanced. Considering the unbalanced test dataset, the weighted average accuracy better summarizes these metrics. The SBNN has an overall sensitivity of 85%, a specificity of 81%, and an F 1 -score of 82%. These metrics are consistent with the SBNN averaged accuracy and support the potential predictive properties of the model for classifying neurodegenerative diseases.
A confidence interval indicates a likely range of values that encompasses the expected value with a certain probability; this is a measure of uncertainty. Table 6 introduces the 95% confidence intervals for each disease. MCI probabilities are lower compared to other diseases; this can be explained by the number of samples in the test data and by the overlapping features it presents concerning other categories such as AD and healthy. As well, MCI presents a wider range which translates as diagnostic variability given that it might be a precursor of AD or PD. MS MRI range of probabilities is almost perfect which means the SBNN can easily categorize it. MS distinguishes itself from other disorders as the neural damage appears as small lesions distributed over the brain. This distinctive characteristic allows our model to easily recognize it resulting in a narrow predicted mean interval. PD and healthy classification accuracy can rise up to 90%, a relatively high accuracy value considering the false positives coming from other classes (see Table 5). AD predictions are precise given the narrow range of its confidence interval. AD predicted mean is a sensible value considering that MRI scans from AD can be misclassified as MCI and PD. Various anatomical changes of the brain produced by early stages of AD coincide with those caused by MCI and PD. 3

Model predictions
The SBNN was executed 1000 times in each MRI scan to obtain the probability distributions per disease. The mean and SD values were extracted from the probability distributions to represent the predicted probability and its error class-wise. The class with the highest probability is considered the predicted class for that MRI scan. If the predicted class corresponds to the ground truth class, the classification is correct. If the predicted class differs from the ground truth class, the classification is considered incorrect. If none of the average probabilities per disease exceeds a threshold probability of 0.5 (the minimum acceptable probability), the MRI scan is not assigned to any of the five classes, that is, the classification is uncertain. This procedure was performed in the 48 images of the test dataset. Three prediction outputs for three different AD patients' MRI scans are presented in Figure 4. The probability distributions are log-transformed to improve visualization; the closer the values are to 1, the higher the probability is. This information is provided by the ADNI database. The results were compared to the diagnosis summary of the clinical evaluation of the patients.
The first MRI scan ( Figure 4A) corresponds to a patient diagnosed with AD with mild dementia symptoms. Cerebral atrophy is evident in the patient's MRI. The SBNN correctly classified the MRI scan; the mean probability is 0.83 and the error is 0.13. The average probabilities for the other diseases are very low, and in most cases, are equal to 0. There is an 83% chance that the MRI scan belongs to an AD patient. The second scan ( Figure 4B) belongs to an Alzheimer's patient with mild declining dementia impairment. In the clinical diagnosis, the patient was initially diagnosed with MCI. However, due to the frequent movement disorders reported in the cognitive assessment test, it was determined that the patient resided in the Alzheimer's disease group. According to the clinical summary, the diagnosis is provisional, and further sessions are needed to assess the patient and provide a final decision. This MRI scan is not classified into any disease even though the AD class (0.45) exhibits the highest probability among the group. The prediction also suggests that the patient presents PD features (0.24). This value matches the symptomatology reported in the cognitive test; PD is characterized by the loss of motor functions. The SBNN uncertain prediction for this scan reflects the clinical diagnosis of the patient. The third MRI scan ( Figure 4C) comes from an AD patient with no evident memory disorder according to the clinical assessment test. However, the mass spectrometry measurement in the biospecimens results shows low levels of the A -42 protein. Such A -42 level decrease is a known biomarker in early stages of AD. 37 The prediction for this patient's scan shows an incorrect classification. Healthy is the most likely predicted class with a probability of 0.55 and an error of 0.26. Three main aspects affect the classification: (i) the mean probability surpassed the decision threshold only by 0.05, (ii) the error is half of the predicted probability, and (iii) the mean probability for AD is 0.43. For this scan, given predicted probabilities, their errors, and the level of uncertainty of the SBNN Model, a clinician can decide whether to integrate or not more information to support their diagnosis.

DISCUSSION
In this work, the SBNN is presented to classify four neurodegenerative disorders, Alzheimer's disease, MCI, Parkinson's disease, and Multiple Sclerosis, and healthy subjects. To improve accuracy, the model took advantage of spatially informed MRI data (introduced as a regional brain segmentation) and used a Bayesian approach in which the model predictions are drawn with confidence.
The results show a substantial increment (up to 25%) in the classification accuracy of 3D CNNs by incorporating a spatially informed MRI scan. Such an increment is a direct consequence of enhancing the CNN feature detection through spatial information of the brain. While the features identified by our method might represent anatomical biomarkers, we used them for no other purpose than classification. Including the explicit spatial scan results in detailed differentiation and increased separability between diseases. These results support previous findings in brain imaging statistical studies 15,16 where disease classification was improved by adding spatial information. Although the segmentation MRI scan enhances the distinction of neurodegeneration processes, there is a limitation in this approach. In highly correlated diseases, such as Alzheimer's and MCI, the captured spatial changes are insufficient to separate these conditions unequivocally.
This research also proposes a CNN architecture able to identify and classify neurodegenerative processes. The CNN yielded an accuracy of 81.5% and 83% with and without variational inference, respectively. The proposed architecture understands and generalizes the most relevant features of the diseases of this study. To the best of our knowledge, the SBNN model is the first to investigate four distinct neurodegenerative diseases using MRI data.
An advantage of the SBNN model lies in the fact that it accounts for uncertainty. The SBNN model formulation explicitly allows for probabilistic estimation of the CNN parameters which are used in the convolutional layers. Additionally, the Bayesian framework considers inference on the predictions by summarizing the posterior predictive probabilities. Concerning the prediction of a neurodegenerative disorder, the probabilistic formulation of the SBNN model naturally accounts for stochastic uncertainty (inherent random noise) and epistemic uncertainty (ignored information), and therefore, better represents the uncertainty in clinical diagnosis. SBNN predictions can not only detect the most likely disease of a patient, but they can also inform about patients with atypical variants, numerous pathologies, early disease stages, or diseases ignored in this research.
The overall classification accuracy of the SBNN model shows that approximately 8 out of 10 MRI scans are classified correctly. However, there is room for improvement regarding individual disease accuracy. PD, MS and HL accuracy surpassed the overall accuracy, but MCI and AD could not. MCI's low accuracy is not surprising since many studies consider this condition an early stage of AD and a subtle transition between normal aging old patients and AD. 7 Most of the MCI MRI scans were wrongly classified by the model into the AD class. MS class presents perfect evaluation metrics. MS is the only condition whose manifestation in the brain occurs in the white matter 5 and thus, the spatially informed input (or brain structure segmentation scan) provides richer information for the classification of this disorder.
It is worth noting that data preprocessing is instrumental in comparing and analyzing MRI data coming from diverse data sources. This study indirectly proposes a data preprocessing pipeline to integrate MRI scans from neurodegenerative patients. Although preprocessing varies depending on the research purpose, most brain imaging classification studies need to standardize and clean data to guarantee accurate predictions. 23 This work shows that the proposed preprocessing schema could be implemented in neuroimaging studies involving numerous MRI data sources.
In future studies concerned with the same problem, the accuracy may be improved by including exogenous variables, related to clinical assessments, which account for uncaptured hallmarks of the diseases. Clinical trials suggest that MRI-based information coupled with information derived from clinical evaluations and individual characteristics lead to accurate differential diagnosis, 1,2,8 for example, psychological and cognitive tests that describes the antecedent clinical history and behavioral aspects of the patient. Likewise, use datasets that balance patient's information (age and sex) since research have shown that these variables define the population affected by neurodegenerative diseases. For example, The incidence of AD is higher amongst women than men. 6 A direct extension that might be considered is the addition of other spatial information, for example, relative coordinates of voxels, distances between brain regions, and voxel intensity densities. These alternative ways to encode spatial information 17 could enrich disease detection and account for subtle differences between the voxel intensities from two similar disorders, for example, AD and MCI. Another approach unexplored by this work but frequent in clinical practice is the inclusion of other MRI modalities. Although in most of the cases T 1 -weighted scans display neurodegeneration biomarkers, 3 other modalities such as T 2 -weighted and FLAIR scans can be used to support the findings of the former. In a more robust model, other modalities could be added to account for disease features that are undetectable in the T 1 -weighted sequence.
The findings of this study are expected to be of direct clinical interest for neurologists, physicians, and medical researchers. In a comprehensive overview, the most important contribution of this work is that it is one of the first works that consider an explicit spatial input for disease classification, and assists diagnosis with uncertainties. The spatially informed MRI data is not only limited to neurodegenerative diseases. Given its nature, it is extendable to other brain conditions such as tumors, gliomas, and strokes, in which brain damage is also spatially located. Likewise, the SBNN model can be brought to bear on other brain imaging modalities.

CONCLUSION
In conclusion, this article introduces a reliable automatic tool that can provide a fast and accurate diagnosis, and supports the diagnosis and treatment of neurodegenerative disorders. One of the main benefits of the proposed model is the combination of explicit spatial information in MRI data, multiple diseases, and the ability to generate probabilistic diagnoses.
The SBNN was conceived as a tool to facilitate automatic medical diagnosis. The Neurorom package transforms the raw MRI scans into prepossessed MRI scans to be the inputs of the SBNN. The R package can be installed into the R platform from https://github.com/DavidPayares/neuronorm. The SBNN implementation, including data loading, model training, and model evaluation, can be found at https://github.com/DavidPayares/SBNN. The sequential use of both tools produces an automatic pipeline to diagnose neurodegenerative diseases using MRI.