Optica Publishing Group

Multi-branch attention Raman network and surface-enhanced Raman spectroscopy for the classification of neurological disorders

Open Access

Abstract

Surface-enhanced Raman spectroscopy (SERS), a rapid, low-cost, non-invasive, ultrasensitive, and label-free technique, has been widely used in in-situ and ex-situ biomedical diagnostics. However, analyzing and interpreting the untargeted spectral data remains challenging because of the difficulty of designing an optimal data pre-processing and modelling procedure. In this paper, we propose a Multi-branch Attention Raman Network (MBA-RamanNet) with a multi-branch attention module, comprising a convolutional block attention module (CBAM) branch, a deep convolution module (DCM) branch, and branch weights, to extract more of the global and local information of characteristic Raman peaks that is most distinctive for classification tasks. CBAM, covering both channel and spatial aspects, is adopted to enhance the distinctive global information of Raman peaks. DCM is used to supplement the local information of Raman peaks. Autonomously trained branch weights fuse the features of each branch, thereby optimizing the global and local information of the characteristic Raman peaks for identifying diseases. Extensive experiments are performed on two different neurological disorder classification tasks using untargeted serum SERS data. The results demonstrate that MBA-RamanNet outperforms commonly used CNN methods, with an accuracy of 88.24% for the classification of healthy controls, mild cognitive impairment, Alzheimer’s disease, and Non-Alzheimer’s dementia, and an accuracy of 90% for the classification of healthy controls, elderly depression, and elderly anxiety.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Raman spectroscopy (RS), an optical spectroscopic technique, indirectly measures the vibrational states of a sample and offers fingerprint information, such as molecular composition and structure, through an inelastic scattering phenomenon [1,2]. Consequently, this technique has become a powerful tool in a wide range of application scenarios, especially in biomedical areas [3–5], where the chemical composition of biological samples can be determined without altering them and without specific target labelling. Thus, sample properties can be predicted to evaluate a patient’s disease state. However, traditional RS has an inevitable limitation: the intrinsically low quantum efficiency of the Raman effect results in a weak Raman signal, which makes high-quality Raman measurements challenging. Fortunately, an enhancement technique, surface-enhanced Raman spectroscopy (SERS) [6], has been invented. The Raman signal can be amplified by up to ${10}^{14}$ times through the synergy of the electromagnetic and chemical enhancement effects [7]. As a rapid, low-cost, non-invasive, and label-free technique, SERS has been widely used in in-situ and ex-situ biomedical diagnostics, including cancer [8–10], infection [11], and neurological disorders [12]. Because the present study, like most similar studies, uses the label-free version of SERS without specific target labelling, designing an optimal data pre-processing and modelling procedure is always an important step for analyzing and interpreting the untargeted spectral data.

Chemometrics, i.e., the application of mathematical and statistical methods to chemical measurements [13], is the main approach for Raman spectral analysis. For example, Savitzky-Golay (SG) filtering [14], the wavelet transform (WT) [15], polynomial fitting-based correction [16], and area normalization and min-max normalization [17] can be used for the denoising, baseline correction, and normalization steps in Raman data pre-processing. Machine learning (ML) models [13,18–20], such as principal component analysis (PCA), partial least squares (PLS), support vector machine (SVM), and K-nearest neighbors (KNN), are the prevailing approaches for Raman feature extraction and data modeling. However, in biomedical areas, because identification or classification problems in practical applications are usually complex multi-classification tasks, these cumbersome traditional methods struggle to extract features and to uncover intricate patterns in such high-dimensional Raman data.

In recent years, deep learning (DL) [21], as an end-to-end learning method, has shown great ability in data pre-processing, feature extraction, and modelling. DL-based chemometrics has been applied to Raman spectral data and outperforms conventional pipelines in applications such as mineral identification [22], cancer detection [23–26], and genotype screening [27].

Convolutional neural network (CNN) [28] is the first and most widely used DL architecture in spectral analysis. A traditional CNN can extract nonlinear spectral features and map them to the corresponding label space for classification tasks by combining the two major parts of its architecture: a feature extraction part and a classification part [29]. In the feature extraction part, convolutional layers, batch normalization (BN) layers, activation layers, and pooling layers are stacked directly. The input spectra are convolved with the kernels of the convolutional layers, and the output feature maps are passed to the next layer as its input. During training, the parameters of each convolutional layer are learned and the feature maps are updated accordingly. The pooling layers reduce the dimension of the feature maps by subsampling, which lowers the computational complexity. In the classification part, the final classification output is generated by mapping the feature maps of the last convolutional layer to the corresponding label space through fully connected layers and activation layers. For example, Liu et al. [30], Shin et al. [31], Zhu et al. [32], Lebrun et al. [33], and Qian et al. [25] all proposed traditional CNN structures for Raman spectral classification tasks, in which BN layers or Dropout layers are added in the classification part to improve the generalization ability and classification accuracy of the model.
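The convolve/activate/pool pipeline described above can be made concrete with a minimal NumPy sketch of one feature-extraction block for a 1D spectrum; all shapes and kernel sizes below are illustrative, not the configuration used in this paper.

```python
import numpy as np

def conv1d(x, kernels, bias):
    # x: (C_in, W); kernels: (C_out, C_in, K); valid convolution, no padding
    c_out, c_in, k = kernels.shape
    w_out = x.shape[1] - k + 1
    out = np.zeros((c_out, w_out))
    for o in range(c_out):
        for i in range(w_out):
            out[o, i] = np.sum(kernels[o] * x[:, i:i + k]) + bias[o]
    return out

def relu(x):
    return np.maximum(x, 0.0)

def maxpool1d(x, size=2):
    # subsample along the spectral axis to reduce dimension
    w = x.shape[1] // size
    return x[:, :w * size].reshape(x.shape[0], w, size).max(axis=2)

# toy spectrum: 1 channel, 16 points; 4 learned kernels of width 3
spectrum = np.sin(np.linspace(0, 3, 16))[None, :]
kernels = np.random.default_rng(0).normal(size=(4, 1, 3))
feat = maxpool1d(relu(conv1d(spectrum, kernels, np.zeros(4))))
print(feat.shape)  # (4, 7)
```

In a real model the kernels would be learned by backpropagation rather than drawn at random, and several such blocks would be stacked before the fully connected classifier.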

With the rapid development of CNNs, many studies have focused on modifying the feature extraction part of the CNN by introducing variant structures to enhance the spectral feature learning ability and the performance of models in spectral classification tasks. For example, as an important CNN variant, the deep residual network (ResNet) [34] can improve feature learning ability by introducing residual structures, which are used in the models designed by Shin et al. [24] and Bratchenko et al. [26]. Besides, the long short-term memory (LSTM) structure of recurrent neural networks can work on a sequential framework that considers all of the predecessor data [35]; thus, the global information of the data can be enhanced. Bratchenko et al. [26] applied the LSTM structure in combination with a CNN to provide more global spectral characteristic features for disease classification and achieved a distinguished result. In recent years, owing to its outstanding ability to focus on the unique parts of data when processing large amounts of information, the attention mechanism has been used as part of the network structure in Raman spectral analysis. By applying an attention mechanism [36] to a CNN, the model preferentially learns and focuses on the most distinctive and valuable feature regions in the spectral data. For example, a channel attention module is added to the final feature maps by Qiu et al. [37] to enhance the most important features in fusarium head blight (FHB) infection detection; a multi-scale dilated convolutional attention module is designed by Cai et al. [22] to strengthen or weaken different characteristic peaks in the Raman spectra of ore mineral samples; and a multi-head self-attention mechanism is proposed by Ren et al. [38] to extract more comprehensive features and achieve better classification performance in species blood and semen identification analysis. All these variants show superior performance over the traditional CNN models.

However, analyzing and interpreting untargeted spectral data is still a big challenge in biomedical diagnostics. Raman peaks, corresponding to specific vibration modes of biomolecules and appearing as spectral sequences with significant variations in intensity, are the prominent features for the classification of different diseases [39]. For diseases with a similar physiological and pathological background, the tiny differences between samples from patients with these diseases, or with different degrees of the same disease, make it very difficult to find characteristics based only on peak positions (wavenumber or frequency) [40]. Within a peak’s entire region, several other prominent features, including line shape and intensity changes, also contain useful local information to characterize the samples [41,42]. Therefore, extracting more characteristic Raman peaks across the whole Raman spectral sequence and then enhancing the global and local features of these characteristic Raman peaks (an example is shown in Fig. 1) may offer an alternative approach to feature extraction and to uncovering intricate patterns in untargeted biomedical spectral data.


Fig. 1. An example of the global and local features of one Raman peak in SERS spectrum. The global features of Raman peak show a sequence of Raman spectra covering the entire peak from the lowest to highest with different Raman shifts. The local features of Raman peak show a series of narrow sequences with abrupt local changes in the spectrum corresponding to intensity.


Previous DL studies lacked the ability to focus on the key information of characteristic Raman peaks. Traditional CNNs are typically limited by the size of the receptive field of their convolutional layers [43,44]. The convolutional kernel is usually used to regulate the receptive field of a convolutional layer. However, small kernels result in insufficient extraction of the important features of some characteristic Raman peaks [45], while large kernels increase the computational complexity and lead to overfitting problems [29]. In many variants of CNNs, the ability to extract prominent peak features is improved by designing residual structures or attention modules. Although these models have achieved good results in spectral classification, they still suffer from an incapability to learn the global or local information of some characteristic Raman peaks. Meanwhile, in the field of CNNs applied to Raman spectra, few studies have comprehensively compared the structures and performances of these models.

Given this, in this paper, a Multi-branch Attention Raman Network (MBA-RamanNet) is proposed, in which a multi-branch attention module (MBAM), including a convolutional block attention module (CBAM) branch, a deep convolution module (DCM) branch, and branch weights, is added after each base convolutional layer. In the CBAM branch, an attention mechanism covering both channel and spatial aspects is adopted to enhance the distinctive global information of Raman peaks from the features of the base convolutional layer. In the DCM branch, the local information of Raman peaks can be decoupled from the different channel features of the base convolutional layer. Besides, autonomously trained branch weights fuse the features of the base convolutional layer and each branch, thereby optimizing the global and local information at the Raman peaks. Further, in order to reveal which spectral features each branch of MBA-RamanNet focuses on most, and to answer the question of which regions of spectral features are important for the classification task, spectral highlighting is performed by visualizing the heatmaps of the last layer of feature maps based on the gradient-weighted class activation mapping (Grad-CAM) method.

In summary, the contributions of this paper are as follows:

  • 1. The MBA-RamanNet is proposed for different neurological disorders classification tasks via serum SERS data.
  • 2. The MBAM structure, including the CBAM branch, DCM branch, and branch weights, is designed to enhance the learning ability of CNN from the global and local information of characteristic Raman peaks.
  • 3. The effectiveness of each branch in MBAM structure is validated through extensive ablation studies and Grad-CAM-based spectral highlighting.
  • 4. Extensive experimental results demonstrate that the proposed MBA-RamanNet outperforms commonly used CNN methods and obtains the best performance in the classification of various neurological disorders via untargeted serum SERS data.

2. Methodology

In this section, we elaborate on MBA-RamanNet with four subsections regarding the overall structure, two core branches in MBAM, loss function, and Grad-CAM-based spectral visualization analysis.

2.1 Overview of MBA-RamanNet

In this paper, MBA-RamanNet is proposed as shown in Fig. 2. MBA-RamanNet has two main parts: the feature extraction part and the classification part. In the feature extraction part, the input SERS spectrum is sequentially passed through four base convolutional blocks (i.e., Conv Block1, Conv Block2, Conv Block3, Conv Block4) for feature learning. Each convolutional block (comprising a convolution layer, BN layer, ReLU activation layer, and Maxpooling layer) is followed by a multi-branch attention module (MBAM) that enhances the local information in the features and supplements more global information. Finally, all feature information is flattened and fed into the classification part. The classification part consists of three fully connected layers (i.e., FC1, FC2, and FC3) that map the feature information from the feature space into the label space. A BN layer, ReLU layer, and Dropout layer (dropout rate p = 0.5) follow the FC1 and FC2 layers to further enhance the classification ability and avoid overfitting. The final classification result is obtained by passing the outputs of the FC3 layer through a sigmoid activation function.


Fig. 2. The structure of Multi-branch Attention Raman Network (MBA-RamanNet).


2.2 Multi-branch attention module (MBAM)

Raman peaks, corresponding to specific vibration modes of biomolecules, appear as prominent features in the classification of different diseases. Local features of wavenumber or frequency at the peaks, as well as other prominent features, including line shape and intensity changes over a peak’s entire region, both contain useful information to characterize the samples. Therefore, the feature extraction part of a CNN model needs to pay more attention to extracting both the global and local features of these characteristic Raman peaks for classification tasks. Thus, the MBAM, i.e., the multi-branch attention module, which includes a CBAM branch, a DCM branch, and branch weights, is designed to enhance the global and local features of the characteristic Raman peaks. The detailed introduction is as follows.

Convolutional block attention module branch (CBAM). The CBAM designed by Woo et al. [46] includes a channel attention module (CAM) and a spatial attention module (SAM), and focuses on extracting more global information of characteristic Raman peaks. The input features infer two attention maps along the channel and spatial dimensions, respectively; the attention maps are then multiplied with the input features for adaptive feature refinement. The shared network of the CAM is composed of two convolutional layers instead of fully connected layers. The CAM handles the selection of different channel features, while the SAM allows the model to pay more attention to the spectral feature regions that are important for classification and to weaken the influence of irrelevant regions. The detailed operation is described in Fig. 3.


Fig. 3. The detailed operation of the input feature in CBAM branch, including (a) the input feature in the CAM; and (b) the channel-refined feature in the SAM.


Firstly, the channel-refined feature of the input feature from the base convolutional block is obtained after the CAM, as shown in Fig. 3(a). Specifically, a feature ${{F}_{1}}\in {\mathbb {R}}^{C\times W}$ is used as input, and the spatial information of ${{F}_{1}}$ is aggregated by average-pooling and max-pooling operations, after which two different spatial information vectors are generated, i.e., ${F}_{avg}^{C}\in {\mathbb {R}}^{C\times 1}$ and ${{F}_{max}^{C}}\in {{\mathbb {R}}^{C\times 1}}$. Both descriptors are then forwarded through the shared network, summed element-wise, and passed through a sigmoid function to produce the channel attention map ${{M}_{C}}\in {{\mathbb {R}}^{C\times 1}}$. Thus, the channel-refined feature ${{F}_{1}^{'}}\in {{\mathbb {R}}^{C\times W}}$ is obtained by the element-wise multiplication of ${M}_{C}$ with ${F}_{1}$. Then, the output feature is generated by utilizing the inter-spatial relationship of the channel-refined feature in the SAM, as shown in Fig. 3(b). Average-pooling and max-pooling operations on ${{F}_{1}^{'}}$ across the channel axis are performed to generate two maps, i.e., $F_{avg}^{S}\in {{\mathbb {R}}^{1\times W}}$ and ${F}_{max}^{S}\in {{\mathbb {R}}^{1\times W}}$. These two maps are then concatenated and convolved by a convolution layer to produce the spatial attention map ${{M}_{S}}\in {{\mathbb {R}}^{1\times W}}$ after a sigmoid function. The output feature ${F}_{2}$ is obtained by the element-wise multiplication of ${M}_{S}$ with ${F}_{1}^{'}$. The overall attention process can be summarized as below:

$$\begin{array}{l} {M_C}\left( {{F_1}} \right) = \sigma \left( {Net\left( {F_{avg}^C} \right) + Net\left( {F_{max}^C} \right)} \right) \\ {{F}_{1}^{'} = {M_C}\left( {{F_1}} \right) \otimes {F_1}} \\ {{M_S}\left( {{F}_{1}^{'}} \right) = \;\sigma \left( {Conv\left( {\left[ {F_{avg}^S;F_{max}^S} \right]} \right)} \right)} \\ {{F}_{2} = {M_S}\left( {{F}_{1}^{'}} \right) \otimes {F}_{1}^{'}} \end{array}$$
where $\sigma$ represents the sigmoid function, $Net$ represents the shared network, $Conv$ represents the convolution layer, $[\cdot ;\cdot ]$ denotes concatenation, and $\otimes$ denotes element-wise multiplication.
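The CAM and SAM operations of Eq. (1) can be illustrated with a minimal NumPy sketch for a single feature map; the shared network is reduced to one hypothetical linear map, and all sizes are illustrative rather than the authors' configuration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(f1, shared_net):
    # CAM: pool F1 over the spectral axis into two (C,) descriptors,
    # pass both through the shared network, sum, apply sigmoid
    m_c = sigmoid(shared_net(f1.mean(axis=1)) + shared_net(f1.max(axis=1)))
    return m_c[:, None] * f1            # channel-refined feature F1'

def spatial_attention(f1p, conv_w):
    # SAM: pool F1' across channels into two (W,) maps, concatenate,
    # convolve with a shared kernel, apply sigmoid
    maps = np.stack([f1p.mean(axis=0), f1p.max(axis=0)])      # (2, W)
    k = conv_w.shape[1]
    padded = np.pad(maps, ((0, 0), (k // 2, k // 2)))         # same-length output
    m_s = sigmoid(np.array([np.sum(conv_w * padded[:, i:i + k])
                            for i in range(f1p.shape[1])]))
    return m_s[None, :] * f1p           # output feature F2

rng = np.random.default_rng(0)
f1 = rng.normal(size=(4, 32))                 # (C, W) input feature
net_w = 0.1 * rng.normal(size=(4, 4))         # hypothetical shared network
f2 = spatial_attention(channel_attention(f1, lambda v: net_w @ v),
                       0.1 * rng.normal(size=(2, 7)))
print(f2.shape)  # (4, 32)
```

Both attention maps only rescale the input, so the feature shape $(C, W)$ is preserved end to end.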

Deep convolution module branch (DCM). The DCM branch is designed to extract deep local features of Raman peaks by using small convolutional kernels. It consists of two convolutional layers (kernel size of 1$\times$1), with one BN layer and one ReLU layer between them. The additional convolutional layers increase the nonlinear knowledge of the input features from the base convolutional block without changing the feature scale (i.e., without loss of the Raman spectral sequence). The features of the spectral sequences over the peaks’ entire regions from different channels are integrated. Then, the potential local information is decoupled to complement the global information of the Raman peaks. Specifically, the input feature ${{F}_{1}}\in {{\mathbb {R}}^{C\times W}}$ is processed by the two convolutional layers to generate the output feature ${{F}_{3}}\in {{\mathbb {R}}^{C\times W}}$, which denotes the deep feature of the potential local information of the Raman peaks, as shown in Fig. 4. In short, the DCM branch is computed as:

$${{F}_{3}}=Con{{v}_{2}}\left( f\left( Con{{v}_{1}}\left( {{F}_{1}} \right) \right) \right)$$
where $f$ represents the BN layer and ReLU layer following $Con{{v}_{1}}$.


Fig. 4. The detailed operation of the input feature in DCM branch.

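Equation (2) can be illustrated with a minimal NumPy sketch of the two pointwise (1$\times$1) convolutions with BN and ReLU between them; the inference-style BN and all shapes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def pointwise_conv(x, weights, bias):
    # 1x1 convolution: mixes channels at each spectral position; W is unchanged
    # x: (C_in, W), weights: (C_out, C_in) -> output (C_out, W)
    return weights @ x + bias[:, None]

def batch_norm(x, eps=1e-5):
    # inference-style per-channel normalization over the spectral axis
    return (x - x.mean(axis=1, keepdims=True)) / \
           np.sqrt(x.var(axis=1, keepdims=True) + eps)

def dcm(f1, w1, b1, w2, b2):
    # F3 = Conv2(f(Conv1(F1))) with f = BN + ReLU, cf. Eq. (2)
    h = np.maximum(batch_norm(pointwise_conv(f1, w1, b1)), 0.0)
    return pointwise_conv(h, w2, b2)

rng = np.random.default_rng(0)
f1 = rng.normal(size=(8, 32))                  # (C, W) feature from a conv block
f3 = dcm(f1, rng.normal(size=(8, 8)), np.zeros(8),
             rng.normal(size=(8, 8)), np.zeros(8))
print(f3.shape)  # (8, 32) -- the spectral scale is preserved
```

Because the kernels have width 1, no spectral positions are mixed: the branch only deepens the per-position channel interactions, which is what preserves the local peak structure.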

Branch weights. The trainable branch weights ${w}$ are designed to fuse the input feature ${F}_{1}$, the global feature ${F}_{2}$ of Raman peaks after the CBAM branch, and the potential local feature ${F}_{3}$ of Raman peaks after the DCM branch. The branch weights are parameters that can be trained, like other parameters (such as those of the convolutional kernels or the fully connected layers), during model learning, and their values are automatically modified according to the importance of the features in the different branches during training to achieve optimization. Thus, the final output feature ${{F}_{last}}\in {{\mathbb {R}}^{C\times W}}$ of the MBAM is obtained by the weighted summation of the features of each branch, as shown below.

$${{F}_{last}}=\mathop{\sum }_{i=1}^{3}{{w}_{i}}\times {{F}_{i}}$$
where ${w}_{i}$ represents the weight of each branch and ${F}_{i}$ represents the features of each branch.
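A minimal NumPy sketch of the fusion in Eq. (3), with hypothetical weight values standing in for the jointly learned ones:

```python
import numpy as np

def mbam_fuse(f1, f2, f3, w):
    # Eq. (3): weighted sum of the base (F1), CBAM (F2), and DCM (F3) features
    return w[0] * f1 + w[1] * f2 + w[2] * f3

rng = np.random.default_rng(1)
f1, f2, f3 = (rng.normal(size=(4, 16)) for _ in range(3))
w = np.array([0.5, 0.3, 0.2])   # hypothetical values; trained with the model
f_last = mbam_fuse(f1, f2, f3, w)
print(f_last.shape)  # (4, 16)
```

In the actual network the three weights would be registered as trainable parameters and updated by the same optimizer as the rest of the model.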

2.3 Loss function

In the classification part, three fully connected layers are designed to map the features of the Raman spectra to the label space of the sample classes. The final output is the probability corresponding to the input label. Due to the class imbalance in the dataset, a weighted cross-entropy loss function [47] is used as the loss function in the training process. Specifically, the class weights are first calculated by dividing the total spectral number by $N$ times the spectral number of each class in the training dataset, as shown in Eq. (4); the weighted loss is then obtained from the cross-entropy loss and the class weights, as shown in Eq. (5).

$$Class\_weight\left[ C \right]=\frac{\mathop{\sum }_{i=1}^{N}Quantity\left[ i \right]}{N\times Quantity\left[ C \right]}$$
where $N$ is the number of total classes in the training dataset, $Quantity\left [C \right ]$ is the spectral number of the $C$th class, and $Class\_weight\left [C \right ]$ is the class weight of the $C$th class, where $C\in \left \{ 1,\ldots ,N \right \}$.
$$Weighted\_loss=\frac{\mathop{\sum }_{i=1}^{n}Class\_weight\left[ label\left[ i \right] \right]\times \left( -\log \frac{\exp \left( output\left[ i,label\left[ i \right] \right] \right)}{\mathop{\sum }_{j=1}^{N}\exp \left( output\left[ i,j \right] \right)} \right)}{\mathop{\sum }_{i=1}^{n}Class\_weight\left[ label\left[ i \right] \right]}$$
where $n$ is the spectral number in the training dataset, $N$ is the number of total classes in the training dataset, $label\left [i\right ]$ is the label of $i$th spectrum, $output\left [i, j\right ]$ is the result of the classification part for $j$th class in $i$th spectrum.
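Equations (4) and (5) can be sketched directly in NumPy; the class counts and logits below are hypothetical toy values.

```python
import numpy as np

def class_weights(counts):
    # Eq. (4): total spectra / (N * per-class spectra count)
    counts = np.asarray(counts, dtype=float)
    return counts.sum() / (len(counts) * counts)

def weighted_cross_entropy(output, label, weights):
    # Eq. (5): per-spectrum cross-entropy scaled by its class weight,
    # normalized by the sum of the applied weights
    shifted = output - output.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    w = weights[label]
    return -(w * log_probs[np.arange(len(label)), label]).sum() / w.sum()

cw = class_weights([90, 60, 30])      # hypothetical per-class spectra counts
output = np.array([[2.0, 0.5, 0.1],   # raw model outputs (logits), 2 spectra
                   [0.2, 1.5, 0.3]])
label = np.array([0, 2])
loss = weighted_cross_entropy(output, label, cw)
```

Rare classes (here the third, with 30 spectra) receive a weight above 1, so their misclassifications contribute more to the loss, counteracting the class imbalance.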

2.4 Visualization analysis of spectral features

Gradient-weighted class activation mapping (Grad-CAM) [48] is a spectral highlighting method to objectively find the spectral region of interest (ROI), i.e., the specific Raman peaks that drive the final classification. Firstly, one spectrum is input into the model, and the gradient of the prediction result of the last fully connected layer is backpropagated to obtain the corresponding gradient vectors. Then, the gradient vector is globally averaged in each channel to obtain the weight of the corresponding feature vector. After that, all feature vectors are multiplied by their corresponding weights and summed. The result is activated by the ReLU function to obtain a heatmap along the wavenumber dimension. The heatmap represents the classification contribution of each wavenumber of the input spectrum, i.e., the specific Raman peaks corresponding to biomolecules, in which red indicates a high influence and blue a low influence on the final classification result. Thus, in order to reveal which spectral features each branch of MBA-RamanNet focuses on most, and to answer the question of which regions of spectral features are important for the classification task, the Grad-CAM method is adopted for spectral visualization analysis.
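Assuming the feature maps and their backpropagated gradients are already available, the Grad-CAM heatmap computation reduces to the following NumPy sketch (random arrays stand in for real model outputs):

```python
import numpy as np

def grad_cam_1d(feature_maps, gradients):
    # feature_maps, gradients: (C, W) from the last conv layer for one spectrum
    weights = gradients.mean(axis=1)                  # global average per channel
    cam = np.maximum((weights[:, None] * feature_maps).sum(axis=0), 0.0)  # ReLU
    return cam / cam.max() if cam.max() > 0 else cam  # scale for the heatmap

rng = np.random.default_rng(0)
cam = grad_cam_1d(rng.normal(size=(8, 64)), rng.normal(size=(8, 64)))
print(cam.shape)  # (64,) -- one contribution value per feature-map position
```

In practice the heatmap is interpolated back to the original wavenumber axis and overlaid on the spectrum, so high values align with the Raman peaks that drove the prediction.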

3. Experiments

In this section, some details about the sample preparation, datasets, and evaluation metrics used in this paper are first introduced. Then we conduct two groups of experiments and ablation studies to validate our model.

3.1 Sample preparation

In this study, all subjects were carefully selected from the Ningbo Kangning Hospital in China based on clinical assessments, biomedical imaging, and neuropsychological tests. This study was conducted in accordance with the Declaration of Helsinki. In detail, the experimental subjects consist of 91 healthy controls (HC), 61 mild cognitive impairment (MCI), 46 Alzheimer’s disease (AD), 23 Non-Alzheimer’s dementia (Non-AD, including 5 vascular dementia (VD), 8 Parkinson’s disease (PD), 9 Lewy body dementia (LBD), and 1 frontotemporal dementia (FTD)), 61 elderly depression (ED), and 22 elderly anxiety (EA). 3 ml of peripheral blood was collected from these subjects after overnight fasting and then immediately stored at $-80\,^{\circ }\mathrm{C}$ until the SERS measurements.

3.2 Datasets

SERS measurements were collected from serum samples of the above subjects according to the method in our previous research [18,23]. After SERS measurements, two datasets were constructed for different neurological disorders classification tasks. More details about these two datasets (i.e., Dataset 1 and Dataset 2) are shown in Supplementary Document [see Additional file 1 and Table S1].

Each SERS spectrum in both Dataset 1 and Dataset 2 had 1266 features, and min-max normalization was applied to each spectrum to limit the spectral intensity to between 0 and 1, which reduces the influence of spectral intensity variability.
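The per-spectrum min-max normalization described above amounts to the following simple NumPy sketch (random arrays stand in for real spectra):

```python
import numpy as np

def min_max_normalize(spectra):
    # scale each spectrum's intensities into [0, 1] independently
    lo = spectra.min(axis=1, keepdims=True)
    hi = spectra.max(axis=1, keepdims=True)
    return (spectra - lo) / (hi - lo)

spectra = np.random.default_rng(0).normal(size=(3, 1266))  # 1266 features each
normalized = min_max_normalize(spectra)
```

Because the scaling is computed per spectrum, absolute intensity differences between measurements are removed while the relative peak shapes are preserved.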

5-fold cross-validation [49] was adopted to carry out an unbiased evaluation, in which the final SERS spectra of each category in each dataset were randomly split into 5 sets, and then one set was selected from each category and combined as the test dataset. Further, 20% of the remaining 4 sets were randomly selected for validation, and the rest were used for training. Due to the data-hungry nature of deep learning models, the five original SERS spectra corresponding to each final SERS spectrum in the training and validation datasets were also added.
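The per-category 5-set split described above can be sketched as follows; the class counts are hypothetical, and the validation split and the addition of the original spectra are omitted for brevity.

```python
import numpy as np

def five_fold_splits(labels, seed=0):
    # split each category's spectra into 5 random sets; fold k combines set k
    # of every category, so each test fold contains all classes
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(5)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        for k, part in enumerate(np.array_split(idx, 5)):
            folds[k].extend(part.tolist())
    return [np.array(sorted(f)) for f in folds]

labels = np.array([0] * 20 + [1] * 15 + [2] * 10)  # hypothetical class counts
folds = five_fold_splits(labels)
```

Splitting within each category rather than over the pooled dataset keeps the class proportions roughly constant across folds, which matters for imbalanced data.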

3.3 Evaluation metrics

Due to the class-imbalance in the dataset, accuracy, weighted precision, weighted recall, and weighted F1-score were selected to evaluate the classification ability of models. The equations are as below:

$$\begin{aligned} Accuracy=\frac{\mathop{\sum }_{C=1}^{N}\left( TP\left[ C \right]+TN\left[ C \right] \right)}{\mathop{\sum }_{C=1}^{N}\left( TP\left[ C \right]+TN\left[ C \right]+FP\left[ C \right]+FN\left[ C \right] \right)} =\frac{{TP}_{\left( Total \right)}+{TN}_{\left( Total \right)}}{Quantit{{y}_{\left( \text{Total} \right)}}} \end{aligned}$$
$$\begin{aligned} Sample\_weight\left[ C \right]=\frac{Quantity\left[ C \right]}{Quantit{{y}_{\left( \text{Total} \right)}}} \end{aligned}$$
$$\begin{aligned} weighted\, Precision=\mathop{\sum }_{C=1}^{N}Sample\_weight\left[ C \right] \times \frac{TP\left[ C \right]}{TP\left[ C \right]+FP\left[ C \right]} \end{aligned}$$
$$\begin{aligned} weighted\, Recall=\mathop{\sum }_{C=1}^{N}Sample\_weight\left[ C \right]\times \frac{TP\left[ C \right]}{TP\left[ C \right]+FN\left[ C \right]} \end{aligned}$$
$$\begin{aligned} weighted\, F1\mbox{-}score=\mathop{\sum }_{C=1}^{N}Sample\_weight\left[ C \right]\times \frac{2\times Precision\left[ C \right]\times Recall\left[ C \right]}{Precision\left[ C \right]+Recall\left[ C \right]} \end{aligned}$$
where $N$ is the number of total classes in the test dataset, $Quantity[C]$ is the spectra number of $C$th class. The $TP$, $FP$, $TN$, and $FN$ represent true positive, false positive, true negative, and false negative for $C$th class, respectively. ${TP}_{\left ( Total \right )}$ represents the total number of the true positive, ${TN}_{\left ( Total \right )}$ represents the total number of the true negative, and ${Quantity}_{\left ( Total \right )}$ is the total spectra number of classes.
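Equations (6)-(10) can be sketched directly in NumPy; the toy labels below are illustrative (note that, by construction, the weighted recall equals the accuracy):

```python
import numpy as np

def weighted_metrics(y_true, y_pred, n_classes):
    # Eqs. (6)-(10): per-class precision/recall/F1 weighted by class share
    total = len(y_true)
    acc = np.mean(y_true == y_pred)
    prec = rec = f1 = 0.0
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        w = np.sum(y_true == c) / total        # Sample_weight[C], Eq. (7)
        prec += w * p
        rec += w * r
        f1 += w * f
    return acc, prec, rec, f1

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
acc, prec, rec, f1 = weighted_metrics(y_true, y_pred, 3)
```

Weighting each class by its share of the test set keeps the summary metrics representative of the class-imbalanced data, unlike an unweighted (macro) average.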

To carry out an unbiased evaluation, 5-fold cross-validation was employed for training, and the sum of the confusion matrices over the 5 folds was used to evaluate the performance of the model. Besides, receiver operating characteristic (ROC) curves were further plotted and the area under the ROC curve (AUC) was calculated for the healthy control and different disease groups, respectively.

3.4 Overall validation

In order to comprehensively verify the classification capacity of our proposed MBA-RamanNet for different Raman spectra, a series of experiments was conducted on the two datasets for different neurological disorder classification tasks. Many popular CNN models applied to Raman spectra, proposed by Liu et al. 2017 [30], Shin et al. 2020 [24], Shin et al. 2023 [31], Zhu et al. 2023 [32], Lebrun et al. 2022 [33], Qian et al. 2022 [25], Qiu et al. 2022 [37], Cai et al. 2022 [22], Ren et al. 2023 [38], and Bratchenko et al. 2022 [26], were selected for comparison. Training, validation, and testing were performed on both Dataset 1 and Dataset 2, and the performance of all models was evaluated after 5-fold cross-validation.

Firstly, we conducted comparison experiments between our proposed MBA-RamanNet and the original models from the previous studies above. All models were trained with an early stopping strategy in order to avoid overfitting. Specifically, when the loss on the validation dataset no longer decreased for more than 10 epochs, training was stopped early and the optimal model was saved. The batch size was 4, and Adam was adopted as the optimizer to train all models with a learning rate of $1 \times {10 ^{-4}}$. As shown in the Supplementary Document [see Table S2], for both Dataset 1 and Dataset 2, MBA-RamanNet outperformed all these comparison models in accuracy, weighted precision, weighted recall, and weighted F1-score. On Dataset 1, an accuracy, weighted precision, weighted recall, and weighted F1-score of 88.24%, 88.79%, 88.24%, and 88.08% were achieved for the classification of healthy controls, mild cognitive impairment, Alzheimer’s disease, and Non-Alzheimer’s dementia. On Dataset 2, an accuracy, weighted precision, weighted recall, and weighted F1-score of 90.00%, 91.23%, 90.00%, and 89.72% were achieved for the classification of healthy controls, elderly depression, and elderly anxiety.
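The early stopping rule described above (stop once the validation loss has not improved for more than 10 epochs) can be sketched as a plain Python function over a sequence of validation losses; the loss values below are synthetic.

```python
def train_with_early_stopping(val_losses, patience=10):
    # stop once the validation loss has not improved for more than `patience`
    # epochs; return the best loss and the epoch at which it was reached
    best, best_epoch = float("inf"), -1
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch > patience:
            break
    return best, best_epoch

losses = [1.0, 0.8, 0.7, 0.65] + [0.66] * 15   # synthetic: plateaus after epoch 3
best, best_epoch = train_with_early_stopping(losses)
```

In practice the losses would be produced epoch by epoch inside the training loop, with the model checkpointed whenever `best` improves.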

Secondly, since different CNN models have unique classification parts that significantly affect performance, a further comparison experiment was performed on Dataset 1 and Dataset 2 to objectively evaluate the feature learning ability of these models, by replacing the original classification part of each model with the same classification part as in MBA-RamanNet. As shown in Table 1, with the same classification part, the performance of our MBA-RamanNet surpassed all comparison methods on both Dataset 1 and Dataset 2. It is worth noting that, after being given the same classification part as MBA-RamanNet, all these models achieved higher classification accuracy than with their original classification parts, which demonstrates that MBA-RamanNet has a powerful capability in feature extraction and classification, leading to superior performance.


Table 1. Performance evaluation with Accuracy on Dataset 1 and Dataset 2. Bold numbers indicate results after replacing with our classification part. The value after $\pm$ represents the standard error of the 5-fold cross-validation.

In addition, the confusion matrices and ROC curves of all these models with exactly the same classification part are plotted in Fig. 5 and Fig. 6. The overall confusion matrix shows the prediction summary, including the number of correct and incorrect predictions in each class. As shown in Fig. 5, although MBA-RamanNet did not achieve the highest AUC on Dataset 1, it reached 0.9579 and achieved the best accuracies when identifying MCI and AD. As shown in Fig. 6, on Dataset 2, MBA-RamanNet achieved the highest AUC value of 0.9727 and the best accuracy when identifying ED. In summary, our MBA-RamanNet obtained the best results on two different neurological disorder datasets owing to its ability to capture more of both the global and local features of characteristic Raman peaks.

Fig. 5. Confusion matrices and ROC curves for MBA-RamanNet and the other CNN models on Dataset 1. MCI, AD, and Non-AD denote the mild cognitive impairment, Alzheimer’s disease, and Non-Alzheimer’s dementia groups, respectively.

Fig. 6. Confusion matrices and ROC curves for MBA-RamanNet and the other CNN models on Dataset 2. ED and EA denote the elderly depression and elderly anxiety groups, respectively.

Previous studies have indicated close associations between neuropsychiatric disorders and metabolism, including mitochondrial energy metabolism [50], glucose metabolism [51], and amino acid metabolism [52]. These studies have also found that phospholipids and proteins [53,54], cortisol levels [55], and vitamins [56,57] in blood serum can serve as potential diagnostic biomarkers of neuropsychiatric disorders. Abnormal changes in these substances may be linked to symptoms of psychiatric disorders. For example, B vitamins are required for the proper functioning of the methylation cycle, monoamine oxidase production, DNA synthesis, and the repair and maintenance of phospholipids; vitamin B deficiency can affect memory function and contribute to cognitive impairment and dementia [56]. In addition, abnormal levels of zinc [58], iron [59], or copper [60] in neuropsychiatric disorders may also lead to a phospholipid-protein imbalance. Given this premise, SERS spectra of blood serum can provide rich molecular fingerprint information for patients with neuropsychiatric disorders. Although blood samples from these patients may differ only slightly owing to their similar physiological and pathological backgrounds, our proposed MBA-RamanNet achieved relatively high classification accuracy by extracting more characteristic Raman peaks across the whole Raman spectral sequence and then enhancing the global and local features of these peaks.

3.5 Overall analysis of model visualization

To analyze the experimental results in more depth and explain why MBA-RamanNet is more effective than the other comparison CNN methods, we conducted a spectral highlighting visualization based on Grad-CAM.

For Dataset 1 and Dataset 2, the original SERS spectra were first preprocessed with the Savitzky-Golay algorithm for denoising, a fifth-order polynomial fit for baseline removal, and min-max normalization to reduce intensity biases. Afterwards, the average SERS spectra of each category and the subtracted spectra between any two categories were obtained, which indicate the main characteristic Raman peaks, as shown in the upper parts of Fig. 7 and Fig. 8. Then, the coarse heatmaps of all models were obtained with the Grad-CAM method, as described in Section 2.4, to localize the important areas of the spectra in each category for these two neurological disorder classification tasks. Each heatmap represents the classification contribution of each wavenumber of the input spectrum in each category, i.e., of the specific Raman peaks corresponding to biomolecules, in which red indicates a high influence and blue a low influence on the final classification result.
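The preprocessing chain just described can be sketched in a few lines of NumPy/SciPy. The Savitzky-Golay window length and smoothing order below are illustrative assumptions; the paper specifies only the algorithm names and the fifth-order baseline polynomial.

```python
import numpy as np
from scipy.signal import savgol_filter


def preprocess(shifts, spectrum):
    """Savitzky-Golay denoising, fifth-order polynomial baseline removal,
    then min-max normalisation of a single Raman/SERS spectrum."""
    smoothed = savgol_filter(spectrum, window_length=11, polyorder=3)
    x = (shifts - shifts.mean()) / shifts.std()  # rescale for a stable polyfit
    baseline = np.polyval(np.polyfit(x, smoothed, 5), x)
    corrected = smoothed - baseline
    return (corrected - corrected.min()) / (corrected.max() - corrected.min())


# Synthetic spectrum: two Gaussian "peaks" on a sloping baseline plus noise.
shifts = np.linspace(400, 1800, 701)
peaks = (np.exp(-((shifts - 725) / 10) ** 2)
         + 0.8 * np.exp(-((shifts - 1135) / 12) ** 2))
raw = (peaks + 1e-4 * shifts
       + 0.01 * np.random.default_rng(0).standard_normal(shifts.size))
norm = preprocess(shifts, raw)
print(norm.min(), norm.max())  # 0.0 1.0
```

After this chain, every spectrum lies in [0, 1], so intensity differences between measurements no longer dominate the downstream classifier.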

Fig. 7. The upper part shows the average SERS spectra after preprocessing and the subtracted spectra between each class and the other classes, and the lower part shows the heatmaps of all models on Dataset 1, representing (a) healthy; (b) mild cognitive impairment (MCI); (c) Alzheimer’s disease (AD); (d) Non-Alzheimer’s dementia (Non-AD). The horizontal dimension of each heatmap represents different wavenumbers (i.e., corresponding to different biomolecules associated with Raman peaks), while the vertical dimension represents different serum samples in each group after 5-fold cross-validation. The values from 0 to 1 in the colour scale indicate the brightness of the heatmaps, in which red represents a high contribution and blue a low contribution of different Raman peaks to the final classification decision.

Fig. 8. The upper part shows the average SERS spectra after preprocessing and the subtracted spectra between each class and the other classes, and the lower part shows the heatmaps of all models on Dataset 2, representing (a) healthy; (b) elderly depression (ED); (c) elderly anxiety (EA). The horizontal dimension of each heatmap represents different wavenumbers (i.e., corresponding to different biomolecules associated with Raman peaks), while the vertical dimension represents different serum samples in each group after 5-fold cross-validation. The values from 0 to 1 in the colour scale indicate the brightness of the heatmaps, in which red represents a high contribution and blue a low contribution of different Raman peaks to the final classification decision.

As shown in Fig. 7 and Fig. 8, the heatmaps of the different models highlight various important areas in some categories. Among the traditional CNN models, the heatmaps of the models from Liu et al. 2017, Shin et al. 2023, and Lebrun et al. 2022 show several narrow red or yellow stripes at some characteristic Raman peaks, with small and sporadically distributed regions. The convolutional layer is mainly used for feature extraction in traditional CNN models, and the region of learned feature information is inextricably linked to the convolutional kernel of that layer [61]. Convolutional layers with small kernels in these models can extract local feature information that is useful for classification, but the features in the region around a Raman peak may be learned several times, resulting in information fragmentation. By contrast, the heatmaps of Zhu et al. 2023 show that the large convolutional kernel in this model tends to ignore many localized features when learning information from a wider region.

It can be seen that traditional convolutional layers alone are insufficient to extract enough useful features from complex spectral data. Many variants of the CNN model have been proposed to enhance the learning of Raman spectral features and improve classification accuracy. The variant proposed by Shin et al. 2020 adopted a residual structure to prevent the loss of information during deep feature learning of the spectra, and the corresponding heatmaps show that the most important features learned by the model mainly focus on the Raman peak regions. However, the highlighted regions are too broad, and more deeply localized information about characteristic Raman peaks was still lost. The model in the study of Bratchenko et al. 2022 was also designed with a residual structure, but differed in that an LSTM was added to extract more features from the global spectral sequence. LSTMs can process sequential data and remember long-term dependencies [35,62]. As shown in its heatmaps, much more local information of characteristic Raman peaks was highlighted across the global spectral sequence, while the global information of these Raman peaks remained incomplete. The reason might be that the LSTM, proven effective for data with temporal features, is not well suited to global feature learning of characteristic Raman peaks.

Owing to the attention mechanism's outstanding ability to focus on the important parts when processing large amounts of feature information, Qiu et al. 2022 incorporated a channel attention mechanism (CAM) and a residual structure to improve the learning of important features. However, too much local information and too little global information of Raman peaks was obtained, owing to the small convolutional kernel size of 1$\times$1. This might be why low accuracies were achieved in these two neurological disorder classification tasks. Ren et al. 2023 and Cai et al. 2022 tried to remedy these defects of the CNN by adopting modified attention mechanisms. Ren et al. 2023 combined a multi-head self-attention mechanism with the CNN model. Owing to the effective combination of one-dimensional convolution and multi-head self-attention, this model can learn the overall global features of characteristic Raman peaks, but the local information of the Raman peaks was slightly insufficient. Cai et al. 2022 designed a CNN model with multi-scale dilated convolutional attention, which mainly comprises a multi-scale dilated convolutional block and a channel-wise attention mechanism. As shown in Fig. 7(b) and Fig. 8(b), this model is better at focusing on the global information of Raman peaks than that of Ren et al. 2023, but it is likewise poor at focusing on the local information of these Raman peaks. In this paper, we proposed MBA-RamanNet with a multi-branch attention module comprising the convolutional block attention module (CBAM) branch, the deep convolution module (DCM) branch, and branch weights. From the heatmaps in Fig. 7(b) and Fig. 8(b), it can be seen that MBA-RamanNet is good at extracting characteristic Raman peaks across the whole spectral sequence and at strengthening the distinctive global and local information of these Raman peaks, which significantly distinguishes different categories of SERS spectra.
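To make the contrast concrete, the sketch below runs a forward pass through a CBAM-style channel-plus-spatial attention branch, a stand-in deep-convolution branch, and softmax-normalised branch weights on a 1-D feature map. This is pure NumPy with random stand-in weights and simplified pooling/convolutions; the actual MBAM branches are trained end-to-end, and their exact kernel sizes are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(x, w1, w2):
    """CBAM-style channel attention for a (C, L) feature map: average- and
    max-pool over the length axis, pass both through a shared two-layer MLP,
    and gate each channel with a sigmoid."""
    avg, mx = x.mean(axis=1), x.max(axis=1)  # (C,) each
    att = sigmoid(w2 @ np.maximum(w1 @ avg, 0) + w2 @ np.maximum(w1 @ mx, 0))
    return x * att[:, None]


def spatial_attention(x, k=7):
    """Simplified spatial attention: pool over channels, then a box filter
    stands in for the learned 1-D convolution."""
    pooled = x.mean(axis=0) + x.max(axis=0)  # (L,)
    att = sigmoid(np.convolve(pooled, np.ones(k) / k, mode="same"))
    return x * att[None, :]


def deep_conv_branch(x, k=3):
    """Stand-in DCM branch: small-kernel smoothing per channel to emphasise
    local structure around Raman peaks."""
    kernel = np.ones(k) / k
    return np.stack([np.convolve(c, kernel, mode="same") for c in x])


def mbam(x, w1, w2, alpha):
    """Fuse the identity, CBAM, and DCM branches with softmax-normalised,
    trainable branch weights (kept uniform in this sketch)."""
    w = np.exp(alpha) / np.exp(alpha).sum()
    cbam_out = spatial_attention(channel_attention(x, w1, w2))
    return w[0] * x + w[1] * cbam_out + w[2] * deep_conv_branch(x)


C, L, r = 8, 64, 4
x = rng.standard_normal((C, L))
w1 = rng.standard_normal((C // r, C))  # random stand-ins for trained MLP weights
w2 = rng.standard_normal((C, C // r))
y = mbam(x, w1, w2, alpha=np.zeros(3))
print(y.shape)  # (8, 64)
```

Because each branch preserves the (C, L) shape, the softmax-weighted sum can trade off global attention against local smoothing without any reshaping, which is what lets the branch weights be learned jointly with the rest of the network.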

3.6 Ablation study

Since the MBAM in MBA-RamanNet comprises the CBAM branch, the DCM branch, and branch weights, we conducted two ablation studies on the two datasets (i.e., Dataset 1 and Dataset 2) to verify the necessity of the multi-branch structure of the MBAM designed in MBA-RamanNet.

Ablation study A: The performance of the original CNN model without MBAM (i.e., Baseline) and of the original CNN model with MBAM (i.e., Baseline + MBAM) was compared, as shown in Table 2. The results indicate that MBAM can effectively improve the classification performance of the original CNN model in terms of accuracy, weighted precision, weighted recall, and weighted F1-score.

Table 2. Performance evaluation of ablation study A on Dataset 1 and Dataset 2. Bold numbers indicate the values of the MBA-RamanNet after adding MBAM. The value after $\pm$ represents the standard error of the 5-fold cross-validation.

Ablation study B: To further verify the effectiveness of each branch in MBAM, we designed the following scenarios for ablation study B: (1) Baseline: the original CNN model; (2) MB1A1: only the CBAM branch was added after the base convolutional layer of the Baseline, and the original features and the features learned by the CBAM branch were directly accumulated; (3) MB1A2: branch weights were added to the two branches of MB1A1; (4) MB2A1: only the DCM branch was added after the base convolutional layer of the Baseline, and the original features and the features learned by the DCM branch were directly accumulated; (5) MB2A2: branch weights were added to the two branches of MB2A1; (6) MB3A: both the CBAM branch and the DCM branch were added after the base convolutional layer of the Baseline, and the original features and the features learned by these two branches were directly accumulated; (7) MBAM: the complete multi-branch attention module. The detailed design of ablation study B is shown in the Supplementary Document [see Fig. S1].

Table 3 shows the performance in terms of accuracy, weighted precision, weighted recall, and weighted F1-score on Dataset 1 and Dataset 2, which demonstrates that performance varies with the module configuration. Both MB1A1 and MB2A1 performed better than the Baseline, indicating that either CBAM or DCM can effectively enhance the feature-learning ability on Raman spectral data. Furthermore, when both CBAM and DCM were added to the original CNN model, i.e., MB3A, the performance greatly exceeded that of MB1A1 and MB2A1. From the performance of MB1A2, MB2A2, and MBAM, it can be seen that the classification results were greatly improved when the features of the different branches were fused according to their importance, i.e., via branch weights. After combining CBAM, DCM, and branch weights with the original CNN model, MBAM showed the best classification performance in terms of accuracy, weighted precision, weighted recall, and weighted F1-score on both Dataset 1 and Dataset 2.

Table 3. Performance evaluation of ablation study B on Dataset 1 and Dataset 2. Bold numbers indicate the values of the MBA-RamanNet after adding MBAM. The value after $\pm$ represents the standard error of the 5-fold cross-validation.

In order to intuitively reveal which spectral features were most focused on by CBAM and DCM in MBA-RamanNet, the heatmaps of Baseline, the CBAM branch, the DCM branch, and MBAM were visualized by Grad-CAM on two datasets.
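The Grad-CAM step itself reduces to a few array operations. The sketch below uses random stand-ins for a convolutional layer's activations and its gradients with respect to a class score; in the real pipeline these come from a forward and backward pass through the trained 1-D CNN.

```python
import numpy as np


def grad_cam_1d(activations, gradients):
    """Grad-CAM for a 1-D spectral CNN: weight each feature channel by its
    gradient averaged over the wavenumber axis, sum the weighted activations,
    apply ReLU, and min-max normalise to [0, 1] to form the heatmap."""
    weights = gradients.mean(axis=1)  # (C,) per-channel importance
    cam = np.maximum((weights[:, None] * activations).sum(axis=0), 0.0)
    span = cam.max() - cam.min()
    return (cam - cam.min()) / span if span > 0 else cam


# Synthetic stand-ins for a convolutional layer's output and its gradient
# w.r.t. a class score (in practice obtained by backpropagation).
rng = np.random.default_rng(1)
acts = rng.random((16, 640))          # 16 channels x 640 wavenumber positions
grads = rng.standard_normal((16, 640))
heat = grad_cam_1d(acts, grads)
print(heat.shape, heat.min(), heat.max())
```

The resulting vector is what the figures render as a colour strip per sample, with values near 1 (red) marking wavenumbers that contributed most to the predicted class.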

As shown in Fig. 9 and Fig. 10, the CBAM branch tended to focus on the global information of characteristic Raman peaks, for example 494, 639, 725, 1135, and 1654 cm$^{-1}$ in Fig. 9(b); 589, 725, 959, and 1581 cm$^{-1}$ in Fig. 9(c); 725, 1135, and 1443 cm$^{-1}$ in Fig. 9(d); as well as 812, 959, 1093, and 1443 cm$^{-1}$ in Fig. 10(a); and 494, 589, 725, 1135, 1206, and 1654 cm$^{-1}$ in Fig. 10(b). The DCM branch, by contrast, was more prone to supplement local information of these Raman peaks, for example 639, 725, and 1135 cm$^{-1}$ in Fig. 9(a); 589, 959, 1073, and 1581 cm$^{-1}$ in Fig. 9(c); 725, 1135, and 1443 cm$^{-1}$ in Fig. 9(d); as well as 494, 639, 725, and 1135 cm$^{-1}$ in Fig. 10(a); and 639, 725, 1135, and 1654 cm$^{-1}$ in Fig. 10(c). Raman peaks were tentatively assigned to specific vibrational modes and biomolecules based on previous research [18,23]. Overall, the Raman peaks at 494, 639, 725, 1135, and 1654 cm$^{-1}$ make significant contributions to the final classification decision, which indicates that cellulose, guanine, L-arginine, L-tyrosine, lactose, adenine, coenzyme A, D-mannose, phospholipids, amide I, ${\alpha }$-helix, etc., may be important biomarkers for distinguishing neuropsychiatric disorders. Thus, by combining CBAM and DCM, MBAM showed a good ability to capture both global and local features of these characteristic Raman peaks on Dataset 1 and Dataset 2. The best performance in Table 3 indicates that the capabilities of global and local feature extraction are both crucial and beneficial to the proposed model.

Fig. 9. The average SERS spectra after preprocessing, and the heatmaps of Baseline, the CBAM branch, the DCM branch, and MBAM on Dataset 1, representing (a) healthy; (b) mild cognitive impairment; (c) Alzheimer’s disease; (d) Non-Alzheimer’s dementia. The horizontal dimension of each heatmap represents different wavenumbers (i.e., corresponding to different biomolecules associated with Raman peaks), while the vertical dimension represents different serum samples in each group after 5-fold cross-validation. The values from 0 to 1 in the colour scale indicate the brightness of the heatmaps, in which red represents a high contribution and blue a low contribution of different Raman peaks to the final classification decision.

Fig. 10. The average SERS spectra after preprocessing, and the heatmaps of Baseline, the CBAM branch, the DCM branch, and MBAM on Dataset 2, representing (a) healthy; (b) elderly depression; (c) elderly anxiety. The horizontal dimension of each heatmap represents different wavenumbers (i.e., corresponding to different biomolecules associated with Raman peaks), while the vertical dimension represents different serum samples in each group after 5-fold cross-validation. The values from 0 to 1 in the colour scale indicate the brightness of the heatmaps, in which red represents a high contribution and blue a low contribution of different Raman peaks to the final classification decision.

4. Conclusion

In this paper, we propose a Multi-branch Attention Raman Network (MBA-RamanNet) with a multi-branch attention module (MBAM) that addresses two neurological disorder classification tasks using untargeted SERS spectral data and deep learning. In the MBAM, the convolutional block attention module (CBAM) branch, the deep convolution module (DCM) branch, and branch weights are combined to extract more of the global and local information of characteristic Raman peaks that is most distinctive for classification. CBAM, covering channel and spatial aspects, is adopted to enhance the distinctive global information of characteristic Raman peaks, and DCM is used to supplement the local information of these peaks. Autonomously trained branch weights are applied to fuse the features of each branch, thereby optimizing the global and local information of the characteristic Raman peaks for identifying diseases. Extensive and detailed experiments performed on two different SERS spectral datasets indicate that MBA-RamanNet outperforms commonly used CNN methods and achieves the best performance in classifying various neurological disorders. Furthermore, the multi-branch attention module may provide a rapid and effective data analysis tool and could be readily extended to other medical classification algorithms. For future work, we will focus on further validating its applicability by exploring classification problems in more domains.

Funding

Zhejiang Provincial Natural Science Foundation of China (LQ23H180004); General scientific Research Project of Zhejiang Education Department (Y202146273); K. C. Wong Magna Fund in Ningbo University; Ningbo City Key R&D plan "Jie Bang Gua Shuai" (2023Z170).

Disclosures

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. M. Prochazka, Basics of Raman Scattering (RS) Spectroscopy (Springer International Publishing, 2016).

2. R. S. Das and Y. Agrawal, “Raman spectroscopy: recent advancements, techniques and applications,” Vib. Spectrosc. 57(2), 163–176 (2011). [CrossRef]  

3. G. W. Auner, S. K. Koya, C. Huang, et al., “Applications of Raman spectroscopy in cancer diagnosis,” Cancer Metastasis Rev. 37(4), 691–717 (2018). [CrossRef]  

4. W. Querido, S. Kandel, and N. Pleshko, “Applications of vibrational spectroscopy for analysis of connective tissues,” Molecules 26(4), 922 (2021). [CrossRef]  

5. E. S. Allakhverdiev, V. V. Khabatova, B. D. Kossalbayev, et al., “Raman spectroscopy and its modifications applied to biological and medical research,” Cells 11(3), 386 (2022). [CrossRef]  

6. X. X. Han, R. S. Rodriguez, C. L. Haynes, et al., “Surface-enhanced Raman spectroscopy,” Nat. Rev. Methods Primers 1(1), 87 (2022). [CrossRef]  

7. N. Guillot and M. L. de la Chapelle, “The electromagnetic effect in surface enhanced Raman scattering: enhancement optimization using precisely controlled nanostructures,” J. Quant. Spectrosc. Radiat. Transfer 113(18), 2321–2333 (2012). [CrossRef]  

8. J. Lei, D. Yang, R. Li, et al., “Label-free surface-enhanced Raman spectroscopy for diagnosis and analysis of serum samples with different types lung cancer,” Spectrochim. Acta, Part A 261, 120021 (2021). [CrossRef]  

9. A. Chakraborty, A. Ghosh, and A. Barui, “Advances in surface-enhanced Raman spectroscopy for cancer diagnosis and staging,” J. Raman Spectrosc. (2019).

10. E. Avci, H. Yilmaz, N. Sahiner, et al., “Label-free surface enhanced Raman spectroscopy for cancer detection,” Cancers 14(20), 5021 (2022). [CrossRef]  

11. L. Hamm, A. Gee, and A. D. S. Indrasekara, “Recent advancement in the surface-enhanced Raman spectroscopy-based biosensors for infectious disease diagnosis,” Appl. Sci. 9(7), 1448 (2019). [CrossRef]  

12. G. Cennamo, D. Montorio, V. B. Morra, et al., “Surface-enhanced Raman spectroscopy of tears: toward a diagnostic tool for neurodegenerative disease identification,” J. Biomed. Opt. 25(08), 1 (2020). [CrossRef]  

13. S. Guo, J. Popp, and T. W. Bocklitz, “Chemometric analysis in Raman spectroscopy from experimental design to machine learning–based modeling,” Nat. Protoc. 16(12), 5426–5459 (2021). [CrossRef]  

14. A. Savitzky and M. J. E. Golay, “Smoothing and differentiation of data by simplified least squares procedures,” Anal. Chem. 36(8), 1627–1639 (1964). [CrossRef]  

15. P. M. Ramos and I. Ruisánchez, “Noise and background removal in Raman spectra of ancient pigments using wavelet transform,” J. Raman Spectrosc. 36(9), 848–856 (2005). [CrossRef]  

16. T. J. Vickers, R. E. Wambles, and C. K. Mann, “Curve fitting and linearity: data processing in Raman spectroscopy,” Appl. Spectrosc. 55(4), 389–393 (2001). [CrossRef]  

17. O. Ryabchykov, S. Guo, and T. W. Bocklitz, “Analyzing Raman spectroscopic data,” Phys. Sci. Rev. 4(2), 1 (2019). [CrossRef]  

18. D. Yan, C. Xiong, Q. Zhong, et al., “Identification of late-life depression and mild cognitive impairment via serum surface-enhanced Raman spectroscopy and multivariate statistical analysis,” Biomed. Opt. Express 14(6), 2920–2933 (2023). [CrossRef]  

19. W. Zhang, J. S. Rhodes, A. Garg, et al., “Label-free discrimination and quantitative analysis of oxidative stress induced cytotoxicity and potential protection of antioxidants using Raman micro-spectroscopy and machine learning,” Anal. Chim. Acta 1128, 221–230 (2020). [CrossRef]  

20. N. A. Blake, R. Gaifulina, L. D. Griffin, et al., “Machine learning of Raman spectroscopy data for classifying cancers: a review of the recent literature,” Diagnostics 12(6), 1491 (2022). [CrossRef]  

21. F. Lussier, V. Thibault, B. Charron, et al., “Deep learning and artificial intelligence methods for Raman and surface-enhanced Raman scattering,” TrAC Trends Anal. Chem. 124, 115796 (2020). [CrossRef]  

22. Y. Cai, D. Xu, and H. Shi, “Rapid identification of ore minerals using multi-scale dilated convolutional attention network associated with portable Raman spectroscopy,” Spectrochim. Acta, Part A 267, 120607 (2022). [CrossRef]  

23. C. Xiong, S. Zhu, D. Yan, et al., “Rapid and precise detection of cancers via label-free SERS and deep learning,” Anal. Bioanal. Chem. 415(17), 3449–3462 (2023). [CrossRef]  

24. H. Shin, S.-R. Oh, S. Hong, et al., “Early-stage lung cancer diagnosis by deep learning-based spectroscopic analysis of circulating exosomes,” ACS Nano 14(5), 5435–5444 (2020). [CrossRef]  

25. H. Qian, X. Shao, H. Zhang, et al., “Diagnosis of urogenital cancer combining deep learning algorithms and surface-enhanced Raman spectroscopy based on small extracellular vesicles,” Spectrochim. Acta, Part A 281, 121603 (2022). [CrossRef]  

26. I. A. Bratchenko, L. A. Bratchenko, Y. A. Khristoforova, et al., “Classification of skin cancer using convolutional neural networks analysis of Raman spectra,” Comput. Methods Programs Biomed. 219, 106755 (2022). [CrossRef]  

27. S. Zhu, Y. Li, F. Zhang, et al., “Raman spectromics method for fast and label-free genotype screening,” Biomed. Opt. Express 14(6), 3072–3085 (2023). [CrossRef]  

28. X. Zhang, J. Xu, J. Yang, et al., “Understanding the learning mechanism of convolutional neural networks in spectral analysis,” Anal. Chim. Acta 1119, 41–51 (2020). [CrossRef]  

29. R. Luo, J. Popp, and T. W. Bocklitz, “Deep learning for Raman spectroscopy: a review,” Analytica 3(3), 287–301 (2022). [CrossRef]  

30. J. Liu, M. Osadchy, L. Ashton, et al., “Deep convolutional neural networks for Raman spectrum recognition: a unified solution,” The Analyst 142(21), 4067–4074 (2017). [CrossRef]  

31. H. Shin, Y. Kang, K. W. Choi, et al., “Artificial intelligence-based major depressive disorder (MDD) diagnosis using Raman spectroscopic features of plasma exosomes,” Anal. Chem. 95(15), 6410–6416 (2023). [CrossRef]  

32. J. Zhu, X. Jiang, Y. Rong, et al., “Label-free detection of trace level zearalenone in corn oil by surface-enhanced Raman spectroscopy (SERS) coupled with deep learning models,” Food Chem. 414, 135705 (2023). [CrossRef]  

33. A. Lebrun, H. Fortin, N. Fontaine, et al., “Pushing the limits of surface-enhanced Raman spectroscopy (SERS) with deep learning: Identification of multiple species with closely related molecular structures,” Appl. Spectrosc. 76(5), 609–619 (2022). [CrossRef]  

34. K. He, X. Zhang, S. Ren, et al., “Deep residual learning for image recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2016), pp. 770–778.

35. A. Sherstinsky, “Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network,” Phys. D 404, 132306 (2020). [CrossRef]  

36. Z. Niu, G. Zhong, and H. Yu, “A review on the attention mechanism of deep learning,” Neurocomputing 452, 48–62 (2021). [CrossRef]  

37. M. Qiu, S. Zheng, L. Tang, et al., “Raman spectroscopy and improved inception network for determination of fhb-infected wheat kernels,” Foods 11(4), 578 (2022). [CrossRef]  

38. P. Ren, R. Zhou, Y. Li, et al., “Raman convmsanet: A high-accuracy neural network for Raman spectroscopy blood and semen identification,” ACS Omega 8(33), 30421–30431 (2023). [CrossRef]  

39. Y. Wei, H. Chen, B. Yu, et al., “Multi-scale sequential feature selection for disease classification using Raman spectroscopy data,” Comput. Biol. Med. 162, 107053 (2023). [CrossRef]  

40. P. J. Larkin, Infrared and Raman Spectroscopy: Principles and Spectral Interpretation (Elsevier, 2011).

41. X. Cong, X.-L. Liu, M. Lin, et al., “Application of Raman spectroscopy to probe fundamental properties of two-dimensional materials,” npj 2D Mater. Appl. 4(1), 13 (2020). [CrossRef]  

42. M. S. Bradley, “Lineshapes in ir and Raman spectroscopy: A primer,” Spectroscopy 30, 42–46 (2015).

43. R. Gabbasov and R. Paringer, “Influence of the receptive field size on accuracy and performance of a convolutional neural network,” in 2020 International Conference on Information Technology and Nanotechnology (ITNT), (2020), pp. 1–4.

44. J.-H. Jacobsen, J. Van Gemert, Z. Lou, et al., “Structured receptive fields in CNNS,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2016), pp. 2610–2619.

45. J. Ding, M. Yu, L. Zhu, et al., “Diverse spectral band-based deep residual network for tongue squamous cell carcinoma classification using fiber optic Raman spectroscopy,” Photodiagn. Photodyn. Ther. 32, 102048 (2020). [CrossRef]  

46. S. Woo, J. Park, J.-Y. Lee, et al., “Cbam: convolutional block attention module,” Computer Vision–ECCV 2018 pp. 3–19 (2018).

47. T. H. Phan and K. Yamamoto, “Resolving class imbalance in object detection with weighted cross entropy losses,” arXiv:2006.01413 (2020). [CrossRef]  

48. R. R. Selvaraju, A. Das, R. Vedantam, et al., “Grad-cam: visual explanations from deep networks via gradient-based localization,” Int. J. Comput. Vis. 128(2), 336–359 (2020). [CrossRef]  

49. D. Normawati and D. P. Ismi, “K-fold cross validation for selection of cardiovascular disease diagnosis features by applying rule-based datamining,” Sig. Img. Proc. Lett 1(2), 23–35 (2019). [CrossRef]  

50. A. Misrani, S. Tabassum, and L. Yang, “Mitochondrial dysfunction and oxidative stress in Alzheimer’s disease,” Front. Aging. Neurosci. 13, 57 (2021). [CrossRef]  

51. F. Panza, V. Frisardi, D. Seripa, et al., “Metabolic syndrome, mild cognitive impairment and dementia,” Curr. Alzheimer Res. 8(5), 492–509 (2011). [CrossRef]  

52. R. Marijnissen, N. Vogelzangs, M. Mulder, et al., “Metabolic dysregulation as predictor for the course of late-life depression,” Eur. psychiatr. 33(S1), S416 (2016). [CrossRef]  

53. J. Depciuch, M. Sowa-Kućma, G. Nowak, et al., “Phospholipid-protein balance in affective disorders: analysis of human blood serum using Raman and FTIR spectroscopy. a pilot study,” J. Pharm. Biomed. Anal. 131, 287–296 (2016). [CrossRef]  

54. D. Pogocki, J. Kisała, and J. Cebulski, “Depression as is seen by molecular spectroscopy. phospholipid-protein balance in affective disorders and dementia,” Curr. Mol. Med. 20(6), 484–487 (2020). [CrossRef]  

55. M. Dadkhah, M. Jafarzadehgharehziaaddin, S. Molaei, et al., “Major depressive disorder: biomarkers and biosensors,” Clin. Chim. Acta 547, 117437 (2023). [CrossRef]  

56. K. Mikkelsen, L. Stojanovska, and V. Apostolopoulos, “The effects of vitamin B in depression,” Curr. Med. Chem. 23(38), 4317–4337 (2016). [CrossRef]  

57. P. P. Lerner, L. Sharony, and C. Miodownik, “Association between mental disorders, cognitive disturbances and vitamin d serum level: current state,” Clin. Nutr. ESPEN 23, 89–102 (2018). [CrossRef]  

58. J. Depciuch, M. Sowa-Kućma, G. Nowak, et al., “The role of zinc deficiency-induced changes in the phospholipid-protein balance of blood serum in animal depression model by Raman, FTIR and UV–Vis spectroscopy,” Biomed. Pharmacother. 89, 549–558 (2017). [CrossRef]  

59. P. K. Mandal, D. Dwivedi, S. Joon, et al., “Quantitation of brain and blood glutathione and iron in healthy age groups using biophysical and in vivo mr spectroscopy: potential clinical application,” ACS Chem. Neurosci. 14(12), 2375–2384 (2023). [CrossRef]  

60. C. N. Black, M. Bot, P. G. Scheffer, et al., “Uric acid in major depressive and anxiety disorders,” J. Affect. Disord. 225, 684–690 (2018). [CrossRef]  

61. S. Khan, H. Rahmani, S. A. A. Shah, et al., A Guide to Convolutional Neural Networks for Computer Vision, vol. 8 (Springer Cham, 2018).

62. J. Zhao, F. Huang, J. Lv, et al., “Do RNN and LSTM have long memory?” in Proceedings of the 37th International Conference on Machine Learning, vol. 119 (2020), pp. 11365–11375.




Figures (10)

Fig. 1.
Fig. 1. An example of the global and local features of one Raman peak in SERS spectrum. The global features of Raman peak show a sequence of Raman spectra covering the entire peak from the lowest to highest with different Raman shifts. The local features of Raman peak show a series of narrow sequences with abrupt local changes in the spectrum corresponding to intensity.
Fig. 2.
Fig. 2. The structure of Multi-branch Attention Raman Network (MBA-RamanNet).
Fig. 3.
Fig. 3. The detailed operation of the input feature in CBAM branch, including (a) the input feature in the CAM; and (b) the channel-refined feature in the SAM.
Fig. 4.
Fig. 4. The detailed operation of the input feature in DCM branch.
Fig. 5.
Fig. 5. Confusion matrices and ROC curves for MBA-RamanNet and other CNN models on Dataset 1. Where MCI, AD, Non-AD represents mild cognitive impairment group, Alzheimer’s disease group, Non-Alzheimer’s dementia group, respectively.
Fig. 6.
Fig. 6. Confusion matrices and ROC curves for MBA-RamanNet and other CNN models on Dataset 2. Where ED and EA represents elderly depression group and elderly anxiety group, respectively.
Fig. 7.
Fig. 7. The upper part shows the average SERS spectra after preprocessing, subtracted spectra between each class and other classes, and the lower part shows the heatmaps of all models in Dataset 1, which represents (a) healthy; (b) mild cognitive impairment (MCI); (c) Alzheimer’s disease (AD); (d) Non-Alzheimer’s dementia (Non-AD). The horizontal dimension of the heatmap represents different wavenumbers (i.e., corresponding to different biomolecules associated with Raman peaks), while the vertical dimension represents different serum samples in each group after 5-fold cross-validation. The number of 0 to 1 in the colour scale indicate the brightness in heatmaps, in which red color represents a high contribution while the blue color represents a low contribution of different Raman peaks to the final classification decision-making.
Fig. 8. The upper part shows the average SERS spectra after preprocessing and the subtracted spectra between each class and the other classes; the lower part shows the heatmaps of all models on Dataset 2, for (a) healthy; (b) elderly depression (ED); (c) elderly anxiety (EA). The horizontal dimension of each heatmap represents different wavenumbers (i.e., the biomolecules associated with different Raman peaks), while the vertical dimension represents the different serum samples in each group after 5-fold cross-validation. The colour scale from 0 to 1 indicates the contribution of each Raman peak to the final classification decision: red represents a high contribution and blue a low contribution.
Fig. 9. The average SERS spectra after preprocessing, and the heatmaps of the Baseline, CBAM branch, DCM branch, and MBAM on Dataset 1, for (a) healthy; (b) mild cognitive impairment; (c) Alzheimer’s disease; (d) Non-Alzheimer’s dementia. The horizontal dimension of each heatmap represents different wavenumbers (i.e., the biomolecules associated with different Raman peaks), while the vertical dimension represents the different serum samples in each group after 5-fold cross-validation. The colour scale from 0 to 1 indicates the contribution of each Raman peak to the final classification decision: red represents a high contribution and blue a low contribution.
Fig. 10. The average SERS spectra after preprocessing, and the heatmaps of the Baseline, CBAM branch, DCM branch, and MBAM on Dataset 2, for (a) healthy; (b) elderly depression; (c) elderly anxiety. The horizontal dimension of each heatmap represents different wavenumbers (i.e., the biomolecules associated with different Raman peaks), while the vertical dimension represents the different serum samples in each group after 5-fold cross-validation. The colour scale from 0 to 1 indicates the contribution of each Raman peak to the final classification decision: red represents a high contribution and blue a low contribution.

Tables (3)


Table 1. Performance evaluation with accuracy on Dataset 1 and Dataset 2. Bold numbers indicate results after replacing each model’s classification part with ours. The value after ± is the standard error over the 5-fold cross-validation.


Table 2. Performance evaluation of ablation study A on Dataset 1 and Dataset 2. Bold numbers indicate the values of MBA-RamanNet after adding the MBAM. The value after ± is the standard error over the 5-fold cross-validation.


Table 3. Performance evaluation of ablation study B on Dataset 1 and Dataset 2. Bold numbers indicate the values of MBA-RamanNet after adding the MBAM. The value after ± is the standard error over the 5-fold cross-validation.

Equations (10)


$$M_C(F_1) = \sigma\big(\mathrm{Net}(F_{avg}^{C}) + \mathrm{Net}(F_{max}^{C})\big), \qquad F_1' = M_C(F_1) \otimes F_1$$
$$M_S(F_1') = \sigma\big(\mathrm{Conv}([F_{avg}^{S};\, F_{max}^{S}])\big), \qquad F_2 = M_S(F_1') \otimes F_1'$$
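The channel-attention step above can be sketched in numpy. This is a minimal illustration, not the trained network: the shared network $\mathrm{Net}$ is assumed here to be a hypothetical two-layer MLP with ReLU, with placeholder weights `W1`, `W2`, operating on a 1-D feature map of shape (channels, length).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(F, W1, W2):
    """Sketch of M_C(F) = sigma(Net(F_avg^C) + Net(F_max^C)) and
    F' = M_C(F) * F for a 1-D feature map F of shape (channels, length).
    Net is assumed to be a shared two-layer MLP with weights W1, W2."""
    f_avg = F.mean(axis=1)                         # channel-wise average pooling
    f_max = F.max(axis=1)                          # channel-wise max pooling
    net = lambda v: W2 @ np.maximum(W1 @ v, 0.0)   # shared MLP with ReLU
    m_c = sigmoid(net(f_avg) + net(f_max))         # channel attention weights
    return m_c[:, None] * F                        # broadcast over the length axis
```

The spatial-attention step is analogous, pooling over the channel axis instead and convolving the concatenated average- and max-pooled maps.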
$$F_3 = \mathrm{Conv}_2\big(f(\mathrm{Conv}_1(F_1))\big)$$
$$F_{last} = \sum_{i=1}^{3} w_i \times F_i$$
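The fusion step is a weighted sum of the three branch outputs. As a minimal sketch, with hypothetical fixed weight values standing in for the autonomously trained branch weights $w_i$:

```python
import numpy as np

def fuse_branches(features, weights):
    """F_last = sum_i w_i * F_i: combine branch feature maps with scalar
    branch weights. In the network the weights are trained parameters;
    here they are just example values."""
    return sum(w * f for w, f in zip(weights, features))
```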
$$\mathrm{Class\_weight}[C] = \frac{\sum_{i=1}^{N} \mathrm{Quantity}[i]}{N \times \mathrm{Quantity}[C]}$$
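This inverse-frequency weighting gives rarer classes larger weights. A minimal sketch (the class counts below are hypothetical, not the paper's datasets):

```python
import numpy as np

def class_weights(quantity):
    """Class_weight[C] = (sum of all class counts) / (N * Quantity[C]),
    where N is the number of classes."""
    quantity = np.asarray(quantity, dtype=float)
    n_classes = len(quantity)
    return quantity.sum() / (n_classes * quantity)
```

For counts `[30, 10, 20]`, the middle (rarest) class receives the largest weight.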
$$\mathrm{Weighted\_loss} = \frac{\sum_{i}^{n} \left( \mathrm{Class\_weights}[\mathrm{label}[i]] \times \left[ -\log\left( \frac{\exp(\mathrm{output}[i,\,\mathrm{label}[i]])}{\sum_{j}^{N} \exp(\mathrm{output}[i,j])} \right) \right] \right)}{\sum_{i}^{n} \mathrm{Class\_weights}[\mathrm{label}[i]]}$$
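This is the standard class-weighted cross-entropy: each sample's negative log-softmax of its true-class logit is scaled by its class weight, and the sum is normalized by the total weight used. A numpy sketch under that reading:

```python
import numpy as np

def weighted_cross_entropy(output, labels, class_weights):
    """Weighted cross-entropy: per-sample -log softmax of the true-class
    logit, scaled by the class weight of its label, normalized by the
    sum of the weights applied."""
    output = np.asarray(output, dtype=float)
    labels = np.asarray(labels)
    w = np.asarray(class_weights, dtype=float)[labels]   # weight per sample
    logits = output - output.max(axis=1, keepdims=True)  # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    nll = -log_softmax[np.arange(len(labels)), labels]   # -log p(true class)
    return (w * nll).sum() / w.sum()
```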
$$\mathrm{Accuracy} = \frac{\sum_{C=1}^{N} (TP[C] + TN[C])}{\sum_{C=1}^{N} (TP[C] + TN[C] + FP[C] + FN[C])} = \frac{TP(\mathrm{Total}) + TN(\mathrm{Total})}{\mathrm{Quantity}(\mathrm{Total})}$$
$$\mathrm{Sample\_weight}[C] = \frac{\mathrm{Quantity}[C]}{\mathrm{Quantity}(\mathrm{Total})}$$
$$\mathrm{weighted\ Precision} = \sum_{C=1}^{N} \mathrm{Sample\_weight}[C] \times \frac{TP[C]}{TP[C] + FP[C]}$$
$$\mathrm{weighted\ Recall} = \sum_{C=1}^{N} \mathrm{Sample\_weight}[C] \times \frac{TP[C]}{TP[C] + FN[C]}$$
$$\mathrm{weighted\ F1\text{-}score} = \sum_{C=1}^{N} \mathrm{Sample\_weight}[C] \times \frac{2 \times \mathrm{Precision}[C] \times \mathrm{Recall}[C]}{\mathrm{Precision}[C] + \mathrm{Recall}[C]}$$
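The three support-weighted metrics above can all be computed from a single multiclass confusion matrix. A minimal sketch (the example matrix is hypothetical):

```python
import numpy as np

def weighted_metrics(conf):
    """Support-weighted precision, recall, and F1-score from a confusion
    matrix where conf[i, j] = samples of true class i predicted as j."""
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp                 # predicted as C but wrong
    fn = conf.sum(axis=1) - tp                 # true C but missed
    support = conf.sum(axis=1) / conf.sum()    # Sample_weight[C]
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return (support * precision).sum(), (support * recall).sum(), (support * f1).sum()
```

This mirrors the `average='weighted'` convention used by common metric libraries.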