Application of Chinese medicine evidence classification algorithm in the identification and treatment of Parkinson’s disease
Published Online: Jun 05, 2025
Received: Jan 15, 2025
Accepted: May 04, 2025
DOI: https://doi.org/10.2478/amns-2025-0973
© 2025 Danqi Zhang, published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
Chinese medicine originated in China and has a history of thousands of years. It is the traditional medicine of China, built on the theory and practical experience accumulated by the Han Chinese people in ancient China. Chinese medicine is a comprehensive science whose goal is to study the laws governing the transition between human health and disease, covering health care, rehabilitation, diagnosis, and treatment [1]. Within it, “evidence” (zheng, also translated as “syndrome” or “pattern”) is a specialized term: a general name for a series of interrelated symptoms, which are obtained through the four diagnostic methods of inspection, listening and smelling, inquiry, and palpation [2-4]. Evidence falls into six major categories, namely the eight principles, qi and blood, the six meridians, wei-qi-ying-blood, San Jiao, and the viscera, together with several subcategories [5-8]. In clinical applications of traditional Chinese medicine (TCM) syndrome classification, it is important to diagnose patients with different conditions accurately, i.e., to perform comprehensive syndrome classification.
Classification algorithms are an important technique in machine learning whose main goal is to assign the samples in a dataset to predefined categories [9]. They are widely used in real-world applications such as text categorization, image categorization, spam filtering, and recommender systems [10-13]. In essence, a classification algorithm constructs a classifier model from training samples and then uses the model to classify new, unknown samples. Depending on the feature representation and the classification idea, classification algorithms can be grouped into several families, such as decision trees, naive Bayes, K-nearest neighbors, logistic regression, support vector machines, neural networks, and ensemble learning methods such as random forests and gradient boosting trees [14-17]. The classification of TCM symptoms is complex and varied, and errors of judgment may arise from a practitioner’s limited expertise or from complex patient symptoms, resulting in medical errors. Classification algorithms can therefore assist medical practitioners in the classification process.
Bayesian classification is an important classification algorithm in data mining, and applying it to the identification of TCM evidence in Parkinson’s disease is of practical significance for the identification and treatment of the disease. Based on the Bayesian principle, this paper carries out a correlation analysis of the TCM evidence classification algorithm, identifies the key issues of evidence identification, and introduces a topic model framework into the algorithm to improve its efficiency. Finally, the ROC curve of the improved TCM evidence classification model is compared with those of the DenseNet121 and DAMNet models to evaluate its performance. This paper offers a useful point of comparison for applying TCM evidence classification algorithms to the identification and treatment of Parkinson’s disease, and it can serve as preliminary work and a foundation for their subsequent clinical use.
Parkinson’s disease is a neurodegenerative disease that is often characterized by slowness of movement and stiffness [18]. Chinese medicine, as a traditional Chinese medical practice, has begun to be applied in its treatment, for example through Chinese herbs [19] and acupuncture [20]. TCM evidence identification analyzes the patient’s systemic symptoms and carefully determines the nature of the disease in order to personalize treatment [21]. According to the classification of TCM syndromes, Parkinson’s disease falls into the types of qi and blood deficiency, liver and kidney yin deficiency, wind and yang internal movement, qi deficiency and blood stasis, and phlegm-heat and wind movement [22]. Moreover, these syndromes present different symptoms in the early, middle, and late stages of the disease. For this reason, the literature [23] combined TCM syndromes in a stage-wise manner, which was the first clinical treatment to link TCM syndromes with Parkinson’s disease. Similarly, the literature [24] used the classification of TCM syndromes for pattern recognition of Parkinson’s disease and, on that basis, analyzed the nature and location of the disease using hierarchical clustering; these analyses were found to provide evidence for the pathogenesis of Parkinson’s disease. The literature [25] also reported that hierarchical clustering can accurately distinguish the TCM evidence types of Parkinson’s disease. The literature [26] used a convolutional neural network to design an intelligent syndrome differentiation model that covers all classifications of TCM syndromes and is able to recognize conditions that fall outside them. The literature [27] used a graph convolutional network with a residual structure to combine patient state elements with symptoms in order to classify the evidence, and it outperformed algorithms such as support vector machines, random forests, and convolutional neural networks. Given the complexity of Parkinson’s disease, such algorithms can be combined.
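Conditional probability is an important concept in probability theory. Suppose $A$ and $B$ are two events of a random experiment.
Definition 1: Let events $A$ and $B$ satisfy $P(A) > 0$. Then
$$P(B \mid A) = \frac{P(AB)}{P(A)} \tag{1}$$
is the conditional probability of event $B$ given that event $A$ has occurred.
From Definition 1 of conditional probability, the multiplication theorem can be introduced.
Theorem 1 (Multiplication Theorem): Given any two events $A$ and $B$ with $P(A) > 0$,
$$P(AB) = P(A)\,P(B \mid A) \tag{2}$$
The multiplication theorem applies not only to two events, but also to the case of multiple events, assuming that $A_1, A_2, \ldots, A_n$ are $n$ events with $P(A_1 A_2 \cdots A_{n-1}) > 0$:
$$P(A_1 A_2 \cdots A_n) = P(A_1)\,P(A_2 \mid A_1) \cdots P(A_n \mid A_1 A_2 \cdots A_{n-1}) \tag{3}$$
Theorem 2: Let the sample space of experiment $E$ be $S$, let $A$ be an event of $E$, and let $B_1, B_2, \ldots, B_n$ be a partition of $S$ with $P(B_i) > 0$ for $i = 1, 2, \ldots, n$. Then
$$P(A) = \sum_{i=1}^{n} P(A \mid B_i)\,P(B_i) \tag{4}$$
Equation (4) is then called the full probability formula (the law of total probability).
With the conditional probability formula and the full probability formula in place, the next step is to introduce Bayes’ theorem.
Theorem 3 (Bayes’ Theorem): Let the sample space of experiment $E$ be $S$, let $A$ be an event of $E$ with $P(A) > 0$, and let $B_1, B_2, \ldots, B_n$ be a partition of $S$ with $P(B_i) > 0$ for $i = 1, 2, \ldots, n$. Then
$$P(B_i \mid A) = \frac{P(A \mid B_i)\,P(B_i)}{\sum_{j=1}^{n} P(A \mid B_j)\,P(B_j)}, \quad i = 1, 2, \ldots, n \tag{5}$$
Equation (5) is then called the Bayesian formula, i.e., Bayes’ theorem.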
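As a small numerical illustration of the full probability formula and Bayes’ theorem, the following sketch computes $P(A) = \sum_i P(A \mid B_i) P(B_i)$ and then the posterior $P(B_i \mid A)$ for every event of a partition. The function names and the numbers (three mutually exclusive syndrome types and one observed symptom) are hypothetical and are not taken from the paper’s data.

```python
def total_probability(priors, likelihoods):
    """P(A) = sum_i P(A | B_i) * P(B_i) over a partition B_1..B_n."""
    return sum(l * p for l, p in zip(likelihoods, priors))


def bayes_posterior(priors, likelihoods):
    """Return [P(B_i | A)] for every B_i of the partition."""
    evidence = total_probability(priors, likelihoods)
    if evidence == 0:
        raise ValueError("P(A) is zero; the posterior is undefined")
    return [l * p / evidence for l, p in zip(likelihoods, priors)]


# Hypothetical numbers: three mutually exclusive syndrome types B_1..B_3
priors = [0.5, 0.3, 0.2]        # P(B_i)
likelihoods = [0.2, 0.6, 0.9]   # P(A | B_i) for one observed symptom A
posterior = bayes_posterior(priors, likelihoods)
```

Although the first type has the largest prior here, the observed symptom shifts most of the posterior mass to the other two types, which is the behavior a Bayesian classifier relies on.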
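Let $P(h)$ denote the prior probability that hypothesis $h$ holds before any data are observed, $P(D)$ the prior probability of the observed data $D$, and $P(D \mid h)$ the probability of observing $D$ when $h$ holds. By Bayes’ theorem, the posterior probability of $h$ is
$$P(h \mid D) = \frac{P(D \mid h)\,P(h)}{P(D)} \tag{6}$$
Let $H$ be the set of candidate hypotheses. The most probable hypothesis given $D$, called the maximum a posteriori (MAP) hypothesis, is
$$h_{MAP} = \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} \frac{P(D \mid h)\,P(h)}{P(D)} \tag{7}$$
Because $P(D)$ does not depend on $h$, it can be dropped:
$$h_{MAP} = \arg\max_{h \in H} P(D \mid h)\,P(h) \tag{8}$$
Observing Equation (8), it can be observed that the MAP hypothesis is determined only by the likelihood $P(D \mid h)$ and the prior $P(h)$.
In some special cases, Eq. (8) can be further simplified: when the prior probability of each hypothesis is the same, only the likelihood needs to be maximized, which gives the maximum likelihood (ML) hypothesis
$$h_{ML} = \arg\max_{h \in H} P(D \mid h) \tag{9}$$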
In practical data mining, data
The AODE (averaged one-dependence estimators) algorithm is an improved version of the NBC (naive Bayesian classifier) algorithm with better classification performance. Based on the data in the training set, it uses statistical methods to select parent attribute values from the attribute values of the test instance, thereby dividing the attributes into parent attributes and child attributes. It then assumes that the attribute values of an instance are mutually independent given the class label and the value of the parent attribute. This greatly relaxes the conditional independence assumption of the NBC algorithm and improves its classification performance. The structure of the AODE algorithm is shown in Figure 1.

Figure 1. AODE algorithm structure diagram
In summary, the AODE algorithm uses the idea of restricted one-dependence estimation and aggregates the results of multiple classifiers to improve classification accuracy and reduce the complexity of classifier construction. The basic principles of the AODE algorithm are described in detail below.
Given a test instance
The formula can be obtained:
The AODE algorithm refers to
And in Eq. (12)
In Eq. (13),
A computational advantage of AODE over the TAN algorithm or the SP-TAN algorithm is that it can be used directly for incremental learning. Updating an AODE classifier with a new training instance only requires incrementing the relevant entries in the frequency tables of joint attribute values and class labels.
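The incremental property can be illustrated with a minimal sketch. It assumes the commonly published AODE formulation, in which the score of class $y$ for an instance $x$ is the average over qualifying parent attributes $i$ of $\hat{P}(y, x_i) \prod_{j \neq i} \hat{P}(x_j \mid y, x_i)$, with Laplace-smoothed frequency estimates. The class name `AODE`, its parameters, the fallback to all parents when none reaches the frequency threshold, and the toy data are illustrative assumptions, not the authors’ implementation.

```python
from collections import defaultdict


class AODE:
    """Minimal averaged one-dependence estimator for discrete attributes."""

    def __init__(self, n_values, classes, m=1):
        self.n_values = n_values      # number of distinct values per attribute
        self.classes = list(classes)
        self.m = m                    # minimum frequency for a parent value
        self.n = 0                    # training instances seen so far
        self.value_count = defaultdict(int)   # (i, x_i) -> F(x_i)
        self.parent_count = defaultdict(int)  # (y, i, x_i) -> F(y, x_i)
        self.pair_count = defaultdict(int)    # (y, i, x_i, j, x_j) -> F(y, x_i, x_j)

    def update(self, x, y):
        """Incremental learning: a new instance only increments counters."""
        self.n += 1
        for i, xi in enumerate(x):
            self.value_count[(i, xi)] += 1
            self.parent_count[(y, i, xi)] += 1
            for j, xj in enumerate(x):
                if j != i:
                    self.pair_count[(y, i, xi, j, xj)] += 1

    def scores(self, x):
        """Average the one-dependence estimates over all qualifying parents."""
        parents = [i for i, xi in enumerate(x)
                   if self.value_count.get((i, xi), 0) >= self.m]
        if not parents:               # simplification: fall back to all parents
            parents = list(range(len(x)))
        k = len(self.classes)
        result = {}
        for y in self.classes:
            total = 0.0
            for i in parents:
                f_parent = self.parent_count.get((y, i, x[i]), 0)
                # Laplace-smoothed estimate of P(y, x_i)
                p = (f_parent + 1) / (self.n + k * self.n_values[i])
                for j, xj in enumerate(x):
                    if j != i:
                        f_pair = self.pair_count.get((y, i, x[i], j, xj), 0)
                        # Laplace-smoothed estimate of P(x_j | y, x_i)
                        p *= (f_pair + 1) / (f_parent + self.n_values[j])
                total += p
            result[y] = total / len(parents)
        return result

    def predict(self, x):
        s = self.scores(x)
        return max(s, key=s.get)


# Toy data: three binary symptoms, two hypothetical syndrome classes
model = AODE(n_values=[2, 2, 2], classes=["A", "B"])
for x, y in [((1, 1, 0), "A"), ((1, 1, 0), "A"), ((1, 0, 0), "A"),
             ((0, 0, 1), "B"), ((0, 1, 1), "B"), ((0, 0, 1), "B")]:
    model.update(x, y)
```

Because `update` touches only the counters, a new case can be added at any time without retraining, which is the advantage over TAN-style structure learning described above.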
Figure 2 shows the classification process of the TCM evidence classification algorithm based on the Bayesian approach.

Figure 2. Classification process of the TCM syndrome classification algorithm
The process goes through four stages: data preparation, feature extraction, classification training, and classification testing. Each stage is explained below in the context of the differential diagnosis of Parkinson’s syndromes in TCM.
Data is the basis of data mining and a crucial part of the differential diagnosis of Parkinson’s evidence. In this project, 200 clinical Parkinson’s cases dating from 1999 to 2008 were collected from one of the top 100 hospitals in the country, and 60 Parkinson’s TCM cases were extracted from various TCM books. Because the cases differ in era (ancient and modern) and in author (there are many TCM schools), and because no common standard exists, the collected cases contain much noisy, redundant, sparse, and incomplete data. We therefore first excluded cases with incomplete records or serious irregularities, as well as cases whose main evidence was not Parkinson’s, leaving 215 valid cases. The cases were then standardized and described uniformly; 190 were selected as the training set and the remaining 25 as the test set, with both sets covering all five types of evidence.
Chinese medicine symptoms can be divided into three main categories according to their degree of influence on the disease: characteristic symptoms, primary symptoms, and secondary symptoms. In the TCM symptom classification system, however, these differences can be ignored and all symptoms treated uniformly. A symptom may take multiple values on a single concept. Fever, for example, may be low, moderate, or high; when constructing the feature vector space, low fever, moderate fever, and high fever are taken as three independent components rather than as three values of a single component. The advantage is that there is no need to determine the category and degree of each symptom, for which TCM has no unified standard because its many schools each hold their own view. Ignoring the notion of degree and treating each grade as a separate dimension loses some precision, but it helps in judging TCM evidence quickly.
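The idea of turning each grade of a symptom into its own component can be sketched as follows. The helper names `build_vocabulary` and `encode` and the three example cases are hypothetical; the sketch only illustrates that “low fever”, “moderate fever”, and “high fever” become three separate 0/1 dimensions.

```python
def build_vocabulary(cases):
    """Collect every (symptom, grade) pair seen in the cases, in sorted order."""
    return sorted({(s, g) for case in cases for s, g in case.items()})


def encode(case, vocabulary):
    """One independent 0/1 component per (symptom, grade) pair."""
    return [1 if case.get(s) == g else 0 for s, g in vocabulary]


# Hypothetical cases: each maps a symptom to the grade recorded for it
cases = [
    {"fever": "low", "tremor": "present"},
    {"fever": "high"},
    {"fever": "moderate", "tremor": "present"},
]
vocabulary = build_vocabulary(cases)
vectors = [encode(case, vocabulary) for case in cases]
```

With this encoding no ordering between grades is imposed, which matches the trade-off described above: some precision is lost, but no agreed grading standard is required.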
The conventions of this paper:
Each sample of an unknown category is represented by a The categories are represented by the variable For a sample of a known category, we denote it by a A sample set consisting of samples of known categories is called a training sample set, and a training sample set
Parkinson’s disease is a progressive neurodegenerative movement disorder characterized by the accumulation of abnormal
In this section, from the perspective of objectification research, we model the process of diagnosing Parkinson’s disease in Chinese medicine as a TCM syndrome classification problem. The aim is to use the TCM syndrome classification algorithm to automate the inference of the syndrome type from the characteristics of the scales, so as to provide an auxiliary reference for doctors and to promote the standardization of TCM diagnosis of Parkinson’s disease.
TCM has established five major categories of Parkinson’s disease: phlegm-heat-driven wind, blood stasis-driven wind, qi and blood deficiency, liver and kidney deficiency, and yin and yang deficiency. Each can occur as a primary or a secondary syndrome. Parkinson’s patients usually present either a primary syndrome alone or a primary and a secondary syndrome together. If each syndrome type a patient has is treated as a label, determining the patient’s syndrome types becomes a multi-label classification problem.
In this chapter, the 91 symptoms recorded in the TCM scale through inspection, listening and smelling, inquiry, palpation, and tongue examination are used as data features. A feature is set to “1” if the symptom is present and to “0” if it is absent. Labels are handled in the same way: the 10 syndrome types obtained after the primary and secondary distinction serve as labels, and a label is set to “1” if the diagnosed patient has that syndrome type and to “0” otherwise. In the transformed multi-label data, the features therefore represent the symptoms and the labels represent the syndrome types.
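The label side of this transformation can be sketched as follows, assuming the five syndrome categories named in the text, each split into a main and a secondary label to give 10 labels. The function `encode_labels` and the ordering of `LABELS` are illustrative assumptions; the paper does not specify the column order.

```python
SYNDROMES = [
    "phlegm-heat-driven wind",
    "blood stasis-driven wind",
    "qi and blood deficiency",
    "liver and kidney deficiency",
    "yin and yang deficiency",
]
# 5 syndromes x {main, secondary} = 10 labels (column order is an assumption)
LABELS = [(s, role) for s in SYNDROMES for role in ("main", "secondary")]


def encode_labels(main, secondary=None):
    """Return the 10-dimensional 0/1 label vector of one patient."""
    if main not in SYNDROMES or (secondary is not None and secondary not in SYNDROMES):
        raise ValueError("unknown syndrome name")
    if secondary == main:
        raise ValueError("a syndrome cannot be both main and secondary")
    active = {(main, "main")}
    if secondary is not None:
        active.add((secondary, "secondary"))
    return [1 if label in active else 0 for label in LABELS]


# Hypothetical patient: liver and kidney deficiency (main),
# phlegm-heat-driven wind (secondary)
y = encode_labels("liver and kidney deficiency", "phlegm-heat-driven wind")
```

A patient with only a primary syndrome yields a vector with a single “1”, while a patient with both a primary and a secondary syndrome yields two, which is what makes the task multi-label.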
Each TCM scale is treated as one instance of the TCM evidence classification problem: the scale is an unknown sample, and the corresponding evidence type is the label of that sample. In this way, the process of TCM diagnosis of Parkinson’s disease, in which the evidence type is inferred from the TCM scale, is modeled as a TCM evidence classification problem: with reference to a given set of Parkinson’s samples of known category, the unknown sample is predicted to belong to one of the five evidence types.
The transformed TCM scale dataset has several notable characteristics. The frequency plot of Parkinson’s syndrome types in Figure 3 reveals a label imbalance problem. In addition, the syndromes represented by the labels are distinguished as primary or secondary, which makes the relationships between labels complex. That is, the transformed Parkinson’s dataset faces the following challenges:
Complex inter-label relationships
Label imbalance

Figure 3. Frequency of Parkinson’s syndrome types
To address the above problems, the next section attempts to improve the TCM evidence classification algorithm using the topic modeling framework.
The Parkinson’s dataset (parkinson) in this experiment was converted from a real Parkinson’s disease Chinese medicine scale provided by a brain hospital. Table 1 demonstrates the co-occurrence frequency of the labels in the Parkinson’s dataset.
Table 1. Label co-occurrence frequency table of the Parkinson’s dataset (rows: secondary syndrome; columns: main syndrome)
Secondary \ Main | Phlegm heat moving wind - main | Blood stasis and wind - main | Deficiency of qi and blood - main | Liver and kidney deficiency - main | Yin-yang deficiency - main |
---|---|---|---|---|---|
Phlegm heat moving wind - Secondary | - | 2 | 1 | 13 | 5 |
Blood stasis and wind - Secondary | 3 | - | 1 | 6 | 3 |
Deficiency of qi and blood - Secondary | 2 | 3 | - | 9 | 13 |
Liver and kidney deficiency - Secondary | 7 | 2 | 5 | - | 2 |
Yin-yang deficiency - Secondary | 1 | 7 | 16 | 2 | - |
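A label co-occurrence table of this kind can be derived directly from the per-patient label sets. The following is a minimal sketch; the function name `label_cooccurrence`, the abbreviated label names, and the four example records are hypothetical and do not reproduce the dataset.

```python
from collections import Counter
from itertools import combinations


def label_cooccurrence(label_sets):
    """Count how often each unordered pair of labels appears on the same case."""
    counts = Counter()
    for labels in label_sets:
        for a, b in combinations(sorted(labels), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical records: one set of labels per patient
records = [
    {"LK-main", "PH-secondary"},
    {"LK-main", "PH-secondary"},
    {"QB-main", "YY-secondary"},
    {"QB-main"},                      # primary syndrome only: no pair counted
]
counts = label_cooccurrence(records)
```

Sorting the labels before pairing ensures that each pair is counted under a single key regardless of the order in which the labels were recorded.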
Table 2 shows the top words of the topics obtained by BTM-ML in the experiment, with the number of topics set to 3. In Topic 0, the labels liver and kidney deficiency - main and phlegm heat moving wind - secondary occupy the first two places. According to the label co-occurrence frequencies in Table 1, these two labels co-occur 13 times, which is among the highest co-occurrence counts. Yin-yang deficiency - main and deficiency of qi and blood - secondary also co-occur 13 times, and liver and kidney deficiency - main and deficiency of qi and blood - secondary co-occur 9 times. The top labels in Topic 0 are therefore highly correlated, which shows that the topic effectively captures part of the relationship between the labels.
Comparing the top words of Topic 1 and Topic 2 with the label co-occurrence frequency table shows that yin-yang deficiency - secondary and deficiency of qi and blood - main co-occur 16 times, liver and kidney deficiency - secondary and phlegm heat moving wind - main co-occur 7 times, and yin-yang deficiency - main and deficiency of qi and blood - secondary co-occur 13 times. This suggests that Topic 1 and Topic 2 also represent the relationships among the labels effectively.
In summary, the relationships between labels can be effectively extracted and represented using the topic modeling approach. Moreover, the liver and kidney deficiency syndrome appears among the top words of all three topics (first in Topic 0 and Topic 2, third in Topic 1), which indicates that almost all other syndrome labels are related to it. The liver and kidney deficiency syndrome thus plays a crucial role in the TCM understanding of Parkinson’s disease, in line with the common TCM view that Parkinson’s disease is rooted in liver and kidney deficiency and in qi and blood deficiency.
Table 2. Top words of each topic
Topic 0 | Probability | Topic 1 | Probability | Topic 2 | Probability |
---|---|---|---|---|---|
Liver and kidney deficiency - main | 0.3298 | Yin-yang deficiency - Secondary | 0.4500 | Liver and kidney deficiency - Secondary | 0.3318 |
Phlegm heat moving wind - Secondary | 0.2311 | Deficiency of qi and blood - main | 0.3000 | Phlegm heat moving wind - main | 0.1981 |
Deficiency of qi and blood - Secondary | 0.1918 | Liver and kidney deficiency - main | 0.1500 | Yin-yang deficiency - main | 0.1289 |
Yin-yang deficiency - main | 0.1321 | Phlegm heat moving wind - main | 0.1250 | Deficiency of qi and blood - Secondary | 0.1452 |
Table 3 shows the results on the Parkinson’s dataset of the classical algorithms (BR, LP, ECC, MLkNN, and CLR) and of their versions modified by LDA-ML, BTM-ML, and WNTM-ML.
Table 3. Experimental results on the Parkinson’s dataset
Algorithm | Micro-F | Macro-F | Example-F |
---|---|---|---|
LDA-ML(BR) | 0.4645 | 0.2456 | 0.4171 |
BTM-ML(BR) | 0.4378 | 0.2087 | 0.4459 |
WNTM-ML(BR) | 0.4209 | 0.2179 | 0.4981 |
BR | 0.4172 | 0.1789 | 0.4003 |
LDA-ML(LP) | 0.4098 | 0.2708 | 0.4410 |
BTM-ML(LP) | 0.4431 | 0.2567 | 0.4507 |
WNTM-ML(LP) | 0.4309 | 0.2208 | 0.4878 |
LP | 0.4001 | 0.2153 | 0.4027 |
LDA-ML(ECC) | 0.4112 | 0.2975 | 0.4268 |
BTM-ML(ECC) | 0.4035 | 0.2114 | 0.5624 |
WNTM-ML(ECC) | 0.4892 | 0.2084 | 0.4290 |
ECC | 0.4018 | 0.2041 | 0.3927 |
LDA-ML(MLkNN) | 0.3290 | 0.1532 | 0.1390 |
BTM-ML(MLkNN) | 0.2893 | 0.0973 | 0.3082 |
WNTM-ML(MLkNN) | 0.3322 | 0.2567 | 0.1240 |
MLkNN | 0.2130 | 0.0897 | 0.1211 |
LDA-ML(CLR) | 0.4921 | 0.2330 | 0.4134 |
BTM-ML(CLR) | 0.4199 | 0.2652 | 0.4903 |
WNTM-ML(CLR) | 0.4933 | 0.2974 | 0.4872 |
CLR | 0.4119 | 0.1976 | 0.4135 |
Win/loss | 5/0 | 5/0 | 5/0 |
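As can be seen from Table 3, the algorithms improved by the topic modeling framework score higher than the original algorithms on the Micro-F, Macro-F, and Example-F evaluation metrics. The only exception is LDA-ML(CLR), whose Example-F (0.4134) is marginally below that of CLR (0.4135).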
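For reference, the three metrics can be computed from 0/1 label matrices as in the sketch below. It assumes the usual definitions: Micro-F pools true positives, false positives, and false negatives over all labels, Macro-F averages the per-label F1, and Example-F averages the per-sample F1. Scoring an empty denominator as 0 is a convention chosen here (some tools score it as 1), and the small matrices are hypothetical.

```python
def f1(tp, fp, fn):
    """F1 from counts; an empty denominator is scored as 0 by convention."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_f_scores(y_true, y_pred):
    """Return (micro_f, macro_f, example_f) for 0/1 label matrices."""
    n_labels = len(y_true[0])
    tp, fp, fn = [0] * n_labels, [0] * n_labels, [0] * n_labels
    example_scores = []
    for t_row, p_row in zip(y_true, y_pred):
        e_tp = e_fp = e_fn = 0
        for j, (t, p) in enumerate(zip(t_row, p_row)):
            if t and p:
                tp[j] += 1
                e_tp += 1
            elif p:
                fp[j] += 1
                e_fp += 1
            elif t:
                fn[j] += 1
                e_fn += 1
        example_scores.append(f1(e_tp, e_fp, e_fn))
    micro = f1(sum(tp), sum(fp), sum(fn))
    macro = sum(f1(a, b, c) for a, b, c in zip(tp, fp, fn)) / n_labels
    example = sum(example_scores) / len(example_scores)
    return micro, macro, example


# Hypothetical ground truth and predictions: 3 samples, 3 labels
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 1]]
micro_f, macro_f, example_f = multilabel_f_scores(y_true, y_pred)
```

Macro-F weights every label equally, so it is the metric most sensitive to the label imbalance discussed earlier, which may explain why the Macro-F values in Table 3 are consistently lower than the Micro-F values.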
Figure 4 shows the ROC curves of three models: DenseNet121, DAMNet, and the improved TCM evidence classification algorithm (TCM). The area under the ROC curve (AUC) of the improved model reaches 0.99, higher than that of the other two classification models. With the smallest parameter count, 7.8M, the improved TCM classification algorithm achieves an accuracy of 96.73%, a precision of 97.45%, a sensitivity of 96.27%, and an F1 score of 96.21%.

Figure 4. ROC curves of the three models
These results indicate that the improved TCM evidence classification algorithm is efficient and accurate in diagnosing Parkinson’s disease.
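The area under the ROC curve used in this comparison can be computed without plotting the curve, using its rank interpretation: it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below assumes binary 0/1 labels; the function name `roc_auc` and the four scores are hypothetical.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5       # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four samples
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

This pairwise form is quadratic in the number of samples, which is acceptable for a test set of the size used here; a sort-based computation would be preferable for large datasets.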
Based on the Bayesian principle, this paper carries out a correlation analysis of TCM evidence, identifies the key problems of evidence recognition, analyzes the efficiency of the TCM evidence classification algorithm using Parkinson’s disease as the starting point, and proposes directions in which the algorithm can be optimized and improved.
Compared with the original algorithms, the improved algorithms achieve higher Micro-F, Macro-F, and Example-F scores, which indicates that introducing the topic model framework improves recognition performance on the Parkinson’s disease dataset.
Comparing the ROC curve of the improved TCM evidence classification model with those of the DenseNet121 and DAMNet models, the improved algorithm achieves 96.73% accuracy, 97.45% precision, 96.27% sensitivity, and a 96.21% F1 score with the smallest parameter count, 7.8M. This indicates that the improved TCM evidence classification algorithm is more efficient and accurate than the DenseNet121 and DAMNet models in diagnosing Parkinson’s disease.