Open Access

Analysis of the construction and application effect of ophthalmic care service model based on telemedicine technology

Mar 19, 2025


Introduction

Currently, China’s healthcare development is unbalanced and medical resources are unevenly distributed: large cities have more and better-quality medical resources, while remote and rural areas face a shortage of medical resources to varying degrees [1-2]. This situation is particularly serious in ophthalmology. In China, 70% of ophthalmologists are concentrated in large and medium-sized cities, and the number of ophthalmologists at the grassroots level is very small, far from sufficient to meet patients’ needs. In addition, owing to limited understanding of ophthalmic diseases, many patients are prone to pessimism and other negative emotions; at the same time, patients’ own awareness of nursing care keeps growing, creating a strong demand for high-quality care [3-5]. Therefore, it is necessary to deliver quality nursing services based on telemedicine.

The occurrence of ophthalmic diseases is related to factors such as age and heredity, and disease can arise from a single factor or from the combined effect of multiple factors [6]. Patients with ophthalmic diseases often have mild or no symptoms in the early stages and generally need to be diagnosed by imaging examinations; early treatment then needs to be implemented to improve the prognosis, and effective nursing care is very important throughout the treatment process [7-9]. Implementing quality care for patients with ophthalmic diseases achieves better results: it can alleviate patients’ psychological symptoms, improve their knowledge of the disease and of surgery, reduce postoperative complications, relieve physical pain, and promote improvement in patients’ quality of life [10-13].

The current use of artificial intelligence in ophthalmology is mostly in imaging examination. Since the eye is the only organ of the human body in which blood vessels and nerves can be observed directly, ocular images can also provide a wealth of vascular and neural information to assist in the diagnosis of other diseases [14-16]. With the advancement of artificial intelligence, eye image examination is gradually being applied in various ophthalmic scenarios, including ophthalmic telemedicine [17-18]. The advantages of AI in ophthalmology make it an effective way to address the imbalance in the development of ophthalmology in China, and it is applied in the prevention, screening, diagnosis, and care services of ophthalmic telemedicine [19-22].

In order to construct an ophthalmic care service model based on telemedicine technology, this paper adopts the Gray Wolf Optimization algorithm for ophthalmic image feature selection and proposes the M2LC-Net neural network classification model with a Grad-CAM visualization module. Comparison experiments against unimodal models are used to verify the ability of the M2LC-Net model to perceive the correlation between sample labels. The influence of the dropout (discard) rate on the accuracy of the M2LC-Net model is examined through experiments with a range of discard rates. Finally, the performance of the M2LC-Net model is compared with existing bimodal models and with the variant without the CAM module to evaluate the application of the M2LC-Net model.

Ophthalmic care service model based on telemedicine technology
Operating modes of the real-time system

The real-time system of remote consultation is based on the interactive communication mode, and this diagnostic mode can realize two or more parties’ consultation in a short time, which breaks through the limitation of time and space. Its process is shown in Figure 1.

Figure 1.

Remote diagnosis flowchart

The steps are as follows:

The consultation requesting party (hospital) submits a consultation request to the relevant specialists, designates their roles, and agrees on a consultation time.

Relevant information is organized in advance for real-time transmission.

After accepting the request, the consulting experts can browse the patient’s electronic medical record before the real-time consultation.

Go online at the agreed consultation time.

The consultation parties start the NetMeeting software to establish real-time communication. Using the file transfer item on the NetMeeting toolbar, the requesting party transmits the relevant information, generated as graphics files, to the consulting side. With the audio, chat, and whiteboard tools on the NetMeeting toolbar, the two sides can communicate by text and voice in real time and discuss the patient’s medical records until a consensus is reached.

The consultation host stores the consultation results in the hospital’s expert diagnosis information database, and the results are issued as a written diagnostic report for the relevant parties to sign and keep on record.

The real-time capability of NetMeeting is not limited to two-party communication; it also supports multi-party real-time communication, that is, multi-party real-time consultation. If a single consulting side cannot solve the problem during a real-time consultation, a third party can be invited to join through NetMeeting’s call function, and real-time text and voice communication can then take place among multiple parties. However, because network transmission speed is limited, real-time communication is difficult to guarantee when there are too many participants, so the number of consulting parties should be chosen according to the actual situation.

Swarm Intelligence Optimization Algorithm for Feature Selection
Principles of the Gray Wolf Optimization Algorithm

The Gray Wolf Optimizer (GWO) is a swarm intelligence optimization algorithm inspired by the hunting behavior of gray wolves.

Gray wolves are pack animals whose way of life resembles that of early human groups: they have a rather strict social hierarchy, all orders pass through successive layers of management, and the final decision is made by the leader at the top. The social hierarchy of the gray wolf resembles a pyramid.

The first level of the social hierarchy: the leader of the gray wolf pack, called α. In the wolf pack, α is the one who has the power to take charge of all affairs, including hunting, sleeping time and place, food distribution, and other aspects related to food, clothing, housing, and transportation to ensure that the whole pack can thrive. In addition, α may not be the strongest wolf in the entire pack, but in terms of management decisions, α is the most suitable wolf in terms of the big picture for the prosperity of the entire pack.

The second level of the social hierarchy: the think-tank of the gray wolf pack, called β, is subordinate to α and assists α in making important decisions. If the position of the α wolf becomes vacant due to death or aging, the β wolf changes position and takes over as the next leader of the pack. As can be seen from the pyramid structure, β is second only to α; its main task is to relay α’s orders to the other gray wolves and to report back to α on how well those orders are carried out, supporting the internal operation of the entire pack.

The third level of the social hierarchy: δ wolves, who follow the decision-making orders of α and β, are mainly responsible for scouting, sentry duty, guarding, and other affairs, while dominating the wolves of the remaining tier.

The fourth level of the social hierarchy: ω wolves, the lowest tier in the entire gray wolf pack, are mainly responsible for maintaining the balance of internal relationships within the pack. For a pack to survive, wolves like ω are needed to keep the whole population developing stably.

The social hierarchy of gray wolves plays an important role in the collective hunting process, and the GWO optimization process mimics the collective hunting process of gray wolf populations, which consists of the following three main parts:

Stalking, chasing and approaching prey;

Pursuing, encircling and harassing prey;

Attacking the prey.

Gray Wolf Optimization Algorithm Flow

The flow of implementing the Gray Wolf optimization algorithm is shown in Figure 2.

Figure 2.

Flow chart of the Gray Wolf algorithm

The implementation process is as follows:

Analogous to the leadership hierarchy, the fittest solution is taken as α, the second and third best solutions as β and δ, and the remaining candidate solutions as ω. The hunting process is led by α, β, and δ, and the ω wolves follow these three wolves.

To encircle the prey, the gray wolf’s encircling behavior is first defined as follows:

$$D = \left| C \cdot x_p(t) - x(t) \right| \tag{1}$$

$$x(t+1) = x_p(t) - A \cdot D \tag{2}$$

Eq. (1) denotes the distance between an individual gray wolf and its prey, and Eq. (2) denotes the update of the gray wolf’s position, where $t$ denotes the current iteration number, $A$ and $C$ are coefficient vectors, and $x_p$ and $x$ are the positions of the prey and the gray wolf, respectively. The formulas for $A$ and $C$ are as follows:

$$A = 2a \cdot r_1 - a \tag{3}$$

$$C = 2 r_2 \tag{4}$$

Where $r_1$ and $r_2$ are random values between 0 and 1, and $a$ decreases linearly from 2 to 0 as the iterations proceed.

In the hunting phase, the pursuit of prey is led by the α wolf, accompanied by the β and δ wolves, while the remaining ω wolves adjust their direction according to the positions of these three optimal wolves; at each iteration, the three best wolves are re-selected to continue leading the hunt.

$$D_\alpha = \left| C_1 \cdot x_\alpha - x \right|,\quad D_\beta = \left| C_2 \cdot x_\beta - x \right|,\quad D_\delta = \left| C_3 \cdot x_\delta - x \right| \tag{5}$$

Where $D_\alpha$, $D_\beta$, and $D_\delta$ denote the distances between the three leading wolves and the other wolves; $x_\alpha$, $x_\beta$, and $x_\delta$ denote the current positions of the three leading wolves; and $x$ is the current position of the gray wolf.

$$x_1 = x_\alpha - A_1 \cdot D_\alpha,\quad x_2 = x_\beta - A_2 \cdot D_\beta,\quad x_3 = x_\delta - A_3 \cdot D_\delta \tag{6}$$

$$x(t+1) = \frac{x_1 + x_2 + x_3}{3} \tag{7}$$

Eq. (6) represents the direction and distance that the gray wolf advances toward each of the three leading wolves, and Eq. (7) determines the updated position of the gray wolf.

Attacking the prey: as the iterations proceed, the gray wolf pack, led by the three leading wolves, keeps approaching the prey and eventually converges to attack it. In the mathematical model, this is governed by the value of A: if A is greater than 1 or less than -1, the gray wolves search other areas more suitable for hunting, and if A is between -1 and 1, the gray wolves move closer to the prey.

In summary, through continuous information transfer within the population and the introduction of an adaptively decreasing parameter, the Gray Wolf Optimization algorithm achieves a balance between global search and local exploitation, and performs well in terms of optimization accuracy and convergence speed.
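To make the update rules above concrete, the following is a minimal NumPy sketch of the canonical GWO loop described by Eqs. (1)-(7). It is an illustrative implementation only: the population size, iteration budget, search bounds, and the sphere objective at the end are placeholder choices, not the settings used in this study.

```python
import numpy as np

def gwo_minimize(fitness, dim, n_wolves=20, max_iter=100, bounds=(-10.0, 10.0)):
    """Minimal Gray Wolf Optimizer sketch following the update rules above."""
    lo, hi = bounds
    # Randomly initialize the positions of the wolf pack in the search space
    X = np.random.uniform(lo, hi, size=(n_wolves, dim))

    for t in range(max_iter):
        # Rank wolves by fitness: the three best become alpha, beta, delta
        order = np.argsort([fitness(x) for x in X])
        alpha, beta, delta = X[order[0]], X[order[1]], X[order[2]]

        a = 2.0 - 2.0 * t / max_iter            # a decreases linearly from 2 to 0

        for i in range(n_wolves):
            x_new = np.zeros(dim)
            for leader in (alpha, beta, delta):
                r1, r2 = np.random.rand(dim), np.random.rand(dim)
                A = 2.0 * a * r1 - a             # coefficient vector A
                C = 2.0 * r2                     # coefficient vector C
                D = np.abs(C * leader - X[i])    # distance to this leading wolf
                x_new += leader - A * D          # step toward this leading wolf (Eq. (6))
            X[i] = np.clip(x_new / 3.0, lo, hi)  # average of the three steps (Eq. (7))

    best = min(X, key=fitness)
    return best, fitness(best)

# Example: minimize the sphere function in 5 dimensions
best_x, best_f = gwo_minimize(lambda x: float(np.sum(x ** 2)), dim=5)
```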

In this experiment, the Gray Wolf Optimization algorithm (GWO) is used for feature selection: feature selection is performed on the features extracted by the deep learning network in order to achieve better classification results and thus improve the performance of the whole model.

To use the Gray Wolf Optimization algorithm for feature selection, several issues need to be considered.

First, if the feature space were fed directly into the GWO algorithm, the iterations of the algorithm would drastically change the original feature space, so that it would no longer represent ophthalmic disease image features, which would be inconsistent with the requirements of this study. Therefore, the initial population positions are set in the same dimension as the feature space, and each position point is randomly initialized to 0 or 1.

Each row represents the position of one wolf. The purpose of encoding position information in this way is to formulate the feature selection rule: if the value of a wolf’s position point is 1, the feature at that location is by default selected from the feature space and passed to the subsequent classification; position points with any other value are not selected.

The first row represents the position information of the first wolf. If position points 1, 2, 3, and 7 are all 1, then the four features at positions 1, 2, 3, and 7 are selected; the selected positions thus indicate which features are retained, with the goal of optimizing the classification of ophthalmic diseases.

Next, since the positions of the wolves are random, it must also be ensured that, as the iterations proceed, the final feature selection result remains reasonable: the highest accuracy should be obtained while eliminating as small a portion of the feature space as possible. If the final number of selected feature dimensions is too low, a large number of features are discarded in exchange for the result, and the experimental results are not persuasive.

Thus, this study adds an activation function after the position update in each iteration, with the aim of mapping the target range to between 0 and 1. The activation function is shown in Equation (8).

$$y = \frac{1}{1 + e^{-2x}} \tag{8}$$

Then a random value between 0 and 1 is introduced for comparison: if the random value is greater than the result $y$ of the activation function, the wolf’s position point remains unchanged; if the random value is less than $y$, the wolf’s position point is increased by 1.

In this experiment, the position point values of the wolves are always restricted to between -1 and 1: as the iterations proceed, any position point value greater than 1 is set to 1. Therefore, with iteration, combining the activation function strategy with this position restriction, more and more position points take the value 1, meaning that more and more features are selected. The final feature selection is thus reasonable and the experimental results persuasive, achieving the goal of eliminating as few features of the sub-feature space as possible while obtaining a higher accuracy rate.

In addition, the final optimization objective must account for the impact of the number of selected features on the results. In this experiment, the selected features are finally fed into a support vector machine classifier, and the fitness function is expressed as follows:

$$\text{fitness} = 0.99 \times (1 - acc) + 0.01 \times \frac{I}{dim}$$

Where $acc$ denotes the classification accuracy of the support vector machine algorithm, $I$ denotes the number of selected features, $dim$ denotes the total number of features, and 0.99 and 0.01 are the respective weights, which prioritize accuracy while taking the number of selected features into account.
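As an illustration of how the pieces above fit together, the following is a minimal sketch of the binary position update of Eq. (8) and the SVM-based fitness just described. It is a hedged example: the cross-validation setup, the random-number handling, and the way the deep features and labels are passed in are assumptions for illustration, not the exact experimental configuration.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def fitness(mask, X, y, w_acc=0.99, w_dim=0.01):
    """Trade classification error against the number of selected features.
    mask is a 0/1 vector over the feature dimensions."""
    selected = mask.astype(bool)
    if not selected.any():
        return 1.0                                    # nothing selected: worst case
    acc = cross_val_score(SVC(), X[:, selected], y, cv=3).mean()
    return w_acc * (1.0 - acc) + w_dim * selected.sum() / mask.size

def binary_update(position, rng):
    """Map a continuous wolf position to a 0/1 selection mask using the
    activation of Eq. (8) plus a random threshold, as described above."""
    y_act = 1.0 / (1.0 + np.exp(-2.0 * position))     # Eq. (8)
    r = rng.random(position.shape)
    new_pos = np.where(r < y_act, position + 1.0, position)
    new_pos = np.clip(new_pos, -1.0, 1.0)             # keep position points in [-1, 1]
    return new_pos, (new_pos >= 1.0).astype(int)      # points equal to 1 select features

# Example for one wolf over a 512-dimensional deep-feature space
rng = np.random.default_rng(0)
pos = rng.uniform(-1.0, 1.0, size=512)
pos, mask = binary_update(pos, rng)
# score = fitness(mask, deep_features, labels)        # deep_features, labels: extracted beforehand
```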

Building a neural network model
M2LC-Net Neural Network Modeling

Most existing studies train their algorithms on a single modality; the number of diseases such models can classify is small and, more importantly, the long-tailed rare diseases of real-world scenarios cannot be handled effectively. To solve these problems, this paper proposes M2LC-Net, a model for multimodal, multi-disease, long-tailed data classification. Regarding the data sources for the inputs to this architecture: when doctors examine a patient’s condition with an OCT device, they first use the device to observe the patient’s macular fundus map to locate the lesion area, and then examine that area again in the fundus cross-sections of the OCT images; the infrared macular fundus map assists in identifying the disease. Therefore, paired macular fundus images and OCT images are collected in this paper as the data inputs to the M2LC-Net architecture.

First, the dataset $D = \{x_f, x_o \mid y\}$ is defined, where $x_f$ and $x_o$ are the fundus image and OCT image obtained from the same eye, respectively, and $y$ is the diagnostic label for the image pair, y ∈ {no visible lesion, preretinal membrane, central serous chorioretinopathy, xanthochoroidal schisis, macular schisis, choroidal neovascularization, age-related macular degeneration, retinal detachment, branch vein occlusion, arterial occlusion, central vein occlusion, Harada disease}, which covers 11 ophthalmic diseases as well as no obvious lesion. The neural network model M2LC-Net designed in this paper receives the paired inputs $\{x_f, x_o\}$ and outputs a diagnosis $\hat{y}$ for the eye:

$$\hat{y} = \text{M2LC-Net}(\{x_f, x_o\})$$

M2LC-Net consists of two symmetric branches, one for the fundus images and the other for the OCT images, and the weights of the two branches are not shared. Each branch uses the structure of ResNet18 with all fully connected layers removed as a backbone network for extracting feature information from the images, and the attention-mechanism module CBAM is connected behind the backbone.

Each input image is first resized to a uniform size and randomly augmented. Let $F_f$ be the feature map generated by ResNet18 in the fundus image branch, $F_f \in \mathbb{R}^{7 \times 7 \times 512}$. Similarly, defining the feature map of the OCT branch as $F_o$ gives $F_o \in \mathbb{R}^{7 \times 7 \times 512}$.

Let $\bar{F}_f$ be the feature map obtained by applying global average pooling to $F_f$; then $\bar{F}_f \in \mathbb{R}^{512 \times 1}$. The fully connected layer on the fundus image branch and the fully connected layer on the OCT branch do not change the size of the feature map, i.e., $\bar{F}_f$ passes through the fully connected layer to give $\tilde{F}_f \in \mathbb{R}^{512 \times 1}$; similarly, $\bar{F}_o \in \mathbb{R}^{512 \times 1}$ and $\tilde{F}_o \in \mathbb{R}^{512 \times 1}$. After that, $\tilde{F}_f$ and $\tilde{F}_o$ are concatenated to form a 1024-dimensional vector containing information from both modalities. For classification, the merged vector is fed into the fully connected layer to produce the score of the final output $\hat{y}$, denoted $s_{\hat{y}}$:

$$s_{\hat{y}} = W_f^{\top} \tilde{F}_f + W_o^{\top} \tilde{F}_o$$

where $W_f$ and $W_o$ are the class-related weights of the fully connected layer, $W_f, W_o \in \mathbb{R}^{512 \times 12}$, and $s_{\hat{y}} \in \mathbb{R}^{12}$. Classification is achieved by selecting the category with the highest score.
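For concreteness, a minimal PyTorch sketch of the two-branch fusion described above follows. It is a simplified illustration only: the CBAM attention modules, the data augmentation, and all training details are omitted, and the layer names (branch_f, fc_f, classifier, and so on) are placeholders rather than the authors’ actual implementation.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class M2LCNetSketch(nn.Module):
    """Two-branch fundus/OCT fusion sketch; 12 classes = 11 diseases + no visible lesion."""

    def __init__(self, num_classes=12):
        super().__init__()
        # Each branch: ResNet18 with its pooling and FC head removed,
        # yielding a 512 x 7 x 7 feature map for a 224 x 224 input.
        self.branch_f = nn.Sequential(*list(resnet18(weights=None).children())[:-2])
        self.branch_o = nn.Sequential(*list(resnet18(weights=None).children())[:-2])
        self.gap = nn.AdaptiveAvgPool2d(1)            # global average pooling
        self.fc_f = nn.Linear(512, 512)               # size-preserving FC per branch
        self.fc_o = nn.Linear(512, 512)
        # Class-related weights realised as one FC over the concatenated
        # 1024-dim vector, equivalent to W_f^T F_f + W_o^T F_o.
        self.classifier = nn.Linear(1024, num_classes, bias=False)

    def forward(self, x_f, x_o):
        F_f = self.branch_f(x_f)                      # B x 512 x 7 x 7
        F_o = self.branch_o(x_o)
        v_f = self.fc_f(self.gap(F_f).flatten(1))     # B x 512
        v_o = self.fc_o(self.gap(F_o).flatten(1))
        return self.classifier(torch.cat([v_f, v_o], dim=1))  # B x 12 scores

# Example forward pass with a paired fundus / OCT batch
model = M2LCNetSketch()
scores = model(torch.randn(2, 3, 224, 224), torch.randn(2, 3, 224, 224))
pred = scores.argmax(dim=1)                           # pick the highest-scoring class
```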

Grad-CAM Visualization Module

In order to show the contribution provided by the feature information of each modality, this paper adds class activation mapping (CAM) to M2LC-Net to generate, for each modality, a heat map over the input image in which the parts that the model pays particular attention to are highlighted. The CAM operation is performed on $F_f$ and $F_o$. Define $F_f(x,y)$ as the value of the feature map $F_f$ at point $(x,y)$, where $x, y \in \{1, 2, \dots, 7\}$ and $F_f(x,y) \in \mathbb{R}^{512 \times 1}$; similarly, $F_o(x,y)$ is the value of $F_o$ at point $(x,y)$, $F_o(x,y) \in \mathbb{R}^{512 \times 1}$. The CAM operation is then defined as follows:

$$\begin{cases} CAM_f(x,y) = W_f^{\top} F_f(x,y) \\ CAM_o(x,y) = W_o^{\top} F_o(x,y) \end{cases}$$

$CAM_f(x,y)$ and $CAM_o(x,y)$ denote the contributions of specific locations of the fundus image and the OCT image, respectively. Over all $x, y$, the fundus visualization heat map is obtained by assembling $CAM_f(x,y)$, upsampling it to the same size as $x_f$, and overlaying it on $x_f$; the OCT visualization heat map is obtained in the same way.
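A minimal sketch of this CAM computation for one branch is given below. The ReLU and min-max normalization steps are common post-processing conventions added here for illustration, and the helper name cam_heatmap (as well as how the class weights are sliced) is hypothetical rather than the authors’ implementation.

```python
import torch
import torch.nn.functional as F

def cam_heatmap(feature_map, class_weights, class_idx, image_size):
    """Per-branch CAM sketch.

    feature_map   : B x 512 x 7 x 7 branch feature map (F_f or F_o)
    class_weights : num_classes x 512 weight slice for that branch (W_f or W_o)
    class_idx     : index of the predicted class
    Returns a B x H x W heat map upsampled to the input image size.
    """
    w = class_weights[class_idx]                         # 512 weights for this class
    # Weighted sum over channels at every spatial location (x, y)
    cam = torch.einsum("c,bchw->bhw", w, feature_map)
    cam = F.relu(cam)                                    # keep positive contributions
    cam = F.interpolate(cam.unsqueeze(1), size=image_size,
                        mode="bilinear", align_corners=False).squeeze(1)
    # Normalise to [0, 1] so the map can be overlaid on the input image
    return (cam - cam.amin()) / (cam.amax() - cam.amin() + 1e-8)

# Example with the sketch model above (ignoring the branch FC for simplicity):
# the classifier weight is 12 x 1024; its first 512 columns act as W_f.
# w_f = model.classifier.weight[:, :512]
# heat_f = cam_heatmap(model.branch_f(x_f), w_f, pred_class, image_size=(224, 224))
```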

Effectiveness of M2LC-Net neural network modeling application
Superiority of M2LC-Net Neural Network Modeling

The OIA-ODIR database selected for this paper contains a total of eight label types: normal (N), diabetic retinopathy (D), glaucoma (G), cataract (C), age-related macular degeneration (A), hypertension (H), pathological myopia (M), and other diseases/abnormalities (O).

Figure 3 shows a heat map comparison between the M2LC-Net model and the unimodal Inception-V3 model in predicting the correlation between diabetic retinopathy and glaucoma. The Inception-V3 model perceived the correlation between diabetic retinopathy and glaucoma features only weakly and failed to correlate the key features effectively, whereas the M2LC-Net model was better able to perceive the features of the two lesions as highly correlated.

Figure 3.

Correlation between diabetic retinopathy and glaucoma

Figure 4 shows a heat map comparison of the M2LC-Net model with the unimodal Inception-V3 model in predicting the correlation between diabetic retinopathy and age-related macular degeneration. It can be seen that the Inception-V3 model has a weaker perception of the correlation between diabetic retinopathy and age-related macular degeneration features, mistakenly perceiving pathologic myopia as being more strongly associated with the diabetic retinopathy features. The M2LC-Net model, on the other hand, was better able to perceive that the characteristics of the two lesions were highly correlated.

Figure 4.

Diabetic retinopathy and age-related macular degeneration

It can be seen that the unimodal model focuses more on calculating the loss of negative samples and less on that of positive samples, and does not fully consider the correlation between the labels of positive samples, resulting in lower accuracy in predicting ocular diseases. The bimodal model, by contrast, has a stronger perception of the correlation between positive-sample labels, which enables it to recognize negative samples while also grasping the correlation among positive samples more fully.

In order to verify the effect of different Dropout discard rates on the M2LC-Net model, multiple experimental comparisons with different dropout rates are performed in the same environment, as shown in Table 1. When the Dropout rate is 0.1, the Accuracy of the model is 75.48%, the F1-score is 91.54%, the Kappa coefficient is 71.08%, and the AUC value is 96.38%. When the discard rate is 0.2, the Accuracy improves to 78.53%, the F1-score improves by 0.89%, the Kappa coefficient improves by 2.20%, and the AUC value improves by 1.55%. This shows that increasing the discard rate to a certain level can improve the model performance. When the discard rate is 0.3, the F1-score, Kappa, and AUC are improved although Accuracy decreases to 74.25%. This suggests that a higher discard rate may affect the accuracy of the model. When the discard rate increases to 0.4 to 0.8, the model performance fluctuates across metrics, but there is no significant upward or downward trend, suggesting that changes in the discard rate in this range have a limited impact on the overall performance of the model. At a discard rate of 0.9, all metrics decrease, with Accuracy decreasing to 70.36%, F1-score decreasing to 90.24%, Kappa coefficient decreasing to 68.92%, and AUC value decreasing to 92.53%. This indicates that too high discard rate has a negative impact on model performance.

Table 1. Experimental results of different Dropout rates

Discard rate Accuracy/% F1-score/% Kappa/% AUC/%
0.1 75.48 91.54 71.08 96.38
0.2 78.53 92.43 73.28 97.93
0.3 74.25 93.24 74.18 98.24
0.4 76.35 92.93 74.01 97.24
0.5 78.24 94.32 75.23 93.99
0.6 77.24 94.03 74.26 94.72
0.7 76.35 95.36 71.24 92.46
0.8 76.32 94.11 74.29 94.24
0.9 70.36 90.24 68.92 92.53

In summary, a moderate discard rate (0.2-0.6) improves the overall performance of the model most significantly, while too low or too high a discard rate adversely affects performance. In particular, when the discard rate reaches 0.9, all performance metrics perform poorly, suggesting that excessive discarding causes the model to lose important information, thereby reducing performance.
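For reference, a minimal sketch of how such a discard-rate sweep could be organized is shown below, assuming (hypothetically) that a single dropout layer sits in front of the fusion classifier; train_and_evaluate stands in for the actual training and validation routine, which is not shown here.

```python
import torch.nn as nn

def make_head(dropout_rate: float, num_classes: int = 12) -> nn.Module:
    """Hypothetical fusion head with a configurable dropout (discard) rate."""
    return nn.Sequential(
        nn.Dropout(p=dropout_rate),
        nn.Linear(1024, num_classes, bias=False),
    )

# Sweep the discard rates from Table 1
for rate in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]:
    head = make_head(rate)
    # metrics = train_and_evaluate(head)   # would return Accuracy, F1-score, Kappa, AUC
```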

Effectiveness of M2LC-Net neural network model application

TensorBoard, a visualization tool from the TensorFlow deep learning framework, was used to observe the convergence of the multi-label classification model. The accuracy and loss curves of the M2LC-Net model on the training and validation sets are shown in Figure 5. After 80 training iterations, the accuracy and loss values of the model no longer change and training stops. On the training set, the model accuracy reaches 98.70%; on the validation set, it reaches 94.74%. The loss curve decreases rapidly and soon reaches a steady state, and the model converges quickly, indicating that the model’s parameters are suitable for the task of multi-label classification of multiple types of fundus diseases in fundus images. The results show that the proposed model achieves high accuracy for multiple types of fundus diseases and avoids overfitting even when trained with a small training set.

Figure 5.

Accuracy and loss values of training set and validation set
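Curves like those in Figure 5 are produced by writing per-epoch scalars into TensorBoard event files. A minimal sketch is shown below using the PyTorch SummaryWriter, which writes the same event format; the metric values here are synthetic placeholders standing in for the real training and validation results.

```python
import math
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/m2lc_net")
for epoch in range(80):
    # Placeholder curves; in practice these come from the training/validation loops
    train_acc = 0.99 * (1.0 - math.exp(-0.15 * (epoch + 1)))
    val_acc = 0.95 * (1.0 - math.exp(-0.15 * (epoch + 1)))
    writer.add_scalar("accuracy/train", train_acc, epoch)
    writer.add_scalar("accuracy/val", val_acc, epoch)
writer.close()
```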

In the case of class imbalance, performance analysis at the class level is important for better evaluating the overall performance of the model. In this study, class imbalance exists in the training set, so class-level performance analysis was performed on the test set. The accuracy, precision, sensitivity, and specificity of the M2LC-Net model at the class level are shown in Table 2. The model has high accuracy and specificity for the disease categories whose lesion presentations are obvious; the high accuracy for the cataract and myopia categories may be due to the more obvious features in images of these two categories. The high accuracy and specificity for the less numerous disease categories is due to the number of negative samples being much higher than the number of positive samples for those categories. However, the model’s low sensitivity for most diseases is an issue that mainly reflects challenges and shortcomings of the selected OIA-ODIR database, chiefly in the following areas.

Image quality. Because the images come from a wide range of sources and images within the same category are highly diverse, there are large differences in color, illumination, and shooting conditions even though these images carry the same label;

The different fundus diseases present in an image can affect one another. For example, the model in this study was unable to recognize hard exudate lesions in cataract images, and hard exudates are also difficult to distinguish from drusen, making it hard to identify valid regional features in most misclassified images, which explains the relatively high sensitivity for cataract in Table 2;

Localized features are not obvious. Determining glaucoma requires an accurate optic cup-to-disc ratio, AMD requires more detailed macular features, and different stages of DR have different pathologic features;

There is a significant category imbalance in the database; the low sensitivity at high specificity is mainly due to the number of negative-sample images being much higher than the number of positive-sample images. The problem of low sensitivity for most diseases is also present in the reviewed literature; in contrast, cataract and myopia have distinct pathologic features, for which the M2LC-Net model proposed in this study is more advantageous.

Table 2. Class-wise performance of the M2LC-Net model
Category Accuracy Precision Sensitivity Specificity
Normal 0.58 0.61 0.48 0.69
Diabetic retinopathy 0.71 0.46 0.49 0.84
Glaucoma 0.88 0.25 0.21 0.93
Cataract 0.99 0.74 0.83 0.95
Age-related macular degeneration 0.97 0.61 0.21 0.95
Hypertensive retinopathy 0.94 0.18 0.07 0.91
Pathological myopia 0.98 0.73 0.94 0.99
Other abnormal lesions 0.81 0.38 0.37 0.87

The M2LC-Net model proposed in this study is compared with other existing bimodal models on the same dataset. The F1 values, AUC, and number of network parameters were used to evaluate the classification performance of the different models, as shown in Table 3. Among the mainstream bimodal models, the ResNet50 and DenseNet121 models perform better. Compared with the ResNet50 model, the model in this study is effectively improved in performance: without overfitting, the accuracy of the proposed model is improved by about 2.93% on the validation set, and the AUC of the M2LC-Net model reaches 88.35%, an improvement of about 14.03%. The F1 value of the ResNet50 model is higher than that of our model, but its number of training parameters is about three times that of the M2LC-Net model. Moreover, during training of the ResNet50 model, the fundus image and the grayscaled image from the OIA-ODIR database were used simultaneously as model inputs, which indirectly increases the number of training images; the size of the images input to the network is also nearly twice that of the M2LC-Net model, which requires more storage space and is not conducive to deployment on mobile devices. The DenseNet121 model, which is composed of a number of modules, is more stable; compared with it, the AUC of the M2LC-Net model is improved by approximately 9.00%, although the F1 value decreases by 3.19%. The number of training parameters of the DenseNet121 model is approximately 9 times that of the M2LC-Net model, and the DenseNet121 model also used 3,521 images collected from other databases as an additional training set. The results show that the model proposed in this study achieves better classification performance than the ResNet50 and DenseNet121 models.

Table 3. Performance comparison results of different models
Model Training accuracy/% Validation accuracy/% AUC/% F1/% Number of training parameters
ResNet50 - 91.93 74.32 88.24 >8.6M
DenseNet121 - - 79.35 89.32 >27.80M
M2LC-Net 99.16 94.86 88.35 87.35 2.97M
M2LC-Net without CAM 98.10 92.46 84.24 86.13 >29.35M

In order to highlight the advantage of introducing the CAM module into the classification network, the M2LC-Net model was compared with the M2LC-Net without CAM model in this study. In Table 3, the AUC and F1 values of M2LC-Net without CAM are 84.24% and 86.13%, respectively. The M2LC-Net model has higher AUC and F1 values and a relatively fast classification speed, while its number of training parameters is about 2.97M, only 10.12% of that of M2LC-Net without CAM. Compared with M2LC-Net without CAM, the advantage of the proposed model is that it not only realizes multi-label classification of multiple types of fundus diseases, but also has a smaller number of training parameters and a relatively faster classification speed.

Conclusion

In order to build an eye care service model based on telemedicine technology, this paper proposes the M2LC-Net model for fundus disease classification, and the performance of the M2LC-Net model is validated.

The results of the correlation heat map comparison show that the accuracy of the unimodal Inception-V3 model for ocular disease prediction is lower than that of the bimodal M2LC-Net model. The results of experiments with different discard rates show that a moderate discard rate (0.2-0.6) improves the overall performance of the M2LC-Net model most significantly, while all performance metrics deteriorate when the discard rate reaches 0.9.

The M2LC-Net model achieves 98.70% accuracy on the training set and 94.74% on the validation set after 80 training iterations. Comparing the performance with the existing bimodal models ResNet50 and DenseNet121 models, the AUC of the M2LC-Net model is improved by about 14.03% over the ResNet50 model and by about 9.00% over the DenseNet121 model. Comparing with the model without introducing the CAM module, the number of training parameters of the M2LC-Net model is only 10.12% of the M2LC-Net without CAM model.

The results show that the proposed M2LC-Net model is more accurate in the classification of ophthalmic diseases and provides effective technical support for the construction of ophthalmic care models based on telemedicine technology.
