Open Access

Identification of women with high grade histopathology results after conisation by artificial neural networks



Figure 1

Schematic of a simple neural network with an input layer, an output layer and three hidden layers.
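
To make the schematic concrete, the following is a minimal sketch of a forward pass through a network with an input layer, three hidden layers and an output layer. The layer sizes, the sigmoid activation, the random weights and the names `sigmoid`, `init_layer` and `forward` are illustrative assumptions; they are not taken from the study.

```python
import math
import random


def sigmoid(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    """Create one fully connected layer as (weights, biases) with small random weights."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(x, layers):
    """Propagate the input vector x through every layer in turn."""
    activation = x
    for weights, biases in layers:
        activation = [
            sigmoid(sum(w * a for w, a in zip(row, activation)) + b)
            for row, b in zip(weights, biases)
        ]
    return activation


rng = random.Random(0)
# Hypothetical sizes: 4 input features, three hidden layers of 5 units, 1 output unit
sizes = [4, 5, 5, 5, 1]
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

output = forward([0.2, 0.7, 0.0, 1.0], layers)
print(output)  # one value in (0, 1), read here as a score for the positive class
```

Training (adjusting the weights by backpropagation) is deliberately left out; the sketch only shows how a signal flows from the input layer through the hidden layers to the output.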

Figure 2

Matthews correlation coefficient (MCC) for classification of high grade squamous intraepithelial lesion (HSIL), combined for YES and NO predictions, for different equalisation methods (no correction of minority class, under-sampling, over-sampling and synthetic minority over-sampling technique [SMOTE]) in both RAW and Class settings. The multi-layer perceptron (MLP) performs best on the dataset with data organised in classes and over-sampling of the minority class (MCC = 0.64). It performs worst on the original dataset without correction for the minority class (MCC = 0.086).
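
The three equalisation methods named in the caption can be sketched as follows. This is a simplified illustration under stated assumptions: the function names, the toy two-feature rows and the target sizes are hypothetical, and the SMOTE function only shows the core interpolation idea, not the exact procedure or resampling ratios used in the study.

```python
import random


def undersample(majority, n_keep, rng):
    """Under-sampling: keep a random subset of n_keep majority rows."""
    return rng.sample(majority, n_keep)


def oversample(minority, n_total, rng):
    """Over-sampling: add random copies of minority rows until n_total rows exist."""
    copies = [rng.choice(minority) for _ in range(n_total - len(minority))]
    return list(minority) + copies


def smote(minority, n_new, k, rng):
    """SMOTE-style synthesis: interpolate between a minority row and a near neighbour.

    Needs at least two minority rows.
    """
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [row for row in minority if row is not base]
        neighbours = sorted(others, key=lambda row: squared_distance(base, row))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


rng = random.Random(42)
# Hypothetical two-feature rows: 8 majority (HSIL) and 3 minority (NO-HSIL) patients
majority = [(float(i), float(i % 3)) for i in range(8)]
minority = [(1.0, 5.0), (2.0, 6.0), (3.0, 5.5)]

print(len(undersample(majority, 4, rng)))  # 4 majority rows kept
print(len(oversample(minority, 6, rng)))   # 3 originals plus 3 random copies
print(len(minority) + len(smote(minority, 3, k=2, rng=rng)))  # 3 originals plus 3 synthetic rows
```

Over-sampling repeats existing minority rows, whereas the SMOTE-style step creates new rows that lie between existing ones, which is why the two methods can lead to different classifier performance on the same data.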

Figure 3

True positive rate (TPR) and false positive rate (FPR), combined for YES and NO predictions, for different equalisation methods (no correction of minority class, under-sampling, over-sampling and synthetic minority over-sampling technique [SMOTE]) in both RAW and Class settings. The best-performing model from Figure 2 has a true positive rate of 0.842 and a false positive rate of 0.182. The lowest-performing model from Figure 2 has a true positive rate of 0.814, almost as high as that of the best model, but also a high false positive rate of 0.735.
Raw = original settings; Class = class setting; FPR = false positive rate; HSIL = high grade squamous intraepithelial lesion; overs = oversampling; TPR = true positive rate; unders = undersampling; SMOTE = synthetic minority over-sampling technique
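
The combined (YES and NO) rates are weighted averages of the per-class rates. The sketch below recomputes them for the Class setting with over-sampling from the per-class values in the results table. The helper name `weighted_average` is hypothetical, and the class shares are an assumption derived from the ZeroR baseline of 69.79 % for that setting.

```python
def weighted_average(value_yes, value_no, weight_yes, weight_no):
    """Average a per-class metric, weighting each class by its share of instances."""
    return (value_yes * weight_yes + value_no * weight_no) / (weight_yes + weight_no)


# Assumed class shares: ZeroR predicts the majority class, so its accuracy
# (69.79 %) is taken as the share of HSIL-Yes instances after over-sampling.
share_yes = 0.6979
share_no = 1 - share_yes

# Per-class rates reported for Class_overs (Y and N rows)
tpr_avg = weighted_average(0.860, 0.799, share_yes, share_no)
fpr_avg = weighted_average(0.201, 0.140, share_yes, share_no)
print(round(tpr_avg, 3), round(fpr_avg, 3))
```

The results land close to the reported 0.842 and 0.182; small differences in the last digit can arise because the inputs are already rounded to three decimals.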

Figure 4

Receiver operating characteristic (ROC) curve for multi-layer perceptron (MLP) performance on the dataset without grouping into classes and without correction for the minority class. The X axis represents 1 - specificity (false positive rate) and the Y axis represents sensitivity (true positive rate). Area under the ROC curve (AUC) = 0.594; the AUC for classification by random guessing is 0.5. This figure represents the lowest-performing MLP model in our study.

Figure 5

Receiver operating characteristic (ROC) curve for multi-layer perceptron (MLP) performance on the dataset with patients grouped into classes and synthetic minority over-sampling technique (SMOTE) correction for the minority class. The X axis represents 1 - specificity (false positive rate) and the Y axis represents sensitivity (true positive rate). Area under the ROC curve (AUC) = 0.802, which is well above the AUC of 0.5 obtained by random guessing. This figure represents the best-performing MLP model in our study.
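
As a rough illustration of how an ROC curve and its AUC are obtained from classifier scores, the sketch below lowers the decision threshold step by step, records the (false positive rate, true positive rate) pairs and integrates them with the trapezoidal rule. The function names and the toy labels and scores are hypothetical and do not come from the study data.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by lowering the decision threshold step by step."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Instances sharing the same score cross the threshold together
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical toy data: 1 = HSIL, 0 = NO-HSIL
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc(roc_points(labels, scores)))

# A classifier that gives every patient the same score cannot rank them
print(auc(roc_points([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])))  # 0.5
```

In the toy data, 8 of the 9 positive-negative pairs are ranked correctly, so the first AUC is 8/9. The second call shows why 0.5 is the reference value for random guessing.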

Number and percentage of patients according to human papilloma virus (HPV) 16 and HPV 18 status in the high grade squamous intraepithelial lesion (HSIL) and NO-HSIL groups

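| Test result | HPV 16, HSIL group: Frequency | HPV 16, HSIL group: % | HPV 16, NO-HSIL group: Frequency | HPV 16, NO-HSIL group: % | HPV 18, HSIL group: Frequency | HPV 18, HSIL group: % | HPV 18, NO-HSIL group: Frequency | HPV 18, NO-HSIL group: % |
|---|---|---|---|---|---|---|---|---|
| Not performed | 177 | 14 | 29 | 16 | 172 | 13 | 27 | 15 |
| Negative | 693 | 54 | 106 | 57 | 775 | 60 | 120 | 65 |
| Positive | 419 | 32 | 51 | 27 | 342 | 27 | 39 | 20 |
| Total | 1289 | 100 | 186 | 100 | 1289 | 100 | 186 | 100 |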

Results of multi-layer perceptron (MLP) classification for the different settings, with the baseline prediction (ZeroR), the percentage of correct classifications and the Kappa statistic for each analysis. Results are given for the prediction of high grade squamous intraepithelial lesion (HSIL-Yes, Y), the prediction of NO-HSIL (N) and the weighted average for the whole model (YES and NO combined; AVG). Results where the prediction by MLP is better than the baseline prediction ZeroR are shown in bold

| Setting | TP Rate | FP Rate | Precision | Recall | F-Measure | MCC | ROC Area | PRC Area | Class | % Correct | Kappa | ZeroR % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Class_orig–Y | 0.751 | 0.634 | 0.739 | 0.751 | 0.745 | 0.118 | 0.567 | 0.735 | Yes | 82.10 | 0.0965 | 87.39 |
| Class_orig–N | 0.366 | 0.249 | 0.308 | 0.366 | 0.373 | 0.118 | 0.567 | 0.377 | No | | | |
| Class_orig–AVG | 0.637 | 0.521 | 0.633 | 0.637 | 0.635 | 0.118 | 0.567 | 0.629 | Weighted Avg | | | |
| Class_overs–Y | 0.860 | 0.201 | 0.908 | 0.860 | 0.884 | 0.640 | 0.870 | 0.920 | Yes | 84.19 | 0.6376 | 69.79 |
| Class_overs–N | 0.799 | 0.140 | 0.712 | 0.799 | 0.753 | 0.640 | 0.870 | 0.703 | No | | | |
| Class_overs–AVG | 0.842 | 0.182 | 0.849 | 0.842 | 0.844 | 0.640 | 0.870 | 0.855 | Weighted Avg | | | |
| Class_SMOTE–Y | 0.797 | 0.274 | 0.834 | 0.797 | 0.815 | 0.515 | 0.802 | 0.850 | Yes | 77.08 | 0.5141 | 63.40 |
| Class_SMOTE–N | 0.726 | 0.203 | 0.673 | 0.726 | 0.699 | 0.515 | 0.802 | 0.669 | No | | | |
| Class_SMOTE–AVG | 0.771 | 0.248 | 0.775 | 0.771 | 0.772 | 0.515 | 0.802 | 0.784 | Weighted Avg | | | |
| Class_unders–Y | 0.669 | 0.559 | 0.636 | 0.669 | 0.652 | 0.112 | 0.542 | 0.608 | Yes | 57.64 | 0.1113 | 59.39 |
| Class_unders–N | 0.441 | 0.331 | 0.477 | 0.441 | 0.458 | 0.112 | 0.542 | 0.448 | No | | | |
| Class_unders–AVG | 0.576 | 0.466 | 0.572 | 0.576 | 0.573 | 0.112 | 0.542 | 0.543 | Weighted Avg | | | |
| RAW_orig–Y | 0.907 | 0.828 | 0.884 | 0.907 | 0.895 | 0.086 | 0.594 | 0.905 | Yes | 81.42 | 0.0856 | 87.39 |
| RAW_orig–N | 0.172 | 0.093 | 0.211 | 0.172 | 0.189 | 0.086 | 0.594 | 0.174 | No | | | |
| RAW_orig–AVG | 0.814 | 0.735 | 0.799 | 0.814 | 0.806 | 0.086 | 0.594 | 0.813 | Weighted Avg | | | |
| RAW_overs–Y | 0.825 | 0.285 | 0.870 | 0.825 | 0.847 | 0.525 | 0.837 | 0.905 | Yes | 79.21 | 0.523 | 69.79 |
| RAW_overs–N | 0.715 | 0.175 | 0.639 | 0.715 | 0.675 | 0.525 | 0.837 | 0.661 | No | | | |
| RAW_overs–AVG | 0.792 | 0.252 | 0.800 | 0.792 | 0.795 | 0.525 | 0.837 | 0.831 | Weighted Avg | | | |
| RAW_SMOTE–Y | 0.800 | 0.258 | 0.843 | 0.800 | 0.821 | 0.533 | 0.814 | 0.867 | Yes | 77.87 | 0.5318 | 63.4 |
| RAW_SMOTE–N | 0.742 | 0.200 | 0.681 | 0.742 | 0.710 | 0.533 | 0.814 | 0.691 | No | | | |
| RAW_SMOTE–AVG | 0.779 | 0.237 | 0.784 | 0.779 | 0.780 | 0.533 | 0.814 | 0.802 | Weighted Avg | | | |
| RAW_unders–Y | 0.688 | 0.575 | 0.636 | 0.688 | 0.661 | 0.115 | 0.551 | 0.614 | Yes | 58.08 | 0.1144 | 59.39 |
| RAW_unders–N | 0.425 | 0.313 | 0.482 | 0.425 | 0.451 | 0.115 | 0.551 | 0.466 | No | | | |
| RAW_unders–AVG | 0.581 | 0.469 | 0.573 | 0.581 | 0.576 | 0.115 | 0.551 | 0.554 | Weighted Avg | | | |
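
The ZeroR baseline and the Kappa statistic in this table can be sketched as below. ZeroR always predicts the majority class, so its accuracy equals the majority share; Cohen's kappa corrects the observed agreement for the agreement expected by chance. The function names are hypothetical. The class sizes 1289 and 186 are the group totals reported for the HSIL and NO-HSIL groups; the four confusion-matrix counts are an assumption, back-calculated from the rounded Class_overs rates, and are not figures reported by the source.

```python
def zero_r_accuracy(n_yes, n_no):
    """Accuracy of always predicting the majority class."""
    return max(n_yes, n_no) / (n_yes + n_no)


def cohen_kappa(tp, fn, fp, tn):
    """Agreement between prediction and truth, corrected for chance agreement."""
    total = tp + fn + fp + tn
    observed = (tp + tn) / total
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    return (observed - expected) / (1 - expected)


# Original data: 1289 HSIL and 186 NO-HSIL patients
baseline = 100 * zero_r_accuracy(1289, 186)
print(round(baseline, 2))  # 87.39

# Assumed counts for Class_overs (HSIL-Yes as the positive class),
# back-calculated from TP Rate 0.860, FP Rate 0.201 and ZeroR 69.79 %
kappa = cohen_kappa(tp=1109, fn=180, fp=112, tn=446)
print(round(kappa, 3))
```

With these assumed counts the kappa value comes out close to the reported 0.6376. A model is only useful in this comparison when its percentage of correct classifications exceeds the ZeroR value for the same dataset.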

Confusion matrix for classification with all possible outcomes

| | Predicted pos (PP) | Predicted neg (PN) |
|---|---|---|
| Actual pos (P) | True positives (TP) | False negatives (FN) |
| Actual neg (N) | False positives (FP) | True negatives (TN) |
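
The Matthews correlation coefficient follows directly from the four cells of this confusion matrix. The sketch below is illustrative only: the function name `mcc` is hypothetical, and the counts are an assumption, back-calculated from the rounded per-class rates and class sizes of two settings in the results table (HSIL-Yes taken as the positive class). They are not counts reported by the source.

```python
import math


def mcc(tp, fn, fp, tn):
    """Matthews correlation coefficient from the four confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


# Assumed counts, reconstructed from rounded rates (approximation for illustration)
best = mcc(tp=1109, fn=180, fp=112, tn=446)   # Class setting, over-sampling
worst = mcc(tp=1169, fn=120, fp=154, tn=32)   # RAW setting, no correction
print(round(best, 3), round(worst, 3))
```

Both values come out close to the reported 0.640 and 0.086. The second case shows why MCC is informative on imbalanced data: most positives are found, yet most negatives are misclassified, and MCC stays near zero.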

Final histology of the cone in patients without human papilloma virus (HPV) testing

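| Final histology | Frequency | Percent |
|---|---|---|
| No dysplasia | 9 | 1.8 |
| CIN 1 | 26 | 5.3 |
| CIN 1–2 | 27 | 5.4 |
| CIN 2 | 90 | 18.1 |
| CIN 2–3 | 55 | 11.1 |
| CIN 3 | 223 | 45.0 |
| CIS | 55 | 11.1 |
| Invasive carcinoma | 11 | 2.2 |
| Total | 496 | 100.0 |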
eISSN: 1581-3207
Language: English
Publication timeframe: 4 issues per year
Journal subjects: Medicine, Clinical Medicine, General Medicine, Internal Medicine, Haematology, Oncology, Radiology