Predictive classification and regression models for bioimpedance vector analysis: Insights from a southern Cuban cohort
Published online: 04 Aug 2025
Pages: 89-98
Received: 10 Jan 2025
DOI: https://doi.org/10.2478/joeb-2025-0012
© 2025 Jose Luis García Bello et al., published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
Bioimpedance technology offers a versatile, noninvasive method for cancer detection and monitoring. Its use in detecting and tracking breast, neck, and skin cancers shows its potential to support early diagnosis, monitor treatment efficacy, and improve patient outcomes. As research advances, bioimpedance could become an important component of personalized cancer care, giving clinicians additional tools for more accurate diagnosis and effective management [1,2,3,4,5,6,7,8,9,10,11,12].
Bioimpedance technologies, which measure the resistance and reactance of biological tissues to an applied electrical current, have shown significant potential in oncology [1,2,3,4,5,6,7,8,9,10,11,12,13]. This non-invasive method provides insight into tissue composition and physiological changes, making it a promising tool for cancer detection and monitoring [1,2,3]. Breast (mammary) cancer, for instance, is one of the most prevalent cancers among women [13,14,15,16], and early detection and accurate monitoring are essential for improving patient outcomes [17,18,19]. Recent studies have explored the application of bioimpedance in several types of cancer, including breast, neck, and skin cancers [13,14,15,16].
Bioimpedance techniques, including single-frequency (SF-BIA) and multi-frequency (MF-BIA) bioimpedance analysis, are used to measure physiological parameters. SF-BIA, which uses a single frequency (50 kHz) of electrical current, is a simple and cost-effective method often employed in healthcare and fitness settings [20,21,22,23,24]. However, it may not accurately estimate extracellular and intracellular water content, because the electrical properties of tissues and fluids vary with frequency [20,21,22,23,24]. MF-BIA and impedance spectroscopy (SBIA) use multiple frequencies, revealing more detailed information on the cellular and molecular properties of tissues and fluids [20,21,22,23,24]. Another bioimpedance method is BIA vector analysis (BIVA), a non-invasive method that evaluates body composition and hydration status from electrical impedance measured at a low frequency. BIVA standardizes bioelectrical impedance analysis (BIA) measurements by height: resistance (R) and reactance (Xc) are divided by height and represented as bivariate vectors on the R-Xc plane, where their confidence intervals appear as ellipses. The key advantage of this technique is that it provides simultaneous information about changes in tissue hydration and soft-tissue mass, independent of regression equations or body weight. BIVA has been widely used to study hydration across various diseases [20,21,22,23,24,25,26,27,28] and to assess general body composition in patients with lung cancer [27, 29] and head and neck cancers [25,26,27,28].
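The height normalization behind BIVA can be illustrated with a short Python sketch: R and Xc are divided by height (in metres, giving Ω/m), and the phase angle is arctan(Xc/R), which is unchanged by the normalization. The helper name `biva_vector` and the input values are hypothetical and only illustrate the arithmetic.

```python
import math


def biva_vector(r_ohm: float, xc_ohm: float, height_m: float) -> dict:
    """Height-normalized BIVA vector and phase angle (hypothetical helper).

    r_ohm, xc_ohm: resistance and reactance in ohms at the chosen frequency.
    height_m: body height in metres.
    """
    if height_m <= 0:
        raise ValueError("height must be positive")
    r_h = r_ohm / height_m           # R/H in ohm/m (x-axis of the R-Xc graph)
    xc_h = xc_ohm / height_m         # Xc/H in ohm/m (y-axis of the R-Xc graph)
    z_h = math.hypot(r_h, xc_h)      # vector length |Z|/H
    # The phase angle depends only on the ratio Xc/R, so height cancels out.
    phase_deg = math.degrees(math.atan2(xc_ohm, r_ohm))
    return {"R/H": r_h, "Xc/H": xc_h, "Z/H": z_h, "phase_deg": phase_deg}


# Illustrative input values, not data from the study.
v = biva_vector(r_ohm=500.0, xc_ohm=60.0, height_m=1.65)
print({key: round(val, 2) for key, val in v.items()})
```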
Studies have demonstrated that BIVA can effectively differentiate between cancer patients and healthy individuals by analysing parameters such as impedance, phase angle, and reactance [25,26,27,28]. For instance, cancer patients often exhibit lower phase angles and higher impedance values, indicating altered cellular integrity and body composition due to the disease. In addition, BIVA has been utilized to assess hydration status in advanced cancer patients, revealing significant associations between hydration levels and clinical outcomes. Research has shown that lower hydration status is linked to increased symptom intensity and shorter survival times in cancer patients [25,29,30]. The integration of BIVA into routine assessments highlights its potential as a valuable tool in oncology and general health monitoring [25,29,30].
In a previous work, we integrated various bioimpedance modalities to evaluate the health status of southern Cuban populations and demonstrated that this integrated methodology serves as a sensitive complementary tool for distinguishing between healthy individuals and cancer patients. That work also confirmed that the phase angle at the characteristic frequency may be a robust indicator of the overall health status of individuals, similar to the phase angle reported at 50 kHz [31].
Determining the location of individuals within the tolerance ellipses derived by BIVA is crucial for accurately assessing their health and nutritional status [25,26,27,28,29,30,31]. The location variables considered here are the BIVA status, quartile, and centile. The ellipses provide a graphical representation of an individual's impedance vector relative to a reference population [26,27,28,29]. By pinpointing where an impedance vector falls within these ellipses, clinicians can gain insight into the individual's body composition, hydration level, and cellular health [26,27,28,29]. For example, individuals positioned in the upper regions of the ellipses typically exhibit better cell membrane integrity and hydration status, while positions in the lower regions may indicate dehydration or impaired cellular function [25,26,27,28,29,30]. Moreover, the location within the tolerance ellipses can help identify early signs of malnutrition, overhydration, or other health issues that may not be immediately apparent through conventional assessments [26,27,28,29,30]. This proactive approach is particularly beneficial for managing chronic conditions such as heart failure, renal disease, and cancer. Determining an individual's location within the BIVA tolerance ellipses therefore enhances the precision and effectiveness of health assessments, supporting better-informed clinical decisions and improved patient outcomes [25,26,27,28,29,30,31].
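As an illustration of how the three location variables could be assigned, the Python sketch below computes the squared Mahalanobis distance of an individual's (R/H, Xc/H) point from a reference mean and covariance, maps it to the 50%, 75%, and 95% tolerance ellipses through the chi-square quantiles with two degrees of freedom, q(p) = -2 ln(1 - p), assigns a quadrant from the signs of the deviations, and combines both into a two-digit code. Everything here is an assumption for illustration: the reference statistics are invented, the large-sample chi-square boundary ignores the small-sample correction often applied to BIVA ellipses, quadrants are taken relative to the R/H and Xc/H axes rather than the ellipse axes, and the quadrant numbering and the code scheme (quartile digit followed by centile digit, in the style of the class labels 11 to 44 in Table 1) are hypothetical.

```python
import math

# Hypothetical reference statistics for (R/H, Xc/H) in ohm/m; a real analysis
# would estimate them from a (sex-specific) healthy reference population.
REF_MEAN = (300.0, 35.0)
REF_COV = ((1600.0, 120.0),
           (120.0, 36.0))

# Chi-square quantiles with 2 degrees of freedom: q(p) = -2 ln(1 - p).
ELLIPSES = [(50, -2 * math.log(1 - 0.50)),
            (75, -2 * math.log(1 - 0.75)),
            (95, -2 * math.log(1 - 0.95))]


def mahalanobis_sq(point, mean=REF_MEAN, cov=REF_COV):
    """Squared Mahalanobis distance of a 2-D point (closed-form 2x2 inverse)."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    # inv([[a, b], [c, d]]) = [[d, -b], [-c, a]] / det
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def centile(point):
    """Smallest tolerance ellipse containing the point; 100 if outside the 95% one."""
    d2 = mahalanobis_sq(point)
    for pct, q in ELLIPSES:
        if d2 <= q:
            return pct
    return 100


def quartile(point, mean=REF_MEAN):
    """Hypothetical numbering: 1 upper right, 2 upper left, 3 lower left, 4 lower right."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    if dy >= 0:
        return 1 if dx >= 0 else 2
    return 3 if dx < 0 else 4


def biva_status(point):
    """Hypothetical two-digit code: quartile digit, then centile digit (1 to 4)."""
    centile_digit = {50: 1, 75: 2, 95: 3, 100: 4}[centile(point)]
    return 10 * quartile(point) + centile_digit


print(biva_status((303.0, 36.4)), biva_status((380.0, 22.0)))
```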
Interest in artificial intelligence (AI) has grown significantly in recent years, driven by continuous advances in computer science. AI has proven particularly useful in data management and in various scientific fields, including medicine [32,33,34,35,36]. One area where AI holds great potential is diagnostic imaging, where it can aid in identifying various conditions and in assessing body composition [32,33]. This capability may further help predict health outcomes across different diseases [32,36]. For instance, bioimpedance measurements were combined with machine learning analysis in an infant-juvenile cohort from the eastern Cuban region [34,35]. The classification model showed that, besides body mass index, alternative indicators can serve as predictors of weight status [34,35], and the regression learner model predicted the weight status of the subjects with high accuracy [34,35].
Recently, a cancer predictive model was developed using bioimpedance measurements and machine learning methods [36]. Its classification model identified robust parameters for predicting health status, namely impedance, total body water, and phase angle, which showed significant relevance [36]. The phase angle predictions agree with previous reports on other pathologies, indicating that higher phase angle values are associated with better health status. Furthermore, males tended to have higher phase angle values than females [36].
To our knowledge, there are no reports combining machine learning with BIVA for predicting location classes. In the present work, we employed predictive classification and regression learner models to investigate the association between the bioparameters of subjects, derived at the characteristic frequency, and their locations within the tolerance ellipses in a cohort from the southern Cuban region. The model developed in this study enables location assignment without relying on BIVA methods, improving clinical evaluations and health monitoring.
A descriptive, retrospective, randomized study was conducted on patients who visited the “Conrado Benítez” Teaching Oncology Hospital in Santiago de Cuba for suspected cancer during March–April and July–August 2002. The hospital was chosen for its suitable facilities and accommodations, although participants came from various locations across eastern Cuba. The study comprised 367 subjects (235 females and 132 males) aged 18 to 86 years. Of these, 61 had a pathologically confirmed diagnosis of some type of cancer, while 306 were healthy. Participants were recruited following strict ethical guidelines and medical practices as outlined by the Health General Law of the Ministry of Public Health of the Republic of Cuba (Number 41, July 13, 1983, updated in 2010).
Following the Helsinki Declaration, the research was approved by the ethics committees and scientific councils of the “Conrado Benítez” Oncology Hospital. Additionally, data on the bioelectrical parameters of cancer patients and healthy individuals were retrieved from a database archived at the National Center for Applied Electromagnetism (CNEA) (ISBN: 978-959-207-679-2). Patients were diagnosed by anatomical pathology, using samples obtained with the fine needle aspiration cytology technique. The laboratory diagnoses revealed that female patients had various types and stages of cancer, including breast and cervical cancer, whereas male patients were diagnosed with more aggressive cancers, such as skin melanoma, colon cancer, and lung neoplasm. Further details can be found in Ref [11].
Informed consent has been obtained from all individuals included in this study.
The research related to human use has complied with all relevant national regulations and institutional policies, is in accordance with the tenets of the Helsinki Declaration, and has been approved by the authors' institutional review board or equivalent committee.
Achieving balanced data is essential for successful machine learning studies [32,33,34,35,36,37,38,39,40,41]. In our study, we balanced the data by applying random oversampling to the minority class (cancer) and random undersampling to the majority class (healthy). The final dataset included 621 individuals aged between 18 and 86 years (382 females and 239 males), with 316 diagnosed with cancer and 306 healthy participants. To guard against overfitting, a cross-validation technique is used, in which 95% of the data is designated for training and the remaining 5% is reserved for validation. This checks that the model generalizes to unseen data rather than memorizing patterns specific to the training set [37,38,39,40,41].
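The balancing and validation steps can be sketched with the Python standard library. The helper names, the target class size (306 per class here), the seed, and k = 20 are assumptions: the text does not state the exact sampling ratios, and reading the 95%/5% split as 20-fold cross-validation (each fold holds out one twentieth of the data) is an interpretation; a single 95/5 holdout split would be the other possible reading.

```python
import random


def rebalance(rows, label_of, target_per_class, seed=0):
    """Random over-/undersampling so every class ends up with target_per_class rows."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    balanced = []
    for members in by_class.values():
        if len(members) >= target_per_class:
            # Undersampling: draw a subset without replacement.
            balanced.extend(rng.sample(members, target_per_class))
        else:
            # Oversampling: keep every original row and add random duplicates.
            extra = rng.choices(members, k=target_per_class - len(members))
            balanced.extend(members + extra)
    rng.shuffle(balanced)
    return balanced


def kfold_indices(n, k=20, seed=0):
    """Yield (train, valid) index lists; with k=20 each fold holds out about 5%."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, valid in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, valid


# Toy cohort with the raw class counts reported above (61 cancer, 306 healthy).
cohort = [("cancer", i) for i in range(61)] + [("healthy", i) for i in range(306)]
balanced = rebalance(cohort, label_of=lambda r: r[0], target_per_class=306)
first_train, first_valid = next(kfold_indices(len(balanced)))
print(len(balanced), len(first_train), len(first_valid))
```

Because oversampled copies of one subject can land in both the training and the validation part, applying the resampling inside each training fold only is a common way to avoid optimistic validation scores.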
For the machine learning study, the features include: health status (cancer, healthy), sex, high frequency resistance (Rinf), characteristic frequency (
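Several metrics can be used to assess the accuracy of machine learning models [39,40,41]. Metrics for classification models are derived from the confusion matrix, whose entries count true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) [39,40,41]. For classification models, the accuracy is defined as:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Precision, the proportion of predicted positive instances that are actually positive, is defined as follows [39,40]:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
Recall describes how well the model captures all the positive instances [39,40]:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
The F1-score, the harmonic mean of precision and recall, is defined as [39,40]:
$$F_1 = 2\,\frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$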
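The classification metrics defined above can be checked with a short Python sketch that computes overall accuracy and per-class precision, recall (the true positive rate shown on the diagonal of a row-normalized confusion matrix), and F1-score. The confusion matrix is an invented toy example, not data from this study, and the rows-are-true-classes convention is an assumption.

```python
def classification_metrics(cm, labels):
    """Overall accuracy plus per-class (precision, recall, F1).

    cm[i][j] = number of samples whose true class is i and predicted class is j.
    """
    n = len(labels)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    per_class = {}
    for k, label in enumerate(labels):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp   # predicted k, true class differs
        fn = sum(cm[k]) - tp                         # true k, predicted class differs
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[label] = (precision, recall, f1)
    return correct / total, per_class


# Toy binary example with 30 cancer and 30 healthy cases (invented numbers).
cm = [[28, 2],   # true cancer:  28 predicted cancer, 2 predicted healthy
      [3, 27]]   # true healthy: 3 predicted cancer, 27 predicted healthy
accuracy, metrics = classification_metrics(cm, ["Cancer", "Healthy"])
print(f"accuracy = {accuracy:.3f}")
for label, (p, r, f) in metrics.items():
    print(f"{label}: precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```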
In addition, for regression models the common metrics are: the coefficient of determination (
Bioimpedance parameters were measured using a BioScan 98® bioimpedance analyzer (Biológica Tecnología Médica S.L., Barcelona, Spain) in a tetrapolar whole-body configuration. Before the measurements, participants fasted for at least 3 hours, emptied their bladders, and refrained from exercise and alcohol for 12 hours. MF-BIA measurements were taken at frequencies ranging from 10 to 250 kHz using 3M Red Dot 2560 Ag/AgCl electrodes (3M, Ontario, Canada). The study was conducted in a room maintained at 23°C with 60–65% relative humidity. Volunteers lay on a non-conductive surface without clothing or pillows, with their arms positioned 30° away from the chest and their legs spread 45° apart. After the skin was cleaned with 70% alcohol, the injector electrodes were placed on the inner side of the dorsal surfaces of the hands and feet, near the third metacarpal and metatarsophalangeal joints. The detector electrodes were positioned between the distal ends of the ulna and radius at the pisiform prominence and at the midpoint between both malleoli. A 5 cm gap was maintained between the detector and injector electrodes during measurements.
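The characteristic-frequency parameters used in this work (Rc, Xcc, Zc, and the phase angle at fc) can be related to the fitted parameters that appear among the features (Rinf, α, fc). The text does not spell out the fitting procedure, so the Python sketch below assumes the standard Cole model, Z(f) = Rinf + (R0 - Rinf) / (1 + (j f/fc)^α), with α taken as the Cole dispersion exponent. R0 (the zero-frequency resistance), the helper names, and the parameter values are assumptions for illustration only. Under this model the reactance peaks at fc, where Rc = (R0 + Rinf)/2 for any α.

```python
import math


def cole_impedance(f_hz, r0, rinf, fc_hz, alpha):
    """Cole model: Z(f) = Rinf + (R0 - Rinf) / (1 + (j*f/fc)**alpha)."""
    return rinf + (r0 - rinf) / (1 + (1j * f_hz / fc_hz) ** alpha)


def reactance(f_hz, r0, rinf, fc_hz, alpha):
    """Capacitive reactance reported as a positive number (minus the imaginary part)."""
    return -cole_impedance(f_hz, r0, rinf, fc_hz, alpha).imag


def characteristic_parameters(r0, rinf, fc_hz, alpha):
    """Rc, Xcc, Zc and phase angle (degrees) evaluated at f = fc."""
    z = cole_impedance(fc_hz, r0, rinf, fc_hz, alpha)
    rc, xcc = z.real, -z.imag
    return rc, xcc, abs(z), math.degrees(math.atan2(xcc, rc))


# Illustrative Cole parameters (assumed, not taken from the study).
R0, RINF, FC, ALPHA = 600.0, 400.0, 50e3, 0.8
rc, xcc, zc, phase = characteristic_parameters(R0, RINF, FC, ALPHA)
print(f"Rc={rc:.1f} ohm  Xcc={xcc:.2f} ohm  Zc={zc:.1f} ohm  phase={phase:.2f} deg")

# Closed forms implied by the model at f = fc (for 0 < alpha <= 1):
# Rc = (R0 + Rinf) / 2 and Xcc = (R0 - Rinf) / 2 * tan(alpha * pi / 4).
print((R0 + RINF) / 2, (R0 - RINF) / 2 * math.tan(ALPHA * math.pi / 4))

# The reactance peaks at fc and is symmetric on a logarithmic frequency axis.
for f in (25e3, 50e3, 100e3):
    print(f"{f / 1e3:.0f} kHz: Xc = {reactance(f, R0, RINF, FC, ALPHA):.2f} ohm")
```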
With the balanced database, we trained classification models to predict the location features. Table 1 lists the best model for each location feature, together with the metrics describing its performance and generality. The Fine Tree model performs best for health status and BIVA status, with accuracies of 92.80% and 99.50%, respectively. Meanwhile, the linear support vector machine (Linear SVM) and the random undersampling boosted tree (RUSBoosted Tree) are the top models for the quartile and centile responses, with accuracies of 100% and 97%, respectively.
Table 1. Response models and their respective metrics for the classification of health status and location variables. The column Class includes Health Status (Cancer, Healthy), Quartile (1, 2, 3, 4), Centile (50, 75, 95, and 100%), and BIVA status (11, 12, 13, 14, 21, 22, 23, 31, 32, 33, 41, 42, 43, 44).
| Response | Model | Accuracy | Class | Precision | Recall | F1-score |
|---|---|---|---|---|---|---|
| Health status | Fine Tree | 92.80% | Cancer | 0.912 | 0.951 | 0.931 |
| | | | Healthy | 0.947 | 0.905 | 0.926 |
| BIVA status | Fine Tree | 99.50% | 11 | 1.000 | 1.000 | 1.000 |
| | | | 12 | 1.000 | 1.000 | 1.000 |
| | | | 13 | 1.000 | 1.000 | 1.000 |
| | | | 14 | 1.000 | 1.000 | 1.000 |
| | | | 21 | 1.000 | 1.000 | 1.000 |
| | | | 22 | 1.000 | 1.000 | 1.000 |
| | | | 23 | 0.857 | 0.750 | 0.800 |
| | | | 31 | 1.000 | 1.000 | 1.000 |
| | | | 32 | 1.000 | 1.000 | 1.000 |
| | | | 33 | 0.889 | 0.941 | 0.914 |
| | | | 41 | 1.000 | 1.000 | 1.000 |
| | | | 42 | 1.000 | 1.000 | 1.000 |
| | | | 43 | 1.000 | 1.000 | 1.000 |
| | | | 44 | 1.000 | 1.000 | 1.000 |
| Quartile | Linear SVM | 100% | 1 | 1.000 | 1.000 | 1.000 |
| | | | 2 | 1.000 | 0.983 | 0.991 |
| | | | 3 | 0.990 | 1.000 | 0.995 |
| | | | 4 | 1.000 | 1.000 | 1.000 |
| Centile | RUS Boosted Tree | 97% | 50% | 0.988 | 0.984 | 0.986 |
| | | | 75% | 0.959 | 0.953 | 0.956 |
| | | | 95% | 0.948 | 0.938 | 0.943 |
| | | | 100% | 0.957 | 1.000 | 0.978 |
The confusion matrices presented in Figure 1 illustrate the models' performance across the classification tasks. For health status (Figure 1a), the matrix shows a high true positive rate (TPR) for healthy individuals, with 96.9% correctly identified and only 3.1% misclassified, while all cancer (C) cases (100%) are correctly classified, demonstrating excellent discrimination between healthy individuals and cancer patients. For BIVA status (Figure 1b), which comprises 14 classes, most classes achieved 100% classification accuracy. Minor misclassifications (4.5%) were observed between classes 23 and 33, consistent with their lower precision and recall in Table 1.

Figure 1. Confusion matrices of the trained models: a) health status, b) BIVA status, c) quartile, and d) centile responses.
The overall TPR of 99.5% highlights the model's high accuracy in identifying BIVA statuses. The quartile response matrix (Figure 1c) shows perfect classification, with 100% accuracy across all four classes, indicating that the model differentiates quartiles without error. For the centile responses (Figure 1d), performance is slightly lower but still high: true classes 50%, 75%, and 95% show some minor misclassifications, whereas class 100% is perfectly classified. The overall TPR of 95.3% indicates robust performance, although there is some room for improvement in distinguishing adjacent centiles. Collectively, these matrices show that the models classify health status, BIVA status, quartile, and centile categories accurately, with high true positive rates and few misclassifications.
Feature importance analysis using a Pareto chart is an effective way to identify and visualize the most significant predictors in a classification model [37,38,39]. By ranking features according to their importance scores and displaying them in a Pareto chart, one can readily discern the few key features that contribute most to the model's predictive power [37,38,39]. The chart rests on the Pareto principle, which holds that a small number of causes account for most of an effect; applied to feature importance, this means that a small number of features typically account for most of the model's performance. In the Pareto chart, features are ordered from highest to lowest importance, with a cumulative line showing their collective contribution. This representation helps prioritize features for model improvement, simplify complex datasets, and focus efforts on the most impactful predictors [37,38,39].
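A Pareto chart is built from two series: the importance scores sorted in descending order (the bars) and their running cumulative percentage (the line). The Python sketch below computes both and returns the smallest leading group of features that reaches a cumulative threshold. The scores are invented, the 80% cut-off is the conventional Pareto rule of thumb rather than a value used in this study, and the feature names merely echo those discussed in the text.

```python
def pareto_ranking(importances):
    """Sort features by importance and attach the cumulative share in percent."""
    total = sum(importances.values())
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    cumulative = 0.0
    rows = []
    for name, score in ranked:
        cumulative += score
        rows.append((name, score, 100.0 * cumulative / total))
    return rows


def key_features(rows, threshold_pct=80.0):
    """Smallest leading set of features whose cumulative share reaches the threshold."""
    selected = []
    for name, _, cum_pct in rows:
        selected.append(name)
        if cum_pct >= threshold_pct:
            break
    return selected


# Invented importance scores for illustration only.
scores = {"age": 0.95, "alpha": 0.40, "fc": 0.30, "Rinf": 0.15, "sex": 0.05}
rows = pareto_ranking(scores)
for name, score, cum in rows:
    print(f"{name:6s} {score:5.2f} {cum:6.1f}%")
print(key_features(rows))
```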
Figure 2 displays the feature importance for each response. For the health status model (Figure 2a), the most influential feature is age, with an importance score of 95%, followed by α, fc, BIVA status (BIVA), Rinf,

Figure 2. Feature importance of the trained models: a) health status, b) BIVA status, c) quartile, and d) centile responses.
The phase angle is a well-established marker for assessing cell membrane health and integrity, bearing significant implications for cancer prognosis [41,42,43,44]. A comprehensive study with 2625 participants demonstrated a positive and substantial correlation between phase angle and cancer survival rates. Specifically, patients exhibiting low phase angle values were found to be 23% less likely to survive compared to those with higher values [41]. Additionally, phase angle at the time of diagnosis is recognized as a crucial prognostic factor for survival among patients with advanced head and neck cancer [42]. Previous research has indicated that phase angle shows a moderate to strong correlation with body composition and physical function, while its correlation with nutritional status, complications, survival, quality of life, and symptoms tends to be weaker. These findings underscore the utility of phase angle as a robust indicator in clinical assessments, particularly in the context of cancer prognosis [43,44].
In addition, phase angle, Xc, and R determine the location within the tolerance ellipses in BIVA studies [25,26,27]. By analysing the position of an individual's impedance vector within these ellipses, researchers can assess hydration status, cell membrane integrity, and overall cellular health.
Previous reports used single-frequency bioimpedance for BIVA studies. In our previous work, we demonstrated the use of impedance spectroscopy combined with BIVA for assessing health status by constructing the tolerance ellipses at the characteristic frequency instead of the usual 50 kHz [31]. In the present study, we found that the bioparameters derived at the characteristic frequency play a crucial role in determining the health status and the location parameters.
This section focuses on developing regression models to predict Zc, Rc, Xcc, and
Table 2. Accuracy parameters of each regression model.
| Response | Model | R² | RMSE | MSE | MAE |
|---|---|---|---|---|---|
| Zc (Ω) | Linear | 1.000 | 0.351 Ω | 0.123 Ω² | 0.233 Ω |
| Phase angle (°) | Linear | 0.980 | 0.237° | 0.056 (°)² | 0.166° |
| Xcc (Ω) | Linear SVM | 0.990 | 2.378 Ω | 5.652 Ω² | 1.674 Ω |
| Rc (Ω) | Linear SVM | 1.000 | 4.216 Ω | 17.773 Ω² | 3.223 Ω |

Figure 3. Response vs. predicted plots of the selected responses.
The accuracy parameters listed in Table 2 further corroborate the visual findings from Figure 3. The linear model for characteristic impedance and the Linear SVM model for characteristic resistance both reach R² = 1.000 (to three decimals), indicating near-perfect prediction, and their RMSE, MSE, and MAE values are low. The phase angle model reaches R² = 0.98, with similarly low RMSE, MSE, and MAE values, confirming robust prediction accuracy.
For the characteristic reactance, the Linear SVM model shows R² = 0.99, a slightly lower but still high accuracy. Its RMSE, MSE, and MAE values reflect this slight decrease in precision while still demonstrating strong performance.
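The regression metrics can be written out in a few lines. RMSE is the square root of MSE and MAE cannot exceed RMSE, which gives a quick consistency check on reported values (in Table 2, for example, 0.351² ≈ 0.123 for Zc). The Python sketch below uses invented observed and predicted values, not study data, and computes R² as 1 - SS_res/SS_tot, the usual definition, which is assumed here rather than stated in the text.

```python
import math


def regression_metrics(observed, predicted):
    """Return R^2, RMSE, MSE and MAE for paired observed/predicted values."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    mse = ss_res / n
    rmse = math.sqrt(mse)
    mae = sum(abs(r) for r in residuals) / n
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    return r2, rmse, mse, mae


# Invented characteristic-impedance values in ohms (not study data).
observed = [480.0, 510.0, 530.0, 565.0, 600.0]
predicted = [481.0, 509.0, 531.5, 564.0, 599.0]
r2, rmse, mse, mae = regression_metrics(observed, predicted)
print(f"R2={r2:.4f}  RMSE={rmse:.3f} ohm  MSE={mse:.3f} ohm^2  MAE={mae:.3f} ohm")
```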
Figure 4 displays the observed and predicted values of Zc,

The box plots in Figure 4 illustrate the close alignment between the experimental and predicted values of Zc,
From Figure 4c, cancer patients again show lower characteristic reactance, in accordance with the reported deterioration of cell membranes [25,26,27,28,29,30,31]. By sex, female subjects show higher Xcc, which indicates better cell membrane integrity and overall well-being [25,26,27,28,29,30]. Quartiles 3 and 4, where most cancer patients are located, have lower Xcc. Subjects at the 50% centile have the highest Xcc, those at the 75% and 95% centiles have similar values, and those at the 100% centile have the lowest Xcc; the cancer patients are located in the 50–100% centiles. The BIVA statuses show similar behaviour. From Figure 4d, the characteristic resistance is slightly higher in cancer patients than in healthy individuals, which may be attributed to unbalanced body composition in cancer patients. Notably, lower Rc values are observed in male individuals, those in the third quartile, and those within the 75–95% centile range. Rc also varies across the BIVA statuses.
In order to evaluate the model's performance, a schematic representation of BIVA is presented in Figure 5. The maximum and minimum values of Zc,

Figure 5. Schematic representation of BIA vector analysis (BIVA), presenting the maximum and minimum values of Zc,
Previous studies have reported that individuals with pathologies associated with hydration status are located in quartile 1, while athletes and people who exercise regularly are in quartile 2. The model also places healthy individuals between quartiles 1 and 2, within the 50% centile. Obese subjects and hemodialysis patients, including those with chronic renal failure, are found in quartile 3, whereas patients diagnosed with cancer, anorexia nervosa, human immunodeficiency virus infection (in various stages), COVID-19, and similar conditions are located in quartile 4 [2,29]. Our predictive model agrees with these previous studies. Thus, the model developed in this work makes it possible to assign locations without BIVA methods, enhancing clinical assessments and health monitoring.
In this work, predictive classification and regression learner models were used to study the association between bioparameters at the characteristic frequency and the location within the tolerance ellipses in a cohort from the southern Cuban region. We used 16 characteristics (features) derived from bioimpedance measurements, together with other physical parameters. The classification models show that the bioparameters derived at the characteristic frequency play a crucial role in determining the health status and the location parameters. From the regression models, we found a strong agreement between experimental and predicted values for Zc,
While the predictive models presented in this study demonstrate strong agreement between experimental and predicted values, certain limitations must be acknowledged. The dataset is specific to a cohort from the southern Cuban region, which may limit the generalizability of the findings to other populations with different physiological or environmental conditions. For future directions, expanding the dataset to include diverse ethnic, geographic, and physiological groups could enhance the robustness of the models across populations.