Deep Cross-Organism Generalization of the Physiological Effects of Spaceflight from Mammalian Model Organisms to Humans
Article category: Research Note
Published online: 02 Jul 2025
Pages: 39 - 50
DOI: https://doi.org/10.2478/gsr-2025-0003
© 2025 Konstantinos I. Adamopoulos et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.
The National Aeronautics and Space Administration (NASA) Open Science Data Repository (OSDR;
Afshinnekoo et al. highlighted that the biological responses during spaceflight arise from six fundamental molecular and cellular features, including oxidative stress, DNA damage, mitochondrial dysregulation, epigenetic changes, and alterations of telomere length (Afshinnekoo et al., 2020).
Many of the mechanisms by which long-term deep space exploration missions affect human physiology remain unknown (Liphardt et al., 2023). Because most data originate from model organisms, accurately mapping animal findings to humans is a significant challenge. Nonetheless, rodent models remain indispensable for advancing our understanding of these effects, given their established use in simulating microgravity conditions and their physiological similarities to humans (Moyer et al., 2016; Cahill et al., 2021; Ronca and Lowe, 2022).
Some stressors (e.g., confinement) are comparatively straightforward to replicate, while altered gravity and radiation are considerably more challenging to simulate with spaceflight analogues on Earth. Microgravity analogues, such as random positioning machines, clinostats, short-term free-fall intervals during parabolic flight, head-down bed rest, hindlimb unloading rodent models, and rotating wall vessels, have been devised to simulate infrequent altered gravity conditions. However, these analogues often fall short of fully capturing the nuanced effects of spaceflight, a limitation that underscores the need for continued refinement and innovation in analogue development (Morey-Holton and Globus, 2002; Brungs et al., 2016; Kiss et al., 2019; Afshinnekoo et al., 2020).
In our present study, we initially conducted an extensive differential expression analysis of OSDR data from
The overall study workflow is presented in Figure 1. Firstly, we systematically identified all the available microarray datasets and studies housed in NASA Open Science Data Repository (OSDR) by applying the search filters of “Microarray”, “

Figure 1. Overall study workflow.
Data preprocessing and differential expression analysis were performed in the R programming language within the Bioconductor environment. After retrieving the raw microarray expression data for each dataset from NASA OSDR, we normalized them with the affy or oligo packages (for Affymetrix arrays) or the limma package (for Agilent arrays) (Gautier et al., 2004; Carvalho and Irizarry, 2010; Ritchie et al., 2015). Outlier samples were identified and removed on the basis of quality control diagnostic plots, including PCA plots, boxplots of array intensity distributions, density plots of raw intensities, and MA plots. Official Gene Symbols, as designated by the HUGO Gene Nomenclature Committee (HGNC), were used as the common identifiers across experiments and platforms.
Differentially Expressed Genes (DEGs) between the microgravity and control groups were identified using the limma R package (Ritchie et al., 2015). P-values were adjusted with the Benjamini-Hochberg method (Benjamini and Hochberg, 1995). A gene was called a DEG when it met both of the following criteria: (1) an adjusted p-value below 0.05 and (2) an absolute log2 fold change (|log2FC|) above 1. Furthermore, Gene Ontology (GO) term and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed using the bioinformatics resource system DAVID (Database for Annotation, Visualization and Integrated Discovery) (Huang et al., 2009; Sherman et al., 2022).
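The two DEG criteria can be illustrated with a minimal Python sketch. This is not the limma workflow used in the study, which ran in R; it only shows the Benjamini-Hochberg adjustment and the thresholding step on made-up numbers. The function names `benjamini_hochberg` and `call_degs` and the toy values are assumptions introduced for illustration.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by n / rank.
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

def call_degs(log2fc, pvalues, alpha=0.05, lfc_cutoff=1.0):
    """Label each gene 'up', 'down', or 'ns' (not significant)."""
    log2fc = np.asarray(log2fc, dtype=float)
    adj = benjamini_hochberg(pvalues)
    significant = (adj < alpha) & (np.abs(log2fc) > lfc_cutoff)
    labels = np.where(significant & (log2fc > 0), "up",
                      np.where(significant & (log2fc < 0), "down", "ns"))
    return adj, labels

# Toy values (hypothetical, not taken from any OSDR dataset).
adj, labels = call_degs(log2fc=[2.0, -1.5, 0.4, 3.0],
                        pvalues=[0.001, 0.01, 0.03, 0.5])
print(dict(zip(["geneA", "geneB", "geneC", "geneD"], labels)))
```

In this toy example the third gene passes the adjusted p-value threshold but not the fold-change threshold, and the fourth passes the fold-change threshold but not the adjusted p-value threshold, so neither is called a DEG.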
The DEG screening results from individual datasets and tissues were integrated into new data frames, yielding two merged meta-datasets, one per species. These meta-datasets included the adjusted p-value, Gene Symbol, and log2FC value as primary features. The following features were then added as candidate predictors for our Machine Learning models: Gene Ontology ID, gravitational method (actual or simulated), exact gravitational method (e.g., spaceflight, bed rest analogue, hindlimb suspension, etc.), type of tissue (e.g., soleus, vastus lateralis, gastrocnemius, etc.), microarray platform (GPL6480 Agilent, GPL17692 Affymetrix, etc.), chromosome annotation, and time of exposure to microgravity (in hours). To address class imbalance, resampling techniques such as oversampling of minority classes and the Synthetic Minority Over-sampling Technique (SMOTE) algorithm were applied (Ramachandran et al., 2021). Categorical features were encoded into numeric representations using the OneHotEncoder method from the scikit-learn library (Pedregosa et al., 2011). These structured, preprocessed, and feature-enriched meta-datasets collectively define what is termed an “AI-ready dataset,” optimized for training robust and interpretable Machine Learning models.
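The encoding and resampling steps can be sketched as follows. The toy table, its column names, and the helper `oversample_minority` are hypothetical and stand in for the real meta-datasets. The sketch uses plain random oversampling from scikit-learn rather than SMOTE, which is available separately in the imbalanced-learn package and is not shown here.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from sklearn.utils import resample

# Hypothetical toy meta-dataset; columns and values are illustrative only.
meta = pd.DataFrame({
    "tissue": ["soleus", "soleus", "gastrocnemius",
               "vastus lateralis", "soleus", "gastrocnemius"],
    "gravity_method": ["actual", "simulated", "simulated",
                       "simulated", "actual", "actual"],
    "exposure_hours": [720.0, 336.0, 168.0, 1440.0, 720.0, 240.0],
    "label": [1, 1, 1, 1, 0, 0],  # 1 = over-expressed, 0 = under-expressed
})

def oversample_minority(df, label_col="label", seed=0):
    """Randomly duplicate rows of smaller classes up to the majority size."""
    counts = df[label_col].value_counts()
    target = int(counts.max())
    parts = []
    for cls, n in counts.items():
        subset = df[df[label_col] == cls]
        if n < target:
            subset = resample(subset, replace=True,
                              n_samples=target, random_state=seed)
        parts.append(subset)
    return pd.concat(parts, ignore_index=True)

balanced = oversample_minority(meta)

# One-hot encode the categorical columns; numeric columns pass through.
encoder = OneHotEncoder(handle_unknown="ignore")
categorical = encoder.fit_transform(
    balanced[["tissue", "gravity_method"]]).toarray()
X = np.hstack([categorical, balanced[["exposure_hours"]].to_numpy()])
y = balanced["label"].to_numpy()
print(X.shape, np.bincount(y))
```

In practice, resampling is usually applied to the training portion only, so that duplicated or synthetic rows do not leak into the test set.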
After data preparation and preprocessing, we trained multiple supervised Machine Learning models to classify gene expression levels. Logistic Regression, Decision Tree, Support Vector Machine (SVM), Random Forest, Naïve Bayes, and Artificial Neural Network classifiers were used for the binary task of predicting whether a gene was overexpressed or underexpressed. Additionally, we performed multi-class classification with K-Nearest Neighbors to predict whether a gene is overexpressed, underexpressed, or non-significantly differentially expressed in microgravity. For the Transfer Learning approach, we first pretrained our model on the mouse datasets and then fine-tuned it on the comparatively smaller human target dataset. To reduce dimensionality, Principal Component Analysis (PCA) was applied to both the mouse and human datasets, and the top 100 principal components of each were used as input to the classifiers. The data were partitioned into training and test sets using an 80–20% split, and five-fold cross-validation (k = 5) was implemented for robust evaluation. Classifier performance was evaluated and compared using accuracy, precision, and recall. Following hyperparameter tuning, the Neural Network comprised three layers with ReLU, SELU, and Sigmoid activation functions, respectively, and was compiled with the Nadam optimizer and the binary cross-entropy loss function. We used a linear kernel for the SVM, configured the Random Forest classifier with seventy estimators, and applied Gaussian Naïve Bayes to the mouse dataset and Bernoulli Naïve Bayes to the human dataset.
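A minimal scikit-learn sketch of this evaluation setup is given below, assuming synthetic data from `make_classification` in place of the real meta-datasets. The 100 principal components, the 80–20% split, five-fold cross-validation, the linear SVM kernel, and the seventy Random Forest estimators follow the text; the standardization step, the random seeds, and the choice to fit PCA inside a pipeline are assumptions of this sketch.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for an encoded meta-dataset (assumption).
X, y = make_classification(n_samples=400, n_features=150,
                           n_informative=20, random_state=0)

# 80-20% train/test split, stratified on the class label.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

classifiers = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "linear_svm": SVC(kernel="linear"),
    "random_forest": RandomForestClassifier(n_estimators=70, random_state=0),
    "gaussian_nb": GaussianNB(),
}

results = {}
for name, clf in classifiers.items():
    # PCA is fitted inside the pipeline, so each cross-validation fold
    # learns its components from that fold's training portion only.
    model = make_pipeline(StandardScaler(),
                          PCA(n_components=100, random_state=0),
                          clf)
    cv_accuracy = cross_val_score(model, X_train, y_train,
                                  cv=5, scoring="accuracy").mean()
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    results[name] = {
        "cv_accuracy": cv_accuracy,
        "test_accuracy": accuracy_score(y_test, predictions),
        "test_precision": precision_score(y_test, predictions),
        "test_recall": recall_score(y_test, predictions),
    }

for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

Wrapping PCA and the classifier in one pipeline is a design choice that avoids information from held-out folds influencing the projection; whether the study fitted PCA before or after splitting is not stated in the text.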
Machine Learning models were implemented in the Python programming language (Van Rossum and Drake, 2009), with Jupyter Notebook serving as the web-based interactive computing platform. The Neural Networks were built with the TensorFlow library and Keras (Developers, 2024). The scikit-learn and seaborn libraries were used to generate classification reports, confusion matrices, and heatmaps, while Pandas, NumPy, and Matplotlib were used for data processing, numerical computation, and data visualization, respectively (Hunter, 2007; Harris et al., 2020; Waskom, 2021).
We present below (Figures 2 and 4) heatmaps depicting the differential expression profiles of genes identified through a DEG analysis of

Figure 2. Heatmap of significantly dysregulated genes in Mus musculus. Genes shaded in dark green are highly up-regulated (logFC above 1.5), and genes shaded in light green are up-regulated (logFC above 1). Genes with a logFC between −1 and 1 are considered non-significantly dysregulated. Conversely, genes shaded in dark red are highly down-regulated (logFC below −1.5). The heatmap includes only the top 50 differentially expressed genes per study or dataset.

Figure 3. Enriched terms and pathways of dysregulated genes in Mus musculus. Panel a presents the enriched terms of upregulated genes, while panels b and c display the enriched terms of downregulated genes. Corresponding enriched Gene Ontology terms and KEGG pathways are depicted on the right side of each panel, with associated genes shown at the bottom. The dark green color indicates the correlations between genes and enriched terms or pathways.

Figure 4. Heatmap of significantly dysregulated genes in Homo sapiens. Genes shaded in dark green are highly up-regulated (logFC above 1.5), and genes shaded in light green are up-regulated (logFC above 1). Genes with a logFC between −1 and 1 are considered non-significantly dysregulated. Conversely, genes shaded in dark red are highly down-regulated (logFC below −1.5). The heatmap includes only the top 50 differentially expressed genes per study or dataset.
The downregulated genes observed across multiple
Moreover, nineteen genes exhibited a dynamic behavior, displaying upregulation in certain tissues while being downregulated in others, thus lacking consistent expression patterns. Among these genes,
Differential expression analysis in
On the other hand, a significant downregulation was observed in twenty-one genes, namely
We present a comparative meta-analysis of orthologous genes between
Conversely, a total of nineteen orthologous genes were found to be downregulated, namely
After the identification of dysregulated genes, two AI-ready merged datasets were produced, one for each species. Each integrated the following features: the Benjamini-Hochberg adjusted p-value and the log2 fold change for differential gene expression between microgravity conditions and controls, Official Gene Symbol, Gene Ontology ID, gravitational method (i.e., actual or simulated microgravity), exact gravitational method (e.g., spaceflight, bed rest analogue, hindlimb suspension, etc.), specific type of musculoskeletal tissue (e.g., longissimus dorsi, extensor digitorum longus, soleus, vastus lateralis, gastrocnemius, etc.), chromosome annotation, and hours of exposure to microgravity.
Following data preparation, we employed various supervised Machine Learning models to develop efficient classifiers for gene expression levels. Specifically, Logistic Regression, Decision Tree, Support Vector Machine, Random Forest, Naïve Bayes, and Neural Network classifiers were used to distinguish over-expressed from under-expressed genes. Subsequently, we conducted multi-class classification using the K-Nearest Neighbors (KNN) approach to predict whether a gene is over-expressed, under-expressed, or non-significantly differentially expressed. Ultimately, we utilized a Transfer Learning methodology to explore the effectiveness of transferring knowledge from

Figure 5. Enriched terms and pathways of dysregulated genes in Homo sapiens. Panel a presents the enriched terms of upregulated genes, while panel b displays the enriched terms of downregulated genes. Corresponding enriched Gene Ontology terms and KEGG pathways are depicted on the right side of each panel, with associated genes shown at the bottom. The dark green color indicates the correlations between genes and enriched terms or pathways.

Figure 6. Confusion matrices of the Mus musculus classifiers. a) Logistic Regression, b) Decision Tree, c) Neural Network, d) Gaussian Naïve Bayes, e) Random Forest, f) Support Vector Machine (SVC, kernel = linear). The y-axis corresponds to the true expression level, while the x-axis corresponds to the predicted expression level. Class 0 represents down-regulated genes, and class 1 represents up-regulated genes.
In summary, concerning the
Feature importance rankings can provide valuable insights into the factors that influence a classifier’s predictions, aiding in feature selection and interpretation. Here, we present the feature importance plots of Random Forest and Decision Tree models applied to the
Regarding the

Figure 7. Confusion matrices of the Homo sapiens classifiers. a) Logistic Regression, b) Decision Tree, c) Neural Network, d) Gaussian Naïve Bayes, e) Random Forest, f) Support Vector Machine (SVC, kernel = linear). The y-axis corresponds to the true expression level, while the x-axis corresponds to the predicted expression level. Class 0 represents down-regulated genes, and class 1 represents up-regulated genes.

Figure 8. Classification reports of a) the Mus musculus classifiers, b) the Homo sapiens classifiers, and c) the Transfer Learning application. The classification report for the Transfer Learning model shows an accuracy of 71%, an improvement over the 62% achieved by the Neural Network classifier on the Homo sapiens dataset. Classifier performance was evaluated and compared based on accuracy, recall, and precision.
To improve the performance of our Neural Network classifier, which initially yielded 62% accuracy on the human dataset, we investigated the application of Transfer Learning techniques. Specifically, we pre-trained our model on the larger mouse dataset to assimilate common features before refining it with the smaller human dataset. The resulting Transfer Learning model achieved an improved accuracy of 71% (Figure 8c). Notably, for the under-expressed class recall (80%) exceeded precision, whereas for the over-expressed class precision (78%) exceeded recall. Figure 9 shows confusion matrices for the KNN multi-classifiers and the Transfer Learning application.
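The pre-train then fine-tune pattern can be sketched in a few lines. The study implemented its network in TensorFlow/Keras; the sketch below swaps in scikit-learn's `MLPClassifier` as a lightweight stand-in, so its architecture, activations, and optimizer differ from the model described in the text. The simulated "mouse" and "human" data, the `simulate` helper, the layer sizes, and the number of fine-tuning epochs are all assumptions.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

def simulate(n_samples, weights, rng):
    """Draw features and binary labels from a noisy linear rule."""
    X = rng.normal(size=(n_samples, weights.size))
    noise = rng.normal(scale=0.5, size=n_samples)
    y = (X @ weights + noise > 0).astype(int)
    return X, y

# Large source ("mouse") set and small, related target ("human") set.
w_mouse = rng.normal(size=100)
w_human = w_mouse + 0.3 * rng.normal(size=100)
X_mouse, y_mouse = simulate(2000, w_mouse, rng)
X_human, y_human = simulate(300, w_human, rng)
Xh_train, Xh_test, yh_train, yh_test = train_test_split(
    X_human, y_human, test_size=0.2, stratify=y_human, random_state=0)

# Step 1: pre-train on the larger source dataset.
model = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=100,
                      random_state=0)
model.fit(X_mouse, y_mouse)
pretrained_weights = model.coefs_[0].copy()
source_only_acc = accuracy_score(yh_test, model.predict(Xh_test))

# Step 2: fine-tune on the small target set. Each partial_fit call
# runs one pass over the data, starting from the pre-trained weights.
for _ in range(20):
    model.partial_fit(Xh_train, yh_train)
fine_tuned_acc = accuracy_score(yh_test, model.predict(Xh_test))

print(f"source-only accuracy on target test set: {source_only_acc:.2f}")
print(f"fine-tuned accuracy on target test set:  {fine_tuned_acc:.2f}")
```

Both datasets share the same 100 input dimensions here, mirroring the 100 principal components used in the study; transferring weights this way requires the source and target feature spaces to have the same dimensionality.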

Figure 9. Confusion matrices of the KNN multi-classifiers and the Transfer Learning application. a) KNN multi-classifier on the human dataset, b) KNN multi-classifier on the mouse dataset, c) Transfer Learning application. In panels a and b, class 0 represents down-regulated genes, class 1 represents non-significantly dysregulated genes, and class 2 represents up-regulated genes. In panel c, class 0 represents down-regulated genes, and class 1 represents up-regulated genes. The y-axis corresponds to the true expression level, while the x-axis corresponds to the predicted expression level.
Enriched terms and pathways associated with significantly dysregulated genes were identified within each species. Regarding
As far as
Following the comparative meta-analysis of orthologous genes between human and mouse, we identified common dysregulations across species, leveraging the high conservation of orthologous genes across evolutionary time. The upregulated orthologous genes were primarily involved in cellular growth, metabolic regulation, and immune responses, while the downregulation of nineteen orthologous genes was linked to kidney-related complications, Type 2 Diabetes, glucose metabolism dysfunction, and insulin resistance. The identification of these dysregulated orthologous genes holds potential for elucidating conserved effects on biological processes such as muscle and skeletal development, cardiovascular function, metabolic homeostasis, and inflammatory and stress responses.
Following the identification of enriched terms and pathways for the dysregulated genes, we constructed AI-ready merged datasets, one per species, to implement multiple supervised Machine Learning models and develop efficient classifiers. Random Forest and Neural Network classifiers showcase higher performance on the
Therefore, we explored the application of Transfer Learning since we aimed to enhance the performance of the Neural Network classifier, initially achieving 62% accuracy on
In conclusion, this study highlights the significant biological implications of altered gravitational conditions on gene expression and the dysregulation of pathways. The findings in
Similarly, in
Additionally, this study highlights the potential of leveraging Machine Learning and Transfer Learning methodologies to overcome the challenges of analyzing species-specific datasets in the context of space research. The application of Transfer Learning provides valuable insights into bridging interspecies gaps by facilitating the effective transfer of knowledge from model organisms to humans, addressing the limitations posed by the scarcity of human data. To further advance the field, future research should prioritize standardizing designs to minimize heterogeneity, expanding the diversity and size of datasets, incorporating longitudinal studies to capture temporal dynamics, and leveraging advanced computational methodologies such as multi-omics integration, Transfer Learning, and explainable AI models.