Prostate cancer is a severe threat to human life. Approximately 1 in 7 men will be diagnosed with prostate cancer during their lifetime, and 1 in 39 will die from it. Many factors increase or decrease the survival time of prostate cancer patients. The data used here come from a randomised clinical trial on the choice of treatment for patients with stage 3 and stage 4 prostate cancer. The study aims to identify variables that are likely to influence survival time for patients in these two stages only. Accelerated failure time (AFT) and Cox proportional hazards (Cox-PH) models are used to determine how these variables affect patients' survival time.
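For orientation, the two model families named above can be written in their standard textbook forms. This is only a sketch: the symbols (baseline hazard h_0, coefficient vectors β and γ, intercept μ, scale σ, error term ε) are notation assumed here, and the abstract does not state which covariates or error distributions the authors used.

```latex
% Cox-PH: covariates x act multiplicatively on the hazard at time t
h(t \mid x) = h_0(t)\,\exp(\beta^{\top} x)

% AFT: covariates x act linearly on the logarithm of the survival time T
\log T = \mu + \gamma^{\top} x + \sigma\,\varepsilon
```

In the Cox-PH form a positive coefficient raises the hazard, whereas in the AFT form a positive coefficient lengthens the expected survival time, so the signs of the two sets of estimates are read in opposite directions.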
Published Online: 30 Dec 2022 Page range: 99 - 126
Summary
This paper concerns the problem of diagnosing the type of cancer using machine learning and statistical methods. Neoplasms, and breast cancer in particular, are among humanity's greatest health challenges, and identifying a cancer and its type is extremely important. Classification methods can serve as objective tools that help doctors make a diagnosis. For this reason, we discuss a number of efficient classifiers in the context of cancer detection. In addition, we consider data set transformations that deal with the problem of class imbalance, as well as measures of classification quality. In the experimental part, we attempt to find the best classifier and to improve the quality of the original data set so as to obtain the highest values of the classification quality measures for a particular data set.
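To illustrate why the choice of quality measure matters when classes are unbalanced, the following minimal sketch computes several common measures from a binary confusion matrix. The function names and the toy labels are hypothetical and are not taken from the paper; the abstract does not say which measures the authors actually report.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def quality_measures(y_true, y_pred, positive=1):
    """Return common classification quality measures as a dict."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)

    def ratio(num, den):
        # Convention assumed here: an undefined ratio is reported as 0.0.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)          # sensitivity
    specificity = ratio(tn, tn + fp)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": ratio(2 * precision * recall, precision + recall),
        "balanced_accuracy": (recall + specificity) / 2,
    }


if __name__ == "__main__":
    # Toy data: 2 positive cases out of 10, classifier always predicts negative.
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [0] * 10
    for name, value in quality_measures(y_true, y_pred).items():
        print(f"{name}: {value:.2f}")
```

On the toy data, a classifier that never predicts the minority class still reaches an accuracy of 0.8, while its recall is 0 and its balanced accuracy is 0.5, which is why accuracy alone can be misleading on unbalanced data.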
Published Online: 30 Dec 2022 Page range: 127 - 139
Summary
We compared two of the most common methods for differential expression analysis of RNA-seq data: edgeR and DESeq2. We evaluated these methods on four real RNA-seq plant datasets. The results indicate that the two methods identify a large number of differentially expressed genes in common. However, depending on the research goal and the design of the experiment, different approaches to statistical analysis and interpretation of the results can be suggested. We focus on answering the question: what workflow should be used in the statistical analysis of the datasets under consideration to minimize the number of falsely identified differentially expressed genes?
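The overlap between the gene lists returned by two methods can be summarised with simple set operations. The sketch below is not an edgeR or DESeq2 call; it assumes each method's significant genes have already been exported as a list of identifiers, and the function name and gene IDs are hypothetical.

```python
def compare_de_calls(calls_a, calls_b):
    """Summarise the agreement between two lists of significant gene IDs."""
    a, b = set(calls_a), set(calls_b)
    shared = a & b
    union = a | b
    return {
        "shared": sorted(shared),      # called by both methods
        "only_a": sorted(a - b),       # called by the first method only
        "only_b": sorted(b - a),       # called by the second method only
        # Jaccard index: shared calls as a fraction of all calls.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


if __name__ == "__main__":
    # Hypothetical gene identifiers, for illustration only.
    method_a = ["gene01", "gene02", "gene03"]
    method_b = ["gene02", "gene03", "gene04"]
    summary = compare_de_calls(method_a, method_b)
    print(summary["shared"], summary["jaccard"])
```

Keeping only the shared genes is one conservative option when false positives are the main concern, though it is not necessarily the workflow the authors recommend.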
Published Online: 30 Dec 2022 Page range: 141 - 157
Summary
In this paper, some multivariate and double multivariate modelling approaches are presented. Moreover, this article provides an overview of the modelling of the structure of the covariance matrix. Furthermore, some methods of covariance structure identification are given.
Published Online: 30 Dec 2022 Page range: 159 - 169
Summary
Linearly structured covariance matrices are widely used in multivariate analysis. The covariance structure can be chosen from a class of linear structures, and the optimal structure is identified as the one that minimizes a discrepancy function. In this research, the entropy loss function is used as the discrepancy function. We give a methodology and an algorithm for determining the optimal structure from the class of structures under consideration. The accuracy of the proposed method is checked in a simulation study.
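As a sketch of the selection criterion, the entropy loss is given below in its usual form (often called Stein's loss). The notation is assumed here rather than taken from the paper: S is a p × p sample covariance matrix, Σ is a candidate structured matrix, and 𝒮_k denotes the k-th linear structure in the class. The order of the two arguments in the authors' definition may differ.

```latex
% Entropy loss between a structured candidate \Sigma and the sample covariance S
L(\Sigma, S) = \operatorname{tr}\!\left(\Sigma^{-1} S\right)
             - \ln\det\!\left(\Sigma^{-1} S\right) - p

% Selected structure: the one whose best-fitting member has the smallest loss
\hat{k} = \arg\min_{k} \; \min_{\Sigma \in \mathcal{S}_k} L(\Sigma, S)
```

The loss is non-negative and equals zero only when Σ = S, so smaller values indicate a structure that reproduces the sample covariance more closely.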
Published Online: 30 Dec 2022 Page range: 171 - 177
Summary
The Faculty of Agronomy at the University of Life Sciences in Poznań conducted laboratory tests on the content of B vitamins in the grain of three varieties of yellow-colored fodder maize. The grain of the variety ES Metronom had a significantly higher content of vitamins B1 and B9 than the other two varieties tested. For vitamin B3, the significantly highest concentration was recorded in the grain of the variety ES Abakus, and the lowest in the variety ES Metronom. The grain of the variety ES Bombastic, in turn, had a significantly higher vitamin B6 content than the varieties ES Abakus and ES Metronom. In general, it should be concluded that the content of B vitamins in maize grain is not determined by the type of maize hybrid.