Open access

Machine learning for stem cell differentiation and proliferation classification on electrical impedance spectroscopy


Introduction

Cell regenerative medicine (CRM) is a field that has emerged in recent years. Advancements in the growth and manipulation of stem cells allow us to envision replacing damaged tissues with synthetic solutions based on artificially grown cells [1]. The success of these technologies depends, at a basic level, on two fundamental cell behaviors, proliferation and differentiation, as stem cells need to replicate and then mature. However, relevant challenges are yet to be solved. Poor cell retention and survival, teratoma formation from pluripotent stem cells in vivo, and limited non-invasive assessment of cell fate and survival in vivo hinder the management of cell regenerative techniques [2]. Even something as fundamental as assessing cell survival remains elusive in vivo [3]. For these reasons, it is paramount to develop and study technologies that allow non-invasive assessment of cell behavior [4].

One important cell behavior in cell regenerative medicine is proliferation, that is, the ability of a cell to replicate into an identical copy of itself, which is the fundamental process for growing synthetic cell-based tissues. For this purpose, it is important to develop non-invasive assays of cell proliferation to evaluate whether or not cells are growing. In addition to proliferation, cells can undergo differentiation, in which stem cells mature into different cell types, either artificially or naturally during development. It should also be noted that there are different types of stem cells: pluripotent stem cells can be differentiated into any cell type, while more specific stem cell lines, such as neural stem cells, differentiate into different types of neurons and glial cells. It is equally fundamental to develop techniques to assess this process, to ensure that cells differentiate into the targeted cell types and not into teratomas or other abnormal structures [5].

Neurodegenerative disorders occur as a result of the progressive loss of structure or function, or the death, of neurons in the central nervous system. Many of these disorders, such as Parkinson’s disease, Huntington’s disease or epilepsy, have a high prevalence and are associated with impairments and disabilities that carry a high emotional, financial and social burden, not only for the patients but also for the community. These diseases are currently incurable, which calls for the development of new treatments [6]. An increasing trend in research on CRM treatments for neurodegenerative diseases is the development of implantable stem cell based medical devices. The basic premise of these treatments is to replace dysfunctional neuronal structures with implanted replacements, as previously described, which makes solving the aforementioned challenges urgent [7].

While the integration of cell fate imaging in clinical trials may help overcome these obstacles, live bioimpedance monitoring may be an ideal candidate for monitoring those treatments [4]. Moreover, a panoply of monitoring and control strategies will be fundamental in overcoming these problems to enable the success of cell-based regenerative medicine [2]. Many studies based on electrical impedance measurements of live biological cells have made the technique widely accepted as a label-free, non-invasive and quantitative analytical method for determining cell status. Electrical impedance has been used successfully to monitor not only proliferation [8] but also apoptosis [9], migration [10], degeneration [11], morphological changes [12] and (neuronal) differentiation [13].

However, electrical impedance measurements can be quite complex. While single-frequency measurements are sufficient to provide information about cell proliferation [14], complex behaviors such as differentiation are frequency dependent [13]. Electrical impedance spectroscopy (EIS) uses a multi-frequency AC signal to take an array of impedance measurements, resulting in a 2D spectrum. The specificities of each spectrum are not immediately visible (see Figure 1) and often require extensive postprocessing [13]. Even for traditional bioimpedance measurements, fitting the data to electrically equivalent circuits is often a challenge on its own. While some single-cell measurements are success stories of this approach [14], it does not scale well: as we delve into complex, collective and highly non-linear biological processes such as stem cell differentiation, it becomes nearly impossible to establish variables, much less to model the entire system. On the other hand, with the possible exception of textbook applications, machine learning methods rapidly become very complex, resulting in black-box models in which the interpretation of the relation between variables is often labelled as speculative, even if effective [15].

Figure 1

Relative impedance spectra for the three cell lines for proliferation (green) and differentiation (blue) on Day 20.

Passive electrical properties in cell cultures reflect the combination of structures and liquids that form the cells, such as cell membranes and cytoplasm, but also the culture apparatus, such as scaffolds and electrodes. Ionic conduction in the cell culture medium and biological liquids contributes to the resistivity, while membrane structures contribute to the capacitive properties. While it can be argued that in cell proliferation measurements the relationship between the biological processes and the changes in passive electrical properties is fairly linear, reaching a saturation level and not changing direction as the process evolves, such is not the case in cell differentiation. Cell differentiation is a complex phenomenon in which stem cells dramatically change shape and spatial distribution. Through simultaneous processes of migration, proliferation and transient reshaping from stem cell to mature cell geometry, stem cells mature into neurons or astrocytes, for instance, changing from fairly aggregated ellipsoids to dendritic shapes and networks. This can be seen as a highly non-linear system characterized by competing simultaneous events: proliferation (impedance increase) and the increase in total cell membrane area created by the dendritic geometry (increasing electrical capacitance) compete with the decrease in cell density, as neurons are sparser than the precursor stem cells (fewer cells and therefore less total membrane, together with large morphological changes). As with tissue [16], this leads to the possibility that the electrical parameters change at different rates and overlap over different durations of the underlying biological processes. Discerning the state of the cell culture then depends on memory and on the historical development of the parameters, as not every cell undergoes the same process at the same time. This represents a big challenge even for machine learning, as the differentiation phenomenon is non-linear both in time and with regard to the underlying processes that compose it.

Although this change ultimately results in different impedance spectra, increased granularity is required, which can be achieved by coupling EIS with time series analysis. One big obstacle to this approach is that even the most established methodologies require extensive manual measurement and post-processing analysis. As an example, a system based on a chip with 54 electrodes and a resolution of 51 points across a spectrum from 500 Hz to 5 MHz produces 54 × 51 = 2754 data points per chip for a single measurement. A popular way to address this issue is to reduce the information of the system to the changes it experiences from measurement to measurement, by using the relative impedance over the maximum change of an averaged spectrum for each daily measurement. While this is an intelligent way of both dealing with electrode contributions and reducing the size of the samples, we argue that machine learning methods offer a viable mechanism to automate some of the post-processing, since they can perform automatic feature extraction and enable new insights into the processes underlying cell differentiation by allowing increased granularity through post-processing automation. Also, as CRM becomes an established field, clinical monitoring will take precedence over scientific interpretation [16], making prediction performance more important than drawing inference from the data. The overall aim will typically be to assess cell state or levels of pathological change in situations where the cultured cells need improvement. Recent studies from our group demonstrated promising results using a recurrent neural network (RNN) with long short-term memory (LSTM). Using repeated measurements taken over several hours, they obtained high classification accuracies in tissue-related studies, suggesting that this type of machine learning approach may be useful in a wider sense for impedance time series problems [16,17].

In this study, we investigate the use of these same methods for the analysis of EIS measurements of the characteristic proliferation and differentiation profiles of neural precursor cells (NPCs), with the aim of assessing their potential to distinguish the two biological events under conditions where conventional methods would be inadequate. By profiling differentiation and proliferation, we expect to be able to identify trends underlying differentiation and proliferation during different stages. With this work, we expect to lay the foundation for a framework that may advance the possibilities of fully utilizing the information within EIS measurements on differentiating stem cell cultures.

Materials and methods

Since this work is a pioneering proof of concept relying on data obtained from previous experiments, with measurements that lack the time resolution necessary for an in-depth study, we limited the scope to a simple but useful application: distinguishing cell proliferation from differentiation. The cell measurements were conducted at the Division of Molecular Biological-Biochemical Processing Technology, Center for Biotechnology and Biomedicine (BBZ), University of Leipzig, Germany. The overall setup follows the methodology developed by this group [13].

As a proof of concept, we tried to constrain the data variables as much as possible without compromising the quality of the study. With this in mind, we chose to follow the original methodology [13] behind the work done with this setup and used relative impedance spectra as the source of our data. The basic setup consists of a ScioSpec ISX-5 impedance analyzer (ScioSpec GmbH, Germany) and 20 different single-well microelectrode array (MEA) chips of the same design, each with 54 electrodes and a common reference. Measurements were taken from 500 Hz to 5 MHz at 51 frequencies (approximately 10 per decade). To ensure variability, three different cell lines of pluripotent stem cells were used, and a separate model was trained for each. Each model was trained using only the data from two MEAs, one for proliferation and one for differentiation, recorded on day 20. The three cell models were tested on a total of 20 different MEAs (4 differentiation chips and 2 proliferation chips for each of the first two cell lines, and 3 differentiation chips and 5 proliferation chips for the third), measured on day 20 and also on day 10. While the focus of this work is to distinguish proliferation from differentiation, we also expect to draw some conclusions about how robustly a model trained only on the information of a single day captures the trend of the process at a much earlier stage, on day 10.
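
The way the test chips combine into test cases can be illustrated with a short sketch. It is not the authors' code: it assumes that every test case pairs one differentiation chip with one proliferation chip of the same cell line, and it takes the chip IDs from the results tables. The names `TEST_CHIPS` and `chip_pairs` are hypothetical.

```python
from itertools import product

# Test chip IDs per cell line, as listed in the results tables.
TEST_CHIPS = {
    "4603c27": {"diff": [1520, 2012, 2041, 2044], "prol": [1564, 2047]},
    "IMR90c01": {"diff": [1555, 2042, 2050, 2053], "prol": [2021, 2048]},
    "IMR90-4": {"diff": [2020, 2022, 2055], "prol": [1516, 1556, 2010, 2013, 2014]},
}


def chip_pairs(cell_line):
    """Return every (differentiation, proliferation) chip pair for one cell line."""
    chips = TEST_CHIPS[cell_line]
    return list(product(chips["diff"], chips["prol"]))


for name in TEST_CHIPS:
    print(name, len(chip_pairs(name)), "test pairs")
```

Under this assumption the first two cell lines yield 4 × 2 = 8 test pairs each and the third yields 3 × 5 = 15, which matches the number of rows per cell line in the results tables.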

Cell lines and cell culture

Experiments were performed with hiPS cell lines 4603c27, IMR90c01 and IMR90-4. IMR90c01 and 4603c27 were obtained from the Institute for Stem cell Therapy and Exploration of Monogenic diseases (I-Stem), while IMR90-4 was obtained from WiCell Research Institute, Inc.

The hiPS cells were cultured in self-prepared mTeSR1 (mTeSR1self), according to the published protocol by Ludwig et al. [18], at 37 °C, 5 % CO2 and 95 % relative humidity. Before cell seeding, plastic ware was coated with growth factor-reduced Matrigel (Corning, cat.-nr. 356231) at a concentration of 0.12 mg/ml in DMEM/F12 for 2 h at room temperature or overnight at 4 °C. A medium change was performed every other day. Passaging of hiPSCs was performed in colonies with 0.5 mM EDTA solution.

The three hiPS cell lines were differentiated into stable neural precursor cell (NPC) lines using a protocol by Li et al. [19]. For this, hiPSCs were seeded as single cells in neural induction medium (NIM) and cultured for 7 days, with medium changes every second day and later every day. See Supplemental Material for media composition. On Day 7, the generated NPCs were passaged with 0.05 % Trypsin-EDTA and cultivated in neural proliferation medium (NPM) for ten passages to generate stable neural precursor cell lines. 10 μM ROCK-Inhibitor was added during the first six passages.

For the final differentiation from NPCs to mature neurons, cells were cultivated in astrocyte-conditioned medium for up to 35 days. For long-term cultivation, Matrigel was used at a higher concentration of 0.18 mg/ml.

Generation of astrocyte-conditioned medium (Astro-NDM)

Primary murine astrocytes were isolated from cortices of neonatal rats (postnatal day 1). The preparation was done in accordance with animal welfare laws. Astrocytes were expanded in astrocyte proliferation medium (see Supplemental material for media composition). Media were changed every 2-4 days with thorough washing to remove neurons and other less adherent cells.

At confluence, astrocyte proliferation medium was changed to neuronal differentiation medium (NDM). NDM was kept on the astrocytes for 4 days to generate astrocyte-conditioned medium (Astro-NDM). The Astro-NDM was sterile-filtered and stored at −20 °C.

Microelectrode arrays (MEA)

For impedance spectroscopy, microelectrode arrays (MEAs) with 54 measurement electrodes and a single connected reference electrode, all made from gold, were used. The electrode arrangement is shown in Figure 2. All MEAs were produced in a clean room facility with standard lift-off techniques by the Division of Molecular Biological-Biochemical Processing Technology, Center for Biotechnology and Biomedicine (BBZ), University of Leipzig, Germany.

Figure 2

1w54e-Microelectrode array with 54 measurement electrodes in a single well format. Arrangement of electrodes with diameter and horizontal and vertical spacing shown in mm.

Electrical impedance spectroscopy (EIS)

Impedance spectra were recorded during the final differentiation from NPCs to mature neurons with a hybrid instrument for impedance spectroscopy and field potential recording (ScioSpec ISX-5, ScioSpec GmbH, Germany). NPCs generated with each hiPS cell line were seeded on MEAs at a density of 1 million cells/cm2 and differentiated with astrocyte-conditioned medium (Astro-NDM). As a control, for comparison of neuronal differentiation and proliferation, NPCs were also seeded at a density of 0.25 million cells/cm2 and cultivated in neural proliferation medium (NPM). Impedance spectra were recorded every fifth day for up to 35 days in the frequency range from 500 Hz to 5 MHz (51 frequency points, logarithmic) at a voltage of 20 mV.
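
As a rough illustration of the measurement grid, the sketch below builds a logarithmic frequency axis and counts the values produced by one measurement of one chip. It assumes the 51 points are evenly spaced on a logarithmic axis between the stated limits; the exact grid used by the instrument is not given in the text, and all variable names are illustrative.

```python
import numpy as np

N_ELECTRODES = 54    # measurement electrodes per MEA chip
N_FREQUENCIES = 51   # frequency points per spectrum
F_MIN_HZ = 500.0
F_MAX_HZ = 5.0e6

# Assumption: points evenly spaced on a logarithmic axis (geometric progression).
frequencies_hz = np.geomspace(F_MIN_HZ, F_MAX_HZ, N_FREQUENCIES)

# One full measurement of a chip holds one spectrum per electrode.
n_values_per_chip = N_ELECTRODES * N_FREQUENCIES

print(frequencies_hz[:3], frequencies_hz[-1])
print(n_values_per_chip)
```

One measurement of a single chip therefore fits in a 54 × 51 array, one row per electrode and one column per frequency.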

For analysis, relative impedance (|Z|rel) was calculated from the impedance of the cell-covered electrode (|Z|cell) and the cell-free electrode (|Z|blank) based on the following equation:

$$\left| Z \right|_{rel}\,[\%] = \frac{\left| Z \right|_{cell} - \left| Z \right|_{blank}}{\left| Z \right|_{blank}} \cdot 100\%$$

Very low relative impedance values from electrodes not or only partially covered with cells, as well as extremely high relative impedance values from damaged electrodes were excluded from analysis.
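
The relative impedance calculation and the exclusion step can be sketched as follows. This is a minimal example on toy data, not the processing pipeline used in the study. The function names are hypothetical, and the exclusion thresholds `low` and `high` are placeholders, since the text does not state the limits that were applied.

```python
import numpy as np


def relative_impedance(z_cell, z_blank):
    """|Z|rel in percent from cell-covered and cell-free impedance magnitudes."""
    z_cell = np.asarray(z_cell, dtype=float)
    z_blank = np.asarray(z_blank, dtype=float)
    return (z_cell - z_blank) / z_blank * 100.0


def keep_electrodes(z_rel, low=5.0, high=500.0):
    """Boolean mask of electrodes whose peak |Z|rel lies within [low, high] percent.

    The thresholds are placeholders, not values taken from the study.
    """
    peak = z_rel.max(axis=1)  # maximum over frequency for each electrode
    return (peak >= low) & (peak <= high)


# Toy data: 3 electrodes x 4 frequencies, impedance magnitudes in ohms.
z_blank = np.full((3, 4), 1000.0)
z_cell = np.array([
    [1500.0, 1800.0, 1400.0, 1100.0],     # well-covered electrode
    [1010.0, 1020.0, 1005.0, 1000.0],     # barely covered electrode
    [9000.0, 20000.0, 15000.0, 5000.0],   # implausibly high, e.g. damaged
])

z_rel = relative_impedance(z_cell, z_blank)
mask = keep_electrodes(z_rel)
print(z_rel[mask])
```

With these placeholder limits only the first toy electrode is kept; the other two fall below and above the accepted range, respectively.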

Training of artificial neural networks

Electrical impedance spectra of cell proliferation and differentiation can be interpreted as two-dimensional sets of information. Changes in the spectrum over time represent one dimension of the problem, while frequency, which in this case is expected to reflect the spatial changes in the system, represents another. While it would be interesting to use machine learning methods to assess the time dependencies of the underlying biological processes, especially in the case of differentiation, constraints on the available data prompted a different route for this study. The space (frequency) and time domains of this system are not independent: spatial changes occur as a function of time as the system evolves, and the changes in the spectra are not drastic but smooth reshapes. In the domain of biology, we can recall the words of Leibniz, Natura non facit saltus. By profiling both proliferation and differentiation, we expect to capture the underlying features of each spectrum that allow it to be classified as one or the other. By testing these models at an earlier stage of the biological process, we test whether the model has captured those features. An accuracy drop is expected; if the model is overfitted, this drop should be accentuated, but if the model still performs favorably, we might speculate that it has successfully captured relevant features.

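Accuracy and loss data for both training and testing (day 20) for day 20 cell models. Maximum (green) and minimum (red) values signalled for each cell model.

| Set | Day | 4603c27 ID (Diff.) | 4603c27 ID (Prol.) | 4603c27 Accuracy | 4603c27 Loss | IMR90c01 ID (Diff.) | IMR90c01 ID (Prol.) | IMR90c01 Accuracy | IMR90c01 Loss | IMR90-4 ID (Diff.) | IMR90-4 ID (Prol.) | IMR90-4 Accuracy | IMR90-4 Loss |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Training | 20 | 1553 | 1514 | 1 | 0 | 1517 | 1519 | 1 | 0.02 | 1552 | 1510 | 0.91 | 0.35 |
| Testing | 20 | 1520 | 1564 | 0.88 | 0.56 | 1555 | 2021 | 0.78 | 0.64 | 2020 | 1516 | 0.87 | 0.50 |
| Testing | 20 | 2012 | 1564 | 0.87 | 0.77 | 2042 | 2021 | 0.74 | 0.56 | 2022 | 1516 | 0.96 | 0.45 |
| Testing | 20 | 2041 | 1564 | 0.94 | 0.22 | 2050 | 2021 | 0.82 | 0.57 | 2055 | 1516 | 0.95 | 0.42 |
| Testing | 20 | 2044 | 1564 | 0.97 | 0.15 | 2053 | 2021 | 0.72 | 0.60 | 2020 | 1556 | 0.81 | 0.53 |
| Testing | 20 | 1520 | 2047 | 0.86 | 0.55 | 1555 | 2048 | 0.77 | 0.77 | 2022 | 1556 | 0.93 | 0.47 |
| Testing | 20 | 2012 | 2047 | 0.85 | 0.79 | 2042 | 2048 | 0.73 | 0.73 | 2055 | 1556 | 0.91 | 0.44 |
| Testing | 20 | 2041 | 2047 | 0.93 | 0.16 | 2050 | 2048 | 0.81 | 0.80 | 2020 | 2010 | 0.86 | 0.52 |
| Testing | 20 | 2044 | 2047 | 0.97 | 0.82 | 2053 | 2048 | 0.71 | 0.71 | 2022 | 2010 | 0.97 | 0.46 |
| Testing | 20 | | | | | | | | | 2055 | 2010 | 0.96 | 0.43 |
| Testing | 20 | | | | | | | | | 2020 | 2013 | 0.83 | 0.55 |
| Testing | 20 | | | | | | | | | 2022 | 2013 | 0.93 | 0.48 |
| Testing | 20 | | | | | | | | | 2055 | 2013 | 0.92 | 0.46 |
| Testing | 20 | | | | | | | | | 2020 | 2014 | 0.84 | 0.58 |
| Testing | 20 | | | | | | | | | 2022 | 2014 | 0.95 | 0.51 |
| Testing | 20 | | | | | | | | | 2055 | 2014 | 0.93 | 0.48 |
| Average (testing) | | | | 0.91 | 0.50 | | | 0.76 | 0.67 | | | 0.91 | 0.47 |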

This double nature of the problem motivates us to seek a contextual solution. As we wish to profile the spectra, we treat the task as a sequence classification problem in which the spectra are the sequences and the classes are proliferation and differentiation. For this kind of problem, recurrent neural networks are a typical solution [20]. Artificial neural networks are a set of algorithms loosely modelled after the human brain. The input data are passed through nodes that contain activation functions before being passed on to the nodes of the next layer, until the result is read from the output nodes. In this case, we used long short-term memory (LSTM) recurrent neural networks. A special feature of LSTMs is their ability to carry short-term information along the sequence, which provides the contextual ability we need. This type of neural network is often used to classify sequences in space or time, or in language processing, with the weakness of easily overfitting, by far the biggest problem with these implementations [21].
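
To make the idea of reading a spectrum as a sequence concrete, the following sketch runs a single LSTM layer over one spectrum, one frequency point per step, and returns the final hidden state. It is an illustration under stated assumptions, not the network used in this work: the hidden size of 8, the single input value per frequency point and the random, untrained weights are all hypothetical, and the spectrum is random stand-in data. A real implementation would normally rely on a deep learning library and trained weights.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_last_state(sequence, W, U, b):
    """Run one LSTM layer over `sequence` and return the final hidden state.

    sequence: (n_steps, n_features); W: (n_features, 4 * n_hidden);
    U: (n_hidden, 4 * n_hidden); b: (4 * n_hidden,).
    Gate order along the last axis: input, forget, candidate, output.
    """
    n_hidden = U.shape[0]
    h = np.zeros(n_hidden)  # hidden state passed from step to step
    c = np.zeros(n_hidden)  # cell state: the memory carried along the sequence
    for x_t in sequence:
        z = x_t @ W + h @ U + b
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g       # keep part of the old memory, add new content
        h = o * np.tanh(c)      # expose a gated view of the memory
    return h


rng = np.random.default_rng(0)
n_steps, n_features, n_hidden = 51, 1, 8  # 51 frequency points, 1 value each
spectrum = rng.normal(size=(n_steps, n_features))  # stand-in for one spectrum
W = rng.normal(scale=0.1, size=(n_features, 4 * n_hidden))
U = rng.normal(scale=0.1, size=(n_hidden, 4 * n_hidden))
b = np.zeros(4 * n_hidden)

features = lstm_last_state(spectrum, W, U, b)
print(features.shape)
```

The final hidden state summarizes the whole spectrum in a fixed-length vector, which a subsequent dense layer can map to class scores.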

To address overfitting concerns, two methods were used. First, a fully connected hidden layer with Ridge (L2) regularization was added. This choice of regularizer is known in the machine learning literature as weight decay because, in sequential learning algorithms, it encourages weight values to decay towards zero unless supported by the data [22]. Second, random Dropout was implemented: randomly selected units are dropped during training and restored during testing, so that the neural network behaves as an ensemble of weak classifiers, which results in higher accuracy during testing and reduces overfitting [23].
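
Both regularization techniques can be written in a few lines. The sketch below is a generic illustration and makes several assumptions: the regularization strength `lam` and the dropout `rate` are placeholder values that are not reported in the text, and the dropout variant shown is the common "inverted" form, which rescales the surviving units during training.

```python
import numpy as np


def l2_penalty(weights, lam):
    """Ridge (L2) term added to the training loss: lam * sum of squared weights."""
    return lam * np.sum(weights ** 2)


def dropout(activations, rate, rng, training):
    """Inverted dropout: zero a random fraction `rate` of units during training.

    Surviving units are scaled by 1 / (1 - rate) so the expected activation is
    unchanged; at test time the activations pass through untouched.
    """
    if not training or rate == 0.0:
        return activations
    keep = rng.random(activations.shape) >= rate
    return activations * keep / (1.0 - rate)


rng = np.random.default_rng(0)
weights = np.array([[0.5, -1.0], [2.0, 0.0]])
hidden = np.ones(1000)

penalty = l2_penalty(weights, lam=0.01)  # 0.01 * (0.25 + 1.0 + 4.0 + 0.0)
train_out = dropout(hidden, rate=0.5, rng=rng, training=True)
test_out = dropout(hidden, rate=0.5, rng=rng, training=False)
print(penalty, train_out.mean(), test_out.mean())
```

Because the penalty grows with the squared weights, minimizing the total loss pushes weights towards zero unless the data term opposes it, which is the weight decay behavior described above.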

To translate the probabilistic output of the model into either the proliferation or the differentiation class, a Softmax activation function was used after a dense layer (mapping the binary classes onto a probability distribution), as the model outcome should express the likelihood that the underlying features belong to one class or the other. The decimal probabilities assigned to the categories must add up to 1.0, which provides an additional constraint on the method and results in faster convergence.
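
The mapping from dense-layer outputs to class probabilities can be sketched as follows. The logits are made-up numbers and the class order in `CLASSES` is an assumption chosen for illustration only.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


CLASSES = ("proliferation", "differentiation")  # class order is an assumption

# Hypothetical dense-layer outputs (logits) for three spectra.
logits = np.array([[2.0, 0.0], [0.0, 0.0], [-1.0, 3.0]])

probs = softmax(logits)
labels = [CLASSES[k] for k in probs.argmax(axis=-1)]
print(probs)
print(labels)
```

Each row of `probs` sums to 1.0, and the predicted class is the one with the larger probability.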

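Accuracy and loss data for both training and testing (day 10) for day 20 cell models. Maximum (green) and minimum (red) values signalled for each cell model.

| Set | Day | 4603c27 ID (Diff.) | 4603c27 ID (Prol.) | 4603c27 Accuracy | 4603c27 Loss | IMR90c01 ID (Diff.) | IMR90c01 ID (Prol.) | IMR90c01 Accuracy | IMR90c01 Loss | IMR90-4 ID (Diff.) | IMR90-4 ID (Prol.) | IMR90-4 Accuracy | IMR90-4 Loss |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Training | 20 | 1553 | 1514 | 1 | 0 | 1517 | 1519 | 1 | 0.02 | 1552 | 1510 | 0.91 | 0.35 |
| Testing | 10 | 1520 | 1564 | 0.78 | 0.65 | 1555 | 2021 | 0.66 | 0.60 | 2020 | 1516 | 0.87 | 0.60 |
| Testing | 10 | 2012 | 1564 | 0.74 | 0.56 | 2042 | 2021 | 0.57 | 0.77 | 2022 | 1516 | 0.80 | 0.62 |
| Testing | 10 | 2041 | 1564 | 0.81 | 0.57 | 2050 | 2021 | 0.64 | 0.90 | 2055 | 1516 | 0.90 | 0.55 |
| Testing | 10 | 2044 | 1564 | 0.72 | 0.60 | 2053 | 2021 | 0.64 | 0.57 | 2020 | 1556 | 0.88 | 0.59 |
| Testing | 10 | 1520 | 2047 | 0.77 | 0.70 | 1555 | 2048 | 0.67 | 0.63 | 2022 | 1556 | 0.81 | 0.62 |
| Testing | 10 | 2012 | 2047 | 0.73 | 0.62 | 2042 | 2048 | 0.57 | 0.79 | 2055 | 1556 | 0.92 | 0.55 |
| Testing | 10 | 2041 | 2047 | 0.80 | 0.62 | 2050 | 2048 | 0.65 | 0.92 | 2020 | 2010 | 0.89 | 0.59 |
| Testing | 10 | 2044 | 2047 | 0.70 | 0.65 | 2053 | 2048 | 0.65 | 0.60 | 2022 | 2010 | 0.82 | 0.61 |
| Testing | 10 | | | | | | | | | 2055 | 2010 | 0.93 | 0.54 |
| Testing | 10 | | | | | | | | | 2020 | 2013 | 0.86 | 0.60 |
| Testing | 10 | | | | | | | | | 2022 | 2013 | 0.79 | 0.62 |
| Testing | 10 | | | | | | | | | 2055 | 2013 | 0.89 | 0.56 |
| Testing | 10 | | | | | | | | | 2020 | 2014 | 0.87 | 0.63 |
| Testing | 10 | | | | | | | | | 2022 | 2014 | 0.80 | 0.65 |
| Testing | 10 | | | | | | | | | 2055 | 2014 | 0.91 | 0.58 |
| Average (testing) | | | | 0.76 | 0.62 | | | 0.63 | 0.72 | | | 0.86 | 0.59 |

Ethical approval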

The research related to human use has complied with all relevant national regulations and institutional policies, is in accordance with the tenets of the Helsinki Declaration, and has been approved by the authors’ institutional review board or equivalent committee.

The research related to animal use has complied with all relevant national regulations and institutional policies for the care and use of animals.

Results

The training accuracy of 100% with minimal loss for two of the three cell lines shows that the algorithm was able to learn the differences between a proliferation and a differentiation spectrum, while the third cell line reached 91%. The training loss is near zero for the first two cell lines, against 0.35 for the third. When tested on data from the same day as the training data, day 20, the models perform well: the maximum accuracies are 97%, 82% and 97%, the minimum accuracies 85%, 71% and 81%, and the average accuracies 91%, 76% and 91% for the three cell lines, respectively. This supports the consistency of the data, as it indicates high similarity between similar cell cultures. When the models are applied to a different day (day 10), the average accuracy drops, as expected, by 15, 13 and 5 percentage points to 76%, 63% and 86% for the respective cell lines.
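
The accuracy drops quoted above follow directly from the average test accuracies in the two results tables, as this short check shows. The variable names are illustrative.

```python
# Average test accuracies per cell line, taken from the two results tables.
CELL_LINES = ("4603c27", "IMR90c01", "IMR90-4")
avg_acc_day20 = {"4603c27": 0.91, "IMR90c01": 0.76, "IMR90-4": 0.91}
avg_acc_day10 = {"4603c27": 0.76, "IMR90c01": 0.63, "IMR90-4": 0.86}

# Drop in percentage points when the day-20 model is applied to day-10 spectra.
drop_points = {
    name: round((avg_acc_day20[name] - avg_acc_day10[name]) * 100)
    for name in CELL_LINES
}
print(drop_points)
```

The differences are expressed in percentage points, that is, as absolute differences between two accuracies rather than relative changes.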

Discussion

Two major observations emerged from this study. First, machine learning methods are a powerful tool for processing EIS measurements on cells: they were able to distinguish proliferation from differentiation, which could have many applications in stem cell implant therapies.

Second, although the focus of this work is to distinguish proliferation from differentiation, the results also allow some conclusions about the robustness of a model trained only on the information of a single day (day 20) when it is applied to a much earlier stage of the process, day 10. The accuracy drops of 15 and 13 percentage points for the first two cell lines when the model is applied to a different day suggest that the models were capable of extracting underlying features of differentiation and proliferation and that excessive overfitting was probably avoided. However, the third cell line model stands out with a drop of only 5 percentage points. Combined with its training loss of 0.35, against the very low training loss of the two other models, this might indicate on the one hand that the spectra of the first two cell lines are more distinguishable (see Figure 1), but on the other hand it may point to some overfitting in those cases. As the third cell line model holds up better across time, the extraction of underlying features appears to have been successful here, but considering the limitations of the available data, this hypothesis requires further research. It is also relevant that, as the model profiles the spectra using memory mechanisms, this same result points strongly to a coupling between the shape of the spectrum and its evolution through time, which suggests that valuable insights might be obtained by addressing the temporal behavior of the system.

By studying the difference between differentiation and proliferation with the aforementioned machine learning methods, and since differentiation is an umbrella process composed of many subprocesses, we speculated that an underlying proliferation process at an earlier stage of the cell culture might be revealed by a model trained at a later stage, through a systematic accuracy drop as the ability of the model to distinguish the two processes decreases. While the results point in this direction, the third cell line's drop of only 5 percentage points means that this conclusion cannot be drawn without more data.

While still in its infancy, this proof of concept suggests that machine learning methods might be used to identify the biological processes underlying stem cell differentiation. A possible approach would be to model each basic behavior (migration, reuptake, proliferation, reshaping) and, using EIS measurements with high time resolution, to determine the dominant process during each stage of differentiation or, less ambitiously, to perform a time series analysis with the methods used in this work.