This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.
Introduction
In 2006, Takahashi and Yamanaka [1] generated induced pluripotent stem cells (iPSCs) by introducing Oct4, Sox2, Klf4 and c-Myc into mouse fibroblasts. Numerous later studies demonstrated that iPSCs and embryonic stem cells (ESCs) are similar in cell morphology and in their capacity for self-renewal and differentiation into almost all cell types. In particular, Zhao et al. [2] and Zhao et al. [3] reported the generation of viable and fertile mice through tetraploid complementation, an assay considered to be the most stringent test of pluripotency and developmental potency. In this assay, the two cells of a 2-cell embryo are fused to create a tetraploid (4N) embryo, which would normally cease to develop at a later stage, so that no viable embryo is generated. When iPSCs are injected into such tetraploid embryos, the resulting embryo or animal must develop entirely from the injected iPSCs. Thus, successful tetraploid complementation confirmed that iPSCs can attain true pluripotency that is extremely similar to that of ESCs generated from in vivo or nuclear transfer embryos [4,5]. Together with the belief that iPSCs raise fewer ethical issues than ESCs, this makes them important and valuable resources for both medical applications and fundamental research [6,7].
With promising potential for clinical application, human iPSCs offer hopeful solutions for disease research, cell therapy and even organ transplantation. Human iPSCs can be derived from a patient’s somatic cells and differentiated into disease models, such as those for Parkinson’s and Huntington’s diseases [8,9]. Compared with traditional immortalized cell lines, patient-derived iPSCs can more accurately reflect the drug response in patients, making them a valuable tool for screening new drug candidates [10,11]. In addition, iPSCs are a potential source of cells for curing some genetic diseases. Hanna et al. [12] demonstrated that a mouse model of sickle cell anemia can be rescued by transplanting hematopoietic progenitors derived from corrected autologous iPSCs. Our laboratory also reported that the transplantation of iPSCs carrying the normal human β-globin gene into β+-thalassemia (β+-thal) IVS-II-654 (C>T) (HBB: c.316-197C>T) (http://globin.cse.psu.edu) blastocysts conditionally reversed the pathology of anemia [13]. In 2014, autologous iPSC-derived retinal cells were applied clinically for the first time [14,15], a landmark in the step from basic research to clinical application. Recently, a series of clinical trials was proposed and performed in Japan, in which iPSCs were used to treat heart disease and Parkinson’s disease [16, 17, 18]. Nevertheless, the safety of iPSCs remains an important issue that must be addressed before their application in stem cell therapy and regenerative medicine. During iPSC induction by retrovirus, integration of viral DNA into the host cells is an essential step. However, the random integration of exogenous genes into the host genome may affect the pluripotency and safety of iPSCs [19], especially by inactivating functional genes or activating proto-oncogenes.
To investigate the influence of exogenous transcription factors (TFs) on the pluripotency and safety of iPSCs, this study identified the integration sites of the exogenous genes and examined the functions of their flanking genes in three iPSC lines that were shown to be fully pluripotent by the tetraploid complementation assay in our previous study [2].
Materials and methods
Nested Inverse Polymerase Chain Reaction. Nested inverse polymerase chain reaction (iPCR) was used to detect the integration sites of the four exogenous transcription factors in three iPSC lines (Figure 1). Genomic DNA fragmentation and circularization: iPSCs IP14D-1, IP14D-6, IP14D-101 [2] and mouse tail fibroblast cells (B6D2F1) were harvested. The genomic DNA was extracted [TIANamp genomic DNA kit; Tiangen Biotech, Beijing, People’s Republic of China (PRC)] and then digested in a 40 μL reaction that contained 2 μg of genomic DNA, 2 μL BamHI, 2 μL BglII and 4 μL 10× Fast Digest Buffer (Fermentas Life Science, St. Leon-Rot, Germany). After incubation at 37 °C for 4 hours, 5 μL of the digest was electrophoresed in a 0.9% agarose gel. The remaining digest was purified with the QIAquick Gel Extraction Kit (Qiagen GmbH, Hilden, Germany) and then self-ligated with 1000 U T4 DNA ligase (New England Biolabs Ltd., Hitchin, Hertfordshire, UK) in a 200 μL reaction. After overnight incubation at 22 °C, the ligated DNA was purified and dissolved in 30 μL water.
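The fragmentation step above can be illustrated in silico. The following is a minimal sketch, assuming a linear top-strand sequence and the standard recognition sites of BamHI (G^GATCC) and BglII (A^GATCT); both enzymes leave compatible GATC overhangs, which is consistent with the self-ligation step. The toy sequence and the function name `digest` are hypothetical and are not taken from the study.

```python
import re
from typing import List

# Recognition sites; both enzymes cut after the first base of the site.
SITES = {"BamHI": "GGATCC", "BglII": "AGATCT"}
CUT_OFFSET = 1  # G^GATCC and A^GATCT

def digest(seq: str) -> List[str]:
    """Return top-strand fragments of a linear sequence after a BamHI + BglII double digest."""
    seq = seq.upper()
    # A lookahead pattern reports every site start, including overlapping ones.
    cuts = sorted(
        m.start() + CUT_OFFSET
        for site in SITES.values()
        for m in re.finditer(f"(?={site})", seq)
    )
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]

# Hypothetical toy sequence containing one BamHI site and one BglII site.
toy = "TTTGGATCCAAAAAGATCTCCC"
fragments = digest(toy)
print(fragments)
```

In a real genome the fragment that carries the provirus and its adjacent host DNA is the one that, once circularized, serves as the iPCR template.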
Nested iPCR Primer Design. We designed the forward primers to anneal to the vector backbone and the reverse primers to anneal to the exogenous TFs (Plasmid pMXs-Sox2: No. 13367, pMXs-Oct4: No. 13366, pMXs-Klf4: No. 13370, pMXs-c-Myc: No. 13375; Addgene, Watertown, MA, USA) (Table 1).
The primer sequences of nested inverse polymerase chain reaction
First iPCR. The circularized DNA from iPSCs IP14D-1, IP14D-6, IP14D-101 and mouse B6D2F1 cells was used as the template for PCR amplification in a 25 μL reaction that contained 1 μL of circularized DNA, 0.75 μL forward first iPCR primer, 0.75 μL reverse first iPCR primer, 2.5 μL PCR buffer for KOD-Plus- (10×), 2.5 μL dNTPs (2 mM), 1 μL MgSO4 (2 mM) and 0.5 μL KOD-Plus- (Toyobo Co. Ltd., Osaka, Japan). The reaction conditions were 94 °C for 2 min., followed by 25 cycles of 94 °C for 15 seconds, 60 °C for 30 seconds and 68 °C for 3.5 min.
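As a quick check of the first iPCR recipe, the sketch below sums the listed component volumes and derives the volume needed to reach 25 μL, together with the programmed run time. It assumes that the remainder of the reaction is water, which the text does not state, and it ignores ramping time between temperatures.

```python
# Component volumes in microlitres, as listed for the first iPCR.
components_ul = {
    "circularized DNA template": 1.0,
    "forward first iPCR primer": 0.75,
    "reverse first iPCR primer": 0.75,
    "10x PCR buffer for KOD-Plus-": 2.5,
    "dNTPs": 2.5,
    "MgSO4": 1.0,
    "KOD-Plus- polymerase": 0.5,
}
TOTAL_UL = 25.0

listed_ul = sum(components_ul.values())
# Assumption: the remaining volume is nuclease-free water (not stated in the text).
water_ul = TOTAL_UL - listed_ul

# Programmed hold times only; ramping between temperatures is ignored.
initial_denaturation_s = 2 * 60
cycle_s = 15 + 30 + 3.5 * 60
program_min = (initial_denaturation_s + 25 * cycle_s) / 60

print(f"listed: {listed_ul} uL, water: {water_ul} uL, program: {program_min} min")
```

Under these assumptions the listed components total 9 μL, leaving 16 μL to be made up, and the program holds for 108.25 minutes.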
Second iPCR. The second iPCR used 1 μL of the first iPCR product as template, with 0.75 μL forward second iPCR primer and 0.75 μL reverse second iPCR primer, and followed the same program as the first iPCR: 94 °C for 2 min., followed by 25 cycles of 94 °C for 15 seconds, 60 °C for 30 seconds and 68 °C for 3.5 min.
The second PCR product was electrophoresed on a 1.0% agarose gel, and the specific fragments were recovered for sequencing. To find the integration sites of the exogenous TFs, the target and vector sequences were aligned using BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi).
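The idea behind locating an integration site from a sequenced iPCR product can be shown with a deliberately simplified sketch: find where the known vector sequence ends in the read and treat what follows as host genomic sequence, which would then be aligned to the mouse genome. This uses exact string matching as a stand-in and is not BLAST; the function name `genomic_flank` and all sequences are hypothetical placeholders.

```python
from typing import Optional

def genomic_flank(read: str, vector_end: str) -> Optional[str]:
    """Return the part of the read downstream of the vector end, or None if the vector end is absent."""
    read, vector_end = read.upper(), vector_end.upper()
    idx = read.find(vector_end)
    if idx == -1:
        return None
    return read[idx + len(vector_end):]

# Hypothetical placeholder sequences, not real vector or mouse sequences.
VECTOR_END = "TGAAAGACCCC"
read = "ACGT" + VECTOR_END + "GGCTTAACCGGTA"

flank = genomic_flank(read, VECTOR_END)
print(flank)
```

Real reads contain sequencing errors and may cross the ligation junction, so an alignment tool such as BLAST is needed in practice.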
Microarray Data Analysis of Flanking Genes. After confirmation of the integration sites, the expression patterns of the genes flanking the integration sites in IP14D-1, IP14D-6, IP14D-101, mouse embryonic fibroblasts (MEF) and ESCs were compared using global expression analysis (part of the microarray data was obtained from the Gene Expression Omnibus repository, accession number GSE16925; https://ncbi.nlm.nih.gov/geo/).
Gene Ontology and Cluster Analysis of Flanking Genes. To verify whether the flanking genes were related to development, cell differentiation or cancer, their functions were analyzed using the online database MGI (http://www.informatics.jax.org/). Moreover, cluster analysis was used to investigate the functions of the flanking genes and to identify whether any important signaling pathways were involved. In addition, we examined the genes located within 500 kb upstream and downstream of the integration sites using mouse genome information. To verify whether these genes were associated with tumorigenesis, they were compared with the known oncogenes in the database described by Akagi et al. [20].
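The 500 kb window search described above can be sketched as a simple interval overlap followed by a set intersection with a list of known oncogenes. This is only an illustration under stated assumptions: the gene names, coordinates and oncogene set below are invented, and the helper `genes_in_window` is hypothetical rather than part of MGI or the database of Akagi et al.

```python
from typing import List, NamedTuple

WINDOW_BP = 500_000  # 500 kb on each side of the integration site

class Gene(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int

def genes_in_window(chrom: str, pos: int, genes: List[Gene], window: int = WINDOW_BP) -> List[str]:
    """Return names of genes on the same chromosome that overlap [pos - window, pos + window]."""
    lo, hi = pos - window, pos + window
    return [g.name for g in genes if g.chrom == chrom and g.end >= lo and g.start <= hi]

# Hypothetical annotation and oncogene list, for illustration only.
genes = [
    Gene("geneA", "chr1", 1_000_000, 1_020_000),
    Gene("geneB", "chr1", 1_800_000, 1_850_000),
    Gene("geneC", "chr2", 1_100_000, 1_150_000),
]
known_oncogenes = {"geneB"}

nearby = genes_in_window("chr1", 1_200_000, genes)
hits = sorted(set(nearby) & known_oncogenes)
print(nearby, hits)
```

An empty `hits` list corresponds to the situation in which no known oncogene lies within the window of a given integration site.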
Results
Identification of Transgene Integration Sites. Electrophoresis of the digested genomic DNA showed large diffuse bands, indicating that the genomic DNA was digested completely [Figure 2(A)], while the nested iPCR products displayed multiple bands, indicating multiple integration sites of the exogenous TFs in iPSCs [Figure 2(B)]. Through alignment of the specific sequences, we identified 22 integration sites for the four TFs in the three iPSC lines, of which 17 (77.3%) were located in intergenic regions and five (22.7%) were located in introns far from the transcription start sites.
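The reported proportions follow directly from the site counts. A minimal sketch that tallies the two location classes given in the text and recomputes the percentages:

```python
from collections import Counter

# Site counts reported in the text: 17 intergenic and 5 intronic sites.
site_classes = ["intergenic"] * 17 + ["intron"] * 5

counts = Counter(site_classes)
total = sum(counts.values())
percentages = {cls: round(100 * n / total, 1) for cls, n in counts.items()}
print(total, percentages)
```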
Expression Profile Analysis of Flanking Genes of Integration Sites. The 22 sites above involved 39 flanking genes (Table 2). Of these, five genes (LOC101055956, LOC105244150, Olfr456, Olfr455 and Nanos3) were not expressed in iPSCs, MEF or ESCs. The expression patterns of the remaining 34 genes in the three iPSC lines, MEF and ESCs were compared, and no distinct difference in expression was observed among these cells (Figure 3), indicating that the integration of exogenous genes did not activate the expression of flanking genes.
The identified integration sites and their flanking genes.
Functional Analysis of Flanking Genes of Integration Sites. None of the 39 flanking genes was found to function as an early embryonic development gene or a cancer-related gene (Table 3). Of the five non-expressed flanking genes mentioned above, Olfr456 and Olfr455 are olfactory receptors that are specifically expressed in olfactory cells [21,22], while Nanos3 is involved in spermatogenesis and specifically expressed in germ cells [23]. LOC101055956 and LOC105244150 are predicted genes, and the other 34 genes are necessary for normal cellular activities. The cluster analysis showed that these genes have different functions and are involved in different metabolic activities. Moreover, no common function in early embryonic development or differentiation was found among these genes, except that some of them, including LSM10, were related to cartilage development (Figure 4). Additionally, we studied the functions of genes located within 500 kb upstream and downstream of the integration sites, and no gene related to tumorigenesis was found in that range. Taken together, we concluded that the integration of exogenous genes would not influence the safety of iPSCs.
The molecular functions of flanking genes.
| Genes | Function | Main Processes Involved |
| --- | --- | --- |
| Phlda1 | protein binding | cell apoptosis; programmed cell death |
| Krr1 | RNA binding; poly(A) binding | ribosome biogenesis; rRNA processing |
| Vat1 | Zn2+ binding; oxidoreductase activity | negative regulation of mitochondrial fusion; redox |
LOC101055956 and LOC105244150 are predicted genes; their functions and the processes in which they are involved are as yet unknown.
Discussion
Following the first successful induction of pluripotent stem cells from adult cells, Hanna et al. [12] demonstrated that iPSCs could be used to treat sickle cell anemia in a mouse model. Later, Raya et al. [6] obtained disease-corrected, patient-specific iPSCs that can be used for cell therapy without immune rejection. Using chimera models, Yang et al. [13] demonstrated that gene-modified iPSCs derived from the β+-thal IVS-II-654 mouse significantly improved the disorders in β+-thal IVS-II-654 mice, especially when the chimerism of iPSCs carrying the normal human β-globin gene was more than 30.0%.
The traditional method of pluripotent cell induction is to deliver the exogenous TFs into MEF via retrovirus. However, a retrovirus can randomly integrate multiple copies of the exogenous genes into the host genome, which may inactivate tumor suppressor genes, activate proto-oncogenes or cause frameshift mutations [24]. Thus, retroviral integration could influence the pluripotency and safety of iPSCs. Although the pluripotency of these iPSCs has been confirmed through the tetraploid complementation assay [2], the integration sites of the exogenous TFs could still influence the development and health of the iPSC-derived mice.
To investigate possible issues with these integration sites, nested iPCR [25] was performed in three iPSC lines previously shown to be pluripotent through the tetraploid complementation assay [2]. A total of 22 integration sites were identified, of which 77.3% (17) were located in intergenic regions and 22.7% (five) were within introns far from the transcription start sites. The expression profiles of the 39 flanking genes were analyzed and their functions were reviewed in the iPSCs, ESCs and MEF. Our results showed no distinct difference in the expression levels of these flanking genes among the three iPSC lines, MEF and ESCs. Moreover, none of the 39 flanking genes was associated with early embryonic development or differentiation, and most of them were housekeeping genes, which are necessary for the basic life activities of cells.
It is generally recognized that retroviral integration sites are randomly distributed across multiple chromosomes [26]. In this study, the transgene integration sites were widely distributed throughout the mouse chromosomes and no common integration sites were detected in these iPSCs, which is similar to previous studies of iPSCs derived from adult mouse cells [27,28]. However, reports have shown that retroviral vectors such as murine leukemia virus (MLV) and human immunodeficiency virus (HIV) have preferences for integration sites in the human genome [29]. HIV is prone to integrate into active genes, while MLV favors integration near transcription-start regions [29]. In our study, the transgene integration sites mainly resided in intergenic regions, and there appeared to be no effect on the function of the flanking genes. Moreover, many of the flanking genes were found to be housekeeping genes, suggesting a favorable environment for the expression of the exogenous TFs during iPSC induction. Thus, we speculate that clones containing retroviral integration sites in active genes or transcription-start regions may be unable to complete correct reprogramming and generate iPSCs, whereas clones without integration sites at these locations can be induced into iPSCs. In addition, species differences may also give rise to integration bias of retroviral vectors in different species. Locus control regions (LCRs) and gene-proximal elements, which can affect the transcription of genes by recruiting coactivators, transcription complexes or RNA polymerase, play an important role in gene expression [30,31]. During iPSC induction, these LCRs and gene-proximal elements could be disrupted by retroviral integration.
However, according to our previous reports, the iPSCs used in this research have a transcription pattern similar to that of ESCs, implying that integration of the retroviral vector has no distinct effect on gene transcription [2]. Theoretically, the integrity of LCRs and gene-proximal elements is essential for iPSCs to pass the tetraploid complementation assay and generate live pups. In this study, the five genes that contain retroviral integrations in their introns showed no obvious difference in expression level among iPSCs, MEF and ESCs, suggesting that the LCRs and gene-proximal elements remained intact. In future, homozygous mice carrying a retroviral integration in a given intron could be generated to study the effect of intron integration. Additionally, PCR is the common method for detecting integration sites. However, this method is time-consuming and makes it hard to obtain a complete map of the insertion sites. With advances in technology, new approaches such as CRISPR/Cas9 have been developed to study structural variants in mammalian genomes [32]. In future, we could also use such technologies to study the insertion sites and to further study the stability of gene expression in iPSCs.
Tumorigenesis by iPSCs can probably be attributed to two factors. One is the exogenous TFs Klf4 and c-Myc, as these two proto-oncogenes can facilitate cell proliferation and transformation by regulating the expression and activity of downstream proteins, eventually leading to malignant proliferation [33]. The other is the integration sites of the exogenous TFs. Sadelain et al. [34] summarized the features of safe integration sites in the human genome and considered that two genes hardly influence each other when they are more than 300 kb apart. Thus, we further studied the genes within 500 kb upstream and downstream of the integration sites, and no tumor-related gene was found in this range. Our results suggest that the integration sites probably have no effect on the safety of iPSCs. This is consistent with the finding that the tetraploid complementation mice derived from the iPSCs in this study were similar to mice derived from ESCs in both intelligence and development, which indicated that these iPSCs possess totipotency [35]. Overall, we conclude that the integration of exogenous genes did not affect the expression of the flanking genes, and the stable expression of the flanking genes offered a safe environment for iPSCs.