In general, the inflorescence is the main ornamental part of
RNA-Seq data expression profiles in
With the development of sequencing technology in recent years, high-throughput sequencing of mRNA is being used to provide specific expression information in particular species (He et al., 2020). This technology is additionally one of the effective approaches to detect functional genes in organisms without reference genome. Recently, assembly of
Rhizomes from
High-quality total RNA from the tissue samples was isolated using the RNeasy mini Kit (Qiagen, Germany), following the manufacturer's instructions. gDNAase (provided by the Kit) was added to remove the genomic DNA. The purity and concentration of isolated RNA were checked by a 2100 Bioanalyser (Agilent Technologies, Inc., Santa Clara, CA, USA) and quantified using an ND-2000 (NanoDrop Thermo Scientific, Wilmington, DE, USA). Only high-quality RNA samples (OD260/280 = 1.8–2.2, OD260/230 ≥ 2.0, RIN ≥ 8.0, 28S:18S ≥ 1.0, >2 μg) were used to construct the sequencing library.
RNA purification, reverse transcription, library construction and sequencing were performed at Shanghai Majorbio Bio-pharm Biotechnology Co., Ltd. (Shanghai, China) according to the manufacturer's instructions (Illumina, San Diego, CA, USA). The RNA-Seq transcriptome libraries were prepared using an Illumina TruSeqTM RNA sample preparation Kit (San Diego, CA, USA). Poly(A) mRNA was purified from total RNA using oligo-dT-attached magnetic beads and then fragmented by fragmentation buffer. Taking these short fragments as templates, double-stranded cDNA was synthesised using a SuperScript double-stranded cDNA synthesis kit (Invitrogen, CA, USA) with random hexamer primers (Illumina). Then the synthesised cDNA was subjected to end-repair, phosphorylation and ‘A’ base addition according to Illumina's library construction protocol. Libraries were size selected for cDNA target fragments of 200–300 bp on 2% Low Range Ultra Agarose followed by PCR amplified using Phusion DNA polymerase (New England Biolabs, Boston, MA, USA) for 15 PCR cycles. After quantifying by TBS380, two RNA-Seq libraries were sequenced in single lane on an Illumina Hiseq NovaSeq 6000 sequencer (Illumina, San Diego, CA, USA) for 2 × 150 bp paired-end reads.
The raw reads obtained from HiSeq2000 sequencing were filtered by fastp software (v0.19.6): reads containing adaptors, reads with more than 5% unknown nucleotides and low-quality reads with more than 20% of bases with a quality value ≤20 were deleted (Wang, 2007). Then all clean reads were assembled with Trinity (release 2013-02-25) with the following settings: inchworm: -K –L 25; Chrysalis: -min_glue 2 -glue_factor 0.05 -min_iso_ratio 0.05 -kk 48 –strand -report_welds -max_mem_reads 1,000,000 (Grabherr et al., 2011). We then filtered out any transcripts with less than 1% of the per-component expression level using a script bundled with Trinity. ORFs were extracted using the Perl script bundled with Trinity (transcripts_to_best_scoring_ORFs.pl). To estimate the number of full length transcripts that had been assembled in our data sets we used the tool benchmarking universal single-copy orthologs (BUSCO) (v5.2.2), which is based on evolutionarily informed expectations of gene content, with default settings. HMMER is then run to assign a score to the candidate amino acid sequence (Seppey et al., 2019). Finally, Trinity connects the transcripts and obtains sequences that cannot be extended on either end. Such sequences are defined as unigenes. When multiple samples from the same species are sequenced, the unigenes from each sample's assembly can be further analysed through sequence splicing and the removal of redundancy using sequence clustering software to acquire non-redundant unigenes with the longest length possible. In the final step, transcripts and unigenes were separately searched against the National Center for Biotechnology Information (NCBI) protein non-redundant (NR), the Clusters of Orthologous Groups of Proteins (COG) and Swiss-Prot databases using DIAMOND (v0.8.37.99) to perform function annotations with a significant threshold E-value of 10−5. BLAST2GO (v2.5.0) (
To identify DEGs between two different samples, the gene expression level was calculated using fragments per kilobase of exon per million mapped reads (FPKM). RNA-Seq by expectation-maximization (RSEM) (
Primers used for qRT-PCR assays were designed with the online Primer3 (v0.4.0) software and they amplified PCR products that varied from 80 bp to 200 bp. The housekeeping gene
To obtain a more detailed understanding of the anthocyanin regulatory network within different varieties and different flowering developments, four samples, derived from the red bracts from the SF, HF and BF stages of ‘Dutch Red’, and the yellow bracts from BF stage of ‘Chocolate’ (YF), were used as materials for RNA sequencing (Figure 1A). Each sample was performed with three biological replicates; thus a total of 12 sequenced libraries were constructed.
In total, approximately 576.45 Mb of raw read data were generated from Illumina HiSeq sequencing ranging from 45.24 Mb to 52.98 Mb in each sample (Table S1 in Supplementary Materials). After raw read filtering, a total of 566,269,472 clean reads (the number of total clean nucleotides was 83,999,856,857 nt; the Q20 and GC percentages were 98.31% and 49.25%, respectively) were obtained from the 12th libraries (SF-1, SF-2, SF-3; HF-1, HF-2, HF-3; BF-1, BF-2, BF-3; YF-1, YF-2, YF-3). Then, clean reads were assembled into 159,687 transcripts and 69,453 unigenes, reaching an averagely mapped ratio of 73% in samples. The assembled transcript length ranged from 500 bp to over 4,500 bp with an average of 1,236 bp and N50 of 1,830 bp (Table 1). Other statistical results obtained from the
Summary of assembly results for ‘Dutch Red’ at three development stages and ‘Chocolate’ at the blossomed stage.
Features | SF | HF | BF | YF |
---|---|---|---|---|
Total raw reads (Mb) | 45.24 | 48.46 | 52.98 | 45.47 |
Total clean reads (Mb) | 44.47 | 47.55 | 52.10 | 44.63 |
Total clean bases (Gb) | 6.59 | 7.05 | 7.75 | 6.61 |
Clean reads Q20 (%) | 98.27 | 98.23 | 98.34 | 98.40 |
Clean reads Q30 (%) | 94.63 | 94.56 | 94.83 | 95.01 |
Clean reads (pair reads) | 22.24 | 23.78 | 26.05 | 22.31 |
Mapped reads | 16,378,830 | 17,511,187 | 19,200,458 | 16,316,551 |
Mapped ratio (%) | 73.66 | 73.66 | 73.69 | 73.13 |
Total number of transcripts | 159,687 | |||
Total number of unigenes | 69,453 | |||
Total sequence base | 197,307,177 | |||
Average length of transcripts | 1,236 | |||
N50 value of transcripts | 1,830 | |||
E90N50 value of transcripts | 1,921 | |||
GC (%) | 42.49 | |||
TransRate score | 0.30 | |||
BUSCO score | 60.1% (3.5%) |
Q20: The rate of bases whose quality is greater than 20; Q30: The rate of bases whose quality is greater than 30; N50: A weighted median statistic in which 50% of the total length is contained in unigenes greater than or equal to this value; E90N50: A weighted median statistic in which 50% of the total length is contained in the top 90% unigenes greater than or equal to this value; GC (%): The percentage of G and C bases in all unigenes; TransRate: Score the assembly results with transrate; BUSCO: Score the assembly integrity with BUSCO.
BF, blossomed flowering; HF, half-flowering.
These transcripts offer a potential source for the identification of functional genes. Subsequently, reads with an FPKM value <0.5 were removed, and 95,625 (mean from three replicates, the same below), 88,479, 89,429 and 78,537 transcripts were found in SF, HF, BF and YF, respectively. On average, 63% of the transcripts were in the 0.5–5 FPKM range, and 35% of the transcripts were in the 5–100 FPKM range (Figure 1B; Table S2 in Supplementary Materials). A principal component analysis showed that there were highly correlated transcriptome characteristics between the biological replicates of each sample (Figure 1C). A total of 44,536 transcripts were shared between samples (Figure 1D).
The transcript and unigene sequences were first aligned against the NR and Swiss-Prot databases using BlastN and BlastX searches with an E-value less than 10−5. This analysis indicated that 104,303 transcripts (65.32% of the total transcripts) and 34,189 unigenes (49.23% of the total unigenes) were matched to the NR database (Table S2 in Supplementary Materials), while 82,193 transcripts (51.47%) and 26,589 unigenes (38.28%) were matched to the Swiss-Prot database (Table S3 in Supplementary Materials). The annotated transcripts and unigenes in the COG, GO and KEGG databases are also listed in Table S3 in Supplementary Materials. However, 53,694 transcripts (33.62%) and 34,270 unigenes (49.34%) could not be matched to any functions included in these five databases.
Comparison of gene expression levels in varieties and different development stages revealed a total of 15,449 DEGs (BF vs. YF, 11,000 DEGs; BF vs. HF, 4,761 DEGs; BF vs. SF, 7,742 DEGs; HF vs. SF, 1,535 DEGs), and only 1% (215 DEGs) of the DEGs between varieties and different development stages are the same (Figure 2A). Interestingly, the number of down-regulated DEGs was greater than the number of up-regulated DEGs in red bracts of blooming flowers of ‘Dutch Red’, as indicated from the comparisons of BF vs. HF (1,643 DEGs up-regulated; 3,118 DEGs down-regulated) and BF vs. SF (3,591 DEGs up-regulated; 4,151 DEGs down-regulated); however, more genes were significantly up-regulated in the bracts of ‘half flowering’ of ‘Dutch Red’, as indicated from the comparisons of HF vs. SF (1,198 DEGs up-regulated; 337 DEGs down-regulated) (Figure 2A).
(A) Distributions and (B) KEGG analysis of DEGs identified from pairwise comparisons between different developmental stages and between varieties. The FDR value is the multiple hypothesis test-corrected
KEGG analysis of the identified DEGs showed most enrichment in the following pathways: ‘Phenylpropanoid biosynthesis’ (map00940), ‘Plant-pathogen interaction’ (map04626), ‘Plant hormone signal transduction’ (map04075), ‘Flavonoid biosynthesis’ (map00941), ‘Starch and sucrose metabolism’ (map00500), ‘Flavone and flavonol biosynthesis’ (map00944) and ‘Terpenoid backbone biosynthesis’ (map00900) (Figure 2B). We noted that among the DEGs associated with flavonoid, those involved in anthocyanin biosynthesis, and transcripts of
Heat maps of the clustering analysis results also suggested that genes related to the ‘Flavonoid biosynthesis’, ‘Flavone and flavonol biosynthesis’ and ‘Phenylpropanoid biosynthesis’ pathways were significantly up-regulated in the early stage bracts of ‘Dutch Red’ when compared with those in the half and fully blossomed stages (Figure 3A). Analysis of DEGs related to ‘Signal transduction’ revealed that auxin-responsive proteins were screened as the most variable gene family, and auxin-responsive small auxin up RNA (SAUR) genes were important in the early development stage of red bracts. Meanwhile, IAA transcripts were mainly produced in both the half and fully blossomed stages when compared with the initial blossoming stage. Besides, the transcripts of
Comparisons of DEGs. (A) Heatmap comparison of DEGs in the most enriched KEGG pathways during the three development stages. (B) Heatmap comparison of DEGs in the most enriched KEGG pathways between ‘Chocolate’ and ‘Dutch Red’. DEGs, differentially expressed genes; KEGG, Kyoto Encyclopedia of Genes and Genomes.
By annotation in the public database, more than 30 genes affiliated to nine structural gene families were predicted to participate in the anthocyanin biosynthesis pathway (Figure 4). Anthocyanin biosynthesis started with the conversion of phenylalanine to coumarate-CoA by phenylalanine ammonia lyase (PAL), cinnamate-4-hydroxylase (C4H) and 4-coumarate: CoA ligase (4CL). There were seven unigenes annotated as
Putative structural genes in anthocyanin biosynthesis identified from DEGs.
Unigene | Annotation | FPKM-SF | FPKM-HF | FPKM-BF | Log2 Ratio (BF/SF) | FPKM-YF | Log2 Ratio (BF/YF) |
---|---|---|---|---|---|---|---|
DN17923_c1_g3 | CHS-like protein | 2,593.0 | 1,906.1 | 185.4 | −3.80 | 2,667.3 | −3.8 |
DN16695_c0_g5 | CHS | 80.4 | 760.5 | 298.4 | 1.9 | 2,187.2 | −2.9 |
DN10857_c0_g4 | Probable chalcone – flavonone isomerase 3 isoform X1 (CHI) | 501.8 | 344.7 | 38.0 | −3.7 | 561.0 | −3.9 |
DN15898_c0_g1 | Chalcone – flavonone isomerase (CHI) | 152.8 | 139.5 | 20.0 | −2.9 | 237.0 | −3.6 |
DN14172_c2_g1 | F3H | 976.4 | 886.8 | 232.7 | −2.1 | 1,440.4 | −2.6 |
DN22055_c1_g3 | Flavonol synthase/F3H | 5.1 | 7.3 | 11.9 | 1.2 | 1.4 | 3.1 |
DN22236_c1_g1 | Flavonol synthase/F3H | 296.2 | 125.1 | 3.0 | −6.6 | 257.7 | −6.4 |
DN16064_c1_g1 | Flavonoid 3′,5′-hydroxylase 1-like (F3′5′H) | 1,422.9 | 1,389.6 | 1,138.9 | −0.3 | 64.0 | 4.2 |
DN8362_c0_g1 | Flavonoid 3′,5′-hydroxylase 1-like (F3′5′H) | 6.6 | 14.1 | 2.3 | −1.5 | 20.0 | −3.1 |
DN14193_c0_g5 | DFR | 7.5 | 8.8 | 2.4 | −1.6 | 18.5 | −2.9 |
DN16691_c4_g1 | DFR | 67.8 | 259.7 | 117.8 | −0.8 | 570.8 | −2.3 |
DN12990_c1_g6 | Anthocyanidin reductase (ANR) | 64.7 | 331.3 | 90.8 | 0.5 | 323.2 | −1.8 |
DN13257_c0_g1 | Anthocyanidin reductase (ANR) | 27.8 | 249.5 | 85.3 | 1.6 | 277.8 | −1.7 |
DN12119_c3_g2 | Anthocyanidin 3-O-glucosyltransferase 7-like flavonoid 3-O-glucosyltransferase (UFGT) | 191.51 | 222.9 | 58.1 | −1.7 | 250.3 | 0.38 |
DN13065_c1_g1 | Anthocyanin 3′-O-beta-glucosyltransferase-like (UFGT) | 3.74 | 2.6 | 0.24 | −4.0 | 11.2 | −5.54 |
DN13592_c1_g3 | Anthocyanidin 3-O-glucosyltransferase-like (UFGT) | 155.8 | 42.5 | 3.1 | −5.67 | 196.8 | −6.0 |
BF, blossomed flowering; CHI, chalcone isomerase; CHS, chalcone synthase; DEGs, differentially expressed genes; DFR, dihydroflavonol 4-reductase; F3H, flavanone-3-hydroxylase; F3′5′H, flavonoid 3′,5′-hydroxylase; FPKM, fragments per kilobase of transcript per million mapped reads; HF, half flowering.
The anthocyanin biosynthesis process with its core metabolites and enzymes and the expression levels of core enzyme genes. Enzyme names and expression patterns are indicated at the side of each step. Colour boxes from left to right represent unigenes showing lower or higher expression level in the colour bracts of ‘Dutch Red’ of SF, HF, BF and YF, respectively. ANS, anthocyanidin synthase; BF, blossomed flowering; CHI, chalcone isomerase; CHS, chalcone synthase; C4H, cinnamate-4-hydroxylase; DFR, dihydroflavonol 4-reductase; F3H, flavanone-3-hydroxylase; F3′5′H, flavonoid 3′,5′-hydroxylase; HF, half-flowering; PAL, phenylalanine ammonia lyase.
In addition to the structural genes in the anthocyanin biosynthetic pathway, TFs also play important roles in flower colour development through regulating the temporal and spatial expression of structural genes. Many DEGs identified in the KEGG database were annotated as TFs, including
Distribution and selection of key differently expressed TFs associated with anthocyanin biosynthesis. (A) Distribution of TF family of unigenes. (B) Clustering heat maps of significantly enriched differently expressed TFs including
Differently expressed genes in transcription factor families of MYB and WD40.
Unigene | Annotation | FPKM-SF | FPKM-HF | FPKM-BF | FPKM-YF | Regulation |
---|---|---|---|---|---|---|
DN13707_c0_g2 | WD-repeat protein | 306.8 | 283.4 | 244.7 | 226.5 | Whole flowering |
DN14303_c2_g4 | WD-repeat protein | 90.3 | 97.0 | 97.2 | 101.5 | periods |
DN17014_c0_g2 | MYB_superfamily | 613.5 | 1,695.1 | 531.6 | 718.1 | |
DN20674_c0_g1 | WD-repeat protein | 121.4 | 131.1 | 136.2 | 67.3 | |
DN15014_c1_g1 | MYB_superfamily | 66.3 | 27.2 | 13.4 | 11.8 | Early flowering |
DN11279_c3_g3 | MYB_superfamily | 105.1 | 64.9 | 0.8 | 38.9 | periods |
DN19744_c1_g3 | MYB_superfamily | 160.6 | 132.0 | 45.9 | 53.1 | |
DN12166_c1_g8 | MYB_superfamily | 16.3 | 10.9 | 41.8 | 6.4 | Blossom periods in |
DN12860_c2_g5 | MYB_superfamily | 4.5 | 5.2 | 22.7 | 2.8 | ‘Dutch Red’ |
DN15887_c1_g3 | MYB_superfamily | 14.1 | 10.7 | 36.5 | 2.8 | |
DN22635_c0_g1 | MYB_superfamily | 7.1 | 11.2 | 26.8 | 3.3 | |
DN12661_c0_g1 | WD-repeat protein | 1.9 | 5.5 | 36.5 | 36.2 | Blossom periods |
DN21970_c4_g5 | MYB_superfamily | 15.3 | 121.8 | 220.9 | 298.7 |
FPKM, fragments per kilobase of transcript per million mapped reads; HF, half-flowering.
To evaluate the reliability of RNA-Seq analysis using Illumina sequencing, 14 candidate genes in the anthocyanin biosynthesis and regulations were selected for qRT-PCR test in the four bracts (SF, HF, BF and YF). The oligonucleotide primer sequences used to amplify these transcripts for each gene and related unigene are listed in Table S5 in Supplementary Materials. In general, our qRT-PCR results show a high degree of consistency with the RNA-Seq results (Figure 6). Expression patterns of six (42.85%) genes (
qRT-PCR validations of 14 putative genes involved in anthocyanin biosynthesis and regulations. The histograms represent expression determined by qRT-PCR (left
Anthocyanin, a family of water-soluble bioactive flavonoids, is considered to be rich in deep-coloured flowers and the content was found to have a correlation with colour (Pavel et al., 2018). The flavonoid biosynthesis, as a branch of the phenylpropanoid pathway, is responsible for the production of anthocyanins, isoflaonoid and flavonols (Tohge et al., 2017). In this study, the comparative transcriptome was used to explore the mechanisms of colour formation of ‘Dutch Red’ and the colour difference between ‘Dutch’ and ‘Chocolate’. Our results suggested that the deepening of bract colour in
For anthocyanin biosynthesis pathways, as is shown in Figure 4, when phenylalanine is transformed into coumarate-CoA, it will be introduced into the anthocyanin biosynthesis pathway by CHS and CHI. CHS is one of the key and rate-limiting enzymes of phenylpropanoid pathway which plays superior roles in the production of anthocyanin. Singh and Kumaria (2020) showed that
In plants, the structural genes of the flavonoid biosynthetic pathway are significantly regulated at the level of transcription. TFs of
To summarise, we applied the RNA-Seq technology to perform transcript analysis of the three flowering stages of ‘Dutch Red’ (SF, HF and BF), and the yellow ‘Chocolate’, YF. Our results revealed that the deepening of