Comparative RNA-Seq analysis to understand anthocyanin biosynthesis and regulations in

Curcuma alismatifolia is a beautiful exotic plant originally from Thailand and Cambodia (Khumkratok et al., 2012). It belongs to the ginger family, and has recently become popular as an ornamental around the word. Nowadays, a large number of cultivars, including ‘Qiang Mai Pink’, ‘Qiang Mai Red’, ‘Dutch Red’ and ‘UB Snow 701’, have been introduced in China. They are produced as commercial cut flowers or as potted plants in urban gardens (Li et al., 2021a).

In general, the inflorescence is the main ornamental part of C. alismatifolia. It consists of colourful and green bracts on the showy flowering stem. The pink to purple red/white distal bracts are more numerous than the basal green bracts. Several small flower buds with purple flag petals are borne on the green bracts (Figure 1A). Anthocyanins are widely known as nutraceuticals and are a group of soluble vacuolar pigments among leaves, flowers, fruits and roots (Mizuta et al., 2009). These kinds of pigments exhibit a wide range of colours, from pink to blue-purple, and primarily consist of six common types of anthocyanidins, namely pelargonidin, cyanidin, delphinidin, peonidin, petunidin and malvidin. Both internal factors and exterior environments, such as different varieties and growth periods, can influence the distribution of anthocyanins (Zhao and Tao, 2015). The biosynthesis of anthocyanin is catalysed by a series of enzymes, comprising chalcone synthase (CHS), chalcone isomerase (CHI), dihydroflavonol 4-reductase (DFR) and anthocyanidin synthase (ANS) in the flavonoid biosynthetic pathways, and has been reported in Arabidopsis (Zhang et al., 2017), Begonia semperflorens (Wang et al., 2018) and Narcissus pseudonarcissus (Li et al., 2018). By promoting the activity of key enzymes, anthocyanin synthesis can be induced that leads to changed flower colours. Studies have also found that three main types of transcription factors (TFs), MYB, basic helix-loop-helix (bHLH) and WD40, can form an MBW complex to regulate the expressions of structural genes (Wei et al., 2019). In C. alismatifolia, a total of five anthocyanins (delphinidin 3-O-rutinoside, cyanidin 3-O-rutinoside, petunidin 3-O-rutinoside, malvidin 3-O-glucoside and malvidin 3-O-rutinoside) were identified as being responsible for the pink to red bracts (Koshioka et al., 2015). The genes, CHS and DFR, were successfully cloned from C. alismatifolia (‘Qing Mai Pink’) by Chanapan et al. (2017) and Petchang et al. (2017). However, the underlying molecular mechanisms that control anthocyanin synthesis in flower colour formation and in cultivars of C. alismatifolia with different colours are far from conclusive.

RNA-Seq data expression profiles in C. alismatifolia colour bracts. (A) Phenotypes of ‘Dutch Red’ at three typical development stages and ‘Chocolate’ at the blossomed stage. (B) Numbers of transcripts in the four bract samples. (C) Principal component analysis of the RNA-Seq data. (D) Venn diagram of the RNA-Seq data from the four bract samples. BF, blossomed flowering; HF, half-flowering; FPKM, fragments per kilobase of transcript per million mapped reads.

With the development of sequencing technology in recent years, high-throughput sequencing of mRNA is being used to provide specific expression information in particular species (He et al., 2020). This technology is additionally one of the effective approaches to detect functional genes in organisms without reference genome. Recently, assembly of de-novo transcriptomes was first performed in C. alismatifolia, identifying novel EST-SSR markers for C. species (Taheri et al., 2019). However, studies of the colour formation in the C. alismatifolia transcriptome are rare. In this study, we performed RNA-Seq analysis on the regulatory networks of anthocyanin biosynthesis in ‘flowers’ (colour bracts) of the C. alismatifolia cultivars Dutch Red and Chocolate. The expression profiles in ‘Dutch Red’ at the different flower developmental stages were also investigated. We used this information to identify hub genes related to anthocyanin pathways and to analyse the differently expressed genes (DEGs) involved in anthocyanin biosynthesis and regulation. The goal was to acquire comprehensive knowledge of the anthocyanin regulatory network during C. alismatifolia flower formation and to provide a theoretical basis for the future breeding of C. alismatifolia cultivars with high ornamental and commercial values.

MATERIALS AND METHODS

Plant materials and growth

Rhizomes from C. alismatifolia cv. ‘Dutch Red’ and ‘Chocolate’ were obtained from a horticultural company (Jinluan) in China. Rhizomes were grown in a greenhouse within the campus of Minnan Normal University. During the experiment, the plants were grown at 25 °C under 500 μmol · m⁻² · s⁻¹. Colourful bracts from ‘Dutch Red’ were harvested at three continuous developmental stages [early flowering small flowers (SF), half-flowering (HF) and blossomed flowering (BF)]. The yellow bracts from ‘Chocolate’ were harvested at the peak flowering stage yellow flowers (YF). For each sample, three biological replicates (3 g each) were collected. These tissue samples were immediately subjected to RNA extraction.

RNA extraction, library construction and Illumina sequencing

High-quality total RNA from the tissue samples was isolated using the RNeasy mini Kit (Qiagen, Germany), following the manufacturer's instructions. gDNAase (provided by the Kit) was added to remove the genomic DNA. The purity and concentration of isolated RNA were checked by a 2100 Bioanalyser (Agilent Technologies, Inc., Santa Clara, CA, USA) and quantified using an ND-2000 (NanoDrop Thermo Scientific, Wilmington, DE, USA). Only high-quality RNA samples (OD260/280 = 1.8–2.2, OD260/230 ≥ 2.0, RIN ≥ 8.0, 28S:18S ≥ 1.0, >2 μg) were used to construct the sequencing library.

RNA purification, reverse transcription, library construction and sequencing were performed at Shanghai Majorbio Bio-pharm Biotechnology Co., Ltd. (Shanghai, China) according to the manufacturer's instructions (Illumina, San Diego, CA, USA). The RNA-Seq transcriptome libraries were prepared using an Illumina TruSeqTM RNA sample preparation Kit (San Diego, CA, USA). Poly(A) mRNA was purified from total RNA using oligo-dT-attached magnetic beads and then fragmented by fragmentation buffer. Taking these short fragments as templates, double-stranded cDNA was synthesised using a SuperScript double-stranded cDNA synthesis kit (Invitrogen, CA, USA) with random hexamer primers (Illumina). Then the synthesised cDNA was subjected to end-repair, phosphorylation and ‘A’ base addition according to Illumina's library construction protocol. Libraries were size selected for cDNA target fragments of 200–300 bp on 2% Low Range Ultra Agarose followed by PCR amplified using Phusion DNA polymerase (New England Biolabs, Boston, MA, USA) for 15 PCR cycles. After quantifying by TBS380, two RNA-Seq libraries were sequenced in single lane on an Illumina Hiseq NovaSeq 6000 sequencer (Illumina, San Diego, CA, USA) for 2 × 150 bp paired-end reads.

De novo assembly and functional annotation

The raw reads obtained from HiSeq2000 sequencing were filtered by fastp software (v0.19.6): reads containing adaptors, reads with more than 5% unknown nucleotides and low-quality reads with more than 20% of bases with a quality value ≤20 were deleted (Wang, 2007). Then all clean reads were assembled with Trinity (release 2013-02-25) with the following settings: inchworm: -K –L 25; Chrysalis: -min_glue 2 -glue_factor 0.05 -min_iso_ratio 0.05 -kk 48 –strand -report_welds -max_mem_reads 1,000,000 (Grabherr et al., 2011). We then filtered out any transcripts with less than 1% of the per-component expression level using a script bundled with Trinity. ORFs were extracted using the Perl script bundled with Trinity (transcripts_to_best_scoring_ORFs.pl). To estimate the number of full length transcripts that had been assembled in our data sets we used the tool benchmarking universal single-copy orthologs (BUSCO) (v5.2.2), which is based on evolutionarily informed expectations of gene content, with default settings. HMMER is then run to assign a score to the candidate amino acid sequence (Seppey et al., 2019). Finally, Trinity connects the transcripts and obtains sequences that cannot be extended on either end. Such sequences are defined as unigenes. When multiple samples from the same species are sequenced, the unigenes from each sample's assembly can be further analysed through sequence splicing and the removal of redundancy using sequence clustering software to acquire non-redundant unigenes with the longest length possible. In the final step, transcripts and unigenes were separately searched against the National Center for Biotechnology Information (NCBI) protein non-redundant (NR), the Clusters of Orthologous Groups of Proteins (COG) and Swiss-Prot databases using DIAMOND (v0.8.37.99) to perform function annotations with a significant threshold E-value of 10⁻⁵. BLAST2GO (v2.5.0) (http://www.blast2go.com/b2ghome) program was used to get GO annotations of unigenes for describing biological processes, molecular functions and cellular components (Conesa et al., 2005). Metabolic pathway analysis was performed using KOBAS (v2.1.1) based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (Ogata et al., 1999).

Differential expression analysis and functional enrichment

To identify DEGs between two different samples, the gene expression level was calculated using fragments per kilobase of exon per million mapped reads (FPKM). RNA-Seq by expectation-maximization (RSEM) (http://deweylab.biostat.wisc.edu/rsem/) (Dewey and Li, 2011) was used to quantify gene and isoform abundances. R statistical package software EdgeR (Empirical analysis of Digital Gene Expression in R, http://www.bioconductor.org/packages/2.12/bioc/html/edgeR.html) (Robinson et al., 2010) was utilised for DEGs between two samples (Anders et al., 2013). The threshold for the p-value was determined by the false-discovery rate (FDR). Unigenes with FDR ≤0.001 and ratio of FPKMs of the two samples larger than 2 (genes for which FPKM <1 were filtered) were considered to be DEGs in this study. In addition, functional-enrichment analysis including GO and KEGG were performed to identify those DEGs that were significantly enriched in GO terms and metabolic pathways at Bonferroni-corrected p-value ≤0.05 compared with the whole-transcriptome background. GO functional enrichment and KEGG pathway analysis were carried out by Goatools (https://github.com/tanghaibao/Goatools) and KOBAS (http://kobas.cbi.pku.edu.cn/home.do) (Xie et al., 2011).

qRT-PCR analysis

Primers used for qRT-PCR assays were designed with the online Primer3 (v0.4.0) software and they amplified PCR products that varied from 80 bp to 200 bp. The housekeeping gene ACTIN (Li et al., 2021b) was used as internal reference control for normalisation. qRTPCR assays were performed as described previously (Li et al., 2019). The relative expression of the genes was calculated using the 2^−ΔCt method.

RESULTS

Transcriptome analysis in colourful bracts of ‘Dutch Red’ and ‘Chocolate’

To obtain a more detailed understanding of the anthocyanin regulatory network within different varieties and different flowering developments, four samples, derived from the red bracts from the SF, HF and BF stages of ‘Dutch Red’, and the yellow bracts from BF stage of ‘Chocolate’ (YF), were used as materials for RNA sequencing (Figure 1A). Each sample was performed with three biological replicates; thus a total of 12 sequenced libraries were constructed.

In total, approximately 576.45 Mb of raw read data were generated from Illumina HiSeq sequencing ranging from 45.24 Mb to 52.98 Mb in each sample (Table S1 in Supplementary Materials). After raw read filtering, a total of 566,269,472 clean reads (the number of total clean nucleotides was 83,999,856,857 nt; the Q20 and GC percentages were 98.31% and 49.25%, respectively) were obtained from the 12th libraries (SF-1, SF-2, SF-3; HF-1, HF-2, HF-3; BF-1, BF-2, BF-3; YF-1, YF-2, YF-3). Then, clean reads were assembled into 159,687 transcripts and 69,453 unigenes, reaching an averagely mapped ratio of 73% in samples. The assembled transcript length ranged from 500 bp to over 4,500 bp with an average of 1,236 bp and N50 of 1,830 bp (Table 1). Other statistical results obtained from the C. alismatifolia transcriptome sequencing and assemblies are also listed in Table 1. The accession numbers of all raw data in the Short Read Archive (SRA) Sequence Database in the NCBI are listed in Table S1 in Supplementary Materials.

Table 1

Summary of assembly results for ‘Dutch Red’ at three development stages and ‘Chocolate’ at the blossomed stage.

Features	SF	HF	BF	YF
Total raw reads (Mb)	45.24	48.46	52.98	45.47
Total clean reads (Mb)	44.47	47.55	52.10	44.63
Total clean bases (Gb)	6.59	7.05	7.75	6.61
Clean reads Q20 (%)	98.27	98.23	98.34	98.40
Clean reads Q30 (%)	94.63	94.56	94.83	95.01
Clean reads (pair reads)	22.24	23.78	26.05	22.31
Mapped reads	16,378,830	17,511,187	19,200,458	16,316,551
Mapped ratio (%)	73.66	73.66	73.69	73.13
Total number of transcripts	159,687
Total number of unigenes	69,453
Total sequence base	197,307,177
Average length of transcripts	1,236
N50 value of transcripts	1,830
E90N50 value of transcripts	1,921
GC (%)	42.49
TransRate score	0.30
BUSCO score	60.1% (3.5%)

Q20: The rate of bases whose quality is greater than 20; Q30: The rate of bases whose quality is greater than 30; N50: A weighted median statistic in which 50% of the total length is contained in unigenes greater than or equal to this value; E90N50: A weighted median statistic in which 50% of the total length is contained in the top 90% unigenes greater than or equal to this value; GC (%): The percentage of G and C bases in all unigenes; TransRate: Score the assembly results with transrate; BUSCO: Score the assembly integrity with BUSCO.

BF, blossomed flowering; HF, half-flowering.

These transcripts offer a potential source for the identification of functional genes. Subsequently, reads with an FPKM value <0.5 were removed, and 95,625 (mean from three replicates, the same below), 88,479, 89,429 and 78,537 transcripts were found in SF, HF, BF and YF, respectively. On average, 63% of the transcripts were in the 0.5–5 FPKM range, and 35% of the transcripts were in the 5–100 FPKM range (Figure 1B; Table S2 in Supplementary Materials). A principal component analysis showed that there were highly correlated transcriptome characteristics between the biological replicates of each sample (Figure 1C). A total of 44,536 transcripts were shared between samples (Figure 1D).

The transcript and unigene sequences were first aligned against the NR and Swiss-Prot databases using BlastN and BlastX searches with an E-value less than 10⁻⁵. This analysis indicated that 104,303 transcripts (65.32% of the total transcripts) and 34,189 unigenes (49.23% of the total unigenes) were matched to the NR database (Table S2 in Supplementary Materials), while 82,193 transcripts (51.47%) and 26,589 unigenes (38.28%) were matched to the Swiss-Prot database (Table S3 in Supplementary Materials). The annotated transcripts and unigenes in the COG, GO and KEGG databases are also listed in Table S3 in Supplementary Materials. However, 53,694 transcripts (33.62%) and 34,270 unigenes (49.34%) could not be matched to any functions included in these five databases.

Comparisons of differently expressed genes between different development stages and varieties

Comparison of gene expression levels in varieties and different development stages revealed a total of 15,449 DEGs (BF vs. YF, 11,000 DEGs; BF vs. HF, 4,761 DEGs; BF vs. SF, 7,742 DEGs; HF vs. SF, 1,535 DEGs), and only 1% (215 DEGs) of the DEGs between varieties and different development stages are the same (Figure 2A). Interestingly, the number of down-regulated DEGs was greater than the number of up-regulated DEGs in red bracts of blooming flowers of ‘Dutch Red’, as indicated from the comparisons of BF vs. HF (1,643 DEGs up-regulated; 3,118 DEGs down-regulated) and BF vs. SF (3,591 DEGs up-regulated; 4,151 DEGs down-regulated); however, more genes were significantly up-regulated in the bracts of ‘half flowering’ of ‘Dutch Red’, as indicated from the comparisons of HF vs. SF (1,198 DEGs up-regulated; 337 DEGs down-regulated) (Figure 2A).

(A) Distributions and (B) KEGG analysis of DEGs identified from pairwise comparisons between different developmental stages and between varieties. The FDR value is the multiple hypothesis test-corrected p-value. The FDR value is in the range of [0–1]. The closer that number is to 0, the more significant the enrichment. The rich factor refers to the ratio of the number of genes among the DEGs located in a number of pathways to the total number of genes in the pathway entries in all of the annotated genes. The greater the rich factor, the greater the degree of enrichment. BF, blossomed flowering; DEGs, differentially expressed genes; FDR, false-discovery rate; HF, half-flowering; KEGG, Kyoto Encyclopedia of Genes and Genomes.

KEGG analysis of the identified DEGs showed most enrichment in the following pathways: ‘Phenylpropanoid biosynthesis’ (map00940), ‘Plant-pathogen interaction’ (map04626), ‘Plant hormone signal transduction’ (map04075), ‘Flavonoid biosynthesis’ (map00941), ‘Starch and sucrose metabolism’ (map00500), ‘Flavone and flavonol biosynthesis’ (map00944) and ‘Terpenoid backbone biosynthesis’ (map00900) (Figure 2B). We noted that among the DEGs associated with flavonoid, those involved in anthocyanin biosynthesis, and transcripts of shikimate O-hydroxycinnamoyltransferase (HCT, EC2.3.1.133), the enzyme which is of great importance in plant secondary metabolite production, are more enriched in the initial blossoming stage when compared with the blossomed stage of ‘Dutch Red’. Those proteins, involved in stress resistance and antioxidant defence, were more enriched in the half and fully blossomed stages when compared with the initial blossoming stage. In addition, most of the transcripts that participated in photosynthesis were more enriched in the yellow bracts of ‘Chocolate’ when compared with those in ‘Dutch Red’.

Heat maps of the clustering analysis results also suggested that genes related to the ‘Flavonoid biosynthesis’, ‘Flavone and flavonol biosynthesis’ and ‘Phenylpropanoid biosynthesis’ pathways were significantly up-regulated in the early stage bracts of ‘Dutch Red’ when compared with those in the half and fully blossomed stages (Figure 3A). Analysis of DEGs related to ‘Signal transduction’ revealed that auxin-responsive proteins were screened as the most variable gene family, and auxin-responsive small auxin up RNA (SAUR) genes were important in the early development stage of red bracts. Meanwhile, IAA transcripts were mainly produced in both the half and fully blossomed stages when compared with the initial blossoming stage. Besides, the transcripts of 3-ketoacyl-CoA synthase (XP_009405939.1), a key enzyme in fatty acid biosynthesis, were observed exclusively enriched in an early development stage compared with the half and fully blossomed stages (Figure 3A). From the clustering analysis of blossoming stages between ‘Chocolate’ and ‘Dutch Red’, DEGs related to ‘flavonoid biosynthesis’ and ‘Photosynthesis’ were more up-regulated in the yellow bracts of ‘Chocolate’ (Figure 3B). DEGs in the ‘Porphyrin and chlorophyll metabolism’ pathway were significantly enriched in ‘Chocolate’. DEGs in ‘Phenylproanoid biosynthesis’ showed that peroxidase transcripts were significantly enriched in the red bracts of ‘Dutch Red’, which was suggestive of a stress response in the blossomed bracts of ‘Dutch Red’ (Figure 3B).

Comparisons of DEGs. (A) Heatmap comparison of DEGs in the most enriched KEGG pathways during the three development stages. (B) Heatmap comparison of DEGs in the most enriched KEGG pathways between ‘Chocolate’ and ‘Dutch Red’. DEGs, differentially expressed genes; KEGG, Kyoto Encyclopedia of Genes and Genomes.

Identification of candidate genes related to anthocyanin biosynthesis

By annotation in the public database, more than 30 genes affiliated to nine structural gene families were predicted to participate in the anthocyanin biosynthesis pathway (Figure 4). Anthocyanin biosynthesis started with the conversion of phenylalanine to coumarate-CoA by phenylalanine ammonia lyase (PAL), cinnamate-4-hydroxylase (C4H) and 4-coumarate: CoA ligase (4CL). There were seven unigenes annotated as PAL and 12 unigenes annotated as 4CL, among which DN13924_c2 and DN18026_c0 had a significantly lower expression in the blossomed stage of the ‘Dutch Red’. Subsequently, dihydroflavonol was formatted by coumarate-CoA and malonyl-CoA catalysed by CHS, CHI, flavanone-3-hydroxylase (F3H) and flavonoid 3′,5′-hydroxylase (F3′5′H), which were the key enzymes in the metabolism of anthocyanin. The formation of various anthocyanidins by dihydroflavonols was then catalysed by DFR and ANS. Among them, DN14172_c2, DN16691_c4 and DN17923_ c1 encode F3H, DFR and CHS, respectively (Table 2). The first two were the same genes as that submitted previously in the C. alismatifolia by Kriangphan et al. (2009), while DN17923_c1 was closely related to CHS (AEU17693.1) gene in C. longa (Resmi and Soniya, 2012). They all showed significantly higher expression in the bracts of ‘Chocolate’ (FPKM as 1440.4, 570.8 and 2667.3, respectively) (Figure 4; Table 2). Another three anthocyanin structural genes, DN10857_c0, DN16064_c1 and DN12990_c1, have 70.9%, 84.2% and 84.0%, respectively, nucleic acid similarity with CHI (XP_009384766.1), F3′5′H (XP_009411862.1) and ANR (XP_009412668.1) in Musa acuminate. DN10857_c0 was expressed significantly lower in the blossomed stage of ‘Dutch Red’ (FPKM as 38.0) than those in the bracts of SF, HF and YF (FPKM as 501.8, 344.7 and 561.0, respectively). DN16064_c1 was expressed significantly lower in the bracts of YF than those of ‘Dutch Red’. A total of 14 genes were identified in the clade with glucosyltransferase-like family with different patterns of expression (Figure 4). Among them, DN12119_c3 and DN13592_c1 encode the anthocyanidin/anthocyanin 3-O-glucosyltransferases, which catalyse unstable anthocyanidin into anthocyanin. They are the last key enzymes in the anthocyanin biosynthetic pathway. DN12119_c3, DN16301_c1 and DN13065_c1 had significantly lower expressions in BF than that in SF and HF, while other unigenes showed no significant divergence in different stages or cultivars (Table 2; Figure 4).

Table 2

Putative structural genes in anthocyanin biosynthesis identified from DEGs.

Unigene	Annotation	FPKM-SF	FPKM-HF	FPKM-BF	Log₂ Ratio (BF/SF)	FPKM-YF	Log₂ Ratio (BF/YF)
DN17923_c1_g3	CHS-like protein	2,593.0	1,906.1	185.4	−3.80	2,667.3	−3.8
DN16695_c0_g5	CHS	80.4	760.5	298.4	1.9	2,187.2	−2.9
DN10857_c0_g4	Probable chalcone – flavonone isomerase 3 isoform X1 (CHI)	501.8	344.7	38.0	−3.7	561.0	−3.9
DN15898_c0_g1	Chalcone – flavonone isomerase (CHI)	152.8	139.5	20.0	−2.9	237.0	−3.6
DN14172_c2_g1	F3H	976.4	886.8	232.7	−2.1	1,440.4	−2.6
DN22055_c1_g3	Flavonol synthase/F3H	5.1	7.3	11.9	1.2	1.4	3.1
DN22236_c1_g1	Flavonol synthase/F3H	296.2	125.1	3.0	−6.6	257.7	−6.4
DN16064_c1_g1	Flavonoid 3′,5′-hydroxylase 1-like (F3′5′H)	1,422.9	1,389.6	1,138.9	−0.3	64.0	4.2
DN8362_c0_g1	Flavonoid 3′,5′-hydroxylase 1-like (F3′5′H)	6.6	14.1	2.3	−1.5	20.0	−3.1
DN14193_c0_g5	DFR	7.5	8.8	2.4	−1.6	18.5	−2.9
DN16691_c4_g1	DFR	67.8	259.7	117.8	−0.8	570.8	−2.3
DN12990_c1_g6	Anthocyanidin reductase (ANR)	64.7	331.3	90.8	0.5	323.2	−1.8
DN13257_c0_g1	Anthocyanidin reductase (ANR)	27.8	249.5	85.3	1.6	277.8	−1.7
DN12119_c3_g2	Anthocyanidin 3-O-glucosyltransferase 7-like flavonoid 3-O-glucosyltransferase (UFGT)	191.51	222.9	58.1	−1.7	250.3	0.38
DN13065_c1_g1	Anthocyanin 3′-O-beta-glucosyltransferase-like (UFGT)	3.74	2.6	0.24	−4.0	11.2	−5.54
DN13592_c1_g3	Anthocyanidin 3-O-glucosyltransferase-like (UFGT)	155.8	42.5	3.1	−5.67	196.8	−6.0

BF, blossomed flowering; CHI, chalcone isomerase; CHS, chalcone synthase; DEGs, differentially expressed genes; DFR, dihydroflavonol 4-reductase; F3H, flavanone-3-hydroxylase; F3′5′H, flavonoid 3′,5′-hydroxylase; FPKM, fragments per kilobase of transcript per million mapped reads; HF, half flowering.

The anthocyanin biosynthesis process with its core metabolites and enzymes and the expression levels of core enzyme genes. Enzyme names and expression patterns are indicated at the side of each step. Colour boxes from left to right represent unigenes showing lower or higher expression level in the colour bracts of ‘Dutch Red’ of SF, HF, BF and YF, respectively. ANS, anthocyanidin synthase; BF, blossomed flowering; CHI, chalcone isomerase; CHS, chalcone synthase; C4H, cinnamate-4-hydroxylase; DFR, dihydroflavonol 4-reductase; F3H, flavanone-3-hydroxylase; F3′5′H, flavonoid 3′,5′-hydroxylase; HF, half-flowering; PAL, phenylalanine ammonia lyase.

Identification of candidate regulators

In addition to the structural genes in the anthocyanin biosynthetic pathway, TFs also play important roles in flower colour development through regulating the temporal and spatial expression of structural genes. Many DEGs identified in the KEGG database were annotated as TFs, including MYB, AR2/ERF, C2C2, bHLH and WD. Their abundance levels and detailed information are shown in Figure 5A and Table S4 in Supplementary Materials. MYB superfamily and WD-repeat transcripts were the most abundant found to regulate the flower development and flower formation (Figure 5B). DN17014_c0 was annotated as expansin-A15, which belongs to the MYB superfamily, and DN13707_c0 was annotated as WD-repeat–containing protein. They were the two important TFs that expressed highly during the whole flower developments (Table 3). Further comparison showed that DN17014_c0 shares sequence similarity with the TF MYB105-like gene, and the average value was about 95%. DN11279_c3 and DN19744_c1 annotated as MYB-related proteins are highly expressed during initial flower development in ‘Dutch Red’ (FPKM as 105.1 and 160.6, respectively), but during the blossomed stages they were significantly decreased. This suggests that DN11279_c3 and DN19744_c1 were more likely the floral activators of ‘Dutch Red’. Another MYB protein, DN18449_c0, annotated as trihelix TF GT-3a, was exclusively expressed highly in the blossomed bracts of ‘Dutch Red’ (FPKM as 116.7), but was significantly lower in the blossom bracts of ‘Chocolate’ (Table 3). A comparison of the MYB TFs between ‘Dutch Red’ and ‘Chocolate’ showed that DN21970_c4, annotated as divaricata-like MYB gene, had significantly high expression in the blossomed bracts of both ‘Dutch Red’ and ‘Chocolate’ (Table 3), indicating that DN21970_c4 was possibly the common TF in regulating later floral development in C. alismatifolia.

Distribution and selection of key differently expressed TFs associated with anthocyanin biosynthesis. (A) Distribution of TF family of unigenes. (B) Clustering heat maps of significantly enriched differently expressed TFs including MYB, bHLH and WD during three development stages and between species ‘Chocolate’ and ‘Dutch Red’. TF, transcription factors.

Table 3

Differently expressed genes in transcription factor families of MYB and WD40.

Unigene	Annotation	FPKM-SF	FPKM-HF	FPKM-BF	FPKM-YF	Regulation
DN13707_c0_g2	WD-repeat protein	306.8	283.4	244.7	226.5	Whole flowering
DN14303_c2_g4	WD-repeat protein	90.3	97.0	97.2	101.5	periods
DN17014_c0_g2	MYB_superfamily	613.5	1,695.1	531.6	718.1
DN20674_c0_g1	WD-repeat protein	121.4	131.1	136.2	67.3
DN15014_c1_g1	MYB_superfamily	66.3	27.2	13.4	11.8	Early flowering
DN11279_c3_g3	MYB_superfamily	105.1	64.9	0.8	38.9	periods
DN19744_c1_g3	MYB_superfamily	160.6	132.0	45.9	53.1
DN12166_c1_g8	MYB_superfamily	16.3	10.9	41.8	6.4	Blossom periods in
DN12860_c2_g5	MYB_superfamily	4.5	5.2	22.7	2.8	‘Dutch Red’
DN15887_c1_g3	MYB_superfamily	14.1	10.7	36.5	2.8
DN22635_c0_g1	MYB_superfamily	7.1	11.2	26.8	3.3
DN12661_c0_g1	WD-repeat protein	1.9	5.5	36.5	36.2	Blossom periods
DN21970_c4_g5	MYB_superfamily	15.3	121.8	220.9	298.7

FPKM, fragments per kilobase of transcript per million mapped reads; HF, half-flowering.

qRT-PCR analysis validated the differently expressed gene DEGs of the anthocyanin pathway

To evaluate the reliability of RNA-Seq analysis using Illumina sequencing, 14 candidate genes in the anthocyanin biosynthesis and regulations were selected for qRT-PCR test in the four bracts (SF, HF, BF and YF). The oligonucleotide primer sequences used to amplify these transcripts for each gene and related unigene are listed in Table S5 in Supplementary Materials. In general, our qRT-PCR results show a high degree of consistency with the RNA-Seq results (Figure 6). Expression patterns of six (42.85%) genes (F3H1, F3H2, MYB4, WD2, MYB5 and DFR) had almost similar expression patterns but with very small partial inconsistencies compared with the RNA-Seq results. Seven (50%) genes (CHS2, CHI, F35H, MYB1, MYB2, MYB3 and WD1) fit well with the RNA-Seq results across all four samples. The gene CHS1 expression was up-regulated significantly in the yellow ‘Chocolate’ bracts, which was inconsistent with our FPKM database. It is rational and acceptable that there are certain differences in direct comparisons between the RNA-Seq and qRT-PCR results due to different normalisation methods and other technical biases.

qRT-PCR validations of 14 putative genes involved in anthocyanin biosynthesis and regulations. The histograms represent expression determined by qRT-PCR (left y-axis), while lines represent expression by RNA-Seq in FPKM values (right y-axis). The x-axis in each chart represents the SF, HF, BF and YF, respectively. For qRTPCR assays, the mean was calculated from three biological replicates. For RNA-Seq, each point is the mean of three biological replicates. BF, blossomed flowering; HF, half-flowering.

DISCUSSION

Anthocyanin, a family of water-soluble bioactive flavonoids, is considered to be rich in deep-coloured flowers and the content was found to have a correlation with colour (Pavel et al., 2018). The flavonoid biosynthesis, as a branch of the phenylpropanoid pathway, is responsible for the production of anthocyanins, isoflaonoid and flavonols (Tohge et al., 2017). In this study, the comparative transcriptome was used to explore the mechanisms of colour formation of ‘Dutch Red’ and the colour difference between ‘Dutch’ and ‘Chocolate’. Our results suggested that the deepening of bract colour in C. alismatifolia was mostly associated with the up-regulated activity of structural enzymes in anthocyanin biosynthesis pathway at the initial flowering stage. The activity of these structural enzymes in the blossomed ‘Chocolate’ bracts was contrary to those in ‘Dutch Red’, and we examined different expressing patterns of MYB TF, DN12166_c1, DN12860_c2, DN15887_c1 and DN22635_c0, between ‘Chocolate’ and ‘Dutch Red’, which indicated that MYB factors might have participated in regulating anthocyanin biosynthesis and colour formation between different C. alismatifolia varieties. However, their real roles are difficult to define due to the lack of a transformation system in C. alismatifolia.

For anthocyanin biosynthesis pathways, as is shown in Figure 4, when phenylalanine is transformed into coumarate-CoA, it will be introduced into the anthocyanin biosynthesis pathway by CHS and CHI. CHS is one of the key and rate-limiting enzymes of phenylpropanoid pathway which plays superior roles in the production of anthocyanin. Singh and Kumaria (2020) showed that CHS expression was positively correlated with flavonoid and anthocyanin contents from the orchid plant. Accumulation of anthocyanin constituents was also quantified and found to corroborate well with the expression levels of CHS in plants of eggplant (Wu et al., 2020) and Grewia asiatica (Wani et al., 2017). CHI is the second rate-limiting and the first reported enzyme involved in the biosynthetic pathway of anthocyanin. Over-expression of CHI enhanced the flavonoid production in D. cambodiana and tobacco (Zhu et al., 2021). CHI suppression in the tissues of herbaceous peony resulted in the impairing of anthocyanin accumulation (Wu et al., 2018). In this study, up-regulated genes account for the majority of genes related to anthocyanin biosynthesis pathways, such as PAL, 4CL, CHS and UFGT, while their expressions were relatively higher at the beginning of flowering. CHI expression was significantly down-regulated at the full flowering stage of ‘Dutch Red’, but not at the full flowering stage of ‘Chocolate’. This result implies that significant up-regulation of anthocyanin-related genes from the beginning of flowering may be one of the reasons for the deepening of red C. alismatifolia bracts, but the yellow formation mechanism in bracts of ‘Chocolate’ is another way of regulation, which is hard to define in this study. F3′5′H plays a dominant role in the formation of delphinidin-based anthocyanins. Previous studies (Li et al., 2021b; Nguyen et al., 2021) have shown that F3′5′H plays key roles in flower colour regulation. Over-expression of F3′5′H significantly increased the total flavonoid content in petals of Aconitum carmichaelii, and flowers of the transgenic lines of cyclamen showed modified colour and this correlated positively with the loss of endogenous F3′5′H transcript (Boase et al., 2010). In this study, most of the genes related to anthocyanin biosynthesis were up-regulated in YF (Table 2; Figure 4); however, the expression of F3′5′H was significantly down-regulated, which was consistent with the results of Boase et al. (2010). Reduced contents of F3′5′H proteins would make less dihydroflavonol introduced into the anthocyanin production, which is unfavourable for anthocyanin accumulation.

In plants, the structural genes of the flavonoid biosynthetic pathway are significantly regulated at the level of transcription. TFs of R2R3-MYB, bHLH and MBW complex, which activate the transcription of anthocyanins pathway, were widely studied. In previous studies, 13 TFs (MYB, bHLH, WD40, etc,) which showed strongly relating to flavonoid biosynthesis had been screened from the transcriptome analysis of Paeonia by Guo et al. (2019). Three MYB genes and two bHLH genes were strikingly down-regulated in the white flowers of a tobacco mutant (Jiao et al., 2020). WD-repeat protein 5 (WDR5), as a member of the WD40 protein family, is found widely involved in epigenetic regulation, and associated with controlling long noncoding RNAs and TFs (Chen et al., 2021; Dölle et al., 2021). In this study, nine MYB genes and one WD gene (DN12661_ c0) were also found in different expression patterns that regulate anthocyanin biosynthesis. MYB105 belongs to the S21 subfamily of the MYB family, and is involved in the biosynthesis of volatiles and flavonoid metabolites (Chen et al., 2015). In this study, expression of one MYB superfamily (DN17014_c0_g2) highly homologous to MYB105 was significantly higher in the half-blossoming flowers than that in the buds, which is consistent with the comparative transcriptomic analysis of rose studied by Feng et al. (2021). No significant DEGs encoding bHLH proteins were found in comparisons with YF, SF, HF and BF. In conclusion, the regulation of MYB TF is more likely the main factor in the regulation of anthocyanin biosynthesis in C. alismatifolia colourful bracts.

CONCLUSION

To summarise, we applied the RNA-Seq technology to perform transcript analysis of the three flowering stages of ‘Dutch Red’ (SF, HF and BF), and the yellow ‘Chocolate’, YF. Our results revealed that the deepening of C. alismatifolia colour was associated with the up-regulation of anthocyanin biosynthesis at the initial flowering stage. In addition, we examined the down-regulation of the gene F3′5′H, an important enzyme for the synthesis of different basic anthocyanins, in the yellow ‘Chocolate’. This gives clues to how the yellow-green appearance is formed; however, more research is still needed to verify this. In this paper, we also exhibited the differently expressed patterns of MYB TFs associated with anthocyanin-related gene expressions. Above all, our transcriptome analysis provided valuable molecular information for the colour formation of C. alismatifolia.

eISSN:: 2083-5965
Sprache:: Englisch

Zeitrahmen der Veröffentlichung:: 2 Hefte pro Jahr
Fachgebiete der Zeitschrift:: Biologie, Botanik, Zoologie, Ökologie, andere

Zeitschrift RSS Feed

Comparative RNA-Seq analysis to understand anthocyanin biosynthesis and regulations in Curcuma alismatifolia

Article Category: Original Article

Online veröffentlicht: 08. Mai 2022

Seitenbereich: 65 - 83

Eingereicht: 09. Nov. 2021

Akzeptiert: 13. März 2022

DOI: https://doi.org/10.2478/fhort-2022-0007

Schlüsselwörteranthocyanin biosynthesis, transcriptome analysis

© 2022 Yuan-Yuan Li et al, published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5

Figure 6

Schlüsselwörter
anthocyanin biosynthesis, transcriptome analysis