Open access

Comparative RNA-Seq analysis to understand anthocyanin biosynthesis and regulations in Curcuma alismatifolia



Figure 1

RNA-Seq data expression profiles in C. alismatifolia colour bracts. (A) Phenotypes of ‘Dutch Red’ at three typical development stages and ‘Chocolate’ at the blossomed stage. (B) Numbers of transcripts in the four bract samples. (C) Principal component analysis of the RNA-Seq data. (D) Venn diagram of the RNA-Seq data from the four bract samples. BF, blossomed flowering; HF, half-flowering; FPKM, fragments per kilobase of transcript per million mapped reads.
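FPKM, used throughout the figures and tables, normalizes a transcript's fragment count by both transcript length and sequencing depth. A minimal Python sketch of the textbook formula follows. The fragment count is a hypothetical value, and the quantification tool that produced the reported FPKM values is not stated here.

```python
def fpkm(fragments: int, length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments.

    FPKM = fragments * 10^9 / (transcript length in bp * total mapped fragments)
    """
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


# Hypothetical example: 500 fragments on a 1,236-bp transcript (the average
# transcript length in the assembly summary) in a library with 16,378,830
# mapped fragments (the SF value for mapped reads).
print(round(fpkm(500, 1236, 16_378_830), 2))
```

Dividing by length (in kilobases) makes long and short transcripts comparable; dividing by library size (in millions) makes libraries of different depth comparable.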

Figure 2

(A) Distributions and (B) KEGG enrichment analysis of DEGs identified from pairwise comparisons between developmental stages and between varieties. The FDR is the p-value corrected for multiple hypothesis testing; it ranges from 0 to 1, and the closer it is to 0, the more significant the enrichment. The rich factor is the ratio of the number of DEGs annotated to a given pathway to the total number of annotated genes assigned to that pathway; the greater the rich factor, the greater the degree of enrichment. BF, blossomed flowering; DEGs, differentially expressed genes; FDR, false-discovery rate; HF, half-flowering; KEGG, Kyoto Encyclopedia of Genes and Genomes.
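The two enrichment statistics in Figure 2B can be sketched in a few lines of Python. The caption does not name the multiple-testing correction; the sketch assumes the Benjamini-Hochberg procedure, a common choice for enrichment FDRs, and all inputs are hypothetical.

```python
from typing import List, Sequence


def rich_factor(degs_in_pathway: int, annotated_genes_in_pathway: int) -> float:
    """DEGs annotated to a pathway divided by all annotated genes in that pathway."""
    return degs_in_pathway / annotated_genes_in_pathway


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical inputs for illustration only.
print(rich_factor(12, 80))  # 12 of 80 annotated pathway genes are DEGs
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```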

Figure 3

Comparisons of DEGs. (A) Heatmap comparison of DEGs in the most enriched KEGG pathways during the three development stages. (B) Heatmap comparison of DEGs in the most enriched KEGG pathways between ‘Chocolate’ and ‘Dutch Red’. DEGs, differentially expressed genes; KEGG, Kyoto Encyclopedia of Genes and Genomes.

Figure 4

The anthocyanin biosynthesis pathway with its core metabolites and enzymes, and the expression levels of the core enzyme genes. Enzyme names and expression patterns are shown beside each step. Colour boxes, from left to right, represent unigenes with lower or higher expression in the colour bracts at SF, HF, BF and YF, respectively. ANS, anthocyanidin synthase; BF, blossomed flowering; CHI, chalcone isomerase; CHS, chalcone synthase; C4H, cinnamate-4-hydroxylase; DFR, dihydroflavonol 4-reductase; F3H, flavanone-3-hydroxylase; F3′5′H, flavonoid 3′,5′-hydroxylase; HF, half-flowering; PAL, phenylalanine ammonia lyase.

Figure 5

Distribution and selection of key differentially expressed TFs associated with anthocyanin biosynthesis. (A) Distribution of unigenes among TF families. (B) Clustered heatmaps of significantly enriched, differentially expressed TFs, including MYB, bHLH and WD, across the three developmental stages and between the varieties ‘Chocolate’ and ‘Dutch Red’. TF, transcription factor.

Figure 6

qRT-PCR validation of 14 putative genes involved in anthocyanin biosynthesis and its regulation. Histograms show expression determined by qRT-PCR (left y-axis), and lines show expression determined by RNA-Seq as FPKM values (right y-axis). The x-axis of each chart shows SF, HF, BF and YF, respectively. For both qRT-PCR and RNA-Seq, each value is the mean of three biological replicates. BF, blossomed flowering; FPKM, fragments per kilobase of transcript per million mapped reads; HF, half-flowering.
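The caption does not state how relative qRT-PCR expression was calculated or which reference gene was used. The sketch below assumes the widely used 2^-ΔΔCt (Livak) method with a hypothetical reference gene, hypothetical Ct values and a hypothetical SF calibrator, and averages three biological replicates as the caption describes.

```python
from statistics import mean


def relative_expression(ct_target: float, ct_reference: float,
                        ct_target_calibrator: float,
                        ct_reference_calibrator: float) -> float:
    """Relative expression by the 2^-ΔΔCt method.

    ΔCt = Ct(target) - Ct(reference gene); ΔΔCt = ΔCt(sample) - ΔCt(calibrator).
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_calibrator - ct_reference_calibrator
    return 2.0 ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values for three biological replicates of one stage,
# each as (target Ct, reference-gene Ct); the calibrator is a hypothetical SF sample.
replicates = [(24.0, 18.0), (24.3, 18.1), (23.8, 17.9)]
calibrator = (26.0, 18.0)
values = [relative_expression(t, r, *calibrator) for t, r in replicates]
print(round(mean(values), 2))
```

The method assumes near-100% amplification efficiency for both target and reference primers; if efficiencies differ, an efficiency-corrected model is the usual alternative.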

Summary of assembly results for ‘Dutch Red’ at three developmental stages and ‘Chocolate’ at the blossomed stage.

Features SF HF BF YF
Total raw reads (million) 45.24 48.46 52.98 45.47
Total clean reads (million) 44.47 47.55 52.10 44.63
Total clean bases (Gb) 6.59 7.05 7.75 6.61
Clean reads Q20 (%) 98.27 98.23 98.34 98.40
Clean reads Q30 (%) 94.63 94.56 94.83 95.01
Clean read pairs (million) 22.24 23.78 26.05 22.31
Mapped reads 16,378,830 17,511,187 19,200,458 16,316,551
Mapped ratio (%) 73.66 73.66 73.69 73.13
Total number of transcripts 159,687
Total number of unigenes 69,453
Total sequence bases 197,307,177
Average length of transcripts 1,236
N50 value of transcripts 1,830
E90N50 value of transcripts 1,921
GC (%) 42.49
TransRate score 0.30
BUSCO score 60.1% (3.5%)
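N50 summarizes assembly contiguity: it is the length L such that transcripts of length L or longer together contain at least half of all assembled bases. A minimal Python sketch follows, using hypothetical lengths.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    sorted_lengths = sorted(lengths, reverse=True)
    if not sorted_lengths:
        raise ValueError("no sequences")
    half = sum(sorted_lengths) / 2
    running = 0
    for length in sorted_lengths:
        running += length
        if running >= half:
            return length
    # Not reached: the cumulative sum always reaches half of the total.
    return sorted_lengths[-1]


# Hypothetical transcript lengths (bp).
print(n50([100, 200, 300, 400, 500]))
```

E90N50, as reported by Trinity, applies the same calculation only to the most highly expressed transcripts that together account for 90% of total expression, which makes it less sensitive to lowly expressed fragments.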

Differentially expressed genes in the MYB and WD40 transcription factor families.

Unigene Annotation FPKM-SF FPKM-HF FPKM-BF FPKM-YF Regulation
DN13707_c0_g2 WD-repeat protein 306.8 283.4 244.7 226.5 Whole flowering periods
DN14303_c2_g4 WD-repeat protein 90.3 97.0 97.2 101.5 Whole flowering periods
DN17014_c0_g2 MYB_superfamily 613.5 1,695.1 531.6 718.1 Whole flowering periods
DN20674_c0_g1 WD-repeat protein 121.4 131.1 136.2 67.3 Whole flowering periods
DN15014_c1_g1 MYB_superfamily 66.3 27.2 13.4 11.8 Early flowering periods
DN11279_c3_g3 MYB_superfamily 105.1 64.9 0.8 38.9 Early flowering periods
DN19744_c1_g3 MYB_superfamily 160.6 132.0 45.9 53.1 Early flowering periods
DN12166_c1_g8 MYB_superfamily 16.3 10.9 41.8 6.4 Blossom periods in ‘Dutch Red’
DN12860_c2_g5 MYB_superfamily 4.5 5.2 22.7 2.8 Blossom periods in ‘Dutch Red’
DN15887_c1_g3 MYB_superfamily 14.1 10.7 36.5 2.8 Blossom periods in ‘Dutch Red’
DN22635_c0_g1 MYB_superfamily 7.1 11.2 26.8 3.3 Blossom periods in ‘Dutch Red’
DN12661_c0_g1 WD-repeat protein 1.9 5.5 36.5 36.2 Blossom periods
DN21970_c4_g5 MYB_superfamily 15.3 121.8 220.9 298.7 Blossom periods

Numbers of transcripts in each FPKM range in the 12 sequenced libraries.

Sample FPKM (<0.5) FPKM (0.5–5) FPKM (5–100) FPKM (>100) SUM (≥0.5)
SF_1 65,107 61,397 31,536 1,647 94,580
SF_2 62,631 63,656 31,774 1,626 97,056
SF_3 64,446 62,140 31,444 1,657 95,241
HF_1 80,443 49,508 27,811 1,925 79,244
HF_2 67,574 59,459 30,864 1,790 92,113
HF_3 65,607 60,672 31,696 1,712 94,080
BF_1 67,310 59,403 31,170 1,804 92,377
BF_2 66,614 60,394 30,896 1,783 93,073
BF_3 76,848 51,186 29,570 2,083 82,839
YF_1 76,226 51,191 30,451 1,819 83,461
YF_2 84,464 45,418 27,806 1,999 75,223
YF_3 82,758 46,325 28,622 1,982 76,929
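Each row of the table bins transcripts by FPKM, and the SUM column counts transcripts with FPKM ≥ 0.5 (the three upper bins). A minimal Python sketch of the binning follows; how values of exactly 5 or 100 were assigned is not stated, so the half-open boundaries used here are an assumption.

```python
from typing import Dict, Iterable


def fpkm_bins(values: Iterable[float]) -> Dict[str, int]:
    """Count transcripts per FPKM range, as in the per-library table."""
    counts = {"<0.5": 0, "0.5-5": 0, "5-100": 0, ">100": 0}
    for v in values:
        if v < 0.5:
            counts["<0.5"] += 1
        elif v < 5:          # assumption: 5.0 falls in the 5-100 bin
            counts["0.5-5"] += 1
        elif v <= 100:       # assumption: 100.0 falls in the 5-100 bin
            counts["5-100"] += 1
        else:
            counts[">100"] += 1
    # SUM column: transcripts detected at FPKM >= 0.5.
    counts["SUM(>=0.5)"] = counts["0.5-5"] + counts["5-100"] + counts[">100"]
    return counts


# Consistency check on the SF_1 row: the three upper bins add up to SUM,
# and SUM plus the <0.5 bin gives the 159,687 assembled transcripts.
sf1 = {"<0.5": 65_107, "0.5-5": 61_397, "5-100": 31_536, ">100": 1_647}
total_detected = sf1["0.5-5"] + sf1["5-100"] + sf1[">100"]
print(total_detected, sf1["<0.5"] + total_detected)
```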

Primers used for qRT-PCR analysis.

Gene Primer sequences (5′-3′) Product size (bp)
ANR 12990 CAGCTGAAGCAGATGCAGAGCTCCGGATTCTCCGACATTA 147
ANR 13257 TCACATCATCCTCACCGTGTAAGGGGCTGGAGGAGATTTA 103
CHI 10857 ACGGAGTTCTTCCAGAGCAATTTCTGCATCATCTCCACCA 165
CHI 15898 GCTCGAACACCTCCTTGAACCGACAAGTTCACGAGGATCA 155
CHS 16695 TGAGGGAGAACCCGAATATGGCAGAAGACGAGGTGTGTGA 161
CHS 17923 ATGGCCTGAAGATGGATCAGGCGTCCTCTTCATTCTCGAC 186
DFR 16691 TGCCCGTTTAACTCATTTCCACTGTGGAGAAGGAGCAGGA 101
DFR 14193 GTGGTGTTCACCTCCTCGATCGCTAGCTCTACCCCCTTCT 189
F3′5′H 8362 CTCCTCCTCCGCTACCTTCTGGTACATGATGGGGCCATAG 160
F3′5′H 16064 ATGTGGAGGTTGCAGAGCTTGCGTACGAAGGACAGGACAT 80
F3H 14172 ACGTTGATTGGACCGAACATCTGCGAGATCTACACGGACA 186
F3H 22055 GCCTCTTGCATAGCTTCACCGGTGATCCTGCCAACTCATT 83
F3H 22236 AAGGAGAAGTACGCGTCCAATGTATTCCTCGTTCGCCTTC 184
MYB 11279 TTTCTCCCATGCTGCTTTCTTCACAATGCCACGGTTAAGA 160
MYB 12860 ACCCTATCGTCGAACCACTGTCGACCAAGAGAGCCAACTT 174
MYB 15887 CATCTTCCTCCTTGGAGCTGTCTGGATCTGCGAGGAAAGT 138
MYB 18449 TTCCACTTGGACTTGCACTGAAGAGCTTCACGGAGACGAA 110
MYB 12166 GCCGTCCAAACACATCTTCTGATTTGGTCGAGCCAGAGAG 85
MYB 15014 ATTTCTGCAGATGGCTTGCTATCAATTGGGCATCGAGAAG 99
MYB 17014 CTACGGAGGAGGATGGATGAAAGTTGCCATACCCACAAGC 93
MYB 19744 CGAAGAACACTGCACTGGAATATCCACTGCCTCCTCCATC 190
MYB 21970 GCACCGGTAAGTGCGTTAATTTCGTTTCCATCCATCCATT 159
MYB 22635 GTCGCAATAACCGACCATCTATCATCTGCAGCCTCTTCGT 125
UFGT 12119 TCGGCGATCAAGGTAGATTCGGATGAAGCTGAGCAACTC 137
UFGT 13065 CGGAGGTAGCCATCAACAATATCCCACATGGTGGTCTTGT 143
UFGT 13592 CTGGAGCTAAGGAGCAAGGATACCAGGGGTAATGGGATGA 118
WD 12661 CAGTAGCCATGGCAGCTACAATGGCGGATGAGTACCAAAG 109
WD 13707 CCAAAGAAGTCCAGCTCCTGAATTTTTGGCAATGGCAGAG 91
WD 14303 CAGGCGATGTGAAGAATTGAGCAACTGATGGGGGTAAAGA
WD 20674 TTCGTCCCTTTGAGTTTGCTATGGGCTGTAACACGTAGCC
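Each entry appears to be a forward primer followed directly by its reverse primer; most entries are 40 nt, which suggests two 20-nt primers, but UFGT 12119 is 39 nt, so the split point is an assumption. The Python sketch below splits an entry under that assumption and reports GC content and a rough Wallace-rule melting temperature, which is strictly intended for short oligonucleotides and only approximates Tm for 20-mers.

```python
from typing import Tuple


def split_primer_pair(concatenated: str, forward_length: int = 20) -> Tuple[str, str]:
    """Split a concatenated entry into (forward, reverse), assuming a fixed forward length."""
    return concatenated[:forward_length], concatenated[forward_length:]


def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100 * (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature (°C): 2 x (A + T) + 4 x (G + C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


fwd, rev = split_primer_pair("CAGCTGAAGCAGATGCAGAGCTCCGGATTCTCCGACATTA")  # ANR 12990
for name, primer in (("forward", fwd), ("reverse", rev)):
    print(name, primer, f"GC={gc_percent(primer):.1f}%", f"Tm~{wallace_tm(primer)} °C")
```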

RNA sequencing data and corresponding quality control.

Sample Raw reads Raw bases Clean reads Clean bases Error rate (%) Q20 (%) Q30 (%) GC content (%) Accession number
BF_1 54,170,030 8,179,674,530 53,271,940 7,920,660,194 0.0243 98.33 94.81 48.94 SRR17783056
BF_2 55,251,576 8,342,987,976 54,405,312 8,096,016,364 0.0241 98.41 94.99 48.87 SRR17783055
BF_3 49,533,034 7,479,488,134 48,632,006 7,241,216,477 0.0244 98.29 94.7 49.33 SRR17783054
HF_1 47,378,590 7,154,167,090 46,432,354 6,880,585,840 0.0248 98.14 94.35 50.69 SRR17783059
HF_2 47,325,446 7,146,142,346 46,485,250 6,893,639,691 0.0244 98.27 94.67 49.13 SRR17783058
HF_3 50,665,694 7,650,519,794 49,743,012 7,381,231,384 0.0244 98.28 94.67 48.55 SRR17783057
SF_1 43,599,698 6,583,554,398 42,859,044 6,355,814,159 0.0247 98.16 94.36 48.96 SRR17783064
SF_2 44,423,394 6,707,932,494 43,751,414 6,485,935,396 0.024 98.48 95.15 48.33 SRR17783063
SF_3 47,696,422 7,202,159,722 46,810,850 6,927,255,232 0.0247 98.16 94.38 48.97 SRR17783060
YF_1 45,005,912 6,795,892,712 44,227,366 6,543,730,785 0.0241 98.4 95 49.34 SRR17783053
YF_2 47,249,416 7,134,661,816 46,353,918 6,861,606,053 0.0241 98.41 95.05 50.28 SRR17783062
YF_3 44,148,530 6,666,428,030 43,297,006 6,412,165,282 0.0241 98.39 94.99 49.7 SRR17783061
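Q20 and Q30 are the percentages of bases whose Phred quality score is at least 20 or 30, that is, with an estimated error probability of at most 1% or 0.1% (P = 10^(-Q/10)). The Python sketch below computes both, plus a mean per-base error rate, from a FASTQ quality string. It assumes Phred+33 encoding, and averaging per-base error probabilities is an assumed definition, since the error-rate calculation behind the table is not stated.

```python
from typing import List


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string (Phred+33 by default) into Phred scores."""
    return [ord(ch) - offset for ch in quality]


def percent_at_least(quality: str, threshold: int, offset: int = 33) -> float:
    """Percentage of bases with Phred score >= threshold (20 for Q20, 30 for Q30)."""
    scores = phred_scores(quality, offset)
    return 100 * sum(q >= threshold for q in scores) / len(scores)


def mean_error_rate_percent(quality: str, offset: int = 33) -> float:
    """Mean per-base error probability, P = 10^(-Q/10), expressed in percent."""
    scores = phred_scores(quality, offset)
    return 100 * sum(10 ** (-q / 10) for q in scores) / len(scores)


qual = "IIII5"  # hypothetical read: four bases at Q40 ('I') and one at Q20 ('5')
print(percent_at_least(qual, 20), percent_at_least(qual, 30), mean_error_rate_percent(qual))
```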

Putative structural genes in anthocyanin biosynthesis identified from DEGs.

Unigene Annotation FPKM-SF FPKM-HF FPKM-BF Log2 Ratio (BF/SF) FPKM-YF Log2 Ratio (BF/YF)
DN17923_c1_g3 CHS-like protein 2,593.0 1,906.1 185.4 −3.80 2,667.3 −3.8
DN16695_c0_g5 CHS 80.4 760.5 298.4 1.9 2,187.2 −2.9
DN10857_c0_g4 Probable chalcone–flavanone isomerase 3 isoform X1 (CHI) 501.8 344.7 38.0 −3.7 561.0 −3.9
DN15898_c0_g1 Chalcone–flavanone isomerase (CHI) 152.8 139.5 20.0 −2.9 237.0 −3.6
DN14172_c2_g1 F3H 976.4 886.8 232.7 −2.1 1,440.4 −2.6
DN22055_c1_g3 Flavonol synthase/F3H 5.1 7.3 11.9 1.2 1.4 3.1
DN22236_c1_g1 Flavonol synthase/F3H 296.2 125.1 3.0 −6.6 257.7 −6.4
DN16064_c1_g1 Flavonoid 3′,5′-hydroxylase 1-like (F3′5′H) 1,422.9 1,389.6 1,138.9 −0.3 64.0 4.2
DN8362_c0_g1 Flavonoid 3′,5′-hydroxylase 1-like (F3′5′H) 6.6 14.1 2.3 −1.5 20.0 −3.1
DN14193_c0_g5 DFR 7.5 8.8 2.4 −1.6 18.5 −2.9
DN16691_c4_g1 DFR 67.8 259.7 117.8 −0.8 570.8 −2.3
DN12990_c1_g6 Anthocyanidin reductase (ANR) 64.7 331.3 90.8 0.5 323.2 −1.8
DN13257_c0_g1 Anthocyanidin reductase (ANR) 27.8 249.5 85.3 1.6 277.8 −1.7
DN12119_c3_g2 Anthocyanidin 3-O-glucosyltransferase 7-like flavonoid 3-O-glucosyltransferase (UFGT) 191.51 222.9 58.1 −1.7 250.3 0.38
DN13065_c1_g1 Anthocyanin 3′-O-beta-glucosyltransferase-like (UFGT) 3.74 2.6 0.24 −4.0 11.2 −5.54
DN13592_c1_g3 Anthocyanidin 3-O-glucosyltransferase-like (UFGT) 155.8 42.5 3.1 −5.67 196.8 −6.0
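The Log2 Ratio columns compare mean FPKM values between two samples. The Python sketch below recomputes such a ratio; fold changes from mean FPKM are a simplification and may differ from those estimated by a count-based differential-expression tool, which is not named here. The optional pseudocount is a hypothetical addition to guard against zero values.

```python
import math


def log2_ratio(numerator_fpkm: float, denominator_fpkm: float,
               pseudocount: float = 0.0) -> float:
    """log2 fold change between two mean FPKM values.

    The pseudocount (hypothetical; none is stated in the table) avoids
    division by zero when a gene is not detected in one sample.
    """
    return math.log2((numerator_fpkm + pseudocount) / (denominator_fpkm + pseudocount))


# DN16695_c0_g5 (CHS) mean FPKM values from the table above.
fpkm_sf, fpkm_bf, fpkm_yf = 80.4, 298.4, 2187.2
print(round(log2_ratio(fpkm_bf, fpkm_sf), 1), round(log2_ratio(fpkm_bf, fpkm_yf), 1))
```

Positive values indicate higher expression in the numerator sample (BF), negative values higher expression in the denominator sample (SF or YF).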

Detailed annotation of differentially expressed MYBs and WD proteins.

Unigene Hit_name Description Identity (%) Similarity (%) KO Paths Swiss-Prot
DN13707_c0_g2 XP_009415403.1 WD-repeat protein 88 95.3 gi|74698589|sp|Q9Y7K5.2|YGI3_SCHPO
DN14303_c2_g4 ONM38806.1 WD-repeat protein 76.8 85.4 K10752 gi|22096353|sp|O22607.3|MSI4_ARATH
DN17014_c0_g2 XP_009404081.1 MYB_superfamily 88 91.6 gi|115502385|sp|O80622.2|EXP15_ARATH
DN20674_c0_g1 XP_010931565.1 WD-repeat protein 79.7 91.5 gi|75318693|sp|O80775.2|WDR55_ARATH
DN15014_c1_g1 XP_009385043.1 MYB_superfamily 54.9 63.7 K14491 map04075 gi|75323583|sp|Q6H805.1|ORR24_ORYSJ
DN11279_c3_g3 XP_017696697.1 MYB_superfamily 95.6 97.8 K09422 gi|75317981|sp|O22264.1|MYB12_ARATH
DN19744_c1_g3 XP_009418439.1 MYB_superfamily 76.3 82.4 gi|75330977|sp|Q8S9H7.1|DIV_ANTMA
DN12166_c1_g8 XP_016205675.1 MYB_superfamily 86.2 92.6 K09422 gi|56749347|sp|Q8LPH6.1|MYB86_ARATH
DN12860_c2_g5 XP_023531843.1 MYB_superfamily 49.8 59.8 K09422 gi|75333993|sp|Q9FKL2.1|MYB36_ARATH
DN15887_c1_g3 XP_009408601.1 MYB_superfamily 44.7 55 gi|75338846|sp|Q9ZQ85.2|EFM_ARATH
DN22635_c0_g1 PSS31516.1 MYB_superfamily 66.9 74.4 K09422 gi|75335856|sp|Q9M2Y9.1|RAX3_ARATH
DN12661_c0_g1 XP_009389455.1 WD-repeat protein 72.9 78.9 K14963 gi|82232080|sp|Q5M786.1|WDR5_XENTR
DN21970_c4_g5 XP_009405310.1 MYB_superfamily 59.5 66.4 gi|75330977|sp|Q8S9H7.1|DIV_ANTMA

Functional annotation of transcripts and unigenes in databases.

Database Transcript number (%) Unigene number (%)
NR 104,303 (65.32) 34,189 (49.23)
Swiss-Prot 82,193 (51.47) 26,589 (38.28)
Pfam 73,262 (45.88) 24,101 (34.70)
COG 25,088 (15.71) 7,537 (10.85)
GO 71,758 (44.94) 23,229 (33.45)
KEGG 47,453 (29.72) 14,838 (21.36)
Total annotated 105,993 (66.38) 35,183 (50.66)
Total 159,687 (100) 69,453 (100)

eISSN:
2083-5965
Language:
English
Publication frequency:
2 issues per year
Journal subjects:
Biology, Botany, Zoology, Ecology, other