This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Rice is the second most important food crop in the world after corn, based on total production. In 2016, rice was cultivated on 161.1 million ha, and global production was 482 million metric tons (World Rice Statistics, International Rice Research Institute, Manila, Philippines, http://ricestat.irri.org:8080/wrsv3/entrypoint.htm). The rice root-knot nematode, Meloidogyne graminicola, has emerged as a devastating pest of rice in South-East Asia (Dutta et al., 2012; Mantelin et al., 2016), where it is highly damaging under upland, rainfed lowland (Prot et al., 1994) and irrigated (Netscher and Erlan, 1993) cultivation conditions. Severe M. graminicola infection is known to cause 100% damage in rice nurseries. Here, we report the sequencing and assembly of the genome of the M. graminicola IARI strain. This resource should help researchers investigate and understand the unique biology of this nematode and discover new strategies for its management.
Given the ~30 Mb genome size of M. graminicola predicted by Feulgen densitometry (Lapp and Triantaphyllou, 1972), we planned to generate two paired-end libraries of different insert lengths, each sequenced to ~150× depth (~4.5 Gb per library), to achieve a comprehensive assembly. The M. graminicola population was collected from infected rice fields at the Indian Agricultural Research Institute farm, New Delhi, and multiplied from a single egg mass in pots under greenhouse conditions. Freshly hatched second-stage juveniles were used for genomic DNA extraction with the Gentra Puregene Tissue Kit (Cat No.: 158667 Qiagen, Valencia, CA, USA). Short (150–200 bp) and long (300–500 bp) DNA fragments were obtained by diluting 1 µg of genomic DNA in 100 µl nuclease-free water (Ambion, Waltham, MA, USA) and sonicating it in a Bioruptor (Diagenode, Seraing (Ougrée), Belgium) for 20 and 13 pulses, respectively, at 30 sec ON and 30 sec OFF. The fragmented DNA was cleaned using QIAquick columns (Qiagen, Valencia, CA, USA). The size distribution was checked by running an aliquot of the fragmented DNA on an Agilent high sensitivity bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). Subsequently, the libraries for whole genome sequencing were constructed as per the Illumina TruSeq DNA sample preparation guide (Illumina, San Diego, CA, USA). Sequencing was performed on the Illumina GAIIx platform at Genotypic Technology Pvt. Ltd., Bengaluru, India.
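The data target above follows from simple arithmetic: bases required = genome size × depth. The following is a minimal Python sketch of that calculation, not part of the original workflow. The function and constant names are illustrative, and the read count assumes the 100 bp read length reported for this sequencing run.

```python
import math

def required_bases(genome_size_bp: int, depth: int) -> int:
    """Total bases needed to cover a genome at the given average depth."""
    return genome_size_bp * depth

def required_reads(total_bases: int, read_length_bp: int) -> int:
    """Number of reads of a fixed length needed to supply total_bases."""
    return math.ceil(total_bases / read_length_bp)

# Values taken from the text; names are illustrative only.
GENOME_SIZE_BP = 30_000_000   # ~30 Mb, Feulgen densitometry estimate
DEPTH = 150                   # ~150x per library
N_LIBRARIES = 2               # short- and long-insert libraries
READ_LENGTH_BP = 100          # 100 bp reads (paired-end run)

per_library_bases = required_bases(GENOME_SIZE_BP, DEPTH)
total_bases = per_library_bases * N_LIBRARIES
total_reads = required_reads(total_bases, READ_LENGTH_BP)

print(f"Per library: {per_library_bases / 1e9:.1f} Gb")
print(f"Both libraries: {total_bases / 1e9:.1f} Gb")
print(f"Reads needed at {READ_LENGTH_BP} bp: {total_reads:,}")
```

With these inputs the sketch gives 4.5 Gb per library and 9 Gb in total, which is the planned amount of data.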
A total of ~130 million raw reads, comprising 13 Gb of sequence data, were generated using 100 bp paired-end sequencing. Approximately 120 million High Quality (HQ) reads were obtained from the raw data using NGS QC Tool Kit v.2.3.3 (Patel and Jain, 2012). These 120 million HQ reads (~12 Gb) exceeded the 9 Gb expected under our planned strategy. The HQ reads from both the short and long insert libraries were used to generate a primary assembly with the Platanus assembler v.1.2.4 (Kajitani et al., 2014), and the resulting contigs were scaffolded with the Platanus scaffolding module to generate a secondary assembly. The secondary assembly was refined with the Redundans pipeline (Pryszcz and Gabaldón, 2016) to generate the final genome assembly with a minimum sequence length of 500 bp. Contaminating mitochondrial and bacterial sequences were identified by NCBI servers and removed from the draft genome assembly prior to submission to NCBI GenBank. The mitochondrial genome was assembled separately from the complete set of HQ reads using the SPAdes assembler (Bankevich et al., 2012) with a coverage cutoff of 500; the available M. graminicola mitochondrial genome sequences (accession nos. HG529223, KJ139963) obtained from GenBank were provided to SPAdes as trusted contigs. This yielded only four scaffolds. The resulting scaffolds were merged using the EMBOSS merger tool (Rice et al., 2000) to construct the full-length mitochondrial genome. The assembled mitochondrial genome was annotated using the MITOS (Bernt et al., 2013) and ARWEN (Laslett and Canbäck, 2008) servers.
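The minimum-length cutoff applied to the final assembly can be illustrated with a short, dependency-free Python sketch. This is not how Redundans implements the step; the function names are hypothetical, the toy sequences are invented for demonstration, and treating the 500 bp cutoff as inclusive is an assumption.

```python
from typing import Iterable, Iterator, Tuple

def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def filter_min_length(
    records: Iterable[Tuple[str, str]], min_len: int = 500
) -> Iterator[Tuple[str, str]]:
    """Keep records whose sequence is at least min_len bases long.

    Assumption: the cutoff is inclusive (>=); the text does not say.
    """
    for name, seq in records:
        if len(seq) >= min_len:
            yield name, seq

# Toy input (invented): one 600 bp scaffold and one 200 bp scaffold.
toy_fasta = [
    ">scaffold_1", "A" * 300, "T" * 300,
    ">scaffold_2", "ACGT" * 50,
]
kept = list(filter_min_length(parse_fasta(toy_fasta), min_len=500))
for name, seq in kept:
    print(name, len(seq))
```

For a real assembly, the same functions could be applied to an open file handle in place of the toy list.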
The final M. graminicola genome assembly was 38.18 Mb in size and comprised 4,304 scaffolds with an average scaffold length of 8.87 kb. The minimum and maximum scaffold lengths were 501 bp and 145 kb, respectively. The N50 and N90 lengths for the final assembly were 20.4 kb and 4.2 kb, respectively. The GC content of the assembled genome was 23.05%, and N's made up 1.88% of the assembly. The Core Eukaryotic Genes Mapping Approach (CEGMA) (Parra et al., 2007) was used to assess the completeness of the M. graminicola genome assembly; of 248 core eukaryotic genes (CEGs), 209 (84.27%) were present as complete genes and 225 (90.73%) as partial genes. Protein-coding genes were identified using the GeneMark-ES tool (Borodovsky and McIninch, 1993), which predicted 10,196 protein-coding genes, as compared with 6,712 to 20,317 in other plant-parasitic nematode genomes (summarized in Kikuchi et al., 2017). Functional annotation of the predicted M. graminicola protein-coding genes using OrthoMCL (Li et al., 2003) identified 5,427 proteins that shared high homology with other Meloidogyne spp. In addition, 245 tRNA genes were predicted. The mitochondrial genome sequence of the M. graminicola IARI strain was 19,019 bp long and contained 12 protein-coding genes, 22 tRNA genes and two ribosomal RNA genes. Based on the mitochondrial genome sequence, the M. graminicola IARI strain appears phylogenetically closer to the M. graminicola strain from the Philippines (HG529223, 20,030 bp, Besnard et al., 2014) than to the Chinese strain (KJ139963, 19,589 bp, Sun et al., 2014).
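For readers unfamiliar with the summary statistics above, the following Python sketch shows one common way to compute N50/N90 and GC content, and re-derives the reported CEGMA percentages and average scaffold length from the counts given in the text. The function names and toy scaffold lengths are illustrative, and excluding N's from the GC denominator is an assumption; the text does not state how GC content was calculated.

```python
from typing import Sequence

def nx(lengths: Sequence[int], x: float) -> int:
    """Length L such that sequences of length >= L hold at least x% of all bases."""
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * x / 100
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    raise ValueError("lengths must not be empty")

def gc_percent(seq: str) -> float:
    """GC percentage over unambiguous bases (assumption: N's are excluded)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100 * (seq.count("G") + seq.count("C")) / acgt

# Toy scaffold lengths (invented), total 30 bases.
toy_lengths = [10, 8, 6, 4, 2]
toy_n50 = nx(toy_lengths, 50)   # 10 + 8 = 18 >= 15, so N50 = 8
toy_n90 = nx(toy_lengths, 90)   # 10 + 8 + 6 + 4 = 28 >= 27, so N90 = 4

# Percentages re-derived from counts reported in the text.
cegma_complete_pct = round(100 * 209 / 248, 2)
cegma_partial_pct = round(100 * 225 / 248, 2)
mean_scaffold_kb = round(38.18e6 / 4304 / 1000, 2)

print(toy_n50, toy_n90, gc_percent("ATGCNN"))
print(cegma_complete_pct, cegma_partial_pct, mean_scaffold_kb)
```

Reproducing the reported N50 (20.4 kb) and N90 (4.2 kb) would require the actual scaffold lengths from the deposited assembly, which are not listed here.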
The present assembly size (38.18 Mb) deviates from the ~30 Mb predicted by Feulgen densitometry (Lapp and Triantaphyllou, 1972). Using sequencing technologies that produce longer reads, such as PacBio, or mate-pair sequencing to obtain better genome assemblies, and inbreeding the nematode strain used for sequencing to reduce possible heterozygosity, might help resolve the mismatch between the predicted and assembled genome sizes. Nevertheless, the N50 value, the complete and partial CEG scores and other genome statistics for our M. graminicola assembly are comparable to those of closely related, published plant-parasitic nematode genomes sequenced using similar platforms (Supplementary Table A1).
Supplementary Table A1. A comparison of Meloidogyne graminicola genome information with the published plant-parasitic nematode genomes.

| Sl. no. | Nematode | Sequencing platform | Assembly approach/assembler/assembly-pipeline | Assembly size (Mb) | No. of scaffolds | N50 value (kbp) | CEGMA score (complete/partial %) or complete % | Reference | Year |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Meloidogyne incognita | Sanger, ABI3730xl DNA analyzer | De novo, Arachne | 86.1 | 2,995 | 62.5 | 77/80.6 | Abad et al. (2008) | 2008 |
| 2 | Meloidogyne hapla | ABI3730, MegaBACE sequence analyzer | De novo, Arachne v2.0.1 | 53.0 | 3,452 | 37.6 | 94.8/96.8 | Opperman et al. (2008) | 2008 |
| 3 | Bursaphelenchus xylophilus | 454 FLX, Illumina GAI | De novo, Newbler v2.3, Velvet v1.0.12, AbacasII, IMAGE, iCORN | 74.6 | 5,527 | 949.8 | 97.6/98.4 | Kikuchi et al. (2011) | 2011 |
| 4 | Meloidogyne floridensis | Illumina HiSeq2000 | De novo, Velvet v1.1.04 | 96.7 | 58,696 | 3.7 | 58.1/77.4 | Lunt et al. (2014) | 2014 |
| 5 | Globodera pallida | ABI 3730 Capillary DNA Analyser, 454 GS-20 and GS-FLX sequencer, Illumina GAIIx | | | | | | | |
This draft genome sequence should be useful to researchers working on comparative genomics of Meloidogyne and other tylenchid nematodes, and should enable functional genomics in M. graminicola. We recognize that the present M. graminicola draft genome is incomplete, and expect to improve it in the near future. The present assembly will serve as a base for further improvement of the M. graminicola genome sequence.
GenBank accession numbers: The Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession NXFT00000000. The raw DNA sequence data were deposited in GenBank under BioSample no. SAMN04041660, BioProject no. PRJNA411966, and SRA experiment accessions SRX1224028 (long insert library) and SRX1223928 (short insert library). The mitochondrial genome was submitted to GenBank under accession no. MG763561.