Complete Mitogenome of Cruznema Tripartitum Confirms Highly Conserved Gene Arrangement within Family Rhabditidae

Abstract
Mitochondrial genomes have been widely used as molecular markers for understanding the patterns and processes of nematode evolution. Species of the genus Cruznema are free-living bacterivores as well as parasites of crickets and mollusks. The complete mitochondrial genome of C. tripartitum, the first sequenced representative of the genus Cruznema, was determined by high-throughput sequencing. The genome is 14,067 bp long and includes 12 protein-coding, two rRNA, and 22 tRNA genes. Phylogenetic analyses based on amino acid data support C. tripartitum as sister to the clade containing Caenorhabditis elegans and Oscheius chongmingensis. Gene arrangement analysis showed that C. tripartitum shares the same gene order with O. chongmingensis, Litoditis marina, Diploscapter coronatus, the genus Caenorhabditis, and Pristionchus pacificus. Thus, the mitochondrial gene arrangement is highly conserved in the family Rhabditidae as well as in some species of Diplogasteridae.


Introduction
The nematode genus Cruznema Artigas, 1927 belongs to the family Rhabditidae Örley, 1880 in the order Rhabditida Chitwood, 1933. Species of the genus are either free-living bacterivores (Tahseen et al., 2012; Sultana and Pervez, 2019) or parasites of crickets (Reboredo and Camino, 1998; Camino and Achinelly, 2011) and mollusks (Grewal et al., 2011). Seven valid species have currently been described, including Cruznema campestre Reboredo and Camino, 2000, Cruznema graciliformis (Goffart, 1935) Sudhaus, 1978, Cruznema lincolnense Reboredo and Camino, 1998, Cruznema minimus Sultana and Pervez, 2019 (Sudhaus, 1978) Andrássy, 1983, C. tripartitum (von Linstow, 1906) Sudhaus, 1974, and Cruznema velatum Brzeski, 1989.
Inferring accurate phylogenies provides significant insight into the patterns and mechanisms of evolution and has long been a central topic among biologists. Within Nematoda, although a phylum-wide phylogeny has been established based on rRNA (Blaxter et al., 1998; Holterman et al., 2006; Bert et al., 2008), some relationships cannot be resolved with this single nuclear locus. More recently, transcriptome-based phylogenomics has been applied to Nematoda and has proven to be a powerful tool for resolving phylogeny (Ahmed et al., 2021). However, this approach requires fresh nematodes and high sequencing depth, and it involves substantial downstream bioinformatics work. The complete mitochondrial genome (mitogenome) offers a practical alternative: it contains multiple genes within a relatively small genome, which makes it easier to sequence and to use for phylogeny reconstruction. Moreover, the mitogenome evolves independently of the nuclear genome and can therefore provide additional evolutionary information. In addition, differences in gene arrangement, base composition, and codon usage can be used for comparative genomics (Jameson et al., 2003).
Indeed, mitogenome-based phylogenies of Nematoda have yielded well-supported results for clades that were not well resolved by other approaches (Kern et al., 2020).
Although over 200 nematode mitogenomes have been sequenced, most are from economically important or zooparasitic species (Kern et al., 2020), and very few are from free-living taxa. This taxonomic bias partly reflects sequencing obstacles, as most of these genomes were obtained by long PCR combined with Sanger sequencing (e.g., Hu et al., 2002; Kim et al., 2015, 2017). Because mitogenomes are highly variable and a closely related reference genome is often unavailable, successful primer design and PCR amplification are difficult and time-consuming in practice. Recently developed high-throughput approaches hold great promise for sequencing and annotating mitogenomes, but they have been applied to only a few nematodes (e.g., Ma et al., 2020; Qing et al., 2021).
In this study, we used high-throughput sequencing on the Illumina platform to determine the complete mitogenome of Cruznema tripartitum, the first representative of the genus. We further annotated its genome features, including gene content and genome architecture, and used these sequences to infer the phylogenetic placement of C. tripartitum within Nematoda.

Nematode extraction
Specimens of C. tripartitum were originally extracted from soil in a chestnut orchard in Guangzhou, China (GPS coordinates: 23°56′44″N, 114°42′15″E). This species is a free-living bacterivore; the population was established as a pure culture on nematode growth medium (NGM) seeded with Escherichia coli OP50 and maintained at 24°C.
Approximately 8,000 live nematodes were used for genomic DNA extraction with the Ezup Column Animal Genomic DNA Purification Kit (Sangon Biotech Co., Ltd., Shanghai, China) according to the manufacturer's protocol. DNA concentration was measured with a Qubit fluorometer (Invitrogen, Carlsbad, CA, USA) using the Qubit 1X dsDNA HS Kit (Yeasen Biotech, Shanghai, China). A total of 200 ng of extracted DNA was used for sequencing.

High-throughput sequencing and genome assembly
For library preparation, the TruSeq DNA Sample Prep Kit (Illumina Inc., San Diego, CA, USA) was used according to the manufacturer's instructions. Library concentration and fragment size distribution were determined using the Qubit fluorometer and a Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, USA). Sequencing was conducted by Personalbio, Ltd. (Shanghai, China) on an Illumina NovaSeq platform.
For quality control and trimming, reads shorter than 50 bp, with a Q-score below 20, or containing more than three ambiguous bases (N) were removed with fastp (Chen et al., 2018). The remaining reads were mapped to the reference mitogenomes of 32 nematode species and 11 insect species using NextGenMap (Sedlazeck et al., 2013) with a similarity threshold of 0.3. Candidate raw reads that matched the references in at least one direction were extracted with SAMtools (Li et al., 2009). Reads were assembled in NOVOPlasty (Dierckxsens et al., 2017) using its seed-and-extend algorithm. Because no cox1 sequence was available in GenBank for the genus Cruznema, a partial cox1 sequence of the closely related Pelodera strongyloides (GenBank accession number LR700242) was used as the "seed" sequence. A total of 554,098 reads were used in the final assembly, which generated a circular DNA molecule.
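The read-filtering thresholds above can be illustrated with a short Python sketch. This is a hypothetical re-implementation for clarity, not fastp itself; it assumes that "Q-score below 20" refers to the mean Phred quality of a read and that quality strings use Phred+33 encoding. The function names and toy reads are illustrative only.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a read, assuming Phred+33 quality encoding."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_filter(seq: str, quality: str,
                  min_len: int = 50, min_q: float = 20.0, max_n: int = 3) -> bool:
    """Hypothetical filter mirroring the thresholds in the text: drop reads
    shorter than 50 bp, with mean quality below 20, or with more than
    three ambiguous (N) bases."""
    if len(seq) < min_len:
        return False
    if mean_phred(quality) < min_q:
        return False
    if seq.upper().count("N") > max_n:
        return False
    return True


# Toy reads (not real data): (sequence, quality string)
reads = [
    ("ACGT" * 15, "I" * 60),           # 60 bp, Q40 -> kept
    ("ACGT" * 10, "I" * 40),           # 40 bp -> too short
    ("ACGT" * 15, "+" * 60),           # mean Q10 -> low quality
    ("NNNN" + "ACGT" * 14, "I" * 60),  # 4 Ns -> too many ambiguous bases
]
kept = [s for s, q in reads if passes_filter(s, q)]
print(len(kept))  # 1
```

In practice fastp applies these criteria in a single pass over paired FASTQ files; the sketch only makes the three cut-offs explicit.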
The amino acid sequences of the 12 PCGs were aligned individually using TranslatorX (Abascal et al., 2010) with the invertebrate mitochondrial genetic code and MAFFT. The concatenated sequence matrix was generated in Geneious (Biomatters, Auckland, New Zealand), with the 12 PCG alignments arranged in the following order: nad6-nad5-nad4-nad4L-nad3-nad2-nad1-cytb-cox3-cox2-cox1-atp6. The phylogenetic placement of the newly sequenced C. tripartitum was inferred by the maximum likelihood (ML) method together with the complete mitogenomes of 36 other nematode species. The isopod-parasitic mermithid nematode Thaumamermis cosgrovei was selected as the outgroup. The phylogenetic tree was constructed using RAxML 8.2.12 (Stamatakis et al., 2014) under the MtZoa protein substitution model with 1,000 bootstrap replicates. The analysis was run on the CIPRES Science Gateway (Miller et al., 2010).
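Concatenating the per-gene alignments in the fixed order given above can be sketched as follows. This is a minimal supermatrix illustration, assuming each gene alignment is a dictionary mapping taxon names to equal-length aligned amino acid strings, and that taxa missing from a gene are padded with gaps. The helper `concatenate` and the toy data are hypothetical and do not reproduce the Geneious workflow.

```python
from __future__ import annotations

GENE_ORDER = ["nad6", "nad5", "nad4", "nad4L", "nad3", "nad2",
              "nad1", "cytb", "cox3", "cox2", "cox1", "atp6"]


def concatenate(alignments: dict[str, dict[str, str]],
                order: list[str]) -> dict[str, str]:
    """Join per-gene alignments into one supermatrix in the given gene order.

    alignments maps gene -> {taxon -> aligned sequence}; all sequences of one
    gene must have the same length. Taxa absent from a gene get '-' gaps.
    """
    taxa = sorted({t for aln in alignments.values() for t in aln})
    matrix: dict[str, list[str]] = {t: [] for t in taxa}
    for gene in order:
        aln = alignments.get(gene, {})
        if not aln:
            continue  # gene not supplied in this toy input
        lengths = {len(s) for s in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: aligned sequences differ in length")
        width = lengths.pop()
        for t in taxa:
            matrix[t].append(aln.get(t, "-" * width))
    return {t: "".join(parts) for t, parts in matrix.items()}


# Hypothetical two-gene toy input (not real alignments)
toy = {
    "nad6": {"taxonA": "MK-L", "taxonB": "MKIL"},
    "cox1": {"taxonA": "WF", "taxonB": "WY", "taxonC": "WW"},
}
supermatrix = concatenate(toy, GENE_ORDER)
print(supermatrix["taxonC"])  # ----WW
```

Padding absent genes with gaps keeps every row the same width, which ML programs such as RAxML require for a concatenated matrix.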

Mitogenome characterization
A total of approximately 105 million paired-end 150 bp reads were generated, of which 99.04% were of high quality (Q-score > 30). Approximately 10 GB of raw data were mapped to the references, yielding 0.53% mitochondrial reads, which were then used for mitogenome assembly.
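As a rough consistency check, the expected number of mitochondrial reads and the implied mean depth can be worked out from values reported in this paper (about 105 million reads, 0.53% mitochondrial, 554,098 assembled reads, 150 bp reads, 14,067 bp genome). The depth figure is an assumption-laden upper bound: it treats every read as retaining its full 150 bp after trimming, and it is not a result reported by the authors.

```python
total_reads = 105_000_000   # "approximately 105 million reads" (approximate)
mito_fraction = 0.0053      # 0.53% mitochondrial reads
assembled_reads = 554_098   # reads used in the final NOVOPlasty assembly
read_length = 150           # bp; assumes no trimming, so depth is an upper bound
genome_length = 14_067      # bp, assembled mitogenome

expected_mito_reads = total_reads * mito_fraction
mean_depth = assembled_reads * read_length / genome_length

print(round(expected_mito_reads))  # 556500
print(round(mean_depth))           # 5908
```

The expected count (about 556,500) is close to the 554,098 reads actually used, and the implied mean depth is on the order of 5,900x.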
The complete mitogenome of C. tripartitum (GenBank accession OM234676) is a single circular DNA molecule of 14,067 bp that contains 12 PCGs (atp6, cob, cox1-3, nad1-6, and nad4L), two ribosomal RNA genes, and 22 tRNA genes (Fig. 1). The atp8 gene is absent from the assembled mitogenome. All genes of the C. tripartitum mitogenome are encoded in the same direction, as in all other known chromadorean nematodes. The nucleotide composition is strongly biased toward A + T (78.7%), with 48.1% T, 7.5% C, 30.6% A, and 13.8% G. A 696 bp intergenic non-coding region lies between trnA and trnP; its A + T content (87.1%) is much higher than that of any other part of the mitogenome.
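The composition figures above are simple per-nucleotide percentages. A minimal Python sketch of that calculation follows; the function name and toy sequence are hypothetical, and excluding ambiguous bases from the denominator is an assumption, since the paper does not say how they were handled.

```python
from __future__ import annotations

from collections import Counter


def base_composition(seq: str) -> dict[str, float]:
    """Percentage of A, T, C, G and the combined A + T content of a sequence.

    Ambiguous bases (anything other than A/T/C/G) are ignored (an assumption).
    """
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ATCG")
    comp = {b: 100 * counts[b] / total for b in "ATCG"}
    comp["A+T"] = comp["A"] + comp["T"]
    return comp


toy = "ATTATTGCAT"  # hypothetical 10 bp fragment, not from OM234676
comp = base_composition(toy)
print(comp["A+T"])  # 80.0
```

Applied to the full mitogenome sequence, the reported A (30.6%) and T (48.1%) percentages sum to the stated A + T content of 78.7%; the same function applied to the 696 bp non-coding region would give its A + T content.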

Phylogeny and gene arrangement
Maximum likelihood phylogenetic analysis of C. tripartitum was conducted using the translated amino acid data set of the 12 PCGs (3,893 aligned positions), together with those of 36 other nematodes (32 chromadoreans and 4 enopleans, available in NCBI GenBank) as references (Fig. 3). Overall, the recovered phylogeny was congruent with previously published phylogenies inferred from rRNA genes (Blaxter et al., 1998; De Ley and Blaxter, 2004; Holterman et al., 2006; Meldal et al., 2007) or mitogenomes (Kim et al., 2015, 2017). The family Rhabditidae is paraphyletic, comprising a moderately supported major clade (BS = 87) and a separately placed Diploscapter coronatus, which is sister to several genera in Rhabditina, Tylenchina, and Spirurina. Within the major clade of Rhabditidae, the newly sequenced C. tripartitum is sister to the clade comprising Oscheius chongmingensis, Litoditis marina, and species of the genus Caenorhabditis.
Gene arrangement analysis showed that the Rhabditidae, represented by C. tripartitum, O. chongmingensis, L. marina, Diploscapter coronatus, and the genus Caenorhabditis, share a highly conserved arrangement pattern; the only exception is C. briggsae, in which the alanine tRNA was not annotated (Fig. 4). A similar arrangement was also found in some Diplogasteridae species: Pristionchus pacificus shares exactly the same pattern, while Koerneria sudhausi differs only in the placement of a few tRNAs (trnM, trnD, and trnA). Interestingly, this conserved arrangement may also extend to the suborder Tylenchina. For example, the superfamily Aphelenchoidea, represented by Aphelenchoides medicagus, A. besseyi, Bursaphelenchus xylophilus, and B. mucronatus, shares a PCG arrangement identical to that of Rhabditidae, although some tRNAs are translocated.
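The comparisons above distinguish between sharing the complete gene order and sharing only the order of PCGs once tRNAs are ignored. Because the mitogenome is circular, two linearized orders that differ only in their starting gene describe the same arrangement. A minimal sketch under these assumptions follows; strand is ignored because all genes are encoded in the same direction, tRNA and rRNA names are assumed to start with "trn" and "rrn", and the gene lists are hypothetical toy examples, not the real arrangements shown in Fig. 4.

```python
from __future__ import annotations


def same_circular_order(a: list[str], b: list[str]) -> bool:
    """True if b is a rotation of a, i.e. the same circular gene order."""
    if len(a) != len(b):
        return False
    if not a:
        return True
    doubled = a + a  # every rotation of a appears as a window of a + a
    return any(doubled[i:i + len(b)] == b for i in range(len(a)))


def pcgs_only(order: list[str]) -> list[str]:
    """Drop tRNA (trn*) and rRNA (rrn*) genes, keeping protein-coding genes."""
    return [g for g in order if not g.startswith(("trn", "rrn"))]


# Hypothetical toy arrangements (not taken from any real mitogenome)
genome_x = ["cox1", "trnC", "trnM", "nad6", "rrnL", "atp6", "trnA"]
genome_y = ["nad6", "rrnL", "atp6", "trnM", "cox1", "trnC", "trnA"]

print(same_circular_order(genome_x, genome_y))                        # False
print(same_circular_order(pcgs_only(genome_x), pcgs_only(genome_y)))  # True
```

Here the toy genomes differ by a single tRNA translocation (trnM), so their full orders differ while their PCG orders are identical, mirroring the Aphelenchoidea case described above.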

Discussion
The metazoan mitogenomes are approximately 14-20 kb in size and usually comprise 36-37 genes, including 12-13 protein-coding genes (atp6, atp8, cob, cox1-3, nad1-6, and nad4L), two rRNA genes (small and large subunit ribosomal RNA), and 22 tRNA genes (Wolstenholme, 1992; Boore, 1999). The length of the C. tripartitum mitogenome lies between those of Caenorhabditis elegans (13,794 bp) and O. chongmingensis (15,413 bp). The assembled mitogenome consists of 12 PCGs, 22 tRNA genes, and two rRNA genes. This composition is identical to that of other reported species in Rhabditida (e.g., C. elegans) but differs from that of the vertebrate parasite Trichinella (Lavrov and Brown, 2001) and many other metazoans in that the atp8 gene is absent (Gissi et al., 2008). Mitogenome gene arrangement is considered a useful tool for interpreting phylogenetic relationships (Boore and Brown, 1998; Fritzsch et al., 2006). In this study, we recovered a highly conserved gene order in Rhabditidae and Diplogasteridae, which is in line with the close sister relationship of these families revealed by both rRNA- and mitogenome-based phylogenies (Holterman et al., 2006; Park et al., 2011; Kim et al., 2015). Consequently, gene order can serve as informative evidence for higher-level phylogeny. In contrast, gene order varies greatly among many enoplean nematodes, and even congeneric species may have completely different arrangements. Therefore, the use of gene order to infer phylogenetic relationships within Enoplea is limited (Tang and Hyman, 2007; Hyman et al., 2011).
Figure 1: The genes are denoted by colored blocks and encoded in the clockwise direction. The leucine and serine tRNA genes are distinguished by their anticodon sequences: L1 is trnL-cta, L2 is trnL-tta, S1 is trnS-aga, and S2 is trnS-tca. The region without color indicates the non-coding region.
Complete mitogenomes have shown promise as an additional, independently evolving genome for developing phylogenetic hypotheses for nematodes (Hu and Gasser, 2006; Sultana et al., 2013; Humphreys-Pereira and Elling, 2014; Sun et al., 2014; Kern et al., 2020). Despite their importance, studies of nematode mitogenomes lag far behind those of other metazoans (Ye et al., 2014). Of the more than 25,000 described nematode species (Zhang, 2013), only around 200 have had their mitogenomes sequenced (Kern et al., 2020). One major bottleneck limiting nematode mitogenome sequencing is primer design for long PCR amplification. Successful primer design is often challenging because the few existing references are highly variable. As a result, many primer pairs were designed for individual species (Humphreys-Pereira and Elling, 2014; Kim et al., 2015). With the development of high-throughput sequencing techniques, assembling mitogenomes from whole-genome sequencing data has become a feasible alternative. However, only a few species have been sequenced using this approach, e.g., Globodera ellingtonae, Hoplolaimus columbus, and A. medicagus (Phillips et al., 2016; Ma et al., 2020; Qing et al., 2021). Using the Illumina platform for sequencing and a seed-and-extend assembly algorithm, we obtained the first mitogenome representative of the genus Cruznema and demonstrated that high-throughput sequencing is a promising option for acquiring mitogenomes in a time- and cost-efficient manner. Although substantially broader taxon sampling is still needed, our results update the nematode mitogenome phylogeny and provide a valuable reference for future mitogenome research based on high-throughput sequencing.
Figure 3: Maximum likelihood phylogenetic tree inferred from amino acid sequences of 12 concatenated protein-coding genes (3,893 aligned positions). The newly obtained sequence is indicated with an asterisk.
Figure 4: Linearized mitochondrial gene arrangement patterns of Cruznema tripartitum (in bold) and other related species in the suborders Rhabditina and Tylenchina. All genes are encoded in the same direction. All 22 tRNA genes are designated by the one-letter amino acid code of the International Union of Pure and Applied Chemistry (IUPAC). The two leucine and two serine tRNA genes are labeled according to their distinct anticodon sequences as L1 (trnL1-cta), L2 (trnL2-tta), S1 (trnS1-aga), and S2 (trnS2-tca). Non-coding regions are not shown.