Open Access

Nematode genome announcement: A chromosome-scale genome assembly for the Pristionchus pacificus reference mapping strain PS1843

11 Sept. 2024


The free-living nematode Pristionchus pacificus is an established model species for comparative studies with Caenorhabditis elegans and for studying developmental plasticity (Sommer and McGaughran, 2013; Sommer et al., 2017). P. pacificus has an advanced genetic toolkit and a high-quality genome (Rödelsperger et al., 2017; Han et al., 2020; Hiraga et al., 2021; Igreja et al., 2022). Several forward genetic studies have exploited the wealth of natural isolates to investigate the genetic basis of various traits (Mayer et al., 2015; McGaughran et al., 2016; Moreno et al., 2016; Dardiry et al., 2023). One strain of P. pacificus, PS1843, which was originally isolated in the state of Washington, has been frequently employed for genetic analysis (Srinivasan et al., 2002; Sieriebriennikov et al., 2020; Rillo-Bohn et al., 2021). Low-coverage sequencing data for this strain have been available since the first sequencing of the P. pacificus genome and revealed more than a million differences relative to the P. pacificus PS312 genome, i.e., a sequence divergence of ~1% (Dieterich et al., 2008; Rödelsperger et al., 2014). Recently, we resequenced and assembled the complete PS1843 genome from Illumina short-read data (Prabh and Rödelsperger, 2022). Based on benchmarking of universally conserved single-copy orthologs (BUSCO) (Simão et al., 2015), the completeness of the Illumina genome was comparable to that of the chromosome-scale genome of the reference strain PS312 from California, which was assembled from PacBio long reads (88.8% vs. 92.6%, Table 1). However, the Illumina assembly was still fragmented (39,807 scaffolds with an N50 of 0.4 Mb, Table 1), and specific analyses of gene losses often turned out to be inconclusive due to the presence of assembly gaps (Prabh and Rödelsperger, 2022). Therefore, we decided to resequence the P. pacificus PS1843 genome using the PacBio platform.
Worms were washed off 100 plates using M9 buffer, and genomic DNA was extracted using the Qiagen genomic DNA extraction kit according to the manufacturer’s instructions. DNA quality and quantity were determined with a NanoDrop spectrometer (ThermoFisher), a Qubit fluorometer (Invitrogen), and a Femto Pulse system (Agilent). A genomic library was prepared according to the manufacturer’s protocol (Pacific Biosciences) using a BluePippin size-selection system and was sequenced on a multiplexed run of a PacBio Sequel II SMRT cell. This yielded around 730,000 reads with a median read length of 11.6 kb (interquartile range: 4.5–24.4 kb). This translates into 11.7 Gb of raw sequence, which corresponds to around 70x coverage of the P. pacificus genome. Raw PacBio reads were assembled with Canu (version 1.4, options: genomeSize=180m -pacbio-raw) (Koren et al., 2017). This resulted in an initial assembly of 63 contigs spanning 171 Mb with an N50 value of 5.9 Mb. BLASTN searches (e-value < 10^-5) did not identify any of the contigs as contamination from Escherichia coli OP50. We then scaffolded the assembly using a reference-guided approach as implemented in RagTag (version 2.1.0) (Alonge et al., 2022), with the chromosome-scale assembly of the P. pacificus strain PS312 (version El Paco) as the reference (Rödelsperger et al., 2017). This resulted in a pseudochromosomal assembly of 11 scaffolds spanning 171 Mb with an N50 value of 26.3 Mb. BUSCO completeness of the genome assembly was estimated to be 92.4% (91.1% single copy + 1.3% duplicates). To annotate protein-coding genes, we employed the PPCAC pipeline (version 1) (Rödelsperger, 2021), which generated evidence-based gene annotations from existing gene annotations for the P. pacificus reference strain PS312 (El Paco gene annotations, version 3) (Athanasouli et al., 2020), as well as from previously generated RNA-seq data (Rödelsperger et al., 2018).
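The coverage and N50 figures quoted above follow from simple arithmetic. The following Python sketch is an illustration only, not part of the published pipeline: it recomputes the approximate sequencing depth from the numbers in the text (using the 171 Mb assembly span as the genome size, which is an assumption) and shows how an N50 value is derived. The contig lengths in the toy example are hypothetical and are not the PS1843 contigs.

```python
def n50(lengths):
    """Return the N50: the largest length L such that contigs of
    length >= L together cover at least half of the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contig lengths given")


def coverage(total_bases, genome_size):
    """Average sequencing depth: sequenced bases divided by genome size."""
    return total_bases / genome_size


# Figures quoted in the text: 11.7 Gb of raw reads, 171 Mb assembly span.
# Using the assembly span as the genome size is an assumption.
depth = coverage(11.7e9, 171e6)
print(f"approximate depth: {depth:.0f}x")

# Toy contig lengths (hypothetical values for illustration).
toy_contigs = [50, 40, 30, 20, 10]
print("toy N50:", n50(toy_contigs))
```

With these inputs the depth comes out slightly below 70x, consistent with the "around 70x" stated in the text.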
We refined the gene annotations by removing species-specific orphan genes that overlapped in the antisense direction with conserved genes, as we previously observed that coding potential on the antisense strand can result in gene annotation errors (Athanasouli et al., 2020). Evaluating the completeness of the final gene annotations using BUSCO (version 3.1, nematode odb9 data set) revealed a completeness level of 94.9%. The most notable distinguishing feature between the PS1843 and the PS312 genomes is a difference of ~16 Mb in genome size (Table 1). Whole-genome alignments generated with minimap2 (Li, 2018) revealed megabase-sized regions on ChrIII, ChrIV, and ChrX that have no correspondence in the PS312 genome and explain the majority of the genome size difference (Figure 1). These three regions account for roughly 11 Mb and contain around 1,100 genes, most of which are conserved with other Pristionchus species, suggesting that these regions represent ancestral haplotypes that have been retained in the PS1843 genome but were lost in the lineage leading to the PS312 strain.
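The antisense filtering step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Gene` record, the `orphan` flag, and the rule that any overlap of at least one base counts are all hypothetical, since the text does not specify the exact overlap criterion or data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str   # "+" or "-"
    orphan: bool  # True for species-specific orphan genes


def overlaps_antisense(a, b):
    """True if a and b share at least one base on opposite strands."""
    return (
        a.chrom == b.chrom
        and a.strand != b.strand
        and a.start <= b.end
        and b.start <= a.end
    )


def drop_antisense_orphans(genes):
    """Remove orphan genes that overlap a conserved gene in antisense."""
    conserved = [g for g in genes if not g.orphan]
    return [
        g for g in genes
        if not (g.orphan and any(overlaps_antisense(g, c) for c in conserved))
    ]


# Toy annotation (hypothetical names and coordinates).
genes = [
    Gene("conserved1", "ChrI", 100, 500, "+", False),
    Gene("orphanA", "ChrI", 400, 700, "-", True),   # antisense overlap: removed
    Gene("orphanB", "ChrI", 400, 700, "+", True),   # same strand: kept
    Gene("orphanC", "ChrII", 400, 700, "-", True),  # other chromosome: kept
]
kept = drop_antisense_orphans(genes)
print([g.name for g in kept])
```

The pairwise comparison is quadratic in the number of genes; for a full annotation of around 30,000 genes, an interval index sorted by chromosome and start position would be the more practical choice.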

Figure 1: Comparison between the two chromosome-scale P. pacificus genomes. The dotplots visualize the megabase coordinates of alignments between the P. pacificus PS312 chromosomes (x-axis) and their counterparts in the PS1843 genome (y-axis). The whole-genome alignments were generated by the minimap2 aligner (options: -N 0 -t 6 -x asm5), and the coordinates were extracted from the resulting PAF file. The gray dashed lines mark 52 assembly gaps in the PS1843 genome that were filled with undetermined sequence stretches of 100 nucleotides in length.
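Extracting dotplot coordinates from a PAF file, as described in the caption, can be sketched as follows. This is an illustrative sketch rather than the script used for the figure. It relies on the standard PAF column layout (query name, length, start, end, strand, then target name, length, start, end, matches, block length, mapping quality). The `min_block` filter, the sequence names in the sample line, and the choice of which genome was query or target are assumptions.

```python
from typing import Iterable, Iterator, NamedTuple


class Segment(NamedTuple):
    query: str
    q_start_mb: float
    q_end_mb: float
    strand: str
    target: str
    t_start_mb: float
    t_end_mb: float


def parse_paf(lines: Iterable[str], min_block: int = 0) -> Iterator[Segment]:
    """Yield alignment segments with coordinates converted to megabases."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:            # PAF has 12 mandatory columns
            continue
        if int(fields[10]) < min_block:  # column 11: alignment block length
            continue
        yield Segment(
            query=fields[0],
            q_start_mb=int(fields[2]) / 1e6,
            q_end_mb=int(fields[3]) / 1e6,
            strand=fields[4],
            target=fields[5],
            t_start_mb=int(fields[7]) / 1e6,
            t_end_mb=int(fields[8]) / 1e6,
        )


def dotplot_line(seg: Segment):
    """Return ((x0, y0), (x1, y1)) with the target on x and the query on y.
    Reverse-strand alignments are drawn with a descending slope."""
    if seg.strand == "+":
        return (seg.t_start_mb, seg.q_start_mb), (seg.t_end_mb, seg.q_end_mb)
    return (seg.t_start_mb, seg.q_end_mb), (seg.t_end_mb, seg.q_start_mb)


# One hypothetical PAF record (names and numbers are made up).
sample = [
    "queryChrI\t30000000\t1000000\t2500000\t-\t"
    "targetChrI\t28000000\t3000000\t4500000\t1400000\t1500000\t60\n"
]
segments = list(parse_paf(sample))
print(dotplot_line(segments[0]))
```

The resulting endpoint pairs can be passed to any plotting library to draw one line segment per alignment.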

Table 1: Characteristics of different P. pacificus genome assemblies. BUSCO values include single copy and duplicated genes.

| | PS312 (Version El Paco) | PS1843 (Prabh and Rödelsperger 2022) | PS1843 (This study) |
| --- | --- | --- | --- |
| Number of contigs/scaffolds | 47 | 39,807 | 11 |
| Total genome size (Mb) | 155.0 | 174.1 | 171.2 |
| N50 (Mb) | 23.9 | 0.4 | 26.3 |
| BUSCO (genome) (%) | 92.6 | 88.8 | 92.4 |
| Number of genes | 28,996 | 26,757 | 30,098 |
| BUSCO (proteins) (%) | 97.6 | 88.8 | 94.9 |

The raw sequencing data and the PS1843 genome assembly have been uploaded to the European Nucleotide Archive under the study accession PRJEB77003. The data are also available on the pristionchus.org webserver. We hope that the improved PS1843 genome will be useful for future forward genetic studies and evolutionary genomic comparisons at the intra-species level.
