Whole Genome Shotgun Sequencing-Based Insights into the Benzene and Xylene Degrading Potentials of Bacteria
Article category: Original Paper
Published online: 18 Jun 2025
Pages: 244-261
Received: 07 Apr 2025
Accepted: 11 May 2025
DOI: https://doi.org/10.33073/pjm-2025-020
© 2025 FATIMA MUCCEE et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Benzene is classified as carcinogenic by the International Agency for Research on Cancer (IARC) and the Environmental Protection Agency (EPA) of the USA (Li et al. 2021; Matheson et al. 2023). Xylene is non-carcinogenic with risk class 3 (moderate risk), but it proves hazardous when it occurs as part of a benzene, toluene, ethylbenzene, and xylene (BTEX) mixture (Li et al. 2025). These two pollutants adversely affect the nervous, respiratory, reproductive, endocrine, urinary, and cardiovascular systems (Caron-Beaudoin et al. 2022).
Given their ubiquity, persistence, carcinogenicity and other associated health hazards, long-range transport, high volatility, high emission rate, and intrinsic resistance to degradation, benzene and xylene are significant concerns today. Multiple remediation approaches, including catalytic oxidation, soil vapor extraction, photocatalysis, and thermal incineration, have been employed (Tabakova 2024). However, previous research indicates that these conventional methods are expensive and, owing to their high carbon footprint, not eco-friendly (Unegg et al. 2023; Sornaly et al. 2024; Kaur and Sood 2025). In comparison, microbial bioremediation offers several advantages: it is economical, eco-friendly, simple to operate, and able to degrade pollutants present at dilute concentrations, and it has low energy consumption, a low carbon footprint, and reduced formation of secondary pollutants (Kuppan et al. 2024). The literature documents various case studies in which water and soil contaminated with oil, polycyclic aromatic hydrocarbons (PAHs), and polychlorinated biphenyls (PCBs) were successfully bioremediated (You et al. 2014; Schad 2016). Hence, bioremediation can reduce pollutant levels in environmental resources even without the use of conventional techniques. Doing so, however, requires optimizing the bioremediation process, selecting the right microorganism, adaptive management, and sufficient ongoing monitoring. In contrast, physicochemical and mechanical methods cannot completely remove pollutants when employed without bioremediation (Sales da Silva et al. 2020). A strategic combination of conventional and biological methods, however, improves the outcome synergistically (Aparicio et al. 2022; Kumar et al. 2022; Nagda et al. 2022; Sornaly et al. 2024; Ha and Tan 2025).
Bacteria, adaptable to various environmental conditions and nutritionally versatile, are the most suitable candidates for bioremediation. The biodegradation potential of bacteria is reflected in their genomes. Whole genome sequencing (WGS) can provide a holistic view of the bioremediation-related genes of these environmentally competent bacteria. It is also helpful in tracking the changes that occur in microbial communities during biodegradation, thus providing insights into the effectiveness of the process and possible ways of optimizing it (Chettri et al. 2024).
Ortega-González et al. (2013) documented the genes toluene 1,2-dioxygenase (todC1), and xylene monooxygenase (xylM) in a bacterial sample comprising
The current study targeted the genomes of eleven bacteria, which were identified in two separate projects. One project isolated benzene-degrading bacteria, and the other targeted xylene-degrading bacteria. Ten of these eleven isolates belonged to the genus
Although we can find many benzene and xylene-metabolizing bacteria in previously published data, among these, only the pathways of Gram-negative bacteria are well characterized, while Gram-positive, particularly genus
Considering these limitations of previously reported bacteria, we initiated the current research project, hypothesizing that we might identify unusual and novel pathways and associated genes in benzene- and xylene-metabolizing bacteria isolated from the tannery industry of Multan.
The minimal salt medium (MSM) employed for isolation of benzene-metabolizing bacteria contained KH2PO4 (1 g/l), Na2HPO4 (1.25 g/l), (NH4)2SO4 (0.5 g/l), MgSO4 (0.5 g/l), CaCl2 (0.5 g/l), FeSO4 (0.005 g/l), and benzene (80 μl/100 ml) (Dairawan and Shetty 2020). The MSM used for isolation of xylene-degrading bacteria was composed of KH2PO4 (1 g/l), K2HPO4 (1 g/l), NH4NO3 (1 g/l), MgSO4 (0.2 g/l), CaCl2 (0.02 g/l), 0.5 M FeCl3 (2 drops/l), and xylene (1 ml/100 ml) (Doley and Barthakur 2022). Following isolation, 16S rRNA gene sequencing analysis was performed to determine the genus and species of the bacteria (Schumann and Pukall 2013). The gene was amplified using forward (5′-AGAGTTTGATCCTGGTCAGAAC-3′, Tm = 70.4°C) and reverse (5′-CGTACGGCTACCTTGTTACGACTTCACCCC-3′, Tm = 74.8°C) primers. Growth phases were determined by measuring the optical density at 600 nm (OD600) spectrophotometrically at different time intervals (0, 3, 6, 24, 27, 30, 46, 49, 51, and 54 hours) in cultures of the isolates incubated at 37°C and 150 rpm (Brewster 2003). To biochemically characterize the benzene- and xylene-metabolizing isolates, the Remel™ RapID™ NF PLUS System and the Remel™ RapID™ STR System (Thermo Scientific™, Thermo Fisher Scientific, Inc., USA) were used, respectively (Evangelista et al. 2001; Mayz et al. 2013). The degradation of organic compounds was confirmed via removal assays and gas chromatography-mass spectrometry (GC-MS) analysis (Mohamed et al. 2017; Raju and Kumar 2020).
A benzene removal assay was performed by preparing benzene stock solutions of known concentrations ranging from 5 to 30 mg/l. The OD180 of these solutions was measured to plot the standard curve, and linearity was assessed by linear regression analysis. Bacteria were cultured until log phase in MSM containing benzene (40 mg/l) as the sole carbon source. The benzene concentration in the supernatant was determined by measuring OD180 and comparing it with the standard curve (Hussain et al. 2025).
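The standard-curve step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the absorbance readings are invented for demonstration (they are not measured data from this study), the function names are our own, and an ordinary least-squares line is assumed for the regression.

```python
def fit_standard_curve(conc_mg_l, absorbance):
    """Fit absorbance = slope * concentration + intercept by least squares.

    Returns (slope, intercept, r_squared).
    """
    n = len(conc_mg_l)
    mean_x = sum(conc_mg_l) / n
    mean_y = sum(absorbance) / n
    sxx = sum((x - mean_x) ** 2 for x in conc_mg_l)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc_mg_l, absorbance))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2
                 for x, y in zip(conc_mg_l, absorbance))
    ss_tot = sum((y - mean_y) ** 2 for y in absorbance)
    return slope, intercept, 1.0 - ss_res / ss_tot


def concentration_from_od(od, slope, intercept):
    """Invert the standard curve to estimate a concentration (mg/l)."""
    return (od - intercept) / slope


def percent_removal(initial_mg_l, residual_mg_l):
    """Share of the initial substrate that is no longer in the supernatant."""
    return 100.0 * (initial_mg_l - residual_mg_l) / initial_mg_l


# Hypothetical standards (mg/l) and readings; illustrative values only.
standards = [5, 10, 15, 20, 25, 30]
readings = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]

slope, intercept, r2 = fit_standard_curve(standards, readings)
residual = concentration_from_od(0.24, slope, intercept)  # hypothetical supernatant OD
removal = percent_removal(40.0, residual)                 # 40 mg/l initial benzene
print(f"R^2 = {r2:.4f}, residual = {residual:.1f} mg/l, removal = {removal:.1f}%")
```

Readings that fall outside the range of the standards would need dilution before this inversion is meaningful.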
Stock solutions of known xylene concentrations, ranging from 100 to 900 mg/l, were prepared to perform the xylene removal assay. The OD265 of the solutions was measured spectrophotometrically, and a standard curve was plotted. Correlation coefficient (
Cultures of individual bacteria, harvested at the end of the log phase, were pelleted out via centrifugation (6,000 ×
A DNA sample qualified for library construction was selected by agarose gel electrophoresis. A 1% agarose gel was prepared by dissolving 0.5 g agarose powder in 50 ml of 1× TAE buffer (pH 8.0). The mixture was heated for 1 minute and poured into a gel casting tray containing a comb to create wells. Ethidium bromide was added to the gel mixture as a DNA intercalator. Following solidification, the gel was submerged in 1× TAE buffer in the electrophoresis tank. The 5
The Nextera® XT DNA Library Preparation Kit (Illumina, Inc., USA) was used to prepare the libraries, following the Nextera® XT DNA Library Prep Kit Reference Guide (15031942 v03). Genomic DNA (1 ng) was fragmented and tagged with adapters. After tagmentation, Nextera® index primers were used to attach indexes by PCR. Finally, the purified product was quantified by qPCR according to the qPCR quantification protocol guide (KAPA Library Quantification Kits for Illumina Sequencing platforms) and qualified using the TapeStation D5000 ScreenTape (Agilent Technologies, Inc., USA). Sequencing was then performed using HiSeq™ (Illumina, Inc., USA).
The Illumina sequencer generated raw images using sequencing control software for system control, and base calling was performed by the integrated primary analysis software, Real Time Analysis (RTA). The binary base call (BCL) files were converted into FASTQ files using bcl2fastq, an Illumina-provided package. Adapters were not trimmed from the reads. The FASTQ files were submitted to the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) to obtain an accession number.
Raw data statistics comprised the total number of bases sequenced; the total number of reads (for paired-end sequencing, the sum of read 1 and read 2); the GC content (%); the AT content (%); and the percentages of bases with a Phred quality score over Q20 and over Q30.
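The raw-data statistics listed above can be derived directly from a FASTQ file. The sketch below is a minimal illustration, not the sequencing provider's pipeline: it assumes four-line FASTQ records, Phred+33 quality encoding, and counts bases with a score of at least 20 or 30 (the common convention). The function name and the toy records are hypothetical.

```python
import io


def fastq_stats(handle, phred_offset=33):
    """Summarize a four-line-per-record FASTQ stream.

    Assumes Phred+33 encoding; Q20/Q30 count bases scoring >= 20 / >= 30.
    """
    reads = bases = gc = at = q20 = q30 = 0
    while True:
        header = handle.readline()
        if not header:            # end of file
            break
        if not header.strip():    # skip stray blank lines
            continue
        seq = handle.readline().strip().upper()
        handle.readline()         # '+' separator line
        qual = handle.readline().strip()
        reads += 1
        bases += len(seq)
        gc += seq.count("G") + seq.count("C")
        at += seq.count("A") + seq.count("T")
        scores = [ord(ch) - phred_offset for ch in qual]
        q20 += sum(s >= 20 for s in scores)
        q30 += sum(s >= 30 for s in scores)
    if bases == 0:
        raise ValueError("no bases found in input")

    def pct(count):
        return 100.0 * count / bases

    return {
        "reads": reads,
        "bases": bases,
        "gc_pct": pct(gc),
        "at_pct": pct(at),
        "q20_pct": pct(q20),
        "q30_pct": pct(q30),
    }


# Two toy records: 'I' encodes Q40, '5' encodes Q20, '!' encodes Q0.
toy = io.StringIO("@r1\nGGCCAATT\n+\nIIII5555\n@r2\nACGT\n+\n!!!!\n")
stats = fastq_stats(toy)
print(stats)
```

For paired-end data, the same function would be run on the read 1 and read 2 files and the counts summed before the percentages are computed.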
WGS reads were assembled into contigs/scaffolds using the versatile metagenomic assembler metaSPAdes (Nurk et al. 2017), which is part of SPAdes (St. Petersburg genome assembler), an assembly toolkit containing various assembly pipelines. The metagenomic contigs generated by metaSPAdes were distributed into ‘bins’ using the MetaBAT2 program (Kang et al. 2019). MetaBAT2 is an adaptive binning algorithm that provides a robust statistical framework for efficient genome reconstruction from metagenome assemblies.
Quality assessment of the genome assembly was performed with the QUAST program (Gurevich et al. 2013). The parameters considered were the number of contigs ≥ 1,000, 5,000, 10,000, 25,000, and 50,000 bp; the length of the largest contig; the total length of contigs; and the total length of contigs ≥ 1,000, 5,000, 10,000, and 25,000 bp. Low-quality reads were removed using Trimmomatic v.0.39 (Bolger et al. 2014). The sequences obtained were assessed and subjected to taxonomic and functional profiling.
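The contig statistics mentioned above are simple functions of the list of contig lengths. The following sketch shows how thresholded counts, thresholded total lengths, and N50 can be computed; it is an illustration with made-up contig lengths, not a reimplementation of QUAST, and the helper names are our own.

```python
def count_at_least(lengths, threshold):
    """Number of contigs whose length is >= threshold (bp)."""
    return sum(1 for length in lengths if length >= threshold)


def total_at_least(lengths, threshold):
    """Summed length (bp) of contigs whose length is >= threshold."""
    return sum(length for length in lengths if length >= threshold)


def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs supplied")


# Hypothetical contig lengths (bp), for illustration only.
contigs = [100, 80, 60, 40, 20]
print(count_at_least(contigs, 60))   # contigs of at least 60 bp
print(total_at_least(contigs, 60))   # their combined length
print(n50(contigs))                  # N50 of the toy assembly
```

N50 depends on which contigs are included, so a value is only comparable between assemblies when the same minimum contig length was applied.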
For taxonomic profiling, annotation of metagenomic contigs was done by the Kraken program. Kraken is an ultrafast and highly accurate program for assigning taxonomic labels to metagenomic DNA sequences (Wood and Salzberg 2014).
After concatenating the qualified reads, the MaxBin2 and MetaBAT2 binning algorithms were employed to reconstruct good-quality draft genomes. The default thresholds of minimum completeness (70%) and maximum contamination (10%) were applied. For functional profiling, the metagenomic contigs of each bin were analyzed with Prokka v.1.14.6 (Seemann 2014; Neely et al. 2020). KofamScan v.1.3.0 (Aramaki et al. 2020) was used for Kyoto Encyclopedia of Genes and Genomes (KEGG) annotation.
The FASTQ files have been deposited in the NCBI Sequence Read Archive (SRA) under the accession number PRJNA976120.
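The completeness and contamination thresholds can be expressed as a small filter over bin quality estimates. The sketch below assumes that such estimates are already available as percentages from a separate quality-assessment step; the bin names and values are hypothetical and do not come from this study.

```python
from dataclasses import dataclass


@dataclass
class GenomeBin:
    name: str
    completeness: float    # percent, assumed to come from a bin-quality tool
    contamination: float   # percent


def passes_quality(genome_bin, min_completeness=70.0, max_contamination=10.0):
    """Keep bins that are complete enough and not too contaminated."""
    return (genome_bin.completeness >= min_completeness
            and genome_bin.contamination <= max_contamination)


# Hypothetical bins, for illustration only.
bins = [
    GenomeBin("bin.1", completeness=95.2, contamination=1.8),
    GenomeBin("bin.2", completeness=64.0, contamination=2.5),
    GenomeBin("bin.3", completeness=88.7, contamination=14.1),
]
kept = [b.name for b in bins if passes_quality(b)]
print(kept)
```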
Among the total bacteria characterized in the current study, six were benzene-metabolizing and five were xylene-degrading. These were identified as belonging to
GC-MS-based profiling identified the metabolic intermediates of multiple previously documented benzene degradation pathways, i.e., the benzene methylation pathway, benzene degradation via benzaldehyde and via carboxylation, the meta- and ortho-cleavage pathways, and benzene degradation via phenol. The intermediates were toluene, benzyl alcohol, benzaldehyde, benzoate, catechol,
In xylene-degrading bacteria, metabolic intermediates of previously documented pathways, i.e., xylene degradation via benzoate formation and the anaerobic oxidation pathway, were identified. These included methylbenzyl alcohol, methyl benzaldehyde, methyl benzoate, 1,2-dihydroxy-methyl-cyclohexa-3,5-diene carboxylate,
Quality control of the DNA sample was performed using the 260/280 absorbance ratio and agarose gel electrophoresis. The 260/280 ratio was estimated as 1.338. The DNA sample resolved on agarose gel is shown in Fig. S1.
In the present study, 14,782,932 reads were produced from the bacterial DNA sample, with total read bases of 2.2 Gbp. The GC content, AT content, Q20, and Q30 values were 54.9%, 45.1%, 93.9%, and 86.2%, respectively (Table SI and Fig. 1).

Total bases, GC and AT contents, and Q20 and Q30 ratios of total reads calculated during WGS analysis.
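As a quick arithmetic check on the reported read statistics, the mean read length implied by the totals can be estimated, and the GC and AT percentages should sum to 100. The figures below are the values reported in the text; because the total base count is rounded to 2.2 Gbp, the resulting read length is only approximate.

```python
total_reads = 14_782_932      # reported total reads
total_bases = 2.2e9           # reported total bases, rounded to 2.2 Gbp
gc_pct, at_pct = 54.9, 45.1   # reported GC and AT content (%)

mean_read_length = total_bases / total_reads
print(f"approximate mean read length: {mean_read_length:.0f} bp")
print(f"GC + AT = {gc_pct + at_pct:.1f}%")
```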
The QUAST quality parameters and their values for the metaSPAdes assembly are shown in detail in Table II.
Assigning taxonomic labels to metagenomic contigs showed that 75% of contigs belonged to the genus

Krona image of taxonomic assignments to metagenomic contigs by Kraken.
Gene annotation by the Prokka program revealed 4,992 metagenomic contigs with 27,559,880 bases and 27,021 protein-coding regions. Functional annotation of benzene and xylene degradation genes retrieved from the KEGG database is described in Table III. The genes were found to be involved in the degradation of xylene, benzene, toluene, benzoate, aminobenzoate, and fluorobenzoate.
Among the genes identified in the current bacterial community, the largest shares were associated with benzoate and xylene metabolism, i.e., 28% and 14%, respectively. These were followed by genes associated with metabolism of xenobiotics by cytochrome P450 (6%), aminobenzoate (6%), chloroalkane and chloroalkene (6%), styrene (6%), dioxin (6%), caprolactam (5%), fluorobenzoate (4%), toluene (4%), steroid (4%), chlorocyclohexane and chlorobenzene (3%), atrazine (3%), naphthalene (3%), and polycyclic aromatic hydrocarbon (2%) (Fig. 3). Stoichiometry of the reactions of pathways associated with the degradation of benzene,

Pathways associated with aromatic compound metabolism, with their respective abundances, explored in a bacterial consortium using whole genome sequencing.
BD – benzoate degradation, ABD – aminobenzoate degradation, FBD – fluorobenzoate degradation, CLAD – chloroalkane and chloroalkene degradation, CLHD – chlorocyclohexane and chlorobenzene degradation, TD – toluene degradation, XD – xylene degradation, SD – styrene degradation, AD – atrazine degradation, CD – caprolactam degradation, DD – dioxin degradation, ND – naphthalene degradation, PAHD – polycyclic aromatic hydrocarbon degradation, SD – steroid degradation, MP450 – metabolism of xenobiotics by cytochrome P450
Stoichiometry of reactions involved in benzene, xylene and toluene degradation identified in the current study bacteria based on WGS.
Reaction no. | Reactants | Products |
---|---|---|
Benzene degradation pathway | | |
I | C6H6 (benzene) + H2 + CO2 | C7H8O + [O-] |
II | C7H8O | C7H6O + H2 |
III | C7H6O + [O-] | C7H6O2 |
IV | C7H6O2 + [O-] + H2O | C7H8O4 |
V | C7H8O4 | C6H6O2 |
VI | C6H6O2 + 2 [H+] | C6H5O4- + 3 [H+] |
VII | C6H5O4- | C6H6O4 |
VIII | C6H6O4 | C6H5O4 + [H+] |
IX | C6H5O4 | C6H5O52- |
X | C6H5O52- + [H+] + C21H36N7O16P3S | C27H42N7O20P3S + [O] |
XI | C27H42N7O20P3S + 5H2O | C23H38N7O17P3S + 4CO2 + 7H2O |
Xylene degradation pathway | | |
I | C8H10 + [O-] | C8H10O |
II | C8H10O + CO2 | C9H10O2 + [O-] |
III | C9H10O2 + 2O2 | C8H9O4 + CO2 + [H+] |
IV | C8H9O4 | C7H8O2 + CO2 + [H+] |
V | C7H8O2 + O2 | C7H8O4 |
VI | C7H8O4 + [O] | C7H8O5 |
VII | C7H8O5 | C7H8O5 |
VIII | C7H8O5 | C6H8O3 + CO2 |
IX | C6H8O3 + H2O | C6H10O4 |
X | C6H10O4 + 2O2 + H2O | 3CO2 + C3H3O3- + H2O |
Xylene degradation pathway | | |
I | C8H10 + [O] | C8H10O |
II | C8H10O | C8H8O + H2 |
III | C8H8O + CO2 + H2 | C9H10O2 + H2O |
IV | C9H10O2 + CO2 | C8H8O4 + H2 |
Xylene degradation pathway | | |
I | C8H10 + [O-] | C8H10O |
II | C8H10O | C8H8O + H2 |
III | C8H8O + CO2 + H2 | C9H10O2 + [O] |
IV | C9H10O + CO2 | C8H10O4 |
V | C8H10O4 | C7H8O2 + CO2 + H2 |
VI | C7H8O2 + O2 | C7H8O4 |
VII | C7H8O4 + [O-] | C5H6O3 + 2CO2 + H2 |
VIII | C5H6O3 + H2O | C5H7O4- + [H+] |
IX | C5H7O4- + [O-] | C3H3O3- (pyruvate) + 2CO2 + 2H2 |
Toluene degradation pathway | | |
I | C6H5CH3 (toluene) + [O] | C6H5CH2OH |
II | C6H5CH2OH (benzyl alcohol) | C6H5CHO + H2 |
III | C6H5CHO (benzaldehyde) + [O] | C7H5O2 + [H] |
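Whether a tabulated reaction conserves atoms can be checked mechanically by counting the elements on each side. The sketch below is a minimal illustration under simplifying assumptions: formulas contain no brackets, charges are ignored, and the oxidant written as [O] is treated as a single oxygen atom. The function names are our own.

```python
import re
from collections import Counter

# One element symbol followed by an optional count, e.g. "C6", "H", "O2".
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H5CH2OH' (no brackets, no charge)."""
    counts = Counter()
    for element, number in TOKEN.findall(formula):
        counts[element] += int(number) if number else 1
    return counts


def side_atoms(terms):
    """Sum atoms over one side of a reaction given as (coefficient, formula) pairs."""
    total = Counter()
    for coefficient, formula in terms:
        for element, n in parse_formula(formula).items():
            total[element] += coefficient * n
    return total


def is_balanced(reactants, products):
    """True when both sides contain the same number of each element."""
    return side_atoms(reactants) == side_atoms(products)


# Toluene pathway, reaction I: C6H5CH3 + [O] -> C6H5CH2OH
toluene_step_1 = is_balanced([(1, "C6H5CH3"), (1, "O")], [(1, "C6H5CH2OH")])
# Toluene pathway, reaction II: C6H5CH2OH -> C6H5CHO + H2
toluene_step_2 = is_balanced([(1, "C6H5CH2OH")], [(1, "C6H5CHO"), (1, "H2")])
print(toluene_step_1, toluene_step_2)
```

Charged species such as C3H3O3- would need the charge stripped, or tracked separately, before being passed to this parser.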
Statistics of contigs produced by the metaSPAdes program.
No. | QUAST quality parameter | Value |
---|---|---|
1 | Contigs ≥ 1,000 bp (no.) | 4,992 |
2 | Contigs ≥ 5,000 bp (no.) | 940 |
3 | Contigs ≥ 10,000 bp (no.) | 431 |
4 | Contigs ≥ 25,000 bp (no.) | 169 |
5 | Contigs ≥ 50,000 bp (no.) | 76 |
6 | Largest contig (bp) | 469,435 |
7 | Total length (bp) | 30,570,959 |
8 | Total length ≥ 1,000 bp (bp) | 27,559,880 |
9 | Total length ≥ 5,000 bp (bp) | 19,294,370 |
10 | Total length ≥ 10,000 bp (bp) | 15,759,083 |
11 | Total length ≥ 25,000 bp (bp) | 11,859,173 |
12 | Total length ≥ 50,000 bp (bp) | 8,511,336 |
13 | N50 (bp) | 10,938 |
Functional annotation of genes associated with aromatic compound degradation via the KEGG database.
No. | Associated biochemical pathway | Genes identified |
---|---|---|
1 | Xylene degradation pathway | toluene methyl-monooxygenase [EC:1.14.15.26], |
2 | Toluene degradation pathway | toluene methyl-monooxygenase [EC:1.14.15.26], |
3 | Benzene | benzoate/toluate 1,2-dioxygenase subunit alpha [EC:1.14.12.10], dihydroxycyclohexadiene carboxylate dehydrogenase [EC:1.3.1.25], |
4 | Aminobenzoate degradation pathway | benzaldehyde dehydrogenase (NAD) [EC:1.2.1.28], |
5 | Fluorobenzoate degradation pathway | benzoate/toluate 1,2-dioxygenase subunit alpha [EC:1.14.12.10], |
WGS strategies are proving to be a source of insights into the biodegradation capabilities of bacteria (Oyewusi et al. 2021), as they can decode both uncultivable and cultivable microbes. The current study attempts to decode the underlying benzene- and xylene-metabolizing genes and pathways of cultivable bacterial species. Ten out of eleven bacteria analyzed belonged to the genus
Taxonomic profiling revealed 49%
WGS analysis revealed the enzymes involved in

Reconstruction of ortho, meta and para-xylene degradation pathways based on whole genome shotgun functional annotation.
The
Following this, 3-methylcatechol is produced, which is oxidized into
The
During degradation of
Following this, benzaldehyde dehydrogenase performs oxidation and forms
The pathways identified in the current bacteria for the degradation of benzene, 3-aminobenzene sulfonate, toluene, and 2-, 3-, and 4-fluorobenzoate are shown in Fig. 5. In the benzene degradation pathways, benzene is first converted into benzyl alcohol by benzyl alcohol dehydrogenase. Then, benzyl alcohol is converted into benzaldehyde, which is transformed into benzoate by benzaldehyde dehydrogenase. Benzoate is oxidized by benzoate/toluate 1,2-dioxygenase subunit alpha into

Reconstruction of benzene, 3-aminobenzene sulfonate, toluene and 2-, 3- and 4-fluorobenzoate degradation pathways based on whole genome shotgun functional annotation.
During degradation of 3-aminobenzene sulfonate, the compound is first converted into 3-sulfocatechol by 2-aminobenzene sulfonate 2,3-dioxygenase. Following this, oxidation by catechol 2,3-dioxygenase yields 2-hydroxymuconate, which is then tautomerized into gamma-oxalocrotonate by 2-hydroxymuconate tautomerase. Afterwards, decarboxylation by 2-oxo-3-hexenedioate decarboxylase forms 2-oxopent-4-enoate, which undergoes hydration to 4-hydroxy-2-oxopentanoate in a reaction catalyzed by 2-keto-4-pentenoate hydratase. The 4-hydroxy-2-oxopentanoate is further cleaved into acetaldehyde and pyruvate by 4-hydroxy-2-oxovalerate aldolase. All the enzymes involved in this pathway, except 2-aminobenzene sulfonate 2,3-dioxygenase, 2-hydroxymuconate tautomerase, and 2-oxo-3-hexenedioate decarboxylase, were identified in the current bacteria.
During toluene degradation, toluene is initially oxidized by toluene methyl-monooxygenase into benzyl alcohol, which is then oxidized into benzaldehyde by aryl-alcohol dehydrogenase. Following this, benzaldehyde is converted into benzoate by benzaldehyde dehydrogenase. All three enzymes involved in these steps were identified in the bacteria in the current study. This pathway is consistent with earlier reported toluene degradation pathways in BTEX-degrading
Enzyme benzoate 1,2-dioxygenase subunit alpha that catalyzes the transformation of 2-, 3-, and 4-fluorobenzoate into 6-, 3-, 5-, and 4-fluorocyclohexadiene-
In addition to enzymes associated with the above pathways, some additional enzymes were also identified in bacteria, including amidase, 3-hydroxybenzoate monooxygenase, 4-hydroxybenzoate decarboxylase subunit c, 3-carboxy-

Reactions of benzene and xylene metabolism catalyzed by bacterial enzymes identified in the current study using whole genome shotgun functional annotation.
Enzymes identified in the current study are in accordance with previous literature, as a study focused on WGS of twenty-six different strains of bacteria reported the enzymes associated with benzene, xylene, benzoate, naphthalene, and toluene (Révész et al. 2020; Eze 2021; Bedics et al. 2022; Hossain et al. 2024). Another study reported xylene degrading enzyme catechol 2,3-dioxygenase in
The current project documenting the Gram-positive hydrocarbonoclastic
In addition, these isolates exhibit the coexistence of metabolic pathways for diverse substrates. i.e., benzene, toluene,
Despite these facts, the current data are significant because they document Gram-positive bacteria, which can be more potent bioremediating agents than previously reported Gram-negative bacteria. Owing to the unique characteristics of the Gram-positive cell wall, including the absence of an outer membrane, these bacteria have a greater tendency to take up organic pollutants and to bind and accumulate heavy metals from external sources (Liaqat and Sabri 2008). These features render them more potent bioremediation candidates than Gram-negative bacteria (Ganesh and Lin 2009; Nanda et al. 2019). Gram-negative bacteria possess efflux pumps, which confer tolerance to various pollutants such as organic solvents, heavy metals, dyes, and antibiotics and thereby contribute to bacterial survival in highly polluted environments. According to the literature, efflux pumps contribute to the bioremediation potential of bacteria by allowing them to persist in contaminated environments (Blair et al. 2014).
Although bioremediation is considered an eco-friendly mode of pollutant degradation, a bacterium possessing pumps that confer antibiotic and other multi-drug resistance can become a source of spread of multi-drug resistance (MDR) (Cunningham et al. 2020). Consequently, such bacteria cannot be used as whole cells for eco-friendly bioremediation. Hence, Gram-positive bacteria can be more potent and sustainable bioremediating agents because they can accumulate large concentrations of pollutants in their cells and have the least chance of developing MDR due to the absence of efflux pumps (Biswas et al. 2021).