Comprehensive Analysis of Codon Usage Bias in Human Papillomavirus Type 51
Article category: ORIGINAL PAPER
Published online: 28 Oct 2024
Pages: 455 - 465
Received: 14 May 2024
Accepted: 03 Sep 2024
DOI: https://doi.org/10.33073/pjm-2024-036
© 2024 Xiaochun Tan et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Human papillomavirus (HPV) belongs to the
Furthermore, HPV-51 has been implicated in various anogenital cancers and lesions, such as vulvar, vaginal, penile, and anal cancers (Muñoz et al. 2003). While HPV-16 and HPV-18 are commonly acknowledged as high-risk types, HPV-51 has been identified in a significant proportion of cervical cancer cases globally, emphasizing its relevance in HPV-related diseases. A study highlights that while HPV-51 is less common among high-risk types, it still accounts for a notable percentage of HPV-related cancers (Smith et al. 2007). The prevalence of HPV-51 varies by region, but it has been detected in cervical, anal, and oropharyngeal cancers. While HPV-51 may not be as widespread as some HPV types, its potential to cause cancer underscores its importance in HPV-related disease prevention and management (Smith et al. 2007).
Codon bias is the phenomenon wherein synonymous codons coding for the same amino acid are not uniformly utilized within an organism’s genome (Sharp and Li 1986). This phenomenon is prevalent across diverse organisms, including viruses (Sharp et al. 1988). Understanding codon bias offers insights into evolutionary dynamics, translational efficiency, and host adaptation strategies (Das et al. 2006). Several factors influence an organism’s codon usage pattern, such as the abundance of specific transfer RNAs (tRNAs) for each codon, mutation bias, and natural selection (Plotkin and Kudla 2011). High-frequency codons, commonly found in highly expressed genes, often align with abundant cellular tRNAs, promoting efficient translation. Conversely, low-frequency codons may lead to less efficient translation due to the limited availability of corresponding tRNAs, potentially affecting protein synthesis rates (Koonin and Novozhilov 2009). Viruses often adjust their codon usage to match that of their hosts, which improves viral replication and helps them avoid host immune responses (Hershberg and Petrov 2008; Chaney and Clark 2015). This adaptation underscores the importance of codon usage bias in understanding the evolutionary processes shaping genomes and optimizing heterologous gene expression for biotechnological applications (Quax et al. 2015). Codon usage can affect protein folding, stability, function, and the yield of recombinant proteins in expression systems. Consequently, codon optimization strategies are often employed to boost the expression of foreign genes in host organisms, especially in microbial systems used for biopharmaceutical production (Sharp and Devine 1989).
To date, there have been no reports on the codon preference of HPV-51. Given its clinical relevance and its association with various cancers, understanding its codon usage patterns could offer valuable insights into its evolutionary dynamics and interaction with host cellular machinery. This study aims to fill this gap by providing a comprehensive analysis of the codon usage bias of HPV-51.
We acquired HPV-51 genome sequences from the NCBI GenBank database. Strict criteria were applied to ensure quality: sequences had to be 7,000–8,500 bp long with < 5 ambiguous sites, and those with ≥ 70 consecutive “N” characters or duplicates were removed. Each coding sequence (CDS) had to consist of a number of nucleotides divisible by three, start with ATG, and end with TAG, TGA, or TAA. In total, 64 HPV-51 genomes were analyzed; their accession numbers are listed in Table SI.
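The selection criteria above can be sketched as a small filter. This is only an illustration under stated assumptions: the function names (`genome_passes`, `cds_is_valid`, `deduplicate`) are hypothetical, "ambiguous site" is taken to mean any character other than A, C, G, or T, and duplicates are taken to mean identical sequences. It is not the pipeline used in the study.

```python
import re

STOP_CODONS = {"TAG", "TGA", "TAA"}
NON_ACGT = re.compile(r"[^ACGT]")


def genome_passes(seq, min_len=7000, max_len=8500, max_ambiguous=5, n_run=70):
    """Genome-level checks: length window, ambiguity count, long N runs."""
    seq = seq.upper()
    if not (min_len <= len(seq) <= max_len):
        return False
    # Assumption: every non-ACGT character counts as one ambiguous site.
    if len(NON_ACGT.findall(seq)) >= max_ambiguous:
        return False
    # Reject a run of n_run or more consecutive N characters.
    if "N" * n_run in seq:
        return False
    return True


def cds_is_valid(cds):
    """CDS-level checks: length divisible by three, ATG start, stop codon end."""
    cds = cds.upper()
    return len(cds) % 3 == 0 and cds.startswith("ATG") and cds[-3:] in STOP_CODONS


def deduplicate(records):
    """Keep the first (id, sequence) pair for each distinct sequence."""
    seen, kept = set(), []
    for rec_id, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            kept.append((rec_id, seq))
    return kept
```

For example, `cds_is_valid("ATGAAATAA")` is true, while a CDS that is not a multiple of three in length, or that lacks the ATG start, is rejected.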
We first looked at the nucleotide content of HPV-51’s coding sequences throughout its genome. Then, we examined the distribution of nucleotides in the third position of synonymous codons, denoted as %A3s, %C3s, %T3s, and %G3s. The examination was carried out utilizing CodonW (
Relative dinucleotide abundance analysis involves examining the frequency distribution of dinucleotide pairs within a given sequence. This analysis provides insights into the compositional biases and potential functional significance of dinucleotide patterns within nucleic acid sequences. The formula for calculating dinucleotide frequency is:
In this equation,
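For illustration, relative dinucleotide abundance is commonly expressed as an odds ratio: the observed frequency of the dinucleotide XY divided by the product of the frequencies of X and Y. The sketch below assumes that form and an input containing only A, C, G, and T; the function name `dinucleotide_odds_ratios` is hypothetical, and this is not presented as the exact procedure used in the study.

```python
from collections import Counter


def dinucleotide_odds_ratios(seq):
    """Return observed/expected ratios for all 16 dinucleotides.

    Expected frequency of XY is f(X) * f(Y); a ratio of 1.0 means the
    dinucleotide occurs as often as base composition alone predicts.
    """
    seq = seq.upper()
    n = len(seq)
    base_freq = {base: count / n for base, count in Counter(seq).items()}
    # Overlapping dinucleotides: positions (0,1), (1,2), ...
    pairs = Counter(seq[i:i + 2] for i in range(n - 1))
    total_pairs = n - 1
    ratios = {}
    for x in "ACGT":
        for y in "ACGT":
            expected = base_freq.get(x, 0.0) * base_freq.get(y, 0.0)
            if expected > 0:
                ratios[x + y] = (pairs[x + y] / total_pairs) / expected
    return ratios
```

Values well below 1.0 would correspond to underrepresented dinucleotides and values well above 1.0 to overrepresented ones.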
RSCU is a metric used to quantify the frequency bias of synonymous codons in a given gene or genome. It measures the relative abundance of each codon compared to the expected frequency under the assumption of equal usage of synonymous codons for the same amino acid.
The formula for calculating RSCU is:
In this context,
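As a sketch of the definition above, RSCU for a codon can be computed as its observed count divided by the mean count of all synonymous codons for the same amino acid. The code assumes the standard genetic code and excludes Met, Trp, and stop codons, which leaves 59 synonymous codons; the function name `rscu` is hypothetical and the study itself may have used different tooling.

```python
from collections import Counter, defaultdict

# Standard genetic code in the usual T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def rscu(cds):
    """Return {codon: RSCU} for the 59 synonymous codons seen in a CDS."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    # Group codons by amino acid, skipping stops, Met and Trp.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":
            families[aa].append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from this sequence
        for c in codons:
            # observed count / (total / number of synonymous codons)
            values[c] = counts[c] * len(codons) / total
    return values
```

With equal usage every codon in a family has RSCU = 1; a value above 1 marks a codon used more often than expected, and a value below 1 marks one used less often.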
The GRAVY score computes the mean hydrophobicity across all amino acids within a sequence. The score spans from – 2.0 to + 2.0, where positive values denote hydrophobic proteins and negative values signify hydrophilic ones (Kyte and Doolittle 1982). GRAVY values were calculated using
The AROMO score evaluates the frequency of aromatic amino acids, specifically Phe, Tyr, and Trp, within a provided amino acid sequence. AROMO values were calculated using CodonW (
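Both protein-level indices are simple averages, as the following sketch shows. It assumes the Kyte and Doolittle (1982) hydropathy scale and single-letter amino acid codes; the function names `gravy` and `aromo` are hypothetical, and residues outside the 20 standard amino acids are ignored here as a simplifying assumption.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AROMATIC = set("FYW")


def _standard_residues(protein):
    """Keep only residues that have a hydropathy value."""
    return [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]


def gravy(protein):
    """Mean hydropathy over all residues (positive = more hydrophobic)."""
    residues = _standard_residues(protein)
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def aromo(protein):
    """Fraction of residues that are Phe, Tyr or Trp."""
    residues = _standard_residues(protein)
    return sum(aa in AROMATIC for aa in residues) / len(residues)
```

For instance, `aromo("FAYW")` gives 0.75 because three of the four residues are aromatic.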
ENC is a metric used to assess how efficiently codons are utilized within a gene, indicating how far it deviates from random selection. The gene’s length does not influence this measure. ENC values range from 20, where each amino acid corresponds to a single effective codon, to 61, indicating equal utilization of codons. Generally, an ENC value below 35 implies a bias towards particular codon preferences (Wright 1990).
The ENC is calculated using a distinct mathematical formula:
Here,
The calculation of
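A minimal sketch of an ENC calculation follows, assuming the form usually attributed to Wright (1990): for each amino acid with n observed codons and codon frequencies p_i, the homozygosity is F = (n Σ p_i² − 1) / (n − 1), and ENC = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, where Fk is the mean F over amino acids with k synonymous codons. The function name `enc`, the fallback for a missing three-fold class, and the cap at 61 are assumptions of this sketch and may differ from what CodonW does.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))

# Number of amino acids in each degeneracy class of the standard code.
CLASS_SIZES = {2: 9, 3: 1, 4: 5, 6: 3}


def enc(cds):
    """Effective number of codons for one coding sequence."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    homozygosity = defaultdict(list)  # degeneracy -> F values
    for codons in families.values():
        k = len(codons)
        n = sum(counts[c] for c in codons)
        if k == 1 or n < 2:
            continue  # Met, Trp, or too few observations
        sum_p2 = sum((counts[c] / n) ** 2 for c in codons)
        f = (n * sum_p2 - 1) / (n - 1)
        if f > 0:
            homozygosity[k].append(f)

    mean_f = {k: sum(v) / len(v) for k, v in homozygosity.items()}
    # Assumption: if Ile is absent, approximate F3 by the mean of F2 and F4.
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2
    if any(k not in mean_f for k in CLASS_SIZES):
        raise ValueError("too few codons to estimate ENC")

    value = 2 + sum(size / mean_f[k] for k, size in CLASS_SIZES.items())
    return min(value, 61.0)  # cap at the number of sense codons
```

When every amino acid uses a single codon, each F equals 1 and the result is 2 + 9 + 1 + 5 + 3 = 20, the lower bound quoted above.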
For this investigation, PR2 was employed to assess the influence of natural selection and mutational pressure on codon usage. The horizontal axis represents AT skew [A3/(A3 + T3)], while the vertical axis signifies GC skew [G3/(G3 + C3)]. When data points converge around the midpoint (0.5, 0.5), indicating A = T and G = C, it implies an equilibrium between natural selection and mutational pressure. The dispersion of data points from this midpoint indicates the extent of deviation from PR2 (Sueoka 1995).
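The two plot coordinates described above can be computed from third-position base counts, as in this sketch. It counts all third positions in a CDS, which is an assumption; PR2 analyses are often restricted to particular codon families, and the text does not say which codons were used. The function name `pr2_coordinates` is hypothetical.

```python
from collections import Counter


def pr2_coordinates(cds):
    """Return (A3/(A3+T3), G3/(G3+C3)) from third codon positions."""
    cds = cds.upper()
    third = Counter(cds[i + 2] for i in range(0, len(cds) - 2, 3))
    at_total = third["A"] + third["T"]
    gc_total = third["G"] + third["C"]
    if at_total == 0 or gc_total == 0:
        raise ValueError("need A/T and G/C at third positions")
    at_bias = third["A"] / at_total  # horizontal axis in the text
    gc_bias = third["G"] / gc_total  # vertical axis in the text
    return at_bias, gc_bias
```

A sequence with A3 = T3 and G3 = C3 maps to (0.5, 0.5), the midpoint of the plot.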
The neutrality plot is a valuable tool in evaluating the impact of mutational pressure and natural selection on codon usage bias. It illustrates the average GC content of the first and second positions (GC12s) along the vertical axis, with the GC content of the third position (GC3s) represented on the horizontal axis. A linear regression approach is employed to analyze the data. A slope approaching 1 signifies that mutational pressure predominantly shapes codon usage patterns. Conversely, a slope nearing 0 indicates that natural selection significantly affects codon usage bias (Chen et al. 2014).
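The regression step can be sketched with ordinary least squares, taking GC3s as the predictor and GC12s as the response. Reading the slope as the share attributable to mutational pressure, and one minus the slope as the share attributable to natural selection, is a common convention that is assumed here; the function names are hypothetical.

```python
def neutrality_regression(gc3, gc12):
    """Least-squares fit gc12 = slope * gc3 + intercept."""
    n = len(gc3)
    mean_x = sum(gc3) / n
    mean_y = sum(gc12) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3, gc12))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def pressure_shares(slope):
    """Assumed convention: |slope| = mutation share, the rest = selection."""
    mutation = abs(slope)
    return mutation, 1 - mutation
```

Under this convention, a slope of about 0.27 would correspond to roughly 27% mutational pressure and 73% natural selection.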
The ENC plot illustrates a standard curve representing the correlation between ENC and GC3s with no selective pressures.
The standard curve is determined by the following calculation:
If the ENC value observed falls beneath this curve, it suggests that influences beyond mutation pressure, such as natural selection, play a role in shaping codon usage bias (Wright 1990). This analysis was conducted using R 4.1.1.
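For reference, the standard curve is usually written as ENC_expected = 2 + s + 29 / (s² + (1 − s)²), where s is GC3s expressed as a fraction. The sketch below assumes that form; it is written in Python for illustration, whereas the study reports using R, and the function name `expected_enc` is hypothetical.

```python
def expected_enc(gc3s):
    """Expected ENC when codon bias is determined by GC3s alone.

    gc3s is a fraction between 0 and 1, not a percentage.
    """
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


def below_curve(observed_enc, gc3s):
    """True when an observed ENC value lies under the standard curve."""
    return observed_enc < expected_enc(gc3s)
```

As a worked example using values reported in this paper, s = 0.3103 gives an expected ENC of about 53.0, so an observed mean ENC of 48.25 lies below the curve.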
CAI elucidates the prevalence of preferred codons within highly expressed genes. Its scale spans from 0.0 to 1.0, where higher scores signify heightened potential for gene expression. Scores nearing 1 denote the utilization of codons with elevated RSCU within the gene. Notably, CAI remains unaffected by sequence length, relying solely on amino acid frequency (Xia 2007). In this study, we calculated the CAI values for the entire coding sequence of HPV-51 using the RSCU reference set tailored for Homo sapiens, obtained from the codon usage database at
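A minimal sketch of a CAI calculation follows, assuming the usual definition: each codon receives a relative adaptiveness w equal to its reference count divided by the count of the most used synonymous codon, and CAI is the geometric mean of w over the codons of the gene. The reference counts would come from a host codon usage table; the function names, the exclusion of Met, Trp, and stop codons, and the 0.5 floor for codons absent from the reference are assumptions of this sketch.

```python
import math
from collections import defaultdict

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def relative_adaptiveness(reference_counts):
    """Return {codon: w} from a {codon: count} reference table."""
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":
            families[aa].append(codon)
    weights = {}
    for codons in families.values():
        best = max(reference_counts.get(c, 0) for c in codons)
        if best == 0:
            continue  # amino acid missing from the reference set
        for c in codons:
            # Floor of 0.5 avoids log(0) for unseen codons (assumption).
            weights[c] = max(reference_counts.get(c, 0), 0.5) / best
    return weights


def cai(cds, weights):
    """Geometric mean of w over the codons of a CDS that have a weight."""
    cds = cds.upper()
    codons = (cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))
```

A gene built only from the most used codon of each family scores 1.0, and the score falls as rarer codons are used.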
Different isoacceptor tRNAs, derived from various tRNA types, crucially bind specific codons corresponding to amino acids, ensuring accurate incorporation into polypeptide chains during translation. The frequency of human tRNA genes is sourced from
The RCDI evaluates the similarity in codon usage between HPV and its human host, indicating the viral genome’s translation efficiency within the host’s genetic framework. A value nearing 1 signifies a significant congruence in codon usage, suggesting a robust adaptation of the virus to its host (Puigbò et al. 2010). RCDI calculations were performed via
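As an illustration, RCDI is often written as RCDI = Σ (CiFa / CiFh) · Ni / N, where CiFa is the relative frequency of codon i among its synonymous codons in the test sequence, CiFh is the same quantity in the host reference, Ni is the number of occurrences of codon i in the test sequence, and N is the total number of codons. The sketch below assumes that form, includes all sense codons, and skips codons that are absent from the host table; the function names are hypothetical and the web tool used in the study may handle these cases differently.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def synonymous_fractions(counts):
    """Fraction of each codon among the codons for its amino acid."""
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)
    fractions = {}
    for codons in families.values():
        total = sum(counts.get(c, 0) for c in codons)
        if total:
            for c in codons:
                fractions[c] = counts.get(c, 0) / total
    return fractions


def rcdi(cds, host_counts):
    """Relative codon deoptimization index of a CDS against a host table."""
    cds = cds.upper()
    codons = (cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    gene_counts = Counter(c for c in codons if CODON_TABLE.get(c, "*") != "*")
    gene_frac = synonymous_fractions(gene_counts)
    host_frac = synonymous_fractions(host_counts)
    weighted, n = 0.0, 0
    for codon, count in gene_counts.items():
        if host_frac.get(codon, 0) > 0:  # skip codons unseen in the host
            weighted += gene_frac[codon] / host_frac[codon] * count
            n += count
    if n == 0:
        raise ValueError("no codons shared with the host table")
    return weighted / n
```

When the gene uses synonymous codons in exactly the host's proportions, every ratio is 1 and the index equals 1; larger values indicate greater divergence from host usage.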
The results are shown as means with standard deviations (SDs). Statistical analysis was conducted with SPSS® 22.0 software (IBM® SPSS® Statistics for Windows, version 22.0, IBM Corp., USA).
Across all HPV-51 genomes, GC3s is below 32%, suggesting a bias towards A/T-ending codons (Table I). Of the three positional GC contents, GC1s is the highest and GC3s the lowest. At the third position of synonymous codons, A3s and T3s occur more frequently than G3s and C3s.
Table I.
Nucleotide composition of complete genomes in HPV-51.
Genotype | GC1s* | GC2s | GC3s | T3s | C3s | A3s | G3s |
---|---|---|---|---|---|---|---|
HPV-51 | 46.23 ± 0.27 | 43.26 ± 0.22 | 31.03 ± 0.24 | 43.36 ± 5.56 | 17.08 ± 2.19 | 44.32 ± 5.68 | 19.59 ± 2.51 |
* Values are presented as mean% ± SD%.
The dinucleotide frequencies observed in HPV-51 differ from the anticipated value of 1.0, indicating unique patterns influenced by translational selection. HPV-51 shows decreased occurrences of ApA, CpG, and TpC (
Fig. 1.
Analysis of dinucleotide frequencies in the complete coding sequences of HPV-51. Dashed lines represent overrepresented values (

HPV-51 exhibited a preference for 30 specific codons (RSCU > 1; TTA[Leu], ACA[Thr], TTT[Phe], GCA[Ala], TAT[Tyr], AGA[Arg], AGT[Ser], GTA[Val], ATT[Ile], CCT[Pro], CCA[Pro], TCT[Ser], TGT[Cys], GAA[Glu], AAA[Lys], ATA[Ile], GAT[Asp], GCT[Ala], AAT[Asn], TTG[Leu], CAA[Gln], GGT[Gly], CGT[Arg], CAT[His], GTG[Val], AGG[Arg], ACT[Thr], GTT[Val], TCA[Ser], GGC[Gly]), with 26 of them ending in A/T. This includes eight over-represented codons (RSCU > 1.6): TTA, ACA, TTT, GCA, TAT, AGA, AGT, and GTA (Fig. 2).
Fig. 2.
Analysis of Relative Synonymous Codon Usage (RSCU) in the complete coding sequences of HPV-51. Codon values less than 0.6, between 0.6 and 1.6, and greater than 1.6 indicate low, normal, and excessive codon usage, respectively.

Among the 59 synonymous codons, there are 15 underrepresented codons (RSCU < 0.6), which include TTC [Phe], CTC [Leu], TCG [Ser], GTC [Val], ATC [Ile], ACG [Thr], TAC [Tyr], CCG [Pro], GCG [Ala], CGG [Arg], GCC [Ala], TGC [Cys], TGC [Cys], CTT [Leu], and CTG [Leu]. All of these codons except CTT end with G or C.
Both RSCU and the composition of nucleotides suggest that HPV-51 tends to use codons ending with A/T. This implies that the nucleotide at the third position of codons influences the selection of synonymous codons in HPV-51 (Jenkins and Holmes 2003).
To examine how dinucleotide usage affects codon preference, we compared the occurrence of overrepresented and underrepresented dinucleotides within both favored and less favored codons. All six codons containing ApA showed RSCU values below 1.6, indicating a suppression of this dinucleotide. Five of the eight CpG-containing codons were underrepresented (CGG, GCG, CCG, ACG, and TCG). Similarly, five of the eight TpC-containing codons were underrepresented (ATC, GTC, TCG, CTC, and TTC), indicating reduced usage of CpG and TpC. Of the eight codons containing CpA, six (ACA, GCA, CCA, CAA, CAT, and TCA) were preferred (RSCU > 1). Of the five codons containing TpG, three (TGT, TTG, and GTG) were preferred, showing a preference for CpA and TpG. The dinucleotide composition of HPV-51 exhibits a low occurrence of ApA, CpG, and TpC, with CpA and TpG being more prevalent than others. Trinucleotide usage patterns align with these dinucleotide trends, as evidenced by RSCU data. These observations suggest that dinucleotide frequency heavily influences the codon usage preferences in HPV-51.
The PR2 analysis revealed that the coding sequences of the HPV-51 genomes are closely clustered around the central point of the scatter plot, where A is approximately equal to T and C is approximately equal to G. This suggests only a minor difference between the contributions of mutational pressure and natural selection in HPV-51 (Fig. 3).
Fig. 3.
Analysis of Parity Rule 2 (PR2) constructed from the complete coding sequences of HPV-51.
The central point of the plot, represented by coordinates (0.5, 0.5), indicates a balance between mutation and selection rates.

We evaluated the ENC values for the complete genomes of HPV-51 to measure the extent of codon usage bias in its coding sequences. The average ENC was 48.25 ± 0.20, suggesting a predominantly low codon preference (Table II). In the ENC plot, the complete HPV-51 genomes cluster slightly below the standard curve. This indicates that the codon usage pattern throughout the HPV-51 coding genome is notably shaped by natural selection (Fig. 4).
Fig. 4.
Analysis of the Effective Number of Codons (ENC) plot for the complete coding sequences of HPV-51.
ENC values are plotted against the GC3s content. The black line represents the standard curve based on codon usage bias determined solely by GC3s composition. This standard curve illustrates the expected relationship between ENC and GC3s without selective pressures. The deviation of actual ENC values below this curve indicates a more significant influence of natural selection on codon bias.

Table II.
The Effective Number of Codons (ENC), Codon Adaptation Index (CAI), and Relative Codon Deoptimization Index (RCDI) values of complete genomes in HPV-51.
Genotype | ENC* | CAI | RCDI |
---|---|---|---|
HPV-51 | 48.25 ± 0.20 | 0.72 ± 0.00 | ± 0.01 |
* Values are presented as mean ± SD.
Neutrality analysis offers further insights into the factors shaping the codon usage preference of HPV-51. Within the complete coding genomes of HPV-51, natural selection predominantly shapes the pattern, contributing to about 73.37% of the observed bias (Fig. 5).
Fig. 5.
Neutrality analysis of the complete coding sequences of HPV-51.
The relationship between GC content at the first and second positions of the codon (GC12s) and the third position of the codon (GC3s) was analyzed. The dashed line depicts the correlation between GC12s and GC3s in HPV-51. A slope approaching 1 indicates that mutation pressure predominantly shapes the codon usage pattern, while a slope closer to 0 suggests a more substantial influence of natural selection.

In summary, the codon usage pattern observed in the HPV-51 genome shows a subtle balance between mutational pressure and natural selection, with the latter significantly molding the pattern.
Various isoacceptor tRNAs, originating from different tRNA types, bind to distinct codons representing specific amino acid residues. Table III presents the tRNA gene frequencies in human cells. It’s worth noting that several isoacceptor tRNAs can recognize a single codon, and this phenomenon varies among different species. Recognition of the codons most frequently used by HPV-51 by the most abundant isoacceptor tRNAs influences translational selection (Kumar et al. 2016). For the 18 amino acids encoded by multiple codons, optimal codon-anticodon base pairs are employed for Gly, Val, Leu, Phe, Lys, Asp, Glu, His, Gln, and Cys, indicating higher translation efficiency of HPV-51 within the human cellular translational system.
Table III.
Frequency of tRNA genes in human cells corresponding to the most preferentially used codons in HPV-51.
Amino acid | Most preferred codons in HPV-51 | tRNA isotypes in human cells | Total count |
---|---|---|---|
Ala (A) | GCG | AGC (22), GGC (0), CGC (4), TGC (8) | 34 |
Gly (G) | GGC | ACC (0), GCC (14), CCC (5), TCC (9) | 28 |
Pro (P) | CCG | AGG (9), GGG (0), CGG (4), TGG (7) | 20 |
Thr (T) | ACG | AGT (9), GGT (0), CGT (5), TGT (6) | 20 |
Val (V) | GTG | AAC (9), GAC (0), CAC (11), TAC (5) | 25 |
Ser (S) | TCG | AGA (9), GGA (0), CGA (4), TGA (4), ACT (0), GCT (8) | 25 |
Arg (R) | AGG | ACG (7), GCG (0), CCG (4), TCG (6), CCT (5), TCT (6) | 28 |
Leu (L) | CTG | AAG (9), GAG (0), CAG (9), TAG (3), CAA (6), TAA (4) | 31 |
Phe (F) | TTC | AAA (0), GAA (10) | 10 |
Asn (N) | AAT | ATT (0), GTT (20) | 20 |
Lys (K) | AAG | CTT (15), TTT (12) | 27 |
Asp (D) | GAC | ATC (0), GTC (13) | 13 |
Glu (E) | GAG | CTC (8), TTC (7) | 15 |
His (H) | CAC | ATG (0), GTG (10) | 10 |
Gln (Q) | CAG | CTG (13), TTG (6) | 19 |
Ile (I) | ATC | AAT (14), GAT (3), TAT (5) | 22 |
Tyr (Y) | TAT | ATA (0), GTA (13) | 13 |
Cys (C) | TGC | ACA (0), GCA (29) | 29 |
Trp (W) | TGG | CCA (7) | 7 |
Met (M) | ATG | CAT (9/10) | 19 |
CAI values assess the expression level of viral proteins within the host and the virus’s adaptability with its host. Sequences with elevated CAI values are deemed better suited for adapting to specific hosts compared to those with lower values. The CAI value for HPV-51 in relation to humans is 0.72 ± 0.00, suggesting a degree of adaptation to human codon preferences (Table II).
Table IV presents a correlation analysis among GC3s, CAI, GRAVY, and AROMO. GRAVY and AROMO are markers of natural selection influencing codon bias, while GC3s indicates compositional characteristics (Barbhuiya et al. 2019). A positive correlation between GC3s and AROMO suggests that increased compositional bias is associated with increased aromaticity. To assess whether GRAVY and AROMO, as markers of natural selection, influence gene expression (represented by CAI), we examined their correlation with CAI. However, no significant correlation between CAI, GRAVY, and AROMO was detected.
Table IV.
Correlation analysis between GC3s, CAI, GRAVY, and AROMO.
 | GC3s | CAI | GRAVY |
---|---|---|---|
CAI | –0.036 | | |
GRAVY | 0.143 | 0.168 | |
AROMO | 0.818*** | –0.135 | –0.122 |
*** -
Additionally, the RCDI values were utilized to evaluate the similarity in codon usage patterns between HPV-51 and its host. HPV-51 shows RCDI values approaching 2, suggesting a moderate similarity in codon preferences with humans (Table II).
These results indicate that HPV-51 adjusts to the host’s translation process and minimizes competition with cellular proteins. These adaptive strategies may contribute to the persistent presence of these viruses in human hosts.
HPV-51 belongs to the high-risk category of HPV, which is linked to multiple cancer types, such as cervical cancer. Despite its clinical importance, scant attention has been paid to its codon usage bias. This study presents a comprehensive analysis of codon usage bias in HPV-51, shedding light on its evolutionary dynamics, host adaptation strategies, and potential translational efficiency within human cells. We investigated codon usage patterns by analyzing coding sequences from 64 genomes of HPV-51.
The analysis of codon usage bias in HPV-51 shows a slight preference for A/T-ending codons. We additionally computed RSCU values for 59 synonymous codons, indicating that the favored codons in HPV-51 (RSCU > 1) mainly terminate with A or T. Such bias is commonly associated with the interaction of mutational pressure and natural selection, reflecting the virus’s evolutionary adaptation to optimize its replication and survival within the host (Sharp and Li 1987). The preference for A/T-ending codons may confer several advantages to HPV-51. First, these codons might be more abundant in the human genome, allowing the virus to camouflage its gene expression patterns and evade the host immune response (Karlin and Burge 1996). Second, in genomes with high AT content, a relationship frequently exists between codons ending in A or T and increased efficiencies in transcription and translation processes (Aktürk Dizman 2024). Additionally, the ENC values for HPV-51 suggest a relatively low overall codon usage preference. Similar findings of low codon usage bias were reported in other viruses like Zika virus (Tao and Yao 2020), molluscum contagiosum virus (Nair et al. 2021), and invertebrate iridescent virus (Aktürk Dizman 2023). The ENC value is an indicator of codon bias, with higher values signaling reduced bias and lower values suggesting a more pronounced bias towards particular codons. Previous studies have shown that viruses with higher ENC values tend to exhibit increased adaptability (Jenkins and Holmes 2003). Furthermore, a lower codon usage bias, as indicated by higher ENC values, may allow HPV-51 to conserve energy during viral biosynthesis and diminish interference with host protein synthesis, enhancing its overall fitness within the host environment (Wright 1990).
The frequency of dinucleotides significantly influences the preferences for codon usage in various organisms. Dinucleotide abundance reflects codon usage bias, mutational tendencies, and the impact of selective pressure and compositional constraints. Within viral genomes, dinucleotides’ proportional distribution significantly influences codon usage patterns (Upadhyay and Vivekanandan 2015). In our research, we examined 16 different dinucleotide frequencies. Notably, dinucleotides ApA, CpG, and TpC were less common, whereas CpA and TpG were more frequently observed than others. CpG depletion is a common viral strategy to evade host immune detection (Karlin and Burge 1995). Natural selection drives this reduction in CpG-containing codons, as host defenses recognize unmethylated CpG sequences, triggering immune responses (Khandia et al. 2019). To evade these defenses, viruses often reduce their CpG content (Vetsigian and Goldenfeld 2009). The increase in CpA and TpG may be compensatory, balancing the reduced CpG levels (Rima and McFerran 1997). Comparing dinucleotide frequencies with RSCU values of HPV-51 reveals that trinucleotide usage often mirrors dinucleotide preferences. The abundance or scarcity of specific dinucleotides can significantly shape the overall codon usage bias (Karlin and Burge 1995). In summary, dinucleotide frequencies are foundational elements shaping codon usage bias, reflecting the complex interplay between viral evolutionary pressures and host interactions in HPV-51.
Then, to comprehensively understand the factors shaping codon usage in HPV-51, we conducted PR2 analysis, ENC plots, and neutrality analysis. These analyses revealed a slight bias in codon usage, indicative of the intertwined implications of mutational pressure and natural selection. Interestingly, natural selection has emerged as the primary factor influencing codon preferences within HPV-51. This phenomenon likely signifies the virus’s adjustment to its host’s cellular environment and ability to evade the host’s immune responses. This preference might optimize viral protein synthesis and facilitate viral persistence. Our results align with previous research emphasizing the significance of natural selection in molding viral codon usage (Shackelton and Holmes 2008).
Additionally, we conducted an analysis of translational selection in HPV-51. In HPV-51, the most preferred codons utilize approximately half of the optimal codon-anticodon base pairings available in the human isoacceptor tRNA pool. This preference for specific codons might indicate evolutionary pressures that optimize translation efficiency and accuracy (Kanaya et al. 2001). Utilizing the most compatible codons with the available tRNA molecules can enhance translation rates and reduce errors during protein synthesis (Ikemura 1985). The presence of approximately half of the optimal codon-anticodon pairs in the human isoacceptor tRNA pool facilitates efficient and accurate protein synthesis, thereby contributing to the organism’s overall fitness. This preference for specific optimal codons is believed to result from co-evolution between codon usage and tRNA availability, ensuring an optimal balance between translational speed and accuracy (Akashi 2001).
The positive correlation between GC3s and AROMO values suggests an intriguing relationship between compositional bias and amino acid aromaticity. This correlation implies that an increased compositional bias towards GC-rich codons is associated with higher aromaticity in proteins. One possible explanation for this correlation could be that GC-rich codons tend to encode aromatic amino acids more frequently, leading to proteins with higher aromaticity. Additionally, the selection pressures that favor GC-rich codons might also indirectly select proteins with increased aromaticity due to functional advantages conferred by aromatic amino acids (Duret 2002). The lower GC codon content in HPV-51, which is influenced more significantly by natural selection, supports this explanation. The interplay between codon bias, composition, and amino acid properties is complex and multifaceted. Further research is needed to explore the mechanisms underlying these correlations and to elucidate the functional implications of increased aromaticity in proteins.
Finally, we evaluated the adaptation of HPV-51 to its human host using CAI and RCDI values. The CAI value for HPV-51 was determined to be 0.72 ± 0.00, indicating a moderate alignment with human codon preferences. This suggests that the virus has undergone adaptive evolution to utilize the host’s translational machinery efficiently. The RCDI value for HPV-51 was close to 2, further supporting its moderate adaptation to the human host. The concept of codon adaptation is crucial for understanding how viruses interact with their hosts at the molecular level. Viruses often optimize their codon usage to adapt to the translational machinery of the host cell, facilitating efficient production and duplication of viral proteins (Sharp et al. 2005). A high CAI value signifies a strong adaptation to host codon usage, whereas a low value may indicate a lack of adaptation or evolutionary divergence from the host. The RCDI value provides additional insights into the adaptation process by measuring the extent of codon deoptimization. A higher RCDI value suggests that the virus has minimized the use of non-preferred codons, which can lead to slower translation rates and increased competition with host mRNAs for ribosomal machinery (Plotkin and Kudla 2011). Therefore, an RCDI value close to 2 for HPV-51 indicates that the virus has struck a balance between efficiently utilizing the host’s translational machinery and minimizing excessive competition with host mRNAs. Our findings suggest that HPV-51 has undergone adaptive evolution to moderately align its codon usage with human preferences. This adaptation likely contributes to the virus’s ability to efficiently replicate and propagate within the human host, highlighting the importance of codon adaptation in viral-host interactions.
Our previous research analyzed codon preference in HPV-33 and HPV-58 (Tan et al. 2024). Here, we compare the codon preference results of HPV-51 with those of HPV-33 and HPV-58. All three types (HPV-33, HPV-58, and HPV-51) show a preference for codons ending in A or T, with GC3s values below 35%. The RSCU analysis of HPV-51 revealed 30 favored codons, 26 of which end in A/T, while among the 26 favored codons in HPV-33 and HPV-58, 25 end in A/T. The mean ENC values were 44.19 for HPV-33, 46.90 for HPV-58, and 48.25 for HPV-51, all above 35, indicating a relatively low level of codon preference. HPV-33, HPV-58, and HPV-51 exhibit specific patterns in dinucleotide frequency, with lower frequencies of ApA, CpG, and TpC, and higher frequencies of CpA and TpG. Analyses, including the ENC plot, PR2, and neutrality analysis, suggest that natural selection is the primary factor influencing codon usage in HPV-33, HPV-58, and HPV-51. The RCDI values for HPV-33, HPV-58, and HPV-51 are around 2, indicating a similar degree of codon usage preference to humans. These comparisons suggest that HPV-33, HPV-58, and HPV-51 share similar codon preferences, indicating common evolutionary mechanisms and selective pressures influencing codon usage.
While our study provides valuable insights into HPV-51’s codon usage bias, it is not without limitations. The analysis is based on genomic sequences from the NCBI GenBank database, which may not capture all HPV-51 variants or regional variations (Van Doorslaer et al. 2013). Future studies could benefit from analyzing additional HPV-51 genomes from diverse geographic regions and clinical samples to further elucidate its codon usage patterns and evolutionary dynamics (Xi et al. 2006). Our thorough examination of codon usage bias in HPV-51 reveals evolutionary patterns, host adaptation strategies, and translational efficiency within human cells. These findings enhance our understanding of HPV-51’s persistence, pathogenicity, and potential implications for HPV-related diseases.