Signal transducer and activator of transcription (STAT) factors regulate key aspects of cell growth, survival, and differentiation. The STAT pathway is linked to the Janus kinase (JAK) family of proteins: STATs are activated through phosphorylation by JAKs. STAT1, STAT2, STAT3, STAT4, STAT5a, STAT5b, and STAT6 are the members of the mammalian STAT family [1]. Each member responds to different cytokines and growth factors and has a distinct role in signal transduction. When specific chemical signals activate STAT proteins, they move into the nucleus of the cell and bind to specific regions of DNA. STAT3 is found in all of the body's tissues and plays a significant role in the development and function of various physiological systems. The protein is involved in the regulation of inflammation, which is one of the ways the immune system responds to infection or injury. STAT3 activation occurs as a result of cytokine binding and is required for cellular responses via JAK1, JAK2, or TYK2. The non-classical functions of STAT3 include the induction of microRNAs, binding to non-canonical motifs, formation of more complex signaling assemblies such as STAT tetramers, and contribution to epigenetic remodeling [2].
The major disorders associated with
Single nucleotide polymorphisms (SNPs) account for over 90% of sequence variations in the human genome [8] and play an important role in investigating potential biomarkers and identifying common genetic variants. These SNPs may have a deleterious or neutral effect on protein function associated with a variety of diseases and disorders. Missense variants alter protein-coding regions by substituting amino acids [9]. In silico methods have been developed for screening functional SNPs, detecting the effect of damaging non-synonymous single nucleotide polymorphisms (nsSNPs) in selected proteins, and predicting structural changes based on single amino acid substitutions in the protein. The in silico approach is a time- and cost-effective alternative to experimental techniques, already incorporates screening for deleterious nsSNPs, and could thus be used in future studies.
In this study, detailed investigations have been carried out using several in silico tools to evaluate the potentially detrimental effects of nsSNPs of the
Various in silico tools were used to predict the variations in the structure, stability, and function of the
The nsSNP distribution of the
To estimate the functional repercussions of nsSNPs in the coding area acquired from the dbSNP database, 7 best-performing web tools were employed sequentially.
Sorting Intolerant From Tolerant (SIFT;
Polymorphism Phenotyping v2 (PolyPhen2;
Based on the alignment score of a protein, Protein Variation Effect Analyzer (PROVEAN;
Based on evolutionary relationships, molecular activities, and interactions with other proteins, Protein Analysis through Evolutionary Relationship (PANTHER;
SNPs&GO (
Predictor of human Deleterious Single Nucleotide Polymorphisms (PhD-SNP;
I-Mutant 3.0 (
The ConSurf web server (
NetsurfP-2.0 [19] predicts solvent accessibility, secondary structure, structural disorder, and phi/psi dihedral angles for each residue in an amino acid sequence. The STAT3 protein sequence in FASTA format was submitted as the input query to this server.
HOPE (
SOPMA (
The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING;
RaptorX predicts the secondary and tertiary structure of the protein, contact and distance maps, solvent accessibility, disordered regions, functional annotation, and binding sites based on a 3D model. Binding site prediction is assessed by the pocket multiplicity, which indicates the quality of the predicted pocket. A predicted pocket is considered more accurate when its multiplicity is greater than 40.
GeneMANIA [22] (
AlphaFold (
The impact of a single point mutation on protein stability, conformation, and flexibility was evaluated, and protein dynamics were visualized, using the DynaMut server (
All the SNP data for the
Various in silico prediction tools such as SIFT, PolyPhen2, PROVEAN, SNPs&GO, PANTHER, and PhD-SNP were used to analyze disease-associated SNPs. Initially, all 417 nsSNPs were submitted to the SIFT server, which returned a prediction (tolerated or deleterious) for 160 nsSNPs; the remaining SNPs were not found. Of these 160, 67 nsSNPs were classified as deleterious with a SIFT score ≤0.05 and the remainder were tolerated. The nsSNPs were then submitted for PolyPhen2 analysis. To increase the accuracy of the prediction, the SIFT and PolyPhen2 predictions were combined: nsSNPs with a SIFT score ≤0.05 and a PolyPhen2 score >0.90 were selected, and 9 nsSNPs were identified as deleterious. The selected 9 nsSNPs were subjected to further in silico tools, namely PROVEAN, SNPs&GO, PANTHER, and PhD-SNP, and the results are given in
List of nsSNPs of
1 | rs145786768 | C/A | V507F | Deleterious (0.004) | Probably damaging (0.990) | Probably damaging | Disease (9) | Deleterious (−3.744) | Disease (6) | Decrease (−2.49) |
2 | rs193922716 | G/A | R335W | Deleterious (0) | Probably damaging (0.996) | Probably damaging | Disease (3) | Deleterious (−5.816) | Neutral (1) | Decrease (−0.36) |
3 | rs193922717 | C/T | E415K | Deleterious (0.003) | Probably damaging (0.955) | Probably damaging | Disease (5) | Deleterious (−3.097) | Neutral (0) | Decrease (−1.00) |
4 | rs193922719 | T/A | K591M | Deleterious (0.002) | Possibly damaging (0.751) | Probably damaging | Disease (8) | Deleterious (−4.949) | Disease (6) | Decrease (−0.13) |
5 | rs1803125 | G/T | Q32K | Deleterious (0.025) | Possibly damaging (0.868) | Probably damaging | Disease (0) | Neutral (−1.975) | Disease (3) | Decrease (−0.41) |
6 | rs11547455 | G/A | S629F | Deleterious (0.001) | Possibly damaging (0.481) | Probably damaging | Disease (5) | Deleterious (−3.097) | Disease (1) | Increase (0.64) |
7 | rs11547455 | G/A | S727F | Deleterious (0.002) | Probably damaging (0.974) | Probably damaging | Neutral (0) | Deleterious (−3.858) | Neutral (1) | Decrease (−0.20) |
8 | rs374063766 | C/G | Q198H | Deleterious (0.035) | Probably damaging (0.965) | Probably damaging | Neutral (3) | Neutral (−1.942) | Neutral (3) | Decrease (−0.82) |
9 | rs11547455 | G/A | S727F | Deleterious (0.002) | Probably damaging (0.974) | Probably damaging | Neutral (0) | Deleterious (−3.858) | Neutral (1) | Decrease (−0.20) |
nsSNPs, non-synonymous single nucleotide polymorphisms; PANTHER, Protein Analysis Through Evolutionary Relationship; PhD-SNP, Predictor of human Deleterious Single Nucleotide Polymorphisms; RI, reliability index; SNP, single nucleotide polymorphism.
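The combined SIFT and PolyPhen2 selection rule described in the text (SIFT score ≤0.05 together with PolyPhen2 score >0.90) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Prediction` class and the function name are illustrative, the first three entries reuse scores from the table, and the two `hypothetical_*` entries are invented solely to show variants being rejected.

```python
from dataclasses import dataclass

SIFT_CUTOFF = 0.05      # deleterious if score <= cutoff (threshold stated in the text)
POLYPHEN_CUTOFF = 0.90  # damaging if score > cutoff (threshold stated in the text)


@dataclass(frozen=True)
class Prediction:
    variant: str
    sift: float
    polyphen: float


def passes_combined_filter(p: Prediction) -> bool:
    """Return True only when both tools flag the variant as damaging."""
    return p.sift <= SIFT_CUTOFF and p.polyphen > POLYPHEN_CUTOFF


predictions = [
    Prediction("V507F", 0.004, 0.990),           # scores from the table
    Prediction("R335W", 0.0, 0.996),             # scores from the table
    Prediction("E415K", 0.003, 0.955),           # scores from the table
    Prediction("hypothetical_A", 0.200, 0.950),  # invented: tolerated by SIFT
    Prediction("hypothetical_B", 0.010, 0.300),  # invented: low PolyPhen2 score
]

selected = [p.variant for p in predictions if passes_combined_filter(p)]
print(selected)
```

Requiring agreement between the two tools trades sensitivity for specificity: a variant flagged by only one predictor is dropped.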
According to the PROVEAN results, 6 nsSNPs were predicted as disease-causing and 3 were neutral. Through the PANTHER tool, 6 nsSNPs were predicted as probably damaging, 1 was possibly damaging, and 1 was found to be benign. Moreover, SNPs&GO predicted 6 of these 9 nsSNPs as disease-relevant mutations in
Using the ConSurf web server, the evolutionary conservation of 6 nsSNPs of the STAT3 protein was analyzed, together with the identification of putative structural and functional residues. The results showed that all 6 nsSNPs, namely V507F, R335W, E415K, K591M, F561Y, and Q32K, were highly conserved, and the variants are indicated in black boxes in
Evolutionary conservancy of STAT3 by ConSurf server. The high-risk nsSNPs are denoted by black boxes. nsSNPs, non-synonymous single nucleotide polymorphisms; STAT, signal transducer and activator of transcription factors.
Analysis of evolutionary conservation profile of high-risk nsSNPs of STAT3 by ConSurf
V507F | 8 | Buried | - |
R335W | 9 | Exposed | Functional |
E415K | 8 | Exposed | - |
K591M | 9 | Exposed | Functional |
F561Y | 8 | Exposed | Functional |
Q32K | 7 | Exposed | - |
nsSNPs, non-synonymous single nucleotide polymorphisms; STAT, signal transducer and activator of transcription factors.
The variants with high conservation scores in ConSurf output were assessed for solvent accessibility, stability, and secondary structure prediction by NetsurfP-2.0. The results are displayed in
NetsurfP-2.0 prediction based on relative solvent accessibility, stability, and secondary structure prediction
V507F | Buried | 13% | 20 Å | α helix | −64° | −44° |
R335W | Exposed | 44% | 101 Å | Coil | −110° | 137° |
E415K | Exposed | 52% | 90 Å | Strand/β sheet | −108° | 137° |
K591M | Exposed | 37% | 77 Å | α helix | −57° | −41° |
E594K | Buried | 6% | 10 Å | α helix | −66° | −41° |
F561Y | Buried | 4% | 8 Å | α helix | −60° | −38° |
R609S | Buried | 9% | 22 Å | Strand/β sheet | −112° | 132° |
ASA, absolute surface accessibility; RSA, Relative surface accessibility.
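The buried/exposed labels in the table above are consistent with a simple cutoff on relative surface accessibility. The sketch below assumes a 25% RSA threshold, which is an assumption made here for illustration and is not stated in the text; the RSA values are taken from the table.

```python
RSA_EXPOSED_THRESHOLD = 25.0  # percent; assumed cutoff, not stated in the text


def classify_rsa(rsa_percent: float) -> str:
    """Label a residue as exposed or buried from its relative surface accessibility."""
    return "Exposed" if rsa_percent > RSA_EXPOSED_THRESHOLD else "Buried"


# RSA values (%) as reported in the table
rsa_by_variant = {
    "V507F": 13, "R335W": 44, "E415K": 52, "K591M": 37,
    "E594K": 6, "F561Y": 4, "R609S": 9,
}

classes = {variant: classify_rsa(rsa) for variant, rsa in rsa_by_variant.items()}
for variant, label in classes.items():
    print(variant, label)
```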
HOPE showed the differences between the wild-type and mutant amino acids with respect to their physical and chemical properties, hydrophobicity, spatial structure, and function. The Project HOPE server revealed that the mutant residues of V507F, R335W, E415K, F561Y, and Q32K were bigger than the wild-type residues, whereas the mutant residue of K591M was smaller than the wild-type. In addition, the mutant residues of R335W, K591M, and F561Y are more hydrophobic than the wild-type. The change in the size and hydrophobicity of the mutant residue can disrupt the H-bonding interactions with neighboring molecules. Moreover, at positions 335 and 591 the charge of the wild-type residue was lost, whereas at position 415 the mutant introduced an opposite charge; the mutant also introduced a charge at position 32. The variation in charge can cause a loss of interactions with other molecules.
Structural variation of the wild-type and mutant residues by Project HOPE. The wild-type residue is presented as green and the mutant residue is shown in red.
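The charge changes reported by HOPE can be reproduced with a small Python sketch. It is a simplification and not the HOPE algorithm: it assumes side-chain charges at neutral pH (Arg and Lys positive, Asp and Glu negative, all other residues including His treated as neutral), and the function names are illustrative.

```python
import re

# Assumed side-chain charges at neutral pH; residues not listed are treated as neutral
SIDE_CHAIN_CHARGE = {"R": 1, "K": 1, "D": -1, "E": -1}


def parse_substitution(code: str) -> tuple[str, int, str]:
    """Split a substitution such as 'V507F' into (wild type, position, mutant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", code)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {code!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def charge_change(code: str) -> str:
    """Describe how the side-chain charge differs between wild type and mutant."""
    wild_type, _, mutant = parse_substitution(code)
    wt_charge = SIDE_CHAIN_CHARGE.get(wild_type, 0)
    mt_charge = SIDE_CHAIN_CHARGE.get(mutant, 0)
    if wt_charge == mt_charge:
        return "unchanged"
    if mt_charge == 0:
        return "charge lost"
    if wt_charge == 0:
        return "charge introduced"
    return "charge reversed"


for variant in ["V507F", "R335W", "E415K", "K591M", "F561Y", "Q32K"]:
    print(variant, charge_change(variant))
```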
SOPMA predicted the secondary structure of STAT3, describing the distribution of alpha-helix, beta-sheet, and random coil. The secondary structure prediction of STAT3 by SOPMA is shown in
Secondary structure prediction and calculations using SOPMA. The SOPMA program predicts the secondary structure of the STAT3 protein. The black boxes represent wild-type amino acids that might be altered by STAT3 pathogenic nsSNPs. Alpha helix, extended strand, beta turn, and random coil are represented by the letters “h,” “e,” “t,” and “c,” respectively. nsSNPs, non-synonymous single nucleotide polymorphisms; STAT, signal transducer and activator of transcription factors.
Protein–protein interaction network of
STAT3 ligand binding sites were predicted using the RaptorX Binding server. According to the RaptorX Binding server, a pocket multiplicity greater than 40 suggests an accurate prediction. In the STAT3 protein, there was only 1 processed domain with 2 predicted pockets: one with a multiplicity of 11 that binds to R382, L430, I431, S465, N466, and Q469, and the other with a multiplicity of 3 that binds to M331, H332, and I467. The server predicted that 128 positions were disordered. RaptorX also predicted the secondary structure as 45% helix, 13% sheet, and 41% coil.
3D model and binding site residues for domain 1 predicted by RaptorX.
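The pocket multiplicity criterion can be applied to the two predicted pockets with a short Python sketch. The data structure and function name are illustrative and do not reflect the RaptorX output format; the threshold of 40 and the pocket values are those given in the text.

```python
MULTIPLICITY_THRESHOLD = 40  # pockets above this value are considered accurate (from the text)

# The two pockets reported for STAT3 domain 1
pockets = [
    {"multiplicity": 11, "residues": ["R382", "L430", "I431", "S465", "N466", "Q469"]},
    {"multiplicity": 3, "residues": ["M331", "H332", "I467"]},
]


def is_reliable(multiplicity: int) -> bool:
    """Apply the multiplicity cutoff described for the RaptorX Binding server."""
    return multiplicity > MULTIPLICITY_THRESHOLD


reliable_pockets = [p for p in pockets if is_reliable(p["multiplicity"])]
print(f"{len(reliable_pockets)} of {len(pockets)} pockets exceed the threshold")
```

Both reported multiplicities (11 and 3) fall below 40, so under this criterion neither pocket would count as an accurate prediction.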
GeneMANIA server predicted the functional gene–gene interaction network of
Gene–gene interaction network of
The AlphaFold algorithm produces a pLDDT score for each individual residue that ranges from 0 to 100, and the overall confidence for the whole protein chain is given by the average pLDDT score across all residues. In the STAT3 3D structure, a very high degree of confidence (pLDDT >90) was obtained and the majority of the 3D structural area belongs to α-helical domains (
3D structure prediction of STAT3 by AlphaFold. STAT, signal transducer and activator of transcription factors.
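The relationship between per-residue pLDDT scores and the chain-level average can be sketched in Python. The per-residue scores below are hypothetical values chosen for illustration, not STAT3 data, and the band boundaries (90, 70, 50) follow the confidence categories used by the AlphaFold Protein Structure Database.

```python
from statistics import mean


def confidence_band(plddt: float) -> str:
    """Map a per-residue pLDDT score (0 to 100) to a confidence category."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


# Hypothetical per-residue scores for illustration only
per_residue_plddt = [95.2, 91.0, 88.5, 62.3, 45.0]

average_plddt = mean(per_residue_plddt)
bands = [confidence_band(score) for score in per_residue_plddt]

print(f"average pLDDT: {average_plddt:.1f}")
print(bands)
```

In AlphaFold coordinate files the per-residue pLDDT is stored in the B-factor column, so real scores could be read from there rather than typed in by hand.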
DynaMut calculates the effect of point mutations on protein stability and flexibility based on interatomic interactions, as summarized in
Visual representation of protein flexible conformation based on the vibrational entropy difference (ΔΔS) and the interatomic interaction between wild-type and mutant structures on STAT3 structure. Amino acids colored according to the vibrational entropy change upon mutation. BLUE represents a rigidification of the structure and RED a gain in flexibility. WT and MT residues are depicted as light-green sticks alongside the surrounding residues that are involved in any form of interaction.
Prediction of protein stability using DynaMut server
Q32K | 0.087 (Stabilizing) | −0.009 kcal/mol (Destabilizing) | −0.453 (Destabilizing) | 0.030 (Stabilizing) | −0.019 (Destabilizing) | 0.012 | Increase of molecule flexibility |
F561Y | −0.705 (Destabilizing) | −0.089 (Destabilizing) | −0.673 (Destabilizing) | −1.030 (Destabilizing) | −0.560 (Destabilizing) | 0.111 | Increase of molecule flexibility |
K591M | 0.057 (Stabilizing) | 0.016 (Destabilizing) | 0.335 (Stabilizing) | 0.130 (Stabilizing) | 0.485 (Stabilizing) | −0.019 | Decrease of molecule flexibility |
E415K | 0.107 (Stabilizing) | 0.023 (Destabilizing) | −0.461 (Destabilizing) | −0.150 (Destabilizing) | −0.062 (Destabilizing) | −0.029 | Decrease of molecule flexibility |
V507F | 0.627 (Stabilizing) | 0.195 (Destabilizing) | −1.244 (Destabilizing) | −1.190 (Destabilizing) | 1.458 (Destabilizing) | −0.244 | Decrease of molecule flexibility |
R335W | 0.050 (Stabilizing) | −0.172 (Destabilizing) | −0.198 (Destabilizing) | −0.010 (Destabilizing) | −0.447 (Destabilizing) | 0.215 | Increase of molecule flexibility |
ENCoM, elastic network contact model; NMA, normal mode analysis.
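The flexibility labels in the last column of the table follow the sign of the vibrational entropy difference. The sketch below assumes that the final numeric column is the ENCoM ΔΔS value and that a positive value indicates a gain in flexibility; the function name is illustrative.

```python
def flexibility_effect(delta_delta_s: float) -> str:
    """Interpret the sign of the vibrational entropy difference (ΔΔS)."""
    if delta_delta_s > 0:
        return "Increase of molecule flexibility"
    if delta_delta_s < 0:
        return "Decrease of molecule flexibility"
    return "No change"


# ΔΔS values as reported in the table (assumed to be the last numeric column)
dds_by_variant = {
    "Q32K": 0.012, "F561Y": 0.111, "K591M": -0.019,
    "E415K": -0.029, "V507F": -0.244, "R335W": 0.215,
}

effects = {variant: flexibility_effect(dds) for variant, dds in dds_by_variant.items()}
for variant, effect in effects.items():
    print(variant, effect)
```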
In mammals, several STAT proteins play an important role in host defense. STAT3 is linked to the JAK family of proteins and is capable of integrating signals from several signaling pathways. Persistently active STAT3 has been associated with various malignancies, most commonly head and neck cancer and multiple myeloma [1]. The occurrence of deleterious SNPs in various disease-related genes has made in silico analysis of deleterious SNPs from large databases a major focus in recent years [28]. Prioritization of the most deleterious SNPs enables their use as markers in genetic disease screening and helps in the formulation of personalized treatment strategies. Studying the structure and function of the
In this study, by combining numerous in silico SNP prediction algorithms, we identified 6 deleterious SNPs that are considered high-risk, and these were examined further. The nsSNPs rs145786768, rs193922716, rs193922717, rs193922719, rs1064116, and rs1803125, with amino acid changes V507F, R335W, E415K, K591M, F561Y, and Q32K, respectively, were considered deleterious using 6 in silico SNP prediction tools: SIFT, PolyPhen2, PANTHER, SNPs&GO, PROVEAN, and PhD-SNP. Based on the conservation profile, structural conformation, relative solvent accessibility, secondary structure prediction, and protein–protein interaction, all 6 nsSNPs were identified as the most deleterious nsSNPs. Protein stability is essential for a protein's structural and functional activity. The I-Mutant tool revealed that the predicted 6 nsSNPs in the STAT3 protein decreased its stability. Protein stability determines the conformational structure and function of the protein. Misfolding, degradation, or abnormal protein aggregation can result from changes in protein stability [29].
Conserved residues of a protein are involved in maintaining biological processes, including stability, folding, or both. Functional amino acids located in enzymatic sites interact significantly with other proteins, and these residues are found to be more conserved than other residues in the protein [30]. The evolutionary conservation profile of the STAT3 protein by ConSurf revealed that amino acid positions 507, 335, 415, 591, 561, and 32 were located in highly conserved regions; position 507 was predicted to be structural and buried, whereas the rest of the positions were functional and exposed. Combining the results of ConSurf and I-Mutant, the predicted 6 nsSNPs are potentially high-risk variants because they decrease protein stability and occur at highly conserved positions. The Project HOPE results indicate that the wild-type residues valine at position 507 and phenylalanine at position 561 are highly conserved and that neither the mutant residue nor another residue with similar properties was observed in other homologous sequences, which supports the inference that the mutations are probably damaging to the protein.
The exposed variants were located on the surface of the protein, which might lead to loss of interactions and structural alterations, particularly in the transmembrane domains. Repulsion, misfolding, and loss of interactions may result from the addition or removal of charge or hydrophobicity. SOPMA secondary structure calculations showed that the deleterious nsSNPs are mainly found in helix and coil regions rather than β turns. RaptorX predicted the high-risk nsSNPs as ligand binding locations.
The protein–protein interaction network is an important factor in understanding biological processes. Using STRING, functional genomics data and the structural, functional, and evolutionary features of the STAT3 protein were analyzed [31]. EP300, PIAS3, IL10RA, JAK1, JAK2, EGFR, HSP90AA1, SRC, and NANOG were found to have a strong functional association with the STAT3 protein. An amino acid change can alter the structure of a protein, and consequently its function. As a result, a variant protein carrying deleterious SNPs may interact differently with other proteins, thereby causing phenotypic changes and altering protein expression [32]. The structural domains of STAT3 involved in protein–protein interactions allow selective inhibition of a group of
Deleterious SNPs of the
The transcription factor