
Introduction

Primary hyperparathyroidism (PHPT) is a disease caused by the autonomous oversecretion of parathyroid hormone (PTH) by a parathyroid adenoma or hyperplastic parathyroid glands. Its estimated prevalence is 6.7 per 1,000 population. Most PHPT cases are not inherited, but some are caused by germline mutations. Hereditary forms include multiple endocrine neoplasia (MEN), familial hypocalciuric hypercalcemia, neonatal severe hyperparathyroidism, familial isolated hyperparathyroidism and hyperparathyroidism-jaw tumor (HPT-JT) syndrome, which is caused by mutations in the CDC73 gene (1, 2, 3, 4, 5).

Hyperparathyroidism-jaw tumor (HPT-JT) syndrome (OMIM #145001) was first described by Jackson in 1958. It is an autosomal dominant disorder with variable expression (6, 7, 8). Its features include parathyroid adenomas, fibro-osseous jaw tumors, uterine tumors and renal lesions such as hamartomas, polycystic disease, Wilms tumors and adenocarcinoma. Diagnosing HPT-JT is important because of its hereditary nature and the 24% risk of malignant transformation (9, 10). Nuclear medicine imaging, especially parathyroid scintigraphy with 99mTc-MIBI (methoxyisobutylisonitrile), plays an important role in establishing the diagnosis (11).

CDC73-related HPT-JT syndrome results from truncating (80%) or missense variants in the tumor suppressor gene Cell Division Cycle 73 (CDC73, also known as HRPT2), which is located on chromosome 1q31.2 and encodes parafibromin, a 531-amino acid protein (12, 13, 14, 15, 16). Parafibromin regulates transcriptional and post-transcriptional events. Most CDC73 mutations associated with hyperparathyroidism and parathyroid carcinoma are frameshift, nonsense or missense mutations within the protein-coding exons (16). An estimated 70% of carriers of such a mutation may develop PHPT (17).

In vitro characterization studies are demanding, requiring considerable resources, time and funding. For these reasons, bioinformatics analysis is an appropriate, fast, reliable and cost-effective approach to improving our understanding of the role of mutations in disease pathogenesis. In silico (bioinformatics) analysis has become an important tool in recent years, advancing biological research through a range of algorithms and databases (18). Many studies of the role of the CDC73 gene in HPT-JT syndrome have focused on deletions (5, 19, 20), whereas few have examined single nucleotide polymorphisms (SNPs). To our knowledge, this is the first in silico analysis of the CDC73 gene in relation to hyperparathyroidism-jaw tumor syndrome. The aim of this study is therefore to assess the effect of SNPs in the CDC73 gene on the structure and function of the parafibromin protein using different bioinformatics tools.

Methodology
Retrieval of nsSNPs

The SNPs in the human CDC73 gene were obtained from the Single Nucleotide Polymorphism database (dbSNP) on the National Center for Biotechnology Information (NCBI) website (www.ncbi.nlm.nih.gov), and the protein sequence with ID Q6P1J9 was obtained from the UniProt database (www.uniprot.org).

Servers used for identifying the most damaging and disease-related SNPs

Six servers were used to assess the functional and disease-related impact of deleterious nsSNPs:

SIFT (Sorting Intolerant From Tolerant)

SIFT was the first online server used in our assessment. It predicts whether an amino acid substitution affects protein function based on sequence homology and the physical properties of amino acids. SIFT scores range from 0 to 1; a substitution is predicted to be deleterious if the score is < 0.05 and tolerated if the score is ≥ 0.05. The protein sequence obtained from UniProt and the substitutions of interest were submitted to the SIFT server, and each substitution was classified as deleterious or tolerated according to its score. The deleterious SNPs were evaluated further (21). SIFT is available online at www.sift.bii.a-star.edu.sg/
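The SIFT decision rule above amounts to a single cut-off. The sketch below is a minimal illustration assuming the 0.05 threshold stated in the text; `classify_sift` is a hypothetical helper, not part of the SIFT server.

```python
SIFT_CUTOFF = 0.05  # scores strictly below this are called deleterious


def classify_sift(score: float) -> str:
    """Label a SIFT score in [0, 1] as 'deleterious' or 'tolerated'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"SIFT scores lie in [0, 1], got {score}")
    return "deleterious" if score < SIFT_CUTOFF else "tolerated"


if __name__ == "__main__":
    for s in (0.0, 0.02, 0.05, 0.40):
        print(f"{s:.2f} -> {classify_sift(s)}")
```

Under this strict rule a score of exactly 0.05 counts as tolerated; the entries reported as 0.05 with an "affect" label in Table 1 may reflect rounding of the score in the server output.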

PolyPhen-2 (prediction of functional effects of human nsSNPs)

PolyPhen-2 (Polymorphism Phenotyping v2), available as software and via a web server, predicts the possible impact of amino acid substitutions on the stability and function of human proteins using structural and comparative evolutionary considerations. It performs functional annotation of single nucleotide polymorphisms (SNPs), maps coding SNPs to gene transcripts, extracts protein sequence annotations and structural attributes, and builds conservation profiles. It then estimates the probability that a missense mutation is damaging based on a combination of all these properties. PolyPhen-2 features a high-quality multiple protein sequence alignment pipeline and a prediction method employing machine-learning classification (22).

The output of the PolyPhen-2 prediction pipeline is a prediction of probably damaging, possibly damaging or benign, along with a numerical score ranging from 0.0 (benign) to 1.0 (damaging). “Probably damaging” indicates a damaging effect with high confidence, “possibly damaging” indicates a damaging effect with lower confidence, and “benign” means that the substitution is predicted to be benign with high confidence. PolyPhen-2 is available online at http://genetics.bwh.harvard.edu/pph2/. Only SNPs predicted to be damaging were evaluated further.
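The three PolyPhen-2 classes correspond to two cut-offs on the score. The sketch below assumes the default HumDiv cut-offs (0.957 and 0.453); the text does not state which PolyPhen-2 model was used, and `classify_polyphen2` is a hypothetical helper.

```python
# ASSUMPTION: default HumDiv cut-offs; the text does not say which model was used.
PROBABLY_DAMAGING_MIN = 0.957
POSSIBLY_DAMAGING_MIN = 0.453


def classify_polyphen2(score: float) -> str:
    """Map a PolyPhen-2 probability in [0, 1] to its three-class label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"PolyPhen-2 scores lie in [0, 1], got {score}")
    if score >= PROBABLY_DAMAGING_MIN:
        return "probably damaging"
    if score >= POSSIBLY_DAMAGING_MIN:
        return "possibly damaging"
    return "benign"


if __name__ == "__main__":
    # Scores for M1I, R222G and S6G from Table 1, plus one hypothetical benign score.
    for s in (0.999, 0.955, 0.567, 0.2):
        print(s, classify_polyphen2(s))
```

These cut-offs are consistent with every label in Table 1 (for example, R222G at 0.955 is "possibly damaging"); a different model would shift the class boundaries.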

PROVEAN (Protein Variation Effect Analyzer)

PROVEAN was the third tool used; it predicts whether an amino acid substitution has an impact on the biological function of a protein. PROVEAN is useful for filtering sequence variants to identify nonsynonymous or indel variants that are predicted to be functionally important. A variant is predicted to be deleterious if its score is ≤ -2.5 and neutral if its score is above -2.5 (23). Variants predicted to be deleterious were considered for further evaluation. PROVEAN is available at http://provean.jcvi.org/index.php
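The PROVEAN rule can be written as a one-line threshold. This is a minimal sketch using the -2.5 cut-off given in the text; `classify_provean` is a hypothetical helper name.

```python
PROVEAN_CUTOFF = -2.5  # threshold stated in the text


def classify_provean(score: float) -> str:
    """Return 'deleterious' for scores <= -2.5, otherwise 'neutral'."""
    return "deleterious" if score <= PROVEAN_CUTOFF else "neutral"


if __name__ == "__main__":
    # M1I and R222G scores from Table 1, the boundary value, and a neutral example.
    for s in (-2.586, -3.272, -2.5, -1.0):
        print(s, classify_provean(s))
```

Applied to Table 1, every row agrees with this rule except L5F, whose printed score of 2.676 would be classified as neutral although it is labelled deleterious; the minus sign of that value may have been lost.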

SNAP2

SNAP2 is another tool for predicting the functional effects of mutations. It is a trained classifier based on a machine-learning method called a neural network. It distinguishes between effect and neutral variants (non-synonymous SNPs) by taking a variety of sequence and variant features into account. The most important input signal for the prediction is the evolutionary information taken from an automatically generated multiple sequence alignment. Structural features such as predicted secondary structure and solvent accessibility are also considered, and, if available, annotations of the sequence or close homologs (i.e. known functional residues, patterns and regions) are pulled in. In a cross-validation over 100,000 experimentally annotated variants, SNAP2 reached a sustained two-state accuracy (effect/neutral) of 82% (at an AUC of 0.9) (24). We submitted the reference sequence and recorded the relevant results. SNAP2 is available at https://rostlab.org/services/snap2web/
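To make the "two-state accuracy" figure concrete, the sketch below turns SNAP2-style scores into effect/neutral calls and measures the fraction of calls that match an annotation. It assumes that SNAP2 scores run from -100 to +100 and that a positive score means "effect"; the function names and example data are hypothetical.

```python
from typing import Sequence


# ASSUMPTION: scores run from -100 (strong neutral) to +100 (strong effect),
# and a score above 0 is reported as "effect".
def classify_snap2(score: int) -> str:
    if not -100 <= score <= 100:
        raise ValueError(f"SNAP2 scores lie in [-100, 100], got {score}")
    return "effect" if score > 0 else "neutral"


def two_state_accuracy(predicted: Sequence[str], observed: Sequence[str]) -> float:
    """Fraction of variants whose predicted class (effect/neutral) matches the annotation."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(predicted)


if __name__ == "__main__":
    # Hypothetical scores and annotations, for illustration only.
    scores = [41, 8, -35, 60]
    observed = ["effect", "neutral", "neutral", "effect"]
    predicted = [classify_snap2(s) for s in scores]
    print(predicted, two_state_accuracy(predicted, observed))
```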

SNPs&GO

SNPs&GO is a web server for predicting disease-associated variations from protein sequence and structure. It is an accurate method that, starting from a protein sequence, predicts whether a variation is disease-related or not by exploiting the corresponding protein functional annotation (25). We submitted the reference sequence in FASTA format in the search bar. The results from this server comprise three different analytical algorithms: PHD, SNPs&GO and PANTHER. The output is a table listing the mutated position in the protein sequence, the wild-type residue, the new residue, whether the mutation is predicted to be disease-related (Disease) or a neutral polymorphism (Neutral), and the reliability index (RI). SNPs&GO is available at http://snps.biofold.org/snps-and-go/index.html
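The reliability index reported alongside each SNPs&GO call can be illustrated with a short sketch. It assumes that a variant is called "Disease" when the predicted probability exceeds 0.5 and that RI = round(20 × |p − 0.5|), an integer from 0 to 10; this formula is an assumption, but it reproduces every SNPs&GO RI in Table 2. The helper name is hypothetical.

```python
from typing import Tuple


# ASSUMPTION: "Disease" when p > 0.5, and RI = round(20 * |p - 0.5|) on a 0-10 scale.
# Note: Python's round() sends exact halves to the nearest even integer.
def snps_and_go_call(p_disease: float) -> Tuple[str, int]:
    if not 0.0 <= p_disease <= 1.0:
        raise ValueError(f"probability must lie in [0, 1], got {p_disease}")
    label = "Disease" if p_disease > 0.5 else "Neutral"
    reliability_index = round(20 * abs(p_disease - 0.5))
    return label, reliability_index


if __name__ == "__main__":
    # SNPs&GO probabilities for G49C, L63P and R222G from Table 2.
    for p in (0.829, 0.657, 0.553):
        print(p, snps_and_go_call(p))
```

The same formula matches all PHD RIs in Table 2 except R441C, whose rounded probability (0.775) sits exactly on a rounding boundary.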

P-Mut

P-Mut is a web-based tool for the annotation of pathological variants in proteins. The P-Mut web portal allows the user to perform pathology predictions, to access a complete repository of pre-calculated predictions, and to generate and validate new predictors. It requires the reference sequence of the gene of interest to perform these functions. The P-Mut portal is freely accessible at http://mmb.irbbarcelona.org/PMut (26).

After prediction with all these algorithms, the resulting SNPs were further analyzed with another tool, I-Mutant.

I-Mutant (predictor of the effect of single point protein mutations)

I-Mutant is an online server that predicts protein stability changes upon single-site mutations, starting from the protein sequence alone or from the protein structure when available. It is freely available at http://gpcr2.biocomp.unibo.it/cgi/predictors/I-Mutant3.0/I-Mutant3.0.cgi (27). We selected the protein sequence option on the website and submitted the reference sequence without its FASTA header. We then entered the position of each SNP of interest together with the new residue and recorded the result.
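The input preparation described above (bare sequence, position and new residue) and the interpretation of the stability output can be sketched as follows. This is a hypothetical local helper, not part of I-Mutant. It assumes that the predicted DDG is ΔG(mutant) − ΔG(wild type) in kcal/mol, so negative values mean decreased stability, and that |DDG| ≤ 0.5 counts as neutral in a three-class scheme. The toy sequence is made up.

```python
import re
from typing import Tuple

_SUBSTITUTION = re.compile(r"([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])")


def strip_fasta_header(fasta_text: str) -> str:
    """Drop '>' header lines and whitespace, leaving the bare sequence."""
    lines = (line.strip() for line in fasta_text.splitlines())
    return "".join(line for line in lines if line and not line.startswith(">"))


def parse_substitution(sequence: str, substitution: str) -> Tuple[int, str]:
    """Check e.g. 'G49C' against the sequence; return (1-based position, new residue)."""
    match = _SUBSTITUTION.fullmatch(substitution)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {substitution!r}")
    wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside a sequence of length {len(sequence)}")
    if sequence[position - 1] != wild_type:
        raise ValueError(
            f"{substitution}: sequence has {sequence[position - 1]} at {position}, not {wild_type}"
        )
    return position, mutant


def classify_ddg(ddg: float) -> Tuple[str, str]:
    """Binary and three-class stability calls for a predicted DDG in kcal/mol.

    ASSUMPTION: negative DDG means decreased stability (zero is reported as
    'Increase' here), and |DDG| <= 0.5 is treated as neutral in the three-class scheme.
    """
    binary = "Decrease" if ddg < 0 else "Increase"
    if ddg < -0.5:
        three_class = "Large decrease"
    elif ddg > 0.5:
        three_class = "Large increase"
    else:
        three_class = "Neutral"
    return binary, three_class


if __name__ == "__main__":
    # Hypothetical toy record, not the parafibromin sequence.
    seq = strip_fasta_header(">toy_protein\nMKTLL\nGW\n")
    print(seq, parse_substitution(seq, "L4P"), classify_ddg(-0.9))
```

Checking the wild-type residue before submission catches numbering mismatches between dbSNP and the UniProt sequence.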

GeneMANIA

GeneMANIA is a free online server that predicts gene function and gene interactions using a very large set of functional association data, including protein and genetic interactions, pathways, co-expression, co-localization and protein domain similarity (28). It is available at http://www.genemania.org/. We typed the gene name in the search bar and then downloaded the following data: network image, gene data, function data and interaction data.

Project HOPE

Project HOPE is a web server used to analyze the effect of a single point mutation on the structure of a protein. It is a convenient way to visualize a mutation, since it creates a report consisting of figures, animations, the 3D structure and a mutation description from just the protein sequence and the mutation (29). It is available at http://www.cmbi.ru.nl/hope/. We submitted the reference sequence along with the wild-type and mutant amino acids of the eleven final SNPs, and then downloaded the corresponding reports.

Chimera 1.10.2

UCSF Chimera is a program for the interactive visualization and molecular analysis of structures and related data, including density maps, supramolecular assemblies, sequence alignments, docking results, trajectories and conformational ensembles. It also allows the creation of movies. Chimera (version 1.10.2) was used to examine the 3D (three-dimensional) structure of the protein; each wild-type residue was then modified to show the difference after mutation, and a graphic view was made for each mutation (30). (http://www.cgl.ucsf.edu/chimera/)

COSMIC v86

The Catalogue of Somatic Mutations in Cancer (COSMIC) is one of the most comprehensive resources for exploring the impact of somatic mutations in human cancer. It covers both coding and non-coding mutations. Because it is primarily hand-curated, its data are of high quality and accuracy (31). The gene name (CDC73) was used to search for associated variants, and we then recorded the mutation type listed for each of the eleven SNPs. COSMIC is available at https://cancer.sanger.ac.uk/cosmic

dbSNP Short Genetic Variations

This NCBI tool draws on other databases (such as gnomAD_exome and ExAC) to report allele frequencies (32). We used it to study the global allele frequencies of CDC73 variants by submitting the rs number of each of the eleven substitutions and recording the resulting frequency in a table. It is available at (https://www.ncbi.nlm.nih.gov/snp/rs759222387)
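An allele frequency is the alternate-allele count divided by the total number of alleles observed. The sketch below uses made-up counts to show why a very rare allele prints as 0.00000 when frequencies are rounded to five decimal places, as the values in Table 3 suggest; `allele_frequency` is a hypothetical helper.

```python
def allele_frequency(alt_allele_count: int, total_allele_number: int) -> float:
    """Alternate-allele frequency AF = AC / AN."""
    if total_allele_number <= 0:
        raise ValueError("total allele number must be positive")
    if not 0 <= alt_allele_count <= total_allele_number:
        raise ValueError("allele count must lie between 0 and the total allele number")
    return alt_allele_count / total_allele_number


if __name__ == "__main__":
    # Hypothetical counts; a single allele among 250,000 rounds to 0.00000 at five decimals.
    for ac, an in ((1, 250_000), (10, 250_000)):
        print(f"AC={ac}, AN={an}: AF={allele_frequency(ac, an):.5f}")
```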

Results

Seven hundred and thirty-three SNPs were downloaded from NCBI, of which only 184 were missense variants. First, we analyzed the effect of these SNPs on protein function using four tools (SIFT, PROVEAN, PolyPhen-2 and SNAP2), which yielded 31 SNPs predicted to affect protein function (Table 1). We then analyzed these further with SNPs&GO, PHD, P-Mut, I-Mutant, COSMIC and dbSNP Short Genetic Variations, resulting in 11 SNPs (Tables 2 and 3). Finally, we studied their effect on protein structure using HOPE and Chimera (Figs. 2-12). The workflow in Fig. 1 summarizes the methodology used in this paper.

Table 1

Functional analysis of single nucleotide polymorphisms (SNPs) by the SIFT, PROVEAN, PolyPhen-2 and SNAP2 servers, showing 31 deleterious SNPs.

dbSNP rs#* | Sub* | SIFT prediction | SIFT score | PROVEAN prediction | PROVEAN score | PolyPhen-2 prediction | PolyPhen-2 score | SNAP2 prediction | SNAP2 score
rs28942098 | M1I | Affect | 0 | Deleterious | -2.586 | probably damaging | 0.999 | Effect | 41
rs1296841626 | L5F | Affect | 0 | Deleterious | 2.676 | probably damaging | 1 | Effect | 43
rs770544416 | S6G | Affect | 0 | Deleterious | -2.539 | possibly damaging | 0.567 | Effect | 8
rs1054465259 | G28A | Affect | 0 | Deleterious | -3.745 | possibly damaging | 0.816 | Effect | 47
rs777541949 | T38S | Affect | 0 | Deleterious | -2.785 | probably damaging | 1 | Effect | 49
rs1019931450 | Y40C | Affect | 0 | Deleterious | -5.114 | probably damaging | 1 | Effect | 56
rs200806263 | G49C | Affect | 0 | Deleterious | -5.363 | probably damaging | 1 | Effect | 14
rs1454615241 | T56I | Affect | 0 | Deleterious | -4.102 | probably damaging | 1 | Effect | 48
rs1060500015 | L63P | Affect | 0 | Deleterious | -4.277 | probably damaging | 1 | Effect | 25
rs121434264 | L64P | Affect | 0 | Deleterious | -4.435 | probably damaging | 0.999 | Effect | 60
rs778467088 | D90H | Affect | 0 | Deleterious | -5.36 | probably damaging | 1 | Effect | 17
rs1186176634 | S174P | Affect | 0 | Deleterious | -2.64 | probably damaging | 1 | Effect | 23
rs770439843 | R222G | Affect | 0 | Deleterious | -3.272 | possibly damaging | 0.955 | Effect | 49
rs776394390 | D223G | Affect | 0 | Deleterious | -3.71 | probably damaging | 0.994 | Effect | 25
rs1330160847 | I224T | Affect | 0 | Deleterious | -3.072 | probably damaging | 0.991 | Effect | 25
rs1060500022 | W231R | Affect | 0 | Deleterious | -7.888 | probably damaging | 0.996 | Effect | 64
rs1452051467 | R234Q | Affect | 0 | Deleterious | -3.063 | probably damaging | 1 | Effect | 72
rs973863694 | I249T | Affect | 0 | Deleterious | -3.648 | probably damaging | 0.999 | Effect | 33
rs878855091 | R263C | Affect | 0 | Deleterious | -3.402 | probably damaging | 1 | Effect | 17
rs1244272523 | R330W | Affect | 0 | Deleterious | -3.789 | probably damaging | 0.996 | Effect | 56
rs770734388 | P360S | Affect | 0 | Deleterious | -6.86 | probably damaging | 1 | Effect | 59
rs769288212 | P365L | Affect | 0.04 | Deleterious | -8.661 | probably damaging | 1 | Effect | 75
rs113200235 | A367V | Affect | 0.05 | Deleterious | -3.178 | probably damaging | 0.996 | Effect | 66
rs866465727 | V387A | Affect | 0 | Deleterious | -3.326 | possibly damaging | 0.656 | Effect | 48
rs866793539 | G396C | Affect | 0.02 | Deleterious | -7.128 | probably damaging | 1 | Effect | 25
rs754454928 | R441C | Affect | 0 | Deleterious | -7.186 | probably damaging | 1 | Effect | 58
rs778432682 | R441H | Affect | 0 | Deleterious | -4.441 | probably damaging | 1 | Effect | 64
rs1225502334 | R484C | Affect | 0.05 | Deleterious | -3.972 | probably damaging | 0.994 | Effect | 32
rs1292596060 | R504S | Affect | 0 | Deleterious | -5.296 | probably damaging | 1 | Effect | 74
rs759222387 | R504H | Affect | 0 | Deleterious | -4.43 | probably damaging | 1 | Effect | 73
rs1060500011 | R513W | Affect | 0.04 | Deleterious | -3.834 | probably damaging | 0.977 | Effect | 59

dbSNP rs*: is a locus accession for a variant type assigned by dbSNP, Sub*: substitution

Table 2

Functional analysis of single nucleotide polymorphisms (SNPs) by the SNPs&GO, PHD and P-Mut servers, showing the eleven SNPs predicted to be disease-related by all three.

dbSNP rs#* | Sub* | SNPs&GO prediction | RI* | SNPs&GO score | PHD prediction | RI* | PHD probability | P-Mut prediction | P-Mut score
rs200806263 | G49C | Disease | 7 | 0.829 | Disease | 8 | 0.893 | Disease | 0.54 (80%)
rs1060500015 | L63P | Disease | 3 | 0.657 | Disease | 7 | 0.861 | Disease | 0.68 (85%)
rs121434264 | L64P | Disease | 5 | 0.729 | Disease | 8 | 0.875 | Disease | 0.75 (87%)
rs778467088 | D90H | Disease | 4 | 0.708 | Disease | 7 | 0.867 | Disease | 0.70 (86%)
rs770439843 | R222G | Disease | 1 | 0.553 | Disease | 6 | 0.782 | Disease | 0.53 (80%)
rs1060500022 | W231R | Disease | 1 | 0.544 | Disease | 1 | 0.56 | Disease | 0.64 (84%)
rs770734388 | P360S | Disease | 3 | 0.65 | Disease | 0 | 0.503 | Disease | 0.84 (90%)
rs754454928 | R441C | Disease | 5 | 0.743 | Disease | 5 | 0.775 | Disease | 0.61 (83%)
rs778432682 | R441H | Disease | 3 | 0.669 | Disease | 5 | 0.753 | Disease | 0.70 (86%)
rs1292596060 | R504S | Disease | 6 | 0.823 | Disease | 7 | 0.867 | Disease | 0.86 (91%)
rs759222387 | R504H | Disease | 6 | 0.811 | Disease | 7 | 0.857 | Disease | 0.72 (86%)

dbSNP rs*: is a locus accession for a variant type assigned by dbSNP, Sub*: substitution, RI*: Reliability Index

Table 3

Stability analysis of the 11 SNPs using I-Mutant, showing a predicted decrease in protein stability for all of them. The effect of each variant in cancer was looked up in COSMIC, which shows the type of mutation recorded. Finally, the allele frequencies of the SNPs were obtained from dbSNP Short Genetic Variations and range from 0.00000 to 0.00004.

dbSNP rs#subI-MUTANT prediction SCORERICOSMIC Mutation typeFrequency alleles
rs200806263G49CDecrease-0.915N/A0.00000
rs1060500015L63PDecrease-1.612Deletion - FrameshiftN/A*
rs121434264L64PDecrease-1.582N/A0.0000
rs778467088D90HDecrease-1.199N/A0.00001
rs770439843R222GDecrease-1.236Substitution - Nonsense0.00001
rs1060500022W231RDecrease-1.078Substitution - NonsenseN/A
rs770734388P360SDecrease-1.458N/A0.00001
rs754454928R441CDecrease-0.917Substitution - Missense0.00001
rs778432682R441HDecrease-1.319N/A0.00001
rs1292596060R504SDecrease-1.39N/AN/A
rs759222387R504HDecrease-1.529N/A0.00004

dbSNP rs*: is a locus accession for a variant type assigned by dbSNP, Sub*: substitution, RI*: Reliability Index, N/A: not available

Figure 1

Workflow of the study, showing the methods and software used.

First, we assessed the effect of the missense SNPs on the protein with SIFT, PolyPhen-2, PROVEAN and SNAP2, which together identified 31 SNPs predicted to be deleterious by all four tools, with different scores. PolyPhen-2 gave two types of result: probably damaging (27 SNPs) and possibly damaging (four SNPs) (Table 1).
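Keeping only the SNPs flagged by all four tools is a set intersection. The sketch below is a minimal illustration; the two rs IDs come from Table 1, while the `rsHYPOTHETICAL` IDs are made up to show variants that one tool fails to flag. For PolyPhen-2 it counts both "probably" and "possibly damaging" as damaging, as the text implies.

```python
from typing import Dict, Set


def consensus(calls: Dict[str, Set[str]]) -> Set[str]:
    """Return the variant IDs flagged as damaging by every tool in `calls`."""
    if not calls:
        return set()
    tool_sets = iter(calls.values())
    shared = set(next(tool_sets))
    for flagged in tool_sets:
        shared &= flagged
    return shared


if __name__ == "__main__":
    # rs28942098 and rs200806263 are from Table 1; the rsHYPOTHETICAL IDs are made up.
    calls = {
        "SIFT": {"rs28942098", "rs200806263", "rsHYPOTHETICAL1"},
        "PROVEAN": {"rs28942098", "rs200806263", "rsHYPOTHETICAL1"},
        "PolyPhen-2": {"rs28942098", "rs200806263"},  # probably + possibly damaging
        "SNAP2": {"rs28942098", "rs200806263", "rsHYPOTHETICAL2"},
    }
    print(sorted(consensus(calls)))
```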

Further functional analysis was done using SNPs&GO, PHD and P-Mut. The predictions varied among the tools, with different reliability indices (RI). Selecting only the SNPs predicted to be disease-related by all three tools left 11 SNPs (Table 2).
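The three-tool selection can be expressed as a filter over the scores in Table 2. This sketch assumes that each tool calls a variant "Disease" when its score exceeds 0.5; the P-Mut percentages in parentheses are not modelled, and the `rsHYPOTHETICAL` row is made up to show a variant that PHD would call neutral.

```python
from typing import List, NamedTuple

# ASSUMPTION: each tool calls a variant "Disease" when its score exceeds 0.5.
DISEASE_THRESHOLD = 0.5


class DiseaseScores(NamedTuple):
    rs_id: str
    substitution: str
    snps_and_go: float
    phd: float
    pmut: float


def disease_by_all(row: DiseaseScores, threshold: float = DISEASE_THRESHOLD) -> bool:
    """True only if SNPs&GO, PHD and P-Mut all exceed the threshold."""
    return min(row.snps_and_go, row.phd, row.pmut) > threshold


def keep_consensus(rows: List[DiseaseScores]) -> List[str]:
    return [row.substitution for row in rows if disease_by_all(row)]


if __name__ == "__main__":
    rows = [
        DiseaseScores("rs200806263", "G49C", 0.829, 0.893, 0.54),  # Table 2
        DiseaseScores("rs770734388", "P360S", 0.65, 0.503, 0.84),  # Table 2
        DiseaseScores("rsHYPOTHETICAL", "A100V", 0.62, 0.41, 0.70),  # made up: PHD neutral
    ]
    print(keep_consensus(rows))
```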

To study the effect of these eleven SNPs on protein stability, we used I-Mutant, which predicted that all eleven SNPs decrease the stability of the protein. We then used COSMIC to retrieve the mutation type recorded for each variant (some entries were not available in the database). Finally, we obtained the allele frequency of each variant from dbSNP Short Genetic Variations (some results were also unavailable) (Table 3).

To study gene-gene interactions and predict the possible functions of the CDC73 gene, we used GeneMANIA, which shows different types of interactions (such as co-expression and shared protein domains) with other genes, including CTR9 and CHUK. GeneMANIA also provided a list of predicted functions (Tables 4 and 5; Fig. 13).
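The interaction list in Table 5 can be summarized by network group and by partner gene. The sketch below uses a handful of rows transcribed from Table 5, not the full network; the helper names are hypothetical. The 76.64% share in Fig. 13 is presumably GeneMANIA's network weight rather than a share of rows, so counting rows is not expected to reproduce it.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float, str]  # (gene 1, gene 2, weight, network group)

# A handful of rows transcribed from Table 5, not the full network.
EDGES: List[Edge] = [
    ("CTR9", "CDC73", 0.008208696, "Co-expression"),
    ("CHUK", "CDC73", 0.001583963, "Co-expression"),
    ("AURKB", "NUP98", 0.004425747, "Pathway"),
    ("LEO1", "CDC73", 0.122442566, "Physical Interactions"),
    ("PAF1", "CDC73", 0.21769759, "Physical Interactions"),
    ("HSP90AA1", "CDC73", 0.3958806, "Physical Interactions"),
    ("SPCS3", "CDC73", 0.25866398, "Predicted"),
    ("CDK9", "CHUK", 0.006889931, "Shared protein domains"),
]


def edges_per_group(edges: List[Edge]) -> Counter:
    """Count rows in each network group."""
    return Counter(group for _, _, _, group in edges)


def partners_of(gene: str, edges: List[Edge]) -> Dict[str, List[str]]:
    """List the genes linked to `gene`, grouped by network type."""
    partners = defaultdict(set)
    for gene_a, gene_b, _, group in edges:
        if gene_a == gene:
            partners[group].add(gene_b)
        elif gene_b == gene:
            partners[group].add(gene_a)
    return {group: sorted(names) for group, names in partners.items()}


if __name__ == "__main__":
    print(edges_per_group(EDGES))
    print(partners_of("CDC73", EDGES))
```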

Table 4

Functions of CDC73 predicted by GeneMANIA, showing each function with its false discovery rate (FDR) and the number of genes annotated with it in the network and in the genome.

FunctionFDRGenes in networkGenes in genome
transcription elongation factor complex2.52E-09628
regulation of transcription elongation from RNA polymerase II promoter2.52E-09510
transcription elongation from RNA polymerase II promoter6.60E-09775
DNA-templated transcription, elongation5.45E-087108
positive regulation of DNA-templated transcription, elongation5.45E-08520
mRNA polyadenylation7.21E-08522
RNA polyadenylation9.97E-08524
regulation of DNA-templated transcription, elongation2.01E-07528
histone modification3.31E-078260
covalent chromatin modification3.47E-078265
DNA-directed RNA polymerase II, holoenzyme3.50E-07681
endodermal cell fate commitment3.50E-07410
RNA polymerase complex7.92E-07696
DNA-directed RNA polymerase complex7.92E-07695
nuclear DNA-directed RNA polymerase complex7.92E-07695
endodermal cell differentiation8.50E-07413
negative regulation of myeloid cell differentiation1.03424E-06544
cell fate commitment involved in formation of primary germ layer2.50852E-06417
histone monoubiquitination3.86491E-06419
endoderm formation5.6616E-06421
mRNA processing1.22928E-057287
endoderm development1.28119E-05426
histone ubiquitination2.24022E-05430
mRNA 3’-end processing2.5645E-05588
regulation of myeloid cell differentiation4.02517E-05597
protein monoubiquitination4.41323E-05437
formation of primary germ layer4.41323E-05437
RNA 3’-end processing4.41323E-055101
regulation of mRNA processing0.000289177459
positive regulation of mRNA 3’-end processing0.000313172315
regulation of mRNA 3’-end processing0.000372752316
myeloid cell differentiation0.0004465495165
Gastrulation0.000451884468
positive regulation of mRNA processing0.000804394321
histone H3-K4 methylation0.001915467328
cell fate commitment0.0029564454111
negative regulation of cell differentiation0.0040909335267
stem cell maintenance0.005276884340
histone lysine methylation0.006873226344
regulation of histone modification0.011780086353
histone methylation0.01357129356
stem cell differentiation0.0139558224171
cellular response to lipopolysaccharide0.01758198362
regulation of chromatin organization0.018029071363
cellular response to molecule of bacterial origin0.020035617366
embryonic morphogenesis0.0200356174192
RNA polymerase II core binding0.020433712210
peptidyl-lysine trimethylation0.022132906211
basal transcription machinery binding0.022132906211
basal RNA polymerase II transcription machinery binding0.022132906211
mRNA cleavage0.022132906211
transcriptionally active chromatin0.022132906211
regulation of histone H3-K4 methylation0.022132906211
protein alkylation0.022434688373
protein methylation0.022434688373
cellular response to biotic stimulus0.022949254374
mRNA cleavage factor complex0.024254179212
RNA polymerase core enzyme binding0.024254179212
positive regulation of histone methylation0.028159975213
response to lipopolysaccharide0.037153654390
stem cell development0.037153654390
regulation of chromosome organization0.037153654390
RNA polymerase binding0.040493707216
regulation of histone methylation0.04514658217
response to molecule of bacterial origin0.0498915933101
peptidyl-lysine methylation0.054973868219
macromolecule methylation0.0805064563120

*FDR: false discovery rate is greater than or equal to the probability that this is a false positive

Table 5

Genes co-expressed with, sharing protein domains with, or interacting with CDC73, as predicted by GeneMANIA, showing the type of interaction between each gene pair.

Gene 1 | Gene 2 | Weight | Network group
AP3S1 | HSP90AA1 | 0.013259681 | Co-expression
CTR9 | CDC73 | 0.008208696 | Co-expression
CTR9 | LEO1 | 0.003100712 | Co-expression
WDR61 | LEO1 | 0.007506342 | Co-expression
WDR61 | CTR9 | 0.003934787 | Co-expression
KMT2C | BCL9L | 0.01605392 | Co-expression
CDK9 | CPSF4 | 0.007431932 | Co-expression
GTF2F1 | PAF1 | 0.004063675 | Co-expression
GTF2F1 | KMT2C | 0.008976047 | Co-expression
AURKB | TKT | 0.011608324 | Co-expression
NUP98 | CTR9 | 0.014227687 | Co-expression
CPSF4 | TKT | 0.00430011 | Co-expression
CPSF4 | AURKB | 0.006637127 | Co-expression
CHUK | CSTF3 | 0.016616315 | Co-expression
CSTF2 | AURKB | 0.004579851 | Co-expression
CHUK | CDC73 | 0.001583963 | Co-expression
CHUK | NUP98 | 0.004846978 | Co-expression
GTF2F1 | TKT | 0.009125041 | Co-expression
CPSF4 | CSTF3 | 0.014529244 | Co-expression
CHUK | NUP98 | 0.008912819 | Co-expression
NUP98 | CDC73 | 0.00588025 | Co-expression
CHUK | CDC73 | 0.001885684 | Co-expression
CSTF2 | LEO1 | 0.006971632 | Co-expression
CSTF3 | CTR9 | 0.006781118 | Co-expression
CHUK | CDC73 | 0.011535326 | Co-expression
CHUK | CTR9 | 0.00340099 | Co-expression
AP3S1 | CTR9 | 0.004597292 | Co-expression
GTF2F1 | PAF1 | 0.010657921 | Co-expression
GTF2F1 | CPSF4 | 0.011196721 | Co-expression
CSTF2 | CTR9 | 0.011377413 | Co-expression
CDK9 | CPSF4 | 0.009146069 | Co-expression
AURKB | CSTF3 | 0.006553773 | Co-expression
SPCS3 | CHUK | 0.00718225 | Co-expression
AURKB | NUP98 | 0.004425747 | Pathway
CPSF4 | CSTF3 | 0.11179613 | Pathway
CHUK | HSP90AA1 | 0.006348364 | Pathway
CSTF2 | CSTF3 | 0.009734826 | Pathway
CSTF2 | CPSF4 | 0.11179613 | Pathway
GTF2F1 | CSTF3 | 0.005905584 | Pathway
GTF2F1 | CSTF2 | 0.005905584 | Pathway
GTF2F1 | CDK9 | 0.011108032 | Pathway
CSTF2 | CSTF3 | 0.011789562 | Pathway
GTF2F1 | CSTF3 | 0.008021638 | Pathway
GTF2F1 | CSTF2 | 0.007878398 | Pathway
GTF2F1 | CDK9 | 0.014240757 | Pathway
CHUK | HSP90AA1 | 0.088913664 | Pathway
LEO1 | CDC73 | 0.122442566 | Physical Interactions
PAF1 | CDC73 | 0.21769759 | Physical Interactions
PAF1 | LEO1 | 0.15407903 | Physical Interactions
CTR9 | CDC73 | 0.21769759 | Physical Interactions
CTR9 | LEO1 | 0.15407903 | Physical Interactions
CTR9 | PAF1 | 0.27394587 | Physical Interactions
WDR61 | CDC73 | 0.21769759 | Physical Interactions
WDR61 | LEO1 | 0.15407903 | Physical Interactions
WDR61 | PAF1 | 0.27394587 | Physical Interactions
WDR61 | CTR9 | 0.27394587 | Physical Interactions
HSP90AA1 | CDC73 | 0.3958806 | Physical Interactions
CDK9 | HSP90AA1 | 0.20299108 | Physical Interactions
CDK9 | CDC73 | 0.119673245 | Physical Interactions
AURKB | CDC73 | 0.1827706 | Physical Interactions
AURKB | HSP90AA1 | 0.042777777 | Physical Interactions
PAF1 | CDC73 | 0.13507365 | Physical Interactions
CTR9 | CDC73 | 0.18643428 | Physical Interactions
NUP98 | CDC73 | 0.203537 | Physical Interactions
NUP98 | PAF1 | 0.31207067 | Physical Interactions
CHUK | CDC73 | 0.1774874 | Physical Interactions
CHUK | PAF1 | 0.2721304 | Physical Interactions
CDK9 | HSP90AA1 | 0.040214784 | Physical Interactions
LEO1 | CDC73 | 0.20954604 | Physical Interactions
PAF1 | CDC73 | 0.16041815 | Physical Interactions
PAF1 | LEO1 | 0.08672058 | Physical Interactions
CTR9 | CDC73 | 0.24223034 | Physical Interactions
CTR9 | LEO1 | 0.1309475 | Physical Interactions
CTR9 | PAF1 | 0.10024697 | Physical Interactions
CSTF2 | CSTF3 | 0.24596581 | Physical Interactions
GTF2F1 | CDC73 | 0.1822143 | Physical Interactions
GTF2F1 | CDK9 | 0.077137925 | Physical Interactions
LEO1 | CDC73 | 0.3225924 | Physical Interactions
PAF1 | CDC73 | 0.3225924 | Physical Interactions
CTR9 | CDC73 | 0.3225924 | Physical Interactions
WDR61 | CDC73 | 0.17647609 | Physical Interactions
CSTF3 | CDC73 | 0.3225924 | Physical Interactions
FIP1L1 | CDC73 | 0.3225924 | Physical Interactions
CPSF4 | CDC73 | 0.17647609 | Physical Interactions
CHUK | HSP90AA1 | 0.02887361 | Physical Interactions
KMT2C | CDC73 | 0.15483053 | Physical Interactions
CSTF2 | CDC73 | 0.16203262 | Physical Interactions
LEO1 | CDC73 | 0.09317254 | Physical Interactions
PAF1 | CDC73 | 0.08377836 | Physical Interactions
PAF1 | LEO1 | 0.07275753 | Physical Interactions
CTR9 | CDC73 | 0.09010142 | Physical Interactions
CTR9 | LEO1 | 0.078248814 | Physical Interactions
CTR9 | PAF1 | 0.07035932 | Physical Interactions
WDR61 | CDC73 | 0.114276804 | Physical Interactions
WDR61 | LEO1 | 0.09924398 | Physical Interactions
WDR61 | PAF1 | 0.089237645 | Physical Interactions
WDR61 | CTR9 | 0.09597274 | Physical Interactions
AP3S1 | CDC73 | 0.2352175 | Physical Interactions
CSTF2 | CSTF3 | 0.077111326 | Physical Interactions
LEO1 | CDC73 | 0.049589828 | Physical Interactions
PAF1 | CDC73 | 0.10079521 | Physical Interactions
PAF1 | LEO1 | 0.11382505 | Physical Interactions
CTR9 | CDC73 | 0.01533186 | Physical Interactions
CTR9 | LEO1 | 0.017313818 | Physical Interactions
CTR9 | PAF1 | 0.035191692 | Physical Interactions
WDR61 | CDC73 | 0.041940387 | Physical Interactions
WDR61 | LEO1 | 0.047362044 | Physical Interactions
WDR61 | PAF1 | 0.09626707 | Physical Interactions
WDR61 | CTR9 | 0.014643089 | Physical Interactions
FIP1L1 | CSTF3 | 0.113308094 | Physical Interactions
CPSF4 | FIP1L1 | 0.14965165 | Physical Interactions
CSTF2 | FIP1L1 | 0.10615889 | Physical Interactions
CDK9 | CTR9 | 0.0411037 | Physical Interactions
LEO1 | CDC73 | 0.05172406 | Physical Interactions
PAF1 | CDC73 | 0.028004196 | Physical Interactions
PAF1 | LEO1 | 0.054951686 | Physical Interactions
CTR9 | CDC73 | 0.04297603 | Physical Interactions
CTR9 | LEO1 | 0.08433042 | Physical Interactions
CTR9 | PAF1 | 0.045657773 | Physical Interactions
WDR61 | CDC73 | 0.04966663 | Physical Interactions
WDR61 | LEO1 | 0.09745915 | Physical Interactions
WDR61 | PAF1 | 0.05276587 | Physical Interactions
WDR61 | CTR9 | 0.080975994 | Physical Interactions
CSTF3 | CDC73 | 0.06669109 | Physical Interactions
FIP1L1 | CSTF3 | 0.14174445 | Physical Interactions
AURKB | HSP90AA1 | 0.002030884 | Physical Interactions
CPSF4 | CDC73 | 0.09574746 | Physical Interactions
CPSF4 | FIP1L1 | 0.20350051 | Physical Interactions
CHUK | HSP90AA1 | 0.001720827 | Physical Interactions
BCL9L | CDC73 | 0.2791709 | Physical Interactions
KMT2C | CDC73 | 0.04642578 | Physical Interactions
CSTF2 | CDC73 | 0.0463337 | Physical Interactions
CSTF2 | CSTF3 | 0.11722768 | Physical Interactions
CDK9 | PAF1 | 0.01438359 | Physical Interactions
GTF2F1 | CDK9 | 0.013446528 | Physical Interactions
AURKB | HSP90AA1 | 0.03819712 | Physical Interactions
CDK9 | HSP90AA1 | 0.03819712 | Physical Interactions
LEO1 | CDC73 | 0.5193767 | Physical Interactions
PAF1 | CDC73 | 0.5193767 | Physical Interactions
CTR9 | CDC73 | 0.37205398 | Physical Interactions
WDR61 | CDC73 | 0.5193767 | Physical Interactions
AURKB | HSP90AA1 | 0.080966905 | Physical Interactions
CDK9 | HSP90AA1 | 0.036444858 | Physical Interactions
LEO1 | CDC73 | 0.0771831 | Physical Interactions
PAF1 | CDC73 | 0.031383026 | Physical Interactions
PAF1 | LEO1 | 0.14005615 | Physical Interactions
CTR9 | CDC73 | 0.058806863 | Physical Interactions
CTR9 | PAF1 | 0.10671071 | Physical Interactions
WDR61 | CTR9 | 0.34459004 | Physical Interactions
CPSF4 | FIP1L1 | 0.44817418 | Physical Interactions
CHUK | HSP90AA1 | 0.0081622 | Physical Interactions
HSF2BP | CDC73 | 0.31519976 | Physical Interactions
CDK9 | HSP90AA1 | 0.010792454 | Physical Interactions
HSF2BP | CDC73 | 0.58701295 | Physical Interactions
CPSF4 | FIP1L1 | 0.6677753 | Physical Interactions
CHUK | HSP90AA1 | 0.007963367 | Physical Interactions
CSTF2 | CSTF3 | 0.2417784 | Physical Interactions
CTR9 | LEO1 | 0.09094542 | Predicted
CTR9 | LEO1 | 0.0869592 | Predicted
LEO1 | CDC73 | 1 | Predicted
CPSF4 | FIP1L1 | 1 | Predicted
CSTF3 | LEO1 | 0.020077549 | Predicted
FIP1L1 | CSTF3 | 0.015798066 | Predicted
NUP98 | CTR9 | 0.047920566 | Predicted
SPCS3 | CDC73 | 0.25866398 | Predicted
CTR9 | LEO1 | 0.34126943 | Predicted
TKT | CDC73 | 1 | Predicted
TKT | CDC73 | 1 | Predicted
CHUK | AURKB | 0.006874138 | Shared protein domains
CDK9 | AURKB | 0.003942981 | Shared protein domains
CDK9 | CHUK | 0.006889931 | Shared protein domains
CHUK | AURKB | 0.003882364 | Shared protein domains
CDK9 | AURKB | 0.002896895 | Shared protein domains
CDK9 | CHUK | 0.004389529 | Shared protein domains

We used Project HOPE and Chimera to study the structural changes caused by the eleven SNPs in the parafibromin protein, combining the two-dimensional structures from the HOPE reports with three-dimensional structures generated in Chimera (Figs. 2-12). These figures show the differences between native and mutant amino acids: the green and red boxes show the schematic structures of the native amino acid (left) and the mutant one (right). In the 2D images, the backbone, which is the same for every amino acid, is colored red, and the side chain, unique to each amino acid, is colored black. In the 3D images, wild-type residues are colored green, mutant residues red and the rest of the protein cyan.

Figure 2

SNP ID rs200806263: G49C. The amino acid glycine (green) is replaced by cysteine (red) at position 49. Illustration made with UCSF Chimera (v1.10.2) and Project HOPE.

Figure 3

SNP ID rs1060500015: L63P. The amino acid leucine (green) is replaced by proline (red) at position 63. Illustration made with UCSF Chimera (v1.10.2) and Project HOPE.

Figure 4

SNP ID rs121434264: L64P. The amino acid leucine (green) is replaced by proline (red) at position 64. Illustration made with UCSF Chimera (v1.10.2) and Project HOPE.

Figure 5

SNP ID rs778467088: D90H. The amino acid aspartic acid (green) is replaced by histidine (red) at position 90. Illustration made with UCSF Chimera (v1.10.2) and Project HOPE.

Figure 6

SNP ID rs770439843: R222G. The amino acid arginine (green) is replaced by glycine (red) at position 222. Illustration made with UCSF Chimera (v1.10.2) and Project HOPE.

Figure 7

SNP ID rs1060500022: W231R. The amino acid tryptophan (green) is replaced by arginine (red) at position 231. Illustration made with UCSF Chimera (v1.10.2) and Project HOPE.

Figure 8

SNP ID rs770734388: P360S. The amino acid proline (green) is replaced by serine (red) at position 360. Illustration made with UCSF Chimera (v1.10.2) and Project HOPE.

Figure 9

SNP ID rs754454928: R441C. The amino acid arginine (green) is replaced by cysteine (red) at position 441. Illustration made with UCSF Chimera (v1.10.2) and Project HOPE.

Figure 10

SNP ID rs778432682: R441H. The amino acid arginine (green) is replaced by histidine (red) at position 441. Illustration made with UCSF Chimera (v1.10.2) and Project HOPE.

Figure 11

SNP ID rs1292596060: R504S. The amino acid arginine (green) is replaced by serine (red) at position 504. Illustration made with UCSF Chimera (v1.10.2) and Project HOPE.

Figure 12

SNP ID rs759222387: R504H. The amino acid arginine (green) is replaced by histidine (red) at position 504. Illustration made with UCSF Chimera (v1.10.2) and Project HOPE.

Figure 13

Interactions between CDC73 and related genes predicted by GeneMANIA, showing the different types of interaction between these genes. Physical interactions account for the largest share (76.64%).

Discussion

Genes with a proven association with cancer have been found to contain single nucleotide polymorphisms (SNPs) (33), and disease-causing SNPs tend to occur in evolutionarily conserved regions, which play an essential role in the structural integrity and function of the protein (34, 35). We therefore focused on the coding region, which revealed 11 SNPs in the CDC73 gene predicted to be deleterious, out of 184 missense SNPs downloaded from the NCBI website. These SNPs were then analyzed with twelve software tools to study the effect of each mutation on the structure and function of the parafibromin protein.

Project HOPE was used to study the effect of each mutation on the physicochemical properties of the protein. The HOPE report showed changes in the charge, size and hydrophobicity of the affected residues, which may disturb interactions with other molecules. In G49C, the mutant residue is bigger and more hydrophobic than the wild-type residue. In L63P, the mutant residue is smaller than the wild-type residue, which leads to an empty space in the core of the protein. In L64P, the mutant residue is likewise smaller than the wild-type residue, causing an empty space in the core of the protein. In D90H, the charge of the buried wild-type residue is lost. The two amino acids also differ in size: the mutant residue is bigger, and because the wild-type residue was buried in the core of the protein, the mutant residue will probably not fit. In R222G, the difference in charge between the wild-type and mutant amino acid can cause loss of interactions with other molecules or residues; the mutant residue is also smaller, which might lead to further loss of interactions. Furthermore, the mutation introduces a more hydrophobic residue at this position, which can result in loss of hydrogen bonds and/or disturb correct folding. In W231R, the mutation introduces a charge, which can cause repulsion of ligands or other residues with the same charge. The mutant residue is also smaller, which might lead to loss of interactions, and because the hydrophobicity of the two residues differs, hydrophobic interactions either in the core of the protein or on its surface may be lost. In P360S, the mutant residue is smaller than the wild-type residue, so the mutation will cause an empty space in the core of the protein; the difference in hydrophobicity will also cause loss of hydrophobic interactions in the core. In R441C, the charge of the wild-type residue is lost, causing loss of interactions with other molecules; the two amino acids also differ in size, possibly causing a loss of external interactions. In R441H, interactions are lost due to the loss of charge in the mutant amino acid. In R504S, interactions are lost due to differences in size, charge and hydrophobicity. In R504H, interactions are lost due to the loss of charge and the difference in size, the mutant amino acid being smaller than the wild-type one.
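The size, charge and hydrophobicity contrasts reported by Project HOPE can be approximated with standard residue scales. The Python sketch below is not HOPE's method: it swaps in the Kyte-Doolittle hydropathy index, Zamyatnin residue volumes and a simple charge assignment at neutral pH (histidine treated as uncharged), and `compare` is a hypothetical helper name. Unlike HOPE, it ignores structural context such as whether a residue is buried in the core.

```python
import re
from dataclasses import dataclass

# Kyte-Doolittle hydropathy index (higher = more hydrophobic).
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Residue volumes in cubic angstroms (Zamyatnin scale).
VOLUME_A3 = {
    "A": 88.6, "R": 173.4, "N": 114.1, "D": 111.1, "C": 108.5, "Q": 143.8,
    "E": 138.4, "G": 60.1, "H": 153.2, "I": 166.7, "L": 166.7, "K": 168.6,
    "M": 162.9, "F": 189.9, "P": 112.7, "S": 89.0, "T": 116.1, "W": 227.8,
    "Y": 193.6, "V": 140.0,
}

# Approximate side-chain charge near pH 7; all other residues (including H) are 0.
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}

AA = "ACDEFGHIKLMNPQRSTVWY"


@dataclass
class SubstitutionEffect:
    wild_type: str
    position: int
    mutant: str
    volume_change: float      # mutant minus wild type, cubic angstroms
    hydropathy_change: float  # mutant minus wild type, Kyte-Doolittle units
    charge_change: int        # mutant minus wild type


def compare(substitution: str) -> SubstitutionEffect:
    """Compare wild-type and mutant residues for a one-letter name like 'W231R'."""
    match = re.fullmatch(rf"([{AA}])(\d+)([{AA}])", substitution)
    if match is None:
        raise ValueError(f"not a one-letter substitution: {substitution!r}")
    wt, pos, mt = match.group(1), int(match.group(2)), match.group(3)
    return SubstitutionEffect(
        wt, pos, mt,
        volume_change=round(VOLUME_A3[mt] - VOLUME_A3[wt], 1),
        hydropathy_change=round(KD_HYDROPATHY[mt] - KD_HYDROPATHY[wt], 1),
        charge_change=CHARGE.get(mt, 0) - CHARGE.get(wt, 0),
    )


if __name__ == "__main__":
    for s in ["G49C", "L63P", "L64P", "D90H", "R222G", "W231R",
              "P360S", "R441C", "R441H", "R504S", "R504H"]:
        e = compare(s)
        print(f"{s}: volume {e.volume_change:+.1f} A^3, "
              f"hydropathy {e.hydropathy_change:+.1f}, charge {e.charge_change:+d}")
```

For example, `compare("W231R")` gives a negative volume change and a charge change of +1, in line with the description above of a smaller residue that introduces a charge, while `compare("G49C")` gives a larger and more hydrophobic residue.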

We also identified shared domains, namely Cdc73/Parafibromin (IPR007852), the C-terminal domain superfamily (IPR038103) and Cell division control protein 73, C-terminal (IPR031336), which indicates the conservation and significance of these SNPs.

UCSF Chimera was used to show the 3D changes in the structure of the protein, alongside 2D schematic structures from Project HOPE for comparison (Fig. 2-12). The structural differences between the wild-type and mutant proteins visualized with these two tools support the predicted pathogenic impact of the eleven SNPs on the parafibromin protein.

We also sought to identify variants in the CDC73 gene whose impact on the protein could lead to cancer development, using COSMIC and focusing on the 11 novel mutations at hand. This analysis indicated that three SNPs (R222G, W231R and R441C) lead to nonsense substitutions, while one SNP (L63P) leads to a frameshift deletion, which indicates the significance of these SNPs in the development of hyperparathyroidism-jaw tumor (HPT-JT) syndrome. No prediction was available on the website for the other SNPs.

We analyzed allele frequencies using dbSNP Short Genetic Variations, which reports the global frequency of each variant in the gene in question. R504H had the highest allele frequency, with a global frequency of 0.00004, while the other SNPs ranged from 0 to 0.00001, suggesting that R504H could be the best candidate as a diagnostic marker.
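To put these frequencies in perspective, a global allele frequency can be converted into an expected number of heterozygous carriers. The Python sketch below assumes Hardy-Weinberg equilibrium at a single biallelic site, which dbSNP frequencies do not guarantee; the helper names `genotype_frequencies` and `carriers_per` are hypothetical.

```python
def genotype_frequencies(q: float) -> dict:
    """Expected genotype frequencies at a biallelic site under Hardy-Weinberg equilibrium.

    q is the variant (alternate) allele frequency; p = 1 - q is the reference allele frequency.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    p = 1.0 - q
    return {"ref/ref": p * p, "ref/alt": 2 * p * q, "alt/alt": q * q}


def carriers_per(q: float, population: int = 100_000) -> float:
    """Expected number of heterozygous carriers among `population` people."""
    return genotype_frequencies(q)["ref/alt"] * population


if __name__ == "__main__":
    # Frequencies quoted in the text: R504H, and the upper bound for the other SNPs.
    for label, q in [("R504H", 0.00004), ("other SNPs (upper bound)", 0.00001)]:
        print(f"{label}: ~{carriers_per(q):.1f} carriers per 100,000")
    # R504H: ~8.0 carriers per 100,000
    # other SNPs (upper bound): ~2.0 carriers per 100,000
```

Under these assumptions, a frequency of 0.00004 corresponds to roughly 8 heterozygous carriers per 100,000 people, about four times the upper bound for the other SNPs.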

Previous papers have reported novel mutations associated with jaw tumor syndrome at the following positions: c.191-192delT (19) and c.1379delT/p.L460Lfs*18 (36). A novel deletion of exons 4 to 10 of CDC73 was detected in another study (37).

In another study, next-generation sequencing (NGS) revealed four pathogenic or likely pathogenic germline sequence variants in CDC73 related to primary hyperparathyroidism (PHPT): c.271C>T (p.Arg91*), c.496C>T (p.Gln166*), c.685A>T (p.Arg229*) and c.787C>T (p.Arg263Cys) (2). Of these, only R263C (p.Arg263Cys) was found to be pathogenic in the present study (by SIFT, PROVEAN, PolyPhen-2 and SNAP2). Furthermore, a study by Sulaiman et al. identified another polymorphism at 109T>G (38). CDC73 gene mutations have also been reported to be associated with other types of cancer, such as gastric, colorectal, ovarian, and head and neck cancers (38). The current work provides new mutations in the CDC73 gene related to jaw tumor syndrome, adding to the growing knowledge about the association between CDC73 gene mutations and this syndrome.
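The protein-level names in these reports follow from the coding-DNA positions, because each codon spans three nucleotides: position c.N falls in codon ceil(N/3). The Python sketch below, with hypothetical helper names, checks that the cited variants map to the codons their p. names give. It assumes standard HGVS c. numbering (c.1 is the A of the ATG start codon) and handles only simple positions inside the coding sequence, not intronic offsets.

```python
import re
from typing import Tuple


def codon_of(c_position: int) -> Tuple[int, int]:
    """Map a coding-DNA position c.N (N >= 1) to (codon number, base 1-3 within the codon)."""
    if c_position < 1:
        raise ValueError("coding-sequence positions start at c.1")
    return (c_position - 1) // 3 + 1, (c_position - 1) % 3 + 1


def codon_of_variant(hgvs_c: str) -> Tuple[int, int]:
    """Locate the first position of a simple c. variant such as 'c.787C>T' or 'c.1379delT'."""
    match = re.match(r"c\.(\d+)", hgvs_c)
    if match is None:
        raise ValueError(f"unsupported notation: {hgvs_c!r}")
    return codon_of(int(match.group(1)))


if __name__ == "__main__":
    for variant in ["c.271C>T", "c.496C>T", "c.685A>T", "c.787C>T", "c.1379delT"]:
        codon, base = codon_of_variant(variant)
        print(f"{variant}: codon {codon}, base {base}")
```

For instance, c.787C>T falls at the first base of codon 263, consistent with p.Arg263Cys, and c.1379delT falls at the second base of codon 460, consistent with p.L460Lfs*18.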

The mutations identified in this study may be used as diagnostic markers for the disease and could serve as “actionable targets” for chemotherapeutic intervention in patients whose disease is no longer surgically curable. However, further in vitro and in vivo studies are needed to confirm these results.

Conclusion

In this study, the effect of SNPs in the CDC73 gene was thoroughly investigated using different bioinformatics prediction tools. Eleven novel mutations were predicted to have a damaging impact on the structure and function of the protein and may thus be used as diagnostic markers for hyperparathyroidism-jaw tumor (HPT-JT) syndrome.

eISSN: 2564-615X
Language: English
Publication timeframe: 4 times per year
Journal Subjects: Life Sciences, Genetics, Biotechnology, Bioinformatics, other