In silico analysis of CDC73 gene revealing 11 novel SNPs with possible association to Hyperparathyroidism-Jaw Tumor syndrome

Primary hyperparathyroidism (PHPT) is a disease caused by the autonomous over-secretion of parathyroid hormone (PTH) due to adenoma and hyperplasia of the parathyroid gland; it has an estimated prevalence of 6.7 per 1,000 population. The majority of PHPT cases are not inherited, while some are due to genetic mutations. The inherited forms include multiple endocrine neoplasia (MEN), familial hypocalciuric hypercalcemia, neonatal severe hyperparathyroidism, familial isolated hyperparathyroidism and hyperparathyroidism-jaw tumor (HPT-JT) syndrome (caused by mutations in the CDC73 gene) (1, 2, 3, 4, 5).

Hyperparathyroidism-jaw tumor syndrome (HPT-JT; OMIM #145001) was first observed by Jackson in 1958. It is an autosomal dominant disorder with variable expression (6, 7, 8). It includes parathyroid adenomas, fibro-osseous jaw tumors, uterine tumors and renal diseases such as hamartomas, polycystic disease and Wilms tumors or adenocarcinoma. Diagnosis of HPT-JT is important because of its genetic involvement and the 24% chance of malignant transformation (9, 10). Nuclear medicine imaging, especially parathyroid scintigraphy with 99mTc-MIBI (methoxyisobutyl-isonitrile), has an important role in establishing the diagnosis (11).

CDC73-related HPT-JT syndrome results from truncating (80%) or missense variants in the tumor suppressor gene Cell Division Cycle protein 73 (CDC73, also known as HRPT-2), which is located at chromosome 1q31.2 and encodes the 531-amino acid protein parafibromin (12, 13, 14, 15, 16). This protein regulates transcriptional and post-transcriptional events. The majority of the CDC73 gene mutations related to hyperparathyroidism and parathyroid carcinoma are frame-shift, nonsense or missense mutations occurring within the protein-encoding exons (16). It has been estimated that 70% of patients carrying such a mutation may develop PHPT (17).

In vitro characterization is a demanding approach that requires a lot of resources, time and money. For these reasons, bioinformatics analysis is an appropriate, fast, dependable and cost-effective approach to enhance our understanding of the role of mutations in pathogenesis. In silico (bioinformatics) analysis has become an important tool in recent years, helping to advance biological research through the use of different algorithms and databases (18). Many studies of the role of the CDC73 gene in the development of HPT-JT syndrome have focused on deletion-type mutations (5, 19, 20), yet few have studied single nucleotide polymorphisms (SNPs). This study is unique because it is the first in silico analysis of the CDC73 gene associating it with jaw tumor syndrome. Therefore, the aim of this study is to assess the effect of SNPs in the CDC73 gene on the structure and function of the parafibromin protein using different bioinformatics tools.

Methodology

Source of nsSNPs

The SNPs of the human CDC73 gene were obtained from the single nucleotide polymorphism database (dbSNP) at the National Center for Biotechnology Information (NCBI) web site (www.ncbi.nlm.nih.gov), and the protein sequence with ID Q6P1J9 was obtained from the UniProt database (www.uniprot.org).

Servers used for identifying the most damaging and disease-related SNPs

Six servers were used to assess the functional and disease-related impact of deleterious nsSNPs:

SIFT (Sorting Intolerant From Tolerant)

SIFT was the first online server used in our assessment. It predicts whether an amino acid substitution affects protein function based on sequence homology and the physical properties of amino acids. SIFT scores range from 0 to 1. An amino acid substitution is predicted to be deleterious if the score is ≤ 0.05 and tolerated if the score is > 0.05. The protein sequence obtained from UniProt and the substitutions of interest were submitted to the SIFT server, and each substitution was classified as deleterious or tolerated according to its score. The deleterious SNPs were further evaluated (21). It is available online at www.sift.bii.a-star.edu.sg/
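The cutoff rule can be sketched in a few lines of Python. This is only an illustration and not part of the SIFT server: the function name classify_sift is hypothetical, and the sketch assumes the conventional SIFT cutoff, in which scores at or below 0.05 are called deleterious (consistent with the scores of 0.04 and 0.05 that are flagged as affecting function in Table 1).

```python
def classify_sift(score: float, cutoff: float = 0.05) -> str:
    """Return the SIFT call for a score in [0, 1] (hypothetical helper).

    Assumption: scores at or below the cutoff are 'deleterious',
    scores above it are 'tolerated'.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"SIFT scores lie between 0 and 1, got {score}")
    # A low score means the substitution is rarely seen in homologous sequences.
    return "deleterious" if score <= cutoff else "tolerated"


if __name__ == "__main__":
    for s in (0.0, 0.04, 0.05, 0.30):
        print(f"{s:.2f} -> {classify_sift(s)}")
```

Keeping the cutoff as a parameter makes it easy to test how sensitive the final SNP list is to the chosen threshold.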

PolyPhen-2 (prediction of functional effect of human nsSNPs)

PolyPhen-2 (Polymorphism Phenotyping v2) is another online server. Available as software and via a web server, it predicts the possible impact of amino acid substitutions on the stability and function of human proteins using structural and comparative evolutionary considerations. It performs functional annotation of single-nucleotide polymorphisms (SNPs), maps coding SNPs to gene transcripts, extracts protein sequence annotations and structural attributes and builds conservation profiles. It then estimates the probability of the missense mutation being damaging based on a combination of all these properties. PolyPhen-2 features include a high-quality multiple protein sequence alignment pipeline and a prediction method employing machine-learning classification (22).

The output of the PolyPhen-2 prediction pipeline is a prediction of probably damaging, possibly damaging, or benign, along with a numerical score ranging from 0.0 (benign) to 1.0 (damaging). “Probably damaging” indicates damaging with high confidence, “possibly damaging” indicates damaging with low confidence, and “benign” means that the query substitution is predicted to be benign with high confidence. Only the damaging SNPs were further evaluated. It is available online at http://genetics.bwh.harvard.edu/pph2/

PROVEAN (Protein Variation Effect Analyzer)

PROVEAN was the third software tool used; it predicts whether an amino acid substitution has an impact on the biological function of a protein. PROVEAN is useful for filtering sequence variants to identify nonsynonymous or indel variants that are predicted to be functionally important. A variant is predicted to be deleterious when the score is ≤ -2.5 and neutral when the score is above -2.5 (23). The deleterious predictions were considered for further evaluation. It is available at http://provean.jcvi.org/index.php

SNAP2

SNAP2 is another tool for predicting the functional effects of mutations. It is a trained classifier based on a machine-learning device called a “neural network”. It distinguishes between effect and neutral variants/non-synonymous SNPs by taking a variety of sequence and variant features into account. The most important input signal for the prediction is the evolutionary information taken from an automatically generated multiple sequence alignment. In addition, structural features such as predicted secondary structure and solvent accessibility are considered. If available, annotation (i.e. known functional residues, patterns, regions) of the sequence or close homologs is also pulled in. In a cross-validation over 100,000 experimentally annotated variants, SNAP2 reached a sustained two-state accuracy (effect/neutral) of 82% (at an AUC of 0.9) (24). We submitted the reference sequence and then recorded the relevant results. It is available at https://rostlab.org/services/snap2web/

SNPs&GO

SNPs&GO is a web server for predicting disease-associated variations from protein sequence and structure. It is an accurate method that, starting from a protein sequence, can predict whether a variation is disease related or not by exploiting the corresponding protein functional annotation (25). We submitted the reference sequence in FASTA format. The results obtained from this server come from three different analytical algorithms: PHD, SNPs&GO and PANTHER. The output consists of a table listing the mutated position in the protein sequence, the wild-type residue, the new residue, whether the mutation is predicted to be disease related (Disease) or a neutral polymorphism (Neutral), and the reliability index (RI). It is available at http://snps.biofold.org/snps-and-go/index.html
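As a rough illustration of how a disease probability relates to the prediction and its reliability index, the sketch below assumes two conventions that are not stated in this paper: a variant is called Disease when the probability exceeds 0.5, and RI = 20 × |P − 0.5|, rounded to the nearest integer. The function names are hypothetical. Most rows of Table 2 are consistent with this assumed relation (for example, a probability of 0.829 with RI 7), but rounding of the displayed probabilities means borderline values may differ by one.

```python
def disease_call(p_disease: float) -> str:
    """Assumed convention: 'Disease' when the disease probability exceeds 0.5."""
    return "Disease" if p_disease > 0.5 else "Neutral"


def reliability_index(p_disease: float) -> int:
    """Assumed relation RI = 20 * |P - 0.5|, rounded, giving a 0-10 scale."""
    if not 0.0 <= p_disease <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return round(20 * abs(p_disease - 0.5))


if __name__ == "__main__":
    # Probabilities taken from the SNPs&GO and PHD columns of Table 2.
    for p in (0.829, 0.657, 0.553, 0.503):
        print(p, disease_call(p), reliability_index(p))
```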

P-Mut

It is a web-based tool for the annotation of pathological variants in proteins. The P-Mut web portal allows the user to perform pathology predictions, to access a complete repository of pre-calculated predictions, and to generate and validate new predictors. It requires the reference sequence of the gene of interest to perform these functions. The P-Mut portal is freely accessible at http://mmb.irbbarcelona.org/PMut (26).

After all these analytical algorithms had been used for prediction, the resulting SNPs were further analyzed with another tool, I-Mutant.

I-Mutant (predictor of the effect of single point protein mutations)

I-Mutant is an online server that predicts the change in protein stability upon single-site mutation, starting from the protein sequence alone or from the protein structure when available. It is freely available at http://gpcr2.biocomp.unibo.it/cgi/predictors/I-Mutant3.0/I-Mutant3.0.cgi (27). We selected the protein sequence tool on the website and submitted the reference sequence without the header line. We then entered the position of each SNP of interest together with the new residue, and recorded the result.
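Preparing the position and new residue for each variant is easy to get wrong by hand, so the substitution strings used in this paper (for example R441C) can be parsed and checked against the reference sequence first. The sketch below is only an illustration: the helper names are hypothetical, and the short sequence in the usage example is a toy string, not the parafibromin sequence.

```python
import re

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
_SUB_RE = re.compile(rf"^([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])$")


def parse_substitution(sub: str) -> tuple[str, int, str]:
    """Split a string such as 'R441C' into (wild type, 1-based position, mutant)."""
    match = _SUB_RE.match(sub.strip().upper())
    if match is None:
        raise ValueError(f"not a valid substitution: {sub!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_substitution(sequence: str, sub: str) -> str:
    """Return the mutant sequence after checking the wild-type residue."""
    wild, pos, mutant = parse_substitution(sub)
    if not 1 <= pos <= len(sequence):
        raise ValueError(f"position {pos} is outside the sequence")
    if sequence[pos - 1] != wild:
        raise ValueError(
            f"expected {wild} at position {pos}, found {sequence[pos - 1]}"
        )
    return sequence[: pos - 1] + mutant + sequence[pos:]


if __name__ == "__main__":
    print(parse_substitution("R441C"))
    # Toy sequence for illustration only (not parafibromin).
    print(apply_substitution("MAGC", "G3C"))
```

Checking the wild-type residue before submission catches off-by-one errors in the position, which would otherwise silently produce a prediction for the wrong residue.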

GeneMANIA

GeneMANIA is a free online server used to predict the function of a gene and its interactions using a very large set of functional association data, including protein and genetic interactions, pathways, co-expression, co-localization and protein domain similarity (28). It is available at http://www.genemania.org/. We typed the gene name in the search bar and then downloaded the following data: network image, gene data, functions data and interactions data.

Project HOPE

Project HOPE is a web server used to analyze the effect of a single point mutation on the structure of a protein. It is a convenient way to visualize a mutation, since it creates a report consisting of figures, animations, a 3D structure and a mutation report from just the submitted protein sequence and mutation (29). It is available at http://www.cmbi.ru.nl/hope/. We submitted the reference sequence along with the wild-type and mutant amino acids of the eleven final SNPs, and then downloaded the resulting reports.

Chimera 1.10.2

UCSF Chimera is a computer program used for the visualization and analysis of molecular structures and related data, including density maps, supramolecular assemblies, sequence alignments, docking results, trajectories and conformational ensembles. It also allows movies to be created. Chimera (version 1.10.2) was used to view the 3D (three-dimensional) structure of the protein; the wild-type residue was then modified to show the difference after mutation, and a graphic view was made for each mutation (30). It is available at http://www.cgl.ucsf.edu/chimera/

COSMIC v86

The Catalogue of Somatic Mutations in Cancer (COSMIC) is one of the most comprehensive resources for exploring the impact of somatic mutations in human cancer. It covers both coding and non-coding mutations. Since it is primarily hand-curated, the results are of high quality and accuracy (31). The gene name (CDC73) was used to search for the associated variants. We then looked up the mutation type reported for each of the eleven SNPs and recorded it. COSMIC is available at https://cancer.sanger.ac.uk/cosmic

dbSNP Short Genetic Variations

This is a tool available at NCBI that uses other databases (such as GnomAD_exome and ExAC) to report allele frequencies (32). We used it to study the global allele frequencies in the CDC73 gene by submitting the rs number of each of the eleven substitutions and recording the resulting frequency in a table. It is available at NCBI (for example, https://www.ncbi.nlm.nih.gov/snp/rs759222387).

Results

Seven hundred and thirty-three SNPs were downloaded from NCBI, of which only 184 were missense mutations. First, we analyzed the effect of the SNPs on the function of the protein using four tools (SIFT, PROVEAN, PolyPhen-2 and SNAP2), which identified 31 SNPs with an effect on protein function (Table 1). We then further analyzed them with SNPs&GO, PHD, PMut, I-Mutant, COSMIC and dbSNP Short Genetic Variations, resulting in 11 SNPs (Tables 2 and 3). Finally, we studied their effect on the structure of the protein using Project HOPE and Chimera (Fig. 2-12). The workflow in Fig. 1 summarizes the methodology used in this paper.

Table 1

Functional analysis of single nucleotide polymorphisms (SNPs) by SIFT, PROVEAN, PolyPhen-2 and SNAP2 servers, showing 31 deleterious SNPs.

dbSNP rs#*	Sub*	SIFT prediction	SIFT score	PROVEAN prediction	PROVEAN score	PolyPhen-2 prediction	PolyPhen-2 score	SNAP2 prediction	SNAP2 score
rs28942098	M1I	affect	0	Deleterious	-2.586	probably damaging	0.999	effect	41
rs1296841626	L5F	affect	0	Deleterious	-2.676	probably damaging	1	effect	43
rs770544416	S6G	affect	0	Deleterious	-2.539	possibly damaging	0.567	effect	8
rs1054465259	G28A	affect	0	Deleterious	-3.745	possibly damaging	0.816	effect	47
rs777541949	T38S	affect	0	Deleterious	-2.785	probably damaging	1	effect	49
rs1019931450	Y40C	affect	0	Deleterious	-5.114	probably damaging	1	effect	56
rs200806263	G49C	affect	0	Deleterious	-5.363	probably damaging	1	effect	14
rs1454615241	T56I	affect	0	Deleterious	-4.102	probably damaging	1	effect	48
rs1060500015	L63P	affect	0	Deleterious	-4.277	probably damaging	1	effect	25
rs121434264	L64P	affect	0	Deleterious	-4.435	probably damaging	0.999	effect	60
rs778467088	D90H	affect	0	Deleterious	-5.36	probably damaging	1	effect	17
rs1186176634	S174P	affect	0	Deleterious	-2.64	probably damaging	1	effect	23
rs770439843	R222G	affect	0	Deleterious	-3.272	possibly damaging	0.955	effect	49
rs776394390	D223G	affect	0	Deleterious	-3.71	probably damaging	0.994	effect	25
rs1330160847	I224T	affect	0	Deleterious	-3.072	probably damaging	0.991	effect	25
rs1060500022	W231R	affect	0	Deleterious	-7.888	probably damaging	0.996	effect	64
rs1452051467	R234Q	affect	0	Deleterious	-3.063	probably damaging	1	effect	72
rs973863694	I249T	affect	0	Deleterious	-3.648	probably damaging	0.999	effect	33
rs878855091	R263C	affect	0	Deleterious	-3.402	probably damaging	1	effect	17
rs1244272523	R330W	affect	0	Deleterious	-3.789	probably damaging	0.996	effect	56
rs770734388	P360S	affect	0	Deleterious	-6.86	probably damaging	1	effect	59
rs769288212	P365L	affect	0.04	Deleterious	-8.661	probably damaging	1	effect	75
rs113200235	A367V	affect	0.05	Deleterious	-3.178	probably damaging	0.996	effect	66
rs866465727	V387A	affect	0	Deleterious	-3.326	possibly damaging	0.656	effect	48
rs866793539	G396C	affect	0.02	Deleterious	-7.128	probably damaging	1	effect	25
rs754454928	R441C	affect	0	Deleterious	-7.186	probably damaging	1	effect	58
rs778432682	R441H	affect	0	Deleterious	-4.441	probably damaging	1	effect	64
rs1225502334	R484C	affect	0.05	Deleterious	-3.972	probably damaging	0.994	effect	32
rs1292596060	R504S	affect	0	Deleterious	-5.296	probably damaging	1	effect	74
rs759222387	R504H	affect	0	Deleterious	-4.43	probably damaging	1	effect	73
rs1060500011	R513W	affect	0.04	Deleterious	-3.834	probably damaging	0.977	effect	59

dbSNP rs*: is a locus accession for a variant type assigned by dbSNP, Sub*: substitution

Table 2

Functional analysis of single nucleotide polymorphisms (SNPs) by SNPs&GO, PHD and PMUT servers, showing eleven pathogenic SNPs.

dbSNP rs#*	Sub*	SNPs&GO prediction	RI*	SNPs&GO score	PHD prediction	RI	PHD probability	PMut prediction	PMut score
rs200806263	G49C	Disease	7	0.829	Disease	8	0.893	Disease	0.54 (80%)
rs1060500015	L63P	Disease	3	0.657	Disease	7	0.861	Disease	0.68 (85%)
rs121434264	L64P	Disease	5	0.729	Disease	8	0.875	Disease	0.75 (87%)
rs778467088	D90H	Disease	4	0.708	Disease	7	0.867	Disease	0.70 (86%)
rs770439843	R222G	Disease	1	0.553	Disease	6	0.782	Disease	0.53 (80%)
rs1060500022	W231R	Disease	1	0.544	Disease	1	0.56	Disease	0.64 (84%)
rs770734388	P360S	Disease	3	0.65	Disease	0	0.503	Disease	0.84 (90%)
rs754454928	R441C	Disease	5	0.743	Disease	5	0.775	Disease	0.61 (83%)
rs778432682	R441H	Disease	3	0.669	Disease	5	0.753	Disease	0.70 (86%)
rs1292596060	R504S	Disease	6	0.823	Disease	7	0.867	Disease	0.86 (91%)
rs759222387	R504H	Disease	6	0.811	Disease	7	0.857	Disease	0.72 (86%)

dbSNP rs*: is a locus accession for a variant type assigned by dbSNP, Sub*: substitution, RI*: Reliability Index

Table 3

Stability analysis of the 11 single nucleotide polymorphisms using I-Mutant, showing a decrease in protein stability for each. The variant effect on cancer was analyzed with COSMIC, which shows the related mutation type. Finally, the allele frequencies of the SNPs were analyzed with dbSNP Short Genetic Variations, which shows frequencies ranging from 0.0 to 0.00004.

dbSNP rs#	Sub	I-Mutant prediction	Score	RI	COSMIC mutation type	Allele frequency
rs200806263	G49C	Decrease	-0.91	5	N/A	0.00000
rs1060500015	L63P	Decrease	-1.61	2	Deletion - Frameshift	N/A*
rs121434264	L64P	Decrease	-1.58	2	N/A	0.0000
rs778467088	D90H	Decrease	-1.19	9	N/A	0.00001
rs770439843	R222G	Decrease	-1.23	6	Substitution - Nonsense	0.00001
rs1060500022	W231R	Decrease	-1.07	8	Substitution - Nonsense	N/A
rs770734388	P360S	Decrease	-1.45	8	N/A	0.00001
rs754454928	R441C	Decrease	-0.91	7	Substitution - Missense	0.00001
rs778432682	R441H	Decrease	-1.31	9	N/A	0.00001
rs1292596060	R504S	Decrease	-1.3	9	N/A	N/A
rs759222387	R504H	Decrease	-1.52	9	N/A	0.00004

dbSNP rs*: is a locus accession for a variant type assigned by dbSNP, Sub*: substitution, RI*: Reliability Index, N/A: not available

Paper workflow, showing the methods and software used in the paper.

First, we assessed the effect of the missense SNPs on the protein using SIFT, PolyPhen-2, PROVEAN and SNAP2, which resulted in 31 SNPs predicted to be deleterious by all four tools, with different scores. The PolyPhen-2 predictions were of two types: probably damaging (27 SNPs) and possibly damaging (four SNPs) (Table 1).
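The selection of variants called deleterious by all four tools can be sketched as a simple consensus filter. This is a minimal illustration under stated assumptions, not the procedure used on the web servers: the class and function names are hypothetical, the SIFT cutoff is taken as ≤ 0.05 and the PROVEAN cutoff as ≤ -2.5, and the last record is an invented example added only to show a rejected variant. The first three records are taken from Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    sub: str
    sift_score: float
    provean_score: float
    polyphen_call: str
    snap2_call: str


DAMAGING_CALLS = {"probably damaging", "possibly damaging"}


def is_consensus_deleterious(p: Prediction) -> bool:
    """True only when all four predictors flag the substitution."""
    return (
        p.sift_score <= 0.05            # assumed SIFT cutoff
        and p.provean_score <= -2.5     # PROVEAN cutoff given in the Methodology
        and p.polyphen_call in DAMAGING_CALLS
        and p.snap2_call == "effect"
    )


PREDICTIONS = [
    Prediction("G49C", 0.0, -5.363, "probably damaging", "effect"),
    Prediction("S6G", 0.0, -2.539, "possibly damaging", "effect"),
    Prediction("P365L", 0.04, -8.661, "probably damaging", "effect"),
    # Hypothetical record, not from the paper, to show a variant that fails.
    Prediction("EXAMPLE", 0.40, -0.90, "benign", "neutral"),
]

if __name__ == "__main__":
    kept = [p.sub for p in PREDICTIONS if is_consensus_deleterious(p)]
    print(kept)
```

Requiring agreement between all tools trades sensitivity for specificity: a variant missed by any single predictor is dropped.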

Further functional analysis was done using SNPs&GO, PHD and P-Mut. The predictions varied among the tools, with different reliability indices (RI). When only the SNPs predicted to be disease related by all three tools were selected, we ended up with 11 SNPs (Table 2).

To study the effect of these eleven SNPs on protein stability, we used I-Mutant, which predicted that all eleven SNPs would decrease the stability of the protein. Furthermore, we used COSMIC to look up the mutation type associated with each variant (some entries were not available in the database). We then obtained the allele frequency of each variant from dbSNP Short Genetic Variations (some results were also unavailable in the database) (Table 3).
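The stability calls and the span of the available allele frequencies can be summarized with a short script. This sketch makes two assumptions that should be checked against the I-Mutant output: the Score column of Table 3 is treated as the predicted free-energy change, and a negative value is read as a decrease in stability. The names are hypothetical; the values are transcribed from Table 3, with None standing for entries marked N/A.

```python
from typing import Optional

# (substitution, I-Mutant score, allele frequency or None when not available)
TABLE3: list[tuple[str, float, Optional[float]]] = [
    ("G49C", -0.91, 0.0),
    ("L63P", -1.61, None),
    ("L64P", -1.58, 0.0),
    ("D90H", -1.19, 0.00001),
    ("R222G", -1.23, 0.00001),
    ("W231R", -1.07, None),
    ("P360S", -1.45, 0.00001),
    ("R441C", -0.91, 0.00001),
    ("R441H", -1.31, 0.00001),
    ("R504S", -1.30, None),
    ("R504H", -1.52, 0.00004),
]


def stability_call(score: float) -> str:
    """Assumed convention: a negative score means the mutation destabilizes."""
    if score < 0:
        return "Decrease"
    if score > 0:
        return "Increase"
    return "No change"


def frequency_range(rows) -> tuple[float, float]:
    """Minimum and maximum allele frequency, ignoring unavailable entries."""
    known = [freq for _, _, freq in rows if freq is not None]
    return min(known), max(known)


if __name__ == "__main__":
    print({sub: stability_call(score) for sub, score, _ in TABLE3})
    print(frequency_range(TABLE3))
```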

To study gene-gene interactions and predict the possible functions of the CDC73 gene, we used GeneMANIA, which shows different types of interactions (such as co-expression and shared protein domains) with other genes such as CTR9 and CHUK. The software also provided a list of predicted functions (Tables 4 and 5; Fig. 13).

Table 4

Functions of the CDC73 gene and their appearance in the network and genome as predicted by GeneMANIA, showing each function, its FDR, and the number of genes in the network and in the genome.

Function	FDR	Genes in network	Genes in genome
transcription elongation factor complex	2.52E-09	6	28
regulation of transcription elongation from RNA polymerase II promoter	2.52E-09	5	10
transcription elongation from RNA polymerase II promoter	6.60E-09	7	75
DNA-templated transcription, elongation	5.45E-08	7	108
positive regulation of DNA-templated transcription, elongation	5.45E-08	5	20
mRNA polyadenylation	7.21E-08	5	22
RNA polyadenylation	9.97E-08	5	24
regulation of DNA-templated transcription, elongation	2.01E-07	5	28
histone modification	3.31E-07	8	260
covalent chromatin modification	3.47E-07	8	265
DNA-directed RNA polymerase II, holoenzyme	3.50E-07	6	81
endodermal cell fate commitment	3.50E-07	4	10
RNA polymerase complex	7.92E-07	6	96
DNA-directed RNA polymerase complex	7.92E-07	6	95
nuclear DNA-directed RNA polymerase complex	7.92E-07	6	95
endodermal cell differentiation	8.50E-07	4	13
negative regulation of myeloid cell differentiation	1.03424E-06	5	44
cell fate commitment involved in formation of primary germ layer	2.50852E-06	4	17
histone monoubiquitination	3.86491E-06	4	19
endoderm formation	5.6616E-06	4	21
mRNA processing	1.22928E-05	7	287
endoderm development	1.28119E-05	4	26
histone ubiquitination	2.24022E-05	4	30
mRNA 3’-end processing	2.5645E-05	5	88
regulation of myeloid cell differentiation	4.02517E-05	5	97
protein monoubiquitination	4.41323E-05	4	37
formation of primary germ layer	4.41323E-05	4	37
RNA 3’-end processing	4.41323E-05	5	101
regulation of mRNA processing	0.000289177	4	59
positive regulation of mRNA 3’-end processing	0.000313172	3	15
regulation of mRNA 3’-end processing	0.000372752	3	16
myeloid cell differentiation	0.000446549	5	165
gastrulation	0.000451884	4	68
positive regulation of mRNA processing	0.000804394	3	21
histone H3-K4 methylation	0.001915467	3	28
cell fate commitment	0.002956445	4	111
negative regulation of cell differentiation	0.004090933	5	267
stem cell maintenance	0.005276884	3	40
histone lysine methylation	0.006873226	3	44
regulation of histone modification	0.011780086	3	53
histone methylation	0.01357129	3	56
stem cell differentiation	0.013955822	4	171
cellular response to lipopolysaccharide	0.01758198	3	62
regulation of chromatin organization	0.018029071	3	63
cellular response to molecule of bacterial origin	0.020035617	3	66
embryonic morphogenesis	0.020035617	4	192
RNA polymerase II core binding	0.020433712	2	10
peptidyl-lysine trimethylation	0.022132906	2	11
basal transcription machinery binding	0.022132906	2	11
basal RNA polymerase II transcription machinery binding	0.022132906	2	11
mRNA cleavage	0.022132906	2	11
transcriptionally active chromatin	0.022132906	2	11
regulation of histone H3-K4 methylation	0.022132906	2	11
protein alkylation	0.022434688	3	73
protein methylation	0.022434688	3	73
cellular response to biotic stimulus	0.022949254	3	74
mRNA cleavage factor complex	0.024254179	2	12
RNA polymerase core enzyme binding	0.024254179	2	12
positive regulation of histone methylation	0.028159975	2	13
response to lipopolysaccharide	0.037153654	3	90
stem cell development	0.037153654	3	90
regulation of chromosome organization	0.037153654	3	90
RNA polymerase binding	0.040493707	2	16
regulation of histone methylation	0.04514658	2	17
response to molecule of bacterial origin	0.049891593	3	101
peptidyl-lysine methylation	0.054973868	2	19
macromolecule methylation	0.080506456	3	120

*FDR: false discovery rate is greater than or equal to the probability that this is a false positive
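When reading a table such as Table 4, it can help to separate the functions that pass a chosen FDR threshold from those that do not. The sketch below is only an illustration: the 0.05 threshold is an assumption rather than a cutoff stated in this paper, the function names are hypothetical, and only a few rows of Table 4 are included.

```python
# (function, FDR, genes in network, genes in genome); a few rows from Table 4.
ROWS = [
    ("transcription elongation factor complex", 2.52e-09, 6, 28),
    ("histone modification", 3.31e-07, 8, 260),
    ("response to molecule of bacterial origin", 0.049891593, 3, 101),
    ("peptidyl-lysine methylation", 0.054973868, 2, 19),
    ("macromolecule methylation", 0.080506456, 3, 120),
]


def significant(rows, alpha: float = 0.05):
    """Keep the rows whose FDR is below the assumed threshold alpha."""
    return [row for row in rows if row[1] < alpha]


def network_fraction(row) -> float:
    """Share of the genome's genes annotated with a function that are in the network."""
    _, _, in_network, in_genome = row
    return in_network / in_genome


if __name__ == "__main__":
    for row in significant(ROWS):
        print(f"{row[0]}: FDR={row[1]:.3g}, fraction={network_fraction(row):.2f}")
```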

Table 5

Genes that are co-expressed, share domains or interact with CDC73 in the gene network, as predicted by GeneMANIA, showing the type of interaction between each pair of genes in the network.

Gene 1	Gene 2	Weight	Network group
AP3S1	HSP90AA1	0.013259681	Co-expression
CTR9	CDC73	0.008208696	Co-expression
CTR9	LEO1	0.003100712	Co-expression
WDR61	LEO1	0.007506342	Co-expression
WDR61	CTR9	0.003934787	Co-expression
KMT2C	BCL9L	0.01605392	Co-expression
CDK9	CPSF4	0.007431932	Co-expression
GTF2F1	PAF1	0.004063675	Co-expression
GTF2F1	KMT2C	0.008976047	Co-expression
AURKB	TKT	0.011608324	Co-expression
NUP98	CTR9	0.014227687	Co-expression
CPSF4	TKT	0.00430011	Co-expression
CPSF4	AURKB	0.006637127	Co-expression
CHUK	CSTF3	0.016616315	Co-expression
CSTF2	AURKB	0.004579851	Co-expression
CHUK	CDC73	0.001583963	Co-expression
CHUK	NUP98	0.004846978	Co-expression
GTF2F1	TKT	0.009125041	Co-expression
CPSF4	CSTF3	0.014529244	Co-expression
CHUK	NUP98	0.008912819	Co-expression
NUP98	CDC73	0.00588025	Co-expression
CHUK	CDC73	0.001885684	Co-expression
CSTF2	LEO1	0.006971632	Co-expression
CSTF3	CTR9	0.006781118	Co-expression
CHUK	CDC73	0.011535326	Co-expression
CHUK	CTR9	0.00340099	Co-expression
AP3S1	CTR9	0.004597292	Co-expression
GTF2F1	PAF1	0.010657921	Co-expression
GTF2F1	CPSF4	0.011196721	Co-expression
CSTF2	CTR9	0.011377413	Co-expression
CDK9	CPSF4	0.009146069	Co-expression
AURKB	CSTF3	0.006553773	Co-expression
SPCS3	CHUK	0.00718225	Co-expression
AURKB	NUP98	0.004425747	Pathway
CPSF4	CSTF3	0.11179613	Pathway
CHUK	HSP90AA1	0.006348364	Pathway
CSTF2	CSTF3	0.009734826	Pathway
CSTF2	CPSF4	0.11179613	Pathway
GTF2F1	CSTF3	0.005905584	Pathway
GTF2F1	CSTF2	0.005905584	Pathway
GTF2F1	CDK9	0.011108032	Pathway
CSTF2	CSTF3	0.011789562	Pathway
GTF2F1	CSTF3	0.008021638	Pathway
GTF2F1	CSTF2	0.007878398	Pathway
GTF2F1	CDK9	0.014240757	Pathway
CHUK	HSP90AA1	0.088913664	Pathway
LEO1	CDC73	0.122442566	Physical Interactions
PAF1	CDC73	0.21769759	Physical Interactions
PAF1	LEO1	0.15407903	Physical Interactions
CTR9	CDC73	0.21769759	Physical Interactions
CTR9	LEO1	0.15407903	Physical Interactions
CTR9	PAF1	0.27394587	Physical Interactions
WDR61	CDC73	0.21769759	Physical Interactions
WDR61	LEO1	0.15407903	Physical Interactions
WDR61	PAF1	0.27394587	Physical Interactions
WDR61	CTR9	0.27394587	Physical Interactions
HSP90AA1	CDC73	0.3958806	Physical Interactions
CDK9	HSP90AA1	0.20299108	Physical Interactions
CDK9	CDC73	0.119673245	Physical Interactions
AURKB	CDC73	0.1827706	Physical Interactions
AURKB	HSP90AA1	0.042777777	Physical Interactions
PAF1	CDC73	0.13507365	Physical Interactions
CTR9	CDC73	0.18643428	Physical Interactions
NUP98	CDC73	0.203537	Physical Interactions
NUP98	PAF1	0.31207067	Physical Interactions
CHUK	CDC73	0.1774874	Physical Interactions
CHUK	PAF1	0.2721304	Physical Interactions
CDK9	HSP90AA1	0.040214784	Physical Interactions
LEO1	CDC73	0.20954604	Physical Interactions
PAF1	CDC73	0.16041815	Physical Interactions
PAF1	LEO1	0.08672058	Physical Interactions
CTR9	CDC73	0.24223034	Physical Interactions
CTR9	LEO1	0.1309475	Physical Interactions
CTR9	PAF1	0.10024697	Physical Interactions
CSTF2	CSTF3	0.24596581	Physical Interactions
GTF2F1	CDC73	0.1822143	Physical Interactions
GTF2F1	CDK9	0.077137925	Physical Interactions
LEO1	CDC73	0.3225924	Physical Interactions
PAF1	CDC73	0.3225924	Physical Interactions
CTR9	CDC73	0.3225924	Physical Interactions
WDR61	CDC73	0.17647609	Physical Interactions
CSTF3	CDC73	0.3225924	Physical Interactions
FIP1L1	CDC73	0.3225924	Physical Interactions
CPSF4	CDC73	0.17647609	Physical Interactions
CHUK	HSP90AA1	0.02887361	Physical Interactions
KMT2C	CDC73	0.15483053	Physical Interactions
CSTF2	CDC73	0.16203262	Physical Interactions
LEO1	CDC73	0.09317254	Physical Interactions
PAF1	CDC73	0.08377836	Physical Interactions
PAF1	LEO1	0.07275753	Physical Interactions
CTR9	CDC73	0.09010142	Physical Interactions
CTR9	LEO1	0.078248814	Physical Interactions
CTR9	PAF1	0.07035932	Physical Interactions
WDR61	CDC73	0.114276804	Physical Interactions
WDR61	LEO1	0.09924398	Physical Interactions
WDR61	PAF1	0.089237645	Physical Interactions
WDR61	CTR9	0.09597274	Physical Interactions
AP3S1	CDC73	0.2352175	Physical Interactions
CSTF2	CSTF3	0.077111326	Physical Interactions
LEO1	CDC73	0.049589828	Physical Interactions
PAF1	CDC73	0.10079521	Physical Interactions
PAF1	LEO1	0.11382505	Physical Interactions
CTR9	CDC73	0.01533186	Physical Interactions
CTR9	LEO1	0.017313818	Physical Interactions
CTR9	PAF1	0.035191692	Physical Interactions
WDR61	CDC73	0.041940387	Physical Interactions
WDR61	LEO1	0.047362044	Physical Interactions
WDR61	PAF1	0.09626707	Physical Interactions
WDR61	CTR9	0.014643089	Physical Interactions
FIP1L1	CSTF3	0.113308094	Physical Interactions
CPSF4	FIP1L1	0.14965165	Physical Interactions
CSTF2	FIP1L1	0.10615889	Physical Interactions
CDK9	CTR9	0.0411037	Physical Interactions
LEO1	CDC73	0.05172406	Physical Interactions
PAF1	CDC73	0.028004196	Physical Interactions
PAF1	LEO1	0.054951686	Physical Interactions
CTR9	CDC73	0.04297603	Physical Interactions
CTR9	LEO1	0.08433042	Physical Interactions
CTR9	PAF1	0.045657773	Physical Interactions
WDR61	CDC73	0.04966663	Physical Interactions
WDR61	LEO1	0.09745915	Physical Interactions
WDR61	PAF1	0.05276587	Physical Interactions
WDR61	CTR9	0.080975994	Physical Interactions
CSTF3	CDC73	0.06669109	Physical Interactions
FIP1L1	CSTF3	0.14174445	Physical Interactions
AURKB	HSP90AA1	0.002030884	Physical Interactions
CPSF4	CDC73	0.09574746	Physical Interactions
CPSF4	FIP1L1	0.20350051	Physical Interactions
CHUK	HSP90AA1	0.001720827	Physical Interactions
BCL9L	CDC73	0.2791709	Physical Interactions
KMT2C	CDC73	0.04642578	Physical Interactions
CSTF2	CDC73	0.0463337	Physical Interactions
CSTF2	CSTF3	0.11722768	Physical Interactions
CDK9	PAF1	0.01438359	Physical Interactions
GTF2F1	CDK9	0.013446528	Physical Interactions
AURKB	HSP90AA1	0.03819712	Physical Interactions
CDK9	HSP90AA1	0.03819712	Physical Interactions
LEO1	CDC73	0.5193767	Physical Interactions
PAF1	CDC73	0.5193767	Physical Interactions
CTR9	CDC73	0.37205398	Physical Interactions
WDR61	CDC73	0.5193767	Physical Interactions
AURKB	HSP90AA1	0.080966905	Physical Interactions
CDK9	HSP90AA1	0.036444858	Physical Interactions
LEO1	CDC73	0.0771831	Physical Interactions
PAF1	CDC73	0.031383026	Physical Interactions
PAF1	LEO1	0.14005615	Physical Interactions
CTR9	CDC73	0.058806863	Physical Interactions
CTR9	PAF1	0.10671071	Physical Interactions
WDR61	CTR9	0.34459004	Physical Interactions
CPSF4	FIP1L1	0.44817418	Physical Interactions
CHUK	HSP90AA1	0.0081622	Physical Interactions
HSF2BP	CDC73	0.31519976	Physical Interactions
CDK9	HSP90AA1	0.010792454	Physical Interactions
HSF2BP	CDC73	0.58701295	Physical Interactions
CPSF4	FIP1L1	0.6677753	Physical Interactions
CHUK	HSP90AA1	0.007963367	Physical Interactions
CSTF2	CSTF3	0.2417784	Physical Interactions
CTR9	LEO1	0.09094542	Predicted
CTR9	LEO1	0.0869592	Predicted
LEO1	CDC73	1	Predicted
CPSF4	FIP1L1	1	Predicted
CSTF3	LEO1	0.020077549	Predicted
FIP1L1	CSTF3	0.015798066	Predicted
NUP98	CTR9	0.047920566	Predicted
SPCS3	CDC73	0.25866398	Predicted
CTR9	LEO1	0.34126943	Predicted
TKT	CDC73	1	Predicted
TKT	CDC73	1	Predicted
CHUK	AURKB	0.006874138	Shared protein domains
CDK9	AURKB	0.003942981	Shared protein domains
CDK9	CHUK	0.006889931	Shared protein domains
CHUK	AURKB	0.003882364	Shared protein domains
CDK9	AURKB	0.002896895	Shared protein domains
CDK9	CHUK	0.004389529	Shared protein domains
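Because Table 5 lists every edge of the network, including edges between genes other than CDC73, a short script can pull out the direct partners of CDC73 for each network group. This is a minimal sketch: the function name is hypothetical and only a handful of rows from Table 5 are included for illustration.

```python
from collections import defaultdict

# (gene 1, gene 2, weight, network group); a few rows from Table 5.
EDGES = [
    ("CTR9", "CDC73", 0.008208696, "Co-expression"),
    ("CHUK", "CDC73", 0.001583963, "Co-expression"),
    ("CTR9", "LEO1", 0.003100712, "Co-expression"),
    ("LEO1", "CDC73", 0.122442566, "Physical Interactions"),
    ("PAF1", "CDC73", 0.21769759, "Physical Interactions"),
    ("PAF1", "LEO1", 0.15407903, "Physical Interactions"),
    ("TKT", "CDC73", 1.0, "Predicted"),
    ("CDK9", "AURKB", 0.003942981, "Shared protein domains"),
]


def partners_by_group(edges, gene: str = "CDC73") -> dict[str, list[str]]:
    """Map each network group to the sorted direct partners of the given gene."""
    groups: dict[str, set[str]] = defaultdict(set)
    for gene1, gene2, _weight, group in edges:
        if gene in (gene1, gene2):
            partner = gene2 if gene1 == gene else gene1
            groups[group].add(partner)
    return {group: sorted(names) for group, names in groups.items()}


if __name__ == "__main__":
    for group, names in partners_by_group(EDGES).items():
        print(group, names)
```

Using a set per group removes the repeated pairs that appear in Table 5 when the same interaction is reported by several source networks.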

We used Project HOPE and Chimera to study the structural changes inflicted by the eleven SNPs on the parafibromin protein, combining the two-dimensional structures acquired from the HOPE reports with the three-dimensional structures we produced using Chimera (Fig. 2-12). These figures show the differences between the native and mutant amino acids: the green and red boxes contain the schematic structures of the native amino acid (on the left) and the mutant one (on the right). In the 2D images, the backbone, which is the same for each amino acid, is colored red and the side chain, unique to each amino acid, is colored black. In the 3D images, wild-type residues are colored green, mutant residues are colored red and the rest of the protein is colored cyan.

SNPs ID: rs200806263: G49C: The amino acid Glycine (green color) changed to Cysteine (red color) at position 49. Illustration was done by UCSF Chimera (v 1.10.2.) and project HOPE.

SNPs ID: rs1060500015: L63P: The amino acid Leucine (green color) changed to Proline (red color) at position 63. Illustration was done by UCSF Chimera (v 1.10.2.) and project HOPE.

SNPs ID: rs121434264: L64P: The amino acid Leucine (green color) changed to Proline (red color) at position 64. Illustration was done by UCSF Chimera (v 1.10.2.) and project HOPE.

SNPs ID: rs778467088: D90H: The amino acid Aspartic acid (green color) changed to Histidine (red color) at position 90. Illustration was done by UCSF Chimera (v 1.10.2.) and project HOPE.

SNPs ID: rs770439843: R222G: The amino acid Arginine (green color) changed to Glycine (red color) at position 222. Illustration was done by UCSF Chimera (v 1.10.2.) and project HOPE.

SNPs ID: rs1060500022: W231R: The amino acid Tryptophan (green color) changed to Arginine (red color) at position 231. Illustration was done by UCSF Chimera (v 1.10.2.) and project HOPE.

SNPs ID: rs770734388: P360S: The amino acid Proline (green color) changed to Serine (red color) at position 360. Illustration was done by UCSF Chimera (v 1.10.2.) and project HOPE.

SNPs ID: rs754454928: R441C: The amino acid Arginine (green color) changed to Cysteine ( red color) at position 441. Illustration was done by UCSF Chimera (v 1.10.2.) and project HOPE.

SNPs ID: rs778432682: R441H: The amino acid Arginine (green color) changed to Histidine (red color) at position 441. Illustration was done by UCSF Chimera (v 1.10.2.) and project HOPE.

SNPs ID: rs1292596060: R504S: The amino acid Arginine (green color) changed to Serine (red color) at position 504. Illustration was done by UCSF Chimera (v 1.10.2.) and project HOPE.

SNPs ID: rs759222387: R504H: The amino acid Arginine (green color) changed to Histidine (red color) at position 504. Illustration was done by UCSF Chimera (v 1.10.2.) and project HOPE.

Interaction between the CDC73 gene and related genes, generated with GeneMANIA, showing the different types of interactions between these genes. Physical interactions were the most significant type (76.64%).

Discussion

Many genes with a proven association with cancer contain single nucleotide polymorphisms (SNPs) (33). Furthermore, disease-causing SNPs usually occur in evolutionarily conserved regions, which have an essential role in the structural integrity and function of the protein (34, 35). We therefore focused on the coding region, which revealed 11 novel mutations in the CDC73 gene out of 184 missense SNPs downloaded from the NCBI website. These SNPs were then analyzed using twelve software tools to study the effect of each mutation on the structure and function of the parafibromin protein.
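As an illustration of how several predictors can narrow a list of missense SNPs, the sketch below keeps only variants that every tool flags as damaging. This unanimous rule is an assumption made for the example, not a restatement of the exact criteria used in this study, and the variant IDs and calls in `CALLS` are invented placeholders rather than results.

```python
from typing import Dict, List

# Hypothetical placeholder calls: True means the tool flagged the variant as
# damaging. The variant IDs and the calls are invented for illustration only.
CALLS: Dict[str, Dict[str, bool]] = {
    "variant_A": {"SIFT": True, "PROVEAN": True, "PolyPhen-2": True, "SNAP2": True},
    "variant_B": {"SIFT": True, "PROVEAN": False, "PolyPhen-2": True, "SNAP2": True},
    "variant_C": {"SIFT": False, "PROVEAN": False, "PolyPhen-2": False, "SNAP2": False},
}


def consensus_damaging(calls: Dict[str, Dict[str, bool]]) -> List[str]:
    """Return the sorted IDs of variants that every listed tool flags as damaging."""
    return sorted(
        variant_id
        for variant_id, tool_calls in calls.items()
        # A variant with no calls at all is never retained.
        if tool_calls and all(tool_calls.values())
    )


if __name__ == "__main__":
    print(consensus_damaging(CALLS))
```

A stricter or looser rule (for example, a majority vote) would only change the condition inside the generator expression.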

Project HOPE was used to study the effect of each mutation on the physicochemical properties of the protein. The HOPE reports showed changes in the charge, size and hydrophobicity of the affected residues, which can disturb interactions with other molecules. In the first substitution, G49C, the mutant residue is bigger and more hydrophobic than the wild-type residue. In L63P, the mutant residue is smaller than the wild-type residue, which leaves an empty space in the core of the protein. In L64P, the mutant residue is likewise smaller than the wild-type residue, causing an empty space in the core of the protein. In D90H, there is a difference in charge between the wild-type and mutant amino acids: the charge of the buried wild-type residue is lost by this mutation. The wild-type and mutant amino acids also differ in size, with the mutant residue being bigger than the wild-type residue. The wild-type residue was buried in the core of the protein, and since the mutant residue is bigger, it probably will not fit. In R222G, there is a difference in charge between the wild-type and mutant amino acids, which can cause loss of interactions with other molecules or residues. The wild-type and mutant amino acids also differ in size: the mutant residue is smaller, which might lead to loss of interactions. Furthermore, the hydrophobicity of the wild-type and mutant residues differs, as the mutation introduces a more hydrophobic residue at this position; this can result in loss of hydrogen bonds and/or disturb correct folding. In W231R, there is a difference in charge between the wild-type and mutant amino acids. The mutation introduces a charge, which can cause repulsion of ligands or other residues with the same charge. In addition, the wild-type and mutant amino acids differ in size: the mutant residue is smaller, which might lead to loss of interactions. The hydrophobicity of the wild-type and mutant residues also differs, so hydrophobic interactions, either in the core of the protein or on the surface, may be lost. In P360S, the mutant residue is smaller than the wild-type residue, so the mutation will cause an empty space in the core of the protein. The hydrophobicity of the wild-type and mutant residues also differs; this difference will cause loss of hydrophobic interactions in the core of the protein. In R441C, the charge of the wild-type residue is lost, causing loss of interactions with other molecules. In addition, the wild-type and mutant amino acids differ in size, causing a possible loss of external interactions. In R441H, interactions are lost due to the loss of charge in the mutant amino acid. In R504S, interactions are lost due to differences in size, charge and hydrophobicity. In R504H, interactions are lost due to the loss of charge and the difference in size between the wild-type and mutant amino acids, the mutant amino acid being smaller than the wild-type one.
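The kind of size, charge and hydrophobicity comparison described above can be approximated with standard residue property tables. The sketch below is not how Project HOPE computes its reports. It assumes the Kyte-Doolittle hydropathy scale, approximate residue volumes in cubic angstroms, and nominal side-chain charges near pH 7 (with histidine treated as neutral), so individual signs may differ from the HOPE output. The names `PROPERTIES`, `Change` and `compare` are introduced here for illustration.

```python
import re
from typing import NamedTuple

# (Kyte-Doolittle hydropathy, approximate residue volume in cubic angstroms,
#  nominal side-chain charge near pH 7). Only the residues that occur in the
# eleven substitutions are listed. Histidine is treated as neutral (assumption).
PROPERTIES = {
    "G": (-0.4, 60.1, 0),
    "C": (2.5, 108.5, 0),
    "L": (3.8, 166.7, 0),
    "P": (-1.6, 112.7, 0),
    "D": (-3.5, 111.1, -1),
    "H": (-3.2, 153.2, 0),
    "R": (-4.5, 173.4, 1),
    "W": (-0.9, 227.8, 0),
    "S": (-0.8, 89.0, 0),
}

SUBSTITUTIONS = ["G49C", "L63P", "L64P", "D90H", "R222G", "W231R",
                 "P360S", "R441C", "R441H", "R504S", "R504H"]


class Change(NamedTuple):
    wild: str
    position: int
    mutant: str
    d_hydropathy: float  # mutant minus wild type; positive = more hydrophobic
    d_volume: float      # mutant minus wild type; positive = bigger
    d_charge: int        # mutant minus wild type


def compare(substitution: str) -> Change:
    """Parse a one-letter substitution such as 'G49C' and return property deltas."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", substitution)
    if match is None:
        raise ValueError(f"not a one-letter substitution: {substitution!r}")
    wild, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    h_w, v_w, c_w = PROPERTIES[wild]
    h_m, v_m, c_m = PROPERTIES[mutant]
    return Change(wild, position, mutant,
                  round(h_m - h_w, 1), round(v_m - v_w, 1), c_m - c_w)


if __name__ == "__main__":
    for name in SUBSTITUTIONS:
        print(name, compare(name))
```

Under these assumptions, G49C comes out as bigger and more hydrophobic and D90H as bigger with the negative charge lost, in line with the descriptions above.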

We also identified shared domains, namely Cdc73/Parafibromin (IPR007852), C-Terminal Domain Superfamily (IPR038103) and Cell Division Control Protein 73, C-Terminal (IPR031336), which indicates the conservation and significance of these SNPs.

Chimera was used to show the 3D changes in the structure of the protein, coupled with 2D schematic structures from Project HOPE for comparison (Fig. 2-12). The structural differences between the wild type and the mutants, visualized with these two tools, support a pathogenic impact of the eleven SNPs on the parafibromin protein.

We tried to identify the variants in the CDC73 gene whose mutational impact on the protein could lead to cancer development, using COSMIC and focusing on the 11 novel mutations at hand. According to COSMIC, three SNPs (R222G, W231R and R441C) lead to a nonsense substitution, while one SNP (L63P) leads to a frameshift deletion. This indicates the significance of these SNPs in the development of hyperparathyroidism-jaw tumor (HPT-JT) syndrome. Predictions for the other SNPs were not available on the website.

We analyzed allele frequencies using dbSNP (Short Genetic Variations), which provided estimates of the global frequency of the SNPs in question. R504H had the highest allele frequency, with a global frequency of 0.00004, while the other SNPs ranged from 0.0 to 0.00001, indicating that R504H could be the best candidate as a diagnostic marker.

Previous papers report novel mutations associated with jaw tumor syndrome at the following positions: c.191-192 delT (19) and c.1379delT/p.L460Lfs*18 (36). A novel deletion of exons 4 to 10 of CDC73 was detected in another study (37).

In another study, NGS revealed four pathogenic or likely pathogenic germline sequence variants in CDC73, c.271C>T (p.Arg91*), c.496C>T (p.Gln166*), c.685A>T (p.Arg229*) and c.787C>T (p.Arg263Cys), that are related to primary hyperparathyroidism (PHPT) (2). Of those mutations, only R263C was found to be pathogenic (by SIFT, PROVEAN, PolyPhen-2 and SNAP2) in the present study. Furthermore, a study by Sulaiman et al. identified another polymorphism at 109 T>G (38). In addition, CDC73 gene mutations have also been reported to be associated with other types of cancer, such as gastric, colorectal, ovarian and head and neck cancers (38). The current work provides new mutations in the CDC73 gene that are related to jaw tumor syndrome, adding to the growing knowledge about the association between CDC73 gene mutations and jaw tumor syndrome.
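The variants cited above are written in three-letter protein notation, while the SNPs analyzed in this study are named with one-letter codes. The following minimal sketch converts between the two so that the lists can be compared. It handles only simple substitutions and stop codons written as `*` or `Ter`, and the helper name `to_one_letter` is introduced here for illustration.

```python
import re

# Standard three-letter to one-letter amino acid codes, plus 'Ter' for a stop.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",
}

# Matches e.g. 'p.Arg263Cys' or 'p.Arg91*'; frameshifts and deletions are
# deliberately out of scope for this sketch.
_PATTERN = re.compile(r"p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2}|\*)")


def to_one_letter(p_notation: str) -> str:
    """Convert 'p.Arg263Cys' to 'R263C' and 'p.Arg91*' to 'R91*'."""
    match = _PATTERN.fullmatch(p_notation)
    if match is None:
        raise ValueError(f"unsupported protein notation: {p_notation!r}")
    ref, position, alt = match.groups()
    alt_code = "*" if alt == "*" else THREE_TO_ONE[alt]
    return f"{THREE_TO_ONE[ref]}{position}{alt_code}"


if __name__ == "__main__":
    for variant in ["p.Arg91*", "p.Gln166*", "p.Arg229*", "p.Arg263Cys"]:
        print(variant, "->", to_one_letter(variant))
```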

The mutations identified in this study can be used as diagnostic markers for the disease and may serve as “actionable targets” for chemotherapeutic intervention in patients whose disease is no longer surgically curable. Further in vitro and in vivo studies are needed to confirm these results.

Conclusion

In this study, the effect of SNPs in the CDC73 gene was thoroughly investigated using different bioinformatics prediction tools. Eleven novel mutations were found to have a damaging impact on the structure and function of the protein and may thus be used as diagnostic markers for hyperparathyroidism-jaw tumor (HPT-JT) syndrome.

Language: English

Publication timeframe: 4 times per year
Journal Subjects: Life Sciences, Genetics, Biotechnology, Bioinformatics, Life Sciences, other

Article Category: Research Article

Published Online: Apr 30, 2020

Page range: 67 - 81

DOI: https://doi.org/10.2478/ebtj-2020-0008

Keywords
SNPs, PHPT, gene, HPT-JT, in silico analysis

© 2020 Abdelrahman H. Abdelmoneim, Alaa I. Mohammed, Esraa O. Gadim, Mayada Alhibir Mohammed, Sara H. Hamza, Sara A. Mirghani, Thwayba A. Mahmoud and Mohamed A. Hassan, published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.

Abdelrahman H. Abdelmoneim

Alaa I. Mohammed

Esraa O. Gadim

Mayada Alhibir Mohammed

Sara H. Hamza

Sara A. Mirghani

Thwayba A. Mahmoud

Mohamed A. Hassan
