In Silico Analyses of the Pseudogenes of Helicobacter pylori
Neenish Rana, Nosheen Ehsan, Awais Ihsan and Farrukh Jamil*
Department of Biosciences, COMSATS Institute of Information Technology, Sahiwal, Punjab, Pakistan
Pseudogenes were previously regarded as molecular fossils, non-functional by-products of genome evolution. However, several lines of evidence indicate that some pseudogenes are active. Using current NCBI data, we retrieved 65 pseudogenes from the genome sequence of the human pathogenic bacterium Helicobacter pylori (H. pylori) strain 26695. Computational analysis of the genome showed 6 transcriptionally active pseudogenes that can produce stable mRNA secondary structures compared to their functional parents. Moreover, their putative protein products were predicted to be thermodynamically stable. The sequence-based predictions suggested that the pseudogene-derived proteins may be involved in different biological functions such as translation, energy metabolism, amino acid metabolism, and transport and binding.
Received 12 April 2016
Revised 24 August 2016
Accepted 30 January 2017
Available online 28 June 2017
FJ designed the study. NR and NE performed the experiments. AI and FJ collected and analyzed the data. FJ supervised and prepared the manuscript.
Pseudogenes, Helicobacter, Protein structure, mRNA stability, Protein models.
* Corresponding author: firstname.lastname@example.org
0030-9923/2017/0004-1261 $ 9.00/0
Copyright 2017 Zoological Society of Pakistan
Pseudogenes are genomic loci that share sequence homology with functional genes but are biologically inactive because of aberrations in their sequences such as deletions/insertions, frameshift mutations and premature stop codons. Therefore, they were referred to as genomic fossils or inert genes. However, recent studies have challenged this concept and proposed several functions for pseudogenes of unicellular and multicellular organisms. For example, at the RNA level they can compete with the RNA of other genes by interacting with RNA-binding proteins, and as proteins they may affect parent or other unrelated enzymes. Consequently, they may affect vital metabolic pathways.
Pseudogenes have been categorized into three classes: processed, duplicated and unitary pseudogenes. Processed pseudogenes lack introns and are generated by reverse transcription of mRNA and integration into silent regions of the genome, whereas duplicated pseudogenes are inactive because of disabling features such as unfaithful gene duplication, premature stop codons, frameshift mutations or loss of the promoter region. Unitary pseudogenes exist without functional counterparts.
Helicobacter pylori is a human pathogen that colonizes the gastric mucosa of the human stomach and plays a vital role in causing gastric cancer and gastrointestinal disorders. Its genome sequence (NCBI accession no. GCA_000307795) hosts 65 pseudogenes out of 1561 total predicted genes. These pseudogenes may produce stable proteins, as a study has shown that non-coding parts of the E. coli genome produced stable proteins. In this framework, computational analysis of the pseudogenes of H. pylori is a step toward understanding the possible functions of their derived proteins.
The genome of H. pylori (GCA_000307795) was analyzed, and 65 pseudogene sequences were retrieved from the NCBI database and computationally translated into protein using the Transeq tool of the European Bioinformatics Institute (EBI). Twenty-one pseudogene-derived proteins showed significant homology with other functionally active proteins, and these were considered for further analyses (i.e. sequence-based function prediction).
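The computational translation step can be illustrated with a minimal Python sketch. It is not the Transeq implementation: it assumes the standard genetic code (whose codon assignments are shared by the bacterial code), translates a single forward frame, and marks stop codons with `*`. The function names `translate` and `premature_stops` are hypothetical helpers introduced here for illustration only.

```python
from itertools import product

# Standard genetic code in NCBI "TCAG" order (first base varies slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=1):
    """Translate a DNA string in forward frame 1, 2 or 3.

    Stop codons become '*'; codons with ambiguous bases become 'X'.
    """
    seq = dna.upper().replace("U", "T")
    start = frame - 1
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def premature_stops(protein):
    """Count stop codons that occur before the last translated position."""
    return protein[:-1].count("*")


if __name__ == "__main__":
    toy = "ATGTAAGGGTAG"  # made-up sequence, not an H. pylori locus
    product_seq = translate(toy)
    print(product_seq, premature_stops(product_seq))
```

Counting internal `*` characters in such a translation is one simple way to see the premature stop codons that typically disable a pseudogene.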
Sequence-based function prediction
Functional parents of the pseudogenes were identified using the basic local alignment search tool (BLAST). The strength of the promoters of the pseudogenes and of their relatives was calculated using the BPROM program and expressed as a linear discriminant function (LDF) value. Messenger RNA (mRNA) stability was predicted using the RNAfold web server on the basis of minimum free energy (MFE). The ProtFun tool was used to predict the possible functions of the pseudogene-derived proteins, and their sub-cellular localization was studied using the ProtCompB program, while the physicochemical properties (molecular weight, theoretical isoelectric point, aliphatic index and grand average of hydropathicity (GRAVY)) were predicted by the ExPASy ProtParam server. Tertiary structures of the pseudogene-encoded proteins and their functional relatives were predicted using the SWISS-MODEL and I-TASSER servers.
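Two of the physicochemical properties named above have simple closed-form definitions, which the following Python sketch reproduces. It is an illustration rather than the ProtParam code: GRAVY is taken as the mean Kyte-Doolittle hydropathy per residue, and the aliphatic index as X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X is the mole percent of each residue. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein):
    """Grand average of hydropathicity: mean hydropathy per residue."""
    residues = [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("no standard residues in sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def aliphatic_index(protein):
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    seq = protein.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    toy = "MKVLAAGIV"  # made-up peptide, not a protein from this study
    print(round(gravy(toy), 3), round(aliphatic_index(toy), 2))
```

A positive GRAVY value indicates an overall hydrophobic sequence and a negative value an overall hydrophilic one.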
Stability of pseudogene-derived proteins
The GROMOS96 force field implemented in Swiss-PdbViewer was used to calculate the total energy of each predicted model based on non-bonded and electrostatic constraints. Total cation-π interactions and their energies were calculated using the CaPTURE program. The instability index was calculated using the ExPASy ProtParam server.
Results and discussion
This study was designed to understand the possible roles of the pseudogenes of H. pylori through computational analyses of their artificially transcribed and translated products. Analyses of the upstream sequences of the 21 pseudogenes and their functional parents showed that 4 pseudogenes (HP0052, HP0205, HP0502 and HP1522) have a stronger promoter region, while 7 pseudogenes (HP0039, HP0041, HP0343, HP0482, HP0505, HP0548, HP0744 and HP0915) host weaker promoter sequences than those of their functional parents. Analyses showed that 6 pseudogenes (HP0143, HP0369, HP0432, HP0481, HP0548 and HP0679) have 100% sequence identity with 100% query coverage to known proteins of other H. pylori strains, and their promoters also show similar strengths. It appears that these genes might be active and may have been poorly annotated.
The expression of the pseudogenes was evaluated on the basis of the minimum free energy (MFE) values of the secondary structures of their mRNAs. It has been proposed that highly expressed genes possess less stable mRNA secondary structures, while genes with low expression show more stable secondary structures. MFE values of the pseudogenes range from -563.5 to -18.5 kcal/mol and correspond well to the MFE values of their functional parents (-794 to -33.50 kcal/mol). Analyses of the data showed that the MFE values of five pseudogenes (HP0143, HP0369, HP0432, HP0481 and HP0679) are the same as those of their parent mRNAs. It seems that these 5 genes might be active under certain specific conditions or may be transcribed to regulate other parent genes of the organism. Only three pseudogenes (HP0094, HP0619 and HP0744) produced more stable mRNA than their functional parents.
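The comparison of pseudogene and parent MFE values can be sketched in a few lines of Python. This is only an illustration of the reasoning, under the assumption that a more negative MFE means a more stable predicted secondary structure. The function name `compare_mfe` and the tolerance `tol` are hypothetical, and the example values are made up rather than taken from the study.

```python
def compare_mfe(pseudo_mfe, parent_mfe, tol=0.05):
    """Label a pseudogene mRNA relative to its parent mRNA.

    Both values are minimum free energies in kcal/mol; a more negative
    value is treated as a more stable predicted structure.
    """
    diff = pseudo_mfe - parent_mfe
    if abs(diff) <= tol:
        return "same"
    return "more stable" if diff < 0 else "less stable"


if __name__ == "__main__":
    # Hypothetical (pseudogene, parent) MFE pairs for illustration only.
    pairs = {"geneA": (-120.0, -100.0), "geneB": (-50.0, -50.0)}
    for name, (pseudo, parent) in pairs.items():
        print(name, compare_mfe(pseudo, parent))
```

Because raw MFE scales with transcript length, such a direct comparison is most meaningful when the pseudogene and parent transcripts are of similar length.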
|Sequence ID||Stability centers||Instability index||Total energy (kcal/mol)|
Tertiary structures of the 21 pseudogene-encoded proteins were predicted and their putative functions were obtained from the ProtFun tool. Most of the proteins were found to be involved in translation (6 enzymes/proteins), energy metabolism (5), amino acid metabolism (5), and transport and binding (2), while 3 proteins showed potential to be part of the cell envelope. Analyses showed that 9 proteins have the potential to be localized in the cytoplasm, 7 in the outer membrane, 4 in the inner membrane and 1 in the periplasm.
|Sequence ID||Molecular mass (kDa)||Theoretical pI||Aliphatic index||GRAVY||Sub-cellular localization|
|HP0039 (899692)||10.1977||5.82||93.33||-0.017||Inner Membrane|
|HP0052 (899240)||41.0625||8.03||79.54||-0.42||Outer Membrane|
|HP0254 (899058)||4.4624||10.18||77.11||-0.542||Outer Membrane|
|HP0482 (899253)||19.3029||7.68||88.29||-0.372||Outer Membrane|
Out of 21, only 7 proteins form stable tertiary structures, as their overall energy was -38 to -398 kcal/mol and their instability index was less than 40. This suggests in vivo stability of these proteins, as an instability index below 40 is considered good evidence that a protein will be stable in vivo. The molecular masses of these stable proteins range from 4.46 to 41.06 kDa, suggesting the presence of proteins of different sizes. The isoelectric point (pI) values of the proteins vary from 5.82 to 10.38, indicating the acidic nature of only one protein (HP0039) and the basic nature of 6 proteins (HP0041, HP0052, HP0254, HP0369, HP0482, HP0505), and the aliphatic index of the proteins ranges from 63.79 to 129.18 (the higher the value, the greater the expected stability of the protein). The hydropathicity value (GRAVY score) showed that six proteins are hydrophobic in nature while one is hydrophilic. These stable proteins may be significant for the microorganism and may be expressed under specific conditions.
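The cut-offs used in this paragraph can be written down as a short Python sketch. It assumes the usual conventions: an instability index below 40 predicts a stable protein, a pI below 7 is acidic and above 7 is basic, and a positive GRAVY score is hydrophobic while a negative one is hydrophilic. The function names are hypothetical, and the values passed in the example are illustrative inputs, not results of the study.

```python
INSTABILITY_CUTOFF = 40.0  # proteins below this value are predicted stable


def is_predicted_stable(instability_index):
    """Apply the instability index cut-off of 40."""
    return instability_index < INSTABILITY_CUTOFF


def classify_pi(pi):
    """Classify a theoretical isoelectric point relative to neutral pH 7."""
    if pi < 7.0:
        return "acidic"
    if pi > 7.0:
        return "basic"
    return "neutral"


def classify_gravy(gravy_score):
    """Positive GRAVY is hydrophobic overall; negative is hydrophilic."""
    if gravy_score > 0:
        return "hydrophobic"
    if gravy_score < 0:
        return "hydrophilic"
    return "neutral"


if __name__ == "__main__":
    # Made-up inputs for illustration only.
    print(is_predicted_stable(35.0), classify_pi(8.2), classify_gravy(0.15))
```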
In conclusion, our study identifies 6 pseudogenes in H. pylori that appear to be active genes, as they are 100% identical to functional genes in other strains of H. pylori. Overall, we have identified 7 pseudogenes that may produce stable proteins; however, further studies are required to explore the exact role of these proteins.
We gratefully acknowledge the Higher Education Commission (HEC) of Pakistan for grants to establish the Bioinformatics research laboratory at COMSATS, Sahiwal.
Statement of conflict of interest
Authors have declared no conflict of interest.
Biasini, M., Bienert, S., Waterhouse, A., Arnold, K., Studer, G., Schmidt, T., Kiefer, F., Cassarino, T.G., Bertoni, M., Bordoli, L. and Schwede, T., 2014. SWISS-MODEL: Modelling protein tertiary and quaternary structure using evolutionary information. Nucl. Acids Res., 42: W252–258.
Guruprasad, K., Reddy, B.V. and Pandit, M.W., 1990. Correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence. Protein Engin., 4: 155–161.
Hoefman, S., van-der-Ha, D., Boon, N., van Damme, P., de-Vos, P. and Heylen, K., 2014. Niche differentiation in nitrogen metabolism among methanotrophs within an operational taxonomic unit. BMC Microbiol., 14: 83.
Jensen, L.J., Gupta, R., Blom, N., Devos, D., Tamames, J., Kesmir, C., Nielsen, H., Staerfeldt, H.H., Rapacki, K., Workman, C., Andersen, C.A., Knudsen, S., Krogh, A., Valencia, A. and Brunak, S., 2002. Prediction of human protein function from post-translational modifications and localization features. J. mol. Biol., 319: 1257–1265.
Mukund, M.A., Bannerjee, T., Ghosh, I. and Datta, S., 1999. Effect of mRNA secondary structure in the regulation of gene expression: unfolding of stable loop causes the expression of Taq polymerase in E. coli. Curr. Sci., 76: 1486–1490.
Solovyev, V. and Salamov, A., 2011. Automatic annotation of microbial genomes and metagenomic sequences. In: Metagenomics and its applications in agriculture, biomedicine and environmental studies (ed. R.W. Li), Nova Science Publishers, pp. 61-78.
Welch, J.D., Baran-Gale, J., Perou, C.M., Sethupathy, P. and Prins, J.F., 2015. Pseudogenes transcribed in breast invasive carcinoma show subtype-specific expression and ceRNA potential. BMC Genom., 16: 113.
Zhang, Z.D., Frankish, A., Hunt, T., Harrow, J. and Gerstein, M.B., 2010. Identification and analysis of unitary pseudogenes: historic and contemporary gene losses in humans and other primates. Genom. Biol., 11: R26.