Single Nucleotide Polymorphism Induces a Positive Selection Pressure at Gag-Pol Sites in Human Immunodeficiency Virus Favoring Drug Resistance Mutations

Amreen Zahra1, Mushtaq A. Saleem1*, Hasnain Javed2, Muhammad Azmat Ullah Khan3, Muhammad Naveed1 and Abdul Rauf Shakoori4 1Department of Biochemistry and Biotechnology, University of Central Punjab, Lahore 54782, Pakistan 2Advanced Diagnostic Laboratory, Punjab AIDS Control Programme, Government of Punjab, Lahore 54000, Pakistan 3Department of Biochemistry and Molecular Biology, University of Gujrat, Gujrat 50700, Pakistan 4School of Biological Sciences, University of the Punjab, Quaid-I-Azam Campus, Lahore 54590, Pakistan Article Information Received 25 May 2020 Revised 08 June 2020 Accepted 12 June 2020 Available online 13 October 2020


A cquired immunodeficiency syndrome (AIDS)
is initiated by the human immunodeficiency virus belonging to the Retroviridae family. Human immunodeficiency virus (HIV) mainly leads to reduced immunity, provoking increased vulnerability to opportunistic infections like HCV HBV, sexually transmitted diseases and also a variety of co-infections with HIV (Chang et al., 2013). HIV-1 may have originated in O n l i n e F i r s t A r t i c l e spacer peptide 2, p1nucleocapsid protein p7, capsid protein p24, p1 and P6 protein (Freed, 1998). While, Pol encodes RT, RNase H, integrase and protease. Cleavage of viral polyproteins, reverse transcription and integration into the host genome during the life cycle of HIV is done with the help of these viral enzymes (Mushahwar, 2006). Viral Gag-Pol precursor proteins are known to play an elemental role in the viral life cycle. These proteins thus facilitate the formation of mature virus particles through proteolytic processing (Fun et al., 2012). Functional variants can have both damaging and neutral effects on the structure and function of a protein. The hypervirulent nature of a protein provokes its destabilization, altering the gene regulation, thus posing a risk in maintenance of structural integrity of cells (Arshad et al., 2018). To address the question of these fatal anomalies, clinical trials and cohorts involved in Gag-Pol genotypic testing would be highly beneficial for therapeutic analysis. However, computational analysis of such genes would play a significant role in revealing underlying mechanisms and understanding patterns of drug resistance and mutations. Antiretroviral therapy (ART) is the only remedy ever since the diagnosis of this fatal virus. It favors by reducing high viral loads to undetectable levels, thus recovering the immune system. But, the increased virulency in mutations promotes drug resistance to ART which causes high rise in viral load rendering disruptions of the immune system. Pakistan is working efficiently to fight and combat this deadly disease, which has increased in a couple of years. A sudden outbreak of HIV was reported in Kot Imrana, a small village located near Sargodha in Central Punjab in March 2018 (Zaid and Afzal, 2018). In 2019 more individuals were affected, indicating the spread of the disease to about 669 (13.38%) HIV patients and manifested increased mortality (Wahid, 2019). This crisis was followed by two major HIV outbreaks in 2019 in Rotodero among infants and sudden increased cases of HIV-1 in Faisalabad, which was an alarming health catastrophe in Pakistan (Zahra et al., 2019;Mir et al., 2020). To date, there is not much comprehensive molecular and computational analysis of HIV-1 in Pakistan, which could provide better insights into acquisition risks, transmission and challenges to HIV treatment. This is the first study in Pakistan on HIV-1, which covers a comprehensive in vitro and in-silico analysis targeting the Gag-Pol proteins in Punjab province of Pakistan. Since little is known about HIV-1's molecular virology and bioinformatics, the present study was aimed at analyzing the two integral genes Gag-Pol of HIV-1 at molecular and computational level using various bioinformatics strategies to detect the increased virulence and deleterious effect of the mutations. Mechanisms underlying virulence and drug resistance need to be investigated. 3D models of the Gag-Pol protein will provide better insight for future studies in HIV and may prove helpful in development of medicines for the treatment of such non-treatable anomalies caused by genomic variations.

Sample collection
This study was done in collaboration with Punjab AIDS Control Programme (PACP). Blood samples were obtained from different treatment centers of PACP, Lahore between November 2018 to October 2019. A total of 365 participants were recruited for this study, who had a viral load exceeding 100,0000 copies/ml and a first CD 4 cell count >350 cells/µl on the diagnosis. Antiretroviral (ART) naïve and ART therapy were included in the sample pool. A total number of 85 (23.29%) samples were confirmed with HIV-1 infection. Fifty two males (14.25%) and 30 (9.04%) females were diagnosed with progressive HIV-1 infection. Only 33 HIV-1 (0.1%) positive isolates fulfilled the quality processing criteria for this study. Recently infected were 23 and 10 were already on ART treatment. The successfully amplified samples for HIV-1 Gag and Pol were used for genotyping and phylogenetic analysis. For detailed mutational and phylogenetic analysis, only 11 isolates qualified depending upon sociodemographic creteria mentioned in Table I. Sample collection was permitted and facilitated by PACP and informed consent was taken from the enrolled subjects. Study approval was granted by the Research Ethics Committee of the University of Central Punjab (Supplementary Fig. 2A, 2B).

Immunoassays and reference laboratory tests
All the samples were centrifuged for 3 min at 12,000 rpm followed by serum separation. The screening was then performed for the presence of HIV-1 infection by rapid testing devices. A three device method was used according to the guidelines suggested by WHO for diagnosis of HIV-1 in resource-limited countries (i) Alere™ HIV Combo Rapid Test (Alere Medical Co. Ltd., Matsudo-chi, Japan), (ii) Uni-Gold HIV (Uni-Gold, Trinity Biotech, Ireland; #1206502) and (iii) SD Bioline HIV-1/2 3.0 (Standard Diagnostics, Kyonggi-do, South Korea).

CD 4 count testing and HIV detection by real-time PCR
After collection of the blood samples, 25 μl of blood was added to the Alere PIMA™ CD 4 cartridge and CD 4 count was displayed on the Alere PIMA™. Real-Time PCR was performed on the cobas 4800 HIV-1, COBAS® O n l i n e

F i r s t A r t i c l e
AmpliPrep/COBAS® TaqMan® HIV-1 system, an in vitro nucleic acid amplification test for the quantification of HIV-1 RNA in plasma following automated specimen processing, amplification and detection. The test quantitates HIV-1 RNA at a range of 20-10,000,000 copies/mL (33 to 1.67x 107 International Units IU/mL) based on the WHO international standards recommended for HIV-1. (Thermo Scientific, USA, # K1622) was used for cDNA preparation. Nested polymerase chain reaction (PCR) was used for the amplification of Gag and Pol genes. Partial Pol gene containing protease (PR) gene of 99 amino acid codons and the first 298 codon segment of the reverse transcriptase (RT) was amplified. Both Gag and Pol fragments were amplified by nested RT-PCR the method described by Chen et al. (2014). 1% agarose gel was used for visualization of the PCR products. Successfully amplified samples were sequenced by First Base Laboratories Sdn Bhd, Seri Kembangan, and Selangor, Malaysia on an ABI 3730xl automated DNA analyzer (Applied Biosystems, Foster City, CA.

Computational analysis
The sequences of Gag and Pol were submitted to Genbank (accession numbers MT 462197, MT 462198, MT 462199, MT 462200, MT 489306, MT 489307, MT 489308, MT 489303, MT 489304, MT 489305, MT 462202) in FASTA format using BankIt-NCBI-NIH database. Mutational analysis for Gag-Pol was performed by POLYPHEN 2, PROVEAN, I-MUTANT, SIFT and PHD-SNP to predict the allele function, effect of non-synonymous SNPs, structure and function-based characteristics (Adzhubei et al., 2013;Choi and Chan, 2015;Kamaraj and Purohit, 2013;Kumar et al., 2012). POLYPHEN 2 uses the naïve Bayesian classifier to predict the allele function with a score ranging from 0-1 calculating the probability for a given mutation to be deleterious mutation and a score nearer to 1 (Adzhubei et al., 2013). The prediction of conserved domains was performed by Pfam (Finn et al., 2015), CDD (Marchler-Bauer et al., 2014) and Scan Prosite (Hulo et al., 2006). To predict a protein sequence variation and its function, Protein Variation Effect Analyzer (PROVEAN) was used. When the score equalizes to a predefined threshold i.e. -2.5; the protein variant is found to have a more hypervirulent effect causing damage (Capriotti et al., 2006). The impact of non-synonymous SNPs on the structure and function of proteins was determined by using I-MUTANT 3.0. Sorting intolerance from tolerance (SIFT) is a web-based tool using the PSI-BLAST protein database. Functionally associated protein sequences were collected using SIFT. Due to the difference in software and algorithms, their predictions may differ from each other (Ng and Henikoff, 2001). PHD-SNP and SVM based classifier, classifies substitutions at specified positions, amino acid replacements and profilebased prediction. It classifies the mutations as neutral or deleterious polymorphism (Finn et al., 2015). Pfam database was used for analysis for the large collection of protein families represented by multiple sequence

O n l i n e F i r s t A r t i c l e alignments (MSA) and hidden Markov models (HMMs).
To reveal the biomolecular interactions and mechanisms, docking methodologies are widely applied to structurebased drug design investigations. Further the visualization of the protein -ligand interactions were made by the PyMOL software. Computational docking for the study of biomolecular interactions and mechanisms was done by AutoDock Vina (AutoDock 4) (Bhargava et al., 2014). Phylogenetic tree analysis was performed using the neighbor-joining method with 1000 bootstrap replicates using MEGA (Molecular Evolutionary Genetics Analysis, version 5.1) (Tamura et al., 2011). RAMPAGE (Ho and Brasseur, 2005) and PROCHECK (Laskowski et al., 1993) tools were used for checking stereochemical quality, analyzing overall residue-by-residue geometry.

Mutational analysis
We subjected all the detected single nucleotide polymorphisms (SNPs) of Gag and Pol to five different in silico approaches like POLYPHEN 2, I-MUTANT, SIFT, PROVEAN and PHD-SNP were used to investigate the effect on the structure of the Gag-Pol protein. However, four tools I-MUTANT, PHD-SNP, SIFT and PROVEAN strongly verified the prevailing hypervirulence in Gag and Pol with a higher degree of deleterious, specific and sensitive SNP values as shown in Table II. The identified SNPs demonstrated deleterious and damaging effects with the specified scores, thus depicting an increased influence of diseased SNPs shown in Table II.

POLYPHEN 2 (Polymorphism Phenotyping V2)
Almost the majority of the mutations in Polyphen 2 analysis showed a change in amino acid towards specificity and sensitivity, denoting that the mutation causes a deleterious and damaging effect with a score near to 1.00 and greater than 1 (Table II). The highest specificity was demonstrated at position S61 in Gag and M90L in Pol, giving a score of 1.88 (Table II). The mutational analysis of this research study revealed S61A, S61M, S61Y, S61G and S61Q were the most dysfunctional mutations in Gag and probably could be one of the reasons for increased drug resistance. The deleterious SNPs are based on positionspecific independent count score difference, where score "1" is considered the most deleterious and our results also demonstrated synchrony as reported in some recent computational studies (Arshad et al., 2018).

PROVEAN
The cutoff for Gag variants C263I, S61A, S61N, V81I and in Pol T160V depicted a significant deleterious effect of more than -2.5 thus denoting as the most virulent mutations detected by Provean analysis (Table II).

I-MUTANT
In the current study, majority of the SNPs (Table II) in the Gag gene, demonstrated a decreased score of greater than "5" significantly denoting the fact that the disease could be caused by any of these mutations. SNPs D269E and Y221H showed a score of about 5 and 8, respectively, in the Pol gene exhibiting deleterious effect with increased virulent influence. The output predicted the decreased changes in protein stability upon the possible mutations for a given position with RI value (Reliability Index) 1 and 2 for the transitions, as mentioned above. Our results revealed that most of the SNPs detected showed decreased stability in I-mutant detected Gag-Pol variants (Table II).
In accordance with these results, previous studies have also demonstrated decreased protein stability leading to degradation, misfolding and aggregation of proteins (Mayer et al., 2007). This phenomenon of decreased stability, however, sheds light on the notion that these detected polymorphisms probably cause the maximum harmful effect to the protein by affecting its stability and disrupting the proper translation of the protein.

PHD-SNP
The output of Gag-Pol variants by PHD-SNP (Table  II) contains mutations in the protein sequence. A219 P, a Gag variant, demonstrated mutation with a score of 9 and L242I, a Pol detected variant gave a score of 6.

Conserved domains and motifs prediction
For the prediction of conserved domains, the following tools were used:

Pfam
The domains predicted by Pfam for Gag-Pol from the 11 isolates were Gag_p17, Gag_p24, zf-CCHC, RVT-1, RVT thumb, RVT connect, RNase-H, Integrase-Zn and MLVINC as shown in Figure 1. The matrix proteins of Gag showed an E-value of 1.6e-61 and 4.8e-79, whereas Pol demonstrated an E-value of 8 i.e.e-31and 5.6e-18 of RVT-1 and RVT thumb (Supplementary Table S1). The significant domains of Pol were protease, RNase H and Integrase, which denoted the E-values 1.1e-24, 3.3e-11 (Supplementary Table S1). These results infer the fact that the major mutations in both these genes could be in these detected domains with raised E-values suggesting a higher rate of virulency in mutations of these genes. Highly conserved residues could be functional or structural based on their location on the protein surface or inside its core (Doniger et al., 2008). Amino acids involved in vital processes, for example, in interactions among different proteins, are more conserved than others. SNPs located at the conserved regions are considered immensely deleterious to protein as compared to those at non-conserved sites (Miller and Kumar, 2001).

CDD
Results of conserved data domain (CDD) revealed 10 major conserved domains named as RT-rtv, Gag_p24, Gag_ p17, RVT connect, RNase-H, RVP, rve, RVT-thumb (Figs. 2A and 2B). The highest E-values were depicted by Pol region of domains RVT-1 and RVT_thumb with 8.1e-31 and 5.6e-18. These domains majorly contributed to the catalytic activity of proteins at the N-terminal. Gag_p24, mainly involved in the formation of the inner protein layer, depicted an E-value of 1.36e -64, whereas Gag_p17, the matrix O n l i n e

F i r s t A r t i c l e
A. Zahra et al.   protein gave an E-value of 1.17e-20 (Table III). Pol gene gave an E-value of 5.51e -123 in RT_RTV and RVT thumb consisted of four-helix bundles sustains an E-value of 1.66e-12 ( Figs. 2A and 2B).

Scan prosite
The predictions resulted from Scan Prosite were functional sites, protein families by comparing amino acid patterns in the Gag-Pol protein domains like ferredoxin type iron sulphur binding regions, EGF-1, Anaphylatoxin-1, Aspartyl proteases and RT-Pol exhibited a confidence level of (-1) and increased high scores (Supplementary  Table S2). These predictions reveal information about the biochemical function of the core HIV-1 proteins and have also been described previously in other studies (Naveed et al., 2014).

Secondary structure prediction
The translate tool EXpasy was used to predict the secondary structure of the protein. was translated by EXpasy, further, the translated sequence was modeled with I-TASSER for tertiary structure prediction.

Tertiary structure prediction
The first model having the best-predicted scores gave a C-score of -1.18, TM-score 0.57±0.15 and an estimated RMSD 8.6±4.5Å in the Gag 3D predicted model ( Supplementary Fig. 1A). The Pol 3D model showed a C-score of -0.49, TM-score of 0.65±0.13 and an estimated RMSD 7.6±4.3Å (Supplementary Fig. 1B). Similar studies done previously on GDH gene and mental retardation genes like TUSC3 have used I-TASSER for tertiary structure prediction (Naveed et al., 2014(Naveed et al., , 2016. These findings significantly strike the notion which would be highly beneficial in the development and understanding of novel therapeutic strategies for diseases.

Phylogenetic analysis
MEGA 5 software was used to elucidate the phylogenetics and ancestral history in the current study. The (Neighbor-Joining method) was used for the construction of a phylogenetic tree. The interior branch method and a bootstrap value of 1000 were implemented to validate the ancestral relationships (Figs. 3A and 3B).

3D Structure validation
Ramachandran plot The 3D structure validation was done by RAMPAGE (http://services.mbi.ucla.edu/SAVES/Ramachandran/) and PROCHECK (Ho and Brasseur, 2005) tools which provide a view of the conformation of a protein and verification of the 3D structure. Gag had 87.3% residues in the most favored regions, 11.3% in additional allowed regions, 1.0% in generously allowed regions, 0.4% in disallowed regions (Fig. 4A). However, the final model of Pol had 87.3% residues in the most favored regions, 11.3% in additional allowed regions, 1.0% in generously allowed regions, 0.4% in disallowed regions, number of non-glycine and non-proline 100% residues. The total number of residues was 821 (Fig. 4B). Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected O n l i n e

F i r s t A r t i c l e
A. Zahra et al.
to have over 90% in the most favored regions.

Auto dock vina
Tools such as Pymol, discovery studio and Mgl tool were used in docking methodology (Trott and Olson, 2010). Here we checked the binding capabilities between the protein of HIV and the most recommended drugs for HIV-1 like Dolutegravir (DTG) ligand possibly for control of AIDS. Molecular docking imaging analysis (Figs. 5 and 6) exhibited interactions visualized between the protein and ligand. Structural shreds of evidence demonstrated the hydrogen bonds and binding interactions both in Figures 5 and 6, respectively. Drugs like DTG have been well known for their inhibition reported in previous and current studies. DTG mainly functions by binding the catalytic domain of the HIV Gag-Pol region, diminishing the activity of the viral enzymes by the assembly of the HIV genome ceasing into the host which blocks the access of the substrate rendering the entry to the catalytic site (Wainberg et al., 2016). For the current research analysis, DTG was used as a major docking drug with the obtained I-TASSER structures of the isolates to check the ligandbinding sites and increased virulency among them and comparable with the wild type structure of HIV.

DISCUSSION
Principally in the mechanism, HIV-1 protease inhibitors (PIs) compete with the Gag substrates for attachment with the active site of the protease (da Silva et al., 2019). However, even when the enzyme exhibits reduced activity, secondary mutations develop in the protease gene, which fastens the process of Gag substrate cleavage, thus initiating Fig. 5. Molecular docking image analysis of Gag. The interactions are visualized between the protein and ligand of Dolutegravir for Gag protein. Dolutegravir is one of the core drugs used for treatment of HIV-1 targeting the Gag-Pol region. A shows the solid structure with residues; B represents the meshy surface and C represents the solid surface cartoon. viral fitness (Dam et al., 2009). Alternative theoretical methodologies have been proposed in the literature previously suggesting that mutations in the Gag-Pol region actively participated in the resistance against protease inhibitors (PI) and that is resulting in the virus resistant to these PIs in the absence of any major mutations in the protease gene (Nijhuis et al., 2007;Verheyen et al., 2006). Docking results showed binding affinities of both Gag and Pol with protease inhibitors like Dolutegravir gave a binding affinity of -5.7 kcal/mol in Gag and -7.9 kcal/mol in Pol (Supplementary Table S3A and S3B). This suggests that DTG one of the most potential drug for control of HIV. However the findings of this study also gives coverage to the fact that drug resistance could be mainly in the Gag-

O n l i n e F i r s t A r t i c l e
Positive Selection Pressure at Gag-Pol Sites in Human Immunodeficiency Virus Pol region of HIV-1 thus suggesting that these two proteins create a positive selection pressure at Gag-Pol cleavage sites leading to progression of the disease. However, the binding affinity of DTG was lower in Pol, which suggested significantly increased virulence in the Pol region and enhanced vulnerability to mutations with an up-regulated hypervirulence. The binding sites were very few or only in a single place, as shown in Figure 5A with lysine residues and with lysine and valine residues (Fig. 6A). In addition to this, the results of docking of this study also shed light on the phenomenon that in future therapeutic implications with synthetic compounds of DTG can be introduced in a combination of other drugs as the major protease inhibitors for ART therapy for Pakistani population. These in vitro findings confirm the molecular revelations for the major mutations and underlines the importance of Gag-Pol hypervirulence in resistance in HIV-1. As future research perspectives, this must be regarded as a primary significant phase for determining the mutational effects, positive selection pressure and interactions on viral resistance and fitness. Hence, this study can pave beneficial computational strategies for mutational analysis in the cure of HIV-1 by drug production. Computational analysis of Gag-Pol gene of Human Im munodeficiency Virus Type -1(HIV-1) protein is useful for the identification of deleterious mutations. The binding of ligand and protein sheds light on the phenomenon of fusion of Gag-Pol mutations leading to the initiation of AIDS. Moreover, the results of our study and computational analysis describe the suitable positions for drug designing, rendering novel methodologies for the cure of HIV. This study focuses and details the use of different bioinformatical approaches for exploring the virulent mutations, their harmful effects and structure prediction for the treatment of HIV.

CONCLUSION
Nevertheless, the underlying mechanisms still remain unclear, how HIV-1 achieves the regulation of fusion, suggesting more future studies are required to define the mechanism of this regulation. These in vitro findings confirm the molecular revelations for the Gag-Pol major mutations and underlines the importance of Gag-Pol mutations in resistance, very similar to the results obtained in this study. As future research perspectives, this must be regarded as a major significant phase for determining the mutational effects and interactions on viral resistance and fitness. Furthermore, such comprehensive investigations of immune diseases will help in the exploration of pathways for drug development and innovative methods for saving lives and curing HIV. Hence, this study can pave beneficial computational strategies for mutational analysis in the cure of HIV-1 by drug production.

ACKNOWLEDGMENTS
Special thanks to Dr. Munir Ahmad Malik Project Director, and Dr. Asim Altaf Punjab AIDS Control Programme (PACP) for providing assistance and facilities at PACP. We are also very grateful for the contributions of the laboratory staff at PACP for providing the opportunity to collect the samples and laboratory procedures as recommended by WHO.

Compliance with ethical standards
This research study was ethically approved by the Institutional Review Board of University of Central Punjab and permitted by the Punjab AIDS Control Programme also.

Funding
The research study was funded by University of Central Punjab, Lahore and Punjab AIDS Control Programme.

Supplementary material
There is supplementary material associated with this article. Access the material online at: https://dx.doi.

F i r s t A r t i c l e
Positive Selection Pressure at Gag-Pol Sites in Human Immunodeficiency Virus