Gene Multimerization in Expression Vector: A Potential Strategy for Enhanced Protein Expression in E. coli

Protein production in any expression system can be enhanced either by optimizing culture parameters or by targeting the genetic factors that govern protein production. One such genetic factor is gene dosage, which can be increased by raising the vector copy number. However, a higher copy number of the expression vector imposes a metabolic burden on bacterial cells. The current study presents gene multimerization as an alternative way to amplify gene dosage and explores its effect on the expression of human epidermal growth factor (hEGF). Three expression vectors were constructed, i.e., pET28-EGF-C1, pET28-EGF-C2 and pET28-EGF-C3, containing single, double and triple expression cassettes, respectively. These vectors were transformed into E. coli strain Rosetta-gami 2 (DE3) to develop three different bacterial populations. Expression of hEGF was analyzed by SDS-PAGE, followed by densitometric analysis using ImageJ software. The difference in protein expression among the three populations was analyzed by one-way ANOVA. Clones with double and triple copies of the gene showed up to 3.8- and 3.1-fold higher expression, respectively, than clones with a single copy. Gene multimerization can therefore be used to enhance the production of recombinant proteins in bacterial expression systems.


INTRODUCTION
The bacterial expression system has been used for research and commercial production of both prokaryotic and eukaryotic proteins. E. coli has been used to produce biopharmaceutical products such as insulin, interferon, human growth factors and other recombinant proteins (Tripathi, 2016). It offers several advantages, such as ease of cultivation, a high growth rate and inexpensive media requirements. Moreover, its cellular components and genetics are well studied, which makes it a host amenable to genetic manipulation for the expression of recombinant proteins (Joseph et al., 2015; Rosano and Ceccarelli, 2014; Sivashanmugam et al., 2009; Tripathi, 2016). Despite these advantages, protein production in the bacterial expression system has never been straightforward. Disadvantages such as the lack of post-translational modifications, formation of inclusion bodies and protein toxicity may result in proteins with improper structure and function, or in very low or no yield (Rosano and Ceccarelli, 2014). Different methodologies have been developed to target the factors responsible for low yield in the bacterial expression system. These can be broadly classified as (i) those dealing with optimization of bioprocessing and culture conditions and (ii) those acting at the molecular level, such as strain engineering, increased plasmid copy number, codon optimization and manipulation of specific genetic elements.
Optimization of bioprocessing and culture conditions includes several strategies. For enhanced production of recombinant proteins, cultures with high cell densities are usually preferred in batch and fed-batch fermentation processes. Cultures can either be induced or grown in auto-induction media to produce recombinant proteins in a regulated way (Sivashanmugam et al., 2009). Different media compositions have been studied and shown to affect the protein expression level.

Online First Article
Specific media formulations can be screened for improved protein expression (Broedel et al., 2001). In addition, for recombinant proteins such as hIGF (Zhang et al., 2010), neutrophil-acting protein (Lu et al., 2015), Fab antibody fragments (Ukkonen et al., 2013), phenylalanine ammonia lyase (Jaliani et al., 2014) and many others, culture conditions such as temperature, pH, aeration, inducer concentration and post-induction period can be optimized for enhanced protein expression. Selection of an appropriate host for recombinant protein expression is also a crucial parameter that may depend on the type of protein being produced. Strain engineering has been extensively reviewed. Targeted and whole-genome-based mutagenesis have been used to develop host strains, such as Origami, Rosetta and many others, for enhanced protein expression. Specific strains have been developed to carry out post-translational modifications such as disulfide bridge formation, acetylation and glycosylation, leading to proper folding of the recombinant protein and ultimately improved protein expression (Makino et al., 2011). Although optimization improves protein yield and acts as a bridge between lab-scale and large-scale production, the process is time-consuming and tedious and requires a systematic study of every culture parameter.
Apart from the above-mentioned techniques, different molecular biology techniques have been used to enhance heterologous protein expression. The transcriptional rate of a heterologous gene is related to the strength of its promoter. Different promoters, such as T7, lac, trp, lacUV5, recA and tetA, have been used to produce recombinant proteins in E. coli (Joseph et al., 2015). The strength of a promoter needs to be determined experimentally via comparative studies and can be considered a function of its nucleotide sequence (Li and Zhang, 2014).
The translational rate can also be considered a critical factor affecting protein expression. The translational rate of a recombinant protein may be affected by its mRNA secondary structure and the codon bias of the host. The 5′ end of the mRNA contains specific genetic elements, such as the initiation codon, the Shine-Dalgarno sequence and enhancers, that affect the rate of translation (Vimberg et al., 2007). Codon bias can be overcome by codon optimization: codons that are not easily translated by the host translation machinery are replaced with synonymous codons that favor efficient translation (Burgess-Brown et al., 2008; Elena et al., 2014). In addition, vectors carrying genes for rare tRNAs can be co-expressed in host cells to avoid the problem of codon bias, as practiced in the Rosetta strains of E. coli (Novy et al., 2001). The use of universal translation initiation tags may also improve the translation initiation rate and hence protein expression (Parret et al., 2016; Vimberg et al., 2007).
Gene dosage is another critical factor that may contribute to increased protein expression. The effect of gene dosage has been well studied in the Pichia pastoris expression system, which allows more than one gene copy to be inserted into the genome, thus generating multicopy clones (Aw and Polizzi, 2013). The effect of gene dosage on the expression of several recombinant proteins, such as IFNα2b and insulin precursor, has been studied in Pichia pastoris (Khan et al., 2014; Mansur et al., 2005). Gene dosage in the bacterial expression system ultimately depends on the number of copies of the expression vector, as each vector contains a single gene. For such studies, plasmids with high, medium and low copy numbers are used (Chakravartty and Cronan, 2015; Friehs, 2004). The effect of increased gene dosage on protein expression is usually studied by increasing the plasmid copy number, as has been done for production of recombinant proteins in Leuconostoc citreum (Son et al., 2016). However, plasmid copy number and protein expression do not always correlate in the simple way that might be expected. A high plasmid copy number makes the plasmid harder for the strain to maintain by saturating the replication machinery. In addition, increased copy number is known to affect the metabolism, growth rate and cell physiology of the strain. The effects of high copy number have been extensively reviewed (Silva et al., 2012). In the current study, we report for the first time an alternative way of increasing gene dosage for enhanced protein production in the bacterial expression system: increasing the number of gene copies per plasmid while keeping the plasmid copy number unaltered. We refer to this strategy as gene multimerization in the expression vector.
An expression cassette containing the T7 promoter, a 6X His-tag and the gene for human epidermal growth factor was prepared in the backbone of the pET28a(+) expression vector. The expression cassette was multimerized to prepare expression vectors containing double and triple copies of the cassette. Three populations of clones of E. coli strain Rosetta-gami 2 (DE3), containing the expression vector with single, double or triple copies of the expression cassette, were generated, and comparative protein expression studies were carried out to analyze the effect of gene multimerization on protein expression level.

MATERIALS AND METHODS

Strains and plasmids
E. coli TOP10 F′ cells were used for routine cloning experiments. E. coli Rosetta-gami™ 2 (DE3) (Novagen) was used as the expression strain for protein expression studies. The pCR™2.1 vector was used for gene cloning and pET28a(+) (Novagen) was used as the expression vector.
A. Sultan et al.
Enzymes and culture conditions
T4 DNA ligase from the Rapid DNA Ligation Kit (Thermo Scientific, Catalog # K1422) was used at every ligation step. BamHI, BglII and NdeI (all from Thermo Scientific) were used for restriction digestion. Calf intestine alkaline phosphatase (CIAP) (Merck) was used to prevent self-ligation where necessary. Cloning and expression strains were cultured in Luria-Bertani (LB) medium at 37 °C and 200 rpm, containing the appropriate antibiotics as described in the sections below.

Multimerization of hEGF and hIGF in expression vectors
The hegf gene, derived from the Huh cell line, was PCR-amplified from previously prepared constructs using gene-specific primers. The primers used in this study were NdeI-EGF-F (5′-CATATGAATAGTGACTCTGAATGTCCCCTGT-3′) and BamHI-EGF-R (5′-GGATCCTTAGCGCAGTTCCCACCACTTC-3′); the forward and reverse primers contained restriction sites for NdeI and BamHI, respectively. The amplified PCR product was ligated into the pCR™2.1 vector and sequenced using the M13 forward and M13 reverse primers. The insert (hegf) was released using NdeI and BamHI and ligated into the expression vector pET28a(+) to prepare the single copy constructs, i.e., pET28-EGF-C1 and pET28-IGF-C1. To prepare the double copy construct, i.e., pET28-EGF-C2, the insert was released from the single copy construct using BamHI and BglII and ligated back into the single copy construct linearized with BamHI only. The double copy constructs were verified by restriction digestion with BamHI and BglII. This digestion also served to check whether both copies of the gene were in the desired orientation, i.e., head to tail, with the promoter at the 5′ end and the terminator at the 3′ end. The double copy constructs of both genes were further linearized with BamHI, and the insert released from the single copy construct using BamHI and BglII was ligated into them to prepare the triple copy construct, i.e., pET28-EGF-C3. The triple copy constructs were analyzed by digestion with BamHI and BglII. Constructs linearized with BamHI only were treated with CIAP to prevent self-ligation (Fig. 4).
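The cloning scheme above relies on BamHI (G^GATCC) and BglII (A^GATCT) leaving the same 5′-GATC overhang. The following minimal Python sketch illustrates why a head-to-tail insertion can be expected to regenerate a BamHI site downstream while leaving an uncuttable hybrid junction between cassettes. The function names `ligate` and `cut_by` are illustrative helpers introduced here, not part of the original protocol, and only the six-base junction is modelled.

```python
# Recognition sites, top strand 5'->3'. Both enzymes cut after the first base,
# so both leave the same four-base 5' overhang (GATC).
BAMHI = "GGATCC"  # G^GATCC
BGLII = "AGATCT"  # A^GATCT


def overhang(site: str) -> str:
    """Four-base sticky end left by cutting after the first base of the site."""
    return site[1:5]


def ligate(left_site: str, right_site: str) -> str:
    """Six-base junction formed when the left-hand end from `left_site`
    is joined to the right-hand end from `right_site`."""
    if overhang(left_site) != overhang(right_site):
        raise ValueError("incompatible sticky ends")
    return left_site[0] + right_site[1:]


def cut_by(junction: str) -> list:
    """Names of the two enzymes (if any) that recognise the junction."""
    enzymes = (("BamHI", BAMHI), ("BglII", BGLII))
    return [name for name, site in enzymes if junction == site]


# Head-to-tail insertion into a BamHI-linearised vector:
upstream = ligate(BAMHI, BGLII)    # vector BamHI end + cassette BglII end
downstream = ligate(BAMHI, BAMHI)  # cassette BamHI end + vector BamHI end

print(upstream, cut_by(upstream))      # hybrid junction, recognised by neither
print(downstream, cut_by(downstream))  # BamHI site regenerated
```

Under these assumptions, the single BamHI site that remains downstream of the last cassette is what allows the same insertion step to be repeated, and a BamHI plus BglII double digest releases all tandem cassettes as one fragment.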

Expression studies of prepared constructs in Rosetta-gami™ 2 (DE3)
To study the effect of gene multimerization on protein expression, the prepared constructs for both genes, i.e., higf and hegf, were transformed into E. coli Rosetta-gami™ 2 (DE3) to obtain three populations of clones containing single, double and triple copy constructs, respectively. Expression studies for the single, double and triple copy constructs were carried out in three independent experiments. Five clones from each population were selected for protein expression studies. Each clone was cultured overnight at 37 °C in 5 ml LB medium containing tetracycline (12.5 µg/ml), streptomycin (50 µg/ml), kanamycin (50 µg/ml) and chloramphenicol (34.5 µg/ml). The OD600 of each overnight culture was measured, and the volume needed to dilute it into 10 ml of fresh LB medium containing the same antibiotics to a starting OD600 of 0.1 was calculated. OD600 was then measured at regular intervals of 2 h. Once it rose above 0.6, 1 ml of culture was set aside as an uninduced negative control for subsequent studies, and the remaining culture was induced with 1 mM isopropyl-β-D-thiogalactopyranoside (IPTG). Cultures were grown for a post-induction period of 4 h and harvested by centrifugation at 6000 rpm for 5 min.
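The dilution step above amounts to a simple C1V1 = C2V2 calculation. A minimal sketch follows, assuming OD600 scales linearly with cell density over the measured range; the function name and the example overnight reading of 2.5 are hypothetical and not taken from the study.

```python
def inoculum_volume_ml(od_overnight: float,
                       target_od: float = 0.1,
                       final_volume_ml: float = 10.0) -> float:
    """Volume of overnight culture (ml) that brings `final_volume_ml`
    of fresh medium to `target_od`, using C1 * V1 = C2 * V2."""
    if od_overnight <= target_od:
        raise ValueError("overnight culture is not denser than the target OD")
    return target_od * final_volume_ml / od_overnight


# Hypothetical overnight reading, for illustration only.
volume = inoculum_volume_ml(2.5)
print(f"Add {volume:.2f} ml of overnight culture and make up to 10 ml with LB")
```

The defaults mirror the starting OD600 of 0.1 and the 10 ml culture volume stated in the text; any other values would be passed explicitly.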

Protein expression analysis
The harvested cells were suspended in 500 µl of 1X PBS, pH 7.3 (Oxoid), to which an equal volume of 2X protein sample buffer (120 mM Tris-Cl, pH 6.8, 10% glycerol, 2% SDS, 0.05% bromophenol blue and 5% β-mercaptoethanol) was added. Samples were heated at 100 °C for 10 min and then centrifuged at 13,000 rpm and 4 °C for 10 min. The supernatant was analyzed on a 15% polyacrylamide gel by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE). Protein bands were visualized by staining the gels with Coomassie Brilliant Blue R-250. Bands of the appropriate size that were absent or of reduced intensity in the negative control were considered to be the protein of interest. ImageJ software was used for densitometric analysis. A band that appeared with unaltered intensity in all gels was used to normalize the bands of interest.
Western blotting was performed to validate the expression of recombinant hEGF and hIGF. For this purpose, protein samples from the single, double and triple copy clones showing maximum expression were resolved on a 15% SDS-polyacrylamide gel, and the bands were transferred to a nitrocellulose blotting membrane (Amersham™ Protran™ 0.45 µm NC). After overnight blocking in skim milk (5% w/v in PBS), the membrane was incubated for 3 h with mouse anti-histidine antibody diluted 1:1000. After thorough washing with PBST (0.05% Tween 20 in PBS), the membrane was incubated for 1 h with AP-conjugated goat anti-mouse IgG (Santa Cruz Biotechnology, Inc.) diluted 1:7500. Bands were visualized by applying NBT/BCIP (Sigma-Aldrich) solution.
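The densitometric normalization described in this section can be sketched as a ratio of two peak areas per lane, summarized per population as a mean and standard error. The code below is a minimal illustration only: the function names are introduced here, and the peak areas are hypothetical placeholders, not measurements from the study.

```python
from math import sqrt
from statistics import mean, stdev


def normalized_expression(target_area: float, reference_area: float) -> float:
    """Ratio of the band of interest to the constant reference band
    in the same lane."""
    if reference_area <= 0:
        raise ValueError("reference band area must be positive")
    return target_area / reference_area


def mean_and_standard_error(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical (target, reference) peak areas for three lanes; not real data.
lanes = [(5000.0, 10000.0), (6000.0, 10000.0), (7000.0, 10000.0)]
ratios = [normalized_expression(t, r) for t, r in lanes]
avg, sem = mean_and_standard_error(ratios)
print(f"normalized expression = {avg:.2f} ± {sem:.3f} (mean ± SE)")
```

Dividing by a band that is constant across lanes is intended to cancel lane-to-lane differences in loading and staining, so that ratios from different gels can be compared.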

Plasmid structural stability
The prepared constructs for both genes were also analyzed for structural stability. Plasmid DNA was isolated using the Plasmid DNA Extraction Mini Kit (Favorgen Biotech Corp.) from clones containing single, double and triple copy constructs after overnight growth following induction. The isolated plasmid DNA was subjected to restriction digestion analysis with BamHI and BglII.

Statistical analysis
The peak areas obtained by densitometric analysis were analyzed statistically for each population. The mean expression levels of the five selected clones in each population (single, double and triple copy constructs) were compared by ANOVA to determine whether the difference in protein expression among the populations was significant. A one-sample Kolmogorov-Smirnov test and a test of homogeneity of variances were performed to check whether the data fulfilled the assumptions of ANOVA.
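For readers unfamiliar with the test, the F statistic of a one-way ANOVA can be computed from between-group and within-group sums of squares. The sketch below uses only the Python standard library and hypothetical numbers; it is not the SPSS procedure used in the study and does not compute the p value, which requires the F distribution.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical values for three groups; not data from the study.
example = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova(example)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

With three populations of five clones each, as described above, the degrees of freedom would be 2 and 12. If SciPy is available, `scipy.stats.f_oneway` returns both the F statistic and the p value, and `scipy.stats.levene` is one common test of homogeneity of variances.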

RESULTS

Multicopy expression vectors and molecular confirmation of gene multimerization
The hegf gene was successfully amplified by PCR and ligated into pET28a(+) to prepare the single copy construct, i.e., pET28-EGF-C1. An expression cassette of ~340 bp, containing the T7 promoter, the 6X His-tag and hegf with its stop codon at the 3′ end, was released by digestion with BamHI and BglII and used to prepare the double copy (pET28-EGF-C2) and triple copy (pET28-EGF-C3) constructs. Each expression vector was analyzed by restriction digestion with BamHI and BglII. Release of inserts of the expected sizes, i.e., ~700 bp and ~1000 bp (two and three tandem copies of the ~340 bp cassette, respectively), demonstrates the successful preparation of expression vectors containing double and triple copy constructs with the expression cassettes in the desired orientation, i.e., head to tail, with the promoter at the 5′ end and the terminator at the 3′ end (see supplementary material for the results of restriction digestion analysis). These results indicated successful construction of all three recombinant constructs required to study the effect of gene multimerization in E. coli.

Protein expression studies
The single, double and triple copy constructs for hegf were transformed into and expressed in E. coli strain Rosetta-gami™ 2 (DE3). Five clones for each construct were selected for protein expression analysis by SDS-PAGE.
To maintain similar cell densities for comparative expression studies, cells were induced when their OD600 rose just above 0.6 and were grown for a post-induction period of 4 h. The total cell lysate was analyzed on a 15% Tris-glycine polyacrylamide gel (Fig. 1). The uninduced sample for each construct was used as a control. The protein band of the appropriate size, i.e., ~7.3 kDa, which was absent or showed only leaky expression in the uninduced sample, was considered to be the protein of interest. Band intensities were quantified by densitometric analysis using ImageJ software. Normalized protein expression was defined as the ratio of the intensity of the hEGF band to that of a protein band above 20 kDa that showed uniform intensity in all gels and in all clones for each construct. Clones carrying the double copy construct, i.e., pET28-EGF-C2, showed the highest overall expression among the three populations (Fig. 2). The average protein expression in clones carrying pET28-EGF-C2 and pET28-EGF-C3 was almost 4- and 3-fold higher, respectively, than in clones carrying the single copy construct, i.e., pET28-EGF-C1 (Table I).
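The fold changes quoted here follow directly from the mean normalized expression levels given in the caption of Fig. 2 (0.61, 2.36 and 1.94). A short sketch of that arithmetic, with variable names chosen for illustration:

```python
# Mean normalized expression levels as reported in the caption of Fig. 2.
mean_expression = {
    "pET28-EGF-C1": 0.61,
    "pET28-EGF-C2": 2.36,
    "pET28-EGF-C3": 1.94,
}

# Fold change of each construct relative to the single copy construct.
baseline = mean_expression["pET28-EGF-C1"]
fold_change = {name: level / baseline for name, level in mean_expression.items()}

for name, fold in fold_change.items():
    print(f"{name}: {fold:.2f}-fold relative to pET28-EGF-C1")
```

The ratios are roughly 3.87 and 3.18; the reported values of 3.8 and 3.1 appear to correspond to these ratios truncated to one decimal place.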

Expression confirmation by western blotting
The clones showing the highest protein expression in each population were selected for confirmation of hEGF expression by Western blotting. Each expression cassette in clones carrying pET28-EGF-C1, pET28-EGF-C2 and pET28-EGF-C3 has a His-tag sequence between the T7 promoter and the hegf gene sequence. Therefore, protein expression was validated using mouse anti-histidine antibodies, followed by detection with AP-conjugated goat anti-mouse antibodies. Reddish brown bands appeared, demonstrating the successful expression of human epidermal growth factor in clones carrying single, double and triple copy constructs (Fig. 3). The Western blotting results further confirm that all three types of clones, carrying single and multimerized genes, actively express recombinant hEGF.
One-way analysis of variance (ANOVA) was performed using SPSS (Statistical Package for the Social Sciences) to determine whether the variation in hEGF expression level across the three populations, i.e., clones carrying pET28-EGF-C1, pET28-EGF-C2 and pET28-EGF-C3, was significant. Before applying ANOVA, a one-sample Kolmogorov-Smirnov test was used to confirm that the data followed a normal distribution. Moreover, the test of homogeneity of variances, at a significance level of 0.05, confirmed that the variances of the populations did not differ significantly. For ANOVA, the significance level was set at 0.01. The significance value obtained by ANOVA was 0.000 (i.e., p < 0.001), below the selected level, indicating that the variation in protein expression across the single, double and triple copy populations was significant (Table II).
Fig. 2. Relative expression level of human epidermal growth factor in three populations of the Rosetta-gami™ 2 (DE3) strain carrying single, double and triple copies of the expression cassette. The mean expression level in clones containing pET28-EGF-C1 was 0.61 ± 0.04. The mean expression level in clones containing pET28-EGF-C2 was 2.36 ± 0.21, which is 3.8-fold higher than in their counterparts with a single expression cassette. Clones containing the expression vector with three expression cassettes, i.e., pET28-EGF-C3, showed a mean expression level of 1.94 ± 0.154, which is 3.1-fold higher than in their single copy counterparts.
The error bars indicate the standard error for each population.

Plasmid stability
Structural stability of the prepared constructs, i.e., pET28-EGF-C1, pET28-EGF-C2 and pET28-EGF-C3, was important to verify. Clones of E. coli strain Rosetta-gami™ 2 (DE3) carrying single, double and triple copy constructs were taken through the whole culturing process and a post-induction period of 16 h. The plasmid DNA extracted from the clones was digested with BamHI and BglII and analyzed on a 2% agarose gel. Detection of inserts of ~340 bp, ~700 bp and ~1000 bp from pET28-EGF-C1, pET28-EGF-C2 and pET28-EGF-C3, respectively (supplementary material), confirmed that the prepared constructs were structurally stable during the expression studies.
Fig. 4. Maps of the single, double and triple copy constructs prepared for the expression of human epidermal growth factor by multimerization of the hegf gene. The expression cassette of ~340 bp was released by digestion with BglII and BamHI from the single copy construct (Figure 1A) and inserted back into its BamHI-linearized form to prepare the double copy construct, i.e., pET28-EGF-C2, as shown in Figure 1B. The same expression cassette was inserted into the backbone of the double copy construct, linearized with BamHI, to prepare the triple copy construct, i.e., pET28-EGF-C3, as shown in Figure 1C. Repeating these sequential steps in the same pattern can generate constructs with multiple copies of the gene. Sequence maps were generated using SnapGene software version for Windows.

DISCUSSION
The bacterial expression system has been used for the production of several recombinant proteins. Two major approaches can be used to enhance protein production. The first comprises methodologies that optimize culture parameters such as pH, temperature, aeration, media formulation, inducer concentration and choice of strain (Jaliani et al., 2014; Ranjbari et al., 2015; Sivashanmugam et al., 2009). The second comprises methodologies that act at the molecular level. Molecular techniques manipulate genetic elements that may increase protein expression at either the transcriptional or the translational level (Makino et al., 2011). Synthetic biology enables the creation and use of strong promoters such as T7 and trc, which allow high expression levels through increased transcriptional rate (Tegel et al., 2011). In addition, optimization of the translation initiation region and codon optimization have been used to overcome translational difficulties and codon bias for enhanced protein production in a particular host (Tegel et al., 2011). Gene dosage is also a crucial factor for protein expression level and has been studied in yeast as well as in bacterial expression systems. Multiple copies of a gene can be inserted into the yeast genome to study the effect of gene dosage on protein expression (Liu et al., 2020). In the bacterial expression system, however, gene dosage relies directly on the copy number of the expression vector. Usually, the plasmid copy number is increased to raise gene dosage in bacteria, which imposes an additional burden on bacterial growth and physiology. The current study provides an alternative strategy for increasing gene dosage without altering the copy number of the expression vector (Kittleson et al., 2011).
In this study, we prepared an expression cassette containing the T7 promoter, a 6X His-tag and the gene for human epidermal growth factor in the backbone of the pET28a(+) expression vector. The expression cassette was successfully multimerized to prepare expression vectors with two and three copies of the cassette (see Fig. 1). The prepared constructs were transformed into E. coli strain Rosetta-gami 2 (DE3), and the expression level of human epidermal growth factor was checked by SDS-PAGE for all three types of bacterial populations carrying single, double and triple copy constructs, as shown in Figure 2. The intensities of the protein bands were analyzed using ImageJ, and the normalized protein expression of clones with double and triple copies of the expression cassette was compared with that of clones with a single copy. One-way ANOVA demonstrated a significant difference in the expression level of human epidermal growth factor across the three populations (see Table II). As shown in graph 01, the expression level increased up to 3.8- and 3.1-fold in clones with double and triple copies of the expression cassette, respectively. The lower expression in triple copy clones than in double copy clones may reflect saturation of the host metabolic machinery by the three expression cassettes. The current study is an initial effort to explore the effect of gene multimerization in the bacterial expression system, which can be a useful strategy for enhancing the production of other recombinant proteins. However, a more systematic study at the genomic and transcriptomic levels could be carried out to determine how far the relationship between gene dosage and protein expression level is linear. Such a study could increase the potential of the strategy for enhanced protein production, not only at lab scale but also in facilitating industrial processes economically.

CONCLUSION
In conclusion, we successfully used the gene multimerization strategy to prepare structurally stable expression constructs containing single, double and triple copies of the expression cassette and studied the effect of gene dosage on protein expression without altering the plasmid copy number. We consider it a strategy worth considering for enhanced protein expression in the bacterial expression system.