Identification of Glycolysis-Related Gene Signature and Prediction on Prognosis of Colon Cancer

.

Identification of core biomarkers for cancer prevention and treatment is crucial. Tumor glycolysis is an important metabolic phenotype and many natural compounds from medicinal plants exhibit antiglycation effects. Therefore, it is of potential significant to find biomarkers based on glycolysis-related genes in predicting the prognosis of colon cancer (CC), a common invasive gastrointestinal tumor. In this study, clinical and gene data were collected to identify glycolytic genes that significantly associated with overall survival (OS) rate of CC patients through gene set enrichment analysis (GSEA) and Cox regression models. The K-M method was used to determine the difference of OS rate with high and low risk scores. The accuracy of risk scores were determined based on receiver operating characteristic (ROC) curve. Hierarchical analysis and Cox regression were performed to assess the correlation between CC risk score and clinical symptoms. Moreover, qPCR was used to verify the expression level of prognostic genes in human colorectal cancer cell line (HCT116) and human colorectal epithelial cells (FHC cells). The results showed 202 glycolytic genes were found with statistical differences, and 4 glycolytic genes were identified, including Enolase 3 (ENO3), glypican-1 (GPC1), Nucleolar Protein 3 (NOL3) and Stanniocalcin 2 (STC2). A prognostic risk score model was established, and the OS rate of high-risk patients was found significantly reduced (p <0.001) compared to low-risk patients. The 5-year OS ROC area under curve (AUC) of the model was 0.75. qPCR results confirmed that glycolysis-related genes ENO3, GPC1, NOL3 and STC2 were significantly upregulated in HCT116 cells compared to FHC cells.
In conclusion, our study identified four glycolytic genes that significant impact prognosis of CC, and established a prognostic risk prediction model, which can provide reference for prognosis assessment and efficient treatment in CC.

INTRODUCTION
I dentifying core biomarkers is crucial for cancer prevention and treatment, including colon cancer (CC), a common invasive gastrointestinal tumor with increasing incidence worldwide in recent years (Huang et al., 2019;Lord and Hall, 2019). Although treatment regimens such as targeted therapy have greatly improved the clinical outcomes, most CC patients still have poor prognosis due to recurrence and metastasis (Gunter and Leitzmann,

O n l i n e F i r s t A r t i c l e
2006). Therefore, it is necessary to find biomarkers to predict the prognostic risk of distant metastasis in CC treatment for timely intervention and treatment.
In normal cells, cells can use a variety of metabolic pathways associated with biosynthesis in order to keep glucose in a relatively balanced state at all times. However, recent evidence has shown that the metabolism of cancer cells is very differently from that of the normal cells from which they originate. One of the characteristics of glucose metabolism in tumor cells is that they still choose to use glycolysis when the oxygen content is normal, which is called Warburg effect. Tumor cells mainly obtain energy through glycolysis, many natural compounds isolated from medicinal plants have shown promising anti-glycation effects. Glucose metabolism in cancer cells exhibits the feature of high aerobic glycolysis (Taieb et al., 2019;Tao et al., 2018). Enhanced glycolysis and glutamine metabolic reprogramming are the main signs of cancer, leading to a variety of metabolic reprogramming and hypoxia-induced resistance to chemotherapy drugs, which made the cancer cells to maintain a high proliferation rate and resistance to certain cell death signals. This kind of phenomenon has been found to be the new therapeutic targets and the focus of new cancer drugs.
According to the relevant researches, glycolysis is very important in proteus biosynthesis, hence confering more advantages on cancer cells in the face of reduced nutrient supply (Abbaszadeh et al., 2020;Feinberg et al., 2017). The oncogenic regulation of glycolysis indicates the function of tumor glycolysis, and it is necessary to elucidate the underlying mechanism of glycolysis in CC (Jiang, 2017;Chen and Russo, 2012). Recent studies have shown that the clinicopathological characteristics such as age, metastasis and stage are not obvious and can not accurately predict the prognosis of cancer. Therefore, more and more mRNAs have been identified to determine the prognosis of CC (Bearne, 2017). However, these biomarkers are not sufficient to predict patient outcomes accurately and independently, especially the individual gene levels which may be influenced by various factors, thus restricting their application as effective independent prognostic indicators (Ji et al., 2016).
In this study, we aimed to identify glycolysis-related genes in CC through gene set enrichment analysis (GSEA). Subsequently, we extracted meaningful mRNAs from CC patient samples and used whole genome expression data from the Cancer Genome Atlas (TCGA) to build a glycolysis-related prognosis model. Combined with multivariate data analysis, receiver operating characteristic (ROC) curve and independent prognosis analysis, the gene and protein expression levels in CC samples were explored. Additionally, the expression of related genes in the human colorectal cancer cell line HCT116 was further verified by qPCR. The overall experimental design was shown in Supplementary Figure S1. These analyses are expected to predict prognosis of CC, which provides a new target for the diagnosis of CC.

Data collection
The CC patients' key features in TCGA (The Cancer Genome Atlas) data were downloaded through the data portal (https://portal.gdc.cancer.gov/). The clinical data including: survival, survival time, age, gender and cancer stages. Total 470 CC patients and 41 normal control samples in TCGA dataset were collected, and Perl code was used to preprocess the above raw data, including: transcriptome data processing, ID conversion, clinical information extraction, etc. (Maiers et al., 2019).

Functional enrichment analysis
Gene Set Enrichment Analysis (GSEA) was used for enriching and analyzing the selected gene set, to find potential significant difference genes. Normalized enrichment score (NES) ≥ 1.0 and nominal p ≤ 0.05 were set for the criteria of identifying significantly enriched gene sets in GSEA. Gene expression differences between CC patients and healthy samples were judged. The signal pathways related to glycolysis were further determined and glycolytic genes were identified. The relevant signal pathways were as follows: Glycolysis_by_fructose_2_6_ bisphosphate_metabolism, Biocarta_glycolysis_pathway, Go_glycolytic_process, Hallmark_glycolysis, Kegg_glyc-olysis_gluconeogenesis, And Reactome_glycolysis.
The KEGG database was used for pathway enrichment analysis of these DEGs in an effort to gain high-level insights into their potential functional relevance. GO enrichment analyses were conducted to assess the enrichment of these DEGs in specific biological processes, molecular functions, and cell components (BPs, MFs, and CCs, respectively) (Liu et al., 2022).

Glycolysis prognostic analysis and construction of risk scores model
The prognosis model of glycolysis was constructed by univariate Cox regression and multivariate Cox regression analyses. Cox algorithm was applied to calculate the relation of mRNA level and OS of tumor patients, and to determine the factors associate with glycolysis prognosis. For univariate Cox regression, hazard ratio (HR) >1 and Cox PV <0.05 were used as the screening criteria, multivariate Cox analysis was applied to assess results of univariate Cox analysis. The

O n l i n e F i r s t A r t i c l e
final results were used as the related factors of glycolysis prognosis and these factors were also the weight of glycolysis prognosis model. The above analysis was conducted by Rstudio based on the R package. The screening of prognostic factors were divided into two types: risk factor (HR >1) and protective factor (0 < HR < 1). As there was no protective prognostic factors screened, so it was ignored and only risk factor was focused. According to the Cox analysis, the formula of prognostic risk score was calculated as the sum of each risk coefficient (coefi) value multiply by expression of corresponding prognostic factor (Chen et al., 2020).
HCT116 cells were cultured in RPMI-1640 medium supplemented with 1% penicillin-streptomycin and 10% fetal bovine serum, while FHC cells were grown in DMEM medium with the same conditions. Both cell lines were incubated at 37°C in a 5% CO 2 humidified incubator (Sanyo, Japan), and were passaged using 0.05% trypsin when they reached 80% confluency.

Validation of prognostic genes by qPCR assay
Total RNA was extracted using the RNAeasy animal RNA isolation kit with rotating columns according to the instructions. The extracted RNA was reversetranscribed to cDNA using the RevertAid First Strand cDNA Synthesis Kit following standard protocols. The qPCR assay was performed on a Roche LightCycler 480 (Roche Diagnostics, Vilvoorde, Belgium). The protocol was as follows: initial incubation at 95°C for 10 min, followed by 50 cycles of 95°C for 15 s, annealing for 30 s, and elongation at 72°C for 30 s; and finally cooling at 40°C for 30 s. The annealing temperature for ENO3, GPC1, NOL3, STC2 and β-actin was 61, 55, 56, 60 and 55°C, respectively. Three replicates of each sample were analyzed, and the relative expression of mRNA was quantified after normalization to β-actin. The primers are listed in Table I.

Statistical analysis
The relation of the survival data of every variable were tested using K-M method, while that of clinical data and risk scores were measured using Cox analysis method (Zhang et al., 2021). The significant expression of glycolysis prognostic factors in normal and tumor samples was detected using the student t-test, with a statistically different threshold set at 0.05. All the analysis was conducted using R-studio software (Version 3.6.2, Boston, MA, USA).

Enrichment analysis of glycolysis and differential genes
The result revealed 3 significant functional groups of glycolysis genes, namely Glycolysis_by_fructose_26_ bisphosphate_metabolism, Hallmark_glycolysis, And Reactome_glycolysis ( Supplementary Fig. S2). Compared with normal samples, Glycolysis_by_fructose_2_6_ bisphosphate_metabolism in tumor samples was significantly correlated (p <0.05), while the other two showed a stronger significant correlation (p <0.01). Perl code was applied to extract the expression levels of glycolysis gene that were significantly related in the functional group, followed by the application of the difference analysis filter.
Compared with the normal samples, a total of 202 significantly differentially expressed genes were found (p <0.05), of which 54 were downregulated and 148 were upregulated in cancerous and paracancerous tissues. To confirm whether these differential genes were mainly involved in glycolysis, we conducted GO and KEGG analysis using Metascape. The result indicated that (Fig.  1), the most important biological process was associated with glycolysis and the most important metabolic process was related to glycometabolism.

Establishment of glycolysis prognosis model
The expression of the differential genes was analyzed in conjunction with the clinical survival data of tumor samples using R language. Single factor Cox regression analysis was performed on the differential gene expression of tumor samples, and the findings were shown in Table  II. A total of 7 genes were screened as potential glycolysis prognostic genes. Subsequently, using the Cox regression analysis, four potential glycolysis prognostic genes were screened, including Enolase 3 (ENO3), glypican-1 (GPC1), Nucleolar Protein 3 (NOL3) and Stanniocalcin 2 (STC2). Using the above 4 screened genes, a glycolysis prognosis model was built. Notably, the criterion for accessing various risk was based on the mean value. Consequently, patients with values greater than median were regarded as high-risk, whereas that with values below median were regarded as low-risk.

Analysis of glycolysis prognosis model
After the establishment of the glycolysis prognosis model, patients were divided into two risk groups, and their survival curves were analyzed. The survival rate of the two groups showed a downward trend over time, but the curves were significantly separated (p <0.01), with eight-year survival rates of 30.1% and 55% (Fig. 2A). The ROC curve analysis revealed an area under curve (AUC) value of 0.75. The accuracy of prediction by the model was higher than 0.7, indicating that the diagnosis of glycolysis prognostic model had practical significance (Fig. 2B). A: risk score. B: the survival number of low risk score patients was significantly more than that of high risk score patients. C: heat map analysis. ENO3, GPC1, NOL3, STC2 genes expression in patients with high risk scores were significantly more than those with low risk scores.

O n l i n e F i r s t A r t i c l e
The risk curve combinatorial graphs demonstrated that patients from left to right with low-risk scores were denoted with green and remained relatively stable over time, while those with high-risk scores were denoted with red and exhibited an increasing risk score over time, the overall risk score from left to right showing a gradually increasing (Fig. 3A). In Figure 3B, from left to right, the number of deaths increased and the maximum survival time decreased. Moreover, it can be seen that the genes level associated with prognosis of glycolysis from left to right was significantly deepened (Fig. 3C). To further analyze the mutations of the four selected genes for glycolysis prognosis of CC, clinical samples were analyzed through cBioPortal online database. Of the 105 patients studied, 1.9% had NOL3 mutation, 1% had STC2 mutation, 4% had GPC1 mutation and 12% had ENO3 mutation. Among them, 1 case of gene amplification (NOL3), 1 case of missense mutation; 4 cases of missense mutation (GPC1), 10 cases of deep deletion (ENO3), 1 case of truncation mutation, and 1 case of missense mutation (Fig. 4A). We further analyzed the expression of ENO3,

O n l i n e F i r s t A r t i c l e
Q. Ran et al. GPC1,NOL3,and STC2 in tumor and normal samples for further verification. It was found that the expression level of the four genes were significantly up-regulated in tumor samples compared with normal samples (p <0.001, Fig. 4B).

Independent prognostic analysis
To assess whether these mRNA markers on prognostic ability were independent of each other, various indicators including age, gender and tumor stage were established. The whole data set of prognostic was analyzed by single and multiple factors. The results showed that (Fig. 5A), in the single factor analysis, the p-value of age, tumor stage and risk score was less than 0.01, indicating that these three factors had a significant correlation with the survival status of patients. Moreover, the HR value was greater than 1, suggesting that these three factors were high-risk factors. That is, the order the patient, the higher the risk at tumor stage type, and the higher risk score, leading to the greater risk the patient is. The above results were further validated, confirming that age, tumor stage type and risk score can be independent of other clinical characteristics as independent prognostic factors (Fig. 5A). Similarly, results were observed in the constructed clinical survival curve (Fig. 5B), the survival rate of patients over 65 years old was significantly different from that of patients aged under 65 years (p = 0.01). Moreover, compared to stage I, II in the tumor stage, CC stage III, IV were obviously different (p = 0.01). However, there was no significant difference observed in that of CC between the sexes.

Validation of clinical grouping model
To further verify whether the model was applicable to different populations, clinical grouping verification was carried out. As depicted in Figure 6A, there was a significant difference in the high-risk and low-risk groups O n l i n e

F i r s t A r t i c l e
Glycolysis-Related Gene Signature for Prognosis Prediction of Colon Cancer 7 for both age groups > 65 years (p = 0.001) and ≤ 65 years (p = 0.012). With respect to gender groups (Fig. 6B), a significant difference between the high-risk and low-risk groups was observed among female patients (p = 0.044); the difference was more significant compared to male patients (p = 0.003). For the stage groups (Fig. 6C), there was no significant difference observed between the highrisk and low-risk groups in Stage I and II (p = 0.597), while a significant difference was observed in stage III and IV (p = 0.002). For T groups (Fig. 6D), there was no significant difference between the high-risk and low-risk groups in T1-2 group (p = 0.069); however, a significant difference was observed in T3-4 group (p = 0.002). These results suggested that the glycolysis prognosis factors in CC may not be well predicted in early diagnosis, and the glycolysis prognosis model may exhibit weaknesses in the early stage of CC.
Glycolysis-related genes ENO3, GPC1,NOL3,STC2 in HCT116 and FHC cells To identify whether these glycolysis-related genes ENO3, GPC1, NOL3 and STC2 were responsive in colon cancer, we studied the mRNA expression levels of these genes in FHC cells and HCT116 cells in vitro by qPCR. As shown in Figure 7, the mRNA expression levels of ENO3, GPC1, NOL3 and STC2 in HCT116 cells were significantly upregulated (P <0.01) compared to those in FHC cells.

DISCUSSION
CC is a common digestive system disease. Although the mortality of CC patients remains largely unchanged, the mortality of elderly patients (≥65 years old) is increasing year by year (Abdel-Wahab et al., 2019;Guillaumond et al., 2014). The occurrence of diseases is usually accompanied with the abnormal expression of specific signal molecules (Song and Miao, 2022). Metabolism of tumor cells has a special way called the Warburg effect. Although under aerobic conditions, energy is still rapidly produced mainly through glycolysis, which is therefore a crucial energy source for tumor cells. Thus, glycolysis-related genes have significant associations with the prognosis of CC patients (Gendoo, 2020;Ibrahim et al., 2018). Consequently, we developed a glycolytic-related genes model based on the characteristics of glycolytic metabolism of tumors to predict the prognosis of CC patients.
High-throughput technology has facilitated largescale biological data research, with abundant genomic data have been extracted to identify new diagnostic biomarkers (Pan et al., 2021;Chaneton and Gottlieb, 2012;Li et al., 2014). Recent studies have constructed novel prognostic markers of gene expression levels or mutations based on RNA sequencing data, and identified them using Cox proportional hazards regression models (Ren et al., 2020;Lai et al., 2017). In this study, Cox regression analysis was carried out to evaluate the application value of four glycolytic gene combinations in predicting the prognosis of CC patients, and a correlation regression analysis model was established. The CC dataset in TCGA was used, and the tissue-related data were compared and analyzed. K-M analysis showed that there was a negative correlation between risk parameters and prognosis. This study used a statistical model composed of multiple prognostic markers, combined with the predicted effect of each component glycolytic gene to improve predictive accuracy, which is widely used because they are more accurate than single biomarkers in this field. However, a limitation of the study was the lack of data about cancer cell metastasis and recurrence in TCGA, therefore we only use OS rate to evaluate the patients with CC.
Using a bioinformatics approach, 4 genes that associated with cellular aerobic glycolysis (ENO3, GPC1, NOL3, STC2) were identified, and their prognostic value in CC were demonstrated. ENO3 is mainly involved in glycolytic and glucose metabolism, and changes in ENO3 gene expression can be observed in various tumor cells. Pyruvate kinase is a kind of crucial enzyme in glycolysis pathway, regulated by the enealcoholize enzyme, which contributing in converting enol phosphate type ADP into ATP, pyruvic acid and pyruvate (Munir et al., 2020;Amin et al., 2019;Guillaumond et al., 2014). GPC1 is a member of the GPC family of glucosylphosphatidylinositolanchored membrane-related heparan sulfate proteoglycans (HSPG), which includes six subtypes that mainly regulate 2020). However, limited research data has been conducted on the association between GPC1 and other cancers, so the study of GPC1 may need further exploration and analysis for CC. Nucleolus protein plays an important role in regulating tumor nucleolus proteins, improving the synthesis level of chromosomal structural proteins and promoting the early stage of tumor cells by accelerating the spindle division rate of cancer cells. However, although NOL3 is highly expressed in many non-terminally differentiated tissue tumor cell lines, including liver cell lines (Huang et al., 2018), the association between NOL3 and tumor development remains unclear and needs to be further explored. STC is a glycoprotein hormone that is mostly involved in the physiological functions of the body through spontaneous secretion or paracrine. Changes in the concentration of STC2 can significantly promote changes in cancer cells in vivo. STC2 can affect the function of endoplasmic reticulum of cancer cells, and promote the damage of mitochondria, leading to abnormal regulation of tumor cell cycle with significantly increased changes in cell cycle (Yang et al., 2020;Bironaite et al., 2013;Ciriello et al., 2012). Therefore, further studies on the genes and their prediction models are necessary to constantly identify new biomarkers and establish related prediction models. This holds great promise for the diagnosis, treatment and prognosis of cancer.

CONCLUSION
In this study, four glycolysis-related genes as biomarker in CC were identified, and a novel predictive risk score model was established based on their expression levels to predict the prognosis of CC patients. Moreover, their mRNA expressions were detected to further confirm their connection in CC. The identification of these glycolytic gene biomarkers for CC prognosis provides a new potential approach to improve the prediction and treatment of CC. Notes: 41 normal and 470 tumor samples with colon cancer based on GSEA; NES: Normalized enrichment score according to requirements was greater than or equal to 1.0; Nominal p-value was less than or equal to 0.05; FDR: false discovery rate according to requirements was less than or equal to 0.25.