Structural Analysis and Phylogenetic Relationships of a Teleost Fish, Pethia stoliczkana Based on the Complete Mitochondrial Genome Sequence

In this study, the whole mitochondrial genome sequence of Pethia stoliczkana was obtained using high-throughput sequencing technology, and its structure and characteristics were analyzed. The P. stoliczkana mitochondrial genome contained a total of 16,966 base pairs, including 13 protein-coding genes, 22 transfer RNA genes, two ribosomal RNA genes, and one control region. The A+T content (59.7%) of the whole mitochondrial genome was greater than the G+C content (40.3%), indicating an obvious A+T preference. The mitochondrial genome of P. stoliczkana is similar to that of most teleost fish, and no gene rearrangements were detected. A phylogenetic tree of Smiliogastrinae fish was constructed based on the 13 protein-coding genes using the Bayesian inference and maximum likelihood methods. We found that P. stoliczkana was closely related to Pethia ticto and Pethia padamya. These results enrich the mitochondrial genome database of Smiliogastrinae fish and provide reference materials for the systematic classification of this group of fish.


Pethia stoliczkana belongs to the order Cypriniformes, family Cyprinidae, and subfamily Smiliogastrinae. This tropical benthic freshwater fish is mainly distributed in Laos, Thailand, Myanmar, and India (Nath et al., 2022). The main morphological characteristics of P. stoliczkana are as follows: flank behind the gill opening with a vertically elongated black blotch; caudal peduncle with a vertically elongated black blotch; dorsal fin of sexually active males red with a black margin and two rows of black spots; no barbels; and last simple dorsal ray serrated posteriorly. This fish has important economic and ornamental value (Atkore et al., 2015; Nath et al., 2022).
Mitochondrial DNA (mtDNA) is the only genetic material outside of the cell nucleus in animals that can replicate and transcribe independently. In contrast to nuclear DNA, mtDNA is maternally inherited, has a simple molecular structure, evolves rapidly, and lacks tissue specificity. mtDNA is a powerful tool for studying the origin and phylogeny of species, genetic differentiation between related species and intraspecific populations, species identification, and genetic diversity (Funk and Omland, 2003; Wolstenholme, 1992). Fish mtDNA is useful for studying evolutionary genetics; in particular, the complete mitochondrial genome sequence contains more information than a single gene and more comprehensively reflects the genetic characteristics of species and phylogenetic relationships at different taxonomic levels (Avise et al., 1987). In the past ten years, the mitochondrial genomes of fish have been widely studied using high-throughput sequencing technology, leading to an increase in reports of complete mitochondrial genome sequences of fish.
In this study, we determined the full mitochondrial genome sequence of P. stoliczkana using high-throughput sequencing technology and analyzed its gene composition and structural characteristics. Combined with mitochondrial sequence information of related species, the phylogenetic relationships of Smiliogastrinae fish were determined using the protein-coding gene sequences. These results fill a knowledge gap in the molecular biology of P. stoliczkana and complement and improve the limited mitochondrial genome data on Smiliogastrinae fish. This sequence provides molecular evidence and a theoretical reference for the classification and identification, germplasm resource evaluation and development, and utilization of this group of fish.

MATERIALS AND METHODS
Experimental materials, DNA extraction, and species identification
Samples were purchased from a flower, bird, fish, and insect market in Mudanjiang, China in June 2022 and preliminarily identified on site based on their morphological characteristics. Genomic DNA was extracted from the fish fins using a noninvasive extraction method. The quality and concentration of the extracted DNA were determined using 1% agarose gel electrophoresis and a NanoDrop 2000 nucleic acid analyzer (Thermo Fisher Scientific, Waltham, MA, USA), respectively. DNA barcoding was performed to further identify the species.

Sequencing
DNA samples were sent to Wuhan Beina Biotechnology Co., Ltd. for construction of a 350-bp small-fragment sequencing library and high-throughput sequencing. Using sequencing-by-synthesis technology on an Illumina HiSeq X sequencing platform (San Diego, CA, USA), the constructed library was sequenced with 150-bp paired-end reads, and the raw sequencing data were filtered using NGS QC Toolkit 2.3.3 (Patel and Jain, 2012) to remove adapter sequences, low-quality ends, reads with an N content >10%, and fragments shorter than 25 bp.
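The two simplest filtering criteria above (N content and minimum length) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the NGS QC Toolkit implementation: the function name `passes_filter` and the toy reads are hypothetical, and adapter removal and quality trimming are omitted.

```python
def passes_filter(seq: str, max_n_fraction: float = 0.10, min_length: int = 25) -> bool:
    """Return True if a read survives the length and ambiguity filters.

    Hypothetical helper: thresholds mirror the criteria in the text
    (discard reads with N > 10% or shorter than 25 bp).
    """
    if len(seq) < min_length:
        return False
    n_fraction = seq.upper().count("N") / len(seq)
    return n_fraction <= max_n_fraction


# Toy reads (made up for illustration only)
reads = [
    "ACGT" * 10,   # 40 bp, no N          -> kept
    "ACGTN" * 8,   # 40 bp, 20% N         -> discarded
    "ACGT" * 5,    # 20 bp, too short     -> discarded
]
kept = [r for r in reads if passes_filter(r)]
```

In a real pipeline the same predicate would be applied to both mates of a read pair, so that pairing is preserved after filtering.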

Assembly, annotation, and feature analysis
SPAdes v3.11.1 (http://cab.spbu.ru/software/spades/) (Bankevich et al., 2012) was used to assemble the clean reads into contigs. SSPACE (Boetzer et al., 2011) was used to extend the contigs and obtain the final complete mitochondrial genome sequence. MITOS (Bernt et al., 2013) was used to annotate the mitochondrial genome sequence. The results were verified by homology comparison with the mitochondrial genes of known Smiliogastrinae species. tRNAscan-SE software (http://lowelab.ucsc.edu/tRNAscan-SE/) (Lowe and Chan, 2016) was used to identify the tRNA genes. MEGA 11 (Tamura et al., 2021) was used to calculate the base composition, codon usage frequency, AT-skew, and GC-skew of each coding gene in the mitochondrial genome of P. stoliczkana.
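For readers unfamiliar with the skew statistics, the sketch below shows how base composition, AT-skew, and GC-skew are commonly computed. It assumes the conventional definitions AT-skew = (A − T)/(A + T) and GC-skew = (G − C)/(G + C); the function name `base_stats` and the toy sequence are hypothetical, and this is not the MEGA implementation.

```python
from collections import Counter


def base_stats(seq: str) -> dict:
    """Compute A+T / G+C content and strand skews for a nucleotide sequence.

    Assumes the sequence contains at least one A or T and one G or C;
    ambiguous bases (e.g. N) are ignored.
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[b] for b in "ATGC")
    total = a + t + g + c
    return {
        "AT_content": (a + t) / total,
        "GC_content": (g + c) / total,
        "AT_skew": (a - t) / (a + t),   # > 0 means more A than T
        "GC_skew": (g - c) / (g + c),   # < 0 means more C than G
    }


# Toy sequence (made up): A=3, T=2, G=1, C=4
stats = base_stats("AAATTGCCCC")
```

A negative GC-skew on the sequenced strand corresponds to the kind of bias against G described for this genome.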

Phylogenetic analysis
To examine the phylogenetic position of P. stoliczkana in Smiliogastrinae, the nucleotide sequences of the 13 protein-coding genes (PCGs) in the mitochondrial genome were used for phylogenetic analysis. The mitochondrial genomes of 18 species of Smiliogastrinae were selected as reference sequences, and phylogenetic trees were constructed using the maximum likelihood (ML) and Bayesian inference (BI) methods, with Sinocyclocheilus bicornutus and Gymnocypris eckloni as the outgroups (Table I).
After multiple nucleotide sequence alignment using Clustal X 2.0 (Larkin et al., 2007), the results were filtered using Gblocks v0.91b (Castresana, 2000), and the alignment results for each gene were concatenated using SequenceMatrix v1.7 (Vaidya et al., 2011). Using SMS software (Lefort et al., 2017) and ModelFinder (Kalyaanamoorthy et al., 2017), the best-fit substitution model for the tree-building dataset was determined to be GTR+I+G. The ML phylogenetic tree was built with 50,000 bootstrap replicates using PhyML 3.0 (Guindon et al., 2010). MrBayes 3 (Ronquist and Huelsenbeck, 2003) was run for 20,000,000 generations, with trees sampled and saved every 100 generations; the first 25% of the samples were discarded as burn-in, and the BI phylogenetic tree was built from the remainder.
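The Bayesian sampling settings above imply the following sample counts. The short calculation below is only a worked check of that arithmetic, ignoring any initial state the software may also record; the variable names are illustrative and are not MrBayes parameters.

```python
generations = 20_000_000   # total MCMC generations (from the text)
sample_every = 100         # sampling interval (from the text)
burnin_fraction = 0.25     # fraction of samples discarded (from the text)

n_samples = generations // sample_every        # samples saved per run
n_burnin = int(n_samples * burnin_fraction)    # samples discarded as burn-in
n_retained = n_samples - n_burnin              # samples used to build the BI tree
```

With these settings, 200,000 samples are saved, 50,000 are discarded, and 150,000 remain to summarize the tree.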

RESULTS
Gene structure and composition
The mitochondrial genome of P. stoliczkana obtained using high-throughput sequencing was 16,993 bp in length (Fig. 1) and contained 22 tRNA genes (tRNAs), 13 PCGs, two ribosomal RNA genes (rRNAs), and one control region. Eight tRNAs and the ND6 gene were encoded on the light strand (L strand), and the remaining 28 genes were encoded on the heavy strand (H strand) (Table II). There were six gene overlaps and 13 intergenic gaps in the whole mitochondrial genome of P. stoliczkana (Fig. 1, Table II). The total length of the intergenic gaps was 69 bp, with a maximum gap of 31 bp between tRNA-Asn and tRNA-Cys. The total length of the gene overlaps was 21 bp. The largest overlaps, of 7 bp each, were observed between ATP8 and ATP6 and between ND4L and ND4. The A+T content (59.7%) was higher than the G+C content (40.3%) in the mitochondrial genome of P. stoliczkana, revealing a preference for A+T and a bias against G. These results are consistent with the preference for A+T bases in vertebrates (Sun et al., 2020, 2022, 2023).
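Gap and overlap lengths such as those reported above are derived from the annotated gene coordinates. The following sketch shows one way to do this for 1-based inclusive coordinates; the function name and the toy annotation are hypothetical (they are not the coordinates in Table II), and the wrap-around between the last and first gene of the circular genome is not handled.

```python
def spacers_and_overlaps(genes):
    """Classify the junction between each pair of adjacent genes.

    genes: list of (name, start, end) tuples with 1-based inclusive
    coordinates, sorted by start position.
    Returns (gaps, overlaps), each a list of (gene1, gene2, length_bp).
    """
    gaps, overlaps = [], []
    for (name1, _, end1), (name2, start2, _) in zip(genes, genes[1:]):
        distance = start2 - end1 - 1   # 0 means the genes are directly adjacent
        if distance > 0:
            gaps.append((name1, name2, distance))
        elif distance < 0:
            overlaps.append((name1, name2, -distance))
    return gaps, overlaps


# Toy annotation (made-up coordinates for illustration only)
toy = [("geneA", 1, 100), ("geneB", 94, 200), ("geneC", 201, 300), ("geneD", 332, 400)]
gaps, overlaps = spacers_and_overlaps(toy)
total_gap = sum(length for _, _, length in gaps)
total_overlap = sum(length for _, _, length in overlaps)
```

In this toy case, geneA and geneB share 7 bp, geneB and geneC are directly adjacent, and geneC and geneD are separated by a 31-bp spacer.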

PCGs
The total length of the 13 PCGs in the mitochondrial genome of P. stoliczkana was 11,408 bp. Except for ND6, which is on the L strand, all PCGs were on the H strand. Among the 13 PCGs, the start codon of the COI gene was GTG, and the remaining start codons were ATG. The ND2, COII, ATP6, COIII, ND3, ND4, and Cyt b genes ended in the incomplete termination codons T or TA (Table II), which is common in the mitochondrial genomes of metazoa and similar to the termination codons of most mitochondrial PCGs in teleost fish. Such incomplete termination codons are typically thought to be completed by post-transcriptional polyadenylation. The uneven distribution of bases is one of the most characteristic features of coding regions. Although the base contents of the different gene fragments differed, they all presented a lower G content and higher A+T enrichment (Table III).

Codon usage and amino acid composition
The relative synonymous codon usage (RSCU) of the P. stoliczkana mitochondrial genome was analyzed using MEGA to determine, for each codon, the ratio of its observed frequency to the frequency expected if all synonymous codons for that amino acid were used equally (Table IV, Fig. 2). There were 25 preferred codons (RSCU ≥1) (Behura and Severson, 2013) in the 13 PCGs of P. stoliczkana. The 11,408-bp gene sequence encoded 3794 amino acids. The most common amino acid in the mitochondrial genome of P. stoliczkana was leucine (Leu), with a content of 11.12%, whereas the least used amino acid was cysteine (Cys), with a content of only 0.66%.

rRNA, tRNA, and control region
Similar to other bony fish, the mitochondrial genome of P. stoliczkana contained 12S rRNA and 16S rRNA genes, which were located between tRNA-Phe and tRNA-Leu2 on the H strand and separated from each other by tRNA-Val. The 12S rRNA sequence was 956 bp in length and located at positions 69-1025 bp of the mitochondrial sequence; the 16S rRNA sequence was 1683 bp in length and located at positions 1098-2780 bp. The mitochondrial genome of P. stoliczkana was found to contain 22 tRNAs with lengths of 67-76 bp. The 1340-bp control region was located between tRNA-Pro and tRNA-Phe.
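The RSCU values summarized in Table IV can be illustrated with a small calculation. The sketch below is not MEGA's implementation; it assumes the vertebrate mitochondrial genetic code (NCBI translation table 2) and groups codons by amino acid, whereas some tools split six-fold families such as Leu and Ser. The function name `rscu` and the toy coding sequence are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Vertebrate mitochondrial code (NCBI translation table 2), codons in TCAG order
AAS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AAS)}


def rscu(cds: str) -> dict:
    """Return RSCU per codon: observed count / count expected under equal
    usage of all synonymous codons for the same amino acid.

    Stop codons and amino acids absent from the sequence are skipped.
    """
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    result = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue
        expected = total / len(family)   # equal usage within the family
        for c in family:
            result[c] = counts[c] / expected
    return result


# Toy coding sequence (made up): CTA x3, CTT x1 (Leu); GCC x1, GCT x1 (Ala)
values = rscu("CTACTACTACTTGCCGCT")
```

A codon with RSCU greater than 1 is used more often than expected under equal usage; within each amino acid, the RSCU values sum to the number of synonymous codons.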

Phylogenetic relationships
ML and BI phylogenetic trees of Smiliogastrinae were constructed based on the concatenated nucleotide sequences of the 13 PCGs. The two tree-building methods generated consistent topologies (Figs. 3 and 4). Pethia stoliczkana clustered with P. ticto and P. padamya in one branch, with confidence values of 100%. Except for the genus Puntius, the species of each genus clustered into a single branch with a high confidence value.

DISCUSSION
With advancements in DNA sequencing technology and the rapid development of bioinformatics, fish mitochondrial genomes have been widely studied in the fields of germplasm protection, species identification, population polymorphism, and phylogenetics. Previous studies showed that the mitochondrial genome of fish is typically 15-20 kb, usually has a double-stranded closed circular structure, is compactly arranged, and has a low molecular weight. The mitochondrial genomes of different species vary widely and contain tandem repeats, base insertions, and deletions (Peng et al., 2006). Each PCG has a different evolution rate. Zardoya and Meyer (1996) divided the 13 PCGs into good, medium, and poor groups according to their phylogenetic performance: COI, ND2, ND4, Cyt b, and ND5 were good; COII, COIII, ND1, and ND6 were medium; and ATP6, ATP8, ND3, and ND4L were poor. The evolution rate of most PCGs lies between those of the control region and the RNA genes and is therefore moderate. Pethia stoliczkana genes, such as COI, Cyt b, and ND, which exhibit a rapid evolution rate, can be used as molecular markers to distinguish this fish from other Smiliogastrinae fishes and provide a reference for its germplasm resource protection. In contrast, the 16S rRNA sequence in the mitochondrial genome is not a PCG and is not affected by codon selection pressure, and most of its mutations are neutral. In addition, the evolution rate of mtDNA is significantly higher than that of nuclear DNA. Therefore, the homology of mitochondrial 16S rRNA sequences can be compared to study phylogenetic relationships between species.
The phylogenetic information contained in a single gene is too limited to reflect molecular evolution as a whole; thus, results obtained by analyzing the sequences of multiple genes are more reliable. In fish, the whole mitochondrial genome is widely used to study phylogenetic relationships at different taxonomic levels. This study provides a basis for the germplasm identification, phylogenetic analysis, genetic diversity evaluation, and utilization of P. stoliczkana.

CONCLUSION
The whole mitochondrial genome of P. stoliczkana was obtained using second-generation sequencing. The arrangement pattern of genes in the mitochondrial genome was the same as that of P. ticto and P. padamya and was consistent with the ancestral pattern. Phylogenetic analysis supports the monophyly of the genus Pethia.