Population Data and Internal Validation of the 21 Short Tandem Repeat Loci in Turkish Population

1 Institute of Forensic Sciences and Legal Medicine, Istanbul University-Cerrahpasa, Istanbul, Turkey
2 Department of Biophysics, Faculty of Medicine, Inonu University, Malatya, Turkey
3 Department of Health Management, Faculty of Health Sciences, Istanbul Yeni Yuzyil University, Istanbul, Turkey

Article Information
Received 03 August 2021; Revised 25 September 2021; Accepted 07 October 2021; Available online 08 March 2022 (early access)


INTRODUCTION
Loci used in early forensic DNA analysis not only had low discrimination power but also required large amounts of high-quality DNA and long analysis times. Alternative systems were therefore sought to obtain better DNA profiling results. One efficiently used system is microsatellites, also known as short tandem repeat (STR) loci, which are composed of repeat units 1-6 base pairs long and have a high power of discrimination (McMahon et al., 2017). The allele size of STR loci ranges from 60 to 470 base pairs (Butler, 2012). STRs are ideal genetic markers in forensic sciences because they do not require expensive analysis equipment and allow degraded biological samples to be identified with automated, multiplexed analysis (Ezkurdia et al., 2014).

[...] STR loci in a single tube and a single polymerase chain reaction, using primers labeled with five fluorescent dyes (6-FAM™, NED™, TAZ™, SID™, VIC®); the size standard, GeneScan™ 600 LIZ™ Size Standard v2.0 (Thermo Fisher Scientific), is labeled with a sixth dye. Moreover, at least 10 loci of the GlobalFiler™ kit are mini STRs with an amplicon length of less than 220 base pairs, which makes the kit suitable for forensic scenarios including mixed samples. A problem with mixed samples is that allele loss might be observed at Amelogenin; for instance, a deletion on the Y chromosome may lead to false sex determination. To prevent this problem, the GlobalFiler™ kit includes the DYS391 and Y-indel loci, which are specific to sex determination. Furthermore, the kit includes a PCR reaction mixture that offers higher performance, which increases the success rate even with samples containing inhibitors. In addition, the GlobalFiler™ kit not only provides good DNA profiling results but also has an amplification time of less than 80 min.
The validation of the GlobalFiler™ kit was performed in accordance with the guidelines published by the Scientific Working Group on DNA Analysis Methods (SWGDAM). Studies were performed to demonstrate the effectiveness and performance of the kit in terms of sensitivity, concordance, specificity, limit of detection, dynamic range, limit of quantification, stochastic threshold, reproducibility and repeatability, mixture and contamination parameters. The results validate the multiplex design and demonstrate the kit's robustness, reliability and suitability as an assay for human identification with casework DNA samples.
One significant aspect of this study is that Turkey holds an important geopolitical position as a bridge between Asia and Europe. Migration into, out of and within the regions of Turkey makes it necessary to map allele frequencies and compare them with published data from other populations. Several population studies have been carried out in our country; however, to the best of our knowledge, no published study has covered 24 loci (21 autosomal STR loci + 3 sex-determination markers) using the GlobalFiler™ amplification kit.

MATERIALS AND METHODS
Oral swab samples were collected for the population genetic study and the internal validation. All procedures were carried out in accordance with the approval of the Istanbul University-Cerrahpasa Medical Faculty Clinical Research Ethics Committee (No. 69885, 06/03/2015). All participants were informed about the study and signed an informed consent form. Oral swab samples were collected from 350 randomly selected individuals (200 men, 150 women) over 18 years of age from the seven regions of Turkey (Marmara, Aegean, Mediterranean, Southeastern Anatolia, Eastern Anatolia, Black Sea and Central Anatolia). Samples were taken from the mouth with sterile swabs and dried at room temperature. For method validation, oral swabs were collected from the personnel (3 women, 1 man) of the Forensic Molecular Genetic Laboratory, Institute of Legal Medicine and Forensic Sciences, Istanbul University-Cerrahpasa. Because our laboratory is accredited according to the TSE/IEC 17025 standard, samples from all laboratory personnel are included in the validation as a standard requirement. All samples were stored at -20 °C prior to analysis. DNA was isolated from the swab samples with the silica-based QIAamp™ DNA Mini Kit (Qiagen, Stanford, CA, USA) as previously described (QIAGEN Sample and Assay Technologies, 2016). The quantity of isolated DNA was measured on a Qubit™ fluorometer (Applied Biosystems) using the Quant-iT dsDNA HS (High Sensitivity) Assay kit (Invitrogen, Paisley, Renfrewshire, UK) following the manufacturer's steps (Thermo Fisher Scientific, n.d.). The validation parameters were studied according to the guidelines published by the Scientific Working Group on DNA Analysis Methods (SWGDAM). The DNA concentration measured by the fluorometric method ranged between 1.03 and 12.82 ng/μl.
After isolation and quantification of the samples, the GlobalFiler™ PCR Amplification Kit, stored at -20 °C, was used for the polymerase chain reaction. Thermal cycling parameters were optimized using an experimental design approach to define the combination of temperatures and hold times that produces the best test performance. To control for contamination at the polymerase chain reaction stage, a negative control sample was used for each replicate. In addition, a positive control sample was used to assess the optimal sensitivity of the method (Thermo Fisher Scientific Inc., 2019). Oral swab samples from personnel (3 women, 1 man) were used as internal controls to reduce errors arising from individual samples. PCR analyses [...]
Validation parameters such as limit of detection (LOD), limit of quantification (LOQ), dynamic range, stochastic threshold, repeatability and reproducibility were calculated using Microsoft Excel 2010 (Microsoft Office).
To determine the limit of detection of the ABI 3130 genetic analyzer, the Relative Fluorescent Unit (RFU) values of the highest peaks observed in 10 negative control samples were recorded; the samples were run and analyzed in accordance with the operating and software instructions of the device. The mean and standard deviation of the RFU values of the highest peaks obtained from all samples were calculated. Based on this calculation, the cut-off value was determined as 50.25 RFU.
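As a minimal sketch of this kind of calculation, the snippet below assumes the common convention that the detection limit equals the mean of the highest noise peaks plus three standard deviations. The multiplier is an assumption (the text does not state it), the function name is hypothetical, and the RFU values are illustrative, not the study's data.

```python
from statistics import mean, stdev


def limit_of_detection(negative_control_max_rfu, k=3.0):
    """Return mean + k * sample SD of the highest noise peaks (RFU).

    k = 3 is an assumed convention; the study does not state its multiplier.
    """
    if len(negative_control_max_rfu) < 2:
        raise ValueError("need at least two negative controls")
    return mean(negative_control_max_rfu) + k * stdev(negative_control_max_rfu)


# Hypothetical highest-peak RFU values from 10 negative controls (not study data).
example_rfu = [20, 24, 18, 31, 27, 22, 19, 25, 29, 23]
print(f"LOD = {limit_of_detection(example_rfu):.2f} RFU")
```

Peaks below the resulting value would be treated as baseline noise; a laboratory may prefer a different multiplier or a percentile-based cut-off.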
Population genetic parameters of the 21 autosomal STR loci, such as matching probability (MP), power of discrimination (PD), polymorphism information content (PIC) and power of exclusion (PE), were calculated using Arlequin v 3.5.2.2. In addition, observed heterozygosity (Ho) and expected heterozygosity (He) were calculated, and loci that differ significantly between populations were assessed using a Bonferroni-corrected significance threshold (p < 0.05/21 = 0.00238). Moreover, inter-population genetic distances were calculated using Wright's F-statistics (fixation index, F). Allele frequencies were compared with those from other countries, and the Z values of the allele frequencies were calculated based on the peak heights of alleles, using Arlequin v 3.5.2.2 (Excoffier and Lischer, 2010).
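For orientation, the textbook definitions behind some of these parameters can be sketched as follows. This is an illustration under stated assumptions, not the estimators implemented in Arlequin: expected heterozygosity is taken as one minus the sum of squared allele frequencies (without small-sample correction), PIC follows the standard Botstein-style formula, and the function names are hypothetical.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2), without small-sample correction (assumed form)."""
    return 1.0 - sum(p * p for p in freqs)


def polymorphism_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical allele frequencies for a single locus (not study data).
freqs = [0.5, 0.3, 0.2]
print(expected_heterozygosity(freqs))
print(polymorphism_information_content(freqs))
print(round(bonferroni_threshold(0.05, 21), 5))
```

With 21 loci tested, the corrected threshold reproduces the 0.00238 value quoted in the text.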

Ethical approval
The study protocol was approved by the ethics committee of the Istanbul University-Cerrahpasa Medical Faculty, Istanbul, Turkey (No. 69885, 06/03/2015), and all participants consented to the use of their genetic material and other necessary information.

RESULTS
For the dynamic range study, positive control samples were prepared at eight dilution levels (0.005, 0.0125, 0.025, 0.05, 0.125, 0.25, 0.5 and 1 ng/μl), and five PCR replicates were run at each level, giving a total of 40 PCRs. The average peak height of the five amplified samples in each concentration group was calculated, and Figure 1 was obtained by plotting the mean peak height against concentration. According to our study, the dynamic range was between 0.05 ng/μl and 0.25 ng/μl. To determine the limit of quantification (LOQ), the 40 PCR products used in the dynamic range study were examined from the lowest to the highest concentration, and the sensitivity was determined for the two lowest concentrations at which a full profile was obtained. For this purpose, the mean and standard deviation of the peak heights of each allele detected in the five samples at each of the eight dilution levels were calculated. The values obtained for each allele were then averaged and compared with the limit of detection found for the genetic analyzer. The comparison shows that a complete and reliable DNA profile is obtained at a concentration of 0.125 ng/μl.
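The per-level summary described above can be sketched as follows. The replicate peak heights are hypothetical placeholders, the function name is an assumption, and only the 50.25 RFU cut-off is taken from the text.

```python
from statistics import mean, stdev


def summarize_dilution_series(peak_heights_by_conc, lod_rfu):
    """Return {concentration: (mean RFU, SD RFU, mean above LOD?)}."""
    summary = {}
    for conc in sorted(peak_heights_by_conc):
        heights = peak_heights_by_conc[conc]
        avg = mean(heights)
        summary[conc] = (avg, stdev(heights), avg > lod_rfu)
    return summary


# Hypothetical replicate peak heights (RFU) for three dilution levels (ng/ul).
example = {
    0.005: [30, 42, 38, 45, 35],
    0.125: [820, 790, 860, 805, 835],
    1.0: [6100, 5900, 6300, 6050, 6200],
}
for conc, (avg, sd, above) in summarize_dilution_series(example, lod_rfu=50.25).items():
    print(f"{conc} ng/ul: mean={avg:.1f} RFU, SD={sd:.1f}, above LOD={above}")
```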
To assess reproducibility and repeatability, oral swabs were taken from 4 people and DNA was isolated. These samples were run 3 times at different times by 4 analysts using the same PCR protocol and genetic analyzer (48 samples in total), and the electropherograms were recorded. For the reproducibility study, the standard deviation of the allele sizes was calculated, and the standard deviation values were averaged over all loci. This value is expected to be less than 0.5. For the ABI 3130 device, with 7 samples and 21 sequencing results, the average standard deviation over all loci was 0.337.
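A minimal sketch of this sizing-precision check is given below. It assumes the 0.5 limit is expressed in base pairs, which the text does not state; the function names and the fragment sizes are hypothetical.

```python
from statistics import mean, stdev


def mean_sizing_sd(sizes_by_allele):
    """Average the per-allele sample SD of fragment sizes over all alleles.

    sizes_by_allele maps (locus, allele) to the sizes observed in repeated runs.
    """
    return mean([stdev(sizes) for sizes in sizes_by_allele.values()])


def is_reproducible(sizes_by_allele, max_sd=0.5):
    """True if the averaged SD is below the limit (assumed to be in bp)."""
    return mean_sizing_sd(sizes_by_allele) < max_sd


# Hypothetical fragment sizes (bp) from three runs; not study data.
runs = {
    ("TPOX", "8"): [350.12, 350.20, 350.08],
    ("SE33", "19"): [361.40, 361.55, 361.31],
}
print(round(mean_sizing_sd(runs), 3), is_reproducible(runs))
```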
For repeatability studies, the results of genotypes [...]

After that, the ratio of allelic peak heights at heterozygous loci was determined in the amplified and profiled samples. The stochastic threshold is 0.5 ng/µl (Table I). The values for each allele of each sample, and the smaller peak heights used to obtain these values, are shown in Figure 2. The stochastic threshold can also be expressed as the RFU at which peak imbalance appears for a given target DNA concentration, as seen in Figure 3. To calculate the peak height ratio, the height of the smaller peak was divided by the height of the larger peak and multiplied by 100. The mean and standard deviation of the sister-allele peak height ratios were then calculated, and a threshold for the peak height ratio was obtained by subtracting three standard deviations from the mean. The mean sister-allele peak height ratio was 90.004, the standard deviation ±7.396, and the resulting threshold 67.815 (Figure 4).

For mixture studies, two samples with known profiles, coded F and G, were mixed in ratios of 1:1, 1:10, 1:50 and 1:100. The observed and expected allele numbers of the profiles of both persons were then determined (Table II). As shown in Table III [...]

To determine whether cross-contamination occurred between wells during the experimental stages and during preparation of the sample loading tray for the ABI 3130 genetic analyzer, a negative control was first used during the isolation phase, and both negative and positive controls were used at the PCR and electrophoresis stages. No peak was detected in the electropherogram of the negative control used at the PCR stage, and no peak other than the expected, known peaks was observed in the positive control. Thus, no contamination was detected during the experiments.
In the second part, GeneScan 500 LIZ Size Standard together with amplified products was loaded into one of each pair of adjacent wells, and only GeneScan 500 LIZ Size Standard was loaded into the neighboring well, so that empty and filled wells were arranged diagonally. An allelic ladder was also added for each run. The sample tray shown in Table III was loaded on the ABI 3130 genetic analyzer and analyzed. The wells were then examined for any carry-over between the sample wells, the wells containing only GeneScan 500 LIZ Size Standard and the allelic ladder. No contamination was detected.
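The peak height ratio threshold from the stochastic study above (smaller peak divided by larger peak, times 100, then mean minus three standard deviations) can be sketched as follows. The function names are hypothetical; the only study values used are the reported mean (90.004) and standard deviation (7.396).

```python
def peak_height_ratio(height_a, height_b):
    """Heterozygote balance in percent: smaller peak / larger peak * 100."""
    small, large = sorted((height_a, height_b))
    if large <= 0:
        raise ValueError("peak heights must be positive")
    return 100.0 * small / large


def phr_threshold(mean_phr, sd_phr, k=3.0):
    """Lower bound on the peak height ratio: mean minus k standard deviations."""
    return mean_phr - k * sd_phr


# Reported summary statistics from the text; the peak heights are illustrative.
print(peak_height_ratio(900, 1000))
print(round(phr_threshold(90.004, 7.396), 3))
```

Applied to the rounded summary statistics, this gives about 67.816, which agrees with the reported 67.815 up to rounding of the inputs.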

Using the GlobalFiler™ PCR Amplification Kit, we observed 68 different alleles in the Turkish population. In addition, 44 allelic variants were identified at the SE33 locus, as shown in Table IV. All the SE33 variants are listed in STRBase (Butler and Redman, 1997).
According to our results, Ho values ranged from 0.700 (TPOX and D5S818) to 0.920 (SE33) for the Turkish population, and He values ranged from 0.696 (TPOX) to 0.955 (SE33). These results indicate that TPOX and D5S818 have the lowest discrimination power among the 21 autosomal loci, since He and Ho summarize the allelic diversity of a locus in population studies. Conversely, the He and Ho values of SE33 demonstrate the discrimination power of this locus in forensic cases, as seen in Table V. Similar to other population studies (Alsafiah et al., 2017; Park et al., 2016; Zhang et al., 2016), our results indicated that TPOX (PD = 0.868) is the least discriminative of the 21 autosomal loci, whereas SE33 (PD = 0.986) is the most discriminative. Similar results were observed for the typical paternity index: TPOX (TPI = 1.667) is less useful than the other loci, and SE33 would statistically contribute the most in paternity cases. Thus, according to our data, TPOX is the least polymorphic locus and SE33 the most polymorphic one, as seen in Table VI. The genetic distance was largest between the Turkish and South African populations (0.1198) and smallest between the Turkish and Azerbaijani populations (0.0097), as seen in Table VII.

DISCUSSION
In criminal cases such as murder, sexual assault and theft, genetic information from biological materials found at the scene is used to determine whether there is a connection between the suspect and the incident (Rudin and Inman, 2001). [...] results, the development of automation and the falling cost of technology have led to widespread use and success in forensic genetics. To provide all these vital expert services, it is necessary to use reliable and up-to-date methods and systems in accordance with quality standards. The allelic frequencies of short tandem repeat loci vary from population to population. For this reason, the genetic markers used in forensic sciences should be characterized for each population and a database should be established. In the statistical calculations made when evaluating DNA analysis results, it is important to use the database of the relevant population (Budowle et al., 2001). Although short tandem repeat databases have been created in many countries around the world, different kits continue to be produced for criminal investigation because of the increasing number of polymorphic loci.

The GlobalFiler™ polymerase chain reaction kit, one of the commercially available kits, was used in our study to obtain more rapid and reliable results in forensic cases, and the DNA analysis method was validated with oral swabs. The values obtained support the validity of the polymerase chain reaction kit, and our results are consistent with other studies. As noted in some studies, the sensitivity and efficiency of the capillary electrophoresis device, the sample and the polymerase chain reaction kit play a vital role in internal validation studies. This shows that sample collection methods affect the efficiency of a study even when the same kit is used (Flores et al., 2014). Flores et al. (2014) studied buccal swab and blood samples using the GlobalFiler Express kit on an ABI 3500xl; according to their results, the optimal protocol used 28 thermal cycles and an injection time of 12 seconds, and the minimum detection threshold was 120 RFU. In addition, that study found a stochastic threshold of 400 RFU for buccal swab samples and emphasized that increasing the number of thermal cycles would raise the stochastic threshold. Moreover, the minimum amount of input DNA needed for a full profile was 0.5 ng, whereas in our study it was 0.125 ng.

In another study, conducted by Almeida et al. in a Brazilian population of 502 participants using the GlobalFiler Express kit, the cut-off value was found to be 50 RFU, similar to our result (Almeida et al., 2015). In a genetic validation study conducted in an American population, almost all of the minor components in the 1:9 and 9:1 mixture samples were detected, and a significant proportion of the minor components could be detected in the 1:19 and 19:1 mixture samples. In our study, the proportion of minor-component alleles observed was 23% for the 1:100 mixture, 45% for the 1:50 mixture and 86% for the 1:10 mixture. In the 1:1 mixture samples, however, the alleles of the minor component were observed most efficiently (Ludeman et al., 2018).
In addition to the validation parameters that make up the first part of the study, the second part examined 350 randomly selected, unrelated individuals, sampled from seven regions with Turkey's population density taken into account, and calculated their allele frequencies. In our study, SE33 was found to be the most polymorphic locus in the Turkish population. In the literature, many researchers have reported that SE33 is the most discriminative of the 21 autosomal loci, which shows that our study is compatible with other studies (Alsafiah et al., 2017; Hennessy et al., 2014; Park et al., 2016; Zhang et al., 2016). For the Turkish population, the Combined Power of Exclusion is 99.99999963961% and the Combined Power of Discrimination is 99.999999999999999999999998267%. The allele frequencies obtained in our study were compared with those of populations from Iraq, Saudi Arabia, China (Han), the USA (Caucasian), Iran, Afghanistan, Azerbaijan, Romania and South Africa (Table VII). According to the results of the analysis, a significant difference was observed between the Turkish and South African populations (0.1198), and no significant difference was observed with the Azerbaijani population (0.0097). Differences between populations, which result from the distribution of alleles at a locus and from allele frequencies in a given population, should be considered regardless of the geographical distance between them.
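Combined values such as those above are conventionally obtained by multiplying the per-locus complements, i.e. combined = 1 - prod(1 - value_i), assuming independent loci. The sketch below illustrates that convention; it is not a statement of how the reported figures were computed, and the function name is hypothetical. Decimal arithmetic is used because a result with more than 25 consecutive nines cannot be represented in double-precision floating point.

```python
from decimal import Decimal, getcontext


def combined_power(per_locus_values, precision=50):
    """Combine per-locus PD (or PE) values as 1 - prod(1 - v_i).

    Assumes the loci are independent; uses Decimal to keep the trailing digits.
    """
    getcontext().prec = precision
    product = Decimal(1)
    for value in per_locus_values:
        product *= Decimal(1) - Decimal(str(value))
    return Decimal(1) - product


# Only the two PD values quoted in the text (TPOX and SE33) are used here;
# a full calculation would include all 21 autosomal loci.
print(combined_power([0.868, 0.986]))
```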

CONCLUSIONS
This study aimed to carry out an internal validation of the GlobalFiler™ amplification kit and to provide Turkish population data obtained with the kit. The results of both the validation and the population studies were compared with those from other countries. One point that should be considered in population studies is that, beyond its high discriminative power, the GlobalFiler™ kit is sensitive and gives more definite results at some loci. In many studies using other kits, the Y allele was missing in some degraded samples. Such conditions may arise from major deletions on the Y chromosome or from problems with primer binding. The absence of the allele can cause problems in forensic cases or identification. This risk is lower with the GlobalFiler™ kit. The Y-indel, Y-short tandem repeat and DYS391 markers in the GlobalFiler™ kit are particularly informative for the detection of male individuals. We used oral swabs as the biological sample and the GlobalFiler™ kit as the polymerase chain reaction kit to determine allelic frequencies and population genetic parameters in the Turkish population, and the GlobalFiler™ kit was validated in our laboratory. Because it includes highly discriminative loci, the validated GlobalFiler™ polymerase chain reaction kit provides faster, more reliable and more informative results than other kits for identification from degraded samples, for resolving sexual assault cases involving more than one person, and for identifying disaster victims.