A COMPARATIVE ANALAYSIS OF INTER-SITE GENE EXPRESSION HETEROGENICITY OF NORMAL HUMAN BUCCAL MUCOSA WITH NORMAL GINGIVAL MUCOSA

BACKGROUND Description of heterogeneity of gene expression of various human intraoral sites are not adequate. The aim of this study was to explore the difference of gene expression profiles of whole tissue obtained from apparently normal human gingiva and buccal mucosa (HGM, HBM). MATERIALS AND METHODS Gene sets fulfilling inclusion and exclusion criteria of HGM and HBM in gene Expression Omnibus(GEO) database were identified, segregated, filtered and analysed using the ExAtlas online web tool using pre-determined cut-off. The differentially expressed genes were studied for epithelial keratinization related, housekeeping(HKG), extracellular matrix related(ECMRG) and epithelial-mesenchymal transition related genes(EMTRGs). RESULTS In all 40 HBM and 64 HGM formed the study group. In all there were 18012 significantly expressed genes. Of this, 1814 were over-expressed and 1862 under-expressed HBM genes as compared to HGM. One in five of all studied genes significantly differed between HBM and HGM. For the keratinization genes, 1 in 6 differed. One of every 5 HKG-proteomics genes differed between HBM and HGM, while this ratio was 1-in 4 for all ECMRGs and EMTRGs. DISCUSSION This difference in the gene expression between the HBM and HGM could possibly influence a multitude of biological pathways. This result could explain partly the difference in clinicopathological features of oral lesions occurring in HBM and HGM. The innate genotypic difference between the two intra-oral niches could serve as confounding factor in genotypic studies. Hence studies that compare the HBM and HGM should factor-in these findings while evaluating their results.


4
Fibroblast of anatomically distinct sites have distinct transcriptional patterns. On appropriate stimulation, the relatively quiescent fibroblasts can acquire an active synthetic, contractile phenotype and express several smooth muscle cell markers, which are not exclusive for fibroblasts of that particular niche. [4,5] Failure to account and consider the phenotypic as well as expression profile of these heterogenic differences can lead to potential erroneous interpretation of data collected for the specific experimental purposes. [6] All oral fibroblasts are derived from the ecto-mesenchymal cells of the cranially migrated neural crest cells. In spite of the same lineage, there is anatomical, histological heterogeneity among various intra-oral sites. [7] Human buccal fibroblasts (HBF) have partly primed, committed cells of neural crest lineage characterised by plasticity and longevity controlled by WNT and p53 gene network. [8] The existence of heterogeneity in human gingival fibroblasts (HGF) is documented, particularly those of the gingival papillary and reticular area. [9,10] A subset population of HGF is known to heal without fibrosis. [11] When HGF and HBF cultured fibroblasts were studied, the cell density migration index was higher in HBF than HGF. On the other hand, the HGF had more adult phenotype in contrast to foetal phenotype of HBF. [9] Oral epithelial cells are exposed to external environment and they also interact extensively with underlying cells such as fibroblasts and immune cells. Alteration in fibroblasts influences proliferation, repair and inflammatory cytokine secretion and factors released by fibroblasts influence overlying epithelial cells. [12] Oral keratinocytes interact with themselves and underlying connective tissue to provide a tight barrier and heal when damaged. [13][14][15][16] The aim of this manuscript was to study the difference of expression profiles of epithelial and the extracellular matrix (ECM) components of whole tissue obtained from two common 5 oral niches -gingiva and buccal mucosa from publicly available Gene Expression Omnibus -GEO datasets.

Source of microarray datasets
A systematic search of the National Centre for Biotechnology Information (NCBI-GEO -http://www.ncbi.nlm.nih.gov/geo) repository using the keywords "buccal mucosa" "gingiva" was carried out. The microarray datasets and their descriptions contents were further limited to a. in Humans -Homo sapiens b. type -"expression profiling by the array.", c. has normal healthy patient, tissue biopsy (epithelium and connective tissue) sample with no obvious disease d. ideal, acceptable extraction protocol e. has mRNA. Datasets that were from culture explants or had only fibroblasts or only keratinocytes or non-tissues or that unclear extraction protocols or sources or details were excluded. No emphasis was placed on types of dataset platforms. The gene series were collated.

Pair-Wise Comparison
The individual patient's gene expression datasets from methodically reviewing and screening the microarray datasets were collated using the pair wise comparison using the ExAtlas online web tool at https://lgsun.irp.nia.nih.gov/exatlas. [17] The genes from the GEO datasets were log 2 transformed, normalized with quantile normalization method and later combined. Then, the samples -Human Buccal Mucosa (HBM) and Human Gingival Mucosa (HGM) gene-sets were combined. Principal component analysis was performed to check the tissue/gene distribution with FDR ≤ 0.05, correlation threshold = 0.7 and fold change threshold of 2. In the tool, PCA is calculated using the Singular Value Decomposition (SVD) method that generates eigenvectors for rows as well as columns of the log-transformed data 6 matrix. [17] For plotting of tissues and genes (biplot) column projections were used as biplots is helpful for visually exploring associations between genes and tissues.
A pairwise compared with ANOVA statistical methodology was employed to compare HBM and HGM. The quality assessment measurement of the samples within the pooled datasets was the standard deviation (SD) value, and the criterion was SD ≤ 0.3. The correlation of gene expression for housekeeping genes was established at >0.5. The false discovery rate (FDR) based on the Benjamini-Hochberg was ≤ 0.05 and fold change was fixed at 2. The scatter plot was made to identify the over-expressed and under-expressed genes in HBM as compared to HGF. Only genes that had gene symbol was considered for further studies.

Gene-set enrichment analysis of up/down-regulated genes
The listed genes were then used to compare with the Public human Gene Ontology gene set with functional role of 9606 genes. [18,19] A FDR≤0.05, fold enrichment threshold of 2 and a minimum of 5 gene overlap in biological process was kept as standard norms. For this purpose, exatlas tool employs, Parametric Analysis of Gene Enrichment (PAGE) as it is simple and reliable. [20,21]

Differential expression of epithelial keratinization related genes
The genes associated with epithelial keratinization were collected from gene ontology (GO: 0031424; http://amigo.geneontology.org/amigo/term/GO:0031424). The differentially expressed genes from section-2.2 were assessed for the same.

Differential expression of housekeeping gene
From a database of human housekeeping genes (HKG-proteomics) (https://www.proteinatlas.org/humanproteome/tissue/housekeeping), the results of section-2.2 were compared and the housekeeping genes that were differentially expressed between HBM and HGM tabulated. [22] 2.6 Differential expression of extracellular matrix related genes [ECMRGs] The list of over and under-expressed genes of HBM as compared to HGM was searched for extracellular matrix (ECM) and annotated according to matrisome divisions (core matrisome or matrisome-associated) and categories (ECM glycoproteins, collagens and proteoglycans, for core matrisome genes, ECM-affiliated, ECM regulators and secreted factors, for matrisome-associated genes) in lines as proposed by Naba et al., using the MatrisomeAnnotator tool at http://matrisomeproject.mit.edu/analytical-tools/matrisomeannotator/ [23]

Differential expression of epithelial-mesenchymal transition related genes [EMTRGs]
From the database of human epithelial-mesenchymal transition (EMT) database, the results from section-2.2 were compared and the differentially expressed EMT genes were listed out. [24]

Differential expression of fibroblast markers
The list of fibroblast markers of heterogeneity as revealed by recent single-cell analysis through cell identification and discrimination was collected from previously published literature. [25] A comparison of the same was performed based on ANOVA results and the overlap tabulated along with rank. 8

Microarray datasets
From the publicly available database, in early March 2021, a total of 4 datasets (GSE7307, n=4; GSE3526, n=4; GSE17913, n=40) non-smoker HBM and 5 datasets (GSE106090, n=6; GSE10334, n=64; GSE4250, n=2; GSE3374, n=8; GSE23586, n=3) had healthy HGM were collected and processed in the Exatlas software as outlined. In all there were 48 HBM and 83 HGM gene-sets. All these gene-sets were subjected to quality assessment procedure as previously outlined. Those sets whose house keeping genes did not fulfil the required minimum quality parameters of correlation or SD were removed from further analysis. After removal of non-qualifying datasets, a total of 40 HBM and 64 HGM formed the final study group. The final study group gene-set had used GPL570 platform to analyse the array.

Gene-set enrichment analysis of up/down-regulated genes
The over/under expressed gene enrichment analysis using Gene-Ontology is shown in supplementary file-1, (Tables1, 2). There were 217 biological processes in over-expressed genes. The commonly involved GO processes (>15% of cluster) and their cluster frequency

Differential expression of epithelial keratinization related genes
In all there were 227 genes associated with epithelial keratinization. Of this, there were 13 genes overexpressed and 25 under-expressed in HBM as compared to HGM.
Combined, this 38 genes accounted to 16.74% of all keratinization associated genes.

Differential expression of ECMRGs
There are 847 ECMRGs identified and reported. [26] The identified over/underexpressed genes were subjected to matrisome analysis. This accounts to 23.73% of the ECM genes. EMT related genes in present study accounted to 24.49% of all known EMT genes.

Differential expression of fibroblast markers
Of

Overall difference between gene expression of HBM and HGM
Among all the significant genes in this study, 1-in-5 significantly differed between HBM and HGM. For the keratinization genes, 1 in 6 differed. One of every 5 HKGproteomics genes differed between HBM and HGM, while this ratio was 1-in 4 for all ECMRGs and EMTRGs. (Figure-XX)

DISCUSSION
Several studies have reported the spatio-temporal gene expression difference among various human tissues, organs and even different parts of same organ, including skin, adipose tissue, brain, cornea and epidymis. [27][28][29][30][31][32] The role of heterogeneity of gene expression of 1 2 fibroblasts of various sites and its implication on human diseases have been reported. [33] Studies exploring the heterogeneity of the various intra-oral sites in terms of pH, histo-cytoarchitecture, immune-topology and gene expressions are limited. [7-16, 34,35] Existing studies have reflected the inherent differences between the various intra-oral sites and even among the same site such as gingiva and periodontium. [9,10,11] The spatio-temporal heterogeneity of the gingiva at single cell level has been recently described widening our understanding of the biology of these intra-oral sites. [36] The intra-oral site specific gene expression differences in pathologic process also has been previously reported. [36,37] However, the innate difference between the gene expression of whole HBM and HGM among normal, non-diseased population has not been reported. Hence this study was attempted.
The present study reveals the existence of substantial difference between the HBM and HGM in terms of gene expression. The figure-xx shows the extent of difference. The difference emanates from the basic housekeeping genes, keratinization process to complex ECMRGs and EMTRGs. This inherent difference in gene expression may have ramification and may partially account for site predilection of pathologies. [37,38] In addition to differential expression, the net difference in key biological reactions between HBM and HGM have been evidenced by change in the KEGG pathways. The alterations could be reflecting on the molecular profile of the cells of HBM and HGM.
For a typical biomedical research involving molecular techniques, the need for positive and negative control is mandatory. There are guidelines for the use of such procedure related control tissues and mandatory reporting format. [39] However, to the best of our knowledge, there are only few suggestions and deliberations for "normal" tissues that are involved as an experimental control arm, especially in oncology. [40] Ethically, ideal normal tissues are difficult to acquire. [41][42][43] Most of the bioethical guidelines in force advises against incising/excising or enlarging to non-lesional areas, for the exclusive purpose of conducting a 1 3 research. [42] Generally a normal tissue that is trimmed/excised for approximation after unrelated surgical treatment, after voluntary consent is used as normal tissues. [43] This "uninvolved matching" tissue may not a true representation of a native, normal tissue because molecular changes may have occurred under the influence of adjacent infection, inflammation or an adjacent neoplastic process ("condemned mucosa"). When "normal" tissue is required as a control arm, it becomes pertinent to specify what tissues constitute "normal" and justify that such a tissue (if from diseased entity) would not have altered molecular signals that could adversely influence the outcome of the comparison. [41] The influence of age, gender and habits need to be also accounted. The apparent clinically normal tissues could possibly harbour mutations and molecular changes that could potentially influence the outcome of comparisons. [40] Hence the apparent healthy tissues should be used with caution.
Some investigators request normal tissues from trauma cases or edges of chronic infection and even from preserved anatomic entities. Wound approximation edges from trauma cases may be contaminated and if late, be affected by inflammatory process. Similarly chronic inflammation could alter the molecular nature of the native tissues. It may not be ethical to obtain tissues from autopsies or stored specimen because these may be done without consent. [41] There has been attempts to elucidate the normal human differential protein expression in various tissues. Even in the study, different intra-oral sites have not been evaluated in such attempts except for tongue. [44,45] Cancer process per se, are known to down-regulate tissue specific genes and are directly associated with prognosis. While studying differences between same type of cancer (for example oral squamous cell carcinoma) at different sites (of oral cavity), the inherent difference between the normal tissues would be highlighted more rather 1 4 than differences between those that are independent of normal tissue physiology. [46] If not accounted, this phenomenon may lead to erroneous conclusions.
In studies involving human oral diseases, the normal, non-diseased control tissues are often obtained from gingival tissues or from the retromolar tissue-an area where HBM and HGM meet. In such instance, with chronic exposure to pro-inflammatory and pro-fibrotic cytokines, as in inflammation, the connective tissue and epithelium may be irreversibly damaged. As a part of reparative mechanism, there could a cascade of triggered epigenetic modifications and activation of related genes, leading to their engagement in further differentiation and fibrosis development. Based on the type of cells, the reaction could also widely vary. There are gaps in knowledge regarding this complex mechanism at the cellular and molecular levels. [47] The biological niches in the HBM and HGM are different. In addition, the normal tissue resident microbial flora could influence the gene expression.
There are recent reports emanating from cancer and normal tissues. [48,49] In such a background, the inherent geno-typological difference between HBM and HGM, as evidenced in this study assumes an important role.
Limitation of the study includes using samples from different studies and population, though the difference have been accounted for in the analysis; the samples differing from several studies but had used one platform; stringent quality control parameter being enforced; non-consideration of age, gender, and deleterious habits. Future studies need to account for the same.

CONCLUSION
An attempt is made to characterize the genetic expression of HGM and HBM using pre-existing dataset and employing stringent statistical approach. The study identified that 1in-5 genes significantly differed between HBM and HGM. Of genes responsible for 1 5 keratinization process, 1 in 6 differed. Similarly housekeeping genes, genes associated with extracellular matrix and epithelial-mesenchymal transition were also significantly altered.