Introduction

It is now a decade since the first completely annotated and continuous human major histocompatibility complex (MHC) genomic sequence map was published.1 The main purpose of the initial genomic sequences was to produce gene and genomic feature maps incorporating known and predicted gene loci. Since then, the MHC genomic sequence template has been used extensively to investigate single nucleotide polymorphism (SNP) and haplotype variation, gene expression, sequence diversity between and within species, and the evolution of the MHC structural organization.2, 3, 4, 5, 6, 7, 8 The continuing strong interest in the MHC genomic sequence stems from its well-established role in regulating inflammation, the complement cascade, and the innate and adaptive (acquired) immune responses mediated by the natural killer (NK) and T-cell systems. Because the MHC underlies the cellular discrimination of ‘self’ from ‘non-self’, it governs restricted cellular interactions and tissue histocompatibility, and knowledge of the effects of MHC-matched and -mismatched donors is therefore essential in transplantation medicine9 and transfusion therapy.10 Similarly, a fully annotated MHC genomic and diversity map is useful for understanding autoimmunity11 and for charting the host response to infectious agents.12, 13 Apart from regulating immunity, the MHC genes may have a role in reproduction and social behavior, such as pregnancy maintenance, mate selection and kin recognition.14, 15 The MHC genomic region also appears to influence central nervous system (CNS) development and plasticity,16, 17, 18, 19, 20 neurological cell interactions,21, 22 synaptic function and behavior,23, 24 cerebral hemispheric specialization,25 and neurological and psychiatric disorders.26, 27, 28, 29, 30

The 4-Mb MHC region occupies 0.13% of the human genome (3 × 10^9 bp) but contains about 0.5% (>150) of the 32 000 known protein coding genes. Many of the MHC gene products are ligands, receptors, interacting proteins, signaling factors and transcription regulators involved in the inflammatory response, in antigen processing and presentation as part of the adaptive immune response, and in interactions with NK cells and cytokines as part of the innate immune response. The MHC genomic landscape is composed mainly of genes, retrotransposons, transposons, regulatory elements, pseudogenes and a few remaining undefined sequences. The MHC genomic region is one of the most gene-dense and best-defined regions within the human genome, and undefined sequences account for only a small percentage of it.
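The two percentages follow from simple division of the round figures quoted above. A minimal Python check (the constants are the text's approximations, not exact annotation counts):

```python
# Round figures quoted in the text (approximations, not exact counts).
MHC_SIZE_BP = 4_000_000          # ~4 Mb MHC region
GENOME_SIZE_BP = 3_000_000_000   # ~3 x 10^9 bp human genome
MHC_GENES = 150                  # ">150" protein coding genes (lower bound)
GENOME_GENES = 32_000            # known protein coding genes

def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole

region_share = percent(MHC_SIZE_BP, GENOME_SIZE_BP)
gene_share = percent(MHC_GENES, GENOME_GENES)
print(f"MHC share of genome: {region_share:.2f}%")  # 0.13%
print(f"MHC share of genes:  {gene_share:.2f}%")    # 0.47%
print(f"Relative gene enrichment: {gene_share / region_share:.1f}x")
```

Because ">150" is a lower bound, a gene share of about 0.47% is consistent with the quoted 0.5%, roughly 3.5 times the region's share of the genome.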

The human leukocyte antigen (HLA) is the name for the human MHC and we will use both names interchangeably in this overview, which outlines the HLA genomic loci, SNP and haplotype diversity, gene interactions and expression, and disease associations. This presentation complements other recent reviews on the human MHC architecture, duplications, diversity, disease and evolution.5, 6, 14, 31, 32, 33

Definition and annotation of gene classifications

Table 1 summarizes the latest (16 September 2008) locus information gathered on the genomic sequence of the HLA region, providing the official gene and locus symbols, GeneIDs, gene type, isoforms, mRNA and protein sequence accession numbers, and Online Mendelian Inheritance in Man (OMIM) identification numbers. The genomic sequence of the HLA region used for the present annotations is the PGF haplotype sequence,34 which was derived from a consanguineous HLA-homozygous cell line carrying the HLA-A3, -B7, -Cw7, -DR15(DR2) combination of alleles. This sequence differs from the original virtual HLA genomic sequence that was first reported1 and reviewed31 as a continuous but mixed genomic sequence obtained from different haplotypes. The locus information in Table 1 is divided into five subregions from the telomeric to the centromeric end: the extended class I (GABBR1 to ZFP57), class I (HLA-F to MICB), class III (PPIAP9 to BTNL2), class II (HLA-DRA to HLA-DPA3) and extended class II (COL11A2 to KIFC1) regions. The definition of the extended class I and II regions is ambiguous, and we have included only four well-analyzed loci in the extended class I and 19 in the extended class II regions, as shown in Table 1.

Table 1 Locus information in the HLA region (16 September 2008)

Locus information was assembled by using the Entrez Gene database (http://www.ncbi.nlm.nih.gov/sites/entrez) of the National Center for Biotechnology Information (NCBI) and previously published reports and papers.1, 35 The Homo sapiens official gene symbols and gene names of the MHC genomic region can be accessed by way of the ‘GeneID’ using Entrez Gene at NCBI.36 Of the 224 loci mapped and reported by The MHC Sequencing Consortium in 1999,1 more than half (124 of 224) had received a new official gene symbol and name approved by the HUGO Gene Nomenclature Committee (HGNC) within 5 years.31 Since then, another 21 gene symbols and names have been changed. Table 1 lists only one ‘old symbol in 2004 and 2008’ per locus, but many of the official gene symbols and names have alternate symbols and aliases. For example, the alternative symbols for HLA-F (GeneID 143110) are DADB-68M4.2, CDA12, HLA-5.4, HLA-CDA12 and HLAF, and there are 11 alternative names for the gene DDR1 (GeneID 780). The old or alternative gene/locus names and symbols can also be accessed through the GeneID (Table 1) at NCBI.

The assembled loci in Table 1 were classified into four categories of gene status: ‘protein coding,’ ‘gene candidate (candidate),’ ‘non-coding RNA (NC gene)’ and ‘pseudogene (pseudo).’ The descriptor ‘protein coding’ denotes a gene that is transcribed to mRNA and also has a reliable open reading frame (ORF) and/or a known protein product, with accession numbers provided for the mRNA and protein sequences. A ‘gene candidate’ is transcribed to mRNA (an mRNA sequence accession number is provided) but has an unknown or uncertain ORF; a protein sequence accession number may or may not be listed. An ‘NC gene’ is transcribed to mRNA (accession number provided) but has no ORF and no known protein or peptide product. A ‘pseudo’ is generally not transcribed to mRNA, and it may be a fragmented gene structure or a retrotransposed and unprocessed cDNA structure. Some of the pseudogenes, such as the P5-1 family in Table 1, are known to be remnants or hybrids of ancient endogenous retroviral sequences.37 Interestingly, SNP variants of one member of the P5-1 family, the HCP5 locus located near HLA-B, have been strongly associated with the progression of HIV infection,13, 38 psoriasis vulgaris and psoriatic arthritis.39
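The four status definitions amount to a small decision rule. The sketch below encodes one simplified reading of it; the `Locus` record and its flag names are hypothetical, and real curation also handles exceptions, such as pseudogenes with transcript evidence, that a rule this simple would misclassify:

```python
from dataclasses import dataclass

@dataclass
class Locus:
    """Evidence flags for one locus (hypothetical record layout)."""
    symbol: str
    transcribed: bool            # an mRNA accession number is available
    orf: str                     # "reliable", "uncertain" (incl. unknown) or "none"
    known_protein: bool = False  # a known protein or peptide product exists

def gene_status(locus: Locus) -> str:
    """Map evidence flags to the four gene status labels of Table 1."""
    if not locus.transcribed:
        return "pseudo"          # generally untranscribed (curated exceptions exist)
    if locus.orf == "reliable" or locus.known_protein:
        return "protein coding"  # reliable ORF and/or known protein product
    if locus.orf == "uncertain":
        return "candidate"       # transcribed, ORF unknown or uncertain
    return "NC gene"             # transcribed, no ORF and no known product

# Hypothetical loci illustrating each branch.
examples = [
    Locus("LOCUS_A", transcribed=True, orf="reliable"),
    Locus("LOCUS_B", transcribed=True, orf="uncertain"),
    Locus("LOCUS_C", transcribed=True, orf="none"),
    Locus("LOCUS_D", transcribed=False, orf="none"),
]
for locus in examples:
    print(locus.symbol, "->", gene_status(locus))
```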

Gene numbers in the HLA region

A total of 253 loci have now been identified and/or reclassified in the 3.78 Mb HLA region of the PGF haplotype,34 from GABBR1 on the most telomeric side of the extended class I region to KIFC1 (past name: HSET) on the most centromeric side of the extended class II region (Figure 1 and Table 1). This is 29 more loci than the 224 first identified in the HLA region and reported in 1999.1 The numbers of loci in the HLA-DRB and RP-C4-CYP21-TNX subregions, which arose by gene duplication, vary and reflect HLA haplotypic differences, as reported earlier.1 When all the loci of the HLA complex were grouped into the four categories of gene status, 133, 19, 22 and 79 loci were classified as protein coding genes, gene candidates, non-coding RNAs and pseudogenes, respectively. Table 1 shows clearly that the non-HLA genes greatly outnumber the HLA-like genes (HLA class I, MIC and HLA class II genes). Of the 45 HLA-like genes, 20 were identified as protein coding genes, 4 as NC genes and 21 as pseudogenes. Of the 208 non-HLA genes, 113 were identified as protein coding genes, 19 as gene candidates, 18 as NC genes and 58 as pseudogenes.
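As a consistency check, the non-HLA breakdown can be derived by subtracting the 45 HLA-like loci from the overall status totals of 133, 19, 22 and 79. A minimal Python sketch (the dictionary layout is illustrative):

```python
# Overall locus counts by gene status and the HLA-like subset, from the text.
TOTALS = {"protein coding": 133, "candidate": 19, "NC gene": 22, "pseudo": 79}
HLA_LIKE = {"protein coding": 20, "candidate": 0, "NC gene": 4, "pseudo": 21}

def subtract(totals: dict, subset: dict) -> dict:
    """Per-category difference totals - subset (missing categories count as 0)."""
    return {status: n - subset.get(status, 0) for status, n in totals.items()}

non_hla = subtract(TOTALS, HLA_LIKE)

# Each breakdown must add up to its stated locus total.
assert sum(TOTALS.values()) == 253
assert sum(HLA_LIKE.values()) == 45
assert sum(non_hla.values()) == 208
print(non_hla)
# {'protein coding': 113, 'candidate': 19, 'NC gene': 18, 'pseudo': 58}
```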

Figure 1

Gene map of the human leukocyte antigen (HLA) region. The major histocompatibility complex (MHC) gene map corresponds to the genomic coordinates 29 677 984 (GABBR1) to 33 485 635 (KIFC1) in the human genome build 36.3 of the National Center for Biotechnology Information (NCBI) map viewer. The regions separated by arrows show the HLA subregions, namely the extended class I, classical class I, class III, classical class II and extended class II regions, from telomere (left and top side) to centromere (right and bottom side). White, gray, striped and black boxes show expressed genes, gene candidates, non-coding genes and pseudogenes, respectively. The locations of the alpha, beta and kappa blocks containing the clusters of duplicated HLA class I genes in the class I region are indicated.

Of the total of 113 non-HLA protein coding genes, 9 (SFTA2, MUC21, PSORS1C3, MCCD1, SLC44A4, ZBTB12, PRRT1, WDR46 and PFDN6) were newly identified as functional loci (Tables 1 and 2). Of these, PSORS1C3 is one of the genes associated with psoriasis vulgaris.40 MCCD1 encodes mitochondrial coiled-coil domain 1 and is highly polymorphic, containing approximately one SNP in every 99 base pairs.41 PFDN6 encodes prefoldin subunit 6, which was reported to be overexpressed in certain cancers compared with their normal counterparts in a tissue microarray study.42

Table 2 Gene numbers in the HLA region

Thirty-three of the non-HLA expressed genes (GABBR1, MOG, ZNRD1, RNF39, TRIM10, TRIM39, PRR3, ABCF1, DDR1, CCHCR1, TCF19, POU5F1, BAT1, ATP6V1G2, LTB, LST1, AIF1, BAT3, MSH5, EHMT2, STK19, CYP21A2, TNXB, PPT2, AGPAT1, AGER, TAP2, PSMB8, PSMB9, BRD2, COL11A2, SLC39A7 and TAPBP) and HLA-F appear to express splice variants, with an average of 2.6 splice variants per gene. One of the recently identified expressed genes with a relatively large number of splice variants is C6orf25, which is located between LY6G6C and DDAH2 within the class III region. This gene has at least seven splice variants, and it is a member of the immunoglobulin (Ig) superfamily that encodes a glycosylated, plasma membrane-bound cell surface receptor as well as soluble isoforms. Some of the membrane-bound and soluble products encoded by the C6orf25 splice variants contain two immunoreceptor tyrosine-based inhibitory motifs (ITIMs) that, when phosphorylated, were found to interact with the SH2-containing protein tyrosine phosphatases SHP-1 and SHP-2.43

Regional analysis of the HLA super-locus

The HLA super-locus can be separated into the traditional five HLA regions with 4, 128, 75, 27 and 19 loci within the extended class I, class I, class III, class II and extended class II regions, respectively (Figure 1 and Table 2).
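Combining these locus counts with the approximate region sizes given in the subsections below (1.8, 0.9, 0.7 and 0.2 Mb; no size is stated for the extended class I region) gives a rough loci-per-megabase density for each region. A minimal sketch:

```python
# (number of loci, approximate size in Mb) per region, from the text.
REGIONS = {
    "class I": (128, 1.8),
    "class III": (75, 0.9),
    "class II": (27, 0.7),
    "extended class II": (19, 0.2),
}

def loci_per_mb(regions: dict) -> dict:
    """Loci per megabase for each region."""
    return {name: n_loci / size_mb for name, (n_loci, size_mb) in regions.items()}

for name, density in loci_per_mb(REGIONS).items():
    print(f"{name:<18} {density:5.1f} loci/Mb")
```

This gives roughly 71, 83, 39 and 95 loci/Mb for the class I, class III, class II and extended class II regions. Because the sizes are rounded, the figures are only indicative; the extended class II value is especially sensitive to rounding of its 0.2 Mb size.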

Extended class I region

In this version of the HLA loci, only four genes (GABBR1, SUMO2P, MOG and ZFP57) have been included in the extended class I region. However, numerous duplicated olfactory receptor, histone, tRNA and zinc-finger protein genes are located on the telomeric segment of the extended class I region. The hemochromatosis gene (HFE), which is structurally similar to an HLA class I gene, is located outside the HLA super-locus, 3.6 Mb away on the telomeric side of HLA-F and the extended class I region.44

Class I region

The class I region contains the three classical and three non-classical HLA class I genes. The non-classical HLA class I genes are distinguished from the classical class I genes by their limited polymorphism, their restricted tissue distribution of expression and their apparently less well-defined role in transplantation medicine.45 There are 19 HLA class I gene loci, of which 3 are classical (HLA-A, -B and -C), 3 non-classical (HLA-E, -F and -G) and 12 non-coding genes or pseudogenes (HLA-S/17, -X, -N/30, -L/92, -J/59, -W/80, -U/21, -K/70, -16, -H/54, -90 and -75), clustered within three separate duplication blocks designated as the alpha, beta and kappa blocks46 (Figure 1). Of the HLA pseudogenes, HLA-H/54 appears to be transcribed into two mRNA sequences (AK090500 and AK308374), whereas the transcript AK127349 and the hypothetical protein FLJ45422 sequence map to overlapping parts of the HLA-L/92 exons. The FLJ45422 gene is composed of five exons and contains an Ig constant domain (IGc) and a transmembrane domain, but its polymorphisms and function are unknown.

There are seven MIC genes, which are HLA class I-like genes, distributed across the three duplication blocks; two are expressed within the beta block, whereas the remainder are non-expressed pseudogenes within the kappa and alpha blocks.46, 47, 48 These MIC genes were generated together with the HLA class I genes by several rounds of segmental duplication.35 The 34 non-HLA protein coding genes of the class I region, distributed between the duplication blocks, are termed anchor or framework genes from an evolutionary perspective.48, 49

Overall, there are 128 loci within the 1.8 Mb class I region from HCP5P15 to MICB, with 42 expressed genes, 12 gene candidates, 10 non-coding genes and 64 (50%) pseudogenes (Table 2). Of the 54 protein coding genes and gene candidates, 7 non-HLA genes (LOC100133214, FLJ45422, LOC100133303, LOC100129065, LOC729792, HCG22 and PSORS1C3) have been identified in the region since the previous locus information report.31 Of the 42 protein coding genes, 4 (SFTA2, MUC21, CCHCR1 and PSORS1C3) were not previously known to be functional loci, and TUBB received a new official symbol and name (Table 1).

Class III region

The class III region, located between the class I and II regions, contains 75 loci within 0.9 Mb of DNA from PPIAP9 to BTNL2 (Table 1), with 55 protein coding genes and 5 (6.7%) pseudogenes (Table 2). Most of the protein coding genes and gene candidates were described earlier in the locus information report of 2004,31 but three genes (LY6G6F, C6orf26 and LOC100128067) were identified more recently. LY6G6F belongs to a cluster of leukocyte antigen-6 (LY6) genes in the class III region and encodes a type I transmembrane protein of the Ig superfamily,43 which may have a role in signal transduction in response to platelet activation.50 Of the 55 protein coding genes, 5 (MCCD1, SLC44A4, EHMT2, ZBTB12 and PRRT1) were not previously known to be functional loci, and three (VARS, LSM2 and CFB) have had a symbol and name change (Table 1). In addition, five small nucleolar RNA (snoRNA) sequences (SNORD84, SNORD117, SNORA38, SNORD48 and SNORD52) were identified in the vicinity of the BAT1, BAT2 and C6orf48 genes.51, 52, 53 The class III region has no known HLA class I- or class II-like genes, but contains the complement genes C2, C4 and CFB, the cytokine genes TNF, LTA and LTB, and many genes with no obvious relationship to immune function or inflammation. The RP-C4-CYP21-TNX gene combination is modular in structure, and its copy number varies between haplotypes. Many of the gene products expressed in the class III region have fundamental roles in cellular processes, such as transcription regulation (BAT1, VARS, RDBP, STK19, SKIV2L, CREBL1 and PBX2), housekeeping (DOM3Z, NEU1, AGPAT1, CLIC1 and CSNK2B), biosynthesis, electron transport and hydrolase activity (PPT2, DDAH2 and ATP6V1G2), and protein–protein interactions for intracellular or intercellular interactions, chaperone function and signaling (C6orf46, HSPA1A, HSPA1B, BAT3, BAT8, AGER, RNF5, FKBPL, TNXB and NOTCH4).

Class II region

The class II region spans 0.7 Mb of DNA and contains the classical class II alpha and beta chain genes HLA-DP, -DQ and -DR, whose products are expressed on the surface of antigen-presenting cells to present peptides to T-helper cells. There are 27 loci identified within the class II region from HLA-DRA to HLA-DPA3 (Table 1), with 17 protein coding genes, seven gene candidates and five pseudogenes (Table 2). In total, 19 of the loci are HLA class II-like sequences, including the 15 classical HLA class II loci and the four non-classical HLA class II loci (HLA-DMA, -DMB, -DOA and -DOB). The HLA-DRB loci are variable in number and MHC haplotype-dependent. The HLA-DRB region of the PGF haplotype (Table 1) contains four copies of the HLA-DRB gene, HLA-DRB1 (coding), -DRB5 (coding), -DRB6 (non-coding) and -DRB9 (non-coding), whereas the HLA-DRB copy numbers vary in other haplotypes.5 All 17 protein coding genes were previously known to be functional genes. Of all the protein coding genes in this region, BRD2 (alias RING3) is the only gene without an established immune function. It is a transcription factor with widespread specificity that possibly remodels chromatin complexes through interactions with histone acetyltransferase complexes, and its activity is high in myeloid leukemias.54 Although BRD2 may have homologs in yeast and Drosophila, it is strongly linked with the MHC in most vertebrates along the evolutionary path from sharks to humans.48

Extended class II region

The extended class II region spans 0.2 Mb of DNA from COL11A2 to KIFC1 (Table 1) and contains 19 loci: 15 protein coding genes, 1 gene candidate, 1 non-coding gene and 2 pseudogenes (Table 2). Only one new gene candidate (LOC646720) has been identified since the locus information report of 2004.31 However, two of the protein coding genes (WDR46 and PFDN6) were not previously known to be functional genes.

Interspersed repeats

Apart from the gene loci, 49.5% of the HLA genomic sequence is composed of interspersed repeat elements, such as SINEs (Alu, MIR), LINEs (LINE1 and 2, L3/CR1), LTR elements (ERVL, ERV class I and class II) and DNA elements (hAT-Charlie, TcMar-Tigger). Table 3 presents a summary of the repeat elements as detected by RepeatMasker (http://www.repeatmasker.org/). A comparable analysis with slightly different results and annotations (data not shown) was obtained with the repeat analysis program CENSOR.55
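A percentage such as the 49.5% above is the fraction of bases covered by at least one repeat annotation, so overlapping hits must be merged before their lengths are summed. The sketch below illustrates that calculation on hypothetical intervals; it is not RepeatMasker's own summary code, and obtaining the intervals from the begin/end columns of a RepeatMasker .out file is left as an assumption:

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end) on the query sequence

def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or abutting intervals so no base is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def repeat_fraction(intervals: Iterable[Interval], seq_len: int) -> float:
    """Fraction of a sequence of length seq_len covered by repeat annotations."""
    covered = sum(end - start + 1 for start, end in merge_intervals(intervals))
    return covered / seq_len

# Hypothetical repeat hits on a 1000-bp toy sequence.
hits = [(1, 300), (250, 400), (600, 650), (651, 700)]
print(merge_intervals(hits))                               # [(1, 400), (600, 700)]
print(f"{100 * repeat_fraction(hits, 1000):.1f}% masked")  # 50.1% masked
```

Summing the raw hit lengths instead would count the 51 overlapping bases twice and overstate the repeat content.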

Table 3 Features of repeat sequences

Genomic diversity

HLA genes

A total of 3201 HLA allele sequences (2215 in class I and 986 in class II) were released by the ImMunoGeneTics HLA (IMGT/HLA) database release 2.22 in July 2008 (http://www.ebi.ac.uk/imgt/hla/). The IMGT/HLA Database is a specialist database for HLA sequences. Ten years ago, only 964 alleles were known, but since then the number has increased by 200–300 allele sequences each year. Of the 2176 alleles assigned to the classical and non-classical HLA class I genes, 673, 1077, 360, 9, 21 and 36 were counted for HLA-A, -B, -C, -E, -F and -G, respectively (Table 4); 2110 and 66 alleles were counted for the classical and non-classical HLA class I genes, respectively. Of the 986 HLA class II alleles, 3, 669, 34, 93, 27, 128, 4, 7, 12 and 9 were counted for HLA-DRA, -DRB, -DQA1, -DQB1, -DPA1, -DPB1, -DMA, -DMB, -DOA and -DOB, respectively (Table 4), with 954 and 32 alleles for the classical and non-classical HLA class II genes, respectively. In addition, 64 and 30 alleles were detected for the MHC class I-like genes MICA and MICB, respectively.
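The classical and non-classical totals follow directly from the per-gene counts. A minimal Python check using the release 2.22 figures quoted above:

```python
# Allele counts per gene (IMGT/HLA release 2.22, July 2008) as given in the text.
CLASS_I = {"A": 673, "B": 1077, "C": 360, "E": 9, "F": 21, "G": 36}
CLASS_II = {"DRA": 3, "DRB": 669, "DQA1": 34, "DQB1": 93, "DPA1": 27,
            "DPB1": 128, "DMA": 4, "DMB": 7, "DOA": 12, "DOB": 9}
CLASSICAL = {"A", "B", "C", "DRA", "DRB", "DQA1", "DQB1", "DPA1", "DPB1"}

def split_totals(counts: dict) -> tuple:
    """Return (classical, non-classical) allele totals for one HLA class."""
    classical = sum(n for gene, n in counts.items() if gene in CLASSICAL)
    return classical, sum(counts.values()) - classical

print("class I :", split_totals(CLASS_I))   # (2110, 66)
print("class II:", split_totals(CLASS_II))  # (954, 32)

# Average yearly growth from 964 alleles ten years earlier to 3201 in 2008.
growth = (3201 - 964) / 10
print(f"~{growth:.0f} new alleles per year")
```

The six class I genes listed account for 2176 alleles, 39 fewer than the 2215 class I sequences in the release total; the text does not break those 39 down by gene. The average growth of about 224 alleles per year is consistent with the stated 200–300 per year.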

Table 4 Number of HLA alleles

Microsatellites

A total of 1527 microsatellite loci (846 in class I, 295 in class III and 386 in class II) were detected in the COX-MHC sequence (accession number NT_113891) by the Sputnik program (http://espressosoftware.com/pages/sputnik.jsp). Of these, 268 microsatellites (146 in class I, 61 in class III and 61 in class II) were developed as genetic markers.56 These polymorphic microsatellite markers have been useful for precise mapping of disease-related genes within the HLA region in linkage analysis and disease association studies.57, 58 Moreover, they provide a powerful tool to study recombination events in this region, which contribute to haplotypic diversification. Detailed microsatellite marker information is provided by the dbMHC database of the NCBI (http://www.ncbi.nlm.nih.gov/gv/mhc/main.fcgi?cmd=init).

SNPs

Between 60 928 and 71 569 SNPs were detected in the region from GABBR1 to KIFC1 when the SNPs in dbSNP (http://www.ncbi.nlm.nih.gov/SNP/) were compared pairwise against each of five genomic sequence assemblies (PGF, Celera, HuRef, C6_COX and C6_QBL). SNP markers are useful for constructing HLA haplotypes and for precise mapping of disease-related genes within the HLA region.59, 60, 61, 62 Figure 2 shows the marked peaks and troughs of the SNP distributions for the pairwise analysis of the five assemblies. The main diversity peaks were observed not only in genomic segments harboring the highly polymorphic HLA-A, -B, -C, -DR, -DQ and -DP loci but also within some non-HLA loci, such as those telomeric of HLA-C. Therefore, HLA diversity is not limited to the antigen- and T-cell receptor-interacting sites of the HLA molecules,63 but spreads to the surrounding loci as hitchhiking diversity owing to the accumulated effect of overdominant selection acting on HLA loci.3 Interestingly, genes related to several diseases, such as diffuse panbronchiolitis, psoriasis vulgaris, rheumatoid arthritis and sarcoidosis, were identified in the segments affected by hitchhiking diversity.57, 58, 64, 65 Shiina et al.3 hypothesized that some non-HLA disease alleles co-evolved with positively selected HLA loci that were in linkage with harmful polymorphisms within negatively or neutrally selected non-HLA loci, in response to various selection, population, genetic and environmental factors.
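Diversity plots like Figure 2 are commonly built by counting SNPs in fixed windows along an assembly. The sketch below assumes non-overlapping windows and flags windows whose count exceeds the mean plus two standard deviations; that cutoff is purely illustrative, as the significance test behind the gray regions in Figure 2 is not described here:

```python
from statistics import mean, pstdev
from typing import Dict, List

def snp_density(positions: List[int], start: int, end: int,
                window: int) -> Dict[int, int]:
    """Count SNPs in consecutive non-overlapping windows over [start, end)."""
    counts = {w: 0 for w in range(start, end, window)}
    for pos in positions:
        if start <= pos < end:
            counts[start + (pos - start) // window * window] += 1
    return counts

def high_diversity_windows(counts: Dict[int, int], z: float = 2.0) -> List[int]:
    """Window starts whose SNP count exceeds mean + z * SD (illustrative cutoff)."""
    values = list(counts.values())
    cutoff = mean(values) + z * pstdev(values)
    return [w for w, c in counts.items() if c > cutoff]

# Hypothetical SNP positions: one SNP per 100-bp window plus a cluster at 300-308.
positions = [50 + 100 * i for i in range(10)] + list(range(300, 309))
counts = snp_density(positions, start=0, end=1000, window=100)
print(high_diversity_windows(counts))  # [300]
```

In practice, window size and the choice of cutoff strongly affect which peaks are called, so both would need to be justified for real data.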

Figure 2

Single nucleotide polymorphism (SNP) distribution within the human leukocyte antigen (HLA) region. Diversity plots (a–e) were drawn by comparing the SNPs released in the dbSNP database against the 1999 reference assembly sequence1 (accession no. NT_007592) (a), the Celera alternate assembly sequence (accession no. NW_923073) (b), the HuRef alternate assembly sequence based on HuRef SCAF_1103279188254 (accession no. NW_001838980) (c), the c6_COX sequence (accession no. NT_113891) (d) and the c6_QBL sequence (accession nos. NT_113893 to NT_113897) (e). Gray backgrounds mark regions of significantly higher SNP density that may have been generated by hitchhiking diversity.3

Genomic variation

The HLA genomic variations generated by HLA-DRB gene copy number in class II and/or the copy number variations (CNVs) of the RP-C4-CYP21-TNX gene combination in class III had been associated with a number of different autoimmune diseases well before the complete, continuous HLA super-locus sequence was available.46 The HLA-DR haplotypes consist of a number of copies of coding and non-coding HLA-DR genes. The expressed DRB sequences have been assigned to four different loci, DRB1, 3, 4 and 5. The highly polymorphic DRB1 alleles (Table 4) are present in all haplotypes, whereas DRB3, 4 and 5 are present only in some haplotypes, as are the HLA-DRB2 and HLA-DRB6 to -DRB9 pseudogenes. The HLA-DRB2 pseudogene lacks exon 2 and contains a 20-nt deletion in exon 3; because 20 is not a multiple of three, this deletion has disrupted the translational reading frame.66 The common HLA-DR alleles, major allotypes and their association with disease have been reviewed by Marsh.67 Low and high copy numbers of the C4 gene in the class III region have recently been identified as risk and protective factors, respectively, for systemic lupus erythematosus (SLE) susceptibility in European Americans.68
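Why a 20-nt deletion disrupts the reading frame can be seen on a toy sequence: deleting a multiple of three nucleotides removes whole codons, whereas deleting 20 nt shifts every downstream codon. The sequence below is hypothetical, not the HLA-DRB2 sequence:

```python
def codons(seq: str) -> list:
    """Split seq into complete codons in reading frame 0."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def delete(seq: str, start: int, length: int) -> str:
    """Remove `length` nucleotides beginning at 0-based index `start`."""
    return seq[:start] + seq[start + length:]

# Hypothetical toy ORF: start codon, 12 identical codons, stop codon.
orf = "ATG" + "GCA" * 12 + "TAA"

in_frame = codons(delete(orf, 3, 21))  # 21 nt = 7 codons: frame preserved
shifted = codons(delete(orf, 3, 20))   # 20 nt: 20 % 3 == 2, frame shifted
print(in_frame)  # ['ATG', 'GCA', 'GCA', 'GCA', 'GCA', 'GCA', 'TAA']
print(shifted)   # ['ATG', 'AGC', 'AGC', 'AGC', 'AGC', 'AGC', 'ATA']
```

With the 20-nt deletion, the downstream codons change from GCA to AGC and the original TAA stop codon is no longer read in frame.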

Genomic variations, such as insertions or deletions (InDels), inversions and other CNVs, have been detected in recent genome-wide studies by comparative genomic hybridization (CGH) array mapping, fosmid end mapping, Mendelian inconsistencies, paired-end mapping of 454 sequencing reads, SNP chips and computational mapping of re-sequencing traces.69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79 In the Database of Genomic Variants (http://projects.tcag.ca/variation/; 26 June 2008), 181 variations (50 InDels, 1 inversion and 130 CNVs) were detected at 49 genomic positions of the HLA region, especially within the HLA class I and II gene regions and part of the class III region (Table 5). Some InDels consist of repetitive elements, such as Alu, HERV, L1 and SVA, or were generated through the activity of repetitive elements.7, 34, 80, 81, 82, 83

Table 5 Genomic variations of the HLA region

Intra- and extra-MHC gene interactions

MHC genes do not function in isolation from other genes in the human genome, but they may interact with other genes inside (local or intra-MHC gene interaction) or outside the MHC region (global or extra-MHC gene interactions). The MHC gene interactions may be viewed as quantitative interactions between alleles at different loci that affect fitness or contribute to complex disease phenotypes (epistasis),84, 85 as simple statistical interactions between alleles at different loci (linkage disequilibrium or LD) as a consequence of functional selection or a hitchhiking effect,86, 87 as functional protein-binding interactions detected by two-hybrid, affinity capture or phage display methods,88 or as protein–DNA interactions such as those between transcription factors and gene promoter and enhancer regions89, 90 or between replication protein factors and DNA replication sites and elements.91, 92 The study of genetic interactions can reveal gene function, the nature of the mutations, functional redundancy, transcription regulation and protein interactions in normal and disease processes.
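For the statistical sense of interaction (LD), standard measures for two biallelic loci are the coefficient D, Lewontin's normalized |D'| and r². A minimal sketch with hypothetical allele and haplotype frequencies (it assumes both allele frequencies lie strictly between 0 and 1):

```python
from typing import Tuple

def ld_stats(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float, float]:
    """Return (D, |D'|, r^2) for alleles A and B at two biallelic loci.

    p_ab: frequency of haplotypes carrying both A and B
    p_a, p_b: population frequencies of A and B (assumed 0 < p < 1)
    """
    d = p_ab - p_a * p_b  # deviation from independence
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0  # Lewontin's normalized D
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

# Hypothetical allele and haplotype frequencies.
print(ld_stats(p_ab=0.25, p_a=0.3, p_b=0.3))  # strong association
print(ld_stats(p_ab=0.06, p_a=0.3, p_b=0.2))  # D ~ 0: independent loci
```

Note that LD alone is a statistical association; as the functional examples below show, establishing epistasis requires experimental evidence beyond these measures.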

Table 6 provides examples of protein interactions encoded by genes located inside and outside the MHC genomic region. Of the interactions between different genes within the MHC, the most thoroughly studied are those involved in protein dimer formation and peptide presentation in the adaptive immune response. In the former case, the HLA class II alpha and beta proteins, encoded by the classical class II A and B gene loci, respectively, have long been known to form alpha/beta heterodimers and have consequently been investigated extensively at various levels, including X-ray structural analysis.93, 94 The interactions of proteins involved in antigen presentation, such as HLA class I proteins, TAP1, TAP2, HLA-DM and TAPBP, have also been studied extensively.95, 96 Interactions between the alleles of HLA-DR haplotypes, which are in strong LD, were found to affect immune response levels and disease susceptibility. For example, the results obtained for two multiple sclerosis-associated HLA-DR alleles at separate loci of the HLA-DR2 haplotype in a humanized mouse functional assay imply that the LD between these two alleles is due to a functional epistatic interaction.97 In that model, one allele modifies the T-cell response activated by the second allele through activation-induced cell death, resulting in a milder form of multiple sclerosis. Other protein interactions encoded by genes within the MHC genomic region include those between RFP5 and BAT5, C4B and C2, CFB and C4B, LTA and LTB, IER3 and BAT3, and MRPS18B and NFKBIL1.

Table 6 Examples of some MHC gene interactions sourced from Entrez gene at NCBI

Examples of interactions between proteins encoded inside and outside the MHC are more numerous than those between proteins encoded within the MHC genomic region. Recent research has focused strongly on the interactions of HLA class I molecules with the killer Ig receptors (KIR) and the leukocyte Ig-like receptor (LIR) family, encoded in the leukocyte receptor complex (LRC) on chromosome 19q13.98, 99 Combinations of HLA class I and KIR variants have been associated with autoimmunity, viral infections, pregnancy-related disorders and cancer.100, 101 Similarly, the proteins encoded by the MICA and MICB genes (Table 6) are known to interact with KLRC4 and KLRK1, which are encoded on chr 12, to regulate innate immunity by way of the NK cell systems.47 The proteins encoded by the C4, CFB and C2 genes in the HLA class III region are involved in complement activation and consequently interact with proteins encoded by genes outside the MHC (Table 6). Allelic variation in both the MHC complement genes and non-MHC genes has been associated with macular degeneration and SLE.102 Recently, Lester et al.103 reported epistasis between the MHC C4 gene region and the RCAa block in primary Sjögren syndrome. The RCAa block (regulators of complement activation, 1q32) contains critical complement regulatory genes such as CR1 and MCP, and the epistasis was attributed to an interaction between C4 and its receptor, CR1, encoded within the RCAa block. Furthermore, interferon regulatory factor 5 (IRF5) gene variants on chr 7q32 were found to interact with the class I MHC locus in people with psoriasis104 and possibly in other autoimmune diseases.105

Most proteins encoded by the 133 protein coding genes within the MHC interact with proteins encoded by genes outside the MHC region. The protein and genetic interactions of the MHC genes listed in Table 1 can be accessed and viewed by way of the GeneID number. For example, the interaction data and online links for the MDC1 gene (GeneID: 9656), mediator of DNA damage checkpoint 1, which is required to activate the intra-S phase and G2/M phase cell cycle checkpoints in response to DNA damage, include information on the peptide or protein interactants, the interacting genes, the source databases (Human Protein Reference Database (HPRD) or BioGRID) and published references (PubMed). The 13 interactant entries listed for MDC1 at Entrez Gene are ATM, BRCA1, CHEK2, H2AFX, NBN, SMC1A, TP53, TP53BP1, CENPC1, CHEK2, GATA4, H2AFX and HDAC10; because CHEK2 and H2AFX each appear twice, these entries represent 11 distinct genes. In another example, the protein expressed by the CCHCR1 gene (GeneID: 54535), which has at least three splice variants, was found to promote steroidogenesis by interacting with STAR, the steroidogenic acute regulatory protein106 encoded by a gene on chr 8p (Table 6), which may be downregulated in psoriatic keratinocytes.107 Public online services for protein interaction datasets are also provided by BioGRID at http://www.thebiogrid.org/index.php and the HPRD at http://www.hprd.org/index_html. The knowledge extracted from protein interaction databases might help organize and analyze genome-wide studies more efficiently by revealing which gene interactions warrant epistatic investigation.
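Interaction lists aggregated from several databases can name the same partner more than once; in the MDC1 list above, CHEK2 and H2AFX each appear twice. A minimal sketch of collapsing records to distinct partners, assuming the records have already been exported as (gene, partner, source) tuples; that record layout is hypothetical and is not an Entrez Gene, HPRD or BioGRID format:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Record = Tuple[str, str, str]  # (gene, interactant, source): hypothetical layout

def distinct_partners(records: Iterable[Record]) -> Dict[str, Set[str]]:
    """Collapse interaction records into the set of distinct partners per gene."""
    partners: Dict[str, Set[str]] = defaultdict(set)
    for gene, partner, _source in records:
        partners[gene].add(partner)
    return dict(partners)

# The MDC1 interactant entries as listed in the text (13 entries).
listed = ["ATM", "BRCA1", "CHEK2", "H2AFX", "NBN", "SMC1A", "TP53", "TP53BP1",
          "CENPC1", "CHEK2", "GATA4", "H2AFX", "HDAC10"]
records = [("MDC1", partner, "Entrez Gene") for partner in listed]
mdc1 = distinct_partners(records)["MDC1"]
print(len(listed), "entries,", len(mdc1), "distinct partners")  # 13 entries, 11 distinct partners
```

Keeping the source field in each record (rather than discarding it) would let the same structure report how many databases support each interaction.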

MHC and genome-wide gene expression profiling

Most knowledge of MHC gene expression at the transcript and protein levels has come from individual gene studies (Table 1). However, in recent years, the development of genome-wide gene expression assays that include some or many of the MHC genes has provided a more global perspective on expression patterns in immune- and disease-related pathways. Gene expression profiling of normal and diseased cells and/or tissues using oligonucleotide, cDNA or genomic arrays has been a particularly successful by-product of genome sequence research. Global transcriptome studies are performed under various descriptive, experimental and disease conditions, and the data are often deposited into public databases, such as the Gene Expression Omnibus (GEO), that can be accessed online for review and/or reanalysis (http://www.ncbi.nlm.nih.gov/geo/).

Genome-wide gene expression data have permitted an examination and comparison of the mRNA profiles of genes both inside and outside the MHC region. For example, in our study of gene transcription patterns in the skin lesions of four Japanese patients with psoriasis vulgaris and three normal controls, we found that only seven MHC genes (LY6G6C, CDSN, TAP1, HLA-G, HLA-F, TUBB and CFB), out of approximately 90 MHC protein coding and non-coding genes represented on the HUG95A Affymetrix oligonucleotide array of 12 000 human genes, were significantly upregulated in the affected skin compared with normal skin; no statistically significant changes occurred in the expression of the classical HLA class I and II genes.108 The only MHC gene that was significantly downregulated in the psoriatic lesions was GABBR1. Most of the 263 genes significantly upregulated in the psoriatic skin were located outside the MHC region and were involved in interferon mediation, inflammation, immunity, cell adhesion, cytoskeleton restructuring, protein trafficking and degradation, RNA regulation and degradation, signal transduction, apoptosis, and atypical epidermal cellular proliferation and differentiation. Bioinformatics analysis of the genes significantly upregulated in psoriatic compared with normal skin, using the commercially available network analysis program MetaCore (Figure 3), showed that inflammation and cell cycle regulation were the two most significant molecular pathways involved in psoriasis, acting by way of the STAT and Myc gene regulatory systems as well as through the MHC genes HLA-G (interacting with KIR2DL4 and ILT2 on chr 19), DDR1 and TNF (MetaCore Applications (2007) http://www.genego.com/pdf/PsoriasisCS.pdf). The HLA-G locus was also recently found to interact with IRF5 gene variants on chr 7q32 in Swedish patients with psoriasis.104
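To show what "significantly upregulated" involves at the level of a single probe set, the sketch below computes a log2 fold change of mean signal and a Welch t statistic for lesion versus normal samples. The intensities are hypothetical, and this is a generic approach rather than the statistical procedure of the cited study; converting t to a P-value would additionally require a t-distribution function (for example from SciPy), which is omitted to keep the sketch dependency-free:

```python
from math import log2, sqrt
from statistics import mean, variance
from typing import Sequence, Tuple

def welch_t(x: Sequence[float], y: Sequence[float]) -> float:
    """Welch's t statistic for mean(x) - mean(y), not assuming equal variances."""
    return (mean(x) - mean(y)) / sqrt(variance(x) / len(x) + variance(y) / len(y))

def compare(lesion: Sequence[float], normal: Sequence[float]) -> Tuple[float, float]:
    """Return (log2 fold change of mean signal, Welch t) for one probe set."""
    return log2(mean(lesion) / mean(normal)), welch_t(lesion, normal)

# Hypothetical signal intensities: 4 lesional and 3 normal skin samples.
lesion = [820.0, 910.0, 760.0, 880.0]
normal = [210.0, 190.0, 240.0]
lfc, t = compare(lesion, normal)
print(f"log2 fold change = {lfc:.2f}, Welch t = {t:.1f}")
```

With thousands of probe sets tested at once, a multiple-testing correction would also be needed before calling any gene significant.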

Figure 3

The involvement of major histocompatibility complex (MHC) genes, HLA-G, DDR1 and tumor necrosis factor (TNF)-alpha, in the molecular pathways of psoriasis. The whole-genome microarray data of Kulski et al.108 were evaluated using the MetaCore software package to identify the molecular character and pathways involved in psoriasis. The MHC genes are highlighted by black squares. Red rectangles and orange ovals represent the genes involved in the inflammation and cell cycle regulation pathways (thick blue lines), respectively, and red circles represent overexpressed key transcription regulators. The figure was produced by MetaCore from GeneGo Inc. (St Joseph, MI, USA). The color reproduction of this figure is available on the html full text version of the manuscript.
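MetaCore's network and pathway scoring is proprietary and is not reproduced here. A simpler, widely used way to ask whether a gene set (a pathway, or the MHC itself) is over-represented among significantly changed genes is the one-sided hypergeometric test, sketched below. The counts in the usage lines are the approximate figures quoted in the text (about 12 000 genes on the array, about 90 MHC genes, 263 upregulated changes, 7 of them in MHC genes) and are used only for illustration; by chance alone one would expect about 263 × 90 / 12 000 ≈ 2.0 MHC genes among the upregulated set.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes on the array; K: genes in the set of interest (e.g. MHC genes);
    n: genes called significant; k: significant genes that fall in the set.
    """
    upper = min(K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)  # exact big-integer ratio, rounded once


expected = 263 * 90 / 12_000  # about 1.97 MHC genes expected by chance
p = hypergeom_upper_tail(7, 12_000, 90, 263)
print(f"expected {expected:.2f}, observed 7, P(X >= 7) = {p:.2g}")
```

Because the counts are approximate (and 263 refers to upregulated changes, which may not map one-to-one onto genes), the printed value should be read as an illustration of the calculation rather than as a result of the study.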

Other investigators have used similar microarray assays to identify the patterns of MHC and non-MHC gene transcription in skin lesions of patients with psoriasis,109, 110 atopic dermatitis111 and porokeratosis, a skin disorder of keratinization.112 Gene expression profiling of peripheral blood mononuclear leukocytes from psoriasis patients has been used to predict disease stage113, 114 and to examine treatment with therapeutic TNF and IFN-gamma antibodies.115 Leukocytes and/or lymphocytes express more than 75% of human genes and provide an alternative to tissue biopsies for studying the association between HLA gene activity and autoimmune diseases such as psoriasis, asthma, rheumatoid arthritis (RA) and SLE. A number of different MHC-related diseases, including SLE,114, 116 RA117, 118 and OA,119 have been investigated by gene expression profiling. For example, van der Pouw Kraan et al.120 used cDNA microarray technology to subclassify RA patients and to reveal different disease pathways in rheumatoid synovium. They found that, among the 121 genes overexpressed in one of the main tissue groups (RA-I) identified by hierarchical clustering of gene expression data, 9 genes from the MHC region were indicative of an adaptive immune response, whereas another group (RA-II) expressed genes suggestive of fibroblast dedifferentiation. Microarray analyses of peripheral blood cells from patients with psoriatic arthritis identified downregulation of genes involved in innate and acquired immune responses, as well as of genes from the PSORS1 (within the MHC) and PSORS2 susceptibility loci.121

Peripheral arterial occlusive disease (PAOD: OMIM 606787) is common in elderly patients as a result of atherosclerosis of the large and medium-sized peripheral arteries or the aorta, and often coexists with coronary artery disease and cerebrovascular disease. Recently, Fu et al.122 analysed 30 femoral arteries (11 with intermediate and 14 with advanced atherosclerotic lesions, and 5 normal femoral arteries) by genome-wide gene expression profiling using the Affymetrix microarray platform, and found that most of the MHC class II and complement molecules were significantly upregulated in the intermediate lesions but not in the advanced lesions. They concluded from their expression study that different immune and inflammatory responses occur at different stages of PAOD and of the development of atherosclerotic lesions. The MHC class II and complement gene activity was related in different ways to the enrichment of Toll-like receptor signaling and NK cell-mediated cytotoxicity pathways found in the intermediate and advanced atherosclerotic lesions.

HLA-wide gene expression profiling using the Affymetrix microarray platform also gives researchers an opportunity to determine the degree of positive and negative coordination between HLA and non-HLA gene expression in controlled experiments, in different cell and tissue types, and in population and disease studies. For example, Figure 4 shows the microarray expression profiles of some non-HLA genes from the class I region relative to the expression of the non-classical HLA class I genes, HLA-E, -F and -G, in established cell lines derived from different cancers, using data provided by The Cancer Genome Anatomy Project (http://cgap.nci.nih.gov/Genes). Figure 4 shows that FLOT1 was expressed at the highest levels in cancer cells derived from the CNS, whereas DDR1 and TRIM15 (alias Hs.591789) were expressed most strongly in the colon cancer cell lines. In comparison, the non-classical HLA class I genes were expressed most consistently, at moderate to high levels, in the cell lines derived from renal carcinomas. The variable expression of TRIM15 among the different cancer cell types is notable given its possible antiviral role in innate immunity.123, 124
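Reading Figure 4 this way amounts to asking, for each gene, which tumor type's cell lines show the highest average expression. The sketch below does that for a small table; the cell-line names and values are hypothetical placeholders, not CGAP data, and the output format of the CGAP BatchGeneFinder is not reproduced.

```python
from collections import defaultdict
from statistics import mean


def top_tissue_per_gene(expression, tissue_of):
    """Map each gene to the tissue whose cell lines have the highest mean value.

    expression: {gene: {cell_line: value}}; tissue_of: {cell_line: tissue}.
    """
    result = {}
    for gene, by_line in expression.items():
        groups = defaultdict(list)
        for line, value in by_line.items():
            groups[tissue_of[line]].append(value)
        result[gene] = max(groups, key=lambda t: mean(groups[t]))
    return result


# Hypothetical log2 expression values for six made-up cell lines.
tissue_of = {"CNS-1": "CNS", "CNS-2": "CNS", "COLON-1": "COLON",
             "COLON-2": "COLON", "RENAL-1": "RENAL", "RENAL-2": "RENAL"}
expression = {
    "FLOT1": {"CNS-1": 9.1, "CNS-2": 8.7, "COLON-1": 6.0,
              "COLON-2": 6.2, "RENAL-1": 7.0, "RENAL-2": 6.8},
    "DDR1": {"CNS-1": 5.0, "CNS-2": 5.5, "COLON-1": 9.4,
             "COLON-2": 9.0, "RENAL-1": 7.5, "RENAL-2": 7.1},
    "HLA-E": {"CNS-1": 7.2, "CNS-2": 7.0, "COLON-1": 6.9,
              "COLON-2": 7.1, "RENAL-1": 9.8, "RENAL-2": 9.5},
}
print(top_tissue_per_gene(expression, tissue_of))
```

A per-tissue maximum ignores spread within a tumor type; judging which genes are expressed "most consistently" in the renal lines would also need a dispersion measure such as the standard deviation.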

Figure 4

The relative expression of some human major histocompatibility complex (MHC) class I region genes in different cancer cell lines. The gene examples from the class I region are non-human leukocyte antigen (HLA) genes (DDR1, IER3, HCG18, PPP1R11, RPP21, DHX16, GTF2H4, GNL1, RNF39, TRIM31, Hs.591789 (TRIM15), FLOT1 and PPP1R10) and the non-classical HLA class I genes, HLA-E, -F and -G. The data are taken from The Cancer Genome Anatomy Project at the National Cancer Institute (USA), using the batch gene finder to retrieve the expression data for the selected genes of interest (query) from the NCI60_U133 gene list (Affymetrix platform). The image for the transcriptome analysis was produced online at http://cgap.nci.nih.gov/Genes/BatchGeneFinder using only the selected gene list shown in the image. The level of transcriptional activity in the cells ranges from the highest (red squares) to the lowest (blue squares) according to the color scale at the top left-hand side of the figure. The rectangular blocks labeled (a–e) within the matrix highlight the detection probes with relatively high expression levels of FLOT1 in central nervous system (CNS) cancer cells (a), DDR1 (b) and Hs.591789 (TRIM15) (c) in colon cancer cells, IER3 and HLA-E in melanoma (d), and the non-classical HLA class I genes in renal cancer cells (e). In the list of cancer tissues at the bottom of the matrix, ‘Leuk’ is leukemia and ‘P’ is prostate. The color reproduction of this figure is available on the html full text version of the manuscript.

Although a global picture of HLA and genome-wide gene expression in tissues and cells can be obtained by using a full set of Affymetrix GeneChips, combining CGH or SNP analysis with gene expression profiling is still a relatively new and demanding approach for the study of complex diseases. As a tool for improving functional genome research and disease association studies, CGH is particularly useful for detecting genomic sequence alterations or gene CNVs125, 126 that might be associated with disease. For example, CNVs of defensin genes on chr 8 were found to be strongly associated with Crohn's disease and with the skin disease psoriasis.127, 128 Studies of the effects of genomic alterations or CNVs on the expression of MHC genes are still limited, but a few recent reports suggest that this approach might yield important new insights into the interactions between the genes of the MHC and other genomic regions in disease. For example, Jiang et al.129 used cDNA microarrays to detect simultaneous genomic and expression alterations in prostate cancer, and implicated the dysregulation of exogenous antigen presentation through MHC class II, and of protein ubiquitination during ubiquitin-dependent protein catabolism, in the tumorigenic process. They found that the expression of the MHC genes ABCF1, HLA-DRB1 and HLA-A, located on chromosome 6p21, and of the MHC class II chaperone gene CD74, located on 5q32, was significantly downregulated, probably as a consequence of the CD74 gene deletion.
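The logic of integrating CGH with expression data, looking for genes whose copy number is reduced and whose expression falls accordingly, can be sketched as follows. This is not the analysis of Jiang et al.; the ±0.3 log2-ratio thresholds, the expression cutoff and all values are assumptions chosen for illustration. The helper `expected_log2_ratio` shows why such thresholds are usually set well above −1: a one-copy deletion gives log2(1/2) = −1 only in a pure tumor sample, and admixed normal cells pull the ratio toward 0.

```python
from math import log2


def expected_log2_ratio(tumor_copies, purity, normal_copies=2):
    """Expected CGH log2(test/reference) for a locus in a mixed tumor sample."""
    mixed = normal_copies * (1 - purity) + tumor_copies * purity
    return log2(mixed / normal_copies)


def copy_number_call(log2_ratio, loss=-0.3, gain=0.3):
    """Classify a CGH log2 ratio; the thresholds are illustrative assumptions."""
    if log2_ratio <= loss:
        return "loss"
    if log2_ratio >= gain:
        return "gain"
    return "neutral"


def concordant_losses(cgh, expr_ratio, max_ratio=0.5):
    """Genes called as copy-number loss whose tumor/normal expression is <= max_ratio."""
    return sorted(
        gene for gene, ratio in cgh.items()
        if copy_number_call(ratio) == "loss" and expr_ratio.get(gene, 1.0) <= max_ratio
    )


# Hypothetical values: a one-copy loss of CD74 in a sample with 60% tumor cells.
print(round(expected_log2_ratio(1, 0.6), 3))
cgh = {"CD74": expected_log2_ratio(1, 0.6), "HLA-DRB1": 0.05, "ABCF1": -0.02}
expr_ratio = {"CD74": 0.3, "HLA-DRB1": 0.4, "ABCF1": 0.45}
print(concordant_losses(cgh, expr_ratio))
```

In this toy example HLA-DRB1 and ABCF1 are downregulated without a copy-number loss, so they are not flagged; an indirect (trans) effect of the kind attributed above to the CD74 deletion would have to be tested separately.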

Genome tiling arrays, which use high-density oligonucleotide probes chosen uniformly from both strands of the entire genome, including all genic and intergenic regions, are another improving methodology that appears useful for future investigations of MHC epigenetics,130 SNPs,7 gene–gene interactions131 and gene expression activity132 both inside and outside the MHC genomic region. Genome-wide protein profiling (proteomics) using chips, arrays or high-throughput mass spectrometry is a rapidly emerging technology in disease and diversity studies for screening protein activities such as protein–protein, protein–DNA, protein–drug and protein–peptide interactions, for identifying enzyme substrates and for profiling immune responses.133, 134 Some of these procedures have been applied specifically to MHC gene functions, particularly to detect and characterize antigen-specific T-cell populations in disease,135 to examine HLA protein–peptide (antigen) interactions,136 to identify autoantibody/autoantigen targets137, 138 and to profile other immune responses.139 Bioinformatic and statistical algorithms are continually being developed to integrate DNA variation, transcription and phenotypic data, to provide a systems genetics view of disease, to improve the identification of associations between DNA variation and disease, and to characterize those parts of the molecular networks that drive disease.140
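The probe layout of a tiling array can be sketched as evenly spaced windows taken from both strands of a reference sequence. The 25-nt probe length and 35-bp step used as defaults below are assumptions resembling some commercial designs, not the specifications of any array cited here, and real designs also mask repeats and filter probes, which this sketch omits.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (minus strand, 5'->3')."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_probes(seq, probe_len=25, step=35):
    """Yield (start, strand, probe) for evenly spaced probes on both strands.

    start is the 0-based plus-strand coordinate of the probe's footprint;
    probe_len and step are assumed values, not a vendor specification.
    """
    for start in range(0, len(seq) - probe_len + 1, step):
        window = seq[start:start + probe_len]
        yield start, "+", window
        yield start, "-", reverse_complement(window)


# Toy sequence with short probes so the output stays readable.
demo = "ATGCGTACCGTTAGGCTAACG"
for start, strand, probe in tile_probes(demo, probe_len=8, step=6):
    print(start, strand, probe)
```

At a 35-bp step, a 4-Mb region such as the MHC would need roughly 4 000 000 / 35 ≈ 114 000 probe positions per strand.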

MHC and disease associations

The main function of the MHC gene region is to protect the organism against harmful infectious agents (to recognize and deal with foreign organisms and antigens) and to dispose of damaged, dying or infected cells and tissues. The extremely high levels of polymorphism and heterozygosity within the MHC genomic region give the immune system a selective advantage against diverse and variable pathogens. However, the high level of polymorphism and mutation in the MHC carries the added risk of autoimmune diseases and other genetic disorders. Several hundred autoimmune and infectious diseases have been associated with the MHC since the first report in 1967 that HLA-B antigens were increased in frequency in patients with Hodgkin's lymphoma.141 At least another 40 different autoimmune diseases had been linked to specific HLA types by the end of 1986.142, 143 In an update on the role of the MHC genes in disease, Shiina et al.31 presented an overview of 109 HLA-associated diseases. A search of PubMed at NCBI in September 2008 with the keywords ‘human MHC (or HLA) gene disease’ listed 3151 journal publications on the subject of HLA and disease. A search of the Genetic Association Database (GAD) (http://geneticassociationdb.nih.gov/cgi-bin/index.cgi) with ‘HLA’ as a keyword found 500 journal publications on HLA gene association and disease between 1999 and 2007. However, the statistical, biological and medical significance of many of the MHC disease association studies remains unclear or doubtful.

A number of recent reviews are available on HLA and infections,12, 144, 145, 146 as well as HLA and autoimmune diseases,11, 31, 32, 147, 148, 149, 150 and will not be considered in any detail here. OMIM is a database of human genes and genetic disorders that provides information and references on the discoverers, chromosomal location, molecular functions, mutations and associations between the genes and disease.151 There are at least 100 OMIM identifiers concerning the HLA region loci, mostly of expressed genes, that can be accessed through http://www.ncbi.nlm.nih.gov/ or through links from other sites, including Entrez Gene database at NCBI.36

The 31 HLA disease associations listed in Table 7, sourced from the OMIM database,152 are examples of HLA-associated diseases that have a strong experimental or statistical association with reasonable reproducibility. At least 26 of these diseases have been associated with non-HLA genes encoded within the MHC, with the regulatory cytokine genes TNF and LTA contributing to a large number of disease associations by way of mutations or polymorphisms within the gene promoter or coding regions that might affect expression levels.153, 154, 155, 156 Ten of the diseases appear to be monogenic, owing to mutations within one of the MHC genes. Adrenal hyperplasia is now well accepted to be the consequence of 21-hydroxylase deficiency and alterations in the CYP21A2 gene.157 Some of the CYP21A2 gene alterations may arise through the transfer of sequences to CYP21A2 from the neighboring non-coding CYP21A1P pseudogene by gene conversion.158 It is also well accepted that mutations within the NEU1 gene are responsible for neuraminidase deficiency and sialidosis, which is characterized by the progressive lysosomal storage of sialylated glycopeptides and oligosaccharides,159 and that C2 mutations cause C2 deficiency, impairing the complement cascade.160 Of the 21 multifactorial diseases listed in Table 7, 11 (type I diabetes (T1D), inflammatory bowel disease, multiple sclerosis (MS), AITD, PV, RA, celiac disease (CD), ankylosing spondylitis (AS), SLE, juvenile RA (JRA) and vitiligo (VIT)) were linked most significantly to the HLA region in a recent meta-analysis of 42 independent genome-wide linkage studies.161 In a recent genome-wide association study of seven common diseases using SNP markers, the MHC associations were strongest for RA and T1D, moderate for Crohn's disease, and weak or absent for bipolar disorder, coronary artery disease, hypertension and type II diabetes.162 In another recent review and pooled analysis of the MHC in autoimmunity, a number of overlapping HLA class II and TNF alleles and haplotypes were associated with MS, T1D, SLE, ulcerative colitis (UC), Crohn's disease and RA.11

Table 7 MHC monogenic and polygenic disease associations

Most of the 21 multifactorial diseases listed in Table 7 are polygenic, with a few specified or unspecified MHC gene alleles possibly interacting in undetermined ways with other genes inside and/or outside the MHC region. The exact MHC genes involved in many of the diseases are still not clearly defined. For example, the association of an HLA genomic region with the onset or maintenance of psoriasis is definite, but which of a number of MHC candidate genes (or combination of genes) lying between the MICA and CDSN loci is responsible remains uncertain.39, 58, 163, 164, 165, 166, 167, 168, 169

Only a few autoimmune diseases have been related solely to the classical class I and II alleles, in spite of the continuing dogma that disease associations are caused by altered or faulty peptide presentation to T cells by polymorphic class I and II gene products. AS is primarily attributed to HLA-B27, with minor associations such as HLA-Cw1 and -Cw2 or HLA-DR7 considered secondary because of LD or a hitchhiking effect. Similarly, HLA-B51 continues to be strongly associated with Behcet syndrome,170 although other chromosomal regions may be involved.171 In Caucasian populations of Northern European descent, the DR15 haplotype (DRB1*1501-DQA1*0102-DQB1*0602) is hypothesized to be the primary HLA genetic susceptibility factor for MS. Experiments with transgenic mice have confirmed the importance of DRB5*0101 and DRB1*1501 allelic interactions in producing a mild form of MS-like disease,97 but more severe forms probably depend on other genes,172 such as T-cell receptor beta, CTLA4, ICAM1 and SH2D2A. Schmidt et al.149 reviewed 72 publications on the HLA association with MS and found that most investigators reported a higher frequency of the DR15 haplotype and/or its component alleles in MS cases than in controls, but the results may have been biased by poor study designs.
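Why a marker allele can show a 'secondary' association through LD becomes clearer from the standard pairwise LD measures. The sketch below computes D, |D′| and r² for two alleles at two biallelic loci from a haplotype frequency and the two allele frequencies; the frequencies in the example are hypothetical and do not describe HLA-B27 or any Cw or DR allele.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for alleles A and B at two biallelic loci.

    p_ab: frequency of haplotypes carrying both A and B;
    p_a, p_b: allele frequencies, assumed strictly between 0 and 1.
    """
    d = p_ab - p_a * p_b  # departure from independence
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max  # D scaled by its maximum possible value
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical: allele A (freq 0.4) and allele B (freq 0.5) share 30% of haplotypes.
print(ld_stats(0.3, 0.4, 0.5))
```

Here |D′| = 0.5 but r² ≈ 0.17. If B were the only causal allele, a commonly used rule of thumb is that detecting the same signal through A needs roughly 1/r² times the sample size (about six-fold here), which is one reason secondary associations tend to appear weaker than the primary one.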

Owing to the difficulty of identifying a single MHC gene that is responsible for a disease, some researchers prefer to examine the association between MHC haplotypes and disease susceptibility and resistance.46 Common Caucasian MHC haplotypes may be accounted for by a limited number of ancestral haplotypes defined by the alleles of five or more gene loci.173 The MHC ancestral haplotype (AH) 8.1, characterized by the alleles HLA-A*01, -B*08, -DRB1*03, -DQB1*02 and -DQA1*05, has been dubbed the ‘autoimmune haplotype’ because of its association with numerous autoimmune diseases, including T1D, CD, Graves’ disease, SLE and myasthenia gravis (MG).174 The complete MHC genomic sequences of eight haplotypes involved in autoimmune diseases, including the 8.1 AH, have been published.7 In this regard, Shiina et al.3 proposed, on the basis of comparative genomics between human haplotype sequences and the sequences of the chimpanzee and rhesus macaque, that the rapid evolution of the MHC class I genes in primates is likely to have generated new disease alleles in humans through hitchhiking diversity.
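Comparing a typed haplotype with an ancestral haplotype such as AH 8.1 is, at its simplest, a locus-by-locus comparison of allele groups. The sketch below encodes the five AH 8.1 alleles listed above and compares only the first two digits of each typed allele, so that both older ('0801') and colon-separated ('08:01') notations work. This matching rule and the example typing are hypothetical simplifications: typed alleles are assumed to be given without the locus name or '*', and ancestral haplotypes may be defined by additional loci not included here.

```python
# AH 8.1 allele groups as listed in the text.
AH_8_1 = {
    "HLA-A": "01",
    "HLA-B": "08",
    "HLA-DRB1": "03",
    "HLA-DQB1": "02",
    "HLA-DQA1": "05",
}


def compare_to_haplotype(typed, reference=AH_8_1):
    """Return (matching loci, mismatching loci) at the two-digit allele-group level."""
    matches, mismatches = [], []
    for locus, group in reference.items():
        allele = typed.get(locus, "")  # an untyped locus counts as a mismatch
        (matches if allele[:2] == group else mismatches).append(locus)
    return matches, mismatches


# Hypothetical typing of one haplotype, mixing old and colon notation.
typed = {"HLA-A": "0101", "HLA-B": "08:01", "HLA-DRB1": "0301",
         "HLA-DQB1": "0201", "HLA-DQA1": "0501"}
print(compare_to_haplotype(typed))
```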

The results of MHC disease association studies are complicated by differences in race and population, the influence of LD, the extensive polymorphism, copy number and InDel variation between different MHC haplotypes, disease severity, and the need for large sample numbers to provide statistical significance. Fernando et al.11 noted in their review of six autoimmune diseases with genetically complex traits that nearly all association studies of the MHC in autoimmune and inflammatory disease have been limited to a subset of 20 genes and performed only in small cohorts of predominantly European origin. As highlighted in a recent review,5 the MHC association with complex disease phenotypes depends on the HLA and non-HLA genes, the genetic code (SNPs, CNVs, InDels and inversions), the epigenetic code (DNA methylation and histone modification), biological effects (structural and biochemical changes in gene products and transcriptional regulation) and environmental factors (diet and antigen exposure). Modern HLA and whole-genome association studies of SNPs, microsatellites, InDels and CNVs are now broadening toward elucidating gene interactions, epistasis, risk and penetrance of autoimmune diseases,162 although clear-cut results are often hampered by statistical type I errors (false positives, whose expected number grows with multiple testing) and type II errors (false negatives owing to insufficient sample numbers and other factors). Whole-genome gene expression studies combined with DNA variation and phenotypic data in a single systematic study have greater potential for elucidating disease pathways and for dissecting the roles of individual genes and genomic loci, such as the HLA super-locus, that interact in a molecular network. Such studies are still in their infancy, and much experimentation may be needed to overcome the potential data overload as we move rapidly toward a systems genetics view of disease.140
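The multiple-testing problem mentioned above is easy to quantify: if m independent markers are each tested at α = 0.05 and none is truly associated, about 0.05m false positives are expected (50 for 1000 markers). Two standard corrections are sketched below, Bonferroni (controlling the family-wise error rate) and Benjamini–Hochberg (controlling the false discovery rate); neither is claimed to be the method used in the studies cited here, and the p-values are hypothetical.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate).

    Finds the largest rank k with p_(k) <= k * q / m and rejects the k
    smallest p-values.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


# Hypothetical p-values from ten marker-disease association tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(sum(bonferroni(pvals)), sum(benjamini_hochberg(pvals)))  # 1 2
```

Of these ten hypothetical tests, five fall below an unadjusted 0.05 threshold, Bonferroni keeps one and Benjamini–Hochberg keeps two: stricter control of type I errors is paid for with more type II errors, which is why adequate sample sizes matter.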

HLA and cancer

The loss of HLA gene expression owing to viral infection, somatic mutations or other causes may have important effects on immune suppression and cancer development.175 To identify the molecular mechanisms involved in the maintenance of Epstein–Barr virus (EBV)-associated epithelial cancers, Sengupta et al.176 performed genome-wide expression profiling of all human genes and all latent EBV genes in a collection of 31 laser-captured, microdissected nasopharyngeal carcinoma (NPC) tissue samples and 10 normal nasopharyngeal tissues. They determined that the expression of all the HLA class I genes and of the TAP2 and HCG9 genes, which are involved in regulating the immune response through antigen presentation, correlated negatively with increased EBV gene expression in NPC. They concluded either that antigen display is directly inhibited by EBV, facilitating immune evasion by tumor cells, and/or that tumor cells were selected for their EBV oncogene-mediated tumor-promoting actions. Global gene expression profiling of human papillomavirus (HPV)-positive and -negative head and neck cancers revealed significant downregulation of two MHC genes, CDSN and LY6G6C, but not of other MHC genes, in HPV-16-positive head and neck squamous cell carcinomas.177

Non-viral tumors also frequently lose expression of HLA molecules; in colorectal carcinoma, for example, HLA expression is reduced or totally lost.178 Without adequate MHC signaling from tumor cells, immune effector cells may fail to function, with the exception of NK cells, which can recognize MHC class I-negative tumor cells. Furthermore, soluble MHC class I-related (MIC) molecules play important roles in tumor immune surveillance through their interaction with the NKG2D receptor on NK, NKT and cytotoxic T cells.179, 180 Interestingly, genome-wide expression profiling has shown that non-steroidal anti-inflammatory drug (NSAID) treatment upregulated HLA class II genes in tumor tissue, but not in normal colon tissue, from the same patient.181 In total, 23 of the 100 most upregulated genes belonged to MHC class II: HLA-DM and -DO (peptide loading) and HLA-DP, -DQ and -DR (antigen presentation). Genes associated with CD4+ T-helper cells were also upregulated, whereas HLA-A and -C expression was not increased by NSAID treatment.

In breast cancer, metastasis may be suppressed in part by the activity of the breast cancer metastasis suppressor 1 (BRMS1) gene, which can block the development of metastasis without preventing tumor growth. In a comparison of gene expression patterns in BRMS1-expressing vs non-expressing human breast carcinoma cells, BRMS1 expression in 435/BRMS1 cells was strongly correlated with increased expression of the MHC genes HLA-DQB1, HLA-DRB1, HLA-DRB5, HLA-DMB, HLA-DQA1, HLA-DPA1, HLA-DRA, HLA-DRB4, HLA-DMA, HLA-B, HLA-C and HLA-F, and of the complement gene C1S.182 Thus, the induction of MHC class I and II genes may be one mechanism by which 435/BRMS1 cell populations are kept small, that is, by triggering an immune response that eliminates or reduces their metastatic potential.

In an interesting paper by Rimsza et al.,183 gene expression profiling data were used to correlate the expression levels of MHCII genes with each other and with their transcriptional regulator, CIITA (16p13), in 240 cases of diffuse large B-cell lymphoma from the LLMPP data set. A correlation map was created for the expression of genes that are telomeric to (HSPA1L, HSPA1A, BAT8, RDBP, CREBL1 and PBX2), within (MHCII genes, TAP1, TAP2, PSMB9 and BRD2) or centromeric to (RXRB, RING1, RPS18, TAPBP, DAXX and BAK1) the MHCII locus. Correlation coefficients among MHCII genes were high (0.73–0.92), whereas those between adjacent and intervening genes were low (0.12–0.49). The authors concluded that the loss of MHCII expression in diffuse large B-cell lymphoma at non-immune-privileged sites is highly coordinated and is not due to chromosomal deletions or rearrangements. Furthermore, Dave et al.184 showed that gene expression profiling of MHC and non-MHC genes is an accurate, quantitative method for distinguishing Burkitt's lymphoma carrying the t(8;14) c-myc translocation from diffuse large B-cell lymphoma. Burkitt's lymphoma was readily distinguished from diffuse large B-cell lymphoma by high-level expression of c-myc target genes and low-level expression of all the MHC class I genes.

Conclusion

The human MHC genomic region is a super-locus composed of at least 250 coding and non-coding genes, the structural organization of which has evolved gradually through various mutation, duplication, deletion and genomic rearrangement events over a period of 450–520 Myr, at least since the emergence of sharks (phylum Chordata, subphylum Vertebrata and class Chondrichthyes). There remains strong and growing research interest in haplotyping the entire human MHC genomic region by genomic resequencing for SNP, InDel and CNV analysis. MHC genomic analysis was the prototype for many of the current procedures in genome-wide research, such as haplotyping, SNP and microsatellite analysis, and LD analysis for studies of human population diversity and disease association. The MHC genomic region is now part of the global systems analysis and network programs involved in the storage and dissemination of data on genome-wide gene expression at the level of the proteome, transcriptome, metabolome and phenotome, on system and immune pathways, and on disease associations using SNPs, InDels and microsatellites as genomic markers or haplotype tags for statistical analysis. The degree and type of coordinated gene expression across the whole MHC have yet to be fully defined and understood in normal physiology, in inflammatory and immune responses, and in autoimmune, chronic and infectious diseases. The field of MHC genomic research will clearly continue to expand with the development of new procedures and studies aimed at a better understanding of intra- and extra-MHC gene interactions and their effects on human diversity and disease.

Website references

http://www.ncbi.nlm.nih.gov/sites/entrez Entrez Gene database

http://www.ncbi.nih.gov/entrez/query.fcgi?db=OMIM OMIM: Online Mendelian Inheritance in Man

http://www.repeatmasker.org/ RepeatMasker program

http://www.ebi.ac.uk/imgt/hla/ IMGT/HLA database: ImMunoGeneTics/HLA Sequence Database

http://espressosoftware.com/pages/sputnik.jsp Sputnik program

http://www.ncbi.nlm.nih.gov/gv/mhc/main.fcgi?cmd=init dbMHC database

http://www.ncbi.nlm.nih.gov/SNP/ dbSNP database

http://projects.tcag.ca/variation/ Database of Genomic Variants

http://www.thebiogrid.org/index.php BioGRID: General Repository for Interaction Datasets

http://www.hprd.org/index_html HPRD: Human Protein Reference Database

http://www.ncbi.nlm.nih.gov/geo/ GEO: Gene Expression Omnibus

http://www.genego.com/pdf/PsoriasisCS.pdf MetaCore Applications

http://cgap.nci.nih.gov/Genes The Cancer Genome Anatomy Project

http://geneticassociationdb.nih.gov/cgi-bin/index.cgi GAD: Genetic Association Database