Introduction

Psoriasis is a common immune-mediated chronic inflammatory disease characterized by hyperplasia, altered proliferation and differentiation of keratinocytes, vascular remodelling and inflammation in the skin, with musculoskeletal inflammation of the joints observed in up to 30% of patients (ref. 1). The prevalence of psoriasis varies across populations: 2.5% in Europeans, 0.05–3% in Africans and 0.1–0.5% in Asians (refs 2–4). Phenotypic heterogeneity among ethnic populations has also been reported, for example in clinical manifestation, response to treatment and disease progression (refs 3–6).

Psoriasis has a strong genetic predisposition, with heritability estimated at up to 80% (refs 1, 7). Forty-one susceptibility loci have been identified at genome-wide significance (P<5 × 10⁻⁸), mainly through genome-wide association studies (GWASs) (refs 8–16), but these loci collectively explain only a limited fraction of the heritability of psoriasis (refs 16, 17). In addition, a few loci show differential associations between ethnic populations, suggesting potential genetic heterogeneity of psoriasis, although most loci have not been systematically studied beyond the population in which they were originally discovered (ref. 2).

Recently, trans-ethnic genome-wide meta-analyses (GWMA) of GWAS data sets from diverse populations have been performed to identify additional susceptibility loci and to dissect validated loci through fine-mapping analysis, helping to better understand the genetic architecture of complex diseases (refs 18–26). However, no trans-ethnic GWMA across diverse populations has yet been performed for psoriasis, and the genetic heterogeneity of psoriasis across ancestries has not been systematically investigated.

Here, we describe a large-scale trans-ethnic GWMA of psoriasis using multiple GWAS data sets as well as independent validation samples from Caucasian and Chinese populations. We identify four novel associations at LOC144817, COG6, RUNX1 and TP63, as well as three novel secondary associations within IFIH1 and IL12B. In addition, our study suggests population-specific effects or allelic heterogeneity at 11 loci, and indicates that these population-specific effects contribute significantly to the ethnic differences in psoriasis prevalence. Together, these results provide novel biological insights into the involvement of immune and keratinocyte development mechanisms, and demonstrate a complex and heterogeneous genetic architecture of psoriasis susceptibility across ethnic populations.

Results

Genome-wide Meta-Analysis

We carried out a GWMA of psoriasis using seven independent GWAS data sets, including five published ones (four Caucasian and one Chinese; refs 11, 15, 16) and two unpublished ones (one Caucasian and one Chinese; Table 1). The combined data set consists of 5,084 psoriatic cases and 8,732 controls: 3,496 cases and 5,186 controls from the Caucasian population and 1,588 cases and 3,546 controls from the Chinese population. Because these seven data sets were generated on different genotyping platforms, whole-genome imputation was performed in each data set separately, using the 1000 Genomes Project data (Phase I, Dec. 2010 version) as the reference panel, both to combine the data sets and to increase the coverage of genetic variants for association analysis. After applying stringent quality control (QC), we tested 4,778,154, 4,562,294 and 3,621,551 single-nucleotide polymorphisms (SNPs) for association in the Caucasian, Chinese and combined cohorts, respectively (Table 1). Both the quantile–quantile plots and the genomic inflation factors (λGC) of the genome-wide test statistics (1.017 for the Caucasian cohort, 1.01 for the Chinese cohort and 1.011 for the combined cohort) showed that inflation due to population stratification was negligible in all three genome-wide association analyses (Table 1 and Supplementary Fig. 1). Meanwhile, after removing the SNPs within the 41 known loci, the quantile–quantile and Manhattan plots of the three genome-wide analyses still showed an excess of SNPs with small P values (P<1 × 10⁻⁴ from logistic regression analysis in the individual ethnic and combined data sets), suggesting that additional associations of more modest effect may be identified with further validation (Fig. 1 and Supplementary Fig. 1).
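The genomic inflation factor summarises how far the observed test statistics deviate from their null expectation. As an illustration only, and not the authors' pipeline, the sketch below computes λGC as the median 1-df χ² statistic divided by its expected null median (about 0.455), recovering each χ² value from a two-sided P value. It also shows the common rescaling to an equivalent study of 1,000 cases and 1,000 controls (λ1,000). Function names are hypothetical, and P values are assumed to lie in (0, 1].

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 df equals (z_0.75)^2, about 0.4549
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def lambda_gc(p_values):
    """Genomic inflation factor from two-sided P values (each assumed to be in (0, 1])."""
    # Recover each 1-df chi-square statistic as z^2, z being the two-sided normal quantile
    chi2 = [_STD_NORMAL.inv_cdf(1 - p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


def lambda_1000(lam, n_cases, n_controls):
    """Rescale lambda to the value expected for 1,000 cases and 1,000 controls."""
    return 1 + (lam - 1) * (1 / n_cases + 1 / n_controls) / (1 / 1000 + 1 / 1000)
```

For example, evenly spaced null P values give `lambda_gc` close to 1, and `lambda_1000(1.05, 2000, 3000)` returns a value below 1.05 because the hypothetical study is larger than the 1,000/1,000 benchmark.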
All known loci showed supporting evidence of association at nominal significance (P<0.05) in at least one genome-wide analysis, except CSMD1 and SERPINB8, which showed no association, likely because of the insufficient statistical power (<30%) of the current study to detect their effects (Supplementary Data 1).

Table 1 Description of study samples.
Figure 1: Manhattan plot of single SNP association test results.

(a) Chinese GWAS meta-analysis with 1,588 cases and 3,546 controls; (b) Caucasian GWAS meta-analysis with 3,496 cases and 5,186 controls; (c) trans-ethnic GWAS meta-analysis with 5,084 cases and 8,732 controls. The x axis indicates the chromosomal position. The y axis indicates the −log10 P values of genome-wide SNP associations from each GWAS meta-analysis using logistic regression. The horizontal green line represents the genome-wide significance threshold of P=5.0 × 10⁻⁸. Blue dots indicate SNPs within the 41 known psoriasis risk loci; red dots indicate SNPs within the four new psoriasis risk loci; grey dots indicate SNPs outside the 45 psoriasis risk loci.

Validation Analysis of Novel Associations

We next sought to validate the novel associations suggested by the GWMA. Forty-three novel SNPs with suggestive association (P<5.00 × 10⁻⁵ from logistic regression analysis) in at least one of the three discovery analyses (Caucasian, Chinese and combined) were genotyped in two additional independent validation cohorts totalling 10,285 psoriatic cases and 10,785 controls, comprising 5,134 cases and 5,633 controls of Caucasian ancestry and 5,151 cases and 5,152 controls of Chinese ancestry (Table 1 and Supplementary Data 2). In the Caucasian samples, four novel associations were confirmed at genome-wide significance with consistent effects between the discovery and validation samples: rs9533962 (odds ratio (OR)=1.14, P=1.93 × 10⁻⁸, LOC144817), rs34394770 (OR=1.16, P=2.65 × 10⁻⁸, COG6), rs8128234 (OR=1.17, P=3.74 × 10⁻⁸, RUNX1) and rs28512356 (OR=1.17, P=4.31 × 10⁻⁸, TP63; Table 2 and Supplementary Figs 2 and 3). In the Chinese samples, all four SNPs showed effects in the same direction, but only rs34394770 and rs9533962 reached nominal significance (P=4.63 × 10⁻² and 2.25 × 10⁻⁴, respectively). Both SNPs reached genome-wide significance in the combined cohort (P=4.87 × 10⁻⁸ and 7.53 × 10⁻¹¹, respectively; Table 2 and Supplementary Figs 2 and 3). However, the other two novel SNPs (rs8128234 and rs28512356) did not reach nominal significance in the Chinese samples, although these samples had sufficient statistical power (>90%) to detect the effects observed in the Caucasian population and were adequately powered to detect ORs as small as 1.15–1.25 (Supplementary Data 1 and Supplementary Table 1). None of the other SNPs selected from the discovery analyses of the Chinese and combined samples was validated.
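Statements such as ">90% power to detect the effects observed in the Caucasian population" rest on a power calculation for a case–control test. The sketch below is a minimal approximation under stated assumptions (a per-allele multiplicative model, an allelic Wald test on log(OR) that treats the two alleles of each individual as independent, and a two-sided α). It is not necessarily the calculation behind Supplementary Table 1, and the risk-allele frequency used in the example is hypothetical.

```python
import math
from statistics import NormalDist

_N = NormalDist()


def allelic_power(odds_ratio, control_raf, n_cases, n_controls, alpha=0.05):
    """Approximate power of a two-sided allelic test for a given per-allele OR.

    Assumes a multiplicative model and counts two alleles per individual.
    """
    # Risk-allele frequency in cases implied by the control frequency and the OR
    odds_cases = odds_ratio * control_raf / (1 - control_raf)
    case_raf = odds_cases / (1 + odds_cases)
    a_case, a_ctrl = 2 * n_cases, 2 * n_controls
    # Large-sample standard error of log(OR) from the expected 2x2 allele table
    se = math.sqrt(
        1 / (a_case * case_raf) + 1 / (a_case * (1 - case_raf))
        + 1 / (a_ctrl * control_raf) + 1 / (a_ctrl * (1 - control_raf))
    )
    shift = abs(math.log(odds_ratio)) / se
    z_crit = _N.inv_cdf(1 - alpha / 2)
    # Probability that |Z| exceeds the critical value when Z ~ N(shift, 1)
    return _N.cdf(shift - z_crit) + _N.cdf(-shift - z_crit)
```

For instance, `allelic_power(1.15, 0.3, 5151, 5152)` evaluates the Chinese validation sample size for a hypothetical SNP with a 30% risk-allele frequency; with OR=1 the function returns α, as a power function should.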

Table 2 Meta-analysis results of the 4 novel psoriasis susceptibility loci.

Given that all the newly discovered SNPs are noncoding variants, we investigated their potential regulatory functions using information from the HaploReg (v2) and eQTL databases. The four novel SNPs were enriched in enhancers of epidermal keratinocytes and resided in known DNase I hypersensitive sites in multiple cell lines, including normal human epidermal keratinocytes (Supplementary Table 2). In addition, three SNPs linked to rs34394770 showed eQTL effects on COG6 in monocytes (Supplementary Table 3).

We also searched for independent secondary associations within the 45 previously and newly confirmed susceptibility loci in the discovery samples, using conditional logistic regression analysis with the leading SNP within each locus as a covariate (Fig. 2 and Supplementary Figs 3–5). First, we discovered three independent associations within the IL12B locus in the combined discovery samples: two novel SNPs, rs4921493 (Pcondition=6.80 × 10⁻¹³) and rs2853694 (Pcondition=1.22 × 10⁻⁹), and the previously reported rs7709212 (OR=1.38, P=2.44 × 10⁻³⁰). Their independent effects were also confirmed by multivariate analysis (Supplementary Table 4a). The three SNPs are in low or moderate linkage disequilibrium (LD) (Supplementary Fig. 3e and Supplementary Table 4b) and showed consistent associations between the Caucasian and Chinese samples (Supplementary Table 4a). Haplotype analysis also supported independent effects of the three SNPs (Supplementary Table 4c). Second, we discovered a novel independent SNP within the IFIH1 locus. In the Caucasian samples, conditioning on the top SNP rs1990760 (OR=1.20, P=3.21 × 10⁻¹³) revealed a novel independent association at rs3747517 (ORcondition=0.75, Pcondition=2.87 × 10⁻¹¹), although rs3747517 showed no association on its own (OR=1.04, P=0.15; Supplementary Fig. 3f and Supplementary Table 5a). Consistently, in the Chinese cohort rs3747517 was significantly associated (ORcondition=0.80, Pcondition=5.16 × 10⁻⁹) only after conditioning on rs1990760 (Supplementary Fig. 3f and Supplementary Table 5a). The interactive effect between the two SNPs was confirmed by haplotype association analysis, in which only haplotype 2, carrying the C alleles of both SNPs, showed a strong protective effect (Supplementary Table 5b). In addition, conditioning on rs1990760 abolished the two previously reported independent associations at rs2111485 and rs17716942 (Supplementary Table 5a).
These results suggest that both rs1990760 and rs3747517 tag the association effect of this haplotype in the Caucasian and Chinese populations, and that the true causal variant carried by this haplotype remains to be discovered. We also confirmed the previously reported independent associations at rs2910686 and rs30376 within the ERAP1/ERAP2 locus (ref. 16) in the Caucasian GWAS samples (Supplementary Table 6 and Supplementary Fig. 3g). However, we did not detect the independent association of rs2910688 (OR=0.99, P=0.67) in the Chinese GWAS samples, although there was sufficient power to detect an effect of the same size (power=0.98 at P=0.05). Another six suggestive secondary associations failed to be validated or to reach genome-wide significance (Supplementary Data 3).

Figure 2: The regional association plots of ZMIZ1 showing locus heterogeneity.

The relative location of annotated genes and the direction of transcription are shown in the lower portion of the figure, and the chromosomal position is shown on the x axis. The blue line shows the recombination rate (estimated from HapMap data of the CEU, CHB and combined populations) across the region (right y axis), and the left y axis shows the significance of the SNP associations (−log10 P). Squares indicate the SNPs used for conditional analysis (the top or secondary SNPs); circles labelled with rs IDs are reported psoriasis susceptibility SNPs. All circles and squares are colour-filled according to the heterogeneity statistic (I²) from our trans-ethnic meta-analysis. (a,b) Unconditional and conditional logistic association results in Caucasians; (c,d) unconditional and conditional logistic association results in Chinese.

Fine Mapping Analysis of HLA Associations

To better understand the strong and extensive association within the major histocompatibility complex (MHC) region (chr6: 20–40 Mb, build 36; Supplementary Fig. 5), we performed a fine-mapping analysis of the region by imputing classical HLA alleles, coding variants of HLA molecules and untyped SNPs in the Caucasian and Chinese discovery cohorts separately. As expected, HLA-C*06:02 showed the strongest association within the region in both the Caucasian (P=3.05 × 10⁻¹⁶⁷) and Chinese (P=2.09 × 10⁻¹⁰¹) cohorts. In the Caucasian cohort, stepwise conditional analysis revealed additional independent associations at the penta-allelic amino-acid polymorphism at position 67 of HLA-B (HLA-B_AA_67; Pcondition=4.31 × 10⁻²⁵ after conditioning on HLA-C*06:02), at HLA-A*02:01 (Pcondition=3.62 × 10⁻¹⁰ after conditioning on both HLA-C*06:02 and HLA-B_AA_67) and at rs9265656 (Pcondition=5.27 × 10⁻⁹ after conditioning on these three HLA variants; Supplementary Table 7). rs9265656 tags (r²=0.78) the classical allele HLA-B*07 (Pcondition=1.13 × 10⁻⁵ after conditioning on the three HLA variants above) and did not show any eQTL effect. In the Chinese cohort, stepwise conditional analysis revealed additional independent associations at HLA-A*02:07 (Pcondition=5.30 × 10⁻⁴⁰ after conditioning on HLA-C*06:02), at amino-acid position 67 of HLA-B (Pcondition=8.96 × 10⁻²⁵ after conditioning on HLA-C*06:02 and HLA-A*02:07), at the quad-allelic amino-acid position 114 of HLA-A (Pcondition=6.04 × 10⁻¹³ after conditioning on these three HLA variants) and at rs3131857 (Pcondition=9.76 × 10⁻⁸ after conditioning on all four HLA variants; Supplementary Table 7). rs3131857 tags (r²=0.72) the bi-allelic amino-acid polymorphism at position 144 of HLA-A (HLA-A_AA_144; Pcondition=1.55 × 10⁻⁶ after conditioning on the four HLA variants above) and did not show any eQTL effect.
Beyond these variants, no other HLA variant or SNP showed an independent association with P<10⁻⁷ (Supplementary Table 7 and Supplementary Fig. 5).

Although HLA-C*06:02 and amino-acid position 67 of HLA-B appear to be shared between the Caucasian and Chinese populations, the other independent HLA risk variants differ between the two populations. HLA-A*02:07 shows a strong association in Chinese but is very rare or absent in Europeans, whereas HLA-B*07 shows a strong association in Caucasians but is very rare in Chinese. The remaining HLA variants are common (frequency >10%) in both populations but show population-specific associations: HLA-A*02:01 in Caucasians and amino-acid positions 114 and 144 of HLA-A in Chinese. Our findings indicate that all three major HLA class I genes may play an important role in psoriasis and, more importantly, illustrate a complex and heterogeneous pattern of HLA associations between the Caucasian and Chinese populations.

Analysis of Ethnic Heterogeneity

To further investigate the ethnic heterogeneity of psoriasis susceptibility, we compared the association signals of 44 confirmed non-MHC loci (40 previously reported and 4 newly discovered) between the Caucasian and Chinese cohorts, searching for loci where the association effects map to different independent SNPs in the two populations (allelic heterogeneity) or are detected in only one population (locus heterogeneity). Across the 44 loci, we observed no evidence of genetic heterogeneity among the independent samples within each ethnic population, but found evidence of population-specific effects at 10 loci (ELMO1, ERAP2, PRDX5, PRM3/SOCS1, RNF114, RUNX1, TP63, TRAF3IP2, TYK2 and ZMIZ1), where the association was detected only in the Caucasian samples (Pcorrected<0.05 after correction for the 44 independent SNPs tested) and not in the Chinese GWAS samples (P≥0.1; Supplementary Data 1). For each of the nine loci, the Chinese samples provided sufficient statistical power (>95%) to detect the association effect observed in the Caucasian samples at nominal significance (P<0.05), and the OR estimates differed significantly between the two populations (PQ<0.05). Regional association plots clearly show that, apart from the top Caucasian SNPs, there are no additional associations within the effective LD of these nine loci in the Chinese samples (Fig. 2 and Supplementary Figs 3 and 4). We also constructed a genetic risk score for these ten loci (the count of risk alleles carried by an individual) and evaluated its association. Although this genetic risk score is highly significantly associated in Caucasians (P=3.94 × 10⁻³⁶), there is no evidence of association in the Chinese population (P=0.21; Fig. 3). Taken together, these results provide strong evidence for Caucasian-specific effects at these loci.
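The genetic risk score described above is an unweighted count of risk alleles. A minimal sketch follows; the SNP identifiers and alleles in the usage example are placeholders, and skipping missing genotypes (rather than, say, imputing them) is an assumption of this sketch, not a documented choice of the study.

```python
def risk_allele_count(genotypes, risk_alleles):
    """Unweighted genetic risk score: total risk alleles carried across the scored SNPs.

    genotypes: dict mapping SNP ID -> (allele1, allele2), or None if missing
    risk_alleles: dict mapping SNP ID -> risk allele
    """
    score = 0
    for snp, risk in risk_alleles.items():
        call = genotypes.get(snp)
        if call is None:
            # Missing genotypes contribute nothing in this simplified sketch
            continue
        score += sum(1 for allele in call if allele == risk)
    return score
```

For example, an individual with genotypes `{"snpA": ("T", "T"), "snpB": ("C", "G")}` scored against risk alleles `{"snpA": "T", "snpB": "G", "snpC": "A"}` carries three risk alleles; `snpC` is untyped and is skipped.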

Figure 3: Odds ratios by decile of the polygenic risk score estimated from the top SNPs within the ten Caucasian-specific loci.

The polygenic risk score (PRS) was calculated from the ten Caucasian-specific SNPs as described (online Methods). PRS values were converted to deciles (1=lowest, 10=highest PRS), and nine dummy variables were created to contrast deciles 2–10 with decile 1 as the reference. Odds ratios and 95% confidence intervals (error bars) were estimated using logistic regression with principal components as covariates to control for population stratification.
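The decile analysis in this figure can be sketched as follows: individuals are ranked by score, split into ten equal-sized groups, and deciles 2 to 10 are each encoded as a dummy variable against decile 1. Rank-based splitting with ties broken by input order is an assumption of this sketch; the dummy columns would then enter a logistic regression together with the principal components.

```python
def score_deciles(scores):
    """Assign each score a decile 1..10 by rank (ties broken by input order)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    deciles = [0] * n
    for rank, idx in enumerate(order):
        deciles[idx] = rank * 10 // n + 1
    return deciles


def decile_dummies(decile):
    """Nine 0/1 indicators contrasting deciles 2..10 with decile 1 (the reference)."""
    return [1 if decile == d else 0 for d in range(2, 11)]
```

Individuals in decile 1 receive an all-zero dummy vector, so their odds form the baseline against which the nine ORs in the figure are expressed.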

Figure 4: The regional association plots of IL23R showing allelic heterogeneity.

The relative location of annotated genes and the direction of transcription are shown in the lower portion of the figure, and the chromosomal position is shown on the x axis. The blue line shows the recombination rate (estimated from HapMap data of the CEU, CHB and combined populations) across the region (right y axis), and the left y axis shows the significance of the SNP associations (−log10 P). Squares indicate the SNPs used for conditional analysis (the top or secondary SNPs); circles labelled with rs IDs are reported psoriasis susceptibility SNPs. All circles and squares are colour-filled according to the heterogeneity statistic (I²) from our trans-ethnic meta-analysis. (a,b) Unconditional and conditional logistic association results in Caucasians; (c,d) unconditional and conditional logistic association results in Chinese.

Figure 5: The regional association plots of IL28RA, a shared locus without heterogeneity.

The relative location of annotated genes and the direction of transcription are shown in the lower portion of the figure, and the chromosomal position is shown on the x axis. The blue line shows the recombination rate (estimated from HapMap data of the CEU, CHB and combined populations) across the region (right y axis), and the left y axis shows the significance of the SNP associations (−log10 P). Squares indicate the SNPs used for conditional analysis (the top or secondary SNPs); circles labelled with rs IDs are reported psoriasis susceptibility SNPs. All circles and squares are colour-filled according to the heterogeneity statistic (I²) from our trans-ethnic meta-analysis. (a,b) Unconditional and conditional logistic association results in Caucasians; (c,d) unconditional and conditional logistic association results in Chinese; (e,f) unconditional and conditional logistic association results in the trans-ancestry combined sample.

We also estimated the contribution of these Caucasian-specific loci to the difference in psoriasis prevalence between the Caucasian and Chinese populations. Assuming independent and multiplicative effects of all ten loci in Caucasians and no genetic effect in the Chinese population (OR=1), our analysis indicated that the cumulative effects of these Caucasian-specific loci could explain up to 82.83% of the prevalence difference between the two populations. Because our Chinese samples only had sufficient power (>90%) to detect genetic effects with OR≥1.15, assuming OR=1 for these loci in the Chinese population may have led us to overestimate their contribution to the prevalence difference. Nevertheless, our analysis clearly demonstrates that the cumulative genetic effects of these Caucasian-specific loci make a significant contribution to the difference in psoriasis prevalence between the Caucasian and Chinese populations.
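The text does not spell out the exact formula behind the prevalence comparison, so the sketch below shows one plausible version under explicit assumptions: Hardy–Weinberg equilibrium, independent loci, per-allele multiplicative effects, ORs treated as relative risks (reasonable for a low-prevalence disease), the same baseline risk in both populations and OR=1 for every locus in the comparison population. Expressing the "fraction explained" on the log scale of the prevalence ratio is an assumption of this sketch, not necessarily how the 82.83% figure was obtained.

```python
import math


def mean_relative_risk(risk_allele_freqs, odds_ratios):
    """Population-average relative risk versus carriers of no risk alleles.

    For genotype g ~ Binomial(2, p) under HWE, E[OR**g] = (1 - p + p * OR) ** 2,
    and the contributions of independent loci multiply.
    """
    rr = 1.0
    for p, or_ in zip(risk_allele_freqs, odds_ratios):
        rr *= (1 - p + p * or_) ** 2
    return rr


def fraction_of_log_prevalence_ratio(rr_population_a, prevalence_a, prevalence_b):
    """Share of log(prevalence_a / prevalence_b) explained, assuming RR = 1 in population B."""
    return math.log(rr_population_a) / math.log(prevalence_a / prevalence_b)
```

Under these assumptions, a mean relative risk of 2 in one population against prevalences of 2% and 0.5% would account for half of the log prevalence ratio.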

In addition, the association evidence within the IL23R locus was consistent with allelic heterogeneity between the Caucasian and Chinese populations. There were two independent associations, at rs2295359 and rs12564022, in the Caucasian samples (ORcondition=1.15, Pcondition=8.7 × 10⁻⁸ and ORcondition=1.23, Pcondition=9.0 × 10⁻¹⁴, respectively), but only one, at rs2295359 (OR=1.15, P=3.62 × 10⁻⁸), in the Chinese samples (Fig. 4 and Supplementary Data 4). The risk T allele of rs12564022 is more frequent in Chinese than in Caucasians but showed no independent association in Chinese. The haplotypes of the two SNPs showed different associations in the two populations: although haplotype GC showed a similar protective effect in both populations, the other two common haplotypes showed population-specific risk effects, with GT conferring risk in Caucasians and AC conferring risk in Chinese (Supplementary Table 8). These diverse association patterns are consistent with different risk variants, and thus allelic heterogeneity, between the two populations.

For the remaining 27 loci, consistent associations (ORs in the same direction without evidence of heterogeneity) were observed at either the same SNP or SNPs in high LD in the Caucasian and Chinese populations, and the meta-analysis of the combined samples provided stronger evidence for association than either population alone (under a fixed-effects model; Fig. 5, Supplementary Data 1 and Supplementary Figs 3, 4 and 6). Although there was no evidence of heterogeneity in the OR estimates, we observed significant allele frequency differences (PQ<0.05) between the Caucasian and Chinese populations for three SNPs, within COG6, FBXL19 and IL23A/STAT2 (Supplementary Fig. 7). We also constructed a genetic risk score from the 30 SNPs in these 27 loci and found that it differed significantly among 52 Human Genome Diversity Panel (HGDP) populations and correlated significantly with population longitude, suggesting that these loci may have been under selection and polygenic adaptation to local environments (Supplementary Fig. 8). Interestingly, populations of European origin show higher risk scores than those of Asian origin (Supplementary Fig. 8), consistent with the known higher disease prevalence in Europeans than in Asians.

Discussion

TP63 is a strong candidate gene within the 3q11.2 locus. TP63 is a key regulator of mammalian epidermal stratification and of keratinocyte proliferation and differentiation (refs 27, 28), and it is highly expressed in human skin and keratinocytes (Supplementary Tables 9 and 10). The discovery of TP63 as a novel susceptibility locus provides additional biological insight into the role of keratinocyte proliferation and differentiation in the development of psoriasis. Interestingly, TP63 has also been identified as a susceptibility gene for lung and bladder cancers and has been linked to immune and inflammatory responses (refs 29–33). rs8128234 on 21q22 is located within RUNX1, which encodes the alpha subunit of core-binding factor, a heterodimeric transcription factor that binds to the core element of many enhancers (ref. 34). RUNX1 has been shown to regulate alpha-beta T-cell differentiation and epidermal development (Supplementary Table 9; refs 35, 36), both of which are critical for psoriasis development, and has been implicated in multiple autoimmune-related phenotypes, including psoriasis (refs 37, 38). rs9533962 is located in the long intergenic non-protein coding RNA LOC144817 (ref. 39). rs34394770 is an intronic variant within COG6, which encodes a subunit of the conserved oligomeric Golgi complex required for maintaining the normal structure and activity of the Golgi apparatus (ref. 40). rs34394770 is in high LD with the previously reported rs7993214 in the Caucasian population (r²>0.98, D′>0.99; ref. 41), so our study confirms the previously suggested association at COG6 at genome-wide significance. Enrichment analysis of the previously and newly discovered loci implicated a number of biological processes (gene ontology (GO) terms), suggesting important roles for the immune system and transcriptional regulation in psoriasis development (Supplementary Table 11).

We confirmed that there are multiple susceptibility signals for psoriasis in the MHC region. Besides HLA-C*06:02, the risk amino acids at position 67 of HLA-B were shared by the Caucasian and Chinese populations, consistent with a recent study (ref. 42). We observed significant allele frequency differences for some HLA risk alleles, such as HLA-A*02:07 and HLA-B*07, between the Caucasian and Chinese populations, which could reflect strong selection by the different pathogens encountered by the two populations (ref. 43).

We also observed population-specific effects at amino-acid positions 114 and 144 of HLA-A, as well as at 10 non-MHC loci. These population-specific effects suggest substantial genetic heterogeneity of psoriasis susceptibility between ethnic populations, which could result from the different evolutionary processes and environmental exposures experienced by the two populations. Such effects could also reflect clinical differences (such as clinical subtype and family history) between the Caucasian and Chinese patients in the current study, but this impact should be limited, because all Caucasian and Chinese patients were recruited using the same criteria. Further stratified studies, for example by clinical subtype and family history, could help to better understand the genetic heterogeneity of psoriasis susceptibility between the Caucasian and Chinese populations.

In summary, we conducted the first large-scale trans-ethnic GWMA of psoriasis and discovered seven novel associations, including four novel susceptibility loci and three novel independent associations within previously known loci. The new loci implicate additional players in the immune and keratinocyte development mechanisms of psoriasis. In addition, we investigated the genetic heterogeneity of psoriasis susceptibility by directly comparing the association evidence for all 45 confirmed loci between the Caucasian and Chinese populations. Although many loci show consistent associations between the two populations, Caucasian-specific effects were observed at ten loci, and these effects contribute significantly to the higher prevalence of psoriasis in Caucasian than in Chinese populations. Moreover, comparison of the shared loci across world populations provided further evidence that genetic susceptibility loci contribute to differences in disease prevalence between populations. In conclusion, our trans-ethnic genome-wide study has advanced the understanding of the genetic architecture of psoriasis susceptibility by discovering novel associations and revealing genetic heterogeneity across ethnic populations.

Methods

Study subjects

The samples included in the five published GWAS data sets (University of California (UC), Genetic Association Information Network (GAIN), Kiel, Genizon and Anhui) were described previously (refs 15, 16). The UC data set included 202 psoriatic cases and 492 controls of European ancestry. The GAIN data set used in the GWAS stage was obtained from the database of Genotypes and Phenotypes (dbGaP) at http://www.ncbi.nlm.nih.gov/gap under accession number phs000019.v1.p1. The Kiel sample included 471 cases and 1,129 controls, and the Genizon data set from Canada included 1,755 samples. The Anhui data set comprised 1,139 cases and 1,112 healthy controls of Han Chinese ancestry (Table 1). The KI and Singapore data sets included samples of European descent from Sweden and Han Chinese from Singapore, respectively; the Singapore study included additional controls from the Singapore Prospective Study Program (SP2). Replication analyses were performed in seven independent samples, six from populations of European descent (UC, KI, Harvard, Spain, Celera and Kiel) and one from a Chinese population (Anhui). Recruitment of participants for all studies was approved by the local institutional ethics review boards, Institute of Dermatology, Anhui Medical University, National Skin Centre and National University of Singapore, Christian-Albrechts-University of Kiel, University of California San Francisco, Department of Medicine, Karolinska Institutet, Celera, Brigham & Women's Hospital, Harvard Medical School and Centre for Genomic Regulation (CRG), in adherence with the Declaration of Helsinki Principles. DNA was isolated from blood using standard methods.

Genome-wide genotyping analysis

The genotyping and QC procedures for the five published GWAS data sets have been described previously (refs 15, 16). The cases in the Singapore GWAS data set were genotyped using the Illumina Human550 BeadChip, whereas the controls were genotyped using the Illumina Human550, Illumina Human610-Quad and 1Mduo3 BeadChips. The cases and controls of the KI data set were genotyped using the Illumina HumanHap550 BeadChip. Standard QC procedures were applied to the Singapore and KI cohorts as described previously (ref. 15). Briefly, we excluded samples with a call rate <0.95, and SNPs with a mean BeadStudio GenCall score <0.7, a call rate <0.95, a Hardy–Weinberg equilibrium (HWE) P value <1 × 10⁻⁶ or a minor allele frequency (MAF) <0.01. Cryptic relatedness between individuals was identified through a full identity-by-state matrix. Population substructure in the Singapore and KI data sets was ascertained by principal component analysis (PCA) using the EIGENSTRAT programme (ref. 44), with reference to four HapMap population panels (CEU, CHB, JPT and YRI). The final Singapore and KI data sets consisted of 461,696 SNPs in 2,883 samples and 522,758 SNPs in 1,925 samples, respectively (Table 1).
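The SNP-level exclusion rules above translate directly into a filter. In the sketch below the HWE P value comes from the 1-df χ² goodness-of-fit test, chosen for brevity; many pipelines use an exact test instead, and the text does not say which was used. Function and argument names are hypothetical.

```python
import math


def hwe_chi2_p(n_hom_ref, n_het, n_hom_alt):
    """1-df chi-square HWE P value from genotype counts (assumes both alleles are observed)."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a 1-df chi-square: erfc(sqrt(chi2 / 2))
    return math.erfc(math.sqrt(chi2 / 2))


def passes_snp_qc(n_hom_ref, n_het, n_hom_alt, n_missing, mean_gencall,
                  min_call_rate=0.95, min_gencall=0.7, min_maf=0.01, min_hwe_p=1e-6):
    """Apply the call-rate, GenCall, MAF and HWE thresholds to one SNP."""
    n_called = n_hom_ref + n_het + n_hom_alt
    if n_called == 0 or n_called / (n_called + n_missing) < min_call_rate:
        return False
    if mean_gencall < min_gencall:
        return False
    freq = (2 * n_hom_ref + n_het) / (2 * n_called)
    # The MAF check also guarantees both alleles are present before the HWE test
    if min(freq, 1 - freq) < min_maf:
        return False
    return hwe_chi2_p(n_hom_ref, n_het, n_hom_alt) >= min_hwe_p
```

A SNP with genotype counts 250/500/250 and 10 missing calls passes, whereas 500/0/500 fails on HWE despite a MAF of 0.5.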

Whole-genome imputation

Whole-genome imputation was performed using IMPUTE version 2 (ref. 45), with the East Asian (ASN) and European (EUR) haplotypes of the 1000 Genomes Project reference panel (Dec 2010 version) used as references for imputing the Chinese and European samples, respectively (ref. 46). Imputation was performed in each of the seven GWAS data sets individually, using the SNPs that passed the QC thresholds (mean call rate >0.95, MAF>0.01 and HWE P>1 × 10⁻⁶ in controls). Imputed SNPs with an imputation information score <0.8, a MAF <1% or an HWE P<1 × 10⁻⁶ in controls were excluded from further analysis.
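IMPUTE2 reports, for each sample, the probabilities of the three genotypes. The sketch below shows how an expected allele count (dosage) and a dosage-based allele frequency can be derived from them, and how the post-imputation filters above could then be applied. The info score and the control-only HWE P value are taken as inputs rather than recomputed, and the helper names are hypothetical.

```python
def dosage(probs):
    """Expected count of the second allele from (P(AA), P(AB), P(BB))."""
    _, p_ab, p_bb = probs
    return p_ab + 2 * p_bb


def dosage_maf(prob_rows):
    """Minor allele frequency estimated from imputed dosages across samples."""
    freq = sum(dosage(r) for r in prob_rows) / (2 * len(prob_rows))
    return min(freq, 1 - freq)


def keep_imputed_snp(info, prob_rows, hwe_p_controls,
                     min_info=0.8, min_maf=0.01, min_hwe_p=1e-6):
    """True if the imputed SNP passes the info, MAF and control-HWE thresholds."""
    return (info >= min_info
            and dosage_maf(prob_rows) >= min_maf
            and hwe_p_controls >= min_hwe_p)
```

Working with dosages rather than hard calls carries imputation uncertainty into the association test, which is also why the HLA analyses in this paper test genotype dosages.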

To determine whether specific coding variants within HLA genes contribute to the diverse association signals, we imputed classical HLA alleles and coding variants across the HLA region (chr6: 20–40 Mb, build 36) in each of the seven data sets. The five European data sets were imputed using a reference panel constructed from 2,767 individuals from the MHC Working Group of the Type 1 Diabetes Genetics Consortium (T1DGC) (ref. 47). The two Chinese data sets were imputed using a reference panel constructed from the genotypes of 89 Chinese (CHB) and Japanese (JPT) samples from the HapMap project (ref. 48). Each reference panel was constructed from dense SNP genotype data and classical HLA alleles determined at four-digit resolution. Based on the EMBL-EBI Immunogenetics HLA Database, amino-acid variants were coded as binary markers (present/absent) in the reference panel. Classical HLA alleles and their corresponding amino acids were imputed using BEAGLE as described previously (refs 47, 49). A total of 253 classical HLA alleles (two- and four-digit resolution) and 357 amino-acid positions were imputed in the European samples, and 167 classical alleles and 305 amino-acid positions in the Chinese samples.
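Coding amino-acid variants as present/absent markers means that, at a given position, each residue observed across the classical alleles becomes its own binary marker on each haplotype; summing the two haplotypes gives a 0/1/2 dosage per individual. The sketch below assumes a lookup table from classical allele to residue; the allele names and residues in the example are placeholders, not real HLA sequence data.

```python
def residue_markers(allele, residue_at_position):
    """Binary present/absent markers for one haplotype at one amino-acid position.

    residue_at_position: dict mapping classical allele -> residue at that position
    """
    residues = sorted(set(residue_at_position.values()))
    observed = residue_at_position[allele]
    return {res: int(res == observed) for res in residues}


def individual_dosages(allele_pair, residue_at_position):
    """Sum the two haplotypes' markers to get a 0/1/2 dosage per residue."""
    first, second = (residue_markers(a, residue_at_position) for a in allele_pair)
    return {res: first[res] + second[res] for res in first}
```

With a hypothetical table `{"X*01": "Y", "X*02": "F", "X*03": "Y"}`, an individual carrying `X*01` and `X*03` has dosage 2 for residue Y and 0 for F, even though the two classical alleles differ.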

Genomic controls

We used a PCA-based method to detect and correct for population stratification in the individual GWAS data sets. Principal component adjustment was applied in four data sets (one, one, two and three principal components (PCs) for the Anhui, KI, UC and Kiel cohorts, respectively). This reduced genome-wide inflation to <1.05 in all individual GWAS data sets (Table 1), a level considered acceptable by conventional GWAS standards. The cases and controls of the Singapore data set were well matched, so no PCA-based correction was performed.
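To illustrate what a principal component adjustment captures, the sketch below computes the top principal component of a small genotype matrix by power iteration on a sample-by-sample relationship matrix, in pure Python. It is a simplification of the EIGENSTRAT approach (standardization here divides by √(2p(1−p)), and no outlier removal or LD pruning is performed); in practice the resulting PC scores would be added as covariates in the association model. The toy data used to exercise it are hypothetical.

```python
import random


def standardize(genotypes):
    """Center and scale each SNP column (0/1/2 counts) by its sample allele frequency."""
    n, m = len(genotypes), len(genotypes[0])
    z = [[0.0] * m for _ in range(n)]
    for j in range(m):
        p = sum(row[j] for row in genotypes) / (2 * n)
        sd = (2 * p * (1 - p)) ** 0.5
        for i in range(n):
            # Monomorphic SNPs carry no information and are set to zero
            z[i][j] = (genotypes[i][j] - 2 * p) / sd if sd > 0 else 0.0
    return z


def top_pc(genotypes, iters=200, seed=0):
    """Leading eigenvector (unit length) of the sample-by-sample relationship matrix."""
    z = standardize(genotypes)
    n, m = len(z), len(z[0])
    grm = [[sum(a * b for a, b in zip(z[i], z[k])) / m for k in range(n)] for i in range(n)]
    rng = random.Random(seed)
    # A random start vector avoids being orthogonal to the leading eigenvector
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(grm[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = sum(t * t for t in w) ** 0.5
        v = [t / norm for t in w]
    return v
```

When samples come from two groups with very different allele frequencies, the two groups take opposite signs on this component, which is the structure a PC covariate removes from the association test.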

The final inflation factors of the test statistics were modest (λ1,000=1.017, 1.01 and 1.011 for the Caucasian, Chinese and combined discovery cohorts, respectively; Table 1), suggesting that the association results from the three genome-wide meta-analyses were largely free of major inflation due to population stratification.
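Because λ grows with sample size, λ1,000 rescales it to an equivalent study of 1,000 cases and 1,000 controls. A minimal sketch of the commonly used rescaling, assuming the observed λ and the case and control counts of the analysis are known:

```python
def lambda_1000(lam, n_cases, n_controls):
    """Rescale an observed lambda to a study of 1,000 cases and 1,000 controls."""
    return 1 + (lam - 1) * (1 / n_cases + 1 / n_controls) / (1 / 1000 + 1 / 1000)
```

For instance, an observed λ of 1.1 in a study of 2,000 cases and 2,000 controls corresponds to λ1,000 = 1.05.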

Association analysis

Three genome-wide meta-analyses (GWMAs) were performed in this study: (i) Caucasian data (N=8,682; 4,778,154 SNPs); (ii) Han Chinese data (N=5,134; 4,562,294 SNPs); and (iii) combined data (N=13,816; 3,621,551 SNPs).

Association analyses were performed on the imputed dosage data using SNPTEST v2 with the score test, with study and principal components (for four data sets) included as covariates to adjust for study effects and population stratification (Table 1). We also applied the DerSimonian–Laird random-effects model to detect ethnic heterogeneity, using the odds ratio estimates and standard errors from the Caucasian and Chinese GWMAs as inputs. Association evidence from the replication study was tested by fixed-effects inverse-variance meta-analysis, with Cochran’s Q test and I2 used to assess between-study heterogeneity. P<5 × 10−8 was used as the threshold for genome-wide significance.
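The meta-analytic quantities named above (fixed-effects inverse-variance pooling, Cochran's Q, I2 and the DerSimonian–Laird random-effects estimate) can be sketched from per-study log odds ratios and standard errors. This is a textbook implementation offered for illustration, not the software used in the study; the function names are hypothetical.

```python
import math

def _weights(ses):
    return [1.0 / se ** 2 for se in ses]

def fixed_effects(betas, ses):
    """Inverse-variance fixed-effects pooling with Cochran's Q and I^2."""
    w = _weights(ses)
    sw = sum(w)
    beta = sum(wi * b for wi, b in zip(w, betas)) / sw
    se = math.sqrt(1.0 / sw)
    q = sum(wi * (b - beta) ** 2 for wi, b in zip(w, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return beta, se, q, i2

def dersimonian_laird(betas, ses):
    """Random-effects pooling using the DerSimonian-Laird moment estimate of tau^2."""
    w = _weights(ses)
    _, _, q, _ = fixed_effects(betas, ses)
    df = len(betas) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    beta = sum(wi * b for wi, b in zip(w_re, betas)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return beta, se, tau2

def two_sided_p(beta, se):
    """Two-sided P value of a Wald z-test for the pooled log OR."""
    return math.erfc(abs(beta / se) / math.sqrt(2))
```

With only two inputs (the Caucasian and Chinese GWMA estimates), Q has one degree of freedom, and any between-population difference beyond sampling error shows up as a positive tau2 and a wider random-effects standard error.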

In the HLA region, association tests were performed on genotype dosages derived from the imputed genotype probabilities, which accounts for uncertainty in the imputed genotypes. We conducted Wald tests on all imputed bi-allelic variants, with study variables and principal components included as covariates. For multiallelic amino-acid sites, we performed a global ‘omnibus’ test at each site that takes into account all alleles at the position. The omnibus test was performed as previously described50: in brief, the k alleles were converted to k−1 bi-allelic variables, the multivariate model was fitted with the glm function in R, and a likelihood ratio test was used to compute the global multiallelic P-value. All conditional logistic regression analyses were performed in R.
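The two steps of the omnibus test that do not involve model fitting can be sketched directly: encoding k alleles as k−1 allele-count variables (dropping a reference allele), and turning the two log-likelihoods into a likelihood ratio P value with k−1 degrees of freedom. The sketch assumes the null and full models have already been fitted elsewhere (for example with glm in R, as in the text); the residues, the choice of reference allele and the function names are illustrative.

```python
import math

def encode_k_minus_1(genotypes, alleles, reference):
    """Turn k alleles at one site into k-1 allele-count columns (reference dropped)."""
    non_ref = [a for a in alleles if a != reference]
    rows = [[sum(1 for r in pair if r == a) for a in non_ref] for pair in genotypes]
    return non_ref, rows

def chi2_sf(x, df):
    """Upper-tail P of a chi-square with integer df, via closed-form series."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i<df/2} (x/2)^i / i!
        term = math.exp(-x / 2)
        total = term
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) e^{-x/2} sum_i x^{(2i-1)/2} / (1*3*...*(2i-1))
    m = (df - 1) // 2
    term, series = math.sqrt(x), 0.0
    for i in range(1, m + 1):
        series += term
        term *= x / (2 * i + 1)
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * series

def omnibus_lrt(loglik_null, loglik_full, k):
    """LRT statistic and P value for adding k-1 allele variables to the null model."""
    stat = 2.0 * (loglik_full - loglik_null)
    return stat, chi2_sf(stat, k - 1)
```

For a site with three residues, `encode_k_minus_1([("A", "V"), ("V", "V")], ["A", "V", "L"], "A")` yields the columns `["V", "L"]` with rows `[[1, 0], [2, 0]]`, and the LRT is then referred to a chi-square with 2 degrees of freedom.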

SNP selection and replication

We selected 43 SNPs with suggestive evidence of association (P<5 × 10−5 in the single-variant logistic regression analysis) in at least one meta-analysis (Caucasian, Chinese or combined) for independent validation. The Sequenom MassARRAY system was used for most of the replication genotyping, except for the Celera samples, in which 30 SNPs were genotyped using TaqMan assays (Life Technologies), and the Han Chinese samples, in which 12 SNPs were genotyped using TaqMan assays.

Conditional analysis

To detect novel secondary association signals, we performed conditional analysis in 45 regions (41 known GWAS loci and 4 novel loci). Secondary SNPs with conditional P<5 × 10−8 were considered independent of the lead SNP in the region. P values from the multivariate analysis are reported after adjustment for the lead SNPs.
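One common way to run such a conditional analysis is forward stepwise selection within a region: condition on the lead SNP, add the SNP with the smallest conditional P if it passes 5 × 10−8, and repeat. The text does not state that a stepwise procedure was used, so the loop below is an assumption for illustration. `conditional_p` is a hypothetical callable that would, for example, fit a logistic regression with the already selected SNPs as covariates.

```python
def stepwise_conditional(snps, lead, conditional_p, threshold=5e-8):
    """Forward selection of secondary signals in one region (illustrative sketch)."""
    selected = [lead]
    while True:
        candidates = [s for s in snps if s not in selected]
        if not candidates:
            return selected
        # P value of each remaining SNP conditional on everything selected so far.
        pvals = {s: conditional_p(s, tuple(selected)) for s in candidates}
        best = min(pvals, key=pvals.get)
        if pvals[best] >= threshold:
            return selected
        selected.append(best)

# Hypothetical conditional P values: rs3 is only a shadow of rs2.
def toy_conditional_p(snp, conditioned_on):
    if snp == "rs2":
        return 1e-9
    if snp == "rs3":
        return 0.5 if "rs2" in conditioned_on else 1e-8
    return 0.2
```

Here `stepwise_conditional(["rs1", "rs2", "rs3", "rs4"], "rs1", toy_conditional_p)` keeps rs2 as a secondary signal and drops rs3, whose association disappears once rs2 is in the model.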

Functional annotation and eQTL analysis

We annotated the four novel SNPs and their proxy SNPs (r2>0.8 in the 1000 Genomes Project Phase 1 CEU panel) using HaploReg 2 (http://www.broadinstitute.org/mammals/haploreg/haploreg.php). We further examined these SNPs for expression quantitative trait locus (eQTL) effects using the following recently published eQTL data sets: (i) a meta-analysis of transcriptional profiles from the peripheral blood cells of 5,311 Europeans (http://genenetwork.nl/bloodeqtlbrowser)51; (ii) transcriptional profiles from primary B cells and monocytes of 288 healthy Europeans52; and (iii) the eqtl.uchicago browser, which compiles data across diverse tissues.
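Proxy SNPs are defined by pairwise LD r2 with the lead SNP. The sketch below computes r2 = D2 / (pA(1−pA)pB(1−pB)) from phased haplotypes coded 0/1, which is one standard estimator; tools such as HaploReg use precomputed LD from the reference panel, so this only illustrates the quantity behind the r2>0.8 cut-off. The function names and toy haplotypes are hypothetical.

```python
def ld_r2(hap_a, hap_b):
    """LD r^2 between two biallelic SNPs from phased 0/1 haplotypes."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for x, y in zip(hap_a, hap_b) if x == 1 and y == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0

def proxies(lead_hap, candidates, min_r2=0.8):
    """Names of candidate SNPs whose r^2 with the lead SNP exceeds min_r2."""
    return [name for name, hap in candidates.items() if ld_r2(lead_hap, hap) > min_r2]
```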

GO enrichment analysis

We submitted the 54 implicated genes (Supplementary Data 1) to the Database for Annotation, Visualization and Integrated Discovery (DAVID)53 for GO term enrichment analysis based on GO level. DAVID computes an enrichment P value for each GO term using Fisher’s exact test. Genes in the HLA region were excluded from this analysis, and only results with P<10−4 are reported.
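The enrichment P value for a single GO term can be illustrated with a plain one-sided Fisher's exact test, i.e. the upper tail of the hypergeometric distribution. This is a sketch under that assumption; DAVID's own implementation may differ (it reports a modified Fisher's exact P value, the EASE score), and the function names and example counts are hypothetical.

```python
from math import comb

def enrichment_p(hits, list_size, term_size, population_size):
    """One-sided Fisher's exact test: P(X >= hits) under the hypergeometric null."""
    total = comb(population_size, list_size)
    upper = min(list_size, term_size)
    tail = sum(
        comb(term_size, k) * comb(population_size - term_size, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

def significant_terms(term_counts, list_size, population_size, alpha=1e-4):
    """term_counts maps GO term -> (genes from the list in the term, term size in background)."""
    results = {t: enrichment_p(h, list_size, m, population_size)
               for t, (h, m) in term_counts.items()}
    return {t: p for t, p in results.items() if p < alpha}
```

For example, if all 4 genes of a list fall into a term annotating 5 of 20 background genes, the P value is 5/4845 (about 0.001), which would not pass the P<10−4 reporting threshold.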

Ethnic heterogeneity

To compare the ethnic heterogeneity of psoriasis susceptibility, we selected the 44 confirmed non-MHC SNPs, or proxy SNPs in considerable LD with them, and evaluated them mostly in our GWAS samples. Based on comparisons of reference allele frequencies and association results, ethnic heterogeneity was classified into three types: (i) loci with common variants, at which the same association signals were present in both Caucasian and Han Chinese samples; (ii) ethnic-specific loci, with significant association evidence in only one population (locus heterogeneity); and (iii) loci with allelic heterogeneity, at which the associations in different populations pointed to clearly different causal variants.

Geospatial risk analysis for 27 shared loci (30 SNPs) in the HGDP

To assess the geospatial pattern of genetic risk, we used publicly available genotype data from the Human Genome Diversity Project (HGDP), comprising 1,043 individuals from 52 worldwide populations. The genotypes were generated on Illumina 650Y arrays for 660,918 markers (Stanford University). These high-quality genotype data (overall sample call rate >98.5%) were available for 10 of the 30 shared SNPs. QC was performed before the missing SNPs were imputed: 88 individuals were removed because of low heterozygosity or first-degree relatedness, and markers with a low call rate (<95%) or MAF<0.01 were excluded. Using the remaining 955 individuals (52 populations) and 630,559 markers, whole-genome imputation was performed with the 1000 Genomes Project data set (Phase I, Dec. 2013 version) as the reference panel. All 30 SNPs were imputed with high confidence (imputation info ≥0.90) and had MAF>0.05.

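For each individual j, the genetic risk score across m loci was estimated as

$$\mathrm{GRS}_j = \sum_{i=1}^{m} \beta_i G_{ij},$$

where $\beta_i$ is the log of the adjusted meta-analysis OR for locus i and $G_{ij}$ is the number of risk alleles carried by individual j at locus i. The standardized risk score was calculated as a z-score across populations:

$$Z_j = \frac{\mathrm{GRS}_j - \overline{\mathrm{GRS}}}{\mathrm{SD}(\mathrm{GRS})}.$$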
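The weighted risk score (each individual's risk-allele counts weighted by the log meta-analysis OR of each locus) and its z-score standardization can be sketched as follows. The sketch assumes genotypes are given as one row of risk-allele counts per individual and standardizes with the sample standard deviation; whether the original analysis used the sample or population SD is not stated.

```python
import statistics

def risk_scores(log_ors, genotypes):
    """Weighted genetic risk score: sum over loci of log(OR) x risk-allele count."""
    return [sum(b * g for b, g in zip(log_ors, row)) for row in genotypes]

def standardize(scores):
    """z-scores across all individuals (sample SD assumed)."""
    mu = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [(s - mu) / sd for s in scores]
```

For example, with log ORs of 0.5 and 1.0, an individual carrying two risk alleles at the first locus and one at the second scores 2.0.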

The median standardized risk scores of the 52 worldwide populations were compared using the Kruskal–Wallis test, and the standardized risk score was tested for correlation with the longitude and latitude of each population.
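The Kruskal–Wallis test compares groups through ranks of the pooled observations. The sketch below computes the tie-corrected H statistic, which is compared against a chi-square distribution with (number of groups − 1) degrees of freedom; in practice `scipy.stats.kruskal` performs the same test including the P value. Here each group would be the standardized scores of one HGDP population; the function names are illustrative.

```python
from collections import Counter

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H (undefined if every value is tied)."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])
        total += r * r / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1.0 - ties / (n ** 3 - n))
```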

Odds ratio by risk score profile of ten population-specific loci

The polygenic risk score (PRS) of each individual was calculated as the count of risk alleles at the ten population-specific loci54. ORs were estimated by logistic regression on PRS with adjustment for PCs to control for population stratification. The discovery stage of this study comprised seven cohorts (two Chinese and five Caucasian), and adjusted ORs were estimated separately for each cohort. Meta-analysis was then performed separately for the Chinese and Caucasian cohorts using PLINK v1.07.

To analyse ORs by risk score profile, the PRS was converted to deciles (1=lowest, 10=highest PRS). Nine dummy variables were created to compare deciles 2–10 with decile 1 as the reference. ORs by decile were estimated by logistic regression with adjustment for PCs, and were combined separately for the Chinese and Caucasian cohorts using META v1.5 with a fixed-effects model. The 95% confidence intervals were calculated as exp(β ± 1.96 × SE), where β is the log OR.
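The decile coding and the confidence-interval formula can be sketched as below. Because the PRS is an integer count, many individuals share a score; this sketch breaks ties by input order when forming rank-based deciles, which is an assumption, since the text does not say how ties at decile boundaries were handled. The function names are hypothetical, and the logistic regression itself is not shown.

```python
import math

def assign_deciles(scores):
    """Rank-based deciles 1..10; tied scores are split by input order (assumption)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # stable sort keeps input order for ties
    deciles = [0] * n
    for rank, idx in enumerate(order):
        deciles[idx] = rank * 10 // n + 1
    return deciles

def decile_dummies(decile):
    """Nine indicators for deciles 2-10; decile 1 is the all-zero reference."""
    return [1 if decile == d else 0 for d in range(2, 11)]

def or_with_ci(beta, se, z=1.96):
    """OR and 95% CI: exp(beta), exp(beta - 1.96*SE), exp(beta + 1.96*SE)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)
```

The nine dummy columns would enter the logistic model together with the PCs, and each fitted coefficient and its SE would be passed to `or_with_ci` after pooling.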

Difference in prevalence between Chinese and Caucasian accounted for by ten population-specific loci

The method was adapted from the Supplementary Methods published by William et al.55. We modelled the psoriasis prevalence accounted for by the ten Caucasian-specific loci in both Caucasian and Han Chinese populations, and determined how much psoriasis prevalence would be reduced if the ten SNPs were absent from each population. Psoriasis prevalence is about ten times higher in Caucasians (2.5%) than in Chinese (0.2%), so we calculated the estimated reductions relative to the summed impact of the ten SNPs on Chinese prevalence. We applied a standard log-additive model, used population-based ORs from either the Chinese (OR=1) or Caucasian (ORs from the current study) samples, and assumed that the OR was a good estimate of the relative risk, R. We also assumed that the ten loci contributed to the risk of psoriasis independently and that their risk effects were additive in total. The overall psoriasis prevalence in population P was then modelled as

$$K_P = K_{PA}\prod_{i=1}^{m}\left(p_i^2 + 2p_iq_iR_i + q_i^2R_i^2\right),$$

where $K_{PA}$ is the disease prevalence if all loci were absent from the population, m is the number of loci, $p_i$ is the frequency of the non-risk allele at the ith locus, $q_i$ is the frequency of the risk allele at the ith locus, and $R_i$ is the relative risk per risk allele at the ith locus. We then calculated the proportion of the difference in prevalence between Caucasians (CA) and Chinese (CH) accounted for by the ten loci by

where $K_{CA}$ is the overall prevalence in Caucasians and $K_{CH}$ is the overall prevalence in Chinese.
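Under the assumptions stated in this section (per-allele log-additive effects, Hardy–Weinberg genotype frequencies, OR used as the relative risk, independent loci), a multiplicative model gives the expected relative risk at locus i as p_i² + 2p_iq_iR_i + q_i²R_i² = (p_i + q_iR_i)², and the prevalence with all loci absent as the observed prevalence divided by the product of these factors. The sketch below assumes that model; it is an illustration rather than the authors' calculation, and the allele frequencies and relative risks in the example are hypothetical. With OR=1, as for the Chinese population here, every factor equals 1 and prevalence is unchanged.

```python
from math import prod

def locus_factor(q, rr):
    """Expected relative risk at one locus under HWE with a per-allele relative risk."""
    p = 1.0 - q
    return p * p + 2 * p * q * rr + q * q * rr * rr  # equals (p + q * rr) ** 2

def prevalence(k_absent, risk_freqs, rrs):
    """Prevalence given the baseline prevalence with all loci absent."""
    return k_absent * prod(locus_factor(q, r) for q, r in zip(risk_freqs, rrs))

def prevalence_without_loci(k_observed, risk_freqs, rrs):
    """Baseline prevalence implied by an observed prevalence (inverse of prevalence())."""
    return k_observed / prod(locus_factor(q, r) for q, r in zip(risk_freqs, rrs))
```

For example, a single hypothetical locus with risk-allele frequency 0.5 and relative risk 2 multiplies baseline risk by 2.25, so an observed prevalence of 2.5% would fall to about 1.1% without it.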

Co-expression-driven gene functional prediction

We used a previously described method (http://genenetwork.nl:8080/GeneNetwork/)56 to gain insight into the putative functions of the biologically implicated genes in our study. Gene function prediction is based on the idea that genes with shared expression profiles are likely to have related biological functions. The method uses co-expression profiles to predict the likely functions of as yet uncharacterized genes and to refine our understanding of the functions of other genes. We queried the co-expression database with our four novel genes; the query for each gene returned its probable function or the reconstituted pathway in which it operates. The database was built by integrating published data from approximately 80,000 gene expression profiles in the Gene Expression Omnibus database.
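The "guilt by association" idea behind co-expression-based function prediction can be illustrated with a deliberately simple scheme: score each annotated function by the mean Pearson correlation between the query gene and the genes already annotated to that function. This is a generic sketch, not the algorithm implemented in GeneNetwork; the expression vectors, gene names and term names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rank_functions(query, expression, annotations):
    """Rank terms by mean co-expression of the query gene with each term's members."""
    scores = {}
    for term, members in annotations.items():
        rs = [pearson(expression[query], expression[g])
              for g in members if g in expression and g != query]
        if rs:
            scores[term] = sum(rs) / len(rs)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A query gene whose profile tracks the members of one term and anti-correlates with another would have the first term ranked as its most probable function.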

Additional information

How to cite this article: Yin, X. et al. Genome-wide meta-analysis identifies multiple novel associations and ethnic heterogeneity of psoriasis susceptibility. Nat. Commun. 6:6916 doi: 10.1038/ncomms7916 (2015).