Family-wide analysis of integrin structures predicted by AlphaFold2

Recent advances in highly accurate protein structure prediction by AlphaFold have opened new avenues for analyzing all structures within a single protein family. In this study, we evaluated the capacity of the newly developed AlphaFold2-multimer to predict integrin heterodimers. Integrins are heterodimeric cell surface receptors composed of combinations of 18 α and 8 β subunits, forming a family of 24 members. Both α and β subunits contain a large extracellular domain, a short transmembrane domain, and usually a short cytoplasmic domain. Integrins perform a wide range of cellular functions by recognizing diverse ligands. Structural studies in recent decades have greatly advanced our understanding of integrin biology, but high-resolution structures have been determined for only a few members of the integrin family. We studied the single-chain atomic structures of the 18 α and 8 β integrins in the AlphaFold2 protein structure database. We then applied the AlphaFold2-multimer program to predict the α/β heterodimer structures of all 24 human integrins. The results show a high level of accuracy in the predicted structures of the subdomains of both α and β subunits and provide high-resolution structural information for all integrin heterodimers. Our structural analysis of the entire integrin family reveals a potentially diverse range of conformations among the 24 members and provides a useful structure database for guiding functional studies. However, our results also reveal limitations of AlphaFold2 structure prediction, so caution is required when interpreting and using AlphaFold2 structures.


Introduction
Integrins are cell surface receptors that recognize a variety of extracellular or cell surface ligands, allowing cells to integrate signals from both inside and outside of the cell. 1 The human integrin family consists of 24 members, which result from combinations of 18 α and 8 β subunits (Fig. 1). The 24 α/β integrin heterodimers are either widely distributed or expressed only in certain cell types, and thus serve common or specialized functions in cellular responses related to cell adhesion and migration. Based on ligand or cell specificity, integrins can be classified into subfamilies of RGD (Arg-Gly-Asp) receptors, collagen receptors, laminin receptors, and leukocyte-specific receptors. 1 Integrins are key players in diseases such as thrombosis, inflammation, and cancer, making them prime targets for small-molecule or antibody inhibitors. [2][3][4] Since their discovery in the early 1980s, the structure and function of integrins have been an area of continuous research interest. 5 The α and β subunits of integrins are composed of multiple subdomains. The α subunit contains β-propeller, thigh, calf-1, calf-2, transmembrane (TM), and cytoplasmic tail (CT) domains (Fig. 1A-C). The β subunit contains βI, hybrid, PSI, I-EGF 1-4, β-tail, TM, and CT domains (Fig. 1A-C). A subclass of α integrins has an extra αI domain inserted into the β-propeller domain (Fig. 1D-F). For αI-less integrins, the α β-propeller and β βI domains come together to form the ligand-binding site (Fig. 1A-C). For αI integrins, the αI domain is responsible for binding ligands (Fig. 1D-F). Integrin domains can also be divided into the headpiece (containing the head and upper legs) and the lower legs (Fig. 1C). In the past few decades, structural studies of integrins have revealed a conformation-dependent mechanism of activation and ligand binding that involves transitions among at least three conformational states. 6,7 The bent conformation with a closed headpiece represents the resting state of integrin (Fig. 1A, D), while the extended conformations with closed and open headpieces represent the intermediate and high-affinity active states, respectively (Fig. 1B-C, E-F). The conformational transition of integrin can be initiated by the binding of intracellular activators, including talin and kindlin, to the β CT, resulting in inside-out signaling, or by the binding of extracellular ligands, resulting in outside-in signaling (Fig. 1A-F). 8,9 The conformation-dependent activation model, however, was largely derived from structural studies of the highly regulated β2 and β3 integrins, which are primarily expressed in blood cells. 6,7 Because structural information is lacking for most integrin members, it remains unclear whether the current model of integrin conformational change applies to the entire integrin family.
Protein structure prediction using the artificial intelligence-based AlphaFold2 program has provided a powerful tool for analyzing previously hard-to-determine protein structures with a high level of accuracy. 44 We analyzed the predicted atomic structure models of the 18 α and 8 β single-chain integrins available in the AlphaFold2 database (Fig. 1G). Furthermore, using the recently developed AlphaFold2-multimer program, 45 we predicted the structures of all 24 human integrin α/β heterodimers. Our analysis of the integrin family structures revealed potential conformational diversity across its 24 members. We also identified previously unknown structural features and created a comprehensive database of integrin structures that can guide functional and structural studies. These findings highlight the effectiveness of AlphaFold2 in predicting the structures of large, complex protein families such as the integrins.

Running AlphaFold2
AlphaFold2 version 2.1.2 was run on the HPC cluster at the Medical College of Wisconsin using Miniconda3 virtual environments. The downloaded AlphaFold2 reference files are located at "/hpc/refdata/alphafold". A customized sbatch job script was submitted for structure prediction. Typically, one GPU (--gres=gpu:1) and 100 GB of memory (--mem=100gb) were requested to run AlphaFold2. The maximum job running time was set to 48 h (--time=48:00:00). To run AlphaFold2-multimer for structure prediction of integrin heterodimers, an input FASTA file containing the sequences of both the integrin α and β subunits was provided. The multimer prediction function was enabled with the option "--model_preset=multimer". Full-length or extracellular-domain structures of integrin heterodimers without signal peptides were predicted without or with templates by setting "--max_template_date=2000-05-14" or "--max_template_date=2023-01-01", respectively. For integrin α6β4 structure prediction, the large cytoplasmic tail of β4 was truncated after KGRDV to simplify the prediction. The top-ranked models were selected for further analysis.
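The input preparation and job submission described above can be sketched in Python. This is a minimal, hypothetical helper, not the authors' script: the sequences, file names, and output directory are placeholders, and the site-specific database-path flags that run_alphafold.py also requires are deliberately left out. Only the options named in the Methods (--model_preset=multimer, the two --max_template_date values, the KGRDV truncation, and the sbatch resource requests) are taken from the text.

```python
# Hypothetical sketch of preparing an AlphaFold2-multimer integrin run.
# Placeholders (not from the Methods): sequences, file names, output paths.
# Database-path flags required by run_alphafold.py are site-specific and omitted.
from pathlib import Path

NO_TEMPLATE_DATE = "2000-05-14"    # predates all reported integrin structures
WITH_TEMPLATE_DATE = "2023-01-01"  # allows integrin structures as templates


def truncate_after_motif(seq: str, motif: str) -> str:
    """Return seq up to and including the first occurrence of motif."""
    idx = seq.find(motif)
    if idx == -1:
        raise ValueError(f"motif {motif!r} not found")
    return seq[: idx + len(motif)]


def fasta_text(chains: list[tuple[str, str]]) -> str:
    """One FASTA record per chain; multimer mode treats each record as a chain."""
    return "".join(f">{name}\n{seq}\n" for name, seq in chains)


def alphafold_args(fasta: Path, out_dir: Path, use_templates: bool) -> list[str]:
    """Command-line arguments for run_alphafold.py (database paths omitted)."""
    cutoff = WITH_TEMPLATE_DATE if use_templates else NO_TEMPLATE_DATE
    return [
        "python", "run_alphafold.py",
        f"--fasta_paths={fasta}",
        f"--output_dir={out_dir}",
        "--model_preset=multimer",
        f"--max_template_date={cutoff}",
    ]


def sbatch_script(job_name: str, args: list[str]) -> str:
    """Slurm script using the resource requests given in the Methods."""
    header = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        "#SBATCH --gres=gpu:1",
        "#SBATCH --mem=100gb",
        "#SBATCH --time=48:00:00",
    ]
    return "\n".join(header + [" ".join(args)]) + "\n"


if __name__ == "__main__":
    # Placeholder chains, not real integrin sequences.
    beta4 = truncate_after_motif("GGGGKGRDVSSSS", "KGRDV")
    print(fasta_text([("alpha6", "AAAAAAAA"), ("beta4", beta4)]))
    print(sbatch_script("a6b4", alphafold_args(Path("a6b4.fasta"), Path("out"), False)))
```

Keeping the two template cutoffs as named constants makes the "without templates" and "with templates" runs differ in exactly one argument.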

Comparison of AlphaFold2-predicted integrin structures
The single-chain α integrin structures downloaded from the AlphaFold2 database were superimposed on the αIIb calf-2 domain using the "super" command in PyMOL. The single-chain β integrin structures downloaded from the AlphaFold2 database were superimposed on the β3 βI domain using the super command in PyMOL. The experimentally determined structures of αIIb (PDB 3FCS), αV (PDB 4G1E), α5 (PDB 7NXD), and αX (PDB 4NEH) were superimposed onto the corresponding predicted structures. The experimentally determined structures of β3 (PDB 3FCS), β1 (PDB 7NXD), and β2 (PDB 4NEH) were likewise superimposed onto the corresponding predicted structures. The integrin heterodimer structures predicted by AlphaFold2-multimer with or without TM-CT domains were superimposed onto the calf-2 domain of αIIb in PyMOL. For comparison of the integrin TM-CT heterodimer structures, the structures were superimposed on the αIIb TM domain. The aligned structures were individually oriented so that they were positioned perpendicular to the cell membrane.
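PyMOL's super command performs a sequence-independent structural alignment with iterative outlier rejection. The sketch below shows only the core rigid-body step, a Kabsch least-squares fit, under the simplifying assumption that matched Cα coordinates (for example, of the calf-2 domain) are already available as NumPy arrays. As in the Methods, the transform is fitted on one domain and then applied to the whole model. This is a swapped-in, simplified technique for illustration, not a reimplementation of super.

```python
import numpy as np


def kabsch_fit(mobile: np.ndarray, target: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Rotation r and translation t minimizing the RMSD of (mobile @ r.T + t) to target.

    Both arrays are (N, 3) coordinates of matched atoms, e.g. calf-2 Cα atoms.
    """
    if mobile.shape != target.shape or mobile.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of matched atoms")
    mu_m, mu_t = mobile.mean(axis=0), target.mean(axis=0)
    h = (mobile - mu_m).T @ (target - mu_t)        # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if the optimal orthogonal matrix would be a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = mu_t - mu_m @ r.T
    return r, t


def apply_transform(coords: np.ndarray, r: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Apply the fitted rigid-body transform to any (M, 3) coordinates."""
    return coords @ r.T + t


def rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """Root-mean-square deviation between two matched (N, 3) coordinate sets."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.normal(size=(60, 3))            # stand-in for a whole model
    theta = 0.8
    rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                    [np.sin(theta), np.cos(theta), 0.0],
                    [0.0, 0.0, 1.0]])
    moved = reference @ rot.T + np.array([4.0, -1.0, 2.5])
    domain = slice(0, 20)                            # stand-in for the calf-2 atoms
    r, t = kabsch_fit(moved[domain], reference[domain])
    print(rmsd(apply_transform(moved, r, t), reference))
```

Fitting on a single rigid domain and carrying the transform to the rest of the chain is what makes the relative positions of the other domains (bent versus extended legs) directly comparable across models.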

Flow cytometry analysis of LIBS mAb binding
HEK293T cells were grown in complete DMEM (Corning) supplemented with 10% fetal bovine serum (FBS) (Sigma-Aldrich) and maintained in a 37 °C incubator with 5% CO₂. Flow cytometry analysis of integrin expression and LIBS mAb binding was performed as described previously. 47 In brief, HEK293T cells were transfected with EGFP-tagged α integrin constructs plus β1 integrin. At 48 hours post-transfection, the cells were detached, washed, and resuspended in HBSGB buffer (25 mM HEPES pH 7.4, 150 mM NaCl, 2.75 mM glucose, 0.5% BSA) containing 1 mM Ca²⁺/Mg²⁺ or 0.1 mM Ca²⁺ plus 2 mM Mn²⁺. Cells were incubated with either 9EG7 mAb (BD Biosciences) or MAR4 mAb (BD Biosciences) for 15 min, followed by an additional 15-min incubation with Alexa Fluor 647-conjugated goat anti-rat or anti-mouse IgG. Surface binding of mAb was measured with a BD Accuri™ C6 (BD Biosciences). The surface expression of extended β1 integrins was normalized to that of total surface β1 integrins and plotted with Prism 9.
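The normalization step can be sketched as follows. This is a hypothetical illustration, not the authors' analysis pipeline: it assumes that 9EG7 binding reports extended β1 and MAR4 binding reports total surface β1, that EGFP-positive (transfected) events are selected with a user-chosen threshold, that the mean fluorescence of gated events is used as the MFI, and that an optional background MFI (for example, secondary antibody alone) is subtracted. All names and the threshold are placeholders.

```python
from statistics import mean


def gated_mfi(alexa647: list[float], egfp: list[float], egfp_threshold: float) -> float:
    """Mean Alexa Fluor 647 signal of events whose EGFP signal exceeds the threshold."""
    if len(alexa647) != len(egfp):
        raise ValueError("per-event lists must have equal length")
    positive = [a for a, g in zip(alexa647, egfp) if g > egfp_threshold]
    if not positive:
        raise ValueError("no EGFP-positive events")
    return mean(positive)


def percent_extended(mfi_9eg7: float, mfi_mar4: float,
                     bg_9eg7: float = 0.0, bg_mar4: float = 0.0) -> float:
    """Background-corrected 9EG7 binding as a percentage of MAR4 (total β1) binding."""
    total = mfi_mar4 - bg_mar4
    if total <= 0:
        raise ValueError("total β1 signal must exceed background")
    return 100.0 * (mfi_9eg7 - bg_9eg7) / total
```

Computing percent_extended for the same construct in Ca²⁺/Mg²⁺ and in Mn²⁺ would give one way to compare basal and Mn²⁺-induced extension on a common scale.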

Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request. The predicted structures were deposited online as supplementary materials.
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/2023.05.02.539023; this version posted May 2, 2023.

The AlphaFold2 structures of 18 α and 8 β human integrins
We extracted the structures of 18 human α integrins from the AlphaFold2 protein structure database. All the structures in the database were predicted from the full-length single-chain amino acid sequence, including the signal peptide, ectodomain, TM, and CT domains. To compare the ectodomain structures of all the α integrins, we superimposed the structures on the calf-2 domain of αIIb; the individual structures were then oriented perpendicular to the cell membrane and rotated as necessary to show the position of the β-propeller domain relative to the membrane (Fig. 2). The structures were grouped by ligand or cell specificity. For all the structures, the folding of individual domains was correctly predicted (Fig. 2). The four RGD-binding α integrins all show a sharply bent conformation, with the αIIb and αV structures nearly identical to their crystal structures (Fig. 2A). However, the α5 AlphaFold2 structure is more bent than its half-bent cryo-EM structure (Fig. 2A). Of the three laminin receptors, only α6 is in a sharply bent conformation like the RGD receptors, while α3 and α7 are in a half-bent conformation (Fig. 2B). The α4 and α9 integrins are also bent like the RGD receptors (Fig. 2C). Interestingly, all four α integrins of the collagen receptors are more extended than bent, with α10 in a nearly fully extended conformation (Fig. 2D). The five leukocyte-specific α integrins also display conformational diversity, with αL and αX more bent than αM, αD, and αE (Fig. 2E). The AlphaFold2 structure of αX is nearly identical to its crystal structure (Fig. 2E).
We compared the structures of the 8 human β integrins that were predicted as single chains in the AlphaFold2 protein structure database. The structures were superimposed on the β3 βI domain and then individually oriented perpendicular to the cell membrane. As shown in Fig. 3, the folding of individual domains, including the βI, PSI, hybrid, I-EGF, and β-tail (β-TD) domains, was accurately predicted for β1 to β7 integrins. The β-TD of β8 is smaller than those of the other β integrins, and its structure was incompletely predicted (Fig. 3, β8). The AlphaFold2 structures of β2, β3, β4, β5, β6, and β7 all adopt a bent conformation, as seen in the crystal structures of β3 and β2, while the β1 and β8 structures are less bent (Fig. 3). The half-bent conformation of the β1 AlphaFold2 structure is comparable to its cryo-EM structure (Fig. 3, β1).

The domain interfaces where α and β integrins undergo extension
Previous structural studies revealed that the extension of α integrins occurs at the interface between the thigh and calf-1 domains, where a disulfide-bonded knob, denoted the genu, is located (Fig. 1C, F). We aligned the sequences of all α integrins at the junction of the thigh and calf-1 domains (Fig. 4A). The structure of αIIb was used as an example to show the interface between the thigh and calf-1 domains in the bent conformation (Fig. 4B). The interfacial residues are shown in red in the sequence alignment (Fig. 4A) and as red sticks in the structure (Fig. 4B). The sequence alignment shows that neither the interfacial residues nor the putative N-glycan sites are well conserved (Fig. 4A). Some α integrins, such as αV, α8, α4, α9, α10, and αE, have putative N-glycan sites at the interface of either the thigh or the calf-1 domain. Interestingly, the laminin receptors α3, α6, and α7 all have a longer interfacial loop (region 1) on calf-1 (Fig. 4A). However, no signature sequence appears to favor either a bent or an extended conformation.
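Putative N-glycan sites such as those discussed above are commonly identified by the N-X-S/T sequon, where X is any residue except proline. A minimal sketch, assuming that definition and 1-based residue numbering, scans a sequence and can optionally keep only sites inside a residue window (for example, a hypothetical thigh/calf-1 interface range). It does not predict whether a site is actually glycosylated.

```python
import re
from typing import Optional, Tuple

# Lookahead so that overlapping sequons (e.g. "NNST") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_sites(seq: str, window: Optional[Tuple[int, int]] = None) -> list[int]:
    """1-based positions of Asn in N-X-S/T sequons (X != Pro).

    If window=(start, end) is given (1-based, inclusive), only sites inside it are kept.
    """
    sites = [m.start() + 1 for m in SEQUON.finditer(seq.upper())]
    if window is not None:
        start, end = window
        sites = [s for s in sites if start <= s <= end]
    return sites
```

Running the same scan on each α sequence and mapping the hits onto alignment columns is one way to see at a glance whether interface sequons are shared or subunit-specific.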
The integrin β subunit extends at the I-EGF-1/I-EGF-2 junction (Fig. 1C, F). Sequence alignment of the 8 human β integrins in this region showed no obvious residue conservation apart from the classical disulfide bonds of EGF domains (Fig. 4C). In the bent conformation of β3 integrin (Fig. 4D), the interface between I-EGF-1 and I-EGF-2 is much smaller than the interface between the thigh and calf-1 domains of αIIb (Fig. 4B) and is unlikely to play a major role in maintaining the bent structure. However, the length of the C1-C2 loop in the I-EGF-2 domain has been shown to regulate integrin extension. 48 A landmark disulfide bond is missing in the I-EGF-1 domain of β8 (Fig. 4C), which may contribute, at least in part, to the distinct conformational regulation of β8 integrin.
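A missing landmark disulfide like the one noted for β8 I-EGF-1 shows up in a multiple sequence alignment as a column where cysteine is otherwise conserved. The sketch below flags such columns; the alignment dictionary, the 75% cut-off, and the toy sequences in the test are assumptions for illustration. A missing cysteine in sequence only suggests, rather than proves, a missing disulfide in the structure.

```python
def conserved_cys_columns(aln: dict[str, str], min_fraction: float = 0.75) -> dict[int, list[str]]:
    """Map each alignment column (1-based) in which at least min_fraction of the
    sequences carry Cys to the names of the sequences that lack Cys there."""
    if not aln:
        raise ValueError("empty alignment")
    lengths = {len(s) for s in aln.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must have equal length")
    (length,) = lengths
    flagged: dict[int, list[str]] = {}
    for col in range(length):
        lacking = [name for name, s in aln.items() if s[col].upper() != "C"]
        with_cys = len(aln) - len(lacking)
        if with_cys / len(aln) >= min_fraction:
            flagged[col + 1] = lacking
    return flagged
```

Columns whose "lacking" list is non-empty point to candidate missing disulfides, which can then be checked against the predicted structure.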

The integrin α/β heterodimer structures predicted by AlphaFold2-multimer
The integrin structures in the AlphaFold2 database are only for single-chain α and β subunits. We used the AlphaFold2-multimer module to predict the α/β heterodimer structures of all 24 human integrins. To avoid any potential model bias from known integrin structures, we set the template search date in the year 2000, before any integrin structures had been reported. All 24 integrin structures were successfully predicted by AlphaFold2-multimer (Fig. 6). All the structures were superimposed onto the αIIb calf-2 domain, individually oriented so that the ectodomains were perpendicular to the cell membrane, and grouped by ligand or cell specificity (Fig. 6). Overall, the inter-subunit interfaces, such as that between the α β-propeller and β βI domains, were correctly predicted. The αIIbβ3 and αVβ3 ectodomain structures are very close to the crystal structures, with Cα RMSDs of about 2 Å. All the RGD receptors adopted a bent conformation, except for α5β1, which is half-bent (Fig. 6A). Similarly, all the laminin receptors are bent (Fig. 6B). The four collagen receptors all show a half-bent structure, including α10β1 (Fig. 6C), although 9EG7 mAb binding suggested that α10β1 is more extended (Fig. 5D). Among the leukocyte-specific integrins, only αLβ2 and αEβ7 are sharply bent, while αMβ2, αXβ2, αDβ2, and αEβ7 are more extended (Fig. 6D). The α4β1 and α9β1 integrins are in a half-bent conformation (Fig. 6E). The orientation of the TM domains relative to the cell membrane is incorrectly predicted for most of the structures.
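The Cα RMSD quoted above is conventionally computed after optimal rigid-body superposition of matched Cα atoms. The text does not state which residues were paired, so the formula below gives only the standard definition:

```latex
\mathrm{RMSD} = \min_{R \in SO(3),\; \mathbf{t} \in \mathbb{R}^{3}}
\sqrt{\frac{1}{N} \sum_{i=1}^{N} \left\lVert R\,\mathbf{x}_i + \mathbf{t} - \mathbf{y}_i \right\rVert^{2}}
```

Here x_i and y_i are the coordinates of the i-th matched Cα atom in the predicted and experimental structures, and N is the number of matched pairs. An RMSD of about 2 Å therefore means that, after the best superposition, the matched Cα atoms deviate by roughly 2 Å in the root-mean-square sense.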
Because 9 of the integrin structures predicted by AlphaFold2-multimer show interactions between the TM-CT domains and the ectodomains (Fig. 7A), we asked whether such artificial interactions affect the overall structure prediction. We recalculated these 9 integrin structures without the TM-CT domains using AlphaFold2-multimer. The resulting structures show essentially the same conformations as those with TM-CT domains (Fig. 7B), suggesting that the predicted folding of the ectodomain and that of the TM-CT domains do not influence each other in AlphaFold2-multimer calculations.
Next, we investigated whether enabling the template option of AlphaFold2-multimer affected the calculated integrin structures. The α5β1, α10β1, αVβ8, and αXβ2 integrins were selected for this test. With the template search date set in 2023, all four integrin structures calculated by AlphaFold2-multimer showed a sharply bent conformation (Fig. 8A-D) closely resembling the crystal structure of bent αIIbβ3 (Fig. 8A). Notably, the α5β1 structure predicted with a template search date of 2023 is much more bent than the one predicted with a search date of 2000, which closely resembles the half-bent α5β1 cryo-EM structure (Fig. 8A). Similarly, the α10β1-2023 structure is bent, whereas the α10β1-2000 structure is half-bent (Fig. 8B). However, the αVβ8-2023 and αVβ8-2000 structures are bent to a similar degree (Fig. 8C). In sharp contrast, the αXβ2-2023 structure resembles the bent αXβ2 crystal structure, while the αXβ2-2000 structure is extended (Fig. 8D). These findings suggest that the inclusion of template structures can significantly affect the structure predictions of AlphaFold2-multimer.
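The effect of the template cutoff can be illustrated with a toy filter. AlphaFold2 discards template hits whose PDB release date falls after --max_template_date; the sketch below mimics only that date rule, with hypothetical template identifiers and release dates, and ignores the other template-filtering steps that AlphaFold2 applies.

```python
from datetime import date


def usable_templates(hits: dict[str, date], max_template_date: str) -> list[str]:
    """IDs of templates released on or before the cutoff (ISO date, YYYY-MM-DD)."""
    cutoff = date.fromisoformat(max_template_date)
    return sorted(pid for pid, released in hits.items() if released <= cutoff)


# Hypothetical hits: identifiers and release dates are placeholders, not real PDB entries.
HITS = {
    "templ_pre2000": date(1998, 3, 1),
    "templ_integrin_a": date(2004, 7, 15),
    "templ_integrin_b": date(2012, 1, 10),
}

if __name__ == "__main__":
    for cutoff in ("2000-05-14", "2023-01-01"):
        print(cutoff, usable_templates(HITS, cutoff))
```

With the 2000 cutoff only the pre-2000 placeholder survives, whereas the 2023 cutoff admits the later, integrin-like placeholders, which is the mechanism by which homologous bent structures can enter the prediction.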

Structures of integrin TM-CT domains
Despite the relative simplicity of the sequences and structures of integrin TM and CT domains, only the structure of the αIIbβ3 TM-CT heterodimer has been experimentally determined to date. Sequence alignment of the TM-CT domains of the 8 β and 18 α human integrins reveals conserved features in the TM, membrane-proximal (MP), and membrane-distal (MD) regions (Fig. 9A). We analyzed the 24 integrin TM-CT structures calculated by AlphaFold2-multimer (Fig. 9B). Structure alignment reveals a high degree of structural similarity among the heterodimers in the TM domain, with the conserved α GXXXG motif and β small G/A residue located at the α/β interface (Fig. 9B). The conserved GFFKR motifs in the CT MP regions of the 18 α integrins all adopt a reverse-turn conformation, while the β CT MP regions, except those of β4 and β8 integrins, all display an α-helical structure extending from the TM region, with the conserved Asp residue located at the α/β interface (Fig. 9B). The β CT Asp residue is close to the Arg residue of the α GFFKR motif (Fig. 9B), with which it was proposed to form a salt bridge. 50 The CT MD regions of both α and β subunits, including the conserved talin-binding NPXY motif, exhibit diverse disordered conformations (Fig. 9B). Although no integrin TM-CT structure templates were allowed during the AlphaFold2-multimer calculation, the predicted αIIbβ3 TM-CT structure closely resembles the experimentally determined structure (Fig. 9C). We also predicted the αIIbβ3 TM-CT structure in the absence of the ectodomain, which showed a TM interface similar to that predicted with the ectodomain present (Fig. 9D). These results suggest that AlphaFold2-multimer is capable of accurately predicting integrin TM structures.
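The TM-CT motifs discussed above can be located with simple pattern matching. A minimal sketch, assuming exact motif definitions (GXXXG, the GFFKR pentapeptide, NPXY) and 1-based positions; natural variants of these motifs (for example, GFFKR-like sequences with substitutions) would be missed, and the test sequence is a made-up placeholder.

```python
import re

# Lookaheads so that overlapping matches are all reported.
MOTIFS = {
    "GXXXG": re.compile(r"(?=G...G)"),
    "GFFKR": re.compile(r"(?=GFFKR)"),
    "NPXY": re.compile(r"(?=NP.Y)"),
}


def find_motifs(seq: str) -> dict[str, list[int]]:
    """1-based start positions of each motif in a TM-CT sequence."""
    s = seq.upper()
    return {name: [m.start() + 1 for m in pat.finditer(s)] for name, pat in MOTIFS.items()}
```

Mapping these positions onto a predicted TM-CT model would then show whether, for example, the GXXXG glycines face the partner helix.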

Discussion
Sequence alignment analysis has shown that the average sequence identity is 30-40% among the 8 β integrins and 20-40% among the 18 α integrins. 38 For individual integrin domains, such as the βI domain, sequence identity can exceed 60%. 38 Because AlphaFold2 incorporates the amino acid sequence, multiple sequence alignments, and homologous structures in its structure calculation, 44 the high sequence identity among integrin domains may facilitate accurate prediction of the domain structures of most family members. This is supported by comparison of the predicted structures with the experimental structures of αIIbβ3, αVβ3, and αXβ2, which show a high degree of structural similarity. Thus, the predicted integrin domain structures can be used with a high level of confidence.
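Sequence identity figures like those above depend on how gaps are counted. A minimal sketch, assuming pre-aligned sequences and counting identity over columns where neither sequence has a gap (one common convention among several), with the family average taken over all pairs; the toy alignment in the test is a placeholder.

```python
from itertools import combinations
from statistics import mean


def percent_identity(a: str, b: str) -> float:
    """Identity over alignment columns where neither sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns")
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def mean_pairwise_identity(aln: dict[str, str]) -> float:
    """Average percent identity over all unordered pairs of aligned sequences."""
    return mean(percent_identity(aln[p], aln[q]) for p, q in combinations(aln, 2))
```

Restricting the aligned strings to one domain (for example, the βI domain) gives the per-domain identities that are typically higher than whole-chain values.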
Our sequence and structure analysis of the inter-domain interfaces in bent integrin conformations did not reveal any signature sequences that clearly favor either a bent or an extended conformation in either the α or the β subunit. Although bent αIIb, αV, and αX structures in the PDB were part of its training set, AlphaFold2 predicted a diverse range of conformations for the 18 single-chain α structures, from sharply bent to almost fully extended, as in the case of α10. In contrast, the eight single-chain β structures predicted by AlphaFold2 were mostly bent. AlphaFold2-multimer also predicted both bent and extended conformations of α/β heterodimers. While our functional assay suggested that α10β1 may adopt an extended conformation on the cell surface, consistent with the predicted single-chain α10 structure, the α10β1 heterodimer structure predicted by AlphaFold2-multimer does not exhibit such a conformation. Because integrins can adopt multiple conformational states, the structures predicted by AlphaFold2 may reflect this conformational diversity. We observed that AlphaFold2-multimer predictions may be influenced by homologous structures, which could introduce bias into the predicted overall conformation of integrins. AlphaFold2-multimer has demonstrated remarkable success in accurately predicting the structures of protein complexes, including those with transient interactions, multiple subunits, and large interfaces. 51,52 Here, we showed that the AlphaFold2-multimer algorithm can successfully predict the structures of large integrin heterodimer complexes.
Although AlphaFold2, like many other protein structure prediction programs, has its limitations, its ability to predict integrin domain structures, inter- and intra-domain interfaces, and overall domain organization with an acceptable level of accuracy makes the predicted structures highly useful for designing both functional and structural studies. These structures can be used to analyze N-glycan sites of interest, functional mutations, and antibody epitopes, and to design constructs for integrin expression and purification. Additionally, the structures can be used for molecular dynamics simulations and ligand docking, and to help interpret functional data. However, caution is needed when using these predicted structures without experimental validation.

Author contributions
J.Z. designed and supervised the research. H.Z. performed the predictions using AlphaFold2-multimer and the 9EG7 binding assay. H.Z., D.Z., and J.Z. analyzed the data, prepared the figures, and wrote the manuscript.

Acknowledgement
This work was supported by grant R01 HL131836 (to J. Zhu) from the National Heart, Lung, and Blood Institute of the National Institutes of Health. We thank the Research Computing Center at the Medical College of Wisconsin for providing help and resources for running the AlphaFold2 program.
The conserved residues highlighted in panel A are shown as yellow Cα spheres. (C) Superimposition of the AlphaFold2-predicted αIIbβ3 TM-CT structure (α subunit in wheat, β subunit in magenta) on the αIIbβ3 TM-CT heterodimer structure determined by disulfide crosslinking and Rosetta modeling (blue).
(D) Superimposition of the αIIbβ3 TM-CT structures predicted without (green) and with (wheat and magenta) the ectodomain.