Revisiting structural organization of proteins at high temperature from network perspective

Interactions between amino acids that are distant in the primary chain (long-range) play a crucial role in forming and stabilizing the tertiary structure of a protein, while interactions between amino acids that are close in the primary chain (short-range) mostly stabilize secondary structures. Every protein needs to maintain marginal stability in order to perform its physiological functions in its native environment. The requirements for this stability differ between mesophilic and thermophilic proteins. Thermophilic proteins need to form more interactions, and more stable interactions, to survive in the extreme environments they inhabit. Here, we ask how amino acids that interact in three-dimensional space are positioned in the primary chains of thermophilic and mesophilic proteins, and how this arrangement helps thermophiles maintain their structural integrity at high temperatures. Working on a dataset of 1560 orthologous pairs, we find that thermophilic proteins are not only enriched in long-range interactions but also feature bigger connected clusters and higher network densities than their mesophilic orthologs at higher interaction strengths between the amino acids. Moreover, we observe enrichment of different types of interactions in different secondary structural regions.

Folding of proteins into their characteristic three-dimensional (3D) structures is important for every living organism. The mechanisms underlying protein folding and stability are yet to be completely understood [1,2]. Various theoretical studies, including simulations, statistical comparisons and machine learning, have been performed on available protein structures to understand the mechanism of protein stability and to propose guidelines for enhancing it [3-5]. In addition, experimental works have tried to engineer thermostable enzymes by incorporating knowledge from preceding theoretical and experimental studies [6,7]. Even after rigorous research over several decades, researchers have not agreed on any universal adaptation mechanism of thermostability; rather, it appears to be an interplay of several molecular features encoded in the 3D structures [8-10]. During protein folding, amino acids make several contacts among themselves, which play a crucial role in determining the native 3D structure of the protein. These contacts can be represented as an amino acid contact network in which the nodes are amino acids and an edge connects two nodes if the respective amino acids occur within a certain distance cutoff in 3D [11]. If two nodes, i and j, that interact in 3D space are separated by more than 10 amino acids in the primary chain ($|i - j| > 10$), they belong to the long-range interaction network (LRN) [11]. On the other hand, interacting nodes within 10 amino acids of each other in the primary chain form the short-range interaction network (SRN, where $|i - j| \geq 3$ and $|i - j| \leq 10$; the lower cutoff of 3 amino acids of separation in the primary chain is taken to minimize the effect of secondary structure in the 3D contact network) [11]. This network, or graph-theoretical, approach is widely used to decipher inherent features of residue contacts. It has also been used to compare network properties of contact networks and their sub-networks in thermophilic and mesophilic proteins [4].

In our last two studies we showed that thermophilic proteins feature more salt-bridges, disulfide bridges, cation-π, π-π and Coulombic interactions than their orthologous mesophilic counterparts, but how these interacting amino acids are positioned in the primary chain, and whether that positioning is associated with thermostability, was not tested [9,10]. Two amino acids that interact in three dimensions stabilize local structure when they are close to one another in the primary chain, whereas if they are distant in the primary chain they stabilize the tertiary structure and have a larger impact on global stability [11,12]. Previously, Sengupta and Kundu, comparing 12 meso-thermo orthologous protein pairs, showed that the length-normalized size of the largest connected component (LCC) of the long-range interaction network (LRN) is bigger in thermophilic than in mesophilic proteins at higher interaction strength cutoffs, while the short-range interaction network (SRN) exhibits no such trend [11]. In the current study, with a large dataset of 1560 orthologous protein pairs of different folds and functions [9], we explore whether these observations hold irrespective of the wide variety of size, fold, class and function of the proteins, and we further analyze the association of network density and of salt-bridge, π-π, cation-π and electrostatic interactions, at long and short range, with thermostability. We observe that the numbers of different interactions in orthologous thermo-meso protein pairs differ more significantly at long range than at short range. Thermophilic LRNs have larger connected components and higher density than their mesophilic counterparts for nearly 95% and 85%, respectively, of the 1560 orthologous pairs at higher interaction strength cutoffs ($I_{min}$, discussed in the materials and methods section), whereas no significant difference is observed in these parameters in SRN. In the Coulombic interaction network, by contrast, the number of interactions in thermophilic SRN is higher than in their mesophilic orthologs, while no statistical significance is observed in LRN. We further analyze how these interactions are distributed among the secondary structural elements of a protein to find a possible explanation for the two opposing patterns in the Coulombic and van der Waals interaction networks. This study, within its limited scope and simple experimental framework, captures several important features of the residue contact network and their association with thermostability.

Data collection
Comparing thermo-meso orthologous proteins based only on their evolutionary ancestry, without considering their global topologies, structures and functions, often leads to inconsistent and contradictory outcomes [9,10]. Therefore, in the current study we use a carefully curated dataset from our two previously published studies [9,10]. This dataset contains the sequences and X-ray crystallographic structures (≤ 3.0 Å resolution and ≥ 80% protein sequence coverage) of 1550 Meso-Thermo/Hyperthermo (M-T/HT) orthologous protein pairs that exhibit similar lengths, 3D topologies and ACO [13] (ACO is calculated as the average separation of contacting amino acids in the primary chain), and the same domain architectures. To these 1550 orthologous pairs we added the 12 pairs from Sengupta and Kundu's study [11]. After removing duplicates, the combined dataset contains a total of 1560 Meso-Thermo orthologous protein pairs (Data S1). Our dataset includes proteins from 82 bacteria and 14 archaea distributed among mesophilic, thermophilic and hyperthermophilic groups based on their optimal growth temperature. The lengths of the proteins range from 50 to over 1000 amino acids, and their ACO ranges from 13 to 107 [9]. The orthologous pairs contain nearly 300 different Pfam [14] domains. This diversity limits bias in the input dataset toward any particular protein size, fold or function.

Finding salt-bridge, cation-π, π-π and di-sulfide bridge interactions

The primary structure of a protein is a one-dimensional arrangement of twenty different amino acids connected through peptide bonds. During folding, distant parts of this primary chain come closer and form different kinds of molecular contacts, such as salt-bridges, di-sulfide bonds, cation-π and π-π contacts, and van der Waals interactions [9]. Salt-bridges are strong Coulombic interactions between the side chains of oppositely charged residues approximately ≤ 5 Å apart [15]. Oxidation of the sulfhydryl (SH) groups of cysteine residues spaced within approximately 2.3 Å in protein structures forms disulfide bonds [16]. A π-π interaction is formed when the centroids of two aromatic side chains lie 4.5 to 7.0 Å apart [17]. A cation-π interaction is formed when the positively charged side chain of lysine or arginine and the aromatic ring of an aromatic amino acid fall within approximately 4 Å of each other [18].

Finding out van der Waals and electrostatic interactions
A van der Waals interaction between two residues is considered when their atoms come within a cutoff distance of their combined van der Waals radii + 0.5 Å [19]. All these interactions have been reported in the literature to enhance the stability of a protein. Our own in-house Python scripts are used to identify these interactions.

An electrostatic interaction between two charged amino acids is considered when at least one ion from each of the two amino acids comes within a distance cutoff ranging from 4 Å to 10 Å [20]. For simplicity, we have not calculated the electrostatic interaction energy of these interactions. However, for the same pair of ions, the larger the distance cutoff, the lower the electrostatic interaction energy.
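
The van der Waals contact criterion above can be illustrated with a minimal sketch. It is not the in-house script used in this study: the atom representation (element symbol followed by x, y, z coordinates), the radii table `VDW_RADII` and the function names are assumptions made for illustration only.

```python
import math

# Illustrative van der Waals radii in angstroms. These values are an
# assumption; the text does not state which radii set was used.
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}
TOLERANCE = 0.5  # angstroms added to the combined radii, as in the text


def atoms_in_contact(atom_a, atom_b, tolerance=TOLERANCE):
    """Return True if two atoms lie within combined vdW radii + tolerance.

    Each atom is a tuple (element, x, y, z); this layout is hypothetical.
    """
    elem_a, *xyz_a = atom_a
    elem_b, *xyz_b = atom_b
    cutoff = VDW_RADII[elem_a] + VDW_RADII[elem_b] + tolerance
    return math.dist(xyz_a, xyz_b) <= cutoff


def count_contacting_atom_pairs(residue_a, residue_b):
    """Count distinct atom pairs, one atom from each residue, in contact."""
    return sum(atoms_in_contact(a, b) for a in residue_a for b in residue_b)
```

Under these assumptions, two residues would be treated as interacting when `count_contacting_atom_pairs` returns a value greater than zero; the same count is the quantity that enters the interaction strength used for network construction.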

Calculating network strength
Protein contact networks are constructed considering the amino acids as nodes and the van der Waals interactions among these nodes as edges. For network formation, we calculate the interaction strength of amino acid side chains, $I_{ij}$, given by [11,21]

$$I_{ij} = \frac{n_{ij}}{\sqrt{N_i \times N_j}} \times 100$$

where $n_{ij}$ is the number of distinct interacting pairs of side-chain atoms between residues i and j that come within distance d in 3D space (d = combined van der Waals radii of the atoms + 0.5 Å [19]). $N_i$ and $N_j$ are the normalization factors for residues i and j, respectively. The normalization factor for amino acid i is given as [11,22]

$$N_i = \frac{1}{p} \sum_{k=1}^{p} \mathrm{Max}(\mathrm{Type}(i))_k$$

For a protein k, all the side-chain and main-chain interactions of residue type i with its surrounding residues are calculated. $\mathrm{Max}(\mathrm{Type}(i))_k$ is the maximum number of interaction pairs made by residue type i in protein k. The normalization factor for residue type i ($N_i$) is the average of $\mathrm{Max}(\mathrm{Type}(i))_k$ over the total number of proteins in the study ($p$). The normalization factors account for differences among amino acid types in side-chain size and in propensity to make contacts with other residues in protein structures [11,22]. Using the same methodology, the interaction strengths for electrostatic interactions are also calculated for different distance cutoffs.
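
As a worked illustration of the normalization described above, the following sketch assumes the interaction strength takes the form commonly used in the cited network studies: the number of contacting atom pairs divided by the geometric mean of the two residue-type normalization factors, expressed as a percentage. The function names and the example numbers are hypothetical.

```python
import math


def normalization_factor(max_contacts_per_protein):
    """Average, over all proteins, of the maximum number of interaction
    pairs observed for one residue type in each protein."""
    return sum(max_contacts_per_protein) / len(max_contacts_per_protein)


def interaction_strength(n_ij, norm_i, norm_j):
    """Interaction strength (in percent) between residues i and j.

    n_ij   : number of contacting side-chain atom pairs between i and j
    norm_i : normalization factor for the residue type of i
    norm_j : normalization factor for the residue type of j
    """
    return n_ij / math.sqrt(norm_i * norm_j) * 100.0
```

For example, if a residue type makes at most 40 and 60 interaction pairs in two proteins, its normalization factor is 50; two residues of that type sharing 5 contacting atom pairs would then have an interaction strength of 10%.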

Construction of long range network (LRN) and short range network (SRN) for van der Waals and Coulombic interaction

Once $I_{ij}$ has been evaluated for all amino acid pairs, the contact networks are formed, with an edge between nodes i and j whenever $I_{ij}$ is higher than a chosen cutoff value ($I_{ij} > I_{min}$) [11]. This cutoff is varied from 0% to 12% for the van der Waals interaction network and from 0% to 25% for the Coulombic interaction network in our study. The number of edges in the networks decreases with increasing $I_{min}$ cutoff, as the number of residue pairs with a correspondingly high number of interacting atom pairs also decreases. The contact networks are then used to construct the long-range interaction network (LRN, where $|i - j| > 10$ in the primary chain) and the short-range interaction network (SRN, where $|i - j| \geq 3$ and $|i - j| \leq 10$ in the primary chain) [11]. Primary chain separations below three are not considered in the contact network, as immediate neighbors naturally feature a large number of interacting atom pairs. Using the same methodology, the LRNs and SRNs for all the orthologous pairs are calculated.
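
The partition into long-range and short-range networks can be sketched as follows. This is an illustrative example rather than the scripts used here; the edge representation (a dictionary keyed by residue index pairs) and the names `split_by_sequence_separation` and `i_min` are assumptions.

```python
def split_by_sequence_separation(strengths, i_min, short_min=3, long_min=10):
    """Split residue contacts into long-range and short-range edge lists.

    strengths : dict mapping (i, j) residue indices to interaction
                strength in percent (hypothetical input layout)
    i_min     : interaction strength cutoff in percent; only pairs with
                strength strictly above it become edges
    """
    lrn, srn = [], []
    for (i, j), strength in strengths.items():
        if strength <= i_min:
            continue  # below the strength cutoff: no edge
        separation = abs(i - j)
        if separation > long_min:
            lrn.append((i, j))   # more than 10 residues apart
        elif separation >= short_min:
            srn.append((i, j))   # 3 to 10 residues apart
        # separations below 3 are ignored (immediate neighbors)
    return lrn, srn
```

Repeating the call over a range of `i_min` values yields the series of progressively sparser networks whose properties are compared between orthologs.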

Estimating the size of largest connected component (LCC) and network density (ND) for van der Waals interaction networks
For different $I_{min}$ cutoff values we extract the LCC of the LRNs and SRNs. The LCC is the largest group of nodes in a network that are reachable from each other, directly or indirectly [11,21,22]. Here, we evaluate and compare the size of the LCC and the average network density (ND) in LRNs and SRNs for each of the M-T/HT orthologous protein pairs. The ND of a network component is defined as the ratio of the number of edges in the component to the maximum possible number of edges between its nodes. For a connected component with E edges and N nodes, ND = 2E/(N(N−1)) [23]. The weighted average of the NDs of all connected components is then calculated.
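
A minimal sketch of the LCC and ND calculations is given below, using only the Python standard library. The function names are hypothetical. The text does not specify the weights of the weighted average, so weighting each component by its node count is an assumption, as is applying here the restriction to components with at least three nodes.

```python
from collections import defaultdict


def connected_components(edges):
    """Return a list of (node_set, edge_count) for an undirected edge list."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, nodes = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in nodes:
                continue
            nodes.add(node)
            stack.extend(adjacency[node] - nodes)
        seen |= nodes
        # every edge is counted once from each endpoint
        n_edges = sum(len(adjacency[n]) for n in nodes) // 2
        components.append((nodes, n_edges))
    return components


def lcc_size(edges):
    """Number of nodes in the largest connected component."""
    return max((len(n) for n, _ in connected_components(edges)), default=0)


def network_density(n_nodes, n_edges):
    """ND = 2E / (N(N-1)) for one connected component."""
    return 2.0 * n_edges / (n_nodes * (n_nodes - 1))


def weighted_average_density(edges, min_nodes=3):
    """Node-count-weighted mean ND over components with >= min_nodes nodes.
    The weighting scheme is an assumption, not taken from the text."""
    comps = [(len(n), e) for n, e in connected_components(edges)
             if len(n) >= min_nodes]
    total = sum(n for n, _ in comps)
    if total == 0:
        return 0.0
    return sum(n * network_density(n, e) for n, e in comps) / total
```

Dividing `lcc_size` by the protein length would give the length-normalized LCC size that is compared between thermophilic and mesophilic orthologs.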

Statistics and plot generation
PAST 4.02 (PAleontological STatistics) software [24] and our own in-house Python scripts are used for the statistical analyses. All plots are produced using OriginPro (OriginLab, Northampton, MA, USA) and the Pyplot package of Python 2.7.

RESULTS AND DISCUSSION
LRN based on van der Waals interactions possess larger connected components even at higher interaction strength cutoffs

The role of van der Waals interactions in protein stability has been well documented in the literature. Several studies have examined parameters of networks built on van der Waals interactions for M-T/HT orthologous proteins, but these parameters do not clearly differentiate mesophilic from thermophilic proteins and often give contradictory outcomes. Among these parameters, the importance of the largest connected component (LCC) in network analysis has been well tested [11,21,22]. The LCC provides information on the nature and connectivity of a network, and its size undergoes a transition as a function of the $I_{min}$ cutoff, for the whole contact network as well as for its sub-networks. Sengupta and Kundu showed that the difference in normalized LCC size between thermo and meso is higher for the long-range network (LRN) than for the short-range network (SRN) [11]. A similar trend is observed in the normalized LCC vs $I_{min}$ profiles for the 1560 orthologous pairs in the current study. This pattern is not equally present in every orthologous pair (Supplementary Figure S1). Therefore, we take the mean value of the normalized LCC size at every $I_{min}$ cutoff and plot it in Figure 1A. The mean LCC size in SRN does not differ significantly between meso and thermo at any cutoff, while LRN in thermo holds a significantly higher value than LRN in meso even at higher $I_{min}$ cutoffs (the p values of the Mann-Whitney U test are given as a gradient under the plot in Figure 1A). The percentage increase in LCC size in thermo differs significantly between SRN and LRN at all $I_{min}$ cutoffs (Figure 1B), and the difference in their median values increases with increasing $I_{min}$ cutoff. We then estimate the percentage of pairs for which the LCC size is bigger in thermophilic proteins than in their mesophilic orthologs at all cutoffs higher than a selected $I_{min}$ cutoff. For that, we calculate the average LCC size of the orthologous proteins over all cutoffs higher than the selected $I_{min}$. We also take the percentage change of LCC size in the thermophilic protein with respect to its mesophilic counterpart, denoted $\Delta LCC_{SRN}$ for SRN and $\Delta LCC_{LRN}$ for LRN, where $\Delta LCC = \frac{LCC_{thermo} - LCC_{meso}}{LCC_{meso}} \times 100$. In Figures 1C and 1D we show the percentage of orthologous pairs in which the LCC size is higher in the thermophilic protein than in its mesophilic ortholog, with the $I_{min}$ cutoff varying from 0% to 10% and the $\Delta LCC$ cutoff varying from 0% to 20% in incremental steps of 5%, for LRN and SRN, respectively. At higher $I_{min}$ and higher $\Delta LCC_{SRN}$ cutoffs, the percentage of pairs with $\Delta LCC_{SRN} > 0$ drops below 30% (Figure 1D). On the other hand, for LRN, nearly 90% of the total pairs satisfy the trend at higher cutoffs ($I_{min} \geq 6\%$), even at $\Delta LCC_{LRN} \geq 20\%$ (Figure 1C). This indicates that, at higher $I_{min}$ cutoffs, the largest connected component of the long-range van der Waals interaction network is bigger in thermophilic proteins than in their respective mesophilic orthologs, while the transition of LCC size in SRN is similar for thermophiles and mesophiles.

Long-range interactions among amino acids in a protein contribute to the stability and integration of its tertiary structure. Moreover, the strength of interactions reflects the connectivity of amino acids within a protein, influencing its packing and stability. The difference in the LCC transition profiles of LRN between thermophilic and mesophilic orthologous proteins suggests that long-range interactions underlie the enhanced stability of the tertiary structure in thermophiles. Furthermore, the larger long-range network components in thermophilic proteins indicate a need for more non-covalent interactions to bring together distant parts of the primary structure in three-dimensional space. Proteins naturally destabilize at higher temperatures; one convenient way of preventing this could be to create a higher number of interactions among the constituent amino acids.
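
The pair-counting procedure used for Figures 1C and 1D can be sketched as below. The sketch assumes that the percentage change is taken relative to the mesophilic value and that a pair is counted when the change meets or exceeds the chosen threshold; the profile layout (a dictionary from cutoff to normalized LCC size) and all function names are hypothetical.

```python
def percent_change(thermo_value, meso_value):
    """Percentage change of the thermophilic value relative to the mesophilic one."""
    return (thermo_value - meso_value) / meso_value * 100.0


def mean_above_cutoff(profile, selected_cutoff):
    """Average normalized LCC size over all cutoffs above the selected one.

    profile : dict mapping strength cutoff (percent) to normalized LCC size
    """
    values = [v for cutoff, v in profile.items() if cutoff > selected_cutoff]
    return sum(values) / len(values)


def percent_pairs_exceeding(pairs, selected_cutoff, delta_cutoff):
    """Percentage of orthologous pairs whose percentage change in mean LCC
    size (thermo vs meso) is at least delta_cutoff.

    pairs : list of (thermo_profile, meso_profile) tuples
    """
    hits = 0
    for thermo_profile, meso_profile in pairs:
        thermo_mean = mean_above_cutoff(thermo_profile, selected_cutoff)
        meso_mean = mean_above_cutoff(meso_profile, selected_cutoff)
        # pairs with a zero mesophilic mean are skipped: change is undefined
        if meso_mean > 0 and percent_change(thermo_mean, meso_mean) >= delta_cutoff:
            hits += 1
    return 100.0 * hits / len(pairs)
```

Evaluating `percent_pairs_exceeding` on a grid of strength cutoffs and change thresholds would give the values shown in a contour plot of this kind; the same functions apply unchanged to network density profiles.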

Denser LRN of van der Waals interaction network is a key feature of thermophilic proteins
A network is made of a set of connected components. The largest connected component alone is not enough to express protein stability from a network point of view. All the connected components will have some effect on the stability of a protein, depending on how densely connected their nodes are. Therefore, we calculate the average network density (ND) of each protein over the components that contain at least three nodes. This average ND is calculated for $I_{min}$ cutoffs varying from 0% to 12% and plotted using the Python matplotlib package (a sample of 4 pairs is provided in Supplementary Figure S2). The mean of the average ND of all connected components in a network for the 1560 pairs is plotted against the $I_{min}$ cutoff in Figure 1E and Supplementary Figure S2. For SRN, the means of the average ND for thermo and meso are very close to one another at every $I_{min}$ cutoff (Supplementary Figure S2), whereas for LRN we observe a significant difference between thermo and meso at all $I_{min}$ cutoffs. The significance also becomes stronger at higher cutoffs ($I_{min} \geq 5\%$) (Figure 1E). We also estimate the percentage of pairs for which the average ND value is consistently bigger at all cutoffs higher than a selected $I_{min}$ cutoff. Using a formula similar to the one used for calculating $\Delta LCC$, we also estimate $\Delta ND$ for LRN and SRN. The percentage of pairs with a higher average ND value in the thermophilic protein than in its mesophilic ortholog is represented in a contour plot for different $\Delta ND$ and different $I_{min}$ cutoffs for LRN and SRN in Figure 1F and Supplementary Figure S3 [...] proteins are more densely connected than their mesophilic orthologs. Because of these densely connected nodes, thermophilic proteins can provide more resistance against thermal denaturation.

The network density (ND) of a protein refers to the level of connectivity among its residues. A higher ND indicates a greater number of interactions and a more densely connected protein structure. ND plays a crucial role in the stability of proteins: a densely connected network provides structural integrity and support, ensuring that the protein maintains its 3D shape. Therefore, changes in ND can indicate conformational changes or structural rearrangements in response to external stresses such as temperature or salinity. Here, the ND of SRN in thermophilic and mesophilic orthologous proteins shows a similar transition profile over a wide range of $I_{min}$ cutoffs, whereas the ND of LRN differs significantly. This indicates that the need for thermal adaptation does not affect SRN but changes LRN toward greater interconnectedness between the nodes.

Thermophilic and mesophilic proteins differ in SRN formed on the basis of Coulombic interactions
The LRN and SRN of the Coulombic interaction networks contain both attractive and repulsive interactions. Most of them do not form clusters but occur as isolated interaction pairs in the network. So, instead of estimating the LCC size, we take the number of pairs featuring an interaction strength higher than a selected $I_{min}$ cutoff. We then take the net attractive−repulsive interaction count, [...] 1560 orthologous pairs for different distance cutoffs (Figure 2A-D). In contrast to the van der Waals network, we observe that SRN shows the larger difference between thermo and meso, and this trend weakens with increasing $I_{min}$ value, when the distance cutoff for considering a Coulombic interaction is ≤10 Å (Figure 2A). As we decrease the distance cutoff for Coulombic interactions, the pattern does not change for cutoffs of ≤8 Å and ≤6 Å (Figure 2B and 2C). For the distance cutoff of ≤4 Å, however, both the LRN and the SRN Coulombic interaction networks exhibit higher mean values. The mean value is higher for LRN at ≤4 Å (Figure 2D), which means that distant amino acids in thermophilic proteins form a higher number of stronger attractive Coulombic interactions (the Coulombic force increases with decreasing distance) than those in their mesophilic orthologs, although the effect of Coulombic interactions is mostly limited to the stabilization of local structure.

Coulombic interactions play an important role in the folding of proteins and in maintaining their 3D structures [25-27]. During protein folding, charged residues interact with each other and with the solvent, helping proteins stabilize into specific conformations [26,27]. These interactions help in the proper positioning of secondary structure elements as well as in the formation of tertiary structures [26,27]. A Coulombic interaction occurs between charged residues that are spatially separated in the protein structure. Although the energetic contribution of a Coulombic interaction is smaller than that of a salt-bridge and varies with the distance and the charges of the ions, cumulatively these interactions can influence the stability of the protein by contributing to the overall electrostatic potential energy [10]. Depending on the charges involved, Coulombic interactions can either stabilize or destabilize the protein structure. In the current study, when we take the net attractive−repulsive Coulombic interaction count, we assume that all interactions above a given $I_{min}$ cutoff contribute equally to the overall electrostatic potential energy. Therefore, an increase in this count indicates a higher contribution to the electrostatic potential energy. We observe that thermophilic proteins feature a higher number of salt-bridges in both LRN and SRN than their mesophilic orthologs (Figure 3A). Similarly, when we consider the net electrostatic interactions of LRN and SRN, we observe an indistinguishable pattern for the distance cutoff of ≤4 Å (close to the salt-bridge cutoff) (Figure 2D). However, at higher distance cutoffs, the difference in net Coulombic interactions between thermophilic and mesophilic SRNs is much higher than that between the LRNs of the orthologs (Figure 2A-C). This indicates that Coulombic SRN aids more in achieving thermostability than Coulombic LRN.
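
The net attractive−repulsive count can be illustrated with a short sketch in which every interacting pair above the cutoff contributes equally, as assumed in the text. The charge table is an assumption: only aspartate, glutamate, lysine and arginine are treated as charged here, and the source does not state how histidine or chain termini were handled. The function name is hypothetical.

```python
# Formal side-chain charges used for illustration only (assumed set).
CHARGES = {"ASP": -1, "GLU": -1, "LYS": +1, "ARG": +1}


def net_attractive_minus_repulsive(pairs):
    """Net count of attractive minus repulsive Coulombic pairs.

    pairs : iterable of (residue_name_a, residue_name_b) for interacting
            charged residue pairs that passed the strength cutoff
    """
    net = 0
    for res_a, res_b in pairs:
        if CHARGES[res_a] * CHARGES[res_b] < 0:
            net += 1   # opposite charges: attractive
        else:
            net -= 1   # like charges: repulsive
    return net
```

Computing this count separately for the long-range and short-range edge lists of a thermophilic protein and its mesophilic ortholog, and taking the difference, gives the kind of quantity compared across distance cutoffs.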

Stabilization of secondary structures and higher order structural organization
Van der Waals and Coulombic interactions are two of the five major interactions that play a key role in the folding and stability of proteins. The distribution of Coulombic and van der Waals interactions between different structural segments of a protein varies with the specific protein, its conformation, amino acid sequence, 3D structure and local environment. Thermophilic proteins are expected to have more van der Waals and Coulombic interactions [9,10], but it is more interesting to find out how these additional interactions in a thermophilic ortholog are distributed within the protein structure.

The largest connected component (LCC) in van der Waals SRN and the number of net attractive−repulsive interactions in Coulombic SRN denote strong local structural organization, whereas the same quantities in LRN denote strong global structural organization of the contacting nodes. As we found that thermophilic proteins feature a higher number of net attractive−repulsive interactions in Coulombic SRN, we expect Coulombic interactions to be over-represented among intra secondary structural (intra SS) nodes, while van der Waals interactions are expected among inter secondary structural (inter SS) nodes. Distributing these interactions from both mesophilic and thermophilic orthologous proteins into three groups, i) intra SS, ii) inter SS and iii) loop-linked, we observe that thermophilic proteins are enriched in inter SS and loop-linked van der Waals interactions (Figure 2E) and in intra SS and loop-linked Coulombic interactions (Figure 2F). As loop regions feature mostly polar and charged residues [28], they can be stabilized by forming both van der Waals and Coulombic interactions within themselves and with the structured regions.

Salt-bridge, cation-π and π-π interactions are more abundant in thermophilic LRN than their orthologous mesophilic LRN

Folding of a protein into its native structure is achieved by clustering of hydrophobic patches followed by formation of several kinds of interactions as distant regions of the protein come closer. While interactions in SRN mostly stabilize secondary structures, interactions in LRN help hold distant regions of the protein together. In our previous studies we showed that thermophilic proteins feature higher numbers of salt-bridges, di-sulfide bridges, cation-π and π-π interactions, as well as a higher Coulombic energy gain from partially exposed charge reversal mutations, but their association with thermostability at long and short range was yet to be explored. Here, an interaction is considered short-range if the primary chain separation of the interacting residues is between three and ten, and long-range if it is beyond ten. We therefore compare the numbers of these interactions (N(T) and N(M) are the numbers of a given type of interaction in a thermophilic protein and its mesophilic counterpart, respectively) at short and long range for the orthologous pairs. In these comparisons, our null hypothesis [...] thermophilic, respectively. The number of salt-bridges is higher at long range than at short range in both mesophilic and thermophilic proteins. The p values of the Mann-Whitney U test between mesophilic and thermophilic proteins are similar at short and long range (8.7E−35 at short range and 1.8E−38 at long range, Figure 3A). The average numbers of cation-π interactions in mesophilic and thermophilic proteins are 1.9 and 2.8 at short range, respectively, and 3.3 and 5 at long range. For π-π interactions, these numbers are 2.4 and 2.7 at short range and 5.4 and 6.1 at long range in mesophilic and thermophilic proteins, respectively. The p values of the Mann-Whitney U test between thermophilic and mesophilic proteins for both of these interactions show higher significance at long range than at short range (Figure 3B and 3C). Di-sulfide bridges are comparatively rare in protein structures, and when we distribute them between long and short range their numbers become even smaller. We could not find any significant difference between the Meso-Thermo/Hyperthermo (M-T/HT) orthologous proteins at either short or long range (Figure 3D).

Structural organization of thermophilic proteins depends on the long range interactions to a greater extent
Earlier studies have shown that the normalized size of the LCC undergoes a transition with increasing $I_{min}$ cutoff for all proteins, and that the transition is sharper for SRN than for LRN. Working on a dataset of twelve meso-thermo orthologous protein pairs, Sengupta and Kundu showed that the transition of the normalized LCC size in SRN is similar for mesophilic and thermophilic proteins, but that the transition in LRN is sharper for mesophilic than for thermophilic proteins, which indicates the presence of larger interconnected long-range interactions in thermophilic proteins [11]. SRN is the network of interacting amino acids that are also closely positioned in the primary chain, and hence provides local stability. LRN, on the other hand, contains amino acids positioned far from each other in the primary chain but interacting in three dimensions. LRN thus connects distant parts of a protein, establishes interactions between several secondary structures, and thereby stabilizes the tertiary structure. We observe that with increasing $I_{min}$ cutoff, the normalized LCC size gradually decreases in both SRN and LRN. To study this transition, we estimate the $I_{critical}$ value [11] (the $I_{min}$ value at which the normalized LCC size becomes half of its size at $I_{min}$ = 0%) of all the networks for the 1560 orthologous pairs. The lower the $I_{critical}$ value for a plot, the sharper the transition, which means that the network loses its connections at a smaller $I_{min}$ cutoff. The transitions in SRN look similar in mesophilic and thermophilic proteins (the $I_{critical}$ value falls within a range of 2%-2.25% for both meso and thermo SRN in the 1560 pairs). The transitions in LRN are not as sharp as those in SRN: LRNs drop to half of their LCC size at a higher $I_{min}$ cutoff, i.e. a higher $I_{critical}$ value. Moreover, the $I_{critical}$ value for thermo_LRN (4.0% to 4.5% for the 1560 pairs) is even higher than that for meso_LRN (3.5% to 4.0% for the 1560 pairs). This means that thermophilic proteins have not only a higher number of interacting amino acid pairs but also a larger connected cluster of long-range interactions at larger $I_{min}$ cutoffs. In addition, our study considers the effect of the other connected components alongside the largest one. The average ND of the connected components is higher in thermophilic proteins than in their mesophilic counterparts, indicating that densely connected nodes stabilize thermophilic proteins even at higher $I_{min}$ cutoffs.

From previous literature we know that one of the key forces stabilizing individual secondary structures is the repetitive formation of hydrogen bonds. The alpha helix and the beta sheet are the two most stable polypeptide conformations, with the highest hydrogen bonding potential [29]. So the higher number of Coulombic interactions within the same secondary structure (intra SS in Figure 2F) is not the key force stabilizing them, but rather an associative one. In addition, when we set the distance cutoff for Coulombic interactions to ≤4 Å, a higher number of interactions is found in thermophilic LRN, which means that although a large gain in long-distance (weak) Coulombic interactions is found in SRN, fewer but stronger Coulombic interactions are found in LRN. Therefore, Coulombic interactions also have a key role in stabilizing the global structure. Moreover, when different secondary structural elements are driven closer by hydrophobic interactions, a large number of non-specific van der Waals interactions form between them. Beta sheet structure is also stabilized by the formation of long-range interactions between separate beta strands. Our finding that van der Waals interactions are over-represented in inter secondary structural and loop-linked interactions supports these previous observations (Figure 2E).

Our study also shows a higher number of long-range salt-bridge, cation-π and π-π interactions in thermophiles, which may assist in stabilizing the tertiary structure at high temperature. Our analysis of a large, carefully curated dataset reconfirms the outcomes of previous studies and identifies a few more features of the association of thermostability with long-range interactions, which can encourage further exploration of the network aspect of thermostability.

Author Contribution

[...] research; both analyzed data and wrote the paper.