Ab initio modelling of an essential mammalian protein: Transcription Termination Factor 1 (TTF1)

Transcription Termination Factor 1 (TTF1) is an essential mammalian protein that regulates cellular transcription, replication fork arrest, DNA damage repair, and chromatin remodelling, among other processes. TTF1 interacts with numerous cellular proteins to regulate various cellular phenomena and plays a crucial role in maintaining normal cellular physiology; its dysregulation has been linked to the cancerous transformation of cells. However, despite its key role in cellular physiology, the complete structure of human TTF1 has not been elucidated to date, either experimentally or computationally. Understanding the structure of human TTF1 is therefore important for studying its functions and its interactions with other cellular factors. The aim of this study was to construct the complete structure of the human TTF1 protein using molecular modelling approaches. Owing to the lack of suitable homologues in the PDB, the complete structure of human TTF1 was constructed using ab initio modelling. The structural stability was assessed using molecular dynamics (MD) simulations in explicit solvent and trajectory analyses. The representative structure of human TTF1 was obtained by trajectory clustering, and the central residues were determined by centrality analyses of the residue interaction network of TTF1. Two residue clusters, in the oligomerisation domain and the C-terminal domain, were determined to be central to the structural stability of human TTF1. To the best of our knowledge, this study is the first to report the complete structure of human TTF1, and the results obtained herein will provide structural insights for future research in cancer biology and related studies.

Author Summary
Transcription termination factor 1 (TTF1) is an essential multifunctional mammalian protein that plays an important role in regulating cellular processes such as transcription, replication, DNA damage repair, and chromatin remodelling, and its dysregulation leads to various cancers. Despite its importance, the complete structure of human TTF1 has not been determined to date, either experimentally or computationally. Therefore, the aim of this study was to construct the complete structure of human TTF1 using computational modelling. In this study, the complete structure of human TTF1 was constructed by ab initio modelling using iTasser. The stability of this model was assessed by 200 ns molecular dynamics (MD) simulations. The representative conformation of human TTF1 was then determined by clustering the simulation trajectory, and the residues that are central to the stability of this structure were identified. The results demonstrate the presence of two residue clusters in human TTF1, one in the oligomerisation domain and the other in the C-terminal domain, which were found to be crucial for the structural stability of this protein. The results of this study will aid future work aimed at engineering this important protein for further biochemistry and cell biology research.


Introduction
Ribosomes are essential cellular organelles that carry out protein synthesis in both prokaryotes and eukaryotes. Ribosomes are composed of ribosomal proteins and ribosomal RNA (rRNA), which is encoded by ribosomal DNA (rDNA) and serves as the catalytic subunit of the protein translation machinery. Eukaryotic rDNA is distributed in clusters of ~300-400 copies at both ends of the respective chromosomes (acrocentric chromosomes 13, 14, 15, 21, and 22). These tandem repeats of rDNA copies create dense chromosomal regions called Nucleolar Organizer Regions (NORs), which consist of a non-transcribed spacer region flanked by pre-RNA coding regions. Of the total RNA that is transcribed, 80% consists of rRNAs [1,2]. Both the initiation and the termination of rDNA transcription are mediated by a transcriptional regulator called Transcription Termination Factor 1 (TTF1), which is an essential protein in mammalian cells. The gene encoding TTF1 is located at 9q34.13, on the long arm of chromosome 9. The Transcription Termination Factor 1 protein (TTF1p) binds to DNA elements known as Sal boxes, located upstream and downstream of the rDNA gene repeats. In mammalian cells, the Sal box element consists of a SalI restriction site within the 11 bp sequence GGGTCGACCAG [3]. After TTF1 was discovered as a transcription regulator, subsequent studies demonstrated that it is involved in polar replication fork arrest and also acts as a chromatin remodelling factor [4,5]. Current findings demonstrate that TTF1p interacts with various DNA damage sensing proteins, including Cockayne Syndrome B (CSB) [6], Mouse Double Minute 2 (MDM2) [7], and the tumour suppressor Alternative Reading Frame (ARF) protein [8], but the mechanisms and exact roles of TTF1p remain to be identified. The overexpression of TTF1 has been correlated with various tumours, which indicates that, owing to tumour hyperproliferation, TTF1 is required in higher quantities to meet the higher rate of ribosome biogenesis in tumour cells [9-11]. The TTF1 protein has several other unidentified roles, as it appears to interact with various other factors necessary for regulating a wide variety of physiological phenomena in cells. TTF1 is truly a multifunctional protein, and it is therefore important to characterise the numerous unidentified roles of this protein in cellular physiology, in both healthy and cancerous cells.
TTF1p has distinct functional domains, including an N-terminal regulatory domain (NRD), which is also responsible for the oligomerisation of TTF1 [12]. It has been shown that, owing to its oligomerisation property, TTF1p can loop the ends of rDNA together, thereby placing the promoter and terminator regions in proximity to efficiently recycle the transcription machinery; this is known as the "ribomotor" model [13].
Furthermore, TTF1 has a functional central domain and a C-terminal domain, which is essential for the activation and termination of Pol I-mediated transcription on a nucleosomal rDNA template [14]. The central domain contains the highly conserved DNA-binding myb/SANT-like domain, which has strong homology with the DNA-binding domain of the Reb1 protein of Schizosaccharomyces pombe and with the proto-oncoprotein c-Myb [15,16].
The only crystal structure of its yeast homolog, RNA Polymerase I enhancer binding protein (Reb1p) [17], bound to DNA, was solved to atomic resolution by our group [15]. The structure clearly shows an N-terminal regulatory domain, which is also known as the dimerization domain, a central DNA-binding domain, and the C-terminal transcriptional terminator domain. Using various mutants, it was demonstrated that the mere binding of Reb1p to DNA is not sufficient for terminating transcription. It was further shown that the interaction of Reb1p with Replication Protein A (RPA), via the C-terminal domain of Reb1p, is an essential requirement for effective transcriptional termination. The interaction with RPA induces an allosteric change that is necessary for stopping the movement of RNA polymerase I. The DNA-binding domain of Reb1p was also resolved to atomic resolution, and the residues involved in protein-DNA contacts were identified. This region consists of two myb-associated domains (mybAD1 and mybAD2) and two Myb repeats (mybR1 and mybR2). The helices in this region make contact with DNA at various residues [17].
TTF1 is an essential cellular protein owing to its numerous roles in several vital cellular functions, which are necessary for maintaining healthy cellular physiology. Understanding the structure of TTF1 would provide insights into the mechanistic aspects of its function. To date, there are no experimentally determined structures or in silico models of TTF1. Our lab is involved in purifying this protein and solving its structure experimentally. So far, crystallization trials have been unsuccessful, and we are therefore attempting cryo-EM studies as well. In parallel, computational modelling studies on this essential protein will provide a better understanding of its structure, so that the protein can be engineered for future studies.
In the absence of experimentally derived structures, homology modelling serves as a reliable method for the construction of protein structures. However, the reliability of the protein model depends on various factors, including the sequence identity between the template and target proteins. When the template-target identity falls below 30%, known as the twilight zone, the protein structure needs to be constructed by threading or ab initio methods [18]. This is because, below the twilight zone, the evolutionary relatedness between the template and target is doubtful, and the confidence of the prediction is low [19].
The worldwide experiment for protein structure prediction, Critical Assessment of protein Structure Prediction (CASP), ranked the iTasser (iterative threading assembly refinement) server as the best tool for ab initio protein modelling. In the latest CASP14 experiment conducted in 2020, the iTasser server (Zhang server) ranked the best among 47 groups [20,21]. The iTasser server also ranked best in the previous CASP7, CASP8, CASP9, CASP10, CASP11, CASP12, and CASP13 experiments [22]. In the CASP9 experiments in 2010, the iTasser server was ranked as the best tool for protein function prediction [21].
In this study, the structure of the TTF1 protein was constructed by molecular modelling, using the iTasser server. The predicted models were validated, and the structure was subjected to molecular dynamics (MD) simulations for 200 ns to study the structural stability of TTF1 and to determine the most stable conformation of the protein. Our study aimed to predict the structure of TTF1, which is an essential protein, using computational modelling. The results of our study will be important for understanding the structural, functional, and therapeutic roles of this essential protein.

Sequence-based analyses
The results of sequence-based analysis with ProtParam showed that TTF1 is an unstable hydrophilic protein, as revealed by an instability index of 51.13 and a grand average of hydropathicity (GRAVY) of -0.939. This was corroborated by the results of disorder prediction, which showed that more than 50% of the residues of TTF1 are disordered (Fig 1). The results of disorder prediction further revealed that residues 1-3, 689-696, 700-701, 709, and 903-905 were disordered and had protein binding properties (S1 Fig). The physicochemical properties predicted by ProtParam and the disulphide bond (S-S) pattern predicted by the CYS_REC tool are listed in Table 1. Residues with a disorder score ≥ 0.5 (represented by the horizontal red line in Fig 1) were considered to be disordered.
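The disorder criterion described above (a per-residue score of at least 0.5) can be illustrated with a minimal Python sketch. The function names and the toy score list are hypothetical and are not part of the DisoPred output or of this study's data; the sketch only shows how a score profile could be converted into a disordered fraction and a list of disordered segments.

```python
def disordered_fraction(scores, threshold=0.5):
    """Fraction of residues whose disorder score is >= threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


def disordered_segments(scores, threshold=0.5):
    """Contiguous disordered stretches as (start, end), 1-based and inclusive."""
    segments = []
    start = None
    for position, score in enumerate(scores, start=1):
        if score >= threshold:
            if start is None:
                start = position          # a new disordered stretch begins
        elif start is not None:
            segments.append((start, position - 1))
            start = None
    if start is not None:                 # stretch running to the C-terminus
        segments.append((start, len(scores)))
    return segments


# Hypothetical per-residue scores for an 8-residue toy peptide.
toy_scores = [0.9, 0.8, 0.6, 0.1, 0.2, 0.5, 0.7, 0.3]
print(disordered_fraction(toy_scores))
print(disordered_segments(toy_scores))
```

Applied to a real profile, the same two functions would give the overall disordered percentage and residue ranges of the kind reported above.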

Ab initio modelling and structural validation of TTF1
The results of template search using BLASTp against the PDB revealed that the highest target-template coverage was 4%, which was well below the twilight zone for homology modelling [23]. Therefore, the structure of human TTF1 could not be modelled using the template-based methods of comparative modelling. The complete structure of human TTF1p was therefore modelled using ab initio methods, using the iTasser server. The confidence of the models predicted by iTasser is indicated by the C-score, which provides a measure of the quality of the models generated by iTasser. The C-scores range between -5 and 2, with higher values indicating predictions of higher confidence and lower values indicating predictions of lower confidence [20]. In this study, the model with the highest C-score, -0.60, was selected for subsequent analyses. This model was further minimised using Yasara, and the energy minimised structure was validated using ProSA [24,25]. The results of ProSA validation revealed that the structure of TTF1 was comparable to structures of similar size in the PDB that had been determined using X-ray crystallography (Fig 2A). Analysis of the Ramachandran plot with Procheck revealed that only 1.0% of the residues were in the disallowed regions of the plot, while 82.9% and 14.1% of the residues were in the most favoured and additional allowed regions, respectively (Fig

Functional validation of TTF1
The results of analysis with TM-align revealed that the model of TTF1 generated by iTasser
The results of consensus-based GO prediction revealed that the molecular function of the TTF1 protein model was associated with the GO terms GO:0035639 (purine ribonucleoside triphosphate binding), GO:0032559 (adenyl ribonucleotide binding), and GO:0043167 (ion binding), with GO scores of 0.40, 0.40, and 0.39, respectively. These results further confirmed the nucleotide binding properties of the structure of TTF1 obtained with iTasser.
The results of functional validation thus implied that the TTF1 model obtained using ab initio modelling has potential nucleic acid-binding properties, and agrees with the data reported in literature and observed in our lab.

Trajectory analyses
The RMSF values revealed that some residues had higher flexibility, with RMSF values above 1.5 Å. The higher flexibility of these residues could be attributed to the fact that they mapped to the disordered regions predicted using DisoPred (Fig 4C).
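For readers unfamiliar with the quantity, the RMSF of an atom is the square root of its mean squared deviation from its own time-averaged position. The following is a minimal pure-Python sketch of that definition and of the 1.5 Å flexibility cutoff used above. It assumes that the frames have already been superposed on a reference structure, and it is not the analysis code used in this study; the function names and the toy coordinates are hypothetical.

```python
import math


def rmsf(frames):
    """Per-atom RMSF for frames[t][i] = (x, y, z), assuming pre-aligned frames."""
    n_frames = len(frames)
    n_atoms = len(frames[0])
    values = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean = [sum(frame[i][k] for frame in frames) / n_frames for k in range(3)]
        # Mean squared displacement of atom i from that average.
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3)) for frame in frames
        ) / n_frames
        values.append(math.sqrt(msd))
    return values


def flexible_positions(rmsf_values, cutoff=1.5):
    """1-based positions whose RMSF exceeds the cutoff (in the same units, here Å)."""
    return [i for i, value in enumerate(rmsf_values, start=1) if value > cutoff]


# Hypothetical two-frame, two-atom toy trajectory (coordinates in Å).
toy_frames = [
    [(0.0, 0.0, 0.0), (0.0, 0.0, -2.0)],
    [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)],
]
values = rmsf(toy_frames)
print(values, flexible_positions(values))
```

In practice the per-residue values would typically be computed on the C-alpha atoms of the aligned trajectory and then compared with the predicted disordered regions.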

Centrality analyses
The RINs of the representative structure of TTF1 were determined using Cytoscape v3. ). The Z-scores of the residues in the interaction cluster in the oligomerisation domain were higher than those of the residues in the C-terminal domain, indicating that the interaction cluster in the oligomerisation domain plays a more crucial role in the stability of the human TTF1 protein than the interaction cluster in the C-terminal domain. The Z-scores of the central residues determined by centrality analysis are listed in Table 2.
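The idea behind a centrality Z-score can be sketched as follows. The sketch assumes one common formulation of residue centrality analysis, in which each residue is scored by the absolute change in the average shortest path length of the network when that residue is removed, and the changes are then standardised into Z-scores. Whether this matches the exact computation performed by the RINspector plugin (for example, in its handling of disconnected networks or its choice of standard deviation) is an assumption, and the toy network and node names are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


def average_path_length(adjacency, removed=None):
    """Mean shortest path length over reachable node pairs, ignoring `removed`."""
    total, pairs = 0, 0
    for source in adjacency:
        if source == removed:
            continue
        distance = {source: 0}
        queue = deque([source])
        while queue:                          # breadth-first search
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour != removed and neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
        total += sum(distance.values())
        pairs += len(distance) - 1
    return total / pairs if pairs else 0.0


def centrality_z_scores(adjacency):
    """Z-score of the path-length change caused by removing each node."""
    baseline = average_path_length(adjacency)
    change = {
        node: abs(average_path_length(adjacency, removed=node) - baseline)
        for node in adjacency
    }
    mu = mean(change.values())
    sigma = pstdev(change.values())           # population SD: an assumption
    return {n: (c - mu) / sigma if sigma else 0.0 for n, c in change.items()}


# Hypothetical toy network: two triangles joined through a bridge node "B".
toy = {
    "A1": ["A2", "A3"], "A2": ["A1", "A3"], "A3": ["A1", "A2", "B"],
    "B": ["A3", "C1"],
    "C1": ["B", "C2", "C3"], "C2": ["C1", "C3"], "C3": ["C1", "C2"],
}
scores = centrality_z_scores(toy)
central = [node for node, z in scores.items() if z >= 2.0]  # cutoff used in the text
print(max(scores, key=scores.get), central)
```

In this toy network the bridge node receives the highest score, which mirrors the intuition that central residues are those whose removal most disrupts communication paths within the structure.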

Intra-residue hydrogen bonds
Hydrogen bonds with occupancy ≥ 75% throughout the 200 ns trajectory and ≥ 85% in the last 50 ns were considered to be important for the structural stability of the protein. The frequency of the hydrogen bonds throughout the trajectory and in the last 50 ns was determined using VMD. The occupancy of the intra-residue hydrogen bonds formed by the central residues is provided in S1 Table, and the occupancy of all the intra-residue hydrogen bonds with occupancy ≥ 75% throughout the 200 ns trajectory and ≥ 85% in the last 50 ns is provided in S1 Table. The results of interaction analyses revealed that residues K17, E27, Q30, E35, R164, W198, and N228 of the oligomerisation domain, K434 of the chromatin remodelling region, and F657 of the myb/SANT-like-1 domain were most crucial to the structural stability of the protein, as indicated by the number of intra-residue hydrogen bonds and the occupancy of the hydrogen bonds throughout the trajectory.
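The occupancy filter described above can be sketched in a few lines of Python. The sketch assumes that each hydrogen bond is represented as a per-frame presence list, that frames are evenly spaced in time (so that the last 50 ns of a 200 ns trajectory corresponds to the final quarter of the frames), and that both cutoffs must be satisfied together. The function names and bond labels are hypothetical; this is not the VMD analysis itself.

```python
def occupancy(present):
    """Percentage of frames in which the hydrogen bond is present."""
    return 100.0 * sum(present) / len(present)


def stable_hbonds(hbond_frames, tail_fraction=0.25, full_cutoff=75.0, tail_cutoff=85.0):
    """Keep bonds meeting the full-trajectory and final-window occupancy cutoffs.

    hbond_frames maps a bond label to a list of booleans, one per frame.
    tail_fraction=0.25 corresponds to the last 50 ns of a 200 ns trajectory,
    assuming evenly spaced frames.
    """
    selected = {}
    for label, present in hbond_frames.items():
        n_tail = max(1, int(round(len(present) * tail_fraction)))
        full = occupancy(present)
        tail = occupancy(present[-n_tail:])
        if full >= full_cutoff and tail >= tail_cutoff:
            selected[label] = (full, tail)
    return selected


# Hypothetical presence lists for three bonds over eight frames.
toy = {
    "bond_a": [True] * 8,                    # always present
    "bond_b": [True] * 6 + [False] * 2,      # lost at the end of the run
    "bond_c": [False] * 3 + [True] * 5,      # formed late in the run
}
print(stable_hbonds(toy))
```

Requiring both cutoffs retains only the bonds that are persistent over the whole run and still present once the structure has equilibrated.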

Discussion
TTF1 is a crucial multifunctional nucleolar protein that regulates both the initiation and the termination of transcription of ribosomal genes by binding to a specific sequence motif, and also arrests the replication fork in a polar fashion [2]. In addition, TTF1 regulates the transcription of genes transcribed by RNA polymerase I. Using truncated human and murine TTF1 proteins, Evers and Grummt first reported species-specific sequence differences in the DNA-binding region of mammalian TTF1 [3]. Despite its major regulatory role in mammalian transcription, replication, and chromatin remodelling, the complete structure of human TTF1 remains to be elucidated. A partial structure of human TTF1 has been predicted by AlphaFold v2.0, which uses artificial intelligence for predicting the three-dimensional structure of proteins. However, the structure predicted by AlphaFold is partial (residues 491-866); the remaining residues are largely unfolded, and the confidence of prediction for these unfolded regions is very low [27]. As all the residues of a protein are important for its complete regulation and function, it is necessary to consider the protein in its entirety in structural analyses. In this study, we therefore attempted to construct the complete structure of the human TTF1 protein using ab initio modelling and MD simulations, and also identified the residues that are central to the structural stability of human TTF1 by network analyses. To the best of our knowledge, this study is the first to report the complete structure of the human TTF1 protein (refer to the supplementary material for the coordinate file).
Owing to the lack of suitable structural homologues in the PDB with sequence coverage above the twilight zone, the structure of TTF1 was modelled using ab initio methods. The model of TTF1 thus obtained was subjected to functional validation and GO analysis to establish its functional relevance. MD simulations are frequently used for obtaining atom-level insights into the structural dynamics and behaviour of biomolecular systems. The stability of the model was subsequently evaluated by MD simulation for 200 ns, using an explicit TIP4P solvent, and the trajectory was analysed to investigate structural stability and hydrogen bond frequency. The representative conformation of the human TTF1 protein was obtained by trajectory clustering, and the residues that play a central role in the structural stability of TTF1 were identified by network analysis and determination of residue centrality.
The results of RIN analysis and computation of centrality measures revealed two interaction clusters in the structure of human TTF1, one in the oligomerisation domain and the other in the C-terminal domain. The data further indicated that the residue cluster in the oligomerisation domain plays a more significant role in the stability of TTF1 than the cluster in the C-terminal domain. The N-terminal oligomerisation domain has been shown to play an important regulatory function [2], while the C-terminal domain is involved in transcription termination [5]. In the absence of experimentally derived structural data for the human TTF1 protein, we believe that the results of our study provide valuable structural information, including the domain architecture and the characteristics of the domains. Hence, our study could facilitate future studies aimed at understanding the mechanism underlying the function of human TTF1, including its interactions with other proteins, and at engineering this protein for the purposes of solving its physical structure, drug design, and therapeutic applications.

Conclusion
In conclusion, this is the first study to report the complete structure of the essential human TTF1 protein, using computational modelling, and to identify the residues, and their characteristics, that are central to the structural stability of the protein.

Sequence retrieval and sequence-based analyses
The sequence of TTF1 was retrieved from UniProtKB (UniProtKB accession number: Q15361). The physicochemical properties of TTF1 were analyzed using ProtParam [28], and the disorder profile was analyzed using DisoPred version 3.1 [29,30].
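As an illustration of one of the ProtParam descriptors, the GRAVY value is the sum of the Kyte-Doolittle hydropathy values of all residues divided by the sequence length, with negative values indicating a hydrophilic protein. The following minimal Python sketch implements that definition; the function name and the toy peptide are hypothetical, and the sketch is not a substitute for the ProtParam server.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    residues = sequence.upper()
    if not residues:
        raise ValueError("sequence must not be empty")
    # A KeyError here signals a non-standard residue code in the input.
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


# Hypothetical toy peptide rich in charged residues (not a fragment of TTF1).
print(round(gravy("MKKEEDR"), 3))
```

Applied to the full-length sequence retrieved from UniProtKB, the same calculation would be expected to reproduce the GRAVY value reported by ProtParam, up to rounding.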

Ab initio modelling of TTF1
The structural homologues of human TTF1 in the PDB were searched using BLASTp and threading-based approaches, to identify suitable templates for homology modelling. Owing to the lack of suitable structural templates, the structure of human TTF1 was modelled ab initio using the iTasser server [20]. In the iTasser algorithm, the final models are selected using the SPICKER program for clustering the generated structures. The structure of TTF1 generated by iTasser was initially minimised using the Yasara energy minimization server, with the Yasara force field [24]. The energy minimised structure was then validated using Ramachandran plot analysis and ProSA [31,32].

Functional validation of TTF1 constructed by ab initio modelling
The models generated by iTasser were functionally validated using the TM-align program, which was used to identify structures in the PDB that are structurally, and thus functionally, similar to the models of TTF1p constructed by ab initio modelling. This program determines the similarity between proteins on the basis of the TM-score, a scoring function that provides a quantitative measure of topological similarity between proteins [33], with values > 0.5 indicating models of correct topology [34]. The models were further validated using the COACH and COFACTOR programs for predicting the ligand binding sites, based on the similarity of the protein folds with functional templates [35,36]. The results of ligand binding site prediction were mapped to the results of sequence-based conserved domain (CD) analyses using the CD search tool of NCBI [37]. The molecular function of the modelled protein was further validated by consensus-based gene ontology (GO) search.
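To make the TM-score concrete, the sketch below evaluates the standard TM-score expression for a fixed set of aligned residue pairs: each pair contributes 1 / (1 + (d / d0)^2), where d is the distance between the paired residues after superposition and d0 = 1.24 (L - 15)^(1/3) - 1.8 depends on the target length L, and the sum is divided by L. The full TM-align procedure additionally searches for the alignment and superposition that maximise this value, which is not attempted here. The function name is hypothetical, and the restriction to chains longer than 21 residues is a simplification of this sketch.

```python
def tm_score(distances, target_length):
    """TM-score for given aligned-pair distances (in Å) and a target length.

    distances: one value per aligned residue pair, after superposition.
    target_length: number of residues in the target protein (normalisation).
    """
    if target_length <= 21:
        # Simplification of this sketch: very short chains are not handled.
        raise ValueError("this sketch only supports chains longer than 21 residues")
    if len(distances) > target_length:
        raise ValueError("more aligned pairs than target residues")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# A perfect superposition of every residue gives 1.0; aligning only half of the
# residues perfectly gives 0.5, the usual cutoff for a shared topology.
print(tm_score([0.0] * 100, 100))
print(tm_score([0.0] * 50, 100))
```

Because the score is normalised by the target length and uses a length-dependent d0, it is less sensitive to protein size than a raw RMSD, which is why a cutoff of 0.5 can be applied across proteins of different lengths.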

MD simulations
The model of TTF1 obtained by ab initio modelling was subjected to MD simulations for 200 ns using Flare v4, which is based on the OpenMM Toolkit, to study the structural stability and determine any possible conformational changes of TTF1p. The protein was prepared in Flare v4 at pH 7.4 and solvated in TIP4P solvent using a buffer of 10 Å thickness. The system was subsequently neutralised by the addition of 28 Cl- ions. The system was then minimized until the energy tolerance reached 0.

Fig 1: Graphical representation of the disordered regions of the human TTF1 protein.

Fig 2: Structural validation of the energy minimised model of TTF1 using A) ProSA and B)

(Fig 3) was structurally most similar to cas13b (PDB ID: 6AAY), which is an RNA-binding protein from Bergeyella zoohelcum with RNase activity [26]. The human TTF1 protein is a DNA-binding protein that plays an important role in transcriptional termination. The TM-score of the alignment was 0.960, indicating correct topology, and the RMSD between the generated model of TTF1 and cas13b was 2.29 Å, indicating high structural similarity between the two proteins. The structural similarity between TTF1 and cas13b indicated that the model of TTF1 obtained herein possesses potential nucleic acid binding properties, similar to cas13b.

Fig 3: The structure of TTF1 constructed by ab initio modelling is depicted in light blue

Fig 4A, the values of RMSD became increasingly steady after 100 ns, and remained steady

Fig 4 : 2 . 5 .
Fig 4: Graphical representation of the values of A) RMSD and B) RoG of the protein

Fig 5: A) Ribbon and B) Surface representation of the representative structure of TTF1

8.2 (Fig 6), and the central residues were identified using the RINspector plugin, based on the RCA Z-scores. In the RIN, the nodes indicate the residues, while the edges represent the intra-residue interactions. Residues with RCA Z-scores ≥ 2 were considered to be central to the structural stability of the protein. As depicted in Fig 6, the residues with Z-scores ≥ 2 are coloured in yellow, and those with Z-scores ≥ 2 are represented in red. The bigger nodes indicate residues with higher values of Z-scores. The RIN revealed two interaction clusters, with one cluster being located in the oligomerisation domain of TTF1, and the other being located towards the C-terminal region of the protein (Fig 6A and 6B

Fig 6: The central residues of TTF1 identified by RIN and centrality analyses in the A) 3-