Experiment level curation identifies high confidence transcriptional regulatory interactions in neurodevelopment

To facilitate the development of large-scale transcriptional regulatory networks (TRNs) that may enable in-silico analyses of disease mechanisms, a reliable catalogue of experimentally verified direct transcriptional regulatory interactions (DTRIs) is needed for training and validation. There has been a long history of using low-throughput experiments to validate single DTRIs. Therefore, we hypothesize that a reliable set of DTRIs could be produced by curating the published literature for such evidence. In our survey of previous curation efforts, we identified the lack of details about the quantity and the types of experimental evidence to be a major gap, despite the importance of such details for the identification of bona fide DTRIs. We developed a curation protocol to inspect the published literature for support of DTRIs at the experiment level, focusing on genes important to the development of the mammalian nervous system. We sought to record three types of low-throughput experiments: Transcription factor (TF) perturbation, TF-DNA binding, and TF-reporter assays. Using this protocol, we examined a total of 1,310 papers to assemble a collection of 1,499 unique DTRIs, involving 251 TFs and 825 target genes, many of which were not reported in any other DTRI resource. The majority of DTRIs (965, 64%) were supported by two or more types of experimental evidence and 27% were supported by all three. Of the DTRIs with all three types of evidence, 170 had been tested using primary tissues or cells and 44 had been tested directly in the central nervous system. We used our resource to document research biases among reports towards a small number of well-studied TFs. To demonstrate a use case for this resource, we compared our curation to a previously published high-throughput perturbation screen and found significant enrichment of the curated targets among genes differentially expressed in the developing brain in response to Pax6 deletion. 
This study demonstrates a proof-of-concept for the assembly of a high confidence DTRI resource in order to support the development of large-scale TRNs.

Author Summary
The capacity to computationally reconstruct gene regulatory networks using large-scale biological data is currently limited by the absence of a high confidence set of one-to-one regulatory interactions. Given the lengthy history of using small scale experimental assays to investigate individual interactions, we hypothesize that a reliable collection of gene regulatory interactions could be compiled by systematically inspecting the published literature. To this end, we developed a curation protocol to examine and record evidence of regulatory interactions at the individual experiment level. Focusing on the area of brain development, we applied our pipeline to 1,310 publications. We identified 3,601 individual experiments, providing detailed information about 1,499 regulatory interactions. Many of these interactions have verified activity specifically in the embryonic brain. By capturing reports of regulatory interactions at this level of granularity, we present a resource that is more interpretable than other similar resources.


Introduction
Reconstruction of transcriptional regulatory networks (TRNs) has the potential to enable in-silico analysis of developmental processes and disease mechanisms. As such, using high-throughput biological data to infer large scale TRNs is an area under active research; recent examples include (1-4). However, the utility of these TRNs has been hindered by the absence of a high confidence set of regulatory interactions for training and validation. Researchers have historically used less scalable experimental techniques to investigate direct transcriptional regulatory interactions (DTRIs). While low-throughput, such methods tend to be considered reliable, especially if there are multiple independent lines of evidence supporting a DTRI. Thus, there would be value in having resources that aggregate high-quality reports of DTRIs, forming the topic of the current work. Our particular interest is in DTRIs of relevance to the developing nervous system, as mutations in transcription factor (TF) genes (5-7) and regulatory regions (8-10) have been highly implicated in neurodevelopmental disorders.

We define DTRIs as pairwise interactions between a transcription factor (TF) and a target gene where the TF modulates target expression by physically binding to a cis-regulatory element (cRE). There are three types of low-throughput experimental paradigms commonly used to elucidate DTRIs, including TF perturbation, TF-DNA binding, and TF-reporter assays (Fig 1A). In TF perturbation assays, manipulation of TF expression is followed by an assessment of target gene expression. In TF-DNA binding assays, protein-DNA interactions between the TF and the cRE are evaluated. Finally, TF-reporter assays measure the functional impact of the TF binding on the associated cRE sequence. While low-throughput assays are not infallible, they generally yield higher confidence than high-throughput alternatives by evading the need for large scale inferential statistics and enabling detailed and readily replicable characterization of single DTRIs; examples: (11,12) (Fig 1B). Notably, such low-throughput experiments are routinely used to validate putative targets identified by more scalable approaches. Given the importance and wide acceptance of these types of evidence, it would be useful to assemble a centralized catalogue of DTRIs that is supported by low-throughput experimental evidence in the published literature.

There have been a number of earlier efforts to aggregate DTRIs from the literature:

This is notable because the type and quantity of evidence is expected to affect the reliability of a reported interaction. Specifically, each individual type of evidence provides only a limited view of any given DTRI. TF perturbation assays enable the assessment of the TF's ability to modulate target gene expression but cannot decipher its functional dependence on direct physical binding. Likewise, while TF binding at a cRE is necessary for regulation, detection of TF-DNA binding alone is insufficient for demonstrating functional activity. TF-reporter assays simultaneously demonstrate both functional modulation and physical binding but often by examining the given DTRI outside of the native genomic and cellular context. As such, integration across these types of experiments should help establish DTRIs with high confidence.

We hypothesize that curation of details at the individual experiment level would facilitate identification of bona fide DTRIs. In this study, we undertook a systematic effort to curate the literature at a high level of detail. Consequently, we present a resource that is highly interpretable and more suitable for the evaluation of high-throughput predictions than other similar resources. Finally, our curation effort provides a partial summary snapshot of the literature landscape surrounding transcriptional regulation in the developing brain.

Overview of curation
Our curation pipeline is summarized in Fig 1C (see Methods for details). Briefly, for each TF, we assembled a set of candidate papers (S. Table S1). Next, we manually prioritized TFs for curation based on annotated associations with central nervous system (CNS) development and the number of candidate papers retrieved (S. Table S2). For each paper examined, we recorded the details of all reported experiments that lend support to any DTRI in humans or mice (Table 1, S. Table S3). For reporting, we mapped all genes to human orthologs while retaining the species information as an additional feature. Applying this pipeline to a total of 1,310 papers, we established a collection of 1,499 unique DTRIs, involving 251 TFs and 825 targets, from 828 papers. This manually curated network is displayed in Fig 1D and the complete set of curated interactions is provided in S.

Context Type
A broad classification of the cellular context tested. Value: Primary Tissue, Primary Cells, Cell Line, or In-Vitro.

Cell Type
An ontology term that best corresponds to the tissue or cell type used. Example: UBERON:0001017 (central nervous system)

TF Species
The species of the TF protein or sequence. Value: Human or Mouse.

Target Species
The species of the target protein or regulatory element. Value: Human or Mouse.

TFBS Position
A broad classification of the distance between the transcription factor binding site (TFBS) and the target transcription start site (TSS). Value: Proximal or Distal.

Mode
The mode or direction of regulation. Value: Activation or Repression.
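
As an illustration of how the controlled vocabularies listed above could be enforced in software, the following is a minimal sketch and not the tooling used in this study. The class names, the field names, the `parse_annotation` helper, and the choice to treat TFBS Position and Mode as optional are all assumptions made for the example; only the attribute names and their permitted values are taken from the list above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ContextType(Enum):
    PRIMARY_TISSUE = "Primary Tissue"
    PRIMARY_CELLS = "Primary Cells"
    CELL_LINE = "Cell Line"
    IN_VITRO = "In-Vitro"


class Species(Enum):
    HUMAN = "Human"
    MOUSE = "Mouse"


class TfbsPosition(Enum):
    PROXIMAL = "Proximal"
    DISTAL = "Distal"


class Mode(Enum):
    ACTIVATION = "Activation"
    REPRESSION = "Repression"


@dataclass(frozen=True)
class ExperimentAnnotation:
    """One curated experiment, restricted to the attributes listed above."""
    context_type: ContextType
    cell_type: str  # ontology term ID, e.g. "UBERON:0001017"
    tf_species: Species
    target_species: Species
    # Assumption: these two attributes may be left unannotated.
    tfbs_position: Optional[TfbsPosition] = None
    mode: Optional[Mode] = None


def parse_annotation(row: dict) -> ExperimentAnnotation:
    """Build an annotation from raw strings (hypothetical helper).

    Raises ValueError if a value falls outside its controlled vocabulary.
    """
    def optional(enum_cls, value):
        return enum_cls(value) if value else None

    return ExperimentAnnotation(
        context_type=ContextType(row["Context Type"]),
        cell_type=row["Cell Type"],
        tf_species=Species(row["TF Species"]),
        target_species=Species(row["Target Species"]),
        tfbs_position=optional(TfbsPosition, row.get("TFBS Position")),
        mode=optional(Mode, row.get("Mode")),
    )
```

Encoding each vocabulary as an enumeration means that a misspelled or unsupported value is rejected at the time of entry rather than discovered during analysis.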

Identification of candidate papers highlights biases in TF coverage
The input to our curation was a corpus of candidate publications. To establish this corpus, we started by taking advantage of previous curation efforts. We obtained 14,364 papers from seven external resources, covering 1,305 TFs (Fig 2A, S. Table S1). TRRUST, the largest database of literature curated DTRIs, provided more than 10,000 publications but recovered only about 20% of those recorded in the other resources (Fig 2B, 2C). Further, overlaps among the other resources are generally small (0%-22%). These observations suggest that there may be additional papers in the literature containing reports of DTRIs. As such, we expanded the pool of candidate papers by searching PubMed using several relevant Medical Subject Headings (MeSH) terms (see Methods for details). We identified an additional set of 6,989 candidate papers for 1,140 TFs (Fig 2A). In particular, for TFs directly associated with CNS development, we were able to increase the total number of candidate papers from 5,729 to 9,839. Together, we assembled a set of 21,353 candidate papers covering 1,486 TFs.

TFs identified in our independent PubMed query (Jaccard Index = 0.52), suggesting shared biases. The overall pattern is shown in Fig 2D. Further, a substantial fraction of TFs (749; 34%) had no candidate papers. Importantly, some key neurodevelopmental TFs appear to have had very limited investigation. For example, TBR1 is a TF recently implicated in Intellectual Disability (ID) and Autism Spectrum Disorder (ASD) (24). Despite this, we were able to identify only eight candidate papers for this gene (Fig 2D), suggesting that TBR1 was not previously popular enough to warrant much attention. We hypothesized that this bias in TF coverage reflects gene popularity differences in general.
As expected, we found that the total number of papers per TF in PubMed is highly correlated with the number of candidate papers retrieved (Spearman's correlation = 0.86). As we discuss later, these biases in the literature influence the resulting database of interactions and its interpretation.

Table S3). A small fraction (204; 14%) of all DTRIs were supported by evidence in both humans and mice (Fig 3A). About half (798; 53%) were reported only for mice and the remainder (497; 33%) only for humans. We were able to annotate 39 TFs with 10 or more DTRIs (Fig 3C). Collectively, these top 39 TFs account for more than half (1,018; 68%) of all curated DTRIs. The remaining 481 (32%) DTRIs were distributed across 212 TFs (S. Fig S1). Unsurprisingly given our TF selection criteria, 31 of the top 39 TFs are associated with neurodevelopment (Fig 3C). Notably, PAX6, a key TF implicated in corticogenesis (25,26), has 63 recorded targets. Further, we identified 12 targets with ten or more recorded TF regulators (S. Fig S2). Eight of these 12 targets are themselves neurodevelopmental TFs including HES1, ASCL1, NEUROG2, MEF2C among others (S. Fig S3). In particular, HES1, a TF known to be involved in the proliferation of neural progenitors (27)

were validated using at least one such experiment. Further, we found that the majority (965; 64%) of DTRIs were supported by two or more types of evidence and 398 (27%) DTRIs were supported by all three (Fig 3B). Of the DTRIs with all three types of evidence, 170 had been tested using primary tissues or cells and 44 had been tested directly in the CNS (Fig 3B).

For each type of experiment, we further explored a number of factors that may influence reliability of the reported DTRI (Table 1)

Table S5). Further, in TF perturbation experiments that use primary tissues or cells, time-limited modifications may be preferred.
Importantly, we found that it is common (373 experiments) to induce a constitutive loss-of-function mutation in the TF and then compare the resulting target gene expression to that of wildtype samples (S. Fig S4).

Table S6). However, EMSAs were also commonly employed to test for in-vitro TF protein-DNA interactions (469 experiments). The reliability of EMSAs might be improved by using the endogenous TF protein, as opposed to using recombinant versions. We found a small number of EMSA experiments (48) that used TF proteins obtained by nuclear extractions directly from primary tissues or cells (S. Fig S5; S. Table S6). Finally, for TF-reporter experiments, we recorded whether mutated versions of the TFBS sequence were assayed to confirm a direct binding mechanism. We found that 407 of the 930 reporter gene assays examined the functional consequence of mutating the corresponding TFBS sequence (S. Fig S6, S. Table S6). Overall, the granularity of our curation highlighted a wide range in the quality and quantity of evidence supporting the reported DTRIs.

Our curation also accounted for tissues and cell types, which we recorded at the highest resolution possible with existing ontologies. This allows subdivision of the data in terms of relevance to particular contexts. In total, 951 (26%) experiments recorded (for 620 DTRIs) were performed using primary tissues or cells. In terms of anatomical systems, among these experiments, the most represented was the CNS, with 243 experiments (155 DTRIs) (S. Fig S7). The set of DTRIs in the CNS is highly enriched for neurodevelopmental TFs (p-value < 5.6x10^-9, hypergeometric test). Further, a large fraction (181; 74%) of these experiments used embryonic CNS samples, thus providing evidence of activity in the developing CNS (S. Fig S7).
For example, ASCL1, FGF19, and SOX2 were reported to regulate targets in the embryonic telencephalon (29), diencephalon (30), and neural stem cells (31), respectively. We also found some DTRIs involving known neurodevelopmental TFs that were assayed only in other tissues, such as a small number of PAX6 targets in pancreatic islets (32,33) and small and large intestine (34). Over half (2,181; 60%) of all experiments were performed in cell lines, regardless of the experiment type (Fig 3A). Among these, the most popular were kidney derived cell lines (S. Fig S8). As expected, cell line experiments accounted for a larger proportion of human samples compared to primary tissues or cells (Fig 3A). Our detailed information about cellular contexts allows efficient and accurate data subsetting based on user requirements.

Next, we assessed overlaps with other DTRI resources. Since we sourced many candidate papers directly from such earlier curation efforts, a significant amount of overlap is expected. By examining 657 previously curated papers, we managed to extract 809 DTRIs from 467 papers but failed to identify low-throughput experimental evidence in the remaining 190 papers (Fig 4A, S. Table S1). At the level of DTRIs, 40% of our database overlaps with TRRUST while other resources contain up to 8% of our records (S. Fig S9). Limited overlap is common among the other resources as well, with the overwhelming majority of DTRIs having been recorded only in a single database (Fig 4B). This demonstrates the general incomplete coverage of the literature even by the most comprehensive curation efforts to date. Although our resource is smaller than most other resources (Fig 4C), we still managed to identify 775 DTRIs that were not previously curated in any other database, 541 of which directly involved a neurodevelopmental TF (Fig 4D).
Importantly, 449 (58%) of these newly identified interactions were supported by multiple lines of experimental evidence (Fig 4E). Taken together, our curation has expanded the repertoire of annotated DTRIs among the existing DTRI data resources.
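
The counts of DTRIs supported by one, two, or three types of evidence reported in this section follow from grouping experiment-level records by TF-target pair. The sketch below shows one way this aggregation could be done; it is an illustration under stated assumptions, not our pipeline. The tuple layout of a record, the evidence type labels, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Assumed labels for the three low-throughput evidence types.
EVIDENCE_TYPES = {"perturbation", "binding", "reporter"}


def evidence_types_per_dtri(experiments):
    """Group experiments by (TF, target) and collect the distinct evidence types.

    `experiments` is an iterable of (tf, target, evidence_type) tuples
    (a hypothetical record layout).
    """
    by_dtri = defaultdict(set)
    for tf, target, evidence_type in experiments:
        if evidence_type not in EVIDENCE_TYPES:
            raise ValueError(f"unknown evidence type: {evidence_type!r}")
        by_dtri[(tf, target)].add(evidence_type)
    return dict(by_dtri)


def count_by_number_of_types(by_dtri):
    """Return a Counter mapping the number of evidence types (1-3) to the number of DTRIs."""
    return Counter(len(types) for types in by_dtri.values())


def supported_by_all(by_dtri):
    """List the DTRIs supported by all three evidence types."""
    return sorted(pair for pair, types in by_dtri.items() if types == EVIDENCE_TYPES)
```

Because evidence types are stored as a set, repeated experiments of the same type for the same TF-target pair raise the experiment count but not the number of evidence types.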

Because we curated only a fraction of the literature, it is of interest to estimate the total number of DTRI reports with low-throughput experimental evidence in the remainder. We base our estimate on the observation that of the 1,310 candidate papers that we examined, 63% (828) were found to contain at least one report of a DTRI. It follows that more than 12,000 of the remaining 20,043 candidate papers are expected to contain experimental evidence of a DTRI. With an average of 1.9 DTRIs reported by any single publication (S. Fig S10)
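
The extrapolation described in this section is simple proportional arithmetic, sketched below using the counts stated in the text. The function name is hypothetical, and the final product of expected papers and DTRIs per paper is a naive illustration rather than a figure quoted from this work; it also counts reports, which need not correspond to unique DTRIs.

```python
def extrapolate_dtri_reports(examined, with_dtri, remaining, mean_dtris_per_paper):
    """Scale the observed hit rate among examined papers to the unexamined remainder."""
    hit_rate = with_dtri / examined
    expected_papers = hit_rate * remaining
    # Naive product: assumes the per-paper average also holds for unexamined papers.
    expected_reports = expected_papers * mean_dtris_per_paper
    return hit_rate, expected_papers, expected_reports


# Counts taken from the text above.
hit_rate, expected_papers, expected_reports = extrapolate_dtri_reports(
    examined=1310, with_dtri=828, remaining=20043, mean_dtris_per_paper=1.9
)
print(f"hit rate: {hit_rate:.2f}")
print(f"expected papers with a DTRI: {expected_papers:.0f}")
print(f"expected DTRI reports (naive): {expected_reports:.0f}")
```

The estimate rests on the assumption that the examined papers are representative of the remaining candidates, which may not hold given that TFs were prioritized for curation.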

Network properties reflect potential research biases
Given the literature biases in coverage of TFs (Fig 2D), we suspected that similar biases may exist in the selection of regulatory targets. Specifically, researchers may be more likely to choose to investigate interactions between genes that are suspected to be related. If this is true, the manually curated network should be more connected than expected if the targets were chosen randomly. To test this, we integrated all DTRIs in our database to construct a directed network consisting of 955 nodes and 1,499 edges (Fig 1D). We measured network connectivity in three ways. First, we counted the number of valid gene-to-gene paths in the network. Briefly, for every gene in the network, we counted the number of other genes that are within reach via at least one continuous path. The total count was then obtained by summing across all genes. Because the edges are directed and the network consists of multiple components, not every gene is reachable from every other gene in the network. In total, we observed more than 77,000 gene-to-gene paths in the curated network, which is significantly higher than the mean of 55,411 paths among a null constructed from random networks (p < 0.01; see Methods). This indicates a high degree of global connectivity within the network. Next, we counted the number of cliques with three or more nodes, ignoring directionality. We found 215 cliques in the curated network, which is higher than a mean of 140 cliques among the random networks (p-value < 0.01), demonstrating a large number of locally interconnected modules. Finally, we observed only four independent components in the curated network whereas a typical random network had 26 components (p-value < 0.01), implying that even peripheral genes with low node degrees remain connected to the rest of the network.
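
Two of the three connectivity measures described above, the gene-to-gene path count and the number of independent components, can be sketched in a few lines; clique counting is omitted for brevity. This is a minimal illustration, not the code used in this study. In particular, the null model shown here (keep each TF's number of targets but draw the targets at random from a gene pool, excluding self-regulation) is an assumption made for the example and may differ from the procedure described in Methods; all function names are hypothetical.

```python
import random
from collections import defaultdict, deque


def count_reachable_pairs(edges):
    """Sum, over all genes, of the number of other genes reachable by a directed path."""
    adjacency = defaultdict(set)
    for tf, target in edges:
        adjacency[tf].add(target)
    nodes = {gene for edge in edges for gene in edge}
    total = 0
    for start in nodes:
        seen = {start}
        queue = deque([start])
        while queue:  # breadth-first search along directed edges
            node = queue.popleft()
            for nxt in adjacency.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        total += len(seen) - 1  # exclude the starting gene itself
    return total


def count_components(edges):
    """Number of connected components when edge direction is ignored."""
    undirected = defaultdict(set)
    for a, b in edges:
        undirected[a].add(b)
        undirected[b].add(a)
    seen, components = set(), 0
    for start in undirected:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:  # depth-first search over undirected edges
            node = stack.pop()
            for nxt in undirected[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return components


def shuffle_targets(edges, gene_pool, rng):
    """Assumed null model: preserve each TF's out-degree, randomize its targets."""
    out_degree = defaultdict(int)
    for tf, _ in edges:
        out_degree[tf] += 1
    random_edges = []
    for tf, k in out_degree.items():
        candidates = [gene for gene in gene_pool if gene != tf]
        random_edges.extend((tf, target) for target in rng.sample(candidates, k))
    return random_edges


def empirical_p_value(observed, null_values):
    """One-sided p-value: share of null statistics >= observed, with a +1 correction."""
    hits = sum(value >= observed for value in null_values)
    return (hits + 1) / (len(null_values) + 1)
```

A typical use would compute `count_reachable_pairs` on the curated edge list, repeat the computation on many networks produced by `shuffle_targets` with a seeded `random.Random`, and pass both to `empirical_p_value`. For the component count, where fewer components indicate higher connectivity, the comparison direction would need to be reversed.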
Taken together, the manually curated network is highly interconnected, even after controlling for biases in TF coverage. This strongly suggests substantial biases in the selection of targets by investigators, as observed for TF selection.

Continuing our investigation of biases in the data, we hypothesized that TSS proximal cREs would be enriched among the reported DTRIs since distal elements are likely more difficult to identify. We define proximal regulatory elements to be either promoters or regulatory elements that fall within 3 kb of the target TSS, as indicated by the original publication. We found that most of the reported DTRIs with an annotated TFBS position (595 of 663) involve proximal cREs and only 68 DTRIs have been annotated with distal regulatory sites (S. Fig S11, S. Fig S12). Distal sites include the well-documented interaction between SOX2 and SHH where the corresponding enhancer is 5 kb downstream of the TSS (Favaro et al., 2009). Such cases are, by far, the minority in our curation. In addition to TFBS proximity, we also annotated whether a regulatory interaction is activating or repressive, referred to as the mode of regulation (Table 1). We found that less than a third (313/1,317) of the DTRIs are repressive (S. Fig S11). It is less clear whether this trend reflects underlying biological trends or another form of investigator bias in the selection of interactions to study. Notably, several repressive DTRIs involve TFs that are generally characterized as repressors including HES1, GLI3, and REST (S. Fig S13). In particular, HES1 was annotated to repress 16 of its 26 targets

One application of our curated DTRI resource is to benchmark high-throughput screens. To demonstrate this use case, we analyzed a previously published TF perturbation screen for Pax6 (Fig 5A).
In that study, the authors sought to identify Pax6 targets in the embryonic mouse forebrain by examining genome wide differential expression between wildtype and Pax6 mutant mice using microarrays (35). We assessed enrichment of our curated PAX6/Pax6 targets in this dataset (this includes targets validated in either humans or mice). We found that 22 of all 56 curated PAX6/Pax6 targets were differentially expressed at a false discovery rate (FDR) of 0.1 (p-value < 4.4x10^-6, hypergeometric test; similar results were obtained with a threshold-free comparison (S. Fig S14)). These include several known neurodevelopmental genes such as ASCL1, SOX2, and NEUROG2 (Fig 5A). We conclude there is significant correspondence between the curated targets and the high-throughput differential expression screen.

correspondence. To test for this, we divided the PAX6/Pax6 targets into three tissue types: the CNS, the eye, and "other", with the latter containing mostly DTRIs validated in cell lines. Since the differential expression profile was generated in the embryonic mouse forebrain, we hypothesized that the targets supported by low-throughput CNS evidence would be most highly enriched. We found that this is indeed the case. Thirteen of 18 CNS targets were differentially expressed at an FDR of 0.1 (p-value < 8.5x10^-9, hypergeometric test) (Fig 5B). This is nearly a twofold improvement over the set of all curated PAX6/Pax6 targets. Again, this observation was corroborated by an additional threshold-free analysis (S. Fig S15). Further, we confirmed that the increase in the level of enrichment for CNS targets over the set of all PAX6/Pax6 targets was statistically significant by using a resampling strategy to estimate the 95th percentile confidence intervals of the enrichment values (S. Fig S16). The level of enrichment for targets validated in the eye is approximately the same as the set of all targets (7 of 22 targets were differentially expressed at FDR of 0.1; p-value < 1.5x10^-2, hypergeometric test). Finally, we did not observe enrichment for the set of targets in the "other" category (2 of 16 targets were differentially expressed at FDR of 0.1; p-value < 0.47, hypergeometric test) (Fig 5B, S. Fig S15, S. Fig S16). Similar to cellular contexts, we also found a significant difference in the level of enrichment between the curated targets with single vs. multiple types of recorded experiments (S. Fig S16, S. Fig S17). These results provide a proof-of-principle for using our curation resource to evaluate high-throughput screens.
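
The hypergeometric enrichment tests reported in this section ask how likely it is to see at least the observed number of differentially expressed genes among a set of curated targets, if the targets were drawn at random from all assayed genes. A minimal sketch of the upper-tail calculation is given below. The function name is hypothetical, and the number of assayed genes and the number of differentially expressed genes used in the example are placeholder values, since neither is stated in this text; only the 22 of 56 curated targets comes from the results above.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) for a hypergeometric variable.

    population: number of genes assayed in the screen
    successes:  number of differentially expressed genes among them
    draws:      number of curated targets
    observed:   number of curated targets that are differentially expressed
    """
    if not (0 <= successes <= population and 0 <= draws <= population):
        raise ValueError("successes and draws must lie between 0 and population")
    upper = min(successes, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(max(observed, 0), upper + 1)
    )
    return tail / comb(population, draws)


# Placeholder values (assumptions, not taken from the screen).
N_ASSAYED_GENES = 15000
N_DE_GENES = 1500

p_value = hypergeom_upper_tail(
    population=N_ASSAYED_GENES,
    successes=N_DE_GENES,
    draws=56,     # curated PAX6/Pax6 targets
    observed=22,  # curated targets differentially expressed at FDR 0.1
)
print(f"p-value under the placeholder background: {p_value:.3g}")
```

The resulting p-value depends strongly on the background set, so the same gene universe should be used for both the curated targets and the differential expression results.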

The elucidation of the genetic circuits underpinning neurodevelopmental disorders has been a major challenge. While there has been progress in the development of TRN reconstruction methods using high-throughput data, it is reasonable to ask how much has already been captured in the lengthy history of low-throughput experiments, and to make maximal use of this information. Because low-throughput methods appear to be considered reliable (they are often used to validate high-throughput methods), especially when there are multiple lines of evidence, having a high-quality assembly of such data would be beneficial. In our survey of previous efforts to produce such a resource, we identified the lack of detail about the amount and type of low-throughput evidence to be a major gap. To this end, we undertook a systematic and detailed effort to inspect the published literature for support of DTRIs at the individual experiment level. We show that this approach improves interpretability of a curated DTRI data resource. Here, we release the result of our curation for use by the wider research community.

There are still a number of limitations to our work. In general, manual curation can have errors. In order to minimize mistakes, we established and strictly followed a formal curation protocol. In particular, we introduced controlled vocabularies for all recorded attributes to simplify the curation process. All records were checked twice, and any conflicts were resolved by the first author. Next, incomplete retrieval of candidate papers is a potential concern. While we strove to find as many papers as possible using both previous curation resources and independent PubMed queries, it is plausible that we have missed some candidate papers given our selection of and reliance on the MeSH search terms. Nonetheless, the pronounced popularity biases we report are unlikely to be an artifact of our search strategy.
Further, since we aimed to curate only a handful of DTRIs for a small set of TFs of interest, an incomplete pool of candidate papers was not a major issue. However, for a more comprehensive curation effort with the goal of increasing coverage of less popular TFs, it is possible that a future study may benefit from using more elaborate text mining approaches for retrieving candidate papers.

In order to establish a direct binding mechanism for regulatory interactions with the highest possible confidence, the effect of modifying cREs in their endogenous chromosomal loci should be considered. Emerging studies are using CRISPR-KRAB and related approaches to perform this analysis; recent examples include (37,38). However, such studies are few, and therefore have not been included in our curation protocol. Instead, we focused only on the three most commonly reported types of experimental evidence. In the future, it may be possible to integrate such data types in order to improve reliability beyond current standards.

The lack of a negative set may limit the utility of this resource for validation. For example, in the PAX6/Pax6 analysis we could only assess sensitivity of the high-throughput perturbation study with respect to our database, not specificity. This is because our attempts to find negative examples were largely unsuccessful. During curation, we took note of any TF perturbation or TF-reporter experiments that yielded negative results in the papers that we examined. We only found 11 such cases (S.

It is also important to emphasize that the network we obtained here cannot and should not be used for large scale biological inference, because the structure of the network is strongly influenced by research biases and the relationship with the true regulatory network is very uncertain. The highly skewed TF coverage among the candidate papers, coupled with the correlation between the number of candidate papers and gene popularity, implies that researchers generally choose to study DTRIs involving TFs of previously known significance. Conversely, some genes, such as TBR1, are functionally important but lack experimental characterization, perhaps due to their more recently discovered functional roles. This general research bias, combined with biases of our curation, has obvious impacts on the resulting network structure. Previous work by our group has documented the impact of bias towards well studied, multifunctional genes in other types of network analyses (40). Our observation of the high internode connectivity in the curated network demonstrates the presence of DTRI biases beyond gene popularity. Likewise, it is unclear whether the skewed representation of DTRIs involving activation of proximal cREs is the result of research bias or a real biological pattern. As such, we caution against interpretations based on the properties of the manually curated network.

We curated what we estimate to be a substantial, but still small, fraction of the relevant literature. Fortunately, our curation protocol can be scaled up to produce a considerably larger collection of high confidence DTRIs. According to our estimates, our current curation has captured less than ten percent of all experimentally verified DTRIs reported in the published literature. The bulk of our curation was performed in four months by two full time curators. Given this experience, we estimate an exhaustive curation effort could be completed by a team of ten curators in approximately 12 months. Importantly, we predict that about a third of all reported DTRIs would be supported by all three types of experimental evidence. However, we take note of the scarcity of specific DTRIs in particular contexts.
In particular, we found less than 5% of all recorded DTRIs to have reliably demonstrated activity specifically in the CNS. While we postulate that a manual curation approach is required to establish a high confidence DTRI catalogue for training and validating high-throughput predictions, the aforementioned biases and scarcity of low-throughput experiments will prevent the use of manually curated networks directly for analysis. To elucidate the architecture of gene regulation underpinning neurodevelopment and disease, it is imperative to develop effective means for accurately predicting DTRIs based on high-throughput data. This curation effort supports progress towards this end.

Obtaining records from external resources
We obtained records from eight external databases: ENdb, TRRUST, CytReg, ORegAnno, HTRIdb, TFe, TFactS, and InnateDB. We downloaded the ENdb records from http://www.licpathway.net/ENdb/ on Sept. 14th, 2020. Records from the remaining databases were downloaded between Dec. 9 and Dec. 16, 2019. We obtained the CytReg records from the supplementary data of the original publication. For TRRUST, we downloaded both the human and mouse data tables directly from https://www.grnpedia.org/trrust/. An additional column was added to preserve the species annotation before joining the two tables. The most recent version of the records in ORegAnno was obtained from http://www.oreganno.org/. Here, we retained only records with valid Entrez or Ensembl IDs and PubMed ID annotations. In addition, records annotated as miRNA regulation or those resulting from high-throughput screens were excluded. We downloaded InnateDB records from https://www.innatedb.com/ and filtered for records reporting protein-DNA interactions. The TFe records were retrieved from the now deprecated web API, http://cisreg.cmmt.ubc.ca/cgi-bin/tfe. Species information was inferred from the TF gene symbols recorded in TFe. The TFactS records were downloaded from http://www.tfacts.org/. A union set was derived by merging both the signed and signless data tables in TFactS. Finally, for HTRIdb, we downloaded the data from http://www.lbbc.ibb.unesp.br/htri. Here, we filtered for literature curated records with valid PubMed ID annotations.

From each database we retained records of one-to-one regulator-target interactions with annotations in either human or mouse. We indexed genes using Entrez IDs. In cases where only the gene symbols were available, we mapped the symbols to Entrez IDs, first by using the official HGNC or MGI symbols and then by gene aliases.
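
The two-step symbol lookup described above can be sketched as follows. This is a minimal Python illustration and not the code used in this study: the function name, the toy lookup tables, and the choice to return None for unmapped symbols are all assumptions made for the example.

```python
from typing import Dict, Optional

# Toy lookup tables for illustration only (assumed structure); a real run
# would load the full HGNC/MGI symbol tables and their alias lists.
OFFICIAL: Dict[str, Dict[str, int]] = {
    "human": {"PAX6": 5080},
    "mouse": {"Pax6": 18508},
}
ALIASES: Dict[str, Dict[str, int]] = {
    "human": {"AN2": 5080},
    "mouse": {},
}


def symbol_to_entrez(symbol: str, species: str) -> Optional[int]:
    """Map a gene symbol to an Entrez ID, trying official symbols before aliases."""
    key = symbol.strip()
    official = OFFICIAL.get(species, {})
    if key in official:
        return official[key]
    # Fall back to aliases only when no official symbol matches.
    return ALIASES.get(species, {}).get(key)
```

Checking official symbols first prevents an alias that happens to coincide with another gene's official symbol from taking precedence over it.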
With the exception of ENdb, which was published after we completed curation, the retrieved set of publications was used as a source of candidate papers for curation in the present study (S. Table S1). Each publication was assigned to one or more TFs based on the recorded DTRIs. Additionally, we also retained

Identification of neurodevelopmental TFs
We define TFs to be either the genes annotated with at least one regulatory target in any of the previous resources or those identified as TFs by Lambert et al. (41). Collectively, this TF set consists of 2,235 genes. Given our particular focus in this study on neurodevelopment, we further designated 438 TFs as neurodevelopmental TFs based on Gene Ontology annotations and disease association records from SFARI (S. Table S2) (42,43). We downloaded the list of genes annotated with the central nervous system development GO term (GO:0007417) or any of its descendant terms for both human and mouse from AmiGO (http://amigo.geneontology.org/). Next, we downloaded the list of genes associated with neurodevelopmental disorders from the SFARI database (https://gene.sfari.org/). The list of TFs is provided in S. Table S2. Finally, we manually prioritized these TFs for curation based on the annotated association with neurodevelopment and the number of candidate papers retrieved.

Obtaining candidate publications for curation
In addition to the candidate papers derived from the external resources, we also performed an independent PubMed query for each TF (refer to the previous section for the operational definition of a TF). We took advantage of the E-Utilities API provided by NCBI to perform searches programmatically (44). We selected six MeSH terms indicative of relevant experimental evidence: "Regulatory Sequences, Nucleic Acid", "Transcription, Genetic", "Intracellular Signaling Peptides and Proteins", "Gene Expression Regulation", "Chromatin Immunoprecipitation", and "Electrophoretic Mobility Shift Assay". The selected search terms were appended to the gene symbol of each TF to form an independent search query, which returned the corresponding set of candidate papers. To approximate the gene popularity of each TF, we performed another round of PubMed queries using only the gene symbol, without the MeSH terms.
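
As a hedged illustration of how such queries might be assembled, the Python sketch below builds an ESearch request URL for one TF. The endpoint and the db, term, retmax and retmode parameters belong to the public E-utilities interface. The function names, the use of the [MeSH Terms] field tag, and in particular the way the terms are combined (gene symbol AND any one of the six MeSH terms) are assumptions for the example, since the exact query syntax is not specified here.

```python
from typing import Sequence
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The six MeSH terms listed in the text.
MESH_TERMS = (
    "Regulatory Sequences, Nucleic Acid",
    "Transcription, Genetic",
    "Intracellular Signaling Peptides and Proteins",
    "Gene Expression Regulation",
    "Chromatin Immunoprecipitation",
    "Electrophoretic Mobility Shift Assay",
)


def build_term(gene_symbol: str, mesh_terms: Sequence[str] = MESH_TERMS) -> str:
    """Build a PubMed search term; with no MeSH terms, search the symbol alone."""
    if not mesh_terms:
        return gene_symbol
    # Assumption: a paper qualifies if it carries ANY of the MeSH terms.
    clause = " OR ".join(f'"{term}"[MeSH Terms]' for term in mesh_terms)
    return f"{gene_symbol} AND ({clause})"


def build_esearch_url(term: str, retmax: int = 1000) -> str:
    """Return the ESearch URL for a PubMed query (no network call is made here)."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


candidate_url = build_esearch_url(build_term("PAX6"))
popularity_url = build_esearch_url(build_term("PAX6", mesh_terms=()))
```

Passing an empty tuple of MeSH terms yields the symbol-only query used here to approximate gene popularity. Fetching the URLs and parsing the returned record counts and PubMed IDs is left out of the sketch.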

Experiment-centric curation of DTRIs
For each paper that we examined, we recorded all low-throughput experimental evidence of DTRIs. Specifically, we looked for three types of experiments: TF perturbation, TF-DNA binding, and TF-reporter assays. Each experiment constitutes an independent record in the database and is assigned a unique identifier (S. Table S3). Gene identifiers were translated into Entrez IDs at the time of recording. Species information was recorded separately for the TF and the target genes. The context type was recorded as cell line, primary tissue, or primary cells. For EMSA experiments, the context type was designated as in vitro. In addition to the context type, we further annotated each experiment with a specific ontology term in order to retain the highest context resolution possible. We used terms from the UBERON ontology (45) for primary tissue, the CL ontology (46) for primary cells, and the CLO ontology (47) for cell lines. Where the appropriate ontology term could not be found in the aforementioned ontologies, we additionally used terms from the BTO (48) and the EFO (49) ontologies. When no suitable term was available in any of these ontologies, we directly recorded the name provided in the original publication. Age was also recorded as a separate attribute for experiments that used primary tissues or cells. Where available, we also recorded the direction of regulation as well as whether the reported regulatory element is proximal or distal to the TSS of the target gene. Proximal elements were defined to be either promoters or cREs within 3 kb upstream or downstream of the TSS. Table 1 contains the full list of recorded attributes and the corresponding descriptions.

For each type of experiment, we recorded a number of additional details. For TF perturbation experiments, we recorded whether the TF was overexpressed, down-regulated, or knocked out. We also recorded whether the perturbation was dynamically induced before the time of assay or constitutively modified at the beginning of life. For TF-DNA binding experiments, we recorded both ChIP-assay and EMSA experiments. For EMSA, we further annotated the source of the TF protein. Finally, for reporter assays, we recorded whether mutations were introduced to the cRE sequence for comparison.
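
To make the experiment-centric layout concrete, the following Python sketch shows one way such records could be represented and then collapsed into DTRIs with their supporting evidence types. It is an illustration under stated assumptions: the class name, field names, and category labels are hypothetical and do not reproduce the attribute names in Table 1, and the gene identifiers in the usage example are placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class ExperimentRecord:
    """One low-throughput experiment; all field names are hypothetical."""
    experiment_id: str
    tf_entrez: int
    target_entrez: int
    experiment_type: str                # "perturbation", "binding", or "reporter"
    context_type: str                   # e.g. "cell line", "primary tissue", "in vitro"
    context_term: Optional[str] = None  # ontology ID, or the name used in the paper


def evidence_types_per_dtri(
    records: Iterable[ExperimentRecord],
) -> Dict[Tuple[int, int], Set[str]]:
    """Group experiments by (TF, target) and collect the distinct evidence types."""
    evidence: Dict[Tuple[int, int], Set[str]] = defaultdict(set)
    for record in records:
        evidence[(record.tf_entrez, record.target_entrez)].add(record.experiment_type)
    return dict(evidence)


# Placeholder records: TF 1 regulates targets 2 and 3.
records = [
    ExperimentRecord("E1", 1, 2, "perturbation", "primary tissue", "UBERON:0001017"),
    ExperimentRecord("E2", 1, 2, "binding", "in vitro"),
    ExperimentRecord("E3", 1, 2, "reporter", "cell line"),
    ExperimentRecord("E4", 1, 3, "reporter", "cell line"),
]
support = evidence_types_per_dtri(records)
all_three = [pair for pair, types in support.items() if len(types) == 3]
```

Keeping one record per experiment, instead of one per interaction, is what allows the number and types of supporting experiments to be counted for each DTRI afterwards.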

Network analysis
To assess the connectivity of the manually curated network, we used the igraph package in R (50). First, we constructed a directed network consisting of all curated DTRIs. Three metrics were computed to measure internode connectivity: the number of valid gene-to-gene paths, the number of cliques with three or more nodes, and the number of independent components. To assess statistical significance, we constructed 1000 network permutations by randomly swapping all edges while preserving both the in- and out-degrees of all nodes. This set of random networks was then used to generate an empirical null distribution for each of the three metrics. One-tailed p-values were computed as the fraction of random values larger or smaller than the observed values.

Comparison with the high-throughput Pax6 perturbation screen
We selected PAX6/Pax6, the TF with the highest number of recorded targets, for assessing correspondence with a high-throughput screen. We obtained the genome-wide expression data generated by a previous study (35), along with the metadata, from Gemma (51). This dataset was selected for its relevance to brain development. We then performed a differential expression analysis between the wild type and the Pax6-Sey samples using limma (52). This resulted in a list of genes with p-values representing the significance of differential expression upon Pax6 knockout. For hit list analyses, we used a cut-off FDR of 0.1. We used the ranking of nominal p-values for AUROC analyses. The Entrez IDs for the mouse genes were mapped to human orthologs using HomoloGene so that the results could be compared with the current curation.

Next, we took all 56 curated targets for PAX6/Pax6 that were present in the microarray dataset and divided them according to cellular context and quality of evidence.
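
As a rough illustration of the degree-preserving permutation test described under Network analysis, the sketch below re-implements the idea in plain Python; it does not reproduce the igraph-based R code used in this study. Several details are assumptions made for the example: the number of attempted swaps per permutation, the rejection of swaps that would create self-loops or duplicate edges, the inclusive comparison in the p-value, and the use of the component count as the only metric shown. The edge list is expected to contain at least two distinct edges.

```python
import random


def degree_preserving_shuffle(edges, n_swaps, rng):
    """Rewire a directed edge list while keeping every node's in- and out-degree."""
    edge_list = list(edges)
    edge_set = set(edge_list)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        new1, new2 = (a, d), (c, b)
        # Assumption: skip swaps that would create a self-loop or a duplicate edge.
        if a == d or c == b or new1 in edge_set or new2 in edge_set:
            continue
        edge_set -= {(a, b), (c, d)}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = new1, new2
    return edge_list


def count_components(edges):
    """Count components, ignoring edge direction, with a small union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        parent[root_a] = root_b
    return len({find(node) for node in list(parent)})


def empirical_p(observed, null_values, tail="lower"):
    """One-tailed empirical p-value: fraction of null values at least as extreme."""
    if tail == "lower":
        extreme = sum(value <= observed for value in null_values)
    else:
        extreme = sum(value >= observed for value in null_values)
    return extreme / len(null_values)


def permutation_test(edges, metric, n_perm=1000, swaps_per_edge=10, tail="lower", seed=0):
    """Compare an observed network metric against degree-preserving permutations."""
    rng = random.Random(seed)
    observed = metric(edges)
    n_swaps = swaps_per_edge * len(edges)
    null = [metric(degree_preserving_shuffle(edges, n_swaps, rng)) for _ in range(n_perm)]
    return observed, empirical_p(observed, null, tail)
```

Because each accepted swap exchanges the targets of two edges, every node keeps its original number of outgoing and incoming edges, so any difference between the observed and permuted metrics cannot be attributed to the degree distribution alone.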
To identify targets with demonstrated activity in the CNS, we retrieved all interactions for PAX6/Pax6 with at least one experiment annotated with the CNS ontology term (UBERON:0001017) or any of its descendant terms. Similarly, we searched for all targets annotated with the eye term (UBERON:0000970). Targets with evidence in both the CNS and the eye were placed only in the CNS category so that the categories are mutually exclusive. The remaining targets were classified as "other". To subset by quality of evidence, we binned all PAX6/Pax6 targets into those with multiple types of experiments vs. only a single type of experiment.

Each of these target subsets was then tested for enrichment in the high-throughput differential expression screen. Enrichment was tested in two ways. First, a hit list of 2,780 differentially expressed genes was generated using an FDR threshold of 0.1. Overrepresentation of the curated targets in this list was tested using the hypergeometric distribution, yielding a p-value for each set of curated targets. Next, we generated a ranking of differentially expressed genes