Single-cell proteomics reveals downregulation of TMSB4X to drive actin release for stereocilia assembly

Hearing and balance rely on small sensory hair cells that reside in the inner ear. To explore dynamic changes in the abundant proteins present in differentiating hair cells, we used nanoliter-scale shotgun mass spectrometry of single cells, each ∼1 picoliter, from utricles of embryonic day 15 chickens. We identified unique constellations of proteins or protein groups from presumptive hair cells and from progenitor cells. The single-cell proteomes enabled the de novo reconstruction of a developmental trajectory. Inference of protein expression dynamics revealed that the actin monomer binding protein thymosin β4 (TMSB4X) was present in progenitors but dropped precipitously during hair-cell differentiation. Complementary single-cell transcriptome profiling showed downregulation of TMSB4X mRNA during maturation of hair cells. We propose that most actin is sequestered by TMSB4X in progenitor cells, but upon differentiation to hair cells, actin is released to build the sensory hair bundle.


Hair cells, the sensory cells of the inner ear, carry out a finely orchestrated construction of an elaborate actin cytoskeleton during differentiation. Progenitors of vestibular hair cells, the supporting cells (Roberson et al., 1992), have an unremarkable actin cytoskeleton. By contrast, differentiating hair cells express a wide array of actin associated proteins, including crosslinkers, membrane-to-actin linkers, and capping molecules, and use them to rapidly assemble mechanically sensitive hair bundles on their apical surfaces (Shin et al., 2013; Ellwanger et al., 2018). Hair bundles consist of ~100 stereocilia each, filled with filamentous actin (F-actin) and arranged in multiple rows of increasing length; by maturity, stereocilia contain >90% of the F-actin in a hair cell (Tilney and Tilney, 1988).

Along the axis of the chicken cochlea, stereocilia systematically decrease in maximum height (from >5 to 1.5 µm) and increase in number per cell (from 30 to 300) as the frequency encoded increases (Tilney et al., 1992). Despite these changes, a quantitative analysis suggested that within experimental error, hair cells use the same amount of actin to build these disparate hair bundles (Tilney and Tilney,

alkylation, and proteolysis, digested peptides were collected and separated using nano-liquid chromatography on a 30-µm-i.d. column (Zhu et Figure 1D). The total number of unique peptides increased to ~2500 in pools of 20 cells, with about 70% identified by MS/MS scans (Figure 1D). Only ~1200 peptides were identified in pools of 20 FM1-43low cells, with about 60% identified by MS/MS scans (Figure 1E). Total iBAQ rose nonlinearly with the number of cells (Figure 1F), suggesting that some protein was lost to surface adsorption; while small relative to typical sample wells, the volume of the nanowell is still 50,000-fold larger than the volume of a utricle cell.
Because sample processing occurred in a protected nanowell environment using robotic liquid handling, the total iBAQ attributed to keratins (e.g., human skin contamination) was only ~0.1% of the total, far less than >50% occurring in some mass-spectrometry experiments with small amounts of protein. The number of proteins or protein groups identified increased from ~60 for FM1-43high single cells to nearly 600 for pools of 20 cells (Figure 1G); fewer proteins were identified in supporting cells, likely because of their smaller volume.

Comparison of single FM1-43high and FM1-43low cells to wells with collection triggered to noise allowed us to confirm the presence of single cells, even without visual inspection of the wells. Total iBAQ did not accurately indicate which wells contained single cells (Figure 1H

The proteomics experiments also revealed several abundant proteins that had not been previously found to be hair-cell specific, including GSTO1, GPX2, CRABP1, and AK1; TMSB4X and AGR3 were examples of proteins that were much more abundant in supporting cells (Figure 1-Source Data 1; Figure 2A-B). We examined several of these proteins in E15 chick utricles using immunocytochemistry. Antibody labeling for AGR3 and for the hair-cell marker OTOF did not overlap, and the elongated cell bodies

The thymosin-beta family of proteins, which includes TMSB4X, are actin monomer binding proteins that sequester substantial fractions of actin in many cell types (Nachmias, 1993; Sun et al., 1995). Five TMSB4X peptides were identified by mass spectrometry, which covered 75% of the ~5 kD protein; one of the peptides was shared by TMSB15B, another member of the family (Figure 1-Figure Supplement 2). Analysis of transcript expression in mouse inner ear using gEAR (https://gear.igs.umaryland.edu) indicated that Tmsb4x expression was considerably higher than that of another paralog, Tmsb10, and much higher than the two Tmsb15 isoforms, justifying our focus on TMSB4X.

To localize TMSB4X in the E15 chick utricle, we used an antibody that has been validated previously with

The concentration of TMSB4X relative to total actin indicates how much free actin is available for assembling filamentous structures like stereocilia. Analyzing the 20-cell samples, we found that the ACTG1 protein group (i.e., total actin) accounted for a relative molar fraction (riBAQ) of 0.043 ± 0.001 (mean ± SEM) in FM1-43high cells and 0.060 ± 0.005 in FM1-43low cells (Figure 2C). A mixed-effects model accounting for intra-sample correlations indicated that these concentrations differed significantly, albeit only at an alpha level of 0.05 (summary statistics with confidence intervals are reported in Table 1).
While TMSB4X accounted for a relative molar fraction of only 0.006 ± 0.002 in FM1-43high cells, it was 0.056 ± 0.012 in FM1-43low cells, ten-fold higher (Figure 2C) and significantly different (p<0.001). Critically, the concentration of hair-cell TMSB4X differed significantly from that of hair-cell actin (p=0.001), while the concentration of supporting cell TMSB4X did not differ from that of supporting cell actin (p=0.660). Because TMSB4X and actin interact with a 1:1 stoichiometry (Goldschmidt-Clermont et al., 1992), and no other actin-binding proteins are detected at similarly high levels, our quantitation suggests that TMSB4X is capable of binding most actin monomers in supporting cells.

In wholemount preparations, we counted 72 ± 8 stereocilia per utricle hair cell (mean ± SD; N=26 from striolar and extrastriolar regions). The actin quantitation suggested that each E15 hair cell contains ~15,000,000 actin molecules (G- and F-actin combined). If nearly all actin is in stereocilia (Tilney and Tilney, 1988), then each stereocilium would contain ~200,000 actin molecules. While fewer than the 400,000 molecules estimated per E20 chick stereocilium (Shin et al., 2013), the value is consistent with the relative immaturity of E15 cells.

Developmental trajectory analysis using single-cell proteomics

suitable for dissection of the single-cell proteomics results, we applied CellTrails, which we previously used to uncover the branching trajectory from progenitors to hair cells in the chicken utricle using transcript data. To interpret the latent structure in the single-cell mass spectrometry data, its lower-dimensional manifold was investigated using CellTrails' robust nonlinear spectral embedding on the submatrix of the 37 most variable identifications (Figure 4A). Appropriately, the cells distinctly segregated according to their FM1-43 uptake (Figure 4B).
We noted that the protein pattern of three cells classified as FM1-43high appeared to match better to the FM1-43low (supporting cell) pattern. Similarly, two FM1-43low cells were embedded in the neighborhood of cells with a high FM1-43 uptake (hair cells). While FM1-43 is useful for labeling hair cells, transcript analysis showed that FM1-43 levels are not a perfect proxy for hair-cell maturity; for example, hair cells with damaged mechanotransduction will not load with the dye and such cells would be classified as FM1-43low. Alternatively, cells with relatively low FM1-43 could be transitional cells between progenitors and mature hair cells. We therefore surmised that we could elicit a developmental trajectory from the single-cell protein expression patterns. The chronological ordering of the cells was learned in the lower-dimensional manifold and a pseudotime value was assigned to each cell (Figure 4C).

The 75 proteins sufficiently detected at the single-cell level are all relatively highly expressed and largely do not include those expected to distinguish different classes of hair cells. Moreover, we expect that our sample is dominated by type II hair cells, especially those from extrastriolar regions, as they are much more numerous than type I hair cells (Ellwanger et

Transcriptomic confirmation of TMSB4X enrichment in progenitor cells

We predicted that the decrease in TMSB4X as hair cells mature arose from downregulation of TMSB4X transcript expression during differentiation of hair cells. We therefore used transcriptomic profiling of single cells isolated from E15 chick utricle to examine gene expression during the bifurcating trajectory that describes the development of progenitor cells to mature striolar and extrastriolar hair cells (Ellwanger et al., 2018).
We carried out scRNA-seq transcriptomic profiling using the Smart-seq protocol (Picelli et al., 2014) on 384 FACS-sorted E15 chick utricle hair cells. To provide maximum correlation of TMSB4X expression changes with chicken utricle hair cell maturation, we reconstructed the trajectory in similar fashion as previously described, carrying out the analysis with 182 assay genes already including GSTO1 and CRABP1 from that previous study, supplemented with TMSB4X, AGR3, GPX2 and AK1. Nine cellular subgroups emerged, each of which was distinguished by distinct marker gene sets (Figure 5A). Based on their expression profiles, for example the lack of TECTA and especially high levels of TMSB4X (Figure 5A; see also Figure 3), two subgroups (S8 and S9) appeared to be stromal cells; to focus on the developmental progression of progenitor (supporting) cells to hair cells, we removed S8 and S9 for subsequent analysis.

We mapped the remaining 254 individual cells of subgroups S1-S7 along developmental trajectories, plotting CellTrails maps to demonstrate the branching nature of the trajectory (Figure 5B). Our assay was biased for hair-bundle genes, and at least half of the cells isolated were hair cells, so it is unsurprising that the final trajectory revealed not only the transition from progenitor (supporting) cells to hair cells, but also further developmental branching. One major branch was supporting cells, as these cells expressed markers like TECTA and OTOA (Figure 5H; Figure

hand branch was equivalent to the novel hair cell type TrES* found in our previous study (Ellwanger et al., 2018).

To confirm the spatial identity of the two major hair cell branches, all experiments were carried out with E15 chicken utricles split apart into lateral halves, which contain striolar and extrastriolar cells, and medial halves, which contain only extrastriolar cells.
The lower-right branch was populated nearly entirely by lateral cells, which confirms that it represents striolar hair cells (Figure 5D-E). We conclude that the scRNA-seq experiment accurately replicated our previous experiment using a multiplex RT-qPCR approach.

We next examined the genes highlighted in the proteomics experiments, including TMSB4X, AGR3, GSTO1, GPX2, CRABP1, and AK1. As predicted from the proteomics and localization experiments, the CellTrails analysis showed that TMSB4X and AGR3 were specific to progenitor (supporting) cells (Figure 5H), while GSTO1, GPX2, CRABP1, and AK1 were specific to hair cells (Figure 5I). The CellTrails maps indicated that GPX2 and AK1 were concentrated in striolar hair cells, while CRABP1 was enriched in extrastriolar cells, particularly TrES* (Figure 5I). Cells observed with high levels of CRABP1 in immunocytochemistry experiments could be the TrES* cells (Figure 3G,I). GSTO1 was expressed at similar levels in both hair cell types. Available antibodies against GPX2, AK1, and GSTO1 were insufficiently reliable to check their hair-cell specificity. Examining databases in gEAR, however, we noted that Gpx2 and Ak1 are predicted to be substantially enriched in hair cells as compared to non-hair cells in mouse utricle; by contrast, Gsto1 is expressed at higher levels in mouse utricle non-hair cells than in hair cells.

The scRNA-seq results corroborated the expression dynamics of TMSB4X at the transcriptional level. High in progenitor cells, its transcriptional activity decreased substantially during hair cell differentiation (Figure 5H), and was nearly undetectable in striolar hair cells. Interestingly, TMSB4X was expressed at detectable levels, albeit relatively low, in cells along TrES as compared to TrES* (Figure 5H).
We also noted a striking decrease in ACTB expression as hair cells differentiated; the CellTrails maps suggested that ACTB was >10-fold higher in progenitor cells than in hair cells (Figure 5K; Figure 5-Source Data 1). ACTG1 increased modestly in expression, especially in TrES cells, but overall was present at lower levels than ACTB (Figure 5K; Figure 5-Source Data 1). Similar trends for these actin isoforms were also seen in our previous data. ACTB and ACTG1 differ by only four amino acids, however, and we only detected one of the peptides that distinguish the isoforms (Ac-EEEIAALVIDNGSGMCK from ACTG1) in a single mass spectrometry run. We were therefore unable to accurately measure the relative abundance of the two actin isoforms in our protein mass spectrometry

In addition, we identified several proteins not previously highlighted as specific for hair cells (CRABP1, GSTO1, GPX2, AK1) or for supporting cells (AGR3, TMSB4X). TMSB4X was present at nearly equimolar levels with respect to actin in supporting cells, indicating that most actin is sequestered in those cells. By contrast, in hair cells, TMSB4X was only one-tenth as abundant as actin. This developmental change was characterized in more depth using single-cell RNA sequencing, which showed that the drop in TMSB4X was greater in extrastriolar hair cells than in striolar hair cells. Together, these data strongly suggest that downregulation of TMSB4X allows differentiating hair cells to construct their hair bundles with newly available actin monomers.

proteins in hair cells than in HeLa cells was not surprising.
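
The stoichiometric argument can be checked with a few lines of arithmetic. The sketch below uses only the mean relative molar fractions (riBAQ), the stereocilia count, and the per-cell actin estimate quoted in the Results; the SEM values are ignored, and the function names are illustrative helpers rather than part of any analysis pipeline used in the study.

```python
# Mean relative molar fractions (riBAQ) from the 20-cell samples (Figure 2C).
# Values are copied from the text; uncertainties (SEM) are deliberately ignored.
RIBAQ = {
    "hair_cell": {"actin": 0.043, "TMSB4X": 0.006},        # FM1-43high
    "supporting_cell": {"actin": 0.060, "TMSB4X": 0.056},  # FM1-43low
}


def tmsb4x_per_actin(cell_type):
    """Molar ratio of TMSB4X to total actin for one cell type."""
    fractions = RIBAQ[cell_type]
    return fractions["TMSB4X"] / fractions["actin"]


def max_sequestered_fraction(cell_type):
    """Upper bound on the fraction of actin that TMSB4X could bind,
    assuming 1:1 binding and no competing partners (illustrative helper)."""
    return min(1.0, tmsb4x_per_actin(cell_type))


def actin_per_stereocilium(actin_per_cell=15_000_000, stereocilia_per_cell=72):
    """Actin molecules per stereocilium if all cellular actin were in stereocilia."""
    return actin_per_cell / stereocilia_per_cell


if __name__ == "__main__":
    for cell_type in RIBAQ:
        print(f"{cell_type}: TMSB4X/actin = {tmsb4x_per_actin(cell_type):.2f}")
    print(f"actin per stereocilium = {actin_per_stereocilium():,.0f}")
```

With these inputs the TMSB4X-to-actin ratio is about 0.93 in supporting cells and about 0.14 in hair cells, and the per-stereocilium estimate is about 208,000 molecules, in line with the ~200,000 quoted in the Results.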

Single-cell proteomics detection
While the nanoPOTS approach is useful for characterizing abundant proteins in small cells or many proteins in larger cells, without further increases in sensitivity, the small number of proteins detected in single hair cells will prevent characterization of low-abundance proteins or deep categorization of developmental pathways using protein expression. Because the relationship between cell number and total protein signal intensity was nonlinear, especially for 1-3 cell samples (Figure 1F), we concluded that we lost significant amounts of protein to adsorption to the nanowells. These results indicate that further improvement of protein recovery is critical to increase proteome coverage and quantification performance of single-cell proteomics technology employing nanoPOTS. Fabricating nanowells with smaller dimensions would be straightforward; however, dispensing single cells by FACS to yet-smaller wells would be difficult to carry out reproducibly. In addition, the nanowell surfaces could be coated chemically with antifouling materials such as polyethylene glycol or poly(2-methyl-2-oxazoline) polymers (Weydert et al., 2017). An alternative strategy for single-cell protein analysis uses TMT multiplex labeling with one channel utilized by a sample of several hundred carrier cells, which will reduce the relative error due to protein loss (Budnik et al., 2018). Coupling the TMT multiplex labeling approach with nanoPOTS could significantly increase proteome coverage and analysis throughput of single-cell proteomics.

We detected much higher levels of small proteins (<20 kD) here than in previous experiments using the

Although application of trajectory-analysis methods to single-cell proteomics is very much in its infancy, we show here that CellTrails is suitable for this purpose.
The analysis was limited by sensitivity, as proteins detected by mass spectrometry were limited to those expressed at relatively high levels. Improvements of the nanoPOTS method that lead to increased sensitivity and reproducibility will enhance future protein-based trajectory analyses. Nevertheless, while robotic manipulation allows for increased sample-preparation output, the number of cells analyzed presently must remain low because of slow throughput of the mass spectrometry steps. Single-cell RNA-seq approaches are likely to continue to offer much higher throughput and depth for the foreseeable future. That said, analysis of developmental pathways using single-cell proteomics allows the identification of key proteins that change in protein expression level without alterations in transcript levels. Moreover, future single-cell proteomics approaches will allow analysis of posttranslational modifications like phosphorylation, which will expand our ability to probe developmental cascades.

Interestingly, GAPDH was found to increase during hair cell maturation (Figure 4E-F

While TMSB4X is present at high levels in supporting cells, presumably sequestering actin monomers, it drops substantially in concentration after cells differentiate to hair cells. If total protein in each cell type is the typical ~250 mg/ml (Fulton, 1982; Srivastava and Bernhard, 1986; Brown, 1991)

plausible hypothesis incorporating these observations is that ACTB is sequestered with TMSB4X in supporting cells; upon differentiation to hair cells, ACTB is made immediately available for stereocilia elongation by degradation of the actin buffer TMSB4X, while ACTG1 expression is increased to provide actin for other assemblies, including the cuticular plate and circumferential actin belt (Höfer et al., 1997).

provided support for imaging).
We also received support from the following Stanford core facilities: the

Single cells were collected from utricles of E15 chick embryos using methods previously described

To quantify cell volume, cells were FACS-sorted (for hair cells and supporting cells), fixed, stained with DAPI and phalloidin, and imaged as described above. For each slice of the z-stack, the Threshold and Make Binary tools of Fiji/ImageJ were used to generate a binary stack, which defined the cell perimeter in each z-slice. The Analyze Particles tool was then used to determine the cell area in each slice. The volume of each slice is the single-slice area multiplied by the z-stack interval; all slice volumes were added together to estimate total cell volume.

To count stereocilia per hair bundle, Airyscan z-stack images of phalloidin-stained E15 chicken utricles were obtained using a Zeiss LSM 880 microscope; images were acquired near the base of bundles to ensure that all stereocilia were in each image. Stereocilia were manually counted from single x-y images.
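
The slice-summation estimate of cell volume described above can be sketched as follows. This is a minimal illustration, assuming the per-slice areas have already been exported from Fiji/ImageJ (for example from Analyze Particles) in µm² and that the z-interval is in µm; the function names and the example numbers are hypothetical and are not measurements from the study.

```python
def cell_volume_from_slices(slice_areas_um2, z_interval_um):
    """Estimate cell volume (µm^3) as the sum of area x z-interval over all slices."""
    if z_interval_um <= 0:
        raise ValueError("z_interval_um must be positive")
    if any(area < 0 for area in slice_areas_um2):
        raise ValueError("slice areas must be non-negative")
    # Each slice is treated as a slab whose thickness equals the z-step.
    return sum(area * z_interval_um for area in slice_areas_um2)


def um3_to_picoliters(volume_um3):
    """Convert µm^3 to picoliters (1 pL = 1000 µm^3)."""
    return volume_um3 / 1000.0


if __name__ == "__main__":
    # Hypothetical areas for a five-slice stack acquired with a 0.5 µm z-step.
    areas = [20.0, 45.0, 60.0, 45.0, 20.0]
    volume = cell_volume_from_slices(areas, z_interval_um=0.5)
    print(f"{volume:.1f} µm^3 = {um3_to_picoliters(volume):.3f} pL")
```

Treating each slice as a slab of uniform thickness is the simplest reading of the procedure; it makes no correction for axial elongation of the point-spread function or for refractive-index mismatch.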

A capillary solid phase extraction (SPE) column (75 μm i.d., with 3 μm C18 particles of 300 Å pore size; Phenomenex, Torrance, USA) was used for initial sample loading and desalting, and was then connected to a 50 cm, 30 μm i.d. column packed with the same material. Mobile phase was delivered at 60 nl/min with a Dionex UltiMate NCP-3200RS pump system (Thermo Fisher). Peptides were separated with a linear 8-22% Buffer B (0.1% formic acid in acetonitrile) gradient over 60 min, followed by a 10-min increase to 45%. The column was washed with 80% Buffer B for 10 min and then equilibrated with 2% Buffer B for 15 min. An Orbitrap Fusion Lumos Tribrid mass spectrometer (Thermo Fisher) was used for data collection. Peptides were ionized at a spray voltage of 2 kV and ions were collected into an ion transfer capillary set at 150°C. The RF lens was set at 30%. MS1 scans used a 375-1575 mass range, a scan resolution of 120,000, an AGC target of 3 x 10^6, and a maximum injection time of 246 ms. Precursor ions were selected for MS/MS sequencing if they had charges of +2 to +7 and intensities >8,000; precursors were isolated with an m/z window of 2 and fragmented by higher-energy collisional dissociation (HCD) set at 30%. Repeat sampling was reduced by using an exclusion duration of 40 s and m/z tolerance of ±10 ppm. MS2 scans were carried out in the Orbitrap with an AGC target of 2 x 10^5. The maximum injection time and MS2 scan resolution were set as 502 ms and 120,000, respectively.

For the enrichment analysis, we only used the 345 proteins that were measured in at least two replicates in each group. To correct for multiple testing, two-sided p-values from a moderated t-test (Ritchie et al., 2015) were adjusted using the FDR procedure (Benjamini and Hochberg, 1995); an enrichment of >1.5-fold and an FDR-adjusted p-value less than 0.05 were considered significant.
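
The significance rule for the enrichment analysis can be illustrated with a short sketch. It implements the Benjamini-Hochberg step-up adjustment in plain Python and then applies the two thresholds stated above. It does not reproduce the moderated t-test itself: the p-values are taken as given, fold changes are assumed to be expressed as ratios in the direction of enrichment, and the function names and example numbers are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg FDR-adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_enriched(fold_changes, p_values, min_fold=1.5, alpha=0.05):
    """Flag proteins with fold change > min_fold and adjusted p < alpha."""
    adjusted = benjamini_hochberg(p_values)
    return [fc > min_fold and q < alpha for fc, q in zip(fold_changes, adjusted)]


if __name__ == "__main__":
    # Hypothetical fold changes and raw p-values for four proteins.
    folds = [2.0, 3.0, 1.2, 5.0]
    pvals = [0.01, 0.04, 0.03, 0.20]
    print(benjamini_hochberg(pvals))
    print(is_enriched(folds, pvals))
```

In this toy example only the first protein passes both thresholds: the second has a large fold change, but its adjusted p-value rises to about 0.053 once the correction is applied.
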
For statistical comparisons of ACTG1 and TMSB4X mass spectrometry results, to account for potential intra-sample correlations, a mixed-effects model with a random intercept for samples was fitted to the data, and t-tests of contrasts were used to assess differences between groups (Pinheiro and Bates, 2000). The lmerTest R package (version 3.1-0) was used for the computation (Kuznetsova et al., 2017). A p-value less than 0.05 was considered significant.

framework has been used previously to correct for batch effects in protein mass spectrometry data (Carlyle et al., 2017). The resulting values are referred to as log2-normalized iBAQ (niBAQ) units. The relationship between protein expression variance and its average expression was fitted using a log-log cubic smoothing spline with four degrees of freedom; 37 proteins with a higher average expression than a log2 niBAQ value of 1.0 and a higher variance than the fit (Figure 4A)

expressed genes were removed from the count matrix before read count normalization using SCnorm (Bacher et al., 2017). The CellTrails R package (10.18129/B9.bioc.CellTrails) was then utilized following the strategy described in our previous study. The variable trajFeatureNames was set to the 182 previously used assay genes with the addition of AGR3, AK1, GPX2, and TMSB4X, which restricted the analysis to 186 genes (Table 3).

In all cases, samples were biological replicates; none of the biological samples were split to be run separately as multiple technical replicates.

Figure 1. B-C

Tables

Table 1. Summary statistics with confidence intervals for Fig. 2C