Diatom pyrenoids are encased in a protein shell that enables efficient CO2 fixation

Pyrenoids are subcompartments of algal chloroplasts that concentrate Rubisco enzymes and their CO2 substrate, thereby increasing the efficiency of carbon fixation. Diatoms perform up to 20% of global CO2 fixation, but their pyrenoids remain poorly characterized at a molecular level. Here, we used in vivo photo-crosslinking to catalogue components of diatom pyrenoids and identified a pyrenoid shell (PyShell) protein, which we localized to the pyrenoid periphery of both the pennate diatom Phaeodactylum tricornutum and the centric diatom Thalassiosira pseudonana. In situ cryo-electron tomography (cryo-ET) revealed that the pyrenoids of both diatom species are encased in a lattice-like protein sheath. Disruption of PyShell expression in T. pseudonana resulted in the absence of this protein sheath, altered pyrenoid morphology, and a high-CO2-requiring phenotype, with impaired growth and reduced carbon fixation efficiency under standard atmospheric conditions. Pyrenoids in mutant cells were fragmented and lacked the thylakoid membranes that normally traverse the Rubisco matrix, demonstrating that the PyShell plays a guiding role in establishing pyrenoid architecture. Recombinant PyShell proteins self-assembled into helical tubes, enabling us to determine a 3.0 Å-resolution PyShell structure. We then fit this in vitro structure into an in situ subtomogram average of the pyrenoid's protein sheath, yielding a putative atomic model of the PyShell within diatom cells. The structure and function of the diatom PyShell provide a new molecular view of how CO2 is assimilated in the ocean, a crucial biome on the front lines of climate change.


INTRODUCTION
Diatoms are one of the most dominant groups of phytoplankton in the ocean. They are responsible for 15%-20% of annual global primary production, 1,2 powering the Earth's carbon cycle and feeding energy into vast marine food webs. Despite their importance, the underlying molecular mechanisms that enable diatoms to efficiently assimilate CO2 remain poorly understood. Diverse clades of eukaryotic algae, including diatoms, rely on a biophysical CO2-concentrating mechanism (CCM) to thrive in CO2-limited aquatic environments. Algal CCMs use HCO3− transporters to accumulate dissolved inorganic carbon (DIC) in the chloroplast and then use carbonic anhydrases (CAs) to convert this HCO3− into a high local concentration of CO2 in the pyrenoid, a chloroplast subcompartment packed with the carbon-fixing enzyme ribulose 1,5-bisphosphate carboxylase/oxygenase (Rubisco). […][5][6] Pyrenoids are a general feature of algal CCMs. However, these chloroplast subcompartments have convergently evolved numerous times and exhibit a wide variety of morphologies, [7][8][9] indicating that pyrenoids in different clades may have distinct components and specialized mechanisms. To date, only the pyrenoid of the freshwater green alga Chlamydomonas reinhardtii has been characterized in molecular detail. […][26]
Several proteins have been localized to the pyrenoid of the marine diatom Phaeodactylum tricornutum. In addition to Rubisco, this pyrenoid contains β-type CAs, 27 fructose 1,6-bisphosphate aldolases (FBAs), 28 and a θ-type CA specifically localized in the lumen of the thylakoids at the center of the pyrenoid. 6,29 Although these observations strongly suggest that diatom pyrenoids increase CO2 concentration around Rubisco in a similar fashion to green algae, the pyrenoid components in P. tricornutum have distinct origins, having arisen from endosymbiotic red algae, stramenopile host cells, or diatom-specific bacterial gene transfer. 28,30,31 In other words, the pyrenoids of diatoms and green algae may have convergently evolved similar functions from a different set of proteins.
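Why seawater is "CO2-limited" even though DIC is plentiful follows from carbonate equilibrium chemistry. The sketch below computes how DIC partitions among CO2, HCO3−, and CO3²− at a given pH. The apparent seawater dissociation constants (pK1 ≈ 5.9, pK2 ≈ 9.0) and the pH of 8.1 are approximate values assumed for illustration. Real values depend on temperature, salinity, and the chosen constant set.

```python
"""Equilibrium speciation of dissolved inorganic carbon (DIC).

Illustrative sketch only: pK1 and pK2 are approximate apparent constants
assumed for surface seawater; they vary with temperature and salinity.
"""


def dic_fractions(pH: float, pK1: float = 5.9, pK2: float = 9.0) -> dict[str, float]:
    """Return the fraction of DIC present as CO2, HCO3-, and CO3(2-)."""
    h = 10.0 ** -pH
    k1 = 10.0 ** -pK1
    k2 = 10.0 ** -pK2
    denom = h * h + k1 * h + k1 * k2
    return {
        "CO2": h * h / denom,
        "HCO3-": k1 * h / denom,
        "CO3--": k1 * k2 / denom,
    }


def co2_micromolar(dic_mM: float, pH: float, **pK: float) -> float:
    """Equilibrium dissolved CO2 (in µM) for a given DIC concentration (in mM)."""
    return dic_mM * 1000.0 * dic_fractions(pH, **pK)["CO2"]


if __name__ == "__main__":
    for species, fraction in dic_fractions(8.1).items():
        print(f"{species:>6}: {fraction:.3%}")
    print(f"CO2 at 2 mM DIC: {co2_micromolar(2.0, 8.1):.1f} µM")
```

With these assumed constants, most DIC is bicarbonate and well under 1% is dissolved CO2, which is on the order of 10 µM CO2 at ~2 mM DIC. HCO3− transporters and CAs exploit exactly this imbalance.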
In this study, we identify and characterize a distinct component of diatom pyrenoids that is not present in C. reinhardtii: the pyrenoid shell (PyShell). This proteinaceous sheath tightly encases the Rubisco matrix, is required for establishing pyrenoid architecture, and is essential for efficient photosynthesis and cell growth. We directly observe PyShells in both pennate diatoms (P. tricornutum) and centric diatoms (Thalassiosira pseudonana), while bioinformatic analysis suggests that PyShells are common in several clades of marine algae and, thus, likely play a major role in driving the ocean's carbon cycle.

Identification and localization of diatom PyShell proteins
To identify components of diatom pyrenoids, we performed in vivo photo-crosslinking, 32 then disrupted the cells and looked for proteins that co-migrated with the Rubisco large subunit (RbcL) by sucrose density gradient centrifugation and SDS-PAGE (diagrammed in Figure 1A). P. tricornutum cells were fed with L-photo-leucine and L-photo-methionine, synthetic amino acid derivatives with diazirine rings in their side chains. Because they are structurally similar to natural amino acids, these photoreactive amino acids (pAAs) are taken up by the cells and incorporated during protein synthesis. 33 These P. tricornutum cells were then irradiated with UV light, causing the pAAs to form reactive carbenes that enable "zero-distance" photo-crosslinking with directly interacting proteins. Crude extracts from the photo-crosslinked cells were separated by SDS-PAGE and immunoblotted for RbcL ("procedure A," Figure 1B). Only in the sample treated with both pAAs and UV irradiation did we observe an RbcL-positive band trapped in the stacking gel, which we analyzed by liquid chromatography-tandem mass spectrometry (LC-MS/MS). In an alternative approach, the crude extracts were separated by sucrose density gradient centrifugation followed by SDS-PAGE ("procedure B," Figure 1C). The sample treated with both pAAs and UV irradiation showed RbcL bands in denser sucrose fractions, which we subjected to LC-MS/MS analysis.
From these two procedures, we identified more than 100 candidates for P. tricornutum pyrenoid proteins. We then filtered this list for the presence of the stramenopile-specific plastid-targeting sequence (ER+ASAFAP) at the N terminus, 34 yielding 22 and 36 candidate chloroplast proteins from procedures A and B, respectively (Tables S1A and S1B). Eleven of the candidate proteins were identified in both procedures (Figure S1A). In addition to known pyrenoid proteins such as Rubisco, β-CAs, 27 and FBAs (C1 and C5), 28 we identified several new candidates, including an abundant protein of unknown function encoded by the gene Pt45465. We chose to study this uncharacterized protein and its homologs in the model diatoms P. tricornutum and T. pseudonana.
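As a rough illustration of the filtering step, the sketch below flags sequences that have a hydrophobic N-terminal stretch followed by an ASAFAP-like motif. The hydrophobic stretch is a crude stand-in for the ER signal peptide. The regular expression `A.AF.P`, the residue windows, and the run-length threshold are hypothetical simplifications. In practice, dedicated predictors (for example, SignalP combined with ASAFind) would be used. The study's exact criterion is not reproduced here.

```python
"""Crude screen for stramenopile-type plastid-targeting presequences.

Hypothetical simplification: a hydrophobic run near the N terminus stands in
for the ER signal peptide, and the regex A.AF.P stands in for the ASAFAP motif.
"""
import re

HYDROPHOBIC = set("AILMFVWC")
ASAFAP_LIKE = re.compile(r"A.AF.P")


def longest_hydrophobic_run(segment: str) -> int:
    """Length of the longest stretch of consecutive hydrophobic residues."""
    best = run = 0
    for residue in segment:
        run = run + 1 if residue in HYDROPHOBIC else 0
        best = max(best, run)
    return best


def looks_plastid_targeted(seq: str, sp_window: int = 35, min_run: int = 8,
                           motif_window: tuple[int, int] = (15, 80)) -> bool:
    """True if seq has a signal-peptide-like core and an ASAFAP-like motif."""
    seq = seq.upper()
    has_signal_core = longest_hydrophobic_run(seq[:sp_window]) >= min_run
    start, end = motif_window
    has_motif = ASAFAP_LIKE.search(seq[start:end]) is not None
    return has_signal_core and has_motif


def filter_candidates(candidates: dict[str, str]) -> list[str]:
    """Return the IDs of candidate proteins that pass the crude screen."""
    return [name for name, seq in candidates.items() if looks_plastid_targeted(seq)]
```

A real pipeline would replace both heuristics with trained predictors. The structure (screen each mass-spectrometry hit, keep the IDs that pass) stays the same.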
The P. tricornutum gene Pt45465 (PtPyShell1a) is located on chromosome 7 together with a paralog, Pt45466 (PtPyShell2a), with which it shares 74% identity. Both genes are duplicated on chromosome 28: Pt50215 (PtPyShell1b) and Pt50214 (PtPyShell2b). PtPyShell2 was also detected as a putative pyrenoid protein in procedure A (Table S1A). Analysis by quantitative PCR (qPCR) showed that PtPyShell1a/b and PtPyShell2a/b are expressed in wild-type (WT) P. tricornutum cells grown under normal atmospheric CO2 (0.04%, hereafter "LC" for "low CO2") and high CO2 (1%, hereafter "HC") (Figure S1D). All PtPyShell proteins harbor stramenopile-type plastid-targeting sequences. These proteins contain no transmembrane helices and thus are likely localized to the stroma. We defined two conserved regions (CR1 and CR2) in the PyShell proteins (Figures S1B and S1C), which are composed of >50% hydrophobic amino acids. Using CR1 and CR2 as reference sequences, we searched for candidate PyShell genes and found homologs primarily in diatoms and haptophytes, but also in a few marine algae from other clades (Figure S1F).
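Two statistics in this paragraph, pairwise sequence identity and the fraction of hydrophobic residues in a conserved region, can be computed as sketched below. This uses two assumptions not taken from the study. First, identity is computed over aligned columns where neither sequence has a gap. Second, the hydrophobic residue set is one common convention among several.

```python
"""Two quick sequence statistics: pairwise identity and hydrophobic fraction.

Assumptions (not taken from the study): identity is computed over aligned
columns where neither sequence has a gap, and the hydrophobic set below is
one common convention among several.
"""

HYDROPHOBIC = frozenset("AVILMFWYC")


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues over gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("alignment has no gap-free columns")
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


def hydrophobic_fraction(region: str) -> float:
    """Fraction of residues in region (gaps ignored) that are hydrophobic."""
    residues = [r for r in region.upper() if r != "-"]
    if not residues:
        raise ValueError("region contains no residues")
    return sum(r in HYDROPHOBIC for r in residues) / len(residues)
```

A region fits the ">50% hydrophobic" description when `hydrophobic_fraction(region) > 0.5`. Because the hydrophobic set is a convention, the exact percentage depends on which residues are counted.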
We next checked the subcellular localization of PyShell proteins in P. tricornutum and T. pseudonana by fluorescence microscopy (Figures 1D and S2). We generated strains of these two diatom species expressing PtPyShell1a, PtPyShell2a, TpPyShell1, TpPyShell2, or TpPyShell3 fused to a C-terminal green fluorescent protein (GFP) tag. In P. tricornutum, PtPyShell1a:GFP and PtPyShell2a:GFP signals were clearly detected in a "hollow rod" shape at the center of the chloroplast, where chlorophyll autofluorescence was dim, suggesting localization to the pyrenoid, perhaps surrounding the Rubisco matrix (Figures 1D, S2A, and S2B). In T. pseudonana, we similarly observed TpPyShell1:GFP, TpPyShell2:GFP, and TpPyShell3:GFP signals in a rod shape at the center of the chloroplast (Figures 1D and S2B). We further analyzed P. tricornutum by immunoelectron microscopy, with anti-GFP nanogold labeling confirming that PtPyShell1a:GFP accumulated along the peripheral regions of the pyrenoid (Figure 1E).

Molecular architecture of the PyShell lattice inside native diatom cells
[…][37] P. tricornutum and T. pseudonana cells were vitreously plunge-frozen on EM grids, thinned with a focused ion beam (FIB), 38,39 and imaged in three dimensions with cryo-ET. We observed that the pyrenoids of both diatom species are surrounded by a proteinaceous sheath, which tightly encloses the Rubisco matrix (Figures 2A-2D and S3). Closer inspection of these sheaths revealed that they are apparently formed from a repetitive lattice of protein subunits (Figures 2E, 2F, S3C, and S3G). We hereafter refer to this pyrenoid-encapsulating shell as "the PyShell"; its location is consistent with our observations of GFP-tagged PyShell proteins in P. tricornutum and T. pseudonana, while our structural and mutational analyses described later in this study definitively implicate PyShell proteins in sheath formation.
The native cellular views provided by cryo-ET revealed some species-specific differences in pyrenoid architecture. In P. tricornutum, the PyShell is relatively flat and straight. Two specialized thylakoids penetrate the Rubisco matrix and run the length of the pyrenoid (Figures 2A, 2G, and S3B). The luminal space of these traversing thylakoids is swollen and sometimes filled with dense particles (red arrowheads). At the two ends of the pyrenoid, the PyShell closely associates with these two traversing thylakoids, which exit the pyrenoid and connect to the rest of the thylakoid network (Figures 2A, 2B, 2G, and S3A). In T. pseudonana, the PyShell has more regions of high local curvature. This pyrenoid is also bisected by one or two specialized thylakoids (Figures 2C, 2D, 2H, 2I, and S3F) that frequently contain dense particles in their lumen; however, we never observed these thylakoids exiting the pyrenoid. Instead, at the two ends of the pyrenoid, the PyShell interacts with itself like a zipper to seal the Rubisco matrix (Figures 2C, 2D, 2H, and S3E). Due to the limited cell area visualized by cryo-ET, we cannot definitively conclude that the pyrenoid-traversing thylakoids in T. pseudonana are disconnected from the rest of the thylakoid network. However, if such connection sites exist, they are much rarer than in P. tricornutum. We observed enough cells in asynchronous cultures to conclude that these architectural differences are characteristic of the two diatom species and not due to cell division.
In our raw tomograms, the PyShell showed different features depending on its orientation. When observed in cross-section, it resembled a solid line, a chain of dots, or a zig-zag (Figures 2I right, S3C, and S3G). However, when the PyShell twisted 90° to show its face, we could observe a lattice of subunits producing clear stripe patterns (Figures 2I middle, S3C, and S3G). To understand the three-dimensional (3D) structure of the PyShell, we performed subtomogram averaging (STA) 40,41 of subvolumes picked along PyShell sheaths from our highest-quality tomograms of T. pseudonana. After iterative alignment and classification steps, we ultimately resolved a ~20-Å density map of the native T. pseudonana PyShell using 14,341 subvolumes from seven tomograms (Figures 2J and S4). The STA density contains the stripe and zig-zag features seen in the raw tomograms and reveals the 3D architecture of a tightly packed pseudocrystalline protein lattice.
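Subtomogram averaging rests on a simple idea. Many noisy copies of the same structure, once brought into register, average toward the underlying signal while the noise cancels. The toy sketch below shows this in one dimension with pure Python. It aligns noisy, circularly shifted copies of a hypothetical 1D profile by cross-correlation, then iterates alignment and averaging. It omits everything that makes real STA hard, including 3D rotations, missing-wedge compensation, classification, and reference-bias safeguards. It is not the software pipeline used in the study.

```python
"""Toy 1D analogue of iterative alignment and averaging (as in STA)."""
import math
import random


def circular_shift(signal: list[float], k: int) -> list[float]:
    """Shift signal right by k samples with wrap-around."""
    n = len(signal)
    return [signal[(i - k) % n] for i in range(n)]


def pearson(a: list[float], b: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length signals."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def best_shift(reference: list[float], particle: list[float]) -> int:
    """Shift of particle that maximizes its cross-correlation with reference."""
    def score(k: int) -> float:
        shifted = circular_shift(particle, k)
        return sum(r * s for r, s in zip(reference, shifted))
    return max(range(len(reference)), key=score)


def align_and_average(particles, reference, iterations=3):
    """Alternate between aligning particles to the reference and re-averaging."""
    for _ in range(iterations):
        aligned = [circular_shift(p, best_shift(reference, p)) for p in particles]
        reference = [sum(column) / len(aligned) for column in zip(*aligned)]
    return reference


def simulate_particles(template, count, noise_sd, seed=0):
    """Randomly shifted copies of template with additive Gaussian noise."""
    rng = random.Random(seed)
    n = len(template)
    return [
        [v + rng.gauss(0.0, noise_sd) for v in circular_shift(template, rng.randrange(n))]
        for _ in range(count)
    ]
```

Starting from a single noisy particle as the reference, a few rounds of `align_and_average` recover a profile that correlates closely with the hidden template, up to an overall shift. This mirrors, in miniature, how thousands of PyShell subvolumes yield a cleaner map than any single tomogram.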

High-resolution in vitro structure of the T. pseudonana PyShell lattice
We required even higher resolution to determine precisely how individual PyShell proteins assemble to form a tight protein lattice. Therefore, we reconstituted the PyShell in vitro and performed single-particle cryo-electron microscopy (cryo-EM). We expressed and purified recombinant TpPyShell1, which, when concentrated in vitro to 2 mg/mL, self-assembled to form both flat sheets and hollow tubes with an outer diameter of ~32 nm (Figures S5A-S5C). Following cryo-EM imaging and particle picking along the tubes, we used single-particle analysis (SPA) and helical reconstruction to attain a 2.4-Å density map (Figures 3A, S5E, and S5F), enabling us to build an atomic model of TpPyShell1 assembled in a homo-oligomeric lattice (Figures 3B-3E and S5G).
The in vitro tube is an assembly of TpPyShell1 proteins in two alternating poses: half of the monomers face inward, while the other half face outward with a 90° in-plane rotation relative to the inward-facing monomers (Figures 3A, 3C, and 3D). Thus, the minimal building block of the tube is a homodimer of TpPyShell1 proteins adopting these two poses (Figure 3E).
Each TpPyShell1 monomer contains 16 β-strands arranged in two parallel β-sheets, one slightly more extended than the other (Figure 3B, teal and purple for more- and less-extended sheets, respectively). The TpPyShell1 monomer has an internal pseudo-2-fold symmetry that subdivides the two β-sheets into the two conserved regions that we previously identified by sequence homology: CR1 and CR2 (Figure S5H). A short α-helix spanning residues 169-181 (Figures 3B and S5H, pink) is positioned along the wall of the less-extended β-sheet and connects CR1 with CR2. Our TpPyShell1 model starts at Trp69 because the N-terminal domain is poorly resolved and likely flexible; a second short α-helix appears to be present in this region (approximately spanning residues 64-72) but could not be clearly modeled. The C-terminal domain extends from one monomer and inserts its positively charged Arg292 residue into a negatively charged pocket on the adjacent monomer of the opposite pose, possibly providing a stabilizing interaction for the lattice (Figures 3E and S5J). We also observed small ~1 nm gaps in the lattice at the junctions between four monomers; the residues surrounding these gaps do not have obvious surface charge or hydrophobic properties (Figure S5I).
Figure 2 legend (panels G-K): (G and H) Comparison of pyrenoid ends. In P. tricornutum, there is a gap in the PyShell that allows entry of two specialized thylakoids into the pyrenoid. In T. pseudonana, two apposing sheets of the PyShell bind each other to seal the pyrenoid matrix. Scale bars: 50 nm. (I) Molecular details of the PyShell in raw tomograms. Left: overview revealing a stripe pattern when the PyShell twists to show its surface view (red arrowhead: particles inside the lumen of traversing thylakoid). Center: zoom-in on the surface view, with the major stripes of the PyShell lattice marked with yellow arrowheads. Right: zoom-in on a cross-section view, showing an apparent lattice of dimers. Scale bars: 50 nm in left, 10 nm in middle, 5 nm in right. (J and K) Subtomogram average (STA) of the PyShell from T. pseudonana, displayed in 3D isosurface view (J), as well as 2D slices (K) showing the surface view (yellow arrowheads: major stripes of lattice) and cross-section view. Scale bars: 10 nm in left, 5 nm in right. See Figure S3 for additional cryo-ET images from both species. See also Figures S3 and S4 and Videos S1 and S2.
Figure 3 legend (panel E): Surface model representation of a homodimer unit from the lattice. The C-term extends and contacts a putative pocket in the adjacent monomer (see also Figure S5J).
We next compared the lattice architecture of in vitro TpPyShell1 to the architecture of the PyShell sheath from inside T. pseudonana cells (Figures 3F and S6). To do so, we unrolled the high-resolution SPA map of the TpPyShell1 tube to match the lateral subunit periodicity that we measured from TpPyShell1 flat sheets (Figures S5C and S6A-S6E). The unrolled SPA map was then filtered to match the 20-Å resolution of the in situ STA map (Figures 3F and S6E). The two maps were overlaid and scored for correlation with an in-plane rotational search, and an optimal fit was determined from the scores and visual inspection (Figures S6F and S6G). The two 20-Å maps are not identical (perhaps reflecting additional structural heterogeneity in situ), but their overall features align well. From this analysis, we conclude that the SPA density of the in vitro TpPyShell1 lattice shows good correspondence with the STA density of the native PyShell sheath when compared at 20-Å resolution. This helps confirm that the pyrenoid sheath we observe inside diatom cells is indeed composed of PyShell proteins and also indicates that the overall subunit architecture of the in vitro TpPyShell1 lattice is likely physiologically relevant.
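Two operations in this comparison can be sketched in a few lines of NumPy for 2D images. The first is low-pass filtering a map to a target resolution. The second is scoring its overlap with another map over a range of in-plane rotations. The sketch assumes a sharp Fourier cutoff, nearest-neighbour rotation about the image centre, and normalized cross-correlation as the score. These choices and the function names are illustrative, not the study's processing pipeline. That pipeline also involved unrolling the helical map and visual inspection, neither of which is modeled here.

```python
"""Low-pass filtering and in-plane rotational correlation search (2D sketch)."""
import numpy as np


def lowpass(image: np.ndarray, pixel_size_A: float, resolution_A: float) -> np.ndarray:
    """Remove spatial frequencies finer than resolution_A with a sharp cutoff."""
    ny, nx = image.shape
    fy = np.fft.fftfreq(ny, d=pixel_size_A)  # cycles per ångström
    fx = np.fft.fftfreq(nx, d=pixel_size_A)
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    keep = radius <= 1.0 / resolution_A
    return np.real(np.fft.ifft2(np.fft.fft2(image) * keep))


def rotate_nearest(image: np.ndarray, angle_deg: float) -> np.ndarray:
    """Rotate about the image centre using nearest-neighbour sampling."""
    ny, nx = image.shape
    cy, cx = (ny - 1) / 2.0, (nx - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    yy, xx = np.mgrid[0:ny, 0:nx]
    # Inverse mapping: find the source pixel for every output pixel.
    src_y = cy + (yy - cy) * np.cos(theta) - (xx - cx) * np.sin(theta)
    src_x = cx + (yy - cy) * np.sin(theta) + (xx - cx) * np.cos(theta)
    src_y = np.rint(src_y).astype(int)
    src_x = np.rint(src_x).astype(int)
    inside = (src_y >= 0) & (src_y < ny) & (src_x >= 0) & (src_x < nx)
    out = np.zeros_like(image)
    out[inside] = image[src_y[inside], src_x[inside]]
    return out


def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized cross-correlation of two equally sized images."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))


def rotational_search(reference, moving, angles):
    """Return (best_score, best_angle) over the candidate in-plane angles."""
    return max((ncc(reference, rotate_nearest(moving, angle)), angle) for angle in angles)
```

In use, both maps would first be passed through `lowpass` at the same target resolution (here, 20 Å), and `rotational_search` would then report the angle with the highest correlation. As in the study, a score alone does not replace visual inspection of the overlay.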
To better visualize the molecular architecture of a flat TpPyShell1 lattice, we fit models of the TpPyShell1 monomers into the full-resolution unrolled SPA map. Cross-section views through the flat lattice model (Figure 3G) show how the more-extended β-sheets (teal) align to form a continuous wall at the center of the PyShell. The less-extended β-sheets (purple) protrude inward and outward, positioning the short α-helices (pink) on each lobe as the most distant domain from the central wall of the PyShell lattice. Although this simplified model using only TpPyShell1 is consistent with the native PyShell architecture at the modest 20-Å resolution of our in situ average, our localization of GFP-tagged PyShell proteins (Figure S2) suggests that the native PyShell lattice is likely more heterogeneous and composed of multiple PyShell isoforms, which have similar structures that would require a much higher-resolution in situ STA map to distinguish. Other factors that might bind the native PyShell lattice asymmetrically or substoichiometrically are also not resolved in our STA map (see "limitations of the study").

PyShell mutants have altered pyrenoid morphology and reduced CO2 fixation
To understand the physiological role of the PyShell in vivo, we performed simultaneous gene disruptions of TpPyShell1 and TpPyShell2 in T. pseudonana using a CRISPR-Cas9 (D10A) nickase approach that we recently developed for diatom gene editing. 42 Because these two genes share high sequence identity (92%), we were able to design a single set of guide RNAs specifically targeting the CR1 domain of both genes (Figures S7A and S7B). Two independent biallelic double-knockout mutants were successfully obtained, denoted ΔTpPyShell1/2-1 (m1) and ΔTpPyShell1/2-2 (m2) (Figure S7C). Western blotting indicated that m1 and m2 lacked both the TpPyShell1 and TpPyShell2 proteins (Figure 4A). Expression of other PyShell family genes (TpPyShell3, 4, 5, and 6) did not increase to compensate for the loss of TpPyShell1 and 2 (Figure S7D).
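Designing one set of guides for two highly similar paralogs amounts to searching for 20-nt protospacers that occur in both sequences, each followed by an SpCas9 NGG PAM. The sketch below is a hypothetical illustration on toy strings and checks both strands. It does not score off-targets. It also does not enforce the paired, opposite-strand guide geometry that a D10A nickase strategy requires. It is not the design procedure used for the study.

```python
"""Find SpCas9 protospacers (20 nt + NGG PAM) shared by two DNA sequences.

Hypothetical illustration: no off-target scoring and no nickase pairing rules.
"""

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def protospacers(seq: str, length: int = 20) -> set[tuple[str, str]]:
    """Return (strand, protospacer) pairs whose 3' flank is an NGG PAM."""
    seq = seq.upper()
    hits = set()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - length - 2):
            pam = s[i + length:i + length + 3]
            if pam[1:] == "GG":
                hits.add((strand, s[i:i + length]))
    return hits


def shared_protospacers(seq_a: str, seq_b: str, length: int = 20) -> list[tuple[str, str]]:
    """Protospacers (with strand) present in both sequences, sorted."""
    return sorted(protospacers(seq_a, length) & protospacers(seq_b, length))
```

For a nickase design, one would then pick two shared hits on opposite strands a short distance apart. This step is left out here because the study's pairing rules are not described in this section.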
Mutants m1 and m2 both showed severely inhibited growth in normal atmospheric CO2 (LC, Figure 4B). Compared with WT, the mutants had a longer lag phase and a slower growth rate, and took twice as long to reach stationary phase. In contrast, the growth profiles of WT, m1, and m2 were similar when supplemented with 1% CO2 (HC, Figure 4C). This indicates that disruption of the major PyShell genes in T. pseudonana gives a clear HC-requiring phenotype, presumably because of an impaired CCM. To test this hypothesis, we analyzed the photosynthetic affinity of WT and mutant cells for DIC by measuring the net O2 evolution rate at increasing DIC concentrations (Figures 4D and S7F). Whereas WT cells reached their maximum photosynthetic rate (Pmax) at <0.5 mM [DIC], the mutants m1 and m2 required >10 mM [DIC] to reach their Pmax. Other photosynthetic parameters, including the DIC compensation point and apparent photosynthetic conductance, also indicate highly impaired photosynthetic activity in these mutants (Table S2): they cannot efficiently provide CO2 to Rubisco in seawater with less than 10 mM [DIC]. Currently, the average [DIC] near the ocean surface is ~2 mM. 43
To investigate how this HC-requiring mutant phenotype is related to pyrenoid morphology, we reconstructed the 3D architecture of WT, m1, and m2 cells using FIB scanning electron microscopy (FIB-SEM). Cells were cryo-fixed at high pressure to improve sample preservation, then subjected to freeze substitution and resin embedding. Following FIB-SEM imaging (Figure 4E), we used 3D segmentation to quantify the volumes and shapes of chloroplast regions (Figures 4F-4J). In WT T. pseudonana, pyrenoids consisted of a single elongated compartment occupying around 10% of the chloroplast volume, similar to our previous measurements of P. tricornutum. 9 Conversely, m1 and m2 chloroplasts contained multiple ellipsoid pyrenoid-like structures of heterogeneous size and higher sphericity than WT, indicating that removal of the PyShell causes fragmentation of the Rubisco matrix and loss of its normal elongated architecture. Furthermore, the total measured pyrenoid volume decreased to <5% of the chloroplast volume in the mutants (Figure 4J), suggesting that a portion of the Rubisco may be dispersed in the stroma or present as small aggregates not detectable by our FIB-SEM imaging.
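The DIC-response comparison above can be made concrete with a simple saturating model, P = Pmax·[DIC]/(K0.5 + [DIC]). The sketch below assumes that hyperbolic form. The study's own fitting procedure, and its compensation-point and conductance parameters in Table S2, are not reproduced. The sketch fits Pmax and K0.5 by a Hanes-Woolf linearization and computes the [DIC] needed to reach a chosen fraction of Pmax. Any numbers fed to it here are synthetic, not measurements from the study.

```python
"""Hyperbolic fit of photosynthetic rate versus DIC (illustrative sketch).

Assumed model: P = Pmax * C / (K + C), with C = [DIC] in mM. A DIC compensation
point (net O2 evolution = 0 at C > 0) is not represented by this form.
"""


def fit_hanes_woolf(dic_mM: list[float], rates: list[float]) -> tuple[float, float]:
    """Fit (Pmax, K0.5) by linear regression of C/P against C."""
    xs = list(dic_mM)
    ys = [c / p for c, p in zip(xs, rates)]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals 1 / Pmax
    intercept = mean_y - slope * mean_x  # equals K0.5 / Pmax
    p_max = 1.0 / slope
    return p_max, intercept * p_max


def dic_for_fraction(k_half: float, fraction: float) -> float:
    """[DIC] at which P reaches the given fraction of Pmax under the model."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return k_half * fraction / (1.0 - fraction)
```

Under this model, the [DIC] needed to reach any fixed fraction of Pmax scales linearly with K0.5 (for example, 90% of Pmax requires 9·K0.5). The shift from <0.5 mM to >10 mM [DIC] for saturation therefore reflects a correspondingly large loss of apparent affinity for DIC.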

We next performed in situ cryo-ET of m1 and m2 cells to gain a molecular-resolution view of the mutant pyrenoids (Figures 4K-4R). Consistent with the FIB-SEM imaging, we observed pyrenoid-like aggregates of Rubisco matrix that were rounder than WT pyrenoids. With the resolution afforded by cryo-ET, we confirmed that these mutant pyrenoids were not encased in a PyShell sheath. Nevertheless, the cohesion of the Rubisco matrix was maintained without the PyShell, consistent with a recent report of a linker protein that mediates phase separation of Rubisco in diatoms. 44 The rounder shape of the mutant pyrenoids and the clear boundary between the Rubisco matrix and the stroma (Figures 4O, 4P, and S7E) are also consistent with phase separation. Strikingly, the specialized thylakoids with luminal particles that normally traverse the long axis of WT pyrenoids were strongly disturbed in the mutants. Thylakoids containing luminal particles were frequently seen in the peripheral regions of mutant pyrenoids (Figures 4Q, 4R, and S7E) but often appeared fragmented and did not pass through the matrix center (Figures 4K-4N). Therefore, the PyShell seems to play important roles in maintaining the elongated shape and singular cohesiveness of T. pseudonana pyrenoids, as well as in helping position the specialized thylakoids on an end-to-end trajectory bisecting the Rubisco matrix.
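The "higher sphericity" and "rounder" descriptions of mutant pyrenoids can be quantified with the standard Wadell sphericity, Ψ = π^(1/3)(6V)^(2/3)/A. This equals 1 for a sphere and decreases for elongated or irregular shapes. The sketch below assumes that volume V and surface area A come from a segmented 3D mesh; how the study derived them is not specified in this section. It also computes the pyrenoid fraction of chloroplast volume. The cylinder helper is a hypothetical geometric stand-in for an elongated pyrenoid, not a measured shape.

```python
"""Shape and volume metrics for segmented pyrenoids (illustrative sketch)."""
import math


def sphericity(volume: float, surface_area: float) -> float:
    """Wadell sphericity: area of an equal-volume sphere divided by the object's area."""
    return math.pi ** (1.0 / 3.0) * (6.0 * volume) ** (2.0 / 3.0) / surface_area


def cylinder_metrics(radius: float, length: float) -> tuple[float, float]:
    """Volume and surface area of a closed cylinder (hypothetical elongated stand-in)."""
    volume = math.pi * radius ** 2 * length
    area = 2.0 * math.pi * radius ** 2 + 2.0 * math.pi * radius * length
    return volume, area


def volume_fraction(pyrenoid_volumes: list[float], chloroplast_volume: float) -> float:
    """Total pyrenoid volume as a fraction of chloroplast volume."""
    return sum(pyrenoid_volumes) / chloroplast_volume
```

For example, a closed cylinder ten times longer than its radius has Ψ ≈ 0.70, so fragmentation of an elongated matrix into near-spherical bodies would push Ψ toward 1. Summing the volumes of all fragments with `volume_fraction` gives the kind of whole-chloroplast percentage compared between WT and mutants.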

DISCUSSION
In this study, we discovered the PyShell, a protein sheath that encases the pyrenoids of P. tricornutum and T. pseudonana, model species of the pennate and centric diatom clades, respectively (Figure 1). We characterized the structure of the T. pseudonana PyShell lattice across scales, from near-atomic resolution in vitro to molecular resolution within native diatom cells (Figures 2 and 3). Our functional analysis of T. pseudonana PyShell deletion mutants showed that the PyShell sheath maintains pyrenoid architecture and is essential for efficient performance of the CCM and, thereby, the ability of diatoms to grow by assimilating environmental CO2 (Figure 4).
How does the PyShell contribute so significantly to diatom CCM function? Comparison with the well-characterized pyrenoid of the green alga C. reinhardtii may help provide some insight. Reaction-diffusion modeling of C. reinhardtii 45 suggests that all pyrenoid-based CCMs require the following essential features: (1) aggregation of most of the chloroplast's Rubisco enzymes, (2) a local source of high CO2 concentration at the center of this Rubisco aggregate, and (3) a diffusion barrier at the aggregate border to prevent CO2 leakage. Our data indicate that the PyShell contributes to the first two essential pyrenoid features (Figure 5A), and we speculate that the PyShell may also directly perform the third (Figure 5B).
In C. reinhardtii, the molecular mechanism underlying aggregation of pyrenoid Rubisco (essential feature #1) has been well characterized. […][48] Evidence is now mounting that the Rubisco matrix of diatom pyrenoids also forms by phase separation. Recently, the multivalent linker protein PYCO1 was shown to localize to the P. tricornutum pyrenoid matrix and phase separate with Rubisco in vitro. 44 Furthermore, our FIB-SEM and cryo-ET imaging of the T. pseudonana ΔTpPyShell1/2 mutants (m1 and m2) shows that Rubisco continues to aggregate in the absence of the PyShell, forming rounder bodies with clear boundaries between Rubisco and stroma that are consistent with phase-separated condensates (Figures 4E, 4F, 4H, 4K-4P, and S7E). The PyShell is thus not required for phase separation of Rubisco. However, it may still play a role in concentrating the majority of Rubisco at one location. In the m1 and m2 mutants, Rubisco forms multiple smaller condensates throughout the chloroplast, and the combined volume of these dispersed condensates is less than that of a single WT pyrenoid, suggesting that some Rubisco may not enter the condensed phase (Figures 4E-4G, 4I, and 4J). We speculate that a direct or indirect interaction between the PyShell and Rubisco helps concentrate all the condensed Rubisco into the pyrenoid matrix. Indeed, in some cryo-ET images, a single layer of ordered Rubisco appears to line up along the PyShell lattice (Figures 2F, S3D, and S3H). The precise mechanism of this direct or indirect PyShell-Rubisco interaction requires further investigation.
C. reinhardtii and diatoms use a common strategy to produce a local source of high CO2 concentration at the center of the pyrenoid's Rubisco matrix (essential feature #2). Both algae have thylakoid-derived membrane systems that cross the center of the pyrenoids, taking the shape of cylindrical tubules in C. reinhardtii 12,49 and specialized thylakoid sheets in P. tricornutum and T. pseudonana. 25,50 Inside the luminal space of these pyrenoid-traversing tubules and thylakoids, there are CAs (α-type CAH3 in C. reinhardtii, θ-type Ptθ-CA1 in P. tricornutum, and Tpθ-CA2 in T. pseudonana) 6,15,29,51 that convert HCO3− into CO2 at the center of the pyrenoid. This helps increase Rubisco efficiency by providing the enzyme with a high concentration of its CO2 substrate, outcompeting unproductive binding to O2. However, in the T. pseudonana m1 and m2 mutants, the specialized thylakoids are mislocalized to the periphery of the Rubisco condensate and, therefore, cannot provide a source of CO2 at the center of the pyrenoid (Figure 5A). Our cryo-ET observations clearly show that the PyShell is necessary to correctly position these specialized thylakoids along the long axis of the Rubisco matrix (Figures 4K-4N). By constricting the pyrenoid matrix into an elongated compartment surrounding the CA-containing thylakoids, the PyShell minimizes the diffusion distance of CO2 from CA source to Rubisco sink.
Modeling indicates that CCM efficiency is greatly enhanced when a pyrenoid's Rubisco matrix is surrounded by a diffusion barrier to limit CO2 leakage (essential feature #3). In C. reinhardtii, the identity of this barrier is debated, with candidates including a pyrenoid-peripheral CA and the pyrenoid's surrounding starch sheath. 21,45 In diatoms, the PyShell forms a tight protein sheath around the Rubisco matrix through which larger molecules certainly cannot pass. In this way, the PyShell is analogous to the proteinaceous shells of cyanobacterial carboxysomes, 52,53 which similarly form a dense wall around an aggregate of encapsulated Rubisco. […][56][57][58] In this way, HCO3− would diffuse into the carboxysome, where it is converted to CO2 by a CA, trapping a high concentration of CO2 with Rubisco inside the carboxysome shell.
[…][61] Our high-resolution in vitro structure of a homo-oligomerized TpPyShell1 tube revealed ~1 nm gaps in the otherwise densely packed lattice (Figure S5I). However, to understand whether the PyShell functions as a selective diffusion barrier (Figure 5B), detailed investigations are required to probe the PyShell's permeability to small molecules, in particular CO2, O2, and the sugar metabolites ribulose 1,5-bisphosphate (RuBP) and 3-phosphoglyceric acid (3-PGA). These studies must take into consideration that the in vivo PyShell may be heterogeneous and also bound by interacting proteins.
In summary, the m1 and m2 PyShell deletion mutants exhibit such a strong inhibition of photosynthetic efficiency and growth (Figures 4B-4D) because they may have defects in all three essential features of pyrenoid CCMs: Rubisco aggregation into the pyrenoid matrix, a CO2 source at the center of the matrix, and, speculatively, a barrier around the matrix to prevent CO2 leakage. Our physiological measurements (Figures 4D and S7F) show that the PyShell-deficient mutants require ~10 mM [DIC] to saturate their photosynthesis, which is roughly 5-fold higher than the concentration of DIC in the ocean. Thus, the PyShell is likely essential to maintain efficient carbon fixation of diatoms in the wild.
Our bioinformatic analysis indicates that PyShell orthologs are widespread mainly in diatoms and haptophytes (Figure S1F). Although these two clades are not closely related phylogenetically, both have a plastid of red algal origin. Orthologs are also found in some non-diatom stramenopiles, including pelagophytes and dictyochophytes, as well as several species of alveolates. It is plausible that the PyShell originated in the photosynthetic ancestor of haptophytes and the SAR supergroup (stramenopiles, alveolates, and Rhizaria) and, thereafter, evolved independently in each clade. Even in our cryo-ET comparison between two diatom species, we noted substantial differences in PyShell architecture, in particular at the pyrenoid ends (Figures 2A-2D, 2G, 2H, S3A, and S3E). Capturing the variations in PyShell architecture across evolution will require extensive cryo-ET of diverse algae, including non-model species from culture collections or sampled directly from the environment. 62
Diatoms and haptophytes produce immense biomass through the photosynthetic uptake of CO2, which provides an energy source for much of the life in the ocean. To understand the global prevalence of the PyShell, we queried Tara Oceans metagenomic environmental sampling data in the Ocean Gene Atlas 63,64 and found that PyShell genes are broadly detected in marine environments around the world (Figure S1E). The PyShell therefore plays a major role in the CCM of environmentally relevant marine algae, which account for roughly half of carbon fixation in the ocean and, because the ocean contributes roughly half of global primary production, about one quarter of carbon fixation on our planet. To forecast the future of the global carbon cycle, it will be important to understand how well PyShell-mediated carbon fixation can adapt to rapidly accelerating climate change.
Major efforts are underway to engineer cyanobacterial carboxysomes and algal pyrenoids into plants to increase their carbon-fixation capacity. 5,65 It is estimated that introducing such a CCM could increase yield by up to 60%, while reducing water and fertilizer requirements. 66 Recent progress has been made in assembling components of the C. reinhardtii pyrenoid inside Arabidopsis thaliana, including the EPYC1 linker, which causes Rubisco to condense to form a pyrenoid-like matrix within these plant chloroplasts. 67,68 Our discovery and characterization of the diatom PyShell expands the molecular toolbox with a lattice-forming protein that may be able to encapsulate these Rubisco condensates, providing a boundary between the artificial pyrenoid and the surrounding chloroplast. These engineering efforts hold potential for designing crops that grow faster, consume fewer resources, and are more resistant to environmental stress, helping feed the world's growing population in regions of the planet that climate change is rapidly making less arable. Although reducing CO2 emissions is the immediate priority, 69 improving biological carbon capture may even one day help slow the progression of global warming by removing more CO2 from the atmosphere.

PyShell structure
The N-terminal 68 residues of TpPyShell1 were not well resolved in our in vitro SPA map, likely due to flexibility, and were not modeled. The high resolution of the map did reveal a putative electrostatic interaction between the extended C-terminal domain and a pocket on the neighboring PyShell subunit (Figure S5J), but the role of this interaction remains to be tested experimentally. The most puzzling finding from our SPA data is that the inside and outside surfaces of the TpPyShell1 lattice are structurally interchangeable (Figures 3C and 3G). When filtered to 20 Å resolution, the in vitro SPA map corresponds well to the STA map of the native PyShell (Figures 3F and S6). However, additional studies are required to determine whether the two surfaces of the PyShell are equivalent or distinct in vivo. For example, in the pyrenoid, Rubisco binds only the PyShell's inner surface. Our data cannot distinguish whether this is driven by asymmetry of the PyShell surfaces or is instead a consequence of pyrenoid biogenesis. Comparison of the SPA and STA maps reveals some extra density in the STA map at the lobes extending furthest from the sheet. This density could be due to additional proteins binding the native PyShell (Figures 3F and S6F), but we cannot draw confident conclusions about a small region of density at 20 Å resolution. With the cryo-ET dataset acquired for this study, we were unable to improve the STA resolution further for three reasons: the small size of the PyShell monomers, 70 the preferred orientation of particles picked along PyShell sheets, and potential heterogeneity of PyShell proteins within the native lattice.
PyShell heterogeneity
There are multiple homologous PyShell genes in both P. tricornutum and T. pseudonana (Figure S1B). The latter expresses both TpPyShell1 and TpPyShell2 at high levels, along with lower levels of TpPyShell3-6 (Figure S1D). To resolve a high-resolution PyShell structure, we assembled the in vitro tube from TpPyShell1 alone. This also shows that a homogeneous solution of this single protein is sufficient to assemble a lattice. It is quite possible that multiple PyShell isoforms hetero-oligomerize to form the PyShell lattice in vivo. Indeed, we observed that GFP-tagged TpPyShell1, 2, and 3 each localize to the pyrenoid (Figure S2). In addition, a parallel study by Nam et al. in this issue of Cell localized all six TpPyShell proteins to the pyrenoid, with distinct functions suggested for TpPyShell1/2 and TpPyShell4. 71 However, our cryo-ET analysis lacks the resolution to distinguish this heterogeneity because of the high homology and predicted structural similarity of the different PyShell proteins.

PyShell permeability
Our in vitro SPA structure shows small ~1 nm gaps at the junctions between four monomers in the TpPyShell1 lattice (Figures 3C and S5I). These gaps are potentially large enough to allow gases and some small metabolites to pass through the PyShell wall. However, in-depth functional studies would be required to assay PyShell permeability.

Pyrenoid-traversing thylakoids
In cryo-ET of both diatom species, we observed dense particles within the lumen of pyrenoid-traversing thylakoids (Figures 2I, 4Q, 4R, S3B, and S3F). This is the location thought to be occupied by a CA (Ptθ-CA1 and Tpθ-CA2), but we lack the resolution to assign an identity to these luminal particles. Our cryo-ET data of the ΔTpPyShell1/2 mutants do not explain how thylakoids near the Rubisco matrix acquire these luminal particles yet fail to properly bisect the matrix.

Prevalence of the PyShell in other algae
Our bioinformatic analysis indicates that PyShell homologs are commonly found throughout diatoms and haptophytes, and they may also be present in other algal clades (Figure S1F).However, this analysis is biased toward sequenced species, so some clades may be underrepresented.

Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Yusuke Matsuda (yusuke@kwansei.ac.jp).

Materials availability
Diatom strains and plasmids generated in this study are available from the lead contact without restriction.

Diatom Cultures
The marine diatoms P. tricornutum Bohlin (UTEX642) and T. pseudonana (Hustedt) Hasle et Heimdal (CCMP 1335) were cultured axenically and photoautotrophically in artificial seawater medium. The medium contained half-strength Guillard's 'F' solution 101,102 supplemented with 10 nM sodium selenite. Cultures were kept under continuous light (20 °C, 40 μmol photons m−2 s−1, fluorescent lamp). The cultures were aerated with ambient air (0.04% CO2) or 1% CO2 gas for low-CO2 (LC) or high-CO2 (HC) conditions, respectively. For the culture of T. pseudonana, the NaCl concentration in the medium was lowered to 270 mM. Diatom cells were first inoculated from plates into 50 mL Erlenmeyer flasks with gentle agitation. Cultures were then transferred to 100 mL test tubes aerated (300 mL min−1) with normal atmospheric air or air enriched with 1% (v/v) CO2. The mutant cultures m1 and m2 were grown under the same conditions but supplemented with 5 mM bicarbonate.
Expression of GFP fusion proteins in P. tricornutum and T. pseudonana
Correct full-length coding sequences for PyShell orthologues in P. tricornutum and T. pseudonana (PtPyShell1a, PtPyShell2a, TpPyShell1, TpPyShell2, and TpPyShell3) were determined by RACE using a SMARTer RACE 5'/3' kit (TaKaRa). Sequences were amplified by PCR and cloned into pPha-T1 or pTha-NR vectors containing a fragment of enhanced GFP by a seamless ligation cloning extract method. 104 The resulting plasmids were introduced into WT cells by particle bombardment (PDS-1000/He, Bio-Rad, Tokyo, Japan), and transformants expressing GFP were screened by fluorescence microscopy. 27 Primers used are listed in Table S3B.

Bacterial expression of recombinant TpPyShell1 protein for cryo-EM analysis
The expression plasmid for TpPyShell with an N-terminal His6-tag on pET28a was transformed into Escherichia coli strain BL21(DE3).
Cells were cultured at 37 °C in 6 L of LB medium containing 100 μg/mL kanamycin. When the OD600 reached 0.5, IPTG was added to a final concentration of 0.1 mM to induce TpPyShell expression, and the culture was incubated overnight at 37 °C.

METHOD DETAILS
In vivo cross-linking with photo-reactive amino acids
P. tricornutum cells grown in LC were harvested at logarithmic growth phase. They were resuspended in fresh medium at OD730 = 1.0-1.2 in the presence of 1 mM L-photo-leucine and 2 mM L-photo-methionine (Thermo Fisher Scientific, Waltham, MA, USA). Incorporation of these photo-reactive amino acids (pAAs) was performed under blue (455 nm) and red (635 nm) LED light (50 μmol photons m−2 s−1) for 6-24 hours. The cell cultures were then irradiated with UV light (365 nm) for 30-45 min to perform in vivo photo-cross-linking. Cells were harvested and resuspended in 25 mM Tris-HCl (pH 7.0), then disrupted by sonication. Insoluble debris was removed by centrifugation. The resulting supernatant was then processed in one of two ways. For in-gel digestion ("procedure A"), it was subjected to SDS-PAGE. For in-solution digestion ("procedure B"), it was centrifuged on a 25%-55% (w/v) linear sucrose gradient in 25 mM Tris-HCl (pH 7.0) at 210,000 × g for 4 h at 4 °C. Aliquots (200 μL) of each fraction were collected. Protein concentrations were determined with a protein assay kit (Bio-Rad, Hercules, CA, USA) using bovine serum albumin as a standard.

Western blotting
Proteins extracted as described above were separated by SDS-PAGE, transferred to a PVDF membrane, and blocked with 1% (w/v) skim milk in phosphate-buffered saline (PBS) containing 0.05% (v/v) Tween 20. For detection of RbcL, the primary antibody (diluted 1:1000) was a rabbit anti-RbcL antiserum raised against a P. tricornutum RbcL partial peptide (Japan Bio Serum, Hiroshima, Japan). Goat anti-rabbit IgG conjugated with horseradish peroxidase was used as the secondary antibody (diluted 1:10000). Immunoreactive signals were detected with an enhanced chemiluminescence reagent (ImmunoStar Zeta, Wako, Osaka, Japan) and a high-sensitivity CCD imaging system (Luminograph I, ATTO, Tokyo, Japan).
To confirm the deletion of TpPyShell1 and 2 in m1 and m2, we disrupted T. pseudonana WT and mutant cells grown under HC by sonication. Disruption was performed in 50 mM HEPES (pH 7.5) with a protease inhibitor cocktail (Nacalai, Kyoto, Japan) to obtain crude extracts. For each sample, 5 μg of protein was loaded and analyzed by immunoblotting as described above. The primary antibody was a rabbit anti-TpPyShell1 and 2 antiserum (Japan Bio Serum). It targets the conserved peptide sequence "GTARDLAEIWDNSS" (residues 60-73 of TpPyShell1 and 59-72 of TpPyShell2). The anti-RbcL antibody was also used as a loading control.

Identification of proteins by LC-MS/MS
Proteins either in acrylamide gel ("procedure A") or in solution ("procedure B") were reduced, alkylated, and digested with trypsin before injection into the LC-MS/MS system.

For gel samples, gel blocks (~1 mm³) were dehydrated in 100 μL acetonitrile for 10 min at room temperature. After removal of the acetonitrile, each gel block was dried in an evaporator. It was then incubated in 25 mM NH4HCO3 containing 10 mM dithiothreitol for 1 h at 56 °C. The gel block was washed in 100 μL of 25 mM NH4HCO3 and incubated with 55 mM iodoacetamide for 45 min at room temperature. After two washes with 100 μL of 25 mM NH4HCO3, the gel block was dehydrated in acetonitrile. The dried gel blocks were soaked in 50 mM NH4HCO3 containing 10 ng μL−1 trypsin at 37 °C for 16-20 h. Digested peptides were extracted with 50% (v/v) acetonitrile containing 5% (v/v) formic acid. They were concentrated in an evaporator and dissolved in 1% (v/v) formic acid. The solutions were desalted by ZipTip C18 (Merck Millipore).

For solution samples, disulfide bonds were reduced in 50 mM Tris-HCl (pH 8.5) containing 10 mM dithiothreitol at 37 °C for 1.5 h. Samples were then alkylated with 50 mM iodoacetamide for 30 min at room temperature. Proteins were digested with 2 ng μL−1 trypsin in 50 mM NH4HCO3 for 16-20 h at 37 °C. The reaction was stopped by adding 0.18% (v/v) formic acid. Peptide samples were concentrated in an evaporator and dissolved in 0.1% (v/v) formic acid. They were then desalted by ZipTip C18 (Merck Millipore, Burlington, MA, USA).

The digested samples were injected into an EASY-nLC 1000 connected to an LTQ Orbitrap XL (Thermo Fisher Scientific). LC-MS/MS data were analyzed with Proteome Discoverer 1.4 (Thermo Fisher Scientific) using the open genome data resource for P. tricornutum from JGI (Phatr2). Homologs of PtPyShell1 were identified by BLAST search against the reference genomes of P. tricornutum CCAP1055/1 (Phatr2) and T. pseudonana CCMP1335 (Thaps3). Two conserved regions (CR1 and CR2) were identified in all PyShell proteins using ClustalW sequence alignment.
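Both procedures rely on trypsin cutting proteins into peptides. The search software then matches these peptides against the genome-derived protein database. The sketch below illustrates that cleavage step with an in-silico tryptic digest. It assumes the common simplified rule: cleave after Lys or Arg unless the next residue is Pro, with no missed cleavages. The function name `tryptic_digest` is hypothetical and does not reflect how Proteome Discoverer was configured in this study. The example input is the antiserum peptide GTARDLAEIWDNSS described under Western blotting.

```python
import re


def tryptic_digest(sequence: str, min_length: int = 1) -> list[str]:
    """In-silico trypsin digest (hypothetical helper).

    Assumes the simplified rule: cleave after K or R unless the next
    residue is P. Missed cleavages are not modeled.
    """
    # Zero-width split point: preceded by K/R and not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", sequence.upper())
    # Drop empty strings (e.g., after a C-terminal K/R) and short fragments.
    return [p for p in pieces if len(p) >= min_length]


print(tryptic_digest("GTARDLAEIWDNSS"))  # ['GTAR', 'DLAEIWDNSS']
```

Real search engines usually also allow a small number of missed cleavages and filter peptides by length and mass. Those settings are not given in the text, so the sketch omits them.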

qRT-PCR of PyShell transcripts
Transcript levels of PyShell genes were quantified by qRT-PCR in P. tricornutum and T. pseudonana (Figures S1D and S6D). In T. pseudonana, TpδCA3 (Tp233) was also analyzed as a control LC-inducible gene. The internal controls in P. tricornutum were Actin (Pt51157), GapC2 (Pt51129), and Histone H4 (Pt26896). In T. pseudonana, they were Actin (Tp25772), GapC3 (Tp28241), and Histone H4 (Tp3184). These controls were confirmed to be unresponsive to different CO2 concentrations. 105 Transcript levels were calculated with the 2^(−ΔΔCt) method against each internal control separately. 106 The average ΔΔCt value was then calculated for each replicate. Primers used are listed in Table S3A.
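The averaging step above can be made concrete with a short sketch. It assumes the standard formulation: ΔCt = Ct(target) − Ct(reference) and ΔΔCt = ΔCt(sample) − ΔCt(calibrator). This is computed once per internal control, then averaged before converting to a fold change. Several details are assumptions, not statements from the text: which condition serves as calibrator (HC in the example), all Ct values, and the function name.

```python
from statistics import mean


def mean_ddct_fold_change(target_ct, ref_cts, calib_target_ct, calib_ref_cts):
    """Return (mean ΔΔCt, 2**-mean ΔΔCt) using several internal controls.

    ref_cts / calib_ref_cts map each internal-control gene to its Ct in the
    sample and in the calibrator, respectively (hypothetical helper).
    """
    ddcts = []
    for gene, ref_ct in ref_cts.items():
        dct_sample = target_ct - ref_ct                    # ΔCt in the sample
        dct_calib = calib_target_ct - calib_ref_cts[gene]  # ΔCt in the calibrator
        ddcts.append(dct_sample - dct_calib)               # ΔΔCt for this control
    avg = mean(ddcts)
    return avg, 2 ** (-avg)


# Hypothetical Ct values for one replicate (LC sample vs. HC calibrator).
lc_refs = {"Actin": 18.0, "GapC3": 17.0, "H4": 19.0}
hc_refs = {"Actin": 18.2, "GapC3": 16.8, "H4": 19.0}
ddct, fold = mean_ddct_fold_change(20.0, lc_refs, 23.0, hc_refs)
print(f"mean ΔΔCt = {ddct:.2f}, fold change = {fold:.2f}")  # mean ΔΔCt = -3.00, fold change = 8.00
```

ΔΔCt is on a log2 scale, so averaging it before exponentiating is equivalent to taking the geometric mean of the per-control fold changes.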

Confocal fluorescence microscopy
To observe subcellular localizations of GFP fusion proteins, we used confocal laser microscopes A1 (Nikon, Tokyo, Japan) or SP8 (Leica, Wetzlar, Germany).When imaging with the A1, chlorophyll autofluorescence was detected at 662-737 nm after excitation with a 638 nm laser, and GFP fluorescence was monitored at 500-550 nm following excitation at 488 nm.When imaging with the SP8, chlorophyll was excited by a 552 nm laser and detected at 600-750 nm.GFP was excited by a 488 nm laser and detected at 500-520 nm.

Immunoelectron microscopy
The P. tricornutum strain expressing PtPyShell1a:GFP was fixed as previously described, 29 with small modifications in the polymerization step. The samples immersed in resin were polymerized at −30 °C for 5 days under UV light. Thin sections cut with a Leica EM UC7 were mounted on nickel slot grids, followed by an etching step with 1% (w/v) sodium periodate. After blocking, the sections were incubated at 25 °C overnight with polyclonal anti-GFP antibody (AnaSpec, Fremont, CA, US), diluted 1:500 in 3% (w/v) BSA in PBS. After rinsing with PBS, the sections were incubated for 60 min at room temperature with goat anti-rabbit IgG conjugated to 10-nm colloidal gold particles (diluted 1:50 in PBS; BBI Solutions, Crumlin, UK). After washing with distilled water, the thin sections were stained with TI blue (Nisshin EM, Aichi, Japan). The sections were observed with a JEM-1011 electron microscope (JEOL, Tokyo, Japan).

Purification of recombinant TpPyShell1 protein
Escherichia coli BL21(DE3) cells expressing TpPyShell1 protein (see culture conditions above) were harvested by two rounds of centrifugation (6,000 rpm, 10 min, 4 °C, JLA-9.1000 rotor; Beckman). Cells were resuspended in Buffer A (50 mM Tris-HCl pH 8.0, 0.3 M NaCl, 1 mM EDTA, and 1 mM DTT). They were then either frozen for storage at −80 °C or used immediately for cell disruption. Frozen cells were resuspended in 200 mL of freshly prepared, pre-cooled Buffer A with 0.25 mM PMSF and disrupted by sonication. Cell debris and undisrupted cells were removed by centrifugation (45,000 rpm, 30 min, 4 °C, 70Ti rotor; Beckman). The supernatant was applied to an open column with Ni-IMAC resin (Bio-Rad). The column was washed with wash buffer (50 mM Tris-HCl pH 8.0, 0.3 M NaCl, 10 mM imidazole, 1 mM EDTA, 1 mM DTT). TpPyShell1 protein was then eluted with elution buffer (50 mM Tris-HCl pH 8.0, 0.3 M NaCl, 300 mM imidazole, 1 mM EDTA, and 1 mM DTT). TEV protease equivalent to 3% of the TpPyShell1 concentration was added to the eluted fractions. His-tagged TpPyShell1 and TEV protease were dialyzed together overnight at 4 °C in SnakeSkin dialysis tubing (Thermo Fisher Scientific). The dialysis buffer contained 50 mM Tris-HCl pH 8.0, 0.3 M NaCl, 1 mM EDTA, and 1 mM DTT. The dialyzed solution was applied to an open column with Ni-IMAC resin equilibrated with the dialysis buffer, and His-tag-free TpPyShell1 was eluted. The collected TpPyShell1 was applied to a Superdex 75 16/60 column equilibrated with the dialysis buffer, and the fraction containing TpPyShell1 was collected. This fraction was concentrated by centrifugation with an Amicon Ultra concentrator (M.W. 4000) to a TpPyShell1 concentration of 2.0 mg/mL. Centrifugation conditions were 4,000 × g, 15 min, 4 °C, SX4400 rotor (Beckman).
Cryo-EM grid prep and data acquisition
A 3.0 μL aliquot of TpPyShell1 protein solution (purified as described above) was applied to a glow-discharged Quantifoil holey carbon grid (R1.2/1.3, Cu, 200 mesh). The grid was blotted for 3.5 s at 4 °C and plunge-frozen into liquid ethane using a Vitrobot Mark IV (Thermo Fisher Scientific). The grid was inserted into a Titan Krios (Thermo Fisher Scientific) operating at an acceleration voltage of 300 kV and equipped with a Cs corrector (CEOS GmbH). Images were recorded with a K3 direct electron detector (Gatan) in CDS mode with an energy filter at a slit width of 20 eV. Data were collected automatically using SerialEM software. 93 Collection parameters were a physical pixel size of 0.87 Å, 52 frames at 0.96 e−/Å² per frame, an exposure time of 2.63 s per movie, and defocus from −0.5 to −1.7 μm. A total of 5,951 movies were collected.
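Several quantities follow directly from the acquisition parameters above. The sketch below does that arithmetic: total dose = frames × dose per frame, and the Nyquist limit = 2 × pixel size. The helper name is hypothetical.

```python
def acquisition_summary(n_frames, dose_per_frame, exposure_s, pixel_size_a):
    """Derived movie-acquisition quantities (hypothetical helper)."""
    total_dose = n_frames * dose_per_frame  # e-/Å² accumulated over the movie
    return {
        "total_dose_e_per_A2": total_dose,
        "frame_time_s": exposure_s / n_frames,
        "dose_rate_e_per_A2_per_s": total_dose / exposure_s,
        "nyquist_A": 2.0 * pixel_size_a,     # finest resolution representable
    }


summary = acquisition_summary(n_frames=52, dose_per_frame=0.96,
                              exposure_s=2.63, pixel_size_a=0.87)
for key, value in summary.items():
    print(f"{key}: {value:.3f}")
```

Particles were later re-extracted at 1.09 Å per pixel. At that pixel size, the same formula gives a 2.18 Å Nyquist limit, just finer than the reported 2.4 Å global resolution.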

Cryo-EM image processing and model building
The movie frames were subjected to beam-induced motion correction using MotionCorr2.1, 85 and the contrast transfer function (CTF) was estimated using Gctf. 79 Motion correction and CTF estimation were run within RELION 3.1. 73 The motion-corrected micrographs were imported into cryoSPARC ver. 4.0.2. 78 Approximately 500 particles were manually selected from 10 micrographs for two-dimensional (2D) classification. A good 2D class average was then used as a template for automatic picking in a filament tracer job. This picked a total of 800,778 particle images from all micrographs, which were extracted with a box size of 150 pixels and 4× binning. After two rounds of 2D classification, 299,145 particles were selected and re-extracted with a box size of 600 pixels. The re-extracted particles were refined without helical parameters. Even after refinement, a symmetry search for the helical parameters was not convincing. Therefore, preliminary models were manually fitted into the EM density map to estimate the two helical parameters: a rise of 25 Å and a twist of −30°. Based on these helical parameters, Topaz 80 automatically picked 444,736 particle images from all micrographs. These were extracted with a box size of 170 pixels and 4× binning using RELION 4.0. 77 After 2D classification (Figures S5B and S5C), selected particles were sorted into 4 classes by 3D classification (Figure S5D). A total of 322,887 particles were re-extracted at a pixel size of 1.09 Å. They were subjected to five rounds of helical refinement, three rounds of CTF refinement, and Bayesian polishing. The 3D refinement and post-processing yielded a map with a global resolution of 2.6 Å, according to the Fourier shell correlation (FSC) 0.143 criterion. The refined values of helical rise and twist were 25.13 Å and −32.47°, respectively. The 3D refinement showed that a dimer of TpPyShell1 proteins is the asymmetric unit. The helical parameters of the dimer were determined to be a rise of 4 Å and a twist of −56°. The final 3D refinement and post-processing yielded a map with a global resolution of 2.4 Å, according to the FSC 0.143 criterion (Figure S5E). The final refined values of helical rise and twist for the dimer were 3.59 Å and −56.07°, respectively. Local resolution was estimated using RELION 4.0 (Figure S5F). The processing workflow is outlined in Figure S5D.
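The refined helical parameters describe how copies of the asymmetric unit (here, the TpPyShell1 dimer) are generated along the tube. Copy n is rotated by n × twist about the tube axis and translated by n × rise along it. The sketch below applies that convention to the final dimer values to derive the number of dimers per turn and the helical pitch. The 65 Å radius used to place points is purely illustrative. It is borrowed from the unrolling radius in Figure S6, not a measured subunit radius, and the function names are hypothetical.

```python
import math


def helical_positions(rise_a, twist_deg, radius_a, n_units):
    """Centers of n_units copies related by helical symmetry about the z axis."""
    coords = []
    for n in range(n_units):
        phi = math.radians(n * twist_deg)          # cumulative twist
        coords.append((radius_a * math.cos(phi),
                       radius_a * math.sin(phi),
                       n * rise_a))                 # cumulative rise
    return coords


def units_per_turn(twist_deg):
    """Number of asymmetric units needed to complete one 360° turn."""
    return 360.0 / abs(twist_deg)


def pitch(rise_a, twist_deg):
    """Axial distance covered by one full 360° turn."""
    return rise_a * units_per_turn(twist_deg)


RISE_A, TWIST_DEG = 3.59, -56.07   # final refined dimer parameters
print(f"dimers per turn: {units_per_turn(TWIST_DEG):.2f}")   # dimers per turn: 6.42
print(f"pitch: {pitch(RISE_A, TWIST_DEG):.1f} Å")            # pitch: 23.0 Å
for x, y, z in helical_positions(RISE_A, TWIST_DEG, radius_a=65.0, n_units=3):
    print(f"x={x:6.1f}  y={y:6.1f}  z={z:5.2f}")
```

The sign of the twist encodes handedness. Which sign corresponds to a left- or right-handed helix depends on the software convention, so the sketch does not interpret it.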
The model of TpPyShell1 from Trp69 to Phe321 (excluding the chloroplast targeting sequence) (Figures 3B and S5G) was built starting from the predicted AlphaFold2 model. 81 This predicted model was first fitted manually into the EM density map using UCSF Chimera. 91 Each domain was then manually remodeled and refined iteratively using COOT, 82 Phenix, 83 and the Servalcat pipeline in REFMAC5. 84 In one monomer of the dimer model, the N-terminal model was extended by 5 residues, revealing the shape of a putative short α-helix, but the side chains of these residues could not be assigned. All figures were prepared using UCSF ChimeraX. 92 The statistics of the 3D reconstruction and model refinement are summarized in Table S4.
Cryo-ET sample prep and data acquisition
T. pseudonana and P. tricornutum were grown under normal atmosphere in F/2 artificial seawater at 18 °C and 40 μmol photons m−2 s−1 light without shaking. For the T. pseudonana m1 and m2 mutants (ΔTpPyShell1/2), the medium was supplemented with 5 mM Na2CO3. Cells were sedimented at 800 × g for 5 min prior to vitrification. Then 4 μL of cell suspension was applied to grids and plunge-frozen using a Vitrobot Mark IV (Thermo Fisher Scientific). P. tricornutum cells were applied to 200-mesh R1/4 SiO2-film-covered gold grids, and T. pseudonana cells to 200-mesh R2/1 carbon-film-covered copper grids (Quantifoil Micro Tools). EM grids were clipped into Autogrid supports (Thermo Fisher Scientific) and loaded into Aquilos 1 or 2 FIB-SEM instruments (Thermo Fisher Scientific). There, they were thinned with a gallium ion beam as previously described. 38 The resulting EM grids with thin lamellae were transferred to a transmission electron microscope for tomographic imaging.
For microscopes M1 and M2, tilt-series were acquired using SerialEM 3.8 software. 93 In all cases, tilt-series were acquired using a dose-symmetric tilt scheme, 107 with 2° steps totaling 60 tilts per series. Each image was recorded in counting mode at ten frames per second. The target defocus of individual tilt-series ranged from −2 to −5 μm. The total dose per tilt-series was approximately 120 e−/Å². Image pixel sizes for microscopes M1 and M2 were 3.52 and 2.143 Å, respectively. For microscope M3, tilt-series were acquired using Tomography 5.11 software (Thermo Fisher Scientific) with the same acquisition scheme as above, except for the use of multi-shot acquisition. Data were acquired in EER mode with a calibrated image pixel size of 2.93 Å.
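A dose-symmetric scheme collects the low-tilt images first, while the sample has received little dose. It starts at 0° and alternates between positive and negative tilts in small groups. The sketch below generates such an acquisition order and the resulting dose per tilt. Three inputs are assumptions: the group size of 2, the ±60° range (the text gives only the 2° step and 60 tilts), and the function name. With ±60° at 2° steps the sketch yields 61 images, so the exact endpoints used in the study may differ.

```python
def dose_symmetric_order(max_tilt: int, step: int, group_size: int = 2) -> list[int]:
    """Tilt acquisition order for a dose-symmetric scheme (sketch).

    Starts at 0°, then alternates between positive and negative tilts in
    groups of `group_size`, so low-tilt images are collected first.
    """
    pos = [step * i for i in range(1, max_tilt // step + 1)]
    neg = [-a for a in pos]
    order = [0]
    positive_first = True
    for start in range(0, len(pos), group_size):
        block = slice(start, start + group_size)
        first, second = (pos, neg) if positive_first else (neg, pos)
        order += first[block] + second[block]
        # Next round starts on the side where the stage already is.
        positive_first = not positive_first
    return order


order = dose_symmetric_order(max_tilt=60, step=2)   # assumed ±60° range
dose_per_tilt = 120.0 / len(order)                  # ~120 e-/Å² total per series
print(order[:11])  # [0, 2, 4, -2, -4, -6, -8, 6, 8, 10, 12]
print(len(order), f"{dose_per_tilt:.2f} e-/Å² per tilt")
```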
In total, we acquired 21 tomograms for WT P. tricornutum, 40 tomograms for WT T. pseudonana, 8 tomograms for mutant m1, and 7 tomograms for mutant m2.Each tomogram was acquired from a different chloroplast.All cells were in asynchronous culture, and multiple cultures were imaged for each strain.
Cryo-ET data analysis
TOMOMAN MATLAB scripts (version 0.6.9) 88 were used to preprocess the tomographic tilt-series data. Raw frames were aligned using MotionCor2 (version 1.5.0). 85 The tilt-series were then dose-weighted, 108 followed by manual removal of bad tilts. The resulting tilt-series were binned 4 times (pixel sizes: 14.08 Å for M1, 8.57 Å for M2, 11.6 Å for M3). They were aligned in IMOD (version 4.11) 109 using patch tracking and reconstructed by weighted back projection. Cryo-CARE (version 0.2.1) 90 was applied to reconstructed tomogram pairs from odd and even raw frames to enhance contrast and remove noise. Snapshots of denoised tomograms were captured using the IMOD 3dmod viewer. Denoised tomograms were used as input for automatic segmentation using MemBrain. 86 The resulting segmentations were manually curated in Amira (version 2021.2).
Subtomogram averaging of T. pseudonana PyShell
For subtomogram averaging, only data from microscope M2 were used. Segmented surfaces corresponding to the PyShell were used as input to determine initial normal vectors in MemBrain's point and normal sampling module. 110 Vectors were sampled densely on the surface with a spacing of 1.5 voxels at bin4 (8.572 Å/px). The resulting positions served as initial coordinates for extracting subvolumes (box size of 32 pixels) from bin4 tomograms. These tomograms had been CTF-corrected by phase flipping in IMOD. Starting from the normal vectors determined in MemBrain, multiple rounds of subtomogram alignment, averaging, and classification were carried out in STOPGAP software. 89 False positives and poorly aligning particles were removed by classification steps. A second round of extraction, alignment, and classification was performed at bin2 (4.286 Å/px), starting from the coordinates of the previous round of averaging at bin4. During particle alignment, a maximum resolution of 16 Å was allowed to prevent overfitting. For resolution estimation and map filtering, the half-maps from the final round of alignment were postprocessed in RELION using a soft disk-shaped mask.
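The binning factors and box size above fix the pixel sizes, Nyquist limits, and physical box size used at each stage. The sketch below works through that arithmetic and converts the 16 Å alignment limit into a Fourier shell index for the 32-pixel bin4 box. The text does not say at which round the 16 Å limit applied. The arithmetic shows that at bin4 it lies beyond the 17.1 Å Nyquist limit, so it could only restrict alignment at bin2. Helper names are hypothetical.

```python
def binned_pixel(pixel_a: float, factor: int) -> float:
    """Pixel size (Å) after binning by an integer factor."""
    return pixel_a * factor


def nyquist(pixel_a: float) -> float:
    """Finest resolution (Å) representable at this pixel size."""
    return 2.0 * pixel_a


def fourier_shell(resolution_a: float, box_px: int, pixel_a: float) -> float:
    """Fourier shell index corresponding to a resolution in a cubic box."""
    return box_px * pixel_a / resolution_a


M2_PIXEL = 2.143  # unbinned pixel size of microscope M2 (Å)
for factor in (4, 2):
    px = binned_pixel(M2_PIXEL, factor)
    print(f"bin{factor}: {px:.3f} Å/px, Nyquist {nyquist(px):.2f} Å")

bin4 = binned_pixel(M2_PIXEL, 4)
print(f"32-px box at bin4: {32 * bin4:.0f} Å across")
print(f"16 Å at bin4 -> shell {fourier_shell(16.0, 32, bin4):.1f} (Nyquist shell 16)")
```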
Comparison of density maps from in vitro SPA and in situ STA
The 2.4 Å-resolution SPA density map of TpPyShell1 was unrolled from a tube to a flat sheet using the unroll command in ChimeraX. 92 Different inner radii for the unroll operation were tested, aiming to match the lateral periodicity measured in the 2D class average of TpPyShell1 flat sheets (Figures S6B-S6D). Two flattened maps were retained: one unrolled with the measured inner tube radius and one with the measured sheet periodicity. Each was rigid-body fit into the STA density and resampled on the grid of the STA map, giving the same box and pixel size. The same soft disk-shaped mask used for postprocessing the STA map was then applied to both unrolled, resampled SPA maps. Finally, the power spectra of these two maps were matched to that of the STA map using the relion_image_handler program. Correlation scores between each matched SPA map and the STA map were measured in ChimeraX at 10° steps of in-plane rotation. For Figure 3G, models of individual TpPyShell1 monomers were rigid-body fit into the unrolled 2.4 Å-resolution SPA map in ChimeraX to obtain a model of a flat TpPyShell1 lattice.
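The in-plane search above scores one map against another at fixed rotation steps. The sketch below is a minimal 2D analogue. It assumes central slices (or projections) of the two sheet maps as input, nearest-neighbour resampling, and a normalized cross-correlation inside a disk mask. The study performed this on 3D maps in ChimeraX, and the function names here are hypothetical. A synthetic target made by rotating a random image by 40° checks that the search recovers the known angle.

```python
import numpy as np


def rotate_nearest(image, angle_deg):
    """Rotate a square 2D array about its center (nearest neighbour; zeros outside)."""
    n = image.shape[0]
    c = (n - 1) / 2.0
    y, x = np.mgrid[0:n, 0:n]
    t = np.deg2rad(angle_deg)
    # Inverse mapping: each output pixel samples the source rotated by -angle.
    xs = np.cos(t) * (x - c) + np.sin(t) * (y - c) + c
    ys = -np.sin(t) * (x - c) + np.cos(t) * (y - c) + c
    xi, yi = np.rint(xs).astype(int), np.rint(ys).astype(int)
    ok = (xi >= 0) & (xi < n) & (yi >= 0) & (yi < n)
    out = np.zeros_like(image)
    out[ok] = image[yi[ok], xi[ok]]
    return out


def masked_correlation(a, b, mask):
    """Normalized cross-correlation of two arrays restricted to `mask`."""
    av, bv = a[mask] - a[mask].mean(), b[mask] - b[mask].mean()
    return float(av @ bv / np.sqrt((av @ av) * (bv @ bv)))


def rotational_search(reference, moving, step_deg=10):
    """Correlation of `moving`, rotated in step_deg increments, with `reference`."""
    n = reference.shape[0]
    c = (n - 1) / 2.0
    y, x = np.mgrid[0:n, 0:n]
    disk = (x - c) ** 2 + (y - c) ** 2 <= (n / 2.0 - 1) ** 2   # disk-shaped mask
    angles = np.arange(0, 360, step_deg)
    scores = np.array([masked_correlation(reference, rotate_nearest(moving, a), disk)
                       for a in angles])
    return angles, scores


rng = np.random.default_rng(0)
moving = rng.normal(size=(64, 64))
reference = rotate_nearest(moving, 40)          # synthetic "target" orientation
angles, scores = rotational_search(reference, moving)
print(angles[np.argmax(scores)], round(float(scores.max()), 3))
```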

Photosynthesis measurements
T. pseudonana WT, m1, and m2 cells were cultured under LC and HC (as described above in "Diatom Cultures") with 40 μmol photons m−2 s−1 light. Cells were harvested at the logarithmic growth phase and resuspended in freshly prepared DIC-free F/2 artificial seawater. The chlorophyll a concentration of the samples was determined in 100% (v/v) methanol. 111 The cell samples were applied to an oxygen electrode (Hansatech, King's Lynn, U.K.) at 10 μg chlorophyll a mL−1 in DIC-free F/2 artificial seawater (pH 8.1). Net O2 evolution rate and total DIC concentration in the sample mixture were measured simultaneously during stepwise addition of NaHCO3, as previously reported. 29 The O2 evolution rate was measured with the oxygen electrode, and DIC with a gas-chromatography flame ionization detector (GC-8A, Shimadzu, Kyoto, Japan). Measurements used constant actinic light of 900 μmol photons m−2 s−1. The photosynthetic parameters were calculated by non-linear least-squares fitting of O2 evolution rate against DIC concentration. The parameters were Pmax, the maximum net O2 evolution rate; K0.5, the DIC concentration giving half of Pmax; [DIC]comp, the DIC concentration giving no net O2 evolution; and APC, the apparent photosynthetic conductance. See Table S2.
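The text names the fitted parameters but not the functional form of the curve. The sketch below therefore assumes a saturating, Michaelis-Menten-type model shifted so that net O2 evolution is zero at [DIC]comp. Under this model, K0.5 = [DIC]comp + k. APC is taken here as the initial slope at [DIC]comp, which is an assumption and not a definition from the text. The data are hypothetical and noise-free, and the function names are illustrative. The fit uses SciPy's `curve_fit`, a standard non-linear least-squares routine.

```python
import numpy as np
from scipy.optimize import curve_fit


def net_o2(dic, p_max, k, dic_comp):
    """Assumed model: zero net O2 at dic_comp, saturating toward p_max."""
    excess = dic - dic_comp
    return p_max * excess / (k + excess)


def fit_photosynthesis(dic, rate):
    """Non-linear least-squares fit returning the four parameters."""
    dic, rate = np.asarray(dic, float), np.asarray(rate, float)
    p0 = (rate.max(), np.median(dic), 0.5 * dic.min())
    # Keep dic_comp below the lowest measured DIC so the model stays finite.
    bounds = ([0.0, 1e-9, 0.0], [np.inf, np.inf, dic.min()])
    (p_max, k, dic_comp), _ = curve_fit(net_o2, dic, rate, p0=p0, bounds=bounds)
    return {
        "P_max": p_max,
        "K_0.5": dic_comp + k,      # DIC giving P_max / 2 under this model
        "DIC_comp": dic_comp,       # DIC giving zero net O2 evolution
        "APC": p_max / k,           # initial slope at DIC_comp (assumed definition)
    }


# Hypothetical, noise-free data (DIC in µM; rate in arbitrary units).
dic = np.array([20, 50, 100, 200, 400, 800, 1500, 2000], float)
rate = net_o2(dic, 200.0, 150.0, 10.0)
res = fit_photosynthesis(dic, rate)
print({name: round(float(value), 2) for name, value in res.items()})
```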

FIB-SEM data acquisition and analysis
Cells were grown under the same conditions as for the cryo-ET analysis. Sample preparation was performed as previously described. 9 FIB-SEM tomography was performed with a Zeiss CrossBeam 550 microscope equipped with Fibics Atlas 3D software. The voxel size was 16 × 16 × 16 nm for the WT, 6 × 6 × 6 nm for the m1 mutant, and 8 × 8 × 8 nm for the m2 mutant. The whole volumes were imaged with an average of 300 frames for the WT and 1,000 frames for the mutants. Single cells were isolated by cropping in 3D using the open-source software Fiji. 94 Image misalignment was corrected using the "StackReg" plugin in Fiji. We used 3D Slicer 99 for segmentation and 3D reconstruction, and MeshLab 98 to reduce noise and enhance contours of reconstructed objects. Quantitative measurements of chloroplasts and pyrenoids (volume, diameter, sphericity) were implemented in Python using libraries including trimesh, stl, and scikit-image.
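The shape metrics listed above can be computed from a closed triangle mesh of each segmented object. The text does not define them, so two choices here are assumptions. Sphericity follows Wadell's definition, π^(1/3)(6V)^(2/3)/A, which is 1.0 for a sphere. "Diameter" is taken as the equivalent-sphere diameter. trimesh exposes volume and surface area directly (`Trimesh.volume`, `Trimesh.area`). The sketch computes them with NumPy so it stays self-contained, and its helper names are hypothetical. A unit cube serves as a sanity check, and `voxel_nm` scales voxel coordinates to nanometres.

```python
import math

import numpy as np


def mesh_volume_area(vertices, faces):
    """Volume and surface area of a closed, consistently oriented triangle mesh."""
    v = np.asarray(vertices, dtype=float)
    f = np.asarray(faces, dtype=int)
    a, b, c = v[f[:, 0]], v[f[:, 1]], v[f[:, 2]]
    # Divergence theorem: sum of signed tetrahedra spanned with the origin.
    volume = abs(np.einsum("ij,ij->i", a, np.cross(b, c)).sum()) / 6.0
    area = 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1).sum()
    return float(volume), float(area)


def shape_metrics(vertices, faces, voxel_nm=1.0):
    """Volume, equivalent-sphere diameter, and Wadell sphericity."""
    scaled = np.asarray(vertices, dtype=float) * voxel_nm
    volume, area = mesh_volume_area(scaled, faces)
    diameter = (6.0 * volume / math.pi) ** (1.0 / 3.0)
    sphericity = math.pi ** (1.0 / 3.0) * (6.0 * volume) ** (2.0 / 3.0) / area
    return {"volume": volume, "eq_diameter": diameter, "sphericity": sphericity}


# Unit cube as a sanity check (12 outward-facing triangles).
cube_v = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0],
          [0, 0, 1], [1, 0, 1], [1, 1, 1], [0, 1, 1]]
cube_f = [[0, 2, 1], [0, 3, 2], [4, 5, 6], [4, 6, 7], [0, 1, 5], [0, 5, 4],
          [3, 7, 6], [3, 6, 2], [0, 4, 7], [0, 7, 3], [1, 2, 6], [1, 6, 5]]
print(shape_metrics(cube_v, cube_f))
```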

Phylogenetic analysis
Homologs of T. pseudonana TpPyShell1 were retrieved from the National Center for Biotechnology Information and the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP). 112 The highest-scoring sequence per species was selected (E-value cutoff = 1e−35). Gaps and non-conserved regions were removed, and the protein sequences were then aligned using Clustal Omega. 95 The alignment was used to generate a maximum likelihood tree with IQ-TREE using standard settings, and the tree was visualized with iTOL. 96,97 The taxonomic distribution of the TpPyShell1 protein sequence in the ocean was queried against the EUK_SMAGs dataset 63 in the Ocean Gene Atlas v2.0 webserver (https://tara-oceans.mio.osupytheas.fr). 64,113
QUANTIFICATION AND STATISTICAL ANALYSIS
LC-MS/MS data were quantified with Proteome Discoverer 1.4 (Thermo Fisher Scientific) (Table S1). For qRT-PCR experiments, ΔΔCt values 106 were calculated as described in the STAR Methods section above. The resolution of cryo-EM SPA maps and cryo-ET STA maps was estimated by "gold-standard" Fourier shell correlation (FSC), using independent half-maps and the 0.143 criterion 114 (Figures S4B and S5E). All statistics of the atomic model built into the cryo-EM map were analyzed using Phenix 83,115 (Table S4). Correlation scoring between the SPA and STA maps (Figure S6G) was performed with ChimeraX. 92 Quantitative measurements of FIB-SEM data were implemented in Python, with scripts deposited on GitHub and Zenodo (key resources table). The number of chloroplasts used for each calculation (n) is given in the legend of Figures 4G-4J. The numbers of acquired cryo-EM movies and cryo-ET tilt-series are noted in the relevant STAR Methods sections. Photosynthetic measurements and growth curves (Figures 4B-4D; Table S2) were quantified using three independent biological replicates.
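The FSC-based resolution estimate mentioned above compares two half-maps shell by shell in Fourier space. It reports the resolution at which their correlation first falls below 0.143. In the gold-standard procedure, the half-maps are refined independently upstream. The sketch below only computes the FSC curve from two given cubic half-maps, using integer-width shells without mask correction. It then locates the 0.143 crossing by linear interpolation. It is a simplified stand-in, not the RELION or cryoSPARC implementation, and the function names are hypothetical.

```python
import numpy as np


def fsc_curve(half1, half2):
    """Fourier shell correlation between two cubic half-maps (shells 1..N/2)."""
    f1, f2 = np.fft.fftn(half1), np.fft.fftn(half2)
    n = half1.shape[0]
    k = np.fft.fftfreq(n) * n                     # integer frequency indices
    kz, ky, kx = np.meshgrid(k, k, k, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int).ravel()
    cross = np.bincount(shell, weights=(f1 * np.conj(f2)).real.ravel())
    p1 = np.bincount(shell, weights=(np.abs(f1) ** 2).ravel())
    p2 = np.bincount(shell, weights=(np.abs(f2) ** 2).ravel())
    shells = np.arange(1, n // 2 + 1)
    return shells, cross[shells] / np.sqrt(p1[shells] * p2[shells])


def resolution_at(shells, curve, box_px, pixel_a, threshold=0.143):
    """Resolution (Å) where the FSC first drops below `threshold`."""
    below = np.nonzero(curve < threshold)[0]
    if below.size == 0:
        return 2.0 * pixel_a                       # never crosses: Nyquist-limited
    i = below[0]
    if i == 0:
        return box_px * pixel_a / shells[0]
    # Linear interpolation between the last shell above and the first below.
    s0, s1 = shells[i - 1], shells[i]
    c0, c1 = curve[i - 1], curve[i]
    s = s0 + (c0 - threshold) * (s1 - s0) / (c0 - c1)
    return box_px * pixel_a / s


rng = np.random.default_rng(1)
signal = rng.normal(size=(32, 32, 32))
shells, curve = fsc_curve(signal, signal)          # identical maps: FSC = 1
print(resolution_at(shells, curve, 32, 1.09))      # Nyquist-limited case
```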
(E) Global distribution of TpPyShell1 homologous sequences in fractions from Tara Oceans sampling, 116 identified in the Ocean Gene Atlas v2.0 64 by searching the EUK_SMAGs dataset. 63 This dataset contains over 700 eukaryotic environmental genomes of diverse lineages (not just diatoms). The genomes were built from 280 billion metagenomic reads from sunlit oceans in polar, temperate, and tropical regions. (F) Maximum likelihood unrooted gene tree of TpPyShell1 (left), constructed with IQ-TREE, and an algal phylogenetic tree (right). The color in the phylogenetic tree indicates the clade to which each species in the gene tree belongs. Shapes and colors in the gene tree correspond to clades in the phylogenetic tree. The PyShell genes of T. pseudonana and P. tricornutum described in this study (TpPyShell1, 2, 3; PtPyShell1a/1b, 2a/2b) are highlighted in blue and teal, respectively.
showing the measured lateral periodicities of unrolling the SPA tube map using different inner radii. The two radii selected for further analysis are indicated in gray and red. (E) The unrolled SPA maps were filtered to 20 Å resolution by matching to the power spectrum of the in situ STA map. Isosurface threshold here and in subsequent panels = 1 standard deviation. (F and G) In-plane rotational searches were performed in ChimeraX. They compared the unrolled and filtered SPA maps (gray: unrolled with a 100-Å radius; red: unrolled with a 65-Å radius) to the postprocessed 20-Å resolution STA map. All maps were masked with a soft disk-shaped mask to facilitate the rotation and correlation operations in an unbiased manner. The initial 0° orientation was determined by rigid-body fitting in ChimeraX. (F) Views of overlaid STA and SPA filtered maps at 0° and 90° offset, with correlation scores reported below. (G) Left: diagram of the rotational search. Right: correlation scores for the full 360° search of both filtered SPA maps against the STA map. The best-scoring fit was obtained using the SPA density map unrolled with the measured sheet periodicity (65 Å radius) and 0° in-plane rotation relative to the STA map.

Figure 1. Identification of PyShell proteins in diatoms (A) Proteomics-based workflow for detecting pyrenoid proteins in P. tricornutum. Cells were cultured with (+) or without (−) photo-reactive amino acids (pAAs), photo-crosslinked in vivo with UV irradiation, and then disrupted by sonication. (B and C) The crude extracts were subjected to either (B) SDS-PAGE ("procedure A") or (C) 22%-55% (w/v) sucrose density gradient centrifugation ("procedure B"). Gel shift of the crosslinked Rubisco was detected by immunoblotting against the Rubisco large subunit (RbcL). Rubisco-containing gels or collected fractions (blue boxes in B and C) were digested by trypsin and analyzed by LC-MS/MS (for the list of candidate Rubisco interactors, see Table S1). (D) Confocal images of PtPyShell1a:GFP in P. tricornutum (top row) and TpPyShell1:GFP in T. pseudonana (bottom row). See Figure S2 for additional examples. Scale bars: 5 μm. (E) Immunogold-labeling transmission electron microscopy (TEM) of a P. tricornutum PtPyShell1a:GFP transformant probed with an anti-GFP antibody. Gold particles are indicated by black arrowheads. The darker contrast region in the boxed inset corresponds to the pyrenoid's Rubisco matrix. Scale bar: 200 nm. See also Figures S1 and S2.

Figure 2. In situ cryo-ET reveals the native architecture of the PyShell inside diatom cells (A-D) Magenta labels and arrowheads: P. tricornutum. Orange labels and arrowheads: T. pseudonana. (A and C) Two-dimensional (2D) overview slices through tomograms and (B and D) corresponding 3D segmentations (green: thylakoids, blue: Rubisco complexes, magenta or orange: PyShell). Scale bars: 100 nm. (E and F) Close-up views of native PyShells (marked by arrowheads) in both diatom species. Scale bars: 50 nm. (G and H) Comparison of pyrenoid ends. In P. tricornutum, there is a gap in the PyShell that allows entry of two specialized thylakoids into the pyrenoid. In T. pseudonana, two apposing sheets of the PyShell bind each other to seal the pyrenoid matrix. Scale bars: 50 nm. (I) Molecular details of the PyShell in raw tomograms. Left: overview revealing a stripe pattern where the PyShell twists to show its surface view (red arrowhead: particles inside the lumen of a traversing thylakoid). Center: zoom-in on the surface view, with the major stripes of the PyShell lattice marked with yellow arrowheads. Right: zoom-in on a cross-section view, showing an apparent lattice of dimers. Scale bars: 50 nm (left), 10 nm (center), 5 nm (right). (J and K) Subtomogram average (STA) of the PyShell from T. pseudonana, displayed as a 3D isosurface (J) and as 2D slices (K) showing the surface view (yellow arrowheads: major stripes of the lattice) and the cross-section view. Scale bars: 10 nm (left), 5 nm (right). See Figure S3 for additional cryo-ET images from both species. See also Figures S3 and S4 and Videos S1 and S2.

Figure 3. High-resolution in vitro structure of the T. pseudonana PyShell lattice (A) Cryo-EM density map obtained by single-particle analysis (SPA) and helical reconstruction of TpPyShell1, which assembles into a tube in vitro. Global resolution: 2.4 Å (see Figures S5E and S5F). (B) Cartoon model of the TpPyShell1 monomer. The two β-sheets (each composed of eight β-strands) and the adjacent α-helix are indicated in teal, purple, and pink, respectively. See Figure S5H for the monomer's pseudo-2-fold symmetry. (C) Models of six TpPyShell1 monomers fit into the cryo-EM density map from (A) (yellow). The minimal building block of the tube's lattice is a homodimer of TpPyShell1 proteins (outlined in orange), which are flipped and rotated 90° relative to each other. (D) Schematic representation of this lattice arrangement, with hands indicating the flipping and rotation of monomers. The pinky finger represents the C-terminal domain (C-term). (E) Surface model representation of a homodimer unit from the lattice. The C-term extends and contacts a putative pocket in the adjacent monomer (see also Figure S5J).
(F) Comparison of the in situ STA map from Figure 2J (yellow) with the in vitro SPA map (red), which has been unrolled to a flat lattice and filtered to the same 20-Å resolution as the STA map. Both isosurfaces are displayed with thresholds of 1.5 standard deviations. Left: overlay of the two maps in their best-fitting orientation (see Figure S6 for details). Right: side views of the two maps, corresponding to cross-section views through the PyShell lattice. (G) Lattice model generated by fitting TpPyShell1 monomer structures into the full-resolution unrolled SPA map. The same cross-section views are shown as in (F). See also Figures S5 and S6 and Video S3.
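Panel (F) compares maps after bringing them to a common 20-Å resolution and displaying them at a threshold expressed in standard deviations. The sketch below shows one generic way to do both for a cubic map with NumPy. The raised-cosine edge width and the assumption that the threshold is counted from the map mean are illustrative; the legend does not state those details, and Figure S6E notes that the SPA maps were actually filtered by matching to the STA power spectrum, which this simple low-pass filter does not reproduce.

```python
import numpy as np


def lowpass(volume, voxel_size, resolution, edge_width=0.005):
    """Low-pass a cubic map to `resolution` (Å) with a soft cosine edge.

    voxel_size is in Å/voxel; edge_width is in 1/Å (an assumed value).
    """
    n = volume.shape[0]
    f = np.fft.fftfreq(n, d=voxel_size)  # spatial frequencies in 1/Å
    kz, ky, kx = np.meshgrid(f, f, f, indexing="ij")
    k = np.sqrt(kx**2 + ky**2 + kz**2)
    cutoff = 1.0 / resolution
    filt = (k <= cutoff).astype(float)
    ramp = (k > cutoff) & (k < cutoff + edge_width)
    filt[ramp] = 0.5 * (1 + np.cos(np.pi * (k[ramp] - cutoff) / edge_width))
    # The filter depends only on |k|, so the filtered spectrum stays
    # Hermitian-symmetric and the inverse transform is real up to rounding.
    return np.real(np.fft.ifftn(np.fft.fftn(volume) * filt))


def sigma_threshold(volume, n_sigma):
    """Isosurface level `n_sigma` standard deviations above the map mean."""
    return volume.mean() + n_sigma * volume.std()
```

Expressing the isosurface level in standard deviations, rather than in absolute density units, makes thresholds comparable between maps with different overall scaling, such as an in vitro SPA map and an in situ STA map.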
(G-J) Morphometric quantification of pyrenoids (Rubisco matrix regions) from FIB-SEM data: (G) volume per pyrenoid, (H) pyrenoid sphericity, (I) number of pyrenoids per cell, (J) percentage of chloroplast volume occupied by pyrenoid, thylakoids, and stroma. Box plots in (G)-(I) show the median (center line), 25th to 75th percentiles (box borders), and minimum and maximum values (whiskers). Error bars in (J): standard deviation (n = 8 chloroplasts for WT, 4 for m1, and 17 for m2). (K-R) Cryo-ET of m1 and m2 cells. Overviews (K and M: tomographic slices; L and N: 3D segmentations) show higher sphericity of the Rubisco matrix and failure of specialized thylakoids to properly traverse the matrix. (O and P) Close-up tomographic slices showing the defined border of the Rubisco matrix (light blue) in the PyShell mutants. (Q and R) Close-up tomographic slices showing luminal particles (red arrowheads) inside the mislocalized specialized thylakoids. Scale bars: 100 nm. See Figure S7E for additional cryo-ET images of the mutants. See also Figure S7.
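The morphometric quantities in (G)-(J) can be derived from a closed, consistently oriented triangle mesh of each segmented Rubisco matrix. This minimal sketch assumes the standard Wadell definition of sphericity, Ψ = π^(1/3)(6V)^(2/3)/A, which equals 1 for a sphere and decreases for elongated shapes, and uses NumPy's default linear-interpolation percentiles for the box-plot statistics. The legend does not specify how volumes, surface areas, or percentiles were computed from the FIB-SEM segmentations, so all function names here are hypothetical.

```python
import numpy as np


def mesh_volume_area(vertices, faces):
    """Volume and surface area of a closed, consistently oriented triangle mesh.

    vertices: (n, 3) float array; faces: (m, 3) int array of vertex indices.
    """
    v0, v1, v2 = (vertices[faces[:, i]] for i in range(3))
    cross = np.cross(v1 - v0, v2 - v0)
    area = 0.5 * np.linalg.norm(cross, axis=1).sum()
    # Divergence theorem: sum the signed volumes of tetrahedra to the origin.
    volume = abs(np.einsum("ij,ij->i", v0, np.cross(v1, v2)).sum()) / 6.0
    return volume, area


def sphericity(volume, area):
    """Wadell sphericity: surface area of the equal-volume sphere / `area`."""
    return np.pi ** (1 / 3) * (6 * volume) ** (2 / 3) / area


def box_summary(values):
    """Median, 25th/75th percentiles, and min/max, as drawn in box plots."""
    v = np.asarray(values, dtype=float)
    q25, median, q75 = np.percentile(v, [25, 50, 75])
    return {"min": v.min(), "q25": q25, "median": median, "q75": q75, "max": v.max()}


def percent_of_chloroplast(part_volume, chloroplast_volume):
    """Share of the chloroplast volume occupied by one compartment, in %."""
    return 100.0 * part_volume / chloroplast_volume
```

Under this definition, an elongated wild-type pyrenoid scores lower than the rounder Rubisco matrices of the mutants, which is the direction of change described for panel (H).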

Figure 5. The PyShell's role in organizing diatom pyrenoid architecture, and open questions about PyShell permeability (A) In wild-type cells, the PyShell encloses the Rubisco matrix, enforcing an elongated pyrenoid shape. One or two specialized thylakoids traverse the long axis of the pyrenoid. Carbonic anhydrase (CA) inside the lumen of these thylakoids generates CO2, which diffuses across the thylakoid membranes and permeates the Rubisco matrix, enabling efficient carbon fixation. In the PyShell-deficient mutants (Figure 4), much of the Rubisco matrix remains aggregated by a linker protein but forms a rounder, more ellipsoid shape. The specialized thylakoids fail to bisect the matrix, delocalizing the source of CO2 from the pyrenoid center. This defective CO2-concentrating mechanism (CCM) underlies the mutants' high-CO2-requiring phenotype. (B) Diagrams of the CCMs in cyanobacterial carboxysomes and C. reinhardtii pyrenoids compared with our present understanding of diatom pyrenoids. All three compartments show Rubisco clustering and a local CA-generated source of CO2. There is evidence that carboxysome shells and the C. reinhardtii starch sheath may serve as diffusion barriers that limit leakage of CO2 and entry of O2. It remains to be determined whether the PyShell serves a similar diffusion-barrier function for diatom pyrenoids. Given how tightly the PyShell encapsulates the pyrenoid, it is also an open question how sugar substrates (ribulose 1,5-bisphosphate [RuBP]) and products (3-phosphoglyceric acid [3-PGA]) transit between the stroma and the Rubisco matrix.

(A-H) Left panels show pyrenoid overviews of P. tricornutum (pink) and T. pseudonana (orange), with lettered boxes indicating the corresponding pyrenoid regions detailed in the panels to the right. For P. tricornutum and T. pseudonana, respectively: (A and E) examples of the pyrenoid ends (tips), which differ between species (PyShell: orange and pink arrowheads); (B and F) pyrenoid-traversing thylakoids (pyr. thyl.), which sometimes have dense particles in the lumen (red arrowheads); (C and G) PyShell surface (side) and cross-section (top) views, with yellow arrowheads marking the major stripes of the PyShell lattice; (D and H) ordered layers of Rubisco (blue arrowheads) adjacent to the PyShell. Scale bars: 100 nm in overviews, 50 nm in all others.

(A) Cryo-ET data processing workflow. Determination of initial and final coordinates and vectors (blue/red/yellow arrows) is shown for an example tomogram. Coordinates were initially oversampled along the PyShell segmentation in MemBrain. 110 After subtomogram averaging in STOPGAP, 89 the coordinates converged to the repeat of the PyShell lattice. Only one subvolume per coordinate was retained in the final average. (B) Fourier shell correlation (FSC) resolution determination of the resulting STA map, using the 0.143 cutoff. (C) Inclined view of the in situ PyShell STA density map. Scale bar: 10 nm. (D) Angular distribution of particles contributing to the STA map. Red: more populated orientations; blue: less populated orientations.
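Panel (B) reports the resolution at which the FSC curve falls below 0.143. A minimal NumPy sketch of that calculation follows, assuming two independently processed cubic half-maps on the same grid and one-voxel-wide Fourier shells. It is a generic illustration, not the STOPGAP implementation, and it omits masking and any mask-correction steps that an averaging package may apply.

```python
import numpy as np


def fsc_curve(half1, half2, voxel_size):
    """FSC between two cubic half-maps.

    Returns (spatial frequency in 1/Å, FSC) for shells 1 .. n//2 - 1.
    """
    n = half1.shape[0]
    f1, f2 = np.fft.fftn(half1), np.fft.fftn(half2)
    freq = np.fft.fftfreq(n)  # cycles per voxel
    kz, ky, kx = np.meshgrid(freq, freq, freq, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2) * n).astype(int).ravel()

    def shell_sum(values):
        # Sum `values` over all Fourier voxels belonging to each shell.
        return np.bincount(shell, weights=values.ravel())

    num = shell_sum(np.real(f1 * np.conj(f2)))
    den = np.sqrt(shell_sum(np.abs(f1) ** 2) * shell_sum(np.abs(f2) ** 2))
    idx = np.arange(1, n // 2)
    return idx / (n * voxel_size), num[idx] / den[idx]


def resolution_at(freq, fsc, cutoff=0.143):
    """Resolution (Å) where the FSC first drops below `cutoff`.

    Interpolates linearly between the bracketing shells. If the curve never
    drops below the cutoff, the finest sampled resolution is returned.
    """
    below = np.nonzero(fsc < cutoff)[0]
    if below.size == 0:
        return 1.0 / freq[-1]
    i = below[0]
    if i == 0:
        return 1.0 / freq[0]
    frac = (fsc[i - 1] - cutoff) / (fsc[i - 1] - fsc[i])
    f = freq[i - 1] + frac * (freq[i] - freq[i - 1])
    return 1.0 / f
```

For example, `resolution_at(*fsc_curve(half_a, half_b, voxel_size))` with hypothetical half-maps `half_a` and `half_b` returns a single resolution value in Å.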

Figure S6. Comparison of the in vitro SPA map with the in situ STA map, related to Figures 3F and 3G

(A) SPA 3D map of in vitro TpPyShell1 assembled in a tube (Figures 3A and S5F), shown in cross-section view with the measured inner and outer radii indicated. (B) SPA 2D class average of in vitro TpPyShell1 assembled in a flat sheet (Figure S5B), shown in surface view with the measured lateral repeat indicated. (C) SPA 3D map unrolled with ChimeraX 92 at full resolution, using an unrolling inner radius of 100 Å (gray; approximate inner radius of the TpPyShell1 tube) or 65 Å (red; resulting in a lateral periodicity matching the TpPyShell1 flat sheet). Lateral periodicities of the two unrolled maps are indicated. (D) Table
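Panel (C) rests on a simple geometric relation. If the unrolled lateral axis is measured as arc length at the chosen radius r, two subunits separated by an azimuthal angle Δφ around the tube end up a distance p = rΔφ apart, so the apparent lateral periodicity scales linearly with r. The sketch below assumes that arc-length convention (it does not describe ChimeraX's unrolling code), and the azimuthal spacing in the example is hypothetical; the measured periodicities appear only in the figure.

```python
import math


def lateral_period(radius, delta_phi):
    """Arc-length spacing (Å) of subunits `delta_phi` radians apart at `radius` (Å)."""
    return radius * delta_phi


def radius_for_period(target_period, delta_phi):
    """Unrolling radius (Å) that reproduces a flat-sheet repeat of `target_period` Å."""
    return target_period / delta_phi


# Hypothetical example: 12 subunit columns evenly spaced around the tube.
DELTA_PHI = 2 * math.pi / 12
for r in (100.0, 65.0):
    print(f"radius {r:5.1f} Å -> lateral period {lateral_period(r, DELTA_PHI):5.1f} Å")
```

Whatever Δφ is, unrolling at 65 Å gives a lateral repeat 0.65 times that obtained at 100 Å, which is why the choice of unrolling radius can be tuned to match the measured flat-sheet repeat.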

Figure S7. Generation of ΔTpPyShell1/2 mutants, O2 evolution, and additional cryo-ET, related to Figures 4A, 4D, and 4O-4R (A) Schematic representation of simultaneous CRISPR-Cas9 targeting of both TpPyShell loci. PAM, protospacer adjacent motif. (B) Prediction of potential off-target sites of the sgRNAs in the T. pseudonana genome. (C) Resulting genomic deletions in the different TpPyShell1/2 alleles. Asterisk: this is an in-frame deletion; however, no protein product was detected in the mutant (Figure 4A). (D) qPCR analysis of the six PyShell family genes identified in T. pseudonana, showing relative expression of transcripts in the mutant strains m1 and m2 compared with expression in WT cells. Error bars: standard deviation. Cross: expression of TpPyShell3 and TpPyShell6 is already low in the WT background (Figure S1D). (E) Additional cryo-ET of Rubisco condensates in m1 and m2 cells. Light blue: Rubisco matrix; red arrowheads: densities in the thylakoid lumen. Scale bars: 100 nm. (F) Dependence of photosynthetic activity (measured as O2 evolution under 900 μmol photons m⁻² s⁻¹ constant actinic light) on DIC concentration (set by supplementing with bicarbonate) in WT cells (gray), m1 (orange), and m2 (yellow). Different symbols: three independent experiments. Cells were preconditioned in either HC or LC conditions; the WT cells preconditioned in LC have a more robust O2 evolution response at low DIC concentrations, likely due to full activation of their CCM (compare inset panels).
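Panel (D) reports transcript levels in m1 and m2 relative to WT. The legend does not state the quantification model, so the following is only a sketch assuming the widely used 2^-ΔΔCt (Livak) method with a single reference gene. The function names are hypothetical, and any Ct values passed in are placeholders, not data from this study.

```python
from statistics import mean, stdev


def fold_change(ct_target, ct_ref, ct_target_wt, ct_ref_wt):
    """2^-ΔΔCt fold change of a target transcript in a mutant relative to WT.

    Each target Ct is first normalised to a reference gene (ΔCt); the mutant
    ΔCt is then compared with the WT ΔCt (ΔΔCt).
    """
    delta_mut = ct_target - ct_ref
    delta_wt = ct_target_wt - ct_ref_wt
    return 2.0 ** -(delta_mut - delta_wt)


def summarize(replicates):
    """Mean and sample standard deviation of fold changes over replicate Ct tuples."""
    folds = [fold_change(*r) for r in replicates]
    return mean(folds), (stdev(folds) if len(folds) > 1 else 0.0)
```

Under this model, a mutant transcript whose Ct rises by three cycles relative to the reference gene, compared with WT, is reported at one-eighth of the WT level.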
Please cite this article in press as: Shimakawa et al., Diatom pyrenoids are encased in a protein shell that enables efficient CO2 fixation, Cell (2024), https://doi.org/10.1016/j.cell.2024.09.013

• All original code has been deposited at Zenodo and is publicly available. DOIs are listed in the key resources table.
• Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

Cell 187, 1-16, October 17, 2024