gRNAde: Geometric Deep Learning for 3D RNA inverse design

Computational RNA design tasks are often posed as inverse problems, where sequences are designed based on adopting a single desired secondary structure without considering 3D geometry and conformational diversity. We introduce gRNAde, a geometric RNA design pipeline operating on 3D RNA backbones to design sequences that explicitly account for structure and dynamics. Under the hood, gRNAde is a multi-state Graph Neural Network that generates candidate RNA sequences conditioned on one or more 3D backbone structures where the identities of the bases are unknown. On a single-state fixed backbone re-design benchmark of 14 RNA structures from the PDB identified by Das et al. [2010], gRNAde obtains higher native sequence recovery rates (56% on average) compared to Rosetta (45% on average), taking under a second to produce designs compared to the reported hours for Rosetta. We further demonstrate the utility of gRNAde on a new benchmark of multi-state design for structurally flexible RNAs, as well as zero-shot ranking of mutational fitness landscapes in a retrospective analysis of a recent RNA polymerase ribozyme structure.

Figure 1: The gRNAde pipeline for 3D RNA inverse design.gRNAde is a generative model for RNA sequence design conditioned on backbone 3D structure(s).gRNAde processes one or more RNA backbone graphs (a conformational ensemble) via a multi-state GNN encoder which is equivariant to 3D roto-translation of coordinates as well as conformer order, followed by conformer order-invariant pooling and autoregressive sequence decoding.

Introduction
Why RNA design?Historical efforts in computational drug discovery have focussed on designing small molecule or protein-based medicines that either treat symptoms or counter the end stages of disease processes.In recent years, there is a growing interest in designing new RNA-based therapeutics that intervene earlier in disease processes to cut off disease-causing information flow in the cell [Damase et al., 2021, Zhu et al., 2022].Notable examples of RNA molecules at the forefront of biotechnology today include mRNA vaccines [Metkar et al., 2024] and CRISPR-based genomic medicine [Doudna and Charpentier, 2014].Of particular interest for structure-based design are ribozymes and riboswitches in the untranslated regions of mRNAs [Mandal andBreaker, 2004, Leppek et al., 2018].In addition to coding for proteins (such as the spike protein in the Covid vaccine), naturally occurring mRNAs contain riboswitches that are responsible for cell-state dependent protein expression of the mRNA.Riboswitches act by 'switching' their 3D structure from an unbound conformation to a bound one in the presence of specific metabolites or small molecules.Rational design of riboswitches will enable translation to be dependent on the presence or absence of partner molecules, essentially acting as 'on-off' switches for highly targeted mRNA therapies in the future [Felletti et al., 2016, Mustafina et al., 2019, Mohsen et al., 2023].
Challenges of RNA modelling.Despite the promises of RNA therapeutics, proteins have instead been the primary focus in the 3D biomolecular modelling community.Availability of large-scale protein structures from the PDB combined with advances in deep learning for structured data [Bronstein et al., 2021, Duval et al., 2023] have revolutionized protein 3D structure prediction [Jumper et al., 2021] and rational design [Dauparas et al., 2022, Watson et al., 2023].Applications of deep learning for computational RNA design are underexplored compared to proteins due to paucity of 3D structural data [Schneider et al., 2023].Most tools for RNA design primarily focus on secondary structure without considering 3D geometry [Churkin et al., 2018] and use non-learnt algorithms for aligning 3D RNA fragments [Han et al., 2017, Yesselman et al., 2019], which can be restrictive due to the hand-crafted nature of the heuristics used.
In addition to limited 3D data for training deep learning models, the key technical challenge is that RNA is more dynamic than proteins.The same RNA can adopt multiple distinct conformational states to create and regulate complex biological functions [Ganser et al., 2019, Hoetzel and Suess, 2022, Ken et al., 2023].Computational RNA design pipelines must account for both the 3D geometric structure and conformational flexibility of RNA to engineer new biological functions.
Our contributions.This paper introduces gRNAde, a geometric deep learning-based pipeline for RNA inverse design conditioned on 3D structure, analogous to ProteinMPNN for proteins [Dauparas et al., 2022].As illustrated in Figure 1, gRNAde generates candidate RNA sequences conditioned on one or more backbone 3D conformations, enabling both single-and multi-state fixed-backbone sequence design.The model is trained on RNA structures from the PDB at 4.0Å or better resolution (12K 3D structures from 4.2K unique RNAs) [Adamczyk et al., 2022], ranging from short RNAs such as riboswitches, aptamers and ribozymes to larger ribosomal RNAs.
We demonstrate the utility of gRNAde for the following design scenarios: • Improved performance and speed over Rosetta.We compare gRNAde to Rosetta [Leman et al., 2020], the state-of-the-art physically based tool for 3D RNA inverse design, for singlestate fixed backbone design of 14 RNA structures of interest from the PDB identified by Das et al. [2010].We obtain higher native sequence recovery rates with gRNAde (56% on average) compared to Rosetta (45% on average).Additionally, gRNAde is significantly faster than Rosetta for inference; e.g.sampling 100+ designs in 1 second for an RNA of 60 nucleotides on an A100 GPU, compared to the reported hours for Rosetta.• Enables multi-state and partial RNA design, which were previously not possible with Rosetta.
gRNAde with multi-state GNNs moderately improves sequence recovery over an equivalent single-state model on a new benchmark of structurally flexible RNAs, especially for surface nucleotides which undergo positional or secondary structural changes.• Zero-shot learning of RNA fitness landscape.In a retrospective analysis of mutational fitness landscape data for an RNA polymerase ribozyme [McRae et al., 2024], we show how gRNAde's perplexity, the likelihood of a sequence folding into a backbone structure, can be used to rank mutants based on fitness in a zero-shot/unsupervised manner and outperforms random mutagenesis for improving fitness over the wild type in low throughput scenarios.2 The gRNAde pipeline 2.1 The 3D RNA inverse folding problem Figure 1 illustrates the RNA inverse folding problem: the task of designing new RNA sequences conditioned on a structural backbone.Given the 3D coordinates of a backbone structure, machine learning models must generate sequences that are likely to fold into that shape.The underlying assumption behind inverse folding (and rational biomolecule design) is that structure determines function [Huang et al., 2016].To the best of our knowledge, gRNAde is the first explicitly multi-state inverse folding pipeline, allowing users to design sequences for backbone conformational ensembles (a set of 3D backbone structures) as opposed to a single structure.Our multi-state design framework aims to better capture RNA conformational dynamics which is often important for functionality in structured RNAs [Ken et al., 2023].

RNA conformational ensembles as geometric multi-graphs
Featurization.The input to gRNAde is a set of 3D RNA backbone structures (a conformational ensemble) and the corresponding sequence of n nucleotides, for instance via PDB files.As shown in Figure 2, gRNAde builds a geometric graph representation for each input structure: 1. We start with a 3-bead coarse-grained representation of the RNA backbone, retaining the coordinates for P, C4', N1 (pyrimidine) or N9 (purine) for each nucleotide [Dawson et al., 2016].This 'pseudotorsional' representation can describe RNA backbone conformations completely in most cases while reducing the size of the torsional space [Wadley et al., 2007].
2. Each nucleotide i is assigned a node in the geometric graph with the 3D coordinate ⃗ x i ∈ R 3 corresponding to the centroid of the 3 bead atoms.Random Gaussian noise with standard deviation 0.1Å is added to coordinates during training to prevent overfitting on crystallisation artifacts, following Dauparas et al. [2022].Each node is connected by edges to its 32 nearest neighbours as measured by the pairwise distance in 3D space, ∥⃗ x i − ⃗ x j ∥ 2 .
3. Nodes are initialized with geometric features analogous to the featurization used in protein inverse folding [Ingraham et al., 2019, Jing et al., 2020]: (a) forward and reverse unit vectors along the backbone backbone from the 5' end to the 3' end, (⃗ x i+1 − ⃗ x i and ⃗ x i − ⃗ x i−1 ); and (b) unit vectors, distances, angles, and torsions from each C4' to the corresponding P and N1/N9.4. Edge features for each edge from node j to i are initialized as: (a) the unit vector from the source to destination node, ⃗    [Jing et al., 2020] to build latent representations of the local 3D geometric neighbourhood of each nucleotide within each state.Representations from multiple states for each nucleotide are then pooled together via permutation invariant Deep Sets [Zaheer et al., 2017], and fed to an autoregressive decoder to predict a probabilities over the four possible bases (A, G, C, U).The probability distribution can be sampled to design a set of candidate sequences.During training, the model is trained end-to-end by minimising a cross-entropy loss between the predicted probability distribution and the true sequence identity.
Multi-graph representation.As described in the previous section, given a set of k structures in the input conformational ensemble, each RNA backbone is featurized as a separate geometric graph , and A (k) , an n × n adjacency matrix.For clear presentation and without loss of generality, we omit edge features and use f , f ′ to denote scalar/vector feature channels.
The input to gRNAde is thus a set of geometric graphs {G (1) , . . ., G (k) } which is merged into what we term a 'multi-graph' representation of the conformational ensemble, M = (A, S, ⃗ V ), by stacking the set of scalar features {S (1) , . . ., S (k) } into one tensor S ∈ R n×k×f along a new axis for the set size k.Similarly, the set of vector features { ⃗ V (1) , . . ., ⃗ V (k) } is stacked into one tensor ⃗ V ∈ R n×k×f ′ ×3 .Lastly, the set of adjacency matrices {A (1) , . . ., A (k) } are merged via a union ∪ into one single joint adjacency matrix A.

Multi-state GNN for representation learning on conformational ensembles
The gRNAde model, illustrated in Figure 3, processes one or more RNA backbone graphs (a conformational ensemble) via a multi-state GNN encoder which is equivariant to 3D roto-translation of coordinates as well as to the ordering of conformers, followed by conformer order-invariant pooling and sequence decoding.We describe each component in the following sections.
Multi-state GNN encoder.When representing conformational ensembles as a multi-graph, each node feature tensor contains three axes: (no. of nodes, no. of conformations, feature channels).We perform message passing on the multi-graph adjacency to independently process each conformer, while maintaining permutation equivariance of the updated feature tensors along both the first (no. of nodes) and second (no. of conformations) axes.This works by operating on only the feature channels axis and generalising the PyTorch Geometric [Fey and Lenssen, 2019] message passing class to account for the extra conformations axis; see Figure 10 and the pseudocode in the Appendix.
We use multiple rotation-equivariant GVP-GNN [Jing et al., 2020] layers to update scalar features s i ∈ R k×f and vector features ⃗ v i ∈ R k×f ′ ×3 for each node i: where MSG, UPD are Geometric Vector Perceptrons, a generalization of MLPs to take tuples of scalar and vector features as input and apply O(3)-equivariant non-linear updates.The overall GNN encoder is SO(3)-equivariant due to the use of reflection-sensitive input features (dihedral angles) combined with O(3)-equivariant GVP-GNN layers.
Our multi-state GNN encoder is easy to implement in any message passing framework and can be used as a plug-and-play extension for any geometric GNN pipeline to incorporate the multi-state inductive bias.It serves as an elegant alternative to batching all the conformations, which we found required major alterations to message passing and pooling depending on downstream tasks.
Conformation order-invariant pooling.The final encoder representations in gRNAde account for multi-state information while being invariant to the permutation of the conformational ensemble.To achieve this, we perform a Deep Set pooling [Zaheer et al., 2017] over the conformations axis after the final encoder layer to reduce S ∈ R n×k×f and ⃗ A simple sum or average pooling does not introduce any new learnable parameters to the pipeline and is flexible to handle a variable number of conformations, enabling both single-state and multi-state design.It may be interesting to ablate the impact of more expressive geometric GNN layers [Joshi et al., 2023] as well as set pooling functions [Maron et al., 2020] in future iterations of gRNAde.
Sequence decoding and loss function.We feed the final encoder representations after pooling, S ′ , ⃗ V ′ , to autoregressive GVP-GNN decoder layers to predict the probability of the four possible base identities (A, G, C, U) for each node/nucleotide.Decoding proceeds according to the RNA sequence order from the 5' end to 3' end.gRNAde is trained in a self-supervised manner by minimising a cross-entropy loss (with label smoothing value of 0.05) between the predicted probability distribution and the ground truth identity for each base.During training, we use autoregressive teacher forcing [Williams and Zipser, 1989] where the ground truth base identity is fed as input to the decoder at each step, encouraging the model to stay close to the ground-truth sequence.
Sampling.When using gRNAde for inference and designing new sequences, we iteratively sample the base identity for a given nucleotide from the predicted conditional probability distribution, given the partially designed sequence up till that nucleotide/decoding step.We can modulate the smoothness or sharpness of the probability distribution by using a temperature parameter.At lower temperatures, for instance ≤1.0, we expect higher native sequence recovery and lower diversity in gRNAde's designs.At higher temperatures, the model produces more diverse designs by sampling from a smoothed probability distribution.We can also consider unordered decoding [Dauparas et al., 2022] and masking or logit biasing during sampling, depending on the design scenario at hand.This enables gRNAde to perform partial re-design of RNA sequences, retaining specified nucleotide identities while designing the rest of the sequence.Similar approaches for functional protein design have been shown to be successful in the wet lab [Sumida et al., 2024].

Evaluation metrics for designed sequences
In principle, inverse folding models can be sampled from to obtain a large number of designed sequences for a given backbone structure.Thus, in-silico metrics to determine which sequences are useful and which ones to prioritise in wet lab experiments are a critical part of the overall pipeline.We currently use the following metrics to evaluate gRNAde's designs, visualised in Appendix Figure 8: • Native sequence recovery, which is the average percentage of native (ground truth) nucleotides correctly recovered in the sampled sequences.Recovery is the most widely used metric for biomolecule inverse design [Dauparas et al., 2022].
• Secondary structure self-consistency score, where we 'forward fold' the sampled sequences using a secondary structure prediction tool (we used EternaFold [Wayment-Steele et al., 2022]) and measure the average Mathew's Correlation Coefficient (MCC) to the ground truth secondary structure, represented as a binary adjacency matrix.MCC values range between -1 and +1, with +1 represents a perfect prediction, 0 an average random prediction and -1 an inverse prediction.Similar forward folding metrics, sometimes termed 'designability', have been found to correlate with experimental success in protein design [Watson et al., 2023] 1 .
• Perplexity, which can be thought of as the average number of bases that the model is selecting from for each nucleotide.Formally, perplexity is the average exponential of the negative log-likelihood of the sampled sequences.A perfect model would have perplexity of 1, while a perplexity of 4 means that the model is making random predictions (the model outputs a uniform probability over 4 possible bases).Perplexity does not require a ground truth structure to calculate, and can also be used for ranking sequences as it is the model's estimate of the compatibility of a sequence with the input backbone structure.
3 Experimental Setup 3D RNA structure dataset.We create a machine learning-ready dataset for RNA inverse design using RNASolo [Adamczyk et al., 2022], a novel repository of RNA 3D structures extracted from solo RNAs, protein-RNA complexes, and DNA-RNA hybrids in the PDB.We used structures at resolution ≤4.0Å resulting in 4,223 unique RNA sequences for which a total of 12,011 structures are available (downloaded from RNASolo on 31 October 2023).Dataset statistics are available in Figure 9 in the Appendix, illustrating the diversity of our dataset in terms of sequence length, number of structures per sequence, as well as structural variations among conformations per sequence.
Structural clustering.In order to ensure that we evaluate gRNAde's generalization ability to novel RNAs, we cluster the 4,223 unique RNAs into groups based on structural similarity.We use qTMclust for efficiently applying US-align [Zhang et al., 2022] with a similarity threshold of TM-score > 0.45, and ensure that we train, validate and test gRNAde on structurally dissimilar clusters (see next paragraph).We also provide utilities for clustering based on sequence homology using CD-HIT [Fu et al., 2012], which leads to splits containing biologically dissimilar clusters of RNAs.
Splits to evaluate generalization.After clustering, we split the RNAs into training (∼4000 samples), validation and test sets (100 samples each) to evaluate two different design scenarios: 1. Single-state split.This split is used to fairly evaluate gRNAde for single-state design on a set of RNA structures of interest from the PDB identified by Das et al. [2010], which mainly includes riboswitches, aptamers, and ribozymes (full listing in Table 1).We identify the structural clusters belonging to the RNAs identified in Das et al. [2010] and add all the RNAs in these clusters to the test set (100 samples).The remaining clusters are randomly added to the training and validation splits.2. Multi-state split.This split is used to tests gRNAde's ability to design RNA with multiple distinct conformational states.We order the structural clusters based on median intra-sequence RMSD among available structures within the cluster2 .The top 100 samples from clusters with the highest median intra-sequence RMSD are added to the test set.The next 100 samples are added to the validation set and all remaining samples are used for training.
Validation and test samples come from clusters with at most 5 unique sequences, in order to ensure diversity.Any samples that were not assigned clusters are directly appended to the training set.We also directly add very large RNAs (> 1000 nts) to the training set, as it is unlikely that we want to design very large RNAs.We exclude very short RNA strands (< 10 nts).
Evaluation metrics.For a given data split, we evaluate models on the held-out test set by designing 16 sequences (sampled at temperature 0.1) for each test data point and computing averages for each of the three metrics described in Section 2.4: native sequence recovery, secondary structure MCC self-consistency score, and perplexity.We employ early stopping by reporting test set performance for the model checkpoint for the epoch with the best validation set recovery.Standard deviations are reported across 3 consistent random seeds for all models.
Hyperparameters.All models use 4 encoder and 4 decoder GVP-GNN layers, with 128 scalar/16 vector node features, 64 scalar/4 vector edge features, and drop out probability 0.5, resulting in 2,147,944 trainable parameters.All models are trained for a maximum of 50 epochs using the Adam optimiser with an initial learning rate of 0.0001, which is reduced by a factor 0.9 when validation performance plateaus with patience of 5 epochs.Ablation studies of key modelling decisions are available in Appendix Table 2.   [2010].gRNAde obtains higher native sequence recovery rates (56% on average) compared to Rosetta (45%) and FARNA (32%).(b) Sequence recovery per sample for Rosetta and gRNAde, shaded by gRNAde's perplexity for each sample.gRNAde's perplexity is correlated with native sequence recovery for designed sequences.Full results available in Appendix Table 1.

Single-state RNA design benchmark
We set out to compare gRNAde to Rosetta, a state-of-the-art physically based toolkit for biomolecular modelling and design [Leman et al., 2020].We reproduced the benchmark setup from Das et al.
[2010] for Rosetta's fixed backbone RNA sequence design workflow on 14 RNA structures of interest from the PDB, which mainly includes riboswitches, aptamers, and ribozymes (full listing in Table 1).We trained gRNAde on the single-state split detailed in Section 3, explicitly excluding the 14 RNAs as well as any structurally similar RNAs in order to ensure that we fairly evaluate gRNAde's generalization abilities vs. Rosetta.
gRNAde improves sequence recovery over Rosetta.In Figure 4, we compare gRNAde's native sequence recovery for single-state design with numbers taken from Das et al. [2010] for Rosetta and FARNA (a predecessor of Rosetta).gRNAde has higher recovery of 56% on average compared to 45% for Rosetta and 32% for FARNA.See Appendix Table 1 for full results.
gRNAde is significantly faster than Rosetta.In addition to superior sequence recovery, gRNAde is significantly faster than Rosetta for high-throughout design pipelines.Training gRNAde from scratch takes roughly 2-6 hours on a single A100 GPU, depending on the exact hyperparameters.Once trained, gRNAde can design hundreds of sequences for backbones with hundreds of nucleotides in ∼1 second with GPU acceleration.On the other hand, Rosetta takes order of hours to produce a single design due performing expensive Monte Carlo optimisations3 .Deep learning methods like gRNAde are arguably easier to use since no expert customization is required and setup is easier compared to Rosetta [Dauparas et al., 2022], potentially making RNA design more broadly accessible.
gRNAde's perplexity correlates with recovery.In Figure 4b, we plot native sequence recovery per sample for Rosetta vs. gRNAde, shaded by gRNAde's average perplexity for each sample.Perplexity is an indicator of the model's confidence in its own prediction (lower perplexity implies higher confidence) and appears to be correlated with native sequence recovery.In Section 4.3, we further demonstrate the utility of gRNAde's perplexity for unsupervised learning of RNA fitness landscapes.

Multi-state RNA design benchmark
Structured RNAs often adopt multiple distinct conformational states to perform biological functions [Ken et al., 2023] In order to evaluate gRNAde's multi-state design capabilities, we trained equivalent single-state and multi-state gRNAde models on the multi-state split detailed in Section 3, where the validation and test sets contain progressively more structurally flexible RNAs as measured by median RMSD among multiple available states for an RNA.
Multi-state gRNAde boosts sequence recovery.In Figure 5, we compared a single-state variant of gRNAde with otherwise equivalent multi-state models (with up to 3 and 5 states, respectively) in terms of native sequence recovery and secondary structure self-consistency score5 .Multi-state variants show marginal improvements for both metrics.As a caveat, it is worth noting that multi-state models consume more GPU memory than an equivalent single-state model during mini-batch training (approximate peak GPU usage for max.number of states set to 1: 12GB, 3: 28GB, 5: 50GB on a single A100 with otherwise equivalent hyperparameters).
Improved recovery in structurally flexible regions.In Figure 6, we evaluated gRNAde's multi-state sequence recovery at a fine-grained, per-nucleotide level.gRNAde with multi-state GNN architectures improves sequence recovery over its single-state variant on structurally flexible nucleotides, as characterised by undergoing changes in base pairing/secondary structure, higher average RMSD between 3D coordinates across states, and larger solvent accessible surface area.

Zero-shot ranking of RNA fitness landscape
Lastly, we explored the use of gRNAde as an zero-shot ranker of mutants in RNA engineering campaigns.Given the backbone structure of a wild type RNA of interest as well as a candidate set of mutant sequences, we can compute gRNAde's perplexity of whether a given sequence folds into the backbone structure.Perplexity is inversely related to the likelihood of a sequence conditioned on a structure, as described in Section 2.4.We can then rank sequences based on how 'compatible' they are with the backbone structure in order to select a subset to be experimentally validated in wet labs.
Retrospective analysis on ribozyme fitness landscape.A recent study by McRae et al. [2024] determined a cryo-EM structure of an RNA polymerase ribozyme at 5Å resolution6 , along with a fitness landscape of ∼75,000 mutant sequences for the catalytic subunit 5TU.We design a retrospective study using this data of (sequence, fitness value) pairs where we simulate an RNA engineering campaign with the aim of improving fitness over the wild type sequence.
We consider various design budgets ranging from hundreds to thousands of sequences selected for experimental validation, and compare 4 unsupervised approaches for ranking/selecting variants: (1) random choice from all ∼75,000 sequences; (2) random choice from all 449 single mutant sequences; (3) random choice from all single and double mutant sequences (as sequences with higher mutation order tend to be less fit); and (4) negative gRNAde perplexity (lower perplexity is better).For each design budget and ranking approach, we compute the expected maximum change in fitness over the wild type that could be achieved by screening as many variants as allowed in the given design budget.We run 10,000 simulations to compute confidence intervals for the 3 random baselines.
gRNAde outperforms random baselines in low design budget scenarios.Figure 7 illustrates the results of our retrospective study.At low design budgets of up to hundreds of sequences, which are relevant in the case of a low throughput fitness screening assay, gRNAde outperforms all random baselines in terms of the maximum change in fitness over the wild type.The top 10 mutants as ranked by gRNAde contains a sequence with 4-fold improved fitness, while the top 200 leads to a 5-fold improvement7 .Beyond design budgets of thousands, random selection from double mutants starts outperforming gRNAde.
Perspective.Overall, it is promising that gRNAde's perplexity is somewhat correlated with experimental fitness measurements out-of-the-box (zero-shot) and can be a useful ranker of mutant fitness in our retrospective study.In realistic design scenarios, improvements could likely be obtained by fine-tuning gRNAde on a low amount of experimental fitness data.For example, latent features from gRNAde may be finetuned or used as input to a prediction head with supervised learning on fitness landscape data.This study acts as a sanity check before committing to wet lab validation of gRNAde designs.We see random mutagenesis and directed evolution-based approaches as complementary to de-novo design and inverse folding approaches like gRNAde.Random mutagenesis can be thought of as local exploration around a wild type sequence, optimising fitness within an 'island' of activity.Structure-based design approaches are akin to global jumps in sequence space, with the potential to find new islands further away from the wild type [Huang et al., 2016].et al., 2024], we retrospectively analyse how well we can rank variants at multiple design budgets using random selection vs. gRNAde's perplexity for mutant sequences conditioned on the backbone structure.Note that gRNAde is used zero-shot here, i.e. it was not fine-tuned on any assay data.For stochastic strategies, bars indicate median values, and error bars indicate the interquartile range estimated from 10,000 simulations per strategy and design budget.At low throughput design budgets of up to ∼500 sequences, selecting mutants using gRNAde outperforms random baselines in terms of the expected maximum improvement in fitness over the wild type.In particular, gRNAde performs better than single site saturation mutagenesis, even when all single mutants are explored (449 mutants for the RNA polymerase ribozyme in McRae et al. [2024]).

Conclusion
We introduce gRNAde, a geometric deep learning pipeline for RNA sequence design conditioned on one or more 3D backbone structures.gRNAde is superior to the physically based Rosetta for 3D RNA inverse folding in terms of performance, inference speed, and ease of use.Additionally, gRNAde enables explicit multi-state design for structurally flexible RNAs which was previously not possible with Rosetta.To the best of our knowledge, gRNAde is also the first geometric deep learning architecture for multi-state biomolecule representation learning; the model is generic and can be repurposed for other learning tasks on conformational ensembles, including multi-state protein design.Key limitations of gRNAde and avenues for future development include the lack of support for multiple interacting RNA chains, or accounting for biomolecular interactions of RNAs with proteins, small molecules, and other ligands.More advanced user configurations, such as negative design against undesired conformations or modulating conformational propensities are also not currently possible.Finally, we are hopeful that advances in RNA structure determination and computationally assisted cryo-EM [Kappel et al., 2020, Bonilla andKieft, 2022] will further increase the amount of RNA structures available for training geometric deep learning models in the future.

A Related Work
We attempt to briefly summarise recent developments in RNA structure modelling and design, with an emphasis on deep learning-based approaches.
RNA inverse folding.Most tools for RNA inverse folding focus on secondary structure without considering 3D geometry [Churkin et al., 2018, Runge et al., 2019] and approach the problem from the lens of energy optimisation [Ward et al., 2023].Rosetta backbone re-design [Das et al., 2010] is the only energy optimisation-based approach that accounts for 3D structure.Deep neural networks such as gRNAde can incorporate 3D structural constraints and are orders of magnitude faster than optimisation-based approaches; this is particularly attractive for high-throughput design pipelines as solving the inverse folding optimisation problem is NP hard [Bonnet et al., 2020].
RNA structure design.Inverse folding models for protein design have often been coupled with backbone generation models which design structural backbones conditioned on various design constraints [Watson et al., 2023, Ingraham et al., 2023, Didi et al., 2023].Current approaches for RNA backbone design use classical (non-learnt) algorithms for aligning 3D RNA motifs [Han et al., 2017, Yesselman et al., 2019], which are small modular pieces of RNA that are believed to fold independently.Such algorithms may be restricted by the use of hand-crafted heuristics and we plan to explore data-driven generative models for RNA backbone design in future work.
RNA structure prediction.There have been several recent efforts to adapt protein folding architectures such as AlphaFold2 [Jumper et al., 2021] and RosettaFold [Baek et al., 2021] for RNA structure prediction [Li et al., 2023b, Wang et al., 2023, Baek et al., 2024].A previous generation of models used GNNs as ranking functions together with Rosetta energy optimisation [Watkins et al., 2020, Townshend et al., 2021].None of these architectures aim at capturing conformational flexibility of RNAs, unlike gRNAde which represents RNAs as multi-state conformational ensembles.Neither can structure prediction tools be used for RNA design tasks as they are not generative models.
RNA language models.Self-supervised language models have been developed for predictive and generative tasks on RNA sequences, including general-purpose models such as RNA FM [Chen et al., 2022] and RiNaLMo [Penic et al., 2024] as well as mRNA-specific CodonBERT [Li et al., 2023a].RNA sequence data repositories are orders of magnitude larger than those for RNA structure (eg.RiNaLMo is trained on 36 million sequences).However, standard language models can only implicitly capture RNA structure and dynamics through sequence co-occurence statistics, which can pose a chellenge for designing structured RNAs such as riboswitches, aptamers, and ribozymes.RibonanzaNet [He et al., 2024] represents a recent effort in developing structure-informed RNA language models by supervised training on experimental readouts from chemical mapping, although RibonanzaNet cannot be used for RNA design.Inverse folding methods like gRNAde are language models conditioned on 3D structure, making them a natural choice for structure-based design.

B FAQs on using gRNAde
How to chose the number of states to provide as input to gRNAde?In general, this would depend on the design objective.For instance, designing riboswitches may necessitate multi-state design, while a single-state pipeline may be more sensible for locking an aptamer into its bound conformation [Yesselman et al., 2019].Note that it may be possible to benefit from multi-state gRNAde models even when performing single-state design by using slightly noised variations of the same backbone structure as an input conformational ensemble.
How to prioritise or chose amongst designed sequences?We have currently provided 3 evaluation metrics: native sequence recovery, secondary structure self-consistency score and perplexity, towards this end.We suspect that recovery may not be the ideal choice, except for design scenarios where we require certain regions of the RNA to be conserved or native-like.Self-consistency score may be a more holistic evaluation metric as it accounts for alternative base pairings leading to similar structures, but may inherit the limitations of the secondary structure prediction method used as part of its computation.In realistic design scenarios, we can pair gRNAde with another machine learning model (an 'oracle') for ranking or predicting the suitability of designed sequences for the objective (for instance, binding affinity or some other notion of fitness).We hope to conduct further experimental validation of gRNAde designs in the wet lab in order to better understand these tradeoffs.(c) Average pairwise RMSD per sequence.For 1,547 sequences with multiple structures, there is significant structural diversity among conformations.On average, the pairwise C4' RMSD among the set of structures for a sequence is greater than 1Å.Table 2: Aggregated benchmark results for gRNAde.Split: Single-and multi-state splits are described in Section 3; models trained on 'All data' use all RNASolo samples for training and are evaluated on the multi-state test set, solely for the purpose of releasing the best possible gRNAde checkpoints for real-world usage.Model: 'AR' implies autoregressive decoding (described in Section 2.3), while 'NAR' implies non-autoregressive, one-shot decoding using an MLP.AR models have significantly higher self-consistency scores than NAR, as autoregressive decoding can condition predictions at each iteration on past predictions while one-shot decoding cannot.Max.#states: Multi-state models marginally improve native sequence recovery over an equivalent single state variant, even for the single-state benchmark.Max.train RNA length: Limiting the maximum length of RNAs used for training often leads to lower sequence recovery and higher self-consistency scores.Results in the main paper are reported for models shaded in gray which are all trained on maximum length of 5000 nucleotides.

Figure 2 :
Figure 2: gRNAde featurizes RNA backbone structures as 3D geometric graphs.Each RNA nucleotide is a node in the graph, consisting of 3 coarse-grained beads for the coordinates for P, C4', N1 (pyrimidines) or N9 (purines) which are used to compute initial geometric features and edges to nearest neighbours in 3D space.Backbone chain figure adapted from Ingraham et al. [2019].
encoded by 32 radial basis functions; and (c) the distance along the backbone, j − i, encoded by 32 sinusoidal positional encodings.

Figure 3 :
Figure 3: gRNAde model architecture.One or more RNA backbone geometric graphs are encoded via a series of SE(3)-equivariant Graph Neural Network layers[Jing et al., 2020] to build latent representations of the local 3D geometric neighbourhood of each nucleotide within each state.Representations from multiple states for each nucleotide are then pooled together via permutation invariant Deep Sets[Zaheer et al., 2017], and fed to an autoregressive decoder to predict a probabilities over the four possible bases (A, G, C, U).The probability distribution can be sampled to design a set of candidate sequences.During training, the model is trained end-to-end by minimising a cross-entropy loss between the predicted probability distribution and the true sequence identity.

Figure 4 :
Figure 4: gRNAde compared to Rosetta for single-state design.(a) We benchmark native sequence recovery of gRNAde, Rosetta, and FARNA on 14 RNA structures of interest identified by Das et al.[2010].gRNAde obtains higher native sequence recovery rates (56% on average) compared to Rosetta (45%) and FARNA (32%).(b) Sequence recovery per sample for Rosetta and gRNAde, shaded by gRNAde's perplexity for each sample.gRNAde's perplexity is correlated with native sequence recovery for designed sequences.Full results available in Appendix Table1.

Figure 6 :
Figure 6: Per-nucleotide sequence recovery for multi-state design.Multi-state gRNAde shows improved sequence recovery over a single-state model for structurally flexible regions of RNAs, as characterised by nucleotides that tend to undergo changes in base pairing (left), nucleotides with greater average solvent accessible surface area (centre), and nucleotides with higher average RMSD (right) across multiple states.Marginal histograms in blue show the distribution of values.

Figure 7 :
Figure 7: Retrospective study of gRNAde for ranking ribozyme mutant fitness.Using the backbone structure and mutational fitness landscape data from an RNA polymerase ribozyme [McRaeet al., 2024], we retrospectively analyse how well we can rank variants at multiple design budgets using random selection vs. gRNAde's perplexity for mutant sequences conditioned on the backbone structure.Note that gRNAde is used zero-shot here, i.e. it was not fine-tuned on any assay data.For stochastic strategies, bars indicate median values, and error bars indicate the interquartile range estimated from 10,000 simulations per strategy and design budget.At low throughput design budgets of up to ∼500 sequences, selecting mutants using gRNAde outperforms random baselines in terms of the expected maximum improvement in fitness over the wild type.In particular, gRNAde performs better than single site saturation mutagenesis, even when all single mutants are explored (449 mutants for the RNA polymerase ribozyme inMcRae et al. [2024]).

Figure 10 :
Figure 10: Multi-graph tensor representation of RNA conformational ensembles.Listing 1: Pseudocode for multi-state GNN encoder

Figure 8 :
Figure 8: In-silico evaluation metrics for gRNAde designed sequences.We consider (1) the percentage of native nucleotides recovered in designed samples, (2) self-consistency of 'forward folded' secondary structures of designs with the native secondary structure, as well as (3) perplexity of the model's prediction (not shown in the figure).
Number of structures per sequence.The dataset covers a wide range of RNA conformation ensembles, with on average 3 structures per sequence.There are multiple structures available for 1,547 sequences.The remaining 2,676 sequences have one corresponding structure.
RMSD among structures (Å) (d) Bivariate distribution for sequence length vs. avg.RMSD.The joint plot illustrates how structural diversity (measured by avg.pairwise RMSD) varies across sequence lengths.We notice similar structural variations regardless of sequence length.

Figure 9 :
Figure9: RNASolo data statistics.We plot histograms to visualise the diversity of RNAs available in terms of (a) sequence length, (b) number of structures available per sequence, as well as (c) structural variation among conformations for those RNA that have multiple structures.The bivariate distribution plot (d) for sequence length vs. average pairwise RMSD illustrates structural diversity regardless of sequence lengths.
[Stagno et al., 2017switches adopt at least two distinct functional conformation: a ligand bound (holo) and unbound (apo) state, which helps them regulate and control gene expression[Stagno et al., 2017].If we were to attempt single-state inverse design for such RNAs, each backbone structure may lead to a different set of sampled sequences.It is not obvious how to select the input backbone as well as designed sequence when using single-state models for multi-state design 4 .gRNAde's multi-state GNN, descibed in Section 2.3, directly 'bakes in' the multi-state nature of RNA as an architectural inductive bias and designs sequences explicitly conditioned on multiple states.

Table 1 :
Full results for Figure4comparing gRNAde to Rosetta and FARNA for single-state design.