How to predict RNA structures

Introduction

RNA folding prediction is a challenging problem that requires understanding how a nucleotide sequence adopts a stable secondary and tertiary structure. Unlike proteins (where AlphaFold has made great strides), RNA structure prediction still faces hurdles due to limited available training data and complex RNA-specific interactions (When will RNA get its AlphaFold moment? - PubMed) (Comparative analysis of RNA 3D structure prediction methods: towards enhanced modeling of RNA–ligand interactions - PMC). Contemporary approaches to RNA folding can be broadly divided into three categories: data-driven methods, physics-based simulations, and knowledge-based (human prior) strategies (Comparative analysis of RNA 3D structure prediction methods: towards enhanced modeling of RNA–ligand interactions - PMC). In practice, state-of-the-art solutions often combine elements from all three – for example, using machine learning predictions refined by molecular simulations and guided by expert knowledge. Below, we review recent advances in each area, key tools and models, and how integrating these approaches is improving RNA fold prediction. We also discuss the strengths and limitations of each strategy.

Data-Driven Methods: Machine Learning and Deep Learning

Machine Learning in RNA Secondary Structure Prediction: Data-driven algorithms have increasingly been applied to predict RNA secondary structure (base-pairing patterns). Early methods relied on thermodynamic optimization (free energy minimization) or comparative sequence analysis, but recent work leverages neural networks to learn folding rules from known structures. For example, convolutional and recurrent neural networks have been trained on databases of RNA structures (such as bpRNA or Rfam) to predict which bases pair (Machine learning for RNA 2D structure prediction benchmarked on experimental data - PubMed). Notable models include SPOT-RNA and UFold, which predict secondary structure from a single sequence, and SPOT-RNA2, which additionally uses evolutionary information. These deep learning (DL) models can outperform traditional energy-based algorithms on test sets drawn from distributions similar to their training data (Machine learning for RNA 2D structure prediction benchmarked on experimental data - PubMed). For instance, DL-based predictors achieved higher accuracies than classic programs on many Rfam families (Machine learning for RNA 2D structure prediction benchmarked on experimental data - PubMed). Some approaches integrate prior thermodynamic knowledge into the learning process: MXfold2 (Sato et al. 2021) added a thermodynamic regularization term so that the network’s folding scores stay close to the free energies given by established nearest-neighbor rules (Frontiers__Deep Learning in RNA Structure Studies). This helps keep the ML predictions physically plausible (e.g. by penalizing structures that would be energetically unfavorable). Despite these successes, a key challenge is generalization.
A benchmarking study found that when predicting structures for novel RNA families not represented in training, deep models often lost their edge, performing no better (and sometimes worse) than “shallow” or physics-based methods (Machine learning for RNA 2D structure prediction benchmarked on experimental data - PubMed). Because they overfit to the training distribution, ML models may miss unusual motifs or pseudoknots unless similar examples were present in the training data. In summary, data-driven secondary structure predictors are fast and often accurate for common RNA motifs, but they require abundant high-quality training examples and can struggle with unseen structure types.
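As a point of reference for the classical baseline that the learned predictors are compared against, the sketch below implements Nussinov-style base-pair maximization. This is a deliberately simplified stand-in for free energy minimization: real thermodynamic predictors score stacks and loops with nearest-neighbor energy parameters, which are not modeled here. The function names and the minimum hairpin loop length of 3 are illustrative assumptions, not taken from any of the cited tools.

```python
# Illustrative Nussinov-style base-pair maximization (not an energy model).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Return a maximum-size set of nested base pairs as (i, j) index tuples."""
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    # Fill the table for increasingly long subsequences seq[i..j].
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # leave i or j unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)  # pair i with j
            for k in range(i + 1, j):  # split into two independent parts
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    # Trace back one optimal solution.
    pairs, stack = [], [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
        elif dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
        elif (seq[i], seq[j]) in PAIRS and dp[i][j] == dp[i + 1][j - 1] + 1:
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return sorted(pairs)


def to_dot_bracket(n, pairs):
    """Render nested pairs in dot-bracket notation."""
    chars = ["."] * n
    for i, j in pairs:
        chars[i], chars[j] = "(", ")"
    return "".join(chars)
```

Several structures can tie for the maximum number of pairs, so the traceback returns one optimum among possibly many. That ambiguity is one reason energy-based and learned scoring functions are used in practice.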

Deep Learning for RNA Tertiary Structure: Spurred by protein-folding breakthroughs, researchers are developing deep learning methods for full 3D RNA structure prediction (“RNA’s AlphaFold moment”). Several transformer-based architectures have emerged, trained on the limited set of known RNA 3D structures (only a few thousand in the PDB (Comparative analysis of RNA 3D structure prediction methods: towards enhanced modeling of RNA–ligand interactions - PMC)). These models typically predict inter-nucleotide distances or contacts, then assemble a 3D model. For example, DeepFoldRNA uses a self-attention (transformer) network to predict pairwise geometrical restraints (distances/angles between residues), and then applies gradient-based folding to obtain 3D coordinates (The landscape of RNA 3D structure modeling with transformer networks - PMC). Another approach, trRosettaRNA, adapts the trRosetta framework to RNA by predicting inter-residue distances and orientations, which are then used to construct the 3D structure (The landscape of RNA 3D structure modeling with transformer networks - PMC). In contrast, end-to-end models like RoseTTAFoldNA (an RNA adaptation of RoseTTAFold) attempt to directly output 3D coordinates by incorporating SE(3)-equivariant transformers, blending geometric reasoning into the network (The landscape of RNA 3D structure modeling with transformer networks - PMC). Hybrid strategies also exist: DRFold and RhoFold integrate end-to-end learning with intermediate geometric restraints – for instance, simultaneously learning local structural frames and global distance constraints in a single framework (The landscape of RNA 3D structure modeling with transformer networks - PMC). These diverse deep models have been benchmarked recently. 
A 2024 systematic study compared five DL-based RNA 3D predictors (including DeepFoldRNA, DRFold, RhoFold, RoseTTAFoldNA, trRosettaRNA) against traditional fragment-assembly methods (Systematic benchmarking of deep-learning methods for tertiary RNA structure prediction__PLOS Computational Biology) (Systematic benchmarking of deep-learning methods for tertiary RNA structure prediction__PLOS Computational Biology). Overall, the machine-learning approaches achieved better accuracy in predicting the overall folds (global topology) of RNAs than physics-based methods, especially when tested on targets similar to their training set (Systematic benchmarking of deep-learning methods for tertiary RNA structure prediction__PLOS Computational Biology). DeepFoldRNA was found to produce the best models on average, with DRFold the next best (Systematic benchmarking of deep-learning methods for tertiary RNA structure prediction__PLOS Computational Biology). These networks have, for the first time, enabled de novo RNA tertiary predictions in a matter of minutes or hours, a significant speed-up over intensive physics simulations. However, their limitations mirror those in secondary structure: performance drops on “orphan” RNAs (with no close homologs or similar examples in training) (Systematic benchmarking of deep-learning methods for tertiary RNA structure prediction__PLOS Computational Biology). Most models also struggle with fine local details, such as non-canonical base pairs and small loops, which are often predicted incorrectly (Systematic benchmarking of deep-learning methods for tertiary RNA structure prediction__PLOS Computational Biology) (Comparative analysis of RNA 3D structure prediction methods: towards enhanced modeling of RNA–ligand interactions - PMC). 
Training data scarcity is a fundamental issue – there are two orders of magnitude fewer known RNA structures than proteins (Comparative analysis of RNA 3D structure prediction methods: towards enhanced modeling of RNA–ligand interactions - PMC). Consequently, current RNA DL models are forced to generalize from a small corpus, and they may not capture rare RNA motifs or the full flexibility of RNA backbones. In summary, data-driven 3D prediction is an exciting and rapidly advancing field, with new transformer-based tools showing promise. They excel at recognizing overall folding patterns learned from data (often outperforming older methods on global fold metrics (Comparative analysis of RNA 3D structure prediction methods: towards enhanced modeling of RNA–ligand interactions - PMC)), but may require further refinement to accurately model local interactions and novel folds.
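The "predict restraints, then fold by gradient descent" idea used by restraint-based predictors can be illustrated with a toy example. The sketch below assumes a full matrix of target distances between one pseudo-atom per nucleotide and minimizes the squared restraint violation with plain gradient descent. It is not the procedure of DeepFoldRNA or trRosettaRNA, which use richer restraints (orientations, torsions) and more sophisticated optimizers; all names and parameter values here are hypothetical.

```python
import numpy as np


def pairwise(x):
    """Return the distance matrix and the coordinate differences x_i - x_j."""
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1) + 1e-12)  # epsilon avoids 0/0
    return dist, diff


def restraint_error(x, target):
    """Mean absolute deviation between model and target distances."""
    dist, _ = pairwise(x)
    off_diag = ~np.eye(len(x), dtype=bool)
    return float(np.abs(dist - target)[off_diag].mean())


def fold_from_distances(target, n_steps=3000, lr=0.05, seed=0):
    """Gradient descent on sum_ij (d_ij - target_ij)^2 from a random start."""
    rng = np.random.default_rng(seed)
    n = target.shape[0]
    x = rng.normal(scale=5.0, size=(n, 3))
    for _ in range(n_steps):
        dist, diff = pairwise(x)
        err = dist - target
        np.fill_diagonal(err, 0.0)
        # d(loss)/d(x_i) is proportional to sum_j err_ij * (x_i - x_j) / d_ij
        grad = 2.0 * ((err / dist)[:, :, None] * diff).sum(axis=1)
        x -= lr * grad / n
    return x


def toy_helix(n=12):
    """Hypothetical reference coordinates: one bead per residue on a helix."""
    t = np.arange(n)
    return np.stack([9.0 * np.cos(0.6 * t), 9.0 * np.sin(0.6 * t), 2.8 * t], axis=1)
```

A typical use is to compute `target, _ = pairwise(toy_helix())`, call `fold_from_distances(target)`, and compare `restraint_error` before and after. Note that distances alone cannot distinguish a structure from its mirror image, so restraint-based pipelines need orientation or chirality terms in addition to distances.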

Training Data and Features:

Modern RNA folding networks draw on a variety of data sources for training. Known RNA structures from crystallography and NMR (as compiled in the PDB or RNA 3D Hub) provide the ground truth for supervised learning, albeit in limited numbers. Many methods supplement this with in silico-generated structures or fragments. Multiple sequence alignments (MSAs) of RNA families are an important input feature for some predictors: just as protein models leverage co-evolution, RNA models use covariation signals from homologous sequences to infer which nucleotides pair.

For instance, SPOT-RNA2 and related methods incorporate evolutionary profiles and coupling scores to guide base-pair predictions. Some 3D predictors (e.g. RoseTTAFoldNA or certain CASP15 models) also take an MSA as input to inform their predictions (Systematic benchmarking of deep-learning methods for tertiary RNA structure prediction__PLOS Computational Biology): a deep alignment can significantly boost accuracy, which suggests that homologous sequence information is worth incorporating when available (Systematic benchmarking of deep-learning methods for tertiary RNA structure prediction__PLOS Computational Biology).

In cases where an MSA is unavailable (single-sequence prediction), models rely on learned statistical patterns of RNA structure (e.g. common helices, junctions, pseudoknots present in the training set). Other features used include predicted secondary structure (it has been shown that giving a correct 2D structure as input improves 3D prediction quality (Comparative analysis of RNA 3D structure prediction methods: towards enhanced modeling of RNA–ligand interactions - PMC)), and base-pairing probabilities from secondary structure models. In fact, a recent benchmark noted that the accuracy of an RNA 3D predictor is strongly influenced by the quality of the secondary structure prediction it builds upon (Systematic benchmarking of deep-learning methods for tertiary RNA structure prediction__PLOS Computational Biology).

To mitigate data scarcity, researchers are exploring transfer learning and multitask learning. For example, some frameworks train on easier tasks like secondary structure or base-pair scoring and then fine-tune on 3D structure prediction (this multitask approach can regularize the model). Unsupervised pre-training (like RNA language models that capture sequence patterns) is another avenue to supply additional “knowledge” to data-driven methods. Despite these innovations, a consensus in the field is that purely data-driven RNA folding still falls short of the accuracy achieved in protein folding (When will RNA get its AlphaFold moment? - PubMed). It likely will require either much more data (e.g. high-throughput structure determination or large repositories of RNA chemical probing data) or new architectures that can generalize from fewer examples (When will RNA get its AlphaFold moment? - PubMed). In practice, many pipelines now combine ML predictions with physics-based refinement or known structural motifs to compensate for these shortcomings.

Pros & Cons of Data-Driven Approaches:

Data-driven methods (ML/DL) offer rapid predictions once trained, and they can capture complex sequence-structure relationships automatically from data. They have made RNA secondary structure prediction almost instantaneous and moderately improved 3D structure accuracy on average (Systematic benchmarking of deep-learning methods for tertiary RNA structure prediction__PLOS Computational Biology). Another advantage is that they can pick up subtle signals (covariation, sequence motifs, etc.) that are present in known RNAs but hard to encode as explicit rules.

However, their dependency on training data means they may be biased towards frequent motifs and may miss atypical folds. They are also essentially “black boxes,” so interpreting why a certain fold is predicted can be difficult – though efforts exist to interpret attention weights or feature attributions in these models (Frontiers__Deep Learning in RNA Structure Studies). Finally, ML models do not explicitly ensure physical realism – they might predict a structure that is geometrically plausible but not the lowest free energy state, or even one that violates stereochemical constraints (though adding physics-inspired regularization, as done by MXfold2 or some 3D models, helps reduce such errors (Frontiers__Deep Learning in RNA Structure Studies)). Overall, data-driven methods are powerful for leveraging existing knowledge and are rapidly improving, but they benefit greatly from integration with other approaches to handle cases beyond their training experience.

Physics-Based Dynamics Simulation: Molecular Dynamics for RNA Folding

All-Atom MD Simulations:

Physics-based methods approach RNA folding by simulating the molecule’s motions under physical force fields. All-atom molecular dynamics (MD) uses detailed atomic force fields (e.g. AMBER RNA parameters) to model the RNA in an aqueous ionic environment and integrates Newton’s equations of motion. In principle, MD can capture the entire folding trajectory and all intermediate states of an RNA, providing an atomically detailed view of the folding process. In practice, however, straightforward MD of RNA folding is extremely challenging. RNA folding landscapes are rugged, with many local minima separated by high energy barriers (Exploring the energy landscape of riboswitches using collective variables based on tertiary contacts - PMC). Standard MD simulation is often not ergodic on accessible timescales, meaning a simulation can get trapped in a metastable state and never sample the native fold within the available simulated time (typically nanoseconds to microseconds on conventional hardware). Moreover, the computational cost of MD grows quickly with system size: folding even a small RNA (say 30 nucleotides) in all-atom detail might require millions of CPU/GPU hours to reach the native state, which is generally infeasible. As a result, direct MD folding simulations are usually limited to short RNAs (e.g. small hairpins or tetraloops) or to refining already near-native structures. When applied, all-atom MD has illuminated certain folding pathways: for instance, the spontaneous folding of a UUCG tetraloop has been observed in microsecond-scale simulations, validating aspects of the folding mechanism. But longer or more complex RNAs (with pseudoknots, multi-helix junctions, etc.) are beyond the reach of brute-force MD given current computing power.
Another practical use of MD in structure prediction is refinement: one can take a coarse model (from either prediction or homology) and run short MD simulations (with or without restraints) to relax clashes and improve local geometry. MD refinement can correct certain local errors in automated models, as recent studies noted that simple energy minimization or short dynamics greatly improved geometrical accuracy of predicted RNA models (Comparative analysis of RNA 3D structure prediction methods: towards enhanced modeling of RNA–ligand interactions - PMC). However, one must be cautious – RNA force fields still have limitations; unrestrained MD can sometimes drift a model away from the native state if the force field has biases (e.g. overstabilizing certain loop conformations).

Enhanced Sampling Techniques:

To overcome timescale limitations, researchers employ enhanced sampling MD methods that accelerate the exploration of RNA conformational space. One major class is Replica Exchange Molecular Dynamics (REMD), also known as parallel tempering. In temperature-REMD, multiple copies of the RNA simulation are run in parallel at different temperatures, occasionally exchanging configurations (Exploring the energy landscape of riboswitches using collective variables based on tertiary contacts - PMC). High-temperature replicas cross energy barriers more readily (unfolding and refolding), while low-temperature replicas refine stable structures; the exchanges allow each replica to sample a broader portion of the energy landscape than it would at a fixed temperature. Variants like Replica Exchange with Solute Tempering (REST2) apply the effective tempering to the RNA solute only, leaving the solvent at the base temperature, which greatly reduces the number of replicas needed for a solvated system (Exploring the energy landscape of riboswitches using collective variables based on tertiary contacts - PMC). REMD has been widely used to study RNA hairpins and small riboswitch domains (Exploring the energy landscape of riboswitches using collective variables based on tertiary contacts - PMC). A critical consideration in REMD is ensuring sufficient overlap between the potential-energy distributions of adjacent temperatures so that exchanges are accepted; otherwise the replicas do not mix and the sampling gains are limited. With a well-calibrated temperature ladder, REMD can yield ensembles from which the lowest free-energy structure (the putative native fold) can be identified.
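The exchange step in temperature-REMD can be written down compactly. The sketch below shows the standard Metropolis acceptance rule for swapping two replicas, p = min(1, exp[(β_i − β_j)(E_i − E_j)]) with β = 1/(k_B T), together with a geometrically spaced temperature ladder, which is a common starting point rather than a tuned choice. The function names are illustrative, and energies are assumed to be potential energies in kcal/mol.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def geometric_ladder(t_min, t_max, n_replicas):
    """Temperatures spaced by a constant ratio between t_min and t_max."""
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** k for k in range(n_replicas)]


def exchange_probability(e_i, e_j, t_i, t_j):
    """Metropolis probability of swapping configurations between two replicas.

    e_i, e_j: potential energies of the configurations currently held at
    temperatures t_i and t_j.
    """
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    if delta >= 0.0:
        return 1.0  # the swap lowers the combined Boltzmann cost
    return math.exp(delta)
```

If the colder replica currently holds the higher-energy configuration, the swap is always accepted; otherwise acceptance decays exponentially with the energy gap, which is why adjacent temperatures must be close enough for their energy distributions to overlap.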

Another powerful approach is metadynamics.

In metadynamics, one defines a few collective variables (CVs) that capture key folding progress coordinates (for example, the fraction of native contacts formed, or the distance between helix junctions) (Exploring the energy landscape of riboswitches using collective variables based on tertiary contacts - PMC). The simulation then periodically adds a bias potential pushing the system away from already sampled values of these CVs (Exploring the energy landscape of riboswitches using collective variables based on tertiary contacts - PMC). Over time, this “fills” energy wells and forces the RNA to escape local minima, thereby mapping out the free energy surface along those CVs. Metadynamics has been applied, for instance, to riboswitch aptamers to explore transitions between open and folded states (Exploring the energy landscape of riboswitches using collective variables based on tertiary contacts - PMC). The success of this method depends on choosing CVs that truly describe the slow degrees of freedom in folding, a non-trivial task for RNA, where one must capture base-pairing, tertiary contacts, and ion-binding states. Recent studies have designed specialized CVs, such as a collective variable tracking multiple tertiary contacts simultaneously, to drive folding of complex RNAs in metadynamics (Exploring the energy landscape of riboswitches using collective variables based on tertiary contacts - PMC).
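The hill-filling mechanism is easiest to see on a one-dimensional toy problem. The sketch below runs overdamped Langevin dynamics on a double-well potential U(x) = barrier · (x² − 1)², with x itself acting as the collective variable, and deposits a Gaussian hill at the current position every `stride` steps. This is plain (not well-tempered) metadynamics on an assumed model potential, with hypothetical parameter values and reduced units. It is not an RNA simulation and does not use any MD package.

```python
import numpy as np


def well_force(x, barrier=5.0):
    """Force -dU/dx for the double well U(x) = barrier * (x^2 - 1)^2."""
    return -4.0 * barrier * x * (x * x - 1.0)


def bias_potential(x, centers, height, sigma):
    """Sum of the Gaussian hills deposited so far, evaluated at x."""
    c = np.asarray(centers, dtype=float)
    return float(np.sum(height * np.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))))


def bias_force(x, centers, height, sigma):
    """Force from the bias; it pushes the walker away from hill centers."""
    c = np.asarray(centers, dtype=float)
    g = height * np.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))
    return float(np.sum(g * (x - c)) / sigma ** 2)


def run_metadynamics(n_steps=20000, dt=1e-3, kT=1.0, stride=100,
                     height=0.5, sigma=0.2, x0=-1.0, seed=0):
    """Overdamped Langevin walker with periodic hill deposition."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=n_steps) * np.sqrt(2.0 * kT * dt)
    x, centers = x0, []
    traj = np.empty(n_steps)
    for step in range(n_steps):
        force = well_force(x) + bias_force(x, centers, height, sigma)
        x = x + force * dt + noise[step]
        traj[step] = x
        if (step + 1) % stride == 0:
            centers.append(x)  # deposit a hill at the current CV value
    return traj, centers
```

Once the accumulated bias has flattened the landscape, the negative of `bias_potential` evaluated on a grid approximates the free energy profile along x up to an additive constant. The same logic applies to RNA, except that the CV is a function of many atomic coordinates and choosing it well is the hard part.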
Other enhanced sampling methods used in RNA folding include umbrella sampling (restraining the simulation at various fixed values of a coordinate to compute a free energy profile) (Exploring the energy landscape of riboswitches using collective variables based on tertiary contacts - PMC), adaptive biasing force methods (Exploring the energy landscape of riboswitches using collective variables based on tertiary contacts - PMC), and temperature-accelerated MD. These techniques can be combined; for example, one can run REMD where each replica is itself biased by metadynamics (well-tempered metadynamics in replicas). There are also coarse-grained simulations like the oxRNA model (which represents each nucleotide as a single rigid body with a few interaction sites) that can reach far longer effective timescales than all-atom MD for processes like RNA folding or hybridization. Coarse-grained models sacrifice atomic detail but can reveal large-scale folding mechanisms for RNAs of 100+ nucleotides.

Applications of MD in RNA Structure Prediction:

While MD is typically too slow to serve as a standalone predictor for arbitrary RNA sequences, it plays a crucial role in specific scenarios. One is validating and refining models as mentioned. Another is investigating folding kinetics and pathways, which deep learning cannot do – MD (especially with methods like Markov State Models or reinforced dynamics) can propose how a molecule transitions from unfolded to folded state. For instance, enhanced MD was used to propose pathways for a small hairpin ribozyme folding, identifying metastable intermediates consistent with experiments. Understanding such pathways can indirectly improve prediction (by informing which intermediate conformations are likely or which misfolded states to avoid). MD is also used to compute free energy differences between candidate structures – e.g. comparing two possible folds of a non-coding RNA by alchemical or thermodynamic integration methods to see which is more stable. In structure modeling pipelines, one might generate many candidate 3D structures (via fragment assembly or ML) and then run short MD-based scoring to select the most stable. In this way, physics-based scoring complements the generation step.
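When free energies have been estimated for several candidate folds, the comparison reduces to Boltzmann statistics: the relative population of fold i is proportional to exp(−G_i / RT). The short sketch below makes that conversion explicit. The free-energy values a user would pass in are assumed to come from a separate calculation (for example thermodynamic integration), which is not shown, and the function name is illustrative.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol K)


def boltzmann_populations(free_energies, temperature=310.0):
    """Equilibrium populations for states with the given free energies.

    free_energies: values in kcal/mol, one per candidate structure.
    """
    rt = R_KCAL * temperature
    g_min = min(free_energies)  # shift for numerical stability
    weights = [math.exp(-(g - g_min) / rt) for g in free_energies]
    total = sum(weights)
    return [w / total for w in weights]
```

As a rule of thumb, a free energy gap of about 1.4 kcal/mol at 310 K corresponds to roughly a tenfold population difference, so force-field errors of that size are enough to change which candidate appears dominant.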

Despite improvements, limitations of MD remain. Force fields for RNA are an active area of development, and inaccuracies in parameters (e.g. overestimating base stacking or mis-modelling backbone torsions) can lead MD astray (Comparative analysis of RNA 3D structure prediction methods: towards enhanced modeling of RNA–ligand interactions - PMC). Moreover, enhanced sampling methods require expertise to set up and interpret; a poorly chosen bias or collective variable might not actually accelerate the correct motions (or worse, could drive the RNA into unphysical conformations). Computational cost is high, although specialized hardware (GPUs) and software optimizations have made microsecond simulations more common. Summing up, physics-based simulations offer the most direct physical description of folding available and are invaluable for exploring dynamics and refining models. Their main strengths are capturing thermodynamics and kinetics (something data-driven methods lack) and not being limited by training data. However, they are typically too slow and force-field-dependent to rely on alone for de novo structure prediction of all but the smallest RNAs. The most promising use of MD in folding prediction is in tandem with other methods, e.g. using MD to evaluate or polish structures predicted by ML or inferred from homology.

Pros & Cons of Dynamics Simulation:

The advantage of MD and related simulations is that they are rooted in fundamental physics – given enough sampling, one can in principle obtain not just a single structure but the full Boltzmann ensemble of an RNA, revealing alternative folds and flexibility. They do not require prior structures or training, so they can be applied to novel RNAs (with the caveat of force field accuracy). MD can capture transient interactions, ion effects, and mechanistic pathways of folding, providing insight beyond static structure. On the downside, MD is computationally intensive and time-consuming. Even with enhanced sampling, there is no guarantee of reaching the native state for complex folds if the chosen method isn’t well-suited to the system. Results can be sensitive to force field choices and simulation setup (ion concentrations, etc.), which means predictions might require careful validation. In summary, physics-based simulation is an essential tool for RNA folding studies, especially to cross-check or refine predictions, but it is not a high-throughput solution for routine structure prediction of large RNAs. Instead, it’s often used in hybrid pipelines where its strengths compensate for the weaknesses of data-driven or knowledge-based approaches.

Integrating Human Prior Knowledge: Comparative Analysis, Expert Rules, and Reinforcement Learning

Human expertise and biological knowledge have long been central to RNA structure determination, and they continue to inform computational prediction. Comparative sequence analysis is one classical approach: if multiple homologous RNA sequences are available, experts (or algorithms) can create a multiple sequence alignment and identify covarying nucleotide pairs (positions where mutations occur in a correlated fashion, suggesting those nucleotides base-pair with each other). This was historically how many RNA secondary structures (tRNA, rRNA, etc.) were first elucidated. Modern tools automate much of this: Infernal builds covariance models for homology search and alignment, R-scape tests whether observed covariation is statistically significant, and programs such as RNAalifold (part of the ViennaRNA package) predict a consensus secondary structure from an alignment, which often surpasses single-sequence predictions in accuracy. In RNA folding contests, providing the correct secondary structure (whether from covariation or experimental probing) dramatically improves 3D predictions (Comparative analysis of RNA 3D structure prediction methods: towards enhanced modeling of RNA–ligand interactions - PMC). Thus, human-guided alignment and covariation analysis remain invaluable: they essentially inject evolution’s “knowledge” of the structure into the prediction pipeline. The drawback is that many RNAs (especially novel or engineered ones) may lack a large family of homologs to analyze; comparative methods shine when a reasonably deep family of related sequences (roughly 50 or more) is known, but for unique sequences they offer no help. Additionally, manual alignment refinement by experts is labor-intensive, though it can significantly improve covariation signals by ensuring that homologous positions are properly aligned.
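The covariation signal described above can be quantified in its simplest form as the mutual information between two alignment columns. The sketch below computes it for a toy alignment. It is only a minimal illustration: production tools use more careful statistics, background corrections, and significance testing, none of which are attempted here, and the function name and example alignment are made up for this purpose.

```python
from collections import Counter
from math import log2


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    msa: list of equal-length aligned sequences.
    """
    n = len(msa)
    col_i = Counter(seq[i] for seq in msa)
    col_j = Counter(seq[j] for seq in msa)
    joint = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi


# Toy alignment: columns 0 and 4 always form a Watson-Crick pair,
# while the middle columns are fully conserved.
TOY_MSA = ["GAAAC", "CAAAG", "AAAAU", "UAAAA"]
```

In this toy alignment, columns 0 and 4 change together while preserving pairing, which gives high mutual information, whereas the conserved middle columns give zero. Conserved positions carry no covariation signal even if they are paired, which is one reason alignments need both depth and diversity.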

Expert Knowledge and Motif Libraries:

RNA structures often contain recurrent submotifs (tetraloops, kink-turns, A-minor interactions, pseudoknots, G-quadruplexes, etc.) that an expert might recognize in a sequence. Incorporating known motifs can prevent common errors. For instance, if a predictor outputs a loop that is supposed to be a GNRA tetraloop, an expert knows this loop likely adopts a specific geometry stabilized by a sheared G-A base pair – a detail a generic algorithm might miss. Computationally, such knowledge is used in knowledge-based modeling. Tools like RNAComposer or ModeRNA use fragment libraries extracted from known RNA crystal structures. Given a secondary structure (or alignment), RNAComposer assembles 3D fragments for helices and loops from its library to quickly build a full model (Comparison of Three Computational Tools for the Prediction of RNA Tertiary Structures - PubMed) (Comparison of Three Computational Tools for the Prediction of RNA Tertiary Structures - PubMed). Rosetta’s RNA de novo pipeline (FARNA/FARFAR) similarly relies on known small motifs: it builds structures by piecing together fragments (e.g. helix fragments, loop conformations from PDB) and then uses a knowledge-based scoring function to select plausible structures (Frontiers__Prediction of the RNA Tertiary Structure Based on a Random Sampling Strategy and Parallel Mechanism). The FARFAR2 update improved this process and can incorporate secondary structure restraints – if the base pairs are known, FARFAR2 will enforce them, which greatly constrains the search (Comparative analysis of RNA 3D structure prediction methods: towards enhanced modeling of RNA–ligand interactions - PMC). 
This highlights how human-provided information (correct base-pairs) helps: in tests, methods like FARFAR2, SimRNA, or Vfold achieved much better accuracy when secondary structure restraints were given than when they had to deduce them on their own (Comparative analysis of RNA 3D structure prediction methods: towards enhanced modeling of RNA–ligand interactions - PMC). Experts may also add restraints based on experimental data (e.g. distances from FRET or chemical probing signals) to guide the folding algorithm.
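Secondary structure restraints are usually exchanged in dot-bracket notation, so a first practical step is converting that string into an explicit list of base pairs. The sketch below does this with one stack per bracket type, so that a pseudoknot written with square or curly brackets is also handled. How the resulting pairs are then passed to FARFAR2, SimRNA, or another modeling tool is tool-specific and not shown; the function name is illustrative.

```python
def parse_dot_bracket(structure):
    """Convert dot-bracket notation into a sorted list of (i, j) base pairs.

    Supports '()', '[]' and '{}' so that simple pseudoknots can be expressed.
    Raises ValueError on unbalanced or unknown symbols.
    """
    openers = {"(": ")", "[": "]", "{": "}"}
    closers = {close: open_ for open_, close in openers.items()}
    stacks = {open_: [] for open_ in openers}
    pairs = []
    for pos, ch in enumerate(structure):
        if ch in openers:
            stacks[ch].append(pos)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched '{ch}' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected symbol '{ch}' at position {pos}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)
```

For example, `parse_dot_bracket("((..))")` yields the two nested pairs of a small hairpin, and a string such as `"([..)]"` yields two crossing pairs, which is the defining feature of a pseudoknot.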

Another way expert knowledge appears is through scoring functions. Knowledge-based statistical potentials (like the BRiQ potential or others ([PDF] RNA Tertiary Structure prediction: Deep Learning vs. Statistical …)) are derived from frequencies of interactions in known RNA structures. These can be used to evaluate candidate models. Human intuition also guides parameter tuning – e.g., penalizing steric clashes or implausible backbone conformations based on known RNA geometry. In short, human prior knowledge enters predictions via curated alignments, template-based modeling, fragment libraries, and handcrafted energy terms, all of which encode insights gleaned from known structures or experiments.

Reinforcement Learning and Interactive Refinement:

Reinforcement Learning (RL) is a machine learning approach that can incorporate objectives and feedback in a way somewhat analogous to how a human trial-and-error process might work. While RL has not yet been broadly adopted for direct RNA 3D structure prediction, there are interesting exploratory projects. One example is using RL for RNA design (the inverse problem of folding). Eastman et al. (2018) trained an RL agent to design sequences that fold into a given target structure, and notably the agent rediscovered strategies that human players of the EteRNA game had developed (Solving the RNA design problem with reinforcement learning__PLOS Computational Biology) (Solving the RNA design problem with reinforcement learning__PLOS Computational Biology) – essentially learning from scratch some “rules” that experts knew, like avoiding certain sequence patterns that cause misfolds. This illustrates RL’s capacity to incorporate complex strategy (which in this case was equivalent to human prior knowledge). On the folding side, Mao et al. (2021) introduced 2dRNA-Fold, an RL + Monte Carlo Tree Search method that learns policies for sequentially forming base pairs in an RNA sequence ((PDF) Learning the Fastest RNA Folding Path Based on Reinforcement Learning and Monte Carlo Tree Search) ((PDF) Learning the Fastest RNA Folding Path Based on Reinforcement Learning and Monte Carlo Tree Search). The goal was to find the fastest folding path to reach the native secondary structure. Interestingly, after training on known structures, 2dRNA-Fold could predict secondary structures by simulating an “agent” that pairs bases step by step, rather than using global free energy minimization. Its accuracy was comparable to other secondary structure predictors, indicating that an RL policy can encode folding heuristics. 
However, the authors noted the fastest-path criterion might not mirror physical reality ((PDF) Learning the Fastest RNA Folding Path Based on Reinforcement Learning and Monte Carlo Tree Search) – it was more about solving the puzzle quickly than following natural thermodynamics. Still, this approach hints that RL could be used to explore folding mechanisms or to guide the search in tertiary structure prediction by treating it as a sequential decision process (e.g., choosing which helix to form or which loop conformation to adopt next, with a reward for reaching a low-energy state).

Human experts also directly intervene via interactive modeling: adjusting helical orientations, manually docking known substructures, or correcting obvious errors in automated models (for instance, flipping a base that was paired incorrectly). In RNA-Puzzles rounds, top-performing groups often combined automated algorithms with manual refinement steps (Comparative analysis of RNA 3D structure prediction methods: towards enhanced modeling of RNA–ligand interactions - PMC). Expert modelers might recognize that a predicted structure misses a known pseudoknot and manually introduce it, or use their judgment to discard implausible models that an algorithm couldn’t distinguish. Reinforcement learning could eventually formalize some of this expert decision-making – e.g., an agent could be trained on past modeling actions to emulate how an expert improves a structure. There is ongoing research into human-in-the-loop prediction systems where user input (such as “these two residues likely interact” or “this region is flexible”) steers the algorithm.

Combining Knowledge with Data and Physics:

The most successful RNA folding prediction efforts nowadays tend to be hybrid. They use ML predictions as a baseline, enforce known secondary structure from experiments or comparative analysis, assemble initial 3D models with fragment libraries (knowledge-based), then refine with energy minimization or short MD (physics-based). Each layer adds a form of prior knowledge or verification. For instance, a pipeline might use a deep learning model to predict an RNA’s contact map, but also incorporate covariation-derived contacts to reinforce true positive pairs. Then it could use Rosetta FARFAR2 to build 3D models consistent with those contacts, and finally run a few nanoseconds of MD on the top models to relax them. By leveraging expert knowledge (in the form of known motifs, alignments, or simply human oversight), the prediction accuracy improves. In fact, a review noted that even with the advent of automation, “user experience is a critical factor in accurately predicting RNA structures” (Comparative analysis of RNA 3D structure prediction methods: towards enhanced modeling of RNA–ligand interactions - PMC). This is a reminder that fully automated methods, while improving, are still catching up to what a skilled RNA structural biologist can do by combining multiple sources of insight. The downside is that relying on human input makes the process less scalable and more subjective. There is also a risk of bias – an expert might impose an incorrect assumption (e.g., expecting a certain motif that isn’t actually present). Reinforcement learning and other systematic approaches to encode human knowledge aim to get the best of both worlds: incorporate expert strategies, but in a rigorous, reproducible way.
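
The first step of such a pipeline, reinforcing ML-predicted contacts with covariation evidence, can be sketched as follows. This is a hedged illustration only: the function name `merge_contacts`, the additive `boost`, the `threshold`, and the example numbers are all assumptions, not the procedure of any specific published pipeline.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[int, int]


def merge_contacts(ml_probs: Dict[Pair, float], covarying: Set[Pair],
                   boost: float = 0.3, threshold: float = 0.5) -> List[Tuple[Pair, float]]:
    """Boost ML contact scores that covariation supports, then keep confident pairs."""
    restraints = []
    for pair in sorted(set(ml_probs) | covarying):
        score = ml_probs.get(pair, 0.0)  # pairs seen only by covariation start at 0
        if pair in covarying:
            score = min(1.0, score + boost)
        if score >= threshold:
            restraints.append((pair, round(score, 2)))
    return restraints


# Hypothetical inputs: ML contact probabilities and covariation-supported pairs.
ml_probs = {(0, 8): 0.9, (1, 7): 0.4, (2, 6): 0.1, (3, 5): 0.45}
covarying = {(1, 7), (2, 6)}
restraints = merge_contacts(ml_probs, covarying)
print(restraints)
```

The resulting restraint list is the kind of input a fragment-assembly or refinement stage could consume. How restraints are actually passed to a given tool depends on that tool and is not shown here.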

Pros & Cons of Human Knowledge Integration:

The clear advantage is improved accuracy and realism. Comparative analysis can identify base pairs with near-certain accuracy if enough homologs exist. Template-based modeling can instantly provide a plausible fold if a similar structure is known (this has been a huge boon in protein modeling and is useful in RNA when appropriate templates exist). Expert adjustments can fix algorithmic mistakes (e.g., resolving a mis-modeled active site in a ribozyme). Overall, human prior knowledge helps especially with local correctness: as noted in one study, non-ML methods given secondary structure input had higher precision in local interactions than unguided ML methods (Comparative analysis of RNA 3D structure prediction methods: towards enhanced modeling of RNA–ligand interactions - PMC). The main drawback is dependence on existing knowledge: if an RNA has a completely novel architecture, there may be no template or alignment to serve as a guide, and even experts might be in uncharted territory. Manual processes also scale poorly to thousands of predictions and can introduce inconsistency. Reinforcement learning approaches are still nascent in this field: promising, but not yet a standard tool in RNA structure pipelines. If not carefully designed, RL agents might learn shortcuts that do not generalize or reflect true physics (like forming unrealistic intermediate states just to achieve a goal). Nonetheless, the trend is toward encoding more biochemical knowledge into algorithms, whether via hard constraints (no backbone clash, canonical base pairs, etc.), custom reward functions, or multi-step refinement strategies that mimic an expert’s workflow.

Latest Tools and State-of-the-Art Advances

The past few years have seen rapid development of RNA folding prediction tools that blend the aforementioned strategies. Here we highlight some of the latest and most advanced methods, along with their characteristics:

AlphaFold2/AlphaFold3-Inspired Methods for RNA: After DeepMind’s AlphaFold2 revolutionized protein structure prediction, researchers speculated about a similar breakthrough for RNA. Deep learning frameworks like AlphaFold2 cannot be applied directly to RNA because of the different alphabet and the much smaller database of RNA structures. However, new efforts have emerged. Notably, DeepMind’s team introduced AlphaFold 3 in 2024, which extends the protein model to handle diverse biomolecules (including RNA and protein-RNA complexes) (The landscape of RNA 3D structure modeling with transformer networks - PMC). One comparison study reported that AlphaFold 3 incorporates molecular dynamics principles (its published architecture centers on a diffusion-based structure module) and showed that it predicts RNA 3D structures directly from sequence (Comparison of Three Computational Tools for the Prediction of RNA Tertiary Structures - PubMed). In that comparison on several RNAs, AlphaFold 3 produced structures that closely aligned with experimental ones, even accommodating common RNA modifications (Comparison of Three Computational Tools for the Prediction of RNA Tertiary Structures - PubMed). For instance, it accurately modeled the tertiary fold of a small aptamer and accepted an input with pseudouridine (Ψ) modifications (Comparison of Three Computational Tools for the Prediction of RNA Tertiary Structures - PubMed). However, AlphaFold 3 is not yet a panacea. Benchmarking studies indicate that its performance on RNAs is comparable to other top ML methods, but not overwhelmingly superior (Comparative analysis of RNA 3D structure prediction methods: towards enhanced modeling of RNA–ligand interactions - PMC). It still faces challenges, particularly on larger RNAs or those with complex loops not seen before (e.g., mispredicting some peripheral loops in a pre-miRNA test case) (Comparison of Three Computational Tools for the Prediction of RNA Tertiary Structures - PubMed). At the time of the cited review, AlphaFold 3’s implementation was not publicly available, which limited its widespread use (The landscape of RNA 3D structure modeling with transformer networks - PMC). Nonetheless, its development signals a push toward unified models that can handle proteins and RNAs together, possibly even capturing their interactions. Another AlphaFold-inspired tool is RoseTTAFoldNA (from the Baker lab and collaborators), which adapts the RoseTTAFold protein model to RNA using an SE(3)-equivariant transformer (meaning its predictions transform consistently under 3D rotations and translations) (The landscape of RNA 3D structure modeling with transformer networks - PMC). RoseTTAFoldNA is an end-to-end predictor that directly outputs 3D RNA coordinates and was one of the participants in the recent CASP/RNA-Puzzles challenge. It performed well, though it did not top the rankings, indicating that further fine-tuning or larger training sets might be needed. Overall, the “AlphaFold for RNA” race has begun, but the current consensus is that RNA’s data limitations and structural peculiarities make it a harder target (When will RNA get its AlphaFold moment? - PubMed). In the short term, researchers suggest either gathering more RNA data or developing architectures that require less data (When will RNA get its AlphaFold moment? - PubMed), as well as incorporating non-sequence information (e.g. RNA chemical probing data) to supplement sparse sequence alignments.
DeepFoldRNA and DRFold: These are two leading deep learning-based RNA 3D structure predictors that emerged around 2022–2023. DeepFoldRNA (Pearce et al.) uses transformers to predict distance matrices and then reconstructs the structure by minimizing a loss function that combines the network’s predicted distances with steric constraints (The landscape of RNA 3D structure modeling with transformer networks - PMC). It operates “coarse-to-fine,” first predicting a coarse backbone layout, then adding atomic detail. DRFold (Zhang Lab) takes a somewhat hybrid approach: it predicts local backbone conformations (using deep learning on frames) and global restraints simultaneously (Integrating end-to-end learning with deep geometrical potentials for …). DRFold introduced the idea of a “deep geometrical potential,” effectively a learned scoring function that guides assembly of the RNA. In independent benchmarks, DeepFoldRNA often achieves the lowest RMSD (best accuracy) among automated methods (Systematic benchmarking of deep-learning methods for tertiary RNA structure prediction - PLOS Computational Biology), with DRFold close behind. Both are available as open-source code or web servers, making them accessible to users. They represent the first generation of RNA-centric deep networks that are pushing the field forward.
Other Notable Tools: RhoFold is another deep learning model (mentioned in the literature alongside DRFold) that builds on a pretrained RNA language model (The landscape of RNA 3D structure modeling with transformer networks - PMC). trRosettaRNA adapts the trRosetta protein approach to RNA; it predicts inter-nucleotide geometries with a deep neural network and builds 3D models from them. E2Efold-3D (Shen et al. 2023) is an “end-to-end” method that first predicts 2D structure and then 3D structure using deep learning (The landscape of RNA 3D structure modeling with transformer networks - PMC), though it did not achieve the same level of accuracy as DeepFoldRNA or DRFold. On the other end of the spectrum, fragment-assembly tools remain in use: RNAComposer (a template/fragment assembly method) can quickly generate a model if you provide a secondary structure (Comparison of Three Computational Tools for the Prediction of RNA Tertiary Structures - PubMed). It performed well on simple cases like small aptamers in a recent comparison, but it relies heavily on the input secondary structure being correct (Comparison of Three Computational Tools for the Prediction of RNA Tertiary Structures - PubMed). Rosetta FARFAR2 (2019–2020) is the Rosetta-based pipeline that improved on the original FARNA/FARFAR. FARFAR2 can handle RNAs up to ~200 nt (with secondary structure as input) and introduced better fragment picking and a refined scoring function. It was a top performer among physics/knowledge-based methods prior to the deep learning surge. FARFAR2 excels at modeling precise geometries when given the right constraints, but without secondary structure input it sometimes failed to find the correct global fold (for example, it struggled to reproduce the canonical L-shaped tRNA in an ab initio test without knowing the secondary structure) (Comparison of Three Computational Tools for the Prediction of RNA Tertiary Structures - PubMed). SimRNA (a Monte Carlo simulated annealing tool with a coarse-grained force field) is another notable method; it is faster than Rosetta and can incorporate base-pair restraints. Vfold2 (2018, Chen et al.) is a coarse-grained thermodynamic model particularly adept at handling pseudoknots and G-quadruplexes: it enumerates possible secondary/tertiary structures based on statistical potentials and filters them by free energy. Vfold2 was highlighted for producing reasonable folds even for some complex RNAs, though at lower resolution. The BRiQ potential (by the AIchemy_RNA team) is a knowledge-based scoring approach that has been used in combination with other modeling techniques to evaluate RNA models in CASP15 (RNA Tertiary Structure prediction: Deep Learning vs. Statistical …).

In a 2024 Nucleic Acids Research study, six representative methods (DeepFoldRNA, RhoFold, BRiQ, FARFAR2, SimRNA, and Vfold2) were benchmarked side by side for their ability to model RNAs, including ligand-bound RNAs (Comparative analysis of RNA 3D structure prediction methods: towards enhanced modeling of RNA–ligand interactions - PMC). The findings echoed the earlier points: ML methods (DeepFoldRNA, RhoFold) got the global fold right more often (higher TM-score, lower overall RMSD to native), whereas knowledge-based methods with constraints (FARFAR2, SimRNA with secondary structure given) achieved better local interaction accuracy (e.g., base pairing and base stacking in the core) (Comparative analysis of RNA 3D structure prediction methods: towards enhanced modeling of RNA–ligand interactions - PMC). Interestingly, for ligand-binding pockets, even a partially correct RNA model often has the correct local shape to bind the ligand, suggesting that current methods might be sufficient for some drug-design applications if the region of interest is modeled well (Comparative analysis of RNA 3D structure prediction methods: towards enhanced modeling of RNA–ligand interactions - PMC). The same study tested AlphaFold 3 on completely novel structures and found that it performed on par with the other ML methods (Comparative analysis of RNA 3D structure prediction methods: towards enhanced modeling of RNA–ligand interactions - PMC). This is an encouraging result, but it indicates that there is still room to improve (AlphaFold 3 did not dramatically exceed the field on those blind targets). This competitive benchmarking is driving rapid improvements. Many groups are now focusing on the hardest cases: RNAs with complex topologies, or those requiring recognition of non-Watson-Crick interactions (e.g. triples, A-minor contacts), which most current algorithms miss.
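
Local interaction accuracy of the kind contrasted above with global-fold metrics is commonly summarized by comparing the set of interactions in a model with the set in the native structure. The sketch below computes precision (PPV), sensitivity, and their geometric mean, which is the usual definition of Interaction Network Fidelity (INF). It assumes interactions have already been extracted as index pairs by an annotation tool (not shown), and it is not claimed to reproduce the exact protocol of the cited study.

```python
import math
from typing import Dict, Iterable, Tuple

Pair = Tuple[int, int]


def interaction_scores(native: Iterable[Pair], model: Iterable[Pair]) -> Dict[str, float]:
    """Compare model interactions with native ones; INF = sqrt(PPV * sensitivity)."""
    native_set, model_set = set(native), set(model)
    tp = len(native_set & model_set)   # interactions present in both
    fp = len(model_set - native_set)   # predicted but absent from the native structure
    fn = len(native_set - model_set)   # native interactions the model missed
    ppv = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return {"ppv": ppv, "sensitivity": sensitivity, "inf": math.sqrt(ppv * sensitivity)}


# Hypothetical example: the model gets three of four native base pairs right
# and shifts the fourth by one position.
native_pairs = {(0, 11), (1, 10), (2, 9), (3, 8)}
model_pairs = {(0, 11), (1, 10), (2, 9), (4, 8)}
scores = interaction_scores(native_pairs, model_pairs)
print(scores)
```

A model can score well here while having a poor global RMSD, and vice versa, which is why benchmarks report both kinds of metric.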

Figure: Graphical summary of RNA structure prediction approaches. Data-driven methods (DeepFoldRNA, RhoFold, etc.), knowledge-based tools (fragment assembly like FARFAR2, SimRNA, Vfold2), and integrated frameworks (AlphaFold 3) can take an RNA sequence (with optional secondary structure input) and produce 3D models. These models are then evaluated for accuracy, including the geometry of ligand-binding sites. Combining multiple methods and using prior information often yields the best results. (Comparative analysis of RNA 3D structure prediction methods: towards enhanced modeling of RNA–ligand interactions - PMC)

In terms of practical tools, many of the methods mentioned above are available as web servers or downloadable programs. For secondary structure, tools like RNAstructure, ViennaRNA (RNAfold), LinearFold, and CONTRAfold, as well as newer ML tools like UFold and E2Efold, are accessible and often used in combination (consensus approaches can improve reliability). For 3D, the RNA-Puzzles organizers maintain a repository of methods; currently, a user might try RNAComposer for quick modeling, FARFAR2 or SimRNA given some expertise (possibly incorporating known base pairs), or newer servers like RNA-Fold (Yang Zhang’s server running DRFold) or NAFold (if AlphaFold for NA becomes available). Each tool has its own input requirements and ideal use cases, so users often refine predictions iteratively using multiple approaches.
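
As an illustration of the consensus idea for secondary structure, the sketch below takes dot-bracket strings from several predictors and keeps only base pairs supported by a strict majority. The input strings are hypothetical, and the function names are assumptions. The code handles only pseudoknot-free, balanced dot-bracket notation.

```python
from collections import Counter
from typing import List, Set, Tuple

Pair = Tuple[int, int]


def pairs_from_dot_bracket(structure: str) -> Set[Pair]:
    """Extract base pairs from a balanced, pseudoknot-free dot-bracket string."""
    stack: List[int] = []
    pairs: Set[Pair] = set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs


def consensus_structure(predictions: List[str]) -> str:
    """Keep base pairs that appear in more than half of the predictions."""
    votes = Counter(p for s in predictions for p in pairs_from_dot_bracket(s))
    result = ["."] * len(predictions[0])
    for (i, j), count in votes.items():
        if 2 * count > len(predictions):
            result[i], result[j] = "(", ")"
    return "".join(result)


# Hypothetical outputs of three different predictors for the same 9-nt sequence.
predictions = ["(((...)))", "((.....))", ".(.....)."]
consensus = consensus_structure(predictions)
print(consensus)
```

A strict majority threshold has a convenient property: two conflicting pairs (sharing a base or crossing each other) cannot both appear in more than half of the nested input structures, so the output is always a valid nested structure.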

Conclusion and Outlook

RNA folding prediction is benefitting from a convergence of data-driven algorithms, advanced simulations, and human insight. Data-driven models bring speed and a capacity to learn complex patterns from existing structures, MD simulations contribute physical rigor and detail, and expert knowledge provides guidance and sanity-checks that purely automated methods may lack. The latest research clearly shows that no single approach is sufficient for all cases – rather, the best results come from carefully orchestrating these methods. Deep learning has dramatically improved and will continue to do so as more RNA structures become available and models are refined to be less data-hungry (When will RNA get its AlphaFold moment? - PubMed). At the same time, improved force fields and faster computing are expanding the size and time limits of MD, enabling, for example, routine folding simulations of small RNAs which can validate ML predictions. On the knowledge side, community efforts like RNAcentral and Rfam are curating more RNA alignments and 3D motifs, which can be fed into predictors as priors. We also see a trend of integrative modeling – using experimental data (chemical probing, cryo-EM density for large RNAs, FRET distance restraints) alongside computation to tackle structures that purely computational methods find intractable.

In evaluating different methods, data-driven techniques excel in throughput and often in getting the rough fold correct, but they may need assistance to achieve atomic accuracy or handle exotic motifs. Physics-based methods are invaluable for final refinements and understanding the dynamics, yet they are unlikely to solve large RNA structures from scratch due to resource constraints. Human prior knowledge acts as a glue, improving each step: providing secondary structures to ML and physics methods, offering templates or analogies from known RNAs, and even inspiring algorithmic innovations (many ML architectures for RNA are informed by what experts know about RNA secondary structure hierarchy). The major pros and cons of each approach can be summarized as follows:

Data-Driven (ML/DL): Pros: Very fast predictions once trained; can implicitly learn and apply evolutionary couplings and complex dependencies; improved constantly by new data and architectures. Cons: Requires large training sets (limited for RNA); may not generalize to novel folds; can have trouble with fine details (e.g. uncommon base pairs or backbone strain); acts as a black box with limited explainability.
Dynamics Simulation (MD): Pros: Physically interpretable; explores actual folding pathways and energetics; does not require homologous data; provides an ensemble of structures and dynamics information (fluctuations, transitions). Cons: Computationally expensive; timescale issues – might miss rare transitions; results depend on force field quality; typically needs expert setup (e.g. choosing collective variables for enhanced sampling).
Human Knowledge Integration: Pros: Can greatly boost accuracy by leveraging evolution (covariation), known motifs, and logical constraints; allows correction of algorithmic errors; essential for cases with experimental data integration. Cons: Not automated – needs available homologous sequences or templates; hard to quantify the added value of “intuition”; can introduce bias or error if the prior is wrong; doesn’t scale easily to high-throughput predictions.

Looking at current frontiers, one exciting direction is reinforcement learning and adaptive sampling for RNA structure: methods that iteratively improve a model through feedback, potentially bridging the gap between static prediction and dynamic refinement. Another is the development of diffusion models and other generative models for RNA 3D structures, analogous to those emerging for protein folding. Generative models could in theory sample the distribution of possible RNA folds and even design new RNA shapes. Finally, incorporating chemical probing data (like SHAPE reactivity, which distinguishes flexible from paired regions) into ML models is likely to enhance secondary and tertiary predictions by providing experimental constraints.
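
For context on how probing data can act as a constraint, one established route in thermodynamic folding programs (not in the ML models discussed above) converts each nucleotide's SHAPE reactivity into a pseudo-free-energy term of the form m·ln(reactivity + 1) + b, applied to paired nucleotides. The sketch below uses slope and intercept values commonly quoted for this scheme (m = 2.6, b = -0.8 kcal/mol); treat them, the function name, and the example reactivities as assumptions to be checked against the documentation of whichever tool is used.

```python
import math
from typing import List


def shape_pseudo_energy(reactivity: float, m: float = 2.6, b: float = -0.8) -> float:
    """Pseudo-free-energy (kcal/mol) for pairing a nucleotide with this reactivity.

    Low reactivity gives a bonus (negative value) for pairing; high reactivity
    gives a penalty. Negative reactivities are treated here as missing data.
    """
    if reactivity < 0:
        return 0.0
    return m * math.log(reactivity + 1.0) + b


# Hypothetical reactivity profile; -999 marks a position with no data.
reactivities: List[float] = [0.0, 0.1, 1.0, 2.5, -999.0]
energies = [round(shape_pseudo_energy(r), 3) for r in reactivities]
print(energies)
```

An ML predictor could consume the same per-nucleotide signal as an extra input feature rather than as an energy term. That design choice is left open here.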

Finally, community benchmarks such as RNA-Puzzles are crucial for driving progress. They have revealed that while automated methods are catching up, expert hybrid approaches often still win (Comparative analysis of RNA 3D structure prediction methods: towards enhanced modeling of RNA–ligand interactions - PMC). The gap is narrowing each year, and knowledge from experts is steadily being distilled into algorithms. We can be optimistic that in the near future, with continued integration of data, physics, and human insight, RNA folding predictions will become more routinely reliable – perhaps not quite as solved as protein folding, but sufficiently accurate to guide biological discovery and therapeutic design for many types of RNA. Each approach on its own has limitations, but together they are moving us toward a comprehensive solution for the RNA folding problem (When will RNA get its AlphaFold moment? - PubMed).

References: Key references are cited inline, in parentheses, by article title and source.