Polar Contacts and RNA-Ligand Structural Similarity

Role of Hydrogen Bonds and Polar Interactions in RNA-Ligand Binding

Hydrogen bonds and other polar contacts play a pivotal role in RNA–ligand recognition. Analyses of known RNA–small molecule complexes show that hydrogen bonding and π-stacking interactions are among the most frequent contacts at RNA binding sites (Advances and Mechanisms of RNA–Ligand Interaction Predictions) ( fingeRNAt—A novel tool for high-throughput analysis of nucleic acid-ligand interactions - PMC ). In one survey of riboswitch and aptamer complexes, hydrogen bonds accounted for roughly one-third of all ligand contacts (about 34%), underscoring their significance in binding specificity (Advances and Mechanisms of RNA–Ligand Interaction Predictions). Notably, ligands tend to form H-bonds primarily with nucleotide bases (especially guanine) rather than the RNA backbone (Advances and Mechanisms of RNA–Ligand Interaction Predictions). This preference means that hydrogen bonds often encode key specific interactions required for recognition, as base functional groups provide unique H-bond donors/acceptors. In short, polar contacts (H-bonds, ionic bridges, etc.) are critical for high-affinity and specific RNA–ligand binding, making them a logical feature to examine when comparing binding modes.

However, the presence of hydrogen bonds alone doesn’t tell the whole story of binding. Other interaction types (π–π stacking with aromatic bases, cation–π contacts, van der Waals contacts, etc.) also contribute significantly (Advances and Mechanisms of RNA–Ligand Interaction Predictions). For example, aromatic drugs often intercalate or stack between bases in addition to H-bonding. Hydrophobic contacts, though less prevalent in RNA than in protein pockets, still play a role in ligand stabilization ( fingeRNAt—A novel tool for high-throughput analysis of nucleic acid-ligand interactions - PMC ) (Advances and Mechanisms of RNA–Ligand Interaction Predictions). Thus, while hydrogen bonds are key contributors to RNA-ligand complexes, a complete binding mode is defined by a network of interactions. Polar contacts must be considered in context with these other forces. Water- and metal-mediated contacts are another complexity: RNA binding sites are highly polar and often coordinate cations or ordered water molecules that bridge ligand–RNA interactions ( RNA-ligand molecular docking: advances and challenges - PMC ). Many docking studies note that neglecting explicit waters/ions can cause missed or mis-scored interactions ( RNA-ligand molecular docking: advances and challenges - PMC ). In summary, hydrogen bonds are central to RNA-ligand binding energetics and specificity, but they are one part of a multifaceted interaction network.

Polar Contact Metrics vs. RMSD for Structural Similarity

Because hydrogen bonds and similar contacts define how a ligand fits functionally in an RNA pocket, one might ask if matching these contacts is a good proxy for structural similarity between two complexes. Computational assessments traditionally use root-mean-square deviation (RMSD) of atomic coordinates to judge similarity of a predicted pose to a reference. RMSD is a straightforward geometric metric, but it has well-known limitations for flexible molecules. In RNA–ligand modeling, a low RMSD indicates a close overlap with the crystal pose, but a higher RMSD does not always mean the binding mode is wrong ( fingeRNAt—A novel tool for high-throughput analysis of nucleic acid-ligand interactions - PMC ). For instance, if an RNA undergoes minor rearrangement far from the ligand site or a flexible ligand tail moves, the RMSD can be high even though the key contacts in the binding pocket are reproduced ( fingeRNAt—A novel tool for high-throughput analysis of nucleic acid-ligand interactions - PMC ) ( fingeRNAt—A novel tool for high-throughput analysis of nucleic acid-ligand interactions - PMC ). Likewise, RMSD is biased by ligand size – larger ligands tend to yield higher RMSDs for the same local deviation ( fingeRNAt—A novel tool for high-throughput analysis of nucleic acid-ligand interactions - PMC ). These issues can make RMSD a noisy measure of “pose correctness.”
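The tail-movement effect described above can be sketched with made-up coordinates (not taken from any real complex). The function name `rmsd`, the 10-atom ligand, and the 6 Å tail displacement are all illustrative assumptions; the RMSD is computed in place, without superposition.

```python
import math

def rmsd(coords_a, coords_b):
    """In-place RMSD (no superposition) between two equally ordered atom lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Hypothetical 10-atom ligand laid out along x (coordinates in Å).
reference = [(float(i), 0.0, 0.0) for i in range(10)]

# Pose: the 6 core atoms overlay the reference exactly,
# while the 4 tail atoms have swung 6 Å away in y.
pose = reference[:6] + [(x, y + 6.0, z) for (x, y, z) in reference[6:]]

core_rmsd = rmsd(reference[:6], pose[:6])  # atoms that make the pocket contacts
full_rmsd = rmsd(reference, pose)          # all ligand atoms

print(f"core RMSD: {core_rmsd:.2f} Å")
print(f"full RMSD: {full_rmsd:.2f} Å")
```

In this toy case the core atoms match perfectly, yet the full-ligand RMSD comes out near 3.8 Å, above a typical 2.5 Å cutoff, so a strict RMSD threshold would reject a pose whose pocket-facing atoms are exactly right.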

Contact-based similarity metrics have emerged as an alternative that focuses on the interaction pattern rather than exact atomic positions. The idea is that if a docking pose reproduces the crucial hydrogen bonds and other contacts that the crystal structure has, it should be considered a successful prediction, even if the ligand’s orientation differs slightly. Several studies support the reliability of contact-based measures. For example, Kroemer et al. introduced Interactions-Based Accuracy Classification (IBAC), which scores a pose by how many key interactions (H-bonds, etc.) it shares with the reference complex ( fingeRNAt—A novel tool for high-throughput analysis of nucleic acid-ligand interactions - PMC ). In their tests, IBAC often disagreed with RMSD-based judgments – and in “a number of cases significant discrepancies were found between IBAC and RMSD-based classifications. Despite being more subjective, the IBAC proved to be a more meaningful measure of docking accuracy” (Assessment of docking poses: interactions-based accuracy classification (IBAC) versus crystal structure deviations - PubMed). In other words, some poses with higher RMSD still made all the right contacts and were biologically correct, and IBAC correctly accepted these where an RMSD cutoff would have failed (Assessment of docking poses: interactions-based accuracy classification (IBAC) versus crystal structure deviations - PubMed). Another approach by Ding et al. defined a Contact Mode Score (CMS), using the Matthews correlation coefficient to compare binary contact maps of the ligand with the RNA in the model vs. the crystal ( fingeRNAt—A novel tool for high-throughput analysis of nucleic acid-ligand interactions - PMC ). This contact-centric score was shown to be effective in evaluating flexible docking results ( fingeRNAt—A novel tool for high-throughput analysis of nucleic acid-ligand interactions - PMC ).
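As a rough illustration of the contact-map idea, the sketch below computes a Matthews correlation coefficient between two binary ligand–receptor contact vectors. It is written in the spirit of CMS and is not the published implementation; the 4.0 Å distance cutoff, the function names, and the toy coordinates are assumptions made for the example.

```python
import math

def contact_vector(ligand_atoms, receptor_atoms, cutoff=4.0):
    """Flattened binary contact map: 1 if a ligand/receptor atom pair lies within cutoff (Å)."""
    return [
        1 if math.dist(lig, rec) <= cutoff else 0
        for lig in ligand_atoms
        for rec in receptor_atoms
    ]

def mcc(ref, pred):
    """Matthews correlation coefficient between two equal-length binary vectors."""
    if len(ref) != len(pred):
        raise ValueError("vectors must have equal length")
    tp = sum(1 for a, b in zip(ref, pred) if a == 1 and b == 1)
    tn = sum(1 for a, b in zip(ref, pred) if a == 0 and b == 0)
    fp = sum(1 for a, b in zip(ref, pred) if a == 0 and b == 1)
    fn = sum(1 for a, b in zip(ref, pred) if a == 1 and b == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Hypothetical receptor atoms and a two-atom ligand (coordinates in Å).
receptor = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
crystal = [(0.0, 3.0, 0.0), (5.0, 3.0, 0.0)]
shifted_slightly = [(0.5, 3.0, 0.0), (5.5, 3.0, 0.0)]  # 0.5 Å slide along x
shifted_far = [(5.0, 3.0, 0.0), (10.0, 3.0, 0.0)]      # 5 Å slide along x

ref_map = contact_vector(crystal, receptor)
score_close = mcc(ref_map, contact_vector(shifted_slightly, receptor))
score_far = mcc(ref_map, contact_vector(shifted_far, receptor))
print(score_close, score_far)
```

The slightly shifted pose keeps the same contact pattern and scores 1.0, while the far-shifted pose contacts different receptor atoms and scores negative, even though both are rigid translations of the same ligand.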

Beyond custom metrics, tools now encode interaction patterns as fingerprints. For instance, fingeRNAt generates a Structural Interaction Fingerprint (SIFt) for RNA–ligand complexes (a binary string representing H-bonds, stacking, ionic contacts, etc. present) ( fingeRNAt—A novel tool for high-throughput analysis of nucleic acid-ligand interactions - PMC ) ( fingeRNAt—A novel tool for high-throughput analysis of nucleic acid-ligand interactions - PMC ). By comparing fingerprints, one can quantify similarity in binding interactions. Szulc et al. (2022) explicitly propose fingerprint-based similarity “as an alternative measure to RMSD to recapitulate complexes with similar interactions but different folding” ( fingeRNAt—A novel tool for high-throughput analysis of nucleic acid-ligand interactions - PMC ). In their approach, ligand poses that achieve the same network of contacts are clustered together, even if their atomic RMSDs differ, highlighting that a ligand can bind in a functionally equivalent way despite slight shifts in position ( fingeRNAt—A novel tool for high-throughput analysis of nucleic acid-ligand interactions - PMC ). This is especially useful in RNA, where the receptor might flex or an induced fit causes the overall geometry to differ while maintaining the critical hydrogen bonds.
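Fingerprint comparison can be sketched with the Tanimoto coefficient, a common choice for binary fingerprints. The bit strings below are invented for illustration and do not follow fingeRNAt's actual output format; the function name `tanimoto` is likewise an assumption.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two equal-length bit strings such as '1101'."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have equal length")
    both = sum(1 for a, b in zip(fp_a, fp_b) if a == "1" and b == "1")
    either = sum(1 for a, b in zip(fp_a, fp_b) if a == "1" or b == "1")
    # Two all-zero fingerprints are treated as identical here (a convention, not a rule).
    return 1.0 if either == 0 else both / either

# Hypothetical fingerprints: each bit marks one (nucleotide, interaction type) pair.
reference = "110100101000"
pose_one_lost = "110100001000"   # same pattern minus one contact
pose_different = "000011010100"  # contacts on entirely different nucleotides

sim_close = tanimoto(reference, pose_one_lost)
sim_far = tanimoto(reference, pose_different)
print(sim_close, sim_far)
```

Here the pose that loses a single contact scores 0.8 against the reference, and the pose with no shared contacts scores 0.0, regardless of how either would fare on RMSD.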

How robust are polar contacts as a similarity indicator? Generally, preserving the key polar contacts of a known binder is a strong sign that a predicted pose is meaningful. If a docked pose shares all the important H-bonds and ionic interactions with the native complex, it often implies the ligand is oriented and positioned correctly for activity. Studies have found that interaction-based metrics correlate with successful pose predictions better than RMSD alone (Assessment of docking poses: interactions-based accuracy classification (IBAC) versus crystal structure deviations - PubMed). However, one must be cautious: a pose might form one or two of the same hydrogen bonds as the real complex yet still be mispositioned in other respects. For example, a ligand could form a correct H-bond to an RNA base but rotate such that it misses other contacts, yielding a high RMSD. A case was noted in protein-ligand docking where a pose achieved a particular hydrogen bond yet had an RMSD of ~6–7 Å – clearly a largely incorrect pose despite one correct contact (illustrating that a single H-bond match is not sufficient on its own). Therefore, robust assessment often combines metrics. In practice, many evaluators require that a predicted complex not only has a low RMSD or high contact overlap, but specifically that it reproduces all or most of the known key interactions. Missing a critical H-bond is usually a sign of an incorrect pose, whereas gaining a spurious H-bond that wasn’t in the crystal may indicate an alternative binding mode that might or might not be valid. In summary, polar contact similarity is a valuable indicator of structural similarity – often more directly tied to functional correctness than RMSD – but it should be applied with nuance. The best approaches use contact-based measures in tandem with geometric criteria. 
This consensus is reflected in community assessments: for example, RNA-Puzzles evaluations of RNA–ligand models considered both RMSD and an “interaction network fidelity” score, ensuring that predicted structures captured the correct interaction network of the ligand ( fingeRNAt—A novel tool for high-throughput analysis of nucleic acid-ligand interactions - PMC ). When coordinate-based evaluation failed (in one case, ligand coordinates couldn’t be RMSD-evaluated due to format issues), interaction fingerprints were used as a fallback to judge whether the pose was likely correct ( fingeRNAt—A novel tool for high-throughput analysis of nucleic acid-ligand interactions - PMC ).
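One way to combine the two criteria is sketched below. The thresholds (2.5 Å, all key contacts required), the verdict labels, and the residue names are assumptions chosen for illustration, not a published protocol.

```python
def assess_pose(pose_contacts, key_contacts, rmsd_value,
                rmsd_cutoff=2.5, min_key_fraction=1.0):
    """Judge a pose by key-contact recovery and by RMSD, reporting both."""
    pose_contacts, key_contacts = set(pose_contacts), set(key_contacts)
    if not key_contacts:
        raise ValueError("at least one key contact is required")
    fraction = len(pose_contacts & key_contacts) / len(key_contacts)
    contacts_ok = fraction >= min_key_fraction
    geometry_ok = rmsd_value <= rmsd_cutoff
    if contacts_ok and geometry_ok:
        verdict = "consistent with reference"
    elif contacts_ok:
        verdict = "same key contacts, shifted geometry"
    elif geometry_ok:
        verdict = "close geometry, missing key contacts"
    else:
        verdict = "likely incorrect"
    return {
        "key_fraction": fraction,
        "missing": sorted(key_contacts - pose_contacts),
        "extra": sorted(pose_contacts - key_contacts),  # contacts absent from the reference
        "verdict": verdict,
    }

# Hypothetical key contacts from a reference complex: (nucleotide, interaction type).
key = {("G12", "hbond"), ("A45", "hbond"), ("U7", "ionic")}

shifted = assess_pose(key, key, rmsd_value=3.4)
one_hbond = assess_pose({("G12", "hbond"), ("C30", "hbond")}, key, rmsd_value=6.5)
print(shifted["verdict"])
print(one_hbond["verdict"], one_hbond["extra"])
```

The second call mirrors the situation described above: a single matching hydrogen bond with an RMSD above 6 Å fails both criteria, and the spurious contact is reported separately so it can be inspected rather than silently rewarded.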

In conclusion, hydrogen bonds and polar contacts are robust descriptors of binding mode similarity. Methods that compare interaction patterns (H-bond networks, contact maps) often identify correct poses that RMSD would mis-rank. Nevertheless, they work best when considering the full set of interactions (including hydrophobic contacts and sterics) rather than hydrogen bonds alone. A combined view – “does the model make the same key contacts?” and “is it in essentially the same location/orientation?” – provides the most reliable assessment of RNA–ligand structural similarity.

Docking Scores vs. Contact-Based Scoring Methods for RNA–Ligand Binding

Computational scoring functions for RNA–ligand binding can be broadly classified into docking energy scores and contact or knowledge-based scores. Each approach has strengths and weaknesses in predicting binding affinity and pose correctness.

Docking energy scores (AutoDock, Vina, Glide, etc.): These are the scoring functions used by automated docking programs, typically physics-based or empirical models originally developed for protein–ligand systems. For example, AutoDock 4’s scoring function is semi-empirical, combining van der Waals, hydrogen-bonding (Lennard-Jones 12–10) potentials, electrostatics, and desolvation terms calibrated to approximate binding free energy. Glide’s scoring (GlideScore) is an empirical force-field-based score with adjustments (penalties for desolvation, rewards for hydrophobic enclosure, hydrogen-bond geometry checks, etc.), and AutoDock Vina uses a hybrid scoring function optimized for pose prediction speed. These scores output an energy (or pseudo-energy) where more negative is “better.” In principle, a lower docking score should correlate with a more favorable (tighter) binding and often the correct pose. In practice, docking scores have limited accuracy for RNAs. RNA presents unique challenges – a highly charged backbone, flexible loops, and frequent involvement of structural water or ions – that generic scoring models struggle to capture. Indeed, benchmarking studies have shown that out-of-the-box docking programs perform modestly on RNA targets. For instance, one comparison on a test set of ~56 RNA–small molecule complexes found that AutoDock Vina and Glide could only correctly place a ligand within 2.5 Å RMSD of the native pose about 17–30% of the time (Glide ~17.8%, Vina ~29% success) ( RNA-ligand molecular docking: advances and challenges - PMC ). These success rates are significantly lower than typically seen in protein–ligand docking, indicating the scoring and sampling are less reliable for RNA. The negatively charged RNA pockets often cause docking scores to favor certain polar interactions excessively or penalize poses due to lack of explicit counter-ions. 
Moreover, many RNA binders are flexible or extended, making sampling difficult and scoring sensitive to conformation.

In terms of predicting binding affinity, docking scores are at best semi-quantitative. When tested on RNA–ligand complexes with known binding constants, even advanced scoring models reach only moderate correlation with experimental affinities (Pearson $R \approx 0.5$–$0.6$) ( RNA-ligand molecular docking: advances and challenges - PMC ). For example, a knowledge-tuned scoring function (SPA-LN) achieved $R \approx 0.58$ on 77 nucleic acid complexes from PDBbind, whereas a standard protein-trained scoring function would likely score lower ( RNA-ligand molecular docking: advances and challenges - PMC ). This underscores that while a very favorable docking score can distinguish strong binders from weak ones in coarse terms, the absolute accuracy is limited. Docking scores generally should not be taken as precise predictors of $K_d$ or $\Delta G$ for RNA targets; rather, they serve to rank-order candidates. A common strategy is to use docking scores to filter a library (virtual screening for RNA-binding compounds) and then rely on more detailed analysis or experimental testing for affinities.
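The correlation quoted above is typically computed between predicted scores and experimental binding free energies. A minimal sketch follows, assuming the standard relation $\Delta G = RT \ln K_d$ at 298.15 K; the docking scores and $K_d$ values are invented placeholders, not data from any benchmark.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal/(mol*K)

def kd_to_delta_g(kd_molar, temperature=298.15):
    """Binding free energy (kcal/mol) from a dissociation constant in mol/L."""
    return GAS_CONSTANT * temperature * math.log(kd_molar)

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical docking scores (kcal/mol-like units) and measured Kd values (mol/L).
docking_scores = [-9.1, -7.4, -8.0, -6.2, -7.9]
kd_values = [2e-8, 5e-6, 4e-7, 1e-5, 8e-5]

experimental_dg = [kd_to_delta_g(kd) for kd in kd_values]
r = pearson_r(docking_scores, experimental_dg)
print(f"Pearson R = {r:.2f}")
```

Because both quantities are more negative for tighter binding, a useful scoring function should give a positive $R$; values around 0.5 to 0.6 leave a large share of the variance in affinity unexplained.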

Strengths: Docking scores consider a broad range of physicochemical factors. They inherently account for hydrogen bond formation, steric fit, desolvation, and sometimes entropic effects via parameterization. For example, Glide and AutoDock implicitly reward hydrogen bonds (up to a certain cap) and penalize strains or clashes, which means a pose that maximizes hydrogen bonds and avoids clashes will usually score well. This multi-factor evaluation is a strength because binding affinity is indeed multi-factorial. Docking scores are also fast to compute, enabling high-throughput screening of many compounds against an RNA structure. They can differentiate obviously bad poses (e.g., ligand placed outside the pocket or with severe steric clashes) from plausible ones.

Limitations: A major limitation is inaccuracy in ranking and false positives/negatives. For RNA-ligand systems, standard scoring functions may mis-rank poses or compounds because they don’t capture certain RNA-specific interactions. For instance, many scores do not explicitly model cation–π interactions or the stabilizing effect of a divalent metal ion that might be present in the real complex. Hydrogen bonds in scoring functions are usually treated in a straightforward way; thus, a docking algorithm might place a ligand in a wrong orientation that creates an “extra” hydrogen bond to the RNA – the score improves, but the pose could be artifactual (especially if that H-bond would require an unrealistic RNA backbone twist or an unsatisfied counter-charge). Conversely, a pose that is actually correct might involve a water-mediated hydrogen bond (common in RNA pockets) which the scoring function fails to reward because it only counts direct H-bonds ( RNA-ligand molecular docking: advances and challenges - PMC ). Such a pose could be scored worse despite being the true binding mode. Flexibility is another challenge: most docking runs keep RNA rigid or semi-rigid; if the RNA needs to change conformation to accommodate the ligand (induced fit), the scoring of the “docked” pose might be poor even though that pose would be valid in the real (flexed) RNA. Finally, many docking scores are not specifically parametrized for nucleic acids – for example, AutoDock’s partial charges and solvation parameters were fit on proteins, and Glide’s training sets mostly involve protein receptors. As a result, the energetic contributions in an RNA context (like phosphate–ligand electrostatics or base stacking) may be mis-estimated.

Contact-based and knowledge-based scoring: To address the shortcomings of generic docking scores, researchers have developed scoring methods that rely on known interaction patterns or simplified contact models. These methods often either rescore poses generated by docking or even guide the docking search by emphasizing matching interactions. A simple form is a contact count or matching score – e.g., count how many hydrogen bonds a pose has to the RNA, or compare the set of contacting nucleotides to those observed in a reference complex. More sophisticated are methods like IBAC (discussed above), which requires defining “key” interactions and then checks if the pose has them ( fingeRNAt—A novel tool for high-throughput analysis of nucleic acid-ligand interactions - PMC ). IBAC essentially gives a pass/fail or a graded score based on critical contacts, and it proved more discerning than RMSD in many cases (Assessment of docking poses: interactions-based accuracy classification (IBAC) versus crystal structure deviations - PubMed). Another example is the Interaction Network Fidelity (INF) score used in RNA 3D predictions, which compares the network of all atomic contacts in a model to that in the native structure (this can include ligand contacts as well). These contact-centric approaches directly evaluate whether a predicted complex recapitulates the bonding network of the real complex, which is a strong indicator of correctness. They excel at evaluating pose accuracy (did we get the binding mode right?) but are not designed to predict the magnitude of binding affinity.
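INF is commonly defined as the geometric mean of precision and sensitivity over the sets of interactions found in the model and in the reference. The sketch below applies that definition to ligand contacts; the contact labels and the function name are hypothetical, and how contacts are detected in the first place is outside its scope.

```python
import math

def interaction_network_fidelity(reference, model):
    """INF = sqrt(precision * sensitivity) over two collections of interactions."""
    ref, mod = set(reference), set(model)
    tp = len(ref & mod)  # native interactions reproduced by the model
    fp = len(mod - ref)  # interactions present only in the model
    fn = len(ref - mod)  # native interactions the model misses
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return math.sqrt(precision * sensitivity)

# Hypothetical ligand contacts, written as "nucleotide:interaction".
native = {"G12:hbond", "A45:hbond", "U7:ionic", "A46:stack"}
model = {"G12:hbond", "A45:hbond", "A46:stack", "C30:hbond"}  # 3 recovered, 1 spurious

inf_score = interaction_network_fidelity(native, model)
print(f"INF = {inf_score:.2f}")
```

With three of four native contacts recovered and one spurious contact added, precision and sensitivity are both 0.75, so the score is 0.75. Unlike a plain count of recovered contacts, the measure also penalizes contacts that the reference does not have.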

Separate from binary contact matching, there are knowledge-based scoring functions specifically devised for RNA–ligand docking. These are derived from statistical analysis of structural databases, analogous to knowledge-based potentials in protein modeling. For instance, DrugScoreRNA (2007) and LigandRNA (2013) introduced potentials trained on RNA–ligand crystal structures ( RNALigands: a database and web server for RNA–ligand interactions - PMC ). LigandRNA uses a grid-based knowledge potential, essentially capturing favorable interaction geometries (including polar contacts and stacking) observed in known complexes (LigandRNA: computational predictor of RNA-ligand interactions - PubMed). Such a score can be used to rank docked poses; Philips et al. reported that LigandRNA’s predictions compared favorably to five other methods and that using it to rescore Dock6 poses improved identification of near-native poses (LigandRNA: computational predictor of RNA-ligand interactions - PubMed). Another modern example is ITScore-NL, an iterative knowledge-based scoring function that explicitly added terms for base stacking and long-range electrostatics to better suit nucleic-acid environments ( RNA-ligand molecular docking: advances and challenges - PMC ). By including RNA-specific interactions, ITScore-NL achieved higher success rates in pose prediction (in one benchmark, correctly picking ~50% of native poses as top-ranked versus ~35% for a prior method) ( RNA-ligand molecular docking: advances and challenges - PMC ). 
Similarly, a scoring function called SPA-LN (Specificity plus Affinity for Ligand–Nucleic acid) was developed to account for RNA-ligand peculiarities; it considers not only binding affinity but also the specificity of interactions, which led to improved pose prediction success (≈54% success within 2.5 Å for top-ranked poses, roughly three times Glide’s ~18%) ( RNA-ligand molecular docking: advances and challenges - PMC ). These knowledge-based scores effectively encode the tendency of RNA to form certain interaction patterns, such as the strong preference for ligands to make hydrogen bonds to guanine’s Watson-Crick face or common stacking motifs, and thus can recognize a “plausible” pose even if it doesn’t maximize a generic energy function.

Another category on the rise is machine-learning based scoring. Instead of human-crafted energy terms, ML models learn from data which features make a pose correct or a ligand bind strongly. One such approach, AnnapuRNA, uses a coarse-grained representation of RNA–ligand complexes and trains algorithms (random forests, neural networks) to predict pose “nativeness” ( RNA-ligand molecular docking: advances and challenges - PMC ) ( RNA-ligand molecular docking: advances and challenges - PMC ). AnnapuRNA encodes contacts in a simplified way (each nucleotide as beads, ligand as pharmacophore points) and was shown to outperform many traditional scoring functions in distinguishing near-native poses ( RNA-ligand molecular docking: advances and challenges - PMC ). The advantage of ML or knowledge-based methods is that they can capture subtle patterns (e.g. a particular hydrogen bond geometry that’s especially favorable, or the penalty of burying an unsatisfied polar group) directly from data. The limitation, of course, is that they require sufficient high-quality RNA–ligand complex data for training – which is still a relatively small set – and they may not generalize beyond what they’ve seen ( RNA-ligand molecular docking: advances and challenges - PMC ).

Strengths of contact/knowledge-based scores: They tend to be more accurate in pose ranking for RNA ligands, because they incorporate the known chemistry of RNA binding. They explicitly reward the presence of interactions known to be important (like specific H-bonds, stacking) rather than just counting raw energetic contributions. This often leads to better agreement with which pose is actually correct. For example, by adding a stacking term and tuning hydrogen-bond potentials, one study was able to raise the success rate of identifying the native pose from ~35% to ~50% on a benchmark set ( RNA-ligand molecular docking: advances and challenges - PMC ). Contact-based metrics are also intuitive – they allow researchers to rationalize why a pose is scored well (“it makes the key contacts we expect”) or poorly (“it’s missing the salt bridge to the backbone phosphate, so it’s likely incorrect”). This interpretability is a strength when refining models. Moreover, these methods can be very fast – comparing fingerprint similarity or contact maps is usually less computationally intensive than calculating full physics-based energies, which makes them suitable for rescoring large numbers of poses quickly.

Limitations: A contact-based score by itself doesn’t guarantee a truly stable binding mode; it checks the presence of interactions but not whether the geometry and environment are fully favorable. There’s a risk of false positives if, say, a pose manages to form all the “expected” hydrogen bonds but in a strained conformation or while incurring other unseen penalties. (In a contrived example, a ligand might twist to make all the right contacts but also clash with the RNA backbone – a pure contact count wouldn’t catch the clash, whereas an energy score would.) Many knowledge-based potentials address this by incorporating packing or clash terms, but it’s something to consider. Also, contact methods often need a reference: IBAC needs predefined key contacts from a crystal structure ( fingeRNAt—A novel tool for high-throughput analysis of nucleic acid-ligand interactions - PMC ), and fingerprint similarity needs either a known reference pose or at least a template of what interactions are desired. This makes purely novel predictions harder – you can’t use IBAC to score a brand-new RNA target unless you guess which contacts would be important. However, generic knowledge-based scores (like LigandRNA or ITScore-NL) don’t require a known reference – they learn overall preferences from many known complexes, so they can score novel cases, albeit based on statistical tendencies. Another limitation is that contact-based metrics don’t directly yield a binding affinity estimate. They output a similarity or a probability of being “native-like,” not a $\Delta G$. So for ranking different ligands (as opposed to poses of the same ligand), they are less useful – two different compounds might both satisfy key contacts in their respective binding modes, but one could bind much tighter than the other due to additional interactions or entropy, which a contact score won’t capture. 
In such cases, a hybrid approach is often used: first ensure a pose is interaction-valid (using contacts), then use a more nuanced energy calculation (MM-GBSA, free energy perturbation, etc.) to estimate actual binding strengths.

Accuracy, Strengths, and Limitations in Predicting Binding and Benchmarks

Pose prediction accuracy: Thanks to these scoring advancements, the reliability of RNA–ligand docking has improved, but it still lags behind protein–ligand docking. Early studies using general-purpose dockers on RNA had low success, as noted above (often <30% success for top-ranked pose within 2–3 Å) ( RNA-ligand molecular docking: advances and challenges - PMC ). Newer RNA-specific scoring functions have roughly doubled this success rate. For example, a 2017 evaluation reported ~54% success using a tailored scoring function (SPA-LN) on a standard test set, versus 18% for Glide on the same set ( RNA-ligand molecular docking: advances and challenges - PMC ). Another study introduced a statistical potential with RNA-specific terms (ITScore-NL) and showed it could identify the correct pose in the top ranks for ~71% of complexes (within 1.5 Å if considering top-3 predictions) ( RNA-ligand molecular docking: advances and challenges - PMC ). LigandRNA, combined with DOCK, was also able to improve pose identification significantly compared to docking alone (LigandRNA: computational predictor of RNA-ligand interactions - PubMed). These numbers, while not as high as one would like, indicate steady progress. Notably, methods that explicitly model the unique RNA interactions (H-bonds, stacking, ions) consistently outperform those that don’t ( RNA-ligand molecular docking: advances and challenges - PMC ). This highlights that including polar contact criteria (either via scoring or post-analysis) is crucial for accuracy.
The strength of contact-focused evaluation is best seen in cases where RMSD fails: e.g., when multiple protein or RNA conformations are involved, a “footprint similarity” metric (comparing the pattern of interaction energies per residue) was found to capture pose similarity across diverse crystal structures better than ligand-centric RMSD ( fingeRNAt—A novel tool for high-throughput analysis of nucleic acid-ligand interactions - PMC ). In other words, by focusing on interactions, one can recognize the same binding mode even if the precise atomic positions vary.

Binding affinity prediction: Predicting absolute binding affinities to RNA remains very challenging. Docking scores can be used to rank compounds qualitatively, but their correlation with experimental affinity is moderate at best ( RNA-ligand molecular docking: advances and challenges - PMC ). For instance, SPA-LN’s Pearson $R \approx 0.58$ on the set of 77 PDBbind complexes was considered an encouraging result ( RNA-ligand molecular docking: advances and challenges - PMC ); many other methods would score lower. The limited accuracy is due to many factors: the difficulty of modeling ion and solvent effects (very important for RNA), the entropic cost of ligand and RNA conformational changes, and simply the paucity of high-quality binding data to calibrate models. To improve this, some protocols use docking scores as one component and then apply more detailed physics-based calculations (like molecular dynamics with free energy estimations) for a subset of top compounds, albeit at much higher computational cost. As an example of innovation, one group suggested incorporating kinetic factors (residence time) into docking scoring to better correlate with in vivo efficacy ( RNA-ligand molecular docking: advances and challenges - PMC ), acknowledging that static affinity alone might not tell the whole story for drug action on RNA.

Strengths and limitations recap: In summary, docking scoring methods (AutoDock, Glide, etc.) are fast and broadly applicable, providing a starting point for RNA–ligand binding predictions. They capture many important interactions, including polar contacts to an extent, and have had success in virtual screening for RNA-binding leads. Their main weakness lies in accuracy – they can miss the correct pose or mis-rank it due to incomplete modeling of RNA’s peculiarities. Polar contact-focused methods (interaction fingerprints, contact scores) offer a complementary strength: they are very good at validating whether a pose is likely correct by checking if it makes the right contacts. They’ve proven more reliable than RMSD when judging poses, especially in flexible scenarios (Assessment of docking poses: interactions-based accuracy classification (IBAC) versus crystal structure deviations - PubMed). The limitation is that by themselves they don’t measure energy or affinity and may require known interaction data. Hybrid approaches are therefore common. For example, one might dock a library with AutoDock or Glide to generate poses and initial scores, then use a contact filter or rescore with a knowledge-based potential to re-rank those poses. This takes advantage of the strengths of each method – the broad search and energy screening of docking, plus the precise interaction checking of contact-based scoring.
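The hybrid workflow described above can be sketched as a two-step re-ranking: discard poses that recover too few of the expected contacts, then order the survivors by docking score. The pose records, contact labels, and the 0.75 threshold are hypothetical; real pipelines would obtain scores and contacts from the docking and fingerprinting tools themselves.

```python
def rerank(poses, required_contacts, min_fraction=0.75):
    """Keep poses recovering enough required contacts, sorted by score (more negative first)."""
    required = set(required_contacts)
    if not required:
        raise ValueError("at least one required contact is needed")
    kept = []
    for pose in poses:
        fraction = len(required & set(pose["contacts"])) / len(required)
        if fraction >= min_fraction:
            kept.append({**pose, "contact_fraction": fraction})
    return sorted(kept, key=lambda p: p["score"])

# Hypothetical contacts expected from a reference complex.
required = {"G12:hbond", "A45:hbond", "U7:ionic", "A46:stack"}

# Hypothetical docking output: a score plus the contacts each pose makes.
poses = [
    {"name": "pose_1", "score": -9.5, "contacts": {"G12:hbond", "C30:hbond"}},
    {"name": "pose_2", "score": -8.7, "contacts": set(required)},
    {"name": "pose_3", "score": -8.9, "contacts": {"G12:hbond", "A45:hbond", "A46:stack"}},
    {"name": "pose_4", "score": -6.0, "contacts": set(required)},
]

reranked = rerank(poses, required)
for pose in reranked:
    print(pose["name"], pose["score"], pose["contact_fraction"])
```

In this toy set the best-scoring pose (`pose_1`) is dropped because it makes only one of the four expected contacts, which is the failure mode the contact filter is meant to catch. The docking score still decides the order among the poses that pass.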

Benchmark studies and datasets: Several benchmark sets have been established to evaluate these methods. A frequently cited benchmark is a set of ~40–42 RNA–ligand complexes compiled from the PDB (often riboswitch aptamers bound to small molecules, viral RNA fragments with ligands, etc.). This set was used by Philips et al. in developing LigandRNA in 2013 and by others; for example, LigandRNA reported a pose prediction success of ~36% on 42 complexes, which improved to ~48% when combined with DOCK6 as a meta-predictor ( RNA-ligand molecular docking: advances and challenges - PMC ) (LigandRNA: computational predictor of RNA-ligand interactions - PubMed). Later methods like ITScore-NL and RLDOCK used similar datasets – Li et al. (2015) trained RLDOCK on 30 RNA complexes and tested on 38–42 complexes, showing improved accuracy over earlier docking tools ( RNA-ligand molecular docking: advances and challenges - PMC ) ( RNA-ligand molecular docking: advances and challenges - PMC ). The Protein Data Bank (PDB) itself now has enough RNA–ligand structures that subsets have been extracted for scoring function benchmarks. The PDBbind database, known for protein–ligand affinity benchmarks, has a nucleic acid subset: one study used 77 RNA-containing complexes from PDBbind (with binding affinity data) to validate the SPA-LN scoring function ( RNA-ligand molecular docking: advances and challenges - PMC ). Another 34 RNA–ligand complexes with measured affinities were tested separately in that study, yielding similar correlation results ( RNA-ligand molecular docking: advances and challenges - PMC ). These datasets serve as a reference to quantify how well a scoring method can predict real binding affinities.

For pose prediction benchmarks, researchers often report the fraction of cases where the native-like pose is ranked within the top N by the scoring function (using an RMSD cutoff like 2 Å to define “native-like”). As mentioned, specialized RNA scoring methods now achieve ~50% or better on these benchmarks ( RNA-ligand molecular docking: advances and challenges - PMC ) ( RNA-ligand molecular docking: advances and challenges - PMC ), whereas generic methods were around 20–30%. Another community resource is the RNA-Puzzles competition, which occasionally includes RNA–small molecule complexes as targets. Participants must predict the 3D structure of an RNA with its ligand, and organizers evaluate results with various metrics. In one RNA-Puzzles target involving a ligand, the organizers measured not just RMSD but also whether the predicted ligand made the same contacts as in the true structure, reflecting the importance of interaction-based assessment ( fingeRNAt—A novel tool for high-throughput analysis of nucleic acid-ligand interactions - PMC ). Interaction Network Fidelity (INF), a metric originally for RNA base-pairing networks, has been adapted to include ligand contacts in some cases, effectively scoring how many of the native contacts (including H-bonds to the ligand) are recovered ( fingeRNAt—A novel tool for high-throughput analysis of nucleic acid-ligand interactions - PMC ). These community benchmarks reinforce that a multifactor evaluation is best – winning predictions usually have both low RMSD and high contact fidelity.
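The top-N success rate mentioned above is straightforward to compute once each complex has its poses ordered by score. A minimal sketch, assuming a 2 Å cutoff and invented RMSD values:

```python
def top_n_success(ranked_rmsds, n, cutoff=2.0):
    """Fraction of complexes with at least one pose within cutoff (Å) among the top n."""
    if not ranked_rmsds:
        raise ValueError("no complexes given")
    hits = sum(
        1 for rmsds in ranked_rmsds.values()
        if any(r <= cutoff for r in rmsds[:n])
    )
    return hits / len(ranked_rmsds)

# Hypothetical benchmark: RMSD to the native pose for each complex's poses,
# listed in the order the scoring function ranked them (best score first).
benchmark = {
    "complex_a": [1.2, 4.0, 6.1],
    "complex_b": [3.8, 1.9, 5.0],
    "complex_c": [7.2, 6.5, 5.9],
    "complex_d": [0.8, 0.9, 3.3],
    "complex_e": [4.4, 5.1, 1.8],
}

top1 = top_n_success(benchmark, n=1)
top3 = top_n_success(benchmark, n=3)
print(f"top-1: {top1:.0%}, top-3: {top3:.0%}")
```

For this toy benchmark the top-1 rate is 40% and the top-3 rate is 80%. The gap between the two is a quick indicator of how often near-native poses are sampled but not ranked first, which is a scoring problem rather than a sampling problem.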

Looking forward, the field is moving towards integrating these approaches. The reliability of using polar contacts as a similarity measure is well-supported: preserving the hydrogen-bond network of a known RNA–ligand complex is a strong indicator of a correct pose. Yet, the best predictive power comes from balanced scoring – combining the precision of contact-based metrics with the broad evaluation of physics-based scores. Benchmark studies underscore that no single metric is perfect, but together, they can compensate for each other’s blind spots. As more RNA–ligand structures and affinity data become available (efforts like the RNALigands database ( RNALigands: a database and web server for RNA–ligand interactions - PMC ) are compiling these), scoring functions will continue to improve in both pose prediction and affinity estimation. In the meantime, researchers assessing RNA–ligand models will continue to use hydrogen bonds and polar contacts as a key litmus test of correctness – a reliable guide alongside other metrics like RMSD and docking score to ensure a comprehensive evaluation of RNA–ligand interactions.

Sources: