basepairs
Hydrogen Bond Distances and Base Plane Angles in GC Base Pairing
For two RNA bases to be considered base-paired, they must form hydrogen bonds at close range and lie in roughly the same plane. In practice, a heavy-atom donor–acceptor distance below ~3.4 Å is used as the cutoff for a valid hydrogen bond between bases (Tools for the automatic identification and classification of RNA base pairs - PMC). At least one such H-bond, and preferably two or more, should link the G and C. The angle between the base planes (that is, the angle between their normal vectors) should also be limited, typically to under ~65°, so that the bases remain roughly coplanar (Tools for the automatic identification and classification of RNA base pairs - PMC). In your example, a G–C pair with an H-bond distance <3.4 Å and a 46° inter-plane angle satisfies both criteria: the bond length is within hydrogen-bonding range, and the bases, though noticeably tilted, are still below the common angle cutoff. Such a pair is likely a genuine interaction rather than mere stacking. By contrast, if the bases were almost perpendicular (angle approaching 90°) or too offset (large vertical separation), the H-bond would be geometrically strained and the interaction might not be accepted as a base pair (Tools for the automatic identification and classification of RNA base pairs - PMC) (DSSR: An integrated software tool for dissecting the spatial structure of RNA). In summary, both a short H-bond and a fairly planar orientation are needed; one without the other is usually insufficient for a stable RNA base pair.
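As a rough illustration of the two measurements discussed above, the following sketch computes a donor–acceptor distance and the angle between two base planes. It is a minimal example under stated assumptions, not how any particular tool does it: the function names (`ring_normal`, `interplane_angle`, `is_candidate_pair`) are hypothetical, the plane normal is estimated with Newell's method over the ring atoms, and the "bases" are idealized hexagons rather than real coordinates.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def distance(p, q):
    """Euclidean distance between two points (e.g. donor and acceptor heavy atoms)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def ring_normal(ring):
    """Unit normal of an approximately planar ring (Newell's method).

    `ring` is a list of (x, y, z) tuples given in order around the ring.
    """
    nx = ny = nz = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(ring, ring[1:] + ring[:1]):
        nx += (y1 - y2) * (z1 + z2)
        ny += (z1 - z2) * (x1 + x2)
        nz += (x1 - x2) * (y1 + y2)
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)

def interplane_angle(ring_a, ring_b):
    """Angle in degrees between two base planes, folded into [0, 90]."""
    cos_angle = min(1.0, abs(dot(ring_normal(ring_a), ring_normal(ring_b))))
    return math.degrees(math.acos(cos_angle))

def is_candidate_pair(hbond_dist, angle_deg, max_dist=3.4, max_angle=65.0):
    """Apply the ~3.4 Å and ~65° cutoffs quoted in the text."""
    return hbond_dist < max_dist and angle_deg < max_angle

def hexagon(tilt_deg=0.0):
    """Idealized six-membered ring, tilted about the x axis (toy data only)."""
    t = math.radians(tilt_deg)
    pts = []
    for k in range(6):
        x, y = math.cos(k * math.pi / 3), math.sin(k * math.pi / 3)
        pts.append((x, y * math.cos(t), y * math.sin(t)))
    return pts

angle = interplane_angle(hexagon(0.0), hexagon(46.0))
d = distance((0.0, 0.0, 0.0), (2.9, 0.0, 0.0))
print(round(angle, 1), round(d, 2), is_candidate_pair(d, angle))
```

For real structures you would pass the actual ring-atom coordinates of each base and the heavy-atom positions of each donor–acceptor pair; the cutoffs are keyword arguments so they can be tightened.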
Leontis–Westhof Classification and Geometric Limits
Leontis–Westhof (L–W) classification defines RNA base pairs by the interacting edges of each base (Watson–Crick, Hoogsteen, or Sugar edge) and the orientation (cis or trans) of the glycosidic bonds. Importantly, the original L–W scheme was intended for well-formed base pairs, typically those with at least two hydrogen bonds connecting the bases. It assumes the bases interact edge-to-edge in a planar fashion. While Leontis and Westhof did not rigidly specify an angle cutoff in their 2001 paper, “planar edge-to-edge” implies the bases should be nearly coplanar. In practice, later implementations of L–W classification established explicit geometric standards. For example, RNAView (Yang et al., 2003), a tool that implements the L–W classification, requires the base–base angle αb to be less than 65° for an interaction to count as a base pair (Tools for the automatic identification and classification of RNA base pairs - PMC). If the bases are tilted more than that, the interaction is no longer treated as a base pair but as something else (such as a stacking or tertiary contact). Likewise, RNAView requires the base planes to be separated by <2.5 Å vertically (Tools for the automatic identification and classification of RNA base pairs - PMC), reinforcing that the bases must lie in essentially the same plane.
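The edge-and-orientation scheme described above can be enumerated directly: three edges taken as unordered pairs, times two orientations, gives twelve combinations. The sketch below builds the usual short labels (for example cWW or tHS). The helper names `lw_families`, `lw_label`, and `family_of` are hypothetical and only illustrate the bookkeeping.

```python
from itertools import combinations_with_replacement

EDGES = "WHS"        # Watson-Crick, Hoogsteen, Sugar edge
ORIENTATIONS = "ct"  # cis, trans
EDGE_CODES = {"Watson-Crick": "W", "Hoogsteen": "H", "Sugar": "S"}

def lw_families():
    """All orientation/edge-pair combinations, edge order ignored."""
    return [o + a + b
            for o in ORIENTATIONS
            for a, b in combinations_with_replacement(EDGES, 2)]

def lw_label(orientation, edge_first, edge_second):
    """Label for a specific pair, keeping the per-base edge order (e.g. 'tSH')."""
    return orientation[0].lower() + EDGE_CODES[edge_first] + EDGE_CODES[edge_second]

def family_of(label):
    """Map an ordered label such as 'tSH' to its order-independent family 'tHS'."""
    a, b = sorted(label[1:], key=EDGES.index)
    return label[0] + a + b

print(lw_families())
print(lw_label("trans", "Sugar", "Hoogsteen"), family_of("tSH"))
```

Keeping the ordered label and the family separate is a design choice: the ordered form records which base uses which edge, while the family is what you would use for counting or comparing annotations.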
Crucially, L–W’s “12 basic geometric families” cover only those pairs with ≥2 H-bonds in roughly planar geometry. A 46° inter-base angle, for instance, is within the tolerated range and would not on its own disqualify a G–C pair from being classified (it is below the ~65° cutoff). In fact, L–W classification includes many non-Watson–Crick G–C pairings (e.g. reverse G–C, Hoogsteen-type) that often have tilted geometry yet still form two robust H-bonds. The scheme’s scope does not typically extend to interactions with only one hydrogen bond, because those are considered either “bifurcated” pairs or tertiary contacts rather than one of the 12 standard families (Tools for the automatic identification and classification of RNA base pairs - PMC). For borderline cases, there are stricter guidelines: Yang et al. note that if a base pair’s two H-bonds come from a single donor (a bifurcated bond), a tighter coplanarity cutoff of <50° is applied before it is accepted as one of the 12 families (Tools for the automatic identification and classification of RNA base pairs - PMC). This ensures that only reasonably planar bifurcated pairs (which are inherently weaker) are included. Conversely, a near-zero angle with too large a separation is treated as mere stacking and rejected as a base pair (e.g. αb<10° with the bases >2.2 Å apart vertically is considered stacking) (Tools for the automatic identification and classification of RNA base pairs - PMC). In summary, L–W classification is meant for bona fide base pairs with good geometry, typically two or more H-bonds and a planar alignment. While the original scheme did not dictate an exact angle value, tools that implement it consistently use an upper bound around 60–65° for the base-plane angle (Tools for the automatic identification and classification of RNA base pairs - PMC), thereby operationalizing the “planarity” concept.
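The planarity limits quoted above (65° and 2.5 Å in general, 50° for bifurcated pairs, and the stacking case at αb<10° with >2.2 Å separation) can be written as a small gate. This is a sketch, not RNAView's actual code: the function names are hypothetical, and the definition of vertical separation used here (the displacement between base origins projected onto the mean of the two normals) is an assumption made for illustration.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _unit(v):
    n = math.sqrt(_dot(v, v))
    return tuple(x / n for x in v)

def vertical_separation(origin_a, normal_a, origin_b, normal_b):
    """Assumed definition: |displacement . mean normal| after aligning normal signs."""
    na, nb = _unit(normal_a), _unit(normal_b)
    if _dot(na, nb) < 0:            # make both normals point the same way
        nb = tuple(-x for x in nb)
    mean = _unit(tuple(x + y for x, y in zip(na, nb)))
    disp = tuple(q - p for p, q in zip(origin_a, origin_b))
    return abs(_dot(disp, mean))

def planarity_gate(angle_deg, vertical_sep, bifurcated=False):
    """Return 'stacking', 'pair-like', or 'rejected' using the cutoffs from the text."""
    if angle_deg < 10.0 and vertical_sep > 2.2:
        return "stacking"
    max_angle = 50.0 if bifurcated else 65.0   # stricter limit for bifurcated H-bonds
    if angle_deg < max_angle and vertical_sep < 2.5:
        return "pair-like"
    return "rejected"

sep = vertical_separation((0, 0, 0), (0, 0, 1), (5, 0, 1), (0, 0, -1))
print(sep, planarity_gate(46.0, sep), planarity_gate(55.0, sep, bifurcated=True))
```

A "pair-like" result only says the geometry is compatible with pairing; whether the contact is then classified still depends on the hydrogen bonds found.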
Handling by Annotation Tools: MC-Annotate, RNAView, FR3D, and DSSR
Different RNA structure analysis tools employ these geometric criteria in slightly varied ways to detect and classify base pairs, especially non-canonical ones:
- MC-Annotate (Lemieux & Major, 2002): This tool identifies base pairs with a probabilistic algorithm rather than fixed cutoffs. It encodes all potential donor and acceptor atoms in a bipartite graph and assigns each possible H-bond a probability based on its geometry (distance and angle) (RNA canonical and non-canonical base pairing types: a recognition method and complete repertoire - PMC). A maximum-flow algorithm then finds the best matching of donors to acceptors, effectively picking out base pairs and distinguishing whether they have three, two, or one H-bond (RNA canonical and non-canonical base pairing types: a recognition method and complete repertoire - PMC). Because of this approach, MC-Annotate is very sensitive to borderline cases. It can detect non-canonical pairs stabilized by a single H-bond or by bifurcated hydrogen bonds, labeling them with an extended L–W nomenclature for one-bond interactions (RNA canonical and non-canonical base pairing types: a recognition method and complete repertoire - PMC). For example, if a G–C forms one strong H-bond and one weaker C–H…O contact, MC-Annotate might still flag it as an interaction (with lower confidence) rather than ignoring it. The authors showed that a strict distance-only rule can be misleading: some contacts within 3.4 Å are not true H-bonds, and some true H-bonds fall outside it. By computing probabilities, MC-Annotate avoids the false positives and false negatives that a single cutoff would create (RNA canonical and non-canonical base pairing types: a recognition method and complete repertoire - PMC). In practice, a default threshold is applied (e.g. an expected H-bond count of ≥0.5) so that even somewhat distorted pairs can be identified as base-paired (RNA canonical and non-canonical base pairing types: a recognition method and complete repertoire - PMC).
MC-Annotate thus handles unusual geometries gracefully: a 46° inclined G–C pair with sub-3.4 Å contacts would likely still be detected (albeit with a slightly lower probability score), ensuring such a pair isn’t missed due to a hard geometry filter.
- RNAView (Yang et al., 2003): RNAView was one of the first automated tools implementing the Leontis–Westhof scheme. It applies explicit geometric criteria to PDB structures to decide whether two bases form a pair. As mentioned, it requires at least two H-bonds for a classified base pair, with at least one base–base H-bond <3.4 Å (Tools for the automatic identification and classification of RNA base pairs - PMC). It also enforces the planarity conditions (base-plane angle <65°, vertical separation <2.5 Å) to distinguish true pairing from accidental contacts (Tools for the automatic identification and classification of RNA base pairs - PMC). If a G and C meet these cutoffs, RNAView classifies them into one of the 12 families (e.g. cis Watson–Crick for the standard G≡C, or perhaps trans Watson–Hoogsteen, depending on the geometry). Notably, RNAView has logic for special cases: bifurcated pairs (where one atom participates in two H-bonds) must be extra planar (angle <50°) to be accepted (Tools for the automatic identification and classification of RNA base pairs - PMC), reflecting the stricter standard needed for these weaker interactions. And if a nucleotide pair does not have two clear H-bonds but does have a single hydrogen bond within a tight cutoff (e.g. N/O…N/O <3.4 Å), RNAView will not assign it to an L–W family but will still record it as a tertiary interaction (drawn as a dashed line in its output diagrams) (Tools for the automatic identification and classification of RNA base pairs - PMC). In other words, a one-bond G–C contact might be noted as a tertiary base–base interaction but not counted as a formal “base pair.” This approach captures nearly all meaningful contacts while classifying only well-formed pairs.
In summary, RNAView would treat a <3.4 Å, 46° G–C as a valid base pair (likely classifying which edges are involved), whereas a more distorted or single-bond case might be flagged differently (tertiary contact or stacking) (Tools for the automatic identification and classification of RNA base pairs - PMC).
- FR3D (Find RNA 3D, Sarver et al., 2008): FR3D is a toolkit for searching RNA 3D motifs, and it includes functionality to identify and categorize base interactions in structures. It uses a base-centric geometric approach: each nucleotide is represented in a standard reference frame (aligned as if in an ideal Watson–Crick pair), and FR3D checks relative orientations and distances to find interactions (FR3D: finding local and composite recurrent structural motifs in RNA 3D structures - PMC). Essentially, the software computes whether two bases have the proper spatial arrangement to hydrogen bond on specific edges. FR3D then assigns an L–W notation to each base pair it finds (cWW, tWH, etc.), just like RNAView (FR3D: finding local and composite recurrent structural motifs in RNA 3D structures - PMC). The underlying criteria are comparable: bases need to be sufficiently close and coplanar and show the correct edge alignment. FR3D’s developers expanded on L–W by defining isosteric groups and recognizing stacking interactions symbolically (FR3D: finding local and composite recurrent structural motifs in RNA 3D structures - PMC), but for base pairs their detection is in line with the conventional geometric rules. One can assume FR3D uses cutoffs in the same ballpark (distance ~3.5 Å, angle ~65°) to initially detect a candidate pair, after which it verifies which edges are in contact. If a G–C pair is unusual (say a trans Hoogsteen/Sugar pairing with a tilted geometry), FR3D will still catch it as long as the base reference frames can be superimposed within tolerance. Leontis et al. also introduced the concept of isostericity, that is, how similar different base pairs are in shape (RNA Basepair Catalog). Using this, FR3D can recognize that a distorted G–C pairing might still belong to a known family (if it is geometrically isosteric to other members).
In practical terms, FR3D should identify a valid 46°-tilted G–C pair and label it with the appropriate L–W class. Differences from RNAView might arise in edge cases: FR3D might be a bit more permissive in grouping interactions if they resemble known motif instances, but it adheres to the same fundamental H-bonding requirements. (Internally, FR3D relies on the same kind of distance and coplanarity checks, such as glycosidic bond distances and base normals, to decide whether a pair exists (FR3D: finding local and composite recurrent structural motifs in RNA 3D structures - PMC).)
- DSSR (Dissecting the Spatial Structure of RNA, Lu et al., 2015): DSSR is a modern tool (part of the 3DNA suite) for analyzing nucleic acid structures. It automatically finds canonical and noncanonical base pairs using a combination of simple geometric criteria. By default, DSSR is deliberately inclusive (lenient) in what it counts as a base pair, so as not to miss any plausible interactions (New restraints and validation approaches for nucleic acid structures in PDB-REDO - PMC). The program identifies base pairs if the bases are sufficiently close and aligned, and if there is at least one hydrogen bond between them (DSSR: An integrated software tool for dissecting the spatial structure of RNA). Specifically, DSSR’s algorithm looks at the distance and coplanarity of the base rings and checks for one or more H-bonds connecting the bases (DSSR: An integrated software tool for dissecting the spatial structure of RNA). The default cutoff values are very much in line with RNAView’s: a vertical separation ≤2.5 Å and an inter-base angle ≤65° (along with an overall separation constraint, e.g. base origins within 15 Å), plus the presence of at least one H-bond involving a base atom (biotite/src/biotite/structure/basepairs.py at main · biotite-dev/biotite · GitHub). If these are met, DSSR declares a base pair. This means that even a single-H-bond interaction (like certain awkward G–C alignments or C–H…O contacts) will be listed as a base pair in DSSR output, often classified as “other” if it does not fit a standard family (New restraints and validation approaches for nucleic acid structures in PDB-REDO - PMC). DSSR then classifies each detected pair by two schemes: the 12 L–W geometric types and the older Saenger 28-type nomenclature (DSSR: an integrated software tool for dissecting the spatial structure …).
For example, DSSR might report a given G–C pairing as cWW (cis Watson–Crick) if it is a normal pair, or perhaps as cHS if guanine’s Hoogsteen edge contacts cytosine’s sugar edge in cis, along with the H-bond details. Because DSSR is tuned for completeness, it may catch somewhat distorted pairs that other programs skip. (This inclusiveness is described as useful when analyzing low-resolution models or refining structures (New restraints and validation approaches for nucleic acid structures in PDB-REDO - PMC).) One can impose stricter criteria afterwards, for instance by filtering DSSR output to require two H-bonds if only confident pairs are wanted (New restraints and validation approaches for nucleic acid structures in PDB-REDO - PMC). In short, DSSR would be expected to identify a <3.4 Å, 46° G–C interaction as a base pair, flagging the H-bonds and labeling the geometry, although it might classify it as an uncommon type if it does not match the usual cWW or wobble patterns. The key point is that DSSR’s detection algorithm explicitly combines distance and coplanarity checks with H-bond presence, reflecting the consensus criteria from the RNA literature (DSSR: An integrated software tool for dissecting the spatial structure of RNA).
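To make the difference between these detection policies concrete, the sketch below applies three simplified rules to the same list of donor–acceptor distances: a strict two-bond rule (in the spirit of RNAView), a lenient one-bond rule (in the spirit of DSSR's defaults), and an expected-count rule (loosely in the spirit of MC-Annotate). Everything here is an assumption for illustration. In particular, the logistic `soft_score` function and its parameters are invented stand-ins for a geometry-based H-bond probability, and no maximum-flow matching is performed.

```python
import math

CUTOFF = 3.4  # hard donor-acceptor distance cutoff in Å, as quoted in the text

def soft_score(dist, midpoint=3.4, width=0.2):
    """Illustrative logistic weight in (0, 1); NOT MC-Annotate's actual model."""
    return 1.0 / (1.0 + math.exp((dist - midpoint) / width))

def annotate(distances):
    """Apply three simplified pairing policies to one base-base contact."""
    hard_count = sum(1 for d in distances if d < CUTOFF)
    expected = sum(soft_score(d) for d in distances)
    return {
        "two_bond_rule": hard_count >= 2,        # strict: classified pairs only
        "one_bond_rule": hard_count >= 1,        # lenient: any single H-bond
        "expected_count_rule": expected >= 0.5,  # soft: expected H-bond count
        "expected_hbonds": expected,
    }

print(annotate([2.9, 3.5]))   # one clear H-bond plus one marginal contact
print(annotate([3.45, 3.5]))  # two contacts just outside the hard cutoff
```

The second call shows why a soft score can behave differently from a hard cutoff: neither contact is under 3.4 Å, yet together they contribute a non-negligible expected count.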
Despite minor differences, all these tools broadly agree on core principles. Notably, canonical Watson–Crick pairs (like standard G≡C) are recognized easily by all. Discrepancies arise mainly with non-canonical or borderline cases. For instance, one study showed that while these methods concur on ~80–90% of base pairs, their annotations of non-canonical pairs can vary considerably ([CompAnnotate: a comparative approach to annotate base-pairing interactions in RNA 3D structures | Nucleic Acids Research | Oxford Academic](https://academic.oup.com/nar/article/45/14/e136/3875524#:~:text=MC,with%20the%20annotation%20of%20benchmark)). MC-Annotate might report more single-H-bond pairs (due to its permissive approach), whereas RNAView might ignore those or mark them as tertiary, and DSSR might list them but classify them as “other.” FR3D, being tuned for motif discovery, often emphasizes recurring valid interactions, possibly filtering out one-off contacts. When formulating your own criteria, it is wise to lean on the common ground that these tools have established: require at least one strong H-bond and a roughly planar alignment. You can then decide how strict to be (e.g. insist on two H-bonds for calling it a structured base pair, as L–W originally does, or allow one H-bond if you simply want to catalog any contact).
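If you run several annotators on the same structure, one simple way to work with their disagreements is to compute pairwise agreement and a majority-vote consensus. The sketch below assumes you have already parsed each tool's output into a set of `(residue_i, residue_j, label)` tuples; the parsing step, the tool keys, and the example data are all hypothetical.

```python
from collections import Counter

def agreement(pairs_a, pairs_b):
    """Jaccard agreement between two sets of annotated pairs (1.0 if both empty)."""
    union = pairs_a | pairs_b
    return len(pairs_a & pairs_b) / len(union) if union else 1.0

def consensus(annotations, min_votes=2):
    """Pairs reported, with the same label, by at least `min_votes` tools."""
    votes = Counter()
    for pairs in annotations.values():
        votes.update(set(pairs))
    return {pair for pair, n in votes.items() if n >= min_votes}

# Hypothetical, hand-made annotations for illustration only.
annotations = {
    "tool_a": {(1, 20, "cWW"), (2, 19, "cWW"), (5, 12, "tHS")},
    "tool_b": {(1, 20, "cWW"), (2, 19, "cWW")},
    "tool_c": {(1, 20, "cWW"), (2, 19, "cWW"), (7, 9, "cSS")},
}

print(agreement(annotations["tool_a"], annotations["tool_b"]))
print(sorted(consensus(annotations)))
```

Pairs that only one tool reports (here the tHS and cSS entries) are the borderline cases worth inspecting manually against your own geometric criteria.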
Standardizing Base Pair Criteria (Literature-Supported)
Drawing from the above, we can outline a universal rule set for identifying and classifying RNA base pairs, backed by authoritative sources:
- Require at least two hydrogen bonds for a definitive base pair. This was the original L–W criterion for the 12 standard families. In practice, at least one of these bonds should be a direct base–base interaction with heavy-atom distance ≤3.4 Å (Tools for the automatic identification and classification of RNA base pairs - PMC). (The second can be slightly longer or even a C–H…O/N contact up to ~3.7–3.9 Å, as in RNAView’s allowances (Tools for the automatic identification and classification of RNA base pairs - PMC).) Two true H-bonds ensure the interaction is robust and specific. Literature: Leontis & Westhof (2001); RNAView criteria (Yang et al., 2003) (Tools for the automatic identification and classification of RNA base pairs - PMC).
- Maintain coplanarity of bases. The interacting bases should lie roughly in the same plane. Implement a cutoff around 65° for the angle between base normals; interactions more skewed than this are usually not considered base pairs (Tools for the automatic identification and classification of RNA base pairs - PMC) (biotite/src/biotite/structure/basepairs.py at main · biotite-dev/biotite · GitHub). Most genuine pairs in high-resolution structures have much smaller angles (often <30° for canonical pairs). The 65° limit, used by RNAView and DSSR, covers even highly twisted non-canonical pairs while excluding near-orthogonal contacts. Additionally, enforce a vertical separation ≤2.5 Å between base planes (Tools for the automatic identification and classification of RNA base pairs - PMC) so that the two bases sit at nearly the same height rather than one above the other. Literature: Yang et al. (2003) cutoff values (Tools for the automatic identification and classification of RNA base pairs - PMC); DSSR default criteria (Lu et al., 2015) (biotite/src/biotite/structure/basepairs.py at main · biotite-dev/biotite · GitHub).
- Identify and handle special cases. If an interaction has only one hydrogen bond, or a bifurcated bond, treat it cautiously. Such a contact can be recorded as a tertiary interaction (it may still be functionally relevant), but not classified into the canonical families (Tools for the automatic identification and classification of RNA base pairs - PMC). For instance, a single-H-bond G–C might be noted as a “G–C base–base contact” but would not be called, say, cWH or cHS. Similarly, bifurcated pairs (one atom on one base donating to two atoms on the other) must meet stricter planarity to count as base pairs: e.g. angle <50°, distance <2.1 Å, as used by RNAView (Tools for the automatic identification and classification of RNA base pairs - PMC). If they are more distorted than that, they likely represent an intermediate between a true pair and a non-pair. Literature: RNAView rules for bifurcated H-bonds and its treatment of single-bond cases (Tools for the automatic identification and classification of RNA base pairs - PMC); Lemieux & Major (2002) addressing C–H and bifurcated bonds as extensions of L–W (RNA canonical and non-canonical base pairing types: a recognition method and complete repertoire - PMC).
- Distinguish stacking-only interactions. If two bases are nearly parallel (small angle) yet do not form significant H-bonds (or are too far apart), classify them as stacked, not paired. Quantitatively, if the angle is <~15° but there is no H-bond <3.4 Å (or the planes are >2.2–2.7 Å apart), the interaction is purely stacking (Tools for the automatic identification and classification of RNA base pairs - PMC). All major tools separate stacking criteria from pairing criteria to avoid confusion. Literature: Yang et al. (2003) stacking vs. pairing cutoffs (Tools for the automatic identification and classification of RNA base pairs - PMC).
- Use a consistent nomenclature for valid pairs. Once a G–C pair passes the above geometric tests, assign it a Leontis–Westhof class (e.g. cis Watson–Crick (cWW) for standard GC, trans Sugar/Hoogsteen (tSH), etc.) based on which edges are hydrogen-bonded. This classification is supported by numerous studies and allows one to leverage known frequency and isostericity data (RNA Basepair Catalog). Tools like RNAView, FR3D, and DSSR all output such annotations, which you can use as a reference. For example, if your G–C is a bit twisted but still clearly pairs through the Watson–Crick edges in cis orientation, it is a cWW pair. If it is something like guanine’s N3 and amino group contacting cytosine’s N3 and O2 (a rare mismatch), guanine is using its Sugar edge and cytosine its Watson–Crick edge, so the pair would fall into a Sugar/Watson–Crick family (cis or trans depending on the glycosidic bond orientation); recognizing this ensures your rule aligns with known types (3DNA Homepage – Nucleic Acid Structures).
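The rule set above can be combined into a single decision function. The following is a minimal sketch under stated assumptions rather than a reference implementation: the `Candidate` class and `judge` function are hypothetical, the input is assumed to be a pre-filtered pair of nearby bases whose H-bond distances, inter-plane angle, vertical separation, and interacting edges have already been measured, and the stacking test is reduced to the angle check alone.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    hbond_distances: List[float]  # heavy-atom donor-acceptor distances in Å
    angle_deg: float              # angle between base normals
    vertical_sep: float           # vertical separation between base planes in Å
    orientation: str = "c"        # 'c' (cis) or 't' (trans)
    edges: str = "WW"             # interacting edges, e.g. 'WW', 'SH'

def judge(c: Candidate) -> str:
    """Return an L-W label, 'tertiary contact', 'stacking candidate', or 'no interaction'."""
    strong = [d for d in c.hbond_distances if d <= 3.4]      # direct base-base H-bonds
    supporting = [d for d in c.hbond_distances if d <= 3.9]  # includes weaker contacts
    coplanar = c.angle_deg <= 65.0 and c.vertical_sep <= 2.5

    if not strong or not coplanar:
        # Nearly parallel bases without a pairing H-bond are treated as possible stacking.
        return "stacking candidate" if c.angle_deg < 15.0 else "no interaction"
    if len(supporting) >= 2:
        return c.orientation + c.edges   # definitive pair: assign the L-W class
    return "tertiary contact"            # single H-bond: record, but do not classify

print(judge(Candidate([2.9, 3.0], 46.0, 1.0)))
print(judge(Candidate([2.9], 46.0, 1.0)))
print(judge(Candidate([], 8.0, 3.3)))
```

The stricter 50° limit for bifurcated pairs is left out to keep the sketch short; it could be added as an extra flag on `Candidate` that lowers the angle cutoff.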
By following these guidelines, which are drawn from the literature and reflected in widely used software, you can judge G–C base pair validity consistently across RNA structures. In essence, demand at least one strong hydrogen bond (two for a definitive, classifiable pair) together with near-planar geometry (DSSR: An integrated software tool for dissecting the spatial structure of RNA). The references above (Leontis & Westhof 2001; Lemieux & Major 2002; Yang et al. 2003; Lu et al. 2015, among others) provide a strong foundation to justify these rules in any publication or analysis. Adhering to such standardized criteria will make your base-pair annotations consistent with community norms and tools, so that your G–C pairs (and other base pairs) are identified and classified in line with established RNA structural biology practices (Tools for the automatic identification and classification of RNA base pairs - PMC) (RNA canonical and non-canonical base pairing types: a recognition method and complete repertoire - PMC).
References: Key sources include Leontis & Westhof (2001) for the base pair classification framework, Lemieux & Major (2002) introducing MC-Annotate’s H-bond probability method (RNA canonical and non-canonical base pairing types: a recognition method and complete repertoire - PMC), Yang et al. (2003) describing RNAView’s geometric criteria (Tools for the automatic identification and classification of RNA base pairs - PMC), and Lu et al. (2015) detailing DSSR’s inclusive base-pair annotation approach (DSSR: An integrated software tool for dissecting the spatial structure of RNA). Together these underpin the recommended rules for RNA base pair validation. Each of the tools and studies mentioned converges on the principle that distance, planarity, and H-bonds form the triad for base-pair identification, providing a well-founded, generalizable standard for RNA structural analyses.