Class TripletShortCycles

java.lang.Object
org.openscience.cdk.graph.TripletShortCycles

public final class TripletShortCycles extends Object
Compute the shortest cycles through each vertex triple. This allows one to directly obtain the envelope rings of bicyclic fused system. These cycles can be thought of as the 'ESSSR' (extended smallest set of smallest rings) and 'envelope' rings as used by PubChem fingerprints (CACTVS Substructure Keys). The PubChem fingerprint documentation exclusively refers to the ESSSR and envelopes as just the ESSSR and the rest of this documentation does the same. This class provides the cycles (vertex paths) for each ring in the ESSSR. The ESSSR should not be confused with the extended set of smallest rings (ESSR) [Downs, G et. al.. J. Chem. Inf. Comput. Sci. 1989. 29]. Algorithm To our knowledge no algorithm has been published for the ESSSR. The PubChem Specifications states - "An ESSSR ring is any ring which does not share three consecutive atoms with any other ring in the chemical structure. For example, naphthalene has three ESSSR rings (two phenyl fragments and the 10-membered envelope), while biphenyl will yield a count of only two ESSSR rings". The name implies the use of the smallest set of smallest rings (SSSR). Not every graph has an SSSR and so the minimum cycle basis is used instead. With this modification the algorithm is outlined below.
  1. Compute a minimum cycle basis (or SSSR) of the graph (may not be unique)
  2. For each vertex v and two adjacent vertices (u and w) check if the path -u-v-w- belongs to any cycles already in the basis
  3. If no such cycle can be found compute the shortest cycle which travels through -u-v-w- and add it to the basis. The shortest cycle is the shortest path from u to w which does not travel through v
In the case of naphthalene the minimum cycle basis is the two phenyl rings. Taking either bridgehead atom of naphthalene to be v and choosing u and w to be in different phenyl rings it is easy to see the shortest cycle through -u-v-w- is the 10 member envelope ring. Canonical and Non-Canonical Generation The algorithm can generate a canonical or non-canonical (preferred) set of cycles. As one can see from the above description depending on the order we check each triple (-u-v-w-) and add it to basis we may end up with a different set. To avoid this PubChem fingerprints uses a canonical labelling ensuring the vertices are always checked in the same order. The vertex order used by this class is the natural order of the vertices as provided in the graph. To ensure the generated set is always the same vertices should be ordered beforehand or the non-canonical option should be used. Although this canonical sorting allows one to reliable generate the same set of cycles for a graph this is not true for subgraphs. For two graphs G, H and a canonical ordering (π). If H is a subgraph of G then for two vertices u, v. It follows that π(u) < π(v)Hπ(u) < π(v)G. In other words, we can canonically label a graph and inspect the ordering of vertices u and v. We now take a subgraph which contains both u and v - the ordering does not need to be the same as the full graph. This means that a subgraph may contain a ring in its ESSSR which does not belong to the ESSSR of the full graph. To resolve this problem you can turn off the canonical option. This relaxes the existing condition (Step 2.) and adds all shortest cycles through each triple (-u-v-w-) to the basis. The number of cycles generated may be larger however it is now possible to ensure that if H is a subgraph of G then ESSSR of H will be a subset of the ESSSR or G. Alternatively one may consider using the RelevantCycles which is the the smallest set of short cycles which is uniquely defined for a graph. To better explain the issue with the canonical labelling several examples are shown below. The table outlining the size of rings found for each molecule when using canonical and non-canonical generation. Also shown are the sizes of rings stored in the PubChem fingerprint associated with the entry. The fingerprints were obtained directly from PubChem and decoded using the specification. Sizes underlined and coloured red represent rings which may or may not be present depending on the atom ordering. It can be seen from the PubChem fingerprint that even using a consistent canonical labelling rings may be absent which would be present if the subgraph was used.
PubChem CIDDiagramSize of Rings in ESSSR
(fingerprints only store cycles |C| <= 10)
Source
CID 135973 Compound Image
{3, 3, 4}
{3, 3, 4}
{3, 3, 4}
Canonical
Non-canonical
PubChem Fingerprint
CID 9249 Compound Image
{3, 3, 4, 6, 6}
{3, 3, 4, 6, 6}
{3, 3, 6, 6}
Canonical - 4 member cycle only added if found before larger 6 member cycles
Non-canonical
PubChem Fingerprint - 4 member cycle not found
CID 931 Compound Image
{6, 6, 10}
{6, 6, 10}
{6, 6, 10}
Canonical
Non-canonical
PubChem Fingerprint
CID 5702 Compound Image
{6, 6, 6, 6, 10, 10, 20, 22, 22, 24, 24}
{6, 6, 6, 6, 10, 10, 20, 22, 22, 24, 24}
{6, 6, 6, 6}
Canonical - 10 member cycles only added if found before larger cycles
Non-canonical
PubChem Fingerprint - 10 member cycles not found
CID 1211 Compound Image
{6, 6, 6, 6, 6, 6, 10, 10, 18, 18, 20, 20, 22, 22, 22}
{6, 6, 6, 6, 6, 6, 10, 10, 18, 18, 20, 20, 22, 22, 22}
{6, 6, 6, 6, 6, 6, 10, 10}
Canonical - 10 member cycles only added if found before larger cycles
Non-canonical
PubChem Fingerprint - 10 member cycles were found
CID 17858819 Compound Image
{5, 6, 9}
{5, 6, 9}
{5, 6, 9}
Canonical
Non-canonical
PubChem Fingerprint
CID 1909 Compound Image
{5, 5, 5, 6, 9, 16, 17, 17, 17, 18}
{5, 5, 5, 6, 9, 16, 17, 17, 17, 18}
{5, 5, 5, 6}
Canonical - 9 member cycle only added if found before larger cycles
Non-canonical
PubChem Fingerprint - 9 member cycle not found
Author:
John May
See Also:
Source code:
main
Belongs to CDK module:
core
Keywords:
ESSSR, ring, cycle
  • Constructor Details

    • TripletShortCycles

      public TripletShortCycles(MinimumCycleBasis mcb, boolean canonical)
      Compute the cycles of the extended smallest set of smallest rings (ESSSR) for an existing minimum cycle basis. Choosing the set to be canonical means the set depends on the order of the vertices and may not be consistent in subgraphs. Given a different order of vertices the same cycles may not be found.
      Parameters:
      mcb - minimum cycle basis
      canonical - should the set be canonical (non-unique)
  • Method Details

    • paths

      public int[][] paths()
      Access the vertex paths for all cycles of the basis.
      Returns:
      paths of the basis
    • size

      public int size()
      Size of the cycle basis, cardinality of the ESSSR.
      Returns:
      number of cycles in the basis