java.lang.Object
org.openscience.cdk.graph.invariant.Canon

public final class Canon extends Object
An implementation based on the canon algorithm [Weininger, David et. al.. Journal of Chemical Information and Computer Sciences. 1989. 29]. The algorithm uses an initial set of of invariants which are assigned a rank. Equivalent ranks are then shattered using an unambiguous function (in this case, the product of primes of adjacent ranks). Once no more equivalent ranks can be shattered ties are artificially broken and rank shattering continues. Unlike the original description rank stability is not maintained reducing the number of values to rank at each stage to only those which are equivalent. The initial set of invariants is basic and are - "sufficient for the purpose of obtaining unique notation for simple SMILES, but it is not necessarily a “complete” set. No “perfect” set of invariants is known that will distinguish all possible graph asymmetries. However, for any given set of structures, a set of invariants can be devised to provide the necessary discrimination" [Weininger, David et. al.. Journal of Chemical Information and Computer Sciences. 1989. 29]. As such this producer should not be considered a complete canonical labelled but in practice performs well. For a more accurate and computationally expensive labelling, please using the InChINumbersTools.
 IAtomContainer m = ...;
 int[][]        g = GraphUtil.toAdjList(m);

 // obtain canon labelling
 long[] labels = Canon.label(m, g);

 // obtain symmetry classes
 long[] labels = Canon.symmetry(m, g);
 
Author:
John May
Source code:
main
Belongs to CDK module:
standard
  • Method Details

    • label

      public static long[] label(IAtomContainer container, int[][] g, int opts)
      Compute the canonical labels for the provided structure. The labelling does not consider isomer information or stereochemistry. The current implementation does not fully distinguish all structure topologies but in practise performs well in the majority of cases. A complete canonical labelling can be obtained using the InChINumbersTools but is computationally much more expensive.
      Parameters:
      container - structure
      g - adjacency list graph representation
      opts - canonical generation options see CanonOpts
      Returns:
      the canonical labelling
      See Also:
    • label

      public static long[] label(IAtomContainer container, int[][] g)
      Compute the canonical labels for the provided structure. The labelling does not consider isomer information or stereochemistry. The current implementation does not fully distinguish all structure topologies but in practise performs well in the majority of cases. A complete canonical labelling can be obtained using the InChINumbersTools but is computationally much more expensive.
      Parameters:
      container - structure
      g - adjacency list graph representation
      Returns:
      the canonical labelling
      See Also:
    • label

      public static long[] label(IAtomContainer container, int[][] g, long[] initial)
      Compute the canonical labels for the provided structure. The labelling does not consider isomer information or stereochemistry. This method allows provision of a custom array of initial invariants. The current implementation does not fully distinguish all structure topologies but in practise performs well in the majority of cases. A complete canonical labelling can be obtained using the InChINumbersTools but is computationally much more expensive.
      Parameters:
      container - structure
      g - adjacency list graph representation
      initial - initial seed invariants
      Returns:
      the canonical labelling
      See Also:
    • label

      public static long[] label(IAtomContainer container, int[][] g, Comparator<IAtom> cmp)
      Compute the canonical labels for the provided structure. The initial labelling is seed-ed with the provided atom comparator cmp allowing arbitary properties to be distinguished or ignored.
      Parameters:
      container - structure
      g - adjacency list graph representation
      cmp - comparator to compare atoms
      Returns:
      the canonical labelling
    • symmetry

      public static long[] symmetry(IAtomContainer container, int[][] g, int opts)
      Compute the symmetry classes for the provided structure. There are known examples where symmetry is incorrectly found. The EquivalentClassPartitioner gives more accurate symmetry perception but this method is very quick and in practise successfully portions the majority of chemical structures.
      Parameters:
      container - structure
      g - adjacency list graph representation
      opts - canonical generation options see CanonOpts
      Returns:
      symmetry classes
      See Also:
    • symmetry

      public static long[] symmetry(IAtomContainer container, int[][] g)
      Compute the symmetry classes for the provided structure. There are known examples where symmetry is incorrectly found. The EquivalentClassPartitioner gives more accurate symmetry perception but this method is very quick and in practise successfully portions the majority of chemical structures.
      Parameters:
      container - structure
      g - adjacency list graph representation
      Returns:
      symmetry classes
      See Also:
    • basicInvariants

      public static long[] basicInvariants(IAtomContainer container, int[][] graph)
      Parameters:
      container - an atom container to generate labels for
      graph - graph representation (adjacency list)
      Returns:
      the initial invariants
      See Also:
    • basicInvariants

      public static long[] basicInvariants(IAtomContainer container, int[][] graph, int flav)
      Generate the initial invariants for each atom in the container. The labels use the invariants described in [Weininger, David et. al.. Journal of Chemical Information and Computer Sciences. 1989. 29]. The bits in the low 32-bits are: 0000000000xxxxXXXXeeeeeeescchhhh where:
      • 0: padding
      • x: number of connections
      • X: number of non-hydrogens bonds
      • e: atomic number
      • s: sign of charge
      • c: absolute charge
      • h: number of attached hydrogens
      Important: These invariants are basic and there are known examples they don't distinguish. One trivial example to consider is [O]C=O where both oxygens have no hydrogens and a single connection but the atoms are not equivalent. Including a better initial partition is more expensive
      Parameters:
      container - an atom container to generate labels for
      graph - graph representation (adjacency list)
      flav - bit mask canon flavor (see CanonOpts)
      Returns:
      initial invariants
      Throws:
      NullPointerException - an atom had unset atomic number, hydrogen count or formal charge