Class Canon


  • public final class Canon
    extends Object
    An implementation based on the canon algorithm [Weininger, David et. al.. Journal of Chemical Information and Computer Sciences. 1989. 29]. The algorithm uses an initial set of of invariants which are assigned a rank. Equivalent ranks are then shattered using an unambiguous function (in this case, the product of primes of adjacent ranks). Once no more equivalent ranks can be shattered ties are artificially broken and rank shattering continues. Unlike the original description rank stability is not maintained reducing the number of values to rank at each stage to only those which are equivalent. The initial set of invariants is basic and are - "sufficient for the purpose of obtaining unique notation for simple SMILES, but it is not necessarily a “complete” set. No “perfect” set of invariants is known that will distinguish all possible graph asymmetries. However, for any given set of structures, a set of invariants can be devised to provide the necessary discrimination" [Weininger, David et. al.. Journal of Chemical Information and Computer Sciences. 1989. 29]. As such this producer should not be considered a complete canonical labelled but in practice performs well. For a more accurate and computationally expensive labelling, please using the InChINumbersTools.
     IAtomContainer m = ...;
     int[][]        g = GraphUtil.toAdjList(m);
    
     // obtain canon labelling
     long[] labels = Canon.label(m, g);
    
     // obtain symmetry classes
     long[] labels = Canon.symmetry(m, g);
     
    Author:
    John May
    Source code:
    main
    Belongs to CDK module:
    standard
    • Method Detail

      • label

        public static long[] label​(IAtomContainer container,
                                   int[][] g,
                                   int opts)
        Compute the canonical labels for the provided structure. The labelling does not consider isomer information or stereochemistry. The current implementation does not fully distinguish all structure topologies but in practise performs well in the majority of cases. A complete canonical labelling can be obtained using the InChINumbersTools but is computationally much more expensive.
        Parameters:
        container - structure
        g - adjacency list graph representation
        opts - canonical generation options see CanonOpts
        Returns:
        the canonical labelling
        See Also:
        EquivalentClassPartitioner, InChINumbersTools
      • label

        public static long[] label​(IAtomContainer container,
                                   int[][] g)
        Compute the canonical labels for the provided structure. The labelling does not consider isomer information or stereochemistry. The current implementation does not fully distinguish all structure topologies but in practise performs well in the majority of cases. A complete canonical labelling can be obtained using the InChINumbersTools but is computationally much more expensive.
        Parameters:
        container - structure
        g - adjacency list graph representation
        Returns:
        the canonical labelling
        See Also:
        EquivalentClassPartitioner, InChINumbersTools
      • label

        public static long[] label​(IAtomContainer container,
                                   int[][] g,
                                   long[] initial)
        Compute the canonical labels for the provided structure. The labelling does not consider isomer information or stereochemistry. This method allows provision of a custom array of initial invariants. The current implementation does not fully distinguish all structure topologies but in practise performs well in the majority of cases. A complete canonical labelling can be obtained using the InChINumbersTools but is computationally much more expensive.
        Parameters:
        container - structure
        g - adjacency list graph representation
        initial - initial seed invariants
        Returns:
        the canonical labelling
        See Also:
        EquivalentClassPartitioner, InChINumbersTools
      • label

        public static long[] label​(IAtomContainer container,
                                   int[][] g,
                                   Comparator<IAtom> cmp)
        Compute the canonical labels for the provided structure. The initial labelling is seed-ed with the provided atom comparator cmp allowing arbitary properties to be distinguished or ignored.
        Parameters:
        container - structure
        g - adjacency list graph representation
        cmp - comparator to compare atoms
        Returns:
        the canonical labelling
      • symmetry

        public static long[] symmetry​(IAtomContainer container,
                                      int[][] g,
                                      int opts)
        Compute the symmetry classes for the provided structure. There are known examples where symmetry is incorrectly found. The EquivalentClassPartitioner gives more accurate symmetry perception but this method is very quick and in practise successfully portions the majority of chemical structures.
        Parameters:
        container - structure
        g - adjacency list graph representation
        opts - canonical generation options see CanonOpts
        Returns:
        symmetry classes
        See Also:
        EquivalentClassPartitioner
      • symmetry

        public static long[] symmetry​(IAtomContainer container,
                                      int[][] g)
        Compute the symmetry classes for the provided structure. There are known examples where symmetry is incorrectly found. The EquivalentClassPartitioner gives more accurate symmetry perception but this method is very quick and in practise successfully portions the majority of chemical structures.
        Parameters:
        container - structure
        g - adjacency list graph representation
        Returns:
        symmetry classes
        See Also:
        EquivalentClassPartitioner, basicInvariants(IAtomContainer, int[][], int)
      • basicInvariants

        public static long[] basicInvariants​(IAtomContainer container,
                                             int[][] graph,
                                             int flav)
        Generate the initial invariants for each atom in the container. The labels use the invariants described in [Weininger, David et. al.. Journal of Chemical Information and Computer Sciences. 1989. 29]. The bits in the low 32-bits are: 0000000000xxxxXXXXeeeeeeescchhhh where:
        • 0: padding
        • x: number of connections
        • X: number of non-hydrogens bonds
        • e: atomic number
        • s: sign of charge
        • c: absolute charge
        • h: number of attached hydrogens
        Important: These invariants are basic and there are known examples they don't distinguish. One trivial example to consider is [O]C=O where both oxygens have no hydrogens and a single connection but the atoms are not equivalent. Including a better initial partition is more expensive
        Parameters:
        container - an atom container to generate labels for
        graph - graph representation (adjacency list)
        flav - bit mask canon flavor (see CanonOpts)
        Returns:
        initial invariants
        Throws:
        NullPointerException - an atom had unset atomic number, hydrogen count or formal charge