Class Mappings

  • All Implemented Interfaces:
    Iterable<int[]>

    public final class Mappings
    extends Object
    implements Iterable<int[]>
    A fluent interface for handling (sub)-graph mappings from a query to a target structure. The utility allows one to modify the mappings and provides convenience utilities. Mappings are obtained from a (sub)-graph matching using Pattern.
     IAtomContainer query  = ...;
     IAtomContainer target = ...;
    
     Mappings mappings = Pattern.findSubstructure(query)
                                .matchAll(target);
     
    The primary function is to provide an iterable of matches - each match is a permutation (mapping) of the query graph indices (atom indices).
    
     for (int[] p : mappings) {
         for (int i = 0; i < p.length; i++) {
             // query.getAtom(i) is mapped to target.getAtom(p[i])
         }
     }
     
    The matches can be filtered to provide only those that have valid stereochemistry.
     for (int[] p : mappings.stereochemistry()) {
         // ...
     }
     
    Unique matches can be obtained for both atoms and bonds.
     for (int[] p : mappings.uniqueAtoms()) {
         // ...
     }
    
     for (int[] p : mappings.uniqueBonds()) {
         // ...
     }
     
    Because matches may be generated lazily, iterating over the mappings twice (as above) will actually perform two graph matchings. If the mappings are needed for subsequent use, toArray() provides the permutations as a fixed-size array.
     int[][] ps = mappings.toArray();
     for (int[] p : ps) {
        // ...
     }
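
    The cost of iterating a lazy source twice can be illustrated without CDK. The following is a minimal sketch, assuming a hypothetical CountingSource that restarts its "search" on every iterator() call and a hypothetical materialize helper that plays the role toArray() plays above; neither is part of the CDK API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class LazyMatchesSketch {

    /** Hypothetical stand-in for a lazy match source: every iterator() call restarts the search. */
    static final class CountingSource implements Iterable<int[]> {
        int searches = 0; // number of times the search was (re)started

        @Override
        public Iterator<int[]> iterator() {
            searches++;
            // two hand-written permutations standing in for real matches
            return Arrays.asList(new int[]{0, 1}, new int[]{1, 0}).iterator();
        }
    }

    /** Copies the matches into a fixed-size array with a single pass over the source. */
    static int[][] materialize(Iterable<int[]> matches) {
        List<int[]> out = new ArrayList<>();
        for (int[] p : matches) out.add(p);
        return out.toArray(new int[0][]);
    }

    public static void main(String[] args) {
        CountingSource twice = new CountingSource();
        for (int[] p : twice) { /* first pass */ }
        for (int[] p : twice) { /* second pass restarts the search */ }
        System.out.println("searches when iterating lazily twice: " + twice.searches);

        CountingSource once = new CountingSource();
        int[][] ps = materialize(once);
        for (int[] p : ps) { /* first pass over the array */ }
        for (int[] p : ps) { /* second pass over the array, no new search */ }
        System.out.println("searches after materializing: " + once.searches);
    }
}
```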
     
    Graphs with a high number of automorphisms can produce many valid matchings. Operations can be combined, for example to limit the number of matches retrieved.
     // first ten matches
     for (int[] p : mappings.limit(10)) {
         // ...
     }
    
     // first 10 unique matches
     for (int[] p : mappings.uniqueAtoms()
                            .limit(10)) {
         // ...
     }
    
     // ensure we don't waste memory and only 'fix' up to 100 unique matches
     int[][] ps = mappings.uniqueAtoms()
                          .limit(100)
                          .toArray();
     
    There are no restrictions on which operations can be applied or how many times, but the order of operations may change the result.
     // first 100 unique matches
     Mappings m = mappings.uniqueAtoms()
                          .limit(100);
    
     // unique matches in the first 100 matches
     Mappings m = mappings.limit(100)
                          .uniqueAtoms();
    
     // first 10 unique matches in the first 100 matches
     Mappings m = mappings.limit(100)
                          .uniqueAtoms()
                          .limit(10);
    
     // number of unique atom matches
     int n = mappings.countUnique();
    
     // number of unique atom matches with correct stereochemistry
     int n = mappings.stereochemistry()
                     .countUnique();
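
    Why the order of uniqueAtoms() and limit() matters can be seen on a small hand-written list of permutations. This is a sketch under stated assumptions: the uniqueAtoms and limit helpers below are hypothetical list-based stand-ins that keep the first mapping per distinct set of target atoms and the first n mappings respectively; they are not the CDK implementations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class OrderOfOperationsSketch {

    /** Keeps the first mapping seen for each distinct set of target atom indices. */
    static List<int[]> uniqueAtoms(List<int[]> mappings) {
        Set<BitSet> seen = new HashSet<>();
        List<int[]> out = new ArrayList<>();
        for (int[] p : mappings) {
            BitSet atoms = new BitSet();
            for (int v : p) atoms.set(v);
            if (seen.add(atoms)) out.add(p); // add() is false if this atom set was already seen
        }
        return out;
    }

    /** Keeps at most the first n mappings. */
    static List<int[]> limit(List<int[]> mappings, int n) {
        return new ArrayList<>(mappings.subList(0, Math.min(n, mappings.size())));
    }

    public static void main(String[] args) {
        // four mappings covering only two distinct atom sets: {0,1} and {2,3}
        List<int[]> all = Arrays.asList(
                new int[]{0, 1}, new int[]{1, 0},
                new int[]{2, 3}, new int[]{3, 2});

        // first 2 unique matches
        System.out.println(limit(uniqueAtoms(all), 2).size());
        // unique matches within the first 2 matches
        System.out.println(uniqueAtoms(limit(all, 2)).size());
    }
}
```

    In this example the first two mappings cover the same atoms, so limiting before de-duplicating leaves a single unique match, whereas de-duplicating first yields two.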
    
     
    Author:
    John May
    See Also:
    Pattern
    Source code:
    main
    Belongs to CDK module:
    isomorphism
    Keywords:
    substructure search, structure search, mappings, matching
    • Method Detail

      • filter

        public Mappings filter​(Predicate<int[]> predicate)
        Filter the mappings and keep only those which match the provided predicate (Guava).
        
        
             final IAtomContainer query;
             final IAtomContainer target;
        
             // obtain only the mappings where the first atom in the query is
             // mapped to the first atom in the target
             Mappings mappings = Pattern.findSubstructure(query)
                                        .matchAll(target)
                                        .filter(new Predicate<int[]>() {
                                            public boolean apply(int[] input) {
                                                return input[0] == 0;
                                            }});
        
         
        Parameters:
        predicate - a predicate
        Returns:
        fluent-api reference
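
        The same kind of predicate can be written as a lambda. The following is a minimal sketch that runs without CDK: it assumes a plain List of permutations in place of Mappings, uses java.util.function.Predicate rather than the Guava type named above, and the filter helper is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class FilterSketch {

    /** Keeps only the mappings accepted by the predicate (list-based stand-in for a filter step). */
    static List<int[]> filter(List<int[]> mappings, Predicate<int[]> predicate) {
        return mappings.stream().filter(predicate).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // hand-written permutations: query atom i -> target atom p[i]
        List<int[]> all = Arrays.asList(
                new int[]{0, 1}, new int[]{1, 0}, new int[]{2, 3});

        // keep mappings where the first query atom maps to the first target atom
        List<int[]> kept = filter(all, p -> p[0] == 0);
        for (int[] p : kept) System.out.println(Arrays.toString(p));
    }
}
```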
      • map

        public <T> Iterable<T> map​(Function<int[],​T> f)
        Map the mappings to another type. Each mapping is transformed using the provided function.
        
        
             final IAtomContainer query;
             final IAtomContainer target;
        
             Mappings mappings = Pattern.findSubstructure(query)
                                        .matchAll(target);
        
             // a string that indicates the mapping of atom elements and numbers
             Iterable<String> strs = mappings.map(new Function<int[], String>() {
                 public String apply(int[] input) {
                     StringBuilder sb = new StringBuilder();
                     for (int i = 0; i < input.length; i++) {
                         if (i > 0) sb.append(", ");
                         sb.append(query.getAtom(i))
                           .append(i + 1)
                           .append(" -> ")
                           .append(target.getAtom(input[i]))
                           .append(input[i] + 1);
                     }
                     return sb.toString();
                 }});
        
         
        Parameters:
        f - function to transform a mapping
        Returns:
        iterable of the transformed type
      • limit

        public Mappings limit​(int limit)
        Limit the number of mappings - only this number of mappings will be generated.
        Parameters:
        limit - the number of mappings
        Returns:
        fluent-api instance
      • stereochemistry

        @Deprecated
        public Mappings stereochemistry()
        Deprecated.
        Results now automatically consider stereo if it is present; to match without stereochemistry, remove the stereo features.
        Filter the mappings for those which preserve stereochemistry specified in the query.
        Returns:
        fluent-api instance
      • uniqueAtoms

        public Mappings uniqueAtoms()
        Filter the mappings for those which cover a unique set of atoms in the target. The unique atom mappings are a subset of the unique bond matches.
        Returns:
        fluent-api instance
        See Also:
        uniqueBonds(), exclusiveAtoms()
      • exclusiveAtoms

        public Mappings exclusiveAtoms()
        Filter the mappings for those which cover an exclusive set of atoms in the target. If a match overlaps with another one it is not returned. For example, suppose we had the query C~O and matched against a carboxylic acid *C(O)=O: there are 2 unique matches but only 1 exclusive match. If we had two -CO2 groups (c1ccc(C(O)=O)cc1C(O)=O) there are 4 unique matches and 2 exclusive matches. The exclusive atom mappings are therefore a subset of the unique atom matches.
        Returns:
        fluent-api instance
        See Also:
        uniqueAtoms(), ExclusiveAtomMatches
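
        The difference between unique and exclusive matches in the C~O example can be reproduced on hand-written mappings. This is a sketch under stated assumptions: the atom indices are written out by hand from the SMILES above, and the uniqueAtoms and exclusiveAtoms helpers are hypothetical list-based stand-ins (exclusive matches are chosen greedily in iteration order), not the CDK implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ExclusiveAtomsSketch {

    /** Keeps the first mapping seen for each distinct set of target atom indices. */
    static List<int[]> uniqueAtoms(List<int[]> mappings) {
        Set<BitSet> seen = new HashSet<>();
        List<int[]> out = new ArrayList<>();
        for (int[] p : mappings) {
            BitSet atoms = new BitSet();
            for (int v : p) atoms.set(v);
            if (seen.add(atoms)) out.add(p);
        }
        return out;
    }

    /** Keeps a mapping only if none of its target atoms were used by a previously kept mapping. */
    static List<int[]> exclusiveAtoms(List<int[]> mappings) {
        BitSet visited = new BitSet();
        List<int[]> out = new ArrayList<>();
        for (int[] p : mappings) {
            boolean overlaps = false;
            for (int v : p) {
                if (visited.get(v)) { overlaps = true; break; }
            }
            if (overlaps) continue;
            for (int v : p) visited.set(v);
            out.add(p);
        }
        return out;
    }

    public static void main(String[] args) {
        // *C(O)=O : atoms 0=*, 1=C, 2=O, 3=O; C~O maps to (1,2) and (1,3)
        List<int[]> oneAcid = Arrays.asList(new int[]{1, 2}, new int[]{1, 3});
        // c1ccc(C(O)=O)cc1C(O)=O : carboxyl carbons at 4 and 9, oxygens at 5, 6, 10, 11
        List<int[]> twoAcids = Arrays.asList(
                new int[]{4, 5}, new int[]{4, 6}, new int[]{9, 10}, new int[]{9, 11});

        System.out.println(uniqueAtoms(oneAcid).size() + " unique, "
                + exclusiveAtoms(oneAcid).size() + " exclusive");
        System.out.println(uniqueAtoms(twoAcids).size() + " unique, "
                + exclusiveAtoms(twoAcids).size() + " exclusive");
    }
}
```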
      • uniqueBonds

        public Mappings uniqueBonds()
        Filter the mappings for those which cover a unique set of bonds in the target.
        Returns:
        fluent-api instance
        See Also:
        uniqueAtoms()
      • toArray

        public int[][] toArray()
        Mappings are lazily generated and best used in a loop. However if all mappings are required this method can provide a fixed size array of mappings.
        
         IAtomContainer query  = ...;
         IAtomContainer target = ...;
        
         Pattern pat = Pattern.findSubstructure(query);
        
         // lazy iterator
         for (int[] mapping : pat.matchAll(target)) {
             // logic...
         }
        
         int[][] mappings = pat.matchAll(target)
                               .toArray();
        
         // same as the lazy iterator, but we can now refer to and pass 'mappings'
         // to other methods without regenerating the graph match
         for (int[] mapping : mappings) {
             // logic...
         }
         
        The method can be used in combination with other modifiers.
        
         IAtomContainer query  = ...;
         IAtomContainer target = ...;
        
         Pattern pat = Pattern.findSubstructure(query);
        
         // array of the first 5 unique atom mappings
         int[][] mappings = pat.matchAll(target)
                               .uniqueAtoms()
                               .limit(5)
                               .toArray();
         
        Returns:
        array of mappings
      • toAtomMap

        public Iterable<Map<IAtom,​IAtom>> toAtomMap()
        Convert the permutations to an atom-atom map.
         for (Map<IAtom,IAtom> map : mappings.toAtomMap()) {
             for (Map.Entry<IAtom,IAtom> e : map.entrySet()) {
                 IAtom queryAtom  = e.getKey();
                 IAtom targetAtom = e.getValue();
             }
         }
         
        Returns:
        iterable of atom-atom mappings
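
        How a permutation relates to such a map can be sketched without CDK. The example below assumes strings standing in for IAtom instances and a hypothetical toMap helper; it pairs query element i with target element p[i], which is the reading of a permutation given in the class description.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PermutationToMapSketch {

    /** Pairs query element i with target element p[i]; insertion order follows the query. */
    static <T> Map<T, T> toMap(List<T> query, List<T> target, int[] p) {
        Map<T, T> map = new LinkedHashMap<>();
        for (int i = 0; i < p.length; i++) {
            map.put(query.get(i), target.get(p[i]));
        }
        return map;
    }

    public static void main(String[] args) {
        // strings stand in for atoms; the labels are illustrative only
        List<String> query = Arrays.asList("query C", "query O");
        List<String> target = Arrays.asList("target *", "target C", "target O1", "target O2");

        // permutation: query atom 0 -> target atom 1, query atom 1 -> target atom 3
        Map<String, String> map = toMap(query, target, new int[]{1, 3});
        for (Map.Entry<String, String> e : map.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```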
      • toBondMap

        public Iterable<Map<IBond,​IBond>> toBondMap()
        Convert the permutations to a bond-bond map.
         for (Map<IBond,IBond> map : mappings.toBondMap()) {
             for (Map.Entry<IBond,IBond> e : map.entrySet()) {
                 IBond queryBond  = e.getKey();
                 IBond targetBond = e.getValue();
             }
         }
         
        Returns:
        iterable of bond-bond mappings
      • toAtomBondMap

        public Iterable<Map<IChemObject,​IChemObject>> toAtomBondMap()
        Convert the permutations to an atom-atom bond-bond map.
         for (Map<IChemObject,IChemObject> map : mappings.toAtomBondMap()) {
             for (Map.Entry<IChemObject,IChemObject> e : map.entrySet()) {
                 IChemObject queryObj  = e.getKey();
                 IChemObject targetObj = e.getValue();
             }
        
             // i is the index of a query atom or bond; values are typed as
             // IChemObject, so a cast is needed
             IAtom matchedAtom = (IAtom) map.get(query.getAtom(i));
             IBond matchedBond = (IBond) map.get(query.getBond(i));
         }
         
        Returns:
        iterable of atom-atom and bond-bond mappings
      • toChemObjects

        public Iterable<IChemObject> toChemObjects()
        Obtain the chem objects (atoms and bonds) that have 'hit' in the target molecule.
         for (IChemObject obj : mappings.toChemObjects()) {
           if (obj instanceof IAtom) {
              // this atom was 'hit' by the pattern
           }
         }
         
        Returns:
        non-lazy iterable of chem objects
      • toSubstructuresStream

        public Stream<IAtomContainer> toSubstructuresStream()
        Obtain the mapped substructures (atoms/bonds) of the target compound. The atoms and bonds are the same as in the target molecule but there may be fewer of them.
         IAtomContainer query, target;
         Mappings mappings = ...;
         mappings.toSubstructuresStream().forEach(mol -> {
            for (IAtom atom : mol.atoms())
              target.contains(atom); // always true
            for (IAtom atom : target.atoms())
              mol.contains(atom); // not always true
         });
         
        Returns:
        lazy stream of molecules
      • toSubstructures

        public Iterable<IAtomContainer> toSubstructures()
        Obtain the mapped substructures (atoms/bonds) of the target compound. The atoms and bonds are the same as in the target molecule but there may be fewer of them.
         IAtomContainer query, target;
         Mappings mappings = ...;
         for (IAtomContainer mol : mappings.toSubstructures()) {
            for (IAtom atom : mol.atoms())
              target.contains(atom); // always true
            for (IAtom atom : target.atoms())
              mol.contains(atom); // not always true
         }
         
        Returns:
        non-lazy iterable of molecules
      • atLeast

        public boolean atLeast​(int n)
        Efficiently determine if there are at least 'n' matches.
         Mappings mappings = ...;
        
         if (mappings.atLeast(5))
            // set bit flag etc.
        
         // are there at least 5 unique matches?
         if (mappings.uniqueAtoms().atLeast(5))
            // set bit etc.
         
        Parameters:
        n - number of matches
        Returns:
        there are at least 'n' matches
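
        One way such a check can avoid enumerating every match is to stop pulling from the lazy source as soon as n matches have been seen. The following is a sketch of that idea only, assuming a hypothetical CountingMatches iterator that records how many matches it has produced; it is not taken from the CDK source.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

class AtLeastSketch {

    /** Hypothetical lazy source of 'total' matches that records how many were actually produced. */
    static final class CountingMatches implements Iterator<int[]> {
        final int total;
        int produced = 0;

        CountingMatches(int total) { this.total = total; }

        @Override
        public boolean hasNext() { return produced < total; }

        @Override
        public int[] next() {
            if (!hasNext()) throw new NoSuchElementException();
            return new int[]{produced++}; // dummy one-atom mapping
        }
    }

    /** Returns true once n matches have been seen, without consuming the rest of the source. */
    static boolean atLeast(Iterator<int[]> matches, int n) {
        int seen = 0;
        while (seen < n && matches.hasNext()) {
            matches.next();
            seen++;
        }
        return seen >= n;
    }

    public static void main(String[] args) {
        CountingMatches many = new CountingMatches(1000);
        System.out.println(atLeast(many, 5) + ", produced " + many.produced);

        CountingMatches few = new CountingMatches(3);
        System.out.println(atLeast(few, 5) + ", produced " + few.produced);
    }
}
```

        With 1000 available matches only the first 5 are generated, which is what makes the check cheaper than a full count.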
      • first

        public int[] first()
        Obtain the first match - if there is no first match an empty array is returned.
        Returns:
        first match
      • count

        public int count()
        Convenience method to count the number of mappings. Note that mappings are lazily generated, so checking the count and then iterating over the mappings currently performs two searches. If the mappings are also needed, it is more efficient to iterate over the mappings and count manually.
        Returns:
        number of matches
      • countUnique

        public int countUnique()
        Convenience method to count the number of unique atom mappings. Note that mappings are lazily generated, so checking the count and then iterating over the mappings currently performs two searches. If the mappings are also needed, it is more efficient to iterate over the mappings and count manually. The method simply invokes mappings.uniqueAtoms().count().
        Returns:
        number of matches