Class Mappings

java.lang.Object
org.openscience.cdk.isomorphism.Mappings
All Implemented Interfaces:
Iterable<int[]>

public final class Mappings extends Object implements Iterable<int[]>
A fluent interface for handling (sub)-graph mappings from a query to a target structure. The class allows the mappings to be modified and provides convenience methods. Mappings are obtained from a (sub)-graph matching using Pattern.
 IAtomContainer query  = ...;
 IAtomContainer target = ...;

 Mappings mappings = Pattern.findSubstructure(query)
                            .matchAll(target);
 
The primary function is to provide an iterable of matches - each match is a permutation (mapping) of the query graph indices (atom indices).

 for (int[] p : mappings) {
     for (int i = 0; i < p.length; i++) {
         // query.getAtom(i) is mapped to target.getAtom(p[i]);
     }
 }
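To make the index convention concrete, here is a small sketch that renders one mapping as query-to-target pairs. It is plain Java with no CDK dependency. Atoms are modelled as element-symbol strings in arrays, which is an assumption made for illustration. `PermutationSketch` and `describe` are hypothetical names and are not part of the CDK API.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch (no CDK): atoms are element-symbol strings and a mapping p
// is read as "query atom i is mapped to target atom p[i]".
final class PermutationSketch {

    // Render each query-to-target atom pairing of one mapping, e.g. "C1->C2".
    static List<String> describe(String[] query, String[] target, int[] p) {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < p.length; i++) {
            // arrays are 0-based; the printed atom numbers are 1-based
            pairs.add(query[i] + (i + 1) + "->" + target[p[i]] + (p[i] + 1));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String[] query  = {"C", "O"};            // C~O
        String[] target = {"C", "C", "O", "O"};  // *C(O)=O with '*' taken as a carbon
        int[] p = {1, 3};                        // query C -> target 1, query O -> target 3
        System.out.println(describe(query, target, p)); // [C1->C2, O2->O4]
    }
}
```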
 
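The matches can be filtered to provide only those that have valid stereochemistry (this filter is deprecated, since results now consider stereochemistry automatically when it is present).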
 for (int[] p : mappings.stereochemistry()) {
     // ...
 }
 
Unique matches can be obtained for both atoms and bonds.
 for (int[] p : mappings.uniqueAtoms()) {
     // ...
 }

 for (int[] p : mappings.uniqueBonds()) {
     // ...
 }
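The difference between the two kinds of uniqueness can be sketched without CDK. The sketch assumes that a mapping counts as unique when no earlier mapping covered the same set of target atoms (or target bonds). Under that assumption, a 3-atom path query matched against a 3-membered ring yields six mappings but only one unique atom set and three unique bond sets. `UniqueSketch` and its methods are hypothetical helpers, not the library's implementation.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java sketch (no CDK) of "unique atoms" versus "unique bonds".
final class UniqueSketch {

    // Number of distinct sets of target atoms covered by the mappings.
    static int countUniqueAtoms(List<int[]> mappings) {
        Set<BitSet> seen = new HashSet<>();
        for (int[] p : mappings) {
            BitSet atoms = new BitSet();
            for (int t : p) atoms.set(t);
            seen.add(atoms);
        }
        return seen.size();
    }

    // Number of distinct sets of target bonds; queryBonds holds {u, v} query atom pairs.
    static int countUniqueBonds(List<int[]> mappings, int[][] queryBonds) {
        Set<Set<Long>> seen = new HashSet<>();
        for (int[] p : mappings) {
            Set<Long> bonds = new HashSet<>();
            for (int[] b : queryBonds) {
                int u = p[b[0]], v = p[b[1]];
                // encode the unordered target bond {u, v} as a single long key
                bonds.add(((long) Math.min(u, v) << 32) | Math.max(u, v));
            }
            seen.add(bonds);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Query: path 0-1-2. Target: 3-membered ring, so every permutation maps.
        int[][] queryBonds = {{0, 1}, {1, 2}};
        List<int[]> all = Arrays.asList(
            new int[]{0, 1, 2}, new int[]{0, 2, 1}, new int[]{1, 0, 2},
            new int[]{1, 2, 0}, new int[]{2, 0, 1}, new int[]{2, 1, 0});
        System.out.println(countUniqueAtoms(all));             // 1
        System.out.println(countUniqueBonds(all, queryBonds)); // 3
    }
}
```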
 
As matches may be lazily generated, iterating over the matches twice (as above) will actually perform two graph matchings. If the mappings are needed for subsequent use, toArray() provides the permutations as a fixed-size array.
 int[][] ps = mappings.toArray();
 for (int[] p : ps) {
    // ...
 }
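The cost of repeated traversal can be illustrated with a stand-in lazy `Iterable`. In this plain-Java sketch (no CDK), every call to `iterator()` counts as one search. Iterating twice therefore starts two searches, while copying the mappings into an `int[][]` once and reusing the array starts only one. `LazySketch` and its fixed result list are hypothetical stand-ins for the matcher that `Mappings` wraps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Plain-Java sketch (no CDK): an Iterable that "searches" on every traversal.
final class LazySketch implements Iterable<int[]> {
    int searches = 0;                  // how many traversals were started
    private final List<int[]> results; // hypothetical precomputed match results

    LazySketch(List<int[]> results) {
        this.results = results;
    }

    @Override
    public Iterator<int[]> iterator() {
        searches++;                    // each for-each loop starts a new "search"
        return results.iterator();
    }

    // Copy all mappings into a fixed-size array with a single traversal.
    static int[][] toArray(Iterable<int[]> mappings) {
        List<int[]> buffer = new ArrayList<>();
        for (int[] p : mappings) buffer.add(p);
        return buffer.toArray(new int[0][]);
    }

    public static void main(String[] args) {
        LazySketch lazy = new LazySketch(Arrays.asList(new int[]{1, 2}, new int[]{1, 3}));
        for (int[] p : lazy) { /* first use */ }
        for (int[] p : lazy) { /* second use */ }
        System.out.println(lazy.searches);  // 2

        LazySketch fixed = new LazySketch(Arrays.asList(new int[]{1, 2}, new int[]{1, 3}));
        int[][] ps = toArray(fixed);
        for (int[] p : ps) { /* first use */ }
        for (int[] p : ps) { /* second use */ }
        System.out.println(fixed.searches); // 1
    }
}
```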
 
Graphs with a high number of automorphisms can produce many valid matchings. Operations can be combined, for example to limit the number of matches retrieved.
 // first ten matches
 for (int[] p : mappings.limit(10)) {
     // ...
 }

 // first 10 unique matches
 for (int[] p : mappings.uniqueAtoms()
                        .limit(10)) {
     // ...
 }

 // ensure we don't waste memory and only 'fix' up to 100 unique matches
 int[][] ps = mappings.uniqueAtoms()
                      .limit(100)
                      .toArray();
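Why `limit` saves work on a lazy sequence can be sketched with a wrapping iterator. Once `n` items have been returned, the wrapper stops pulling from the source, so the remaining matches are never generated. `LimitSketch` and its `Source` are hypothetical plain-Java stand-ins for a matcher, not CDK classes.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Plain-Java sketch (no CDK) of a short-circuiting limit over a lazy iterator.
final class LimitSketch {

    // Hypothetical stand-in for a lazy matcher that could yield 'total' matches.
    static final class Source implements Iterator<int[]> {
        final int total;
        int produced = 0;              // number of matches generated so far

        Source(int total) { this.total = total; }

        @Override
        public boolean hasNext() { return produced < total; }

        @Override
        public int[] next() {
            if (!hasNext()) throw new NoSuchElementException();
            return new int[]{produced++};
        }
    }

    // Wrap an iterator so that at most 'limit' elements are returned.
    static <T> Iterator<T> limit(Iterator<T> it, int limit) {
        return new Iterator<T>() {
            int remaining = limit;

            @Override
            public boolean hasNext() { return remaining > 0 && it.hasNext(); }

            @Override
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                remaining--;
                return it.next();
            }
        };
    }

    public static void main(String[] args) {
        Source src = new Source(1_000_000);
        Iterator<int[]> first10 = limit(src, 10);
        while (first10.hasNext()) first10.next();
        System.out.println(src.produced); // 10
    }
}
```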
 
There are no restrictions on which operations can be applied or how many times, but the order of operations may change the result.
 // first 100 unique matches
 Mappings m1 = mappings.uniqueAtoms()
                       .limit(100);

 // unique matches in the first 100 matches
 Mappings m2 = mappings.limit(100)
                       .uniqueAtoms();

 // first 10 unique matches in the first 100 matches
 Mappings m3 = mappings.limit(100)
                       .uniqueAtoms()
                       .limit(10);

 // number of unique atom matches
 int n1 = mappings.countUnique();

 // number of unique atom matches with correct stereochemistry
 int n = mappings.stereochemistry()
                 .countUnique();
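The effect of operation order can be reproduced with Java streams. The sketch assumes that `uniqueAtoms()` behaves like a stateful filter that drops any mapping whose target-atom set was already seen. Take four mappings where the first two cover the same atoms. Filtering and then limiting to 2 keeps two mappings, whereas limiting to 2 and then filtering keeps only one. `OrderSketch` and its methods are hypothetical names with no CDK dependency.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Plain-Java sketch (no CDK) showing that filter/limit order changes the result.
final class OrderSketch {

    // Stateful predicate: keep a mapping only if its target-atom set is new.
    static Predicate<int[]> uniqueAtoms() {
        Set<BitSet> seen = new HashSet<>();
        return p -> {
            BitSet atoms = new BitSet();
            for (int t : p) atoms.set(t);
            return seen.add(atoms);    // false if this atom set was seen before
        };
    }

    static List<int[]> uniqueThenLimit(List<int[]> ms, int n) {
        return ms.stream().filter(uniqueAtoms()).limit(n).collect(Collectors.toList());
    }

    static List<int[]> limitThenUnique(List<int[]> ms, int n) {
        return ms.stream().limit(n).filter(uniqueAtoms()).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The first two mappings cover the same target atoms {1, 2}.
        List<int[]> ms = Arrays.asList(
            new int[]{1, 2}, new int[]{2, 1}, new int[]{1, 3}, new int[]{3, 1});
        System.out.println(uniqueThenLimit(ms, 2).size()); // 2
        System.out.println(limitThenUnique(ms, 2).size()); // 1
    }
}
```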

 
Author:
John May
Source code:
main
Belongs to CDK module:
isomorphism
Keywords:
substructure search, structure search, mappings, matching
  • Method Details

    • filter

      public Mappings filter(Predicate<int[]> predicate)
      Filter the mappings and keep only those which match the provided predicate.
      
      
           final IAtomContainer query;
           final IAtomContainer target;
      
           // obtain only the mappings where the first atom in the query is
           // mapped to the first atom in the target
           Mappings mappings = Pattern.findSubstructure(query)
                                      .matchAll(target)
                                      .filter(input -> input[0] == 0);
      
       
      Parameters:
      predicate - a predicate
      Returns:
      fluent-api reference
    • map

      public <T> Iterable<T> map(Function<int[],T> f)
      Map the mappings to another type. Each mapping is transformed using the provided function.
      
      
           final IAtomContainer query;
           final IAtomContainer target;
      
           Mappings mappings = Pattern.findSubstructure(query)
                                      .matchAll(target);
      
           // a string that indicates the mapping of atom elements and numbers
           Iterable<String> strs = mappings.map(new Function<int[], String>() {
               public String apply(int[] input) {
                   StringBuilder sb = new StringBuilder();
                   for (int i = 0; i < input.length; i++) {
                       if (i > 0) sb.append(", ");
                       sb.append(query.getAtom(i).getSymbol())
                         .append(i + 1)
                         .append(" -> ")
                         .append(target.getAtom(input[i]).getSymbol())
                         .append(input[i] + 1);
                   }
                   return sb.toString();
               }});
      
       
      Parameters:
      f - function to transform a mapping
      Returns:
      iterable of the transformed type
    • limit

      public Mappings limit(int limit)
      Limit the number of mappings: only this number of mappings will be generated.
      Parameters:
      limit - the number of mappings
      Returns:
      fluent-api instance
    • stereochemistry

      @Deprecated public Mappings stereochemistry()
      Deprecated.
      Results now automatically consider stereochemistry if it is present; to match without stereochemistry, remove the stereo features.
      Filter the mappings for those which preserve stereochemistry specified in the query.
      Returns:
      fluent-api instance
    • uniqueAtoms

      public Mappings uniqueAtoms()
      Filter the mappings for those which cover a unique set of atoms in the target. The unique atom mappings are a subset of the unique bond matches.
      Returns:
      fluent-api instance
    • exclusiveAtoms

      public Mappings exclusiveAtoms()
      Filter the mappings for those which cover an exclusive set of atoms in the target. If a match overlaps with a previously returned match, it is not returned. For example, suppose we had the query C~O and matched it against a carboxylic acid *C(O)=O: there are 2 unique matches but only 1 exclusive match. If we had two -CO2 groups (c1ccc(C(O)=O)cc1C(O)=O), there are 4 unique matches and 2 exclusive matches. The exclusive atom mappings are therefore a subset of the unique atom matches.
      Returns:
      fluent-api instance
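A greedy reading of the exclusive filter can be sketched in plain Java: a mapping is kept only if none of its target atoms were used by a mapping kept earlier. Whether CDK applies exactly this greedy strategy is an assumption. The atom indices are a hypothetical numbering of the carboxylic acid examples in the description above, and `ExclusiveSketch` is not a CDK class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Greedy sketch (no CDK): keep a mapping only if it shares no target atom
// with any mapping kept before it.
final class ExclusiveSketch {

    static List<int[]> exclusive(List<int[]> mappings) {
        BitSet used = new BitSet();          // target atoms claimed so far
        List<int[]> kept = new ArrayList<>();
        for (int[] p : mappings) {
            BitSet atoms = new BitSet();
            for (int t : p) atoms.set(t);
            if (!atoms.intersects(used)) {
                used.or(atoms);
                kept.add(p);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // C~O against *C(O)=O, numbered 0:*, 1:C, 2:O, 3:O (hypothetical numbering).
        List<int[]> one = Arrays.asList(new int[]{1, 2}, new int[]{1, 3});
        System.out.println(exclusive(one).size()); // 1

        // Two carboxyl groups: C1 bonded to O2/O3, C4 bonded to O5/O6 (hypothetical).
        List<int[]> two = Arrays.asList(
            new int[]{1, 2}, new int[]{1, 3}, new int[]{4, 5}, new int[]{4, 6});
        System.out.println(exclusive(two).size()); // 2
    }
}
```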
    • uniqueBonds

      public Mappings uniqueBonds()
      Filter the mappings for those which cover a unique set of bonds in the target.
      Returns:
      fluent-api instance
    • toArray

      public int[][] toArray()
      Mappings are lazily generated and best used in a loop. However, if all mappings are required, this method can provide a fixed-size array of mappings.
      
       IAtomContainer query  = ...;
       IAtomContainer target = ...;
      
       Pattern pat = Pattern.findSubstructure(query);
      
       // lazy iteration
       for (int[] mapping : pat.matchAll(target)) {
           // logic...
       }
      
       int[][] mappings = pat.matchAll(target)
                             .toArray();
      
       // same as the lazy iteration, but we can now refer to and pass 'mappings'
       // to other methods without regenerating the graph match
       for (int[] mapping : mappings) {
           // logic...
       }
       
      The method can be used in combination with other modifiers.
      
       IAtomContainer query  = ...;
       IAtomContainer target = ...;
      
       Pattern pat = Pattern.findSubstructure(query);
      
       // array of the first 5 unique atom mappings
       int[][] mappings = pat.matchAll(target)
                             .uniqueAtoms()
                             .limit(5)
                             .toArray();
       
      Returns:
      array of mappings
    • toAtomMap

      public Iterable<Map<IAtom,IAtom>> toAtomMap()
      Convert the permutations to an atom-atom map.
       for (Map<IAtom,IAtom> map : mappings.toAtomMap()) {
           for (Map.Entry<IAtom,IAtom> e : map.entrySet()) {
               IAtom queryAtom  = e.getKey();
               IAtom targetAtom = e.getValue();
           }
       }
       
      Returns:
      iterable of atom-atom mappings
    • toBondMap

      public Iterable<Map<IBond,IBond>> toBondMap()
      Convert the permutations to a bond-bond map.
       for (Map<IBond,IBond> map : mappings.toBondMap()) {
           for (Map.Entry<IBond,IBond> e : map.entrySet()) {
               IBond queryBond  = e.getKey();
               IBond targetBond = e.getValue();
           }
       }
       
      Returns:
      iterable of bond-bond mappings
    • toAtomBondMap

      public Iterable<Map<IChemObject,IChemObject>> toAtomBondMap()
      Convert the permutations to an atom-atom bond-bond map.
       for (Map<IChemObject,IChemObject> map : mappings.toAtomBondMap()) {
           for (Map.Entry<IChemObject,IChemObject> e : map.entrySet()) {
               IChemObject queryObj  = e.getKey();
               IChemObject targetObj = e.getValue();
           }

           // look up what a given query atom/bond was mapped to
           // (index 0 is illustrative; the query needs at least one bond)
           IAtom matchedAtom = (IAtom) map.get(query.getAtom(0));
           IBond matchedBond = (IBond) map.get(query.getBond(0));
       }
       
      Returns:
      iterable of atom-atom and bond-bond mappings
    • stream

      public Stream<int[]> stream()
      Convert the Mappings to a Java 8 Stream. The Stream API was written after this class and provides much of the same functionality (e.g. map(java.util.function.Function<int[], T>) corresponds to Stream.map(java.util.function.Function)). Unlike an Iterable, a stream cannot be traversed more than once.
      Returns:
      the stream
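The single-traversal restriction is standard `java.util.stream` behaviour: a second terminal operation on the same stream throws `IllegalStateException`. The plain-Java sketch below checks this on a stream of `int[]` mappings. It has no CDK dependency, and `StreamOnceSketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

// Plain-Java sketch (no CDK): a Stream can only be consumed once.
final class StreamOnceSketch {

    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean secondTraversalFails(Stream<int[]> s) {
        s.count();                     // first terminal operation consumes the stream
        try {
            s.count();                 // second terminal operation on the same stream
            return false;
        } catch (IllegalStateException e) {
            return true;               // stream already operated upon or closed
        }
    }

    public static void main(String[] args) {
        List<int[]> mappings = Arrays.asList(new int[]{1, 2}, new int[]{1, 3});
        System.out.println(secondTraversalFails(mappings.stream())); // true
    }
}
```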
    • toChemObjects

      public Iterable<IChemObject> toChemObjects()
      Obtain the chem objects (atoms and bonds) that were 'hit' in the target molecule.
       for (IChemObject obj : mappings.toChemObjects()) {
         if (obj instanceof IAtom) {
            // this atom was 'hit' by the pattern
         }
       }
       
      Returns:
      non-lazy iterable of chem objects
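Knowing which target atoms were hit requires visiting every mapping, so a result of this kind is naturally non-lazy. The plain-Java sketch below collects target atom indices only and omits bonds for brevity. `HitSketch` is a hypothetical name and does not reflect CDK's implementation.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Plain-Java sketch (no CDK): union of target atoms covered by any mapping.
final class HitSketch {

    static BitSet hitAtoms(Iterable<int[]> mappings) {
        BitSet hit = new BitSet();
        for (int[] p : mappings)
            for (int t : p) hit.set(t);    // target atom t was matched by some query atom
        return hit;
    }

    public static void main(String[] args) {
        List<int[]> mappings = Arrays.asList(new int[]{1, 2}, new int[]{1, 3});
        System.out.println(hitAtoms(mappings)); // {1, 2, 3}
    }
}
```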
    • toSubstructuresStream

      public Stream<IAtomContainer> toSubstructuresStream()
      Obtain the mapped substructures (atoms/bonds) of the target compound. The atoms and bonds are the same as in the target molecule, but there may be fewer of them.
       IAtomContainer query, target;
       Mappings mappings = ...;
       mappings.toSubstructuresStream().forEach(mol -> {
          for (IAtom atom : mol.atoms())
            target.contains(atom); // always true
          for (IAtom atom : target.atoms())
            mol.contains(atom); // not always true
       });
       
      Returns:
      lazy stream of molecules
    • toSubstructures

      public Iterable<IAtomContainer> toSubstructures()
      Obtain the mapped substructures (atoms/bonds) of the target compound. The atoms and bonds are the same as in the target molecule, but there may be fewer of them.
       IAtomContainer query, target;
       Mappings mappings = ...;
       for (IAtomContainer mol : mappings.toSubstructures()) {
          for (IAtom atom : mol.atoms())
            target.contains(atom); // always true
          for (IAtom atom : target.atoms())
            mol.contains(atom); // not always true
       }
       
      Returns:
      non-lazy iterable of molecules
    • atLeast

      public boolean atLeast(int n)
      Efficiently determine whether there are at least 'n' matches.
       Mappings mappings = ...;
      
       if (mappings.atLeast(5)) {
          // set bit flag etc.
       }

       // are there at least 5 unique matches?
       if (mappings.uniqueAtoms().atLeast(5)) {
          // set bit etc.
       }
       
      Parameters:
      n - number of matches
      Returns:
      there are at least 'n' matches
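An efficient "at least n" check can stop as soon as `n` mappings have been seen, rather than counting all of them. This plain-Java sketch has no CDK dependency and uses hypothetical names. Its lazy source records how many mappings it produced, which shows that only `n` are generated.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Plain-Java sketch (no CDK) of a short-circuiting "at least n" check.
final class AtLeastSketch {

    static boolean atLeast(Iterable<int[]> mappings, int n) {
        int seen = 0;
        Iterator<int[]> it = mappings.iterator();
        while (seen < n && it.hasNext()) {   // stop pulling once n are found
            it.next();
            seen++;
        }
        return seen >= n;
    }

    public static void main(String[] args) {
        int[] generated = {0};               // counts mappings produced by the source
        // Hypothetical lazy source that could produce 1000 mappings.
        Iterable<int[]> source = () -> new Iterator<int[]>() {
            int i = 0;

            @Override
            public boolean hasNext() { return i < 1000; }

            @Override
            public int[] next() {
                if (!hasNext()) throw new NoSuchElementException();
                generated[0]++;
                return new int[]{i++};
            }
        };
        System.out.println(atLeast(source, 5)); // true
        System.out.println(generated[0]);       // 5
    }
}
```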
    • first

      public int[] first()
      Obtain the first match; if there are no matches, an empty array is returned.
      Returns:
      first match
    • count

      public int count()
      Convenience method to count the number of mappings. Note that mappings are lazily generated, so checking the count and then iterating over the mappings currently performs two searches. If the mappings are also needed, it is more efficient to iterate over the mappings and count them manually.
      Returns:
      number of matches
    • countUnique

      public int countUnique()
      Convenience method to count the number of unique atom mappings. Note that mappings are lazily generated, so checking the count and then iterating over the mappings currently performs two searches. If the mappings are also needed, it is more efficient to iterate over the mappings and count them manually. The method simply invokes mappings.uniqueAtoms().count().
      Returns:
      number of matches
    • iterator

      public Iterator<int[]> iterator()
      Specified by:
      iterator in interface Iterable<int[]>