Class UniversalIsomorphismTester

java.lang.Object
org.openscience.cdk.isomorphism.UniversalIsomorphismTester

public class UniversalIsomorphismTester extends Object
This class implements a multipurpose structure comparison tool. It allows to find maximal common substructure, find the mapping of a substructure in another structure, and the mapping of two isomorphic structures.

Structure comparison may be associated to bond constraints (mandatory bonds, e.g. scaffolds, reaction cores,...) on each source graph. The constraint flexibility allows a number of interesting queries. The substructure analysis relies on the RGraph generic class (see: RGraph) This class implements the link between the RGraph model and the the CDK model in this way the RGraph remains independent and may be used in other contexts.

This algorithm derives from the algorithm described in [Tonnelier, C. et. al.. Tetrahedron Comput. Methodol.. 1990. 3] and modified in the thesis of T. Hanser [Hanser, Th., Apprentissage automatique de méthodes de synthèse à partir d'exemples, 1993, ?Institute?].

With the isSubgraph(IAtomContainer, IAtomContainer) method, the second, and only the second argument may be a IQueryAtomContainer, which allows one to do SMARTS or MQL like queries. The first IAtomContainer must never be an IQueryAtomContainer. An example:

  SmilesParser sp = new SmilesParser(DefaultChemObjectBuilder.getInstance());
  IAtomContainer atomContainer = sp.parseSmiles("CC(=O)OC(=O)C"); // acetic acid anhydride
  IAtomContainer SMILESquery = sp.parseSmiles("CC"); // ethane
  IQueryAtomContainer query = IQueryAtomContainerCreator.createBasicQueryContainer(SMILESquery);
  boolean isSubstructure = UniversalIsomorphismTester.isSubgraph(atomContainer, query);
  

WARNING: As a result of the adjacency perception used in this algorithm there is a single limitation: cyclopropane and isobutane are seen as isomorph. This is due to the fact that these two compounds are the only ones where each bond is connected two each other bond (bonds are fully connected) with the same number of bonds and still they have different structures The algorithm could be easily enhanced with a simple atom mapping manager to provide an atom level overlap definition that would reveal this case. We decided not to penalize the whole procedure because of one single exception query. Furthermore isomorphism may be discarded since the number of atoms are not the same (3 != 4) and in most case this will be already screened out by a fingerprint based filtering. It is possible to add a special treatment for this special query. Be reminded that this algorithm matches bonds only.

NoteWhile most isomorphism queries involve a multi-atom query structure there may be cases in which the query atom is a single atom. In such a case a mapping of target bonds to query bonds is not feasible. In such a case, the RMap objects correspond to atom indices rather than bond indices. In general, this will not affect user code and the same sequence of method calls for matching multi-atom query structures will work for single atom query structures as well.

Author:
Stephane Werner from IXELIS mail@ixelis.net
Source code:
main
Belongs to CDK module:
standard
Created on:
2002-07-17
Requires:
java1.4+
  • Constructor Details

    • UniversalIsomorphismTester

      public UniversalIsomorphismTester()
  • Method Details

    • isIsomorph

      public boolean isIsomorph(IAtomContainer g1, IAtomContainer g2) throws CDKException
      Tests if g1 and g2 are isomorph.
      Parameters:
      g1 - first molecule. Must not be an IQueryAtomContainer.
      g2 - second molecule. May be an IQueryAtomContainer.
      Returns:
      true if the 2 molecule are isomorph
      Throws:
      CDKException - if the first molecule is an instance of IQueryAtomContainer
    • getIsomorphMap

      public List<RMap> getIsomorphMap(IAtomContainer g1, IAtomContainer g2) throws CDKException
      Returns the first isomorph mapping found or null.
      Parameters:
      g1 - first molecule. Must not be an IQueryAtomContainer.
      g2 - second molecule. May be an IQueryAtomContainer.
      Returns:
      the first isomorph mapping found projected of g1. This is a List of RMap objects containing Ids of matching bonds.
      Throws:
      CDKException
    • getIsomorphAtomsMap

      public List<RMap> getIsomorphAtomsMap(IAtomContainer g1, IAtomContainer g2) throws CDKException
      Returns the first isomorph 'atom mapping' found for g2 in g1.
      Parameters:
      g1 - first molecule. Must not be an IQueryAtomContainer.
      g2 - second molecule. May be an IQueryAtomContainer.
      Returns:
      the first isomorph atom mapping found projected on g1. This is a List of RMap objects containing Ids of matching atoms.
      Throws:
      CDKException - if the first molecules is not an instance of IQueryAtomContainer
    • getIsomorphMaps

      public List<List<RMap>> getIsomorphMaps(IAtomContainer g1, IAtomContainer g2) throws CDKException
      Returns all the isomorph 'mappings' found between two atom containers.
      Parameters:
      g1 - first molecule. Must not be an IQueryAtomContainer.
      g2 - second molecule. May be an IQueryAtomContainer.
      Returns:
      the list of all the 'mappings'
      Throws:
      CDKException
    • getSubgraphMaps

      public List<List<RMap>> getSubgraphMaps(IAtomContainer g1, IAtomContainer g2) throws CDKException
      Returns all the subgraph 'bond mappings' found for g2 in g1. This is an List of Lists of RMap objects. Note that if the query molecule is a single atom, then bond mappings cannot be defined. In such a case, the RMap object refers directly to atom - atom mappings. Thus RMap.id1 is the index of the target atom and RMap.id2 is the index of the matching query atom (in this case, it will always be 0). Note that in such a case, there is no need to call makeAtomsMapsOfBondsMaps(List, IAtomContainer, IAtomContainer), though if it is called, then the return value is simply the same as the return value of this method.
      Parameters:
      g1 - first molecule. Must not be an IQueryAtomContainer.
      g2 - second molecule. May be an IQueryAtomContainer.
      Returns:
      the list of all the 'mappings' found projected of g1
      Throws:
      CDKException
      See Also:
    • getSubgraphMap

      public List<RMap> getSubgraphMap(IAtomContainer g1, IAtomContainer g2) throws CDKException
      Returns the first subgraph 'bond mapping' found for g2 in g1.
      Parameters:
      g1 - first molecule. Must not be an IQueryAtomContainer.
      g2 - second molecule. May be an IQueryAtomContainer.
      Returns:
      the first subgraph bond mapping found projected on g1. This is a List of RMap objects containing Ids of matching bonds.
      Throws:
      CDKException
    • getSubgraphAtomsMaps

      public List<List<RMap>> getSubgraphAtomsMaps(IAtomContainer g1, IAtomContainer g2) throws CDKException
      Returns all subgraph 'atom mappings' found for g2 in g1, where g2 must be a substructure of g1. If it is not a substructure, null will be returned. This is an List of Lists of RMap objects.
      Parameters:
      g1 - first molecule. Must not be an IQueryAtomContainer.
      g2 - substructure to be mapped. May be an IQueryAtomContainer.
      Returns:
      all subgraph atom mappings found projected on g1. This is a List of RMap objects containing Ids of matching atoms.
      Throws:
      CDKException
    • getSubgraphAtomsMap

      public List<RMap> getSubgraphAtomsMap(IAtomContainer g1, IAtomContainer g2) throws CDKException
      Returns the first subgraph 'atom mapping' found for g2 in g1, where g2 must be a substructure of g1. If it is not a substructure, null will be returned.
      Parameters:
      g1 - first molecule. Must not be an IQueryAtomContainer.
      g2 - substructure to be mapped. May be an IQueryAtomContainer.
      Returns:
      the first subgraph atom mapping found projected on g1. This is a List of RMap objects containing Ids of matching atoms.
      Throws:
      CDKException
    • isSubgraph

      @Deprecated public boolean isSubgraph(IAtomContainer g1, IAtomContainer g2) throws CDKException
      Deprecated.
      Use the Pattern APIs from the cdk-isomorphism module
      Tests if g2 a subgraph of g1.
      Parameters:
      g1 - first molecule. Must not be an IQueryAtomContainer.
      g2 - second molecule. May be an IQueryAtomContainer.
      Returns:
      true if g2 a subgraph on g1
      Throws:
      CDKException
    • getOverlaps

      public List<IAtomContainer> getOverlaps(IAtomContainer g1, IAtomContainer g2) throws CDKException
      Returns all the maximal common substructure between two atom containers.
      Parameters:
      g1 - first molecule. Must not be an IQueryAtomContainer.
      g2 - second molecule. May be an IQueryAtomContainer.
      Returns:
      the list of all the maximal common substructure found projected of g1 (list of AtomContainer )
      Throws:
      CDKException
    • getBitSet

      public static BitSet getBitSet(IAtomContainer ac)
      Transforms an AtomContainer into a BitSet (which's size = number of bond in the atomContainer, all the bit are set to true).
      Parameters:
      ac - IAtomContainer to transform
      Returns:
      The bitSet
    • buildRGraph

      public static RGraph buildRGraph(IAtomContainer g1, IAtomContainer g2) throws CDKException
      Builds the RGraph ( resolution graph ), from two atomContainer (description of the two molecules to compare) This is the interface point between the CDK model and the generic MCSS algorithm based on the RGRaph.
      Parameters:
      g1 - Description of the first molecule
      g2 - Description of the second molecule
      Returns:
      the rGraph
      Throws:
      CDKException
    • search

      public List<List<RMap>> search(IAtomContainer g1, IAtomContainer g2, BitSet c1, BitSet c2, boolean findAllStructure, boolean findAllMap) throws CDKException
      General RGraph parsing method (usually not used directly) This method is the entry point for the recursive search adapted to the atom container input.
      Parameters:
      g1 - first molecule. Must not be an IQueryAtomContainer.
      g2 - second molecule. May be an IQueryAtomContainer.
      c1 - initial condition ( bonds from g1 that must be contains in the solution )
      c2 - initial condition ( bonds from g2 that must be contains in the solution )
      findAllStructure - if false stop at the first structure found
      findAllMap - if true search all the 'mappings' for one same structure
      Returns:
      a List of Lists of RMap objects that represent the search solutions
      Throws:
      CDKException
    • project

      public static IAtomContainer project(List<RMap> rMapList, IAtomContainer g, int id)
      Projects a list of RMap on a molecule.
      Parameters:
      rMapList - the list to project
      g - the molecule on which project
      id - the id in the RMap of the molecule g
      Returns:
      an AtomContainer
    • projectList

      public static List<IAtomContainer> projectList(List<List<RMap>> rMapsList, IAtomContainer g, int id)
      Projects a list of RMapsList on a molecule.
      Parameters:
      rMapsList - list of RMapsList to project
      g - the molecule on which project
      id - the id in the RMap of the molecule g
      Returns:
      a list of AtomContainer
    • checkSingleAtomCases

      public static List<RMap> checkSingleAtomCases(IAtomContainer g1, IAtomContainer g2) throws CDKException
      Checks for single atom cases before doing subgraph/isomorphism search.
      Parameters:
      g1 - AtomContainer to match on. Must not be an IQueryAtomContainer.
      g2 - AtomContainer as query. May be an IQueryAtomContainer.
      Returns:
      List of List of RMap objects for the Atoms (not Bonds!), null if no single atom case
      Throws:
      CDKException - if the first molecule is an instance of IQueryAtomContainer
    • makeAtomsMapsOfBondsMaps

      public static List<List<RMap>> makeAtomsMapsOfBondsMaps(List<List<RMap>> l, IAtomContainer g1, IAtomContainer g2)
      This makes maps of matching atoms out of a maps of matching bonds as produced by the get(Subgraph|Ismorphism)Maps methods.
      Parameters:
      l - The list produced by the getMap method.
      g1 - The first atom container. Must not be a IQueryAtomContainer.
      g2 - The second one (first and second as in getMap). May be an IQueryAtomContainer.
      Returns:
      A List of Lists of RMap objects of matching Atoms.
    • makeAtomsMapOfBondsMap

      public static List<RMap> makeAtomsMapOfBondsMap(List<RMap> l, IAtomContainer g1, IAtomContainer g2)
      This makes a map of matching atoms out of a map of matching bonds as produced by the get(Subgraph|Ismorphism)Map methods.
      Parameters:
      l - The list produced by the getMap method.
      g1 - first molecule. Must not be an IQueryAtomContainer.
      g2 - second molecule. May be an IQueryAtomContainer.
      Returns:
      The mapping found projected on g1. This is a List of RMap objects containing Ids of matching atoms.
    • setTimeout

      public void setTimeout(long timeout)
      Sets the time in milliseconds until the substructure search will be breaked.
      Parameters:
      timeout - Time in milliseconds. -1 to ignore the timeout.