Class UniversalIsomorphismTester
- java.lang.Object
-
- org.openscience.cdk.isomorphism.UniversalIsomorphismTester
-
public class UniversalIsomorphismTester extends Object
This class implements a multipurpose structure comparison tool. It allows to find maximal common substructure, find the mapping of a substructure in another structure, and the mapping of two isomorphic structures.Structure comparison may be associated to bond constraints (mandatory bonds, e.g. scaffolds, reaction cores,...) on each source graph. The constraint flexibility allows a number of interesting queries. The substructure analysis relies on the RGraph generic class (see: RGraph) This class implements the link between the RGraph model and the the CDK model in this way the
RGraph
remains independent and may be used in other contexts.This algorithm derives from the algorithm described in [Tonnelier, C. et. al.. Tetrahedron Comput. Methodol.. 1990. 3] and modified in the thesis of T. Hanser [Hanser, Th., Apprentissage automatique de méthodes de synthèse à partir d'exemples, 1993, ?Institute?].
With the
isSubgraph(IAtomContainer, IAtomContainer)
method, the second, and only the second argument may be aIQueryAtomContainer
, which allows one to do SMARTS or MQL like queries. The firstIAtomContainer
must never be anIQueryAtomContainer
. An example:SmilesParser sp = new SmilesParser(DefaultChemObjectBuilder.getInstance()); IAtomContainer atomContainer = sp.parseSmiles("CC(=O)OC(=O)C"); // acetic acid anhydride IAtomContainer SMILESquery = sp.parseSmiles("CC"); // ethane IQueryAtomContainer query = IQueryAtomContainerCreator.createBasicQueryContainer(SMILESquery); boolean isSubstructure = UniversalIsomorphismTester.isSubgraph(atomContainer, query);
WARNING: As a result of the adjacency perception used in this algorithm there is a single limitation: cyclopropane and isobutane are seen as isomorph. This is due to the fact that these two compounds are the only ones where each bond is connected two each other bond (bonds are fully connected) with the same number of bonds and still they have different structures The algorithm could be easily enhanced with a simple atom mapping manager to provide an atom level overlap definition that would reveal this case. We decided not to penalize the whole procedure because of one single exception query. Furthermore isomorphism may be discarded since the number of atoms are not the same (3 != 4) and in most case this will be already screened out by a fingerprint based filtering. It is possible to add a special treatment for this special query. Be reminded that this algorithm matches bonds only.
NoteWhile most isomorphism queries involve a multi-atom query structure there may be cases in which the query atom is a single atom. In such a case a mapping of target bonds to query bonds is not feasible. In such a case, the RMap objects correspond to atom indices rather than bond indices. In general, this will not affect user code and the same sequence of method calls for matching multi-atom query structures will work for single atom query structures as well.
- Author:
- Stephane Werner from IXELIS mail@ixelis.net
- Source code:
- main
- Belongs to CDK module:
- standard
- Created on:
- 2002-07-17
- Requires:
- java1.4+
-
-
Constructor Summary
Constructors Constructor Description UniversalIsomorphismTester()
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Deprecated Methods Modifier and Type Method Description static RGraph
buildRGraph(IAtomContainer g1, IAtomContainer g2)
Builds theRGraph
( resolution graph ), from two atomContainer (description of the two molecules to compare) This is the interface point between the CDK model and the generic MCSS algorithm based on the RGRaph.static List<RMap>
checkSingleAtomCases(IAtomContainer g1, IAtomContainer g2)
Checks for single atom cases before doing subgraph/isomorphism search.static BitSet
getBitSet(IAtomContainer ac)
Transforms an AtomContainer into aBitSet
(which's size = number of bond in the atomContainer, all the bit are set to true).List<RMap>
getIsomorphAtomsMap(IAtomContainer g1, IAtomContainer g2)
Returns the first isomorph 'atom mapping' found for g2 in g1.List<RMap>
getIsomorphMap(IAtomContainer g1, IAtomContainer g2)
Returns the first isomorph mapping found or null.List<List<RMap>>
getIsomorphMaps(IAtomContainer g1, IAtomContainer g2)
Returns all the isomorph 'mappings' found between two atom containers.List<IAtomContainer>
getOverlaps(IAtomContainer g1, IAtomContainer g2)
Returns all the maximal common substructure between two atom containers.List<RMap>
getSubgraphAtomsMap(IAtomContainer g1, IAtomContainer g2)
Returns the first subgraph 'atom mapping' found for g2 in g1, where g2 must be a substructure of g1.List<List<RMap>>
getSubgraphAtomsMaps(IAtomContainer g1, IAtomContainer g2)
Returns all subgraph 'atom mappings' found for g2 in g1, where g2 must be a substructure of g1.List<RMap>
getSubgraphMap(IAtomContainer g1, IAtomContainer g2)
Returns the first subgraph 'bond mapping' found for g2 in g1.List<List<RMap>>
getSubgraphMaps(IAtomContainer g1, IAtomContainer g2)
Returns all the subgraph 'bond mappings' found for g2 in g1.boolean
isIsomorph(IAtomContainer g1, IAtomContainer g2)
Tests if g1 and g2 are isomorph.boolean
isSubgraph(IAtomContainer g1, IAtomContainer g2)
Deprecated.Use the Pattern APIs from the cdk-isomorphism modulestatic List<RMap>
makeAtomsMapOfBondsMap(List<RMap> l, IAtomContainer g1, IAtomContainer g2)
This makes a map of matching atoms out of a map of matching bonds as produced by the get(Subgraph|Ismorphism)Map methods.static List<List<RMap>>
makeAtomsMapsOfBondsMaps(List<List<RMap>> l, IAtomContainer g1, IAtomContainer g2)
This makes maps of matching atoms out of a maps of matching bonds as produced by the get(Subgraph|Ismorphism)Maps methods.static IAtomContainer
project(List<RMap> rMapList, IAtomContainer g, int id)
Projects a list ofRMap
on a molecule.static List<IAtomContainer>
projectList(List<List<RMap>> rMapsList, IAtomContainer g, int id)
Projects a list of RMapsList on a molecule.List<List<RMap>>
search(IAtomContainer g1, IAtomContainer g2, BitSet c1, BitSet c2, boolean findAllStructure, boolean findAllMap)
GeneralRGraph
parsing method (usually not used directly) This method is the entry point for the recursive search adapted to the atom container input.void
setTimeout(long timeout)
Sets the time in milliseconds until the substructure search will be breaked.
-
-
-
Method Detail
-
isIsomorph
public boolean isIsomorph(IAtomContainer g1, IAtomContainer g2) throws CDKException
Tests if g1 and g2 are isomorph.- Parameters:
g1
- first molecule. Must not be anIQueryAtomContainer
.g2
- second molecule. May be anIQueryAtomContainer
.- Returns:
- true if the 2 molecule are isomorph
- Throws:
CDKException
- if the first molecule is an instance of IQueryAtomContainer
-
getIsomorphMap
public List<RMap> getIsomorphMap(IAtomContainer g1, IAtomContainer g2) throws CDKException
Returns the first isomorph mapping found or null.- Parameters:
g1
- first molecule. Must not be anIQueryAtomContainer
.g2
- second molecule. May be anIQueryAtomContainer
.- Returns:
- the first isomorph mapping found projected of g1. This is a List of RMap objects containing Ids of matching bonds.
- Throws:
CDKException
-
getIsomorphAtomsMap
public List<RMap> getIsomorphAtomsMap(IAtomContainer g1, IAtomContainer g2) throws CDKException
Returns the first isomorph 'atom mapping' found for g2 in g1.- Parameters:
g1
- first molecule. Must not be anIQueryAtomContainer
.g2
- second molecule. May be anIQueryAtomContainer
.- Returns:
- the first isomorph atom mapping found projected on g1. This is a List of RMap objects containing Ids of matching atoms.
- Throws:
CDKException
- if the first molecules is not an instance ofIQueryAtomContainer
-
getIsomorphMaps
public List<List<RMap>> getIsomorphMaps(IAtomContainer g1, IAtomContainer g2) throws CDKException
Returns all the isomorph 'mappings' found between two atom containers.- Parameters:
g1
- first molecule. Must not be anIQueryAtomContainer
.g2
- second molecule. May be anIQueryAtomContainer
.- Returns:
- the list of all the 'mappings'
- Throws:
CDKException
-
getSubgraphMaps
public List<List<RMap>> getSubgraphMaps(IAtomContainer g1, IAtomContainer g2) throws CDKException
Returns all the subgraph 'bond mappings' found for g2 in g1. This is anList
ofList
s ofRMap
objects. Note that if the query molecule is a single atom, then bond mappings cannot be defined. In such a case, theRMap
object refers directly to atom - atom mappings. Thus RMap.id1 is the index of the target atom and RMap.id2 is the index of the matching query atom (in this case, it will always be 0). Note that in such a case, there is no need to callmakeAtomsMapsOfBondsMaps(List, IAtomContainer, IAtomContainer)
, though if it is called, then the return value is simply the same as the return value of this method.- Parameters:
g1
- first molecule. Must not be anIQueryAtomContainer
.g2
- second molecule. May be anIQueryAtomContainer
.- Returns:
- the list of all the 'mappings' found projected of g1
- Throws:
CDKException
- See Also:
makeAtomsMapsOfBondsMaps(List, IAtomContainer, IAtomContainer)
-
getSubgraphMap
public List<RMap> getSubgraphMap(IAtomContainer g1, IAtomContainer g2) throws CDKException
Returns the first subgraph 'bond mapping' found for g2 in g1.- Parameters:
g1
- first molecule. Must not be anIQueryAtomContainer
.g2
- second molecule. May be anIQueryAtomContainer
.- Returns:
- the first subgraph bond mapping found projected on g1. This is a
List
ofRMap
objects containing Ids of matching bonds. - Throws:
CDKException
-
getSubgraphAtomsMaps
public List<List<RMap>> getSubgraphAtomsMaps(IAtomContainer g1, IAtomContainer g2) throws CDKException
Returns all subgraph 'atom mappings' found for g2 in g1, where g2 must be a substructure of g1. If it is not a substructure, null will be returned. This is anList
ofList
s ofRMap
objects.- Parameters:
g1
- first molecule. Must not be anIQueryAtomContainer
.g2
- substructure to be mapped. May be anIQueryAtomContainer
.- Returns:
- all subgraph atom mappings found projected on g1. This is a
List
ofRMap
objects containing Ids of matching atoms. - Throws:
CDKException
-
getSubgraphAtomsMap
public List<RMap> getSubgraphAtomsMap(IAtomContainer g1, IAtomContainer g2) throws CDKException
Returns the first subgraph 'atom mapping' found for g2 in g1, where g2 must be a substructure of g1. If it is not a substructure, null will be returned.- Parameters:
g1
- first molecule. Must not be anIQueryAtomContainer
.g2
- substructure to be mapped. May be anIQueryAtomContainer
.- Returns:
- the first subgraph atom mapping found projected on g1.
This is a
List
ofRMap
objects containing Ids of matching atoms. - Throws:
CDKException
-
isSubgraph
@Deprecated public boolean isSubgraph(IAtomContainer g1, IAtomContainer g2) throws CDKException
Deprecated.Use the Pattern APIs from the cdk-isomorphism moduleTests if g2 a subgraph of g1.- Parameters:
g1
- first molecule. Must not be anIQueryAtomContainer
.g2
- second molecule. May be anIQueryAtomContainer
.- Returns:
- true if g2 a subgraph on g1
- Throws:
CDKException
-
getOverlaps
public List<IAtomContainer> getOverlaps(IAtomContainer g1, IAtomContainer g2) throws CDKException
Returns all the maximal common substructure between two atom containers.- Parameters:
g1
- first molecule. Must not be anIQueryAtomContainer
.g2
- second molecule. May be anIQueryAtomContainer
.- Returns:
- the list of all the maximal common substructure found projected of g1 (list of AtomContainer )
- Throws:
CDKException
-
getBitSet
public static BitSet getBitSet(IAtomContainer ac)
Transforms an AtomContainer into aBitSet
(which's size = number of bond in the atomContainer, all the bit are set to true).- Parameters:
ac
-IAtomContainer
to transform- Returns:
- The bitSet
-
buildRGraph
public static RGraph buildRGraph(IAtomContainer g1, IAtomContainer g2) throws CDKException
Builds theRGraph
( resolution graph ), from two atomContainer (description of the two molecules to compare) This is the interface point between the CDK model and the generic MCSS algorithm based on the RGRaph.- Parameters:
g1
- Description of the first moleculeg2
- Description of the second molecule- Returns:
- the rGraph
- Throws:
CDKException
-
search
public List<List<RMap>> search(IAtomContainer g1, IAtomContainer g2, BitSet c1, BitSet c2, boolean findAllStructure, boolean findAllMap) throws CDKException
GeneralRGraph
parsing method (usually not used directly) This method is the entry point for the recursive search adapted to the atom container input.- Parameters:
g1
- first molecule. Must not be anIQueryAtomContainer
.g2
- second molecule. May be anIQueryAtomContainer
.c1
- initial condition ( bonds from g1 that must be contains in the solution )c2
- initial condition ( bonds from g2 that must be contains in the solution )findAllStructure
- if false stop at the first structure foundfindAllMap
- if true search all the 'mappings' for one same structure- Returns:
- a List of Lists of
RMap
objects that represent the search solutions - Throws:
CDKException
-
project
public static IAtomContainer project(List<RMap> rMapList, IAtomContainer g, int id)
Projects a list ofRMap
on a molecule.- Parameters:
rMapList
- the list to projectg
- the molecule on which projectid
- the id in theRMap
of the molecule g- Returns:
- an AtomContainer
-
projectList
public static List<IAtomContainer> projectList(List<List<RMap>> rMapsList, IAtomContainer g, int id)
Projects a list of RMapsList on a molecule.- Parameters:
rMapsList
- list of RMapsList to projectg
- the molecule on which projectid
- the id in the RMap of the molecule g- Returns:
- a list of AtomContainer
-
checkSingleAtomCases
public static List<RMap> checkSingleAtomCases(IAtomContainer g1, IAtomContainer g2) throws CDKException
Checks for single atom cases before doing subgraph/isomorphism search.- Parameters:
g1
- AtomContainer to match on. Must not be anIQueryAtomContainer
.g2
- AtomContainer as query. May be anIQueryAtomContainer
.- Returns:
List
ofList
ofRMap
objects for the Atoms (not Bonds!), null if no single atom case- Throws:
CDKException
- if the first molecule is an instance of IQueryAtomContainer
-
makeAtomsMapsOfBondsMaps
public static List<List<RMap>> makeAtomsMapsOfBondsMaps(List<List<RMap>> l, IAtomContainer g1, IAtomContainer g2)
This makes maps of matching atoms out of a maps of matching bonds as produced by the get(Subgraph|Ismorphism)Maps methods.- Parameters:
l
- The list produced by the getMap method.g1
- The first atom container. Must not be aIQueryAtomContainer
.g2
- The second one (first and second as in getMap). May be anIQueryAtomContainer
.- Returns:
- A List of
List
s ofRMap
objects of matching Atoms.
-
makeAtomsMapOfBondsMap
public static List<RMap> makeAtomsMapOfBondsMap(List<RMap> l, IAtomContainer g1, IAtomContainer g2)
This makes a map of matching atoms out of a map of matching bonds as produced by the get(Subgraph|Ismorphism)Map methods.- Parameters:
l
- The list produced by the getMap method.g1
- first molecule. Must not be anIQueryAtomContainer
.g2
- second molecule. May be anIQueryAtomContainer
.- Returns:
- The mapping found projected on g1. This is a
List
ofRMap
objects containing Ids of matching atoms.
-
setTimeout
public void setTimeout(long timeout)
Sets the time in milliseconds until the substructure search will be breaked.- Parameters:
timeout
- Time in milliseconds. -1 to ignore the timeout.
-
-