public class UniversalIsomorphismTester extends Object
Structure comparison may be associated to bond constraints
(mandatory bonds, e.g. scaffolds, reaction cores,...) on each source graph.
The constraint flexibility allows a number of interesting queries.
The substructure analysis relies on the RGraph generic class (see: RGraph)
This class implements the link between the RGraph model and the
the CDK model in this way the RGraph
remains independent and may be used
in other contexts.
This algorithm derives from the algorithm described in (Tonnelier, C. et. al.. Tetrahedron Comput. Methodol.. 1990. 3) and modified in the thesis of T. Hanser (Hanser, Th., Apprentissage automatique de méthodes de synthèse à partir d'exemples, 1993, ?Institute?).
With the isSubgraph(IAtomContainer, IAtomContainer)
method,
the second, and only the second argument may be a IQueryAtomContainer
,
which allows one to do SMARTS or MQL like queries.
The first IAtomContainer
must never be an IQueryAtomContainer
.
An example:
SmilesParser sp = new SmilesParser(DefaultChemObjectBuilder.getInstance()); IAtomContainer atomContainer = sp.parseSmiles("CC(=O)OC(=O)C"); // acetic acid anhydride IAtomContainer SMILESquery = sp.parseSmiles("CC"); // ethane IQueryAtomContainer query = IQueryAtomContainerCreator.createBasicQueryContainer(SMILESquery); boolean isSubstructure = UniversalIsomorphismTester.isSubgraph(atomContainer, query);
WARNING: As a result of the adjacency perception used in this algorithm there is a single limitation: cyclopropane and isobutane are seen as isomorph. This is due to the fact that these two compounds are the only ones where each bond is connected two each other bond (bonds are fully connected) with the same number of bonds and still they have different structures The algorithm could be easily enhanced with a simple atom mapping manager to provide an atom level overlap definition that would reveal this case. We decided not to penalize the whole procedure because of one single exception query. Furthermore isomorphism may be discarded since the number of atoms are not the same (3 != 4) and in most case this will be already screened out by a fingerprint based filtering. It is possible to add a special treatment for this special query. Be reminded that this algorithm matches bonds only.
NoteWhile most isomorphism queries involve a multi-atom query structure there may be cases in which the query atom is a single atom. In such a case a mapping of target bonds to query bonds is not feasible. In such a case, the RMap objects correspond to atom indices rather than bond indices. In general, this will not affect user code and the same sequence of method calls for matching multi-atom query structures will work for single atom query structures as well.
Constructor and Description |
---|
UniversalIsomorphismTester() |
Modifier and Type | Method and Description |
---|---|
static RGraph |
buildRGraph(IAtomContainer g1,
IAtomContainer g2)
Builds the
RGraph ( resolution graph ), from two atomContainer
(description of the two molecules to compare)
This is the interface point between the CDK model and
the generic MCSS algorithm based on the RGRaph. |
static List<RMap> |
checkSingleAtomCases(IAtomContainer g1,
IAtomContainer g2)
Checks for single atom cases before doing subgraph/isomorphism search.
|
static BitSet |
getBitSet(IAtomContainer ac)
Transforms an AtomContainer into a
BitSet (which's size = number of bond
in the atomContainer, all the bit are set to true). |
List<RMap> |
getIsomorphAtomsMap(IAtomContainer g1,
IAtomContainer g2)
Returns the first isomorph 'atom mapping' found for g2 in g1.
|
List<RMap> |
getIsomorphMap(IAtomContainer g1,
IAtomContainer g2)
Returns the first isomorph mapping found or null.
|
List<List<RMap>> |
getIsomorphMaps(IAtomContainer g1,
IAtomContainer g2)
Returns all the isomorph 'mappings' found between two
atom containers.
|
List<IAtomContainer> |
getOverlaps(IAtomContainer g1,
IAtomContainer g2)
Returns all the maximal common substructure between two atom containers.
|
List<RMap> |
getSubgraphAtomsMap(IAtomContainer g1,
IAtomContainer g2)
Returns the first subgraph 'atom mapping' found for g2 in g1, where g2 must be a substructure
of g1.
|
List<List<RMap>> |
getSubgraphAtomsMaps(IAtomContainer g1,
IAtomContainer g2)
Returns all subgraph 'atom mappings' found for g2 in g1, where g2 must be a substructure
of g1.
|
List<RMap> |
getSubgraphMap(IAtomContainer g1,
IAtomContainer g2)
Returns the first subgraph 'bond mapping' found for g2 in g1.
|
List<List<RMap>> |
getSubgraphMaps(IAtomContainer g1,
IAtomContainer g2)
Returns all the subgraph 'bond mappings' found for g2 in g1.
|
boolean |
isIsomorph(IAtomContainer g1,
IAtomContainer g2)
Tests if g1 and g2 are isomorph.
|
boolean |
isSubgraph(IAtomContainer g1,
IAtomContainer g2)
Deprecated.
Use the Pattern APIs from the cdk-isomorphism module
|
static List<RMap> |
makeAtomsMapOfBondsMap(List<RMap> l,
IAtomContainer g1,
IAtomContainer g2)
This makes a map of matching atoms out of a map of matching bonds as produced by the
get(Subgraph|Ismorphism)Map methods.
|
static List<List<RMap>> |
makeAtomsMapsOfBondsMaps(List<List<RMap>> l,
IAtomContainer g1,
IAtomContainer g2)
This makes maps of matching atoms out of a maps of matching bonds as produced by the
get(Subgraph|Ismorphism)Maps methods.
|
static IAtomContainer |
project(List<RMap> rMapList,
IAtomContainer g,
int id)
Projects a list of
RMap on a molecule. |
static List<IAtomContainer> |
projectList(List<List<RMap>> rMapsList,
IAtomContainer g,
int id)
Projects a list of RMapsList on a molecule.
|
List<List<RMap>> |
search(IAtomContainer g1,
IAtomContainer g2,
BitSet c1,
BitSet c2,
boolean findAllStructure,
boolean findAllMap)
General
RGraph parsing method (usually not used directly)
This method is the entry point for the recursive search
adapted to the atom container input. |
void |
setTimeout(long timeout)
Sets the time in milliseconds until the substructure search will be breaked.
|
public boolean isIsomorph(IAtomContainer g1, IAtomContainer g2) throws CDKException
g1
- first molecule. Must not be an IQueryAtomContainer
.g2
- second molecule. May be an IQueryAtomContainer
.CDKException
- if the first molecule is an instance of IQueryAtomContainerpublic List<RMap> getIsomorphMap(IAtomContainer g1, IAtomContainer g2) throws CDKException
g1
- first molecule. Must not be an IQueryAtomContainer
.g2
- second molecule. May be an IQueryAtomContainer
.CDKException
public List<RMap> getIsomorphAtomsMap(IAtomContainer g1, IAtomContainer g2) throws CDKException
g1
- first molecule. Must not be an IQueryAtomContainer
.g2
- second molecule. May be an IQueryAtomContainer
.CDKException
- if the first molecules is not an instance of IQueryAtomContainer
public List<List<RMap>> getIsomorphMaps(IAtomContainer g1, IAtomContainer g2) throws CDKException
g1
- first molecule. Must not be an IQueryAtomContainer
.g2
- second molecule. May be an IQueryAtomContainer
.CDKException
public List<List<RMap>> getSubgraphMaps(IAtomContainer g1, IAtomContainer g2) throws CDKException
List
of List
s of RMap
objects.
Note that if the query molecule is a single atom, then bond mappings
cannot be defined. In such a case, the RMap
object refers directly to
atom - atom mappings. Thus RMap.id1 is the index of the target atom
and RMap.id2 is the index of the matching query atom (in this case,
it will always be 0). Note that in such a case, there is no need
to call makeAtomsMapsOfBondsMaps(List, IAtomContainer, IAtomContainer)
,
though if it is called, then the
return value is simply the same as the return value of this method.g1
- first molecule. Must not be an IQueryAtomContainer
.g2
- second molecule. May be an IQueryAtomContainer
.CDKException
makeAtomsMapsOfBondsMaps(List, IAtomContainer, IAtomContainer)
public List<RMap> getSubgraphMap(IAtomContainer g1, IAtomContainer g2) throws CDKException
g1
- first molecule. Must not be an IQueryAtomContainer
.g2
- second molecule. May be an IQueryAtomContainer
.List
of
RMap
objects containing Ids of matching bonds.CDKException
public List<List<RMap>> getSubgraphAtomsMaps(IAtomContainer g1, IAtomContainer g2) throws CDKException
List
of List
s of RMap
objects.g1
- first molecule. Must not be an IQueryAtomContainer
.g2
- substructure to be mapped. May be an IQueryAtomContainer
.List
of RMap
objects containing Ids of matching atoms.CDKException
public List<RMap> getSubgraphAtomsMap(IAtomContainer g1, IAtomContainer g2) throws CDKException
g1
- first molecule. Must not be an IQueryAtomContainer
.g2
- substructure to be mapped. May be an IQueryAtomContainer
.List
of RMap
objects containing Ids of matching atoms.CDKException
@Deprecated public boolean isSubgraph(IAtomContainer g1, IAtomContainer g2) throws CDKException
g1
- first molecule. Must not be an IQueryAtomContainer
.g2
- second molecule. May be an IQueryAtomContainer
.CDKException
public List<IAtomContainer> getOverlaps(IAtomContainer g1, IAtomContainer g2) throws CDKException
g1
- first molecule. Must not be an IQueryAtomContainer
.g2
- second molecule. May be an IQueryAtomContainer
.CDKException
public static BitSet getBitSet(IAtomContainer ac)
BitSet
(which's size = number of bond
in the atomContainer, all the bit are set to true).ac
- IAtomContainer
to transformpublic static RGraph buildRGraph(IAtomContainer g1, IAtomContainer g2) throws CDKException
RGraph
( resolution graph ), from two atomContainer
(description of the two molecules to compare)
This is the interface point between the CDK model and
the generic MCSS algorithm based on the RGRaph.g1
- Description of the first moleculeg2
- Description of the second moleculeCDKException
public List<List<RMap>> search(IAtomContainer g1, IAtomContainer g2, BitSet c1, BitSet c2, boolean findAllStructure, boolean findAllMap) throws CDKException
RGraph
parsing method (usually not used directly)
This method is the entry point for the recursive search
adapted to the atom container input.g1
- first molecule. Must not be an IQueryAtomContainer
.g2
- second molecule. May be an IQueryAtomContainer
.c1
- initial condition ( bonds from g1 that
must be contains in the solution )c2
- initial condition ( bonds from g2 that
must be contains in the solution )findAllStructure
- if false stop at the first structure foundfindAllMap
- if true search all the 'mappings' for one same
structureRMap
objects that represent the search solutionsCDKException
public static IAtomContainer project(List<RMap> rMapList, IAtomContainer g, int id)
RMap
on a molecule.rMapList
- the list to projectg
- the molecule on which projectid
- the id in the RMap
of the molecule gpublic static List<IAtomContainer> projectList(List<List<RMap>> rMapsList, IAtomContainer g, int id)
rMapsList
- list of RMapsList to projectg
- the molecule on which projectid
- the id in the RMap of the molecule gpublic static List<RMap> checkSingleAtomCases(IAtomContainer g1, IAtomContainer g2) throws CDKException
g1
- AtomContainer to match on. Must not be an IQueryAtomContainer
.g2
- AtomContainer as query. May be an IQueryAtomContainer
.List
of List
of RMap
objects for the Atoms (not Bonds!), null if no single atom caseCDKException
- if the first molecule is an instance of IQueryAtomContainerpublic static List<List<RMap>> makeAtomsMapsOfBondsMaps(List<List<RMap>> l, IAtomContainer g1, IAtomContainer g2)
l
- The list produced by the getMap method.g1
- The first atom container. Must not be a IQueryAtomContainer
.g2
- The second one (first and second as in getMap). May be an IQueryAtomContainer
.List
s of RMap
objects of matching Atoms.public static List<RMap> makeAtomsMapOfBondsMap(List<RMap> l, IAtomContainer g1, IAtomContainer g2)
l
- The list produced by the getMap method.g1
- first molecule. Must not be an IQueryAtomContainer
.g2
- second molecule. May be an IQueryAtomContainer
.List
of RMap
objects
containing Ids of matching atoms.public void setTimeout(long timeout)
timeout
- Time in milliseconds. -1 to ignore the timeout.Copyright © 2018. All Rights Reserved.