Class UniversalIsomorphismTester

  • public class UniversalIsomorphismTester
    extends Object
    This class implements a multipurpose structure comparison tool. It can find the maximal common substructure of two structures, find the mapping of a substructure in another structure, and find the mapping between two isomorphic structures.

    Structure comparison may be subject to bond constraints (mandatory bonds, e.g. scaffolds, reaction cores, ...) on each source graph. The flexibility of these constraints allows a number of interesting queries. The substructure analysis relies on the generic RGraph class (see: RGraph). This class implements the link between the RGraph model and the CDK model; in this way RGraph remains independent and may be used in other contexts.

    This algorithm derives from the algorithm described in [Tonnelier, C. et al., Tetrahedron Comput. Methodol., 1990, 3] and modified in the thesis of T. Hanser [Hanser, Th., Apprentissage automatique de méthodes de synthèse à partir d'exemples, 1993, ?Institute?].

    With the isSubgraph(IAtomContainer, IAtomContainer) method, the second argument, and only the second, may be an IQueryAtomContainer, which allows SMARTS- or MQL-like queries. The first IAtomContainer must never be an IQueryAtomContainer. An example:

      SmilesParser sp = new SmilesParser(DefaultChemObjectBuilder.getInstance());
      IAtomContainer atomContainer = sp.parseSmiles("CC(=O)OC(=O)C"); // acetic acid anhydride
      IAtomContainer SMILESquery = sp.parseSmiles("CC"); // ethane
      IQueryAtomContainer query = QueryAtomContainerCreator.createBasicQueryContainer(SMILESquery);
      boolean isSubstructure = UniversalIsomorphismTester.isSubgraph(atomContainer, query);

    WARNING: As a result of the adjacency perception used in this algorithm, there is a single limitation: cyclopropane and isobutane are seen as isomorphic. These two compounds are the only ones in which each bond is connected to every other bond (the bonds are fully connected) and the number of bonds is the same, yet the structures differ. The algorithm could easily be enhanced with a simple atom mapping manager that provides an atom-level overlap definition and would reveal this case. We decided not to penalize the whole procedure because of a single exceptional query. Furthermore, isomorphism can be ruled out because the numbers of atoms differ (3 != 4), and in most cases this will already be screened out by fingerprint-based filtering. It is possible to add a special treatment for this special query. Be reminded that this algorithm matches bonds only.
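
    The limitation can be illustrated without any CDK classes. The following is a minimal sketch, not part of this class or of CDK: the class BondAdjacencyDemo and its methods are hypothetical names chosen for illustration, and molecules are reduced to lists of bonds between numbered heavy atoms. It counts how many pairs of bonds share an atom, which is the only information a bond-based matcher sees, and separately counts the atoms.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration class; it is not part of CDK.
public class BondAdjacencyDemo {

    /** Returns true if two bonds, each given as {atomA, atomB}, share an atom. */
    public static boolean shareAtom(int[] a, int[] b) {
        return a[0] == b[0] || a[0] == b[1] || a[1] == b[0] || a[1] == b[1];
    }

    /** Counts the unordered pairs of bonds that are adjacent (share an atom). */
    public static int adjacentBondPairs(int[][] bonds) {
        int pairs = 0;
        for (int i = 0; i < bonds.length; i++) {
            for (int j = i + 1; j < bonds.length; j++) {
                if (shareAtom(bonds[i], bonds[j])) {
                    pairs++;
                }
            }
        }
        return pairs;
    }

    /** Counts the distinct atom indices referenced by the bond list. */
    public static int atomCount(int[][] bonds) {
        Set<Integer> atoms = new HashSet<>();
        for (int[] bond : bonds) {
            atoms.add(bond[0]);
            atoms.add(bond[1]);
        }
        return atoms.size();
    }

    public static void main(String[] args) {
        // Heavy-atom skeletons only: a three-membered ring and a central
        // carbon bonded to three methyl carbons.
        int[][] cyclopropane = {{0, 1}, {1, 2}, {2, 0}};
        int[][] isobutane = {{0, 1}, {0, 2}, {0, 3}};

        System.out.println("cyclopropane: bonds=" + cyclopropane.length
                + " adjacentPairs=" + adjacentBondPairs(cyclopropane)
                + " atoms=" + atomCount(cyclopropane));
        System.out.println("isobutane:    bonds=" + isobutane.length
                + " adjacentPairs=" + adjacentBondPairs(isobutane)
                + " atoms=" + atomCount(isobutane));
    }
}
```

    Both skeletons have three bonds in which every bond touches every other bond, so their bond-adjacency pictures coincide; only the atom count tells them apart, which is why an atom-count or fingerprint pre-filter is enough to reject this pair.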

    Note: While most isomorphism queries involve a multi-atom query structure, there may be cases in which the query structure is a single atom. In such a case a mapping of target bonds to query bonds is not feasible, so the RMap objects correspond to atom indices rather than bond indices. In general, this will not affect user code, and the same sequence of method calls used for matching multi-atom query structures will work for single-atom query structures as well.

    Author: Stephane Werner from IXELIS
    • Constructor Detail

      • UniversalIsomorphismTester

        public UniversalIsomorphismTester()