Class SMARTSQueryTool

java.lang.Object
org.openscience.cdk.smiles.smarts.SMARTSQueryTool

@Deprecated public class SMARTSQueryTool extends Object
Deprecated.
This class provides an easy-to-use wrapper around SMARTS matching functionality. User code that needs SMARTS matching should use this class rather than using SMARTSParser (and UniversalIsomorphismTester) directly. Example usage:

 SmilesParser sp = new SmilesParser(DefaultChemObjectBuilder.getInstance());
 IAtomContainer atomContainer = sp.parseSmiles("CC(=O)OC(=O)C");
 // the constructor takes the SMARTS string and an IChemObjectBuilder
 SMARTSQueryTool querytool = new SMARTSQueryTool("O=CO", DefaultChemObjectBuilder.getInstance());
 boolean status = querytool.matches(atomContainer);
 if (status) {
    int nmatch = querytool.countMatches();
    List<List<Integer>> mappings = querytool.getMatchingAtoms();
    for (int i = 0; i < nmatch; i++) {
       // indices of the target atoms matched by the i-th mapping
       List<Integer> atomIndices = mappings.get(i);
    }
 }
 
SMARTS Extensions
The CDK currently supports the following SMARTS symbols, which are not described in the Daylight specification but are supported by other packages, as noted for each symbol.
Table 1 - Supported Extensions
Symbol | Meaning | Default | Notes
Gx | Periodic group number | None | x must be specified and must be a number between 1 and 18. Supported by the MOE SMARTS implementation.
#X | Any non-carbon heavy element | None | Supported by the MOE SMARTS implementation.
^x | Any atom with the specified hybridization state | None | x must be specified and should be between 1 and 8 (inclusive), corresponding to SP1, SP2, SP3, SP3D1, SP3D2, SP3D3, SP3D4 and SP3D5. Supported by the OpenEye SMARTS implementation.
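The numeric arguments in Table 1 can be illustrated with a small standalone sketch. The class and method names below are hypothetical and not part of the CDK. The code only encodes the ranges and the ordering stated in the table, and assumes the Gx range of 1 to 18 is inclusive.

```java
import java.util.List;

/** Hypothetical helper (not a CDK class) that decodes the numeric arguments from Table 1. */
final class SmartsExtensionSketch {

    // Order taken from the table: x = 1 is SP1, x = 8 is SP3D5.
    private static final List<String> HYBRIDIZATIONS = List.of(
            "SP1", "SP2", "SP3", "SP3D1", "SP3D2", "SP3D3", "SP3D4", "SP3D5");

    /** Returns the hybridization name for ^x, where x must be between 1 and 8 inclusive. */
    static String hybridizationFor(int x) {
        if (x < 1 || x > HYBRIDIZATIONS.size()) {
            throw new IllegalArgumentException("^x requires 1 <= x <= 8, got " + x);
        }
        return HYBRIDIZATIONS.get(x - 1);
    }

    /** Returns true if x is a usable periodic group number for Gx (assumed 1..18 inclusive). */
    static boolean isValidGroup(int x) {
        return x >= 1 && x <= 18;
    }

    public static void main(String[] args) {
        System.out.println(hybridizationFor(3)); // prints SP3
        System.out.println(isValidGroup(19));    // prints false
    }
}
```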
Notes
  • As described by Craig James the h<n> SMARTS pattern should not be used. It was included in the Daylight spec for backwards compatibility. To match hydrogens, use the H<n> pattern.
  • The wildcard pattern (*) will not match hydrogens (explicit or implicit) unless an isotope is specified. In other words, * gives two hits against C[2H] but one hit against C[H]. It also gives no hits against [H][H]. This is contrary to what is shown by Daylight's depictmatch service, but is based on this discussion. A workaround to get * to match [H][H] is to write the molecule in the form [1H][1H]. It is not entirely clear what the behavior of * should be with respect to hydrogens, and it is possible that the code will be updated in the future so that * does not match any hydrogen.
  • The org.openscience.cdk.aromaticity.CDKHueckelAromaticityDetector only considers single rings and two fused non-spiro rings. As a result, it does not properly detect aromaticity in polycyclic systems such as [O-]C(=O)c1ccccc1c2c3ccc([O-])cc3oc4cc(=O)ccc24. Thus SMARTS patterns that depend on proper aromaticity detection may not work correctly in such polycyclic systems.
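The wildcard rule in the notes above can be modelled with a toy example. This is a minimal sketch of the stated rule only, not the CDK matching code. The `Atom` type and both method names are hypothetical, and only explicit atoms are represented.

```java
import java.util.List;

/** Toy model of the wildcard rule from the notes; these types are hypothetical, not CDK classes. */
final class WildcardHydrogenSketch {

    /** An explicit atom: element symbol plus a mass number (null when no isotope is given). */
    static final class Atom {
        final String symbol;
        final Integer massNumber;

        Atom(String symbol, Integer massNumber) {
            this.symbol = symbol;
            this.massNumber = massNumber;
        }
    }

    /** '*' matches any atom except a hydrogen with no isotope specified. */
    static boolean wildcardMatches(Atom atom) {
        boolean isHydrogen = "H".equals(atom.symbol);
        return !isHydrogen || atom.massNumber != null;
    }

    /** Counts how many atoms of the molecule '*' would hit under the rule above. */
    static long countWildcardHits(List<Atom> atoms) {
        return atoms.stream().filter(WildcardHydrogenSketch::wildcardMatches).count();
    }

    public static void main(String[] args) {
        List<Atom> deuteroMethyl = List.of(new Atom("C", null), new Atom("H", 2)); // C[2H]
        List<Atom> plainMethyl = List.of(new Atom("C", null), new Atom("H", null)); // C[H]
        List<Atom> dihydrogen = List.of(new Atom("H", null), new Atom("H", null)); // [H][H]
        List<Atom> protium = List.of(new Atom("H", 1), new Atom("H", 1));           // [1H][1H]

        System.out.println(countWildcardHits(deuteroMethyl)); // prints 2
        System.out.println(countWildcardHits(plainMethyl));   // prints 1
        System.out.println(countWildcardHits(dihydrogen));    // prints 0
        System.out.println(countWildcardHits(protium));       // prints 2
    }
}
```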
Author:
Rajarshi Guha
Source code:
main
Belongs to CDK module:
smarts
Keywords:
SMARTS, substructure search
Created on:
2007-04-08
  • Constructor Details

    • SMARTSQueryTool

      public SMARTSQueryTool(String smarts, IChemObjectBuilder builder)
      Deprecated.
      Create a new SMARTS query tool for the specified SMARTS string. Query objects will contain a reference to the specified IChemObjectBuilder.
      Parameters:
      smarts - SMARTS query string
      builder - the IChemObjectBuilder that query objects will reference
      Throws:
      IllegalArgumentException - if the SMARTS string cannot be handled
  • Method Details

    • setQueryCacheSize

      public void setQueryCacheSize(int maxEntries)
      Deprecated.
      Set the maximum size of the query cache.
      Parameters:
      maxEntries - The maximum number of entries
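This page does not say how the query cache evicts entries once it is full. As an illustration only, a size-bounded cache keyed by SMARTS string could look like the sketch below. The class name and the least-recently-used eviction policy are assumptions and are not taken from the CDK source.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bounded cache keyed by SMARTS string; LRU eviction is an assumption. */
final class BoundedQueryCacheSketch<V> extends LinkedHashMap<String, V> {

    private static final long serialVersionUID = 1L;

    private final int maxEntries;

    BoundedQueryCacheSketch(int maxEntries) {
        super(16, 0.75f, true); // access order, so the eldest entry is the least recently used
        this.maxEntries = maxEntries;
    }

    /** Called by LinkedHashMap after each insertion; returning true drops the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        return size() > maxEntries;
    }
}
```

With a maximum of two entries, inserting a third pattern evicts whichever of the first two was used least recently.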
    • useSmallestSetOfSmallestRings

      public void useSmallestSetOfSmallestRings()
      Deprecated.
      Indicates that ring properties should use the Smallest Set of Smallest Rings. The set is not unique and may lead to ambiguous matches.
    • useRelevantRings

      public void useRelevantRings()
      Deprecated.
      Indicates that ring properties should use the Relevant Rings. The set is unique and includes all of the SSSR but may be exponential in size.
    • useEssentialRings

      public void useEssentialRings()
      Deprecated.
      Indicates that ring properties should use the Essential Rings (default). The set is unique but only includes a subset of the SSSR.
    • setAromaticity

      public void setAromaticity(Aromaticity aromaticity)
      Deprecated.
      Set the aromaticity perception to use. Different aromaticity models may require certain attributes to be set (e.g. atom typing). These are not configured automatically and should be set before matching.
       SMARTSQueryTool sqt = new SMARTSQueryTool(...);
       sqt.setAromaticity(new Aromaticity(ElectronDonation.cdk(),
                                          Cycles.cdkAromaticSet()));
       for (IAtomContainer molecule : molecules) {
      
           // CDK Aromatic model needs atom types
           AtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(molecule);
      
           sqt.matches(molecule);
       }
       
      Parameters:
      aromaticity - the new aromaticity perception
    • getSmarts

      public String getSmarts()
      Deprecated.
      Returns the current SMARTS pattern being used.
      Returns:
      The SMARTS pattern
    • setSmarts

      public void setSmarts(String smarts) throws CDKException
      Deprecated.
      Set a new SMARTS pattern.
      Parameters:
      smarts - The new SMARTS pattern
      Throws:
      CDKException - if there is an error in parsing the pattern
    • matches

      public boolean matches(IAtomContainer atomContainer) throws CDKException
      Deprecated.
      Perform a SMARTS match and check whether the query is present in the target molecule. This method checks whether the query pattern matches the specified molecule and also saves, internally, the mapping of query atoms to the target molecule.
      Note: This method uses a simple caching scheme that compares the current molecule to the previous molecule by reference. If you repeatedly match different SMARTS patterns against the same molecule, the method avoids re-initializing the molecule (ring perception, aromaticity, etc.) each time. If, however, you modify the molecule between such matches, use matches(IAtomContainer, boolean) to force initialization.
      Parameters:
      atomContainer - The target molecule
      Returns:
      true if the pattern is found in the target molecule, false otherwise
      Throws:
      CDKException - if there is an error in ring, aromaticity or isomorphism perception
    • matches

      public boolean matches(IAtomContainer atomContainer, boolean forceInitialization) throws CDKException
      Deprecated.
      Perform a SMARTS match and check whether the query is present in the target molecule. This method checks whether the query pattern matches the specified molecule and also saves, internally, the mapping of query atoms to the target molecule.
      Parameters:
      atomContainer - The target molecule
      forceInitialization - If true, the molecule is initialized (ring perception, aromaticity, etc.). If false, the molecule is only initialized if it is a different object (by reference) from the one supplied in the previous call to this method.
      Returns:
      true if the pattern is found in the target molecule, false otherwise
      Throws:
      CDKException - if there is an error in ring, aromaticity or isomorphism perception
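The reference-based caching described for the two matches overloads can be sketched in isolation. The class below is hypothetical and is not the CDK implementation. A counter stands in for the expensive initialization (ring perception, aromaticity, etc.) so that the effect of forceInitialization is visible.

```java
/** Hypothetical sketch of caching by object reference; not the CDK implementation. */
final class ReferenceCacheSketch {

    private Object lastTarget;   // previous molecule, compared by reference only
    private int initializations; // how many times the stand-in initialization ran

    /** Initializes unless the very same object was supplied on the previous call. */
    void prepare(Object target, boolean forceInitialization) {
        if (forceInitialization || target != lastTarget) {
            initializations++;   // stands in for ring perception, aromaticity, etc.
            lastTarget = target;
        }
    }

    /** Single-argument form: never forces, so in-place edits to the target go unnoticed. */
    void prepare(Object target) {
        prepare(target, false);
    }

    int initializationCount() {
        return initializations;
    }

    public static void main(String[] args) {
        ReferenceCacheSketch sketch = new ReferenceCacheSketch();
        Object molecule = new Object();
        sketch.prepare(molecule);
        sketch.prepare(molecule);       // same reference: skipped
        sketch.prepare(molecule, true); // forced: runs again
        System.out.println(sketch.initializationCount()); // prints 2
    }
}
```

Because the comparison is by reference, a molecule that is modified in place looks unchanged to the cache, which is why the two-argument form exists.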
    • countMatches

      public int countMatches()
      Deprecated.
      Returns the number of times the pattern was found in the target molecule. Call this method only after matches(org.openscience.cdk.interfaces.IAtomContainer); otherwise the result is undefined.
      Returns:
      The number of times the pattern was found in the target molecule
    • getMatchingAtoms

      public List<List<Integer>> getMatchingAtoms()
      Deprecated.
      Get the atoms in the target molecule that match the query pattern. Since there may be multiple matches, the return value is a List of List objects. Each inner List contains the indices of the atoms in the target molecule that match the query pattern.
      Returns:
      A List of List of atom indices in the target molecule
    • getUniqueMatchingAtoms

      public List<List<Integer>> getUniqueMatchingAtoms()
      Deprecated.
      Get the atoms in the target molecule that match the query pattern. Since there may be multiple matches, the return value is a List of List objects. Each inner List contains a unique set of indices of the atoms in the target molecule that match the query pattern.
      Returns:
      A List of List of atom indices in the target molecule
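The difference between getMatchingAtoms and getUniqueMatchingAtoms can be illustrated with a standalone sketch. It assumes that "unique" means two mappings are duplicates when they cover the same set of atom indices regardless of order. That reading is an assumption, and the class below is hypothetical and does not call the CDK.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of de-duplicating mappings by the set of atoms they cover. */
final class UniqueMappingsSketch {

    /** Keeps the first mapping seen for each distinct set of atom indices. */
    static List<List<Integer>> uniqueByAtomSet(List<List<Integer>> mappings) {
        Set<Set<Integer>> seen = new HashSet<>();
        List<List<Integer>> unique = new ArrayList<>();
        for (List<Integer> mapping : mappings) {
            // Set equality ignores order, so [0, 1, 2] and [2, 1, 0] count as the same match.
            if (seen.add(new TreeSet<>(mapping))) {
                unique.add(mapping);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<List<Integer>> all = List.of(List.of(0, 1, 2), List.of(2, 1, 0), List.of(3, 4, 5));
        System.out.println(uniqueByAtomSet(all)); // prints [[0, 1, 2], [3, 4, 5]]
    }
}
```

Symmetric query patterns typically produce several mappings onto the same atoms, which is where the two lists would differ under this reading.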