Class SmartsFragmentExtractor

java.lang.Object
org.openscience.cdk.smarts.SmartsFragmentExtractor

public final class SmartsFragmentExtractor extends Object
Utility class to create SMARTS that match part (substructure) of a molecule. SMARTS are generated by providing the atom indexes. An example use case is encoding features from a fingerprint.

The extractor has two modes. MODE_EXACT (default) captures the element, valence, hydrogen count, connectivity, and charge in the SMARTS atom expressions. The alternative mode, MODE_JCOMPOUNDMAPPER, only captures the element, non-zero charge, and peripheral bonds. Although the latter looks cleaner, the peripheral bonds are only intended to capture the connectivity of the terminal atoms; because the valence is not bounded, further substitution is still allowed. This mirrors functionality from jCompoundMapper [Hinselmann et al. Journal of Cheminformatics. 2011. 3].
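As a rough illustration of what a MODE_EXACT atom expression encodes, the sketch below assembles the five properties into the bracketed form seen in this documentation. The class ExactAtomExprSketch and its method exactExpr are hypothetical helpers written for this example and are not part of CDK; the field order follows the examples on this page, and the formatting of negative charges is an assumption.

```java
/** Hypothetical helper (not CDK API): builds a MODE_EXACT-style atom expression. */
final class ExactAtomExprSketch {

    /**
     * @param symbol       element symbol, lower case when aromatic (e.g. "C", "n")
     * @param hydrogens    total hydrogen count, written after 'H'
     * @param valence      valence, written after 'v'
     * @param connectivity total number of connections, written after 'X'
     * @param charge       formal charge, always written with an explicit sign
     */
    static String exactExpr(String symbol, int hydrogens, int valence,
                            int connectivity, int charge) {
        StringBuilder sb = new StringBuilder("[");
        sb.append(symbol);
        sb.append('H').append(hydrogens);
        sb.append('v').append(valence);
        sb.append('X').append(connectivity);
        // Assumption: negative charges are written as '-' followed by the magnitude.
        sb.append(charge < 0 ? '-' : '+').append(Math.abs(charge));
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        System.out.println(exactExpr("C", 3, 4, 4, 0)); // prints [CH3v4X4+0]
        System.out.println(exactExpr("n", 1, 3, 3, 0)); // prints [nH1v3X3+0]
    }
}
```

Because every property is pinned, such an expression only matches atoms with exactly that environment, which is what makes MODE_EXACT patterns specific.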

The difference is easily demonstrated for methyl. Consider 2-methylpentane, CC(C)CCC. If we extract one of the methyl atoms we obtain, depending on the mode, [CH3v4X4+0] or C*. The first pattern (obtained by MODE_EXACT) matches the compound in three places (the three methyl groups). The second matches six times (every atom) because the substitution on the carbon is not locked. A further complication is introduced by the inclusion of the peripheral atoms: for 1H-indole, [nH]1ccc2c1cccc2, we can obtain the SMARTS n(ccc(a)a)a, which doesn't match at all. This is because two of the peripheral aromatic atoms ('a') would have to match the same atom, the ring-fusion carbon bonded to the nitrogen.
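The methyl counts can be checked by hand with a toy model of 2-methylpentane. The sketch below does not use CDK or a SMARTS matcher; it hard-codes the heavy-atom bonds of CC(C)CCC and counts the atoms that could be matched by the first atom of each pattern. The class MethylMatchSketch and its methods are hypothetical names, and the hydrogen count assumes every atom is a neutral, saturated carbon.

```java
/** Toy illustration (not CDK code): counts atoms of 2-methylpentane accepted by each pattern. */
final class MethylMatchSketch {

    // Heavy-atom bonds of CC(C)CCC, atoms numbered in SMILES order.
    static final int[][] BONDS = {{0, 1}, {1, 2}, {1, 3}, {3, 4}, {4, 5}};
    static final int ATOM_COUNT = 6;

    /** Number of heavy-atom neighbours of each atom. */
    static int[] degrees() {
        int[] deg = new int[ATOM_COUNT];
        for (int[] b : BONDS) {
            deg[b[0]]++;
            deg[b[1]]++;
        }
        return deg;
    }

    /** Atoms satisfying [CH3v4X4+0]: only the hydrogen count differs between these carbons. */
    static int countExact() {
        int n = 0;
        for (int d : degrees()) {
            int hydrogens = 4 - d; // assumption: neutral, saturated carbon
            if (hydrogens == 3) n++;
        }
        return n;
    }

    /** Atoms that can act as the 'C' in C*: any carbon with at least one neighbour. */
    static int countLoose() {
        int n = 0;
        for (int d : degrees()) {
            if (d >= 1) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(countExact()); // prints 3
        System.out.println(countLoose()); // prints 6
    }
}
```

The exact pattern is satisfied only by the three terminal carbons, while the loose pattern accepts all six, in line with the counts quoted above.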

Basic Usage:



 IChemObjectBuilder      bldr      = SilentChemObjectBuilder.getInstance();
 SmilesParser            smipar    = new SmilesParser(bldr);

 IAtomContainer          mol       = smipar.parseSmiles("[nH]1ccc2c1cccc2");
 SmartsFragmentExtractor subsmarts = new SmartsFragmentExtractor(mol);

 // atoms 0-3: the nitrogen and the next three ring carbons
 // smarts=[nH1v3X3+0][cH1v4X3+0][cH1v4X3+0][cH0v4X3+0]
 // hits  =1
 String                  smarts    = subsmarts.generate(new int[]{0,1,2,3});

 subsmarts.setMode(SmartsFragmentExtractor.MODE_JCOMPOUNDMAPPER);
 // smarts=n(ccc(a)a)a
 // hits  =0 - two of the 'a' atoms would have to match the same ring-fusion carbon
 smarts = subsmarts.generate(new int[]{0,1,2,3});
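One way to see why the jCompoundMapper-style pattern gets no hits on indole is to compare the number of peripheral bonds with the number of distinct peripheral atoms. The sketch below is a plain-Java toy, not CDK code: PeripheralAtomsSketch and its methods are hypothetical names, and the fragment is assumed to be the nitrogen plus the next three ring carbons (atoms 0 to 3 in SMILES order), which is the fragment that yields n(ccc(a)a)a.

```java
import java.util.Set;
import java.util.TreeSet;

/** Toy illustration (not CDK code): peripheral bonds versus peripheral atoms in 1H-indole. */
final class PeripheralAtomsSketch {

    // Bonds of [nH]1ccc2c1cccc2, atoms numbered in SMILES order.
    static final int[][] BONDS = {
        {0, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 0},
        {4, 5}, {5, 6}, {6, 7}, {7, 8}, {8, 3}
    };

    /** Bonds with exactly one end inside the fragment; each becomes one 'a' in the pattern. */
    static int peripheralBonds(Set<Integer> fragment) {
        int n = 0;
        for (int[] b : BONDS) {
            if (fragment.contains(b[0]) != fragment.contains(b[1])) n++;
        }
        return n;
    }

    /** Distinct atoms outside the fragment that are bonded to an atom inside it. */
    static Set<Integer> peripheralAtoms(Set<Integer> fragment) {
        Set<Integer> out = new TreeSet<>();
        for (int[] b : BONDS) {
            boolean in0 = fragment.contains(b[0]);
            boolean in1 = fragment.contains(b[1]);
            if (in0 && !in1) out.add(b[1]);
            if (in1 && !in0) out.add(b[0]);
        }
        return out;
    }

    public static void main(String[] args) {
        Set<Integer> fragment = Set.of(0, 1, 2, 3);
        System.out.println(peripheralBonds(fragment)); // prints 3
        System.out.println(peripheralAtoms(fragment)); // prints [4, 8]
    }
}
```

The pattern carries three separate 'a' atoms, but the molecule offers only two distinct peripheral atoms, because atom 4 is bonded to both the nitrogen and atom 3. A substructure match maps each pattern atom to a different target atom, so the pattern cannot be satisfied.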
 
Author:
Nikolay Kochev, Nina Jeliazkova, John May
  • Field Details

    • MODE_JCOMPOUNDMAPPER

      public static final int MODE_JCOMPOUNDMAPPER
      Sets the mode of the extractor to produce SMARTS similar to jCompoundMapper.
    • MODE_EXACT

      public static final int MODE_EXACT
      Sets the mode of the extractor to produce exact SMARTS.
  • Constructor Details

    • SmartsFragmentExtractor

      public SmartsFragmentExtractor(IAtomContainer mol)
      Create a new instance over the provided molecule.
      Parameters:
      mol - molecule
  • Method Details

    • setMode

      public void setMode(int mode)
      Set the mode of SMARTS substructure selection.
      Parameters:
      mode - the mode, either MODE_EXACT or MODE_JCOMPOUNDMAPPER
    • generate

      public String generate(int[] atomIdxs)
      Generate a SMARTS for the substructure formed of the provided atoms.
      Parameters:
      atomIdxs - atom indexes
      Returns:
      SMARTS, null if an empty array is passed