Class SmartsFragmentExtractor


  • public final class SmartsFragmentExtractor
    extends Object
    Utility class that creates SMARTS matching part (a substructure) of a molecule. The SMARTS is generated from the atom indexes of the substructure. An example use case is encoding features from a fingerprint.

    The extractor has two modes. MODE_EXACT (default) captures the element, valence, hydrogen count, connectivity, and charge in the SMARTS atom expressions. The alternative mode, MODE_JCOMPOUNDMAPPER, captures only the element, non-zero charge, and peripheral bonds. Although the latter looks cleaner, it is less specific: the peripheral bonds are intended to capture the connectivity of the terminal atoms, but because the valence is not bounded, further substitution is still allowed. This mirrors functionality from jCompoundMapper [Hinselmann et al. Journal of Cheminformatics. 2011. 3].
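
    To make the two atom-expression styles concrete, the following is a minimal sketch of how such expressions could be assembled from atom properties. It is not the CDK implementation: the class AtomExprSketch and its methods are hypothetical, the field order is inferred from the examples on this page (such as [CH3v4X4+0]), and the way a non-zero charge is written in the looser mode is an assumption.

```java
import java.util.Locale;

// Illustrative sketch only; hypothetical names, not part of CDK.
final class AtomExprSketch {

    // Aromatic atoms are written in lower case, as in SMILES/SMARTS.
    private static String element(String symbol, boolean aromatic) {
        return aromatic ? symbol.toLowerCase(Locale.ROOT) : symbol;
    }

    // Signed charge such as +0, +1 or -1 (the form for negative values is an assumption).
    private static String charge(int charge) {
        return (charge < 0 ? "-" : "+") + Math.abs(charge);
    }

    /** MODE_EXACT style: element, hydrogen count, valence, connectivity and charge. */
    static String exact(String symbol, boolean aromatic, int hCount,
                        int valence, int connectivity, int charge) {
        return "[" + element(symbol, aromatic)
                + "H" + hCount
                + "v" + valence
                + "X" + connectivity
                + charge(charge) + "]";
    }

    /** MODE_JCOMPOUNDMAPPER style: element only, plus the charge when it is non-zero. */
    static String loose(String symbol, boolean aromatic, int charge) {
        String elem = element(symbol, aromatic);
        // Bracketed form for charged atoms is an assumption made for this sketch.
        return charge == 0 ? elem : "[" + elem + charge(charge) + "]";
    }

    public static void main(String[] args) {
        // A methyl carbon: three hydrogens, valence 4, connectivity 4, neutral.
        System.out.println(exact("C", false, 3, 4, 4, 0)); // prints [CH3v4X4+0]
        System.out.println(loose("C", false, 0));          // prints C
    }
}
```

    The exact form pins down every property of the atom, which is why it is more selective; the loose form leaves hydrogen count, valence and connectivity open.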

    The difference is easily demonstrated for methyl. Consider 2-methylpentane, CC(C)CCC. If we extract one of the methyl atoms we obtain, depending on the mode, [CH3v4X4+0] or C*. The first pattern (obtained with MODE_EXACT) matches the compound in three places (the three methyl groups). The second matches six times (every atom) because the substitution on the carbon is not locked. A further complication is introduced by the inclusion of the peripheral atoms: for 1H-indole, [nH]1ccc2c1cccc2, we can obtain the SMARTS n(ccc(a)a)a, which doesn't match at all. This is because one of the aromatic atoms ('a') needs to match the nitrogen.
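
    The match counts for 2-methylpentane can be reproduced by hand. The sketch below is a toy model rather than CDK code: the class MethylMatchSketch is hypothetical, the molecule is hard-coded as a bond list with hydrogen counts, and a "match" is counted per atom that can play the role of the extracted carbon. Valence (4) and charge (0) are identical for every atom here, so they are not modelled.

```java
// Hand-rolled toy model for illustration; it does not use CDK or a SMARTS parser.
final class MethylMatchSketch {

    // Heavy-atom bonds of 2-methylpentane, CC(C)CCC, atoms numbered in SMILES order.
    static final int[][] BONDS = {{0, 1}, {1, 2}, {1, 3}, {3, 4}, {4, 5}};

    // Implicit hydrogen count of each carbon.
    static final int[] H_COUNT = {3, 1, 3, 2, 2, 3};

    // Number of heavy-atom neighbours of an atom.
    static int heavyDegree(int atom) {
        int degree = 0;
        for (int[] bond : BONDS) {
            if (bond[0] == atom || bond[1] == atom) degree++;
        }
        return degree;
    }

    /** Atoms satisfying [CH3v4X4+0]: exactly three hydrogens and total connectivity of four. */
    static int countExact() {
        int n = 0;
        for (int a = 0; a < H_COUNT.length; a++) {
            int connectivity = heavyDegree(a) + H_COUNT[a];
            if (H_COUNT[a] == 3 && connectivity == 4) n++;
        }
        return n;
    }

    /** Atoms that can play the carbon in C*: any carbon with at least one neighbour. */
    static int countLoose() {
        int n = 0;
        for (int a = 0; a < H_COUNT.length; a++) {
            if (heavyDegree(a) >= 1) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("exact: " + countExact()); // prints "exact: 3"
        System.out.println("loose: " + countLoose()); // prints "loose: 6"
    }
}
```

    Only the hydrogen count separates the methyl carbons from the rest, which is exactly the information the looser pattern discards.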

    Basic Usage:

    
    
     IChemObjectBuilder      bldr      = SilentChemObjectBuilder.getInstance();
     SmilesParser            smipar    = new SmilesParser(bldr);
    
     IAtomContainer          mol       = smipar.parseSmiles("[nH]1ccc2c1cccc2");
     SmartsFragmentExtractor subsmarts = new SmartsFragmentExtractor(mol);
    
     // default mode is MODE_EXACT
     // smarts=[nH1v3X3+0][cH1v4X3+0][cH1v4X3+0][cH0v4X3+0]
     // hits  =1
     String                  smarts    = subsmarts.generate(new int[]{0,1,2,3});
    
     subsmarts.setMode(SmartsFragmentExtractor.MODE_JCOMPOUNDMAPPER);
     // smarts=n(ccc(a)a)a
     // hits  = 0 - one of the 'a' atoms needs to match the nitrogen
     smarts = subsmarts.generate(new int[]{0,1,2,3});
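
    The zero-hit result for n(ccc(a)a)a can be checked without a toolkit. The sketch below is a hand-written backtracking matcher over hard-coded graphs, not CDK code: the class IndoleMatchSketch is hypothetical and only models element and bond connectivity, which is sufficient here because every atom of 1H-indole is aromatic. In this toy model no assignment of distinct atoms exists; a mapping is only found when a peripheral 'a' may reuse an atom already claimed by another pattern atom.

```java
// Toy subgraph matcher for illustration; hypothetical names, not part of CDK.
final class IndoleMatchSketch {

    // 1H-indole, [nH]1ccc2c1cccc2, atoms numbered in SMILES order; every atom is aromatic.
    static final char[] TARGET = {'N', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'};
    static final int[][] TARGET_BONDS = {
        {0, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 0}, {4, 5}, {5, 6}, {6, 7}, {7, 8}, {8, 3}};

    // Pattern n(ccc(a)a)a: 'n' and 'c' fix the element, 'a' accepts any aromatic atom.
    static final char[] PATTERN = {'n', 'c', 'c', 'c', 'a', 'a', 'a'};
    static final int[][] PATTERN_BONDS = {{0, 1}, {1, 2}, {2, 3}, {3, 4}, {3, 5}, {0, 6}};

    static boolean bonded(int[][] bonds, int a, int b) {
        for (int[] e : bonds) {
            if ((e[0] == a && e[1] == b) || (e[0] == b && e[1] == a)) return true;
        }
        return false;
    }

    static boolean atomOk(char query, char target) {
        if (query == 'a') return true; // all indole atoms are aromatic
        return Character.toUpperCase(query) == target;
    }

    /** Counts mappings of the pattern onto indole, optionally requiring distinct target atoms. */
    static int count(boolean distinctAtoms) {
        return extend(new int[PATTERN.length], 0, distinctAtoms);
    }

    private static int extend(int[] map, int i, boolean distinctAtoms) {
        if (i == PATTERN.length) return 1; // every pattern atom has been placed
        int total = 0;
        for (int t = 0; t < TARGET.length; t++) {
            if (!atomOk(PATTERN[i], TARGET[t])) continue;
            boolean ok = true;
            for (int j = 0; j < i && ok; j++) {
                if (distinctAtoms && map[j] == t) {
                    ok = false; // target atom already used
                } else if (bonded(PATTERN_BONDS, i, j) && !bonded(TARGET_BONDS, t, map[j])) {
                    ok = false; // pattern bond has no counterpart in the target
                }
            }
            if (ok) {
                map[i] = t;
                total += extend(map, i + 1, distinctAtoms);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("mappings with distinct atoms: " + count(true));
        System.out.println("mapping exists if atoms may be reused: " + (count(false) > 0));
    }
}
```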
     
    Author:
    Nikolay Kochev, Nina Jeliazkova, John May
    • Field Detail

      • MODE_JCOMPOUNDMAPPER

        public static final int MODE_JCOMPOUNDMAPPER
        Sets the mode of the extractor to produce SMARTS similar to jCompoundMapper.
        See Also:
        Constant Field Values
      • MODE_EXACT

        public static final int MODE_EXACT
        Sets the mode of the extractor to produce exact SMARTS.
        See Also:
        Constant Field Values
    • Constructor Detail

      • SmartsFragmentExtractor

        public SmartsFragmentExtractor(IAtomContainer mol)
        Create a new instance over the provided molecule.
        Parameters:
        mol - molecule
    • Method Detail

      • setMode

        public void setMode(int mode)
        Set the mode of SMARTS substructure selection.
        Parameters:
        mode - the mode, either MODE_EXACT or MODE_JCOMPOUNDMAPPER
      • generate

        public String generate(int[] atomIdxs)
        Generate a SMARTS for the substructure formed of the provided atoms.
        Parameters:
        atomIdxs - indexes of the atoms in the molecule that form the substructure
        Returns:
        SMARTS, null if an empty array is passed