Class SmartsFragmentExtractor
- java.lang.Object
-
- org.openscience.cdk.smarts.SmartsFragmentExtractor
-
public final class SmartsFragmentExtractor extends Object
Utility class to create SMARTS that match part (a substructure) of a molecule. SMARTS are generated by providing the atom indexes. An example use case is encoding features from a fingerprint.

The extractor has two modes. MODE_EXACT (default) captures the element, valence, hydrogen count, connectivity, and charge in the SMARTS atom expressions. The alternative mode, MODE_JCOMPOUNDMAPPER, only captures the element, non-zero charge, and peripheral bonds. Although the latter looks cleaner, the peripheral bonds are intended to capture the connectivity of the terminal atoms, but since the valence is not bounded, further substitution is still allowed. This mirrors functionality from jCompoundMapper [Hinselmann et al. Journal of Cheminformatics. 2011. 3].

The difference is easily demonstrated for methyl. Consider 2-methylpentane, CC(C)CCC. If we extract one of the methyl atoms we obtain, depending on the mode, [CH3v4X4+0] or C*. The first of these patterns (obtained with MODE_EXACT) matches the compound in three places (the three methyl groups). The second matches six times (every atom) because the substitution on the carbon is not locked. A further complication is introduced by the inclusion of the peripheral atoms: for 1H-indole, [nH]1ccc2c1cccc2, we can obtain the SMARTS n(ccc(a)a)a, which doesn't match at all. This is because one of the aromatic atoms ('a') needs to match the nitrogen.

Basic Usage:
    IChemObjectBuilder bldr = SilentChemObjectBuilder.getInstance();
    SmilesParser smipar = new SmilesParser(bldr);
    IAtomContainer mol = smipar.parseSmiles("[nH]1ccc2c1cccc2");
    SmartsFragmentExtractor subsmarts = new SmartsFragmentExtractor(mol);

    // smarts=[nH1v3X3+0][cH1v4X3+0][cH1v4X3+0][cH0v4X3+0]
    // hits = 1
    String smarts = subsmarts.generate(new int[]{0,1,3,4});

    subsmarts.setMode(SmartsFragmentExtractor.MODE_JCOMPOUNDMAPPER);
    // smarts=n(ccc(a)a)a
    // hits = 0 - one of the 'a' atoms needs to match the nitrogen
    smarts = subsmarts.generate(new int[]{0,1,3,4});
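The two atom expression styles described above can be illustrated without CDK on the classpath. The following is a minimal sketch, not the library's implementation: the class AtomExprSketch and its methods exact and loose are hypothetical names, and the bracketed form used for charged atoms in the loose style is an assumption. It only shows which atom properties each mode writes into the expression.

```java
// Illustrative sketch only: this is NOT how CDK builds its SMARTS internally.
final class AtomExprSketch {

    /** Exact style: element, hydrogen count, valence, connectivity and charge. */
    static String exact(String symbol, int hCount, int valence, int connectivity, int charge) {
        String sign = charge < 0 ? "-" : "+";
        return "[" + symbol + "H" + hCount + "v" + valence + "X" + connectivity
                + sign + Math.abs(charge) + "]";
    }

    /** Loose style: element only, plus the charge when it is non-zero (assumed format). */
    static String loose(String symbol, int charge) {
        if (charge == 0) {
            return symbol;
        }
        String sign = charge < 0 ? "-" : "+";
        return "[" + symbol + sign + Math.abs(charge) + "]";
    }

    public static void main(String[] args) {
        // Methyl carbon of 2-methylpentane: 3 hydrogens, valence 4, 4 connections, neutral.
        System.out.println(exact("C", 3, 4, 4, 0));
        // Loose style with one peripheral bond to any atom.
        System.out.println(loose("C", 0) + "*");
    }
}
```

Because the loose form leaves the hydrogen count and connectivity unspecified, a pattern such as C* is free to match carbons with any degree of substitution, which is why it hits every atom of 2-methylpentane.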
- Author:
- Nikolay Kochev, Nina Jeliazkova, John May
Field Summary
Fields
Modifier and Type | Field | Description
static int | MODE_EXACT | Sets the mode of the extractor to produce exact SMARTS.
static int | MODE_JCOMPOUNDMAPPER | Sets the mode of the extractor to produce SMARTS similar to jCompoundMapper.
-
Constructor Summary
Constructors
Constructor | Description
SmartsFragmentExtractor(IAtomContainer mol) | Create a new instance over the provided molecule.
-
Method Summary
Modifier and Type | Method | Description
String | generate(int[] atomIdxs) | Generate a SMARTS for the substructure formed of the provided atoms.
void | setMode(int mode) | Set the mode of SMARTS substructure selection.
Field Detail
-
MODE_JCOMPOUNDMAPPER
public static final int MODE_JCOMPOUNDMAPPER
Sets the mode of the extractor to produce SMARTS similar to jCompoundMapper.
- See Also:
- Constant Field Values
-
MODE_EXACT
public static final int MODE_EXACT
Sets the mode of the extractor to produce exact SMARTS.
- See Also:
- Constant Field Values
Constructor Detail
-
SmartsFragmentExtractor
public SmartsFragmentExtractor(IAtomContainer mol)
Create a new instance over the provided molecule.
- Parameters:
mol - molecule
Method Detail
-
setMode
public void setMode(int mode)
Set the mode of SMARTS substructure selection.
- Parameters:
mode - the mode, either MODE_EXACT or MODE_JCOMPOUNDMAPPER
-
generate
public String generate(int[] atomIdxs)
Generate a SMARTS for the substructure formed of the provided atoms.
- Parameters:
atomIdxs - atom indexes
- Returns:
SMARTS, null if an empty array is passed
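Since generate returns null for an empty index array, callers may want to guard against that before using the result. The following is a minimal sketch of one way to do so; GenerateGuardSketch and smartsFor are hypothetical names, and the generator is passed in as a plain function so the example runs without CDK.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper: wraps any "atom indexes -> SMARTS" function and
// turns the documented null result into an empty Optional.
final class GenerateGuardSketch {

    static Optional<String> smartsFor(Function<int[], String> generator, int[] atomIdxs) {
        // An empty (or missing) selection has no substructure to describe.
        if (atomIdxs == null || atomIdxs.length == 0) {
            return Optional.empty();
        }
        // ofNullable also covers a generator that itself returns null.
        return Optional.ofNullable(generator.apply(atomIdxs));
    }

    public static void main(String[] args) {
        // Stand-in generator used only for this demonstration.
        Function<int[], String> stub = idxs -> "stub-smarts";
        System.out.println(smartsFor(stub, new int[]{0, 1}).isPresent());
        System.out.println(smartsFor(stub, new int[0]).isPresent());
    }
}
```

With CDK available, the generator argument could be a method reference to an extractor instance, for example subsmarts::generate.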