Package org.openscience.cdk.fingerprint
Class Fingerprinter
java.lang.Object
  org.openscience.cdk.fingerprint.AbstractFingerprinter
    org.openscience.cdk.fingerprint.Fingerprinter
- All Implemented Interfaces:
- IFingerprinter
- Direct Known Subclasses:
- GraphOnlyFingerprinter, HybridizationFingerprinter
public class Fingerprinter extends AbstractFingerprinter implements IFingerprinter
Generates a fingerprint for a given AtomContainer. Fingerprints are one-dimensional bit arrays, where bits are set according to the occurrence of a particular structural feature (see, for example, the Daylight Inc. theory manual for more information). Fingerprints allow for a fast screening step to exclude candidates for a substructure search in a database. They are also a means for determining the similarity of chemical structures.
A fingerprint is generated for an AtomContainer with this code:
IAtomContainer molecule = new AtomContainer();
IFingerprinter fingerprinter = new Fingerprinter();
IBitFingerprint fingerprint = fingerprinter.getBitFingerprint(molecule);
fingerprint.size(); // returns 1024 by default
fingerprint.asBitSet().length(); // returns the index of the highest set bit plus one
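One common way to turn two bit fingerprints into a similarity score is the Tanimoto coefficient: the number of bits set in both fingerprints divided by the number of bits set in either. The sketch below is not CDK code. It works on plain java.util.BitSet values, and the class name TanimotoSketch and the choice to return 1.0 for two empty fingerprints are assumptions made for illustration.

```java
import java.util.BitSet;

final class TanimotoSketch {

    /** Tanimoto coefficient |A and B| / |A or B|; two empty fingerprints are treated as identical. */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone();
        common.and(b);                      // bits set in both fingerprints
        BitSet either = (BitSet) a.clone();
        either.or(b);                       // bits set in at least one fingerprint
        int union = either.cardinality();
        return union == 0 ? 1.0 : (double) common.cardinality() / union;
    }

    public static void main(String[] args) {
        BitSet a = new BitSet(1024);
        a.set(3); a.set(17); a.set(200);
        BitSet b = new BitSet(1024);
        b.set(3); b.set(17); b.set(512);
        // 2 shared bits out of 4 distinct bits
        System.out.println(tanimoto(a, b)); // prints 0.5
    }
}
```

A value of 1.0 means the two bit sets are identical, which for small molecules does not guarantee that the structures are identical.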
By default, the Fingerprinter ignores explicit hydrogens (setHashExplicitHydrogens(boolean)) and pseudo atoms (setHashPseudoAtoms(boolean)); either can be included through the corresponding setter. This ensures the fingerprint can be used for substructure screening by default.
Warning: The aromaticity detection for this Fingerprinter relies on AllRingsFinder, which is known to take very long for some molecules with many cycles or special cyclic topologies. Thus, the AllRingsFinder has a built-in timeout of 5 seconds after which it aborts and throws an Exception. If you want your fingerprint generated at any expense, you need to create your own AllRingsFinder, set the timeout to a higher value, and pass it to this Fingerprinter through getBitFingerprint(IAtomContainer, AllRingsFinder). In the vast majority of cases, however, the defaults will be fine.
Another warning: The Daylight manual says: "Fingerprints are not so definite: if a fingerprint indicates a pattern is missing then it certainly is, but it can only indicate a pattern's presence with some probability." In the case of very small molecules, the probability that you get the same fingerprint for different molecules is high.
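The screening step described above can be sketched as a bit-subset test: a candidate can only contain the query as a substructure if every bit set in the query fingerprint is also set in the candidate fingerprint. The example below is a minimal illustration on plain java.util.BitSet values, not CDK code; the class name ScreeningSketch and the method name mayContain are hypothetical.

```java
import java.util.BitSet;

final class ScreeningSketch {

    /** Returns false if the candidate certainly lacks the query; true only means "possibly contains". */
    static boolean mayContain(BitSet candidate, BitSet query) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(candidate);          // bits the query needs that the candidate does not have
        return missing.isEmpty();
    }

    public static void main(String[] args) {
        BitSet candidate = new BitSet(1024);
        candidate.set(1); candidate.set(5); candidate.set(9); candidate.set(12);

        BitSet queryA = new BitSet(1024);
        queryA.set(5); queryA.set(9);

        BitSet queryB = new BitSet(1024);
        queryB.set(5); queryB.set(700);

        System.out.println(mayContain(candidate, queryA)); // prints true: still needs a full substructure match
        System.out.println(mayContain(candidate, queryB)); // prints false: candidate can be discarded
    }
}
```

A false result is definite, in line with the Daylight quotation. A true result can be a false positive, because different features may set the same bit, so it should be followed by an exact substructure search.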
- Author:
- steinbeck
- Source code:
- main
- Belongs to CDK module:
- standard
- Keywords:
- fingerprint, similarity
- Created on:
- 2002-02-24
Field Summary
Fields
Modifier and Type | Field | Description
static int | DEFAULT_SEARCH_DEPTH | The default search depth used to create the fingerprints.
static int | DEFAULT_SIZE | The default length of created fingerprints.
-
Constructor Summary
Constructors
Constructor | Description
Fingerprinter() | Creates a fingerprint generator of length DEFAULT_SIZE and with a search depth of DEFAULT_SEARCH_DEPTH.
Fingerprinter(int size) |
Fingerprinter(int size, int searchDepth) | Constructs a fingerprint generator that creates fingerprints of the given size, using a generation algorithm with the given search depth.
-
Method Summary
Modifier and Type | Method | Description
protected void | encodePaths(IAtomContainer mol, int depth, BitSet fp, int size) |
protected int[] | findPathes(IAtomContainer container, int searchDepth) | Deprecated.
IBitFingerprint | getBitFingerprint(IAtomContainer container) | Generates a fingerprint of the default size for the given AtomContainer.
IBitFingerprint | getBitFingerprint(IAtomContainer container, AllRingsFinder ringFinder) | Generates a fingerprint of the default size for the given AtomContainer.
protected String | getBondSymbol(IBond bond) | Gets the bondSymbol attribute of the Fingerprinter class.
ICountFingerprint | getCountFingerprint(IAtomContainer container) | Returns the count fingerprint for the given IAtomContainer.
protected List<Map.Entry<String,String>> | getParameters() | Subclasses should override this method to report the parameters they are configured with.
Map<String,Integer> | getRawFingerprint(IAtomContainer container) | Returns the raw representation of the fingerprint for the given IAtomContainer.
int | getSearchDepth() |
int | getSize() | Returns the size (or length) of the fingerprint.
void | setHashExplicitHydrogens(boolean value) | Include explicit hydrogen atoms in the fingerprint.
void | setHashPseudoAtoms(boolean value) | Include pseudo/query atoms in the fingerprint with atomic number 0.
void | setPathLimit(int limit) |
-
Methods inherited from class org.openscience.cdk.fingerprint.AbstractFingerprinter
getFingerprint, getVersionDescription
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.openscience.cdk.fingerprint.IFingerprinter
getFingerprint, getVersionDescription
-
Field Detail
-
DEFAULT_SIZE
public static final int DEFAULT_SIZE
The default length of created fingerprints.
- See Also:
- Constant Field Values
-
DEFAULT_SEARCH_DEPTH
public static final int DEFAULT_SEARCH_DEPTH
The default search depth used to create the fingerprints.
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
Fingerprinter
public Fingerprinter()
Creates a fingerprint generator of length DEFAULT_SIZE and with a search depth of DEFAULT_SEARCH_DEPTH.
-
Fingerprinter
public Fingerprinter(int size)
-
Fingerprinter
public Fingerprinter(int size, int searchDepth)
Constructs a fingerprint generator that creates fingerprints of the given size, using a generation algorithm with the given search depth.
- Parameters:
- size - The desired size of the fingerprint
- searchDepth - The desired depth of search (number of bonds)
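To see why the size parameter matters, consider how path features might be folded into a fixed number of bits. The sketch below hashes each path string and maps it to a bit index with a modulo operation. This mapping is an assumption chosen for illustration and is not taken from the CDK implementation; the class name FoldingSketch and the path strings are hypothetical as well.

```java
import java.util.BitSet;
import java.util.List;

final class FoldingSketch {

    /** Sets one bit per path; the bit index is the path's hash folded into [0, size). */
    static BitSet fold(List<String> paths, int size) {
        BitSet bits = new BitSet(size);
        for (String path : paths) {
            // floorMod keeps the index non-negative even for negative hash codes
            bits.set(Math.floorMod(path.hashCode(), size));
        }
        return bits;
    }

    public static void main(String[] args) {
        List<String> paths = List.of("C", "O", "C-C", "C-O", "C-C-O");
        // A large fingerprint leaves room for each path to get its own bit.
        System.out.println(fold(paths, 1024).cardinality());
        // With only 2 bits available, at most 2 bits can be set, so paths must share bits.
        System.out.println(fold(paths, 2).cardinality());
    }
}
```

Smaller fingerprints save space but force more features onto the same bits, which raises the false-positive rate during screening. A larger search depth produces more paths and has a similar crowding effect for a fixed size.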
-
-
Method Detail
-
getParameters
protected List<Map.Entry<String,String>> getParameters()
Description copied from class: AbstractFingerprinter
Subclasses should override this method to report the parameters they are configured with.
- Overrides:
- getParameters in class AbstractFingerprinter
- Returns:
- The key=value pairs of configured parameters
-
getBitFingerprint
public IBitFingerprint getBitFingerprint(IAtomContainer container, AllRingsFinder ringFinder) throws CDKException
Generates a fingerprint of the default size for the given AtomContainer.
- Parameters:
- container - The AtomContainer for which a Fingerprint is generated
- ringFinder - An instance of AllRingsFinder
- Returns:
- An IBitFingerprint representing the fingerprint
- Throws:
- CDKException - if there is a timeout in ring or aromaticity perception
-
getBitFingerprint
public IBitFingerprint getBitFingerprint(IAtomContainer container) throws CDKException
Generates a fingerprint of the default size for the given AtomContainer.
- Specified by:
- getBitFingerprint in interface IFingerprinter
- Parameters:
- container - The AtomContainer for which a Fingerprint is generated
- Returns:
- the bit fingerprint
- Throws:
- CDKException - may be thrown if there is an error during aromaticity detection or (for key based fingerprints) if there is a SMARTS parsing error
-
getRawFingerprint
public Map<String,Integer> getRawFingerprint(IAtomContainer container) throws CDKException
Returns the raw representation of the fingerprint for the given IAtomContainer. The raw representation contains counts as well as the key strings.
- Specified by:
- getRawFingerprint in interface IFingerprinter
- Parameters:
- container - IAtomContainer for which the fingerprint should be calculated.
- Returns:
- the raw fingerprint
- Throws:
- CDKException
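The shape of a raw fingerprint, key strings mapped to counts, can be illustrated with a plain Map<String,Integer>. The sketch below is not CDK code: the class name RawCountSketch is hypothetical, and the feature strings are made-up path keys rather than the keys CDK actually produces.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class RawCountSketch {

    /** Counts how often each feature key occurs. */
    static Map<String, Integer> count(List<String> features) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys give a stable print order
        for (String feature : features) {
            counts.merge(feature, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical path keys; "C-C" occurs twice.
        System.out.println(count(List.of("C-C", "C-O", "C-C"))); // prints {C-C=2, C-O=1}
    }
}
```

A bit fingerprint keeps only whether a feature is present, while a representation like this one also records how often each feature occurs.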
-
getCountFingerprint
public ICountFingerprint getCountFingerprint(IAtomContainer container) throws CDKException
Description copied from interface: IFingerprinter
Returns the count fingerprint for the given IAtomContainer.
- Specified by:
- getCountFingerprint in interface IFingerprinter
- Parameters:
- container - IAtomContainer for which the fingerprint should be calculated.
- Returns:
- the count fingerprint
- Throws:
- CDKException - if there is an error during aromaticity detection or (for key based fingerprints) if there is a SMARTS parsing error.
-
findPathes
@Deprecated protected int[] findPathes(IAtomContainer container, int searchDepth) throws CDKException
Deprecated.
Get all paths of lengths 0 to the specified length. This method will find all paths up to length N starting from each atom in the molecule and return the unique set of such paths.
- Parameters:
- container - The molecule to search
- searchDepth - The maximum path length desired
- Returns:
- A Map of path strings, keyed on themselves
- Throws:
- CDKException
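The path search described above (all paths of up to N bonds from every atom, reduced to a unique set) can be sketched with a depth-first walk over a small molecular graph. This is an illustration under stated assumptions, not the CDK algorithm: the molecule is given as an array of atom symbols plus neighbor lists, bond orders are ignored, a path and its reverse are treated as the same path, and the class name PathSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class PathSketch {

    /** Unique simple paths of 0..depth bonds, written as symbols joined by '-'. */
    static Set<String> paths(String[] symbols, int[][] neighbors, int depth) {
        Set<String> unique = new TreeSet<>();
        for (int start = 0; start < symbols.length; start++) {
            List<Integer> path = new ArrayList<>();
            path.add(start);
            walk(symbols, neighbors, depth, path, unique);
        }
        return unique;
    }

    private static void walk(String[] symbols, int[][] neighbors, int depth,
                             List<Integer> path, Set<String> unique) {
        unique.add(canonical(symbols, path));
        if (path.size() - 1 == depth) return;       // path already spans `depth` bonds
        int last = path.get(path.size() - 1);
        for (int next : neighbors[last]) {
            if (path.contains(next)) continue;      // never revisit an atom within one path
            path.add(next);
            walk(symbols, neighbors, depth, path, unique);
            path.remove(path.size() - 1);           // backtrack (removes by index)
        }
    }

    /** Picks the lexicographically smaller of the forward and reversed spelling. */
    private static String canonical(String[] symbols, List<Integer> path) {
        StringBuilder fwd = new StringBuilder();
        StringBuilder rev = new StringBuilder();
        for (int i = 0; i < path.size(); i++) {
            if (i > 0) { fwd.append('-'); rev.append('-'); }
            fwd.append(symbols[path.get(i)]);
            rev.append(symbols[path.get(path.size() - 1 - i)]);
        }
        String f = fwd.toString();
        String r = rev.toString();
        return f.compareTo(r) <= 0 ? f : r;
    }

    public static void main(String[] args) {
        // Ethanol heavy atoms: C(0) - C(1) - O(2)
        String[] symbols = {"C", "C", "O"};
        int[][] neighbors = {{1}, {0, 2}, {1}};
        System.out.println(paths(symbols, neighbors, 2)); // prints [C, C-C, C-C-O, C-O, O]
    }
}
```

The number of paths grows quickly with the search depth and with the number of rings.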
-
encodePaths
protected void encodePaths(IAtomContainer mol, int depth, BitSet fp, int size) throws CDKException
- Throws:
CDKException
-
getBondSymbol
protected String getBondSymbol(IBond bond)
Gets the bondSymbol attribute of the Fingerprinter class.
- Parameters:
- bond - Description of the Parameter
- Returns:
- The bondSymbol value
-
setPathLimit
public void setPathLimit(int limit)
-
setHashPseudoAtoms
public void setHashPseudoAtoms(boolean value)
Include pseudo/query atoms in the fingerprint with atomic number 0. Generally, for substructure screening, for which path-based fingerprints are most useful, this is not wanted.
- Parameters:
- value - the setting (false by default)
-
setHashExplicitHydrogens
public void setHashExplicitHydrogens(boolean value)
Include explicit hydrogen atoms in the fingerprint. This means you get a different fingerprint depending on whether hydrogens are implicit or explicit. Generally, for substructure screening, for which path-based fingerprints are most useful, this is not wanted.
- Parameters:
- value - the setting (false by default)
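The effect of this setting can be illustrated with path features for methane written once with implicit and once with explicit hydrogens. The sketch below is an assumption-laden illustration rather than CDK code: paths are plain strings of atom symbols joined by '-', and the class name HydrogenSketch and method name hashedPaths are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class HydrogenSketch {

    /** Keeps a path if hydrogens are hashed, or if the path contains no "H" atom. */
    static List<String> hashedPaths(List<String> paths, boolean hashExplicitHydrogens) {
        return paths.stream()
                .filter(p -> hashExplicitHydrogens || !Arrays.asList(p.split("-")).contains("H"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> implicit = List.of("C");                       // methane, hydrogens implicit
        List<String> explicit = List.of("C", "H", "C-H", "H-C-H");  // methane, hydrogens explicit

        // Hydrogens ignored: both representations yield the same features.
        System.out.println(hashedPaths(explicit, false).equals(hashedPaths(implicit, false))); // prints true
        // Hydrogens hashed: the explicit form gains extra features.
        System.out.println(hashedPaths(explicit, true).equals(hashedPaths(implicit, true)));   // prints false
    }
}
```

When hydrogens are hashed, a query drawn with explicit hydrogens can set bits that a target stored with implicit hydrogens lacks, so a subset-based screen would wrongly reject the target.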
-
getSearchDepth
public int getSearchDepth()
-
getSize
public int getSize()
Description copied from interface: IFingerprinter
Returns the size (or length) of the fingerprint.
- Specified by:
- getSize in interface IFingerprinter
- Returns:
- the size of the fingerprint