Class Fingerprinter

java.lang.Object
org.openscience.cdk.fingerprint.AbstractFingerprinter
org.openscience.cdk.fingerprint.Fingerprinter
All Implemented Interfaces:
IFingerprinter
Direct Known Subclasses:
GraphOnlyFingerprinter, HybridizationFingerprinter

public class Fingerprinter extends AbstractFingerprinter implements IFingerprinter
Generates a fingerprint for a given AtomContainer. Fingerprints are one-dimensional bit arrays in which bits are set according to the occurrence of a particular structural feature (see, for example, the Daylight Inc. theory manual for more information). Fingerprints allow for a fast screening step that excludes candidates from a substructure search in a database. They are also a means of determining the similarity of chemical structures.
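The two uses named above, screening and similarity, can be sketched on plain java.util.BitSet values. This is a minimal illustration, not CDK code: the class and method names are hypothetical, the bit positions are made up, and the Tanimoto coefficient is used here as one common similarity measure, not necessarily the one a given application uses.

```java
import java.util.BitSet;

class FingerprintScreenSketch {

    /** True if every bit set in the query is also set in the candidate. */
    static boolean mayContain(BitSet candidate, BitSet query) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(candidate); // bits the query needs but the candidate lacks
        return missing.isEmpty();
    }

    /** Tanimoto coefficient: bits set in both, divided by bits set in either. */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet both = (BitSet) a.clone();
        both.and(b);
        BitSet either = (BitSet) a.clone();
        either.or(b);
        int union = either.cardinality();
        // Two empty fingerprints are treated as identical (a convention chosen for this sketch).
        return union == 0 ? 1.0 : (double) both.cardinality() / union;
    }

    /** Helper that builds a bit set from explicit (made-up) bit positions. */
    static BitSet bits(int... positions) {
        BitSet set = new BitSet(1024);
        for (int p : positions) {
            set.set(p);
        }
        return set;
    }

    public static void main(String[] args) {
        BitSet query = bits(3, 17);
        BitSet candidateA = bits(3, 17, 256); // survives the screen
        BitSet candidateB = bits(3, 500);     // rejected: bit 17 is missing
        System.out.println(mayContain(candidateA, query));
        System.out.println(mayContain(candidateB, query));
        System.out.println(tanimoto(query, candidateA));
    }
}
```

A candidate that passes the screen still has to be checked by a real substructure match, because the screen can only rule candidates out.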

A fingerprint is generated for an AtomContainer with this code:

   Molecule molecule = new Molecule();
   IFingerprinter fingerprinter = new Fingerprinter();
   IBitFingerprint fingerprint = fingerprinter.getBitFingerprint(molecule);
   fingerprint.size(); // returns 1024 by default
   fingerprint.length(); // returns the highest set bit
 

The Fingerprinter assumes that hydrogens are explicitly given! Furthermore, if pseudo atoms or atoms with malformed symbols are present, their atomic number is taken as one more than the last element currently supported in PeriodicTable.

Warning: The aromaticity detection for this Fingerprinter relies on AllRingsFinder, which is known to take very long for some molecules with many cycles or special cyclic topologies. Thus, the AllRingsFinder has a built-in timeout of 5 seconds, after which it aborts and throws an exception. If you want your fingerprint generated at any expense, you need to create your own AllRingsFinder, set the timeout to a higher value, and pass it to getBitFingerprint(IAtomContainer, AllRingsFinder). In the vast majority of cases, however, the defaults will be fine.
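The timeout behaviour described in the warning can be pictured as a deadline check inside an expensive enumeration. The sketch below is not the AllRingsFinder implementation: the class, the exception type, and the toy search (counting vertex sequences in a complete graph) are all hypothetical stand-ins that only show how a time budget turns a long-running search into an exception.

```java
import java.util.concurrent.TimeUnit;

class TimeoutSearchSketch {

    /** Hypothetical checked exception standing in for the one thrown on timeout. */
    static class SearchTimeoutException extends Exception {
        SearchTimeoutException(String message) {
            super(message);
        }
    }

    /**
     * Counts all non-empty sequences of distinct vertices in a complete graph
     * on n vertices, aborting once the time budget is used up.
     */
    static long countPaths(int n, long timeoutMillis) throws SearchTimeoutException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        return extend(new boolean[n], n, deadline);
    }

    private static long extend(boolean[] visited, int n, long deadline)
            throws SearchTimeoutException {
        // Check the deadline on every step so a runaway search stops promptly.
        if (System.nanoTime() - deadline >= 0) {
            throw new SearchTimeoutException("search exceeded its time budget");
        }
        long count = 0;
        for (int next = 0; next < n; next++) {
            if (visited[next]) {
                continue;
            }
            visited[next] = true;
            count += 1 + extend(visited, n, deadline); // this path plus all its extensions
            visited[next] = false;
        }
        return count;
    }

    public static void main(String[] args) throws SearchTimeoutException {
        System.out.println(countPaths(3, 5000)); // small input, finishes well within budget
        try {
            countPaths(12, 0); // zero budget: aborts on the first check
        } catch (SearchTimeoutException e) {
            System.out.println("aborted: " + e.getMessage());
        }
    }
}
```

Raising the budget only trades waiting time for completeness, which is why the text recommends it only when a fingerprint is needed at any expense.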

Another warning: The Daylight manual says: "Fingerprints are not so definite: if a fingerprint indicates a pattern is missing then it certainly is, but it can only indicate a pattern's presence with some probability." In the case of very small molecules, the probability that you get the same fingerprint for different molecules is high.
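The one-sided guarantee in the quotation follows from folding many patterns onto a fixed number of bits. The sketch below assumes a deliberately simple scheme (one bit per pattern string, chosen by String.hashCode modulo the size); the hashing actually used by this class may differ, and all names are hypothetical.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

class FoldedFingerprintSketch {

    /** Hashes each pattern string to one bit position in a fingerprint of the given size. */
    static BitSet fold(List<String> patterns, int size) {
        BitSet fp = new BitSet(size);
        for (String pattern : patterns) {
            fp.set(Math.floorMod(pattern.hashCode(), size));
        }
        return fp;
    }

    /** False means the pattern is certainly absent; true only means it may be present. */
    static boolean mightContain(BitSet fp, String pattern, int size) {
        return fp.get(Math.floorMod(pattern.hashCode(), size));
    }

    public static void main(String[] args) {
        BitSet fp = fold(Arrays.asList("C", "C-C", "C-O"), 1024);
        // A pattern that was encoded is always reported: there are no false negatives.
        System.out.println(mightContain(fp, "C-O", 1024));
        // In the extreme case of a single bit, every pattern collides,
        // so different inputs end up with the same fingerprint.
        System.out.println(fold(Arrays.asList("C"), 1).equals(fold(Arrays.asList("N"), 1)));
    }
}
```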

Author:
steinbeck
Source code:
main
Belongs to CDK module:
standard
Keywords:
fingerprint, similarity
Created on:
2002-02-24
  • Field Details

    • DEFAULT_SIZE

      public static final int DEFAULT_SIZE
      The default length of created fingerprints.
    • DEFAULT_SEARCH_DEPTH

      public static final int DEFAULT_SEARCH_DEPTH
      The default search depth used to create the fingerprints.
  • Constructor Details

    • Fingerprinter

      public Fingerprinter()
      Creates a fingerprint generator of length DEFAULT_SIZE and with a search depth of DEFAULT_SEARCH_DEPTH.
    • Fingerprinter

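      public Fingerprinter(int size)
      Constructs a fingerprint generator that creates fingerprints of the given size, with a search depth of DEFAULT_SEARCH_DEPTH.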
    • Fingerprinter

      public Fingerprinter(int size, int searchDepth)
      Constructs a fingerprint generator that creates fingerprints of the given size, using a generation algorithm with the given search depth.
      Parameters:
      size - The desired size of the fingerprint
      searchDepth - The desired depth of search (number of bonds)
  • Method Details

    • getParameters

      protected List<Map.Entry<String,String>> getParameters()
      Description copied from class: AbstractFingerprinter
      Subclasses should override this method to report the parameters they are configured with.
      Overrides:
      getParameters in class AbstractFingerprinter
      Returns:
      The key=value pairs of configured parameters
    • getBitFingerprint

      public IBitFingerprint getBitFingerprint(IAtomContainer container, AllRingsFinder ringFinder) throws CDKException
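      Generates a fingerprint of the configured size (DEFAULT_SIZE unless another size was given to the constructor) for the given AtomContainer, using the supplied AllRingsFinder.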
      Parameters:
      container - The AtomContainer for which a Fingerprint is generated
      ringFinder - An instance of AllRingsFinder
      Returns:
      An IBitFingerprint representing the fingerprint
      Throws:
      CDKException - if there is a timeout in ring or aromaticity perception
    • getBitFingerprint

      public IBitFingerprint getBitFingerprint(IAtomContainer container) throws CDKException
      Generates a fingerprint of the configured size (DEFAULT_SIZE unless another size was given to the constructor) for the given AtomContainer.
      Specified by:
      getBitFingerprint in interface IFingerprinter
      Parameters:
      container - The AtomContainer for which a Fingerprint is generated
      Returns:
      the bit fingerprint
      Throws:
      CDKException - may be thrown if there is an error during aromaticity detection or (for key based fingerprints) if there is a SMARTS parsing error
    • getRawFingerprint

      public Map<String,Integer> getRawFingerprint(IAtomContainer container) throws CDKException
      Returns the raw representation of the fingerprint for the given IAtomContainer. The raw representation contains counts as well as the key strings.
      Specified by:
      getRawFingerprint in interface IFingerprinter
      Parameters:
      container - IAtomContainer for which the fingerprint should be calculated.
      Returns:
      the raw fingerprint
      Throws:
      CDKException
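The difference between a raw fingerprint (key strings with counts) and a bit fingerprint can be sketched as follows. The key strings and the hashing step are assumptions made for illustration and are not taken from this class; the names are hypothetical.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RawCountSketch {

    /** Counts how often each key string occurs. */
    static Map<String, Integer> rawCounts(List<String> keys) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String key : keys) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    /** Collapses raw counts to bits: the counts are lost, only presence remains. */
    static BitSet toBits(Map<String, Integer> raw, int size) {
        BitSet bits = new BitSet(size);
        for (String key : raw.keySet()) {
            bits.set(Math.floorMod(key.hashCode(), size));
        }
        return bits;
    }

    public static void main(String[] args) {
        Map<String, Integer> raw = rawCounts(Arrays.asList("C-C", "C-O", "C-C"));
        System.out.println(raw); // prints {C-C=2, C-O=1}
        System.out.println(toBits(raw, 1024).cardinality()); // at most one bit per distinct key
    }
}
```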
    • getCountFingerprint

      public ICountFingerprint getCountFingerprint(IAtomContainer container) throws CDKException
      Description copied from interface: IFingerprinter
      Returns the count fingerprint for the given IAtomContainer.
      Specified by:
      getCountFingerprint in interface IFingerprinter
      Parameters:
      container - IAtomContainer for which the fingerprint should be calculated.
      Returns:
      the count fingerprint
      Throws:
      CDKException - if there is an error during aromaticity detection or (for key based fingerprints) if there is a SMARTS parsing error.
    • findPathes

      @Deprecated protected int[] findPathes(IAtomContainer container, int searchDepth) throws CDKException
      Gets all paths of length 0 up to the specified search depth. This method finds all paths of up to N bonds starting from each atom in the molecule and returns the unique set of such paths.
      Parameters:
      container - The molecule to search
      searchDepth - The maximum path length desired
      Returns:
      An array of integer hash codes, one for each unique path found
      Throws:
      CDKException
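The enumeration described for findPathes can be sketched as a depth-first walk from every atom. This is a simplified illustration under stated assumptions, not the method's implementation: the molecule is a hypothetical adjacency list, every bond is written as "-", hydrogens are left out for brevity, a path and its reverse are treated as the same path, and the result is kept as strings instead of being hashed to int values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class PathEnumerationSketch {

    /** Enumerates simple paths of at most maxBonds bonds, starting from every atom. */
    static Set<String> uniquePaths(String[] symbols, int[][] neighbours, int maxBonds) {
        Set<String> unique = new TreeSet<>();
        for (int start = 0; start < symbols.length; start++) {
            List<Integer> path = new ArrayList<>();
            path.add(start);
            walk(path, symbols, neighbours, maxBonds, unique);
        }
        return unique;
    }

    private static void walk(List<Integer> path, String[] symbols, int[][] neighbours,
                             int maxBonds, Set<String> unique) {
        unique.add(canonical(path, symbols));
        if (path.size() - 1 == maxBonds) {
            return; // depth limit reached
        }
        int last = path.get(path.size() - 1);
        for (int next : neighbours[last]) {
            if (path.contains(next)) {
                continue; // simple paths only: never revisit an atom
            }
            path.add(next);
            walk(path, symbols, neighbours, maxBonds, unique);
            path.remove(path.size() - 1); // backtrack
        }
    }

    /** Writes the path in both directions and keeps the lexicographically smaller string. */
    private static String canonical(List<Integer> path, String[] symbols) {
        StringBuilder forward = new StringBuilder();
        StringBuilder backward = new StringBuilder();
        for (int i = 0; i < path.size(); i++) {
            if (i > 0) {
                forward.append('-');
                backward.append('-');
            }
            forward.append(symbols[path.get(i)]);
            backward.append(symbols[path.get(path.size() - 1 - i)]);
        }
        String f = forward.toString();
        String b = backward.toString();
        return f.compareTo(b) <= 0 ? f : b;
    }

    public static void main(String[] args) {
        // Heavy-atom skeleton of ethanol: C(0)-C(1)-O(2)
        String[] symbols = {"C", "C", "O"};
        int[][] neighbours = {{1}, {0, 2}, {1}};
        System.out.println(uniquePaths(symbols, neighbours, 2)); // prints [C, C-C, C-C-O, C-O, O]
    }
}
```

The number of paths grows quickly with the search depth and with ring-rich topologies, which is why the depth is a constructor parameter.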
    • encodePaths

      protected void encodePaths(IAtomContainer mol, int depth, BitSet fp, int size) throws CDKException
      Throws:
      CDKException
    • getBondSymbol

      protected String getBondSymbol(IBond bond)
      Returns the symbol used to represent the given bond in a path string.
      Parameters:
      bond - the bond for which the symbol is requested
      Returns:
      the symbol for the bond
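A bond-to-symbol mapping of this kind might look like the sketch below. The symbols follow the common SMILES-like convention ("-", "=", "#", ":"), which is an assumption and not confirmed by this page; BondKind is a hypothetical stand-in for the information an IBond carries.

```java
class BondSymbolSketch {

    /** Hypothetical stand-in for the bond order and aromaticity of a real bond. */
    enum BondKind { SINGLE, DOUBLE, TRIPLE, AROMATIC }

    /** Returns a one-character symbol for the bond, for use when writing a path as a string. */
    static String bondSymbol(BondKind kind) {
        switch (kind) {
            case SINGLE:
                return "-";
            case DOUBLE:
                return "=";
            case TRIPLE:
                return "#";
            case AROMATIC:
                return ":";
            default:
                throw new IllegalArgumentException("unknown bond kind: " + kind);
        }
    }

    public static void main(String[] args) {
        // Heavy-atom path of acetaldehyde written with bond symbols.
        String path = "C" + bondSymbol(BondKind.SINGLE) + "C" + bondSymbol(BondKind.DOUBLE) + "O";
        System.out.println(path); // prints C-C=O
    }
}
```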
    • setPathLimit

      public void setPathLimit(int limit)
    • setHashPseudoAtoms

      public void setHashPseudoAtoms(boolean value)
    • getSearchDepth

      public int getSearchDepth()
    • getSize

      public int getSize()
      Description copied from interface: IFingerprinter
      Returns the size (or length) of the fingerprint.
      Specified by:
      getSize in interface IFingerprinter
      Returns:
      the size of the fingerprint