Class Fingerprinter

  • All Implemented Interfaces:
    IFingerprinter
    Direct Known Subclasses:
    GraphOnlyFingerprinter, HybridizationFingerprinter

    public class Fingerprinter
    extends AbstractFingerprinter
    implements IFingerprinter
    Generates a fingerprint for a given AtomContainer. Fingerprints are one-dimensional bit arrays in which bits are set according to the occurrence of particular structural features (see, for example, the Daylight Inc. theory manual for more information). Fingerprints allow a fast screening step that excludes candidates from a substructure search in a database. They are also a means of determining the similarity of chemical structures.
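The two uses named above, screening and similarity, can be sketched on plain java.util.BitSet objects. This is a minimal illustration, not CDK code: the class FingerprintCompare and its methods mayContain and tanimoto are hypothetical names, and the Tanimoto coefficient is used here as one common similarity measure.

```java
import java.util.BitSet;

/** Sketch: fingerprint screening and Tanimoto similarity on plain BitSets (hypothetical helper, not CDK API). */
class FingerprintCompare {

    /** Returns true if every bit set in the query is also set in the target, i.e. the screen passes. */
    static boolean mayContain(BitSet target, BitSet query) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(target);              // bits the query needs but the target lacks
        return missing.isEmpty();
    }

    /** Tanimoto coefficient |A and B| / |A or B|; two empty sets give 1.0 (a convention chosen for this sketch). */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone();
        common.and(b);
        BitSet union = (BitSet) a.clone();
        union.or(b);
        return union.isEmpty() ? 1.0 : (double) common.cardinality() / union.cardinality();
    }

    public static void main(String[] args) {
        BitSet target = new BitSet(1024);
        target.set(3); target.set(17); target.set(256); target.set(900);
        BitSet query = new BitSet(1024);
        query.set(17); query.set(256);

        System.out.println(mayContain(target, query)); // true: the target stays a candidate
        System.out.println(tanimoto(target, query));   // 0.5: 2 shared bits out of 4 in the union
    }
}
```

A target that fails the screen can be discarded without running the expensive substructure match; a target that passes still has to be checked.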

    A fingerprint is generated for an AtomContainer with this code:

       Molecule molecule = new Molecule();
       IFingerprinter fingerprinter = new Fingerprinter();
       IBitFingerprint fingerprint = fingerprinter.getBitFingerprint(molecule);
       fingerprint.size(); // returns 1024 by default
       fingerprint.length(); // returns the highest set bit
     

    The Fingerprinter can optionally include explicit hydrogens (setHashExplicitHydrogens(boolean)) and pseudo atoms (setHashPseudoAtoms(boolean)). Both are ignored by default, which ensures the fingerprint can be used for substructure screening.

    Warning: The aromaticity detection for this Fingerprinter relies on AllRingsFinder, which is known to take very long for some molecules with many cycles or special cyclic topologies. For this reason, the AllRingsFinder has a built-in timeout of 5 seconds, after which it aborts and throws an exception. If you want your fingerprint generated at any expense, create your own AllRingsFinder, set its timeout to a higher value, and pass it to getBitFingerprint(IAtomContainer, AllRingsFinder). In the vast majority of cases, however, the defaults will be fine.
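The timeout behaviour described in this warning follows a general pattern: an expensive search checks a deadline as it runs and aborts with an exception once the budget is spent. The sketch below shows that pattern in isolation. It is not the AllRingsFinder implementation; TimeBoundedSearch, SearchTimeoutException and run are hypothetical names, and the counting loop merely stands in for a ring search.

```java
/** Sketch: a search that aborts with an exception once its time budget is exhausted (hypothetical, not CDK code). */
class TimeBoundedSearch {

    /** Hypothetical exception type for this sketch. */
    static class SearchTimeoutException extends Exception {
        SearchTimeoutException(String message) {
            super(message);
        }
    }

    private final long timeoutMillis;

    TimeBoundedSearch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Performs 'steps' units of stand-in work, checking the deadline before each one. */
    long run(long steps) throws SearchTimeoutException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        long done = 0;
        for (long i = 0; i < steps; i++) {
            if (System.nanoTime() - deadline > 0) {
                throw new SearchTimeoutException("aborted after " + timeoutMillis + " ms");
            }
            done++;
        }
        return done;
    }

    public static void main(String[] args) throws SearchTimeoutException {
        try {
            // Far too much work for a 1 ms budget: the search gives up.
            new TimeBoundedSearch(1).run(Long.MAX_VALUE);
        } catch (SearchTimeoutException e) {
            System.out.println("timed out: " + e.getMessage());
        }
        // A caller who can afford to wait supplies a searcher with a larger budget.
        System.out.println(new TimeBoundedSearch(60_000).run(1_000_000));
    }
}
```

Raising the budget trades responsiveness for completeness, which is why the default is kept short.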

    Another warning: The Daylight manual says: "Fingerprints are not so definite: if a fingerprint indicates a pattern is missing then it certainly is, but it can only indicate a pattern's presence with some probability." For very small molecules, the probability that different molecules yield the same fingerprint is high.
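The asymmetry in the quotation comes from hashing many features into a fixed number of bits. The sketch below illustrates it with a deliberately tiny fingerprint. The hashing scheme (String.hashCode reduced modulo the size) is an assumption made for this example and is not claimed to be the one this class uses; HashedFingerprintDemo, fingerprint and containsAllBits are hypothetical names.

```java
import java.util.BitSet;
import java.util.List;

/** Sketch: why hashed fingerprints give no false negatives but can give false positives (not CDK code). */
class HashedFingerprintDemo {

    /** Sets one bit per path string; the modulo hashing is an assumption of this sketch. */
    static BitSet fingerprint(List<String> paths, int size) {
        BitSet bits = new BitSet(size);
        for (String path : paths) {
            bits.set(Math.floorMod(path.hashCode(), size));
        }
        return bits;
    }

    /** Returns true if every bit of the query is also set in the target. */
    static boolean containsAllBits(BitSet target, BitSet query) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(target);
        return missing.isEmpty();
    }

    public static void main(String[] args) {
        List<String> targetPaths = List.of("C", "O", "N", "CC", "CO", "CN", "CCO", "CCN", "OCN");
        List<String> queryPaths = List.of("C", "O", "CO"); // all of these occur in the target
        int size = 8;                                      // deliberately tiny

        BitSet target = fingerprint(targetPaths, size);
        BitSet query = fingerprint(queryPaths, size);

        // Nine distinct paths cannot occupy more than eight bits, so at least two share a bit.
        System.out.println("paths: " + targetPaths.size() + ", bits set: " + target.cardinality());
        // Paths that really are present always set their bits, so this screen cannot fail.
        System.out.println("query passes screen: " + containsAllBits(target, query));
    }
}
```

Because distinct paths can share a bit, a set bit only suggests that a pattern is present, whereas a cleared bit proves that every pattern hashing to it is absent.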

    Author:
    steinbeck
    Source code:
    main
    Belongs to CDK module:
    standard
    Keywords:
    fingerprint, similarity
    Created on:
    2002-02-24
    • Field Detail

      • DEFAULT_SIZE

        public static final int DEFAULT_SIZE
        The default length of created fingerprints.
        See Also:
        Constant Field Values
      • DEFAULT_SEARCH_DEPTH

        public static final int DEFAULT_SEARCH_DEPTH
        The default search depth used to create the fingerprints.
        See Also:
        Constant Field Values
    • Constructor Detail

      • Fingerprinter

        public Fingerprinter()
        Creates a fingerprint generator of length DEFAULT_SIZE and with a search depth of DEFAULT_SEARCH_DEPTH.
      • Fingerprinter

        public Fingerprinter​(int size)
      • Fingerprinter

        public Fingerprinter​(int size,
                             int searchDepth)
        Constructs a fingerprint generator that creates fingerprints of the given size, using a generation algorithm with the given search depth.
        Parameters:
        size - The desired size of the fingerprint
        searchDepth - The desired depth of search (number of bonds)
    • Method Detail

      • getBitFingerprint

        public IBitFingerprint getBitFingerprint​(IAtomContainer container,
                                                 AllRingsFinder ringFinder)
                                          throws CDKException
        Generates a fingerprint of the default size for the given AtomContainer.
        Parameters:
        container - The AtomContainer for which a Fingerprint is generated
        ringFinder - An instance of AllRingsFinder
        Returns:
        An IBitFingerprint representing the fingerprint
        Throws:
        CDKException - if there is a timeout in ring or aromaticity perception
      • getBitFingerprint

        public IBitFingerprint getBitFingerprint​(IAtomContainer container)
                                          throws CDKException
        Generates a fingerprint of the default size for the given AtomContainer.
        Specified by:
        getBitFingerprint in interface IFingerprinter
        Parameters:
        container - The AtomContainer for which a Fingerprint is generated
        Returns:
        the bit fingerprint
        Throws:
        CDKException - may be thrown if there is an error during aromaticity detection or (for key based fingerprints) if there is a SMARTS parsing error
      • getRawFingerprint

        public Map<String,​Integer> getRawFingerprint​(IAtomContainer container)
                                                    throws CDKException
        Returns the raw representation of the fingerprint for the given IAtomContainer. The raw representation contains counts as well as the key strings.
        Specified by:
        getRawFingerprint in interface IFingerprinter
        Parameters:
        container - IAtomContainer for which the fingerprint should be calculated.
        Returns:
        the raw fingerprint
        Throws:
        CDKException
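The shape of a raw fingerprint, a map from key string to count, can be illustrated without CDK. In this sketch the key strings are made up for the example and do not reflect the keys this class actually produces; RawFingerprintSketch and countKeys are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: building a key-to-count map of the kind a raw fingerprint represents (not CDK code). */
class RawFingerprintSketch {

    /** Counts how often each key occurs; a TreeMap keeps the keys in sorted order for stable output. */
    static Map<String, Integer> countKeys(List<String> keys) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String key : keys) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up keys: "CC" occurs twice, the others once.
        System.out.println(countKeys(List.of("CC", "CO", "CC", "C"))); // {C=1, CC=2, CO=1}
    }
}
```

Unlike a bit fingerprint, such a map keeps both the identity of each feature and how often it occurs.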
      • findPathes

        @Deprecated
        protected int[] findPathes​(IAtomContainer container,
                                   int searchDepth)
                            throws CDKException
        Gets all paths of length 0 up to the specified search depth. The method finds all paths of up to searchDepth bonds starting from each atom in the molecule and returns the unique set of such paths.
        Parameters:
        container - The molecule to search
        searchDepth - The maximum path length desired
        Returns:
        An int array representing the unique paths found
        Throws:
        CDKException
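The enumeration described for findPathes can be sketched as a depth-first walk over a small molecular graph. This is an illustration under simplifying assumptions, not the CDK implementation: atoms are given as symbols with adjacency lists, bond orders are ignored, a path and its reverse are treated as the same path, and the result is returned as strings rather than the int[] of the real method. PathEnumerationSketch and uniquePaths are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Sketch: enumerating the unique simple paths of 0..maxBonds bonds from every atom (not CDK code). */
class PathEnumerationSketch {

    /** Returns the unique paths, written as atom symbols joined by '-'. */
    static Set<String> uniquePaths(String[] symbols, int[][] neighbors, int maxBonds) {
        Set<String> paths = new TreeSet<>();
        boolean[] visited = new boolean[symbols.length];
        for (int start = 0; start < symbols.length; start++) {
            walk(start, symbols, neighbors, maxBonds, visited, new ArrayList<>(), paths);
        }
        return paths;
    }

    /** Extends the current path by one atom, records it, and recurses while bonds remain. */
    private static void walk(int atom, String[] symbols, int[][] neighbors, int bondsLeft,
                             boolean[] visited, List<String> current, Set<String> paths) {
        visited[atom] = true;
        current.add(symbols[atom]);
        paths.add(canonical(current));
        if (bondsLeft > 0) {
            for (int next : neighbors[atom]) {
                if (!visited[next]) { // never revisit an atom, so paths stay simple
                    walk(next, symbols, neighbors, bondsLeft - 1, visited, current, paths);
                }
            }
        }
        current.remove(current.size() - 1);
        visited[atom] = false;
    }

    /** Picks the lexicographically smaller of a path and its reverse, so both directions count once. */
    private static String canonical(List<String> path) {
        String forward = String.join("-", path);
        List<String> reversed = new ArrayList<>(path);
        Collections.reverse(reversed);
        String backward = String.join("-", reversed);
        return forward.compareTo(backward) <= 0 ? forward : backward;
    }

    public static void main(String[] args) {
        // Heavy atoms of ethanol: C0 - C1 - O2
        String[] symbols = {"C", "C", "O"};
        int[][] neighbors = {{1}, {0, 2}, {1}};
        System.out.println(uniquePaths(symbols, neighbors, 2)); // [C, C-C, C-C-O, C-O, O]
    }
}
```

The number of paths grows quickly with the search depth and with ring density, which is one reason a path limit and a modest default depth are useful.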
      • getBondSymbol

        protected String getBondSymbol​(IBond bond)
        Returns the symbol used to represent the given bond.
        Parameters:
        bond - the bond to get a symbol for
        Returns:
        the bond symbol
      • setPathLimit

        public void setPathLimit​(int limit)
      • setHashPseudoAtoms

        public void setHashPseudoAtoms​(boolean value)
        Include pseudo/query atoms in the fingerprint, hashed with atomic number 0. For substructure screening, where path-based fingerprints are most useful, this is generally not wanted.
        Parameters:
        value - the setting (false by default)
      • setHashExplicitHydrogens

        public void setHashExplicitHydrogens​(boolean value)
        Include explicit hydrogen atoms in the fingerprint. With this enabled, the fingerprint differs depending on whether hydrogens are implicit or explicit. For substructure screening, where path-based fingerprints are most useful, this is generally not wanted.
        Parameters:
        value - the setting (false by default)
      • getSearchDepth

        public int getSearchDepth()
      • getSize

        public int getSize()
        Description copied from interface: IFingerprinter
        Returns the size (or length) of the fingerprint.
        Specified by:
        getSize in interface IFingerprinter
        Returns:
        the size of the fingerprint