Class Fingerprinter

  • All Implemented Interfaces:
    IFingerprinter
    Direct Known Subclasses:
    GraphOnlyFingerprinter, HybridizationFingerprinter

    public class Fingerprinter
    extends AbstractFingerprinter
    implements IFingerprinter
    Generates a fingerprint for a given AtomContainer. Fingerprints are one-dimensional bit arrays, where bits are set according to the occurrence of a particular structural feature (see, for example, the Daylight theory manual for more information). Fingerprints allow for a fast screening step to exclude candidates for a substructure search in a database. They are also a means for determining the similarity of chemical structures.

    A fingerprint is generated for an AtomContainer with this code:

       Molecule molecule = new Molecule();
       IFingerprinter fingerprinter = new Fingerprinter();
       IBitFingerprint fingerprint = fingerprinter.getBitFingerprint(molecule);
       fingerprint.size(); // returns 1024 by default
       fingerprint.length(); // returns the highest set bit
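
    The two uses named in the class description, screening and similarity, both reduce to plain bit-set operations once fingerprints exist. The following is a minimal sketch on java.util.BitSet, not part of the CDK API; the class and method names are hypothetical, and it assumes the bits of an IBitFingerprint have already been copied into a BitSet (for example via an asBitSet() accessor, if your CDK version provides one).

```java
import java.util.BitSet;

// Hypothetical helper class for illustration only; not part of CDK.
class FingerprintCompare {

    /** Screening test: true if every bit set in the query is also set in the target. */
    static boolean mayContain(BitSet target, BitSet query) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(target); // bits the query requires but the target lacks
        return missing.isEmpty();
    }

    /** Tanimoto coefficient: number of shared bits divided by bits set in either fingerprint. */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet both = (BitSet) a.clone();
        both.and(b);
        BitSet either = (BitSet) a.clone();
        either.or(b);
        if (either.isEmpty()) {
            return 1.0; // convention chosen for this sketch: two empty fingerprints count as identical
        }
        return (double) both.cardinality() / either.cardinality();
    }
}
```

    A target that fails mayContain can be discarded without running the substructure search; a target that passes still needs the full search, because a passing screen is only a necessary condition.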
     

    The Fingerprinter assumes that hydrogens are explicitly given! Furthermore, if pseudo atoms or atoms with malformed symbols are present, their atomic number is taken as one more than the last element currently supported in PeriodicTable.

    Warning: The aromaticity detection for this Fingerprinter relies on AllRingsFinder, which is known to take very long for some molecules with many cycles or special cyclic topologies. Thus, the AllRingsFinder has a built-in timeout of 5 seconds, after which it aborts and throws an Exception. If you want your fingerprint generated at any expense, you need to create your own AllRingsFinder, set the timeout to a higher value, and pass it to getBitFingerprint(IAtomContainer, AllRingsFinder). In the vast majority of cases, however, the defaults will be fine.

    Another warning: The Daylight manual says: "Fingerprints are not so definite: if a fingerprint indicates a pattern is missing then it certainly is, but it can only indicate a pattern's presence with some probability." In the case of very small molecules, the probability that you get the same fingerprint for different molecules is high.
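
    The asymmetry in the quotation above follows from folding many features into a fixed number of bits. The sketch below illustrates the idea with plain strings standing in for structural features; the class and method names are hypothetical, and the folding rule (String.hashCode reduced modulo the size) is a simplification chosen for this example, not the hashing scheme CDK uses.

```java
import java.util.BitSet;

// Hypothetical illustration of a folded (hashed) fingerprint; not the CDK algorithm.
class FoldedFingerprintSketch {

    /** Sets one bit per feature string by folding its hash code into the range [0, size). */
    static BitSet fold(String[] features, int size) {
        BitSet bits = new BitSet(size);
        for (String feature : features) {
            bits.set(Math.floorMod(feature.hashCode(), size));
        }
        return bits;
    }

    /** A clear bit proves the feature is absent; a set bit only suggests it is present. */
    static boolean mightContain(BitSet fingerprint, String feature, int size) {
        return fingerprint.get(Math.floorMod(feature.hashCode(), size));
    }
}
```

    Every feature used to build a fingerprint is guaranteed to test positive, so there are no false negatives. Different features can land on the same bit, however, so a positive answer may be a collision, and the smaller the bit array, the more often that happens.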

    Author:
    steinbeck
    Source code:
    main
    Belongs to CDK module:
    standard
    Keywords:
    fingerprint, similarity
    Created on:
    2002-02-24
    • Field Detail

      • DEFAULT_SIZE

        public static final int DEFAULT_SIZE
        The default length of created fingerprints.
        See Also:
        Constant Field Values
      • DEFAULT_SEARCH_DEPTH

        public static final int DEFAULT_SEARCH_DEPTH
        The default search depth used to create the fingerprints.
        See Also:
        Constant Field Values
    • Constructor Detail

      • Fingerprinter

        public Fingerprinter()
        Creates a fingerprint generator of length DEFAULT_SIZE and with a search depth of DEFAULT_SEARCH_DEPTH.
      • Fingerprinter

        public Fingerprinter(int size)
        Creates a fingerprint generator of the given size and with a search depth of DEFAULT_SEARCH_DEPTH.
      • Fingerprinter

        public Fingerprinter​(int size,
                             int searchDepth)
        Constructs a fingerprint generator that creates fingerprints of the given size, using a generation algorithm with the given search depth.
        Parameters:
        size - The desired size of the fingerprint
        searchDepth - The desired depth of search (number of bonds)
    • Method Detail

      • getBitFingerprint

        public IBitFingerprint getBitFingerprint​(IAtomContainer container,
                                                 AllRingsFinder ringFinder)
                                          throws CDKException
        Generates a fingerprint of the default size for the given AtomContainer, using the given AllRingsFinder for ring perception.
        Parameters:
        container - The AtomContainer for which a Fingerprint is generated
        ringFinder - An instance of AllRingsFinder
        Returns:
        An IBitFingerprint representing the fingerprint
        Throws:
        CDKException - if there is a timeout in ring or aromaticity perception
      • getBitFingerprint

        public IBitFingerprint getBitFingerprint​(IAtomContainer container)
                                          throws CDKException
        Generates a fingerprint of the default size for the given AtomContainer.
        Specified by:
        getBitFingerprint in interface IFingerprinter
        Parameters:
        container - The AtomContainer for which a Fingerprint is generated
        Returns:
        the bit fingerprint
        Throws:
        CDKException - may be thrown if there is an error during aromaticity detection or (for key based fingerprints) if there is a SMARTS parsing error
      • getRawFingerprint

        public Map<String,​Integer> getRawFingerprint​(IAtomContainer container)
                                                    throws CDKException
        Returns the raw representation of the fingerprint for the given IAtomContainer. The raw representation contains counts as well as the key strings.
        Specified by:
        getRawFingerprint in interface IFingerprinter
        Parameters:
        container - IAtomContainer for which the fingerprint should be calculated.
        Returns:
        the raw fingerprint
        Throws:
        CDKException
      • findPathes

        @Deprecated
        protected int[] findPathes​(IAtomContainer container,
                                   int searchDepth)
                            throws CDKException
        Gets all paths of length 0 up to the specified length. This method finds all paths of up to searchDepth bonds starting from each atom in the molecule and returns the unique set of such paths.
        Parameters:
        container - The molecule to search
        searchDepth - The maximum path length desired
        Returns:
        An array of hash codes, one for each unique path
        Throws:
        CDKException
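
        The path search described above can be sketched as a depth-first walk from every atom. This is a simplified illustration under stated assumptions, not the CDK implementation: the molecule is a hypothetical array of element symbols plus a bond table of atom index pairs, every bond is written as "-", and a path is kept in whichever reading direction sorts first so that each path is counted once.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of linear path enumeration; not the CDK algorithm.
class PathEnumerationSketch {

    /** Returns the unique linear paths of 0..maxBonds bonds, written as element symbols. */
    static Set<String> findPaths(String[] symbols, int[][] bonds, int maxBonds) {
        // Build an adjacency list from the bond table.
        List<List<Integer>> neighbours = new ArrayList<>();
        for (int i = 0; i < symbols.length; i++) {
            neighbours.add(new ArrayList<>());
        }
        for (int[] bond : bonds) {
            neighbours.get(bond[0]).add(bond[1]);
            neighbours.get(bond[1]).add(bond[0]);
        }
        Set<String> paths = new TreeSet<>();
        boolean[] visited = new boolean[symbols.length];
        for (int start = 0; start < symbols.length; start++) {
            walk(start, symbols, neighbours, visited, new ArrayList<>(), maxBonds, paths);
        }
        return paths;
    }

    // Extends the current path by one atom, records it, and recurses while bonds remain.
    private static void walk(int atom, String[] symbols, List<List<Integer>> neighbours,
                             boolean[] visited, List<String> current, int bondsLeft,
                             Set<String> paths) {
        visited[atom] = true;
        current.add(symbols[atom]);
        paths.add(canonical(current));
        if (bondsLeft > 0) {
            for (int next : neighbours.get(atom)) {
                if (!visited[next]) {
                    walk(next, symbols, neighbours, visited, current, bondsLeft - 1, paths);
                }
            }
        }
        // Backtrack so the atom can appear in other paths.
        current.remove(current.size() - 1);
        visited[atom] = false;
    }

    // Picks the lexicographically smaller of the forward and reversed spelling.
    private static String canonical(List<String> path) {
        String forward = String.join("-", path);
        List<String> reversed = new ArrayList<>(path);
        Collections.reverse(reversed);
        String backward = String.join("-", reversed);
        return forward.compareTo(backward) <= 0 ? forward : backward;
    }
}
```

        For the heavy atoms of ethanol, symbols {"C", "C", "O"} with bonds {0,1} and {1,2}, a search depth of 2 yields the five paths C, O, C-C, C-O and C-C-O. Each such path string would then be hashed to select a bit in the fingerprint.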
      • getBondSymbol

        protected String getBondSymbol​(IBond bond)
        Returns the symbol that represents the given bond in a path string.
        Parameters:
        bond - the bond for which a symbol is requested
        Returns:
        the symbol for the bond
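
        As a rough illustration of what such a mapping could look like, the sketch below assigns SMILES-like symbols to bond types. The enum and method are hypothetical stand-ins rather than the CDK IBond API, and the specific symbols are an assumption made for this example.

```java
// Hypothetical illustration of a bond-to-symbol mapping; not the CDK IBond API.
class BondSymbolSketch {

    /** Hypothetical bond categories used only in this sketch. */
    enum BondKind { SINGLE, DOUBLE, TRIPLE, AROMATIC }

    /** Returns a SMILES-like symbol for the bond kind (assumed symbols). */
    static String symbolFor(BondKind kind) {
        switch (kind) {
            case AROMATIC: return ":";
            case SINGLE:   return "-";
            case DOUBLE:   return "=";
            case TRIPLE:   return "#";
            default:       return "";
        }
    }
}
```

        Including the bond symbol in each path string lets otherwise identical atom sequences, such as C-O and C=O, map to different bits.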
      • setPathLimit

        public void setPathLimit​(int limit)
      • setHashPseudoAtoms

        public void setHashPseudoAtoms​(boolean value)
      • getSearchDepth

        public int getSearchDepth()
      • getSize

        public int getSize()
        Description copied from interface: IFingerprinter
        Returns the size (or length) of the fingerprint.
        Specified by:
        getSize in interface IFingerprinter
        Returns:
        the size of the fingerprint