Class CircularFingerprinter

  • All Implemented Interfaces:
    IFingerprinter

    public class CircularFingerprinter
    extends AbstractFingerprinter
    implements IFingerprinter

    Circular fingerprints: for generating fingerprints that are functionally equivalent to ECFP-2/4/6 and FCFP-2/4/6 fingerprints, which are partially described by Rogers and Hahn [Rogers and Hahn. J. Chem. Inf. Model. 2010. 50].

    While the literature describes the method in detail, it discloses neither the hashing technique used to convert lists of integers into 32-bit codes nor the scheme used to classify atom types for the FCFP class of descriptors. For this reason, the fingerprints created here are not binary compatible with the reference implementation. They do, however, achieve effectively equal performance for modelling purposes.
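    The general growth step that the literature does describe can be sketched on a toy graph: each atom starts from an initial identifier, and on every iteration it is re-hashed together with the sorted identifiers of its neighbours, so that iteration r describes a neighbourhood of radius r. This is a minimal sketch under stated assumptions, not the class's implementation: the initial invariants are supplied by the caller, bond orders and duplicate-substructure removal are ignored, and Arrays.hashCode stands in for the CRC32 hash the class actually uses.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Toy ECFP-style iteration on a hydrogen-suppressed graph.
// Assumptions: caller-supplied initial invariants, bond orders ignored,
// no duplicate-substructure removal, Arrays.hashCode instead of CRC32.
public class ToyCircular {
    // adj[i]: neighbour indices of atom i; init[i]: initial identifier of atom i.
    static Set<Integer> identifiers(int[][] adj, int[] init, int iterations) {
        Set<Integer> ids = new LinkedHashSet<>();
        int[] cur = init.clone();
        for (int id : cur) ids.add(id);
        for (int it = 1; it <= iterations; it++) {
            int[] next = new int[cur.length]; // all atoms update from the same previous round
            for (int a = 0; a < cur.length; a++) {
                int[] nb = new int[adj[a].length];
                for (int k = 0; k < nb.length; k++) nb[k] = cur[adj[a][k]];
                Arrays.sort(nb); // sorting makes the result independent of atom numbering
                int[] seq = new int[nb.length + 2];
                seq[0] = it;     // iteration number
                seq[1] = cur[a]; // the atom's own identifier from the previous round
                System.arraycopy(nb, 0, seq, 2, nb.length);
                next[a] = Arrays.hashCode(seq);
            }
            cur = next;
            for (int id : cur) ids.add(id);
        }
        return ids;
    }
}
```

    Because neighbour identifiers are sorted before hashing, renumbering the atoms of the same molecule yields the same set of identifiers.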

    The resulting fingerprint bits are presented as a list of unique bits, each with a 32-bit hashcode; typically there are no more than a hundred or so unique bit hashcodes per molecule. These identifiers can be folded into a smaller array of bits, such that they can be represented as a single long binary number, which is often more convenient.
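    Folding can be pictured with a minimal sketch, assuming each 32-bit hash is read as an unsigned value and reduced modulo the folded length. This rule is an illustration only; the class's own folding arithmetic may differ.

```java
import java.util.BitSet;

// Folding sketch: map each 32-bit hash onto one of 'length' bits.
// The unsigned-modulo rule is an assumption for illustration.
public class FoldSketch {
    static BitSet fold(int[] hashes, int length) {
        BitSet bits = new BitSet(length);
        for (int h : hashes) {
            long unsigned = h & 0xFFFFFFFFL; // reinterpret as 0 .. 2^32 - 1
            bits.set((int) (unsigned % length));
        }
        return bits;
    }
}
```

    Distinct hashes can land on the same bit (here 1 and 1025 both map to bit 1 for a length of 1024), which is why the folded form loses information compared with the list of unique hashes.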

    Integer hashing uses the CRC32 algorithm, via the Java CRC32 class, with the same formula and parameters as PNG files, as described in:

    http://www.w3.org/TR/PNG/#D-CRCAppendix
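    Hashing a list of integers with the JDK class can be sketched as follows. How the class serializes each integer into bytes is not stated here, so the big-endian packing below is an assumption. The check value 0xCBF43926 for the ASCII string "123456789" is the standard CRC-32 test vector and confirms that the PNG/zlib parameters are in use.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hash a list of integers into a 32-bit code with java.util.zip.CRC32.
// Writing each int as four big-endian bytes is an assumption of this sketch.
public class CrcHash {
    static int hashInts(int[] seq) {
        ByteBuffer buf = ByteBuffer.allocate(4 * seq.length); // big-endian by default
        for (int v : seq) buf.putInt(v);
        CRC32 crc = new CRC32();
        crc.update(buf.array());
        return (int) crc.getValue(); // the low 32 bits, as a signed int
    }

    // Standard CRC-32 check value: the CRC of ASCII "123456789" is 0xCBF43926.
    static long checkValue() {
        CRC32 crc = new CRC32();
        crc.update("123456789".getBytes(StandardCharsets.US_ASCII));
        return crc.getValue();
    }
}
```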

    Both implicit and explicit hydrogens are handled, i.e. the result does not depend on whether the incoming molecule is hydrogen-suppressed.
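    One way to obtain this independence is to treat explicit hydrogens as a per-atom count rather than as graph nodes. The sketch below assumes a hypothetical Atom type and a hypothetical (element, hydrogen count, heavy-atom degree) invariant; it is not the CDK's internal code.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: atom invariants that do not depend on whether hydrogens are explicit.
// Atom and the "symbol/H<count>/D<degree>" invariant are hypothetical, not CDK types.
public class HydrogenSketch {
    static final class Atom {
        final String symbol;
        final int implicitH; // hydrogens stored as a count on the atom
        Atom(String symbol, int implicitH) { this.symbol = symbol; this.implicitH = implicitH; }
    }

    // bonds: pairs of atom indices. Returns one invariant per heavy atom, in input order.
    static List<String> invariants(List<Atom> atoms, int[][] bonds) {
        int n = atoms.size();
        int[] hCount = new int[n];
        int[] heavyDegree = new int[n];
        for (int i = 0; i < n; i++) hCount[i] = atoms.get(i).implicitH;
        for (int[] b : bonds) {
            for (int k = 0; k < 2; k++) {
                int self = b[k], other = b[1 - k];
                // an explicit H neighbour adds to the count instead of the degree
                if (atoms.get(other).symbol.equals("H")) hCount[self]++;
                else heavyDegree[self]++;
            }
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (atoms.get(i).symbol.equals("H")) continue; // explicit H atoms are not graph nodes
            out.add(atoms.get(i).symbol + "/H" + hCount[i] + "/D" + heavyDegree[i]);
        }
        return out;
    }
}
```

    For water, an oxygen carrying two implicit hydrogens and an oxygen bonded to two explicit H atoms both yield the single invariant "O/H2/D0".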

    Implementation note: many of the algorithms involved in generating the fingerprints (e.g. aromaticity, atom typing) have been implemented explicitly for this class, rather than reusing comparable functionality elsewhere in the CDK. This ensures that the CDK implementation produces exactly the same output as other implementations: depending on CDK functionality that could be modified or improved in the future would break binary compatibility with formerly identical implementations on other platforms.

    For the FCFP class of fingerprints, atom typing is done using a scheme similar to that described by Green et al. [Green1994].
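    The idea behind functional-class typing is that the initial atom identifier encodes pharmacophore-like roles rather than the element itself. The sketch below uses six classes commonly associated with FCFP (donor, acceptor, positive, negative, aromatic, halogen) with deliberately crude, hypothetical rules; the actual rules used by this class are not described in this documentation.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical functional-class typing for FCFP-style initial identifiers.
// The classes and the simple element-based rules are illustrative assumptions.
public class FunctionalClassSketch {
    enum FClass { DONOR, ACCEPTOR, POSITIVE, NEGATIVE, AROMATIC, HALOGEN }

    static Set<FClass> classify(String symbol, int charge, int hCount, boolean aromatic) {
        Set<FClass> out = EnumSet.noneOf(FClass.class);
        boolean nOrO = symbol.equals("N") || symbol.equals("O");
        if (nOrO && hCount > 0) out.add(FClass.DONOR);   // N-H or O-H
        if (nOrO && charge <= 0) out.add(FClass.ACCEPTOR);
        if (charge > 0) out.add(FClass.POSITIVE);
        if (charge < 0) out.add(FClass.NEGATIVE);
        if (aromatic) out.add(FClass.AROMATIC);
        switch (symbol) {
            case "F": case "Cl": case "Br": case "I": out.add(FClass.HALOGEN); break;
            default: break;
        }
        return out;
    }
}
```

    Under these rules a hydroxyl oxygen and a primary amine nitrogen both map to {DONOR, ACCEPTOR}, so they would share an initial identifier, which is the intended effect of functional-class typing.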

    The fingerprints and their uses have been described by Clark et al. [Clark, A. et al. J. Cheminformatics. 2014. 6].
    Important: this fingerprint cannot be used for substructure screening.

    Author:
    am.clark
    Source code:
    main
    Belongs to CDK module:
    standard
    Keywords:
    fingerprint, similarity
    Created on:
    2014-01-01
    • Constructor Detail

      • CircularFingerprinter

        public CircularFingerprinter()
        Default constructor: uses the ECFP6 type.
      • CircularFingerprinter

        public CircularFingerprinter(int classType)
        Specific constructor: initializes with the descriptor class type, one of ECFP_{p} or FCFP_{p}, where ECFP denotes the extended-connectivity fingerprints, FCFP the functional-class version, and {p} the path diameter, which may be 0, 2, 4 or 6.
        Parameters:
        classType - one of CLASS_ECFP{n} or CLASS_FCFP{n}
      • CircularFingerprinter

        public CircularFingerprinter(int classType,
                                     int len)
        Specific constructor: initializes with the descriptor class type, one of ECFP_{p} or FCFP_{p}, where ECFP denotes the extended-connectivity fingerprints, FCFP the functional-class version, and {p} the path diameter, which may be 0, 2, 4 or 6.
        Parameters:
        classType - one of CLASS_ECFP{n} or CLASS_FCFP{n}
        len - size of folded (binary) fingerprint
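        Since {p} is a diameter, the number of growth iterations (the radius) is p/2: ECFP6 grows environments for three iterations, ECFP0 uses the initial atom identifiers only. The helper below is a hypothetical illustration of that relationship, not part of the CDK API.

```java
// Sketch: relate the {p} in ECFP_{p}/FCFP_{p} to the number of growth iterations.
// iterationsFor is a hypothetical helper, not a CDK method.
public class DiameterSketch {
    static int iterationsFor(int diameter) {
        if (diameter != 0 && diameter != 2 && diameter != 4 && diameter != 6)
            throw new IllegalArgumentException("diameter must be 0, 2, 4 or 6: " + diameter);
        return diameter / 2; // each iteration extends the environment by one bond in every direction
    }
}
```

        With the CDK itself, an ECFP4 fingerprinter folded to 1024 bits would be constructed as new CircularFingerprinter(CircularFingerprinter.CLASS_ECFP4, 1024).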
    • Method Detail

      • setPerceiveStereo

        public void setPerceiveStereo(boolean val)
        Sets whether stereochemistry should be re-perceived from 2D/3D coordinates. By default, stereochemistry encoded as IStereoElements is used.
        Parameters:
        val - true to re-perceive stereochemistry from 2D/3D coordinates
      • calculate

        public void calculate(IAtomContainer mol)
                       throws CDKException
        Calculates the fingerprints for the given IAtomContainer, and stores them for subsequent retrieval.
        Parameters:
        mol - chemical structure; all atoms should be known, legitimate elements
        Throws:
        CDKException
      • getFPCount

        public int getFPCount()
        Returns the number of fingerprints generated.
        Returns:
        total number of unique fingerprint hashes generated
      • getFP

        public CircularFingerprinter.FP getFP(int N)
        Returns the requested fingerprint.
        Parameters:
        N - index of fingerprint (0-based)
        Returns:
        instance of a fingerprint hash
      • getBitFingerprint

        public IBitFingerprint getBitFingerprint(IAtomContainer mol)
                                          throws CDKException
        Calculates the circular fingerprint for the given IAtomContainer, and folds the result into a single bitset (see getSize()).
        Specified by:
        getBitFingerprint in interface IFingerprinter
        Parameters:
        mol - IAtomContainer for which the fingerprint should be calculated.
        Returns:
        the fingerprint
        Throws:
        CDKException - may be thrown if there is an error during aromaticity detection or (for key based fingerprints) if there is a SMARTS parsing error
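        Folded fingerprints are typically compared with the Tanimoto coefficient, |A AND B| / |A OR B|. The sketch below works on plain java.util.BitSet values (with the CDK, a BitSet can be obtained from the result via IBitFingerprint.asBitSet()); returning 1.0 for two empty sets is a convention chosen here.

```java
import java.util.BitSet;

// Tanimoto similarity between two folded fingerprints held as BitSets.
public class TanimotoSketch {
    static double tanimoto(BitSet a, BitSet b) {
        BitSet and = (BitSet) a.clone();
        and.and(b);  // bits set in both
        BitSet or = (BitSet) a.clone();
        or.or(b);    // bits set in either
        int union = or.cardinality();
        return union == 0 ? 1.0 : (double) and.cardinality() / union;
    }
}
```

        For example, bits {1, 2, 3} and {2, 3, 4} share two of four set bits, giving 0.5.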
      • getCountFingerprint

        public ICountFingerprint getCountFingerprint(IAtomContainer mol)
                                              throws CDKException
        Calculates the circular fingerprint for the given IAtomContainer, and returns a data structure that enumerates all of the fingerprint hashes and their counts (i.e. does not fold them into a bitmask).
        Specified by:
        getCountFingerprint in interface IFingerprinter
        Parameters:
        mol - IAtomContainer for which the fingerprint should be calculated.
        Returns:
        the count fingerprint
        Throws:
        CDKException - if there is an error during aromaticity detection or (for key based fingerprints) if there is a SMARTS parsing error.
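        An unfolded count fingerprint keeps every distinct 32-bit hash with the number of times it occurred, so no bit collisions arise. The sketch below aggregates a hypothetical array of raw hashes into counts and compares two such maps with the min/max (Ruzicka) similarity, one common count-based generalisation of Tanimoto; it does not use the CDK's ICountFingerprint type.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Sketch of an unfolded count fingerprint and a count-based similarity.
public class CountSketch {
    static Map<Integer, Integer> toCounts(int[] hashes) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int h : hashes) counts.merge(h, 1, Integer::sum); // one entry per distinct hash
        return counts;
    }

    // Sum of per-hash minimum counts divided by sum of per-hash maximum counts.
    static double minMaxSimilarity(Map<Integer, Integer> a, Map<Integer, Integer> b) {
        Set<Integer> keys = new TreeSet<>(a.keySet());
        keys.addAll(b.keySet());
        long num = 0, den = 0;
        for (int k : keys) {
            int x = a.getOrDefault(k, 0), y = b.getOrDefault(k, 0);
            num += Math.min(x, y);
            den += Math.max(x, y);
        }
        return den == 0 ? 1.0 : (double) num / den;
    }
}
```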
      • getSize

        public int getSize()
        Returns the length, in bits, of the folded fingerprint.
        Specified by:
        getSize in interface IFingerprinter
        Returns:
        the size of the fingerprint