Class CircularFingerprinter

java.lang.Object
org.openscience.cdk.fingerprint.AbstractFingerprinter
org.openscience.cdk.fingerprint.CircularFingerprinter
All Implemented Interfaces:
IFingerprinter

public class CircularFingerprinter extends AbstractFingerprinter implements IFingerprinter

Circular fingerprints: generates fingerprints that are functionally equivalent to the ECFP-2/4/6 and FCFP-2/4/6 fingerprints, which are partially described by Rogers and Hahn [Rogers and Hahn. J. Chem. Inf. Model. 2010. 50].

While the literature describes the method in detail, it discloses neither the hashing technique for converting lists of integers into 32-bit codes nor the scheme used to classify the atom types for creating the FCFP class of descriptors. For this reason, the fingerprints that are created are not binary compatible with the reference implementation. They do, however, achieve effectively equal performance for modelling purposes.

The resulting fingerprint bits are presented as a list of unique bits, each with a 32-bit hashcode; typically there are no more than a hundred or so unique bit hashcodes per molecule. These identifiers can be folded into a smaller array of bits, such that they can be represented as a single long binary number, which is often more convenient.
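
The folding step described above can be sketched with the standard library alone. This is a minimal illustration, not the class's actual code: the class name FoldSketch, the method fold, and the sample hash values are hypothetical, and reducing each hash as an unsigned value modulo the folded size is an assumption about one reasonable folding rule.

```java
import java.util.BitSet;

class FoldSketch {
    // Fold 32-bit hash codes into a bit set with `size` positions.
    // Assumption: each hash is read as an unsigned 32-bit value and
    // reduced modulo `size`; distinct hashes may land on the same bit.
    static BitSet fold(int[] hashes, int size) {
        BitSet bits = new BitSet(size);
        for (int h : hashes) {
            long unsigned = Integer.toUnsignedLong(h);
            bits.set((int) (unsigned % size));
        }
        return bits;
    }

    public static void main(String[] args) {
        int[] hashes = {5, 1029, -1}; // made-up hash codes for illustration
        BitSet folded = fold(hashes, 1024);
        // 5 and 1029 collide on bit 5; -1 (0xFFFFFFFF) maps to bit 1023.
        System.out.println(folded);
    }
}
```

Folding trades uniqueness for a fixed-width representation: the smaller the folded size, the more often distinct hash codes share a bit.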

The integer hashing is done using the CRC32 algorithm, using the Java CRC32 class, which is the same formula/parameters as used by PNG files, and described in:

http://www.w3.org/TR/PNG/#D-CRCAppendix
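
As a rough illustration of hashing a list of integers into a 32-bit code with the Java CRC32 class, consider the sketch below. The class name Crc32HashSketch and the method hashInts are hypothetical, and the choice to feed each integer as four big-endian bytes is an assumption; the exact values and byte order used by CircularFingerprinter are not stated here.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

class Crc32HashSketch {
    // Hash a list of integers into one 32-bit code using CRC32.
    // Assumption: each integer contributes 4 bytes in big-endian order.
    static int hashInts(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(4 * values.length);
        for (int v : values) {
            buf.putInt(v);
        }
        CRC32 crc = new CRC32();
        crc.update(buf.array());
        // getValue() returns the checksum in the low 32 bits of a long.
        return (int) crc.getValue();
    }

    public static void main(String[] args) {
        int[] sample = {6, 3, 1}; // made-up input for illustration
        System.out.println(Integer.toHexString(hashInts(sample)));
    }
}
```

Because CRC32 is sensitive to input order, any scheme of this kind has to put the integers into a canonical order before hashing if the result is meant to be independent of atom numbering.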

Both implicit and explicit hydrogens are handled, i.e. it does not matter whether the incoming molecule is hydrogen-suppressed or not.

Implementation note: many of the algorithms involved in the generation of fingerprints (e.g. aromaticity, atom typing) have been coded up explicitly for use by this class, rather than making use of comparable functionality elsewhere in the CDK. This is to ensure that the CDK implementation of the algorithm is strictly equal to other implementations: dependencies on CDK functionality that could be modified or improved in the future would break binary compatibility with formerly identical implementations on other platforms.

For the FCFP class of fingerprints, atom typing is done using a scheme similar to that described by Green et al. [Green1994].

The fingerprints and their uses have been described in Clark et al. [Clark, A. et al. J. Cheminformatics. 2014. 6].
Important: this fingerprint cannot be used for substructure screening.

Author:
am.clark
Source code:
main
Belongs to CDK module:
standard
Keywords:
fingerprint, similarity
Created on:
2014-01-01
  • Field Details

  • Constructor Details

    • CircularFingerprinter

      public CircularFingerprinter()
      Default constructor: uses the ECFP6 type.
    • CircularFingerprinter

      public CircularFingerprinter(int classType)
      Specific constructor: initializes with descriptor class type, one of ECFP_{p} or FCFP_{p}, where ECFP is for the extended-connectivity fingerprints, FCFP is for the functional class version, and {p} is the path diameter, and may be 0, 2, 4 or 6.
      Parameters:
      classType - one of CLASS_ECFP{n} or CLASS_FCFP{n}
    • CircularFingerprinter

      public CircularFingerprinter(int classType, int len)
      Specific constructor: initializes with descriptor class type, one of ECFP_{p} or FCFP_{p}, where ECFP is for the extended-connectivity fingerprints, FCFP is for the functional class version, and {p} is the path diameter, and may be 0, 2, 4 or 6. The folded (binary) fingerprint has size len.
      Parameters:
      classType - one of CLASS_ECFP{n} or CLASS_FCFP{n}
      len - size of folded (binary) fingerprint
  • Method Details

    • setPerceiveStereo

      public void setPerceiveStereo(boolean val)
      Sets whether stereochemistry should be re-perceived from 2D/3D coordinates. By default, stereochemistry encoded as IStereoElements is used.
      Parameters:
      val - true to re-perceive stereochemistry from 2D/3D coordinates
    • getParameters

      protected List<Map.Entry<String,String>> getParameters()
      Description copied from class: AbstractFingerprinter
      Subclasses should override this method to report the parameters they are configured with.
      Overrides:
      getParameters in class AbstractFingerprinter
      Returns:
      The key=value pairs of configured parameters
    • calculate

      public void calculate(IAtomContainer mol) throws CDKException
      Calculates the fingerprints for the given IAtomContainer, and stores them for subsequent retrieval.
      Parameters:
      mol - chemical structure; all nodes should be known legitimate elements
      Throws:
      CDKException
    • getFPCount

      public int getFPCount()
      Returns the number of fingerprints generated.
      Returns:
      total number of unique fingerprint hashes generated
    • getFP

      public CircularFingerprinter.FP getFP(int N)
      Returns the requested fingerprint.
      Parameters:
      N - index of fingerprint (0-based)
      Returns:
      instance of a fingerprint hash
    • getBitFingerprint

      public IBitFingerprint getBitFingerprint(IAtomContainer mol) throws CDKException
      Calculates the circular fingerprint for the given IAtomContainer, and folds the result into a single bitset (see getSize()).
      Specified by:
      getBitFingerprint in interface IFingerprinter
      Parameters:
      mol - IAtomContainer for which the fingerprint should be calculated.
      Returns:
      the fingerprint
      Throws:
      CDKException - may be thrown if there is an error during aromaticity detection or (for key based fingerprints) if there is a SMARTS parsing error
    • getCountFingerprint

      public ICountFingerprint getCountFingerprint(IAtomContainer mol) throws CDKException
      Calculates the circular fingerprint for the given IAtomContainer, and returns a data structure that enumerates all of the fingerprints and their counts (i.e. does not fold them into a bitmask).
      Specified by:
      getCountFingerprint in interface IFingerprinter
      Parameters:
      mol - IAtomContainer for which the fingerprint should be calculated.
      Returns:
      the count fingerprint
      Throws:
      CDKException - if there is an error during aromaticity detection or (for key based fingerprints) if there is a SMARTS parsing error.
    • getRawFingerprint

      public Map<String,Integer> getRawFingerprint(IAtomContainer mol) throws CDKException
      Invalid: it is not appropriate to convert the integer hash codes into strings.
      Specified by:
      getRawFingerprint in interface IFingerprinter
      Parameters:
      mol - IAtomContainer for which the fingerprint should be calculated.
      Returns:
      the raw fingerprint
      Throws:
      CDKException
    • getSize

      public int getSize()
      Returns the extent of the folded fingerprints.
      Specified by:
      getSize in interface IFingerprinter
      Returns:
      the size of the fingerprint