Class CircularFingerprinter
- java.lang.Object
-
- org.openscience.cdk.fingerprint.AbstractFingerprinter
-
- org.openscience.cdk.fingerprint.CircularFingerprinter
-
- All Implemented Interfaces:
IFingerprinter
public class CircularFingerprinter extends AbstractFingerprinter implements IFingerprinter
Circular fingerprints: generates fingerprints that are functionally equivalent to the ECFP-2/4/6 and FCFP-2/4/6 fingerprints, which are partially described by Rogers and Hahn [Rogers and Hahn. J. Chem. Inf. Model. 2010. 50].
While the literature describes the method in detail, it discloses neither the hashing technique for converting lists of integers into 32-bit codes nor the scheme used to classify atom types for the FCFP class of descriptors. For this reason, the fingerprints created by this class are not binary compatible with the reference implementation. They do, however, achieve effectively equal performance for modelling purposes.
The resulting fingerprint bits are presented as a list of unique bits, each with a 32-bit hashcode; typically there are no more than a hundred or so unique bit hashcodes per molecule. These identifiers can be folded into a smaller array of bits, such that they can be represented as a single long binary number, which is often more convenient.
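The folding step described above can be illustrated with a minimal sketch. It assumes that each 32-bit hash code is treated as an unsigned value and reduced modulo the folded length; the class name `FoldSketch`, the method `fold`, and the sample hash codes are hypothetical and are not part of the CDK API.

```java
import java.util.BitSet;

class FoldSketch {
    /**
     * Folds 32-bit hash codes into a bit set of the given length.
     * Assumption: the hash is read as an unsigned value and mapped
     * to a bit position with a simple modulo.
     */
    static BitSet fold(int[] hashCodes, int length) {
        BitSet bits = new BitSet(length);
        for (int h : hashCodes) {
            long unsigned = Integer.toUnsignedLong(h); // avoid negative remainders
            bits.set((int) (unsigned % length));
        }
        return bits;
    }

    public static void main(String[] args) {
        int[] hashes = {5, 1029, -1}; // made-up hash codes for illustration
        BitSet folded = fold(hashes, 1024);
        // 5 and 1029 collide on bit 5; -1 (unsigned 4294967295) maps to bit 1023
        System.out.println(folded); // prints {5, 1023}
    }
}
```

Folding trades uniqueness for compactness: distinct hash codes can land on the same bit, as the first two values do here.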
The integer hashing is done using the CRC32 algorithm, using the Java CRC32 class, which uses the same formula/parameters as PNG files, as described in:
http://www.w3.org/TR/PNG/#D-CRCAppendix
Implicit vs. explicit hydrogens are handled, i.e. it doesn't matter whether the incoming molecule is hydrogen suppressed or not.
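As an illustration of the CRC32-based integer hashing mentioned above, the following sketch reduces a list of integers to a single 32-bit code with `java.util.zip.CRC32`. The byte order in which each integer is fed to the checksum (most significant byte first) is an assumption, and the names `Crc32HashSketch` and `hashInts` are hypothetical; the exact sequence of values that this class hashes is not shown here.

```java
import java.util.zip.CRC32;

class Crc32HashSketch {
    /**
     * Hashes a list of integers into one 32-bit code using CRC32.
     * Assumption: each integer contributes four bytes, most significant first.
     */
    static int hashInts(int[] values) {
        CRC32 crc = new CRC32();
        for (int v : values) {
            // CRC32.update(int) consumes only the low eight bits of its argument
            crc.update((v >>> 24) & 0xFF);
            crc.update((v >>> 16) & 0xFF);
            crc.update((v >>> 8) & 0xFF);
            crc.update(v & 0xFF);
        }
        // getValue() returns the checksum in the low 32 bits of a long
        return (int) crc.getValue();
    }

    public static void main(String[] args) {
        int[] environment = {2, 6, 1, 4}; // made-up integer list for illustration
        System.out.printf("hash = 0x%08X%n", hashInts(environment));
    }
}
```

Because CRC32 is a fixed, widely implemented algorithm, a scheme like this can be reproduced on other platforms as long as the input ordering and byte layout are kept identical.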
Implementation note: many of the algorithms involved in the generation of fingerprints (e.g. aromaticity, atom typing) have been coded up explicitly for use by this class, rather than making use of comparable functionality elsewhere in the CDK. This is to ensure that the CDK implementation of the algorithm is strictly equal to other implementations: dependencies on CDK functionality that could be modified or improved in the future would break binary compatibility with formerly identical implementations on other platforms.
For the FCFP class of fingerprints, atom typing is done using a scheme similar to that described by Green et al. [Green1994].
The fingerprints and their uses have been described in Clark et al. [Clark, A. et al. J. Cheminformatics. 2014. 6].
Important! This fingerprint cannot be used for substructure screening.
- Author:
- am.clark
- Source code:
- main
- Belongs to CDK module:
- standard
- Keywords:
- fingerprint, similarity
- Created on:
- 2014-01-01
-
-
Nested Class Summary
Nested Classes
Modifier and Type Class Description
static class
CircularFingerprinter.FP
-
Field Summary
Fields
Modifier and Type Field Description
static int
CLASS_ECFP0
static int
CLASS_ECFP2
static int
CLASS_ECFP4
static int
CLASS_ECFP6
static int
CLASS_FCFP0
static int
CLASS_FCFP2
static int
CLASS_FCFP4
static int
CLASS_FCFP6
-
Constructor Summary
Constructors
Constructor Description
CircularFingerprinter()
Default constructor: uses the ECFP6 type.
CircularFingerprinter(int classType)
Specific constructor: initializes with descriptor class type, one of ECFP_{p} or FCFP_{p}, where ECFP is for the extended-connectivity fingerprints, FCFP is for the functional class version, and {p} is the path diameter, which may be 0, 2, 4 or 6.
CircularFingerprinter(int classType, int len)
Specific constructor: initializes with descriptor class type, one of ECFP_{p} or FCFP_{p}, where ECFP is for the extended-connectivity fingerprints, FCFP is for the functional class version, and {p} is the path diameter, which may be 0, 2, 4 or 6.
-
Method Summary
Modifier and Type Method Description
void
calculate(IAtomContainer mol)
Calculates the fingerprints for the given IAtomContainer, and stores them for subsequent retrieval.
IBitFingerprint
getBitFingerprint(IAtomContainer mol)
Calculates the circular fingerprint for the given IAtomContainer, and folds the result into a single bitset (see getSize()).
ICountFingerprint
getCountFingerprint(IAtomContainer mol)
Calculates the circular fingerprint for the given IAtomContainer, and returns a data structure that enumerates all of the fingerprints and their counts (i.e. does not fold them into a bitmask).
CircularFingerprinter.FP
getFP(int N)
Returns the requested fingerprint.
int
getFPCount()
Returns the number of fingerprints generated.
protected List<Map.Entry<String,String>>
getParameters()
Base classes should override this method to report the parameters they are configured with.
Map<String,Integer>
getRawFingerprint(IAtomContainer mol)
Invalid: it is not appropriate to convert the integer hash codes into strings.
int
getSize()
Returns the extent of the folded fingerprints.
void
setPerceiveStereo(boolean val)
Sets whether stereochemistry should be re-perceived from 2D/3D coordinates.
-
Methods inherited from class org.openscience.cdk.fingerprint.AbstractFingerprinter
getFingerprint, getVersionDescription
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.openscience.cdk.fingerprint.IFingerprinter
getFingerprint, getVersionDescription
-
-
-
-
Field Detail
-
CLASS_ECFP0
public static final int CLASS_ECFP0
- See Also:
- Constant Field Values
-
CLASS_ECFP2
public static final int CLASS_ECFP2
- See Also:
- Constant Field Values
-
CLASS_ECFP4
public static final int CLASS_ECFP4
- See Also:
- Constant Field Values
-
CLASS_ECFP6
public static final int CLASS_ECFP6
- See Also:
- Constant Field Values
-
CLASS_FCFP0
public static final int CLASS_FCFP0
- See Also:
- Constant Field Values
-
CLASS_FCFP2
public static final int CLASS_FCFP2
- See Also:
- Constant Field Values
-
CLASS_FCFP4
public static final int CLASS_FCFP4
- See Also:
- Constant Field Values
-
CLASS_FCFP6
public static final int CLASS_FCFP6
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
CircularFingerprinter
public CircularFingerprinter()
Default constructor: uses the ECFP6 type.
-
CircularFingerprinter
public CircularFingerprinter(int classType)
Specific constructor: initializes with descriptor class type, one of ECFP_{p} or FCFP_{p}, where ECFP is for the extended-connectivity fingerprints, FCFP is for the functional class version, and {p} is the path diameter, which may be 0, 2, 4 or 6.
- Parameters:
classType
- one of CLASS_ECFP{n} or CLASS_FCFP{n}
-
CircularFingerprinter
public CircularFingerprinter(int classType, int len)
Specific constructor: initializes with descriptor class type, one of ECFP_{p} or FCFP_{p}, where ECFP is for the extended-connectivity fingerprints, FCFP is for the functional class version, and {p} is the path diameter, which may be 0, 2, 4 or 6.
- Parameters:
classType
- one of CLASS_ECFP{n} or CLASS_FCFP{n}
len
- size of folded (binary) fingerprint
-
-
Method Detail
-
setPerceiveStereo
public void setPerceiveStereo(boolean val)
Sets whether stereochemistry should be re-perceived from 2D/3D coordinates. By default, stereochemistry encoded as IStereoElements is used.
- Parameters:
val
- whether stereochemistry should be re-perceived from 2D/3D coordinates
-
getParameters
protected List<Map.Entry<String,String>> getParameters()
Description copied from class: AbstractFingerprinter
Base classes should override this method to report the parameters they are configured with.
- Overrides:
getParameters
in class AbstractFingerprinter
- Returns:
- The key=value pairs of configured parameters
-
calculate
public void calculate(IAtomContainer mol) throws CDKException
Calculates the fingerprints for the given IAtomContainer, and stores them for subsequent retrieval.
- Parameters:
mol
- chemical structure; all nodes should be known legitimate elements
- Throws:
CDKException
-
getFPCount
public int getFPCount()
Returns the number of fingerprints generated.
- Returns:
- total number of unique fingerprint hashes generated
-
getFP
public CircularFingerprinter.FP getFP(int N)
Returns the requested fingerprint.
- Parameters:
N
- index of fingerprint (0-based)
- Returns:
- instance of a fingerprint hash
-
getBitFingerprint
public IBitFingerprint getBitFingerprint(IAtomContainer mol) throws CDKException
Calculates the circular fingerprint for the given IAtomContainer, and folds the result into a single bitset (see getSize()).
- Specified by:
getBitFingerprint
in interface IFingerprinter
- Parameters:
mol
- IAtomContainer for which the fingerprint should be calculated.
- Returns:
- the fingerprint
- Throws:
CDKException
- may be thrown if there is an error during aromaticity detection or (for key-based fingerprints) if there is a SMARTS parsing error
-
getCountFingerprint
public ICountFingerprint getCountFingerprint(IAtomContainer mol) throws CDKException
Calculates the circular fingerprint for the given IAtomContainer, and returns a data structure that enumerates all of the fingerprints and their counts (i.e. does not fold them into a bitmask).
- Specified by:
getCountFingerprint
in interface IFingerprinter
- Parameters:
mol
- IAtomContainer for which the fingerprint should be calculated.
- Returns:
- the count fingerprint
- Throws:
CDKException
- if there is an error during aromaticity detection or (for key-based fingerprints) if there is a SMARTS parsing error.
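To illustrate the difference from the folded bitset, the following sketch tallies how often each unfolded hash code occurs. It is only a conceptual model of a count fingerprint, not the ICountFingerprint implementation; the names `CountSketch` and `countHashes` and the sample hash codes are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

class CountSketch {
    /** Tallies occurrences of each hash code without folding (conceptual model only). */
    static Map<Integer, Integer> countHashes(int[] hashCodes) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int h : hashCodes) {
            counts.merge(h, 1, Integer::sum); // add 1 to the tally for this hash
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] hashes = {7, 7, -3}; // made-up hash codes for illustration
        System.out.println(countHashes(hashes)); // prints {-3=1, 7=2}
    }
}
```

Keeping the full 32-bit codes avoids the bit collisions that folding introduces, at the cost of a variable-size result.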
-
getRawFingerprint
public Map<String,Integer> getRawFingerprint(IAtomContainer mol) throws CDKException
Invalid: it is not appropriate to convert the integer hash codes into strings.
- Specified by:
getRawFingerprint
in interface IFingerprinter
- Parameters:
mol
- IAtomContainer for which the fingerprint should be calculated.
- Returns:
- the raw fingerprint
- Throws:
CDKException
-
getSize
public int getSize()
Returns the extent of the folded fingerprints.
- Specified by:
getSize
in interface IFingerprinter
- Returns:
- the size of the fingerprint
-
-