public class CircularFingerprinter extends AbstractFingerprinter implements IFingerprinter
Circular fingerprints: generates fingerprints that are functionally equivalent to ECFP-2/4/6 and FCFP-2/4/6 fingerprints, which are partially described by Rogers and Hahn (J. Chem. Inf. Model. 2010, 50).
While the literature describes the method in detail, it discloses neither the hashing technique for converting lists of integers into 32-bit codes nor the scheme used to classify atom types for the FCFP class of descriptors. For this reason, the fingerprints created by this class are not binary compatible with the reference implementation. They do, however, achieve effectively equal performance for modelling purposes.
The resulting fingerprint bits are presented as a list of unique bits, each with a 32-bit hashcode; typically there are no more than a hundred or so unique bit hashcodes per molecule. These identifiers can be folded into a smaller array of bits, such that they can be represented as a single long binary number, which is often more convenient.
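As a rough illustration of folding, the sketch below maps each 32-bit hash code onto one of len bit positions. The class name FoldSketch and the modulo-based mapping (after masking off the sign bit) are assumptions for illustration; this is not necessarily how CircularFingerprinter folds its codes.

```java
import java.util.BitSet;

/** Hypothetical helper: folds 32-bit circular-fingerprint hash codes into a fixed-length bit set. */
public class FoldSketch {

    /**
     * Maps each hash code onto one of {@code len} bit positions. Masking off the sign bit
     * keeps the index non-negative for negative hash codes; the modulo mapping is an assumption.
     */
    public static BitSet fold(int[] hashCodes, int len) {
        if (len <= 0) {
            throw new IllegalArgumentException("len must be positive");
        }
        BitSet bits = new BitSet(len);
        for (int h : hashCodes) {
            bits.set((h & 0x7FFFFFFF) % len);
        }
        return bits;
    }

    public static void main(String[] args) {
        int[] codes = {0x1A2B3C4D, -123456789, 1024, 2048};
        BitSet folded = fold(codes, 1024);
        // 1024 and 2048 both map to bit 0, so four unique codes occupy fewer bits.
        System.out.println(folded.cardinality() + " bits set out of 1024");
    }
}
```

Because several hash codes can land on the same position, folding trades some information for a fixed-length representation; here 1024 and 2048 collide at bit 0.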
The integer hashing is done with the CRC32 algorithm, via the Java CRC32 class, which uses the same formula and parameters as PNG files, as described in:
http://www.w3.org/TR/PNG/#D-CRCAppendix
Implicit vs. explicit hydrogens are handled, i.e. it does not matter whether the incoming molecule is hydrogen-suppressed or not.
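The CRC32-based integer hashing mentioned above can be sketched with java.util.zip.CRC32 as follows. Serialising each integer as four big-endian bytes is an assumption made for this illustration (class CrcHashSketch is hypothetical), so the codes it produces should not be expected to match those of CircularFingerprinter.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical helper: reduces a list of integers to a 32-bit code with java.util.zip.CRC32. */
public class CrcHashSketch {

    /** CRC-32 (the PNG/zlib polynomial) of raw bytes, truncated to an int. */
    public static int crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return (int) crc.getValue(); // getValue() returns the 32-bit checksum in a long
    }

    /**
     * Serialises each integer as 4 big-endian bytes and hashes the result.
     * The byte order is an assumption of this sketch.
     */
    public static int hashInts(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(4 * values.length); // big-endian by default
        for (int v : values) {
            buf.putInt(v);
        }
        return crc32(buf.array());
    }

    public static void main(String[] args) {
        // Prints CBF43926, the standard CRC-32 check value for "123456789".
        System.out.printf("%08X%n", crc32("123456789".getBytes(StandardCharsets.US_ASCII)));
        System.out.println(hashInts(new int[] {6, 1, 3}));
    }
}
```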
Implementation note: many of the algorithms involved in the generation of fingerprints (e.g. aromaticity, atom typing) have been coded up explicitly for use by this class, rather than making use of comparable functionality elsewhere in the CDK. This is to ensure that the CDK implementation of the algorithm is strictly equal to other implementations: dependencies on CDK functionality that could be modified or improved in the future would break binary compatibility with formerly identical implementations on other platforms.
For the FCFP class of fingerprints, atom typing is done using a scheme similar to that described by Green et al. (Green, 1994).
The fingerprints and their uses have been described by Clark et al. (Clark, A. et al. J. Cheminformatics 2014, 6).
Important: this fingerprint cannot be used for substructure screening.
Modifier and Type | Class and Description |
---|---|
static class | CircularFingerprinter.FP |
Modifier and Type | Field and Description |
---|---|
static int | CLASS_ECFP0 |
static int | CLASS_ECFP2 |
static int | CLASS_ECFP4 |
static int | CLASS_ECFP6 |
static int | CLASS_FCFP0 |
static int | CLASS_FCFP2 |
static int | CLASS_FCFP4 |
static int | CLASS_FCFP6 |
Constructor | Description |
---|---|
CircularFingerprinter() | Default constructor: uses the ECFP6 type. |
CircularFingerprinter(int classType) | Specific constructor: initializes with descriptor class type, one of ECFP_{p} or FCFP_{p}, where ECFP is for the extended-connectivity fingerprints, FCFP is for the functional class version, and {p} is the path diameter, which may be 0, 2, 4 or 6. |
CircularFingerprinter(int classType, int len) | Specific constructor: initializes with descriptor class type (one of ECFP_{p} or FCFP_{p}, where {p} is the path diameter 0, 2, 4 or 6) and len, the size of the folded (binary) fingerprint. |
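To make the role of the path diameter concrete: in the Rogers and Hahn scheme, a diameter of p corresponds to p/2 rounds in which each atom's identifier is rehashed together with its neighbours' identifiers, so ECFP_0 keeps only the initial atom identifiers. The sketch below is a heavily simplified, hypothetical illustration (class CircularSketch): it uses the atomic number as the only initial invariant, works on a hydrogen-suppressed graph without bond types, hashes with CRC32 over big-endian integers, and does not remove duplicate environments. It is not CircularFingerprinter's algorithm and will not reproduce its hash codes.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.zip.CRC32;

/** Simplified, hypothetical ECFP-style identifier iteration on a hydrogen-suppressed graph. */
public class CircularSketch {

    /** CRC32 over the big-endian bytes of the given integers (an assumed hashing scheme). */
    static int hash(int... values) {
        ByteBuffer buf = ByteBuffer.allocate(4 * values.length);
        for (int v : values) {
            buf.putInt(v);
        }
        CRC32 crc = new CRC32();
        crc.update(buf.array(), 0, buf.capacity());
        return (int) crc.getValue();
    }

    /**
     * @param atomicNumbers initial invariant per atom (a deliberate simplification)
     * @param neighbours    adjacency list: neighbours[i] holds the indices bonded to atom i
     * @param diameter      0, 2, 4 or 6, as in ECFP_{p}; the number of rounds is diameter / 2
     * @return unique identifiers collected over all rounds, in discovery order
     */
    public static Set<Integer> identifiers(int[] atomicNumbers, int[][] neighbours, int diameter) {
        int n = atomicNumbers.length;
        int[] ids = new int[n];
        Set<Integer> unique = new LinkedHashSet<>();
        for (int i = 0; i < n; i++) {
            ids[i] = hash(atomicNumbers[i]);
            unique.add(ids[i]);
        }
        for (int iter = 1; iter <= diameter / 2; iter++) {
            int[] next = new int[n];
            for (int i = 0; i < n; i++) {
                // Sort neighbour identifiers so the result does not depend on atom order.
                int[] nb = new int[neighbours[i].length];
                for (int k = 0; k < nb.length; k++) {
                    nb[k] = ids[neighbours[i][k]];
                }
                Arrays.sort(nb);
                // Hash input: round number, the atom's current identifier, sorted neighbour identifiers.
                int[] input = new int[nb.length + 2];
                input[0] = iter;
                input[1] = ids[i];
                System.arraycopy(nb, 0, input, 2, nb.length);
                next[i] = hash(input);
                unique.add(next[i]);
            }
            ids = next; // all atoms update simultaneously from the previous round
        }
        return unique;
    }

    public static void main(String[] args) {
        // Ethanol heavy atoms C-C-O, with bonds 0-1 and 1-2.
        int[] z = {6, 6, 8};
        int[][] adj = {{1}, {0, 2}, {1}};
        for (int p = 0; p <= 6; p += 2) {
            System.out.println("diameter " + p + ": " + identifiers(z, adj, p).size() + " unique ids");
        }
    }
}
```

Sorting the neighbour identifiers makes the result independent of atom numbering, so relabelling the atoms of a molecule yields the same set of identifiers. The published method additionally incorporates bond orders into each update and discards identifiers that describe the same substructure.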
Modifier and Type | Method | Description |
---|---|---|
void | calculate(IAtomContainer mol) | Calculates the fingerprints for the given IAtomContainer, and stores them for subsequent retrieval. |
IBitFingerprint | getBitFingerprint(IAtomContainer mol) | Calculates the circular fingerprint for the given IAtomContainer, and folds the result into a single bitset (see getSize()). |
ICountFingerprint | getCountFingerprint(IAtomContainer mol) | Calculates the circular fingerprint for the given IAtomContainer, and returns a data structure that enumerates all of the fingerprints and their counts (i.e. does not fold them into a bitmask). |
CircularFingerprinter.FP | getFP(int N) | Returns the requested fingerprint. |
int | getFPCount() | Returns the number of fingerprints generated. |
protected List<Map.Entry<String,String>> | getParameters() | Base classes should override this method to report the parameters they are configured with. |
Map<String,Integer> | getRawFingerprint(IAtomContainer mol) | Invalid: it is not appropriate to convert the integer hash codes into strings. |
int | getSize() | Returns the extent of the folded fingerprints. |
void | setPerceiveStereo(boolean val) | Sets whether stereochemistry should be re-perceived from 2D/3D coordinates. |
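getCountFingerprint keeps each hash code together with how often it occurs, whereas getBitFingerprint records presence only after folding. The hypothetical helper below (CountSketch.counts) shows the counting idea on a plain int array; it does not call the CDK, and the input codes are made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: tallies how often each circular-fingerprint hash code occurs. */
public class CountSketch {

    /** Returns each distinct hash code with its number of occurrences, ordered by code. */
    public static Map<Integer, Integer> counts(int[] hashCodes) {
        Map<Integer, Integer> tally = new TreeMap<>();
        for (int h : hashCodes) {
            tally.merge(h, 1, Integer::sum); // start at 1 for a new code, otherwise add 1
        }
        return tally;
    }

    public static void main(String[] args) {
        int[] codes = {42, 42, 42, 1066, 7};
        System.out.println(counts(codes)); // prints {7=1, 42=3, 1066=1}
        // A presence-only view keeps just the keys, losing the multiplicity of 42.
        System.out.println(counts(codes).keySet()); // prints [7, 42, 1066]
    }
}
```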
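Methods inherited from class AbstractFingerprinter: getFingerprint, getVersionDescription
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface IFingerprinter: getFingerprint, getVersionDescription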
public static final int CLASS_ECFP0
public static final int CLASS_ECFP2
public static final int CLASS_ECFP4
public static final int CLASS_ECFP6
public static final int CLASS_FCFP0
public static final int CLASS_FCFP2
public static final int CLASS_FCFP4
public static final int CLASS_FCFP6
public CircularFingerprinter()
public CircularFingerprinter(int classType)
Parameters:
classType - one of CLASS_ECFP{n} or CLASS_FCFP{n}
public CircularFingerprinter(int classType, int len)
Parameters:
classType - one of CLASS_ECFP{n} or CLASS_FCFP{n}
len - size of the folded (binary) fingerprint
public void setPerceiveStereo(boolean val)
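Sets whether stereochemistry should be re-perceived from 2D/3D coordinates. By default, stereochemistry encoded as IStereoElements is used.
Parameters:
val - if true, stereochemistry is re-perceived from 2D/3D coordinates
protected List<Map.Entry<String,String>> getParameters()
Description copied from class: AbstractFingerprinter
Base classes should override this method to report the parameters they are configured with.
Overrides:
getParameters in class AbstractFingerprinter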
public void calculate(IAtomContainer mol) throws CDKException
Calculates the fingerprints for the given IAtomContainer, and stores them for subsequent retrieval.
Parameters:
mol - chemical structure; all nodes should be known legitimate elements
Throws:
CDKException
public int getFPCount()
public CircularFingerprinter.FP getFP(int N)
Parameters:
N - index of fingerprint (0-based)
public IBitFingerprint getBitFingerprint(IAtomContainer mol) throws CDKException
Calculates the circular fingerprint for the given IAtomContainer, and folds the result into a single bitset (see getSize()).
Specified by:
getBitFingerprint in interface IFingerprinter
Parameters:
mol - IAtomContainer for which the fingerprint should be calculated.
Throws:
CDKException - may be thrown if there is an error during aromaticity detection or (for key-based fingerprints) if there is a SMARTS parsing error
public ICountFingerprint getCountFingerprint(IAtomContainer mol) throws CDKException
Calculates the circular fingerprint for the given IAtomContainer, and returns a data structure that enumerates all of the fingerprints and their counts (i.e. does not fold them into a bitmask).
Specified by:
getCountFingerprint in interface IFingerprinter
Parameters:
mol - IAtomContainer for which the fingerprint should be calculated.
Throws:
CDKException - if there is an error during aromaticity detection or (for key-based fingerprints) if there is a SMARTS parsing error.
public Map<String,Integer> getRawFingerprint(IAtomContainer mol) throws CDKException
Specified by:
getRawFingerprint in interface IFingerprinter
Parameters:
mol - IAtomContainer for which the fingerprint should be calculated.
Throws:
CDKException
public int getSize()
Specified by:
getSize in interface IFingerprinter
Copyright © 2021. All rights reserved.