Package org.openscience.cdk.fingerprint
Class PubchemFingerprinter
- java.lang.Object
  - org.openscience.cdk.fingerprint.AbstractFingerprinter
    - org.openscience.cdk.fingerprint.PubchemFingerprinter
- All Implemented Interfaces: IFingerprinter
public class PubchemFingerprinter extends AbstractFingerprinter implements IFingerprinter
Generates a PubChem fingerprint for a molecule. These fingerprints are described here and are of the structural key type, with a length of 881 bits. See Fingerprinter for a more detailed description of fingerprints in general. This implementation is based on the public domain code made available by the NCGC here.
A fingerprint is generated for an AtomContainer with this code:
    IChemObjectBuilder builder = SilentChemObjectBuilder.getInstance();
    IAtomContainer molecule = ...; // e.g. from SMILES
    // note likely now optional:
    // AtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(molecule);
    // Aromaticity.cdkLegacy().apply(molecule);
    // AtomContainerManipulator.convertImplicitToExplicitHydrogens(molecule);
    PubchemFingerprinter fprinter = new PubchemFingerprinter(builder);
    BitSet fingerprint = fprinter.getBitFingerprint(molecule).asBitSet();
Note: the fingerprinter originally assumed that aromaticity and atom types had been detected before the fingerprint was evaluated, and that hydrogens were explicit. It has since been modified to aromatize automatically before SMARTS matching and to work with either implicit or explicit hydrogens, but further testing is needed to confirm that the correct results (as defined in PubChem SD files) are obtained.
Note that this fingerprint is not particularly fast, as it performs ring detection using AllRingsFinder as well as multiple SMARTS queries.
Some SMARTS patterns have been modified from the original code, since they were based on explicit H matching. The explicit H's are replaced with a query of the form #<N>&!H0, where <N> is the atomic number. Thus bit 344 was originally [#6](~[#6])([H]) but is written here as [#6&!H0]~[#6]. In some cases, where the H count can be reduced to a single possibility, that H count is used directly. An example is bit 35, which was [#6](~[#6])(~[#6])(~[#6])([H]) and is rewritten as [#6H1](~[#6])(~[#6])(~[#6]).
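The rewrite rule for explicit hydrogens can be illustrated with plain string construction. The sketch below is not CDK code: the class SmartsRewriteSketch and both of its methods are hypothetical names introduced only for this illustration. It builds the query atom for a given atomic number and reproduces the two rewritten patterns quoted for bits 344 and 35.

```java
/** Hypothetical illustration of the explicit-H rewrite; not part of CDK. */
final class SmartsRewriteSketch {

    /** Query atom for element N carrying at least one hydrogen: [#N&!H0]. */
    static String atomWithAnyHydrogen(int atomicNumber) {
        return "[#" + atomicNumber + "&!H0]";
    }

    /** Query atom for element N carrying exactly hCount hydrogens: [#NHk]. */
    static String atomWithHydrogenCount(int atomicNumber, int hCount) {
        return "[#" + atomicNumber + "H" + hCount + "]";
    }

    public static void main(String[] args) {
        // Bit 344: a carbon with at least one H, bonded to another carbon.
        System.out.println(atomWithAnyHydrogen(6) + "~[#6]");
        // Bit 35: a carbon with exactly one H and three carbon neighbours.
        System.out.println(atomWithHydrogenCount(6, 1) + "(~[#6])(~[#6])(~[#6])");
    }
}
```

The first form is used when the hydrogen count is open-ended, the second when the pattern leaves room for only one count.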
Warning: this class is not thread-safe and stores intermediate steps internally. Please use a separate instance of the class for each thread.
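One way to follow the one-instance-per-thread advice is a ThreadLocal. The sketch below uses only the Java standard library so that it runs on its own; StatefulWorker is a hypothetical stand-in for any non-thread-safe object such as a PubchemFingerprinter, and PerThreadSketch is likewise an illustrative name, not a CDK class.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a non-thread-safe class that keeps internal state. */
final class StatefulWorker {
    private int lastResult; // intermediate state, unsafe to share across threads

    int compute(int input) {
        lastResult = input * 2;
        return lastResult;
    }
}

/** Hands each thread its own lazily created instance. */
final class PerThreadSketch<T> {
    private final ThreadLocal<T> local;

    PerThreadSketch(Supplier<T> factory) {
        this.local = ThreadLocal.withInitial(factory);
    }

    T get() {
        return local.get();
    }

    public static void main(String[] args) throws InterruptedException {
        PerThreadSketch<StatefulWorker> workers = new PerThreadSketch<>(StatefulWorker::new);
        Runnable task = () -> {
            StatefulWorker w = workers.get(); // private to the calling thread
            System.out.println(Thread.currentThread().getName() + " -> " + w.compute(21));
        };
        Thread a = new Thread(task, "worker-a");
        Thread b = new Thread(task, "worker-b");
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```

With CDK on the classpath, the factory would presumably be a lambda that calls the documented constructor, for example `() -> new PubchemFingerprinter(builder)`.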
Important! This fingerprint cannot be used for substructure screening.
- Author: Rajarshi Guha
- Source code: main
- Belongs to CDK module: fingerprint
- Thread Safe: No
- Keywords: fingerprint, similarity
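The keywords name similarity as a use of this fingerprint. A common way to compare two fingerprints held as java.util.BitSet is the Tanimoto coefficient, sketched below with the standard library only. The class and method names are hypothetical, and the bits set in the example are made up for illustration; they are not real PubChem fingerprints.

```java
import java.util.BitSet;

/** Hypothetical helper; compares two bit fingerprints of equal nominal length. */
final class TanimotoSketch {

    /** Tanimoto coefficient |A and B| / |A or B|; defined here as 0 when both are empty. */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet and = (BitSet) a.clone(); // clone so the inputs are left untouched
        and.and(b);
        BitSet or = (BitSet) a.clone();
        or.or(b);
        int union = or.cardinality();
        return union == 0 ? 0.0 : (double) and.cardinality() / union;
    }

    public static void main(String[] args) {
        BitSet x = new BitSet(881);
        BitSet y = new BitSet(881);
        x.set(0);
        x.set(9);
        x.set(344);
        y.set(9);
        y.set(344);
        y.set(880);
        // 2 shared bits out of 4 distinct bits
        System.out.println(tanimoto(x, y));
    }
}
```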
Field Summary
- static int FP_SIZE: Number of bits in this fingerprint.
Constructor Summary
- PubchemFingerprinter(IChemObjectBuilder builder)
- PubchemFingerprinter(IChemObjectBuilder builder, boolean esssr)
Method Summary
- static BitSet decode(String enc): Returns a fingerprint from a Base64 encoded PubChem fingerprint.
- IBitFingerprint getBitFingerprint(IAtomContainer atomContainer): Calculates the 881-bit PubChem fingerprint for a molecule.
- ICountFingerprint getCountFingerprint(IAtomContainer container): Returns the count fingerprint for the given IAtomContainer.
- byte[] getFingerprintAsBytes(): Returns the fingerprint generated for a molecule as a byte[].
- Map<String,Integer> getRawFingerprint(IAtomContainer iAtomContainer): Returns the raw representation of the fingerprint for the given IAtomContainer.
- int getSize(): Gets the size of the fingerprint.
- Methods inherited from class org.openscience.cdk.fingerprint.AbstractFingerprinter: getFingerprint, getParameters, getVersionDescription
- Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
- Methods inherited from interface org.openscience.cdk.fingerprint.IFingerprinter: getFingerprint, getVersionDescription
Field Detail
FP_SIZE
public static final int FP_SIZE
Number of bits in this fingerprint.
- See Also: Constant Field Values
Constructor Detail
PubchemFingerprinter
public PubchemFingerprinter(IChemObjectBuilder builder, boolean esssr)
PubchemFingerprinter
public PubchemFingerprinter(IChemObjectBuilder builder)
Method Detail
getBitFingerprint
public IBitFingerprint getBitFingerprint(IAtomContainer atomContainer) throws CDKException
Calculates the 881-bit PubChem fingerprint for a molecule. See here for a description of each bit position.
- Specified by: getBitFingerprint in interface IFingerprinter
- Parameters: atomContainer - the molecule to consider
- Returns: the fingerprint
- Throws: CDKException - if there is an error during substructure searching or atom typing
- See Also: getFingerprintAsBytes()
getRawFingerprint
public Map<String,Integer> getRawFingerprint(IAtomContainer iAtomContainer) throws CDKException
Returns the raw representation of the fingerprint for the given IAtomContainer. The raw representation contains counts as well as the key strings.
- Specified by: getRawFingerprint in interface IFingerprinter
- Parameters: iAtomContainer - IAtomContainer for which the fingerprint should be calculated.
- Returns: the raw fingerprint
- Throws: CDKException
getSize
public int getSize()
Gets the size of the fingerprint.
- Specified by: getSize in interface IFingerprinter
- Returns: The bit length of the fingerprint
getFingerprintAsBytes
public byte[] getFingerprintAsBytes()
Returns the fingerprint generated for a molecule as a byte[]. Note that this should be called immediately after calling getBitFingerprint(org.openscience.cdk.interfaces.IAtomContainer).
- Returns: The fingerprint as a byte array
- See Also: getBitFingerprint(org.openscience.cdk.interfaces.IAtomContainer)
decode
public static BitSet decode(String enc)
Returns a fingerprint from a Base64 encoded PubChem fingerprint.
- Parameters: enc - The Base64 encoded fingerprint
- Returns: A BitSet corresponding to the input fingerprint
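To make the Base64 step concrete, the sketch below turns such a string into a BitSet using java.util.Base64. It is not the CDK implementation. It assumes the payload is a 4-byte big-endian bit count followed by the bits packed most-significant-bit first; this page does not state that layout, so it should be checked against the PubChem fingerprint specification before relying on it. All class and method names are hypothetical.

```java
import java.util.Base64;
import java.util.BitSet;

/** Hypothetical sketch; the byte layout below is an ASSUMPTION, not taken from CDK. */
final class Base64FingerprintSketch {

    /** Decodes under the assumed layout: 4-byte big-endian bit count, then MSB-first packed bits. */
    static BitSet decodeSketch(String enc) {
        byte[] raw = Base64.getDecoder().decode(enc);
        if (raw.length < 4) {
            throw new IllegalArgumentException("payload too short for a length prefix");
        }
        int nBits = ((raw[0] & 0xff) << 24) | ((raw[1] & 0xff) << 16)
                  | ((raw[2] & 0xff) << 8) | (raw[3] & 0xff);
        if (nBits < 0 || nBits > 8L * (raw.length - 4)) {
            throw new IllegalArgumentException("declared bit count does not fit the payload");
        }
        BitSet bits = new BitSet(nBits);
        for (int i = 0; i < nBits; i++) {
            int b = raw[4 + (i >> 3)] & 0xff;
            if ((b & (0x80 >> (i & 7))) != 0) {
                bits.set(i);
            }
        }
        return bits;
    }

    /** Inverse of decodeSketch, present only so the example can build its own input. */
    static String encodeSketch(BitSet bits, int nBits) {
        byte[] raw = new byte[4 + (nBits + 7) / 8];
        raw[0] = (byte) (nBits >>> 24);
        raw[1] = (byte) (nBits >>> 16);
        raw[2] = (byte) (nBits >>> 8);
        raw[3] = (byte) nBits;
        for (int i = bits.nextSetBit(0); i >= 0 && i < nBits; i = bits.nextSetBit(i + 1)) {
            raw[4 + (i >> 3)] |= (byte) (0x80 >> (i & 7));
        }
        return Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        BitSet original = new BitSet(881);
        original.set(0);
        original.set(35);
        original.set(880);
        String enc = encodeSketch(original, 881);
        // Round trip through the assumed layout.
        System.out.println(decodeSketch(enc).equals(original));
    }
}
```

The example deliberately round-trips its own payload instead of a real PubChem string, since the layout is assumed here and not confirmed by this page.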
getCountFingerprint
public ICountFingerprint getCountFingerprint(IAtomContainer container) throws CDKException
Returns the count fingerprint for the given IAtomContainer.
- Specified by: getCountFingerprint in interface IFingerprinter
- Parameters: container - IAtomContainer for which the fingerprint should be calculated.
- Returns: the count fingerprint
- Throws: CDKException - if there is an error during aromaticity detection or (for key based fingerprints) if there is a SMARTS parsing error.