Package org.openscience.cdk.fingerprint
Class PubchemFingerprinter
java.lang.Object
org.openscience.cdk.fingerprint.AbstractFingerprinter
org.openscience.cdk.fingerprint.PubchemFingerprinter
- All Implemented Interfaces:
IFingerprinter
Generates a PubChem fingerprint for a molecule.
These fingerprints are described
here and are of the structural key type, of length 881. See
Fingerprinter for a
more detailed description of fingerprints in general. This implementation is
based on the public domain code made available by the NCGC
here
A fingerprint is generated for an AtomContainer with this code:
    IChemObjectBuilder builder = SilentChemObjectBuilder.getInstance();
    IAtomContainer molecule = ...; // e.g. from SMILES
    // note: likely now optional:
    // AtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(molecule);
    // Aromaticity.cdkLegacy().apply(molecule);
    // AtomContainerManipulator.convertImplicitToExplicitHydrogens(molecule);
    PubchemFingerprinter fprinter = new PubchemFingerprinter(builder);
    BitSet fingerprint = fprinter.getBitFingerprint(molecule).asBitSet();

Note: the fingerprinter originally assumed that aromaticity and atom types had been detected before the fingerprint was evaluated, and that hydrogens were explicit. It has since been modified to aromatize automatically before SMARTS matching and to work with either implicit or explicit hydrogens, but further testing is needed to confirm that the correct results (as defined in PubChem SDfiles) are obtained. Note that this fingerprint is not particularly fast, as it performs ring detection using AllRingsFinder as well as multiple SMARTS queries.
Some SMARTS patterns have been modified from the original code, since they
were based on explicit H matching. As a result, we replace the explicit H's
with a query of the form #<N>&!H0, where <N> is the atomic number. Thus bit 344
was originally [#6](~[#6])([H]) but is written here as
[#6&!H0]~[#6]. In some cases, where the H count can be reduced
to a single possibility, we use that H count directly. An example is bit 35,
which was [#6](~[#6])(~[#6])(~[#6])([H]) and is rewritten as
[#6H1](~[#6])(~[#6])(~[#6]).
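The explicit-H replacement described above can be illustrated at the string level. The following is only a minimal sketch, not the code used by this class: the class ExplicitHRewrite and the method replaceExplicitH are hypothetical names, and the sketch assumes the pattern starts with an atomic-number atom and ends with exactly one ([H]) branch. It always emits the general !H0 form, so bit 35 would come out as [#6&!H0](~[#6])(~[#6])(~[#6]) rather than the tighter [#6H1] form, and bit 344 keeps its branch parentheses, [#6&!H0](~[#6]), which describes the same match as [#6&!H0]~[#6].

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class ExplicitHRewrite {
    // Leading "[#N]" atom, an arbitrary middle part, and one trailing "([H])" branch.
    private static final Pattern TRAILING_H =
            Pattern.compile("^\\[#(\\d+)\\](.*)\\(\\[H\\]\\)$");

    // Moves a single trailing explicit-H branch into the first atom as "&!H0".
    // Patterns that do not fit the assumed shape are returned unchanged.
    static String replaceExplicitH(String smarts) {
        Matcher m = TRAILING_H.matcher(smarts);
        if (!m.matches()) {
            return smarts;
        }
        String middle = m.group(2);
        if (middle.contains("[H]")) {
            return smarts; // more than one explicit H: outside this sketch's scope
        }
        return "[#" + m.group(1) + "&!H0]" + middle;
    }

    public static void main(String[] args) {
        System.out.println(replaceExplicitH("[#6](~[#6])([H])"));
    }
}
```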
Warning - this class is not thread-safe and stores intermediate steps internally. Please use a separate instance of the class for each thread.
Important! This fingerprint cannot be used for substructure screening.
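One common way to follow the one-instance-per-thread advice is a ThreadLocal. The sketch below is an illustration under stated assumptions: StatefulFingerprinter is a hypothetical stand-in for a fingerprinter that keeps intermediate state, used so the example runs without CDK on the classpath. In real code the ThreadLocal initializer would construct a PubchemFingerprinter instead.

```java
import java.util.BitSet;

final class PerThreadFingerprinter {
    // Hypothetical stand-in for a stateful, non-thread-safe fingerprinter.
    static final class StatefulFingerprinter {
        private final BitSet scratch = new BitSet(881); // shared intermediate state

        BitSet fingerprint(String input) {
            scratch.clear();
            for (char c : input.toCharArray()) {
                scratch.set(c % 881); // toy "feature" bits, not real chemistry
            }
            return (BitSet) scratch.clone();
        }
    }

    // Each thread lazily gets its own instance, so the scratch state is never shared.
    private static final ThreadLocal<StatefulFingerprinter> LOCAL =
            ThreadLocal.withInitial(StatefulFingerprinter::new);

    static BitSet fingerprint(String input) {
        return LOCAL.get().fingerprint(input);
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> System.out.println(
                Thread.currentThread().getName() + " " + fingerprint("CCO").cardinality());
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```

With a thread pool, this keeps one instance alive per worker thread, which avoids both sharing state and rebuilding the fingerprinter for every molecule.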
- Author:
- Rajarshi Guha
- Thread Safe:
- No
- Keywords:
- fingerprint, similarity
-
Field Summary
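Fields
Modifier and Type | Field | Description
static final int | FP_SIZE | Number of bits in this fingerprint.
-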
Constructor Summary
Constructors
PubchemFingerprinter(IChemObjectBuilder builder)
PubchemFingerprinter(IChemObjectBuilder builder, boolean esssr)
-
Method Summary
Modifier and Type | Method | Description
static BitSet | decode | Returns a fingerprint from a Base64 encoded PubChem fingerprint.
 | getBitFingerprint(IAtomContainer atomContainer) | Calculate 881 bit PubChem fingerprint for a molecule.
 | getCountFingerprint(IAtomContainer container) | Returns the count fingerprint for the given IAtomContainer.
byte[] | getFingerprintAsBytes() | Returns the fingerprint generated for a molecule as a byte[].
 | getParameters | Base classes should override this method to report the parameters they are configured with.
 | getRawFingerprint(IAtomContainer iAtomContainer) | Returns the raw representation of the fingerprint for the given IAtomContainer.
int | getSize() | Get the size of the fingerprint.
Methods inherited from class org.openscience.cdk.fingerprint.AbstractFingerprinter
getFingerprint, getVersionDescription
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.openscience.cdk.fingerprint.IFingerprinter
getFingerprint, getVersionDescription
-
Field Details
-
FP_SIZE
public static final int FP_SIZE
Number of bits in this fingerprint.
- See Also:
-
-
Constructor Details
-
PubchemFingerprinter
-
PubchemFingerprinter
-
-
Method Details
-
getParameters
Description copied from class: AbstractFingerprinter
Base classes should override this method to report the parameters they are configured with.
- Overrides:
getParameters in class AbstractFingerprinter
- Returns:
- The key=value pairs of configured parameters
-
getBitFingerprint
Calculate 881 bit PubChem fingerprint for a molecule. See here for a description of each bit position.
- Specified by:
getBitFingerprint in interface IFingerprinter
- Parameters:
atomContainer - the molecule to consider
- Returns:
- the fingerprint
- Throws:
CDKException - if there is an error during substructure searching or atom typing
- See Also:
-
getRawFingerprint
Returns the raw representation of the fingerprint for the given IAtomContainer. The raw representation contains counts as well as the key strings.
- Specified by:
getRawFingerprint in interface IFingerprinter
- Parameters:
iAtomContainer - IAtomContainer for which the fingerprint should be calculated.
- Returns:
- the raw fingerprint
- Throws:
CDKException
-
getSize
public int getSize()
Get the size of the fingerprint.
- Specified by:
getSize in interface IFingerprinter
- Returns:
- The bit length of the fingerprint
-
getFingerprintAsBytes
public byte[] getFingerprintAsBytes()
Returns the fingerprint generated for a molecule as a byte[]. Note that this should be called immediately after calling getBitFingerprint(org.openscience.cdk.interfaces.IAtomContainer).
- Returns:
- The fingerprint as a byte array
- See Also:
-
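To make the byte[] form concrete, here is a minimal sketch of packing an 881-bit fingerprint into bytes. It is not the class's own code: BitPackingSketch and toBytes are hypothetical names, and the most-significant-bit-first order within each byte is an assumption that may not match what getFingerprintAsBytes() actually returns. What is certain is the arithmetic: 881 bits need (881 + 7) / 8 = 111 bytes, with the last 7 bits of the final byte unused.

```java
import java.util.BitSet;

// Hypothetical helper, for illustration only.
final class BitPackingSketch {
    static final int FP_SIZE = 881;

    // Packs bit i into byte i / 8, most significant bit first (assumed order).
    static byte[] toBytes(BitSet bits) {
        byte[] out = new byte[(FP_SIZE + 7) / 8]; // 111 bytes for 881 bits
        for (int i = bits.nextSetBit(0); i >= 0 && i < FP_SIZE; i = bits.nextSetBit(i + 1)) {
            out[i >> 3] |= (byte) (0x80 >> (i & 7));
        }
        return out;
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet(FP_SIZE);
        bits.set(0);
        bits.set(880);
        System.out.println(toBytes(bits).length);
    }
}
```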
decode
Returns a fingerprint from a Base64 encoded PubChem fingerprint.
- Parameters:
enc - The Base64 encoded fingerprint
- Returns:
- A BitSet corresponding to the input fingerprint
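The decoding step can be sketched with the JDK alone. This is an illustration under stated assumptions rather than this method's implementation: it assumes the Base64 payload starts with a 4-byte big-endian bit length (881) followed by the packed bits, and that bits are stored most significant bit first within each byte. PubchemBase64Sketch and its decode method are hypothetical names.

```java
import java.util.Base64;
import java.util.BitSet;

// Hypothetical helper, for illustration only.
final class PubchemBase64Sketch {
    static final int FP_SIZE = 881;

    static BitSet decode(String enc) {
        byte[] raw = Base64.getDecoder().decode(enc);
        if (raw.length < 4) {
            throw new IllegalArgumentException("Input too short for a length prefix");
        }
        // Assumed layout: 4-byte big-endian bit count, then the packed bits.
        int len = ((raw[0] & 0xff) << 24) | ((raw[1] & 0xff) << 16)
                | ((raw[2] & 0xff) << 8) | (raw[3] & 0xff);
        if (len != FP_SIZE) {
            throw new IllegalArgumentException("Expected " + FP_SIZE + " bits, got " + len);
        }
        if (raw.length < 4 + (FP_SIZE + 7) / 8) {
            throw new IllegalArgumentException("Input too short for " + FP_SIZE + " bits");
        }
        BitSet bits = new BitSet(FP_SIZE);
        for (int i = 0; i < FP_SIZE; i++) {
            int b = raw[4 + (i >> 3)] & 0xff;
            if ((b & (0x80 >> (i & 7))) != 0) { // assumed MSB-first bit order
                bits.set(i);
            }
        }
        return bits;
    }
}
```

Checking the length prefix before reading any bits means malformed input fails early with a clear message instead of producing a silently wrong BitSet.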
-
getCountFingerprint
Returns the count fingerprint for the given IAtomContainer.
- Specified by:
getCountFingerprint in interface IFingerprinter
- Parameters:
container - IAtomContainer for which the fingerprint should be calculated.
- Returns:
- the count fingerprint
- Throws:
CDKException - if there is an error during aromaticity detection or (for key based fingerprints) if there is a SMARTS parsing error.
-