Class PubchemFingerprinter

  • All Implemented Interfaces:
    IFingerprinter

    public class PubchemFingerprinter
    extends AbstractFingerprinter
    implements IFingerprinter
    Generates a PubChem fingerprint for a molecule. These fingerprints are described in the PubChem fingerprint specification and are of the structural key type, of length 881. See Fingerprinter for a more detailed description of fingerprints in general. This implementation is based on the public domain code made available by the NCGC. A fingerprint is generated for an AtomContainer with this code:
       IChemObjectBuilder builder = SilentChemObjectBuilder.getInstance();
       IAtomContainer molecule = ...; // e.g. from SMILES
    
       // note: these steps are likely now optional
       // AtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(molecule);
       // Aromaticity.cdkLegacy().apply(molecule);
       // AtomContainerManipulator.convertImplicitToExplicitHydrogens(molecule);
    
       PubchemFingerprinter fprinter = new PubchemFingerprinter(builder);
       BitSet fingerprint = fprinter.getBitFingerprint(molecule).asBitSet();
     
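    These fingerprints are typically used for similarity comparison. The following is a minimal sketch that assumes the Tanimoto (Jaccard) coefficient as the measure, which this page does not prescribe, and compares two java.util.BitSet values such as the one returned by asBitSet(). The class name TanimotoSketch and the bit positions in main are made up for illustration, and treating two empty fingerprints as identical is a convention chosen for this sketch. CDK also has a Tanimoto class in org.openscience.cdk.similarity that may be preferable in practice; check its documentation first.

```java
import java.util.BitSet;

/** Tanimoto (Jaccard) similarity between two bit fingerprints. */
public final class TanimotoSketch {

    private TanimotoSketch() {}

    /**
     * Returns |A AND B| / |A OR B| without modifying either argument.
     * Two empty fingerprints are treated as identical (1.0) in this sketch.
     */
    public static double tanimoto(BitSet a, BitSet b) {
        BitSet both = (BitSet) a.clone();
        both.and(b);                 // bits set in both fingerprints
        BitSet either = (BitSet) a.clone();
        either.or(b);                // bits set in at least one fingerprint
        int union = either.cardinality();
        return union == 0 ? 1.0 : (double) both.cardinality() / union;
    }

    public static void main(String[] args) {
        // Made-up 881-bit fingerprints, for illustration only.
        BitSet fp1 = new BitSet(881);
        BitSet fp2 = new BitSet(881);
        fp1.set(0); fp1.set(9);  fp1.set(344);
        fp2.set(0); fp2.set(35); fp2.set(344);
        System.out.println(tanimoto(fp1, fp2)); // 2 shared / 4 total: prints 0.5
    }
}
```

    Cloning before and/or keeps the caller's fingerprints unchanged, which matters if the same BitSet is compared against many others.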
    Note: the fingerprinter originally assumed that aromaticity and atom types had been perceived before the fingerprint was evaluated, and that hydrogens were explicit. It has since been modified to perceive aromaticity automatically before SMARTS matching and to work with either implicit or explicit hydrogens, but further testing is needed to confirm that it produces the correct results (as defined in PubChem SD files).
    This fingerprint is not particularly fast: it performs ring detection using AllRingsFinder as well as multiple SMARTS queries.
    Some SMARTS patterns have been modified from the original code because they relied on explicit H matching. As a result, we replace the explicit H atoms with a query of the form #<N>&!H0, where <N> is the atomic number. Thus bit 344, originally [#6](~[#6])([H]), is written here as [#6&!H0]~[#6]. Where the H count can be reduced to a single possibility, we use that H count directly. For example, bit 35 was [#6](~[#6])(~[#6])(~[#6])([H]) and is rewritten as [#6H1](~[#6])(~[#6])(~[#6]), since a carbon bonded to three other carbons can carry at most one hydrogen.
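    In a structural-key fingerprint each bit position is tied to one fixed yes/no question about the molecule, such as whether the SMARTS pattern for bit 344 matches. The toy sketch below illustrates only that idea. The four keys, their bit positions, and the element-count representation of the molecule are all hypothetical and are not PubChem's 881 key definitions; the real class answers its keys with SMARTS matching and ring perception.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/**
 * Toy structural-key fingerprinter: bit i is set iff key i holds for the
 * molecule. Molecules are reduced to element counts purely for illustration.
 */
public final class StructuralKeySketch {

    /** Hypothetical keys; NOT the real PubChem key list or bit positions. */
    static final List<Predicate<Map<String, Integer>>> KEYS = List.of(
            m -> count(m, "H") >= 4,   // bit 0
            m -> count(m, "C") >= 2,   // bit 1
            m -> count(m, "N") >= 1,   // bit 2
            m -> count(m, "O") >= 1    // bit 3
    );

    static int count(Map<String, Integer> elementCounts, String symbol) {
        return elementCounts.getOrDefault(symbol, 0);
    }

    /** Evaluates every key once and records the answers as bits. */
    public static BitSet fingerprint(Map<String, Integer> elementCounts) {
        BitSet bits = new BitSet(KEYS.size());
        for (int i = 0; i < KEYS.size(); i++) {
            if (KEYS.get(i).test(elementCounts)) {
                bits.set(i);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        // Ethanol, C2H6O, as element counts.
        Map<String, Integer> ethanol = new LinkedHashMap<>();
        ethanol.put("C", 2);
        ethanol.put("H", 6);
        ethanol.put("O", 1);
        System.out.println(fingerprint(ethanol)); // prints {0, 1, 3}
    }
}
```

    Because every key is evaluated independently, the cost grows with the number of keys, which is consistent with the note above that a fingerprint built from many SMARTS queries is not particularly fast.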
    Warning: this class is not thread-safe because it stores intermediate steps internally. Use a separate instance of the class for each thread.
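    One way to follow the one-instance-per-thread advice is java.lang.ThreadLocal. This is a minimal sketch, not taken from CDK: StatefulWorker is a hypothetical stand-in for a class that keeps intermediate results in fields, used so the example runs without CDK. In real code the initializer would presumably be something like () -> new PubchemFingerprinter(builder), using the constructor shown in the example above.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** One instance per thread via ThreadLocal. */
public final class PerThreadSketch {

    /** Hypothetical non-thread-safe worker that reuses an internal buffer. */
    static final class StatefulWorker {
        private final StringBuilder scratch = new StringBuilder();

        String process(String input) {
            scratch.setLength(0);            // intermediate state is kept in a field,
            scratch.append(input).reverse(); // so concurrent calls on one instance would clash
            return scratch.toString();
        }
    }

    // Each thread lazily creates its own worker on its first WORKER.get().
    // With CDK this would presumably be: () -> new PubchemFingerprinter(builder)
    static final ThreadLocal<StatefulWorker> WORKER =
            ThreadLocal.withInitial(StatefulWorker::new);

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (String s : List.of("alpha", "beta", "gamma", "delta")) {
            // Tasks may run on different pool threads; each uses its own worker.
            pool.submit(() -> System.out.println(s + " -> " + WORKER.get().process(s)));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

    With a thread pool, each pool thread keeps its instance for the life of the thread, so instances are reused across tasks; call WORKER.remove() if a thread should release its instance.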
    Important: this fingerprint cannot be used for substructure screening.
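    For context, a substructure screen conventionally discards a target molecule whenever the query has a bit that the target lacks. That is only safe if the fingerprint guarantees that a substructure's bits are a subset of its superstructure's bits, and this page states that the PubChem fingerprint cannot be used that way (it does not say which keys break the guarantee). The sketch below shows the generic subset test for reference; ScreenSketch and mayContain are hypothetical names, not CDK API.

```java
import java.util.BitSet;

/** The bit-subset test that a substructure screen relies on. */
public final class ScreenSketch {

    /**
     * Returns true iff every bit set in the query is also set in the target.
     * A screen may discard a target when this returns false, but only if the
     * fingerprint guarantees: query is a substructure of target implies
     * bits(query) is a subset of bits(target).
     */
    static boolean mayContain(BitSet query, BitSet target) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(target);   // bits present in the query but not the target
        return missing.isEmpty();
    }

    public static void main(String[] args) {
        BitSet query = new BitSet();
        query.set(2);
        query.set(5);
        BitSet target = new BitSet();
        target.set(2);
        target.set(5);
        target.set(7);
        System.out.println(mayContain(query, target)); // prints true
        System.out.println(mayContain(target, query)); // prints false
    }
}
```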
    Author:
    Rajarshi Guha
    Source code:
    main
    Belongs to CDK module:
    fingerprint
    Thread Safe: No
    Keywords:
    fingerprint, similarity