Package org.openscience.cdk.similarity
Class Tanimoto
- java.lang.Object
-
- org.openscience.cdk.similarity.Tanimoto
-
public class Tanimoto extends Object
Calculates the Tanimoto coefficient for a pair of fingerprint bitsets or real-valued feature vectors. The Tanimoto coefficient is one way to quantitatively measure the "distance" or similarity of two chemical structures. You can use the Fingerprinter class to retrieve two fingerprint bitsets. We assume that you have two structures stored in cdk.Molecule objects. A Tanimoto coefficient can then be calculated like:
BitSet fingerprint1 = Fingerprinter.getBitFingerprint(molecule1);
BitSet fingerprint2 = Fingerprinter.getBitFingerprint(molecule2);
float tanimoto_coefficient = Tanimoto.calculate(fingerprint1, fingerprint2);
The Fingerprinter assumes that hydrogens are given explicitly, if they should be taken into account.
Note that the continuous Tanimoto coefficient does not lead to a metric space.
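The remark about metric spaces can be checked with a small numeric example. The sketch below is an illustration only and not CDK code: it assumes that the dissimilarity derived from the continuous coefficient T is taken as 1 - T, and it uses three one-dimensional feature vectors (plain numbers). The class and method names are hypothetical.

```java
// Illustration only, not CDK code: tests the triangle inequality for
// d = 1 - T on three one-dimensional feature vectors.
final class TanimotoMetricCheck {

    /** Continuous Tanimoto coefficient of two scalars (1-D vectors). */
    static double tanimoto(double x, double y) {
        double xy = x * y;
        return xy / (x * x + y * y - xy);
    }

    /** Assumed dissimilarity: one minus the coefficient. */
    static double distance(double x, double y) {
        return 1.0 - tanimoto(x, y);
    }

    public static void main(String[] args) {
        double a = 1.0, b = 2.0, c = 4.0;
        double direct = distance(a, c);                 // 1 - 4/13 = 9/13
        double viaB = distance(a, b) + distance(b, c);  // 1/3 + 1/3
        System.out.println("d(a,c) = " + direct);
        System.out.println("d(a,b) + d(b,c) = " + viaB);
        System.out.println("triangle inequality violated: " + (direct > viaB));
    }
}
```

For these values the direct distance (9/13) exceeds the detour through b (2/3), so the triangle inequality fails under the stated assumption.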
- Author:
- steinbeck
- Source code:
- main
- Belongs to CDK module:
- fingerprint
- Keywords:
- jaccard, similarity, tanimoto
- Created on:
- 2005-10-19
-
-
Field Summary
static String EMPTY_FINGERPRINTS_PROVIDED
-
Method Summary
static float calculate(double[] features1, double[] features2)
Evaluates the continuous Tanimoto coefficient for two real-valued vectors.
static float calculate(BitSet bitset1, BitSet bitset2)
Evaluates the Tanimoto coefficient for two bit sets.
static float calculate(Map<String,Integer> features1, Map<String,Integer> features2)
Evaluates the continuous Tanimoto coefficient for two feature/count fingerprint representations.
static double calculate(IBitFingerprint fingerprint1, IBitFingerprint fingerprint2)
Evaluates the Tanimoto coefficient for two IBitFingerprint instances.
static double calculate(ICountFingerprint fp1, ICountFingerprint fp2)
Evaluates the continuous Tanimoto coefficient for two feature/count fingerprint representations.
static double method1(ICountFingerprint fp1, ICountFingerprint fp2)
Calculates the Tanimoto distance for two count fingerprints using method 1.
static double method2(ICountFingerprint fp1, ICountFingerprint fp2)
Calculates the Tanimoto distance for two count fingerprints using method 2 [J.A. Grant, J.A. Haigh, B.T. Pickup, A. Nicholls and R.A. Sayle. J. Chem. Inf. Model. 2006, 46].
-
-
-
Field Detail
-
EMPTY_FINGERPRINTS_PROVIDED
public static final String EMPTY_FINGERPRINTS_PROVIDED
- See Also:
- Constant Field Values
-
-
Method Detail
-
calculate
public static float calculate(BitSet bitset1, BitSet bitset2) throws CDKException
Evaluates the Tanimoto coefficient for two bit sets.
- Parameters:
bitset1 - A bitset (such as a fingerprint) for the first molecule
bitset2 - A bitset (such as a fingerprint) for the second molecule
- Returns:
- The Tanimoto coefficient
- Throws:
CDKException - if bitsets are not of the same length
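For bit sets, the Tanimoto coefficient is the number of bits set in both fingerprints divided by the number of bits set in at least one of them. The following is a minimal sketch of that definition using only java.util.BitSet. It is not the CDK implementation: the class name is hypothetical, it performs no length check, and rejecting two empty bit sets with an IllegalArgumentException is a choice made for this sketch.

```java
import java.util.BitSet;

// Illustration only, not CDK code.
final class BitSetTanimotoSketch {

    /** Returns |a AND b| / |a OR b| for two bit sets. */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet both = (BitSet) a.clone();
        both.and(b);                       // bits set in both inputs
        BitSet either = (BitSet) a.clone();
        either.or(b);                      // bits set in at least one input
        int union = either.cardinality();
        if (union == 0) {
            // Sketch-specific choice: the ratio is undefined for two empty sets.
            throw new IllegalArgumentException("both bit sets are empty");
        }
        return (double) both.cardinality() / union;
    }

    public static void main(String[] args) {
        BitSet a = new BitSet(8);
        a.set(0); a.set(1); a.set(2);
        BitSet b = new BitSet(8);
        b.set(1); b.set(2); b.set(3);
        // 2 shared bits / 4 bits in the union = 0.5
        System.out.println(tanimoto(a, b));
    }
}
```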
-
calculate
public static double calculate(IBitFingerprint fingerprint1, IBitFingerprint fingerprint2)
Evaluates the Tanimoto coefficient for two IBitFingerprint instances.
- Parameters:
fingerprint1 - fingerprint for the first molecule
fingerprint2 - fingerprint for the second molecule
- Returns:
- The Tanimoto coefficient
- Throws:
IllegalArgumentException - if bitsets are not of the same length
-
calculate
public static float calculate(double[] features1, double[] features2) throws CDKException
Evaluates the continuous Tanimoto coefficient for two real-valued vectors.
- Parameters:
features1 - The first feature vector
features2 - The second feature vector
- Returns:
- The continuous Tanimoto coefficient
- Throws:
CDKException - if the features are not of the same length
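The continuous Tanimoto coefficient of two real-valued vectors x and y is commonly defined as x·y / (|x|² + |y|² - x·y). The sketch below implements that formula directly. It is an illustration rather than the CDK source: the class name is hypothetical, it returns a double instead of a float, and it signals a length mismatch with IllegalArgumentException instead of CDKException so that it runs without the CDK on the classpath.

```java
// Illustration only, not CDK code.
final class ContinuousTanimotoSketch {

    /** Returns x.y / (|x|^2 + |y|^2 - x.y); NaN if both vectors are all zeros. */
    static double tanimoto(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("feature vectors differ in length");
        }
        double xy = 0.0, xx = 0.0, yy = 0.0;
        for (int i = 0; i < x.length; i++) {
            xy += x[i] * y[i];   // dot product
            xx += x[i] * x[i];   // squared norm of x
            yy += y[i] * y[i];   // squared norm of y
        }
        return xy / (xx + yy - xy);
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0};
        double[] y = {2.0, 4.0, 6.0};
        // x.y = 28, |x|^2 = 14, |y|^2 = 56, so T = 28 / 42
        System.out.println(tanimoto(x, y));
    }
}
```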
-
calculate
public static float calculate(Map<String,Integer> features1, Map<String,Integer> features2)
Evaluates the continuous Tanimoto coefficient for two feature/count fingerprint representations. Note that feature/count type fingerprints may be of different lengths. Uses the Tanimoto method from 10.1021/ci800326z.
- Parameters:
features1 - The first feature map
features2 - The second feature map
- Returns:
- The Tanimoto coefficient
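Because the two maps may hold different features, a natural way to apply the continuous formula is to treat a feature missing from one map as having count 0, so that only shared features contribute to the dot product. The sketch below follows that reading. Whether it matches the cited method in every detail is an assumption; the class name is hypothetical and this is not the CDK source.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only, not CDK code.
final class CountMapTanimotoSketch {

    /** Continuous Tanimoto over feature counts; absent features count as 0. */
    static double tanimoto(Map<String, Integer> f1, Map<String, Integer> f2) {
        double xy = 0.0, xx = 0.0, yy = 0.0;
        for (Map.Entry<String, Integer> e : f1.entrySet()) {
            int c1 = e.getValue();
            xx += (double) c1 * c1;
            Integer c2 = f2.get(e.getKey());
            if (c2 != null) {
                xy += (double) c1 * c2;   // only shared features contribute
            }
        }
        for (int c2 : f2.values()) {
            yy += (double) c2 * c2;
        }
        return xy / (xx + yy - xy);
    }

    public static void main(String[] args) {
        Map<String, Integer> f1 = new HashMap<>();
        f1.put("C", 3);
        f1.put("N", 1);
        Map<String, Integer> f2 = new HashMap<>();
        f2.put("C", 2);
        f2.put("O", 2);
        // x.y = 6, |x|^2 = 10, |y|^2 = 8, so T = 6 / 12 = 0.5
        System.out.println(tanimoto(f1, f2));
    }
}
```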
-
calculate
public static double calculate(ICountFingerprint fp1, ICountFingerprint fp2)
Evaluates the continuous Tanimoto coefficient for two feature/count fingerprint representations. Note that feature/count type fingerprints may be of different lengths. Uses the Tanimoto method from 10.1021/ci800326z.
- Parameters:
fp1 - The first fingerprint
fp2 - The second fingerprint
- Returns:
- The Tanimoto coefficient
- See Also:
method1(org.openscience.cdk.fingerprint.ICountFingerprint, org.openscience.cdk.fingerprint.ICountFingerprint), method2(org.openscience.cdk.fingerprint.ICountFingerprint, org.openscience.cdk.fingerprint.ICountFingerprint)
-
method1
public static double method1(ICountFingerprint fp1, ICountFingerprint fp2)
Calculates the Tanimoto distance for two count fingerprints using method 1. The feature/count type fingerprints may be of different lengths. Uses the Tanimoto method from [Andreas Steffen, Thierry Kogej, Christian Tyrchan and Ola Engkvist. J. Chem. Inf. Model. 2009, 49].
- Parameters:
fp1 - count fingerprint 1
fp2 - count fingerprint 2
- Returns:
- a Tanimoto distance
-
method2
public static double method2(ICountFingerprint fp1, ICountFingerprint fp2)
Calculates the Tanimoto distance for two count fingerprints using method 2 [J.A. Grant, J.A. Haigh, B.T. Pickup, A. Nicholls and R.A. Sayle. J. Chem. Inf. Model. 2006, 46].
- Parameters:
fp1 - count fingerprint 1
fp2 - count fingerprint 2
- Returns:
- a Tanimoto distance
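This page does not spell out the formula behind method 2. One widely used count-based generalization of the Tanimoto coefficient divides the sum of per-feature minimum counts by the sum of per-feature maximum counts. The sketch below shows that min/max form on plain maps from feature hash to count. That it corresponds to method2 is an assumption to be verified against the cited paper and the CDK source; the class name and the map representation are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustration only, not CDK code; assumes non-negative counts.
final class MinMaxTanimotoSketch {

    /** Returns sum(min(c1, c2)) / sum(max(c1, c2)) over all features. */
    static double minMax(Map<Integer, Integer> f1, Map<Integer, Integer> f2) {
        Set<Integer> features = new HashSet<>(f1.keySet());
        features.addAll(f2.keySet());
        long minSum = 0, maxSum = 0;
        for (Integer feature : features) {
            int c1 = f1.getOrDefault(feature, 0);   // absent feature counts as 0
            int c2 = f2.getOrDefault(feature, 0);
            minSum += Math.min(c1, c2);
            maxSum += Math.max(c1, c2);
        }
        if (maxSum == 0) {
            // Sketch-specific choice: the ratio is undefined without any counts.
            throw new IllegalArgumentException("both fingerprints are empty");
        }
        return (double) minSum / maxSum;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> f1 = new HashMap<>();
        f1.put(1, 3);
        f1.put(2, 1);
        Map<Integer, Integer> f2 = new HashMap<>();
        f2.put(1, 2);
        f2.put(3, 2);
        // minima: 2 + 0 + 0 = 2; maxima: 3 + 1 + 2 = 6
        System.out.println(minMax(f1, f2));
    }
}
```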
-
-