Package org.openscience.cdk.tools
Class SugarRemovalUtility
- java.lang.Object
-
- org.openscience.cdk.tools.SugarRemovalUtility
-
public class SugarRemovalUtility extends Object
The Sugar Removal Utility (SRU) implements a generalized algorithm for automated detection of circular and linear sugars in molecular structures and their removal, as described in "Schaub, J., Zielesny, A., Steinbeck, C., Sorokina, M. Too sweet: cheminformatics for deglycosylation in natural products. J Cheminform 12, 67 (2020). https://doi.org/10.1186/s13321-020-00467-y". It offers various functions to detect and remove sugar moieties with different options. Example usage://prepare test molecule SmilesParser smiPar = new SmilesParser(SilentChemObjectBuilder.getInstance()); //COCONUT DB CNP0220816 IAtomContainer molecule = smiPar.parseSmiles("CC1=CC(OC2OC(CO)C(O)C(O)C2O)=C2C3=C(CCC3)C(=O)OC2=C1"); //instantiate sugar removal utility SugarRemovalUtility sugarRemovalUtil = new SugarRemovalUtility(SilentChemObjectBuilder.getInstance()); //remove sugar moieties, note that this changes the molecule instance! boolean sugarsWereRemoved = sugarRemovalUtil.removeCircularAndLinearSugars(molecule); if (sugarsWereRemoved) { //saturate open valences where sugars were situated if needed AtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(molecule); CDKHydrogenAdder.getInstance(SilentChemObjectBuilder.getInstance()).addImplicitHydrogens(molecule); }
- Author:
- Jonas Schaub, Maria Sorokina
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static class
SugarRemovalUtility.PreservationMode
Enum with options for how to determine whether a substructure that gets disconnected from the molecule during the removal of a sugar moiety should be preserved or can get removed along with the sugar.
-
Field Summary
Fields Modifier and Type Field Description protected static String[]
CIRCULAR_SUGARS_SMILES
Circular sugar structures represented as SMILES codes.static boolean
DETECT_CIRCULAR_SUGARS_ONLY_WITH_ENOUGH_EXOCYCLIC_OXYGEN_ATOMS_DEFAULT
Default setting for whether detected circular sugar candidates must have a sufficient number of attached, single-bonded exocyclic oxygen atoms in order to be detected as a sugar moiety (default: true).static boolean
DETECT_CIRCULAR_SUGARS_ONLY_WITH_O_GLYCOSIDIC_BOND_DEFAULT
Default setting for whether only circular sugar moieties that are attached to the parent structure or other sugar moieties via an O-glycosidic bond should be detected and subsequently removed (default: false).static boolean
DETECT_CIRCULAR_SUGARS_WITH_KETO_GROUPS_DEFAULT
Default setting for whether sugar-like rings that have keto groups should also be detected as circular sugars (default: false).static boolean
DETECT_LINEAR_ACIDIC_SUGARS_DEFAULT
Default setting for whether to include the linear acidic sugar patterns in the linear sugar structures used for initial detection of linear sugars in a given molecule (default: false).static boolean
DETECT_LINEAR_SUGARS_IN_RINGS_DEFAULT
Default setting for whether linear sugar structures that are part of a ring should be detected (default: false).static boolean
DETECT_SPIRO_RINGS_AS_CIRCULAR_SUGARS_DEFAULT
Default setting for whether to include spiro rings in the initial set of detected rings considered for circular sugar detection (default: false).protected static SmartsPattern
ESTER_SMARTS_PATTERN
Daylight SMARTS pattern for matching ester bonds between linear sugars.protected static SmartsPattern
ETHER_SMARTS_PATTERN
Daylight SMARTS pattern for matching ether bonds between linear sugars.static double
EXOCYCLIC_OXYGEN_ATOMS_TO_ATOMS_IN_RING_RATIO_THRESHOLD_DEFAULT
Default setting for the minimum ratio of attached exocyclic, single-bonded oxygen atoms to the number of atoms in the candidate circular sugar structure to reach in order to be classified as a sugar moiety if the number of exocyclic oxygen atoms should be evaluated (default: 0.5 so at a minimum 3 connected, exocyclic oxygen atoms for a six-membered ring, for example).static String
INDEX_PROPERTY_KEY
Property key for index that is added to any IAtom object in a given IAtomContainer object for internal unique identification of the respective IAtom object.static String
IS_SPIRO_ATOM_PROPERTY_KEY
Key for property that is added to IAtom objects that connect a spiro ring system for identification and preservation of these atoms in the removal process.protected static String[]
LINEAR_ACIDIC_SUGARS_SMILES
Linear acidic sugar structures represented as SMILES codes.static int
LINEAR_SUGAR_CANDIDATE_MAX_SIZE_DEFAULT
Default setting for the maximum number of carbon atoms a linear sugar candidate can have in order to be detected as a sugar moiety (and subsequently be removed, default: 7, inclusive).static int
LINEAR_SUGAR_CANDIDATE_MIN_SIZE_DEFAULT
Default setting for the minimum number of carbon atoms a linear sugar candidate must have in order to be detected as a sugar moiety (and subsequently be removed, default: 4, inclusive).protected static String[]
LINEAR_SUGARS_SMILES
Linear sugar structures represented as SMILES codes.protected static SmartsPattern
PEROXIDE_SMARTS_PATTERN
Daylight SMARTS pattern for matching peroxide bonds between linear sugars.static SugarRemovalUtility.PreservationMode
PRESERVATION_MODE_DEFAULT
Default setting for how to determine whether a substructure that gets disconnected from the molecule during the removal of a sugar moiety should be preserved or can get removed along with the sugar.static boolean
REMOVE_ONLY_TERMINAL_SUGARS_DEFAULT
Default setting for whether only terminal sugar moieties should be removed, i.e.static String
TOO_SMALL_DISCONNECTED_PART_TO_PRESERVE_PROPERTY_KEY
Key for property that is added to IAtom objects that belong to unconnected parts of an input molecule that would be cleared away because it is too small according to the set preservation mode for preservation of these atoms in the removal process.
-
Constructor Summary
Constructors Constructor Description SugarRemovalUtility(IChemObjectBuilder builder)
Sole constructor of this class.
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description boolean
addCircularSugarToPatternsList(String smilesCode)
Allows to add an additional sugar ring (represented as a SMILES string) to the list of circular sugar structures an input molecule is scanned for in circular sugar detection.boolean
addCircularSugarToPatternsList(IAtomContainer circularSugar)
Allows to add an additional sugar ring to the list of circular sugar structures an input molecule is scanned for in circular sugar detection.boolean
addLinearSugarToPatternsList(String smilesCode)
Allows to add an additional linear sugar (represented as SMILES string) to the list of linear sugar structures an input molecule is scanned for in linear sugar detection.boolean
addLinearSugarToPatternsList(IAtomContainer linearSugar)
Allows to add an additional linear sugar to the list of linear sugar structures an input molecule is scanned for in linear sugar detection.protected void
addUniqueIndicesToAtoms(IAtomContainer moleculeParam)
Adds an index as property to all atom objects of the given atom container to identify them uniquely within the atom container and its copies.protected boolean
areAllExocyclicBondsSingle(IAtomContainer ringToTest, IAtomContainer originalMolecule, boolean ignoreKetoGroups)
Checks whether all exocyclic bonds connected to a given ring fragment of a parent atom container are of single order.boolean
areCircularSugarsWithKetoGroupsDetected()
Specifies whether potential sugar cycles with keto groups are detected in circular sugar detection.boolean
areLinearAcidicSugarsDetected()
Specifies whether linear acidic sugar patterns are currently included in the linear sugar structures used for initial detection of linear sugars in a given molecule.boolean
areLinearSugarsInRingsDetected()
Specifies whether linear sugar structures that are part of a ring should be detected according to the current settings.boolean
areOnlyCircularSugarsWithEnoughExocyclicOxygenAtomsDetected()
Specifies whether detected circular sugar candidates must have a sufficient number of attached exocyclic oxygen atoms in order to be detected as a sugar moiety.boolean
areOnlyCircularSugarsWithOGlycosidicBondDetected()
Specifies whether only circular sugar moieties that are attached to the parent structure or other sugar moieties via an O-glycosidic bond should be detected and subsequently removed.boolean
areOnlyTerminalSugarsRemoved()
Specifies whether only terminal sugar moieties should be removed, i.e.boolean
areSpiroRingsDetectedAsCircularSugars()
Specifies whether spiro rings are included in the initial set of detected rings considered for circular sugar detection.protected boolean
checkUniqueIndicesOfAtoms(IAtomContainer moleculeParam)
Checks whether all atoms in the given molecule have a unique (in the given molecule) index as property.void
clearCircularSugarPatternsList()
Clears all the circular sugar structures an input molecule is scanned for in circular sugar detection.void
clearLinearSugarPatternsList()
Clears all the linear sugar structures an input molecule is scanned for in linear sugar detection.protected List<IAtomContainer>
combineOverlappingCandidates(List<IAtomContainer> candidateList)
Combines all overlapping (i.e.protected IAtomContainer
copy(IAtomContainer molecule)
Generates a very basic copy of the given molecule, intended for testing whether one of its substructures is terminal inisTerminal(IAtomContainer, IAtomContainer, List)
.protected List<IAtomContainer>
detectLinearSugarCandidatesByPatternMatching(IAtomContainer moleculeParam)
Initial detection of linear sugar candidates by substructure search for the linear sugar patterns in the given molecule.protected List<IAtomContainer>
detectPotentialSugarCycles(IAtomContainer moleculeParam, boolean includeSpiroRings, boolean ignoreKetoGroups)
Detects and returns cycles of the given molecule that are isolated (spiro rings included or not according to the boolean parameter), isomorphic to the circular sugar patterns, and only have exocyclic single bonds (keto groups ignored or not according to the boolean parameter).protected boolean
doesRingHaveEnoughExocyclicOxygenAtoms(int numberOfAtomsInRing, int numberOfAttachedExocyclicOxygenAtoms)
Simple decision-making function for deciding whether a candidate sugar ring has enough attached, single-bonded exocyclic oxygen atoms according to the set threshold.protected void
flagTooSmallDisconnectedPartsToPreserve(IAtomContainer moleculeParam)
Checks the given input molecule for unconnected parts that would be cleared away after sugar removal, in order to add a flag to the atoms of these parts to be able to preserve them in the later removal.protected String
generateSubstructureIdentifier(IAtomContainer substructure)
Creates an identifier string for substructures of a molecule, based on the unique indices of the included atoms.List<IAtomContainer>
getCircularSugarCandidates(IAtomContainer moleculeParam)
Extracts circular sugar moieties from the given molecule, according to the current settings for circular sugar detection.List<IAtomContainer>
getCircularSugarPatternsList()
Returns a list of atom containers representing the circular sugar structures an input molecule is scanned for in circular sugar detection.protected int
getExocyclicOxygenAtomCount(IAtomContainer ringToTest, IAtomContainer originalMolecule)
Returns the number of attached exocyclic oxygen atoms of a given ring in the original atom container.double
getExocyclicOxygenAtomsToAtomsInRingRatioThresholdSetting()
Returns the currently set minimum ratio of attached, exocyclic, single-bonded oxygen atoms to the number of atoms in the candidate circular sugar structure to reach in order to be classified as a sugar moiety if the number of exocyclic oxygen atoms should be evaluated.int
getLinearSugarCandidateMaxSizeSetting()
Returns the currently set maximum number of carbon atoms a linear sugar candidate can have in order to be detected as a sugar moiety (and subsequently be removed).int
getLinearSugarCandidateMinSizeSetting()
Returns the currently set minimum number of carbon atoms a linear sugar candidate must have in order to be detected as a sugar moiety (and subsequently be removed).List<IAtomContainer>
getLinearSugarCandidates(IAtomContainer moleculeParam)
Extracts linear sugar moieties from the given molecule, according to the current settings for linear sugar detection.List<IAtomContainer>
getLinearSugarPatternsList()
Returns a list of atom containers representing the linear sugar structures an input molecule is scanned for in linear sugar detection.int
getNumberOfCircularAndLinearSugars(IAtomContainer moleculeParam)
Detects circular and linear sugar moieties in the given molecule according to the current settings for circular and linear sugar detection and returns the number of detected moieties.int
getNumberOfCircularSugars(IAtomContainer moleculeParam)
Detects circular sugar moieties in the given molecule according to the current settings for circular sugar detection and returns the number of detected moieties.int
getNumberOfLinearSugars(IAtomContainer moleculeParam)
Detects linear sugar moieties in the given molecule according to the current settings for linear sugar detection and returns the number of detected moieties.SugarRemovalUtility.PreservationMode
getPreservationModeSetting()
Returns the current setting for how to determine whether a substructure that gets disconnected from the molecule during the removal of a sugar moiety should be preserved or can get removed along with the sugar.int
getPreservationModeThresholdSetting()
Returns the current threshold of e.g.boolean
hasCircularOrLinearSugars(IAtomContainer moleculeParam)
Detects circular and linear sugar moieties in the given molecule, according to the current settings for sugar detection.boolean
hasCircularSugars(IAtomContainer moleculeParam)
Detects circular sugar moieties in the given molecule, according to the current settings for circular sugar detection.protected boolean
hasGlycosidicBond(IAtomContainer ringToTest, IAtomContainer originalMolecule)
Checks all exocyclic connections of the given ring to detect an O-glycosidic bond.boolean
hasLinearSugars(IAtomContainer moleculeParam)
Detects linear sugar moieties in the given molecule, according to the current settings for linear sugar detection.protected boolean
isMoleculeEmptyAfterRemovalOfThisRing(IAtomContainer ringParam, IAtomContainer moleculeParam)
Checks whether the given molecule would be empty after removal of the given ring.boolean
isQualifiedForGlycosidicBondExemption(IAtomContainer moleculeParam)
Tests whether the given molecule qualifies for the glycosidic bond exemption.protected boolean
isTerminal(IAtomContainer substructure, IAtomContainer parentMolecule, List<IAtomContainer> candidateList)
Checks whether the given substructure is terminal (i.e.protected boolean
isTooSmallToPreserve(IAtomContainer moleculeParam)
Checks whether the given molecule or structure is too small to be kept according to the current preservation mode and threshold setting.List<IAtomContainer>
removeAndReturnCircularAndLinearSugars(IAtomContainer moleculeParam)
Removes circular and linear sugar moieties from the given atom container and returns the resulting aglycone (at list index 0) and the removed sugar moieties (this operation modifies the given atom container, so clone the object beforehand if you need to preserve its original structure!).List<IAtomContainer>
removeAndReturnCircularSugars(IAtomContainer moleculeParam)
Removes circular sugar moieties from the given atom container and returns the resulting aglycone (at list index 0) and the removed circular sugar moieties (this operation modifies the given atom container, so clone the object beforehand if you need to preserve its original structure!).List<IAtomContainer>
removeAndReturnLinearSugars(IAtomContainer moleculeParam)
Removes linear sugar moieties from the given atom container and returns the resulting aglycone (at list index 0) and the removed linear sugar moieties (this operation modifies the given atom container, so clone the object beforehand if you need to preserve its original structure!).protected void
removeAtomsOfCircularSugarsFromCandidates(List<IAtomContainer> candidateList, IAtomContainer parentMolecule)
Removes all atoms belonging to possible circular sugars, as returned by the method for initial circular sugar detection, from the given linear sugar candidates.boolean
removeCircularAndLinearSugars(IAtomContainer moleculeParam)
Removes circular and linear sugar moieties from the given atom container (this operation modifies the given atom container, so clone the object beforehand if you need to preserve its original structure!).boolean
removeCircularSugarFromPatternsList(String smilesCode)
Allows to remove a sugar ring pattern (represented as SMILES string) from the list of circular sugar structures an input molecule is scanned for in circular sugar detection.boolean
removeCircularSugarFromPatternsList(IAtomContainer circularSugar)
Allows to remove a sugar ring from the list of circular sugar structures an input molecule is scanned for in circular sugar detection.boolean
removeCircularSugars(IAtomContainer moleculeParam)
Removes circular sugar moieties from the given atom container (this operation modifies the given atom container, so clone the object beforehand if you need to preserve its original structure!).protected void
removeCyclicAtomsFromSugarCandidates(List<IAtomContainer> candidateList, IAtomContainer moleculeParam)
Removes all atoms that are part of a cycle from the given linear sugar candidates.boolean
removeLinearSugarFromPatternsList(String smilesCode)
Allows to remove a linear sugar pattern (represented as SMILES string) from the list of linear sugar structures an input molecule is scanned for in linear sugar detection.boolean
removeLinearSugarFromPatternsList(IAtomContainer linearSugar)
Allows to remove a linear sugar pattern from the list of linear sugar structures an input molecule is scanned for in linear sugar detection.boolean
removeLinearSugars(IAtomContainer moleculeParam)
Removes linear sugar moieties from the given atom container (this operation modifies the given atom container, so clone the object beforehand if you need to preserve its original structure!).protected List<IAtomContainer>
removeSugarCandidates(IAtomContainer moleculeParam, List<IAtomContainer> candidateList)
Removes the given sugar moieties (or substructures in general) from the given molecule and returns the removed moieties (not the aglycone!).protected List<IAtomContainer>
removeTooSmallAndTooLargeCandidates(List<IAtomContainer> candidateList)
Discards all linear sugar candidates that are too small or too big according to the current settings.protected void
removeTooSmallDisconnectedStructures(IAtomContainer moleculeParam)
Removes all unconnected fragments that are too small to keep according to the current preservation mode and threshold setting.void
restoreDefaultSettings()
Sets all settings to their default values (see public static constants or enquire via get/is methods).void
setDetectCircularSugarsOnlyWithEnoughExocyclicOxygenAtomsSetting(boolean bool)
Sets the option to only detect (and subsequently remove) circular sugars that have a sufficient number of attached, exocyclic, single-bonded oxygen atoms.void
setDetectCircularSugarsOnlyWithOGlycosidicBondSetting(boolean bool)
Sets the option to only detect (and subsequently remove) circular sugar moieties that are attached to the parent structure or other sugar moieties via an O-glycosidic bond.void
setDetectCircularSugarsWithKetoGroupsSetting(boolean bool)
Sets the option to detect potential sugar cycles with keto groups as circular sugars in circular sugar detection.void
setDetectLinearAcidicSugarsSetting(boolean bool)
Sets the option to include linear acidic sugar patterns in the linear sugar structures used for initial detection of linear sugars in a given molecule.void
setDetectLinearSugarsInRingsSetting(boolean bool)
Sets the option to detect linear sugar structures that are part of a ring.void
setDetectSpiroRingsAsCircularSugarsSetting(boolean bool)
Sets the option to include spiro rings in the initial set of detected rings considered for circular sugar detection.boolean
setExocyclicOxygenAtomsToAtomsInRingRatioThresholdSetting(double threshold)
Sets the minimum ratio of attached, exocyclic, single-bonded oxygen atoms to the number of atoms in the candidate circular sugar structure to reach in order to be classified as a sugar moiety if the number of exocyclic oxygen atoms should be evaluated.void
setLinearSugarCandidateMaxSizeSetting(int maxSize)
Sets the maximum number of carbon atoms a linear sugar candidate can have in order to be detected as a sugar moiety (and subsequently be removed).void
setLinearSugarCandidateMinSizeSetting(int minSize)
Sets the minimum number of carbon atoms a linear sugar candidate must have in order to be detected as a sugar moiety (and subsequently be removed).void
setPreservationModeSetting(SugarRemovalUtility.PreservationMode option)
Sets the preservation mode for structures that get disconnected by sugar removal and the preservation mode threshold is set to the default value of the given enum constant.void
setPreservationModeThresholdSetting(int threshold)
Sets the preservation mode threshold, i.e.void
setRemoveOnlyTerminalSugarsSetting(boolean bool)
Sets the option to remove only terminal sugar moieties, i.e.protected List<IAtomContainer>
splitEtherEsterAndPeroxideBonds(List<IAtomContainer> candidateList)
Splits all ether, ester, and peroxide bonds in the given linear sugar candidates and separates those that get disconnected in the process.protected void
updateLinearSugarPatterns()
All linear sugar patterns represented by atom containers in the respective list are sorted, parsed into actual pattern objects, and stored in the internal list for initial linear sugar detection.
-
-
-
Field Detail
-
INDEX_PROPERTY_KEY
public static final String INDEX_PROPERTY_KEY
Property key for index that is added to any IAtom object in a given IAtomContainer object for internal unique identification of the respective IAtom object. For internal use only.- See Also:
- Constant Field Values
-
IS_SPIRO_ATOM_PROPERTY_KEY
public static final String IS_SPIRO_ATOM_PROPERTY_KEY
Key for property that is added to IAtom objects that connect a spiro ring system for identification and preservation of these atoms in the removal process. For internal use only.- See Also:
- Constant Field Values
-
TOO_SMALL_DISCONNECTED_PART_TO_PRESERVE_PROPERTY_KEY
public static final String TOO_SMALL_DISCONNECTED_PART_TO_PRESERVE_PROPERTY_KEY
Key for property that is added to IAtom objects that belong to unconnected parts of an input molecule that would be cleared away because it is too small according to the set preservation mode for preservation of these atoms in the removal process. For internal use only.- See Also:
- Constant Field Values
-
LINEAR_SUGARS_SMILES
protected static final String[] LINEAR_SUGARS_SMILES
Linear sugar structures represented as SMILES codes. An input molecule is scanned for these substructures for the detection of linear sugars. This set consists of multiple aldoses, ketoses, and sugar alcohols with sizes between 3 and 7 carbons. Additional structures can be added or specific ones removed from the set at run-time using the respective methods.
-
LINEAR_ACIDIC_SUGARS_SMILES
protected static final String[] LINEAR_ACIDIC_SUGARS_SMILES
Linear acidic sugar structures represented as SMILES codes. These can be optionally added to the linear sugar structures used for initial detection of linear sugars in an input molecule.
-
CIRCULAR_SUGARS_SMILES
protected static final String[] CIRCULAR_SUGARS_SMILES
Circular sugar structures represented as SMILES codes. The isolated rings of an input molecule are matched with these structures for the detection of circular sugars. The structures listed here only represent the circular part of sugar rings (i.e. one oxygen atom and multiple carbon atoms). Common exocyclic structures like hydroxy groups are not part of the patterns and therefore not part of the detected circular sugar moieties. The set includes tetrahydrofuran, tetrahydropyran, and oxepane to match furanoses, pyranoses, and heptoses per default. It can be configured at run-time using the respective methods.
-
DETECT_CIRCULAR_SUGARS_ONLY_WITH_O_GLYCOSIDIC_BOND_DEFAULT
public static final boolean DETECT_CIRCULAR_SUGARS_ONLY_WITH_O_GLYCOSIDIC_BOND_DEFAULT
Default setting for whether only circular sugar moieties that are attached to the parent structure or other sugar moieties via an O-glycosidic bond should be detected and subsequently removed (default: false).- See Also:
- Constant Field Values
-
REMOVE_ONLY_TERMINAL_SUGARS_DEFAULT
public static final boolean REMOVE_ONLY_TERMINAL_SUGARS_DEFAULT
Default setting for whether only terminal sugar moieties should be removed, i.e. those that when removed do not cause a split of the remaining molecular structure into two or more disconnected substructures (default: true).- See Also:
- Constant Field Values
-
PRESERVATION_MODE_DEFAULT
public static final SugarRemovalUtility.PreservationMode PRESERVATION_MODE_DEFAULT
Default setting for how to determine whether a substructure that gets disconnected from the molecule during the removal of a sugar moiety should be preserved or can get removed along with the sugar. (default: preserve all structures that consist of 5 or more heavy atoms). The set option plays a major role in discriminating terminal and non-terminal sugar moieties. The minimum value to reach for the respective characteristic to judge by is set in an additional option and all enum constants have their own default values. See the PreservationMode enum.
-
DETECT_CIRCULAR_SUGARS_ONLY_WITH_ENOUGH_EXOCYCLIC_OXYGEN_ATOMS_DEFAULT
public static final boolean DETECT_CIRCULAR_SUGARS_ONLY_WITH_ENOUGH_EXOCYCLIC_OXYGEN_ATOMS_DEFAULT
Default setting for whether detected circular sugar candidates must have a sufficient number of attached, single-bonded exocyclic oxygen atoms in order to be detected as a sugar moiety (default: true). The 'sufficient number' is defined in another option / default setting.- See Also:
- Constant Field Values
-
EXOCYCLIC_OXYGEN_ATOMS_TO_ATOMS_IN_RING_RATIO_THRESHOLD_DEFAULT
public static final double EXOCYCLIC_OXYGEN_ATOMS_TO_ATOMS_IN_RING_RATIO_THRESHOLD_DEFAULT
Default setting for the minimum ratio of attached exocyclic, single-bonded oxygen atoms to the number of atoms in the candidate circular sugar structure to reach in order to be classified as a sugar moiety if the number of exocyclic oxygen atoms should be evaluated (default: 0.5 so at a minimum 3 connected, exocyclic oxygen atoms for a six-membered ring, for example).- See Also:
- Constant Field Values
-
DETECT_LINEAR_SUGARS_IN_RINGS_DEFAULT
public static final boolean DETECT_LINEAR_SUGARS_IN_RINGS_DEFAULT
Default setting for whether linear sugar structures that are part of a ring should be detected (default: false). This setting is important for e.g. macrocycles that contain sugars or pseudosugars.- See Also:
- Constant Field Values
-
LINEAR_SUGAR_CANDIDATE_MIN_SIZE_DEFAULT
public static final int LINEAR_SUGAR_CANDIDATE_MIN_SIZE_DEFAULT
Default setting for the minimum number of carbon atoms a linear sugar candidate must have in order to be detected as a sugar moiety (and subsequently be removed, default: 4, inclusive).- See Also:
- Constant Field Values
-
LINEAR_SUGAR_CANDIDATE_MAX_SIZE_DEFAULT
public static final int LINEAR_SUGAR_CANDIDATE_MAX_SIZE_DEFAULT
Default setting for the maximum number of carbon atoms a linear sugar candidate can have in order to be detected as a sugar moiety (and subsequently be removed, default: 7, inclusive).- See Also:
- Constant Field Values
-
DETECT_LINEAR_ACIDIC_SUGARS_DEFAULT
public static final boolean DETECT_LINEAR_ACIDIC_SUGARS_DEFAULT
Default setting for whether to include the linear acidic sugar patterns in the linear sugar structures used for initial detection of linear sugars in a given molecule (default: false).- See Also:
- Constant Field Values
-
DETECT_SPIRO_RINGS_AS_CIRCULAR_SUGARS_DEFAULT
public static final boolean DETECT_SPIRO_RINGS_AS_CIRCULAR_SUGARS_DEFAULT
Default setting for whether to include spiro rings in the initial set of detected rings considered for circular sugar detection (default: false). If the option is turned on and a spiro sugar ring is removed, its atom connecting it to another ring is preserved.- See Also:
- Constant Field Values
-
DETECT_CIRCULAR_SUGARS_WITH_KETO_GROUPS_DEFAULT
public static final boolean DETECT_CIRCULAR_SUGARS_WITH_KETO_GROUPS_DEFAULT
Default setting for whether sugar-like rings that have keto groups should also be detected as circular sugars (default: false). The general rule specified in the original algorithm description is that every potential sugar cycle with an exocyclic double or triple bond is excluded from circular sugar detection. If this option is turned on, an exemption to this rule is made for potential sugar cycles having keto groups. Also, the double-bound oxygen atoms will then count for the number of connected oxygen atoms and the algorithm will not regard how many keto groups are attached to the cycle (might be only one, might be that all connected oxygen atoms are double-bound). If this option is turned off (default), every sugar-like cycle with an exocyclic double or triple bond will be excluded from the detected circular sugars, as it is specified in the original algorithm description.- See Also:
- Constant Field Values
-
ESTER_SMARTS_PATTERN
protected static final SmartsPattern ESTER_SMARTS_PATTERN
Daylight SMARTS pattern for matching ester bonds between linear sugars. Defines an aliphatic carbon atom connected to a double-bonded oxygen atom and a single-bonded oxygen atom that must not be in a ring and is connected to another aliphatic carbon atom via a single bond. The oxygen atom must not be in a ring to avoid breaking circular sugars.
-
ETHER_SMARTS_PATTERN
protected static final SmartsPattern ETHER_SMARTS_PATTERN
Daylight SMARTS pattern for matching ether bonds between linear sugars. Defines an aliphatic carbon atom connected via single bond to an oxygen atom that must not be in a ring and is in turn connected to another aliphatic carbon atom. The oxygen atom must not be in a ring to avoid breaking circular sugars. This pattern also matches ester bonds which is why esters must be detected and processed before ethers.
-
PEROXIDE_SMARTS_PATTERN
protected static final SmartsPattern PEROXIDE_SMARTS_PATTERN
Daylight SMARTS pattern for matching peroxide bonds between linear sugars. Defines an aliphatic carbon atom connected via single bond to an oxygen atom that must not be in a ring and is connected to another oxygen atom of the same kind, followed by another aliphatic carbon atom. Even tough it is highly unlikely for a peroxide bond to be in a ring, every ring should be preserved.
-
-
Constructor Detail
-
SugarRemovalUtility
public SugarRemovalUtility(IChemObjectBuilder builder)
Sole constructor of this class. All settings are set to their default values (see public static constants or enquire via get/is methods). To change these settings, use the respective 'setXY()' methods.- Parameters:
builder
- IChemObjectBuilder for i.a. parsing SMILES strings of sugar patterns into atom containers
-
-
Method Detail
-
getLinearSugarPatternsList
public List<IAtomContainer> getLinearSugarPatternsList()
Returns a list of atom containers representing the linear sugar structures an input molecule is scanned for in linear sugar detection. The returned list represents the current state of this list, i.e. externally added structures are included, externally removed structures not, and the linear acidic sugar structures are only included if the respective option is activated.
Note: do not change the returned list but useaddLinearSugarToPatternsList(IAtomContainer)
andremoveLinearSugarFromPatternsList(IAtomContainer)
to modify it (or the corresponding SMILES string-based methods) because these sync updates with the actually used list of DfPattterns.- Returns:
- a list of atom containers representing a current snapshot of the linear sugar patterns used for detection
-
getCircularSugarPatternsList
public List<IAtomContainer> getCircularSugarPatternsList()
Returns a list of atom containers representing the circular sugar structures an input molecule is scanned for in circular sugar detection. The returned list represents the current state of this list, i.e. externally added structures are included, externally removed structures are not.
Note: do not change the returned list but useaddCircularSugarToPatternsList(IAtomContainer)
andremoveCircularSugarFromPatternsList(IAtomContainer)
to modify it (or the corresponding SMILES string-based methods) because these sync updates with the actually used list of DfPattterns.- Returns:
- a list of atom containers representing a current snapshot of the circular sugar patterns used for detection
-
areOnlyCircularSugarsWithOGlycosidicBondDetected
public boolean areOnlyCircularSugarsWithOGlycosidicBondDetected()
Specifies whether only circular sugar moieties that are attached to the parent structure or other sugar moieties via an O-glycosidic bond should be detected and subsequently removed.- Returns:
- true if only circular sugar moieties connected via a glycosidic bond are removed according to the current settings
-
areOnlyTerminalSugarsRemoved
public boolean areOnlyTerminalSugarsRemoved()
Specifies whether only terminal sugar moieties should be removed, i.e. those that when removed do not cause a split of the remaining molecular structure into two or more disconnected substructures.- Returns:
- true if only terminal sugar moieties are removed according to the current settings
-
getPreservationModeSetting
public SugarRemovalUtility.PreservationMode getPreservationModeSetting()
Returns the current setting for how to determine whether a substructure that gets disconnected from the molecule during the removal of a sugar moiety should be preserved or can get removed along with the sugar. This can e.g. be judged by its heavy atom count or its molecular weight, or it can be specified that all structures are to be preserved. If too small / too light structures are discarded, an additional threshold is specified in the preservation mode threshold setting that the structures have to reach in order to be preserved (i.e. to be judged 'big/heavy enough').- Returns:
- a PreservationMode enum object representing the current setting
-
getPreservationModeThresholdSetting
public int getPreservationModeThresholdSetting()
Returns the current threshold of e.g. molecular weight or heavy atom count (depending on the currently set preservation mode) a substructure that gets disconnected from the molecule by the removal of a sugar moiety has to reach in order to be preserved and not discarded.- Returns:
- an integer specifying the currently set threshold (either specified in Da or number of heavy atoms)
-
areOnlyCircularSugarsWithEnoughExocyclicOxygenAtomsDetected
public boolean areOnlyCircularSugarsWithEnoughExocyclicOxygenAtomsDetected()
Specifies whether detected circular sugar candidates must have a sufficient number of attached exocyclic oxygen atoms in order to be detected as a sugar moiety. If this option is set, the circular sugar candidates have to reach an additionally specified minimum ratio of said oxygen atoms to the number of atoms in the respective ring in order to be seen as a sugar ring and being subsequently removed. See exocyclic oxygen atoms to atoms in ring ratio threshold setting.- Returns:
- true, if the ratio of attached, exocyclic, single-bonded oxygen atoms to the number of atoms in the candidate sugar ring is evaluated at circular sugar detection according to the current settings
-
getExocyclicOxygenAtomsToAtomsInRingRatioThresholdSetting
public double getExocyclicOxygenAtomsToAtomsInRingRatioThresholdSetting()
Returns the currently set minimum ratio of attached, exocyclic, single-bonded oxygen atoms to the number of atoms in the candidate circular sugar structure to reach in order to be classified as a sugar moiety if the number of exocyclic oxygen atoms should be evaluated.- Returns:
- the minimum ratio of attached oxygen atoms to the number of atoms in the sugar ring; A value of e.g. 0.5 means that a six-membered sugar ring needs at least 3 attached oxygen atoms to be classified as a circular sugar moiety
-
areLinearSugarsInRingsDetected
public boolean areLinearSugarsInRingsDetected()
Specifies whether linear sugar structures that are part of a ring should be detected according to the current settings. This setting is important for e.g. macrocycles that contain sugars or pseudosugars.
Note that potential circular sugar candidates (here always including spiro sugar rings also) are filtered from linear sugar candidates, even with this setting turned on.- Returns:
- true if linear sugars in rings are detected and removed with the current settings
-
getLinearSugarCandidateMinSizeSetting
public int getLinearSugarCandidateMinSizeSetting()
Returns the currently set minimum number of carbon atoms a linear sugar candidate must have in order to be detected as a sugar moiety (and subsequently be removed).- Returns:
- the set minimum carbon atom count of detected linear sugars (inclusive)
-
getLinearSugarCandidateMaxSizeSetting
public int getLinearSugarCandidateMaxSizeSetting()
Returns the currently set maximum number of carbon atoms a linear sugar candidate can have in order to be detected as a sugar moiety (and subsequently be removed).- Returns:
- the set maximum carbon atom count of detected linear sugars (inclusive)
-
areLinearAcidicSugarsDetected
public boolean areLinearAcidicSugarsDetected()
Specifies whether linear acidic sugar patterns are currently included in the linear sugar structures used for initial detection of linear sugars in a given molecule.- Returns:
- true if acidic sugars are detected
-
areSpiroRingsDetectedAsCircularSugars
public boolean areSpiroRingsDetectedAsCircularSugars()
Specifies whether spiro rings are included in the initial set of detected rings considered for circular sugar detection.
Note for linear sugar detection: Here, the spiro rings will always be filtered along with the potential circular sugar candidates.- Returns:
- true if spiro rings can be detected as circular sugars with the current settings
-
areCircularSugarsWithKetoGroupsDetected
public boolean areCircularSugarsWithKetoGroupsDetected()
Specifies whether potential sugar cycles with keto groups are detected in circular sugar detection. The general rule specified in the original algorithm description is that every potential sugar cycle with an exocyclic double or triple bond is excluded from circular sugar detection. If this option is turned on, an exemption to this rule is made for potential sugar cycles having keto groups. Also, the double-bound oxygen atoms will then count for the number of connected oxygen atoms and the algorithm will not regard how many keto groups are attached to the cycle (might be only one, might be that all connected oxygen atoms are double-bound). If this option is turned off, every sugar-like cycle with an exocyclic double or triple bond will be excluded from the detected circular sugars, as it is specified in the original algorithm description.- Returns:
- true if potential sugar cycles having keto groups are detected in circular sugar detection
-
addCircularSugarToPatternsList
public boolean addCircularSugarToPatternsList(IAtomContainer circularSugar)
Allows to add an additional sugar ring to the list of circular sugar structures an input molecule is scanned for in circular sugar detection. The given structure must not be isomorphic to the already present ones, and it must contain exactly one isolated ring without any exocyclic moieties because only the isolated rings of an input structure are matched with the circular sugar patterns.- Parameters:
circularSugar
- an atom container representing only one isolated sugar ring- Returns:
- true if the addition was successful; false if the given atom container is empty or does represent a molecule that contains no isolated ring, more than one isolated ring, consists of more structures than one isolated ring or is isomorphic to a circular sugar structure already present
-
addCircularSugarToPatternsList
public boolean addCircularSugarToPatternsList(String smilesCode)
Allows to add an additional sugar ring (represented as a SMILES string) to the list of circular sugar structures an input molecule is scanned for in circular sugar detection. The given structure must not be isomorphic to the already present ones, and it must contain exactly one isolated ring without any exocyclic moieties because only the isolated rings of an input structure are matched with the circular sugar patterns.- Parameters:
smilesCode
- a SMILES code representation of a molecule consisting of only one isolated sugar ring- Returns:
- true if the addition was successful; false if the given SMILES string is empty or does represent a molecule that contains no isolated ring, more than one isolated ring, consists of more structures than one isolated ring, is isomorphic to a circular sugar structure already present or if the given SMILES string cannot be parsed into a molecular structure
-
addLinearSugarToPatternsList
public boolean addLinearSugarToPatternsList(IAtomContainer linearSugar)
Allows to add an additional linear sugar to the list of linear sugar structures an input molecule is scanned for in linear sugar detection. The given structure must not be isomorphic to the already present ones or the patterns for circular sugars.
Note: If the given structure contains cycles, the option to detect linear sugars in rings needs to be enabled to detect its matches entirely. Otherwise, all circular substructures of the 'linear sugars' will not be detected.
Additional note: If the given structure is isomorphic to a default linear acidic sugar pattern, it may be added here when the option to detect these structures is turned off but will be removed from the pattern list if the option is turned on and off again after this addition.- Parameters:
linearSugar
- an atom container representing a molecular structure to search for at linear sugar detection- Returns:
- true if the addition was successful; false if the given atom container is empty or is isomorphic to a linear sugar structure already present or a circular sugar pattern
-
addLinearSugarToPatternsList
public boolean addLinearSugarToPatternsList(String smilesCode)
Allows to add an additional linear sugar (represented as SMILES string) to the list of linear sugar structures an input molecule is scanned for in linear sugar detection. The given structure must not be isomorphic to the already present ones or the patterns for circular sugars.
Note: If the given structure contains cycles, the option to detect linear sugars in rings needs to be enabled to detect its matches entirely. Otherwise, all circular substructures of the 'linear sugars' will not be detected.
Additional note: If the given structure is isomorphic to a default linear acidic sugar pattern, it may be added here when the option to detect these structures is turned off but will be removed from the pattern list if the option is turned on and off again after this addition.- Parameters:
smilesCode
- a SMILES code representation of a molecular structure to search for- Returns:
- true if the addition was successful; false if the given SMILES string is empty or does represent a molecule that is isomorphic to a linear sugar structure already present or a circular sugar pattern or if it cannot be parsed into a molecular structure
-
removeCircularSugarFromPatternsList
public boolean removeCircularSugarFromPatternsList(String smilesCode)
Allows to remove a sugar ring pattern (represented as SMILES string) from the list of circular sugar structures an input molecule is scanned for in circular sugar detection. The given character string must be a valid SMILES notation and be isomorphic to one of the currently used structure patterns. Example usage: Pass the argument "C1CCOC1" (tetrahydrofuran) to stop detecting furanoses in the circular sugar detection algorithm.- Parameters:
smilesCode
- a SMILES code representation of a structure present in the circular sugar pattern list- Returns:
- true if the removal was successful; false if the given SMILES string is empty or cannot be parsed into a molecule or the given structure cannot be found in the circular sugar pattern list
-
removeCircularSugarFromPatternsList
public boolean removeCircularSugarFromPatternsList(IAtomContainer circularSugar)
Allows to remove a sugar ring from the list of circular sugar structures an input molecule is scanned for in circular sugar detection. The given molecule must be isomorphic to one of the currently used structure patterns. Example usage: Pass an atom container object representing the structure of tetrahydrofuran to stop detecting furanoses in the circular sugar detection algorithm.- Parameters:
circularSugar
- a molecule isomorphic to a structure present in the circular sugar pattern list- Returns:
- true if the removal was successful; false if the given atom container is empty or its structure is not isomorphic to a circular sugar pattern structure in use
-
removeLinearSugarFromPatternsList
public boolean removeLinearSugarFromPatternsList(String smilesCode)
Allows to remove a linear sugar pattern (represented as SMILES string) from the list of linear sugar structures an input molecule is scanned for in linear sugar detection. The given character string must be a valid SMILES notation and be isomorphic to one of the currently used structure patterns. Example usage: Pass the argument "C(C(C=O)O)O" (aldotriose) to stop detecting such small aldoses in the linear sugar detection algorithm. Please note that adjusting the linear sugar candidate minimum and maximum sizes can be more straightforward than removing patterns here.
Note: If the linear acidic sugars are currently included in the linear sugar pattern structures, individual structures of this group can be removed here.- Parameters:
smilesCode
- a SMILES code representation of a structure present in the linear sugar pattern list- Returns:
- true if the removal was successful; false if the given SMILES string is empty or cannot be parsed into a molecule or the given structure cannot be found in the linear sugar pattern list
-
removeLinearSugarFromPatternsList
public boolean removeLinearSugarFromPatternsList(IAtomContainer linearSugar)
Allows to remove a linear sugar pattern from the list of linear sugar structures an input molecule is scanned for in linear sugar detection. The given molecule must be isomorphic to one of the currently used structure patterns. Example usage: Pass an atom container object representing the structure of aldotriose to stop detecting such small aldoses in the linear sugar detection algorithm. Please note that adjusting the linear sugar candidate minimum and maximum sizes can be more straightforward than removing patterns here.
Note: If the linear acidic sugars are currently included in the linear sugar pattern structures, individual structures of this group can be removed here.- Parameters:
linearSugar
- a molecule isomorphic to a structure present in the linear sugar pattern list- Returns:
- true if the removal was successful; false if the given atom container is empty or its structure is not isomorphic to a linear sugar pattern structure in use
-
clearCircularSugarPatternsList
public void clearCircularSugarPatternsList()
Clears all the circular sugar structures an input molecule is scanned for in circular sugar detection.
-
clearLinearSugarPatternsList
public void clearLinearSugarPatternsList()
Clears all the linear sugar structures an input molecule is scanned for in linear sugar detection. If the detection of linear acidic sugars is turned on, it is turned off in this method and these structures are also cleared from the linear sugar patterns.
-
setDetectCircularSugarsOnlyWithOGlycosidicBondSetting
public void setDetectCircularSugarsOnlyWithOGlycosidicBondSetting(boolean bool)
Sets the option to only detect (and subsequently remove) circular sugar moieties that are attached to the parent structure or other sugar moieties via an O-glycosidic bond.- Parameters:
bool
- true, if only circular sugar moieties connected via a glycosidic bond should be detected (and removed)
-
setRemoveOnlyTerminalSugarsSetting
public void setRemoveOnlyTerminalSugarsSetting(boolean bool)
Sets the option to remove only terminal sugar moieties, i.e. those that when removed do not cause a split of the remaining molecular structure into two or more disconnected substructures.- Parameters:
bool
- true, if only terminal sugar moieties should be removed
-
setPreservationModeSetting
public void setPreservationModeSetting(SugarRemovalUtility.PreservationMode option)
Sets the preservation mode for structures that get disconnected by sugar removal and the preservation mode threshold is set to the default value of the given enum constant. The preservation mode option specifies how to determine whether a substructure that gets disconnected from the molecule during the removal of a sugar moiety should be preserved or can get removed along with the sugar. This can e.g. be judged by its heavy atom count or its molecular weight, or it can be specified that all structures are to be preserved. The available options can be selected from the PreservationMode enum. If too small / too light structures are discarded, an additional threshold is specified in the preservation mode threshold setting that the structures have to reach in order to be preserved (i.e. to be judged 'big/heavy enough'). This threshold is set to the default value of the given enum constant in this method.
Note that if the option "ALL" is combined with the removal of only terminal moieties, even the smallest attached structure will prevent the removal of a sugar. The most important consequence is that circular sugars with any hydroxy groups will not be removed because these are not considered as part of the sugar moiety.- Parameters:
option
- the selected preservation mode option
-
setPreservationModeThresholdSetting
public void setPreservationModeThresholdSetting(int threshold)
Sets the preservation mode threshold, i.e. the molecular weight or heavy atom count (depending on the currently set preservation mode) a substructure that gets disconnected from the molecule during the removal of a sugar moiety has to reach in order to be kept and not removed along with the sugar. If the preservation mode is set to "HEAVY_ATOM_COUNT", the threshold is interpreted as the needed minimum number of heavy atoms and if it is set to "MOLECULAR_WEIGHT", the threshold is interpreted as minimum molecular weight in Da.
Notes: A threshold of zero can be set here, but it is recommended to choose the preservation mode "ALL" instead. On the other hand, if the preservation mode is set to "ALL", this threshold is automatically set to zero.- Parameters:
threshold
- the new threshold
-
setDetectCircularSugarsOnlyWithEnoughExocyclicOxygenAtomsSetting
public void setDetectCircularSugarsOnlyWithEnoughExocyclicOxygenAtomsSetting(boolean bool)
Sets the option to only detect (and subsequently remove) circular sugars that have a sufficient number of attached, exocyclic, single-bonded oxygen atoms. If this option is set, the circular sugar candidates have to reach an additionally specified minimum ratio of said oxygen atoms to the number of atoms in the respective ring in order to be seen as a sugar ring and being subsequently removed. See exocyclic oxygen atoms to atoms in ring ratio threshold setting. If this option is re-activated, the previously set threshold is used again or the default value if no custom threshold has been set before.- Parameters:
bool
- true, if the ratio of attached, exocyclic, single-bonded oxygen atoms to the number of atoms in the candidate sugar ring should be evaluated at circular sugar detection
-
setExocyclicOxygenAtomsToAtomsInRingRatioThresholdSetting
public boolean setExocyclicOxygenAtomsToAtomsInRingRatioThresholdSetting(double threshold)
Sets the minimum ratio of attached, exocyclic, single-bonded oxygen atoms to the number of atoms in the candidate circular sugar structure to reach in order to be classified as a sugar moiety if the number of exocyclic oxygen atoms should be evaluated.
A ratio of e.g. 0.5 means that a six-membered candidate sugar ring needs to have at least 3 attached, exocyclic single-bonded oxygen atoms in order to be classified as a circular sugar.
A zero value can be given if the option to remove only sugar rings with a sufficient number of exocyclic oxygen atoms is activated, but it is recommended to turn this option of instead.
Note: The normally present oxygen atom within a sugar ring is included in the number of ring atoms. So setting the threshold to 1.0 implies that at least one of the carbon atoms in the ring has two attached oxygen atoms. In general, the threshold can be set to values higher than 1.0 or even to negative values, but it does not make a lot of sense.- Parameters:
threshold
- the new ratio threshold- Returns:
- true if updating the value was successful; false if the given number is infinite, 'NaN', or smaller than 0
-
setDetectLinearSugarsInRingsSetting
public void setDetectLinearSugarsInRingsSetting(boolean bool)
Sets the option to detect linear sugar structures that are part of a ring. This setting is important for e.g. macrocycles that contain sugars or pseudosugars.
Note that potential circular sugar candidates (here always including spiro sugar rings also) are filtered from linear sugar candidates, even with this setting turned on.- Parameters:
bool
- true, if linear sugar structures that are part of a ring should be detected (and removed)
-
setLinearSugarCandidateMinSizeSetting
public void setLinearSugarCandidateMinSizeSetting(int minSize)
Sets the minimum number of carbon atoms a linear sugar candidate must have in order to be detected as a sugar moiety (and subsequently be removed). This minimum is inclusive and does not affect the initial detection of linear sugars. Only at the end of the algorithm, linear sugar candidates that are too small are discarded.
Note: It is not tested whether the given minimum size is actually smaller than the set maximum size to allow a user-friendly adjustment of these parameters without having to adhere to a certain order of operations.- Parameters:
minSize
- the new minimum size (inclusive) of linear sugars detected, interpreted as carbon atom count
-
setLinearSugarCandidateMaxSizeSetting
public void setLinearSugarCandidateMaxSizeSetting(int maxSize)
Sets the maximum number of carbon atoms a linear sugar candidate can have in order to be detected as a sugar moiety (and subsequently be removed). This maximum is inclusive and does not affect the initial detection of linear sugars. Only at the end of the algorithm, linear sugar candidates that are too big are discarded.
Note: It is not tested whether the given maximum size is actually greater than the set minimum size to allow a user-friendly adjustment of these parameters without having to adhere to a certain order of operations.- Parameters:
maxSize
- the new maximum size (inclusive) of linear sugars detected, interpreted as carbon atom count
-
setDetectLinearAcidicSugarsSetting
public void setDetectLinearAcidicSugarsSetting(boolean bool)
Sets the option to include linear acidic sugar patterns in the linear sugar structures used for initial detection of linear sugars in a given molecule. If the option is turned on, the linear acidic sugar patterns are added to the linear sugar patterns list and can be retrieved and configured in the same way as the 'normal' linear sugar patterns. If the option is turned off, they are all removed again from the linear sugar patterns list.- Parameters:
bool
- true, if linear acidic sugar patterns should be included in the linear sugar structures used for initial detection of linear sugars
-
setDetectSpiroRingsAsCircularSugarsSetting
public void setDetectSpiroRingsAsCircularSugarsSetting(boolean bool)
Sets the option to include spiro rings in the initial set of detected rings considered for circular sugar detection. If the option is turned on, spiro atoms connected two spiro rings will be protected if a spiro sugar ring is removed. In the opposite case, spiro rings will be filtered from the set of isolated cycles detected in the given molecule.
Note for linear sugar detection: Here, the spiro rings will always be filtered along with the potential circular sugar candidates.- Parameters:
bool
- true, if spiro rings should be detectable as circular sugars
-
setDetectCircularSugarsWithKetoGroupsSetting
public void setDetectCircularSugarsWithKetoGroupsSetting(boolean bool)
Sets the option to detect potential sugar cycles with keto groups as circular sugars in circular sugar detection. The general rule specified in the original algorithm description is that every potential sugar cycle with an exocyclic double or triple bond is excluded from circular sugar detection. If this option is turned on, an exemption to this rule is made for potential sugar cycles having keto groups. Also, the double-bound oxygen atoms will then count for the number of connected oxygen atoms and the algorithm will not regard how many keto groups are attached to the cycle (might be only one, might be that all connected oxygen atoms are double-bound). If this option is turned off, every sugar-like cycle with an exocyclic double or triple bond will be excluded from the detected circular sugars, as it is specified in the original algorithm description.- Parameters:
bool
- true, if circular sugars with keto groups should be detected
-
restoreDefaultSettings
public void restoreDefaultSettings()
Sets all settings to their default values (see public static constants or enquire via get/is methods). This includes the pattern lists for linear and circular sugars. To call this method is equivalent to using the constructor of this class.
-
hasLinearSugars
public boolean hasLinearSugars(IAtomContainer moleculeParam)
Detects linear sugar moieties in the given molecule, according to the current settings for linear sugar detection. It is not influenced by the setting specifying whether only terminal sugar moieties should be removed and not by the set preservation mode. Therefore, this method will return true even if only non-terminal linear sugar moieties are detected.- Parameters:
moleculeParam
- the atom container to scan for the presence of linear sugar moieties- Returns:
- true, if the given molecule contains linear sugar moieties
-
hasCircularSugars
public boolean hasCircularSugars(IAtomContainer moleculeParam)
Detects circular sugar moieties in the given molecule, according to the current settings for circular sugar detection. It is not influenced by the setting specifying whether only terminal sugar moieties should be removed and not by the set preservation mode. Therefore, this method will return true even if only non-terminal circular sugar moieties are detected.- Parameters:
moleculeParam
- the atom container to scan for the presence of circular sugar moieties- Returns:
- true, if the given molecule contains circular sugar moieties
-
hasCircularOrLinearSugars
public boolean hasCircularOrLinearSugars(IAtomContainer moleculeParam)
Detects circular and linear sugar moieties in the given molecule, according to the current settings for sugar detection. It is not influenced by the setting specifying whether only terminal sugar moieties should be removed and not by the set preservation mode. Therefore, this method will return true even if only non-terminal sugar moieties are detected.- Parameters:
moleculeParam
- the atom container to scan for the presence of sugar moieties- Returns:
- true, if the given molecule contains sugar moieties of any kind (circular or linear)
-
isQualifiedForGlycosidicBondExemption
public boolean isQualifiedForGlycosidicBondExemption(IAtomContainer moleculeParam)
Tests whether the given molecule qualifies for the glycosidic bond exemption. This is true for molecules that practically are single-cycle circular sugars, meaning that the molecule is empty if the sugar ring is detected and removed according to the current settings. These molecules or sugar rings do not need to have a glycosidic bond in order to be detected as a sugar ring if the option to only detect those circular sugars that have one is activated. This exemption was introduced because these molecules do not contain any other structure to bind to via a glycosidic bond.
Note: It is checked whether the sugar ring really does not have a glycosidic bond and false is returned if it does.- Parameters:
moleculeParam
- the molecule to check- Returns:
- true, if the given molecule qualifies for the exemption (it only has one sugar cycle, is empty after its removal, and does not have a glycosidic bond); false otherwise
-
getNumberOfCircularSugars
public int getNumberOfCircularSugars(IAtomContainer moleculeParam)
Detects circular sugar moieties in the given molecule according to the current settings for circular sugar detection and returns the number of detected moieties. It is not influenced by the setting specifying whether only terminal sugar moieties should be removed and not by the set preservation mode. Therefore, the return value of this method will include non-terminal moieties at all times (and terminal ones also).- Parameters:
moleculeParam
- the atom container to scan for the presence of circular sugar moieties- Returns:
- an integer representing the number of detected circular sugar moieties in the given molecule
-
getNumberOfLinearSugars
public int getNumberOfLinearSugars(IAtomContainer moleculeParam)
Detects linear sugar moieties in the given molecule according to the current settings for linear sugar detection and returns the number of detected moieties. It is not influenced by the setting specifying whether only terminal sugar moieties should be removed and not by the set preservation mode. Therefore, the return value of this method will include non-terminal moieties at all times (and terminal ones also).- Parameters:
moleculeParam
- the atom container to scan for the presence of linear sugar moieties- Returns:
- an integer representing the number of detected linear sugar moieties in the given molecule
-
getNumberOfCircularAndLinearSugars
public int getNumberOfCircularAndLinearSugars(IAtomContainer moleculeParam)
Detects circular and linear sugar moieties in the given molecule according to the current settings for circular and linear sugar detection and returns the number of detected moieties. It is not influenced by the setting specifying whether only terminal sugar moieties should be removed and not by the set preservation mode. Therefore, the return value of this method will include non-terminal moieties at all times (and terminal ones also).- Parameters:
moleculeParam
- the atom container to scan for the presence of circular and linear sugar moieties- Returns:
- an integer representing the number of detected circular and linear sugar moieties in the given molecule
-
removeCircularSugars
public boolean removeCircularSugars(IAtomContainer moleculeParam)
Removes circular sugar moieties from the given atom container (this operation modifies the given atom container, so clone the object beforehand if you need to preserve its original structure!). Which substructures are removed depends on the settings for circular sugar detection, the setting specifying whether only terminal sugar moieties should be removed and on the set preservation mode.
The aglycone will have open valences in places where removed sugar moieties were formerly situated. If you do not care for where the removed moieties were sitting, saturate with implicit hydrogen atoms.
If only terminal sugar moieties are to be removed, the detected circular sugars are one-by-one tested for whether they are terminal or not and removed if they are. The iteration starts anew after iterating over all candidates and stops if no terminal sugar was removed in one whole iteration. If only terminal sugar moieties are removed from the molecule, any disconnected structure resulting from a removal step must be too small to keep according to the set preservation mode option and the set threshold and is cleared away.
If all the circular sugar moieties are to be removed from the given molecule (including non-terminal ones), those disconnected structures that are too small are only cleared once at the end of the routine.
In the latter case, the deglycosylated molecule may consist of two or more disconnected structures after deglycosylation, whereas in the former case, the processed structure always consists of one connected structure.
If the given molecule consists only of circular sugars, an empty atom container is left after processing.
Spiro atoms connecting a removed circular sugar moiety to another cycle are preserved.- Parameters:
moleculeParam
- the molecule to remove circular sugar moieties from- Returns:
- true if sugar moieties were detected and removed
-
removeAndReturnCircularSugars
public List<IAtomContainer> removeAndReturnCircularSugars(IAtomContainer moleculeParam)
Removes circular sugar moieties from the given atom container and returns the resulting aglycone (at list index 0) and the removed circular sugar moieties (this operation modifies the given atom container, so clone the object beforehand if you need to preserve its original structure!). Which substructures are removed depends on the settings for circular sugar detection, the setting specifying whether only terminal sugar moieties should be removed and on the set preservation mode.
If only terminal sugar moieties are to be removed, the detected circular sugars are one-by-one tested for whether they are terminal or not and removed if they are. The iteration starts anew after iterating over all candidates and stops if no terminal sugar was removed in one whole iteration. If only terminal sugar moieties are removed from the molecule, any disconnected structure resulting from a removal step must be too small to keep according to the set preservation mode option and the set threshold and is cleared away.
If all the circular sugar moieties are to be removed from the given molecule (including non-terminal ones), those disconnected structures that are too small are only cleared once at the end of the routine.
In the latter case, the deglycosylated molecule (aglycone at list index 0) may consist of two or more disconnected structures when returned, whereas in the former case, the returned structure always consists of one connected structure.
If the given molecule consists only of circular sugars, an empty atom container is returned at list index 0.
The returned sugar moieties that were removed from the molecule have invalid valences at atoms formerly bonded to the molecule core or to other sugar moieties and so does the aglycone in places where removed sugar moieties were formerly situated. If you do not care for where the removed moieties were sitting, saturate with implicit hydrogen atoms.
Spiro atoms connecting a removed circular sugar moiety to another cycle are preserved.
If no sugar moieties were removed, the returned list is of size 1 and its only element is the molecule given as parameter, unchanged.- Parameters:
moleculeParam
- the molecule to remove circular sugar moieties from- Returns:
- a list of atom container objects representing the deglycosylated molecule at list index 0 and the removed circular sugar moieties at the remaining list positions. The returned aglycone may be unconnected if also non-terminal sugars are removed according to the settings, and it may be empty if the resulting structure after sugar removal was too small to preserve due to the set preservation mode and the associated threshold (i.e. the molecule basically was a sugar); the returned sugar moieties that were removed from the molecule have invalid valences at atoms formerly bonded to the molecule core or to other sugar moieties and so does the aglycone in places where removed sugar moieties were formerly situated. If you do not care for where the removed moieties were sitting, saturate with implicit hydrogen atoms.
-
removeLinearSugars
public boolean removeLinearSugars(IAtomContainer moleculeParam)
Removes linear sugar moieties from the given atom container (this operation modifies the given atom container, so clone the object beforehand if you need to preserve its original structure!). Which substructures are removed depends on the settings for linear sugar detection, the setting specifying whether only terminal sugar moieties should be removed and on the set preservation mode.
The aglycone will have open valences in places where removed sugar moieties were formerly situated. If you do not care for where the removed moieties were sitting, saturate with implicit hydrogen atoms.
If only terminal sugar moieties are to be removed, the detected linear sugars are one-by-one tested for whether they are terminal or not and removed if they are. The iteration starts anew after iterating over all candidates and stops if no terminal sugar was removed in one whole iteration. If only terminal sugar moieties are removed from the molecule, any disconnected structure resulting from a removal step must be too small to keep according to the set preservation mode option and the set threshold and is cleared away.
If all the linear sugar moieties are to be removed from the given molecule (including non-terminal ones), those disconnected structures that are too small are only cleared once at the end of the routine.
In the latter case, the deglycosylated molecule may consist of two or more disconnected structures after deglycosylation, whereas in the former case, the processed structure always consists of one connected structure.
If the given molecule consists only of linear sugars, an empty atom container is left after processing.- Parameters:
moleculeParam
- the molecule to remove linear sugar moieties from- Returns:
- true if sugar moieties were detected and removed
-
removeAndReturnLinearSugars
public List<IAtomContainer> removeAndReturnLinearSugars(IAtomContainer moleculeParam)
Removes linear sugar moieties from the given atom container and returns the resulting aglycone (at list index 0) and the removed linear sugar moieties (this operation modifies the given atom container, so clone the object beforehand if you need to preserve its original structure!). Which substructures are removed depends on the settings for linear sugar detection, the setting specifying whether only terminal sugar moieties should be removed and on the set preservation mode.
If only terminal sugar moieties are to be removed, the detected linear sugars are one-by-one tested for whether they are terminal or not and removed if they are. The iteration starts anew after iterating over all candidates and stops if no terminal sugar was removed in one whole iteration. If only terminal sugar moieties are removed from the molecule, any disconnected structure resulting from a removal step must be too small to keep according to the set preservation mode option and the set threshold and is cleared away.
If all the linear sugar moieties are to be removed from the given molecule (including non-terminal ones), those disconnected structures that are too small are only cleared once at the end of the routine.
In the latter case, the deglycosylated molecule (aglycone at list index 0) may consist of two or more disconnected structures when returned, whereas in the former case, the returned structure always consists of one connected structure.
If the given molecule consists only of linear sugars, an empty atom container is returned at list index 0.
The returned sugar moieties that were removed from the molecule have invalid valences at atoms formerly bonded to the molecule core or to other sugar moieties and so does the aglycone in places where removed sugar moieties were formerly situated. If you do not care for where the removed moieties were sitting, saturate with implicit hydrogen atoms.
If no sugar moieties were removed, the returned list is of size 1 and its only element is the molecule given as parameter, unchanged.- Parameters:
moleculeParam
- the molecule to remove linear sugar moieties from- Returns:
- a list of atom container objects representing the deglycosylated molecule at list index 0 and the removed linear sugar moieties at the remaining list positions. The returned aglycone may be unconnected if also non-terminal sugars are removed according to the settings, and it may be empty if the resulting structure after sugar removal was too small to preserve due to the set preservation mode and the associated threshold (i.e. the molecule basically was a sugar); the returned sugar moieties that were removed from the molecule have invalid valences at atoms formerly bonded to the molecule core or to other sugar moieties and so does the aglycone in places where removed sugar moieties were formerly situated. If you do not care for where the removed moieties were sitting, saturate with implicit hydrogen atoms.
-
removeCircularAndLinearSugars
public boolean removeCircularAndLinearSugars(IAtomContainer moleculeParam)
Removes circular and linear sugar moieties from the given atom container (this operation modifies the given atom container, so clone the object beforehand if you need to preserve its original structure!). Which substructures are removed depends on the settings for circular and linear sugar detection, the setting specifying whether only terminal sugar moieties should be removed and on the set preservation mode.
If only terminal sugar moieties are to be removed, the detected sugars are one-by-one tested for whether they are terminal or not and removed if they are. The iteration starts anew after iterating over all candidates and stops if no terminal sugar was removed in one whole iteration. Important note: To ensure the removal also of linear sugars that only become terminal after removing one or more terminal circular sugar and vice-versa, multiple iterations of circular and linear sugar detection and removal are done here. Therefore, this method might in special cases return another aglycone (the 'true' aglycone) than e.g. a subsequent call to the methods for separate circular and linear sugar removal.
If only terminal sugar moieties are removed from the molecule, any disconnected structure resulting from a removal step must be too small to keep according to the preservation mode option and the set threshold and is cleared away.
If all the circular and linear sugars are to be removed from the given molecule (including non-terminal ones), those disconnected structures that are too small are only cleared once at the end of the routine.
In the latter case, the deglycosylated molecule may consist of two or more disconnected structures when returned, whereas in the former case, the returned structure always consists of one connected structure.
If the given molecule consists only of sugars, an empty atom container is returned.
Spiro atoms connecting a removed circular sugar moiety to another cycle are preserved.- Parameters:
moleculeParam
- the molecule to remove circular and linear sugar moieties from- Returns:
- true if sugar moieties were detected and removed
-
removeAndReturnCircularAndLinearSugars
public List<IAtomContainer> removeAndReturnCircularAndLinearSugars(IAtomContainer moleculeParam)
Removes circular and linear sugar moieties from the given atom container and returns the resulting aglycone (at list index 0) and the removed sugar moieties (this operation modifies the given atom container, so clone the object beforehand if you need to preserve its original structure!). Which substructures are removed depends on the settings for circular and linear sugar detection, the setting specifying whether only terminal sugar moieties should be removed and on the set preservation mode.
If only terminal sugar moieties are to be removed, the detected sugars are one-by-one tested for whether they are terminal or not and removed if they are. The iteration starts anew after iterating over all candidates and stops if no terminal sugar was removed in one whole iteration. Important note: To ensure the removal also of linear sugars that only become terminal after removing one or more terminal circular sugar and vice-versa, multiple iterations of circular and linear sugar detection and removal are done here. Therefore, this method might in special cases return another aglycone (the 'true' aglycone) than e.g. a subsequent call to the methods for separate circular and linear sugar removal.
If only terminal sugar moieties are removed from the molecule, any disconnected structure resulting from a removal step must be too small to keep according to the preservation mode option and the set threshold and is cleared away.
If all the circular and linear sugars are to be removed from the given molecule (including non-terminal ones), those disconnected structures that are too small are only cleared once at the end of the routine.
In the latter case, the deglycosylated molecule (aglycone at list index 0) may consist of two or more disconnected structures when returned, whereas in the former case, the returned structure always consists of one connected structure.
If the given molecule consists only of sugars, an empty atom container is returned at list index 0.
The returned sugar moieties that were removed from the molecule have invalid valences at atoms formerly bonded to the molecule core or to other sugar moieties and so does the aglycone in places where removed sugar moieties were formerly situated. If you do not care for where the removed moieties were sitting, saturate with implicit hydrogen atoms.
Spiro atoms connecting a removed circular sugar moiety to another cycle are preserved.
If no sugar moieties were removed, the returned list is of size 1 and its only element is the molecule given as parameter, unchanged.- Parameters:
moleculeParam
- the molecule to remove circular and linear sugar moieties from- Returns:
- a list of atom container objects representing the deglycosylated molecule at list index 0 and the removed sugar moieties at the remaining list positions. The returned aglycone may be unconnected if also non-terminal sugars are removed according to the settings, and it may be empty if the resulting structure after sugar removal was too small to preserve due to the set preservation mode and the associated threshold (i.e. the molecule basically was a sugar); the returned sugar moieties that were removed from the molecule have invalid valences at atoms formerly bonded to the molecule core or to other sugar moieties and so does the aglycone in places where removed sugar moieties were formerly situated. If you do not care for where the removed moieties were sitting, saturate with implicit hydrogen atoms.
-
getCircularSugarCandidates
public List<IAtomContainer> getCircularSugarCandidates(IAtomContainer moleculeParam)
Extracts circular sugar moieties from the given molecule, according to the current settings for circular sugar detection. It is not influenced by the setting specifying whether only terminal sugar moieties should be removed and not by the set preservation mode. Therefore, this method will always return terminal and non-terminal moieties.- Parameters:
moleculeParam
- the molecule to extract circular sugar moieties from- Returns:
- a list of substructures in the given molecule that are regarded as circular sugar moieties
-
getLinearSugarCandidates
public List<IAtomContainer> getLinearSugarCandidates(IAtomContainer moleculeParam)
Extracts linear sugar moieties from the given molecule, according to the current settings for linear sugar detection. It is not influenced by the setting specifying whether only terminal sugar moieties should be removed and not by the set preservation mode. Therefore, this method will always return terminal and non-terminal moieties.- Parameters:
moleculeParam
- the molecule to extract linear sugar moieties from- Returns:
- a list of substructures in the given molecule that are regarded as linear sugar moieties
-
flagTooSmallDisconnectedPartsToPreserve
protected void flagTooSmallDisconnectedPartsToPreserve(IAtomContainer moleculeParam)
Checks the given input molecule for unconnected parts that would be cleared away after sugar removal, in order to add a flag to the atoms of these parts to be able to preserve them in the later removal. Note that this method does not check whether the molecule is actually disconnected and whether the preservation mode is not set to 'preserve all'.- Parameters:
moleculeParam
- the disconnected input molecule to check
-
removeTooSmallDisconnectedStructures
protected void removeTooSmallDisconnectedStructures(IAtomContainer moleculeParam)
Removes all unconnected fragments that are too small to keep according to the current preservation mode and threshold setting. If all structures are too small, an empty atom container is returned.
This does not guarantee that the resulting atom container consists of only one connected structure. There might be multiple unconnected structures that are big enough to be preserved.- Parameters:
moleculeParam
- the molecule to clean up; it might be empty after this method call but not null
-
isTooSmallToPreserve
protected boolean isTooSmallToPreserve(IAtomContainer moleculeParam)
Checks whether the given molecule or structure is too small to be kept according to the current preservation mode and threshold setting.- Parameters:
moleculeParam
- the molecule to check- Returns:
- true, if the given structure is too small to be preserved
-
isTerminal
protected boolean isTerminal(IAtomContainer substructure, IAtomContainer parentMolecule, List<IAtomContainer> candidateList)
Checks whether the given substructure is terminal (i.e. it can be removed without producing multiple unconnected structures in the remaining molecule) in the given parent molecule. To do this, the parent molecule is copied, the substructure is removed from this copy, and finally it is checked whether the parent molecule copy still consists of only one connected structure. If that is the case, the substructure is terminal. If the preservation mode is not set to 'preserve all structures', too small resulting fragments are cleared from the parent copy in between. These structures that are too small must also not be part of any other substructure in the given candidate list to avoid removing parts of other sugar candidates.
Note: This method only detects moieties that are immediately terminal. It will not deem terminal a sugar moiety that only becomes terminal after the removal of another sugar moiety, for example.- Parameters:
substructure
- the substructure to check for whether it is terminalparentMolecule
- the molecule the substructure is a part ofcandidateList
- a list containing the detected sugar candidates to check whether atoms of other candidates would be cleared away if the given substructure was removed (which has to be avoided)- Returns:
- true, if the substructure is terminal
-
removeSugarCandidates
protected List<IAtomContainer> removeSugarCandidates(IAtomContainer moleculeParam, List<IAtomContainer> candidateList)
Removes the given sugar moieties (or substructures in general) from the given molecule and returns the removed moieties (not the aglycone!). The removal algorithm is the same for linear and circular sugars. The only settings influencing the removal are the option specifying whether to remove only terminal sugar moieties and the set preservation mode (because it influences the determination of terminal vs. non-terminal).
If only terminal sugar moieties are to be removed, the sugar candidates are one-by-one tested for whether they are terminal or not and removed if they are. The iteration starts anew after iterating over all candidates and stops if no terminal sugar was removed in one whole iteration. If only terminal sugar moieties are removed from the molecule, any disconnected structure resulting from a removal step must be too small to keep according to the preservation mode option and the set threshold and is cleared away.
If all the sugars are to be removed from the given molecule (including non-terminal ones), those disconnected structures that are too small are only cleared once at the end of the routine.
In the latter case, the deglycosylated molecule may consist of two or more disconnected structures after this method call, whereas in the former case, the remaining structure always consists of one connected structure.
Spiro atoms connecting a removed circular sugar moiety to another cycle are preserved.
Note that the deglycosylated core is not returned as part of the given list in this method.- Parameters:
moleculeParam
- the molecule to remove the sugar candidates fromcandidateList
- the list of sugar moieties in the given molecule- Returns:
- a list of atom container objects representing the removed sugar moieties; the returned sugar moieties that were removed from the molecule have invalid valences at atoms formerly bonded to the molecule core or to other sugar moieties
-
copy
protected IAtomContainer copy(IAtomContainer molecule)
Generates a very basic copy of the given molecule, intended for testing whether one of its substructures is terminal inisTerminal(IAtomContainer, IAtomContainer, List)
. Copies atoms (new instances in copy), bonds (new instances in copy), atomic number, implicit hydrogen count, SRU index property, and bond order.- Parameters:
molecule
- atom container to copy- Returns:
- basic copy of the molecule
-
addUniqueIndicesToAtoms
protected void addUniqueIndicesToAtoms(IAtomContainer moleculeParam)
Adds an index as property to all atom objects of the given atom container to identify them uniquely within the atom container and its copies. This is required e.g. for the determination of terminal vs. non-terminal sugar moieties.- Parameters:
moleculeParam
- the molecule that will be processed by the class
-
generateSubstructureIdentifier
protected String generateSubstructureIdentifier(IAtomContainer substructure)
Creates an identifier string for substructures of a molecule, based on the unique indices of the included atoms. It is only encoded which atoms are part of the substructure, no bond information etc. Used for a quick matching of substructures in the same molecule. The unique indices in every atom have to be set.- Parameters:
substructure
- the substructure to create an identifier for- Returns:
- the identifier string
-
checkUniqueIndicesOfAtoms
protected boolean checkUniqueIndicesOfAtoms(IAtomContainer moleculeParam)
Checks whether all atoms in the given molecule have a unique (in the given molecule) index as property. It checks the uniqueness of the detected indices but not whether there are numbers missing (the ids of this class are created as numbers starting from zero and growing in integer steps).- Parameters:
moleculeParam
- the molecule to check- Returns:
- true if every atom has an index property that is unique in the given molecule
-
detectPotentialSugarCycles
protected List<IAtomContainer> detectPotentialSugarCycles(IAtomContainer moleculeParam, boolean includeSpiroRings, boolean ignoreKetoGroups)
Detects and returns cycles of the given molecule that are isolated (spiro rings included or not according to the boolean parameter), isomorphic to the circular sugar patterns, and only have exocyclic single bonds (keto groups ignored or not according to the boolean parameter). These cycles are the general candidates for circular sugars that are filtered according to the other settings in the following steps. Spiro atoms are marked by a property.- Parameters:
moleculeParam
- the molecule to extract potential circular sugars fromincludeSpiroRings
- specification whether spiro rings should be included in the detected potential sugar cycles or filtered out; for circular sugar detection this should be set according to the current 'detect spiro rings as circular sugars' setting; for filtering circular sugar candidates or their atoms during linear sugar detection, this should be set to 'true'ignoreKetoGroups
- specification whether potential sugar cycles with keto groups should be included in the returned list; for circular sugar detection this should be set according to the current 'detect circular sugars with keto groups' setting; for filtering circular sugar candidates or their atoms during linear sugar detection, this should be set to 'true'- Returns:
- a list of the potential sugar cycles
-
areAllExocyclicBondsSingle
protected boolean areAllExocyclicBondsSingle(IAtomContainer ringToTest, IAtomContainer originalMolecule, boolean ignoreKetoGroups)
Checks whether all exocyclic bonds connected to a given ring fragment of a parent atom container are of single order. If the option to allow potential sugar cycles having keto groups is activated, this method also returns true if a cycle having a keto group is processed.
The method iterates over all cyclic atoms and all of their bonds. So the runtime scales linear with the number of cyclic atoms and their connected bonds. In principle, this method can be used also for non-cyclic substructures.
Note: It is not tested whether the original molecule is actually the parent of the ring to test.- Parameters:
ringToTest
- the ring fragment to test; exocyclic bonds do not have to be included in the fragment but if it is a fused system of multiple rings, the internal interconnecting bonds of the different rings need to be included; all its atoms need to be exactly the same objects as in the second atom container parameteroriginalMolecule
- the molecule that contains the ring under investigation; The exocyclic bonds will be queried from itignoreKetoGroups
- true if this method should ignore keto groups, i.e. also return true if there are some attached to the cycle- Returns:
- true, if all exocyclic bonds connected to the ring are of single order
-
hasGlycosidicBond
protected boolean hasGlycosidicBond(IAtomContainer ringToTest, IAtomContainer originalMolecule)
Checks all exocyclic connections of the given ring to detect an O-glycosidic bond. Checklist for glycosidic bond: Connected oxygen atom that is not in the ring, has two bonds that are both of single order and no bond partner is a hydrogen atom. This algorithm also classifies ester bonds as glycosidic bonds and any other bond type that meets the above criteria. Therefore, many 'non-classical, glycoside-like' connections are classified as O-glycosidic bonds.
Note: The 'ring' is not tested for whether it is circular or not. So theoretically, this method can also be used to detect glycosidic bonds of linear structures. BUT: The oxygen atom must not be part of the structure itself. Due to the processing of candidate linear sugar moieties this can make it difficult to use this method also for linear sugars.
Note: It is not tested whether the original molecule is actually the parent of the ring to test.- Parameters:
ringToTest
- the candidate sugar ringoriginalMolecule
- the molecule in which the ring is contained as a substructure to query the connected atoms from- Returns:
- true, if a glycosidic bond is detected
-
isMoleculeEmptyAfterRemovalOfThisRing
protected boolean isMoleculeEmptyAfterRemovalOfThisRing(IAtomContainer ringParam, IAtomContainer moleculeParam)
Checks whether the given molecule would be empty after removal of the given ring. Any remaining fragment will be cleared away if it is too small according to the set preservation mode option. The given parameters are not altered, copies of them are generated and processed. This method is intended to test for whether a molecule qualifies for the gylcosidic bond exemption.- Parameters:
ringParam
- the ring to test whether its removal would result in an empty moleculemoleculeParam
- the parent molecule- Returns:
- true if the parent molecule is empty after removal of the given ring and subsequent removal of too small remaining fragments
-
getExocyclicOxygenAtomCount
protected int getExocyclicOxygenAtomCount(IAtomContainer ringToTest, IAtomContainer originalMolecule)
Returns the number of attached exocyclic oxygen atoms of a given ring in the original atom container. The method iterates over all cyclic atoms and all of their connected atoms. So the runtime scales linear with the number of cyclic atoms and their connected atoms. The oxygen atoms are not tested for being attached by a single bond since in the algorithm, the determination whether a candidate sugar ring has only exocyclic single bonds precedes the calling of this method.
Note: The circularity of the given 'ring' is not tested, so this method could in theory also be used for linear structures. But his does not make much sense.
Note: This method does NOT check for hydroxy groups but for oxygen atoms. So e.g. the oxygen atom in a glycosidic bond is counted.
Note: It is not tested whether the original molecule is actually the parent of the ring to test.- Parameters:
ringToTest
- the ring fragment to test; exocyclic bonds do not have to be included in the fragment but if it is a fused system of multiple rings, the internal interconnecting bonds of the different rings need to be included; all its atoms need to be exactly the same objects as in the second atom container parameter (they will be skipped otherwise)originalMolecule
- the molecule that contains the ring under investigation; The exocyclic bonds will be queried from it- Returns:
- number of attached exocyclic oxygen atoms of the given ring
-
doesRingHaveEnoughExocyclicOxygenAtoms
protected boolean doesRingHaveEnoughExocyclicOxygenAtoms(int numberOfAtomsInRing, int numberOfAttachedExocyclicOxygenAtoms)
Simple decision-making function for deciding whether a candidate sugar ring has enough attached, single-bonded exocyclic oxygen atoms according to the set threshold. The given number of oxygen atoms is divided by the given number of atoms in the ring (should also contain the usually present oxygen atom in a sugar ring) and the resulting ratio is checked for being equal or higher than the currently set threshold.
Note: Only the number of atoms in the ring is checked for not being 0. No further parameter tests are implemented. If the number is 0, false is returned. No exceptions are thrown.- Parameters:
numberOfAtomsInRing
- number of atoms in the possible sugar ring, including the cyclic oxygen atomnumberOfAttachedExocyclicOxygenAtoms
- number of attached exocyclic oxygen atoms of the ring under investigation (if zero, false is returned)- Returns:
- true, if the calculated ratio is equal to or higher than the currently set threshold
-
updateLinearSugarPatterns
protected void updateLinearSugarPatterns()
All linear sugar patterns represented by atom containers in the respective list are sorted, parsed into actual pattern objects, and stored in the internal list for initial linear sugar detection. To be called when a linear sugar pattern has been deleted or added to the list. It cannot directly be operated on the pattern objects because they cannot be sorted or represented in a human-readable format.
-
detectLinearSugarCandidatesByPatternMatching
protected List<IAtomContainer> detectLinearSugarCandidatesByPatternMatching(IAtomContainer moleculeParam)
Initial detection of linear sugar candidates by substructure search for the linear sugar patterns in the given molecule. All 'unique' matches are returned as atom container objects. this means that the same substructure will not be included multiple times but the substructures may overlap.- Parameters:
moleculeParam
- the molecule to search for linear sugar candidates- Returns:
- a list of possibly overlapping substructures from the given molecule matching the internal linear sugar patterns
-
combineOverlappingCandidates
protected List<IAtomContainer> combineOverlappingCandidates(List<IAtomContainer> candidateList)
Combines all overlapping (i.e. sharing the same atoms or bonds) structures in the given list into one atom container, respectively, to return distinct, non-overlapping substructures. Second step of linear sugar detection. Note: The returned substructures can grow very big. This is addressed in the third step. The parameter list is not altered and a completely new list returned.- Parameters:
candidateList
- a list of possibly overlapping substructures from the same atom container object- Returns:
- a list of distinct, non-overlapping substructures after combining every formerly overlapping structure
-
splitEtherEsterAndPeroxideBonds
protected List<IAtomContainer> splitEtherEsterAndPeroxideBonds(List<IAtomContainer> candidateList)
Splits all ether, ester, and peroxide bonds in the given linear sugar candidates and separates those that get disconnected in the process. Third step of linear sugar detection. This step was introduced because the linear sugar candidates returned by the combination method can be very big and contain connected sugar chains that should be detected as separate candidates. The detection is done using SMARTS patterns that are constants of this class. The parameter list is not altered and a completely new list returned.- Parameters:
candidateList
- a list of potential sugar substructures from the same atom container object- Returns:
- a new list of candidates where all ether, ester, and peroxide bonds have been split and disconnected candidates separated
-
removeAtomsOfCircularSugarsFromCandidates
protected void removeAtomsOfCircularSugarsFromCandidates(List<IAtomContainer> candidateList, IAtomContainer parentMolecule)
Removes all atoms belonging to possible circular sugars, as returned by the method for initial circular sugar detection, from the given linear sugar candidates. Fourth step of linear sugar detection. The linear sugar patterns also match parts of circular sugar, so this step has to be done to ensure the separate treatment of circular and linear sugars. After the removal, disconnected candidates are separated into new candidates. Note: here, the given list is altered, unlike in some other methods! Therefore, the list is not returned again. Note also that it is not checked whether the given parent molecule is actually the parent of the given substructures.- Parameters:
candidateList
- a list of potential sugar substructures from the same atom container objectparentMolecule
- the molecule that is currently scanned for linear sugars to detect its circular sugars
-
removeCyclicAtomsFromSugarCandidates
protected void removeCyclicAtomsFromSugarCandidates(List<IAtomContainer> candidateList, IAtomContainer moleculeParam)
Removes all atoms that are part of a cycle from the given linear sugar candidates. Optional fifth step of linear sugar detection. The linear sugar patterns can also match in cycles that do not represent circular sugars but e.g. pseudo-sugars or macrocycles. It is optional to detect linear sugars in such structures or not. After the removal, disconnected candidates are separated into new candidates. Note: here, the given list is altered, unlike in some other methods! Therefore, the list is not returned again. Note also that it is not checked whether the given parent molecule is actually the parent of the given substructures.- Parameters:
candidateList
- a list of potential sugar substructures from the same atom container objectmoleculeParam
- the molecule that is currently scanned for linear sugars to detect its cycles
-
removeTooSmallAndTooLargeCandidates
protected List<IAtomContainer> removeTooSmallAndTooLargeCandidates(List<IAtomContainer> candidateList)
Discards all linear sugar candidates that are too small or too big according to the current settings. Final step of linear sugar detection. This step was introduced because the preceding steps may produce small 'fragments', e.g. the hydroxy group of a circular sugar that was removed from a linear sugar candidate. These should be filtered out. ALso, a very large linear sugar that does not consist of multiple subunits linked by ether, ester, or peroxide bonds is considered too interesting to remove and should therefore also be filtered from the linear sugars detected for removal. The 'size' of the linear sugar candidates is determined as their carbon atom count. The set minimum and maximum sizes are inclusive. The parameter list is not altered and a completely new list returned.- Parameters:
candidateList
- a list of potential sugar substructures from the same atom container object- Returns:
- a new list of candidates where all too small and too big candidates have been filtered out
-
-