Class SugarRemovalUtility


  • public class SugarRemovalUtility
    extends Object
    The Sugar Removal Utility (SRU) implements a generalized algorithm for automated detection of circular and linear sugars in molecular structures and their removal, as described in "Schaub, J., Zielesny, A., Steinbeck, C., Sorokina, M. Too sweet: cheminformatics for deglycosylation in natural products. J Cheminform 12, 67 (2020). https://doi.org/10.1186/s13321-020-00467-y". It offers various functions to detect and remove sugar moieties with different options. Example usage:
    
     //prepare test molecule
     SmilesParser smiPar = new SmilesParser(SilentChemObjectBuilder.getInstance());
     //COCONUT DB CNP0220816
     IAtomContainer molecule = smiPar.parseSmiles("CC1=CC(OC2OC(CO)C(O)C(O)C2O)=C2C3=C(CCC3)C(=O)OC2=C1");
     //instantiate sugar removal utility
     SugarRemovalUtility sugarRemovalUtil = new SugarRemovalUtility(SilentChemObjectBuilder.getInstance());
     //remove sugar moieties, note that this changes the molecule instance!
     boolean sugarsWereRemoved = sugarRemovalUtil.removeCircularAndLinearSugars(molecule);
     if (sugarsWereRemoved) {
         //saturate open valences where sugars were situated if needed
         AtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(molecule);
         CDKHydrogenAdder.getInstance(SilentChemObjectBuilder.getInstance()).addImplicitHydrogens(molecule);
     }
     
    Author:
    Jonas Schaub, Maria Sorokina
    • Field Detail

      • INDEX_PROPERTY_KEY

        public static final String INDEX_PROPERTY_KEY
        Property key for index that is added to any IAtom object in a given IAtomContainer object for internal unique identification of the respective IAtom object. For internal use only.
        See Also:
        Constant Field Values
      • IS_SPIRO_ATOM_PROPERTY_KEY

        public static final String IS_SPIRO_ATOM_PROPERTY_KEY
        Key for property that is added to IAtom objects that connect a spiro ring system for identification and preservation of these atoms in the removal process. For internal use only.
        See Also:
        Constant Field Values
      • TOO_SMALL_DISCONNECTED_PART_TO_PRESERVE_PROPERTY_KEY

        public static final String TOO_SMALL_DISCONNECTED_PART_TO_PRESERVE_PROPERTY_KEY
        Key for property that is added to IAtom objects that belong to unconnected parts of an input molecule that would be cleared away because it is too small according to the set preservation mode for preservation of these atoms in the removal process. For internal use only.
        See Also:
        Constant Field Values
      • LINEAR_SUGARS_SMILES

        protected static final String[] LINEAR_SUGARS_SMILES
        Linear sugar structures represented as SMILES codes. An input molecule is scanned for these substructures for the detection of linear sugars. This set consists of multiple aldoses, ketoses, and sugar alcohols with sizes between 3 and 7 carbons. Additional structures can be added or specific ones removed from the set at run-time using the respective methods.
      • LINEAR_ACIDIC_SUGARS_SMILES

        protected static final String[] LINEAR_ACIDIC_SUGARS_SMILES
        Linear acidic sugar structures represented as SMILES codes. These can be optionally added to the linear sugar structures used for initial detection of linear sugars in an input molecule.
      • CIRCULAR_SUGARS_SMILES

        protected static final String[] CIRCULAR_SUGARS_SMILES
        Circular sugar structures represented as SMILES codes. The isolated rings of an input molecule are matched with these structures for the detection of circular sugars. The structures listed here only represent the circular part of sugar rings (i.e. one oxygen atom and multiple carbon atoms). Common exocyclic structures like hydroxy groups are not part of the patterns and therefore not part of the detected circular sugar moieties. The set includes tetrahydrofuran, tetrahydropyran, and oxepane to match furanoses, pyranoses, and heptoses per default. It can be configured at run-time using the respective methods.
      • DETECT_CIRCULAR_SUGARS_ONLY_WITH_O_GLYCOSIDIC_BOND_DEFAULT

        public static final boolean DETECT_CIRCULAR_SUGARS_ONLY_WITH_O_GLYCOSIDIC_BOND_DEFAULT
        Default setting for whether only circular sugar moieties that are attached to the parent structure or other sugar moieties via an O-glycosidic bond should be detected and subsequently removed (default: false).
        See Also:
        Constant Field Values
      • REMOVE_ONLY_TERMINAL_SUGARS_DEFAULT

        public static final boolean REMOVE_ONLY_TERMINAL_SUGARS_DEFAULT
        Default setting for whether only terminal sugar moieties should be removed, i.e. those that when removed do not cause a split of the remaining molecular structure into two or more disconnected substructures (default: true).
        See Also:
        Constant Field Values
      • PRESERVATION_MODE_DEFAULT

        public static final SugarRemovalUtility.PreservationMode PRESERVATION_MODE_DEFAULT
        Default setting for how to determine whether a substructure that gets disconnected from the molecule during the removal of a sugar moiety should be preserved or can get removed along with the sugar. (default: preserve all structures that consist of 5 or more heavy atoms). The set option plays a major role in discriminating terminal and non-terminal sugar moieties. The minimum value to reach for the respective characteristic to judge by is set in an additional option and all enum constants have their own default values. See the PreservationMode enum.
      • DETECT_CIRCULAR_SUGARS_ONLY_WITH_ENOUGH_EXOCYCLIC_OXYGEN_ATOMS_DEFAULT

        public static final boolean DETECT_CIRCULAR_SUGARS_ONLY_WITH_ENOUGH_EXOCYCLIC_OXYGEN_ATOMS_DEFAULT
        Default setting for whether detected circular sugar candidates must have a sufficient number of attached, single-bonded exocyclic oxygen atoms in order to be detected as a sugar moiety (default: true). The 'sufficient number' is defined in another option / default setting.
        See Also:
        Constant Field Values
      • EXOCYCLIC_OXYGEN_ATOMS_TO_ATOMS_IN_RING_RATIO_THRESHOLD_DEFAULT

        public static final double EXOCYCLIC_OXYGEN_ATOMS_TO_ATOMS_IN_RING_RATIO_THRESHOLD_DEFAULT
        Default setting for the minimum ratio of attached exocyclic, single-bonded oxygen atoms to the number of atoms in the candidate circular sugar structure to reach in order to be classified as a sugar moiety if the number of exocyclic oxygen atoms should be evaluated (default: 0.5 so at a minimum 3 connected, exocyclic oxygen atoms for a six-membered ring, for example).
        See Also:
        Constant Field Values
      • DETECT_LINEAR_SUGARS_IN_RINGS_DEFAULT

        public static final boolean DETECT_LINEAR_SUGARS_IN_RINGS_DEFAULT
        Default setting for whether linear sugar structures that are part of a ring should be detected (default: false). This setting is important for e.g. macrocycles that contain sugars or pseudosugars.
        See Also:
        Constant Field Values
      • LINEAR_SUGAR_CANDIDATE_MIN_SIZE_DEFAULT

        public static final int LINEAR_SUGAR_CANDIDATE_MIN_SIZE_DEFAULT
        Default setting for the minimum number of carbon atoms a linear sugar candidate must have in order to be detected as a sugar moiety (and subsequently be removed, default: 4, inclusive).
        See Also:
        Constant Field Values
      • LINEAR_SUGAR_CANDIDATE_MAX_SIZE_DEFAULT

        public static final int LINEAR_SUGAR_CANDIDATE_MAX_SIZE_DEFAULT
        Default setting for the maximum number of carbon atoms a linear sugar candidate can have in order to be detected as a sugar moiety (and subsequently be removed, default: 7, inclusive).
        See Also:
        Constant Field Values
      • DETECT_LINEAR_ACIDIC_SUGARS_DEFAULT

        public static final boolean DETECT_LINEAR_ACIDIC_SUGARS_DEFAULT
        Default setting for whether to include the linear acidic sugar patterns in the linear sugar structures used for initial detection of linear sugars in a given molecule (default: false).
        See Also:
        Constant Field Values
      • DETECT_SPIRO_RINGS_AS_CIRCULAR_SUGARS_DEFAULT

        public static final boolean DETECT_SPIRO_RINGS_AS_CIRCULAR_SUGARS_DEFAULT
        Default setting for whether to include spiro rings in the initial set of detected rings considered for circular sugar detection (default: false). If the option is turned on and a spiro sugar ring is removed, its atom connecting it to another ring is preserved.
        See Also:
        Constant Field Values
      • DETECT_CIRCULAR_SUGARS_WITH_KETO_GROUPS_DEFAULT

        public static final boolean DETECT_CIRCULAR_SUGARS_WITH_KETO_GROUPS_DEFAULT
        Default setting for whether sugar-like rings that have keto groups should also be detected as circular sugars (default: false). The general rule specified in the original algorithm description is that every potential sugar cycle with an exocyclic double or triple bond is excluded from circular sugar detection. If this option is turned on, an exemption to this rule is made for potential sugar cycles having keto groups. Also, the double-bound oxygen atoms will then count for the number of connected oxygen atoms and the algorithm will not regard how many keto groups are attached to the cycle (might be only one, might be that all connected oxygen atoms are double-bound). If this option is turned off (default), every sugar-like cycle with an exocyclic double or triple bond will be excluded from the detected circular sugars, as it is specified in the original algorithm description.
        See Also:
        Constant Field Values
      • ESTER_SMARTS_PATTERN

        protected static final SmartsPattern ESTER_SMARTS_PATTERN
        Daylight SMARTS pattern for matching ester bonds between linear sugars. Defines an aliphatic carbon atom connected to a double-bonded oxygen atom and a single-bonded oxygen atom that must not be in a ring and is connected to another aliphatic carbon atom via a single bond. The oxygen atom must not be in a ring to avoid breaking circular sugars.
      • ETHER_SMARTS_PATTERN

        protected static final SmartsPattern ETHER_SMARTS_PATTERN
        Daylight SMARTS pattern for matching ether bonds between linear sugars. Defines an aliphatic carbon atom connected via single bond to an oxygen atom that must not be in a ring and is in turn connected to another aliphatic carbon atom. The oxygen atom must not be in a ring to avoid breaking circular sugars. This pattern also matches ester bonds which is why esters must be detected and processed before ethers.
      • PEROXIDE_SMARTS_PATTERN

        protected static final SmartsPattern PEROXIDE_SMARTS_PATTERN
        Daylight SMARTS pattern for matching peroxide bonds between linear sugars. Defines an aliphatic carbon atom connected via single bond to an oxygen atom that must not be in a ring and is connected to another oxygen atom of the same kind, followed by another aliphatic carbon atom. Even tough it is highly unlikely for a peroxide bond to be in a ring, every ring should be preserved.
    • Constructor Detail

      • SugarRemovalUtility

        public SugarRemovalUtility​(IChemObjectBuilder builder)
        Sole constructor of this class. All settings are set to their default values (see public static constants or enquire via get/is methods). To change these settings, use the respective 'setXY()' methods.
        Parameters:
        builder - IChemObjectBuilder for i.a. parsing SMILES strings of sugar patterns into atom containers
    • Method Detail

      • getLinearSugarPatternsList

        public List<IAtomContainer> getLinearSugarPatternsList()
        Returns a list of atom containers representing the linear sugar structures an input molecule is scanned for in linear sugar detection. The returned list represents the current state of this list, i.e. externally added structures are included, externally removed structures not, and the linear acidic sugar structures are only included if the respective option is activated.
        Note: do not change the returned list but use addLinearSugarToPatternsList(IAtomContainer) and removeLinearSugarFromPatternsList(IAtomContainer) to modify it (or the corresponding SMILES string-based methods) because these sync updates with the actually used list of DfPattterns.
        Returns:
        a list of atom containers representing a current snapshot of the linear sugar patterns used for detection
      • getCircularSugarPatternsList

        public List<IAtomContainer> getCircularSugarPatternsList()
        Returns a list of atom containers representing the circular sugar structures an input molecule is scanned for in circular sugar detection. The returned list represents the current state of this list, i.e. externally added structures are included, externally removed structures are not.
        Note: do not change the returned list but use addCircularSugarToPatternsList(IAtomContainer) and removeCircularSugarFromPatternsList(IAtomContainer) to modify it (or the corresponding SMILES string-based methods) because these sync updates with the actually used list of DfPattterns.
        Returns:
        a list of atom containers representing a current snapshot of the circular sugar patterns used for detection
      • areOnlyCircularSugarsWithOGlycosidicBondDetected

        public boolean areOnlyCircularSugarsWithOGlycosidicBondDetected()
        Specifies whether only circular sugar moieties that are attached to the parent structure or other sugar moieties via an O-glycosidic bond should be detected and subsequently removed.
        Returns:
        true if only circular sugar moieties connected via a glycosidic bond are removed according to the current settings
      • areOnlyTerminalSugarsRemoved

        public boolean areOnlyTerminalSugarsRemoved()
        Specifies whether only terminal sugar moieties should be removed, i.e. those that when removed do not cause a split of the remaining molecular structure into two or more disconnected substructures.
        Returns:
        true if only terminal sugar moieties are removed according to the current settings
      • getPreservationModeSetting

        public SugarRemovalUtility.PreservationMode getPreservationModeSetting()
        Returns the current setting for how to determine whether a substructure that gets disconnected from the molecule during the removal of a sugar moiety should be preserved or can get removed along with the sugar. This can e.g. be judged by its heavy atom count or its molecular weight, or it can be specified that all structures are to be preserved. If too small / too light structures are discarded, an additional threshold is specified in the preservation mode threshold setting that the structures have to reach in order to be preserved (i.e. to be judged 'big/heavy enough').
        Returns:
        a PreservationMode enum object representing the current setting
      • getPreservationModeThresholdSetting

        public int getPreservationModeThresholdSetting()
        Returns the current threshold of e.g. molecular weight or heavy atom count (depending on the currently set preservation mode) a substructure that gets disconnected from the molecule by the removal of a sugar moiety has to reach in order to be preserved and not discarded.
        Returns:
        an integer specifying the currently set threshold (either specified in Da or number of heavy atoms)
      • areOnlyCircularSugarsWithEnoughExocyclicOxygenAtomsDetected

        public boolean areOnlyCircularSugarsWithEnoughExocyclicOxygenAtomsDetected()
        Specifies whether detected circular sugar candidates must have a sufficient number of attached exocyclic oxygen atoms in order to be detected as a sugar moiety. If this option is set, the circular sugar candidates have to reach an additionally specified minimum ratio of said oxygen atoms to the number of atoms in the respective ring in order to be seen as a sugar ring and being subsequently removed. See exocyclic oxygen atoms to atoms in ring ratio threshold setting.
        Returns:
        true, if the ratio of attached, exocyclic, single-bonded oxygen atoms to the number of atoms in the candidate sugar ring is evaluated at circular sugar detection according to the current settings
      • getExocyclicOxygenAtomsToAtomsInRingRatioThresholdSetting

        public double getExocyclicOxygenAtomsToAtomsInRingRatioThresholdSetting()
        Returns the currently set minimum ratio of attached, exocyclic, single-bonded oxygen atoms to the number of atoms in the candidate circular sugar structure to reach in order to be classified as a sugar moiety if the number of exocyclic oxygen atoms should be evaluated.
        Returns:
        the minimum ratio of attached oxygen atoms to the number of atoms in the sugar ring; A value of e.g. 0.5 means that a six-membered sugar ring needs at least 3 attached oxygen atoms to be classified as a circular sugar moiety
      • areLinearSugarsInRingsDetected

        public boolean areLinearSugarsInRingsDetected()
        Specifies whether linear sugar structures that are part of a ring should be detected according to the current settings. This setting is important for e.g. macrocycles that contain sugars or pseudosugars.
        Note that potential circular sugar candidates (here always including spiro sugar rings also) are filtered from linear sugar candidates, even with this setting turned on.
        Returns:
        true if linear sugars in rings are detected and removed with the current settings
      • getLinearSugarCandidateMinSizeSetting

        public int getLinearSugarCandidateMinSizeSetting()
        Returns the currently set minimum number of carbon atoms a linear sugar candidate must have in order to be detected as a sugar moiety (and subsequently be removed).
        Returns:
        the set minimum carbon atom count of detected linear sugars (inclusive)
      • getLinearSugarCandidateMaxSizeSetting

        public int getLinearSugarCandidateMaxSizeSetting()
        Returns the currently set maximum number of carbon atoms a linear sugar candidate can have in order to be detected as a sugar moiety (and subsequently be removed).
        Returns:
        the set maximum carbon atom count of detected linear sugars (inclusive)
      • areLinearAcidicSugarsDetected

        public boolean areLinearAcidicSugarsDetected()
        Specifies whether linear acidic sugar patterns are currently included in the linear sugar structures used for initial detection of linear sugars in a given molecule.
        Returns:
        true if acidic sugars are detected
      • areSpiroRingsDetectedAsCircularSugars

        public boolean areSpiroRingsDetectedAsCircularSugars()
        Specifies whether spiro rings are included in the initial set of detected rings considered for circular sugar detection.
        Note for linear sugar detection: Here, the spiro rings will always be filtered along with the potential circular sugar candidates.
        Returns:
        true if spiro rings can be detected as circular sugars with the current settings
      • areCircularSugarsWithKetoGroupsDetected

        public boolean areCircularSugarsWithKetoGroupsDetected()
        Specifies whether potential sugar cycles with keto groups are detected in circular sugar detection. The general rule specified in the original algorithm description is that every potential sugar cycle with an exocyclic double or triple bond is excluded from circular sugar detection. If this option is turned on, an exemption to this rule is made for potential sugar cycles having keto groups. Also, the double-bound oxygen atoms will then count for the number of connected oxygen atoms and the algorithm will not regard how many keto groups are attached to the cycle (might be only one, might be that all connected oxygen atoms are double-bound). If this option is turned off, every sugar-like cycle with an exocyclic double or triple bond will be excluded from the detected circular sugars, as it is specified in the original algorithm description.
        Returns:
        true if potential sugar cycles having keto groups are detected in circular sugar detection
      • addCircularSugarToPatternsList

        public boolean addCircularSugarToPatternsList​(IAtomContainer circularSugar)
        Allows to add an additional sugar ring to the list of circular sugar structures an input molecule is scanned for in circular sugar detection. The given structure must not be isomorphic to the already present ones, and it must contain exactly one isolated ring without any exocyclic moieties because only the isolated rings of an input structure are matched with the circular sugar patterns.
        Parameters:
        circularSugar - an atom container representing only one isolated sugar ring
        Returns:
        true if the addition was successful; false if the given atom container is empty or does represent a molecule that contains no isolated ring, more than one isolated ring, consists of more structures than one isolated ring or is isomorphic to a circular sugar structure already present
      • addCircularSugarToPatternsList

        public boolean addCircularSugarToPatternsList​(String smilesCode)
        Allows to add an additional sugar ring (represented as a SMILES string) to the list of circular sugar structures an input molecule is scanned for in circular sugar detection. The given structure must not be isomorphic to the already present ones, and it must contain exactly one isolated ring without any exocyclic moieties because only the isolated rings of an input structure are matched with the circular sugar patterns.
        Parameters:
        smilesCode - a SMILES code representation of a molecule consisting of only one isolated sugar ring
        Returns:
        true if the addition was successful; false if the given SMILES string is empty or does represent a molecule that contains no isolated ring, more than one isolated ring, consists of more structures than one isolated ring, is isomorphic to a circular sugar structure already present or if the given SMILES string cannot be parsed into a molecular structure
      • addLinearSugarToPatternsList

        public boolean addLinearSugarToPatternsList​(IAtomContainer linearSugar)
        Allows to add an additional linear sugar to the list of linear sugar structures an input molecule is scanned for in linear sugar detection. The given structure must not be isomorphic to the already present ones or the patterns for circular sugars.
        Note: If the given structure contains cycles, the option to detect linear sugars in rings needs to be enabled to detect its matches entirely. Otherwise, all circular substructures of the 'linear sugars' will not be detected.
        Additional note: If the given structure is isomorphic to a default linear acidic sugar pattern, it may be added here when the option to detect these structures is turned off but will be removed from the pattern list if the option is turned on and off again after this addition.
        Parameters:
        linearSugar - an atom container representing a molecular structure to search for at linear sugar detection
        Returns:
        true if the addition was successful; false if the given atom container is empty or is isomorphic to a linear sugar structure already present or a circular sugar pattern
      • addLinearSugarToPatternsList

        public boolean addLinearSugarToPatternsList​(String smilesCode)
        Allows to add an additional linear sugar (represented as SMILES string) to the list of linear sugar structures an input molecule is scanned for in linear sugar detection. The given structure must not be isomorphic to the already present ones or the patterns for circular sugars.
        Note: If the given structure contains cycles, the option to detect linear sugars in rings needs to be enabled to detect its matches entirely. Otherwise, all circular substructures of the 'linear sugars' will not be detected.
        Additional note: If the given structure is isomorphic to a default linear acidic sugar pattern, it may be added here when the option to detect these structures is turned off but will be removed from the pattern list if the option is turned on and off again after this addition.
        Parameters:
        smilesCode - a SMILES code representation of a molecular structure to search for
        Returns:
        true if the addition was successful; false if the given SMILES string is empty or does represent a molecule that is isomorphic to a linear sugar structure already present or a circular sugar pattern or if it cannot be parsed into a molecular structure
      • removeCircularSugarFromPatternsList

        public boolean removeCircularSugarFromPatternsList​(String smilesCode)
        Allows to remove a sugar ring pattern (represented as SMILES string) from the list of circular sugar structures an input molecule is scanned for in circular sugar detection. The given character string must be a valid SMILES notation and be isomorphic to one of the currently used structure patterns. Example usage: Pass the argument "C1CCOC1" (tetrahydrofuran) to stop detecting furanoses in the circular sugar detection algorithm.
        Parameters:
        smilesCode - a SMILES code representation of a structure present in the circular sugar pattern list
        Returns:
        true if the removal was successful; false if the given SMILES string is empty or cannot be parsed into a molecule or the given structure cannot be found in the circular sugar pattern list
      • removeCircularSugarFromPatternsList

        public boolean removeCircularSugarFromPatternsList​(IAtomContainer circularSugar)
        Allows to remove a sugar ring from the list of circular sugar structures an input molecule is scanned for in circular sugar detection. The given molecule must be isomorphic to one of the currently used structure patterns. Example usage: Pass an atom container object representing the structure of tetrahydrofuran to stop detecting furanoses in the circular sugar detection algorithm.
        Parameters:
        circularSugar - a molecule isomorphic to a structure present in the circular sugar pattern list
        Returns:
        true if the removal was successful; false if the given atom container is empty or its structure is not isomorphic to a circular sugar pattern structure in use
      • removeLinearSugarFromPatternsList

        public boolean removeLinearSugarFromPatternsList​(String smilesCode)
        Allows to remove a linear sugar pattern (represented as SMILES string) from the list of linear sugar structures an input molecule is scanned for in linear sugar detection. The given character string must be a valid SMILES notation and be isomorphic to one of the currently used structure patterns. Example usage: Pass the argument "C(C(C=O)O)O" (aldotriose) to stop detecting such small aldoses in the linear sugar detection algorithm. Please note that adjusting the linear sugar candidate minimum and maximum sizes can be more straightforward than removing patterns here.
        Note: If the linear acidic sugars are currently included in the linear sugar pattern structures, individual structures of this group can be removed here.
        Parameters:
        smilesCode - a SMILES code representation of a structure present in the linear sugar pattern list
        Returns:
        true if the removal was successful; false if the given SMILES string is empty or cannot be parsed into a molecule or the given structure cannot be found in the linear sugar pattern list
      • removeLinearSugarFromPatternsList

        public boolean removeLinearSugarFromPatternsList​(IAtomContainer linearSugar)
        Allows to remove a linear sugar pattern from the list of linear sugar structures an input molecule is scanned for in linear sugar detection. The given molecule must be isomorphic to one of the currently used structure patterns. Example usage: Pass an atom container object representing the structure of aldotriose to stop detecting such small aldoses in the linear sugar detection algorithm. Please note that adjusting the linear sugar candidate minimum and maximum sizes can be more straightforward than removing patterns here.
        Note: If the linear acidic sugars are currently included in the linear sugar pattern structures, individual structures of this group can be removed here.
        Parameters:
        linearSugar - a molecule isomorphic to a structure present in the linear sugar pattern list
        Returns:
        true if the removal was successful; false if the given atom container is empty or its structure is not isomorphic to a linear sugar pattern structure in use
      • clearCircularSugarPatternsList

        public void clearCircularSugarPatternsList()
        Clears all the circular sugar structures an input molecule is scanned for in circular sugar detection.
      • clearLinearSugarPatternsList

        public void clearLinearSugarPatternsList()
        Clears all the linear sugar structures an input molecule is scanned for in linear sugar detection. If the detection of linear acidic sugars is turned on, it is turned off in this method and these structures are also cleared from the linear sugar patterns.
      • setDetectCircularSugarsOnlyWithOGlycosidicBondSetting

        public void setDetectCircularSugarsOnlyWithOGlycosidicBondSetting​(boolean bool)
        Sets the option to only detect (and subsequently remove) circular sugar moieties that are attached to the parent structure or other sugar moieties via an O-glycosidic bond.
        Parameters:
        bool - true, if only circular sugar moieties connected via a glycosidic bond should be detected (and removed)
      • setRemoveOnlyTerminalSugarsSetting

        public void setRemoveOnlyTerminalSugarsSetting​(boolean bool)
        Sets the option to remove only terminal sugar moieties, i.e. those that when removed do not cause a split of the remaining molecular structure into two or more disconnected substructures.
        Parameters:
        bool - true, if only terminal sugar moieties should be removed
      • setPreservationModeSetting

        public void setPreservationModeSetting​(SugarRemovalUtility.PreservationMode option)
        Sets the preservation mode for structures that get disconnected by sugar removal and the preservation mode threshold is set to the default value of the given enum constant. The preservation mode option specifies how to determine whether a substructure that gets disconnected from the molecule during the removal of a sugar moiety should be preserved or can get removed along with the sugar. This can e.g. be judged by its heavy atom count or its molecular weight, or it can be specified that all structures are to be preserved. The available options can be selected from the PreservationMode enum. If too small / too light structures are discarded, an additional threshold is specified in the preservation mode threshold setting that the structures have to reach in order to be preserved (i.e. to be judged 'big/heavy enough'). This threshold is set to the default value of the given enum constant in this method.
        Note that if the option "ALL" is combined with the removal of only terminal moieties, even the smallest attached structure will prevent the removal of a sugar. The most important consequence is that circular sugars with any hydroxy groups will not be removed because these are not considered as part of the sugar moiety.
        Parameters:
        option - the selected preservation mode option
      • setPreservationModeThresholdSetting

        public void setPreservationModeThresholdSetting​(int threshold)
        Sets the preservation mode threshold, i.e. the molecular weight or heavy atom count (depending on the currently set preservation mode) a substructure that gets disconnected from the molecule during the removal of a sugar moiety has to reach in order to be kept and not removed along with the sugar. If the preservation mode is set to "HEAVY_ATOM_COUNT", the threshold is interpreted as the needed minimum number of heavy atoms and if it is set to "MOLECULAR_WEIGHT", the threshold is interpreted as minimum molecular weight in Da.
        Notes: A threshold of zero can be set here, but it is recommended to choose the preservation mode "ALL" instead. On the other hand, if the preservation mode is set to "ALL", this threshold is automatically set to zero.
        Parameters:
        threshold - the new threshold
      • setDetectCircularSugarsOnlyWithEnoughExocyclicOxygenAtomsSetting

        public void setDetectCircularSugarsOnlyWithEnoughExocyclicOxygenAtomsSetting​(boolean bool)
        Sets the option to only detect (and subsequently remove) circular sugars that have a sufficient number of attached, exocyclic, single-bonded oxygen atoms. If this option is set, the circular sugar candidates have to reach an additionally specified minimum ratio of said oxygen atoms to the number of atoms in the respective ring in order to be seen as a sugar ring and being subsequently removed. See exocyclic oxygen atoms to atoms in ring ratio threshold setting. If this option is re-activated, the previously set threshold is used again or the default value if no custom threshold has been set before.
        Parameters:
        bool - true, if the ratio of attached, exocyclic, single-bonded oxygen atoms to the number of atoms in the candidate sugar ring should be evaluated at circular sugar detection
      • setExocyclicOxygenAtomsToAtomsInRingRatioThresholdSetting

        public boolean setExocyclicOxygenAtomsToAtomsInRingRatioThresholdSetting​(double threshold)
        Sets the minimum ratio of attached, exocyclic, single-bonded oxygen atoms to the number of atoms in the candidate circular sugar structure to reach in order to be classified as a sugar moiety if the number of exocyclic oxygen atoms should be evaluated.
        A ratio of e.g. 0.5 means that a six-membered candidate sugar ring needs to have at least 3 attached, exocyclic single-bonded oxygen atoms in order to be classified as a circular sugar.
        A zero value can be given if the option to remove only sugar rings with a sufficient number of exocyclic oxygen atoms is activated, but it is recommended to turn this option of instead.
        Note: The normally present oxygen atom within a sugar ring is included in the number of ring atoms. So setting the threshold to 1.0 implies that at least one of the carbon atoms in the ring has two attached oxygen atoms. In general, the threshold can be set to values higher than 1.0 or even to negative values, but it does not make a lot of sense.
        Parameters:
        threshold - the new ratio threshold
        Returns:
        true if updating the value was successful; false if the given number is infinite, 'NaN', or smaller than 0
      • setDetectLinearSugarsInRingsSetting

        public void setDetectLinearSugarsInRingsSetting​(boolean bool)
        Sets the option to detect linear sugar structures that are part of a ring. This setting is important for e.g. macrocycles that contain sugars or pseudosugars.
        Note that potential circular sugar candidates (here always including spiro sugar rings also) are filtered from linear sugar candidates, even with this setting turned on.
        Parameters:
        bool - true, if linear sugar structures that are part of a ring should be detected (and removed)
      • setLinearSugarCandidateMinSizeSetting

        public void setLinearSugarCandidateMinSizeSetting​(int minSize)
        Sets the minimum number of carbon atoms a linear sugar candidate must have in order to be detected as a sugar moiety (and subsequently be removed). This minimum is inclusive and does not affect the initial detection of linear sugars. Only at the end of the algorithm, linear sugar candidates that are too small are discarded.
        Note: It is not tested whether the given minimum size is actually smaller than the set maximum size to allow a user-friendly adjustment of these parameters without having to adhere to a certain order of operations.
        Parameters:
        minSize - the new minimum size (inclusive) of linear sugars detected, interpreted as carbon atom count
      • setLinearSugarCandidateMaxSizeSetting

        public void setLinearSugarCandidateMaxSizeSetting​(int maxSize)
        Sets the maximum number of carbon atoms a linear sugar candidate can have in order to be detected as a sugar moiety (and subsequently be removed). This maximum is inclusive and does not affect the initial detection of linear sugars. Only at the end of the algorithm, linear sugar candidates that are too big are discarded.
        Note: It is not tested whether the given maximum size is actually greater than the set minimum size to allow a user-friendly adjustment of these parameters without having to adhere to a certain order of operations.
        Parameters:
        maxSize - the new maximum size (inclusive) of linear sugars detected, interpreted as carbon atom count
      • setDetectLinearAcidicSugarsSetting

        public void setDetectLinearAcidicSugarsSetting​(boolean bool)
        Sets the option to include linear acidic sugar patterns in the linear sugar structures used for initial detection of linear sugars in a given molecule. If the option is turned on, the linear acidic sugar patterns are added to the linear sugar patterns list and can be retrieved and configured in the same way as the 'normal' linear sugar patterns. If the option is turned off, they are all removed again from the linear sugar patterns list.
        Parameters:
        bool - true, if linear acidic sugar patterns should be included in the linear sugar structures used for initial detection of linear sugars
      • setDetectSpiroRingsAsCircularSugarsSetting

        public void setDetectSpiroRingsAsCircularSugarsSetting​(boolean bool)
        Sets the option to include spiro rings in the initial set of detected rings considered for circular sugar detection. If the option is turned on, spiro atoms connected two spiro rings will be protected if a spiro sugar ring is removed. In the opposite case, spiro rings will be filtered from the set of isolated cycles detected in the given molecule.
        Note for linear sugar detection: Here, the spiro rings will always be filtered along with the potential circular sugar candidates.
        Parameters:
        bool - true, if spiro rings should be detectable as circular sugars
      • setDetectCircularSugarsWithKetoGroupsSetting

        public void setDetectCircularSugarsWithKetoGroupsSetting​(boolean bool)
        Sets the option to detect potential sugar cycles with keto groups as circular sugars in circular sugar detection. The general rule specified in the original algorithm description is that every potential sugar cycle with an exocyclic double or triple bond is excluded from circular sugar detection. If this option is turned on, an exemption to this rule is made for potential sugar cycles having keto groups. Also, the double-bound oxygen atoms will then count for the number of connected oxygen atoms and the algorithm will not regard how many keto groups are attached to the cycle (might be only one, might be that all connected oxygen atoms are double-bound). If this option is turned off, every sugar-like cycle with an exocyclic double or triple bond will be excluded from the detected circular sugars, as it is specified in the original algorithm description.
        Parameters:
        bool - true, if circular sugars with keto groups should be detected
      • restoreDefaultSettings

        public void restoreDefaultSettings()
        Sets all settings to their default values (see public static constants or enquire via get/is methods). This includes the pattern lists for linear and circular sugars. To call this method is equivalent to using the constructor of this class.
      • hasLinearSugars

        public boolean hasLinearSugars​(IAtomContainer moleculeParam)
        Detects linear sugar moieties in the given molecule, according to the current settings for linear sugar detection. It is not influenced by the setting specifying whether only terminal sugar moieties should be removed and not by the set preservation mode. Therefore, this method will return true even if only non-terminal linear sugar moieties are detected.
        Parameters:
        moleculeParam - the atom container to scan for the presence of linear sugar moieties
        Returns:
        true, if the given molecule contains linear sugar moieties
      • hasCircularSugars

        public boolean hasCircularSugars​(IAtomContainer moleculeParam)
        Detects circular sugar moieties in the given molecule, according to the current settings for circular sugar detection. It is not influenced by the setting specifying whether only terminal sugar moieties should be removed and not by the set preservation mode. Therefore, this method will return true even if only non-terminal circular sugar moieties are detected.
        Parameters:
        moleculeParam - the atom container to scan for the presence of circular sugar moieties
        Returns:
        true, if the given molecule contains circular sugar moieties
      • hasCircularOrLinearSugars

        public boolean hasCircularOrLinearSugars​(IAtomContainer moleculeParam)
        Detects circular and linear sugar moieties in the given molecule, according to the current settings for sugar detection. It is not influenced by the setting specifying whether only terminal sugar moieties should be removed and not by the set preservation mode. Therefore, this method will return true even if only non-terminal sugar moieties are detected.
        Parameters:
        moleculeParam - the atom container to scan for the presence of sugar moieties
        Returns:
        true, if the given molecule contains sugar moieties of any kind (circular or linear)
      • isQualifiedForGlycosidicBondExemption

        public boolean isQualifiedForGlycosidicBondExemption​(IAtomContainer moleculeParam)
        Tests whether the given molecule qualifies for the glycosidic bond exemption. This is true for molecules that practically are single-cycle circular sugars, meaning that the molecule is empty if the sugar ring is detected and removed according to the current settings. These molecules or sugar rings do not need to have a glycosidic bond in order to be detected as a sugar ring if the option to only detect those circular sugars that have one is activated. This exemption was introduced because these molecules do not contain any other structure to bind to via a glycosidic bond.
        Note: It is checked whether the sugar ring really does not have a glycosidic bond and false is returned if it does.
        Parameters:
        moleculeParam - the molecule to check
        Returns:
        true, if the given molecule qualifies for the exemption (it only has one sugar cycle, is empty after its removal, and does not have a glycosidic bond); false otherwise
      • getNumberOfCircularSugars

        public int getNumberOfCircularSugars​(IAtomContainer moleculeParam)
        Detects circular sugar moieties in the given molecule according to the current settings for circular sugar detection and returns the number of detected moieties. It is not influenced by the setting specifying whether only terminal sugar moieties should be removed and not by the set preservation mode. Therefore, the return value of this method will include non-terminal moieties at all times (and terminal ones also).
        Parameters:
        moleculeParam - the atom container to scan for the presence of circular sugar moieties
        Returns:
        an integer representing the number of detected circular sugar moieties in the given molecule
      • getNumberOfLinearSugars

        public int getNumberOfLinearSugars​(IAtomContainer moleculeParam)
        Detects linear sugar moieties in the given molecule according to the current settings for linear sugar detection and returns the number of detected moieties. It is not influenced by the setting specifying whether only terminal sugar moieties should be removed and not by the set preservation mode. Therefore, the return value of this method will include non-terminal moieties at all times (and terminal ones also).
        Parameters:
        moleculeParam - the atom container to scan for the presence of linear sugar moieties
        Returns:
        an integer representing the number of detected linear sugar moieties in the given molecule
      • getNumberOfCircularAndLinearSugars

        public int getNumberOfCircularAndLinearSugars​(IAtomContainer moleculeParam)
        Detects circular and linear sugar moieties in the given molecule according to the current settings for circular and linear sugar detection and returns the number of detected moieties. It is not influenced by the setting specifying whether only terminal sugar moieties should be removed and not by the set preservation mode. Therefore, the return value of this method will include non-terminal moieties at all times (and terminal ones also).
        Parameters:
        moleculeParam - the atom container to scan for the presence of circular and linear sugar moieties
        Returns:
        an integer representing the number of detected circular and linear sugar moieties in the given molecule
      • removeCircularSugars

        public boolean removeCircularSugars​(IAtomContainer moleculeParam)
        Removes circular sugar moieties from the given atom container (this operation modifies the given atom container, so clone the object beforehand if you need to preserve its original structure!). Which substructures are removed depends on the settings for circular sugar detection, the setting specifying whether only terminal sugar moieties should be removed and on the set preservation mode.
        The aglycone will have open valences in places where removed sugar moieties were formerly situated. If you do not care for where the removed moieties were sitting, saturate with implicit hydrogen atoms.
        If only terminal sugar moieties are to be removed, the detected circular sugars are one-by-one tested for whether they are terminal or not and removed if they are. The iteration starts anew after iterating over all candidates and stops if no terminal sugar was removed in one whole iteration. If only terminal sugar moieties are removed from the molecule, any disconnected structure resulting from a removal step must be too small to keep according to the set preservation mode option and the set threshold and is cleared away.
        If all the circular sugar moieties are to be removed from the given molecule (including non-terminal ones), those disconnected structures that are too small are only cleared once at the end of the routine.
        In the latter case, the deglycosylated molecule may consist of two or more disconnected structures after deglycosylation, whereas in the former case, the processed structure always consists of one connected structure.
        If the given molecule consists only of circular sugars, an empty atom container is left after processing.
        Spiro atoms connecting a removed circular sugar moiety to another cycle are preserved.
        Parameters:
        moleculeParam - the molecule to remove circular sugar moieties from
        Returns:
        true if sugar moieties were detected and removed
      • removeAndReturnCircularSugars

        public List<IAtomContainer> removeAndReturnCircularSugars​(IAtomContainer moleculeParam)
        Removes circular sugar moieties from the given atom container and returns the resulting aglycone (at list index 0) and the removed circular sugar moieties (this operation modifies the given atom container, so clone the object beforehand if you need to preserve its original structure!). Which substructures are removed depends on the settings for circular sugar detection, the setting specifying whether only terminal sugar moieties should be removed and on the set preservation mode.
        If only terminal sugar moieties are to be removed, the detected circular sugars are one-by-one tested for whether they are terminal or not and removed if they are. The iteration starts anew after iterating over all candidates and stops if no terminal sugar was removed in one whole iteration. If only terminal sugar moieties are removed from the molecule, any disconnected structure resulting from a removal step must be too small to keep according to the set preservation mode option and the set threshold and is cleared away.
        If all the circular sugar moieties are to be removed from the given molecule (including non-terminal ones), those disconnected structures that are too small are only cleared once at the end of the routine.
        In the latter case, the deglycosylated molecule (aglycone at list index 0) may consist of two or more disconnected structures when returned, whereas in the former case, the returned structure always consists of one connected structure.
        If the given molecule consists only of circular sugars, an empty atom container is returned at list index 0.
        The returned sugar moieties that were removed from the molecule have invalid valences at atoms formerly bonded to the molecule core or to other sugar moieties and so does the aglycone in places where removed sugar moieties were formerly situated. If you do not care for where the removed moieties were sitting, saturate with implicit hydrogen atoms.
        Spiro atoms connecting a removed circular sugar moiety to another cycle are preserved.
        If no sugar moieties were removed, the returned list is of size 1 and its only element is the molecule given as parameter, unchanged.
        Parameters:
        moleculeParam - the molecule to remove circular sugar moieties from
        Returns:
        a list of atom container objects representing the deglycosylated molecule at list index 0 and the removed circular sugar moieties at the remaining list positions. The returned aglycone may be unconnected if also non-terminal sugars are removed according to the settings, and it may be empty if the resulting structure after sugar removal was too small to preserve due to the set preservation mode and the associated threshold (i.e. the molecule basically was a sugar); the returned sugar moieties that were removed from the molecule have invalid valences at atoms formerly bonded to the molecule core or to other sugar moieties and so does the aglycone in places where removed sugar moieties were formerly situated. If you do not care for where the removed moieties were sitting, saturate with implicit hydrogen atoms.
      • removeLinearSugars

        public boolean removeLinearSugars​(IAtomContainer moleculeParam)
        Removes linear sugar moieties from the given atom container (this operation modifies the given atom container, so clone the object beforehand if you need to preserve its original structure!). Which substructures are removed depends on the settings for linear sugar detection, the setting specifying whether only terminal sugar moieties should be removed and on the set preservation mode.
        The aglycone will have open valences in places where removed sugar moieties were formerly situated. If you do not care for where the removed moieties were sitting, saturate with implicit hydrogen atoms.
        If only terminal sugar moieties are to be removed, the detected linear sugars are one-by-one tested for whether they are terminal or not and removed if they are. The iteration starts anew after iterating over all candidates and stops if no terminal sugar was removed in one whole iteration. If only terminal sugar moieties are removed from the molecule, any disconnected structure resulting from a removal step must be too small to keep according to the set preservation mode option and the set threshold and is cleared away.
        If all the linear sugar moieties are to be removed from the given molecule (including non-terminal ones), those disconnected structures that are too small are only cleared once at the end of the routine.
        In the latter case, the deglycosylated molecule may consist of two or more disconnected structures after deglycosylation, whereas in the former case, the processed structure always consists of one connected structure.
        If the given molecule consists only of linear sugars, an empty atom container is left after processing.
        Parameters:
        moleculeParam - the molecule to remove linear sugar moieties from
        Returns:
        true if sugar moieties were detected and removed
      • removeAndReturnLinearSugars

        public List<IAtomContainer> removeAndReturnLinearSugars​(IAtomContainer moleculeParam)
        Removes linear sugar moieties from the given atom container and returns the resulting aglycone (at list index 0) and the removed linear sugar moieties (this operation modifies the given atom container, so clone the object beforehand if you need to preserve its original structure!). Which substructures are removed depends on the settings for linear sugar detection, the setting specifying whether only terminal sugar moieties should be removed and on the set preservation mode.
        If only terminal sugar moieties are to be removed, the detected linear sugars are one-by-one tested for whether they are terminal or not and removed if they are. The iteration starts anew after iterating over all candidates and stops if no terminal sugar was removed in one whole iteration. If only terminal sugar moieties are removed from the molecule, any disconnected structure resulting from a removal step must be too small to keep according to the set preservation mode option and the set threshold and is cleared away.
        If all the linear sugar moieties are to be removed from the given molecule (including non-terminal ones), those disconnected structures that are too small are only cleared once at the end of the routine.
        In the latter case, the deglycosylated molecule (aglycone at list index 0) may consist of two or more disconnected structures when returned, whereas in the former case, the returned structure always consists of one connected structure.
        If the given molecule consists only of linear sugars, an empty atom container is returned at list index 0.
        The returned sugar moieties that were removed from the molecule have invalid valences at atoms formerly bonded to the molecule core or to other sugar moieties and so does the aglycone in places where removed sugar moieties were formerly situated. If you do not care for where the removed moieties were sitting, saturate with implicit hydrogen atoms.
        If no sugar moieties were removed, the returned list is of size 1 and its only element is the molecule given as parameter, unchanged.
        Parameters:
        moleculeParam - the molecule to remove linear sugar moieties from
        Returns:
        a list of atom container objects representing the deglycosylated molecule at list index 0 and the removed linear sugar moieties at the remaining list positions. The returned aglycone may be unconnected if also non-terminal sugars are removed according to the settings, and it may be empty if the resulting structure after sugar removal was too small to preserve due to the set preservation mode and the associated threshold (i.e. the molecule basically was a sugar); the returned sugar moieties that were removed from the molecule have invalid valences at atoms formerly bonded to the molecule core or to other sugar moieties and so does the aglycone in places where removed sugar moieties were formerly situated. If you do not care for where the removed moieties were sitting, saturate with implicit hydrogen atoms.
      • removeCircularAndLinearSugars

        public boolean removeCircularAndLinearSugars​(IAtomContainer moleculeParam)
        Removes circular and linear sugar moieties from the given atom container (this operation modifies the given atom container, so clone the object beforehand if you need to preserve its original structure!). Which substructures are removed depends on the settings for circular and linear sugar detection, the setting specifying whether only terminal sugar moieties should be removed and on the set preservation mode.
        If only terminal sugar moieties are to be removed, the detected sugars are one-by-one tested for whether they are terminal or not and removed if they are. The iteration starts anew after iterating over all candidates and stops if no terminal sugar was removed in one whole iteration. Important note: To ensure the removal also of linear sugars that only become terminal after removing one or more terminal circular sugar and vice-versa, multiple iterations of circular and linear sugar detection and removal are done here. Therefore, this method might in special cases return another aglycone (the 'true' aglycone) than e.g. a subsequent call to the methods for separate circular and linear sugar removal.
        If only terminal sugar moieties are removed from the molecule, any disconnected structure resulting from a removal step must be too small to keep according to the preservation mode option and the set threshold and is cleared away.
        If all the circular and linear sugars are to be removed from the given molecule (including non-terminal ones), those disconnected structures that are too small are only cleared once at the end of the routine.
        In the latter case, the deglycosylated molecule may consist of two or more disconnected structures when returned, whereas in the former case, the returned structure always consists of one connected structure.
        If the given molecule consists only of sugars, an empty atom container is returned.
        Spiro atoms connecting a removed circular sugar moiety to another cycle are preserved.
        Parameters:
        moleculeParam - the molecule to remove circular and linear sugar moieties from
        Returns:
        true if sugar moieties were detected and removed
      • removeAndReturnCircularAndLinearSugars

        public List<IAtomContainer> removeAndReturnCircularAndLinearSugars​(IAtomContainer moleculeParam)
        Removes circular and linear sugar moieties from the given atom container and returns the resulting aglycone (at list index 0) and the removed sugar moieties (this operation modifies the given atom container, so clone the object beforehand if you need to preserve its original structure!). Which substructures are removed depends on the settings for circular and linear sugar detection, the setting specifying whether only terminal sugar moieties should be removed and on the set preservation mode.
        If only terminal sugar moieties are to be removed, the detected sugars are one-by-one tested for whether they are terminal or not and removed if they are. The iteration starts anew after iterating over all candidates and stops if no terminal sugar was removed in one whole iteration. Important note: To ensure the removal also of linear sugars that only become terminal after removing one or more terminal circular sugar and vice-versa, multiple iterations of circular and linear sugar detection and removal are done here. Therefore, this method might in special cases return another aglycone (the 'true' aglycone) than e.g. a subsequent call to the methods for separate circular and linear sugar removal.
        If only terminal sugar moieties are removed from the molecule, any disconnected structure resulting from a removal step must be too small to keep according to the preservation mode option and the set threshold and is cleared away.
        If all the circular and linear sugars are to be removed from the given molecule (including non-terminal ones), those disconnected structures that are too small are only cleared once at the end of the routine.
        In the latter case, the deglycosylated molecule (aglycone at list index 0) may consist of two or more disconnected structures when returned, whereas in the former case, the returned structure always consists of one connected structure.
        If the given molecule consists only of sugars, an empty atom container is returned at list index 0.
        The returned sugar moieties that were removed from the molecule have invalid valences at atoms formerly bonded to the molecule core or to other sugar moieties and so does the aglycone in places where removed sugar moieties were formerly situated. If you do not care for where the removed moieties were sitting, saturate with implicit hydrogen atoms.
        Spiro atoms connecting a removed circular sugar moiety to another cycle are preserved.
        If no sugar moieties were removed, the returned list is of size 1 and its only element is the molecule given as parameter, unchanged.
        Parameters:
        moleculeParam - the molecule to remove circular and linear sugar moieties from
        Returns:
        a list of atom container objects representing the deglycosylated molecule at list index 0 and the removed sugar moieties at the remaining list positions. The returned aglycone may be unconnected if also non-terminal sugars are removed according to the settings, and it may be empty if the resulting structure after sugar removal was too small to preserve due to the set preservation mode and the associated threshold (i.e. the molecule basically was a sugar); the returned sugar moieties that were removed from the molecule have invalid valences at atoms formerly bonded to the molecule core or to other sugar moieties and so does the aglycone in places where removed sugar moieties were formerly situated. If you do not care for where the removed moieties were sitting, saturate with implicit hydrogen atoms.
      • getCircularSugarCandidates

        public List<IAtomContainer> getCircularSugarCandidates​(IAtomContainer moleculeParam)
        Extracts circular sugar moieties from the given molecule, according to the current settings for circular sugar detection. It is not influenced by the setting specifying whether only terminal sugar moieties should be removed and not by the set preservation mode. Therefore, this method will always return terminal and non-terminal moieties.
        Parameters:
        moleculeParam - the molecule to extract circular sugar moieties from
        Returns:
        a list of substructures in the given molecule that are regarded as circular sugar moieties
      • getLinearSugarCandidates

        public List<IAtomContainer> getLinearSugarCandidates​(IAtomContainer moleculeParam)
        Extracts linear sugar moieties from the given molecule, according to the current settings for linear sugar detection. It is not influenced by the setting specifying whether only terminal sugar moieties should be removed and not by the set preservation mode. Therefore, this method will always return terminal and non-terminal moieties.
        Parameters:
        moleculeParam - the molecule to extract linear sugar moieties from
        Returns:
        a list of substructures in the given molecule that are regarded as linear sugar moieties
      • flagTooSmallDisconnectedPartsToPreserve

        protected void flagTooSmallDisconnectedPartsToPreserve​(IAtomContainer moleculeParam)
        Checks the given input molecule for unconnected parts that would be cleared away after sugar removal, in order to add a flag to the atoms of these parts to be able to preserve them in the later removal. Note that this method does not check whether the molecule is actually disconnected and whether the preservation mode is not set to 'preserve all'.
        Parameters:
        moleculeParam - the disconnected input molecule to check
      • removeTooSmallDisconnectedStructures

        protected void removeTooSmallDisconnectedStructures​(IAtomContainer moleculeParam)
        Removes all unconnected fragments that are too small to keep according to the current preservation mode and threshold setting. If all structures are too small, an empty atom container is returned.
        This does not guarantee that the resulting atom container consists of only one connected structure. There might be multiple unconnected structures that are big enough to be preserved.
        Parameters:
        moleculeParam - the molecule to clean up; it might be empty after this method call but not null
      • isTooSmallToPreserve

        protected boolean isTooSmallToPreserve​(IAtomContainer moleculeParam)
        Checks whether the given molecule or structure is too small to be kept according to the current preservation mode and threshold setting.
        Parameters:
        moleculeParam - the molecule to check
        Returns:
        true, if the given structure is too small to be preserved
      • isTerminal

        protected boolean isTerminal​(IAtomContainer substructure,
                                     IAtomContainer parentMolecule,
                                     List<IAtomContainer> candidateList)
        Checks whether the given substructure is terminal (i.e. it can be removed without producing multiple unconnected structures in the remaining molecule) in the given parent molecule. To do this, the parent molecule is copied, the substructure is removed from this copy, and finally it is checked whether the parent molecule copy still consists of only one connected structure. If that is the case, the substructure is terminal. If the preservation mode is not set to 'preserve all structures', too small resulting fragments are cleared from the parent copy in between. These structures that are too small must also not be part of any other substructure in the given candidate list to avoid removing parts of other sugar candidates.
        Note: This method only detects moieties that are immediately terminal. It will not deem terminal a sugar moiety that only becomes terminal after the removal of another sugar moiety, for example.
        Parameters:
        substructure - the substructure to check for whether it is terminal
        parentMolecule - the molecule the substructure is a part of
        candidateList - a list containing the detected sugar candidates to check whether atoms of other candidates would be cleared away if the given substructure was removed (which has to be avoided)
        Returns:
        true, if the substructure is terminal
      • removeSugarCandidates

        protected List<IAtomContainer> removeSugarCandidates​(IAtomContainer moleculeParam,
                                                             List<IAtomContainer> candidateList)
        Removes the given sugar moieties (or substructures in general) from the given molecule and returns the removed moieties (not the aglycone!). The removal algorithm is the same for linear and circular sugars. The only settings influencing the removal are the option specifying whether to remove only terminal sugar moieties and the set preservation mode (because it influences the determination of terminal vs. non-terminal).
        If only terminal sugar moieties are to be removed, the sugar candidates are one-by-one tested for whether they are terminal or not and removed if they are. The iteration starts anew after iterating over all candidates and stops if no terminal sugar was removed in one whole iteration. If only terminal sugar moieties are removed from the molecule, any disconnected structure resulting from a removal step must be too small to keep according to the preservation mode option and the set threshold and is cleared away.
        If all the sugars are to be removed from the given molecule (including non-terminal ones), those disconnected structures that are too small are only cleared once at the end of the routine.
        In the latter case, the deglycosylated molecule may consist of two or more disconnected structures after this method call, whereas in the former case, the remaining structure always consists of one connected structure.
        Spiro atoms connecting a removed circular sugar moiety to another cycle are preserved.
        Note that the deglycosylated core is not returned as part of the given list in this method.
        Parameters:
        moleculeParam - the molecule to remove the sugar candidates from
        candidateList - the list of sugar moieties in the given molecule
        Returns:
        a list of atom container objects representing the removed sugar moieties; the returned sugar moieties that were removed from the molecule have invalid valences at atoms formerly bonded to the molecule core or to other sugar moieties
      • copy

        protected IAtomContainer copy​(IAtomContainer molecule)
        Generates a very basic copy of the given molecule, intended for testing whether one of its substructures is terminal in isTerminal(IAtomContainer, IAtomContainer, List). Copies atoms (new instances in copy), bonds (new instances in copy), atomic number, implicit hydrogen count, SRU index property, and bond order.
        Parameters:
        molecule - atom container to copy
        Returns:
        basic copy of the molecule
      • addUniqueIndicesToAtoms

        protected void addUniqueIndicesToAtoms​(IAtomContainer moleculeParam)
        Adds an index as property to all atom objects of the given atom container to identify them uniquely within the atom container and its copies. This is required e.g. for the determination of terminal vs. non-terminal sugar moieties.
        Parameters:
        moleculeParam - the molecule that will be processed by the class
      • generateSubstructureIdentifier

        protected String generateSubstructureIdentifier​(IAtomContainer substructure)
        Creates an identifier string for substructures of a molecule, based on the unique indices of the included atoms. It is only encoded which atoms are part of the substructure, no bond information etc. Used for a quick matching of substructures in the same molecule. The unique indices in every atom have to be set.
        Parameters:
        substructure - the substructure to create an identifier for
        Returns:
        the identifier string
      • checkUniqueIndicesOfAtoms

        protected boolean checkUniqueIndicesOfAtoms​(IAtomContainer moleculeParam)
        Checks whether all atoms in the given molecule have a unique (in the given molecule) index as property. It checks the uniqueness of the detected indices but not whether there are numbers missing (the ids of this class are created as numbers starting from zero and growing in integer steps).
        Parameters:
        moleculeParam - the molecule to check
        Returns:
        true if every atom has an index property that is unique in the given molecule
      • detectPotentialSugarCycles

        protected List<IAtomContainer> detectPotentialSugarCycles​(IAtomContainer moleculeParam,
                                                                  boolean includeSpiroRings,
                                                                  boolean ignoreKetoGroups)
        Detects and returns cycles of the given molecule that are isolated (spiro rings included or not according to the boolean parameter), isomorphic to the circular sugar patterns, and only have exocyclic single bonds (keto groups ignored or not according to the boolean parameter). These cycles are the general candidates for circular sugars that are filtered according to the other settings in the following steps. Spiro atoms are marked by a property.
        Parameters:
        moleculeParam - the molecule to extract potential circular sugars from
        includeSpiroRings - specification whether spiro rings should be included in the detected potential sugar cycles or filtered out; for circular sugar detection this should be set according to the current 'detect spiro rings as circular sugars' setting; for filtering circular sugar candidates or their atoms during linear sugar detection, this should be set to 'true'
        ignoreKetoGroups - specification whether potential sugar cycles with keto groups should be included in the returned list; for circular sugar detection this should be set according to the current 'detect circular sugars with keto groups' setting; for filtering circular sugar candidates or their atoms during linear sugar detection, this should be set to 'true'
        Returns:
        a list of the potential sugar cycles
      • areAllExocyclicBondsSingle

        protected boolean areAllExocyclicBondsSingle​(IAtomContainer ringToTest,
                                                     IAtomContainer originalMolecule,
                                                     boolean ignoreKetoGroups)
        Checks whether all exocyclic bonds connected to a given ring fragment of a parent atom container are of single order. If the option to allow potential sugar cycles having keto groups is activated, this method also returns true if a cycle having a keto group is processed.
        The method iterates over all cyclic atoms and all of their bonds. So the runtime scales linear with the number of cyclic atoms and their connected bonds. In principle, this method can be used also for non-cyclic substructures.
        Note: It is not tested whether the original molecule is actually the parent of the ring to test.
        Parameters:
        ringToTest - the ring fragment to test; exocyclic bonds do not have to be included in the fragment but if it is a fused system of multiple rings, the internal interconnecting bonds of the different rings need to be included; all its atoms need to be exactly the same objects as in the second atom container parameter
        originalMolecule - the molecule that contains the ring under investigation; The exocyclic bonds will be queried from it
        ignoreKetoGroups - true if this method should ignore keto groups, i.e. also return true if there are some attached to the cycle
        Returns:
        true, if all exocyclic bonds connected to the ring are of single order
      • hasGlycosidicBond

        protected boolean hasGlycosidicBond​(IAtomContainer ringToTest,
                                            IAtomContainer originalMolecule)
        Checks all exocyclic connections of the given ring to detect an O-glycosidic bond. Checklist for glycosidic bond: Connected oxygen atom that is not in the ring, has two bonds that are both of single order and no bond partner is a hydrogen atom. This algorithm also classifies ester bonds as glycosidic bonds and any other bond type that meets the above criteria. Therefore, many 'non-classical, glycoside-like' connections are classified as O-glycosidic bonds.
        Note: The 'ring' is not tested for whether it is circular or not. So theoretically, this method can also be used to detect glycosidic bonds of linear structures. BUT: The oxygen atom must not be part of the structure itself. Due to the processing of candidate linear sugar moieties this can make it difficult to use this method also for linear sugars.
        Note: It is not tested whether the original molecule is actually the parent of the ring to test.
        Parameters:
        ringToTest - the candidate sugar ring
        originalMolecule - the molecule in which the ring is contained as a substructure to query the connected atoms from
        Returns:
        true, if a glycosidic bond is detected
      • isMoleculeEmptyAfterRemovalOfThisRing

        protected boolean isMoleculeEmptyAfterRemovalOfThisRing​(IAtomContainer ringParam,
                                                                IAtomContainer moleculeParam)
        Checks whether the given molecule would be empty after removal of the given ring. Any remaining fragment will be cleared away if it is too small according to the set preservation mode option. The given parameters are not altered, copies of them are generated and processed. This method is intended to test for whether a molecule qualifies for the gylcosidic bond exemption.
        Parameters:
        ringParam - the ring to test whether its removal would result in an empty molecule
        moleculeParam - the parent molecule
        Returns:
        true if the parent molecule is empty after removal of the given ring and subsequent removal of too small remaining fragments
      • getExocyclicOxygenAtomCount

        protected int getExocyclicOxygenAtomCount​(IAtomContainer ringToTest,
                                                  IAtomContainer originalMolecule)
        Returns the number of attached exocyclic oxygen atoms of a given ring in the original atom container. The method iterates over all cyclic atoms and all of their connected atoms. So the runtime scales linear with the number of cyclic atoms and their connected atoms. The oxygen atoms are not tested for being attached by a single bond since in the algorithm, the determination whether a candidate sugar ring has only exocyclic single bonds precedes the calling of this method.
        Note: The circularity of the given 'ring' is not tested, so this method could in theory also be used for linear structures. But his does not make much sense.
        Note: This method does NOT check for hydroxy groups but for oxygen atoms. So e.g. the oxygen atom in a glycosidic bond is counted.
        Note: It is not tested whether the original molecule is actually the parent of the ring to test.
        Parameters:
        ringToTest - the ring fragment to test; exocyclic bonds do not have to be included in the fragment but if it is a fused system of multiple rings, the internal interconnecting bonds of the different rings need to be included; all its atoms need to be exactly the same objects as in the second atom container parameter (they will be skipped otherwise)
        originalMolecule - the molecule that contains the ring under investigation; The exocyclic bonds will be queried from it
        Returns:
        number of attached exocyclic oxygen atoms of the given ring
      • doesRingHaveEnoughExocyclicOxygenAtoms

        protected boolean doesRingHaveEnoughExocyclicOxygenAtoms​(int numberOfAtomsInRing,
                                                                 int numberOfAttachedExocyclicOxygenAtoms)
        Simple decision-making function for deciding whether a candidate sugar ring has enough attached, single-bonded exocyclic oxygen atoms according to the set threshold. The given number of oxygen atoms is divided by the given number of atoms in the ring (should also contain the usually present oxygen atom in a sugar ring) and the resulting ratio is checked for being equal or higher than the currently set threshold.
        Note: Only the number of atoms in the ring is checked for not being 0. No further parameter tests are implemented. If the number is 0, false is returned. No exceptions are thrown.
        Parameters:
        numberOfAtomsInRing - number of atoms in the possible sugar ring, including the cyclic oxygen atom
        numberOfAttachedExocyclicOxygenAtoms - number of attached exocyclic oxygen atoms of the ring under investigation (if zero, false is returned)
        Returns:
        true, if the calculated ratio is equal to or higher than the currently set threshold
      • updateLinearSugarPatterns

        protected void updateLinearSugarPatterns()
        All linear sugar patterns represented by atom containers in the respective list are sorted, parsed into actual pattern objects, and stored in the internal list for initial linear sugar detection. To be called when a linear sugar pattern has been deleted or added to the list. It cannot directly be operated on the pattern objects because they cannot be sorted or represented in a human-readable format.
      • detectLinearSugarCandidatesByPatternMatching

        protected List<IAtomContainer> detectLinearSugarCandidatesByPatternMatching​(IAtomContainer moleculeParam)
        Initial detection of linear sugar candidates by substructure search for the linear sugar patterns in the given molecule. All 'unique' matches are returned as atom container objects. this means that the same substructure will not be included multiple times but the substructures may overlap.
        Parameters:
        moleculeParam - the molecule to search for linear sugar candidates
        Returns:
        a list of possibly overlapping substructures from the given molecule matching the internal linear sugar patterns
      • combineOverlappingCandidates

        protected List<IAtomContainer> combineOverlappingCandidates​(List<IAtomContainer> candidateList)
        Combines all overlapping (i.e. sharing the same atoms or bonds) structures in the given list into one atom container, respectively, to return distinct, non-overlapping substructures. Second step of linear sugar detection. Note: The returned substructures can grow very big. This is addressed in the third step. The parameter list is not altered and a completely new list returned.
        Parameters:
        candidateList - a list of possibly overlapping substructures from the same atom container object
        Returns:
        a list of distinct, non-overlapping substructures after combining every formerly overlapping structure
      • splitEtherEsterAndPeroxideBonds

        protected List<IAtomContainer> splitEtherEsterAndPeroxideBonds​(List<IAtomContainer> candidateList)
        Splits all ether, ester, and peroxide bonds in the given linear sugar candidates and separates those that get disconnected in the process. Third step of linear sugar detection. This step was introduced because the linear sugar candidates returned by the combination method can be very big and contain connected sugar chains that should be detected as separate candidates. The detection is done using SMARTS patterns that are constants of this class. The parameter list is not altered and a completely new list returned.
        Parameters:
        candidateList - a list of potential sugar substructures from the same atom container object
        Returns:
        a new list of candidates where all ether, ester, and peroxide bonds have been split and disconnected candidates separated
      • removeAtomsOfCircularSugarsFromCandidates

        protected void removeAtomsOfCircularSugarsFromCandidates​(List<IAtomContainer> candidateList,
                                                                 IAtomContainer parentMolecule)
        Removes all atoms belonging to possible circular sugars, as returned by the method for initial circular sugar detection, from the given linear sugar candidates. Fourth step of linear sugar detection. The linear sugar patterns also match parts of circular sugar, so this step has to be done to ensure the separate treatment of circular and linear sugars. After the removal, disconnected candidates are separated into new candidates. Note: here, the given list is altered, unlike in some other methods! Therefore, the list is not returned again. Note also that it is not checked whether the given parent molecule is actually the parent of the given substructures.
        Parameters:
        candidateList - a list of potential sugar substructures from the same atom container object
        parentMolecule - the molecule that is currently scanned for linear sugars to detect its circular sugars
      • removeCyclicAtomsFromSugarCandidates

        protected void removeCyclicAtomsFromSugarCandidates​(List<IAtomContainer> candidateList,
                                                            IAtomContainer moleculeParam)
        Removes all atoms that are part of a cycle from the given linear sugar candidates. Optional fifth step of linear sugar detection. The linear sugar patterns can also match in cycles that do not represent circular sugars but e.g. pseudo-sugars or macrocycles. It is optional to detect linear sugars in such structures or not. After the removal, disconnected candidates are separated into new candidates. Note: here, the given list is altered, unlike in some other methods! Therefore, the list is not returned again. Note also that it is not checked whether the given parent molecule is actually the parent of the given substructures.
        Parameters:
        candidateList - a list of potential sugar substructures from the same atom container object
        moleculeParam - the molecule that is currently scanned for linear sugars to detect its cycles
      • removeTooSmallAndTooLargeCandidates

        protected List<IAtomContainer> removeTooSmallAndTooLargeCandidates​(List<IAtomContainer> candidateList)
        Discards all linear sugar candidates that are too small or too big according to the current settings. Final step of linear sugar detection. This step was introduced because the preceding steps may produce small 'fragments', e.g. the hydroxy group of a circular sugar that was removed from a linear sugar candidate. These should be filtered out. ALso, a very large linear sugar that does not consist of multiple subunits linked by ether, ester, or peroxide bonds is considered too interesting to remove and should therefore also be filtered from the linear sugars detected for removal. The 'size' of the linear sugar candidates is determined as their carbon atom count. The set minimum and maximum sizes are inclusive. The parameter list is not altered and a completely new list returned.
        Parameters:
        candidateList - a list of potential sugar substructures from the same atom container object
        Returns:
        a new list of candidates where all too small and too big candidates have been filtered out