Class Abbreviations

  • All Implemented Interfaces:
    Iterable<String>

    public class Abbreviations
    extends Object
    implements Iterable<String>
    Utility class for abbreviating (sub)structures. A structure depiction can be made more concise by replacing structural motifs with abbreviations (sometimes called superatoms), using either self-assigned motifs or a pre-loaded common set.

    Basic usage:

    
     Abbreviations abrv = new Abbreviations();
    
     // add some abbreviations; when they overlap (e.g. Me, Et, tBu) the first one wins
     abrv.add("[Na+].[H-] NaH");
     abrv.add("*c1ccccc1 Ph");
     abrv.add("*C(C)(C)C tBu");
     abrv.add("*CC Et");
     abrv.add("*C Me");
    
     // maybe we don't want 'Me' in the depiction
     abrv.setEnabled("Me", false);
    
     // assign abbreviations with some filters
     int numAdded = abrv.apply(mol);
    
     // generate all but don't assign; to use them, add them manually by
     // setting/updating the CDKConstants.CTAB_SGROUPS property of mol
     List<Sgroup> sgroups = abrv.generate(mol);
     

    Predefined sets of abbreviations can be loaded; the following is available on the classpath.

    
     // https://www.github.com/openbabel/superatoms
     abrv.loadFromFile("obabel_superatoms.smi");
     
    See Also:
    CDKConstants.CTAB_SGROUPS, Sgroup
    Keywords:
    abbreviate, depict, superatom
    • Constructor Detail

      • Abbreviations

        public Abbreviations()
    • Method Detail

      • iterator

        public Iterator<String> iterator()
        Iterate over loaded abbreviations. Both enabled and disabled abbreviations are listed.
        Specified by:
        iterator in interface Iterable<String>
        Returns:
        the abbreviation labels (e.g. Ph, Et, Me, OAc, etc.)
      • isEnabled

        public boolean isEnabled​(String label)
        Check whether an abbreviation is enabled.
        Parameters:
        label - the label to check (e.g. Ph, Et, Me, OAc, etc.)
        Returns:
        whether the label is enabled
      • setEnabled

        public boolean setEnabled​(String label,
                                  boolean enabled)
        Set whether an abbreviation is enabled or disabled.
        Parameters:
        label - the label (e.g. Ph, Et, Me, OAc, etc.)
        enabled - flag the label as enabled or disabled
        Returns:
        whether the label state was modified
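
        The return values of isEnabled and setEnabled can be illustrated with a minimal sketch. LabelToggleSketch is a hypothetical stand-in written for this example, not the library class, and it assumes that unknown labels are reported as not enabled and cannot be modified.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in, not the CDK class: tracks known labels and which are disabled.
class LabelToggleSketch {
    private final Set<String> labels = new LinkedHashSet<>();
    private final Set<String> disabled = new LinkedHashSet<>();

    void addLabel(String label) {
        labels.add(label);
    }

    // Assumption of this sketch: a label that was never added is not enabled.
    boolean isEnabled(String label) {
        return labels.contains(label) && !disabled.contains(label);
    }

    // Returns true only when the stored state actually changed.
    boolean setEnabled(String label, boolean enabled) {
        if (!labels.contains(label)) {
            return false;
        }
        return enabled ? disabled.remove(label) : disabled.add(label);
    }

    public static void main(String[] args) {
        LabelToggleSketch sketch = new LabelToggleSketch();
        sketch.addLabel("Me");
        System.out.println(sketch.setEnabled("Me", false)); // true: state changed
        System.out.println(sketch.setEnabled("Me", false)); // false: already disabled
        System.out.println(sketch.isEnabled("Me"));         // false
    }
}
```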
      • with

        public Abbreviations with​(Abbreviations.Option option)
        Convenience method to enable an option.
        Parameters:
        option - the option to enable.
        Returns:
        self, for chaining
      • without

        public Abbreviations without​(Abbreviations.Option option)
        Convenience method to disable an option.
        Parameters:
        option - the option to disable.
        Returns:
        self, for chaining
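
        Because with and without both return the instance itself, calls can be chained. The sketch below shows the pattern only: OptionChainSketch and the option names EXAMPLE_A and EXAMPLE_B are hypothetical and do not correspond to the real Abbreviations.Option constants.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical illustration of the "return self" pattern; not the CDK class.
class OptionChainSketch {
    // Hypothetical option names, chosen only for illustration.
    enum Option { EXAMPLE_A, EXAMPLE_B }

    private final Set<Option> options = EnumSet.noneOf(Option.class);

    OptionChainSketch with(Option option) {
        options.add(option);
        return this; // self, for chaining
    }

    OptionChainSketch without(Option option) {
        options.remove(option);
        return this; // self, for chaining
    }

    boolean has(Option option) {
        return options.contains(option);
    }

    public static void main(String[] args) {
        OptionChainSketch sketch = new OptionChainSketch()
                .with(Option.EXAMPLE_A)
                .with(Option.EXAMPLE_B)
                .without(Option.EXAMPLE_A);
        System.out.println(sketch.has(Option.EXAMPLE_A)); // false
        System.out.println(sketch.has(Option.EXAMPLE_B)); // true
    }
}
```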
      • setContractOnHetero

        public void setContractOnHetero​(boolean val)
        Set whether abbreviations should be further contracted when they are connected to a heteroatom, for example -NH-Boc becomes -NHBoc. By default this option is enabled.
        Parameters:
        val - true to turn the option on, false to turn it off
      • setContractToSingleLabel

        public void setContractToSingleLabel​(boolean val)
      • generate

        public List<Sgroup> generate​(IAtomContainer mol)
        Find all enabled abbreviations in the provided molecule. They are not added to the existing Sgroups and may need filtering.
        Parameters:
        mol - molecule
        Returns:
        list of new abbreviation Sgroups
      • generate

        public List<Sgroup> generate​(IAtomContainer mol,
                                     Map<IAtom,​Integer> atomSets)
        Find all enabled abbreviations in the provided molecule. They are not added to the existing Sgroups and may need filtering.
        Parameters:
        mol - molecule
        atomSets - marks atoms as belonging to a set; a set cannot be split by an abbreviation
        Returns:
        list of new abbreviation Sgroups
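
        The "sets cannot be split" constraint can be read as: an abbreviation must not contain some atoms of a set while leaving others outside. The following sketch checks that condition under simplifying assumptions. It uses integer atom indices instead of IAtom, and AtomSetSketch and splitsASet are hypothetical names, not part of the library.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; atoms are represented by integer indices rather than IAtom.
class AtomSetSketch {
    // True if any set has atoms both inside and outside the candidate abbreviation.
    static boolean splitsASet(Set<Integer> candidate, Map<Integer, Integer> atomSets) {
        Set<Integer> setsInside = new HashSet<>();
        Set<Integer> setsOutside = new HashSet<>();
        for (Map.Entry<Integer, Integer> entry : atomSets.entrySet()) {
            if (candidate.contains(entry.getKey())) {
                setsInside.add(entry.getValue());
            } else {
                setsOutside.add(entry.getValue());
            }
        }
        setsInside.retainAll(setsOutside); // sets seen on both sides are split
        return !setsInside.isEmpty();
    }

    public static void main(String[] args) {
        // atoms 0-2 belong to set 1, atoms 3-4 belong to set 2
        Map<Integer, Integer> atomSets = Map.of(0, 1, 1, 1, 2, 1, 3, 2, 4, 2);
        System.out.println(splitsASet(Set.of(3, 4), atomSets));    // false: set 2 is fully covered
        System.out.println(splitsASet(Set.of(2, 3, 4), atomSets)); // true: set 1 would be split
    }
}
```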
      • apply

        public int apply​(IAtomContainer mol)
        Generates and assigns abbreviations to a molecule. Abbreviations are first generated with generate(org.openscience.cdk.interfaces.IAtomContainer) and then applied to the molecule if it is reasonable to do so. Currently, we count the number of ring/chain atoms inside and outside the contraction. If more atoms are contracted than not, the abbreviation is not applied.
        Parameters:
        mol - molecule
        Returns:
        number of new abbreviations
        See Also:
        generate(IAtomContainer, Map)
      • apply

        public int apply​(IAtomContainer mol,
                         Map<IAtom,​Integer> atomSets)
        Generates and assigns abbreviations to a molecule. Abbreviations are first generated with generate(org.openscience.cdk.interfaces.IAtomContainer) and then applied to the molecule if it is reasonable to do so. Currently, we count the number of ring/chain atoms inside and outside the contraction. If more atoms are contracted than not, the abbreviation is not applied.
        Parameters:
        mol - molecule
        atomSets - atoms mapped to set identifiers; atoms in the same set are kept together
        Returns:
        number of new abbreviations
        See Also:
        generate(IAtomContainer, Map)
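
        The rule described for apply can be sketched as a simple comparison of atom counts. This is an illustration of the documented wording only: ApplyRuleSketch and shouldApply are hypothetical names, the threshold used by the library may differ, and whole-structure (detached) abbreviations would presumably need separate handling that this sketch ignores.

```java
// Hypothetical illustration of the documented rule; not the library implementation.
class ApplyRuleSketch {
    // Apply only when the contracted atoms do not outnumber the remaining atoms.
    static boolean shouldApply(int contractedAtoms, int totalAtoms) {
        int remainingAtoms = totalAtoms - contractedAtoms;
        return contractedAtoms <= remainingAtoms;
    }

    public static void main(String[] args) {
        // tBu (4 atoms) on a 10 atom molecule: 4 contracted, 6 remain
        System.out.println(shouldApply(4, 10)); // true
        // Et (2 atoms) on a 3 atom molecule: 2 contracted, 1 remains
        System.out.println(shouldApply(2, 3));  // false
    }
}
```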
      • add

        public boolean add​(String line)
                    throws InvalidSmilesException
        Convenience method to add an abbreviation from a SMILES string.
        Parameters:
        line - the SMILES to add, followed by a title (the label)
        Returns:
        whether the abbreviation was added; false if no title was supplied
        Throws:
        InvalidSmilesException - the SMILES was not valid
      • add

        public boolean add​(IAtomContainer mol,
                           String label)
        Add an abbreviation to the factory. Abbreviations come in several flavours, based on the number of attachments:

          • Detached - zero attachments, the abbreviation covers the whole structure (e.g. THF)
          • Terminal - one attachment, covers substituents (e.g. Ph for Phenyl)
          • Linker - [NOT SUPPORTED YET] two attachments, covers long repeated chains (e.g. PEG4)

        Attachment points (if present) must be specified with zero-element atoms (written as * in SMILES).

         *c1ccccc1 Ph
         *OC(=O)C OAc
         
        Parameters:
        mol - the fragment to abbreviate
        label - the label of the fragment
        Returns:
        the abbreviation was added
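
        The three flavours listed above depend only on the number of attachment points. As a rough sketch, the flavour can be estimated from a SMILES string by counting * characters. This is a simplification made for illustration: the library works on a parsed IAtomContainer, and AttachmentSketch and flavour are hypothetical names.

```java
// Hypothetical helper: classifies an abbreviation by counting '*' in its SMILES.
class AttachmentSketch {
    static String flavour(String smiles) {
        long attachments = smiles.chars().filter(c -> c == '*').count();
        if (attachments == 0) {
            return "Detached";
        } else if (attachments == 1) {
            return "Terminal";
        } else if (attachments == 2) {
            return "Linker"; // documented as not supported yet
        }
        return "Unknown";
    }

    public static void main(String[] args) {
        System.out.println(flavour("C1CCOC1"));   // Detached (THF)
        System.out.println(flavour("*c1ccccc1")); // Terminal (Ph)
        System.out.println(flavour("*OCCO*"));    // Linker
    }
}
```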
      • loadFromFile

        public int loadFromFile​(String path)
                         throws IOException
        Load a set of abbreviations from a classpath resource or file in SMILES format. The title is separated from the SMILES by a space.
         *c1ccccc1 Ph
         *OC(=O)C OAc
         

        Available:

        obabel_superatoms.smi
        https://www.github.com/openbabel/superatoms
        Parameters:
        path - classpath or filesystem path to a SMILES file
        Returns:
        the number of loaded abbreviations
        Throws:
        IOException
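
        The line format used by loadFromFile and add(String), a SMILES string followed by a space and a title, can be split as in the sketch below. SmilesLineSketch and parse are hypothetical names, and returning null for a line without a title is an assumption of this example rather than documented library behaviour.

```java
// Hypothetical helper: splits "SMILES title" lines such as "*c1ccccc1 Ph".
class SmilesLineSketch {
    // Returns {smiles, label}, or null when the line carries no title.
    static String[] parse(String line) {
        String trimmed = line.trim();
        int separator = trimmed.indexOf(' ');
        if (separator < 0) {
            return null; // no title supplied
        }
        String smiles = trimmed.substring(0, separator);
        String label = trimmed.substring(separator + 1).trim();
        return new String[] {smiles, label};
    }

    public static void main(String[] args) {
        String[] parts = parse("*OC(=O)C OAc");
        System.out.println(parts[0]);           // *OC(=O)C
        System.out.println(parts[1]);           // OAc
        System.out.println(parse("*C") == null); // true
    }
}
```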