Class SmilesGenerator


  • public final class SmilesGenerator
    extends Object
    SMILES [Weininger, David. Journal of Chemical Information and Computer Sciences. 1988. 28; Weininger, David et al. Journal of Chemical Information and Computer Sciences. 1989. 29] provides a compact representation of chemical structures and reactions.
    Different flavours of SMILES can be generated and are fully configurable. The standard flavours of SMILES defined by Daylight are:
    • Generic - non-canonical SMILES string, different atom ordering produces different SMILES. No isotope or stereochemistry encoded.
    • Unique - canonical SMILES string, different atom ordering produces the same* SMILES. No isotope or stereochemistry encoded.
    • Isomeric - non-canonical SMILES string, different atom ordering produces different SMILES. Isotope and stereochemistry are encoded.
    • Absolute - canonical SMILES string, different atom ordering produces the same SMILES. Isotope and stereochemistry are encoded.
    To output a given flavour the flags in SmiFlavor are used:
     SmilesGenerator smigen = new SmilesGenerator(SmiFlavor.Isomeric);
     
    SmiFlavor provides more fine-grained control. For example, the following is equivalent to SmiFlavor.Isomeric:
     SmilesGenerator smigen = new SmilesGenerator(SmiFlavor.Stereo |
                                                  SmiFlavor.AtomicMass);
     
    Bitwise logic can be used to remove options: SmiFlavor.Isomeric ^ SmiFlavor.AtomicMass will generate isomeric SMILES without atomic mass. A generator instance is created using the constructor or one of the static methods; the SMILES are then created by invoking create(IAtomContainer).
     IAtomContainer  ethanol = ...;
     SmilesGenerator sg      = new SmilesGenerator(SmiFlavor.Generic);
     String          smi     = sg.create(ethanol); // CCO, C(C)O, C(O)C, or OCC
    
     SmilesGenerator sg      = new SmilesGenerator(SmiFlavor.Unique);
     String          smi     = sg.create(ethanol); // only CCO
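
    The bitwise combinations described above can be sketched without the library. The class below is a minimal illustration only: the constant names and numeric values are hypothetical stand-ins chosen for this example and are not the actual values defined in SmiFlavor. It shows why XOR removes a flag only when that flag is already set, and why AND-NOT is the safer way to clear one.

```java
/** Hypothetical stand-in for flavour flags; names and values are illustrative only. */
final class FlavourFlagsSketch {
    static final int CANONICAL   = 0x001; // assumed value, not taken from SmiFlavor
    static final int STEREO      = 0x002; // assumed value, not taken from SmiFlavor
    static final int ATOMIC_MASS = 0x004; // assumed value, not taken from SmiFlavor
    static final int ISOMERIC    = STEREO | ATOMIC_MASS;

    /** True if every bit of 'flag' is set in 'flavour'. */
    static boolean has(int flavour, int flag) {
        return (flavour & flag) == flag;
    }

    /** XOR flips the flag: clears it if set, sets it if not. */
    static int toggle(int flavour, int flag) {
        return flavour ^ flag;
    }

    /** AND-NOT always clears the flag, whether or not it was set. */
    static int remove(int flavour, int flag) {
        return flavour & ~flag;
    }

    public static void main(String[] args) {
        int noMass = toggle(ISOMERIC, ATOMIC_MASS);
        System.out.println(has(noMass, STEREO));       // true
        System.out.println(has(noMass, ATOMIC_MASS));  // false

        // XOR on a flavour that lacks the flag switches it on instead
        System.out.println(has(toggle(STEREO, ATOMIC_MASS), ATOMIC_MASS)); // true
        System.out.println(has(remove(STEREO, ATOMIC_MASS), ATOMIC_MASS)); // false
    }
}
```

    XOR works in the text above because SmiFlavor.Isomeric is known to include the atomic mass flag. When the starting flavour is not known in advance, clearing with AND-NOT avoids accidentally enabling the option.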
     
    The isomeric and absolute generators encode tetrahedral and double bond stereochemistry using the IStereoElements provided on the IAtomContainer. If stereochemistry is not being written, it may need to be determined from 2D/3D coordinates using StereoElementFactory. By default the generator will not write aromatic SMILES. Kekulé SMILES are generally preferred for compatibility: aromaticity can easily be re-perceived by most toolkits, whilst kekulisation may fail. If you really want aromatic SMILES, the following code demonstrates how to generate them:
     IAtomContainer  benzene = ...;
    
     // 'benzene' molecule has no arom flags, we always get Kekulé output
     SmilesGenerator sg      = new SmilesGenerator(SmiFlavor.Generic);
     String          smi     = sg.create(benzene); // C1=CC=CC=C1
    
     SmilesGenerator sg      = new SmilesGenerator(SmiFlavor.Generic |
                                                   SmiFlavor.UseAromaticSymbols);
     String          smi     = sg.create(benzene); // C1=CC=CC=C1 flags not set!
    
     // Note, in practice we'd use an aromaticity algorithm
     for (IAtom a : benzene.atoms())
         a.setIsAromatic(true);
     for (IBond b : benzene.bonds())
         b.setIsAromatic(true);
    
     // 'benzene' molecule now has arom flags, we always get aromatic SMILES if we request it
     SmilesGenerator sg      = new SmilesGenerator(SmiFlavor.Generic);
     String          smi     = sg.create(benzene); // C1=CC=CC=C1
    
     SmilesGenerator sg      = new SmilesGenerator(SmiFlavor.Generic |
                                                   SmiFlavor.UseAromaticSymbols);
     String          smi     = sg.create(benzene); // c1ccccc1
     
    It can be useful to know the output order of SMILES. On input the order of the atoms reflects the atom index. If we know this order we can refer to atoms by index and associate data with the SMILES string. The output order is obtained by passing in an auxiliary array during creation. The following snippet demonstrates how we can write coordinates in order.
    
     IAtomContainer  mol = ...;
     SmilesGenerator sg  = new SmilesGenerator(SmiFlavor.Generic);
    
     int   n     = mol.getAtomCount();
     int[] order = new int[n];
    
     // the order array is filled up as the SMILES is generated
     String smi = sg.create(mol, order);
    
     // load the coordinates array such that they are in the order the atoms
     // are read when parsing the SMILES
     Point2d[] coords = new Point2d[mol.getAtomCount()];
     for (int i = 0; i < coords.length; i++)
         coords[order[i]] = mol.getAtom(i).getPoint2d();
    
     // SMILES string suffixed by the coordinates
     String smi2d = smi + " " + Arrays.toString(coords);
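
    The rearrangement step in the snippet above can be tried in isolation. The sketch below uses plain strings in place of atoms and a hand-written order array; both are assumptions made for illustration, since the real array is filled in by create(IAtomContainer, int[]).

```java
import java.util.Arrays;

/** Minimal sketch of reordering per-atom data with an output-order array. */
final class OutputOrderSketch {

    /** Places data[i] at position order[i], mirroring coords[order[i]] = ... above. */
    static String[] arrange(String[] data, int[] order) {
        String[] out = new String[data.length];
        for (int i = 0; i < data.length; i++)
            out[order[i]] = data[i];
        return out;
    }

    public static void main(String[] args) {
        // assumed input order of the ethanol heavy atoms: O, CH2, CH3
        String[] labels = {"O", "CH2", "CH3"};
        // assumed output order if the SMILES were written as CCO starting at CH3:
        // input atom 0 (O) is written third, atom 1 second, atom 2 first
        int[] order = {2, 1, 0};
        System.out.println(Arrays.toString(arrange(labels, order))); // [CH3, CH2, O]
    }
}
```

    Here order[i] is read as the position in the SMILES string at which input atom i is written, which is the interpretation the coordinate loop above relies on.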
    
     
    Using the output order of SMILES forms the basis of ChemAxon Extended SMILES (CXSMILES), which can also be generated. Extended SMILES allows additional structure data to be serialized, including atom labels/values, fragment grouping (for salts in reactions), polymer repeats, multi-center bonds, and coordinates. The CXSMILES layer is appended after the SMILES so that parsers which don't interpret it can ignore it. The two aggregate flavours are SmiFlavor.CxSmiles and SmiFlavor.CxSmilesWithCoords. As with other flavours, fine-grained control is possible using the SmiFlavor flags.
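
    To illustrate how a parser that does not interpret the extension layer can ignore it, the sketch below keeps only the text before the first space or tab. This is an assumption-laden illustration, not the behaviour of any particular parser: the helper name is hypothetical and the input string is a hand-written example of a SMILES followed by a fragment-grouping layer.

```java
/** Minimal sketch: drop anything that follows the SMILES on the same line. */
final class CxLayerSketch {

    /** Returns the text before the first space or tab, or the whole line if there is none. */
    static String coreSmiles(String line) {
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == ' ' || c == '\t')
                return line.substring(0, i);
        }
        return line;
    }

    public static void main(String[] args) {
        // illustrative input: sodium chloride with a trailing extension layer
        System.out.println(coreSmiles("[Na+].[Cl-] |f:0.1|")); // [Na+].[Cl-]
        System.out.println(coreSmiles("CCO"));                 // CCO
    }
}
```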
    * The unique SMILES generation uses a fast equitable labelling procedure, so there are some structures for which the generated SMILES may not be unique. The number of such structures is generally minimal.
    Author:
    Oliver Horlacher, Stefan Kuhn (chiral smiles), John May
    See Also:
    Aromaticity, Stereocenters, StereoElementFactory, ITetrahedralChirality, IDoubleBondStereochemistry, CDKConstants, SmilesParser
    Source code:
    main
    Belongs to CDK module:
    smiles
    Keywords:
    SMILES, generator
    • Constructor Detail

      • SmilesGenerator

        public SmilesGenerator(int flavour)
        Create a SMILES generator with the specified SmiFlavor.
         SmilesGenerator smigen = new SmilesGenerator(SmiFlavor.Stereo |
                                                      SmiFlavor.Canonical);
         
        Parameters:
        flavour - SMILES flavour flags SmiFlavor
    • Method Detail

      • aromatic

        public SmilesGenerator aromatic()
        Deprecated.
        configure with SmiFlavor
        Derives a new generator that writes aromatic atoms in lower case. The preferred way of doing this is now to use the SmilesGenerator(int) constructor:
         SmilesGenerator smigen = new SmilesGenerator(SmiFlavor.UseAromaticSymbols);
         
        Returns:
        a generator for aromatic SMILES
      • withAtomClasses

        @Deprecated
        public SmilesGenerator withAtomClasses()
        Deprecated.
        configure with SmiFlavor
        Specifies that the generator should write atom classes in SMILES. Atom classes are provided by the CDKConstants.ATOM_ATOM_MAPPING property. This method returns a new SmilesGenerator to use.
         IAtomContainer  container = ...;
         SmilesGenerator smilesGen = SmilesGenerator.unique()
                                                    .withAtomClasses();
         smilesGen.createSMILES(container); // C[CH2:4]O second atom has class = 4
         
        Returns:
        a generator for SMILES with atom classes
      • generic

        public static SmilesGenerator generic()
        Create a generator for generic SMILES. Generic SMILES are non-canonical and useful for storing information when it is not used as an index (i.e. as unique keys). The generated SMILES is dependent on the input order of the atoms.
        Returns:
        a new arbitrary SMILES generator
      • isomeric

        public static SmilesGenerator isomeric()
        Convenience method for creating an isomeric generator. Isomeric SMILES are non-unique but contain isotope numbers (e.g. [13C]) and stereo-chemistry.
        Returns:
        a new isomeric SMILES generator
      • unique

        public static SmilesGenerator unique()
        Create a unique SMILES generator. Unique SMILES use a fast canonisation algorithm but do not encode isotope or stereo-chemistry.
        Returns:
        a new unique SMILES generator
      • absolute

        public static SmilesGenerator absolute()
        Create an absolute SMILES generator. Absolute SMILES uses the InChI to canonise SMILES and encodes isotope and stereo-chemistry. The InChI module is not a dependency of the SMILES module but should be present on the classpath when generating absolute SMILES.
        Returns:
        a new absolute SMILES generator
      • createSMILES

        @Deprecated
        public String createSMILES(IAtomContainer molecule)
        Deprecated.
        use #create
        Create a SMILES string for the provided molecule.
        Parameters:
        molecule - the molecule to create the SMILES of
        Returns:
        a SMILES string
      • createSMILES

        @Deprecated
        public String createSMILES(IReaction reaction)
        Deprecated.
        use #createReactionSMILES
        Create a SMILES string for the provided reaction.
        Parameters:
        reaction - the reaction to create the SMILES of
        Returns:
        a reaction SMILES string
      • create

        public String create(IAtomContainer molecule)
                      throws CDKException
        Generate SMILES for the provided molecule.
        Parameters:
        molecule - The molecule to evaluate
        Returns:
        the SMILES string
        Throws:
        CDKException - SMILES could not be created
      • create

        public String create(IAtomContainer molecule,
                             int[] order)
                      throws CDKException
        Creates a SMILES string of the flavour specified in the constructor and writes the output order to the provided array.
        The output order allows one to arrange auxiliary atom data in the order that a SMILES string will be read. A simple example is seen below where 2D coordinates are stored with a SMILES string. This method forms the basis of CXSMILES.
        
         IAtomContainer  mol = ...;
         SmilesGenerator sg  = new SmilesGenerator(SmiFlavor.Generic);
        
         int   n     = mol.getAtomCount();
         int[] order = new int[n];
        
         // the order array is filled up as the SMILES is generated
         String smi = sg.create(mol, order);
        
         // load the coordinates array such that they are in the order the atoms
         // are read when parsing the SMILES
         Point2d[] coords = new Point2d[mol.getAtomCount()];
         for (int i = 0; i < coords.length; i++)
             coords[order[i]] = mol.getAtom(i).getPoint2d();
        
         // SMILES string suffixed by the coordinates
         String smi2d = smi + " " + Arrays.toString(coords);
        
         
        Parameters:
        molecule - the molecule to write
        order - array to store the output order of atoms
        Returns:
        the SMILES string
        Throws:
        CDKException - SMILES could not be created
      • create

        public static String create(IAtomContainer molecule,
                                    int flavour,
                                    int[] order)
                             throws CDKException
        Creates a SMILES string of the flavour specified as a parameter and writes the output order to the provided array.
        The output order allows one to arrange auxiliary atom data in the order that a SMILES string will be read. A simple example is seen below where 2D coordinates are stored with a SMILES string. This method forms the basis of CXSMILES.
        
         IAtomContainer mol = ...;
        
         int   n     = mol.getAtomCount();
         int[] order = new int[n];
        
         // the order array is filled up as the SMILES is generated;
         // this overload is static, so no generator instance is needed
         String smi = SmilesGenerator.create(mol, SmiFlavor.Generic, order);
        
         // load the coordinates array such that they are in the order the atoms
         // are read when parsing the SMILES
         Point2d[] coords = new Point2d[mol.getAtomCount()];
         for (int i = 0; i < coords.length; i++)
             coords[order[i]] = mol.getAtom(i).getPoint2d();
        
         // SMILES string suffixed by the coordinates
         String smi2d = smi + " " + Arrays.toString(coords);
        
         
        Parameters:
        molecule - the molecule to write
        flavour - SMILES flavour flags SmiFlavor
        order - array to store the output order of atoms
        Returns:
        the SMILES string
        Throws:
        CDKException - a valid SMILES could not be created
      • create

        public String create(IReaction reaction)
                      throws CDKException
        Create a SMILES for a reaction of the flavour specified in the constructor.
        Parameters:
        reaction - CDK reaction instance
        Returns:
        reaction SMILES
        Throws:
        CDKException
      • create

        public String create(IReaction reaction,
                             int[] ordering)
                      throws CDKException
        Create a SMILES for a reaction of the flavour specified in the constructor and write the output order to the provided array.
        Parameters:
        reaction - CDK reaction instance
        ordering - array to store the output order of atoms
        Returns:
        reaction SMILES
        Throws:
        CDKException
      • setUseAromaticityFlag

        @Deprecated
        public void setUseAromaticityFlag(boolean useAromaticityFlag)
        Deprecated.
        since 1.5.6, use aromatic() - invoking this method does nothing
        Indicates whether output should be an aromatic SMILES.
        Parameters:
        useAromaticityFlag - if false, only SP2-hybridized atoms are written in lower case (default); if true, either SP2 hybridization or aromaticity triggers lower case