Class RdfileReader

  • All Implemented Interfaces:
    Closeable, AutoCloseable, Iterator<RdfileRecord>

    public final class RdfileReader
    extends Object
    implements Closeable, Iterator<RdfileRecord>
    Iterating reader for RDFiles.

    This class facilitates reading RDFiles, whose specification was initially published in [Dalby, A. et al., Journal of Chemical Information and Computer Sciences, 1992, 32] and is now maintained by Dassault Systèmes [Dassault Systèmes, CTFile Formats Biovia Databases 2020, 2020, https://discover.3ds.com/sites/default/files/2020-08/biovia_ctfileformats_2020.pdf].

    An RDFile is composed of:
    1. an RDFile header
    2. one or more records, each of which comprises
      1. an optional internal or external registry number
      2. a molecule represented as a MolFile in V2000 or V3000 format or a reaction represented as an RxnFile in V2000 or V3000 format
      3. an optional data block that consists of one or more (data field identifier, data) pairs
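    The following deliberately simplified splitter, written in plain Java, makes the record layout above concrete. It is a hypothetical sketch, not the CDK implementation, and it does not parse the embedded MolFile or RxnFile. It assumes the marker lines of the CTfile specification: $RDFILE and $DATM for the header, $MFMT (molecule) or $RFMT (reaction) to start a record, an optional registry-number token pair such as "$MIREG 17" on that same line, and $DTYPE/$DATUM line pairs for the data block. Multi-line datum values and records identified only by a registry number are not handled. The names RdfileLayoutSketch, Rec and split are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified illustration of the RDFile record layout; not the CDK RdfileReader. */
public final class RdfileLayoutSketch {

    /** One record: "molecule" or "reaction", an optional registry number, the raw CTfile text and its data fields. */
    public static final class Rec {
        public final String kind;
        public final String registryNumber; // null if the record line carries no registry number
        public final String body;           // embedded MolFile or RxnFile text, not parsed here
        public final Map<String, String> data;

        Rec(String kind, String registryNumber, String body, Map<String, String> data) {
            this.kind = kind;
            this.registryNumber = registryNumber;
            this.body = body;
            this.data = data;
        }
    }

    public static List<Rec> split(String rdfile) {
        List<Rec> records = new ArrayList<>();
        String kind = null;
        String registryNumber = null;
        String field = null; // name from the last $DTYPE line that is still waiting for its $DATUM
        StringBuilder body = new StringBuilder();
        Map<String, String> data = new LinkedHashMap<>();
        for (String line : rdfile.split("\r?\n")) {
            if (line.startsWith("$RDFILE") || line.startsWith("$DATM")) {
                continue; // RDFile header lines
            }
            if (line.startsWith("$MFMT") || line.startsWith("$RFMT")) {
                if (kind != null) {
                    records.add(new Rec(kind, registryNumber, body.toString(), data));
                }
                kind = line.startsWith("$MFMT") ? "molecule" : "reaction";
                // e.g. "$MFMT $MIREG 17": the third token, if present, is the registry number
                String[] tokens = line.trim().split("\\s+");
                registryNumber = tokens.length >= 3 ? tokens[2] : null;
                body = new StringBuilder();
                data = new LinkedHashMap<>();
                field = null;
            } else if (line.startsWith("$DTYPE")) {
                field = line.substring("$DTYPE".length()).trim();
            } else if (line.startsWith("$DATUM") && field != null) {
                data.put(field, line.substring("$DATUM".length()).trim());
                field = null;
            } else if (kind != null && field == null && data.isEmpty()) {
                body.append(line).append('\n'); // still inside the embedded MolFile/RxnFile
            }
        }
        if (kind != null) {
            records.add(new Rec(kind, registryNumber, body.toString(), data));
        }
        return records;
    }
}
```

    Lines such as $RXN and $MOL inside a reaction are kept as part of the record body, because only $MFMT and $RFMT are treated as record boundaries.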
    Here is an example of how to read an RDF that is expected to only contain molecules:
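     import java.io.FileReader;
     import java.io.IOException;
     import java.util.ArrayList;
     import java.util.List;
     import org.openscience.cdk.interfaces.IAtomContainer;
     import org.openscience.cdk.silent.SilentChemObjectBuilder;

     // read an RDF that is expected to only contain molecules
     List<IAtomContainer> molecules = new ArrayList<>();
     try (RdfileReader rdfileReader = new RdfileReader(new FileReader("molecules.rdf"), SilentChemObjectBuilder.getInstance())) {
         while (rdfileReader.hasNext()) {
             final RdfileRecord rdfileRecord = rdfileReader.next();
             if (rdfileRecord.isMolfile()) {
                 molecules.add(rdfileRecord.getAtomContainer());
             } else {
                 // create a log entry or throw an exception, as only molecules are expected in this RDF
             }
         }
     } catch (IOException e) {
         // opening the file or closing the reader failed
     }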
     

    By default, all remaining records are skipped once an error is encountered in a record. This behavior can be changed with one of the constructors that accept a boolean argument continueOnError: RdfileReader(InputStream, IChemObjectBuilder, boolean) for an InputStream, or RdfileReader(Reader, IChemObjectBuilder, boolean) for a Reader.
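    The difference between the two error-handling policies can be sketched with a generic look-ahead iterator in plain Java. This is a hypothetical illustration, not how RdfileReader is implemented: the raw record texts are assumed to be already split, a parse failure is modelled as a RuntimeException thrown by a caller-supplied parser, and the class name ErrorPolicyIterator is invented for this example.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Function;

/** Hypothetical sketch of the two error policies; not the CDK RdfileReader implementation. */
public final class ErrorPolicyIterator<T> implements Iterator<T> {
    private final Iterator<String> rawRecords;
    private final Function<String, T> parser; // assumed to throw a RuntimeException on bad input and never return null
    private final boolean continueOnError;
    private T lookahead;     // next successfully parsed record, filled by hasNext()
    private boolean stopped; // true once an error has ended iteration under the default policy

    public ErrorPolicyIterator(Iterator<String> rawRecords, Function<String, T> parser, boolean continueOnError) {
        this.rawRecords = rawRecords;
        this.parser = parser;
        this.continueOnError = continueOnError;
    }

    @Override
    public boolean hasNext() {
        while (lookahead == null && !stopped && rawRecords.hasNext()) {
            String raw = rawRecords.next();
            try {
                lookahead = parser.apply(raw);
            } catch (RuntimeException e) {
                if (!continueOnError) {
                    stopped = true; // default: skip all remaining records
                }
                // continueOnError: drop only the faulty record and try the next one
            }
        }
        return lookahead != null;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T result = lookahead;
        lookahead = null;
        return result;
    }
}
```

    With the raw records "1", "x", "3" and Integer::parseInt as the parser, the default policy yields only 1, whereas continueOnError set to true yields 1 and 3.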

    Author:
    Uli Fechner
    See Also:
    RdfileRecord
    • Constructor Detail

      • RdfileReader

        public RdfileReader(InputStream in,
                            IChemObjectBuilder chemObjectBuilder)
        Creates a new RdfileReader instance with the given InputStream and IChemObjectBuilder.
        Parameters:
        in - the InputStream providing the RDFile data
        chemObjectBuilder - the IChemObjectBuilder for creating CDK objects
      • RdfileReader

        public RdfileReader(InputStream in,
                            IChemObjectBuilder chemObjectBuilder,
                            boolean continueOnError)
        Creates a new RdfileReader instance with the given InputStream and IChemObjectBuilder.

        If continueOnError is true, the remaining records are processed when an error is encountered; if it is false, all remaining records in the file are skipped.

        Parameters:
        in - the InputStream providing the RDFile data
        chemObjectBuilder - the IChemObjectBuilder for creating CDK objects
        continueOnError - determines whether to continue processing records in case an error is encountered
      • RdfileReader

        public RdfileReader(Reader reader)
        Creates a new RdfileReader instance with the given Reader.
        Parameters:
        reader - the Reader providing the RDFile data
      • RdfileReader

        public RdfileReader(Reader reader,
                            IChemObjectBuilder chemObjectBuilder,
                            boolean continueOnError)
        Creates a new RdfileReader instance with the given Reader and IChemObjectBuilder.

        If continueOnError is true, the remaining records are processed when an error is encountered; if it is false, all remaining records in the file are skipped.

        Parameters:
        reader - the Reader providing the RDFile data
        chemObjectBuilder - the IChemObjectBuilder for creating CDK objects
        continueOnError - determines whether to continue processing records in case an error is encountered
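        This page does not state which character encoding the InputStream constructors use when decoding the data. If the encoding of an input is known, one option is to decode the stream explicitly and pass the resulting Reader to a Reader-based constructor. The helper below is a hypothetical plain-Java sketch of that step; the names RdfileInputDecoding and decode are invented and are not part of the CDK.

```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

/** Hypothetical helper; not part of the CDK. */
public final class RdfileInputDecoding {
    private RdfileInputDecoding() {}

    /** Wraps the stream in a buffered Reader that decodes its bytes with the given charset. */
    public static BufferedReader decode(InputStream in, Charset charset) {
        return new BufferedReader(new InputStreamReader(in, charset));
    }
}
```

        The resulting Reader could then be passed to RdfileReader(Reader, IChemObjectBuilder, boolean). Decoding with the wrong charset does not fail loudly by default: InputStreamReader replaces malformed input, so non-ASCII text in data fields may be silently altered.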