9.6 Chemical Graph Formats

Many file formats have been used to encode chemical graphs in different software packages. CACTVS [101], a popular chemical software package, supports many file formats and incorporates an input/output module manager. BABEL [102] began as a format conversion utility and grew into one of the first chemoinformatics tools. One of the most popular file formats, the MDL molfile [103], provides a simple, easy-to-use representation of molecules and supports the common chemical properties of atoms and bonds, including atomic coordinates.
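To make the graph view of a molfile concrete, the following is a minimal sketch of reading the atom and bond blocks of a V2000 molfile into a node list and an edge list. It assumes the fixed-width V2000 layout (three header lines, a counts line, then one line per atom and per bond) and ignores charges, properties, and everything after the bond block. The function name `parse_molfile_v2000` and the three-atom ethanol record (heavy atoms only) are illustrative and not taken from any of the packages mentioned above.

```python
def parse_molfile_v2000(text):
    """Return (atoms, bonds) from a V2000 molfile string.

    atoms: list of (element_symbol, (x, y, z))
    bonds: list of (atom_index_a, atom_index_b, bond_order), 0-based indices
    Assumption: fixed-width V2000 fields; no error handling for other variants.
    """
    lines = text.splitlines()
    counts = lines[3]                      # line 4 is the counts line
    n_atoms = int(counts[0:3])             # columns 1-3: number of atoms
    n_bonds = int(counts[3:6])             # columns 4-6: number of bonds

    atoms = []
    for line in lines[4:4 + n_atoms]:
        # x, y, z occupy three 10-character fields; the symbol follows a space
        xyz = (float(line[0:10]), float(line[10:20]), float(line[20:30]))
        symbol = line[31:34].strip()
        atoms.append((symbol, xyz))

    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        a, b, order = int(line[0:3]), int(line[3:6]), int(line[6:9])
        bonds.append((a - 1, b - 1, order))  # molfile atom numbers are 1-based
    return atoms, bonds


# Hypothetical example record: ethanol, hydrogens left implicit.
ETHANOL_MOLFILE = "\n".join([
    "ethanol",
    "  illustrative-example",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.2500    1.3000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "  2  3  1  0",
    "M  END",
])

atoms, bonds = parse_molfile_v2000(ETHANOL_MOLFILE)
print([symbol for symbol, _ in atoms])  # node labels of the chemical graph
print(bonds)                            # edges with bond orders
```

The atoms become labeled nodes and the bond block becomes an edge list, which is the form most graph algorithms expect. A production reader would also need to handle the V3000 variant and the property block, which this sketch deliberately skips.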

The Chemical Markup Language (CML) [104] is built on SGML and XML and provides a rich, flexible format for chemical information. A more compact and popular format, the SMILES string notation, captures the information needed for a chemical diagram, including stereodescriptors and aromaticity. It is easy to understand and naturally extends the notion of the molecular formula, but it was not designed to establish a molecule's identity, since it does not include a clear, documented method for canonicalization: the same molecule can be written as several different valid strings. This contrasts with the IUPAC InChI, which is by definition unique for each molecule across its various layers of isomerism.
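The point about canonicalization can be illustrated with a small sketch. The code below parses a deliberately restricted SMILES subset (only the atoms C, N, and O, implicit single bonds, and parenthesized branches; no rings, charges, or stereo) and checks by brute force whether two strings describe the same labeled graph. The function names are hypothetical, and the permutation search is an assumption chosen for brevity; it is only practical for very small molecules and is not how real toolkits canonicalize.

```python
from itertools import permutations


def parse_simple_smiles(smiles):
    """Parse a tiny SMILES subset into (atoms, bonds).

    Supported (assumption): atoms C, N, O; implicit single bonds; branches
    written with parentheses. Anything else raises ValueError.
    """
    atoms, bonds = [], set()
    branch_stack = []   # atoms to return to when a branch closes
    prev = None         # index of the atom the next atom bonds to
    for ch in smiles:
        if ch == "(":
            branch_stack.append(prev)
        elif ch == ")":
            prev = branch_stack.pop()
        elif ch in "CNO":
            atoms.append(ch)
            idx = len(atoms) - 1
            if prev is not None:
                bonds.add(frozenset((prev, idx)))
            prev = idx
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")
    return atoms, bonds


def same_molecule(smiles_a, smiles_b):
    """Brute-force labeled-graph isomorphism test for tiny molecules."""
    atoms_a, bonds_a = parse_simple_smiles(smiles_a)
    atoms_b, bonds_b = parse_simple_smiles(smiles_b)
    if sorted(atoms_a) != sorted(atoms_b) or len(bonds_a) != len(bonds_b):
        return False
    n = len(atoms_a)
    for perm in permutations(range(n)):
        # perm maps atom i of molecule A to atom perm[i] of molecule B
        if any(atoms_a[i] != atoms_b[perm[i]] for i in range(n)):
            continue
        mapped = {frozenset(perm[i] for i in bond) for bond in bonds_a}
        if mapped == bonds_b:
            return True
    return False


print("CCO" == "OCC")                     # the strings differ
print(same_molecule("CCO", "OCC"))        # yet both describe ethanol
print(same_molecule("CCCO", "CC(C)O"))    # 1-propanol vs. 2-propanol
```

Because plain string comparison cannot tell that `CCO` and `OCC` are the same molecule, any use of SMILES as an identifier depends on a canonicalization step supplied by the software, whereas an identifier that is unique by definition can be compared directly as text.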
