Wiswesser Line Notation (WLN) Chemical Converter & Extractor

WARNING: This project is undergoing a complete rework. Use branch for stable (but slow) behaviour.

  • WLN Conversion - read and write WLN to/from SMILES, InChI, mol files and other chemical line notations.
  • WLN Extraction - extract chemical terms from documents. The extractor uses greedy matching to return matched WLN sequences.

These tools are currently tested only on Linux and macOS. A future update will include a Windows build.

Requirements

A chemical toolkit is required, either OpenBabel or RDKit. It can be installed into $PATH, or placed one directory level up from this project if building from source without a global install.

The source files are labelled C++ for toolkit linking, but the code is actually written in C. readwln and writewln are designed to be toolkit-agnostic and can be ported to any C/C++ chemical toolkit using macros; see the Porting section.

Build

This project uses CMake and make for building. The standard CMake build applies:

mkdir build
cd build
cmake ..
make

Converting between WLN and CLN Formats

The tools are designed to be piped together and therefore read from stdin by default.

readwln - This takes WLN sequences and returns the desired format.
writewln - This takes sequences in another chemical line notation (CLN), such as SMILES or InChI, and returns WLN strings.

-h - display the help menu
-o|-i - choose the output|input format for the string. The listed options are -osmi (SMILES), -oinchi (InChI), -okey (InChIKey) and -ocan (canonical SMILES), following OpenBabel's format conventions.
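
Because the tools read stdin, they can also be driven from a script. The following is a minimal sketch of a Python wrapper, not part of this project. It assumes the built readwln binary is on $PATH, that it accepts one WLN string per line on stdin, and that it writes one result per line to stdout. The names readwln_command and convert_wln are hypothetical and exist only for illustration.

```python
import subprocess

# Output formats listed in this README: smi, inchi, key (InChIKey), can.
OUTPUT_FORMATS = ("smi", "inchi", "key", "can")


def readwln_command(fmt="smi", binary="readwln"):
    """Build the argument list for readwln, e.g. ['readwln', '-osmi']."""
    if fmt not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output format: {fmt!r}")
    return [binary, f"-o{fmt}"]


def convert_wln(wln_strings, fmt="smi", binary="readwln"):
    """Pipe WLN strings to readwln on stdin and return its stdout lines.

    Assumption: one WLN string per input line, one result per output line.
    """
    result = subprocess.run(
        readwln_command(fmt, binary),
        input="\n".join(wln_strings) + "\n",
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.splitlines()
```

With a built binary available, a call such as convert_wln(["Q2"], fmt="inchi") would send a short WLN string through the reader. Keeping the command construction in its own function makes it easy to point at a local build, for example binary="./build/readwln", if that is where the executable ends up.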

Wiswesser Conversion Release Notes

The following are sections from Elbert G. Smith's rule book that were used to create the WLN reader. Note that not all chapters are listed here, only the ones where compound types were introduced.

Please note that the "MANTRAP" rules are not officially given in either volume of the official Wiswesser manuals, so implementation is tricky at best. This parser does not support them.

Rule Read Write
Unbranched and Branched Chains ✔️ ✔️
Systematic Contractions ✔️ ✔️
Organic Salts ✔️ ✔️
Benzene Derivatives ✔️ ✔️
Multisubstituted Benzene Rings ✔️ ✔️
Benzene Rings in Branching Chains ✔️ ✔️
Monocyclic Rings ✔️ ✔️
Bicyclic Rings ✔️ ✔️
Polycyclic Rings ✔️ ✔️
Perifused Rings ✔️ ✔️
Chains of Rings other than Benzene ✔️ ✔️
Spiro Rings ✔️ ✔️
Bicyclic Bridged Rings ✔️ ✔️
Rings with Pseudo Bridges ✔️ ✔️
Ring Structures with Crossed Bonds and Unbranched Bridges ✔️ ✔️
Rings of Rings Contraction ✔️
Metallocenes and Catenanes ✔️ ✔️
Chelate Compounds ✔️ ✔️
Ionic Charges, Free Radicals and Isotopes ✔️ ✔️
Multipliers
Ring Contractions and Multipliers
All Special Problems Rules

WLN Extraction

wlngrep is a grep-style tool for extracting WLN strings from text. It parses files and performs greedy matching to highlight or extract WLN strings from documents. The input is expected to be a text document or a pipe. To extract from PDFs, the currently accepted approach is to pipe in the output of a PDF-to-text conversion tool.

wlngrep <options> <filename>

Flags

-c - return the number of matches instead of the matched strings
-o - print only the matched parts of each line
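
To illustrate what greedy matching means in this context, the sketch below scans text and extends each candidate as far as the characters allow before emitting it. It is a deliberate simplification and not wlngrep's implementation. It assumes a candidate is any run of uppercase letters, digits, &, - and /, with single interior spaces allowed. It does not validate WLN grammar, so it will over-match ordinary capitalised text. The name greedy_candidates and the min_len parameter are hypothetical.

```python
# Simplified alphabet: an assumption for illustration, not the real WLN grammar.
WLN_CHARS = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789&-/")


def greedy_candidates(text, min_len=2):
    """Return maximal runs of WLN-like characters found in text.

    Greedy: once a run starts it is extended as far as possible, and
    scanning resumes after it, so matches never overlap.
    """
    matches = []
    i, n = 0, len(text)
    while i < n:
        if text[i] not in WLN_CHARS:
            i += 1
            continue
        j = i
        while j < n:
            ch = text[j]
            if ch in WLN_CHARS:
                j += 1
            # Accept a single space only when a WLN-like character follows it.
            elif ch == " " and j + 1 < n and text[j + 1] in WLN_CHARS:
                j += 2
            else:
                break
        if j - i >= min_len:
            matches.append(text[i:j])
        i = j
    return matches
```

In this sketch, the returned list loosely corresponds to what -o prints and its length to what -c reports. A real extractor would replace the character-set test with a check against the WLN grammar so that only valid sequences are accepted.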
