The MInChI Demo page includes some interesting mixfiles (strictly speaking, what "copy branch" gives you is a JSON mixfile without mixfileVersion) containing components for which no InChI structure is known, such as:
- No structures at all: BSA blocking buffer + PBS; bechamel sauce
- Partial lack: Dodecacarbonyltriiron
Right now the produced InChI is a little less than informative for these purposes. I propose adding an optional layer /x (external identifiers) to handle this problem.
/x layer
The /x layer consists of the following parts:
- A main part, consisting of percent-encoded strings separated by the character &. Characters that MUST be encoded are /, &, unprintable characters, and whitespace characters. (I choose this style because it originates in an environment that uses & and /.)
  - The use of + in place of %20 for encoding a space is permitted. (Purely aesthetic reasons.)
- A mandatory /n sublayer which is very similar to the /n layer, but with the ability to associate multiple strings to a substance as well as the ability to name a group. (This will cause some duplication of information in the nesting structure. We already do that with /g.)
- An optional /t sublayer specifying the type of each identifier in the main part. This sublayer contains a string, each character describing the field at the corresponding index of the &-separated main part. Acceptable types include (each of these has a Mixfile counterpart):
  - f: formula (likely used when: unknown connectivity so unable to make InChI, has numbers in a range so unable to make InChI)
  - s: SMILES
  - n: Human-readable name
  - k: InChIKey
  - (I could specify one for Molfile here but the size would be comical. A URL-safe base64 encoding of gzipped Molfile? Nah, sounds too complicated.)
  - (There are some additional database references that can be added, though these will NOT have a Mixfile counterpart. It could make sense to just write another "name" for now.)
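To make the encoding rule for the main part concrete, here is a minimal Python sketch. The function names are hypothetical, and two choices go beyond what the proposal states: the set of punctuation left unencoded is my own pick (chosen so that strings like Co2(CO)8 stay readable), and a literal + is encoded as %2B, which seems necessary once + is allowed to stand for a space.

```python
from urllib.parse import quote, unquote_plus

# Assumption: punctuation that is left readable. "/" and "&" are deliberately
# absent, so quote() always percent-encodes them, along with whitespace,
# control characters, "%" and "+".
SAFE_CHARS = "():,;=@!*'"


def encode_identifier(text: str) -> str:
    """Percent-encode one identifier, writing spaces as '+'."""
    # "%20" can only appear in the output as the encoding of a space,
    # because a literal "%" is itself encoded as "%25".
    return quote(text, safe=SAFE_CHARS).replace("%20", "+")


def encode_main_part(identifiers: list[str]) -> str:
    """Join encoded identifiers with '&' to form the main part of /x."""
    return "&".join(encode_identifier(item) for item in identifiers)


def decode_main_part(main_part: str) -> list[str]:
    """Split the main part on '&' and undo the encoding of each field."""
    return [unquote_plus(field) for field in main_part.split("&")]


if __name__ == "__main__":
    print(encode_main_part(["flour dispersed in butter", "bechamel sauce"]))
    print(decode_main_part("Octacarbonyldicobalt&Co2(CO)8&PubChem_CID:25049"))
```

Because / and & never survive encoding, a reader can split the layer on those two characters before decoding anything.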
The /x layer shall only appear on non-"standard MInChI", i.e. "MInChI=0.00.1" without the "S". There is too much variability for anything to be reproducible here. Lucky we don't have a MInChIKey...
Basic example (with whitespace added)
MInChI=0.00.1//n{{&}&}/g{{466wf-3&534wf-3}91wf-3&909wf-3}
/xbutter&flour&flour+dispersed+in+butter&milk&bechamel+sauce
/n{{1&2}3&4}5
/tnnnnn
Example of three identifiers on the same thing:
MInChI=0.00.1/C6H14/c1-3-5-6-4-2/h3-6H2,1-2H3/n{&1}/g{1:5pp0&}
/xOctacarbonyldicobalt&Co2(CO)8&PubChem_CID:25049
/n{1,2,3&}
/tnfn
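For illustration, the two examples above can be taken apart with a few lines of Python. This is only a sketch under assumptions of mine: the helper name split_x_layer is hypothetical, /x is assumed to be the last layer of the string, and the whitespace added for readability is simply dropped (which is safe because real whitespace must be percent-encoded).

```python
from urllib.parse import unquote_plus

# Type letters proposed for the /t sublayer.
TYPE_NAMES = {"f": "formula", "s": "SMILES", "n": "name", "k": "InChIKey"}


def split_x_layer(minchi: str) -> dict:
    """Return the decoded identifiers plus the raw /n and /t sublayers."""
    compact = "".join(minchi.split())  # drop whitespace added for display
    _, marker, tail = compact.partition("/x")
    if not marker:
        raise ValueError("no /x layer found")
    # "/" inside identifiers is percent-encoded, so splitting on "/" is safe.
    main_part, *sublayers = tail.split("/")
    result = {
        "identifiers": [unquote_plus(f) for f in main_part.split("&")],
        "n": None,
        "t": None,
    }
    for sub in sublayers:
        if sub[:1] in ("n", "t"):
            result[sub[0]] = sub[1:]
    return result


def typed_identifiers(layer: dict) -> list[tuple[str, str]]:
    """Pair each identifier with the type named by its /t character."""
    types = layer["t"]
    if types is None:
        raise ValueError("no /t sublayer present")
    if len(types) != len(layer["identifiers"]):
        raise ValueError("/t must have one character per identifier")
    unknown = set(types) - set(TYPE_NAMES)
    if unknown:
        raise ValueError(f"unknown type letters: {sorted(unknown)}")
    return [(TYPE_NAMES[t], s) for t, s in zip(types, layer["identifiers"])]


if __name__ == "__main__":
    example = """MInChI=0.00.1/C6H14/c1-3-5-6-4-2/h3-6H2,1-2H3/n{&1}/g{1:5pp0&}
    /xOctacarbonyldicobalt&Co2(CO)8&PubChem_CID:25049
    /n{1,2,3&}
    /tnfn"""
    for kind, value in typed_identifiers(split_x_layer(example)):
        print(kind, value)
```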
On /n
When an /n sublayer is present, it should have the same "shape-of-braces" as the main /n layer. The format is the same as the main /n layer, with the exception that
- each structure can have multiple descriptions for the main part. This is resolved by allowing the use of a comma , between numbers describing the same part.
- each brace-grouping may have its own label. This is handled by permitting number-lists to be used after the closing brace, before the &. (This resembles Newick format.)
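The two extensions can be read as a small recursive grammar: a node is an optional brace group followed by an optional comma-separated number list. A rough Python sketch of a parser for it follows. Node, parse_x_n and brace_shape are hypothetical names, and I am assuming that the main /n layer can be read by the same grammar, so that the "shape-of-braces" check becomes a comparison of two parsed trees with the numbers ignored.

```python
from dataclasses import dataclass, field

DIGITS = "0123456789"


@dataclass
class Node:
    labels: list = field(default_factory=list)    # 1-based indices into the main part
    children: list = field(default_factory=list)  # empty for a single substance


def parse_x_n(text: str) -> Node:
    """Parse the body of an /n sublayer, e.g. '{{1&2}3&4}5'."""
    pos = 0

    def parse_labels() -> list:
        nonlocal pos
        labels = []
        while pos < len(text) and text[pos] in DIGITS:
            start = pos
            while pos < len(text) and text[pos] in DIGITS:
                pos += 1
            labels.append(int(text[start:pos]))
            if pos < len(text) and text[pos] == ",":
                pos += 1  # a comma must be followed by another number
                if pos >= len(text) or text[pos] not in DIGITS:
                    raise ValueError(f"expected a number at offset {pos}")
        return labels

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if pos < len(text) and text[pos] == "{":
            pos += 1
            node.children.append(parse_node())
            while pos < len(text) and text[pos] == "&":
                pos += 1
                node.children.append(parse_node())
            if pos >= len(text) or text[pos] != "}":
                raise ValueError(f"expected '}}' at offset {pos}")
            pos += 1
        # Labels after a closing brace name the group (as in Newick);
        # labels without braces describe a single substance.
        node.labels = parse_labels()
        return node

    root = parse_node()
    if pos != len(text):
        raise ValueError(f"unexpected character at offset {pos}")
    return root


def brace_shape(node: Node) -> tuple:
    """Nesting structure with all numbers ignored."""
    return tuple(brace_shape(child) for child in node.children)


if __name__ == "__main__":
    tree = parse_x_n("{{1&2}3&4}5")
    print(tree.labels, [child.labels for child in tree.children])
    print(brace_shape(tree) == brace_shape(parse_x_n("{{&}&}")))
```

With the bechamel example, the root carries label 5 (bechamel sauce), its first child is the group labelled 3 (flour dispersed in butter), and the shape matches that of the main /n layer {{&}&}.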
About names
/x is currently unused, and the letter sounds like the start of "external", so it is a good match. I think it's an acceptable use of a letter, unless someone has some other use in mind (e.g. using /x like the x- prefix of MIME types, for experimental features or extensions in general).