diff --git a/docs/00_intro/00_intro.mdx b/docs/00_intro/00_intro.mdx index e71e7892..b55dad02 100644 --- a/docs/00_intro/00_intro.mdx +++ b/docs/00_intro/00_intro.mdx @@ -7,37 +7,36 @@ id: "intro" import FeatureButton from "@site/src/components/features/FeatureButton.js"; import FloatImage from "@site/src/components/commons/FloatImage.js"; - - -The NFDI4Chem knowledge base provides information and recommendations to digitalise all key steps of chemical research to support scientists in their efforts to collect, store, process, analyse, publish, and reuse research data. This knowledge base is inspired by [RDMkit](https://rdmkit.elixir-europe.org/index.html) but has been tailored specifically towards Chemists as end-users. +# Why is RDM so important for chemistry? -Actions to [promote Open Science and Research Data Management](https://riojournal.com/article/55852/) in accordance with the FAIR data principles are presented by everyday users and range from planning and implementation to publication and reuse. - -## Why is RDM important for chemistry? - -:::danger Notice: Research Data Management in chemistry is currently not systematically organised and individual solutions of single institutions lead to low visibility, accessibility, and usability of research results. The added value of preserving and researching scientific data in chemistry is particularly high because the significance of the data is often immortal, hence, older data can be reused for current investigations. In most cases, it is even mandatory to be able to access older data, since experimental data or complex simulation data in particular can only be regenerated with great effort. 
-::: Main motivations for RDM in chemistry are: +- to boost sustainability by saving time and resources - to prevent the loss of data and ensure data security - to warrant long-term availability of research data - to accelerate retrieval of data and information - to enhance transparency, reproducibility allow verifiability of research findings -- to boost sustainability by saving time and resources - to enable data reuse in new research projects At least and most importantly a loss of previously acquired data is always an [irretrievable loss of knowledge](https://riojournal.com/article/55852/). ## Navigation through knowledge base + + +The NFDI4Chem knowledge base provides information and recommendations to digitalise all key steps of chemical research to support scientists in their efforts to collect, store, process, analyse, publish, and reuse research data. + +Actions to [promote Open Science and Research Data Management](https://riojournal.com/article/55852/) in accordance with the FAIR data principles are presented by everyday users and range from planning and implementation to publication and reuse. + + :::info Guidance for getting started: -The knowledge base offers different points of entry that help you in navigating the site and simplify the targeted search for information. Start learning more about RDM by selecting a domain or role, viewing the handling data articles, or finding information about electronic lab notebooks as part of a smartlab. Information on the publication of data is also provided. +The knowledge base offers different points of entry that help you navigate the site and find information in a targeted way. Start with the search (Ctrl+K) or use the main navigation bar to browse topics such as domains, roles, handling data, smartlab, and data publishing.
::: ### Domain-specific information @@ -89,3 +88,7 @@ In this category on data publishing you will find all the important information imgUrl={"/img/nfdi4chem_Data_Publication.svg"} text={"Data Publishing"} /> + +:::info Acknowledgements +This knowledge base is inspired by [RDMkit](https://rdmkit.elixir-europe.org/index.html) but has been tailored specifically towards chemists as end-users. +::: diff --git a/docs/00_intro/10_fair.mdx b/docs/00_intro/10_fair.mdx index b756e67f..3b924262 100644 --- a/docs/00_intro/10_fair.mdx +++ b/docs/00_intro/10_fair.mdx @@ -30,9 +30,9 @@ Researchers — and the computers working on their behalf — must be able to fi ### F1. (meta)data are assigned a globally unique and persistent identifier {#f1} -A globally unique and [persistent identifier (PID)](/docs/pid) helps both machines and humans find the data in the first place. These PIDs are essential for research as they guarantee the availability of the associated resource, in this case a dataset. The registry services that make these identifiers available work to maintain the link to the resource, thus avoiding dead links. This ensures the resource remains findable and may be referenced simply by the use of its PID. +A globally unique and [persistent identifier (PID)](/docs/pid) helps both machines and humans find the data in the first place. These PIDs are essential for research as they guarantee the availability of the associated resource, in this case a dataset. The registry services that make these identifiers available work to maintain the link to the resource, thus avoiding dead links. -A common example of a citable PID is the Digital Object Identifier, or [DOI](https://doi.org/10.1000/182). As with many journals, scientific data repositories often assign a DOI automatically. The Registry of Research Data Repositories, [re3data](https://www.re3data.org/), indicates whether a given repository assigns an identifier, along with the PID type.
For example, both the [The Cambridge Structural Database (CSD)](https://www.ccdc.cam.ac.uk/solutions/csd-system/components/csd/) and the [Chemotion Repository](https://www.chemotion-repository.net/) assign DOIs to each dataset deposited. Researchers must be aware of this option when searching for a suitable repository, while repositories should offer this service. +A common example of a citable PID is the Digital Object Identifier, or [DOI](https://doi.org/10.1000/182). As with many journals, scientific data repositories often assign a DOI automatically. For example, both the [Cambridge Structural Database (CSD)](https://www.ccdc.cam.ac.uk/solutions/csd-system/components/csd/) and the [Chemotion Repository](https://www.chemotion-repository.net/) assign a DOI to each deposited dataset. Researchers must be aware of this option when searching for a suitable repository (the Registry of Research Data Repositories, [re3data](https://www.re3data.org/), indicates whether a repository assigns persistent identifiers and of which type), while repositories should offer this service. ### F2. data are described with rich metadata (defined by R1 below) {#f2} @@ -44,7 +44,7 @@ Data need to be sufficiently described in order to make them both findable and r - what other data may be related (linked via its PID), and - associated journal publications and their DOI. -Repositories should provide researchers with a fillable [application profile](https://en.wikipedia.org/wiki/Application_profile) that allows researchers to give extensive and precise information on their deposited datasets. For example, the Chemotion Repository uses, among others, the [Datacite Metadata Schema](http://doi.org/10.5438/0012) to build its application profile, a schema specifically created for the publication and citation of research data. [RADAR](https://radar.products.fiz-karlsruhe.de/en), including the variant [RADAR4Chem](https://www.nfdi4chem.de/index.php/2650-2/), has also built [its metadata schema](https://radar.products.fiz-karlsruhe.de/en/radarfeatures/radar-metadatenschema) on Datacite.
These include an assortment of mandatory, recommended, and optional metadata properties, allowing for a rich description of the deposited dataset. For those publishing data, always keep in mind: the more information provided, the better. +Repositories should provide a fillable [application profile](https://en.wikipedia.org/wiki/Application_profile) that allows researchers to give extensive and precise information on their deposited datasets. For example, both the Chemotion Repository and [RADAR4Chem](https://www.nfdi4chem.de/index.php/2650-2/) use, among others, the [DataCite Metadata Schema](http://doi.org/10.5438/0012), a schema created specifically for the publication and citation of research data, to build their application profiles. The schema comprises an assortment of mandatory, recommended, and optional metadata properties, allowing for a rich description of the deposited dataset. For those publishing data, always keep in mind: the more information provided, the better. ### F3. metadata clearly and explicitly include the identifier of the data it describes {#f3} @@ -132,10 +132,9 @@ In simple terms: metadata include any relevant history. If the dataset is relate #### R1.3. (meta)data meet domain-relevant community standards {#r1_3} -As research data management and, as such, [data publishing](/docs/data_publishing) becomes more and more prevalent across research areas, [best practices](/docs/best_practice) in the individual communities will arise. This should encompass metadata templates for proper documentation of datasets, how the data should be [organized](/docs/data_organisation), which vocabularies or [ontologies](/docs/ontology) to use, and [file formats](/docs/format_standards). NFDI4Chem is working to establish [metadata and data standards](https://www.nfdi4chem.de/index.php/task-areas/) for the various communities in chemistry.
+As research data management, and with it [data publishing](/docs/data_publishing), becomes more and more prevalent across research areas, [best practices](/docs/best_practice) will emerge in the individual communities. These should cover metadata templates for proper documentation of datasets, how the data should be [organized](/docs/data_organisation), and which vocabularies or [ontologies](/docs/ontology) and [file formats](/docs/format_standards) to use. NFDI4Chem is working to establish [metadata and data standards](https://nfdi4chem.de/your-nfdi4chem-team-get-to-know-the-consortium-4/) for the various communities in chemistry. Where available, community standards and best practices should be followed when those publishing prepare their datasets and relevant metadata for publication. [Repositories](/docs/repositories), especially domain-specific service providers, should adhere to the standards set forth by the community by requiring files and metadata to follow format specifications. -As noted in [I1](#i1) experiments. Where required, format converters should be linked in the dataset’s metadata. diff --git a/docs/60_topics/62_data_formats/30_smiles.mdx b/docs/60_topics/62_data_formats/30_smiles.mdx index 30b71388..54dfe2c9 100644 --- a/docs/60_topics/62_data_formats/30_smiles.mdx +++ b/docs/60_topics/62_data_formats/30_smiles.mdx @@ -3,6 +3,22 @@ title: "SMILES" slug: "/smiles" --- -![Under Construction](/img/Constr_2bl.png) +## SMILES (Simplified Molecular Input Line Entry System) -This article is under construction. \ No newline at end of file +SMILES is a compact, text-based notation for representing chemical structures. It encodes molecules as linear strings using ASCII characters and is widely applied in cheminformatics for data exchange, database storage, and computational modeling.
SMILES was introduced in the late 1980s and has since become a de facto standard for molecular line notation in many chemical software environments. + +## Basic syntax and examples + +In SMILES, atoms are denoted by their atomic symbols (e.g., C, O, N), and bonds are either implicit or explicitly specified using characters such as "=" (double), "#" (triple), or ":" (aromatic). Single bonds are typically omitted. Atoms outside the organic subset (B, C, N, O, P, S, F, Cl, Br, I), as well as atoms carrying a charge, an isotope label, or an explicit hydrogen count, are written in square brackets, e.g. [Na+] or [NH4+]. Branching is expressed with parentheses, and ring closures are indicated by matching digits. For example, ethanol can be written as CCO, while cyclohexane is represented as C1CCCCC1. Aromatic atoms are commonly written in lowercase letters (e.g., c1ccccc1 for benzene). SMILES also allows specification of stereochemistry through chirality markers (@ and @@ for the two possible configurations of a tetrahedral stereocenter) and double-bond geometry markers (/, \), so that the relative three-dimensional arrangement of substituents can be reconstructed from the linear string. + +## Canonical and isomeric SMILES + +Two related concepts are important in practice: canonical SMILES and isomeric SMILES. Canonical SMILES are generated by a defined algorithm that yields a single string for a given molecular connectivity, facilitating database indexing and comparison. Isomeric SMILES additionally encode isotopic and stereochemical information, so that isomers with the same connectivity can be distinguished. Formal charges, by contrast, are part of the core SMILES syntax (e.g. [O-]) and can appear in any SMILES string, whether canonical or not and whether isomeric or not. + +## Uniqueness and limitations + +Despite its widespread use, SMILES is not intrinsically unique: the same molecule can usually be written as several valid strings (ethanol, for instance, as CCO, OCC, or C(C)O). Only canonicalization yields one string per molecule, and the canonical form can depend on the implementation and algorithm used, so canonical SMILES produced by different toolkits should not be compared directly.
Nevertheless, its simplicity, human readability, and compatibility with text-based workflows make SMILES a foundational format in modern computational chemistry and chemical data management, and a natural complement to standardized identifiers such as InChI. + +## Tool support and InChI + +Several cheminformatics toolkits, including RDKit and Open Babel, support SMILES parsing, generation, and canonicalization. Extensions such as SMARTS (for substructure searching) and SMIRKS (for reaction transformations) build upon the SMILES syntax. For persistent and standardized identification, the IUPAC International Chemical Identifier (InChI) was later developed as a complementary approach; because standard InChIs are generated by a single reference implementation, they avoid the toolkit dependence of canonical SMILES.
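The basic SMILES syntax described in the new article (atom symbols, bond characters, branches in parentheses, paired ring-closure digits, bracket atoms) can be illustrated with a minimal pure-Python sketch that splits a string into tokens and checks that branches and ring closures are balanced. `TOKEN_RE`, `tokenize`, and `check_syntax` are hypothetical names chosen for this illustration; they are not part of RDKit, Open Babel, or the OpenSMILES specification. The sketch assumes a simplified grammar and performs no valence, aromaticity, or stereochemistry checks.

```python
import re

# Hypothetical helpers for illustration only; NOT part of RDKit, Open Babel,
# or the OpenSMILES specification. They cover a simplified subset of the
# grammar and perform no valence, aromaticity, or stereochemistry checks.
TOKEN_RE = re.compile(
    r"\[[^\]]+\]"                 # bracket atom, e.g. [NH4+] or [C@@H]
    r"|Br|Cl|B|C|N|O|P|S|F|I"     # organic-subset atoms (two-letter symbols tried first)
    r"|b|c|n|o|p|s"               # aromatic atoms written in lowercase
    r"|[-=#:/\\]"                 # bond symbols, incl. / and \ for double-bond geometry
    r"|[()]"                      # branch open / close
    r"|%\d{2}|\d"                 # ring-closure labels: single digit or %nn
    r"|\."                        # separator between disconnected components
)


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into atom, bond, branch, and ring-closure tokens."""
    tokens = []
    pos = 0
    while pos < len(smiles):
        match = TOKEN_RE.match(smiles, pos)
        if match is None:
            raise ValueError(f"unexpected character {smiles[pos]!r} at position {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


def check_syntax(smiles: str) -> None:
    """Raise ValueError if branches or ring closures are unbalanced."""
    depth = 0           # current branch nesting level
    open_rings = set()  # ring-bond numbers opened but not yet closed
    for token in tokenize(smiles):
        if token == "(":
            depth += 1
        elif token == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("')' without matching '('")
        elif token[0].isdigit() or token[0] == "%":
            number = int(token.lstrip("%"))
            # The first occurrence of a number opens a ring bond, the next one
            # closes it; the number may afterwards be reused for another ring.
            if number in open_rings:
                open_rings.remove(number)
            else:
                open_rings.add(number)
    if depth != 0:
        raise ValueError("unclosed branch '('")
    if open_rings:
        raise ValueError(f"unclosed ring bond(s): {sorted(open_rings)}")


# Usage: ethanol, cyclohexane, benzene, acetic acid, and alanine all pass.
for example in ["CCO", "C1CCCCC1", "c1ccccc1", "CC(=O)O", "N[C@@H](C)C(=O)O"]:
    check_syntax(example)

print(tokenize("CC(=O)O"))  # ['C', 'C', '(', '=', 'O', ')', 'O']
```

Here `check_syntax("C1CCCCC")` raises a `ValueError` because ring bond 1 is opened but never closed, and `"Na"` is rejected because sodium lies outside the organic subset and must be written as `[Na]`. For real data, a complete parser such as those in RDKit or Open Babel should be used instead.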