A CLI utility to retrieve chemical data (SMILES, InChI, Full Names) from PubChem and map Compound names to their corresponding identifiers.
This package is managed with standard Python tools and can be installed via pip.
Run:
pip install chemnames
Or install the newest version directly from the GitHub repository:
pip install git+https://github.com/Desperadus/ChemNames
For a manual editable install, follow these steps:
- Clone the repository.
- Install the package in editable mode (drop -e for a normal install):
pip install -e .
Or, using uv to manage the environment:
uv pip install -e .
After installation, the CLI tool addchemnames will be available on your PATH.
The main utility reads a CSV file containing a Compound column, queries PubChem, and outputs a new CSV with added SMILES, InChI, and Full Name columns.
Input CSV Format:
The input file MUST contain a column named Compound.
Example input.csv:
Compound,ID
Aspirin,1
Caffeine,2
Run the command:
addchemnames input.csv output.csv
Output:
The tool will generate output.csv with the enriched data. If a compound is not found, "xxxxxx" will be used as the placeholder.
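The enrichment step described above can be sketched roughly as follows. This is an illustration only, not the package's actual implementation: `enrich_csv` and `fake_lookup` are hypothetical names, the PubChem query is replaced by a stand-in function so the example runs offline, and the exact spelling of the output headers (`SMILES`, `InChI`, `Full Name`) is assumed from the description.

```python
import csv
import io

PLACEHOLDER = "xxxxxx"  # value used when a compound is not found
NEW_COLUMNS = ["SMILES", "InChI", "Full Name"]  # assumed header names


def enrich_csv(src, dst, lookup):
    """Copy rows from src to dst, appending the columns in NEW_COLUMNS.

    `lookup` is a hypothetical callable that takes a compound name and
    returns a dict keyed by NEW_COLUMNS, or None if nothing was found.
    """
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames or [])
    if "Compound" not in fieldnames:
        raise ValueError("input CSV must contain a 'Compound' column")

    writer = csv.DictWriter(
        dst, fieldnames=fieldnames + NEW_COLUMNS, lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        found = lookup(row["Compound"]) or {}
        for col in NEW_COLUMNS:
            # Fall back to the placeholder for any missing value
            row[col] = found.get(col, PLACEHOLDER)
        writer.writerow(row)


def fake_lookup(name):
    """Stand-in for the PubChem query (assumption: real tool queries PubChem)."""
    known = {
        "Water": {
            "SMILES": "O",
            "InChI": "InChI=1S/H2O/h1H2",
            "Full Name": "oxidane",
        }
    }
    return known.get(name)


src = io.StringIO("Compound,ID\nWater,1\nNotARealCompound,2\n")
dst = io.StringIO()
enrich_csv(src, dst, fake_lookup)
result = dst.getvalue()
print(result)
```

Passing the lookup in as a parameter keeps the CSV handling testable without network access; a real lookup would replace `fake_lookup` with a call to PubChem.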
If you have a CSV file that already contains PubChem CIDs (e.g., from a previous step or source), you can use add_keggs.py (located in the root of the repository) to append KEGG IDs.
Run the command:
python add_keggs.py input_with_cids.csv output_with_keggs.csv
- Network Requests: This tool makes network requests to PubChem. Large files may take some time to process.
- Rate Limiting: The tool uses threading to speed up requests, but be mindful of PubChem's usage policies.
- Data Accuracy: Data is fetched "as is" from PubChem.
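For anyone scripting their own lookups alongside the tool, the combination of threading and rate limiting mentioned above might look like the sketch below. It is not taken from the tool's source: `RateLimiter` and `fetch_all` are hypothetical names, `fetch` is any function you supply, and the default of 5 requests per second reflects PubChem's published guidance as I understand it, so check the current usage policy before relying on it.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimiter:
    """Space calls at least 1/rate seconds apart across all threads."""

    def __init__(self, rate):
        self._interval = 1.0 / rate
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()

    def wait(self):
        # Reserve the next free time slot under the lock,
        # then sleep outside it so other threads can reserve theirs.
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self._interval
        delay = slot - now
        if delay > 0:
            time.sleep(delay)


def fetch_all(names, fetch, rate=5, workers=4):
    """Run fetch(name) for every name in a thread pool, throttled to `rate`/s.

    Results are returned in the same order as `names`.
    """
    limiter = RateLimiter(rate)

    def task(name):
        limiter.wait()
        return fetch(name)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, names))


# Stand-in fetch function so the example runs without network access
print(fetch_all(["aspirin", "caffeine"], str.upper, rate=50))
```

The threads still overlap while waiting on responses, but new requests start no faster than the configured rate.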
License: MIT