2 changes: 1 addition & 1 deletion doc/api.md
@@ -34,5 +34,5 @@ api/electrochemistry/reference_electrode.md
api/loaders/baseloader.md
api/loaders/eclabloader.md
api/loaders/gamryloader.md
api/loaders/column_names.md
api/loaders/eclab_fields.md
```
9 changes: 0 additions & 9 deletions doc/api/loaders/column_names.md

This file was deleted.

9 changes: 9 additions & 0 deletions doc/api/loaders/eclab_fields.md
@@ -0,0 +1,9 @@
---
github_url: https://github.com/echemdb/unitpackage/blob/master/unitpackage/loaders/eclab_fields.py
---

# `unitpackage.loaders.eclab_fields`
```{eval-rst}
.. automodule:: unitpackage.loaders.eclab_fields
:members:
```
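
The new module documented above exposes two objects used by the echemdb loaders: `biologic_fields` (standardized field definitions) and `biologic_fields_alt_names` (a mapping from raw BioLogic column headers to short standardized names). A minimal sketch of how they might be inspected; the exact key `'Ewe/V'` is an assumption based on the doctests elsewhere in this pull request:

```python
from unitpackage.loaders.eclab_fields import (
    biologic_fields,
    biologic_fields_alt_names,
)

# biologic_fields provides standardized field definitions (units, descriptions)
# suitable for Entry.update_fields().
# biologic_fields_alt_names maps raw BioLogic column headers to short names.
print(biologic_fields_alt_names.get("Ewe/V"))  # expected: 'E' (assumed key format)
```
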
6 changes: 6 additions & 0 deletions doc/news/from_loaders.rst
@@ -0,0 +1,6 @@
**Added:**

* Added `device` parameter to `Entry.from_csv()` to select instrument-specific loaders (e.g., ``device='eclab'`` for BioLogic MPT files, ``device='gamry'`` for Gamry DTA files).
* Added `BaseLoader.metadata` property, which returns file structure information (loader name, delimiter, decimal, header, column headers) stored as ``dsvDescription`` in the entry's metadata.
* Added `EchemdbEntry.from_mpt()` classmethod to load BioLogic EC-Lab MPT files with automatic field updates, renaming, and filtering.
* Added `eclab_fields.py` module (renamed from ``column_names.py``) containing ``biologic_fields`` and ``biologic_fields_alt_names`` for standardized electrochemistry field definitions.
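
Taken together, these additions support both a generic device-aware load and the electrochemistry-specific convenience path. A minimal sketch, reusing the test file referenced elsewhere in this pull request:

```python
from unitpackage.entry import Entry
from unitpackage.database.echemdb_entry import EchemdbEntry

# Generic load with an instrument-specific loader selected via `device`.
raw = Entry.from_csv(csvname="test/loader_data/eclab_cv.mpt", device="eclab")
print(raw.metadata["dsvDescription"]["loader"])  # 'ECLabLoader'

# Electrochemistry convenience path: units, short names, filtered columns.
cv = EchemdbEntry.from_mpt("test/loader_data/eclab_cv.mpt")
print(list(cv.df.columns))  # expected: ['t', 'E', 'I', 'cycle']
```
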
2 changes: 1 addition & 1 deletion doc/news/load-metadata.rst
@@ -5,7 +5,7 @@
* Added `MetadataDescriptor` class for enhanced metadata handling with dict and attribute-style access.
* Added `Entry.default_metadata_key` class attribute to control metadata access patterns in subclasses.
* Added `Entry._default_metadata` property to access the appropriate metadata subset.
* Added `encoding`, `header_lines`, `column_header_lines`, `decimal`, and `delimiters` parameters to `Entry.from_csv()` for handling complex CSV formats.
* Added `encoding`, `header_lines`, `column_header_lines`, `decimal`, `delimiters`, and `device` parameters to `Entry.from_csv()` for handling complex CSV formats and instrument-specific file types.
* Added `create_tabular_resource_from_csv()` to create resources from CSV files with auto-detection of standard vs. complex formats.
* Added `create_df_resource_from_csv()` for creating pandas dataframe resources from CSV files with custom formats.
* Added `create_df_resource_from_df()` for creating resources directly from pandas DataFrames.
44 changes: 44 additions & 0 deletions doc/usage/load_and_save.md
@@ -109,6 +109,50 @@ csv_entry.fields

For even more complex file formats from laboratory equipment, see the [Loaders](loaders.md) section.

### From specific device file formats

Files from laboratory equipment (devices) often have complex structures with lengthy headers, non-standard delimiters, and instrument-specific column names.
`Entry.from_csv` supports a `device` parameter that selects the appropriate loader for the file format.

For example, loading a BioLogic EC-Lab MPT file:

```{code-cell} ipython3
from unitpackage.entry import Entry

entry = Entry.from_csv(csvname='../../test/loader_data/eclab_cv.mpt', device='eclab')
entry
```

The loader automatically detects headers and delimiters. The resulting entry contains the raw column names from the instrument:

```{code-cell} ipython3
entry.fields
```

Information on the file structure is stored in the entry's metadata under `dsvDescription`:

```{code-cell} ipython3
entry.metadata['dsvDescription']['loader']
```
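
Other keys of `dsvDescription` record the detected dialect. A brief illustration; the delimiter value is taken from the doctests in this pull request, and further keys such as `decimal` follow from the `BaseLoader.metadata` description:

```python
# The detected delimiter of the MPT file is a tab character ('\t').
entry.metadata['dsvDescription']['delimiter']
```
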

#### Domain-specific loading

For submodules such as `echemdb`, convenience methods provide additional processing.
`EchemdbEntry.from_mpt` loads an MPT file and then updates the fields with units, renames them to short standardized names, and keeps only the most relevant columns for electrochemistry:

```{code-cell} ipython3
from unitpackage.database.echemdb_entry import EchemdbEntry

entry = EchemdbEntry.from_mpt('../../test/loader_data/eclab_cv.mpt')
entry.df.head()
```

The fields now have units, short names, and a reference to the original BioLogic column name:

```{code-cell} ipython3
entry.fields
```
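
With standardized column names and units, the data is ready for direct analysis, for example a quick plot of the cyclic voltammogram. This is a sketch assuming matplotlib is available in the environment:

```python
import matplotlib.pyplot as plt

df = entry.df
cycle = df[df['cycle'] == 1.0]  # select a single cycle
plt.plot(cycle['E'], cycle['I'])
plt.xlabel('E / V')
plt.ylabel('I / mA')
plt.show()
```
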

From a pandas DataFrame:

```{code-cell} ipython3
72 changes: 72 additions & 0 deletions unitpackage/database/echemdb_entry.py
@@ -102,6 +102,78 @@ def __repr__(self):
"""
return f"Echemdb({self.identifier!r})"

@classmethod
def from_mpt(cls, csvname, encoding=None):
r"""
Return an :class:`~unitpackage.database.echemdb_entry.EchemdbEntry` from a BioLogic EC-Lab MPT file.

The file is parsed with the ECLabLoader. Fields are updated with
units from ``biologic_fields``
and renamed according to
``biologic_fields_alt_names``
(both defined in :mod:`unitpackage.loaders.eclab_fields`).
The original field names are preserved as ``originalName``.

Only columns whose original names appear in
``biologic_fields_alt_names``
are kept; all other columns are removed.

EXAMPLES::

>>> entry = EchemdbEntry.from_mpt('test/loader_data/eclab_cv.mpt')
>>> entry
Echemdb('eclab_cv')

>>> entry.df.head() # doctest: +NORMALIZE_WHITESPACE
t E I cycle
0 86.761598 0.849737 0.001722 1.0
1 86.772598 0.849149 -0.003851 1.0
...

Fields have units and the original BioLogic column names::

>>> [f for f in entry.fields if f.name == 'E'] # doctest: +NORMALIZE_WHITESPACE
[{'name': 'E', 'type': 'number', 'description': 'WE potential versus REF.',
'unit': 'V', 'dimension': 'E', 'originalName': 'Ewe/V'}]

>>> [f for f in entry.fields if f.name == 't'] # doctest: +NORMALIZE_WHITESPACE
[{'name': 't', 'type': 'number', 'description': 'Time.',
'unit': 's', 'dimension': 't', 'originalName': 'time/s'}]

>>> [f for f in entry.fields if f.name == 'I'] # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
[{'name': 'I', 'type': 'number', 'description': ...,
'unit': 'mA', 'dimension': 'I', 'originalName': '<I>/mA'}]

The loader metadata is stored in the entry's metadata::

>>> entry.metadata['dsvDescription']['loader']
'ECLabLoader'

>>> entry.metadata['dsvDescription']['header'] # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
'EC-Lab ASCII FILE\nNb header lines : 62...'

"""
from unitpackage.loaders.eclab_fields import (
biologic_fields,
biologic_fields_alt_names,
)

entry = cls.from_csv(csvname=csvname, encoding=encoding, device="eclab")
entry = entry.update_fields(biologic_fields)
entry = entry.rename_fields(
biologic_fields_alt_names, keep_original_name_as="originalName"
)

# Only keep columns that were renamed via biologic_fields_alt_names
columns_to_remove = [
f.name
for f in entry.fields
if f.name not in biologic_fields_alt_names.values()
]
entry = entry.remove_columns(*columns_to_remove)

return entry

@property
def bibliography(self):
r"""
89 changes: 53 additions & 36 deletions unitpackage/entry.py
@@ -1085,28 +1085,39 @@ def update_fields(self, fields):
return type(self)(resource=new_resource)

@classmethod
def from_csv(
def from_csv( # pylint: disable=too-many-locals
cls,
csvname,
encoding=None,
header_lines=None,
column_header_lines=None,
decimal=None,
delimiters=None,
device=None,
):
r"""
Returns an entry constructed from a CSV.

The file is always parsed through a loader, which captures the file's
structure (delimiter, decimal separator, header, column headers) in the
entry's metadata under ``dsvDescription``.

A ``device`` can be specified to select a device-specific loader
(e.g., ``'eclab'`` or ``'gamry'``).

EXAMPLES::

>>> from unitpackage.entry import Entry
>>> entry = Entry.from_csv(csvname='examples/from_csv/from_csv.csv')
>>> entry
Entry('from_csv')

>>> entry.resource # doctest: +NORMALIZE_WHITESPACE
{'name': 'from_csv',
...
The loader's file structure information is stored in the metadata::

>>> entry.metadata['dsvDescription']['loader']
'BaseLoader'
>>> entry.metadata['dsvDescription']['delimiter']
','

.. important::
Upper case filenames are converted to lower case entry identifiers!
@@ -1117,45 +1128,51 @@ def from_csv(
>>> entry
Entry('uppercase')

Casing in the filename is preserved in the metadata::

>>> entry.resource # doctest: +NORMALIZE_WHITESPACE
{'name': 'uppercase',
'type': 'table',
'path': 'UpperCase.csv',
...

An entry can also be constructed from a CSV with a more complex structure, such as multiple header lines::

>>> entry = Entry.from_csv(csvname='examples/from_csv/from_csv_multiple_headers.csv', column_header_lines=2)
>>> entry.resource # doctest: +NORMALIZE_WHITESPACE
{'name': 'from_csv_multiple_headers',
'type': 'table',
'data': [],
'format': 'pandas',
'mediatype': 'application/pandas',
'schema': {'fields': [{'name': 'E / V', 'type': 'integer'},
{'name': 'j / A / cm2', 'type': 'integer'}]}}
>>> entry.fields # doctest: +NORMALIZE_WHITESPACE
[{'name': 'E / V', 'type': 'integer'},
{'name': 'j / A / cm2', 'type': 'integer'}]

"""
from unitpackage.local import create_tabular_resource_from_csv

# pylint: disable=duplicate-code
resource = create_tabular_resource_from_csv(
csvname=csvname,
encoding=encoding,
header_lines=header_lines,
column_header_lines=column_header_lines,
decimal=decimal,
delimiters=delimiters,
)
A device-specific loader can be used to parse instrument files::

>>> entry = Entry.from_csv(csvname='test/loader_data/eclab_cv.mpt', device='eclab')
>>> entry
Entry('eclab_cv')

>>> entry.df # doctest: +NORMALIZE_WHITESPACE
mode ox/red error ... (Q-Qo)/C I Range P/W
0 2 1 0 ... 0.000000e+00 41 0.000001
1 2 0 0 ... -3.622761e-08 41 -0.000003
...

>>> entry.metadata['dsvDescription']['loader']
'ECLabLoader'
>>> entry.metadata['dsvDescription']['delimiter']
'\t'

"""
from pathlib import Path

if resource.name == "memory":
resource.name = Path(
csvname
).stem.lower() # Use stem (filename without extension)
from unitpackage.loaders.baseloader import BaseLoader
from unitpackage.local import create_df_resource_from_df

dialect = {
"header_lines": header_lines,
"column_header_lines": column_header_lines,
"decimal": decimal,
"delimiters": delimiters,
}

loader_cls = BaseLoader.create(device) if device else BaseLoader

with open(csvname, "r", encoding=encoding or "utf-8") as f:
loader = loader_cls(f, **dialect)

resource = create_df_resource_from_df(loader.df)
resource.name = Path(csvname).stem.lower()
resource.custom["metadata"] = {"dsvDescription": loader.metadata}

return cls(resource)
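
For reference, the loader that the rewritten `from_csv` delegates to can also be driven directly. A minimal sketch, assuming the example CSV shipped with the repository:

```python
from unitpackage.loaders.baseloader import BaseLoader

# BaseLoader parses the file and exposes both the data and the detected dialect.
with open("examples/from_csv/from_csv.csv", "r", encoding="utf-8") as f:
    loader = BaseLoader(f)
    df = loader.df               # parsed data as a pandas DataFrame
    structure = loader.metadata  # loader name, delimiter, decimal, header, column headers

print(structure["loader"])  # 'BaseLoader'
```
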
