diff --git a/docs/user-guide/doc-odm-user-guide/about-sc-hdf5-transformations.md b/docs/user-guide/doc-odm-user-guide/about-sc-hdf5-transformations.md
new file mode 100644
index 0000000..4b71ec9
--- /dev/null
+++ b/docs/user-guide/doc-odm-user-guide/about-sc-hdf5-transformations.md
@@ -0,0 +1,98 @@
+# Single-Cell HDF5 Transformations Overview
+
+This transformation converts a single-cell HDF5 file into ODM-compatible output files. It extracts expression data and related cell metadata, and can optionally harmonize metadata and create or update biosample objects in ODM. The output files are then imported and linked automatically.
+
+The result is feature-level indexed data that is ready for downstream analysis and cross-study discovery without manual file preparation.
+
+## The ODM entity model for single-cell data
+
+Understanding the transformation requires familiarity with how ODM represents single-cell experiments. ODM organises data around a hierarchy of entities:
+
+- **Sample, Library, and Preparation groups** (collectively referred to as SLP) represent the biological and experimental context of the data. A Sample describes a biological specimen; a Library describes the sequencing library prepared from it; a Preparation describes a preparation step. These entities already exist in ODM for most studies, or can be created by the transformation itself.
+
+- **A Cell Group** represents the collection of individual cells from an experiment, together with their metadata. Each Cell Group must be linked to exactly one parent SLP entity (a Sample, Library, or Preparation group). This linkage is what allows ODM to associate cell-level observations with the correct experimental context.
+
+- **An Expression Group** represents the gene-by-cell expression matrix, compressed for efficient retrieval, together with computed dataset statistics. An Expression Group is always linked to a Cell Group.
+
+The transformation creates the Cell Group and Expression Group and links them into the existing (or newly created) SLP structure. This is why the configuration requires specifying how the resulting Cell Group should be connected to its parent — the linking step is fundamental to how ODM organises and queries the data.
+
+## What the transformation reads from the source file
+
+The transformation extracts three types of data from an HDF5 source file:
+
+**Cell metadata** — extracted primarily from `obs` in an H5AD input file, or from the equivalent structure in a 10x H5 input file. This includes per-cell annotations such as barcodes, cluster assignments, quality control metrics, and any other experimental annotations. Multidimensional representations stored in `obsm` (such as PCA or UMAP coordinates) and pairwise cell annotations from `obsp` can also be extracted.
+
+**Feature metadata** — extracted from `var`, and optionally from `varm` and `varp`. This includes per-gene annotations such as gene identifiers and gene names. For supported species, the transformation can also map Ensembl or NCBI gene identifiers to gene names automatically (see [Gene ID to name mapping](attribute-mapping.md#gene-id-to-name-mapping)).
+
+**The expression matrix** — extracted from `X`, which contains count or normalized expression values. The transformation validates the matrix dimensions against the extracted cell and feature metadata, then writes the matrix in a Brotli-compressed format optimized for ODM ingestion.
+
+## The role of metadata curation
+
+Metadata curation is optional, but strongly recommended. It standardizes cell metadata so that it can be imported, linked, and indexed correctly in ODM. Certain fields must use the expected names and data types to ensure consistent linking and indexing. The transformation handles this for the user during processing. 
+
+As part of curation, the transformation performs automatic attribute mapping: commonly used attribute names from tools such as Seurat, Scanpy, or Cell Ranger are recognized and renamed to the canonical ODM API names without any configuration. Automatic attribute mapping helps harmonize metadata across datasets, which is essential for cross-study search and downstream analysis. Attributes that do not match any known name are retained, and their names are automatically converted to camelCase for consistency with the ODM naming convention. For the full list of recognized names, see the [Attribute Mapping Reference](attribute-mapping.md).
+
+Curation is applied only to the data produced by the transformation for import into ODM. The source file is not modified.
+
+## Biosample metadata and the aggregation model
+
+Some single-cell datasets store tissue, disease, or other biosample-level attributes in cell metadata, repeating the same values for every cell. The transformation can aggregate these attributes into the related biosample objects in ODM: Sample, Library, or Preparation (SLP) objects.
+
+Aggregation is performed by grouping cells using a designated biosample identifier. Only attributes that are consistent across all cells in the same biosample can be assigned to related biosample objects.
+
+Attributes assigned to biosample objects are automatically removed from the cell metadata. This reduces duplication and improves the overall structure of the imported data.
+
+## Linking created objects
+
+When the transformation uploads a Cell Group, it links it to a parent Sample, Library, or Preparation entity (SLP).
+
+This is usually handled automatically. If the transformation creates new SLP objects, the Cell Group is linked to them. Otherwise, the transformation identifies the most appropriate existing SLP target in ODM. Users can override the automatic behavior by specifying the target explicitly in the configuration.
+For details, see [Linking group determination](transformation-process-reference.md#13-linking-group-determination).
+
+The Expression Group created by the transformation is linked to the corresponding Cell Group.
+
+## Dry run mode
+
+Dry run mode lets users validate the transformation setup before running a full import. In this mode, the transformation performs the initial processing steps, including reading the input, extracting metadata, applying curation, and running validation checks. It skips the most time-consuming output-generation steps, such as creating the expression matrix, and does not upload data to ODM.
+
+Dry run mode is useful for checking that the configuration works as expected and that the required inputs, metadata mappings, and linkage settings are resolved correctly before a full run.
+
+When `biosample_metadata` is configured without any `columns_to_export` entries, dry run mode can also be used to inspect which attributes are uniform within each biosample and are therefore eligible for reassignment to biosample objects.
+
+The recommended approach is to iterate on the configuration using dry runs until warnings are resolved, and then run the full transformation. For details, see [How to iterate on a configuration using dry runs](how-to-sc-hdf5-transformations.md#how-to-iterate-on-a-configuration-using-dry-runs).
+
+## Processors Controller API: configurations, images, and jobs
+
+The transformation is managed through the ODM Processors Controller API. It is based on three related components: configurations, images, and jobs.
+
+**Transformation configurations** are JSON documents that define how input files should be processed, including the input format, metadata extraction, and curation rules. Configurations can be created, retrieved, and updated independently of any particular run. The same configuration can be reused across multiple files with the same structure.
+
+**Transformation images** are versioned container images that run the processing logic. Available image versions can be queried through the API. The image used for single-cell HDF5 files is `hdf5-cells`. When starting a job, users can specify either `latest` or a specific release tag.
+
+**Transformation jobs** are the execution records. A job combines a configuration, an image, and one or more input files, runs the transformation, and produces the output and logs. Jobs are independent, so the same input can be run again with a different configuration or image when needed.
+
+## Transformation logs
+
+Each transformation job produces a log that records the processing steps, warnings, detected issues, and created outputs. The log also includes provenance information, such as the source file name and accession, and the accessions of the created objects.
+As part of the transformation, the log is uploaded to ODM and stored with the study as an attachment alongside the other generated files. This provides a persistent record of the transformation output. Logs are also available through the API for a limited time. By default, this retention period is two weeks.
+
+## Supported input formats
+
+The transformation supports the following HDF5-based input formats:
+
+- **H5AD (AnnData)** — the native format of the AnnData Python library, widely used for single-cell data processing.
+- **10x Genomics H5** — converted internally to H5AD before processing, so the same extraction workflow is used regardless of the input format.
+- **Legacy 10x Genomics H5 (v<3)** — supported only for files containing a single genome. Multi-genome legacy files are not supported.
+
+## Known limitations
+
+Currently, only one transformation process can be run per attachment. To run another transformation job on the same data, import a new copy of the attachment or create a new study.
+
+## See also
+
+- [Single-cell data in ODM: Getting Started](quickstart-sc.md) - quick start tutorial for working with single-cell data.
+- [How-to Guides](how-to-sc-hdf5-transformations.md) — step-by-step guidance for running the transformation.
+- [Configuration Reference](configuration-reference.md) — full configuration schema.
+- [Transformation Process Reference](transformation-process-reference.md) — internal processing pipeline.
+- [API Reference](api-reference.md) — API endpoints.
+
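The aggregation model described under "Biosample metadata and the aggregation model" in `about-sc-hdf5-transformations.md` can be illustrated with a minimal Python sketch. It is not the transformation's own code: the function names, the plain-dict cell records, and the rule that a column must be uniform within every biosample to be exportable are assumptions made for illustration only.

```python
from collections import defaultdict


def uniform_attributes(cells, biosample_key):
    """Per biosample, return the attributes whose value is identical for all of its cells."""
    # Group cell records (plain dicts) by the designated biosample identifier.
    groups = defaultdict(list)
    for cell in cells:
        groups[cell[biosample_key]].append(cell)

    result = {}
    for biosample, members in groups.items():
        uniform = {}
        for attr in members[0]:
            if attr == biosample_key:
                continue
            # An attribute is uniform when all cells in the group share one value.
            values = {m.get(attr) for m in members}
            if len(values) == 1:
                uniform[attr] = members[0][attr]
        result[biosample] = uniform
    return result


def exportable_columns(cells, biosample_key):
    """Columns that are uniform within every biosample (assumed eligibility rule)."""
    per_biosample = uniform_attributes(cells, biosample_key)
    column_sets = [set(attrs) for attrs in per_biosample.values()]
    return sorted(set.intersection(*column_sets)) if column_sets else []


# Hypothetical cell metadata: "tissue" repeats per biosample, "cluster" varies per cell.
cells = [
    {"sample_id": "S1", "tissue": "lung", "cluster": "0"},
    {"sample_id": "S1", "tissue": "lung", "cluster": "1"},
    {"sample_id": "S2", "tissue": "liver", "cluster": "0"},
]
print(exportable_columns(cells, "sample_id"))
```

In this toy input only `tissue` qualifies, because `cluster` takes two values within `S1`. A dry run reports the same kind of information for a real file.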
diff --git a/docs/user-guide/doc-odm-user-guide/api-reference.md b/docs/user-guide/doc-odm-user-guide/api-reference.md
new file mode 100644
index 0000000..4f896cd
--- /dev/null
+++ b/docs/user-guide/doc-odm-user-guide/api-reference.md
@@ -0,0 +1,256 @@
+# API Reference: Single-Cell HDF5 Transformation (Processors Controller)
+
+> **Related documentation:** For conceptual background on configurations, images, and jobs, see [About Single-Cell HDF5 Transformations in ODM](about-sc-hdf5-transformations.md). For step-by-step usage of these endpoints, see the [Single-cell data in ODM: Getting Started](quickstart-sc.md) and [How-to Guides](how-to-sc-hdf5-transformations.md). For the configuration `data` object schema, see the [Configuration Reference](configuration-reference.md).
+
+This reference describes all endpoints in the ODM Processors Controller API used to manage and execute single-cell HDF5 transformations. Endpoints are grouped into three resources: Transformation Configurations, Transformation Images, and Transformation Jobs.
+
+---
+
+## Quick Reference
+
+| Operation | Method | Endpoint |
+|---|---|---|
+| List configurations | `GET` | `/api/v1/transformations/configurations` |
+| Get a configuration | `GET` | `/api/v1/transformations/configurations/{id}` |
+| Create a configuration | `POST` | `/api/v1/transformations/configurations` |
+| Update a configuration | `PUT` | `/api/v1/transformations/configurations/{id}` |
+| List images | `GET` | `/api/v1/transformations/images` |
+| Submit a job | `POST` | `/api/v1/transformations/jobs` |
+| Get job status | `GET` | `/api/v1/transformations/jobs/{id}` |
+| Retrieve job logs | `POST` | `/api/v1/transformations/jobs/{id}/logs` |
+
+---
+
+## Transformation Configurations
+
+A transformation configuration is a stored JSON document that defines how a source file should be processed. It contains a human-readable name and description alongside the `data` object, which is the full processing specification passed to the transformation image.
+
+Configurations are independent of any particular run. The same configuration can be reused across multiple jobs and updated iteratively without affecting previous job results.
+
+### List configurations
+
+```
+GET /api/v1/transformations/configurations
+```
+
+Returns an array of configuration objects. Each entry includes:
+
+| Field | Type | Description |
+|---|---|---|
+| `id` | integer | Unique identifier for the configuration |
+| `name` | string | Human-readable name |
+| `description` | string | Human-readable description |
+
+Use this endpoint to discover existing configurations before deciding to create a new one or reuse an existing one.
+
+### Get a configuration
+
+```
+GET /api/v1/transformations/configurations/{id}
+```
+
+Returns the full configuration object, including the `data` field with all processing rules. Use this to inspect an existing configuration before deciding to update or reuse it.
+
+**Path parameters:**
+
+| Parameter | Type | Description |
+|---|---|---|
+| `id` | integer | ID of the configuration to retrieve |
+
+### Create a configuration
+
+```
+POST /api/v1/transformations/configurations
+```
+
+Creates a new transformation configuration and returns its assigned `id`.
+
+**Request body:**
+
+| Field | Type | Required | Description |
+|---|---|---|---|
+| `name` | string | Yes | Human-readable name for this configuration |
+| `description` | string | Yes | Human-readable description |
+| `data` | object | Yes | The processing specification. See the [Configuration Reference](configuration-reference.md) for the full schema. |
+
+**Example request body:**
+
+```json
+{
+  "name": "minimal_config",
+  "description": "Minimal transformation config for H5AD files",
+  "data": {
+    "file_type": "h5ad",
+    "biosample_metadata": null,
+    "cell_metadata": {
+      "metadata_keys": {
+        "obs": "metadata"
+      }
+    },
+    "feature_metadata": {
+      "metadata_keys": {
+        "var": "metadata"
+      }
+    },
+    "cell_expression": {
+      "data_class": "Single-cell transcriptomics"
+    }
+  }
+}
+```
+
+**Response:** The response object includes the `id` assigned to the new configuration. This `id` is required when submitting a job.
+
+### Update a configuration
+
+```
+PUT /api/v1/transformations/configurations/{id}
+```
+
+Fully replaces the configuration at the given `id` with the provided content. 
+
+**Path parameters:**
+
+| Parameter | Type | Description |
+|---|---|---|
+| `id` | integer | ID of the configuration to update |
+
+**Request body:** Same structure as `POST /api/v1/transformations/configurations`.
+
+---
+
+## Transformation Images
+
+A transformation image is a versioned, containerized processing environment that executes the transformation logic for a specific input format. Images are managed separately from configurations, enabling version-controlled upgrades.
+
+### List images
+
+```
+GET /api/v1/transformations/images
+```
+
+Returns an array of available image objects.
+
+**Response fields per image:**
+
+| Field | Description |
+|---|---|
+| `name` | Identifier used when referencing the image in a job (e.g. `"hdf5-cells"`) |
+| `description` | Human-readable description of the image's purpose |
+| `input_formats` | File formats accepted as input |
+| `output_formats` | File formats produced as output |
+| `version` | Version tag (e.g. `"latest"` or a specific release tag such as `"0.0.7"`) |
+
+Use this endpoint to confirm image availability and identify the version to specify when submitting a job.
+
+---
+
+## Transformation Jobs
+
+A transformation job binds a configuration and an image to one or more input file accessions and executes the processing pipeline. Each job produces an execution log and, when not in dry-run mode, creates or updates ODM objects.
+
+### Submit a job
+
+```
+POST /api/v1/transformations/jobs
+```
+
+Creates and submits a new transformation job. The response includes the `id` of the created job, which is required for status and log queries.
+
+**Request body:**
+
+| Field | Type | Required | Description |
+|---|---|---|---|
+| `configuration_id` | integer | Yes | ID of the transformation configuration to use |
+| `dry_run` | boolean | Yes | `true` to simulate the run without writing data to ODM; `false` for a full run |
+| `image_reference` | object | Yes | Specifies the image to use. Contains `name` (string) and `version` (string). |
+| `input_accessions` | array of strings | Yes | ODM accessions of the input files to process |
+| `volume_size` | integer | Yes | Scratch volume size in GB allocated for the job |
+
+**`image_reference` fields:**
+
+| Field | Type | Description |
+|---|---|---|
+| `name` | string | Image name. Use `"hdf5-cells"` for single-cell HDF5 transformations. |
+| `version` | string | Version tag. Use `"latest"` or a specific release tag (e.g. `"0.0.7"`). |
+
+**`volume_size` guidelines:**
+
+| Input format | Recommended `volume_size` |
+|---|---|
+| H5AD | ≥ 1.4 × size of the original attachment (GB) |
+| 10x H5 | ≥ 4 × size of the original attachment (GB) |
+
+10x H5 files require significantly more scratch space because they are converted internally to H5AD format.
+
+**Example request body (dry run):**
+
+```json
+{
+  "configuration_id": 42,
+  "dry_run": true,
+  "image_reference": {
+    "name": "hdf5-cells",
+    "version": "latest"
+  },
+  "input_accessions": ["GSF020408"],
+  "volume_size": 30
+}
+```
+
+**Example request body (full run):**
+
+```json
+{
+  "configuration_id": 42,
+  "dry_run": false,
+  "image_reference": {
+    "name": "hdf5-cells",
+    "version": "latest"
+  },
+  "input_accessions": ["GSF020408"],
+  "volume_size": 30
+}
+```
+
+### Get job status
+
+```
+GET /api/v1/transformations/jobs/{id}
+```
+
+Returns the job object, including the current `status.state`.
+
+**Path parameters:**
+
+| Parameter | Type | Description |
+|---|---|---|
+| `id` | integer | ID of the job to query |
+
+**`status.state` values:**
+
+| State | Meaning |
+|---|---|
+| `RUNNING` | Job is in progress |
+| `DONE` | Job finished successfully |
+| `FAILED` | Job encountered an error |
+
+### Retrieve job logs
+
+```
+POST /api/v1/transformations/jobs/{id}/logs
+```
+
+Returns the log records for the specified job. Logs include:
+
+- Configuration validation messages.
+- Input file structure report (keys, data types, shapes, attribute names).
+- Warnings and errors encountered during metadata extraction and curation.
+- Linking validation results (dry-run only).
+- Accessions of ODM objects created or updated (full run only).
+
+**Path parameters:**
+
+| Parameter | Type | Description |
+|---|---|---|
+| `id` | integer | ID of the job whose logs to retrieve |
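The `volume_size` guidelines and the job request body in `api-reference.md` can be combined in a short Python sketch. It only builds the JSON payload and performs no HTTP calls. The helper names `recommended_volume_size` and `build_job_request` are hypothetical, and rounding up to the next whole gigabyte is an assumption based on `volume_size` being an integer.

```python
import json
import math
from fractions import Fraction

# Multipliers from the volume_size guidelines table (H5AD: 1.4x, 10x H5: 4x).
# Fractions avoid floating-point surprises when rounding up.
VOLUME_FACTORS = {"h5ad": Fraction(14, 10), "h5": Fraction(4)}


def recommended_volume_size(file_type, attachment_gb):
    """Smallest whole number of GB that satisfies the guideline for this format."""
    factor = VOLUME_FACTORS[file_type]
    return math.ceil(Fraction(str(attachment_gb)) * factor)


def build_job_request(configuration_id, accessions, file_type, attachment_gb,
                      dry_run=True, version="latest"):
    """Assemble a request body for POST /api/v1/transformations/jobs."""
    return {
        "configuration_id": configuration_id,
        "dry_run": dry_run,
        "image_reference": {"name": "hdf5-cells", "version": version},
        "input_accessions": list(accessions),
        "volume_size": recommended_volume_size(file_type, attachment_gb),
    }


# Example: a 21 GB H5AD attachment, submitted as a dry run first.
print(json.dumps(build_job_request(42, ["GSF020408"], "h5ad", 21), indent=2))
```

Defaulting `dry_run` to `True` in this sketch mirrors the recommended workflow of iterating with dry runs before a full run.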
diff --git a/docs/user-guide/doc-odm-user-guide/attribute-mapping.md b/docs/user-guide/doc-odm-user-guide/attribute-mapping.md
new file mode 100644
index 0000000..c4bc1be
--- /dev/null
+++ b/docs/user-guide/doc-odm-user-guide/attribute-mapping.md
@@ -0,0 +1,62 @@
+# Attribute Mapping Reference
+
+During metadata curation, the transformation automatically maps commonly used attribute names found in source HDF5 files to the canonical ODM API names.
+
+Mapping is applied separately to cell metadata and feature metadata. When an attribute in the source file matches one of the known alternative names listed below, it is renamed to the corresponding ODM API display name. Attributes that do not match any known name are converted to camelCase.
+
+## Cell metadata attributes
+
+The table below lists the canonical ODM API name for each attribute alongside the alternative source names that are automatically recognized.
+
+| ODM API display name | Alternative names |
+|---|---|
+| cellID | — |
+| barcode | — |
+| batch | `sample_id`, `sample`, `run_id` |
+| cellType | `cell_type`, `celltype`, `ident`, `labels` |
+| cluster | `cluster_louvain`, `cluster_leiden`, `seurat_clusters` |
+| nCounts | `n_counts`, `umi_count`, `nCount_RNA`, `total_umi`, `n_umi`, `n_reads`, `nUMI`, `UMI_count` |
+| percentMito | `percent_mito`, `percent_mt`, `percent.mt`, `pct_mt`, `pct_mito`, `pct_counts_mito`, `percent.mito`, `percent.mito.raw`, `mito_ratio`, `pct_counts_mt` |
+| umap | `X_umap`, `UMAP` |
+| pca | `X_pca`, `PCA` |
+| tsne | `X_tsne`, `tSNE` |
+| pcaHarmony | `pca_harmony`, `X_harmony`, `harmony_embedding`, `X_pca_harmony` |
+| nGenes | `n_genes`, `n_genes_by_counts`, `nGene`, `n_features`, `nFeature_RNA`, `genes_detected`, `detected_genes`, `gene_count`, `Total_Genes_Detected` |
+| mitoCounts | `mito_counts`, `total_counts_mt`, `total_counts_mito`, `subsets_mt_sum`, `mt_sum`, `MT_sum` |
+| riboCounts | `ribo_counts`, `total_counts_ribo`, `total_counts_rb`, `subsets_ribo_sum`, `rb_counts`, `rb_sum` |
+| percentRibo | `percent_ribo`, `percent_rb`, `percent.rb`, `pct_counts_ribo`, `ribo_ratio`, `pct_ribo`, `pct_counts_rb`, `pct_counts_rrna` |
+| percentHemoglobin | `percent_hb`, `pct_hb`, `hemoglobin_fraction`, `prop_hb`, `percent_hemoglobin` |
+| doubletStatus | `doublet_status`, `is_doublet`, `predicted_doublet`, `multiplet_status` |
+| doubletScore | `doublet_score`, `scrublet_score`, `doublet_probability`, `multiplet_score`, `doublet_stat` |
+| sScore | `S_score`, `s.score`, `S.Score`, `s_phase_score`, `S_phase_probability` |
+| g2mScore | `G2M_score`, `g2m.score`, `G2M.Score`, `g2m_phase_score`, `G2M_phase_probability` |
+| cellCycle | `phase`, `cell_cycle_phase`, `cc_phase`, `cycle_stage` |
+| ambientFraction | `ambient_fraction`, `decontX_score`, `rho`, `contamination_fraction`, `ambient_rna_percent`, `soup_fraction`, `soup_frac` |
+
+## Feature metadata attributes
+
+The table below lists the canonical ODM API name for each feature attribute alongside the alternative source names that are automatically recognized.
+
+| ODM API display name | Alternative names |
+|---|---|
+| geneId | `gene_id` (index), `gene_ids`, `ensembl_id`, `feature_id`, `stable_id`, `ENSEMBL` |
+| gene | `symbol`, `symbols`, `gene_symbol`, `gene_symbols`, `feature_name`, `display_name`, `name`, `gene_name` |
+| totalCounts | `total_counts`, `gene_total`, `sum_counts`, `count_sum`, `total_umis` |
+| nCellsByCounts | `n_cells_by_counts`, `n_cells`, `num_cells`, `n_obs`, `num_cells_expressed` |
+| meanCounts | `mean_counts`, `avg_exp`, `obs_mean`, `means` |
+| pctDropoutByCounts | `pct_dropout_by_counts`, `pct_dropout`, `percent_dropout`, `dropout_rate` |
+
+### Gene ID to name mapping
+
+When feature metadata contains a `geneId` column but no gene name column, the transformation can automatically resolve gene names from a built-in reference. This is controlled by the `map_gene_ids_to_names` parameter in the `feature_metadata` configuration block, which is enabled by default. Set it to `false` for proteomics or other non-gene-ID data where this behaviour is not appropriate.
+
+The mapping is performed using Ensembl and NCBI reference data. Both Ensembl gene IDs (e.g. `ENSG...`) and NCBI gene IDs are supported. The following organisms are supported in `hdf5-cells`:
+
+| Organism | Genome version | Ensembl release | NCBI release |
+|----------|----------------|-----------------|--------------|
+| *Homo sapiens* | GRCh38.p14 | 115 | GCF_000001405.40-RS_2025_08 |
+| *Mus musculus* | GRCm39 | 115 | GCF_000001635.27-RS_2024_02 |
+| *Rattus norvegicus* | GRCr8 | 115 | GCF_036323735.1-RS_2024_02 |
+| *Sus scrofa* | Sscrofa11.1 | 115 | 106 |
+
+> The gene ID column must be named `geneId` for mapping to be performed. If the column has a different name in the source file, ensure it is covered by the feature metadata attribute mapping above so that it is renamed to `geneId` before this step runs.
\ No newline at end of file
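The renaming behaviour described in `attribute-mapping.md` can be sketched in Python as a lookup followed by a camelCase fallback. The alias dictionary below is a small excerpt of the cell metadata table. The exact-match lookup and the camelCase rule (split on non-alphanumeric characters, lowercase the first letter) are assumptions, since the documentation does not spell out either detail.

```python
import re

# Excerpt of the cell metadata table: alternative source name -> ODM API name.
KNOWN_CELL_ALIASES = {
    "cell_type": "cellType",
    "celltype": "cellType",
    "seurat_clusters": "cluster",
    "n_counts": "nCounts",
    "nCount_RNA": "nCounts",
    "percent.mt": "percentMito",
    "X_umap": "umap",
}


def to_camel_case(name):
    """Assumed fallback rule for names that match no known alias."""
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", name) if p]
    if not parts:
        return name
    head, *tail = parts
    # Lowercase the first letter of the first part, capitalise the first letter of the rest.
    return head[0].lower() + head[1:] + "".join(p[0].upper() + p[1:] for p in tail)


def map_attribute(name, aliases=KNOWN_CELL_ALIASES):
    """Return the canonical name for a known alias, else a camelCase version."""
    return aliases.get(name, to_camel_case(name))


for source in ["nCount_RNA", "seurat_clusters", "donor_age"]:
    print(source, "->", map_attribute(source))
```

Columns that must keep their original names, such as Leiden cluster columns with decimal suffixes, are handled in the real configuration through `columns_to_preserve_name` and are not modelled here.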
diff --git a/docs/user-guide/doc-odm-user-guide/configuration-reference.md b/docs/user-guide/doc-odm-user-guide/configuration-reference.md
new file mode 100644
index 0000000..e1f6742
--- /dev/null
+++ b/docs/user-guide/doc-odm-user-guide/configuration-reference.md
@@ -0,0 +1,145 @@
+# Configuration Reference: Single-Cell HDF5 Transformation
+
+> **Related documentation:** [About SC HDF5 Transformations](about-sc-hdf5-transformations.md) · [How-to Guides](how-to-sc-hdf5-transformations.md) · [API Reference](api-reference.md) · [Transformation Process Reference](transformation-process-reference.md)
+
+The configuration is validated at the start of every run. If `file_type` is missing or invalid, the pipeline raises an error immediately. All other validation errors are collected and reported together. Unrecognised keys are ignored with a warning.
+
+---
+
+## Top-level parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `file_type` | `string` | **Yes** | — | Format of the input file. Accepted values: `"h5ad"`, `"h5"`. |
+| `save_logs` | `boolean` | No | `true` | When `false`, logs are not saved as an attachment after the run. Has no effect when the job is submitted with `dry_run: true`. |
+
+---
+
+## `biosample_metadata`
+
+Settings for extracting, transforming, and exporting cell-level metadata to Sample, Library, or Preparation entities. The entire section is optional.
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `metadata_keys` | `dict[string, string]` | Yes | — | Maps HDF5 group keys to metadata types. Use `"obs": "metadata"` to read standard cell metadata. |
+| `biosample_column_name` | `string` | Yes | — | Column identifying which biosample each cell belongs to. Rows are grouped by this column for aggregation. |
+
+**`metadata_keys` example:**
+```json
+{ "obs": "metadata" }
+```
+
+### `biosample_metadata.sample`
+
+Settings for exporting metadata to the Sample entity. Optional.
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `create_new_group` | `boolean` | `false` | When `true`, creates a new Sample group in ODM and links it to the study. |
+| `template_id` | `string` | — | Template ID for the new Sample group. Falls back to the study default if omitted. |
+| `columns_to_export` | `list[string]` | — | Cell metadata columns to include in the exported Sample metadata. Only columns constant per biosample are eligible; exported columns are dropped from cell metadata. |
+| `columns_renaming_map` | `dict[string, string]` | — | Maps source column names to new names in the exported metadata. |
+| `columns_to_fill_missing_values` | `dict[string, string]` | — | Default values for missing entries in specified columns. |
+| `columns_to_curate_values` | `dict[string, dict[string, string]]` | — | Maps specific values in a column to replacement values. |
+
+**Examples:**
+```json
+{ "columns_renaming_map": { "tissue_type": "tissueType" } }
+{ "columns_to_fill_missing_values": { "disease": "unknown" } }
+{ "columns_to_curate_values": { "tissue": { "PBMCs": "peripheral blood mononuclear cells" } } }
+```
+
+### `biosample_metadata.library`
+
+Accepts the same parameters as `biosample_metadata.sample`, plus:
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `linking_group` | `string` | — | Accession of an existing Sample group to link the new Library group to. If omitted, the pipeline uses a Sample group from the same run or pre-fetched accessions. |
+
+### `biosample_metadata.preparation`
+
+Accepts the same parameters as `biosample_metadata.library`, including `linking_group`.
+
+> **Constraint:** Only one of `library` or `preparation` may have `columns_to_export` set in the same configuration.
+
+---
+
+## `cell_metadata`
+
+Settings for extracting and transforming cell-level metadata. Optional. If absent, no Cell Group is created.
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `metadata_keys` | `dict[string, string]` | Yes | — | Maps HDF5 group keys to metadata types. At least one key with value `"metadata"` is required. |
+| `linking_group` | `dict[string, string \| list[string] \| null]` | No | — | Specifies the parent SLP entity (`sample`, `library` or `preparation`) to link the Cell Group to. Empty value triggers auto-discovery of all available accessions. For full linking resolution rules, see [Linking group determination](transformation-process-reference.md#13-linking-group-determination). |
+| `columns_to_drop` | `list[string]` | No | — | Column names to remove before processing. |
+| `columns_renaming_map` | `dict[string, string]` | No | — | Maps source column names to new names. |
+| `columns_to_fill_missing_values` | `dict[string, string]` | No | — | Default values for missing entries. |
+| `columns_to_curate_values` | `dict[string, dict[string, string]]` | No | — | Replacement values for specific entries in specified columns. |
+| `set_column_value` | `dict[string, string]` | No | — | Sets a constant value for all rows. Can add new columns or overwrite existing ones. |
+| `columns_to_preserve_name` | `list[string]` | No | — | Columns to exempt from internal name standardisation (e.g. Leiden cluster columns with decimal suffixes). |
+| `add_qc_metrics` | `boolean` | No | `true` | When `true`, adds QC metrics (counts, genes, mitochondrial/ribosomal presence) if not already present. Skipped when the job is submitted with `dry_run: true`. |
+
+**`metadata_keys` accepted values (H5AD):**
+
+| Key | Value | Description |
+|-----|-------|-------------|
+| `obs` | `metadata` | Standard cell annotations |
+| `obsm` | `embedding` | Multidimensional cell data (PCA, UMAP, etc.) |
+| `obsp` | `pairwise` | Pairwise cell annotations |
+
+For H5 files, use the same H5AD key names — the transformation maps them to the correct internal structure.
+
+**Examples:**
+```json
+{ "metadata_keys": { "obs": "metadata", "obsm": "embedding" } }
+{ "linking_group": { "library": "GSF017080" } }
+{ "columns_to_drop": ["taxon", "organism_id"] }
+{ "columns_renaming_map": { "sample": "batch", "pctmt": "percentMito" } }
+{ "set_column_value": { "sample_id": "lung_1" } }
+{ "columns_to_preserve_name": ["cluster_leiden_0.5"] }
+```
+
+---
+
+## `feature_metadata`
+
+Settings for extracting and transforming feature (gene)-level metadata. Optional.
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `metadata_keys` | `dict[string, string]` | Yes | — | Maps HDF5 group keys to metadata types. At least one key with value `"metadata"` is required. |
+| `columns_to_drop` | `list[string]` | No | — | Column names to remove from feature metadata. |
+| `columns_renaming_map` | `dict[string, string]` | No | — | Maps source column names to new names. |
+| `columns_to_fill_missing_values` | `dict[string, string]` | No | — | Default values for missing entries. |
+| `columns_to_curate_values` | `dict[string, dict[string, string]]` | No | — | Replacement values for specific entries. |
+| `set_column_value` | `dict[string, string]` | No | — | Sets a constant value for all rows. |
+| `columns_to_preserve_name` | `list[string]` | No | — | Columns to exempt from internal name standardisation. |
+| `map_gene_ids_to_names` | `boolean` | No | `true` | When `true`, maps gene IDs to gene names if names are absent and a `geneId` column is present. Set to `false` for proteomics or non-gene-ID data. |
+
+**`metadata_keys` accepted values (H5AD):**
+
+| Key | Value | Description |
+|-----|-------|-------------|
+| `var` | `metadata` | Standard feature annotations |
+| `varm` | `embedding` | Multidimensional feature data |
+| `varp` | `pairwise` | Pairwise feature annotations |
+
+---
+
+## `cell_expression`
+
+Settings for extracting and uploading the cell expression matrix. Optional. If absent, no Expression Group is created.
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `data_class` | `string` | **Yes** | — | Data class label for the expression data (e.g. `"Single-cell transcriptomics"`). |
+| `compression_level` | `integer` (0–9) | No | `4` | Brotli compression level. Higher values produce smaller files at the cost of longer compression time. |
+| `chunk_size` | `integer` | No | inferred | Number of features processed per chunk. Calculated automatically from available memory if omitted. |
+| `max_buffer_size` | `integer` | No | `50` | Amount of data (in MB) held in memory before being flushed to disk during writing. |
+| `number_format` | `string` | No | inferred | Numeric precision of output values. Accepts printf-style (`"%.7g"`, `"%d"`) or NumPy dtype (`"float32"`, `"int64"`). |
+| `columns_to_drop` | `list[string]` | No | — | Column names to remove from expression metadata. |
+| `columns_renaming_map` | `dict[string, string]` | No | — | Maps source column names to new names. |
+| `set_column_value` | `dict[string, string]` | No | — | Sets a constant value for all rows in specified columns. |
+| `source_file_metadata` | `boolean` | No | `true` | When `true`, metadata from the source HDF5 attachment is read and included in expression metadata. Summary statistics (cell count, feature count, sparsity, etc.) are always appended regardless of this flag. |
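+
+The printf-style form of `number_format` follows the usual C/Python formatting rules. As a rough illustration only (this is plain Python, not the transformation's own code, and `render` is a hypothetical helper), the snippet below shows what the two printf-style examples from the table do to a single value:
+
+```python
+def render(value: float, number_format: str) -> str:
+    """Apply a printf-style format string to a single value."""
+    return number_format % value
+
+
+# "%.7g" keeps at most seven significant digits.
+print(render(0.123456789012, "%.7g"))  # 0.1234568
+
+# "%d" writes an integer; in Python the fractional part is truncated.
+print(render(3.9, "%d"))  # 3
+```
+
+How the tool handles NumPy dtype names such as `"float32"` is not covered by this sketch.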
diff --git a/docs/user-guide/doc-odm-user-guide/doc-odm-user-guide/extras/GSE156793.json b/docs/user-guide/doc-odm-user-guide/doc-odm-user-guide/extras/GSE156793.json
new file mode 100644
index 0000000..459f4de
--- /dev/null
+++ b/docs/user-guide/doc-odm-user-guide/doc-odm-user-guide/extras/GSE156793.json
@@ -0,0 +1,111 @@
+{
+  "name": "GSE156793.json",
+  "description": "Config to transform GSE156793 dataset",
+  "data": {
+    "file_type": "h5ad",
+    "biosample_metadata": {
+      "metadata_keys": {
+        "obs": "metadata"
+      },
+      "biosample_column_name": "RT_group",
+      "sample": {
+        "create_new_group": false,
+        "template_id": null,
+        "linking_group": null,
+        "columns_to_export": [
+          "Fetus_id", 
+          "Development_day"
+        ],
+        "columns_renaming_map": {
+          "Fetus_id": "Donor ID",
+          "Development_day": "Donor Age"
+        },
+        "columns_to_fill_missing_values": null,
+        "columns_to_curate_values": null
+      },
+      "library": {
+        "create_new_group": false,
+        "template_id": null,
+        "linking_group": null,
+        "columns_to_export": [
+          "Assay"
+        ],
+        "columns_renaming_map": {
+          "Assay": "Assay Type"
+        },
+        "columns_to_fill_missing_values": null,
+        "columns_to_curate_values": null
+      }
+    },
+    "cell_metadata": {
+      "metadata_keys": {
+        "obs": "metadata",
+        "obsm": "embedding",
+        "obsp": "pairwise"
+      },
+      "linking_group": null,
+      "columns_to_drop": [
+        "batch",
+        "Organ",
+        "Sex",
+        "Batch",
+        "Experiment_batch"
+      ],
+      "columns_renaming_map": {
+        "_index": "barcode",
+        "RT_group": "batch",
+        "Main_cluster_name": "cluster",
+        "Organ_cell_lineage": "cell_type"
+      },
+      "columns_to_curate_values": {
+        "matched_mca_cell_name": {
+          "nan": ""
+        },
+        "bca_cluster_info": {
+          "nan": ""
+        },
+        "matched_bca_cell_name": {
+          "nan": ""
+        },
+        "X_umap": {
+          "nan,nan": ""
+        }
+      },
+      "columns_to_fill_missing_values": {
+        "batch": "unknown"
+      },
+      "columns_to_preserve_name": [
+        "X_umap"
+      ],
+      "add_qc_metrics": true
+    },
+    "feature_metadata": {
+      "metadata_keys": {
+        "var": "metadata",
+        "varm": "embedding",
+        "varp": "pairwise"
+      },
+      "columns_to_drop": null,
+      "columns_renaming_map": {
+        "_index": "geneId",
+        "gene_short_name": "gene"
+      },
+      "columns_to_fill_missing_values": null,
+      "columns_to_curate_values": null,
+      "set_column_value": null,
+      "columns_to_preserve_name": null,
+      "map_gene_ids_to_names": true
+    },
+    "cell_expression": {
+      "data_class": "Single-cell transcriptomics",
+      "compression_level": null,
+      "chunk_size": null,
+      "max_buffer_size": null,
+      "number_format": null,
+      "columns_to_drop": null,
+      "columns_renaming_map": null,
+      "set_column_value": null,
+      "source_file_metadata": true
+    }
+  }
+}
diff --git a/docs/user-guide/doc-odm-user-guide/doc-odm-user-guide/extras/GSE165045.json b/docs/user-guide/doc-odm-user-guide/doc-odm-user-guide/extras/GSE165045.json
new file mode 100644
index 0000000..8a7b0ce
--- /dev/null
+++ b/docs/user-guide/doc-odm-user-guide/doc-odm-user-guide/extras/GSE165045.json
@@ -0,0 +1,49 @@
+{
+  "name": "GSE165045.json",
+  "description": "Config to transform GSE165045 dataset",
+  "data": { 
+    "file_type": "h5ad",
+    "biosample_metadata": null,
+    "cell_metadata": {
+      "metadata_keys": {
+        "obs": "metadata"
+      },
+      "linking_group": null,
+      "columns_to_drop": null,
+      "columns_renaming_map": {
+        "sample": "batch",
+        "_index": "barcode"
+      },
+      "columns_to_fill_missing_values": null,
+      "columns_to_curate_values": null,
+      "set_column_value": null,
+      "columns_to_preserve_name": null,
+      "add_qc_metrics": true
+    },
+    "feature_metadata": {
+      "metadata_keys": {
+        "var": "metadata"
+      },
+      "columns_to_drop": null,
+      "columns_renaming_map": {
+        "_index": "gene"
+      },
+      "columns_to_fill_missing_values": null,
+      "columns_to_curate_values": null,
+      "set_column_value": null,
+      "columns_to_preserve_name": null,
+      "map_gene_ids_to_names": true
+    },
+    "cell_expression": {
+      "compression_level": null,
+      "chunk_size": null,
+      "max_buffer_size": null,
+      "data_class": "Single-cell transcriptomics",
+      "number_format": null,
+      "columns_to_drop": null,
+      "columns_renaming_map": null,
+      "set_column_value": null,
+      "source_file_metadata": true
+    }
+  }
+}
diff --git a/docs/user-guide/doc-odm-user-guide/doc-odm-user-guide/extras/aggregated_config_1.json b/docs/user-guide/doc-odm-user-guide/doc-odm-user-guide/extras/aggregated_config_1.json
new file mode 100644
index 0000000..9a6c90c
--- /dev/null
+++ b/docs/user-guide/doc-odm-user-guide/doc-odm-user-guide/extras/aggregated_config_1.json
@@ -0,0 +1,83 @@
+{
+  "name": "aggregated_config_1.json",
+  "description": "Aggregated config 1 to transform several public datasets",
+  "data": {
+    "file_type": "h5ad",
+    "biosample_metadata": null,
+    "cell_metadata": {
+      "metadata_keys": {
+        "obs": "metadata",
+        "obsm": "embedding",
+        "obsp": "pairwise"
+      },
+      "linking_group": null,
+      "columns_to_drop": [
+        "barcode", 
+        "Species",
+        "sex",
+        "age",
+        "disease",
+        "biosample_id",
+        "lvef"
+      ],
+      "columns_renaming_map": {
+        "index": "barcode",
+        "_index": "barcode",
+        "donor_id": "batch",
+        "sample": "batch",
+        "sample_id": "batch",
+        "Sample_Name": "batch",
+        "biological.individual": "batch",
+        "GSM_ID": "gsm_id",
+        "cell_type_leiden0.6": "cell_type",
+        "SubCluster": "cluster",
+        "cellbender_ncount": "n_counts",
+        "cellbender_ngenes": "n_genes",
+        "cellranger_percent_mito": "percent_mito",
+        "cellbender_entropy": "entropy",
+        "cellranger_doublet_scores": "doublet_scores"
+      },
+      "columns_to_fill_missing_values": null,
+      "columns_to_curate_values": null,
+      "set_column_value": null,
+      "columns_to_preserve_name": null,
+      "add_qc_metrics": true
+    },
+    "feature_metadata": {
+      "metadata_keys": {
+        "var": "metadata",
+        "varm": "embedding",
+        "varp": "pairwise"
+      },
+      "columns_to_drop": [
+        "feature_biotype",
+        "feature_types",
+        "genome"
+      ],
+      "columns_renaming_map": {
+        "_index": "gene",
+        "index": "gene",
+        "GENE": "gene",
+        "var_index": "geneId",
+        "feature_is_filtered": "is_filtered"
+      },
+      "columns_to_fill_missing_values": null,
+      "columns_to_curate_values": null,
+      "set_column_value": null,
+      "columns_to_preserve_name": null,
+      "map_gene_ids_to_names": true
+    },
+    "cell_expression": {
+      "compression_level": null,
+      "chunk_size": null,
+      "max_buffer_size": null,
+      "data_class": "Single-cell transcriptomics",
+      "number_format": null,
+      "columns_to_drop": null,
+      "columns_renaming_map": null,
+      "set_column_value": null,
+      "source_file_metadata": true
+    }
+  }
+}
+
diff --git a/docs/user-guide/doc-odm-user-guide/doc-odm-user-guide/extras/aggregated_config_2.json b/docs/user-guide/doc-odm-user-guide/doc-odm-user-guide/extras/aggregated_config_2.json
new file mode 100644
index 0000000..064fbd0
--- /dev/null
+++ b/docs/user-guide/doc-odm-user-guide/doc-odm-user-guide/extras/aggregated_config_2.json
@@ -0,0 +1,107 @@
+{
+  "name": "aggregated_config_2.json",
+  "description": "Aggregated config 2 to transform several public datasets",
+  "data": {
+    "file_type": "h5ad",
+    "biosample_metadata": {
+      "metadata_keys": {
+        "obs": "metadata"
+      },
+      "biosample_column_name": "sample",
+      "sample": {
+        "create_new_group": false,
+        "template_id": null,
+        "linking_group": null,
+        "columns_to_export": [
+          "sex_ontology_term_id",
+          "development_stage_ontology_term_id",
+          "ethnicity_ontology_term_id",
+          "HbA1c",
+          "insulin_content",
+          "glucose_SI"
+        ],
+        "columns_renaming_map": {
+          "sex_ontology_term_id": "Donor Sex Term ID",
+          "development_stage_ontology_term_id": "Developmental Stage Term ID",
+          "ethnicity_ontology_term_id": "Donor Ethnicity Term ID",
+          "HbA1c": "Hemoglobin A1c (HbA1c) Concentration Value",
+          "insulin_content": "Fasting Insulin Concentration Value",
+          "glucose_SI": "Fasting Glucose Concentration Value"
+        },
+        "columns_to_fill_missing_values": null,
+        "columns_to_curate_values": null
+      },
+      "library": {
+        "create_new_group": false,
+        "template_id": null,
+        "linking_group": null,
+        "columns_to_export": [
+          "assay_ontology_term_id"
+        ],
+        "columns_renaming_map": {
+          "assay_ontology_term_id": "Assay Type Term ID"
+        },
+        "columns_to_fill_missing_values": null,
+        "columns_to_curate_values": null
+      }
+    },
+    "cell_metadata": {
+      "metadata_keys": {
+        "obs": "metadata",
+        "obsm": "embedding"
+      },
+      "linking_group": null,
+      "columns_to_drop": [
+        "id",
+        "BMI",
+        "organism_ontology_term_id",
+        "disease_ontology_term_id",
+        "is_primary_data",
+        "tissue_ontology_term_id"
+      ],
+      "columns_renaming_map": {
+        "_index": "barcode",
+        "sample": "batch",
+        "louvain_anno_broad": "louvain",
+        "louvain_anno_fine": "louvain_fine",
+        "cell_type_ontology_term_id": "cell_type",
+        "mt_frac": "percent_mito"
+      },
+      "columns_to_fill_missing_values": null,
+      "columns_to_curate_values": null,
+      "set_column_value": null,
+      "columns_to_preserve_name": null,
+      "add_qc_metrics": true
+    },
+    "feature_metadata": {
+      "metadata_keys": {
+        "var": "metadata"
+      },
+      "columns_to_drop": [
+        "feature_biotype"
+      ],
+      "columns_renaming_map": {
+        "ensembl_ID": "geneId",
+        "human_ensembl_ID": "human_ensembl_id",
+        "feature_is_filtered": "is_filtered",
+        "filtered_mapped_human_ensembl_ID": "filtered_mapped_human_ensembl_id"
+      },
+      "columns_to_fill_missing_values": null,
+      "columns_to_curate_values": null,
+      "set_column_value": null,
+      "columns_to_preserve_name": null,
+      "map_gene_ids_to_names": true
+    },
+    "cell_expression": {
+      "compression_level": null,
+      "chunk_size": null,
+      "max_buffer_size": null,
+      "data_class": "Single-cell transcriptomics",
+      "number_format": null,
+      "columns_to_drop": null,
+      "columns_renaming_map": null,
+      "set_column_value": null,
+      "source_file_metadata": true
+    }
+  }
+}
\ No newline at end of file
diff --git a/docs/user-guide/doc-odm-user-guide/doc-odm-user-guide/extras/aggregated_config_3.json b/docs/user-guide/doc-odm-user-guide/doc-odm-user-guide/extras/aggregated_config_3.json
new file mode 100644
index 0000000..b9d9c9e
--- /dev/null
+++ b/docs/user-guide/doc-odm-user-guide/doc-odm-user-guide/extras/aggregated_config_3.json
@@ -0,0 +1,117 @@
+{
+  "name": "aggregated_config_3.json",
+  "description": "Aggregated config 3 to transform several public datasets",
+  "data": {
+    "file_type": "h5ad",
+    "biosample_metadata": {
+      "metadata_keys": {
+        "obs": "metadata"
+      },
+      "biosample_column_name": "sample_id",
+      "sample": {
+        "create_new_group": false,
+        "template_id": null,
+        "linking_group": null,
+        "columns_to_export": [
+          "Condition",
+          "self_reported_ethnicity_ontology_term_id",
+          "tissue_type"
+        ],
+        "columns_renaming_map": {
+          "Condition": "Condition Group",
+          "self_reported_ethnicity_ontology_term_id": "Donor Ethnicity Term ID",
+          "tissue_type": "Cell Source"
+        },
+        "columns_to_fill_missing_values": null,
+        "columns_to_curate_values": {
+          "Sample Source ID": {
+            "AM031": "Liver-32",
+            "AM042": "Liver-13",
+            "AM048": "Liver-14",
+            "AM061": "Liver-18",
+            "AM062": "Liver-33",
+            "AM072": "Liver-34"
+          }
+        }
+      },
+      "library": {
+        "create_new_group": null,
+        "template_id": null,
+        "linking_group": null,
+        "columns_to_export": null,
+        "columns_to_fill_missing_values": null,
+        "columns_to_curate_values": null
+      }
+    },
+    "cell_metadata": {
+      "metadata_keys": {
+        "obs": "metadata",
+        "obsm": "embedding",
+        "obsp": "pairwise"
+      },
+      "linking_group": null,
+      "columns_to_drop": [
+        "barcode",
+        "Sex",
+        "Age",
+        "batch",
+        "organism_ontology_term_id",
+        "donor_id",
+        "development_stage_ontology_term_id",
+        "sex_ontology_term_id",
+        "disease_ontology_term_id",
+        "tissue_ontology_term_id"
+      ],
+      "columns_renaming_map": {
+        "_index": "barcode",
+        "sample_id": "batch",
+        "log10GenesPerUMI_injured": "log10_genes_per_umi_injured",
+        "CellType_injured": "cell_type_injured",
+        "log10GenesPerUMI_healthy": "log10_genes_per_umi_healthy",
+        "CellType_healthy": "cell_type_healthy",
+        "cell_type_ontology_term_id": "cell_type"
+      },
+      "columns_to_fill_missing_values": null,
+      "columns_to_curate_values": {
+        "batch": {
+          "AM031": "Liver-32",
+          "AM042": "Liver-13",
+          "AM048": "Liver-14",
+          "AM061": "Liver-18",
+          "AM062": "Liver-33",
+          "AM072": "Liver-34"
+        }
+      },
+      "set_column_value": null,
+      "columns_to_preserve_name": null,
+      "add_qc_metrics": true
+    },
+    "feature_metadata": {
+      "metadata_keys": {
+        "var": "metadata",
+        "varm": "embedding",
+        "varp": "pairwise"
+      },
+      "columns_to_drop": null,
+      "columns_renaming_map": {
+        "_index": "gene"
+      },
+      "columns_to_fill_missing_values": null,
+      "columns_to_curate_values": null,
+      "set_column_value": null,
+      "columns_to_preserve_name": null,
+      "map_gene_ids_to_names": true
+    },
+    "cell_expression": {
+      "compression_level": null,
+      "chunk_size": null,
+      "max_buffer_size": null,
+      "data_class": "Single-cell transcriptomics",
+      "number_format": "float32",
+      "columns_to_drop": null,
+      "columns_renaming_map": null,
+      "set_column_value": null,
+      "source_file_metadata": true
+    }
+  }
+}
\ No newline at end of file
diff --git a/docs/user-guide/doc-odm-user-guide/doc-odm-user-guide/extras/dataset-import-commands.md b/docs/user-guide/doc-odm-user-guide/doc-odm-user-guide/extras/dataset-import-commands.md
new file mode 100644
index 0000000..fd17906
--- /dev/null
+++ b/docs/user-guide/doc-odm-user-guide/doc-odm-user-guide/extras/dataset-import-commands.md
@@ -0,0 +1,195 @@
+# Curated Public Datasets: Import Commands
+
+The commands below load each curated single-cell dataset into an ODM instance using the `odm-import-data` CLI. Each command uploads study and sample/library metadata alongside the H5AD attachment, ready for transformation.
+
+Replace `<HOST>`, `<TOKEN>`, and `<TEMPLATE>` with your ODM instance URL, API token, and template ID before running. Commands can be run independently and in any order.
+
+Recommended template: [Public dataset template](https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/templates/public_studies_template_demo.json)
+
+> Two datasets, **GSE192740** and **GSE198623**, each cover two species. Their commands upload two H5AD files (human and mouse for GSE192740, human and pig for GSE198623) in a single invocation.
+
+---
+
+## HeartDiversityTucker10x
+
+```bash
+odm-import-data \
+  --server <HOST> \
+  --token <TOKEN> \
+  --study https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/cardiovascular_diseases/HeartDiversityTucker10x/study.tsv \
+  --samples https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/cardiovascular_diseases/HeartDiversityTucker10x/samples.tsv \
+  --libraries https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/cardiovascular_diseases/HeartDiversityTucker10x/libraries.tsv \
+  -fl https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/cardiovascular_diseases/HeartDiversityTucker10x/healthy_human_4chamber_map_unnormalized_V4.h5ad \
+  -flm https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/cardiovascular_diseases/HeartDiversityTucker10x/data.tsv \
+  -dc 'Single-cell transcriptomics' \
+  --template <TEMPLATE> \
+  --allow-duplicates
+```
+
+---
+
+## SCP1303
+
+```bash
+odm-import-data \
+  --server <HOST> \
+  --token <TOKEN> \
+  --study https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/cardiovascular_diseases/SCP1303/study.tsv \
+  --samples https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/cardiovascular_diseases/SCP1303/samples.tsv \
+  --libraries https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/cardiovascular_diseases/SCP1303/libraries.tsv \
+  -fl https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/cardiovascular_diseases/SCP1303/human_dcm_hcm_scportal_03.17.2022.h5ad \
+  -flm https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/cardiovascular_diseases/SCP1303/data.tsv \
+  -dc 'Single-cell transcriptomics' \
+  --template <TEMPLATE> \
+  --allow-duplicates
+```
+
+---
+
+## GSE156793
+
+```bash
+odm-import-data \
+  --server <HOST> \
+  --token <TOKEN> \
+  --study https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/healthy_cell_atlases/GSE156793/study.tsv \
+  --samples https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/healthy_cell_atlases/GSE156793/samples.tsv \
+  --libraries https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/healthy_cell_atlases/GSE156793/libraries.tsv \
+  -fl https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/healthy_cell_atlases/GSE156793/GSE156793_all_organs_annotated_with_genes.h5ad \
+  -flm https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/healthy_cell_atlases/GSE156793/data.tsv \
+  -dc 'Single-cell transcriptomics' \
+  --template <TEMPLATE> \
+  --allow-duplicates
+```
+
+---
+
+## FibroticLiverWatsonMERFISH
+
+```bash
+odm-import-data \
+  --server <HOST> \
+  --token <TOKEN> \
+  --study https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/inflammatory_diseases/FibroticLiverWatsonMERFISH/study.tsv \
+  --samples https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/inflammatory_diseases/FibroticLiverWatsonMERFISH/samples.tsv \
+  --libraries https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/inflammatory_diseases/FibroticLiverWatsonMERFISH/libraries.tsv \
+  -fl https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/inflammatory_diseases/FibroticLiverWatsonMERFISH/GSE210077_adata_healthy_diseased_nucseq_sparse.h5ad \
+  -flm https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/inflammatory_diseases/FibroticLiverWatsonMERFISH/data.tsv \
+  -dc 'Single-cell transcriptomics' \
+  --template <TEMPLATE> \
+  --allow-duplicates
+```
+
+---
+
+## GSE165045
+
+```bash
+odm-import-data \
+  --server <HOST> \
+  --token <TOKEN> \
+  --study https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/inflammatory_diseases/GSE165045/study.tsv \
+  --samples https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/inflammatory_diseases/GSE165045/samples.tsv \
+  --libraries https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/inflammatory_diseases/GSE165045/libraries.tsv \
+  -fl https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/inflammatory_diseases/GSE165045/GSE165045_merged_with_TCR.h5ad \
+  -flm https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/inflammatory_diseases/GSE165045/data.tsv \
+  -dc 'Single-cell transcriptomics' \
+  --template <TEMPLATE> \
+  --allow-duplicates
+```
+
+---
+
+## GSE148073
+
+```bash
+odm-import-data \
+  --server <HOST> \
+  --token <TOKEN> \
+  --study https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/metabolic_diseases/GSE148073/study.tsv \
+  --samples https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/metabolic_diseases/GSE148073/samples.tsv \
+  --libraries https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/metabolic_diseases/GSE148073/libraries.tsv \
+  -fl https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/metabolic_diseases/GSE148073/GSE148073_merged_data_new.h5ad \
+  -flm https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/metabolic_diseases/GSE148073/data.tsv \
+  -dc 'Single-cell transcriptomics' \
+  --template <TEMPLATE> \
+  --allow-duplicates
+```
+
+---
+
+## GSE192740 (human + mouse)
+
+```bash
+odm-import-data \
+  --server <HOST> \
+  --token <TOKEN> \
+  --study https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/metabolic_diseases/GSE192740/study.tsv \
+  --samples https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/metabolic_diseases/GSE192740/samples.tsv \
+  --libraries https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/metabolic_diseases/GSE192740/libraries.tsv \
+  -fl https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/metabolic_diseases/GSE192740/GSE192740_human_combined_sparse.h5ad \
+  -flm https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/metabolic_diseases/GSE192740/data_human.tsv \
+  -dc 'Single-cell transcriptomics' \
+  -fl https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/metabolic_diseases/GSE192740/GSE192740_mouse_combined_sparse.h5ad \
+  -flm https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/metabolic_diseases/GSE192740/data_mouse.tsv \
+  -dc 'Single-cell transcriptomics' \
+  --template <TEMPLATE> \
+  --allow-duplicates
+```
+
+---
+
+## GSE198623 (human + pig)
+
+```bash
+odm-import-data \
+  --server <HOST> \
+  --token <TOKEN> \
+  --study https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/metabolic_diseases/GSE198623/study.tsv \
+  --samples https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/metabolic_diseases/GSE198623/samples.tsv \
+  --libraries https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/metabolic_diseases/GSE198623/libraries.tsv \
+  -fl https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/metabolic_diseases/GSE198623/GSE198623_human_processed.h5ad \
+  -flm https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/metabolic_diseases/GSE198623/data_human.tsv \
+  -dc 'Single-cell transcriptomics' \
+  -fl https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/metabolic_diseases/GSE198623/GSE198623_pig_processed.h5ad \
+  -flm https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/metabolic_diseases/GSE198623/data_pig.tsv \
+  -dc 'Single-cell transcriptomics' \
+  --template <TEMPLATE> \
+  --allow-duplicates
+```
+
+---
+
+## GSE292928
+
+```bash
+odm-import-data \
+  --server <HOST> \
+  --token <TOKEN> \
+  --study https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/metabolic_diseases/GSE292928/study.tsv \
+  --samples https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/metabolic_diseases/GSE292928/samples.tsv \
+  --libraries https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/metabolic_diseases/GSE292928/libraries.tsv \
+  -fl https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/metabolic_diseases/GSE292928/GSE292928_GEX_only_cellbender_sparse.h5ad \
+  -flm https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/metabolic_diseases/GSE292928/data.tsv \
+  -dc 'Single-cell transcriptomics' \
+  --template <TEMPLATE> \
+  --allow-duplicates
+```
+
+---
+
+## GSE148434
+
+```bash
+odm-import-data \
+  --server <HOST> \
+  --token <TOKEN> \
+  --study https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/neurodegenerative_diseases/GSE148434/study.tsv \
+  --samples https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/neurodegenerative_diseases/GSE148434/samples.tsv \
+  --libraries https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/neurodegenerative_diseases/GSE148434/libraries.tsv \
+  -fl https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/neurodegenerative_diseases/GSE148434/GSE148434_merged_data.h5ad \
+  -flm https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/non_onco/single_cell/neurodegenerative_diseases/GSE148434/data.tsv \
+  -dc 'Single-cell transcriptomics' \
+  --template <TEMPLATE> \
+  --allow-duplicates
+```
diff --git a/docs/user-guide/doc-odm-user-guide/doc-odm-user-guide/extras/public-dataset-configurations-mapping.md b/docs/user-guide/doc-odm-user-guide/doc-odm-user-guide/extras/public-dataset-configurations-mapping.md
new file mode 100644
index 0000000..93137b6
--- /dev/null
+++ b/docs/user-guide/doc-odm-user-guide/doc-odm-user-guide/extras/public-dataset-configurations-mapping.md
@@ -0,0 +1,24 @@
+# Transformation Configurations: Dataset Reference
+
+Each curated dataset has been assigned a tested transformation configuration. Use the table below to identify which configuration to use when running a transformation job for a given dataset.
+
+> Since these configurations have been pre-validated against their respective datasets, the dry-run step can be skipped when transforming the curated catalogue.
+
+---
+
+## Configuration–dataset mapping
+
+| Configuration | Datasets |
+|---|---|
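+| [`aggregated_config_1`](aggregated_config_1.json) | HeartDiversityTucker10x |
+| [`aggregated_config_1`](aggregated_config_1.json) | GSE292928 |
+| [`aggregated_config_1`](aggregated_config_1.json) | GSE192740_human |
+| [`aggregated_config_1`](aggregated_config_1.json) | GSE192740_mouse |
+| [`aggregated_config_1`](aggregated_config_1.json) | GSE148434 |
+| [`aggregated_config_1`](aggregated_config_1.json) | SCP1303 |
+| [`aggregated_config_2`](aggregated_config_2.json) | GSE198623_human |
+| [`aggregated_config_2`](aggregated_config_2.json) | GSE198623_pig |
+| [`aggregated_config_3`](aggregated_config_3.json) | GSE148073 |
+| [`aggregated_config_3`](aggregated_config_3.json) | FibroticLiverWatsonMERFISH |
+| [`GSE156793`](GSE156793.json) | GSE156793 |
+| [`GSE165045`](GSE165045.json) | GSE165045 |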
diff --git a/docs/user-guide/doc-odm-user-guide/how-to-sc-hdf5-transformations.md b/docs/user-guide/doc-odm-user-guide/how-to-sc-hdf5-transformations.md
new file mode 100644
index 0000000..e0aa9ce
--- /dev/null
+++ b/docs/user-guide/doc-odm-user-guide/how-to-sc-hdf5-transformations.md
@@ -0,0 +1,406 @@
+# How-to Guides: Single-Cell HDF5 Transformations in ODM
+
+These guides show how to accomplish specific tasks using the single-cell HDF5 transformation. Each guide assumes you have a valid input file (H5AD or 10x H5) already attached to a study in ODM.
+
+To get started quickly with the full upload-to-query workflow, see [Single-cell data in ODM: Getting Started](quickstart-sc.md). For a conceptual overview of the entities involved and how the transformation works, see [About Single-Cell HDF5 Transformations in ODM](about-sc-hdf5-transformations.md). For the full list of configuration parameters, see the [Configuration Reference](configuration-reference.md). For the API endpoint specifications, see the [API Reference](api-reference.md). For details on what the pipeline does internally at each stage, see the [Transformation Process Reference](transformation-process-reference.md).
+
+## Table of Contents
+
+- [How to run a transformation via the ODM API](#how-to-run-a-transformation-via-the-odm-api)
+- [How to iterate on a configuration using dry runs](#how-to-iterate-on-a-configuration-using-dry-runs)
+- [How to ingest cell and expression data from an H5AD file](#how-to-ingest-cell-and-expression-data-from-an-h5ad-file)
+- [How to create Sample, Library, or Preparation groups from your H5AD file](#how-to-create-sample-library-or-preparation-groups-from-your-h5ad-file)
+- [How to update existing biosample metadata](#how-to-update-existing-biosample-metadata)
+- [How to discover which biosample attributes are available in your file](#how-to-discover-which-biosample-attributes-are-available-in-your-file)
+- [How to process a 10x Genomics H5 file](#how-to-process-a-10x-genomics-h5-file)
+- [How to configure metadata curation](#how-to-configure-metadata-curation)
+
+---
+
+## How to run a transformation via the ODM API
+
+This guide covers the end-to-end steps to ingest single-cell data into ODM using the Processors Controller API. The process has three stages: creating a configuration, validating it with a dry run, and submitting the full run. The numbered steps below walk through these stages in order.
+
+### Step 1: Create a transformation configuration
+
+Create a configuration document that describes how to process your file. The `data` field contains the processing specification; the `name` and `description` are for your own reference.
+
+```
+POST /api/v1/transformations/configurations
+```
+
+```json
+{
+  "name": "my_study_config",
+  "description": "Cell and expression ingestion for study XYZ",
+  "data": {
+    "file_type": "h5ad",
+    "cell_metadata": {
+      "metadata_keys": {
+        "obs": "metadata",
+        "obsm": "embedding"
+      },
+      "columns_to_drop": ["taxon"],
+      "columns_renaming_map": {
+        "sample": "batch"
+      }
+    },
+    "feature_metadata": {
+      "metadata_keys": {
+        "var": "metadata"
+      }
+    },
+    "cell_expression": {
+      "data_class": "Single-cell transcriptomics"
+    }
+  }
+}
+```
+
+The response includes the `id` of the created configuration. It is required for subsequent steps.
+
+For a full description of the `data` object, see the [Configuration Reference](configuration-reference.md).
+
+For a list of ready-to-use configurations, see the [Configuration mapping](doc-odm-user-guide/extras/public-dataset-configurations-mapping.md).
+
+### Step 2: Identify the transformation image
+
+```
+GET /api/v1/transformations/images
+```
+
+Confirm that the `hdf5-cells` image is available. Note the version you want to use: typically `"latest"`, or a specific release tag for reproducibility.
+
+### Step 3: Submit a dry-run job
+
+```
+POST /api/v1/transformations/jobs
+```
+
+```json
+{
+  "configuration_id": <config_id>,
+  "dry_run": true,
+  "image_reference": {
+    "name": "hdf5-cells",
+    "version": "latest"
+  },
+  "input_accessions": ["<attachment_accession>"],
+  "volume_size": 30
+}
+```
+
+As a guideline for setting `volume_size`:
+- For H5AD input files, allocate approximately **1.4× the size of the original attachment** (in GB).
+- For 10x H5 input files, allocate at least **4× the size of the original attachment** (in GB).
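+
+As a rough worked example of the H5AD guideline, assume a hypothetical 10 GB attachment and that `volume_size` is given in GB, as the guideline implies. Then 1.4 × 10 GB = 14 GB, so the job body might set:
+
+```json
+"volume_size": 14
+```
+
+Rounding up to a whole number is an assumption of this sketch rather than a documented requirement.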
+
+The response includes the `id` of the created job. It is required for subsequent steps.
+
+### Step 4: Monitor the dry-run job
+
+```
+GET /api/v1/transformations/jobs/{job_id}
+```
+
+Repeat until `status.state` reaches a terminal value: `DONE` or `FAILED`.
+
+### Step 5: Review the logs
+
+```
+POST /api/v1/transformations/jobs/{job_id}/logs
+```
+
+Review the logs for warnings and errors. Pay particular attention to:
+- Configuration validation messages.
+- The file structure report: which metadata keys are present in your file.
+- Linking validation results: whether all cell `batch` values map to existing SLP objects.
+- Any columns flagged for automatic renaming or data type conversion.
+
+If issues are found, update the configuration and repeat from Step 3. See [How to iterate on a configuration using dry runs](#how-to-iterate-on-a-configuration-using-dry-runs) for the recommended cycle.
+
+### Step 6: Submit the full run
+
+Once the dry run completes without issues, submit the same job with `dry_run` set to `false` in the request body:
+
+```
+POST /api/v1/transformations/jobs
+```
+
+```json
+{
+  "configuration_id": <config_id>,
+  "dry_run": false,
+  "image_reference": {
+    "name": "hdf5-cells",
+    "version": "latest"
+  },
+  "input_accessions": ["<attachment_accession>"],
+  "volume_size": 30
+}
+```
+
+Monitor the job and retrieve its logs the same way as for the dry run (Steps 4–5). When the job completes, the logs contain the ODM accessions assigned to each object that was created or updated. The logs are uploaded as an attachment to the same study.
+
+---
+
+## How to iterate on a configuration using dry runs
+
+This guide describes the recommended iterative cycle for refining a transformation configuration before committing to a full run. Use this when the initial dry run reveals warnings or errors that require attention.
+
+The cycle follows this pattern:
+
+```
+Create configuration → Submit dry-run job → Review logs
+       ↑                                          |
+       └──── Update configuration ←──────────────┘
+              (if issues found)
+```
+
+After reviewing the logs from a dry-run job, update the existing configuration using:
+
+```
+PUT /api/v1/transformations/configurations/{config_id}
+```
+
+The request body of the `PUT` endpoint follows the same structure as the original `POST`. The configuration at the given `id` is fully replaced with the new content.
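+
+As an illustrative sketch, suppose the dry-run logs pointed to missing values in the `batch` column. The updated request body could repeat the original configuration and add a `columns_to_fill_missing_values` entry. The names and values here are hypothetical and simply reuse the examples from this guide:
+
+```json
+{
+  "name": "my_study_config",
+  "description": "Cell and expression ingestion for study XYZ",
+  "data": {
+    "file_type": "h5ad",
+    "cell_metadata": {
+      "metadata_keys": {
+        "obs": "metadata",
+        "obsm": "embedding"
+      },
+      "columns_to_drop": ["taxon"],
+      "columns_renaming_map": {
+        "sample": "batch"
+      },
+      "columns_to_fill_missing_values": {
+        "batch": "unknown"
+      }
+    },
+    "feature_metadata": {
+      "metadata_keys": {
+        "var": "metadata"
+      }
+    },
+    "cell_expression": {
+      "data_class": "Single-cell transcriptomics"
+    }
+  }
+}
+```
+
+Because the stored configuration is fully replaced, the body carries the complete configuration, not only the changed keys.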
+
+Then resubmit the dry-run job with the same `configuration_id`. Because the configuration is updated in place, you can reuse the same `configuration_id` across all iterations without creating a new configuration for each attempt.
+
+Repeat until the dry run completes without errors or warnings that require action. Then submit the full run.
+
+---
+
+## How to ingest cell and expression data from an H5AD file
+
+Use this when the study already has Sample, Library, or Preparation groups in ODM and you only need to add the single-cell layer. Configure `cell_metadata`, `feature_metadata`, and `cell_expression` in your configuration's `data` field.
+
+```json
+{
+  "file_type": "h5ad",
+  "cell_metadata": {
+    "metadata_keys": {
+      "obs": "metadata",
+      "obsm": "embedding"
+    },
+    "columns_to_drop": ["taxon", "organism_id"],
+    "columns_renaming_map": {
+      "sample": "batch",
+      "pctmt": "percentMito"
+    }
+  },
+  "feature_metadata": {
+    "metadata_keys": {
+      "var": "metadata"
+    }
+  },
+  "cell_expression": {
+    "data_class": "Single-cell transcriptomics"
+  }
+}
+```
+
+The transformation resolves the linking target for the created Cell Group automatically in the following order: Library → Preparation → Sample. To link to a specific group, set `cell_metadata.linking_group` explicitly:
+
+```json
+"cell_metadata": {
+  "linking_group": {
+    "library": "GSFXXXXXX"
+  }
+}
+```
+
+To link to all preparation groups in the study without specifying their accessions individually, set an empty value:
+
+```json
+"cell_metadata": {
+  "linking_group": {
+    "preparation": []
+  }
+}
+```
+
+---
+
+## How to create Sample, Library, or Preparation groups from your H5AD file
+
+Use this when your study does not yet have SLP groups in ODM and you want to derive biosample-level attributes from the cell metadata.
+
+Identify the column in your cell metadata that acts as a biosample identifier. Set this as `biosample_column_name`. Under the relevant entity (`sample`, `library`, or `preparation`), set `create_new_group: true` and list the columns to export under `columns_to_export`.
+
+```json
+{
+  "file_type": "h5ad",
+  "biosample_metadata": {
+    "metadata_keys": {
+      "obs": "metadata"
+    },
+    "biosample_column_name": "sample_id",
+    "sample": {
+      "create_new_group": true,
+      "columns_to_export": ["tissue", "disease", "donor_id"]
+    }
+  },
+  "cell_metadata": {
+    "metadata_keys": {
+      "obs": "metadata",
+      "obsm": "embedding"
+    }
+  },
+  "feature_metadata": {
+    "metadata_keys": {
+      "var": "metadata"
+    }
+  },
+  "cell_expression": {
+    "data_class": "Single-cell transcriptomics"
+  }
+}
+```
+
+The transformation aggregates cells by `biosample_column_name` and exports only attributes that are constant per biosample. Exported columns are automatically removed from the cell metadata.
+
+Only one of `library` or `preparation` may have `columns_to_export` set in the same configuration.
+
+To create a Library group without exporting attributes (a placeholder group used only for linking), set `create_new_group: true` and omit `columns_to_export`:
+
+```json
+"library": {
+  "create_new_group": true
+}
+```
+
+---
+
+## How to update existing biosample metadata
+
+Use this when SLP groups already exist in ODM but are missing attributes that are present in your HDF5 file.
+
+Configure `biosample_metadata` with `columns_to_export` for the target entity, but do not set `create_new_group: true`.
+
+```json
+{
+  "file_type": "h5ad",
+  "biosample_metadata": {
+    "metadata_keys": {
+      "obs": "metadata"
+    },
+    "biosample_column_name": "library_id",
+    "library": {
+      "columns_to_export": ["sequencing_platform", "library_strategy"]
+    }
+  }
+}
+```
+
+The transformation matches extracted rows to existing ODM objects on the entity ID column and updates only attributes that do not already exist. If any extracted ID does not match an existing ODM object, the transformation raises an error. Run a dry run first to catch ID mismatches before committing data.
+
+---
+
+## How to discover which biosample attributes are available in your file
+
+Use this to identify which cell-level metadata columns are uniform per biosample and can be exported. No data will be updated in ODM.
+
+Submit a dry-run job with `biosample_metadata` configured but without any `columns_to_export` entries:
+
+```json
+{
+  "file_type": "h5ad",
+  "biosample_metadata": {
+    "metadata_keys": {
+      "obs": "metadata"
+    },
+    "biosample_column_name": "sample_id",
+    "sample": {}
+  }
+}
+```
+
+The transformation logs the number of unique biosamples found and the list of columns that are constant across all cells per biosample. Use the logged list to plan your `columns_to_export` configuration before running a full ingestion.
+
+---
+
+## How to process a 10x Genomics H5 file
+
+The only required change compared to an H5AD configuration is setting `file_type` to `"h5"`. Use the same H5AD key names (`obs`, `var`) in `metadata_keys` — the transformation converts the 10x H5 format to H5AD internally and applies unified processing.
+
+```json
+{
+  "file_type": "h5",
+  "cell_metadata": {
+    "metadata_keys": {
+      "obs": "metadata"
+    }
+  },
+  "feature_metadata": {
+    "metadata_keys": {
+      "var": "metadata"
+    }
+  },
+  "cell_expression": {
+    "data_class": "Single-cell transcriptomics"
+  }
+}
+```
+
+Volume sizing for .h5 inputs: When setting `volume_size` for a job that uses an H5 input file, allocate at least **4× the original attachment size** (e.g., a 5 GB file → `volume_size` ≥ 20 GB). H5 inputs require additional scratch space because the transformation converts them to H5AD during processing.
+
+Legacy 10x H5 support: Legacy 10x Genomics H5 files (v<3) are supported only when the file contains a single genome. If the file includes multiple genomes, pre-process it to extract the genome of interest before running the transformation.
+
+---
+
+## How to configure metadata curation
+
+These operations are available in `cell_metadata`, `feature_metadata`, and per-entity settings within `biosample_metadata`. They are applied in the order listed.
+
+**To drop columns:**
+
+```json
+"columns_to_drop": ["taxon", "organism_id"]
+```
+
+**To rename a column:**
+
+```json
+"columns_renaming_map": {
+  "sample": "batch",
+  "pctmt": "percentMito"
+}
+```
+
+**To replace specific values:**
+
+```json
+"columns_to_curate_values": {
+  "sample": {
+    "LGVXCTRL1": "lung_healthy_1"
+  }
+}
+```
+
+**To fill missing values:**
+
+```json
+"columns_to_fill_missing_values": {
+  "batch": "unknown"
+}
+```
+
+**To set a constant value for all rows:**
+
+```json
+"set_column_value": {
+  "sample_id": "lung_1"
+}
+```
+
+After all explicit column operations, matching attributes are mapped to ODM standard names. The rest are converted to camelCase. Columns listed in `columns_to_preserve_name` are exempt from this standardization step.
+
+For details, see the [Attribute Mapping Reference](attribute-mapping.md).
+
+**To prevent a column from being automatically renamed:**
+
+```json
+"columns_to_preserve_name": ["cluster_leiden_0.5"]
+```
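+
+As a sketch of how these operations might be combined in one section, the fragment below places several of them in `cell_metadata`. The column names and values are illustrative (reused from the snippets above). Whether later operations must refer to a column by its original or its renamed name is not stated here, so the example avoids operating twice on the same column:
+
+```json
+"cell_metadata": {
+  "metadata_keys": {
+    "obs": "metadata"
+  },
+  "columns_to_drop": ["taxon", "organism_id"],
+  "columns_renaming_map": {
+    "pctmt": "percentMito"
+  },
+  "columns_to_curate_values": {
+    "sample": {
+      "LGVXCTRL1": "lung_healthy_1"
+    }
+  },
+  "columns_to_fill_missing_values": {
+    "batch": "unknown"
+  },
+  "columns_to_preserve_name": ["cluster_leiden_0.5"]
+}
+```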
+
+For full parameter specifications, see the [Configuration Reference](configuration-reference.md).
diff --git a/docs/user-guide/doc-odm-user-guide/quickstart-sc.md b/docs/user-guide/doc-odm-user-guide/quickstart-sc.md
new file mode 100644
index 0000000..a320abd
--- /dev/null
+++ b/docs/user-guide/doc-odm-user-guide/quickstart-sc.md
@@ -0,0 +1,89 @@
+# Single-cell data in ODM: Getting Started
+**Upload → Transform → Index → Search**
+
+---
+
+## Overview
+
+### Who this is for
+Users who want to try single-cell data functionality in ODM on prepared, curated datasets.
+
+### What you'll achieve
+- Upload single-cell input files as attachments.
+- Run the transformation to generate ODM-compatible indexed objects.
+- Try cross-study search and analytical queries using the provided notebooks.
+
+### Prerequisites
+- ODM instance URL: `<HOST>`
+- API token: `<TOKEN>`
+- An environment set up to run the notebooks.
+
+---
+
+## Step 1 — Upload and transform a single dataset
+
+**Goal:** Walk through the full workflow on one HDF5 file and verify that analysis-ready objects appear in ODM.
+
+1. Create a study with an HDF5 file as an attachment.
+2. Run a transformation job to convert it into ODM-indexed single-cell objects.
+3. Verify that objects were created, linked, and indexed correctly.
+
+**Notebook:** *Transformation Quickstart*
+- Link: [Single-Cell RNA-Seq: Data Transformation and Upload to ODM](doc-odm-user-guide/notebooks/sc_transformations_demo.ipynb)
+- What it covers: uploading the HDF5 file · creating a configuration · running a dry run · checking job status and outputs.
+
+---
+
+## Step 2 — Load curated public datasets
+
+**Goal:** Populate ODM with a ready-made catalogue of curated public single-cell studies so you can test cross-study search without preparing your own data. Load the data using the template provided for these datasets to enable range queries.
+
+1. Load the template.
+   - Link: [Public dataset template](https://bio-test-data.s3.us-east-1.amazonaws.com/demo_materials/templates/public_studies_template_demo.json)
+   - Further details: [Template upload guide](../../../docs/tools/odm-sdk/terminal/templates/create-or-update-template.md)
+2. Load the curated datasets into ODM (HDF5 attachments included).
+   - **Ready-to-run import commands:** [Import commands for public datasets](doc-odm-user-guide/extras/dataset-import-commands.md). Includes copy-paste commands with placeholders for server, token, and template.
+   - Further details: [Uploading studies to ODM](../../../docs/tools/odm-sdk/terminal/study/uploading-study.md)
+
+---
+
+## Step 3 — Transform curated datasets
+
+**Goal:** Transform the curated datasets to produce fully indexed objects with harmonised metadata.
+
+1. Following the Transformation Notebook as a reference, run the transformation for each curated dataset.
+2. Use the provided configurations to ensure consistent curation. You can skip the dry-run step, as the configurations are pre-tested.
+3. Monitor transformation jobs until all complete successfully.
+4. Confirm that the expected objects are present: Cell Group, Expression Group, and metadata objects.
+
+**Prepared configurations:**
+- Link: [Public dataset configurations](doc-odm-user-guide/extras/public-dataset-configurations-mapping.md)
+
+---
+
+## Step 4 — Confirm indexing completed
+
+**Goal:** Make sure all datasets are marked as indexed and ready to query.
+- Each transformed dataset shows the **Indexed** label in the ODM Metadata Editor.
+- All indexing tasks show **Done** status in the Task Manager.
+
+> **Note:** A completed transformation job does not mean the data is immediately searchable. ODM automatically triggers indexing after ingestion, but data becomes available for querying only once indexing finishes.
+
+---
+
+## Step 5 — Query and analyse single-cell data
+
+**Goal:** Use ODM's search and analytics notebooks to explore your indexed datasets.
+
+**Notebook:** *Single-cell Query & Analysis*
+- Link: [Single-Cell RNA-Seq: Cohort Selection and Data Retrieval](doc-odm-user-guide/notebooks/sc_rnaseq_demo.ipynb)
+- What it covers: cross-study search examples · filtering by curated attributes · example analytical queries and result inspection.
+
+
+## Next steps
+
+- [Single-Cell HDF5 Transformations Overview](about-sc-hdf5-transformations.md) — conceptual overview of the transformation pipeline.
+- [How-to Guides](how-to-sc-hdf5-transformations.md) — step-by-step guidance for running the transformation.
+- [Configuration Reference](configuration-reference.md) — full configuration schema.
+- [Transformation Process Reference](transformation-process-reference.md) — internal processing pipeline.
+- [API Reference](api-reference.md) — API endpoints.
diff --git a/docs/user-guide/doc-odm-user-guide/transformation-process-reference.md b/docs/user-guide/doc-odm-user-guide/transformation-process-reference.md
new file mode 100644
index 0000000..5894da6
--- /dev/null
+++ b/docs/user-guide/doc-odm-user-guide/transformation-process-reference.md
@@ -0,0 +1,250 @@
+# Transformation Process Reference: Single-Cell HDF5 Transformation
+
+> **Related documentation:** For conceptual background, see [About Single-Cell HDF5 Transformations in ODM](about-sc-hdf5-transformations.md). For configuration parameter definitions and default values, see the [Configuration Reference](configuration-reference.md). For guides to running the transformation, see [Single-cell data in ODM: Getting Started](quickstart-sc.md) and [How-to Guides](how-to-sc-hdf5-transformations.md).
+
+This reference describes the internal processing stages of the single-cell HDF5 transformation pipeline. It is intended for users who need to understand what the pipeline does at each stage — for example, to interpret logs, diagnose errors, or reason about the order of operations.
+
+---
+
+## Stage 1: Initial setup and file preparation
+
+### 1.1 Configuration loading and validation
+
+The pipeline reads the transformation configuration file and validates all fields.
+
+**Top-level key validation** is performed first, checking the presence and data types of: `file_type`, `save_logs`, `biosample_metadata`, `cell_metadata`, `feature_metadata`, and `cell_expression`. If `file_type` is missing or contains an unsupported value (`"h5ad"` and `"h5"` are the only accepted values), the pipeline raises an error immediately and does not proceed.
+
+For all remaining sections, validation errors are accumulated and reported together at the end of the validation stage, so that all issues in the configuration are surfaced in a single run.
+
+**Per-section validation** covers:
+
+- Presence of required keys within each optional section.
+- Data type correctness for every key in the section.
+- Key-value correctness for `metadata_keys` entries.
+- `biosample_metadata`: ensures that `library` and `preparation` are not both configured for simultaneous update.
+- `cell_expression`: validates `number_format` as either a printf-format string or a NumPy dtype string; the resolved dtype is stored back into the configuration for downstream use.
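+
+For illustration, either of the following `number_format` values would fit the description above: the first is a printf-style format string and the second a NumPy dtype string. These particular values are assumptions chosen for the example; the Configuration Reference remains the authority on what is accepted.
+
+```json
+"cell_expression": {
+  "data_class": "Single-cell transcriptomics",
+  "number_format": "%.2f"
+}
+```
+
+```json
+"cell_expression": {
+  "data_class": "Single-cell transcriptomics",
+  "number_format": "float32"
+}
+```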
+
+Unrecognized keys at any level are logged as warnings and ignored. 
+
+Examples of valid configurations can be found in [Public dataset configurations](doc-odm-user-guide/extras/public-dataset-configurations-mapping.md).
+
+### 1.2 Attachment and study metadata retrieval
+
+The pipeline retrieves the accession and metadata of the input HDF5 attachment from ODM. From this, it determines:
+
+- The name to assign to the processed data objects.
+- The study accession that the resulting Cell Group and Expression Group will be associated with.
+
+### 1.3 Linking group determination
+
+Before any file processing begins, the pipeline resolves the parent SLP entity (Sample, Library, or Preparation group) to which the Cell Group will be linked. The resolution follows these rules in order:
+
+- **New SLP group creation deferred:** If `biosample_metadata` is present and any of `sample`, `library`, or `preparation` has `create_new_group: true`, linking resolution is deferred until after those new groups are created and uploaded (Stage 4). The Cell Group is then linked to the newly created groups. Example:
+
+```json
+"biosample_metadata": {
+  "metadata_keys": {
+    "obs": "metadata"
+  },
+  "biosample_column_name": "sample_id",
+  "sample": {
+    "create_new_group": true
+  }
+}
+```
+
+- **Explicit `linking_group` in `cell_metadata`:** If `cell_metadata.linking_group` is set, the specified entity type and accession(s) are used directly. An empty value (e.g. `[]`) resolves to all available group accessions of the specified entity type for the study. Examples:
+
+```json
+"cell_metadata": {
+  "linking_group": {
+    "sample": ["GSF000001"]
+  }
+}
+```
+
+```json
+"cell_metadata": {
+  "linking_group": {
+    "preparation": []
+  }
+}
+```
+
+- **Auto-discovery:** If neither of the above applies, the pipeline fetches all SLP groups associated with the study from ODM and selects the first entity type that has at least one group, checking in the order: **Library → Preparation → Sample**. All accessions of the selected type are used for linking.
+
+If no SLP group can be found and no new group is being created, the pipeline raises an error.
+
+### 1.4 Temporary directory and file preparation
+
+A temporary directory is created to store all intermediate files produced during the run. The input HDF5 file is copied into this directory. If the input is of type `"h5"` (10x Genomics H5), it is converted to H5AD format and stored alongside the original copy, so that subsequent stages can stream from the H5AD representation uniformly regardless of source format.
+
+### 1.5 File structure inspection
+
+The pipeline opens the H5AD file and inspects its structure, logging:
+
+- Top-level keys (groups).
+- Data types and shapes.
+- Attribute names.
+
+This output is written to the transformation logs and is useful for verifying which metadata keys (such as `obs`, `var`, `obsm`) are present in the file before extraction begins.
+
+---
+
+## Stage 2: Metadata extraction
+
+The configuration is checked to determine whether processing for `biosample_metadata`, `cell_metadata`, and/or `feature_metadata` is required. Each configured section is processed independently according to the steps below.
+
+### 2.1 Configuration and input validation
+
+For each metadata section, the pipeline reads parameters (data type, input/output files, file type, metadata keys, column operations) and validates the presence of required keys and supported file types.
+
+### 2.2 Biosample metadata (`biosample_metadata` config)
+
+When `biosample_metadata` is present in the configuration, the pipeline can export Sample, Library, or Preparation-level attributes derived from cell-level metadata, curated as indicated in the configuration.
+
+Only one of `library` or `preparation` may have `columns_to_export` set. 
+
+Attributes exported to biosample metadata are automatically removed from the cell metadata in the subsequent processing step. Biosample attributes that do not need to be exported but also should not remain in cell metadata must be listed in `cell_metadata.columns_to_drop`.
+
+#### File reading and metadata extraction
+
+The pipeline opens the H5AD file, reads the metadata from the group indicated by `metadata_keys`, and organizes the resulting table by `biosample_column_name`. Only attributes that are constant within a biosample and listed in `columns_to_export` are processed for export.
+
+For each entity type with `columns_to_export` configured, columns are filtered and optionally curated; the entity ID column(s) (e.g. Sample Source ID, Library ID, Preparation ID) are set from the configuration, and the result is written to a TSV file in the temporary directory.
+
+Exporting a placeholder group containing only ID column(s) can be configured by setting `create_new_group: true` and omitting `columns_to_export`.
+
+#### Discovery mode
+
+Discovery mode is activated only when `dry_run` is enabled, `biosample_metadata` is present, and no entity has `columns_to_export` defined. In this mode, the pipeline logs the number of unique biosamples and the attributes constant within each biosample, then exits without writing a TSV. No ODM objects are created or modified.
+
+#### Existing biosample metadata update
+
+When `columns_to_export` is configured for an entity but `create_new_group` is not set, the pipeline prepares an update to existing ODM metadata objects.
+
+It fetches the current metadata for the entity type, then runs a matching procedure joining the extracted metadata to the existing metadata by the entity ID column (Sample Source ID, Library ID, or Preparation ID). Only attributes that do not already exist in the ODM metadata are retained; columns with the same name are skipped. If any extracted ID does not match an existing ODM object, an error is raised listing the unmatched IDs. The matching result is written to a TSV file for use in Stage 4.
+
+### 2.3 Cell and feature metadata extraction
+
+For cell and feature metadata, the pipeline opens the H5AD file and reads the groups specified in `metadata_keys`:
+
+- **Standard metadata** (`"metadata"`) is loaded into a DataFrame.
+- **Embeddings** (`"embedding"`) are read as multidimensional arrays, serialized as comma-separated strings, and added as columns.
+- **Pairwise data** (`"pairwise"`) is read as pairwise matrices; for each matrix, the row mean is calculated and added as a column.
+
+
+### 2.4 Index handling and sanity checks
+
+- If the DataFrame is empty and has neither columns nor an index, an error is raised.
+- If the index is unnamed, it is assigned the default name `_index`.
+- If the index name collides with an existing column name, it is renamed to avoid the conflict.
+- The index is extracted and appended as a column to ensure barcode or feature ID information is preserved for downstream validation.
+
+> **Note:** If the cell barcode is stored in the index and the index has no name, the extracted column will be named `_index`. To use a different name, rename it using `columns_renaming_map` in the configuration.
+
+### 2.5 Column operations
+
+The following transformations are applied in the order listed, when specified in the configuration:
+
+1. **Drop columns** (`columns_to_drop`)
+2. **Rename columns** (`columns_renaming_map`)
+3. **Curate values** (`columns_to_curate_values`)
+4. **Fill missing values** (`columns_to_fill_missing_values`)
+5. **Set constant values** (`set_column_value`)
+
+After all explicit column operations, **attribute name standardization** is applied: column names are mapped to ODM standard attribute names where a mapping exists; non-standard names are converted to camelCase. Columns listed in `columns_to_preserve_name` are exempt from this step. For the full list of recognized column names, see the [Attribute Mapping Reference](attribute-mapping.md). 
+
+Data type validation is then performed on the resulting DataFrame.
+
+**Cell metadata additional steps:**
+
+- **Required column validation:**
+  - `barcode`: Unique cell identifiers. Duplicate or missing values cause an error.
+  - `batch`: Sample, Library, or Preparation identifiers used for linking. Missing values cause an error.
+
+- **QC metric calculation**: The following attributes are computed and added if they are not present in the original file: number of counts, number of genes, percentage mitochondrial expression, and percentage ribosomal expression. The step can be skipped by setting `add_qc_metrics` to `false` in the cell metadata section of the configuration. It is also skipped when the job is submitted with `dry_run: true` in the request body.
+
+**Feature metadata additional steps:**
+
+- **Gene ID mapping**: If gene names are absent and the standard `geneId` column is present, the pipeline infers the ID source (Ensembl or NCBI) and the species. If both can be determined, a new column with the mapped gene names is added. Supported organisms and annotation releases are listed in [Gene ID to name mapping](attribute-mapping.md#gene-id-to-name-mapping). The step can be skipped by setting `map_gene_ids_to_names` to `false` in the feature metadata section of the configuration. 
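+
+A minimal sketch of disabling both optional steps described above, QC metric calculation and gene ID mapping, using the parameter names given in this section (the surrounding keys are illustrative):
+
+```json
+{
+  "file_type": "h5ad",
+  "cell_metadata": {
+    "metadata_keys": {
+      "obs": "metadata"
+    },
+    "add_qc_metrics": false
+  },
+  "feature_metadata": {
+    "metadata_keys": {
+      "var": "metadata"
+    },
+    "map_gene_ids_to_names": false
+  }
+}
+```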
+
+
+### 2.6 Storing data
+
+The processed metadata DataFrame is written to the temporary directory as a TSV file.
+
+---
+
+## Stage 3: Cell expression extraction
+
+### 3.1 Configuration and input validation
+
+The pipeline reads expression parameters: `data_class`, `compression_level`, `chunk_size`, `max_buffer_size`, and `number_format`. Parameters not specified in the configuration are either inferred from the data or set to the default values listed in the [Configuration Reference](configuration-reference.md).
+
+### 3.2 Expression matrix reading and validation
+
+The cell expression matrix is read from the HDF5 file. The pipeline validates that the matrix shape matches the number of cells and features as determined by the extracted metadata.
+
+### 3.3 Expression data writing
+
+The expression data, enriched with feature metadata according to the configuration, is written to a Brotli-compressed file (`.br`) in the temporary directory.
+
+### 3.4 Expression metadata reading and writing
+
+Expression metadata from the source attachment is read and transformed according to `columns_to_drop`, `columns_renaming_map`, and `set_column_value`, unless `source_file_metadata` is `false`.
+
+The following statistics are always computed and appended to the metadata regardless of the `source_file_metadata` flag:
+
+1. Total number of cells
+2. Total number of features
+3. Sparsity (%)
+4. Number of non-zero values
+5. Source file accession
+6. Source file name
+
+The generated metadata file is written to the temporary directory.
+
+---
+
+## Stage 4: Final steps and upload
+
+### 4.1 Dry run exit
+
+If the job is submitted with `dry_run: true` in the request body, the pipeline performs linking validation and exits at this point. Expression matrix compression is skipped. Logs are reported and available through the API but are not saved as attachments.
+
+Linking validation is best-effort and covers the following checks:
+
+- **Biosample coverage:** Unique values in the cell metadata `batch` column are compared against the ID values of the resolved Sample, Library, or Preparation (SLP) groups. Unmatched values are logged as warnings.
+- **Duplicate IDs:** If the same ID value appears in more than one SLP object, a warning is logged.
+- **Group accession coverage:** Group accessions that contain no biosample objects matching any cell `batch` value are logged as warnings.
+
+Validation mismatches are reported as warnings and do not abort the dry run. They can be used to correct the configuration before submitting a full run.
+
+### 4.2 Upload to ODM
+
+Output files generated in previous pipeline stages are uploaded to ODM and linked to corresponding entities.
+
+**4.2.1 SLP groups**
+
+If `biosample_metadata` is configured with at least one entity:
+
+- **New groups** (for entities with `create_new_group: true`): The corresponding TSV is uploaded as a new group via the entity-specific API endpoint, with `template_id` applied if specified. The new group is linked to its parent: Sample Groups are linked to the study; Library and Preparation Groups are linked to a Sample Group, resolved by checking first `linking_group.sample` in the entity's configuration, then a Sample Group created in the same run, and finally pre-fetched Sample Group accessions for the study.
+
+The accession of each newly created group is stored for use in the Cell Group linking step. If groups of more than one type are created, Library and Preparation take priority over Sample.
+
+- **Existing groups** (for entities with `create_new_group` not set): For each row in the update TSV produced in Stage 2.2, the pipeline updates the corresponding object by calling the ODM PATCH API endpoint with the new attribute values.
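+
+As a hedged illustration of the first lookup step in the new-group case above, a Library entity might name its parent Sample Group as follows. The accession is a placeholder, and the list form of the value is assumed by analogy with `cell_metadata.linking_group`:
+
+```json
+"biosample_metadata": {
+  "metadata_keys": {
+    "obs": "metadata"
+  },
+  "biosample_column_name": "library_id",
+  "library": {
+    "create_new_group": true,
+    "linking_group": {
+      "sample": ["GSF000001"]
+    }
+  }
+}
+```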
+
+**4.2.2 Cell Group upload**
+
+The transformed cell metadata TSV is uploaded as a new Cell Group, which is linked to the parent SLP Groups determined in Stage 1.3 or resolved in Stage 4.2.1 if new SLP Groups were created.
+
+**4.2.3 Expression Group upload**
+
+The Brotli-compressed expression file and its metadata file are uploaded to create a new Expression Group, which is linked to the newly created Cell Group.
+
+**4.2.4 Log upload**
+
+Transformation logs are uploaded as an attachment together with their metadata. This step is skipped if `save_logs` is `false`.