Write file storage code once. Run it against local files, S3, SFTP, or Azure.
Beta. The API is settling, but until 1.0, minor releases may include breaking changes. See the changelog for what's new, and open an issue if something breaks.
Most Python projects that deal with files eventually grow storage glue: small wrappers around local paths, S3 clients, SFTP connections, and cloud SDKs. Those wrappers are usually duplicated across projects, slightly inconsistent, and painful to replace later.
remote-store replaces them with one simple interface.
Where files live is configuration, not application code.
Under the hood, established Python libraries like `s3fs`, `paramiko`, and `azure-storage-file-datalake` do the real work.
Requires Python 3.10+. The core API is synchronous; an async counterpart is available via remote_store.aio. See the concurrency guide for atomicity caveats and race conditions.
Install from PyPI:
```
pip install remote-store
```

Backends that need extra dependencies use extras:

```
pip install "remote-store[s3]"         # Amazon S3 / MinIO
pip install "remote-store[s3-pyarrow]" # S3 via PyArrow (analytical workloads)
pip install "remote-store[sftp]"       # SFTP / SSH
pip install "remote-store[azure]"      # Azure Blob / ADLS Gen2
pip install "remote-store[sql]"        # SQL Blob (SQLite, PostgreSQL, ...)
pip install "remote-store[sql-query]"  # SQL Query (read-only, SQLAlchemy + PyArrow)
```

Optional extras for integrations:

```
pip install "remote-store[requests]"   # HTTP backend with requests (connection pooling)
pip install "remote-store[httpx]"      # HTTP backend with httpx (HTTP/2)
pip install "remote-store[arrow]"      # PyArrow filesystem adapter
pip install "remote-store[otel]"       # OpenTelemetry instrumentation
pip install "remote-store[yaml]"       # YAML config support
pip install "remote-store[pydantic]"   # Pydantic BaseSettings config
pip install "remote-store[toml]"       # TOML config on Python < 3.11
```

The simplest way to use remote-store (examples/getting_started/quickstart.py):
```python
from remote_store import Store
from remote_store.backends import LocalBackend

store = Store(LocalBackend(root="/tmp/data"))
store.write_text("hello.txt", "Hello, world!")
print(store.read_text("hello.txt"))  # 'Hello, world!'
```

For applications that manage multiple backends or switch between environments, use a Registry with declarative config:
```python
from remote_store import Registry, RegistryConfig

config = RegistryConfig.from_dict({
    "backends": {"main": {"type": "local", "options": {"root": "/tmp/data"}}},
    "stores": {"data": {"backend": "main", "root_path": ""}},
})

with Registry(config) as registry:
    store = registry.get_store("data")
    store.write_text("hello.txt", "Hello, world!")
    print(store.read_text("hello.txt"))  # 'Hello, world!'
```

Switch from local to S3 by changing the config file. The application code stays the same:
Dev — local filesystem:

```toml
[backends.main]
type = "local"
options = { root = "/tmp/data" }

[stores.reports]
backend = "main"
root_path = "reports"
```

Production — S3:

```toml
[backends.main]
type = "s3"
options = { bucket = "analytics-data" }

[stores.reports]
backend = "main"
root_path = "reports"
```

```python
# Identical in both environments:
config = RegistryConfig.from_toml("remote-store.toml")
with Registry(config) as registry:
    store = registry.get_store("reports")
    store.write_text("monthly/2026-03.csv", report_csv)
```

Configuration supports TOML, YAML, Pydantic BaseSettings, and plain dicts. Credentials are automatically masked in repr()/str() to prevent leakage in logs.
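The masking behavior is worth understanding even if you never implement it: the pattern is to override `__repr__` so secret fields never reach logs. A minimal stdlib sketch of the idea (hypothetical class and field names, not remote-store's implementation):

```python
from dataclasses import dataclass

# Field names treated as secrets in this sketch (illustrative only).
_SECRET_FIELDS = {"password", "secret_key", "token"}

@dataclass
class BackendOptions:
    """Illustrative config holder that masks secrets in repr()."""
    bucket: str
    secret_key: str = ""

    def __repr__(self) -> str:
        parts = []
        for name, value in vars(self).items():
            # Replace secret values with a placeholder; keep everything else.
            shown = "***" if name in _SECRET_FIELDS and value else value
            parts.append(f"{name}={shown!r}")
        return f"{type(self).__name__}({', '.join(parts)})"

opts = BackendOptions(bucket="analytics-data", secret_key="hunter2")
print(repr(opts))  # BackendOptions(bucket='analytics-data', secret_key='***')
```

Because `@dataclass` skips generating `__repr__` when the class defines its own, the masked version is what ends up in log output.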
- Platform and internal tooling teams — provide one stable storage interface across environments
- Data engineering teams — pipelines that run against local storage, S3, or SFTP depending on the environment
- Teams that include citizen developers — analysts and domain experts who write Python shouldn't need to learn cloud SDKs just to read and write files
- Anyone tired of writing storage wrappers in every project
- One interface, many backends: local filesystem, S3, SFTP, Azure, in-memory
- Folder-scoped stores: each Store is rooted at a folder — compose layouts with multiple stores or narrow scope with `child()`
- Swap backends via config: move between environments without changing code
- Streaming by default: large files just work without blowing up memory
- Atomic writes where supported: safer updates for file-producing workflows
- Async support: `remote_store.aio` provides `AsyncStore` with coroutine methods; wrap any sync backend with `SyncBackendAdapter`
- Established libraries underneath: `s3fs`, `paramiko`, etc. do the real work
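The "streaming by default" bullet rests on a standard pattern: move data between file objects in fixed-size chunks, so memory stays bounded no matter how large the file is. A stdlib-only sketch of the idea (not remote-store's actual implementation):

```python
import io

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read keeps memory use constant

def stream_copy(src: io.BufferedIOBase, dst: io.BufferedIOBase) -> int:
    """Copy src to dst chunk by chunk; return the number of bytes copied."""
    total = 0
    while chunk := src.read(CHUNK_SIZE):  # b"" at EOF ends the loop
        dst.write(chunk)
        total += len(chunk)
    return total

src = io.BytesIO(b"x" * (3 * CHUNK_SIZE + 17))
dst = io.BytesIO()
copied = stream_copy(src, dst)
print(copied)  # 3145745
```

The same loop works whether `src` is a local file, an SFTP channel, or an HTTP response body, which is what makes a BinaryIO-based interface composable.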
Zero runtime dependencies, strict mypy, spec-driven test suite. Optional integrations for PyArrow, OpenTelemetry, and more.
- Not a query engine (no SQL, no predicate pushdown)
- Not a table format (no Delta Lake log, no Iceberg manifests)
- Not a filesystem reimplementation (delegates to `s3fs`, `paramiko`, `pyarrow`, etc. — the libraries you'd pick anyway)
| Backend | Extra | Library | Atomic write | Native glob | move() atomic |
|---|---|---|---|---|---|
| Local filesystem | (built-in) | stdlib | Yes | Yes | Yes* |
| Memory (in-process) | (built-in) | — | Yes | — | Yes |
| HTTP/HTTPS (read-only) | (built-in) | stdlib | — | — | — |
| Amazon S3 / MinIO | `remote-store[s3]` | `s3fs` | Yes | Yes | — (copy+delete) |
| S3 (PyArrow) | `remote-store[s3-pyarrow]` | `pyarrow` + `s3fs` | Yes | Yes | — (copy+delete) |
| SFTP / SSH | `remote-store[sftp]` | `paramiko` | Yes | — | Yes** |
| Azure Blob / ADLS | `remote-store[azure]` | `azure-storage-file-datalake` | Yes | Yes | HNS: Yes / non-HNS: — |
| SQL Blob (SQLite, PostgreSQL, ...) | `remote-store[sql]` | `sqlalchemy` | Yes | Yes | Yes |
| SQL Query (read-only) | `remote-store[sql-query]` | `sqlalchemy` + `pyarrow` | — | — | — |
* Same-filesystem only; cross-filesystem falls back to copy+delete.
** Via posix_rename on most OpenSSH servers; falls back to copy+delete.
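For context, the local-filesystem atomic writes in the table are conventionally implemented as write-to-temp-then-rename, so readers see either the old file or the new one, never a partial write. A stdlib sketch of the general pattern (not remote-store's actual code):

```python
import os
import tempfile

def atomic_write_bytes(path: str, data: bytes) -> None:
    """Write data to path so readers never observe a partial file."""
    directory = os.path.dirname(path) or "."
    # Stage the content in a temp file on the same filesystem, so the
    # final rename cannot degrade into a cross-device copy.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ensure bytes hit disk before the swap
        # Atomically replace the target (POSIX rename semantics).
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # clean up the temp file on failure
        raise
```

This is also why cross-filesystem moves fall back to copy+delete: `os.replace` is only atomic within a single filesystem.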
All backends except HTTP and SQL Query support read, write, delete, list, copy, move, and metadata. HTTP is read-only ({READ, METADATA}). SQL Query is read-only ({READ, LIST, METADATA, GLOB, SEEKABLE_READ}) — it materializes SQL queries to Parquet/CSV/Arrow IPC on read. Glob is supported natively by Local, S3, S3-PyArrow, and Azure; for others use the portable fallback ext.glob.glob_files(). Seekable reads are available on all backends via Store.read_seekable() — zero-overhead on seekable backends, HTTP Range reader on Azure, spool fallback on HTTP. See the capabilities matrix and concurrency guide for full details.
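The portable fallback amounts to matching a recursive listing against the pattern client-side. A simplified stdlib sketch of that idea (the real `ext.glob.glob_files()` operates on a Store and is likely stricter about path segments):

```python
from fnmatch import fnmatchcase

def glob_paths(paths: list[str], pattern: str) -> list[str]:
    """Client-side glob over an already-listed set of relative paths.

    Note: fnmatch wildcards are not separator-aware, so '*' here can
    also cross '/' boundaries; a production helper would treat path
    segments individually.
    """
    return sorted(p for p in paths if fnmatchcase(p, pattern))

listing = ["reports/2026-01.csv", "reports/2026-02.csv", "raw/2026-01.json"]
print(glob_paths(listing, "reports/*.csv"))
# ['reports/2026-01.csv', 'reports/2026-02.csv']
```

This is why the fallback works on any backend that supports listing: it needs no server-side pattern support, at the cost of transferring the full listing.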
The Store provides 29 methods across read/write, browsing, management, and utility. Key highlights:
```python
store.read_text("path/to/file.txt")            # → str
store.write_text("path/to/file.txt", content)  # write string
store.read_bytes("path/to/file.csv")           # → bytes
store.write("path/to/data.bin", binary_stream) # streaming write

store.list_files("reports/", pattern="*.csv")  # iterate FileInfo
store.glob("**/*.parquet")                     # native glob (capability-gated)
store.exists("path/to/file.txt")               # → bool

store.move("old.txt", "new.txt")               # move / rename
store.copy("src.txt", "dst.txt")               # copy
store.delete("path/to/file.txt")               # delete

store.child("subfolder")                       # scoped child store
store.supports(Capability.ATOMIC_WRITE)        # runtime capability check
store.resolve("path/to/file.txt")              # resolution plan (introspection)
store.ping()                                   # health check
```

For the full method list, see the API reference. All write, move, and copy methods accept `overwrite=True` to replace existing files.
For S3, reads add 0.7 ms (+15%) over raw boto3; listing is 29x faster (s3fs caching). For Azure, reads add 0.1 ms (+1%); writes add 2.4 ms (+17%). For SFTP, reads add 3.3 ms (+34%); writes add 1.6 ms (+7%). See the performance guide for full comparative benchmarks, methodology, and per-operation breakdowns.
The core library handles storage operations. Extensions add optional capabilities on top — e.g. PyArrow integration, observability, caching, or bulk operations. All live in remote_store.ext; import only what you need.
| Extension | Extra | What it does |
|---|---|---|
| PyArrow adapter | `remote-store[arrow]` | Use any Store as a `pyarrow.fs.FileSystem` — works with Parquet, Pandas, Polars, DuckDB |
| Parquet datasets | `remote-store[arrow]` | Managed Parquet datasets with manifests, `_SUCCESS` markers, and multi-part layouts |
| Batch operations | (none) | Bulk delete, copy, and exists with error aggregation |
| Transfer operations | (none) | Upload, download, and cross-store transfer with progress |
| Observability hooks | (none) | Callback-based instrumentation for logging, metrics, and tracing |
| OpenTelemetry bridge | `remote-store[otel]` | Pre-built OTel spans and metrics for Store operations |
| Caching middleware | (none) | TTL-based read cache with automatic invalidation on mutations |
| Stream wrappers | (none) | Composable BinaryIO wrappers for progress tracking and checksums |
| Integrity helpers | (none) | Checksum computation and verification over Store's public API |
| Dagster IO manager | `remote-store[dagster]` | IOManager adapter + config-driven Store resource for Dagster pipelines |
Plus glob helpers, partition helpers, YAML and Pydantic config adapters. See the extensions guide for details.
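The core operation behind the integrity helpers, hashing a file as a stream of chunks, can be sketched with the stdlib alone (a hypothetical helper, not remote-store's API):

```python
import hashlib
import io

def sha256_of_stream(stream: io.BufferedIOBase, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large files fit in constant memory."""
    digest = hashlib.sha256()
    while chunk := stream.read(chunk_size):  # b"" at EOF ends the loop
        digest.update(chunk)
    return digest.hexdigest()

print(sha256_of_stream(io.BytesIO(b"Hello, world!")))
# 315f5bdb76d078c43b8ac0064e4a0164612b1fce77c869345bfc94c75894edd3
```

Working over a stream rather than `bytes` is what lets an integrity check run against any backend's read API without loading the whole file.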
To explore remote-store beyond the Quick Start:
- Examples: self-contained scripts in `examples/` covering core operations (file I/O, streaming, atomic writes, error handling, etc.) and backend-specific setups for S3, SFTP, and Azure.
- Notebooks: interactive Jupyter notebooks that walk through common workflows step by step.
- Guides: topic-focused walkthroughs in the documentation covering backends, extensions, configuration, and patterns like data lake layouts or health checks.
There are several excellent Python libraries for file I/O across backends. Here is where remote-store sits:
| | fsspec | smart_open | cloudpathlib | obstore | remote-store |
|---|---|---|---|---|---|
| API surface | ~56 methods | `open()` only | pathlib-style | ~10 methods | 29 methods |
| Backends | 30+ filesystems | S3, GCS, Az, SFTP | S3, GCS, Azure | S3, GCS, Azure | Local, S3, SFTP, Az, Memory |
| SFTP | via sshfs | Yes | — | — | Built-in |
| Streaming I/O | Yes | Yes | — (downloads) | Bytes-oriented | Yes (BinaryIO) |
| Atomic writes | — | — | — | — | Yes (capability-gated) |
| Async | Yes | — | — | Yes (first-class) | Yes (`remote_store.aio`) |
| Observability | — | — | — | — | `ext.observe` + OTel |
| Config model | Per-filesystem | URI-based | Per-client | Per-store kwargs | Immutable Registry |
| Runtime deps | Yes | Minimal | SDK-based | Rust binary | Zero (core) |
Comparison as of March 2026. Method counts and feature sets may change as these libraries evolve.
In short: remote-store is for teams that need more than open() (smart_open) but less than a full filesystem abstraction (fsspec), with streaming, SFTP, atomic writes, observability, and immutable config. Under the hood, it delegates to the same libraries you'd pick anyway (s3fs/boto3, paramiko, Azure SDK, PyArrow).
See CONTRIBUTING.md for the spec-driven development workflow, code style, and how to add new backends.
To report a vulnerability, please use GitHub Security Advisories instead of opening a public issue. See SECURITY.md for details.
MIT
