diff --git a/docs/ingest/etl/index.md b/docs/ingest/etl/index.md
index 0d3ff371..049b071d 100644
--- a/docs/ingest/etl/index.md
+++ b/docs/ingest/etl/index.md
@@ -38,12 +38,21 @@ outlines how to use them effectively. Additionally, see support for {ref}`cdc` s
   dbt is an SQL-first platform for transforming data in data warehouses using Python and SQL.
   The data abstraction layer provided by dbt-core allows the decoupling of the models on which reports and dashboards rely from the source data.
 
+- {ref}`dlt`
+
+  dlt is a popular production-ready Python library for moving data:
+  Think ELT as Python code.
 
 - {ref}`flink`
 
   Apache Flink is a programming framework and distributed processing engine for
   stateful computations over unbounded and bounded data streams, written in Java.
 
+- {ref}`ingestr`
+
+  ingestr is a command-line application that allows copying data from any
+  source into any destination database.
+
 - {ref}`kestra`
 
   Kestra is an open-source workflow automation and orchestration toolkit with a rich
@@ -230,6 +239,7 @@ Load data from datasets and open table formats.
 - {ref}`aws-lambda`
 - {ref}`azure-functions`
 - {ref}`dbt`
+- {ref}`dlt`
 - {ref}`dms`
 - {ref}`dynamodb`
 - {ref}`estuary`
@@ -237,6 +247,7 @@ Load data from datasets and open table formats.
 - {ref}`hop`
 - {ref}`iceberg`
 - {ref}`influxdb`
+- {ref}`ingestr`
 - {ref}`kafka`
 - {ref}`kestra`
 - {ref}`kinesis`
diff --git a/docs/integrate/dlt/index.md b/docs/integrate/dlt/index.md
new file mode 100644
index 00000000..68055756
--- /dev/null
+++ b/docs/integrate/dlt/index.md
@@ -0,0 +1,106 @@
+(dlt)=
+# dlt
+
+```{div} .float-right .text-right
+![dlt logo](https://cdn.sanity.io/images/nsq559ov/production/7f85e56e715b847c5519848b7198db73f793448d-82x25.svg?w=2000&auto=format){loading=lazy}[dlt]
+

+
+ CI status: dlt
+```
+```{div} .clearfix
+```
+
+[dlt] (data load tool) is a popular, production-ready Python library for
+moving data; think ELT as Python code. It loads data from various, often
+messy data sources into well-structured, live datasets.
+dlt is used by {ref}`ingestr`.
+
+::::{grid}
+
+:::{grid-item}
+- **Just code**: No need to run any backends or containers.
+
+- **Platform agnostic**: Does not replace your data platform, deployments, or security
+  models. Simply import dlt in your favorite code editor, or add it to your Jupyter
+  Notebook.
+
+- **Versatile**: You can load data from any source that produces Python data structures,
+  including APIs, files, databases, and more.
+:::
+
+::::
+
+
+## Synopsis
+
+Prerequisites:
+Install dlt and the CrateDB destination adapter:
+```shell
+pip install dlt dlt-cratedb
+```
+
+Load data from cloud storage or files into CrateDB.
+```python
+import dlt
+from dlt.sources.filesystem import filesystem, read_csv
+
+# List the CSV files in the bucket, then parse their rows with `read_csv`.
+resource = filesystem(
+    bucket_url="s3://example-bucket",
+    file_glob="*.csv"
+) | read_csv()
+
+pipeline = dlt.pipeline(
+    pipeline_name="filesystem_example",
+    destination=dlt.destinations.cratedb("postgresql://crate:crate@localhost:5432/"),
+    dataset_name="doc",
+)
+
+pipeline.run(resource)
+```
+
+Load data from SQL databases into CrateDB.
+```python
+import dlt
+from dlt.sources.sql_database import sql_database
+
+# Public, read-only Rfam MySQL database; requires SQLAlchemy and PyMySQL.
+source = sql_database(
+    "mysql+pymysql://rfamro@mysql-rfam-public.ebi.ac.uk:4497/Rfam"
+)
+
+pipeline = dlt.pipeline(
+    pipeline_name="sql_database_example",
+    destination=dlt.destinations.cratedb("postgresql://crate:crate@localhost:5432/"),
+    dataset_name="doc",
+)
+
+pipeline.run(source)
+```
+
+## Learn
+
+::::{grid}
+
+:::{grid-item-card} Examples: Use dlt with CrateDB
+:link: https://github.com/crate/cratedb-examples/tree/main/framework/dlt
+:link-type: url
+Executable code examples that demonstrate how to use dlt with CrateDB.
+:::
+
+:::{grid-item-card} Adapter: The dlt destination adapter for CrateDB
+:link: https://github.com/crate/dlt-cratedb
+:link-type: url
+Based on the dlt PostgreSQL adapter, the package enables you to work
+with dlt and CrateDB.
+:::
+
+:::{grid-item-card} See also: ingestr
+:link: ingestr
+:link-type: ref
+The ingestr data import/export application uses dlt.
+:::
+
+::::
+
+
+
+[databases supported by SQLAlchemy]: https://docs.sqlalchemy.org/en/20/dialects/
+[dlt]: https://dlthub.com/
diff --git a/docs/integrate/index.md b/docs/integrate/index.md
index 5baddec1..96152972 100644
--- a/docs/integrate/index.md
+++ b/docs/integrate/index.md
@@ -26,6 +26,7 @@ dbeaver/index
 dbt/index
 debezium/index
 django/index
+dlt/index
 dms/index
 dynamodb/index
 estuary/index
@@ -36,6 +37,7 @@ grafana/index
 hop/index
 iceberg/index
 influxdb/index
+ingestr/index
 kafka/index
 kestra/index
 kinesis/index
diff --git a/docs/integrate/ingestr/index.md b/docs/integrate/ingestr/index.md
new file mode 100644
index 00000000..9d319c4b
--- /dev/null
+++ b/docs/integrate/ingestr/index.md
@@ -0,0 +1,134 @@
+(ingestr)=
+# ingestr
+
+```{div} .float-right .text-right
+
+ CI status: ingestr
+```
+```{div} .clearfix
+```
+
+[ingestr] is a command-line application for copying data from any source
+to any destination database. It supports CrateDB on both the source and
+destination sides. ingestr builds on {ref}`dlt`.
+
+::::{grid}
+
+:::{grid-item}
+- **Single command**: ingestr copies and ingests data from any source
+  to any destination with a single command.
+
+- **Many sources & destinations**: ingestr supports all common source and
+  destination databases.
+
+- **Incremental loading**: ingestr supports both full-refresh and
+  incremental loading modes.
+:::
+
+:::{grid-item}
+![ingestr in a nutshell](https://github.com/bruin-data/ingestr/blob/main/resources/demo.gif?raw=true){loading=lazy}
+:::
+
+::::
+
+
+## Synopsis
+
+Invoke ingestr for exporting data from CrateDB.
+```shell
+ingestr ingest \
+    --source-uri 'crate://crate@localhost:4200/' \
+    --source-table 'sys.summits' \
+    --dest-uri 'duckdb:///cratedb.duckdb' \
+    --dest-table 'dest.summits'
+```
+
+Invoke ingestr for loading data into CrateDB.
+```shell
+ingestr ingest \
+    --source-uri 'csv://input.csv' \
+    --source-table 'sample' \
+    --dest-uri 'cratedb://crate:@localhost:5432/?sslmode=disable' \
+    --dest-table 'doc.sample'
+```
+
+:::{note}
+Note that the CrateDB source and destination URLs differ subtly.
+While `--source-uri=crate://...` addresses CrateDB's SQLAlchemy dialect,
+`--dest-uri=cratedb://...` is effectively a PostgreSQL connection URL
+whose scheme designates CrateDB. The source adapter uses CrateDB's HTTP
+protocol (port 4200 by default), while the destination adapter uses
+CrateDB's PostgreSQL interface (port 5432 by default).
+:::
+
+
+## Coverage
+
+ingestr supports migrating data from more than 20 databases, data platforms,
+and analytics engines, including all [databases supported by SQLAlchemy].
+
+:::{rubric} Traditional Databases
+:::
+CockroachDB, CrateDB, Firebird, HyperSQL (hsqldb), IBM DB2 and Informix,
+Microsoft Access, Microsoft SQL Server, MonetDB, MySQL and MariaDB,
+OpenGauss, Oracle, PostgreSQL, SAP ASE, SAP HANA, SAP Sybase SQL Anywhere,
+SQLite, TiDB, YDB, YugabyteDB
+
+:::{rubric} Cloud Data Warehouses & Analytics
+:::
+Amazon Athena, Amazon Redshift, Databend, Databricks, Denodo, DuckDB,
+EXASOL DB, Firebolt, Google BigQuery, Greenplum, IBM Netezza Performance Server,
+Impala, Kinetica, Rockset, Snowflake, Teradata Vantage
+
+:::{rubric} Specialized Data Stores
+:::
+Apache Drill, Apache Druid, Apache Hive and Presto, ClickHouse, Elasticsearch,
+InfluxDB, MongoDB, OpenSearch
+
+:::{rubric} Message Brokers
+:::
+Amazon Kinesis, Apache Kafka (Amazon MSK, Confluent Kafka, Redpanda, RobustMQ)
+
+:::{rubric} File Formats
+:::
+CSV, JSONL/NDJSON, Parquet
+
+:::{rubric} Object Stores
+:::
+Amazon S3, Google Cloud Storage
+
+:::{rubric} SaaS Platforms & Services
+:::
+Airtable, Asana, GitHub, Google Ads, Google Analytics, Google Sheets, HubSpot,
+Notion, Personio, Salesforce, Slack, Stripe, Zendesk, etc.
+
+
+## Learn
+
+::::{grid}
+
+:::{grid-item-card} Documentation: ingestr CrateDB source
+:link: https://bruin-data.github.io/ingestr/supported-sources/cratedb.html#source
+:link-type: url
+Documentation about the CrateDB source adapter for ingestr.
+:::
+
+:::{grid-item-card} Documentation: ingestr CrateDB destination
+:link: https://bruin-data.github.io/ingestr/supported-sources/cratedb.html#destination
+:link-type: url
+Documentation about the CrateDB destination adapter for ingestr.
+:::
+
+:::{grid-item-card} Examples: Use ingestr with CrateDB
+:link: https://github.com/crate/cratedb-examples/tree/main/application/ingestr
+:link-type: url
+Executable code examples and a rig demonstrating how to use ingestr to
+load data from Kafka into CrateDB.
+:::
+
+::::
+
+
+
+[databases supported by SQLAlchemy]: https://docs.sqlalchemy.org/en/20/dialects/
+[ingestr]: https://bruin-data.github.io/ingestr/
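The difference between the two CrateDB URL forms used with ingestr (`crate://` for the source, `cratedb://` for the destination) can be made concrete with a small helper. This is a minimal Python sketch, not part of ingestr: the function names `cratedb_source_uri`, `cratedb_dest_uri`, and `ingest_command` are hypothetical, and it assumes CrateDB's default ports (4200 for HTTP, 5432 for the PostgreSQL interface) plus the `sslmode=disable` setting from the examples. It only builds strings and uses no ingestr options beyond the four shown in the synopsis; it does not run ingestr.

```python
import shlex
from urllib.parse import quote

# Default CrateDB ports: 4200 for HTTP, 5432 for the PostgreSQL wire protocol.
CRATEDB_HTTP_PORT = 4200
CRATEDB_PG_PORT = 5432


def cratedb_source_uri(host: str, user: str = "crate", password: str = "",
                       port: int = CRATEDB_HTTP_PORT) -> str:
    """Hypothetical helper: `crate://` URL (SQLAlchemy dialect, HTTP) for --source-uri."""
    auth = quote(user, safe="")
    if password:
        auth += ":" + quote(password, safe="")
    return f"crate://{auth}@{host}:{port}/"


def cratedb_dest_uri(host: str, user: str = "crate", password: str = "",
                     port: int = CRATEDB_PG_PORT, sslmode: str = "disable") -> str:
    """Hypothetical helper: `cratedb://` URL (PostgreSQL interface) for --dest-uri."""
    # Percent-encode credentials so characters like '@' or ':' cannot break the URL.
    auth = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"cratedb://{auth}@{host}:{port}/?sslmode={sslmode}"


def ingest_command(source_uri: str, source_table: str,
                   dest_uri: str, dest_table: str) -> list[str]:
    """Hypothetical helper: assemble an `ingestr ingest` argument list."""
    return [
        "ingestr", "ingest",
        "--source-uri", source_uri,
        "--source-table", source_table,
        "--dest-uri", dest_uri,
        "--dest-table", dest_table,
    ]


if __name__ == "__main__":
    # Export from CrateDB (HTTP) into DuckDB, mirroring the first synopsis example.
    export_cmd = ingest_command(cratedb_source_uri("localhost"), "sys.summits",
                                "duckdb:///cratedb.duckdb", "dest.summits")
    # Load a CSV file into CrateDB (PostgreSQL interface), mirroring the second one.
    import_cmd = ingest_command("csv://input.csv", "sample",
                                cratedb_dest_uri("localhost"), "doc.sample")
    print(shlex.join(export_cmd))
    print(shlex.join(import_cmd))
```

With the defaults, the two helpers reproduce the URLs from the synopsis, `crate://crate@localhost:4200/` and `cratedb://crate:@localhost:5432/?sslmode=disable`. Returning an argument list rather than a shell string keeps the result usable with `subprocess.run` without extra quoting.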