 [dlt] (data load tool), think ELT as Python code, is a popular,
 production-ready Python library for moving data. It loads data from
 various and often messy data sources into well-structured, live datasets.
-dlt is used by {ref}`ingestr`.
+
+dlt supports [30+ databases supported by SQLAlchemy],
+and is also the workhorse behind the {ref}`ingestr` toolkit.

 ::::{grid}

@@ -36,12 +38,13 @@ dlt is used by {ref}`ingestr`.
 Prerequisites:
 Install dlt and the CrateDB destination adapter:
 ```shell
-pip install dlt dlt-cratedb
+pip install --upgrade dlt-cratedb
 ```

 Load data from cloud storage or files into CrateDB.
 ```python
 import dlt
+import dlt_cratedb
 from dlt.sources.filesystem import filesystem

 resource = filesystem(
@@ -60,6 +63,7 @@ pipeline.run(resource)

 Load data from SQL databases into CrateDB.
 ```python
+import dlt_cratedb
 from dlt.sources.sql_database import sql_database

 source = sql_database(
@@ -75,32 +79,136 @@ pipeline = dlt.pipeline(
 pipeline.run(source)
 ```

-## Learn
+## Supported features
+
+### Data loading
+
+Data is loaded into CrateDB using the most efficient method depending on the data source.
+
+- For local files, the `psycopg2` library is used to directly load files into
+  CrateDB tables using the `INSERT` command.
+- For files in remote storage like S3 or Azure Blob Storage,
+  CrateDB data loading functions are used to read the files and insert the data into tables.
+
+### Datasets
+
+Use `dataset_name="doc"` to address CrateDB's default schema `doc`.
+When addressing other schemas, make sure they contain at least one table. [^create-schema]
+
+### File formats
+
+- The [SQL INSERT file format] is the preferred format for both direct loading and staging.
+
+### Column types
+
+The `cratedb` destination has a few specific deviations from the default SQL destinations.
+
+- CrateDB does not support the `time` datatype. Time will be loaded to a `text` column.
+- CrateDB does not support the `binary` datatype. Binary will be loaded to a `text` column.
+- CrateDB can produce rounding errors under certain conditions when using the `float/double` datatype.
+  Make sure to use the `decimal` datatype if you can’t afford to have rounding errors.
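
To illustrate the rounding note in the list above, here is a minimal Python sketch, independent of dlt and CrateDB, of why binary floating point drifts where a decimal type does not. The variable names are made up for illustration, and the sketch does not reproduce the specific conditions under which CrateDB itself rounds.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 exactly,
# so adding it up ten times does not yield exactly 1.0.
float_total = sum([0.1] * 10)

# Decimal values constructed from strings keep their exact base-10 digits.
decimal_total = sum([Decimal("0.1")] * 10)

print(float_total == 1.0)               # False
print(decimal_total == Decimal("1.0"))  # True
```

If exact values matter, for example for monetary amounts, it is probably safest to keep them as `Decimal` in Python end to end so that they map to the `decimal` datatype rather than to `float/double`.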
+
+### Column hints
+
+CrateDB supports the following [column hints].
+
+- `primary_key` - marks the column as part of the primary key. Multiple columns can have this hint to create a composite primary key.
+
+### File staging
+
+CrateDB supports Amazon S3, Google Cloud Storage, and Azure Blob Storage as file staging destinations.
+
+`dlt` will upload CSV or JSONL files to the staging location and use CrateDB data loading functions
+to load the data directly from the staged files.
+
+Please refer to the filesystem documentation to learn how to configure credentials for the staging destinations.
+
+- [AWS S3]
+- [Azure Blob Storage]
+- [Google Storage]
+
+Invoke a pipeline with staging enabled.
+
+```python
+pipeline = dlt.pipeline(
+    pipeline_name='chess_pipeline',
+    destination='cratedb',
+    staging='filesystem',  # add this to activate staging
+    dataset_name='chess_data'
+)
+```
+
+### dbt support
+
+Integration with [dbt] is generally supported via [dbt-cratedb2] but not tested by us.
+
+### dlt state sync
+
+The CrateDB destination fully supports [dlt state sync].
+
+
+## See also
+
+:::{rubric} Examples
+:::

 ::::{grid}

+:::{grid-item-card} Usage guide: Load API data with dlt
+:link: dlt-usage
+:link-type: ref
+Exercise a canonical `dlt init` example with CrateDB.
+:::
+
 :::{grid-item-card} Examples: Use dlt with CrateDB
 :link: https://github.com/crate/cratedb-examples/tree/main/framework/dlt
 :link-type: url
-Executable code examples that demonstrate how to use dlt with CrateDB.
+Executable code examples on GitHub that demonstrate how to use dlt with CrateDB.
+:::
+
+::::
+
+:::{rubric} Resources
 :::

-:::{grid-item-card} Adapter: The dlt destination adapter for CrateDB
-:link: https://github.com/crate/dlt-cratedb
+::::{grid}
+
+:::{grid-item-card} Package: `dlt-cratedb`
+:link: https://pypi.org/project/dlt-cratedb/
 :link-type: url
-Based on the dlt PostgreSQL adapter, the package enables you to work
-with dlt and CrateDB.
+The dlt destination adapter for CrateDB is
+based on the dlt PostgreSQL adapter.
 :::

-:::{grid-item-card} See also: ingestr
+:::{grid-item-card} Related: `ingestr`
 :link: ingestr
 :link-type: ref
-The ingestr data import/export application uses dlt.
+The ingestr data import/export application uses dlt as a workhorse.
 :::

 ::::


+:::{toctree}
+:maxdepth: 1
+:hidden:
+Usage <usage>
+:::
+
+
+[^create-schema]: CrateDB does not support `CREATE SCHEMA` yet, see [CRATEDB-14601].
+    This means by default, unless any table exists within a schema, the schema appears
+    not to exist at all. However, it also can't be created explicitly. Schemas are
+    currently implicitly created when tables exist in them.

-[databases supported by SQLAlchemy]: https://docs.sqlalchemy.org/en/20/dialects/
+[30+ databases supported by SQLAlchemy]: https://dlthub.com/docs/dlt-ecosystem/destinations/sqlalchemy
+[AWS S3]: https://dlthub.com/docs/dlt-ecosystem/destinations/filesystem#aws-s3
+[Azure Blob Storage]: https://dlthub.com/docs/dlt-ecosystem/destinations/filesystem#azure-blob-storage
+[column hints]: https://dlthub.com/docs/general-usage/schema#column-hint-rules
+[CRATEDB-14601]: https://github.com/crate/crate/issues/14601
+[dbt]: https://dlthub.com/docs/hub/features/transformations/dbt-transformations
+[dbt-cratedb2]: https://pypi.org/project/dbt-cratedb2/
 [dlt]: https://dlthub.com/
+[dlt state sync]: https://dlthub.com/docs/general-usage/state#syncing-state-with-destination
+[Google Storage]: https://dlthub.com/docs/dlt-ecosystem/destinations/filesystem#google-storage
+[SQL INSERT file format]: https://dlthub.com/docs/dlt-ecosystem/file-formats/insert-format