Commit a027a1c

Connect: Migrate pages from crate-clients-tools, to be phased out soon
1 parent fead459 commit a027a1c

16 files changed

+1403
-199
lines changed
docs/connect/cli.md

Lines changed: 183 additions & 0 deletions
@@ -0,0 +1,183 @@
1+
(cli)=
2+
(connect-cli)=
3+
# Using command-line programs with CrateDB
4+
5+
This section provides a quick overview of a few CLI programs, and how to
6+
use them for connecting to CrateDB clusters. We recommend using crash,
7+
psql, http ([HTTPie]), or curl.
8+
9+
You can use them to quickly validate HTTP and PostgreSQL connectivity to your
10+
database cluster, or to conduct basic scripting.
11+
12+
Before running the command-line snippets outlined below, replace the
13+
placeholder tokens `<hostname>`, `<username>`, and `<password>` with the
14+
settings of your cluster.
15+
16+
When using CrateDB Cloud, `<hostname>` will be something like
17+
`<clustername>.{aks1,eks1}.<region>.{azure,aws}.cratedb.net`.
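
To avoid retyping the placeholders, one option is to keep the connection
settings in shell variables and derive the endpoint URLs from them. The
following is a minimal sketch: the variable names `CRATEDB_HOST`,
`CRATEDB_USER`, `CRATEDB_PASSWORD`, `CRATEDB_HTTP_URL`, and `CRATEDB_PG_URL`
are illustrative assumptions, not names that any of the tools read on their
own, and the ports are the ones used in the examples on this page (4200 for
HTTP, 5432 for PostgreSQL).

```shell
# Hypothetical variable names; none of the tools pick these up automatically.
CRATEDB_HOST='<hostname>'
CRATEDB_USER='<username>'
CRATEDB_PASSWORD='<password>'

# HTTP endpoint, as used by crash, HTTPie, and curl.
CRATEDB_HTTP_URL="https://${CRATEDB_HOST}:4200"

# PostgreSQL wire protocol endpoint, as used by psql.
CRATEDB_PG_URL="postgresql://${CRATEDB_HOST}:5432/crate"

# Print both URLs to verify them before running any command.
printf '%s\n' "${CRATEDB_HTTP_URL}" "${CRATEDB_PG_URL}"
```

The commands below can then reference the variables instead of literal
values, for example
`crash --hosts "${CRATEDB_HTTP_URL}" --username "${CRATEDB_USER}"`.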
18+
19+
20+
(crash)=
21+
## crash
22+
23+
```{div}
24+
:style: "float: right"
25+
![image](https://cratedb.com/docs/crate/crash/en/latest/_images/query.png){w=240px}
26+
```
27+
28+
The **CrateDB Shell** is an interactive command-line interface (CLI) tool for
29+
working with CrateDB. For more information, see the documentation about [crash].
30+
31+
```{div}
32+
:style: "clear: both"
33+
```
34+
35+
::::{tab-set}
36+
37+
:::{tab-item} CrateDB and CrateDB Cloud
38+
:sync: server
39+
40+
```{code-block} shell
41+
CRATEPW=<password> \
42+
crash --hosts 'https://<hostname>:4200' --username '<username>' \
43+
--command "SELECT 42.42;"
44+
```
45+
:::
46+
47+
:::{tab-item} CrateDB on localhost
48+
:sync: localhost
49+
50+
```{code-block} shell
51+
# No authentication.
52+
crash --command "SELECT 42.42;"
53+
54+
```
55+
:::
56+
57+
::::
58+
59+
60+
(psql)=
61+
## psql
62+
63+
```{div}
64+
:style: "float: right"
65+
![image](https://github.com/crate/crate-clients-tools/assets/453543/8f0a0e06-87f6-467f-be2d-b38121afbafa){w=240px}
66+
```
67+
68+
**psql** is a terminal-based front-end to PostgreSQL. It enables you to type in
69+
queries interactively, issue them to PostgreSQL, and see the query results.
70+
For more information, see the documentation about [psql].
71+
72+
```{div}
73+
:style: "clear: both"
74+
```
75+
76+
::::{tab-set}
77+
78+
:::{tab-item} CrateDB and CrateDB Cloud
79+
:sync: server
80+
81+
```{code-block} shell
82+
PGUSER=<username> PGPASSWORD=<password> \
83+
psql postgresql://<hostname>/crate --command "SELECT 42.42;"
84+
```
85+
:::
86+
87+
:::{tab-item} CrateDB on localhost
88+
:sync: localhost
89+
90+
```{code-block} shell
91+
# No authentication.
92+
psql postgresql://crate@localhost:5432/crate --command "SELECT 42.42;"
93+
```
94+
:::
95+
96+
::::
97+
98+
99+
(httpie)=
100+
## HTTPie
101+
102+
```{div}
103+
:style: "float: right"
104+
![image](https://github.com/crate/crate-clients-tools/assets/453543/f5a2916d-3730-4901-99cf-b88b9af03329){w=240px}
105+
```
106+
107+
The **HTTPie CLI** is a modern, user-friendly command-line HTTP client with
108+
JSON support, colors, sessions, downloads, plugins & more.
109+
For more information, see the documentation about [HTTPie].
110+
111+
```{div}
112+
:style: "clear: both"
113+
```
114+
115+
::::{tab-set}
116+
117+
:::{tab-item} CrateDB and CrateDB Cloud
118+
:sync: server
119+
120+
```{code-block} shell
121+
http "https://<username>:<password>@<hostname>:4200/_sql?pretty" \
122+
stmt="SELECT 42.42;"
123+
```
124+
:::
125+
126+
:::{tab-item} CrateDB on localhost
127+
:sync: localhost
128+
129+
```{code-block} shell
130+
http "localhost:4200/_sql?pretty" \
131+
stmt="SELECT 42.42;"
132+
```
133+
:::
134+
135+
::::
136+
137+
138+
(curl)=
139+
## curl
140+
141+
```{div}
142+
:style: "float: right"
143+
![image](https://github.com/crate/crate-clients-tools/assets/453543/318b0819-a0d4-4112-a320-23852263362c){w=240px}
144+
```
145+
146+
The venerable **curl** is the ubiquitous command-line tool and library for transferring
147+
data with URLs. For more information, see the documentation about [curl].
148+
149+
This example combines it with [jq], a lightweight and flexible command-line JSON processor.
150+
151+
```{div}
152+
:style: "clear: both"
153+
```
154+
155+
::::{tab-set}
156+
157+
:::{tab-item} CrateDB and CrateDB Cloud
158+
:sync: server
159+
160+
```{code-block} shell
161+
echo '{"stmt": "SELECT 42.42;"}' \
162+
| curl "https://<username>:<password>@<hostname>:4200/_sql?pretty" --silent --data @- | jq
163+
```
164+
:::
165+
166+
:::{tab-item} CrateDB on localhost
167+
:sync: localhost
168+
169+
```{code-block} shell
170+
echo '{"stmt": "SELECT 42.42;"}' \
171+
| curl "localhost:4200/_sql?pretty" --silent --data @- | jq
172+
```
173+
:::
174+
175+
::::
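
The `/_sql` endpoint also accepts parameterized statements, where values
travel in an `args` array instead of being spliced into the SQL string.
The following is a minimal sketch that only builds and prints the request
body, assuming that a `?` placeholder in the select list is acceptable to
your CrateDB version. The variable name `payload` and the column alias
`answer` are illustrative.

```shell
# Request body with a parameterized statement: the value in "args"
# fills the "?" placeholder on the server side.
payload='{"stmt": "SELECT ? AS answer;", "args": [42.42]}'

# Print the body for inspection. To submit it, pipe it into curl like
# in the examples above:
#   printf '%s\n' "${payload}" \
#     | curl "localhost:4200/_sql?pretty" --silent --data @- | jq
printf '%s\n' "${payload}"
```

Keeping values out of the statement text avoids quoting problems when the
values themselves contain quotes or other special characters.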
176+
177+
178+
179+
[curl]: https://curl.se/
180+
[crash]: inv:crate-crash:*:label#index
181+
[HTTPie]: https://httpie.io/
182+
[jq]: https://jqlang.github.io/jq/
183+
[psql]: https://www.postgresql.org/docs/current/app-psql.html

docs/connect/df.md

Lines changed: 184 additions & 0 deletions
@@ -0,0 +1,184 @@
1+
(df)=
2+
(dataframes)=
3+
# CrateDB and DataFrame libraries
4+
5+
This page lists data frame libraries and frameworks that can
6+
be used together with CrateDB.
7+
8+
9+
:::::{grid} 1 2 2 2
10+
:margin: 4 4 0 0
11+
:padding: 0
12+
:gutter: 2
13+
14+
::::{grid-item-card} {material-outlined}`lightbulb;2em` Tutorials
15+
:link: guide:dataframes
16+
:link-type: ref
17+
Learn how to use CrateDB together with popular open-source data frame
18+
libraries, through hands-on tutorials and code examples.
19+
+++
20+
{tag-info}`Dask` {tag-info}`pandas` {tag-info}`Polars`
21+
::::
22+
23+
::::{grid-item-card} {material-outlined}`read_more;2em` SQLAlchemy
24+
CrateDB's SQLAlchemy dialect implementation provides the fundamental infrastructure
25+
for integrations with Dask, pandas, and Polars.
26+
+++
27+
[ORM Guides](inv:guide#orm)
28+
{ref}`ORM Catalog <orm>`
29+
::::
30+
31+
:::::
32+
33+
34+
(dask)=
35+
## Dask
36+
37+
[Dask] is a parallel computing library for analytics with task scheduling.
38+
It is built on top of the Python programming language, making it easy to scale
39+
the Python libraries that you know and love, like NumPy, pandas, and scikit-learn.
40+
41+
```{div}
42+
:style: "float: right"
43+
[![](https://github.com/crate/crate-clients-tools/assets/453543/99bd2234-c501-479b-ade7-bcc2bfc1f288){w=180px}](https://www.dask.org/)
44+
```
45+
46+
- [Dask DataFrames] help you process large tabular data by parallelizing pandas,
47+
either on your laptop for larger-than-memory computing, or on a distributed
48+
cluster of computers.
49+
50+
- [Dask Futures], implementing a real-time task framework, allow you to scale
51+
generic Python workflows across a Dask cluster with minimal code changes,
52+
by extending Python's `concurrent.futures` interface.
53+
54+
```{div}
55+
:style: "clear: both"
56+
```
57+
58+
59+
(pandas)=
60+
## pandas
61+
62+
```{div}
63+
:style: "float: right"
64+
[![](https://pandas.pydata.org/static/img/pandas.svg){w=180px}](https://pandas.pydata.org/)
65+
```
66+
67+
[pandas] is a fast, powerful, flexible, and easy to use open source data analysis
68+
and manipulation tool, built on top of the Python programming language.
69+
70+
Pandas (stylized as pandas) is a software library written for the Python programming
71+
language for data manipulation and analysis. In particular, it offers data structures
72+
and operations for manipulating numerical tables and time series.
73+
74+
:::{rubric} Data Model
75+
:::
76+
- Pandas is built around data structures called Series and DataFrames. Data for these
77+
collections can be imported from various file formats such as comma-separated values,
78+
JSON, Parquet, SQL database tables or queries, and Microsoft Excel.
79+
- A Series is a 1-dimensional data structure built on top of NumPy's array.
80+
- Pandas includes support for time series, such as the ability to interpolate values
81+
and filter using a range of timestamps.
82+
- By default, a Pandas index is a series of integers ascending from 0, similar to the
83+
indices of Python arrays. However, indices can use any NumPy data type, including
84+
floating point, timestamps, or strings.
85+
- Pandas supports hierarchical indices with multiple values per data point. An index
86+
with this structure, called a "MultiIndex", allows a single DataFrame to represent
87+
multiple dimensions, similar to a pivot table in Microsoft Excel. Each level of a
88+
MultiIndex can be given a unique name.
89+
90+
```{div}
91+
:style: "clear: both"
92+
```
93+
94+
95+
(polars)=
96+
## Polars
97+
98+
```{div}
99+
:style: "float: right; margin-left: 0.5em"
100+
[![](https://github.com/pola-rs/polars-static/raw/master/logos/polars-logo-dark.svg){w=180px}](https://pola.rs/)
101+
```
102+
103+
[Polars] is a blazingly fast DataFrames library with language bindings for
104+
Rust, Python, Node.js, R, and SQL. Polars is open source, written in Rust,
105+
and powered by a multithreaded, vectorized query engine.
106+
107+
- **Fast:** Written from scratch in Rust and with performance in mind,
108+
designed close to the machine, and without external dependencies.
109+
110+
- **I/O:** First class support for all common data storage layers: local,
111+
cloud storage & databases.
112+
113+
- **Intuitive API:** Write your queries the way they were intended. Polars,
114+
internally, will determine the most efficient way to execute using its query
115+
optimizer. Polars' expressions are intuitive and empower you to write
116+
readable and performant code at the same time.
117+
118+
- **Out of Core:** The streaming API allows you to process your results without
119+
requiring all your data to be in memory at the same time.
120+
121+
- **Parallel:** Polars' multi-threaded query engine utilises the power of your
122+
machine by dividing the workload among the available CPU cores without any
123+
additional configuration.
124+
125+
- **Vectorized Query Engine:** Uses [Apache Arrow], a columnar data format, to
126+
process your queries in a vectorized manner and SIMD to optimize CPU usage.
127+
This enables cache-coherent algorithms and high performance on modern processors.
128+
129+
- **Open Source:** Polars is and always will be open source, driven by an active
130+
community of developers. Everyone is encouraged to add new features and contribute.
131+
It is free to use under the MIT license.
132+
133+
:::{rubric} Data formats
134+
:::
135+
136+
Polars supports reading and writing to many common data formats.
137+
This allows you to easily integrate Polars into your existing data stack.
138+
139+
- Text: CSV & JSON
140+
- Binary: Parquet, Delta Lake, Avro & Excel
141+
- IPC: Feather, Arrow
142+
- Databases: MySQL, Postgres, SQL Server, SQLite, Redshift & Oracle
143+
- Cloud Storage: S3, Azure Blob & Azure File
144+
145+
```{div}
146+
:style: "clear: both"
147+
```
148+
149+
150+
## Examples
151+
152+
How to use CrateDB together with popular open-source dataframe libraries.
153+
154+
### Dask
155+
- [Guide to efficient data ingestion to CrateDB with pandas and Dask]
156+
- [Efficient batch/bulk INSERT operations with pandas, Dask, and SQLAlchemy]
157+
- [Import weather data using Dask]
158+
- [Dask code examples]
159+
160+
### pandas
161+
- [Guide to efficient data ingestion to CrateDB with pandas]
162+
- [Importing Parquet files into CrateDB using Apache Arrow and SQLAlchemy]
163+
- [pandas code examples]
164+
165+
### Polars
166+
- [Polars code examples]
167+
168+
169+
170+
[Apache Arrow]: https://arrow.apache.org/
171+
[Dask]: https://www.dask.org/
172+
[Dask DataFrames]: https://docs.dask.org/en/latest/dataframe.html
173+
[Dask Futures]: https://docs.dask.org/en/latest/futures.html
174+
[pandas]: https://pandas.pydata.org/
175+
[Polars]: https://pola.rs/
176+
177+
[Dask code examples]: https://github.com/crate/cratedb-examples/tree/main/by-dataframe/dask
178+
[Efficient batch/bulk INSERT operations with pandas, Dask, and SQLAlchemy]: https://cratedb.com/docs/python/en/latest/by-example/sqlalchemy/dataframe.html
179+
[Guide to efficient data ingestion to CrateDB with pandas]: https://community.cratedb.com/t/guide-to-efficient-data-ingestion-to-cratedb-with-pandas/1541
180+
[Guide to efficient data ingestion to CrateDB with pandas and Dask]: https://community.cratedb.com/t/guide-to-efficient-data-ingestion-to-cratedb-with-pandas-and-dask/1482
181+
[Import weather data using Dask]: https://github.com/crate/cratedb-examples/blob/main/topic/timeseries/dask-weather-data-import.ipynb
182+
[Importing Parquet files into CrateDB using Apache Arrow and SQLAlchemy]: https://community.cratedb.com/t/importing-parquet-files-into-cratedb-using-apache-arrow-and-sqlalchemy/1161
183+
[pandas code examples]: https://github.com/crate/cratedb-examples/tree/main/by-dataframe/pandas
184+
[Polars code examples]: https://github.com/crate/cratedb-examples/tree/main/by-dataframe/polars
