docs/integrate/dask/usage.md
5 additions & 4 deletions
@@ -1,8 +1,9 @@
-(dask-tutorial)=
+(dask-usage)=
 # Efficient data ingestion with Dask and CrateDB
 
 ## Introduction
-Dask is a parallel computing library that enables distributed computing for tasks such as data processing and machine learning. In this tutorial, we'll explore how to leverage the power of CrateDB, a distributed SQL database, in conjunction with Dask, to perform efficient data processing and analysis tasks.
+Dask is a parallel computing library that enables distributed computing for tasks such as data processing and machine learning.
+In this usage guide, we'll explore how to leverage the power of CrateDB, a distributed SQL database, in conjunction with Dask, to perform efficient data processing and analysis tasks.
-For this tutorial, we chose to use the California housing prices dataset, also available on [Kaggle](https://www.kaggle.com/datasets/camnugent/california-housing-prices?resource=download). This dataset is a popular dataset for regression tasks, consisting of median house values in census tracts in California, making it an excellent starting point for implementing basic machine learning algorithms.
+For this usage guide, we chose to use the California housing prices dataset, also available on [Kaggle](https://www.kaggle.com/datasets/camnugent/california-housing-prices?resource=download). This dataset is a popular dataset for regression tasks, consisting of median house values in census tracts in California, making it an excellent starting point for implementing basic machine learning algorithms.
 
 Before importing data, create a california_housing table in CrateDB:
@@ -190,6 +191,6 @@ On an M1 machine with 16 GB of RAM, the entire process of loading the 1.5 millio
 
 ## Conclusions
 
-In this tutorial, we've covered the essentials of using CrateDB with Dask for efficient data processing and analysis. By combining the distributed capabilities of CrateDB with the parallel computing power of Dask, you can unlock the potential to handle large-scale datasets, perform complex queries, and leverage advanced analytics techniques.
+In this usage guide, we've covered the essentials of using CrateDB with Dask for efficient data processing and analysis. By combining the distributed capabilities of CrateDB with the parallel computing power of Dask, you can unlock the potential to handle large-scale datasets, perform complex queries, and leverage advanced analytics techniques.
 
 To learn more about updates, features, and other questions you might have, join our [CrateDB community](https://community.cratedb.com/).