Commit 2aef611

R: Refactor ML tutorial to dedicated section
1 parent 427d701 commit 2aef611

4 files changed, +53 -38 lines changed

docs/integrate/index.md

Lines changed: 1 addition & 0 deletions
@@ -60,6 +60,7 @@ Power BI <powerbi/index>
 prometheus/index
 pyviz/index
 queryzen/index
+r/index
 rill/index
 risingwave/index
 sql-server/index

docs/integrate/r/index.md

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+(r)=
+# R
+
+```{div} .float-right
+[![R logo](https://www.r-project.org/Rlogo.png){height=60px loading=lazy}][R]
+```
+```{div} .clearfix
+```
+
+:::{rubric} About
+:::
+
+[R] is a free software environment for statistical computing and graphics.
+It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS.
+
+:::{rubric} Learn
+:::
+
+::::{grid} 2
+
+:::{grid-item-card} Statistical analysis and visualization on huge datasets
+:link: r-tutorial
+:link-type: ref
+Learn how to create a machine learning pipeline using R and CrateDB.
+:::
+
+::::
+
+:::{toctree}
+:maxdepth: 1
+:hidden:
+Tutorial <tutorial>
+:::
+
+[R]: https://www.r-project.org/

docs/topic/ml/r.rst renamed to docs/integrate/r/tutorial.rst

Lines changed: 15 additions & 16 deletions
@@ -1,4 +1,5 @@
 .. _cratedb-r:
+.. _r-tutorial:
 
 ==============
 CrateDB with R
@@ -7,8 +8,7 @@ CrateDB with R
 This integration document details how to create a Machine Learning pipeline
 using R and CrateDB.
 
-Abstract
-========
+.. rubric:: Introduction
 
 Statistical analysis and visualization on huge datasets is a common task many
 data scientists face in their day-to-day life. One common tool for doing this
@@ -22,12 +22,7 @@ statistical computations.
 
 This can be accomplished with the `RPostgreSQL`_ library.
 
-
-Implementation
-==============
-
-Set Up
-------
+.. rubric:: About
 
 For this implementation, we will be using the classic `iris classification
 problem`_.
@@ -51,6 +46,8 @@ Using R, we want to:
 4. Retrieve our unclassified iris data, enrich the data with a prediction from
 our model, and insert the result into our iris table.
 
+Setup
+=====
 
 Prerequisites
 -------------
@@ -68,8 +65,8 @@ To install these libraries within R or RStudio, we can run:
 > install.packages("caret")
 
 
-CrateDB
--------
+Provision data
+--------------
 
 First, we need to create a table to hold our training data, as well as our
 unclassified irises:
@@ -112,9 +109,11 @@ We can verify that the data has been successfully imported like so:
 +----------+
 SELECT 1 row in set (0.130 sec)
 
+Usage
+=====
 
-Examining The Data
-------------------
+Explore data
+------------
 
 With our data in CrateDB, we can now load it into R or RStudio. Within
 R, we should first import our data. We do this by loading the ``RPostgreSQL``
@@ -186,8 +185,8 @@ As we can see, the lengths and widths of sepals and petals are very good
 indicators of iris species, with little overlap between them.
 
 
-Training A Model
-----------------
+Train model
+-----------
 
 Now that we have loaded our data and can visualize it to get a better idea of
 what it contains, we can create a machine learning model to predict a species
@@ -287,8 +286,8 @@ misclassified a *versicolor* as a *virginica* and vice versa. We could improve
 this by trying out other models, by tweaking our model, or by training on a
 larger dataset.
 
-Enriching Data
-..............
+Enrich data
+-----------
 
 Now that we have a model we are happy with, we can use this model to enrich
 unclassified iris flowers data.

docs/topic/ml/index.md

Lines changed: 2 additions & 22 deletions
@@ -198,31 +198,11 @@ How to train time series forecasting models using PyCaret and CrateDB.
 ::::
 
 
-(iris-r)=
 ### R
-
-Use R with CrateDB.
-
-:::::{info-card}
-::::{grid-item}
-:columns: 9
-**Statistical analysis and visualization on huge datasets**
-
-Details about how to create a machine learning pipeline
-using R and CrateDB.
-
-:::{toctree}
-:maxdepth: 1
-
-r
+:::{seealso}
+Please navigate to the dedicated page about {ref}`r`.
 :::
 
-::::
-::::{grid-item}
-:columns: 3
-{tags-primary}`Fundamentals`
-::::
-:::::
 
 
 (scikit-learn)=
