Runner for parallel sklearn benchmark

Hey Vincent,

I ran into an interesting case with your Runner object for running functions in parallel.

Let's take an example of an experiment you showed here (https://www.youtube.com/watch?v=qcrR-Hd0LhI&ab_channel=PyData).

Is it possible to run this benchmark in parallel using the Runner object from the memo library? In the documentation, you pass all parameters through a grid. However, it does not make sense to put X and y into the grid, since they are the same for every run.

If we run it like this, for example:
```python
X = [
    "i really like this post",
    "thanks for that comment",
    "i enjoy this friendly forum",
    "this is a bad post",
    "i dislike this article",
    "this is not well written"
]

y = np.array([1, 1, 1, 0, 0, 0])

settings = grid(
    model=["lr", "random_forest", "ada", "xgb"],
    emb=["bp", "ft", "spacy", "cv-ngram"],
    train_size=np.arange(1, 4, 1),
    test_size=[1],
)

# In a separate notebook cell:
%%time
Runner(backend="threading", n_jobs=4).run(experiment, settings)
```
we run into a problem, because `experiment` never receives `X` and `y`.
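To make the problem concrete, here is a minimal sketch of what I assume the grid expands to. `grid_sketch` is a hypothetical stand-in written with `itertools.product`, not memo's actual `grid`, and the `experiment` signature below is a placeholder. Each settings dict only carries the four grid keys, so a function that also requires `X` and `y` cannot be called with it.

```python
from itertools import product


def grid_sketch(**params):
    # Hypothetical stand-in for memo's `grid`: yields one dict per combination.
    keys = list(params)
    for values in product(*params.values()):
        yield dict(zip(keys, values))


settings = list(grid_sketch(
    model=["lr", "random_forest", "ada", "xgb"],
    emb=["bp", "ft", "spacy", "cv-ngram"],
    train_size=[1, 2, 3],  # same values as np.arange(1, 4, 1)
    test_size=[1],
))


def experiment(X, y, model, emb, train_size, test_size):
    # Placeholder signature only; the real benchmark body is not shown here.
    return {}


try:
    experiment(**settings[0])
    error = None
except TypeError as exc:
    # X and y are missing because the grid never supplies them.
    error = str(exc)

print(len(settings))
print(error)
```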

I have also tried:
```python

from functools import partial

partial_version = partial(experiment, X=X, y=y)

Runner(backend="threading", n_jobs=4).run(partial_version, settings)
```

This works, but it writes the full `X` and `y` into every line of the JSON log file, which is not very efficient.

```json
{"X":["i really like this post","thanks for that comment","i enjoy this friendly forum","this is a bad post","i dislike this article","this is not well written"],"y":[1,1,1,0,0,0],"model":"xgb","emb":"cv-ngram","train_size":1,"test_size":1,"accuracy_test":0.0,"accuracy_train":1.0,"pred_time":0.06557869911193848,"time_taken":0.68}
{"X":["i really like this post","thanks for that comment","i enjoy this friendly forum","this is a bad post","i dislike this article","this is not well written"],"y":[1,1,1,0,0,0],"model":"xgb","emb":"spacy","train_size":1,"test_size":1,"accuracy_test":0.0,"accuracy_train":1.0,"pred_time":0.15421748161315918,"time_taken":0.79}
{"X":["i really like this post","thanks for that comment","i enjoy this friendly forum","this is a bad post","i dislike this article","this is not well written"],"y":[1,1,1,0,0,0],"model":"xgb","emb":"spacy","train_size":3,"test_size":1,"accuracy_test":0.0,"accuracy_train":0.6666666666666666,"pred_time":0.11142969131469727,"time_taken":0.83}
```
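My guess at why this happens, as a rough sketch: `functools.partial` forwards `X` and `y` as keyword arguments, and a logging decorator that records its keyword arguments will record them too. `log_kwargs` below is a hypothetical stand-in I wrote for illustration, not memo's actual implementation. The second function reads `X` and `y` from the enclosing scope instead, so only the grid parameters reach the log.

```python
from functools import partial, wraps


def log_kwargs(store):
    """Hypothetical stand-in for a memo-style logging decorator."""
    def decorator(func):
        @wraps(func)
        def wrapper(**kwargs):
            result = func(**kwargs)
            # Every keyword argument ends up in the log record.
            store.append({**kwargs, **result})
            return result
        return wrapper
    return decorator


X = ["i really like this post", "this is a bad post"]
y = [1, 0]

log_partial = []


@log_kwargs(log_partial)
def experiment(X, y, model):
    # Placeholder body; a real benchmark would fit and score `model` here.
    return {"n_samples": len(X)}


# partial passes X and y as keyword arguments, so they are logged.
partial(experiment, X=X, y=y)(model="lr")

log_closure = []


@log_kwargs(log_closure)
def experiment_closure(model):
    # X and y come from the enclosing scope, not from the call signature.
    return {"n_samples": len(X)}


experiment_closure(model="lr")

print(sorted(log_partial[0]))
print(sorted(log_closure[0]))
```

If that is roughly how the decorators behave, closing over the data might avoid the duplicated text in the log, though I am not sure it is the intended pattern.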

Is there any other workaround?
