Runner for parallel sklearn benchmark #38

@petrhrobar

Hey Vincent,

I ran into an interesting case when using your Runner object to run functions in parallel.

Let's take an example of an experiment you showed here (https://www.youtube.com/watch?v=qcrR-Hd0LhI&ab_channel=PyData).

Is it possible to run this benchmark in parallel using the Runner object from the memo library? In the documentation, all parameters are passed to the function through a grid. However, it does not make sense to put X and y into the grid, since they are the data rather than settings to vary.

If we run it like this, for example:

X = [
    "i really like this post",
    "thanks for that comment",
    "i enjoy this friendly forum",
    "this is a bad post",
    "i dislike this article",
    "this is not well written"
]

y = np.array([1, 1, 1, 0, 0, 0])

settings = grid(
    model=['lr', 'random_forest', 'ada', 'xgb'],
    emb=['bp', 'ft', 'spacy', 'cv-ngram'],
    train_size=np.arange(1, 4, 1),
    test_size=[1]
)

%%time
Runner(backend="threading", n_jobs=4).run(experiment, settings)

we run into a problem, because experiment is never given X and y.

I have also tried:

from functools import partial

partial_version = partial(experiment, X=X, y=y)

Runner(backend="threading", n_jobs=4).run(partial_version, settings)

This works; however, it writes the full X and y into every line of the JSON log file, which is not very efficient:

{"X":["i really like this post","thanks for that comment","i enjoy this friendly forum","this is a bad post","i dislike this article","this is not well written"],"y":[1,1,1,0,0,0],"model":"xgb","emb":"cv-ngram","train_size":1,"test_size":1,"accuracy_test":0.0,"accuracy_train":1.0,"pred_time":0.06557869911193848,"time_taken":0.68}
{"X":["i really like this post","thanks for that comment","i enjoy this friendly forum","this is a bad post","i dislike this article","this is not well written"],"y":[1,1,1,0,0,0],"model":"xgb","emb":"spacy","train_size":1,"test_size":1,"accuracy_test":0.0,"accuracy_train":1.0,"pred_time":0.15421748161315918,"time_taken":0.79}
{"X":["i really like this post","thanks for that comment","i enjoy this friendly forum","this is a bad post","i dislike this article","this is not well written"],"y":[1,1,1,0,0,0],"model":"xgb","emb":"spacy","train_size":3,"test_size":1,"accuracy_test":0.0,"accuracy_train":0.6666666666666666,"pred_time":0.11142969131469727,"time_taken":0.83}

Is there any other workaround?
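To make the question concrete, here is a minimal sketch of the behaviour and of one direction that might avoid it. It does not use memo itself: `log_kwargs` is a hypothetical stand-in for a logging decorator that stores the keyword arguments of each call together with the returned dict, which is what the log lines above suggest is happening. `score` is a placeholder for the real benchmark. The assumption is that data bound with `partial` arrives as keyword arguments and so gets logged, whereas data read from the enclosing scope by a thin wrapper does not.

```python
from functools import partial, wraps


def log_kwargs(log):
    """Hypothetical stand-in for a logging decorator (not memo's API).

    Appends the call's keyword arguments merged with the returned dict.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(**kwargs):
            result = func(**kwargs)
            log.append({**kwargs, **result})
            return result
        return wrapper
    return decorator


X = ["i really like this post", "this is a bad post"]
y = [1, 0]


def score(X, y, model, train_size):
    # Placeholder for the real benchmark; returns a dummy metric.
    return {"n_samples": len(X)}


# Variant 1: decorate the full function, then bind the data with partial.
# X and y reach the wrapper as keyword arguments, so they are logged.
log_partial = []
decorated = log_kwargs(log_partial)(score)
partial(decorated, X=X, y=y)(model="lr", train_size=1)

# Variant 2: decorate a thin wrapper that only accepts the grid parameters.
log_closure = []


@log_kwargs(log_closure)
def experiment(model, train_size):
    # X and y come from the enclosing scope, never from keyword arguments.
    return score(X, y, model=model, train_size=train_size)


experiment(model="lr", train_size=1)

print(sorted(log_partial[0]))
print(sorted(log_closure[0]))
```

In the second variant the logged record only contains the grid parameters and the results. Whether this carries over to Runner with every backend is untested here; the wrapper relies on X and y being visible in the scope where the function runs.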
