This repository was archived by the owner on Feb 12, 2024. It is now read-only.

Add support for custom environment with Jupyter #93

@mgoeminne


Is your feature request related to a problem? Please describe.

No, it's a suggestion for improving the functional coverage of FADI.

Describe the solution you'd like

A data scientist can use Jupyter Hub to iteratively explore data sets and provide technical solutions to various problems.

To do so, she frequently has to change the Jupyter environment of her notebooks, for example to include a specific package or to test an alternative processing framework. Typically, each project or use case can have one or more dedicated environments that change daily or weekly.

FADI should support this kind of dynamic adaptation to the data scientist's needs by providing a way to efficiently manage extra dependencies.

For instance, a web application could be provided for specifying, adapting or copying the environment right before instantiating Jupyter Hub. An interesting feature would be the ability to inherit from existing environments and to share them among stakeholders.
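To make the inheritance idea concrete, here is a minimal sketch of how a child environment could extend a parent one. Everything in it is an assumption for illustration: `EnvironmentSpec`, its fields and the package pins are hypothetical and are not part of FADI or Jupyter Hub.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class EnvironmentSpec:
    """Hypothetical description of a notebook environment (not a FADI API)."""

    name: str
    # Maps a package name to a version pin.
    packages: Dict[str, str] = field(default_factory=dict)
    # Optional environment this one inherits from.
    parent: Optional["EnvironmentSpec"] = None

    def resolve(self) -> Dict[str, str]:
        """Return the effective packages: the parent's, overridden by our own."""
        resolved = self.parent.resolve() if self.parent else {}
        resolved.update(self.packages)
        return resolved


# A shared base environment and a project environment that inherits from it.
base = EnvironmentSpec("base", {"numpy": "1.26", "pandas": "2.1"})
project = EnvironmentSpec(
    "forecasting", {"pandas": "2.2", "prophet": "1.1"}, parent=base
)
effective = project.resolve()
```

In this sketch the child keeps the parent's `numpy` pin, overrides `pandas`, and adds `prophet`, while the parent's own definition stays untouched so it can still be shared with other projects.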

Describe alternatives you've considered

The currently recommended way to do it is to adapt the Helm values file of the underlying Kubernetes cluster and to restart the appropriate services. This is not really acceptable for an end user.

An alternative is to specify the additional dependencies with "conda install"-like commands at the beginning of the notebooks, but that makes these specifications notebook-specific. It also means the additional dependencies must be installed each time the notebook is loaded. Environment variables and secrets must be set in the notebooks, which raises security issues, among other drawbacks.

Additional context

Please have a look at how Domino provides this feature. Basically, a Dockerfile can be edited by the end user to personalize the environment.

A nice optimization would be to cache popular, recent or frequently used environments, so that notebooks using these environments start faster.
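One possible way to approach such caching, sketched under assumptions: derive a deterministic key from the environment specification and reuse an already built image whenever the key matches. The `ImageCache` class, the `jupyter-env:` tag prefix and the build counter below are hypothetical stand-ins, not an existing FADI or Domino mechanism.

```python
import hashlib
import json
from typing import Dict


def environment_cache_key(packages: Dict[str, str]) -> str:
    """Derive a deterministic key from a package -> version mapping."""
    # Sorting the keys makes the result independent of insertion order.
    canonical = json.dumps(packages, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


class ImageCache:
    """Hypothetical in-memory cache from environment keys to image tags."""

    def __init__(self) -> None:
        self._images: Dict[str, str] = {}
        self.builds = 0  # counts how many (simulated) builds were needed

    def get_or_build(self, packages: Dict[str, str]) -> str:
        key = environment_cache_key(packages)
        if key not in self._images:
            # Stand-in for an expensive image build step.
            self.builds += 1
            self._images[key] = f"jupyter-env:{key}"
        return self._images[key]


cache = ImageCache()
first = cache.get_or_build({"numpy": "1.26", "pandas": "2.1"})
# Same packages in a different order: served from the cache, no new build.
second = cache.get_or_build({"pandas": "2.1", "numpy": "1.26"})
# A different specification triggers a new build.
third = cache.get_or_build({"numpy": "1.27"})
```

A real implementation would also need an eviction policy (for example by recency or popularity, as suggested above) and persistent storage for the built images; this sketch only covers the lookup.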

Labels

enhancement: New feature or request
