Conversation
Thanks for the PR! @TheaperDeng I'm wondering whether it would be better to avoid merging notebooks directly into our repo. We could instead have a README file in this quickstart folder and link to externally hosted Colab notebooks. That way, users could start running the notebooks directly by clicking the Colab links. WDYT?
I suggest we store our .ipynb files directly in the repository. Since notebooks are text-based (JSON) rather than binary, they are compatible with version control, even if Git diffs can be a bit cluttered. This is standard practice in many major projects; see for example https://github.com/pytorch/pytorch/tree/main/functorch/docs/source/tutorials. To improve accessibility, we can include an "Open in Colab" badge at the top of each notebook using the following snippet:
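For reference, the standard Colab badge markdown looks roughly like this (the notebook path is a placeholder to be filled in once the files are merged):

```markdown
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/TRAIS-Lab/dattri/blob/main/<path-to-notebook>.ipynb)
```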
Ok. Sounds good.
Another comment: do we want to keep the package installation command in the notebook? First, we already provide an installation guide in our README, so we could just point users to that guide or repeat the instructions in a text block in the notebook. Second, the current one-line command (!pip install dattri) won't get all the dependencies ready anyway (e.g., PyTorch will still be missing), so it might be misleading.
That is a good point. The Colab notebook serves as a quick start for users to experience dattri firsthand, and a "one-click" (Run All) experience is much better, especially since PyTorch is pre-installed on Colab. I suggest we keep the installation block for convenience but clearly state that it is intended only for the Colab environment.
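For illustration, a minimal sketch of what such a Colab-only notebook cell could look like (exact wording up to you):

```python
# Colab-only setup cell: PyTorch is pre-installed on Colab, so installing dattri
# alone is enough to run this notebook end to end. For a local environment,
# follow the installation guide in the README instead.
!pip install dattri
```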
TheaperDeng left a comment
Thanks! Could you add a test script after https://github.com/TRAIS-Lab/dattri/blob/main/.github/workflows/examples_test.yml#L54 to test the two quick start notebooks?
A good way to do this is:
pip install jupyter nbconvert # add this line to "Install dependencies"
jupyter nbconvert --to script your_notebook.ipynb
python your_notebook.py
Added test scripts to test the two quick start notebooks.
TheaperDeng left a comment
Please also check the command that converts the .ipynb files to .py files in the example test.
| "id": "o2mEZymgc0a4" | ||
| }, | ||
| "source": [ | ||
| "Note: The installation block in the notebook is specifically designed for Google Colab and the use cases in this notebook. Standard installation instructions can me found in the [README](https://github.com/TRAIS-Lab/dattri/blob/main/README.md#quick-start)." |
Standard installation instructions can be found
| "id": "lAg59xgUpsGX" | ||
| }, | ||
| "source": [ | ||
| "Note: The installation block in the notebook is specifically designed for Google Colab and the use cases in this notebook. Standard installation instructions can me found in the [README](https://github.com/TRAIS-Lab/dattri/blob/main/README.md#quick-start)." |
Standard installation instructions can be found
| "source": [ | ||
| "LDS Score: used to evaluate the overall performance of a data attribution method.\n", | ||
| "\n", | ||
| "* A score near 1 means the attribution method accurately predicts the model's response to data changes\n", |
LDS close to 1 means ...
| "id": "w7x4js5WvpTN" | ||
| }, | ||
| "source": [ | ||
| "LDS Score: used to evaluate the overall performance of a data attribution method.\n", |
Linear Datamodeling Score (LDS) is a metric used to evaluate the performance of data attribution methods on the counterfactual estimation task of predicting model behavior given different subsets of the training set.
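For reference, a common formulation from the datamodeling literature (notation here is mine, not taken from the notebook): sample random training subsets $S_1, \dots, S_m$, retrain on each, and compare the attribution-predicted effect with the actual model output,

$$
\mathrm{LDS}(\tau, z) = \rho_{\text{spearman}}\!\left(\left\{f\big(z;\, \theta^*(S_j)\big)\right\}_{j=1}^{m},\; \left\{\sum_{i \in S_j} \tau(z)_i\right\}_{j=1}^{m}\right),
$$

where $f(z;\theta)$ is the model output of interest on example $z$, $\theta^*(S_j)$ is a model trained on subset $S_j$, and $\tau(z)_i$ is the attribution score of training point $i$ for $z$.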
| "id": "yKAFh2xKeVxo" | ||
| }, | ||
| "source": [ | ||
| "Dictionary to manage and intialize different influence function algorithms with their specific configurations. Each key is a specific arritbution method and the corresponding value is a class constructor with some of its arguments already pre-filled." |
Dictionary to manage different influence function algorithms with their specific configurations. Each key is a specific attribution method and the corresponding value is a class constructor with default arguments.
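As a minimal sketch of that pattern (dummy class names below, not dattri's actual attributor classes):

```python
from functools import partial

# Dummy stand-ins for the influence-function attributor classes used in the
# notebook; only the dictionary-of-preconfigured-constructors pattern matters here.
class ExplicitAttributor:
    def __init__(self, task=None, device="cpu"):
        self.task, self.device = task, device

class CGAttributor:
    def __init__(self, task=None, max_iter=10):
        self.task, self.max_iter = task, max_iter

# Each key names an attribution method; each value is a constructor with some
# arguments pre-filled, so only the remaining arguments are supplied at use time.
attributor_map = {
    "explicit": partial(ExplicitAttributor, device="cpu"),
    "cg": partial(CGAttributor, max_iter=20),
}

attributor = attributor_map["cg"](task="dummy_task")  # instantiate the chosen method
```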
| "\n", | ||
| "\n", | ||
| "* Higher influence indicates that a particular data point is problematic for the model.\n", | ||
| "* Mislabeled samples will exert a stronger, often negative, influence on the model's traning process.\n", |
Please introduce self-attribution here (what self-attribution is, and why a higher self-attribution score indicates a noisy label). Check https://arxiv.org/pdf/1703.04730 Sec. 5.4 as a reference.
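For example, the explanation could be accompanied by something along these lines (stand-in scores below, not dattri's actual API):

```python
import torch

# Self-attribution (self-influence) assigns each training point the influence it
# has on its own loss. Mislabeled points tend to get unusually high scores,
# because the model has to "memorize" them, so ranking by self-influence surfaces
# likely label noise (cf. Koh & Liang 2017, Sec. 5.4).
self_influence = torch.rand(1000)  # stand-in for real self-attribution scores

k = 20
suspects = torch.topk(self_influence, k).indices  # top-k most suspicious points
print("Training indices to inspect for label noise:", suspects.tolist())
```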
Please also fix the coding style issues flagged by the "Lint with Ruff" test.
Description
Corrected two Colab notebooks based on "influence_function_noisy_label.py" and "influence_function_lds.py". The notebooks have descriptive comments and are set up to run in under 5 minutes, making them efficient for new users of the dattri library.