
Quick Start Colab notebooks #229

Open

carolinef35 wants to merge 7 commits into TRAIS-Lab:main from carolinef35:colab_examples

Conversation

@carolinef35

Description

Corrected two Colab notebooks based on "influence_function_noisy_label.py" and "influence_function_lds.py". The notebooks have descriptive comments and are set up to run in under 5 minutes, making them efficient for new users of the dattri library.

@TheaperDeng TheaperDeng changed the title Corrected Colab examples Quick start colab notebook Dec 19, 2025
@TheaperDeng TheaperDeng changed the title Quick start colab notebook Quick Start Colab notebooks Dec 19, 2025
@jiaqima
Contributor

jiaqima commented Dec 22, 2025

Thanks for the PR!

@TheaperDeng I'm wondering whether it would be better to avoid merging notebooks directly into our repo. We could instead have a README file in this quickstart folder and link to externally hosted Colab notebooks. That way, users could even start running the notebooks directly by clicking the Colab links.

WDYT?

@TheaperDeng
Collaborator

> Thanks for the PR!
>
> @TheaperDeng I'm wondering whether it would be better to avoid merging notebooks directly into our repo. We could instead have a README file in this quickstart folder and link to externally hosted Colab notebooks. That way, users could even start running the notebooks directly by clicking the Colab links.
>
> WDYT?

I suggest we store our .ipynb files directly in the repository. Since notebooks are text-based (JSON) rather than binary, they are compatible with version control, even if Git diffs can be a bit cluttered. This is standard practice in many major projects; for example: https://github.com/pytorch/pytorch/tree/main/functorch/docs/source/tutorials

To improve accessibility, we can include an "Open in Colab" badge at the top of each notebook using a short HTML snippet (just an example, not a final design).
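A sketch of such a badge; the notebook path in the `href` is a placeholder, not this PR's final location:

```html
<!-- "Open in Colab" badge; Colab can open notebooks hosted on GitHub
     via colab.research.google.com/github/<owner>/<repo>/blob/<branch>/<path> -->
<a href="https://colab.research.google.com/github/TRAIS-Lab/dattri/blob/main/examples/quickstart/your_notebook.ipynb" target="_parent">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>
```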

@jiaqima
Contributor

jiaqima commented Dec 22, 2025

> Thanks for the PR!
>
> @TheaperDeng I'm wondering whether it would be better to avoid merging notebooks directly into our repo. We could instead have a README file in this quickstart folder and link to externally hosted Colab notebooks. That way, users could even start running the notebooks directly by clicking the Colab links.
>
> WDYT?

> I suggest we store our .ipynb files directly in the repository. Since notebooks are text-based (JSON) rather than binary, they are compatible with version control, even if Git diffs can be a bit cluttered. This is standard practice in many major projects; for example: https://github.com/pytorch/pytorch/tree/main/functorch/docs/source/tutorials
>
> To improve accessibility, we can include an "Open in Colab" badge at the top of each notebook using a short HTML snippet.

Ok. Sounds good.

@jiaqima
Contributor

jiaqima commented Dec 22, 2025

Another comment: do we want to keep the package installation command in the notebook? First, we have already provided an installation guide in our README, so we could just point users to that guide or repeat the instructions in a text block in the notebook. Second, the current one-line command (!pip install dattri) won't get all dependencies ready by itself (e.g., PyTorch will still be missing), so it might be misleading.

@TheaperDeng
Collaborator

> Another comment: do we want to keep the package installation command in the notebook? First, we have already provided an installation guide in our README, so we could just point users to that guide or repeat the instructions in a text block in the notebook. Second, the current one-line command (!pip install dattri) won't get all dependencies ready by itself (e.g., PyTorch will still be missing), so it might be misleading.

That is a good point. I suggest we:

  • Retain the installation block in the notebook, but clarify that it is specifically designed for Google Colab and the use cases in this notebook.
  • Direct users to the README for the standard installation guide with a link.

The Colab notebook serves as a quick start for users to experience dattri firsthand. A "one-click" (Run All) experience is much better, especially since PyTorch is pre-installed. While we should keep the installation block for convenience, we should clearly state that it is intended only for the Colab environment.
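One way to make the install cell clearly Colab-only is to gate it on a runtime check; this is just a sketch, and the `google.colab` module check is an assumption, not code from this PR:

```python
import importlib.util

# Google Colab exposes a `google.colab` module; use its presence
# to decide whether the quick-install path applies.
IN_COLAB = importlib.util.find_spec("google.colab") is not None

if IN_COLAB:
    # Colab pre-installs PyTorch, so `!pip install dattri` alone is
    # enough there (run it as a notebook cell, not plain Python).
    pass
else:
    print("Not running on Colab: follow the README installation guide.")
```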

@jiaqima jiaqima closed this Jan 19, 2026
@jiaqima jiaqima reopened this Jan 19, 2026
Collaborator

@TheaperDeng TheaperDeng left a comment


Thanks! Could you add a test script after https://github.com/TRAIS-Lab/dattri/blob/main/.github/workflows/examples_test.yml#L54 to test the two quick start notebooks?

A good way to do this is:

```shell
pip install jupyter nbconvert  # add this line to "Install dependencies"

jupyter nbconvert --to script your_notebook.ipynb
python your_notebook.py
```

Collaborator

@TheaperDeng TheaperDeng left a comment


Please also check the command that converts the ipynb file to a py file in the example test.

```
"id": "o2mEZymgc0a4"
},
"source": [
"Note: The installation block in the notebook is specifically designed for Google Colab and the use cases in this notebook. Standard installation instructions can me found in the [README](https://github.com/TRAIS-Lab/dattri/blob/main/README.md#quick-start)."
```
Collaborator


Standard installation instructions can be found

```
"id": "lAg59xgUpsGX"
},
"source": [
"Note: The installation block in the notebook is specifically designed for Google Colab and the use cases in this notebook. Standard installation instructions can me found in the [README](https://github.com/TRAIS-Lab/dattri/blob/main/README.md#quick-start)."
```
Collaborator


Standard installation instructions can be found

```
"source": [
"LDS Score: used to evaluate the overall performance of a data attribution method.\n",
"\n",
"* A score near 1 means the attribution method accurately predicts the model's response to data changes\n",
```
Collaborator


LDS close to 1 means ...

```
"id": "w7x4js5WvpTN"
},
"source": [
"LDS Score: used to evaluate the overall performance of a data attribution method.\n",
```
Collaborator


Linear Datamodeling Score (LDS) is a metric used to evaluate the performance of data attribution methods on the counterfactual estimation task of predicting model behavior given different subsets of the training set.
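For reference, a hedged sketch of the usual definition (notation is illustrative, following the datamodels/TRAK literature rather than this PR's code): for a test example $z$, attribution scores $\tau(z) \in \mathbb{R}^n$ over $n$ training points, and $m$ random training subsets $S_1, \dots, S_m$,

```latex
\mathrm{LDS}(\tau, z)
  = \rho\Big(
      \big\{\textstyle\sum_{i \in S_j} \tau(z)_i\big\}_{j=1}^{m},\;
      \big\{f\big(z;\,\theta^{*}(S_j)\big)\big\}_{j=1}^{m}
    \Big),
```

where $\theta^{*}(S_j)$ is a model retrained on subset $S_j$, $f(z;\theta)$ is the model output on $z$, and $\rho$ is the Spearman rank correlation; averaging over test examples gives the reported score.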

"id": "yKAFh2xKeVxo"
},
"source": [
"Dictionary to manage and intialize different influence function algorithms with their specific configurations. Each key is a specific arritbution method and the corresponding value is a class constructor with some of its arguments already pre-filled."
Collaborator


Dictionary to manage different influence function algorithms with their specific configurations. Each key is a specific attribution method and the corresponding value is a class constructor with default arguments.

```
"\n",
"\n",
"* Higher influence indicates that a particular data point is problematic for the model.\n",
"* Mislabeled samples will exert a stronger, often negative, influence on the model's traning process.\n",
```
Collaborator


Please introduce self attribution here (what self attribution is, and why a higher self attribution score indicates a noisy label). Check https://arxiv.org/pdf/1703.04730 Sec 5.4 as a reference.
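As a sketch of the idea (toy scores, not dattri's API): self-influence is a training point's influence on its own loss, and mislabeled points tend to rank near the top.

```python
import numpy as np

# Toy self-influence scores for 5 training points (hypothetical values;
# in practice they come from scoring each training point against itself
# with an influence-function attributor).
self_influence = np.array([0.1, 2.5, 0.3, 0.05, 1.8])

# Rank points by self-influence, highest first; the top of this list is
# where mislabeled examples tend to concentrate (Koh & Liang 2017, Sec 5.4).
suspect_order = np.argsort(-self_influence)
print(suspect_order.tolist())  # → [1, 4, 2, 0, 3]
```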

@TheaperDeng
Collaborator

Please also fix the coding style issues in "Lint with Ruff" test

3 participants