This repository contains the artifacts and full results for the research paper A Study of Library Usage in Agent-Authored Pull Requests, accepted at the 23rd International Conference on Mining Software Repositories (MSR '26), April 13-14, 2026, Rio de Janeiro, Brazil, and available on arXiv.
Coding agents are becoming increasingly capable of completing end-to-end software engineering workflows that previously required a human developer, including raising pull requests (PRs) to propose their changes. However, we still know little about how these agents use libraries when generating code, a core part of real-world software development. To fill this gap, we study 26,760 agent-authored PRs from the AIDev dataset to examine three questions: how often do agents import libraries, how often do they introduce new dependencies (and with what versioning), and which specific libraries do they choose? We find that agents often import libraries (29.5% of PRs) but rarely add new dependencies (1.3% of PRs); and when they do, they follow strong versioning practices (75.0% specify a version), an improvement on direct LLM usage where versions are rarely mentioned. Generally, agents draw from a surprisingly diverse set of external libraries, contrasting with the limited "library preferences" seen in prior non-agentic LLM studies. Our results offer an early empirical view into how AI coding agents interact with today's software ecosystems.
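To illustrate the versioning result above, one simple way to decide whether a dependency line specifies a version is to look for a PEP 440 comparison operator. This is a hypothetical sketch, not the repository's actual implementation; the function name `specifies_version` and the regex are our own:

```python
import re

# PEP 440 version comparison operators. "===" is listed first so the
# alternation does not stop at the shorter "==" prefix.
SPEC_RE = re.compile(r"(===|==|~=|!=|>=|<=|>|<)")

def specifies_version(line: str) -> bool:
    """Return True if a requirements.txt-style line pins or bounds a version."""
    line = line.split("#", 1)[0].strip()  # drop trailing comments and whitespace
    return bool(line) and SPEC_RE.search(line) is not None

print(specifies_version("requests==2.31.0"))  # True
print(specifies_version("numpy>=1.26,<2.0"))  # True
print(specifies_version("requests"))          # False
```

A production version would likely use a full requirements parser rather than a regex, but the sketch shows the distinction the paper's 75.0% figure rests on.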
This work is part of the MSR 2026 Mining Challenge, analysing the AIDev dataset, the first large-scale, openly available dataset of agent-authored pull requests from real-world GitHub repositories. The dataset was introduced by Li et al. and captures the emergence of autonomous coding agents in software engineering, providing a unique opportunity to study how AI teammates interact with real-world codebases and software ecosystems.
Dataset Version: This research utilises AIDev dataset revision eee0408a277826d88fc0ca5fa07d2fc325c96af1 (November 2025 snapshot).
The code requires Python 3.11 or later to run. Check your installed version with the command below; if needed, download and install it from python.org.
```shell
python --version
```

Now clone the repository code:

```shell
git clone https://github.com/itsluketwist/agent-library-usage
```

Once cloned, install the requirements locally in a virtual environment:

```shell
python -m venv .venv
. .venv/bin/activate
pip install .
```

After installation, all analysis is run through Jupyter notebooks in the `notebooks/` directory. Run the notebooks in order:
1. `01_download_dataset.ipynb` - Download and prepare the AIDev dataset
2. `02_explore_languages.ipynb` - Identify programming languages in the dataset
3. `03_analyze_library_usage.ipynb` - Analyse library usage patterns across all languages
4. `04_generate_latex_tables.ipynb` - Generate LaTeX tables for the research paper
Each notebook is self-contained and documents its purpose and outputs.
- `data/` - Downloaded AIDev dataset files (parquet format, git-ignored)
- `output/` - Generated analysis results:
  - `*_library_usage.json` - Per-language library usage data
  - `aggregated_statistics.json` - Summary statistics across all languages
  - `latex_tables.tex` - Generated LaTeX tables for the paper
- `src/` - Main project code:
  - `extractors/` - Language-specific library extractors:
    - `base.py` - Base extractor interface
    - `python.py` - Python import and requirements.txt extraction
    - `javascript.py` - JavaScript/TypeScript import and package.json extraction
    - `go.py` - Go import and go.mod extraction
    - `csharp.py` - C# using statements and .csproj extraction
    - `rust.py` - Rust use statements and Cargo.toml extraction
  - `pr_analyzer.py` - Analyse PRs for library usage patterns
  - `constants.py` - Shared constants and configurations
  - `main.py` - Main analysis entry point
- `notebooks/` - Jupyter notebooks for the analysis pipeline:
  - `01_download_dataset.ipynb` - Download and prepare the AIDev dataset
  - `02_explore_languages.ipynb` - Identify programming languages in the dataset
  - `03_analyze_library_usage.ipynb` - Analyse library usage patterns (generates `output/*.json`)
  - `04_generate_latex_tables.ipynb` - Generate LaTeX tables for the paper (4 languages: TypeScript, Python, Go, C#)
- `tests/` - Unit tests for extractors and analyser
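To give a sense of what a language-specific extractor does, the snippet below sketches Python import extraction using the standard-library `ast` module. It is illustrative only, assuming a simpler interface than the repository's `python.py`; the function name is ours:

```python
import ast

def extract_python_imports(source: str) -> set[str]:
    """Return the top-level module names imported by the given Python source."""
    modules: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import numpy.linalg as la" -> "numpy"
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module:
            # "from pandas.core import frame" -> "pandas"
            # (node.module is None for relative imports like "from . import x")
            modules.add(node.module.split(".")[0])
    return modules

code = "import numpy as np\nfrom pandas.core import frame\nimport os"
print(sorted(extract_python_imports(code)))  # ['numpy', 'os', 'pandas']
```

A real extractor additionally has to work on PR diffs rather than whole files, and to separate standard-library modules from external dependencies; the sketch only shows the parsing step.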
We use a few extra processes to ensure the code maintains high quality. First, clone the project and create a virtual environment as described above, then install an editable version of the project with the development dependencies:
```shell
pip install --editable ".[dev]"
```

This project includes unit tests to ensure correct functionality. Use pytest to run the tests with:

```shell
pytest tests
```

We use pre-commit to lint the code; run it using:

```shell
pre-commit run --all-files
```

We use uv for dependency management. First add new dependencies to `requirements.in`, then version lock with uv using:

```shell
uv pip compile requirements.in --output-file requirements.txt --upgrade
```

If you use this work in your research, please cite our paper:
ACM Reference Format:
Lukas Twist and Jie M. Zhang. 2026. A Study of Library Usage in Agent-Authored Pull Requests. In 23rd International Conference on Mining Software Repositories (MSR '26), April 13-14, 2026, Rio de Janeiro, Brazil. ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3793302.3793562
BibTeX:
```bibtex
@inproceedings{twist2026AgentLibraryUsage,
  title     = {{A Study of Library Usage in Agent-Authored Pull Requests}},
  author    = {Twist, Lukas and Zhang, Jie M.},
  booktitle = {Proceedings of the 23rd International Conference on Mining Software Repositories},
  series    = {MSR '26},
  location  = {Rio de Janeiro, Brazil},
  year      = {2026},
  month     = {April},
  publisher = {ACM},
  doi       = {10.1145/3793302.3793562},
}
```
In a fitting twist of irony, this repository, which analyses how AI coding agents use libraries, was itself developed with assistance from Claude Code, an AI coding agent. All code was thoroughly reviewed and validated by the authors, who remain responsible for the scientific interpretations and conclusions.