This repository contains the artifacts and full results for the research paper A Study of Library Usage in Agent-Authored Pull Requests, accepted at the 23rd International Conference on Mining Software Repositories (MSR '26), April 13-14, 2026, Rio de Janeiro, Brazil, and available on arXiv.
Coding agents are becoming increasingly capable of completing end-to-end software engineering workflows that previously required a human developer, including raising pull requests (PRs) to propose their changes. However, we still know little about how these agents use libraries when generating code, a core part of real-world software development. To fill this gap, we study 26,760 agent-authored PRs from the AIDev dataset to examine three questions: how often do agents import libraries, how often do they introduce new dependencies (and with what versioning), and which specific libraries do they choose? We find that agents often import libraries (29.5% of PRs) but rarely add new dependencies (1.3% of PRs); and when they do, they follow strong versioning practices (75.0% specify a version), an improvement on direct LLM usage where versions are rarely mentioned. Generally, agents draw from a surprisingly diverse set of external libraries, contrasting with the limited "library preferences" seen in prior non-agentic LLM studies. Our results offer an early empirical view into how AI coding agents interact with today's software ecosystems.
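To illustrate the versioning result above, one simple way to decide whether a dependency line specifies a version is to look for a PEP 440 comparison operator. This is a hypothetical sketch, not the repository's actual implementation; the function name `specifies_version` and the regex are our own:

```python
import re

# PEP 440 version comparison operators. "===" is listed first so the
# alternation does not stop at the shorter "==" prefix.
SPEC_RE = re.compile(r"(===|==|~=|!=|>=|<=|>|<)")

def specifies_version(line: str) -> bool:
    """Return True if a requirements.txt-style line pins or bounds a version."""
    line = line.split("#", 1)[0].strip()  # drop trailing comments and whitespace
    return bool(line) and SPEC_RE.search(line) is not None

print(specifies_version("requests==2.31.0"))  # True
print(specifies_version("numpy>=1.26,<2.0"))  # True
print(specifies_version("requests"))          # False
```

A production version would likely use a full requirements parser rather than a regex, but the sketch shows the distinction the paper's 75.0% figure rests on.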
This work is part of the MSR 2026 Mining Challenge, analysing the AIDev dataset, the first large-scale, openly available dataset of agent-authored pull requests from real-world GitHub repositories. The dataset was introduced by Li et al. and captures the emergence of autonomous coding agents in software engineering, providing a unique opportunity to study how AI teammates interact with real-world codebases and software ecosystems.
Dataset Version: This research utilises AIDev dataset revision eee0408a277826d88fc0ca5fa07d2fc325c96af1 (November 2025 snapshot).
The code requires Python 3.11 or later to run. Check your installed version with the command below; if needed, download and install it from python.org.
```shell
python --version
```

Now clone the repository code:

```shell
git clone https://github.com/itsluketwist/agent-library-usage
```

Once cloned, install the requirements locally in a virtual environment:

```shell
python -m venv .venv
. .venv/bin/activate
pip install .
```

After installation, all analysis is run through Jupyter notebooks in the `notebooks/` directory. Run the notebooks in order:
1. `01_download_dataset.ipynb` - Download and prepare the AIDev dataset
2. `02_explore_languages.ipynb` - Identify programming languages in the dataset
3. `03_analyze_library_usage.ipynb` - Analyse library usage patterns across all languages
4. `04_generate_latex_tables.ipynb` - Generate LaTeX tables for the research paper
Each notebook is self-contained and documents its purpose and outputs.
- `data/` - Downloaded AIDev dataset files (parquet format, git-ignored)
- `output/` - Generated analysis results:
  - `*_library_usage.json` - Per-language library usage data
  - `aggregated_statistics.json` - Summary statistics across all languages
  - `latex_tables.tex` - Generated LaTeX tables for the paper
- `src/` - Main project code:
  - `extractors/` - Language-specific library extractors:
    - `base.py` - Base extractor interface
    - `python.py` - Python import and requirements.txt extraction
    - `javascript.py` - JavaScript/TypeScript import and package.json extraction
    - `go.py` - Go import and go.mod extraction
    - `csharp.py` - C# using statements and .csproj extraction
    - `rust.py` - Rust use statements and Cargo.toml extraction
  - `pr_analyzer.py` - Analyse PRs for library usage patterns
  - `constants.py` - Shared constants and configurations
  - `main.py` - Main analysis entry point
- `notebooks/` - Jupyter notebooks for the analysis pipeline:
  - `01_download_dataset.ipynb` - Download and prepare the AIDev dataset
  - `02_explore_languages.ipynb` - Identify programming languages in the dataset
  - `03_analyze_library_usage.ipynb` - Analyse library usage patterns (generates `output/*.json`)
  - `04_generate_latex_tables.ipynb` - Generate LaTeX tables for the paper (4 languages: TypeScript, Python, Go, C#)
- `tests/` - Unit tests for extractors and analyser
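To give a sense of what a language-specific extractor does, the snippet below sketches Python import extraction using the standard-library `ast` module. It is illustrative only, assuming a simpler interface than the repository's `python.py`; the function name is ours:

```python
import ast

def extract_python_imports(source: str) -> set[str]:
    """Return the top-level module names imported by the given Python source."""
    modules: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import numpy.linalg as la" -> "numpy"
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module:
            # "from pandas.core import frame" -> "pandas"
            # (node.module is None for relative imports like "from . import x")
            modules.add(node.module.split(".")[0])
    return modules

code = "import numpy as np\nfrom pandas.core import frame\nimport os"
print(sorted(extract_python_imports(code)))  # ['numpy', 'os', 'pandas']
```

A real extractor additionally has to work on PR diffs rather than whole files, and to separate standard-library modules from external dependencies; the sketch only shows the parsing step.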
We use a few extra processes to ensure the code maintains high quality. First, clone the project and create a virtual environment as described above, then install an editable version of the project with the development dependencies:
```shell
pip install --editable ".[dev]"
```

This project includes unit tests to ensure correct functionality. Use pytest to run the tests with:

```shell
pytest tests
```

We use pre-commit to lint the code; run it using:

```shell
pre-commit run --all-files
```

We use uv for dependency management. First add new dependencies to `requirements.in`, then version lock with uv using:

```shell
uv pip compile requirements.in --output-file requirements.txt --upgrade
```

If you use this work in your research, please cite our paper:
ACM Reference Format:
Lukas Twist and Jie M. Zhang. 2026. A Study of Library Usage in Agent-Authored Pull Requests. In 23rd International Conference on Mining Software Repositories (MSR '26), April 13-14, 2026, Rio de Janeiro, Brazil. ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3793302.3793562
BibTeX:
```bibtex
@inproceedings{twist2026AgentLibraryUsage,
  title     = {{A Study of Library Usage in Agent-Authored Pull Requests}},
  author    = {Twist, Lukas and Zhang, Jie M.},
  booktitle = {Proceedings of the 23rd International Conference on Mining Software Repositories},
  series    = {MSR '26},
  location  = {Rio de Janeiro, Brazil},
  year      = {2026},
  month     = {April},
  publisher = {ACM},
  doi       = {10.1145/3793302.3793562},
}
```
In a fitting twist of irony, this repository, which analyses how AI coding agents use libraries, was itself developed with assistance from Claude Code, an AI coding agent. All code was thoroughly reviewed and validated by the authors, who remain responsible for the scientific interpretations and conclusions.