From 7888212e28efecb116cba2a75ea15c1e61ffcbd0 Mon Sep 17 00:00:00 2001
From: Justine Wezenaar
Date: Wed, 1 Oct 2025 19:31:36 -0400
Subject: [PATCH 01/10] add copilot-instructions.md with type hints

---
 .github/copilot-instructions.md | 71 +++++++++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)
 create mode 100644 .github/copilot-instructions.md

diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
new file mode 100644
index 0000000000000..b2dd317eba951
--- /dev/null
+++ b/.github/copilot-instructions.md
@@ -0,0 +1,71 @@
+# Pandas Copilot Instructions
+
+## Project Overview
+`pandas` is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
+
+
+
+## Type Hints
+
+pandas strongly encourages the use of PEP 484 style type hints. New development should contain type hints and pull requests to annotate existing code are accepted as well!
+
+### Style Guidelines
+
+Type imports should follow the from `typing import ...` convention. Your code may be automatically re-written to use some modern constructs (e.g. using the built-in `list` instead of `typing.List`) by the pre-commit checks.
+
+In some cases in the code base classes may define class variables that shadow builtins. This causes an issue as described in Mypy 1775. The defensive solution here is to create an unambiguous alias of the builtin and use that without your annotation. For example, if you come across a definition like
+
+```
+class SomeClass1:
+    str = None
+```
+
+The appropriate way to annotate this would be as follows
+
+```
+str_type = str
+
+class SomeClass2:
+    str: str_type = None
+```
+In some cases you may be tempted to use `cast` from the typing module when you know better than the analyzer. This occurs particularly when using custom inference functions. For example
+
+```
+from typing import cast
+
+from pandas.core.dtypes.common import is_number
+
+def cannot_infer_bad(obj: Union[str, int, float]):
+
+    if is_number(obj):
+        ...
+    else:  # Reasonably only str objects would reach this but...
+        obj = cast(str, obj)  # Mypy complains without this!
+        return obj.upper()
+```
+The limitation here is that while a human can reasonably understand that `is_number` would catch the `int` and `float` types mypy cannot make that same inference just yet (see mypy #5206. While the above works, the use of `cast` is strongly discouraged. Where applicable a refactor of the code to appease static analysis is preferable.)
+
+```
+def cannot_infer_good(obj: Union[str, int, float]):
+
+    if isinstance(obj, str):
+        return obj.upper()
+    else:
+        ...
+```
+With custom types and inference this is not always possible so exceptions are made, but every effort should be exhausted to avoid `cast` before going down such paths.
+
+### pandas-specific types
+
+Commonly used types specific to pandas will appear in pandas._typing and you should use these where applicable. This module is private for now but ultimately this should be exposed to third party libraries who want to implement type checking against pandas.
+
+For example, quite a few functions in pandas accept a `dtype` argument. This can be expressed as a string like `"object"`, a `numpy.dtype` like `np.int64` or even a pandas `ExtensionDtype` like `pd.CategoricalDtype`. Rather than burden the user with having to constantly annotate all of those options, this can simply be imported and reused from the pandas._typing module
+
+```
+from pandas._typing import Dtype
+
+def as_type(dtype: Dtype) -> ...:
+    ...
+```
+
+This module will ultimately house types for repeatedly used concepts like “path-like”, “array-like”, “numeric”, etc… and can also hold aliases for commonly appearing parameters like `axis`. Development of this module is active so be sure to refer to the source for the most up to date list of available types.
From e3114b268e9de428ac433a154a3de90666f664b4 Mon Sep 17 00:00:00 2001
From: Justine Wezenaar
Date: Wed, 1 Oct 2025 19:38:55 -0400
Subject: [PATCH 02/10] add links to mypy issues

---
 .github/copilot-instructions.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
index b2dd317eba951..cd3e280662bb8 100644
--- a/.github/copilot-instructions.md
+++ b/.github/copilot-instructions.md
@@ -1,4 +1,4 @@
-# Pandas Copilot Instructions
+# pandas Copilot Instructions
 
 ## Project Overview
 `pandas` is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
@@ -13,7 +13,7 @@ pandas strongly encourages the use of PEP 484 style type hints. New development
 
 Type imports should follow the from `typing import ...` convention. Your code may be automatically re-written to use some modern constructs (e.g. using the built-in `list` instead of `typing.List`) by the pre-commit checks.
 
-In some cases in the code base classes may define class variables that shadow builtins. This causes an issue as described in Mypy 1775. The defensive solution here is to create an unambiguous alias of the builtin and use that without your annotation. For example, if you come across a definition like
+In some cases in the code base classes may define class variables that shadow builtins. This causes an issue as described in [Mypy 1775](https://github.com/python/mypy/issues/1775#issuecomment-310969854). The defensive solution here is to create an unambiguous alias of the builtin and use that without your annotation. For example, if you come across a definition like
 
 ```
 class SomeClass1:
@@ -43,7 +43,7 @@ def cannot_infer_bad(obj: Union[str, int, float]):
         obj = cast(str, obj)  # Mypy complains without this!
         return obj.upper()
 ```
-The limitation here is that while a human can reasonably understand that `is_number` would catch the `int` and `float` types mypy cannot make that same inference just yet (see mypy #5206. While the above works, the use of `cast` is strongly discouraged. Where applicable a refactor of the code to appease static analysis is preferable.)
+The limitation here is that while a human can reasonably understand that `is_number` would catch the `int` and `float` types mypy cannot make that same inference just yet (see [mypy #5206](https://github.com/python/mypy/issues/5206). While the above works, the use of `cast` is **strongly discouraged**. Where applicable a refactor of the code to appease static analysis is preferable.)
 
 ```
 def cannot_infer_good(obj: Union[str, int, float]):
From 413792986a91fd38d0719c81e65344b0ff7aa591 Mon Sep 17 00:00:00 2001
From: Justine Wezenaar
Date: Thu, 2 Oct 2025 19:47:28 -0400
Subject: [PATCH 03/10] rename file to AGENTS.md

---
 .github/{copilot-instructions.md => AGENTS.md} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename .github/{copilot-instructions.md => AGENTS.md} (100%)

diff --git a/.github/copilot-instructions.md b/.github/AGENTS.md
similarity index 100%
rename from .github/copilot-instructions.md
rename to .github/AGENTS.md
From 7a94d4c947961200fc9ee7f93219d4615e0d3b49 Mon Sep 17 00:00:00 2001
From: Justine Wezenaar
Date: Thu, 2 Oct 2025 20:12:49 -0400
Subject: [PATCH 04/10] updated AGENTS.md based on feedback from copilot

---
 .github/AGENTS.md | 96 ++++++++++++++---------------------------------
 1 file changed, 29 insertions(+), 67 deletions(-)

diff --git a/.github/AGENTS.md b/.github/AGENTS.md
index cd3e280662bb8..16747c882144f 100644
--- a/.github/AGENTS.md
+++ b/.github/AGENTS.md
@@ -1,71 +1,33 @@
-# pandas Copilot Instructions
+# pandas Agent Instructions (Copilot etc)
 
 ## Project Overview
 `pandas` is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
 
-
-
-## Type Hints
-
-pandas strongly encourages the use of PEP 484 style type hints. New development should contain type hints and pull requests to annotate existing code are accepted as well!
-
-### Style Guidelines
-
-Type imports should follow the from `typing import ...` convention. Your code may be automatically re-written to use some modern constructs (e.g. using the built-in `list` instead of `typing.List`) by the pre-commit checks.
-
-In some cases in the code base classes may define class variables that shadow builtins. This causes an issue as described in [Mypy 1775](https://github.com/python/mypy/issues/1775#issuecomment-310969854). The defensive solution here is to create an unambiguous alias of the builtin and use that without your annotation. For example, if you come across a definition like
-
-```
-class SomeClass1:
-    str = None
-```
-
-The appropriate way to annotate this would be as follows
-
-```
-str_type = str
-
-class SomeClass2:
-    str: str_type = None
-```
-In some cases you may be tempted to use `cast` from the typing module when you know better than the analyzer. This occurs particularly when using custom inference functions. For example
-
-```
-from typing import cast
-
-from pandas.core.dtypes.common import is_number
-
-def cannot_infer_bad(obj: Union[str, int, float]):
-
-    if is_number(obj):
-        ...
-    else:  # Reasonably only str objects would reach this but...
-        obj = cast(str, obj)  # Mypy complains without this!
-        return obj.upper()
-```
-The limitation here is that while a human can reasonably understand that `is_number` would catch the `int` and `float` types mypy cannot make that same inference just yet (see [mypy #5206](https://github.com/python/mypy/issues/5206). While the above works, the use of `cast` is **strongly discouraged**. Where applicable a refactor of the code to appease static analysis is preferable.)
-
-```
-def cannot_infer_good(obj: Union[str, int, float]):
-
-    if isinstance(obj, str):
-        return obj.upper()
-    else:
-        ...
-```
-With custom types and inference this is not always possible so exceptions are made, but every effort should be exhausted to avoid `cast` before going down such paths.
-
-### pandas-specific types
-
-Commonly used types specific to pandas will appear in pandas._typing and you should use these where applicable. This module is private for now but ultimately this should be exposed to third party libraries who want to implement type checking against pandas.
-
-For example, quite a few functions in pandas accept a `dtype` argument. This can be expressed as a string like `"object"`, a `numpy.dtype` like `np.int64` or even a pandas `ExtensionDtype` like `pd.CategoricalDtype`. Rather than burden the user with having to constantly annotate all of those options, this can simply be imported and reused from the pandas._typing module
-
-```
-from pandas._typing import Dtype
-
-def as_type(dtype: Dtype) -> ...:
-    ...
-```
-
-This module will ultimately house types for repeatedly used concepts like “path-like”, “array-like”, “numeric”, etc… and can also hold aliases for commonly appearing parameters like `axis`. Development of this module is active so be sure to refer to the source for the most up to date list of available types.
+## Purpose
+- Assist contributors by suggesting code changes, tests, and documentation edits for the pandas repository while preserving stability and compatibility.
+
+## Persona & Tone
+- Concise, neutral, code-focused. Prioritize correctness, readability, and tests.
+
+## Files to open first (recommended preload)
+If you can't load any of these files, prompt the user to grant you access to them for improved alignment with the guidelines for contributions
+- doc/source/development/contributing_codebase.rst
+- doc/source/development/contributing_docstring.rst
+- doc/source/development/contributing_documentation.rst
+- doc/source/development/contributing.rst
+
+## Decision heuristics
+- Favor small, backward-compatible changes with tests.
+- If a change would be breaking, propose it behind a deprecation path and document the rationale.
+- Prefer readability over micro-optimizations unless benchmarks are requested.
+- Add tests for behavioral changes; update docs only after code change is final.
+
+## Type hints guidance (summary)
+- Prefer PEP 484 style and types in pandas._typing when appropriate.
+- Avoid unnecessary use of typing.cast; prefer refactors that convey types to type-checkers.
+- Use builtin generics (list, dict) when possible.
+
+## Docstring guidance (summary)
+- Follow NumPy / numpydoc conventions used across the repo: short summary, extended summary, Parameters, Returns/Yields, See Also, Notes, Examples.
+- Ensure examples are deterministic, import numpy/pandas as documented, and pass doctest rules used by docs validation.
+- Preserve formatting rules: triple double-quotes, no blank line before/after docstring, parameter formatting ("name : type, default ..."), types and examples conventions.
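Reviewer note on PATCH 04/10: the type-hint and docstring summaries above replace the longer worked examples that this patch removes. A minimal sketch of what following both summaries might look like is given here; the function name `shout` is hypothetical and is not part of pandas or of this patch series.

```python
from __future__ import annotations


def shout(obj: str | int | float) -> str:
    """
    Return an upper-cased text form of ``obj``.

    Parameters
    ----------
    obj : str, int or float
        Value to convert.

    Returns
    -------
    str
        ``obj`` upper-cased if it is a string, otherwise ``str(obj)``.
    """
    # isinstance narrows the union for the type checker, so typing.cast
    # is not needed in either branch.
    if isinstance(obj, str):
        return obj.upper()
    return str(obj)
```

The `isinstance` check is the same refactor the removed `cannot_infer_good` example illustrated, written here with builtin union syntax rather than `typing.Union`.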
From f1e952292a77642bc375c00822e6e7fa9a6e9817 Mon Sep 17 00:00:00 2001
From: Justine Wezenaar
Date: Wed, 8 Oct 2025 14:31:32 -0400
Subject: [PATCH 05/10] move AGENTS.md to root

---
 .github/AGENTS.md => AGENTS.md | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename .github/AGENTS.md => AGENTS.md (100%)

diff --git a/.github/AGENTS.md b/AGENTS.md
similarity index 100%
rename from .github/AGENTS.md
rename to AGENTS.md
From 947f263d466476109df578d94d9232649ddb3182 Mon Sep 17 00:00:00 2001
From: Justine Wezenaar
Date: Wed, 8 Oct 2025 14:33:26 -0400
Subject: [PATCH 06/10] update title

---
 AGENTS.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/AGENTS.md b/AGENTS.md
index 16747c882144f..60405bdf94148 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -1,4 +1,4 @@
-# pandas Agent Instructions (Copilot etc)
+# pandas Agent Instructions
 
 ## Project Overview
 `pandas` is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
From 44d85097a2d4fb6fbbd2b1f4f6e0485f42dcf96c Mon Sep 17 00:00:00 2001
From: Justine Wezenaar
Date: Wed, 15 Oct 2025 12:27:46 -0400
Subject: [PATCH 07/10] remove (useless) prompt to load .rst files

---
 AGENTS.md | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/AGENTS.md b/AGENTS.md
index 60405bdf94148..46316d9aa44d1 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -9,13 +9,6 @@
 ## Persona & Tone
 - Concise, neutral, code-focused. Prioritize correctness, readability, and tests.
 
-## Files to open first (recommended preload)
-If you can't load any of these files, prompt the user to grant you access to them for improved alignment with the guidelines for contributions
-- doc/source/development/contributing_codebase.rst
-- doc/source/development/contributing_docstring.rst
-- doc/source/development/contributing_documentation.rst
-- doc/source/development/contributing.rst
-
 ## Decision heuristics
 - Favor small, backward-compatible changes with tests.
 - If a change would be breaking, propose it behind a deprecation path and document the rationale.
From 7583929efcca57201e2e5c024f09c0ccab2978a0 Mon Sep 17 00:00:00 2001
From: Justine Wezenaar
Date: Wed, 15 Oct 2025 14:48:30 -0400
Subject: [PATCH 08/10] update pr template; re-add prompts to contributing guidelines docs and webpage

---
 .github/PULL_REQUEST_TEMPLATE.md | 1 +
 AGENTS.md | 8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
index 8eca91c692710..c6e93aee38a8b 100644
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -3,3 +3,4 @@
 - [ ] All [code checks passed](https://pandas.pydata.org/pandas-docs/dev/development/contributing_codebase.html#pre-commit).
 - [ ] Added [type annotations](https://pandas.pydata.org/pandas-docs/dev/development/contributing_codebase.html#type-hints) to new arguments/methods/functions.
 - [ ] Added an entry in the latest `doc/source/whatsnew/vX.X.X.rst` file if fixing a bug or adding a new feature.
+- [ ] If I used AI to develop this pull request, I prompted it to follow `AGENTS.md`.
diff --git a/AGENTS.md b/AGENTS.md
index 46316d9aa44d1..66689c7858706 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -9,6 +9,14 @@
 ## Persona & Tone
 - Concise, neutral, code-focused. Prioritize correctness, readability, and tests.
 
+## Project Guidelines
+- Be sure to follow all guidelines for contributing to the codebase specified at https://pandas.pydata.org/docs/development/contributing_codebase.html
+- These guidelines are also available in the following local files, which should be loaded into context and adhered to
+  - doc/source/development/contributing_codebase.rst
+  - doc/source/development/contributing_docstring.rst
+  - doc/source/development/contributing_documentation.rst
+  - doc/source/development/contributing.rst
+
 ## Decision heuristics
 - Favor small, backward-compatible changes with tests.
 - If a change would be breaking, propose it behind a deprecation path and document the rationale.
From 4104ea8157faea79638f15929b28dfe8e814363d Mon Sep 17 00:00:00 2001
From: Justine Wezenaar
Date: Wed, 15 Oct 2025 15:18:30 -0400
Subject: [PATCH 09/10] add PR section

---
 AGENTS.md | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/AGENTS.md b/AGENTS.md
index 66689c7858706..0f6af9457f0ca 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -32,3 +32,16 @@
 - Follow NumPy / numpydoc conventions used across the repo: short summary, extended summary, Parameters, Returns/Yields, See Also, Notes, Examples.
 - Ensure examples are deterministic, import numpy/pandas as documented, and pass doctest rules used by docs validation.
 - Preserve formatting rules: triple double-quotes, no blank line before/after docstring, parameter formatting ("name : type, default ..."), types and examples conventions.
+
+## Pull Requests (summary)
+- Pull request titles should be descriptive and include one of the following prefixes:
+  - ENH: Enhancement, new functionality
+  - BUG: Bug fix
+  - DOC: Additions/updates to documentation
+  - TST: Additions/updates to tests
+  - BLD: Updates to the build process/scripts
+  - PERF: Performance improvement
+  - TYP: Type annotations
+  - CLN: Code cleanup
+- Pull request descriptions should follow the template, and **succinctly** describe the change being made. Usually a few sentences is sufficient.
+- Pull requests which are resolving an existing Github Issue should include a link to the issue in the PR Description.
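Reviewer note on PATCH 09/10: the title-prefix rule added above is mechanical enough to check in code. The sketch below is only an illustration of that rule. The prefix tuple is copied from the list in this patch, while the helper `has_valid_prefix` is hypothetical and is not something the patch series or the pandas tooling is known to provide.

```python
# Prefixes copied from the "Pull Requests (summary)" list in this patch.
PR_TITLE_PREFIXES = ("ENH", "BUG", "DOC", "TST", "BLD", "PERF", "TYP", "CLN")


def has_valid_prefix(title: str) -> bool:
    """Return True if ``title`` is '<PREFIX>: <non-empty description>'."""
    # partition splits on the first colon; sep is "" when no colon exists.
    prefix, sep, rest = title.partition(":")
    return sep == ":" and prefix in PR_TITLE_PREFIXES and bool(rest.strip())
```

The check is deliberately case-sensitive and requires some text after the colon, which is one reading of "descriptive"; a looser or stricter policy would be equally compatible with the wording in the patch.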
From d188c2d43183a67ce8198dd2c13cb93852c2b2cd Mon Sep 17 00:00:00 2001
From: Justine Wezenaar
Date: Wed, 29 Oct 2025 11:40:11 -0400
Subject: [PATCH 10/10] add instruction to not annotate commits

---
 AGENTS.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/AGENTS.md b/AGENTS.md
index 0f6af9457f0ca..10b9f4f6e78fd 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -45,3 +45,4 @@
   - CLN: Code cleanup
 - Pull request descriptions should follow the template, and **succinctly** describe the change being made. Usually a few sentences is sufficient.
 - Pull requests which are resolving an existing Github Issue should include a link to the issue in the PR Description.
+- Do not add summaries or additional comments to individual commit messages. The single PR description is sufficient.