
Conversation

@danyeaw (Member) commented Dec 23, 2025

Updated version to replace #144, developed with @travishathaway.

This CEP outlines how native support for pure Python wheel packages could be achieved by adding support for them in repodata. When implemented, conda clients will be able to seamlessly install conda packages and pure Python wheels from enabled channels.

Checklist for submitter

  • I am submitting a new CEP: Repodata Wheel Support.
    • I am using the CEP template by creating a copy cep-0000.md named cep-XXXX.md in the root level.
  • I am submitting modifications to CEP XX.
  • Something else: (add your description here).

Checklist for CEP approvals

  • The vote period has ended and the vote has passed the necessary quorum and approval thresholds.
  • A new CEP number has been minted. Usually, this is ${greatest-number-in-main} + 1.
  • The cep-XXXX.md file has been renamed accordingly.
  • The # CEP XXXX - header has been edited accordingly.
  • The CEP status in the table has been changed to approved.
  • The last modification date in the table has been updated accordingly.
  • The table in the README has been updated with the new CEP entry.
  • The pre-commit checks are passing.

Co-authored-by: Travis Hathaway <travis.j.hathaway@gmail.com>

### Pixi Integrates with uv (Jan 2024)

Pixi changes course to use uv directly instead of rip, which unlocks features like editable installations, and git and path dependencies.
@pavelzw (Member) commented:

These are all now available for conda-only workflows through pixi-build.

@danyeaw (Member, Author) replied:

Hey @pavelzw, thanks so much for the feedback! Do you think we are missing a milestone in our brief history section? This pixi build feature is more about building path/git for conda packages than installing wheels, isn't it?

cep-XXXX.md Outdated

This CEP introduces a new optional `artifact_url` field in package records to specify download locations for individual packages.

> Note for this draft: The `artifact_url` field could also be added as a separate CEP to allow it for other record types.
Contributor commented:

I think that would actually be a good idea to avoid asymmetries in package record specifications. Either that or we explicitly mention this new field is for all package record types.

@danyeaw (Member, Author) replied:

I agree, if we have rough consensus that this is a good approach, we should probably split artifact_url into a separate CEP so that it can apply to all record types.

danyeaw and others added 2 commits December 24, 2025 08:18
Co-authored-by: Travis Hathaway <travis.j.hathaway@gmail.com>
@danyeaw (Member, Author) commented Dec 24, 2025

Thanks so much for all the updates, @travishathaway! I applied them locally and then pushed a commit 👍

@beckermr (Contributor) commented:

I'm going to leave my opinions on the general goals/ideas/features of this CEP in an effort to help bring other perspectives to the discussion.

TL;DR - As a conda-forge/core developer, I personally would NOT recommend folks use this feature, nor would I enable or offer support for this feature for conda-forge.

The CEP states

By adding native support for pure Python wheels to repodata, conda clients can:

  • Resolve dependencies across conda and PyPI packages in a single solve
  • Provide users with transparent access to the broader Python ecosystem
  • Maintain environment consistency and reproducibility
  • Eliminate the cognitive burden of managing two package managers
  • Fill gaps in conda package availability without requiring new conda builds
  • Reduce the maintenance burden by fully or partially eliminating the need to create and maintain conda recipes for pure-Python packages

Here is a point-by-point explanation of why in my estimation this feature simply would not work for conda-forge.

Resolve dependencies across conda and PyPI packages in a single solve

In a non-trivial fraction of cases, even for pure-Python packages, the requirements in conda-forge have subtle differences from the upstream requirements. These changes range from package renaming (e.g., - and _ are identical for Python packages, but not conda packages; matplotlib vs matplotlib-base; etc.) to more substantial changes that enable broader compatibility (e.g., upstream uses exact pins for dependencies, but conda-forge relaxes them because they are clearly not needed). I can imagine that when it comes to pure-Python packages that depend on a compiled backend package, things will also be funky in some cases.

These requirement differences will likely lead to some funky solves that either the conda or conda-forge developers will hear about.

Maintain environment consistency and reproducibility

Environment consistency is a tricky concept IMHO. For sure with this CEP one can in some cases create an environment where all constraints are satisfied. However, if the repodata is wrong, due to the issues outlined above, the formal consistency of the requirements doesn't really matter.

Reproducibility is an even trickier concept. For an environment to be reproducible, one needs the same solver run under the same conditions. Let's assume the conda version is fixed and the solver command is run on the same machine. Even then, the constraint of having the same conditions combined with this CEP in effect means that both the upstream wheel metadata and the conda channel metadata have to be the same. Given that the most likely source of wheels is pypi, there is no way one can promise those same conditions.

Even if we restrict ourselves to environments built from lock files, the combination of the conda channel with pypi as a source of wheels will also not always be reproducible. PyPI users can delete packages (as opposed to simply yanking packages) and those deletions will break even locked envs. We do not allow package deletions on conda-forge for this exact reason. Thus for conda-forge, we could not recommend using this feature for reproducible envs from lock files unless PyPI turns off the ability for users to delete files.

Eliminate the cognitive burden of managing two package managers

The vast majority of the cognitive burden is the differing and interacting repodata, not whether one types pip install or conda install. This CEP doesn't and cannot address that issue. Even consistently interpreting the repodata in a single solver doesn't address this issue.

Fill gaps in conda package availability without requiring new conda builds
Reduce the maintenance burden by fully or partially eliminating the need to create and maintain conda recipes for pure-Python packages

For the reasons stated above, I am personally skeptical that injecting pure-Python wheel metadata into the repodata would consistently result in correct-enough environments to eliminate the need for new conda builds or the need to repackage pure-Python packages in conda-forge. I am not saying this doesn't work some of the time. Instead I am saying that the solution proposed in this CEP is not so much better than the current "conda, then pip" solve that it can achieve the arguably difficult goals above. Stated another way, in a world where this feature existed instead of the ability to pip-install on top of conda environments, conda-forge likely would still need/want to repackage everything.

Other comments

When there are naming differences between channels, wheel records MUST use the conda-forge package name as the standard.

This statement is a red flag for me on this CEP. First, conda-forge itself doesn't have an authoritative mapping of its own packages back to Python wheels. There are several approaches in the wild, and none of them is standardized into an automated bit of repodata that tools like conda/mamba/pixi/rattler can read and interact with. See the discussions of PURLs. Second, treating conda-forge as a special channel specifically in a CEP (as opposed to simply using it as a motivating use case) is definitely IMHO the antithesis of what a CEP is supposed to be. conda is a set of tools and standards and should not be singling out any one purveyor of conda packages.

!=X.Y.Z → Omit dependency (conda does not support version exclusions; see Limitations section)

I don't follow this comment. Can you clarify? For sure I have used this operator in the run section before (see, e.g., https://github.com/conda-forge/ngmix-feedstock/blob/main/recipe/meta.yaml#L30). At minimum, any requirement that is foo!=xyz should be added into the constrains section of the repodata. If it is in the wheel's run section, one should add the package without any constraints in the run section of the repodata as well. The combination of these two additions should achieve the same effect as "install foo, but not version xyz."

@beckermr (Contributor) commented:

Here is one other point that I think is worth considering.

One way I can imagine conda-forge using this feature is through only injecting items into packages.whl via repodata patching. In other words, instead of opening up everything on, e.g., pypi, we have a structured process where only specific pure-Python wheels from pypi are added via repodata patching.

This procedure has some advantages for conda-forge that might actually be worth considering more generally. These are

  • It would let us directly control/patch the final repodata entries so that we can ensure the injections are not breaking or destructive. This procedure solves the main issue IMHO with the CEP here, namely that the repodata itself between pypi and conda-forge is simply not compatible.
  • conda-forge could itself store a copy of the wheel artifacts so that things are reproducible for lock files (i.e., no deletions of wheels from some external source). Given our current storage options, conda-forge would likely use a tool like conda-press to put the artifacts directly into anaconda.org and/or a mirror as conda artifacts.

One issue I have left unaddressed here is testing new repodata entries before they are added. We'd want to build at least one test environment and ensure the package, at minimum, imports before we push it out to the world.

One thing I am noticing is that as we add these additional requirements and desires, it seems almost simpler for conda-forge to use its existing feedstock infrastructure. We'd likely have to build a new "staged-wheels" system to manage this kind of process.

@danyeaw (Member, Author) commented Dec 26, 2025

Hi @beckermr, thanks so much for your time reviewing and responding to this draft, I really appreciate it! I would like to use your feedback to strengthen the draft.

Version exclusions

At minimum any requirement that is foo!=xyz should be added into the constrains section of the repodata.

Thanks so much for pointing this out. My updated understanding is that Requires-Dist: numpy>=1.20.0,!=1.24.0 from wheel metadata would be translated to:

{
  "name": "pandas",
  "version": "2.0.0",
  "depends": [
    "numpy >=1.20.0"
  ],
  "constrains": [
    "numpy !=1.24.0"
  ]
}

I'll update the CEP to make sure that is clear.
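
For illustration, here is a minimal sketch of that translation using the packaging library (the function name and the exact routing rules are assumptions based on this discussion, not spec text):

from packaging.requirements import Requirement

def requires_dist_to_repodata(requires_dist: str) -> dict:
    # Split a PEP 508 requirement into conda-style depends/constrains,
    # routing != exclusions into constrains as suggested above.
    req = Requirement(requires_dist)
    depends, constrains = [], []
    for spec in req.specifier:
        clause = f"{req.name} {spec.operator}{spec.version}"
        (constrains if spec.operator == "!=" else depends).append(clause)
    if not depends:
        depends.append(req.name)  # exclusion-only requirement: still require the package
    return {"depends": depends, "constrains": constrains}

requires_dist_to_repodata("numpy>=1.20.0,!=1.24.0")
# -> {'depends': ['numpy >=1.20.0'], 'constrains': ['numpy !=1.24.0']}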

Mapping names centrally to conda-forge

First, conda-forge itself doesn't have an authoritative mapping of its own packages back to python wheels....

Second, treating conda-forge as a special channel specifically in a CEP (as opposed to simply using it as a motivating usecase), is definitely IMHO the antithesis of what a CEP is supposed to be.

Great points. We were probably trying too hard to make the community approach the standard, but as you point out, it isn't currently standardized.

I think having the wheel index own the mapping is still the right approach. What if we have the index declare which channel it is mapping names to? For example, we could add an optional field called name_mapping_channel to the info section like:

{
  "info": {
    "subdir": "noarch",
    "base_url": "https://repo.example.com/channel/",
    "name_mapping_channel": "conda-forge"
  },
  "packages.whl": {
    "requests-2.32.5": { ... }
  }
}

What do you think about this idea?

PyPI can delete packages

PyPI users can delete packages (as opposed to simply yanking packages) and those deletions will break even locked envs. We do not allow package deletions on conda-forge for this exact reason. Thus for conda-forge, we could not recommend using this feature for reproducible envs from lock files unless PyPI turns off the ability for users to delete files.

Another great point that we could address in the CEP!

There has been discussion in the PyPI community over the last year about standardizing the deletion policy. For example, there was the withdrawn PEP 763 and the discuss.python.org topic about it. The consensus came down to:

  1. In general there is consensus that limiting deletion would be great
  2. Unfortunately, PyPI currently has size quotas, and users need the ability to delete files to manage them

This would be a tradeoff of directly using PyPI packages. Users would get access to thousands of packages with no extra hosting requirements, but they would also be subject to how PyPI currently works. Someone using a PyPI wheel repodata would have to decide if that is a good tradeoff for them.

However, the lock file formats (rattler-lock-v6 and conda-lock-v1) already support a hybrid ecosystem with PyPI sections in the lockfiles. If someone wants to use a wheels channel directly from PyPI, it isn't better or worse than what we have right now for reproducibility. In fact, channels that mirror/store wheels (as you suggest below) would actually improve reproducibility compared to the current "conda then pip" workflow.

conda-forge could itself store a copy of the wheel artifacts so that things are reproducible for lock files (i.e., no deletions of wheels from some external source).

As you point out, there are workflows where we could make the system more reproducible than we have now. Do you think that the CEP should recommend that production channels mirror wheels to ensure reproducibility, rather than relying directly on PyPI URLs?

Downstream patching ability for conda-forge (and other ecosystems)

In a non-trivial fraction of cases, even for pure-Python packages, the requirements in the conda-forge have subtle differences from the upstream requirements.

I would love to help find the right solution to this! Thanks again for the really valuable perspective. I think there are two complementary approaches we should take:

  1. Push improvements upstream - we should engage with Python projects even more than we do now when we find metadata issues - opening PRs for incorrect dependencies, version constraints, etc. This is the sustainable long-term solution.

  2. Support downstream repodata patching - The CEP should explicitly support patching packages.whl entries, just like conda packages today. As you point out, this is essential for:

  • Immediate fixes - Can't wait for upstream releases when dependencies are breaking environments
  • Quality control - Test packages before exposing to users
  • Ecosystem needs - Name mappings, relaxed constraints for conda compatibility
  • Reproducibility - Optionally mirror/store wheel artifacts so they can't be deleted

The vast majority of the cognitive burden is the differing and interacting repodata, not whether one types pip install or conda install. This CEP doesn't and cannot address that issue.

You're absolutely right that there is a burden caused by conflicting metadata. Repodata patching is how we could address this: channels can correct metadata conflicts so users don't encounter them. However, I think the current client workflows are also a burden. Giving users seamless access to thousands of packages without requiring them to know whether they're from PyPI or conda channels would solve a huge pain point.

For the reasons stated above, I am personally skeptical injecting pure-Python wheel metadata into the repodata would consistently result in a correct-enough environments to eliminate the need for new conda builds or the need to repackage pure-python Packages in conda-forge.

You're right to be skeptical about fully eliminating the need for feedstocks. This CEP won't replace conda-forge's packaging infrastructure - metadata differences mean many packages will still need proper conda recipes. This is about handling the simpler cases more efficiently. Think of it as an additional tool for easier pure-Python packages, not a replacement for feedstocks. I'll update the CEP to better capture this view.

Implementation plan for a wheel channel

One way I can imagine conda-forge using this feature is through only injecting items into packages.whl via repodata patching. In other words, instead of opening up everything on e.g., pypi, we instead have a structured process where only specific pure-Python wheels from pypi are added via repodata patching.

I am really liking your thoughts on how we could implement this, nice!

I am not sure if the implementation plan should be part of this CEP or not, so I would be grateful for everyone's thoughts on that. However, I really like where you are going with this plan, and I also envision some sort of phased approach. We could start with fully manual curation, then move toward semi-automation as we learn from the manual process. This balances a lower barrier (no recipes needed for many packages) with quality control. Complex packages should still use feedstocks, but this handles the simpler pure-Python case more efficiently. It would be amazing to use a conda-forge wheel channel as a test case for this if the community is interested.

Thanks again for all of the extremely valuable input, I'm looking forward to hearing more of your thoughts as we continue to refine this draft.

@h-vetinari commented:

Not commenting on the whole CEP (I share much of @beckermr's reservations, but currently don't have a strong opinion), just one aspect that's important to get right IMO:

I think having the wheel index own the mapping is still the right approach, what if we have the index declare which channel it is mapping names to. For example, we could add an optional field called name_mapping_channel to the info section like:

I think it would be a good idea to consider whether this can build on top of the proposed PEP 804 (discourse), which tries to standardize something usable around the whole name mapping issue.

It'd certainly be better if we can build on top of that (and help support that PEP) rather than inventing yet another scheme.

CC @jaimergp @rgommers @mgorny

Updates include:
- Clarifications on naming standards
- Channel mapping
- Patching capabilities for dependency management

# Conflicts:
#	cep-XXXX.md
@danyeaw (Member, Author) commented Dec 30, 2025

Hi @beckermr, I made updates to the relevant sections to fix version exclusions, remove the reliance on conda-forge for channel mapping, clarify that we need downstream patching ability, add an implementation options section, and add recommendations for protecting against PyPI packages being deleted. Thanks again for all of the great feedback.

Hi @h-vetinari

I think it would be a good idea to consider whether this can build on top of the proposed PEP 804

Thanks, that's a great point; we should build on this! I added a callout to PEP 804 in the Naming standard and channel mapping section.

@JeanChristopheMorinPerso left a comment:

I left mostly questions

This CEP has the following known limitations:

1. **Pure Python only:** This CEP explicitly does not address wheels with binary extensions, which require platform-specific compatibility guarantees beyond the current scope. Conda’s strength is binary compatibility, so using conda packages may be the optimal solution.
2. **Environment markers:** Only Python version markers are converted to dependencies. Other environment markers (OS, platform, etc.) are ignored based on the pure Python assumption.


Won't this create problems for packages that actually use OS/platform markers to depend on a package only for one platform? How would the channel operator deal with that? For example, consider:

  • package A: depends on H; platform == linux
  • package H: Only has a wheel for linux

Now, if a user was to try to install A on Windows, they would get a solver error. This seems wrong if A would still work completely fine on Windows without H.

Repodata patching is possible, but in the case I just showed, a channel operator would likely be forced to publish a bad package first and then patch the repodata entry, if the repodata is generated on the server side purely from the wheel metadata (as anaconda.org or conda-index does with conda packages, for example).
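
A minimal sketch of the marker evaluation at play here, using the packaging library; the key point is that markers are evaluated per-environment, which a single static noarch repodata record cannot express:

from packaging.markers import Marker

marker = Marker('platform_system == "Linux"')
# The same dependency is active on one platform and absent on another.
marker.evaluate({"platform_system": "Linux"})    # True
marker.evaluate({"platform_system": "Windows"})  # False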

@danyeaw (Member, Author) replied:

Hi @JeanChristopheMorinPerso, since conditional dependencies are covered by a separate CEP, what if we said that channel operators MAY patch, and that they SHOULD support further conditional dependencies as they become available?

Contributor commented:

I think it's important that we describe how to convert those markers when conditional dependencies do exist, because not all markers are trivial to convert (e.g., platform_machine, python_full_version vs python_version). I think that should be in this CEP, but I can also imagine that a follow-up could work.
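
A small sketch of one such non-trivial case, python_version vs python_full_version (using the packaging library; the environment values are illustrative):

from packaging.markers import Marker

env = {"python_version": "3.12", "python_full_version": "3.12.1"}
# python_version is only X.Y, while python_full_version is X.Y.Z, so the
# same-looking bound behaves differently depending on which variable is used.
Marker('python_version >= "3.12.1"').evaluate(env)       # False: "3.12" < "3.12.1"
Marker('python_full_version >= "3.12.1"').evaluate(env)  # True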

- `!=X.Y.Z` → Add to `constrains` field
- **Multiple specifiers:** Combine with commas (e.g., >=1.0,<2.0)
- **Python version requirements:** Convert Requires-Python to explicit python dependency
- **Environment markers:** Ignore markers other than Python version (pure Python assumption)
Contributor commented:

Pure Python packages can still depend on platform-specific packages, e.g. `tzdata; platform_system == "Windows"`.

Contributor commented:

Also, the Python implementation may be important (PyPy vs CPython, etc.).

- A shared relative or absolute `base_url` with all wheels in the same directory, by populating the `base_url` field and leaving the `artifact_url` field empty.
- A manual PyPI repository with wheels in directories named by package, by populating the absolute URL in the `artifact_url` field, or the `base_url` plus a relative path in the `artifact_url` field.
- External PyPI mirrors or CDNs using absolute URLs by populating the `artifact_url` field, for example to <https://files.pythonhosted.org/packages/.../package-1.0.0-py3-none-any.whl>
- Mixed sources within the same repodata file
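
As an illustration, client-side resolution of these cases might look like the following sketch (the precedence order is inferred from the list above and is not normative):

from urllib.parse import urljoin

def resolve_wheel_url(channel_url: str, info: dict, record: dict, filename: str) -> str:
    url = record.get("artifact_url")
    # An absolute artifact_url wins outright.
    if url and url.startswith(("http://", "https://")):
        return url
    # Otherwise resolve relative to base_url, falling back to the channel URL.
    base = (info.get("base_url") or channel_url).rstrip("/") + "/"
    # Relative artifact_url if present, else the wheel filename under base_url.
    return urljoin(base, url or filename)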
@baszalmstra (Contributor) commented:

How is the artifact_url populated for conda packages? Should the index.json file include that? Similarly for wheels. How are they indexed?

@danyeaw (Member, Author) replied:

Hey @baszalmstra, I was thinking that if we want artifact_url support for conda packages, we should make it a separate CEP (it sounds like a good idea to me, and I can draft one). I don't think the index.json currently contains URLs, so I don't see why we would add them for this new field.

Contributor replied:

Makes sense to do that in a separate CEP indeed. The information missing is how this field should be populated by conda-index/rattler-index.

Contributor commented:

The final URL cannot be part of index.json because the package doesn't "know" where it will be served from (same as sha256, it cannot predict its own hash). This information is only available to the indexing tool.

@jaimergp (Contributor) left a comment:

I have read the CEP and added a few comments. I am supportive of having better PyPI support in the conda ecosystem so users don't have to try their luck with multiple overlapping tools, but I have reservations with the scope and direction of this proposal.

First, it doesn't tell the whole story. The current CEP focuses on pre-processing wheel metadata in a friendly way for the conda solver. It doesn't go into the details and nuance of name mapping (with the complications it brings!), or what to do with the solved records once a solution is found: how to install the wheel, how to cache it, how to populate its conda-meta/*.json metadata. Ideally, the CEP would cover the whole story, or at least contextualize where it sits in the pipeline and which other CEPs to consult to get the full story.

I have reservations about the metadata-only approach too, and I think this needs to be better discussed in the Rejected ideas section. Why this approach is so desirable compared to the others should go in a Rationale section. IIUC, this is because:

  • Mirroring wheels is expensive and maybe unnecessary
  • Metadata patching is desirable
  • Easier to implement from the solver side of things

Also (unrelated, but worth exploring), if we do go this way, the proposal doesn't need to limit the strategy to wheels only. If we wanted to offer an agnostic field, like packages.*, where any package format could be allowed, it wouldn't take much more work:

  • url shows where to download the artifact from
  • fn informs of the format, and the client would know how to deal with it
  • depends and others need to conform to conda conventions and name mappings

(This is to say that a big part of the specification right now doesn't seem to be very wheel-specific).
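
For instance, such an agnostic record could look like the following sketch (the packages.generic key and all field values are hypothetical, chosen only to show that nothing here is wheel-specific):

"packages.generic": {
  "some_pkg-1.0.0-py3-none-any.whl": {
    "name": "some-pkg",
    "version": "1.0.0",
    "fn": "some_pkg-1.0.0-py3-none-any.whl",
    "url": "https://example.com/artifacts/some_pkg-1.0.0-py3-none-any.whl",
    "depends": ["python >=3.9"]
  }
}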


That aside, there are some editorial changes that would need to be made, but let's get there once we have better agreement on a direction.


### Add more conda packages

Create and maintain new conda packages for each PyPI dependency needed. Tools like [Grayskull] exist to make this conversion easier. However, this is a significant workload for the community, with over half of all conda-forge packages being pure Python. Even with more dedicated resources, creating recipes for over 400 thousand pure Python packages is not achievable.
Contributor commented:

I could see a conda-forge/wheels-index repository where the automation pipelines are maintained and folks add their requests to:

  1. Add a project to the watchlist so it is included in the index
  2. Yank certain wheels
  3. Repodata patch
  4. Import and archive pure wheel feedstocks
  5. etc

I don't know if the resulting repodata would be added to conda-forge proper, or maybe a separate conda-forge-wheels channel (to keep the production channel lighter).
