@davidhassell
Contributor

Fixes #361

Lots of files touched, but most of them are just moving the odd import from the module level to within a method.

Three main areas:

1. Doc string rewriting

  • Only apply substitutions when necessary, rather than trying every possible substitution for every doc string. Only 3009 of the doc strings need rewriting, and each one of those only utilises a small number of the 110 possible substitutions.
  • ~20% of the speed-up.
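A minimal sketch of the "only substitute when necessary" idea, not the actual cfdm implementation. The `{{...}}` placeholder syntax, the two-entry `SUBSTITUTIONS` table, and the function name `rewrite_docstring` are all assumptions made for illustration. Rather than trying every substitution against every doc string, it skips doc strings with no placeholder and otherwise looks up only the placeholders that actually occur.

```python
import re

# Hypothetical substitution table (the PR mentions 110 real substitutions).
SUBSTITUTIONS = {
    "{{class}}": "Field",
    "{{package}}": "cfdm",
}

# Assumed placeholder syntax: anything wrapped in double braces.
PLACEHOLDER = re.compile(r"\{\{.*?\}\}")


def rewrite_docstring(doc):
    """Apply only the substitutions that occur in ``doc``."""
    # Fast path: most doc strings contain no placeholder at all.
    if doc is None or "{{" not in doc:
        return doc

    # One pass over the text; each placeholder found is looked up
    # directly, and unknown placeholders are left untouched.
    return PLACEHOLDER.sub(
        lambda match: SUBSTITUTIONS.get(match.group(0), match.group(0)),
        doc,
    )


print(rewrite_docstring("Return a {{class}} created by {{package}}."))
```

The cost per doc string then scales with the number of placeholders it contains rather than with the size of the substitution table.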

2. Importing external modules that themselves have a slow import

  • Move the imports of dask, scipy, s3fs, zarr, h5netcdf and uritools from import time to run time. Many of them will never get imported, and when they do, the import time is usually negligible compared to the operation being run.
  • ~80% of the speed-up.
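A sketch of the pattern, assuming nothing about the real cfdm code: the stdlib `fractions` module stands in for a slow third-party dependency, and `parse_fraction` is a hypothetical function. The import statement sits inside the function body, so the cost is paid on the first call rather than when the enclosing module is imported.

```python
def parse_fraction(text):
    """Parse text such as "3/4" into a float."""
    # Deferred import: `fractions` is a stand-in for a heavy module.
    # It is loaded on the first call only; later calls find it in
    # sys.modules, so the repeated import statement is cheap.
    from fractions import Fraction

    return float(Fraction(text))


print(parse_fraction("3/4"))
```

The trade-off is that an import failure surfaces at the first call instead of at package import.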

3. Refactor CONSTANTS

As a consequence of 2., we can't initialise chunksize at import time (it needs dask). This prompted a bit of a refactor of cfdm.constants, cfdm.functions.ConstantAccess, and cfdm.configuration. Essentially, the original CONSTANTS dictionary is now removed and replaced with the dictionary cfdm.functions.ConstantAccess._constants. This dictionary starts off empty, and gets populated as and when configuration parameters are accessed. In particular, when cfdm.configuration is called, it makes a special effort to fully populate the dictionary. (There could be a way of doing this without the pre-populating, but I couldn't easily find one that didn't play havoc with the setting of constants in a context manager.)

@davidhassell davidhassell added this to the NEXTVERSION milestone Oct 22, 2025
@davidhassell davidhassell added the enhancement (New feature or request) and performance (Relating to speed and memory performance) labels Oct 22, 2025
@davidhassell
Contributor Author

missed a few imports :) 82b6cb2

A note on the thread lock in cfdm/data/locks.py - we don't need it to be serializable any more, since we no longer pass it to dask.array.from_array, so we can use the Python builtin. This change alone sped up the import by 33% from the previous commit :)
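For context, a small sketch of why serializability mattered: the builtin `threading.Lock` works as an ordinary mutex but cannot be pickled, so it is only suitable when the lock never has to be shipped anywhere. The helper `is_picklable` is a hypothetical name used for illustration.

```python
import pickle
import threading

lock = threading.Lock()

# The builtin lock works as a normal context manager.
with lock:
    pass


def is_picklable(obj):
    """Return True if ``obj`` survives pickle.dumps."""
    try:
        pickle.dumps(obj)
    except (TypeError, pickle.PicklingError):
        return False
    return True


print(is_picklable(lock))    # the builtin lock cannot be pickled
print(is_picklable([1, 2]))  # ordinary data can
```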

@davidhassell
Contributor Author

... and a few more: 9909c2f

@davidhassell davidhassell marked this pull request as draft October 27, 2025 09:37
@davidhassell
Contributor Author

Sorry, Sadie (although I know you haven't looked, yet :)) - converting to draft as I'm still fiddling!

@davidhassell davidhassell marked this pull request as ready for review October 30, 2025 18:24
Member

@sadielbartholomew sadielbartholomew left a comment

Very hot off the press, and a really nice thing to note first of all: 'PEP 810 – Explicit lazy imports' was officially accepted yesterday. From Python 3.15 there will be no need to move any heavier imports to first run-time use; instead they can be imported at the top level with the new lazy keyword, e.g. lazy from scipy.sparse import issparse. (We'd probably need to wait until 3.15 is our minimum supported version to avoid lots of conditional logic on imports, so that is many years away...) By then we can move all of the shifted imports back home 😃

In the meantime - and concerning the other aspect, our rather niche docstring rewriting - great PR. Does what it says on the tin, with import time reduced on my system from:

$ git checkout main  # before
$ python -X importtime -c "import cfdm"
...
...
import time:      4581 |    1130718 | cfdm
$ git checkout david/docstring-rewrite-speed # after
$ python -X importtime -c "import cfdm"
...
...
import time:      2022 |     291805 | cfdm

so by my calculation 291805/1130718 * 100 ≈ 25.8% of the previous import time - nice! (Looks like imports are slower on my system than yours, at least with the wind blowing in whatever direction it was at the time, but it's a similar pattern of improvement comparing main to this branch. Not sure there's much point doing multiple tries, since caching effects may reduce the time taken on any further attempts.) Tests still pass and functionality is unaffected as far as I can otherwise tell.
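The arithmetic can be reproduced from the two summary lines quoted above. This is only a sketch: `parse_importtime` is a hypothetical helper, and it relies on the `-X importtime` line layout of self time, cumulative time (both in microseconds) and module name.

```python
import re

# Layout: "import time: <self us> | <cumulative us> | <module name>"
LINE = re.compile(r"import time:\s+(\d+)\s+\|\s+(\d+)\s+\|\s+(\S+)")


def parse_importtime(line):
    """Return (self_us, cumulative_us, module) for one output line."""
    match = LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not an importtime line: {line!r}")
    self_us, cumulative_us, module = match.groups()
    return int(self_us), int(cumulative_us), module


before = parse_importtime("import time:      4581 |    1130718 | cfdm")
after = parse_importtime("import time:      2022 |     291805 | cfdm")

# Cumulative time is the relevant figure: it includes all sub-imports.
ratio = after[1] / before[1]
print(f"{before[1] / 1e6:.2f} s -> {after[1] / 1e6:.2f} s ({ratio:.1%})")
```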

Only minor comments raised, except regarding the removal of the version constraint checks in __init__.py - see in-line comment. Once you've considered those, please merge!

__cf_version__ = core.__cf_version__
__version__ = core.__version__

_requires = core._requires + (
Member

Do you mean to get rid of all of this version checking? We can still do it without having to actually import said modules, by using importlib.metadata to see what is available in the existing environment, e.g.:

>>> from importlib import metadata
>>> metadata.version("cftime")
'1.6.4'
>>> metadata.version("netCDF4")
'1.7.2'
>>> metadata.version("dask")
'2025.7.0'

(Good old metadata coming in useful!) It would be a shame to lose the useful checks in the name of import speed...

Contributor Author

Ah - I didn't know about this, thanks. I'll try it out and make a new commit if all goes well (not today!). Playing on the command line, each metadata.version call takes ~0.5 milliseconds - so not too expensive :)

Contributor Author

... Having chatted about this offline, we decided to leave things as they are here for now, but to open another issue to look into what we do and don't want to do in this area (which I will do soon).

davidhassell and others added 8 commits November 4, 2025 17:45
Co-authored-by: Sadie L. Bartholomew <sadie.bartholomew@ncas.ac.uk>
@davidhassell
Contributor Author

Thanks for the review, Sadie.

Labels

enhancement (New feature or request), performance (Relating to speed and memory performance)

Development

Successfully merging this pull request may close these issues.

Reduce the time taken to import cfdm

2 participants