40 changes: 19 additions & 21 deletions README.md
@@ -12,38 +12,37 @@
pyfive : A pure Python HDF5 file reader
=======================================

pyfive is an open source library for reading HDF5 files written using
``pyfive`` is an open source library for reading HDF5 files written using
pure Python (no C extensions). The package is still in development and not all
features of HDF5 files are supported.

pyfive aims to support the same API as [`h5py`](https://github.com/h5py/h5py)
for reading files. Cases where a file uses a feature that is supported by `h5py`
but not pyfive are considered bug and should be reported in our [Issues](https://github.com/NCAS-CMS/pyfive/issues).
Writing HDF5 is not a goal of pyfive and portions of the API which apply only to writing will not be
implemented.
``pyfive`` aims to support the same API as [`h5py`](https://github.com/h5py/h5py) for reading files.
Cases where a file uses a feature that is supported by ``h5py`` but not ``pyfive`` are considered bugs
and should be reported in our [Issues](https://github.com/NCAS-CMS/pyfive/issues).
Writing HDF5 output is not a goal of ``pyfive`` and portions of the API which apply only to writing will not be implemented.

Dependencies
============

pyfive is tested to work with Python 3.10 to 3.13. It may also work
with other Python versions.
``pyfive`` is tested against Python versions 3.10 to 3.14.
It may also work with other Python versions.

The only dependencies to run the software besides Python is NumPy.
The only dependency to run the software besides Python is ``numpy``.

Install
=======

pyfive can be installed using pip using the command::
``pyfive`` can be installed with ``pip`` using the command::

pip install pyfive

conda package are also available from conda-forge which can be installed::
``conda`` packages are also available from conda-forge::

conda install -c conda-forge pyfive

To install from source in your home directory use::

python setup.py install --user
pip install --user ./pyfive

The library can also be imported directly from the source directory.

@@ -54,21 +53,20 @@ Development
git
---

You can check out the latest pyfive souces with the command::
You can check out the latest ``pyfive`` sources with the command::

git clone https://github.com/NCAS-CMS/pyfive.git

testing
-------

pyfive comes with a test suite in the ``tests`` directory. These tests can be
exercised using the commands ``pytest`` from the root directory assuming the
``pytest`` package is installed.
``pyfive`` comes with a test suite in the ``tests`` directory.
These tests can be exercised using the ``pytest`` command from the root directory (requires installation of the ``pytest`` package).

Conda-feedstock
===============
Conda-forge feedstock
=====================

Package repository at [conda feedstock](https://github.com/conda-forge/pyfive-feedstock)
Package repository: [conda-forge feedstock](https://github.com/conda-forge/pyfive-feedstock)

Codecov
=======
@@ -78,6 +76,6 @@ Test coverage assessment is done using [codecov](https://app.codecov.io/gh/NCAS
Documentation
=============

Build locally with Sphinx::
Build locally with Sphinx:

sphinx-build -Ea doc doc/build
$ sphinx-build -Ea doc doc/build
3 changes: 2 additions & 1 deletion doc/_sidebar.rst.inc
@@ -8,7 +8,8 @@
Introduction <introduction>
Getting started <quickstart/index>
API Reference <api_reference>
The p5dump utility <p5dump>
Additional API Features <additional>
Optimising Data Access Speed <optimising>
The p5dump utility <p5dump>
Understanding Cloud Optimisation <cloud>
Change Log <changelog>
11 changes: 6 additions & 5 deletions doc/additional.rst
@@ -8,8 +8,10 @@ Modifications to the File API

When accessing a file, there are two additional modifications to the standard ``h5py`` API that can be used to optimise
performance. A new method (``get_lazy_view``) and an additional keyword argument on ``visititems`` (``noindex``) are provided
to support access to all dataset metadata without loading chunk indices. (Loading chunk indices at dataset
instantiation is mostly a useful optimisation, but not if you have no intent of accessing the data itself.)
to support access to all dataset metadata without loading chunk indices.

.. note::
Loading chunk indices at dataset instantiation is mostly a useful optimisation, but not if you have no intention of accessing the data itself.

The ``Group`` API is fully documented in the autogenerated API reference, but the additional methods and keyword arguments are highlighted here.
These methods are also available on the ``File`` class, since ``File`` is a subclass of ``Group``.
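
For illustration, a minimal sketch of how these two features might be used together; the file name and the printed
attributes are invented, and the exact signatures and return types should be checked in the API reference::

    import pyfive

    with pyfive.File("example.h5") as f:
        # Walk every group and dataset without loading any chunk indices.
        f.visititems(lambda name, obj: print(name, dict(obj.attrs)), noindex=True)

        # A lazy view of the file: dataset metadata is available immediately,
        # and chunk indices are only read if the data itself is accessed.
        view = f.get_lazy_view()
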
@@ -21,10 +23,9 @@ These methods are also available on the ``File`` class, since ``File`` is a subcl
Modifications to the DatasetID API
----------------------------------

When accessing datasets, additional functionality is exposed via the ``pyfive.h5d.DatasetID`` class, which
is the class which implements the low-level data access methods for datasets (aka "variables").
When accessing datasets, additional functionality is exposed via the ``pyfive.h5d.DatasetID`` class, which implements the low-level data access methods for datasets (also known as *variables*).

The DatasetID API is fully documented in the autogenerated API reference, but the additional methods and attributes are highlighted here:
The ``DatasetID`` API is fully documented in the autogenerated API reference, but additional methods and attributes are highlighted here:

.. autoattribute:: pyfive.h5d.DatasetID.first_chunk
.. autoattribute:: pyfive.h5d.DatasetID.btree_range
3 changes: 2 additions & 1 deletion doc/api_reference.rst
@@ -26,6 +26,7 @@ Dataset

DatasetID
----------

.. autoclass:: pyfive.h5d.DatasetID
:members:
:noindex:
Expand All @@ -41,7 +42,7 @@ Datatype
The h5t module
--------------

Partial implementation of some of the lower level h5py API, needed
Partial implementation of some of the lower level ``h5py`` API, needed
to support enumerations, variable length strings, and opaque datatypes.
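
As a hedged illustration (the file and dataset names below are invented, and the behaviour is assumed to mirror the
``h5py`` helper of the same name, returning the member mapping for an enumerated dtype and ``None`` otherwise)::

    import pyfive
    from pyfive.h5t import check_enum_dtype

    with pyfive.File("example.h5") as f:
        dt = f["flags"].dtype
        # e.g. {'land': 0, 'sea': 1} for an enumerated type, or None otherwise
        print(check_enum_dtype(dt))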

.. autofunction:: pyfive.h5t.check_enum_dtype
17 changes: 7 additions & 10 deletions doc/cloud.rst
@@ -1,7 +1,7 @@
Cloud Optimisation
******************

While `pyfive` can only read HDF5 files, it includes some features to help users understand whether it might
While ``pyfive`` can only read HDF5 files, it includes some features to help users understand whether it might
be worth rewriting files to make them cloud optimised (as defined by Stern et al., 2022 [#]_).

To be cloud optimised an HDF5 file needs to have a contiguous index for each
@@ -21,8 +21,7 @@ Metadata can be repacked to the front of the file and variables can be rechunked
which is effectively the same process undertaken when HDF5 data is reformatted to other cloud optimised formats.

The HDF5 library provides a tool (`h5repack <https://support.hdfgroup.org/documentation/hdf5/latest/_h5_t_o_o_l__r_p__u_g.html>`_)
which can do this, provided it is driven with suitable information
about required chunk shape and the expected size of metadata fields.
which can do this, provided it is driven with suitable information about required chunk shape and the expected size of metadata fields.
``pyfive`` provides both a method to query whether such repacking is necessary, and a means of extracting the necessary parameters.

In the following example we compare and contrast the unpacked and repacked version of a particularly pathological
@@ -50,12 +49,11 @@ If we look at some of the output of `p5dump -s` on this file
uas:_first_chunk = 36520 ;


we can immediately see that this will be a problematic file! The b-tree index is clearly interleaved with the data
We can immediately see that this will be a problematic file! The `b-tree` index is clearly interleaved with the data
(compare the first chunk address with last index addresses of the two variables), and with a chunk dimension of ``(1,)``,
any effort to use the time dimension to locate data of interest will involve a ludicrous number of one-number reads
(all underlying libraries read the data one chunk at a time).
It would feel like waiting for the heat death of the universe if one
was to attempt to manipulate this data stored on an object store!
It would feel like waiting for the heat death of the universe if one was to attempt to manipulate this data stored on an object store!

It is relatively easy (albeit slow) to use
`h5repack <https://support.hdfgroup.org/documentation/hdf5/latest/_h5_t_o_o_l__r_p__u_g.html>`_
@@ -83,12 +81,11 @@ Now data follows indexes, the time dimension is one chunk, and there is a more s
While this file would probably benefit from splitting into smaller files, now it has a contiguous set of indexes
it is possible to exploit this data via S3.

All the metadata shown in this dump output arises from `pyfive` extensions to the `pyfive.h5t.DatasetID` class.
`pyfive` also provides a simple flag: `consolidated_metadata` for a `File` instance, which can take values of
All the metadata shown in this dump output arises from ``pyfive`` extensions to the ``pyfive.h5d.DatasetID`` class.
``pyfive`` also provides a simple flag: ``consolidated_metadata`` for a ``File`` instance, which can take values of
``True`` or ``False`` for any given file, which simplifies at least the "is the index packed at the front of the file?"
part of the optimisation question - though inspection of chunking is a key part of the workflow necessary to
determine whether or not a file really is optimised for cloud usage.
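
As a minimal sketch of checking this flag for a given file (the file name is invented)::

    import pyfive

    with pyfive.File("repacked_example.nc") as f:
        # True when the metadata (attributes and chunk indexes) is packed at
        # the front of the file, False otherwise.
        print(f.consolidated_metadata)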


.. [#] Stern et.al. (2022): *Pangeo Forge: Crowdsourcing Analysis-Ready, Cloud Optimized Data Production*, https://dx.doi.org/10.3389/fclim.2021.782909.
.. [#] Hassel and Cimadevilla Alvarez (2025): *Cmip7repack: Repack CMIP7 netCDF-4 Datasets*, https://dx.doi.org/10.5281/zenodo.17550920.
.. [#] Hassell and Cimadevilla Alvarez (2025): *Cmip7repack: Repack CMIP7 netCDF-4 Datasets*, https://dx.doi.org/10.5281/zenodo.17550920.
2 changes: 1 addition & 1 deletion doc/conf.py
@@ -67,6 +67,7 @@
'autosummary': True,
}

# FIXME: These libraries are not found in the documentation
autodoc_mock_imports = [
'cartopy',
'cf_units',
@@ -164,7 +165,6 @@

# The name of an image file (relative to this directory) to place at the top
# of the sidebar.
# FIXME add a logo
html_logo = "figures/Pyfive-logo.png"

# The name of an image file (within the static path) to use as favicon of the
32 changes: 16 additions & 16 deletions doc/introduction.rst
@@ -5,41 +5,41 @@ About Pyfive
============

``pyfive`` provides a pure Python HDF5 reader which has been designed to be a thread-safe drop-in replacement
for `h5py <https://github.com/h5py/h5py>`_ with no dependencies on the HDF C library. It aims to support the same API as
for reading files. Cases where access to a file uses a feature that is supported by the high-level ``h5py`` interface but not ``pyfive`` are considered bugs and
for `h5py <https://github.com/h5py/h5py>`_ with no dependencies on the HDF5 C library. It aims to support the same API as ``h5py`` for reading files.
Cases where access to a file uses a feature that is supported by the high-level ``h5py`` interface but not ``pyfive`` are considered bugs and
should be reported in our `Issues <https://github.com/NCAS-CMS/pyfive/issues>`_.
Writing HDF5 is not a goal of pyfive and portions of the ``h5py`` API which apply only to writing will not be
implemented.

Writing HDF5 output is not a goal of ``pyfive`` and portions of the ``h5py`` API which apply only to writing will not be implemented.

.. note::
While ``pyfive`` is designed to be a drop-in replacement for ``h5py``, the reverse may not be possible. It is possible to do things with ``pyfive``
that will not work with ``h5py``, and ``pyfive`` definitely includes *extensions* to the ``h5py`` API. This documentation makes clear which parts of
the API are extensions and where behaviour differs *by design* from ``h5py``.
While ``pyfive`` is designed to be a drop-in replacement for ``h5py``, the reverse may not be possible.
It is possible to perform actions with ``pyfive`` that are not supported by ``h5py`` as ``pyfive`` extends the ``h5py`` API beyond its initial specifications.
This documentation makes clear which parts of the API are extensions and where behaviour differs *by design* from ``h5py``.

The motivation for ``pyfive`` development were many, but recent developments prioritised thread-safety, lazy loading, and
The motivations for ``pyfive`` development were many, but recent developments prioritised thread-safety, lazy loading, and
performance at scale in a cloud environment, both standalone
and as a backend for other software such as `cf-python <https://ncas-cms.github.io/cf-python/>`_, `xarray <https://docs.xarray.dev/en/stable/>`_, and `h5netcdf <https://h5netcdf.org/index.html>`_.
and as a backend for other software such as `cf-python <https://ncas-cms.github.io/cf-python/>`_, `xarray <https://docs.xarray.dev/en/stable/>`_, and `h5netcdf <https://h5netcdf.org/index.html>`_.

As well as the high-level ``h5py`` API we have implemented a version of the ``h5d.DatasetID`` class, which now
holds all the code which is used for data access (as opposed to attribute access). We have also implemented
holds all the code which is used for data access (as opposed to attribute access). We have also implemented
extra methods (beyond the ``h5py`` API) to expose the chunk index directly (as well as via an iterator) and
to access chunk info using the ``zarr`` indexing scheme rather than the ``h5py`` indexing scheme. This is useful for avoiding
the need for *a priori* use of ``kerchunk`` to make a ``zarr`` index for a file.
to access chunk info using the ``zarr`` indexing scheme rather than the ``h5py`` indexing scheme.
This is useful for avoiding the need for *a priori* use of ``kerchunk`` to make a ``zarr`` index for a file.
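
A hedged sketch of what this looks like in practice; the file and variable names are invented, ``iter_chunks`` follows
the ``h5py`` pattern, and the ``pyfive``-specific chunk-index methods live on the dataset's ``id`` (see the
``DatasetID`` entries in the API reference for their exact names)::

    import pyfive

    with pyfive.File("example.h5") as f:
        v = f["tas"]
        # Iterate over the chunk index without reading any data values.
        for chunk_slices in v.iter_chunks():
            print(chunk_slices)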

The code also includes an implementation of what we have called pseudochunking which is used for accessing
a contiguous array which is larger than memory via S3. In essence all this does is declare default chunks
aligned with the array order on disk and use them for data access.

There are optimisations to support cloud usage, the most important of which is that
once a variable is instantiated (i.e. for an open ``pyfive.File`` instance ``f``, when you do ``v=f['variable_name']``)
the attributes and b-tree (chunk index) are read, and it is then possible to close the parent file (``f``),
the attributes and ``b-tree`` (chunk index) are read, and it is then possible to close the parent file (``f``),
but continue to use ``v``.
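
A minimal sketch of that pattern (the file and variable names are invented)::

    import pyfive

    f = pyfive.File("example.h5")
    v = f["variable_name"]   # attributes and b-tree (chunk index) are read here
    f.close()                # the parent file can now be closed ...
    print(v[0:10])           # ... while the variable remains usable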

The package includes a script ``p5dump`` which can be used to dump the contents of an HDF5 file to the terminal.
The package also includes a command line tool (``p5dump``) which can be used to dump the contents of an HDF5 file to the terminal.

.. note::

We have test coverage that shows that the usage of ``v`` in this way is thread-safe - the test which demonstrates this is slow,
We have test coverage that shows that the usage of ``v`` in this way is thread-safe - the test which demonstrates this is slow,
but it needs to be, since shorter tests did not always exercise expected failure modes.

The pyfive test suite includes all the components necessary for testing pyfive accessing data via both POSIX and S3.
The ``pyfive`` test suite includes all the components necessary for testing ``pyfive`` accessing data via both POSIX and S3.
21 changes: 10 additions & 11 deletions doc/optimising.rst
@@ -9,22 +9,22 @@
The data storage complexities arise from two main factors: the use of chunking, and the way attributes are stored in the files.

**Chunking**: HDF5 files can store data in chunks, which allows for more efficient access to large datasets.
However, this also means that the library needs to maintain an index (a "b-tree") which relates the position in coordinate space to where each chunk is stored in the file.
There is a b-tree index for each chunked variable, and this index can be scattered across the file, which can introduce overheads when accessing the data.
However, this also means that the library needs to maintain an index (`b-tree`) which relates the position in coordinate space to where each chunk is stored in the file.
There is a `b-tree` index for each chunked variable, and this index can be scattered across the file, which can introduce overheads when accessing the data.

**Attributes**: HDF5 files can store attributes (metadata) associated with datasets and groups, and these attributes are stored in a separate section of the file.
**Attributes**: HDF5 files can store attributes (`metadata`) associated with datasets and groups, and these attributes are stored in a separate section of the file.
Again, these can be scattered across the files.


Optimising the files themselves
-------------------------------

Optimal access to data occurs when the data is chunked in a way that matches the access patterns of your application, and when the
b-tree indexes and attributes are stored contiguously in the file.
`b-tree` indexes and attributes are stored contiguously in the file.

Users of ``pyfive`` will always confront data files which have been created by other software, but if possible, it is worth exploring whether
the `h5repack <https://docs.h5py.org/en/stable/special.html#h5repack>`_ tool can
be used to make a copy of the file which is optimised for access by using sensible chunks and to store the attributes and b-tree indexes contiguously.
be used to make a copy of the file which is optimised for access by using sensible chunks and to store the attributes and `b-tree` indexes contiguously.
If that is possible, then all access will benefit from fewer calls to storage to get the necessary metadata, and the data access will be faster.


@@ -84,8 +84,7 @@ For example, you can use the `concurrent.futures` module to read data from multi

print("Results:", results)


You can do the same thing to parallelise manipulations within the variables, by for example using, ``Dask``, but that is beyond the scope of this document.
You can do the same thing to parallelise manipulations within the variables, for example by using ``dask``, but that is beyond the scope of this document.


Using pyfive with S3
@@ -101,8 +100,6 @@ file, which for HDF5 will be stored as one object, look like it is on a file sys
memory so repeated reads can be more efficient. The optimal caching strategy is dependent on the file layout
and the expected access pattern, so ``s3fs`` provides a lot of flexibility as to how to configure that caching strategy.



For ``pyfive`` the three most important variables to consider altering are the
``default_block_size`` number, the ``default_cache_type`` option and the ``default_fill_cache`` boolean.

@@ -121,7 +118,9 @@ For ``pyfive`` the three most important variables to consider altering are the
This is a boolean which determines whether ``s3fs`` will persistently cache the data that it reads.
If this is set to ``True``, then the blocks are cached persistently in memory, but if set to ``False``, then it only makes sense in conjunction with ``default_cache_type`` set to ``readahead`` or ``bytes`` to support streaming access to the data.
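
Putting the three options together, a hedged sketch of opening a file on S3 (the bucket, object, and variable names are
invented; ``anon=True`` assumes public data; and it is assumed that ``pyfive.File`` accepts the open file-like object
returned by ``s3fs``)::

    import pyfive
    import s3fs

    fs = s3fs.S3FileSystem(
        anon=True,
        default_block_size=8 * 1024 * 1024,   # bytes fetched per request
        default_cache_type="readahead",       # stream blocks ahead of the read position
        default_fill_cache=False,             # do not keep every block in memory
    )
    with fs.open("my-bucket/example.nc", "rb") as s3file:
        with pyfive.File(s3file) as f:
            v = f["variable_name"]
            print(v.shape)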

Note that even with these strategies, it is possible that the file layout itself is such that access will be slow.
See the next section for more details of how to optimise your hDF5 files for cloud acccess.
.. note::

Even with these strategies, it is possible that the file layout itself is such that access will be slow.
See the next section for more details of how to optimise your HDF5 files for cloud access.

