Improve error handling for numpy.histogram binning errors #68

@gallah-gallah

Description

I encountered a ValueError when using rio_stac to process certain datasets (usually small tiles; see the attached S2 tile). The error originates from a call to numpy.histogram and carries the message "Too many bins for data range. Cannot create 10 finite-sized bins." (https://github.com/numpy/numpy/blob/0532af47d6a815298b7841de00bdbc547104b237/numpy/lib/_histograms_impl.py#L449)

This occurs when the data range is extremely small relative to the floating-point resolution of the values (e.g., all valid values are identical or nearly identical), which makes it impossible for numpy to create the requested number of finite-sized bins.
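
To make the failure condition concrete, here is a rough, simplified model of the check in the linked numpy source. It is an assumption that it mirrors numpy closely: it always works in float64, whereas numpy derives the edge dtype from the input. As far as the linked code shows, numpy widens a zero-width range by 0.5 on each side, then builds bins + 1 evenly spaced edges and rejects them if they are not strictly increasing. The helper name edges_are_finite_sized is hypothetical.

```python
import numpy as np


def edges_are_finite_sized(values, bins: int = 10) -> bool:
    """Hypothetical helper: can `bins` equal-width bins be resolved in float64?

    Simplified model (assumption): widen a zero-width range by 0.5 on each
    side, build bins + 1 evenly spaced edges, require strictly increasing edges.
    """
    values = np.asarray(values, dtype=np.float64)
    lo, hi = float(values.min()), float(values.max())
    if lo == hi:
        # Exactly constant data: range is widened, so edges stay distinct.
        lo, hi = lo - 0.5, hi + 0.5
    edges = np.linspace(lo, hi, bins + 1)
    # Adjacent edges that round to the same float mean a zero-width bin.
    return bool(np.all(edges[:-1] < edges[1:]))


print(edges_are_finite_sized([5.0, 5.0, 5.0]))      # True
print(edges_are_finite_sized([1.0, 1.0 + 1e-15]))   # False: range is a few ULPs wide
print(edges_are_finite_sized(np.arange(100.0)))     # True
```

In this model, the problematic case is a range that is nonzero but only a few units in the last place wide, so ten equal steps cannot all be represented.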

While this is an issue with the underlying data, the raw numpy error message is not very user-friendly. It would be beneficial for rio_stac to catch this specific ValueError and handle it more gracefully, providing a more informative message to the user about the data's characteristics and why a histogram could not be generated.
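
A minimal sketch of the kind of handling proposed here, assuming it is acceptable to match on the text of numpy's message ("Too many bins"). The helper histogram_with_context is hypothetical and not part of rio_stac; it re-raises any other ValueError unchanged and chains the original error so the numpy traceback is preserved.

```python
import numpy as np


def histogram_with_context(values, bins: int = 10):
    """Hypothetical wrapper: call numpy.histogram, reword the 'Too many bins' error."""
    values = np.asarray(values)
    try:
        return np.histogram(values, bins=bins)
    except ValueError as err:
        if "Too many bins" not in str(err):
            raise  # unrelated ValueError (e.g. non-finite range): propagate as-is
        raise ValueError(
            f"Cannot compute a {bins}-bin histogram: the valid data range "
            f"[{values.min()}, {values.max()}] is too narrow to split into "
            f"{bins} finite-sized bins (the values are (nearly) constant)."
        ) from err


counts, edges = histogram_with_context(np.arange(10.0))
print(counts.sum(), len(edges))  # 10 11
```

Matching on the message text is fragile across numpy versions; an alternative is to check the data range before calling numpy.histogram at all.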

The error is raised by this line in rio_stac:

sample, edges = numpy.histogram(arr[~arr.mask])

As a workaround, we currently use an np.allclose check and, when it passes, fall back to the minimal schema-conformant histogram: 3 bins, with min, max and the first bucket entry all taken from the single unique value.
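
The workaround above can be sketched roughly as follows. This is not the reporter's actual code. Assumptions: the returned dict follows the STAC raster extension histogram object (count, min, max, buckets); the first bucket holds the number of valid pixels; min and max are the actual min and max of the valid values; and the 3-bucket minimum is taken from the description above. The function name minimal_histogram is hypothetical.

```python
import numpy as np


def minimal_histogram(arr, bins: int = 10) -> dict:
    """Hypothetical helper: histogram dict with a 3-bucket fallback for constant data."""
    # compressed() keeps only unmasked values (same values as arr[~arr.mask]).
    valid = np.ma.asarray(arr).compressed()
    if valid.size and np.allclose(valid, valid[0]):
        # (Nearly) constant data: assumed minimal 3-bucket histogram,
        # with every valid pixel counted in the first bucket.
        return {
            "count": 3,
            "min": float(valid.min()),
            "max": float(valid.max()),
            "buckets": [int(valid.size), 0, 0],
        }
    buckets, edges = np.histogram(valid, bins=bins)
    return {
        "count": bins,
        "min": float(edges[0]),
        "max": float(edges[-1]),
        "buckets": buckets.tolist(),
    }


tile = np.ma.masked_array([7.0, 7.0, 7.0, 0.0], mask=[False, False, False, True])
print(minimal_histogram(tile))
# {'count': 3, 'min': 7.0, 'max': 7.0, 'buckets': [3, 0, 0]}
```

Note that np.allclose uses a relative tolerance by default (rtol=1e-05), so for large values such as 1_000_000 and 1_000_005 it already reports "close"; passing explicit rtol/atol values may be preferable if genuinely distinct values should still get a full histogram.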
Note: to reproduce this, use e.g. the public.ecr.aws/docker/library/python:3.11.8 image. Depending on the numpy build and linked libraries, the error may not occur.
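
Because the behavior depends on the environment, a small check like the one below may help when comparing setups. It is a sketch only: the input [1.0, 1.0 + 1e-15] is a synthetic stand-in for nearly constant data, not values from the attached tile (for the tile, one would pass the valid pixel values of the affected band instead), and histogram_error_reproduces is a hypothetical helper name.

```python
import numpy as np


def histogram_error_reproduces(values, bins: int = 10) -> bool:
    """Hypothetical helper: True if numpy.histogram raises the 'Too many bins' error."""
    try:
        np.histogram(np.asarray(values), bins=bins)
    except ValueError as err:
        return "Too many bins" in str(err)
    return False


print("numpy", np.__version__)
# np.show_config()  # uncomment to print build and BLAS/LAPACK details
print("reproduces:", histogram_error_reproduces([1.0, 1.0 + 1e-15]))
```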

Thanks, Erik

S2A_MSIL2A_20220722T105631_N0400_R094_T31TCN_20220722T171159.SAFE_earth-search_sentinel-2-l2a_data.tif
