HuggingFace `datasets` caches the results of `map()` operations aggressively, which creates a significant development workflow issue: after you modify a function used in dataset mapping, the cached results from the old version of the function can still be returned, making it appear as if your code changes have no effect.
We need some form of cache cleaning or a `debug_mode` for development, since users who customize the repository's data loading and preprocessing functions are likely to run into this issue.