[FEATURE]: Add Unit Tests for create_tokenizer and load_merkle_proof #77

@Shubhamx404

Description

Feature and its Use Cases

Overview

The repository currently lacks unit tests for two utility functions: create_tokenizer and load_merkle_proof.
The tests should be added in tests/test_tokenizer.py and tests/test_util.py, respectively.

  • Without tests, these two functions are harder to refactor safely, and regressions in them can go undetected by the test suite.

Example scenarios to test:

  • When "bpe" is passed, the function should return an instance of the BPE tokenizer class.
  • When "sentencepiece" is passed, the function should return an instance of the SentencePiece tokenizer class.
  • When an unsupported tokenizer type is passed, the function should raise a ValueError.
  • Loading a valid JSON Merkle proof file returns the expected structure.
  • Attempting to load a non-existent file raises FileNotFoundError.
  • Attempting to load invalid JSON raises a parsing exception.
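
The six scenarios above could be sketched roughly as follows. This is only an illustration, not the repository's code: the classes BPETokenizer and SentencePieceTokenizer, the bodies of create_tokenizer and load_merkle_proof, and the fields of the sample proof are hypothetical stand-ins so that the example runs on its own. In the real test files they would be replaced by imports from the project's modules, and the "parsing exception" may be a different type than the json.JSONDecodeError assumed here.

```python
import json
import tempfile
import unittest
from pathlib import Path


# --- Hypothetical stand-ins so this sketch runs on its own. ---
# In the repository, remove these and import the real functions instead.
class BPETokenizer:
    pass


class SentencePieceTokenizer:
    pass


def create_tokenizer(kind):
    """Stand-in factory: maps a tokenizer type name to a tokenizer instance."""
    registry = {"bpe": BPETokenizer, "sentencepiece": SentencePieceTokenizer}
    if kind not in registry:
        raise ValueError(f"Unsupported tokenizer type: {kind!r}")
    return registry[kind]()


def load_merkle_proof(path):
    """Stand-in loader: reads a JSON file and returns the parsed object."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


class CreateTokenizerTests(unittest.TestCase):
    def test_bpe_returns_bpe_tokenizer(self):
        self.assertIsInstance(create_tokenizer("bpe"), BPETokenizer)

    def test_sentencepiece_returns_sentencepiece_tokenizer(self):
        self.assertIsInstance(
            create_tokenizer("sentencepiece"), SentencePieceTokenizer
        )

    def test_unsupported_type_raises_value_error(self):
        with self.assertRaises(ValueError):
            create_tokenizer("unknown")


class LoadMerkleProofTests(unittest.TestCase):
    def setUp(self):
        # Each test gets a fresh temporary directory that is cleaned up afterwards.
        tmp = tempfile.TemporaryDirectory()
        self.addCleanup(tmp.cleanup)
        self.tmp_path = Path(tmp.name)

    def test_valid_file_returns_expected_structure(self):
        # The proof layout here is made up for illustration only.
        proof = {"leaf": "ab", "siblings": ["cd"], "root": "ef"}
        path = self.tmp_path / "proof.json"
        path.write_text(json.dumps(proof), encoding="utf-8")
        self.assertEqual(load_merkle_proof(path), proof)

    def test_missing_file_raises_file_not_found(self):
        with self.assertRaises(FileNotFoundError):
            load_merkle_proof(self.tmp_path / "missing.json")

    def test_invalid_json_raises_decode_error(self):
        path = self.tmp_path / "broken.json"
        path.write_text("{not valid json", encoding="utf-8")
        with self.assertRaises(json.JSONDecodeError):
            load_merkle_proof(path)
```

The sketch uses unittest.TestCase because it needs nothing beyond the standard library; pytest also collects such classes, and plain pytest functions with the tmp_path fixture would work equally well if that matches the existing tests better.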

Acceptance Criteria

  • Unit tests exist for both create_tokenizer and load_merkle_proof.
  • Tests cover both successful execution and failure scenarios.
  • All tests pass when running:

Additional Context

No response

Code of Conduct

  • I have joined the Discord server and will post updates there
  • I have searched existing issues to avoid duplicates

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request)