Abstract formats and restructure the repo#170

Merged
Pierlou merged 22 commits into master from
refactor/abstract-formats
Dec 2, 2025
Conversation

@Pierlou
Collaborator

@Pierlou Pierlou commented Nov 18, 2025

The idea is to get rid of the current repo structure with detect_fields and detect_labels to end up with a unified and cleaner layout. Therefore:

  • creation of a Format class and a FormatsManager to handle formats (in format.py)
  • creation of a formats/ folder that contains each format as a single Python file named after the format (float.py, siren.py, ...)
  • the tree structure that was used to group the formats (FR, geo, etc.) is replaced with the tags attribute
  • the labels attribute makes it possible to merge the field and label checks into a single file: everything about, say, latitude_wgs is in latitude_wgs.py

Example:

from csv_detective.format import FormatsManager

fmtm = FormatsManager()
for fmt in fmtm.get_formats_from_tags(["fr", "geo"]):
    print(fmt.name)
    fmt.func("42")  # test the value against each selected format

This approach will also make it easier to apply additions/modifications across all formats (I have in mind a new version of #155 with a parent attribute, which, combined with chunks, should speed up the analysis).

@Pierlou Pierlou marked this pull request as ready for review December 1, 2025 15:28
@Pierlou
Collaborator Author

Pierlou commented Dec 1, 2025

The diff is quite big, but the only files you need to check are:

  • format.py
  • formats/__init__.py
  • the tests refactors if you want to check that the behaviour remains very similar after all the changes
  • the previous logic that was in load_tests.py (hopefully you agree that these changes make it clearer and cleaner)

Contributor

@bolinocroustibat bolinocroustibat left a comment

Didn't review everything of course, but the current structure is definitely more readable.
A few remarks (I might add more later):

  • if possible, wouldn't it be better to avoid relative imports?
  • I forgot exactly what the mechanism behind proportion (ex-PROPORTION) is. It would be great to explain it in the README (and maybe with a short code comment as well)

Comment thread csv_detective/formats/date_fr.py Outdated
Comment thread csv_detective/format.py Outdated
Comment thread csv_detective/format.py Outdated
Comment thread csv_detective/formats/__init__.py
@Pierlou Pierlou merged commit 9638f5b into master Dec 2, 2025
5 checks passed
@Pierlou Pierlou deleted the refactor/abstract-formats branch December 2, 2025 14:07

3 participants