
refactor(Taxonomy): replace TAXONOMY_URLS with TAXONOMY_MAPPING and a single method to build the url #475

Open
raphodn wants to merge 4 commits into raphodn/pre-commit-ruff-fix from raphodn/get-taxonomy-oxf-refactor

raphodn (Member) commented Apr 25, 2026

Description

Following #466

Solution

  • TaxonomyType: add a new dataset_filename attribute and a dataset_path property
  • replace TAXONOMY_URLS with TAXONOMY_MAPPING
  • add a new _generate_file_path method that uses both TAXONOMY_MAPPING and TaxonomyType.dataset_path
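
A minimal sketch of how these pieces could fit together is shown below. It is not the code from this PR: only `TaxonomyType`, `dataset_filename`, `dataset_path`, `TAXONOMY_MAPPING`, and the dataset file names come from the description and diff. The `data/taxonomies/` prefix, the shape of the mapping (flavor name to supported taxonomy types), the placeholder host, and the `generate_file_path` helper are assumptions made purely for illustration.

```python
from enum import Enum


class TaxonomyType(str, Enum):
    """Trimmed-down stand-in for the enum in this PR (two members only)."""

    dataset_filename: str  # declared for type checkers; not an enum member

    category = ("category", "categories.full.json")
    label = ("label", "labels.full.json")

    def __new__(cls, value: str, dataset_filename: str):
        obj = str.__new__(cls, value)
        obj._value_ = value
        obj.dataset_filename = dataset_filename
        return obj

    @property
    def dataset_path(self) -> str:
        # Assumption: datasets live under a common directory prefix.
        return f"data/taxonomies/{self.dataset_filename}"


# Assumption: the mapping records which taxonomy types each flavor supports.
TAXONOMY_MAPPING: dict[str, set[TaxonomyType]] = {
    "off": {TaxonomyType.category, TaxonomyType.label},
    "obf": {TaxonomyType.category},
}


def generate_file_path(flavor: str, taxonomy_type: TaxonomyType) -> str:
    """Hypothetical single entry point that builds the dataset URL."""
    if taxonomy_type not in TAXONOMY_MAPPING.get(flavor, set()):
        raise ValueError(f"{taxonomy_type.value!r} is not available for {flavor!r}")
    # Assumption: placeholder host, not the real static server.
    return f"https://static.example.org/{flavor}/{taxonomy_type.dataset_path}"
```

With this shape, the file name lives next to the enum member it belongs to, and the mapping only has to say which combinations are valid.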

Related issue(s)


    origin = "origin"
    language = "language"
    other_nutritional_substance = "other_nutritional_substance"
    dataset_filename: str
raphodn (Member, Author) commented:

Added to avoid the following mypy error:

src/openfoodfacts/types.py:901: error: "TaxonomyType" has no attribute "dataset_filename"  [attr-defined]
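
For context, the pattern behind this fix can be sketched as follows. A bare annotation in an `Enum` body has no assigned value, so it does not become a member; it only declares, for type checkers, the attribute that `__new__` sets at runtime. The `Color` enum and its `hex_code` attribute are made up for illustration and are not part of this PR.

```python
from enum import Enum


class Color(str, Enum):
    # Annotation only: no value is assigned, so this is not an enum member.
    hex_code: str

    red = ("red", "#ff0000")
    green = ("green", "#00ff00")

    def __new__(cls, value: str, hex_code: str):
        obj = str.__new__(cls, value)
        obj._value_ = value
        # Set per member at runtime; the annotation above declares it statically.
        obj.hex_code = hex_code
        return obj


# Only the real members are listed; the annotated name is absent.
member_names = [member.name for member in Color]
```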

Freso (Contributor) left a comment:

One comment/suggestion, but overall LGTM!

Comment on lines +896 to +930
    def __new__(cls, value: str, dataset_filename: str):
        """
        Override __new__ to allow storing the dataset filename
        associated with each taxonomy type.
        """
        obj = str.__new__(cls, value)
        obj._value_ = value
        obj.dataset_filename = dataset_filename
        return obj

    category = ("category", "categories.full.json")
    ingredient = ("ingredient", "ingredients.full.json")
    label = ("label", "labels.full.json")
    brand = ("brand", "brands.full.json")
    packaging_shape = ("packaging_shape", "packaging_shapes.full.json")
    packaging_material = ("packaging_material", "packaging_materials.full.json")
    packaging_recycling = ("packaging_recycling", "packaging_recycling.full.json")
    country = ("country", "countries.full.json")
    store = ("store", "stores.full.json")
    nova_group = ("nova_group", "nova_groups.full.json")
    packaging = ("packaging", "packaging.full.json")
    additive = ("additive", "additives.full.json")
    vitamin = ("vitamin", "vitamins.full.json")
    mineral = ("mineral", "minerals.full.json")
    amino_acid = ("amino_acid", "amino_acids.full.json")
    nucleotide = ("nucleotide", "nucleotides.full.json")
    allergen = ("allergen", "allergens.full.json")
    state = ("state", "states.full.json")
    data_quality = ("data_quality", "data_quality.full.json")
    origin = ("origin", "origins.full.json")
    language = ("language", "languages.full.json")
    other_nutritional_substance = (
        "other_nutritional_substance",
        "other_nutritional_substances.full.json",
    )
Freso (Contributor) commented:

Would it make sense (and work) to flip these two blocks around?

Suggested change
    category = ("category", "categories.full.json")
    ingredient = ("ingredient", "ingredients.full.json")
    label = ("label", "labels.full.json")
    brand = ("brand", "brands.full.json")
    packaging_shape = ("packaging_shape", "packaging_shapes.full.json")
    packaging_material = ("packaging_material", "packaging_materials.full.json")
    packaging_recycling = ("packaging_recycling", "packaging_recycling.full.json")
    country = ("country", "countries.full.json")
    store = ("store", "stores.full.json")
    nova_group = ("nova_group", "nova_groups.full.json")
    packaging = ("packaging", "packaging.full.json")
    additive = ("additive", "additives.full.json")
    vitamin = ("vitamin", "vitamins.full.json")
    mineral = ("mineral", "minerals.full.json")
    amino_acid = ("amino_acid", "amino_acids.full.json")
    nucleotide = ("nucleotide", "nucleotides.full.json")
    allergen = ("allergen", "allergens.full.json")
    state = ("state", "states.full.json")
    data_quality = ("data_quality", "data_quality.full.json")
    origin = ("origin", "origins.full.json")
    language = ("language", "languages.full.json")
    other_nutritional_substance = (
        "other_nutritional_substance",
        "other_nutritional_substances.full.json",
    )

    def __new__(cls, value: str, dataset_filename: str):
        """
        Override __new__ to allow storing the dataset filename
        associated with each taxonomy type.
        """
        obj = str.__new__(cls, value)
        obj._value_ = value
        obj.dataset_filename = dataset_filename
        return obj
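
On the "work" part of the question: as far as the standard `enum` machinery is concerned, the order should not matter. The class body runs to completion before the enum metaclass creates the members, so a `__new__` defined after the member assignments is still picked up. A trimmed sketch with two of the members above (the `TaxonomyTypeDemo` name is only there to mark it as illustrative):

```python
from enum import Enum


class TaxonomyTypeDemo(str, Enum):
    # Members first, as in the suggested ordering.
    category = ("category", "categories.full.json")
    ingredient = ("ingredient", "ingredients.full.json")

    # __new__ is defined after the members; the metaclass only builds the
    # members once the whole class body has been executed.
    def __new__(cls, value: str, dataset_filename: str):
        obj = str.__new__(cls, value)
        obj._value_ = value
        obj.dataset_filename = dataset_filename
        return obj


# Lookup by value still resolves to the member carrying its filename.
lookup = TaxonomyTypeDemo("ingredient")
```

Putting the members first keeps the list of taxonomy types at the top of the class, which is usually what a reader is looking for.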

raphodn (Member, Author) commented:

Yes, that's a good idea.

    other_nutritional_substance = "other_nutritional_substance"
    dataset_filename: str

    def __new__(cls, value: str, dataset_filename: str):