refactor: Replace hardcoded URIs with ontology lookups and remove whitespaces in `GenderExtractor` by vaibhav45sktech · Pull Request #824 · dbpedia/extraction-framework

vaibhav45sktech · 2026-01-24T18:41:29Z

Replaces hardcoded URI strings with context.ontology lookups and improves code quality.

Changes:

- Use context.ontology.properties() and context.ontology.classes() instead of raw URIs
- Fix pronoun regex: word boundaries + case-insensitive + proper escaping
- Pre-instantiate langStringDatatype at class level
- Handle division-by-zero in gender ratio calculation
- Clean up whitespace and formatting
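
The pronoun regex fix listed above (word boundaries, case-insensitivity, escaping) can be illustrated with a minimal sketch. `PronounCount` and `countPronoun` are hypothetical names chosen for illustration and are not taken from the PR's diff; only the `java.util.regex` calls are standard library API.

```scala
import java.util.regex.Pattern

// Hypothetical helper, not the PR's actual code.
object PronounCount {
  // Counts whole-word, case-insensitive occurrences of `pronoun` in `text`.
  def countPronoun(pronoun: String, text: String): Int = {
    val pattern = Pattern.compile(
      // Pattern.quote escapes any regex metacharacters in the pronoun;
      // \b on both sides keeps "he" from matching inside "the" or "her".
      "\\b" + Pattern.quote(pronoun) + "\\b",
      Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE
    )
    val matcher = pattern.matcher(text)
    var count = 0
    while (matcher.find()) count += 1
    count
  }
}
```

Without the word boundaries, a naive substring search for "he" would also count "the", "theory", and "her", which would inflate the male pronoun count.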

Resolves issue #825

Summary by CodeRabbit

Release Notes

Improvements
- Enhanced gender extraction with improved validation for entity recognition
- Better language-specific output formatting in results

coderabbitai · 2026-01-24T18:42:06Z

📝 Walkthrough

Refactors GenderExtractor to use ontology-derived properties instead of hard-coded strings, introduces explicit language code extraction and gender mapping from configuration, adds Person-type verification, reworks pronoun counting with case-insensitive word-boundary matching, and implements threshold-based gender determination with ratio validation.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Gender Extractor Logic Refactoring `core/src/main/scala/org/dbpedia/extraction/mappings/GenderExtractor.scala` | Replaces string-constant predicates with context.ontology lookups (foaf:gender, rdf:type); adds language code extraction and pronoun mapping from GenderExtractorConfig; introduces Person-type verification check; reworks text analysis to count pronouns with case-insensitive word-boundary matching; replaces pairwise max/min comparisons with sorted-count approach; implements explicit threshold logic (minDifference, minCount) for gender determination; outputs single Quad with langString datatype only when thresholds are met |
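
The sorted-count and threshold logic summarized above might look roughly like the following sketch. The names `GenderDecision`, `Thresholds`, and `decide` are hypothetical, and the sketch assumes that `minDifference` is a ratio between the top two counts and `minCount` is a lower bound on the winning count; the PR's actual semantics and values in `GenderExtractorConfig` may differ.

```scala
// Hypothetical sketch, not the PR's actual code.
object GenderDecision {
  // Assumed meaning: minCount = minimum occurrences of the winning pronoun set,
  // minDifference = minimum ratio of the top count to the runner-up count.
  final case class Thresholds(minCount: Int, minDifference: Double)

  def decide(counts: Map[String, Int], t: Thresholds): Option[String] = {
    // Sort by count descending; ties are broken by name so the result is deterministic.
    val sorted = counts.toSeq.sortBy { case (gender, count) => (-count, gender) }
    sorted match {
      case Seq() => None
      case (topGender, topCount) +: rest =>
        val secondCount = rest.headOption.map(_._2).getOrElse(0)
        val enoughEvidence = topCount > 0 && topCount >= t.minCount
        // Division-by-zero guard: with no runner-up occurrences the ratio is
        // undefined, so the top gender is treated as dominant.
        val dominant =
          if (secondCount == 0) true
          else topCount.toDouble / secondCount >= t.minDifference
        if (enoughEvidence && dominant) Some(topGender) else None
    }
  }
}
```

Returning `Option[String]` mirrors the described behavior of emitting a single Quad only when the thresholds are met: `None` corresponds to emitting nothing.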

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Possibly related issues

GenderExtractor: Replace hardcoded URI strings with ontology lookups #810: Directly addresses the same file and objective of replacing hard-coded ontology URIs with context.ontology lookups, indicating coordinated refactoring toward ontology-driven implementation.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately reflects the main refactoring changes: replacing hardcoded URIs with ontology lookups and improving regex patterns with proper word boundaries. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

sonarqubecloud · 2026-01-24T18:42:20Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

vaibhav45sktech · 2026-01-27T05:03:45Z

Greetings @jimkont, kindly review my PR whenever you are available.

updated changes

d5867cd

This was referenced Jan 24, 2026

GenderExtractor: Replace hardcoded URI strings with ontology lookups #810

Closed

GenderExtractor: Replace hardcoded URI strings with ontology lookups #822

Open

vaibhav45sktech wants to merge 1 commit into dbpedia:master from vaibhav45sktech:fix-gender-extractor