Fix template text extraction for Lang, Native name, and Nihongo templates by vaibhav45sktech · Pull Request #828 · dbpedia/extraction-framework

vaibhav45sktech · 2026-01-27T16:57:37Z

Problem

Templates like {{lang|nap|Abbrùzzu}} and {{Nihongo2|東京都}} in Wikipedia infoboxes
were not being extracted, resulting in missing text content in DBpedia.

Root Cause

The Lang template was configured to extract parameter 3, but {{lang}} takes only two positional parameters (the language code and the text), so nothing was ever extracted.
Additionally, the Native name, Nihongo, and Nihongo2 templates were not configured at all.

Fix

Updated templatetransform.json:

- Lang: Extract param 2 (was incorrectly param 3)
- Native name|native_name: Added - extracts param 2
- Nihongo2: Added - extracts param 1
- Nihongo: Added - extracts param 2

Examples

| Template | Before | After |
| --- | --- | --- |
| `{{lang\|nap\|Abbrùzzu}}` | (empty) | Abbrùzzu |
| `{{Nihongo2\|東京都}}` | (empty) | 東京都 |
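
To make the parameter numbering concrete, here is a minimal Scala sketch of positional-parameter lookup on a flat template call. It is not the framework's parser and does not reflect the format of templatetransform.json; `TemplateParamSketch` and `positionalParam` are hypothetical names, and the sketch assumes no nested templates and treats any argument containing `=` as named.

```scala
// Illustrative only: a hypothetical helper, not part of the extraction framework.
object TemplateParamSketch {

  /** Returns the n-th positional parameter (1-based) of a flat, non-nested template call. */
  def positionalParam(wikiText: String, n: Int): Option[String] = {
    val trimmed = wikiText.trim
    if (n < 1 || !trimmed.startsWith("{{") || !trimmed.endsWith("}}")) None
    else {
      // Drop the surrounding braces, then split into template name and arguments.
      val parts = trimmed.substring(2, trimmed.length - 2).split("\\|", -1).toList.map(_.trim)
      // parts.head is the template name; keep only unnamed (positional) arguments.
      val positional = parts.drop(1).filterNot(_.contains("="))
      positional.lift(n - 1).filter(_.nonEmpty)
    }
  }
}
```

Under these assumptions, asking for parameter 3 of `{{lang|nap|Abbrùzzu}}` yields `None`, which matches the empty "Before" column, while parameter 2 yields the text.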

Testing

- Added test cases to TemplateTransformParserTest.scala
- Verified configuration with standalone validation script

fixes issue #747

Summary by CodeRabbit

New Features
- Enhanced template parsing to support additional language template formats for improved localization data extraction.
Tests
- Added test coverage for new language template parsing scenarios.


coderabbitai · 2026-01-27T16:58:18Z

Walkthrough

Template transformation rules in the configuration are updated to handle new wiki template patterns for native names and Japanese text. Corresponding test cases are added to verify text extraction from these newly supported templates.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Template Transformation Configuration `core/src/main/resources/templatetransform.json` | Modified Lang replacement rule pattern; added three new public key entries (native_name, Nihongo2, Nihongo) with textNode transformers and corresponding replacement patterns |
| Template Parser Tests `core/src/test/scala/org/dbpedia/extraction/wikiparser/TemplateTransformParserTest.scala` | Added four test cases verifying text extraction from lang, native_name, Nihongo2, and Nihongo wiki template variants |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly describes the main change: fixing template text extraction for three specific template types that are central to the changeset. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@core/src/main/scala/org/dbpedia/extraction/mappings/GenderExtractor.scala`:
- Around line 41-42: The Datatype constructor is being called with only one
argument for langStringDatatype which causes a compile error; update the
initialization of langStringDatatype (the private val langStringDatatype) to
supply the required three parameters (name, labels, comments) or retrieve the
existing datatype from the ontology; for example, use
context.ontology.datatypes.getOrElse("rdf:langString", new
Datatype("rdf:langString", Map.empty[String,String], Map.empty[String,String]))
so Datatype is constructed with the proper arguments or the ontology-provided
instance is used.
- Line 80: The check "if (genderCounts.isEmpty) return Seq.empty" is incorrect
because genderCounts may contain zero-valued entries even when no pronouns
matched; update the early-exit to check actual matched counts instead—either
remove the check entirely and rely on the later "maxCount >
GenderExtractorConfig.minCount" guard, or replace it with a concrete check such
as "if (genderCounts.values.forall(_ == 0)) return Seq.empty" (or compute
maxCount here and return when maxCount == 0) to ensure we only exit when no
pronouns were matched; reference variables: genderCounts, pronounMap, and
GenderExtractorConfig.minCount within class GenderExtractor.
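
The second point can be reproduced in isolation. The sketch below is a simplified stand-in for the counting loop, not the extractor itself: the two-entry `pronounMap` and the names `GenderCountSketch`, `countGenders`, and `noPronounsMatched` are assumptions made for illustration.

```scala
// Illustrative only: a simplified stand-in for the counting logic under review.
object GenderCountSketch {
  import scala.util.matching.Regex

  // Hypothetical pronoun table; the real one lives in the extractor's configuration.
  val pronounMap: Map[String, String] = Map("he" -> "male", "she" -> "female")

  /** Counts whole-word pronoun matches per gender; every gender gets an entry, even with count 0. */
  def countGenders(text: String): Map[String, Int] =
    pronounMap.foldLeft(Map.empty[String, Int]) { case (counts, (pronoun, gender)) =>
      val regex = new Regex("(?i)\\b" + Regex.quote(pronoun) + "\\b")
      counts.updated(gender, counts.getOrElse(gender, 0) + regex.findAllIn(text).size)
    }

  /** The suggested replacement for the isEmpty check. */
  def noPronounsMatched(counts: Map[String, Int]): Boolean =
    counts.values.forall(_ == 0)
}
```

For a text with no pronouns, `countGenders` still returns a non-empty map of zeros, so an `isEmpty` guard never fires while `noPronounsMatched` does.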

🧹 Nitpick comments (1)

core/src/main/scala/org/dbpedia/extraction/mappings/GenderExtractor.scala (1)

66-78: Consider a functional approach for pronoun counting.

The mutable reassignment pattern can be replaced with a more idiomatic functional approach using foldLeft or groupMapReduce.

♻️ Suggested functional alternative

-   var genderCounts: Map[String, Int] =
-     Map.empty.withDefaultValue(0)
-
-   for ((pronoun, gender) <- pronounMap) {
-     val regex =
-       new Regex("(?i)\\b" + Regex.quote(pronoun) + "\\b")
-
-     val count =
-       regex.findAllIn(wikiText).size
-
-     genderCounts =
-       genderCounts.updated(gender, genderCounts(gender) + count)
+   val genderCounts: Map[String, Int] =
+     pronounMap.foldLeft(Map.empty[String, Int].withDefaultValue(0)) {
+       case (counts, (pronoun, gender)) =>
+         val regex = new Regex("(?i)\\b" + Regex.quote(pronoun) + "\\b")
+         val count = regex.findAllIn(wikiText).size
+         counts.updated(gender, counts(gender) + count)
    }
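
The review also mentions groupMapReduce. A self-contained sketch of that variant follows, assuming Scala 2.13 or later (where `groupMapReduce` exists) and a `Map[String, String]` from pronoun to gender; `PronounCountSketch` and `countByGender` are hypothetical names. Unlike the foldLeft version, the result has no default value of 0, and an empty `pronounMap` yields an empty map.

```scala
// Illustrative only: requires Scala 2.13+ for groupMapReduce.
object PronounCountSketch {
  import scala.util.matching.Regex

  /** Groups pronouns by gender and sums their whole-word, case-insensitive match counts. */
  def countByGender(pronounMap: Map[String, String], wikiText: String): Map[String, Int] =
    pronounMap.groupMapReduce(_._2) { entry =>
      // entry._1 is the pronoun; count its occurrences in the text.
      new Regex("(?i)\\b" + Regex.quote(entry._1) + "\\b").findAllIn(wikiText).size
    }(_ + _)
}
```

Callers that index the result directly would need `getOrElse(gender, 0)` or an explicit `withDefaultValue(0)`.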

sonarqubecloud · 2026-01-28T16:10:15Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

vaibhav45sktech · 2026-01-28T16:12:43Z

Greetings @TallTed, kindly review my PR whenever you are available.

TallTed · 2026-01-28T18:10:52Z

@vaibhav45sktech — Your PR is beyond my scope. Please look into CODEOWNERS and the like.

vaibhav45sktech · 2026-01-28T21:50:25Z

Thanks @TallTed

vaibhav45sktech · 2026-01-28T21:51:17Z

Greetings @jimkont, could you kindly review my PR whenever you are available?

vaibhav45sktech added 2 commits January 24, 2026 17:57

updated chnages

339a568

Fix template text extraction for lang, native name, and Nihongo templ…

70fbc63

…ates

coderabbitai bot reviewed Jan 27, 2026

Revert GenderExtractor.scala to upstream version

10f7858

vaibhav45sktech closed this Jan 27, 2026

vaibhav45sktech reopened this Jan 28, 2026

vaibhav45sktech wants to merge 3 commits into dbpedia:master from vaibhav45sktech:fix-template-text-extraction
