Bug: Mods duplicating in Firestore — URL format mismatch in icarus-mod-tools sync pipeline #78

@AgentKush

Summary

Mods are being duplicated in the Firestore mods collection (and therefore on projectdaedalus.app/mods and in the Mod Manager). Out of 437 total mod documents, 18 mods are duplicated (15 exact same-author duplicates, 3 same-name-different-author). The root cause is that the imt sync pipeline creates two separate Firestore documents for the same mod because GitHub URLs appear in two different-but-equivalent string formats, and no URL normalization is applied anywhere in the pipeline.

Affected Mods (18 duplicates found)

| Mod Name | Author | DocID (Copy A, updated) | DocID (Copy B, orphaned) |
|---|---|---|---|
| Ammo Crafting Overhaul | AgentKush | C36TS9hnyLTpdvZlQtp9 | MRAJGEkocxjFkLlDzjuQ |
| Armor Set Bonuses Enhanced | AgentKush | PVroMlKJxc0B0hifUhwP | bdCEYK62gdl5EQyssNZW |
| Creature Difficulty Scaling | AgentKush | hZ4KC1AxNU3trvnoO9tt | ywxphBQ0u9mM78Uzsdkg |
| Creature Drop Enhancer | AgentKush | B486JpYYGqaVGV9azcid | r4hNce8YNsuhxMtyXygz |
| Exotic Economy Overhaul | AgentKush | ZoAqKslCgHTh05gmFN9L | hMWOOZjijHNBV89vbHdG |
| Extended Spoil Timers | AgentKush | HsF6nRiXT1OsnMKbNSrP | qLnloK5jAz1AATpiiYGy |
| Faster Crafting | AgentKush | 49aoTZnWS1aGfCsXyHpp | VYEMpqUxCwpBfSgIFZoV |
| Fish and Farming Boost | AgentKush | QjjJOWjmMCQgYGPfP2Rr | ZcIRrbgz7jnwxBT4h9Vc |
| Hardcore Rebalance Pack | AgentKush | TYFO3Oywk2xIb2dX2DZs | qbP6EorPnjDqDhzpk0dv |
| Performance Optimizer | AgentKush | eQh97ZFKg4OTkWme8W3v | xo3yyL0ez61tZvnzS9cj |
| Stack Size Overhaul | AgentKush | 6gNWsTqVbmmjXZ91tZD8 | t7pRO3RC2sq4wQa34GQA |
| Tier Upgrade Forge | AgentKush | cXi3IIm7NF0FF3YodSvu | gyw13Wg07HCFPvyIUHPK |
| Trap and Defense Expansion | AgentKush | Y0dtJQlZiWl1O9q7z7RC | bDZEOAKJ8dmMDIJx4h7O |
| Jimk Fixed Weather Vane | Jimk72 | kVFxZDaWtEJPnnFqi8gN | z6pIfflkYpFjfshtxqA9 |
| Extraction 10 Seconds | Jimk72 / TheOrangeFloof | LRmG4qwVxMyN6aDSdtlt | SR9N5m9olseyhxcEwp1a |
| Extraction 5 Seconds | TheOrangeFloof / Jimk72 | BuM5jiC6WBbdSojVYkoE | khzIZhD0GpRxUg8zvq5c |
| Larkwell Care Package | Begginfokillz / Eric | So6FTF4m3gdV0m6XaFPU | c5y23huHKkdOgCpRVkW4 |
| Zay - Extended Frozen Ore | Zayon | tuYdRhOkJXoM2fe1wtbn | WU6IA6olDg2xIVNE0EAR |

Root Cause

The URL format mismatch

Modders write URLs in modinfo.json using the human-friendly GitHub format:

https://github.com/AgentKush/Icarus-mods/raw/main/Faster_Crafting/Faster_Crafting.EXMODZ

The GitHub Contents API (via Octokit) returns download_url values in the raw format:

https://raw.githubusercontent.com/AgentKush/Icarus-mods/main/Faster_Crafting/Faster_Crafting.EXMODZ

These point to the exact same file (GitHub 302-redirects one to the other), but they're different strings. There is no URL normalization anywhere in icarus-mod-tools to canonicalize them before comparison or storage.

How this creates duplicates

During a single imt sync run, the meta/modinfo/list collection ends up with both URL formats pointing to the same modinfo.json:

  1. https://raw.githubusercontent.com/AgentKush/Icarus-mods/main/modinfo.json (from GitHub API scan)
  2. https://github.com/AgentKush/Icarus-mods/raw/main/modinfo.json (possibly from manual imt add or a secondary discovery path)

When imt sync mods runs, it fetches the modinfo.json from both URLs, getting identical JSON content. Both produce Modinfo objects with the same name/author. The first creates a Firestore document. The second should find it via sync.find(list) and update — but instead creates a new document.

Why sync.find() misses the existing document

The most likely explanation is Firestore eventual consistency. The duplicate pairs are created within ~5 seconds of each other:

'Tier Upgrade Forge': 
  Copy A created: 2026-02-22T05:00:44.459953Z
  Copy B created: 2026-02-22T05:00:49.731891Z  (5.3 seconds later)

The batch processes all mods from the first modinfo.json URL, then immediately processes the same mods from the second URL. The find query runs before Firestore has fully indexed the documents from the first batch, returns nil, and a duplicate is created.

Evidence: Copy B entries are orphaned

Copy B entries have createTime == updateTime — they were created once and never updated again by subsequent sync runs:

Copy A (updated regularly):
  Created:  2026-02-22T05:00:44Z
  Updated:  2026-02-25T01:00:52Z  ← different, gets regular updates

Copy B (orphaned):
  Created:  2026-02-22T05:00:49Z
  Updated:  2026-02-22T05:00:49Z  ← same as create, never touched again

This is consistent with Copy B having been created by accident and never matched by subsequent sync runs, likely because those runs only process one URL format from meta/modinfo/list.

Additional data anomaly: "Eye Colors Expanded!" has wrong download URLs

The mod Eye Colors Expanded! (DocID: 55F4mIY6qi5RYsAY278Y) has file URLs that actually point to More Drop Ship Slots files:

exmodz: .../More%20Drop%20Ship%20Slots/More%20Drop%20Ship%20Slots.EXMODZ
pak:    .../More%20Drop%20Ship%20Slots/More%20Drop%20Ship%20Slots_P.pak

This is probably a modder error in their modinfo.json but worth noting.

Suggested Fix

1. Add URL normalization (primary fix)

Add a helper method to normalize all GitHub URLs to a single canonical format:

def normalize_github_url(url)
  return url if url.nil? || url.empty?

  # github.com/OWNER/REPO/raw/BRANCH/path → raw.githubusercontent.com/OWNER/REPO/BRANCH/path
  url.sub(
    %r{https://github\.com/([^/]+/[^/]+)/raw/},
    'https://raw.githubusercontent.com/\1/'
  ).sub(
    %r{https://github\.com/([^/]+/[^/]+)/blob/},
    'https://raw.githubusercontent.com/\1/'
  )
end

Apply this when:

  • Storing download_url values in meta/modinfo/list (during imt sync modinfo)
  • Parsing URLs from modinfo.json content (during imt sync mods)
  • Storing file URLs in the mods collection documents

2. Deduplicate meta/modinfo/list (secondary fix)

Before running imt sync mods, deduplicate the modinfo URL list by normalized URL so the same modinfo.json isn't fetched twice.
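
A minimal sketch of that deduplication step, assuming the modinfo URL list is available as a plain array of strings. The method name dedupe_modinfo_urls is hypothetical and not part of icarus-mod-tools, and normalize_github_url is a compact variant of the helper proposed in fix 1, repeated here only so the example runs on its own:

```ruby
# Compact variant of the normalization from fix 1 (handles /raw/ and /blob/ in one pattern).
def normalize_github_url(url)
  return url if url.nil? || url.empty?

  url.sub(
    %r{\Ahttps://github\.com/([^/]+/[^/]+)/(?:raw|blob)/},
    'https://raw.githubusercontent.com/\1/'
  )
end

# Hypothetical helper: canonicalize every URL, then drop repeats.
def dedupe_modinfo_urls(urls)
  urls.map { |url| normalize_github_url(url) }.uniq
end

# The two equivalent entries described earlier in this issue.
modinfo_urls = [
  "https://raw.githubusercontent.com/AgentKush/Icarus-mods/main/modinfo.json",
  "https://github.com/AgentKush/Icarus-mods/raw/main/modinfo.json"
]

puts dedupe_modinfo_urls(modinfo_urls)
```

With the two entries above, the list collapses to a single raw.githubusercontent.com URL, so the same modinfo.json would be fetched once per run.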

3. Add a pre-create check with retry (belt-and-suspenders)

Before creating a new document in sync_list, add a brief delay or re-query to handle Firestore eventual consistency:

doc_id = sync.find(list)
if doc_id.nil?
  sleep(0.5)  # Brief pause for Firestore consistency
  doc_id = sync.find(list)  # Retry
end
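
If a single retry turns out not to be enough, the same idea could be written as a bounded retry. This is only a sketch: find_with_retry and its attempts/delay parameters are hypothetical names, and the lookup is passed in as a block so the example does not assume anything about how sync.find is implemented:

```ruby
# Hypothetical helper: run the lookup block up to `attempts` times,
# pausing `delay` seconds between tries, and return the first non-nil result.
def find_with_retry(attempts: 3, delay: 0.5)
  attempts.times do |i|
    result = yield
    return result unless result.nil?

    sleep(delay) if i < attempts - 1 # no pause after the final attempt
  end
  nil
end

# Simulated lookup that only succeeds on the second call.
calls = 0
doc_id = find_with_retry(attempts: 3, delay: 0.1) do
  calls += 1
  calls >= 2 ? "existing-doc-id" : nil
end

puts "found #{doc_id} after #{calls} lookups"
```

In sync_list this might be called as find_with_retry { sync.find(list) }, with a new document created only if the result is still nil.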

4. Cleanup existing duplicates

The 18 "Copy B" orphaned documents should be deleted from Firestore. These are the documents listed in the Copy B column of the table above, each identifiable by createTime == updateTime.
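
As a rough way to double-check the deletion list before removing anything, the createTime == updateTime rule can be applied to an export of the mods collection. The sketch below works on plain in-memory records: ModDoc, orphan_candidates, the document IDs, and the sample rows are all illustrative placeholders, not real Firestore objects or API calls:

```ruby
# Hypothetical record shape for an exported mod document.
ModDoc = Struct.new(:doc_id, :name, :create_time, :update_time, keyword_init: true)

# Among documents that share a name, return those never updated after creation.
def orphan_candidates(docs)
  docs.group_by(&:name)
      .values
      .select { |group| group.size > 1 } # only names with more than one document
      .flat_map { |group| group.select { |doc| doc.create_time == doc.update_time } }
end

# Illustrative data using the timestamp pattern shown in the evidence section.
docs = [
  ModDoc.new(doc_id: "copy-a", name: "Example Mod",
             create_time: "2026-02-22T05:00:44Z", update_time: "2026-02-25T01:00:52Z"),
  ModDoc.new(doc_id: "copy-b", name: "Example Mod",
             create_time: "2026-02-22T05:00:49Z", update_time: "2026-02-22T05:00:49Z"),
  ModDoc.new(doc_id: "solo", name: "Unrelated Mod",
             create_time: "2026-02-20T10:00:00Z", update_time: "2026-02-20T10:00:00Z")
]

orphan_candidates(docs).each { |doc| puts "candidate for deletion: #{doc.doc_id}" }
```

A document whose name is unique is never flagged, even if it has not been updated since creation. If both copies in a pair were never updated, both would be flagged, so the output is best treated as a review list rather than fed straight into a delete.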

How to Reproduce

  1. Have a modinfo.json with URLs using github.com/.../raw/... format
  2. Ensure the repo is in the tracked repos list
  3. Run imt sync modinfo — this stores the modinfo.json URL in raw.githubusercontent.com format (from GitHub API)
  4. If meta/modinfo/list also contains the github.com/.../raw/... format URL (from manual add or secondary discovery), both are stored
  5. Run imt sync mods — both URLs get fetched, producing duplicate Modinfo objects
  6. Rapid-fire Firestore writes bypass the find deduplication due to eventual consistency

Environment

  • Icarus-Mod-Tools gem v2.5.x
  • Google Cloud Firestore (project: projectdaedalus-fb09f)
  • Mod Manager: IcarusModManagerPATCH241
  • Website: projectdaedalus.app/mods
