GEDMerge

A powerful tool to find and merge duplicate people in GEDCOM genealogy files.

Overview

GEDMerge is designed to help genealogists clean up their family tree data by automatically detecting and merging duplicate individual records in GEDCOM files. The tool uses advanced matching algorithms including fuzzy string matching and phonetic comparison to identify potential duplicates with high accuracy.

Features

GEDCOM Parsing: Full support for GEDCOM 5.5 and 5.5.1 formats
Duplicate Detection: Advanced algorithms to find potential duplicate individuals
Smart Merging: Intelligent merging that preserves all relevant data
Data Integrity: Maintains relationships between individuals and families
CLI Interface: Easy-to-use command-line interface

Project Structure

GedMerge/
├── gedmerge/
│   ├── core/           # Core data models and GEDCOM parsing
│   ├── matching/       # Duplicate detection algorithms
│   ├── merge/          # Merge logic and strategies
│   ├── ui/             # User interface (CLI)
│   ├── utils/          # Utility functions
│   └── data/           # Data storage and caching
├── tests/              # Test suite
├── GEDCOM/             # Sample GEDCOM files
├── pyproject.toml      # Project configuration and dependencies
└── README.md           # This file

Installation

From Source

cd GedMerge
pip install -e .

With Development Dependencies

pip install -e ".[dev]"

Dependencies

python-gedcom: GEDCOM file parsing
rapidfuzz: Fast fuzzy string matching
phonetics: Phonetic matching algorithms (Soundex, Metaphone, etc.)
pytest: Testing framework (dev dependency)

Quick Start

Load and Analyze a GEDCOM File

gedmerge analyze path/to/your/file.ged

This will display basic statistics about your GEDCOM file including:

Number of individuals
Number of families
Date range
Geographic coverage

Find Duplicates

gedmerge find-duplicates path/to/your/file.ged

Merge Duplicates

gedmerge merge path/to/your/file.ged --output merged.ged

Name Preprocessing for Duplicate Detection

IMPORTANT: Before running duplicate detection, names must be preprocessed to ensure consistent "apples to apples" comparisons.

See NAMING_CONVENTIONS.md for complete details.

Quick Overview

NN Convention: Use NN for missing given names (genealogy standard)
Language Codes: Set ISO 639-1 codes (en, fr, de) for all names
Clean Variants: Separate embedded variants like Margaret [Marguerite] into separate name records
Remove Placeholders: Remove generic placeholders like EndofLine, Unknown
Preserve Meaningful Data: Keep mother's maiden names even if different from children

Preprocessing Workflow

# Step 1: Structural cleanup (NN convention, placeholders)
python ../preprocess_names_for_matching.py database.rmtree --report
python ../preprocess_names_for_matching.py database.rmtree --execute

# Step 2: Language analysis and variant separation
python ../analyze_name_structure.py database.rmtree --check-language
python ../analyze_name_structure.py database.rmtree --fix-variants --execute

# Step 3: NOW ready for duplicate detection
gedmerge find-duplicates database.rmtree

RootsMagic Compatibility

✅ Language codes are safe - The NameTable.Language field is a standard RootsMagic field and will NOT break your database.

Development

Running Tests

pytest

Running Tests with Coverage

pytest --cov=gedmerge --cov-report=html

Roadmap

Phase 1: Foundation (Current)

Phase 2: Duplicate Detection

Name matching algorithms
Date/place comparison
Relationship analysis
Scoring system

Phase 3: Merging Logic

Merge strategies
Conflict resolution
Data preservation
Undo capability

Phase 4: User Interface

Interactive CLI
Review and approve duplicates
Batch operations
Progress tracking

Phase 5: Advanced Features

Machine learning for matching
Custom matching rules
Performance optimization
Export reports

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License - See LICENSE file for details

Acknowledgments

GEDCOM format specification by The Church of Jesus Christ of Latter-day Saints
python-gedcom library contributors
Genealogy community for feature suggestions and testing

Support

For issues, questions, or suggestions, please open an issue on the project repository.

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
GEDCOM		GEDCOM
RootsMagicData		RootsMagicData
Rootsmagic		Rootsmagic
docs		docs
examples		examples
gedmerge		gedmerge
models		models
scripts		scripts
tests		tests
.gitignore		.gitignore
BUG_REPORT_DATE_MATCHING.md		BUG_REPORT_DATE_MATCHING.md
CONTINUAL_LEARNING.md		CONTINUAL_LEARNING.md
CONTINUAL_LEARNING_SUMMARY.md		CONTINUAL_LEARNING_SUMMARY.md
GENEALOGY_MATCHING_ANALYSIS.md		GENEALOGY_MATCHING_ANALYSIS.md
IMPLEMENTATION_SUMMARY.md		IMPLEMENTATION_SUMMARY.md
LANGUAGE_SUPPORT.md		LANGUAGE_SUPPORT.md
LICENSE		LICENSE
ML_DOCUMENTATION.md		ML_DOCUMENTATION.md
ML_QUICKSTART.md		ML_QUICKSTART.md
PYTHON_3.11_OPTIMIZATIONS.md		PYTHON_3.11_OPTIMIZATIONS.md
QUICK_REFERENCE_INVESTIGATION.md		QUICK_REFERENCE_INVESTIGATION.md
README.md		README.md
README_ML.md		README_ML.md
VALIDATION_FEATURES.md		VALIDATION_FEATURES.md
analyze_cleanup.py		analyze_cleanup.py
analyze_multilingual.py		analyze_multilingual.py
demo_language_support.py		demo_language_support.py
multilingual_analysis.txt		multilingual_analysis.txt
phonetics.py		phonetics.py
pyproject.toml		pyproject.toml
show_equivalency_tables.py		show_equivalency_tables.py
test_adapter.py		test_adapter.py
test_cleaning_simple.py		test_cleaning_simple.py
test_french_name_fixes.py		test_french_name_fixes.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GEDMerge

Overview

Features

Project Structure

Installation

From Source

With Development Dependencies

Dependencies

Quick Start

Load and Analyze a GEDCOM File

Find Duplicates

Merge Duplicates

Name Preprocessing for Duplicate Detection

Quick Overview

Preprocessing Workflow

RootsMagic Compatibility

Development

Running Tests

Running Tests with Coverage

Roadmap

Phase 1: Foundation (Current)

Phase 2: Duplicate Detection

Phase 3: Merging Logic

Phase 4: User Interface

Phase 5: Advanced Features

Contributing

License

Acknowledgments

Support

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

License

skyelaird/GedMerge

Folders and files

Latest commit

History

Repository files navigation

GEDMerge

Overview

Features

Project Structure

Installation

From Source

With Development Dependencies

Dependencies

Quick Start

Load and Analyze a GEDCOM File

Find Duplicates

Merge Duplicates

Name Preprocessing for Duplicate Detection

Quick Overview

Preprocessing Workflow

RootsMagic Compatibility

Development

Running Tests

Running Tests with Coverage

Roadmap

Phase 1: Foundation (Current)

Phase 2: Duplicate Detection

Phase 3: Merging Logic

Phase 4: User Interface

Phase 5: Advanced Features

Contributing

License

Acknowledgments

Support

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages