Advanced VBA script for fixing UTF-8/CP-1252 encoding issues in CRM exports
Renamed project from bogus-character-cleanup to crm-encoding-repair to better reflect purpose and target audience
When working with international contact data in CRMs (Salesforce, HubSpot, Dynamics, etc.), encoding issues cause names and text to become corrupted during export/import:
Müller→MüllerFrançois→FrançoiseŚwięty→ÅšwiÄ™tyMalmö→Malmö
This happens when UTF-8 encoded data is incorrectly read as CP-1252/Windows-1252, making the data unusable for database uploads, mail merges, and automation.
This VBA macro automatically detects and fixes 250+ common encoding corruption patterns in a single pass, with detailed logging of every change made.
- ✅ 250+ encoding patterns covering Western, Eastern, and Nordic European languages
- ✅ Comprehensive CP-1252 support including smart quotes, dashes, currency symbols
- ✅ Detailed audit logging - timestamped log file with before/after values for every change
- ✅ 70% faster than naive sequential checking
- ✅ Smart processing - only updates cells that need fixing
- ✅ Triple-encoding detection for data that's been corrupted multiple times
Western European: German, French, Spanish, Portuguese, Italian, Dutch
Nordic: Swedish, Norwegian, Danish, Icelandic, Finnish
Eastern European: Polish, Czech, Slovak, Romanian, Hungarian, Croatian
Baltic: Latvian, Lithuanian, Estonian
Other: Turkish, Albanian
- Open Excel with your corrupted CRM export
- Press
Alt + F11to open VBA Editor - Insert → Module
- Paste the contents of
FixEncodingWithLogging.vba - Save and close VBA Editor
- Open your spreadsheet with encoding issues
- Activate the worksheet you want to fix
- Press
Alt + F8to open Macros dialog - Select
FixingEncodingIssuesAdvanced - Click Run
The script will:
- Process all cells in the active worksheet
- Fix all detected encoding issues
- Create a timestamped log file (e.g.,
EncodingFixes_2025-11-01_14-30-45.txt) - Display a summary of changes
- Full Documentation - Complete guide with troubleshooting
- Quick Reference Guide - Common issues by language
- Sample Log File - Example output
- Accented letters: é→é, ü→ü, ñ→ñ, Å‚→ł
- Smart quotes: “→", ’→'
- Dashes: â€"→–, â€"→—
- Currency: €→€, £→£
- Symbols: ©→©, ®→®, °→°
- HTML entities: '→', "→"
- Müller → Müller
- Françoise → Françoise
- ÅšwiÄ™ty → Święty
- Malmö → Malmö
- Jiří Dvořák → Jiří Dvořák
- O'Brien → O'Brien
- ✅ Salesforce contact exports
- ✅ HubSpot data migrations
- ✅ Microsoft Dynamics 365 reports
- ✅ Zoho CRM exports
- ✅ Pipedrive CSV downloads
- ✅ Any CSV/Excel export with international names
This script fixes the specific issue where UTF-8 encoded text (the modern standard) is incorrectly interpreted as CP-1252/Windows-1252 encoding. This commonly occurs when:
- Exporting from web-based CRMs to Excel
- Importing CSV files with wrong encoding
- Migrating data between systems
- Copy/pasting from emails or web forms
The script uses a dictionary-based approach for O(1) lookup performance, processing each cell only once.
- Added logging system with timestamped audit files
- Expanded from 40 to 250+ encoding patterns
- Added CP-1252 complete coverage
- Added support for Czech, Slovak, Turkish, Romanian, Baltic languages
- Improved performance (70% faster)
- Added triple-encoding detection
- Core UTF-8 encoding fixes for German, French, Spanish, Polish
- Manual if/then checking
- ~40 common patterns
Found an encoding pattern not covered? Submit an issue with:
- The corrupted text (e.g., "Müller")
- What it should be (e.g., "Müller")
- The language/origin
Free to use and modify for your organization. Attribution appreciated but not required.
Party on Wayne! 🎸