Add typosquatting detection to supply chain scanner (CHK-SUP-009) #10

@MikeeBuilds

Summary

The forensic analysis documents active typosquatting attacks on MoltHub, in which attackers create packages with names nearly identical to popular legitimate ones. The example it gives is the account aslaep123, which mimics the legitimate developer asleep123.

Currently, scan_supply_chain.sh CHK-SUP-003 only checks against a static list of known malicious package names. It does not perform fuzzy matching to detect new typosquats that aren't yet in the database.

Attack Pattern

Legitimate Package                  Typosquat Package
-------------------                  ------------------
openclaw-utils          -->        openclaw-utlis        (transposed 'i' and 'l')
clawhub-sdk             -->        clawhuub-sdk          (doubled letter)
asleep123/cool-skill    -->        aslaep123/cool-skill  (author name, 'a' substituted for 'e')
openclaw-ai             -->        0penclaw-ai           (zero for 'o')
claw-calendar           -->        claw-calender         (common misspelling)

Proposed Check: CHK-SUP-009

Algorithm: Levenshtein Distance + Homoglyph Detection

For each installed skill package name, compute:

  1. Levenshtein distance against all known trusted packages
  2. Homoglyph substitution check (0/O, 1/l/I, rn/m)
  3. Character transposition detection

A sketch of the check in Python:
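TRUSTED_PACKAGES = [
    "openclaw-utils",
    "clawhub-sdk",
    "claw-calendar",
    "openclaw-ai",
    # loaded from references/trusted-packages.json
]

# Map each look-alike sequence to one canonical form. The mapping goes in one
# direction only, so names containing 'l', 'm' or 'w' are left unchanged.
HOMOGLYPHS = {
    '0': 'o', 'O': 'o',
    '1': 'l', 'I': 'l',
    'rn': 'm',
    'vv': 'w',
}

def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def apply_homoglyphs(name: str) -> str:
    """Replace each look-alike sequence with its canonical form."""
    for fake, real in HOMOGLYPHS.items():
        name = name.replace(fake, real)
    return name

def is_typosquat(name: str, trusted_list: list) -> tuple:
    """Returns (is_suspicious, matched_trusted_name)"""
    # An exact match anywhere in the list is trusted, so check it before
    # any fuzzy comparison can flag the name against a different entry.
    if name in trusted_list:
        return (False, "")

    normalized = apply_homoglyphs(name)
    for trusted in trusted_list:
        # Flag if edit distance is 1-2 (very close but not exact)
        if levenshtein(name, trusted) <= 2:
            return (True, trusted)

        # Check homoglyph substitution (normalize both sides)
        if normalized == apply_homoglyphs(trusted):
            return (True, trusted)

    return (False, "")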
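
Step 3 of the algorithm (character transposition detection) is not covered by plain Levenshtein distance, which scores a swap of two adjacent characters as 2 edits. One possible approach, offered here as an assumption rather than the planned implementation, is the optimal string alignment variant of Damerau-Levenshtein, which counts an adjacent swap as a single edit. The function name `osa_distance` is hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance where swapping two adjacent characters costs 1
    (optimal string alignment variant of Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                # adjacent transposition
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("openclaw-utlis", "openclaw-utils"))  # 1
print(osa_distance("clawhuub-sdk", "clawhub-sdk"))       # 1
```

With this metric a transposed pair falls within a threshold of 1, which would allow a tighter cutoff than 2 and may reduce false positives on short package names.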

Implementation

Delegate the fuzzy matching to a Python helper called from scan_supply_chain.sh:

check_typosquat_fuzzy() {
  log_info "CHK-SUP-009: Running fuzzy typosquatting detection..."
  
  local trusted_file="$SCRIPT_DIR/../references/trusted-packages.json"
  
  for skill_dir in "$SKILLS_DIR"/*/; do
    [[ -d "$skill_dir" ]] || continue
    local skill_name
    skill_name="$(basename "$skill_dir")"
    
    local result
    result=$(python3 "$SCRIPT_DIR/helpers/typosquat_check.py" \
      "$skill_name" "$trusted_file" 2>/dev/null)
    
    if [[ -n "$result" ]]; then
      emit_finding \
        "CHK-SUP-009" \
        "critical" \
        "Possible typosquat: '$skill_name' resembles '$result'" \
        "Package name is suspiciously similar to trusted package '$result'." \
        "installed=$skill_name closest_trusted=$result" \
        "Verify the package source. If not from the official publisher, remove immediately."
    fi
  done
}
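
The shell function above treats any output from the helper as a finding and uses that output as the name of the trusted package. A minimal sketch of a helper that honors this contract follows. It assumes the JSON file has a top-level `packages` list and that the helper is invoked with exactly two arguments; the function name `closest_trusted` and the `max_distance` parameter are hypothetical.

```python
import json
import sys


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def closest_trusted(name: str, trusted: list, max_distance: int = 2) -> str:
    """Return the nearest trusted name within max_distance, or "" if none."""
    if name in trusted:
        return ""  # exact match: the package itself is trusted
    best_name, best_dist = "", max_distance + 1
    for candidate in trusted:
        dist = levenshtein(name, candidate)
        if dist < best_dist:
            best_name, best_dist = candidate, dist
    return best_name


if __name__ == "__main__" and len(sys.argv) == 3:
    # Assumed call shape: typosquat_check.py NAME TRUSTED_JSON
    with open(sys.argv[2], encoding="utf-8") as fh:
        packages = json.load(fh)["packages"]
    match = closest_trusted(sys.argv[1], packages)
    if match:
        print(match)  # any output is treated as a finding by the caller
```

Because the shell caller discards stderr, a missing interpreter or an unreadable JSON file would look the same as "no finding". Having the caller check the helper's exit status would be one way to surface that case.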

New Files Required

scripts/helpers/typosquat_check.py    # Levenshtein + homoglyph checker
references/trusted-packages.json       # Known-good package names + publishers

Example trusted-packages.json

{
  "packages": [
    "openclaw-utils",
    "clawhub-sdk",
    "claw-calendar",
    "openclaw-ai",
    "claw-weather",
    "openclaw-slack-bridge",
    "claw-gmail-reader"
  ],
  "publishers": [
    "openclaw-official",
    "clawhub-verified",
    "asleep123",
    "steinberger"
  ]
}
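
Since a malformed trusted list would silently disable the check, it may be worth validating the file's shape when it is loaded. The sketch below assumes only the two keys shown in the example above; `TrustedList` and `parse_trusted` are hypothetical names.

```python
import json
from typing import NamedTuple


class TrustedList(NamedTuple):
    packages: frozenset
    publishers: frozenset


def parse_trusted(text: str) -> TrustedList:
    """Parse the JSON document and reject anything that is not two string lists."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("top level must be a JSON object")
    result = {}
    for key in ("packages", "publishers"):
        values = data.get(key)
        if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
            raise ValueError(f"'{key}' must be a list of strings")
        result[key] = frozenset(values)  # sets give O(1) exact-match lookups
    return TrustedList(**result)


sample = '{"packages": ["openclaw-utils", "clawhub-sdk"], "publishers": ["asleep123"]}'
trusted = parse_trusted(sample)
print("clawhub-sdk" in trusted.packages)   # True
print("aslaep123" in trusted.publishers)   # False
```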

Detection Matrix

Technique              Example                                Detection Method
---------              -------                                ----------------
Letter transposition   openclaw-utlis vs openclaw-utils       Levenshtein distance = 2
Doubled character      clawhuub-sdk vs clawhub-sdk            Levenshtein distance = 1
Homoglyph              0penclaw-ai vs openclaw-ai             Homoglyph normalization
Misspelling            claw-calender vs claw-calendar         Levenshtein distance = 1
Author spoof           aslaep123/skill vs asleep123/skill     Publisher name fuzzy match
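
The "Author spoof" row depends on a publisher-level comparison that the package-name check does not perform. The sketch below shows one way it could work, assuming names of the form `publisher/skill`. It swaps in the similarity ratio from Python's standard `difflib` module instead of Levenshtein distance, purely to keep the example short; the function name `spoofed_publisher` and the 0.85 cutoff are illustrative assumptions, not tuned values.

```python
import difflib
from typing import Optional


def spoofed_publisher(package: str, trusted_publishers: list,
                      cutoff: float = 0.85) -> Optional[str]:
    """Return the trusted publisher that the package's publisher imitates, if any."""
    publisher, sep, _skill = package.partition("/")
    if not sep or publisher in trusted_publishers:
        return None  # no publisher prefix, or an exact trusted publisher
    close = difflib.get_close_matches(publisher, trusted_publishers, n=1, cutoff=cutoff)
    return close[0] if close else None


publishers = ["openclaw-official", "clawhub-verified", "asleep123", "steinberger"]
print(spoofed_publisher("aslaep123/cool-skill", publishers))
print(spoofed_publisher("asleep123/cool-skill", publishers))
```

The first call reports `asleep123` as the imitated publisher, and the second returns `None` because the publisher matches a trusted entry exactly.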

References

  • Forensic analysis: "Typosquatting" -- aslaep123 mimicking asleep123
  • Forensic analysis: "Automated scripts to upload hundreds of malicious skills every few minutes"
  • OWASP ASI04: Supply Chain

Labels: enhancement, security
