Skip to content

Possible bug: Should heuristic_comparator consider opposite-sign transactions as similar? #162

@william-davies

Description

@william-davies

I noticed that heuristic_comparator in similar.py currently treats transactions with opposite-sign amounts as similar. I wanted to check if this is intended behavior or if it might be worth considering a change.

Current Behavior

In the current implementation:

            diff = abs((number1 / number2) if number2 != ZERO else (number2 / number1))
            if diff == ZERO:
                return False
            if diff < ONE:
                diff = ONE / diff
            if (diff - ONE) < epsilon:
                break

The abs() is applied to the ratio of the numbers, which means:

  • number1 = -100, number2 = 100
  • diff = abs(-100/100) = abs(-1) = 1
  • 1 - 1 = 0 < epsilon
  • Therefore the amounts are considered "similar"

Potential Consideration

Looking at the test cases in similar_test.py, I noticed there aren't any test cases involving opposite-sign transactions.

If treating opposite-sign transactions as similar isn't intentional, we might want to add a sign check. This would prevent matching transactions that represent fundamentally different operations (like a deposit vs withdrawal).

Suggested Change

If this makes sense, we could add a sign check before comparing the ratio:

# Look for amounts on common accounts.
common_keys = set(amounts1) & set(amounts2)
for key in sorted(common_keys):
    # Compare the amounts.
    number1 = amounts1[key]
    number2 = amounts2[key]
    if number1 == ZERO and number2 == ZERO:
        break
    # Add this check
    if (number1 > 0) != (number2 > 0):  # Different signs
        continue
    diff = abs((number1 / number2) if number2 != ZERO else (number2 / number1))
    # ... rest of comparison ...

Example Where This Matters

; duplicate of tmp.beancount:1
; 2024-12-01 * "" ""
;   Assets:SomeAsset  100.00 USD
cat tmp.beancount                                                                                                                                                                                    
2024-12-01 * "" ""
  Assets:SomeAsset -100.00 USD

I think these two transactions should NOT be considered duplicates. But the current implementation DOES consider them duplicates.

What do you think? Thanks!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions