-
Notifications
You must be signed in to change notification settings - Fork 52
Description
I noticed that heuristic_comparator in similar.py currently treats transactions with opposite-sign amounts as similar. I wanted to check if this is intended behavior or if it might be worth considering a change.
Current Behavior
In the current implementation:
diff = abs((number1 / number2) if number2 != ZERO else (number2 / number1))
if diff == ZERO:
return False
if diff < ONE:
diff = ONE / diff
if (diff - ONE) < epsilon:
breakThe abs() is applied to the ratio of the numbers, which means:
number1 = -100,number2 = 100diff = abs(-100/100) = abs(-1) = 11 - 1 = 0 < epsilon- Therefore the amounts are considered "similar"
Potential Consideration
Looking at the test cases in similar_test.py, I noticed there aren't any test cases involving opposite-sign transactions.
If treating opposite-sign transactions as similar isn't intentional, we might want to add a sign check. This would prevent matching transactions that represent fundamentally different operations (like a deposit vs withdrawal).
Suggested Change
If this makes sense, we could add a sign check before comparing the ratio:
# Look for amounts on common accounts.
common_keys = set(amounts1) & set(amounts2)
for key in sorted(common_keys):
# Compare the amounts.
number1 = amounts1[key]
number2 = amounts2[key]
if number1 == ZERO and number2 == ZERO:
break
# Add this check
if (number1 > 0) != (number2 > 0): # Different signs
continue
diff = abs((number1 / number2) if number2 != ZERO else (number2 / number1))
# ... rest of comparison ...Example Where This Matters
; duplicate of tmp.beancount:1
; 2024-12-01 * "" ""
; Assets:SomeAsset 100.00 USD
cat tmp.beancount
2024-12-01 * "" ""
Assets:SomeAsset -100.00 USD
I think these two transactions should NOT be considered duplicates. But the current implementation DOES consider them duplicates.
What do you think? Thanks!