254 changes: 254 additions & 0 deletions .agent/artifacts/balance_caching_plan.md
@@ -0,0 +1,254 @@
# Balance Caching & Settlement Algorithm Refactoring Plan

## Overview

This document outlines the plan to simplify balance calculations by:
1. Caching net balances per group member in the Group document
2. Using the optimized settlement algorithm uniformly across all balance endpoints
3. Removing redundant calculation logic

## Current Architecture

### Data Flow
```
Expense Created → Settlements Created → Recalculate balances on every request
```

### Problems
1. **Redundant Calculations**: `get_friends_balance_summary()`, `get_overall_balance_summary()`, and `calculate_optimized_settlements()` all have similar aggregation logic
2. **N+1 Queries**: `get_overall_balance_summary()` runs one aggregation per group
3. **No Caching**: Every balance request recalculates from settlements

## Proposed Architecture

### New Data Flow
```
Expense Created → Settlements Created → Update Cached Balances → Read from cache
```

### Group Document Schema Enhancement
```python
{
    "_id": ObjectId,
    "name": "Trip 2024",
    "members": [...],
    "currency": "USD",
    # NEW: Cached balances (updated on expense/settlement changes)
    "cachedBalances": {
        "user1_id": 75.50,   # positive = owed by others
        "user2_id": -50.00,  # negative = owes to others
        "user3_id": -25.50,
    },
    "balancesUpdatedAt": datetime
}
```

## Implementation Steps

### Phase 1: Add Balance Caching Infrastructure

#### 1.1 Update Group Schema
Add `cachedBalances` field to GroupResponse and database model.
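A minimal sketch of the enhanced model, using plain dataclasses rather than the project's actual Pydantic `GroupResponse` (field names follow the schema above; the `balances_are_consistent` helper is an illustrative assumption, not existing code — net balances in a group should sum to roughly zero):

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Optional

@dataclass
class GroupResponse:
    """Sketch of the enhanced group model with cached balances."""
    id: str
    name: str
    currency: str = "USD"
    # NEW: per-member net balance; positive = owed by others, negative = owes
    cachedBalances: Dict[str, float] = field(default_factory=dict)
    balancesUpdatedAt: Optional[datetime] = None

def balances_are_consistent(balances: Dict[str, float], tol: float = 0.01) -> bool:
    """Invariant check: net balances across a group should sum to ~zero."""
    return abs(sum(balances.values())) <= tol

group = GroupResponse(
    id="g1",
    name="Trip 2024",
    cachedBalances={"u1": 75.50, "u2": -50.00, "u3": -25.50},
)
```

The sum-to-zero invariant is a cheap sanity check that could run after every cache update.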

#### 1.2 Create `_recalculate_group_balances()` method
- Calculates per-member net balances using `_calculate_advanced_settlements()`
- Updates the `cachedBalances` field in the group document
- Returns the calculated balances

```python
async def _recalculate_group_balances(self, group_id: str) -> Dict[str, float]:
    """Recalculate and cache member balances for a group."""
    # Get optimized settlements
    optimized = await self._calculate_advanced_settlements(group_id)

    # Convert to per-member balances
    member_balances = defaultdict(float)
    for settlement in optimized:
        member_balances[settlement.fromUserId] -= settlement.amount  # owes
        member_balances[settlement.toUserId] += settlement.amount    # is owed

    # Update group document with cached balances
    await self.groups_collection.update_one(
        {"_id": ObjectId(group_id)},
        {
            "$set": {
                "cachedBalances": dict(member_balances),
                "balancesUpdatedAt": datetime.now(timezone.utc),
            }
        },
    )

    return dict(member_balances)
```

### Phase 2: Update Balance Calculation Triggers

#### 2.1 After `create_expense()` - call `_recalculate_group_balances()`
#### 2.2 After `update_expense()` - call `_recalculate_group_balances()`
#### 2.3 After `delete_expense()` - call `_recalculate_group_balances()`
#### 2.4 After `create_manual_settlement()` - call `_recalculate_group_balances()`
#### 2.5 After `update_settlement_status()` - call `_recalculate_group_balances()`
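The trigger pattern above can be sketched with a toy in-memory stand-in (the real service is async and Mongo-backed; `ExpenseServiceSketch` and its tuple-based settlements are illustrative assumptions): every mutating method ends with a cache recalculation, so reads never see stale data.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

class ExpenseServiceSketch:
    """Toy stand-in showing the trigger pattern from Phase 2."""

    def __init__(self) -> None:
        self.settlements: List[Tuple[str, str, float]] = []  # (from, to, amount)
        self.cached_balances: Dict[str, float] = {}

    def _recalculate_group_balances(self) -> Dict[str, float]:
        """Fold settlements into per-member net balances and cache them."""
        balances = defaultdict(float)
        for from_id, to_id, amount in self.settlements:
            balances[from_id] -= amount  # debtor
            balances[to_id] += amount    # creditor
        self.cached_balances = dict(balances)
        return self.cached_balances

    def create_expense(self, from_id: str, to_id: str, amount: float) -> None:
        self.settlements.append((from_id, to_id, amount))
        self._recalculate_group_balances()  # trigger: keep cache fresh

svc = ExpenseServiceSketch()
svc.create_expense("u2", "u1", 50.0)
svc.create_expense("u3", "u1", 25.5)
```

`update_expense`, `delete_expense`, and the settlement methods would end with the same call.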

### Phase 3: Simplify Balance Endpoints

#### 3.1 Refactor `get_overall_balance_summary()`
Instead of N aggregations (one per group), read from cached balances:

```python
async def get_overall_balance_summary(self, user_id: str) -> Dict[str, Any]:
    """Get overall balance summary using cached balances."""
    groups = await self.groups_collection.find(
        {"members.userId": user_id}
    ).to_list(None)

    total_owed_to_you = 0
    total_you_owe = 0
    groups_summary = []

    for group in groups:
        # Read from cache
        cached = group.get("cachedBalances", {})
        user_balance = cached.get(user_id, 0)

        if abs(user_balance) > 0.01:
            groups_summary.append({
                "group_id": str(group["_id"]),
                "group_name": group["name"],
                "yourBalanceInGroup": user_balance,
            })
            if user_balance > 0:
                total_owed_to_you += user_balance
            else:
                total_you_owe += abs(user_balance)

    return {
        "totalOwedToYou": total_owed_to_you,
        "totalYouOwe": total_you_owe,
        "netBalance": total_owed_to_you - total_you_owe,
        "currency": "USD",
        "groupsSummary": groups_summary,
    }
```

#### 3.2 Refactor `get_friends_balance_summary()`
Use optimized settlements per group, then aggregate by friend:

```python
async def get_friends_balance_summary(self, user_id: str) -> Dict[str, Any]:
    """Get friend balances using optimized settlements per group."""
    groups = await self.groups_collection.find(
        {"members.userId": user_id}
    ).to_list(None)

    friend_balances = defaultdict(lambda: {"balance": 0, "groups": []})

    for group in groups:
        group_id = str(group["_id"])

        # Get optimized settlements for this group
        optimized = await self.calculate_optimized_settlements(group_id)

        for settlement in optimized:
            # Check if user is involved
            if settlement.fromUserId == user_id:
                # User owes friend (negative)
                friend_id = settlement.toUserId
                friend_balances[friend_id]["balance"] -= settlement.amount
                friend_balances[friend_id]["groups"].append({
                    "groupId": group_id,
                    "groupName": group["name"],
                    "balance": -settlement.amount,
                })
            elif settlement.toUserId == user_id:
                # Friend owes user (positive)
                friend_id = settlement.fromUserId
                friend_balances[friend_id]["balance"] += settlement.amount
                friend_balances[friend_id]["groups"].append({
                    "groupId": group_id,
                    "groupName": group["name"],
                    "balance": settlement.amount,
                })

    # Build response with user details
    # ... (fetch user details in batch, build response)
```
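The per-friend aggregation in the loop above reduces to a pure function over `(fromUserId, toUserId, amount)` tuples. This standalone sketch (names and tuple shape are assumptions) isolates the sign convention: money you owe a friend is negative, money owed to you is positive.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def aggregate_friend_balances(
    user_id: str,
    settlements_by_group: Dict[str, List[Tuple[str, str, float]]],
) -> Dict[str, dict]:
    """Aggregate optimized settlements across groups into per-friend balances."""
    friend_balances = defaultdict(lambda: {"balance": 0.0, "groups": []})
    for group_id, settlements in settlements_by_group.items():
        for from_id, to_id, amount in settlements:
            if from_id == user_id:            # you owe the friend
                friend, signed = to_id, -amount
            elif to_id == user_id:            # friend owes you
                friend, signed = from_id, amount
            else:
                continue                      # settlement between other members
            friend_balances[friend]["balance"] += signed
            friend_balances[friend]["groups"].append(
                {"groupId": group_id, "balance": signed})
    return dict(friend_balances)

result = aggregate_friend_balances(
    "me",
    {"g1": [("me", "alice", 10.0)], "g2": [("alice", "me", 4.0)]},
)
```

A single friend can appear in several groups with opposite signs; the net `balance` is what the UI would show.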

### Phase 4: Migration

#### 4.1 Create migration script to initialize `cachedBalances`
For all existing groups, calculate and store initial cached balances.

```python
async def migrate_cached_balances():
    """Initialize cachedBalances for all existing groups."""
    groups = await groups_collection.find({}).to_list(None)

    for group in groups:
        group_id = str(group["_id"])
        await expense_service._recalculate_group_balances(group_id)
```
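Since the risks table below calls for running the migration in batches with retry logic, the driver loop might look like this sketch (`migrate_one` stands in for the real `_recalculate_group_balances` call; batch size and retry count are assumptions):

```python
from typing import Callable, List

def run_in_batches(
    group_ids: List[str],
    migrate_one: Callable[[str], None],
    batch_size: int = 50,
    max_retries: int = 3,
) -> List[str]:
    """Migrate groups in fixed-size batches, retrying each failed item.

    Returns the ids that still failed after all retries, so the operator
    can re-run the migration for just those groups.
    """
    failed: List[str] = []
    for start in range(0, len(group_ids), batch_size):
        for gid in group_ids[start:start + batch_size]:
            for attempt in range(max_retries):
                try:
                    migrate_one(gid)
                    break  # success: move to next group
                except Exception:
                    if attempt == max_retries - 1:
                        failed.append(gid)  # give up on this group
    return failed
```

Returning the failed ids (instead of aborting) keeps a partial migration safe: untouched groups simply fall back to on-demand recalculation.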

## Benefits

1. **Performance**: Balance reads become simple field lookups on the group document (from cache) instead of a full settlements aggregation per group
2. **Simplicity**: One source of truth for balance calculations
3. **Consistency**: All endpoints use the same optimized settlement algorithm
4. **Reduced Complexity**: Remove duplicated aggregation pipelines

## Risks & Mitigations

| Risk | Mitigation |
|------|------------|
| Cache staleness | Always update cache on any expense/settlement change |
| Migration failures | Run migration in batches with retry logic |
| Edge cases | Keep fallback to recalculate on-demand if cache missing |
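The cache-missing fallback from the table can be sketched as a small helper (the real `_get_cached_balances` is async and lives on the service; `recalculate` here is an injected callable purely for illustration):

```python
from typing import Callable, Dict

def get_cached_balances(
    group_doc: dict,
    recalculate: Callable[[str], Dict[str, float]],
) -> Dict[str, float]:
    """Prefer the cached field; fall back to an on-demand recalculation
    when the cache is missing (e.g. a group created before the migration ran)."""
    cached = group_doc.get("cachedBalances")
    if cached is not None:
        return cached
    return recalculate(str(group_doc["_id"]))  # fallback path
```

Note the `is not None` check: an empty dict is a valid cache (a fully settled group) and must not trigger recalculation.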

## Testing Plan

1. Unit tests for `_recalculate_group_balances()`
2. Integration tests for balance endpoints with cached data
3. Migration testing on staging database
4. Performance benchmarks before/after

---

## Implementation Order

1. ✅ Create this plan
2. ✅ Add `_recalculate_group_balances()` method
3. ✅ Add `_get_cached_balances()` helper method
4. ✅ Add cache update calls to expense/settlement methods:
   - `create_expense()`
   - `update_expense()`
   - `delete_expense()`
   - `create_manual_settlement()`
5. ✅ Refactor `get_overall_balance_summary()` to use cache
6. ✅ Refactor `get_friends_balance_summary()` to use optimized settlements
7. ✅ Create migration script (`005_init_cached_balances.py`)
8. 🔲 Run migration on your database
9. 🔲 Write tests
10. 🔲 Deploy migration

## How to Run Migration

```bash
cd backend
python -m migrations.005_init_cached_balances
```

## Summary of Changes

### New Methods Added to `ExpenseService`:
- `_recalculate_group_balances(group_id)` - Calculates and caches per-member balances
- `_get_cached_balances(group_id)` - Gets cached balances with fallback to recalculation

### Modified Methods:
- `create_expense()` - Now calls `_recalculate_group_balances()` after expense creation
- `update_expense()` - Now calls `_recalculate_group_balances()` after expense update
- `delete_expense()` - Now calls `_recalculate_group_balances()` after expense deletion
- `create_manual_settlement()` - Now calls `_recalculate_group_balances()` after settlement creation
- `get_overall_balance_summary()` - Now reads from cached balances (simple O(1) reads)
- `get_friends_balance_summary()` - Now uses `calculate_optimized_settlements()` per group

### New Migration:
- `005_init_cached_balances.py` - Initializes cached balances for all existing groups
90 changes: 77 additions & 13 deletions backend/app/auth/service.py
@@ -104,10 +104,42 @@ async def create_user_with_email(
```diff
         # Check if user already exists
         existing_user = await db.users.find_one({"email": email})
         if existing_user:
-            raise HTTPException(
-                status_code=status.HTTP_400_BAD_REQUEST,
-                detail="User with this email already exists",
-            )
+            # Check if it's a placeholder from import
+            if existing_user.get("isPlaceholder"):
+                # Link this signup to the placeholder account
+                logger.info(
+                    f"Linking signup to placeholder account for user_id={existing_user['_id']}"
+                )
+                hashed_password = get_password_hash(password)
+                await db.users.update_one(
+                    {"_id": existing_user["_id"]},
+                    {
+                        "$set": {
+                            "hashed_password": hashed_password,
+                            "name": name,  # Update with new name if provided
+                            "isPlaceholder": False,
+                            "auth_provider": "email",
+                            "activated_at": datetime.now(timezone.utc),
+                        }
+                    },
+                )
+
+                # Return the linked account
+                existing_user["hashed_password"] = hashed_password
+                existing_user["name"] = name
+                existing_user["isPlaceholder"] = False
```
Comment on lines +107 to +131
⚠️ Potential issue | 🟡 Minor

Keep returned placeholder user in sync with DB updates

The DB update sets auth_provider and activated_at, but the response only updates hashed_password, name, and isPlaceholder, so the client receives stale fields. Consider reusing an update_data dict and applying it to existing_user before returning.

🛠️ Suggested fix
```diff
-                hashed_password = get_password_hash(password)
-                await db.users.update_one(
-                    {"_id": existing_user["_id"]},
-                    {
-                        "$set": {
-                            "hashed_password": hashed_password,
-                            "name": name,  # Update with new name if provided
-                            "isPlaceholder": False,
-                            "auth_provider": "email",
-                            "activated_at": datetime.now(timezone.utc),
-                        }
-                    },
-                )
+                hashed_password = get_password_hash(password)
+                activated_at = datetime.now(timezone.utc)
+                update_data = {
+                    "hashed_password": hashed_password,
+                    "name": name,  # Update with new name if provided
+                    "isPlaceholder": False,
+                    "auth_provider": "email",
+                    "activated_at": activated_at,
+                }
+                await db.users.update_one(
+                    {"_id": existing_user["_id"]},
+                    {"$set": update_data},
+                )

-                existing_user["hashed_password"] = hashed_password
-                existing_user["name"] = name
-                existing_user["isPlaceholder"] = False
+                existing_user.update(update_data)
```
🤖 Prompt for AI Agents
In `@backend/app/auth/service.py` around lines 107 - 131, The response user object
is missing DB fields set during update (auth_provider and activated_at), so
update the in-memory existing_user to match the DB update: build an update_data
dict (same keys as used in the db.users.update_one call, e.g.,
"hashed_password", "name", "isPlaceholder", "auth_provider", "activated_at") and
apply those entries to existing_user after calling get_password_hash and
awaiting db.users.update_one in the signup/linking flow (the block around
get_password_hash, db.users.update_one, and existing_user mutations); ensure
existing_user contains the new "auth_provider" and "activated_at" values before
returning.

```diff
+                # Create refresh token
+                refresh_token = await self._create_refresh_token_record(
+                    str(existing_user["_id"])
+                )
+
+                return {"user": existing_user, "refresh_token": refresh_token}
+            else:
+                raise HTTPException(
+                    status_code=status.HTTP_400_BAD_REQUEST,
+                    detail="User with this email already exists",
+                )

         # Create user document
         user_doc = {
```
@@ -229,21 +261,53 @@ async def authenticate_with_google(self, id_token: str) -> Dict[str, Any]:
```diff
                 detail="Internal server error",
             )
         if user:
-            # Update user info if needed
-            update_data = {}
-            if user.get("firebase_uid") != firebase_uid:
-                update_data["firebase_uid"] = firebase_uid
-            if user.get("imageUrl") != picture and picture:
-                update_data["imageUrl"] = picture
-
-            if update_data:
-                try:
-                    await db.users.update_one(
-                        {"_id": user["_id"]}, {"$set": update_data}
-                    )
-                    user.update(update_data)
-                except PyMongoError as e:
-                    logger.warning("Failed to update user profile: %s", str(e))
+            # Check if this is a placeholder account from import
+            if user.get("isPlaceholder"):
+                # Activate the placeholder account with Google credentials
+                logger.info(
+                    f"Activating placeholder account via Google auth for user_id={user['_id']}"
+                )
+                update_data = {
+                    "firebase_uid": firebase_uid,
+                    "isPlaceholder": False,
+                    "auth_provider": "google",
+                    "name": name if name else user.get("name"),
+                    "activated_at": datetime.now(timezone.utc),
+                }
+                if picture:
+                    update_data["imageUrl"] = picture
+
+                try:
+                    await db.users.update_one(
+                        {"_id": user["_id"]}, {"$set": update_data}
+                    )
+                    user.update(update_data)
+                except PyMongoError as e:
+                    logger.error(
+                        "Failed to activate placeholder account for user_id=%s: %s",
+                        user["_id"],
+                        str(e),
+                    )
+                    raise HTTPException(
+                        status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
+                        detail="Failed to activate account",
+                    )
+            else:
+                # Regular user - update info if needed
+                update_data = {}
+                if user.get("firebase_uid") != firebase_uid:
+                    update_data["firebase_uid"] = firebase_uid
+                if user.get("imageUrl") != picture and picture:
+                    update_data["imageUrl"] = picture
+
+                if update_data:
+                    try:
+                        await db.users.update_one(
+                            {"_id": user["_id"]}, {"$set": update_data}
+                        )
+                        user.update(update_data)
+                    except PyMongoError as e:
+                        logger.warning("Failed to update user profile: %s", str(e))
         else:
             # Create new user
             user_doc = {
```
6 changes: 6 additions & 0 deletions backend/app/config.py
@@ -33,6 +33,12 @@ class Settings(BaseSettings):
```diff
     firebase_auth_provider_x509_cert_url: Optional[str] = None
     firebase_client_x509_cert_url: Optional[str] = None

+    # Splitwise Integration
+    splitwise_api_key: Optional[str] = None
+    splitwise_consumer_key: Optional[str] = None
+    splitwise_consumer_secret: Optional[str] = None
+    frontend_url: str = "http://localhost:5173"  # Frontend URL for OAuth redirect
```

Comment on lines +36 to +41
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Scripts executed (repository: Devasy/splitwiser):

```shell
# First, let's explore the backend directory structure
git ls-files backend/app/ | head -20

# Read the config.py file to see the actual implementation
cat -n backend/app/config.py

# Search for RequestResponseLoggingMiddleware to understand logging behavior
fd -t f "*.py" backend/ -x grep -l "RequestResponseLoggingMiddleware"

# Search for Splitwise usage throughout the codebase
rg -n "splitwise" --type py -i

# Check if there's URL validation in the router file
cat -n backend/app/integrations/router.py | head -100

# Search for any logging of credentials or sensitive data
rg -n "logger\.|splitwise_consumer|splitwise_api_key|frontend_url" backend/app/integrations/router.py

# Check if there are any pydantic validators in config.py
rg -n "validator|field_validator" backend/app/config.py

# Search for any logging statements that might expose secrets
rg -n "logger\." backend/app/config.py backend/app/integrations/service.py backend/app/integrations/router.py
```

Add URL validation for frontend_url to prevent OAuth redirect vulnerabilities.

The frontend_url configuration is used directly in OAuth redirect URIs (lines 47 and 81 in router.py) without validation. An attacker could set this to an untrusted domain, creating an open redirect vulnerability. Consider:

1. Validating that frontend_url matches an allowed list of origins
2. Using Pydantic validators in the Settings class to enforce valid URL formats and trusted domains

The RequestResponseLoggingMiddleware is safe—it only logs request/response metadata (method, URL, status, time) and does not expose request bodies or headers containing credentials.

🤖 Prompt for AI Agents
In @backend/app/config.py around lines 36 - 41, The Settings class exposes
frontend_url which is used directly in OAuth redirect URIs (router.py), allowing
an attacker to configure an untrusted domain; add a Pydantic validator on the
Settings model (e.g., validate_frontend_url or a @validator for frontend_url) to
ensure the value is a well-formed URL and matches a whitelist of allowed origins
(or at minimum only allows specific schemes and hostnames), and update any code
paths that consume frontend_url (router functions that build redirect URIs) to
rely on the validated property so untrusted domains cannot be used for
redirects.
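A hedged sketch of the check the reviewer asks for (the allowlist contents and the wiring into the Settings class via a Pydantic `field_validator` are assumptions; only the origin-matching logic is shown here):

```python
from urllib.parse import urlparse

# Assumed allowlist -- real values would come from deployment config
ALLOWED_FRONTEND_ORIGINS = {"http://localhost:5173"}

def validate_frontend_url(value: str) -> str:
    """Reject frontend_url values whose origin is not explicitly trusted."""
    parsed = urlparse(value)
    origin = f"{parsed.scheme}://{parsed.netloc}"
    if parsed.scheme not in ("http", "https") or origin not in ALLOWED_FRONTEND_ORIGINS:
        raise ValueError(f"frontend_url not in allowed origins: {value!r}")
    return value
```

Comparing the full `scheme://netloc` origin (rather than a substring) avoids bypasses like `http://localhost:5173.evil.example`.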

```python
    # App
    debug: bool = False
```