45 commits
1082aa1
Add Rocky Linux setup script and PostgreSQL integration testing
Sep 27, 2025
44fa9ee
Add access token authentication requirements and production server sc…
Sep 27, 2025
9a29088
Add automatic access token generation to setup scripts
Sep 27, 2025
e270e86
Add firewall configuration for DenoKV port 4512
Sep 27, 2025
6d9978f
Add upgrade script for DenoKV on Rocky Linux
Sep 27, 2025
1666723
Complete setup script with full service startup
Sep 27, 2025
cefc080
updated
Sep 27, 2025
4ff0f53
Fix PostgreSQL to use production service instead of test Docker
Sep 27, 2025
fd37790
Fix PostgreSQL consistency - remove Docker references from test script
Sep 27, 2025
a8ddc14
Fix async trait Send requirements in remote library
Sep 27, 2025
e5ad2dd
Fix PostgreSQL authentication issues - ident to md5
Sep 27, 2025
c74519b
Add standalone PostgreSQL authentication fix script
Sep 27, 2025
059a669
fix
Sep 27, 2025
84b73c8
Add standalone service management script
Sep 27, 2025
ae0334c
Add PostgreSQL connection test script
Sep 27, 2025
bb7d38a
Fix PostgreSQL initialization for existing installations
Sep 27, 2025
6183819
Fix PostgreSQL password authentication issues
Sep 27, 2025
b737bc3
Fix Rust environment sourcing for non-root users
Sep 27, 2025
b8bf509
fix
Sep 27, 2025
694853e
fix
Sep 27, 2025
bde4f74
fix
Sep 27, 2025
24d01fe
feat: Complete DenoKV setup with systemd service
Sep 27, 2025
513e1be
feat: Update to PostgreSQL 16 and remove Docker dependency
Sep 27, 2025
7697736
fix: Resolve DenoKV systemd service CHDIR error
Sep 27, 2025
7678a20
fix: Use correct PostgreSQL package names for Rocky Linux
Sep 27, 2025
2dd171f
fix: Add missing --access-token argument to DenoKV systemd service
Sep 27, 2025
cdb7c44
fix: Add correct DenoKV PostgreSQL environment variables
Sep 27, 2025
9251896
feat: Add automatic access token generation
Sep 27, 2025
0377fdd
feat: Add comprehensive test script to setup logs
Sep 27, 2025
e24c674
feat: Update test script to use native Deno KV API
Sep 27, 2025
5c05a7e
refactor: Simplify test script to basic set/get/delete operations
Sep 27, 2025
c977b40
fix: Ensure DENOKV_ACCESS_TOKEN variable is properly expanded in syst…
Sep 27, 2025
d244a6a
fix: Use proper variable expansion in systemd service heredoc
Sep 28, 2025
657cba8
Add enqueue support to KV Connect and improve connection recovery
Dec 29, 2025
145ea3a
Merge pull request #2 from codebenderhq/feature/enqueue-support-and-c…
rawkakani Dec 29, 2025
4d10bc4
Add enqueue support to KV Connect and improve PostgreSQL connection r…
Dec 29, 2025
b41472e
Remove recycle_timeout - requires runtime specification
Dec 29, 2025
0a86b16
Update upgrade script with manual update instructions
Dec 29, 2025
9f9c14e
Remove retry logic, keep only enqueue support
Dec 29, 2025
21c1e41
fix: implement key expiration and queue parity for postgres backend
Mar 26, 2026
c8b8fa9
Merge pull request #3 from codebenderhq/fix/postgres-key-expiration
rawkakani Mar 26, 2026
557a40e
Merge upstream/main, resolve Cargo.toml conflict (bump postgres to 0.…
Mar 26, 2026
43414a3
fix: resolve merge conflict with upstream/main
Mar 26, 2026
a1ecf9c
fix: postgres atomic concurrency and watch notification bugs
Mar 30, 2026
f3cbbdf
refactor: use monotonic version counter for postgres concurrency
Mar 30, 2026
3 changes: 2 additions & 1 deletion .gitignore
@@ -1,2 +1,3 @@
 target/
-/*.sqlite*
+/*.sqlite*
+rawkakani_db.pem
188 changes: 188 additions & 0 deletions CONNECTION_ISSUES_EXPLANATION.md
@@ -0,0 +1,188 @@
# Why PostgreSQL Connection Issues Are Happening

## Summary of the Issue

Based on your logs from Dec 28, 2025, you're experiencing PostgreSQL connection failures that occur when:

1. **PostgreSQL server process crashes or restarts**
2. **Connection pool tries to use dead connections**
3. **PostgreSQL cannot create relation-cache files**

## Root Causes

### 1. **PostgreSQL Server Process Crash** (Primary Cause)

**Log Evidence:**
```
WARNING: terminating connection because of crash of another server process
```

**What This Means:**
- Another PostgreSQL backend process crashed (not your DenoKV process)
- PostgreSQL automatically terminates all connections when a backend process crashes
- This is a **safety mechanism** to prevent data corruption

**Why This Happens:**
- **Memory issues**: PostgreSQL process ran out of memory (OOM killer)
- **Disk I/O errors**: Storage problems causing process crashes
- **PostgreSQL bugs**: Rare but possible in certain versions
- **Resource exhaustion**: CPU/memory limits reached
- **System instability**: Hardware or OS issues

**How to Diagnose:**
```bash
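# Check PostgreSQL logs for crash details
# (Debian/Ubuntu path; on Rocky Linux the logs are usually in the log/
# subdirectory of the data directory under /var/lib/pgsql)
sudo tail -100 /var/log/postgresql/postgresql-*.log | grep -i "crash\|fatal\|panic"

# Check system logs for OOM kills
sudo dmesg | grep -i "out of memory\|killed process"

# Check PostgreSQL process status
# (the unit may be named postgresql-16 when installed from the PGDG repository)
sudo systemctl status postgresql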
```

### 2. **Connection Pool Using Dead Connections**

**Log Evidence:**
```
WARN deadpool.postgres] Connection error: connection closed
INFO deadpool.postgres] Connection could not be recycled: Connection closed
```

**What This Means:**
- When PostgreSQL crashed, it closed every open connection
- The connection pool (deadpool) still held those connections, which were now **dead**
- On the next request, deadpool tried to recycle one of them, found it closed, and logged that it could not be recycled

**Why This Happens:**
- **No connection health checks**: The pool doesn't validate connections before use
- **Stale connections**: Connections remain in pool after server crash
- **No automatic recovery**: Pool doesn't automatically recreate dead connections

**The Fix (Already Implemented):**
- Added connection validation before use (`SELECT 1` query)
- Added retry logic with exponential backoff
- Added automatic connection recreation on failure
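The validate-before-use idea can be sketched as follows. This is a minimal, std-only illustration and not the code from this PR: the `Connection` trait, `ValidatingPool`, and `FakeConn` are hypothetical names, and a real `is_alive` would run `SELECT 1` against the server rather than read a flag.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a database connection. A real implementation
/// would issue `SELECT 1` inside `is_alive`.
trait Connection {
    fn is_alive(&mut self) -> bool;
}

/// Test double used only to demonstrate the pool below.
struct FakeConn {
    id: u32,
    alive: bool,
}

impl Connection for FakeConn {
    fn is_alive(&mut self) -> bool {
        self.alive
    }
}

/// Minimal pool that checks each idle connection before handing it out.
struct ValidatingPool<C, F>
where
    C: Connection,
    F: FnMut() -> C,
{
    idle: VecDeque<C>,
    connect: F,
}

impl<C: Connection, F: FnMut() -> C> ValidatingPool<C, F> {
    fn new(connect: F) -> Self {
        Self { idle: VecDeque::new(), connect }
    }

    /// Return a connection to the pool for later reuse.
    fn put_back(&mut self, conn: C) {
        self.idle.push_back(conn);
    }

    /// Hand out a live connection: dead idle connections are dropped, and a
    /// fresh one is opened if no idle connection passes the check.
    fn get(&mut self) -> C {
        while let Some(mut conn) = self.idle.pop_front() {
            if conn.is_alive() {
                return conn;
            }
            // Dead connection: it is dropped here and the next one is tried.
        }
        (self.connect)()
    }
}
```

The cost of this design is one extra round trip per checkout, which is usually acceptable for a pool that may outlive a server restart.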

### 3. **Relation-Cache Initialization File Errors**

**Log Evidence:**
```
WARNING: could not create relation-cache initialization file "base/16385/pg_internal.init"
WARNING: could not create relation-cache initialization file "global/pg_internal.init"
```

**What This Means:**
- PostgreSQL writes `pg_internal.init` files to cache system catalog information so that new backend processes start faster
- These files are **optional performance optimizations**
- Failure to create them is **not critical**: PostgreSQL rebuilds the cache in memory and works without them
- This is a **warning**, not an error

**Why This Happens:**
- **File system permissions**: PostgreSQL user doesn't have write access
- **Disk space issues**: No space to create cache files
- **Read-only file system**: Database directory mounted read-only
- **PostgreSQL recovery mode**: Server in recovery and can't write cache

**Impact:**
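- **Minimal**: Queries work, but new connections may take slightly longer to start
- **No data loss**: This doesn't affect data integrity
- **Can be ignored**: The warning itself is non-critical, though a full disk or wrong permissions behind it should still be fixed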

## Why It Happened on Dec 28 (11 Days After Startup)

The server started successfully on **Dec 17** and ran fine for 11 days. Then on **Dec 28**, you saw these errors. This suggests:

1. **PostgreSQL server restarted/crashed** on Dec 28
2. **All existing connections were terminated** by PostgreSQL
3. **Connection pool had stale connections** that were no longer valid
4. **Application tried to use dead connections** → errors occurred

## What Happens Now (After Our Fixes)

With the improvements we've implemented:

1. **Connection Validation**: Every connection is tested with `SELECT 1` before use
2. **Automatic Retry**: Transient errors trigger automatic retries (up to 3 attempts)
3. **Exponential Backoff**: Retries wait progressively longer (100ms, 200ms, 400ms)
4. **Better Error Detection**: We detect transient vs permanent errors
5. **Connection Recreation**: Dead connections are automatically replaced

**Result**: The application can now recover from a PostgreSQL crash without user intervention, provided PostgreSQL is accepting connections again before the retries are exhausted.
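The retry behaviour described above can be sketched in a few lines. This is an illustrative, std-only sketch under stated assumptions, not the code from this PR: `is_transient`, `backoff_delay`, and `with_retry` are hypothetical names, errors are plain strings, the transient check simply matches substrings from the log lines quoted earlier, and "up to 3 attempts" is read here as three retries after the first attempt so that the delays are 100ms, 200ms, and 400ms.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical classification: treat errors whose message suggests a
/// dropped connection as transient, and everything else as permanent.
fn is_transient(message: &str) -> bool {
    let m = message.to_lowercase();
    m.contains("connection closed") || m.contains("terminating connection")
}

/// Delay before retry number `retry` (0-based): 100ms, 200ms, 400ms, ...
fn backoff_delay(retry: u32) -> Duration {
    Duration::from_millis(100 * 2u64.pow(retry))
}

/// Run `op`, retrying transient failures up to `max_retries` times with
/// exponential backoff. Permanent errors are returned immediately.
fn with_retry<T, F>(max_retries: u32, mut op: F) -> Result<T, String>
where
    F: FnMut() -> Result<T, String>,
{
    let mut retry = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(e) if retry < max_retries && is_transient(&e) => {
                sleep(backoff_delay(retry));
                retry += 1;
            }
            Err(e) => return Err(e),
        }
    }
}
```

With these values the total wait is at most 700ms, so a PostgreSQL restart that takes longer than that would still surface as an error to the caller.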

## Recommendations

### 1. **Investigate PostgreSQL Crashes**

Find out why PostgreSQL crashed:

```bash
# Check PostgreSQL error log
sudo tail -200 /var/log/postgresql/postgresql-*.log

# Check for OOM kills
sudo dmesg | grep -i "killed process.*postgres"

# Check system resources
free -h
df -h
```

### 2. **Monitor PostgreSQL Health**

Set up monitoring for:
- PostgreSQL process crashes
- Memory usage
- Disk space
- Connection counts

### 3. **Configure PostgreSQL for Stability**

```sql
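-- Increase shared_buffers if you have enough RAM (takes effect after a server restart)
ALTER SYSTEM SET shared_buffers = '256MB';

-- Set connection limits (100 is the default; takes effect after a server restart)
ALTER SYSTEM SET max_connections = 100;

-- Terminate sessions that sit idle inside an open transaction
ALTER SYSTEM SET idle_in_transaction_session_timeout = '10min';

-- Apply settings that do not need a restart
SELECT pg_reload_conf();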
```

### 4. **Set Up Automatic Restart**

Ensure PostgreSQL auto-restarts on crash:

```bash
# For systemd
sudo systemctl enable postgresql
sudo systemctl edit postgresql
# Add:
# [Service]
# Restart=always
# RestartSec=5
```

### 5. **Fix Relation-Cache Warnings (Optional)**

If you want to eliminate the warnings:

```bash
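# Find the data directory (the path differs between distributions)
sudo -u postgres psql -tAc 'SHOW data_directory;'

# Check ownership and permissions (replace /path/to/data with the output above)
sudo ls -la /path/to/data/base/

# Ensure the postgres user owns the data directory and only it can access it
sudo chown -R postgres:postgres /path/to/data
sudo chmod 700 /path/to/data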
```

## Conclusion

**The connection issues are caused by:**
1. PostgreSQL server process crashing (primary)
2. Connection pool not detecting dead connections (secondary - now fixed)
3. PostgreSQL cache file warnings (cosmetic - can be ignored)

**The application will now handle these gracefully** with automatic retries and connection recovery. However, you should still investigate why PostgreSQL is crashing to prevent future issues.