
Log patterns: Log patterns related code #300

Open

dawkaka wants to merge 35 commits into master from log-patterns-chp

Conversation

@dawkaka
Contributor

@dawkaka dawkaka commented Jan 22, 2026

Saving and presenting log patterns in the UI, plus log pattern anomaly detection:

  • Unusual increases in log pattern volume within a time window
  • Newly detected log patterns

Closes #

How to test

@claude

This comment was marked as outdated.

@claude

This comment was marked as outdated.



-- | Get a pattern by ID
getLogPatternById :: DB es => LogPatternId -> Eff es (Maybe LogPattern)
Contributor

Use the _selectWhere pattern instead of enumerating the fields one by one.
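A sketch of that suggestion, assuming the project's DB layer exposes pg-entity's query builders (_selectWhere and the field quasiquoter) plus OverloadedLists; the PG.query wrapper shape is this PR's own:

-- Sketch only: assumes an Entity instance for LogPattern
getLogPatternById :: DB es => LogPatternId -> Eff es (Maybe LogPattern)
getLogPatternById lpId =
  listToMaybe <$> PG.query (_selectWhere @LogPattern [[field| id |]]) (Only lpId)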

Comment on lines 250 to 264
Issues.LogPattern ->
"Describe this log pattern issue and its implications.\n"
<> "Title: "
<> issue.title
<> "\n"
<> "Service: "
<> fromMaybe "unknown-service" issue.service
Issues.LogPatternRateChange ->
"Describe this log pattern rate change and its implications.\n"
<> "Title: "
<> issue.title
<> "\n"
<> "Service: "
<> fromMaybe "unknown-service" issue.service

Contributor

Please use the [text|] or any other quasiquoter, so it's easier to visually see the shape of this message without the Haskell semigroup noise.
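For illustration, the same prompt with [text|] and the ${} interpolation already used elsewhere in this PR (the local bindings here are a sketch, not the final code):

Issues.LogPattern ->
  let svc = fromMaybe "unknown-service" issue.service
      title = issue.title
   in [text|
        Describe this log pattern issue and its implications.
        Title: ${title}
        Service: ${svc}
      |]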

@claude

This comment was marked as outdated.

@claude

This comment was marked as outdated.

@claude

This comment was marked as outdated.

@claude

This comment was marked as outdated.

Add no-focus-ring class to exclude AI search input from global
:focus-visible outline styling.
@claude

This comment was marked as outdated.

Comment on lines 288 to 307
SELECT
lp.id,
lp.project_id,
lp.log_pattern,
lp.pattern_hash,
lp.baseline_state,
lp.baseline_volume_hourly_mean,
lp.baseline_volume_hourly_stddev,
COALESCE(counts.current_count, 0)::INT AS current_hour_count
FROM apis.log_patterns lp
LEFT JOIN (
SELECT log_pattern, COUNT(*) AS current_count
FROM otel_logs_and_spans
WHERE project_id = ?
AND timestamp >= date_trunc('hour', NOW())
AND log_pattern IS NOT NULL
GROUP BY log_pattern
) counts ON counts.log_pattern = lp.log_pattern
WHERE lp.project_id = ?
AND lp.state != 'ignored' AND lp.baseline_state = 'established'
Contributor

How is log_pattern supposed to join on otel_logs_and_spans when they're not in the same database?

Or is log_patterns supposed to be a timeseries table in timefusion as well? If that's the case, then you can't make queries on timeseries tables that don't depend on a timestamp range.

Contributor

Oh, and you never join two time series tables. Some databases might support the join operation, but the performance is always horrible in that case.

Comment on lines 573 to 584
[text|
SELECT lp.log_pattern, count(*) as p_count
FROM apis.log_patterns lp
INNER JOIN otel_logs_and_spans ols
ON lp.log_pattern = ols.log_pattern AND lp.project_id::text = ols.project_id
WHERE lp.project_id = ?
AND lp.state != 'ignored'
AND ${whereCondition}
GROUP BY lp.log_pattern
ORDER BY p_count DESC
OFFSET ? LIMIT 15
|]
Contributor

Same as above: how is log_pattern joining on otel_logs_and_spans?

Contributor Author

otel_logs_and_spans has a log_pattern column

Contributor

Is log_pattern going to be in timefusion or postgres? If it's in timefusion, you can't do joins. And you can't query it without a time range being part of the query.


-- | Get pattern stats from otel_logs_and_spans
-- Returns median and MAD (Median Absolute Deviation) for robust baseline calculation
getPatternStats :: DB es => Projects.ProjectId -> Text -> Int -> Eff es (Maybe PatternStats)
Contributor

Why are you not using our widgets, or at least KQL, for stats and numbers? Isn't this for display?

Contributor Author

This is for detecting spikes


-- | Calculate baselines for log patterns
-- Uses hourly counts from otel_logs_and_spans over the last 7 days
calculateLogPatternBaselines :: Projects.ProjectId -> ATBackgroundCtx ()
Contributor

I don't want to support magic alerts. If we're detecting spikes, we should implement an alert system that our users can enable on any metric as well, so it's the same code and logic for all cases. Not magic logic we run in the background.

@claude

This comment was marked as outdated.

@claude

claude bot commented Jan 24, 2026

PR Review: Log Patterns Feature

Summary

This is a solid implementation of log patterns functionality with anomaly detection. The code follows Haskell best practices and makes good use of the available GHC extensions. However, there are opportunities for improvement in performance, security, and code succinctness.


🔴 Critical Issues

1. Unbounded Query Vulnerability (src/Models/Apis/LogPatterns.hs:96-110)

getLogPatterns :: DB es => Projects.ProjectId -> Int -> Int -> Eff es [LogPattern]
getLogPatterns pid limit offset = PG.query q (pid, limit, offset)

Issue: No validation on limit parameter - malicious caller could request millions of rows.
Fix: Add bounds: let safeLimit = min limit 1000

2. Race Condition in Baseline Calculation (src/BackgroundJobs.hs:1674-1695)

Sequential pattern processing without locking could cause race conditions when multiple jobs run concurrently.
Fix: Use database-level locking or optimistic concurrency control (add version column).
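A minimal sketch of the optimistic-concurrency option, assuming a new version column and a PG.execute-style wrapper (both assumptions, not this PR's code):

-- Affects 0 rows when another job already bumped the version,
-- signalling the caller to re-read and retry.
updateBaselineVersioned :: DB es => LogPatternId -> Int -> Double -> Double -> Eff es Int64
updateBaselineVersioned lpId expectedVersion mean stddev =
  PG.execute
    [sql| UPDATE apis.log_patterns
          SET baseline_volume_hourly_mean = ?
            , baseline_volume_hourly_stddev = ?
            , version = version + 1
          WHERE id = ? AND version = ? |]
    (mean, stddev, lpId, expectedVersion)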

3. Missing Permission Check (src/Models/Apis/LogPatterns.hs:138-148)

acknowledgeLogPatterns :: DB es => Users.UserId -> V.Vector Text -> Eff es Int64

Issue: No validation that user has permission to acknowledge patterns for the project.
Fix: Join with project membership table in the query.
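A hedged sketch of folding the membership check into the update itself; the membership table and column names below are assumptions, not this codebase's actual schema:

acknowledgeLogPatterns :: DB es => Users.UserId -> V.Vector Text -> Eff es Int64
acknowledgeLogPatterns uid hashes =
  PG.execute
    [sql| UPDATE apis.log_patterns lp
          SET state = 'acknowledged', acknowledged_by = ?, acknowledged_at = NOW()
          WHERE lp.pattern_hash = ANY(?)
            AND EXISTS (SELECT 1 FROM projects.project_members pm  -- hypothetical table
                        WHERE pm.project_id = lp.project_id AND pm.user_id = ?) |]
    (uid, hashes, uid)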

4. Potential Orphaned Jobs (src/BackgroundJobs.hs:1748-1752)

If insertIssue fails, the enhancement job is still created.
Fix: Wrap in transaction or use bracket pattern.
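A sketch of the transactional option using postgresql-simple's withTransaction; how this threads through the effect stack, and insertIssue taking a connection, are assumptions:

import Database.PostgreSQL.Simple (withTransaction)

-- Either both the issue insert and the enhancement job commit, or neither does.
createIssueWithJob conn issue =
  withTransaction conn $ do
    insertIssue conn issue
    void $ createJob conn "background_jobs" (EnhanceIssuesWithLLM issue.projectId (V.singleton issue.id))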


⚠️ Performance Issues

1. N+1 Query Pattern (src/BackgroundJobs.hs:1678-1695)

patterns <- LogPatterns.getLogPatterns pid 1000 0
forM_ patterns \lp -> do
  statsM <- LogPatterns.getPatternStats pid lp.logPattern 168

Impact: 1 + N queries for baseline calculation.
Fix: Create batch query using SQL window functions to compute all pattern statistics in one go.

2. Missing Composite Index (static/migrations/0026_log_patterns.sql)

Spike detection query filters by state and baseline_state, but no composite index exists.
Fix:

CREATE INDEX idx_log_patterns_spike_detection 
  ON apis.log_patterns(project_id, state, baseline_state)
  WHERE state != 'ignored' AND baseline_state = 'established';

3. Memory Leak Risk (src/Pkg/Drain.hs:230-236)

getAllLogGroups tree =
  let levelOnes = children tree
      levelTwos = V.concatMap nodes levelOnes
      allLogGroups = V.concatMap logGroups levelTwos
   in V.map (\grp -> (grp.exampleLog, templateStr grp, logIds grp)) allLogGroups

Issue: Multiple intermediate vectors without forced evaluation.
Fix: Add bang patterns or use strict vector operations.
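One option, sketched: force the intermediate vectors so they stop retaining the tree (V.force copies into right-sized storage; the bang patterns need BangPatterns, which this codebase's extension set likely already covers):

getAllLogGroups tree =
  let !levelTwos    = V.force $ V.concatMap nodes (children tree)
      !allLogGroups = V.force $ V.concatMap logGroups levelTwos
   in V.map (\grp -> (grp.exampleLog, templateStr grp, logIds grp)) allLogGroups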


📝 Code Succinctness Improvements

1. Use MultiWayIf (src/BackgroundJobs.hs:1691-1693)

-- Current:
newState = case lp.baselineState of
  BSEstablished -> BSEstablished 
  BSLearning -> if newMean > 100 || patternAgeDays > 1
                then BSEstablished 
                else BSLearning

-- Better:
newState = if
  | lp.baselineState == BSEstablished -> BSEstablished
  | newMean > 100 || patternAgeDays > 1 -> BSEstablished
  | otherwise -> BSLearning

2. Use ViewPatterns (src/Pkg/Drain.hs:156-168)

-- Current: 13 lines
leastRecentlyUsedIndex logGroups =
  V.ifoldl'
    ( \acc i g ->
        case acc of
          Nothing -> Just (i, lastSeen g)
          Just (j, t) ->
            if lastSeen g < t
              then Just (i, lastSeen g)
              else Just (j, t)
    )
    Nothing
    logGroups
    & maybe 0 fst

-- More concise: 6 lines
leastRecentlyUsedIndex = maybe 0 fst . V.ifoldl' step Nothing
  where
    step acc i g = case acc of
      Nothing -> Just (i, lastSeen g)
      Just (j, t) | lastSeen g < t -> Just (i, lastSeen g)
                  | otherwise -> Just (j, t)

3. Replace Case with fromMaybe (src/Models/Apis/LogPatterns.hs:248-252)

-- Current:
results <- PG.query q (pid, pattern')
case results of
  [Only count] -> return count
  _ -> return 0

-- Concise:
fromMaybe 0 . fmap fromOnly . listToMaybe <$> PG.query q (pid, pattern')

4. Define Data Types Instead of Large Tuples (src/BackgroundJobs.hs:1720)

-- Current:
forM_ spikeData \(patternId, patternHash, currentRate, mean, stddev) -> do

-- Better:
data SpikeInfo = SpikeInfo
  { patternId :: LogPatternId
  , patternHash :: Text
  , currentRate :: Double
  , baselineMean :: Double
  , baselineStddev :: Double
  } deriving stock (Show, Generic)

forM_ (toSpikeInfo <$> spikeData) \info -> ...

Better type safety and more readable.

5. Simplify Vector Conversions (src/BackgroundJobs.hs:512-515)

-- Current:
existingPatterns <- LogPatterns.getLogPatternTexts pid
let known = V.fromList $ map (\pat -> ("",False,pat,Nothing,Nothing,Nothing)) existingPatterns
    combined = known <> ((\(logId, content, trId, serviceName, level) -> (logId, True, content, trId, serviceName, level)) <$> events)

-- Better:
let known = (\pat -> ("", False, pat, Nothing, Nothing, Nothing)) <$> V.fromList existingPatterns
    combined = known <> (second5 (const True) <$> events)  -- Define second5 helper

🔒 Security Concerns

1. Missing Input Validation (src/Models/Apis/LogPatterns.hs:151-165)

Pattern content has no length limits - large strings could cause DoS.
Fix: guard (T.length pattern < 10000) >> upsertLogPattern ...

2. Unvalidated Bulk Actions (src/Pages/Anomalies.hs:141-160)

Bulk actions endpoint trusts client-supplied IDs without verifying project ownership.
Fix: Add WHERE project_id = ? to all anomaly ID lookups.


📦 Database Schema Issues

static/migrations/0026_log_patterns.sql

  1. Line 15 & 24: Use enum types instead of TEXT for state and baseline_state:

    state apis.log_pattern_state NOT NULL DEFAULT 'new'
  2. Missing index: acknowledged_by column lacks index, causing slow joins:

    CREATE INDEX idx_log_patterns_acknowledged_by ON apis.log_patterns(acknowledged_by);
  3. Race condition in trigger (line 58): log_pattern_created_notify could create duplicate jobs.
    Fix: Use INSERT ... ON CONFLICT DO NOTHING in the background job queue.


✅ Positive Highlights

  • Excellent use of deriving strategies and newtypes
  • Good separation of concerns (Models, BackgroundJobs, Pages)
  • Comprehensive database migration with indexes
  • Well-structured issue types with proper sum types
  • Good use of V.Vector for performance
  • LLM enhancement integration is well-architected
  • Proper use of Maybe for optional fields

Test Coverage

The test file test/unit/Pkg/DrainSpec.hs was reformatted (49 lines changed). Ensure all tests still pass and no logic was altered during formatting.


Recommended Priority

  1. High: Fix unbounded query (add limit validation)
  2. High: Add permission check to acknowledgeLogPatterns
  3. High: Add composite index for spike detection
  4. Medium: Batch N+1 queries in baseline calculation
  5. Medium: Add transaction wrapper for issue creation
  6. Low: Refactor large tuples to data types
  7. Low: Apply succinctness improvements

Overall, this is a well-structured feature that follows good Haskell practices. Addressing the critical issues will make it production-ready. Great work! 🚀

[text|
SELECT id, created_at, updated_at, project_id, issue_type::text, endpoint_hash, acknowledged_at, acknowledged_by, archived_at, title, service, critical,
CASE WHEN critical THEN 'critical' ELSE 'info' END, affected_requests, affected_clients, NULL::double precision,
CASE WHEN critical THEN 'critical' ELSE 'info' END, 0::int, 0::int, NULL::double precision,
Contributor

Why did you set these to 0?

Contributor Author

Those are not part of the new issues table.

Contributor

Why are we adding it to the query?

Comment on lines 149 to 162
Issues.LogPattern ->
"Generate a concise title for this log pattern issue.\n"
<> "Title: "
<> issue.title
<> "\n"
<> "Service: "
<> fromMaybe "unknown-service" issue.service
Issues.LogPatternRateChange ->
"Generate a concise title for this log pattern rate change.\n"
<> "Title: "
<> issue.title
<> "\n"
<> "Service: "
<> fromMaybe "unknown-service" issue.service
Contributor

Use the text quasiquotes here.

@@ -0,0 +1,68 @@
BEGIN;

CREATE TABLE IF NOT EXISTS apis.log_patterns (
Contributor

We will support patterns on different fields, not just the default body/message field. This table doesn't seem aware of this expectation.

CREATE INDEX IF NOT EXISTS idx_log_patterns_last_seen ON apis.log_patterns(project_id, last_seen_at DESC);
CREATE INDEX IF NOT EXISTS idx_log_patterns_service ON apis.log_patterns(project_id, service_name);

CREATE OR REPLACE FUNCTION apis.new_log_pattern_proc() RETURNS trigger AS $$
Contributor

Again, the question remains: is this a timeseries table or a regular table? If it's a timeseries table (to be on timefusion), then it won't be in the same database as the background jobs, and hence would be unable to queue jobs from within the db.

Contributor Author

It's a regular table.

@claude

claude bot commented Jan 27, 2026

Code Review: Log Patterns Implementation

Overall Assessment: This is solid, production-quality Haskell code with excellent architecture. The implementation demonstrates strong type safety, proper database design, and statistical rigor. However, there are some critical issues and opportunities for improvement.


Critical Issues 🔴

1. Division by Zero Risk

Location: src/BackgroundJobs.hs:1712

let zScore = (currentRate - mean) / stddev

Missing check for mean > 0. If mean = 0 but currentRate > 0, this should likely be flagged as a spike for a new pattern gaining activity.

Fix:

| stddev > 0 && mean > 0 ->
    let currentRate = fromIntegral lpRate.currentHourCount
        zScore = (currentRate - mean) / stddev
        isSpike = abs zScore > 3.0 && currentRate > mean
| mean == 0 && currentHourCount > 10 -> 
    Just (lpRate.patternId, ...) -- New pattern with activity

2. Trigger Performance Issue

Location: static/migrations/0026_log_patterns.sql:40-58

The trigger fires on ALL inserts, including ON CONFLICT DO UPDATE cases. For 1000 pattern upserts per batch, this creates 1000 duplicate background jobs.

Fix:

CREATE TRIGGER log_pattern_created_notify 
AFTER INSERT ON apis.log_patterns 
FOR EACH ROW 
WHEN (NEW.xmax = 0)  -- Only real inserts, not upserts
EXECUTE PROCEDURE apis.new_log_pattern_proc();

3. Memory Risk with Large Pattern Sets

Location: src/BackgroundJobs.hs:512, 1678

Loading all patterns into memory without pagination. For projects with 10K+ patterns, this could cause OOM.

Fix: Add pagination or limit to getLogPatterns:

-- Process in batches of 1000
let batchSize = 1000
patterns <- getLogPatterns pid batchSize 0

High Priority Issues ⚠️

4. Missing Database Constraints

Location: static/migrations/0026_log_patterns.sql:15, 24

State columns lack CHECK constraints for data integrity.

Fix:

state TEXT NOT NULL DEFAULT 'new' CHECK (state IN ('new', 'acknowledged', 'ignored')),
baseline_state TEXT NOT NULL DEFAULT 'learning' CHECK (baseline_state IN ('learning', 'established')),

5. Hardcoded Magic Numbers

Throughout src/BackgroundJobs.hs, thresholds are hardcoded:

  • Line 1713: 3.0 (3-sigma threshold)
  • Line 1681: 168 (7-day baseline window)
  • Line 1743: 10000 (min events for new pattern issue)
  • Line 1692: 100 (min hourly events for baseline)

Fix: Extract to named constants with documentation:

-- Constants for baseline establishment
minEventsForEstablishment :: Double
minEventsForEstablishment = 100  -- Minimum hourly events to establish baseline

minAgeForEstablishment :: Double  
minAgeForEstablishment = 1  -- Minimum age in days

sigmaThreshold :: Double
sigmaThreshold = 3.0  -- Standard deviations for spike detection

6. Missing Index

Location: static/migrations/0026_log_patterns.sql

No index on pattern_hash alone for cross-project queries.

Fix:

CREATE INDEX idx_log_patterns_hash ON apis.log_patterns(pattern_hash);

Code Quality Improvements

7. Redundant SQL in LogPatterns.hs

Location: src/Models/Apis/LogPatterns.hs:95-110, 119-134

SELECT clauses repeat all 20 fields multiple times.

Improvement: Leverage the Entity deriving more:

getLogPatterns pid limit offset = 
  PG.query [sql| SELECT * FROM apis.log_patterns 
                 WHERE project_id = ? 
                 ORDER BY last_seen_at DESC 
                 LIMIT ? OFFSET ? |] (pid, limit, offset)

8. Use Enum for Change Direction

Location: src/Models/Apis/Issues.hs:669

direction :: Text should be type-safe:

data ChangeDirection = Spike | Drop deriving (Show, Eq, Generic, ToJSON, FromJSON)

9. MultiWayIf for Cleaner Conditionals

Location: src/BackgroundJobs.hs:1690-1693

Already using many extensions, but MultiWayIf would help:

let newState = if
  | patternAgeDays < 1 -> BSLearning
  | newMean < 100 && lp.baselineState == BSLearning -> BSLearning  
  | otherwise -> BSEstablished

Performance Considerations

Strengths ✅

  • Excellent use of median + MAD instead of mean + stddev for outlier robustness
  • Batch queries to prevent N+1 (lines 315-318)
  • HashMap for O(1) pattern lookup (line 1719)
  • Proper use of coerce for zero-cost conversions

Opportunities

  • Pattern Loading: Consider streaming for very large pattern sets
  • Drain Tree: Not persisted between runs (CPU vs memory trade-off is reasonable)
  • Query Optimization: Lines 285-306 use LEFT JOIN with subquery - consider materialized view for 1000s of patterns

Security ✅

  • ✅ All queries use parameterized statements (no SQL injection)
  • ✅ Proper access control via project_id filtering
  • ⚠️ DoS potential: High log volume → thousands of patterns → thousands of baseline calculations (consider rate limiting)

Missing Functionality

  1. No drop detection - only spike detection (line 1713: currentRate > mean prevents drop detection)
  2. No pattern cleanup - consider retention policy for patterns not seen in 30+ days
  3. No error handling/retry - baseline/spike calculation failures are silent
  4. No tests - missing unit tests for baseline edge cases, integration tests for spike detection

What I Really Like 🌟

  1. Statistical Rigor: Using median + MAD with proper 1.4826 scaling factor shows deep understanding
  2. Type Safety: Excellent use of newtypes (LogPatternId), derivingVia, and type-safe database operations
  3. Modern Haskell: Great use of RecordDotSyntax, BlockArguments, DerivingVia
  4. State Machine: Proper baseline state management (BSLearning → BSEstablished)
  5. Database Design: Comprehensive indexes and foreign key constraints
  6. Code Organization: Clean separation between data models, background jobs, and UI

Recommendations

Must Fix Before Merge

  1. ✅ Fix division-by-zero in spike detection (BackgroundJobs.hs:1712)
  2. ✅ Fix trigger to avoid duplicate jobs (migration)
  3. ✅ Add CHECK constraints for state columns

Should Fix Soon

  1. Add pagination for pattern loading (prevent OOM)
  2. Extract magic numbers to named constants
  3. Add basic error handling and logging
  4. Add index on pattern_hash

Nice to Have

  1. Implement drop detection
  2. Add unit tests for statistical calculations
  3. Add pattern cleanup/retention policy

Verdict: Strong implementation with minor critical fixes needed. The architecture is sound and the code demonstrates excellent Haskell practices. Once the critical issues are addressed, this will be production-ready.

@claude

claude bot commented Jan 27, 2026

PR #300 Review: Log Patterns Functionality

Reviewed 12 files with +1102/-374 lines. Overall: Good implementation with room for improvement.

Priority Issues

  1. Security: Missing Input Validation (LogPatterns.hs:151) - No length validation on patterns/hashes
  2. Security: No Rate Limiting - Pattern creation lacks rate limiting
  3. Performance: Memory Issue (BackgroundJobs.hs:512) - getLogPatternTexts loads ALL patterns into memory
  4. Bug: Race Condition (BackgroundJobs.hs:1743-1754) - TOCTOU between event count check and pattern retrieval
  5. Bug: Fragile LLM Parsing (Enhancement.hs:86-88) - Silent failures on unexpected format

Performance Optimizations

  • Vector Inefficiency (BackgroundJobs.hs:536): O(n) V.find - use HashMap for O(1)
  • DB Query Performance (LogPatterns.hs:218-243): Add index on (project_id, log_pattern, timestamp)
  • Magic Numbers (BackgroundJobs.hs:1692): Extract to constants

Code Succinctness (GHC Extensions)

  • Use guards instead of case on bools (LogPatterns.hs:138-141)
  • Use ViewPatterns for cleaner pattern matching (Drain.hs:100-107)
  • Use RecordWildCards to simplify long functions (Issues.hs:669-694)
  • Break long lines (Issues.hs:694 is 140+ chars)

Test Coverage

Good: Comprehensive DRAIN algorithm tests
Missing: LogPatterns DB ops, baseline calc, spike detection, integration tests

Package Usage

Consider using statistics package for robust median/MAD/stddev calculations
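If adding the dependency is overkill, the median/MAD math is small enough to sketch dependency-free (illustrative only; the statistics package's exact function names aren't assumed here):

import Data.List (sort)

median :: [Double] -> Double
median [] = 0
median xs
  | odd n = sorted !! mid
  | otherwise = (sorted !! (mid - 1) + sorted !! mid) / 2
  where
    sorted = sort xs
    n = length xs
    mid = n `div` 2

-- MAD scaled by 1.4826 approximates a standard deviation for normal data
mad :: [Double] -> Double
mad xs = 1.4826 * median [abs (x - m) | x <- xs]
  where m = median xs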

Summary Score

  • Code Quality: 7/10
  • Security: 6/10 (needs input validation & rate limiting)
  • Performance: 7/10
  • Test Coverage: 5/10
  • Maintainability: 7/10

Recommendation: Request changes for security/performance issues, then approve.

@tonyalaribe
Contributor

Rename patterns function from 15mins to 5mins.

@claude

claude bot commented Jan 28, 2026

PR Review: Log Pattern Anomaly Detection

Great work on implementing log pattern anomaly detection! The implementation is thoughtful with good test coverage. However, I've identified several critical issues and optimization opportunities.


🚨 CRITICAL: Security Issue

SQL Injection Vulnerability in src/Models/Apis/RequestDumps.hs lines 1344-1356:

The target parameter is interpolated into SQL without proper validation. An attacker could manipulate targetM to inject SQL.

-- VULNERABLE CODE
target = fromMaybe "log_pattern" targetM
let q = [text|
  SELECT log_pattern, count(*) as p_count
  FROM otel_logs_and_spans
  WHERE ${whereCondition} AND log_pattern = ANY(?)
  ...
|]

Fix: Use a whitelist:

target = case targetM of
  Just "log_pattern" -> "log_pattern"
  Just "summary_pattern" -> "summary_pattern"
  _ -> "log_pattern"

⚡ Performance Issues

1. Expensive Baseline Calculations

Lines 1213-1248 use PERCENTILE_CONT twice per pattern, which is computationally expensive. For projects with hundreds of patterns, this will be slow.

Recommendations:

  • Create materialized view for hourly counts
  • Consider simpler approximations (e.g., 1.4826 * stddev_samp instead of MAD)
  • Batch process patterns instead of individual queries

2. Missing Indexes Causing Table Scans

Lines 89-100: Query filtering on (summary_pattern IS NULL OR log_pattern IS NULL) will cause full table scans.

Add partial indexes:

CREATE INDEX idx_otel_logs_missing_log_pattern 
  ON otel_logs_and_spans(project_id, timestamp) 
  WHERE log_pattern IS NULL;

3. Inefficient Pattern Lookup

Lines 113-119: getLogPatternTexts loads ALL patterns into memory for every batch. For projects with thousands of patterns, this is wasteful.

Recommendation: Filter at database level instead of in-memory.

4. Job Storm

Lines 76-81: Scheduling jobs for ALL projects at once creates a job storm.

Recommendation: Stagger job creation:

run_at = NOW() + random() * INTERVAL '1 hour'

🐛 Bugs

1. Pattern Metadata Extraction

Lines 139-146: If the first log ID doesn't match any event, metadata is lost.

-- BUG: Uses first ID which might not exist
let (serviceName, logLevel, logTraceId) = case ids V.!? 0 of
      Just logId | logId /= "" ->
        case V.find (\(i, _, _, _, _) -> i == logId) events of

Fix: Find any matching event:

let metadata = case V.find (\(i, _, _, _, _) -> V.elem i ids) events of

2. Inconsistent Z-Score Logic

Line 224: Using abs but then checking direction is inconsistent.

-- INCONSISTENT
let zScore = (currentRate - mean) / stddev
    isSpike = abs zScore > 3.0 && currentRate > mean

Fix: Just check positive spikes:

let isSpike = zScore > 3.0  -- Only detect upward spikes

3. Variable Name Collision

Line 228: Using pid' when pid is in scope is confusing.

let spikeIds = V.fromList $ map (\(pid', _, _, _, _) -> pid') spikeData

Fix: Use descriptive names:

let spikeIds = V.fromList $ map (\(patternId, _, _, _, _) -> patternId) spikeData

📝 Code Quality & Succinctness

1. Long Parameter Lists Need Record Types

Multiple functions have 7-9 parameters. Use records:

-- Before (9 parameters!)
updateTreeWithLog :: DrainTree -> Int -> Text -> V.Vector Text -> Text -> Bool -> Text -> Text -> UTCTime -> DrainTree

-- After
data LogContext = LogContext
  { tokenCount :: Int
  , firstToken :: Text
  , tokens :: V.Vector Text
  , logId :: Text
  , isSample :: Bool
  , content :: Text
  , field :: Text
  , timestamp :: UTCTime
  }

updateTreeWithLog :: DrainTree -> LogContext -> DrainTree

2. Use RecordWildCards & NamedFieldPuns

Lines 1756-1761 can be simplified:

-- Enable RecordWildCards
case V.findIndex (\DrainLevelOne{tokenCount} -> tokenCount == targetCount) levelOnes of
  Just index ->
    let existingLevel@DrainLevelOne{nodes} = levelOnes V.! index

3. Repetitive Update Pattern

Lines 1760-1803: updateOrCreateLevelOne, updateOrCreateLevelTwo, and updateOrCreateLogGroup all follow the same pattern. Abstract into a higher-order function:

updateOrCreateInVector :: 
  (a -> Bool) ->           -- Find predicate
  (a -> (a, Bool)) ->      -- Update function
  (V.Vector a -> a) ->     -- Create function
  V.Vector a -> (V.Vector a, Bool)
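A possible implementation matching that signature (a sketch; the real update/create callbacks live in this PR's Drain code):

import qualified Data.Vector as V

updateOrCreateInVector
  :: (a -> Bool)        -- find predicate
  -> (a -> (a, Bool))   -- update function, returns the new element plus a flag
  -> (V.Vector a -> a)  -- create function, builds a fresh element
  -> V.Vector a
  -> (V.Vector a, Bool)
updateOrCreateInVector match update create vec =
  case V.findIndex match vec of
    Just i ->
      let (updated, flag) = update (vec V.! i)
       in (vec V.// [(i, updated)], flag)
    Nothing -> (V.snoc vec (create vec), False)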

4. Use Type Classes for Issue Prompts

Enhancement.hs lines 650-950 have repetitive pattern matching. Use type classes:

class IssueDataToPrompt a where
  toTitlePrompt :: a -> Text
  toDescriptionPrompt :: a -> Text

instance IssueDataToPrompt APIChangeData where ...
instance IssueDataToPrompt RuntimeExceptionData where ...

5. Inefficient Text Processing

Line 1903-1906: Character-by-character processing is slow. Use T.uncons:

-- Instead of: if T.head t == '"'
case T.uncons t of
  Just ('"', rest) -> ...
  Just ('[', rest) -> ...

🔍 SQL Optimization

1. SELECT * is Inefficient

Line 1319: Avoid SELECT *:

-- Replace with explicit column list
PG.query [sql| SELECT id, project_id, log_pattern, ... FROM apis.log_patterns WHERE id = ANY(?) |]

2. Missing Index for Common Query

Lines 1104-1115: Add composite index:

CREATE INDEX idx_log_patterns_project_last_seen 
  ON apis.log_patterns(project_id, last_seen_at DESC);

3. Filter Recently Active Patterns Only

Lines 1285-1311: Add filter to reduce unnecessary joins:

WHERE lp.project_id = ?
  AND lp.state != 'ignored' 
  AND lp.baseline_state = 'established'
  AND lp.last_seen_at > NOW() - INTERVAL '1 day'  -- Only check recently active

4. Data Migration for Constraint Change

Migration lines 2020-2093: Constraint changed but might have duplicates. Add deduplication:

-- Before changing constraint
DELETE FROM apis.log_patterns a USING apis.log_patterns b
WHERE a.id > b.id 
  AND a.project_id = b.project_id 
  AND a.log_level = b.log_level 
  AND a.field_path = b.field_path 
  AND a.pattern_hash = b.pattern_hash;

🔒 Security Concerns

1. Sample Messages May Contain PII

Line 257: Sample log messages might contain sensitive data.

Recommendations:

  • Truncate to first 200 chars
  • Add config flag to disable sample collection
  • Scrub common PII patterns (emails, API keys); a sketch follows below
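A naive sketch of the truncation plus a crude email mask (stand-ins only; real PII scrubbing needs proper patterns):

import qualified Data.Text as T

scrubSample :: T.Text -> T.Text
scrubSample = T.take 200 . maskEmails
  where
    maskEmails = T.unwords . map maskWord . T.words
    maskWord w
      | "@" `T.isInfixOf` w = "<email>"  -- crude placeholder mask
      | otherwise = w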

2. Validate Background Job Payloads

Line 2069: Ensure background job processor validates pattern hash format to prevent injection.


✅ What's Good

  • Excellent test coverage in test/unit/Pkg/DrainSpec.hs
  • Good use of batch queries to avoid N+1 (line 1316)
  • Proper use of quasi-quoters for SQL
  • Thoughtful use of median/MAD for robustness against outliers
  • Well-structured hierarchical clustering in Drain algorithm

🎯 Priority Actions

Must Fix Before Merge:

  1. ✅ Fix SQL injection in RequestDumps.hs
  2. ✅ Add pattern metadata extraction bug fix
  3. ✅ Add partial indexes for pattern extraction

Should Fix:
4. Optimize baseline calculation (batch or simplify stats)
5. Stagger background job creation
6. Add composite indexes for common queries
7. Use record types for long parameter lists
8. Fix z-score logic inconsistency

Nice to Have:
9. Extract higher-order update function
10. Use RecordWildCards/ViewPatterns
11. Add type classes for issue prompts
12. Use explicit SELECT columns


Overall, this is solid work! The main blockers are the SQL injection vulnerability and performance concerns at scale. Once those are addressed, this will be a great addition.

@claude

claude bot commented Jan 29, 2026

Code Review: PR #300 - Log Patterns Implementation

Executive Summary

This PR implements log pattern detection using the DRAIN algorithm with database persistence, UI presentation, and anomaly detection. The implementation is solid but has several areas for improvement regarding code quality, performance, and succinctness.

Overall Assessment: 7.5/10

  • ✅ Well-structured, type-safe code
  • ✅ Excellent test coverage
  • ✅ Proper SQL parameterization (security)
  • ⚠️ Debug code left in production
  • ⚠️ N+1 query patterns
  • ⚠️ Opportunities for more concise code

Critical Issues (Fix Before Merge)

1. Debug Code in Production 🚨

src/BackgroundJobs.hs:523-524, 274-275

traceShowM patternTxt  -- Remove
traceShowM ids         -- Remove

src/Models/Apis/LogPatterns.hs:274-275

traceShowM counts      -- Remove
traceShowM patterns    -- Remove

Fix: Remove all traceShowM calls or replace with proper structured logging:

Log.logDebug "Processing pattern" [("pattern", AE.toJSON patternTxt)]

2. Potential Index Error 🐛

src/Pkg/Drain.hs:164-177

leastRecentlyUsedIndex :: V.Vector LogGroup -> Int
leastRecentlyUsedIndex logGroups =
  V.ifoldl' ... Nothing logGroups
    & maybe 0 fst  -- Returns 0 for empty vector!

Issue: Returns index 0 for empty vectors, causing crashes.

Fix:

leastRecentlyUsedIndex :: V.Vector LogGroup -> Maybe Int
leastRecentlyUsedIndex logGroups
  | V.null logGroups = Nothing
  | otherwise = fst <$> V.ifoldl' ... Nothing logGroups

3. Commented Code

src/BackgroundJobs.hs:502

-- processPatterns "summary" "summary_pattern" ...  -- Remove entirely

Performance Issues

1. N+1 Query Pattern ⚡

src/BackgroundJobs.hs:1664-1697

patterns <- LogPatterns.getLogPatterns pid 1000 0      -- 1 query
forM_ patterns \lp -> do
  statsM <- LogPatterns.getPatternStats pid lp.logPattern 168  -- 1000 queries
  _ <- LogPatterns.updateBaseline pid ...                      -- 1000 queries

For 1000 patterns: 2001 total queries

Recommendation: Use a single query with window functions:

WITH hourly_stats AS (
  SELECT log_pattern,
         PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY hourly_count) as median,
         STDDEV(hourly_count) as stddev
  FROM (
    SELECT log_pattern, 
           date_trunc('hour', timestamp) as hour,
           COUNT(*) as hourly_count
    FROM otel_logs_and_spans
    WHERE project_id = ? 
      AND log_pattern = ANY(?)  -- Batch all patterns
      AND timestamp >= NOW() - INTERVAL '7 days'
    GROUP BY log_pattern, hour
  ) subq
  GROUP BY log_pattern
)
UPDATE apis.log_patterns lp
SET baseline_volume_hourly_mean = hs.median,
    baseline_volume_hourly_stddev = hs.stddev,
    baseline_updated_at = NOW()
FROM hourly_stats hs
WHERE lp.log_pattern = hs.log_pattern AND lp.project_id = ?

2. Multiple Queries Where One Would Suffice

src/Models/Apis/LogPatterns.hs:269-304

patterns <- PG.query patternsQuery (Only pid)     -- Query 1
counts <- PG.query countsQuery (Only pid)         -- Query 2
pure $ map (attachCount counts) patterns

Recommendation: Single JOIN query:

SELECT lp.*, COALESCE(c.count, 0)::INT as current_hour_count
FROM apis.log_patterns lp
LEFT JOIN (
  SELECT log_pattern, COUNT(*)::INT as count
  FROM otel_logs_and_spans
  WHERE project_id = ? AND timestamp >= NOW() - INTERVAL '1 hour'
  GROUP BY log_pattern
) c ON lp.log_pattern = c.log_pattern
WHERE lp.project_id = ? AND lp.state != 'ignored'

3. Inefficient Text Search

src/Pkg/Drain.hs:260-264

looksLikeJson t =
  ("{" `T.isInfixOf` t && "}" `T.isSuffixOf` t)  -- Scans entire string

Better:

looksLikeJson t = case T.uncons (T.strip t) of
  Just ('{', _) -> "}" `T.isSuffixOf` t
  Just ('[', _) -> "]" `T.isSuffixOf` t
  _ -> False

Code Succinctness (Leverage GHC Extensions)

Use LambdaCase

src/BackgroundJobs.hs:509-515

-- Current:
let (serviceName, logLevel, logTraceId) = case ids V.!? 0 of
      Just logId | logId /= "" ->
        case V.find (\(i, _, _, _, _) -> i == logId) events of
          Just (_, _, trId, sName, lvl) -> (sName, lvl, trId)
          Nothing -> (Nothing, Nothing, Nothing)
      _ -> (Nothing, Nothing, Nothing)

-- Better with LambdaCase:
let (serviceName, logLevel, logTraceId) = 
      ids V.!? 0 >>= \case
        logId | logId /= "" ->
          V.find (\(i,_,_,_,_) -> i == logId) events <&> 
            \(_,_,trId,sName,lvl) -> (sName, lvl, trId)
        _ -> Nothing
      & fromMaybe (Nothing, Nothing, Nothing)

Reduce Column Enumeration

src/Models/Apis/LogPatterns.hs:102-112

Manually lists all columns when could use SELECT *:

-- Current: 10 lines of column names
SELECT id, project_id, created_at, updated_at, log_pattern, ...

-- Better:
SELECT * FROM apis.log_patterns
WHERE project_id = ? ORDER BY last_seen_at DESC LIMIT ? OFFSET ?

Package Usage

The project has excellent packages available (lens, extra, safe) but could leverage them more:

Use lens operators

src/Models/Apis/LogPatterns.hs:306

-- Current: Manual record construction
attachCount counts (patId, projId, logPat, patHash, blState, blMean, blStddev) =
  LogPatternWithRate { patternId = patId, ... }

-- Could use lens for cleaner construction

Use extra functions

Could use groupOn and sumOn' from Data.List.Extra in several places.
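For example (illustrative types; the sort comes first because groupOn only groups adjacent equal keys):

import Data.List (sortOn)
import Data.List.Extra (groupOn, sumOn')

countsPerPattern :: [(Text, Int)] -> [(Text, Int)]
countsPerPattern rows =
  [ (fst (head grp), sumOn' snd grp)
  | grp <- groupOn fst (sortOn fst rows) ]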


Security ✅

  • ✅ All SQL queries properly use parameterized queries (? placeholders)
  • ✅ No SQL injection vulnerabilities found
  • ✅ Good database constraints and foreign keys
  • ✅ Proper unique constraints with composite keys

Note: Earlier diff versions had dynamic query construction, but current version correctly uses parameterization. Good fix!


Test Coverage ✅

test/unit/Pkg/DrainSpec.hs has excellent coverage:

  • ✅ Basic tokenization
  • ✅ Quoted strings, key-value patterns
  • ✅ Various log formats (HTTP, database, errors, timestamps)
  • ✅ Edge cases

Suggestion: Add property-based tests:

prop "tokenize preserves non-empty input" =
  forAll (listOf1 arbitrary) $ \words ->
    not . null $ tokenize (T.unwords words)

Positive Aspects 🌟

  1. Well-structured code - Clear separation of concerns
  2. Type safety - Good use of newtypes (LogPatternId, UUIDId)
  3. Comprehensive logging - Proper structured logging
  4. Good test coverage - Especially Drain algorithm
  5. Idiomatic Haskell - Proper use of Effectful
  6. Good migrations - Proper indexing, idempotent SQL
  7. Excellent refactoring - The mkIssue helper (Issues.hs:756) reduces duplication significantly

Recommendations Priority

Must Fix (Before Merge):

  1. ❌ Remove all traceShowM calls
  2. ❌ Fix leastRecentlyUsedIndex empty vector handling
  3. ❌ Delete commented code

Should Fix (High Value):

  1. ⚡ Optimize N+1 baseline calculation query
  2. ⚡ Combine getPatternsWithCurrentRates queries
  3. 📝 Extract magic numbers (3.0 z-score, 10000 threshold) to config

Nice to Have:

  1. 📚 Use more lens operators
  2. 📚 Apply LambdaCase/ViewPatterns for brevity
  3. 📚 Property-based tests

Line Count Reduction Opportunities

Current additions: 1327 lines
Estimated reduction with optimizations: ~150 lines (11% reduction)

  • Query consolidation: ~30 lines
  • Use of LambdaCase: ~40 lines
  • Lens operators: ~20 lines
  • Helper extraction: ~30 lines
  • SELECT * vs column enumeration: ~30 lines

Overall: Strong implementation of a complex feature. The DRAIN algorithm is well-implemented with good test coverage. Main issues are debug code and query optimization opportunities. Fix critical issues and this is ready to merge! 🚀

@claude

claude bot commented Jan 29, 2026

Code Review: Log Patterns Implementation

Great work implementing the log pattern detection and anomaly detection system! The architecture is solid. Here are suggestions for more concise and idiomatic code:

CRITICAL ISSUES:

  1. Debug Code in Production (BackgroundJobs.hs:529-530)

    • Remove traceShowM statements before merging
  2. Disabled Monitor (BackgroundJobs.hs:361)

    • QueryMonitorsCheck -> pass is commented out. Why?
  3. UNIQUE Constraint Mismatch (0026_log_patterns.sql)

    • Migration has: UNIQUE(project_id, pattern_hash)
    • Code expects: ON CONFLICT (project_id, log_level, field_path, pattern_hash)
    • This will cause runtime errors! Fix the migration.

CODE QUALITY:

Models/Apis/LogPatterns.hs:

  • Line 128-130: Remove V.null guard, PostgreSQL handles empty arrays
  • Line 273-283: Use Map.fromList for O(log n) lookups instead of O(n) lookup

Pkg/Drain.hs:

  • Line 104: Use point-free: ((== targetCount) . (.tokenCount))
  • Good test coverage overall

BackgroundJobs.hs:

  • Line 520-526: Use do notation in Maybe monad to simplify nesting
  • Consider batch upserts to reduce DB round-trips

PERFORMANCE:

  • Add index: CREATE INDEX idx_log_patterns_upsert_key ON apis.log_patterns(project_id, log_level, field_path, pattern_hash)
  • Monitor memory usage for large log batches

MINOR ISSUES:

  • Unused import: SeverityLevel (Pkg/Drain.hs:20)
  • Magic numbers: 1.4826, 287, 300 need comments/constants
  • trace_id field in migration but not in LogPattern record type
  • No rollback migration provided

SUMMARY:
Rating: 7.5/10

Must fix before merge:

  1. Remove debug traceShowM calls
  2. Fix UNIQUE constraint mismatch
  3. Address commented out monitor check

This is nearly production-ready with those fixes. Great Drain algorithm implementation!

@claude

claude bot commented Jan 31, 2026

Code Review: PR #300 - Log Pattern Detection System

Thank you for this comprehensive implementation of log pattern detection and anomaly tracking! The feature adds valuable functionality for identifying unusual log patterns and volume spikes. Below is a detailed review focusing on code quality, performance, and opportunities for making the code more succinct using available packages and GHC extensions.


High Priority Issues

1. Inefficient N+1 Query Pattern (src/Models/Apis/LogPatterns.hs:269-313)

The getPatternsWithCurrentRates function performs two separate queries and manually joins them in Haskell. This is inefficient for large datasets.

Current approach:

getPatternsWithCurrentRates pid = do
  patterns <- PG.query patternsQuery (Only pid)
  counts :: [(Text, Int)] <- PG.query countsQuery (Only pid)
  pure $ map (attachCount counts) patterns

Issue: The lookup in attachCount is O(n), making this O(n²) overall.

Recommended: Use a single SQL LEFT JOIN query:

SELECT lp.id, lp.project_id, lp.log_pattern, lp.pattern_hash,
       lp.baseline_state, lp.baseline_volume_hourly_mean, lp.baseline_volume_hourly_stddev,
       COALESCE(counts.cnt, 0)::INT
FROM apis.log_patterns lp
LEFT JOIN (
  SELECT log_pattern, COUNT(*)::INT as cnt
  FROM otel_logs_and_spans
  WHERE project_id = ?::text AND timestamp >= now() - interval '1 hour'
    AND kind = 'log' AND log_pattern IS NOT NULL
  GROUP BY log_pattern
) counts ON lp.log_pattern = counts.log_pattern
WHERE lp.project_id = ? AND lp.state != 'ignored' AND lp.baseline_state = 'established'

Alternatively, if keeping two queries, convert counts to a HashMap for O(1) lookups:

import qualified Data.HashMap.Strict as HM
countsMap = HM.fromList counts
-- then: HM.findWithDefault 0 logPat countsMap

2. Extract Magic Numbers to Named Constants (src/BackgroundJobs.hs)

Multiple hardcoded values should be named constants:

-- Line 234, 238, 259, 289
baselineHoursRequired :: Int = 168  -- 7 days
baselineMeanThreshold :: Double = 100
zScoreThreshold :: Double = 3.0
minPatternAgeDays :: Int = 1
spikeEventVolumeThreshold :: Int = 10000

3. Duplicated Issue Creation Logic (src/BackgroundJobs.hs:271-275, 295-298)

The pattern of creating an issue, inserting it, and triggering LLM enhancement is repeated:

Extract to:

createAndEnhanceIssue :: Issue -> ATBackgroundCtx ()
createAndEnhanceIssue issue = do
  authCtx <- ask @Config.AuthContext
  Issues.insertIssue issue
  liftIO $ withResource authCtx.jobsPool \conn ->
    void $ createJob conn "background_jobs" $ 
      EnhanceIssuesWithLLM issue.projectId (V.singleton issue.id)

Then use: createAndEnhanceIssue =<< liftIO (Issues.createLogPatternIssue ...)


4. Inefficient Vector Operations (src/BackgroundJobs.hs:174-177)

case V.find (\(i, _, _, _, _) -> i == logId) events of

This O(n) lookup happens inside a loop. Build a HashMap from events first:

import qualified Data.HashMap.Strict as HM
let eventsMap = HM.fromList [(logId, (trId, sName, lvl)) | (logId, _, trId, sName, lvl) <- V.toList events]

5. Use coerce More Effectively (src/Models/Apis/LogPatterns.hs:116)

getLogPatternTexts pid = coerce @[Only Text] @[Text] <$> PG.query q (Only pid)

This works but is verbose, and Coercible instances can't be derived, so a newtype wrapper doesn't help here. Simply use map fromOnly, or map (\(Only t) -> t), which is clearer.


Medium Priority Improvements

6. Leverage RecordWildCards More (src/Models/Apis/LogPatterns.hs:303-313)

attachCount counts (patId, projId, logPat, patHash, blState, blMean, blStddev) =
  LogPatternWithRate
    { patternId = patId
    , projectId = projId
    , logPattern = logPat
    , patternHash = patHash
    , baselineState = blState
    , baselineMean = blMean
    , baselineStddev = blStddev
    , currentHourCount = fromMaybe 0 $ lookup logPat counts
    }

With RecordWildCards (already enabled):

attachCount counts (patternId, projectId, logPattern, patternHash, baselineState, baselineMean, baselineStddev) =
  let currentHourCount = fromMaybe 0 $ lookup logPattern counts
  in LogPatternWithRate{..}

7. Simplify Nested Case Logic (src/BackgroundJobs.hs:174-177, src/Pages/Anomalies.hs:1465-1606)

BackgroundJobs.hs:

-- Current deeply nested pattern
let (serviceName, logLevel, logTraceId) = case ids V.!? 0 of
      Just logId | logId /= "" ->
        case V.find (\(i, _, _, _, _) -> i == logId) events of
          Just (_, _, trId, sName, lvl) -> (sName, lvl, trId)
          Nothing -> (Nothing, Nothing, Nothing)
      _ -> (Nothing, Nothing, Nothing)

-- Better with ViewPatterns and guards:
let (serviceName, logLevel, logTraceId) =
      fromMaybe (Nothing, Nothing, Nothing) $ do
        logId <- ids V.!? 0
        guard (logId /= "")
        (_, _, trId, sName, lvl) <- V.find (\(i, _, _, _, _) -> i == logId) events
        pure (sName, lvl, trId)

Pages/Anomalies.hs: Use pattern guards:

renderIssueStats issue
  | Issues.LogPatternSpike <- issue.issueType
  , AE.Success (logPatternData :: LogPatternSpikeData) <- AE.fromJSON (getAeson issue.issueData)
  = renderLogPatternSpike logPatternData
  | Issues.NewLogPattern <- issue.issueType
  , AE.Success (newPatternData :: NewLogPatternData) <- AE.fromJSON (getAeson issue.issueData)
  = renderNewLogPattern newPatternData
  | otherwise = defaultRender

8. Use Available Packages (src/Pkg/Drain.hs:1936-1980)

The tokenization logic could leverage:

  • attoparsec or megaparsec (check if in dependencies) for structured parsing instead of manual T.head/T.tail
  • text-builder for efficient text concatenation instead of ++ on lists

The recursive list building with go ... ++ ... is O(n²). Use:

import qualified Data.DList as DL  -- if available
go acc rest = ... DL.snoc acc token ...

Or accumulate with foldl':

foldl' (\acc c -> ...) [] (T.unpack content)
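A deliberately tiny sketch of the difference-list shape (DL.snoc appends in O(1), versus O(n) for list ++; assumes dlist is a dependency, and the real per-token logic is elided):

import qualified Data.DList as DL
import qualified Data.Text as T

-- Accumulate tokens with O(1) appends, materializing the list once at the end.
collectTokens :: [T.Text] -> [T.Text]
collectTokens = DL.toList . go DL.empty
  where
    go acc []       = acc
    go acc (t : ts) = go (DL.snoc acc t) ts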

9. Extract Service Name Fallback (src/Models/Apis/Issues/Enhancement.hs, src/Pages/Anomalies.hs)

Repeated pattern:

service = fromMaybe "unknown-service" issue.service

Add to a utility module or use:

serviceOrDefault :: Issue -> Text
serviceOrDefault = fromMaybe "unknown-service" . (.service)

10. Optimize Drain Pattern Matching (src/Pkg/Drain.hs)

The MAD constant 1.4826 should be documented:

-- Constant to convert MAD to standard deviation equivalent
-- 1.4826 ≈ 1/Φ⁻¹(3/4) where Φ is the standard normal CDF
madToStddevFactor :: Double
madToStddevFactor = 1.4826

Also consider using Numeric.Statistics from statistics package if available for robust stat calculations.


Database & Performance

11. Add Missing Indices (static/migrations/0026_log_patterns.sql)

-- For spike detection query performance
CREATE INDEX IF NOT EXISTS idx_log_patterns_spike_detection 
  ON apis.log_patterns(project_id, state, baseline_state) 
  WHERE baseline_state = 'established' AND state != 'ignored';

-- For hash lookups
CREATE INDEX IF NOT EXISTS idx_log_patterns_hash 
  ON apis.log_patterns(pattern_hash);

-- For recent patterns ordering
CREATE INDEX IF NOT EXISTS idx_log_patterns_recent 
  ON apis.log_patterns(project_id, last_seen_at DESC);

12. Consider Batch Insert Trigger Debouncing (static/migrations/0026_log_patterns.sql:47-55)

The trigger fires on every INSERT, creating a background job each time. For high-volume scenarios, consider:

  • Batching jobs (e.g., only fire if no job created in last N minutes)
  • Rate limiting with a check on jobs table
  • Moving to a scheduled check rather than trigger-based

Security & Error Handling

13. Add Error Handling for Issue Creation (src/BackgroundJobs.hs:271, 295)

issue <- liftIO $ Issues.createLogPatternIssue pid lp

If issue creation fails, it's silently ignored. Add:

result <- liftIO $ tryAny $ Issues.createLogPatternIssue pid lp
case result of
  Left err -> Log.logError [fmt|Failed to create log pattern issue: {err}|]
  Right issue -> createAndEnhanceIssue issue

14. Validate Pattern Hash (src/BackgroundJobs.hs:292)

Before querying with patternHash, validate it's non-empty and reasonable length.


Test Coverage

15. Good Test Updates (test/unit/Pkg/DrainSpec.hs)

The test file shows good coverage improvements. Consider adding:

  • Tests for edge cases: empty patterns, very long patterns
  • Tests for the MAD/median baseline calculation
  • Tests for spike detection with various z-scores

Code Succinctness Summary

Already well-used extensions: RecordDotSyntax, BlockArguments, DerivingVia

Could leverage more: MultiWayIf, LambdaCase, ViewPatterns, RecordWildCards

Packages to consider:

  • unordered-containers - for HashMap (should already be in deps via aeson)
  • vector-algorithms - already in deps, good
  • safe - already in deps for safe indexing

Summary

Excellent work on:

  • Comprehensive feature implementation with baseline tracking
  • Good use of robust statistics (median + MAD instead of mean + stddev)
  • Database migration with proper constraints
  • Test coverage updates

Key improvements needed:

  1. Combine queries in getPatternsWithCurrentRates for efficiency
  2. Extract magic numbers to named constants
  3. Reduce duplicated issue creation logic
  4. Add database indices for query performance
  5. Build lookup maps instead of repeated O(n) searches

LOC reduction estimate: ~50-80 lines could be removed by:

  • Using HashMap instead of lists (saves explicit lookups)
  • Extracting repeated patterns to helpers
  • Using RecordWildCards more consistently
  • Simplifying nested case/guards

This is a solid foundation that will benefit from these refinements!
