Merged
22 changes: 22 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,28 @@ Format follows [Keep a Changelog](https://keepachangelog.com/). Versions follow

---

## [0.2.7] - 2026-03-28

### Added

- **Episodic Task Schema** (BL-003): New `EpisodicTaskRecord` interface with `TaskState`, `FailureType` types for tracking task episodes.
- **Task Episode Capture** (BL-014): Methods for creating, updating, and querying task episodes.
- **Validation Outcome Ingestion** (BL-015): Parse type-check, build, and test validation results.
- **Failure Taxonomy** (BL-016): `classifyFailure()` function categorizes errors as syntax, runtime, logic, resource, or unknown.
- **Success Pattern Extraction** (BL-017): Extract command sequences and tools from successful task episodes.
- **Similar Task Recall** (BL-018): Find similar past tasks with configurable similarity threshold (0.85).
- **Retry/Recovery Evidence** (BL-019, BL-020): Track retry attempts and recovery strategies with budget suggestions.
- `addCommandToEpisode()`, `addValidationOutcome()`, `addSuccessPatterns()` store methods.
- `addRetryAttempt()`, `addRecoveryStrategy()`, `suggestRetryBudget()`, `suggestRecoveryStrategies()` store methods.
- `parseValidationOutput()` utility for parsing validation output.

### Testing

- New unit tests for episodic task CRUD operations.
- New unit tests for validation parsing and failure classification.

---

## [0.2.6] - 2026-03-27

### Fixed
@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-03-28
@@ -0,0 +1,38 @@
## Context

The existing memory system captures individual events (memory capture, recall, feedback) but lacks a structured representation of task-level execution. The backlog identifies BL-003 as foundational for episodic learning: without a task episode schema, we cannot capture, classify, or learn from task execution patterns.

## Goals / Non-Goals

**Goals:**
- Define EpisodicTaskRecord schema with essential fields
- Support task states: pending, running, success, failed, timeout
- Support failure classification taxonomy

**Non-Goals:**
- Implementing actual episode capture logic (deferred to separate change)
- Multi-task orchestration
- Complex task dependency graphs

## Decisions

### Decision: Separate Table vs Extended MemoryRecord
Use a separate `episodic_tasks` table rather than extending MemoryRecord.

**Rationale:** Task episodes have a different lifecycle and different query patterns than memories. Separation enables independent scaling and querying.

### Decision: Failure Taxonomy Categories
Define failure types: syntax, runtime, logic, resource, unknown.

**Rationale:** Standardized taxonomy enables pattern learning across similar failures. Categories map to common development error types.
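
A minimal sketch of how the taxonomy could drive classification, assuming failures arrive as plain error messages. The signature of `classifyFailure()` and every pattern in `FAILURE_RULES` are illustrative assumptions; this change does not show the real rules.

```typescript
// The five categories named in this decision.
type FailureType = "syntax" | "runtime" | "logic" | "resource" | "unknown";

// Ordered rules: the first matching pattern wins.
// All patterns are hypothetical examples, not the project's actual rules.
const FAILURE_RULES: ReadonlyArray<{ type: FailureType; pattern: RegExp }> = [
  { type: "syntax", pattern: /syntaxerror|unexpected token|parse error/i },
  { type: "resource", pattern: /out of memory|enomem|enospc|disk full/i },
  { type: "logic", pattern: /assertion|test failed/i },
  { type: "runtime", pattern: /typeerror|referenceerror|is not a function/i },
];

// Hypothetical signature: maps an error message to a taxonomy category.
function classifyFailure(message: string): FailureType {
  for (const rule of FAILURE_RULES) {
    if (rule.pattern.test(message)) return rule.type;
  }
  // Anything unrecognized falls through to the catch-all category.
  return "unknown";
}
```

Keeping `unknown` as an explicit fallback means every failed episode gets a category, so later pattern learning never has to handle a missing value.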

### Decision: Lazy Schema Initialization
Initialize the `episodic_tasks` table on first use, not at provider init.

**Rationale:** Reduces startup overhead if episodic features aren't used. Backward compatible with existing deployments.
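
The lazy-initialization pattern might look like the sketch below. `LazyEpisodicTaskStore` and the injected `createTable` callback are hypothetical stand-ins for the real store and database call; only the memoized-promise pattern is the point.

```typescript
// Hypothetical stand-in for the real store in store.ts.
class LazyEpisodicTaskStore {
  private initPromise: Promise<void> | null = null;
  private readonly createTable: () => Promise<void>;

  // createTable is an assumed callback that creates the episodic_tasks table.
  constructor(createTable: () => Promise<void>) {
    this.createTable = createTable;
  }

  // Memoize the promise so concurrent first calls share one initialization.
  private ensureInitialized(): Promise<void> {
    if (this.initPromise === null) {
      this.initPromise = this.createTable().catch((err) => {
        this.initPromise = null; // allow a retry after a failed init
        throw err;
      });
    }
    return this.initPromise;
  }

  async createTaskEpisode(taskId: string): Promise<string> {
    await this.ensureInitialized();
    return taskId; // a real store would insert a row here
  }
}
```

Nothing touches the table at construction time, so deployments that never use episodic features pay no startup cost.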

## Risks / Trade-offs

- [Risk] Schema evolution complexity → **Mitigation**: Version field in record, forward-compatible additions
- [Risk] Query performance with large episode volumes → **Mitigation**: Index on task state and timestamp
- [Risk] Integration with existing events → **Mitigation**: Reference existing sessionID for correlation
@@ -0,0 +1,23 @@
## Why

The current memory system captures individual events but lacks a structured representation of task execution episodes. To enable episodic learning and retry/recovery intelligence, we need a dedicated schema for capturing task-level execution records with validation outcomes, failure classifications, and success patterns.

## What Changes

- Add new `episodic_tasks` table for task episode records
- Define task states: pending, running, success, failed, timeout
- Add failure taxonomy classification system
- Integrate task capture with existing session events

## Capabilities

### New Capabilities
- `episodic-task-schema`: Core schema for task episode records with states, outcomes, and metadata

### Modified Capabilities
- None (this is a foundational schema change)

## Impact
- New database table: `episodic_tasks`
- Schema extensions to existing types
- No impact on existing memory operations
@@ -0,0 +1,33 @@
## ADDED Requirements

### Requirement: Episodic task record creation
The system SHALL support creating episodic task records with task ID, session ID, scope, start time, and initial state.

#### Scenario: Task episode starts
- **WHEN** a task begins execution with task ID "task-123" in scope "project:myproject"
- **THEN** an episodic task record is created with state "running"

### Requirement: Task state transitions
The system SHALL support updating task state: pending → running → success | failed | timeout.

#### Scenario: Task succeeds
- **WHEN** task with ID "task-123" completes successfully
- **THEN** the task record state is updated to "success"

#### Scenario: Task fails
- **WHEN** task with ID "task-123" fails
- **THEN** the task record state is updated to "failed"
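
The transition rule above can be sketched as a lookup table. Whether the real store rejects invalid transitions is not stated in this change, so the helpers `canTransition` and `transition` are assumptions for illustration.

```typescript
type TaskState = "pending" | "running" | "success" | "failed" | "timeout";

// pending -> running -> success | failed | timeout; terminal states have no exits.
const ALLOWED_TRANSITIONS: Record<TaskState, ReadonlyArray<TaskState>> = {
  pending: ["running"],
  running: ["success", "failed", "timeout"],
  success: [],
  failed: [],
  timeout: [],
};

// Hypothetical helper: true when the requirement permits moving from -> to.
function canTransition(from: TaskState, to: TaskState): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}

// Hypothetical helper: returns the new state or throws on an invalid move.
function transition(current: TaskState, next: TaskState): TaskState {
  if (!canTransition(current, next)) {
    throw new Error(`Invalid task state transition: ${current} -> ${next}`);
  }
  return next;
}
```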

### Requirement: Failure classification
The system SHALL support classifying failures by taxonomy: syntax, runtime, logic, resource, unknown.

#### Scenario: Failure classified as syntax
- **WHEN** a task fails with syntax error
- **THEN** the failureType field is set to "syntax"

### Requirement: Task episode retrieval
The system SHALL support querying task episodes by scope, state, and time range.

#### Scenario: Query failed tasks
- **WHEN** querying for failed tasks in scope "project:myproject"
- **THEN** returns all task records with state "failed" in that scope
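
As an illustration of the retrieval requirement, the sketch below filters an in-memory array by scope, state, and time range. The `TaskEpisode` and `EpisodeQuery` shapes and their field names are assumptions; the real `queryTaskEpisodes` presumably queries the `episodic_tasks` table instead.

```typescript
type TaskState = "pending" | "running" | "success" | "failed" | "timeout";

// Hypothetical record shape, reduced to the fields the query needs.
interface TaskEpisode {
  taskId: string;
  scope: string;
  state: TaskState;
  startTime: number; // epoch milliseconds
}

// Every filter is optional; an omitted filter matches everything.
interface EpisodeQuery {
  scope?: string;
  state?: TaskState;
  since?: number; // inclusive lower bound on startTime
  until?: number; // inclusive upper bound on startTime
}

function queryTaskEpisodes(
  episodes: ReadonlyArray<TaskEpisode>,
  query: EpisodeQuery,
): TaskEpisode[] {
  return episodes.filter(
    (e) =>
      (query.scope === undefined || e.scope === query.scope) &&
      (query.state === undefined || e.state === query.state) &&
      (query.since === undefined || e.startTime >= query.since) &&
      (query.until === undefined || e.startTime <= query.until),
  );
}
```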
@@ -0,0 +1,23 @@
## 1. Type Definitions

- [x] 1.1 Define EpisodicTaskRecord interface in types.ts
- [x] 1.2 Define TaskState type (pending, running, success, failed, timeout)
- [x] 1.3 Define FailureType taxonomy enum
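
A sketch of what the type definitions in tasks 1.1 to 1.3 might look like. The state and failure values come from the spec; the exact field names (`taskId`, `sessionID`, `startTime`, `endTime`, `version`) and the `createTaskEpisode` signature are assumptions, since types.ts itself is not shown here.

```typescript
type TaskState = "pending" | "running" | "success" | "failed" | "timeout";
type FailureType = "syntax" | "runtime" | "logic" | "resource" | "unknown";

// Hypothetical field layout for the record described in the spec.
interface EpisodicTaskRecord {
  version: number; // schema version, to ease forward-compatible additions
  taskId: string;
  sessionID: string; // correlates the episode with existing session events
  scope: string; // e.g. "project:myproject"
  state: TaskState;
  startTime: number; // epoch milliseconds
  endTime?: number; // set once the task reaches a terminal state
  failureType?: FailureType; // set only when state is "failed"
}

// Assumed helper: builds the record for a task that has just begun executing.
function createTaskEpisode(
  taskId: string,
  sessionID: string,
  scope: string,
  now: number = Date.now(),
): EpisodicTaskRecord {
  return { version: 1, taskId, sessionID, scope, state: "running", startTime: now };
}
```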

## 2. Database Schema

- [x] 2.1 Create episodic_tasks table in store.ts
- [x] 2.2 Add lazy initialization on first use
- [ ] 2.3 Add index on task state and timestamp

## 3. Store Methods

- [x] 3.1 Implement createTaskEpisode method
- [x] 3.2 Implement updateTaskState method
- [x] 3.3 Implement getTaskEpisode method
- [x] 3.4 Implement queryTaskEpisodes method

## 4. Testing

- [x] 4.1 Add unit tests for task episode CRUD
- [x] 4.2 Add integration tests for lazy initialization
@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-03-28
@@ -0,0 +1,39 @@
## Context

The current memory system relies entirely on automatic capture and retrieval. Users have no explicit control over what memories are stored or how they're used, and no way to correct misunderstandings. The backlog identifies BL-010, BL-011, and BL-012 as the first set of user-facing commands that give users explicit memory management capabilities.

## Goals / Non-Goals

**Goals:**
- Implement `/remember` command for explicit memory capture with optional labels
- Implement `/forget` command for memory removal (soft-delete and hard-delete options)
- Implement `/what-did-you-learn` command for viewing recent memory summaries
- Integrate all commands with existing effectiveness tracking

**Non-Goals:**
- Multi-user identity management (deferred to BL-034)
- Preference learning (separate change)
- Episodic task recording (separate change)

## Decisions

### Decision: Command Interface
Use tool-based interface matching existing `memory_search`, `memory_delete` patterns rather than slash commands.

**Rationale:** Consistent with OpenCode tool calling convention, easier to test, better structured output.

### Decision: Soft-Delete Default
`/forget` defaults to soft-delete (marks memory as disabled) rather than hard-delete.

**Rationale:** Preserves audit trail, enables recovery, maintains effectiveness event integrity. Hard-delete available via explicit flag.

### Decision: Summary Format
`/what-did-you-learn` returns categorized summaries rather than raw memory list.

**Rationale:** More actionable for users, reduces context overhead, enables future preference inference from summaries.

## Risks / Trade-offs

- [Risk] User confusion between auto-capture and explicit remember → **Mitigation**: Document difference, consider distinct storage flag
- [Risk] Memory bloat from excessive explicit captures → **Mitigation**: Apply the same minimum character threshold (`minCaptureChars`) as auto-capture
- [Risk] Effectiveness metrics double-counting → **Mitigation**: Use distinct event source type for explicit vs auto operations
@@ -0,0 +1,29 @@
## Why

Users currently have no way to explicitly manage their memories: capture, retrieval, and deletion are entirely automatic. This limits user control and makes it difficult to teach the system about preferences or correct its understanding. Adding explicit memory commands gives users agency over their memory footprint and enables preference learning.

## What Changes

- Add `/remember` command for explicit memory capture
- Add `/forget` command for explicit memory removal/disabling
- Add `/what-did-you-learn` command for viewing recent learning summary
- All commands integrate with existing effectiveness tracking pipeline

## Capabilities

### New Capabilities

- `memory-explicit-remember`: Explicit memory capture command with optional context/category labels
- `memory-explicit-forget`: Explicit memory removal command with soft-delete and hard-delete options
- `memory-learning-summary`: Recent learning summary view with configurable time window

### Modified Capabilities

- `memory-management-commands`: Extends with three new commands (remember, forget, what-did-you-learn)

## Impact

- New tool implementations in `src/tools/`
- New CLI command handlers
- Schema changes for soft-delete support (optional `status` field on MemoryRecord)
- Integration with existing effectiveness_events table
@@ -0,0 +1,39 @@
# memory-explicit-forget Specification

## Purpose
Enable users to explicitly remove or disable memories.

## ADDED Requirements

### Requirement: Soft-delete memory command
The system SHALL provide a forget command that marks memories as disabled without immediate physical deletion.

#### Scenario: User soft-deletes a memory
- **WHEN** user invokes forget command with a valid memory ID (no force flag)
- **THEN** the memory status is set to disabled
- **AND** the memory is excluded from search results
- **AND** the command returns success with updated status

#### Scenario: Soft-deleted memory is not retrieved
- **WHEN** a search is executed
- **THEN** memories with status disabled are not included in results
- **AND** effectiveness recall events do not count disabled memories

### Requirement: Hard-delete memory command
The system SHALL provide an option to permanently delete memories.

#### Scenario: User hard-deletes a memory
- **WHEN** user invokes forget command with a valid memory ID, force flag, and explicit confirm
- **THEN** the memory is physically removed from the database
- **AND** the command returns success confirmation

#### Scenario: Hard-delete without confirmation fails
- **WHEN** user invokes forget command with force flag but without explicit confirm
- **THEN** the command is rejected with guidance for safe execution
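
The soft-delete and hard-delete scenarios above can be sketched against an in-memory map. The `MemoryRecord` shape, the `force` and `confirm` option names, and the `forget` and `search` helpers are assumptions for illustration, not the actual `memory_explicit_forget` tool schema.

```typescript
type MemoryStatus = "active" | "disabled";

// Hypothetical record shape with the optional status field made explicit.
interface MemoryRecord {
  id: string;
  text: string;
  status: MemoryStatus;
}

interface ForgetOptions {
  force?: boolean; // request a hard delete
  confirm?: boolean; // required alongside force
}

type ForgetResult =
  | { ok: true; mode: "soft" | "hard" }
  | { ok: false; reason: string };

function forget(
  memories: Map<string, MemoryRecord>,
  id: string,
  opts: ForgetOptions = {},
): ForgetResult {
  const record = memories.get(id);
  if (record === undefined) return { ok: false, reason: "memory not found" };
  if (opts.force) {
    // Hard delete is rejected unless the caller also confirms.
    if (!opts.confirm) return { ok: false, reason: "hard delete requires confirm" };
    memories.delete(id);
    return { ok: true, mode: "hard" };
  }
  record.status = "disabled"; // soft delete: keep the row, hide it from search
  return { ok: true, mode: "soft" };
}

// Search skips disabled memories, matching the soft-delete scenario.
function search(memories: Map<string, MemoryRecord>, term: string): MemoryRecord[] {
  const results: MemoryRecord[] = [];
  memories.forEach((m) => {
    if (m.status === "active" && m.text.includes(term)) results.push(m);
  });
  return results;
}
```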

### Requirement: Forget emits effectiveness event
The system SHALL record forget operations in the effectiveness pipeline.

#### Scenario: Forget operation emits event
- **WHEN** user successfully executes forget (soft or hard)
- **THEN** the system records a forget event in the effectiveness pipeline for audit purposes
@@ -0,0 +1,32 @@
# memory-explicit-remember Specification

## Purpose
Enable users to explicitly capture memories with optional contextual labels.

## ADDED Requirements

### Requirement: Explicit memory capture command
The system SHALL provide an explicit memory capture command that accepts content text and optional context/category labels.

#### Scenario: User captures explicit memory
- **WHEN** user invokes remember command with content "Always use TypeScript for new projects"
- **THEN** the memory is stored with content "Always use TypeScript for new projects"
- **AND** the memory is marked with source as explicit-remember

#### Scenario: User captures memory with category label
- **WHEN** user invokes remember command with content and category "preference"
- **THEN** the memory is stored with the category label attached
- **AND** the category is queryable in search

#### Scenario: Explicit memory triggers effectiveness tracking
- **WHEN** user successfully captures an explicit memory
- **THEN** the system records a capture event with source explicit-remember
- **AND** the event is included in effectiveness summaries

### Requirement: Minimum content threshold
The system SHALL apply the same minimum character threshold to explicit memories as auto-capture.

#### Scenario: Explicit memory below threshold
- **WHEN** user invokes remember command with content shorter than minCaptureChars
- **THEN** the command returns a warning that content is too short
- **AND** no memory is stored
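
A minimal sketch of the capture and threshold scenarios, assuming an array-backed store. The `remember` signature, the `StoredMemory` shape, the `"auto-capture"` source label, and trimming whitespace before measuring length are all assumptions; only `explicit-remember` and `minCaptureChars` come from the spec.

```typescript
// Hypothetical stored shape; "explicit-remember" is the source named in the spec.
interface StoredMemory {
  text: string;
  category?: string;
  source: "explicit-remember" | "auto-capture";
}

type RememberResult =
  | { stored: true; memory: StoredMemory }
  | { stored: false; warning: string };

function remember(
  store: StoredMemory[],
  content: string,
  minCaptureChars: number,
  category?: string,
): RememberResult {
  const text = content.trim(); // assumption: whitespace does not count toward length
  if (text.length < minCaptureChars) {
    // Below the threshold: warn and store nothing.
    return {
      stored: false,
      warning: `Content is too short (minimum ${minCaptureChars} characters).`,
    };
  }
  const memory: StoredMemory = { text, source: "explicit-remember" };
  if (category !== undefined) memory.category = category;
  store.push(memory);
  return { stored: true, memory };
}
```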
@@ -0,0 +1,30 @@
# memory-learning-summary Specification

## Purpose
Provide users with a view of what the system has learned recently.

## ADDED Requirements

### Requirement: Learning summary command
The system SHALL provide a command that returns a summary of recently captured memories.

#### Scenario: User requests learning summary
- **WHEN** user invokes what-did-you-learn command
- **THEN** the system returns a summary of memories captured in the past 7 days
- **AND** the summary is organized by category when categories exist

#### Scenario: Summary with custom time window
- **WHEN** user invokes what-did-you-learn with days=30
- **THEN** the system returns memories from the past 30 days

#### Scenario: Empty summary for new users
- **WHEN** user invokes what-did-you-learn with no prior memories
- **THEN** the system returns a message indicating no memories captured yet

### Requirement: Summary includes memory counts
The system SHALL provide memory counts by category in the summary.

#### Scenario: Summary shows category breakdown
- **WHEN** user invokes what-did-you-learn
- **THEN** the response includes count of memories per category
- **AND** total memory count is included
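
The summary scenarios above can be sketched as a single pass over recent memories. The `MemoryEntry` shape, the `"uncategorized"` bucket, the message wording, and the `summarizeLearning` signature are assumptions; the 7-day default and the per-category counts come from the spec.

```typescript
// Hypothetical shape of a captured memory, reduced to what the summary needs.
interface MemoryEntry {
  text: string;
  category?: string;
  createdAt: number; // epoch milliseconds
}

interface LearningSummary {
  total: number;
  byCategory: Record<string, number>;
  message?: string; // present only when nothing falls inside the window
}

const DAY_MS = 24 * 60 * 60 * 1000;

function summarizeLearning(
  memories: ReadonlyArray<MemoryEntry>,
  days: number = 7,
  now: number = Date.now(),
): LearningSummary {
  const cutoff = now - days * DAY_MS;
  const byCategory: Record<string, number> = {};
  let total = 0;
  for (const m of memories) {
    if (m.createdAt < cutoff) continue; // outside the time window
    const key = m.category ?? "uncategorized";
    byCategory[key] = (byCategory[key] ?? 0) + 1;
    total += 1;
  }
  if (total === 0) {
    return { total, byCategory, message: `No memories captured in the past ${days} days.` };
  }
  return { total, byCategory };
}
```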
@@ -0,0 +1,34 @@
## 1. Tool Interface Design

- [x] 1.1 Define tool schema for memory_explicit_remember command
- [x] 1.2 Define tool schema for memory_explicit_forget command
- [x] 1.3 Define tool schema for memory_learning_summary command

## 2. Memory Explicit Remember Implementation

- [x] 2.1 Implement memory_explicit_remember handler
- [x] 2.2 Add content validation (minChars threshold)
- [x] 2.3 Add category label support
- [x] 2.4 Integrate with effectiveness event emission

## 3. Memory Explicit Forget Implementation

- [x] 3.1 Implement memory_explicit_forget handler
- [x] 3.2 Add soft-delete logic (status=disabled)
- [x] 3.3 Add hard-delete logic with confirm flag
- [x] 3.4 Update search to exclude disabled memories
- [x] 3.5 Add forget event to effectiveness pipeline

## 4. Learning Summary Implementation

- [x] 4.1 Implement memory_learning_summary handler
- [x] 4.2 Add time window parameter (default 7 days)
- [x] 4.3 Add category grouping logic
- [x] 4.4 Add memory count by category

## 5. Integration and Testing

- [x] 5.1 Register new tools with provider
- [x] 5.2 Add unit tests for each command
- [x] 5.3 Add integration tests for effectiveness pipeline
- [x] 5.4 Update documentation with new commands
@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-03-28