
feat: Slack Integration — workspace-wide message ingestion with thread/spine extraction #52

@Flare576


Overview

Add Slack as a data source for Ei. As with the Cursor/ClaudeCode integrations, a single synthetic persona per workspace ingests messages and feeds the extraction pipeline. Unlike the code tool integrations, Slack is human↔human conversation — higher signal for people and topics, and a different extraction-unit model.

This ticket also establishes the two-speed polling architecture that Ticket #N will later port to the other integrations.

Depends on: #51 (sources[] field)


Auth

Two modes, both supported:

auth: {
  type: "xoxp" | "browser";
  token: string;    // xoxp (permanent) or xoxc (browser session, ~2 weeks)
  cookie?: string;  // xoxd (browser mode only)
}

xoxp (recommended): User creates a Slack App at api.slack.com once, installs to their workspace, gets a non-expiring user token. Works on Enterprise Grid. No compliance risk.

browser (quick start): xoxc/xoxd extracted from browser DevTools (or auto-extracted from Slack's local LevelDB at ~/Library/Containers/com.tinyspeck.slackmacgap/Data/Library/Application Support/Slack/Local Storage/leveldb). Expires every ~2 weeks. Good for initial setup and testing.

OAuth redirect via existing https://ei.flare576.com/callback/ pattern (same as Spotify).


Channel Tier Model

Extraction scope and backfill depth vary by channel type:

| Tier | Criteria | Backfill | Extraction scope |
| --- | --- | --- | --- |
| DM | D-prefix channel ID | 90 days | All messages |
| Private | G-prefix or private channel | 90 days | All messages |
| Public | C-prefix, member count < threshold | 30 days | Threads you posted in only |
| Broadcast | C-prefix, member count ≥ threshold (default: 100) | Skip | Skip |

User can override any channel's tier via channel_overrides.

Why "threads you engaged in" for public channels? Real-world data from R&P shows Jeremy posts in ~9% of public channel messages. Extracting ambient channel discussion he never participated in is LLM cost with minimal signal gain. The broadcast threshold isn't about member count as a quality proxy — it's about channels where you're a lurker by design.
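The tier rules above could be sketched roughly as follows. ChannelInfo and classifyChannel are illustrative names for this ticket, not the actual implementation; the prefix and threshold rules come from the table.

```typescript
// Hypothetical sketch of the tier table; not Ei's real types.
type ChannelTier = "dm" | "private" | "public" | "skip";

interface ChannelInfo {
  id: string;          // Slack channel ID: D… (DM), G… (group), C… (channel)
  isPrivate: boolean;  // private channels may also carry a C prefix
  memberCount: number;
}

function classifyChannel(
  ch: ChannelInfo,
  broadcastThreshold = 100,
  overrides: Record<string, ChannelTier> = {},
): ChannelTier {
  const override = overrides[ch.id];
  if (override) return override;               // channel_overrides wins
  if (ch.id.startsWith("D")) return "dm";
  if (ch.id.startsWith("G") || ch.isPrivate) return "private";
  // C-prefix public channel: broadcast (skip) at or above the member threshold
  return ch.memberCount >= broadcastThreshold ? "skip" : "public";
}
```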


Extraction Model

The Two Units

Thread unit — A Slack thread (parent + replies):

  • messages_context: thread root + all previously-extracted replies
  • messages_analyze: new replies since thread_last_extracted
  • Thread roots are NEVER in the spine's analyze set — they belong to the thread unit
  • Threads with new activity are always re-queued, even if the parent is old

Spine unit — Unthreaded messages in the channel:

  • messages_context: ~8h of prior unthreaded messages
  • messages_analyze: new unthreaded messages in the extraction window
  • Thread roots appear as [Processed] anchors showing where side-conversations branched
  • Thread reply content is NEVER in the spine

Processing Order (per channel per day)

  1. Threads first — any thread with new replies since last sync, oldest first
  2. Spine second — unthreaded messages in the window, with thread roots as anchors
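The threads-first/spine-second split could be sketched like this, assuming a minimal message shape (in Slack's API a thread root has thread_ts equal to its own ts, and a reply carries the parent's thread_ts); Msg and queueUnits are illustrative names:

```typescript
// Minimal sketch of per-channel unit queueing; Msg is an assumed shape.
interface Msg {
  ts: string;        // message timestamp (Slack's per-channel message ID)
  threadTs?: string; // set on thread roots (== ts) and replies (== parent ts)
}

function queueUnits(windowMsgs: Msg[]): { threads: string[]; spine: Msg[] } {
  const threads = new Set<string>();
  const spine: Msg[] = [];
  for (const m of windowMsgs) {
    if (m.threadTs && m.threadTs !== m.ts) {
      threads.add(m.threadTs); // a reply → its thread has new activity
    } else {
      spine.push(m);           // unthreaded message, or thread root (spine anchor)
    }
  }
  // Threads first (oldest first), then the single spine unit
  return { threads: [...threads].sort(), spine };
}
```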

The Interleaved Spine Format

Current extraction prompts use two clean sections ("Earlier Conversation" / "Most Recent Messages"). The spine needs a third concept: processed thread roots interleaved with new unthreaded messages. This requires a new prompt variant buildSpineExtractionPrompt():

## Earlier Context
[mid:abc:human] ← prior unthreaded messages

## Channel Messages
[Processed] [mid:def:Jeremy] ← thread root (side-conversation happened here)
[New] [mid:ghi:Tom]          ← unthreaded message to extract from
[New] [mid:jkl:Jeremy]       ← unthreaded message to extract from
[Processed] [mid:mno:Jeremy] ← thread root (another side-conversation)
[New] [mid:pqr:Tom]          ← unthreaded message to extract from

System prompt: "ONLY ANALYZE messages labeled [New]. Messages labeled [Processed] are thread roots — they show where side-conversations branched off. Do not extract from [Processed] messages."
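A sketch of such a builder, assuming a minimal message shape — SpineMessage and the exact signature are assumptions, only the section headings and [Processed]/[New] labels come from the format above:

```typescript
// Illustrative sketch of buildSpineExtractionPrompt; shapes are assumed.
interface SpineMessage {
  mid: string;
  author: string;
  text: string;
  isThreadRoot?: boolean; // processed anchor vs. new unthreaded message
}

function buildSpineExtractionPrompt(
  context: SpineMessage[],
  window: SpineMessage[],
): string {
  const tag = (m: SpineMessage) => `[mid:${m.mid}:${m.author}] ${m.text}`;
  const label = (m: SpineMessage) => (m.isThreadRoot ? "[Processed]" : "[New]");
  return [
    "## Earlier Context",
    ...context.map(tag),
    "",
    "## Channel Messages",
    ...window.map((m) => `${label(m)} ${tag(m)}`),
  ].join("\n");
}
```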


One Day At A Time — Workspace-Wide Cursor

Unlike code tool integrations (session-at-a-time), Slack advances a workspace-level cursor 24h per run:

  1. Read extraction_point (ISO timestamp — workspace cursor)
  2. extractionWindow = [extraction_point, extraction_point + 24h]
  3. If extraction_point is more than 30 days in the past: skip public channels (they only get 30 days of backfill)
  4. For each eligible channel with activity in window:
    a. Queue thread units (threads with new replies)
    b. Queue spine unit (unthreaded messages in window)
  5. Advance extraction_point by 24h

Why this approach? Real-world R&P data shows ~500-800 messages/month across all channels. A 24h window is ~16-26 messages — 1-2 LLM extraction calls. The 90-day DM backfill is 90 micro-runs, identical cadence to OpenCode's session-at-a-time approach.
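Steps 1-3 above could be sketched as follows; the helper names are hypothetical, the 24h window and 30-day public cutoff are from the ticket:

```typescript
// Sketch of the workspace cursor's window math; names are illustrative.
const DAY_MS = 24 * 60 * 60 * 1000;

function extractionWindow(extractionPoint: string): { start: string; end: string } {
  const start = new Date(extractionPoint);
  return {
    start: start.toISOString(),
    end: new Date(start.getTime() + DAY_MS).toISOString(), // cursor + 24h
  };
}

function includePublicChannels(
  windowStart: string,
  publicBackfillDays = 30,
  now = new Date(),
): boolean {
  // Public channels only get 30 days of backfill; skip them in older windows.
  const ageDays = (now.getTime() - new Date(windowStart).getTime()) / DAY_MS;
  return ageDays <= publicBackfillDays;
}
```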


Two-Speed Polling (New Architecture — Establish Pattern Here)

Current integrations use fixed 60s polling with no backoff, producing constant "nothing to do" log spam when caught up. Slack establishes the correct pattern:

type SyncResult = "imported" | "nothing_to_do";

Catch-up speed (default 60s): When extraction_point is more than 24h behind now.
Steady-state speed (default 3600s / 1h): When extraction_point is within 24h of now.

Processor uses SyncResult to switch speeds:

  • "imported" → stay at catch-up speed (or reset to it)
  • "nothing_to_do" → switch to steady-state speed

No user config needed. No log spam. Fast during backfill, quiet when current.
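The switch itself is a one-liner; a minimal sketch, with defaults matching polling_interval_ms and steady_state_interval_ms (the function name is illustrative):

```typescript
// Minimal sketch of the two-speed switch driven by SyncResult.
type SyncResult = "imported" | "nothing_to_do";

function nextIntervalMs(
  result: SyncResult,
  catchUpMs = 60_000,        // polling_interval_ms default (60s)
  steadyStateMs = 3_600_000, // steady_state_interval_ms default (1h)
): number {
  // "imported" → stay at (or reset to) catch-up; "nothing_to_do" → slow down
  return result === "imported" ? catchUpMs : steadyStateMs;
}
```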

This pattern will be ported to OpenCode/Cursor/ClaudeCode in a follow-up ticket once validated on Slack.


Settings Shape

// src/integrations/slack/types.ts
export interface SlackSettings {
  integration?: boolean;
  polling_interval_ms?: number;          // catch-up speed, default: 60000
  steady_state_interval_ms?: number;     // steady-state speed, default: 3600000
  extraction_model?: string;             // "Provider:model" override
  extraction_point?: string;             // ISO — workspace cursor, advances 24h/run
  last_sync?: string;                    // ISO — last time integration ran

  auth: {
    type: "xoxp" | "browser";
    token: string;
    cookie?: string;                     // xoxd (browser mode only)
  };

  backfill_days: {
    dm: number;                          // default: 90
    private: number;                     // default: 90
    public: number;                      // default: 30
  };

  broadcast_threshold?: number;          // member count above which = skip, default: 100

  channel_overrides?: Record<string, "dm" | "private" | "public" | "skip">;

  channels: Record<string, SlackChannelState>;
}

export interface SlackChannelState {
  spine_last_extracted?: string;         // ISO
  threads: Record<string, string>;       // threadTs → ISO last extracted
}

Add slack?: SlackSettings to HumanSettings in src/core/types/entities.ts.


Sources Tagging (requires #51)

All extracted items get tagged with the source channel:

sources: [`slack:${channelId}`]   // e.g. "slack:C09V5R90C0G"

This enables ei topics --source slack "deployment" and ei topics --source "slack:C09V5R90C0G" for channel-scoped queries.
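The tag shape and prefix matching could look like this; matchesSourceFilter is an assumed sketch of the --source semantics (a bare "slack" filter matching any slack:* tag), not the implemented query logic:

```typescript
// Sketch of source tagging and --source matching; semantics are assumed.
function slackSource(channelId: string): string {
  return `slack:${channelId}`;
}

function matchesSourceFilter(sources: string[], filter: string): boolean {
  // "slack" matches any slack:* tag; "slack:C09V5R90C0G" matches that channel only
  return sources.some((s) => s === filter || s.startsWith(`${filter}:`));
}
```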


Real-World Data (R&P Workspace)

Sampled to validate window size assumptions:

| Metric | Value |
| --- | --- |
| Total channel memberships | ~1,513 |
| Channels with recent activity | ~10-20 |
| Dead/archived channels (0 members) | ~200+ |
| Estimated messages/month (all channels) | ~500-800 |
| Messages in a 24h window | ~16-26 |
| Thread density | ~30% of top-level messages |
| Average thread depth | ~8 replies |
| Jeremy's share of active project channel | ~34% |
| Jeremy's share of public interest channel | ~9% |

What Changes

| File | Change |
| --- | --- |
| src/integrations/slack/ | New directory: types.ts, importer.ts, reader.ts |
| src/prompts/human/spine-scan.ts | New prompt builder for interleaved [Processed]/[New] spine format |
| src/core/types/entities.ts | Add slack?: SlackSettings to HumanSettings |
| src/core/processor.ts | Add checkAndSyncSlack(), two-speed polling logic, SyncResult type |
| src/core/tools/builtin/ | Slack OAuth auth helper (mirrors spotify-auth.ts) |
| web/ | Settings UI: Slack connection, backfill config, channel tier overrides |
| tui/ | /auth slack command, /settings Slack section |
| CONTRACTS.md | Document Slack persona naming convention, channel tier model |

Metadata


Labels: enhancement (New feature or request)
