Excel extraction should produce per-sheet .csv.txt files with frontmatter #9

@sbaker

Summary

When an Excel file (.xlsx) is extracted during compilation/packaging, the current implementation produces raw text output. Instead, each sheet should be extracted as a separate .csv.txt file with YAML frontmatter metadata.

Expected Behavior

Given an Excel file financials.xlsx with sheets "Revenue" and "Expenses":

Output: revenue.csv.txt

---
source: financials.xlsx
sheet: Revenue
rows: 150
columns: 8
extracted_at: 2025-03-11T00:00:00Z
---
col1,col2,col3,...
data,data,data,...

Output: expenses.csv.txt

---
source: financials.xlsx
sheet: Expenses
rows: 75
columns: 6
extracted_at: 2025-03-11T00:00:00Z
---
col1,col2,col3,...
data,data,data,...
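As a rough illustration of the format above, here is a minimal Python sketch that renders one sheet's rows into the frontmatter-plus-CSV text. The function names `render_sheet` and `yaml_scalar` are hypothetical and not taken from compiler.py. It assumes `rows` counts every row including the header, `columns` is the widest row, and `extracted_at` is a timezone-aware datetime; none of these are settled by this issue.

```python
import csv
import io
import json
import re
from datetime import datetime, timezone

# Hypothetical helpers: names and counting rules are assumptions, not the project's API.
_PLAIN = re.compile(r"[A-Za-z][A-Za-z0-9_.-]*")
_RESERVED = {"true", "false", "null", "yes", "no", "on", "off"}


def yaml_scalar(value: str) -> str:
    """Emit a plain YAML scalar when unambiguous, else a double-quoted one."""
    if _PLAIN.fullmatch(value) and value.lower() not in _RESERVED:
        return value
    # A JSON string literal is also a valid YAML double-quoted scalar.
    return json.dumps(value)


def render_sheet(source: str, sheet: str, rows: list, extracted_at: datetime) -> str:
    """Return the .csv.txt content for one sheet: YAML frontmatter, then CSV."""
    body = io.StringIO()
    # csv.writer handles quoting of commas, quotes, and newlines inside cells.
    csv.writer(body, lineterminator="\n").writerows(rows)

    # Assumption: rows includes the header row; columns is the widest row.
    columns = max((len(row) for row in rows), default=0)
    # Assumption: extracted_at is timezone-aware; it is normalised to UTC.
    stamp = extracted_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    frontmatter = [
        "---",
        f"source: {yaml_scalar(source)}",
        f"sheet: {yaml_scalar(sheet)}",
        f"rows: {len(rows)}",
        f"columns: {columns}",
        f"extracted_at: {stamp}",
        "---",
    ]
    return "\n".join(frontmatter) + "\n" + body.getvalue()
```

Sheet names that contain spaces, colons, or other YAML-significant characters come out double-quoted in this sketch, so the frontmatter stays parseable even for a sheet called "Q1: Plan".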

Current Behavior

Excel files are extracted as a single blob of text without sheet separation or frontmatter.

Context

  • This applies to registry-bound output: by the time content reaches the registry, it should already be in .csv.txt format
  • Each sheet is broken out into its own code file with frontmatter
  • Frontmatter should include the source file, sheet name, and basic stats (row and column counts)
  • The xlsx dependency (0.18.5) is used only for trusted text extraction during compile/package, not for user-uploaded content
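One detail the points above leave open is how a sheet name maps to an output file name. The sketch below shows one possible rule in Python, assuming the lowercase naming seen in the examples (Revenue becomes revenue.csv.txt). The helpers `sheet_filename` and `plan_outputs`, the hyphen slug rule, the `sheet` fallback, and the numeric suffix for collisions are all assumptions for illustration, not decided behavior.

```python
import re

SUFFIX = ".csv.txt"


def sheet_filename(sheet: str) -> str:
    """Lowercase the sheet name and collapse unsafe characters to hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", sheet.lower()).strip("-")
    # Assumption: fall back to a generic stem when nothing usable remains.
    return f"{slug or 'sheet'}{SUFFIX}"


def plan_outputs(sheet_names: list) -> dict:
    """Map each sheet name to a unique output file name, in workbook order."""
    taken = set()
    plan = {}
    for name in sheet_names:
        candidate = sheet_filename(name)
        stem = candidate[: -len(SUFFIX)]
        counter = 2
        # Different sheet names can slug to the same stem; add -2, -3, ...
        while candidate in taken:
            candidate = f"{stem}-{counter}{SUFFIX}"
            counter += 1
        taken.add(candidate)
        plan[name] = candidate
    return plan
```

With this rule, `plan_outputs(["Revenue", "Expenses"])` gives revenue.csv.txt and expenses.csv.txt, matching the expected output names. Whatever rule is chosen, it would need to be identical across the Python, Go, and Node.js CLIs so all three produce the same file names.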

Affected Components

  • Python CLI: compiler.py (binary asset extraction stage)
  • Go CLI: compiler.go
  • Node.js CLI: compilation pipeline
