Open
Description
Summary
When an Excel file (.xlsx) is extracted during compilation/packaging, the current implementation produces raw text output. Instead, each sheet should be extracted as a separate .csv.txt file with YAML frontmatter metadata.
Expected Behavior
Given an Excel file with sheets "Revenue" and "Expenses":
Output: revenue.csv.txt
---
source: financials.xlsx
sheet: Revenue
rows: 150
columns: 8
extracted_at: 2025-03-11T00:00:00Z
---
col1,col2,col3,...
data,data,data,...
Output: expenses.csv.txt
---
source: financials.xlsx
sheet: Expenses
rows: 75
columns: 6
extracted_at: 2025-03-11T00:00:00Z
---
col1,col2,col3,...
data,data,data,...
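A minimal sketch of how one sheet could be rendered into the format above, using only the Python standard library. The function name `render_sheet` and its parameters are hypothetical and not taken from any of the CLIs. It assumes the sheet has already been read into a list of rows, that the first row is the header (so `rows` counts data rows only), and that frontmatter values are written unquoted as in the examples; the issue does not specify any of these.

```python
import csv
import io
from datetime import datetime, timezone


def render_sheet(source, sheet, rows, extracted_at=None):
    """Render one sheet as .csv.txt content: YAML frontmatter, then CSV.

    Hypothetical helper; `rows` is a list of lists already read from the sheet.
    """
    if extracted_at is None:
        extracted_at = datetime.now(timezone.utc)
    # Normalise to UTC so the trailing "Z" is accurate.
    stamp = extracted_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    # Assumption: the first row is the header, so it is not counted in `rows`.
    data_rows = max(len(rows) - 1, 0)
    columns = max((len(r) for r in rows), default=0)

    # The csv module handles quoting of commas, quotes, and newlines in cells.
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)

    frontmatter = "\n".join([
        "---",
        f"source: {source}",
        f"sheet: {sheet}",
        f"rows: {data_rows}",
        f"columns: {columns}",
        f"extracted_at: {stamp}",
        "---",
    ])
    return frontmatter + "\n" + buf.getvalue()
```

Sheet or file names containing YAML-significant characters (for example a colon) would need quoting, which this sketch leaves out.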
Current Behavior
Excel files are extracted as a single blob of text without sheet separation or frontmatter.
Context
- This applies to the registry-bound output: by the time content hits the registry it should already be in .csv.txt format
- Each sheet becomes its own file, broken out as individual code files with frontmatter
- Frontmatter should include source file, sheet name, and basic stats
- The xlsx dependency (0.18.5) is only used for trusted text extraction during compile/package, not user-uploaded content
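Since each sheet becomes its own file, the output names have to be derived from sheet names. The examples suggest lowercasing ("Revenue" becomes revenue.csv.txt), but the issue does not say how spaces, punctuation, or colliding names should be handled. The sketch below is one possible scheme under those assumptions; `sheet_filenames` is a hypothetical helper, not an existing function in any of the CLIs.

```python
import re


def sheet_filenames(sheet_names):
    """Map each sheet name to a unique .csv.txt filename (hypothetical scheme)."""
    used = set()
    result = {}
    for name in sheet_names:
        # Lowercase and collapse anything outside a-z0-9 into single hyphens.
        slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-") or "sheet"
        # Distinct sheet names can slugify identically, so add a numeric suffix.
        candidate = slug
        n = 2
        while candidate in used:
            candidate = f"{slug}-{n}"
            n += 1
        used.add(candidate)
        result[name] = candidate + ".csv.txt"
    return result
```

Sheet names made up entirely of non-ASCII characters fall back to "sheet" here; a real implementation may prefer to preserve them.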
Affected Components
- Python CLI: compiler.py (binary asset extraction stage)
- Go CLI: compiler.go
- Node.js CLI: compilation pipeline