@constshift
Add In-Memory File Support

Summary

This PR adds support for opening tabular data files directly from memory, enabling processing of files from HTTP responses, embedded data, and other sources without writing to disk.

Motivation

Currently, grate only supports opening files from the filesystem using grate.Open(filename). This limitation requires users to:

  • Write temporary files when processing HTTP downloads
  • Use filesystem I/O for data already in memory
  • Create temporary files for testing

This PR addresses these limitations by adding in-memory support across all file formats.

Changes

New Public API

  1. grate.OpenBytes(data []byte) (Source, error)

    • Opens tabular data from an in-memory byte slice
    • Auto-detects format (XLS, XLSX, CSV, TSV)
    • Returns same Source interface as Open()
  2. grate.RegisterWithBytes(name, priority, opener, openerBytes)

    • Enhanced registration supporting both file and in-memory handlers
    • Maintains backward compatibility with existing Register()
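The registration change above can be pictured with a small, self-contained sketch. It is not the library's code: the handler struct, the Source stand-in, the error values, and the rule that lower priority values are tried first are all assumptions made for illustration. It only shows how an optional in-memory opener can sit next to the file opener while the old Register() call keeps working.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Source is a stand-in for the library's Source interface (assumed shape).
type Source interface {
	Close() error
}

// OpenFunc and OpenBytesFunc are hypothetical handler signatures.
type OpenFunc func(filename string) (Source, error)
type OpenBytesFunc func(data []byte) (Source, error)

// Assumed sentinel errors for "wrong format" and "no handler matched".
var (
	ErrNotInFormat   = errors.New("not in this format")
	ErrUnknownFormat = errors.New("unknown format")
)

type handler struct {
	name      string
	priority  int
	open      OpenFunc
	openBytes OpenBytesFunc // nil when the format has no in-memory support
}

var registry []handler

// RegisterWithBytes stores both openers; handlers are kept sorted by priority.
func RegisterWithBytes(name string, priority int, open OpenFunc, openBytes OpenBytesFunc) {
	registry = append(registry, handler{name: name, priority: priority, open: open, openBytes: openBytes})
	sort.SliceStable(registry, func(i, j int) bool {
		return registry[i].priority < registry[j].priority
	})
}

// Register keeps the old signature by registering without an in-memory opener.
func Register(name string, priority int, open OpenFunc) {
	RegisterWithBytes(name, priority, open, nil)
}

// OpenBytes tries every handler that has an in-memory opener, in priority order.
func OpenBytes(data []byte) (Source, error) {
	for _, h := range registry {
		if h.openBytes == nil {
			continue // file-only handler, skip it
		}
		src, err := h.openBytes(data)
		if err == nil {
			return src, nil
		}
		if !errors.Is(err, ErrNotInFormat) {
			return nil, fmt.Errorf("%s: %w", h.name, err)
		}
	}
	return nil, ErrUnknownFormat
}

// memSource and openDemoBytes are toy pieces used only to exercise the registry.
type memSource struct{ format string }

func (memSource) Close() error { return nil }

func openDemoBytes(data []byte) (Source, error) {
	if len(data) >= 4 && string(data[:4]) == "DEMO" {
		return memSource{format: "demo"}, nil
	}
	return nil, ErrNotInFormat
}

func main() {
	Register("legacy", 10, nil) // file-only registration still works
	RegisterWithBytes("demo", 20, nil, openDemoBytes)

	src, err := OpenBytes([]byte("DEMO,1,2"))
	if err != nil {
		fmt.Println("open failed:", err)
		return
	}
	defer src.Close()
	fmt.Println("matched format:", src.(memSource).format)
}
```

Leaving the in-memory opener nil for handlers registered the old way is one simple way to keep existing callers unaffected: OpenBytes skips them.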

Implementation Details

Core Package (grate.go)

  • Added OpenBytesFunc type for format handlers
  • Added OpenBytes() and OpenReader() functions
  • Updated registration system to support optional in-memory handlers

XLS Package (xls/xls.go, xls/cfb/interface.go)

  • Added OpenBytes() function
  • Refactored to use openWorkbook() helper to eliminate duplication
  • Added cfb.OpenBytes() for Compound File Binary support
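As a rough illustration of the refactoring described above, the sketch below routes a file-based and a byte-based entry point through one shared helper. The workbook type and the helper body are placeholders, not the PR's code; the only real-world detail used is the 8-byte signature that starts every Compound File Binary file.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"os"
)

// cfbSignature is the magic number at offset 0 of a Compound File Binary file.
var cfbSignature = []byte{0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1}

var errNotCFB = errors.New("not a compound file")

// workbook is a hypothetical placeholder for the parsed result.
type workbook struct {
	size int64
}

// openWorkbook holds the logic shared by both entry points. It only needs
// random access, so an *os.File and a *bytes.Reader both qualify.
func openWorkbook(r io.ReaderAt, size int64) (*workbook, error) {
	head := make([]byte, len(cfbSignature))
	if _, err := r.ReadAt(head, 0); err != nil {
		return nil, errNotCFB // too short to hold a header
	}
	if !bytes.Equal(head, cfbSignature) {
		return nil, errNotCFB
	}
	// Real parsing of sectors and streams would continue here.
	return &workbook{size: size}, nil
}

// Open is the file-based entry point. This sketch parses eagerly, so the
// file can be closed as soon as the helper returns.
func Open(filename string) (*workbook, error) {
	f, err := os.Open(filename)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	info, err := f.Stat()
	if err != nil {
		return nil, err
	}
	return openWorkbook(f, info.Size())
}

// OpenBytes is the in-memory entry point; it wraps the slice and delegates.
func OpenBytes(data []byte) (*workbook, error) {
	return openWorkbook(bytes.NewReader(data), int64(len(data)))
}

func main() {
	// A fake 512-byte buffer that carries only the signature.
	data := append(append([]byte{}, cfbSignature...), make([]byte, 504)...)
	wb, err := OpenBytes(data)
	if err != nil {
		fmt.Println("open failed:", err)
		return
	}
	fmt.Println("accepted buffer of", wb.size, "bytes")
}
```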

XLSX Package (xlsx/xlsx.go)

  • Added OpenBytes() function
  • Refactored to use parseDocument() helper to eliminate duplication
  • Updated Close() to handle nil file handle
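The nil-handle change to Close() can be sketched as follows. The document type, its fields, and the buildZip helper are hypothetical; the standard-library part is real: zip.NewReader accepts any io.ReaderAt plus a size, so a bytes.Reader over the slice works without an os.File.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"os"
)

// document is a hypothetical stand-in for the parsed XLSX container.
type document struct {
	f     *os.File // nil when the document was opened from memory
	names []string // names of the parts found in the ZIP container
}

// parseDocument is the shared helper; f may be nil for in-memory input.
func parseDocument(zr *zip.Reader, f *os.File) *document {
	d := &document{f: f}
	for _, zf := range zr.File {
		d.names = append(d.names, zf.Name)
	}
	return d
}

// Open reads the container from disk and keeps the file handle for Close.
func Open(filename string) (*document, error) {
	f, err := os.Open(filename)
	if err != nil {
		return nil, err
	}
	info, err := f.Stat()
	if err != nil {
		f.Close()
		return nil, err
	}
	zr, err := zip.NewReader(f, info.Size())
	if err != nil {
		f.Close()
		return nil, err
	}
	return parseDocument(zr, f), nil
}

// OpenBytes reads the container from memory; there is no file handle.
func OpenBytes(data []byte) (*document, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, err
	}
	return parseDocument(zr, nil), nil
}

// Close is a no-op for in-memory documents.
func (d *document) Close() error {
	if d.f == nil {
		return nil
	}
	return d.f.Close()
}

// buildZip creates a tiny ZIP archive in memory for the demo.
func buildZip(names ...string) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for _, name := range names {
		w, err := zw.Create(name)
		if err != nil {
			return nil, err
		}
		if _, err := w.Write([]byte("<x/>")); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := buildZip("xl/workbook.xml", "xl/worksheets/sheet1.xml")
	if err != nil {
		fmt.Println("build failed:", err)
		return
	}
	doc, err := OpenBytes(data)
	if err != nil {
		fmt.Println("open failed:", err)
		return
	}
	defer doc.Close() // safe even though no file was opened
	fmt.Println("parts:", doc.names)
}
```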

Simple Package (simple/csv.go, simple/tsv.go)

  • Added OpenCSVBytes() and OpenTSVBytes() functions
  • Refactored to use parseCSV() and parseTSV() helpers
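For the delimited formats, the same idea reduces to handing the parser an io.Reader. The sketch below uses the standard encoding/csv package as a stand-in for the library's own scanner, so it shows the shape of the refactoring, not the actual parseCSV() and parseTSV() helpers; parseDelimited and the [][]string return type are assumptions.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"io"
	"os"
)

// parseDelimited is the shared helper: it does not care where r comes from.
func parseDelimited(r io.Reader, comma rune) ([][]string, error) {
	cr := csv.NewReader(r)
	cr.Comma = comma
	cr.FieldsPerRecord = -1 // allow rows with differing column counts
	return cr.ReadAll()
}

// OpenCSV is the file-based path.
func OpenCSV(filename string) ([][]string, error) {
	f, err := os.Open(filename)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	return parseDelimited(f, ',')
}

// OpenCSVBytes and OpenTSVBytes are the in-memory paths.
func OpenCSVBytes(data []byte) ([][]string, error) {
	return parseDelimited(bytes.NewReader(data), ',')
}

func OpenTSVBytes(data []byte) ([][]string, error) {
	return parseDelimited(bytes.NewReader(data), '\t')
}

func main() {
	rows, err := OpenCSVBytes([]byte("name,qty\nwidget,3\n"))
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	for _, row := range rows {
		fmt.Println(row)
	}
}
```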

Testing

  • Added test suite in inmemory_test.go:
    • TestOpenBytes - validates all formats from byte slices
  • All existing tests pass (XLS, XLSX, CSV, TSV)
  • Created working example in examples/example_inmemory.go

Documentation

  • Updated README.md with in-memory usage examples
  • Added example for OpenBytes()

Usage Examples

From Byte Slice

data, _ := os.ReadFile("data.xlsx")
wb, _ := grate.OpenBytes(data)
defer wb.Close()
// ... use wb normally

From Embedded File

//go:embed data.xlsx
var embeddedData []byte

wb, _ := grate.OpenBytes(embeddedData)
defer wb.Close()
// ... use wb normally
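From HTTP Response

The summary mentions HTTP responses, so here is a hedged sketch of that case. It covers only the download step and runs against a local test server; fetchBytes, the size limit, and the sample payload are illustrative assumptions, and the resulting slice is what would then be passed to grate.OpenBytes.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchBytes downloads url and returns the body, rejecting bodies larger
// than limit bytes so an unexpected response cannot exhaust memory.
func fetchBytes(client *http.Client, url string, limit int64) ([]byte, error) {
	resp, err := client.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	// Read one byte past the limit to detect oversized bodies.
	data, err := io.ReadAll(io.LimitReader(resp.Body, limit+1))
	if err != nil {
		return nil, err
	}
	if int64(len(data)) > limit {
		return nil, fmt.Errorf("response exceeds %d bytes", limit)
	}
	return data, nil
}

// newDemoServer serves a small CSV payload for the demo.
func newDemoServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "name,qty\nwidget,3\n")
	}))
}

func main() {
	srv := newDemoServer()
	defer srv.Close()

	data, err := fetchBytes(srv.Client(), srv.URL, 10<<20)
	if err != nil {
		fmt.Println("download failed:", err)
		return
	}
	fmt.Printf("downloaded %d bytes\n", len(data))
	// The slice can now be handed to grate.OpenBytes(data); no temp file needed.
}
```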

Breaking Changes

None. This PR is fully backward compatible:

  • All existing grate.Open(filename) code works unchanged
  • Existing format handlers continue to work
  • No changes to public interfaces or behavior

Performance Considerations

  • In-memory operations avoid filesystem I/O overhead
  • Memory usage: entire file loaded into RAM (same as current ZIP/CFB parsing)
  • No performance regression for existing file-based operations
  • TSV/CSV: same streaming scanner approach, just different source

- Introduced OpenBytes function for in-memory data handling in CSV, TSV, XLS, and XLSX formats.
- Updated registration functions to support in-memory operations.
- Added example demonstrating in-memory file processing.
- Created tests for opening files from byte slices.