[Refactor] Phase 2: Introduce dedicated table types #96

@JonnyTran

Summary

Introduce dedicated TableField and TableQuestion types in the backend and frontend so that table data is handled and validated as structured data, using the workspace schema configuration for validation and type safety. This phase replaces the current approach of storing table data as stringified JSON.

Motivation

The current implementation has several critical limitations that impact data integrity and user experience:

  1. Table data is stored as stringified JSON in generic TextField fields and TextQuestionAnswer objects
  2. No dedicated types for table data in the database models, limiting validation capabilities
  3. Limited integration with workspace schema configuration for table validation
  4. Inefficient serialization/deserialization between frontend and backend
  5. As a result, table data integrity and type safety are difficult to guarantee
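
To make the first limitations concrete, here is a minimal sketch of what string-based storage implies. The payload shape (`columns` and `data` keys) and the helper names are assumptions for illustration, not the project's actual format: a generic text field accepts any string, so a corrupted table is only discovered when something tries to parse it.

```python
import json

# Hypothetical payloads as they might sit in a generic text field today.
valid_payload = json.dumps({"columns": ["dose", "unit"], "data": [[5, "mg"], [10, "mg"]]})
truncated_payload = valid_payload[:-5]  # corrupted on write; a text field still accepts it


def read_table(raw: str) -> dict:
    """Parse a stringified table; errors surface only here, at read time."""
    return json.loads(raw)


def try_read_table(raw: str):
    """Return (table, error) so callers can see when the stored string is unusable."""
    try:
        return read_table(raw), None
    except json.JSONDecodeError as exc:
        return None, str(exc)
```

With a dedicated table type, the same check could run at write time instead of leaving every reader to handle parse failures.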

Strategic Context

This phase is part of the larger workspace-level schema management architecture redesign. It builds on Phase 1's foundation to create proper data types that integrate with workspace schema configuration, enabling the advanced table functionality required for the document-schema-fields table view.

Proposed Refactor

This phase focuses on two main improvements that create the foundation for advanced table handling:

1. Implement TableField Type in Backend (#90)

  • Create Dedicated Type: Add proper TableField support to backend models with workspace integration
  • Schema-Aware Validation: Use workspace schema configuration for table validation
  • Enhanced API Support: Update handlers to work with table types and schema context
  • Migration Strategy: Convert existing text fields with table data while maintaining compatibility
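
As a rough illustration of schema-aware validation, the sketch below checks table rows against a list of column definitions. `ColumnSpec`, `validate_table`, and the row-as-dict layout are hypothetical names and shapes chosen for this example; the real workspace schema configuration and SchemaService interface may look quite different.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class ColumnSpec:
    """Hypothetical column definition taken from a workspace schema."""
    name: str
    dtype: type
    required: bool = True


def validate_table(rows: List[Dict[str, Any]], columns: List[ColumnSpec]) -> List[str]:
    """Return human-readable errors; an empty list means the table is valid."""
    errors: List[str] = []
    known = {c.name for c in columns}
    for i, row in enumerate(rows):
        # Columns present in the row but absent from the schema
        for extra in sorted(set(row) - known):
            errors.append(f"row {i}: unexpected column '{extra}'")
        # Required columns and value types, in schema order
        for col in columns:
            value = row.get(col.name)
            if value is None:
                if col.required:
                    errors.append(f"row {i}: missing required column '{col.name}'")
            elif not isinstance(value, col.dtype):
                errors.append(
                    f"row {i}: column '{col.name}' expected {col.dtype.__name__}, "
                    f"got {type(value).__name__}"
                )
    return errors
```

Returning a list of errors rather than raising on the first one mirrors the List[ValidationError] return type proposed under Technical Implementation, and lets the UI report every problem in a table at once.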

2. Implement TableQuestion Type and Improve Answer Handling (#91)

  • Dedicated Question Type: Create TableQuestion type with workspace schema integration
  • Enhanced Frontend Handling: Improve TableQuestionAnswer with schema awareness
  • Schema-Based Validation: Validate table answers against workspace schema definitions
  • Improved User Experience: Create better table editing and review interfaces

Dependencies

This phase builds on the foundational infrastructure:

  • Required: Workspace Schema Configuration Infrastructure
  • Required: SchemaService implementation for table validation
  • Recommended: Phase 1 completion for optimal user experience
  • Foundation for: Phase 3 reference resolution and suggestion handling

Implementation Strategy

Following the strategic approach of proper data modeling:

Backend-First Approach

  1. Create Proper Types: Define TableField and TableQuestion with workspace integration
  2. Implement Validation: Use workspace schema configuration for comprehensive validation
  3. Enhance APIs: Update handlers to support table-specific operations
  4. Migration Support: Provide seamless transition from existing string-based storage

Frontend Integration

  1. Enhanced Components: Update table handling components to use proper types
  2. Schema Awareness: Integrate with workspace schema configuration for validation
  3. Improved UX: Create better editing and validation experiences
  4. Type Safety: Ensure frontend properly handles structured table data

Acceptance Criteria

Backend Requirements

  • Dedicated TableField and TableQuestion types exist with proper enum support
  • Table types integrate with workspace schema configuration for validation
  • Schema-based validation is implemented for table fields and answers
  • API handlers correctly process table types with workspace context
  • Migration scripts successfully convert existing table data
  • Backward compatibility is maintained during transition
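
One possible way to keep backward compatibility during the transition is a single read path that accepts both storage forms. This is a sketch under the assumption that legacy values are JSON strings decoding to an object and new values are already dicts; `load_table_value` is a hypothetical helper, not an existing API.

```python
import json
from typing import Any, Dict, Union

TableValue = Dict[str, Any]


def load_table_value(stored: Union[str, TableValue]) -> TableValue:
    """Accept both the legacy stringified form and the new structured form."""
    if isinstance(stored, dict):
        return stored  # already structured (new table type)
    if isinstance(stored, str):
        parsed = json.loads(stored)  # legacy text-field storage
        if not isinstance(parsed, dict):
            raise ValueError("legacy value did not decode to a table object")
        return parsed
    raise TypeError(f"unsupported table value type: {type(stored).__name__}")
```

Routing all reads through one function like this means handlers do not need to know which records have already been migrated.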

Frontend Requirements

  • TableQuestionAnswer provides improved table editing experience with schema awareness
  • Table data is properly handled with type safety throughout the frontend
  • Frontend components integrate with workspace schema configuration
  • UI provides clear validation feedback and error handling
  • Reference resolution works correctly in table components

Quality Requirements

  • Performance is optimized for table data operations
  • Integration tests verify table field and question functionality
  • All tests pass with the new types
  • Documentation covers table type configuration and usage
  • Error handling provides meaningful feedback to users

Technical Implementation

Database Schema Updates

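from enum import Enum
from typing import List
from uuid import UUID

# DatabaseModel, SchemaService, and ValidationError are assumed to come from
# the project codebase; their imports are omitted here.

# Enhanced enums (only the new member is shown)
class FieldType(str, Enum):
    table = "table"  # New type with workspace integration

class QuestionType(str, Enum):
    table = "table"  # New type with workspace integration

# Enhanced models with workspace schema awareness
class Field(DatabaseModel):
    def validate_table_data(self, data: dict, workspace_id: UUID) -> List[ValidationError]:
        # Validate table data for this field against the workspace schema
        schema_service = SchemaService(workspace_id)
        return schema_service.validate_table_field(self.name, data)

class Question(DatabaseModel):
    def validate_table_answer(self, data: dict, workspace_id: UUID) -> List[ValidationError]:
        # Validate a table answer for this question against the workspace schema
        schema_service = SchemaService(workspace_id)
        return schema_service.validate_table_answer(self.name, data)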

Frontend Type Safety

export class TableQuestionAnswer extends QuestionAnswer {
    public value: TableAnswer;
    public schemaDefinition: TableSchema;
    public workspaceId: string;

    // Enhanced with workspace schema awareness
    get isValid(): boolean {
        // Default a missing length to 0 so the comparison never involves undefined
        const rowCount = this.value?.data?.length ?? 0;
        return rowCount > 0 && this.validateAgainstSchema();
    }

    private validateAgainstSchema(): boolean {
        // Placeholder: validate this.value against this.schemaDefinition
        // (workspace schema configuration); currently accepts every answer
        return true;
    }
}

Related Issues

Direct Components

Strategic Dependencies

Enables Future Work

Success Metrics

  • Data Integrity: Reduced table data validation errors
  • Type Safety: Elimination of string-based table storage issues
  • Developer Experience: Improved ability to work with table data
  • Performance: Optimized table data handling and validation
  • User Experience: Better table editing and validation feedback

Migration Considerations

  • Gradual Migration: Support both old and new types during transition
  • Data Preservation: Ensure no data loss during type conversion
  • Performance: Maintain system performance during migration
  • Rollback Capability: Ability to revert changes if needed
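
The considerations above could be sketched roughly as follows. The record layout (`type` and `value` keys) and the rule for recognizing table data (a JSON object with a `data` key) are assumptions made for illustration; a real migration would run against the database models and would need its own detection logic.

```python
import json
from copy import deepcopy
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def migrate_records(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Convert text records holding table JSON to the table type.

    Returns (migrated, backup). `backup` holds untouched copies so the change
    can be reverted. Records that are not text, or whose value is not a JSON
    object with a "data" key, are passed through unchanged.
    """
    backup = deepcopy(records)
    migrated: List[Record] = []
    for record in records:
        converted = dict(record)  # never mutate the input record
        if record.get("type") == "text" and isinstance(record.get("value"), str):
            try:
                parsed = json.loads(record["value"])
            except json.JSONDecodeError:
                parsed = None  # ordinary prose, leave as text
            if isinstance(parsed, dict) and "data" in parsed:
                converted["type"] = "table"
                converted["value"] = parsed
        migrated.append(converted)
    return migrated, backup
```

Because unrecognized records pass through untouched and the originals are kept, old and new types can coexist while the migration is rolled out in batches.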

This is Phase 2 of the larger Extralit Document Extraction Data Architecture refactoring plan. This phase focuses on creating proper data types that integrate with workspace schema configuration, providing the foundation for advanced table functionality while maintaining system stability and performance.
