[Refactor] Improve table reference resolution system #92

@JonnyTran

Description

Refactor the table reference resolution system so that it uses the workspace schema configuration and handles references between tables in a more robust, maintainable way. The change moves the core resolution logic to the backend and improves support for multi-user scenarios.

Problem

  • Reference resolution between tables is complex and error-prone
  • Multi-user scenarios are not handled well in the current implementation
  • Current implementation mixes concerns between frontend and backend
  • No clear ownership of reference resolution logic
  • Limited integration with workspace schema configuration
  • References are resolved client-side, leading to performance and consistency issues

Proposed Solution

  1. Leverage Workspace Schema Configuration: Use schema definitions to understand reference relationships
  2. Move Resolution to Backend: Implement server-side reference resolution using SchemaService
  3. Create Document-Centric APIs: Use DocumentService for cross-dataset reference resolution
  4. Enhance Frontend Components: Simplify frontend by using backend resolution services
  5. Improve Multi-User Support: Handle reference consistency across users
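
As a rough illustration of step 1, the sketch below derives reference relationships from a schema configuration. The configuration shape (schema name, then columns, with an optional `references` key naming the target schema) and the names `SCHEMA_CONFIG` and `build_reference_mapping` are assumptions made for this example; the actual workspace schema format is not specified in this issue.

```python
from typing import Dict

# Hypothetical schema configuration; the real workspace schema format may differ.
SCHEMA_CONFIG = {
    "publications": {"columns": {"publication_ref": {}, "title": {}}},
    "observations": {
        "columns": {
            "observation_ref": {},
            "publication_ref": {"references": "publications"},
            "value": {},
        }
    },
}


def build_reference_mapping(schema_config: dict) -> Dict[str, Dict[str, str]]:
    """Return {schema_name: {column_name: target_schema}} for reference columns only."""
    mapping: Dict[str, Dict[str, str]] = {}
    for schema_name, schema in schema_config.items():
        # Keep only the columns that declare a target schema
        refs = {
            column: spec["references"]
            for column, spec in schema.get("columns", {}).items()
            if "references" in spec
        }
        if refs:
            mapping[schema_name] = refs
    return mapping


reference_mapping = build_reference_mapping(SCHEMA_CONFIG)
```

With a mapping like this computed once per workspace, the resolver no longer has to guess which columns are references from column names or from the data itself.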

Implementation Details

Dependencies

Backend Changes

  1. Enhance SchemaService with reference resolution:

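    from typing import Dict, List, Optional
    from uuid import UUID

    class SchemaService:
        def __init__(self, workspace_id: UUID, db=None):
            # Constructor shape follows the call sites: SchemaService(workspace_id, db)
            self.workspace_id = workspace_id
            self.db = db

        def resolve_table_references(
            self,
            table_data: dict,
            workspace_id: UUID,
            user_id: Optional[UUID] = None,
        ) -> dict:
            """Resolve all references in table data using the workspace schema."""
            schema_config = self.get_workspace_schema_config(workspace_id)
            resolved_data = table_data.copy()  # shallow copy; original columns are kept

            # The schema config decides which columns are references
            for ref_column in self._get_reference_columns(table_data, schema_config):
                ref_values = self._resolve_reference_column(
                    ref_column, table_data[ref_column], workspace_id, user_id
                )
                resolved_data[f"{ref_column}_resolved"] = ref_values

            return resolved_data

        def get_reference_schema_mapping(self, workspace_id: UUID) -> Dict[str, str]:
            """Get mapping of reference columns to their target schemas."""
            raise NotImplementedError

        def validate_reference_consistency(
            self,
            table_data: dict,
            workspace_id: UUID,
        ) -> List["ValidationError"]:  # ValidationError: project error type, not defined here
            """Validate that all references point to valid records."""
            raise NotImplementedError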
  2. Create DocumentService for cross-dataset references:

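    from typing import Any, Dict, Optional
    from uuid import UUID

    class DocumentService:
        def __init__(self, workspace_id: UUID, db=None):
            # Matches the endpoint call site: DocumentService(workspace_id, db)
            self.workspace_id = workspace_id
            self.db = db

        def resolve_document_references(
            self,
            document_ref: str,
            workspace_id: UUID,
            user_id: Optional[UUID] = None,
        ) -> Dict[str, Any]:
            """Resolve all references for a complete document across datasets."""
            all_records = self.get_document_records(document_ref, workspace_id)
            # Reuse this service's db session, as the endpoints do for SchemaService
            schema_service = SchemaService(workspace_id, self.db)

            resolved_document = {}
            for record in all_records:
                if self._has_table_data(record):
                    resolved_data = schema_service.resolve_table_references(
                        record.table_data, workspace_id, user_id
                    )
                    resolved_document[record.schema_name] = resolved_data

            return resolved_document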
  3. Add reference resolution API endpoints:

    from typing import Optional
    from uuid import UUID

    from fastapi import Depends
    from sqlalchemy.ext.asyncio import AsyncSession

    # `router` and `get_async_db` are provided by the server package.

    @router.post("/workspaces/{workspace_id}/tables/resolve-references")
    async def resolve_table_references(
        workspace_id: UUID,
        table_data: dict,
        user_id: Optional[UUID] = None,
        db: AsyncSession = Depends(get_async_db),
    ):
        schema_service = SchemaService(workspace_id, db)
        return schema_service.resolve_table_references(table_data, workspace_id, user_id)

    @router.get("/workspaces/{workspace_id}/documents/{reference}/resolved")
    async def get_resolved_document(
        workspace_id: UUID,
        reference: str,
        user_id: Optional[UUID] = None,
        db: AsyncSession = Depends(get_async_db),
    ):
        document_service = DocumentService(workspace_id, db)
        return document_service.resolve_document_references(reference, workspace_id, user_id)
  4. Enhance record APIs with reference context:

    • Include resolved reference data in record responses
    • Add reference validation before saving records
    • Provide reference metadata for frontend components
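
To make "reference validation before saving records" more concrete, here is a minimal sketch of a dangling-reference check. It assumes column-oriented table data (column name mapped to a list of values), a mapping from reference column to target schema, and a set of known reference IDs per schema. `ReferenceIssue` and `find_dangling_references` are hypothetical names for this example, not part of the existing codebase.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class ReferenceIssue:
    """One reference value that does not match any record in its target schema."""
    column: str
    row: int
    value: str


def find_dangling_references(
    table_data: Dict[str, List[str]],
    reference_columns: Dict[str, str],
    known_ids: Dict[str, Set[str]],
) -> List[ReferenceIssue]:
    """Return an issue for every reference value missing from its target schema."""
    issues: List[ReferenceIssue] = []
    for column, target_schema in reference_columns.items():
        valid = known_ids.get(target_schema, set())
        for row, value in enumerate(table_data.get(column, [])):
            if value not in valid:
                issues.append(ReferenceIssue(column, row, value))
    return issues


table = {"publication_ref": ["pub-1", "pub-9"], "value": ["0.4", "0.7"]}
issues = find_dangling_references(
    table,
    reference_columns={"publication_ref": "publications"},
    known_ids={"publications": {"pub-1", "pub-2"}},
)
```

A save handler could reject the record, or return these issues as reference metadata, whenever the list is non-empty.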

Frontend Changes

  1. Refactor useReferenceTablesViewModel to use backend APIs:

    export const useReferenceTablesViewModel = (props: { tableJSON: TableData }) => {
        const { state: workspace } = useWorkspace();
        
        const resolveReferences = async (tableData: TableData): Promise<TableData> => {
            if (!workspace?.id) return tableData;
            
            const response = await documentService.resolveTableReferences(
                workspace.id,
                tableData.toJSON()
            );
            
            return new TableData(
                response.data,
                response.schema,
                response.reference
            );
        };
        
        const getResolvedDocument = async (reference: string): Promise<ResolvedDocument | null> => {
            // Without a workspace there is nothing to resolve against
            if (!workspace?.id) return null;
            
            return await documentService.getResolvedDocument(workspace.id, reference);
        };
        
        return {
            resolveReferences,
            getResolvedDocument,
            // ... other methods simplified using backend APIs
        };
    };
  2. Simplify table rendering components:

    • Remove complex client-side reference resolution logic
    • Use resolved data from backend APIs
    • Add error handling for reference resolution failures
    • Implement caching for resolved references
  3. Enhance multi-user reference handling:

    • Display reference conflicts between users
    • Show resolution history and user context
    • Provide UI for reference conflict resolution
    • Enable collaborative reference editing
  4. Improve reference management UI:

    // New component: ReferenceResolver.vue
    export default {
        props: {
            tableData: Object,
            workspaceId: String,
        },
        data() {
            return {
                resolvedData: null,
                loading: false,
                errors: [],
            };
        },
        async mounted() {
            await this.resolveReferences();
        },
        methods: {
            async resolveReferences() {
                this.loading = true;
                this.errors = []; // clear errors left over from a previous attempt
                try {
                    this.resolvedData = await documentService.resolveTableReferences(
                        this.workspaceId,
                        this.tableData
                    );
                } catch (error) {
                    this.errors.push(error.message);
                } finally {
                    this.loading = false;
                }
            },
        },
    };

Performance and Caching

  1. Implement reference resolution caching:

    • Cache resolved references at the workspace level
    • Invalidate cache when referenced records change
    • Use Redis or similar for distributed caching
  2. Optimize reference queries:

    • Batch reference resolution requests
    • Use database joins for efficient reference lookup
    • Implement lazy loading for large reference datasets
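
The caching and batching points above could be combined roughly as follows. This is a sketch only: an in-process dictionary stands in for Redis, and `ReferenceCache`, `fetch_many`, and `fake_fetch` are hypothetical names. The fetch callback represents a single batched database lookup.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Key = Tuple[str, str]  # (workspace_id, reference)


class ReferenceCache:
    """Workspace-scoped cache that resolves only the missing references, in one batch."""

    def __init__(self, fetch_many: Callable[[str, List[str]], Dict[str, dict]]):
        self._fetch_many = fetch_many
        self._store: Dict[Key, dict] = {}

    def resolve_many(self, workspace_id: str, references: Iterable[str]) -> Dict[str, dict]:
        refs = list(dict.fromkeys(references))  # de-duplicate while keeping order
        missing = [r for r in refs if (workspace_id, r) not in self._store]
        if missing:
            # One batched lookup for all cache misses
            for ref, record in self._fetch_many(workspace_id, missing).items():
                self._store[(workspace_id, ref)] = record
        return {
            r: self._store[(workspace_id, r)]
            for r in refs
            if (workspace_id, r) in self._store
        }

    def invalidate(self, workspace_id: str, reference: str) -> None:
        """Drop a cached entry, e.g. when the referenced record changes."""
        self._store.pop((workspace_id, reference), None)


calls: List[List[str]] = []


def fake_fetch(workspace_id: str, refs: List[str]) -> Dict[str, dict]:
    # Stand-in for a batched query; records which references were requested
    calls.append(list(refs))
    return {r: {"reference": r} for r in refs}


cache = ReferenceCache(fake_fetch)
first = cache.resolve_many("ws-1", ["a", "b", "a"])
second = cache.resolve_many("ws-1", ["a", "c"])  # only "c" is fetched
cache.invalidate("ws-1", "a")
third = cache.resolve_many("ws-1", ["a"])  # "a" is fetched again
```

Keying entries by workspace keeps one workspace's records from leaking into another's results. A distributed cache would additionally need an invalidation signal that reaches every server instance.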

Related Files

  • extralit/argilla-server/src/argilla_server/services/SchemaService.py - Enhanced reference resolution
  • extralit/argilla-server/src/argilla_server/services/DocumentService.py - Cross-dataset reference handling
  • extralit/argilla-server/src/argilla_server/api/handlers/v1/references/ - New reference endpoints
  • extralit/argilla-frontend/components/base/base-render-table/useReferenceTablesViewModel.ts - Simplified frontend logic
  • extralit/argilla-frontend/components/features/reference-resolution/ - New reference UI components
  • extralit/argilla-frontend/v1/infrastructure/services/DocumentService.ts - Document API client

Acceptance Criteria

  • Reference resolution has clear ownership in the backend using SchemaService
  • Workspace schema configuration drives reference resolution logic
  • Backend provides efficient APIs for reference resolution with proper caching
  • Frontend reference handling is simplified and uses backend APIs
  • Multi-user scenarios are properly supported with conflict resolution
  • Reference validation ensures data integrity across tables
  • Performance is optimized with appropriate caching strategies
  • Cross-dataset reference resolution works correctly
  • UI provides clear feedback for reference resolution status and errors
  • The system maintains backward compatibility with existing data
  • Integration tests verify reference resolution functionality
  • Error handling provides meaningful feedback to users

Related Issues

This is part of the strategic workspace-level schema management enhancement:

Metadata

  • Labels: enhancement