From ff6aa011d232d8d2cdbd204e38b8e35597524fde Mon Sep 17 00:00:00 2001 From: Claude Date: Mon, 13 Apr 2026 00:34:04 +0000 Subject: [PATCH 1/3] Add MATLAB-Python bridge YAML files and porting instructions Introduce four bridge YAML files (following NDI-python pattern) that map every DID-matlab source file to its Python counterpart, tracking sync status via matlab_last_sync_hash. Also add PORTING_INSTRUCTIONS.md with drift-check commands, porting conventions, and current sync status. All recent MATLAB changes (binary-mode fixes, fread/fwrite updates) were verified to already be handled correctly by Python's Fileobj.fopen() which appends 'b' to mode strings automatically. https://claude.ai/code/session_015g95pQvxDUnTWxHYmTnreG --- PORTING_INSTRUCTIONS.md | 187 +++++ src/did/did_matlab_python_bridge.yaml | 643 ++++++++++++++++++ src/did/did_matlab_python_bridge_file.yaml | 479 +++++++++++++ ..._matlab_python_bridge_implementations.yaml | 283 ++++++++ src/did/did_matlab_python_bridge_util.yaml | 336 +++++++++ 5 files changed, 1928 insertions(+) create mode 100644 PORTING_INSTRUCTIONS.md create mode 100644 src/did/did_matlab_python_bridge.yaml create mode 100644 src/did/did_matlab_python_bridge_file.yaml create mode 100644 src/did/did_matlab_python_bridge_implementations.yaml create mode 100644 src/did/did_matlab_python_bridge_util.yaml diff --git a/PORTING_INSTRUCTIONS.md b/PORTING_INSTRUCTIONS.md new file mode 100644 index 0000000..4bce2ac --- /dev/null +++ b/PORTING_INSTRUCTIONS.md @@ -0,0 +1,187 @@ +# DID MATLAB-to-Python Porting Instructions + +This document describes how to keep DID-python synchronized with DID-matlab +using the YAML bridge files. 
+ +## Bridge Files + +The bridge files live in `src/did/` and define the contract between MATLAB and +Python implementations: + +| Bridge file | Scope | +|---|---| +| `did_matlab_python_bridge.yaml` | Core classes: database, document, query, ido, documentservice, binarydoc | +| `did_matlab_python_bridge_implementations.yaml` | Implementation classes: sqlitedb, doc2sql, binarydoc_matfid | +| `did_matlab_python_bridge_file.yaml` | File I/O: fileobj, readonly_fileobj, binaryTable, utilities | +| `did_matlab_python_bridge_util.yaml` | Utilities: databaseSummary, compareDatabaseSummary, fun, datastructures, db, common | + +## Checking for Drift + +To check whether a MATLAB file has changed since the last Python sync, use the +`matlab_last_sync_hash` field from the bridge YAML: + +```bash +# For a single file: +git -C /path/to/DID-matlab log ..HEAD -- + +# Example: +git -C /path/to/DID-matlab log 205d34b..HEAD -- src/did/+did/+file/fileobj.m +``` + +If the command produces output, the MATLAB file has changed since the last port. 
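The single-file check above could also be scripted. The following is a minimal Python sketch, not part of DID-python: the helper names `drift_command` and `has_drift` are hypothetical, and it assumes that `git` is on the PATH and that `matlab_path` is stored relative to `src/did/`, as in the example above.

```python
import subprocess
from typing import List


def drift_command(repo: str, sync_hash: str, matlab_path: str) -> List[str]:
    """Build the git command that lists commits touching a file after sync_hash.

    Assumption: matlab_path is relative to src/did/ inside the MATLAB repo.
    """
    return [
        "git", "-C", repo, "log", "--oneline",
        f"{sync_hash}..HEAD", "--", f"src/did/{matlab_path}",
    ]


def has_drift(repo: str, sync_hash: str, matlab_path: str) -> bool:
    """Return True if git reports any commit for the file since sync_hash."""
    result = subprocess.run(
        drift_command(repo, sync_hash, matlab_path),
        capture_output=True,
        text=True,
        check=True,  # raise if git fails, e.g. on an unknown hash
    )
    return bool(result.stdout.strip())
```

With `check=True`, a hash that git cannot resolve raises `subprocess.CalledProcessError` instead of being reported as "no drift".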
+ +### Bulk drift check + +Run this to check all bridge files at once: + +```bash +cd /path/to/DID-matlab +for yaml in /path/to/DID-python/src/did/did_matlab_python_bridge*.yaml; do + echo "=== $(basename $yaml) ===" + # Extract matlab_path and matlab_last_sync_hash pairs + python3 -c " +import yaml, sys +with open('$yaml') as f: + data = yaml.safe_load(f) +for section in ['classes', 'functions']: + for item in data.get(section, []): + path = item.get('matlab_path', '') + sync_hash = item.get('matlab_last_sync_hash', '') + name = item.get('name', '') + if path and sync_hash: + print(f'{name}|src/did/{path}|{sync_hash}') +" | while IFS='|' read name path hash; do + changes=$(git log --oneline "$hash"..HEAD -- "$path" 2>/dev/null) + if [ -n "$changes" ]; then + echo " DRIFT: $name ($path)" + echo "$changes" | sed 's/^/ /' + fi + done +done +``` + +## Porting a MATLAB Change to Python + +### Step 1: Identify the change + +```bash +git -C /path/to/DID-matlab log ..HEAD -- src/did/ +git -C /path/to/DID-matlab diff ..HEAD -- src/did/ +``` + +### Step 2: Locate the Python counterpart + +Use the bridge YAML to find `python_path` and `python_class` / `python_name`. + +### Step 3: Apply the change + +Follow these conventions when porting: + +| MATLAB | Python | +|---|---| +| `camelCase` method names | `snake_case` method names | +| `struct` | `dict` | +| `cell array` | `list` | +| `char` / `string` | `str` | +| `logical` | `bool` | +| `[]` (empty) | `None` or `[]` depending on context | +| `nargin`, `varargin` | `*args`, `**kwargs` | +| `arguments` block | Type hints + validation | +| Name-value pairs | `**kwargs` | +| 1-based indexing | 0-based indexing | + +### Step 4: Update the bridge YAML + +After porting, update the entry in the bridge YAML: + +1. Set `matlab_last_sync_hash` to the current MATLAB commit hash for that file: + ```bash + git -C /path/to/DID-matlab log -1 --format="%h" -- src/did/ + ``` +2. 
Remove `matlab_current_hash` and `out_of_sync` / `out_of_sync_reason` if present. +3. Update the `decision_log` with the sync date. + +### Step 5: Run symmetry tests + +```bash +# Python tests +pytest -m make_artifacts -v +pytest -m read_artifacts -v +``` + +If MATLAB is available, run the full 3-step symmetry cycle: +1. MATLAB `makeArtifacts` tests +2. Python `makeArtifacts` + `readArtifacts` tests +3. MATLAB `readArtifacts` tests + +## Bridge YAML Field Reference + +| Field | Required | Description | +|---|---|---| +| `name` | Yes | MATLAB function/class name | +| `type` | Yes | `class` or `function` | +| `matlab_path` | Yes | Path relative to `src/did/` in DID-matlab | +| `matlab_last_sync_hash` | Yes | Short SHA of the MATLAB commit last ported to Python | +| `matlab_current_hash` | No | Current MATLAB hash when out of sync (for tracking) | +| `python_path` | Yes | Path relative to `src/did/` in DID-python | +| `python_class` | If class | Python class name | +| `python_name` | If function | Python function name | +| `inherits_matlab` | No | MATLAB parent class(es) | +| `inherits_python` | No | Python parent class(es) | +| `out_of_sync` | No | `true` if MATLAB has diverged | +| `out_of_sync_reason` | No | Human-readable explanation of the divergence | +| `decision_log` | Yes | Sync status, dates, deviation rationale | +| `properties` | No | List of property mappings | +| `methods` | No | List of method mappings | + +## Adding a New MATLAB File + +When a new file is added to DID-matlab that needs a Python counterpart: + +1. Create the Python implementation following the conventions above. +2. Add an entry to the appropriate bridge YAML file. +3. Set `matlab_last_sync_hash` to the MATLAB commit that introduced the file. +4. Run symmetry tests to verify cross-language compatibility. + +## Current Sync Status + +As of 2026-04-13, the repositories are **in sync** for all core functionality. 
+ +### Recently resolved (no Python changes needed) +The following MATLAB changes (March 29-31, 2026) were verified to already be +handled correctly by Python: + +- **fileobj / readonly_fileobj / binaryTable / binarydoc_matfid**: MATLAB + changed default permission `'r'` -> `'rb'` for Linux binary-mode + compatibility. Python's `Fileobj.fopen()` already appends `'b'` to the mode + string if not present (lines 88-89 of `file.py`), so all files are opened in + binary mode regardless. **Behaviorally in sync.** +- **fileobj fread**: MATLAB changed default precision from `'char'` to + `'uint8'`. Python's `fread()` returns raw `bytes`, which is equivalent to + `uint8`. **No change needed.** +- **fileobj fwrite**: MATLAB updated permission check to allow `'r+'` mode. + Python relies on native file objects to reject writes on read-only files. + **No change needed.** +- **mustBeValidPermission**: MATLAB added binary-mode variants. Python's + `must_be_valid_permission()` already accepts `rb`, `wb`, `ab`, etc. + **Already in sync.** +- **sqlitedb**: MATLAB replaced `websave` with `ndi.cloud.api.files.getFile` + for URL downloads. This is MATLAB-ecosystem-specific; Python uses its own + download mechanism. 
**Not applicable to Python.** + +## Not Yet Ported from MATLAB + +These MATLAB features do not yet have Python counterparts: + +| MATLAB feature | Bridge file | Priority | +|---|---|---| +| `database.freeze_branch` | bridge.yaml | Low | +| `database.is_branch_editable` | bridge.yaml | Low | +| `database.display_branches` | bridge.yaml | Low | +| `database.exist_doc` | bridge.yaml | Medium | +| `database.close_doc` | bridge.yaml | Low | +| `document.validate` | bridge.yaml | Medium | +| `document.dependency_value_n` | bridge.yaml | Low | +| `document.add_dependency_value_n` | bridge.yaml | Low | +| `document.remove_dependency_value_n` | bridge.yaml | Low | +| `binaryTable` write methods | bridge_file.yaml | Medium | diff --git a/src/did/did_matlab_python_bridge.yaml b/src/did/did_matlab_python_bridge.yaml new file mode 100644 index 0000000..b40dc69 --- /dev/null +++ b/src/did/did_matlab_python_bridge.yaml @@ -0,0 +1,643 @@ +# did_matlab_python_bridge.yaml — Core DID classes +# The primary contract for the did namespace (core classes). +# +# Every public class / function in DID-MATLAB that has a Python counterpart +# is listed here together with: +# - matlab_path : location inside the MATLAB repo (relative to src/did/) +# - matlab_last_sync_hash : short SHA of the MATLAB commit last ported to Python +# - python_path : location inside the Python repo (relative to src/did/) +# - python_class : Python class name +# - decision_log : notes on deviations, sync dates, and rationale +# +# To check for drift, run: +# git -C log ..HEAD -- +# Any output means MATLAB has changed since the last port. 
+ +project_metadata: + bridge_version: "1.0" + matlab_repo: "VH-Lab/DID-matlab" + python_repo: "VH-Lab/DID-python" + naming_policy: "Strict MATLAB Mirror — Python names use snake_case equivalents" + indexing_policy: "Semantic Parity (MATLAB 1-based, Python 0-based)" + +# ========================================================================= +# Classes +# ========================================================================= +classes: + + # ----------------------------------------------------------------------- + # did.database (abstract base class) + # ----------------------------------------------------------------------- + - name: database + type: class + matlab_path: "+did/database.m" + matlab_last_sync_hash: "26ad723" + python_path: "did/database.py" + python_class: "Database" + inherits: "abc.ABC" + + properties: + - name: connection + type_matlab: "char" + type_python: "str" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: version + type_matlab: "any" + type_python: "None" + decision_log: "Placeholder in both. Synchronized 2026-03-15." + + - name: current_branch_id + type_matlab: "string" + type_python: "str" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: frozen_branch_ids + type_matlab: "cell" + type_python: "list" + decision_log: > + MATLAB uses cell array, Python uses list. + Synchronized 2026-03-15. + + - name: dbid + type_matlab: "any (transient)" + type_python: "None" + decision_log: "Transient handle in both. Synchronized 2026-03-15." + + - name: debug + type_matlab: "logical" + type_python: "bool" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: preferences + type_matlab: "(not present)" + type_python: "dict" + decision_log: > + Python-only property for configuration. Not in MATLAB. + + methods: + - name: database + kind: constructor + input_arguments: + - name: connection + type_matlab: "char" + type_python: "str" + decision_log: > + MATLAB: database(connection). 
Python: Database(connection, **kwargs). + Synchronized 2026-03-15. + + - name: open + input_arguments: [] + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: close + input_arguments: [] + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: all_branch_ids + input_arguments: [] + output_arguments: + - name: branch_ids + type_matlab: "cell" + type_python: "list[str]" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: add_branch + input_arguments: + - name: branch_id + type_matlab: "char" + type_python: "str" + - name: parent_branch_id + type_matlab: "char" + type_python: "str" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: set_branch + input_arguments: + - name: branch_id + type_matlab: "char" + type_python: "str" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: get_branch + input_arguments: [] + output_arguments: + - name: branch_id + type_python: "str" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: get_branch_parent + input_arguments: + - name: branch_id + type_matlab: "char" + type_python: "str" + output_arguments: + - name: parent_id + type_python: "str" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: get_sub_branches + input_arguments: + - name: branch_id + type_matlab: "char" + type_python: "str" + output_arguments: + - name: sub_branch_ids + type_python: "list[str]" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: delete_branch + input_arguments: + - name: branch_id + type_matlab: "char" + type_python: "str" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: freeze_branch + input_arguments: + - name: branch_id + type_matlab: "char" + type_python: "(not implemented)" + decision_log: > + Present in MATLAB. Not yet ported to Python. + + - name: is_branch_editable + input_arguments: + - name: branch_id + type_matlab: "char" + type_python: "(not implemented)" + decision_log: > + Present in MATLAB. 
Not yet ported to Python. + + - name: display_branches + input_arguments: + - name: branch_id + type_matlab: "char" + type_python: "(not implemented)" + decision_log: > + Present in MATLAB. Not yet ported to Python. + Display/visualization helper. + + - name: all_doc_ids + input_arguments: [] + output_arguments: + - name: doc_ids + type_python: "list[str]" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: get_doc_ids + input_arguments: + - name: branch_id + type_matlab: "char" + type_python: "str" + output_arguments: + - name: doc_ids + type_python: "list[str]" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: add_docs + input_arguments: + - name: document_objs + type_matlab: "cell of did.document" + type_python: "list[Document]" + - name: branch_id + type_matlab: "char" + type_python: "str" + decision_log: > + MATLAB has options: OnDuplicate, Validate. + Python accepts **kwargs. Synchronized 2026-03-15. + + - name: get_docs + input_arguments: + - name: document_ids + type_matlab: "cell | char" + type_python: "list[str] | str" + - name: OnMissing + type_matlab: "char name-value" + type_python: "str" + default: "'error'" + output_arguments: + - name: documents + type_python: "list[Document]" + decision_log: > + MATLAB returns cell array of did.document; Python returns list. + OnMissing behavior is the same. Synchronized 2026-03-16. + + - name: remove_docs + input_arguments: + - name: document_ids + type_matlab: "cell | char" + type_python: "list[str] | str" + - name: branch_id + type_matlab: "char" + type_python: "str" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: search + input_arguments: + - name: query_obj + type_matlab: "did.query" + type_python: "Query" + - name: branch_id + type_matlab: "char" + type_python: "str" + output_arguments: + - name: doc_ids + type_python: "list[str]" + decision_log: "Exact match. Synchronized 2026-03-15." 
+ + - name: run_sql_query + input_arguments: + - name: query_str + type_matlab: "char" + type_python: "str" + decision_log: > + MATLAB: run_sql_query(query_str, returnStruct). + Python: do_run_sql_query(query_str, params). + Renamed in Python to match abstract method convention. + Synchronized 2026-03-15. + + - name: open_doc + input_arguments: + - name: document_id + type_matlab: "char" + type_python: "str" + - name: filename + type_matlab: "char" + type_python: "str" + output_arguments: + - name: file_obj + type_python: "ReadOnlyFileobj" + decision_log: > + Present in MATLAB database.m. Python SQLiteDB.open_doc returns + ReadOnlyFileobj. Synchronized 2026-03-15. + + - name: exist_doc + input_arguments: + - name: document_id + type_matlab: "char" + type_python: "(not implemented in base)" + - name: filename + type_matlab: "char" + type_python: "(not implemented in base)" + decision_log: > + Present in MATLAB as exist_doc(). Not yet in Python base class. + SQLiteDB has partial support via check_exist_doc pattern. + + - name: close_doc + input_arguments: + - name: file_obj + type_matlab: "did.file.fileobj" + type_python: "(not implemented)" + decision_log: > + Present in MATLAB. Python relies on garbage collection / fclose. + + decision_log: > + Python Database class closely mirrors MATLAB did.database. + A few MATLAB methods (freeze_branch, is_branch_editable, + display_branches, exist_doc, close_doc) are not yet ported. + Synchronized 2026-03-15. + + # ----------------------------------------------------------------------- + # did.document + # ----------------------------------------------------------------------- + - name: document + type: class + matlab_path: "+did/document.m" + matlab_last_sync_hash: "13463d1" + python_path: "did/document.py" + python_class: "Document" + + properties: + - name: document_properties + type_matlab: "struct" + type_python: "dict" + decision_log: "Exact match. Synchronized 2026-03-16." 
+ + methods: + - name: document + kind: constructor + input_arguments: + - name: document_type + type_matlab: "char" + type_python: "str" + default: "'base'" + decision_log: > + MATLAB: document(document_type, options...). + Python: Document(document_type, **kwargs). + Python also supports dot-notation kwargs for nested properties. + Synchronized 2026-03-16. + + - name: id + input_arguments: [] + output_arguments: + - name: uid + type_python: "str" + decision_log: "Exact match. Synchronized 2026-03-16." + + - name: setproperties + input_arguments: + - name: kwargs + type_matlab: "name-value pairs" + type_python: "**kwargs" + decision_log: > + MATLAB: setproperties(Name, Value, ...). + Python: set_properties(**kwargs). Snake-case rename. + Synchronized 2026-03-16. + + - name: dependency_value + input_arguments: + - name: dependency_name + type_matlab: "char" + type_python: "str" + output_arguments: + - name: value + type_python: "str" + decision_log: "Exact match. Synchronized 2026-03-16." + + - name: set_dependency_value + input_arguments: + - name: dependency_name + type_matlab: "char" + type_python: "str" + - name: value + type_matlab: "char" + type_python: "str" + decision_log: "Exact match. Synchronized 2026-03-16." + + - name: dependency_value_n + decision_log: > + Present in MATLAB for numbered dependency lists. + Not yet ported to Python. + + - name: add_dependency_value_n + decision_log: > + Present in MATLAB. Not yet ported to Python. + + - name: remove_dependency_value_n + decision_log: > + Present in MATLAB. Not yet ported to Python. + + - name: add_file + input_arguments: + - name: name + type_matlab: "char" + type_python: "str" + - name: location + type_matlab: "char" + type_python: "str" + decision_log: > + MATLAB has options: ingest, delete_original, location_type. + Python signature: add_file(filename, location). + Synchronized 2026-03-16. 
+ + - name: remove_file + input_arguments: + - name: name + type_matlab: "char" + type_python: "str" + decision_log: "Exact match. Synchronized 2026-03-16." + + - name: is_in_file_list + input_arguments: + - name: name + type_matlab: "char" + type_python: "str" + output_arguments: + - name: result + type_python: "tuple(bool, dict, int)" + decision_log: "Exact match. Synchronized 2026-03-16." + + - name: validate + input_arguments: + - name: did_database + type_matlab: "did.database" + type_python: "(not implemented)" + decision_log: > + Present in MATLAB. Not yet ported to Python. + + - name: plus + decision_log: > + MATLAB operator overload for merging documents. + Not ported to Python (use set_properties instead). + + - name: eq + decision_log: > + MATLAB operator overload for comparing document IDs. + Not ported to Python (use doc.id() == other.id()). + + - name: readblankdefinition + kind: static + decision_log: > + MATLAB: document.readblankdefinition(jsonfilelocationstring). + Python: Document.read_blank_definition(json_file_location_string). + Synchronized 2026-03-16. + + decision_log: > + Core functionality is synchronized. MATLAB has additional + numbered dependency methods (dependency_value_n, etc.) and + schema validation (validate) not yet ported. Synchronized 2026-03-16. + + # ----------------------------------------------------------------------- + # did.query + # ----------------------------------------------------------------------- + - name: query + type: class + matlab_path: "+did/query.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/query.py" + python_class: "Query" + + properties: + - name: searchstructure + type_matlab: "struct array" + type_python: "(internal)" + decision_log: > + MATLAB exposes searchstructure as a public property. + Python stores it internally; access via to_search_structure(). + Synchronized 2026-03-15. 
+ + methods: + - name: query + kind: constructor + input_arguments: + - name: field + type_matlab: "char" + type_python: "str" + - name: op + type_matlab: "char" + type_python: "str" + - name: param1 + type_matlab: "any" + type_python: "any" + - name: param2 + type_matlab: "any" + type_python: "any" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: and + decision_log: > + MATLAB: A & B. Python: A & B (via __and__). + Exact match. Synchronized 2026-03-15. + + - name: or + decision_log: > + MATLAB: A | B. Python: A | B (via __or__). + Exact match. Synchronized 2026-03-15. + + - name: to_searchstructure + decision_log: > + MATLAB: to_searchstructure(). Python: to_search_structure(). + Snake-case rename. Synchronized 2026-03-15. + + decision_log: > + Query operations are fully synchronized. All MATLAB query + operations (regexp, exact_string, depends_on, isa, etc.) + are supported in Python. Synchronized 2026-03-15. + + # ----------------------------------------------------------------------- + # did.ido + # ----------------------------------------------------------------------- + - name: ido + type: class + matlab_path: "+did/ido.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/ido.py" + python_class: "IDO" + + properties: + - name: identifier + type_matlab: "char" + type_python: "str" + decision_log: "Exact match. Synchronized 2026-03-15." + + methods: + - name: ido + kind: constructor + input_arguments: + - name: id_value + type_matlab: "char (optional)" + type_python: "str (optional)" + decision_log: > + MATLAB generates 33-char hex ID. Python generates UUID4. + Both produce globally unique identifiers with different formats. + Synchronized 2026-03-15. + + - name: id + input_arguments: [] + output_arguments: + - name: identifier + type_python: "str" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: unique_id + kind: static + decision_log: > + MATLAB: 33-char hex ID. Python: UUID4. + Synchronized 2026-03-15. 
+ + - name: isvalid + kind: static + input_arguments: + - name: id_value + type_matlab: "char" + type_python: "str" + output_arguments: + - name: b + type_python: "bool" + decision_log: > + MATLAB: ido.isvalid(id). Python: IDO.is_valid(id). + Snake-case rename. Synchronized 2026-03-15. + + decision_log: > + ID format differs (MATLAB hex vs Python UUID4) but both guarantee + uniqueness. Synchronized 2026-03-15. + + # ----------------------------------------------------------------------- + # did.documentservice (abstract) + # ----------------------------------------------------------------------- + - name: documentservice + type: class + matlab_path: "+did/documentservice.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/documentservice.py" + python_class: "DocumentService" + inherits: "abc.ABC" + + methods: + - name: documentservice + kind: constructor + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: newdocument + kind: abstract + decision_log: > + MATLAB: newdocument(). Python: new_document(). + Snake-case rename. Synchronized 2026-03-15. + + - name: searchquery + kind: abstract + decision_log: > + MATLAB: searchquery(). Python: search_query(). + Snake-case rename. Synchronized 2026-03-15. + + decision_log: "Exact match. Synchronized 2026-03-15." + + # ----------------------------------------------------------------------- + # did.binarydoc (abstract) + # ----------------------------------------------------------------------- + - name: binarydoc + type: class + matlab_path: "+did/binarydoc.m" + matlab_last_sync_hash: "13463d1" + python_path: "did/binarydoc.py" + python_class: "BinaryDoc" + inherits: "abc.ABC" + + methods: + - name: binarydoc + kind: constructor + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: fopen + kind: abstract + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: fseek + kind: abstract + decision_log: "Exact match. Synchronized 2026-03-15." 
+ + - name: ftell + kind: abstract + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: feof + kind: abstract + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: fwrite + kind: abstract + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: fread + kind: abstract + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: fclose + kind: abstract + decision_log: "Exact match. Synchronized 2026-03-15." + + decision_log: "Exact match. Synchronized 2026-03-15." + +# ========================================================================= +# Not applicable in Python +# ========================================================================= +not_applicable: + - name: did.filesep + rationale: > + MATLAB-only utility. Python uses os.sep / pathlib natively. + + - name: did.toolboxdir + rationale: > + MATLAB-only utility for locating the DID toolbox directory. + Python uses importlib.resources or __file__. diff --git a/src/did/did_matlab_python_bridge_file.yaml b/src/did/did_matlab_python_bridge_file.yaml new file mode 100644 index 0000000..aaa6fa7 --- /dev/null +++ b/src/did/did_matlab_python_bridge_file.yaml @@ -0,0 +1,479 @@ +# did_matlab_python_bridge_file.yaml — File I/O classes and utilities +# Contract for did.file namespace. +# +# In MATLAB, the file module is split across multiple files in +did/+file/. +# In Python, these are consolidated into a single did/file.py module. 
+# +# To check for drift, run: +# git -C log ..HEAD -- + +project_metadata: + bridge_version: "1.0" + matlab_repo: "VH-Lab/DID-matlab" + python_repo: "VH-Lab/DID-python" + naming_policy: "Strict MATLAB Mirror — Python names use snake_case equivalents" + indexing_policy: "Semantic Parity (MATLAB 1-based, Python 0-based)" + +# ========================================================================= +# Classes +# ========================================================================= +classes: + + # ----------------------------------------------------------------------- + # did.file.fileobj + # ----------------------------------------------------------------------- + - name: fileobj + type: class + matlab_path: "+did/+file/fileobj.m" + matlab_last_sync_hash: "02c83c5" + python_path: "did/file.py" + python_class: "Fileobj" + inherits_matlab: "handle" + out_of_sync: false + out_of_sync_reason: > + MATLAB made three changes but Python is already correct: + 1. Default permission 'r' -> 'rb': Python Fileobj.fopen() already + appends 'b' if not present (line 88-89 of file.py), so files are + always opened in binary mode regardless of the permission string. + 2. fread default 'char' -> 'uint8': Python fread returns raw bytes, + equivalent to uint8. + 3. fwrite permission check for 'r+': Python relies on the native file + object to reject writes on read-only files. + MATLAB is now aligned with Python's existing behavior. + + properties: + - name: fullpathfilename + type_matlab: "char" + type_python: "str" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: fid + type_matlab: "double (MATLAB file ID)" + type_python: "file object" + decision_log: > + MATLAB uses numeric file ID from fopen(). + Python uses native file object. Semantic equivalent. + Synchronized 2026-03-15. + + - name: permission + type_matlab: "char" + type_python: "str" + decision_log: > + MATLAB default changed: 'r' -> 'rb' (commit 02c83c5). 
+ Python default string is 'r' but fopen() appends 'b' automatically, + so effective mode is always binary. Behaviorally in sync. + + - name: machineformat + type_matlab: "char" + type_python: "str" + decision_log: > + MATLAB: 'n' (native). Python: 'n'. + Used for endianness control. Synchronized 2026-03-15. + + methods: + - name: fileobj + kind: constructor + input_arguments: + - name: fullpathfilename + type_matlab: "char" + type_python: "str" + - name: permission + type_matlab: "char" + type_python: "str" + - name: machineformat + type_matlab: "char" + type_python: "str" + decision_log: > + MATLAB: fileobj(Name=Value pairs). + Python: Fileobj(fullpathfilename, permission, machineformat). + Synchronized 2026-03-15. + + - name: setproperties + decision_log: > + MATLAB: setproperties(Name, Value, ...). + Python: set_properties(fullpathfilename, permission, machineformat). + Synchronized 2026-03-15. + + - name: fopen + input_arguments: + - name: permission + type_matlab: "char" + type_python: "str" + - name: machineformat + type_matlab: "char" + type_python: "str" + - name: filename + type_matlab: "char" + type_python: "str" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: fclose + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: fseek + input_arguments: + - name: offset + type_matlab: "double" + type_python: "int" + - name: reference + type_matlab: "char ('bof'|'cof'|'eof')" + type_python: "int (0|1|2)" + decision_log: > + MATLAB uses string references ('bof','cof','eof'). + Python uses integer constants (0, 1, 2) matching os.SEEK_*. + Synchronized 2026-03-15. + + - name: ftell + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: frewind + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: feof + decision_log: "Exact match. Synchronized 2026-03-15." 
+ + - name: fwrite + input_arguments: + - name: data + type_matlab: "any" + type_python: "bytes" + - name: precision + type_matlab: "char" + type_python: "(not used)" + - name: skip + type_matlab: "double" + type_python: "(not used)" + decision_log: > + MATLAB fwrite() now allows 'r+' permission (commit 02c83c5). + Previous check rejected anything starting with 'r'. + Python fwrite() relies on native file object to reject writes + on read-only files. No Python change needed. + + - name: fread + input_arguments: + - name: count + type_matlab: "double" + type_python: "int" + - name: precision + type_matlab: "char (default: 'uint8', was 'char')" + type_python: "(not used)" + decision_log: > + MATLAB fread() default precision changed: 'char' -> 'uint8' + (commit 02c83c5). Python fread returns raw bytes, which is + equivalent to uint8. No Python change needed. + + - name: fgetl + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: fgets + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: ferror + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: fscanf + decision_log: > + Present in MATLAB. Not ported to Python. + + - name: fprintf + decision_log: > + MATLAB fprintf() permission check updated to check only + first character of permission. Python does not implement + fprintf (use Python print/write). Not applicable. + + - name: fileparts + decision_log: > + MATLAB: [pathstr, name, ext] = fileparts(). + Python: fileparts() -> (str, str). Returns (dir, name). + Synchronized 2026-03-15. + + decision_log: > + MATLAB made three changes (binary mode default, fread precision, + fwrite permission check) in commits 6dd2f23, 2786be2, 02c83c5. + Python Fileobj already opens in binary mode (fopen appends 'b'), + fread returns raw bytes (equiv to uint8), and fwrite relies on + native file object for permission enforcement. Behaviorally in sync. + Updated matlab_last_sync_hash to current. Synchronized 2026-04-13. 
+ + # ----------------------------------------------------------------------- + # did.file.readonly_fileobj + # ----------------------------------------------------------------------- + - name: readonly_fileobj + type: class + matlab_path: "+did/+file/readonly_fileobj.m" + matlab_last_sync_hash: "6dd2f23" + python_path: "did/file.py" + python_class: "ReadOnlyFileobj" + inherits_matlab: "did.file.fileobj" + inherits_python: "Fileobj" + out_of_sync: false + out_of_sync_reason: > + MATLAB changed default from 'r' to 'rb' (commit 6dd2f23). + Python ReadOnlyFileobj stores 'r' as default but the parent + Fileobj.fopen() appends 'b' automatically, so files are + always opened in binary mode. Behaviorally in sync. + + methods: + - name: readonly_fileobj + kind: constructor + input_arguments: + - name: options + type_matlab: "name-value pairs" + type_python: "fullpathfilename, machineformat" + decision_log: > + MATLAB: readonly_fileobj(Name=Value). + Python: ReadOnlyFileobj(fullpathfilename, machineformat). + Synchronized 2026-03-15. + + - name: fopen + input_arguments: + - name: permission + type_matlab: "char (forced to 'rb')" + type_python: "str (forced to 'r')" + decision_log: > + MATLAB now forces 'rb'. Python forces 'r' but parent + fopen() appends 'b', so effective mode is 'rb'. + Behaviorally in sync. + + decision_log: > + MATLAB enforces 'rb' for read-only files. Python stores 'r', + but the parent Fileobj.fopen() appends 'b', so the effective + mode is 'rb'. Behaviorally in sync. No Python change needed. + + # ----------------------------------------------------------------------- + # did.file.binaryTable + # ----------------------------------------------------------------------- + - name: binaryTable + type: class + matlab_path: "+did/+file/binaryTable.m" + matlab_last_sync_hash: "2786be2" + python_path: "did/file.py" + python_class: "BinaryTable" + inherits_matlab: "handle" + out_of_sync: false + out_of_sync_reason: > + MATLAB changed read permissions from 'r' to 'rb' (6 occurrences, + commit 2786be2).
Python BinaryTable uses Fileobj which already + opens in binary mode (fopen appends 'b'). Behaviorally in sync. + + properties: + - name: file + type_matlab: "did.file.fileobj" + type_python: "Fileobj" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: recordType + type_matlab: "cell" + type_python: "str" + decision_log: > + MATLAB: cell array of type strings. + Python: single string. Simplified representation. + Synchronized 2026-03-15. + + - name: recordSize + type_matlab: "uint16 vector" + type_python: "list" + decision_log: "Equivalent. Synchronized 2026-03-15." + + - name: elementsPerColumn + type_matlab: "uint16 vector" + type_python: "int" + decision_log: > + MATLAB: vector (per-column). Python: single int. + Synchronized 2026-03-15. + + - name: headerSize + type_matlab: "uint16" + type_python: "int" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: hasLock + type_matlab: "logical" + type_python: "bool" + decision_log: "Exact match. Synchronized 2026-03-15." + + methods: + - name: binaryTable + kind: constructor + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: getSize + decision_log: > + MATLAB: getSize(). Python: get_size(). + Synchronized 2026-03-15. + + - name: readHeader + decision_log: > + MATLAB: readHeader(). Python: read_header(). + MATLAB now uses 'rb' permission for reading. + Synchronized 2026-03-15. + + - name: writeHeader + decision_log: > + MATLAB: writeHeader(). Python: write_header(). + Synchronized 2026-03-15. + + - name: readRow + decision_log: > + MATLAB: readRow(row, col). Python: read_row(row, col). + MATLAB now uses 'rb' permission for reading. + Synchronized 2026-03-15. + + - name: insertRow + decision_log: > + MATLAB: insertRow(). Python: (partial implementation). + MATLAB now uses 'rb' for read operations during insert. + Synchronized 2026-03-15. + + - name: deleteRow + decision_log: > + MATLAB: deleteRow(). Python: (not fully implemented). + Synchronized 2026-03-15. 
+ + - name: writeEntry + decision_log: > + MATLAB: writeEntry(). Python: (not fully implemented). + Synchronized 2026-03-15. + + - name: writeTable + decision_log: > + MATLAB: writeTable(). Python: (not fully implemented). + Synchronized 2026-03-15. + + - name: findRow + decision_log: > + MATLAB: findRow(). Python: (not fully implemented). + Synchronized 2026-03-15. + + decision_log: > + OUT OF SYNC. Python BinaryTable has basic read functionality but + is missing several write methods (insertRow, deleteRow, writeEntry, + writeTable, findRow). MATLAB also changed all read permissions to + 'rb'. Partial port — needs completion. + +# ========================================================================= +# File utility functions +# ========================================================================= +functions: + + - name: mustBeValidPermission + type: function + matlab_path: "+did/+file/mustBeValidPermission.m" + matlab_last_sync_hash: "2786be2" + python_path: "did/file.py" + python_name: "must_be_valid_permission" + out_of_sync: false + decision_log: > + MATLAB added 'b' (binary) mode variants (commit 2786be2). + Python must_be_valid_permission already accepts binary modes + (rb, wb, ab, r+b, w+b, a+b). In sync. Synchronized 2026-04-13. + + - name: checkout_lock_file + type: function + matlab_path: "+did/+file/checkout_lock_file.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/file.py" + python_name: "checkout_lock_file" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: release_lock_file + type: function + matlab_path: "+did/+file/release_lock_file.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/file.py" + python_name: "release_lock_file" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: fileid_value + type: function + matlab_path: "+did/+file/fileid_value.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/file.py" + python_name: "fileid_value" + decision_log: "Exact match. 
Synchronized 2026-03-15." + + - name: filesepconversion + type: function + matlab_path: "+did/+file/filesepconversion.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/file.py" + python_name: "filesep_conversion" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: isfilepathroot + type: function + matlab_path: "+did/+file/isfilepathroot.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/file.py" + python_name: "is_filepath_root" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: fullfilename + type: function + matlab_path: "+did/+file/fullfilename.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/file.py" + python_name: "full_filename" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: isurl + type: function + matlab_path: "+did/+file/isurl.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/file.py" + python_name: "is_url" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: readlines + type: function + matlab_path: "+did/+file/readlines.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/file.py" + python_name: "read_lines" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: str2text + type: function + matlab_path: "+did/+file/str2text.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/file.py" + python_name: "str_to_text" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: text2cellstr + type: function + matlab_path: "+did/+file/text2cellstr.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/file.py" + python_name: "text_to_cellstr" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: string2filestring + type: function + matlab_path: "+did/+file/string2filestring.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/file.py" + python_name: "string_to_filestring" + decision_log: "Exact match. Synchronized 2026-03-15." 
+ + - name: mustBeValidMachineFormat + type: function + matlab_path: "+did/+file/mustBeValidMachineFormat.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/file.py" + python_name: "must_be_valid_machine_format" + decision_log: "Exact match. Synchronized 2026-03-15." + +# ========================================================================= +# Not applicable in Python +# ========================================================================= +not_applicable: + - name: did.file.fileCache + rationale: > + MATLAB fileCache class has a Python equivalent (FileCache in file.py) + but with simplified implementation. Basic structure matches. + + - name: did.file.dumbjsondb + rationale: > + MATLAB dumbjsondb has a Python equivalent (DumbJsonDB in file.py). + Both provide simple JSON-based document storage. diff --git a/src/did/did_matlab_python_bridge_implementations.yaml b/src/did/did_matlab_python_bridge_implementations.yaml new file mode 100644 index 0000000..2144e56 --- /dev/null +++ b/src/did/did_matlab_python_bridge_implementations.yaml @@ -0,0 +1,283 @@ +# did_matlab_python_bridge_implementations.yaml — Database implementations +# Contract for did.implementations namespace. 
+# +# To check for drift, run: +# git -C log ..HEAD -- + +project_metadata: + bridge_version: "1.0" + matlab_repo: "VH-Lab/DID-matlab" + python_repo: "VH-Lab/DID-python" + naming_policy: "Strict MATLAB Mirror — Python names use snake_case equivalents" + indexing_policy: "Semantic Parity (MATLAB 1-based, Python 0-based)" + +# ========================================================================= +# Classes +# ========================================================================= +classes: + + # ----------------------------------------------------------------------- + # did.implementations.sqlitedb + # ----------------------------------------------------------------------- + - name: sqlitedb + type: class + matlab_path: "+did/+implementations/sqlitedb.m" + matlab_last_sync_hash: "205d34b" + matlab_current_hash: "926c430" + python_path: "did/implementations/sqlitedb.py" + python_class: "SQLiteDB" + inherits_matlab: "did.database" + inherits_python: "Database" + out_of_sync: true + out_of_sync_reason: > + MATLAB replaced websave() with ndi.cloud.api.files.getFile() for + URL file downloads (commit 926c430, 2026-03-30). This is a + MATLAB-specific change (uses NDI cloud API) and does not require + a direct Python port, but Python should have equivalent URL + download reliability. + + properties: + - name: FileDir + type_matlab: "char" + type_python: "(derived from connection path)" + decision_log: > + MATLAB stores FileDir explicitly. Python derives file storage + paths from the database connection string. Synchronized 2026-03-15. + + - name: fields_cache + type_matlab: "cell (protected)" + type_python: "dict (private)" + decision_log: > + Both cache field name-to-index mappings for performance. + Synchronized 2026-03-15. + + methods: + - name: sqlitedb + kind: constructor + input_arguments: + - name: filename + type_matlab: "char" + type_python: "str" + decision_log: > + MATLAB: sqlitedb(filename). Python: SQLiteDB(filename). 
+ Both create/open a SQLite database file. + Synchronized 2026-03-15. + + - name: do_run_sql_query + input_arguments: + - name: query_str + type_matlab: "char" + type_python: "str" + decision_log: > + MATLAB: do_run_sql_query(query_str, varargin). + Python: do_run_sql_query(query_str, params=()). + Python uses parameterized queries. Synchronized 2026-03-15. + + - name: do_add_doc + input_arguments: + - name: document_obj + type_matlab: "did.document" + type_python: "Document" + - name: branch_id + type_matlab: "char" + type_python: "str" + decision_log: > + Both insert document JSON and populate field tables via doc2sql. + Synchronized 2026-03-15. + + - name: do_get_doc + input_arguments: + - name: document_id + type_matlab: "char" + type_python: "str" + output_arguments: + - name: document + type_python: "Document" + decision_log: > + Both retrieve document JSON from docs table and reconstruct + did.document / Document object. Synchronized 2026-03-15. + + - name: do_remove_doc + input_arguments: + - name: document_id + type_matlab: "char" + type_python: "str" + - name: branch_id + type_matlab: "char" + type_python: "str" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: search + input_arguments: + - name: query_obj + type_matlab: "did.query" + type_python: "Query" + - name: branch_id + type_matlab: "char" + type_python: "str" + output_arguments: + - name: doc_ids + type_python: "list[str]" + decision_log: > + Both use SQL-based search with fallback to brute-force for + complex queries. Synchronized 2026-03-15. + + - name: get_docs + decision_log: > + Python adds bulk get_docs with OnMissing parameter. + Synchronized 2026-03-16. + + - name: get_docs_by_branch + decision_log: > + Python-only convenience method. Returns all documents in a + branch. Added 2026-03-16. 
+ + - name: open_doc + input_arguments: + - name: doc_id + type_matlab: "char" + type_python: "str" + - name: filename + type_matlab: "char" + type_python: "str" + output_arguments: + - name: file_obj + type_matlab: "did.file.readonly_fileobj" + type_python: "ReadOnlyFileobj" + decision_log: > + Both return a read-only file object. Synchronized 2026-03-15. + + - name: open_db / close_db + decision_log: > + MATLAB opens/closes mksqlite connection per operation. + Python keeps sqlite3 connection open for session lifetime. + Behavioral difference by design. Synchronized 2026-03-15. + + decision_log: > + Core database operations are synchronized. MATLAB recently changed + URL download mechanism (websave -> ndi.cloud.api.files.getFile) which + is MATLAB-ecosystem-specific. Python should verify its own URL download + path is robust. Last full sync: 2026-03-15. + + # ----------------------------------------------------------------------- + # did.implementations.doc2sql (function module) + # ----------------------------------------------------------------------- + - name: doc2sql + type: function + matlab_path: "+did/+implementations/doc2sql.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/implementations/doc2sql.py" + python_name: "doc_to_sql" + + methods: + - name: doc2sql / doc_to_sql + input_arguments: + - name: doc + type_matlab: "did.document" + type_python: "Document" + output_arguments: + - name: sql_metadata + type_matlab: "struct array" + type_python: "list[dict]" + decision_log: > + Both extract document properties into SQL-compatible meta-tables. + Python handles both DID-python and MATLAB/NDI document formats + for cross-language compatibility. Synchronized 2026-03-25. + + decision_log: > + doc2sql contains the critical MATLAB-Python bridge logic for + serializing documents to SQL. Python version handles both native + and MATLAB document_class formats. Synchronized 2026-03-25. 
+ + # ----------------------------------------------------------------------- + # did.implementations.binarydoc_matfid + # ----------------------------------------------------------------------- + - name: binarydoc_matfid + type: class + matlab_path: "+did/+implementations/binarydoc_matfid.m" + matlab_last_sync_hash: "2786be2" + python_path: "did/implementations/binarydoc_matfid.py" + python_class: "BinaryDocMatfid" + inherits_matlab: "did.binarydoc & did.file.fileobj" + inherits_python: "BinaryDoc, Fileobj" + out_of_sync: false + out_of_sync_reason: > + MATLAB changed default permission from 'r' to 'rb' (commit 2786be2). + Python BinaryDocMatfid inherits Fileobj which already opens in + binary mode (fopen appends 'b'). Behaviorally in sync. + + properties: + - name: key + type_matlab: "any" + type_python: "str" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: doc_unique_id + type_matlab: "any" + type_python: "str" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: machineformat + type_matlab: "char" + type_python: "str" + decision_log: > + MATLAB default: 'l' (little-endian). + Python default: 'l' (little-endian). Exact match. + Synchronized 2026-03-15. + + methods: + - name: binarydoc_matfid + kind: constructor + decision_log: > + MATLAB: binarydoc_matfid(fileProps, matfidProps). + Python: BinaryDocMatfid(key, doc_unique_id, **kwargs). + Synchronized 2026-03-15. + + - name: fclose + decision_log: > + MATLAB resets permission to 'rb' after close (changed from 'r'). + Python closes file handle. Synchronized 2026-03-15. + + decision_log: > + Binary mode change in MATLAB ('r' -> 'rb') aligns MATLAB with + Python's existing behavior. Python already opens binary files + correctly via Fileobj.fopen() binary-mode append. + Synchronized 2026-04-13. 
+ + # ----------------------------------------------------------------------- + # did.implementations.sqldb (abstract intermediate class) + # ----------------------------------------------------------------------- + - name: sqldb + type: class + matlab_path: "+did/+implementations/sqldb.m" + matlab_last_sync_hash: "13463d1" + python_path: "(not separately implemented)" + python_class: "(merged into Database)" + + decision_log: > + MATLAB has an intermediate abstract class sqldb between database + and sqlitedb. Python merges this layer directly into Database + and SQLiteDB. The alldocids() method from sqldb is available + as Database.all_doc_ids() in Python. Synchronized 2026-03-15. + + # ----------------------------------------------------------------------- + # did.implementations.matlabdumbjsondb + # ----------------------------------------------------------------------- + - name: matlabdumbjsondb + type: class + matlab_path: "+did/+implementations/matlabdumbjsondb.m" + python_path: "(not applicable)" + + decision_log: > + MATLAB-specific wrapper around dumbjsondb for MATLAB file I/O. + Python implements DumbJsonDB directly in file.py. + +# ========================================================================= +# Not applicable in Python +# ========================================================================= +not_applicable: + - name: ndi.cloud.api.files.getFile + rationale: > + MATLAB uses NDI cloud API for URL file downloads (added in sqlitedb.m + commit 926c430). Python should use requests or urllib for equivalent + functionality. Not a direct port target. diff --git a/src/did/did_matlab_python_bridge_util.yaml b/src/did/did_matlab_python_bridge_util.yaml new file mode 100644 index 0000000..6279880 --- /dev/null +++ b/src/did/did_matlab_python_bridge_util.yaml @@ -0,0 +1,336 @@ +# did_matlab_python_bridge_util.yaml — Utilities, graph functions, and helpers +# Contract for did.util, did.fun, did.datastructures, did.db, and did.common. 
+# +# To check for drift, run: +# git -C log ..HEAD -- + +project_metadata: + bridge_version: "1.0" + matlab_repo: "VH-Lab/DID-matlab" + python_repo: "VH-Lab/DID-python" + naming_policy: "Strict MATLAB Mirror — Python names use snake_case equivalents" + indexing_policy: "Semantic Parity (MATLAB 1-based, Python 0-based)" + +# ========================================================================= +# did.util — Database summary and comparison utilities +# ========================================================================= +functions: + + # ----------------------------------------------------------------------- + # did.util.databaseSummary + # ----------------------------------------------------------------------- + - name: databaseSummary + type: function + matlab_path: "+did/+util/databaseSummary.m" + matlab_last_sync_hash: "1966b9d" + python_path: "did/util/database_summary.py" + python_name: "database_summary" + + input_arguments: + - name: db + type_matlab: "did.database" + type_python: "Database" + output_arguments: + - name: summary + type_matlab: "struct" + type_python: "dict" + + decision_log: > + Both produce deterministic JSON-serializable summaries of a DID + database including branch names, hierarchy, and per-branch document + lists. Used by symmetry tests to compare MATLAB and Python databases. + Synchronized 2026-03-16. 
+ + # ----------------------------------------------------------------------- + # did.util.compareDatabaseSummary + # ----------------------------------------------------------------------- + - name: compareDatabaseSummary + type: function + matlab_path: "+did/+util/compareDatabaseSummary.m" + matlab_last_sync_hash: "17a798b" + python_path: "did/util/compare_database_summary.py" + python_name: "compare_database_summary" + + input_arguments: + - name: summaryA + type_matlab: "struct | char (filepath) | did.database" + type_python: "dict" + - name: summaryB + type_matlab: "struct | char (filepath) | did.database" + type_python: "dict" + output_arguments: + - name: report + type_matlab: "cell of char" + type_python: "list[str]" + + decision_log: > + MATLAB accepts structs, file paths, or database objects as input + and converts internally. Python currently accepts only dict summaries. + Both compare branch names, hierarchy, document IDs, class names, + demo field values, and depends_on relationships. + Synchronized 2026-03-16. + +# ========================================================================= +# did.fun — Graph and dependency functions +# ========================================================================= + + # ----------------------------------------------------------------------- + # did.fun.docs2graph + # ----------------------------------------------------------------------- + - name: docs2graph + type: function + matlab_path: "+did/+fun/docs2graph.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/fun.py" + python_name: "docs_to_graph" + + input_arguments: + - name: document_objs + type_matlab: "cell of did.document" + type_python: "list[Document]" + output_arguments: + - name: graph + type_matlab: "[G, nodes, mdigraph]" + type_python: "nx.DiGraph" + + decision_log: > + MATLAB returns sparse adjacency matrix + node list + digraph. + Python returns networkx DiGraph directly (more Pythonic). + Synchronized 2026-03-15. 
+ + # ----------------------------------------------------------------------- + # did.fun.findalldependencies + # ----------------------------------------------------------------------- + - name: findalldependencies + type: function + matlab_path: "+did/+fun/findalldependencies.m" + matlab_last_sync_hash: "13463d1" + python_path: "did/fun.py" + python_name: "find_all_dependencies" + + input_arguments: + - name: graph + type_matlab: "did.database (DB)" + type_python: "nx.DiGraph" + - name: doc_ids + type_matlab: "repeating args (visited, docs)" + type_python: "list[str]" + output_arguments: + - name: dependencies + type_python: "list[str]" + + decision_log: > + MATLAB traverses database directly with visited tracking. + Python operates on a pre-built networkx graph. + Different approach, same result. Synchronized 2026-03-15. + + # ----------------------------------------------------------------------- + # did.fun.finddocs_missing_dependencies + # ----------------------------------------------------------------------- + - name: finddocs_missing_dependencies + type: function + matlab_path: "+did/+fun/finddocs_missing_dependencies.m" + matlab_last_sync_hash: "13463d1" + python_path: "did/fun.py" + python_name: "find_docs_missing_dependencies" + + input_arguments: + - name: db + type_matlab: "did.database" + type_python: "Database" + - name: dependency_names + type_matlab: "repeating char args" + type_python: "*str" + output_arguments: + - name: docs + type_python: "list[Document]" + + decision_log: > + Exact match in behavior. Both search for documents with + depends_on references to non-existent documents. + Synchronized 2026-03-15. 
+ + # ----------------------------------------------------------------------- + # did.fun.plotinteractivedocgraph + # ----------------------------------------------------------------------- + - name: plotinteractivedocgraph + type: function + matlab_path: "+did/+fun/plotinteractivedocgraph.m" + matlab_last_sync_hash: "13463d1" + python_path: "did/fun.py" + python_name: "plot_interactive_doc_graph" + + decision_log: > + MATLAB uses MATLAB plotting. Python uses matplotlib/networkx. + Both provide interactive document graph visualization. + Synchronized 2026-03-15. + +# ========================================================================= +# did.datastructures — Data structure utilities +# ========================================================================= + + - name: cell2str + type: function + matlab_path: "+did/+datastructures/cell2str.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/datastructures.py" + python_name: "cell_to_str" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: celloritem + type: function + matlab_path: "+did/+datastructures/celloritem.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/datastructures.py" + python_name: "cell_or_item" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: colvec + type: function + matlab_path: "+did/+datastructures/colvec.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/datastructures.py" + python_name: "col_vec" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: emptystruct + type: function + matlab_path: "+did/+datastructures/emptystruct.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/datastructures.py" + python_name: "empty_struct" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: eqemp + type: function + matlab_path: "+did/+datastructures/eqemp.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/datastructures.py" + python_name: "eq_emp" + decision_log: "Exact match. Synchronized 2026-03-15." 
+ + - name: eqlen + type: function + matlab_path: "+did/+datastructures/eqlen.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/datastructures.py" + python_name: "eq_len" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: eqtot + type: function + matlab_path: "+did/+datastructures/eqtot.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/datastructures.py" + python_name: "eq_tot" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: equnique + type: function + matlab_path: "+did/+datastructures/equnique.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/datastructures.py" + python_name: "eq_unique" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: fieldsearch + type: function + matlab_path: "+did/+datastructures/fieldsearch.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/datastructures.py" + python_name: "field_search" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: findclosest + type: function + matlab_path: "+did/+datastructures/findclosest.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/datastructures.py" + python_name: "find_closest" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: isfullfield + type: function + matlab_path: "+did/+datastructures/isfullfield.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/datastructures.py" + python_name: "is_full_field" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: jsonencodenan + type: function + matlab_path: "+did/+datastructures/jsonencodenan.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/datastructures.py" + python_name: "json_encode_nan" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: sizeeq + type: function + matlab_path: "+did/+datastructures/sizeeq.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/datastructures.py" + python_name: "size_eq" + decision_log: "Exact match. Synchronized 2026-03-15." 
+ + - name: structmerge + type: function + matlab_path: "+did/+datastructures/structmerge.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/datastructures.py" + python_name: "struct_merge" + decision_log: "Exact match. Synchronized 2026-03-15." + +# ========================================================================= +# did.db — Database search utilities +# ========================================================================= + + - name: struct_name_value_search + type: function + matlab_path: "+did/+db/struct_name_value_search.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/db.py" + python_name: "struct_name_value_search" + decision_log: "Exact match. Synchronized 2026-03-15." + + - name: tableCrossJoin + type: function + matlab_path: "+did/+db/tableCrossJoin.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/db.py" + python_name: "table_cross_join" + decision_log: > + MATLAB: tableCrossJoin(t1, t2). + Python: table_cross_join(table1, table2, rename_conflicting_columns). + Python uses pandas DataFrame. Synchronized 2026-03-15. + +# ========================================================================= +# did.common — Path constants and cache +# ========================================================================= + + - name: PathConstants + type: class + matlab_path: "+did/+common/PathConstants.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/common.py" + python_class: "PathConstants" + decision_log: > + Both provide global path configuration for the DID toolkit. + Synchronized 2026-03-15. + + - name: getCache + type: function + matlab_path: "+did/+common/getCache.m" + matlab_last_sync_hash: "3aa892d" + python_path: "did/common.py" + python_name: "get_cache" + decision_log: "Exact match. Synchronized 2026-03-15." 
+ +# ========================================================================= +# Not applicable in Python +# ========================================================================= +not_applicable: + - name: did.datastructures.table_cross_join (duplicate) + rationale: > + Also available in did.db.table_cross_join via pandas. + Both MATLAB locations (datastructures and db) are covered. From 72902bdb45938517e3f2872ace63dfc16ae4f814 Mon Sep 17 00:00:00 2001 From: Claude Date: Mon, 13 Apr 2026 00:44:28 +0000 Subject: [PATCH 2/3] Fix symmetry CI: update MATLAB path from tests/ to tests_symmetry/ DID-matlab moved symmetry tests to tests_symmetry/ in commit b6a3a05, but DID-python's symmetry.yml still referenced DID-matlab/tests. This caused "Unable to find the did.symmetry.makeArtifacts namespace" errors. https://claude.ai/code/session_015g95pQvxDUnTWxHYmTnreG --- .github/workflows/symmetry.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/workflows/symmetry.yml b/.github/workflows/symmetry.yml index ce2dba1..4b0f8ca 100644 --- a/.github/workflows/symmetry.yml +++ b/.github/workflows/symmetry.yml @@ -57,7 +57,7 @@ jobs: with: command: | addpath(genpath("DID-matlab/src")); - addpath(genpath("DID-matlab/tests")); + addpath(genpath("DID-matlab/tests_symmetry")); addpath(genpath("DID-matlab/tools")); import matlab.unittest.TestRunner; import matlab.unittest.TestSuite; @@ -80,7 +80,7 @@ jobs: with: command: | addpath(genpath("DID-matlab/src")); - addpath(genpath("DID-matlab/tests")); + addpath(genpath("DID-matlab/tests_symmetry")); addpath(genpath("DID-matlab/tools")); import matlab.unittest.TestRunner; import matlab.unittest.TestSuite; From c8ba77014c11845b053085c3f96249b32f8fa708 Mon Sep 17 00:00:00 2001 From: Claude Date: Mon, 13 Apr 2026 01:14:52 +0000 Subject: [PATCH 3/3] Fix symmetry CI: add DID-matlab/tests to path for test fixtures The symmetry tests depend on did.test.fixture.PathConstantFixture which lives in DID-matlab/tests/. 
Need both tests/ (for fixtures) and tests_symmetry/ (for symmetry test packages) on the MATLAB path, matching DID-matlab's own test-symmetry.yml workflow. https://claude.ai/code/session_015g95pQvxDUnTWxHYmTnreG --- .github/workflows/symmetry.yml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/.github/workflows/symmetry.yml b/.github/workflows/symmetry.yml index 4b0f8ca..8ec073e 100644 --- a/.github/workflows/symmetry.yml +++ b/.github/workflows/symmetry.yml @@ -57,6 +57,7 @@ jobs: with: command: | addpath(genpath("DID-matlab/src")); + addpath(genpath("DID-matlab/tests")); addpath(genpath("DID-matlab/tests_symmetry")); addpath(genpath("DID-matlab/tools")); import matlab.unittest.TestRunner; @@ -80,6 +81,7 @@ jobs: with: command: | addpath(genpath("DID-matlab/src")); + addpath(genpath("DID-matlab/tests")); addpath(genpath("DID-matlab/tests_symmetry")); addpath(genpath("DID-matlab/tools")); import matlab.unittest.TestRunner;
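
Reviewer note: the drift check that each bridge file header describes (a `git log` from an entry's `matlab_last_sync_hash` to `HEAD`, restricted to one MATLAB file) could be scripted roughly as follows. This is a minimal sketch, not part of the patch: `build_drift_command` and `has_drifted` are hypothetical helper names, the `--oneline` flag is an addition for compact output, and the repository path is whatever local clone of DID-matlab is available.

```python
import subprocess
from typing import List


def build_drift_command(repo_dir: str, last_sync_hash: str, matlab_file: str) -> List[str]:
    """Build the `git log` argv listing commits that touched one file since a sync point.

    Hypothetical helper: repo_dir is a local DID-matlab clone, last_sync_hash is the
    entry's matlab_last_sync_hash, matlab_file is the path relative to the repo root.
    """
    return [
        "git", "-C", repo_dir,
        "log", "--oneline",          # one commit per line; an assumption, not in the bridge header
        f"{last_sync_hash}..HEAD",   # commits reachable from HEAD but not from the sync hash
        "--", matlab_file,           # restrict history to this one file
    ]


def has_drifted(repo_dir: str, last_sync_hash: str, matlab_file: str) -> bool:
    """Return True if git reports at least one commit for the file after last_sync_hash."""
    result = subprocess.run(
        build_drift_command(repo_dir, last_sync_hash, matlab_file),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if git itself fails (bad hash, bad path)
    )
    return bool(result.stdout.strip())
```

With a local clone, `has_drifted("/path/to/DID-matlab", "205d34b", "src/did/+did/+file/fileobj.m")` would mirror the single-file example in PORTING_INSTRUCTIONS.md; any output from git means the MATLAB file changed after the recorded sync point.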