@caviri (Member) commented Nov 12, 2025

Note

Add JSON→JSON-LD conversion and a full agents-based enrichment pipeline with supporting data models, API/analysis modules, utilities, tooling, and docs.

  • Core conversion and CLI:
    • Add scripts/convert_json_jsonld.py and docs for JSON→JSON-LD conversion, plus Tentris upload scripts (scripts/upload_all_to_tentris.sh, scripts/test_tentris_upload.sh).
  • Agents and enrichment:
    • Introduce enrichment agents for academic catalog, organizations, users, and linked entities (src/agents/**), including structured outputs, classifiers, context compilers, prompts, and validation.
    • Add EPFL-specific assessment/checkers and URL validation utilities for agent workflows.
  • Data models:
    • Implement Pydantic models and JSON-LD mapping across entities (src/data_models/**) with conversion utilities.
  • API and analysis:
    • Add src/api.py and analysis modules for users, orgs, repositories (src/analysis/**).
    • Add context providers for sources (src/context/**).
  • LLM and config:
    • Provide src/llm/model_config.py and agent management setup.
  • Utilities:
    • Add enhanced logging, token counting, URL utils, and general helpers (src/utils/**).
  • Tooling and docs:
    • Add devcontainer config, justfile, pyproject.toml, and extensive docs under docs/ and .cursor/rules/.
  • Tests:
    • Add caching test tests/test_cache.py.
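As a rough illustration of what the JSON→JSON-LD step involves, the sketch below wraps a plain JSON record in a single JSON-LD node with an `@context`, an `@id` taken from the base URL, and an `@type`. The `CONTEXT` mapping, the `to_jsonld` function and the schema.org terms are assumptions for illustration only; the actual mapping in `scripts/convert_json_jsonld.py` and `src/data_models/**` is not shown in this summary.

```python
import json
from typing import Any

# Hypothetical minimal context; the real term mapping in src/data_models/** is not shown here.
CONTEXT: dict[str, Any] = {
    "schema": "https://schema.org/",
    "name": "schema:name",
    "description": "schema:description",
}


def to_jsonld(record: dict[str, Any], base_url: str, rdf_type: str) -> dict[str, Any]:
    """Wrap a plain JSON record as a single JSON-LD node (hypothetical helper)."""
    node: dict[str, Any] = {"@context": CONTEXT, "@id": base_url, "@type": rdf_type}
    # Copy only keys that have a term definition in CONTEXT; others are skipped.
    for key, value in record.items():
        if key in CONTEXT:
            node[key] = value
    return node


if __name__ == "__main__":
    doc = to_jsonld(
        {"name": "sdsc-ordes", "stars": 3},
        base_url="https://github.com/sdsc-ordes",
        rdf_type="schema:Organization",
    )
    print(json.dumps(doc, indent=2))
```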

Written by Cursor Bugbot for commit 0df57dc. This will update automatically on new commits.

@cursor (bot) left a comment

This PR is being reviewed by Cursor Bugbot

@caviri (Member, Author) commented Nov 12, 2025

Hi @rmfranken, I can see the conversion for repositories. What about the conversion for users and organizations?

@rmfranken

Here are the conversion commands for each entity type; they are nearly identical:
A person:
python scripts/convert_json_jsonld.py to-jsonld MalloryWittwer.json output.jsonld --base-url https://github.com/MalloryWittwer

An organization:
python scripts/convert_json_jsonld.py to-jsonld sdsc-ordes.json sdsc-ordes.jsonld --base-url https://github.com/sdsc-ordes

A software repository:
python scripts/convert_json_jsonld.py to-jsonld DeepLabCutDeepLabCut.json DeepLabCutDeepLabCut.jsonld --base-url https://github.com/DeepLabCut/DeepLabCut

The script auto-detects the kind of entity it is dealing with and processes it accordingly. Passing --base-url explicitly may be overly cautious, but we can see how easily it can be injected into the call we build.
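The three commands above differ only in the file names and the GitHub URL passed to --base-url. As a sketch of injecting that URL into a call built in code, the hypothetical helper `build_convert_command` derives the base URL from a GitHub slug (`owner` for a person or organization, `owner/repo` for a repository) and returns an argument list suitable for `subprocess.run`. Using `sys.executable` instead of a bare `python` is an assumption about how the script would be launched; only the `to-jsonld` subcommand and the `--base-url` flag come from the commands above.

```python
import shlex
import sys


def build_convert_command(input_path: str, output_path: str, github_slug: str) -> list[str]:
    """Build a to-jsonld invocation for a GitHub slug (hypothetical helper).

    github_slug is "owner" for a person or organization and "owner/repo"
    for a software repository; the --base-url value is derived from it.
    """
    base_url = f"https://github.com/{github_slug.strip('/')}"
    return [
        sys.executable,
        "scripts/convert_json_jsonld.py",
        "to-jsonld",
        input_path,
        output_path,
        "--base-url",
        base_url,
    ]


if __name__ == "__main__":
    cmd = build_convert_command("sdsc-ordes.json", "sdsc-ordes.jsonld", "sdsc-ordes")
    # Shell-safe rendering for logging; pass `cmd` to subprocess.run(cmd, check=True) to execute.
    print(shlex.join(cmd))
```

Passing a list rather than one shell string avoids quoting problems if a slug or path ever contains spaces or shell metacharacters.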

@rmfranken

I think I want to mint hashed URIs instead of using blank nodes. Tentris is not handling blank nodes well; I'm not sure why yet and will investigate tomorrow.
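One common way to replace blank nodes with hashed URIs is skolemization: give every nested node that lacks an `@id` a deterministic IRI built from a hash of its content under the entity's base URL. The sketch below is an assumption about how this could be done, not the approach the PR adopts; `hashed_iri`, `skolemize`, the SHA-256 choice and the `#`-fragment IRI layout are all hypothetical.

```python
import hashlib
import json
from typing import Any

# Objects with these keys are literals or collections in JSON-LD, not nodes, so they get no @id.
NON_NODE_KEYS = ("@value", "@list", "@set")


def hashed_iri(base_url: str, node: dict[str, Any]) -> str:
    """Mint a deterministic IRI from a canonical serialization of the node."""
    canonical = json.dumps(node, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"{base_url.rstrip('/')}#{digest}"


def skolemize(node: dict[str, Any], base_url: str) -> dict[str, Any]:
    """Recursively give each nested node without an @id a hashed @id."""
    out: dict[str, Any] = {}
    for key, value in node.items():
        if key == "@context":
            out[key] = value  # leave the context untouched
        elif isinstance(value, dict):
            out[key] = skolemize(value, base_url)
        elif isinstance(value, list):
            out[key] = [skolemize(v, base_url) if isinstance(v, dict) else v for v in value]
        else:
            out[key] = value
    if "@id" not in out and not any(k in out for k in NON_NODE_KEYS):
        out["@id"] = hashed_iri(base_url, out)
    return out
```

Because the hash depends only on content, two structurally identical anonymous nodes collapse into one resource. That helps deduplication across repeated uploads, but if such nodes must stay distinct, the parent IRI or property path would need to be mixed into the hash.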

@cursor (bot) left a comment

    org.linkedEntities = enrichment_data.organization_relations[org.legalName]
else:
    org.academicCatalogRelations = []
    org.linkedEntities = []

Bug: Data Identity Crisis: Authors Become Organizations

The organization-level enrichment loop iterates over self.data.author instead of the correct organizations field. This causes the code to process authors as if they were organizations, attempting to access org.legalName on Person objects which don't have that attribute. The loop should iterate over self.data.relatedToOrganizations instead.
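A minimal sketch of the fix Bugbot suggests, iterating over `relatedToOrganizations` instead of `author`. The dataclasses are hypothetical stand-ins for the real Pydantic models in `src/data_models/**` (only the fields the loop touches), the function name `apply_org_enrichment` is invented, and the `if` condition is an assumption, since the review snippet shows only the tail of the branch.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Person:  # hypothetical stand-in; note it has no legalName
    name: str


@dataclass
class Organization:  # hypothetical stand-in with only the fields used below
    legalName: str
    academicCatalogRelations: list[Any] = field(default_factory=list)
    linkedEntities: list[Any] = field(default_factory=list)


@dataclass
class RepositoryData:  # hypothetical shape of self.data
    author: list[Person]
    relatedToOrganizations: list[Organization]


def apply_org_enrichment(data: RepositoryData, organization_relations: dict[str, list[Any]]) -> None:
    # Loop over organizations, not data.author: Person objects have no legalName.
    for org in data.relatedToOrganizations:
        if org.legalName in organization_relations:  # assumed condition
            org.linkedEntities = organization_relations[org.legalName]
        else:
            org.academicCatalogRelations = []
            org.linkedEntities = []
```

Running the same body over `data.author` would raise `AttributeError` on the first `Person`, which is the failure the review describes.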

    org.linkedEntities = enrichment_data.organization_relations[org.legalName]
else:
    org.academicCatalogRelations = []
    org.linkedEntities = []

Bug: Enrichment Loop Confuses Authors and Organizations

The organization-level enrichment loop iterates over self.data.author instead of the correct organizations field. This causes the code to process authors as if they were organizations, attempting to access org.legalName on Person objects which don't have that attribute. The loop should iterate over self.data.relatedToOrganizations instead.
