Discussion: content management for the site #368

@jmillanacosta

This follows up on today's discussion at the end of the Scrum and is related to #221 (knowledge graph / DB) and #220 (MCP / LLM access).

The aim is to solve these issues:

  • Non-technical contributors struggling to contribute or keep up with version control
  • Not having a proper database and way to interact with it for programmatic access, agents, generating the pages cleanly, etc.

The current implementation is patchy: data is fetched either server-side or client-side, from several different sources.

Server side:

  • BioStudies API queries.
  • Zenodo API queries.
  • GitHub raw CDN: app.py fetches service_index.json and methods_index.json from VHP4Safety/cloud on every request. Individual tool and method pages then make a second fetch to cloud.vhp4safety.nl/service/<id>.json and VHP4Safety/cloud/docs/methods/<id>.json respectively.

Client side:

  • GitHub raw CDN: case study pages fetch their content from VHP4Safety/ui-casestudy-config via JavaScript at each page load. The Jinja template is an empty shell; nothing is server-rendered.
  • CompoundWiki SPARQL: compound pages call /get_compound_* Flask endpoints from JS, which then query compoundcloud.wikibase.cloud. The page is still rendered client-side.

With @johannehouweling's more recent integrations, we have classes for BioStudies and Zenodo with defined data models. I propose to keep working this way and integrate everything into a schema + database. If we want a clean data model on which to build things like a REST API, an MCP, Knowledge Graphs (e.g. via an RDF export), etc., we need to fix how we reference or expose our data elsewhere.

That's why I'd like to discuss an update to our stack where:

a) We have a database with a defined schema
Define an explicit object model for VHP data: tools, methods, case studies, compounds, datasets. We already have an implicit model for all the content in the platform, and an explicit one for Zenodo and BioStudies, but they are still scattered. Here we can distinguish between platform-managed content (tools, methods, case studies...), which should definitely be stored and owned internally, and external data (Zenodo, BioStudies, compounds), which could be ingested for caching, exports, API access, etc. instead of always being queried from the external sources at runtime.
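To make the idea concrete, here is a minimal sketch of an explicit object model plus schema, using only the standard library (`dataclasses` and `sqlite3`). The `Tool` class, its fields, the `tools` table and the helper names are all hypothetical placeholders, not the actual VHP data model. The `source` column is one possible way to mark a record as platform-managed or ingested from an external source.

```python
import sqlite3
from dataclasses import dataclass, asdict


# Hypothetical content type; the field names are illustrative assumptions.
@dataclass
class Tool:
    id: str
    name: str
    description: str = ""
    source: str = "platform"  # "platform" = owned internally, else an external origin


SCHEMA = """
CREATE TABLE IF NOT EXISTS tools (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    description TEXT NOT NULL DEFAULT '',
    source TEXT NOT NULL DEFAULT 'platform'
)
"""


def init_db(path=":memory:"):
    """Open a connection and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    conn.execute(SCHEMA)
    return conn


def save_tool(conn, tool):
    """Insert the tool, or overwrite an existing row with the same id."""
    conn.execute(
        "INSERT OR REPLACE INTO tools (id, name, description, source) "
        "VALUES (:id, :name, :description, :source)",
        asdict(tool),
    )
    conn.commit()


def get_tool(conn, tool_id):
    """Return the Tool with this id, or None if it is not stored."""
    row = conn.execute("SELECT * FROM tools WHERE id = ?", (tool_id,)).fetchone()
    return Tool(**dict(row)) if row else None
```

The same pattern would repeat for methods, case studies and the external record types, whatever ORM or database we end up choosing.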

b) Routes and templates do what they're supposed to do
Flask routes query the internal data layer and pass structured data to Jinja templates, which render it server-side. We get rid of the JS scripts that fetch JSON from the cloud repo at runtime, and of HTML embedded inside JSON files, and instead keep the cloud data in the database. This also fixes the SEO / accessibility gap caused by client-side rendering.
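A rough sketch of the route/template split described here, assuming Flask and a SQLite-backed data layer. The `/tools/<tool_id>` URL, the `get_db` helper, the table layout and the inline template are assumptions for illustration, not the current app.py.

```python
import sqlite3
from flask import Flask, abort, render_template_string

app = Flask(__name__)

# Inline template for brevity; a real app would keep this under templates/.
TOOL_TEMPLATE = "<h1>{{ tool.name }}</h1><p>{{ tool.description }}</p>"


def get_db():
    # Hypothetical data layer: an in-memory database seeded with one row.
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute("CREATE TABLE tools (id TEXT PRIMARY KEY, name TEXT, description TEXT)")
    conn.execute("INSERT INTO tools VALUES ('example', 'Example tool', 'A placeholder entry.')")
    return conn


@app.route("/tools/<tool_id>")
def tool_page(tool_id):
    # The route only queries the data layer and hands structured data to Jinja.
    row = get_db().execute("SELECT * FROM tools WHERE id = ?", (tool_id,)).fetchone()
    if row is None:
        abort(404)
    return render_template_string(TOOL_TEMPLATE, tool=dict(row))
```

With this shape the HTML arrives fully rendered, so no JavaScript fetch is needed to see the page content.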

c) Non-technical contributors can add data through a UI
An /admin panel where an authorized editor fills in a markdown field or form generated from each content type's object model, and the submission updates the database.
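One way to generate the form from the object model is to walk the dataclass fields, as in the sketch below. The `Method` class, the `form_for` and `from_form` helpers and the `/admin/methods` URL are hypothetical, and authentication/authorization is deliberately left out.

```python
from dataclasses import dataclass, fields
from html import escape


# Hypothetical content type; the field names are illustrative assumptions.
@dataclass
class Method:
    id: str
    name: str
    description: str = ""


def form_for(model, action):
    """Render a plain HTML form with one input per dataclass field."""
    inputs = []
    for f in fields(model):
        name = escape(f.name)
        # Free-text description gets a textarea, everything else a text input.
        if f.name == "description":
            widget = f'<textarea name="{name}"></textarea>'
        else:
            widget = f'<input type="text" name="{name}" required>'
        inputs.append(f"<label>{name} {widget}</label>")
    body = "\n".join(inputs)
    return (
        f'<form method="post" action="{escape(action)}">\n'
        f"{body}\n<button>Save</button>\n</form>"
    )


def from_form(model, data):
    """Build a model instance from submitted form data, ignoring unknown keys."""
    known = {f.name for f in fields(model)}
    return model(**{k: v for k, v in data.items() if k in known})
```

For example, `form_for(Method, "/admin/methods")` yields the form markup, and `from_form(Method, submitted)` turns the posted values back into a `Method` that can be saved.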

d) A cleaner Flask API that exposes all the VHP data
Once the data model and storage are in place, a consistent API on top of them becomes more straightforward: we remove most of the logic from the Flask app and delegate it to each object model's methods. A REST API could also be built on top of the clean data model + database.
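A small sketch of what "delegating to the object model's methods" could mean in practice. The `CaseStudy` class, its in-memory registry (a stand-in for the database) and the `api_response` helper are assumptions; a Flask view would only wrap the returned status and body.

```python
import json
from dataclasses import dataclass, asdict
from typing import ClassVar


# Hypothetical content type; a real one would be backed by the database.
@dataclass
class CaseStudy:
    id: str
    title: str
    # Stand-in for storage: an in-memory registry keyed by id.
    _records: ClassVar[dict] = {}

    def save(self):
        self._records[self.id] = self
        return self

    @classmethod
    def get(cls, record_id):
        return cls._records.get(record_id)

    @classmethod
    def all(cls):
        return list(cls._records.values())

    def to_dict(self):
        return asdict(self)


def api_response(model, record_id=None):
    """Return (status, JSON body) for a list or detail request."""
    if record_id is None:
        return 200, json.dumps([r.to_dict() for r in model.all()])
    record = model.get(record_id)
    if record is None:
        return 404, json.dumps({"error": "not found"})
    return 200, json.dumps(record.to_dict())
```

Because `api_response` only relies on `get`, `all` and `to_dict`, the same function would serve every content type that implements those methods.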

e) MCP, agents, whatever is best nowadays
Build the adequate service layer on top of the properly structured data model and its database methods. @senseibelbi
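As a sketch of such a service layer, the function below is deliberately independent of any MCP or agent SDK: it takes and returns plain JSON-serializable data, so it could later be registered as a tool in whichever framework we pick. The name `search_tools` and the record layout are assumptions.

```python
def search_tools(records, query, limit=5):
    """Case-insensitive substring search over name and description.

    `records` is a list of dicts with at least "id" and "name" keys;
    the result is a list of {"id", "name"} dicts, at most `limit` long.
    """
    needle = query.strip().lower()
    if not needle:
        return []
    hits = [
        {"id": r["id"], "name": r["name"]}
        for r in records
        if needle in r["name"].lower() or needle in r.get("description", "").lower()
    ]
    return hits[:limit]
```

In a real setup `records` would come from the database methods rather than being passed in as a list.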

Questions

  • What's the right scope for the data model? Platform-managed content only (tools, methods, case studies), or do we also cache/index external data (BioStudies, Zenodo, CompoundWiki)?
  • Once the DB and admin UI exist, do we need to restructure/refactor cloud and ui-casestudy-config?

Labels: enhancement, help wanted, question