From 814b1e246b2f2ef65673c7f5ce3058d344ecd591 Mon Sep 17 00:00:00 2001
From: Matthias Gille Levenson
Date: Wed, 17 Sep 2025 18:18:55 +0200
Subject: [PATCH] Add info on input data

---
 README.md | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/README.md b/README.md
index 2a5ef74..a10ad8c 100644
--- a/README.md
+++ b/README.md
@@ -17,6 +17,22 @@ Two main functions are implemented:
 - match, for checking if some pattern exists in a corpus (stops at first match). Returns a boolean
 - findall, for finding the position of all matching tokens. Returns a list of tuples, with start and end position.
 
+The corpus should take the form of a list of dictionaries:
+
+```json
+[
+  {"word": "Da",
+   "lemma": "dar",
+   "pos": "VERB",
+   "morph": "Mood=Imp|Number=Sing|Person=2|Polite=Infm|VerbForm=Fin"},
+  {"word": "paz",
+   "lemma": "paz",
+   "pos": "NOUN",
+   "morph": "Gender=Masc|Number=Sing"}
+]
+```
+
+
 ```python
 import sys
 import corpus_query_language as CQL