Calling cbrkit.retrieval.apply_queries(...) returns a Result object that contains the results for each query (in result.queries).
For each query, the result stores a QueryResultStep that holds not only the similarities but also the entire case base. As a consequence, the same case base is stored once per query. When the results are dumped with a dumper, this repetition produces large files.
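To illustrate how the size grows, here is a minimal sketch that uses hypothetical stand-in dataclasses (not cbrkit's actual Result and QueryResultStep definitions, which have more fields) and json as a stand-in for a dumper. Because every step embeds the full case base, the serialized size grows roughly linearly with the number of queries.

```python
import json
from dataclasses import asdict, dataclass


# Hypothetical stand-ins, NOT the real cbrkit classes.
@dataclass
class QueryResultStep:
    similarities: dict[str, float]
    casebase: dict[str, str]  # full case base copied into every step


@dataclass
class Result:
    queries: dict[str, QueryResultStep]


def build_result(casebase: dict[str, str], n_queries: int) -> Result:
    # Every query step references the same case base, but serialization
    # writes it out once per step.
    queries = {
        f"q{i}": QueryResultStep(
            similarities={key: 0.5 for key in casebase},
            casebase=casebase,
        )
        for i in range(n_queries)
    }
    return Result(queries=queries)


def dumped_size(result: Result) -> int:
    # json.dumps stands in for a dumper; returns the number of characters.
    return len(json.dumps(asdict(result)))


casebase = {f"case{i}": "x" * 100 for i in range(50)}
for n in (1, 10, 100):
    print(n, dumped_size(build_result(casebase, n)))
```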
Potential solutions
- Store the case base only once, outside the QueryResultStep. If a single result can contain several case bases (e.g., due to apply_batches), store a Mapping of case bases and turn the casebase attribute of QueryResultStep into a reference (e.g., a key) to the matching entry in that mapping.
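The proposed layout could look roughly like the following sketch. The classes are again hypothetical stand-ins rather than cbrkit's real types: QueryResultStep.casebase holds a string key into Result.casebases, and build_result and casebase_of are illustrative helpers. Case bases are deduplicated by object identity here, which assumes that the same case base object is reused across queries or batches.

```python
import json
from collections.abc import Mapping
from dataclasses import asdict, dataclass


# Hypothetical stand-ins, NOT the real cbrkit classes.
@dataclass
class QueryResultStep:
    similarities: dict[str, float]
    casebase: str  # key into Result.casebases instead of a full copy


@dataclass
class Result:
    casebases: dict[str, dict[str, str]]  # each case base stored once
    queries: dict[str, QueryResultStep]

    def casebase_of(self, query_key: str) -> Mapping[str, str]:
        # Resolve the key stored in the step back to the shared case base.
        return self.casebases[self.queries[query_key].casebase]


def build_result(
    batches: Mapping[str, tuple[dict[str, str], dict[str, float]]],
) -> Result:
    casebases: dict[str, dict[str, str]] = {}
    keys_by_id: dict[int, str] = {}  # assumption: dedupe by object identity
    queries: dict[str, QueryResultStep] = {}
    for query_key, (casebase, similarities) in batches.items():
        cb_key = keys_by_id.get(id(casebase))
        if cb_key is None:
            # First time this case base is seen: register it under a new key.
            cb_key = f"casebase{len(casebases)}"
            keys_by_id[id(casebase)] = cb_key
            casebases[cb_key] = casebase
        queries[query_key] = QueryResultStep(similarities, cb_key)
    return Result(casebases=casebases, queries=queries)


def make_batch(casebase: dict[str, str]) -> tuple[dict[str, str], dict[str, float]]:
    return casebase, {key: 0.5 for key in casebase}


cb_a = {f"a{i}": "x" * 100 for i in range(50)}
cb_b = {f"b{i}": "y" * 100 for i in range(50)}
batches = {f"q{i}": make_batch(cb_a if i % 2 == 0 else cb_b) for i in range(100)}

result = build_result(batches)
print(len(result.casebases))  # 2
print(len(json.dumps(asdict(result))))
```

With 100 queries spread over two case bases, only two case base copies end up in the dump, regardless of the number of queries. Keying by identity avoids comparing large case bases for equality; a content-based key (e.g., a hash) would be an alternative if equal case bases arrive as distinct objects.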