Commit 1fbfeb0

CU-8699nk284 Update MedCAT service to v2 (#27)
* CU-8699nk284: Update requirements to v2
* CU-8699nk284: Add DeID requirement
* CU-8699nk284: Update code to be in line with v2
* CU-8699nk284: Update README with v2 link
* CU-8699nk284: Update README with v2 link (2nd place)
* CU-8699nk284: Fix examples tutorial link
* CU-8699nk284: Update model card stuff for v2 compatibility
* CU-8699nk284: Avoid running docker hub push on pull requets
* CU-8699nk284: Fix typo in import path
* CU-8699nk284: Fix further typo
* CU-8699nk284: Fix access path for MetaCAT config categery name
* CU-8699nk284: Update to latest medcat v2 release
* CU-8699nk284: Bump supported version to latest (to fix legacy CDB conversion)
* U-8699nk284: Fix some config access
* U-8699nk284: Fix some more config access
* CU-8699nk284: Bump dependency to latest
* CU-8699nk284: Update to latest v2 version
* CU-8699nk284: Bump to latest medcat version again (this time finally, I hope)
* CU-8699nk284: Fix config access
* CU-8699nk284: Mock CDB load during test to use en_core_web_md instead of en_core_sci_lg
* CU-8699nk284: [TEMP] Add debug output
* CU-8699nk284: Add client test within mocked / changed spacy model context manager
* CU-8699nk284: [TEMP] Add debug (more) output
* CU-8699nk284: Mock a different method and more generally during testing
* CU-8699nk284: Fix import during testing
* CU-8699nk284: Remove (most of) the debug output
* CU-8699nk284: Bump requirements to latest
* CU-8699nk284: Update ultiprocessing method to v2
* CU-8699nk284: Fix keyword argument name
* CU-8699nk284: Update multiprocessing
* CU-8699nk284: Update to latest MedCAT version
* CU-8699nk284: Update to latest requirements
* CU-8699nk284: Use newer python in workflow
* CU-8699nk284: Bump requirements to latest
* CU-8699nk284: Bump requirements to latest
* Revert "CU-8699nk284: Avoid running docker hub push on pull requets" This reverts commit 0f9cfc2.
* CU-8699nk284: Fix Dockerfile for GitHub URL-based installs
* CU-8699nk284: Fix hash attribute path
1 parent ed95e13 commit 1fbfeb0

6 files changed: 39 additions and 28 deletions

.github/workflows/medcat-service_run-tests.yml

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ jobs:
       - name: Install Python 3
         uses: actions/setup-python@v5
         with:
-          python-version: 3.9
+          python-version: 3.11
           cache: 'pip' # caching pip dependencies

       - name: Install dependencies

medcat-service/Dockerfile

Lines changed: 3 additions & 0 deletions
@@ -6,6 +6,9 @@ ENV CRYPTOGRAPHY_DONT_BUILD_RUST=1
 WORKDIR /cat
 COPY ./requirements.txt /cat

+# NOTE: need git for URL based installs
+RUN apt-get update && apt-get install -y git
+
 # Install Python dependencies
 ARG USE_CPU_TORCH=true
 # NOTE: Allow building without GPU so as to lower image size (GPU is disabled by default)

medcat-service/README.md

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 # Introduction

-This project implements the [MedCAT](https://github.com/CogStack/MedCAT/) NLP application as a service behind a REST API. The general idea is to be able send the text to MedCAT NLP service and receive back the annotations. The REST API is built using [Flask](https://flask.palletsprojects.com/).
+This project implements the [MedCAT](https://github.com/CogStack/cogstack-nlp/blob/main/medcat-v2/) NLP application as a service behind a REST API. The general idea is to be able send the text to MedCAT NLP service and receive back the annotations. The REST API is built using [Flask](https://flask.palletsprojects.com/).

 Git Branches:
 - devel: development branch, latest updates and features, might be unstable.
@@ -327,4 +327,4 @@ The main settings that can be used to improve the performance when querying larg
 ## MedCAT library
 MedCAT parameters are defined in selected `envs/env_medcat*` file.

-For details on available MedCAT parameters please refer to [the official GitHub repository](https://github.com/CogStack/MedCAT/).
+For details on available MedCAT parameters please refer to [the official GitHub repository](https://github.com/CogStack/cogstack-nlp/blob/main/medcat-v2/).
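
The README text in this diff describes sending text to the service over REST and getting annotations back. As a rough illustration only, the sketch below builds such a request with the standard library. The base URL, the `/api/process` path and the `{"content": {"text": ...}}` payload shape are assumptions made for this example and are not shown in this commit; `build_process_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_process_request(
    text: str, base_url: str = "http://localhost:5000"
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a single document."""
    # ASSUMPTION: endpoint path and payload shape are illustrative, not taken from this diff.
    payload = {"content": {"text": text}}
    return urllib.request.Request(
        url=f"{base_url}/api/process",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_process_request("Patient has kidney failure.")

# To send it against a running service, something like this should work:
#   with urllib.request.urlopen(req) as resp:
#       annotations = json.load(resp)
```

Building the request separately from sending it keeps the example runnable without a live service.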

medcat-service/medcat_service/nlp_processor/medcat_processor.py

Lines changed: 30 additions & 22 deletions
@@ -10,8 +10,9 @@
 from medcat.cat import CAT
 from medcat.cdb import CDB
 from medcat.config import Config
-from medcat.meta_cat import MetaCAT
-from medcat.utils.ner.deid import DeIdModel
+from medcat.config.config_meta_cat import ConfigMetaCAT
+from medcat.components.addons.meta_cat import MetaCATAddon
+from medcat.components.ner.trf.deid import DeIdModel
 from medcat.vocab import Vocab


@@ -188,7 +189,7 @@ def process_content_bulk(self, content):
         # use generators both to provide input documents and to provide resulting annotations
         # to avoid too many mem-copies
         invalid_doc_ids = []
-        ann_res = []
+        ann_res = {}

         start_time_ns = time.time_ns()

@@ -197,11 +198,14 @@ def process_content_bulk(self, content):
                 ann_res = self.cat.deid_multi_texts(MedCatProcessor._generate_input_doc(content, invalid_doc_ids),
                                                     redact=self.DEID_REDACT)
             else:
-                ann_res = self.cat.multiprocessing_batch_char_size(
-                    MedCatProcessor._generate_input_doc(content, invalid_doc_ids), nproc=self.bulk_nproc)
-
+                text_input = MedCatProcessor._generate_input_doc(content, invalid_doc_ids)
+                ann_res = {
+                    ann_id: res for ann_id, res in
+                    self.cat.get_entities_multi_texts(
+                        text_input, n_process=self.bulk_nproc)
+                }
         except Exception as e:
-            self.log.error(repr(e))
+            self.log.error("Unable to process data", exc_info=e)

         additional_info = {"elapsed_time": str((time.time_ns() - start_time_ns) / 10e8)}
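
The hunk above collects `(id, result)` pairs from a generator into a dict and reports the elapsed time in seconds. A minimal sketch of that pattern follows, with no MedCAT dependency. `fake_get_entities_multi_texts`, `generate_input_docs` and `process_bulk` are hypothetical stand-ins written for this example, and the rule for marking a document invalid (empty text) is an assumption rather than the service's actual behaviour.

```python
import time
from typing import Any, Dict, Iterable, Iterator, List, Tuple


def fake_get_entities_multi_texts(
    docs: Iterable[Tuple[str, str]],
) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Hypothetical stand-in for the annotator: lazily yields (doc_id, result)."""
    for doc_id, text in docs:
        yield doc_id, {"entities": {}, "text_length": len(text)}


def generate_input_docs(
    content: List[Dict[str, str]], invalid_doc_ids: List[int]
) -> Iterator[Tuple[str, str]]:
    """Yield (id, text) pairs and record the index of documents without text."""
    for idx, doc in enumerate(content):
        text = doc.get("text", "")
        if not text:
            # ASSUMPTION: empty text is what makes a document invalid here.
            invalid_doc_ids.append(idx)
            continue
        yield str(idx), text


def process_bulk(
    content: List[Dict[str, str]],
) -> Tuple[Dict[str, Dict[str, Any]], List[int], float]:
    invalid_doc_ids: List[int] = []
    start_ns = time.time_ns()
    # Consuming the generator into a dict keys every result by its document id.
    results = dict(
        fake_get_entities_multi_texts(generate_input_docs(content, invalid_doc_ids))
    )
    # Nanoseconds to seconds; 1e9 is the same value as the 10e8 used in the diff.
    elapsed_s = (time.time_ns() - start_ns) / 1e9
    return results, invalid_doc_ids, elapsed_s


results, invalid, elapsed = process_bulk(
    [{"text": "kidney failure"}, {"text": ""}, {"text": "no findings"}]
)
```

Keying results by document id makes it straightforward to match annotations back to inputs when some documents are skipped.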

@@ -239,11 +243,12 @@ def _populate_model_card_info(self, config: Config):
         Args:
             config (Config): MedCAT configuration object.
         """
-        self.model_card_info["ontologies"] = config.version.ontology \
-            if (isinstance(config.version.ontology, list)) else str(config.version.ontology)
-        self.model_card_info["meta_cat_model_names"] = [i["Category Name"] for i in config.version.meta_cats] \
-            if (isinstance(config.version.meta_cats, list)) else str(config.version.meta_cats)
-        self.model_card_info["model_last_modified_on"] = str(config.version.last_modified)
+        self.model_card_info["ontologies"] = config.meta.ontology \
+            if (isinstance(config.meta.ontology, list)) else str(config.meta.ontology)
+        self.model_card_info["meta_cat_model_names"] = [
+            cnf.general.category_name for cnf in config.components.addons
+            if (isinstance(cnf, ConfigMetaCAT))]
+        self.model_card_info["model_last_modified_on"] = str(config.meta.last_saved)

     # helper MedCAT methods
     #
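
The model card change above picks MetaCAT category names out of a mixed list of addon configs by type. The sketch below shows the same filtering idea on plain dataclasses. `GeneralSketch`, `MetaCATConfigSketch` and `OtherAddonConfigSketch` are hypothetical stand-ins for this example and are not MedCAT classes; only the attribute path `general.category_name` mirrors the diff.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GeneralSketch:
    category_name: str


@dataclass
class MetaCATConfigSketch:
    """Hypothetical stand-in for a MetaCAT addon config."""
    general: GeneralSketch


@dataclass
class OtherAddonConfigSketch:
    """Hypothetical stand-in for any non-MetaCAT addon config."""
    name: str = "other"


def meta_cat_model_names(addon_configs: List[object]) -> List[str]:
    # Keep only MetaCAT-style configs, then read each category name.
    return [
        cnf.general.category_name
        for cnf in addon_configs
        if isinstance(cnf, MetaCATConfigSketch)
    ]


names = meta_cat_model_names(
    [
        MetaCATConfigSketch(GeneralSketch("Presence")),
        OtherAddonConfigSketch(),
        MetaCATConfigSketch(GeneralSketch("Temporality")),
    ]
)
```

Unlike the v1 code path, this always yields a list, which is empty when no MetaCAT addon is configured.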
@@ -281,7 +286,7 @@ def _create_cat(self):
                 cat.cdb.filter_by_cui(cuis_to_keep)

             if self.app_model.lower() in ["", "unknown", "medmen"]:
-                self.app_model = cat.config.version.id
+                self.app_model = cat.config.meta.hash

             self._populate_model_card_info(cat.config)

@@ -305,13 +310,13 @@ def _create_cat(self):
         spacy_model = os.getenv("SPACY_MODEL", "")

         if spacy_model != "":
-            cdb.config.general["spacy_model"] = spacy_model
+            cdb.config.general.nlp.modelname = spacy_model
         else:
             logging.warning("SPACY_MODEL environment var not set" +
                             ", attempting to load the spacy model found within the CDB : "
-                            + cdb.config.general["spacy_model"])
+                            + cdb.config.general.nlp.modelname)

-            if cdb.config.general["spacy_model"] == "":
+            if cdb.config.general.nlp.modelname == "":
                 raise ValueError("No SPACY_MODEL env var declared, the CDB loaded does not have a\
                     spacy_model set in the config variable! \
                 To solve this declare the SPACY_MODEL in the env_medcat file.")
@@ -330,18 +335,21 @@ def _create_cat(self):
         if os.getenv("APP_MODEL_META_PATH_LIST", None) is not None:
             self.log.debug("Loading META annotations ...")
             for model_path in os.getenv("APP_MODEL_META_PATH_LIST").split(":"):
-                m = MetaCAT.load(model_path)
+                m = MetaCATAddon.deserialise_from(model_path)
                 meta_models.append(m)

-        if cat:
-            meta_models.extend(cat._meta_cats)
+        # if cat:
+        # meta_models.extend(cat._meta_cats)

         if self.app_model.lower() in [None, "unknown"]:
-            self.app_model = cdb.config.version.id
+            self.app_model = cdb.config.meta.hash

-        config.general["log_level"] = os.getenv("LOG_LEVEL", logging.INFO)
+        config.general.log_level = os.getenv("LOG_LEVEL", logging.INFO)

-        cat = CAT(cdb=cdb, config=config, vocab=vocab, meta_cats=meta_models)
+        cat = CAT(cdb=cdb, config=config, vocab=vocab)
+        # add MetaCATs
+        for mc in meta_models:
+            cat.add_addon(mc)

         self._populate_model_card_info(cat.config)

medcat-service/models/examples/examples.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@

 ## [example-medcat-v1-model-pack][(models/examples/example-medcat-v1-model-pack.zip)
 - This model pack is built by running the MedCAT V1 Tutorial Part 3.1.
-  - https://github.com/CogStack/MedCATtutorials/blob/5a07e4d77da404631cc16b47d3f1c6bd028de396/notebooks/introductory/Part_3_1_Building_a_Concept_Database_and_Vocabulary.ipynb
+  - https://github.com/CogStack/cogstack-nlp/blob/main/medcat-v1-tutorials/notebooks/introductory/Part_3_1_Building_a_Concept_Database_and_Vocabulary.ipynb

 It isn't a trained model, but has the concepts "Kidney Failure" and "Failure of Kidneys" built in

medcat-service/requirements.txt

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@ setuptools==78.1.1
 simplejson==3.19.3
 werkzeug==3.1.3
 setuptools-rust==1.11.0
-medcat==1.16.0
+medcat[meta-cat,spacy,deid] @ git+https://github.com/CogStack/cogstack-nlp.git@refs/tags/medcat/v0.13.5#subdirectory=medcat-v2
 # pinned because of issues with de-id models and past models (it will not do any de-id)
 transformers>=4.34.0,<5.0.0
-requests==2.32.4
+requests==2.32.4
