feat: Support MedCAT v2 #25

baixiac · 2025-10-22T11:17:56Z

Py3.12 support has also been added and Py3.8 support was deprecated and removed.

kawsarnoor

all looks like it makes sense...might be worth running this past Mart perhaps since he is most knowledgeable on v2

baixiac · 2025-11-03T12:57:42Z

> all looks like it makes sense...might be worth running this past Mart perhaps since he is most knowledgeable on v2

Added @mart-r in here. Just noticed that both MedCAT v2.0 and v2.1 got yanked. Are they the only versions that support Py3.9?

mart-r · 2025-11-05T15:01:43Z

Regarding version support, python 3.9 is end of life now, so we're not supporting that anymore. Everything from now on will be 3.10 and up (currently 3.10 to 3.13).

mart-r

I think this looks good overall from medcat v2 side of things! A few nitpicks more than anything else.

There's another few nitpicks I'd like to comment on in general:

Since Python 3.9, you don't need to use typing.Set (or List, Tuple, Dict, and probably a few others) for generic type hinting. You can just use the built-in types themselves (i.e. set[str]).
If you were to drop 3.9 support, you could also use Python 3.10-specific type hinting (e.g. unions with |: set[str] | None instead of Union[set[str], None], and so on).
Both of the above probably fall outside the scope of this PR, though. I just noticed the old style in these changes, even in the new ones, so I guess it's just about consistency at this point.

EDIT:
I'd like to also point out that just because I didn't see anything that may break with these changes, doesn't mean these things don't exist. Especially given I only reviewed the changes - so if something was omitted, I didn't review that.

mart-r · 2025-11-05T15:14:30Z

app/trainers/medcat_trainer.py

                        "per_concept_tp": tps.get(cui, 0),
                        "per_concept_counts": cc.get(cui, 0),
-                        "per_concept_count_train": model.cdb.cui2count_train.get(cui, 0),
+                        "per_concept_count_train": cast(Dict[str, Any], model.cdb.cui2info.get(cui, {})).get("count_train", 0),


Not sure why that's cast to a dict? Wouldn't it be an int as before?
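For reference, a minimal sketch of what the new expression evaluates to, assuming the per-concept info behaves like a plain dict. The `cui2info` data and the `count_train_for` helper below are hypothetical stand-ins, not the real CDB.

```python
from typing import Any, Dict, cast

# Hypothetical stand-in for model.cdb.cui2info: CUI -> per-concept info.
cui2info: Dict[str, Dict[str, Any]] = {
    "C0011849": {"preferred_name": "Diabetes Mellitus", "count_train": 12},
}


def count_train_for(cui: str) -> int:
    # cast() does nothing at runtime; it only tells the type checker to treat
    # the intermediate per-concept value as a plain dict.
    info = cast(Dict[str, Any], cui2info.get(cui, {}))
    # The final value is still an int, with 0 for unknown concepts.
    return info.get("count_train", 0)
```

So the cast applies to the intermediate info mapping, while the logged value remains an integer count.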

mart-r · 2025-11-05T15:16:40Z

app/trainers/medcat_trainer.py

-                train_count.append(model.cdb.cui2count_train[c] if c in model.cdb.cui2count_train else 0)
-                concept_names.append(model.cdb.get_name(c))
+                train_count.append(model.cdb.cui2info.get(c, {}).get("count_train", 0))  # type: ignore
+                concept_names.append(model.cdb.cui2info.get(c, {}).get("preferred_name", ""))  # type: ignore


CDB.get_name still exists, but the behaviour here seems to be different. Before, it would return the CUI if the CDB didn't have the concept (and the method still would), but the new implementation returns an empty string in those cases.
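The difference in fallback can be sketched with a plain dict. Both helpers below are hypothetical and only mimic the two behaviours described in this comment; they are not MedCAT's implementation.

```python
from typing import Any, Dict

# Hypothetical stand-in for model.cdb.cui2info.
cui2info: Dict[str, Dict[str, Any]] = {
    "C1": {"preferred_name": "Concept one"},
}


def name_or_cui(cui: str) -> str:
    """Falls back to the CUI itself, as described for the old call."""
    return cui2info.get(cui, {}).get("preferred_name", cui)


def name_or_empty(cui: str) -> str:
    """Falls back to an empty string, as in the new line of the diff."""
    return cui2info.get(cui, {}).get("preferred_name", "")
```

For a known concept both return the same name; for an unknown CUI the first keeps a usable label while the second yields a blank entry.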

mart-r · 2025-11-05T15:19:08Z

app/trainers/medcat_trainer.py

-            self._tracker_client.send_model_stats(model.cdb.make_stats(), step)
-            before_cui2count_train = dict(model.cdb.cui2count_train)
+            self._tracker_client.send_model_stats(dict(model.cdb.get_basic_info()), step)
+            before_cui2count_train = {c: info["count_train"] for c, info in model.cdb.cui2info.items()}


There does exist a CDB.get_cui2count_train method that builds the cui2count_train mapping (that is a mapping of only CUIs that do in fact have training examples). The difference being that the new implementation here would create a mapping with all concepts (even ones with no training). And that's likely to be considerably bigger since (at least in my experience) most concepts don't have training.
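A small sketch of the size difference, using made-up data and assuming that "has training examples" means a count above zero. The dict below is a hypothetical stand-in for cui2info.

```python
# Hypothetical stand-in for model.cdb.cui2info with mostly untrained concepts.
cui2info = {
    "C1": {"count_train": 5},
    "C2": {"count_train": 0},
    "C3": {"count_train": 0},
    "C4": {"count_train": 2},
}

# What the new line in the diff builds: one entry per concept.
all_counts = {c: info["count_train"] for c, info in cui2info.items()}

# Only concepts that actually have training examples (assumed: count > 0).
trained_only = {c: n for c, n in all_counts.items() if n > 0}
```

With real CDBs the first mapping grows with the number of concepts, while the second grows only with the number of trained ones.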

mart-r · 2025-11-05T15:20:57Z

app/trainers/medcat_trainer.py

-                    key=lambda item: item[1],
+                c: info["count_train"]
+                for c, info in sorted(
+                    model.cdb.cui2info.items(),


Again, could use CDB.get_cui2count_train to avoid the untrained concepts in here.
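As a rough sketch of sorting over a trained-only mapping: the `top_trained` helper, the descending order, and the `limit` parameter are all assumptions for illustration, since the diff excerpt only shows the sort key.

```python
def top_trained(cui2count_train: dict[str, int], limit: int) -> dict[str, int]:
    """Return the `limit` concepts with the most training examples."""
    ranked = sorted(
        cui2count_train.items(),
        key=lambda item: item[1],  # sort by training count
        reverse=True,              # assumed: highest counts first
    )
    return dict(ranked[:limit])
```

Passing in a mapping that already excludes untrained concepts means the sort only touches entries that can matter for the result.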

mart-r · 2025-11-05T15:22:42Z

app/trainers/metacat_trainer.py

-                for meta_cat in model._meta_cats:
+                model.config.meta.description = description or model.config.meta.description
+                meta_cat_addons = [
+                    addon for addon in model.get_addons()


Could use the CAT.get_addons_of_type method instead.
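To illustrate the idea only: the classes and the `addons_of_type` helper below are hypothetical stand-ins, and the real method's signature should be checked in the MedCAT source. A type-based helper replaces the hand-written filter inside the list comprehension.

```python
from typing import List, Type, TypeVar


class Addon:
    """Hypothetical base class for pipeline addons."""


class FakeMetaCATAddon(Addon):
    """Hypothetical stand-in for a MetaCAT addon."""


class FakeOtherAddon(Addon):
    """Hypothetical stand-in for any other addon."""


T = TypeVar("T", bound=Addon)


def addons_of_type(addons: List[Addon], addon_type: Type[T]) -> List[T]:
    """Keep only the addons that are instances of addon_type."""
    return [addon for addon in addons if isinstance(addon, addon_type)]
```

A helper of this shape also gives the type checker a precise element type for the result, which can remove the need for casts or `# type: ignore` at the call site.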

mart-r · 2025-11-05T15:23:47Z

app/trainers/metacat_trainer.py

-                    self._tracker_client.log_model_config(self.get_flattened_config(meta_cat, category_name))
+                assert self._model_service.model is not None, "Model should not be None"
+                meta_cat_addons = [
+                    addon for addon in self._model_service.model.get_addons()


Again, could use the CAT.get_addons_of_type method

baixiac · 2025-11-06T17:20:48Z

Yeah, the old style of type hinting is mainly there for making older Python versions happy. Alright, will deprecate Py3.9 support for MedCAT v2 in a separate PR then.

baixiac added 4 commits October 21, 2025 16:31

feat: support MedCAT v2

412a940

feat: make MedCAT V2 ontology mappings configurable

20eeab7

fix: add workaround for metrics collection for supervsised training

99b9fcd

feat: add concept ids to evaluation results and deprecate py38 support

57a6ab6

baixiac force-pushed the medcat2 branch from eabbb39 to ea3e32a Compare October 22, 2025 12:24

feat: upgrade to MedCAT 2.2 and update dependencies

a2635f7

baixiac force-pushed the medcat2 branch from ea3e32a to a2635f7 Compare October 22, 2025 13:13

baixiac requested review from kawsarnoor and phoevos October 23, 2025 09:12

baixiac added 2 commits October 23, 2025 15:02

chore: update individual local dev containers

f41a15f

chore: improve type hints

896a814

kawsarnoor approved these changes Nov 3, 2025

baixiac force-pushed the medcat2 branch 3 times, most recently from ff8a58e to 3cb5ffb Compare November 3, 2025 12:10

feat: support python 3.12

4f42271

baixiac force-pushed the medcat2 branch from 3cb5ffb to 4f42271 Compare November 3, 2025 12:32

baixiac requested a review from mart-r November 3, 2025 12:54

mart-r approved these changes Nov 5, 2025

feat: add improvement based on review

9499f35

baixiac force-pushed the medcat2 branch from 76fa41d to 9499f35 Compare November 6, 2025 17:09

baixiac merged commit 5acddc8 into main Nov 10, 2025
8 checks passed

baixiac deleted the medcat2 branch November 11, 2025 16:12
