-
Notifications
You must be signed in to change notification settings - Fork 2
feat: Support MedCAT v2 #25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
kawsarnoor
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all looks like it makes sense...might be worth running this past Mart perhaps since he is most knowledgable on v2
ff8a58e to
3cb5ffb
Compare
Added @mart-r in here. Just noticed that both MedCAT v2.0 and v2.1 got yanked. Are they the only versions that support Py3.9? |
|
Regarding version support, python 3.9 is end of life now, so we're not supporting that anymore. Everything from now on will be 3.10 and up (currently 3.10 to 3.13). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good overall from medcat v2 side of things! A few nitpicks more than anything else.
There's another few nitpicks I'd like to comment on in general:
- Since python 3.9, you don't need to use
typing.Set(orList,Tuple,Dict, probably a few others) for generic type hinting. You can just use the types themselves (i.eset[str]) - If you were to drop 3.9 support, you could also use python 3.10-specific type hinting (e.g unions with
|:set[str] | Noneinstead ofUnion[set[str]]; and so on) - Both of the above do probably fall outside the scope of this PR, though - just noticed thse changes - even new ones, I guess it's just about consistency at this point
EDIT:
I'd like to also point out that just because I didn't see anything that may break with these changes, doesn't mean these things don't exist. Especially given I only reviewed the changes - so if something was omitted, I didn't review that.
app/trainers/medcat_trainer.py
Outdated
| "per_concept_tp": tps.get(cui, 0), | ||
| "per_concept_counts": cc.get(cui, 0), | ||
| "per_concept_count_train": model.cdb.cui2count_train.get(cui, 0), | ||
| "per_concept_count_train": cast(Dict[str, Any], model.cdb.cui2info.get(cui, {})).get("count_train", 0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why that's cast to a dict? Wouldn't it be an int as before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
app/trainers/medcat_trainer.py
Outdated
| train_count.append(model.cdb.cui2count_train[c] if c in model.cdb.cui2count_train else 0) | ||
| concept_names.append(model.cdb.get_name(c)) | ||
| train_count.append(model.cdb.cui2info.get(c, {}).get("count_train", 0)) # type: ignore | ||
| concept_names.append(model.cdb.cui2info.get(c, {}).get("preferred_name", "")) # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CDB.get_name still exists. But the behvaiour here seems to be different. I.e before it would return the CUI if the CDB didn't have the concept (and the method still would), but the new implementation returns an empty string instead in those cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
app/trainers/medcat_trainer.py
Outdated
| self._tracker_client.send_model_stats(model.cdb.make_stats(), step) | ||
| before_cui2count_train = dict(model.cdb.cui2count_train) | ||
| self._tracker_client.send_model_stats(dict(model.cdb.get_basic_info()), step) | ||
| before_cui2count_train = {c: info["count_train"] for c, info in model.cdb.cui2info.items()} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There does exist a CDB.get_cui2count_train method that builds the cui2count_train mapping (that is a mapping of only CUIs that do in fact have training examples). The difference being that the new implementation here would create a mapping with all concepts (even ones with no training). And that's likely to be considerably bigger since (at least in my experience) most concepts don't have training.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
app/trainers/medcat_trainer.py
Outdated
| key=lambda item: item[1], | ||
| c: info["count_train"] | ||
| for c, info in sorted( | ||
| model.cdb.cui2info.items(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, could use CDB.get_cui2count_train to avoid the untrained concepts in here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
app/trainers/metacat_trainer.py
Outdated
| for meta_cat in model._meta_cats: | ||
| model.config.meta.description = description or model.config.meta.description | ||
| meta_cat_addons = [ | ||
| addon for addon in model.get_addons() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use the CAT.get_addons_of_type method instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
app/trainers/metacat_trainer.py
Outdated
| self._tracker_client.log_model_config(self.get_flattened_config(meta_cat, category_name)) | ||
| assert self._model_service.model is not None, "Model should not be None" | ||
| meta_cat_addons = [ | ||
| addon for addon in self._model_service.model.get_addons() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, could use the CAT.get_addons_of_type method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
Yeah, the old style of type hinting is mainly there for making older Python versions happy. Alright, will deprecate Py3.9 support for MedCAT v2 in a separate PR then. |
Py3.12 support has also been added and Py3.8 support was deprecated and removed.