Description
It seems like a change in uv 0.4 caused our red-knot benchmark to silently stop working: the dependencies for our benchmark projects were no longer being installed into the right virtual environment (astral-sh/ruff#13228 (comment)). (Probably a skill issue in our usage of uv rather than a breaking change from uv.)
It would be great to have some kind of test in CI that checks that the tools we're running emit roughly the number of errors we expect, so that the benchmark can't silently become invalid again in the future.
Relatedly: mypy emits one error when checking black via our benchmark infrastructure, and we're not sure why. Black is compiled with mypyc and its maintainers run mypy in CI, so there should probably be 0 errors there! We're probably invoking mypy slightly wrong somehow. If we could get a clean run of mypy on black in our benchmark, that would make it significantly easier to add this kind of test.
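A rough sketch of what such a check could look like, assuming the benchmark captures each tool's stdout as a string. It parses mypy's final summary line ("Found N errors in M files ..." or "Success: no issues found ...") and fails if the count drifts too far from a baseline. The names `EXPECTED_ERRORS`, `count_mypy_errors` and `check_error_count`, the 10% tolerance, and the baseline number are all hypothetical placeholders, not part of the existing benchmark. The point is that a run with missing dependencies would typically produce a burst of unresolved-import errors (or no summary at all), which this kind of check would catch.

```python
import re

# Hypothetical baselines: (tool, project) -> expected error count.
# Placeholder value; real baselines would come from a known-good run.
EXPECTED_ERRORS = {
    ("mypy", "black"): 0,
}

# mypy ends its output with a summary line such as
# "Found 3 errors in 2 files (checked 40 source files)" or
# "Success: no issues found in 40 source files".
_MYPY_FOUND = re.compile(r"^Found (\d+) errors? in \d+ files?", re.MULTILINE)
_MYPY_SUCCESS = re.compile(r"^Success: no issues found", re.MULTILINE)


def count_mypy_errors(output: str) -> int:
    """Extract the error count from mypy's summary line."""
    if _MYPY_SUCCESS.search(output):
        return 0
    match = _MYPY_FOUND.search(output)
    if match is None:
        # No summary line at all: treat as a broken run, not as "0 errors".
        raise ValueError("could not find a mypy summary line in the output")
    return int(match.group(1))


def check_error_count(tool: str, project: str, actual: int, tolerance: float = 0.1) -> None:
    """Raise AssertionError if `actual` is not roughly the expected count.

    The allowed slack is `tolerance` times the baseline, so a baseline of 0
    allows no errors at all.
    """
    expected = EXPECTED_ERRORS[(tool, project)]
    slack = int(expected * tolerance)
    if abs(actual - expected) > slack:
        raise AssertionError(
            f"{tool} on {project}: expected ~{expected} errors, got {actual}"
        )
```

In CI this would run after each benchmark invocation, e.g. `check_error_count("mypy", "black", count_mypy_errors(stdout))`. A relative tolerance keeps the check from being flaky when a tool's output changes slightly between versions, while still failing loudly on the order-of-magnitude jump that a broken environment produces.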