add a test for `ty_benchmark` that asserts we have roughly the number of expected errors #241

It seems that a change in uv 0.4 caused our red-knot benchmark to silently stop working: the dependencies for our benchmark projects were no longer being installed into the right virtual environment (see https://github.com/astral-sh/ruff/pull/13228#issuecomment-2326322306). This is probably a skill issue in how we use uv rather than a breaking change in uv.

It would be great to have a test in CI that checks that each tool we run emits roughly the number of errors we expect, so that the benchmark can't silently become invalid again in the future.
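A minimal sketch of what such a check could look like, assuming it lives alongside the Python benchmark harness and has access to each tool's captured stdout. The helper names `count_mypy_errors` and `assert_roughly_expected` are hypothetical, and the 10% tolerance is an arbitrary placeholder. Only mypy's standard summary line ("Found N errors in M files ..." or "Success: no issues found ...") is parsed here; the other tools would each need their own counter.

```python
import re

# mypy ends its output with a summary line such as
# "Found 3 errors in 2 files (checked 10 source files)" or
# "Success: no issues found in 10 source files".
_MYPY_SUMMARY = re.compile(r"^Found (\d+) errors? in \d+ files?", re.MULTILINE)


def count_mypy_errors(output: str) -> int:
    """Return the error count from mypy's summary line (0 on a clean run)."""
    match = _MYPY_SUMMARY.search(output)
    if match is None:
        if "Success: no issues found" in output:
            return 0
        raise ValueError("could not find a mypy summary line in the output")
    return int(match.group(1))


def assert_roughly_expected(
    tool: str, project: str, actual: int, expected: int, tolerance: float = 0.1
) -> None:
    """Fail if `actual` deviates from `expected` by more than `tolerance`.

    `tolerance` is a fraction of `expected`; at least one error of slack is
    always allowed so that tiny expected counts don't make the check brittle.
    """
    allowed = max(1, round(expected * tolerance))
    if abs(actual - expected) > allowed:
        raise AssertionError(
            f"{tool} on {project}: expected ~{expected} errors, got {actual} "
            f"(allowed deviation: {allowed})"
        )
```

The expected counts would need to be recorded per tool and per project. A broken environment like the one above would likely show up as a large jump in unresolved-import errors, which even a loose tolerance still catches.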

Relatedly, mypy emits one error when checking black via our benchmark infrastructure, and we're not sure why. Black is compiled with mypyc and runs mypy in its CI, so there should probably be zero errors there! We're probably invoking mypy slightly wrong somehow. If we could get a clean mypy run on black in our benchmark, adding this kind of test would be significantly easier.
