Add a hardware-agnostic test that runs `trtllm-bench` and compares the AD and PT backends. Consider also adding a test against golden perf numbers.
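
One way to keep the comparison hardware-agnostic is to assert on the relative gap between the two backends instead of on absolute numbers. The sketch below assumes each `trtllm-bench` run has already been reduced to a dict of metrics. The `tokens_per_sec` key, the 10% threshold, and all helper names are hypothetical placeholders, and invoking `trtllm-bench` and parsing its report are deliberately left out.

```python
from typing import Dict


def relative_gap(candidate: float, baseline: float) -> float:
    """Fractional shortfall of candidate vs. baseline (positive means candidate is slower)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - candidate) / baseline


def ad_matches_pt(
    ad_metrics: Dict[str, float],
    pt_metrics: Dict[str, float],
    key: str = "tokens_per_sec",  # hypothetical metric name
    max_gap: float = 0.10,        # hypothetical tolerance
) -> bool:
    """True if the AD throughput is within max_gap of the PT throughput."""
    return relative_gap(ad_metrics[key], pt_metrics[key]) <= max_gap


def matches_golden(measured: float, golden: float, max_gap: float = 0.10) -> bool:
    """True if a measured value has not regressed more than max_gap below a stored golden value."""
    return relative_gap(measured, golden) <= max_gap
```

The golden check is inherently tied to the hardware the golden number was recorded on, so it would likely need per-platform reference values, while the AD vs. PT check can run anywhere both backends are available.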