Hi,
Is there a tutorial we can refer to for serving a DeBERTa model with FasterTransformer in Triton?
I think the steps would be:
- Convert a DeBERTa-v2 model into the FasterTransformer format;
- Dump the converted weights;
- Load them in the Triton Inference Server.
However, I only see step 1 covered, and only with a TensorFlow example.
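For step 3, my current understanding (based on the generic Triton model repository layout, not on any DeBERTa-specific tutorial — the model name and paths below are just placeholders I made up) is that the dumped weights would need to sit in a repository like this:

```
model_repository/
└── deberta/              # hypothetical model name, not from any official example
    ├── config.pbtxt      # would declare backend: "fastertransformer",
    │                     # plus the model's input/output tensors
    └── 1/                # version directory holding the converted weights
```

Is that roughly right, or does the FasterTransformer backend expect a different layout for encoder-only models like DeBERTa?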