We currently support vLLM as the only local GPU inference backend. We should add SGLang as an alternative. A second backend gives users flexibility without changing the evaluation workflow, and SGLang is also gaining popularity, which could be handy for future users. What do you think?