1. qpth (locuslab/qpth)
   - Pure PyTorch, GPU-native interior-point solver.
   - Accepts Q as (m, m) and h as (batch, m); it auto-broadcasts Q across the batch, which directly exploits your shared-Gramian structure.
   - Your constraint maps cleanly: qpth states inequalities as G v <= h, so u_i <= v becomes G_ineq = -I, h_i = -u_i.
   - Caveat: minimally maintained and numerically sensitive; float64 is recommended.
2. lqp_py (ipo-lab/lqp_py)
   - PyTorch + GPU, designed for batched QPs.
   - Only supports box constraints (l <= x <= u), but your u_i <= v is a half-bounded box (lower bound l = u_i, upper bound +inf), so it maps cleanly.
   - Differentiable.
   - Minimally maintained, with a simpler API.
3. cvxpylayers
   - Declare the problem once in CVXPY, then run it as a PyTorch layer; GPU via the CuClarabel backend.
   - Well maintained and general, but with more overhead than specialized solvers, and the solve is possibly not batched.
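
For option 1, the constraint mapping can be sketched roughly as follows. This is a minimal sketch, not a tested integration: it assumes the objective has the form 0.5 v^T Q v + p_i^T v with a shared Q, and the helper names `build_qpth_inputs` and `solve_with_qpth` are hypothetical. The only qpth call used is `QPFunction` from `qpth.qp`, with empty tensors standing in for the absent equality constraints; it has not been checked against any particular qpth release.

```python
import torch


def build_qpth_inputs(Q, p, U):
    """Map  min_v 0.5 v^T Q v + p_i^T v  s.t.  u_i <= v  to qpth's (Q, p, G, h).

    Assumed shapes: Q is (m, m) and shared; p and U are (batch, m).
    qpth states inequalities as G v <= h, so u_i <= v becomes -I v <= -u_i.
    """
    m = Q.shape[-1]
    G = -torch.eye(m, dtype=Q.dtype, device=Q.device)  # (m, m), shared across the batch
    h = -U                                              # (batch, m), one row per problem
    return Q, p, G, h


def solve_with_qpth(Q, p, U):
    """Solve the whole batch in one call (requires qpth to be installed)."""
    # Imported lazily so the builder above can be used and tested without qpth.
    from qpth.qp import QPFunction

    Q, p, G, h = build_qpth_inputs(Q, p, U)
    # Empty tensors signal "no equality constraints" (A, b).
    empty = torch.empty(0, dtype=Q.dtype, device=Q.device)
    return QPFunction()(Q, p, G, h, empty, empty)  # expected shape: (batch, m)
```

Building everything in float64 (for example `Q.double()`) follows the numerical-sensitivity caveat above. Keeping Q and G unbatched leaves the broadcasting to qpth instead of materialising `batch` copies of the Gramian.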