Take a look at SkyPilot. We want to run experiments on bare metal and oc to collect both model performance (accuracy, perplexity, etc.) and system performance (latency and throughput distributions) results. SkyPilot might help manage multiple backends. If it turns out to be too "heavy", we'll skip it for now.