Updates:
vLLM backend:
- added support for running model inference on multiple GPUs (see the first sketch after this list);
- added extra parameters for controlling generation (second sketch below);
- added support for speculative decoding (https://docs.vllm.ai/en/latest/models/spec_decode.html, an experimental vLLM feature; third sketch below);
- added several optimizations;
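
As a rough illustration of the multi-GPU support: this is how tensor-parallel inference is typically enabled in vLLM. The model name is a placeholder, and the backend's actual configuration surface may differ:

```python
from vllm import LLM

# Shard the model across two GPUs via tensor parallelism;
# tensor_parallel_size is vLLM's standard multi-GPU setting.
llm = LLM(
    model="facebook/opt-6.7b",  # placeholder model, for illustration only
    tensor_parallel_size=2,     # number of GPUs to split the weights across
)
```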
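The extra generation parameters map onto vLLM's SamplingParams; which of these the backend newly exposes is an assumption here, and the model and prompt are illustrative:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # small placeholder model

# Common vLLM generation controls.
params = SamplingParams(
    temperature=0.7,         # sampling randomness
    top_p=0.9,               # nucleus-sampling cutoff
    max_tokens=256,          # cap on generated length
    repetition_penalty=1.1,  # discourage repeated tokens
    stop=["\n\n"],           # optional stop sequences
)

out = llm.generate(["Describe the object."], params)
print(out[0].outputs[0].text)
```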
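Speculative decoding pairs the target model with a small draft model that proposes tokens for the target to verify in a single forward pass, trading extra memory for lower latency. A minimal sketch following the example in the linked vLLM docs; the exact argument names depend on the installed vLLM version, and both model names are placeholders:

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="facebook/opt-6.7b",              # target model (placeholder)
    speculative_model="facebook/opt-125m",  # draft model that proposes tokens
    num_speculative_tokens=5,               # draft tokens proposed per step
)

out = llm.generate(["The capital of France is"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```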
Other:
- improved the instruction prompt;
- adjusted the object categories from which objects are sampled;
- bug fixes.