Commit f9579c4

Merge remote-tracking branch 'origin/master' into snippets-arm-jit-binary-call-emitter

2 parents: a20f240 + 338fa1e

49 files changed: +4965 −5165 lines

docs/articles_en/openvino-workflow-generative/inference-with-genai/inference-with-genai-on-npu.rst
1 addition, 0 deletions

@@ -289,6 +289,7 @@ Specifying ``EXPORT_BLOB`` and ``BLOB_PATH`` parameters works similarly to ``CAC
 * To export a blob with weights you need to pass ``"CACHE_MODE" : "OPTIMIZE_SPEED"`` in the config.
 * If the blob is exported as weightless you also need to either provide
   ``"WEIGHTS_PATH" : "path\\to\\original\\model.bin"`` or ``"MODEL_PTR" : original ov::Model object``.
+* Ahead-of-time import in weightless mode has been optimized to consume less memory than during regular compilation or using ``CACHE_DIR``.

 .. tab-set::
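
For orientation, the export/import flow this doc change touches might look roughly like the C++ sketch below. It is not part of the commit: the config keys are the ones named in the documentation, while the `"YES"` value, the `ov::genai::LLMPipeline` entry point, and the assumption that passing `BLOB_PATH` without `EXPORT_BLOB` triggers import are illustrative guesses.

```cpp
// Sketch only; key semantics beyond the doc text are assumptions.
#include <openvino/genai/llm_pipeline.hpp>

int main() {
    // Ahead-of-time export: compile for NPU and write a blob to disk.
    ov::AnyMap export_config{
        {"EXPORT_BLOB", "YES"},       // assumed enabling value
        {"BLOB_PATH", "model.blob"}   // where the blob is written
        // add {"CACHE_MODE", "OPTIMIZE_SPEED"} to embed weights instead
    };
    ov::genai::LLMPipeline exporter("model_dir", "NPU", export_config);

    // Weightless import: the blob carries no weights, so the original
    // weights file must be supplied, as the added doc line describes.
    ov::AnyMap import_config{
        {"BLOB_PATH", "model.blob"},
        {"WEIGHTS_PATH", "path\\to\\original\\model.bin"}
    };
    ov::genai::LLMPipeline pipe("model_dir", "NPU", import_config);
    return 0;
}
```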

docs/articles_en/openvino-workflow/running-inference/model-input-output/dynamic-shapes.rst
9 additions, 7 deletions

@@ -18,8 +18,8 @@ As it was demonstrated in the :doc:`Changing Input Shapes <changing-input-shape>
 Reshaping models provides an ability to customize the model input shape for the exact size required in the end application.
 This article explains how the ability of model to reshape can further be leveraged in more dynamic scenarios.

-Applying Dynamic Shapes
-#######################
+When to Use Dynamic Shapes
+##########################

 Conventional "static" model reshaping works well when it can be done once per many model inference calls with the same shape.
 However, this approach does not perform efficiently if the input tensor shape is changed on every inference call. Calling the ``reshape()`` and ``compile_model()`` methods each time a new size comes is extremely time-consuming.

@@ -40,12 +40,14 @@ The methods are sensitive to model internals, do not always give optimal perform
 For a short overview of the methods, refer to the :doc:`When Dynamic Shapes API is Not Applicable <dynamic-shapes/openvino-without-dynamic-shapes-api>` page.
 Apply those methods only if native dynamic shape API described in the following sections does not work or does not perform as expected.

-The decision about using dynamic shapes should be based on proper benchmarking of a real application with real data.
-Unlike statically shaped models, dynamically shaped ones require different inference time, depending on input data shape or input tensor content.
-Furthermore, using the dynamic shapes can bring more overheads in memory and running time of each inference call depending on hardware plugin and model used.
+It is recommended to benchmark your application with real data to see if you need dynamic shapes and how it affects performance and resource use. Dynamic shapes can change inference performance and memory requirements compared to static shapes. The impact depends on the hardware plugin used, such as CPU, GPU, or NPU, and on the specific model.

-Handling Dynamic Shapes
-#######################
+.. note::
+
+   **GPU Dynamic Shape Support:** GPUs support dynamic shapes, but optimization is still in progress for a broader range of models. Performance may vary depending on the specific model and use case. Consider testing with your specific workload to evaluate performance.
+
+How to Use Dynamic Shapes
+#########################

 This section describes how to handle dynamically shaped models with OpenVINO Runtime API version 2022.1 and higher. When using dynamic shapes, there are three main differences in the workflow than with static shapes:
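
Since the renamed section covers the native dynamic shape API, a minimal C++ sketch of that workflow may help as context. It is not part of this commit; the model path, the 2-D `f32` input, and the dimension bounds are illustrative.

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // illustrative path

    // Reshape once: dynamic batch, second dimension bounded to [1, 512].
    model->reshape({ov::Dimension::dynamic(), ov::Dimension(1, 512)});

    auto compiled = core.compile_model(model, "CPU");
    auto request = compiled.create_infer_request();

    // Each inference call may now use a different concrete shape
    // without re-running reshape() or compile_model().
    for (size_t len : {16, 128, 512}) {
        ov::Tensor input(ov::element::f32, ov::Shape{1, len});
        request.set_input_tensor(input);
        request.infer();
    }
    return 0;
}
```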

src/plugins/intel_gpu/src/graph/debug_helper.cpp
24 additions, 0 deletions

@@ -466,7 +466,31 @@ NodeDebugHelper::~NodeDebugHelper() {
                               dump_raw);
         }
     }
+    for (size_t i = 0; i < m_inst.get_intermediates_memories().size(); i++) {
+        std::string name = get_file_prefix() + "_intermediates_" + std::to_string(i);
+        auto output_mem = m_inst.get_intermediates_memories()[i];
+        if (output_mem == nullptr) {
+            GPU_DEBUG_COUT << " intermediates_mem is nullptr. Nothing to dump." << std::endl;
+            continue;
+        }

+        auto& output_layout = output_mem->get_layout();
+        if (config.get_dump_tensors_format() == ov::intel_gpu::DumpFormat::binary) {
+            // Binary dump : raw
+            auto filename = get_file_path_for_binary_dump(output_layout, name, config.get_dump_tensors_path());
+
+            mem_lock<char, mem_lock_type::read> lock(output_mem, m_stream);
+            ov::util::save_binary(filename, lock.data(), output_mem->size());
+            GPU_DEBUG_COUT << " Dump layer dst : " << layer_name << " to " << filename << std::endl;
+            debug_str_for_bin_load += (filename + ",");
+        } else {
+            const bool dump_raw = config.get_dump_tensors_format() == ov::intel_gpu::DumpFormat::text_raw;
+            GPU_DEBUG_COUT << " Dump " << (dump_raw ? "raw " : "") << name << std::endl;
+            auto filename = config.get_dump_tensors_path() + get_name_for_dump(name) + ".txt";
+            // Text dump
+            log_memory_to_file(output_mem, output_layout, m_stream, filename, dump_raw);
+        }
+    }
     if (config.get_dump_tensors_format() == ov::intel_gpu::DumpFormat::binary && m_inst.is_input()) {
         debug_str_for_bin_load[debug_str_for_bin_load.size()-1] = '\"';
         GPU_DEBUG_COUT << debug_str_for_bin_load << std::endl;;

src/plugins/intel_gpu/src/graph/impls/ocl/kernels_cache.cpp
0 additions, 7 deletions

@@ -213,13 +213,6 @@ void kernels_cache::get_program_source(const kernels_code& kernels_source_code,

         current_batch.has_microkernels |= kernel_string->has_microkernels;

-        // TODO: Technically, microkernels doesn't require specific headers, but we don't want to include
-        // some headers to all batches as it may lead to compilation error on some driver versions.
-        // Need to generalize work with headers to include only necessary parts
-        if (current_batch.has_microkernels) {
-            current_batch.source.insert(current_batch.source.begin(), current_batch.micro_headers.begin(), current_batch.micro_headers.end());
-        }
-
         current_batch.source.push_back(std::move(full_code));
         current_batch.kernels_counter++;
     }
