[NPUW] Async weights bank processing and closure guard #32505

smirnov-alexey · 2025-10-21T18:32:15Z

Alternative to #32424

dmatveev

I haven't checked all the changes yet, but I've put some thoughts here.

I hope the number of changes can be reduced while keeping the same effect.

dmatveev · 2025-10-21T18:43:34Z

src/plugins/intel_npu/src/plugin/npuw/compiled_model.hpp

+        // Need to wrap closure, since finalize_weights_bank() will
+        // asynchronously evaluate weights and put them in closure.
+        // Other functions of CompiledModel as well as InferRequest and
+        // other entities need to wait for the closure to be populated first
+        // (meaning to wait for async weights processing to end).
+        class SafeClosureWrapper {
+        public:
+            std::vector<ov::Tensor>& unsafe_get_closure() {
+                return m_closure;
+            }
+            std::vector<ov::Tensor>& get_closure() {
+                if (m_evaluated) {
+                    return m_closure;
+                }
+                if (m_evaluation.valid()) {
+                    m_evaluation.wait();
+                    m_evaluated = true;
+                }
+                return m_closure;
+            }
+            void set_future(std::shared_future<void>& evaluation) {
+                m_evaluation = evaluation;
+            }
+
+        private:
+            std::vector<ov::Tensor> m_closure;
+            std::shared_future<void> m_evaluation;
+            bool m_evaluated = false;
+        };
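The comment in the hunk above describes a producer/consumer ordering. An async task fills the closure, and every reader waits on a shared future before touching it. The sketch below shows only that ordering, using std::async and std::shared_future. Closure, start_evaluation and sum_closure are hypothetical names, and ints stand in for ov::Tensor. This is an illustration, not the NPUW code.

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the closure: ints instead of ov::Tensor.
struct Closure {
    std::vector<int> data;
    std::shared_future<void> evaluation;  // set once the async job is launched
};

// Producer: fill the closure on another thread (the "weights evaluation").
// The caller must keep `c` alive until the evaluation has finished.
void start_evaluation(Closure& c, int n) {
    c.evaluation = std::async(std::launch::async, [&c, n] {
        c.data.resize(n);
        std::iota(c.data.begin(), c.data.end(), 1);  // 1, 2, ..., n
    }).share();
}

// Consumer: wait for the evaluation before reading the data.
int sum_closure(const Closure& c) {
    if (c.evaluation.valid()) {
        c.evaluation.wait();  // returns immediately once the job is done
    }
    return std::accumulate(c.data.begin(), c.data.end(), 0);
}
```

Reading c.data without the wait() would race with the producer. The wrapper in the hunk exists to rule that out.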


Move it to utils.
Make it a generic template over type T.
Assume you don't know anything about the closure here.

template<class T> class delayed { T t; public: T& get() { if (...) { ... } return t; } const T& get() const { ... } };

As easy as that. I assume you can implement both methods above via a common get_impl(), and the same applies to unsafe_get().

If you really want to go with closure specifics here, as some places in the edited code might suggest, it may also be worth overloading operator[] to make the integration more seamless.
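Here is a minimal sketch of the generic wrapper suggested above, assuming C++17. Delayed, get_impl and the std::vector<int> usage are illustrative names and do not reflect the code that ended up in util.hpp. A single static get_impl template serves both the const and the non-const get(), and it deduces constness from the object it is called on.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <utility>
#include <vector>

// Hypothetical generic "value that becomes available later" wrapper.
// It knows nothing about closures.
template <class T>
class Delayed {
public:
    // Attach the future that signals when the value is ready.
    void set_future(std::shared_future<void> evaluation) {
        m_evaluation = std::move(evaluation);
    }

    // Blocking accessors: wait for the evaluation, then return the value.
    T& get() { return get_impl(*this); }
    const T& get() const { return get_impl(*this); }

    // Non-blocking accessor for the producer that fills the value.
    T& unsafe_get() { return m_value; }

    // Optional convenience for indexable T (e.g. std::vector).
    decltype(auto) operator[](std::size_t i) { return get()[i]; }
    decltype(auto) operator[](std::size_t i) const { return get()[i]; }

private:
    // Shared by both get() overloads; Self is Delayed or const Delayed.
    template <class Self>
    static auto& get_impl(Self& self) {
        if (self.m_evaluation.valid()) {
            self.m_evaluation.wait();  // wait() is const: no mutable member needed
        }
        return self.m_value;
    }

    T m_value{};
    std::shared_future<void> m_evaluation;
};
```

The sketch drops the m_evaluated flag. Waiting on a future that is already ready returns immediately, so caching that fact likely buys little. Without the flag, get() writes no member, so the owning field can stay non-mutable. set_future takes the future by value so that a temporary such as std::async(...).share() can be passed directly.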

dmatveev · 2025-10-21T18:45:43Z

src/plugins/intel_npu/src/plugin/npuw/compiled_model.hpp

+            std::shared_future<void> m_evaluation;
+            bool m_evaluated = false;
+        };
+        mutable SafeClosureWrapper closure;


There's no need to make this field mutable; it should maintain the same contract as the original vector.

src/plugins/intel_npu/src/plugin/npuw/base_sync_infer_request.cpp

src/plugins/intel_npu/src/plugin/npuw/compiled_model.cpp

src/plugins/intel_npu/src/plugin/npuw/compiled_model.hpp

src/plugins/intel_npu/src/plugin/npuw/util.hpp

src/plugins/intel_npu/src/plugin/npuw/compiled_model.cpp

dmatveev · 2025-10-22T15:11:29Z

src/plugins/intel_npu/src/plugin/npuw/serialization.hpp

    {char{0x4c}, char{0x4c}, char{0x4d}, char{0x43}, char{0x4d}, char{0x4f}};

-const constexpr char* NPUW_SERIALIZATION_VERSION = "0.13";
+const constexpr char* NPUW_SERIALIZATION_VERSION = "0.14";


What changed here?

Nothing changed in the format; however, I fixed a potential bug in deserialization, so I decided to bump the version just in case.

Not sure how that justifies a version change, to be honest.
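The sketch below shows what a bump implies for existing blobs. It assumes the importer accepts a blob only when its stored version string matches the current one exactly; the actual NPUW check is not shown in this PR. Under that assumption, every blob exported with "0.13" is rejected after the bump, even though its byte layout did not change. is_compatible and kCurrentVersion are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Hypothetical: the version this build writes into serialized blobs.
constexpr const char* kCurrentVersion = "0.14";

// Hypothetical exact-match policy: any other version string is refused.
bool is_compatible(const std::string& blob_version) {
    return blob_version == kCurrentVersion;
}
```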

dmatveev · 2025-10-22T15:12:51Z

src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model.cpp

-            compiled->m_kvcache_compiled->m_import_weights_ctx.reset();
            compiled->m_prefill_compiled->finalize_weights_bank();
-            compiled->m_prefill_compiled->m_import_weights_ctx.reset();

            if (compiled->m_lm_head_compiled) {
                compiled->m_lm_head_compiled->m_weights_bank = bank;

                compiled->m_lm_head_compiled->finalize_weights_bank();
-                compiled->m_lm_head_compiled->m_import_weights_ctx.reset();


I thought I asked but it seems I didn't. Why do we need all these changes here? Why did it work before and how does it work now?

Now the reset happens inside finalize_weights_bank(), after the bank evaluation has finished. I reset the context to potentially release some memory after the import.

It's good we have tests now, but do they track memory consumption changes as well, or is it not supposed to change?

It's not supposed to change much (there should be no dangling references or actual mmapped data). But you are right, we don't have memory consumption tests yet.
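The ordering described in this thread is sketched below. The import context is released inside the async task, only after the evaluation that still reads it has finished. The sketch assumes the context is held through a shared_ptr. Model, ImportWeightsCtx and the int weights are hypothetical stand-ins, not the actual NPUW types. If the context were reset right after launching the task, the sketch could release data the task is still reading.

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <vector>

// Hypothetical stand-in for data kept alive only for the import.
struct ImportWeightsCtx {
    std::vector<int> raw_weights;
};

struct Model {
    std::shared_ptr<ImportWeightsCtx> import_ctx;
    std::vector<int> closure;
    std::shared_future<void> evaluation;

    // Evaluate the bank asynchronously. Do not touch import_ctx or closure
    // from other threads until evaluation has been waited on.
    void finalize_weights_bank() {
        evaluation = std::async(std::launch::async, [this] {
            closure = import_ctx->raw_weights;  // "evaluate" the weights
            import_ctx.reset();                 // free import-only memory last
        }).share();
    }
};
```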

smirnov-alexey · 2025-10-22T22:52:30Z

build_jenkins

smirnov-alexey · 2025-10-23T11:16:55Z

The tests are failing:

[ RUN      ] SerializationTest.Stress_ParallelImport
[WARNING] 23:17:27.37 [BackendsRegistry] Got an error during backend 'npu_level_zero_backend' loading : Exception from openvino\openvino\src\plugins\intel_npu\src\utils\src\zero\zero_api.cpp:25:
Cannot load library 'ze_loader.dll': 126 from cwd: C:\actions-runner\_work\openvino\openvino

[WARNING] 23:17:27.38 [BackendsRegistry] None of the backends were initialized successfully.Only offline compilation can be done!
[WARNING] 23:17:27.39 [NPUPlugin] No available compiler. Enabling only runtime options 
C:\BuildTools\VC\Tools\MSVC\14.42.34433\include\vector(202) : Assertion failed: vector iterators incompatible

As well as

[ RUN      ] BehaviorTestsNPUW.CompilationIsSuccessful

smirnov-alexey added 7 commits October 15, 2025 15:52

WIP async bank processing during import

190c82a

Always evaluate bank async

4c3da31

Move resets after bank's finalized

7eb64b4

Merge branch 'master' into as/npuw_async_bank_finalization

83c43dd

Merge branch 'master' into as/npuw_async_bank_finalization

5e12d4d

Add one more wait() to prevent data race

69917c7

Guard closure for async weights evaluation

d414e58

smirnov-alexey added this to the 2025.4 milestone Oct 21, 2025

smirnov-alexey requested a review from dmatveev October 21, 2025 18:32

smirnov-alexey assigned dmatveev Oct 21, 2025

smirnov-alexey requested a review from a team as a code owner October 21, 2025 18:32

smirnov-alexey added the Code Freeze label Oct 21, 2025

smirnov-alexey mentioned this pull request Oct 21, 2025

[NPUW] Async weights bank processing #32424

Open

github-actions bot added category: NPU OpenVINO NPU plugin category: NPUW NPUW plugin labels Oct 21, 2025

dmatveev reviewed Oct 21, 2025

Address review comments

c4ebbab

smirnov-alexey commented Oct 22, 2025

src/plugins/intel_npu/src/plugin/npuw/util.hpp (outdated, resolved)

Clean up

e38bd1e

dmatveev reviewed Oct 22, 2025

smirnov-alexey added 3 commits October 22, 2025 16:49

Merge branch 'master' into as/npuw_async_bank_closure_guard

a702e30

Refactoring

5c24d9c

Merge branch 'master' into as/npuw_async_bank_closure_guard

d11e98d

smirnov-alexey commented Oct 21, 2025

dmatveev left a comment

smirnov-alexey commented Oct 22, 2025

smirnov-alexey commented Oct 23, 2025 (edited)

2 participants
