Conversation

smirnov-alexey
Contributor

Alternative to #32424

Contributor

@dmatveev dmatveev left a comment

I haven't checked all the changes yet, but here are some thoughts.

I hope the number of changes can be reduced while keeping the same effect.

Comment on lines 183 to 211
// Need to wrap closure, since finalize_weights_bank() will
// asynchronously evaluate weights and put them in closure.
// Other functions of CompiledModel as well as InferRequest and
// other entities need to wait for the closure to be populated first
// (meaning to wait for async weights processing to end).
class SafeClosureWrapper {
public:
    std::vector<ov::Tensor>& unsafe_get_closure() {
        return m_closure;
    }
    std::vector<ov::Tensor>& get_closure() {
        if (m_evaluated) {
            return m_closure;
        }
        if (m_evaluation.valid()) {
            m_evaluation.wait();
            m_evaluated = true;
        }
        return m_closure;
    }
    void set_future(std::shared_future<void>& evaluation) {
        m_evaluation = evaluation;
    }

private:
    std::vector<ov::Tensor> m_closure;
    std::shared_future<void> m_evaluation;
    bool m_evaluated = false;
};
Contributor

Move it to utils
Make it a generic template over type T.
Assume you don't know anything about closure here.

template<class T>
class delayed {
    T t;
    T& get() { if (...) { ... } }
    const T& get() const { ... }
};

As easy as that. I assume you can implement both of the above methods via get_impl; the same goes for unsafe_get().

If you really want to go with closure specifics here, as some places in the edited code might suggest, it may also be worth overloading operator[] to make the integration more seamless.
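
For illustration, here is a minimal sketch of what such a generic wrapper might look like. It is not the code from this PR: the member names, the `set_future` signature, and the private `wait` helper (standing in for the `get_impl` mentioned above) are all assumptions. Since `std::shared_future::valid()` and `wait()` are const member functions, both `get()` overloads can share one const helper and no extra "already evaluated" flag is needed.

```cpp
#include <cstddef>
#include <future>
#include <utility>
#include <vector>

// Hypothetical sketch of a generic "value that becomes ready later" wrapper.
// It knows nothing about closures or tensors; T can be any type.
template <class T>
class delayed {
public:
    delayed() = default;
    explicit delayed(T value) : m_value(std::move(value)) {}

    // Attach the future that signals when m_value is fully populated.
    void set_future(std::shared_future<void> ready) {
        m_ready = std::move(ready);
    }

    // Blocking accessors: wait for the producer (if any), then return.
    T& get() {
        wait();
        return m_value;
    }
    const T& get() const {
        wait();
        return m_value;
    }

    // Non-blocking accessor, intended for the producer filling the value.
    T& unsafe_get() {
        return m_value;
    }

    // Convenience for container-like T: d[i] is shorthand for d.get()[i].
    // The return type is deduced only when operator[] is actually used,
    // so delayed<int> still compiles as long as nobody indexes it.
    decltype(auto) operator[](std::size_t i) {
        return get()[i];
    }
    decltype(auto) operator[](std::size_t i) const {
        return get()[i];
    }

private:
    // Shared by both get() overloads; a no-op when no future was attached.
    void wait() const {
        if (m_ready.valid()) {
            m_ready.wait();
        }
    }

    T m_value{};
    std::shared_future<void> m_ready;
};
```

With this shape, a `delayed<std::vector<ov::Tensor>>` member could replace the closure-specific class, and call sites that index the closure would keep reading as `closure[i]`.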

Contributor Author

Done

std::shared_future<void> m_evaluation;
bool m_evaluated = false;
};
mutable SafeClosureWrapper closure;
Contributor

There's no need to make this field mutable; it should maintain the same contract as the original vector.

Contributor Author

Done

{char{0x4c}, char{0x4c}, char{0x4d}, char{0x43}, char{0x4d}, char{0x4f}};

- const constexpr char* NPUW_SERIALIZATION_VERSION = "0.13";
+ const constexpr char* NPUW_SERIALIZATION_VERSION = "0.14";
Contributor

What changed here?

Contributor Author

Nothing in the format; however, I fixed a potential bug in deserialization and decided to bump the version just in case.

Contributor

Not sure how this warrants a version change, tbh.

Comment on lines -1497 to -1713
compiled->m_kvcache_compiled->m_import_weights_ctx.reset();
compiled->m_prefill_compiled->finalize_weights_bank();
compiled->m_prefill_compiled->m_import_weights_ctx.reset();

if (compiled->m_lm_head_compiled) {
    compiled->m_lm_head_compiled->m_weights_bank = bank;

    compiled->m_lm_head_compiled->finalize_weights_bank();
    compiled->m_lm_head_compiled->m_import_weights_ctx.reset();
Contributor

I thought I asked but it seems I didn't. Why do we need all these changes here? Why did it work before and how does it work now?

Contributor Author

Now it's done inside finalize_weights_bank(), after bank evaluation has finished. I reset the context to potentially release some memory after the import.
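
To make the ordering concrete, here is a rough sketch of the idea under stated assumptions; it is not the actual NPUW code. `ImportContext` and `CompiledModelSketch` are hypothetical stand-ins, plain `int` vectors replace tensors, and the use of `std::async` is an assumption about how the evaluation might be scheduled. The point it shows is that the asynchronous task drops the import context itself once it has finished reading from it, so the caller no longer needs to call `reset()` right after `finalize_weights_bank()`.

```cpp
#include <future>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for the import-time context (e.g. weight sources).
struct ImportContext {
    std::vector<int> raw_weights;
};

// Hypothetical stand-in for the compiled model; only the ordering matters.
struct CompiledModelSketch {
    std::shared_ptr<ImportContext> m_import_weights_ctx;  // needed only during import
    std::vector<int> m_closure;                           // filled by the async task
    // Declared last so it is destroyed first: a future obtained from
    // std::async blocks in its destructor until the task has finished,
    // which keeps the task from outliving the members it touches.
    std::shared_future<void> m_evaluation;

    // Start evaluation asynchronously. The task releases the context
    // only after it has finished reading from it.
    void finalize_weights_bank() {
        m_evaluation = std::async(std::launch::async, [this] {
            m_closure = std::move(m_import_weights_ctx->raw_weights);
            m_import_weights_ctx.reset();  // release import-time memory
        }).share();
    }
};
```

In this sketch, any code that reads `m_closure` or `m_import_weights_ctx` has to wait on `m_evaluation` first; resetting the context from the calling thread immediately after `finalize_weights_bank()` would race with the task.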

Contributor

It's good we have tests now, but do they track memory consumption changes as well, or is it not supposed to change?

Contributor Author

It's not supposed to change much (there should be no dangling references or actual mmapped data). But you are right, we don't have memory consumption tests yet.

@smirnov-alexey
Contributor Author

build_jenkins

@smirnov-alexey
Contributor Author

smirnov-alexey commented Oct 23, 2025

The tests are failing:

[ RUN      ] SerializationTest.Stress_ParallelImport
[WARNING] 23:17:27.37 [BackendsRegistry] Got an error during backend 'npu_level_zero_backend' loading : Exception from openvino\openvino\src\plugins\intel_npu\src\utils\src\zero\zero_api.cpp:25:
Cannot load library 'ze_loader.dll': 126 from cwd: C:\actions-runner\_work\openvino\openvino

[WARNING] 23:17:27.38 [BackendsRegistry] None of the backends were initialized successfully.Only offline compilation can be done!
[WARNING] 23:17:27.39 [NPUPlugin] No available compiler. Enabling only runtime options 
C:\BuildTools\VC\Tools\MSVC\14.42.34433\include\vector(202) : Assertion failed: vector iterators incompatible

As well as

[ RUN      ] BehaviorTestsNPUW.CompilationIsSuccessful

Labels

category: NPU OpenVINO NPU plugin category: NPUW NPUW plugin Code Freeze
