
Optimize Enn runner #18735

Open
Jiseong-oh wants to merge 1 commit into main from mem_zero_copy

Conversation

@Jiseong-oh
Collaborator

@Jiseong-oh Jiseong-oh commented Apr 7, 2026

Summary

  • Open Model with zero-copy using dmabufheap
  • Improve model loading with zero-copy
  • Remove unnecessary comments and headers; rename files

cc @SS-JIA @digantdesai @kimishpatel

@pytorch-bot

pytorch-bot bot commented Apr 7, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18735

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 4 Unrelated Failures

As of commit b68ac07 with merge base b5ae0b9:

NEW FAILURE - The following job has failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were already failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 7, 2026
@github-actions

github-actions bot commented Apr 7, 2026

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@Jiseong-oh Jiseong-oh added partner: samsung For backend delegation, kernels, demo, etc. from the 3rd-party partner, Samsung module: samsung labels Apr 7, 2026
@Jiseong-oh Jiseong-oh requested a review from SS-JIA April 7, 2026 07:32
- Open Model with zero-copy using dmabufheap
- Improve model loading with zero-copy
- Remove unnecessary comments or headers. Change file name
- remove enn api from default extension/file_data_loader

Co-authored-by: Hoon Choi <hoon98.choi@samsung.com>
Signed-off-by: Jiseong oh <jiseong.oh@samsung.com>
@SS-JIA
Contributor

SS-JIA commented Apr 9, 2026

@claude help review this PR

@claude

claude bot commented Apr 9, 2026

Claude finished @SS-JIA's task in 4m 37s.


PR Review: Optimize Enn runner — Memory Zero-Copy

  • Read all changed files
  • Review ExynosFileDataLoader implementation
  • Review EnnSharedMemoryManager implementation
  • Review changes to ENN API and executor
  • Review CMakeLists.txt changes
  • Review test coverage
  • Post comprehensive review feedback

Overview

This PR introduces zero-copy model loading for the Samsung ENN backend by:

  1. Adding a SharedMemoryManager that allocates buffers via EnnCreateBuffer (dmabuf heap)
  2. Creating a custom FileDataLoader (exynos_file_data_loader.cpp) that uses these shared buffers instead of operator new
  3. Modifying EnnExecutor::initialize to detect when model data is in a shared buffer and open the model via file descriptor (EnnOpenModelFromFd) instead of copying from memory
  4. Adding new ENN API bindings for buffer creation/release/fd-query
  5. Refactoring the ENN API singleton lifecycle

The concept is sound — if the model data is already in a dmabuf-backed buffer, passing the fd directly avoids a copy. However, there are several issues ranging from correctness bugs to architectural concerns.


Critical Issues

1. ret may be used uninitialized in EnnExecutor::initialize (enn_executor.cpp:46-72)

If _sm_instance->query() returns true but the inner if condition (_out->va == binary_buf_addr && ...) is false, then ret is never assigned before the ET_CHECK_OR_RETURN_ERROR(ret == ENN_RET_SUCCESS, ...) check on line 68. This is undefined behavior.

EnnReturn ret;  // uninitialized
...
if (_sm_instance->query(&_out, binary_buf_addr, buf_size)) {
    int fd;
    if (_out->va == binary_buf_addr &&
        !enn_api_inst->EnnGetFileDescriptorFromEnnBuffer(_out, &fd)) {
      ret = enn_api_inst->EnnOpenModelFromFd(fd, &model_id_);
      ...
    }
    // <-- if inner condition is false, ret is still uninitialized
}
if (!model_id_) {  // model_id_ may also be uninitialized garbage if query returned true but inner failed
    ret = enn_api_inst->EnnOpenModelFromMemory(...);
}
// ret may be uninitialized here
ET_CHECK_OR_RETURN_ERROR(ret == ENN_RET_SUCCESS, ...);

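A minimal sketch of one possible fix, using stubs in place of the real ENN API (FakeEnnApi, its return values, and the fixed fd are stand-ins for illustration, not the PR's actual code): initialize both ret and the model id before any branch, and treat a zero model id as "fall back to the memory path", so every control-flow path leaves both values defined.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the ENN API; names echo the PR but the bodies are fake.
enum EnnReturn { ENN_RET_SUCCESS = 0, ENN_RET_FAILED = 1 };

struct FakeEnnApi {
  bool fd_path_available;  // models "query() succeeded and the fd is usable"
  EnnReturn EnnOpenModelFromFd(int /*fd*/, uint32_t* model_id) {
    *model_id = 42;
    return ENN_RET_SUCCESS;
  }
  EnnReturn EnnOpenModelFromMemory(uint32_t* model_id) {
    *model_id = 7;
    return ENN_RET_SUCCESS;
  }
};

EnnReturn open_model(FakeEnnApi& api, uint32_t* model_id_out) {
  EnnReturn ret = ENN_RET_FAILED;  // defined even if no branch assigns it
  uint32_t model_id = 0;           // 0 means "not opened yet"
  if (api.fd_path_available) {
    ret = api.EnnOpenModelFromFd(/*fd=*/3, &model_id);
  }
  if (model_id == 0) {  // zero-copy path unavailable or failed: fall back
    ret = api.EnnOpenModelFromMemory(&model_id);
  }
  *model_id_out = model_id;
  return ret;
}
```

Any equivalent restructuring works; the point is that ret and model_id_ have defined values on every path before the ET_CHECK_OR_RETURN_ERROR check runs.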

2. Test file references non-existent header and namespace (exynos_file_data_loader_test.cpp:1,13)

#include <executorch/backends/samsung/runtime/enn_api_wrapper.h>  // does not exist
...
using executorch::backends::enn::api_wrapper::SharedMemoryManager;  // wrong namespace

The file enn_api_wrapper.h does not exist in this PR or the codebase. The actual header is enn_shared_memory_manager.h and the namespace is executorch::backends::enn::shared_memory_manager::SharedMemoryManager. This will cause a compilation failure and is likely the cause of the CI build failure.


3. EnnCreateBuffer loaded twice in loadApiLib (enn_api_implementation.cpp:74,86)

ENN_LOAD_API_FUNC(libenn_public_api_, EnnCreateBuffer, this) appears on both line 74 and line 86, and both lines are introduced by this PR. The duplicate load is redundant; one of the two should be removed.



Design Concerns

4. Global replacement of FileDataLoader via CMake is fragile (extension/data_loader/CMakeLists.txt:27-34)

if(EXECUTORCH_BUILD_ENN)
  list(REMOVE_ITEM _extension_data_loader__srcs
       "extension/data_loader/file_data_loader.cpp"
  )
  list(APPEND _extension_data_loader__srcs
       "backends/samsung/runtime/extension/exynos_file_data_loader.cpp"
  )
endif()

This globally replaces the FileDataLoader implementation for the entire build when EXECUTORCH_BUILD_ENN is set. This means every user of FileDataLoader in the process will allocate from the ENN shared memory manager (dmabuf), not just the model loading path. This has several implications:

  • Any non-ENN code using FileDataLoader will get dmabuf allocations, which may be slower or have different lifetime semantics
  • The header file (exynos_file_data_loader.h) is a near-copy of the original file_data_loader.h, creating maintenance burden
  • The destructor in exynos_file_data_loader.cpp:56-66 calls std::free() on file_name_ but the name was allocated with ::strdup() (which uses malloc), so this works — but it's inconsistent with the original which used et_aligned_free. If the buffer manager allocates file_name_ differently in the future, this would break.

Consider a more targeted approach: either subclass FileDataLoader or create a separate ExynosDataLoader class rather than replacing the global implementation.
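The targeted approach could look roughly like the sketch below. This is a shape sketch only: DataLoaderBase, its allocate() hook, and the used_shared_pool flag are hypothetical names, and the real executorch FileDataLoader does not currently expose a virtual allocation hook, so adopting this would mean adding one (or writing a parallel loader class).

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical base loader with an overridable allocation hook.
class DataLoaderBase {
 public:
  virtual ~DataLoaderBase() = default;
  void* load(std::size_t size) { return allocate(size); }

 protected:
  virtual void* allocate(std::size_t size) { return std::malloc(size); }
};

// Only this subclass routes allocations to the ENN shared pool; every other
// user of the base loader keeps plain heap allocation.
class ExynosDataLoader : public DataLoaderBase {
 public:
  bool used_shared_pool = false;

 protected:
  void* allocate(std::size_t size) override {
    used_shared_pool = true;  // stand-in for SharedMemoryManager::allocate()
    return std::malloc(size);
  }
};
```

With this shape, the ENN runner constructs an ExynosDataLoader explicitly where it wants dmabuf-backed buffers, and the CMake REMOVE_ITEM/APPEND replacement disappears entirely.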

5. using namespace in header file (enn_shared_memory_manager.h:17)

using namespace torch::executor::enn;

using namespace directives in header files pollute the namespace of every translation unit that includes the header. This should be avoided. Use fully qualified names or move the using directive inside the implementation file.

6. Double initialization of ENN API (enn_executor_runner.cpp:121-127 and enn_api_implementation.cpp:41-50)

The EnnApi constructor now calls EnnInitialize() internally:

EnnApi::EnnApi() {
  auto status = loadApiLib();
  if (status == Error::Ok) {
    EnnInitialize();  // called in constructor
    ...

But the runner also calls it explicitly:

void exynos_npu_init() {
  EnnApi::getEnnApiInstance()->EnnInitialize();  // called again
}

Similarly, EnnDeinitialize is called both in the destructor and in exynos_npu_deinit(). Whether this is safe depends on the ENN API's tolerance for redundant init/deinit calls. The old EnnInitialize call in EnnExecutor::initialize was removed and the constructor now covers that path, so the executor is fine, but the runner path still initializes twice.
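One way to make the redundant calls harmless is to guard initialization with std::once_flag; sketched below with a stub class (EnnApiStub and real_init_count are inventions for illustration, not the PR's API):

```cpp
#include <mutex>

// Stub singleton mirroring the PR's shape. EnnInitialize() becomes idempotent,
// so both the constructor and an explicit exynos_npu_init() call are safe.
class EnnApiStub {
 public:
  static EnnApiStub* getEnnApiInstance() {
    static EnnApiStub instance;
    return &instance;
  }
  void EnnInitialize() {
    // The lambda runs at most once per process, however many callers race here.
    std::call_once(init_flag_, [this] { ++real_init_count_; });
  }
  int real_init_count() const { return real_init_count_; }

 private:
  std::once_flag init_flag_;
  int real_init_count_ = 0;
};
```

Deinitialization is harder to make idempotent this way, since a fresh init after a deinit must still work; if the runner genuinely needs explicit teardown, reference-counting the init/deinit pair is the safer route.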

7. Singleton SharedMemoryManager is never cleaned up (enn_shared_memory_manager.cpp:28-31)

The SharedMemoryManager is a Meyers singleton with static lifetime. Its EnnBufferPtrList may still contain entries when the process exits. Since the destructor is defaulted (~SharedMemoryManager() = default), these buffers are never released. If ENN API resources need explicit cleanup, this could be a resource leak.
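If explicit cleanup is needed, the destructor could release whatever is still registered; below is a sketch with malloc/free standing in for EnnCreateBuffer/EnnReleaseBuffer (the class and method names here are illustrative, not the PR's):

```cpp
#include <cstdlib>
#include <vector>

class SharedMemoryManagerSketch {
 public:
  static SharedMemoryManagerSketch& instance() {
    static SharedMemoryManagerSketch mgr;  // Meyers singleton, as in the PR
    return mgr;
  }
  void* allocate(std::size_t size) {
    void* p = std::malloc(size);  // stand-in for EnnCreateBuffer
    buffers_.push_back(p);
    return p;
  }
  ~SharedMemoryManagerSketch() {
    // Release anything callers never freed before process exit.
    for (void* p : buffers_) std::free(p);  // stand-in for EnnReleaseBuffer
    buffers_.clear();
  }

 private:
  std::vector<void*> buffers_;
};
```

One caveat: the dlopen'd ENN library may already be unloaded by the time static destructors run, so an explicit shutdown hook that drains the list may be safer than relying on the singleton's destructor.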


Minor Issues

8. get_size_from_fd is defined but never used (enn_executor.cpp:27-39)

This function is defined in the file but not called anywhere. Dead code should be removed.

9. free(void* ptr) calls free(ptr, {}) with a default-constructed align_val_t (enn_shared_memory_manager.cpp:74-76)

void SharedMemoryManager::free(void* ptr) {
  free(ptr, {});
}

The alignment parameter in free(void* ptr, std::align_val_t alignment) is unused (the function never references alignment), so this works, but the overload is misleading. Consider removing the alignment parameter from the free path if it's not needed.

10. ET_LOG format specifier mismatch (enn_shared_memory_manager.cpp:85)

ET_LOG(Info, "va(%p), size(%d), offset(%d) is erased from LUT",
    ptr, (*it)->size, (*it)->offset);

(*it)->size and (*it)->offset are uint32_t but logged with %d (signed int). Use %u or PRIu32 for correctness.

11. ET_LOG format specifier mismatch (enn_executor.cpp:48)

ET_LOG(Info, "Start to open model %p, %ld", binary_buf_addr, buf_size);

buf_size is size_t, which should use %zu, not %ld.
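Both mismatches have the same shape of fix. A small self-contained demonstration (format_lut_entry is a hypothetical helper echoing the log lines, not code from the PR):

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a LUT entry using the correct conversions: PRIu32 expands to the
// right specifier for uint32_t on any platform, and %zu is the dedicated
// specifier for size_t. Using %d / %ld here is undefined behavior on
// platforms where the widths differ.
std::string format_lut_entry(uint32_t size, uint32_t offset, size_t buf_size) {
  char out[128];
  std::snprintf(out, sizeof(out),
                "size(%" PRIu32 "), offset(%" PRIu32 "), buf_size(%zu)",
                size, offset, buf_size);
  return out;
}
```

The same substitutions apply verbatim inside ET_LOG, since it forwards printf-style format strings.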

12. Missing __hexagon__ in pread guard (exynos_file_data_loader.cpp:30)

The original file_data_loader.cpp includes defined(__hexagon__) in the ET_HAVE_PREAD guard, but the copy in exynos_file_data_loader.cpp only has defined(__xtensa__). This is a divergence from the upstream source that could cause issues on Hexagon platforms.

13. Unnecessary includes (enn_executor.cpp:14-18)

#include <android/log.h>
#include <sys/mman.h>

android/log.h is likely already pulled in through the logging header, and sys/mman.h is not used in this file (no mmap calls). These should be removed.


Summary

The zero-copy optimization concept is good, but the PR needs work before merging:

  • Must fix: Uninitialized ret variable (UB), missing enn_api_wrapper.h header (won't compile), duplicate EnnCreateBuffer load
  • Should address: Global FileDataLoader replacement strategy, using namespace in header, double ENN init/deinit
  • Nice to fix: Dead code, format specifier mismatches, unnecessary includes

