Skip to content

Add modern logging utils#4486

Open
lzhangzz wants to merge 12 commits intoInternLM:mainfrom
lzhangzz:logging
Open

Add modern logging utils#4486
lzhangzz wants to merge 12 commits intoInternLM:mainfrom
lzhangzz:logging

Conversation

@lzhangzz
Copy link
Copy Markdown
Collaborator

@lzhangzz lzhangzz commented Apr 2, 2026

Add an async logger backed by fmtlib

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR replaces TurboMind’s legacy printf-style logger with a new turbomind::core::Logger that uses {}-style formatting via fmtlib and supports async logging, then migrates call sites and build targets accordingly.

Changes:

  • Introduce a new fmt-backed async logger in src/turbomind/core (with Catch2 coverage).
  • Migrate TM_LOG_* call sites from printf-style (%d/%s/...) to fmt-style ({}) across core/engine/models/kernels/comm/tests.
  • Remove the old src/turbomind/utils/logger.* library and update CMake targets/dependencies (FetchContent fmt + concurrentqueue).

Reviewed changes

Copilot reviewed 61 out of 61 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
CMakeLists.txt Fetch fmtlib via FetchContent; add nvcc diag suppression for fmt headers.
tests/csrc/unittests/CMakeLists.txt Removes legacy logger linkage from unittest target link sets.
tests/csrc/unittests/gtest_utils.h Migrates test logging to {}-style TM_LOG_*.
tests/csrc/unittests/test_logprob_kernels.cu Switches include to new core/logger.h.
tests/csrc/unittests/test_sampling_layer.cu Converts debug logging to fmt-style formatting.
tests/csrc/unittests/unittest_utils.h Converts some test logging to fmt-style formatting.
src/turbomind/utils/logger.h Deletes legacy logger header.
src/turbomind/utils/logger.cc Deletes legacy logger implementation.
src/turbomind/utils/CMakeLists.txt Drops logger lib; updates cuda_utils to link fmt; removes logger deps from utils libs.
src/turbomind/utils/cuda_utils.h Uses new logger; converts CUDA error logging to fmt-style strings.
src/turbomind/utils/cuda_utils.cc Uses new logger macros for sync-and-check logging.
src/turbomind/utils/anomaly_handler.cu Migrates warnings to TM_LOG_WARN with fmt-style formatting; switches include to core logger.
src/turbomind/turbomind.cc Converts various warnings/errors/info logs to fmt-style formatting.
src/turbomind/python/bind.cpp Converts memcpy failure log to fmt-style formatting.
src/turbomind/models/CMakeLists.txt Removes logger from models link list (models already links core).
src/turbomind/models/llama/CMakeLists.txt Removes logger from Llama target link list.
src/turbomind/models/llama/unified_attention_layer.cc Switches include to core logger; converts logs to fmt-style.
src/turbomind/models/llama/SequenceManager.h Adds <map> include (compile fix for new usage).
src/turbomind/models/llama/SequenceManager.cc Switches include to core logger; converts many logs to fmt-style.
src/turbomind/models/llama/LlamaWeight.cc Converts padding warnings to fmt-style.
src/turbomind/models/llama/LlamaLinear.cu Converts error log to fmt-style.
src/turbomind/models/llama/LlamaDecoderLayerWeight.cc Switches include to core logger; converts env log to fmt-style.
src/turbomind/models/llama/llama_kernels.cu Adds string_utils.h include (likely for helpers used elsewhere in file).
src/turbomind/models/llama/GatedDeltaNetLayer.cc Switches include to core logger; converts info log to fmt-style.
src/turbomind/models/llama/BlockTrie.cc Converts warnings/info to fmt-style.
src/turbomind/models/llama/BlockManager.h Switches include to core logger.
src/turbomind/models/llama/BlockManager.cc Switches include to core logger; converts logs to fmt-style.
src/turbomind/models/llama/Barrier.h Switches include to core logger; converts log to fmt-style.
src/turbomind/kernels/sampling_topk_kernels.h Switches include to core logger.
src/turbomind/kernels/sampling_topk_kernels.cu Adds string_utils.h include.
src/turbomind/kernels/sampling_penalty_kernels.cu Converts debug log to fmt-style.
src/turbomind/kernels/logprob_kernels.cu Switches include to core logger; converts debug log to fmt-style.
src/turbomind/kernels/decoding_kernels.cu Adds string_utils.h include.
src/turbomind/kernels/attention/CMakeLists.txt Replaces logger linkage with core/cuda_utils/fmt::fmt for tests.
src/turbomind/generation/utils.h Converts warning log to fmt-style.
src/turbomind/generation/sampling.cc Switches include to core logger; converts debug logs to fmt-style.
src/turbomind/generation/logits_processor.cc Converts debug/warn logs to fmt-style; minor cast cleanup.
src/turbomind/engine/request.cc Converts callback error logs to fmt-style.
src/turbomind/engine/gateway.h Switches include to core logger; converts warnings/info to fmt-style.
src/turbomind/engine/gateway.cc Converts gateway errors to fmt-style and removes legacy prefixes.
src/turbomind/engine/engine.cc Switches include to core logger; converts various logs to fmt-style.
src/turbomind/core/CMakeLists.txt Fetches concurrentqueue; adds logger.cc to core; updates core link deps; adds test_logger.
src/turbomind/core/logger.h New Logger API and TM_LOG_* macros using fmt format strings.
src/turbomind/core/logger.cc Implements async worker + formatting/prefixing + env var parsing + optional signal handling.
src/turbomind/core/test_logger.cc Adds Catch2 tests for logger formatting, filtering, env vars, sync/async behavior, fatal/signal cases (POSIX-guarded).
src/turbomind/core/tensor.cc Updates warning logging to fmt-style.
src/turbomind/core/stream.h Updates CUDA destroy error logs to fmt-style.
src/turbomind/core/check.h Extends CheckErrorStream to store source location.
src/turbomind/core/check.cc Switches check failures to go through new logger fatal path.
src/turbomind/core/allocator.h Updates commented-out debug logging to fmt-style placeholders.
src/turbomind/comm/CMakeLists.txt Removes legacy logger linkage from comm libs.
src/turbomind/comm/host_comm.h Switches include to core logger.
src/turbomind/comm/env.h Switches include to core logger; converts env logging to fmt-style.
src/turbomind/comm/test_comm.cu Converts info logs to fmt-style; updates commented debug log format.
src/turbomind/comm/nccl/CMakeLists.txt Removes logger from nccl_comm linkage.
src/turbomind/comm/nccl/nccl.cu Switches include to core logger; converts warnings/errors and NCCL check message formatting.
src/turbomind/comm/gloo/CMakeLists.txt Removes logger from gloo_comm linkage.
src/turbomind/comm/gloo/tcp_store.cc Switches include to core logger; converts warning/error logs to fmt-style.
src/turbomind/comm/gloo/gloo_comm.cc Switches include to core logger; converts info log to fmt-style.
src/turbomind/comm/cuda_ipc/CMakeLists.txt Removes logger from cuda_ipc_comm linkage.
src/turbomind/comm/cuda_ipc/cuda_ipc_comm.cu Switches include to core logger; converts warnings to fmt-style.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 45 to 48
explicit TestFailureError(std::string name, std::string msg = "")
{
msg_ = fmtstr("TEST FAIL [%s] %s", name.c_str(), msg.c_str());
msg_ = fmtstr("TEST FAIL [%s] %s", name, msg.c_str());
}
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fmtstr uses snprintf/printf-style formatting. Passing name (a std::string) to a %s specifier is undefined behavior and can corrupt the error message or crash. Use name.c_str() (and msg.c_str()), or switch this to fmt::format/TM_LOG_*-style formatting instead of fmtstr.

Copilot uses AI. Check for mistakes.
Comment on lines 156 to 161
param.kind = cudaMemcpyDefault;

if (auto ec = cudaMemcpy3DAsync(&param, stream.handle()); ec == cudaSuccess) {
TM_LOG_WARNING(cudaGetErrorString(ec));
TM_LOG_WARN("{}", cudaGetErrorString(ec));
return;
}
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cudaMemcpy3DAsync return-code check is inverted: it logs a warning and returns when ec == cudaSuccess, which will emit a spurious "no error" message and short-circuit the normal fallback path. This should return silently on success, and only log/fallback when ec != cudaSuccess.

Copilot uses AI. Check for mistakes.

static void OnFatalSignal(int signum)
{
AsyncLogWorker::Instance().Stop();
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OnFatalSignal calls AsyncLogWorker::Stop(), which enqueues to a concurrent queue and join()s a std::thread. Those operations are not async-signal-safe and can deadlock/crash if the signal interrupts code holding internal locks/allocators. Consider making the signal handler minimal (e.g., disable async + best-effort write(2, ...)), or gate this draining behavior behind an opt-in env var with a safe default.

Suggested change
AsyncLogWorker::Instance().Stop();
#ifndef _WIN32
// Best-effort, async-signal-safe diagnostic.
static const char msg[] = "turbomind: caught fatal signal, aborting\n";
::write(STDERR_FILENO, msg, sizeof(msg) - 1);
#endif

Copilot uses AI. Check for mistakes.
Comment on lines +34 to 36
TM_LOG_FATAL("CUDA runtime error: {} {}:{}", _cudaGetErrorEnum(result), file, line);
std::abort();
}
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TM_LOG_FATAL is [[noreturn]] (it calls Logger::LogFatal which aborts), so the subsequent std::abort() is unreachable/redundant. Dropping the extra abort avoids dead code and potential compiler warnings.

Copilot uses AI. Check for mistakes.
Comment on lines 70 to +72
if (failures > tol_failures) {
TM_LOG_ERROR("%s (failures: %.2f%% atol: %.2e rtol: %.2e rel_gap: %.2e%%)",
name.c_str(), 100. * failures / size, atol, rtol, 100. * relative_gap);
TM_LOG_ERROR({} (failures: {:.2f}% atol: {:.2e} rtol: {:.2e} rel_gap: {:.2e}%)",
name, (double)(100. * failures / size), atol, rtol, 100. * relative_gap);
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This TM_LOG_ERROR call is missing the opening quote for the format string (TM_LOG_ERROR({} ...), which makes the file fail to compile. Also, this file still includes the removed legacy header (src/turbomind/utils/logger.h), so even after fixing the quote it should include src/turbomind/core/logger.h (or another header that provides the new {}-based TM_LOG_* macros).

Copilot uses AI. Check for mistakes.
Comment on lines 46 to +53
target_link_libraries( # Libs for test_attention_kernels
unittest PUBLIC
CUDA::cudart CUDA::curand
gpt_kernels gtest memory_utils tensor unfused_attention_kernels cuda_utils logger)
gpt_kernels gtest memory_utils tensor unfused_attention_kernels cuda_utils)
target_link_libraries( # Libs for test_logprob_kernels
unittest PUBLIC
CUDA::cudart
logprob_kernels memory_utils cuda_utils logger)
logprob_kernels memory_utils cuda_utils)
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logger was removed from the unittest link lineups, but the new TM_LOG_* macros now live in core/logger.h and require core (for turbomind::core::Logger implementation in logger.cc) to be linked into the unittest executable. As-is, unit tests that use TM_LOG_* are likely to hit undefined references at link time; add core (or a higher-level target that already brings it in) to the relevant target_link_libraries(unittest ...) lines.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants