
Logging improvements #1132


Draft
wants to merge 5 commits into base: v1.x.x
2 changes: 2 additions & 0 deletions RELEASE_NOTES.md
@@ -60,3 +60,5 @@ This release includes a new `ConfigManager` class to simplify managing the confi
- Fix a bug in `BackgroundService` where it wouldn't `self.cancel()` and `await self.wait()` if there were no internal tasks. This prevented custom stop logic from being implemented properly without also redefining the `stop()` method.

- Fix a bug where a string passed to the `ConfigManagingActor` would be interpreted as a sequence of one-character strings.

- Remove a confusing log message and rate-limit another spammy log message.
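
For context on the `ConfigManagingActor` fix above, here is a minimal, generic illustration of the underlying Python pitfall; the `normalize_paths` helper is purely illustrative and not part of the SDK:

```python
# A str is itself an iterable of one-character strings, so an API that expects
# a sequence of path strings will happily iterate over the characters of a
# single path if a bare string is passed in.
paths = "config.toml"
print(list(paths))  # ['c', 'o', 'n', 'f', 'i', 'g', '.', 't', 'o', 'm', 'l']


# A common defensive pattern is to normalize a lone string into a list first.
def normalize_paths(paths: str | list[str]) -> list[str]:
    """Return a list of paths, wrapping a bare string into a one-element list."""
    if isinstance(paths, str):
        return [paths]
    return list(paths)


print(normalize_paths("config.toml"))  # ['config.toml']
print(normalize_paths(["a.toml", "b.toml"]))  # ['a.toml', 'b.toml']
```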
2 changes: 1 addition & 1 deletion mkdocs.yml
@@ -9,7 +9,7 @@ copyright: "Copyright © 2022 Frequenz Energy-as-a-Service GmbH"
repo_name: "frequenz-sdk-python"
repo_url: "https://github.com/frequenz-floss/frequenz-sdk-python"
edit_uri: "edit/v1.x.x/docs/"
strict: true # Treat warnings as errors
# strict: true # Treat warnings as errors
Contributor:
This can be removed; marshmallow-code/marshmallow#2739 is fixed.


# Build directories
theme:
300 changes: 300 additions & 0 deletions src/frequenz/sdk/_internal/_logging.py
@@ -0,0 +1,300 @@
# License: MIT
# Copyright © 2024 Frequenz Energy-as-a-Service GmbH

"""Logging utilities for the SDK."""

import logging
from collections.abc import Mapping
from datetime import datetime, timedelta
from types import TracebackType

_ExcInfoType = (
    bool
    | BaseException
    | tuple[None, None, None]
    | tuple[type[BaseException], BaseException, TracebackType | None]
    | None
)

DEFAULT_RATE_LIMIT = timedelta(minutes=15)
Contributor:
I would probably just put the literal timedelta() in the constructor to avoid the indirection in the docs. If you keep it this way, you should document it via a docstring so it appears in the docs and users know what the default is.
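
For reference, the documented-constant form suggested above would look roughly like this (the docstring wording is only illustrative):

```python
DEFAULT_RATE_LIMIT = timedelta(minutes=15)
"""Default time interval between two rate-limited log messages."""
```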


# The standard logging.py file uses variadic arguments in the logging methods, but the
# type hints file has more specific parameters. pylint is not able to handle this, so
# we need to suppress the warning.
#
# pylint: disable=arguments-differ


class RateLimitedLogger:
Contributor:
Any reason not to inherit from Logger, so this can be used in places where a plain Logger is expected? If you do, you probably only need to implement log().

And maybe this could be implemented as a Filter instead (here is an example filter to de-duplicate messages), but I'm not sure, because filters are applied at the Handler level, so it would apply to everything that's logged, or we would need to customize the config so that messages we want to rate-limit are handled by a different handler.

This looks quite interesting: https://github.com/samuller/log-rate-limit. It is a filter, but it allows overriding on every log call, allows grouping messages into streams, and each stream can be rate-limited individually.
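
For illustration, here is a minimal standard-library sketch of the Filter-based alternative described in this comment; the class name and interval are made up, and this is not what the PR implements:

```python
import logging
import time


class RateLimitFilter(logging.Filter):
    """Drop records arriving less than `interval` seconds after the last emitted one."""

    def __init__(self, interval: float = 900.0) -> None:
        super().__init__()
        self._interval = interval
        self._last_emitted: float | None = None

    def filter(self, record: logging.LogRecord) -> bool:
        now = time.monotonic()
        if self._last_emitted is None or now - self._last_emitted >= self._interval:
            self._last_emitted = now
            return True  # Let the record through.
        return False  # Suppress it.


# Attaching the filter to one specific logger (instead of a handler) keeps the
# rate limiting scoped to that logger rather than to everything that is logged.
noisy_logger = logging.getLogger("example.noisy")
noisy_logger.addFilter(RateLimitFilter(interval=5.0))
```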

"""Logger that limits the rate of logging messages.

The first message is logged immediately. Subsequent messages are ignored until the
rate limit interval has elapsed. After that the next request goes through, and so on.

This allows a new outage to be reported immediately and subsequent logs to be
rate-limited.

When an outage has been resolved, the `reset()` method may be used to reset the
logger and the next message will get logged immediately.
"""

    def __init__(
        self,
        logger: logging.Logger,
        rate_limit: timedelta = DEFAULT_RATE_LIMIT,
    ) -> None:
        """Initialize the logger.

        Args:
            logger: Logger to rate-limit.
            rate_limit: Time interval between two log messages.
        """
        self._logger = logger
        self._started: bool = False
        self._last_log_time: datetime | None = None
        self._rate_limit: timedelta = rate_limit

    def set_rate_limit(self, rate_limit: timedelta) -> None:
        """Set the rate limit for the logger.

        Args:
            rate_limit: Time interval between two log messages.
        """
        self._rate_limit = rate_limit
Comment on lines +55 to +63
Contributor:
Why not just make it public?

Suggested change
        self._rate_limit: timedelta = rate_limit

    def set_rate_limit(self, rate_limit: timedelta) -> None:
        """Set the rate limit for the logger.

        Args:
            rate_limit: Time interval between two log messages.
        """
        self._rate_limit = rate_limit
        self.rate_limit: timedelta = rate_limit
        """The rate limit for the logger."""


    def is_limiting(self) -> bool:
Contributor:
Suggested change
    def is_limiting(self) -> bool:
    @property
    def is_limiting(self) -> bool:

"""Return whether rate limiting is active.

This is true when a previous message has already been logged, and can be reset
by calling the `reset()` method.
"""
return self._started

    def reset(self) -> None:
        """Reset the logger to a healthy state."""
        self._started = False
        self._last_log_time = None

    def log(  # pylint: disable=too-many-arguments
        self,
        level: int,
        msg: object,
        *args: object,
        exc_info: _ExcInfoType = None,
        stack_info: bool = False,
        stacklevel: int = 1,
        extra: Mapping[str, object] | None = None,
    ) -> None:
        """Log a message.

        Args:
            level: Log level.
            msg: Log message.
            *args: Arguments for the log message.
            exc_info: Exception information.
            stack_info: Stack information.
            stacklevel: Stack level.
            extra: Extra information.
        """
        if self._rate_limit is None:
Contributor:
Why the check? It can't be None, right?

            self._logger.log(
                level,
                msg,
                *args,
                exc_info=exc_info,
                stack_info=stack_info,
                stacklevel=stacklevel,
                extra=extra,
            )
            return

        current_time = datetime.now()
        if (
            not self._started
            or self._last_log_time is None
            or (current_time - self._last_log_time) >= self._rate_limit
        ):
            self._logger.log(
                level,
                msg,
                *args,
                exc_info=exc_info,
                stack_info=stack_info,
                stacklevel=stacklevel,
                extra=extra,
            )
            self._last_log_time = current_time
            self._started = True

    def info(  # pylint: disable=too-many-arguments
        self,
        msg: object,
        *args: object,
        exc_info: _ExcInfoType = None,
        stack_info: bool = False,
        stacklevel: int = 1,
        extra: Mapping[str, object] | None = None,
    ) -> None:
        """Log an info message.

        Args:
            msg: Log message.
            *args: Arguments for the log message.
            exc_info: Exception information.
            stack_info: Stack information.
            stacklevel: Stack level.
            extra: Extra information.
        """
        self.log(
            logging.INFO,
            msg,
            *args,
            exc_info=exc_info,
            stack_info=stack_info,
            stacklevel=stacklevel,
            extra=extra,
        )

    def debug(  # pylint: disable=too-many-arguments
        self,
        msg: object,
        *args: object,
        exc_info: _ExcInfoType = None,
        stack_info: bool = False,
        stacklevel: int = 1,
        extra: Mapping[str, object] | None = None,
    ) -> None:
        """Log a debug message.

        Args:
            msg: Log message.
            *args: Arguments for the log message.
            exc_info: Exception information.
            stack_info: Stack information.
            stacklevel: Stack level.
            extra: Extra information.
        """
        self.log(
            logging.DEBUG,
            msg,
            *args,
            exc_info=exc_info,
            stack_info=stack_info,
            stacklevel=stacklevel,
            extra=extra,
        )

    def warning(  # pylint: disable=too-many-arguments
        self,
        msg: object,
        *args: object,
        exc_info: _ExcInfoType = None,
        stack_info: bool = False,
        stacklevel: int = 1,
        extra: Mapping[str, object] | None = None,
    ) -> None:
        """Log a warning message.

        Args:
            msg: Log message.
            *args: Arguments for the log message.
            exc_info: Exception information.
            stack_info: Stack information.
            stacklevel: Stack level.
            extra: Extra information.
        """
        self.log(
            logging.WARNING,
            msg,
            *args,
            exc_info=exc_info,
            stack_info=stack_info,
            stacklevel=stacklevel,
            extra=extra,
        )

    def critical(  # pylint: disable=too-many-arguments
        self,
        msg: object,
        *args: object,
        exc_info: _ExcInfoType = None,
        stack_info: bool = False,
        stacklevel: int = 1,
        extra: Mapping[str, object] | None = None,
    ) -> None:
        """Log a critical message.

        Args:
            msg: Log message.
            *args: Arguments for the log message.
            exc_info: Exception information.
            stack_info: Stack information.
            stacklevel: Stack level.
            extra: Extra information.
        """
        self.log(
            logging.CRITICAL,
            msg,
            *args,
            exc_info=exc_info,
            stack_info=stack_info,
            stacklevel=stacklevel,
            extra=extra,
        )

    def error(  # pylint: disable=too-many-arguments
        self,
        msg: object,
        *args: object,
        exc_info: _ExcInfoType = None,
        stack_info: bool = False,
        stacklevel: int = 1,
        extra: Mapping[str, object] | None = None,
    ) -> None:
        """Log an error message.

        Args:
            msg: Log message.
            *args: Arguments for the log message.
            exc_info: Exception information.
            stack_info: Stack information.
            stacklevel: Stack level.
            extra: Extra information.
        """
        self.log(
            logging.ERROR,
            msg,
            *args,
            exc_info=exc_info,
            stack_info=stack_info,
            stacklevel=stacklevel,
            extra=extra,
        )

    def exception(  # pylint: disable=too-many-arguments
        self,
        msg: object,
        *args: object,
        exc_info: _ExcInfoType = True,
        stack_info: bool = False,
        stacklevel: int = 1,
        extra: Mapping[str, object] | None = None,
    ) -> None:
        """Log an exception message.

        Args:
            msg: Log message.
            *args: Arguments for the log message.
            exc_info: Exception information.
            stack_info: Stack information.
            stacklevel: Stack level.
            extra: Extra information.
        """
        self.error(
            msg,
            *args,
            exc_info=exc_info,
            stack_info=stack_info,
            stacklevel=stacklevel,
            extra=extra,
        )
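
To show how the class added above is intended to be used, here is a small usage sketch; the `poll` function and its `connection_ok` flag are placeholders rather than SDK code:

```python
import logging
from datetime import timedelta

# RateLimitedLogger is the class added in this PR.
from frequenz.sdk._internal._logging import RateLimitedLogger

_logger = logging.getLogger(__name__)
_outage_logger = RateLimitedLogger(_logger, rate_limit=timedelta(minutes=5))


def poll(connection_ok: bool) -> None:
    """Placeholder polling step showing the intended logging pattern."""
    if not connection_ok:
        # The first failure is logged immediately; repeats within the rate
        # limit interval are silently dropped.
        _outage_logger.warning("Connection to component lost, retrying...")
    elif _outage_logger.is_limiting():
        # An outage was reported earlier; log the recovery once and re-arm
        # the rate limiter so the next outage is again reported immediately.
        _logger.info("Connection to component restored.")
        _outage_logger.reset()
```

Calling `reset()` on recovery re-arms the limiter, which matches the behavior described in the class docstring: a new outage is reported right away while repeated reports are rate-limited.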
@@ -400,7 +400,6 @@ def _check_request(
            return Error(request=request, msg="Empty battery IDs in the request")

        for battery in request.component_ids:
            _logger.debug("Checking battery %d", battery)
            if battery not in self._battery_caches:
                msg = (
                    f"No battery {battery}, available batteries: "
@@ -10,7 +10,7 @@
import math
from abc import ABC, abstractmethod
from collections.abc import Iterable
from datetime import datetime, timezone
from datetime import datetime, timedelta, timezone
from typing import Any, Generic, Self, TypeVar

from frequenz.channels import ChannelClosedError, Receiver
@@ -22,6 +22,7 @@
    InverterData,
)

from ..._internal import _logging
from ..._internal._asyncio import AsyncConstructible
from ..._internal._constants import MAX_BATTERY_DATA_AGE_SEC
from ...microgrid import connection_manager
@@ -33,6 +34,11 @@

_logger = logging.getLogger(__name__)

_missing_data_logger = _logging.RateLimitedLogger(
    _logger,
    timedelta(minutes=5),
)

T = TypeVar("T", bound=ComponentData)
"""Type variable for component data."""

@@ -120,6 +126,12 @@ async def fetch_next(self) -> ComponentMetricsData | None:
            data = await asyncio.wait_for(
                self._receiver.receive(), self._max_waiting_time
            )
            if _missing_data_logger.is_limiting():
                _missing_data_logger.reset()
                _missing_data_logger.debug(
                    "Component %d has started sending data.", self._component_id
                )
            _missing_data_logger.reset()
Comment on lines +130 to +134
Contributor:
Couldn't you just log using the regular _logger and only reset() here?

Suggested change
                _missing_data_logger.reset()
                _missing_data_logger.debug(
                    "Component %d has started sending data.", self._component_id
                )
            _missing_data_logger.reset()
                _logger.debug(
                    "Component %d has started sending data.", self._component_id
                )
            _missing_data_logger.reset()


        except ChannelClosedError:
            _logger.exception(
@@ -128,7 +140,9 @@ async def fetch_next(self) -> ComponentMetricsData | None:
            return None
        except asyncio.TimeoutError:
            # Next time wait infinitely until we receive any message.
            _logger.debug("Component %d stopped sending data.", self._component_id)
            _missing_data_logger.debug(
                "Component %d stopped sending data.", self._component_id
Contributor:
I would rephrase this, because when it is repeated one might think that the component just stopped sending data now. Also, since we have had instances where the problem was in the data pipeline and not the component itself, it is maybe more accurate to say we are not receiving data.

Suggested change
                "Component %d stopped sending data.", self._component_id
                "No data received for component %d.", self._component_id

            )
            return ComponentMetricsData(
                self._component_id, datetime.now(tz=timezone.utc), {}
            )