Conversation

@IamLRBA
Contributor

@IamLRBA IamLRBA commented Jul 16, 2025

Description

This PR adds tracking of CPU and RAM utilization percentages to complement the existing energy measurements. The implementation uses psutil (which is already a project dependency) to collect:

  • cpu_utilization_percent: Current CPU usage percentage
  • ram_utilization_percent: Current RAM usage percentage
  • ram_used_gb: Current RAM usage in GB
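
As a rough illustration of the three metrics listed above, the snapshot below reads them with psutil. The helper names collect_utilization and bytes_to_gb are hypothetical and are not taken from this PR; psutil.cpu_percent() and psutil.virtual_memory() are the only real psutil calls used. Treating "GB" as 1024**3 bytes is an assumption.

```python
from typing import Dict

# Assumption: "GB" means binary gigabytes (1024**3 bytes).
BYTES_PER_GB = 1024 ** 3


def bytes_to_gb(num_bytes: int) -> float:
    """Convert a byte count to gigabytes."""
    return num_bytes / BYTES_PER_GB


def collect_utilization() -> Dict[str, float]:
    """Hypothetical helper: one instantaneous snapshot of the three metrics."""
    import psutil  # imported lazily so bytes_to_gb has no third-party dependency

    memory = psutil.virtual_memory()
    return {
        # With interval=None, psutil compares against the previous call,
        # so the very first call returns a meaningless 0.0.
        "cpu_utilization_percent": psutil.cpu_percent(interval=None),
        "ram_utilization_percent": memory.percent,
        "ram_used_gb": bytes_to_gb(memory.used),
    }
```

Because the first cpu_percent() reading is not meaningful, a caller would probably want to discard it or call the function once at startup.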

The main changes I made are:

  1. In core/cpu.py:
  • Added CPU and RAM utilization tracking using psutil in both IntelPowerGadget and IntelRAPL classes
  • Added cpu_utilization_percent, ram_utilization_percent, and ram_used_gb to the returned metrics
  2. In emissions_tracker.py:
  • Added cpu_utilization_percent, ram_utilization_percent, and ram_used_gb to the EmissionsData object in _prepare_emissions_data()

These metrics are collected using psutil and are now included in the emissions data output alongside the existing power measurements.

Related Issue

This PR resolves: #885

Motivation and Context

The change was requested to provide more detailed system resource utilization metrics alongside energy consumption data. This helps users:

  • Better understand the relationship between resource usage and energy consumption
  • Identify potential performance bottlenecks
  • Monitor system health during long-running experiments

How Has This Been Tested?

The changes were tested by:

  1. Running the tracker on different platforms (Linux, Windows, macOS)
  2. Verifying metrics are collected correctly in all tracking modes (process/machine)
  3. Checking output files/API to ensure new metrics are included
  4. Running existing test suite to ensure no regressions

Types of changes

What types of changes does your code introduce? Put an x in all the boxes that apply:

  • New feature (non-breaking change which adds functionality)

Checklist:

Go over all the following points, and put an x in all the boxes that apply.

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING.md document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

cc @benoit-cty, @sebasmos

@sebasmos

This is great stuff. Just a doubt, is this measuring RAM or vRAM?

@benoit-cty
Contributor

Hello @IamLRBA,
Thanks !

For the failing pre-commit, have a look at https://github.com/mlco2/codecarbon/blob/master/CONTRIBUTING.md#coding-style--linting

It seems you removed the empty line at the end of the file, which is not something we want: https://stackoverflow.com/questions/729692/why-should-text-files-end-with-a-newline

I will be away from the project for a few days, so I can't test your contribution.

@sebasmos The code uses virtual_memory(), which reports the physical memory on the motherboard (DRAM), not the VRAM of the GPU. VRAM could be measured with nvmlDeviceGetMemoryInfo(handle).
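
To make the RAM versus VRAM distinction concrete, here is a minimal sketch that reads each one separately. It assumes the pynvml bindings and an NVIDIA GPU are available for the VRAM part; the function names ram_percent, vram_percent and percent_used are hypothetical and not part of this PR.

```python
def percent_used(used_bytes: int, total_bytes: int) -> float:
    """Share of memory in use, as a percentage (0.0 if the total is unknown)."""
    if total_bytes <= 0:
        return 0.0
    return 100.0 * used_bytes / total_bytes


def ram_percent() -> float:
    """System DRAM utilization as reported by psutil."""
    import psutil  # lazy import: only needed when this function is called

    return psutil.virtual_memory().percent


def vram_percent(gpu_index: int = 0) -> float:
    """GPU memory utilization via NVML (assumes pynvml and an NVIDIA GPU)."""
    import pynvml  # lazy import: only needed when this function is called

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
        info = pynvml.nvmlDeviceGetMemoryInfo(handle)  # fields: total, free, used
        return percent_used(info.used, info.total)
    finally:
        pynvml.nvmlShutdown()
```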

@IamLRBA
Contributor Author

IamLRBA commented Aug 1, 2025

@benoit-cty I think I have addressed the changes that were requested.

@benoit-cty benoit-cty requested a review from a team as a code owner November 16, 2025 18:12
@benoit-cty
Contributor

Sorry for the delay.
Before merging, we need to:

  • Add unit tests
  • Update the documentation
  • Add these metrics to the API. What do you think, @SabAmine? Could we merge without the API support?

@SaboniAmine
Member

The API client is not impacted, so we can merge; it's OK for the API. We can implement it in a future PR if needed, but it looks good to me for now.
A test could be useful for this PR, though.

@benoit-cty
Contributor

Hello @IamLRBA ,

I made some improvements while testing Google Antigravity.

Implementation Plan: Improve CPU and RAM Utilization Tracking

Problem

The current implementation collects CPU and RAM utilization metrics at a single point in time when _prepare_emissions_data() is called. This provides instantaneous values that may not accurately represent the average resource usage during the measurement period. The user suggests computing averages over time by collecting these metrics in the _monitor_power() method, which runs every 1 second.

Background Context

Currently:

  • CPU and RAM utilization are collected in IntelPowerGadget.get_cpu_details() and IntelRAPL.get_cpu_details() in core/cpu.py
  • These instantaneous values are used directly in _prepare_emissions_data() in emissions_tracker.py
  • The _monitor_power() method already runs every 1 second to collect power measurements for hardware that doesn't support energy monitoring
  • The CPU class already has a _power_history list to track power measurements over time

Proposed Changes

[MODIFY] emissions_tracker.py

Add instance variables for tracking utilization history:

  • Add _cpu_utilization_history: List[float] = [] to store CPU utilization percentages
  • Add _ram_utilization_history: List[float] = [] to store RAM utilization percentages
  • Add _ram_used_history: List[float] = [] to store RAM usage in GB

Update _monitor_power() method (lines 801-810):

  • Collect CPU utilization using psutil.cpu_percent()
  • Collect RAM utilization using psutil.virtual_memory().percent
  • Collect RAM usage using psutil.virtual_memory().used / (1024**3)
  • Append these values to the respective history lists
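
The sampling step above could be sketched roughly as follows. UtilizationSampler, read_psutil_sample and the injectable read_sample parameter are hypothetical names used only for illustration; the real _monitor_power() method is not shown in this conversation, so this is a standalone approximation rather than the actual change.

```python
from typing import Callable, List, Tuple

Sample = Tuple[float, float, float]  # (cpu %, ram %, ram used in GB)


def read_psutil_sample() -> Sample:
    """Read one sample with the psutil calls named in the plan."""
    import psutil  # lazy import so the class below can be used without psutil

    memory = psutil.virtual_memory()
    return (psutil.cpu_percent(), memory.percent, memory.used / (1024 ** 3))


class UtilizationSampler:
    """Hypothetical stand-in for the history bookkeeping in the tracker."""

    def __init__(self, read_sample: Callable[[], Sample] = read_psutil_sample) -> None:
        self._read_sample = read_sample
        # Lists are created per instance so trackers never share history.
        self._cpu_utilization_history: List[float] = []
        self._ram_utilization_history: List[float] = []
        self._ram_used_history: List[float] = []

    def sample(self) -> None:
        """Meant to be called once per monitoring tick (every second in the plan)."""
        cpu_percent, ram_percent, ram_used_gb = self._read_sample()
        self._cpu_utilization_history.append(cpu_percent)
        self._ram_utilization_history.append(ram_percent)
        self._ram_used_history.append(ram_used_gb)
```

Passing the reader as a parameter is a design choice made here so the bookkeeping can be unit-tested with fixed values instead of live psutil readings.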

Update _prepare_emissions_data() method (lines 675-768):

  • Compute average CPU utilization from _cpu_utilization_history
  • Compute average RAM utilization from _ram_utilization_history
  • Compute average RAM usage from _ram_used_history
  • Use these averaged values when creating the EmissionsData object (lines 763-765)
  • Handle edge case when history lists are empty (use 0 or current value)
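
A minimal sketch of the averaging step with the empty-history edge case handled. The helper name average_or_default is hypothetical; whether the fallback should be 0 or a fresh instantaneous reading is left open by the plan, so the default is a parameter here.

```python
from statistics import fmean
from typing import Sequence


def average_or_default(history: Sequence[float], default: float = 0.0) -> float:
    """Mean of the collected samples, or `default` when nothing was sampled yet."""
    return fmean(history) if history else default
```

For example, average_or_default(cpu_history, default=current_cpu_percent) would fall back to the current value when no samples have been collected, where cpu_history and current_cpu_percent are placeholder names.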

Clear history on tracker start/reset:

  • Clear utilization history lists in the start() method
  • Clear utilization history lists in the start_task() method

[MODIFY] emissions_data.py

Add fields to EmissionsData dataclass:

  • cpu_utilization_percent: float = 0
  • ram_utilization_percent: float = 0
  • ram_used_gb: float = 0

Add fields to TaskEmissionsData dataclass:

  • cpu_utilization_percent: float = 0
  • ram_utilization_percent: float = 0
  • ram_used_gb: float = 0

Update compute_delta_emission() method:

  • These utilization metrics represent averages over the delta period, so they should not be subtracted
  • Keep the values as-is (they already represent the delta period)
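
The difference between cumulative and averaged fields can be shown with a simplified stand-in. EmissionsSnapshot and compute_delta are hypothetical names for illustration only; the real EmissionsData class has many more fields and its compute_delta_emission() method is not reproduced here.

```python
from dataclasses import dataclass, replace


@dataclass
class EmissionsSnapshot:
    """Hypothetical, heavily simplified stand-in for EmissionsData."""

    energy_consumed: float = 0.0          # cumulative, so a delta is meaningful
    cpu_utilization_percent: float = 0.0  # already an average over the period
    ram_utilization_percent: float = 0.0  # already an average over the period
    ram_used_gb: float = 0.0              # already an average over the period


def compute_delta(current: EmissionsSnapshot, previous: EmissionsSnapshot) -> EmissionsSnapshot:
    """Subtract cumulative fields only; averaged fields are copied unchanged."""
    return replace(
        current,
        energy_consumed=current.energy_consumed - previous.energy_consumed,
    )
```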

Add CPU, GPU, and RAM Utilization to Database and API

Overview

Based on the last two commits that added cpu_utilization_percent, gpu_utilization_percent, and ram_utilization_percent to the EmissionsData and TaskEmissionsData classes, this implementation will extend these fields to the database schema and API layer.

Proposed Changes

Database Models

[MODIFY] sql_models.py

Add three new columns to the Emission model:

  • cpu_utilization_percent (Float, nullable=True)
  • gpu_utilization_percent (Float, nullable=True)
  • ram_utilization_percent (Float, nullable=True)

These will be added after the existing power/energy fields.

API Schemas

[MODIFY] schemas.py

EmissionBase schema: Add three utilization fields with validation (must be >= 0 and <= 100)
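
The validation rule can be sketched with a standard-library dataclass. This is a stand-in: the plan mentions Pydantic schemas, where the same bounds would more likely be expressed as field constraints. The class name UtilizationFields is hypothetical, and allowing None mirrors the nullable database columns described in the plan.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UtilizationFields:
    """Hypothetical stand-in for the utilization part of an emission schema."""

    cpu_utilization_percent: Optional[float] = None
    gpu_utilization_percent: Optional[float] = None
    ram_utilization_percent: Optional[float] = None

    def __post_init__(self) -> None:
        # Enforce 0 <= value <= 100 for every field that is set.
        for name in (
            "cpu_utilization_percent",
            "gpu_utilization_percent",
            "ram_utilization_percent",
        ):
            value = getattr(self, name)
            if value is not None and not 0 <= value <= 100:
                raise ValueError(f"{name} must be between 0 and 100, got {value}")
```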

Report schemas: Add the three utilization fields to:

  • RunReport
  • ExperimentReport
  • ProjectReport
  • OrganizationReport

Repository Layer

[MODIFY] repository_emissions.py

Update two methods:

  • add_emission: Include the three utilization fields when creating the database emission object
  • map_sql_to_schema: Include the three utilization fields when mapping from SQL model to Pydantic schema

Database Migration

[NEW] Migration file in versions/

Create a new Alembic migration to add the three columns to the emissions table with:

  • Default value of NULL (nullable)
  • Float type
  • Proper upgrade and downgrade functions

IamLRBA and others added 5 commits November 19, 2025 19:12
- Implement CPU and RAM utilization percentage tracking using psutil
- Add cpu_utilization_percent, ram_utilization_percent, and ram_used_gb metrics
- Include utilization metrics in emissions data output
- Update IntelPowerGadget and IntelRAPL classes to collect utilization data

Resolves mlco2#885
@IamLRBA
Contributor Author

IamLRBA commented Nov 20, 2025

Hi @benoit-cty,

Thanks a lot for your detailed comment and for helping me to work on this!

I've read through your implementation plan, and it makes perfect sense. I really appreciate you building on the initial PR and fleshing out the complete picture, especially the parts about storing the averages in the data classes, updating the database schema, and adding it to the API.

It's great to see this feature being integrated so thoroughly. Please let me know if there's anything I can do to help test the new changes or assist in any other way.

Looking forward to seeing this merged!

Thanks!

Development

Successfully merging this pull request may close these issues.

Store RAM and CPU utilization, not only energy

4 participants