
Added PatchTSMixer model#404

Open
Jasmine-Yuting-Zhang wants to merge 34 commits into main from PatchTSMixer

Conversation

@Jasmine-Yuting-Zhang
Collaborator

This PR introduces support for the PatchTSMixer model in the Plato federated learning framework for time series forecasting tasks.

Description

Specifically, this PR:

  • Added ETT.py to support the Electricity Transformer Temperature (ETT) dataset, including data loading, preprocessing, and federated partitioning logic.
  • Integrated the PatchTSMixer model architecture from HuggingFace Transformers for time series forecasting within Plato.
  • Added TOML configuration files for PatchTSMixer experiments under configs/TimeSeries/.
  • Added mean squared error (MSE)–based evaluation for PatchTSMixer experiments.
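The MSE-based evaluation mentioned above can be sketched in plain Python. This is a minimal illustration, not the implementation in this PR; the `mse` helper is hypothetical:

```python
def mse(predictions, targets):
    """Mean squared error between two equal-length sequences of floats."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)

# Perfect predictions give an MSE of 0.0.
print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
# Errors of 1 and 2 give ((1)^2 + (2)^2) / 2 = 2.5.
print(mse([1.0, 2.0], [2.0, 4.0]))  # 2.5
```

In the federated setting, each client would typically compute this over its local test split and the server would aggregate the per-client values.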

How has this been tested?

Quick check evaluation:

uv run python plato.py --config configs/TimeSeries/patchtsmixer_custom.toml

This configuration runs only 3 rounds, which is useful for quick functional tests and CORE-style checks. The run completed successfully without runtime errors.

Longer training run:

uv run python plato.py --config configs/TimeSeries/patchtsmixer_large.toml

This configuration uses more rounds. After 400 rounds, the MSE dropped from 7.14 to around 1.30, indicating that the model and data pipeline are working as expected.

Types of changes

  • Bug fix (non-breaking change which fixes an issue) Fixes #
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My code has been formatted using the Ruff formatter (ruff format) and checked using the Ruff linter (ruff check --fix).
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

baochunli and others added 30 commits October 28, 2025 08:51
- Resolved a RuntimeError caused by non-contiguous tensors during view operations (in nanochat's gpt.py):
"view size is not compatible with input tensor's size and stride...". Replaced .view() with .reshape()
- Resolved an issue where the configuration requested 'train_loss' in the results, but the server's get_logged_items() did not include it.
- Avoided a vocabulary size mismatch between the model and the tokenizer during CORE evaluation.
- Updated log message from "global accuracy" to "Average Centered CORE benchmark metric"
- Used ruff to format code
- Added instructions for initializing submodules and resolving maturin build failure.
- Included configurations for both pre-trained and custom modes.
@netlify

netlify bot commented Dec 1, 2025

Deploy Preview for platodocs canceled.

Name Link
🔨 Latest commit 20ab574
🔍 Latest deploy log https://app.netlify.com/projects/platodocs/deploys/692f507818245f0008203aed

- Used Open-Meteo Archive API for hourly inputs.
- Interpolated to 5-min resolution with a linear method.
- Added TOML config files (tunable for better results).
- Formatted code with ruff.
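The hourly-to-5-minute linear interpolation mentioned in the commits above can be sketched in pure Python (the actual pipeline presumably uses pandas-style resampling; the function name here is illustrative):

```python
def interpolate_linear(hourly_values, steps_per_hour=12):
    """Linearly interpolate between consecutive hourly samples.

    With steps_per_hour=12, each hour is split into 5-minute steps.
    Returns a list spanning the input range, inclusive of the last sample.
    """
    out = []
    for a, b in zip(hourly_values, hourly_values[1:]):
        for k in range(steps_per_hour):
            # Fraction k/steps_per_hour of the way from a to b.
            out.append(a + (b - a) * k / steps_per_hour)
    out.append(hourly_values[-1])
    return out

# Two hourly samples become 13 evenly spaced 5-minute points.
values = interpolate_linear([10.0, 22.0], steps_per_hour=12)
```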
@netlify

netlify bot commented Feb 3, 2026

Deploy Preview for platodocs ready!

Name Link
🔨 Latest commit e76f09a
🔍 Latest deploy log https://app.netlify.com/projects/platodocs/deploys/6981786b87c6af00084cf859
😎 Deploy Preview https://deploy-preview-404--platodocs.netlify.app

)
logging.info(
    "Location: lat=%.2f, lon=%.2f, historical_days=%d",
    latitude,

Check failure

Code scanning / CodeQL

Clear-text logging of sensitive information High

This expression logs sensitive data (private) as clear text.

Copilot Autofix

AI 4 days ago

In general, the fix is to avoid logging sensitive data such as raw geographic coordinates. Instead, log a non‑sensitive label or a redacted/generalized form that still provides observability without exposing private information.

Concretely for plato/datasources/openmeteo.py, we should change the logging.info call that currently logs lat=%.2f, lon=%.2f, historical_days=%d with latitude, longitude, and historical_days. The simplest safe approach that preserves intent is to stop logging the numeric coordinates and keep only non‑sensitive context such as location_name (already logged in the previous logging.info) and historical_days. For example, we can log "Location configuration: historical_days=%d" or "Location configuration: name=%s, historical_days=%d" using location_name instead of coordinates. This keeps functionality identical; only the log message changes.

No new imports or helper methods are required; we just modify the existing log statement in that file/region.
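The suggested fix can be sketched as follows, assuming `location_name` and `historical_days` are in scope (both names come from the autofix text above; this snippet builds the message separately only so it can be inspected):

```python
import logging

def location_log_message(location_name, historical_days):
    """Build the redacted log line: no raw latitude/longitude appears."""
    return "Location configuration: name=%s, historical_days=%d" % (
        location_name,
        historical_days,
    )

# The coordinates are still used elsewhere for the API request;
# only the log output changes.
message = location_log_message("Toronto", 365)
logging.getLogger(__name__).info(message)
```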

Suggested changeset 1
plato/datasources/openmeteo.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/plato/datasources/openmeteo.py b/plato/datasources/openmeteo.py
--- a/plato/datasources/openmeteo.py
+++ b/plato/datasources/openmeteo.py
@@ -142,9 +142,8 @@
             task_config["description"],
         )
         logging.info(
-            "Location: lat=%.2f, lon=%.2f, historical_days=%d",
-            latitude,
-            longitude,
+            "Location configuration: name=%s, historical_days=%d",
+            location_name,
             historical_days,
         )
         logging.info("Variables: %s", ", ".join(variables))
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
logging.info(
    "Location: lat=%.2f, lon=%.2f, historical_days=%d",
    latitude,
    longitude,

Check failure

Code scanning / CodeQL

Clear-text logging of sensitive information High

This expression logs sensitive data (private) as clear text.

Copilot Autofix

AI 4 days ago

In general, to fix clear-text logging of sensitive information, either stop logging the sensitive fields entirely, or sanitize them so that only non-sensitive/less sensitive derivatives (e.g., coarse-grained, masked, or redacted values) are logged. The rest of the functionality (in this case, fetching weather data based on actual coordinates) should continue to use the full-precision values; only the log output should change.

Here, the best minimal fix is to avoid logging the raw latitude and longitude in clear text while preserving useful diagnostic context. We can do this by:

  • Removing latitude and longitude from the formatted log line, and instead
  • Logging only non-sensitive, high-level information, such as location_name, historical_days, and the selected task_type/description; or
  • If coordinates are still desired for debugging, logging a coarse/rounded or redacted version (e.g., to the nearest whole degree or replacing them with [REDACTED]).

To keep changes minimal and avoid assumptions about what is sensitive, I will treat the numeric coordinates as sensitive and remove them from the log message, while still logging historical_days. Concretely, in plato/datasources/openmeteo.py:

  • Locate the logging.info call around lines 144–149 that logs "Location: lat=%.2f, lon=%.2f, historical_days=%d" with latitude, longitude, historical_days.
  • Replace it with a log line that does not include latitude or longitude in clear text, for example: "Location configured: historical_days=%d" or "Location configured for %s: historical_days=%d" using location_name and historical_days.
  • No new imports or helper functions are needed; we only change the string and arguments of the existing log call.

This change ensures that the tainted longitude (and latitude) no longer flow into the logging sink, addressing all alert variants referencing that call, while leaving how the coordinates are used elsewhere untouched.

Suggested changeset 1
plato/datasources/openmeteo.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/plato/datasources/openmeteo.py b/plato/datasources/openmeteo.py
--- a/plato/datasources/openmeteo.py
+++ b/plato/datasources/openmeteo.py
@@ -142,9 +142,8 @@
             task_config["description"],
         )
         logging.info(
-            "Location: lat=%.2f, lon=%.2f, historical_days=%d",
-            latitude,
-            longitude,
+            "Location configured for %s: historical_days=%d",
+            location_name,
             historical_days,
         )
         logging.info("Variables: %s", ", ".join(variables))
EOF
) -> str:
    """Generate a unique cache key based on request parameters."""
    key_string = f"{latitude}_{longitude}_{start_date}_{end_date}_{'_'.join(sorted(variables))}_{target_freq}"
    return hashlib.md5(key_string.encode()).hexdigest()

Check failure

Code scanning / CodeQL

Use of a broken or weak cryptographic hashing algorithm on sensitive data High

Sensitive data (private) is used in a hashing algorithm (MD5) that is insecure.

Copilot Autofix

AI 4 days ago

In general, to fix this kind of issue you should avoid MD5 (and other broken hashes like SHA‑1) when hashing potentially sensitive data, even if only for identifiers. Instead, use a modern, collision-resistant hash function such as SHA‑256 (for general hashing) or a dedicated password hashing scheme for credentials. For non-security uses like cache keys, SHA‑256 is a drop‑in replacement for MD5.

The single best fix here is to change _generate_cache_key in plato/utils/openmeteo_api.py to use hashlib.sha256 instead of hashlib.md5. This preserves the behavior (a deterministic hex string derived from the same input) but uses a strong hash. No other logic needs to change, and all callers will continue to work since the function still returns a hex string. We should also keep the hashlib import, since we are still using it.

Concretely:

  • In plato/utils/openmeteo_api.py, update line 29:
    • From: return hashlib.md5(key_string.encode()).hexdigest()
    • To: return hashlib.sha256(key_string.encode()).hexdigest()
  • No changes are required in plato/datasources/openmeteo.py or elsewhere.
  • No new imports or helper methods are needed; hashlib.sha256 is part of the standard library and already available via the existing import hashlib.
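The swap really is a one-line change: both functions return deterministic hex strings, so existing cache-lookup logic keeps working. This standalone snippet just contrasts the two digests (the `key_string` value is made up for illustration):

```python
import hashlib

key_string = "43.65_-79.38_2024-01-01_2024-12-31_temperature_5min"

# MD5: 128-bit digest, 32 hex characters -- considered broken.
md5_key = hashlib.md5(key_string.encode()).hexdigest()
# SHA-256: 256-bit digest, 64 hex characters -- a strong drop-in replacement.
sha256_key = hashlib.sha256(key_string.encode()).hexdigest()

# Both are deterministic: hashing the same string twice yields the same key,
# so callers that treat the key as an opaque string are unaffected.
```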
Suggested changeset 1
plato/utils/openmeteo_api.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/plato/utils/openmeteo_api.py b/plato/utils/openmeteo_api.py
--- a/plato/utils/openmeteo_api.py
+++ b/plato/utils/openmeteo_api.py
@@ -26,7 +26,7 @@
 ) -> str:
     """Generate a unique cache key based on request parameters."""
     key_string = f"{latitude}_{longitude}_{start_date}_{end_date}_{'_'.join(sorted(variables))}_{target_freq}"
-    return hashlib.md5(key_string.encode()).hexdigest()
+    return hashlib.sha256(key_string.encode()).hexdigest()
 
 
 def _get_cache_path(cache_dir: Path, cache_key: str) -> Path:
EOF
