Data platform Forecast page #381
Status: Open
peterdudfield wants to merge 60 commits into main from data-platform
Commits (60, all by peterdudfield):
29249f5 first bit of work
cc50553 add to readme
9803a66 add to todo list
4779ec0 add todo
b159482 stream forecast more efficiently
cbb10ea tidy up observations
46c7792 tidy up time window
fb83cc4 dp 0.13.1
5ee30e2 Use Watts not %
ce8497b load 30 days of data
834581c add metrics table
65bf1e2 add data caching
963833d move back to 7 days
72ca134 add caching
bda0da3 move data to new file
75ab761 add colours to main plot
dc1b91f update import
a111be7 scale by units and add colours
5a04423 add probabilistic
d538064 add forecast type options, add daily MAE options
7791489 add daily ME
095d495 add two todos
d10135f solve for different forecast versions
2a57b66 remove from todo
85dd00a add legendgroup
6d0b280 filter on pvlive_day_after
fa6af55 add todo: bug not releasing cache
d3bdaf6 refactor into multiple files
2d6ad59 increase forecast window to 30 days
ccf2c87 add init files
144ccf4 fix import
10bbf6e add more todos
a0faf6b add TODOs
1886cb5 use MW by default on UK-National
5cf060c add gsp id to name
c427a77 reduce to 7 days
b0dd9ae fix for MAE plot
e5b137a have option to show sem
c83446e forecast vs actual
1c0a0b3 remove duplicate in daily MAE plot
c47f8b1 minus 1 sec, so we don't get observations on the next day
6ceaad3 tidy
91f60aa option for aligning t0s
4b30bdb MAE plot link to 0
9603b9c try to sort cache issue out
b861b7d add select t0s from forecast
839cd21 tidy
86c6f8c add todo
44b08ba cache more functions
ab92c25 ruff
0f4058e Feedback, add details
3af7531 robustness against no forecast data
4a0ff33 release cache data every 5 mins
6978935 PR comments
2b2da5b add option for strict forecast filtering
b1bee0b tidy
50e0ae2 PR comments, use agg better
5232456 use p10_fraction, rather than p10
612ebbf add _fraction to column from other_statistics_fractions column
0726b10 lint
Two new empty files.
dataplatform/forecast/cache.py (new file, 27 lines):

```python
"""Cache utilities for the forecast module."""

from datetime import UTC, datetime, timedelta

from dp_sdk.ocf import dp

from dataplatform.forecast.constant import cache_seconds


def key_builder_remove_client(func: callable, *args: list, **kwargs: dict) -> str:
    """Custom key builder that ignores the client argument for caching purposes."""
    key = f"{func.__name__}:"
    for arg in args:
        if not isinstance(arg, dp.DataPlatformDataServiceStub):
            key += f"{arg}-"

    for k, v in kwargs.items():
        key += f"{k}={v}-"

    # Round the current time down to the nearest 5 minutes; this forces a new cache key every 5 minutes
    current_time = datetime.now(UTC).replace(second=0, microsecond=0)
    current_time = current_time - timedelta(
        minutes=current_time.minute % (int(cache_seconds / 60)),
    )
    key += f"time={current_time}-"

    return key
```
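For illustration only (not part of the diff): calling the key builder directly shows how the 5-minute time bucket is baked into the cache key. The `example` function and its arguments are hypothetical.

```python
from dataplatform.forecast.cache import key_builder_remove_client


def example(location_uuid: str, day: str) -> None:
    """Hypothetical stand-in for a cached data-fetching coroutine."""


# Positional args are joined into the key (a DataPlatformDataServiceStub argument
# would be skipped), then the current time, rounded down to the 5-minute bucket,
# is appended. The key, and hence the cache entry, therefore rolls over every
# 5 minutes, e.g.
# "example:uuid-123-2024-01-06-time=2024-01-06 12:05:00+00:00-"
print(key_builder_remove_client(example, "uuid-123", "2024-01-06"))
```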
dataplatform/forecast/constant.py (new file, 24 lines):

```python
"""Constants for the forecast module."""

colours = [
    "#FFD480",
    "#FF8F73",
    "#4675C1",
    "#65B0C9",
    "#58B0A9",
    "#FAA056",
    "#306BFF",
    "#FF4901",
    "#B701FF",
    "#17E58F",
]

metrics = {
    "MAE": "MAE is mean absolute error, average(abs(forecast-actual))",
    "ME": "ME is mean (bias) error, average(forecast-actual)",
}

cache_seconds = 300  # 5 minutes

# These are used for a specific case: the UK National and GSP locations
observer_names = ["pvlive_in_day", "pvlive_day_after"]
```
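As a quick sanity check of the two metric definitions above, here is a tiny worked example with made-up numbers:

```python
import pandas as pd

# Hypothetical forecast/actual pairs, in watts
df = pd.DataFrame({
    "forecast": [100.0, 250.0, 400.0],
    "actual": [120.0, 230.0, 410.0],
})

error = df["forecast"] - df["actual"]  # -20, +20, -10
mae = error.abs().mean()               # (20 + 20 + 10) / 3 ≈ 16.67
me = error.mean()                      # (-20 + 20 - 10) / 3 ≈ -3.33
print(f"MAE={mae:.2f} W, ME={me:.2f} W")
```

MAE measures the average error magnitude, while ME keeps the sign and so reveals any systematic over- or under-forecasting bias.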
New file (243 lines):

```python
"""Functions to get forecast and observation data from Data Platform."""

import time
from datetime import datetime, timedelta

import betterproto
import pandas as pd
from aiocache import Cache, cached
from dp_sdk.ocf import dp

from dataplatform.forecast.cache import key_builder_remove_client
from dataplatform.forecast.constant import cache_seconds, observer_names


async def get_forecast_data(
    client: dp.DataPlatformDataServiceStub,
    location: dp.ListLocationsResponseLocationSummary,
    start_date: datetime,
    end_date: datetime,
    selected_forecasters: list[dp.Forecaster],
) -> pd.DataFrame:
    """Get forecast data for the given location and time window."""
    all_data_df = []

    for forecaster in selected_forecasters:
        forecaster_data_df = await get_forecast_data_one_forecaster(
            client,
            location,
            start_date,
            end_date,
            forecaster,
        )
        if forecaster_data_df is not None:
            all_data_df.append(forecaster_data_df)

    all_data_df = pd.concat(all_data_df, ignore_index=True)

    all_data_df["effective_capacity_watts"] = all_data_df["effective_capacity_watts"].astype(float)

    # get watt values from the capacity fractions
    all_data_df["p50_watts"] = all_data_df["p50_fraction"] * all_data_df["effective_capacity_watts"]

    for col in ["p10", "p25", "p75", "p90"]:
        col_fraction = f"{col}_fraction"
        if col_fraction in all_data_df.columns:
            all_data_df[f"{col}_watts"] = (
                all_data_df[col_fraction] * all_data_df["effective_capacity_watts"]
            )

    return all_data_df


@cached(ttl=cache_seconds, cache=Cache.MEMORY, key_builder=key_builder_remove_client)
async def get_forecast_data_one_forecaster(
    client: dp.DataPlatformDataServiceStub,
    location: dp.ListLocationsResponseLocationSummary,
    start_date: datetime,
    end_date: datetime,
    selected_forecaster: dp.Forecaster,
) -> pd.DataFrame | None:
    """Get forecast data for one forecaster for the given location and time window."""
    all_data_list_dict = []

    # Grab all the data in chunks of 30 days, to avoid overly large requests
    temp_start_date = start_date
    while temp_start_date <= end_date:
        temp_end_date = min(temp_start_date + timedelta(days=30), end_date)

        # fetch data
        stream_forecast_data_request = dp.StreamForecastDataRequest(
            location_uuid=location.location_uuid,
            energy_source=dp.EnergySource.SOLAR,
            time_window=dp.TimeWindow(
                start_timestamp_utc=temp_start_date,
                end_timestamp_utc=temp_end_date,
            ),
            forecasters=[selected_forecaster],
        )
        forecasts = []
        async for chunk in client.stream_forecast_data(stream_forecast_data_request):
            forecasts.append(
                chunk.to_dict(include_default_values=True, casing=betterproto.Casing.SNAKE),
            )

        if len(forecasts) > 0:
            all_data_list_dict.extend(forecasts)

        temp_start_date = temp_start_date + timedelta(days=30)

    all_data_df = pd.DataFrame.from_dict(all_data_list_dict)
    if len(all_data_df) == 0:
        return None

    # expand the plevels into their own columns and rename them with a "_fraction" suffix
    columns_before_expand = set(all_data_df.columns)
    all_data_df = all_data_df.pipe(
        lambda df: df.join(pd.json_normalize(df["other_statistics_fractions"])),
    ).drop("other_statistics_fractions", axis=1)
    new_columns = set(all_data_df.columns) - columns_before_expand
    if len(new_columns) > 0:
        all_data_df = all_data_df.rename(columns={col: f"{col}_fraction" for col in new_columns})

    # create column forecaster_name: forecaster_fullname with the version removed
    all_data_df["forecaster_name"] = all_data_df["forecaster_fullname"].apply(
        lambda x: x.rsplit(":", 1)[0],  # split from right, max 1 split
    )

    return all_data_df


@cached(ttl=cache_seconds, cache=Cache.MEMORY, key_builder=key_builder_remove_client)
async def get_all_observations(
    client: dp.DataPlatformDataServiceStub,
    location: dp.ListLocationsResponseLocationSummary,
    start_date: datetime,
    end_date: datetime,
) -> pd.DataFrame:
    """Get all observations for the given location and time window."""
    all_observations_df = []

    for observer_name in observer_names:
        # Get all the observations for this observer_name, in chunks of 7 days
        observation_one_df = []
        temp_start_date = start_date
        while temp_start_date <= end_date:
            temp_end_date = min(temp_start_date + timedelta(days=7), end_date)

            get_observations_request = dp.GetObservationsAsTimeseriesRequest(
                observer_name=observer_name,
                location_uuid=location.location_uuid,
                energy_source=dp.EnergySource.SOLAR,
                time_window=dp.TimeWindow(temp_start_date, temp_end_date),
            )
            get_observations_response = await client.get_observations_as_timeseries(
                get_observations_request,
            )

            observations = []
            for chunk in get_observations_response.values:
                observations.append(
                    chunk.to_dict(include_default_values=True, casing=betterproto.Casing.SNAKE),
                )

            observation_one_df.append(pd.DataFrame.from_dict(observations))

            temp_start_date = temp_start_date + timedelta(days=7)

        observation_one_df = pd.concat(observation_one_df, ignore_index=True)
        observation_one_df = observation_one_df.sort_values(by="timestamp_utc")
        observation_one_df["observer_name"] = observer_name

        all_observations_df.append(observation_one_df)

    all_observations_df = pd.concat(all_observations_df, ignore_index=True)

    all_observations_df["effective_capacity_watts"] = all_observations_df[
        "effective_capacity_watts"
    ].astype(float)

    all_observations_df["value_watts"] = (
        all_observations_df["value_fraction"] * all_observations_df["effective_capacity_watts"]
    )
    all_observations_df["timestamp_utc"] = pd.to_datetime(all_observations_df["timestamp_utc"])

    return all_observations_df


async def get_all_data(
    client: dp.DataPlatformDataServiceStub,
    selected_location: dp.ListLocationsResponseLocationSummary,
    start_date: datetime,
    end_date: datetime,
    selected_forecasters: list[dp.Forecaster],
) -> dict:
    """Get all forecast and observation data, and merge them."""
    # get generation (observation) data
    time_start = time.time()
    all_observations_df = await get_all_observations(
        client,
        selected_location,
        start_date,
        end_date,
    )
    observation_seconds = time.time() - time_start

    # get all forecast data
    time_start = time.time()
    all_forecast_data_df = await get_forecast_data(
        client,
        selected_location,
        start_date,
        end_date,
        selected_forecasters,
    )
    forecast_seconds = time.time() - time_start

    # If the observation data includes both pvlive_day_after and pvlive_in_day,
    # then just take pvlive_day_after
    one_observations_df = all_observations_df.copy()
    if "pvlive_day_after" in all_observations_df["observer_name"].values:
        one_observations_df = all_observations_df[
            all_observations_df["observer_name"] == "pvlive_day_after"
        ]

    # make target_timestamp_utc from the init time plus the horizon
    all_forecast_data_df["init_timestamp"] = pd.to_datetime(all_forecast_data_df["init_timestamp"])
    all_forecast_data_df["target_timestamp_utc"] = all_forecast_data_df[
        "init_timestamp"
    ] + pd.to_timedelta(all_forecast_data_df["horizon_mins"], unit="m")

    # Merge the forecast data with the observations on target timestamp, so that
    # errors between p50_watts and the observed value_watts can be calculated
    # (e.g. MAE grouped by horizon_mins and forecaster_fullname)
    merged_df = pd.merge(
        all_forecast_data_df,
        one_observations_df,
        left_on=["target_timestamp_utc"],
        right_on=["timestamp_utc"],
        how="inner",
        suffixes=("_forecast", "_observation"),
    )

    # error and absolute error
    merged_df["error"] = merged_df["p50_watts"] - merged_df["value_watts"]
    merged_df["absolute_error"] = merged_df["error"].abs()

    return {
        "merged_df": merged_df,
        "all_forecast_data_df": all_forecast_data_df,
        "all_observations_df": all_observations_df,
        "forecast_seconds": forecast_seconds,
        "observation_seconds": observation_seconds,
    }


def align_t0(merged_df: pd.DataFrame) -> pd.DataFrame:
    """Align t0 forecasts for different forecasters."""
    # number of unique forecasters
    num_forecasters = merged_df["forecaster_name"].nunique()
    # count the number of forecasters that have each t0 time
    counts = merged_df.groupby("init_timestamp")["forecaster_name"].nunique()
    # keep just those t0s that all forecasters have
    common_t0s = counts[counts == num_forecasters].index
    return merged_df[merged_df["init_timestamp"].isin(common_t0s)]
```
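To make the t0 alignment concrete, here is a small sketch with a hypothetical `merged_df` (this assumes `align_t0` from the module above is in scope; the numbers are made up):

```python
import pandas as pd

# Hypothetical merged data: forecaster "a" has t0s at 00:00 and 01:00,
# forecaster "b" only at 00:00.
merged_df = pd.DataFrame({
    "init_timestamp": pd.to_datetime(
        ["2024-01-01 00:00", "2024-01-01 01:00", "2024-01-01 00:00"],
    ),
    "forecaster_name": ["a", "a", "b"],
    "error": [1.0, -2.0, 0.5],
})

# Only the 00:00 t0 is shared by both forecasters, so the 01:00 row is dropped.
# This keeps per-forecaster error comparisons fair: every forecaster is scored
# on the same set of init times.
print(align_t0(merged_df))
```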
Review conversation:

Reviewer: Could you not put the p50 in here and avoid the duplicated lines above?
Reviewer: Actually I see the column name is `p50_fraction`, but the other quantiles don't have the `_fraction` suffix. I'm confused why this should be different for the different quantiles, since it looks like the other quantiles are also fractions: they are multiplied by the capacity here.
peterdudfield: Yea, this needs to be updated in here and here. I'll make some GitHub issues for that:
openclimatefix/uk-pv-forecast-blend#74 and openclimatefix/uk-pvnet-app#374