Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -51,7 +51,7 @@ async def create_or_update_pipeline(
project_id: ProjectID,
product_name: ProductName,
product_api_base_url: str,
) -> DataType | None:
) -> DataType:
# NOTE https://github.com/ITISFoundation/osparc-simcore/issues/7527
settings: DirectorV2Settings = get_plugin_settings(app)

Expand Down Expand Up @@ -86,7 +86,7 @@ async def create_or_update_pipeline(
error_context={**body, "backend_url": backend_url},
)
)
return None
raise


@log_decorator(logger=_logger)
Expand Down
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it feels like this "cleanup" patterns repeats again and again

# Something goes wrong with new project

if project_uuid := new_project.get("uuid"):
                await _best_effort_cleanup(...)

...

The _best_effort_cleanup pattern appears in three places within the except block, and once inline before a manual raise:

  1. except (ParentProjectNotFoundError, ParentNodeNotFoundError)
  2. except asyncio.CancelledError
  3. except Exception
  4. Inline before raise web.HTTPBadRequest (the product name mismatch case)
  • The first three are straightforward: they're all "project was already inserted, something went wrong, clean up."
  • The fourth one is the odd one out: it does cleanup and then raises an HTTPException, which normally skips cleanup (see the except web.HTTPException: raise arm). That's a bit inconsistent - it manually raises an HTTPBadRequest after cleanup, rather than letting the HTTPException handler catch it cleanly!!

Copy link
Copy Markdown
Member

@pcrespov pcrespov May 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider this proposal, after some iterations with claude

1. The inline product mismatch case

The issue: it manually does cleanup then raises HTTPBadRequest, bypassing the consistent except-block pattern. Fix: raise a domain exception instead, let the existing except Exception block handle cleanup, and translate to HTTP in a dedicated handler.

class _ProjectProductMismatchError(Exception):
    def __init__(self, expected, actual):
        self.expected = expected
        self.actual = actual

# inline becomes:
if _project_product_name != product_name:
    raise _ProjectProductMismatchError(product_name, _project_product_name)

# new except arm, placed before `except web.HTTPException`:
except _ProjectProductMismatchError as exc:
    if project_uuid := new_project.get("uuid"):
        await _best_effort_cleanup(app, project_uuid, user_id, simcore_user_agent, exc.actual)
    raise web.HTTPBadRequest(
        text=f"Project product name mismatch {exc.expected=} {exc.actual=}"
    )

Now cleanup always flows through the except block, never inline.


2. Consolidating the pattern

A context manager that arms itself once the project is inserted and runs cleanup on any unhandled exception:

from contextlib import asynccontextmanager

@asynccontextmanager
async def _best_effort_cleanup_project_on_error(app, user_id, simcore_user_agent, product_name, project_ref: dict):
    # project_ref is a mutable container: {"uuid": None}
    try:
        yield project_ref
    except web.HTTPException:
        raise  # pre-insertion HTTP errors: no cleanup needed
    except BaseException:
        if uuid := project_ref.get("uuid"):
            await _best_effort_cleanup(app, uuid, user_id, simcore_user_agent, product_name)
        raise

Usage in create_project:

_project_ref = {}  # armed once uuid is set

async with _best_effort_cleanup_project_on_error(app, user_id, simcore_user_agent, product_name, _project_ref):
    ...
    new_project = await _projects_repository_legacy.insert_project(...)
    _project_ref["uuid"] = new_project["uuid"]  # arm cleanup from here on
    ...

All three except cleanup arms collapse into the context manager. The except web.HTTPException: raise logic is preserved — pre-insertion HTTP errors pass through cleanly.

One caveat: the _ProjectProductMismatchError from point 1 still needs its own handler (or to be a non-HTTPException base), otherwise the except web.HTTPException: raise arm would swallow it before the context manager sees it. With the fix from point 1, it's a plain exception and gets caught by BaseException correctly.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THOUGHT: _best_effort_cleanup reminds me as a loose implementation of the SAGA pattern that you used recently in other PR: the compensating action for "insert project" is "delete project." Nonetheless, the "best effort" qualifier acknowledges it's not atomic: the compensation itself can fail, which is the realistic trade-off outside of a true transaction :-)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GitHK thx for this fix. This PR interests me. I would like to talk to you about what went wrong int he creation of the pipelines. Perhaps we acn talk offline? thx

Original file line number Diff line number Diff line change
Expand Up @@ -65,6 +65,26 @@

_logger = logging.getLogger(__name__)


async def _best_effort_cleanup(
app: web.Application,
project_uuid: ProjectID,
user_id: UserID,
simcore_user_agent: str,
product_name: ProductName,
) -> None:
try:
await _projects_service.submit_delete_project_task(
app=app,
project_uuid=project_uuid,
user_id=user_id,
simcore_user_agent=simcore_user_agent,
product_name=product_name,
)
except Exception: # pylint: disable=broad-except
_logger.exception("Best-effort cleanup failed for project %s", project_uuid)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create_troubleshooting_log_kwargs` ?

Pls update instructions



type CopyFileCoro = Coroutine[Any, Any, None]
type CopyProjectNodesCoro = Coroutine[Any, Any, dict[NodeID, ProjectNodeCreate]]

Expand Down Expand Up @@ -444,11 +464,12 @@ async def create_project( # pylint: disable=too-many-arguments,too-many-branche
}

_project_product_name = await _projects_repository_legacy.get_project_product(project_uuid=new_project["uuid"])
assert (
_project_product_name == product_name # nosec
), "Project product name mismatch"
if _project_product_name != product_name:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls dobule check this warning

raise web.HTTPBadRequest(text=f"Project product name mismatch {product_name=} {_project_product_name=}")
if project_uuid := new_project.get("uuid"):
await _best_effort_cleanup(app, project_uuid, user_id, simcore_user_agent, _project_product_name)
raise web.HTTPBadRequest( # noqa: TRY301
text=f"Project product name mismatch {product_name=} {_project_product_name=}"
)

data = ProjectGet.from_domain_model(new_project).model_dump(**RESPONSE_MODEL_POLICY)

Expand All @@ -468,28 +489,30 @@ async def create_project( # pylint: disable=too-many-arguments,too-many-branche

except (ParentProjectNotFoundError, ParentNodeNotFoundError) as exc:
if project_uuid := new_project.get("uuid"):
await _projects_service.submit_delete_project_task(
app=app,
project_uuid=project_uuid,
user_id=user_id,
simcore_user_agent=simcore_user_agent,
product_name=product_name,
)
await _best_effort_cleanup(app, project_uuid, user_id, simcore_user_agent, product_name)
raise web.HTTPNotFound(text=f"{exc}") from exc
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is not your PR but please let's avoid passing text=f"{exc}" to a raised exception! This might contain sensitive data that could potentially even reach the final user!!


except asyncio.CancelledError:
_logger.warning(
"cancelled create_project for '%s'. Cleaning up",
f"{user_id=}",
"cancelled create_project for user_id='%s'. Cleaning up",
user_id,
)
if project_uuid := new_project.get("uuid"):
await _projects_service.submit_delete_project_task(
app=app,
project_uuid=project_uuid,
user_id=user_id,
simcore_user_agent=simcore_user_agent,
product_name=product_name,
)
await _best_effort_cleanup(app, project_uuid, user_id, simcore_user_agent, product_name)
raise

except web.HTTPException:
# Pre-insertion validation HTTP errors (e.g. invalid data, not found, forbidden)
# do not need cleanup. Post-insertion cases handle their own cleanup before raising.
raise

except Exception:
_logger.exception(
"Unexpected error during create_project for user_id='%s'. Cleaning up",
user_id,
)
Comment thread
GitHK marked this conversation as resolved.
if project_uuid := new_project.get("uuid"):
await _best_effort_cleanup(app, project_uuid, user_id, simcore_user_agent, product_name)
raise
Comment thread
GitHK marked this conversation as resolved.


Expand Down
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inside this module I'd say that we want to suppress the error to have the same equivalent behaviour as before.
If there are reasons to let the error bubble up, let me know, otherwise I would not change it.

Original file line number Diff line number Diff line change
Expand Up @@ -119,6 +119,7 @@
from ..catalog import catalog_service
from ..constants import APP_FIRE_AND_FORGET_TASKS_KEY
from ..director_v2 import director_v2_service
from ..director_v2.exceptions import DirectorV2ServiceError
from ..dynamic_scheduler import api as dynamic_scheduler_service
from ..models import ClientSessionID
from ..products import products_web
Expand Down Expand Up @@ -991,13 +992,14 @@ async def add_project_node(

# also ensure the project is updated by director-v2 since services
# are due to access comp_tasks at some point see [https://github.com/ITISFoundation/osparc-simcore/issues/3216]
await director_v2_service.create_or_update_pipeline(
request.app,
user_id,
project["uuid"],
product_name,
product_api_base_url,
)
with suppress(DirectorV2ServiceError):
await director_v2_service.create_or_update_pipeline(
request.app,
user_id,
project["uuid"],
product_name,
product_api_base_url,
)
await dynamic_scheduler_service.update_projects_networks(request.app, project_id=ProjectID(project["uuid"]))

if _is_node_dynamic(service_key):
Expand Down Expand Up @@ -1116,9 +1118,10 @@ async def delete_project_node(
await db_legacy.remove_project_node(user_id, project_uuid, NodeID(node_uuid), client_session_id=client_session_id)
# also ensure the project is updated by director-v2 since services
product_name = products_web.get_product_name(request)
await director_v2_service.create_or_update_pipeline(
request.app, user_id, project_uuid, product_name, product_api_base_url
)
with suppress(DirectorV2ServiceError):
await director_v2_service.create_or_update_pipeline(
request.app, user_id, project_uuid, product_name, product_api_base_url
)
await dynamic_scheduler_service.update_projects_networks(request.app, project_id=project_uuid)


Expand Down Expand Up @@ -1228,13 +1231,14 @@ async def patch_project_node(
)

# 4. Make calls to director-v2 to keep data in sync (ex. comp_* DB tables)
await director_v2_service.create_or_update_pipeline(
app,
user_id,
project_id,
product_name=product_name,
product_api_base_url=product_api_base_url,
)
with suppress(DirectorV2ServiceError):
await director_v2_service.create_or_update_pipeline(
app,
user_id,
project_id,
product_name=product_name,
product_api_base_url=product_api_base_url,
)
if _node_patch_exclude_unset.get("label"):
await dynamic_scheduler_service.update_projects_networks(app, project_id=project_id)

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -38,3 +38,10 @@ class ProjectWorkbenchMismatchError(StudyDispatcherError):
"Project {project_uuid} appears to be corrupted and cannot be accessed properly.",
_version=1,
)


class ProjectCreationAbortedError(StudyDispatcherError):
msg_template = user_message(
"Project {project_uuid} creation was aborted due to an unexpected error.",
_version=1,
)
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,8 @@
"""

import logging
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager
from pathlib import Path
from typing import NamedTuple

Expand All @@ -22,12 +24,13 @@
from pydantic import AnyUrl, HttpUrl, TypeAdapter
from servicelib.logging_utils import log_decorator

from ..projects._projects_repository_legacy import ProjectDBAPI
from ..projects._projects_service import get_project_for_user
from ..director_v2.director_v2_service import create_or_update_pipeline
from ..projects._projects_repository_legacy import PROJECT_DBAPI_APPKEY, ProjectDBAPI
from ..projects._projects_service import delete_project_by_user, get_project_for_user
from ..projects.exceptions import ProjectInvalidRightsError, ProjectNotFoundError
from ..utils import now_str
from . import _service
from ._errors import ProjectWorkbenchMismatchError
from ._errors import ProjectCreationAbortedError, ProjectWorkbenchMismatchError
from ._models import FileParams, ServiceInfo, ViewerInfo
from ._users import UserInfo

Expand Down Expand Up @@ -186,43 +189,69 @@
)


@asynccontextmanager
async def rollback_project_on_error(
app: web.Application, user_id: int, project_uuid: ProjectID, *, product_name: str
) -> AsyncIterator[None]:
Comment thread
GitHK marked this conversation as resolved.
"""Schedules best-effort project cleanup and raises ProjectCreationAbortedError on `Exception`.

The rollback path submits project deletion asynchronously and does not wait
for cleanup completion before exiting the context manager.

Task cancellations and other `BaseException` subclasses are propagated as-is
and do not trigger rollback or wrapping.
"""
try:
yield
except Exception as exc:
_logger.warning(
Comment thread
GitHK marked this conversation as resolved.
"Failed to complete project %s creation. Reverting project insertion.",
project_uuid,
exc_info=True,
)
try:
await delete_project_by_user(
app,
project_uuid=project_uuid,
user_id=user_id,
product_name=product_name,
Comment thread
GitHK marked this conversation as resolved.
wait_until_completed=False,
)
except Exception: # pylint: disable=broad-except
_logger.exception("Failed to rollback project %s during cleanup", project_uuid)
raise ProjectCreationAbortedError(project_uuid=project_uuid) from exc
Comment thread
GitHK marked this conversation as resolved.


async def _add_new_project(
app: web.Application,
project: Project,
user: UserInfo,
*,
product_name: str,
product_api_base_url: str,
):
# TODO: move this to projects_api
# TODO: this piece was taken from the end of projects.projects_handlers.create_projects

from ..director_v2.director_v2_service import create_or_update_pipeline
from ..projects._projects_repository_legacy import PROJECT_DBAPI_APPKEY

db: ProjectDBAPI = app[PROJECT_DBAPI_APPKEY]

# validated project is transform in dict via json to use only primitive types
project_in: dict = json_loads(project.model_dump_json(exclude_none=True, by_alias=True))
# NOTE: Because of legacy reasons I do not want to remove the exclude_none=True from line above
# so I need to set the templateType here if it was removed.
project_in["templateType"] = project_in.get("templateType")

# update metadata (uuid, timestamps, ownership) and save
_project_db: dict = await db.insert_project(
project_in,
user.id,
product_name=product_name,
force_as_template=False,
project_nodes=None,
)
assert _project_db["uuid"] == str(project.uuid) # nosec

# This is a new project and every new graph needs to be reflected in the pipeline db
#
# TODO: Ensure this user has access to these services!
#
await create_or_update_pipeline(app, user.id, project.uuid, product_name, product_api_base_url)
) -> None:
async with rollback_project_on_error(app, user.id, project.uuid, product_name=product_name):
# TODO: move this to projects_api # noqa: FIX002

Check warning on line 234 in services/web/server/src/simcore_service_webserver/studies_dispatcher/_projects.py

View check run for this annotation

SonarQubeCloud / SonarCloud Code Analysis

Complete the task associated to this "TODO" comment.

See more on https://sonarcloud.io/project/issues?id=ITISFoundation_osparc-simcore&issues=AZ5Oq4LBfnCpIWrOnzg9&open=AZ5Oq4LBfnCpIWrOnzg9&pullRequest=9155
# TODO: this piece was taken from the end of projects.projects_handlers.create_projects # noqa: FIX002

Check warning on line 235 in services/web/server/src/simcore_service_webserver/studies_dispatcher/_projects.py

View check run for this annotation

SonarQubeCloud / SonarCloud Code Analysis

Complete the task associated to this "TODO" comment.

See more on https://sonarcloud.io/project/issues?id=ITISFoundation_osparc-simcore&issues=AZ5Oq4LBfnCpIWrOnzg-&open=AZ5Oq4LBfnCpIWrOnzg-&pullRequest=9155

db: ProjectDBAPI = app[PROJECT_DBAPI_APPKEY]

# validated project is transform in dict via json to use only primitive types
project_in: dict = json_loads(project.model_dump_json(exclude_none=True, by_alias=True))
# NOTE: Because of legacy reasons I do not want to remove the exclude_none=True from line above
# so I need to set the templateType here if it was removed.
project_in["templateType"] = project_in.get("templateType")

# update metadata (uuid, timestamps, ownership) and save
_project_db: dict = await db.insert_project(
project_in,
user.id,
product_name=product_name,
force_as_template=False,
project_nodes=None,
)
assert _project_db["uuid"] == f"{project.uuid}" # nosec
await create_or_update_pipeline(app, user.id, project.uuid, product_name, product_api_base_url)


async def _project_exists(
Expand Down Expand Up @@ -372,7 +401,9 @@
project = _create_project(
project_id=project_uid,
name=f"File {file_params.file_name[-20:]}",
description=f"Autogenerated study with a file-picker for {file_params.file_name} [{file_params.file_type} file]",
description=(
f"Autogenerated study with a file-picker for {file_params.file_name} [{file_params.file_type} file]"
),
thumbnail=project_thumbnail,
owner=user,
workbench={
Expand Down
Loading
Loading