
[AQUA Telemetry] Update MD Tracking #1193


Open
wants to merge 9 commits into
base: main
from

Conversation

agrimk
Member

@agrimk agrimk commented May 22, 2025

@oracle-contributor-agreement bot added the OCA Verified label (All contributors have signed the Oracle Contributor Agreement.) on May 22, 2025

📌 Cov diff with main: 0%

📌 Overall coverage: 19.13%

Member

@VipulMascarenhas VipulMascarenhas left a comment


Added some comments.

Can we add some unit tests and set up the pre-commit hook to format the code changes on commit? Also, can you add examples of the logged events for both successful and failed deployments to the description?

deployment_id = deployment.id


deployment_id = deployment.id()
Member

@VipulMascarenhas VipulMascarenhas May 22, 2025


id is a property, so keep it as .id; calling it as .id() will raise a TypeError.
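
To illustrate the point, here is a minimal sketch with a hypothetical FakeDeployment class (not the real deployment class from the PR) whose id is exposed as a property. It assumes the property returns a string, as an OCID would be.

```python
class FakeDeployment:
    """Hypothetical stand-in for the deployment object; not the real class."""

    def __init__(self, ocid: str):
        self._ocid = ocid

    @property
    def id(self) -> str:
        # Accessed as an attribute: deployment.id
        return self._ocid


def call_id_as_method(deployment) -> str:
    """Return the error message produced by calling the property value."""
    try:
        deployment.id()  # evaluates deployment.id, then tries to call the str
    except TypeError as ex:
        return str(ex)
    return ""


deployment = FakeDeployment("ocid1.example")
deployment_id = deployment.id          # correct: plain attribute access
error = call_id_as_method(deployment)  # the str returned by .id is not callable
```

The property is evaluated first, so the parentheses end up calling the returned string rather than a method.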

@@ -38,6 +38,7 @@ def __init__(
config: dict = None,
signer: Signer = None,
client_kwargs: dict = None,
_error_message: str = None,

Why do we need a parameter called _error_message here? We can set it directly inside __init__:

self._error_message = None
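
A minimal sketch of that suggestion, using a hypothetical WorkRequestSketch class whose constructor is trimmed to the parameters visible in the diff. The Signer annotation is replaced by object so the example runs on its own; everything else about the real class is unknown here.

```python
from typing import Optional


class WorkRequestSketch:
    """Hypothetical, trimmed-down stand-in for the class in the diff."""

    def __init__(
        self,
        config: Optional[dict] = None,
        signer: Optional[object] = None,  # the real code annotates this as Signer
        client_kwargs: Optional[dict] = None,
    ):
        self.config = config or {}
        self.signer = signer
        self.client_kwargs = client_kwargs or {}
        # Internal state: initialised here, not accepted from callers.
        self._error_message: Optional[str] = None


work_request = WorkRequestSketch()
```

Keeping the attribute out of the signature means callers cannot pass in an error message that should only ever come from the work request itself.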



def get_deployment_status(self,model_deployment_id: str, work_request_id : str, model_type : str) :

Please add docstrings.
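
For illustration, a NumPy-style docstring for this method might look like the sketch below. Only the signature comes from the diff. The summary line, the parameter descriptions, and the return value are assumptions inferred from the names and from the telemetry category used elsewhere in the PR; the enclosing class name is hypothetical, and the body is left as a stub because the diff does not show it.

```python
class DeploymentStatusSketch:
    """Hypothetical container class; only the method signature comes from the diff."""

    def get_deployment_status(
        self, model_deployment_id: str, work_request_id: str, model_type: str
    ) -> None:
        """Tracks a model deployment and records its final status in telemetry.

        Parameters
        ----------
        model_deployment_id: str
            The OCID of the model deployment being tracked (assumed).
        work_request_id: str
            The OCID of the work request created for the deployment (assumed).
        model_type: str
            The model type used in the telemetry category,
            e.g. ``aqua/{model_type}/deployment/status``.

        Returns
        -------
        None
        """
        # Body intentionally omitted: the diff does not show the implementation.
```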

@@ -54,6 +54,8 @@
SERVICE_MANAGED_CONTAINER_URI_SCHEME = "dsmc://"
SUPPORTED_FILE_FORMATS = ["jsonl"]
MODEL_BY_REFERENCE_OSS_PATH_KEY = "artifact_location"
DEFAULT_WAIT_TIME = 1200

Since these constants are specific to model deployment status, we could move them to the ads.aqua.modeldeployment.constants module.

@@ -80,6 +84,9 @@
from ads.model.model_metadata import ModelCustomMetadataItem
from ads.telemetry import telemetry

THREAD_POOL_SIZE = 16

We can move this constant to the ads.aqua.modeldeployment.constants module.

@@ -80,6 +84,9 @@
from ads.model.model_metadata import ModelCustomMetadataItem
from ads.telemetry import telemetry

THREAD_POOL_SIZE = 16
thread_pool = concurrent.futures.ThreadPoolExecutor(max_workers=THREAD_POOL_SIZE)

I think we should use the common telemetry or the common decorator thread pool instead of creating a new one here.

category=f"aqua/{model_type}/deployment/status",
action="FAILED",
detail="Error creating model deployment"

We should log the error message (_error_message) coming from the work request here instead of a static message, so that telemetry captures the specific reason the deployment failed.

)

self.telemetry.record_event_async(
Member

@VipulMascarenhas VipulMascarenhas May 22, 2025


Shouldn't this be within an else block? We can use try-except-else here.
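
A minimal sketch of the try-except-else shape, assuming a hypothetical wait callable that raises on failure and a fake recorder that mimics the record_event_async keyword arguments seen in the diff. Neither is the real ADS implementation.

```python
class FakeTelemetry:
    """Hypothetical recorder; stores events instead of sending them."""

    def __init__(self):
        self.events = []

    def record_event_async(self, category, action, detail=None):
        self.events.append({"category": category, "action": action, "detail": detail})


def track_deployment(telemetry, wait, model_type):
    """Record FAILED if wait() raises, otherwise record SUCCEEDED."""
    category = f"aqua/{model_type}/deployment/status"
    try:
        wait()  # hypothetical: blocks until the work request finishes
    except Exception as ex:
        telemetry.record_event_async(category=category, action="FAILED", detail=str(ex))
    else:
        # Runs only when the try block raised nothing.
        telemetry.record_event_async(category=category, action="SUCCEEDED")


def failing_wait():
    raise RuntimeError("work request failed")


telemetry = FakeTelemetry()
track_deployment(telemetry, lambda: None, "custom")
track_deployment(telemetry, failing_wait, "custom")
actions = [event["action"] for event in telemetry.events]
```

With this structure the success event cannot be emitted after a failure, which is the risk when the call sits after the try-except block instead of inside else.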

category=f"aqua/{model_type}/deployment/status",
action="SUCCEEDED",
detail=" Create model deployment successful",

We can skip this detail; the action "SUCCEEDED" already implies it.

Member

@VipulMascarenhas VipulMascarenhas left a comment


Changes look good; I added some minor comments. Can you run the pre-commit hook to take care of some formatting issues?

pip install pre-commit
run pre-commit install

self.telemetry.record_event_async(
category=f"aqua/{model_type}/deployment/status",
action="FAILED",
detail=data_science_work_request._error_message

Can _error_message be None for any reason? It might be good to use detail=data_science_work_request._error_message or UNKNOWN to avoid unforeseen issues in telemetry logging.
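
A minimal sketch of that fallback. The UNKNOWN value below is a placeholder chosen for this example, and the sample error text is made up; the project's own constant may be defined differently.

```python
from typing import Optional

UNKNOWN = "UNKNOWN"  # placeholder value, assumed for this example


def failure_detail(error_message: Optional[str]) -> str:
    """Return the error message, or UNKNOWN when it is None or empty."""
    return error_message or UNKNOWN


detail_missing = failure_detail(None)
detail_empty = failure_detail("")
detail_present = failure_detail("example error text")
```

Note that the or operator also replaces an empty string, not only None, so the telemetry detail is never blank.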

@@ -78,6 +79,7 @@ def _sync(self):
self._percentage= work_request.percent_complete
self._status = work_request.status
self._description = work_request_logs[-1].message if work_request_logs else "Processing"
if work_request.status == 'FAILED' : self._error_message = self.client.list_work_request_errors

It might be good to show example output for a failed and a successful model deployment (MD) in the PR description.

@mrDzurb changed the title from "Odsc 70841 update md tracking" to "[AQUA Telemetry] Update MD Tracking" on Jun 4, 2025
Labels
OCA Verified All contributors have signed the Oracle Contributor Agreement.
3 participants