
1182 ml classification queue #1219

Open · wants to merge 31 commits into base: dev
Conversation

CarsonDavis
Collaborator

No description provided.

@CarsonDavis CarsonDavis linked an issue Feb 7, 2025 that may be closed by this pull request
except requests.exceptions.RequestException as e:
return {"status": JobStatusEnum.FAILED, "message": f"API request failed: {str(e)}"}

def load_model(self, model_identifier: str) -> bool:
Collaborator Author:

Rename this to request_load_model.

self.model_identifier = model_identifier

def _unload_all_other_models(self) -> bool:
"""Unload all models except the one managed by this instance"""
Collaborator Author:

Need to implement:
- _request_model_unload: sends the unload request
- unload_model: calls the request, then sends status-check requests until the unload is confirmed, with error handling if max retries are reached
- unload_all_models: runs the unload for all models

Then we can implement a similar pattern on the loading side as well.
Lastly, let's merge the model manager into the API client, since there is no logical separation between them.
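The unload pattern described above could be sketched roughly as follows. This is a minimal illustration, assuming a client that exposes request_model_unload and get_model_status (hypothetical names, not the actual API of this PR):

```python
from enum import Enum


class UnloadStatus(Enum):
    LOADED = "loaded"
    UNLOADED = "unloaded"


def unload_model(client, model_identifier, max_retries=5):
    """Request an unload, then poll the status endpoint until the model
    is confirmed unloaded; raise if max_retries is exhausted."""
    client.request_model_unload(model_identifier)  # i.e. _request_model_unload
    for _ in range(max_retries):
        if client.get_model_status(model_identifier) == UnloadStatus.UNLOADED:
            return True
    raise RuntimeError(
        f"Model {model_identifier!r} still loaded after {max_retries} status checks"
    )


def unload_all_models(client, model_identifiers):
    """Run unload_model for every known model."""
    return all(unload_model(client, m) for m in model_identifiers)
```

A real implementation would likely add a sleep/backoff between status checks; it is omitted here to keep the sketch short.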

"""Process this external job and update status/results"""
try:
api_client = InferenceAPIClient()
model_version = ModelVersion.objects.get(classification_type=self.inference_job.classification_type)
Collaborator Author:

Why do we need the model version here?

Collaborator Author:

Check the API documentation to see whether a model identifier is needed in conjunction with the job_id.

if new_status == ExternalJobStatus.COMPLETED:
self.store_results(response.get("results"))
elif new_status in [ExternalJobStatus.FAILED, ExternalJobStatus.CANCELLED]:
self.set_error(response.get("message", ""))
Collaborator Author:

Add a fallback note for failed or cancelled jobs when no message is present.
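That fallback could look something like the sketch below, assuming the ExternalJobStatus values shown in the diff (the default messages themselves are hypothetical wording):

```python
from enum import Enum


class ExternalJobStatus(Enum):  # assumed enum mirroring the PR's status values
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Status-specific defaults used when the API response carries no message.
DEFAULT_ERROR_MESSAGES = {
    ExternalJobStatus.FAILED: "Job failed; no message returned by the inference API.",
    ExternalJobStatus.CANCELLED: "Job was cancelled; no message returned by the inference API.",
}


def error_message(new_status, response):
    """Prefer the API-provided message; fall back to a status-specific default."""
    return response.get("message") or DEFAULT_ERROR_MESSAGES.get(new_status, "Unknown error")
```

The `or` also covers an empty-string message, so `set_error` always receives something informative.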

"""Prepare single URL data for API"""
return {
"url_id": url.id,
"text": url.scraped_text[: self.config.max_text_length],
Collaborator Author (Feb 13, 2025):

This needs a refactor. It should take in a max text length only, not a URL count, and then grab as many URLs as will fit within that max text length. If a single URL exceeds the max text length, that URL can be sent by itself in a batch of one.
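The batching rule described above could be sketched as a greedy packer. This is a minimal illustration (the function name is hypothetical); truncating an oversized URL's text to max_text_length is left to the caller, as in the existing _prepare_url_data:

```python
def batch_urls_by_text_length(urls, max_text_length):
    """Greedily pack URLs into batches whose combined scraped-text length
    stays within max_text_length; a URL at or over the limit is sent alone
    in a batch of one."""
    batches, current, current_len = [], [], 0
    for url in urls:
        text_len = len(url.scraped_text)
        if text_len >= max_text_length:
            if current:  # close the in-progress batch first
                batches.append(current)
                current, current_len = [], 0
            batches.append([url])  # oversized URL goes out by itself
            continue
        if current and current_len + text_len > max_text_length:
            batches.append(current)
            current, current_len = [], 0
        current.append(url)
        current_len += text_len
    if current:
        batches.append(current)
    return batches
```

This preserves input order, which keeps job results easy to map back to their URLs.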

container_name: sde_indexing_helper_local_celerybeat
depends_on:
- redis
- postgres
Contributor:

Should we also add this config to the production.yml file?


Successfully merging this pull request may close these issues.

ML Classification Queue
2 participants