Trace Analysis with MLflow #30

@EoghanOConnor

Description

We need to integrate Skills with the remote MLflow instance and explore how MLflow tracing data can be leveraged to improve Skills performance and effectiveness for the RHDP team.

This involves both the technical integration and a follow-up analysis phase to extract actionable insights from MLflow traces.

Objectives

  • Connect Skills to the remote MLflow tracking server
  • Ensure all relevant runs, metrics, and traces are logged correctly
  • Investigate MLflow tracing capabilities (e.g., spans, logs, metadata)
  • Analyze trace data using a notebook to identify opportunities for improving Skills
  • Provide recommendations for the RHDP team based on the findings
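As a starting point for the first two objectives, here is a minimal sketch of how the connection could be wired up. It assumes the remote server is configured through MLflow's standard environment variables (`MLFLOW_TRACKING_URI`, `MLFLOW_TRACKING_USERNAME` / `MLFLOW_TRACKING_PASSWORD` or `MLFLOW_TRACKING_TOKEN`, `MLFLOW_EXPERIMENT_NAME`) and that the installed MLflow is recent enough to ship the tracing API (roughly 2.14 or later). `TrackingConfig`, `resolve_tracking_config`, `configure_mlflow`, `run_skill_traced`, and the default experiment name `skills-traces` are hypothetical names for illustration, not part of Skills or MLflow.

```python
import os
from dataclasses import dataclass
from typing import Any, Dict, Mapping


@dataclass(frozen=True)
class TrackingConfig:
    uri: str
    experiment: str
    has_credentials: bool


def resolve_tracking_config(env: Mapping[str, str]) -> TrackingConfig:
    """Read MLflow connection settings from an environment-style mapping."""
    uri = env.get("MLFLOW_TRACKING_URI", "").strip()
    if not uri:
        raise ValueError("MLFLOW_TRACKING_URI is not set")
    # MLflow reads the credential variables itself; we only check that some
    # credentials are present so a missing secret is noticed early.
    has_basic = bool(
        env.get("MLFLOW_TRACKING_USERNAME") and env.get("MLFLOW_TRACKING_PASSWORD")
    )
    has_token = bool(env.get("MLFLOW_TRACKING_TOKEN"))
    # The fallback experiment name is a placeholder (assumption).
    experiment = env.get("MLFLOW_EXPERIMENT_NAME", "skills-traces")
    return TrackingConfig(uri, experiment, has_basic or has_token)


def configure_mlflow(cfg: TrackingConfig) -> None:
    """Point the MLflow client at the remote server and select an experiment."""
    import mlflow  # imported lazily so the config helper works without MLflow

    mlflow.set_tracking_uri(cfg.uri)
    mlflow.set_experiment(cfg.experiment)


def run_skill_traced(skill_name: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap one Skill invocation in a span (the Skill body is a placeholder)."""
    import mlflow

    with mlflow.start_span(name=skill_name) as span:
        span.set_inputs(payload)
        result = {"status": "ok"}  # stand-in for the real Skill call
        span.set_outputs(result)
        return result


# Typical use (not executed here because it needs a reachable server):
#   cfg = resolve_tracking_config(os.environ)
#   configure_mlflow(cfg)
#   run_skill_traced("example-skill", {"query": "hello"})
```

Keeping the environment parsing separate from the MLflow calls means the configuration can be validated in CI without network access to the tracking server.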

Tasks

  • Configure connection to remote MLflow instance

  • Validate authentication and access permissions

  • Update Skills to log runs and traces to MLflow

  • Verify trace data is being captured correctly

  • Set up a notebook (e.g., Jupyter) for trace analysis

  • Explore MLflow trace schema and available data within the notebook

  • Perform analysis on trace data (latency, errors, patterns, etc.)

  • Identify bottlenecks or inefficiencies in Skills

  • Propose improvements based on trace insights

  • Document findings and recommendations for the RHDP team
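For the analysis tasks above (latency, errors, bottlenecks), a rough notebook sketch might look like the following. It assumes spans have already been exported into plain dictionaries with `name`, `duration_ms`, and `status` keys; those field names are hypothetical, and the actual MLflow trace schema differs between versions, so the mapping from real trace data to this shape would need to be worked out during the schema exploration task.

```python
from collections import defaultdict
from statistics import mean, quantiles
from typing import Dict, Iterable, List


def summarize_spans(spans: Iterable[dict]) -> Dict[str, dict]:
    """Group spans by name and compute count, mean/p95 latency, and error rate."""
    by_name = defaultdict(list)
    for span in spans:
        by_name[span["name"]].append(span)

    summary = {}
    for name, group in by_name.items():
        durations = sorted(s["duration_ms"] for s in group)
        errors = sum(1 for s in group if s["status"] == "ERROR")
        # quantiles() needs at least two data points; fall back to the single value.
        p95 = quantiles(durations, n=20)[-1] if len(durations) >= 2 else durations[0]
        summary[name] = {
            "count": len(group),
            "mean_ms": mean(durations),
            "p95_ms": p95,
            "error_rate": errors / len(group),
        }
    return summary


def slowest(summary: Dict[str, dict], top: int = 3) -> List[str]:
    """Return the span names with the highest mean latency (bottleneck candidates)."""
    return sorted(summary, key=lambda n: summary[n]["mean_ms"], reverse=True)[:top]


# Illustrative, made-up records; real data would come from the MLflow traces.
example = [
    {"name": "retrieve", "duration_ms": 100, "status": "OK"},
    {"name": "retrieve", "duration_ms": 300, "status": "ERROR"},
    {"name": "generate", "duration_ms": 1000, "status": "OK"},
]
report = summarize_spans(example)
print(slowest(report, top=1))
```

Ranking by mean latency is only one possible heuristic; sorting by `p95_ms` or `error_rate` may surface different candidates for improvement.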

Acceptance Criteria

  • Skills successfully log data to the remote MLflow instance
  • Trace data is accessible and interpretable
  • An analysis report is produced with clear insights
  • At least two actionable recommendations for improving Skills are identified
  • Documentation is shared with the RHDP team

Notes / Considerations

  • Confirm MLflow version compatibility
  • Ensure sensitive data is handled appropriately in logs/traces
  • Coordinate with the NERC infrastructure team if needed
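The first two considerations could be handled with small helpers like the ones below. This is only a sketch: the key pattern in `SENSITIVE_KEY`, the `[REDACTED]` marker, and the minimum version `(2, 14)` are assumptions for illustration and would need to be agreed with the team. The idea is to scrub payloads before they are attached to a span, and to fail fast if the installed MLflow is older than the version the integration was tested against.

```python
import re
from typing import Any, Tuple

# Keys whose values should never reach logs or traces (pattern is an assumption).
SENSITIVE_KEY = re.compile(
    r"(password|passwd|secret|token|api[_-]?key|authorization)", re.IGNORECASE
)


def redact(value: Any) -> Any:
    """Return a copy of value with sensitive dictionary entries masked."""
    if isinstance(value, dict):
        return {
            k: "[REDACTED]" if SENSITIVE_KEY.search(str(k)) else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, (list, tuple)):
        return [redact(v) for v in value]
    return value


def version_tuple(text: str) -> Tuple[int, ...]:
    """Parse the leading numeric components of a version string."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: Tuple[int, ...] = (2, 14)) -> bool:
    """Check an installed version string against a minimum (default is assumed)."""
    return version_tuple(installed) >= minimum


# With MLflow installed, the check could be: meets_minimum(mlflow.__version__)
print(redact({"user": "alice", "api_key": "abc123", "nested": [{"Token": "x"}]}))
```

Redacting by key name will miss secrets embedded in free-text values, so it should be treated as a first line of defense rather than a complete safeguard.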
