
[Bug] mlflow.dspy.util warning with structured output - failed to save dspy module state (object is not JSON serializable) #8595

@chriswmann


What happened?

@TomeHirata, as requested in MLflow issue #16939:

"Can you file an issue on https://github.com/stanfordnlp/dspy/issues with your motivation of using pydantic.HttpUrl as part of Signature and tag me?"

Apologies for the confusion. The actual signature we're using looks more like this:

import dspy
from pydantic import BaseModel, HttpUrl


class MyCustomType(BaseModel):
    # various fields ...
    url: HttpUrl


class MySignature(dspy.Signature):
    text: str = dspy.InputField()
    output: MyCustomType = dspy.OutputField()

In the linked issue I only used HttpUrl directly in the signature to keep the reproducible example minimal.

Please let me know if you need any more information.
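For context, the "not JSON serializable" part of the warning can be reproduced with pydantic alone. The sketch below assumes pydantic v2. It also assumes the saved state ends up in the standard library's json.dumps, which is a guess about MLflow's internals and not confirmed here. The default model_dump() keeps the URL as a pydantic object, while model_dump(mode="json") converts it to a string.

```python
import json

from pydantic import BaseModel, HttpUrl


class MyCustomType(BaseModel):
    url: HttpUrl


value = MyCustomType(url="https://example.com")

# Python-mode dump (the default) keeps the validated URL object, not a str.
python_dump = value.model_dump()

try:
    json.dumps(python_dump)
    python_mode_serializable = True
except TypeError:
    # Raised as "Object of type ... is not JSON serializable".
    python_mode_serializable = False

# JSON-mode dump converts the URL to a plain string first.
json_dump = value.model_dump(mode="json")
encoded = json.dumps(json_dump)

print(python_mode_serializable)  # False
print(encoded)
```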

Steps to reproduce

Both examples emit the same mlflow.dspy.util warning about failing to save the DSPy module state because an object is not JSON serializable.

More realistic example (a BaseModel with an HttpUrl field):

import dspy
import mlflow

from dspy.teleprompt import BootstrapFewShot
from pydantic import HttpUrl, BaseModel

dspy.configure(lm=dspy.LM("gemini/gemini-2.5-flash"))

# Set the tracking URI first so the experiment is created on that server.
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("dspy-example")
mlflow.dspy.autolog(
    log_compiles=True,
    log_traces_from_compile=True,
)


class Output(BaseModel):
    url: HttpUrl


def metric(example, pred, trace=None):
    return 0


def main():
    with mlflow.start_run():
        # `Output` is a BaseModel with an HttpUrl field, mirroring the real use case.
        model = dspy.Predict("question: str -> answer: Output")

        trainset = [
            dspy.Example(
                question="Give me a random URL.",
                answer=Output(url="https://www.example.com"),
            ).with_inputs("question"),
        ]

        optimiser: BootstrapFewShot = BootstrapFewShot(
            metric=metric,
            max_bootstrapped_demos=1,
            max_labeled_demos=1,
            max_rounds=1,
        )
        optimised = optimiser.compile(model, trainset=trainset)

        # Confirm log_model works fine
        mlflow.dspy.log_model(optimised, "optimised_model")


if __name__ == "__main__":
    main()

Original example (HttpUrl used directly in the signature):

import dspy
import mlflow

from dspy.teleprompt import BootstrapFewShot
from pydantic import HttpUrl

dspy.configure(lm=dspy.LM("gemini/gemini-2.5-flash"))

# Set the tracking URI first so the experiment is created on that server.
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("dspy-example")
mlflow.dspy.autolog(
    log_compiles=True,
    log_traces_from_compile=True,
)


def metric(example, pred, trace=None):
    return 0


def main():
    with mlflow.start_run():
        # Using HttpUrl directly for brevity, although this was first encountered
        # while using a `BaseModel` with an `HttpUrl` field.
        model = dspy.Predict("question: str -> answer: HttpUrl")

        trainset = [
            dspy.Example(
                question="Give me a random URL",
                answer=HttpUrl("https://example.com"),
            ).with_inputs("question"),
        ]

        optimiser: BootstrapFewShot = BootstrapFewShot(
            metric=metric,
            max_bootstrapped_demos=1,
            max_labeled_demos=1,
            max_rounds=1,
        )
        # Print MLflow and system environment details for the bug report.
        mlflow.doctor()
        optimised = optimiser.compile(model, trainset=trainset)

        # Confirm log_model works fine
        mlflow.dspy.log_model(optimised, "optimised_model")


if __name__ == "__main__":
    main()
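One possible workaround, sketched under assumptions. If the module state is a plain dict that gets passed to json.dumps, a `default` hook can convert any leftover pydantic objects. The `state` dict below is a hypothetical stand-in for the saved module state. It is not DSPy's actual dump_state() format, and this is not how MLflow currently serializes it. pydantic_core.to_jsonable_python is a real pydantic-core function that converts models and URL types to JSON-compatible values.

```python
import json

from pydantic import BaseModel, HttpUrl
from pydantic_core import to_jsonable_python


class Output(BaseModel):
    url: HttpUrl


# Hypothetical stand-in for a saved module state: a demo whose
# "answer" field still holds the pydantic object.
state = {
    "demos": [
        {
            "question": "Give me a random URL.",
            "answer": Output(url="https://www.example.com"),
        }
    ]
}

# json.dumps calls `default` only for objects it cannot encode natively;
# to_jsonable_python turns the BaseModel (and its URL field) into a dict of strings.
encoded = json.dumps(state, default=to_jsonable_python)
print(encoded)
```

Because `default` is only consulted for unsupported types, plain strings, lists and dicts pass through unchanged.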

DSPy version

2.6.27

Metadata

Assignees: none
Labels: bug (Something isn't working)