Found 8 locations that should be converted to structured output for better reliability.
Status: 1 of 8 locations converted (planner_node); 7 remaining
Impact: Eliminates NoneType bugs and enables type-safe planning
```python
# Current:
result = json.loads(response)

# Should be (imports shown once; the snippets below assume them):
from typing import Any, Dict, List, Literal

from pydantic import BaseModel

class ResearchPlanSchema(BaseModel):
    main_topic: str
    key_areas: List[str]
    sub_questions: List[str]
    ...

structured_llm = llm.with_structured_output(ResearchPlanSchema)
plan = await structured_llm.ainvoke(...)
```
Impact: Robust research plan generation
```python
# Current:
return json.loads(response)

# Should be:
class QuestionGeneration(BaseModel):
    questions: List[Dict[str, str]]
    gaps_identified: List[str]

structured_llm = llm.with_structured_output(QuestionGeneration)
result = await structured_llm.ainvoke(...)
```
Impact: Reliable question generation
```python
# Current:
plan_data = json.loads(response.content)

# Should be:
class PlanData(BaseModel):
    main_topic: str
    key_areas: List[str]
    sub_questions: List[str]
    expected_sections: List[str]
    search_strategy: str

structured_llm = llm.with_structured_output(PlanData)
plan_data = await structured_llm.ainvoke(...)
```
Impact: Eliminates plan generation failures
```python
# Current:
result = json.loads(response.content)
questions = result["questions"]

# Should be:
class QuestionsResult(BaseModel):
    questions: List[Dict[str, Any]]
    gaps_identified: List[str]

structured_llm = llm.with_structured_output(QuestionsResult)
result = await structured_llm.ainvoke(...)
questions = result.questions  # Type-safe!
```
Impact: Reliable iteration questions
```python
# Current:
result = json.loads(response.content)
return result["convergence_score"] / 100.0

# Should be:
class ConvergenceResult(BaseModel):
    convergence_score: float

structured_llm = llm.with_structured_output(ConvergenceResult)
result = await structured_llm.ainvoke(...)
return result.convergence_score / 100.0
```
Impact: Reliable convergence checking
```python
# Current:
result = json.loads(response.content)
return result.get("convergence_score", 50) / 100.0

# Should be:
class ConvergenceScore(BaseModel):
    convergence_score: float = 50.0  # preserves the old default of 50

structured_llm = llm.with_structured_output(ConvergenceScore)
result = await structured_llm.ainvoke(...)
return result.convergence_score / 100.0
```
Impact: Prevents denoising failures
```python
# Current:
feedback = json.loads(response)

# Should be:
class EvolutionFeedback(BaseModel):
    fitness_score: float
    strengths: List[str]
    weaknesses: List[str]
    suggestions: List[str]

structured_llm = llm.with_structured_output(EvolutionFeedback)
feedback = await structured_llm.ainvoke(...)
```
Impact: Reliable variant evolution
```python
# Current:
response_data = json.loads(response_content)
judgment = response_data["judgment"]

# Should be:
class CoverageJudgment(BaseModel):
    judgment: Literal["Yes", "No"]
    explanation: str

structured_llm = llm.with_structured_output(CoverageJudgment)
result = await structured_llm.ainvoke(...)
judgment = result.judgment
```
Impact: Reliable evaluation (less critical, eval code only)
Total locations: 8
Priority 1 (Main agent): 1 - ✅ FIXED
Priority 2 (TTD-DR): 6 - Need fixing
Priority 3 (Eval): 1 - Lower priority
Why TTD-DR has so many:
- Complex multi-stage pipeline
- Each stage uses the LLM for structured decisions
- All stages still use manual JSON parsing
- All are prone to the same NoneType bugs
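The NoneType failure mode noted above can be reproduced without any LLM call. This is a standalone sketch, not taken from the actual TTD-DR code: manual `json.loads` parsing breaks as soon as the model wraps its answer in a markdown fence, and a `None` fallback just defers the crash.

```python
import json

# Typical raw LLM reply: valid JSON wrapped in a markdown fence.
raw = '```json\n{"questions": [{"q": "What is X?"}]}\n```'

try:
    result = json.loads(raw)  # raises: the fence is not valid JSON
except json.JSONDecodeError:
    result = None  # common "fallback" that only postpones the failure

# Downstream code then hits the classic NoneType bug:
try:
    questions = result["questions"]
except TypeError as exc:
    print(exc)  # 'NoneType' object is not subscriptable
```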
Recommendation:
- ✅ planner_node fixed (main agent robust now)
- Fix TTD-DR components (enable TTD-DR to work)
- Eval code can wait (lower priority)
Estimated effort:
- Each location: ~10-15 minutes
- Total for TTD-DR: ~1-2 hours
- High impact on reliability
✅ Type safety: Pydantic validation
✅ No parsing errors: Guaranteed valid or exception
✅ Better errors: Clear failure messages
✅ IDE support: Autocomplete, type checking
✅ Maintainability: Self-documenting schemas
✅ Robustness: Eliminates entire class of bugs
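A minimal, LLM-free sketch of what this checklist means in practice: Pydantic either returns a fully typed object or raises a `ValidationError` that names the offending field. The model and field values here are illustrative only.

```python
from typing import List

from pydantic import BaseModel, ValidationError

class ConvergenceResult(BaseModel):
    convergence_score: float
    notes: List[str] = []

# Valid payload: type-safe attribute access, no .get() chains.
ok = ConvergenceResult(convergence_score=72.5)
print(ok.convergence_score / 100.0)  # 0.725

# Invalid payload: a clear exception instead of a silent None.
try:
    ConvergenceResult(convergence_score="not a number")
except ValidationError as exc:
    print(exc)  # names the field and the expected type
```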
Next Steps:
- Deploy the current fix (planner_node)
- Test whether TTD-DR works now
- If issues remain, convert the TTD-DR components one by one
- Eval code last (less critical)
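For the "test whether TTD-DR works now" step, each converted component can be unit-tested without network calls by injecting a stub in place of the structured LLM. Both the node function and the stub below are hypothetical; they only illustrate the pattern.

```python
import asyncio

from pydantic import BaseModel

class ConvergenceResult(BaseModel):
    convergence_score: float

class StubStructuredLLM:
    """Stands in for llm.with_structured_output(...) in tests."""
    def __init__(self, canned: BaseModel) -> None:
        self.canned = canned

    async def ainvoke(self, _prompt: str) -> BaseModel:
        return self.canned

async def check_convergence(structured_llm) -> float:
    # Hypothetical converted node: no json.loads, no .get() fallback.
    result = await structured_llm.ainvoke("...")
    return result.convergence_score / 100.0

stub = StubStructuredLLM(ConvergenceResult(convergence_score=80.0))
print(asyncio.run(check_convergence(stub)))  # 0.8
```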