
Conversation

@aarunbhardwaj

Summary

This PR adds comprehensive Ollama integration support to synthetic-data-kit, addressing issue #64.

Fixes

Closes #64 - Add Ollama integration support

Changes Made

  • LLM Client Integration: Added Ollama provider support in LLMClient with proper error handling
  • CLI Support: Updated CLI commands (create, curate) to handle Ollama provider
  • Configuration: Added Ollama configuration utilities and default settings
  • Error Handling: Improved error messages and server connectivity checks (a sketch of such a check follows this list)
  • Testing: Added comprehensive integration and unit tests for Ollama functionality
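
For illustration, the server connectivity check referenced above might look roughly like the sketch below. The helper name is an assumption, not the PR's actual code; http://localhost:11434 is simply Ollama's default local address:

import requests

def check_ollama_server(api_base: str = "http://localhost:11434") -> bool:
    """Return True if a local Ollama server answers on its /api/tags endpoint."""
    try:
        # /api/tags lists locally installed models and is cheap to call
        response = requests.get(f"{api_base}/api/tags", timeout=5)
        return response.status_code == 200
    except requests.exceptions.RequestException:
        # Connection refused, timeout, etc. all mean "server not available"
        return False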

Key Features

  • Local LLM Usage: No API keys required, runs entirely locally with Ollama
  • Privacy & Cost: Eliminates need for external API calls and associated costs
  • Feature Parity: Full compatibility with existing pipeline (ingest → create → curate → save-as)
  • Easy Setup: Works out-of-the-box with ollama serve and any supported model

Technical Implementation

  • Added ollama provider option alongside existing vllm and api-endpoint providers
  • Implemented Ollama-specific message formatting for optimal prompt structure (see the sketch after this list)
  • Added proper server availability checks and connection error handling
  • Updated configuration system to support Ollama settings
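
To make the message-formatting point concrete, here is a minimal sketch of how OpenAI-style chat messages could be flattened into a single Ollama prompt string. The function name and exact prompt layout are assumptions, not the PR's actual implementation:

from typing import Dict, List

def format_messages_for_ollama(messages: List[Dict[str, str]]) -> str:
    """Flatten role-tagged chat messages into one prompt string."""
    parts = []
    for message in messages:
        role = message.get("role", "user")
        content = message.get("content", "")
        if role == "system":
            parts.append(f"System: {content}")
        elif role == "assistant":
            parts.append(f"Assistant: {content}")
        else:
            parts.append(f"User: {content}")
    parts.append("Assistant:")  # cue the model to produce the next reply
    return "\n\n".join(parts)

A list such as [{"role": "system", "content": "You write QA pairs."}, {"role": "user", "content": "Summarize the document."}] would come out as System and User sections followed by a trailing "Assistant:" cue.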

Testing

  • ✅ All existing tests pass
  • ✅ New Ollama integration tests added (tests/integration/test_ollama_integration.py; see the sketch after this list)
  • ✅ Comprehensive unit tests for Ollama functionality
  • ✅ Tested with real Ollama deployment using llama3.2:latest
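
A hypothetical skeleton of one such integration test (assumed names and model tag; it skips cleanly when no local Ollama server is running):

import pytest
import requests

OLLAMA_URL = "http://localhost:11434"

def _ollama_available() -> bool:
    """Probe the local Ollama server so the test can be skipped when it is down."""
    try:
        return requests.get(f"{OLLAMA_URL}/api/tags", timeout=2).status_code == 200
    except requests.exceptions.RequestException:
        return False

@pytest.mark.skipif(not _ollama_available(), reason="Ollama server not running")
def test_ollama_generate_returns_text():
    # Ask the documented /api/generate endpoint for a single, non-streamed reply
    response = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={"model": "llama3.2", "prompt": "Say hello in one word.", "stream": False},
        timeout=120,
    )
    assert response.status_code == 200
    assert response.json()["response"].strip()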

Usage Example

# Start Ollama server
ollama serve

# Install a model
ollama pull llama3.2

# Use synthetic-data-kit with Ollama
synthetic-data-kit --config config.yaml create input.txt --type qa --provider ollama

Arun Kumar and others added 3 commits October 25, 2025 23:02
- Add Ollama provider support to LLM client with proper error handling
- Update CLI to handle Ollama provider in create and curate commands
- Fix Ollama server connection and generation handling
- Add comprehensive Ollama integration tests
- Update main config to use Ollama as default provider
- Improve error messages and server availability checks

This enables local LLM usage with Ollama for synthetic data generation,
providing an alternative to API-based providers for privacy and cost savings.
- Add get_ollama_config() function to config utilities
- Add comprehensive unit tests for Ollama provider functionality
- Test Ollama initialization, chat completion, message formatting, and batch processing
- Improve provider validation in config utilities

These changes complete the Ollama integration test coverage and utilities.
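
The get_ollama_config() helper named in the commit above might look roughly like this sketch; the keys and default values are assumptions, and the real function in the config utilities may differ:

from typing import Any, Dict

def get_ollama_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return the Ollama section of a loaded config dict, filling in defaults."""
    defaults = {
        "api_base": "http://localhost:11434",  # Ollama's default local address
        "model": "llama3.2:latest",
    }
    # Anything the user sets under an 'ollama' key overrides the defaults
    return {**defaults, **config.get("ollama", {})}
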
Copilot AI review requested due to automatic review settings October 25, 2025 17:41
@meta-cla

meta-cla bot commented Oct 25, 2025

Hi @aarunbhardwaj!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!


Copilot AI left a comment


Pull Request Overview

This PR adds comprehensive Ollama integration support to synthetic-data-kit, enabling users to run LLM operations entirely locally without API keys or external service dependencies. The implementation provides full feature parity with existing vLLM and api-endpoint providers.

Key Changes:

  • Added Ollama as a third provider option alongside vLLM and api-endpoint
  • Implemented Ollama-specific message formatting that converts OpenAI-style messages to Ollama's prompt format
  • Added server availability checks and improved error handling for all provider types

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Summary per file:

  • synthetic_data_kit/models/llm_client.py: Core Ollama integration with chat/batch completion methods and message formatting
  • synthetic_data_kit/utils/config.py: Added Ollama configuration function with default settings
  • synthetic_data_kit/cli.py: Extended CLI commands to support Ollama provider with server checks
  • synthetic_data_kit/config.yaml: Added Ollama configuration section with default values
  • tests/unit/test_llm_client.py: Unit tests for Ollama client initialization and operations
  • tests/integration/test_ollama_integration.py: Integration tests covering Ollama functionality and edge cases


Comment on lines +437 to +448
except (requests.exceptions.RequestException, KeyError, ValueError) as e:
    if verbose:
        logger.error(f"Ollama API error (attempt {attempt+1}/{self.max_retries}): {e}")
    if attempt == self.max_retries - 1:
        raise Exception(f"Failed to get Ollama completion after {self.max_retries} attempts: {str(e)}")
    time.sleep(self.retry_delay * (attempt + 1))
except Exception as e:
    if verbose:
        logger.error(f"Ollama API error (attempt {attempt+1}/{self.max_retries}): {e}")
    if attempt == self.max_retries - 1:
        raise Exception(f"Failed to get Ollama completion after {self.max_retries} attempts: {str(e)}")
    time.sleep(self.retry_delay * (attempt + 1))

Copilot AI Oct 25, 2025


The exception handling has duplicated error logging and retry logic. The second 'except Exception' block (lines 443-448) catches exceptions already handled by the first block (lines 437-442), making the second block unreachable for RequestException, KeyError, and ValueError. Either remove the redundant second block or consolidate both into a single 'except Exception' handler.
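
A consolidated version along the lines suggested might look like the sketch below; the helper name and parameters are assumptions used for illustration, not the actual patch:

import logging
import time

logger = logging.getLogger(__name__)

def ollama_completion_with_retry(send_request, max_retries=3, retry_delay=1.0, verbose=False):
    """Call send_request() up to max_retries times, backing off between attempts."""
    for attempt in range(max_retries):
        try:
            return send_request()
        except Exception as e:  # one handler covers RequestException, KeyError, ValueError, and the rest
            if verbose:
                logger.error(f"Ollama API error (attempt {attempt+1}/{max_retries}): {e}")
            if attempt == max_retries - 1:
                raise Exception(f"Failed to get Ollama completion after {max_retries} attempts: {e}")
            time.sleep(retry_delay * (attempt + 1))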

Comment on lines +646 to +650
except requests.exceptions.RequestException:
    console.print(f"❌ Error: VLLM server not available at {api_base}", style="red")
    console.print("Please start the VLLM server with:", style="yellow")
    console.print(f"vllm serve {model}", style="bold blue")
    return 1

Copilot AI Oct 25, 2025


This new exception handler for RequestException in the vLLM section will never execute because it appears after the return statement at line 645. The existing try-except block already handles this case earlier in the code, making this block unreachable dead code. Remove lines 646-650.

Suggested change (delete these lines):

except requests.exceptions.RequestException:
    console.print(f"❌ Error: VLLM server not available at {api_base}", style="red")
    console.print("Please start the VLLM server with:", style="yellow")
    console.print(f"vllm serve {model}", style="bold blue")
    return 1

meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) on Oct 27, 2025
@meta-cla

meta-cla bot commented Oct 27, 2025

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!


Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Error connecting to API endpoint

1 participant