A production-ready C# .NET 8 project that demonstrates training a base LLM to accurately read, understand, and extract structured data from invoices and receipts of any format.
InvoiceAI/
├── Controllers/
│ ├── InvoiceController.cs – POST /api/invoice/extract
│ ├── TrainingController.cs – POST /api/training/run
│ └── DemoController.cs – POST /api/demo/run/{id}
├── Services/
│ ├── IInvoiceExtractionService.cs
│ ├── InvoiceExtractionService.cs – Anthropic Claude API integration
│ └── ModelTrainingService.cs – Training pipeline orchestration
├── Models/
│ ├── Invoice.cs – Domain models
│ └── TrainingModels.cs – Training/request/response DTOs
├── Data/
│ └── SampleInvoiceDataset.cs – 8 labelled training invoices
├── InvoicePrompts.cs – System & few-shot prompt engineering
└── Program.cs – DI setup + startup auto-training
This project uses two complementary LLM training techniques:
Labelled (invoice text → extracted JSON) pairs from the sample dataset are injected into the system prompt at inference time. The LLM learns:
- Your exact JSON output schema
- How to handle different invoice formats (retail, medical, SaaS, construction, etc.)
- Edge cases like deposits, discounts, multi-currency
The system prompt is carefully crafted to act as the "weights" of the model. The training service evaluates field accuracy across the dataset and selects the highest-performing prompt variant.
The POST /api/training/export/jsonl endpoint exports a JSONL file in OpenAI fine-tune format. To submit actual weight-update fine-tuning jobs:
# Export training data
curl https://localhost:5001/api/training/export/jsonl > training.jsonl
# Submit to OpenAI (requires OpenAI API key)
openai api fine_tuning.jobs.create \
-t training.jsonl \
-m gpt-4o-mini-2024-07-18| ID | Category | Vendor |
|---|---|---|
| train-001 | Technology Services | ACME Tech Solutions LLC |
| train-002 | Retail Receipt | Best Buy |
| train-003 | Freelance / Creative | Sarah Chen Design Studio |
| train-004 | SaaS Subscription | Stripe |
| train-005 | Construction / Trades | Reliable Builders Inc. |
| train-006 | Restaurant Receipt | The Golden Fork |
| train-007 | Medical / Healthcare | City Medical Center |
| train-008 | Shipping / Logistics | FedEx Freight |
- .NET 8 SDK
- An Anthropic API key
Option A – appsettings.json:
{
"Anthropic": { "ApiKey": "sk-ant-..." }
}Option B – Environment variable (recommended):
export ANTHROPIC_API_KEY=sk-ant-...cd InvoiceAI
dotnet runThe app starts on https://localhost:5001. Swagger UI loads at the root /.
Run training pipeline:
curl -X POST https://localhost:5001/api/training/runExtract a sample invoice (fine-tuned):
curl -X POST https://localhost:5001/api/demo/run/train-001Compare base vs fine-tuned:
curl -X POST https://localhost:5001/api/demo/compare/train-002Extract your own invoice:
curl -X POST https://localhost:5001/api/invoice/extract \
-H "Content-Type: application/json" \
-d '{
"invoiceText": "ACME Corp\nInvoice #123\nDate: 2024-01-15\n\nWeb Development 1 $5000\n\nTotal: $5000",
"useFineTunedModel": true
}'Extract structured data from any invoice text.
Request:
{
"invoiceText": "...",
"useFineTunedModel": true,
"sourceFileName": "invoice.txt"
}Response:
{
"success": true,
"invoice": {
"vendorName": "ACME Corp",
"invoiceNumber": "INV-123",
"invoiceDate": "2024-01-15T00:00:00",
"totalAmount": 5000.00,
"currency": "USD",
"lineItems": [...],
"metadata": {
"confidenceScore": 0.97,
"wasFineTuned": true,
"modelVersion": "claude-opus-4-5"
}
},
"processingTimeMs": 1423
}Runs the full training pipeline and returns metrics.
Returns dataset statistics (example counts, categories, coverage).
Downloads the training dataset in OpenAI JSONL format.
Runs extraction on a built-in sample invoice. IDs: train-001 to train-008.
Runs both base and fine-tuned extractions side-by-side for comparison.
Add a new TrainingExample to SampleInvoiceDataset.cs:
public static TrainingExample GetMyNewInvoice() => new()
{
Id = "train-009",
Category = "utilities",
RawText = """
POWER COMPANY
Account: 123456
Amount Due: $145.50
Due: March 1, 2024
""",
GroundTruth = new Invoice
{
VendorName = "POWER COMPANY",
TotalAmount = 145.50m,
InvoiceDate = new DateTime(2024, 3, 1),
// ...
}
};Then register it in GetAllExamples():
public static List<TrainingExample> GetAllExamples() => new()
{
// existing examples...
GetMyNewInvoice()
};| Technology | Purpose |
|---|---|
| .NET 8 | Runtime & web framework |
| ASP.NET Core | REST API |
| Anthropic Claude API | Base LLM for extraction |
| Newtonsoft.Json | JSON parsing |
| Swashbuckle | Swagger UI |
| Microsoft.Extensions.ML | ML.NET integration hooks |