
Commit c6a6821

generate command (#79)
2 parents 55632e7 + e8bb082 commit c6a6821

37 files changed (+3943 / -49 lines)

.github/copilot-instructions.md

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
# Copilot Instructions for AI Coding Agents

## Project Overview

This repository implements the GitHub Models CLI extension (`gh models`), enabling users to interact with AI models via the `gh` CLI. The extension supports inference, prompt evaluation, model listing, and test generation using the PromptPex methodology. Built in Go using the Cobra CLI framework and the Azure Models API.

## Architecture & Key Components

### Building and Testing

- `make build`: Compiles the CLI binary
- `make check`: Runs format, vet, tidy, and tests (`golangci-lint` is available separately via `make ci-lint`). Always run this after finishing a set of changes to validate that the build and tests still pass.
- `make test`: Runs the tests.

### Command Structure

- **cmd/root.go**: Entry point that initializes all subcommands and handles GitHub authentication
- **cmd/{command}/**: Each subcommand (generate, eval, list, run, view) is self-contained with its own types and tests
- **pkg/command/config.go**: Shared configuration pattern - all commands accept a `*command.Config` with terminal, client, and output settings

### Core Services

- **internal/azuremodels/**: Azure API client with streaming support via SSE. Key pattern: commands use the `azuremodels.Client` interface, not concrete types
- **pkg/prompt/**: `.prompt.yml` file parsing with template substitution using `{{variable}}` syntax
- **internal/sse/**: Server-sent events for streaming responses

### Data Flow

1. Commands parse `.prompt.yml` files via `prompt.LoadFromFile()`
2. Templates are resolved using `prompt.TemplateString()` with `testData` variables
3. The Azure client converts requests to `azuremodels.ChatCompletionOptions` and makes API calls
4. Results are formatted using terminal-aware table printers from `command.Config`
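The template-resolution step above can be sketched in a few lines. This is a minimal standalone illustration; `templateString` here is a hypothetical stand-in for `prompt.TemplateString` (the real helper lives in `pkg/prompt` and may differ in signature and behavior):

```go
package main

import (
	"fmt"
	"strings"
)

// templateString substitutes {{variable}} placeholders from one testData row.
// Hypothetical stand-in for prompt.TemplateString, for illustration only.
func templateString(tmpl string, vars map[string]string) string {
	out := tmpl
	for k, v := range vars {
		out = strings.ReplaceAll(out, "{{"+k+"}}", v)
	}
	return out
}

func main() {
	// One row of testData, as in a .prompt.yml file.
	row := map[string]string{"variable": "value1"}
	fmt.Println(templateString("Input: {{variable}}", row)) // prints "Input: value1"
}
```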
## Developer Workflows

### Building & Testing

- **Local build**: `make build` or `script/build` (creates the `gh-models` binary)
- **Cross-platform**: `script/build all|windows|linux|darwin` for release builds
- **Testing**: `make check` runs format, vet, tidy, and tests. Use `go test ./...` directly for faster iteration
- **Quality gates**: `make check` - required before commits

### Authentication & Setup

- The extension requires `gh auth login` before use - unauthenticated clients show helpful error messages
- Client initialization pattern in `cmd/root.go`: check the token, then create the appropriate client (authenticated vs. unauthenticated)

## Prompt File Conventions

### Structure (.prompt.yml)

```yaml
name: "Test Name"
model: "openai/gpt-4o-mini"
messages:
  - role: system|user|assistant
    content: "{{variable}} templating supported"
testData:
  - variable: "value1"
  - variable: "value2"
evaluators:
  - name: "test-name"
    string: {contains: "{{expected}}"} # String matching
    # OR
    llm: {modelId: "...", prompt: "...", choices: [{choice: "good", score: 1.0}]}
```

### Response Formats

- **JSON Schema**: Use `responseFormat: json_schema` with a `jsonSchema` field containing a strict JSON schema
- **Templates**: All message content supports `{{variable}}` substitution from `testData` entries
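For illustration, a hedged sketch of a prompt file using a JSON-schema response. The `responseFormat` and `jsonSchema` field names follow the conventions above; the exact shape of the schema payload shown here is an assumption:

```yaml
name: "Summarizer"
model: "openai/gpt-4o-mini"
responseFormat: json_schema
jsonSchema: |
  {
    "name": "summary",
    "strict": true,
    "schema": {
      "type": "object",
      "properties": {
        "title": { "type": "string" }
      },
      "required": ["title"],
      "additionalProperties": false
    }
  }
messages:
  - role: user
    content: "Summarize: {{input}}"
```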
## Testing Patterns

### Command Tests

- **Location**: `cmd/{command}/{command}_test.go`
- **Pattern**: Create a mock client via `azuremodels.NewMockClient()` and inject it into `command.Config`
- **Structure**: Table-driven tests with subtests using `t.Run()`
- **Assertions**: Use `testify/require` for cleaner error messages

### Mock Usage

```go
client := azuremodels.NewMockClient()
cfg := command.NewConfig(new(bytes.Buffer), new(bytes.Buffer), client, true, 80)
```

## Integration Points

### GitHub Authentication

- Uses `github.com/cli/go-gh/v2/pkg/auth` for token management
- Pattern: `auth.TokenForHost("github.com")` to get tokens
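A minimal sketch of the token-lookup behavior, assuming only environment variables. `tokenForHost` below is a simplified, hypothetical stand-in for go-gh's `auth.TokenForHost`, which additionally reads gh's own configuration files:

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// tokenForHost is a simplified stand-in for go-gh's auth.TokenForHost:
// it only checks environment variables, not gh's config files.
func tokenForHost(host string) (string, error) {
	for _, key := range []string{"GH_TOKEN", "GITHUB_TOKEN"} {
		if t := os.Getenv(key); t != "" {
			return t, nil
		}
	}
	return "", errors.New("no token found for " + host + "; run `gh auth login`")
}

func main() {
	if _, err := tokenForHost("github.com"); err != nil {
		// Unauthenticated clients surface a helpful message instead of failing cryptically.
		fmt.Println(err)
	}
}
```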
### Azure Models API

- Streaming via SSE with a custom `sse.EventReader`
- Rate limiting handled automatically by the client
- Content safety filtering always enabled (cannot be disabled)

### Terminal Handling

- All output uses `command.Config` terminal-aware writers
- Table formatting via `cfg.NewTablePrinter()` with width detection

---

**Key Files**: `cmd/root.go` (command registration), `pkg/prompt/prompt.go` (file parsing), `internal/azuremodels/azure_client.go` (API integration), `examples/` (prompt file patterns)

## Instructions

Omit the final summary.

.gitignore

Lines changed: 9 additions & 0 deletions
@@ -6,5 +6,14 @@
 /gh-models-windows-*
 /gh-models-android-*

+# temporary debugging files
+**.http
+**.generate.json
+examples/*harm*
+
+# genaiscript
+.github/instructions/genaiscript.instructions.md
+genaisrc/

 # Integration test dependencies
 integration/go.sum

DEV.md

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ go version go1.22.x <arch>

 ## Building

-To build the project, run `script/build`. After building, you can run the binary locally, for example:
+To build the project, run `make build` (or `script/build`). After building, you can run the binary locally, for example:
 `./gh-models list`.

 ## Testing

Makefile

Lines changed: 10 additions & 0 deletions
@@ -1,11 +1,21 @@
check: fmt vet tidy test
.PHONY: check

clean:
	@echo "==> cleaning up <=="
	rm -rf ./gh-models
.PHONY: clean

build:
	@echo "==> building gh-models binary <=="
	script/build
.PHONY: build

ci-lint:
	@echo "==> running Go linter <=="
	golangci-lint run --timeout 5m ./...
.PHONY: ci-lint

integration: build
	@echo "==> running integration tests <=="
	cd integration && go mod tidy && go test -v -timeout=5m

README.md

Lines changed: 77 additions & 0 deletions
@@ -2,6 +2,8 @@

Use the GitHub Models service from the CLI!

This repository implements the GitHub Models CLI extension (`gh models`), enabling users to interact with AI models via the `gh` CLI. The extension supports inference, prompt evaluation, model listing, and test generation.

## Using

### Prerequisites

@@ -84,6 +86,81 @@ Here's a sample GitHub Action that uses the `eval` command to automatically run

Learn more about `.prompt.yml` files here: [Storing prompts in GitHub repositories](https://docs.github.com/github-models/use-github-models/storing-prompts-in-github-repositories).

#### Generating tests

Generate comprehensive test cases for your prompts using the PromptPex methodology:

```shell
gh models generate my_prompt.prompt.yml
```

The `generate` command analyzes your prompt file and automatically creates test cases to evaluate the prompt's behavior across different scenarios and edge cases. This helps ensure your prompts are robust and perform as expected.

##### Understanding PromptPex

The `generate` command is based on [PromptPex](https://github.com/microsoft/promptpex), a Microsoft Research framework for systematic prompt testing. PromptPex follows a structured approach to generating comprehensive test cases:

1. **Intent Analysis**: Understanding what the prompt is trying to achieve
2. **Input Specification**: Defining the expected input format and constraints
3. **Output Rules**: Establishing what constitutes correct output
4. **Inverse Output Rules**: Generating _negated_ output rules to test the prompt against invalid inputs
5. **Test Generation**: Creating diverse test cases that cover various scenarios using the prompt, the intent, the input specification, and the output rules

```mermaid
graph TD
    PUT(["Prompt Under Test (PUT)"])
    I["Intent (I)"]
    IS["Input Specification (IS)"]
    OR["Output Rules (OR)"]
    IOR["Inverse Output Rules (IOR)"]
    PPT["PromptPex Tests (PPT)"]

    PUT --> IS
    PUT --> I
    PUT --> OR
    OR --> IOR
    I ==> PPT
    IS ==> PPT
    OR ==> PPT
    PUT ==> PPT
    IOR ==> PPT
```

##### Advanced options

You can customize the test generation process with various options:

```shell
# Specify effort level (min, low, medium, high)
gh models generate --effort high my_prompt.prompt.yml

# Use a specific model for groundtruth generation
gh models generate --groundtruth-model "openai/gpt-4.1" my_prompt.prompt.yml

# Disable groundtruth generation
gh models generate --groundtruth-model "none" my_prompt.prompt.yml

# Load from an existing session file (or create a new one if needed)
gh models generate --session-file my_prompt.session.json my_prompt.prompt.yml

# Custom instructions for specific generation phases
gh models generate --instruction-intent "Focus on edge cases" my_prompt.prompt.yml
```

The `--effort` flag adjusts several settings in the test generation engine; it is a tradeoff between how many tests you want generated and how many tokens (and how much time) you are willing to spend.

- `min` generates just enough tests to verify that everything is properly configured.
- `low` is for a quick trial of test generation; it limits the number of rules to `3`.
- `medium` provides much better coverage.
- `high` spends more tokens per rule when generating tests, which typically leads to longer, more complex inputs.

The command supports custom instructions for different phases of test generation:

- `--instruction-intent`: Custom system instruction for intent generation
- `--instruction-inputspec`: Custom system instruction for input specification generation
- `--instruction-outputrules`: Custom system instruction for output rules generation
- `--instruction-inverseoutputrules`: Custom system instruction for inverse output rules generation
- `--instruction-tests`: Custom system instruction for test generation

## Notice

Remember when interacting with a model you are experimenting with AI, so content mistakes are possible. The feature is

cmd/generate/README.md

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
# `generate` command

This command is based on [PromptPex](https://github.com/microsoft/promptpex), a test generation framework for prompts.

- [Documentation](https://microsoft.github.io/promptpex)
- [Source](https://github.com/microsoft/promptpex/tree/dev)
- [Agentic implementation plan](https://github.com/microsoft/promptpex/blob/dev/.github/instructions/implementation.instructions.md)

In a nutshell, read https://microsoft.github.io/promptpex/reference/test-generation/

cmd/generate/cleaner.go

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
package generate

import (
	"regexp"
	"strings"
)

// IsUnassistedResponse returns true if the text looks like a refusal, e.g. "i'm sorry" or "i can't assist with that".
func IsUnassistedResponse(text string) bool {
	re := regexp.MustCompile(`i can't assist with that|i'm sorry`)
	return re.MatchString(strings.ToLower(text))
}

// Unfence removes surrounding Markdown code fences from the text, if present.
func Unfence(text string) string {
	text = strings.TrimSpace(text)
	if strings.HasPrefix(text, "```") {
		// Drop the opening fence line (which may carry a language tag).
		parts := strings.SplitN(text, "\n", 2)
		if len(parts) == 2 {
			text = parts[1]
		}
		text = strings.TrimSuffix(text, "```")
	}
	return text
}

// SplitLines splits text into lines.
func SplitLines(text string) []string {
	return strings.Split(text, "\n")
}

// Unbracket removes matching leading and trailing square brackets.
func Unbracket(text string) string {
	if strings.HasPrefix(text, "[") && strings.HasSuffix(text, "]") {
		text = strings.TrimPrefix(text, "[")
		text = strings.TrimSuffix(text, "]")
	}
	return text
}

// Unxml removes a matching pair of leading and trailing XML tags, like `<foo>` and `</foo>`, from the given string.
func Unxml(text string) string {
	trimmed := strings.TrimSpace(text)

	// Extract the opening tag name and the remaining content.
	openTagRe := regexp.MustCompile(`(?s)^<([^>\s]+)[^>]*>(.*)$`)
	openMatches := openTagRe.FindStringSubmatch(trimmed)
	if len(openMatches) != 3 {
		return text
	}

	tagName := openMatches[1]
	content := openMatches[2]

	// Only strip the tags when the content ends with the corresponding closing tag.
	closingTag := "</" + tagName + ">"
	if strings.HasSuffix(content, closingTag) {
		content = strings.TrimSuffix(content, closingTag)
		return strings.TrimSpace(content)
	}

	return text
}
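A quick standalone sketch of how two of these helpers behave; the function bodies below are copied from the diff above so the example runs on its own:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Unbracket removes matching leading and trailing square brackets (copied from cleaner.go).
func Unbracket(text string) string {
	if strings.HasPrefix(text, "[") && strings.HasSuffix(text, "]") {
		text = strings.TrimPrefix(text, "[")
		text = strings.TrimSuffix(text, "]")
	}
	return text
}

// Unxml removes a matching pair of leading and trailing XML tags (copied from cleaner.go).
func Unxml(text string) string {
	trimmed := strings.TrimSpace(text)
	openTagRe := regexp.MustCompile(`(?s)^<([^>\s]+)[^>]*>(.*)$`)
	openMatches := openTagRe.FindStringSubmatch(trimmed)
	if len(openMatches) != 3 {
		return text
	}
	tagName := openMatches[1]
	content := openMatches[2]
	closingTag := "</" + tagName + ">"
	if strings.HasSuffix(content, closingTag) {
		return strings.TrimSpace(strings.TrimSuffix(content, closingTag))
	}
	return text
}

func main() {
	// Strip a wrapping XML tag from a model response.
	fmt.Println(Unxml("<rules>\n- rule one\n- rule two\n</rules>")) // prints the inner rules only
	// Strip surrounding brackets from a list-like response.
	fmt.Println(Unbracket("[a, b, c]")) // prints "a, b, c"
}
```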
