
Commit c6a6821

generate command (#79)
2 parents 55632e7 + e8bb082 commit c6a6821

37 files changed (+3943 / -49 lines)

.github/copilot-instructions.md

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
# Copilot Instructions for AI Coding Agents

## Project Overview

This repository implements the GitHub Models CLI extension (`gh models`), enabling users to interact with AI models via the `gh` CLI. The extension supports inference, prompt evaluation, model listing, and test generation using the PromptPex methodology. Built in Go using the Cobra CLI framework and the Azure Models API.

## Architecture & Key Components

### Building and Testing

- `make build`: Compiles the CLI binary
- `make check`: Runs format, vet, tidy, and tests (`golangci-lint` is available separately via `make ci-lint`). Always run this after finishing a set of changes to validate that the build and tests still pass.
- `make test`: Runs the tests.

### Command Structure

- **cmd/root.go**: Entry point that initializes all subcommands and handles GitHub authentication
- **cmd/{command}/**: Each subcommand (generate, eval, list, run, view) is self-contained with its own types and tests
- **pkg/command/config.go**: Shared configuration pattern - all commands accept a `*command.Config` with terminal, client, and output settings

### Core Services

- **internal/azuremodels/**: Azure API client with streaming support via SSE. Key pattern: commands use the `azuremodels.Client` interface, not concrete types
- **pkg/prompt/**: `.prompt.yml` file parsing with template substitution using `{{variable}}` syntax
- **internal/sse/**: Server-sent events for streaming responses

### Data Flow

1. Commands parse `.prompt.yml` files via `prompt.LoadFromFile()`
2. Templates are resolved using `prompt.TemplateString()` with `testData` variables
3. The Azure client converts requests to `azuremodels.ChatCompletionOptions` and makes API calls
4. Results are formatted using terminal-aware table printers from `command.Config`
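The template-resolution step above can be sketched in a few lines. This is a minimal standalone illustration; `templateString` here is a hypothetical stand-in for `prompt.TemplateString` (the real helper lives in `pkg/prompt` and may differ in signature and behavior):

```go
package main

import (
	"fmt"
	"strings"
)

// templateString substitutes {{variable}} placeholders from one testData row.
// Hypothetical stand-in for prompt.TemplateString, for illustration only.
func templateString(tmpl string, vars map[string]string) string {
	out := tmpl
	for k, v := range vars {
		out = strings.ReplaceAll(out, "{{"+k+"}}", v)
	}
	return out
}

func main() {
	// One row of testData, as in a .prompt.yml file.
	row := map[string]string{"variable": "value1"}
	fmt.Println(templateString("Input: {{variable}}", row)) // prints "Input: value1"
}
```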
## Developer Workflows

### Building & Testing

- **Local build**: `make build` or `script/build` (creates the `gh-models` binary)
- **Cross-platform**: `script/build all|windows|linux|darwin` for release builds
- **Testing**: `make check` runs format, vet, tidy, and tests. Use `go test ./...` directly for faster iteration
- **Quality gates**: `make check` - required before commits

### Authentication & Setup

- The extension requires `gh auth login` before use - unauthenticated clients show helpful error messages
- Client initialization pattern in `cmd/root.go`: check the token, then create the appropriate client (authenticated vs. unauthenticated)

## Prompt File Conventions

### Structure (.prompt.yml)

```yaml
name: "Test Name"
model: "openai/gpt-4o-mini"
messages:
  - role: system|user|assistant
    content: "{{variable}} templating supported"
testData:
  - variable: "value1"
  - variable: "value2"
evaluators:
  - name: "test-name"
    string: {contains: "{{expected}}"} # String matching
    # OR
    llm: {modelId: "...", prompt: "...", choices: [{choice: "good", score: 1.0}]}
```

### Response Formats

- **JSON Schema**: Use `responseFormat: json_schema` with a `jsonSchema` field containing a strict JSON schema
- **Templates**: All message content supports `{{variable}}` substitution from `testData` entries
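For illustration, a hedged sketch of a prompt file using a JSON-schema response. The `responseFormat` and `jsonSchema` field names follow the conventions above; the exact shape of the schema payload shown here is an assumption:

```yaml
name: "Summarizer"
model: "openai/gpt-4o-mini"
responseFormat: json_schema
jsonSchema: |
  {
    "name": "summary",
    "strict": true,
    "schema": {
      "type": "object",
      "properties": {
        "title": { "type": "string" }
      },
      "required": ["title"],
      "additionalProperties": false
    }
  }
messages:
  - role: user
    content: "Summarize: {{input}}"
```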
## Testing Patterns

### Command Tests

- **Location**: `cmd/{command}/{command}_test.go`
- **Pattern**: Create a mock client via `azuremodels.NewMockClient()` and inject it into `command.Config`
- **Structure**: Table-driven tests with subtests using `t.Run()`
- **Assertions**: Use `testify/require` for cleaner error messages

### Mock Usage

```go
client := azuremodels.NewMockClient()
cfg := command.NewConfig(new(bytes.Buffer), new(bytes.Buffer), client, true, 80)
```

## Integration Points

### GitHub Authentication

- Uses `github.com/cli/go-gh/v2/pkg/auth` for token management
- Pattern: `auth.TokenForHost("github.com")` to get tokens
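A minimal sketch of the token-lookup behavior, assuming only environment variables. `tokenForHost` below is a simplified, hypothetical stand-in for go-gh's `auth.TokenForHost`, which additionally reads gh's own configuration files:

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// tokenForHost is a simplified stand-in for go-gh's auth.TokenForHost:
// it only checks environment variables, not gh's config files.
func tokenForHost(host string) (string, error) {
	for _, key := range []string{"GH_TOKEN", "GITHUB_TOKEN"} {
		if t := os.Getenv(key); t != "" {
			return t, nil
		}
	}
	return "", errors.New("no token found for " + host + "; run `gh auth login`")
}

func main() {
	if _, err := tokenForHost("github.com"); err != nil {
		// Unauthenticated clients surface a helpful message instead of failing cryptically.
		fmt.Println(err)
	}
}
```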
### Azure Models API

- Streaming via SSE with a custom `sse.EventReader`
- Rate limiting handled automatically by the client
- Content safety filtering always enabled (cannot be disabled)

### Terminal Handling

- All output uses `command.Config` terminal-aware writers
- Table formatting via `cfg.NewTablePrinter()` with width detection

---

**Key Files**: `cmd/root.go` (command registration), `pkg/prompt/prompt.go` (file parsing), `internal/azuremodels/azure_client.go` (API integration), `examples/` (prompt file patterns)

## Instructions

Omit the final summary.

.gitignore

Lines changed: 9 additions & 0 deletions
@@ -6,5 +6,14 @@
 /gh-models-windows-*
 /gh-models-android-*

+# temporary debugging files
+**.http
+**.generate.json
+examples/*harm*
+
+# genaiscript
+.github/instructions/genaiscript.instructions.md
+genaisrc/

 # Integration test dependencies
 integration/go.sum

DEV.md

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ go version go1.22.x <arch>

 ## Building

-To build the project, run `script/build`. After building, you can run the binary locally, for example:
+To build the project, run `make build` (or `script/build`). After building, you can run the binary locally, for example:
 `./gh-models list`.

 ## Testing

Makefile

Lines changed: 10 additions & 0 deletions
@@ -1,11 +1,21 @@
check: fmt vet tidy test
.PHONY: check

clean:
	@echo "==> cleaning up <=="
	rm -rf ./gh-models
.PHONY: clean

build:
	@echo "==> building gh-models binary <=="
	script/build
.PHONY: build

ci-lint:
	@echo "==> running Go linter <=="
	golangci-lint run --timeout 5m ./...
.PHONY: ci-lint

integration: build
	@echo "==> running integration tests <=="
	cd integration && go mod tidy && go test -v -timeout=5m

README.md

Lines changed: 77 additions & 0 deletions
@@ -2,6 +2,8 @@

Use the GitHub Models service from the CLI!

This repository implements the GitHub Models CLI extension (`gh models`), enabling users to interact with AI models via the `gh` CLI. The extension supports inference, prompt evaluation, model listing, and test generation.

## Using

### Prerequisites

@@ -84,6 +86,81 @@ Here's a sample GitHub Action that uses the `eval` command to automatically run

Learn more about `.prompt.yml` files here: [Storing prompts in GitHub repositories](https://docs.github.com/github-models/use-github-models/storing-prompts-in-github-repositories).

#### Generating tests

Generate comprehensive test cases for your prompts using the PromptPex methodology:

```shell
gh models generate my_prompt.prompt.yml
```

The `generate` command analyzes your prompt file and automatically creates test cases to evaluate the prompt's behavior across different scenarios and edge cases. This helps ensure your prompts are robust and perform as expected.

##### Understanding PromptPex

The `generate` command is based on [PromptPex](https://github.com/microsoft/promptpex), a Microsoft Research framework for systematic prompt testing. PromptPex follows a structured approach to generating comprehensive test cases:

1. **Intent Analysis**: Understanding what the prompt is trying to achieve
2. **Input Specification**: Defining the expected input format and constraints
3. **Output Rules**: Establishing what constitutes correct output
4. **Inverse Output Rules**: Generating _negated_ output rules to test the prompt against invalid inputs
5. **Test Generation**: Creating diverse test cases that cover various scenarios using the prompt, the intent, the input specification, and the output rules

```mermaid
graph TD
    PUT(["Prompt Under Test (PUT)"])
    I["Intent (I)"]
    IS["Input Specification (IS)"]
    OR["Output Rules (OR)"]
    IOR["Inverse Output Rules (IOR)"]
    PPT["PromptPex Tests (PPT)"]

    PUT --> IS
    PUT --> I
    PUT --> OR
    OR --> IOR
    I ==> PPT
    IS ==> PPT
    OR ==> PPT
    PUT ==> PPT
    IOR ==> PPT
```

##### Advanced options

You can customize the test generation process with various options:

```shell
# Specify effort level (min, low, medium, high)
gh models generate --effort high my_prompt.prompt.yml

# Use a specific model for groundtruth generation
gh models generate --groundtruth-model "openai/gpt-4.1" my_prompt.prompt.yml

# Disable groundtruth generation
gh models generate --groundtruth-model "none" my_prompt.prompt.yml

# Load from an existing session file (or create a new one if needed)
gh models generate --session-file my_prompt.session.json my_prompt.prompt.yml

# Custom instructions for specific generation phases
gh models generate --instruction-intent "Focus on edge cases" my_prompt.prompt.yml
```

The `--effort` flag adjusts several settings in the test generation engine; it is a tradeoff between how many tests you want generated and how many tokens (and how much time) you are willing to spend.

- `min` generates just enough tests to verify that everything is properly configured.
- `low` is for a quick trial of test generation; it limits the number of rules to `3`.
- `medium` provides much better coverage.
- `high` spends more tokens per rule when generating tests, which typically leads to longer, more complex inputs.

The command supports custom instructions for different phases of test generation:

- `--instruction-intent`: Custom system instruction for intent generation
- `--instruction-inputspec`: Custom system instruction for input specification generation
- `--instruction-outputrules`: Custom system instruction for output rules generation
- `--instruction-inverseoutputrules`: Custom system instruction for inverse output rules generation
- `--instruction-tests`: Custom system instruction for test generation

## Notice

Remember when interacting with a model you are experimenting with AI, so content mistakes are possible. The feature is

cmd/generate/README.md

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
# `generate` command

This command is based on [PromptPex](https://github.com/microsoft/promptpex), a test generation framework for prompts.

- [Documentation](https://microsoft.github.io/promptpex)
- [Source](https://github.com/microsoft/promptpex/tree/dev)
- [Agentic implementation plan](https://github.com/microsoft/promptpex/blob/dev/.github/instructions/implementation.instructions.md)

In a nutshell, read https://microsoft.github.io/promptpex/reference/test-generation/

cmd/generate/cleaner.go

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
package generate

import (
	"regexp"
	"strings"
)

// IsUnassistedResponse returns true if the text looks like a refusal, e.g. "i'm sorry" or "i can't assist with that".
func IsUnassistedResponse(text string) bool {
	re := regexp.MustCompile(`i can't assist with that|i'm sorry`)
	return re.MatchString(strings.ToLower(text))
}

// Unfence removes surrounding Markdown code fences from the text, if present.
func Unfence(text string) string {
	text = strings.TrimSpace(text)
	if strings.HasPrefix(text, "```") {
		// Drop the opening fence line (which may carry a language tag).
		parts := strings.SplitN(text, "\n", 2)
		if len(parts) == 2 {
			text = parts[1]
		}
		text = strings.TrimSuffix(text, "```")
	}
	return text
}

// SplitLines splits text into lines.
func SplitLines(text string) []string {
	return strings.Split(text, "\n")
}

// Unbracket removes matching leading and trailing square brackets.
func Unbracket(text string) string {
	if strings.HasPrefix(text, "[") && strings.HasSuffix(text, "]") {
		text = strings.TrimPrefix(text, "[")
		text = strings.TrimSuffix(text, "]")
	}
	return text
}

// Unxml removes a matching pair of leading and trailing XML tags, like `<foo>` and `</foo>`, from the given string.
func Unxml(text string) string {
	trimmed := strings.TrimSpace(text)

	// Extract the opening tag name and the remaining content.
	openTagRe := regexp.MustCompile(`(?s)^<([^>\s]+)[^>]*>(.*)$`)
	openMatches := openTagRe.FindStringSubmatch(trimmed)
	if len(openMatches) != 3 {
		return text
	}

	tagName := openMatches[1]
	content := openMatches[2]

	// Only strip the tags when the content ends with the corresponding closing tag.
	closingTag := "</" + tagName + ">"
	if strings.HasSuffix(content, closingTag) {
		content = strings.TrimSuffix(content, closingTag)
		return strings.TrimSpace(content)
	}

	return text
}
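A quick standalone sketch of how two of these helpers behave; the function bodies below are copied from the diff above so the example runs on its own:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Unbracket removes matching leading and trailing square brackets (copied from cleaner.go).
func Unbracket(text string) string {
	if strings.HasPrefix(text, "[") && strings.HasSuffix(text, "]") {
		text = strings.TrimPrefix(text, "[")
		text = strings.TrimSuffix(text, "]")
	}
	return text
}

// Unxml removes a matching pair of leading and trailing XML tags (copied from cleaner.go).
func Unxml(text string) string {
	trimmed := strings.TrimSpace(text)
	openTagRe := regexp.MustCompile(`(?s)^<([^>\s]+)[^>]*>(.*)$`)
	openMatches := openTagRe.FindStringSubmatch(trimmed)
	if len(openMatches) != 3 {
		return text
	}
	tagName := openMatches[1]
	content := openMatches[2]
	closingTag := "</" + tagName + ">"
	if strings.HasSuffix(content, closingTag) {
		return strings.TrimSpace(strings.TrimSuffix(content, closingTag))
	}
	return text
}

func main() {
	// Strip a wrapping XML tag from a model response.
	fmt.Println(Unxml("<rules>\n- rule one\n- rule two\n</rules>")) // prints the inner rules only
	// Strip surrounding brackets from a list-like response.
	fmt.Println(Unbracket("[a, b, c]")) // prints "a, b, c"
}
```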
