
Commit 33f6318

Merge pull request #339 from drivecore/feature/338-message-compaction

Add automatic compaction of historical messages for agents

2 parents db62e48 + e2a86c0, commit 33f6318

22 files changed (+896, -16 lines)

README.md
Lines changed: 1 addition & 0 deletions

@@ -12,6 +12,7 @@ Command-line interface for AI-powered coding tasks. Full details available on th
 - 👤 **Human Compatible**: Uses README.md, project files and shell commands to build its own context
 - 🌐 **GitHub Integration**: GitHub mode for working with issues and PRs as part of workflow
 - 📄 **Model Context Protocol**: Support for MCP to access external context sources
+- 🧠 **Message Compaction**: Automatic management of context window for long-running agents
 
 Please join the MyCoder.ai discord for support: https://discord.gg/5K6TYrHGHt
 

docs/features/message-compaction.md
Lines changed: 105 additions & 0 deletions (new file)

# Message Compaction

When agents run for extended periods, they accumulate a large history of messages that eventually fills up the LLM's context window, causing errors when the token limit is exceeded. The message compaction feature helps prevent this by providing agents with awareness of their token usage and tools to manage their context window.

## Features

### 1. Token Usage Tracking

The LLM abstraction now tracks and returns:
- Total tokens used in the current completion request
- Maximum allowed tokens for the model/provider

This information is used to monitor context window usage and trigger appropriate actions.
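
For illustration, here is a minimal sketch (not the actual agent code) of how a caller could turn these two values into a usage percentage; the interface mirrors the new optional `totalTokens`/`maxTokens` fields on `LLMResponse`, and the helper name is hypothetical:

```typescript
// Mirrors the optional fields added to LLMResponse in this change.
interface TokenInfo {
  totalTokens?: number; // tokens used by this completion request
  maxTokens?: number; // context window size for the model/provider
}

// Hypothetical helper: usage as a whole-number percentage, or undefined
// when the provider did not report token information.
function contextUsagePercent(info: TokenInfo): number | undefined {
  if (info.totalTokens === undefined || info.maxTokens === undefined) {
    return undefined;
  }
  return Math.round((info.totalTokens / info.maxTokens) * 100);
}

// e.g. { totalTokens: 45235, maxTokens: 100000 } -> 45
```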

### 2. Status Updates

Agents receive status updates with information about:
- Current token usage and percentage of the maximum
- Cost so far
- Active sub-agents and their status
- Active shell processes and their status
- Active browser sessions and their status

Status updates are sent:
1. Every 5 agent interactions (periodic updates)
2. Whenever token usage exceeds 50% of the maximum (threshold-based updates)

Example status update:

```
--- STATUS UPDATE ---
Token Usage: 45,235/100,000 (45%)
Cost So Far: $0.23

Active Sub-Agents: 2
- sa_12345: Analyzing project structure and dependencies
- sa_67890: Implementing unit tests for compactHistory tool

Active Shell Processes: 3
- sh_abcde: npm test
- sh_fghij: npm run watch
- sh_klmno: git status

Active Browser Sessions: 1
- bs_12345: https://www.typescriptlang.org/docs/handbook/utility-types.html

If token usage is high (>70%), consider using the 'compactHistory' tool to reduce context size.
--- END STATUS ---
```

### 3. Message Compaction Tool

The `compactHistory` tool allows agents to compact their message history by summarizing older messages while preserving recent context. This tool:

1. Takes a parameter for how many recent messages to preserve unchanged
2. Summarizes all older messages into a single, concise summary
3. Replaces the original messages with the summary and preserved messages
4. Reports on the reduction in context size
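
A simplified sketch of that operation follows; the `Message` shape and `summarize` helper are stand-ins for illustration, not the actual tool implementation:

```typescript
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

// Sketch only: keep the most recent N messages and replace everything
// older with a single summary message produced by an LLM call.
async function compactHistorySketch(
  messages: Message[],
  preserveRecentMessages: number,
  summarize: (older: Message[], customPrompt?: string) => Promise<string>,
  customPrompt?: string,
): Promise<Message[]> {
  if (messages.length <= preserveRecentMessages) {
    return messages; // nothing old enough to compact
  }

  const splitIndex = messages.length - preserveRecentMessages;
  const older = messages.slice(0, splitIndex);
  const recent = messages.slice(splitIndex);

  // Summarize all older messages into one concise message.
  const summary = await summarize(older, customPrompt);

  // The summary takes the place of the older messages.
  return [{ role: 'system', content: `[Compacted history] ${summary}` }, ...recent];
}
```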

## Usage

Agents are instructed to monitor their token usage through status updates and use the `compactHistory` tool when token usage approaches 50% of the maximum:

```javascript
// Example of agent using the compactHistory tool
{
  name: "compactHistory",
  preserveRecentMessages: 10,
  customPrompt: "Focus on summarizing our key decisions and current tasks."
}
```

## Configuration

The message compaction feature is enabled by default with reasonable defaults:
- Status updates every 5 agent interactions
- Recommendation to compact at 70% token usage
- Default preservation of 10 recent messages when compacting
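
Expressed as constants, these defaults look roughly like the following (illustrative names only, not the actual configuration surface):

```typescript
// Illustrative defaults drawn from the description above; names are hypothetical.
const STATUS_UPDATE_EVERY_N_INTERACTIONS = 5;
const COMPACTION_RECOMMENDATION_THRESHOLD = 0.7; // recommend compactHistory above 70% usage
const DEFAULT_PRESERVE_RECENT_MESSAGES = 10;
```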

## Model Token Limits

The system includes token limits for various models:

### Anthropic Models
- claude-3-opus-20240229: 200,000 tokens
- claude-3-sonnet-20240229: 200,000 tokens
- claude-3-haiku-20240307: 200,000 tokens
- claude-2.1: 100,000 tokens

### OpenAI Models
- gpt-4o: 128,000 tokens
- gpt-4-turbo: 128,000 tokens
- gpt-3.5-turbo: 16,385 tokens

### Ollama Models
- llama2: 4,096 tokens
- mistral: 8,192 tokens
- mixtral: 32,768 tokens
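
In the providers these limits live in per-provider lookup tables with a default fallback. A condensed sketch of the pattern, mirroring the Ollama provider change in this commit (the helper name is illustrative and the table is shortened):

```typescript
// Condensed from the Ollama provider in this commit; the real table lists more models.
const OLLAMA_MODEL_LIMITS: Record<string, number> = {
  llama2: 4096,
  mistral: 8192,
  mixtral: 32768,
};

function ollamaContextWindow(model: string): number {
  // Try the exact model tag first (e.g. "mistral:7b"), then the base name, then a default.
  const baseModelName = model.split(':')[0];
  return (
    OLLAMA_MODEL_LIMITS[model] ??
    (baseModelName ? OLLAMA_MODEL_LIMITS[baseModelName] : undefined) ??
    4096
  );
}
```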

## Benefits

- Prevents context window overflow errors
- Maintains important context for agent operation
- Enables longer-running agent sessions
- Makes the system more robust for complex tasks
- Gives agents self-awareness of resource usage

example-status-update.md
Lines changed: 50 additions & 0 deletions (new file)

# Example Status Update

This is an example of what the status update looks like for the agent:

```
--- STATUS UPDATE ---
Token Usage: 45,235/100,000 (45%)
Cost So Far: $0.23

Active Sub-Agents: 2
- sa_12345: Analyzing project structure and dependencies
- sa_67890: Implementing unit tests for compactHistory tool

Active Shell Processes: 3
- sh_abcde: npm test -- --watch packages/agent/src/tools/utility
- sh_fghij: npm run watch
- sh_klmno: git status

Active Browser Sessions: 1
- bs_12345: https://www.typescriptlang.org/docs/handbook/utility-types.html

Your token usage is high (45%). It is recommended to use the 'compactHistory' tool now to reduce context size.
--- END STATUS ---
```

## About Status Updates

Status updates are sent to the agent (every 5 interactions and whenever token usage exceeds 50%) to provide awareness of:

1. **Token Usage**: Current usage and percentage of maximum context window
2. **Cost**: Estimated cost of the session so far
3. **Active Sub-Agents**: Running background agents and their tasks
4. **Active Shell Processes**: Running shell commands
5. **Active Browser Sessions**: Open browser sessions and their URLs

When token usage gets high (>70%), the agent is reminded to use the `compactHistory` tool to reduce context size by summarizing older messages.

## Using the compactHistory Tool

The agent can use the compactHistory tool like this:

```javascript
{
  name: "compactHistory",
  preserveRecentMessages: 10,
  customPrompt: "Optional custom summarization prompt"
}
```

This will summarize all but the 10 most recent messages into a single summary message, significantly reducing token usage while preserving important context.

packages/agent/CHANGELOG.md
Lines changed: 7 additions & 0 deletions

@@ -1,3 +1,10 @@
+# [mycoder-agent-v1.6.0](https://github.com/drivecore/mycoder/compare/mycoder-agent-v1.5.0...mycoder-agent-v1.6.0) (2025-03-21)
+
+
+### Features
+
+* **browser:** add system browser detection for Playwright ([00bd879](https://github.com/drivecore/mycoder/commit/00bd879443c9de51c6ee5e227d4838905506382a)), closes [#333](https://github.com/drivecore/mycoder/issues/333)
+
 # [mycoder-agent-v1.5.0](https://github.com/drivecore/mycoder/compare/mycoder-agent-v1.4.2...mycoder-agent-v1.5.0) (2025-03-20)
 
 ### Bug Fixes

packages/agent/package.json
Lines changed: 1 addition & 1 deletion

@@ -1,6 +1,6 @@
 {
   "name": "mycoder-agent",
-  "version": "1.5.0",
+  "version": "1.6.0",
   "description": "Agent module for mycoder - an AI-powered software development assistant",
   "type": "module",
   "main": "dist/index.js",

packages/agent/src/core/llm/providers/anthropic.ts
Lines changed: 27 additions & 3 deletions

@@ -81,13 +81,33 @@ function addCacheControlToMessages(
   });
 }
 
-function tokenUsageFromMessage(message: Anthropic.Message) {
+// Define model context window sizes for Anthropic models
+const ANTHROPIC_MODEL_LIMITS: Record<string, number> = {
+  'claude-3-opus-20240229': 200000,
+  'claude-3-sonnet-20240229': 200000,
+  'claude-3-haiku-20240307': 200000,
+  'claude-3-7-sonnet-20250219': 200000,
+  'claude-2.1': 100000,
+  'claude-2.0': 100000,
+  'claude-instant-1.2': 100000,
+  // Add other models as needed
+};
+
+function tokenUsageFromMessage(message: Anthropic.Message, model: string) {
   const usage = new TokenUsage();
   usage.input = message.usage.input_tokens;
   usage.cacheWrites = message.usage.cache_creation_input_tokens ?? 0;
   usage.cacheReads = message.usage.cache_read_input_tokens ?? 0;
   usage.output = message.usage.output_tokens;
-  return usage;
+
+  const totalTokens = usage.input + usage.output;
+  const maxTokens = ANTHROPIC_MODEL_LIMITS[model] || 100000; // Default fallback
+
+  return {
+    usage,
+    totalTokens,
+    maxTokens,
+  };
 }
 
 /**
@@ -175,10 +195,14 @@ export class AnthropicProvider implements LLMProvider {
         };
       });
 
+      const tokenInfo = tokenUsageFromMessage(response, this.model);
+
       return {
         text: content,
         toolCalls: toolCalls,
-        tokenUsage: tokenUsageFromMessage(response),
+        tokenUsage: tokenInfo.usage,
+        totalTokens: tokenInfo.totalTokens,
+        maxTokens: tokenInfo.maxTokens,
       };
     } catch (error) {
       throw new Error(

packages/agent/src/core/llm/providers/ollama.ts
Lines changed: 31 additions & 3 deletions

@@ -13,6 +13,22 @@ import {
 
 import { TokenUsage } from '../../tokens.js';
 import { ToolCall } from '../../types.js';
+// Define model context window sizes for Ollama models
+// These are approximate and may vary based on specific model configurations
+const OLLAMA_MODEL_LIMITS: Record<string, number> = {
+  'llama2': 4096,
+  'llama2-uncensored': 4096,
+  'llama2:13b': 4096,
+  'llama2:70b': 4096,
+  'mistral': 8192,
+  'mistral:7b': 8192,
+  'mixtral': 32768,
+  'codellama': 16384,
+  'phi': 2048,
+  'phi2': 2048,
+  'openchat': 8192,
+  // Add other models as needed
+};
 import { LLMProvider } from '../provider.js';
 import {
   GenerateOptions,
@@ -56,7 +72,7 @@ export class OllamaProvider implements LLMProvider {
       messages,
       functions,
       temperature = 0.7,
-      maxTokens,
+      maxTokens: requestMaxTokens,
       topP,
       frequencyPenalty,
       presencePenalty,
@@ -86,10 +102,10 @@ export class OllamaProvider implements LLMProvider {
     };
 
     // Add max_tokens if provided
-    if (maxTokens !== undefined) {
+    if (requestMaxTokens !== undefined) {
       requestOptions.options = {
         ...requestOptions.options,
-        num_predict: maxTokens,
+        num_predict: requestMaxTokens,
       };
     }
 
@@ -114,11 +130,23 @@ export class OllamaProvider implements LLMProvider {
     const tokenUsage = new TokenUsage();
     tokenUsage.output = response.eval_count || 0;
     tokenUsage.input = response.prompt_eval_count || 0;
+
+    // Calculate total tokens and get max tokens for the model
+    const totalTokens = tokenUsage.input + tokenUsage.output;
+
+    // Extract the base model name without specific parameters
+    const baseModelName = this.model.split(':')[0];
+    // Check if model exists in limits, otherwise use base model or default
+    const modelMaxTokens = OLLAMA_MODEL_LIMITS[this.model] ||
+      (baseModelName ? OLLAMA_MODEL_LIMITS[baseModelName] : undefined) ||
+      4096; // Default fallback
 
     return {
       text: content,
       toolCalls: toolCalls,
       tokenUsage: tokenUsage,
+      totalTokens,
+      maxTokens: modelMaxTokens,
     };
   }
 

packages/agent/src/core/llm/providers/openai.ts
Lines changed: 23 additions & 4 deletions

@@ -4,7 +4,7 @@
 import OpenAI from 'openai';
 
 import { TokenUsage } from '../../tokens.js';
-import { ToolCall } from '../../types';
+import { ToolCall } from '../../types.js';
 import { LLMProvider } from '../provider.js';
 import {
   GenerateOptions,
@@ -19,6 +19,19 @@ import type {
   ChatCompletionTool,
 } from 'openai/resources/chat';
 
+// Define model context window sizes for OpenAI models
+const OPENAI_MODEL_LIMITS: Record<string, number> = {
+  'gpt-4o': 128000,
+  'gpt-4-turbo': 128000,
+  'gpt-4-0125-preview': 128000,
+  'gpt-4-1106-preview': 128000,
+  'gpt-4': 8192,
+  'gpt-4-32k': 32768,
+  'gpt-3.5-turbo': 16385,
+  'gpt-3.5-turbo-16k': 16385,
+  // Add other models as needed
+};
+
 /**
  * OpenAI-specific options
  */
@@ -60,7 +73,7 @@ export class OpenAIProvider implements LLMProvider {
       messages,
      functions,
       temperature = 0.7,
-      maxTokens,
+      maxTokens: requestMaxTokens,
       stopSequences,
       topP,
       presencePenalty,
@@ -79,7 +92,7 @@ export class OpenAIProvider implements LLMProvider {
         model: this.model,
         messages: formattedMessages,
         temperature,
-        max_tokens: maxTokens,
+        max_tokens: requestMaxTokens,
         stop: stopSequences,
         top_p: topP,
         presence_penalty: presencePenalty,
@@ -116,11 +129,17 @@ export class OpenAIProvider implements LLMProvider {
       const tokenUsage = new TokenUsage();
       tokenUsage.input = response.usage?.prompt_tokens || 0;
       tokenUsage.output = response.usage?.completion_tokens || 0;
+
+      // Calculate total tokens and get max tokens for the model
+      const totalTokens = tokenUsage.input + tokenUsage.output;
+      const modelMaxTokens = OPENAI_MODEL_LIMITS[this.model] || 8192; // Default fallback
 
       return {
         text: content,
         toolCalls,
         tokenUsage,
+        totalTokens,
+        maxTokens: modelMaxTokens,
       };
     } catch (error) {
       throw new Error(`Error calling OpenAI API: ${(error as Error).message}`);
@@ -198,4 +217,4 @@ export class OpenAIProvider implements LLMProvider {
     },
   }));
 }
-}
+}

packages/agent/src/core/llm/types.ts
Lines changed: 3 additions & 0 deletions

@@ -80,6 +80,9 @@ export interface LLMResponse {
   text: string;
   toolCalls: ToolCall[];
   tokenUsage: TokenUsage;
+  // Add new fields for context window tracking
+  totalTokens?: number; // Total tokens used in this request
+  maxTokens?: number; // Maximum allowed tokens for this model
 }
 
 /**
