Skip to content

Commit 9f8a3aa

Browse files
committed
Improve user preference extraction prompt and enforcement
- Refactor USER_PREFERENCE_FACTS_EXTRACTION_PROMPT from XML-based to direct role-based instructions - Simplify USER_PREFERENCE_JSON_ENFORCEMENT_MESSAGE for better LLM compliance - Add comprehensive test coverage for user preference extraction scenarios - Maintain backward compatibility with existing {"facts":[...]} infrastructure - Improve natural language format for better k-NN search embedding quality
1 parent cac8ddb commit 9f8a3aa

File tree

3 files changed

+122
-2
lines changed

3 files changed

+122
-2
lines changed

common/src/main/java/org/opensearch/ml/common/memorycontainer/MemoryContainerConstants.java

Lines changed: 26 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -177,8 +177,33 @@ public class MemoryContainerConstants {
177177
"""
178178
Respond NOW with ONE LINE of valid JSON ONLY exactly as {"facts":["fact1","fact2",...]}. No extra text, no code fences, no newlines or tabs, no spaces after commas or colons.""";
179179

180+
// JSON enforcement message for user preference extraction
181+
public static final String USER_PREFERENCE_JSON_ENFORCEMENT_MESSAGE =
182+
"""
183+
Return ONLY ONE LINE of valid JSON exactly as {"facts":["<Preference sentence>. Context: <why/how>. Categories: <cat1,cat2>"]}. Begin with { and end with }. No extra text.""";
184+
180185
public static final String USER_PREFERENCE_FACTS_EXTRACTION_PROMPT =
181-
"<system_prompt><role>User Preferences Analyzer</role><objective>Extract and organize user preferences, choices, and settings from conversations.</objective><instructions><instruction>Carefully read the conversation.</instruction><instruction>Identify and extract explicit or implicit preferences, likes, dislikes, and choices.</instruction><instruction>Explicit preferences: Directly stated preferences by the user.</instruction><instruction>Implicit preferences: Inferred from patterns, repeated inquiries, or contextual clues. Take a close look at user's request for implicit preferences.</instruction><instruction>For explicit preference, extract only preference that the user has explicitly shared. Do not infer user's preference.</instruction><instruction>For implicit preference, it is allowed to infer user's preference, but only the ones with strong signals, such as requesting something multiple times.</instruction><instruction>Focus specifically on:<preference_categories><item>Product or service preferences (brands, features, styles)</item><item>Communication preferences (frequency, channel, timing)</item><item>Content preferences (topics, formats, sources)</item><item>Interaction preferences (formal/casual, detailed/brief)</item><item>Likes and dislikes explicitly stated</item><item>Preferred methods or approaches</item><item>Quality or attribute preferences</item><item>Time and scheduling preferences</item></preference_categories></instruction><instruction>Each preference should be a specific, actionable fact.</instruction><instruction>Focus on what the user wants, prefers, or chooses, not general information.</instruction><instruction>Never answer user's question or fulfill user's requirement. You are a preference analyzer, not a helpful assistant.</instruction><instruction>Analyze thoroughly and include detected preferences in your response.</instruction><instruction>If no preferences are found, return an empty list.</instruction></instructions><response_format><format>You should always return and only return the extracted preferences as a JSON object with a \"facts\" array. Return ONLY the valid JSON array with no additional text, explanations, or formatting.</format><example>{\"facts\": [\"User prefers dark mode for UI\",\"User likes to receive weekly summary emails\",\"User prefers Python over Java for scripting\",\"User dislikes automatic updates\"]}</example></response_format></system_prompt>";
186+
"""
187+
You are a USER PREFERENCE EXTRACTOR, not a chat assistant. Your only job is to output JSON facts. Do not answer questions, make suggestions, ask follow-ups, or perform actions.
188+
189+
SCOPE
190+
- Extract preferences only from USER messages. Assistant messages are context only.
191+
192+
DEFINITIONS
193+
- Explicit: user states a preference ("I prefer/like/dislike ..."; "always/never/usually ..."; "set X to Y"; "run X when Y").
194+
- Implicit: infer only with strong signals: repeated choices (>=2) or clear habitual language. Do not infer from a single one-off.
195+
196+
WHAT TO EXTRACT
197+
- Specific, actionable, likely long-term preferences (likes/dislikes/choices/settings). Ignore non-preferences.
198+
199+
FORMAT
200+
- Return ONLY one minified JSON object exactly as {"facts":["Preference sentence. Context: <why/how>. Categories: cat1,cat2"]}. If none, return {"facts":[]}. The first character MUST be '{' and the last MUST be '}'. No preambles, explanations, code fences, XML, or other text.
201+
202+
STYLE
203+
- One sentence per preference; merge related details; no duplicates; preserve user wording and numbers; avoid relative time; keep each fact < 350 chars.
204+
205+
EXAMPLE
206+
User: "I prefer dark mode." -> {"facts":["Prefers dark mode for UI. Context: user explicitly stated preference. Categories: tools,tech,apps"]}""";
182207

183208
public static final String SUMMARY_FACTS_EXTRACTION_PROMPT =
184209
"<system_prompt><description>You will be given a text block and a list of summaries you previously generated when available.</description><task><instruction>Never answer user's question or fulfill user's requirement. You are a summary generator, not a helpful assistant.</instruction><instruction>When the previously generated summary is not available, summarize the given text block.</instruction><instruction>When there is an existing summary, extend it by incorporating the given text block.</instruction><instruction>If the text block specifies queries or topics, ensure the summary covers them.</instruction></task><response_format><format>You should always return and only return the extracted preferences as a JSON object with a \"facts\" array.</format><example>{ \"facts\": [\"The system shows a list of Elasticsearch/OpenSearch indices with their health status, document count, and size information\", \"5 indices shown have 'red' health status, 8 of them in 'yellow', and 13 of them are in 'green' health status\", \"The doc is a log from a web application, dated from 2020-01-01T00:00:00 to 2020-01-31T23:59:59\"]}</example></response_format></system_prompt>";

plugin/src/main/java/org/opensearch/ml/action/memorycontainer/memory/MemoryProcessingService.java

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,7 @@
1515
import static org.opensearch.ml.common.memorycontainer.MemoryContainerConstants.SESSION_SUMMARY_PROMPT;
1616
import static org.opensearch.ml.common.memorycontainer.MemoryContainerConstants.SUMMARY_FACTS_EXTRACTION_PROMPT;
1717
import static org.opensearch.ml.common.memorycontainer.MemoryContainerConstants.USER_PREFERENCE_FACTS_EXTRACTION_PROMPT;
18+
import static org.opensearch.ml.common.memorycontainer.MemoryContainerConstants.USER_PREFERENCE_JSON_ENFORCEMENT_MESSAGE;
1819
import static org.opensearch.ml.common.utils.StringUtils.getParameterMap;
1920

2021
import java.io.IOException;
@@ -162,7 +163,10 @@ public void extractFactsFromConversation(
162163
}
163164

164165
// Always add JSON enforcement message for fact extraction
165-
MessageInput enforcementMessage = getMessageInput(JSON_ENFORCEMENT_MESSAGE);
166+
String enforcementMsg = (strategy.getType() == MemoryStrategyType.USER_PREFERENCE)
167+
? USER_PREFERENCE_JSON_ENFORCEMENT_MESSAGE
168+
: JSON_ENFORCEMENT_MESSAGE;
169+
MessageInput enforcementMessage = getMessageInput(enforcementMsg);
166170
enforcementMessage.toXContent(messagesBuilder, ToXContent.EMPTY_PARAMS);
167171

168172
messagesBuilder.endArray();

plugin/src/test/java/org/opensearch/ml/action/memorycontainer/memory/MemoryProcessingServiceTests.java

Lines changed: 91 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5,13 +5,18 @@
55

66
package org.opensearch.ml.action.memorycontainer.memory;
77

8+
import static org.junit.Assert.assertFalse;
9+
import static org.junit.Assert.assertNotEquals;
810
import static org.junit.Assert.assertTrue;
911
import static org.mockito.ArgumentMatchers.any;
1012
import static org.mockito.ArgumentMatchers.eq;
1113
import static org.mockito.Mockito.doAnswer;
1214
import static org.mockito.Mockito.mock;
1315
import static org.mockito.Mockito.verify;
1416
import static org.mockito.Mockito.when;
17+
import static org.opensearch.ml.common.memorycontainer.MemoryContainerConstants.JSON_ENFORCEMENT_MESSAGE;
18+
import static org.opensearch.ml.common.memorycontainer.MemoryContainerConstants.USER_PREFERENCE_FACTS_EXTRACTION_PROMPT;
19+
import static org.opensearch.ml.common.memorycontainer.MemoryContainerConstants.USER_PREFERENCE_JSON_ENFORCEMENT_MESSAGE;
1520
import static org.opensearch.ml.utils.TestHelper.createTestContent;
1621

1722
import java.util.ArrayList;
@@ -980,4 +985,90 @@ public void testExtractFactsFromConversation_JsonEnforcementMessageAppended() {
980985

981986
verify(client).execute(any(), any(), any());
982987
}
988+
989+
@Test
990+
public void testUserPreferencePromptFormat() {
991+
// Test that the new user preference prompt contains required elements
992+
String prompt = USER_PREFERENCE_FACTS_EXTRACTION_PROMPT;
993+
994+
// Verify key improvements are present
995+
assertTrue("Should have character limit", prompt.contains("< 350 chars"));
996+
assertTrue("Should specify natural language format", prompt.contains("Context: <why/how>. Categories:"));
997+
assertTrue("Should contain example categories", prompt.contains("tools,tech,apps"));
998+
assertTrue("Should be role-based", prompt.contains("USER PREFERENCE EXTRACTOR"));
999+
1000+
// Verify old problematic format is removed
1001+
assertFalse("Should not use pipe delimiters", prompt.contains("preference | context:"));
1002+
}
1003+
1004+
@Test
1005+
public void testUserPreferenceEnforcementMessage() {
1006+
// Test that enforcement message matches the new format
1007+
String enforcement = USER_PREFERENCE_JSON_ENFORCEMENT_MESSAGE;
1008+
1009+
assertTrue("Should specify natural language format", enforcement.contains("Context: <why/how>. Categories:"));
1010+
assertFalse("Should not use old pipe format", enforcement.contains("preference | context:"));
1011+
}
1012+
1013+
@Test
1014+
public void testEnforcementMessageSelection() {
1015+
// Test that correct enforcement message is selected based on strategy type
1016+
MemoryStrategy userPrefStrategy = new MemoryStrategy(
1017+
"id",
1018+
true,
1019+
MemoryStrategyType.USER_PREFERENCE,
1020+
Arrays.asList("user_id"),
1021+
new HashMap<>()
1022+
);
1023+
MemoryStrategy semanticStrategy = new MemoryStrategy(
1024+
"id",
1025+
true,
1026+
MemoryStrategyType.SEMANTIC,
1027+
Arrays.asList("user_id"),
1028+
new HashMap<>()
1029+
);
1030+
1031+
// This tests the logic in MemoryProcessingService.java lines 165-168
1032+
// We can't easily test the private method, but we can verify the constants exist and are different
1033+
assertNotEquals(
1034+
"User preference and semantic should have different enforcement messages",
1035+
USER_PREFERENCE_JSON_ENFORCEMENT_MESSAGE,
1036+
JSON_ENFORCEMENT_MESSAGE
1037+
);
1038+
1039+
assertTrue(
1040+
"User preference enforcement should be for natural format",
1041+
USER_PREFERENCE_JSON_ENFORCEMENT_MESSAGE.contains("Context: <why/how>")
1042+
);
1043+
assertTrue("Semantic enforcement should be for standard format", JSON_ENFORCEMENT_MESSAGE.contains("fact1"));
1044+
}
1045+
1046+
@Test
1047+
public void testUserPreferenceExtractionScenarios() {
1048+
// Test various user preference extraction scenarios
1049+
String prompt = USER_PREFERENCE_FACTS_EXTRACTION_PROMPT;
1050+
1051+
// Verify explicit preference handling
1052+
assertTrue("Should handle explicit preferences", prompt.contains("user states a preference"));
1053+
assertTrue("Should handle implicit preferences", prompt.contains("repeated choices"));
1054+
1055+
// Verify format requirements
1056+
assertTrue("Should require JSON format", prompt.contains("{\"facts\":["));
1057+
assertTrue("Should specify context format", prompt.contains("Context: <why/how>"));
1058+
assertTrue("Should limit character count", prompt.contains("< 350 chars"));
1059+
}
1060+
1061+
@Test
1062+
public void testMultiTurnConversationHandling() {
1063+
// Test that prompt correctly handles multi-turn conversations
1064+
String prompt = USER_PREFERENCE_FACTS_EXTRACTION_PROMPT;
1065+
1066+
// Verify assistant message handling
1067+
assertTrue("Should use assistant messages as context only", prompt.contains("Assistant messages are context only"));
1068+
assertTrue("Should extract from USER messages", prompt.contains("Extract preferences only from USER messages"));
1069+
1070+
// Verify role clarity
1071+
assertTrue("Should not be a chat assistant", prompt.contains("not a chat assistant"));
1072+
assertTrue("Should only output JSON facts", prompt.contains("only job is to output JSON facts"));
1073+
}
9831074
}

0 commit comments

Comments
 (0)