Commit c91b5ef

Improve user preference extraction prompt with XML-based structure
- Refactor USER_PREFERENCE_FACTS_EXTRACTION_PROMPT to match SEMANTIC_FACTS_EXTRACTION_PROMPT style
- Use lightweight XML structure: <ROLE>, <SCOPE>, <EXTRACT>, <STYLE & RULES>, <OUTPUT>
- Simplify USER_PREFERENCE_JSON_ENFORCEMENT_MESSAGE for better LLM compliance
- Add comprehensive test coverage for user preference extraction scenarios
- Maintain backward compatibility with existing {"facts":[...]} infrastructure
- Improve natural language format for better k-NN search embedding quality
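
For reviewers who want a quick sanity check on the one-line {"facts":[...]} envelope the commit message refers to, here is a minimal sketch. The class and method names (FactsEnvelopeCheck, looksLikeFactsEnvelope) are hypothetical and not part of this commit. It is only a cheap shape check on the raw reply, not a JSON parse; real code would presumably hand the string to a proper JSON parser afterwards.

```java
public class FactsEnvelopeCheck {

    // Hypothetical helper: returns true when the reply is a single line that
    // starts with {"facts":[ and ends with ]}. It does not validate the JSON inside.
    static boolean looksLikeFactsEnvelope(String reply) {
        if (reply == null) {
            return false;
        }
        return reply.startsWith("{\"facts\":[")
            && reply.endsWith("]}")
            && reply.indexOf('\n') < 0
            && reply.indexOf('\r') < 0;
    }

    public static void main(String[] args) {
        // An empty facts array is a valid "nothing found" reply.
        System.out.println(looksLikeFactsEnvelope("{\"facts\":[]}"));
        // A preamble before the JSON violates the "first character MUST be '{'" rule.
        System.out.println(looksLikeFactsEnvelope("Here you go: {\"facts\":[]}"));
    }
}
```

The check is deliberately strict about leading and trailing characters, since the prompts ask for no preambles, code fences, or extra text.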
1 parent cac8ddb commit c91b5ef

3 files changed: +126 -2 lines changed

common/src/main/java/org/opensearch/ml/common/memorycontainer/MemoryContainerConstants.java

Lines changed: 26 additions & 1 deletion
@@ -177,8 +177,33 @@ public class MemoryContainerConstants {
         """
             Respond NOW with ONE LINE of valid JSON ONLY exactly as {"facts":["fact1","fact2",...]}. No extra text, no code fences, no newlines or tabs, no spaces after commas or colons.""";
 
+    // JSON enforcement message for user preference extraction
+    public static final String USER_PREFERENCE_JSON_ENFORCEMENT_MESSAGE =
+        """
+            Return ONLY ONE LINE of valid JSON exactly as {"facts":["<Preference sentence>. Context: <why/how>. Categories: <cat1,cat2>"]}. Begin with { and end with }. No extra text.""";
+
     public static final String USER_PREFERENCE_FACTS_EXTRACTION_PROMPT =
-        "<system_prompt><role>User Preferences Analyzer</role><objective>Extract and organize user preferences, choices, and settings from conversations.</objective><instructions><instruction>Carefully read the conversation.</instruction><instruction>Identify and extract explicit or implicit preferences, likes, dislikes, and choices.</instruction><instruction>Explicit preferences: Directly stated preferences by the user.</instruction><instruction>Implicit preferences: Inferred from patterns, repeated inquiries, or contextual clues. Take a close look at user's request for implicit preferences.</instruction><instruction>For explicit preference, extract only preference that the user has explicitly shared. Do not infer user's preference.</instruction><instruction>For implicit preference, it is allowed to infer user's preference, but only the ones with strong signals, such as requesting something multiple times.</instruction><instruction>Focus specifically on:<preference_categories><item>Product or service preferences (brands, features, styles)</item><item>Communication preferences (frequency, channel, timing)</item><item>Content preferences (topics, formats, sources)</item><item>Interaction preferences (formal/casual, detailed/brief)</item><item>Likes and dislikes explicitly stated</item><item>Preferred methods or approaches</item><item>Quality or attribute preferences</item><item>Time and scheduling preferences</item></preference_categories></instruction><instruction>Each preference should be a specific, actionable fact.</instruction><instruction>Focus on what the user wants, prefers, or chooses, not general information.</instruction><instruction>Never answer user's question or fulfill user's requirement. You are a preference analyzer, not a helpful assistant.</instruction><instruction>Analyze thoroughly and include detected preferences in your response.</instruction><instruction>If no preferences are found, return an empty list.</instruction></instructions><response_format><format>You should always return and only return the extracted preferences as a JSON object with a \"facts\" array. Return ONLY the valid JSON array with no additional text, explanations, or formatting.</format><example>{\"facts\": [\"User prefers dark mode for UI\",\"User likes to receive weekly summary emails\",\"User prefers Python over Java for scripting\",\"User dislikes automatic updates\"]}</example></response_format></system_prompt>";
+        """
+            <ROLE>You are a USER PREFERENCE EXTRACTOR, not a chat assistant. Your only job is to output JSON facts. Do not answer questions, make suggestions, ask follow-ups, or perform actions.</ROLE>
+
+            <SCOPE>
+            • Extract preferences only from USER messages. Assistant messages are context only.
+            • Explicit: user states a preference ("I prefer/like/dislike ..."; "always/never/usually ..."; "set X to Y"; "run X when Y").
+            • Implicit: infer only with strong signals: repeated choices (>=2) or clear habitual language. Do not infer from a single one-off.
+            </SCOPE>
+
+            <EXTRACT>
+            • Specific, actionable, likely long-term preferences (likes/dislikes/choices/settings). Ignore non-preferences.
+            </EXTRACT>
+
+            <STYLE & RULES>
+            • One sentence per preference; merge related details; no duplicates; preserve user wording and numbers; avoid relative time; keep each fact < 350 chars.
+            • Format: "Preference sentence. Context: <why/how>. Categories: cat1,cat2"
+            </STYLE & RULES>
+
+            <OUTPUT>
+            Return ONLY one minified JSON object exactly as {"facts":["Preference sentence. Context: <why/how>. Categories: cat1,cat2"]}. If none, return {"facts":[]}. The first character MUST be '{' and the last MUST be '}'. No preambles, explanations, code fences, XML, or other text.
+            </OUTPUT>""";
 
     public static final String SUMMARY_FACTS_EXTRACTION_PROMPT =
         "<system_prompt><description>You will be given a text block and a list of summaries you previously generated when available.</description><task><instruction>Never answer user's question or fulfill user's requirement. You are a summary generator, not a helpful assistant.</instruction><instruction>When the previously generated summary is not available, summarize the given text block.</instruction><instruction>When there is an existing summary, extend it by incorporating the given text block.</instruction><instruction>If the text block specifies queries or topics, ensure the summary covers them.</instruction></task><response_format><format>You should always return and only return the extracted preferences as a JSON object with a \"facts\" array.</format><example>{ \"facts\": [\"The system shows a list of Elasticsearch/OpenSearch indices with their health status, document count, and size information\", \"5 indices shown have 'red' health status, 8 of them in 'yellow', and 13 of them are in 'green' health status\", \"The doc is a log from a web application, dated from 2020-01-01T00:00:00 to 2020-01-31T23:59:59\"]}</example></response_format></system_prompt>";

plugin/src/main/java/org/opensearch/ml/action/memorycontainer/memory/MemoryProcessingService.java

Lines changed: 5 additions & 1 deletion
@@ -15,6 +15,7 @@
 import static org.opensearch.ml.common.memorycontainer.MemoryContainerConstants.SESSION_SUMMARY_PROMPT;
 import static org.opensearch.ml.common.memorycontainer.MemoryContainerConstants.SUMMARY_FACTS_EXTRACTION_PROMPT;
 import static org.opensearch.ml.common.memorycontainer.MemoryContainerConstants.USER_PREFERENCE_FACTS_EXTRACTION_PROMPT;
+import static org.opensearch.ml.common.memorycontainer.MemoryContainerConstants.USER_PREFERENCE_JSON_ENFORCEMENT_MESSAGE;
 import static org.opensearch.ml.common.utils.StringUtils.getParameterMap;
 
 import java.io.IOException;
@@ -162,7 +163,10 @@ public void extractFactsFromConversation(
         }
 
         // Always add JSON enforcement message for fact extraction
-        MessageInput enforcementMessage = getMessageInput(JSON_ENFORCEMENT_MESSAGE);
+        String enforcementMsg = (strategy.getType() == MemoryStrategyType.USER_PREFERENCE)
+            ? USER_PREFERENCE_JSON_ENFORCEMENT_MESSAGE
+            : JSON_ENFORCEMENT_MESSAGE;
+        MessageInput enforcementMessage = getMessageInput(enforcementMsg);
         enforcementMessage.toXContent(messagesBuilder, ToXContent.EMPTY_PARAMS);
 
         messagesBuilder.endArray();
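
Review note: the ternary in this hunk is inlined in extractFactsFromConversation, which makes the selection hard to unit test in isolation. One possible shape, sketched below under assumptions, is a small static helper. Everything in the sketch is hypothetical: StrategyType stands in for MemoryStrategyType (only the two values that appear in this commit are listed), and the two string constants are shortened placeholders rather than the real messages from MemoryContainerConstants.

```java
public class EnforcementMessageSelector {

    // Hypothetical stand-in for MemoryStrategyType.
    enum StrategyType { SEMANTIC, USER_PREFERENCE }

    // Shortened placeholders, not the real constants from MemoryContainerConstants.
    static final String JSON_ENFORCEMENT_MESSAGE = "generic facts enforcement";
    static final String USER_PREFERENCE_JSON_ENFORCEMENT_MESSAGE = "user preference enforcement";

    // Same decision as the ternary in the diff: user-preference strategies get
    // their own enforcement message, every other type gets the generic one.
    static String selectEnforcementMessage(StrategyType type) {
        return type == StrategyType.USER_PREFERENCE
            ? USER_PREFERENCE_JSON_ENFORCEMENT_MESSAGE
            : JSON_ENFORCEMENT_MESSAGE;
    }

    public static void main(String[] args) {
        System.out.println(selectEnforcementMessage(StrategyType.USER_PREFERENCE));
        System.out.println(selectEnforcementMessage(StrategyType.SEMANTIC));
    }
}
```

With a helper like this, a test could assert on the selected message per strategy type directly instead of only checking that the two constants differ.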

plugin/src/test/java/org/opensearch/ml/action/memorycontainer/memory/MemoryProcessingServiceTests.java

Lines changed: 95 additions & 0 deletions
@@ -5,13 +5,18 @@
 
 package org.opensearch.ml.action.memorycontainer.memory;
 
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertNotEquals;
 import static org.junit.Assert.assertTrue;
 import static org.mockito.ArgumentMatchers.any;
 import static org.mockito.ArgumentMatchers.eq;
 import static org.mockito.Mockito.doAnswer;
 import static org.mockito.Mockito.mock;
 import static org.mockito.Mockito.verify;
 import static org.mockito.Mockito.when;
+import static org.opensearch.ml.common.memorycontainer.MemoryContainerConstants.JSON_ENFORCEMENT_MESSAGE;
+import static org.opensearch.ml.common.memorycontainer.MemoryContainerConstants.USER_PREFERENCE_FACTS_EXTRACTION_PROMPT;
+import static org.opensearch.ml.common.memorycontainer.MemoryContainerConstants.USER_PREFERENCE_JSON_ENFORCEMENT_MESSAGE;
 import static org.opensearch.ml.utils.TestHelper.createTestContent;
 
 import java.util.ArrayList;
@@ -980,4 +985,94 @@ public void testExtractFactsFromConversation_JsonEnforcementMessageAppended() {
 
         verify(client).execute(any(), any(), any());
     }
+
+    @Test
+    public void testUserPreferencePromptFormat() {
+        // Test that the new user preference prompt contains required elements
+        String prompt = USER_PREFERENCE_FACTS_EXTRACTION_PROMPT;
+
+        // Verify XML-based structure like SEMANTIC_FACTS_EXTRACTION_PROMPT
+        assertTrue("Should have ROLE section", prompt.contains("<ROLE>"));
+        assertTrue("Should have SCOPE section", prompt.contains("<SCOPE>"));
+        assertTrue("Should have OUTPUT section", prompt.contains("<OUTPUT>"));
+        assertTrue("Should be role-based", prompt.contains("USER PREFERENCE EXTRACTOR"));
+
+        // Verify key requirements
+        assertTrue("Should have character limit", prompt.contains("< 350 chars"));
+        assertTrue("Should specify context format", prompt.contains("Context: <why/how>"));
+
+        // Verify old problematic format is removed
+        assertFalse("Should not use pipe delimiters", prompt.contains("preference | context:"));
+    }
+
+    @Test
+    public void testUserPreferenceEnforcementMessage() {
+        // Test that enforcement message matches the new format
+        String enforcement = USER_PREFERENCE_JSON_ENFORCEMENT_MESSAGE;
+
+        assertTrue("Should specify natural language format", enforcement.contains("Context: <why/how>. Categories:"));
+        assertFalse("Should not use old pipe format", enforcement.contains("preference | context:"));
+    }
+
+    @Test
+    public void testEnforcementMessageSelection() {
+        // Test that correct enforcement message is selected based on strategy type
+        MemoryStrategy userPrefStrategy = new MemoryStrategy(
+            "id",
+            true,
+            MemoryStrategyType.USER_PREFERENCE,
+            Arrays.asList("user_id"),
+            new HashMap<>()
+        );
+        MemoryStrategy semanticStrategy = new MemoryStrategy(
+            "id",
+            true,
+            MemoryStrategyType.SEMANTIC,
+            Arrays.asList("user_id"),
+            new HashMap<>()
+        );
+
+        // This tests the logic in MemoryProcessingService.java lines 165-168
+        // We can't easily test the private method, but we can verify the constants exist and are different
+        assertNotEquals(
+            "User preference and semantic should have different enforcement messages",
+            USER_PREFERENCE_JSON_ENFORCEMENT_MESSAGE,
+            JSON_ENFORCEMENT_MESSAGE
+        );
+
+        assertTrue(
+            "User preference enforcement should be for natural format",
+            USER_PREFERENCE_JSON_ENFORCEMENT_MESSAGE.contains("Context: <why/how>")
+        );
+        assertTrue("Semantic enforcement should be for standard format", JSON_ENFORCEMENT_MESSAGE.contains("fact1"));
+    }
+
+    @Test
+    public void testUserPreferenceExtractionScenarios() {
+        // Test various user preference extraction scenarios
+        String prompt = USER_PREFERENCE_FACTS_EXTRACTION_PROMPT;
+
+        // Verify explicit preference handling
+        assertTrue("Should handle explicit preferences", prompt.contains("user states a preference"));
+        assertTrue("Should handle implicit preferences", prompt.contains("repeated choices"));
+
+        // Verify format requirements
+        assertTrue("Should require JSON format", prompt.contains("{\"facts\":["));
+        assertTrue("Should specify context format", prompt.contains("Context: <why/how>"));
+        assertTrue("Should limit character count", prompt.contains("< 350 chars"));
+    }
+
+    @Test
+    public void testMultiTurnConversationHandling() {
+        // Test that prompt correctly handles multi-turn conversations
+        String prompt = USER_PREFERENCE_FACTS_EXTRACTION_PROMPT;
+
+        // Verify assistant message handling
+        assertTrue("Should use assistant messages as context only", prompt.contains("Assistant messages are context only"));
+        assertTrue("Should extract from USER messages", prompt.contains("Extract preferences only from USER messages"));
+
+        // Verify role clarity
+        assertTrue("Should not be a chat assistant", prompt.contains("not a chat assistant"));
+        assertTrue("Should only output JSON facts", prompt.contains("only job is to output JSON facts"));
+    }
 }
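
Review note: the tests in this file assert that the prompt and enforcement message mention the "Preference sentence. Context: <why/how>. Categories: cat1,cat2" format, but nothing here exercises a fact string in that shape. As a minimal sketch of how a consumer might split one such fact, assuming the model follows the format literally, the hypothetical PreferenceFactParser below uses a single regular expression and falls back to treating the whole string as the preference when the markers are missing. None of these names exist in the commit.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class PreferenceFactParser {

    // Hypothetical holder for the three parts of one fact string.
    static final class PreferenceFact {
        final String preference;
        final String context;
        final List<String> categories;

        PreferenceFact(String preference, String context, List<String> categories) {
            this.preference = preference;
            this.context = context;
            this.categories = categories;
        }
    }

    // Lazily captures text before "Context:", between "Context:" and "Categories:", and after.
    private static final Pattern FACT =
        Pattern.compile("^(.*?)\\s*Context:\\s*(.*?)\\s*Categories:\\s*(.*)$", Pattern.DOTALL);

    static PreferenceFact parse(String fact) {
        String trimmed = fact.trim();
        Matcher m = FACT.matcher(trimmed);
        if (!m.matches()) {
            // Markers missing: keep the whole string as the preference sentence.
            return new PreferenceFact(trimmed, "", List.of());
        }
        List<String> categories = Arrays.stream(stripTrailingPeriod(m.group(3)).split(","))
            .map(String::trim)
            .filter(s -> !s.isEmpty())
            .collect(Collectors.toList());
        return new PreferenceFact(m.group(1).trim(), stripTrailingPeriod(m.group(2)), categories);
    }

    // Removes one trailing period so "…dashboard." becomes "…dashboard".
    private static String stripTrailingPeriod(String s) {
        String t = s.trim();
        return t.endsWith(".") ? t.substring(0, t.length() - 1).trim() : t;
    }

    public static void main(String[] args) {
        PreferenceFact f = parse(
            "User prefers dark mode for UI. Context: mentioned twice while configuring the dashboard. Categories: ui,display");
        System.out.println(f.preference + " | " + f.context + " | " + f.categories);
    }
}
```

The fallback branch matters for backward compatibility: older facts such as "User dislikes automatic updates" carry no Context or Categories markers and would still parse as a bare preference.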
