sotopia-lab · neelbhandari6 · Feb 20, 2025 · Feb 21, 2025 · Mar 27, 2025 · Apr 1, 2025
diff --git a/README.md b/README.md
@@ -45,6 +45,9 @@ Sotopia is an open-ended social learning environment that allows agents to inter
 ## Help
 See [documentation](https://docs.sotopia.world) for more details.
 
+> [!IMPORTANT]
+> If you are developing on top of Sotopia, we highly recommend following the [development guide](https://docs.sotopia.world/contribution/contribution).
+
 ## Get started
 
 ### Install locally

diff --git a/docs/pages/examples/benchmark.md b/docs/pages/examples/benchmark.md
@@ -12,3 +12,15 @@ When `only-show-performance` is speficied, only model results with available epi
 Currently this script would run over 100 simulations on the Sotopia Hard tasks. And the partner model is fixed to be `meta-llama/Llama-3-70b-chat-hf`
 
 An example script is provided in `scripts/display_benchmark_results.sh`
+
+# Benchmark your model as an evaluator
+
+```
+uv run python examples/benchmark_evaluator.py --model=<model> --tag=<tag> --batch-size=<batch_size> --push-to-db
+```
+
+This script re-evaluates the existing episodes with the new model and compares the results with human annotations.
+
+> **Note:** You might need to run the script twice to get the results, because uploading to the database can take some time to complete.
+
+> **Warning:** The re-evaluation does not use the exact same prompt as the original evaluation. However, we have no evidence suggesting that this slight format difference causes any performance discrepancy.
diff --git a/docs/pages/experimental/index.mdx b/docs/pages/experimental/index.mdx
@@ -4,6 +4,21 @@ import { Callout } from "nextra/components"
 This part of the documentation is for experimental features. The APIs and functionalities are subject to frequent change.
 </Callout>
 
+<Callout type="info">
+Sotopia is transitioning to the AACT engine (AACT is an actor model library with strong typing and validation) for its experimental features. Essentially, each agent's logic runs in its own process. Why not use asyncio directly? (That is basically what currently popular multi-agent frameworks such as Autogen, Swarm, and CrewAI use.)
+
+Asyncio requires a non-blocking implementation of the agent's logic. Imagine two agents chatting with each other. If we use asyncio directly, we need to wait for the first agent to finish its turn before the second agent can respond. This is not a natural interaction flow: if one agent takes forever to type, the other agent has to wait. That is totally fine when the agents are "cooperative" and the interaction is "turn-based."
+
+But that is really not the case for social simulations.
+
+And what if we have 1000 agents? Things get even worse as the interactions and dependencies between the agents become more complex.
+
+Instead, we advocate a "real-time" async interaction flow, where each agent is independent and can do its own thing regardless of the other agents.
+
+We believe this new engine will be the future of more realistic social simulations.
+So here we are, in this very exciting experimental phase, and we are looking for your feedback and help!
+</Callout>
+
 The experimental APIs of Sotopia are intended for quickly prototyping and experimenting with new functionalities,
 without breaking the existing stable APIs. But we will still maintain the quality of the code for these features.
 Feel free to raise an issue if you find any bugs or wants more features in the experimental APIs.

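Review note on the callout added to `docs/pages/experimental/index.mdx`: the "each agent is independent" idea can be illustrated with a minimal sketch. This is not AACT's API. It uses standard-library threads as a stand-in for the per-agent processes the callout describes, and the names `run_independent_agents`, `fast_agent`, and `slow_agent` are hypothetical. The slow agent stays "typing" until the fast agent has finished, which a strictly turn-based loop that waits on the slow agent first could not express.

```python
import queue
import threading


def run_independent_agents(fast_turns: int = 3) -> list:
    """Run two agents concurrently; each writes to a shared channel on its own schedule."""
    channel = queue.Queue()
    fast_done = threading.Event()

    def fast_agent() -> None:
        # The fast agent never waits for the slow one.
        for i in range(fast_turns):
            channel.put(f"fast:{i}")
        fast_done.set()

    def slow_agent() -> None:
        # Stand-in for an agent that is "still typing": it only speaks
        # once the fast agent has sent all of its messages.
        fast_done.wait()
        channel.put("slow:0")

    # Start the slow agent first to show that start order does not matter.
    threads = [
        threading.Thread(target=slow_agent),
        threading.Thread(target=fast_agent),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    # Drain the channel in arrival order after both agents have finished.
    messages = []
    while not channel.empty():
        messages.append(channel.get())
    return messages


messages = run_independent_agents()
```

Because neither agent's loop depends on the other taking a turn, the fast agent's messages all arrive before the slow agent's single message, regardless of which thread was started first.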
diff --git a/examples/benchmark_evaluator.py b/examples/benchmark_evaluator.py
@@ -15,8 +15,8 @@
 
 target_model_patterns: list[list[str]] = [
     ["gpt-4", "gpt-4", "gpt-3.5-turbo"],
-    ["gpt-4", "gpt-4o-mini", "gpt-4"],
-    ["gpt-4", "gpt-4o-mini", "togethercomputer/llama-2-70b-chat"],
+    ["gpt-4", "gpt-3.5-turbo", "gpt-4"],
+    ["gpt-4", "gpt-3.5-turbo", "togethercomputer/llama-2-70b-chat"],
     ["gpt-4", "togethercomputer/llama-2-70b-chat", "gpt-3.5-turbo"],
 ]
 
@@ -113,7 +113,6 @@ def evaluate_evaluator(
     to_re_evaluate_list = list(human_annotation_dict.keys())
     aggregate_human_annotations: list[EpisodeLog] = list(human_annotation_dict.values())  # type: ignore
     # Call the function with the specified parameters
-
     re_evaluated_episodes: list[EpisodeLog] = EpisodeLog.find(
         EpisodeLog.tag == tag
     ).all()  # type: ignore
@@ -164,7 +163,6 @@ def evaluate_evaluator(
 
     correlation_list = []
     ordered_re_eval_episodes = []
-
     for human_annotated_episode in aggregate_human_annotations:
         for re_eval_episode in re_evaluated_episodes:
             assert isinstance(re_eval_episode, EpisodeLog)

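Review note on `examples/benchmark_evaluator.py`: the script builds a `correlation_list` when comparing re-evaluated episodes with human annotations. As a rough illustration of that kind of comparison, the sketch below computes a Pearson correlation between two score lists. The function `pearson` and the score values are hypothetical and for illustration only; the script's actual metric and data may differ.

```python
import math


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Made-up scores for one evaluation dimension (not real benchmark data).
human_scores = [7.0, 3.0, 9.0, 5.0]
model_scores = [6.0, 4.0, 8.0, 5.0]
correlation = pearson(human_scores, model_scores)
```

A value close to 1 would indicate that the candidate evaluator ranks episodes similarly to the human annotators on that dimension.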
diff --git a/examples/experiment_eval.py b/examples/experiment_eval.py
@@ -17,12 +17,12 @@
     EnvironmentProfile,
     EpisodeLog,
     EvaluationDimensionBuilder,
+    SotopiaDimensions,
 )
 from sotopia.envs.evaluators import (
     EvaluationForTwoAgents,
     EpisodeLLMEvaluator,
     RuleBasedTerminatedEvaluator,
-    SotopiaDimensions,
 )
 from sotopia.envs.parallel import ParallelSotopiaEnv
 from sotopia.messages import AgentAction, Observation

diff --git a/examples/experimental/sotopia_original_replica/llm_agent_sotopia.py b/examples/experimental/sotopia_original_replica/llm_agent_sotopia.py
@@ -19,13 +19,11 @@
     pass
 
 # Configure logging
-FORMAT = "%(asctime)s - %(levelname)s - %(name)s - %(message)s"
-logging.basicConfig(
-    level=logging.WARNING,
-    format=FORMAT,
-    datefmt="[%X]",
-    handlers=[RichHandler()],
-)
+log = logging.getLogger("sotopia.llm_agent")
+log.setLevel(logging.INFO)
+# Prevent propagation to root logger
+log.propagate = False
+log.addHandler(RichHandler(rich_tracebacks=True, show_time=True))
 
 
 @NodeFactory.register("llm_agent")
@@ -63,20 +61,13 @@ def set_profile(self, use_pk_value: bool) -> None:
             assert (
                 self.background is not None and self.name is not None
             ), "Background and name must be provided"
-            if " " in self.name:
-                first_name, last_name = self.name.split(" ", 1)
-            else:
-                first_name = self.name
-                last_name = ""
-            profile = AgentProfile(
-                first_name=first_name, last_name=last_name, **self.background
-            )
+            profile = AgentProfile(**self.background)
         else:
             assert not self.agent_profile_pk == "", "Agent profile pk must be provided"
             profile = AgentProfile.get(pk=self.agent_profile_pk)
 
         self.agent_profile_pk = profile.pk
-        self.name = " ".join([profile.first_name, profile.last_name]).strip()
+        self.name = profile.first_name
         self.background = profile.model_dump()
 
     def _format_message_history(self, message_history: list[Observation]) -> str:

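Review note on the logging change in `llm_agent_sotopia.py`: replacing `logging.basicConfig` with a named logger that sets `propagate = False` means records stay on that logger's own handlers and never reach the root logger. The sketch below shows the effect with the standard library only. `ListHandler` is a hypothetical helper standing in for `RichHandler`, so the behavior can be checked without the `rich` dependency.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that collects formatted messages in a list."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


# A handler on the root logger, as another library or basicConfig might add.
root_handler = ListHandler()
logging.getLogger().addHandler(root_handler)

# Same setup as in the diff, with ListHandler in place of RichHandler.
agent_handler = ListHandler()
log = logging.getLogger("sotopia.llm_agent")
log.setLevel(logging.INFO)
log.propagate = False  # records stop here and do not reach the root logger
log.addHandler(agent_handler)

log.info("agent started")
log.debug("below INFO, dropped by the logger level")
```

Only the agent logger's handler receives the INFO record, which avoids duplicate output when the root logger is configured elsewhere.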
diff --git a/examples/experimental/sotopia_original_replica/origin.toml b/examples/experimental/sotopia_original_replica/origin.toml
diff --git a/examples/experimental/sotopia_original_replica/output.toml b/examples/experimental/sotopia_original_replica/output.toml
diff --git a/examples/experimental/sotopia_original_replica/raw_config.json b/examples/experimental/sotopia_original_replica/raw_config.json
diff --git a/examples/experimental/sotopia_original_replica/readme.md b/examples/experimental/sotopia_original_replica/readme.md
@@ -1,21 +1,18 @@
 To run this example, please use aact to launch.
 
 ```bash
-aact run-dataflow examples/experimental/sotopia_original_replica/origin.toml
+python examples/experimental/sotopia_original_replica/simulate.py
 ```
 
-To view the flow of the information, please run:
+This example can also be run in a web interface. First, start the server:
 
 ```bash
-aact draw-dataflow examples/experimental/sotopia_original_replica/origin.toml --svg-path examples/experimental/sotopia_original_replica/origin.svg
+fastapi run sotopia/api/fastapi_server.py --port 8080
 ```
-
-To quickly generate your own simluation config, format your input like in the `raw_config.toml` file
-to generate an executable file, run:
+Then in another terminal, run:
 ```bash
-cd examples/experimental/sotopia_original_replica
-python generate_executable.py --input=raw_config.json  # output will be stored in output.toml
-aact run-dataflow output.toml  # calling aact to run the simulation
+python examples/experimental/sotopia_original_replica/websocket_simulation_client.py
 ```
+You should see the messages coming from the websocket server.
 
 ![Alt text](./origin.svg)