3 changes: 3 additions & 0 deletions README.md
@@ -45,6 +45,9 @@ Sotopia is an open-ended social learning environment that allows agents to inter
## Help
See [documentation](https://docs.sotopia.world) for more details.

> [!IMPORTANT]
> If you are trying to develop on top of Sotopia, we highly recommend following the [development guide](https://docs.sotopia.world/contribution/contribution).

## Get started

### Install locally
12 changes: 12 additions & 0 deletions docs/pages/examples/benchmark.md
@@ -12,3 +12,15 @@ When `only-show-performance` is specified, only model results with available epi
Currently this script runs over 100 simulations on the Sotopia Hard tasks, and the partner model is fixed to `meta-llama/Llama-3-70b-chat-hf`.

An example script is provided in `scripts/display_benchmark_results.sh`

# Benchmark your model as an evaluator

```
uv run python examples/benchmark_evaluator.py --model=<model> --tag=<tag> --batch-size=<batch_size> --push-to-db
```

This script re-evaluates the existing episodes with the new model and compares the results with human annotations.

> **Note:** Sometimes you might need to run the script twice to get the results, because uploading to the database can take some time to complete.

> **Warning:** The re-evaluation does not use the exact same prompt as the original evaluation. However, we have no evidence suggesting that this slight format difference causes any performance discrepancy.
15 changes: 15 additions & 0 deletions docs/pages/experimental/index.mdx
@@ -4,6 +4,21 @@ import { Callout } from "nextra/components"
This part of the documentation is for experimental features. The APIs and functionalities are subject to frequent change.
</Callout>

<Callout type="info">
Sotopia is transitioning to the AACT engine (an actor model library with strong typing and validation) for its experimental features. Essentially, each agent's logic runs in its own process. Why not use asyncio directly, which is basically what currently popular multi-agent frameworks such as Autogen, Swarm, and CrewAI do?

Asyncio requires a non-blocking implementation of the agent's logic. Imagine two agents chatting with each other. If we use asyncio directly, we need to wait for the first agent to finish its turn before the second agent can respond. This is not a natural interaction flow: if one agent takes forever to type, the other agent has to wait. That's totally fine for cases where the agents are "cooperative" and the interaction is "turn-based."

But that's really not the case for social simulations.

And what if we have 1000 agents? Things will get even worse as the interactions and dependencies between the agents become more complex.

Instead, we advocate a "real-time" async interaction flow, where each agent is independent and can do its own thing regardless of the other agents.

We believe this new engine will be the future of more realistic social simulations.
So here we are, in this very exciting experimental phase, and we are looking for your feedback and help!
</Callout>
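
To make the contrast above concrete, here is a minimal sketch in plain Python. It does not use AACT's API (AACT runs each agent in its own process); it only uses asyncio tasks to contrast a turn-based loop, where a slow agent holds everyone up, with agents that act independently. The agent names, delays, and function names are made up for illustration.

```python
import asyncio
import time


async def agent_turn(name: str, delay: float) -> str:
    """Simulate one agent 'typing' for `delay` seconds before it speaks."""
    await asyncio.sleep(delay)
    return f"{name} spoke"


async def turn_based(delays: dict[str, float]) -> float:
    """Each agent waits for the previous one to finish its turn."""
    start = time.perf_counter()
    for name, delay in delays.items():
        await agent_turn(name, delay)
    return time.perf_counter() - start


async def independent(delays: dict[str, float]) -> float:
    """Every agent acts on its own schedule; nobody waits for a turn."""
    start = time.perf_counter()
    await asyncio.gather(*(agent_turn(n, d) for n, d in delays.items()))
    return time.perf_counter() - start


if __name__ == "__main__":
    delays = {"fast_agent": 0.05, "slow_agent": 0.2}  # hypothetical typing times
    print(f"turn-based:  {asyncio.run(turn_based(delays)):.2f}s")
    print(f"independent: {asyncio.run(independent(delays)):.2f}s")
```

In the turn-based version the elapsed time grows with the sum of all agents' delays, while in the independent version it is bounded by the slowest agent, which is the behavior the "real-time" flow is after.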

The experimental APIs of Sotopia are intended for quickly prototyping and experimenting with new functionalities,
without breaking the existing stable APIs. But we will still maintain the quality of the code for these features.
Feel free to raise an issue if you find any bugs or want more features in the experimental APIs.
6 changes: 2 additions & 4 deletions examples/benchmark_evaluator.py
@@ -15,8 +15,8 @@

target_model_patterns: list[list[str]] = [
["gpt-4", "gpt-4", "gpt-3.5-turbo"],
["gpt-4", "gpt-4o-mini", "gpt-4"],
["gpt-4", "gpt-4o-mini", "togethercomputer/llama-2-70b-chat"],
["gpt-4", "gpt-3.5-turbo", "gpt-4"],
["gpt-4", "gpt-3.5-turbo", "togethercomputer/llama-2-70b-chat"],
["gpt-4", "togethercomputer/llama-2-70b-chat", "gpt-3.5-turbo"],
]

@@ -113,7 +113,6 @@ def evaluate_evaluator(
to_re_evaluate_list = list(human_annotation_dict.keys())
aggregate_human_annotations: list[EpisodeLog] = list(human_annotation_dict.values()) # type: ignore
# Call the function with the specified parameters

re_evaluated_episodes: list[EpisodeLog] = EpisodeLog.find(
EpisodeLog.tag == tag
).all() # type: ignore
@@ -164,7 +163,6 @@ def evaluate_evaluator(

correlation_list = []
ordered_re_eval_episodes = []

for human_annotated_episode in aggregate_human_annotations:
for re_eval_episode in re_evaluated_episodes:
assert isinstance(re_eval_episode, EpisodeLog)
2 changes: 1 addition & 1 deletion examples/experiment_eval.py
@@ -17,12 +17,12 @@
EnvironmentProfile,
EpisodeLog,
EvaluationDimensionBuilder,
SotopiaDimensions,
)
from sotopia.envs.evaluators import (
EvaluationForTwoAgents,
EpisodeLLMEvaluator,
RuleBasedTerminatedEvaluator,
SotopiaDimensions,
)
from sotopia.envs.parallel import ParallelSotopiaEnv
from sotopia.messages import AgentAction, Observation
@@ -19,13 +19,11 @@
pass

# Configure logging
FORMAT = "%(asctime)s - %(levelname)s - %(name)s - %(message)s"
logging.basicConfig(
level=logging.WARNING,
format=FORMAT,
datefmt="[%X]",
handlers=[RichHandler()],
)
log = logging.getLogger("sotopia.llm_agent")
log.setLevel(logging.INFO)
# Prevent propagation to root logger
log.propagate = False
log.addHandler(RichHandler(rich_tracebacks=True, show_time=True))


@NodeFactory.register("llm_agent")
@@ -63,20 +61,13 @@ def set_profile(self, use_pk_value: bool) -> None:
assert (
self.background is not None and self.name is not None
), "Background and name must be provided"
if " " in self.name:
first_name, last_name = self.name.split(" ", 1)
else:
first_name = self.name
last_name = ""
profile = AgentProfile(
first_name=first_name, last_name=last_name, **self.background
)
profile = AgentProfile(**self.background)
else:
assert not self.agent_profile_pk == "", "Agent profile pk must be provided"
profile = AgentProfile.get(pk=self.agent_profile_pk)

self.agent_profile_pk = profile.pk
self.name = " ".join([profile.first_name, profile.last_name]).strip()
self.name = profile.first_name
self.background = profile.model_dump()

def _format_message_history(self, message_history: list[Observation]) -> str:
65 changes: 0 additions & 65 deletions examples/experimental/sotopia_original_replica/origin.toml

This file was deleted.

84 changes: 0 additions & 84 deletions examples/experimental/sotopia_original_replica/output.toml

This file was deleted.

30 changes: 0 additions & 30 deletions examples/experimental/sotopia_original_replica/raw_config.json

This file was deleted.

15 changes: 6 additions & 9 deletions examples/experimental/sotopia_original_replica/readme.md
@@ -1,21 +1,18 @@
To run this example, please use aact to launch.

```bash
aact run-dataflow examples/experimental/sotopia_original_replica/origin.toml
python examples/experimental/sotopia_original_replica/simulate.py
```

To view the flow of the information, please run:
This example can also be run in a web interface by running:

```bash
aact draw-dataflow examples/experimental/sotopia_original_replica/origin.toml --svg-path examples/experimental/sotopia_original_replica/origin.svg
fastapi run sotopia/api/fastapi_server.py --port 8080
```

To quickly generate your own simulation config, format your input like in the `raw_config.toml` file
to generate an executable file, run:
Then in another terminal, run:
```bash
cd examples/experimental/sotopia_original_replica
python generate_executable.py --input=raw_config.json # output will be stored in output.toml
aact run-dataflow output.toml # calling aact to run the simulation
python examples/experimental/sotopia_original_replica/websocket_simulation_client.py
```
You should see the messages coming from the websocket server.

![Alt text](./origin.svg)