This repository contains a compact agent‑based simulation of firms competing for workers on a circular space. Agents learn wage‑setting policies either with a Deep Q‑Network (DQN) or a tabular Q‑learning baseline. The code is organized to let you switch between models and predefined scenarios (1–8) that toggle replay, target‑net sync, and a bidding vs. take‑it‑or‑leave‑it mechanism.
At a glance
- Environment: workers are placed on a circle; traveling farther to a firm is costly (the `effort` parameter).
- Firms post wages; workers choose based on payoff (wage minus travel effort); see the sketch after this list.
- Learning: either DQN (Keras/TensorFlow) or a pure Q‑table.
- Rich diagnostics: best‑response maps, per‑tick Q‑values, firm performance logs.
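The worker's choice rule is simple enough to sketch directly. The snippet below is an illustrative toy, not the actual `Worker.py` code: it measures distance on a circle of assumed circumference 1.0 and picks the firm whose wage net of travel cost is highest.

```python
# Illustrative sketch of the worker payoff above (not the actual Worker.py code).
# Positions are assumed to lie on a circle of circumference 1.0.

def circular_distance(a: float, b: float, circumference: float = 1.0) -> float:
    """Shortest distance between two points on the circle."""
    d = abs(a - b) % circumference
    return min(d, circumference - d)

def best_offer(worker_pos, offers, effort):
    """Pick the firm whose wage minus travel cost is highest.

    offers maps firm_id -> (wage, firm_position).
    """
    def payoff(firm_id):
        wage, firm_pos = offers[firm_id]
        return wage - effort * circular_distance(worker_pos, firm_pos)

    return max(offers, key=payoff)

# Example: firm 1 pays more but is farther away, so firm 0 wins here.
print(best_offer(0.1, {0: (1.0, 0.15), 1: (1.2, 0.6)}, effort=0.5))
```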
```text
.
├── Model.py           # Entry point – CLI, simulation loop & scenario wiring
├── Globals.py         # Global hyperparameters & defaults
├── Neural_Network.py  # Keras policy/target network builders & helpers (DQN)
├── QTable_Agent.py    # Tabular Q-learning agent + policy shim
├── Firm.py            # Firm agent logic (act, learn, memory)
├── Worker.py          # Worker agent logic & choice
├── Space.py           # Minimal container for agents (iterable)
├── DataHandling.py    # CSV logging: Q-values over time, firm performance
└── PlotUtils.py       # Best-response (BR) map & plotting helpers
```
Two economic interaction modes:
- `model_type = 0`: Take-it-or-leave-it (TIOLI) wage posting
- `model_type = 1`: Bidding
Replay & target‑network toggles define eight scenarios:
1. TIOLI + replay (symmetric)
2. TIOLI, no replay (symmetric)
3. Bidding + replay (symmetric)
4. Bidding, no replay (symmetric)
5. TIOLI, asymmetric fast-sync firm 0 – firm 0 uses online updates (no replay, mini-batch = 1) and forces a target-net sync every iteration; the other firms use replay and periodic sync.
6. Bidding, asymmetric fast-sync firm 0 – as in 5, but with bidding.
7. TIOLI, asymmetric fast-sync firm 1 – firm 1 is the fast-sync/online learner.
8. Bidding, asymmetric fast-sync firm 1 – as in 7, but with bidding.
These correspond to the comment block in `Model.py`; see the function that wires `set_simulation_scenario` for the exact parameterization in your copy.
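A rough mental model of the scenario switch is a lookup from scenario number to a handful of toggles. The mapping below is a hypothetical reconstruction of the list above, not the actual code; the authoritative values live in `Model.py`.

```python
# Hypothetical reconstruction of the scenario toggles listed above;
# the authoritative wiring is the set_simulation_scenario logic in Model.py.
SCENARIOS = {
    1: dict(model_type=0, replay=True,  fast_sync_firm=None),  # TIOLI + replay
    2: dict(model_type=0, replay=False, fast_sync_firm=None),  # TIOLI, no replay
    3: dict(model_type=1, replay=True,  fast_sync_firm=None),  # Bidding + replay
    4: dict(model_type=1, replay=False, fast_sync_firm=None),  # Bidding, no replay
    5: dict(model_type=0, replay=True,  fast_sync_firm=0),     # TIOLI, firm 0 online + per-tick sync
    6: dict(model_type=1, replay=True,  fast_sync_firm=0),     # Bidding, firm 0 online + per-tick sync
    7: dict(model_type=0, replay=True,  fast_sync_firm=1),     # TIOLI, firm 1 online + per-tick sync
    8: dict(model_type=1, replay=True,  fast_sync_firm=1),     # Bidding, firm 1 online + per-tick sync
}

def scenario_settings(scenario: int) -> dict:
    """Return the toggle set for a scenario number (1-8)."""
    return SCENARIOS[scenario]
```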
Python 3.9+ recommended.
```bash
# create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate   # on Windows: .venv\Scripts\activate

# install dependencies
pip install -U pip wheel
pip install numpy pandas matplotlib tensorflow keras
```
If you only run the Q-table baseline, `tensorflow`/`keras` is still imported for a small policy shim; you can strip that dependency if desired (see `QTable_Agent.py`).
Run a symmetric DQN TIOLI scenario (1):
```bash
python Model.py --set_simulation_scenario 1
```
Use the tabular baseline (set the flag to 1) and plot the best-response map occasionally:
```bash
python Model.py --set_simulation_scenario 2 --qtable 1 --plot_br 10000
```
Try an asymmetric setup where firm 0 learns online with per-iteration target sync (scenario 5):
```bash
python Model.py --set_simulation_scenario 5
```
Use `-h`/`--help` for the full list of flags and their defaults in your version.
Names and defaults come from `Model.py`; your copy may differ slightly (an illustrative `argparse` sketch follows the list).
- `--set_simulation_scenario INT` – select scenario (1–8).
- `--qtable {0,1}` – `1` uses the tabular baseline, `0` uses DQN.
- `--learning_rate FLOAT` – optimizer step size (DQN).
- `--beta FLOAT` – exploration decay parameter for ε-greedy.
- `--effort FLOAT` – travel cost per unit distance on the circle.
- `--random_productivity {0,1}` – draw ±`delta_productivity` shocks each period if 1.
- `--delta_productivity FLOAT` – amplitude for the productivity shock/asymmetry.
- `--num_firms INT` – number of competing firms.
- `--plot_br INT` – if > 0, render the best-response map every N iterations.
- Additional flags may exist (e.g., seed, plotting on/off); see `Model.py`.
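For orientation, a typical `argparse` wiring for flags like these is sketched below. The defaults are placeholders, not the values defined in `Model.py`.

```python
# Illustrative argparse wiring for the flags above; defaults are placeholders,
# not the actual values defined in Model.py.
import argparse

def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Firms competing for workers on a circle")
    p.add_argument("--set_simulation_scenario", type=int, default=1, choices=range(1, 9))
    p.add_argument("--qtable", type=int, default=0, choices=[0, 1])
    p.add_argument("--learning_rate", type=float, default=1e-3)
    p.add_argument("--beta", type=float, default=1e-5)
    p.add_argument("--effort", type=float, default=0.1)
    p.add_argument("--random_productivity", type=int, default=0, choices=[0, 1])
    p.add_argument("--delta_productivity", type=float, default=0.0)
    p.add_argument("--num_firms", type=int, default=2)
    p.add_argument("--plot_br", type=int, default=0)
    return p

if __name__ == "__main__":
    print(vars(build_parser().parse_args()))
```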
`DataHandling.py` writes CSVs to the working directory:
- `q_values_over_time.csv` – per-tick Q-values by firm (for diagnostics/plots)
- `firm_performance.csv` – rewards, wages, and other performance metrics
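To inspect the logs after a run, something like the following works. No particular column layout is assumed here; check the headers that `DataHandling.py` actually writes in your copy before slicing further.

```python
# Quick inspection of the diagnostic CSVs with pandas.
import pandas as pd
import matplotlib.pyplot as plt

q_values = pd.read_csv("q_values_over_time.csv")
performance = pd.read_csv("firm_performance.csv")

print(q_values.head())          # see which columns DataHandling.py wrote
print(performance.describe())

# Example: plot every numeric column of the performance log over the run.
performance.select_dtypes("number").plot(subplots=True, figsize=(8, 6))
plt.tight_layout()
plt.show()
```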
`PlotUtils.py` can render best-response (BR) maps and other visuals.
- Environment – Workers live on a circle. Accepting a job implies a travel cost proportional to distance (controlled by `--effort`).
- Firms – Each firm sets a wage from a discrete grid. In TIOLI, workers accept the offer with the highest net utility; in Bidding, firms compete more explicitly.
- Learning
  - DQN: policy/target networks (`Neural_Network.py`), replay memory, periodic target syncs.
  - Tabular: standard Q-learning update with ε-greedy exploration (see the sketch after this list); a small policy shim makes it compatible with the same logging and plotting code.
- Logging – Q-values and firm performance are persisted every tick (append mode after the first tick).
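For reference, the tabular variant mentioned under Learning follows the standard Q-learning update, Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)], with ε-greedy action selection. The class below is a generic sketch of that rule, not the implementation in `QTable_Agent.py`.

```python
# Generic tabular Q-learning with ε-greedy exploration (a sketch of the
# standard rule, not the actual QTable_Agent.py implementation).
import numpy as np

class TabularQ:
    def __init__(self, n_states: int, n_actions: int, alpha: float = 0.1, gamma: float = 0.95):
        self.q = np.zeros((n_states, n_actions))
        self.alpha, self.gamma = alpha, gamma

    def act(self, state: int, epsilon: float, rng: np.random.Generator) -> int:
        if rng.random() < epsilon:                      # explore
            return int(rng.integers(self.q.shape[1]))
        return int(np.argmax(self.q[state]))            # exploit

    def update(self, state: int, action: int, reward: float, next_state: int) -> None:
        target = reward + self.gamma * np.max(self.q[next_state])
        self.q[state, action] += self.alpha * (target - self.q[state, action])
```

In the DQN case the bootstrap target is produced by the target network instead of the table, which is why the target-sync frequency matters in scenarios 5–8.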
- Set a fixed random seed (see the seeding sketch after this list).
- Keep the wage grid and environment parameters constant while comparing models.
- For DQN sensitivity checks: vary replay size, target‑sync frequency, and learning rate.
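Pinning the seed across the libraries involved can look like this (an illustrative helper, not a function in this repo):

```python
# Illustrative seeding helper (not part of this repo): fixes the Python,
# NumPy, and TensorFlow RNGs so repeated runs are easier to compare.
import random
import numpy as np

def set_seed(seed: int = 42) -> None:
    random.seed(seed)
    np.random.seed(seed)
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)   # only relevant for DQN runs
    except ImportError:
        pass                       # Q-table baseline without TF installed
```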
- No plots appear: make sure `--plot_br` is set to a positive number and `matplotlib` is installed.
- TensorFlow errors (Q-table only): if you want to avoid TF altogether, you can replace the policy shim's tensor return with NumPy arrays and adjust the calls in `DataHandling.py` accordingly (a sketch follows this list).
- CSV not updating: the files append after the first iteration; delete them if you want a fresh run.
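The TF-free shim mentioned in the second item could look roughly like the class below. The class and method names are made up for illustration; align them with whatever `QTable_Agent.py` and `DataHandling.py` actually call.

```python
# Rough illustration of a TF-free policy shim; names and shapes are
# assumptions, so match them to QTable_Agent.py / DataHandling.py.
import numpy as np

class NumpyPolicyShim:
    """Returns plain NumPy arrays where the original shim returned tensors."""

    def __init__(self, q_table: np.ndarray):
        self.q_table = q_table               # shape: (n_states, n_actions)

    def predict(self, state: int) -> np.ndarray:
        # Returning an ndarray keeps downstream indexing and CSV writing
        # working without importing tensorflow.
        return np.asarray(self.q_table[state], dtype=float)
```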