Commit 23ff0f6

Unicast and other networking improvements (UBC-Thunderbots#3417)
* project vimrc for FZF * filter out .swp files * udp listener refactoring in-progress * in-progress: add latency_tester module * latency scripts compile * formatting * Revert "Logger now creates a log directory if doesn't exist (UBC-Thunderbots#3143)" This reverts commit 0e8bf7e. * cleanup test script * wip * add tracy symbols * working on ansibleing * init * wip * wip * mayhaps it works * wip * wip * wip * wip * debugging * should clean this up * wip * various thunderscope improvements * add some unit tests * wip * wip * wip * wip * wip & cleanup * wip * cleanup * revert unintentional change * [pre-commit.ci lite] apply automatic fixes * bug fix and update getting_started guide * [pre-commit.ci lite] apply automatic fixes * formatting + fix CI * [pre-commit.ci lite] apply automatic fixes * address PR comments * [pre-commit.ci lite] apply automatic fixes * [pre-commit.ci lite] apply automatic fixes * update getting started * further improve docs * doc cleanup * simplify getting_started more * remove unnecessary socket closing * cleanup edge case handling of robot_communication binding * [pre-commit.ci lite] apply automatic fixes * cleanup edge cases * formatting * cleanup logic * simplify logic for selecting interface * further simplify selection logic * remove redundant caching * delete duplicated code * [pre-commit.ci lite] apply automatic fixes * address PR comments * update documentation for getLocalIp * [pre-commit.ci lite] apply automatic fixes * add initial delay before starting the test * maybe fix ci * [pre-commit.ci lite] apply automatic fixes * standardize latency testing * add unicast support to benchmarking scripts * [pre-commit.ci lite] apply automatic fixes * wip * fix communication bug * Add documentation, tests and cleanup code in benchmarking_utils * Add documentation and cleanup some code * initial changes for unicast comm support * WIP network service implementation * network service implemented * wip python changes * begin outlines of wifi 
communication manager * wip * wip refactor RobotCommunication to use new WifiNetworkManager module * wip * wip but Thunderscope doesn't yet run * wip debugged thunderscope and seems to work * working on host mode * Finish up changes * constexpr evaluation in Thunderloop constructor * doc update * Change order of tracy calls * [pre-commit.ci lite] apply automatic fixes * debugging integration problems * seems to work? * wip * wip * wip * formatting yo! * update ansible * getting ready for PR * docs for benchmarking utils * ci + documentation * [pre-commit.ci lite] apply automatic fixes * fix ansible linting * wip not yet done * [pre-commit.ci lite] apply automatic fixes * wip * [pre-commit.ci lite] apply automatic fixes * wip * [pre-commit.ci lite] apply automatic fixes * C++ appears to compile, working through Python * [pre-commit.ci lite] apply automatic fixes * update wifi_communication_manager * some cleanup + stray forgotten type hinting return types * wip * [pre-commit.ci lite] apply automatic fixes * fixing network log listener * [pre-commit.ci lite] apply automatic fixes * wip * [pre-commit.ci lite] apply automatic fixes * network log listener is finally done * wip * fix broken compile and tests * [pre-commit.ci lite] apply automatic fixes * minor nit from field testing --------- Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
1 parent cedb04f commit 23ff0f6

File tree

92 files changed

+3046
-848
lines changed

docs/getting-started.md

+13-7
Original file line numberDiff line numberDiff line change
@@ -230,13 +230,11 @@ Now that you're setup, if you can run it on the command line, you can run it in
230230

231231
- If we want to run it with real robots:
232232
- Open your terminal, `cd` into `Software/src` and run `ifconfig`.
233-
- Pick the network interface you would like to use:
234-
1. If you are running things locally, you can pick any interface that is not `lo`
235-
2. If you would like to communicate with robots on the network, make sure to select the interface that is connected to the same network as the robots.
233+
- Pick the network interface you would like to use. If you would like to communicate with robots on the network, make sure to select the interface that is connected to the same network as the robots.
236234
- For example, on a sample machine, the output may look like this:
237235

238236
```
239-
enp0s5: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
237+
wlp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
240238
...
241239
[omitted]
242240
...
@@ -247,22 +245,30 @@ Now that you're setup, if you can run it on the command line, you can run it in
247245
...
248246
```
249247

250-
- An appropriate interface we could choose is `enp0s5`
248+
- An appropriate interface we could choose is `wlp3s0`
249+
- Hint: If you are using a wired connection, the interface will likely start with `e-`. If you are using a WiFi connection, the interface will likely start with `w-`.
251250
- If we are running the AI as "blue": `./tbots.py run thunderscope_main --interface=[interface_here] --run_blue`
252251
- If we are running the AI as "yellow": `./tbots.py run thunderscope_main --interface=[interface_here] --run_yellow`
253252
- `[interface_here]` corresponds to the `ifconfig` interfaces seen in the previous step
254-
- For instance, a call to run the AI as blue on wifi could be: `./tbots.py run thunderscope_main --interface=enp0s5 --run_blue`
253+
- For instance, a call to run the AI as blue on WiFi could be: `./tbots.py run thunderscope_main --interface=wlp3s0 --run_blue`. This will start Thunderscope and set up communication with robots over the wifi interface. It will also listen for referee and vision messages on the same interface.
254+
- **Note: You do not need to include the `--interface=[interface_here]` argument!** You can run Thunderscope without it and use the dynamic configuration widget to set the interfaces for communication to send and receive robot, vision and referee messages.
255+
- If you choose to include the `--interface=[interface_here]` argument, Thunderscope will send and listen for robot messages on this interface, and will also receive vision and referee messages on it.
256+
- Using the dynamic configuration widget is recommended at RoboCup. To reduce latency, connect the robot router to the AI computer via ethernet and use a separate ethernet connection to receive vision and referee messages. In this configuration, Thunderscope will need to bind to two different interfaces, each likely starting with `e-`.
257+
- If you have specified `--run_blue` or `--run_yellow`, navigate to the "Parameters" widget. In "ai_config" > "ai_control_config" > "network_config", you can set the appropriate interface using the dropdowns for robot, vision and referee message communication.
255258
- This command will set up robot communication and the Unix full system binary context manager. The Unix full system context manager hooks up our AI, Backend and SensorFusion
256259
2. Run AI along with Robot Diagnostics:
257260
- The Mechanical and Electrical sub-teams use Robot Diagnostics to test specific parts of the Robot.
258261
- If we want to run with one AI and Diagnostics
259-
- `./tbots.py run thunderscope_main [--run_blue | --run_yellow] --run_diagnostics` will start Thunderscope
262+
- `./tbots.py run thunderscope_main [--run_blue | --run_yellow] --run_diagnostics --interface=[interface_here]` will start Thunderscope
260263
- `[--run_blue | --run_yellow]` indicate which FullSystem to run
261264
- `--run_diagnostics` indicates if diagnostics should be loaded as well
262265
- Initially, the robots are all connected to the AI and only receive input from it
263266
- To change the input source for the robot, use the drop-down menu of that robot to change it between None, AI, and Manual
264267
- None means the robots are receiving no commands
265268
- More info about Manual control below
269+
- `--interface=[interface_here]` corresponds to the `ifconfig` interfaces seen in the previous step
270+
- For instance, a call to run the AI as blue on WiFi could be: `./tbots.py run thunderscope_main --interface=wlp3s0 --run_blue --run_diagnostics`
271+
- The `--interface` flag is optional. If you do not include it, you can set the interface in the dynamic configuration widget. See above for how to set the interface in the dynamic configuration widget.
266272
3. Run only Diagnostics
267273
- To run just Diagnostics
268274
- `./tbots.py run thunderscope --run_diagnostics --interface <network_interface>`
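The interface-naming hint in the guide above (wired names tend to start with `e`, WiFi names with `w`) can be sketched in Python. This is a minimal illustration and not part of the commit: `classify_interface` and `list_interfaces` are hypothetical helper names, and the prefix rule is only a heuristic for common Linux naming schemes.

```python
import socket


def classify_interface(name: str) -> str:
    """Guess the link type of a Linux interface from its name prefix.

    Hypothetical helper: the prefix rule is a heuristic, not a guarantee.
    """
    if name == "lo":
        return "loopback"
    if name.startswith("e"):  # e.g. enp0s5, eth0
        return "wired"
    if name.startswith("w"):  # e.g. wlp3s0, wlan0
        return "wifi"
    return "unknown"


def list_interfaces() -> dict:
    """Map each interface name on this machine to its guessed link type."""
    return {name: classify_interface(name) for _, name in socket.if_nameindex()}


if __name__ == "__main__":
    for name, kind in list_interfaces().items():
        print(f"{name}: {kind}")
```

Running the script prints the same interface names that `ifconfig` reports, each with a guessed type, which can help when choosing a value for `--interface`.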

docs/wifi-communication-readme.md

+69
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,69 @@
1+
# WiFi Communication README
2+
3+
## Lessons Learned So Far
4+
1. For optimal performance, make sure that any packets sent over the network are below MTU size (1500 bytes). Packets larger than MTU size require multiple transmissions for a single send event and multiple retransmissions in the case of packet loss. Overall, these packets contribute to greater utilization of the network and increased latency.
5+
2. Connect the host computer to the network via ethernet cable when possible. By minimizing the utilization of the WiFi network, this change significantly improves the round-trip time.
6+
3. Prefer unicast to multicast for frequent, low-latency communication over WiFi. [RFC 9119](https://www.rfc-editor.org/rfc/rfc9119.html#section-3.1.2) provides a good overview of the limitations of multicast communication over WiFi. In short, routers are forced to transmit at the lowest common data rate of all devices on the network to ensure that all devices receive the packet, meaning that the network is slowed down by the slowest device. In addition, router features such as Multiple Input Multiple Output (MIMO) may not be available when using multicast communication. In some benchmarking tests, we measured a 24% improvement in round-trip time after switching from multicast to unicast communication.
7+
4. On embedded Linux devices, WiFi power management seems to cause significant latency spikes. To disable power management, run the following command: `sudo iw dev {wifi_interface} set power_save off` where `{wifi_interface}` is the name of the WiFi interface (e.g. `wlan0`).
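The first lesson above (keep packets below the 1500-byte MTU) can be turned into a quick size check. The sketch below assumes UDP over IPv4 with no IP options, so 28 bytes of headers are subtracted from the MTU. The constant and function names are hypothetical and do not come from the repository.

```python
MTU_BYTES = 1500        # MTU quoted in the lessons above
IPV4_HEADER_BYTES = 20  # assumption: IPv4 header without options
UDP_HEADER_BYTES = 8    # fixed-size UDP header

# Largest UDP payload that still fits in a single unfragmented packet.
MAX_UDP_PAYLOAD_BYTES = MTU_BYTES - IPV4_HEADER_BYTES - UDP_HEADER_BYTES


def fits_in_one_packet(payload: bytes) -> bool:
    """Return True if the payload can be sent without IP fragmentation."""
    return len(payload) <= MAX_UDP_PAYLOAD_BYTES
```

A check like this could run on a serialized message before it is sent, logging a warning whenever it returns `False`.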
8+
9+
10+
## Debugging
11+
12+
We have built some tools to help diagnose network latency problems without the confounding effects of running Thunderloop and Thunderscope and their associated overheads.
13+
14+
The latency tester tests the round trip time between two nodes. The primary node sends a message to the secondary node, which sends back the same message as soon as it receives it. The primary node then measures the round trip time.
15+
16+
Typically, the primary node is the host computer and the secondary node is the robot.
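The primary/secondary exchange described above can be sketched with plain UDP sockets. This is a simplified illustration over the loopback interface, not the `latency_tester` implementation: it runs both nodes in one process, does not retransmit on timeout, and all names are hypothetical (the parameter names only mirror the tool's command-line options).

```python
import socket
import threading
import time


def run_secondary(sock: socket.socket, num_messages: int) -> None:
    """Secondary node: echo each datagram straight back to its sender."""
    try:
        for _ in range(num_messages):
            data, sender = sock.recvfrom(2048)
            sock.sendto(data, sender)
    except socket.timeout:
        pass  # the primary stopped early; nothing left to echo


def measure_round_trips(num_messages: int = 5, message_size_bytes: int = 200,
                        timeout_s: float = 1.0) -> list:
    """Primary node: send messages one at a time and time each echo."""
    size = max(message_size_bytes, 4)  # leave room for a 4-byte counter
    round_trips_s = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as secondary, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as primary:
        secondary.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        secondary.settimeout(5.0)
        primary.settimeout(timeout_s)
        echo_thread = threading.Thread(
            target=run_secondary, args=(secondary, num_messages), daemon=True)
        echo_thread.start()
        try:
            for i in range(num_messages):
                payload = i.to_bytes(4, "big") + bytes(size - 4)
                start = time.perf_counter()
                primary.sendto(payload, secondary.getsockname())
                reply, _ = primary.recvfrom(2048)
                elapsed = time.perf_counter() - start
                if reply != payload:
                    raise RuntimeError("echoed payload does not match")
                round_trips_s.append(elapsed)
        finally:
            echo_thread.join(timeout=6.0)
    return round_trips_s


if __name__ == "__main__":
    for rtt in measure_round_trips():
        print(f"round trip: {rtt * 1e6:.0f} us")
```

Loopback round trips only show the overhead of the socket calls themselves. Measuring a real link means running the echo side on the robot, which is what the tool described here does.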
17+
18+
## Running the latency tester with the robot
19+
### Prerequisites
20+
You must know:
21+
- The IP address of the robot. We will refer to this address as `{robot_ip}`.
22+
- The WiFi interface of the robot. We will refer to this interface as `{robot_wifi_interface}`. This interface is typically found by running `ifconfig` or `ip a` on the robot.
23+
- The network interface of the host computer. We will refer to this interface as `{host_interface}`. This interface is typically found by running `ifconfig` or `ip a` on the host computer.
24+
25+
1. Build the latency tester secondary node: `./tbots.py build latency_tester_secondary_node --platforms=//cc_toolchain:robot`
26+
2. Copy the binary to the robot: `scp bazel-bin/software/networking/benchmarking_utils/latency_tester_secondary_node robot@{robot_ip}:/home/robot/latency_tester_secondary_node`
27+
3. SSH into the robot: `ssh robot@{robot_ip}`
28+
4. There are two test modes: multicast or unicast
29+
1. For multicast:
30+
1. Run the latency tester secondary node: `./latency_tester_secondary_node --interface {robot_wifi_interface}`
31+
- You may optionally also provide the following arguments:
32+
- `--runtime_dir` to specify the directory where log files are stored
33+
- `--listen_port` to specify the port on which the secondary node listens for messages.
34+
- `--send_port` to specify the port on which the secondary node sends messages
35+
- `--listen_channel` to specify the channel on which the secondary node listens for messages
36+
- `--send_channel` to specify the channel on which the secondary node sends back replies
37+
2. On a different terminal on the host computer, run the latency tester primary node: `./tbots.py run latency_tester_primary_node -- --interface {host_interface}`
38+
- You may optionally also provide the following arguments:
39+
- `--runtime_dir` to specify the directory where log files are stored
40+
- `--listen_port` to specify the port on which the primary node listens for replies to messages. This port must match the `--send_port` argument provided to the secondary node.
41+
- `--send_port` to specify the port on which the primary node sends messages. This port must match the `--listen_port` argument provided to the secondary node.
42+
- `--listen_channel` to specify the channel on which the primary node listens for replies to messages. This channel must match the `--send_channel` argument provided to the secondary node.
43+
- `--send_channel` to specify the channel on which the primary node sends messages. This channel must match the `--listen_channel` argument provided to the secondary node.
44+
- `--num_messages` to specify the number of messages to send
45+
- `--message_size_bytes` to specify the size of the message payload in bytes
46+
- `--timeout_duration_ms` to specify the duration in milliseconds to wait for a reply before retransmitting the message
47+
- `--initial_delay_s` to specify the delay in seconds before sending the first message
48+
2. For unicast:
49+
1. Run the latency_tester_secondary_node: `./latency_tester_secondary_node --interface {robot_wifi_interface} --unicast`
50+
- You may optionally also provide the following arguments:
51+
- `--runtime_dir` to specify the directory where log files are stored
52+
- `--listen_port` to specify the port on which the secondary node listens for messages.
53+
- `--send_port` to specify the port on which the secondary node sends messages
54+
- `--send_ip` to specify the IP address of the primary node to send replies to
55+
2. On a different terminal on the host computer, run the latency_tester_primary_node: `./tbots.py run latency_tester_primary_node -- --interface {host_interface} --unicast`
56+
- You may optionally also provide the following arguments:
57+
- `--runtime_dir` to specify the directory where log files are stored
58+
- `--listen_port` to specify the port on which the primary node listens for replies to messages. This port must match the `--send_port` argument provided to the secondary node.
59+
- `--send_port` to specify the port on which the primary node sends messages. This port must match the `--listen_port` argument provided to the secondary node.
60+
- `--send_ip` to specify the IP address of the secondary node to send messages to (`{robot_ip}`)
61+
- `--num_messages` to specify the number of messages to send
62+
- `--message_size_bytes` to specify the size of the message payload in bytes
63+
- `--timeout_duration_ms` to specify the duration in milliseconds to wait for a reply before retransmitting the message
64+
- `--initial_delay_s` to specify the delay in seconds before sending the first message
65+
3. This tool can also be run with Tracy, a profiling tool, which provides some nice performance visualizations and histograms. To do so:
66+
1. Make sure Tracy has been installed. Run `./environment_setup/install_tracy.sh` to install Tracy.
67+
2. On a new terminal in the host computer run Tracy: `./tbots.py run tracy`
68+
3. When running the latency tester primary node, add the `--tracy` flag to the command before the `--`. For example: `./tbots.py run latency_tester_primary_node --tracy -- --interface {host_interface}`
69+
4. Tracy will allow you to select the binary to profile and provide detailed performance information after the tester has run.

environment_setup/install_tracy.sh

+1-1
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,7 @@ CURR_DIR=$(dirname -- "$(readlink -f -- "$BASH_SOURCE")")
66
cd "$CURR_DIR" || exit
77

88
# Install dependences
9-
sudo apt install libglfw3-dev libfreetype-dev libdbus-1-dev
9+
sudo apt install -y libglfw3-dev libfreetype-dev libdbus-1-dev
1010

1111
# Install capstone-next
1212
wget -nc https://github.com/capstone-engine/capstone/archive/refs/tags/5.0.1.zip -O /tmp/capstone.zip

environment_setup/setup_software.sh

+4
Original file line numberDiff line numberDiff line change
@@ -211,4 +211,8 @@ sudo ln -s ~/.platformio/penv/bin/platformio /usr/local/bin/platformio
211211

212212
print_status_msg "Done PlatformIO Setup"
213213

214+
print_status_msg "Set up ansible-lint"
215+
/opt/tbotspython/bin/ansible-galaxy collection install ansible.posix
216+
print_status_msg "Finished setting up ansible-lint"
217+
214218
print_status_msg "Done Software Setup, please reboot for changes to take place"

environment_setup/ubuntu20_requirements.txt

+1
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,4 @@
1+
ansible-lint==24.12.2
12
pyqtgraph==0.13.7
23
pyqt6==6.6.1
34
PyQt6-Qt6==6.6.1

environment_setup/ubuntu22_requirements.txt

+1
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,4 @@
1+
ansible-lint==24.12.2
12
pyqtgraph==0.13.7
23
pyqt6==6.6.1
34
PyQt6-Qt6==6.6.1

environment_setup/ubuntu24_requirements.txt

+1
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,4 @@
1+
ansible-lint==24.12.2
12
pyqtgraph==0.13.7
23
thefuzz==0.19.0
34
iterfzf==0.5.0.20.0

scripts/lint_and_format.sh

+13
Original file line numberDiff line numberDiff line change
@@ -119,12 +119,25 @@ function run_eof_new_line(){
119119
fi
120120
}
121121

122+
function run_ansible_lint(){
123+
printf "Running ansible-lint...\n\n"
124+
125+
/opt/tbotspython/bin/ansible-lint $CURR_DIR/../src/software/embedded/ansible/**/*.yml --fix
126+
127+
if [[ "$?" != 0 ]]; then
128+
printf "\n***Failed to lint and format Ansible files!***\n\n"
129+
exit 1
130+
fi
131+
}
132+
133+
122134
# Run formatting
123135
run_code_spell
124136
run_clang_format
125137
run_bazel_formatting
126138
run_ruff
127139
run_eof_new_line
128140
run_git_diff_check
141+
run_ansible_lint
129142

130143
exit 0

src/proto/BUILD

+2
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,7 @@ proto_library(
1111
"game_state.proto",
1212
"geneva_slot.proto",
1313
"geometry.proto",
14+
"ip_notification.proto",
1415
"parameters.proto",
1516
"play.proto",
1617
"power_frame_msg.proto",
@@ -50,6 +51,7 @@ py_proto_library(
5051
"game_state.proto",
5152
"geneva_slot.proto",
5253
"geometry.proto",
54+
"ip_notification.proto",
5355
"parameters.proto",
5456
"play.proto",
5557
"primitive.proto",

src/proto/ip_notification.proto

+11
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,11 @@
1+
syntax = "proto3";
2+
3+
import "proto/tbots_timestamp_msg.proto";
4+
5+
package TbotsProto;
6+
7+
message IpNotification
8+
{
9+
int32 robot_id = 1;
10+
string ip_address = 2;
11+
}
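The diff does not show how `IpNotification` messages are consumed. One plausible use, given that unicast needs a known destination address, is to keep a table from robot ID to IP address. The sketch below assumes that use and substitutes a plain dataclass for the generated protobuf class; `RobotAddressBook` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class IpNotification:
    """Plain-Python stand-in for the TbotsProto.IpNotification message."""
    robot_id: int
    ip_address: str


class RobotAddressBook:
    """Hypothetical table of the most recently announced IP per robot."""

    def __init__(self) -> None:
        self._ip_by_robot_id: Dict[int, str] = {}

    def on_notification(self, msg: IpNotification) -> None:
        # A later notification replaces the earlier address for that robot.
        self._ip_by_robot_id[msg.robot_id] = msg.ip_address

    def lookup(self, robot_id: int) -> Optional[str]:
        """Return the last announced IP, or None if the robot is unknown."""
        return self._ip_by_robot_id.get(robot_id)
```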

src/proto/parameters.proto

+15
Original file line numberDiff line numberDiff line change
@@ -53,6 +53,9 @@ message AiControlConfig
5353

5454
// Override the existing play with the Play enum provided
5555
required PlayName override_ai_play = 2 [default = UseAiSelection];
56+
57+
// Interfaces for various network listeners
58+
required NetworkConfig network_config = 3;
5659
}
5760

5861
message AiParameterConfig
@@ -623,6 +626,18 @@ message PossessionTrackerConfig
623626
];
624627
}
625628

629+
message NetworkConfig
630+
{
631+
// The robot communication interface
632+
required string robot_communication_interface = 1 [default = "lo"];
633+
634+
// The referee interface
635+
required string referee_interface = 2 [default = "lo"];
636+
637+
// The vision interface
638+
required string vision_interface = 3 [default = "lo"];
639+
}
640+
626641
message CreaseDefenderConfig
627642
{
628643
// The additional buffer length for each side of the goal

src/proto/primitive.proto

+10-2
Original file line numberDiff line numberDiff line change
@@ -2,9 +2,10 @@ syntax = "proto3";
22

33
package TbotsProto;
44

5+
import "google/protobuf/descriptor.proto";
56
import "proto/geometry.proto";
67
import "proto/geneva_slot.proto";
7-
import "google/protobuf/descriptor.proto";
8+
import "proto/tbots_timestamp_msg.proto";
89

910
extend google.protobuf.EnumValueOptions
1011
{
@@ -100,13 +101,20 @@ message PowerControl
100101

101102
message Primitive
102103
{
103-
reserved 5;
104104
oneof primitive
105105
{
106106
MovePrimitive move = 1;
107107
StopPrimitive stop = 2;
108108
DirectControlPrimitive direct_control = 3;
109109
}
110+
111+
// Sequence number for the primitive
112+
uint64 sequence_number = 4;
113+
114+
reserved 5;
115+
116+
// Epoch timestamp when primitives were assigned
117+
Timestamp time_sent = 6;
110118
}
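A per-message `sequence_number` lets a receiver detect lost or reordered packets. The diff does not show the receiving side, so the following is only a sketch of one common approach, with a hypothetical `SequenceTracker` class: accept a packet only if its number is newer than the newest seen so far, and count the gaps.

```python
class SequenceTracker:
    """Hypothetical receiver-side bookkeeping for sequence numbers."""

    def __init__(self) -> None:
        self.last_seen = None  # highest sequence number accepted so far
        self.dropped = 0       # numbers skipped over (late arrivals included)
        self.out_of_order = 0  # packets that arrived after a newer one

    def observe(self, sequence_number: int) -> bool:
        """Return True if the packet is newer than anything seen so far."""
        if self.last_seen is None or sequence_number > self.last_seen:
            if self.last_seen is not None:
                # Every number skipped over counts as dropped.
                self.dropped += sequence_number - self.last_seen - 1
            self.last_seen = sequence_number
            return True
        self.out_of_order += 1
        return False
```

In this sketch a packet that arrives late is counted both as dropped (when it was skipped over) and as out of order (when it finally arrives). A stricter accounting would need to remember which numbers are missing.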
111119

112120
message MovePrimitive

src/shared/constants.h

+6-4
Original file line numberDiff line numberDiff line change
@@ -43,10 +43,12 @@ static const short unsigned int REDIS_DEFAULT_PORT = 6379;
4343
static const short unsigned int PRIMITIVE_PORT = 42070;
4444

4545
// the port the AI receives msgs from the robot
46-
static const short unsigned int ROBOT_STATUS_PORT = 42071;
47-
static const short unsigned int ROBOT_LOGS_PORT = 42072;
48-
static const short unsigned int ROBOT_CRASH_PORT = 42074;
49-
static const short unsigned int NETWORK_COMM_TEST_PORT = 42075;
46+
static constexpr short unsigned int ROBOT_STATUS_PORT = 42071;
47+
static constexpr short unsigned int ROBOT_LOGS_PORT = 42072;
48+
static constexpr short unsigned int ROBOT_CRASH_PORT = 42074;
49+
static constexpr short unsigned int NETWORK_COMM_TEST_PORT = 42075;
50+
static constexpr short unsigned int ROBOT_TO_FULL_SYSTEM_IP_NOTIFICATION_PORT = 42073;
51+
static constexpr short unsigned int FULL_SYSTEM_TO_ROBOT_IP_NOTIFICATION_PORT = 42076;
5052

5153
// maximum transfer unit of the network interface
5254
// this is an int to avoid Wconversion with lwip

src/shared/test_util/tbots_gtest_main.cpp

+1-1
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,7 @@ bool TbotsGtestMain::help = false;
1111
bool TbotsGtestMain::enable_visualizer = false;
1212
bool TbotsGtestMain::run_sim_in_realtime = false;
1313
bool TbotsGtestMain::stop_ai_on_start = false;
14-
std::string TbotsGtestMain::runtime_dir = "/tmp/tbots/gtest_logs";
14+
std::string TbotsGtestMain::runtime_dir = "/tmp/tbots/yellow_test";
1515
double TbotsGtestMain::test_speed = 1.0;
1616

1717

src/software/BUILD

+2
Original file line numberDiff line numberDiff line change
@@ -46,6 +46,8 @@ cc_binary(
4646
"//proto:tbots_cc_proto",
4747
"//shared:constants",
4848
"//software/networking/udp:threaded_proto_udp_listener",
49+
"//software/networking/udp:threaded_proto_udp_sender",
50+
"//software/world:robot_state",
4951
"@boost//:program_options",
5052
],
5153
)
Original file line numberDiff line numberDiff line change
@@ -1,35 +1,35 @@
11
---
2-
32
- name: Deploy to the powerboard
43
hosts: THUNDERBOTS_HOSTS
54

65
tasks:
76
- name: Sync powerboard files
87
become: true
9-
become_method: sudo
8+
become_method: ansible.builtin.sudo
109
ansible.posix.synchronize:
1110
src: ../../../../../power/powerloop.tar.gz
1211
dest: ~/
13-
recursive: yes
14-
copy_links: yes
12+
recursive: true
13+
copy_links: true
1514

1615
- name: Untar powerboard files
17-
shell: 'mkdir -p powerloop && tar -xvf ~/powerloop.tar.gz -C powerloop'
16+
ansible.builtin.shell: "mkdir -p powerloop && tar -xvf ~/powerloop.tar.gz -C powerloop"
1817
register: result
18+
changed_when: true
1919
args:
2020
chdir: ~/
2121

2222
- name: Put the powerboard in bootloader mode
2323
ansible.builtin.pause:
2424
prompt: "Press enter to continue"
25-
echo: no
25+
echo: false
2626

2727
- name: Flashing... (this will take a while on the first run)
28-
shell: '/opt/tbotspython/bin/platformio run --disable-auto-clean -t nobuild -t upload -d ~/powerloop/power'
28+
ansible.builtin.command: "/opt/tbotspython/bin/platformio run --disable-auto-clean -t nobuild -t upload -d ~/powerloop/power"
2929
register: result
30+
changed_when: true
3031

3132
- name: Reset powerboard to finish flashing
3233
ansible.builtin.pause:
3334
prompt: "Press enter to continue"
34-
echo: no
35-
35+
echo: false
