Instable cluster_chaos_test #3072

Closed
tillrohrmann opened this issue Apr 1, 2025 · 1 comment · Fixed by #3092
Labels
tests flaky tests or other test related issues

Comments

@tillrohrmann
Contributor

Instable cluster_chaos_test:

https://github.com/restatedev/restate/actions/runs/14194054359/job/39765020718?pr=3069#step:12:3909

From the logs it looks as if node-2 starts:

2025-04-01T11:12:31.6360326Z node-2	| 2025-04-01T11:12:21.005603Z INFO restate_server
2025-04-01T11:12:31.6360687Z node-2	|   Starting Restate Server 1.3.0-dev (7570395 x86_64-unknown-linux-gnu 2025-04-01)
2025-04-01T11:12:31.6360813Z node-2	|     node_name: "node-2"
2025-04-01T11:12:31.6361138Z node-2	|     config_source: /tmp/.tmpE32nzU/node-2/config.toml
2025-04-01T11:12:31.6361295Z node-2	|     base_dir: /tmp/.tmpE32nzU/node-2/
2025-04-01T11:12:31.6361353Z node-2	| on main

and then nothing more is logged from this process. The next expected log statement would report the node server starting, something like the following, but it is never printed:

2025-04-01T11:12:31.6179486Z node-2	| 2025-04-01T11:12:13.013742Z INFO restate_core::network::net_util
2025-04-01T11:12:31.6179548Z node-2	|   Server listening
2025-04-01T11:12:31.6179614Z node-2	| on rs:worker-13
2025-04-01T11:12:31.6179701Z node-2	|   in restate_core::network::net_util::server
2025-04-01T11:12:31.6179775Z node-2	|     server_name: node-rpc-server 
2025-04-01T11:12:31.6179869Z node-2	|     uds.path: "/tmp/.tmpE32nzU/node-2/node.sock"
2025-04-01T11:12:31.6179964Z node-2	| 2025-04-01T11:12:13.014175Z INFO restate_node::init
2025-04-01T11:12:31.6180152Z node-2	|   Trying to join the provisioned cluster 'cluster_chaos_test'
2025-04-01T11:12:31.6180214Z node-2	| on rs:worker-13
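
The pattern described above (a "Starting Restate Server" line that is never followed by "Server listening" for the same node) can be checked mechanically against a captured CI log. The following is only a minimal sketch, not part of the Restate test suite: the function name `start_stalled` is hypothetical, and it assumes the line layout shown in the excerpts, where the second whitespace-separated token of each line is the node name.

```rust
/// Hypothetical helper: returns true if the most recent
/// "Starting Restate Server" line logged by `node` is not followed by a
/// "Server listening" line from the same node.
///
/// Assumes the CI log layout from the excerpts above:
/// `<runner timestamp> <node name> | <message>`.
fn start_stalled(log: &str, node: &str) -> bool {
    let mut stalled = false;
    for line in log.lines() {
        // The second whitespace-separated token is the node name.
        if line.split_whitespace().nth(1) != Some(node) {
            continue;
        }
        if line.contains("Starting Restate Server") {
            // A (re)start was logged; treat it as stalled until proven otherwise.
            stalled = true;
        } else if line.contains("Server listening") {
            // The node server came up after the last start.
            stalled = false;
        }
    }
    stalled
}
```

Calling `start_stalled(log, "node-2")` on the full job log would return `true` for the failure described here, since the last start of node-2 is never followed by a "Server listening" line.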
@tillrohrmann tillrohrmann added the tests flaky tests or other test related issues label Apr 2, 2025
@tillrohrmann
Contributor Author

It is hard to tell what is happening from these logs. I suggest increasing the log level to debug and waiting until the test failure happens again.

@tillrohrmann tillrohrmann self-assigned this Apr 2, 2025
tillrohrmann added a commit to tillrohrmann/restate that referenced this issue Apr 2, 2025
Increasing the log level of the cluster_chaos_test to restate=debug will
hopefully allow us to find out why the test got stuck (concretely, why a newly
started process did not get beyond the "Starting Restate Server" log statement).

This fixes restatedev#3072.