flaky system test: custom operations for workflows #3427

Open
reubenmiller opened this issue Feb 26, 2025 · 2 comments
Labels
bug (Something isn't working), flaky test (Label given to flaky or otherwise faulty tests)

Comments

@reubenmiller
Contributor

Describe the bug

Two system tests in the same suite are flaky. Because the tests reuse the same device, there is no test isolation, so the second test may be failing as a consequence of the first.

Test name

  • Command steps should not be executed twice
  • Placeholder workflow created for ill-defined operations

An initial inspection shows that the two 140 messages are sent because two software list operations are triggered, most likely because tedge-mapper is restarted as part of the test and the MQTT message assertion has no time scope.
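To illustrate what a time-scoped assertion could look like, here is a minimal sketch in Python. It is not the test library's actual API: `RecordedMessage`, `messages_since`, and `assert_message_count` are hypothetical names, and the topic `c8y/s/us` and the `140,` payload prefix are only assumed values for the example. The idea is to record a start time after the mapper restart and count only messages received from that point on.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class RecordedMessage:
    """Hypothetical record of one MQTT message captured by the test harness."""
    topic: str
    payload: str
    received_at: datetime


def messages_since(messages, topic, payload_prefix, since):
    """Return messages on `topic` whose payload starts with `payload_prefix`
    and which were received at or after `since`."""
    return [
        m for m in messages
        if m.topic == topic
        and m.payload.startswith(payload_prefix)
        and m.received_at >= since
    ]


def assert_message_count(messages, topic, payload_prefix, since,
                         minimum=1, maximum=None):
    """Fail if the number of matching messages in the time window is
    outside [minimum, maximum]; return the matches otherwise."""
    matched = messages_since(messages, topic, payload_prefix, since)
    if len(matched) < minimum:
        raise AssertionError(
            f"wanted at least {minimum}, got {len(matched)} on '{topic}'")
    if maximum is not None and len(matched) > maximum:
        raise AssertionError(
            f"wanted at most {maximum}, got {len(matched)} on '{topic}'")
    return matched


# Sample data (made up for illustration): one message before the restart,
# one after it.
restart_time = datetime(2025, 2, 26, 15, 55, 42, tzinfo=timezone.utc)
captured = [
    RecordedMessage("c8y/s/us", "140,example", restart_time - timedelta(seconds=30)),
    RecordedMessage("c8y/s/us", "140,example", restart_time + timedelta(seconds=5)),
]

# Without a time scope both messages match; with one, only the second does.
unscoped = messages_since(captured, "c8y/s/us", "140,", datetime.min.replace(tzinfo=timezone.utc))
scoped = assert_message_count(captured, "c8y/s/us", "140,", since=restart_time, maximum=1)
```

With the window starting at the restart, the message published before the restart no longer counts towards the assertion.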

Failures

Test: Command steps should not be executed twice

Matching messages on topic 'te/device/main///cmd/issue-2896/test-1' is less than minimum. wanted: 1 got: 0 messages: []

Test: Placeholder workflow created for ill-defined operations

Matching messages on topic 'te/device/main///cmd/issue-3079/test-1' is less than minimum. wanted: 1 got: 0 messages: []

Build seen on

To Reproduce

To be investigated

Expected behavior

The test should pass consistently.

@reubenmiller reubenmiller added the bug Something isn't working label Feb 26, 2025
@reubenmiller
Contributor Author

The initial investigation shows that during the Command steps should not be executed twice test, a device restart is triggered (via shutdown -r now), and both the tedge-mapper-c8y and tedge-agent services shut down, but the system does not show any log entries afterwards. The second test, Placeholder workflow created for ill-defined operations, also fails because the device container does not appear to be functional or to have restarted, as the timestamp is the same as for the previous container.

So either the container's OS froze on shutdown (after most of the services were shut down), or the container did not restart automatically for some reason (though Docker should handle this).
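One way to tell these two cases apart could be to compare `docker inspect` output taken before and after the restart step. The sketch below is only an illustration and has not been run against the test setup: `classify_restart` is a hypothetical helper, the sample dictionaries are made up, and it assumes the relevant timestamp is the `State.StartedAt` field that `docker inspect` reports.

```python
def classify_restart(before, after):
    """Compare two parsed `docker inspect` entries for the same container.

    Returns a short label:
      - "restarted": the container is running with a new start time
      - "running-not-restarted": still running with the old start time
        (consistent with the OS hanging during shutdown)
      - "stopped-not-restarted": not running
        (consistent with Docker not restarting the container)
    """
    state_after = after["State"]
    if state_after.get("Status") != "running":
        return "stopped-not-restarted"
    if state_after.get("StartedAt") == before["State"].get("StartedAt"):
        return "running-not-restarted"
    return "restarted"


def restart_policy(inspect_entry):
    """Return the configured restart policy name, or "" if it is missing."""
    return inspect_entry.get("HostConfig", {}).get("RestartPolicy", {}).get("Name", "")


# Sample data (made up): the start time is unchanged after the restart step.
before = {
    "State": {"Status": "running", "StartedAt": "2025-02-26T15:50:00Z"},
    "HostConfig": {"RestartPolicy": {"Name": "always"}},
}
after_same = {
    "State": {"Status": "running", "StartedAt": "2025-02-26T15:50:00Z"},
    "HostConfig": {"RestartPolicy": {"Name": "always"}},
}
after_exited = {
    "State": {"Status": "exited", "StartedAt": "2025-02-26T15:50:00Z"},
    "HostConfig": {"RestartPolicy": {"Name": "always"}},
}

result_same = classify_restart(before, after_same)
result_exited = classify_restart(before, after_exited)
```

The entries would come from parsing the JSON printed by `docker inspect <container>` and taking the first element of the list. The labels only narrow down the cause; they do not prove it.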

The log below shows the shutdown, which looks OK at first glance.

Feb 26 15:55:42 9e22f2eac135 systemd[1]: tedge-agent.service: Deactivated successfully.
Feb 26 15:55:42 9e22f2eac135 systemd[1]: Stopped tedge-agent.service - tedge-agent is a thin-edge.io component to support operations..
Feb 26 15:55:42 9e22f2eac135 systemd[1]: tedge-mapper-c8y.service: Deactivated successfully.
Feb 26 15:55:42 9e22f2eac135 systemd[1]: Stopped tedge-mapper-c8y.service - tedge-mapper-c8y converts Thin Edge JSON measurements to Cumulocity JSON format..
Feb 26 15:55:42 9e22f2eac135 mosquitto[57]: 1740585342: Received PUBACK from Cumulocity (Mid: 37, RC:0)
Feb 26 15:55:42 9e22f2eac135 mosquitto[57]: 1740585342: Received PUBACK from Cumulocity (Mid: 38, RC:0)

@reubenmiller
Contributor Author

It would be beneficial to restructure the tests to use individual containers to make it easier to review the logs for the specific test only.

@Bravo555 Bravo555 added the flaky test Label given to flaky or otherwise faulty tests label Feb 27, 2025