Operator-sdk scorecard basic spec check occasionally produces timeout with no logs  #5452

Open
@tkrishtop

Description

Bug Report

operator-sdk v.1.14.0 
scorecard-test v1.14.0

I'm from the DCI team. As part of our daily certification suite, we regularly run operator-sdk scorecard --selector=test=basic-check-spec-test for two operators, simple-demo-operator and testpmd-operator, 10 tests per day.

Normally, basic-check-spec-test should be green for both. However, in about 20% of cases it fails with the timeout error "error running tests context deadline exceeded". Increasing the timeout with --wait-time up to 300s doesn't help. Also, the test always fails for both operators in a row, which looks like some internal API being unavailable for 10-20 min.

To help diagnose this, could you please add more detail to the logs about where exactly the timeout happens?

What did you do?

operator-sdk scorecard \
--output json \
--selector=test=basic-check-spec-test \
--kubeconfig {{ kubeconfig_path }} \
--namespace scorecard-testing \
--service-account default \
--config {{ scorecard_config_path }} \
--verbose \
--wait-time 300s \
{{ scorecard_operator_dir }}
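
To tell these timeouts apart from real test failures in our daily runs, the output of the command above can be classified roughly like this. This is only a minimal sketch: classify_scorecard_run is a hypothetical helper name, the matched string is the error quoted in this report, and treating exit code 0 as a clean run is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of operator-sdk): classify one finished
# scorecard run from its exit code and its captured stdout/stderr.
classify_scorecard_run() {
  local exit_code="$1" output="$2"
  if [ "$exit_code" -eq 0 ]; then
    # Assumption: exit code 0 means the run completed without error.
    echo "ok"
  elif printf '%s' "$output" | grep -q 'context deadline exceeded'; then
    # The timeout described in this report.
    echo "timeout"
  else
    # Any other non-zero exit, e.g. a genuine test failure.
    echo "other-failure"
  fi
}

# Usage sketch (needs a cluster, so it is left commented out):
#   output="$(operator-sdk scorecard ... 2>&1)"; rc=$?
#   classify_scorecard_run "$rc" "$output"
```

Counting the "timeout" results per day would give a firmer number than the approximate 20% quoted here.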

What did you expect to see?

The results of basic-check-spec-test should be stable. In case of a timeout, it would be nice to have logs that identify the reason for it.

What did you see instead? Under which circumstances?

A timeout in 20% of cases ("error running tests context deadline exceeded") with no detailed logs.

Environment

Operator type:

-
  name: "testpmd-operator"
  version: "v0.2.9"
  image: "quay.io/rh-nfv-int/testpmd-operator-bundle@sha256:5e28f883faacefa847104ebba1a1a22ee897b7576f0af6b8253c68b5c8f42815"
  index_image: "quay.io/tkrishtop/index-testpmd-operator-bundle:v0.2.9"
-
  name: "simple-demo-operator"
  version: "v0.0.3"
  image: "quay.io/opdev/simple-demo-operator-bundle@sha256:eff7f86a54ef2a340dbf739ef955ab50397bef70f26147ed999e989cfc116b79"
  index_image: "quay.io/opdev/simple-demo-operator-catalog:v0.0.3"

Kubernetes cluster type:

Happens randomly on the latest stable OCP 4.7, OCP 4.8, OCP 4.9, and OCP 4.10.

$ operator-sdk version

operator-sdk v.1.14.0 
scorecard-test v1.14.0

Possible Solution

It would be nice to have more detailed logs that identify the reason for this timeout.
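
Until scorecard itself logs more, one workaround on our side might be to dump the state of the test namespace right after a timeout. The sketch below only prints the kubectl commands and does not run them. scorecard_diagnostic_commands is a hypothetical helper name, and using kubectl (rather than oc) against the OCP clusters is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print kubectl commands that capture the state of the
# scorecard namespace. Printing instead of executing keeps it a dry run.
scorecard_diagnostic_commands() {
  local namespace="$1" kubeconfig="$2"
  local base
  # printf %q shell-quotes the arguments so the printed lines are safe to run.
  base="kubectl --kubeconfig $(printf '%q' "$kubeconfig") --namespace $(printf '%q' "$namespace")"
  # Phase and node placement of whatever pods are still in the namespace.
  echo "$base get pods -o wide"
  # Namespace events (scheduling, image pulls, ...) ordered by last timestamp.
  echo "$base get events --sort-by=.lastTimestamp"
}

# Usage sketch (needs a cluster, so it is left commented out):
#   scorecard_diagnostic_commands scorecard-testing "$KUBECONFIG" | bash
```

This would not show where inside scorecard the deadline is hit, but it might show whether the test pod was ever scheduled or started.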

Metadata

Labels

lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
needs discussion
scorecard: Issue relates to the scorecard subcomponent
