Additional validations #98
Open
Description
Based on the set of tests I have created for internal testing, here is a list of infrastructure checks I'd like to see added:
- If the UC/OC were deployed with SSL, verify that all endpoints are actually listening on SSL-enabled ports
- Check SELinux for errors on all nodes (I usually grep for AVC denials)
- Check HAProxy (I usually curl the stats page and parse the output for errors)
- Check Galera on all nodes (run mysql -e "SHOW STATUS LIKE 'wsrep%'" and parse the output for problematic values: wsrep_local_state_comment must be in sync, wsrep_cluster_status, wsrep_cluster_size must equal the number of controllers, etc.)
- Check Pacemaker on all the relevant nodes (I look for failure messages in pcs status)
- Check RabbitMQ (run rabbitmqctl status and look for lines that start with "Error")
- Check MongoDB
- Check Redis
- Check service status on all nodes. This will be even more relevant with composable roles, since the expected service list will have to match the service-to-node mapping in the deployment YAML
- Check Ceph (ceph health and ceph status are the commands I use)
- Check keepalived (for the versions where it's relevant). This one can be tricky since there is no status command; I had to parse the config file and then verify that the IPs and services were actually there using nc/telnet/curl
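To make the SELinux item concrete, here is a minimal Python sketch of the "grep for AVC denials" idea. It is only an illustration, not the code from my internal tests: the function name avc_denials is hypothetical, and it assumes the audit log text (for example the contents of /var/log/audit/audit.log) has already been collected from the node.

```python
import re

# Audit records for denials contain "avc:  denied" (the amount of
# whitespace after "avc:" can vary, hence \s+).
AVC_DENIED = re.compile(r"avc:\s+denied")


def avc_denials(log_text):
    """Return every line of the audit log text that records an AVC denial."""
    return [line for line in log_text.splitlines() if AVC_DENIED.search(line)]
```

A validation could fail when the returned list is non-empty and print the matching lines as the failure message.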
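For the HAProxy item, one possible approach is to fetch the stats page in CSV form and flag every entry whose status is DOWN. The sketch below assumes the CSV text has already been downloaded (HAProxy serves it when ";csv" is appended to the stats URL); the function name haproxy_down_entries is hypothetical, and treating only DOWN as an error is my simplification.

```python
import csv
import io


def haproxy_down_entries(stats_csv):
    """Return (proxy, server, status) tuples for entries reported as DOWN.

    The CSV header line starts with "# ", which is stripped so that
    csv.DictReader can use the column names (pxname, svname, status, ...).
    """
    text = stats_csv.lstrip()
    if text.startswith("# "):
        text = text[2:]
    down = []
    for row in csv.DictReader(io.StringIO(text)):
        status = row.get("status") or ""
        if status.startswith("DOWN"):
            down.append((row["pxname"], row["svname"], status))
    return down
```

An empty result would mean that no frontend, backend, or server is currently marked DOWN.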
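The Galera check could be sketched roughly as follows. This assumes the mysql command was run non-interactively, so its output is tab-separated "name, value" lines, and that the healthy values are "Synced" for wsrep_local_state_comment and "Primary" for wsrep_cluster_status. The function names and the expected_cluster_size parameter (the number of controllers) are hypothetical.

```python
def parse_wsrep_status(output):
    """Parse tab-separated SHOW STATUS output into a {name: value} dict."""
    status = {}
    for line in output.splitlines():
        parts = line.split("\t", 1)
        # Skip the header row and anything that is not a wsrep variable.
        if len(parts) == 2 and parts[0].startswith("wsrep_"):
            status[parts[0].strip()] = parts[1].strip()
    return status


def galera_problems(status, expected_cluster_size):
    """Return a list of human-readable problems; empty means healthy."""
    problems = []
    state = status.get("wsrep_local_state_comment")
    if state != "Synced":
        problems.append("wsrep_local_state_comment is %r, expected 'Synced'" % state)
    cluster = status.get("wsrep_cluster_status")
    if cluster != "Primary":
        problems.append("wsrep_cluster_status is %r, expected 'Primary'" % cluster)
    size = status.get("wsrep_cluster_size")
    if size != str(expected_cluster_size):
        problems.append(
            "wsrep_cluster_size is %r, expected %d" % (size, expected_cluster_size)
        )
    return problems
```

Running this against the output from every controller and collecting the problem lists would give one failure message per node.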
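For keepalived, the "parse the config, then probe the IPs" approach might look something like the sketch below. It is a simplified illustration under a few assumptions: only virtual_ipaddress blocks are read, each address sits on its own line, and a plain TCP connect stands in for nc/telnet/curl. The function names virtual_ips and port_open are hypothetical.

```python
import ipaddress
import socket


def virtual_ips(config_text):
    """Extract the addresses listed in virtual_ipaddress { ... } blocks."""
    ips = []
    in_block = False
    for raw in config_text.splitlines():
        line = raw
        # keepalived treats both '#' and '!' as comment markers.
        for marker in "#!":
            line = line.split(marker, 1)[0]
        tokens = line.split()
        if not tokens:
            continue
        if not in_block:
            if tokens[0] == "virtual_ipaddress" and tokens[-1] == "{":
                in_block = True
            continue
        if tokens[0].startswith("}"):
            in_block = False
            continue
        candidate = tokens[0].split("/")[0]  # drop an optional /prefix
        try:
            ipaddress.ip_address(candidate)
        except ValueError:
            continue  # not an address line, ignore it
        ips.append(candidate)
    return ips


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A validation would then call port_open for each virtual IP and each service port expected on it; where those expected ports come from is left open here.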
I can share the code for my tests internally if that would help.