
Troubleshooting Guide


This page will serve as a starting point for a troubleshooting guide for the Cloud 8 Installer UI. Once the information is more complete, it can be published to a more permanent location if desired.

Configuration files

The installer service and the ardana service are configured by files that are passed on the command line when the services are started. The installer service is normally started with a config file argument of /etc/ardana/ardana-installer-server/installer-server.conf, while the ardana service normally uses /etc/ardana/ardana-service.conf. The command line used to start each program can be verified by displaying its systemd ExecStart setting with systemctl:

$ sudo systemctl show --property=ExecStart installer-server    # or ardana-service
ExecStart={ path=/usr/bin/ardana-installer-server ; argv[]=/usr/bin/ardana-installer-server --config-file /etc/ardana/ardana-installer-server/installer-server.conf ; ignore_errors=no ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }

These services use OpenStack oslo.config to process these config files, so the files use that format, which is basically the usual INI format, with section headers in brackets (e.g. [DEFAULT]) and KEY=VALUE pairs. References to config variables in this guide will be in the format SECTION.KEY, e.g. DEFAULT.log_file.
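For example, the log file location discussed in the next section corresponds to a stanza like the following in /etc/ardana/ardana-service.conf (an illustrative excerpt; the real file contains many more settings):

[DEFAULT]
log_file = /var/log/ardana-service/ardana-service.log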

Location of log files

installer ui

The installer-ui is written in JavaScript and is executed in the browser, so any log messages that it generates can be displayed in the browser's JavaScript console.

Installer service

Normally the installer service is configured to write its logs to /var/log/ardana_installer/installer-server.log. This location is controlled by the DEFAULT.log_file config setting.

As a fallback, any output that is not captured in the log file is written to stdout or stderr, which is viewable as root with journalctl, e.g. journalctl -b 0 -u installer-server. There should typically be very little that percolates down to this journal.

Ardana service

Normally the ardana service is configured to write its logs into /var/log/ardana-service/ardana-service.log. This location is controlled by the DEFAULT.log_file config setting.

As a fallback, any output that is not captured in the log file is written to stdout or stderr, which is viewable as root with journalctl, e.g. journalctl -b 0 -u ardana-service. There should typically be very little that percolates down to this journal.
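To follow both services' logs while reproducing a problem, assuming the default locations above:

tail -f /var/log/ardana_installer/installer-server.log /var/log/ardana-service/ardana-service.log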

High level architecture

All of the following components run on the lifecycle-manager node:

Note that this diagram was created using Dia, and the source for the diagram is contained within this wiki (you have to clone this wiki locally in order to edit it).

The components in this diagram are:

  • Installer User Interface

    This user interface is written in JavaScript using the ReactJS library, and the code is executed in the user's browser. The code, along with index.html, is served up by the Installer Server.

    The source code is in https://github.com/ArdanaCLM/ardana-installer-ui.

  • Ardana Service

    The primary purpose of this service is to provide a REST API for manipulating the input model and launching Ansible playbooks and the configuration processor. As playbooks are being executed, the Ardana Service captures logs and events and streams them in real-time to the User Interface using SocketIO. Although its main client is the installer UI, the Ardana Service is not part of the installer; it is registered as a service in keystone and is also used by the Operations Console. This service will therefore remain running throughout the lifetime of the cloud even when the installer UI has been stopped or uninstalled.

    This service runs by default on port 9085, and this is controlled by the flask.port config setting.

This service is written in Python and runs under Flask. The source code is in gerrit and also mirrored to https://github.com/ArdanaCLM/ardana-service. The API documentation for this service can be generated programmatically by running the docs target of tox.

  • Installer Server

Beyond serving up the installer UI JavaScript, the purpose of this service is to act as a mediator between the UI and the Ardana Service and to provide additional capabilities to the installer, such as interfacing with HPE OneView and SUSE Manager for automatic server discovery.

This service runs by default on port 3000, and this is controlled by the flask.port config setting (see the port check sketch after this list).

    This small service is written in Python and runs under Flask. The source code is in https://github.com/ArdanaCLM/ardana-installer-server.
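A quick way to confirm that both services are up and listening on their default ports (3000 for the Installer Server, 9085 for the Ardana Service) is a check along these lines; this is only a sketch and assumes the default configuration and the pre-deployment setup described under Stages below (after deployment the ardana service no longer listens on localhost):

sudo netstat -plnt | grep -E ':3000|:9085'     # both services should show up as LISTEN
curl http://localhost:9085/api/v2/heartbeat    # Ardana Service heartbeat endpoint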

Files

Several files are created and updated by the installer UI as it runs. Here are some of the important ones.

  • progress file

    This json-formatted file primarily tracks the progress of the user through the user interface, but it also stores a small number of other preferences and status values. It normally resides in /var/cache/ardana_installer/progress.json, and its location is controlled by the general.progress_file configuration setting for the installer service.

  • discovery file

    This json-formatted file primarily captures the servers that were obtained via auto discovery. It normally resides in /var/cache/ardana_installer/db.json, and its location is controlled by the general.db_file configuration setting for the installer service.

  • input model

The input model is a collection of mostly YAML files that controls what the deployed cloud will look like. Most of the installer UI is devoted to manipulation of various values within the input model. The customer's input model normally resides in ~ardana/openstack/my_cloud/definition, and its location is controlled by the paths.model_dir configuration setting for the ardana service.

  • play files

    A "play" is a run of a playbook, and the id of the play is basically the timestamp of when it was launched. A play generates three files, each of which has the play id as its basename and a different filename extension. The location of these log files is controlled by the DEFAULT.log_dir config setting, which is normally set to /var/log/ardana-service/logs.

    • Log file (.log)

      This contains the console output that is generated while the playbook is executed. This information is also streamed back to the installer UI (via the installer server) in real-time.

    • Event file (.event)

As ansible playbooks run, a custom ansible plugin watches for certain important events (the start and finish of several important playbooks) and makes REST calls into the ardana service. These events are streamed back to the installer UI (via the installer server) in real-time, which enables the installer UI to track the progress of important steps in the install process.

The information captured in the .event and .log files can be replayed to any newly created browser window. This is the mechanism that allows the customer to close the browser and re-open it (or, say, open a second browser window to the installer) and still see all of the log messages and events that occurred before the window was created.

    • Status file (.json)

This short file contains metadata about the play, including things like the playbook command being run, its process id, exit status, and so on (see the sketch after this list for a quick way to inspect the most recent one).
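As a quick way to look at the most recent play, the newest status file can be pretty-printed; this is just a sketch, assuming the default DEFAULT.log_dir location described above:

LATEST=$(ls -t /var/log/ardana-service/logs/*.json | head -1)   # newest status file
python -m json.tool "$LATEST"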

Stages

Initial

The installer service and ardana service are started via the deployer-init.yml ansible playbook. The installer service will be listening on port 3000. The ardana service will be listening on port 9085 in an insecure mode, which means that there is no authentication performed, and its REST API calls could be executed by any process with access. The ardana service therefore listens only on the 127.0.0.1 loopback interface:

$ sudo netstat -pln | grep 9085
tcp        0      0 127.0.0.1:9085          0.0.0.0:*               LISTEN      16547/python

Post-Install

During the installation, the primary (and last) playbook that is launched is site.yml, which deploys the entire cloud. This includes deploying keystone and rewriting the ardana service configuration file to put it into secure mode, meaning that requests will require a valid keystone token in order to proceed. But this configuration file does not take effect until the very end of site.yml, when the ardana service is restarted via a call to the _ardana-service-delayed-reset.yml playbook. 60 seconds after this playbook is executed, the ardana service will be listening for requests on a real interface, and all requests will come via haproxy (which also provides the https endpoint that is registered in keystone):

$ sudo netstat -pln | grep 9085
tcp        0      0 192.168.24.18:9085      0.0.0.0:*               LISTEN      471/python
tcp        0      0 192.168.24.2:9085       0.0.0.0:*               LISTEN      41615/haproxy
tcp        0      0 10.84.43.68:9085        0.0.0.0:*               LISTEN      41615/haproxy

Common error messages and resolutions

Get Templates Failure

Failed to get templates. Won't be able to show details of each model.

Error: INTERNAL SERVER ERROR

This error typically indicates that the UI is unable to communicate with the ardana service on the lifecycle-manager via the installer service. Check that the ardana service and the installer server are both running:

systemctl status ardana-service installer-server

If the ardana-service is running and the error is still occurring, try running a curl command from the deployer

curl http://localhost:9085/api/v2/templates

If the result is a wall of text, then the ardana-service is running and responding normally. If the result is a connection refused error, check whether the service is listening only on an IP address (and not on localhost) with sudo netstat -plnt | grep 9085. If there is no result, recheck the systemctl status above; otherwise, take the IP address from the netstat output and run the following two commands:

curl http://<IP_FROM_NETSTAT>:9085/api/v2/templates
curl http://<IP_FROM_NETSTAT>:9085/api/v2/heartbeat

If the first command comes back blank, but the second succeeds, then the ardana-service is running in keystone authentication mode and thinks that the cloud has already been deployed successfully.
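In that mode, the templates call needs a valid keystone token. A sketch of how to confirm the service still responds when authenticated (this assumes keystone is up and that a credentials file such as ~/keystone.osrc exists for the ardana user; that file name is an assumption and may differ on your system):

source ~/keystone.osrc                         # assumption: environment file with keystone credentials
TOKEN=$(openstack token issue -f value -c id)  # request a token from keystone
curl -H "X-Auth-Token: $TOKEN" http://<IP_FROM_NETSTAT>:9085/api/v2/templates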

If the ardana-service status is inactive, start it up with

sudo systemctl restart ardana-service

UI hangs during network restart

Install UI hangs on the following line:

TASK: [network_interface | configure | Restart networking]

This operation can take approximately 5 minutes to finish. If it has not come back after 5 minutes, run the following command as the ardana user on the lifecycle manager:

tail -f ~/.ansible/ansible.log

If it looks the same as what is being displayed in the UI, you may need to wait longer. If the ansible.log shows activity but the Install UI does not, it means the network configuration from your model has been applied, which has disconnected your browser from the lifecycle manager. If you were using an ssh tunnel to connect to the UI, it is also possible that you have not modified the model's firewall_rules.yml to allow ssh connections through the firewall (i.e. the ssh section is still commented out and/or improperly configured). In both cases, it is best to monitor ansible.log with the command above and follow the rest of the installation from there. The best way to avoid such networking issues in the future is to run the browser on the lifecycle manager.

Error during OS provisioning or deployment

If OS provisioning or deployment fails, click on "Show logs" to see the error. The bottom of the log may not always show the real problem, so you'll need to search upward from the bottom of the log for the word "failed:" to find the cause of the failure. If you see "failed:" along with "ignoring", you may need to search further up the log for the real failure.
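The same search can be done from the command line against the play's log file; a sketch, assuming the default log directory described in the play files section above (replace <play id> with the id of the failed play):

grep -n 'failed:' /var/log/ardana-service/logs/<play id>.log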

Missing CLOUD 8 repo

If the Cloud 8 repo is unavailable, you may see the following error during cloud deployment:

TASK: [package-consumer | install | Install ardana-packager] ******************
ok: [mycloud-cp1-c1-m1]
failed: [mycloud-cp1-comp0001] => {"failed": true}
msg: Package 'python-ardana-packager' not found.

failed: [mycloud-cp1-comp0002] => {"failed": true}
msg: Package 'python-ardana-packager' not found.

You'll need to make the Cloud 8 repo available to the problematic nodes by doing the following:

echo -e "[C8]\nenabled=1\nautorefresh=0\nbaseurl=iso:///?iso=<ABSOLUTE PATH TO CLOUD 8 ISO>\ntype=yast2\ngpgcheck=0" > C8.repo
scp <CLOUD 8 ISO> <your problematic node>:~  
scp C8.repo <your problematic node>:~

# ssh to each problematic node and do the following:
sudo cp ~/C8.repo /etc/zypp/repos.d/
sudo vi /etc/zypp/repos.d/C8.repo    # change <ABSOLUTE PATH TO CLOUD 8 ISO> to the path of the ISO copied to the node
sudo zypper -n --gpg-auto-import-keys ref
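To confirm that the repo is now usable on each node before retrying validation, a quick check (a sketch; the package name comes from the error message above):

sudo zypper lr C8                          # the repo added above should be listed and enabled
sudo zypper info python-ardana-packager    # the package from the error message should now be found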

In the Install UI, go back to Page 7: Review Configuration Files, and click Validate. When validation is complete, click "Deploy" to continue with installation.

osconfig - diskconfig error

If you encounter an error like this on your hosts:

TASK: [osconfig | diskconfig | check result of osconfig check] ****************
failed: [mycloud-cp2-comp0001] => {"failed": true}
msg: found /etc/openstack/osconfig-ran on node, this means that Ardana OpenStack software has been previously installed on this system mycloud-cp1-comp0001, if you want to continue to wipe this node please remove /etc/openstack/osconfig-ran from the node(s) in question.

In the Install UI, go back to Page 7: Review Configuration Files, click on the Deployment tab and make sure "Wipe Data Disks" is deselected. Then click back to the Model tab, click Validate and then click Deploy.
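Alternatively, if you actually intend to wipe the node, the marker file named in the error message can be removed on each affected node before re-running Validate and Deploy (only do this if wiping the data on that node is acceptable):

# on each node reporting the error
sudo rm /etc/openstack/osconfig-ran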

ConnectionError to the ardana service

If you receive errors on the compute hosts page of the Operations Console regarding not being able to connect to services, and the operations console log file (/var/log/ops-console/error.log on one of the control nodes) contains the following stack trace when executing the model/is_encrypted API in the ardana service:

2018-03-08 15:40:28,788 INFO  [bll.api.controllers.app_contro][Dummy-16           ] [ba3ab932] Received TARGET:ardana ACTION:GET DATA:{path:model/is_encrypted} TXN:ba3ab932-f5f9-425b-ae8a-3c4d9fb61da4
2018-03-08 15:40:28,828 ERROR [bll.plugins.service           ][Dummy-17           ] [ba3ab932] spawn_service failed
Traceback (most recent call last):
... (traceback details omitted for brevity)
ConnectionError: ('Connection aborted.', BadStatusLine("''",))

This is likely caused by the ardana service not being in secure (post-install) mode. The only way we have seen this happen is when the site.yml playbook was run and failed, but the failure was considered benign by the operator and the playbook was not re-run. Because it did not run to completion, the playbook that restarts the ardana service was never run. This can be verified by running netstat -ln | grep 9085 on the lifecycle-manager and seeing that the service is still listening only on 127.0.0.1. If this is the case, the ardana service should be restarted by running the following playbook from the lifecycle-manager:

cd ~/scratch/ansible/next/ardana/ansible
ansible-playbook -i hosts/verbs _ardana-service-delayed-reset.yml
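After the 60-second delay described in the Stages section, the service should no longer be listening only on the loopback interface, which can be re-checked with:

sudo netstat -pln | grep 9085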

Resetting the install UI

Normally there should be no need to reset the UI, but if there is a need, the easiest way is to append ?reset=true to the installer UI URL. This will trigger the installer service to clear out the progress file and the discovery file, but the input model remains untouched. The result is that the user will be shown the first page in the installer, but any changes to the model or the openstack service config files will remain intact.
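For example, assuming the UI is being accessed on its default port directly from the lifecycle manager (adjust the host to match how you normally reach the UI):

http://localhost:3000/?reset=true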

As described above, the Installer Service and Ardana Service processes are managed via systemd. In the unusual event that they need to be restarted, this can be done with

sudo systemctl restart installer-server    # or ardana-service

Manual workaround for importing a configuration from an existing cloud deployment

IMPORTANT

Since this procedure blindly overwrites the configuration of one deployment on top of another, make sure that the source and target are from the same version of SUSE OpenStack Cloud. If you import files from a different version, it is likely that you will have problems deploying your cloud.

On the source system, as the ardana user:

tar -C ~/openstack/my_cloud -czhf export.tgz definition config

Copy that file to the target system, and run the following there as the ardana user:

mkdir temp
tar -C temp -xzf export.tgz
cd temp
mv definition custom
echo "Custom model" > custom/README.html
sed -i 's/^\( *name: *\)/\1custom-/' custom/cloudConfig.yml    # prefix the model name with "custom-"

# copy each exported config file over the corresponding file under ~/openstack/my_cloud,
# then copy the renamed model into the definition directory
find config -type f -exec cp {} ~/openstack/my_cloud/{} \;
cp -R custom/* ~/openstack/my_cloud/definition