docs/deployment/troubleshooting.md
+38-8
@@ -1,14 +1,16 @@
1
1
# Troubleshooting
2
2
3
-
## Monitoring
3
+
## Tools
4
+
5
+
### Monitoring
4
6
5
7
We have [ping tests](https://docs.microsoft.com/en-us/azure/azure-monitor/app/monitor-web-app-availability) set up to notify about availability of each [environment](../infrastructure/#environments). Alerts go to [#benefits-notify](https://cal-itp.slack.com/archives/C022HHSEE3F).
6
8
7
-
## Logs
9
+
### Logs
8
10
9
11
Logs can be found in a couple of places:
10
12
11
-
### Azure App Service Logs
13
+
#### Azure App Service Logs
12
14
13
15
[Open the `Logs` for the environment you are interested in.](https://docs.google.com/document/d/11EPDIROBvg7cRtU2V42c6VBxcW_o8HhcyORALNtL_XY/edit#heading=h.6pxjhslhxwvj) The following tables are likely of interest:
14
16
@@ -18,7 +20,7 @@ Logs can be found a couple of places:
18
20
19
21
For some pre-defined queries, click `Queries`, then `Group by: Query type`, and look under `Query pack queries`.
[Open the `Logs` for the environment you are interested in.](https://docs.google.com/document/d/11EPDIROBvg7cRtU2V42c6VBxcW_o8HhcyORALNtL_XY/edit#heading=h.n0oq4r1jo7zs)
24
26
@@ -31,19 +33,23 @@ In the latter two, you should see recent log output. Note [there is some latency
31
33
32
34
See [`Failures`](https://docs.microsoft.com/en-us/azure/azure-monitor/app/asp-net-exceptions#diagnose-failures-using-the-azure-portal) in the sidebar (or `exceptions` under `Logs`) for application errors/exceptions.
33
35
34
-
### Live tail
36
+
#### Live tail
35
37
36
38
After [setting up the Azure CLI](#making-changes), you can use the following command to [stream live logs](https://docs.microsoft.com/en-us/azure/app-service/troubleshoot-diagnostic-logs#in-local-terminal):
@@ -54,7 +60,29 @@ If Terraform commands fail (locally or in the Pipeline) due to an `Error acquiri
54
60
1. **Do any engineers have a Terraform command running locally?** You'll need to ask them. For example, they may have started an `apply` that is waiting for them to [approve](https://developer.hashicorp.com/terraform/cli/commands/apply#automatic-plan-mode) it. They will need to (gracefully) exit for the lock to be released.
55
61
1. **If none of the steps above identified the source of the lock**, and especially if the `Created` time is more than ten minutes ago, that probably means the last Terraform command didn't release the lock. You'll need to grab the `ID` from the `Lock Info` output and [force unlock](https://developer.hashicorp.com/terraform/language/state/locking#force-unlock).
56
62
57
-
## Eligibility Server
63
+
### App fails to start
64
+
65
+
If the container fails to start, you should see a [downtime alert](#monitoring). Assuming this app version was working in another [environment](../infrastructure/#environments), the issue is likely due to misconfiguration. Some things you can do:
66
+
67
+
- Check the [logs](#logs)
68
+
- Ensure the [environment variables](../../configuration/environment-variables/) and [configuration data](../../configuration/data/) are set properly.
69
+
- [Turn on debugging](../../configuration/environment-variables/#django_debug)
70
+
- Force-push/revert the [environment](../infrastructure/#environments) branch back to the old version to roll back
71
+
72
+
### Littlepay API issue
73
+
74
+
Littlepay API issues may show up as:
75
+
76
+
- The [monitor](https://github.com/cal-itp/benefits/actions/workflows/check-api.yml) failing
77
+
- The `Connect your card` button not working
78
+
79
+
A common cause of Littlepay API failures is an expired certificate. To resolve:
1. Put that certificate into the [configuration data](../../configuration/data/) and/or the [GitHub Actions secrets](https://github.com/cal-itp/benefits/settings/secrets/actions)
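
Before replacing a certificate, it can help to confirm that expiry is really the cause. The following is a minimal sketch, not part of the project's tooling: it assumes you have already extracted the certificate's end date as a line of the form `notAfter=...`, such as the one printed by `openssl x509 -enddate -noout -in <certificate file>`. The function name `days_until_expiry` and the sample date are hypothetical.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(enddate_line: str, now: datetime) -> float:
    """Return the days left before expiry; negative if already expired.

    `enddate_line` is assumed to look like "notAfter=Jan  5 09:34:43 2018 GMT".
    `now` must be a timezone-aware datetime.
    """
    head, sep, tail = enddate_line.strip().partition("=")
    # Accept either the full "notAfter=..." line or just the date part.
    not_after = tail if sep else head
    # ssl.cert_time_to_seconds parses the certificate date format into epoch seconds.
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).total_seconds() / 86400


if __name__ == "__main__":
    sample = "notAfter=Jan  5 09:34:43 2018 GMT"  # hypothetical sample value
    remaining = days_until_expiry(sample, datetime.now(timezone.utc))
    status = "expired" if remaining < 0 else "still valid"
    print(f"Certificate is {status} ({remaining:.1f} days)")
```

A negative result means the certificate has expired and the steps above apply; a positive result suggests looking for a different cause.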
84
+
85
+
### Eligibility Server
58
86
59
87
If the Benefits application gets a 403 error when trying to make API calls to the [Eligibility Server](https://docs.calitp.org/eligibility-server/), it may be because the outbound IP addresses changed, and the Eligibility Server firewall is still restricting access to the old IP ranges.
60
88
@@ -64,3 +92,5 @@ If the Benefits application gets a 403 error when trying to make API calls to th
64
92
1. Click `Edit`
65
93
1. Click `Variables`
66
94
1. Update the relevant variable with the new list of CIDRs
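
To decide whether the CIDR list needs updating at all, you can compare the app's current outbound IP addresses against the CIDRs the firewall currently allows. This is a minimal sketch using Python's standard `ipaddress` module; the function name `uncovered_ips` and the sample addresses (taken from the reserved documentation ranges) are hypothetical, and you would substitute the real values from your environment.

```python
import ipaddress


def uncovered_ips(outbound_ips, allowed_cidrs):
    """Return the outbound IPs that no allowed CIDR range contains."""
    networks = [ipaddress.ip_network(cidr, strict=False) for cidr in allowed_cidrs]
    return [
        ip
        for ip in outbound_ips
        if not any(ipaddress.ip_address(ip) in network for network in networks)
    ]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    allowed = ["203.0.113.0/24", "198.51.100.16/28"]
    outbound = ["203.0.113.7", "198.51.100.40"]
    missing = uncovered_ips(outbound, allowed)
    if missing:
        print("Not covered by the firewall list:", ", ".join(missing))
    else:
        print("All outbound IPs are covered.")
```

Any address reported as not covered would be blocked by the firewall, which is consistent with the 403 errors described above.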
95
+
96
+
Note that there is nightly downtime while the Eligibility Server restarts and loads new data.