- 
                Notifications
    You must be signed in to change notification settings 
- Fork 117
feat: cleanly implement HEALTHCHECK for aigw docker #1314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Adrian Cole <[email protected]>
| Here's an example docker healthcheck run, notice it detected the ephemeral admin server as it no longer is blocked by an upstream bug.  | 
| envoyAdmin, err := aigw.NewEnvoyAdminClient(ctx, os.Getpid(), envoyAdminPort) | ||
| if err != nil { | ||
| stderrLogger.Error("Failed to find Envoy admin server", "error", err) | ||
| serverCancel() // Likely a crashed envoy process | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this cancel only occurs if the envoyAdminPort was zero (needs look up), so we still pass on crashed envoy if you supplied yaml with a hard-coded admin server in it (didn't use env config)
Signed-off-by: Adrian Cole <[email protected]>
Signed-off-by: Adrian Cole <[email protected]>
Head branch was pushed to by a user without write access
Signed-off-by: Adrian Cole <[email protected]>
Head branch was pushed to by a user without write access
| Still failing | 
| opened envoyproxy/gateway#7176 on the race | 
Signed-off-by: Adrian Cole <[email protected]>
Head branch was pushed to by a user without write access
| added a commit to dump the logs on flake since gateway main has this working I think | 
| actually I can get this. there's a race where the admin address isn't yet valid file | 
Signed-off-by: Adrian Cole <[email protected]>
| so glad debug works now. 🤞 this gets it | 
| Codecov Report❌ Patch coverage is  
 ❌ Your patch status has failed because the patch coverage (70.45%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@            Coverage Diff             @@
##             main    #1314      +/-   ##
==========================================
+ Coverage   77.59%   77.64%   +0.05%     
==========================================
  Files         123      123              
  Lines       15717    15718       +1     
==========================================
+ Hits        12195    12204       +9     
+ Misses       2896     2888       -8     
  Partials      626      626              ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
 | 
| return "", fmt.Errorf("timeout waiting for admin address file %s: %w", path, lastErr) | ||
| case <-ticker.C: | ||
| // Verify it's a file | ||
| if info, err := os.Stat(path); err != nil { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the key code. before, we weren't waiting for envoy to actually write the admin address file, just erroring if it wasn't there, yet
| woot green! | 
Signed-off-by: Hrushikesh Patil <[email protected]>
Signed-off-by: Hrushikesh Patil <[email protected]>
Signed-off-by: Hrushikesh Patil <[email protected]>
Description
Before #1309, we had to work around a bug in EG where in standalone mode the admin server couldn't be detected. Since func-e always ensures there's an admin server, and that bug in EG is fixed, we can remove the tech debt.
We also can consolidate the health checking internally with Docker HEALTHCHECK for the same reason.