build(medcat-service): CU-869b2zjay Clean runner to free up space for docker builds #216
Conversation
alhendrickson left a comment
Looks fine so I've approved it - though it generally seems off that this has now started to fail when it didn't previously. I mean, ideally we wouldn't have to mess around with the runners.
One guess for why it failed is that we copy the whole medcat-v2 folder, which might have grown.
A second is that the GPU docker image is massive in general and probably growing in size as libs get updated...
As a wildcard option we could split this into two jobs - one for the CPU image and one for the GPU image - and could probably parameterise it. These would then go on two different runners, and I'd hope that fixes any issue with docker layers being kept around.
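For what it's worth, a rough sketch of what that split could look like as a build matrix - the variant names, Dockerfile paths and action versions below are assumptions, not taken from the actual workflow:

```yaml
jobs:
  build-image:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # Assumed variants and Dockerfile paths - placeholders only.
        include:
          - variant: cpu
            dockerfile: medcat-service/Dockerfile
          - variant: gpu
            dockerfile: medcat-service/Dockerfile.gpu
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - name: Build ${{ matrix.variant }} image
        uses: docker/build-push-action@v6
        with:
          context: .
          file: ${{ matrix.dockerfile }}
          push: false
          tags: medcat-service:${{ matrix.variant }}
```

Each matrix entry would get its own runner, so neither build inherits the other's docker layers or build cache.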
- name: Clean Docker to free up space
  # NOTE: otherwise the runner tends to run out of disk space roughly 75% of the time
  run: docker system prune -af
Would want to see a before/after here as well if possible - or remove it altogether if the other step fixes the issue
Just as an FYI - the current setup does seem to run without an issue. Reran it 3 times. Yet it didn't after I just added the
But will give a check to the disk space before and after purge to see how much that actually does. You're right - we don't know if it's the later addon or the combination that we need.
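For reference, a minimal sketch of what that before/after check could look like (the surrounding step names here are placeholders, not the actual workflow steps):

```yaml
- name: Show disk usage before prune
  run: df -h /

- name: Clean Docker to free up space
  run: docker system prune -af

- name: Show disk usage after prune
  run: df -h /
```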
Approved - clearing 15G is pretty nice, way more than I thought it would be
One task for a later date would be to see how to speed this up, adding 2 minutes on here is quite a lot
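If the extra time ever becomes a problem, one option to look at (an assumption on my part, not something verified on these runners) would be to scope the prune with a filter rather than removing everything:

```yaml
- name: Clean Docker to free up space
  # Untested assumption: only prune objects older than 24h instead of everything,
  # which should be quicker while still reclaiming most of the stale space.
  run: docker system prune -af --filter "until=24h"
```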
I do recall having to restart this earlier as well. Though I'm pretty sure it was a lot less frequent. And I think it was something to do with failing network calls instead.
It's unlikely to be an issue of the folder size. It's 77MB. Though we could avoid copying over the 75MB (i.e. the vast majority) of the
This sounds more likely. The dependencies are locked (with
I did think about that as an option as well. Don't think I had a specific reason to go the route I chose (i.e. cleaning up things that take up space). Though in hindsight this means there are fewer runners running (potentially in parallel).
I don't think this added much time for the run. Still seems to be less than 4 minutes for the
The overall run has gone up from 5m5s to 5m20s, so 15 seconds. Could be related to the removal of files (a 15GB removal in 15 seconds is quite good overall I would say).
Previously, the workflow would often fail the first couple of times due to running out of disk space. For instance, this one only succeeded on the 4th attempt:
https://github.com/CogStack/cogstack-nlp/actions/runs/19104323916
The specific failure reads:
This generally happened in the "Build and push Docker Jupyter singleuser image with GPU support" step.
This PR will attempt to rectify that by:
- Cleaning up docker before the GPU build step
  - This should remove a bunch of the files docker wrote on disk, to avoid the issue with running out of disk space in the next step
  - The entire (successful) workflow job previously took less than 4 minutes
  - So the small slowdown due to pruning the cache is unlikely to be significant
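For context, the placement is roughly the following - the GPU step name comes from the failing step mentioned above, while the build configuration shown is an assumption for illustration only:

```yaml
- name: Clean Docker to free up space
  # NOTE: otherwise the runner tends to run out of disk space roughly 75% of the time
  run: docker system prune -af

- name: Build and push Docker Jupyter singleuser image with GPU support
  uses: docker/build-push-action@v6
  with:
    context: .
    file: Dockerfile.gpu  # assumed path, not taken from the repo
    push: true
```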