-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Minor cleanup on free-disk-space.sh #27088
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Minor cleanup on free-disk-space.sh #27088
Conversation
Reviewer's GuideThis PR optimizes the free-disk-space.sh script by skipping the slow man-db removal and replacing bulk rm -rf operations with an rsync-based deletion approach, along with added logging for directory cleanup. Class diagram for new delete_directories_with_rsync functionclassDiagram
class free_up_disk_space_ubuntu {
+directories_to_be_removed: array
+delete_directories_with_rsync(directories)
}
class delete_directories_with_rsync {
+delete_directories_with_rsync(directories)
}
free_up_disk_space_ubuntu --> delete_directories_with_rsync: calls
Flow diagram for updated disk cleanup process in free-disk-space.shflowchart TD
A["Start disk cleanup"] --> B["Log disk space usage before cleaning"]
B --> C["Remove unnecessary packages (including man-db)"]
C --> D["Autoclean apt cache"]
D --> E["Remove toolchain directories using rsync deletion"]
E --> F["Prune docker images"]
F --> G["End"]
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
14a4a8c to
ef383f7
Compare
ef383f7 to
2eb1554
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- Consider using an explicit empty source directory with rsync (and its --delete-before flag) instead of syncing from ‘.’ and then rm -rf—this avoids confusion and may eliminate the extra rm step.
- The script echoes “Disk space usage before cleaning:” but doesn’t run any command; add
df -h(and optionally a post-cleanup disk usage) to make the logging actionable.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider using an explicit empty source directory with rsync (and its --delete-before flag) instead of syncing from ‘.’ and then rm -rf—this avoids confusion and may eliminate the extra rm step.
- The script echoes “Disk space usage before cleaning:” but doesn’t run any command; add `df -h` (and optionally a post-cleanup disk usage) to make the logging actionable.
## Individual Comments
### Comment 1
<location> `.github/bin/free-disk-space.sh:34` </location>
<code_context>
+ "/usr/local/share/boost/"
+ "${AGENT_TOOLSDIRECTORY}")
+
+ delete_directories_with_rsync "${directories_to_be_removed[@]}"
+
+ echo "Prune docker images"
</code_context>
<issue_to_address>
**suggestion:** Consider whether rsync is necessary before rm -rf for directory deletion.
If secure wiping is not required, using rsync before rm -rf increases complexity and execution time without clear benefit.
</issue_to_address>
### Comment 2
<location> `.github/bin/free-disk-space.sh:44` </location>
<code_context>
+{
+ for dir in "$@"; do
+ echo "Deleting contents of $dir using rsync"
+ sudo rsync --delete -a --exclude='*' ./ "$dir"
+ sudo rm -rf "$dir"
+ done
</code_context>
<issue_to_address>
**issue (bug_risk):** Rsync command may not behave as intended with --exclude='*' and ./ as source.
This configuration will prevent rsync from deleting any files, as all are excluded. To delete contents, adjust the rsync options or use a different method.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
419e2b9 to
f775811
Compare
| sudo mkdir /tmp/empty | ||
| for dir in "$@"; do | ||
| echo "Deleting contents of $dir using rsync" | ||
| sudo rsync --delete -a /tmp/empty/ "$dir" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why would we want to use rync when all we want is rm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the number of files in a directory is large, them rm -rf might take sometime, which leads to increase in the execution time crossing the limit i.e 15mins
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This rsync --delete -a /tmp/empty/ "$dir" is more of mimicking rm -rf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would rsync be faster than rm? is it because it does different system calls, it runs things in threads (good for networks), or?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I asked the ai and give context "work with data heavy systems", it answer:
Because you work with data-heavy systems (databases, lakes, large file trees) you’ll likely encounter directories with very many files and subdirectories. In such cases:
• Using rm -rf triggers lots of individual filesystem operations (unlink, rmdir) in what may be a sub-optimal order; the overhead (and metadata cost) becomes dominant.
• Using rsync -a --delete (with an empty source) shifts the pattern: rsync walks the destination tree, determines all items present, then issues deletions in a more efficient pattern (for many files) which tends to reduce metadata churn and avoid worst-case filesystem behaviour.
• Thus for very large file trees, rsync may “appear” much faster in wall-time, even though conceptually you’re still deleting everything.
• Given your context (large datasets, perhaps S3 or other object stores, etc) you may adapt: for local filesystem large-scale deletion this trick is a practical tool.
f775811 to
c349b52
Compare
Description
This PR attempts to speed up the
free-disk-space.shwhich takes more than 15mins to execute. We first removeman-dbas updating it takes some time - Even official runners from Nov 10th would have them removed - actions/runner-images#13213Additionally we try to use
rsyncto drop the files instead ofrm -rf.Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:
Summary by Sourcery
Enhancements:
Summary by Sourcery
Optimize disk cleanup script by excluding man-db, introducing rsync-based deletion for directories, and logging pre-cleanup disk usage.
Enhancements: