
@Praveen2112 (Member) commented Oct 24, 2025

Description

This PR attempts to speed up free-disk-space.sh, which takes more than 15 minutes to execute. First, we remove man-db, since updating it takes some time; the official runners will also have it removed from November 10th (actions/runner-images#13213).

Additionally, we use rsync to delete the files instead of rm -rf.
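A minimal sketch of the rsync-based deletion, assuming bash and an rsync binary on the runner. It syncs a freshly created empty directory onto each target with --delete, which removes everything inside the target, and then deletes the now-empty directory itself. The `SUDO` override is a hypothetical addition so the sketch can also run unprivileged against a scratch directory; this is an illustration, not the PR's exact code.

```shell
#!/usr/bin/env bash

# Empty each directory by syncing an empty source onto it, then remove the directory.
# SUDO is a hypothetical override: unset -> use "sudo", set to "" -> run without sudo.
delete_directories_with_rsync() {
    local empty_dir dir
    # A fresh empty source; mktemp -d avoids failing when a fixed /tmp/empty already exists.
    empty_dir="$(mktemp -d)"
    for dir in "$@"; do
        if [[ -L "$dir" ]]; then
            # rm -rf on a symlink only removes the link; do the same instead of following it.
            ${SUDO-sudo} rm -f "$dir"
            continue
        fi
        if [[ ! -d "$dir" ]]; then
            # Covers empty arguments such as an unset AGENT_TOOLSDIRECTORY.
            echo "Skipping '$dir': not a directory"
            continue
        fi
        echo "Deleting contents of $dir using rsync"
        # Trailing slashes sync the contents of empty_dir onto dir; --delete removes
        # every entry in dir that is absent from the (empty) source, i.e. all of them.
        ${SUDO-sudo} rsync -a --delete "$empty_dir/" "$dir/"
        # rsync leaves dir itself in place, so remove the now-empty directory.
        ${SUDO-sudo} rm -rf "$dir"
    done
    rmdir "$empty_dir"
}
```

For example: `delete_directories_with_rsync "/usr/local/share/boost" "${AGENT_TOOLSDIRECTORY:-}"`. Using `mktemp -d` for the empty source means the function can run more than once on the same machine without tripping over a leftover directory.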

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fix some things. ({issue}`issuenumber`)

Summary by Sourcery

Enhancements:

  • Add logging of disk usage before cleaning in free-disk-space script

Summary by Sourcery

Optimize the disk cleanup script by removing man-db, introducing rsync-based deletion for directories, and logging pre-cleanup disk usage.

Enhancements:

  • Add man-db to the list of Ubuntu packages to remove, so slow man-db updates no longer delay the cleanup
  • Add delete_directories_with_rsync function to use rsync for faster purging of toolchain directories
  • Replace bulk rm -rf directory removal with rsync-based deletion
  • Log disk usage before running cleanup steps to provide pre-cleanup metrics

@cla-bot cla-bot bot added the cla-signed label Oct 24, 2025

sourcery-ai bot commented Oct 24, 2025

Reviewer's Guide

This PR optimizes the free-disk-space.sh script by removing the man-db package, whose updates are slow, and by replacing bulk rm -rf operations with an rsync-based deletion approach, along with added logging for directory cleanup.

Class diagram for new delete_directories_with_rsync function

classDiagram
    class free_up_disk_space_ubuntu {
        +directories_to_be_removed: array
        +delete_directories_with_rsync(directories)
    }
    class delete_directories_with_rsync {
        +delete_directories_with_rsync(directories)
    }
    free_up_disk_space_ubuntu --> delete_directories_with_rsync: calls

Flow diagram for updated disk cleanup process in free-disk-space.sh

flowchart TD
    A["Start disk cleanup"] --> B["Log disk space usage before cleaning"]
    B --> C["Remove unnecessary packages (including man-db)"]
    C --> D["Autoclean apt cache"]
    D --> E["Remove toolchain directories using rsync deletion"]
    E --> F["Prune docker images"]
    F --> G["End"]

File-Level Changes

| Change | Details | Files |
|---|---|---|
| Remove the man-db package to avoid slow apt operations | Added 'man-db' to the removal list for Ubuntu cleanup | .github/bin/free-disk-space.sh |
| Refactor directory removal to use rsync for faster deletes | Created delete_directories_with_rsync() that logs and uses rsync --delete; replaced inline sudo rm -rf calls with calls to the new function | .github/bin/free-disk-space.sh |

Possibly linked issues

  • #N/A: The PR optimizes the free-disk-space.sh script by removing man-db and using rsync for deletion, directly addressing the slow disk cleanup causing the CI timeout.
  • #N/A: The PR removes man-db and uses rsync to speed up disk cleanup, preventing the setup action's timeout.


@Praveen2112 force-pushed the praveen/disk_space_cleanup branch 6 times, most recently from 14a4a8c to ef383f7, October 24, 2025 12:17
@Praveen2112 changed the title "Debug the issue on removing toolchains" to "Minor cleanup on free-disk-space.sh" Oct 24, 2025
@Praveen2112 marked this pull request as ready for review October 24, 2025 13:53
@Praveen2112 force-pushed the praveen/disk_space_cleanup branch from ef383f7 to 2eb1554, October 24, 2025 13:53

@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes - here's some feedback:

  • Consider using an explicit empty source directory with rsync (and its --delete-before flag) instead of syncing from ‘.’ and then running rm -rf; this avoids confusion and may eliminate the extra rm step.
  • The script echoes “Disk space usage before cleaning:” but doesn’t run any command; add df -h (and optionally a post-cleanup disk usage) to make the logging actionable.
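A small sketch of the logging suggestion, assuming bash and a df that supports -h (GNU coreutils does). The helper name log_disk_usage is hypothetical, and the report is restricted to the root filesystem; plain df -h lists every mount.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print a labelled disk usage report so the before/after
# numbers appear next to each other in the CI log.
log_disk_usage() {
    echo "Disk space usage $1 cleaning:"
    df -h /
}

log_disk_usage before
# ... cleanup steps go here (package removal, directory deletion, docker prune) ...
log_disk_usage after
```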
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `.github/bin/free-disk-space.sh:34` </location>
<code_context>
+      "/usr/local/share/boost/"
+      "${AGENT_TOOLSDIRECTORY}")
+
+    delete_directories_with_rsync "${directories_to_be_removed[@]}"
+
+    echo "Prune docker images"
</code_context>

<issue_to_address>
**suggestion:** Consider whether rsync is necessary before rm -rf for directory deletion.

If secure wiping is not required, using rsync before rm -rf increases complexity and execution time without clear benefit.
</issue_to_address>

### Comment 2
<location> `.github/bin/free-disk-space.sh:44` </location>
<code_context>
+{
+    for dir in "$@"; do
+        echo "Deleting contents of $dir using rsync"
+        sudo rsync --delete -a --exclude='*' ./ "$dir"
+        sudo rm -rf "$dir"
+    done
</code_context>

<issue_to_address>
**issue (bug_risk):** Rsync command may not behave as intended with --exclude='*' and ./ as source.

This configuration will prevent rsync from deleting any files, as all are excluded. To delete contents, adjust the rsync options or use a different method.
</issue_to_address>
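A quick way to check this claim locally, assuming bash and an installed rsync. The sketch builds a throwaway tree, runs the reviewed form with --exclude='*', then syncs from an empty directory without the exclude, and counts the files left after each step. The rsync manual states that excluded files are also protected from --delete unless --delete-excluded is used, so the first step should remove nothing and leave all the work to the following rm -rf. The function names are hypothetical.

```shell
#!/usr/bin/env bash

# Count regular files below a directory (tr strips padding some wc builds add).
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Hypothetical demo: compare the reviewed rsync call with an empty-source sync.
demo_rsync_exclude() {
    local work target empty
    work="$(mktemp -d)"
    target="$work/target"
    empty="$work/empty"
    mkdir -p "$target/sub" "$empty"
    touch "$target/a" "$target/sub/b"

    # Reviewed form: every entry matches --exclude='*', and excluded entries are
    # also protected from --delete, so the target keeps all of its files.
    rsync --delete -a --exclude='*' "$empty/" "$target"
    echo "after --exclude='*': $(count_files "$target") files"

    # Empty source without an exclude: --delete removes every entry in the target.
    rsync --delete -a "$empty/" "$target/"
    echo "after empty source: $(count_files "$target") files"

    rm -rf "$work"
}

demo_rsync_exclude
```

In the reviewed code the source ./ is the current working directory, so the exclude is also the only thing stopping rsync -a from copying that directory's contents into "$dir".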


@Praveen2112 force-pushed the praveen/disk_space_cleanup branch 6 times, most recently from 419e2b9 to f775811, October 24, 2025 14:54
sudo mkdir /tmp/empty
for dir in "$@"; do
echo "Deleting contents of $dir using rsync"
sudo rsync --delete -a /tmp/empty/ "$dir"

Member

Why would we want to use rsync when all we want is rm?

Member Author

If a directory contains a large number of files, rm -rf can take a while, which pushes the execution time past the 15-minute limit.

Member Author

This rsync --delete -a /tmp/empty/ "$dir" call essentially mimics rm -rf.

Member

Why would rsync be faster than rm? Is it because it makes different system calls, because it runs things in threads (good for networks), or something else?

Contributor

I asked an AI, giving it the context "work with data-heavy systems", and it answered:

Because you work with data-heavy systems (databases, lakes, large file trees) you’ll likely encounter directories with very many files and subdirectories. In such cases:
	•	Using rm -rf triggers lots of individual filesystem operations (unlink, rmdir) in what may be a sub-optimal order; the overhead (and metadata cost) becomes dominant.
	•	Using rsync -a --delete (with an empty source) shifts the pattern: rsync walks the destination tree, determines all items present, then issues deletions in a more efficient pattern (for many files) which tends to reduce metadata churn and avoid worst-case filesystem behaviour.
	•	Thus for very large file trees, rsync may “appear” much faster in wall-time, even though conceptually you’re still deleting everything.
	•	Given your context (large datasets, perhaps S3 or other object stores, etc.), you may adapt: for local filesystem large-scale deletion this trick is a practical tool.
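Whether rsync actually empties a large tree faster than rm -rf depends on the filesystem, the kernel, and the shape of the tree, so it is worth measuring on the runner itself. A rough benchmark sketch, assuming bash and rsync; the function names and tree sizes are hypothetical, and no particular result is implied.

```shell
#!/usr/bin/env bash

# Build a synthetic tree: <dirs> subdirectories with <files> empty files each.
make_tree() {
    local root=$1 dirs=$2 files=$3 d
    for ((d = 0; d < dirs; d++)); do
        mkdir -p "$root/d$d"
        # xargs batches the file names so touch is not run once per file.
        (cd "$root/d$d" && printf 'f%s\n' $(seq 1 "$files") | xargs touch)
    done
}

# Print the wall-clock seconds a command takes, using bash's time keyword.
time_seconds() {
    local TIMEFORMAT='%R'
    { time "$@" >/dev/null 2>&1; } 2>&1
}

# Delete identical trees with rm -rf and with rsync from an empty directory.
benchmark_delete() {
    local dirs=${1:-10} files=${2:-1000} work
    work="$(mktemp -d)"
    mkdir "$work/empty"

    make_tree "$work/tree" "$dirs" "$files"
    sync   # flush pending writes so they are not charged to the timing
    echo "rm -rf: $(time_seconds rm -rf "$work/tree")s"

    make_tree "$work/tree" "$dirs" "$files"
    sync
    # Only the emptying step is timed; the leftover empty directory is removed below.
    echo "rsync --delete: $(time_seconds rsync -a --delete "$work/empty/" "$work/tree/")s"

    rm -rf "$work"
}
```

For example, `benchmark_delete 100 10000` builds and deletes two trees of one million files each. Running it a few times on the CI runner gives a more trustworthy answer than reasoning about syscall patterns.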

@Praveen2112 force-pushed the praveen/disk_space_cleanup branch from f775811 to c349b52, October 24, 2025 16:02

4 participants