Improve image build speed (Discussion,Ideas) #2860

Open
gjrtimmer opened this issue Nov 27, 2023 · 11 comments

Comments

@gjrtimmer
Contributor

This is a discussion of a concept I'm working on to improve the build speed of the image. It is also preliminary work towards, hopefully, arm64 support.

In my own work and in my private repositories, a CI cron job automatically rebuilds a base image at least once a week and writes it to the registry with the tag base. The next stage of the build then simply starts from this image tag.

Furthermore, I want to extract several time-consuming build stages from this image to speed up the build.
For example, Ruby 3.0.6 is used across several versions, yet it is compiled on every build, which is time-consuming.
Therefore, I'm looking at having this specific Ruby built by a CI process, which then stores it in the repository as a *.deb asset under git-lfs. The install stage would check out the repository, perform a git-lfs checkout, start the build context, and then only install the *.deb package. The same can be done for the other compiled parts.
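
A minimal sketch of how such a CI job could decide whether the prebuilt Ruby package needs rebuilding, assuming the asset is named after the inputs that affect the build. The function names, the file naming scheme, and the example flags are illustrative assumptions, not something this repository defines.

```python
import hashlib


def ruby_package_key(version: str, configure_flags: list[str], arch: str) -> str:
    """Return a stable asset name for one prebuilt Ruby package (hypothetical scheme)."""
    # Sort the flags so the key does not depend on the order they were written in.
    payload = "\n".join([version, arch, *sorted(configure_flags)])
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
    return f"ruby_{version}_{arch}_{digest}.deb"


def needs_rebuild(key: str, existing_assets: set[str]) -> bool:
    """Rebuild only when no asset with this exact key has been stored yet."""
    return key not in existing_assets
```

With this naming, changing the Ruby version, the architecture, or any configure flag yields a new asset name, so the image build can keep installing the stored *.deb until one of those inputs changes.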

This is a rough draft; I'm looking forward to input and comments.

Please let me know what you think.

Disclaimer: Due to my profession, I'm a hefty GitLab user; I can quickly build all of this in GitLab; however, building it within GitHub actions will be a challenge and probably require help and input from others.

@gjrtimmer
Contributor Author

@sameersbn @solidnerd question: is the --enable-shared parameter the reason for compiling Ruby? Is this not the default? Second question: can we not use the latest Ruby, 3.2.2?

I'm looking to speed up the build process, so I'm trying to find out why the image compiles Ruby itself.

@kkimurak
Contributor

kkimurak commented Nov 27, 2023

I authored the change (Install ruby from source - #2429) to meet GitLab's operating requirement as of 14.3.0 (Ruby 2.7.4).
There is no clear reason why I chose the source installation. At the time, I was just in a hurry. Around GitLab 13, there was a significant delay in keeping up with GitLab releases, and I was afraid that the project would stall. This remains true even today.

For a few years, brightbox/ruby-ng was used as the Ruby package installation source, but it has stopped updating.

Early releases of sameersbn/gitlab simply included binaries (release tarballs, rubygem caches and so on), but this was revised by sameersbn - see #30 (comment)

I once tried porting the ruby installation process to a Dockerfile in the hopes that the cache would be utilized. But for local builds, the speedup was only a few minutes.
Edit 20231128 : Found traces of work: kkimurak/docker-gitlab@7c8e2db

I also had an idea to replace the base image, switching from ubuntu to a ruby image, but I haven't tried it yet. I don't know if there are any images available for such a purpose.
Edit 20231128 : Official docker image ruby is available: https://hub.docker.com/_/ruby


As for the --enable-shared option, I have forgotten the details. I feel like it was needed at runtime, but I'm not sure. I will test building and running the image later.

As for Ruby 3.2.x, as far as I can tell from https://gitlab.com/gitlab-org/gitlab/-/issues/404750 , we can test it.

@gjrtimmer
Contributor Author

gjrtimmer commented Nov 28, 2023

@kkimurak thank you very much this helps a lot. A few minutes gain on several different processes can become a significant improvement.

@gjrtimmer
Contributor Author

I'm also looking at more automation for the repository.

@kkimurak
Contributor

Yes, automation is justice. Totally agree.
I'm working on a script to sync upstream configurations. It's still a work in progress, but I think it will be helpful once it's done.

However, I recently came across an example of a reasonably well-known repository operated with almost complete automation that was forced to fork due to the administrator being unreachable (https://github.com/TheRandomLabs/scoop-nonportable ). I'm afraid of ending up the same way.

There are limits to the amount of work we (the community) can do. If a bug is introduced through automation, it may take some time for it to be noticed (even now, when updates are done manually, bugs are often discovered late). The maintainers are doing a great job, but I'm a worrier.

@kkimurak
Contributor

By the way, images built without --enable-shared and with Ruby 3.2.2 both worked. At least I have confirmed that building the image, updating the password, creating a new blank project, and pulling from / pushing to the repository all succeed.

@gjrtimmer
Contributor Author

@kkimurak Thank you for the update on Ruby.

Furthermore, I have quite some experience with automating repositories, have been running GitLab for years, and also use GitLab at my employer. When it comes to writing a script to sync repositories, why are you not using github-cli for that?

@gjrtimmer
Contributor Author

gjrtimmer commented Nov 28, 2023

@kkimurak should we have a separate repository for assets, like what I proposed earlier? I'm thinking about what would be a good workflow. For example, I have a gitlab-runner project that can detect whether a new version is available online, then creates a commit and starts a new pipeline that builds the image. It is very handy: I have not had to do anything since version 15.9.1. It produces a new image as soon as there is a release, and a scheduled pipeline checks for a new release about 3 times a week. This is what I want to create for this repository. The problem I'm facing is that I'm not very good with circleci.
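
The detection step of such a scheduled job could look roughly like the sketch below. It assumes upstream release tags look like v16.6.1, that already built image tags look like 16.6.1, and that pre-release tags should be ignored; the function names are hypothetical, and fetching the tag lists is left out.

```python
import re
from typing import Iterable, Optional, Tuple

Version = Tuple[int, int, int]

# Accepts "v16.6.1" or "16.6.1"; anything with a suffix (e.g. "-rc42") is rejected.
_TAG_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse_version(tag: str) -> Optional[Version]:
    """Parse a release tag into a numeric tuple, or return None for other tags."""
    match = _TAG_RE.match(tag.strip())
    if match is None:
        return None
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch)


def newest_unbuilt(upstream_tags: Iterable[str], built_tags: Iterable[str]) -> Optional[str]:
    """Return the highest upstream version that has no image yet, or None."""
    built = {v for v in map(parse_version, built_tags) if v is not None}
    candidates = [
        v for v in map(parse_version, upstream_tags)
        if v is not None and v not in built
    ]
    if not candidates:
        return None
    # Tuples compare numerically, so 10.0.0 sorts above 9.10.0.
    return ".".join(str(part) for part in max(candidates))
```

When the function returns a version, the scheduled pipeline would create the version bump commit and start the image build; when it returns None, the run ends without doing anything.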

Another issue is that this repository has no strict rules for commits. I'm a solution architect, and automation also requires strict rules on process and workflow. If we enforce strict rules on commits, for example the git-conventional-commits convention, we could auto-generate the CHANGELOG and so on, which means another item is automated.

However, that does require enforcing strict rules and rewriting the entire history of the repository. But it comes with the benefit of automation. What do you think?
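
To illustrate the changelog automation mentioned above, here is a minimal sketch that groups commit subjects written in the Conventional Commits style (type(scope)!: description) into changelog sections. The section headings and the choice to list only feat and fix commits are assumptions made for this example; existing tooling would normally do this job.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Subject line shape: type(scope)!: description (scope and "!" are optional).
_SUBJECT_RE = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[^)]+)\))?(?P<breaking>!)?: (?P<desc>.+)$"
)

# Only these commit types end up in the changelog (an assumption of this sketch).
_HEADINGS = {"feat": "Features", "fix": "Bug Fixes"}


def build_changelog(subjects: Iterable[str]) -> str:
    """Group conforming commit subjects under changelog headings."""
    sections: Dict[str, List[str]] = defaultdict(list)
    for subject in subjects:
        match = _SUBJECT_RE.match(subject)
        if match is None:
            continue  # non-conforming subjects are skipped
        heading = _HEADINGS.get(match["type"])
        if heading is None:
            continue  # e.g. chore or docs commits are not listed
        scope = match["scope"]
        entry = f"{scope}: {match['desc']}" if scope else match["desc"]
        if match["breaking"]:
            entry += " (BREAKING)"
        sections[heading].append(entry)

    lines: List[str] = []
    for heading in _HEADINGS.values():
        if sections[heading]:
            lines.append(heading)
            lines.extend(f"- {entry}" for entry in sections[heading])
    return "\n".join(lines)
```

Subjects that do not follow the convention are silently dropped here, which is why this kind of automation only works once the commit rules are actually enforced.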

@kkimurak
Contributor

@gjrtimmer

When it comes to writing a script to sync repositories, why are you not using github-cli for that?

First, if I understand correctly, github-cli is a tool for GitHub. The gitlab repository hosted on github.com is just a mirror, so its commit messages do not contain the information I want.
If possible, I would like to include a link to the applied merge request when synchronizing with upstream. It can be useful to know what to refer to upstream when a change creates a task. This is what I (want to) do in the script.
Second, and this is the major reason: I simply didn't know about it when I started writing this script, and I've never used it :)

I found a similar CLI tool, python-gitlab, and am checking the details.

separate repository for assets

It looks good, but just out of curiosity, would it be difficult to maintain a mono-repo?
If we separate out a repository for assets, where would we put it? Ask @sameersbn to create a new one? Wouldn't we want to transfer the repository to something like an organization?
We also need to find ways to direct people to the correct repository for issues, which is confusing even now.

The problem I'm facing is that I'm not very good with circleci.

Additionally, you (and I, of course) may not have permission to change the configuration of sameersbn/docker-gitlab on CircleCI. The build configuration is in this repo, but scheduled pipeline settings are controlled through the settings panel in the CircleCI web UI or through the API. https://circleci.com/docs/en/scheduled-pipelines/
We have to wait for the maintainers to respond.

git-conventional-commits

This also looks good.
Personally, I think rewriting the entire history is a bit of a pain. References to commits in issue comments, documents and so on may stay alive for a while but break in the future (through garbage collection on the server side, which we cannot control).
On the other hand, there's no obvious reason for me (as just one of the contributors) to disagree if this is okay for everyone.

Anyway, I'm not very knowledgeable in this area, but for example, is it possible to keep the current commits as they are and set strict commit rules for future commits?
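
One way this could work, sketched under assumptions: pick a cutoff commit, leave everything up to and including it untouched, and check only the commits after it (the range that git log <cutoff>..HEAD would list). The function name and the input shape are hypothetical.

```python
import re
from typing import List, Sequence, Tuple

# Same subject shape as Conventional Commits: type(scope)!: description
_CONVENTIONAL_RE = re.compile(r"^[a-z]+(\([^)]+\))?!?: .+$")


def violations_since(history: Sequence[Tuple[str, str]], cutoff_sha: str) -> List[str]:
    """Return SHAs of non-conforming commits made after the cutoff.

    history holds (sha, subject) pairs, oldest first. Commits up to and
    including cutoff_sha are exempt from the rule.
    """
    shas = [sha for sha, _ in history]
    start = shas.index(cutoff_sha) + 1  # raises ValueError if the cutoff is unknown
    return [
        sha for sha, subject in history[start:]
        if not _CONVENTIONAL_RE.match(subject)
    ]
```

A CI check built on this would fail only for new commits, so the existing history and every reference to it stay as they are.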

I have raised a variety of concerns, but I see these proposals in a positive light. It's good to have faster builds and maintain regular releases.

@gjrtimmer
Contributor Author

  • Yes, I'm currently thinking about a monorepo
  • Changing circleci might be a problem
  • Not sure if it is possible to enforce rules and have the tooling ignore older commits
  • I'm using both gitlab-cli and github-cli
  • Right now, what I'm going to do is create my own version of this using my own runners, and build it all on a main branch. This repository is still using master, which is perfect for us: it allows us to use a private gitlab instance with default branch main while keeping the master branch in sync with everything else, including upstream
  • I'm also looking into creating support scripts and additional workflows, for example templates that ask a user to complete a list of tasks such as checking the upstream gitlab.yml.rb file for new parameters. I'm currently collecting information to create sync jobs that automatically create merge requests when new parameters are added to the upstream gitlab.rb.erb, so that they become easier to implement
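
The detection part of such a sync job could be sketched as follows, assuming the two upstream versions of the configuration file are available as text and that parameters appear as indentation-nested key: lines. This is a deliberately simple line-based scan, not a real YAML or ERB parser, and the function names are hypothetical.

```python
import re
from typing import List, Set, Tuple

# An uncommented "key:" at the start of a line, with optional indentation.
_KEY_RE = re.compile(r"^(\s*)([A-Za-z_][A-Za-z0-9_]*):")


def key_paths(text: str) -> Set[str]:
    """Collect dotted key paths such as 'gitlab.host' from nested 'key:' lines."""
    paths: Set[str] = set()
    stack: List[Tuple[int, str]] = []  # (indent, key) of the enclosing keys
    for line in text.splitlines():
        match = _KEY_RE.match(line)
        if match is None:
            continue  # comments, list items, template tags, blank lines
        indent, key = len(match.group(1)), match.group(2)
        # Leave every block that is indented at least as deep as this line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, key))
        paths.add(".".join(name for _, name in stack))
    return paths


def new_parameters(old_text: str, new_text: str) -> List[str]:
    """Return key paths present in the new version but not in the old one."""
    return sorted(key_paths(new_text) - key_paths(old_text))
```

A non-empty result is the signal for the job to open a merge request listing the new parameters, so a person only has to review and approve it.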

The best future I see is automating most of it, even template generation and auto-scripting, for example. It might break a few things, such as currently used variables, but it would be cool to have autoconfiguration for new variables and features as soon as they are released, so that you only have to approve the MR. It's all possible; it just requires some work.

@kkimurak
Contributor

Note: I tested the impact on image build time of the --enable-shared flag (which is currently used) during the Ruby build. When building GitLab 17.2.0 without the flag, the total build time in my environment increased by 3 minutes, to about 45 minutes. Of course, I pruned the Docker build cache and images before each build.


2 participants