Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve & simplify valgrind testing #278

Open
wants to merge 10 commits into
base: master
Choose a base branch
from
Open

Conversation

onurctirtir
Copy link
Member

@onurctirtir onurctirtir commented Jan 21, 2025

  • Reduce the dependency on the other constructs of this repo, e.g., creation & deallocation of Azure VMs via the existing tooling doesn't seem so much stable. So, introduce a simple docker image that can allow us running multiple valgrind tests in a single vm in parallel and anywhere. Optionally, create_cluster.sh and its friends can still be used to create an Azure VM but they are not a must now. See the changes in README.md for more details.
  • Get rid of valgrind code from fab infra because using this infra for valgrind testing was both unnecessary and this was heavily complicating the things. Instead, a simpler infra that we can also use in our local machines seems much more useful and easier to maintain and understand.
  • Also remove fabfile/pg_report_vg_query.patch since the Postgres patch that we depend on to report the query string that causes the memory error has been merged into Postgres as of PG16.
  • Automatically collect stack traces from valgrind core files that are generated in case of a process crash.

Note to reviewer - although the PR touches nine files, this PR mainly introduces following three files and updates the README on the usage of valgrind testing infra. The other 5 files are only modified to remove the old infra.

  • valgrind/Docker/Dockerfile
    The dockerfile to build a simple docker image that can be used to run valgrind tests.
    This has nothing but Postgres & Citus built with the flags that are needed to properly run valgrind tests and valgrind itself.
  • valgrind/Docker/run_valgrind_test.sh
    The script that we run within the container to run the valgrind test for the specified test schedule.
    We could set the entrypoint of the container to this script but we keep it as a separate script for simplicity.
  • run.sh
    The host side script that we use to build the container and run the valgrind tests through the container.

I strongly recommend reading the relevant section of the README from this PR to better understand the changes.

* Get rid of valgrind code from fab infra because using this infra
  for valgrind testing was both unnecessary and this was heavily
  complicatings the things. Instead, a simpler infra that we can
  also use in our local machines seems much more usuful and easier
  to maintain & understand.

* Also remove fabfile/pg_report_vg_query.patch since the Postgres
  patch that we depend on to report the query string that causes
  the memory error has been merged into Postgres as of PG16.
@onurctirtir onurctirtir changed the title Fix & modernize valgrind testing Fix & simplify valgrind testing Jan 21, 2025
@onurctirtir onurctirtir requested review from hanefi and naisila January 21, 2025 16:34
@onurctirtir onurctirtir changed the title Fix & simplify valgrind testing Improve & simplify valgrind testing Jan 21, 2025
Copy link
Member

@thanodnl thanodnl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some pointers to optimize the docker image a bit. I have no real understanding of how we use valgrind, so would be good to get someone involved in day-to-day citus to checkout and try to use this change to run a valgrind test before merging - hence the lack of approval for now.

@@ -0,0 +1,37 @@
FROM ubuntu:24.04

RUN apt-get update && apt-get install -y git
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apt update pulls in a lot of data that sticks around on this layer, making the image uselessly bigger. It is common to apt clean after you have installed something eg: https://github.com/citusdata/citus/blob/117bd1d04fe55b4001012b3923216d5df0fda10e/.devcontainer/Dockerfile#L43C5-L43C14

There are a couple of more occurances of apt update && apt install ... down below

Comment on lines +10 to +12
RUN ./bin/pgenv config init
RUN sed -i 's/^PGENV_CONFIGURE_OPTIONS=.*/PGENV_CONFIGURE_OPTIONS=([0]="--enable-debug" [1]="--enable-depend" [2]="--enable-cassert" [3]="CFLAGS=-ggdb -Og -fno-omit-frame-pointer -DUSE_VALGRIND")/' ./config/default.conf
RUN sed -i "s/^PGENV_MAKE_OPTIONS=.*/PGENV_MAKE_OPTIONS=([0]='-sj$(nproc)')/" ./config/default.conf
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit hard to read what config you are running, could you copy the config file in?
eg https://github.com/citusdata/citus/blob/117bd1d04fe55b4001012b3923216d5df0fda10e/.devcontainer/Dockerfile#L63

This adds an extra file, but makes it easier to understand what the config is, that we are using.

Comment on lines +4 to +7
RUN git clone https://github.com/theory/pgenv

WORKDIR /pgenv
RUN git fetch && git checkout v1.3.6b
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as per https://stackoverflow.com/questions/8932389/git-shallow-clone-to-specific-tag

Suggested change
RUN git clone https://github.com/theory/pgenv
WORKDIR /pgenv
RUN git fetch && git checkout v1.3.6b
RUN git clone --branch v1.3.6b --depth 1 https://github.com/theory/pgenv

less code, less data, instantly in the state you want it in

RUN sed -i 's/^PGENV_CONFIGURE_OPTIONS=.*/PGENV_CONFIGURE_OPTIONS=([0]="--enable-debug" [1]="--enable-depend" [2]="--enable-cassert" [3]="CFLAGS=-ggdb -Og -fno-omit-frame-pointer -DUSE_VALGRIND")/' ./config/default.conf
RUN sed -i "s/^PGENV_MAKE_OPTIONS=.*/PGENV_MAKE_OPTIONS=([0]='-sj$(nproc)')/" ./config/default.conf

RUN apt-get update && apt-get install -y bzip2 build-essential libicu-dev pkg-config flex bison liblz4-dev libreadline-dev libzstd-dev libxslt1-dev zlib1g-dev zlib1g-dev libxml2-utils docbook-xsl xsltproc valgrind
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when installing multiple packages it is easier maintainable if you put every package on a new line and sort them. Keeping an extra \ at the end of the last package as well for easier diffing in the future, but that is also enforced by the apt clean you want to have in the same RUN as the update and install.

example https://github.com/citusdata/citus/blob/117bd1d04fe55b4001012b3923216d5df0fda10e/.devcontainer/Dockerfile#L8-L35

after each package is on a separate line use your editor to sort the packages. Future edits should remember to insert in sorted position. This will also identify double mentions of the same package, like zlib1g-dev in your list.


RUN apt-get update && apt-get install -y bzip2 build-essential libicu-dev pkg-config flex bison liblz4-dev libreadline-dev libzstd-dev libxslt1-dev zlib1g-dev zlib1g-dev libxml2-utils docbook-xsl xsltproc valgrind
ARG PG_VERSION
RUN ./bin/pgenv build $PG_VERSION && ./bin/pgenv switch $PG_VERSION
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you will want to add some makeflags here to speed up the build to use all available cores.

https://github.com/citusdata/citus/blob/117bd1d04fe55b4001012b3923216d5df0fda10e/.devcontainer/Dockerfile#L71

Comment on lines +19 to +22
RUN git clone https://github.com/citusdata/citus
WORKDIR /citus
ARG CITUS_VERSION
RUN git fetch && git checkout $CITUS_VERSION
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can probably use the same --branch ... --depth 1 as above

WORKDIR /

# install python3 as it's required by Citus test suite
RUN apt-get update && apt-get install -y python3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason to not add this to the bigger install above?

@@ -0,0 +1,37 @@
FROM ubuntu:24.04

RUN apt-get update && apt-get install -y git
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible I would try to get all installs in a single run, unless there is a good reason to have them at different layers throughout the file

Comment on lines +51 to +54
docker cp "$container_name":/citus/src/test/regress/regression.diffs "$test_artifacts_dir/"
docker cp "$container_name":/citus/src/test/regress/regression.out "$test_artifacts_dir/"
docker cp "$container_name":/citus/src/test/regress/citus_valgrind_test_log.txt "$test_artifacts_dir/"
docker cp "$container_name":/citus/src/test/regress/gdb_core_backtraces "$test_artifacts_dir/"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given that we have copied out the files, would it make sense to remove the container after the run? If there is reason to keep it lets comment why we are not removing the container.

especially when running many schedules it pollutes the docker environment of the eng quite a bit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants