Commit de29e7d

craig[bot] and mw5h committed
Merge #155998
155998: Create a vector indexing roachtest r=mw5h a=mw5h

#### workload/vecann: add shared utility functions for vector workloads

Add DeriveDistanceMetric and CalculateRecall functions to the vecann package to be shared between vecbench and the upcoming vector index roachtest. These functions extract the distance metric from dataset naming conventions and compute recall for search result validation.

Refactor vecbench to use these shared implementations, removing duplicate code.

Informs: #154590
Release note: None

#### roachtest/vecindex: add a vector index test stub

Add minimal test infrastructure for vector index roachtests:
- Create vecindex.go with vecIndexOptions struct and empty registerVectorIndex function
- Register the test in registry.go
- Add helper functions for test naming, key generation, and distance operator mapping
- Stub out the phases of the test

Add test configurations and registration for 5 vector index test variants:
- vecindex/dbpedia-100k/nodes=3 (standard, no prefix)
- vecindex/dbpedia-100k/nodes=3/prefix=3 (with prefix columns)
- vecindex/dbpedia-1m/nodes=6 (large-scale)
- vecindex/random-s/nodes=1 (local development)
- vecindex/random-s/nodes=1/prefix=2 (local with prefix)

Informs: #154590
Release note: None

#### roachtest/vecindex: implement a test of backfill and merge

Add a test phase that loads a dataset using a pool of workers and, at a test-specified percentage of table population, kicks off a CREATE VECTOR INDEX for the data. This allows us to test both backfill (pre-create) and merge (post-create starting). Times are reported for both but are not used as a pass criterion (yet).

Informs: #154590
Release note: None

#### roachtest/vecindex: test recall of vector ann data

This addition to the vecindex roachtest tests the recall of nearest neighbors from the test data provided with each data set. Each test has a configurable set of beam sizes to test and a minimum acceptable recall for each beam size. Tests that load multiple prefixes test each prefix.

Informs: #154590
Release note: None

#### roachtest/vecindex: add a concurrent reader/writer subtest

This subtest spins up a configurable number of readers and writers to drive vector search load to the database.

Each writer inserts rows from the first train data file in single-row batches until it has inserted all of the rows in that file, at which point it switches into delete mode and starts deleting rows in 10-row batches. When all rows have been deleted, the writer once again becomes an inserter and the process repeats.

Each reader randomly selects a beam size from the sizes configured for the test and then runs searches for random vectors in the test data for the dataset. The reader ignores rows inserted by the writer threads to avoid skewing results too heavily. To do this, it searches for more vectors than called for and then filters the output to remove vectors written by the insert workers. When the read worker exits, it validates its recall rate against the expected rate for the number of searches it performed.

For multi-prefix tests, this subtest only reads and writes to the first prefix to ensure the maximum amount of contention.

Fixes: #154590
Release note: None

Co-authored-by: Matt White <[email protected]>
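The recall check and the reader's over-fetch-then-filter step described in the commit message can be illustrated with a minimal Go sketch. This is not the code from the commit: `recallAtK` and `filterAndTruncate` are hypothetical helpers (they are not `vecann.CalculateRecall`), keys are plain strings, and the `w-` prefix that marks writer-inserted rows is an assumption made only for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// recallAtK returns the fraction of ground-truth neighbor keys that appear in
// results. Hypothetical helper for illustration only.
func recallAtK(results, truth []string) float64 {
	if len(truth) == 0 {
		return 0
	}
	want := make(map[string]struct{}, len(truth))
	for _, k := range truth {
		want[k] = struct{}{}
	}
	hits := 0
	for _, k := range results {
		if _, ok := want[k]; ok {
			hits++
			delete(want, k) // count each true neighbor at most once
		}
	}
	return float64(hits) / float64(len(truth))
}

// filterAndTruncate drops keys for which isWriterKey returns true and keeps at
// most k (k >= 0) of the remaining keys, preserving their order.
func filterAndTruncate(keys []string, isWriterKey func(string) bool, k int) []string {
	out := make([]string, 0, k)
	for _, key := range keys {
		if len(out) == k {
			break
		}
		if !isWriterKey(key) {
			out = append(out, key)
		}
	}
	return out
}

func main() {
	truth := []string{"a", "b", "c", "d"}
	// Over-fetched search results; "w-" keys stand in for writer-inserted rows
	// (an assumed naming scheme, not the one used by the test).
	fetched := []string{"a", "w-1", "c", "x", "w-2", "d", "y"}
	isWriter := func(key string) bool { return strings.HasPrefix(key, "w-") }

	top := filterAndTruncate(fetched, isWriter, len(truth))
	fmt.Printf("kept=%v recall=%.2f\n", top, recallAtK(top, truth))
}
```

In this example the filter keeps `a`, `c`, `x`, `d`, three of which are true neighbors, so the reported recall is 0.75. Fetching more rows than needed before filtering is what keeps the result set at full size after writer rows are removed.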
2 parents 4003030 + 21d0073 commit de29e7d

10 files changed: +851 -33 lines changed
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+#!/usr/bin/env bash
+
+# Copyright 2024 The Cockroach Authors.
+#
+# Use of this software is governed by the CockroachDB Software License
+# included in the /LICENSE file.
+
+
+set -exuo pipefail
+
+dir="$(dirname $(dirname $(dirname $(dirname "${0}"))))"
+
+source "$dir/teamcity-support.sh" # For $root
+source "$dir/teamcity-bazel-support.sh" # For run_bazel
+
+BAZEL_SUPPORT_EXTRA_DOCKER_ARGS="-e LITERAL_ARTIFACTS_DIR=$root/artifacts -e BUILD_VCS_NUMBER -e CLOUD -e COCKROACH_DEV_LICENSE -e TESTS -e COUNT -e GITHUB_API_TOKEN -e GITHUB_ORG -e GITHUB_REPO -e GOOGLE_EPHEMERAL_CREDENTIALS -e GOOGLE_KMS_KEY_A -e GOOGLE_KMS_KEY_B -e GOOGLE_CREDENTIALS_ASSUME_ROLE -e GOOGLE_SERVICE_ACCOUNT -e SLACK_TOKEN -e TC_BUILDTYPE_ID -e TC_BUILD_BRANCH -e TC_BUILD_ID -e TC_SERVER_URL -e SELECT_PROBABILITY -e COCKROACH_RANDOM_SEED -e ROACHTEST_ASSERTIONS_ENABLED_SEED -e ROACHTEST_FORCE_RUN_INVALID_RELEASE_BRANCH -e GRAFANA_SERVICE_ACCOUNT_JSON -e GRAFANA_SERVICE_ACCOUNT_AUDIENCE -e USE_SPOT -e SNOWFLAKE_USER -e SNOWFLAKE_PVT_KEY -e COCKROACH_EA_PROBABILITY" \
+  run_bazel build/teamcity/cockroach/nightlies/vecindex_nightly_impl.sh
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+#!/usr/bin/env bash
+
+# Copyright 2024 The Cockroach Authors.
+#
+# Use of this software is governed by the CockroachDB Software License
+# included in the /LICENSE file.
+
+set -exuo pipefail
+
+dir="$(dirname $(dirname $(dirname $(dirname "${0}"))))"
+set -a
+source "$dir/teamcity-support.sh"
+set +a
+
+if [[ ! -f ~/.ssh/id_rsa.pub ]]; then
+  ssh-keygen -q -C "roachtest-nightly-bazel $(date)" -N "" -f ~/.ssh/id_rsa
+fi
+
+source $root/build/teamcity/util/roachtest_util.sh
+
+artifacts=/artifacts
+
+arch=amd64
+$root/build/teamcity/cockroach/nightlies/roachtest_compile_bits.sh $arch
+
+build/teamcity-roachtest-invoke.sh \
+  --suite vecindex \
+  --cloud "gce" \
+  --cluster-id "${TC_BUILD_ID}" \
+  --artifacts=/artifacts \
+  --artifacts-literal="${LITERAL_ARTIFACTS_DIR:-}" \
+  --parallelism="${PARALLELISM}" \
+  --cpu-quota="${CPUQUOTA}" \
+  --use-spot="${USE_SPOT:-auto}" \
+  --slack-token="${SLACK_TOKEN}" \

pkg/cmd/roachtest/registry/test_spec.go

Lines changed: 2 additions & 1 deletion
@@ -459,12 +459,13 @@ const (
 	Acceptance = "acceptance"
 	Perturbation = "perturbation"
 	MixedVersion = "mixedversion"
+	VecIndex = "vecindex"
 )
 
 var allSuites = []string{
 	Nightly, Weekly, ReleaseQualification, ORM, Driver, Tool, Quick, Fixtures,
 	Pebble, PebbleNightlyWrite, PebbleNightlyYCSB, PebbleNightlyYCSBRace, Roachtest, Acceptance,
-	Perturbation, MixedVersion,
+	Perturbation, MixedVersion, VecIndex,
 }
 
 // SuiteSet represents a set of suites.

pkg/cmd/roachtest/registry/testdata/filter/errors

Lines changed: 2 additions & 2 deletions
@@ -6,11 +6,11 @@ error: invalid owner "badowner"
 
 filter suite=badsuite
 ----
-error: invalid suite "badsuite"; valid suites are nightly,weekly,release_qualification,orm,driver,tool,quick,fixtures,pebble,pebble_nightly_write,pebble_nightly_ycsb,pebble_nightly_ycsb_race,roachtest,acceptance,perturbation,mixedversion
+error: invalid suite "badsuite"; valid suites are nightly,weekly,release_qualification,orm,driver,tool,quick,fixtures,pebble,pebble_nightly_write,pebble_nightly_ycsb,pebble_nightly_ycsb_race,roachtest,acceptance,perturbation,mixedversion,vecindex
 
 filter owner=badowner suite=badsuite
 ----
-error: invalid suite "badsuite"; valid suites are nightly,weekly,release_qualification,orm,driver,tool,quick,fixtures,pebble,pebble_nightly_write,pebble_nightly_ycsb,pebble_nightly_ycsb_race,roachtest,acceptance,perturbation,mixedversion
+error: invalid suite "badsuite"; valid suites are nightly,weekly,release_qualification,orm,driver,tool,quick,fixtures,pebble,pebble_nightly_write,pebble_nightly_ycsb,pebble_nightly_ycsb_race,roachtest,acceptance,perturbation,mixedversion,vecindex
 
 # Filters with one field leading to no matches.
 
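The expected-error change above follows from the error text listing every entry of `allSuites`, so adding `VecIndex` lengthens the message. A minimal sketch of that kind of check is shown below. It assumes a hypothetical `validateSuite` function and a local copy of the suite names taken from the test data. The registry's actual validation code is not part of this diff and may be structured differently.

```go
package main

import (
	"fmt"
	"strings"
)

// allSuites mirrors the suite names listed in the expected error message,
// including the newly added "vecindex". Local copy for illustration only.
var allSuites = []string{
	"nightly", "weekly", "release_qualification", "orm", "driver", "tool",
	"quick", "fixtures", "pebble", "pebble_nightly_write",
	"pebble_nightly_ycsb", "pebble_nightly_ycsb_race", "roachtest",
	"acceptance", "perturbation", "mixedversion", "vecindex",
}

// validateSuite is a hypothetical helper: it returns nil for a known suite
// and otherwise an error that enumerates all valid suites.
func validateSuite(name string) error {
	for _, s := range allSuites {
		if s == name {
			return nil
		}
	}
	return fmt.Errorf("invalid suite %q; valid suites are %s",
		name, strings.Join(allSuites, ","))
}

func main() {
	fmt.Println(validateSuite("vecindex")) // known suite, so the error is nil
	fmt.Println(validateSuite("badsuite")) // unknown suite, so the full list is reported
}
```

Because the message is built from the list itself, any new suite constant requires updating golden files such as `testdata/filter/errors`.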

pkg/cmd/roachtest/tests/BUILD.bazel

Lines changed: 7 additions & 0 deletions
@@ -209,6 +209,7 @@ go_library(
         "typeorm.go",
         "unoptimized_query_oracle.go",
         "validate_system_schema_after_version_upgrade.go",
+        "vecindex.go",
         "versionupgrade.go",
         "ycsb.go",
     ],
@@ -284,6 +285,8 @@ go_library(
         "//pkg/sql/pgwire/pgerror",
         "//pkg/sql/sem/tree",
         "//pkg/sql/ttl/ttlbase",
+        "//pkg/sql/vecindex/cspann",
+        "//pkg/sql/vecindex/vecpb",
         "//pkg/storage/enginepb",
         "//pkg/storage/fs",
         "//pkg/testutils",
@@ -310,6 +313,7 @@ go_library(
         "//pkg/util/syncutil",
         "//pkg/util/timeutil",
         "//pkg/util/uuid",
+        "//pkg/util/vector",
         "//pkg/workload",
         "//pkg/workload/debug",
         "//pkg/workload/histogram",
@@ -318,6 +322,7 @@ go_library(
         "//pkg/workload/tpcc",
         "//pkg/workload/tpcds",
         "//pkg/workload/tpch",
+        "//pkg/workload/vecann",
         "@com_github_aws_aws_sdk_go//aws",
         "@com_github_aws_aws_sdk_go_v2//aws",
         "@com_github_aws_aws_sdk_go_v2_config//:config",
@@ -340,8 +345,10 @@ go_library(
         "@com_github_google_go_cmp//cmp",
         "@com_github_google_pprof//profile",
         "@com_github_ibm_sarama//:sarama",
+        "@com_github_jackc_pgconn//:pgconn",
         "@com_github_jackc_pgtype//:pgtype",
         "@com_github_jackc_pgx_v5//:pgx",
+        "@com_github_jackc_pgx_v5//pgxpool",
         "@com_github_kr_pretty//:pretty",
         "@com_github_lib_pq//:pq",
         "@com_github_montanaflynn_stats//:stats",

pkg/cmd/roachtest/tests/registry.go

Lines changed: 1 addition & 0 deletions
@@ -169,6 +169,7 @@ func RegisterTests(r registry.Registry) {
 	registerTypeORM(r)
 	registerUnoptimizedQueryOracle(r)
 	registerValidateSystemSchemaAfterVersionUpgradeSeparateProcess(r)
+	registerVectorIndex(r)
 	registerYCSB(r)
 	registerDeclarativeSchemaChangerJobCompatibilityInMixedVersion(r)
 	registerMultiRegionMixedVersion(r)
