
Add codecs for dealing with pgsparse vector. #478

Merged: 1 commit merged into master on Feb 10, 2024

Conversation

vpetrovykh (Member) commented Feb 9, 2024

Add codecs for converting regular arrays to and from sparse vectors.

int16_t n_elem
int16_t dim
Py_ssize_t i
int32_t index
Member:

Probably make this a uint32_t and do an unpack_uint32 when reading it out. Then negative numbers will raise IndexError instead of wrapping around from the back.
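The distinction being suggested can be sketched in pure Python with the stdlib struct module (the helper names here are hypothetical, just illustrating signed vs. unsigned reads of the same four wire bytes):

```python
import struct

def read_index_signed(payload: bytes) -> int:
    # big-endian signed 32-bit, as an int32_t index would be interpreted
    return struct.unpack('>i', payload)[0]

def read_index_unsigned(payload: bytes) -> int:
    # the suggested alternative: big-endian unsigned 32-bit (unpack_uint32)
    return struct.unpack('>I', payload)[0]

raw = struct.pack('>I', 0xFFFFFFFF)   # the same four bytes, two readings
print(read_index_signed(raw))    # -1: using this to index a list wraps to the last element
print(read_index_unsigned(raw))  # 4294967295: out of range, so list[index] raises IndexError
```

With the unsigned read, a corrupt or hostile index can only fail loudly instead of silently writing into the tail of the array.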

vpetrovykh (Member Author):
Maybe I don't quite understand some subtle interaction here, but pg_sparse code has this when receiving binary data:

	int16		n_elem;
	int16		dim;
	int16		unused;

	n_elem = pq_getmsgint(buf, sizeof(int16));
	dim = pq_getmsgint(buf, sizeof(int16));

Which, I think, will interpret the value as a signed int16, even though it is eventually assigned to an int32 n_elem in the Vector struct. Honestly, I find this whole business of jumping between int16 and int32 a lot more odd than the signedness question.

https://github.com/paradedb/paradedb/blob/585c5dae321fa99576f1399a9589887573e0c700/pg_sparse/src/svector.c#L423-L428
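The effect being described can be sketched in pure Python (a hypothetical illustration, assuming the two bytes come off the wire big-endian as in the PostgreSQL binary protocol):

```python
import struct

def widen_like_pg(raw2: bytes) -> int:
    # The two wire bytes as an unsigned 16-bit quantity, 0..65535 ...
    unsigned = struct.unpack('>H', raw2)[0]
    # ... and how they land after passing through a signed int16 variable:
    signed16 = unsigned - 0x10000 if unsigned >= 0x8000 else unsigned
    # widening int16 -> int32 sign-extends, so the (possibly negative)
    # value is preserved in the wider field
    return signed16

print(widen_like_pg(struct.pack('>H', 40000)))  # -25536, not 40000
print(widen_like_pg(struct.pack('>H', 123)))    # 123
```

So any on-wire count above 32767 would come out negative after the int16 round trip, regardless of the int32 destination field.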

frb_read(buf, 2)

# Create a float array with size dim
val = ONE_EL_ARRAY * dim
Member:

We should declare a float[:] array_view and assign val to it, like we do in pgvector_decode. This will use the buffer interface and allow us to avoid boxing and unboxing the floats.
(This is a Cython typed memoryview: https://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html)

Member:
Actually maybe we want to do float[::1] array_view, which should be faster since it will rely on the backing array being contiguous (which it always will be).
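In pure Python terms, the win from filling the preallocated array through a view can be sketched with the stdlib array module (a hypothetical illustration of what a Cython float[::1] view over a contiguous buffer buys, not the codec itself):

```python
import struct
from array import array

ONE_EL_ARRAY = array('f', [0.0])  # name taken from the diff context above

def decode_floats(payload: bytes, dim: int) -> array:
    # Allocate the result up front, as ONE_EL_ARRAY * dim does in the diff,
    # then fill it through a memoryview: writes go straight into the backing
    # C float buffer rather than through per-item Python float objects.
    val = ONE_EL_ARRAY * dim
    view = memoryview(val)
    for i in range(dim):
        view[i] = struct.unpack_from('>f', payload, i * 4)[0]
    return val

vec = decode_floats(struct.pack('>3f', 1.0, 2.0, 3.0), 3)
print(list(vec))  # [1.0, 2.0, 3.0]
```

The ::1 spelling additionally promises Cython the buffer is C-contiguous, so indexing compiles to plain pointer arithmetic with no stride lookups.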

@@ -798,6 +798,61 @@ cdef pgvector_decode(pgproto.CodecContext settings, FRBuffer *buf):
return val


cdef pgsparse_encode(pgproto.CodecContext settings, WriteBuffer buf,
Member:
It might be worth having a typed memoryview fast path for array/ndarray, like we do for pgvector.

I don't know if there is a way to do that without the annoying code duplication, though.
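For reference, the overall codec pair under review can be sketched end to end in pure Python, assuming the wire layout implied by the snippets above (int16 n_elem, int16 dim, int16 unused, then an int32 index and a float32 value per non-zero element, big-endian). This is a hypothetical sketch, not the Cython implementation:

```python
import struct
from array import array

def pgsparse_encode_sketch(vec) -> bytes:
    # header: count of non-zero elements, full dimensionality, unused pad
    nonzero = [(i, x) for i, x in enumerate(vec) if x != 0.0]
    out = struct.pack('>hhh', len(nonzero), len(vec), 0)
    for index, value in nonzero:
        out += struct.pack('>if', index, value)  # (int32 index, float32 value)
    return out

def pgsparse_decode_sketch(buf: bytes) -> array:
    n_elem, dim, _unused = struct.unpack_from('>hhh', buf, 0)
    val = array('f', [0.0]) * dim                # dense result, zeros by default
    for k in range(n_elem):
        index, value = struct.unpack_from('>if', buf, 6 + 8 * k)
        val[index] = value
    return val

dense = [0.0, 1.5, 0.0, -2.0]
print(list(pgsparse_decode_sketch(pgsparse_encode_sketch(dense))))  # [0.0, 1.5, 0.0, -2.0]
```

A fast path would simply type vec as a contiguous memoryview in the encoder and skip the Python-level iteration, at the cost of duplicating the loop body per input type.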

Add codecs for converting to/from regular arrays to sparse vectors.
@vpetrovykh vpetrovykh merged commit f75993d into master Feb 10, 2024
42 checks passed
@vpetrovykh vpetrovykh deleted the pgsparse branch February 10, 2024 09:11
vpetrovykh added a commit that referenced this pull request Feb 14, 2024
This reverts commit f75993d.

We're postponing adding pgsparse.
aljazerzen pushed a commit that referenced this pull request Feb 15, 2024
Add codecs for converting to/from regular arrays to sparse vectors.
vpetrovykh added a commit that referenced this pull request Feb 16, 2024
This reverts commit f75993d.

We're postponing adding pgsparse.
aljazerzen pushed a commit that referenced this pull request Feb 23, 2024
This reverts commit f75993d.

We're postponing adding pgsparse.