
Add codecs for dealing with pgsparse vector. #478

Merged: 1 commit merged into master on Feb 10, 2024

Conversation

vpetrovykh (Member) commented Feb 9, 2024

Add codecs for converting regular arrays to and from sparse vectors.

int16_t n_elem
int16_t dim
Py_ssize_t i
int32_t index
Member:

Probably make this a uint32_t and do an unpack_uint32 when reading it out. Then negative numbers will raise IndexError instead of wrapping around from the back.
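The distinction being suggested can be sketched in pure Python with the stdlib struct module (the helper names here are hypothetical, just illustrating signed vs. unsigned reads of the same four wire bytes):

```python
import struct

def read_index_signed(payload: bytes) -> int:
    # big-endian signed 32-bit, as an int32_t index would be interpreted
    return struct.unpack('>i', payload)[0]

def read_index_unsigned(payload: bytes) -> int:
    # the suggested alternative: big-endian unsigned 32-bit (unpack_uint32)
    return struct.unpack('>I', payload)[0]

raw = struct.pack('>I', 0xFFFFFFFF)   # the same four bytes, two readings
print(read_index_signed(raw))    # -1: using this to index a list wraps to the last element
print(read_index_unsigned(raw))  # 4294967295: out of range, so list[index] raises IndexError
```

With the unsigned read, a corrupt or hostile index can only fail loudly instead of silently writing into the tail of the array.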

vpetrovykh (Member Author):
Maybe I don't quite understand some subtle interaction here, but pg_sparse code has this when receiving binary data:

	int16		n_elem;
	int16		dim;
	int16		unused;

	n_elem = pq_getmsgint(buf, sizeof(int16));
	dim = pq_getmsgint(buf, sizeof(int16));

Which, I think, will interpret the value as a signed int16, even though it is eventually assigned to an int32 n_elem in the Vector struct. Honestly, I find this whole business of jumping between int16 and int32 a lot more odd than the signedness question.

https://github.com/paradedb/paradedb/blob/585c5dae321fa99576f1399a9589887573e0c700/pg_sparse/src/svector.c#L423-L428
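The effect being described can be sketched in pure Python (a hypothetical illustration, assuming the two bytes come off the wire big-endian as in the PostgreSQL binary protocol):

```python
import struct

def widen_like_pg(raw2: bytes) -> int:
    # The two wire bytes as an unsigned 16-bit quantity, 0..65535 ...
    unsigned = struct.unpack('>H', raw2)[0]
    # ... and how they land after passing through a signed int16 variable:
    signed16 = unsigned - 0x10000 if unsigned >= 0x8000 else unsigned
    # widening int16 -> int32 sign-extends, so the (possibly negative)
    # value is preserved in the wider field
    return signed16

print(widen_like_pg(struct.pack('>H', 40000)))  # -25536, not 40000
print(widen_like_pg(struct.pack('>H', 123)))    # 123
```

So any on-wire count above 32767 would come out negative after the int16 round trip, regardless of the int32 destination field.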

frb_read(buf, 2)

# Create a float array with size dim
val = ONE_EL_ARRAY * dim
Member:

We should declare a float[:] array_view and assign val to it, like we do in pgvector_decode. This will use the buffer interface and allow us to avoid boxing and unboxing the floats.
(This is a Cython typed memoryview: https://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html)

Member:
Actually maybe we want to do float[::1] array_view, which should be faster since it will rely on the backing array being contiguous (which it always will be).
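In pure Python terms, the win from filling the preallocated array through a view can be sketched with the stdlib array module (a hypothetical illustration of what a Cython float[::1] view over a contiguous buffer buys, not the codec itself):

```python
import struct
from array import array

ONE_EL_ARRAY = array('f', [0.0])  # name taken from the diff context above

def decode_floats(payload: bytes, dim: int) -> array:
    # Allocate the result up front, as ONE_EL_ARRAY * dim does in the diff,
    # then fill it through a memoryview: writes go straight into the backing
    # C float buffer rather than through per-item Python float objects.
    val = ONE_EL_ARRAY * dim
    view = memoryview(val)
    for i in range(dim):
        view[i] = struct.unpack_from('>f', payload, i * 4)[0]
    return val

vec = decode_floats(struct.pack('>3f', 1.0, 2.0, 3.0), 3)
print(list(vec))  # [1.0, 2.0, 3.0]
```

The ::1 spelling additionally promises Cython the buffer is C-contiguous, so indexing compiles to plain pointer arithmetic with no stride lookups.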

@@ -798,6 +798,61 @@ cdef pgvector_decode(pgproto.CodecContext settings, FRBuffer *buf):
return val


cdef pgsparse_encode(pgproto.CodecContext settings, WriteBuffer buf,
Member:
It might be worth having a typed memoryview fast path for array/ndarray, like we do for pgvector.

I don't know if there is a way to do that without the annoying code duplication, though.
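For reference, the overall codec pair under review can be sketched end to end in pure Python, assuming the wire layout implied by the snippets above (int16 n_elem, int16 dim, int16 unused, then an int32 index and a float32 value per non-zero element, big-endian). This is a hypothetical sketch, not the Cython implementation:

```python
import struct
from array import array

def pgsparse_encode_sketch(vec) -> bytes:
    # header: count of non-zero elements, full dimensionality, unused pad
    nonzero = [(i, x) for i, x in enumerate(vec) if x != 0.0]
    out = struct.pack('>hhh', len(nonzero), len(vec), 0)
    for index, value in nonzero:
        out += struct.pack('>if', index, value)  # (int32 index, float32 value)
    return out

def pgsparse_decode_sketch(buf: bytes) -> array:
    n_elem, dim, _unused = struct.unpack_from('>hhh', buf, 0)
    val = array('f', [0.0]) * dim                # dense result, zeros by default
    for k in range(n_elem):
        index, value = struct.unpack_from('>if', buf, 6 + 8 * k)
        val[index] = value
    return val

dense = [0.0, 1.5, 0.0, -2.0]
print(list(pgsparse_decode_sketch(pgsparse_encode_sketch(dense))))  # [0.0, 1.5, 0.0, -2.0]
```

A fast path would simply type vec as a contiguous memoryview in the encoder and skip the Python-level iteration, at the cost of duplicating the loop body per input type.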

Add codecs for converting to/from regular arrays to sparse vectors.
@vpetrovykh vpetrovykh merged commit f75993d into master Feb 10, 2024
42 checks passed
@vpetrovykh vpetrovykh deleted the pgsparse branch February 10, 2024 09:11
vpetrovykh added a commit that referenced this pull request Feb 14, 2024
This reverts commit f75993d.

We're postponing adding pgsparse.
aljazerzen pushed a commit that referenced this pull request Feb 15, 2024
Add codecs for converting to/from regular arrays to sparse vectors.
vpetrovykh added a commit that referenced this pull request Feb 16, 2024
This reverts commit f75993d.

We're postponing adding pgsparse.
aljazerzen pushed a commit that referenced this pull request Feb 23, 2024
This reverts commit f75993d.

We're postponing adding pgsparse.