Add codecs for dealing with pgsparse vector.
#478
    int16_t n_elem
    int16_t dim
    Py_ssize_t i
    int32_t index
Probably make this a uint32_t, and do an unpack_uint32 when reading it out. Then negative numbers will be IndexErrors instead of wrapping from the back.
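To make the concern concrete, here is a minimal plain-Python sketch of the difference between decoding the index as signed versus unsigned. It is an analogy only: `place_value` is a hypothetical helper, not code from this PR, and in Cython the actual behavior also depends on the `wraparound` and `boundscheck` directives in effect.

```python
import struct
from array import array


def place_value(dim, raw_index, value, signed):
    """Decode a 4-byte big-endian index and store value at that position.

    Hypothetical helper for illustration; not part of the codec under review.
    """
    fmt = ">i" if signed else ">I"
    (index,) = struct.unpack(fmt, raw_index)
    dense = array("f", [0.0]) * dim  # dense float array of length dim
    dense[index] = value  # raises IndexError if index is out of range
    return dense


raw = struct.pack(">i", -1)  # bytes a malformed message might carry

# Signed decode: -1 silently addresses the last element.
wrapped = place_value(4, raw, 2.5, signed=True)

# Unsigned decode: the same bytes become a huge positive index, out of range.
try:
    place_value(4, raw, 2.5, signed=False)
    unsigned_error = None
except IndexError as exc:
    unsigned_error = exc
```

With the signed decode the bad index lands on the last slot without complaint; with the unsigned decode the same bytes fail loudly.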
Maybe I don't quite understand some subtle interaction here, but pg_sparse code has this when receiving binary data:
    int16 n_elem;
    int16 dim;
    int16 unused;
    n_elem = pq_getmsgint(buf, sizeof(int16));
    dim = pq_getmsgint(buf, sizeof(int16));
Which, I think, will interpret the value as a signed int16, even though it will eventually assign that to an int32 n_elem in the Vector struct. Honestly I find this whole business of jumping between int16 and int32 a lot more odd than whether it's unsigned.
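As a small illustration of what is at stake, the sketch below decodes the same two wire bytes both ways using Python's `struct` module, assuming network (big-endian) byte order. It says nothing about what pg_sparse does internally; it only shows where the two interpretations diverge.

```python
import struct

wire = bytes([0xFF, 0xFE])  # two bytes as they would arrive, big-endian

(as_signed,) = struct.unpack(">h", wire)    # int16 interpretation: -2
(as_unsigned,) = struct.unpack(">H", wire)  # uint16 interpretation: 65534

# Values below 0x8000 decode identically either way.
small = struct.pack(">h", 1000)
same_for_small = struct.unpack(">h", small) == struct.unpack(">H", small)
```

So the signedness only matters for values of 32768 and above, which is the range a signed int16 on the server side could not represent as a positive count anyway.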
    frb_read(buf, 2)

    # Create a float array with size dim
    val = ONE_EL_ARRAY * dim
We should declare a float[:] array_view and assign val to it, like we do in pgvector_decode. This will use the buffer interface and allow us to avoid boxing and unboxing the floats.
(This is a typed memoryview: https://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html)
Actually maybe we want to do float[::1] array_view, which should be faster since it will rely on the backing array being contiguous (which it always will be).
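The contiguity claim can be checked from plain Python, since `array.array` exposes the buffer interface that a Cython typed memoryview would use. This is a rough sketch, assuming ONE_EL_ARRAY is a one-element `array('f')`; Python's built-in `memoryview` stands in for the Cython `float[::1]` view and does not show the boxing savings, only the buffer properties.

```python
from array import array

dim = 4
# Assumption: ONE_EL_ARRAY is a one-element float array, as the diff suggests.
one_el_array = array("f", [0.0])
val = one_el_array * dim

# A memoryview over the array shares its storage through the buffer interface.
view = memoryview(val)
view[1] = 1.5  # writes straight into val's buffer

# A freshly built array is C-contiguous, which is what float[::1] requires.
is_contiguous = view.c_contiguous

# A strided slice of the same buffer is not contiguous; in Cython it would
# be accepted by float[:] but rejected by float[::1].
strided = view[::2]
```

Because the decoder always allocates the array itself, it never sees the strided case, which is why the stricter `float[::1]` declaration looks safe here.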
@@ -798,6 +798,61 @@ cdef pgvector_decode(pgproto.CodecContext settings, FRBuffer *buf):
    return val


cdef pgsparse_encode(pgproto.CodecContext settings, WriteBuffer buf,
It might be worth having a typed memoryview fast path for array/ndarray, like we do for pgvector.
I don't know if there is a way to do that without the annoying code duplication, though.
Add codecs for converting to/from regular arrays to sparse vectors.