The `in` operator in `ResultSetCollection.append` causes problems with numpy arrays
#1049
base: master
for idx in reversed(
    [
        i
        for i, item in enumerate(self._result_sets)
        if all(starmap(_eq, zip(_results(result), _results(item))))
    ]
):
    self._result_sets.pop(idx)
can you add some comments to explain what this is doing?
I'll explain here and let you comment on it first. This replaces `if result in self._result_sets: self._result_sets.remove(result)`, which simply removes an existing item equal to `result` from the result set list. I'm not actually clear on why this is done, but the `in` operator requires a bool to be returned by the `==` operation. That doesn't happen with `np.array`, `pd.Series`, etc.
The new code iterates over `result` and `item` in parallel and compares their elements pairwise using the `_eq` function, which returns a bool for both atomic values and array comparisons. If all of the `_eq` operations return true, the index of that item is added to the resulting list. The final list of indexes is reversed so that they can be popped from `self._result_sets` without messing up the indexes of the remaining items.
def test_result_set_collection_append_numpy():
    try:
        import numpy as np

        a1 = (np.array([1, 2]),)
        a2 = (np.array([3, 4]),)

        collection = ResultSetCollection()
        collection.append(a1)
        collection.append(a2)

        assert len(collection._result_sets) == 2
        assert collection._result_sets[0] is a1
        assert collection._result_sets[1] is a2

    except ImportError:
        pass
let's add numpy as a dev dependency and get rid of the ImportError:
Line 33 in 7e02910
DEV = [
 def test_result_set_collection_append():
     collection = ResultSetCollection()
-    collection.append(1)
-    collection.append(2)
+    collection.append((1,))
+    collection.append((2,))
let's keep the original test and add a new one
This was changed because the code in `append` needs the values to be iterable objects. That is more consistent with the result objects that actually get appended from the connection. I'm not sure there is a case in real-world code where the value will be atomic, as it was in the original test.
 def test_result_set_collection_iterate():
     collection = ResultSetCollection()
-    collection.append(1)
-    collection.append(2)
+    collection.append((1,))
+    collection.append((2,))

-    assert list(collection) == [1, 2]
+    assert list(collection) == [(1,), (2,)]


 def test_result_set_collection_is_last():
     collection = ResultSetCollection()
-    first, second = object(), object()
+    first, second = (object(),), (object(),)
     collection.append(first)

     assert len(collection) == 1
same, keep old tests, add new ones
Describe your changes
When using the `in` operator to test for equal results that contain numpy arrays, you will get an error. This is because `np.array == np.array` returns an `np.array`, not a bool. The SingleStoreDB database uses numpy arrays for vector values.
Issue number
None
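For illustration, a small reproduction of the behaviour outlined under "Describe your changes", assuming numpy is installed. The variable names are made up for this sketch, and `np.array_equal` is shown only as one way to obtain a single bool, not necessarily what this PR uses.

```python
import numpy as np

# Two result rows that are equal in value but are distinct objects.
a = (np.array([1, 2]),)
b = (np.array([1, 2]),)

# Comparing arrays yields an elementwise boolean array, not a single bool.
elementwise = a[0] == b[0]

# The membership test compares the tuples, which needs a bool per element,
# so coercing the multi-element comparison array raises ValueError.
error = None
found = None
try:
    found = b in [a]
except ValueError as exc:
    error = exc

# np.array_equal reduces the comparison to a single truth value.
same = np.array_equal(a[0], b[0])
```

Note that the identity shortcut in Python's containment check means `a in [a]` would not raise; the failure only appears when an equal but distinct object is tested.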
Checklist before requesting a review
pkgmt format
📚 Documentation preview 📚: https://jupysql--1049.org.readthedocs.build/en/1049/