Supports list-like Python objects for Series comparison. #2022
itholic wants to merge 13 commits into databricks:master
|
Found a bug:
>>> pser = pd.Series([1,2,3], index=[10,20,30])
>>> pser == [3, 2, 1]
10 False
20 True
30 False
dtype: bool
whereas:
>>> kser = ks.Series([1,2,3], index=[10,20,30])
>>> kser == [3, 2, 1]
0 False
1 False
10 False
2 False
30 False
20 False
dtype: bool |
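For reference, a minimal sketch of the pandas side of this comparison, using the same values as above. The variable name `result` is only illustrative. A plain list carries no index, so pandas compares element by element in positional order and keeps the index of the Series; this is the behavior the Koalas result should match.

```python
import pandas as pd

pser = pd.Series([1, 2, 3], index=[10, 20, 30])

# The list has no labels, so the comparison is positional and the
# result reuses the index of the Series (10, 20, 30).
result = pser == [3, 2, 1]

print(result.tolist())        # [False, True, False]
print(result.index.tolist())  # [10, 20, 30]
```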
Codecov Report
@@            Coverage Diff             @@
##           master    #2022      +/-   ##
==========================================
- Coverage   94.70%   93.18%   -1.53%
==========================================
  Files          54       54
  Lines       11480    11393      -87
==========================================
- Hits        10872    10616     -256
- Misses        608      777     +169
|
|
Btw, we might also want to support binary operations with list-like Python objects? cc @HyukjinKwon
>>> pser + [3, 2, 1]
10 4
20 4
30 4
dtype: int64
>>> pser - [3, 2, 1]
10 -2
20 0
30 2
dtype: int64
>>> [3, 2, 1] + pser
10 4
20 4
30 4
dtype: int64 |
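A small runnable sketch of the pandas behavior quoted above, assuming the same `pser` as in the earlier bug report. The names `added`, `subtracted`, and `reflected` are illustrative only. It also covers the reflected case, where the list is the left operand and the Series handles the operation through its reflected operator.

```python
import pandas as pd

pser = pd.Series([1, 2, 3], index=[10, 20, 30])

# Arithmetic with a plain list is positional, like the comparison case.
added = pser + [3, 2, 1]
subtracted = pser - [3, 2, 1]

# With the list on the left, Python dispatches to Series.__radd__.
reflected = [3, 2, 1] + pser

print(added.tolist())       # [4, 4, 4]
print(subtracted.tolist())  # [-2, 0, 2]
print(reflected.tolist())   # [4, 4, 4]
```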
|
FYI: Seems like pandas has some inconsistent behavior as below.
>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> a.eq(b)
a True
b False
c False
d False
e False
dtype: bool
>>> a == b
Traceback (most recent call last):
...
ValueError: Can only compare identically-labeled Series objects
However, in their API doc for
I posted a question to the pandas repo, and will share if they respond. |
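The difference can be reproduced with the sketch below, which assumes a pandas version that behaves like the session quoted above. `Series.eq` aligns the two operands on the union of their labels before comparing, while the `==` operator refuses to compare Series whose labels differ. The names `aligned` and `eq_operator_raised` are illustrative only.

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 1, 1, np.nan], index=["a", "b", "c", "d"])
b = pd.Series([1, np.nan, 1, np.nan], index=["a", "b", "d", "e"])

# eq() aligns on the union of labels; a label missing on either side
# becomes NaN, and NaN never compares equal, so only "a" is True.
aligned = a.eq(b)
print(aligned.index.tolist())  # ['a', 'b', 'c', 'd', 'e']
print(aligned.tolist())        # [True, False, False, False, False]

# The == operator does not align and rejects differently labeled Series.
try:
    a == b
    eq_operator_raised = False
except ValueError:
    eq_operator_raised = True
print(eq_operator_raised)      # True
```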
Let me do this in a separate PR since there may be inconsistent cases like |
|
The |
def __eq__(self, other):
    if isinstance(other, (list, tuple)):
        other = ks.Index(other, name=self.name)
    # pandas always returns False for all items with dict and set.
I wonder why pandas behaves like this ..
|
|
equals = eq

def __eq__(self, other):
Does Index support this case too? It might be best to move this to base.py.
…parison

### What changes were proposed in this pull request?

This PR proposes to implement `Series` comparison with list-like Python objects.

Currently `Series` doesn't support comparison with list-like Python objects such as `list`, `tuple`, `dict`, and `set`.

**Before**

```python
>>> psser
0    1
1    2
2    3
dtype: int64

>>> psser == [3, 2, 1]
Traceback (most recent call last):
...
TypeError: The operation can not be applied to list.
...
```

**After**

```python
>>> psser
0    1
1    2
2    3
dtype: int64

>>> psser == [3, 2, 1]
0    False
1     True
2    False
dtype: bool
```

This was originally proposed in databricks/koalas#2022, and all review comments in the original PR have been resolved.

### Why are the changes needed?

To follow pandas' behavior.

### Does this PR introduce _any_ user-facing change?

Yes, `Series` comparison with list-like Python objects is now possible.

### How was this patch tested?

Unit tests

Closes #34114 from itholic/SPARK-36438.

Authored-by: itholic <haejoon.lee@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Currently Series doesn't support comparison with list-like Python objects such as list, tuple, dict, and set. This PR proposes supporting them as well for Series comparison.
This should resolve #2018