Treat boolean columns as numeric (#380)
lbittarello authored Aug 13, 2024
1 parent 24bd564 commit 2066f43
Showing 2 changed files with 22 additions and 1 deletion.
5 changes: 4 additions & 1 deletion CHANGELOG.rst
@@ -7,13 +7,16 @@
 Changelog
 =========

-4.0.1 - 2024-06-25
+4.1.0 - unreleased
 ------------------

 **New feature:**

 - Added a new function, :func:`tabmat.from_polars`, to convert a :class:`polars.DataFrame` into a :class:`tabmat.SplitMatrix`.

+4.0.1 - 2024-06-25
+------------------
+
 **Other changes:**

 - Removed reference to the ``.A`` attribute and replaced it with ``.toarray()``.
18 changes: 18 additions & 0 deletions src/tabmat/constructor.py
@@ -28,6 +28,15 @@
     pd = None


+def _is_boolean(series, engine: str):
+    if engine == "pandas":
+        return pd.api.types.is_bool_dtype(series)
+    elif engine == "polars":
+        return series.dtype.is_(pl.Boolean)
+    else:
+        raise ValueError(f"Unknown engine: {engine}")
+
+
 def _is_numeric(series, engine: str):
     if engine == "pandas":
         return pd.api.types.is_numeric_dtype(series)
@@ -154,6 +163,15 @@ def _from_dataframe(
                     mxcolidx += cat.shape[1]
                 elif cat_position == "end":
                     indices.append(np.arange(cat.shape[1]))
+        elif _is_boolean(coldata, engine):
+            if (coldata != False).mean() <= sparse_threshold:  # noqa E712
+                sparse_dfidx.append(dfcolidx)
+                sparse_tmidx.append(mxcolidx)
+                mxcolidx += 1
+            else:
+                dense_dfidx.append(dfcolidx)
+                dense_tmidx.append(mxcolidx)
+                mxcolidx += 1
         elif _is_numeric(coldata, engine):
             if (coldata != 0).mean() <= sparse_threshold:
                 sparse_dfidx.append(dfcolidx)
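
The new boolean branch applies the same density rule as the numeric branch: a column goes to the sparse block when the share of entries that are not False is at most sparse_threshold, and to the dense block otherwise. The following is a minimal plain-Python sketch of that rule only, not the library's code. The function name classify_boolean_column and the default threshold of 0.1 are assumptions made for illustration.

```python
def classify_boolean_column(values, sparse_threshold=0.1):
    """Return "sparse" or "dense" for a sequence of booleans.

    Hypothetical helper: it mimics the comparison
    (coldata != False).mean() <= sparse_threshold from the diff,
    using a plain list instead of a pandas or polars series.
    """
    if len(values) == 0:
        raise ValueError("column must not be empty")
    # Share of entries that are True, i.e. not False.
    true_share = sum(1 for v in values if bool(v)) / len(values)
    return "sparse" if true_share <= sparse_threshold else "dense"


mostly_false = [False] * 9 + [True]   # one True value out of ten
balanced = [True, False]              # half of the values are True

print(classify_boolean_column(mostly_false))
print(classify_boolean_column(balanced))
```

Because the comparison is "less than or equal", a column whose share of True values sits exactly on the threshold is still routed to the sparse block.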