Tabmat v4 alpha #286

Merged on Apr 23, 2024 (57 commits)
Changes from 43 commits
31ca046
Minimal implementation (tests green)
stanmart Jul 18, 2023
24525c8
Remove sum method and rely on np.sum
stanmart Jul 19, 2023
1e31779
Force DenseMatrix to always be 2-dimensional
stanmart Jul 19, 2023
755e634
Add __repr__ and __str__ methods
stanmart Jul 19, 2023
0560529
Fix as_mx
stanmart Jul 20, 2023
80143ef
Fix ufunc return value
stanmart Jul 20, 2023
34d6f37
Wrap SparseMatrix, too
stanmart Jul 20, 2023
97349f4
Demo of how the ufunc interface can be implemented
stanmart Jul 20, 2023
e86c005
Do not subclass csc_matrix
stanmart Jul 20, 2023
9f5582f
Merge branch 'main' into densematrix-refactor
stanmart Jul 20, 2023
5a88fbc
Demonstrate binary ufuncs for sparse
stanmart Jul 21, 2023
44e1970
Add tocsc method
stanmart Jul 21, 2023
ffe918e
Fix type checks
stanmart Jul 21, 2023
3f94e4d
Minor improvements
stanmart Jul 21, 2023
9f943d8
ufunc support for categoricals
stanmart Jul 21, 2023
34cc13c
Remove __array_ufunc__ interface
stanmart Jul 25, 2023
a396a09
Remove numpy operator mixin
stanmart Jul 25, 2023
e046dcd
Add hstack function
stanmart Jul 26, 2023
e7f216c
Add method for unpacking underlying array
stanmart Jul 26, 2023
c66e026
Add __matmul__ methods to SparseMatrix
stanmart Jul 26, 2023
38813e7
Stricter and more consistent indexing
stanmart Jul 27, 2023
78d0278
Be consistent when instantiating from 1d arrays
stanmart Aug 9, 2023
d3d3d82
Merge branch 'main' into tabmat-v4
stanmart Aug 9, 2023
e042ce3
Add column name metadata to `tabmat` matrices (#278)
stanmart Aug 15, 2023
a384ee6
Matrices from formulas (#267)
MatthiasSchmidtblaicherQC Aug 15, 2023
3bec539
Apply Matthias' suggestions
stanmart Aug 15, 2023
a830107
Allow missing values in `CategoricalMatrix` (#281)
stanmart Aug 17, 2023
41ad4af
Add changelog entry
stanmart Aug 17, 2023
ab53673
Correctly create missing category from model_spec (#297)
stanmart Aug 28, 2023
617b564
Merge remote-tracking branch 'origin/main' into tabmat-v4
MarcAntoineSchmidtQC Oct 16, 2023
2bc0463
pyupgrade 3.9
MarcAntoineSchmidtQC Oct 16, 2023
f644e4f
Merge branch 'main' into tabmat-v4
MatthiasSchmidtblaicherQC Dec 8, 2023
96e79df
make ruff and mypy happy
MatthiasSchmidtblaicherQC Dec 8, 2023
4c5ad31
bump minimum formulaic version (stateful transforms)
MatthiasSchmidtblaicherQC Dec 8, 2023
84c6a52
Merge remote-tracking branch 'origin/main' into tabmat-v4
MarcAntoineSchmidtQC Dec 12, 2023
3ee7dbf
add test case with custom cat format
MatthiasSchmidtblaicherQC Jan 11, 2024
2171bb3
Merge branch 'main' into tabmat-v4
MatthiasSchmidtblaicherQC Jan 11, 2024
e4baafe
pin formulaic minimum version to 0.6 (#340)
MatthiasSchmidtblaicherQC Jan 15, 2024
249b5e5
cosmetics
MatthiasSchmidtblaicherQC Jan 23, 2024
bd20e0d
Raise for unseen categories when materializing from an existing `Mode…
stanmart Jan 25, 2024
54e937c
consistent tense
MatthiasSchmidtblaicherQC Jan 30, 2024
89ffd02
typo
MatthiasSchmidtblaicherQC Jan 30, 2024
f5df2bd
slightly improve wording
MatthiasSchmidtblaicherQC Jan 30, 2024
4d621d6
Describe breaking change
MatthiasSchmidtblaicherQC Feb 6, 2024
2123a47
improve wording
MatthiasSchmidtblaicherQC Feb 7, 2024
aec685f
review comments
MatthiasSchmidtblaicherQC Feb 7, 2024
0772aef
Merge branch 'main' into tabmat-v4
MatthiasSchmidtblaicherQC Mar 25, 2024
953d215
Merge branch 'main' into tabmat-v4
MatthiasSchmidtblaicherQC Apr 2, 2024
c67ff7c
add change from #356
MatthiasSchmidtblaicherQC Apr 11, 2024
1f1014b
fix
MatthiasSchmidtblaicherQC Apr 11, 2024
34360ad
Merge branch 'main' into tabmat-v4
MatthiasSchmidtblaicherQC Apr 11, 2024
d64fd1e
set default context to None
MatthiasSchmidtblaicherQC Apr 15, 2024
7ac7c60
add scope to other test, too
MatthiasSchmidtblaicherQC Apr 15, 2024
99f80fe
tiny docstring cosmetics
MatthiasSchmidtblaicherQC Apr 22, 2024
926d9ef
remove duplicate . [skip-ci]
MatthiasSchmidtblaicherQC Apr 22, 2024
2838629
more docstring formatting
MatthiasSchmidtblaicherQC Apr 22, 2024
7483518
update changelog for release and add Sphinx cross references where mi…
MatthiasSchmidtblaicherQC Apr 23, 2024
7 changes: 7 additions & 0 deletions CHANGELOG.rst
@@ -10,6 +10,13 @@ Changelog
 Unreleased
 ----------

+**New features:**
+
+- Added column name and term name metadata to ``MatrixBase`` objects. These are automatically populated when initializing a ``MatrixBase`` from a ``pandas.DataFrame``. In addition, they can be accessed and modified via the ``column_names`` and ``term_names`` properties.
+- Added a formula interface for creating tabmat matrices from pandas data frames. See :func:`tabmat.from_formula` for details.
+- Added support for missing values in ``CategoricalMatrix`` by either creating a separate category for them or treating them as all-zero rows.
+- Added support for handling missing categorical values in pandas data frames.
+
 **Bug fix:**

- Added cython compiler directive legacy_implicit_noexcept = True to fix performance regression with cython 3.
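
The two missing-value strategies named in the ``CategoricalMatrix`` changelog entry above (a separate category versus all-zero rows) can be illustrated with a minimal pure-Python sketch. This is not tabmat's implementation: the function name `one_hot_with_missing`, the `method` values `"zero"` and `"convert"`, and the placeholder label `"(MISSING)"` are assumptions chosen for illustration only.

```python
from typing import Optional, Sequence


def one_hot_with_missing(
    values: Sequence[Optional[str]],
    method: str = "zero",
    missing_name: str = "(MISSING)",
) -> tuple[list[str], list[list[int]]]:
    """One-hot encode `values`, treating None as a missing entry.

    method="zero":    missing entries become all-zero rows.
    method="convert": missing entries get their own category column.
    (Both names are hypothetical, used only in this sketch.)
    """
    if method not in ("zero", "convert"):
        raise ValueError(f"unknown method: {method!r}")

    # Observed categories in sorted order, excluding missing entries.
    categories = sorted({v for v in values if v is not None})
    if method == "convert" and any(v is None for v in values):
        # The extra category is appended after the observed ones.
        categories.append(missing_name)

    index = {cat: j for j, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        if v is None:
            if method == "convert":
                row[index[missing_name]] = 1
            # method == "zero": the row stays all zeros.
        else:
            row[index[v]] = 1
        rows.append(row)
    return categories, rows


data = ["a", None, "b", "a"]
cats_zero, rows_zero = one_hot_with_missing(data, method="zero")
cats_conv, rows_conv = one_hot_with_missing(data, method="convert")
```

With the all-zero strategy the encoded matrix keeps one column per observed category, while the separate-category strategy adds one more column that is set only for missing rows.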
1 change: 1 addition & 0 deletions conda.recipe/meta.yaml
@@ -38,6 +38,7 @@ requirements:
     - {{ pin_compatible('numpy') }}
     - pandas
     - scipy
+    - formulaic>=0.6

 test:
   requires:
1 change: 1 addition & 0 deletions environment-win.yml
@@ -5,6 +5,7 @@ channels:
 dependencies:
   - libblas>=0=*mkl
   - pandas
+  - formulaic>=0.6

   # development tools
   - click
1 change: 1 addition & 0 deletions environment.yml
@@ -4,6 +4,7 @@ channels:
   - nodefaults
 dependencies:
   - pandas
+  - formulaic>=0.6

   # development tools
   - click
1 change: 1 addition & 0 deletions pyproject.toml
@@ -33,6 +33,7 @@ python_version = '3.9'

[tool.cibuildwheel]
skip = [
"cp36-*",
"*-win32",
"*-manylinux_i686",
"pp*",
4 changes: 2 additions & 2 deletions setup.py
@@ -54,7 +54,7 @@
 print(f"Debug Build: {debug_build}")

 if sys.platform == "win32":
-    allocator_libs = []
+    allocator_libs = []  # type: ignore
     extra_compile_args = ["/openmp", "/O2"]
     extra_link_args = ["/openmp"]
     # make sure we can find xsimd headers
@@ -157,7 +157,7 @@
     ],
     package_dir={"": "src"},
     packages=find_packages(where="src"),
-    install_requires=["numpy", "pandas", "scipy"],
+    install_requires=["numpy", "pandas", "scipy", "formulaic>=0.6"],
     python_requires=">=3.9",
     ext_modules=cythonize(
         ext_modules,
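
The `formulaic>=0.6` pin added to `install_requires` is enforced at install time. As a hedged illustration only, the same lower bound could be checked at runtime with the standard library, as sketched below. The helper names `release_tuple`, `meets_minimum`, and `formulaic_is_recent_enough` are hypothetical and not part of tabmat. The comparison looks only at the leading numeric release components, so pre-release suffixes such as `rc1` are ignored.

```python
import re
from importlib import metadata


def release_tuple(version: str) -> tuple[int, ...]:
    """Return the leading numeric components, e.g. "0.6.1rc1" -> (0, 6, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two versions by their numeric release components only."""
    have, want = release_tuple(installed), release_tuple(minimum)
    # Pad with zeros so that "1" and "1.0" compare as equal.
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    return have >= want


def formulaic_is_recent_enough(minimum: str = "0.6") -> bool:
    """Return False if formulaic is missing or older than `minimum`."""
    try:
        installed = metadata.version("formulaic")
    except metadata.PackageNotFoundError:
        return False
    return meets_minimum(installed, minimum)
```

For strict PEP 440 ordering, including pre-releases, a dedicated version parser would be a better fit than this simplified comparison.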
7 changes: 5 additions & 2 deletions src/tabmat/__init__.py
@@ -1,11 +1,11 @@
 import importlib.metadata

 from .categorical_matrix import CategoricalMatrix
-from .constructor import from_csc, from_pandas
+from .constructor import from_csc, from_formula, from_pandas
 from .dense_matrix import DenseMatrix
 from .matrix_base import MatrixBase
 from .sparse_matrix import SparseMatrix
-from .split_matrix import SplitMatrix
+from .split_matrix import SplitMatrix, as_tabmat, hstack
 from .standardized_mat import StandardizedMatrix

 try:
@@ -21,5 +21,8 @@
     "SplitMatrix",
     "CategoricalMatrix",
     "from_csc",
+    "from_formula",
     "from_pandas",
+    "as_tabmat",
+    "hstack",
 ]