CI Linting #267

Open: wants to merge 26 commits into base: master
1 change: 0 additions & 1 deletion .bazelrc
@@ -8,4 +8,3 @@ build --protocopt=--experimental_allow_proto3_optional
# parameter 'user_link_flags' is deprecated and will be removed soon.
# It may be temporarily re-enabled by setting --incompatible_require_linker_input_cc_api=false
build --incompatible_require_linker_input_cc_api=false

37 changes: 37 additions & 0 deletions .github/reusable-build/action.yml
@@ -0,0 +1,37 @@
name: Reusable steps to build data-validation

inputs:
  python-version:
    description: 'Python version'
    required: true
  upload-artifact:
    description: 'Should upload build artifact or not'
    default: 'false'

runs:
  using: 'composite'
  steps:
    - name: Set up Python ${{ inputs.python-version }}
      uses: actions/setup-python@v5
      with:
        python-version: ${{ inputs.python-version }}

    - name: Build the package for Python ${{ inputs.python-version }}
      shell: bash
      run: |
        version="${{ inputs.python-version }}"
        docker compose run -e PYTHON_VERSION=$(echo "$version" | sed 's/\.//') manylinux2010

    - name: Upload wheel artifact for Python ${{ inputs.python-version }}
      if: ${{ inputs.upload-artifact == 'true' }}
      uses: actions/upload-artifact@v4
      with:
        name: data-validation-wheel-py${{ inputs.python-version }}
        path: dist/*.whl

    - name: Install built wheel
      shell: bash
      run: |
        pip install twine
        twine check dist/*
        pip install dist/*.whl
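
The build step derives `PYTHON_VERSION` by piping the version string through `sed 's/\.//'`, which deletes only the first dot. A minimal Python sketch of the same transformation follows; the function name `to_version_tag` is hypothetical and used only for illustration.

```python
def to_version_tag(version: str) -> str:
    """Drop only the first dot, as the sed expression in the build step does."""
    return version.replace(".", "", 1)


if __name__ == "__main__":
    # Matrix values used by the workflows in this PR.
    for v in ("3.9", "3.10", "3.11"):
        print(v, "->", to_version_tag(v))
```

For the matrix values in this PR the results are `39`, `310`, and `311`. A three-part version such as `3.11.4` would keep its second dot.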
54 changes: 54 additions & 0 deletions .github/workflows/build.yml
@@ -0,0 +1,54 @@
name: Build

on:
  push:
    branches:
      - master
  pull_request:
    branches:
      - master
  workflow_dispatch:

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.11"]

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Build data-validation
        id: build-data-validation
        uses: ./.github/reusable-build
        with:
          python-version: ${{ matrix.python-version }}
          upload-artifact: true

  upload_to_pypi:
    name: Upload to PyPI
    runs-on: ubuntu-latest
    if: (github.event_name == 'release' && startsWith(github.ref, 'refs/tags')) || (github.event_name == 'workflow_dispatch')
    needs: [build]
    environment:
      name: pypi
      url: https://pypi.org/p/tensorflow-data-validation/
    permissions:
      id-token: write
    steps:
      - name: Retrieve wheels
        uses: actions/[email protected]
        with:
          merge-multiple: true
          path: wheels

      - name: List the build artifacts
        run: |
          ls -lAs wheels/

      - name: Upload to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1.9
        with:
          packages-dir: wheels/
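
The `if:` expression guarding `upload_to_pypi` can be read as a small predicate. The sketch below restates it in Python under two assumptions: the function name `should_publish` and the sample refs are hypothetical, and Python's case-sensitive `str.startswith` stands in for GitHub's case-insensitive `startsWith`.

```python
def should_publish(event_name: str, ref: str) -> bool:
    """Restate the `if:` expression of the upload_to_pypi job."""
    is_tagged_release = event_name == "release" and ref.startswith("refs/tags")
    return is_tagged_release or event_name == "workflow_dispatch"


if __name__ == "__main__":
    # Sample (event, ref) pairs; the refs are illustrative only.
    samples = [
        ("push", "refs/heads/master"),
        ("pull_request", "refs/pull/267/merge"),
        ("workflow_dispatch", "refs/heads/master"),
        ("release", "refs/tags/v0.0.0"),  # hypothetical tag
    ]
    for event, ref in samples:
        print(f"{event:18} {ref:22} -> {should_publish(event, ref)}")
```

Because the `on:` block of this workflow lists only `push`, `pull_request`, and `workflow_dispatch`, the `release` half of the condition cannot be satisfied unless a `release` trigger is also declared.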
21 changes: 21 additions & 0 deletions .github/workflows/ci-lint.yml
@@ -0,0 +1,21 @@
name: pre-commit

on:
  pull_request:
  push:
    branches: [master]

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/[email protected]
        with:
          # Ensure the full history is fetched
          # This is required to run pre-commit on a specific set of commits
          # TODO: Remove this when all the pre-commit issues are fixed
          fetch-depth: 0
      - uses: actions/[email protected]
        with:
          python-version: 3.13
      - uses: pre-commit/[email protected]
37 changes: 37 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,37 @@
name: Test

on:
  push:
    branches:
      - master
  pull_request:
    branches:
      - master
  workflow_dispatch:

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.11"]

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Build data-validation
        id: build-data-validation
        uses: ./.github/reusable-build
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install test dependencies
        run: |
          pip install pytest scikit-learn scipy

      - name: Run Test
        run: |
          rm -rf bazel-*
          # run tests
          pytest -vv
2 changes: 1 addition & 1 deletion .gitignore
@@ -126,4 +126,4 @@ dmypy.json
.pyre/

# pb2.py files
*_pb2.py
*_pb2.py
39 changes: 39 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,39 @@
# pre-commit is a tool to perform a predefined set of tasks manually and/or
# automatically before git commits are made.
#
# Config reference: https://pre-commit.com/#pre-commit-configyaml---top-level
#
# Common tasks
#
# - Register git hooks: pre-commit install --install-hooks
# - Run on all files: pre-commit run --all-files
#
# These pre-commit hooks are run as CI.
#
# NOTE: if it can be avoided, add configs/args in pyproject.toml or below instead of creating a new `.config.file`.
# https://pre-commit.ci/#configuration
ci:
  autoupdate_schedule: monthly
  autofix_commit_msg: |
    [pre-commit.ci] Apply automatic pre-commit fixes

repos:
  # general
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: end-of-file-fixer
        exclude: '\.svg$'
      - id: trailing-whitespace
        exclude: '\.svg$'
      - id: check-json
      - id: check-yaml
        args: [--allow-multiple-documents, --unsafe]
      - id: check-toml

  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.5.6
    hooks:
      - id: ruff
        args: ["--fix"]
      - id: ruff-format
1 change: 0 additions & 1 deletion README.md
@@ -236,4 +236,3 @@ tag.
* [TensorFlow Data Validation PyPI](https://pypi.org/project/tensorflow-data-validation/)
* [TensorFlow Data Validation Paper](https://mlsys.org/Conferences/2019/doc/2019/167.pdf)
* [TensorFlow Data Validation Slides](https://conf.slac.stanford.edu/xldb2018/sites/xldb2018.conf.slac.stanford.edu/files/Tues_09.45_NeoklisPolyzotis_Data%20Analysis%20and%20Validation%20(1).pdf)

2 changes: 0 additions & 2 deletions g3doc/custom_data_validation.md
@@ -43,5 +43,3 @@ See the
[documentation](https://github.com/tensorflow/data-validation/blob/master/tensorflow_data_validation/anomalies/proto/custom_validation_config.proto)
in the `CustomValidationConfig` proto for example
configurations.


127 changes: 127 additions & 0 deletions pyproject.toml
@@ -19,3 +19,130 @@ requires = [
# Required for using org_tensorflow bazel repository.
"numpy~=1.22.0",
]

[tool.ruff]
line-length = 88

[tool.ruff.lint]
select = [
# pycodestyle
"E",
"W",
# Pyflakes
"F",
# pyupgrade
"UP",
# flake8-bugbear
"B",
# flake8-simplify
"SIM",
# isort
"I",
# pep8 naming
"N",
# pydocstyle
"D",
# annotations
"ANN",
# debugger
"T10",
# flake8-pytest-style
"PT",
# flake8-return
"RET",
# flake8-unused-arguments
"ARG",
# flake8-fixme
"FIX",
# flake8-eradicate
"ERA",
# pandas-vet
"PD",
# numpy-specific rules
"NPY",
]

ignore = [
"D104", # Missing docstring in public package
"D100", # Missing docstring in public module
"D211", # No blank lines allowed before class docstring
"PD901", # Avoid using 'df' for pandas dataframes. Perfectly fine in functions with limited scope
"ANN201", # Missing return type annotation for public function (makes no sense for NoneType return types...)
"ANN101", # Missing type annotation for `self`
"ANN204", # Missing return type annotation for special method
"ANN002", # Missing type annotation for `*args`
"ANN003", # Missing type annotation for `**kwargs`
"D105", # Missing docstring in magic method
"D203", # 1 blank line required before class docstring
"D204", # 1 blank line required after class docstring
"D413", # Missing blank line after last section
"SIM108", # Simplify if/else to one line; not always clearer
"D206", # Docstrings should be indented with spaces; unnecessary when running ruff-format
"E501", # Line length too long; unnecessary when running ruff-format
"W191", # Indentation contains tabs; unnecessary when running ruff-format

# REMOVE AFTER FIXING
# ANN rules (flake8-annotations)
"ANN001", # Missing type annotation for function argument `args`
"ANN102", # Missing type annotation for `cls` in classmethod
"ANN202", # Missing return type annotation for private function
"ANN205", # Missing return type annotation for staticmethod
"ANN206", # Missing return type annotation for classmethod
"ANN401", # Dynamically typed expressions (typing.Any) are disallowed in `domain`
# ARG rules (flake8-unused-arguments)
"ARG001", # Unused function argument
"ARG002", # Unused method argument
# B rules (flake8-bugbear)
"B005", # Using `.strip()` with multi-character strings is misleading
"B007", # Loop control variable not used within loop body
"B008", # Do not perform function call in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
"B904", # Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling
# D rules (pydocstyle)
"D101", # Missing docstring in public class
"D102", # Missing docstring in public method
"D103", # Missing docstring in public function
"D107", # Missing docstring in `__init__`
"D401", # First line of docstring should be in imperative mood: "Loads the vocabulary from the specified path."
"D404", # First word of the docstring should not be "This"
"D417", # Missing argument descriptions in the docstring
# E rules (pycodestyle)
"E731", # Do not assign a `lambda` expression, use a `def`
"E741", # Ambiguous variable name
# ERA rules (flake8-eradicate)
"ERA001", # Found commented-out code
# F rules (Pyflakes)
"F821", # Undefined name
# FIX rules (flake8-fixme)
"FIX002", # Line contains TODO, consider resolving the issue
# N rules (pep8-naming)
"N802", # Function name should be lowercase
# NPY rules (numpy-specific rules)
"NPY002", # Replace legacy `np.random` call with `np.random.Generator`
# PD rules (pandas-vet)
"PD002", # `inplace=True` should be avoided; it has inconsistent behavior
"PD003", # `.isna` is preferred to `.isnull`; functionality is equivalent
"PD011", # Use `.to_numpy()` instead of `.values`
"PD015", # Use `.merge` method instead of `pd.merge` function
# PT rules (flake8-pytest-style)
"PT009", # Use a regular `assert` instead of unittest-style `assertEqual`
"PT018", # Assertion should be broken down into multiple parts
"PT027", # Use `pytest.raises` instead of unittest-style `assertRaisesRegex`
# RET rules (flake8-return)
"RET504", # Unnecessary assignment to variable before `return` statement
"RET505", # Unnecessary `elif` after `return` statement
# SIM rules (flake8-simplify)
"SIM101", # Multiple `isinstance` calls for `maybe_collection`, merge into a single call
"SIM102", # Use a single `if` statement instead of nested `if` statements
"SIM103", # Return the condition directly
"SIM105", # Use `contextlib.suppress(...)` instead of `try`-`except`-`pass`
"SIM117", # Use a single `with` statement with multiple contexts instead of nested `with` statements
"SIM211", # Use `not ...` instead of `False if ... else True`
# UP rules (pyupgrade)
"UP008", # Use `super()` instead of `super(__class__, self)`
"UP028", # Replace `yield` over `for` loop with `yield from`
"UP031", # Use format specifiers instead of percent format
]


[tool.ruff.lint.per-file-ignores]
"__init__.py" = ["F401"]
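
Several entries under `REMOVE AFTER FIXING` have mechanical fixes. The sketch below shows what compliant code could look like for four of them (B904, SIM105, UP028, UP031). All function names are hypothetical illustrations and are not taken from the data-validation codebase.

```python
import contextlib
import os
from collections.abc import Iterable, Iterator


def parse_port(raw: str) -> int:
    """B904: chain the original error with `raise ... from err`."""
    try:
        return int(raw)
    except ValueError as err:
        raise ValueError(f"invalid port: {raw!r}") from err


def remove_if_present(path: str) -> None:
    """SIM105: use `contextlib.suppress` instead of try/except/pass."""
    with contextlib.suppress(FileNotFoundError):
        os.remove(path)


def flatten(groups: Iterable[Iterable[int]]) -> Iterator[int]:
    """UP028: use `yield from` instead of yielding inside a `for` loop."""
    for group in groups:
        yield from group


def describe(name: str, count: int) -> str:
    """UP031: use an f-string instead of percent formatting."""
    return f"{name}: {count} items"
```

Once the codebase follows patterns like these, the corresponding codes can be dropped from the `ignore` list one at a time, which keeps each cleanup reviewable on its own.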