Introducing agents instructions for the repo #2695
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,103 @@ | ||
# AGENTS.md - CI/CD Infrastructure (.ci/) | ||
|
||
## Purpose | ||
CI/CD infrastructure for building, testing, and releasing Intel Extension for Scikit-learn across multiple platforms. | ||
|
||
## Key Files for Agents | ||
- `.ci/pipeline/ci.yml` - Main CI orchestrator | ||
- `.ci/pipeline/build-and-test-*.yml` - Platform-specific builds | ||
- `.ci/pipeline/linting.yml` - Code quality enforcement | ||
- `.ci/scripts/` - Automation utilities | ||
|
||
## Platform Support | ||
- **Linux/macOS**: Uses conda, Intel DPC++ compiler, MPI support | ||
- **Windows**: Visual Studio 2022, conda-forge packages | ||
- **GPU**: Intel GPU support via DPC++/SYCL (dpctl, dpnp packages) | ||
|
||
## Quality Gates | ||
- **Linting**: black, isort, clang-format, numpydoc validation | ||
- **Testing**: pytest with cross-platform compatibility | ||
- **Coverage**: codecov integration with threshold enforcement | ||
|
||
## Build Dependencies | ||
- **oneDAL**: Downloads nightly builds from upstream oneDAL repo | ||
- **Python**: Matrix testing across Python 3.9-3.13 (verified in .ci/pipeline/ci.yml) | ||
- **sklearn**: Multiple version compatibility (1.0-1.7) | ||
Review comment: This is bound to get outdated. Perhaps it could be rephrased as "all sklearn versions beyond 1.0", "the last 3 releases of sklearn", or something similar. |
||
- **GPU Libraries**: dpctl, dpnp for Intel GPU acceleration | ||
Review comment: Missing torch. |
||
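The version matrix described under Build Dependencies can be pictured as a filtered cross product of Python and sklearn versions. The sketch below is only an illustration under stated assumptions: `minor_range`, `build_matrix`, and the excluded pair are hypothetical and are not taken from `.ci/pipeline/ci.yml`, which may build its matrix quite differently.

```python
from itertools import product


def minor_range(major, first, last):
    """Return version strings major.first .. major.last, inclusive."""
    return [f"{major}.{minor}" for minor in range(first, last + 1)]


def build_matrix(pythons, sklearns, excluded=()):
    """Cross product of versions, minus explicitly excluded (python, sklearn) pairs."""
    excluded = set(excluded)
    return [
        {"python": py, "sklearn": sk}
        for py, sk in product(pythons, sklearns)
        if (py, sk) not in excluded
    ]


# Ranges quoted in the section above; the exclusion is illustrative only.
pythons = minor_range(3, 9, 13)
sklearns = minor_range(1, 0, 7)
matrix = build_matrix(pythons, sklearns, excluded=[("3.13", "1.0")])

if __name__ == "__main__":
    print(len(matrix), "combinations")
```

Keeping the ranges as parameters rather than hard-coded lists makes it easier to follow the reviewer's suggestion of describing support as a rolling window instead of a fixed version list.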
|
||
## Release Process | ||
- **Automated**: Dynamic matrix generation for PyPI/conda releases | ||
- **Multi-channel**: Both PyPI wheels and conda packages | ||
- **Quality**: Automated sklearn compatibility testing before release | ||
|
||
## Local Development Setup | ||
|
||
### Quality Tools Configuration (from pyproject.toml) | ||
```bash | ||
# Code formatting | ||
black --line-length 90 <files> | ||
isort --profile black --line-length 90 <files> | ||
|
||
# C++ formatting | ||
clang-format --style=file <cpp_files> | ||
|
||
# Documentation validation | ||
numpydoc-validation <python_files> | ||
``` | ||
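When the formatters need to be driven from a script rather than typed by hand, the black and isort invocations above can be assembled as argument lists. This is a minimal sketch: `format_commands` and the `example.py` file name are hypothetical, and only the flags quoted in the block above are used.

```python
import shlex

LINE_LENGTH = 90  # value quoted in the commands above


def format_commands(files):
    """Return argv lists for the black and isort invocations shown above."""
    files = list(files)
    return [
        ["black", "--line-length", str(LINE_LENGTH), *files],
        ["isort", "--profile", "black", "--line-length", str(LINE_LENGTH), *files],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch has no side effects.
    for argv in format_commands(["example.py"]):
        print(shlex.join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` if the tools are installed in the active environment.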
|
||
### Build Dependencies Download | ||
```bash | ||
# oneDAL nightly builds (from .github/workflows/ci.yml) | ||
# Automatically downloads from uxlfoundation/oneDAL nightly builds | ||
# Sets DALROOT to downloaded oneDAL location | ||
``` | ||
|
||
### Platform-Specific Build Commands | ||
|
||
**Linux/macOS** (from .ci/pipeline/build-and-test-lnx.yml): | ||
Review comment: We don't support macOS. |
||
```bash | ||
# Install DPC++ compiler | ||
bash .ci/scripts/install_dpcpp.sh | ||
|
||
# Set up environment | ||
source /opt/intel/oneapi/compiler/latest/env/vars.sh | ||
export DPCPPROOT=/opt/intel/oneapi/compiler/latest | ||
Review comment: This is not needed. |
||
|
||
# Create conda environment | ||
conda create -q -y -n CB -c conda-forge python=3.11 mpich pyyaml | ||
conda activate CB | ||
pip install -r dependencies-dev | ||
|
||
# Build | ||
./conda-recipe/build.sh | ||
``` | ||
|
||
**Windows** (from .ci/pipeline/build-and-test-win.yml): | ||
```batch | ||
REM Visual Studio setup | ||
call "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Auxiliary\Build\vcvarsall" x64 | ||
|
||
REM Build | ||
call conda-recipe\bld.bat | ||
``` | ||
|
||
### Environment Variables for Development | ||
```bash | ||
# From setup.py and CI scripts | ||
export DALROOT=/path/to/onedal # Required | ||
export DPCPPROOT=/opt/intel/oneapi/compiler/latest # For GPU support | ||
export MPIROOT=/path/to/mpi # For distributed computing | ||
export NO_DPC=1 # Disable GPU support | ||
export NO_DIST=1 # Disable distributed computing | ||
export SKLEARNEX_VERSION=2024.7.0 # Version override | ||
export MAKEFLAGS="-j$(nproc)" # Parallel build | ||
``` | ||
|
||
## For AI Agents | ||
- Follow established build templates | ||
- Respect quality gates (linting, testing, coverage) | ||
- Use platform-specific configurations appropriately | ||
- Test across supported Python/sklearn version combinations | ||
- Set required environment variables (DALROOT, DPCPPROOT, MPIROOT) | ||
- Use conda environments to avoid dependency conflicts | ||
- Run pre-commit hooks before submitting changes |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,88 @@ | ||
# Build Configuration Files | ||
|
||
## Core Build Files | ||
- `setup.py`: Main build script (500+ lines, complex configuration) | ||
- `pyproject.toml`: Python project metadata + linting configuration | ||
- `dependencies-dev`: Build-time dependencies (Cython, numpy, pybind11, cmake) | ||
- `requirements-test.txt`: Test dependencies with version constraints | ||
- `conda-recipe/meta.yaml`: Conda package build configuration | ||
|
||
## Environment Variables (Critical) | ||
```bash | ||
# MANDATORY for building | ||
export DALROOT=/path/to/onedal # oneDAL installation path (required) | ||
|
||
# OPTIONAL but commonly needed | ||
export MPIROOT=/path/to/mpi # MPI for distributed features | ||
export NO_DIST=1 # Disable distributed mode | ||
export NO_DPC=1 # Disable GPU/SYCL support | ||
export NO_STREAM=1 # Disable streaming mode | ||
export DEBUG_BUILD=1 # Debug symbols + no optimization | ||
export MAKEFLAGS=-j$(nproc) # Parallel build threads | ||
``` | ||
|
||
## Build Process (4 Stages) | ||
1. **Code Generation**: oneDAL C++ headers → Python/Cython sources | ||
2. **oneDAL Bindings**: cmake + pybind11 compilation | ||
3. **Cython Processing**: .pyx files → C++ sources | ||
4. **Final Compilation**: Link everything into Python extensions | ||
|
||
## Dependencies | ||
**Build Dependencies (dependencies-dev):** | ||
- Cython==3.1.1 (exact version required) | ||
- numpy>=2.0 (version varies by Python version) | ||
- pybind11==2.13.6 | ||
- cmake==4.0.2 | ||
- setuptools==79.0.1 | ||
|
||
**Runtime Dependencies:** | ||
- Intel oneDAL 2021.1+ (backwards compatible) | ||
- numpy (version-specific, see requirements-test.txt) | ||
- scikit-learn 1.0-1.7 (see compatibility matrix) | ||
|
||
## Build Commands | ||
```bash | ||
# Development build (RECOMMENDED) | ||
python setup.py develop # Creates .egg-link, editable | ||
|
||
# Production builds | ||
python setup.py install # Full install | ||
python setup.py build_ext --inplace --force # Extensions only | ||
|
||
# Special flags (Linux) | ||
python setup.py build --abs-rpath # Absolute RPATH for custom oneDAL | ||
|
||
# Conda build | ||
conda build . # Uses conda-recipe/meta.yaml | ||
``` | ||
|
||
## Common Build Issues | ||
```bash | ||
# oneDAL not found | ||
# RuntimeError: "Not set DALROOT variable" | ||
export DALROOT=/path/to/onedal | ||
|
||
# MPI required but missing | ||
# ValueError: "'MPIROOT' is not set, cannot build with distributed mode" | ||
export NO_DIST=1  # or: export MPIROOT=/path/to/mpi | ||
|
||
# Cython version mismatch | ||
pip install Cython==3.1.1  # exact version | ||
|
||
# Linking issues (Linux) | ||
python setup.py build --abs-rpath | ||
``` | ||
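The two environment-related failures above can be caught before a long build starts. The following is a rough sketch only: `check_build_env` is a hypothetical helper, and it assumes any non-empty `NO_DIST` value disables distributed mode, which may not match how `setup.py` actually parses these variables.

```python
import os


def check_build_env(env=None):
    """Return a list of human-readable problems likely to stop a build."""
    env = os.environ if env is None else env
    problems = []

    # DALROOT is described above as mandatory.
    if not env.get("DALROOT"):
        problems.append("DALROOT is not set: export DALROOT=/path/to/onedal")

    # Distributed mode needs MPIROOT unless NO_DIST is set (assumed: any non-empty value).
    if not env.get("NO_DIST") and not env.get("MPIROOT"):
        problems.append(
            "MPIROOT is not set: export MPIROOT=/path/to/mpi or export NO_DIST=1"
        )
    return problems


if __name__ == "__main__":
    for problem in check_build_env():
        print(problem)
```

Passing a plain dictionary instead of `os.environ` keeps the check easy to exercise without touching the real environment.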
|
||
## CI/CD Configuration | ||
- **GitHub Actions**: `.github/workflows/ci.yml` | ||
- **Azure DevOps**: `.ci/pipeline/ci.yml` (main CI system) | ||
- **Pre-commit**: `.pre-commit-config.yaml` (code quality) | ||
|
||
Build timeouts: 120 minutes in CI (can be slow due to oneDAL compilation) | ||
|
||
## Related Instructions | ||
- `general.instructions.md` - Quick start build commands | ||
- `src.instructions.md` - C++/Cython build details | ||
- `tests.instructions.md` - Testing after successful builds | ||
|
||
For platform-specific build details, see `.ci/AGENTS.md` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
# daal4py/* - Direct oneDAL Python Bindings | ||
|
||
## Purpose | ||
Direct Python bindings to Intel oneDAL for maximum performance and model builders for XGBoost/LightGBM conversion. | ||
|
||
## Three Sub-APIs | ||
1. **Native oneDAL**: `import daal4py as d4p` - Direct algorithm access | ||
2. **sklearn-compatible**: `from daal4py.sklearn import ...` - sklearn API with oneDAL backend | ||
3. **Model Builders**: `from daal4py.mb import convert_model` - External model conversion | ||
|
||
## API Overview | ||
|
||
For detailed native oneDAL patterns and model builders, see [daal4py/AGENTS.md](../daal4py/AGENTS.md). | ||
|
||
**Basic Pattern**: | ||
```python | ||
import daal4py as d4p | ||
algorithm = d4p.dbscan(epsilon=0.5, minObservations=5) | ||
result = algorithm.compute(data) | ||
``` | ||
|
||
**Model Conversion**: | ||
```python | ||
from daal4py.mb import convert_model | ||
d4p_model = convert_model(xgb_model) # 10-100x faster inference | ||
``` | ||
|
||
## Testing | ||
```bash | ||
# Native daal4py tests | ||
pytest --verbose --pyargs daal4py | ||
pytest tests/test_daal4py_examples.py # Native API examples | ||
pytest tests/test_model_builders.py # Model conversion tests | ||
|
||
# sklearn compatibility in daal4py | ||
pytest daal4py/sklearn/tests/ # sklearn-compatible API | ||
``` | ||
|
||
## Development Notes | ||
- Native API provides direct oneDAL algorithm access (fastest performance) | ||
- sklearn-compatible API in `daal4py/sklearn/` maintains full sklearn compatibility | ||
- Model builders enable oneDAL inference for models trained with other frameworks | ||
|
||
## Related Instructions | ||
- `general.instructions.md` - Repository setup and build requirements | ||
- `onedal.instructions.md` - Low-level backend that daal4py wraps | ||
- `src.instructions.md` - Core C++/Cython implementation details | ||
- `tests.instructions.md` - Testing native oneDAL algorithms | ||
- See `daal4py/AGENTS.md` for detailed algorithm usage patterns |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,58 @@ | ||
# General Repository Instructions - Intel Extension for Scikit-learn | ||
|
||
## Repository Overview | ||
|
||
**Intel Extension for Scikit-learn** (scikit-learn-intelex) accelerates scikit-learn by 10-100x using Intel oneDAL. Zero code changes required for existing sklearn applications. | ||
|
||
- **Languages**: Python (70%), C++ (25%), Cython (5%) | ||
- **Architecture**: 4-layer system (sklearnex → daal4py → onedal → Intel oneDAL C++) | ||
- **Platforms**: Linux, Windows, macOS; CPU (x86_64, ARM), GPU (Intel via SYCL) | ||
- **Python**: 3.9-3.13 supported | ||
|
||
## Quick Start | ||
|
||
**Build Setup**: See [build-config.instructions.md](build-config.instructions.md) for complete details. | ||
```bash | ||
export DALROOT=/path/to/onedal | ||
python setup.py develop | ||
``` | ||
|
||
**Testing**: See [tests.instructions.md](tests.instructions.md) for comprehensive testing. | ||
```bash | ||
pytest --verbose --pyargs sklearnex | ||
``` | ||
|
||
**Code Quality**: | ||
```bash | ||
pre-commit run --all-files | ||
``` | ||
|
||
## Code Standards | ||
|
||
- **Python**: Black (line-length=90) + isort | ||
- **C++**: clang-format version ≥14 | ||
- **Commits**: Must be signed-off (`git commit -s`) | ||
- **Documentation**: numpydoc format | ||
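The sign-off requirement can be checked mechanically, since `git commit -s` appends a `Signed-off-by:` trailer to the commit message. Below is a minimal sketch; `is_signed_off` and the sample messages are hypothetical, and the pattern is deliberately loose rather than a full validation of names or e-mail addresses.

```python
import re

# One "Signed-off-by: Name <email>" line anywhere in the message.
SIGNOFF_RE = re.compile(r"^Signed-off-by: .+ <[^<>\s]+@[^<>\s]+>$", re.MULTILINE)


def is_signed_off(message):
    """Return True if the commit message carries a Signed-off-by trailer."""
    return bool(SIGNOFF_RE.search(message))


if __name__ == "__main__":
    signed = "Fix typo in docs\n\nSigned-off-by: Jane Doe <jane@example.com>\n"
    unsigned = "Fix typo in docs\n"
    print(is_signed_off(signed), is_signed_off(unsigned))
```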
|
||
## Common Issues & Solutions | ||
|
||
```bash | ||
# Build failures | ||
export NO_DIST=1 # Disable distributed mode if MPI issues | ||
export NO_DPC=1 # Disable GPU if driver issues | ||
python setup.py build_ext --inplace --force --abs-rpath # Linux linking | ||
|
||
# Import/path issues | ||
export PYTHONPATH=$(pwd) # Add repo to path | ||
python setup.py develop # Ensure editable install | ||
``` | ||
|
||
## Related Instructions | ||
- `sklearnex.instructions.md` - Primary sklearn interface and patching | ||
- `daal4py.instructions.md` - Direct oneDAL bindings and model builders | ||
- `onedal.instructions.md` - Low-level C++ bindings | ||
- `src.instructions.md` - Core C++/Cython implementation | ||
- `tests.instructions.md` - Testing infrastructure and validation | ||
- `build-config.instructions.md` - Build system and environment setup | ||
|
||
For detailed implementation guides, see the corresponding AGENTS.md files in each directory. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,63 @@ | ||
# onedal/* - Low-Level C++ Bindings | ||
|
||
## Purpose | ||
Pybind11-based C++ bindings providing the bridge between Python and Intel oneDAL C++ library. | ||
|
||
## Key Components | ||
- `datatypes/`: Memory management and array conversions (NumPy, SYCL USM, DLPack) | ||
- `common/`: Policy management, device selection, serialization | ||
- `*/`: Algorithm-specific implementations (cluster/, decomposition/, linear_model/, etc.) | ||
- `spmd/`: Distributed computing interfaces | ||
|
||
## Memory Management | ||
```python | ||
# Zero-copy conversions handled automatically | ||
import numpy as np | ||
from onedal.cluster import DBSCAN | ||
|
||
# NumPy arrays converted to oneDAL tables without copying | ||
X = np.random.random((1000, 10)) | ||
model = DBSCAN().fit(X) # Automatic NumPy → oneDAL conversion | ||
``` | ||
|
||
## Device Context | ||
|
||
For comprehensive device management, see [onedal/AGENTS.md](../onedal/AGENTS.md). | ||
|
||
```python | ||
import dpctl | ||
with dpctl.device_context("gpu:0"): | ||
    model = DBSCAN().fit(X) | ||
``` | ||
|
||
## Algorithm Structure | ||
- Each algorithm module follows a consistent pattern: | ||
  - `fit()` method for training | ||
  - `predict()` method for inference (where applicable) | ||
  - Parameters match the oneDAL C++ API | ||
  - Results are returned as Python objects with named attributes | ||
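The fit/predict convention listed above can be illustrated with a toy pure-Python class. Everything in this sketch (`MeanRegressor`, `MeanResult`, and the attribute names) is hypothetical and is not part of `onedal`; it only mirrors the shape of the pattern: train with `fit()`, infer with `predict()`, and expose results as named attributes.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class MeanResult:
    """Named result attributes, standing in for an algorithm's result object."""
    mean_: float
    n_samples_: int


class MeanRegressor:
    """Toy estimator that predicts the mean of the training targets."""

    def fit(self, X, y):
        y = list(y)
        if not y:
            raise ValueError("y must contain at least one target value")
        self.result_ = MeanResult(mean_=fmean(y), n_samples_=len(y))
        return self  # returning self allows chaining, e.g. MeanRegressor().fit(X, y)

    def predict(self, X):
        # One prediction per input row, all equal to the stored training mean.
        return [self.result_.mean_ for _ in X]


if __name__ == "__main__":
    model = MeanRegressor().fit([[0.0], [1.0]], [1.0, 3.0])
    print(model.predict([[5.0]]), model.result_.n_samples_)
```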
|
||
## Testing | ||
```bash | ||
# Low-level onedal tests | ||
pytest onedal/tests/ # Core functionality | ||
pytest onedal/datatypes/tests/ # Memory management | ||
pytest onedal/common/tests/ # Device/policy tests | ||
|
||
# Algorithm-specific tests | ||
pytest onedal/cluster/tests/test_dbscan.py # DBSCAN implementation | ||
pytest onedal/linear_model/tests/ # Linear models | ||
``` | ||
|
||
## Development Notes | ||
- Direct interface to oneDAL C++ API through pybind11 | ||
- Handles memory management between Python/C++ automatically | ||
- Provides foundation for both daal4py and sklearnex layers | ||
- SPMD module enables distributed computing with MPI | ||
|
||
## Related Instructions | ||
- `general.instructions.md` - Repository setup and build requirements | ||
- `src.instructions.md` - C++/Cython implementation that uses onedal | ||
- `sklearnex.instructions.md` - High-level layer built on onedal | ||
- `daal4py.instructions.md` - Alternative interface to onedal | ||
- See `onedal/AGENTS.md` for detailed technical implementation |
Review comment: I don't think we have a threshold enforcement from codecov.