103 changes: 103 additions & 0 deletions .ci/AGENTS.md
@@ -0,0 +1,103 @@
# AGENTS.md - CI/CD Infrastructure (.ci/)

## Purpose
CI/CD infrastructure for building, testing, and releasing Intel Extension for Scikit-learn across multiple platforms.

## Key Files for Agents
- `.ci/pipeline/ci.yml` - Main CI orchestrator
- `.ci/pipeline/build-and-test-*.yml` - Platform-specific builds
- `.ci/pipeline/linting.yml` - Code quality enforcement
- `.ci/scripts/` - Automation utilities

## Platform Support
- **Linux/macOS**: Uses conda, Intel DPC++ compiler, MPI support
- **Windows**: Visual Studio 2022, conda-forge packages
- **GPU**: Intel GPU support via DPC++/SYCL (dpctl, dpnp packages)

## Quality Gates
- **Linting**: black, isort, clang-format, numpydoc validation
- **Testing**: pytest with cross-platform compatibility
- **Coverage**: codecov integration with threshold enforcement
> **Contributor:** I don't think we have a threshold enforcement from codecov.

## Build Dependencies
- **oneDAL**: Downloads nightly builds from upstream oneDAL repo
- **Python**: Matrix testing across Python 3.9-3.13 (verified in .ci/pipeline/ci.yml)
- **sklearn**: Multiple version compatibility (1.0-1.7)
> **Contributor:** This is bound to get outdated. Perhaps could rephrase it as "all sklearn versions beyond 1.0", or "last 3 releases of sklearn", or something like that.

- **GPU Libraries**: dpctl, dpnp for Intel GPU acceleration
> **Contributor:** Missing torch.

## Release Process
- **Automated**: Dynamic matrix generation for PyPI/conda releases
- **Multi-channel**: Both PyPI wheels and conda packages
- **Quality**: Automated sklearn compatibility testing before release

## Local Development Setup

### Quality Tools Configuration (from pyproject.toml)
```bash
# Code formatting
black --line-length 90 <files>
isort --profile black --line-length 90 <files>

# C++ formatting
clang-format --style=file <cpp_files>

# Documentation validation (numpydoc; assuming the numpydoc-validation pre-commit hook is configured)
pre-commit run numpydoc-validation --all-files
```

### Build Dependencies Download
```bash
# oneDAL nightly builds (from .github/workflows/ci.yml)
# Automatically downloads from uxlfoundation/oneDAL nightly builds
# Sets DALROOT to downloaded oneDAL location
```

### Platform-Specific Build Commands

**Linux/macOS** (from .ci/pipeline/build-and-test-lnx.yml):
> **Contributor:** We don't support macOS.

```bash
# Install DPC++ compiler
bash .ci/scripts/install_dpcpp.sh

# Set up environment
source /opt/intel/oneapi/compiler/latest/env/vars.sh
export DPCPPROOT=/opt/intel/oneapi/compiler/latest
# Contributor review comment on the export above: "This is not needed."

# Create conda environment
conda create -q -y -n CB -c conda-forge python=3.11 mpich pyyaml
conda activate CB
pip install -r dependencies-dev

# Build
./conda-recipe/build.sh
```

**Windows** (from .ci/pipeline/build-and-test-win.yml):
```batch
# Visual Studio setup
call "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Auxiliary\Build\vcvarsall" x64

# Build
call conda-recipe\bld.bat
```

### Environment Variables for Development
```bash
# From setup.py and CI scripts
export DALROOT=/path/to/onedal # Required
export DPCPPROOT=/opt/intel/oneapi/compiler/latest # For GPU support
export MPIROOT=/path/to/mpi # For distributed computing
export NO_DPC=1 # Disable GPU support
export NO_DIST=1 # Disable distributed computing
export SKLEARNEX_VERSION=2024.7.0 # Version override
export MAKEFLAGS="-j$(nproc)" # Parallel build
```

## For AI Agents
- Follow established build templates
- Respect quality gates (linting, testing, coverage)
- Use platform-specific configurations appropriately
- Test across supported Python/sklearn version combinations
- Set required environment variables (DALROOT, DPCPPROOT, MPIROOT)
- Use conda environments to avoid dependency conflicts
- Run pre-commit hooks before submitting changes
2 changes: 2 additions & 0 deletions .github/.licenserc.yaml
@@ -67,9 +67,11 @@ header:
- '.github/CODEOWNERS'
- '.github/Pull_Request_template.md'
- '.github/renovate.json'
- '.github/instructions/*.md'
# Specific files
- 'setup.cfg'
- 'LICENSE'
- 'AGENTS.md'
# External copies of copyrighted work
- 'onedal/datatypes/dlpack/dlpack.h'
comment: never
88 changes: 88 additions & 0 deletions .github/instructions/build-config.instructions.md
@@ -0,0 +1,88 @@
# Build Configuration Files

## Core Build Files
- `setup.py`: Main build script (500+ lines, complex configuration)
- `pyproject.toml`: Python project metadata + linting configuration
- `dependencies-dev`: Build-time dependencies (Cython, numpy, pybind11, cmake)
- `requirements-test.txt`: Test dependencies with version constraints
- `conda-recipe/meta.yaml`: Conda package build configuration

## Environment Variables (Critical)
```bash
# MANDATORY for building
export DALROOT=/path/to/onedal # oneDAL installation path (required)

# OPTIONAL but commonly needed
export MPIROOT=/path/to/mpi # MPI for distributed features
export NO_DIST=1 # Disable distributed mode
export NO_DPC=1 # Disable GPU/SYCL support
export NO_STREAM=1 # Disable streaming mode
export DEBUG_BUILD=1 # Debug symbols + no optimization
export MAKEFLAGS=-j$(nproc) # Parallel build threads
```

## Build Process (4 Stages)
1. **Code Generation**: oneDAL C++ headers → Python/Cython sources
2. **oneDAL Bindings**: cmake + pybind11 compilation
3. **Cython Processing**: .pyx files → C++ sources
4. **Final Compilation**: Link everything into Python extensions

## Dependencies
**Build Dependencies (dependencies-dev):**
- Cython==3.1.1 (exact version required)
- numpy>=2.0 (version varies by Python version)
- pybind11==2.13.6
- cmake==4.0.2
- setuptools==79.0.1

**Runtime Dependencies:**
- Intel oneDAL 2021.1+ (backwards compatible)
- numpy (version-specific, see requirements-test.txt)
- scikit-learn 1.0-1.7 (see compatibility matrix)

## Build Commands
```bash
# Development build (RECOMMENDED)
python setup.py develop # Creates .egg-link, editable

# Production builds
python setup.py install # Full install
python setup.py build_ext --inplace --force # Extensions only

# Special flags (Linux)
python setup.py build --abs-rpath # Absolute RPATH for custom oneDAL

# Conda build
conda build . # Uses conda-recipe/meta.yaml
```

## Common Build Issues
```bash
# oneDAL not found
RuntimeError: "Not set DALROOT variable"
→ Solution: export DALROOT=/path/to/onedal

# MPI required but missing
ValueError: "'MPIROOT' is not set, cannot build with distributed mode"
→ Solution: export NO_DIST=1 or set MPIROOT

# Cython version mismatch
→ Solution: pip install Cython==3.1.1 (exact version)

# Linking issues (Linux)
→ Solution: Use --abs-rpath flag
```

## CI/CD Configuration
- **GitHub Actions**: `.github/workflows/ci.yml`
- **Azure DevOps**: `.ci/pipeline/ci.yml` (main CI system)
- **Pre-commit**: `.pre-commit-config.yaml` (code quality)

Build timeout: 120 minutes in CI (builds can be slow due to oneDAL compilation)

## Related Instructions
- `general.instructions.md` - Quick start build commands
- `src.instructions.md` - C++/Cython build details
- `tests.instructions.md` - Testing after successful builds

For platform-specific build details, see `.ci/AGENTS.md`
49 changes: 49 additions & 0 deletions .github/instructions/daal4py.instructions.md
@@ -0,0 +1,49 @@
# daal4py/* - Direct oneDAL Python Bindings

## Purpose
Direct Python bindings to Intel oneDAL for maximum performance and model builders for XGBoost/LightGBM conversion.

## Three Sub-APIs
1. **Native oneDAL**: `import daal4py as d4p` - Direct algorithm access
2. **sklearn-compatible**: `from daal4py.sklearn import ...` - sklearn API with oneDAL backend
3. **Model Builders**: `from daal4py.mb import convert_model` - External model conversion

## API Overview

For detailed native oneDAL patterns and model builders, see [daal4py/AGENTS.md](../daal4py/AGENTS.md).

**Basic Pattern**:
```python
import numpy as np
import daal4py as d4p

data = np.random.random((1000, 10))  # any 2D float array works
algorithm = d4p.dbscan(epsilon=0.5, minObservations=5)
result = algorithm.compute(data)
```

**Model Conversion**:
```python
from daal4py.mb import convert_model

# xgb_model: an already-trained XGBoost model (e.g. a Booster or sklearn-API estimator)
d4p_model = convert_model(xgb_model)  # 10-100x faster inference
```
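
**sklearn-compatible** (a minimal sketch of sub-API 2; the `daal4py.sklearn.cluster.DBSCAN` import path and the sklearn-style parameters are assumptions here, since this layer mirrors sklearn's estimator API):
```python
import numpy as np
from daal4py.sklearn.cluster import DBSCAN  # assumed path; drop-in for sklearn.cluster.DBSCAN

X = np.random.random((1000, 10))
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)  # same sklearn API, oneDAL-backed compute
```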

## Testing
```bash
# Native daal4py tests
pytest --verbose --pyargs daal4py
pytest tests/test_daal4py_examples.py # Native API examples
pytest tests/test_model_builders.py # Model conversion tests

# sklearn compatibility in daal4py
pytest daal4py/sklearn/tests/ # sklearn-compatible API
```

## Development Notes
- Native API provides direct oneDAL algorithm access (fastest performance)
- sklearn-compatible API in `daal4py/sklearn/` maintains full sklearn compatibility
- Model builders enable oneDAL inference for models trained with other frameworks

## Related Instructions
- `general.instructions.md` - Repository setup and build requirements
- `onedal.instructions.md` - Low-level backend that daal4py wraps
- `src.instructions.md` - Core C++/Cython implementation details
- `tests.instructions.md` - Testing native oneDAL algorithms
- See `daal4py/AGENTS.md` for detailed algorithm usage patterns
58 changes: 58 additions & 0 deletions .github/instructions/general.instructions.md
@@ -0,0 +1,58 @@
# General Repository Instructions - Intel Extension for Scikit-learn

## Repository Overview

**Intel Extension for Scikit-learn** (scikit-learn-intelex) accelerates scikit-learn by 10-100x using Intel oneDAL. Zero code changes required for existing sklearn applications.

- **Languages**: Python (70%), C++ (25%), Cython (5%)
- **Architecture**: 4-layer system (sklearnex → daal4py → onedal → Intel oneDAL C++)
- **Platforms**: Linux, Windows, macOS; CPU (x86_64, ARM), GPU (Intel via SYCL)
- **Python**: 3.9-3.13 supported
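
The "zero code changes" claim refers to the documented patching entry point. A minimal sketch (the `RandomForestClassifier` import is only an illustration of a patched estimator):
```python
from sklearnex import patch_sklearn

patch_sklearn()  # call before importing the sklearn estimators you want accelerated

# Subsequent imports of supported estimators are transparently oneDAL-accelerated
from sklearn.ensemble import RandomForestClassifier
```
Calling `sklearnex.unpatch_sklearn()` restores stock scikit-learn behavior.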

## Quick Start

**Build Setup**: See [build-config.instructions.md](build-config.instructions.md) for complete details.
```bash
export DALROOT=/path/to/onedal
python setup.py develop
```

**Testing**: See [tests.instructions.md](tests.instructions.md) for comprehensive testing.
```bash
pytest --verbose --pyargs sklearnex
```

**Code Quality**:
```bash
pre-commit run --all-files
```

## Code Standards

- **Python**: Black (line-length=90) + isort
- **C++**: clang-format version ≥14
- **Commits**: Must be signed-off (`git commit -s`)
- **Documentation**: numpydoc format

## Common Issues & Solutions

```bash
# Build failures
export NO_DIST=1 # Disable distributed mode if MPI issues
export NO_DPC=1 # Disable GPU if driver issues
python setup.py build_ext --inplace --force --abs-rpath # Linux linking

# Import/path issues
export PYTHONPATH=$(pwd) # Add repo to path
python setup.py develop # Ensure editable install
```
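
After an editable install, a quick import check helps confirm the path issues above are resolved (a sketch; the `__version__` attributes are assumed, hence the `getattr` fallback):
```python
# Minimal post-install sanity check
import daal4py
import sklearnex

print("daal4py:", getattr(daal4py, "__version__", "unknown"))
print("sklearnex:", getattr(sklearnex, "__version__", "unknown"))
```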

## Related Instructions
- `sklearnex.instructions.md` - Primary sklearn interface and patching
- `daal4py.instructions.md` - Direct oneDAL bindings and model builders
- `onedal.instructions.md` - Low-level C++ bindings
- `src.instructions.md` - Core C++/Cython implementation
- `tests.instructions.md` - Testing infrastructure and validation
- `build-config.instructions.md` - Build system and environment setup

For detailed implementation guides, see the corresponding AGENTS.md files in each directory.
63 changes: 63 additions & 0 deletions .github/instructions/onedal.instructions.md
@@ -0,0 +1,63 @@
# onedal/* - Low-Level C++ Bindings

## Purpose
Pybind11-based C++ bindings providing the bridge between Python and Intel oneDAL C++ library.

## Key Components
- `datatypes/`: Memory management and array conversions (NumPy, SYCL USM, DLPack)
- `common/`: Policy management, device selection, serialization
- `*/`: Algorithm-specific implementations (cluster/, decomposition/, linear_model/, etc.)
- `spmd/`: Distributed computing interfaces

## Memory Management
```python
# Zero-copy conversions handled automatically
import numpy as np
from onedal.cluster import DBSCAN

# NumPy arrays converted to oneDAL tables without copying
X = np.random.random((1000, 10))
model = DBSCAN().fit(X) # Automatic NumPy → oneDAL conversion
```

## Device Context

For comprehensive device management, see [onedal/AGENTS.md](../onedal/AGENTS.md).

```python
import dpctl
from onedal.cluster import DBSCAN  # X: a NumPy array, as in the example above

with dpctl.device_context("gpu:0"):
    model = DBSCAN().fit(X)
```

## Algorithm Structure
- Each algorithm module follows a consistent pattern (sketched after this list):
- `fit()` method for training
- `predict()` method for inference (where applicable)
- Parameters match oneDAL C++ API
- Results as Python objects with named attributes
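
A sketch of that pattern with a linear model (the `onedal.linear_model.LinearRegression` path and the sklearn-style attribute names are assumptions about this layer, not verified API):
```python
import numpy as np
from onedal.linear_model import LinearRegression  # assumed module path

X = np.random.random((100, 5))
y = np.random.random(100)

model = LinearRegression().fit(X, y)   # training
y_pred = model.predict(X)              # inference
print(model.coef_, model.intercept_)   # results exposed as named attributes (names assumed)
```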

## Testing
```bash
# Low-level onedal tests
pytest onedal/tests/ # Core functionality
pytest onedal/datatypes/tests/ # Memory management
pytest onedal/common/tests/ # Device/policy tests

# Algorithm-specific tests
pytest onedal/cluster/tests/test_dbscan.py # DBSCAN implementation
pytest onedal/linear_model/tests/ # Linear models
```

## Development Notes
- Direct interface to oneDAL C++ API through pybind11
- Handles memory management between Python/C++ automatically
- Provides foundation for both daal4py and sklearnex layers
- SPMD module enables distributed computing with MPI

## Related Instructions
- `general.instructions.md` - Repository setup and build requirements
- `src.instructions.md` - C++/Cython implementation that uses onedal
- `sklearnex.instructions.md` - High-level layer built on onedal
- `daal4py.instructions.md` - Alternative interface to onedal
- See `onedal/AGENTS.md` for detailed technical implementation