Commit 04c32a5

mtauraso and claude authored
Mtauraso robot instructions (#679)
* Add three-file robot instructions hierarchy

  Create HYRAX_GUIDE.md as the canonical shared reference, CLAUDE.md for Claude Code, and rewrite .github/copilot-instructions.md for Copilot.

  This deduplicates content and fixes inaccuracies identified in PRs #635, #656, and #657: Python version (>=3.11), ConfigDict (Pydantic's, not custom), verbs (internal only), primary interface (notebooks), config philosophy (three-tier "Configuration OR Code"), manifest files (compromise, not design goal), changelogs (none), Pydantic scope (data_request only), and HyraxCifarDataset spelling.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 6190043 commit 04c32a5

3 files changed

Lines changed: 447 additions & 256 deletions

.github/copilot-instructions.md

Lines changed: 80 additions & 179 deletions
@@ -1,186 +1,87 @@
1-
# Hyrax - An extensible Framework for Machine Learning in Astronomy
2-
3-
**ALWAYS follow these instructions first and only fallback to additional search and context gathering if the information here is incomplete or found to be in error.**
4-
5-
Hyrax is a Python-based tool for hunting rare and anomalous sources in large astronomical imaging surveys. It supports downloading cutouts, building latent representations, interactive visualization, and anomaly detection using PyTorch models.
6-
7-
## Working Effectively
8-
9-
### Bootstrap and Setup - NEVER CANCEL these commands
10-
- Create virtual environment: `conda create -n hyrax python=3.10 && conda activate hyrax`
11-
- Clone repository: `git clone https://github.com/lincc-frameworks/hyrax.git`
12-
- **CRITICAL**: Install dependencies using `.setup_dev.sh` script:
13-
- `cd hyrax && echo 'y' | bash .setup_dev.sh` -- NEVER CANCEL: Takes 5-15 minutes depending on network. Set timeout to 20+ minutes.
14-
- Script installs package with `pip install -e .'[dev]'` and sets up pre-commit hooks
15-
- **Note**: Script prompts for system install if no virtual environment detected - respond 'y' to proceed
16-
- **Alternative manual installation** if script fails due to network issues:
17-
- `python -m pip install --upgrade pip` first
18-
- `python -m pip install -e .'[dev]'` -- NEVER CANCEL: Takes 5-15 minutes. Set timeout to 20+ minutes.
19-
- `python -m pip install pre-commit && pre-commit install`
20-
- `conda install pandoc` (for documentation)
21-
- **Network Issues**: Installation may fail with ReadTimeoutError due to PyPI connectivity. Retry installation multiple times if needed.
22-
23-
### Build and Test Commands - NEVER CANCEL these commands
24-
- **Run tests**: `python -m pytest -m "not slow"` -- NEVER CANCEL: Takes 2-5 minutes. Set timeout to 10+ minutes.
25-
- **Run tests with coverage**: `python -m pytest --cov=hyrax --cov-report=xml -m "not slow"` -- NEVER CANCEL: Takes 3-6 minutes. Set timeout to 10+ minutes.
26-
- **Run slow tests**: `python -m pytest -m "slow"` -- NEVER CANCEL: Takes 10-20 minutes. Set timeout to 30+ minutes.
27-
- **Run all tests**: `python -m pytest` -- NEVER CANCEL: Takes 15-25 minutes. Set timeout to 45+ minutes.
28-
- **Run parallel tests**: `python -m pytest -n auto` (uses multiple cores)
29-
30-
### CLI Usage and Functionality
31-
- **Main CLI entry point**: `hyrax` command (defined in pyproject.toml as `hyrax = "hyrax_cli.main:main"`)
32-
- **Check version**: `hyrax --version`
33-
- **Get help**: `hyrax --help`
34-
- **Available verbs/commands**:
35-
- **Core operations**: `train`, `infer`, `download`, `prepare`
36-
- **Analysis**: `umap`, `visualize`, `lookup`
37-
- **Vector DB**: `save_to_database`, `database_connection`
38-
- **Utilities**: `rebuild_manifest`
39-
- **Verb-specific help**: `hyrax <verb> --help` (e.g., `hyrax train --help`)
40-
- **Configuration**: Use `--runtime-config path/to/config.toml` or `-c path/to/config.toml`
41-
- **Verb implementation**: All verbs are classes in `src/hyrax/verbs/` that inherit from `Verb` base class
42-
43-
### Development and Code Quality - NEVER CANCEL these commands
44-
- **Pre-commit checks**: `pre-commit run --all-files` -- NEVER CANCEL: Takes 3-8 minutes. Set timeout to 15+ minutes.
45-
- **Linting with ruff**: `ruff check src/ tests/` -- Takes 10-30 seconds.
46-
- **Format with ruff**: `ruff format src/ tests/` -- Takes 10-30 seconds.
47-
- **Build documentation**: `sphinx-build -M html ./docs ./_readthedocs` -- NEVER CANCEL: Takes 2-4 minutes. Set timeout to 10+ minutes.
48-
49-
## Validation and Testing
50-
51-
### CRITICAL: Always run these validation steps after making changes
52-
1. **NEVER CANCEL**: Lint and format code: `ruff check src/ tests/ && ruff format src/ tests/`
53-
2. **NEVER CANCEL**: Run unit tests: `python -m pytest -m "not slow"` (timeout: 10+ minutes)
54-
3. **NEVER CANCEL**: Run pre-commit hooks: `pre-commit run --all-files` (timeout: 15+ minutes)
55-
56-
### Manual Validation Scenarios
57-
After making changes, ALWAYS test these scenarios:
58-
1. **CLI functionality**: Run `hyrax --help` and `hyrax --version` to ensure CLI works
59-
2. **Import test**: `python -c "import hyrax; h = hyrax.Hyrax(); print('Success')"`
60-
3. **Configuration loading**: Verify config loads with `hyrax.Hyrax()` constructor
61-
4. **Verb functionality**: Test relevant verbs like `hyrax train --help` if modifying training code
62-
63-
### Test Categories and Markers
64-
- **Fast tests**: `python -m pytest -m "not slow"` (default test suite)
65-
- **Slow tests**: `python -m pytest -m "slow"` (integration and E2E tests)
66-
- **E2E tests**: Full end-to-end workflows testing models and datasets
67-
- **Test datasets**: Uses built-in datasets like `HyraxCifarDataset`, `HSCDataSet`
68-
- **Test models**: Primarily tests `HyraxAutoencoder` model
69-
- **Parallel testing**: Use `-n auto` for multiprocessing
70-
71-
### Timeout Values and Timing Expectations
72-
- **NEVER CANCEL**: Package installation: 5-15 minutes (timeout: 20+ minutes)
73-
- **NEVER CANCEL**: Unit tests: 2-5 minutes (timeout: 10+ minutes)
74-
- **NEVER CANCEL**: Full test suite: 15-25 minutes (timeout: 45+ minutes)
75-
- **NEVER CANCEL**: Pre-commit hooks: 3-8 minutes (timeout: 15+ minutes)
76-
- **NEVER CANCEL**: Documentation build: 2-4 minutes (timeout: 10+ minutes)
77-
- Code formatting/linting: 10-30 seconds
78-
79-
### Network and Installation Issues
80-
- **PyPI Connectivity**: May encounter ReadTimeoutError when installing packages
81-
- **Retry Strategy**: If installation fails, wait 1-2 minutes and retry the same command
82-
- **Alternative mirrors**: Consider using `--index-url` with alternative PyPI mirrors if persistent issues
83-
- **Dependency conflicts**: The package has complex ML dependencies (PyTorch, etc.) which may cause conflicts
84-
85-
## Repository Structure and Navigation
86-
87-
### Key Directories
88-
- `src/hyrax/`: Main package source code
89-
- `src/hyrax_cli/`: CLI entry point (`main.py`)
90-
- `src/hyrax/verbs/`: Command implementations (train, infer, download, etc.)
91-
- `src/hyrax/data_sets/`: Dataset implementations
92-
- `src/hyrax/models/`: Model definitions
93-
- `src/hyrax/vector_dbs/`: Vector database implementations (ChromaDB, Qdrant)
94-
- `tests/hyrax/`: Unit tests
95-
- `docs/`: Documentation source files
96-
- `benchmarks/`: Performance benchmarks
97-
- `example_notebooks/`: Example Jupyter notebooks
98-
99-
### Important Files
100-
- `pyproject.toml`: Project configuration, dependencies, scripts
101-
- `src/hyrax/hyrax_default_config.toml`: Default configuration template
102-
- `.setup_dev.sh`: Development environment setup script
103-
- `.pre-commit-config.yaml`: Pre-commit hook configuration
104-
- `.github/workflows/`: CI/CD pipeline definitions
105-
106-
### Configuration System
107-
- Default config: `src/hyrax/hyrax_default_config.toml`
108-
- Users can override with custom config files via `--runtime-config`
109-
- Config sections: `[general]`, `[model]`, `[train]`, `[data_set]`, `[download]`, etc.
110-
111-
## Common Tasks and Workflows
1+
# Hyrax — GitHub Copilot Instructions
2+
3+
> **`HYRAX_GUIDE.md` in the repo root is the canonical reference** for architecture,
4+
> config system, registries, and conventions. This file contains only Copilot-specific
5+
> overrides (primarily timeout handling for async execution). When in doubt, follow
6+
> `HYRAX_GUIDE.md`.
7+
8+
## Critical: Long-Running Commands
9+
10+
**NEVER CANCEL** these commands — they are expected to take minutes, not seconds:
11+
12+
| Command | Typical duration |
13+
|---------|-----------------|
14+
| `echo 'y' \| bash .setup_dev.sh` | 5–15 min |
15+
| `python -m pytest -m "not slow"` | 2–5 min |
16+
| `python -m pytest` | 15–25 min |
17+
| `pre-commit run --all-files` | 3–8 min |
18+
19+
Set timeouts generously (at least 2× the typical duration). If a command appears to
20+
hang, it is almost certainly still working.
21+
22+
**Network Issues:** Installation may fail with ReadTimeoutError due to PyPI connectivity. Retry installation
23+
multiple times if needed.
24+
25+
## Validation Workflow
26+
27+
After every change, run these three steps in order:
28+
29+
```bash
30+
ruff check src/ tests/ && ruff format src/ tests/
31+
python -m pytest -m "not slow"
32+
pre-commit run --all-files
33+
```
34+
35+
Let the linter fix style issues — do not hand-tune formatting.
36+
37+
## Quick Reference
38+
39+
| Item | Value |
40+
|------|-------|
41+
| **Python version** | ≥ 3.11 |
42+
| **Primary interface** | Jupyter notebooks |
43+
| **Secondary interface** | `hyrax` CLI (for HPC / Slurm) |
44+
| **CLI entry point** | `hyrax = "hyrax_cli.main:main"` |
45+
| **Line length** | 110 (ruff-enforced) |
46+
| **Config format** | TOML (`hyrax_default_config.toml`) |
47+
| **Config override** | `--runtime-config path/to/config.toml` or `-c` |
48+
| **Test markers** | `slow` (integration), unmarked (fast) |
49+
50+
Common verbs: `train`, `infer`, `download`, `prepare`, `umap`, `visualize`, `lookup`,
51+
`save_to_database`, `rebuild_manifest`, `to_onnx`, `test`, `search`, `engine`,
52+
`database_connection`, `model`.
53+
54+
## Key Pitfalls
55+
56+
See `HYRAX_GUIDE.md` for the full list. Key points:
57+
58+
- **`key = false` means `None`** — TOML has no null. Hyrax uses `false` as a sentinel
59+
for "not set." Code must treat `False` as `None` for these keys.
60+
- **`ConfigDict` is Pydantic's** — the runtime config is an ordinary mutable `dict`,
61+
not a custom immutable wrapper.
62+
- **Verbs are internal only** — external plugins register models and datasets, not verbs.
63+
- **Manifest files** — ask the user before extending this pattern.
64+
- **Pydantic validation** — do not add to new config sections.
65+
66+
## Repository Layout
67+
68+
```
69+
src/hyrax/ Main package (models, data_sets, verbs, config_schemas, vector_dbs)
70+
src/hyrax_cli/ CLI entry point
71+
tests/hyrax/ Test suite
72+
docs/ Sphinx documentation
73+
example_notebooks/ Jupyter examples
74+
benchmarks/ ASV performance benchmarks
75+
```
76+
77+
See `HYRAX_GUIDE.md` for detailed structure and architecture.
112 78

113 79
### Adding New Features
80+
Only skip these if specifically requested by the user, otherwise:
81+
114 82
1. **ALWAYS** run full validation first: `python -m pytest -m "not slow"`
115 83
2. Make changes in appropriate `src/hyrax/` subdirectory
116-
3. Add tests in `tests/hyrax/` following existing patterns
84+
3. Add tests in `tests/hyrax/test_<name>.py` following existing patterns
117 85
4. **ALWAYS** run: `ruff format src/ tests/ && ruff check src/ tests/`
118 86
5. **ALWAYS** run: `python -m pytest -m "not slow"` (timeout: 10+ minutes)
119 87
6. **ALWAYS** run: `pre-commit run --all-files` (timeout: 15+ minutes)
120-
121-
### Working with Models
122-
- Models defined in `src/hyrax/models/`
123-
- Built-in models: `HyraxAutoencoder`, `HyraxCNN`
124-
- Model registry system automatically discovers models
125-
- General model configuration in `[model]` section of config files
126-
- Configurations for specific models in `[model.<ModelName>]` sections
127-
- Training via `hyrax train` command
128-
- Export to ONNX format supported
129-
130-
### Working with Data
131-
- Data loaders in `src/hyrax/data_sets/`
132-
- Built-in datasets: `HSCDataSet`, `HyraxCifarDataset`, `LSSTDataset`, `FitsImageDataSet`
133-
- Dataset splits: train/validation/test controlled by config
134-
- Configuration in `[data_set]` section
135-
- Default data directory: `./data/`
136-
- Sample data includes HSC1k dataset for testing
137-
138-
### Working with Vector Databases
139-
- Implementations in `src/hyrax/vector_dbs/`
140-
- Supported: ChromaDB, Qdrant
141-
- Commands: `save_to_database`, `database_connection`
142-
- Configuration in `[vector_db]` section
143-
144-
## Notebook Development
145-
- Jupyter integration via `holoviews`, `bokeh` for visualizations
146-
- Interactive visualization via `hyrax visualize` verb
147-
- Pre-executed examples in `docs/pre_executed/`
148-
149-
## CI/CD and GitHub Workflows
150-
- Main workflows in `.github/workflows/`
151-
- **Testing**: `testing-and-coverage.yml` runs on PRs and main branch
152-
- **Smoke test**: `smoke-test.yml` runs daily
153-
- **Documentation**: `build-documentation.yml` builds docs
154-
- **Benchmarks**: ASV benchmarks via `asv-*.yml` workflows
155-
- **Pre-commit**: Automated via `pre-commit-ci.yml`
156-
157-
## Troubleshooting
158-
- **Import errors**: Ensure `pip install -e .'[dev]'` completed successfully
159-
- **Network timeouts during install**: Retry installation multiple times, may require 3-5 attempts due to PyPI connectivity issues
160-
- **ReadTimeoutError**: Common during installation - wait 1-2 minutes and retry the same pip command
161-
- **CLI not found**: Verify installation with `pip list | grep hyrax`
162-
- **Tests failing**: Check if in virtual environment and dependencies installed
163-
- **Pre-commit issues**: Run `pre-commit install` if hooks not working
164-
- **Permission issues**: Use `--user` flag with pip if encountering permission errors
165-
- **Virtual environment**: Always use conda/venv to avoid system Python conflicts
166-
167-
## Performance Notes
168-
- Vector database operations can be slow with large datasets
169-
- Benchmarks available in `benchmarks/` directory (run with `asv` tool)
170-
- Use `--timeout` parameters appropriately for long-running operations
171-
- ChromaDB performance degrades with vectors >10,000 elements
172-
- UMAP fitting limited to 1024 samples by default for performance
173-
- Benchmark tests include timing for CLI help commands, object construction, and vector DB operations
174-
175-
## Common Command Reference
176-
```bash
177-
# Full development setup
178-
conda create -n hyrax python=3.10 && conda activate hyrax
179-
git clone https://github.com/lincc-frameworks/hyrax.git && cd hyrax
180-
echo 'y' | bash .setup_dev.sh
181-
182-
# Quick validation workflow
183-
ruff check src/ tests/ && ruff format src/ tests/
184-
python -m pytest -m "not slow"
185-
pre-commit run --all-files
186-
```
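
The new instructions ask for timeouts of at least 2× the typical duration and for the validation steps to run in order. A minimal sketch of how a script or agent might apply that rule is shown here. `VALIDATION_STEPS`, `timeout_for`, and `run_validation` are hypothetical names introduced for illustration and are not part of Hyrax. The durations are the upper ends of the ranges quoted in the diff, and the 30-second figure for ruff comes from the removed text. The file chains the two ruff commands with `&&`; the sketch runs them as separate steps, which stops at the first failure in the same way.

```python
import shlex
import subprocess

# (command, upper end of the typical duration in seconds).
# Values are copied from the timing tables in the diff; adjust as needed.
VALIDATION_STEPS = [
    ("ruff check src/ tests/", 30),
    ("ruff format src/ tests/", 30),
    ('python -m pytest -m "not slow"', 5 * 60),
    ("pre-commit run --all-files", 8 * 60),
]


def timeout_for(typical_seconds: int, factor: int = 2) -> int:
    """Return a generous timeout: `factor` times the typical duration."""
    return typical_seconds * factor


def run_validation(steps=VALIDATION_STEPS, runner=subprocess.run) -> None:
    """Run each step in order; a failing or timed-out step raises and stops the run."""
    for command, typical in steps:
        # check=True raises CalledProcessError on a non-zero exit code,
        # and timeout raises TimeoutExpired if the step overruns.
        runner(shlex.split(command), check=True, timeout=timeout_for(typical))
```

Calling `run_validation()` from the repository root would execute the real commands. The `runner` parameter exists only so the sequencing can be exercised without invoking them.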
