1 | | -# Hyrax - An extensible Framework for Machine Learning in Astronomy |
2 | | - |
3 | | -**ALWAYS follow these instructions first and only fallback to additional search and context gathering if the information here is incomplete or found to be in error.** |
4 | | - |
5 | | -Hyrax is a Python-based tool for hunting rare and anomalous sources in large astronomical imaging surveys. It supports downloading cutouts, building latent representations, interactive visualization, and anomaly detection using PyTorch models. |
6 | | - |
7 | | -## Working Effectively |
8 | | - |
9 | | -### Bootstrap and Setup - NEVER CANCEL these commands |
10 | | -- Create virtual environment: `conda create -n hyrax python=3.10 && conda activate hyrax` |
11 | | -- Clone repository: `git clone https://github.com/lincc-frameworks/hyrax.git` |
12 | | -- **CRITICAL**: Install dependencies using `.setup_dev.sh` script: |
13 | | - - `cd hyrax && echo 'y' | bash .setup_dev.sh` -- NEVER CANCEL: Takes 5-15 minutes depending on network. Set timeout to 20+ minutes. |
14 | | - - Script installs package with `pip install -e .'[dev]'` and sets up pre-commit hooks |
15 | | - - **Note**: Script prompts for system install if no virtual environment detected - respond 'y' to proceed |
16 | | - - **Alternative manual installation** if script fails due to network issues: |
17 | | - - `python -m pip install --upgrade pip` first |
18 | | - - `python -m pip install -e .'[dev]'` -- NEVER CANCEL: Takes 5-15 minutes. Set timeout to 20+ minutes. |
19 | | - - `python -m pip install pre-commit && pre-commit install` |
20 | | - - `conda install pandoc` (for documentation) |
21 | | - - **Network Issues**: Installation may fail with ReadTimeoutError due to PyPI connectivity. Retry installation multiple times if needed. |
22 | | - |
23 | | -### Build and Test Commands - NEVER CANCEL these commands |
24 | | -- **Run tests**: `python -m pytest -m "not slow"` -- NEVER CANCEL: Takes 2-5 minutes. Set timeout to 10+ minutes. |
25 | | -- **Run tests with coverage**: `python -m pytest --cov=hyrax --cov-report=xml -m "not slow"` -- NEVER CANCEL: Takes 3-6 minutes. Set timeout to 10+ minutes. |
26 | | -- **Run slow tests**: `python -m pytest -m "slow"` -- NEVER CANCEL: Takes 10-20 minutes. Set timeout to 30+ minutes. |
27 | | -- **Run all tests**: `python -m pytest` -- NEVER CANCEL: Takes 15-25 minutes. Set timeout to 45+ minutes. |
28 | | -- **Run parallel tests**: `python -m pytest -n auto` (uses multiple cores) |
29 | | - |
30 | | -### CLI Usage and Functionality |
31 | | -- **Main CLI entry point**: `hyrax` command (defined in pyproject.toml as `hyrax = "hyrax_cli.main:main"`) |
32 | | -- **Check version**: `hyrax --version` |
33 | | -- **Get help**: `hyrax --help` |
34 | | -- **Available verbs/commands**: |
35 | | - - **Core operations**: `train`, `infer`, `download`, `prepare` |
36 | | - - **Analysis**: `umap`, `visualize`, `lookup` |
37 | | - - **Vector DB**: `save_to_database`, `database_connection` |
38 | | - - **Utilities**: `rebuild_manifest` |
39 | | -- **Verb-specific help**: `hyrax <verb> --help` (e.g., `hyrax train --help`) |
40 | | -- **Configuration**: Use `--runtime-config path/to/config.toml` or `-c path/to/config.toml` |
41 | | -- **Verb implementation**: All verbs are classes in `src/hyrax/verbs/` that inherit from `Verb` base class |
42 | | - |
43 | | -### Development and Code Quality - NEVER CANCEL these commands |
44 | | -- **Pre-commit checks**: `pre-commit run --all-files` -- NEVER CANCEL: Takes 3-8 minutes. Set timeout to 15+ minutes. |
45 | | -- **Linting with ruff**: `ruff check src/ tests/` -- Takes 10-30 seconds. |
46 | | -- **Format with ruff**: `ruff format src/ tests/` -- Takes 10-30 seconds. |
47 | | -- **Build documentation**: `sphinx-build -M html ./docs ./_readthedocs` -- NEVER CANCEL: Takes 2-4 minutes. Set timeout to 10+ minutes. |
48 | | - |
49 | | -## Validation and Testing |
50 | | - |
51 | | -### CRITICAL: Always run these validation steps after making changes |
52 | | -1. **NEVER CANCEL**: Lint and format code: `ruff check src/ tests/ && ruff format src/ tests/` |
53 | | -2. **NEVER CANCEL**: Run unit tests: `python -m pytest -m "not slow"` (timeout: 10+ minutes) |
54 | | -3. **NEVER CANCEL**: Run pre-commit hooks: `pre-commit run --all-files` (timeout: 15+ minutes) |
55 | | - |
56 | | -### Manual Validation Scenarios |
57 | | -After making changes, ALWAYS test these scenarios: |
58 | | -1. **CLI functionality**: Run `hyrax --help` and `hyrax --version` to ensure CLI works |
59 | | -2. **Import test**: `python -c "import hyrax; h = hyrax.Hyrax(); print('Success')"` |
60 | | -3. **Configuration loading**: Verify config loads with `hyrax.Hyrax()` constructor |
61 | | -4. **Verb functionality**: Test relevant verbs like `hyrax train --help` if modifying training code |
62 | | - |
63 | | -### Test Categories and Markers |
64 | | -- **Fast tests**: `python -m pytest -m "not slow"` (default test suite) |
65 | | -- **Slow tests**: `python -m pytest -m "slow"` (integration and E2E tests) |
66 | | -- **E2E tests**: Full end-to-end workflows testing models and datasets |
67 | | -- **Test datasets**: Uses built-in datasets like `HyraxCifarDataset`, `HSCDataSet` |
68 | | -- **Test models**: Primarily tests `HyraxAutoencoder` model |
69 | | -- **Parallel testing**: Use `-n auto` for multiprocessing |
70 | | - |
71 | | -### Timeout Values and Timing Expectations |
72 | | -- **NEVER CANCEL**: Package installation: 5-15 minutes (timeout: 20+ minutes) |
73 | | -- **NEVER CANCEL**: Unit tests: 2-5 minutes (timeout: 10+ minutes) |
74 | | -- **NEVER CANCEL**: Full test suite: 15-25 minutes (timeout: 45+ minutes) |
75 | | -- **NEVER CANCEL**: Pre-commit hooks: 3-8 minutes (timeout: 15+ minutes) |
76 | | -- **NEVER CANCEL**: Documentation build: 2-4 minutes (timeout: 10+ minutes) |
77 | | -- Code formatting/linting: 10-30 seconds |
78 | | - |
79 | | -### Network and Installation Issues |
80 | | -- **PyPI Connectivity**: May encounter ReadTimeoutError when installing packages |
81 | | -- **Retry Strategy**: If installation fails, wait 1-2 minutes and retry the same command |
82 | | -- **Alternative mirrors**: Consider using `--index-url` with alternative PyPI mirrors if persistent issues |
83 | | -- **Dependency conflicts**: The package has complex ML dependencies (PyTorch, etc.) which may cause conflicts |
84 | | - |
85 | | -## Repository Structure and Navigation |
86 | | - |
87 | | -### Key Directories |
88 | | -- `src/hyrax/`: Main package source code |
89 | | -- `src/hyrax_cli/`: CLI entry point (`main.py`) |
90 | | -- `src/hyrax/verbs/`: Command implementations (train, infer, download, etc.) |
91 | | -- `src/hyrax/data_sets/`: Dataset implementations |
92 | | -- `src/hyrax/models/`: Model definitions |
93 | | -- `src/hyrax/vector_dbs/`: Vector database implementations (ChromaDB, Qdrant) |
94 | | -- `tests/hyrax/`: Unit tests |
95 | | -- `docs/`: Documentation source files |
96 | | -- `benchmarks/`: Performance benchmarks |
97 | | -- `example_notebooks/`: Example Jupyter notebooks |
98 | | - |
99 | | -### Important Files |
100 | | -- `pyproject.toml`: Project configuration, dependencies, scripts |
101 | | -- `src/hyrax/hyrax_default_config.toml`: Default configuration template |
102 | | -- `.setup_dev.sh`: Development environment setup script |
103 | | -- `.pre-commit-config.yaml`: Pre-commit hook configuration |
104 | | -- `.github/workflows/`: CI/CD pipeline definitions |
105 | | - |
106 | | -### Configuration System |
107 | | -- Default config: `src/hyrax/hyrax_default_config.toml` |
108 | | -- Users can override with custom config files via `--runtime-config` |
109 | | -- Config sections: `[general]`, `[model]`, `[train]`, `[data_set]`, `[download]`, etc. |
110 | | - |
111 | | -## Common Tasks and Workflows |
| 1 | +# Hyrax — GitHub Copilot Instructions |
| 2 | + |
| 3 | +> **`HYRAX_GUIDE.md` in the repo root is the canonical reference** for architecture, |
| 4 | +> config system, registries, and conventions. This file contains only Copilot-specific |
| 5 | +> overrides (primarily timeout handling for async execution). When in doubt, follow |
| 6 | +> `HYRAX_GUIDE.md`. |
| 7 | +
| 8 | +## Critical: Long-Running Commands |
| 9 | + |
| 10 | +**NEVER CANCEL** these commands — they are expected to take minutes, not seconds: |
| 11 | + |
| 12 | +| Command | Typical duration | |
| 13 | +|---------|-----------------| |
| 14 | +| `echo 'y' \| bash .setup_dev.sh` | 5–15 min | |
| 15 | +| `python -m pytest -m "not slow"` | 2–5 min | |
| 16 | +| `python -m pytest` | 15–25 min | |
| 17 | +| `pre-commit run --all-files` | 3–8 min | |
| 18 | + |
| 19 | +Set timeouts generously (at least 2× the typical duration). If a command appears to |
| 20 | +hang, it is almost certainly still working. |
| 21 | + |
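The "at least 2× the typical duration" rule can be sketched as a small helper for a script or agent that wraps these commands. This is only an illustration: `TYPICAL_MAX_MINUTES`, `timeout_seconds`, and `run_patiently` are hypothetical names that do not exist in Hyrax, and the durations are simply the upper bounds copied from the table.

```python
import subprocess

# Upper bound of each "typical duration" from the table, in minutes.
# The keys are hypothetical labels chosen for this sketch.
TYPICAL_MAX_MINUTES = {
    "setup": 15,
    "fast_tests": 5,
    "all_tests": 25,
    "pre_commit": 8,
}


def timeout_seconds(task: str, factor: float = 2.0) -> int:
    """Return `factor` times the typical upper bound for `task`, in seconds."""
    return int(TYPICAL_MAX_MINUTES[task] * 60 * factor)


def run_patiently(command: list[str], task: str) -> subprocess.CompletedProcess:
    """Run `command` and wait up to the generous timeout before giving up."""
    return subprocess.run(command, timeout=timeout_seconds(task), check=False)
```

Under these assumptions, `run_patiently(["python", "-m", "pytest", "-m", "not slow"], "fast_tests")` would wait up to 10 minutes before raising `subprocess.TimeoutExpired`.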
| 22 | +**Network Issues:** Installation may fail with ReadTimeoutError due to PyPI connectivity. Retry installation |
| 23 | +multiple times if needed. |
| 24 | + |
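A retry loop for a flaky install step might look like the sketch below. The function name `retry_command`, the default of three attempts, and the 60-second pause are assumptions made for illustration; the instructions themselves only say to retry as needed.

```python
import subprocess
import time
from typing import Callable, Sequence


def retry_command(
    command: Sequence[str],
    attempts: int = 3,
    wait_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run `command` up to `attempts` times; return True on the first success."""
    for attempt in range(1, attempts + 1):
        result = subprocess.run(list(command), check=False)
        if result.returncode == 0:
            return True
        if attempt < attempts:
            # Pause before the next try; no pause after the final failure.
            sleep(wait_seconds)
    return False
```

For example, `retry_command(["python", "-m", "pip", "install", "-e", ".[dev]"])` would re-run the editable install after a failure such as a `ReadTimeoutError`. The `sleep` parameter exists only so the pause can be replaced in tests.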
| 25 | +## Validation Workflow |
| 26 | + |
| 27 | +After every change, run these three steps in order: |
| 28 | + |
| 29 | +```bash |
| 30 | +ruff check src/ tests/ && ruff format src/ tests/ |
| 31 | +python -m pytest -m "not slow" |
| 32 | +pre-commit run --all-files |
| 33 | +``` |
| 34 | + |
| 35 | +Let the linter fix style issues — do not hand-tune formatting. |
| 36 | + |
| 37 | +## Quick Reference |
| 38 | + |
| 39 | +| Item | Value | |
| 40 | +|------|-------| |
| 41 | +| **Python version** | ≥ 3.11 | |
| 42 | +| **Primary interface** | Jupyter notebooks | |
| 43 | +| **Secondary interface** | `hyrax` CLI (for HPC / Slurm) | |
| 44 | +| **CLI entry point** | `hyrax = "hyrax_cli.main:main"` | |
| 45 | +| **Line length** | 110 (ruff-enforced) | |
| 46 | +| **Config format** | TOML (`hyrax_default_config.toml`) | |
| 47 | +| **Config override** | `--runtime-config path/to/config.toml` or `-c` | |
| 48 | +| **Test markers** | `slow` (integration), unmarked (fast) | |
| 49 | + |
| 50 | +Common verbs: `train`, `infer`, `download`, `prepare`, `umap`, `visualize`, `lookup`, |
| 51 | +`save_to_database`, `rebuild_manifest`, `to_onnx`, `test`, `search`, `engine`, |
| 52 | +`database_connection`, `model`. |
| 53 | + |
| 54 | +## Key Pitfalls |
| 55 | + |
| 56 | +See `HYRAX_GUIDE.md` for the full list. Key points: |
| 57 | + |
| 58 | +- **`key = false` means `None`** — TOML has no null. Hyrax uses `false` as a sentinel |
| 59 | + for "not set." Code must treat `False` as `None` for these keys. |
| 60 | +- **`ConfigDict` is Pydantic's** — the runtime config is an ordinary mutable `dict`, |
| 61 | + not a custom immutable wrapper. |
| 62 | +- **Verbs are internal only** — external plugins register models and datasets, not verbs. |
| 63 | +- **Manifest files** — ask the user before extending this pattern. |
| 64 | +- **Pydantic validation** — do not add to new config sections. |
| 65 | + |
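The `false`-as-`None` pitfall above can be illustrated with a minimal sketch. The helper `none_if_false` and the config keys shown are hypothetical and not part of Hyrax; the sketch assumes the config has already been parsed into a plain `dict`, and it should only be applied to keys that use `false` as the "not set" sentinel.

```python
from typing import Any, Optional


def none_if_false(value: Any) -> Optional[Any]:
    """Map the TOML sentinel `false` to None; pass every other value through."""
    # Identity check on purpose: 0, "" and [] are real values, not "unset".
    return None if value is False else value


# Hypothetical config fragment, as it might look after TOML parsing.
config = {"example_section": {"optional_path": False, "batch_size": 0}}
section = config["example_section"]

optional_path = none_if_false(section["optional_path"])  # treated as None
batch_size = none_if_false(section["batch_size"])  # stays 0
```

The identity comparison (`is False`) rather than a truthiness check is the design choice that matters here: a plain `if not value` would also discard legitimate values such as `0` or an empty string.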
| 66 | +## Repository Layout |
| 67 | + |
| 68 | +``` |
| 69 | +src/hyrax/ Main package (models, data_sets, verbs, config_schemas, vector_dbs) |
| 70 | +src/hyrax_cli/ CLI entry point |
| 71 | +tests/hyrax/ Test suite |
| 72 | +docs/ Sphinx documentation |
| 73 | +example_notebooks/ Jupyter examples |
| 74 | +benchmarks/ ASV performance benchmarks |
| 75 | +``` |
| 76 | + |
| 77 | +See `HYRAX_GUIDE.md` for detailed structure and architecture. |
112 | 78 |
113 | 79 | ### Adding New Features |
| 80 | +Skip these steps only if the user specifically asks; otherwise:
| 81 | + |
114 | 82 | 1. **ALWAYS** run full validation first: `python -m pytest -m "not slow"` |
115 | 83 | 2. Make changes in appropriate `src/hyrax/` subdirectory |
116 | | -3. Add tests in `tests/hyrax/` following existing patterns |
| 84 | +3. Add tests in `tests/hyrax/test_<name>.py` following existing patterns |
117 | 85 | 4. **ALWAYS** run: `ruff format src/ tests/ && ruff check src/ tests/` |
118 | 86 | 5. **ALWAYS** run: `python -m pytest -m "not slow"` (timeout: 10+ minutes) |
119 | 87 | 6. **ALWAYS** run: `pre-commit run --all-files` (timeout: 15+ minutes) |
120 | | - |
121 | | -### Working with Models |
122 | | -- Models defined in `src/hyrax/models/` |
123 | | -- Built-in models: `HyraxAutoencoder`, `HyraxCNN` |
124 | | -- Model registry system automatically discovers models |
125 | | -- General model configuration in `[model]` section of config files |
126 | | -- Configurations for specific models in `[model.<ModelName>]` sections |
127 | | -- Training via `hyrax train` command |
128 | | -- Export to ONNX format supported |
129 | | - |
130 | | -### Working with Data |
131 | | -- Data loaders in `src/hyrax/data_sets/` |
132 | | -- Built-in datasets: `HSCDataSet`, `HyraxCifarDataset`, `LSSTDataset`, `FitsImageDataSet` |
133 | | -- Dataset splits: train/validation/test controlled by config |
134 | | -- Configuration in `[data_set]` section |
135 | | -- Default data directory: `./data/` |
136 | | -- Sample data includes HSC1k dataset for testing |
137 | | - |
138 | | -### Working with Vector Databases |
139 | | -- Implementations in `src/hyrax/vector_dbs/` |
140 | | -- Supported: ChromaDB, Qdrant |
141 | | -- Commands: `save_to_database`, `database_connection` |
142 | | -- Configuration in `[vector_db]` section |
143 | | - |
144 | | -## Notebook Development |
145 | | -- Jupyter integration via `holoviews`, `bokeh` for visualizations |
146 | | -- Interactive visualization via `hyrax visualize` verb |
147 | | -- Pre-executed examples in `docs/pre_executed/` |
148 | | - |
149 | | -## CI/CD and GitHub Workflows |
150 | | -- Main workflows in `.github/workflows/` |
151 | | -- **Testing**: `testing-and-coverage.yml` runs on PRs and main branch |
152 | | -- **Smoke test**: `smoke-test.yml` runs daily |
153 | | -- **Documentation**: `build-documentation.yml` builds docs |
154 | | -- **Benchmarks**: ASV benchmarks via `asv-*.yml` workflows |
155 | | -- **Pre-commit**: Automated via `pre-commit-ci.yml` |
156 | | - |
157 | | -## Troubleshooting |
158 | | -- **Import errors**: Ensure `pip install -e .'[dev]'` completed successfully |
159 | | -- **Network timeouts during install**: Retry installation multiple times, may require 3-5 attempts due to PyPI connectivity issues |
160 | | -- **ReadTimeoutError**: Common during installation - wait 1-2 minutes and retry the same pip command |
161 | | -- **CLI not found**: Verify installation with `pip list | grep hyrax` |
162 | | -- **Tests failing**: Check if in virtual environment and dependencies installed |
163 | | -- **Pre-commit issues**: Run `pre-commit install` if hooks not working |
164 | | -- **Permission issues**: Use `--user` flag with pip if encountering permission errors |
165 | | -- **Virtual environment**: Always use conda/venv to avoid system Python conflicts |
166 | | - |
167 | | -## Performance Notes |
168 | | -- Vector database operations can be slow with large datasets |
169 | | -- Benchmarks available in `benchmarks/` directory (run with `asv` tool) |
170 | | -- Use `--timeout` parameters appropriately for long-running operations |
171 | | -- ChromaDB performance degrades with vectors >10,000 elements |
172 | | -- UMAP fitting limited to 1024 samples by default for performance |
173 | | -- Benchmark tests include timing for CLI help commands, object construction, and vector DB operations |
174 | | - |
175 | | -## Common Command Reference |
176 | | -```bash |
177 | | -# Full development setup |
178 | | -conda create -n hyrax python=3.10 && conda activate hyrax |
179 | | -git clone https://github.com/lincc-frameworks/hyrax.git && cd hyrax |
180 | | -echo 'y' | bash .setup_dev.sh |
181 | | - |
182 | | -# Quick validation workflow |
183 | | -ruff check src/ tests/ && ruff format src/ tests/ |
184 | | -python -m pytest -m "not slow" |
185 | | -pre-commit run --all-files |
186 | | -``` |