From 453f6e1a82787a5f0e3c50260c068beb6c68d5d9 Mon Sep 17 00:00:00 2001
From: Dimitri Yatsenko
Date: Mon, 10 Nov 2025 11:56:14 -0600
Subject: [PATCH 1/9] separate repo from the earlier Allen Institute repo. Make this repo canonical.

---
 README.md | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index ab3c446..bff4687 100644
--- a/README.md
+++ b/README.md
@@ -1,15 +1,18 @@
-[![PyPI version](https://badge.fury.io/py/anscombe-transform.svg)](https://badge.fury.io/py/anscombe-transform) ![tests](https://github.com/datajoint/anscombe-transform/actions/workflows/tests.yaml/badge.svg)
+[![PyPI version](https://badge.fury.io/py/anscombe-transform.svg)](https://badge.fury.io/py/anscombe-transform) ![tests](https://github.com/datajoint/anscombe-transform/actions/workflows/test.yml/badge.svg)
 
 # Anscombe transform
 
-This codec is designed for compressing image recordings with Poisson noise, which are produced by photon-limited modalities such multiphoton microscopy, radiography, and astronomy.
+This codec is designed for compressing image recordings with Poisson noise such as in microscopy, radiography, and astronomy.
 
-The codec assumes that the video is linearly encoded with a potential offset (`zero_level`) and that the `photon_sensitivity` (the average increase in intensity per photon) is either already known or can be accurately estimated from the data.
+**Status:** This is the official and actively maintained Anscombe transform codec for Zarr/Numcodecs, maintained by DataJoint.
+It originated as a fork of [AllenNeuralDynamics/poisson-numcodecs](https://github.com/AllenNeuralDynamics/poisson-numcodecs) and earlier develpoments at [https://github.com/datajoint/compress-multiphoton]. It has since diverged significantly. New users should rely on this repository as the canonical source.
+
+The codec assumes that the video is linearly encoded with a potential offset (`zero_level`) and that the `conversion_gain` (also called `photon_sensitivity`)—the average increase in intensity per photon—is either already known or can be accurately estimated from the data.
 
 The codec re-quantizes the grayscale efficiently with a square-root-like transformation to equalize the noise variance across the grayscale levels: the [Anscombe Transform](https://en.wikipedia.org/wiki/Anscombe_transform). This results in a smaller number of unique grayscale levels and significant improvements in the compressibility of the data without sacrificing signal accuracy.
 
-To use the codec, one must supply two pieces of information: `zero_level` (the input value corresponding to the absence of light) and `photon_sensitivity` (levels/photon).
+To use the codec, one must supply two pieces of information: `zero_level` (the input value corresponding to the absence of light) and `conversion_gain` (levels/photon).
 
 The codec is used in Zarr as a filter prior to compression.

From b2e0206506d391b64a5936e19f32ba1199d146a0 Mon Sep 17 00:00:00 2001
From: Dimitri Yatsenko
Date: Mon, 10 Nov 2025 12:05:14 -0600
Subject: [PATCH 2/9] minor

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index bff4687..4200379 100644
--- a/README.md
+++ b/README.md
@@ -5,7 +5,7 @@
 This codec is designed for compressing image recordings with Poisson noise such as in microscopy, radiography, and astronomy.
 
 **Status:** This is the official and actively maintained Anscombe transform codec for Zarr/Numcodecs, maintained by DataJoint.
-It originated as a fork of [AllenNeuralDynamics/poisson-numcodecs](https://github.com/AllenNeuralDynamics/poisson-numcodecs) and earlier develpoments at [https://github.com/datajoint/compress-multiphoton]. It has since diverged significantly. New users should rely on this repository as the canonical source. +It originated as a fork of [AllenNeuralDynamics/poisson-numcodecs](https://github.com/AllenNeuralDynamics/poisson-numcodecs) and earlier develpoments at https://github.com/datajoint/compress-multiphoton. It has since diverged significantly. New users should rely on this repository as the canonical source. The codec assumes that the video is linearly encoded with a potential offset (`zero_level`) and that the `conversion_gain` (also called `photon_sensitivity`)—the average increase in intensity per photon—is either already known or can be accurately estimated from the data. From 11c2ba410ea7a1a40b3231e57b048d0603494a17 Mon Sep 17 00:00:00 2001 From: Dimitri Yatsenko Date: Mon, 10 Nov 2025 13:42:43 -0600 Subject: [PATCH 3/9] add CITATION.cff --- CITATION.cff | 53 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 53 insertions(+) create mode 100644 CITATION.cff diff --git a/CITATION.cff b/CITATION.cff new file mode 100644 index 0000000..85e9571 --- /dev/null +++ b/CITATION.cff @@ -0,0 +1,53 @@ +cff-version: 1.2.0 +title: Anscombe Transform Codec +message: If you use this software, please cite it. +version: 1.0.0 +repository-code: https://github.com/datajoint/anscombe-transform +abstract: > + Codec implementation of the Anscombe transform for compressing photon-limited + microscopy, radiography, and astronomy image recordings, published by DataJoint. +authors: + - family-names: Yatsenko + given-names: Dimitri + email: dimitr@datajoint.com + - family-names: Lecoq + given-names: Jerome + email: jeromel@alleninstitute.org + - family-names: Bennett + given-names: Davis + email: davis.v.bennett@gmail.com +contact: + - family-names: Yatsenko + given-names: Dimitri + email: dimitr@datajoint.com +license: MIT +preferred-citation: + type: article + title: Standardized measurements for monitoring and comparing multiphoton microscope systems + authors: + - family-names: Lees + given-names: Robert M. + - family-names: Bianco + given-names: Isaac H. + - family-names: Campbell + given-names: Robert A. A. + - family-names: Orlova + given-names: Natalia + - family-names: Peterka + given-names: Darcy S. + - family-names: Pichler + given-names: Bruno + - family-names: Smith + given-names: Spencer LaVere + - family-names: Yatsenko + given-names: Dimitri + - family-names: Yu + given-names: Che-Hang + - family-names: Packer + given-names: Adam M. 
+ journal: Nature Protocols + volume: '20' + pages: 1-38 + date-published: 2025-03-17 + doi: 10.1038/s41596-024-01120-w + url: https://doi.org/10.1038/s41596-024-01120-w From 0d26ecd4b0a5a4d154a52649de74f7950ee279de Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Mon, 10 Nov 2025 20:01:40 +0000 Subject: [PATCH 4/9] Initial plan From 253db136966dc737bc8eb46b52fd66b5721c674e Mon Sep 17 00:00:00 2001 From: Dimitri Yatsenko Date: Mon, 10 Nov 2025 14:02:08 -0600 Subject: [PATCH 5/9] Fix typo in README.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 4200379..062ab36 100644 --- a/README.md +++ b/README.md @@ -5,7 +5,7 @@ This codec is designed for compressing image recordings with Poisson noise such as in microscopy, radiography, and astronomy. **Status:** This is the official and actively maintained Anscombe transform codec for Zarr/Numcodecs, maintained by DataJoint. -It originated as a fork of [AllenNeuralDynamics/poisson-numcodecs](https://github.com/AllenNeuralDynamics/poisson-numcodecs) and earlier develpoments at https://github.com/datajoint/compress-multiphoton. It has since diverged significantly. New users should rely on this repository as the canonical source. +It originated as a fork of [AllenNeuralDynamics/poisson-numcodecs](https://github.com/AllenNeuralDynamics/poisson-numcodecs) and earlier developments at https://github.com/datajoint/compress-multiphoton. It has since diverged significantly. New users should rely on this repository as the canonical source. The codec assumes that the video is linearly encoded with a potential offset (`zero_level`) and that the `conversion_gain` (also called `photon_sensitivity`)—the average increase in intensity per photon—is either already known or can be accurately estimated from the data. From a06745128ba6fbd1423738ab2e432e4cd0c03326 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Mon, 10 Nov 2025 20:04:10 +0000 Subject: [PATCH 6/9] =?UTF-8?q?Fix=20spelling=20error:=20develpoments=20?= =?UTF-8?q?=E2=86=92=20developments?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: dimitri-yatsenko <1096169+dimitri-yatsenko@users.noreply.github.com> --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 4200379..062ab36 100644 --- a/README.md +++ b/README.md @@ -5,7 +5,7 @@ This codec is designed for compressing image recordings with Poisson noise such as in microscopy, radiography, and astronomy. **Status:** This is the official and actively maintained Anscombe transform codec for Zarr/Numcodecs, maintained by DataJoint. -It originated as a fork of [AllenNeuralDynamics/poisson-numcodecs](https://github.com/AllenNeuralDynamics/poisson-numcodecs) and earlier develpoments at https://github.com/datajoint/compress-multiphoton. It has since diverged significantly. New users should rely on this repository as the canonical source. +It originated as a fork of [AllenNeuralDynamics/poisson-numcodecs](https://github.com/AllenNeuralDynamics/poisson-numcodecs) and earlier developments at https://github.com/datajoint/compress-multiphoton. It has since diverged significantly. New users should rely on this repository as the canonical source. 
 The codec assumes that the video is linearly encoded with a potential offset (`zero_level`) and that the `conversion_gain` (also called `photon_sensitivity`)—the average increase in intensity per photon—is either already known or can be accurately estimated from the data.

From 5463d875c5482403a17eb43039e8ea72926b9d4e Mon Sep 17 00:00:00 2001
From: Dimitri Yatsenko
Date: Mon, 10 Nov 2025 14:59:48 -0600
Subject: [PATCH 7/9] correct tests and tutorial example.

---
 docs/examples/workbook.md | 30 ++++++++++++++++--------------
 pyproject.toml            |  2 +-
 tests/test_zarr.py        | 39 ++++++++++++++++++++++++++++++---------
 3 files changed, 47 insertions(+), 24 deletions(-)

diff --git a/docs/examples/workbook.md b/docs/examples/workbook.md
index 2d918f8..3970a97 100644
--- a/docs/examples/workbook.md
+++ b/docs/examples/workbook.md
@@ -29,17 +29,19 @@ First, let's create synthetic data that mimics photon-limited imaging:
 
 ```python
 # Parameters for synthetic data
-n_frames = 100
-height, width = 512, 512
-mean_photons = 50  # Average photons per pixel
-zero_level = 100  # Camera baseline
-conversion_gain = 2.5  # ADU per photon
+n_frames = 30
+height, width = 120, 120
+mean_photon_rate = 5.0  # Average photons per pixel (exponential distribution of rates)
+zero_level = -10.0  # camera baseline
+conversion_gain = 2.5  # levels per photon
 
 # Generate Poisson-distributed photon counts
-photon_counts = np.random.poisson(lam=mean_photons, size=(n_frames, height, width))
+photon_rate = np.random.exponential(scale=mean_photon_rate, size=(1, height, width))
+photon_counts = np.random.poisson(lam=np.tile(photon_rate, (n_frames, 1, 1)))
+measured_signal = photon_counts + np.random.randn(*photon_counts.shape) * 0.2
 
-# Convert to camera signal (ADU)
-camera_signal = (photon_counts * conversion_gain + zero_level).astype('int16')
+# Convert to camera signal
+camera_signal = (measured_signal * conversion_gain + zero_level).astype('int16')
 
 print(f"Data shape: {camera_signal.shape}")
 print(f"Data range: [{camera_signal.min()}, {camera_signal.max()}]")
@@ -58,16 +60,16 @@
 estimated_gain = result['sensitivity']
 estimated_zero = result['zero_level']
 
 print(f"\nTrue parameters:")
-print(f"  Conversion gain: {conversion_gain:.3f} ADU/photon")
-print(f"  Zero level: {zero_level:.1f} ADU")
+print(f"  Conversion gain: {conversion_gain:.3f} units/photon")
+print(f"  Zero level: {zero_level:.1f}")
 
 print(f"\nEstimated parameters:")
-print(f"  Conversion gain: {estimated_gain:.3f} ADU/photon")
-print(f"  Zero level: {estimated_zero:.1f} ADU")
+print(f"  Conversion gain: {estimated_gain:.3f} units/photon")
+print(f"  Zero level: {estimated_zero:.1f}")
 
 print(f"\nEstimation error:")
-print(f"  Gain error: {abs(estimated_gain - conversion_gain):.3f} ADU/photon")
-print(f"  Zero level error: {abs(estimated_zero - zero_level):.1f} ADU")
+print(f"  Gain error: {abs(estimated_gain - conversion_gain):.3f} units/photon")
+print(f"  Zero level error: {abs(estimated_zero - zero_level):.1f} units")
 ```
 
 ## Visualize Noise Model
diff --git a/pyproject.toml b/pyproject.toml
index 9462394..8bea9cb 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -7,8 +7,8 @@
 name = "anscombe_transform"
 version = "1.0.0"
 authors = [
-    { name = "Jerome Lecoq", email = "jeromel@alleninstitute.org" },
     { name = "Dimitri Yatsenko", email = "dimitr@datajoint.com" },
+    { name = "Jerome Lecoq", email = "jeromel@alleninstitute.org" },
     { name = "Davis Bennett", email = "davis.v.bennett@gmail.com" },
 ]
 maintainers = [
diff --git a/tests/test_zarr.py b/tests/test_zarr.py
index cba3b5b..ebee94c 100644
--- a/tests/test_zarr.py
+++ b/tests/test_zarr.py
@@ -10,35 +10,57 @@ def test_zarr_v2_roundtrip() -> None:
     decoded_dtype = "int16"
     encoded_dtype = "uint8"
-    data = np.random.poisson(100, size=(20, 20)).astype(decoded_dtype)
+
+    # generate fake data
+    np.random.seed(42)
+    size = (20, 20)
     sensitivity = 100.0
+    zero_level = -5.0
+    true_rate = np.random.exponential(scale=5, size=size)
+    data = (
+        zero_level
+        + sensitivity * (np.random.poisson(true_rate) + np.random.randn(*size) * 0.25)
+    ).astype(decoded_dtype)
+
+    # construct codec
     codec = AnscombeTransformV2(
         conversion_gain=sensitivity,
-        zero_level=0,
+        zero_level=zero_level,
         encoded_dtype=encoded_dtype,
         decoded_dtype=decoded_dtype,
     )
 
     data_encoded = codec.encode(data)
     data_rt = codec.decode(data_encoded).reshape(data.shape)
-    store = {}
-
     # write data
+    store = {}
     _ = create_array(store=store, data=data, zarr_format=2, compressors=codec)
+
     # read data
     z_arr_r = open_array(store=store)
     assert z_arr_r.dtype == decoded_dtype
-    assert nearly_equal(z_arr_r, data_rt, sensitivity / 2)
+    breakpoint()
+    assert nearly_equal(z_arr_r, data_rt, sensitivity * 0.5)
 
 
 def test_zarr_v3_roundtrip() -> None:
     decoded_dtype = "int16"
     encoded_dtype = "uint8"
-    data = np.random.poisson(100, size=(20, 20)).astype(decoded_dtype)
+
+    # generate fake data
+    np.random.seed(42)
+    size = (20, 20)
     sensitivity = 100.0
+    zero_level = -5.0
+    true_rate = np.random.exponential(scale=5, size=size)
+    data = (
+        zero_level
+        + sensitivity * (np.random.poisson(true_rate) + np.random.randn(*size) * 0.25)
+    ).astype(decoded_dtype)
+
     codec = AnscombeTransformV3(
         conversion_gain=sensitivity,
-        zero_level=0,
+        zero_level=zero_level,
         encoded_dtype=encoded_dtype,
         decoded_dtype=decoded_dtype,
     )
@@ -47,9 +69,8 @@ def test_zarr_v3_roundtrip() -> None:
 
     data_encoded = codec._encode(data)
     data_rt = codec._decode(data_encoded).reshape(data.shape)
-    store = {}
-
     # write data
+    store = {}
     _ = create_array(store=store, data=data, zarr_format=3, filters=[codec])
     # read data
     z_arr_r = open_array(store=store)

From 4831940a8b7870fb0e4a603b1477a0133f06e3de Mon Sep 17 00:00:00 2001
From: Dimitri Yatsenko
Date: Mon, 10 Nov 2025 15:31:05 -0600
Subject: [PATCH 8/9] correct image generation in the overview

---
 docs/getting-started/quick-start.md | 33 +++++++++++++++++------------
 docs/user-guide/overview.md         |  2 +-
 2 files changed, 21 insertions(+), 14 deletions(-)

diff --git a/docs/getting-started/quick-start.md b/docs/getting-started/quick-start.md
index ce073c8..8681a64 100644
--- a/docs/getting-started/quick-start.md
+++ b/docs/getting-started/quick-start.md
@@ -9,18 +9,25 @@
 import zarr
 import numpy as np
 from anscombe_transform import AnscombeTransformV3
 
-# Generate sample data with Poisson noise
-# Simulating photon-limited imaging data
-data = np.random.poisson(lam=50, size=(100, 512, 512)).astype('int16')
+# Synthesize data with Poisson noise - simulating photon-limited microscopy data
+n_frames, height, width = 50, 256, 256
+mean_photon_rate = 5.0  # Average photons per pixel (exponential distribution of rates)
+zero_level = 20.0  # camera baseline
+conversion_gain = 30.0  # levels per photon
+photon_rate = np.random.exponential(scale=mean_photon_rate, size=(1, height, width))
+photon_counts = np.random.poisson(np.tile(photon_rate, (n_frames, 1, 1)))
+measured_signal = photon_counts + np.random.randn(*photon_counts.shape) * 0.2
+
+data = (zero_level + conversion_gain * measured_signal).astype('int16')
 
 # Create a Zarr array with the Anscombe codec
 store = zarr.storage.MemoryStore()
 arr = zarr.create(
     store=store,
     shape=data.shape,
-    chunks=(10, 512, 512),
+    chunks=(12, 160, 80),
     dtype='int16',
-    filters=[AnscombeTransformV3(zero_level=100, conversion_gain=2.5)],
+    filters=[AnscombeTransformV3(zero_level=zero_level, conversion_gain=conversion_gain)],
     zarr_format=3
 )
@@ -44,10 +51,10 @@
 import zarr
 
 arr = zarr.open_array(
     'data.zarr',
     mode='w',
-    shape=(100, 512, 512),
-    chunks=(10, 512, 512),
+    shape=data.shape,
+    chunks=(12, 160, 80),
     dtype='int16',
-    compressor=AnscombeTransformV2(zero_level=100, conversion_gain=2.5)
+    compressor=AnscombeTransformV2(zero_level=zero_level, conversion_gain=conversion_gain)
 )
 
 # Write and read data
@@ -64,10 +71,10 @@
 from anscombe_transform import compute_conversion_gain
 import numpy as np
 
 # Load your movie data as (time, height, width)
-movie = np.random.poisson(lam=50, size=(100, 512, 512))
+movie = data
 
 # Estimate parameters
-result = compute_conversion_gain(movie)
+result = compute_conversion_gain(data)
 
 print(f"Estimated conversion gain: {result['sensitivity']:.3f}")
 print(f"Estimated zero level: {result['zero_level']:.3f}")
@@ -90,10 +97,10 @@
 from anscombe_transform import AnscombeTransformV3
 
 # For Zarr V3, use filters + codecs
 arr = zarr.create(
-    shape=(100, 512, 512),
+    shape=data.shape,
     chunks=(10, 512, 512),
     dtype='int16',
-    filters=[AnscombeTransformV3(zero_level=100, conversion_gain=2.5)],
+    filters=[AnscombeTransformV3(zero_level=zero_level, conversion_gain=conversion_gain)],
     compressor={'id': 'blosc', 'cname': 'zstd', 'clevel': 5},
     zarr_format=3
 )
 ```
 
 ## Key Parameters
 
 - **`zero_level`**: The signal value when no photons are detected. This is the baseline offset in your camera sensor.
 - **`conversion_gain`** (also called `photon_sensitivity`): How many signal units correspond to one photon. For example, if each detected photon increases the signal by 2.5 levels, use `conversion_gain=2.5`.
 - **`encoded_dtype`**: The data type for encoded values (default: `uint8`). Use `uint8` for maximum compression.
 - **`decoded_dtype`**: The data type for decoded values (default: inferred from data).
diff --git a/docs/user-guide/overview.md b/docs/user-guide/overview.md index 5d09038..028cd50 100644 --- a/docs/user-guide/overview.md +++ b/docs/user-guide/overview.md @@ -74,7 +74,7 @@ These tables are computed once during codec initialization and reused for all en ### Compression Ratios Typical compression ratios (Anscombe + Blosc/Zstd): -- **4-8x** for typical multiphoton microscopy data +- **3-8x** for typical multiphoton microscopy data - **6-10x** for astronomy data - **3-6x** for radiography data From 0230a513719990394eb85bf36d72c696e87b81a9 Mon Sep 17 00:00:00 2001 From: Dimitri Yatsenko Date: Mon, 10 Nov 2025 17:19:23 -0600 Subject: [PATCH 9/9] remove breakpoint --- examples/workbook.ipynb | 4 ++-- tests/test_zarr.py | 1 - 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/examples/workbook.ipynb b/examples/workbook.ipynb index a0e9a83..7518e26 100644 --- a/examples/workbook.ipynb +++ b/examples/workbook.ipynb @@ -392,7 +392,7 @@ ], "metadata": { "kernelspec": { - "display_name": "microns_trippy", + "display_name": "Python 3", "language": "python", "name": "python3" }, @@ -406,7 +406,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.16" + "version": "3.11.11" } }, "nbformat": 4, diff --git a/tests/test_zarr.py b/tests/test_zarr.py index ebee94c..3ba1761 100644 --- a/tests/test_zarr.py +++ b/tests/test_zarr.py @@ -39,7 +39,6 @@ def test_zarr_v2_roundtrip() -> None: # read data z_arr_r = open_array(store=store) assert z_arr_r.dtype == decoded_dtype - breakpoint() assert nearly_equal(z_arr_r, data_rt, sensitivity * 0.5)
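
Below is a minimal sketch of the variance stabilization that the README text in PATCH 1/9 describes. It is illustrative only: `stabilize` and `unstabilize` are hypothetical names rather than the package API, and the packaged codec precomputes lookup tables at initialization (see the `overview.md` hunk above) instead of transforming element by element.

```python
import numpy as np


def stabilize(signal, zero_level, conversion_gain):
    """Variance-stabilize a linearly encoded, photon-limited signal."""
    # Recover approximate photon counts from the camera signal.
    photons = (np.asarray(signal, dtype=np.float64) - zero_level) / conversion_gain
    # Anscombe transform: 2*sqrt(x + 3/8) gives Poisson counts roughly unit
    # variance, so far fewer grayscale levels are needed after requantization.
    return 2.0 * np.sqrt(np.maximum(photons + 3.0 / 8.0, 0.0))


def unstabilize(transformed, zero_level, conversion_gain):
    """Approximate algebraic inverse of stabilize()."""
    photons = (np.asarray(transformed) / 2.0) ** 2 - 3.0 / 8.0
    return photons * conversion_gain + zero_level


# Round trip on synthetic data matching the quick-start parameters.
rng = np.random.default_rng(42)
zero_level, conversion_gain = 20.0, 30.0
frame = (zero_level + conversion_gain * rng.poisson(5.0, size=(64, 64))).astype("int16")
restored = unstabilize(stabilize(frame, zero_level, conversion_gain), zero_level, conversion_gain)
assert np.allclose(frame, restored, atol=conversion_gain * 0.5)
```

The half-gain tolerance mirrors the `sensitivity * 0.5` bound asserted in `tests/test_zarr.py` above.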