Commit 5d6f891

Add change point detection module. (#41)

* Implement various Bayesian conjugate priors.
* Allow returning updated posterior after inference.
* Allow creating TimeSeries from numpy directly.
* Add testing for conjugate priors.
* Let conj priors update on a single ts value.
* Update docstring for SpectralResidual.
* Initial implementation of BOCPD.
* Update uninformative priors.
  The new uninformative priors allow estimation of prior probabilities without conditioning on any data.
* Have BOCPD return z-score units.
* Smarter prior initialization based on data.
* Add BOCPD to API docs.
* Use future look-ahead to aggregate probabilities.
  We consider a maximal look-ahead equal to the lag, and we allow the model to use data up to `lag` steps in the future to decide whether each point is a change point. This can greatly improve the batch prediction.
* Update default BOCPD lag to None.
* Automatic selection of Bayesian conjugate prior.
* Remove unnecessary copying line.
* Allow singular covariances in priors.
* Fix an offset error in BOCPD dynamic programming.
* Add explicit posterior for BayesianMVLinReg.
* Use explicit posterior in BOCPD where possible.
* Make sparse matrix allocation more efficient.
* Use fully uninformative priors.
  Setting priors in a data-driven way led to over-estimating the probability of some change points.
* Make sure matrices are non-singular.
* Slightly refine min log likelihood calculation.
* Add test coverage for BOCPD.
* Make sure matrix is PSD as well as non-singular.
* Allow BOCPD to predict on historical data.
* Allow time series alignment for empty time series.
* Make sure last_train_time is set in BOCPD.update
* Build a predictive model for BOCPD.
* Allow conjugate priors to make forecasts.
* Add forecasting ability to BOCPD.
* Train BOCPD on transformed time series.
* Make format of model/config docs more consistent.
* Add tests for BOCPD visualizations.
* Update version.
* Fix failing BOCPD tests.
* Backwards compatibility for scipy 1.5.
  Scipy 1.6.0 introduced the multivariate_t random variable, which we use in our implementation of conjugate priors. However, scipy 1.6.0+ requires Python 3.7+. To maintain backwards compatibility with Python 3.6 (and therefore scipy 1.5), we implement the log density of the multivariate t distribution and use it as a fallback where necessary. We also implement an optimized computation of the pseudo-inverse, and explicitly allow for singular V_0 matrices in the computation of the Bayesian Multivariate Linear Regression posterior. Finally, we update the testing workflows to avoid segfaults due to BLAS bugs in older package versions.
* Allow integer # of time_stamps for BOCPD.
* Remove issubclass check.
* Add mentions of change point detection in the docs
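
For readers unfamiliar with the scipy 1.5 fallback mentioned in the commit message, the following is a minimal sketch of the log density of a multivariate t distribution. It is not the code from this commit: the function names `cholesky` and `mvt_logpdf` are hypothetical, the sketch is kept dependency-free (standard library only) for illustration, and it assumes a strictly positive-definite shape matrix, so it does not cover the singular case that the commit says it handles.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def mvt_logpdf(x, loc, shape, df):
    """Log density of a multivariate t distribution (hypothetical helper).

    x, loc: lists of length d; shape: d x d positive-definite matrix; df > 0.
    """
    d = len(x)
    low = cholesky(shape)
    diff = [xi - mi for xi, mi in zip(x, loc)]
    # Forward substitution: solve low @ z = diff, so that sum(z**2) is the
    # squared Mahalanobis distance diff^T shape^{-1} diff.
    z = []
    for i in range(d):
        z.append((diff[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    maha = sum(zi * zi for zi in z)
    # log det(shape) = 2 * sum(log(diagonal of the Cholesky factor))
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(d))
    return (
        math.lgamma((df + d) / 2)
        - math.lgamma(df / 2)
        - 0.5 * d * math.log(df * math.pi)
        - 0.5 * log_det
        - 0.5 * (df + d) * math.log1p(maha / df)
    )
```

With one dimension, unit scale, and one degree of freedom this reduces to the standard Cauchy log density, which gives a quick sanity check: `mvt_logpdf([0.0], [0.0], [[1.0]], 1)` should equal `-math.log(math.pi)`.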
1 parent 18fe266 commit 5d6f891

44 files changed (+1894, -54 lines)

.github/workflows/tests.yml

Lines changed: 13 additions & 1 deletion
@@ -34,7 +34,17 @@ jobs:
     - name: Test with pytest
       id: test
       run: |
-        coverage run --source=merlion/ -L -m pytest -v
+        # A BLAS bug causes high-dim multivar Bayesian LR test to segfault in 3.6. Run the test first to avoid.
+        if [[ $PYTHON_VERSION == 3.6 ]]; then
+          python -m pytest -v tests/change_point/test_conj_prior.py
+          coverage run --source=merlion/ -L -m pytest -v --ignore tests/change_point/test_conj_prior.py
+        # MoE test seems to hang in 3.7. Run the test first to avoid.
+        elif [[ $PYTHON_VERSION == 3.7 ]]; then
+          python -m pytest -v tests/forecast/test_MoE_forecast_ensemble.py
+          coverage run --source=merlion/ -L -m pytest -v --ignore tests/forecast/test_MoE_forecast_ensemble.py
+        else
+          coverage run --source=merlion/ -L -m pytest -v
+        fi
 
         # Obtain code coverage from coverage report
         coverage report
@@ -56,6 +66,8 @@ jobs:
           COLOR=red
         fi
         echo "##[set-output name=color;]${COLOR}"
+      env:
+        PYTHON_VERSION: ${{ matrix.python-version }}
 
     - name: Create coverage badge
       if: ${{ github.ref == 'refs/heads/main' && matrix.python-version == '3.8' }}

README.md

Lines changed: 4 additions & 3 deletions
@@ -32,9 +32,10 @@
 ## Introduction
 Merlion is a Python library for time series intelligence. It provides an end-to-end machine learning framework that
 includes loading and transforming data, building and training models, post-processing model outputs, and evaluating
-model performance. It supports various time series learning tasks, including forecasting and anomaly detection for both
-univariate and multivariate time series. This library aims to provide engineers and researchers a one-stop solution to
-rapidly develop models for their specific time series needs, and benchmark them across multiple time series datasets.
+model performance. It supports various time series learning tasks, including forecasting, anomaly detection,
+and change point detection for both univariate and multivariate time series. This library aims to provide engineers and
+researchers a one-stop solution to rapidly develop models for their specific time series needs, and benchmark them
+across multiple time series datasets.
 
 Merlion's key features are
 - Standardized and easily extensible data loading & benchmarking for a wide range of forecasting and anomaly

docs/source/index.rst

Lines changed: 2 additions & 2 deletions
@@ -7,8 +7,8 @@
 Welcome to Merlion's documentation!
 ===================================
 Merlion is a Python library for time series intelligence. It features a unified interface for many commonly used
-:doc:`models <merlion.models>` and :doc:`datasets <ts_datasets>` for anomaly detection and forecasting
-on both univariate and multivariate time series, along with standard
+:doc:`models <merlion.models>` and :doc:`datasets <ts_datasets>` for forecasting, anomaly detection, and change
+point detection on both univariate and multivariate time series, along with standard
 :doc:`pre-processing <merlion.transform>` and :doc:`post-processing <merlion.post_process>` layers.
 It has several modules to improve ease-of-use,
 including :ref:`visualization <merlion.plot>`,

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+merlion.models.anomaly.change\_point package
+============================================
+
+.. automodule:: merlion.models.anomaly.change_point
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+.. autosummary::
+    bocpd
+
+Submodules
+----------
+
+merlion.models.anomaly.change\_point.bocpd module
+-------------------------------------------------
+
+.. automodule:: merlion.models.anomaly.change_point.bocpd
+   :members:
+   :undoc-members:
+   :show-inheritance:

docs/source/merlion.models.anomaly.rst

Lines changed: 1 addition & 0 deletions
@@ -28,6 +28,7 @@ Subpackages
    :maxdepth: 4
 
    merlion.models.anomaly.forecast_based
+   merlion.models.anomaly.change_point
 
 Submodules
 ----------

docs/source/merlion.models.rst

Lines changed: 2 additions & 0 deletions
@@ -62,6 +62,7 @@ Finally, we support ensembles of models in :py:mod:`merlion.models.ensemble`.
     factory
     defaults
     anomaly
+    anomaly.change_point
     anomaly.forecast_based
     forecast
     ensemble
@@ -75,6 +76,7 @@ Subpackages
    :maxdepth: 2
 
    merlion.models.anomaly
+   merlion.models.anomaly.change_point
    merlion.models.anomaly.forecast_based
    merlion.models.forecast
    merlion.models.ensemble

docs/source/merlion.rst

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@ each associated with its own sub-package:
 for anomaly detection and forecasting. More specifically, we have
 
 - :py:mod:`merlion.models.anomaly`: Anomaly detection models
+- :py:mod:`merlion.models.anomaly.change_point`: Change point detection models
 - :py:mod:`merlion.models.forecast`: Forecasting models
 - :py:mod:`merlion.models.anomaly.forecast_based`: Forecasting models adapted for anomaly detection. Anomaly
   scores are based on the residual between the predicted and true value at each timestamp.

docs/source/merlion.utils.rst

Lines changed: 7 additions & 0 deletions
@@ -11,6 +11,13 @@ utilities for resampling time series.
 Submodules
 ----------
 
+merlion.utils.conj_priors module
+--------------------------------
+.. automodule:: merlion.utils.conj_priors
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
 merlion.utils.istat module
 --------------------------

merlion/models/anomaly/__init__.py

Lines changed: 2 additions & 1 deletion
@@ -6,7 +6,8 @@
 #
 """
 Contains all anomaly detection models. Forecaster-based anomaly detection models
-may be found in :py:mod:`merlion.models.anomaly.forecast_based`.
+may be found in :py:mod:`merlion.models.anomaly.forecast_based`. Change-point detection models may be
+found in :py:mod:`merlion.models.anomaly.change_point`.
 
 For anomaly detection, we define an abstract `DetectorBase` class which inherits from `ModelBase` and supports the
 following interface, in addition to ``model.save`` and ``DetectorClass.load`` defined for `ModelBase`:

merlion/models/anomaly/base.py

Lines changed: 20 additions & 1 deletion
@@ -98,6 +98,18 @@ class NoCalibrationDetectorConfig(DetectorConfig):
     def __init__(self, enable_calibrator=False, **kwargs):
         super().__init__(enable_calibrator=enable_calibrator, **kwargs)
 
+    @property
+    def calibrator(self):
+        """
+        :return: ``None``
+        """
+        return None
+
+    @calibrator.setter
+    def calibrator(self, calibrator):
+        # no-op
+        pass
+
     @property
     def enable_calibrator(self):
         """
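
The property/no-op-setter pair added in this hunk can be illustrated in isolation. The sketch below uses hypothetical stand-in classes (`BaseConfig`, `NoCalibrationConfig`), not Merlion's actual `DetectorConfig` hierarchy, and assumes a base class that assigns `self.calibrator` in its constructor. It shows that the override makes any assignment, including one made by the base class, a silent no-op.

```python
class BaseConfig:
    """Hypothetical stand-in for a config that stores a calibrator."""

    def __init__(self, calibrator=None):
        # Goes through the property setter, so subclasses can intercept it.
        self.calibrator = calibrator

    @property
    def calibrator(self):
        return self._calibrator

    @calibrator.setter
    def calibrator(self, calibrator):
        self._calibrator = calibrator


class NoCalibrationConfig(BaseConfig):
    """Hypothetical subclass mirroring the pattern in the hunk above."""

    @property
    def calibrator(self):
        # Always report that there is no calibrator.
        return None

    @calibrator.setter
    def calibrator(self, calibrator):
        # Silently discard whatever the caller (or the base class) assigns.
        pass


cfg = NoCalibrationConfig(calibrator="ignored")
cfg.calibrator = "also ignored"
print(cfg.calibrator)  # None
```

One consequence of this design is that callers can test `calibrator is None` to tell whether calibration has been disabled at the class level, without the config having to raise on assignment.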
@@ -132,7 +144,14 @@ def _default_post_rule_train_config(self):
         from merlion.evaluate.anomaly import TSADMetric
 
         t = self.config._default_threshold.alm_threshold
-        q = None if self.config.enable_calibrator or t == 0 else 2 * norm.cdf(t) - 1
+        # self.calibrator is only None if calibration has been manually disabled
+        # and the anomaly scores are expected to be calibrated by get_anomaly_score(). If
+        # self.config.enable_calibrator, the model will return a calibrated score.
+        if self.calibrator is None or self.config.enable_calibrator or t == 0:
+            q = None
+        # otherwise, choose the quantile corresponding to the given threshold
+        else:
+            q = 2 * norm.cdf(t) - 1
         return dict(metric=TSADMetric.F1, unsup_quantile=q)
 
     @property
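
To make the quantile selection in this hunk concrete, here is a small standalone sketch of the same branching. The function name `unsup_quantile` and its boolean parameters are hypothetical, and it swaps in the standard library's `statistics.NormalDist` for the `norm.cdf` call used in the diff. The expression `2 * cdf(t) - 1` is the probability that a standard normal variable falls within `[-t, t]`.

```python
from statistics import NormalDist


def unsup_quantile(threshold, has_calibrator=True, enable_calibrator=False):
    """Hypothetical standalone version of the branching in the hunk above.

    Returns None when no quantile should be used, otherwise the two-sided
    standard normal coverage probability for the given threshold.
    """
    if not has_calibrator or enable_calibrator or threshold == 0:
        return None
    return 2 * NormalDist().cdf(threshold) - 1


print(unsup_quantile(3.0))                        # roughly 0.9973
print(unsup_quantile(3.0, has_calibrator=False))  # None
```

A threshold of 3 therefore corresponds to treating roughly the top 0.27% of uncalibrated scores as anomalous, which is the usual three-sigma rule.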
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+#
+# Copyright (c) 2021 salesforce.com, inc.
+# All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+# For full license text, see the LICENSE file in the repo root or https://opensource.org/licenses/BSD-3-Clause
+#
+"""
+Contains all change point detection algorithms. These models implement the anomaly detector interface, but
+they are specialized for detecting change points in time series.
+"""
