Commit bc0470c

Added readme.
1 parent dccd240 commit bc0470c

3 files changed

+144
-1
lines changed

README.md

+54-1
@@ -1 +1,54 @@
1-
# RollCall
1+
<h1 align="center"> Rollcall </h1><br>
2+
3+
<p align="center">
4+
A solution for managing data releases, indices, and aliases for Elasticsearch.
5+
</p>
6+
7+
<p align="center">
8+
<a href="" target="_blank"><img alt="Under Development" title="Under Development" src="http://www.overture.bio/img/progress-horizontal-UD.svg" width="320" /></a>
9+
</p>
10+
11+
## Introduction
12+
13+
Rollcall assists projects that use Elasticsearch as a primary data store to drive rich faceted search and powerful search APIs.
14+
15+
Specifically, data models that shard the domain to support incremental updates, real-time updates, testing, and data versioning need a comprehensive, automated solution for managing index aliases.
16+
17+
Rollcall solves the very specific problem of applying aliases to indices in a way that makes sense for a sharded domain.
18+
19+
## Motivating Example
20+
21+
Suppose you are building a search API for patient demographic information across many different clinics and sites, and you have decided to store the information as documents in Elasticsearch to make use of its powerful text search and performant aggregations.
22+
23+
Knowing that new clinics will be making data available, others will be updating their data at various times, and others will be withdrawing their data, you have decided to shard your domain around the concept of a clinic. As such, your Extract-Transform-Load (ETL) pipeline produces one Elasticsearch index per clinic.
24+
25+
Example with three clinics:
26+
```
27+
demographic_entity_cl_clinicA_re_0
28+
demographic_entity_cl_clinicB_re_0
29+
demographic_entity_cl_clinicC_re_0
30+
```
31+
32+
For your search API to search across all three indices seamlessly, you create a single index alias for all three clinics.
33+
```yml
34+
alias: demographic
35+
indices:
36+
- demographic_entity_cl_clinicA_re_0
37+
- demographic_entity_cl_clinicB_re_0
38+
- demographic_entity_cl_clinicC_re_0
39+
```
40+
41+
Now suppose we want to onboard a new clinic, `clinicD`, and update an existing one, `clinicB`. We now have to work out which indices need the alias removed, which need it added, and which are untouched. This bookkeeping quickly becomes untenable for a human as the number of indices grows.
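
The bookkeeping in this scenario can be sketched as a set difference. The Python below is a minimal illustration, not Rollcall's implementation. The function name `plan_alias_update`, the assumption that the update to clinic B arrives as a new index ending in `_re_1`, and the index name used for `clinicD` are all hypothetical. The result is shaped like the `actions` body that Elasticsearch's `POST /_aliases` endpoint accepts.

```python
def plan_alias_update(alias, current, desired):
    """Compare the indices currently behind `alias` with the desired set.

    Returns a dict shaped like an Elasticsearch `_aliases` request body,
    plus the list of indices that need no change.
    """
    current, desired = set(current), set(desired)
    to_remove = sorted(current - desired)   # alias must be removed
    to_add = sorted(desired - current)      # alias must be added
    untouched = sorted(current & desired)   # already correct

    actions = [{"remove": {"index": i, "alias": alias}} for i in to_remove]
    actions += [{"add": {"index": i, "alias": alias}} for i in to_add]
    return {"actions": actions}, untouched


current = [
    "demographic_entity_cl_clinicA_re_0",
    "demographic_entity_cl_clinicB_re_0",
    "demographic_entity_cl_clinicC_re_0",
]
# Hypothetical target state: clinic B re-released as `_re_1`, clinic D onboarded.
desired = [
    "demographic_entity_cl_clinicA_re_0",
    "demographic_entity_cl_clinicB_re_1",
    "demographic_entity_cl_clinicC_re_0",
    "demographic_entity_cl_clinicD_re_0",
]

body, untouched = plan_alias_update("demographic", current, desired)
```

Here `body` holds one `remove` action for the old clinic B index and two `add` actions, while `untouched` lists the clinic A and clinic C indices. Putting the removals and additions in one `_aliases` request is a deliberate choice in this sketch, since Elasticsearch applies the actions of a single request together.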
42+
43+
This problem is what Rollcall helps solve by introducing concepts like data releases, rollbacks, and redactions in an opinionated way.
44+
45+
## Index Naming
46+
47+
Rollcall is opinionated about the way indices are named. So opinionated, in fact, that it uses a grammar file to describe the index naming grammar.
48+
49+
It can be found here: [IndexName.g4](src/main/antlr4/bio/overture/rollcall/antlr4/IndexName.g4)
50+
51+
Using one of the indices from the motivating example, `demographic_entity_cl_clinicA_re_0`, this is what its parse tree looks like:
52+
![Parse Tree](img/parse.png)
53+
54+
This opinionated naming scheme lets us attach semantics to the terms in a name, which Rollcall uses to manage data releases. As a result, the index names and aliases are themselves the source of truth for the state of the system, with no need for an external third-party datastore to hold that state.
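
To illustrate how a structured name can carry state, here is a rough Python sketch that splits a name into labelled parts. It is an approximation inferred only from the single example name above, not a port of `IndexName.g4`. The group names (`name`, `type`, `shard_prefix`, `shard`, `release_prefix`, `release`) and the regular expression itself are assumptions, and the real grammar may accept or reject names that this pattern does not.

```python
import re

# Hypothetical pattern inferred from `demographic_entity_cl_clinicA_re_0`.
# Six underscore-separated terms; none may itself contain an underscore.
INDEX_NAME = re.compile(
    r"^(?P<name>[a-z0-9]+)_(?P<type>[a-z0-9]+)"
    r"_(?P<shard_prefix>[a-z]+)_(?P<shard>[A-Za-z0-9]+)"
    r"_(?P<release_prefix>[a-z]+)_(?P<release>[A-Za-z0-9]+)$"
)


def parse_index_name(index_name):
    """Return the labelled terms of an index name, or None if it does not match."""
    match = INDEX_NAME.match(index_name)
    return match.groupdict() if match else None


parts = parse_index_name("demographic_entity_cl_clinicA_re_0")
```

With the terms separated like this, a tool can group indices by shard and compare release values without consulting any external store, which is the property the paragraph above describes.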

img/antlr4_parse_tree.svg

+90

img/parse.png

29.7 KB