@@ -10,7 +10,7 @@ wide range of data formats and optimized query patterns without changing
1010the data structures and query patterns that users are programming
1111against.
1212
13- ADAM’ s architecture was introduced as a response to the challenges
13+ ADAM' s architecture was introduced as a response to the challenges
1414processing the growing volume of genomic sequencing data in a reasonable
1515timeframe (Schadt et al. 2010). While the per-run latency of current
1616genomic pipelines such as the GATK could be improved by manually
@@ -24,13 +24,13 @@ make it difficult for bioinformatics developers to create novel
2424distributed genomic analyses, and does little to attack sources of
2525inefficiency or incorrectness in distributed genomics pipelines.
2626
27- ADAM’ s architecture reconsiders how we build software for processing
27+ ADAM' s architecture reconsiders how we build software for processing
2828genomic data by eliminating the monolithic architectures that are driven
2929by the underlying flat file formats used in genomics. These
3030architectures impose significant restrictions, including:
3131
3232- These implementations are locked to a single node processing model.
33- Even the GATK’s “ map-reduce” styled walker API (McKenna et al. 2010)
33+ Even the GATK's " map-reduce" styled walker API (McKenna et al. 2010)
3434 is limited to natively support processing on a single node. While
3535 these jobs can be manually partitioned and run in a distributed
3636 setting, manual partitioning can lead to imbalance in work
@@ -39,8 +39,8 @@ architectures impose significant restrictions, including:
3939 provided by modern distributed systems such as Apache Hadoop or Spark
4040 (Zaharia et al. 2012).
4141- Most of these implementations assume
42- invariants about the sorted order of records on disk. This “ stack
43- smashing” (specifically, the layout of data is used to accelerate a
42+ invariants about the sorted order of records on disk. This " stack
43+ smashing" (specifically, the layout of data is used to accelerate a
4444 processing stage) can lead to bugs when data does not cleanly map to
4545 the assumed sort order. Additionally, since these sort order
4646 invariants are rarely explicit and vary from tool to tool, pipelines
@@ -50,7 +50,7 @@ architectures impose significant restrictions, including:
5050 this at the cost of opacity. If we can express the query patterns
5151 that are accelerated by these invariants at a higher level, then we
5252 can achieve both a better programming environment and enable various
53- query optimizations. \\ end{itemize}
53+ query optimizations.
5454
5555At the core of ADAM, users use the `ADAMContext <#adam-context >`__ to
5656load data as `GenomicRDDs <#genomic-rdd >`__, which they can then
0 commit comments