
Commit 144f9f1

De-smart quotes, minor formatting fixes.

1 parent 153c62f

25 files changed, +137 -131 lines changed

docs/algorithms/dm.rst

Lines changed: 1 addition & 1 deletion

@@ -7,7 +7,7 @@ while on the sequencer. To identify duplicated reads, we apply a
 heuristic algorithm that looks at read fragments that have a consistent
 mapping signature. First, we bucket together reads that are from the
 same sequenced fragment by grouping reads together on the basis of read
-name and record group. Per read bucket, we then identify the 5 mapping
+name and record group. Per read bucket, we then identify the 5' mapping
 positions of the primarily aligned reads. We mark as duplicates all read
 pairs that have the same pair alignment locations, and all unpaired
 reads that map to the same sites. Only the highest scoring read/read
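
The heuristic above is compact in prose, so a minimal single-node sketch of the bucketing order may help. This is an illustration only, with a hypothetical Fragment type standing in for ADAM's internal read representation, and paired-end positions collapsed to one coordinate for brevity; it is not the actual implementation.

case class Fragment(readName: String, recordGroup: String,
                    fivePrimePos: Long, score: Int)

// Bucket reads from the same sequenced fragment by (record group, read
// name), group the buckets by their 5' mapping position, and flag every
// fragment at a shared position except those in the highest scoring bucket.
def flagDuplicates(frags: Seq[Fragment]): Map[Fragment, Boolean] = {
  val buckets = frags.groupBy(f => (f.recordGroup, f.readName)).values.toSeq
  buckets.groupBy(_.head.fivePrimePos).values.flatMap { colliding =>
    val best = colliding.maxBy(_.map(_.score).sum) // highest scoring survives
    colliding.flatMap(b => b.map(f => f -> (b ne best)))
  }.toMap
}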

docs/algorithms/reads.rst

Lines changed: 2 additions & 2 deletions

@@ -22,7 +22,7 @@ distributed system. These pre-processing stages include:
 2014), which is used by the GATK for Marking Duplicates. Our
 implementation is fully concordant with the Picard/GATK duplicate
 removal engine, except we are able to perform duplicate marking for
-chimeric read pairs. [2]_ Specifically, because Picards traversal
+chimeric read pairs. [2]_ Specifically, because Picard's traversal
 engine is restricted to processing linearly sorted alignments, Picard
 mishandles these alignments. Since our engine is not constrained by
 the underlying layout of data on disk, we are able to properly handle
@@ -60,7 +60,7 @@ distributed system. These pre-processing stages include:
 distribution of regions in mapped reads, joining two genomic datasets
 can be difficult or impossible when neither dataset fits completely
 on a single node. To reduce the impact of data skew on the runtime of
-joins, we implemented a load balancing engine in ADAMs
+joins, we implemented a load balancing engine in ADAM's
 ShuffleRegionJoin core. This load balancing is a preprocessing step
 to the ShuffleRegionJoin and improves performance by 10–100x. The
 first phase of the load balancer is to sort and repartition the left
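
For context, the load balancing described here runs transparently inside the shuffle join itself. A minimal usage sketch, assuming a SparkContext named sc and hypothetical input paths:

import org.bdgenomics.adam.rdd.ADAMContext._

// The load balancer sorts and repartitions the left-hand dataset before
// the join executes; no extra user-facing call is needed.
val reads    = sc.loadAlignments("sample.bam")
val features = sc.loadFeatures("genes.bed")
val joined   = reads.shuffleRegionJoin(features)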

docs/algorithms/ri.rst

Lines changed: 2 additions & 2 deletions

@@ -40,7 +40,7 @@ set of regions. For genomics, the convexity constraint is trivial to
 check: specifically, the genome is assembled out of reference contigs
 that define disparate 1-D coordinate spaces. If two regions exist on
 different contigs, they are known not to overlap. If two regions are on
-a single contig, we simply check to see if they overlap on that contigs
+a single contig, we simply check to see if they overlap on that contig's
 1-D coordinate plane.
 
 Given this realization, we can define the convex hull Algorithm, which is a data parallel
@@ -88,7 +88,7 @@ Candidate Generation and Realignment
 Once we have generated the target set, we map across all the reads and
 check to see if the read overlaps a realignment target. We then group
 together all reads that map to a given realignment target; reads that do
-not map to a target are randomly assigned to a ``null’’ target. We do
+not map to a target are randomly assigned to a "null" target. We do
 not attempt realignment for reads mapped to null targets.
 
 To process non-null targets, we must first generate candidate haplotypes
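
Since the first hunk leans on the per-contig 1-D overlap check, a minimal sketch of the hull merge may be useful. Region is a hypothetical type for illustration, not ADAM's ReferenceRegion API, and abutting regions are merged along with overlapping ones:

case class Region(contig: String, start: Long, end: Long) {
  // Regions on different contigs never overlap, matching the text above.
  def overlaps(that: Region): Boolean =
    contig == that.contig && start <= that.end && that.start <= end
  def hull(that: Region): Region =
    Region(contig, start min that.start, end max that.end)
}

// Fold each contig's regions in sorted order, extending the running hull
// while regions overlap and starting a new hull otherwise.
def convexHulls(regions: Seq[Region]): Seq[Region] =
  regions.groupBy(_.contig).values.toSeq.flatMap { onContig =>
    onContig.sortBy(_.start).foldLeft(List.empty[Region]) {
      case (current :: done, r) if current.overlaps(r) => current.hull(r) :: done
      case (acc, r) => r :: acc
    }.reverse
  }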

docs/api/adamContext.rst

Lines changed: 2 additions & 2 deletions

@@ -108,11 +108,11 @@ With an ``ADAMContext``, you can load:
 ``loadReferenceFile``, which supports 2bit files, FASTA, and Parquet
 (Scala only)
 
-The methods labeled Scala only may be usable from Java, but may not be
+The methods labeled "Scala only" may be usable from Java, but may not be
 convenient to use.
 
 The ``JavaADAMContext`` class provides Java-friendly methods that are
 equivalent to the ``ADAMContext`` methods. Specifically, these methods
 use Java types, and do not make use of default parameters. In addition
 to the load/save methods described above, the ``ADAMContext`` adds the
-implicit methods needed for using ADAMs pipe_ API.
+implicit methods needed for using ADAM's pipe_ API.
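
As a brief usage sketch of the load methods this hunk documents, assuming a SparkContext named sc and hypothetical file paths; the loaders pick a parser from the file extension and fall back to Parquet:

import org.bdgenomics.adam.rdd.ADAMContext._

// The import decorates sc with the ADAMContext load methods.
val reads    = sc.loadAlignments("sample.bam")
val variants = sc.loadVariants("calls.vcf")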

docs/api/genomicRdd.rst

Lines changed: 4 additions & 4 deletions

@@ -4,7 +4,7 @@ Working with genomic data using GenomicRDDs
 As described in the section on using the
 `ADAMContext <#adam-context>`__, ADAM loads genomic data into a
 ``GenomicRDD`` which is specialized for each datatype. This
-``GenomicRDD`` wraps Apache Sparks Resilient Distributed Dataset (RDD,
+``GenomicRDD`` wraps Apache Spark's Resilient Distributed Dataset (RDD,
 (Zaharia et al. 2012)) API with genomic metadata. The ``RDD``
 abstraction presents an array of data which is distributed across a
 cluster. ``RDD``\ s are backed by a computational lineage, which allows
@@ -49,7 +49,7 @@ round trip between Parquet and VCF.
 Transforming GenomicRDDs
 ~~~~~~~~~~~~~~~~~~~~~~~~
 
-Although ``GenomicRDD``\ s do not extend Apache Sparks ``RDD`` class,
+Although ``GenomicRDD``\ s do not extend Apache Spark's ``RDD`` class,
 ``RDD`` operations can be performed on them using the ``transform``
 method. Currently, we only support ``RDD`` to ``RDD`` transformations
 that keep the same type as the base type of the ``GenomicRDD``. To apply
@@ -132,8 +132,8 @@ to load the data directly using the Spark SQL APIs, instead of loading
 the data as an RDD, and then transforming that RDD into a SQL Dataset.
 
 The functionality of the ``adam-codegen`` package is simple. The goal of
-this package is to take ADAMs Avro schemas and to remap them into
-classes that implement Scalas ``Product`` interface, and which have a
+this package is to take ADAM's Avro schemas and to remap them into
+classes that implement Scala's ``Product`` interface, and which have a
 specific style of constructor that is expected by Spark SQL.
 Additionally, we define functions that translate between these Product
 classes and the bdg-formats Avro models. Parquet files written with
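
The transform method discussed in the second hunk is the escape hatch to plain RDD operations. A hedged sketch, assuming a SparkContext named sc and a hypothetical input path; the null check guards the boxed Integer returned by the Avro getter:

import org.bdgenomics.adam.rdd.ADAMContext._

val reads = sc.loadAlignments("sample.bam")

// transform applies an RDD => RDD function of the same record type,
// while the returned GenomicRDD keeps the original sequence dictionary
// and record-group metadata.
val highQuality = reads.transform(rdd =>
  rdd.filter(r => r.getMapq != null && r.getMapq > 30))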

docs/api/joins.rst

Lines changed: 2 additions & 2 deletions

@@ -1,4 +1,4 @@
-Using ADAMs RegionJoin API
+Using ADAM's RegionJoin API
 ---------------------------
 
 Another useful API implemented in ADAM is the RegionJoin API, which
@@ -75,7 +75,7 @@ A subset of these joins are depicted in Figure 2 below.
 
 One common pattern involves joining a single dataset against many
 datasets. An example of this is joining an RDD of features (e.g.,
-gene/exon coordinates) against many different RDD’s of reads. If the
+gene/exon coordinates) against many different RDDs of reads. If the
 object that is being used many times (gene/exon coordinates, in this
 case), we can force that object to be broadcast once and reused many
 times with the ``broadcast()`` function. This pairs with the
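
A hedged sketch of the one-against-many pattern described in this hunk, assuming a SparkContext named sc and hypothetical paths. The broadcast()-and-reuse form referenced in the text avoids re-broadcasting across joins; the plain form below broadcasts the left-hand dataset on each call:

import org.bdgenomics.adam.rdd.ADAMContext._

val genes  = sc.loadFeatures("genes.gff3")
val reads1 = sc.loadAlignments("sample1.bam")
val reads2 = sc.loadAlignments("sample2.bam")

// broadcastRegionJoin collects and broadcasts the gene coordinates, then
// streams each read set past them.
val hits1 = genes.broadcastRegionJoin(reads1)
val hits2 = genes.broadcastRegionJoin(reads2)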

docs/api/overview.rst

Lines changed: 6 additions & 6 deletions

@@ -4,9 +4,9 @@ API Overview
 The main entrypoint to ADAM is the `ADAMContext <#adam-context>`__,
 which allows genomic data to be loaded in to Spark as
 `GenomicRDD <#genomic-rdd>`__. GenomicRDDs can be transformed using
-ADAMs built in `pre-processing algorithms <#algorithms>`__, `Sparks
+ADAM's built in `pre-processing algorithms <#algorithms>`__, `Spark's
 RDD primitives <#transforming>`__, the `region join <#join>`__
-primitive, and ADAMs `pipe <#pipes>`__ APIs. GenomicRDDs can also be
+primitive, and ADAM's `pipe <#pipes>`__ APIs. GenomicRDDs can also be
 interacted with as `Spark SQL tables <#sql>`__.
 
 In addition to the Scala/Java API, ADAM can be used from
@@ -54,16 +54,16 @@ changes in ADAM.
 The ADAM Python API
 -------------------
 
-ADAMs Python API wraps the `ADAMContext <#adam-context>`__ and
+ADAM's Python API wraps the `ADAMContext <#adam-context>`__ and
 `GenomicRDD <#genomic-rdd>`__ APIs so they can be used from PySpark. The
-Python API is feature complete relative to ADAMs Java API, with the
+Python API is feature complete relative to ADAM's Java API, with the
 exception of the `region join <#join>`__ API, which is not supported.
 
 The ADAM R API
 --------------
 
-ADAMs R API wraps the `ADAMContext <#adam-context>`__ and
+ADAM's R API wraps the `ADAMContext <#adam-context>`__ and
 `GenomicRDD <#genomic-rdd>`__ APIs so they can be used from SparkR. The
-R API is feature complete relative to ADAMs Java API, with the
+R API is feature complete relative to ADAM's Java API, with the
 exception of the `region join <#join>`__ API, which is not supported.

docs/api/pipes.rst

Lines changed: 6 additions & 6 deletions

@@ -1,10 +1,10 @@
-Using ADAMs Pipe API
+Using ADAM's Pipe API
 ---------------------
 
-ADAMs ``GenomicRDD`` API provides support for piping the underlying
+ADAM's ``GenomicRDD`` API provides support for piping the underlying
 genomic data out to a single node process through the use of a ``pipe``
-API. This builds off of Apache Sparks ``RDD.pipe`` API. However,
-``RDD.pipe`` prints the objects as strings to the pipe. ADAMs pipe API
+API. This builds off of Apache Spark's ``RDD.pipe`` API. However,
+``RDD.pipe`` prints the objects as strings to the pipe. ADAM's pipe API
 adds several important functions:
 
 - It supports on-the-fly conversions to widely used genomic file
@@ -75,7 +75,7 @@ is being read into or out of the pipe. We support the following:
 - We do not support piping CRAM due to complexities around the
   reference-based compression.
 - ``FeatureRDD``:
-  - ``InForamtter``\ s: ``BEDInFormatter``, ``GFF3InFormatter``,
+  - ``InFormatter``\ s: ``BEDInFormatter``, ``GFF3InFormatter``,
     ``GTFInFormatter``, and ``NarrowPeakInFormatter`` for writing
     features out to a pipe in BED, GFF3, GTF/GFF2, or NarrowPeak format,
     respectively.
@@ -163,7 +163,7 @@ each machine in our cluster. We suggest several different approaches:
 Using the Pipe API from Java
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-The pipe API example above uses Scalas implicit system and type
+The pipe API example above uses Scala's implicit system and type
 inference to make it easier to use the pipe API. However, we also
 provide a Java equivalent. There are several changes:
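
A hedged sketch of the Scala pipe usage this file documents, assuming a SparkContext named sc; the formatter and package names follow the ADAM version current at this commit and may differ in later releases, and "my_command" stands in for any single-node tool that reads SAM on stdin and writes SAM on stdout:

import org.bdgenomics.adam.rdd.ADAMContext._
import org.bdgenomics.adam.rdd.read.{ AlignmentRecordRDD, AnySAMOutFormatter, SAMInFormatter }

val reads = sc.loadAlignments("sample.bam")

// The InFormatter/OutFormatter pair fixes the on-the-wire format: records
// are serialized to the subprocess as SAM and parsed back from SAM.
implicit val tFormatter = SAMInFormatter
implicit val uFormatter = new AnySAMOutFormatter

val piped: AlignmentRecordRDD = reads.pipe("my_command")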

docs/architecture/evidence.rst

Lines changed: 3 additions & 3 deletions

@@ -1,13 +1,13 @@
-Interacting with data through ADAMs evidence access layer
+Interacting with data through ADAM's evidence access layer
 ----------------------------------------------------------
 
 ADAM exposes access to distributed datasets of genomic data through the
 `ADAMContext <#adam-context>`__ entrypoint. The ADAMContext wraps Apache
-Sparks SparkContext, which tracks the configuration and state of the
+Spark's SparkContext, which tracks the configuration and state of the
 current running Spark application. On top of the SparkContext, the
 ADAMContext provides data loading functions which yield
 `GenomicRDD <#genomic-rdd>`__\ s. The GenomicRDD classes provide a
-wrapper around Apache Sparks two APIs for manipulating distributed
+wrapper around Apache Spark's two APIs for manipulating distributed
 datasets: the legacy Resilient Distributed Dataset (Zaharia et al. 2012)
 and the new Spark SQL Dataset/DataFrame API (Armbrust et al. 2015).
 Additionally, the GenomicRDD is enriched with genomics-specific metadata

docs/architecture/overview.rst

Lines changed: 6 additions & 6 deletions

@@ -10,7 +10,7 @@ wide range of data formats and optimized query patterns without changing
 the data structures and query patterns that users are programming
 against.
 
-ADAMs architecture was introduced as a response to the challenges
+ADAM's architecture was introduced as a response to the challenges
 processing the growing volume of genomic sequencing data in a reasonable
 timeframe (Schadt et al. 2010). While the per-run latency of current
 genomic pipelines such as the GATK could be improved by manually
@@ -24,13 +24,13 @@ make it difficult for bioinformatics developers to create novel
 distributed genomic analyses, and does little to attack sources of
 inefficiency or incorrectness in distributed genomics pipelines.
 
-ADAMs architecture reconsiders how we build software for processing
+ADAM's architecture reconsiders how we build software for processing
 genomic data by eliminating the monolithic architectures that are driven
 by the underlying flat file formats used in genomics. These
 architectures impose significant restrictions, including:
 
 - These implementations are locked to a single node processing model.
-  Even the GATK’s “map-reduce styled walker API (McKenna et al. 2010)
+  Even the GATK's "map-reduce" styled walker API (McKenna et al. 2010)
   is limited to natively support processing on a single node. While
   these jobs can be manually partitioned and run in a distributed
   setting, manual partitioning can lead to imbalance in work
@@ -39,8 +39,8 @@ architectures impose significant restrictions, including:
   provided by modern distributed systems such as Apache Hadoop or Spark
   (Zaharia et al. 2012).
 - Most of these implementations assume
-  invariants about the sorted order of records on disk. This stack
-  smashing (specifically, the layout of data is used to accelerate a
+  invariants about the sorted order of records on disk. This "stack
+  smashing" (specifically, the layout of data is used to accelerate a
   processing stage) can lead to bugs when data does not cleanly map to
   the assumed sort order. Additionally, since these sort order
   invariants are rarely explicit and vary from tool to tool, pipelines
@@ -50,7 +50,7 @@ architectures impose significant restrictions, including:
   this at the cost of opacity. If we can express the query patterns
   that are accelerated by these invariants at a higher level, then we
   can achieve both a better programming environment and enable various
-  query optimizations. \end{itemize}
+  query optimizations.
 
 At the core of ADAM, users use the `ADAMContext <#adam-context>`__ to
 load data as `GenomicRDDs <#genomic-rdd>`__, which they can then
