JuliaStats · nalimilan · Oct 19, 2017 · Sep 27, 2017 · Sep 27, 2017 · Oct 4, 2017
diff --git a/README.md b/README.md
@@ -11,30 +11,22 @@ Documentation:
 [![](https://img.shields.io/badge/docs-stable-blue.svg)](https://JuliaStats.github.io/DataArrays.jl/stable)
 [![](https://img.shields.io/badge/docs-latest-blue.svg)](https://JuliaStats.github.io/DataArrays.jl/latest)
 
+The DataArrays package provides array types for working efficiently with [missing data](https://en.wikipedia.org/wiki/Missing_data)
+in Julia, based on the `null` value from the [Nulls.jl](https://github.com/JuliaData/Nulls.jl) package.
+In particular, it provides the following:
 
-The DataArrays package extends Julia by introducing data structures that can contain missing data. In particular, the package introduces three new data types to Julia:
-
-* `NA`: A singleton type that represents a single missing value.
 * `DataArray{T}`: An array-like data structure that can contain values of type `T`, but can also contain missing values.
 * `PooledDataArray{T}`: A variant of `DataArray{T}` optimized for representing arrays that contain many repetitions of a small number of unique values -- as commonly occurs when working with categorical data.
 
-# The `NA` Value
-
-Many languages represent missing values using a reserved value like `NULL` or `NA`. A missing integer value, for example, might be represented as a `NULL` value in SQL or as an `NA` value in R.
-
-Julia takes its conception of `NA` from R, where `NA` denotes missingness based on lack of information. If, for example, we were to measure people's heights as integers, an `NA` might reflect our ignorance of a specific person's height.
-
-Conceptualizing the use of `NA` as a signal of uncertainty will help you understand how `NA` interacts with other values. For example, it explains why `NA + 1` is `NA`, but `NA & false` is `false`. In general, `NA` corrupts any computation whose results cannot be determined without knowledge of the value that is `NA`.
-
 # DataArray's
 
-Most Julian arrays cannot contain `NA` values: only `Array{NAtype}` and heterogeneous Arrays can contain `NA` values. Of these, only heterogeneous arrays could contain values of any type other than `NAtype`.
+Most Julian arrays cannot contain `null` values: only `Array{Union{T, Null}}` and more generally `Array{>:Null}` can contain `null` values.
 
-The generic use of heterogeneous Arrays is discouraged in Julia because it is inefficient: accessing any value requires dereferencing a pointer. The `DataArray` type allows one to work around this inefficiency by providing tightly-typed arrays that can contain values of exactly one type, but can also contain `NA` values.
+The generic use of heterogeneous `Array` is discouraged in Julia versions below 0.7 because it is inefficient: accessing any value requires dereferencing a pointer. The `DataArray` type allows one to work around this inefficiency by providing tightly-typed arrays that can contain values of exactly one type, but can also contain `null` values.
 
-For example, a `DataArray{Int}` can contain integers and NA values. We can construct one as follows:
+For example, a `DataArray{Int}` can contain integers and `null` values. We can construct one as follows:
 
-	da = @data([1, 2, NA, 4])
+	da = @data([1, 2, null, 4])
 
 # PooledDataArray's
 

diff --git a/REQUIRE b/REQUIRE
@@ -1,4 +1,5 @@
 julia 0.6
+Nulls 0.1.2
 StatsBase 0.15.0
 Reexport
 SpecialFunctions
diff --git a/benchmark/operators.jl b/benchmark/operators.jl
@@ -6,11 +6,11 @@ srand(1776)
 
 const TEST_NAMES = [
     "Vector",
-    "DataVector No NA",
-    "DataVector Half NA",
+    "DataVector No null",
+    "DataVector Half null",
     "Matrix",
-    "DataMatrix No NA",
-    "DataMatrix Half NA"
+    "DataMatrix No null",
+    "DataMatrix Half null"
 ]
 
 function make_test_types(genfunc, sz)

diff --git a/benchmark/reduce.jl b/benchmark/reduce.jl
@@ -6,10 +6,10 @@ srand(1776)
 
 const TEST_NAMES = [
     "Vector",
-    "DataVector No NA skipna=false",
-    "DataVector No NA skipna=true",
-    "DataVector Half NA skipna=false",
-    "DataVector Half NA skipna=true"
+    "DataVector No null skipnull=false",
+    "DataVector No null skipnull=true",
+    "DataVector Half null skipnull=false",
+    "DataVector Half null skipnull=true"
 ]
 
 function make_test_types(genfunc, sz)
@@ -29,9 +29,9 @@ macro perf(fn, replications)
         println($fn)
         fns = [()->$fn(Data[1]),
                ()->$fn(Data[2]),
-               ()->$fn(Data[2]; skipna=true),
+               ()->$fn(Data[2]; skipnull=true),
                ()->$fn(Data[3]),
-               ()->$fn(Data[3]; skipna=true)]
+               ()->$fn(Data[3]; skipnull=true)]
         gc_disable()
         df = compare(fns, $replications)
         gc_enable()

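Note on the `skipnull` keyword used in the benchmark closures above: its effect can be pictured with a minimal sketch. `MaskedVector` and `masked_sum` are hypothetical names for illustration, not DataArrays API, and Base's `missing` (Julia 1.x) stands in for `null`. The sketch assumes that a reduction propagates the missing value unless `skipnull=true` is passed.

```julia
# Hypothetical stand-in for a DataVector: values plus a mask of missing entries.
struct MaskedVector{T}
    data::Vector{T}
    isnull::Vector{Bool}
end

# Sum that either skips masked entries or propagates missingness (assumed behavior).
function masked_sum(v::MaskedVector; skipnull::Bool=false)
    if skipnull
        return sum(v.data[.!v.isnull])   # keep only the entries that are present
    end
    return any(v.isnull) ? missing : sum(v.data)
end

v = MaskedVector([1, 2, 3, 4], [false, false, true, false])
total_skipping = masked_sum(v; skipnull=true)   # 1 + 2 + 4
total_strict = masked_sum(v)                    # missing, because one entry is masked
```

The "No null" and "Half null" benchmark cases differ only in how many mask entries are set, which is why each one is timed with and without the keyword.
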
diff --git a/benchmark/reducedim.jl b/benchmark/reducedim.jl
@@ -6,10 +6,10 @@ srand(1776)
 
 const TEST_NAMES = [
     "Matrix",
-    "DataMatrix No NA skipna=false",
-    "DataMatrix No NA skipna=true",
-    "DataMatrix Half NA skipna=false",
-    "DataMatrix Half NA skipna=true"
+    "DataMatrix No null skipnull=false",
+    "DataMatrix No null skipnull=true",
+    "DataMatrix Half null skipnull=false",
+    "DataMatrix Half null skipnull=true"
 ]
 
 function make_test_types(genfunc, sz)
@@ -29,9 +29,9 @@ macro perf(fn, dim, replications)
         println($fn, " (region = ", $dim, ")")
         fns = [()->$fn(Data[1], $dim),
                ()->$fn(Data[2], $dim),
-               ()->$fn(Data[2], $dim; skipna=true),
+               ()->$fn(Data[2], $dim; skipnull=true),
                ()->$fn(Data[3], $dim),
-               ()->$fn(Data[3], $dim; skipna=true)]
+               ()->$fn(Data[3], $dim; skipnull=true)]
         gc_disable()
         df = compare(fns, $replications)
         gc_enable()

diff --git a/docs/src/da.md b/docs/src/da.md
@@ -1,14 +1,7 @@
-# Representing missing data
-
 ```@meta
 CurrentModule = DataArrays
 ```
 
-```@docs
-NA
-NAtype
-```
-
 ## Arrays with possibly missing data
 
 ```@docs
@@ -19,9 +12,7 @@ DataArray
 DataVector
 DataMatrix
 @data
-isna
-dropna
-padna
+padnull
 levels
 ```
 

diff --git a/docs/src/index.md b/docs/src/index.md
@@ -1,11 +1,10 @@
 # DataArrays.jl
 
-This package provides functionality for working with [missing data](https://en.wikipedia.org/wiki/Missing_data)
-in Julia.
+This package provides array types for working efficiently with [missing data](https://en.wikipedia.org/wiki/Missing_data)
+in Julia, based on the `null` value from the [Nulls.jl](https://github.com/JuliaData/Nulls.jl) package.
 In particular, it provides the following:
 
-* `NA`: A singleton representing a missing value
-* `DataArray{T}`: An array type that can house both values of type `T` and missing values
+* `DataArray{T}`: An array type that can house both values of type `T` and missing values (of type `Null`)
 * `PooledDataArray{T}`: An array type akin to `DataArray` but optimized for arrays with a smaller set of unique
   values, as commonly occurs with categorical data
 

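Note on the pooled representation described above: the idea can be sketched in plain Julia. `pool_encode` is a hypothetical helper, not part of DataArrays, and the layout shown is an assumption meant only to convey the idea of storing each unique value once. The `UInt32` reference type mirrors `DEFAULT_POOLED_REF_TYPE` in `src/DataArrays.jl`.

```julia
# Hypothetical illustration of pooling: store each unique value once and
# keep a compact integer reference per element.
function pool_encode(xs::AbstractVector)
    pool = unique(xs)                                    # distinct values, in order of first appearance
    index = Dict(v => UInt32(i) for (i, v) in enumerate(pool))
    refs = UInt32[index[x] for x in xs]                  # one small integer per element
    return pool, refs
end

pool, refs = pool_encode(["a", "b", "a", "a", "b"])
decoded = pool[refs]                                     # indexing the pool recovers the original values
```

With few unique values and many repetitions, the per-element cost is one `UInt32` rather than a full copy of the value.
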
diff --git a/spec/literals.md b/spec/literals.md
@@ -19,51 +19,44 @@ Julia's parser rewrites both of these literals as calls to the `vcat`
 function. The `vcat` function computes the tightest type that would
 enclose all of the values in the literal array. (REVISE)
 
-Because of the strange place occupied by `NAtype` in Julia's type
-hierarchy, the tightest type that would enclose any literal array
-containing a single `NA` would be `Any`, which is not very useful.
-As such, the DataArrays package needs to provide an alternative
-tool for writing out literal DataArray's.
-
-This is accomplished by using two macros, `@data` and `@pdata`,
-which rewrite array literals into a form that will allow proper
-typing.
+Two macros, `@data` and `@pdata`, rewrite array literals into a form
+that will allow direct construction of `DataArray`s and `PooledDataArray`s.
 
 # Basic Principle
 
 The basic mechanism that powers the `@data` and `@pdata` macros is the
 rewriting of array literals as a call to DataArray or PooledDataArray
 with a rewritten array literal and a Boolean mask that specifies where
-`NA` occurred in the original literal.
+`null` occurred in the original literal.
 
 For example,
 
-    @data [1, 2, NA, 4]
+    @data [1, 2, null, 4]
 
 will be rewritten as,
 
     DataArray([1, 2, 1, 4], [false, false, true, false])
 
 Note the added `1` created during the rewriting of the array literal.
 This value is called a `stub` and is always the first value found
-in the literal array that is not `NA`. The use of stubs explains two
+in the literal array that is not `null`. The use of stubs explains two
 important properties of the `@data` and `@pdata` macros:
 
 * If the entries of the array literal are not fixed values, but function calls, these function calls must be pure. Otherwise the impure function may be called more times than expected.
-* It is not possible to specify a literal DataArray that contains only `NA` values.
-* None of the variables used in a literal array can be called `NA`. This is just good style anyway, so it is not much of a limitation.
+* It is not possible to specify a literal DataArray that contains only `null` values.
+* None of the variables used in a literal array can be called `null`. This is just good style anyway, so it is not much of a limitation.
 
 # Limitations
 
 We restate the limitations noted above:
 
 * If the entries of the array literal are not fixed values, but function calls, these function calls must be pure. Otherwise the impure function may be called more times than expected.
-* It is not possible to specify a literal DataArray that contains only `NA` values.
-* None of the variables used in a literal array can be called `NA`. This is just good style anyway, so it is not much of a limitation.
+* It is not possible to specify a literal DataArray that contains only `null` values.
+* None of the variables used in a literal array can be called `null`. This is just good style anyway, so it is not much of a limitation.
 
 
 Note that the latter limitation is not very important, because a DataArray
-with only `NA` values is already problematic because it has no well-defined
+with only `null` values is already problematic because it has no well-defined
 type in Julia.
 
 One final limitation is that the rewriting rules are not able to

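Note on the stub-and-mask rewrite described in `spec/literals.md`: it can be sketched as an ordinary function. `stub_and_mask` is a hypothetical helper, not the macro's actual implementation, and Base's `missing` (Julia 1.x) stands in for `null`. The macros work on the literal's expression, whereas this sketch works on an already-evaluated vector.

```julia
# Hypothetical helper, not part of DataArrays: split a vector into
# stubbed values and a Boolean mask, mirroring the rewrite described above.
function stub_and_mask(xs::AbstractVector)
    mask = [ismissing(x) for x in xs]        # true where the entry is missing
    i = findfirst(!, mask)                   # index of the first non-missing entry
    i === nothing && error("only missing values: no stub is available")
    stub = xs[i]
    vals = [m ? stub : x for (x, m) in zip(xs, mask)]
    return vals, mask
end

vals, mask = stub_and_mask([1, 2, missing, 4])
# vals == [1, 2, 1, 4]
# mask == [false, false, true, false]
```

The error branch corresponds to the stated limitation that a literal containing only `null` values cannot be written, since no stub exists to fill the masked positions.
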
diff --git a/src/DataArrays.jl b/src/DataArrays.jl
@@ -4,6 +4,7 @@ module DataArrays
     using Base: promote_op
     using Base.Cartesian, Reexport
     @reexport using StatsBase
+    @reexport using Nulls
     using SpecialFunctions
 
     const DEFAULT_POOLED_REF_TYPE = UInt32
@@ -25,23 +26,10 @@ module DataArrays
            DataArray,
            DataMatrix,
            DataVector,
-           dropna,
-           each_failna,
-           each_dropna,
-           each_replacena,
-           EachFailNA,
-           EachDropNA,
-           EachReplaceNA,
            FastPerm,
            getpoolidx,
            gl,
-           head,
-           isna,
-           levels,
-           NA,
-           NAException,
-           NAtype,
-           padna,
+           padnull,
            pdata,
            PooledDataArray,
            PooledDataMatrix,
@@ -51,11 +39,9 @@ module DataArrays
            rep,
            replace!,
            setlevels!,
-           setlevels,
-           tail
+           setlevels
 
     include("utils.jl")
-    include("natype.jl")
     include("abstractdataarray.jl")
     include("dataarray.jl")
     include("pooleddataarray.jl")
@@ -71,7 +57,6 @@ module DataArrays
     include("extras.jl")
     include("grouping.jl")
     include("statistics.jl")
-    include("predicates.jl")
     include("literals.jl")
     include("deprecated.jl")
 end