diff --git a/docs/ppl-lang/ppl-dedup-command.md b/docs/ppl-lang/ppl-dedup-command.md
index 4e06d275e..9f8907d50 100644
--- a/docs/ppl-lang/ppl-dedup-command.md
+++ b/docs/ppl-lang/ppl-dedup-command.md
@@ -8,7 +8,6 @@
   - [Example 1: Dedup by one field](#example-1-dedup-by-one-field)
   - [Example 2: Keep 2 duplicates documents](#example-2-keep-2-duplicates-documents)
   - [Example 3: Keep or Ignore the empty field by default](#example-3-keep-or-ignore-the-empty-field-by-default)
-  - [Example 4: Dedup in consecutive document](#example-4-dedup-in-consecutive-document)
 - [Limitation](#limitation)
 
 ### Description
@@ -18,12 +17,11 @@ Using `dedup` command to remove identical document defined by field from the sea
 
 ### Syntax
 ```sql
-dedup [int] [keepempty=] [consecutive=]
+dedup [int] [keepempty=]
 ```
 
 * int: optional. The ``dedup`` command retains multiple events for each combination when you specify . The number for must be greater than 0. If you do not specify a number, only the first occurring event is kept. All other duplicates are removed from the results. **Default:** 1
 * keepempty: optional. if true, keep the document if the any field in the field-list has NULL value or field is MISSING. **Default:** false.
-* consecutive: optional. If set to true, removes only events with duplicate combinations of values that are consecutive. **Default:** false.
 * field-list: mandatory. The comma-delimited field list. At least one field is required.
 
 
@@ -91,23 +89,6 @@ PPL query:
     +------------------+-----------------------+
 
 
-### Example 4: Dedup in consecutive document
-
-The example show dedup the consecutive document.
-
-PPL query:
-
-    os> source=accounts | dedup gender consecutive=true | fields account_number, gender;
-    fetched rows / total rows = 3/3
-    +------------------+----------+
-    | account_number   | gender   |
-    |------------------+----------|
-    | 1                | M        |
-    | 13               | F        |
-    | 18               | M        |
-    +------------------+----------+
-
-
 ### Additional Examples
 
 - `source = table | dedup a | fields a,b,c`
@@ -122,7 +103,6 @@ PPL query:
 - `source = table | dedup 2 a,b | fields a,b,c`
 - `source = table | dedup 2 a keepempty=true | fields a,b,c`
 - `source = table | dedup 2 a,b keepempty=true | fields a,b,c`
-- `source = table | dedup 1 a consecutive=true| fields a,b,c` (Consecutive deduplication is unsupported)
 
 ### Limitation:
 
@@ -154,4 +134,4 @@ Union
 +- UnresolvedRelation
 ```
 
- - this `dedup` command with `allowedDuplication > 1` feature needs spark version >= 3.4
\ No newline at end of file
+ - this `dedup` command with `allowedDuplication > 1` feature needs spark version >= 3.4