Remove unsupported dedup consecutive parameters from doc #1082

Merged
merged 1 commit into from
Mar 18, 2025
24 changes: 2 additions & 22 deletions docs/ppl-lang/ppl-dedup-command.md
@@ -8,7 +8,6 @@
- [Example 1: Dedup by one field](#example-1-dedup-by-one-field)
- [Example 2: Keep 2 duplicates documents](#example-2-keep-2-duplicates-documents)
- [Example 3: Keep or Ignore the empty field by default](#example-3-keep-or-ignore-the-empty-field-by-default)
- [Example 4: Dedup in consecutive document](#example-4-dedup-in-consecutive-document)
- [Limitation](#limitation)

### Description
@@ -18,12 +17,11 @@ Using `dedup` command to remove identical document defined by field from the sea
### Syntax

```sql
dedup [int] <field-list> [keepempty=<bool>] [consecutive=<bool>]
dedup [int] <field-list> [keepempty=<bool>]
```

* int: optional. The ``dedup`` command retains multiple events for each combination when you specify <int>. The number for <int> must be greater than 0. If you do not specify a number, only the first occurring event is kept. All other duplicates are removed from the results. **Default:** 1
* keepempty: optional. If true, keeps the document when any field in the field-list has a NULL value or is MISSING. **Default:** false.
* consecutive: optional. If set to true, removes only events with duplicate combinations of values that are consecutive. **Default:** false.
* field-list: mandatory. The comma-delimited field list. At least one field is required.
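
The parameter descriptions above can be illustrated with a small sketch. This is plain Python, not the Spark implementation behind the command: the function name `dedup`, the `allowed` argument (standing in for `[int]`), and the sample rows are all assumptions made for illustration. `consecutive` is deliberately not modeled, since this change removes it from the documentation as unsupported.

```python
from collections import defaultdict


def dedup(rows, fields, allowed=1, keepempty=False):
    """Hypothetical model of `dedup [int] <field-list> [keepempty=<bool>]`.

    rows      -- list of dicts, one per document
    fields    -- the field-list (at least one field)
    allowed   -- stands in for [int]; must be greater than 0
    keepempty -- keep documents where any listed field is None or missing
    """
    if allowed < 1:
        raise ValueError("allowed must be greater than 0")
    if not fields:
        raise ValueError("at least one field is required")

    seen = defaultdict(int)  # combination of field values -> rows kept so far
    kept = []
    for row in rows:
        key = tuple(row.get(f) for f in fields)
        if any(value is None for value in key):
            # NULL or MISSING field: kept only when keepempty is true
            if keepempty:
                kept.append(row)
            continue
        if seen[key] < allowed:
            seen[key] += 1
            kept.append(row)
    return kept


# Made-up sample data, loosely shaped like the accounts examples.
rows = [
    {"account_number": 1, "gender": "M"},
    {"account_number": 6, "gender": "M"},
    {"account_number": 13, "gender": "F"},
    {"account_number": 18, "gender": "M"},
    {"account_number": 20, "gender": None},
]

first_only = [r["account_number"] for r in dedup(rows, ["gender"])]
keep_two = [r["account_number"] for r in dedup(rows, ["gender"], allowed=2)]
with_empty = [r["account_number"] for r in dedup(rows, ["gender"], keepempty=True)]
```

Under these assumptions, `first_only` models `dedup gender`, `keep_two` models `dedup 2 gender`, and `with_empty` models `dedup gender keepempty=true`.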


@@ -91,23 +89,6 @@ PPL query:
+------------------+-----------------------+


### Example 4: Dedup in consecutive document

This example shows how to dedup consecutive documents.

PPL query:

os> source=accounts | dedup gender consecutive=true | fields account_number, gender;
fetched rows / total rows = 3/3
+------------------+----------+
| account_number | gender |
|------------------+----------|
| 1 | M |
| 13 | F |
| 18 | M |
+------------------+----------+


### Additional Examples

- `source = table | dedup a | fields a,b,c`
@@ -122,7 +103,6 @@ PPL query:
- `source = table | dedup 2 a,b | fields a,b,c`
- `source = table | dedup 2 a keepempty=true | fields a,b,c`
- `source = table | dedup 2 a,b keepempty=true | fields a,b,c`
- `source = table | dedup 1 a consecutive=true| fields a,b,c` (Consecutive deduplication is unsupported)

### Limitation:

@@ -154,4 +134,4 @@ Union
+- UnresolvedRelation
```

- this `dedup` command with `allowedDuplication > 1` feature needs spark version >= 3.4
- this `dedup` command with `allowedDuplication > 1` feature needs spark version >= 3.4