diff --git a/_field-types/mapping-parameters/doc-values.md b/_field-types/mapping-parameters/doc-values.md
index 620a69e08a..4ef543f87d 100644
--- a/_field-types/mapping-parameters/doc-values.md
+++ b/_field-types/mapping-parameters/doc-values.md
@@ -8,36 +8,29 @@ has_children: false
 has_toc: false
 ---
 
-# doc_values
+# Doc values
 
-By default, OpenSearch indexes most fields for search purposes. The `doc_values ` parameter enables document-to-term lookups for operations such as sorting, aggregations, and scripting.
+By default, most fields are indexed and searchable using the inverted index. The inverted index works by storing a unique sorted list of terms and mapping each term to the documents that contain it.
 
-The `doc_values` parameter accepts the following options.
+Sorting, aggregations, and field access in scripts, however, require a different approach. Instead of finding documents from terms, these operations need to retrieve terms from specific documents.
 
-Option | Description
-:--- | :---
-`true` | Enables `doc_values` for the field. Default is `true`.
-`false` | Disables `doc_values` for the field.
+Doc values make these operations possible. They are an on-disk, column-oriented data structure created at index time. Although they store the same values as the `_source` field, their format is optimized for fast sorting and aggregations.
 
-The `doc_values` parameter is not supported for use in text fields.
+Doc values are enabled by default on nearly all field types, except for `text` fields. If you know that a field won't be used for sorting, aggregations, or scripting, you can disable doc values in order to reduce disk usage.
 
-For more information on using `doc_values` with the `index` parameter, see [The `index` and `doc_values` parameters compared]({{site.url}}{{site.baseurl}}/field-types/mapping-parameters/index-parameter/#the-index-and-doc-values-parameters-compared).
+## Example
 
----
-
-## Example: Creating an index with `doc_values` enabled and disabled
-
-The following example request creates an index with `doc_values` enabled for one field and disabled for another:
+To understand how `doc_values` affect fields, create a sample index. In this index, the `status_code` field has `doc_values` enabled by default, allowing it to support sorting and aggregations. The `session_id` field has `doc_values` disabled, so it does not support sorting or aggregations but can still be queried:
 
 ```json
-PUT my-index-001
+PUT /web_analytics
 {
   "mappings": {
     "properties": {
-      "status_code": {
+      "status_code": {
         "type": "keyword"
       },
-      "session_id": {
+      "session_id": {
         "type": "keyword",
         "doc_values": false
       }
@@ -46,3 +39,108 @@ PUT my-index-001
 }
 ```
 {% include copy-curl.html %}
+
+Add some sample data to the index:
+
+```json
+PUT /web_analytics/_doc/1
+{
+  "status_code": "200",
+  "session_id": "abc123"
+}
+```
+{% include copy-curl.html %}
+
+```json
+PUT /web_analytics/_doc/2
+{
+  "status_code": "404",
+  "session_id": "def456"
+}
+```
+{% include copy-curl.html %}
+
+```json
+PUT /web_analytics/_doc/3
+{
+  "status_code": "200",
+  "session_id": "ghi789"
+}
+```
+{% include copy-curl.html %}
+
+Perform an aggregation on the `status_code` field:
+
+```json
+GET /web_analytics/_search
+{
+  "size": 0,
+  "aggs": {
+    "status_codes": {
+      "terms": {
+        "field": "status_code"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+This aggregation returns correct results because `status_code` has `doc_values` enabled:
+
+```json
+{
+  "took": 37,
+  "timed_out": false,
+  "terminated_early": true,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 3,
+      "relation": "eq"
+    },
+    "max_score": null,
+    "hits": []
+  },
+  "aggregations": {
+    "status_codes": {
+      "doc_count_error_upper_bound": 0,
+      "sum_other_doc_count": 0,
+      "buckets": [
+        {
+          "key": "200",
+          "doc_count": 2
+        },
+        {
+          "key": "404",
+          "doc_count": 1
+        }
+      ]
+    }
+  }
+}
+```
+
+Attempt to aggregate on the `session_id` field:
+
+```json
+GET /web_analytics/_search
+{
+  "size": 0,
+  "aggs": {
+    "session_counts": {
+      "terms": {
+        "field": "session_id"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+This aggregation fails because `session_id` has `doc_values` disabled, preventing the document-to-field lookup required for aggregations.