KVStore Tools #177

Open
wants to merge 21 commits into
base: master
Changes from 4 commits
Commits
9f689aa
init kv features
arcsector Mar 18, 2023
2e10ee3
README and post_install changes
arcsector Mar 21, 2023
ce2c80a
Fixes for auth and disable
arcsector Mar 23, 2023
e30dc3a
Additional KVstore helpers and tasks
arcsector Mar 23, 2023
5b71f97
added a login task and included in kvstore related tasks
dtwersky Mar 29, 2023
abd4739
fixed missing tick in README.md
dtwersky Mar 29, 2023
2fe810b
fixed another typo in README.md
dtwersky Mar 29, 2023
1fbac54
become_user to splunk for login
dtwersky Mar 29, 2023
3263c35
kvstore tools fixes
arcsector Mar 31, 2023
099913b
become and changed_when:false for Get current SHCluster captain
dtwersky Mar 31, 2023
0abf12a
become and checked_when:false for Get current KVStore captain
dtwersky Mar 31, 2023
340a3b7
Using version var & cleaning upgrade conditionals
arcsector Apr 3, 2023
b7272dd
Merge branch 'feat-kv-migration' of github.com:arcsector/ansible-role…
arcsector Apr 3, 2023
360e7e4
created block for task. added become to whole block
dtwersky Apr 3, 2023
38a5c77
fixed splunk_authenticated typo. replaced command with shell
dtwersky Apr 3, 2023
ffe30c4
default values for when kvstore-status doesn't return serverVersion
arcsector Apr 28, 2023
dcaaa7f
Change Oplog size based on support recommendations
arcsector Mar 15, 2024
5061198
Check current oplog size against requested oplog size
arcsector Mar 19, 2024
2fa341d
auth for statuses
arcsector Mar 20, 2024
3f8d4d2
documenting oplog kv task
arcsector Jan 23, 2025
483f335
bring up-to-date with master
arcsector Jan 23, 2025
7 changes: 7 additions & 0 deletions README.md
@@ -125,9 +125,12 @@ This section contains additional reference documentation.

Note: Any task with an **adhoc** prefix means that it can be used independently as a `deployment_task` in a playbook. You can use the tasks to resolve various Splunk problems or perform one-time activities, such as decommissioning an indexer from an indexer cluster.

- **adhoc_backup_kvstore.yml** - Backs up your KVStore. Use the var `archive_name` to specify an archive name other than the default.
- **adhoc_clean_dispatch.yml** - This task is intended to be used for restoring service to search heads should the dispatch directory become full. You should not need to use this task in a healthy environment, but it is at your disposal should the need arise. The task will stop splunk, remove all files in the dispatch directory, and then start splunk.
- **adhoc_clean_kvstore.yml** - Cleans all data from the local KVStore, allowing it to pull the latest data from the KVStore captain. Usually done when KVStore is down but splunkd is still running fine.
- **adhoc_configure_hostname** - Configure a Splunk server's hostname using the value from inventory_hostname. It configures the system hostname, serverName in server.conf and host in inputs.conf. All Splunk configuration changes are made using the ini_file module, which will preserve any other existing configurations that may exist in server.conf and/or inputs.conf.
- **adhoc_decom_indexer.yml** - Executes a splunk offline --enforce-counts command. This is useful when decommissioning one or more indexers from an indexer cluster.
- **adhoc_destructive_resync_kvstore.yml** - Removes an SH member from the cluster, cleans its KVStore, then adds it back to the cluster. Usually used when the SH bundle and KV bundle have been out of sync for longer than a few hours.
- **adhoc_fix_mongo.yml** - Use when Splunk is in a stopped state to fix mongodb/kvstore issues. This task ensures that permissions are set correctly on mongo's splunk.key file and deletes mongod.lock if it exists.
- **adhoc_fix_server_certificate.yml** - Use to delete an expired server.pem and generate a new one (default certs). Useful if your server.pem certificate has expired and you are using Splunk's default certificate for splunkd. Note that default certificates present a security risk and that their use should be avoided, if possible.
- **adhoc_kill_splunkd.yml** - Some releases of Splunk have a "feature" that leaves zombie splunkd processes after a 'splunk stop'. Use this task after a 'splunk stop' to make sure that it's really stopped. Useful for upgrades on some of the 7.x releases, and automatically called by the upgrade_splunk.yml task.
@@ -141,6 +144,7 @@ Note: Any task with an **adhoc** prefix means that it can be used independently
- **configure_idxc_manager.yml** - Configures a Splunk host to act as a manager node using `splunk_idxc_rf`, `splunk_idxc_sf`, `splunk_idxc_key`, and `splunk_idxc_label`.
- **configure_idxc_member.yml** - Configures a Splunk host as an indexer cluster member using `splunk_uri_cm`, `splunk_idxc_rep_port`, and `splunk_idxc_key`.
- **configure_idxc_sh.yml** - Configures a search head to join an existing indexer cluster using `splunk_uri_cm` and `splunk_idxc_key`.
- **configure_kvstore.yml** - Disables KVStore when `splunk_enable_kvstore` is `false`; otherwise sets KVStore-related options in `server.conf` from the values configured in the defaults, such as `splunk_kvstore_storage` and `splunk_oplog_size`.
- **configure_license.yml** - Configure the license group to the `splunk_license_group` variable defined. Default is `Trial`. Available values are Trial, Free, Enterprise, Forwarder, Manager, or Peer. If set to `Peer`, the `splunk_uri_lm` must be defined. Note: This could also be accomplished using configure_apps.yml with a git repository.
- **configure_os.yml** - Increases ulimits for the splunk user and disables Transparent Huge Pages (THP) per Splunk implementation best practices.
- **configure_serverclass.yml** - Generates a new serverclass.conf file from the serverclass.conf.j2 template and installs it to $SPLUNK_HOME/etc/system/local/serverclass.conf.
@@ -156,10 +160,13 @@ Note: Any task with an **adhoc** prefix means that it can be used independently
You can set if the download/unarchive process uses the Ansible host or if each host downloads and unarchives the package individually by setting `splunk_download_local`.
Default is `true` which will download the package to the Ansible host once and unarchive to each host from there.
If set to `false` the package will be downloaded and unarchived to each host individually. Immediately after unarchive the package will be removed from the host.
- **get_kvstore_captain.yml** - Gets the current captain in the KVStore cluster.
- **get_shcluster_captain.yml** - Gets the current captain in the SHCluster.
- **install_apps.yml** - *Do not call install_apps.yml directly! Use configure_apps.yml* - Called by configure_apps.yml to perform app installation on the Splunk host.
- **install_splunk.yml** - *Do not call install_splunk.yml directly! Use check_splunk.yml* - Called by check_splunk.yml to install/upgrade Splunk and Splunk Universal Forwarders, as well as perform any initial configurations. This task is called by check_splunk.yml when the check determines that Splunk is not currently installed. This task will create the splunk user and splunk group, configure the bash profile for the splunk user (by calling configure_bash.yml), configure THP and ulimits (by calling configure_os.yml), download and install the appropriate Splunk package (by calling download_and_unarchive.yml), configure a common splunk.secret (by calling configure_splunk_secret.yml, if configure_secret is defined), create a deploymentclient.conf file with the splunk_ds_uri and clientName (by calling configure_deploymentclient.yml, if clientName is defined), install a user-seed.conf with a prehashed admin password (if used_seed is defined), and will then call the post_install.yml task. See post_install.yml entry for details on post-installation tasks.
- **install_utilities.yml** - Installs Linux packages that are useful for troubleshooting Splunk-related issues when `install_utilities: true` and `linux_packages` is defined with a list of packages to install.
- **configure_dmesg.yml** - Some distros restrict access to read `dmesg` for non-root users. This allows the `splunk` user to run the `dmesg` command. Defaults to `false`.
- **kvstore_upgrade.yml** - Upgrades the KVStore storage engine and/or server version on either a standalone instance or a distributed deployment.
- **main.yml** - This is the main task that will always be called when executing this role. This task sets the appropriate variables for full vs uf packages, sends a Slack notification about the play if the slack_token and slack_channel are defined, checks the current boot-start configuration to determine if it's in the expected state, and then includes the task from the role to execute against, as defined by the value of the deployment_task variable. The deployment_task variable should be defined in your playbook(s). Refer to the included example playbooks to see this in action.
- **post_install.yml** - Executes post-installation tasks. Performs a touch on the .ui_login file which disables the first-time login prompt to change your password, ensures that `splunk_home` is owned by the correct user and group, and optionally configures three scripts to: cleanup crash logs and old diags (by calling add_crashlog_script.yml and add_diag_script.yml, respectively), and a pstack generation shell script for troubleshooting purposes (by calling add_pstack_script.yml). This task will install various Linux troubleshooting utilities (by calling install_utilities.yml) when `install_utilities: true`.
- **set_maintenance_mode.yml** - Enables or disables maintenance mode on a cluster manager. Intended to be called by playbooks for indexer cluster upgrades/maintenance. Requires the `state` variable to be defined. Valid values: enabled, disabled
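
As a rough illustration of the note above that any **adhoc** task can be run on its own as a `deployment_task`, a playbook for the new KVStore backup task might look like the sketch below. The play name, the `shc` host group, and the archive name are placeholders rather than anything defined by this PR; the role name `splunk` is taken from the `roles/splunk` path.

```yaml
# Hypothetical playbook: play name, host group, and archive name are placeholders.
- name: Back up KVStore before maintenance
  hosts: shc
  roles:
    - role: splunk
      vars:
        deployment_task: adhoc_backup_kvstore.yml  # task file added in this PR
        archive_name: kvstore_pre_maintenance      # optional; omit to use the default name
```

The same pattern should apply to `adhoc_clean_kvstore.yml` and `adhoc_destructive_resync_kvstore.yml`, which take no extra variables in the tasks shown in this diff.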
4 changes: 4 additions & 0 deletions roles/splunk/defaults/main.yml
@@ -69,6 +69,10 @@ splunk_shc_target_group: shc
splunk_shc_deployer: "{{ groups['shdeployer'] | first }}" # If you manage multiple SHCs, configure the var value in group_vars
splunk_shc_uri_list: "{% for h in groups[splunk_shc_target_group] %}https://{{ hostvars[h].ansible_fqdn }}:{{ splunkd_port }}{% if not loop.last %},{% endif %}{% endfor %}" # If you manage multiple SHCs, configure the var value in group_vars
start_splunk_handler_fired: false # Do not change; used to prevent unnecessary splunk restarts
splunk_enable_kvstore: true
splunk_kvstore_storage: undefined # Can be defined here or at the group_vars level - accepted values: "wiredTiger" or "undefined", which leaves the default in place
splunk_kvstore_version: undefined # Can be defined here or at the group_vars level - accepted values: 4.2 or "undefined", which leaves the default in place
Collaborator

I don't see this variable used either

Contributor Author

Oh I see, the splunk_kvstore_version is unused - I'll add it to the conditionals for the bottom of the upgrade procedure

splunk_oplog_size: 1000 # Default for Splunk Enterprise - should be changed at the group_vars level only at the behest of Splunk support with special care taken
# Linux and scripting related vars
add_crashlog_script: false # Set to true to install a script and cron job to automatically cleanup splunk crash logs older than 7 days
add_diag_script: false # Set to true to install a script and cron job to automatically cleanup splunk diag files older than 30 days
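
Since the comments on the new defaults say they can be overridden at the group_vars level, an override might look roughly like the sketch below. The file path is hypothetical; `shc` is used only because it is the default `splunk_shc_target_group` shown in the hunk header.

```yaml
# group_vars/shc.yml (hypothetical file name)
splunk_enable_kvstore: true
splunk_kvstore_storage: wiredTiger  # written to [kvstore] storageEngine by configure_kvstore.yml
# splunk_oplog_size: 1000           # keep the default unless Splunk support advises otherwise
```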
15 changes: 15 additions & 0 deletions roles/splunk/tasks/adhoc_backup_kvstore.yml
@@ -0,0 +1,15 @@
---
- name: Backup KVStore on desired host
  ansible.builtin.command: |
    {{ splunk_home }}/bin/splunk -auth {{ splunk_auth }} backup kvstore {{ archive_name | default("") }}
  become: true
  become_user: "{{ splunk_nix_user }}"
  register: splunk_kvstore_backup_out
  changed_when: splunk_kvstore_backup_out.rc == 0
  failed_when: splunk_kvstore_backup_out.rc != 0

# The pipeline needs a shell; ansible.builtin.command does not interpret "|"
- name: Check that backup has finished
  ansible.builtin.shell: |
    {{ splunk_home }}/bin/splunk -auth {{ splunk_auth }} show kvstore-status | grep backupRestoreStatus | sed -r 's/\s+backupRestoreStatus : //g'
  become: true
  become_user: "{{ splunk_nix_user }}"
  register: splunk_kvstore_status_out
  # Conditionals are bare Jinja expressions, so no "{{ }}" delimiters here
  until: splunk_kvstore_status_out.stdout == 'Ready'
  changed_when: false
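
One thing worth noting about the status check: when `retries` and `delay` are not set, an `until` loop in Ansible makes 3 attempts 5 seconds apart, which may be too short for a large KVStore backup. A sketch of the same check with an explicit retry budget is below; the `retries` and `delay` values are illustrative assumptions, not numbers taken from this PR.

```yaml
- name: Wait for KVStore backup to finish
  ansible.builtin.shell: |
    {{ splunk_home }}/bin/splunk -auth {{ splunk_auth }} show kvstore-status | grep backupRestoreStatus | sed -r 's/\s+backupRestoreStatus : //g'
  become: true
  become_user: "{{ splunk_nix_user }}"
  register: splunk_kvstore_status_out
  until: splunk_kvstore_status_out.stdout | trim == 'Ready'
  retries: 30         # assumed value: poll up to 30 times
  delay: 10           # assumed value: wait 10 seconds between polls
  changed_when: false # read-only status query
  no_log: true        # the command line contains splunk_auth credentials
```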
14 changes: 14 additions & 0 deletions roles/splunk/tasks/adhoc_clean_kvstore.yml
@@ -0,0 +1,14 @@
---
- name: Stop Splunkd service
  include_tasks: splunk_stop.yml

- name: Clean KVStore
  ansible.builtin.command: "{{ splunk_home }}/bin/splunk -auth {{ splunk_auth }} clean kvstore --local --answer-yes"
  become: true
  become_user: "{{ splunk_nix_user }}"
  register: clean_result
  changed_when: clean_result.rc == 0
  failed_when: clean_result.rc != 0
  notify:
    - start splunk
  no_log: true
22 changes: 22 additions & 0 deletions roles/splunk/tasks/adhoc_destructive_resync_kvstore.yml
@@ -0,0 +1,22 @@
---
# We have to do this first so that we store the captain before removing from the cluster
- name: Get SHCluster captain
  include_tasks: get_shcluster_captain.yml

- name: Remove SHCluster member
  ansible.builtin.command: "{{ splunk_home }}/bin/splunk -auth {{ splunk_auth }} remove shcluster-member"
  become: true
  become_user: "{{ splunk_nix_user }}"
  register: splunk_remove_shcluster_member
  changed_when: splunk_remove_shcluster_member.rc == 0
  failed_when: splunk_remove_shcluster_member.rc != 0
  no_log: true

- name: Clean KVStore
  include_tasks: adhoc_clean_kvstore.yml

# Registers its own result variable instead of reusing the "remove" one
- name: Add SHCluster member from current member
  ansible.builtin.command: |
    {{ splunk_home }}/bin/splunk -auth {{ splunk_auth }} add shcluster-member -current_member_uri {{ splunk_shc_captain }}
  become: true
  become_user: "{{ splunk_nix_user }}"
  register: splunk_add_shcluster_member
  changed_when: splunk_add_shcluster_member.rc == 0
  failed_when: splunk_add_shcluster_member.rc != 0
  no_log: true
28 changes: 28 additions & 0 deletions roles/splunk/tasks/configure_kvstore.yml
@@ -0,0 +1,28 @@
---
- name: Disable KVStore if specified
  include_tasks: disable_kvstore.yml
  when: not splunk_enable_kvstore

- name: Configure initial KVStore storage engine in server.conf
  community.general.ini_file:
    path: "{{ splunk_home }}/etc/system/local/server.conf"
    section: kvstore
    option: storageEngine
    value: "{{ splunk_kvstore_storage }}"
    owner: "{{ splunk_nix_user }}"
    group: "{{ splunk_nix_group }}"
    mode: "0644"
  become: true
  when:
    - splunk_kvstore_storage == "wiredTiger"
    - splunk_enable_kvstore

- name: Configure initial KVStore oplog size in server.conf
  community.general.ini_file:
    path: "{{ splunk_home }}/etc/system/local/server.conf"
    section: kvstore
    option: oplogSize
    value: "{{ splunk_oplog_size }}"
  become: true
  become_user: "{{ splunk_nix_user }}"
  when: splunk_enable_kvstore
6 changes: 4 additions & 2 deletions roles/splunk/tasks/disable_kvstore.yml
@@ -1,10 +1,12 @@
 ---
 - name: Disable KVStore
-  when: ansible_system == "Linux"
+  when:
+    - ansible_system == "Linux"
+    - not splunk_enable_kvstore
   ini_file:
     path: "{{ splunk_home }}/etc/system/local/server.conf"
     section: kvstore
     option: disabled
     value: "true"
   become: True
-  become_user: "{{ splunk_nix_user }}"
+  become_user: "{{ splunk_nix_user }}"
12 changes: 12 additions & 0 deletions roles/splunk/tasks/get_kvstore_captain.yml
@@ -0,0 +1,12 @@
---
# Gets KVStore captain hostname - like splunk_captain.domain.com
# Uses shell because ansible.builtin.command does not interpret "|"
- name: Get current KVStore captain
  ansible.builtin.shell: |
    {{ splunk_home }}/bin/splunk -auth {{ splunk_auth }} show kvstore-status | grep -B10 "KV store captain" | grep "hostAndPort" | sed -r 's/\s+hostAndPort : //g' | sed -r 's/:[0-9]+//g'
  become: true
  become_user: "{{ splunk_nix_user }}"
  register: splunk_get_kvcaptain
  changed_when: false # read-only query; never reports a change
  failed_when: splunk_get_kvcaptain.rc != 0

- name: Register KVStore captain fact
  ansible.builtin.set_fact:
    splunk_kv_captain: "{{ splunk_get_kvcaptain.stdout }}"
12 changes: 12 additions & 0 deletions roles/splunk/tasks/get_shcluster_captain.yml
@@ -0,0 +1,12 @@
---
# Gets SHC captain management uri - like https://splunk_captain.example:8089
# Uses shell because ansible.builtin.command does not interpret "|"
- name: Get current SHCluster captain
  ansible.builtin.shell: |
    {{ splunk_home }}/bin/splunk -auth {{ splunk_auth }} show shcluster-status | grep -A6 Captain | grep mgmt_uri | sed -r 's/\s+mgmt_uri : //g'
  become: true
  become_user: "{{ splunk_nix_user }}"
  register: splunk_get_shcaptain
  changed_when: false # read-only query; never reports a change
  failed_when: splunk_get_shcaptain.rc != 0

- name: Register SHCluster captain fact
  ansible.builtin.set_fact:
    splunk_shc_captain: "{{ splunk_get_shcaptain.stdout }}"
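
A possible hardening for the two captain lookups: when a pipeline runs under a shell, its exit status is that of the last command (`sed` here), so a failed `splunk` call or an empty `grep` match would still return 0 and leave the fact set to an empty string. The sketch below enables `pipefail` so that any failing stage fails the task. It assumes bash is available at `/bin/bash` on the target hosts and is not part of this PR.

```yaml
- name: Get current SHCluster captain
  ansible.builtin.shell: |
    set -o pipefail
    {{ splunk_home }}/bin/splunk -auth {{ splunk_auth }} show shcluster-status | grep -A6 Captain | grep mgmt_uri | sed -r 's/\s+mgmt_uri : //g'
  args:
    executable: /bin/bash # pipefail is a bash option; this path is an assumption
  become: true
  become_user: "{{ splunk_nix_user }}"
  register: splunk_get_shcaptain
  changed_when: false     # read-only query
  no_log: true            # the command line contains splunk_auth credentials
```

With `pipefail` set, a non-matching `grep` exits 1 and the task fails through Ansible's default return-code check, so an explicit `failed_when` on `rc` is no longer needed.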