OpenSearch has PPL functions for looking up the geographic location of IP addresses. To use these functions, a table containing the geographic location information must first be created.

A script has been created that can clean up and augment a CSV file that contains geographic location information for IP address ranges. The CSV file is expected to have the following columns:
Column Name | Description |
---|---|
cidr | IP address subnet in the format IP_ADDRESS/NETMASK (e.g. 192.168.0.0/24). The IP address can be IPv4 or IPv6 |
country_iso_code | ISO code of the country where the IP address subnet is located |
country_name | Name of the country where the IP address subnet is located |
continent_name | Name of the continent where the IP address subnet is located |
region_iso_code | ISO code of the region where the IP address subnet is located |
region_name | Name of the region where the IP address subnet is located |
city_name | Name of the city where the IP address subnet is located |
time_zone | Time zone where the IP address subnet is located |
location | Latitude and longitude where the IP address subnet is located |
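As an illustration, a data row might look like the following. The values are made up, and the `latitude,longitude` layout of the `location` column is an assumption about the expected format:

```csv
cidr,country_iso_code,country_name,continent_name,region_iso_code,region_name,city_name,time_zone,location
192.168.0.0/24,US,United States,North America,US-WA,Washington,Seattle,America/Los_Angeles,"47.6062,-122.3321"
```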
The script cleans up the data by splitting IP address subnets so that any IP address can be in at most one subnet; for example, if one subnet encloses another, the enclosing subnet is split into smaller ranges that exclude the overlap.

The data is augmented by adding three fields:
Column Name | Description |
---|---|
ip_range_start | An integer value used to determine if an IP address is in a subnet |
ip_range_end | An integer value used to determine if an IP address is in a subnet |
ipv4 | A boolean value, true if the IP address subnet is in IPv4 format |
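The `ip_range_start` and `ip_range_end` values turn a subnet membership test into a simple integer comparison. As intuition only (this is not the script's actual code), an IPv4 CIDR could be mapped to such a range as shown below; IPv6 would require a wider integer type such as `BigInt`:

```scala
// Sketch: map an IPv4 CIDR to an integer range [start, end].
// An address x is in the subnet iff start <= x <= end.
def ipv4ToLong(ip: String): Long =
  ip.split('.').foldLeft(0L)((acc, octet) => (acc << 8) | octet.toLong)

def cidrToRange(cidr: String): (Long, Long) = {
  val Array(ip, prefixStr) = cidr.split('/')
  val prefix = prefixStr.toInt
  // Network mask: the top `prefix` bits set, within 32 bits.
  val mask = if (prefix == 0) 0L else (0xFFFFFFFFL << (32 - prefix)) & 0xFFFFFFFFL
  val start = ipv4ToLong(ip) & mask       // first address in the subnet
  val end = start | (~mask & 0xFFFFFFFFL) // last address in the subnet
  (start, end)
}

// cidrToRange("192.168.0.0/24") == (3232235520L, 3232235775L)
```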
To run the script:

- Create a copy of the Scala file `load_geoip_data.scala`
- Edit the copy of `load_geoip_data.scala`. There are three variables that need to be updated (see the sketch after these steps):
  - `FILE_PATH_TO_INPUT_CSV` - the full path to the CSV file to load
  - `FILE_PATH_TO_OUTPUT_CSV` - the full path of the CSV file to write the sanitized data to
  - `TABLE_NAME` - the name of the Spark table to create. No table is created if this is `null`
- Save the file
- Run the Apache Spark CLI (`spark-shell`) and connect to the database
- Load the Scala script:
  ```
  :load FILENAME
  ```
  Replace `FILENAME` with the full path to the Scala script.
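After editing, the three declarations might look something like the following. The paths here are placeholders; the shape of each declaration matches the `sed` patterns used in the example later in this document:

```scala
var FILE_PATH_TO_INPUT_CSV: String = "/path/to/input.csv"
var FILE_PATH_TO_OUTPUT_CSV: String = "/path/to/output-fixed.csv"
var TABLE_NAME: String = "geo_ip_data" // or null to skip table creation
```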
With EMR it is necessary to load the data from an S3 object. Follow the instructions above for running the script, but make sure that `TABLE_NAME` is set to `null`. Upload the file written to `FILE_PATH_TO_OUTPUT_CSV` to S3.
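Once the cleaned CSV is in S3, one way to create the table is to point a Spark CSV table at the uploaded object. This is only a sketch under assumptions: the bucket and path are placeholders, `header 'false'` assumes the output file has no header row, and the schema is inferred rather than copied from the script:

```scala
// Sketch: create a Spark table over the cleaned CSV uploaded to S3.
spark.sql("""
  CREATE TABLE geo_ip_data
  USING csv
  OPTIONS (header 'false', inferSchema 'true')
  LOCATION 's3://my-bucket/geoip/'
""")
```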
The following example shows how to download a sample GeoIP location data set, clean it up, and import it into a Spark table.
- Use a web browser to download the data set Zip file
- Extract the Zip file
- Copy the file `geolite2-City.csv` to the computer where you run `spark-shell`
- Copy the file `load_geoip_data.scala` to the computer where you run `spark-shell`
- Connect to the computer where you run `spark-shell`
- Change to the directory containing `geolite2-City.csv` and `load_geoip_data.scala`
- Update the `load_geoip_data.scala` file to specify the CSV files to read and write, as well as the Spark table to create (`geo_ip_data` in this case):
  ```shell
  sed -i \
      -e "s#^var FILE_PATH_TO_INPUT_CSV: String =.*#var FILE_PATH_TO_INPUT_CSV: String = \"${PWD}/geolite2-City.csv\"#" \
      load_geoip_data.scala
  sed -i \
      -e "s#^var FILE_PATH_TO_OUTPUT_CSV: String = .*#var FILE_PATH_TO_OUTPUT_CSV: String = \"${PWD}/geolite2-City-fixed.csv\"#" \
      load_geoip_data.scala
  sed -i \
      -e 's#^var TABLE_NAME: String = .*#var TABLE_NAME: String = "geo_ip_data"#' \
      load_geoip_data.scala
  ```
- Run `spark-shell`:
  ```shell
  spark-shell
  ```
- Load and run the `load_geoip_data.scala` script:
  ```
  :load load_geoip_data.scala
  ```
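If `TABLE_NAME` was set as above, a quick sanity check from the same `spark-shell` session might look like this (a suggestion, not part of the original procedure):

```scala
// Confirm the table exists and rows were loaded.
spark.sql("SELECT COUNT(*) FROM geo_ip_data").show()
// Peek at a few rows, including the added range columns.
spark.sql("SELECT cidr, ip_range_start, ip_range_end, ipv4 FROM geo_ip_data LIMIT 5").show()
```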