Usage
The aws-emr-launcher package depends on pyyaml to read configuration files.
Install the pyyaml package using pip in your virtual environment:

```shell
pip install pyyaml==5.2
```
Install the aws-emr-launcher package using pip in your virtual environment:

```shell
pip install aws-emr-launcher
```
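To confirm that both packages landed in the active virtual environment, a small check with the standard library's `importlib.metadata` (Python 3.8+) can help. This is a minimal sketch, not part of aws-emr-launcher; the helper name `missing_packages` is hypothetical, and it assumes the distribution names match the ones passed to pip above.

```python
from importlib.metadata import PackageNotFoundError, version


def missing_packages(names):
    """Return the distribution names from `names` that are not installed.

    Hypothetical helper for illustration; not provided by aws-emr-launcher.
    """
    missing = []
    for name in names:
        try:
            version(name)  # raises PackageNotFoundError if the distribution is absent
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Distribution names as installed with pip above (assumed).
    missing = missing_packages(["PyYAML", "aws-emr-launcher"])
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All required packages are installed.")
```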
Add your configuration files to your project. The example below assumes the following project structure:
```
aws-emr-launcher-example
│-- launcher
│   │-- config
│   │   │-- __init__.py
│   │   │-- configurations.json
│   │   │-- default.yaml
│   │   │-- prod.yaml
│   │-- __init__.py
│   │-- launcher.py
```
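Because the launcher receives file paths, a missing or misnamed config file only surfaces once it tries to read it. The sketch below fails fast instead, resolving the three files from the example structure to absolute paths with `pathlib`. It is an optional pre-flight check under the assumption that the files sit in one `config` directory; `resolve_config_paths` and `CONFIG_FILES` are hypothetical names, not part of the package.

```python
from pathlib import Path

# File names taken from the example project structure (assumed layout).
CONFIG_FILES = ("configurations.json", "default.yaml", "prod.yaml")


def resolve_config_paths(config_dir):
    """Return {file name: absolute path} for the expected config files.

    Hypothetical helper: raises FileNotFoundError listing every missing file
    so problems are reported before any cluster launch is attempted.
    """
    config_dir = Path(config_dir).resolve()
    paths = {name: config_dir / name for name in CONFIG_FILES}
    missing = [name for name, path in paths.items() if not path.is_file()]
    if missing:
        raise FileNotFoundError(
            f"Missing config files in {config_dir}: {', '.join(missing)}"
        )
    # Return plain strings, matching the string paths used with os.path.
    return {name: str(path) for name, path in paths.items()}
```

For example, `launcher.py` could call `resolve_config_paths(os.path.join(__location__, "config"))` and pass the returned paths on to `trigger_data_load`.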
In your script, import the trigger_data_load function and pass it the configuration file paths and placeholder values to launch the EMR cluster:
```python
import os

from emrlauncher.data_loader import trigger_data_load

# Absolute path of the directory containing this script
__location__ = os.path.realpath(os.path.join(os.getcwd(), os.path.dirname(__file__)))

if __name__ == "__main__":
    # regions in which the cluster needs to be launched
    regions = ["us-east-1"]

    # load configuration files (from the config directory next to this script)
    cluster_config_path = os.path.abspath(
        os.path.join(__location__, "config/configurations.json")
    )
    default_config_path = os.path.abspath(
        os.path.join(__location__, "config/default.yaml")
    )
    env_config_path = os.path.abspath(os.path.join(__location__, "config/prod.yaml"))

    # placeholder values
    input_vars = {"ENVIRONMENT": "prod", "VERSION": "1.0"}

    # launch EMR cluster
    trigger_data_load(
        regions=regions,
        cluster_config_path=cluster_config_path,
        default_config_path=default_config_path,
        env_config_path=env_config_path,
        input_vars=input_vars,
    )
```
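This README does not document how placeholders are written inside the configuration files or how `input_vars` fills them. Purely to illustrate the idea of placeholder substitution, the sketch below assumes `${NAME}`-style placeholders and uses the standard library's `string.Template`; the syntax, the helper `render_placeholders`, and the sample config line are all hypothetical and may differ from what aws-emr-launcher actually does.

```python
from string import Template


def render_placeholders(text, input_vars):
    """Replace ${NAME} placeholders in `text` with values from `input_vars`.

    Hypothetical: the ${NAME} syntax is an assumption for illustration only.
    """
    # substitute() raises KeyError when a placeholder has no matching value,
    # which surfaces typos in config files early.
    return Template(text).substitute(input_vars)


if __name__ == "__main__":
    # Hypothetical config line, not taken from the package's documentation.
    raw = "cluster_name: emr-${ENVIRONMENT}-v${VERSION}"
    print(render_placeholders(raw, {"ENVIRONMENT": "prod", "VERSION": "1.0"}))
    # prints "cluster_name: emr-prod-v1.0"
```

Using `substitute()` rather than `safe_substitute()` is a deliberate choice here: an unfilled placeholder becomes an error instead of leaking a literal `${...}` into the cluster configuration.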
If you run this package in a Lambda function or on an EC2 instance, it uses the IAM role attached to that resource to provision EMR clusters. Otherwise, you must configure AWS credentials in your environment variables.
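The README does not name the variables it expects. Assuming the package relies on the standard AWS SDK credential lookup, `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are the conventional names (plus `AWS_SESSION_TOKEN` for temporary credentials). The hypothetical helper below only checks those two variables; the SDKs can also pick up credentials from other sources such as the shared credentials file, so treat an empty result as sufficient rather than necessary.

```python
import os

# Conventional AWS SDK variable names (assumed to be what the launcher reads).
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_env_vars(environ=None):
    """Return the required AWS credential variables that are unset or empty.

    Hypothetical helper; `environ` defaults to os.environ and can be any
    mapping, which keeps the check easy to test.
    """
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_aws_env_vars()
    if missing:
        print("Set these before launching outside Lambda/EC2: " + ", ".join(missing))
    else:
        print("AWS credentials found in environment variables.")
```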