Harshad Ranganathan edited this page Jan 21, 2020 · 2 revisions

The aws-emr-launcher package depends on pyyaml to read configuration files.

Install the pyyaml package using pip in your virtual environment.

pip install pyyaml==5.2

Install the aws-emr-launcher package using pip in your virtual environment.

pip install aws-emr-launcher
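
To confirm that both installs landed in the active virtual environment, a small check along these lines may help. This is only a sketch: the helper name installed_version is hypothetical and not part of aws-emr-launcher, and it relies on importlib.metadata from the Python standard library (Python 3.8+).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names as used in the pip commands above.
    for name in ("pyyaml", "aws-emr-launcher"):
        print(name, installed_version(name) or "not installed")
```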

Add your configuration files to your project. The examples below assume the following project structure:


   │-- launcher
           │-- config
                   │-- configurations.json
                   │-- default.yaml
                   │-- prod.yaml
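
Before launching anything, it can be useful to verify that the three files are where the launcher script expects them. The following is a minimal sketch, not part of the package: the helper missing_config_files is a hypothetical name, and it assumes the script sits in the launcher directory next to config.

```python
from pathlib import Path

# File names taken from the project structure above.
CONFIG_FILES = ("configurations.json", "default.yaml", "prod.yaml")


def missing_config_files(config_dir, names=CONFIG_FILES):
    """Return the expected configuration files that are absent from config_dir."""
    base = Path(config_dir)
    return [name for name in names if not (base / name).is_file()]


if __name__ == "__main__":
    # Assumes this script lives in the launcher directory, next to config/.
    config_dir = Path(__file__).resolve().parent / "config"
    missing = missing_config_files(config_dir)
    if missing:
        raise SystemExit(f"Missing configuration files in {config_dir}: {missing}")
    print("All configuration files found.")
```

Failing early here gives a clearer error than a file-not-found raised halfway through a cluster launch.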



In your script, import the trigger_data_load function and pass it the configuration file paths and placeholder values to launch the EMR cluster.

import os
from emrlauncher.data_loader import trigger_data_load

__location__ = os.path.realpath(os.path.join(os.getcwd(), os.path.dirname(__file__)))

if __name__ == "__main__":
    # regions in which the cluster needs to be launched
    regions = ["us-east-1"]

    # load configuration files
    cluster_config_path = os.path.abspath(
        os.path.join(__location__, "config/configurations.json")
    )
    default_config_path = os.path.abspath(
        os.path.join(__location__, "config/default.yaml")
    )
    env_config_path = os.path.abspath(os.path.join(__location__, "config/prod.yaml"))

    # placeholder values
    input_vars = {"ENVIRONMENT": "prod", "VERSION": "1.0"}

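    # launch EMR cluster
    # argument order is assumed here; check trigger_data_load's signature in the package
    trigger_data_load(
        regions, cluster_config_path, default_config_path, env_config_path, input_vars
    )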

AWS Credentials

If you run this package inside a Lambda function or on an EC2 instance, it uses the attached IAM role to provision EMR clusters.

Otherwise, you must provide AWS credentials through environment variables.
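
As a rough pre-flight check for the environment-variable case, the sketch below reports which credential variables are unset. It assumes the package resolves credentials through the standard AWS SDK environment variables (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, plus AWS_SESSION_TOKEN when using temporary credentials); the helper name missing_aws_env_vars is hypothetical.

```python
import os

# Standard environment variables read by the AWS SDKs and CLI.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_env_vars(env=None):
    """Return the required AWS credential variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_env_vars()
    if missing:
        raise SystemExit("Set these environment variables first: " + ", ".join(missing))
    print("AWS credential variables are set.")
```

This check is unnecessary when the code runs under an attached role, since no credential variables need to be set in that case.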
