
Integration Amazon EMR #81

@igorgatis

It sounds like all that's needed is a new backend that talks to the S3 file system and to EMR jobflow control (via the boto API).

Essential features:

  • Read input from and write output to S3.
  • Create new jobflow or reuse existing one.
  • Options to specify the number of instances and their types (e.g. m1.medium).
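
To make the essential features concrete, here is a minimal sketch of the two pieces of plumbing such a backend would need: splitting an S3 URI into bucket and key, and turning the instance options into a jobflow configuration. The function names `split_s3_uri` and `build_instances_config` are hypothetical. The dictionary keys follow the `Instances` argument of boto3's `run_job_flow`, which is an assumption on my part, since the issue only says "boto" and the older boto 2 `boto.emr` interface takes different arguments.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into (bucket, key).

    Hypothetical helper: the backend needs this to read input from
    and write output to S3.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("not an S3 URI: %r" % (uri,))
    return parsed.netloc, parsed.path.lstrip("/")


def build_instances_config(instance_type="m1.medium", instance_count=3,
                           keep_alive=False):
    """Build the instance options for a new jobflow.

    Assumption: the key names match the `Instances` argument of
    boto3's `run_job_flow`. Setting keep_alive=True leaves the jobflow
    running after its steps finish, so a later run can reuse it
    instead of creating a new one.
    """
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return {
        "MasterInstanceType": instance_type,
        "SlaveInstanceType": instance_type,
        "InstanceCount": instance_count,
        "KeepJobFlowAliveWhenNoSteps": keep_alive,
    }


if __name__ == "__main__":
    print(split_s3_uri("s3://my-bucket/input/part-00000"))
    print(build_instances_config("m1.medium", 5, keep_alive=True))
```

With boto3, the resulting dictionary would be passed as `Instances=` to `run_job_flow` when creating a jobflow, while reusing an existing one would go through `add_job_flow_steps` with the existing jobflow ID. Neither call is made here, so the sketch runs without AWS credentials.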

Nice to have:

  • Automatic upload of local input files to S3.
  • Change the number of worker instances.
  • Support for spot instances.
  • Resource estimator for future runs (e.g. try with a sample, figure how long it will take for the full thing).
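
The resource estimator could start out very simply. The sketch below assumes runtime grows linearly with input size on top of a fixed startup cost, which is only a rough model: real jobs can scale differently depending on the shuffle, data skew, and cluster size. The function name `estimate_full_runtime` and its parameters are hypothetical.

```python
def estimate_full_runtime(sample_seconds, sample_bytes, full_bytes,
                          fixed_overhead_seconds=0.0):
    """Extrapolate the runtime of a full job from a sample run.

    Assumption: runtime = fixed overhead + (per-byte cost * input size).
    The fixed overhead covers things like cluster startup, which do not
    grow with the input.
    """
    if sample_bytes <= 0:
        raise ValueError("sample_bytes must be positive")
    # Time the sample spent on actual data, never below zero.
    variable_seconds = max(sample_seconds - fixed_overhead_seconds, 0.0)
    seconds_per_byte = variable_seconds / sample_bytes
    return fixed_overhead_seconds + seconds_per_byte * full_bytes


if __name__ == "__main__":
    # A sample of 1024 bytes took 148 s, of which 20 s was startup.
    estimate = estimate_full_runtime(148.0, 1024, 1024 * 100, 20.0)
    print("estimated seconds for the full input: %.1f" % estimate)
```

Running two samples of different sizes and fitting both the overhead and the per-byte cost would give a better estimate than a single sample with a guessed overhead.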
