Notes

In bootstrapaction.sh, replace the S3 location in the following command with your own:

aws s3 cp s3://<s3-bucket>/zeppelin-setup/resources/ zeppelinsetup --recursive

Push the folder setupZeppelin to your desired S3 location.
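
The upload can be done from the command line. The following is a minimal sketch, assuming the destination prefix matches the one read by the cp command above; the function name upload_resources and the AWS_CMD override (used only to preview the command) are hypothetical and not part of the repository.

```shell
# Sketch: push the local setup folder to the S3 prefix that the bootstrap script reads.
# AWS_CMD is a hypothetical override; set AWS_CMD=echo to preview the command.
upload_resources() {
  # $1: local folder (for example setupZeppelin), $2: bucket name
  "${AWS_CMD:-aws}" s3 cp "$1" "s3://$2/zeppelin-setup/resources/" --recursive
}

# Usage: upload_resources setupZeppelin <s3-bucket>
```

Running it with AWS_CMD=echo prints the arguments instead of calling the AWS CLI, which is a cheap way to check the destination path before uploading.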

Tasks

Set Zeppelin to use S3 backed notebooks with Spark on Amazon EMR
Set Anaconda as the default Python interpreter in Zeppelin

Getting started

Make sure you have the following resources before you begin:

AWS Command Line Interface (AWS CLI) installed
An SSH client
A key pair in the region where you'll launch the Zeppelin instance
An S3 bucket in the same region to store your Zeppelin notebooks and to transfer files from EMR to your Zeppelin instance
IAM permissions to create S3 buckets, launch EC2 instances, and create EMR clusters
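
The local tools in the list above can be checked with a short script. This is a sketch only: check_tools is a hypothetical helper, and it verifies only that the commands are on the PATH, not that the key pair, bucket, or IAM permissions exist.

```shell
# Sketch: report which required command-line tools are missing from the PATH.
# Prints one "missing: <name>" line per absent tool and returns non-zero if any is absent.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_tools aws ssh
```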

Create an EMR cluster

The first step is to set up an EMR cluster.

On the Amazon EMR console, choose Create cluster.
Choose Go to advanced options and enter the following options:
1. Vendor: Amazon
2. Software: make sure Hadoop, Zeppelin, Ganglia, and Spark are selected.
3. In the Add steps section, for Step type, choose Custom JAR, and then choose Configure.
  1. Change the name to "custom bootstrap action".
  2. For JAR location, enter s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar, replacing us-east-1 with the region in which you are creating your EMR cluster. The script runner lets you run a script as a step, at any point in the step sequence.
  3. For Arguments, enter s3://<s3-bucket>/emr/master-post-init.sh.
Choose Add, and then choose Next.
On the Hardware Configuration page, select your VPC and the subnet where you want to launch the cluster, keep the default selection of one master and two core nodes of m4.xlarge, and choose Next.
On the General Options page, give your cluster a name (e.g., Spark-Cluster) and choose Next.
In the Additional Options section, add a bootstrap action: select "Custom action" and set the script location to s3://<s3-bucket>/emr/bootstrapaction.sh.
On the Security Options page, for EC2 key pair, select a key pair. Keep all other settings at the default values and choose Create cluster.
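
The console steps above correspond roughly to a single aws emr create-cluster call. The sketch below is an approximation under stated assumptions, not the repository's own method: the function names, the AWS_CMD override, and the hyphenated step and bootstrap names are hypothetical; the release label is left as a parameter because the original does not name one; and --use-default-roles assumes the default EMR roles already exist in your account.

```shell
# Sketch: CLI approximation of the console walkthrough (one master and two core m4.xlarge nodes).
# AWS_CMD is a hypothetical override; set AWS_CMD=echo to preview the command.

# Print the script-runner JAR location for a region, following the
# s3://<region>.elasticmapreduce/... pattern used in the step configuration.
script_runner_jar() {
  echo "s3://$1.elasticmapreduce/libs/script-runner/script-runner.jar"
}

create_cluster() {
  # $1: region, $2: bucket, $3: EC2 key pair name, $4: subnet ID, $5: EMR release label
  "${AWS_CMD:-aws}" emr create-cluster \
    --region "$1" \
    --name "Spark-Cluster" \
    --release-label "$5" \
    --applications Name=Hadoop Name=Zeppelin Name=Ganglia Name=Spark \
    --use-default-roles \
    --ec2-attributes "KeyName=$3,SubnetId=$4" \
    --instance-type m4.xlarge \
    --instance-count 3 \
    --bootstrap-actions "Path=s3://$2/emr/bootstrapaction.sh,Name=custom-action" \
    --steps "Type=CUSTOM_JAR,Name=custom-bootstrap-action,Jar=$(script_runner_jar "$1"),Args=[s3://$2/emr/master-post-init.sh]"
}

# Usage: create_cluster <region> <s3-bucket> <key-pair-name> <subnet-id> <release-label>
```

With --instance-count 3 and a single --instance-type, the CLI creates one master node and uses the remaining two instances as core nodes, which matches the default hardware selection described above.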

Your three-node cluster takes a few moments to start up. Your cluster is ready when the cluster status is Waiting.
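
If you prefer to check readiness from a terminal instead of the console, the AWS CLI provides a waiter for this. A minimal sketch, assuming you have the cluster ID at hand; wait_for_cluster and the AWS_CMD override are hypothetical names.

```shell
# Sketch: block until the cluster is up, then print its current state.
# AWS_CMD is a hypothetical override; set AWS_CMD=echo to preview the commands.
wait_for_cluster() {
  # $1: cluster ID, as shown in the console or returned by create-cluster
  "${AWS_CMD:-aws}" emr wait cluster-running --cluster-id "$1" &&
    "${AWS_CMD:-aws}" emr describe-cluster --cluster-id "$1" \
      --query 'Cluster.Status.State' --output text
}

# Usage: wait_for_cluster <cluster-id>
```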

Discussion

Services on EMR use upstart

Note: services on EMR use upstart, and the supported way to restart them is sudo stop <service name>; sudo start <service name> (the start and stop commands are in /sbin, which is in the PATH by default).
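
The stop and start pair can be wrapped in a small helper. This is a sketch: restart_service and the SUDO override are hypothetical, and the job name zeppelin in the usage line is an assumption about how the service is registered on your cluster.

```shell
# Sketch: restart an upstart-managed service with the stop/start pair described above.
# SUDO is a hypothetical override; set SUDO=echo to preview the commands.
restart_service() {
  # $1: upstart job name
  "${SUDO:-sudo}" stop "$1"
  "${SUDO:-sudo}" start "$1"
}

# Usage (on the master node): restart_service zeppelin
```

The two commands run in sequence regardless of whether stop succeeds, so a service that was already stopped is still started.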

anujp31/aws-emr-script
