Install Java 11 (for WSL) from a locally downloaded archive:

The steps below solved the following problem for me. Copying the JDK archive into the installer's cache directory, e.g.:

sudo mkdir -p /var/cache/oracle-jdk11-installer-local
sudo cp jdk-11.0.4_linux-x64_bin.tar.gz /var/cache/oracle-jdk11-installer-local/

and then installing oracle-java11-installer-local failed with:

sha256sum mismatch jdk-11.0.13_linux-x64_bin.tar.gz
Oracle JDK 11 is NOT installed.
dpkg: error processing package oracle-java11-installer-local (--configure):
 installed oracle-java11-installer-local package post-installation script subprocess returned error exit status 1
Errors were encountered while processing:
 oracle-java11-installer-local
E: Sub-process /usr/bin/dpkg returned an error code (1)

Related thread: https://www.linuxquestions.org/questions/linux-software-2/sha256sum-mismatch-jdk-11-0-5_linux-x64_bin-tar-gz-4175664982/
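Since the error is a checksum mismatch, it can help to check which archive you actually have before copying it into the cache. A minimal sketch, assuming you already know the expected SHA-256 value (e.g. from wherever you downloaded the archive); verify_sha256 is a hypothetical helper name, not part of the installer:

```shell
# Compare a file's SHA-256 digest (GNU coreutils sha256sum) with an expected
# hex string. Prints OK on a match; prints both digests and returns 1 otherwise.
verify_sha256() {
  local file="$1" expected actual
  # Accept upper- or lower-case hex for the expected value.
  expected=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  # sha256sum prints "<digest>  <file>"; keep only the digest.
  actual=$(sha256sum "$file" | cut -d' ' -f1)
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
    return 0
  fi
  echo "MISMATCH: $file" >&2
  echo "  expected: $expected" >&2
  echo "  actual:   $actual" >&2
  return 1
}
```

For example: verify_sha256 jdk-11.0.15_linux-x64_bin.tar.gz <expected-sha256>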

Here are the steps:

Install the JDK:
sudo apt update
sudo apt install default-jdk

Install the JRE (default-jdk normally pulls this in already):
sudo apt install default-jre

Add a PPA. [Note: I also tried ppa:webupd8team/java, but it failed, so I had to remove it again with sudo add-apt-repository -r ppa:webupd8team/java]

sudo apt install software-properties-common
sudo add-apt-repository ppa:linuxuprising/java

Check whether this installs; if it does not, continue with the next steps:
sudo apt install oracle-java11-installer

Create the directory where the installer looks for the tar.gz file:
sudo mkdir -p /var/cache/oracle-jdk11-installer-local/

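Download the Linux archive for your CPU architecture, e.g. jdk-11.0.15_linux-x64_bin.tar.gz for a typical x86-64 PC (jdk-11.0.15_linux-aarch64_bin.tar.gz is the ARM build), here --> https://www.oracle.com/au/java/technologies/javase/jdk11-archive-downloads.html
The file name you copy in the next step must match the one you downloaded.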

cd into your local Downloads directory (on WSL, the Windows one is /mnt/c/Users/<your-user>/Downloads) and copy the tar.gz file:
sudo cp jdk-11.0.15_linux-x64_bin.tar.gz /var/cache/oracle-jdk11-installer-local/

Remove the failing post-installation script so dpkg can finish configuring the package:
sudo rm -f /var/lib/dpkg/info/oracle-java11-installer-local.postinst

sudo apt-get install oracle-java11-installer-local

java --version
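Both default-jdk and the Oracle JDK are installed at this point, so it is worth confirming which one the java command actually runs. A small sketch that pulls the major version out of the first line of java --version; java_major is a hypothetical helper, and it assumes the Java 9+ output format ("java 11.0.x ..." or "openjdk 11.0.x ..."):

```shell
# Print the major Java version from a `java --version` first line.
# With no argument it runs `java --version` itself.
java_major() {
  local line="${1:-$(java --version 2>&1 | head -n 1)}"
  # The second whitespace-separated field is the version, e.g. 11.0.15;
  # keep only the part before the first dot.
  echo "$line" | awk '{ split($2, v, "."); print v[1] }'
}
```

For example: [ "$(java_major)" = 11 ] && echo "Java 11 is active". If several JDKs are installed, sudo update-alternatives --config java lets you choose which one java points to.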

Sources:
https://phoenixnap.com/kb/how-to-install-java-ubuntu (main)
https://www.linuxuprising.com/2019/06/new-oracle-java-11-installer-for-ubuntu.html

Get HADOOP working:

Follow the guide: https://webcms3.cse.unsw.edu.au/static/uploads/course/COMP9313/22T2/52663a121dc657d694c185569ddb6982f6d6a7394f3e5abcf151581f6617830b/Lab_1.pdf

Make sure you add the code snippets from the guide between the <configuration> and </configuration> tags of each XML config file.
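As an illustration of what "between the tags" means, a pseudo-distributed core-site.xml typically sets only fs.defaultFS inside the configuration element. A sketch that writes such a file, assuming hdfs://localhost:9000 (the address used by the hdfs commands later in this README); write_core_site is a hypothetical helper, and it covers core-site.xml only, so follow the Lab 1 guide for the other files and properties:

```shell
# Write a minimal core-site.xml that sets fs.defaultFS inside <configuration>.
# Target defaults to $HADOOP_HOME/etc/hadoop/core-site.xml; URI is an assumption.
write_core_site() {
  local target="${1:-$HADOOP_HOME/etc/hadoop/core-site.xml}"
  local fs_uri="${2:-hdfs://localhost:9000}"
  # Keep a backup of any existing file before overwriting it.
  [ -f "$target" ] && cp "$target" "$target.bak"
  cat > "$target" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>${fs_uri}</value>
  </property>
</configuration>
EOF
}
```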

!!! IMPORTANT !!!

Make sure the SSH server is installed AND running.
Check that it is installed: sudo apt list --installed | grep openssh-server
Check whether it is running: sudo service ssh status
If you get an error or a response like "* sshd is not running", start the service: sudo service ssh start

If you get an error like the one here (Permission denied (publickey,password) after start-dfs.sh): https://stackoverflow.com/questions/68077905/getting-error-permission-denied-publickey-password-after-start-dfs-sh, run the following commands:

sudo apt install ssh
# Only run below command if you are fine with overriding/don't have a RSA key
# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
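Running the cat ... >> line on every setup attempt appends the same key again each time. A sketch of an idempotent version, assuming the default key paths used above; authorize_key is a hypothetical helper name:

```shell
# Append a public key to authorized_keys only if that exact line is not
# already present, then apply the permissions sshd expects.
authorize_key() {
  local pub="${1:-$HOME/.ssh/id_rsa.pub}"
  local auth="${2:-$HOME/.ssh/authorized_keys}"
  mkdir -p "$(dirname "$auth")"
  chmod 0700 "$(dirname "$auth")"
  touch "$auth"
  # -q quiet, -x whole-line match, -F fixed strings, -f patterns from the .pub file
  if ! grep -qxF -f "$pub" "$auth"; then
    cat "$pub" >> "$auth"
  fi
  chmod 0600 "$auth"
}
```

Afterwards, ssh localhost should log in without asking for a password, which is what start-dfs.sh relies on.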

SET THE ENV VARS (add to your ~/.bashrc file):

export HDFS_NAMENODE_USER="root"
export HDFS_DATANODE_USER="root"
export HDFS_SECONDARYNAMENODE_USER="root"
export YARN_RESOURCEMANAGER_USER="root"
export YARN_NODEMANAGER_USER="root"
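If you re-run your setup, appending these lines again duplicates them in ~/.bashrc. A sketch of a helper that only adds each export once, assuming the same five variable names and the value root used above (add_export and add_hadoop_user_vars are hypothetical names):

```shell
# Append `export NAME="VALUE"` to an rc file unless NAME is already exported there.
add_export() {
  local rc="$1" name="$2" value="$3"
  touch "$rc"
  if ! grep -q "^export ${name}=" "$rc"; then
    printf 'export %s="%s"\n' "$name" "$value" >> "$rc"
  fi
}

# Add the five Hadoop daemon user variables listed above.
add_hadoop_user_vars() {
  local rc="${1:-$HOME/.bashrc}" user="${2:-root}" var
  for var in HDFS_NAMENODE_USER HDFS_DATANODE_USER HDFS_SECONDARYNAMENODE_USER \
             YARN_RESOURCEMANAGER_USER YARN_NODEMANAGER_USER; do
    add_export "$rc" "$var" "$user"
  done
}
```

For example: add_hadoop_user_vars ~/.bashrc root, then source ~/.bashrc.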

start-dfs.sh starts HDFS and stop-dfs.sh stops it.

Run jps to make sure Hadoop has started correctly
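In a single-node setup, jps should list NameNode, DataNode and SecondaryNameNode after start-dfs.sh. A small sketch that checks jps output for those three; check_hdfs_daemons is a hypothetical helper, and it does not check YARN daemons such as ResourceManager and NodeManager:

```shell
# Read `jps` output on stdin and report which HDFS daemons are missing.
# Returns 0 only if NameNode, DataNode and SecondaryNameNode are all listed.
check_hdfs_daemons() {
  local out missing=0 daemon
  out=$(cat)
  for daemon in NameNode DataNode SecondaryNameNode; do
    # jps prints "<pid> <MainClassName>", so compare against the 2nd field.
    if echo "$out" | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "running: $daemon"
    else
      echo "MISSING: $daemon"
      missing=1
    fi
  done
  return "$missing"
}
```

For example: jps | check_hdfs_daemons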

Check the HADOOP (HDFS) health in the NameNode web UI:

http://localhost:9870/dfshealth.html#tab-overview

MapReduce:

To list the available example programs, run the examples jar with no arguments: hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.2.jar

Examples:

- Estimate Pi with 16 maps and 10,000 samples per map:
  hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.2.jar pi 16 10000
- Search for all strings starting with 'dfs' in the XML files under /user/comp9313/input:
  hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.2.jar grep /user/comp9313/input /user/comp9313/output 'dfs[a-z.]+'
  - Copy the output files from DFS to the local filesystem: hdfs dfs -get /user/comp9313/output output
  - Check the files: cat output/* ...
- Count the words in the input:
  hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.2.jar wordcount <input_path> <output_dir>
  - The output directory must not already exist (Hadoop refuses to overwrite it), so delete it first: hdfs dfs -rm output/* and then hdfs dfs -rmdir output, OR hdfs dfs -rm -r output
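The commands above hard-code the 3.3.2 jar name, which breaks if you install a different Hadoop release. A sketch that finds the examples jar under $HADOOP_HOME instead; examples_jar is a hypothetical helper and assumes the standard share/hadoop/mapreduce layout:

```shell
# Print the path of the MapReduce examples jar under a Hadoop install
# (default $HADOOP_HOME), or fail if there is none.
examples_jar() {
  local home="${1:-$HADOOP_HOME}" jar
  for jar in "$home"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar; do
    # If the glob matched nothing, $jar is the literal pattern and -f fails.
    if [ -f "$jar" ]; then
      echo "$jar"
      return 0
    fi
  done
  echo "no hadoop-mapreduce-examples jar under $home" >&2
  return 1
}

# Usage: clear the old output (-f: no error if it is missing), then rerun.
#   hdfs dfs -rm -r -f output
#   hadoop jar "$(examples_jar)" wordcount input output
```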

YARN:

Using MRJob:

Some helpful commands. First create a file, e.g. mydata.txt, containing a bunch of lines of words.

# Create a directory to hold the input data (-p also creates missing parent directories)
hdfs dfs -mkdir -p hdfs://localhost:9000/user/comp9313/input
# Copy the local data file into HDFS
hdfs dfs -put mydata.txt hdfs://localhost:9000/user/comp9313/input/
# Run the wordcount job on that input with the Hadoop runner (-r hadoop)
python3 wordcount.py -r hadoop hdfs://localhost:9000/user/comp9313/input/mydata.txt
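To sanity-check the job's result on a small file, you can count the same words locally. A sketch, assuming whitespace-separated words; wordcount.py is not shown here and may tokenize differently (e.g. lower-casing or stripping punctuation), so small differences are possible. local_wordcount is a hypothetical helper:

```shell
# Count whitespace-separated words in the given files (or stdin) and print
# "word<TAB>count" lines sorted by word.
local_wordcount() {
  awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
       END { for (w in count) printf "%s\t%d\n", w, count[w] }' "$@" | sort
}
```

For example: local_wordcount mydata.txt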

Submitting stuff:

Copy the zip to the Windows desktop (WSL mounts the Windows C: drive at /mnt/c):
cp z5207998_proj1.zip /mnt/c/Users/admin/Desktop/comp9313/

Set up SPARK / SCALA

Link: https://downloads.apache.org/spark/spark-3.3.0/
We will need to download the tarball:
wget https://downloads.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz
(If that link no longer works, older releases are kept under https://archive.apache.org/dist/spark/.)
Follow the guide here: https://kontext.tech/article/560/apache-spark-301-installation-on-linux-guide
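Because downloads.apache.org only keeps current releases, a pinned version like 3.3.0 is more reliably fetched from the archive mirror. A sketch that builds that URL, assuming the archive's spark-<version>/spark-<version>-bin-<profile>.tgz naming (which matches the file above); spark_tarball_url is a hypothetical helper:

```shell
# Print the archive.apache.org URL of a Spark binary tarball.
# Defaults: version 3.3.0, Hadoop 3 build (as used in this README).
spark_tarball_url() {
  local version="${1:-3.3.0}" profile="${2:-hadoop3}"
  echo "https://archive.apache.org/dist/spark/spark-${version}/spark-${version}-bin-${profile}.tgz"
}

# Usage:
#   wget "$(spark_tarball_url 3.3.0)"
#   tar -xzf spark-3.3.0-bin-hadoop3.tgz
```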

After the installation, run run-example SparkPi 10 from your normal shell (not inside the Scala shell) to see Spark in action.

Get SBT (to build and package Scala apps)

sudo apt-get update
sudo apt-get install apt-transport-https curl gnupg -yqq
echo "deb https://repo.scala-sbt.org/scalasbt/debian all main" | sudo tee /etc/apt/sources.list.d/sbt.list
echo "deb https://repo.scala-sbt.org/scalasbt/debian /" | sudo tee /etc/apt/sources.list.d/sbt_old.list
curl -sL "https://keyserver.ubuntu.com/pks/lookup?op=get&search=0x2EE0EA64E40A89B84B2DF73499E82A75642AC823" | sudo -H gpg --no-default-keyring --keyring gnupg-ring:/etc/apt/trusted.gpg.d/scalasbt-release.gpg --import
sudo chmod 644 /etc/apt/trusted.gpg.d/scalasbt-release.gpg
sudo apt-get update
sudo apt-get install sbt

Source: https://www.scala-sbt.org/1.x/docs/Installing-sbt-on-Linux.html

Run the spark app:

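NOTE: Remember to first put your input files into the 'project2' directory in HDFS, e.g.:
hdfs dfs -mkdir -p project2
hdfs dfs -put tiny-doc.txt project2
(If project2 does not exist yet, -put creates a file named project2 instead of putting tiny-doc.txt inside a directory.)
Note: I followed this guide --> https://spark.apache.org/docs/latest/quick-start.html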
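The quick-start guide builds its app from a build.sbt plus sources under src/main/scala. A sketch that creates that layout, assuming Scala 2.12.15 and spark-sql 3.3.0 to match the Spark 3.3.0 install above; scaffold_spark_project is a hypothetical helper, so check the versions against your own install:

```shell
# Create an sbt project skeleton similar to the Spark quick-start example.
# ASSUMPTION: Scala 2.12.15 / Spark 3.3.0; adjust to your installed versions.
scaffold_spark_project() {
  local dir="$1" name="${2:-Simple Project}"
  mkdir -p "$dir/src/main/scala"
  cat > "$dir/build.sbt" <<EOF
name := "${name}"

version := "1.0"

scalaVersion := "2.12.15"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.3.0"
EOF
}
```

For example: scaffold_spark_project simple-app, then put SimpleApp.scala in simple-app/src/main/scala and run sbt package inside simple-app.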

First build the app with sbt package, then run:

$SPARK_HOME/bin/spark-submit --class "SimpleApp" --master "local[4]" target/scala-2.12/simple-project_2.12-1.0.jar

For our assignment, problem1:

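spark-submit --class "Problem1" --master "local[4]" target/scala-2.12/problem-1_2.12-1.0.jar

(This assumes $SPARK_HOME/bin is on your PATH; otherwise use $SPARK_HOME/bin/spark-submit as above.)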
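The jar path passed to spark-submit comes from build.sbt: sbt lower-cases the project name, replaces runs of non-word characters with "-", and appends the Scala binary version and project version. A sketch that predicts the path; sbt_jar_path is a hypothetical helper, and the name "Problem 1" in the test is an assumption about the project's build.sbt:

```shell
# Predict the jar written by `sbt package` from the build.sbt name and version.
# e.g. "Simple Project", 1.0, 2.12 -> target/scala-2.12/simple-project_2.12-1.0.jar
sbt_jar_path() {
  local name="$1" version="$2" scala_bin="${3:-2.12}" normalized
  # Lower-case, then collapse runs of non-word characters into "-".
  normalized=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]' | sed -E 's/[^a-z0-9_]+/-/g')
  echo "target/scala-${scala_bin}/${normalized}_${scala_bin}-${version}.jar"
}
```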
