Steps to Verify the One Box Setup of Hadoop

  • Start VirtualBox, choose the machine you prepared in the earlier step, and click the green “Start” button.

screen-shot-2016-12-09-at-11-08-22-am

  • If prompted, enter the password ‘abcd1234’.

screen-shot-2016-12-09-at-11-09-13-am

  • Click the Ubuntu launcher in the top-left corner, search for “Terminal”, and open it.

screen-shot-2016-12-09-at-11-10-51-am

  • Once the terminal is up and running, it should look similar to the following –

screen-shot-2016-12-09-at-11-12-16-am

  • Log in as the hduser user.
    • su hduser
    • password: ‘abcd1234’
  • Go to the home directory and look at the directories present.
    • cd /home/hduser
    • The ‘pwd’ command should show the path as ‘/home/hduser’.
    • Execute ‘ls -lart’ to list the files and directories.
  • Start Hadoop.
    • cd /usr/local/hadoop/sbin/
    • ./start-all.sh
  • Confirm that the services are running successfully.
    • Run ‘jps’ – you should see the Hadoop daemons (NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager) listed, similar to the following –

screen-shot-2016-12-09-at-12-44-15-pm
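
If you prefer a scripted check, the sketch below looks for each expected daemon in the jps output. It assumes a Hadoop 2.x pseudo-distributed setup where start-all.sh launches NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager; the jps output here is simulated for illustration – on the box itself, use jps_output="$(jps)" instead.

```shell
#!/bin/sh
# Simulated jps output for illustration; on the VM replace with: jps_output="$(jps)"
jps_output="1234 NameNode
2345 DataNode
3456 SecondaryNameNode
4567 ResourceManager
5678 NodeManager
6789 Jps"

# Report each expected Hadoop 2.x daemon as running or not.
for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
  if echo "$jps_output" | grep -q "$daemon"; then
    echo "$daemon: running"
  else
    echo "$daemon: NOT running"
  fi
done
```

If any daemon reports NOT running, check the logs under /usr/local/hadoop/logs before continuing.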

  • Go to the example directory.
    • cd /home/hduser/example/WordCount1/
  • Run the command ‘ls’ – if there is a directory named ‘build’, delete it and recreate it. This step ensures that your program does not use precompiled jars and other stale files.
    • rm -rf build
    • mkdir build
  • Set JAVA_HOME and update PATH
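
For example, the exports would look like the following. The JDK path below is a placeholder – point JAVA_HOME at wherever your JDK actually lives (you can locate it with `readlink -f "$(which javac)"`, which prints a path ending in /bin/javac).

```shell
# Hypothetical JDK location -- substitute the path of your own installation
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export PATH=$JAVA_HOME/bin:$PATH
```

Adding these lines to ~/.bashrc makes them persist across terminal sessions.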
  • Build the example ( when you copy–paste, make sure no stray space is introduced inside the command ) –
    • javac -classpath /usr/local/hadoop/share/hadoop/common/hadoop-common-2.6.0.jar:/usr/local/hadoop/share/hadoop/common/lib/hadoop-annotations-2.6.0.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.6.0.jar -d build WordCount.java

screen-shot-2016-12-09-at-11-31-18-am

  • Create Jar –
    • jar -cvf wcount.jar -C build/ .
  • Now prepare the input for the program ( give the ‘output’ directory a name of your own – it must not already exist ).
    • Make your own input directory –
    • hadoop dfs -mkdir /user/hduser/input
    • Copy the input files ( file1, file2, file3 ) to the HDFS location –
    • hadoop dfs -put file* /user/hduser/input
    • Check whether the output directory already exists.
      • hadoop dfs -ls /user/hduser/output
      • In the screenshot below it already exists.
    • If it already exists, delete it with the following commands –
      • hadoop dfs -rm /user/hduser/output/*
      • hadoop dfs -rmdir /user/hduser/output
  • Run the program
    • hadoop jar wcount.jar org.myorg.WordCount /user/hduser/input/ /user/hduser/output
    • At the end you should see something similar to the following –

screen-shot-2016-12-09-at-11-44-33-am

  • Check if the output files have been generated

screen-shot-2016-12-09-at-11-37-51-am

  • hadoop dfs -ls /user/hduser/output – you should see something similar to the screenshot below

screen-shot-2016-12-09-at-11-46-35-am

  • Get the contents of the output files –
    • hadoop dfs -cat /user/hduser/output/part-r-00000

screen-shot-2016-12-09-at-11-48-22-am

  • Verify the word count against the input files –
    • cat file1 file2 file3
    • The word counts should match.
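
The manual cross-check can also be scripted. The sketch below counts words with standard text tools (splitting on whitespace, as WordCount does) and prints word-and-count lines in the same shape as part-r-00000. The sample input here is hypothetical – on the VM, feed it the real files with `cat file1 file2 file3` instead.

```shell
#!/bin/sh
# Hypothetical sample input; on the VM use: cat file1 file2 file3 > /tmp/sample.txt
printf 'hello world\nhello hadoop\n' > /tmp/sample.txt

# Split on whitespace, sort, count duplicates, and emit "word<TAB>count"
# lines comparable to the reducer output in part-r-00000.
tr -s '[:space:]' '\n' < /tmp/sample.txt | sort | uniq -c \
  | awk '{printf "%s\t%s\n", $2, $1}'
# For the sample input this prints:
#   hadoop  1
#   hello   2
#   world   1
```

Running the same pipeline over file1–file3 and diffing against `hadoop dfs -cat /user/hduser/output/part-r-00000` confirms the job's counts.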