Hadoop Single Node Setup using Docker on Ubuntu

This guide helps you install and run a single-node Apache Hadoop cluster using Docker on Ubuntu. It lets beginners explore HDFS, run MapReduce jobs, and pick up core Big Data concepts in a simple, containerized environment that is well suited to learning and testing.


Prerequisites

  • Ubuntu 20.04 / 22.04
  • Docker installed
  • Internet connection
  • Basic terminal usage

Step 1: Install Docker

sudo apt update
sudo apt install ca-certificates curl gnupg

# Add Docker's GPG key
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

# Add Docker's repository
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker
sudo apt update
sudo apt install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
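
To confirm the installation, check the version and run Docker's test image (both are standard Docker commands):

# Verify the Docker daemon is installed and working
sudo docker --version
sudo docker run hello-world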

Step 2: (Optional) Run Docker Without sudo

sudo usermod -aG docker $USER
newgrp docker

Alternatively, log out and back in (or reboot) for the group change to take effect.
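
To confirm it worked, run a Docker command without sudo:

# Should list containers without a permission error
docker ps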


Step 3: Create a Docker Network

docker network create hadoop
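
Verify the network exists (it should be listed as a user-defined bridge):

docker network ls
docker network inspect hadoop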

Step 4: Pull the Hadoop Docker Image

We are using the Hadoop NameNode image from the Big Data Europe (BDE) project:

docker pull bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8
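
After the pull completes, the image should show up locally:

docker image ls bde2020/hadoop-namenode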

Step 5: Run the Hadoop NameNode Container

docker run -itd \
--net hadoop \
--name hadoop-master \
-p 9870:9870 -p 9000:9000 \
-e CLUSTER_NAME=HadoopCluster \
bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8
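
Check that the container is up and that the NameNode finished starting:

docker ps
docker logs hadoop-master

Note: this image runs only the NameNode, so the cluster has no DataNodes yet, and uploads to HDFS (Step 7) will fail with a replication error until one joins. A minimal sketch of adding one, assuming the companion bde2020/hadoop-datanode image and its CORE_CONF_* environment-variable convention (the container name hadoop-worker is arbitrary):

# Start a DataNode on the same network, pointed at the NameNode
docker run -itd \
--net hadoop \
--name hadoop-worker \
-e CORE_CONF_fs_defaultFS=hdfs://hadoop-master:9000 \
bde2020/hadoop-datanode:2.0.0-hadoop3.2.1-java8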

Step 6: Access Hadoop Web Interface

Open the NameNode web UI in your browser:

http://localhost:9870
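
If you prefer the terminal, you can confirm the UI is serving:

# Expect an HTTP 200 response from the NameNode web UI
curl -sI http://localhost:9870 | head -n 1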


Step 7: Interact with HDFS (Inside Container)

docker exec -it hadoop-master bash

Example HDFS Commands:

# List the contents of the HDFS root directory
hdfs dfs -ls /

# Create a directory (-p suppresses the error if it already exists)
hdfs dfs -mkdir -p /test

# Upload file
hdfs dfs -put /etc/hosts /test

# List files
hdfs dfs -ls /test

# Download the file back to the container's local filesystem
hdfs dfs -get /test/hosts /tmp/
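
Two useful sanity checks at this point:

# Show live DataNodes and overall HDFS capacity
hdfs dfsadmin -report

# Print the uploaded file's contents
hdfs dfs -cat /test/hosts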

Step 8: Run a WordCount MapReduce Job

# Run these inside the hadoop-master container
cd $HADOOP_HOME
hdfs dfs -mkdir /input
hdfs dfs -put etc/hadoop/*.xml /input

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount /input /output

hdfs dfs -cat /output/part-r-00000
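
MapReduce will not overwrite an existing output directory, so if you rerun the job, remove /output first:

# Clear the previous job's output before rerunning
hdfs dfs -rm -r /output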

Step 9: Stop and Remove Container/Image

docker stop hadoop-master
docker rm hadoop-master
docker rmi bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8
docker network rm hadoop
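
A quick check that everything was removed:

docker ps -a
docker network ls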

Optional: Clean Up HDFS (run inside the container before Step 9)

hdfs dfs -rm -r /test
hdfs dfs -rm -r /input
hdfs dfs -rm -r /output

Further Learning

  • Explore Hive (SQL on Hadoop)

  • Add Spark to the cluster

  • Build real-time pipelines with Kafka + Hadoop

  • Use Hadoop with Jupyter + PySpark


Author

Made by a beginner learning Big Data with Docker and Hadoop.
Tested on Ubuntu 22.04 with Docker 24+.
