This guide walks you through installing and running a single-node Apache Hadoop setup with Docker on Ubuntu.
Prerequisites:
- Ubuntu 20.04 / 22.04
- Docker installed (installation steps are included below)
- Internet connection
- Basic terminal usage
Install Docker (skip this step if Docker is already installed):
sudo apt update
sudo apt install ca-certificates curl gnupg
# Add Docker's GPG key
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
# Add Docker's repository
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install Docker
sudo apt update
sudo apt install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
Allow your user to run Docker without sudo:
sudo usermod -aG docker "$USER"
newgrp docker
Alternatively, log out and back in (or restart the system) for the group change to take effect.
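To confirm that the group change is active in the current shell, a small check like the following can help. It is a minimal sketch: `has_group` is a hypothetical helper name, and it only inspects the group list printed by `id -nG`.

```shell
# has_group GROUPS NAME: succeed if NAME appears as a whole word in the
# space-separated list GROUPS (the format printed by `id -nG`).
has_group() {
    for g in $1; do
        [ "$g" = "$2" ] && return 0
    done
    return 1
}

# Report whether this shell session already has the docker group.
if has_group "$(id -nG)" docker; then
    echo "docker group is active in this shell"
else
    echo "docker group not active yet; run 'newgrp docker' or log in again"
fi
```

Matching whole words avoids a false positive from a group whose name merely contains "docker".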
Create a dedicated Docker network for the Hadoop container:
docker network create hadoop
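Running `docker network create hadoop` a second time fails because the network already exists. The sketch below creates the network only when it is missing; `ensure_network` is a hypothetical helper name, and it relies on `docker network inspect` returning a non-zero status for an unknown network.

```shell
# ensure_network NAME: create the Docker network only if it is missing,
# so re-running the guide does not stop on an "already exists" error.
ensure_network() {
    if docker network inspect "$1" >/dev/null 2>&1; then
        echo "network $1 already exists"
    else
        docker network create "$1" >/dev/null && echo "network $1 created"
    fi
}
```

For this guide the call would be `ensure_network hadoop`.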
This guide uses the Hadoop NameNode image maintained by the Big Data Europe (BDE) project:
docker pull bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8
Start the container:
docker run -itd \
--net hadoop \
--name hadoop-master \
-p 9870:9870 -p 9000:9000 \
-e CLUSTER_NAME=HadoopCluster \
bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8
Open the NameNode web UI in your browser:
http://localhost:9870
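The web UI may take a few seconds to come up after the container starts. A polling loop such as the one below can wait for it from the host. This is a sketch under assumptions: `wait_for_url` is a hypothetical helper, and the retry count and delay defaults are arbitrary.

```shell
# wait_for_url URL [TRIES] [DELAY]: poll URL until it answers or TRIES run out.
wait_for_url() {
    url=$1
    tries=${2:-30}
    delay=${3:-2}
    i=1
    while [ "$i" -le "$tries" ]; do
        # -f: treat HTTP errors (>= 400) as failure; -s: quiet; -o: drop the body
        if curl -fs -o /dev/null "$url"; then
            echo "ready after $i attempt(s)"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "not reachable after $tries attempt(s)" >&2
    return 1
}
```

Example: `wait_for_url http://localhost:9870` waits up to about a minute with the defaults.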
Open a shell inside the container:
docker exec -it hadoop-master bash
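Uploading files to HDFS needs at least one live DataNode to store the blocks. The image name suggests this container runs only a NameNode, so it may be worth checking before the upload step. The sketch below assumes that `hdfs dfsadmin -report` prints a `Live datanodes (N):` line when DataNodes are registered; `live_datanodes` is a hypothetical helper name.

```shell
# live_datanodes: read `hdfs dfsadmin -report` output on stdin and print the
# number from the "Live datanodes (N):" line, or 0 if that line is absent.
live_datanodes() {
    sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*/\1/p' | head -n 1 | grep . || echo 0
}

# Inside the container:
#   hdfs dfsadmin -report 2>/dev/null | live_datanodes
```

If this prints 0, `hdfs dfs -put` is likely to fail until a DataNode joins the `hadoop` network; setting one up is outside what this guide covers.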
Try some basic HDFS commands:
# List the HDFS root to see which directories already exist
hdfs dfs -ls /
# Create directory (-p makes this succeed even if it already exists)
hdfs dfs -mkdir -p /test
# Upload file
hdfs dfs -put /etc/hosts /test
# List files
hdfs dfs -ls /test
# Download file back to container FS
hdfs dfs -get /test/hosts /tmp/
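To check that the upload and download round trip preserved the file, the downloaded copy can be compared byte for byte with the original. A minimal sketch, assuming `cmp` is available in the container; `files_match` is a hypothetical helper name.

```shell
# files_match A B: succeed if both files exist and have identical bytes.
files_match() {
    cmp -s "$1" "$2"
}

# Inside the container, after the -get step:
if files_match /etc/hosts /tmp/hosts; then
    echo "round trip OK"
else
    echo "files differ or a file is missing"
fi
```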
Run the bundled WordCount example (still inside the container):
cd "$HADOOP_HOME"
hdfs dfs -mkdir -p /input
hdfs dfs -put etc/hadoop/*.xml /input
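# The job fails if /output already exists; remove it first when re-running
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount /input /output
# Print the resulting word counts
hdfs dfs -cat /output/part-r-00000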
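The output file holds one line per distinct word, as the word, a tab, and its count, ordered by word. A local pipeline can approximate the same computation on a small input, which helps when sanity-checking the job's result. This is an illustrative sketch, not how Hadoop runs the job; `wordcount_local` is a hypothetical name.

```shell
# wordcount_local: read text on stdin and print "word<TAB>count" for each
# distinct whitespace-separated word, sorted by word in byte order.
wordcount_local() {
    tr -s '[:space:]' '\n' |
        grep -v '^$' |
        LC_ALL=C sort |
        uniq -c |
        awk '{ print $2 "\t" $1 }'
}

# Example: count words in one of the config files used as job input.
#   wordcount_local < etc/hadoop/core-site.xml
```

Like the example job, this splits on whitespace only, so punctuation and XML tags stay attached to the words they touch.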
Clean up on the host when you are done:
docker stop hadoop-master
docker rm hadoop-master
docker rmi bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8
docker network rm hadoop
# Optional: delete the HDFS test data (run inside the container, before stopping it)
hdfs dfs -rm -r /test
hdfs dfs -rm -r /input
hdfs dfs -rm -r /output
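The host-side cleanup commands can be wrapped so that one missing resource does not block the rest. A sketch under assumptions: `cleanup_hadoop` is a hypothetical helper, and it uses `docker rm -f`, which stops and removes the container in one step.

```shell
# cleanup_hadoop: remove the container, image, and network from this guide.
# Each step reports a failure on stderr but does not stop the steps after it.
cleanup_hadoop() {
    image=bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8
    docker rm -f hadoop-master || echo "skip: container hadoop-master" >&2
    docker rmi "$image"        || echo "skip: image $image" >&2
    docker network rm hadoop   || echo "skip: network hadoop" >&2
}
```

Run `cleanup_hadoop` on the host, not inside the container.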
Next steps:
- Explore Hive (SQL on Hadoop)
- Add Spark to the cluster
- Build real-time pipelines with Kafka + Hadoop
- Use Hadoop with Jupyter + PySpark
Made by a beginner learning Big Data with Docker and Hadoop.
Tested on Ubuntu 22.04 with Docker 24+.