For testing purposes only!
All-in-one HDFS container with:
- HDFS namenode
- HDFS secondary namenode
- HDFS datanode
Available image tags:
- mtsrus/hadoop:hadoop2.7.3-hdfs
- mtsrus/hadoop:hadoop2-hdfs - same as above
- mtsrus/hadoop:hadoop3.3.6-hdfs
- mtsrus/hadoop:hadoop3-hdfs - same as above
Minimum resources required to start the container:
- 200m CPU
- 700 MB RAM
- 1 GB storage
See docker-compose.yml.
NOTE: Hadoop 2 image uses the same port numbers as Hadoop 3:
- 9820:9820 - HDFS IPC
- 9870:9870 - WebHDFS
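The port mappings above can be sketched as a `docker run` call. This is a minimal sketch, not the project's own launcher: the container name `hdfs` and the helper function names are arbitrary choices for this example, and the WebHDFS check assumes the container is reachable on `localhost`.

```shell
# start_hdfs: run the Hadoop 3 HDFS image in the background,
# publishing the HDFS IPC and WebHDFS ports listed above.
# The container name "hdfs" is an arbitrary choice for this example.
start_hdfs() {
  docker run -d --rm --name hdfs \
    -p 9820:9820 \
    -p 9870:9870 \
    mtsrus/hadoop:hadoop3-hdfs
}

# list_hdfs_root: list the HDFS root directory via the WebHDFS REST API.
# Assumes the published port 9870 is reachable on localhost.
list_hdfs_root() {
  curl -s "http://localhost:9870/webhdfs/v1/?op=LISTSTATUS"
}
```

Calling `start_hdfs`, waiting for the services to come up, and then calling `list_hdfs_root` should print a JSON directory listing if WebHDFS is reachable.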
You can mount custom config files into the /var/hadoop/conf directory inside the container to override the default Hadoop configuration.
The following substitutions are replaced with proper values:
- {{hostname}} - current hostname
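To illustrate the effect of the {{hostname}} substitution, here is a small sketch. The template line is a hypothetical fragment of a custom core-site.xml, and `sed` is only a stand-in: how the image performs the replacement internally is not described here.

```shell
# Hypothetical fragment of a custom core-site.xml that uses the placeholder.
template='<property><name>fs.defaultFS</name><value>hdfs://{{hostname}}:9820</value></property>'

# render_template HOST: print the template with {{hostname}} replaced by HOST.
# sed is an illustration only, not necessarily what the image uses.
render_template() {
  printf '%s\n' "$template" | sed "s/{{hostname}}/$1/g"
}

# Render with the current host name (uname -n is the POSIX way to get it).
render_template "$(uname -n)"
```

With a host named `node1`, the rendered value would contain `hdfs://node1:9820`.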
Environment variables:
- WAIT_TIMEOUT_SECONDS=120 - timeout in seconds after starting each service to check if it is alive
JVM memory settings:
- export HADOOP_HEAPSIZE=512 - max JVM memory in megabytes, applied to all Hadoop components (unless overridden). If the container fails with OutOfMemory, increase this value, e.g. to 1024 or 2048.
- export HADOOP_NAMENODE_OPTS=-Xmx2048m - max JVM memory for the Namenode
- export HADOOP_SECONDARYNAMENODE_OPTS=-Xmx2048m - max JVM memory for the Secondary Namenode
- export HADOOP_DATANODE_OPTS=-Xmx1024m - max JVM memory for the Datanode
All-in-one Yarn container with:
- HDFS namenode
- HDFS secondary namenode
- HDFS datanode
- Yarn ResourceManager
- Yarn NodeManager
- MapReduce JobHistory server (if WITH_JOBHISTORY_SERVER=true)
Available image tags:
- mtsrus/hadoop:hadoop2.7.3-yarn
- mtsrus/hadoop:hadoop2-yarn - same as above
- mtsrus/hadoop:hadoop3.3.6-yarn
- mtsrus/hadoop:hadoop3-yarn - same as above
Minimum resources required to start the container:
- 400m CPU
- 1 GB RAM
- 1 GB storage
See docker-compose.yml.
NOTE: Hadoop 2 image uses the same port numbers as Hadoop 3:
- 9820:9820 - HDFS IPC
- 9870:9870 - HDFS WebHDFS
- 8042:8042 - NodeManager UI
- 8088:8088 - Yarn UI
If WITH_JOBHISTORY_SERVER=true:
- 10020:10020 - MapReduce JobServer
- 19888:19888 - MapReduce JobServer History
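As a sketch of how these ports fit together, the function below starts the Yarn image with the JobHistory server enabled and publishes every port from the list. It assumes WITH_JOBHISTORY_SERVER is read as a container environment variable; the container name `yarn` and the function name are arbitrary choices for this example.

```shell
# start_yarn: run the Hadoop 3 Yarn image with the MapReduce JobHistory
# server enabled, publishing all ports listed above.
# Assumption: WITH_JOBHISTORY_SERVER is passed as an environment variable.
start_yarn() {
  docker run -d --rm --name yarn \
    -e WITH_JOBHISTORY_SERVER=true \
    -p 9820:9820 -p 9870:9870 \
    -p 8042:8042 -p 8088:8088 \
    -p 10020:10020 -p 19888:19888 \
    mtsrus/hadoop:hadoop3-yarn
}
```

If the JobHistory server is not needed, the `-e WITH_JOBHISTORY_SERVER=true` line and the 10020 and 19888 mappings can be dropped.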
Config files:
- /var/hadoop/conf/core-site.xml
- /var/hadoop/conf/hdfs-site.xml
- /var/hadoop/conf/yarn-site.xml
- /var/hadoop/conf/capacity-scheduler.xml
- /var/hadoop/conf/mapred-site.xml
You can mount custom config files into the /var/hadoop/conf directory inside the container to override the default Hadoop configuration.
The following substitutions are replaced with proper values:
- {{hostname}} - current hostname
Environment variables:
- WAIT_TIMEOUT_SECONDS=120 - timeout in seconds after starting each service to check if it is alive
- WITH_JOBHISTORY_SERVER=false - set to true to start the MapReduce JobHistory server
See HDFS image documentation.
- export YARN_RESOURCEMANAGER_OPTS=-Xmx1024m - max JVM memory for Yarn ResourceManager
- export YARN_NODEMANAGER_OPTS=-Xmx1024m - max JVM memory for NodeManager
- export HADOOP_JOB_HISTORYSERVER_OPTS=-Xmx1024m - max JVM memory for MapReduce JobHistory server
All-in-one Hive container with:
- HDFS namenode
- HDFS secondary namenode
- HDFS datanode
- Yarn ResourceManager
- Yarn NodeManager
- MapReduce JobHistory server
- Hive server
- Hive Metastore server
Available image tags:
- mtsrus/hadoop:hadoop2.7.3-hive2.3.10
- mtsrus/hadoop:hadoop2-hive - same as above
- mtsrus/hadoop:hadoop3.3.6-hive3.1.3
- mtsrus/hadoop:hadoop3-hive - same as above
Minimum resources required to start the container:
- 500m CPU
- 2 GB RAM
- 1 GB storage
- A running RDBMS instance (e.g. Postgres) for the Metastore
See docker-compose.yml.
NOTE: Hadoop 2 image uses the same port numbers as Hadoop 3:
- 9820:9820 - HDFS IPC
- 9870:9870 - HDFS WebHDFS
If WITH_HIVE_SERVER=true:
- 8042:8042 - NodeManager UI
- 8088:8088 - Yarn UI
- 19888:19888 - MapReduce JobServer History
- 10000:10000 - Hive server
- 10002:10002 - Hive Admin UI
If WITH_HIVE_METASTORE_SERVER=true:
- 9083:9083 - Hive Metastore server
You can mount custom config files into the /var/hive/conf directory inside the container to override the default Hive configuration.
HDFS and Yarn configs can still be passed to the /var/hadoop/conf directory.
The following substitutions are replaced with proper values:
- {{hostname}} - current hostname
- {{HIVE_METASTORE_DB_URL}} - HIVE_METASTORE_DB_URL env variable (default jdbc:postgresql://postgres:5432/metastore)
- {{HIVE_METASTORE_DB_DRIVER}} - HIVE_METASTORE_DB_DRIVER env variable (default org.postgresql.Driver)
- {{HIVE_METASTORE_DB_USER}} - HIVE_METASTORE_DB_USER env variable (default hive)
- {{HIVE_METASTORE_DB_PASSWORD}} - HIVE_METASTORE_DB_PASSWORD env variable (default hive)
Hive stores metadata in the database at {{HIVE_METASTORE_DB_URL}}, using the driver from {{HIVE_METASTORE_DB_DRIVER}}. By default, Postgres is used.
You can change the URL components by setting the environment variables mentioned above, or replace the entire URL by updating the /var/hive/conf/hive-site.xml file.
You can also use any other supported RDBMS, like MySQL, by changing the connection URL and embedding or mounting the JDBC driver at the /opt/hive/lib/drivername.jar path inside the container. The Postgres JDBC driver is already embedded in the image.
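A hedged sketch of the MySQL variant described above follows. Several names are assumptions made for illustration: the host path of the driver jar, the jar file name used in place of `drivername.jar`, the database host `mysql`, and the Docker network `hadoop` that would make that host resolvable. `com.mysql.cj.jdbc.Driver` is the driver class shipped with MySQL Connector/J 8.

```shell
# Hypothetical host path to a MySQL Connector/J jar; adjust to your file.
MYSQL_DRIVER_JAR="${MYSQL_DRIVER_JAR:-$PWD/mysql-connector-j.jar}"
# Hypothetical Docker network shared with a MySQL container named "mysql".
HIVE_NETWORK="${HIVE_NETWORK:-hadoop}"

# start_hive_mysql: run the Hive image against MySQL instead of Postgres.
# The jar is mounted read-only into /opt/hive/lib, and the connection
# settings are passed through the environment variables described above.
start_hive_mysql() {
  docker run -d --rm --name hive \
    --network "$HIVE_NETWORK" \
    -v "$MYSQL_DRIVER_JAR:/opt/hive/lib/mysql-connector-j.jar:ro" \
    -e HIVE_METASTORE_DB_URL="jdbc:mysql://mysql:3306/metastore" \
    -e HIVE_METASTORE_DB_DRIVER="com.mysql.cj.jdbc.Driver" \
    -e HIVE_METASTORE_DB_USER="hive" \
    -e HIVE_METASTORE_DB_PASSWORD="hive" \
    -p 10000:10000 -p 9083:9083 \
    mtsrus/hadoop:hadoop3-hive
}
```

The user and password shown are the documented defaults; a real MySQL instance would need a matching account and a `metastore` database.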
Environment variables:
- WAIT_TIMEOUT_SECONDS=120 - timeout in seconds after starting each service to check if it is alive
- WITH_HIVE_SERVER=true - set to false to disable Hive server
- WITH_HIVE_METASTORE_SERVER=true - set to false to disable Hive metastore server
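For example, the toggles above could be used to run only the Metastore without the Hive server. This is a sketch under assumptions: the container name and function name are arbitrary, and a Postgres instance is assumed to be reachable at the default metastore URL.

```shell
# start_metastore_only: run the Hive image with the Hive server disabled,
# publishing only the HDFS ports and the Metastore port (9083).
# Assumption: Postgres is reachable at the default metastore DB URL.
start_metastore_only() {
  docker run -d --rm --name hive-metastore \
    -e WITH_HIVE_SERVER=false \
    -e WITH_HIVE_METASTORE_SERVER=true \
    -p 9820:9820 -p 9870:9870 \
    -p 9083:9083 \
    mtsrus/hadoop:hadoop3-hive
}
```

The ports listed under WITH_HIVE_SERVER=true are left unpublished here because that toggle is off.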
See HDFS image documentation.
See Yarn image documentation.
- export HIVE_SERVER2_HEAPSIZE=256 - max JVM memory in megabytes for Hive server
- export HIVE_METASTORE_HEAPSIZE=256 - max JVM memory in megabytes for Hive metastore server
See https://www.alibabacloud.com/help/en/emr/emr-on-ecs/user-guide/modify-the-memory-parameters-of-hive