
Commit 87cf92a

Author: andkret (committed)
Removed Hadoop
Removed the whole Hadoop section. It's just not that important anymore.
1 parent f88b4f4 commit 87cf92a

File tree

2 files changed: +0 −158 lines changed


README.md

Lines changed: 0 additions & 7 deletions
```diff
@@ -130,13 +130,6 @@ If you look for the old PDF version it's [here](https://github.com/andkret/Cookb
 - [Scaling Up](sections/03-AdvancedSkills.md#scaling-up)
 - [Scaling Out](sections/03-AdvancedSkills.md#scaling-out)
 - [When not to Do Big Data](sections/03-AdvancedSkills.md#please-dont-go-big-data)
-- [Hadoop Platforms](sections/03-AdvancedSkills.md#hadoop-platforms)
-- [What is Hadoop](sections/03-AdvancedSkills.md#what-is-hadoop)
-- [What makes Hadoop so popular](sections/03-AdvancedSkills.md#what-makes-hadoop-so-popular)
-- [Hadoop Ecosystem Components](sections/03-AdvancedSkills.md#hadoop-ecosystem-components)
-- [Hadoop is Everywhere?](sections/03-AdvancedSkills.md#hadoop-is-everywhere)
-- [Should You Learn Hadoop?](sections/03-AdvancedSkills.md#should-you-learn-hadoop)
-- [How to Select Hadoop Cluster Hardware](sections/03-AdvancedSkills.md#how-to-select-hadoop-cluster-hardware)
 - [Connect](sections/03-AdvancedSkills.md#connect)
 - [REST APIs](sections/03-AdvancedSkills.md#rest-apis)
 - [API Design](sections/03-AdvancedSkills.md#api-design)
```

sections/03-AdvancedSkills.md

Lines changed: 0 additions & 151 deletions
```diff
@@ -14,13 +14,6 @@ Advanced Data Engineering Skills
 - [Scaling Up](03-AdvancedSkills.md#scaling-up)
 - [Scaling Out](03-AdvancedSkills.md#scaling-out)
 - [When not to Do Big Data](03-AdvancedSkills.md#please-dont-go-big-data)
-- [Hadoop Platforms](03-AdvancedSkills.md#hadoop-platforms)
-- [What is Hadoop](03-AdvancedSkills.md#what-is-hadoop)
-- [What makes Hadoop so popular](03-AdvancedSkills.md#what-makes-hadoop-so-popular)
-- [Hadoop Ecosystem Components](03-AdvancedSkills.md#hadoop-ecosystem-components)
-- [Hadoop is Everywhere?](03-AdvancedSkills.md#hadoop-is-everywhere)
-- [Should You Learn Hadoop?](03-AdvancedSkills.md#should-you-learn-hadoop)
-- [How to Select Hadoop Cluster Hardware](03-AdvancedSkills.md#how-to-select-hadoop-cluster-hardware)
 - [Connect](03-AdvancedSkills.md#connect)
 - [REST APIs](03-AdvancedSkills.md#rest-apis)
 - [API Design](03-AdvancedSkills.md#api-design)
```
```diff
@@ -340,150 +333,6 @@ If you don't need it it's making absolutely no sense at all!
 On the other side: If you really need big data tools they will save your
 ass :)
 
-## Hadoop Platforms
-
-When people talk about big data, one of the first things come to mind is
-Hadoop. Google's search for Hadoop returns about 28 million results.
-
-It seems like you need Hadoop to do big data. Today I am going to shed
-light onto why Hadoop is so trendy.
-
-You will see that Hadoop has evolved from a platform into an ecosystem.
-Its design allows a lot of Apache projects and 3rd party tools to
-benefit from Hadoop.
-
-I will conclude with my opinion on, if you need to learn Hadoop and if
-Hadoop is the right technology for everybody.
-
-### What is Hadoop
-
-Hadoop is a platform for distributed storing and analyzing of very large
-data sets.
-
-Hadoop has four main modules: Hadoop common, HDFS, MapReduce and YARN.
-The way these modules are woven together is what makes Hadoop so
-successful.
-
-The Hadoop common libraries and functions are working in the background.
-That's why I will not go further into them. They are mainly there to
-support Hadoop's modules.
-
-| Podcast Episode: #060 What Is Hadoop And Is Hadoop Still Relevant In 2019? |
-|------------------|
-| An introduction into Hadoop HDFS, YARN and MapReduce. Yes, Hadoop is still relevant in 2019 even if you look into serverless tools. |
-| [Watch on YouTube](https://youtu.be/8AWaht3YQgo) \ [Listen on Anchor](https://anchor.fm/andreaskayy/episodes/060-What-Is-Hadoop-And-Is-Hadoop-Still-Relevant-In-2019-e45ijp) |
-
```
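The MapReduce model named in the removed text can be illustrated without any Hadoop at all. Here is a minimal word-count sketch in plain Python, assuming nothing beyond the three classic phases; all function names are made up for the illustration and this is not Hadoop's actual API:

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    # Shuffle: group all values by key, as Hadoop does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the grouped values for each key.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data is big", "hadoop handles big data"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
print(counts["big"])   # 3
print(counts["data"])  # 2
```

In a real Hadoop job the map and reduce functions run on many machines and the shuffle moves data across the network; the toy above only mirrors the data flow.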
```diff
-### What makes Hadoop so popular?
-
-Storing and analyzing data as large as you want is nice. But what makes
-Hadoop so popular?
-
-Hadoop's core functionality is the driver of Hadoop's adoption. Many
-Apache side projects use it's core functions.
-
-Because of all those side projects Hadoop has turned more into an
-ecosystem. An ecosystem for storing and processing big data.
-
-To better visualize this eco system I have drawn you the following
-graphic. It shows some projects of the Hadoop ecosystem who are closely
-connected with the Hadoop.
-
-It is not a complete list. There are many more tools that even I don't
-know. Maybe I am drawing a complete map in the future.
-
-![Hadoop Ecosystem Components](/images/Hadoop-Ecosystem.jpg)
-
-### Hadoop Ecosystem Components
-
-Remember my big data platform blueprint? The blueprint has four stages:
-Ingest, store, analyse and display.
-
-Because of the Hadoop ecosystem the different tools in these stages can
-work together perfectly.
-
-Here's an example:
-
-![Connections between tools](/images/Hadoop-Ecosystem-Connections.jpg)
-
-You use Apache Kafka to ingest data, and store it in the HDFS. You do
-the analytics with Apache Spark and as a backend for the display you
-store data in Apache HBase.
-
-To have a working system you also need YARN for resource management. You
-also need Zookeeper, a configuration management service to use Kafka and
-HBase
-
-As you can see in the picture below each project is closely connected to
-the other.
-
-Spark for instance, can directly access Kafka to consume messages. It is
-able to access HDFS for storing or processing stored data.
-
-It also can write into HBase to push analytics results to the front end.
-
```
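The removed Kafka → HDFS → Spark → HBase example maps onto the blueprint's four stages (ingest, store, analyse, display). As a rough, purely illustrative analogy, here is a toy in-memory version of those stages in plain Python; the real tools are distributed systems, and every name below is invented for the sketch:

```python
from collections import Counter

def ingest(events):
    """Ingest stage (Kafka's role in the example): accept raw events."""
    return list(events)

def store(raw_events, storage):
    """Store stage (HDFS's role): append raw events to durable storage."""
    storage.extend(raw_events)
    return storage

def analyse(storage):
    """Analyse stage (Spark's role): aggregate the stored events."""
    return Counter(event["page"] for event in storage)

def display_backend(results, serving_store):
    """Display stage (HBase's role): keyed store the frontend can query."""
    serving_store.update(results)
    return serving_store

storage, serving = [], {}
events = [{"page": "/home"}, {"page": "/docs"}, {"page": "/home"}]
store(ingest(events), storage)
serving = display_backend(analyse(storage), serving)
print(serving["/home"])  # 2
```

The point of the analogy is only the separation of stages: each stage hands a well-defined output to the next, which is what lets the ecosystem swap tools per stage.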
```diff
-The cool thing of such ecosystem is that it is easy to build in new
-functions.
-
-Want to store data from Kafka directly into HDFS without using Spark?
-
-No problem, there is a project for that. Apache Flume has interfaces for
-Kafka and HDFS.
-
-It can act as an agent to consume messages from Kafka and store them
-into HDFS. You even do not have to worry about Flume resource
-management.
-
-Flume can use Hadoop's YARN resource manager out of the box.
-
-![Flume Integration](/images/Hadoop-Ecosystem-Connections-Flume.jpg)
-
```
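A Flume agent for that Kafka-to-HDFS path is typically wired together in a properties file: a Kafka source feeding an HDFS sink through a channel. The snippet below is a hedged sketch, not from the book; the agent name, broker, topic, and path are placeholders:

```properties
# One Flume agent: Kafka source -> memory channel -> HDFS sink (illustrative)
agent.sources = kafka-in
agent.channels = mem
agent.sinks = hdfs-out

agent.sources.kafka-in.type = org.apache.flume.source.kafka.KafkaSource
agent.sources.kafka-in.kafka.bootstrap.servers = broker1:9092
agent.sources.kafka-in.kafka.topics = events
agent.sources.kafka-in.channels = mem

agent.channels.mem.type = memory

agent.sinks.hdfs-out.type = hdfs
agent.sinks.hdfs-out.hdfs.path = hdfs://namenode/data/events/%Y-%m-%d
agent.sinks.hdfs-out.hdfs.fileType = DataStream
agent.sinks.hdfs-out.channel = mem
```

Consult the Flume user guide for the exact property names of your Flume version before relying on this layout.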
```diff
-### Hadoop Is Everywhere?
-
-Although Hadoop is so popular it is not the silver bullet. It isn't the
-tool that you should use for everything.
-
-Often times it does not make sense to deploy a Hadoop cluster, because
-it can be overkill. Hadoop does not run on a single server.
-
-You basically need at least five servers, better six to run a small
-cluster. Because of that. the initial platform costs are quite high.
-
-One option you have is to use a specialized systems like Cassandra,
-MongoDB or other NoSQL DB's for storage. Or you move to Amazon and use
-Amazon's Simple Storage Service, or S3.
-
-Guess what the tech behind S3 is. Yes, HDFS. That's why AWS also has the
-equivalent to MapReduce named Elastic MapReduce.
-
-The great thing about S3 is that you can start very small. When your
-system grows you don't have to worry about S3's server scaling.
-
-### Should you learn Hadoop?
-
-Yes, I definitely recommend you to get to know how Hadoop works and how
-to use it. As I have shown you in this article, the ecosystem is quite
-large.
-
-Many big data projects use Hadoop or can interface with it. That's why
-it is generally a good idea to know as many big data technologies as
-possible.
-
-Not in depth, but to the point that you know how they work and how you
-can use them. Your main goal should be to be able to hit the ground
-running when you join a big data project.
-
-Plus, most of the technologies are open source. You can try them out for
-free.
-
-### How does a Hadoop System architecture look like
-
-### What tools are usually in a with Hadoop Cluster
-
-Yarn Zookeeper HDFS Oozie Flume Hive
-
-### How to select Hadoop Cluster Hardware
-
 
 ## Connect
```

0 commit comments
