A hands-on DevOps project showcasing end-to-end monitoring, alerting, and observability for modern infrastructure and applications. It demonstrates how to design, deploy, and manage a production-grade monitoring stack using tools such as Prometheus, Grafana, and Helm, integrated with CI/CD pipelines and cloud-native environments (Kubernetes).
- Screenshots
- Tech Stack
- Prerequisites
- Quick Start
- Documentation
- Features
- Tasks (automation)
- Roadmap
- License
- Contributing
- Contact
List of tools used in the project
This project uses Devbox to manage the development environment. Devbox provides a consistent, isolated environment with all the necessary CLI tools pre-installed.
- Install Docker
  - Follow the installation instructions for your operating system.
The rest of the tools are already installed in the Devbox environment.
- Install Devbox
  - Follow the installation instructions for your operating system.
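Before moving on, it can help to confirm that both prerequisites are actually on your PATH. The following is a minimal sketch, not part of the repository: `missing_tools` is a hypothetical helper name, and it only checks that the commands exist, not that the Docker daemon is running.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with this project).
# Prints each named command that is not on PATH and
# returns non-zero if at least one is missing.
missing_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage for the two prerequisites listed above:
#   missing_tools docker devbox || echo "install the missing tools first"
```

If the function prints nothing and returns 0, both tools were found.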
- Clone the Repository
git clone https://github.com/sean-njela/k8s_monitoring.git
cd k8s_monitoring
- Start the Devbox Environment and Poetry environment
devbox shell # Start the Devbox environment (this also starts the Poetry environment)
poetry install # Install dependencies
poetry env activate # Use the output to activate the Poetry environment (ONLY IF DEVBOX DOES NOT ACTIVATE THE ENVIRONMENT)
Note - The first time you run devbox shell, it will take a few minutes to install the necessary tools. After that, it will be much faster.
task setup
task status # check if everything is running
# GIVE EVERYTHING A MINUTE TO SET UP, THEN
task dev # prints the Grafana password and port-forwards the service
Everything ran well if you see the following output:
> task status
task: [status] kubectl get all -n k8s-monitoring-ns
[status] NAME                                                         READY   STATUS    RESTARTS       AGE
[status] pod/alertmanager-prometheus-kube-prometheus-alertmanager-0   2/2     Running   0              6h26m
[status] pod/log-generator-7d6d496f9-8jkkg                            1/1     Running   0              107m
[status] pod/loki-0                                                   2/2     Running   0              4h30m
[status] pod/loki-canary-bcplv                                        1/1     Running   0              4h53m
[status] pod/loki-gateway-59548bddcd-w24bt                            1/1     Running   0              4h53m
[status] pod/prometheus-demo-app-f5d79d7f5-2v4rk                      1/1     Running   0              6h27m
[status] pod/prometheus-grafana-674cf8cb44-48pnf                      3/3     Running   0              6h27m
[status] pod/prometheus-kube-prometheus-operator-5cdddd9b5-xqnmg      1/1     Running   1 (150m ago)   6h27m
[status] pod/prometheus-kube-state-metrics-7c5fb9d798-xnhhg           1/1     Running   1 (150m ago)   6h27m
[status] pod/prometheus-prometheus-kube-prometheus-prometheus-0       2/2     Running   1 (149m ago)   6h26m
[status] pod/prometheus-prometheus-node-exporter-d5hz6                1/1     Running   0              6h27m
[status] pod/promtail-75845fb6f9-bhwdk                                1/1     Running   0              107m
Then visit localhost:3002 to access the Grafana UI.
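If you would rather not eyeball the status output, the same check can be scripted. This is a rough sketch under stated assumptions: `unready_pods` is a hypothetical helper, not a task defined in this project, and it expects the plain `kubectl get pods --no-headers` column layout (NAME, READY, STATUS, ...). Pods from finished Jobs (STATUS `Completed`) would also be reported.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with this project).
# Reads `kubectl get pods --no-headers` output on stdin and prints the name
# of every pod that is not Running or whose READY count is incomplete
# (for example 1/2). Returns non-zero if any such pod is found.
unready_pods() {
  awk '
    {
      split($2, ready, "/")
      if ($3 != "Running" || ready[1] != ready[2]) {
        print $1
        bad = 1
      }
    }
    END { exit (bad ? 1 : 0) }
  '
}

# Usage against the namespace shown in the status output:
#   kubectl get pods -n k8s-monitoring-ns --no-headers | unready_pods
```

An empty result with exit code 0 means every pod in the namespace is Running and fully ready.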
For full documentation, setup instructions, and architecture details, visit the docs or run:
task docs
Docs available at: http://127.0.0.1:8000/
- Metrics Collection & Visualization – real-time system, application, and container insights
- Reliability & Scalability – designing a monitoring stack built for production
- AlertManager Alerting & Incident Response – proactive notifications via Slack/Email/PagerDuty
This project is designed for a simple, one-command setup. All necessary actions are orchestrated through Taskfile.yaml.
task setup # setup the environment
task dev # automated local provisioning
task cleanup-dev # cleanup the dev environment
The Taskfile.gitflow.yaml provides a structured Git workflow using Git Flow. This helps in managing features, releases, and hotfixes in a standardized way. You run these tasks the same way as any other task. Using Git Flow is optional.
task init # Initialize Git Flow with 'main', gh-pages and 'develop'
task sync # Sync current branch with latest 'develop' and handle main updates
task release:finish # Finishes and publishes a release (merges, tags, pushes), e.g. task release:finish version="1.2.0"
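Because `release:finish` merges, tags, and pushes, a typo in the version is awkward to undo. The sketch below guards against that. It assumes releases follow the MAJOR.MINOR.PATCH shape of the `1.2.0` example, which the project does not state explicitly, and both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with this project).

# Returns 0 if $1 looks like MAJOR.MINOR.PATCH with digits only, else 1.
is_release_version() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'
}

# Runs the release task only when the version string passes the check.
finish_release() {
  if is_release_version "$1"; then
    task release:finish version="$1"
  else
    echo "not a MAJOR.MINOR.PATCH version: $1" >&2
    return 1
  fi
}

# Usage:
#   finish_release 1.2.0
```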
To see all tasks:
task --list-all
If you do not want the gitflow tasks, you can remove the Taskfile.gitflow.yaml file and unlink it from the Taskfile.yaml file (remove the includes section). If you cannot find the section, use CTRL + F to search for Taskfile.gitflow.yaml.
Important notes to remember whilst using the project
These are the current versions of the charts we used in this tutorial:
NAME         CHART                           APP VERSION
loki         loki-6.41.1                     3.5.5
prometheus   kube-prometheus-stack-77.11.1   v0.85.0
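If something behaves differently from what is described here, a chart version mismatch is one thing worth ruling out. The following sketch compares `helm list` output against the versions in the table above. `chart_drift` and `expected_charts` are hypothetical names, and the check is a plain text match on each release's line, so treat it as a quick sanity check and not a strict comparison.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with this project).

# Release name and chart, copied from the table above.
expected_charts='loki loki-6.41.1
prometheus kube-prometheus-stack-77.11.1'

# Reads `helm list` output on stdin. For each pinned release, reports it if
# no line starting with the release name mentions the expected chart.
# Returns non-zero if any release differs or is absent.
chart_drift() {
  input=$(cat)
  drift=0
  while read -r name chart; do
    if ! printf '%s\n' "$input" | grep -E "^$name[[:space:]]" | grep -qwF -- "$chart"; then
      echo "$name: expected $chart"
      drift=1
    fi
  done <<EOF
$expected_charts
EOF
  return "$drift"
}

# Usage against the namespace used by this project:
#   helm list -n k8s-monitoring-ns | chart_drift
```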
For comprehensive troubleshooting, refer to the Troubleshooting section, or open the GitHub Pages site here and use the search bar to look up your issue (USE INDIVIDUAL KEYWORDS, NOT THE ISSUE NAME).
- Metrics Collection & Visualization – real-time system, application, and container insights
- Alerting & Incident Response – proactive notifications via Slack/Email/PagerDuty
- App instrumentation
- Add Terraform
Contributions are welcome! Open an issue or submit a PR.
Distributed under the MIT License. See LICENSE for more info.
Your Name – @linkedin – @twitter/x – [email protected]
Project Link: https://github.com/sean-njela/k8s_monitoring
About Me - About Me