Commit 96966f1

Authored by jaiakash, Doris-xm, kunal-511, akagami-harsh, and juliusvonkohout
blog: Kubeflow and GSoC 2025 blog (#183)
* init: gsoc 2025 blog Signed-off-by: Akash Jaiswal <[email protected]>
* add: project 7 Signed-off-by: Akash Jaiswal <[email protected]>
* feat: add project 10 Signed-off-by: Xinmin Du <[email protected]>
* add: project-4 Signed-off-by: kunal-511 <[email protected]>
* add project 1 Signed-off-by: Harshvir Potpose <[email protected]>
* chore: sign off previous commits for DCO compliance Signed-off-by: kunal-511 <[email protected]>
* Revise Kubeflow project details and outcomes Updated mentor information and added project outcomes. Signed-off-by: Julius von Kohout <[email protected]>
* update: project3 blog added Signed-off-by: madmecodes <[email protected]>
* Add Project 5 details for GSoC 2025 Added details about Project 5: JupyterLab Plugin for Kubeflow, including contributor, mentors, overview, key outcomes, and resources. Signed-off-by: Amrit Kumar <[email protected]>
* Update 2025-09-06-kubeflow-and-gsoc2025.md Signed-off-by: Amrit Kumar <[email protected]>
* added project 2 Signed-off-by: Harshit Nayan <[email protected]>
* added project2 Signed-off-by: Harshit Nayan <[email protected]>
* spelling Signed-off-by: Julius von Kohout <[email protected]>
* Added Project 12 details Signed-off-by: SanthoshToorpu <[email protected]>
* fix: formating for project 2 blog Signed-off-by: Akash Jaiswal <[email protected]>
* chore: typo fixes Signed-off-by: Akash Jaiswal <[email protected]>
* Apply suggestions from code review Signed-off-by: Julius von Kohout <[email protected]>
* Project 6 Details Signed-off-by: Fellipe Resende <[email protected]>
---------
Signed-off-by: Akash Jaiswal <[email protected]>
Signed-off-by: Xinmin Du <[email protected]>
Signed-off-by: kunal-511 <[email protected]>
Signed-off-by: Harshvir Potpose <[email protected]>
Signed-off-by: Julius von Kohout <[email protected]>
Signed-off-by: madmecodes <[email protected]>
Signed-off-by: Amrit Kumar <[email protected]>
Signed-off-by: Harshit Nayan <[email protected]>
Signed-off-by: SanthoshToorpu <[email protected]>
Signed-off-by: Fellipe Resende <[email protected]>
Co-authored-by: Xinmin Du <[email protected]>
Co-authored-by: kunal-511 <[email protected]>
Co-authored-by: Harshvir Potpose <[email protected]>
Co-authored-by: Julius von Kohout <[email protected]>
Co-authored-by: madmecodes <[email protected]>
Co-authored-by: Amrit Kumar <[email protected]>
Co-authored-by: Harshit Nayan <[email protected]>
Co-authored-by: SanthoshToorpu <[email protected]>
Co-authored-by: Fellipe Resende <[email protected]>
1 parent bb58264 commit 96966f1

3 files changed

+269
-0
lines changed
Lines changed: 269 additions & 0 deletions
@@ -0,0 +1,269 @@
1+
---
2+
toc: true
3+
layout: post
4+
comments: true
5+
title: "GSoC 2025: Meet Our Projects and Contributors 🚀"
6+
hide: false
7+
categories: [gsoc, community, kubeflow]
8+
author: "Kubeflow Outreach Team"
9+
---
10+
11+
## Introduction
12+
13+
Google Summer of Code (GSoC) 2025 has been an exciting journey for the Kubeflow community! We are very grateful to Google and to the open source community members for their dedication and effort. 🎉
14+
This year, 9 contributors from around the world collaborated with mentors to improve different parts of the Kubeflow ecosystem, from infrastructure and CI/CD to notebooks, ML workflows, and beyond.
15+
16+
In this blog, we are highlighting all the projects that were part of **GSoC 2025**, their goals, the impact they’ve created, and the amazing contributors behind them.
17+
18+
👉 You can explore the full list on our [GSoC 2025 page](https://www.kubeflow.org/events/gsoc-2025/).
19+
20+
---
21+
22+
## 📚 Project Highlights
23+
24+
Below are the projects from this year’s GSoC. Each section includes a short summary, contributor details, and links to project resources.
25+
26+
---
27+
28+
### Project 1: Kubeflow Platform Enhancements
29+
**Contributor:** Harshvir Potpose ([@akagami-harsh](https://github.com/akagami-harsh))
30+
**Mentors:** Julius von Kohout ([@juliusvonkohout](https://github.com/juliusvonkohout))
31+
32+
**Overview:**
33+
Kubeflow needs up-to-date S3-compatible storage with hard multi-tenancy, and its containers need to run under the restricted Pod Security Standards profile. MinIO transitioned to the AGPLv3 license in 2021, creating significant compliance challenges for the project.
34+
35+
This project addressed this critical blocker by implementing SeaweedFS as a production-ready replacement for MinIO. SeaweedFS offers a more permissive Apache 2.0 license while providing superior performance characteristics and enterprise-grade security and reliability.
36+
37+
**Key Outcomes:**
38+
- Provided S3 storage with hard multi-tenancy
39+
- Successfully migrated to SeaweedFS as a secure replacement for MinIO and integrated it into Kubeflow Pipelines
40+
- Eliminated MinIO's licensing constraints by adopting SeaweedFS's more permissive license model
41+
- Implemented comprehensive CI tests for SeaweedFS deployment and namespace isolation functionality
42+
- Strengthened the manifests repository's CI pipeline and contributed to the dashboard migration efforts
43+
- Enforced PodSecurityStandards baseline/restricted
44+
45+
**Resources:**
46+
- 📄 [Project Page](https://summerofcode.withgoogle.com/programs/2025/projects/PWDq4Zvt)
47+
- ✍️ [Personal Blog: Kubeflow Pipelines Embraces SeaweedFS](https://medium.com/@hpotpose26/kubeflow-pipelines-embraces-seaweedfs-9a7e022d5571)
48+
49+
---
50+
51+
### Project 2: KServe Models Web Application Modernization
52+
**Contributor:** (GitHub: [@LogicalGuy77](https://github.com/LogicalGuy77))
53+
**Mentors:** Griffin Sullivan ([@Griffin-Sullivan](https://github.com/Griffin-Sullivan)), Julius von Kohout ([@juliusvonkohout](https://github.com/juliusvonkohout))
54+
55+
**Overview:**
56+
This project revived and modernized the KServe Models Web Application (Angular + Flask), the UI used to manage machine learning inference services in Kubeflow via KServe. What began as a small Node.js update evolved into a comprehensive upgrade of the frontend stack, CI/CD, testing, and feature set, bringing the app up to modern standards and making it easier for both users and contributors to work with.
57+
58+
**Key Outcomes:**
59+
- Modernized core stack: upgraded Node.js (v16 → v23) and Angular (v12 → v14), resolving security issues and improving performance
60+
- Migrated container images from Docker Hub to GitHub Container Registry (GHCR) to avoid rate limits and improve reliability
61+
- Overhauled CI/CD with GitHub Actions: updated actions, added intelligent caching for pip, Docker layers, and node_modules for significantly faster builds
62+
- Introduced Jest unit tests for core utilities (e.g., parsing Kubernetes object statuses and KServe predictor configs)
63+
- Added Cypress end-to-end tests for critical user journeys (deploy, edit, delete) including failure handling and input validation
64+
- Wrote comprehensive documentation to help contributors run and extend the test suites
65+
- Shipped “Edit InferenceService YAML” directly in the UI via an integrated Monaco editor, with no kubectl required
66+
- Fixed RawDeployment-mode crash and added ModelMesh support so resources and statuses render correctly
67+
- Added support for the latest KServe predictor runtimes, including HuggingFace
68+
- Simplified contributor onboarding with a Makefile that automates full frontend setup in a single command
69+
- Implemented runtime-configurable settings via a new `/api/config` endpoint (e.g., Grafana DB names, URL prefixes)
70+
- Cut the v0.15.0 release of the Models Web App, consolidating months of modernization and feature work
71+
72+
**By the Numbers:**
73+
- PRs merged: 19
74+
- Issues closed: 8
75+
- Lines of code changed: +22,309 / −11,628
76+
- Frontend: Angular, TypeScript, SCSS
77+
- Backend: Flask (Python)
78+
- CI/CD: GitHub Actions, Docker
79+
- Local cluster: Kubernetes (Kind) + Istio + Kubeflow
80+
81+
**Resources:**
82+
- [Project Repo: kserve/models-web-app](https://github.com/kserve/models-web-app)
83+
- [All commits by @LogicalGuy77](https://github.com/kserve/models-web-app/commits?author=LogicalGuy77)
84+
- [Blog Post](https://medium.com/@harshitweb3/my-gsoc-2025-journey-reviving-kserves-models-web-application-2f18ef16fb51)
85+
86+
---
87+
88+
### Project 3: Istio CNI and Ambient Mesh
89+
**Contributor:** Ayush Gupta (GitHub: [@madmecodes](https://github.com/madmecodes))
90+
**Mentors:** Julius von Kohout ([@juliusvonkohout](https://github.com/juliusvonkohout)), Kimonas Sotirchos ([@kimwnasptd](https://github.com/kimwnasptd))
91+
92+
**Overview:**
93+
This GSoC 2025 project modernized Kubeflow's service mesh infrastructure by making Istio CNI the default configuration and pioneering Istio Ambient Mesh support. The 175-hour, medium-difficulty project involved 25+ pull requests across multiple Kubeflow repositories. The work covered the transition from the traditional sidecar-based architecture to ambient mesh with ztunnel and waypoint proxies, the migration to Gateway API (HTTPRoute), path-based routing for KServe model serving endpoints, and a Kustomize overlay method for easy installation and configuration management.
94+
95+
**Key Outcomes:**
96+
- Implemented Istio CNI by default, with a Kustomize overlay method that enables easy switching between traditional Istio and CNI configurations
97+
- Created path-based routing for KServe multi-model serving and Gateway API (HTTPRoute) migration
98+
- Pioneered Ambient Mesh support with ztunnel/waypoint proxies and coordinated cross-repository compatibility
99+
100+
**Resources:**
101+
- 📄 [Project Page](https://summerofcode.withgoogle.com/programs/2025/projects/WAHCCi8V)
102+
- ✍️ [Blog Post](https://medium.com/@ayushguptadev1/gsoc25-kubeflow-securing-and-optimizing-ml-infrastructure-with-istio-31f535c77fd6)
103+
104+
---
105+
106+
### Project 4: Deploying Kubeflow with Helm Charts
107+
108+
**Contributor:** Kunal Dugar ([@kunal-511](https://github.com/kunal-511))
109+
**Mentors:** Julius von Kohout ([@juliusvonkohout](https://github.com/juliusvonkohout)), Valentina Rodriguez Sosa ([@varodrig](https://github.com/varodrig)), Chase Cadet ([@Chasecadet](https://github.com/Chasecadet))
110+
111+
**Overview:**
112+
This project focused on creating component-based Helm charts for Kubeflow, enabling flexible and incremental deployment of ML infrastructure. Instead of requiring a full platform installation, users can now deploy specific components like Katib, Pipelines, Model Registry, and others independently with customized configurations.
113+
114+
**Key Outcomes:**
115+
- End-to-end testing of the Kubeflow AI reference platform
116+
- Created production-ready Helm charts for Katib, Model Registry, KServe Web App, Notebook Controller, and Kubeflow Pipelines, enabling one-command deployment of individual components
117+
- Built automated testing infrastructure with diff tools to validate Helm charts against Kustomize manifests, ensuring accuracy and catching regressions quickly
118+
- Enabled incremental Kubeflow adoption, reducing deployment complexity from days to hours for organizations building production ML platforms
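
The chart-versus-manifest validation mentioned above can be pictured with a small sketch. This is not the project's actual diff tooling: it assumes both the Helm and the Kustomize output have already been rendered and parsed into Python dictionaries, and the function names `index_by_identity` and `diff_manifests` are hypothetical.

```python
from typing import Any

Manifest = dict[str, Any]
Identity = tuple[str, str, str]


def index_by_identity(manifests: list[Manifest]) -> dict[Identity, Manifest]:
    """Key each parsed manifest by (kind, namespace, name)."""
    indexed: dict[Identity, Manifest] = {}
    for manifest in manifests:
        meta = manifest.get("metadata", {})
        key = (
            manifest.get("kind", ""),
            meta.get("namespace", ""),
            meta.get("name", ""),
        )
        indexed[key] = manifest
    return indexed


def diff_manifests(helm: list[Manifest], kustomize: list[Manifest]) -> dict[str, list[Identity]]:
    """Report resources missing on either side or differing in content."""
    helm_idx = index_by_identity(helm)
    kust_idx = index_by_identity(kustomize)
    shared = helm_idx.keys() & kust_idx.keys()
    return {
        "only_in_helm": sorted(helm_idx.keys() - kust_idx.keys()),
        "only_in_kustomize": sorted(kust_idx.keys() - helm_idx.keys()),
        "changed": sorted(k for k in shared if helm_idx[k] != kust_idx[k]),
    }
```

An empty report in all three lists would mean the chart renders the same resources as the Kustomize manifests; any entry points at a regression to investigate.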
119+
120+
**Resources:**
121+
- 📄 [Project Page](https://summerofcode.withgoogle.com/programs/2025/projects/)
122+
- 🧩 [Kubeflow Enhancement Proposal (KEP)-831-Kubeflow-Helm-Support: Support Helm as an Alternative for Kustomize](https://github.com/kubeflow/community/pull/832)
123+
- ✍️ [Blog: My GSoC Journey: Deploying Kubeflow with Helm Charts](https://medium.com/@kunalD02/my-gsoc-journey-deploying-kubeflow-with-helm-charts-e7f9dea7b56e)
124+
125+
---
126+
127+
### Project 5: JupyterLab Plugin for Kubeflow
128+
129+
**Contributor:** Amrit Kumar ([@Amrit27k](https://github.com/Amrit27k))
130+
**Mentors:** Eder Ignatowicz ([@ederign](https://github.com/ederign)), Stefano Fioravanzo ([@StefanoFioravanzo](https://github.com/StefanoFioravanzo))
131+
132+
**Overview:**
133+
The project fully modernized Kubeflow Kale's architecture, migrating the backend from KFPv1 to KFPv2 with a new Jinja2 templating system for notebook-to-pipeline conversion. The initiative also featured a complete overhaul of the JupyterLab frontend (TypeScript v5.9.2, MUI v7) and comprehensive updates to GitHub workflows, documentation, and dependencies to meet modern community standards.
134+
135+
**Key Outcomes:**
136+
- Rebuilt the Kale backend to support the modern, future-proof Kubeflow Pipelines v2 (KFPv2) architecture, moving away from the deprecated KFPv1.
137+
- Implemented a new Jinja2 templating system that intelligently converts annotated Jupyter notebook cells into valid KFPv2 Python DSL scripts.
138+
- Updated the JupyterLab frontend extension using current standards (TypeScript v5.9.2, JupyterLab v4, and MUI v7), resolving hundreds of legacy compatibility issues.
139+
- Integrated KFPv2's robust system for better type-safe artifact handling and automated ML Metadata registration, ensuring rich lineage tracking for pipeline steps.
140+
- Standardized the project structure, updated GitHub workflows, and implemented UI test scripts to align with community standards and ensure maintainability for future contributors.
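
To give a feel for template-driven notebook-to-pipeline conversion, here is a minimal sketch. It is not Kale's implementation: it swaps in Python's standard-library `string.Template` for Jinja2 so the example has no dependencies, and the template text and the `render_component` helper are hypothetical. The generated text uses the `@dsl.component` decorator from the KFP v2 SDK, but this sketch only produces source code and does not import or run KFP.

```python
import textwrap
from string import Template

# Hypothetical template for one pipeline step; Kale's real templates are Jinja2 files.
COMPONENT_TEMPLATE = Template('''\
@dsl.component
def $name():
$body
''')


def render_component(step_name: str, cell_source: str) -> str:
    """Wrap the source of one annotated notebook cell in a component function."""
    # Normalize the cell's indentation, then indent it as a function body.
    body = textwrap.indent(textwrap.dedent(cell_source).strip(), "    ")
    return COMPONENT_TEMPLATE.substitute(name=step_name, body=body)


if __name__ == "__main__":
    print(render_component("load_data", "rows = [1, 2, 3]\nprint(len(rows))"))
```

A real converter also has to handle imports, step dependencies, and data passing between cells, which is where a full templating engine such as Jinja2 earns its place.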
141+
142+
**Resources:**
143+
- 📄 [Project Repo - Kubeflow Kale](https://github.com/kubeflow-kale/kale)
144+
- 🧩 [Kubeflow Kale 2.0- Project Roadmap](https://github.com/kubeflow-kale/kale/issues/457)
145+
- ✍️ [Blog: From Notebooks to Pipelines: My GSoC’25 Journey Modernizing Kubeflow Kale with KFPv2 and Jupyterlabv4](https://medium.com/@amritkmr4272/from-notebooks-to-pipelines-my-gsoc25-journey-modernizing-kubeflow-kale-with-kfpv2-and-e098f194208c)
146+
147+
---
148+
149+
### Project 6: Spark Operator with Kubeflow Notebooks
150+
151+
**Contributor:** Fellipe Resende ([@fresende](https://github.com/fresende))
152+
**Mentors:** Shekhar Rajak ([@Shekharrajak](https://github.com/Shekharrajak)),
153+
Luciano Resende ([@lresende](https://github.com/lresende)),
154+
Chaoran Yu ([@yuchaoran2011](https://github.com/yuchaoran2011)),
155+
Andrey Velichkevich ([@andreyvelich](https://github.com/andreyvelich))
156+
157+
![Diagram](/images/2025-09-06-kubeflow-and-gsoc2025/project6.png)
158+
159+
**Overview:**
160+
This project enables seamless PySpark execution within Kubeflow Notebooks by integrating the Spark Operator and Jupyter Enterprise Gateway. It allows data scientists to run distributed machine learning and big data workloads directly from their notebooks on Kubernetes, simplifying workflows and eliminating Spark infrastructure overhead, improving both usability and scalability within the Kubeflow ecosystem.
161+
162+
**Key Outcomes:**
163+
164+
- Extended Kubeflow Notebooks to enable seamless integration with Spark via the Spark Operator, leveraging Jupyter Enterprise Gateway to manage the Spark application lifecycle.
165+
166+
- Enabled data scientists and ML engineers to run distributed big-data workloads directly in Spark, from inside Kubeflow Notebooks, without manual cluster setup.
167+
168+
- Provided documentation and guidance for setting up, configuring, and customizing Kubeflow Notebook environments integrated with the Spark Operator, enabling users to run scalable distributed Spark workloads directly from Jupyter-based workflows.
169+
170+
**Resources:**
171+
172+
- 📘 [Main Documentation Page](https://www.kubeflow.org/docs/components/spark-operator/user-guide/notebooks-spark-operator/)
173+
- 🎥 [Setup Demo Video](https://youtu.be/g7tctdeitvc)
174+
- 🐞 [Debugging Demo Video](https://www.youtube.com/watch?v=p6K6PdlkmeU)
175+
- 📄 [Project Page](https://summerofcode.withgoogle.com/programs/2025/projects/zRPtxGBI)
176+
- 💻 [Implementation Pull Request](https://github.com/kubeflow/website/pull/4141)
177+
178+
---
179+
180+
### Project 7: GPU Testing for LLM Blueprints
181+
182+
**Contributor:** Akash Jaiswal ([@jaiakash](https://github.com/jaiakash))
183+
**Mentors:** Andrey Velichkevich ([@andreyvelich](https://github.com/andreyvelich)), Valentina Rodriguez Sosa ([@varodrig](https://github.com/varodrig))
184+
185+
![Diagram](/images/2025-09-06-kubeflow-and-gsoc2025/project7.png)
186+
187+
**Overview:**
188+
We had a few examples in the repository that we wanted to include in our end-to-end (E2E) tests, but all of those tests were CPU-based. Projects like Torchtune and Qwen 2.5, for instance, require GPU resources to run, yet our existing CI setup couldn’t validate them at all because it was entirely CPU-focused.
189+
190+
This created a major gap: whenever someone contributed a new LLM example or modified the trainer logic, we had no automated way to verify whether those changes would work in a GPU environment, the same environment in which these workloads are actually deployed in production.
191+
192+
The goal of this project was to add GPU support directly to our CI/CD workflow.
193+
194+
**Key Outcomes:**
195+
196+
- Integrated GPU runners into GitHub Actions so that any pull request can automatically trigger GPU-backed E2E tests.
197+
198+
- Made the setup scalable and cost-efficient: instead of maintaining expensive GPU machines 24/7, an on-demand system provisions GPU resources only when a test is triggered.
199+
200+
**Resources:**
201+
202+
- 📄 [Project Page](https://summerofcode.withgoogle.com/programs/2025/projects/fwZkvPr0)
203+
- 🧩 [Kubeflow Enhancement Proposal (KEP)](https://github.com/kubeflow/trainer/pull/2689)
204+
- ✍️ [Personal Blog: Scaling GPU Testing for LLM Blueprints](https://my-experience-with-kubeflow-for-gsoc.hashnode.dev/gsoc-2025-with-kubeflow-scaling-gpu-testing-for-llm-blueprints)
205+
206+
---
207+
208+
### Project 10: Support Volcano Scheduler in Kubeflow Trainer
209+
**Contributor:** Xinmin Du (GitHub: [@Doris-xm](https://github.com/Doris-xm))
210+
**Mentors:** Shao Wang ([@Electronic-Waste](https://github.com/Electronic-Waste)), Yuchen Cheng ([@rudeigerc](https://github.com/rudeigerc))
211+
212+
**Overview:**
213+
The project aims to integrate the **Volcano scheduler** into Kubeflow Trainer as a **runtime plugin**.
214+
This will allow users to take advantage of advanced AI-specific scheduling features, such as Gang Scheduling and priority scheduling, supported by Volcano.
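
For readers new to Gang Scheduling, the core idea is all-or-nothing placement: a distributed training job starts only if every one of its pods can be placed at once. The sketch below illustrates that rule in plain Python under simplified assumptions (one resource type, first-fit placement). It is not Volcano's algorithm or the Trainer plugin's code, and `gang_schedule` is a hypothetical name.

```python
from typing import Optional


def gang_schedule(
    free_gpus: dict[str, int], pods_needed: int, gpus_per_pod: int
) -> Optional[list[str]]:
    """Place every pod of a job or none of them.

    Returns the chosen node for each pod, or None when the whole gang does
    not fit. The caller's view of free capacity is never modified.
    """
    remaining = dict(free_gpus)  # work on a copy so a failed attempt leaves no trace
    placement: list[str] = []
    for _ in range(pods_needed):
        # First-fit over nodes in name order (a simplifying assumption).
        node = next(
            (name for name, free in sorted(remaining.items()) if free >= gpus_per_pod),
            None,
        )
        if node is None:
            return None  # one pod cannot be placed, so the whole job waits
        remaining[node] -= gpus_per_pod
        placement.append(node)
    return placement
```

Without this rule, a job could hold GPUs for some of its workers while waiting indefinitely for the rest, which wastes resources and can deadlock a busy cluster.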
215+
216+
**Key Outcomes:**
217+
- Integrated the **Volcano** scheduler into Trainer as a runtime plugin to support Gang Scheduling and resource management for distributed training jobs.
218+
- Enabled AI-specific features such as priority scheduling, queue-based management, and network topology–aware scheduling.
219+
220+
**Resources:**
221+
222+
- 📄 [Project Page](https://summerofcode.withgoogle.com/programs/2025/projects/ZWbY1Rfj)
223+
- 🧩 [Kubeflow Enhancement Proposal (KEP)](https://github.com/kubeflow/trainer/pull/2672)
224+
225+
---
226+
227+
### Project 12: Empowering Kubeflow Documentation with LLMs 🤖
228+
**Contributor:** Santhosh Toorpu (GitHub: [@SanthoshToorpu](https://github.com/SanthoshToorpu))
229+
**Mentors:** Francisco Javier Arceo ([@franciscojavierarceo](https://github.com/franciscojavierarceo)), Chase Cadet ([@Chasecadet](https://github.com/Chasecadet))
230+
231+
**Overview:**
232+
This project introduced an intelligent documentation assistant that uses **Retrieval-Augmented Generation (RAG)** and **KServe-hosted LLMs** to enhance the Kubeflow documentation experience. The goal was to help users find relevant, accurate answers drawn from Kubeflow docs, GitHub issues, and community discussions, all through a conversational interface on the Kubeflow website.
233+
234+
The system leverages **Kubeflow Pipelines** to automate documentation ingestion and indexing, **Milvus** for semantic vector search, and **FastAPI with WebSockets** for real-time interactions. Built on Kubernetes, the architecture follows Kubeflow’s MLOps principles end-to-end, from automated retraining and indexing to monitored LLM inference served via KServe.
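
The retrieval step of a RAG system boils down to ranking stored document embeddings by similarity to the query embedding. The following is a toy sketch of that step using cosine similarity over an in-memory dictionary. In the project this role is played by Milvus; the two-dimensional vectors, the document ids, and the `top_k` helper here are made up for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k documents most similar to the query."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]
```

The retrieved passages are then handed to the LLM as context, so the answer is grounded in the documentation rather than in model memory alone.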
235+
236+
**Key Outcomes:**
237+
- Designed and deployed an **LLM-powered Documentation Assistant** using Kubeflow-native tools (KFP, KServe, Feast, Milvus).
238+
- Implemented **automated documentation indexing pipelines** triggered by GitHub Actions to keep vector embeddings up-to-date.
239+
- Developed an **interactive chat interface** integrated into the Kubeflow website for natural-language documentation search.
240+
- Introduced a **RAG agentic workflow** with tool-calling to decide when to retrieve external documentation or use model knowledge.
241+
- Implemented **RBAC-based access control** for pipelines and KServe endpoints to align with Kubeflow’s multi-user isolation standards.
242+
- Developed a **feedback loop system** (“👍 / 👎”) to improve the model’s performance and documentation quality.
243+
- Delivered a functional prototype hosted on Kubernetes, showcasing real-time semantic search across Kubeflow repositories.
244+
245+
**Resources:**
246+
- 📄 [Project Page](https://summerofcode.withgoogle.com/programs/2025/projects/a9JPxfEh)
247+
- 🧠 [Demo Repo](https://github.com/kubeflow/docs-agent)
248+
- ✍️ [Blog Post: Empowering Kubeflow Documentation with LLMs](https://medium.com/@toorpusanthosh/empowering-kubeflow-documentation-with-llms-my-gsoc-journey-58eb946ba2af)
249+
250+
---
251+
252+
## 🎉 Wrapping Up
253+
254+
We are proud of what our GSoC 2025 contributors achieved and the impact they have made on the Kubeflow ecosystem. Their work not only strengthens existing components but also lays the foundation for future innovation in MLOps and AI infrastructure.
255+
256+
A huge **thank you** 🙏 to all contributors, mentors, and community members who made this program a success.
257+
258+
---
259+
260+
## 👩‍💻 Want to Get Involved?
261+
262+
The Kubeflow community is open to contributors of all backgrounds and skill levels. Whether you are passionate about ML infrastructure, frontend, DevOps, or documentation, there’s a place for you here.
263+
264+
- 💻 Visit our [website](https://www.kubeflow.org/docs/about/community/) and [GitHub](https://github.com/kubeflow)
265+
- 💬 Join our [Slack](https://www.kubeflow.org/docs/about/community/)
266+
- 🗓️ Attend the [community calls](https://www.kubeflow.org/docs/about/community/#kubeflow-community-call)
267+
- 📩 Subscribe to the [kubeflow-discuss](https://groups.google.com/g/kubeflow-discuss) mailing list
268+
269+
Let’s continue building the future of MLOps together 🚀