feat: add a federated learning example using nvidia flare #99

ferrarimarco · 2025-01-28T08:56:21Z

Federated Learning use case:
- Implement an example that deploys NVIDIA FLARE in the cluster
- Support getting Terraform output in JSON format
- Implement a service to provision Cloud Storage buckets
- Enable Kubernetes network policy logging
- Don't enforce injecting Cloud Service Mesh sidecars in the kube-system namespace
- Fix Cloud Service Mesh auto-injection namespace labels
- Add tolerations to istio-egress so it can be deployed in the cluster
- Fix mutator configuration to add the right tolerations to federated learning workloads
- Allow DNS queries to GKE control plane
- Create missing tenant service account
- Configure Terraform outputs to get the Artifact Registry repository FQDN
- Use the truncated Cloud Build BUILD_ID as the project name suffix to avoid conflicts with projects pending deletion
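
For readers unfamiliar with the Terraform output item in the list above, here is a minimal sketch of consuming JSON-formatted outputs. `terraform output -json` prints an object that maps each output name to an entry with `value`, `type`, and `sensitive` fields. The output name `artifact_registry_repository_fqdn` and the sample value are hypothetical and are not taken from this PR.

```python
import json


def read_terraform_outputs(raw_json: str) -> dict:
    """Flatten the text printed by `terraform output -json` into {name: value}."""
    parsed = json.loads(raw_json)
    # Each entry looks like {"value": ..., "type": ..., "sensitive": ...}.
    return {name: entry["value"] for name, entry in parsed.items()}


# Hypothetical payload standing in for real `terraform output -json` text.
SAMPLE_OUTPUT = json.dumps(
    {
        "artifact_registry_repository_fqdn": {
            "sensitive": False,
            "type": "string",
            "value": "us-central1-docker.pkg.dev",
        }
    }
)

outputs = read_terraform_outputs(SAMPLE_OUTPUT)
print(outputs["artifact_registry_repository_fqdn"])
```

In a real pipeline the input would likely come from running `terraform output -json` in the relevant Terraform directory rather than from a hard-coded string.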

arueth · 2025-02-12T18:32:22Z

It seems like you are really intertwining this use case with your "core" federated learning platform. Is it possible to keep them more distinct, possibly by using a separate config file or a different folder structure?

To me this seems like an additional use case on top of your "core" platform, similar to Fine Tuning and RAG on the AI/ML platform.

ferrarimarco · 2025-02-14T09:23:22Z

Good point, let me see how I can restructure this.

ferrarimarco · 2025-02-14T15:34:31Z

@arueth I've reworked things a bit, PTAL, thanks.

arueth · 2025-02-18T18:40:25Z

It looks like we need to find a way to generate a unique project name per build. Right now the job is failing because the project with that SHORT_SHA already exists in a pending delete state. Maybe we can use a truncated version of BUILD_ID.

ferrarimarco · 2025-02-19T09:29:12Z

> It looks like we need to find a way to generate a unique project name per build. Right now the job is failing because the project with that SHORT_SHA already exists in a pending delete state. Maybe we can use a truncated version of BUILD_ID.

Done in the latest commit. I used the BUILD_ID as you suggested, truncating it to 7 characters, the same length as a short Git hash.
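
The truncation described in this comment can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: the `fl-ci` prefix, the function name, and the sample build ID are hypothetical. The validation pattern reflects the documented Google Cloud project ID format (6 to 30 characters, lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen).

```python
import re

# Same length as a short Git hash, as described in the comment above.
SHORT_ID_LENGTH = 7

# Google Cloud project ID format: 6-30 chars, starts with a lowercase letter,
# contains lowercase letters, digits, or hyphens, and does not end with a hyphen.
PROJECT_ID_PATTERN = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")


def project_id_for_build(prefix: str, build_id: str) -> str:
    """Build a per-build project ID from a prefix and a truncated build ID."""
    suffix = build_id[:SHORT_ID_LENGTH].lower()
    project_id = f"{prefix}-{suffix}"
    if not PROJECT_ID_PATTERN.match(project_id):
        raise ValueError(f"invalid project ID: {project_id!r}")
    return project_id


# Hypothetical UUID-style build ID; real values come from Cloud Build.
print(project_id_for_build("fl-ci", "0f3a9c1e-7b2d-4e5f-8a6b-1c2d3e4f5a6b"))
```

Because every build gets a new BUILD_ID, re-running a build for the same commit yields a different suffix, which is what avoids the collision with a project that is still pending deletion.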

ferrarimarco self-assigned this Jan 28, 2025

ferrarimarco changed the base branch from main to int-federated-learning January 28, 2025 08:56

ferrarimarco force-pushed the example-fl-nvflare branch 7 times, most recently from 626741c to 7fe70e3 Compare January 30, 2025 10:07

ferrarimarco force-pushed the int-federated-learning branch from 4a79619 to 8582f43 Compare February 3, 2025 09:58

ferrarimarco force-pushed the example-fl-nvflare branch 4 times, most recently from c4ed051 to 5a4ee0a Compare February 7, 2025 18:48

ferrarimarco force-pushed the example-fl-nvflare branch 5 times, most recently from e272a6c to 702690b Compare February 11, 2025 08:52

ferrarimarco marked this pull request as ready for review February 11, 2025 08:52

ferrarimarco requested a review from arueth February 12, 2025 08:04

ferrarimarco force-pushed the example-fl-nvflare branch 4 times, most recently from 1db2fc8 to 59b4a55 Compare February 14, 2025 15:05

feat: provide an example for federated learning

f112192

ferrarimarco force-pushed the example-fl-nvflare branch from 59b4a55 to f112192 Compare February 19, 2025 09:28

arueth approved these changes Feb 20, 2025

ferrarimarco merged commit 3963e42 into int-federated-learning Feb 20, 2025
14 checks passed

ferrarimarco deleted the example-fl-nvflare branch February 20, 2025 16:30
