feat: federated learning use case implementation #70

Draft
wants to merge 10 commits into
base: main

Conversation

ferrarimarco
Member

This commit introduces the Federated Learning core platform use case.

ferrarimarco changed the title from "feat: federated learning use case implementation (#49)" to "feat: federated learning use case implementation" on Dec 12, 2024
@ferrarimarco
Member Author

@arueth Can you please rebase int-federated-learning on top of main? That would be helpful for this branch as well. Thanks!

@ferrarimarco
Member Author

PS: the only conflict is about a typo I fixed in platforms/gke/base/_shared_config/scripts/set_environment_variables.sh. It's no longer applicable.

ferrarimarco self-assigned this on Dec 20, 2024
arueth force-pushed the int-federated-learning branch from 9123bf4 to e5385e4 on December 20, 2024 15:51
arueth force-pushed the int-federated-learning branch 2 times, most recently from 2bac60b to 9588c0b, on January 14, 2025 16:42
arueth force-pushed the int-federated-learning branch from 15a8eb2 to a39221f on January 17, 2025 19:16
@ferrarimarco
Member Author

ferrarimarco commented Jan 24, 2025

@arueth The build is failing because, even though the KMS API is enabled, it likely needs some time to set resources up.

Did we already solve this issue in the core platform? It seems that other APIs might have this problem too, but we didn't notice it until we used the same project for CI.

If we haven't solved it yet, I can look into it.

@arueth
Collaborator

arueth commented Jan 28, 2025

> @arueth The build is failing because, even though the KMS API is enabled, it likely needs some time to set resources up.
>
> Did we already solve this issue in the core platform? It seems that other APIs might have this problem too, but we didn't notice it until we used the same project for CI.
>
> If we haven't solved it yet, I can look into it.

In the past, I've resolved these by forcing a dependency on the google_project_service resource through google_project_service.<service>.project. If a dependency doesn't resolve it, you could either enable the API in the initialize terraservice, or we'd have to hack in a check with an API call or a gcloud command.

@ferrarimarco
Member Author

ferrarimarco commented Jan 29, 2025

> In the past, I've resolved these by forcing a dependency on the google_project_service resource through google_project_service.<service>.project.

This probably wouldn't help in this case, because the implicit dependency already ensures that the CryptoKey resource is created only after the "enable API" call returns (google_project_service.cloudkms_googleapis_com is reported as created):

google_project_service.cloudkms_googleapis_com: Creating...
google_project_service.container_googleapis_com: Creating...
google_project_service.cloudkms_googleapis_com: Still creating... [10s elapsed]
google_project_service.container_googleapis_com: Still creating... [10s elapsed]
google_project_service.cloudkms_googleapis_com: Still creating... [20s elapsed]
google_project_service.container_googleapis_com: Still creating... [20s elapsed]
google_project_service.cloudkms_googleapis_com: Creation complete after 21s [id=redacted/cloudkms.googleapis.com]
google_project_service.container_googleapis_com: Creation complete after 21s [id=redacted/container.googleapis.com]

[...]

google_kms_crypto_key_iam_binding.cluster_secrets_encrypters: Creating...
google_kms_crypto_key_iam_binding.cluster_secrets_decrypters: Creating...

Error: Error retrieving IAM policy for KMS CryptoKey "projects/redacted-project/locations/us-central1/keyRings/redacted-keyring/cryptoKeys/redacted-fl-clusterSecretsKey": googleapi: Error 403: Google Cloud KMS API has not been used in project REDACTED_PROJECT_NUMBER before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/cloudkms.googleapis.com/overview?project=REDACTED_PROJECT_NUMBER then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry., forbidden

[...]

> If a dependency doesn't resolve it, you could either enable the API in the initialize terraservice, or we'd have to hack in a check with an API call or a gcloud command.

I think this would be the way forward.
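
A minimal sketch of what such a check could look like, assuming the gcloud CLI is installed and authenticated. The function names retry_until_success and kms_api_is_ready are hypothetical, as are the attempt count and delay. The probe assumes that listing key rings keeps failing until the KMS API has propagated, which this thread doesn't confirm.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run a command repeatedly until it succeeds
# or the maximum number of attempts is reached.
# Usage: retry_until_success MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until_success() {
  local max_attempts="$1"
  local delay_seconds="$2"
  shift 2
  local attempt=1
  while true; do
    # Discard the probe's output; only its exit status matters here.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    if [ "${attempt}" -ge "${max_attempts}" ]; then
      echo "Command still failing after ${attempt} attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "${delay_seconds}"
  done
}

# Hypothetical probe: assumes this call only succeeds once the
# KMS API is actually usable in the project.
# Usage: kms_api_is_ready PROJECT_ID LOCATION
kms_api_is_ready() {
  gcloud kms keyrings list --project="$1" --location="$2"
}
```

A provisioning script could then call `retry_until_success 30 10 kms_api_is_ready "${PROJECT_ID}" us-central1` before running terraform apply, where PROJECT_ID is a placeholder for the target project.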

This commit introduces the Federated Learning core platform use case.

As a first step to get a feel for how to integrate an existing
use case with the platform, we provision only a simple resource (an
Artifact Registry repository).
Configure Private Google Access for the federated learning use case
* chore: simplify fl scripts

* chore: remove initialize terraservice in fl

Remove the 'initialize' terraservice in the Federated Learning use case
because the core platform 'initialize' terraservice takes care of
configuring backends for use cases after #71 is merged.

The only task implemented in the use case 'initialize' terraservice was
to initialize backend configuration, so we don't need it anymore.

Also, simplify the provisioning and teardown scripts: we no longer need
two different terraform init commands because all the terraservices in
the use case now work with a remote backend.
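
With every terraservice on a remote backend, the provisioning loop could be reduced to something like the following sketch. It is not the script from this commit: the function name provision_terraservices, the TERRAFORM override variable, and the directory arguments are all hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical sketch: provision each terraservice in the given order,
# using the same init command for all of them.
# TERRAFORM can override the binary (assumption, handy for dry runs).
# Usage: provision_terraservices DIR [DIR...]
provision_terraservices() {
  local terraform_bin="${TERRAFORM:-terraform}"
  local terraservice_dir
  for terraservice_dir in "$@"; do
    echo "Provisioning ${terraservice_dir}"
    # Stop at the first failure so later terraservices are not applied
    # on top of a broken one.
    "${terraform_bin}" -chdir="${terraservice_dir}" init -input=false || return 1
    "${terraform_bin}" -chdir="${terraservice_dir}" apply -input=false -auto-approve || return 1
  done
}
```

A teardown script could mirror this by walking the same directories in reverse order and running destroy instead of apply.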
Configure the GKE cluster for the federated learning use case
- configure firewall for federated learning
- configure iam roles and service accounts
- configure dedicated node pools
- configure policy controller and policies
- configure dedicated Kubernetes namespaces
ferrarimarco force-pushed the int-federated-learning branch from 4a79619 to 8582f43 on February 3, 2025 09:58
2 participants