The Open Source Repository of Flyte-based Projects
The purpose of this repository is to showcase Flyte's capabilities in end-to-end applications that do some form of data processing or machine learning.
The source code for each project can be found in the projects directory, where each project has its
own set of dependencies.
Fork the repo on GitHub, then clone it:
git clone https://github.com/<your-username>/flytelab
| Note |
|---|
| Make sure you're using Python > 3.7 |
Create a new branch for your project:
git checkout -b my_project  # replace this with your project name
| Note |
|---|
| For MLOps Community Engineering Labs Hackathon participants: each team will have its own branch on the main flyteorg/flytelab repo. If you're part of a team of more than one person, assign one teammate to create a project directory and push it into your team's branch. |
We use cookiecutter to manage project templates.
Install prerequisites:
pip install cookiecutter
In the root of the repo, create a new project:
cookiecutter templates/basic -o projects
| Note |
|---|
| The templates directory contains more templates for projects with different requirements. |
Answer the project setup questions:
project_name: my_project # replace this with your project name (can only contain alphanumeric characters and `_`)
project_author: foobar # replace this with your name
github_username: my_username # replace this with your github username
flyte_project: my_flyte_project # [optional]
description: project description # [optional]
| Note |
|---|
| For MLOps Community Engineering Labs Hackathon participants: project_author should be your team name, and flyte_project should be left as the default value. |
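Because the template only accepts letters, digits, and `_` in project_name, a quick local check can save a failed generation. The sketch below is a hypothetical helper (not part of cookiecutter or the template). Restricting to ASCII and the stricter isidentifier() check are our own assumptions, based on the fact that the template turns project_name into a Python package directory (my_project/ in the layout shown later).

```python
import re

# ASCII letters, digits and "_" (the template says "alphanumeric characters and _").
PROJECT_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_project_name(name: str) -> bool:
    """Check a candidate project_name against the template's character rule."""
    return PROJECT_NAME_PATTERN.fullmatch(name) is not None


def is_importable_package_name(name: str) -> bool:
    """Stricter check (assumption): the name must also be a valid Python
    identifier, since the template creates a package directory with this name."""
    return is_valid_project_name(name) and name.isidentifier()


if __name__ == "__main__":
    for candidate in ("my_project", "my-project", "2nd_project"):
        print(candidate, is_valid_project_name(candidate), is_importable_package_name(candidate))
```

For example, 2nd_project passes the character rule but would not be importable as a package, which is why the second check exists.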
The project structure looks like the following:
.
├── Dockerfile
├── README.md
├── dashboard
│   ├── app.py           # streamlit app
│   ├── remote.config
│   └── sandbox.config
├── deploy.py            # deployment script
├── my_project
│   ├── __init__.py
│   └── workflows.py     # flyte workflows
├── requirements-dev.txt
└── requirements.txt
Go into the project directory, then create your project's virtual environment:
cd projects/my_project
# create and activate virtual environment, name the venv whatever you want
python -m venv ~/venvs/my_project
source ~/venvs/my_project/bin/activate
# install requirements
pip install -r requirements.txt -r requirements-dev.txt
Run Flyte workflows locally:
python my_project/workflows.py
You should see something like this in the output (you can ignore the warnings):
trained model: LogisticRegression()
Congrats! You just set up your flytelab project.
You can now modify and iterate on the workflows.py file to create your very own Flyte
workflows using flytekit. You can refer to the
User Guide,
Tutorials,
and Flytekit API Reference to
learn more about all of Flyte's capabilities.
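As a starting point, here is a minimal sketch of the shape a workflows.py file takes: two tasks chained by a workflow, mirroring the get_dataset, train_model, and main entities that the template registers. The task bodies are hypothetical toy stand-ins (the template's version trains a LogisticRegression model), and the ImportError fallback exists only so this sketch also runs where flytekit isn't installed.

```python
from typing import List

try:
    from flytekit import task, workflow
except ImportError:
    # Fallback for this sketch only: plain pass-through decorators.
    # In a real project, flytekit is installed from requirements.txt.
    def task(fn):
        return fn

    def workflow(fn):
        return fn


@task
def get_dataset() -> List[float]:
    # Hypothetical toy data standing in for the template's dataset.
    return [1.0, 2.0, 3.0, 4.0]


@task
def train_model(data: List[float]) -> float:
    # Hypothetical "model": the mean of the data.
    return sum(data) / len(data)


@workflow
def main() -> float:
    # Inside a workflow, tasks are called with keyword arguments.
    data = get_dataset()
    return train_model(data=data)


if __name__ == "__main__":
    # Running the module directly executes the workflow locally.
    print(f"trained model: {main()}")
```

The type annotations matter: flytekit uses them to decide how to pass data between tasks, so every task input and output should be annotated.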
So far you've probably been running your workflows locally by invoking python my_project/workflows.py.
The first step to deploying your workflows to a Flyte cluster is to test them on a local sandbox cluster.
Make sure you have Docker installed.
Then install flytectl:
OSX:
brew install flyteorg/homebrew-tap/flytectl
Other operating systems:
curl -sL https://ctl.flyte.org/install | sudo bash -s -- -b /usr/local/bin  # you can change /usr/local/bin to any other directory
export PATH=$(pwd)/bin:$PATH  # only required if you installed flytectl into ./bin instead of /usr/local/bin
Start the sandbox cluster from your projects/my_project directory:
flytectl sandbox start --source .
Interacting with the Flyte sandbox
Get the status of the sandbox:
flytectl sandbox status
Tear down the sandbox:
flytectl sandbox teardown
| Note |
|---|
| If you're having trouble getting the Flyte sandbox to start, see the troubleshooting guide. |
You should now be able to go to http://localhost:30081/console in your browser to see the Flyte UI.
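Before deploying, a small preflight script can confirm that docker and flytectl are on your PATH and that the console answers at the sandbox URL. This is a hypothetical helper that uses only the Python standard library and assumes the default sandbox console URL shown above.

```python
import http.client
import shutil
import urllib.request
from typing import Iterable, List

# Sandbox console URL from the step above.
SANDBOX_CONSOLE_URL = "http://localhost:30081/console"


def missing_tools(tools: Iterable[str] = ("docker", "flytectl")) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def console_is_up(url: str = SANDBOX_CONSOLE_URL, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a non-error HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, refused connections and timeouts are OSErrors;
        # HTTPException covers malformed responses.
        return False


if __name__ == "__main__":
    print("missing tools:", missing_tools() or "none")
    print("console reachable:", console_is_up())
```

If console_is_up() returns False right after flytectl sandbox start, the sandbox may still be starting, so it is worth retrying after a short wait before reaching for the troubleshooting guide.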
git commit your changes, then deploy your project's workflows with:
python deploy.py
Expected output
You should see something like:
Successfully packaged 4 flyte objects into /Users/nielsbantilan/git/flytelab/projects/my_project/flyte-package.tgz
Registering Flyte workflows
---------------------------------------------------------------- --------- ------------------------------
| NAME (4) | STATUS | ADDITIONAL INFO |
---------------------------------------------------------------- --------- ------------------------------
| /tmp/register724861421/0_my_project.workflows.get_dataset_1.pb | Success | Successfully registered file |
---------------------------------------------------------------- --------- ------------------------------
| /tmp/register724861421/1_my_project.workflows.train_model_1.pb | Success | Successfully registered file |
---------------------------------------------------------------- --------- ------------------------------
| /tmp/register724861421/2_my_project.workflows.main_2.pb | Success | Successfully registered file |
---------------------------------------------------------------- --------- ------------------------------
| /tmp/register724861421/3_my_project.workflows.main_3.pb | Success | Successfully registered file |
---------------------------------------------------------------- --------- ------------------------------
4 rows
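If you deploy from a script or CI job, you may want to fail loudly when any entity doesn't register. The sketch below parses the table format shown above; it assumes that exact layout (which may change between flytectl versions), and the function names are hypothetical.

```python
from typing import List, Tuple


def parse_registration_table(output: str) -> List[Tuple[str, str]]:
    """Return (name, status) pairs from the registration table."""
    rows = []
    for raw_line in output.splitlines():
        line = raw_line.strip()
        if not line.startswith("|"):
            continue  # skip dashed separators, log lines and the "N rows" footer
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) < 2 or cells[1] == "STATUS":
            continue  # skip the header row and malformed lines
        rows.append((cells[0], cells[1]))
    return rows


def all_registered(output: str) -> bool:
    """True only if at least one row was found and every row reports Success."""
    rows = parse_registration_table(output)
    return bool(rows) and all(status == "Success" for _, status in rows)
```

To use it, you could capture the deploy output (for example with python deploy.py 2>&1, in case the table is written to stderr) and exit non-zero whenever all_registered() returns False.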
What just happened?
The python deploy.py command just did the following:
- Built the docker image specified in your project's Dockerfile from within the sandbox docker container.
- flytekit serialized your tasks and workflows into a flyte-package.tgz file.
- flytectl registered those Flyte-compatible artifacts to the local sandbox cluster.
On the Flyte UI, you'll see a flytelab-<project-name> project namespace on the homepage.
Navigate to the my_project.workflows.main workflow and hit the Launch Workflow button, then
the Launch button on the modal form.
Congrats! You just kicked off your first workflow on your local Flyte sandbox cluster.
By default, Flyte uses Docker images to encapsulate all the system and Python dependencies of your application. If you update those dependencies, you'll need to rebuild the Docker image. However, if you only want to quickly deploy code changes to your tasks and workflows, you can use fast registration:
python deploy.py --fast
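The rule above (rebuild the image when dependencies change, otherwise fast-register) can be turned into a small helper. This is a hypothetical sketch that assumes Dockerfile and requirements.txt are the only files feeding the image in the basic template; adjust DEPENDENCY_FILES if your Dockerfile copies in other files.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Assumption: in the basic template, only these files change the image's
# system and Python dependencies. Adjust for your own Dockerfile.
DEPENDENCY_FILES = {"Dockerfile", "requirements.txt"}


def needs_image_rebuild(changed_paths: Iterable[str]) -> bool:
    """True if any changed file is one of the image's dependency files."""
    return any(PurePosixPath(path).name in DEPENDENCY_FILES for path in changed_paths)


def choose_deploy_command(changed_paths: Iterable[str]) -> List[str]:
    """Full deploy when dependencies changed, fast registration otherwise."""
    if needs_image_rebuild(changed_paths):
        return ["python", "deploy.py"]
    return ["python", "deploy.py", "--fast"]


if __name__ == "__main__":
    print(choose_deploy_command(["my_project/workflows.py"]))
```

You could feed it the output of git diff --name-only to pick a command for your latest changes. Keep in mind that this choice only applies to the sandbox, since fast registering is not enabled on the Union.ai playground.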
The Union.ai team maintains a playground Flyte cluster that you can use to run your workflows.
When you're ready to deploy your workflows to a full-fledged production Flyte cluster, first you'll need to
request an account on the Flyte OSS Slack #flytelab channel.
| Note |
|---|
| For MLOps Community Engineering Labs Hackathon participants: you will receive these credentials after all teams have been finalized. |
You'll receive a username and password to sign in to the Union.ai Playground, as well as a client_id and client_secret if you want to use the FlyteRemote object to fetch the inputs and outputs of your workflow executions from the playground.
Create a personal access token (PAT) on GitHub, and make sure to give it read and write access to packages.
Then authenticate to the ghcr.io registry:
export CONTAINER_REPO_TOKEN="<your-token>"
echo $CONTAINER_REPO_TOKEN | docker login ghcr.io -u <your-username> --password-stdin
Then, deploying to the playground is as simple as:
python deploy.py --remote
What just happened?
The python deploy.py --remote command just did the following:
- Built the docker image specified in your project's Dockerfile.
- Pushed the image to the GitHub container registry under your username's package namespace.
- flytekit serialized your tasks and workflows into a flyte-package.tgz file.
- flytectl registered those Flyte-compatible artifacts to the playground cluster.
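To make the two "What just happened?" lists concrete, here is a dry-run sketch that only builds (and never runs) a command sequence a script like deploy.py could execute. These are our assumption of roughly equivalent CLI calls, not a copy of deploy.py: the image tag format, the development domain, and the version string are hypothetical, and the remote case would additionally need flytectl configured to point at the playground, which is not shown.

```python
from typing import List


def deploy_steps(project: str, username: str, version: str, remote: bool = False) -> List[List[str]]:
    """Return (but do not run) the commands for one deployment.

    Hypothetical reconstruction; the real deploy.py may differ.
    """
    # Hypothetical tag format: one image per project and version.
    image = f"ghcr.io/{username}/flytelab:{project}-{version}"
    build = ["docker", "build", ".", "--tag", image]
    steps = []
    if remote:
        # Remote: build locally, then push to the GitHub container registry.
        steps.append(build)
        steps.append(["docker", "push", image])
    else:
        # Sandbox: build inside the sandbox container so the cluster sees the image.
        steps.append(["flytectl", "sandbox", "exec", "--"] + build)
    # Serialize tasks and workflows into flyte-package.tgz.
    steps.append(["pyflyte", "--pkgs", project, "package", "--image", image, "--force"])
    # Register the serialized artifacts under the flytelab-<project> namespace.
    steps.append([
        "flytectl", "register", "files",
        "--project", f"flytelab-{project}",
        "--domain", "development",  # assumption: default domain
        "--archive", "flyte-package.tgz",
        "--version", version,
    ])
    return steps


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for step in deploy_steps("my_project", "<your-username>", "<version>", remote=True):
        print(" ".join(step))
```

A short commit hash (for example, from git rev-parse --short HEAD) is one convenient choice for the version string.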
Go to https://github.com/users/<your-username>/packages/container/flytelab/settings, and then:
- Click Add Repository to link your fork of the flytelab repo.
- Scroll down to the Danger Zone, click Change visibility, and make the package public.
Finally, go to https://playground.hosted.unionai.cloud and sign in with your Union.ai playground
username and password. From there, you can navigate to your flytelab-<project-name> project
to run your workflows.
| Note |
|---|
| Fast registering is currently not enabled in the Union.ai playground. |
The basic project template ships with a dashboard/app.py script that uses
streamlit as a UI for interacting with your model.
pip install streamlit
streamlit run dashboard/app.py
| Note |
|---|
| For the given example, make sure to run the workflow at least once before spinning up the streamlit server. |
To access the data on the Union.ai playground, first export your client_id and client_secret
to your terminal session.
export FLYTE_CREDENTIALS_CLIENT_ID="<client_id>"
export FLYTE_CREDENTIALS_CLIENT_SECRET="<client_secret>"
Then start serving your streamlit app with:
streamlit run dashboard/app.py -- --remote
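A dashboard has to decide which backend to talk to. One way to do it, sketched below with hypothetical names (this is not the template's app.py code), is to treat the --remote flag passed after -- (which Streamlit forwards to the script's sys.argv) and a FLYTE_BACKEND environment variable as equivalent, and to fail early if the credentials exported above are missing. Reading FLYTE_BACKEND from the environment assumes root-level Streamlit secrets are exposed as environment variables, as Streamlit's secrets documentation describes.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence

# Credential variables exported above for the playground.
REQUIRED_REMOTE_VARS = ("FLYTE_CREDENTIALS_CLIENT_ID", "FLYTE_CREDENTIALS_CLIENT_SECRET")


def resolve_backend(
    argv: Optional[Sequence[str]] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return "remote" or "sandbox" (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    parser.add_argument("--remote", action="store_true", help="use the playground backend")
    # parse_known_args ignores any other arguments the script might receive.
    args, _ = parser.parse_known_args(argv)
    if not (args.remote or environ.get("FLYTE_BACKEND") == "remote"):
        return "sandbox"
    missing = [name for name in REQUIRED_REMOTE_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("missing credentials: " + ", ".join(missing))
    return "remote"
```

In app.py, you could call resolve_backend() near the top of the script and load dashboard/sandbox.config or dashboard/remote.config accordingly.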
If you want to deploy your app to Streamlit Cloud to share it with the world, push your changes to the remote GitHub branch you're working from and point Streamlit Cloud to the app script:
flytelab/projects/my_project/dashboard/app.py
You'll need to use the secrets management feature in the Streamlit Cloud UI to add your client ID and client secret so that the app can access the playground cluster:
FLYTE_BACKEND = "remote"  # point the app to the playground backend
FLYTE_CREDENTIALS_CLIENT_ID = "<client_id>"  # replace this with your client ID
FLYTE_CREDENTIALS_CLIENT_SECRET = "<client_secret>"  # replace this with your client secret
You can also add other secrets to this file if needed.
