Use regression to predict units sold based on theme for our client.
Tip: If you don't have a markdown viewer such as Atom, you can render this file in Chrome by following this link.
- Ensure you have Miniconda installed and can run it from your shell. If not, download the installer for your platform here: https://docs.conda.io/en/latest/miniconda.html
NOTE
  - If you already have Anaconda installed, go ahead with the further steps; there is no need to install Miniconda.
  - If the conda command is not in your path, you can configure your shell by running conda init.
- Ensure you have git installed and can run it from your shell.
NOTE
  - If you have installed Git Bash or GitHub Desktop, the git CLI is not accessible from the command line by default. If so, add the path to git.exe to your system PATH. Here are the paths on a recent setup:
%LOCALAPPDATA%\Programs\Git\git-bash.exe
%LOCALAPPDATA%\GitHubDesktop\app-<ver>\resources\app\git\mingw64\bin\git.exe
- Ensure the invoke tool and pyyaml are installed in your base conda environment. If not, run:
(base):~$ pip install invoke
(base):~$ pip install pyyaml
- Switch to the root folder (i.e. the folder containing this file).
- List the collection of workflow automation tasks as follows:
(base):~/<proj-folder>$ inv -l
NOTE
  - Please make sure there are no spaces in the folder path. Environment setup fails if spaces are present.
- To verify prerequisites, run
(base):~/<proj-folder>$ inv debug.check-reqs
and check that no error messages (Error: ...) are printed.
- The environment is divided into two sections:
  - Core: these are must-have packages and will be set up by default. They are declared in deploy/conda_envs/<windows/linux>-cpu-64-dev.yml
  - Addons: these are for specific purposes, and you can choose which to install. Here are the addon options:
    - formatting: to enforce coding standards in your projects.
    - documentation: to auto-generate documentation from docstrings and/or create rst-style documentation to share online.
    - testing: to use automated test cases.
    - jupyter: to run the notebooks. This includes Jupyter extensions for spell check and advanced formatting.
    - extras: these are nice-to-haves or for pointed usage.
    - ts: install this to work with time series data.
    - pyspark: installs pyspark-related dependencies in the env.
  - Edit the addons in deploy/conda_envs/<addon-name>-<windows/linux>-cpu-64-dev.yml to suit your needs. Each package there has a line comment describing its purpose. From an installation standpoint, extras are treated as addons.
- You can edit them to your needs. All these packages, including addons and extras, are curated with specific versions and thoroughly tested for acceleration.
- While the choice is yours, please decide upfront for your project so that everyone uses the same options.
- Below you can see how to install the core environment and the addons separately. However, we strongly recommend updating the core env file with the addon packages and extras your project needs. This ensures there is only one version of the env file for your project.
- To run the reference notebooks and the production code, we recommend installing all addons.
- Tip: The default name of the env is ta-lib-dev. You can change it for your project.
  - For example, to name it env-myproject-prod:
    - Open tasks.py
    - Set ENV_PREFIX = 'env-customer-x'
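The prerequisite checks listed at the start of this guide (conda and git on the PATH, invoke and pyyaml installed, no spaces in the project path) can also be scripted. Below is a minimal Python sketch that is not part of this repository; it assumes the executables are named conda and git and that pyyaml is imported as yaml. The function name preflight is hypothetical, and inv debug.check-reqs remains the supported way to verify prerequisites.

```python
import importlib.util
import shutil
from pathlib import Path

# Executables that must be callable from the shell (assumed names).
REQUIRED_TOOLS = ("conda", "git")
# pip package name -> import name; pyyaml is imported as "yaml".
REQUIRED_MODULES = {"invoke": "invoke", "pyyaml": "yaml"}


def preflight(project_dir=".", tools=REQUIRED_TOOLS, modules=REQUIRED_MODULES):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' is not on PATH")
    for pkg, module in modules.items():
        if importlib.util.find_spec(module) is None:
            problems.append(f"'{pkg}' is not installed (pip install {pkg})")
    # Resolve to an absolute path so spaces in parent folders are caught too.
    full_path = str(Path(project_dir).resolve())
    if " " in full_path:
        problems.append(f"project path contains spaces: {full_path}")
    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print("Problem:", issue)
    if not issues:
        print("All prerequisite checks passed.")
```

Run it from the project root with the Python of your base environment, so that the package check looks at the same environment in which invoke is used.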
Run the command below to install the core libraries:
(base):~/<proj-folder>$ inv dev.setup-env --usecase=<specific usecase>
The above command should create a conda python environment named ta-lib-dev and install the code in the current repository along with all required dependencies.
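The core packages installed by default are declared in deploy/conda_envs/<windows/linux>-cpu-64-dev.yml, and each addon has its own deploy/conda_envs/<addon-name>-<windows/linux>-cpu-64-dev.yml. To see which file applies on your machine, here is a small Python sketch of that naming pattern. The helper names are hypothetical, and mapping platform.system() to "windows" or "linux" is an assumption; no env files are described for other platforms, so the sketch raises an error there.

```python
import platform
from pathlib import Path

ENV_DIR = Path("deploy") / "conda_envs"


def os_tag(system=None):
    """Map platform.system() (or an explicit name) to the <windows/linux> file-name part."""
    system = (system or platform.system()).lower()
    if system not in ("windows", "linux"):
        raise ValueError(f"no env file is described for platform {system!r}")
    return system


def env_file(addon=None, system=None):
    """Path of the core env file, or of an addon's env file when `addon` is given."""
    tag = os_tag(system)
    prefix = f"{addon}-" if addon else ""
    return ENV_DIR / f"{prefix}{tag}-cpu-64-dev.yml"


if __name__ == "__main__":
    try:
        core = env_file()
    except ValueError as err:
        print(err)
    else:
        print("core env file:", core, "(found)" if core.exists() else "(not found)")
```

For example, env_file("jupyter", system="Windows") gives deploy/conda_envs/jupyter-windows-cpu-64-dev.yml, which is the file to edit when customizing the jupyter addon.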
The usecase parameter is optional. It takes a value of tpo or mmx.
On its own, dev.setup-env installs only the core libraries. To work with a specific use case (e.g. MMX or TPO), you also need the libraries required for that use case. Passing the usecase option installs the dependencies for that use case into the environment as well.
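If you drive the setup from a script, it helps to reject an unsupported usecase before starting a long install. A minimal Python sketch, assuming only the two values named above (tpo and mmx) are valid; the helper name setup_env_command is hypothetical and simply builds the same command line shown above.

```python
# The two use cases named in this guide; any other value is rejected.
VALID_USECASES = ("tpo", "mmx")


def setup_env_command(usecase=None):
    """Build the argument list for `inv dev.setup-env`, validating --usecase if given."""
    cmd = ["inv", "dev.setup-env"]
    if usecase is not None:
        if usecase not in VALID_USECASES:
            raise ValueError(f"usecase must be one of {VALID_USECASES}, got {usecase!r}")
        cmd.append(f"--usecase={usecase}")
    return cmd
```

Usage: subprocess.run(setup_env_command("tpo"), check=True) runs the install from the project root and raises if inv exits with an error. Calling it with no argument installs only the core libraries.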
Activate the environment before installing the other addons. Keep the environment active for all the remaining commands in this manual.
(base):~/<proj-folder>$ conda activate ta-lib-dev
Install invoke and pyyaml in this env so that you can install the addons into it:
(ta-lib-dev):~/<proj-folder>$ pip install invoke
(ta-lib-dev):~/<proj-folder>$ pip install pyyaml
Now run the following command to install all the addons. Feel free to customize the addons as suggested in the introduction.
(ta-lib-dev):~/<proj-folder>$ inv dev.setup-addon --formatting --jupyter --documentation --testing --extras --ts
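Since everyone on the project should use the same addon choice, it can help to keep that choice in one list and derive both the install and the validation commands from it. A minimal Python sketch; the helper name addon_command is hypothetical, and the flag names come from the addon list in the introduction. Whether dev.setup-addon accepts --pyspark is an assumption, since only test.val-env is shown with that flag in this guide.

```python
# Addon names as listed in the introduction; each becomes a --<name> flag.
ADDONS = ("formatting", "documentation", "testing", "jupyter", "extras", "ts", "pyspark")


def addon_command(task, addons):
    """Build the argument list for an inv task that takes one --<addon> flag per addon."""
    unknown = [a for a in addons if a not in ADDONS]
    if unknown:
        raise ValueError(f"unknown addons: {unknown}")
    return ["inv", task] + [f"--{a}" for a in addons]


if __name__ == "__main__":
    chosen = ["formatting", "jupyter", "testing"]  # the project's agreed addon choice
    print(" ".join(addon_command("dev.setup-addon", chosen)))
    print(" ".join(addon_command("test.val-env", chosen)))
```

Keeping a single list means the environment you validate is the one you installed, and a typo in an addon name fails before inv runs.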
You should now have a standalone conda Python environment with the code in the current repository installed, along with all required dependencies.
- Get the installation info by running
(ta-lib-dev):~/<proj-folder>$ inv dev.info
- Test your installation by running
(ta-lib-dev):~/<proj-folder>$ inv test.val-env --usecase=<specific usecase>
  - Without --usecase, this checks only the core setup, i.e. the env set up by inv dev.setup-env. Specify the usecase to validate the use-case-specific dependencies as well as the core ones.
- To check the addon installation in the conda env, specify the addons to check, for example:
(ta-lib-dev):~/<proj-folder>$ inv test.val-env --formatting --jupyter --documentation --testing --extras --ts --pyspark
  - You can list only the addons whose installation you want to check.
- To launch JupyterLab locally in the web server, run
(ta-lib-dev):~/<proj-folder>$ inv launch.jupyterlab
After running the command, open localhost:8080 in your browser to see the launched JupyterLab.
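If localhost:8080 does not load, a quick first check is whether anything is listening on that port at all. A minimal sketch using only the Python standard library; the helper name is_listening is hypothetical, and it only confirms that a TCP connection succeeds, not that the server behind the port is JupyterLab.

```python
import socket


def is_listening(host="localhost", port=8080, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    state = "listening" if is_listening() else "not reachable"
    print(f"localhost:8080 is {state}")
```

If nothing is listening, check the output of inv launch.jupyterlab for errors before looking at the browser side.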
To run the production pipeline, switch to the production folder and run the CLI:
(ta-lib-dev):~/<proj-folder>$ cd production
(ta-lib-dev):~/<proj-folder>/production$ python cli.py
Now run the following commands for data cleaning, feature engineering, model training and model scoring respectively:
(ta-lib-dev):~/<proj-folder>/production$ python cli.py job run --job-id data_cleaning
(ta-lib-dev):~/<proj-folder>/production$ python cli.py job run --job-id feat_engg
(ta-lib-dev):~/<proj-folder>/production$ python cli.py job run --job-id model_gen
(ta-lib-dev):~/<proj-folder>/production$ python cli.py job run --job-id model_eval
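The four jobs above run in a fixed order, and each stage presumably consumes the output of the previous one, so it can be convenient to run them in one go and stop at the first failure. A minimal Python sketch, assuming each job signals failure with a non-zero exit code (not confirmed by this guide); the helper names are hypothetical. It uses sys.executable so the jobs run with the Python of the active ta-lib-dev environment.

```python
import subprocess
import sys

# Job IDs in the order given above: cleaning, feature engineering, training, scoring.
JOB_IDS = ("data_cleaning", "feat_engg", "model_gen", "model_eval")


def job_command(job_id):
    """Argument list equivalent to `python cli.py job run --job-id <job_id>`."""
    return [sys.executable, "cli.py", "job", "run", "--job-id", job_id]


def run_pipeline(job_ids=JOB_IDS, cwd="production", runner=subprocess.run):
    """Run the jobs in order; return the first job that fails, or None if all succeed."""
    for job_id in job_ids:
        result = runner(job_command(job_id), cwd=cwd)
        if result.returncode != 0:
            return job_id  # later jobs are skipped
    return None
```

Called from the project root, run_pipeline() returns None when all four jobs succeed, or the ID of the job to investigate. The runner parameter exists so the sequencing can be tested without launching the real jobs.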