This document provides guidelines for contributing to the Enterprise Knowledge Solution.
Contributions to this project must be accompanied by a Contributor License Agreement (CLA). You (or your employer) retain the copyright to your contribution; this simply gives us permission to use and redistribute your contributions as part of the project.
If you or your current employer have already signed the Google CLA (even if it was for a different project), you probably don't need to do it again.
Visit https://cla.developers.google.com/ to see your current agreements or to sign a new one.
This project follows Google's Open Source Community Guidelines.
All submissions, including submissions by project members, require review. We use GitHub pull requests for this purpose.
The `main` branch corresponds to the latest release. Tagged releases are also maintained for major milestones. When developing a new feature for an upcoming version, create a feature branch from `main`, and create a PR to merge the feature branch to the latest version branch when complete.
To manage dependencies across multiple projects, we have adopted the following conventions:
Each Python component sits in its own folder under the `components/` folder and represents an independent feature of EKS, such as workflow orchestration, doc-processing, form-classifier, etc.
Each component declares its own dependencies in one of the following ways:

- `pyproject.toml`: For layered dependencies in a library that can be referenced by other components, place `pyproject.toml` in the root directory of the library. For example, see `components/processing/libs/processor-base/pyproject.toml`.
- `requirements.in`: For declaring dependencies used only by an individual component, use `requirements.in`. For example, see `components/dpu-workflow/requirements.in` (an illustrative snippet follows this list).
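As a minimal sketch, a component-local `requirements.in` simply lists that component's direct dependencies, one per line. The component path and package names below are illustrative assumptions, not taken from the repository:

```
# components/<my-component>/requirements.in (illustrative)
google-cloud-storage
functions-framework
```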
The files `requirements.txt` and `constraints.txt` are generated through a script; you should not edit them directly. To generate the `requirements.txt`, run the following commands:
- Ensure you have an empty `requirements.txt` file in the same folder as `pyproject.toml` or `requirements.in`:

  ```sh
  touch components/doc-classifier/src/requirements.txt
  ```

- Generate the locked `requirements.txt` with the following script:

  ```sh
  ./invoke.sh lock
  ```
To upgrade to the latest available version of dependencies, run the script with the following arguments:

```sh
./invoke.sh lock --upgrade
```

Or, to upgrade a single package:

```sh
./invoke.sh lock --upgrade-package package_name
```

This will upgrade all packages to the latest available version, except where a package has a pinned version specified in `pyproject.toml` or `requirements.in`. If you need to upgrade the version of such a package, manually change the pinned version, then run the upgrade command again.
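For example, if a pin in `requirements.in` is blocking an upgrade, bump the pin by hand and then re-lock just that package. The package name and versions below are illustrative:

```sh
# In requirements.in, change an illustrative pin such as some-package==1.2.3 to some-package==1.3.0,
# then regenerate the lock for that package:
./invoke.sh lock --upgrade-package some-package
```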
The file `reqs/requirements_all.in` is where the combined dependencies of each component are compiled (an illustrative excerpt follows this list):

- Dependencies declared in `pyproject.toml` are imported as editable:

  ```
  -e components/processing/libs/processor-base
  ```

- Dependencies declared in `requirements.in` are read as requirements:

  ```
  -r ../components/dpu-workflow/requirements.in
  ```
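Taken together, a combined `reqs/requirements_all.in` is expected to look something like the excerpt below; the exact entries depend on which components exist in the repository:

```
# reqs/requirements_all.in (illustrative excerpt)
-e components/processing/libs/processor-base
-r ../components/dpu-workflow/requirements.in
```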
To install all dependencies in the local venv, run the following script:

```sh
./invoke.sh sync
```
This repository defines a number of CI tests in `.github/workflows`. All tests must pass before a feature branch can be merged to a release branch.

Before a PR can be merged to a major version branch, it must pass an integration test that confirms the repository can be deployed and passes functional tests in a clean GCP project.
- A Cloud Build trigger is configured on an internal (non-public) GCP project to run on Pull Requests to each major version branch of this repository. The trigger runs the build defined at `/build/int.cloudbuild.yaml`, which does the following high-level tasks (a conceptual outline follows this list):
  - Create a new GCP project in an internal test environment
  - Run the `pre_tf_setup.sh` script to prepare the GCP project and deployer service account
  - Apply Terraform to create all the Terraform resources
  - Run functional tests to confirm the resources and services are working as intended
  - Tear down the ephemeral project
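As a rough sketch only, the stages above correspond to commands along these lines; the authoritative definition lives in `/build/int.cloudbuild.yaml`, and the project variable and test entry point shown here are illustrative assumptions:

```sh
# Conceptual outline of the integration test stages (not the actual build definition)
gcloud projects create "$EPHEMERAL_PROJECT_ID"     # create a throwaway test project
./pre_tf_setup.sh                                  # prepare the project and deployer service account
terraform init && terraform apply -auto-approve    # create all Terraform resources
pytest tests/                                      # hypothetical functional test entry point
terraform destroy -auto-approve                    # tear down the Terraform resources
gcloud projects delete "$EPHEMERAL_PROJECT_ID" --quiet   # remove the ephemeral project
```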
Many of the files in the repository are checked with linting tools and static code analysis for secure coding practices. This workflow is triggered by `.github/workflows/lint.yaml`, which runs multiple lint libraries in Super-Linter with the settings configured in `.github/linters/super-linter.env`.
To validate that your code passes these checks, use the following methods depending on your environment:

- GitHub Actions: GitHub Actions will automatically run all configured checks when a PR is created or modified.

- Local: You can manually trigger the checks in a Docker container from your local environment with the following command:

  ```sh
  scripts/lint.sh
  ```
For issues that can be fixed automatically, you can apply fixes in your local environment with either of the following methods:

- Fix mode: Run Super-Linter locally in fix mode by setting an environment variable that additionally runs automatic fixes for the configured libraries:

  ```sh
  export LINTER_CONTAINER_FIX_MODE=true
  scripts/lint.sh
  ```

- Devcontainer: Use a devcontainer with your preferred IDE to automatically configure all the extensions defined in this repository. Once installed, when you open the top-level folder of this repository, you will be prompted to launch the devcontainer with all linting extensions preconfigured. See the setup instructions for Visual Studio Code.