Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* DocAI Form Parser microservice (#12)
* DocAI form parser processor integration
* form processor build container image script
* DocAI form parser code integration
* DocAI Form Parser fixes
* Changes:
  - Re-sync'd the development constraints (shared) based on the form parser requirements.in
  - Moved requirements.txt to requirements.in for form parser
  - Updated tasks.py to also generate requirements.txt from requirements.in
  - Reformatted terraform from pre-commit

---------

Co-authored-by: Mark Scannell <[email protected]>

* down stream tasks only depends on the supported files are moved but will wait for pdf form processor to finish (#13)
* made downstream tasks only depends on files move but wait for pdf forms files moved
* ignore pylint import errors
* form-parser-metadata-load-bigquery
* Composer task to trigger Doc AI Form Parser Cloud Run- first version
* form-parser-metadata-load-bigquery (#18)
* form-parser-metadata-load-bigquery
* fixes in form parser
* Updated README.md
* Updated DPU to EKS and user agent string and label for revenue tracking
* skip pre-commit
* removing pre-commit check for terraform fmt

---------

Co-authored-by: Dharmesh Patel <[email protected]>

* Updated Ref. Arch. diagram and added DATAFLOW.md
* Changed labels to eks-solution
* Composer task to trigger DocAI Form Parser and metadata update for Form parser
* Updated labels for tracking (#19)
* DocAI form API microservice trigger from Cloud Composer
* fix in form parser
* Fixed type issue from assigning a `str | None` type to `str` type when reading environment variables. This is done by calling the `os.environ[]` instead of `os.environ.get()` method. This will fail fast if the environment variable does not exist.
* batch deletion based on batch-id (#16)
* location parameter and batch-id based deletion
* Updated the README.md for batch delete.
* updated the delete_doc.sh script

---------

Co-authored-by: Dharmesh Patel <[email protected]>

* Parallelized form parsing and docs parsing, including importing to the data store. (#21)

Co-authored-by: Eyal Ben Ivri <[email protected]>

* refactored many of the operations in the DAG to a utils package, to reduce complexity and code in the DAG file, and move logic to other files, where the logic is separated from the airflow runtime.
* reordered dag steps and dependencies to optimize runtime
* down stream tasks only depends on the supported files are moved but will wait for pdf form processor to finish (#13)
* made downstream tasks only depends on files move but wait for pdf forms files moved
* ignore pylint import errors
* Composer task to trigger Doc AI Form Parser Cloud Run- first version
* Composer task to trigger DocAI Form Parser and metadata update for Form parser
* DocAI form API microservice trigger from Cloud Composer
* form-parser-metadata-load-bigquery (#18)
* form-parser-metadata-load-bigquery
* fixes in form parser
* Updated README.md
* Updated DPU to EKS and user agent string and label for revenue tracking
* skip pre-commit
* removing pre-commit check for terraform fmt

---------

Co-authored-by: Dharmesh Patel <[email protected]>

* Updated labels for tracking (#19)
* Fixed type issue from assigning a `str | None` type to `str` type when reading environment variables. This is done by calling the `os.environ[]` instead of `os.environ.get()` method. This will fail fast if the environment variable does not exist.
* Parallelized form parsing and docs parsing, including importing to the data store. (#21)

Co-authored-by: Eyal Ben Ivri <[email protected]>

* refactored many of the operations in the DAG to a utils package, to reduce complexity and code in the DAG file, and move logic to other files, where the logic is separated from the airflow runtime.
* reordered dag steps and dependencies to optimize runtime
* removed commented out step
* added license information to new files.
* added license information to new files.
* Copy all files in `src` folder to `dags` folder in GCS

---------

Co-authored-by: anuradha-bajpai-google <[email protected]>
Co-authored-by: Mark Scannell <[email protected]>
Co-authored-by: Charlie Wang <[email protected]>
Co-authored-by: anuradha-bajpai-google <[email protected]>
Co-authored-by: Dharmesh Patel <[email protected]>
Co-authored-by: Mark Scannell <[email protected]>
Co-authored-by: Eyal Ben Ivri <[email protected]>
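The commit message notes that reading environment variables through `os.environ[]` rather than `os.environ.get()` resolves a `str | None` versus `str` type mismatch and fails fast when a variable is missing. The following is a minimal sketch of that difference, not the code from this commit; the helper names `require_env` and `optional_env` and the variable name `EXAMPLE_REQUIRED_VAR` are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

import os


def require_env(name: str) -> str:
    """Return a required environment variable.

    Indexing os.environ yields a plain `str` and raises KeyError
    immediately if the variable is not set (fail fast).
    """
    return os.environ[name]


def optional_env(name: str) -> str | None:
    """Return an environment variable, or None if it is not set.

    os.environ.get() is typed `str | None`, so assigning its result to a
    `str` variable is what a type checker would flag.
    """
    return os.environ.get(name)


if __name__ == "__main__":
    # EXAMPLE_REQUIRED_VAR is a hypothetical name used only for this demo.
    os.environ["EXAMPLE_REQUIRED_VAR"] = "demo"
    print(require_env("EXAMPLE_REQUIRED_VAR"))

    del os.environ["EXAMPLE_REQUIRED_VAR"]
    print(optional_env("EXAMPLE_REQUIRED_VAR"))
    try:
        require_env("EXAMPLE_REQUIRED_VAR")
    except KeyError as err:
        print(f"missing required variable: {err}")
```

With the indexing form, a misconfigured deployment surfaces as a `KeyError` at the point the variable is read, instead of a `None` propagating into later calls.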
Showing 12 changed files with 390 additions and 67 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
# Copyright 2024 Google LLC | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# https://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. |