Create summaries of a large corpus of documents using Generative AI.
This solution showcases how to summarize a large corpus of documents using Generative AI. It provides an end-to-end demonstration of document summarization: starting from raw documents, it detects text with Document AI Optical Character Recognition (OCR), summarizes that text on demand with Vertex AI LLM APIs, and stores the results in BigQuery.
To deploy this blueprint you must have an active billing account and billing permissions.
- User uploads a new document, triggering the webhook Cloud Function.
- Document AI extracts the text from the document file.
- A Vertex AI Large Language Model summarizes the document text.
- The document summaries are stored in BigQuery.
 
Configuration: 1 min. Deployment: 5 mins.
The module accepts the following input variables:

| Name | Description | Type | Default | Required |
|---|---|---|---|---|
| disable_services_on_destroy | Whether project services will be disabled when the resources are destroyed. | bool | false | no |
| documentai_location | Document AI location, see https://cloud.google.com/document-ai/docs/regions | string | "us" | no |
| labels | A set of key/value label pairs to assign to the resources deployed by this blueprint. | map(string) | {} | no |
| project_id | The Google Cloud project ID to deploy to. | string | n/a | yes |
| region | The Google Cloud region to deploy to. | string | "us-central1" | no |
| unique_names | Whether to use unique names for resources. | bool | false | no |
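A minimal invocation of the module might look like the sketch below. The module name, source path, and project values are illustrative placeholders; adjust them for your own checkout and project.

```hcl
module "doc_summarization" {
  # Placeholder source; point this at your local checkout or registry path.
  source = "./terraform-genai-doc-summarization"

  project_id          = "my-project-id" # required
  region              = "us-central1"   # optional, default shown
  documentai_location = "us"            # optional, default shown
  unique_names        = false           # optional, default shown

  labels = {
    env = "demo"
  }
}
```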
The module provides the following outputs:

| Name | Description |
|---|---|
| bigquery_dataset_id | The name of the BigQuery dataset created | 
| bucket_docs_name | The name of the docs bucket created | 
| bucket_main_name | The name of the main bucket created | 
| documentai_processor_id | The full Document AI processor path ID | 
| neos_walkthrough_url | The URL to launch the in-console tutorial for the Generative AI Document Summarization solution | 
| unique_id | The unique ID for this deployment | 
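Outputs can be referenced from the calling configuration, for example to surface the name of the BigQuery dataset that holds the summaries (the module name doc_summarization here is an illustrative placeholder):

```hcl
output "summaries_dataset" {
  description = "BigQuery dataset containing the document summaries"
  value       = module.doc_summarization.bigquery_dataset_id
}
```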
These sections describe requirements for using this module.
The following dependencies must be available:
- Terraform v0.13
- Terraform Provider for GCP plugin v3.0
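The version requirements above can be pinned in the calling configuration; this sketch assumes the hashicorp/google provider source used by Terraform 0.13 and later.

```hcl
terraform {
  # Minimum Terraform version for this module.
  required_version = ">= 0.13"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 3.0"
    }
  }
}
```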
 
A service account with the following roles must be used to provision the resources of this module:
- Storage Admin: roles/storage.admin
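If you manage the provisioning service account with Terraform as well, the role can be granted as in this sketch; the project ID and service-account email are placeholders.

```hcl
resource "google_project_iam_member" "provisioner_storage_admin" {
  project = "my-project-id" # placeholder
  role    = "roles/storage.admin"

  # Placeholder service-account email for the account that runs Terraform.
  member = "serviceAccount:provisioner@my-project-id.iam.gserviceaccount.com"
}
```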
A project with the following APIs enabled must be used to host the resources of this module:
- Google Cloud Storage JSON API: storage-api.googleapis.com
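The API can be enabled ahead of time with a google_project_service resource, for example (project ID is a placeholder):

```hcl
resource "google_project_service" "storage_api" {
  project = "my-project-id" # placeholder
  service = "storage-api.googleapis.com"

  # Keep the API enabled even if this resource is destroyed.
  disable_on_destroy = false
}
```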
Refer to the contribution guidelines for information on contributing to this module.
Please see our security disclosure process.