Commit fe28ee7

Merge pull request #734 from 10up/enhancement/728
Amazon Polly as a provider for the text-to-speech feature.
2 parents 77b529e + e13824e commit fe28ee7

13 files changed: +1976 -110 lines changed

README.md (+44 -1)
@@ -23,6 +23,7 @@
 * [Set Up OpenAI Embeddings Language Processing](#set-up-classification-via-openai-embeddings)
 * [Set Up OpenAI Whisper Language Processing](#set-up-audio-transcripts-generation-via-openai-whisper)
 * [Set Up Azure AI Language Processing](#set-up-text-to-speech-via-microsoft-azure)
+* [Set Up AWS Language Processing](#set-up-text-to-speech-via-amazon-polly)
 * [Set Up Azure AI Vision Image Processing](#set-up-image-processing-features-via-microsoft-azure)
 * [Set Up OpenAI DALL·E Image Processing](#set-up-image-generation-via-openai)
 * [Set Up OpenAI Moderation Language Processing](#set-up-comment-moderation-via-openai-moderation)
@@ -45,7 +46,7 @@ Tap into leading cloud-based services like [OpenAI](https://openai.com/), [Micro
 * Generate new images on demand to use in-content or as a featured image using [OpenAI's DALL·E 3 API](https://platform.openai.com/docs/guides/images)
 * Generate transcripts of audio files using [OpenAI's Whisper API](https://platform.openai.com/docs/guides/speech-to-text)
 * Moderate incoming comments for sensitive content using [OpenAI's Moderation API](https://platform.openai.com/docs/guides/moderation)
-* Convert text content into audio and output a "read-to-me" feature on the front-end to play this audio using [Microsoft Azure's Text to Speech API](https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/text-to-speech)
+* Convert text content into audio and output a "read-to-me" feature on the front-end to play this audio using [Microsoft Azure's Text to Speech API](https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/text-to-speech) or [Amazon Polly](https://aws.amazon.com/polly/)
 * Classify post content using [IBM Watson's Natural Language Understanding API](https://www.ibm.com/watson/services/natural-language-understanding/) and [OpenAI's Embedding API](https://platform.openai.com/docs/guides/embeddings)
 * BETA: Recommend content based on overall site traffic via [Microsoft Azure's AI Personalizer API](https://azure.microsoft.com/en-us/services/cognitive-services/personalizer/) *(note that this service has been [deprecated by Microsoft](https://learn.microsoft.com/en-us/azure/ai-services/personalizer/) and as such, will no longer work. We are looking to replace this with a new provider to maintain the same functionality (see [issue#392](https://github.com/10up/classifai/issues/392))*
 * Generate image alt text, image tags, and smartly crop images using [Microsoft Azure's AI Vision API](https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/)
@@ -77,6 +78,7 @@ Tap into leading cloud-based services like [OpenAI](https://openai.com/), [Micro
 * To utilize the Azure AI Vision Image Processing functionality or Text to Speech Language Processing functionality, you will need an active [Microsoft Azure](https://signup.azure.com/signup) account.
 * To utilize the Azure OpenAI Language Processing functionality, you will need an active [Microsoft Azure](https://signup.azure.com/signup) account and you will need to [apply](https://aka.ms/oai/access) for OpenAI access.
 * To utilize the Google Gemini Language Processing functionality, you will need an active [Google Gemini](https://ai.google.dev/tutorials/setup) account.
+* To utilize the AWS Language Processing functionality, you will need an active [AWS](https://console.aws.amazon.com/) account.
 
 ## Pricing
 
@@ -399,6 +401,47 @@ Note that [OpenAI](https://platform.openai.com/docs/guides/speech-to-text) can c
 * Click the button to preview the generated speech audio for the post.
 * View the post on the front-end and see a read-to-me feature has been added
 
+## Set Up Text to Speech (via Amazon Polly)
+
+### 1. Sign up for AWS (Amazon Web Services)
+
+* [Register for an AWS account](https://aws.amazon.com/free/) or sign in to your existing one.
+* Sign in to the AWS Management Console and open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).
+* Create an IAM user (if you don't already have one):
+  * In the navigation pane, choose **Users** and then click **Create user**.
+  * On the **Specify user details** page, under **User details**, enter a name for the new user in **User name**.
+  * Click **Next**.
+  * On the **Set permissions** page, under **Permissions options**, select **Attach policies directly**.
+  * Under **Permissions policies**, search for **polly** and select the **AmazonPollyFullAccess** policy.
+  * Click **Next**.
+  * On the **Review and create** page, review the choices you have made. When you are ready to proceed, click **Create user**.
+* In the navigation pane, choose **Users**.
+* Choose the name of the user for which you want to create access keys, and then choose the **Security credentials** tab.
+* In the **Access keys** section, click **Create access key**.
+* On the **Access key best practices & alternatives** page, select **Application running outside AWS**.
+* Click **Next**.
+* On the **Retrieve access key** page, choose **Show** to reveal the value of your user's secret access key.
+* Copy and save the credentials in a secure location on your computer, or click **Download .csv file** to save the access key ID and secret access key to a `.csv` file.
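The steps above attach the managed **AmazonPollyFullAccess** policy. If you prefer a narrower grant, a custom policy document could be generated along the following lines. This is only a sketch: `build_polly_policy` is a hypothetical helper, and the assumption that the integration needs just `polly:DescribeVoices` and `polly:SynthesizeSpeech` is not confirmed by this commit, so verify it against the plugin before relying on it.

```python
import json

# Assumption: listing voices and synthesizing speech are the only Polly
# operations needed. Both are real IAM action names, but the exact set the
# plugin calls is not confirmed by this commit.
ASSUMED_ACTIONS = ("polly:DescribeVoices", "polly:SynthesizeSpeech")


def build_polly_policy(actions=ASSUMED_ACTIONS):
    """Return an IAM policy document (as a dict) that allows only `actions`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Print JSON that can be pasted into the IAM console's policy editor.
    print(json.dumps(build_polly_policy(), indent=2))
```

The printed JSON would be attached to the IAM user in place of the managed policy; the rest of the access key steps stay the same.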
+
+### 2. Configure AWS credentials under Tools > ClassifAI > Language Processing > Text to Speech
+
+* Select **Amazon Polly** in the provider dropdown.
+* In the `AWS access key` field, enter the `Access key` copied from above.
+* In the `AWS secret access key` field, enter the `Secret access key` copied from above.
+* In the `AWS Region` field, enter your AWS region, e.g. `us-east-1`.
+* Click **Save Changes** (the page will reload).
+* If connected successfully, a new dropdown with the label "Voices" will be displayed.
+* Select the voice and voice engine you want to use.
+* Select a post type that should use this service.
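If the "Voices" dropdown does not appear after saving, it can help to check the three values outside WordPress. The sketch below is a standalone illustration, not part of the plugin: `validate_settings` and `list_voices` are hypothetical helpers, the region check is only a rough shape test, and `list_voices` assumes the third-party `boto3` package is installed and that the credentials may call `DescribeVoices`.

```python
import re

# Rough shape check only (e.g. "us-east-1", "ap-southeast-2", "us-gov-west-1").
# It does not prove the region exists or that Polly is offered there.
REGION_PATTERN = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+$")


def validate_settings(access_key, secret_key, region):
    """Return a list of problems; an empty list means the values look plausible."""
    problems = []
    if not access_key.strip():
        problems.append("AWS access key is empty")
    if not secret_key.strip():
        problems.append("AWS secret access key is empty")
    if not REGION_PATTERN.match(region.strip()):
        problems.append(f"AWS Region {region!r} does not look like a region code")
    return problems


def list_voices(access_key, secret_key, region, engine="standard"):
    """Return voice IDs from the first page of Polly's DescribeVoices response.

    Requires boto3 and network access; raises if the credentials are rejected.
    """
    import boto3  # third-party; imported lazily so validation works without it

    client = boto3.client(
        "polly",
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        region_name=region,
    )
    response = client.describe_voices(Engine=engine)
    return [voice["Id"] for voice in response["Voices"]]
```

Running `validate_settings` first catches empty fields and malformed regions locally; a successful `list_voices` call then suggests the same values should work in the settings screen.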
+
+### 3. Using the Text to Speech service
+
+* Assuming the post type selected is "post", create a new post and publish it.
+* After a few seconds, a "Preview" button will appear under the ClassifAI settings panel.
+* Click the button to preview the generated speech audio for the post.
+* View the post on the front-end and see a read-to-me feature has been added
+
 ## Set Up Image Processing features (via Microsoft Azure)
 
 Note that [Azure AI Vision](https://docs.microsoft.com/en-us/azure/cognitive-services/computer-vision/home#image-requirements) can analyze and crop images that meet the following requirements:

composer.json (+9 -2)
@@ -12,7 +12,8 @@
   "require": {
     "php": ">=7.4",
     "yahnis-elsts/plugin-update-checker": "5.1",
-    "ua-parser/uap-php": "dev-master"
+    "ua-parser/uap-php": "dev-master",
+    "aws/aws-sdk-php": "^3.300"
   },
   "autoload": {
     "psr-4": {
@@ -30,7 +31,8 @@
   },
   "scripts": {
     "lint": "phpcs -s . --runtime-set testVersion 7.4-",
-    "lint-fix": "phpcbf ."
+    "lint-fix": "phpcbf .",
+    "pre-autoload-dump": "Aws\\Script\\Composer\\Composer::removeUnusedServices"
   },
   "minimum-stability": "dev",
   "config": {
@@ -42,5 +44,10 @@
     "exclude": [
       "!/vendor/"
     ]
+  },
+  "extra": {
+    "aws/aws-sdk-php": [
+      "Polly"
+    ]
   }
 }
