Commit 4d91802

Merge pull request #747 from 10up/feat/718
feat/718: add OpenAI Text to Speech as a Provider
2 parents fe28ee7 + 76d83b7 commit 4d91802

12 files changed: +671 -45 lines changed

README.md

+26 -3
@@ -23,6 +23,7 @@
 * [Set Up OpenAI Embeddings Language Processing](#set-up-classification-via-openai-embeddings)
 * [Set Up OpenAI Whisper Language Processing](#set-up-audio-transcripts-generation-via-openai-whisper)
 * [Set Up Azure AI Language Processing](#set-up-text-to-speech-via-microsoft-azure)
+* [Set Up OpenAI Text to Speech Processing](#set-up-text-to-speech-via-openai)
 * [Set Up AWS Language Processing](#set-up-text-to-speech-via-amazon-polly)
 * [Set Up Azure AI Vision Image Processing](#set-up-image-processing-features-via-microsoft-azure)
 * [Set Up OpenAI DALL·E Image Processing](#set-up-image-generation-via-openai)
@@ -46,7 +47,7 @@ Tap into leading cloud-based services like [OpenAI](https://openai.com/), [Micro
 * Generate new images on demand to use in-content or as a featured image using [OpenAI's DALL·E 3 API](https://platform.openai.com/docs/guides/images)
 * Generate transcripts of audio files using [OpenAI's Whisper API](https://platform.openai.com/docs/guides/speech-to-text)
 * Moderate incoming comments for sensitive content using [OpenAI's Moderation API](https://platform.openai.com/docs/guides/moderation)
-* Convert text content into audio and output a "read-to-me" feature on the front-end to play this audio using [Microsoft Azure's Text to Speech API](https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/text-to-speech) or [Amazon Polly](https://aws.amazon.com/polly/)
+* Convert text content into audio and output a "read-to-me" feature on the front-end to play this audio using [Microsoft Azure's Text to Speech API](https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/text-to-speech), [Amazon Polly](https://aws.amazon.com/polly/) or [OpenAI's Text to Speech API](https://platform.openai.com/docs/guides/text-to-speech)
 * Classify post content using [IBM Watson's Natural Language Understanding API](https://www.ibm.com/watson/services/natural-language-understanding/) and [OpenAI's Embedding API](https://platform.openai.com/docs/guides/embeddings)
 * BETA: Recommend content based on overall site traffic via [Microsoft Azure's AI Personalizer API](https://azure.microsoft.com/en-us/services/cognitive-services/personalizer/) *(note that this service has been [deprecated by Microsoft](https://learn.microsoft.com/en-us/azure/ai-services/personalizer/) and as such, will no longer work. We are looking to replace this with a new provider to maintain the same functionality (see [issue#392](https://github.com/10up/classifai/issues/392))*
 * Generate image alt text, image tags, and smartly crop images using [Microsoft Azure's AI Vision API](https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/)
@@ -74,7 +75,7 @@ Tap into leading cloud-based services like [OpenAI](https://openai.com/), [Micro
 * PHP 7.4+
 * [WordPress](http://wordpress.org) 6.1+
 * To utilize the NLU Language Processing functionality, you will need an active [IBM Watson](https://cloud.ibm.com/registration) account.
-* To utilize the ChatGPT, Embeddings, or Whisper Language Processing functionality or DALL·E Image Processing functionality, you will need an active [OpenAI](https://platform.openai.com/signup) account.
+* To utilize the ChatGPT, Embeddings, Text to Speech or Whisper Language Processing functionality or DALL·E Image Processing functionality, you will need an active [OpenAI](https://platform.openai.com/signup) account.
 * To utilize the Azure AI Vision Image Processing functionality or Text to Speech Language Processing functionality, you will need an active [Microsoft Azure](https://signup.azure.com/signup) account.
 * To utilize the Azure OpenAI Language Processing functionality, you will need an active [Microsoft Azure](https://signup.azure.com/signup) account and you will need to [apply](https://aka.ms/oai/access) for OpenAI access.
 * To utilize the Google Gemini Language Processing functionality, you will need an active [Google Gemini](https://ai.google.dev/tutorials/setup) account.
@@ -86,7 +87,7 @@ Note that there is no cost to using ClassifAI itself. Both IBM Watson and Micros

 IBM Watson's Natural Language Understanding ("NLU"), which is one of the providers that powers the classification feature, has a ["lite" pricing tier](https://www.ibm.com/cloud/watson-natural-language-understanding/pricing) that offers 30,000 free NLU items per month.

-OpenAI, which is one of the providers that powers the classification, title generation, excerpt generation, content resizing, audio transcripts generation, moderation and image generation features, has a limited free trial and then requires a [pay per usage](https://openai.com/pricing) plan.
+OpenAI, which is one of the providers that powers the classification, title generation, excerpt generation, content resizing, audio transcripts generation, text to speech, moderation and image generation features, has a limited free trial and then requires a [pay per usage](https://openai.com/pricing) plan.

 Microsoft Azure AI Vision, which is one of the providers that powers the descriptive text generator, image tags generator, image cropping, image text extraction and PDF text extraction features, has a ["free" pricing tier](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/computer-vision/) that offers 20 transactions per minute and 5,000 transactions per month.

@@ -349,6 +350,7 @@ IBM Watson's [Categories](https://cloud.ibm.com/docs/natural-language-understand
 ## Set Up Audio Transcripts Generation (via OpenAI Whisper)

 Note that [OpenAI](https://platform.openai.com/docs/guides/speech-to-text) can create a transcript for audio files that meet the following requirements:
+
 * The file must be presented in mp3, mp4, mpeg, mpga, m4a, wav, or webm format
 * The file size must be less than 25 megabytes (MB)

@@ -401,6 +403,27 @@ Note that [OpenAI](https://platform.openai.com/docs/guides/speech-to-text) can c
 * Click the button to preview the generated speech audio for the post.
 * View the post on the front-end and see a read-to-me feature has been added

+## Set Up Text to Speech (via OpenAI)
+
+### 1. Sign up for OpenAI
+
+* [Sign up for an OpenAI account](https://platform.openai.com/signup) or sign into your existing one.
+* If creating a new account, complete the verification process (requires confirming your email and phone number).
+* Log into your account and go to the [API key page](https://platform.openai.com/account/api-keys).
+* Click `Create new secret key` and copy the key that is shown.
+
+### 2. Configure OpenAI API Keys under Tools > ClassifAI > Language Processing > Text to Speech
+
+* Select **OpenAI Text to Speech** in the provider dropdown.
+* Enter your API Key copied from the above step into the `API Key` field.
+
+### 3. Using the Text to Speech service
+
+* Assuming the post type selected is "post", create a new post and publish it.
+* After a few seconds, a "Preview" button will appear under the ClassifAI settings panel.
+* Click the button to preview the generated speech audio for the post.
+* View the post on the front-end and see a read-to-me feature has been added
+
 ## Set Up Text to Speech (via Amazon Polly)

 ### 1. Sign up for AWS (Amazon Web Services)
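
For orientation, the API key configured in the OpenAI Text to Speech steps added to the README above is ultimately sent to OpenAI's speech endpoint. The following is a minimal sketch of such a request, not the plugin's provider class (which is not part of this excerpt). The helper name `build_tts_request` is hypothetical, and the endpoint URL, the `tts-1` model and the `alloy` voice are assumptions taken from OpenAI's public Text to Speech guide rather than from this commit.

```php
<?php
// Assumption: endpoint as documented in OpenAI's public Text to Speech guide.
const OPENAI_TTS_URL = 'https://api.openai.com/v1/audio/speech';

/**
 * Hypothetical helper: builds HTTP request arguments for a speech request.
 *
 * @param string $api_key Secret key created on the OpenAI API key page.
 * @param string $text    Text to convert into audio.
 * @param string $voice   Voice name (assumed default: 'alloy').
 * @param string $model   Model name (assumed default: 'tts-1').
 *
 * @return array Arguments in the shape accepted by wp_remote_post().
 */
function build_tts_request( string $api_key, string $text, string $voice = 'alloy', string $model = 'tts-1' ): array {
	return [
		'headers' => [
			'Authorization' => 'Bearer ' . $api_key,
			'Content-Type'  => 'application/json',
		],
		// The API expects a JSON document with the model, input text and voice.
		'body'    => json_encode(
			[
				'model' => $model,
				'input' => $text,
				'voice' => $voice,
			]
		),
		'timeout' => 60,
	];
}
```

Inside WordPress, the returned array could be passed as the second argument to `wp_remote_post( OPENAI_TTS_URL, $args )`. A successful response carries audio rather than JSON, which is why the commit also teaches `APIRequest::get_result()` to accept the `audio/mpeg` content type.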

includes/Classifai/Features/TextToSpeech.php

+27 -1
@@ -5,6 +5,8 @@
 use Classifai\Services\LanguageProcessing;
 use Classifai\Providers\Azure\Speech;
 use Classifai\Providers\AWS\AmazonPolly;
+use Classifai\Providers\OpenAI\TextToSpeech as OpenAITTS;
+use Classifai\Normalizer;
 use WP_REST_Server;
 use WP_REST_Request;
 use WP_Error;
@@ -44,6 +46,14 @@ class TextToSpeech extends Feature {
 	 */
 	const DISPLAY_GENERATED_AUDIO = '_classifai_display_generated_audio';

+	/**
+	 * Meta key to get/set the audio hash that helps to indicate if there is any need
+	 * for the audio file to be regenerated or not.
+	 *
+	 * @var string
+	 */
+	const AUDIO_HASH_KEY = '_classifai_post_audio_hash';
+
 	/**
 	 * Constructor.
 	 */
@@ -55,8 +65,9 @@ public function __construct() {

 		// Contains just the providers this feature supports.
 		$this->supported_providers = [
-			Speech::ID => __( 'Microsoft Azure AI Speech', 'classifai' ),
 			AmazonPolly::ID => __( 'Amazon Polly', 'classifai' ),
+			Speech::ID => __( 'Microsoft Azure AI Speech', 'classifai' ),
+			OpenAITTS::ID => __( 'OpenAI Text to Speech', 'classifai' ),
 		];
 	}

@@ -840,6 +851,21 @@ public function get_audio_generation_subsequent_state( $post = null ): bool {
 		return apply_filters( 'classifai_audio_generation_subsequent_state', false, get_post( $post ) );
 	}

+	/**
+	 * Normalizes the post content for text to speech generation.
+	 *
+	 * @param int $post_id The post ID.
+	 *
+	 * @return string The normalized post content.
+	 */
+	public function normalize_post_content( int $post_id ): string {
+		$normalizer = new Normalizer();
+		$post = get_post( $post_id );
+		$post_content = $normalizer->normalize_content( $post->post_content, $post->post_title, $post_id );
+
+		return $post_content;
+	}
+
 	/**
 	 * Generates feature setting data required for migration from
 	 * ClassifAI < 3.0.0 to 3.0.0

includes/Classifai/Providers/AWS/AmazonPolly.php

+3 -14
@@ -9,7 +9,6 @@
 namespace Classifai\Providers\AWS;

 use Classifai\Providers\Provider;
-use Classifai\Normalizer;
 use Classifai\Features\TextToSpeech;
 use WP_Error;
 use Aws\Sdk;
@@ -18,14 +17,6 @@ class AmazonPolly extends Provider {

 	const ID = 'aws_polly';

-	/**
-	 * Meta key to get/set the audio hash that helps to indicate if there is any need
-	 * for the audio file to be regenerated or not.
-	 *
-	 * @var string
-	 */
-	const AUDIO_HASH_KEY = '_classifai_post_audio_hash';
-
 	/**
 	 * AmazonPolly Text to Speech constructor.
 	 *
@@ -374,12 +365,10 @@ public function synthesize_speech( int $post_id ) {
 			);
 		}

-		$normalizer = new Normalizer();
 		$feature = new TextToSpeech();
 		$settings = $feature->get_settings();
-		$post = get_post( $post_id );
-		$post_content = $normalizer->normalize_content( $post->post_content, $post->post_title, $post_id );
-		$content_hash = get_post_meta( $post_id, self::AUDIO_HASH_KEY, true );
+		$post_content = $feature->normalize_post_content( $post_id );
+		$content_hash = get_post_meta( $post_id, TextToSpeech::AUDIO_HASH_KEY, true );
 		$saved_attachment_id = (int) get_post_meta( $post_id, $feature::AUDIO_ID_KEY, true );

 		// Don't regenerate the audio file it it already exists and the content hasn't changed.
@@ -453,7 +442,7 @@ public function synthesize_speech( int $post_id ) {
 			$polly_client = $this->get_polly_client();
 			$result = $polly_client->synthesizeSpeech( $synthesize_data );

-			update_post_meta( $post_id, self::AUDIO_HASH_KEY, md5( $post_content ) );
+			update_post_meta( $post_id, TextToSpeech::AUDIO_HASH_KEY, md5( $post_content ) );
 			$contents = $result['AudioStream']->getContents();
 			return $contents;
 		} catch ( \Exception $e ) {

includes/Classifai/Providers/Azure/Speech.php

+3 -14
@@ -6,7 +6,6 @@
 namespace Classifai\Providers\Azure;

 use Classifai\Providers\Provider;
-use Classifai\Normalizer;
 use Classifai\Features\TextToSpeech;
 use stdClass;
 use WP_Http;
@@ -30,14 +29,6 @@ class Speech extends Provider {
 	 */
 	const API_PATH = 'cognitiveservices/v1';

-	/**
-	 * Meta key to get/set the audio hash that helps to indicate if there is any need
-	 * for the audio file to be regenerated or not.
-	 *
-	 * @var string
-	 */
-	const AUDIO_HASH_KEY = '_classifai_post_audio_hash';
-
 	/**
 	 * Azure Text to Speech constructor.
 	 *
@@ -337,12 +328,10 @@ public function synthesize_speech( int $post_id ) {
 			);
 		}

-		$normalizer = new Normalizer();
 		$feature = new TextToSpeech();
 		$settings = $feature->get_settings();
-		$post = get_post( $post_id );
-		$post_content = $normalizer->normalize_content( $post->post_content, $post->post_title, $post_id );
-		$content_hash = get_post_meta( $post_id, self::AUDIO_HASH_KEY, true );
+		$post_content = $feature->normalize_post_content( $post_id );
+		$content_hash = get_post_meta( $post_id, TextToSpeech::AUDIO_HASH_KEY, true );
 		$saved_attachment_id = (int) get_post_meta( $post_id, $feature::AUDIO_ID_KEY, true );

 		// Don't regenerate the audio file it it already exists and the content hasn't changed.
@@ -415,7 +404,7 @@ public function synthesize_speech( int $post_id ) {
 			);
 		}

-		update_post_meta( $post_id, self::AUDIO_HASH_KEY, md5( $post_content ) );
+		update_post_meta( $post_id, TextToSpeech::AUDIO_HASH_KEY, md5( $post_content ) );

 		return $response_body;
 	}
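
Both the Amazon Polly and Azure Speech providers now read and write the shared `TextToSpeech::AUDIO_HASH_KEY` meta value, which stores an MD5 hash of the normalized post content. The comparison itself sits outside the hunks shown, so the following is only a sketch of how such a check could work, assuming audio is regenerated unless an attachment already exists and the hash is unchanged. The function name `needs_audio_regeneration` is hypothetical and is not part of the plugin.

```php
<?php
/**
 * Hypothetical helper: decides whether speech audio must be (re)generated.
 *
 * @param string $normalized_content  Normalized post content.
 * @param string $saved_hash          Hash stored by the previous run ('' if none).
 * @param int    $saved_attachment_id Attachment ID of the existing audio (0 if none).
 *
 * @return bool True when new audio should be synthesized.
 */
function needs_audio_regeneration( string $normalized_content, string $saved_hash, int $saved_attachment_id ): bool {
	// Audio exists only if a previous run stored an attachment ID.
	$has_audio = $saved_attachment_id > 0;

	// The content is unchanged when its MD5 hash matches the stored one.
	$unchanged = md5( $normalized_content ) === $saved_hash;

	// Skip regeneration only when both conditions hold.
	return ! ( $has_audio && $unchanged );
}
```

With this sketch, a post whose stored hash equals `md5()` of its current normalized content and which already has an audio attachment is skipped, while an edited post or a post without an attachment triggers a new synthesis request. Keeping the meta key on the feature class means every provider compares against the same stored value.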

includes/Classifai/Providers/OpenAI/APIRequest.php

+21 -7
@@ -270,19 +270,33 @@ public function post_form( string $url = '', array $body = [] ) {
 	 */
 	public function get_result( $response ) {
 		if ( ! is_wp_error( $response ) ) {
+			$headers = wp_remote_retrieve_headers( $response );
+			$content_type = false;
+
+			if ( ! is_wp_error( $headers ) ) {
+				$content_type = isset( $headers['content-type'] ) ? $headers['content-type'] : false;
+			}
+
 			$body = wp_remote_retrieve_body( $response );
 			$code = wp_remote_retrieve_response_code( $response );
-			$json = json_decode( $body, true );

-			if ( json_last_error() === JSON_ERROR_NONE ) {
-				if ( empty( $json['error'] ) ) {
-					return $json;
+			if ( false === $content_type || false !== strpos( $content_type, 'application/json' ) ) {
+				$json = json_decode( $body, true );
+
+				if ( json_last_error() === JSON_ERROR_NONE ) {
+					if ( empty( $json['error'] ) ) {
+						return $json;
+					} else {
+						$message = $json['error']['message'] ?? esc_html__( 'An error occured', 'classifai' );
+						return new WP_Error( $code, $message );
+					}
 				} else {
-					$message = $json['error']['message'] ?? esc_html__( 'An error occured', 'classifai' );
-					return new WP_Error( $code, $message );
+					return new WP_Error( 'Invalid JSON: ' . json_last_error_msg(), $body );
 				}
+			} elseif ( $content_type && false !== strpos( $content_type, 'audio/mpeg' ) ) {
+				return $response;
 			} else {
-				return new WP_Error( 'Invalid JSON: ' . json_last_error_msg(), $body );
+				return new WP_Error( 'Invalid content type', $response );
 			}
 		} else {
 			return $response;
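
The branching that this hunk adds to `get_result()` can be looked at in isolation. The sketch below mirrors the same three-way decision (JSON, `audio/mpeg`, anything else) without any WordPress dependencies. The function name `classify_response` and the returned array shape are hypothetical and chosen for illustration only; the real method returns the decoded JSON, the raw response, or a `WP_Error`.

```php
<?php
/**
 * Hypothetical, WordPress-free sketch of the content-type dispatch.
 *
 * @param string|false $content_type Content-Type header value, or false if absent.
 * @param string       $body         Raw response body.
 *
 * @return array Array with a 'type' key of 'json', 'audio' or 'error'.
 */
function classify_response( $content_type, string $body ): array {
	// A missing header is treated as JSON, matching the pre-existing behaviour.
	if ( false === $content_type || false !== strpos( $content_type, 'application/json' ) ) {
		$json = json_decode( $body, true );

		if ( JSON_ERROR_NONE !== json_last_error() ) {
			return [ 'type' => 'error', 'message' => 'Invalid JSON: ' . json_last_error_msg() ];
		}

		if ( ! empty( $json['error'] ) ) {
			return [ 'type' => 'error', 'message' => $json['error']['message'] ?? 'An error occurred' ];
		}

		return [ 'type' => 'json', 'data' => $json ];
	}

	// Binary audio is passed through untouched so the caller can store it.
	if ( false !== strpos( $content_type, 'audio/mpeg' ) ) {
		return [ 'type' => 'audio', 'data' => $body ];
	}

	return [ 'type' => 'error', 'message' => 'Invalid content type' ];
}
```

The header check uses `strpos()` rather than strict equality because servers commonly append parameters, for example `application/json; charset=utf-8`.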
