
Commit 6a7159e

Author: Jim Bennett

Lesson 23 (#103)

* Adding content
* Update en.json
* Update README.md
* Update TRANSLATIONS.md
* Adding lesson templates
* Fixing code files with each other's code in
* Update README.md
* Adding lesson 16
* Adding virtual camera
* Adding Wio Terminal camera capture
* Adding wio terminal code
* Adding SBC classification to lesson 16
* Adding challenge, review and assignment
* Adding images and using new Azure icons
* Update README.md
* Update iot-reference-architecture.png
* Adding structure for JulyOT links
* Removing icons
* Sketchnotes!
* Create lesson-1.png
* Starting on lesson 18
* Updated sketch
* Adding virtual distance sensor
* Adding Wio Terminal image classification
* Update README.md
* Adding structure for project 6 and wio terminal distance sensor
* Adding some of the smart timer stuff
* Updating sketchnotes
* Adding virtual device speech to text
* Adding chapter 21
* Language tweaks
* Lesson 22 stuff
* Update en.json
* Bumping seeed libraries
* Adding functions lab to lesson 22
* Almost done with LUIS
* Update README.md
* Reverting sunlight sensor change (fixes #88)
* Structure
* Adding speech to text lab for Pi
* Adding virtual device text to speech lab
* Finishing lesson 23

1 parent 63a0723 commit 6a7159e

File tree

25 files changed: +984 -29 lines changed

.vscode/settings.json (+1)

```diff
@@ -5,6 +5,7 @@
         "Geospatial",
         "Kbps",
         "Mbps",
+        "SSML",
         "Seeed",
         "Siri",
         "Twilio",
```

6-consumer/lessons/1-speech-recognition/code-iot-hub/virtual-iot-device/smart-timer/app.py (+4 -4)

```diff
@@ -14,11 +14,11 @@
 device_client.connect()
 print('Connected')

-speech_config = SpeechConfig(subscription=api_key,
-                             region=location,
-                             speech_recognition_language=language)
+recognizer_config = SpeechConfig(subscription=api_key,
+                                 region=location,
+                                 speech_recognition_language=language)

-recognizer = SpeechRecognizer(speech_config=speech_config)
+recognizer = SpeechRecognizer(speech_config=recognizer_config)

 def recognized(args):
     if len(args.result.text) > 0:
```

6-consumer/lessons/1-speech-recognition/code-speech-to-text/virtual-iot-device/smart-timer/app.py (+4 -4)

```diff
@@ -5,11 +5,11 @@
 location = '<location>'
 language = '<language>'

-speech_config = SpeechConfig(subscription=api_key,
-                             region=location,
-                             speech_recognition_language=language)
+recognizer_config = SpeechConfig(subscription=api_key,
+                                 region=location,
+                                 speech_recognition_language=language)

-recognizer = SpeechRecognizer(speech_config=speech_config)
+recognizer = SpeechRecognizer(speech_config=recognizer_config)

 def recognized(args):
     print(args.result.text)
```

6-consumer/lessons/1-speech-recognition/virtual-device-speech-to-text.md (+4 -4)

````diff
@@ -45,9 +45,9 @@
 location = '<location>'
 language = '<language>'

-speech_config = SpeechConfig(subscription=api_key,
-                             region=location,
-                             speech_recognition_language=language)
+recognizer_config = SpeechConfig(subscription=api_key,
+                                 region=location,
+                                 speech_recognition_language=language)
 ```

 Replace `<key>` with the API key for your speech service. Replace `<location>` with the location you used when you created the speech service resource.
@@ -59,7 +59,7 @@
 1. Add the following code to create a speech recognizer:

     ```python
-    recognizer = SpeechRecognizer(speech_config=speech_config)
+    recognizer = SpeechRecognizer(speech_config=recognizer_config)
     ```

 1. The speech recognizer runs on a background thread, listening for audio and converting any speech in it to text. You can get the text using a callback function - a function you define and pass to the recognizer. Every time speech is detected, the callback is called. Add the following code to define a callback that prints the text to the console, and pass this callback to the recognizer:
````
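As a reminder of how these pieces fit together, here is a short sketch continuing from the code above: the callback is connected to the recognizer's `recognized` event and continuous recognition is started, with the usual keep-alive loop for this SDK since recognition runs on a background thread.

```python
import time

# Callback run every time speech is recognized
def recognized(args):
    print(args.result.text)

# Connect the callback to the recognized event, then start listening.
# Recognition runs on a background thread.
recognizer.recognized.connect(recognized)
recognizer.start_continuous_recognition()

# Keep the program alive so the background thread can keep listening
while True:
    time.sleep(1)
```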

6-consumer/lessons/2-language-understanding/README.md (+4 -4)

````diff
@@ -347,7 +347,7 @@
 if prediction_response.prediction.top_intent == 'set timer':
     numbers = prediction_response.prediction.entities['number']
     time_units = prediction_response.prediction.entities['time unit']
-    total_time = 0
+    total_seconds = 0
 ```

 The `number` entities will be an array of numbers. For example, if you said *"Set a four minute 17 second timer."*, then the `number` array will contain 2 integers - 4 and 17.
@@ -392,15 +392,15 @@

 ```python
 if time_unit == 'minute':
-    total_time += number * 60
+    total_seconds += number * 60
 else:
-    total_time += number
+    total_seconds += number
 ```

 1. Finally, outside this loop through the entities, log the total time for the timer:

 ```python
-logging.info(f'Timer required for {total_time} seconds')
+logging.info(f'Timer required for {total_seconds} seconds')
 ```

 1. Run the function app and speak into your IoT device. You will see the total time for the timer in the function app output:
````
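As a quick sanity check of the loop above: for *"Set a four minute 17 second timer."*, LUIS returns the numbers 4 and 17 paired with the time units minute and second, so the loop accumulates 4 × 60 + 17 = 257 seconds. The same accumulation as a standalone sketch, with the entity values hard-coded for illustration:

```python
# Entity values as LUIS would return them for
# "Set a four minute 17 second timer."
numbers = [4, 17]
time_units = [['minute'], ['second']]

total_seconds = 0
for i in range(0, len(numbers)):
    number = numbers[i]
    time_unit = time_units[i][0]

    if time_unit == 'minute':
        total_seconds += number * 60
    else:
        total_seconds += number

print(total_seconds)  # 257
```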

6-consumer/lessons/2-language-understanding/assignment.md (+1 -1)

```diff
@@ -2,7 +2,7 @@

 ## Instructions

-So far in this lesson you have trained a model to understand setting a timer. Another useful feature is cancelling a timer - maybe your bread is ready and can be taken out of the oven.
+So far in this lesson you have trained a model to understand setting a timer. Another useful feature is cancelling a timer - maybe your bread is ready and can be taken out of the oven before the timer has elapsed.

 Add a new intent to your LUIS app to cancel the timer. It won't need any entities, but will need some example sentences. Handle this in your serverless code if it is the top intent, logging that the intent was recognized.
```

6-consumer/lessons/2-language-understanding/code/functions/smart-timer-trigger/speech-trigger/__init__.py (+4 -4)

```diff
@@ -28,16 +28,16 @@ def main(events: List[func.EventHubEvent]):
         if prediction_response.prediction.top_intent == 'set timer':
             numbers = prediction_response.prediction.entities['number']
             time_units = prediction_response.prediction.entities['time unit']
-            total_time = 0
+            total_seconds = 0

             for i in range(0, len(numbers)):
                 number = numbers[i]
                 time_unit = time_units[i][0]

                 if time_unit == 'minute':
-                    total_time += number * 60
+                    total_seconds += number * 60
                 else:
-                    total_time += number
+                    total_seconds += number

-            logging.info(f'Timer required for {total_time} seconds')
+            logging.info(f'Timer required for {total_seconds} seconds')
```

6-consumer/lessons/3-spoken-feedback/README.md (+67 -6)

```diff
@@ -26,3 +26,22 @@ In this lesson we'll cover:

 ## Text to speech

+Text to speech, as the name suggests, is the process of converting text into audio that contains the text as spoken words. The basic principle is to break down the words in the text into their constituent sounds (known as phonemes), and stitch together audio for those sounds, either using pre-recorded audio or using audio generated by AI models.
+
+![The three stages of typical text to speech systems](../../../images/tts-overview.png)
+
+Text to speech systems typically have 3 stages:
+
+* Text analysis
+* Linguistic analysis
+* Wave-form generation
+
+### Text analysis
+
+Text analysis involves taking the text provided and converting it into words that can be used to generate speech. For example, if you convert "Hello world", there is no text analysis needed, as the two words can be converted directly to speech. If you have "1234", however, this might need to be converted either into the words "One thousand, two hundred thirty four" or "One, two, three, four" depending on the context. For "I have 1234 apples" it would be "One thousand, two hundred thirty four", but for "The child counted 1234" it would be "One, two, three, four".
+
+The words created vary not only with the language, but with the locale of that language. For example, in American English 120 would be "One hundred twenty"; in British English it would be "One hundred and twenty", with the use of "and" after the hundreds.
+
+✅ Some other examples that require text analysis include "in" as a short form of inch, and "st" as a short form of saint and street. Can you think of other examples in your language of words that are ambiguous without context?
+
+Once the words have been defined, they are sent for linguistic analysis.
```
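To make the text analysis stage concrete, here is a minimal sketch of number normalization covering the examples above. The rules are purely illustrative - production text to speech systems use far richer context models:

```python
# Toy number normalization: read a number as a quantity ("one thousand two
# hundred thirty-four") or digit by digit ("one, two, three, four").
ONES = ['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven',
        'eight', 'nine', 'ten', 'eleven', 'twelve', 'thirteen', 'fourteen',
        'fifteen', 'sixteen', 'seventeen', 'eighteen', 'nineteen']
TENS = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy',
        'eighty', 'ninety']

def to_words(n: int, british: bool = False) -> str:
    """Convert an integer below one million to cardinal words."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ('-' + ONES[ones] if ones else '')
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        # British English inserts "and" after the hundreds
        joiner = ' and ' if british else ' '
        return ONES[hundreds] + ' hundred' + (joiner + to_words(rest, british) if rest else '')
    thousands, rest = divmod(n, 1000)
    return to_words(thousands, british) + ' thousand' + (' ' + to_words(rest, british) if rest else '')

def spell_digits(n: int) -> str:
    """Read a number digit by digit, as when counting aloud."""
    return ', '.join(ONES[int(d)] for d in str(n))

print(to_words(1234))               # one thousand two hundred thirty-four
print(to_words(120, british=True))  # one hundred and twenty
print(spell_digits(1234))           # one, two, three, four
```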
```diff
@@ -29,3 +48,28 @@
+
+### Linguistic analysis
+
+Linguistic analysis breaks the words down into phonemes. Phonemes are based not just on the letters used, but on the other letters in the word. For example, in English the 'a' sound in 'car' and 'care' is different. The English language has 44 different phonemes for the 26 letters in the alphabet, some shared by different letters, such as the same phoneme used at the start of 'circle' and 'serpent'.
+
+✅ Do some research: What are the phonemes for your language?
+
+Once the words have been converted to phonemes, these phonemes need additional data to support intonation, adjusting the tone or duration depending on the context. One example in English is that a rising pitch can be used to convert a sentence into a question - having a raised pitch for the last word implies a question.
+
+For example - the sentence "You have an apple" is a statement saying that you have an apple. If the pitch goes up at the end, increasing for the word apple, it becomes the question "You have an apple?", asking if you have an apple. The linguistic analysis needs to use the question mark at the end to decide to increase the pitch.
+
+Once the phonemes have been generated, they can be sent for wave-form generation to produce the audio output.
+
+### Wave-form generation
+
+The first electronic text to speech systems used single audio recordings for each phoneme, leading to very monotonous, robotic sounding voices. The linguistic analysis would produce phonemes; these would be loaded from a database of sounds and stitched together to make the audio.
+
+✅ Do some research: Find some audio recordings from early speech synthesis systems. Compare them to modern speech synthesis, such as that used in smart assistants.
+
+More modern wave-form generation uses ML models built using deep learning (very large neural networks that act in a similar way to neurons in the brain) to produce more natural sounding voices that can be indistinguishable from humans.
+
+> 💁 Some of these ML models can be re-trained using transfer learning to sound like real people. This means using voice as a security system, something banks are increasingly trying to do, is no longer a good idea as anyone with a recording of a few minutes of your voice can impersonate you.
+
+These large ML models are being trained to combine all three steps into end-to-end speech synthesizers.

 ## Set the timer

 The timer can be set by sending a command from the serverless code, instructing the IoT device to set the timer. This command will contain the time in seconds till the timer needs to go off.
```
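To illustrate the wave-form generation stage described above, here is a toy sketch of the concatenative approach: look up a pre-recorded clip for each phoneme and stitch the clips together. The phoneme dictionary and the `sounds/*.wav` file names are hypothetical - real systems use large recorded inventories or neural vocoders, and much smarter joining than simple concatenation:

```python
import wave

# Hypothetical grapheme-to-phoneme lookup for a couple of words - the
# output of the linguistic analysis stage
PHONEMES = {
    'hello': ['HH', 'AH', 'L', 'OW'],
    'world': ['W', 'ER', 'L', 'D'],
}

def synthesize(text: str, out_path: str = 'speech.wav') -> None:
    frames = b''
    params = None
    for word in text.lower().split():
        for phoneme in PHONEMES[word]:
            # One pre-recorded clip per phoneme, all with identical formats
            with wave.open(f'sounds/{phoneme}.wav', 'rb') as clip:
                params = clip.getparams()
                frames += clip.readframes(clip.getnframes())
    with wave.open(out_path, 'wb') as out:
        out.setparams(params)
        out.writeframes(frames)

synthesize('hello world')
```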
````diff
@@ -38,11 +82,11 @@

 You will need to set up the connection string for the IoT Hub with the service policy (*NOT* the device) in your `local.settings.json` file and add the `azure-iot-hub` pip package to your `requirements.txt` file. The device ID can be extracted from the event.

-1. The direct method you send needs to be called `set-timer`, and will need to send the length of the timer as a JSON property called `time`. Use the following code to build the `CloudToDeviceMethod` using the `total_time` calculated from the data extracted by LUIS:
+1. The direct method you send needs to be called `set-timer`, and will need to send the length of the timer as a JSON property called `seconds`. Use the following code to build the `CloudToDeviceMethod` using the `total_seconds` calculated from the data extracted by LUIS:

     ```python
     payload = {
-        'time': total_time
+        'seconds': total_seconds
     }
     direct_method = CloudToDeviceMethod(method_name='set-timer', payload=json.dumps(payload))
     ```
````
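On the device side, the direct method is received and a timer started. A minimal sketch of one possible handler, assuming the `azure-iot-device` pip package and a `threading.Timer` for the countdown - the per-device guides linked below walk through the real implementations:

```python
import json
import threading
from azure.iot.device import IoTHubDeviceClient, MethodResponse

device_client = IoTHubDeviceClient.create_from_connection_string('<device connection string>')
device_client.connect()

def announce():
    print('Times up!')

def handle_method_request(request):
    if request.name == 'set-timer':
        payload = json.loads(request.payload)
        seconds = payload['seconds']
        # Fire the announce function once the requested time has passed
        threading.Timer(seconds, announce).start()
        print(f'Timer set for {seconds} seconds')
    # Acknowledge the direct method with a 200 response
    method_response = MethodResponse.create_from_method_request(request, 200)
    device_client.send_method_response(method_response)

device_client.on_method_request_received = handle_method_request
```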
````diff
@@ -60,11 +104,23 @@
 * [Arduino - Wio Terminal](wio-terminal-set-timer.md)
 * [Single-board computer - Raspberry Pi/Virtual IoT device](single-board-computer-set-timer.md)

-> 💁 You can find this code in the [code-command/wio-terminal](code-command/wio-terminal), [code-command/virtual-device](code-command/virtual-device), or [code-command/pi](code-command/pi) folder.
-
 ## Convert text to speech

-The same speech service you used to convert speech to text can be used to convert text back into speech, and this can be played through a microphone on your IoT device.
+The same speech service you used to convert speech to text can be used to convert text back into speech, and this can be played through a speaker on your IoT device. The text to convert is sent to the speech service, along with the type of audio required (such as the sample rate), and binary data containing the audio is returned.
+
+When you send this request, you send it using *Speech Synthesis Markup Language* (SSML), an XML-based markup language for speech synthesis applications. This defines not only the text to be converted, but the language of the text and the voice to use, and can even be used to define speed, volume, and pitch for some or all of the words in the text.
+
+For example, this SSML defines a request to convert the text "Your 3 minute 5 second timer has been set" to speech using a British English voice called `en-GB-MiaNeural`:
+
+```xml
+<speak version='1.0' xml:lang='en-GB'>
+    <voice xml:lang='en-GB' name='en-GB-MiaNeural'>
+        Your 3 minute 5 second timer has been set
+    </voice>
+</speak>
+```
+
+> 💁 Most text to speech systems have multiple voices for different languages, with relevant accents such as a British English voice with an English accent and a New Zealand English voice with a New Zealand accent.

 ### Task - convert text to speech
````
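A minimal sketch of requesting speech synthesis with SSML like the above and playing the result through the default speaker, assuming the `azure-cognitiveservices-speech` pip package - your device code may do this differently:

```python
import azure.cognitiveservices.speech as speechsdk

# Configure the service with the same key and location used for speech to text
speech_config = speechsdk.SpeechConfig(subscription='<key>', region='<location>')
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

ssml = """<speak version='1.0' xml:lang='en-GB'>
    <voice xml:lang='en-GB' name='en-GB-MiaNeural'>
        Your 3 minute 5 second timer has been set
    </voice>
</speak>"""

# Send the SSML to the speech service and play the returned audio
# through the default speaker
result = speech_synthesizer.speak_ssml_async(ssml).get()
```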

```diff
@@ -78,12 +134,17 @@ Work through the relevant guide to convert text to speech using your IoT device:

 ## 🚀 Challenge

+SSML has ways to change how words are spoken, such as adding emphasis to certain words, adding pauses, or changing pitch. Try some of these out, sending different SSML from your IoT device and comparing the output. You can read more about SSML, including how to change the way words are spoken, in the [Speech Synthesis Markup Language (SSML) Version 1.1 specification from the World Wide Web Consortium](https://www.w3.org/TR/speech-synthesis11/).
+
 ## Post-lecture quiz

 [Post-lecture quiz](https://brave-island-0b7c7f50f.azurestaticapps.net/quiz/46)

 ## Review & Self Study

+* Read more on speech synthesis on the [Speech synthesis page on Wikipedia](https://wikipedia.org/wiki/Speech_synthesis)
+* Read more on ways criminals are using speech synthesis to steal on the [Fake voices 'help cyber crooks steal cash' story on BBC news](https://www.bbc.com/news/technology-48908736)
+
 ## Assignment

-[](assignment.md)
+[Cancel the timer](assignment.md)
```
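As a starting point for that challenge, this variant of the earlier SSML adds emphasis, a pause, and a pitch and rate change. The `emphasis`, `break`, and `prosody` elements are all defined in the SSML specification linked above; the voice is the same one used earlier:

```xml
<speak version='1.0' xml:lang='en-GB'>
    <voice xml:lang='en-GB' name='en-GB-MiaNeural'>
        Your timer has <emphasis level='strong'>finished</emphasis>.
        <break time='500ms'/>
        <prosody pitch='+10%' rate='slow'>Time to check the oven.</prosody>
    </voice>
</speak>
```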
```diff
@@ -1,9 +1,12 @@
-#
+# Cancel the timer

 ## Instructions

+In the assignment for the last lesson, you added a cancel timer intent to LUIS. For this assignment you need to handle this intent in the serverless code, send a command to the IoT device, then cancel the timer.
+
 ## Rubric

 | Criteria | Exemplary | Adequate | Needs Improvement |
 | -------- | --------- | -------- | ----------------- |
-| | | | |
+| Handle the intent in serverless code and send a command | Was able to handle the intent and send a command to the device | Was able to handle the intent but was unable to send the command to the device | Was unable to handle the intent |
+| Cancel the timer on the device | Was able to receive the command and cancel the timer | Was able to receive the command but not cancel the timer | Was unable to receive the command |
```
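A sketch of one way to handle the new intent in the serverless trigger, continuing from the existing `set timer` handling shown below. The `cancel timer` intent name and the `cancel-timer` method name are assumptions - use whatever names you defined in your LUIS app:

```python
# Inside the event loop, alongside the 'set timer' handling
if prediction_response.prediction.top_intent == 'cancel timer':
    logging.info('Cancel timer request received')

    # No payload needed - the device just stops its running timer
    direct_method = CloudToDeviceMethod(method_name='cancel-timer')
    registry_manager = IoTHubRegistryManager(registry_manager_connection_string)
    registry_manager.invoke_device_method(device_id, direct_method)
```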
host.json (new file, +15 lines)

```json
{
  "version": "2.0",
  "logging": {
    "applicationInsights": {
      "samplingSettings": {
        "isEnabled": true,
        "excludedTypes": "Request"
      }
    }
  },
  "extensionBundle": {
    "id": "Microsoft.Azure.Functions.ExtensionBundle",
    "version": "[2.*, 3.0.0)"
  }
}
```
local.settings.json (new file, +12 lines)

```json
{
  "IsEncrypted": false,
  "Values": {
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "AzureWebJobsStorage": "UseDevelopmentStorage=true",
    "IOT_HUB_CONNECTION_STRING": "<connection string>",
    "LUIS_KEY": "<primary key>",
    "LUIS_ENDPOINT_URL": "<endpoint url>",
    "LUIS_APP_ID": "<app id>",
    "REGISTRY_MANAGER_CONNECTION_STRING": "<connection string>"
  }
}
```
requirements.txt (new file, +4 lines)

```
# Do not include azure-functions-worker as it may conflict with the Azure Functions platform

azure-functions
azure-cognitiveservices-language-luis
# Needed for the IoTHubRegistryManager used by the trigger, per the lesson README
azure-iot-hub
```
__init__.py (new file, +60 lines)

```python
from typing import List
import logging

import azure.functions as func

import json
import os
from azure.cognitiveservices.language.luis.runtime import LUISRuntimeClient
from msrest.authentication import CognitiveServicesCredentials

from azure.iot.hub import IoTHubRegistryManager
from azure.iot.hub.models import CloudToDeviceMethod

def main(events: List[func.EventHubEvent]):
    luis_key = os.environ['LUIS_KEY']
    endpoint_url = os.environ['LUIS_ENDPOINT_URL']
    app_id = os.environ['LUIS_APP_ID']
    registry_manager_connection_string = os.environ['REGISTRY_MANAGER_CONNECTION_STRING']

    credentials = CognitiveServicesCredentials(luis_key)
    client = LUISRuntimeClient(endpoint=endpoint_url, credentials=credentials)

    for event in events:
        logging.info('Python EventHub trigger processed an event: %s',
                     event.get_body().decode('utf-8'))

        device_id = event.iothub_metadata['connection-device-id']

        # The IoT device sends the recognized speech as JSON
        event_body = json.loads(event.get_body().decode('utf-8'))
        prediction_request = { 'query' : event_body['speech'] }

        # Ask LUIS for the intent and entities behind the spoken text
        prediction_response = client.prediction.get_slot_prediction(app_id, 'Staging', prediction_request)

        if prediction_response.prediction.top_intent == 'set timer':
            numbers = prediction_response.prediction.entities['number']
            time_units = prediction_response.prediction.entities['time unit']
            total_seconds = 0

            for i in range(0, len(numbers)):
                number = numbers[i]
                time_unit = time_units[i][0]

                if time_unit == 'minute':
                    total_seconds += number * 60
                else:
                    total_seconds += number

            logging.info(f'Timer required for {total_seconds} seconds')

            payload = {
                'seconds': total_seconds
            }
            direct_method = CloudToDeviceMethod(method_name='set-timer', payload=json.dumps(payload))

            # Send the set-timer direct method to the device that sent the speech
            registry_manager = IoTHubRegistryManager(registry_manager_connection_string)
            registry_manager.invoke_device_method(device_id, direct_method)
```
function.json (new file, +15 lines)

```json
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "type": "eventHubTrigger",
      "name": "events",
      "direction": "in",
      "eventHubName": "samples-workitems",
      "connection": "IOT_HUB_CONNECTION_STRING",
      "cardinality": "many",
      "consumerGroup": "$Default",
      "dataType": "binary"
    }
  ]
}
```
