Predictive Speech with Raspberry Pi and Deep Learning

by Monisha Macharla | May 15, 2020 | Projects

You are giving a speech, get stuck midway, and don't know what to say next. No worries: here is an artificially intelligent system that helps you out. Predictive Speech with Raspberry Pi and deep learning uses a Raspberry Pi and a microphone to record your speech. Its core is a trained LSTM (Long Short-Term Memory) model, trained either on the speaker's own previous speeches (a form of behavioral cloning) or on a generic speech dataset, and deployed to the IBM Watson cloud. Finally, a pygame window displays the speaker's words as well as the predicted ones.

Here is the schematic representation of the workflow of this project. We will go step-by-step and dive into each topic separately.


Audio Recognition

1. Raspberry Pi

Raspberry Pi is a small computer. It mostly runs Raspbian as its operating system but, as of today, it can support almost anything. From home automation to self-driving cars, it is used in almost every hardware project. We will use it to record audio continuously from the microphone.


2. Microphone

In this tutorial, a USB microphone will be used, but feel free to explore other options. The advantage of a USB microphone is that it is easily detected by Raspbian. To get started, we first need to install the PyAudio library on our system and then the SpeechRecognition library.

pip install pyaudio

Since the SpeechRecognition library depends on PyAudio for microphone access, we need to install PyAudio first. PyAudio lets Python read the microphone and record audio directly from code.

pip install SpeechRecognition

The SpeechRecognition library is used to capture the audio and translate it to text using the Google Speech API. It also has helpful functions that compensate for ambient noise and record audio on a separate background thread. These libraries can be very useful in audio processing, so it is recommended to go through their documentation to learn more.
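
For instance, here is a standalone sketch of the background-listening feature; it is not part of the main project code, just an illustration of what the library offers:

import speech_recognition as sr

r = sr.Recognizer()
m = sr.Microphone()

def callback(recognizer, audio):
    # called from a background thread each time a phrase is captured
    try:
        print(recognizer.recognize_google(audio))
    except sr.UnknownValueError:
        print("could not understand the audio")
    except sr.RequestError as e:
        print("API request failed; {0}".format(e))

with m as source:
    r.adjust_for_ambient_noise(source)  # calibrate once for ambient noise

# returns a function that stops the background listener when called
stop_listening = r.listen_in_background(m, callback)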

3. Python Code

Recording the audio runs in a loop and acts as the main control of the program; all other functions will therefore be called from inside this loop.

STEP-1: Create a new Python file and import the following library.

import speech_recognition as sr

STEP-2: The PyAudio library does not need to be imported explicitly. Now declare the recognizer and microphone objects:

#speech recognizer initialization
r = sr.Recognizer()
m = sr.Microphone()
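
If the USB microphone is not picked up as the default device, you can list the available devices and pass the right index explicitly; the index used below is only an example:

# print every audio device PyAudio can see, with its index
for index, name in enumerate(sr.Microphone.list_microphone_names()):
    print(index, name)

# then select the USB microphone explicitly (index 2 is only an example)
# m = sr.Microphone(device_index=2)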

STEP-3 : The code for the main loop to record audio continuously:

start = True

while start:

    try:
        with m as source:

            # listens for 1 second by default and calibrates for ambient noise
            r.adjust_for_ambient_noise(source)

            print("start speaking")
            audio = r.listen(source)

    except sr.RequestError as e:
        print("Could not request results; {0}".format(e))

    except sr.UnknownValueError:
        print("unknown error occurred")

  • The adjust_for_ambient_noise call calibrates for background noise only once per pass through the loop.
  • If the recognizer cannot make sense of the audio, the second exception (UnknownValueError) is raised.
  • The advantage of using this library is that it starts recording when you speak and stops when you are done.

Speech To Text

Each time the speaker says a sentence, it is captured in an audio object, which is then converted to text using the Google speech-to-text API. It is very efficient and the conversion quality is high. The default API key is used, so a personal key is not necessary. There are other APIs as well, such as Bing and IBM; try them and see for yourself. The speech-to-text conversion relies on an extensively trained machine learning model and does not need to be understood in detail for this project.


After listening to the audio through the microphone, in the same function, we pass it through the API and print the output.

Mytext = r.recognize_google(audio)
Mytext = Mytext.lower()

print(Mytext)

Artificial Intelligence

Today the world is heading towards a point where IoT and AI merge into one discipline. Together these fields can perform wonders. To implement such ideas, we will add an AI element to our project. To help the speaker, the system predicts what they might say next in the speech and displays it in a dialogue box. Creating and training an ML model requires a dataset, an algorithm, and a specific output in mind. We will go through all of these in detail as we code.


1. Dataset

Every AI system initially has to learn from something. We use raw data containing predetermined inputs and the desired outputs, and the model repeatedly uses this data to improve itself. In our project, we can either use the speaker's previous speeches as data or simply use sample speeches. The dataset is read in as plain text.

# the dataset can be any text file containing strings
filename = "wonderland.txt"

raw_text = open(filename, 'r', encoding='utf-8').read()

raw_text = raw_text.lower()

2. Pre-processing

Each model requires the input data to be in a certain format. Our aim is to take the speaker's last hundred characters and predict the next character they might say. We then slide the window by one, using the last ninety-nine characters plus the predicted character, to predict the following character, and repeat until at least a hundred new characters have been predicted. So we need a sequential input and a single character as output. Start by importing the NumPy library.

import numpy as np

STEP-1: Define the variables of our dataset

# find number of distinct characters our dataset contains
chars = sorted(list(set(raw_text)))

# give each character a particular number and vice versa
char_to_int = dict((c, i) for i,c in enumerate(chars))
int_to_char = dict((i, c) for i,c in enumerate(chars))
  • Since our model understands numbers rather than characters, the char_to_int mapping is necessary to encode the input.
  • The output of our model will also be a number, so the int_to_char mapping is necessary to convert it back (a quick sanity check of both mappings follows below).
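
A quick sanity check of the two mappings (any character from chars will do):

# encode a character to its integer id and decode it back
sample = chars[0]
assert int_to_char[char_to_int[sample]] == sample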

STEP-2: Define the total size of the dataset in characters and in vocabulary.

n_chars = len(raw_text)
n_vocab = len(chars)

STEP-3: Convert the text into input sequences of integers, each of length 100, and convert the output into a single integer representing the character that comes right after the input sequence.

seq_length = 100
dataX = []
dataY = []

for i in range(0, n_chars - seq_length, 1):
    seq_in = raw_text[i:i + seq_length]
    seq_out = raw_text[i+seq_length]
    # The input data
    dataX.append([char_to_int[char] for char in seq_in])
    # The output data
    dataY.append(char_to_int[seq_out])

n_patterns = len(dataX)
print("total patterns :", n_patterns)
  • If the dataset contains 101 characters, the loop runs once; if it contains 110 characters, it runs 10 times.
  • The length of the input list is the total number of patterns formed; in the above example it would be 10.

Import a keras package to perform further processing:

from keras.utils import np_utils

STEP-4: Reshape the input and then normalize it. Convert the output to categorical (one-hot) data: if there are 10 distinct characters, a single output is no longer an integer but an array of length 10 with all zeros except for a one at the index matching that integer.

X = np.reshape(dataX, (n_patterns, seq_length, 1))
# Normalization
X = X / float(n_vocab)

y = np_utils.to_categorical(dataY)

Final output after pre-processing:
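
A quick way to confirm the shapes of the processed arrays (the exact numbers depend on your dataset):

print(X.shape)  # (n_patterns, 100, 1)
print(y.shape)  # (n_patterns, n_vocab) in most cases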

3. The Model

We will be using deep learning to create our model and LSTM (Long Short-Term Memory) layers to build its structure. LSTM is used since it is well suited to recurrent and sequential operations. Each cell in an LSTM has three gates that help it filter and retain information. Refer to this video if you want to learn more about LSTMs and recurrent networks: https://www.youtube.com/watch?v=8HyCNIVRbSU

The Keras library and its Sequential API are used to build our neural network. Start by importing all the required modules:

from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM
from keras.callbacks import ModelCheckpoint
  • Dense is a fully connected layer.
  • The Dropout layer randomly sets a fraction of units to 0 during training to help prevent overfitting.

The Sequential model:

model = Sequential()
model.add(LSTM(256, input_shape = (X.shape[1], X.shape[2]), return_sequences = True))
model.add(Dropout(0.2))
model.add(LSTM(256))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss = 'categorical_crossentropy', optimizer='adam')
  • Since there are many classes, we use categorical cross-entropy as the loss function; the layer shapes can be verified with the quick summary check below.
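
Optionally, print a quick summary to verify the layer shapes and parameter counts:

# prints each layer with its output shape and parameter count
model.summary()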

4. Training

Depending on the dataset, we train the model for a certain number of epochs. For now, the training will run for 50 epochs with a batch size of 64. We will also use a callback to monitor the loss and save the model after every epoch.

STEP-1: Write the callback that monitors the loss and saves the model whenever the loss improves.

filepath = "weights/weights-improvement-{epoch:02d}-{loss:.4f}-bigger.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose = 1, save_best_only=True, mode = 'min')
callbacks_list = [checkpoint]

STEP-2: Fit the model with the specified parameters. Training time will vary depending on your system, so if you do not have a powerful enough machine it is suggested to use Google Colab or another cloud computing service.

model.fit(X, y, epochs = 50, batch_size=64, callbacks=callbacks_list)
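
Once training finishes, the best checkpoint can be reloaded later, for example in a separate prediction script. A minimal sketch, assuming the same architecture from section 3 has been rebuilt and the checkpoint filename below is replaced with your own:

# rebuild the same architecture first, then load the saved weights;
# the filename below is only an example, use your best checkpoint
model.load_weights("weights/weights-improvement-50-1.2345-bigger.hdf5")
model.compile(loss='categorical_crossentropy', optimizer='adam')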

5. Prediction

For now, we will generate predictions locally to test the model, and if they are good enough we will upload it to the cloud. For prediction, either create a new file and reload all the variables or simply use the same file. Create a new function, predict, which takes the input text and the dataset variables and returns the predicted text.

def predict(text_in, char_to_int, int_to_char, model, n_vocab):
    text1 = []
    out = ''
    # convert the string input to a list of integers
    for i in text_in:
        text1.append(char_to_int[i])
    # each loop predicts one character, therefore 100 iterations
    for i in range(100):
        text = np.reshape(text1, (1, len(text1), 1))
        # normalize the input
        text = text / float(n_vocab)
        prediction = model.predict(text, verbose=0)
        index = np.argmax(prediction)
        result = int_to_char[index]
        out += result
        text1.append(index)
        text1 = text1[1:len(text1)]

    return out
  • Since the softmax activation gives a probability distribution, np.argmax picks the index with the highest probability, which is our predicted character.
  • Each output integer is converted back to a character and appended to the output string.
  • The last two lines of the loop append the prediction to the input sequence and drop its oldest character, so the prediction becomes part of the next input.

If the prediction output seems valid enough for your dataset, proceed to deployment; otherwise, consider training the model for more epochs or varying the complexity and hyper-parameters of the model by trial and error.

Deployment

Every IoT project needs cloud computing, and so does ours: not just for its own sake, but because a Raspberry Pi cannot handle the computational load of this model and would probably crash. Even with more capable embedded systems, it is recommended to use cloud computing. All we need to do is create cloud credentials, deploy our model file there, and request predictions from it.


1. IBM-Watson

In this tutorial, we will be using the IBM Watson cloud platform to deploy our model. People often prefer Google Cloud or DigitalOcean, but IBM works very well too; if you are comfortable with another platform, feel free to use it instead. To get started with deployment, we first need to create an IBM Cloud account.

STEP-1: Create an IBM Cloud account.

STEP-2: Catalog -> Services -> AI -> Machine Learning

STEP-3: Create a new credential and copy it to your local system.

Install the Watson Machine Learning client library on your local system.

pip install watson_machine_learning_client

Create a new Python file solely for deploying the model, import the library, and paste in the credentials:

from watson_machine_learning_client import WatsonMachineLearningAPIClient

wml_credentials={
  "url": "xxxxxxxxxxxxxxxxxx",
  "apikey": "xxxxxxxxxxxxxxxxxxxxxxx",
  "instance_id": "xxxxxxxxxxxxxxxxxxxxxx"
}

# A client with the credentials
client = WatsonMachineLearningAPIClient(wml_credentials)

2. ML model

Compress the model before deployment. Import the required libraries for compression in the same file.

from contextlib import suppress
import os

Now, to compress the file, we call tar through the os module. The output must be a “model.tgz” file.

#compressing model
filename = 'weights/model.h5'
# Compression to .tgz format
tar_filename = 'weights/model' + '.tgz'
cmdstring = 'tar -zcvf ' + tar_filename + ' ' + filename
os.system(cmdstring)
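
If the tar command is not available on your system (for instance on Windows), Python's built-in tarfile module can produce the same archive; a small alternative sketch:

import tarfile

# create weights/model.tgz with the trained model file inside
with tarfile.open(tar_filename, "w:gz") as tar:
    tar.add(filename, arcname='model.h5')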

After compression, we need to define the properties of the model we are uploading, such as the libraries and their versions. Check the versions you used while training the model. IBM supports only particular versions and the list is updated frequently, so do check it. As of this writing, the specifications below are up to date.

model_props = {
    client.repository.ModelMetaNames.NAME: 'Speech - compressed keras model',
    client.repository.ModelMetaNames.FRAMEWORK_NAME: 'tensorflow',
    client.repository.ModelMetaNames.FRAMEWORK_VERSION: '1.15.0',
    client.repository.ModelMetaNames.RUNTIME_NAME: 'python',
    client.repository.ModelMetaNames.RUNTIME_VERSION: '3.6',
    client.repository.ModelMetaNames.FRAMEWORK_LIBRARIES: [{'name': 'keras', 'version': '2.2.5'}]
}

Finally, deploy the model and get the required details to predict the output from it.

# Stores the compressed model in the Watson ML repository
published_model_details = client.repository.store_model(model=tar_filename, meta_props=model_props)       

# gets and prints the unique id of the model
model_uid = client.repository.get_model_uid(published_model_details)
print(model_uid)

# Deploys it as a web service
deployment = client.deployments.create(model_uid, 'Keras Speech recognition through compressed file.')
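
The scoring endpoint of the new deployment can usually also be read back through the client. The helper below was part of older versions of the Watson client, so verify it against the version you installed:

# read back the scoring URL of the deployment just created
scoring_endpoint = client.deployments.get_scoring_url(deployment)
print(scoring_endpoint)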

3. Prediction

For prediction through the cloud, some changes need to be made to the local prediction code written earlier.

STEP-1: Instead of loading the model locally, use the credentials to declare a client as done before. Then go to the deployment in your cloud account and copy its scoring endpoint.

import pickle
from watson_machine_learning_client import WatsonMachineLearningAPIClient

def init():

    wml_credentials={
    "url": "xxxxxxxxxxxxxxxxxx",
    "apikey": "xxxxxxxxxxxxxxxxxxxxxxx",
    "username": "xxxxxxxxxxxxxxxxxxxx",
    "password": "xxxxxxxxxxxxxxxxxxxxxxx",
    "instance_id": "xxxxxxxxxxxxxxxxxxxxxx"}
    
    # IBM deployment initialization
    client = WatsonMachineLearningAPIClient(wml_credentials)

    char_to_int = pickle.load(open("variables/c2i.pkl", "rb"))
    int_to_char = pickle.load(open("variables/i2c.pkl", "rb"))
    n_vocab = pickle.load(open("variables/n_vocab.pkl", "rb"))

    scoring_endpoint = 'https://eu-gb.ml.cloud.ibm.com/xxxxxxxxxx'

    return client, char_to_int, int_to_char, n_vocab, scoring_endpoint

STEP-2: Instead of predicting with the local model, use the client and the scoring endpoint to predict.

scoring_payload = {'values': text.tolist()}
predict = client.deployments.score(scoring_endpoint, scoring_payload)
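
The exact layout of the response depends on the client and service version, so it is worth printing it once and adapting the decoding step to what you see; a hedged sketch:

# the exact response layout depends on the client and service version; inspect it once
print(predict)

# then the softmax vector can usually be decoded much like the local version, e.g.:
# index = np.argmax(predict['values'][0][0])
# result = int_to_char[index]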

Pygame

The output of this project is text in a GUI. It continuously displays the speech as well as the prediction output. We will use pygame as the GUI library since its display is flexible and not bound to a blocking main loop the way Tkinter is. The text output from the Google Speech API is fed to pygame, and once the speech crosses 100 characters, the prediction output is also fed to a separate function that displays the dialogue box. Install pygame and import it in a new file.

pip install pygame

STEP-1 : Importing:

import pygame

STEP-2: Initialize the basic pygame variables in a new function. The parameters can be varied as you like.

def show_init():
    pygame.init()

    screen_size=(1500, 600)
    screen=pygame.display.set_mode(screen_size)

    pygame.display.set_caption('Show speech') 

    return screen

This should display a blank pygame window titled 'Show speech'.
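
A quick way to test it; on most desktops the window only stays responsive while pygame events are being processed, so pump them briefly:

import time

screen = show_init()

# keep the blank window open for a few seconds while processing events,
# otherwise the OS may mark it as not responding
for _ in range(100):
    pygame.event.pump()
    time.sleep(0.05)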

1. Speech Output

The speech output is shown as scrolling text. We will now define a new function to display it. First, the text is passed to the function and assigned a font, and a rectangle is computed around it to measure its width and height. Using those, we can determine where to place the text. The next step is to loop over the text character by character to add a scrolling effect.

def paint(screen, txt, posx, posy):
    font = pygame.font.Font('freesansbold.ttf', 18) 
    screen.fill((0, 0, 0), rect=[posx, posy, (font.size(txt[0])[0]*100), 20])

    clock = pygame.time.Clock()

    green = (0, 255, 0) 

    text = font.render(txt, True, green) 
    
    # create a rectangular object for the text surface object
    textRect = text.get_rect() 
    if((posx + textRect[2]) <= 800):
        textRect = textRect.move(posx, posy)
    else:
        posy +=textRect[3]
        posx = 0
        textRect = textRect.move(posx, posy)

    pygame.display.update()
    
    for i in range(len(txt)):
            text2 = font.render(txt[i], True, green) 
            screen.blit(text2, (posx +(font.size(txt[:i])[0]), posy)) 
            pygame.display.update()
            clock.tick(12)
    
    posx += textRect[2]
    
    return pygame.display.get_surface(), posx, posy
  • The same pygame screen initialized earlier is updated on every call.
  • The if/else block places the new text after the previous text, wrapping to a new row when the window width is exceeded.
  • The final loop scrolls the text character by character; the clock timer controls the scrolling speed.
The accuracy of the recognized text is clearly visible. The only fault observed was interpreting “pygame” as “tiger”, but for common words even this kind of error is rare.
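
Here is a usage sketch of how the recognized text can be handed to this function from the main recording loop (Mytext comes from the speech-to-text step; the trailing space keeps consecutive sentences from running together):

screen = show_init()
posx, posy = 0, 0

# inside the recording loop, after Mytext has been recognized:
screen, posx, posy = paint(screen, Mytext + " ", posx, posy)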

2. Predict Output

The prediction is a 100-character output derived from the last 100 characters spoken by the speaker. It is shown in a dialogue box after the last words in the same window. The basic visual properties can be modified easily. Create a new function and copy the previous one into it, since only slight changes are needed.

def pred_show(screen, txt, posx, posy):

    green = (0, 255, 0) 
    blue = (0, 0, 128) 

    font = pygame.font.Font('freesansbold.ttf', 18) 

    text = font.render(txt, True, green, blue) 

    # create a rectangular object for the text surface object 
    textRect = text.get_rect() 

    if((posx + textRect[2]) <= 800):
        print(str(posx+textRect[2])+ "ii")
        textRect = textRect.move(posx, posy)
    else:
        print(str(textRect[3]+ posy) + "jj")
        textRect = textRect.move(0, 300)
        #posx += textRect[2]

    pygame.display.update()
    
    # draw the prediction text as a dialogue box
    screen.blit(text, (posx, posy)) 
    pygame.display.update()

    return pygame.display.get_surface()
  • As can be seen, it is very similar to the earlier function, since the two do almost the same job.
  • The x and y positions are not modified in this function.
  • The scrolling effect is not added.
  • The background color of the text is blue to display it as a dialogue box.

The prediction may not be accurate because of the limitations of the model, but it can be improved.

With this, we come to the end of the tutorial. I hope you learned something new about Raspberry Pi, audio processing, and artificial intelligence. If you come across any doubts or errors, go through the respective documentation or comment below. The entire working code and original project directory can be found in this GitHub repository: https://github.com/Shaashwat05/Predictive_speech

If you like projects like these, do comment and more tutorials will come your way. If you need similar ideas, visit this link: https://dzone.com/articles/artificial-intelligence-in-iot-4-examples-how-to-m-1


