Text-To-Speech calls with Laravel and Nexmo
By Michael Heap
This is the second part of a series of tutorials by Michael Heap, covering how to build a multi-channel help desk system with Laravel.
In our last post, we covered sending and receiving SMS Laravel Notifications with Nexmo, and I’d recommend completing the tasks in that post before working through this one. Alternatively, you can check out a pre-built version on GitHub.
Today, we’re going to take it a little bit further and add the ability to make a text-to-speech call to a customer whenever a new response is added to their ticket.
Prerequisites
To work through this post, you’ll need a Nexmo account and the Nexmo Command Line tool installed and configured, as well as your usual Laravel prerequisites. You’ll also need the Deskmo application from part 1 of this series.
Before we get started, you’ll want to expose your local server to the internet again. Run php artisan serve in one terminal and ngrok http 8000 in another. If you’re using an auto-generated ngrok URL, you’ll need to update your SMS webhook URL on the Nexmo dashboard.
Choosing our notification method
The first thing to do is to make the notification mechanism for our ticket responses configurable. Instead of sending an SMS automatically, we want the help desk agent to be able to choose the method of feedback when they add a reply.
Open up resources/views/ticket/create.blade.php
and add the following HTML between the recipient
and submit
form groups to add radio buttons to select the notification method:
<div class="form-group">
    <div class="radio">
        <label>
            <input type="radio" name="notification_method" value="sms"> SMS
        </label>
    </div>
    <div class="radio">
        <label>
            <input type="radio" name="notification_method" value="voice"> Voice
        </label>
    </div>
</div>
As well as adding the HTML, we need to update our TicketController
to use the value passed in. Open up app/Http/Controllers/TicketController.php
and edit the store
method. We need to mark notification_method
as a required field in the validator at the top of this method:
$data = $request->validate([
    'title' => 'required',
    'content' => 'required',
    'recipient' => 'required|exists:users,id',
    'channel' => 'required',
    'notification_method' => 'required',
]);
Finally, at the end of the store
method look for where we send a notification and wrap it in an if
statement, adding conditions for when the preferred notification method is a voice call and when an invalid notification method is provided.
if ($data['notification_method'] === 'sms') {
    Notification::send($ticket->subscribedUsers()->get(), new TicketCreated($entry));
} elseif ($data['notification_method'] === 'voice') {
    // Make a voice call here
} else {
    throw new \Exception('Invalid notification method provided');
}
The agent can now choose a voice call as the notification method, but it won’t result in a notification being sent to the customer. We need to add support for voice calls to our application.
When handling a voice call, Nexmo will interact with our application in two different ways – when the call is answered and when the call status changes.
Answering a call
When a call is answered, Nexmo make a GET request to an answer_url. This endpoint in our application will return a JSON document that explains what to do on the call. This response is known as a Nexmo Call Control Object (NCCO).
The first thing to do is create a WebhookController to handle this incoming request:
php artisan make:controller WebhookController
To have the call speak a message to the customer, we need to use a talk action. This takes the following form when expressed as JSON:
[
    {
        "action": "talk",
        "text": "This is an example talk action"
    }
]
Let’s take that format and implement an answer method that will return an NCCO. Our text should contain the message contents if the ticket entry exists, and an error message if it does not. Add the following to the new WebhookController:
public function answer(TicketEntry $ticket)
{
    if (!$ticket->exists) {
        return response()->json([
            [
                'action' => 'talk',
                'text' => 'Sorry, there has been an error fetching your ticket information'
            ]
        ]);
    }

    return response()->json([
        [
            'action' => 'talk',
            'text' => $ticket->content
        ]
    ]);
}
We will need to add use App\TicketEntry;
to the top of our controller so that it can be used as a type hint in our method.
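To make the shape of the NCCO easier to see outside of Laravel, here is a minimal plain PHP sketch of the same logic. The buildTalkNcco helper is hypothetical (it is not part of the Deskmo application or the Nexmo library), and passing null stands in for a ticket entry that could not be found:

```php
<?php

// Hypothetical helper: builds the NCCO array for a ticket entry's content.
// Passing null stands in for "ticket entry not found".
function buildTalkNcco(?string $content): array
{
    $text = $content ?? 'Sorry, there has been an error fetching your ticket information';

    // An NCCO is a list of actions, so the single talk action is wrapped in an array.
    return [
        [
            'action' => 'talk',
            'text' => $text,
        ],
    ];
}

echo json_encode(buildTalkNcco('Your ticket has been updated')), PHP_EOL;
```

Keeping the NCCO construction in one place like this would also make it easier to add further actions later without duplicating the array in each branch.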
As well as implementing the controller, we need to tell our application how to route the incoming request. Open up routes/web.php
and add the following route at the bottom:
Route::get('/webhook/answer/{ticket?}', 'WebhookController@answer');
You should now be able to visit this webhook endpoint and view both a successful response and an error response.
Note: This will work for demo purposes, but it is extremely insecure as an attacker can put any ID in the URL to view a ticket response. If implementing this in your application, you’ll need to set up a shared secret between your application and Nexmo.
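One possible way to approach this, sketched here in plain PHP, is to sign the ticket entry ID with a secret that only our application knows and include the signature in the answer_url that we hand to Nexmo. This is an illustration under assumptions rather than a mechanism documented by Nexmo: the signTicketId and isValidSignature helpers, the signature query parameter and the secret value are all hypothetical.

```php
<?php

// Hypothetical helper: sign the ticket entry ID with a secret known only to our application.
function signTicketId(int $ticketId, string $secret): string
{
    return hash_hmac('sha256', (string) $ticketId, $secret);
}

// Hypothetical helper: check a signature received on the webhook request.
function isValidSignature(int $ticketId, string $signature, string $secret): bool
{
    // hash_equals compares in constant time, which avoids leaking timing information.
    return hash_equals(signTicketId($ticketId, $secret), $signature);
}

$secret = 'replace-with-a-long-random-value'; // assumption: loaded from .env in practice
$signature = signTicketId(42, $secret);

// The answer_url could then carry the signature as a query parameter.
echo '/webhook/answer/42?signature=' . $signature, PHP_EOL;
```

With something like this in place, the answer method could refuse to read out a ticket whenever the signature is missing or does not match the ID in the URL.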
Voice call events
In addition to making a GET request to the answer_url when a call is answered, Nexmo will send status changes on the call to an event_url. For example, Nexmo will make a POST request to our event_url when:
- A call is placed
- The call recipient answers
- All parties in the call hang up
- The call recording is available
As the information from these events could be useful later on, we’re going to add an entry to our application log each time we receive an event.
We’ll need to import the Log facade at the top of our WebhookController:
use Log;
Following that, we need to implement our event method. As we’re writing everything we receive in the request as contextual information, we only need two lines in this method – one to log the data and the other to tell Nexmo that the data was received without any issues:
public function event(Request $request)
{
    Log::info('Call event', $request->all());

    return response('', 204);
}
Finally, we need to add some routing so that the inbound Nexmo requests make it to this controller. Open up routes/web.php again and add the following route:
Route::post('/webhook/event', 'WebhookController@event');
As this is a POST route and the data is coming from outside of our application, we need to open up app/Http/Middleware/VerifyCsrfToken.php like we did in the last post and add our new event endpoint to the list of whitelisted endpoints. This will prevent Nexmo from receiving a CSRF error from Laravel:
protected $except = [
    'ticket-entry',
    'webhook/event'
];
If you want to give it a go, make a POST request to http://localhost:8000/webhook/event with any data you like, and you will see it appear in the application log file at storage/logs/laravel.log.
Creating a Nexmo application
Now that we’ve created all of the endpoints that Nexmo depend on, it’s time to register our application with them. To do this, we’ll use the Nexmo CLI tool that we installed in the first post.
There’s an app:create command which accepts an answer_url and event_url, and will generate a private key that we use to authenticate with the Nexmo API. Create an application now, replacing the ngrok URL with your own, by running the following command in the same directory as composer.json:
nexmo app:create LaravelNews http://abc123.ngrok.io/webhook/answer http://abc123.ngrok.io/webhook/event --keyfile private.key
Once our application is created, we need to associate a Nexmo phone number with it. This is so that Nexmo know which application answer_url to query when a phone call is made to that phone number. We’ll need the application ID that was returned by app:create and the phone number we purchased in the first post. Once we have those, we can use the Nexmo CLI to link that number to our application:
nexmo link:app <YOUR_NUMBER> <APPLICATION_ID>
We’ve done everything we need to for Nexmo to be able to handle a voice call for our application. At this point, we can call our Nexmo number to hear the default text-to-speech message.
The only thing left to do is actually make an outbound voice call to our customer when a ticket entry is added.
nexmo/laravel
In the first blog post, we installed nexmo/client
with Composer to allow Laravel to send SMS notifications through the Notification system. This library is where all of the logic for interacting with the Nexmo API lives.
We could use that library directly, but Nexmo provide a Laravel service provider that makes life easier for us. Let’s go ahead and install that now:
composer require nexmo/laravel
We previously populated our .env
file with an API key and secret, but we need to authenticate using a JWT when making a voice call (rather than using our API key and secret). The Nexmo PHP library will take care of authentication for us – all we need to do is provide the path to the private.key
which was created earlier and the application_id
to use. Open up your .env
file and add the following:
NEXMO_APPLICATION_ID=<application_id>
NEXMO_PRIVATE_KEY=./private.key
The nexmo/laravel package usually publishes config/nexmo.php, which will read the .env file and populate the correct configuration values. However, as we created a nexmo section in config/services.php in the last post, this will not happen in our application.
To add support for voice calls, we need to update config/services.php
and add private_key
and application_id
to the nexmo
section:
'nexmo' => [
    'key' => env('NEXMO_KEY'),
    'secret' => env('NEXMO_SECRET'),
    'sms_from' => env('NEXMO_NUMBER'),
    'private_key' => env('NEXMO_PRIVATE_KEY'),
    'application_id' => env('NEXMO_APPLICATION_ID'),
],
Make a Text-To-Speech voice call
Now that we’ve provided our authentication details, it’s time to make a voice call. We need to update our TicketController
to trigger a call when the communication method chosen is voice.
To make an outbound voice call, we need to provide Nexmo with a to
number, a from
number and details on the answer_url
and event_url
to use for this call. This is important, as we want to provide a specific answer_url
based on the TicketEntry
ID that was just created.
Open TicketController
and scroll down to where we left the comment // Make a voice call here
. Replace that comment with the following code, making sure to replace the ngrok
URL with your own host name:
$currentHost = 'http://abc123.ngrok.io';

Nexmo::calls()->create([
    'to' => [[
        'type' => 'phone',
        'number' => $cc->user->phone_number
    ]],
    'from' => [
        'type' => 'phone',
        'number' => config('services.nexmo.sms_from')
    ],
    'answer_url' => [$currentHost.'/webhook/answer/'.$entry->id],
    'event_url' => [$currentHost.'/webhook/event']
]);
In addition, we’ll need to add use Nexmo;
to our imports at the top of the file so that it is resolved correctly.
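If you would rather not hardcode the host inside the controller, the payload can be assembled by a small function that takes the host as an argument. The following plain PHP sketch shows the idea; buildCallPayload is a hypothetical helper, the phone numbers are made-up placeholders, and in a Laravel application the host could come from configuration (for example config('app.url'), assuming APP_URL is set to your ngrok URL):

```php
<?php

// Hypothetical helper: assembles the array that is passed to Nexmo::calls()->create().
function buildCallPayload(string $host, string $to, string $from, int $entryId): array
{
    // Tolerate a trailing slash on the configured host.
    $host = rtrim($host, '/');

    return [
        'to' => [['type' => 'phone', 'number' => $to]],
        'from' => ['type' => 'phone', 'number' => $from],
        'answer_url' => [$host . '/webhook/answer/' . $entryId],
        'event_url' => [$host . '/webhook/event'],
    ];
}

// Placeholder values for illustration only.
$payload = buildCallPayload('http://abc123.ngrok.io/', '14155550100', '14155550101', 7);

echo $payload['answer_url'][0], PHP_EOL;
```

The returned array has the same structure as the one used in the controller, so it could be passed straight to the create call.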
Once we’ve made these changes, we can give it a try! Add a new ticket and select voice call as the notification method. Your phone should ring, and the latest ticket entry should be read out using the Nexmo text-to-speech engine.
Capturing the user’s response
This is a great first step, but just like when we were sending an SMS, communication is one way. Wouldn’t it be great if the user could speak back to us and have that added as an entry on the ticket?
Nexmo allow us to record audio from a call and fetch that recording once the call has finished. We’ll use that recording and send it to a transcription service, taking the text returned and adding it to our ticket as a new entry.
For Nexmo to record our call, we need to add some new actions to our NCCO. We need to update our WebhookController
so that the answer
action reads the TicketEntry
that was added, then starts listening for a response from the user. We want the call to beep when the user can start talking, and stop recording when they press the # key:
return response()->json([
    [
        'action' => 'talk',
        'text' => $ticket->content
    ],
    [
        'action' => 'talk',
        'text' => 'To add a reply, please leave a message after the beep, then press the pound key',
        'voiceName' => 'Brian'
    ],
    [
        'action' => 'record',
        'endOnKey' => '#',
        'beepStart' => true
    ]
]);
Try adding a new Ticket now, choosing voice as your notification method. When the call is placed you should hear the new actions that we just added and be prompted to add a response via voice.
Fetching the call recording
When the call is completed Nexmo will send an event to our event_url
containing the recording_url
of the call, which will look like the following:
{
    "start_time": "2018-02-11T15:02:28Z",
    "recording_url": "https://api.nexmo.com/v1/files/092c732b-19b0-468c-bcd6-3f069650ddaf",
    "size": 28350,
    "recording_uuid": "8c618cc3-5bf5-42af-91cd-b628857f7fea",
    "end_time": "2018-02-11T15:02:35Z",
    "conversation_uuid": "CON-9ff341d8-fb45-47c7-aa27-9144c8db0447",
    "timestamp": "2018-02-11T15:02:35.889Z"
}
At the moment, our application will take this event and log it to disk. As well as logging the information, we want to fetch the recording_url
and use a transcription service to fetch the text.
Open up WebhookController
again and update the event
method to check if recording_url
is set. If it is, call the transcribeRecording
method (which we’ll implement next):
public function event(Request $request)
{
    $params = $request->all();
    Log::info('Call event', $params);

    if (isset($params['recording_url'])) {
        $voiceResponse = $this->transcribeRecording($params['recording_url']);
    }

    return response('', 204);
}
Next, we need to implement our transcribeRecording
method. The first thing to do is take the recording_url
and fetch it with the Nexmo library:
public function transcribeRecording($recordingUrl)
{
    $audio = \Nexmo::get($recordingUrl)->getBody();
}
This will provide the raw audio for the call, which we can feed in to a transcription service to get the speech as text.
Transcribe the call recording
To transcribe the call, we’re going to use IBM’s speech-to-text API. If you don’t already have an account, you’ll need to register for IBM Bluemix to work through this next section.
Once you’re logged in, visit the projects page and click on Create project in the top right. In the modal, click Get Watson Services, select Speech to Text and click Add Services on the right. Give your project a name, then create a project.
At the bottom of the page that appears there will be a Credentials section. Click on Show on the right-hand side and make a note of your username and password.
IBM don’t have an official PHP client library, so we’re going to use Guzzle and make an HTTP request directly to their API. We make a POST request with the audio data we just requested from Nexmo and will get a JSON document as a response that contains the transcribed text.
Add the following to your transcribeRecording
method to call the IBM transcription API, replacing username
and password
with your IBM credentials:
$client = new \GuzzleHttp\Client([
    'base_uri' => 'https://stream.watsonplatform.net/'
]);

$transcriptionResponse = $client->request('POST', 'speech-to-text/api/v1/recognize', [
    'auth' => ['username', 'password'],
    'headers' => [
        'Content-Type' => 'audio/mpeg',
    ],
    'body' => $audio
]);

$transcription = json_decode($transcriptionResponse->getBody());
The IBM transcription API returns a JSON response, with a structure that looks like the following:
{
    "results": [
        {
            "alternatives": [
                {
                    "confidence": 0.767,
                    "transcript": "hello "
                }
            ],
            "final": true
        },
        {
            "alternatives": [
                {
                    "confidence": 0.982,
                    "transcript": "this is a test "
                }
            ],
            "final": true
        }
    ],
    "result_index": 0
}
We need to loop through this response, build up a string and add it as a reply to our ticket. Add the following to the end of your transcribeRecording
method:
$voiceResponse = '';
foreach ($transcription->results as $result) {
    $voiceResponse .= $result->alternatives[0]->transcript.' ';
}

return $voiceResponse;
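To see what this loop produces without calling the API, the following standalone PHP sketch runs similar logic against the sample response shown above. The joinTranscripts helper is hypothetical, and unlike the loop in the controller it trims each transcript and tolerates a missing results key, which is a design choice made for this sketch rather than something the API requires:

```php
<?php

// Sample response, copied from the structure shown above.
$json = '{"results":[{"alternatives":[{"confidence":0.767,"transcript":"hello "}],"final":true},'
    . '{"alternatives":[{"confidence":0.982,"transcript":"this is a test "}],"final":true}],'
    . '"result_index":0}';

// Hypothetical helper: joins the first alternative of each result into one string.
function joinTranscripts(string $json): string
{
    $decoded = json_decode($json);
    $parts = [];

    // Fall back to an empty list if the JSON is invalid or has no results.
    foreach ($decoded->results ?? [] as $result) {
        if (isset($result->alternatives[0]->transcript)) {
            // Each transcript ends with a space, so trim before joining.
            $parts[] = trim($result->alternatives[0]->transcript);
        }
    }

    return implode(' ', $parts);
}

echo joinTranscripts($json), PHP_EOL; // prints "hello this is a test"
```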
Now that we have our transcribed text, it’s time to add it to our ticket. Unfortunately we don’t have a way of linking the incoming recording_url
to the user’s phone number, so we don’t know which ticket the audio is in response to.
Note: To keep things simple in this post, we’ll add this response to the latest ticket created and attribute it to the user that created the ticket. In the real world, you’d need to keep track of the call’s conversation_uuid
when you place the call and use that to find the correct user (but I’ll leave that as an exercise for you).
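As a starting point for that exercise, here is a rough plain PHP sketch of the bookkeeping involved. It assumes that the conversation_uuid is available when the call is placed and that it matches the conversation_uuid in the recording event; the CallTracker class is hypothetical and keeps everything in memory, whereas a real application would more likely store the value in a database column.

```php
<?php

// Hypothetical in-memory store; a real application would persist this mapping.
final class CallTracker
{
    /** @var array<string, int> conversation_uuid => ticket entry ID */
    private array $calls = [];

    // Record which ticket entry a call was placed for.
    public function remember(string $conversationUuid, int $entryId): void
    {
        $this->calls[$conversationUuid] = $entryId;
    }

    // Look up the ticket entry for an incoming event, or null if the call is unknown.
    public function entryFor(string $conversationUuid): ?int
    {
        return $this->calls[$conversationUuid] ?? null;
    }
}

$tracker = new CallTracker();
$tracker->remember('CON-9ff341d8-fb45-47c7-aa27-9144c8db0447', 7);

echo $tracker->entryFor('CON-9ff341d8-fb45-47c7-aa27-9144c8db0447'), PHP_EOL;
```

From the ticket entry you could then find the ticket and the user it was sent to, instead of assuming that the reply belongs to the latest ticket.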
Update your event
method so that the if
statement that checks if there’s a recording_url
also creates a new TicketEntry
and adds it to the latest ticket:
if (isset($params['recording_url'])) {
    $voiceResponse = $this->transcribeRecording($params['recording_url']);

    $ticket = Ticket::all()->last();
    $user = $ticket->subscribedUsers()->first();

    $entry = new TicketEntry([
        'content' => $voiceResponse,
        'channel' => 'voice',
    ]);

    $entry->user()->associate($user);
    $entry->ticket()->associate($ticket);
    $entry->save();
}
As well as updating the event method, you’ll need to add use App\Ticket; to your list of imports at the top of the WebhookController.
Add a new ticket now, and when you receive a phone call try saying “this is an audio reply, which is awesome” before pressing the # key. Once the call is done, open up the ticket that was just added and refresh until your reply appears (it shouldn’t take more than a few seconds).
Conclusion
Congratulations, you made it to the end! We took our existing application that could handle two-way SMS messaging and added the capability to make voice calls and accept responses from our customers using the IBM Watson transcription API.
In the next post in this series, we’ll be adding support for messaging your customers using a chat application rather than SMS or voice, providing them with a way to have a real-time conversation with your support staff.
If you’d like some Nexmo credit to work through this post and test the platform out, get in touch at devrel@nexmo.com quoting LaravelNews and we’ll get that sorted for you.
If you have any thoughts or questions, don’t hesitate to reach out to @mheap on Twitter or to devrel@nexmo.com.
Many thanks to Nexmo for sponsoring Laravel News this week.
Michael is a PHP developer advocate at Nexmo. Working with a variety of languages and tools, he shares his technical expertise with audiences all around the world at user groups and conferences. When he finds time to code, he enjoys reducing complexity in systems and making them more predictable.