Text-To-Speech calls with Laravel and Nexmo

This is the second part of a series of tutorials by Michael Heap, covering how to build a multi-channel help desk system with Laravel.

In our last post, we covered sending and receiving SMS Laravel Notifications with Nexmo, and I’d recommend completing the tasks in that post before working through this one. Alternatively, you can check out a pre-built version on GitHub.

Today, we’re going to take it a little bit further and add the ability to make a text-to-speech call to a customer whenever a new response is added to their ticket.

Prerequisites

To work through this post, you’ll need a Nexmo account and the Nexmo Command Line tool installed and configured, as well as your usual Laravel prerequisites. You’ll also need the Deskmo application from part 1 of this series.

Before we get started, you’ll want to expose your local server to the internet again. Run php artisan serve in one terminal, and ngrok http 8000 in another. If you’re using an auto-generated ngrok URL, you’ll need to update your SMS webhook URL on the Nexmo dashboard.

Choosing our notification method

The first thing to do is to make the notification mechanism for our ticket responses configurable. Instead of sending an SMS automatically, we want the help desk agent to be able to choose the method of feedback when they add a reply.

Open up resources/views/ticket/create.blade.php and add the following HTML between the recipient and submit form groups to add radio buttons to select the notification method:

<div class="form-group">
    <div class="radio">
        <label>
            <input type="radio" name="notification_method" value="sms">
            SMS
        </label>
    </div>
    <div class="radio">
        <label>
            <input type="radio" name="notification_method" value="voice">
            Voice
        </label>
    </div>
</div>

As well as adding the HTML, we need to update our TicketController to use the value passed in. Open up app/Http/Controllers/TicketController.php and edit the store method. We need to mark notification_method as a required field in the validator at the top of this method:

$data = $request->validate([
    'title' => 'required',
    'content' => 'required',
    'recipient' => 'required|exists:users,id',
    'channel' => 'required',
    'notification_method' => 'required',
]);

Finally, at the end of the store method look for where we send a notification and wrap it in an if statement, adding conditions for when the preferred notification method is a voice call and when an invalid notification method is provided.

if ($data['notification_method'] === 'sms') {
    Notification::send($ticket->subscribedUsers()->get(), new TicketCreated($entry));
} elseif ($data['notification_method'] === 'voice') {
    // Make a voice call here
} else {
    throw new \Exception('Invalid notification method provided');
}

The agent can now choose a voice call as the notification method, but it won’t result in a notification being sent to the customer. We need to add support for voice calls to our application.

When handling a voice call, Nexmo will interact with our application in two different ways – when the call is answered and when the call status changes.

Answering a call

When a call is answered, Nexmo make a GET request to an answer_url. This endpoint in our application will return a JSON document that explains what to do on the call. This response is known as a Nexmo Call Control Object (NCCO).

The first thing to do is create a WebhookController to handle this incoming request:

php artisan make:controller WebhookController

To have the call speak a message to the customer, we need to use a talk action. This takes the following form when expressed as JSON:

[
  {
    "action": "talk",
    "text": "This is an example talk action"
  }
]

Let’s take that format and implement an answer method that will return an NCCO. Our text should contain the message contents if the ticket entry exists, and an error message if it does not. Add the following to the new WebhookController:

public function answer(TicketEntry $ticket) {
    if (!$ticket->exists) {
        return response()->json([
            [
                'action' => 'talk',
                'text' => 'Sorry, there has been an error fetching your ticket information'
            ]
        ]);
    }
 
    return response()->json([
        [
            'action' => 'talk',
            'text' => $ticket->content
        ]
    ]);
}

We will need to add use App\TicketEntry; to the top of our controller so that it can be used as a type hint in our method.
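The NCCO-building part of that method can also be written as a plain function, which makes it easy to check outside of Laravel. The following is a minimal sketch rather than part of the Deskmo code: buildTalkNcco is a hypothetical helper name, and the fallback text is copied from the answer method above.

```php
<?php
// Hypothetical helper (not part of the tutorial's code): builds a one-action
// NCCO that speaks the given text, falling back to an error message when
// there is nothing to read out.
function buildTalkNcco(?string $text): array
{
    $fallback = 'Sorry, there has been an error fetching your ticket information';

    return [
        [
            'action' => 'talk',
            'text' => ($text === null || trim($text) === '') ? $fallback : $text,
        ],
    ];
}

// An NCCO is a JSON array of actions, so the outer list must stay a list.
echo json_encode(buildTalkNcco('This is an example talk action'), JSON_PRETTY_PRINT), PHP_EOL;
```

Inside the controller, the array returned by a helper like this could be passed straight to response()->json().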

As well as implementing the controller, we need to tell our application how to route the incoming request. Open up routes/web.php and add the following route at the bottom:

Route::get('/webhook/answer/{ticket?}', 'WebhookController@answer');

You should now be able to visit this webhook endpoint and view both a successful response and an error response.

Note: This will work for demo purposes, but it is extremely insecure as an attacker can put any ID in the URL to view a ticket response. If implementing this in your application, you’ll need to set up a shared secret between your application and Nexmo.
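One possible way to close that gap, sketched here under assumptions, is to sign the ticket entry ID with a secret that only your application knows and append the signature to the answer_url you hand to Nexmo. The function names, the signature query parameter and the $secret value are all hypothetical; this is not a built-in Nexmo or Laravel feature.

```php
<?php
// Hypothetical helpers: sign the ticket entry ID so that a valid answer URL
// cannot be guessed. $secret is an assumed application-level secret.
function signEntryId(int $entryId, string $secret): string
{
    return hash_hmac('sha256', (string) $entryId, $secret);
}

function isValidSignature(int $entryId, string $signature, string $secret): bool
{
    // hash_equals compares in constant time, which avoids timing attacks.
    return hash_equals(signEntryId($entryId, $secret), $signature);
}

$secret = 'replace-with-a-long-random-value';
$url = 'http://abc123.ngrok.io/webhook/answer/42?signature=' . signEntryId(42, $secret);
echo $url, PHP_EOL;
```

The answer method would then reject any request where isValidSignature returns false before reading the ticket content.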

Voice call events

In addition to making a GET request to the answer_url when a call is answered, Nexmo will send status changes on the call to an event_url. For example, Nexmo will make a POST request to our event_url when:

A call is placed
The call recipient answers
All parties in the call hang up
The call recording is available

As the information in these events could be useful later on, we’re going to add an entry to our application log each time we receive an event.

We’ll need to import the Log facade at the top of our WebhookController:

use Log;

Following that, we need to implement our event method. As we’re writing everything we receive in the request as contextual information we only need two lines in this method – one to log the data and the other to tell Nexmo that data was received without any issues:

public function event(Request $request) {
    Log::info('Call event', $request->all());
    return response('', 204);
}
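If the raw payloads make the log hard to scan, a short summary line can help. The sketch below is an illustration only: summariseCallEvent is a hypothetical helper, and the status and uuid field names are assumptions about the event payload, so check them against the events you actually receive.

```php
<?php
// Hypothetical helper: turn an event payload into a one-line log summary.
// The status and uuid field names are assumptions about the payload.
function summariseCallEvent(array $payload): string
{
    if (isset($payload['recording_url'])) {
        return 'recording available at ' . $payload['recording_url'];
    }

    $status = $payload['status'] ?? 'unknown';
    $uuid = $payload['uuid'] ?? 'n/a';

    return sprintf('call %s is %s', $uuid, $status);
}

echo summariseCallEvent(['uuid' => 'abc', 'status' => 'answered']), PHP_EOL;
```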

Finally, we need to add some routing so that the inbound Nexmo requests make it to this controller. Open up routes/web.php again and add the following route:

Route::post('/webhook/event', 'WebhookController@event');

As this is a POST route and the data is coming from outside of our application, we need to open up app/Http/Middleware/VerifyCsrfToken.php like we did in the last post and add our new event endpoint to the list of whitelisted endpoints. This will prevent Nexmo from receiving a CSRF error from Laravel:

protected $except = [
    'ticket-entry',
    'webhook/event'
];

If you want to give it a go, make a POST request to http://localhost:8000/webhook/event with any data you like, and you will see it appear in the application log file at storage/logs/laravel.log.

Creating a Nexmo application

Now that we’ve created all of the endpoints that Nexmo depend on, it’s time to register our application with them. To do this, we’ll use the Nexmo CLI tool that we installed in the first post.

There’s an app:create command which accepts an answer_url and event_url, and will generate a private key that we use to authenticate with the Nexmo API. Create an application now, replacing the ngrok URL with your own, by running the following command in the same directory as composer.json:

nexmo app:create LaravelNews http://abc123.ngrok.io/webhook/answer http://abc123.ngrok.io/webhook/event --keyfile private.key

Once our application is created, we need to associate a Nexmo phone number to it. This is so that Nexmo know which application answer_url to query when a phone call is made to that phone number. We’ll need the application ID that was returned by app:create and the phone number we purchased in the first post. Once we have those, we can use the Nexmo CLI to link that number to our application:

nexmo link:app <YOUR_NUMBER> <APPLICATION_ID>

We’ve done everything we need to for Nexmo to be able to handle a voice call for our application. At this point, we can call our Nexmo number to hear a text-to-speech message. As no ticket ID is passed on an inbound call, this will be the error message from our answer method.

The only thing left to do is actually make an outbound voice call to our customer when a ticket entry is added.

nexmo/laravel

In the first blog post, we installed nexmo/client with Composer to allow Laravel to send SMS notifications through the Notification system. This library is where all of the logic for interacting with the Nexmo API lives.

We could use that library directly, but Nexmo provide a Laravel service provider that makes life easier for us. Let’s go ahead and install that now:

composer require nexmo/laravel

We previously populated our .env file with an API key and secret, but we need to authenticate using a JWT when making a voice call (rather than using our API key and secret). The Nexmo PHP library will take care of authentication for us – all we need to do is provide the path to the private.key which was created earlier and the application_id to use. Open up your .env file and add the following:

NEXMO_APPLICATION_ID=<application_id>
NEXMO_PRIVATE_KEY=./private.key

The nexmo/laravel package usually publishes config/nexmo.php, which will read the .env file and populate the correct configuration values. However, as we created a nexmo section in config/services.php in the last post, this will not happen in our application.

To add support for voice calls, we need to update config/services.php and add private_key and application_id to the nexmo section:

'nexmo' => [
    'key' => env('NEXMO_KEY'),
    'secret' => env('NEXMO_SECRET'),
    'sms_from' => env('NEXMO_NUMBER'),
    'private_key' => env('NEXMO_PRIVATE_KEY'),
    'application_id' => env('NEXMO_APPLICATION_ID'),
],

Make a Text-To-Speech voice call

Now that we’ve provided our authentication details, it’s time to make a voice call. We need to update our TicketController to trigger a call when the communication method chosen is voice.

To make an outbound voice call, we need to provide Nexmo with a to number, a from number and details on the answer_url and event_url to use for this call. This is important, as we want to provide a specific answer_url based on the TicketEntry ID that was just created.

Open TicketController and scroll down to where we left the comment // Make a voice call here. Replace that comment with the following code, making sure to replace the ngrok URL with your own host name:

$currentHost = 'http://abc123.ngrok.io';
Nexmo::calls()->create([
    'to' => [[
        'type' => 'phone',
        'number' => $cc->user->phone_number
    ]],
    'from' => [
        'type' => 'phone',
        'number' => config('services.nexmo.sms_from')
    ],
    'answer_url' => [$currentHost.'/webhook/answer/'.$entry->id],
    'event_url' => [$currentHost.'/webhook/event']
]);

In addition, we’ll need to add use Nexmo; to our imports at the top of the file so that it is resolved correctly.
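Hard-coding the ngrok host works for a demo, but the payload is easier to maintain if the host is passed in. As a rough sketch, the array given to the call-creation method could be assembled by a small function; buildCallPayload is a hypothetical name, and the phone numbers below are placeholders.

```php
<?php
// Hypothetical helper: assemble the array passed to the call-creation method.
// $host is assumed to come from configuration rather than being hard-coded.
function buildCallPayload(string $host, string $to, string $from, int $entryId): array
{
    // Avoid a double slash if the configured host ends with "/".
    $host = rtrim($host, '/');

    return [
        'to' => [['type' => 'phone', 'number' => $to]],
        'from' => ['type' => 'phone', 'number' => $from],
        'answer_url' => [$host . '/webhook/answer/' . $entryId],
        'event_url' => [$host . '/webhook/event'],
    ];
}

print_r(buildCallPayload('http://abc123.ngrok.io/', '14155550100', '14155550101', 7));
```

In the controller, the result could be passed to the same create call shown above, with the host read from your own configuration.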

Once we’ve made these changes, we can give it a try! Add a new ticket and select voice call as the notification method. Your phone should ring, and the latest ticket entry should be read out using the Nexmo text-to-speech engine.

Capturing the user’s response

This is a great first step, but just like when we were sending an SMS, communication is one way. Wouldn’t it be great if the user could speak back to us and have that added as an entry on the ticket?

Nexmo allow us to record audio from a call and fetch that recording once the call has finished. We’ll use that recording and send it to a transcription service, taking the text returned and adding it to our ticket as a new entry.

For Nexmo to record our call, we need to add some new actions to our NCCO. We need to update our WebhookController so that the answer action reads the TicketEntry that was added, then starts listening for a response from the user. We want the call to beep when the user can start talking, and stop recording when they press the # key:

return response()->json([
    [
        'action' => 'talk',
        'text' => $ticket->content
    ],
    [
        'action' => 'talk',
        'text' => 'To add a reply, please leave a message after the beep, then press the pound key',
        'voiceName' => 'Brian'
    ],
    [
        'action' => 'record',
        'endOnKey' => '#',
        'beepStart' => true
    ]
]);

Try adding a new Ticket now, choosing voice as your notification method. When the call is placed you should hear the new actions that we just added and be prompted to add a response via voice.

Fetching the call recording

When the call is completed Nexmo will send an event to our event_url containing the recording_url of the call, which will look like the following:

{
  "start_time": "2018-02-11T15:02:28Z",
  "recording_url": "https://api.nexmo.com/v1/files/092c732b-19b0-468c-bcd6-3f069650ddaf",
  "size": 28350,
  "recording_uuid": "8c618cc3-5bf5-42af-91cd-b628857f7fea",
  "end_time": "2018-02-11T15:02:35Z",
  "conversation_uuid": "CON-9ff341d8-fb45-47c7-aa27-9144c8db0447",
  "timestamp": "2018-02-11T15:02:35.889Z"
}
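As a quick illustration of working with this payload, the start_time and end_time fields are enough to work out how long the recording lasted. This is a small sketch using only the fields shown above; recordingDurationSeconds is a hypothetical helper name.

```php
<?php
// Sketch: decode a recording event and work out how long the recording lasted.
$json = '{"start_time":"2018-02-11T15:02:28Z","end_time":"2018-02-11T15:02:35Z","size":28350}';

function recordingDurationSeconds(array $event): int
{
    $start = new DateTimeImmutable($event['start_time']);
    $end = new DateTimeImmutable($event['end_time']);

    return $end->getTimestamp() - $start->getTimestamp();
}

$event = json_decode($json, true);
echo recordingDurationSeconds($event), PHP_EOL; // 7
```

A check like this could be used to skip transcription for very short or empty recordings.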

At the moment, our application will take this event and log it to disk. As well as logging the information, we want to fetch the recording_url and use a transcription service to fetch the text.

Open up WebhookController again and update the event method to check if recording_url is set. If it is, call the transcribeRecording method (which we’ll implement next):

public function event(Request $request) {
    $params = $request->all();
    Log::info('Call event', $params);
    if (isset($params['recording_url'])) {
        $voiceResponse = $this->transcribeRecording($params['recording_url']);
    }
    return response('', 204);
}

Next, we need to implement our transcribeRecording method. The first thing to do is take the recording_url and fetch it with the Nexmo library:

public function transcribeRecording($recordingUrl) {
    $audio = \Nexmo::get($recordingUrl)->getBody();
}

This will provide the raw audio for the call, which we can feed in to a transcription service to get the speech as text.

Transcribe the call recording

To transcribe the call, we’re going to use IBM’s speech-to-text API. If you don’t already have an account, you’ll need to register for IBM Bluemix to work through this next section.

Once you’re logged in, visit the projects page and click on Create project in the top right. In the modal, click Get Watson Services, select Speech to Text and click Add Services on the right. Give your project a name, then create a project.

At the bottom of the page that appears there will be a Credentials section. Click on Show on the right hand side and make a note of your username and password.

IBM don’t have an official PHP client library, so we’re going to be using Guzzle and making an HTTP request directly to their API. We make a POST request with the audio data we just requested from Nexmo and will get a JSON document as a response that contains the transcribed text.

Add the following to your transcribeRecording method to call the IBM transcription API, replacing username and password with your IBM credentials:

$client = new \GuzzleHttp\Client([
    'base_uri' => 'https://stream.watsonplatform.net/'
]);
 
$transcriptionResponse = $client->request('POST', 'speech-to-text/api/v1/recognize', [
    'auth' => ['username', 'password'],
    'headers' => [
        'Content-Type' => 'audio/mpeg',
    ],
    'body' => $audio
]);
 
$transcription = json_decode((string) $transcriptionResponse->getBody());

The IBM transcription API returns a JSON response, with a structure that looks like the following:

{
  "results": [
    {
      "alternatives": [
        {
          "confidence": 0.767,
          "transcript": "hello "
        }
      ],
      "final": true
    },
    {
      "alternatives": [
        {
          "confidence": 0.982,
          "transcript": "this is a test "
        }
      ],
      "final": true
    }
  ],
  "result_index": 0
}

We need to loop through this response, build up a string and add it as a reply to our ticket. Add the following to the end of your transcribeRecording method:

$voiceResponse = '';
foreach ($transcription->results as $result) {
    $voiceResponse .= $result->alternatives[0]->transcript.' ';
}
 
return $voiceResponse;
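Each transcript in the sample response already ends with a space, so the loop above produces doubled spaces and a trailing one. If that matters to you, a variant along the following lines trims each piece before joining. It is a standalone sketch that works on the raw JSON string, and joinTranscripts is a hypothetical name.

```php
<?php
// Sketch: join the first alternative of each result into one string,
// trimming the trailing space that each transcript carries.
function joinTranscripts(string $json): string
{
    $data = json_decode($json, true);
    $parts = [];

    foreach ($data['results'] ?? [] as $result) {
        $transcript = trim($result['alternatives'][0]['transcript'] ?? '');
        if ($transcript !== '') {
            $parts[] = $transcript;
        }
    }

    return implode(' ', $parts);
}

$json = '{"results":[{"alternatives":[{"confidence":0.767,"transcript":"hello "}],"final":true},'
    . '{"alternatives":[{"confidence":0.982,"transcript":"this is a test "}],"final":true}],"result_index":0}';
echo joinTranscripts($json), PHP_EOL; // hello this is a test
```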

Now that we have our transcribed text, it’s time to add it to our ticket. Unfortunately we don’t have a way of linking the incoming recording_url to the user’s phone number, so we don’t know which ticket the audio is in response to.

Note: To keep things simple in this post, we’ll add this response to the latest ticket created and attribute it to the user that created the ticket. In the real world, you’d need to keep track of the call’s conversation_uuid when you place the call and use that to find the correct user (but I’ll leave that as an exercise for you).
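As a starting point for that exercise, the lookup itself can be very small. The sketch below uses an in-memory array purely for illustration; CallRegistry is a hypothetical class, and a real application would persist the mapping (for example in a database table) so that it survives between the request that places the call and the webhook that reports the recording.

```php
<?php
// Hypothetical in-memory registry mapping a call's conversation_uuid to the
// ticket it was placed for. Persist this in real code; an array only lives
// for the duration of a single request.
class CallRegistry
{
    /** @var array<string, int> conversation_uuid => ticket ID */
    private $ticketsByConversation = [];

    public function remember(string $conversationUuid, int $ticketId): void
    {
        $this->ticketsByConversation[$conversationUuid] = $ticketId;
    }

    public function ticketFor(string $conversationUuid): ?int
    {
        return $this->ticketsByConversation[$conversationUuid] ?? null;
    }
}

$registry = new CallRegistry();
$registry->remember('CON-9ff341d8-fb45-47c7-aa27-9144c8db0447', 12);
var_dump($registry->ticketFor('CON-9ff341d8-fb45-47c7-aa27-9144c8db0447')); // int(12)
```

The event method could then use the conversation_uuid from the recording event to find the ticket, instead of assuming the latest one.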

Update your event method so that the if statement that checks if there’s a recording_url also creates a new TicketEntry and adds it to the latest ticket:

if (isset($params['recording_url'])) {
    $voiceResponse = $this->transcribeRecording($params['recording_url']);
 
    $ticket = Ticket::all()->last();
    $user = $ticket->subscribedUsers()->first();
 
    $entry = new TicketEntry([
        'content' => $voiceResponse,
        'channel' => 'voice',
    ]);
 
    $entry->user()->associate($user);
    $entry->ticket()->associate($ticket);
    $entry->save();
}

As well as updating the event method, you’ll need to add use App\Ticket; to your list of imports at the top of the WebhookController.

Add a new ticket now, and when you receive a phone call try saying “this is an audio reply, which is awesome” before pressing the # key. Once the call is done, open up the ticket that was just added and refresh until your reply appears (it shouldn’t take more than a few seconds).

Conclusion

Congratulations, you made it to the end! We took our existing application that could handle two-way SMS messaging and added the capability to make voice calls and accept responses from our customers using the IBM Watson transcription API.

In the next post in this series, we’ll be adding support for messaging your customers using a chat application rather than SMS or voice, providing them with a way to have a real-time conversation with your support staff.

If you’d like some Nexmo credit to work through this post and test the platform out, get in touch at devrel@nexmo.com quoting LaravelNews and we’ll get that sorted for you.

If you have any thoughts or questions, don’t hesitate to reach out to @mheap on Twitter or to devrel@nexmo.com.

Many thanks to Nexmo for sponsoring Laravel News this week.
