I came back from vacation to find that I was unable to log in to OCS. It turns out that an update installed on 10/14 broke the front-end services for OCS. The event log shows the following error:
The evaluation period for Microsoft Office Communications Server 2007 R2 has expired. Please upgrade from the evaluation version to the full released version of the product.
Uninstalling the KB974581 update corrected the problem.
Big thanks to Dietmar Kraume’s blog post at http://tinyurl.com/yjefeg9 for pointing to the solution.
I’m going to be at SpeechTEK August 25th and 26th (Tues and Wed). If anyone in the GotSpeech community is attending, drop me a line. It is always nice to meet in person for a change.
One of the most common ways of queuing outbound calls with Speech Server is by using the built-in MSMQ support. For the most part, using a message queue is extremely straightforward and easy to implement. But there is one gotcha – call throttling.
When an outbound call fails to connect, Speech Server will start throttling down the number of messages it pulls from the queue. The assumption is that failures are the result of your system having insufficient capacity to handle the load. The more failures you have, the more it will throttle the load. This would be acceptable if it were not for two issues:
- Almost any failure, even acceptable ones, can result in throttling
- It will happily throttle you all the way down to 0, effectively shutting down your application
Disabling call throttling needs to be done on a per-call basis by binding to the TelephonySession.OpenCompleted event prior to the MakeCall Activity. Typically I do this in a Code Activity at the very top of my workflow.
Inside my initWorkflow Activity I bind to the OpenCompleted event like so:
TelephonySession.OpenCompleted += new EventHandler<Microsoft.SpeechServer.AsyncCompletedEventArgs>(TelephonySession_OpenCompleted);
Then I use the following to disable throttling on any failure:
void TelephonySession_OpenCompleted(object sender, Microsoft.SpeechServer.AsyncCompletedEventArgs e)
{
    // Only SIP failures carry the throttling flag; "as" yields null for anything else
    SipPeerException sipEx = e.Error as SipPeerException;
    if (sipEx != null)
    {
        // Tell Speech Server not to throttle the queue because of this failure
        sipEx.ShouldThrottleMsmqCalls = false;
    }
}
Yesterday I received an email announcing that I had been made an MVP for my work in the Microsoft Communications Server community. This is my first MVP award. I’m very excited to be a part of the MVP program and to be a resource to others in the community. I expect Unified Communications (and OCS along with it) to experience substantial growth over the next 5 years and I’m thrilled to be a part of it.

When placing outbound calls there are a number of scenarios that you will want to apply business logic to. When your application reaches a BUSY number, for example, you will need to decide how to process this result (should you redial, wait some amount of time and then redial, stop calling, etc.).
In order to handle these events you will first need to catch them using the Fault Handler page. From here you define the workflows for processing each fault type.
The default project template for speech workflows includes two fault handlers. The first handles call disconnections (normally this happens when a caller hangs up). The second is a general fault handler for catching everything else.
We’re going to add another one for trapping SIP results. This will fire off a code block where we will make decisions on how to handle the results we care about.
- From the toolbox, drag a new FaultHandler object into the fault handlers list.
- On the Properties tab for the FaultHandler, open the details dialog for the FaultType property.
- In the Type Name text box enter “Microsoft.SpeechServer.SipPeerException” and click OK.
- Drop a Code activity into the workflow area of the new FaultHandler instance.
- Double-click the new Code activity to open the code view for the ExecuteCode event.
Now you can process the SipPeerException based on the ResponseCode you’ve received. For example:
if (sipPeerFaultHandler.Fault is Microsoft.SpeechServer.SipPeerException)
{
    Microsoft.SpeechServer.SipPeerException ex = (Microsoft.SpeechServer.SipPeerException)sipPeerFaultHandler.Fault;
    switch (ex.ResponseCode)
    {
        case 486: // Decline with Busy Here
        case 600: // Decline with Busy everywhere
            /*
             *
             * Handle your busy case here
             *
             */
            break;
        case 480: // Temporarily unavailable
        case 503: // Service Unavailable
        case 603: // Decline
        case 408: // Request Timeout
        case 504: // Gateway Timeout
        case 404: // Not Found
        case 484: // Address Incomplete
        case 604: // Does Not Exist Anywhere
        case 485: // Ambiguous
        case 410: // Gone
        default:
            break;
    }
}
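As a rough sketch of the kind of decision that could sit behind each case, the helper below sorts the response codes listed above into three outcomes. The CallAction enum, the SipOutcome class and the grouping of codes are all assumptions made for illustration. None of them are part of the Speech Server API, and which codes deserve a redial is ultimately a business decision.

```csharp
using System;

// Hypothetical outcomes an outbound dialer might act on (not a Speech Server type).
public enum CallAction
{
    RedialLater, // busy or temporarily unreachable; a later attempt may succeed
    GiveUp,      // the number looks wrong or gone; redialing is unlikely to help
    LogOnly      // anything else; record the result and move on
}

public static class SipOutcome
{
    // Maps a SIP response code to an action. The grouping is an assumption.
    public static CallAction Classify(int responseCode)
    {
        switch (responseCode)
        {
            case 486: // Busy Here
            case 600: // Busy Everywhere
            case 480: // Temporarily Unavailable
            case 503: // Service Unavailable
            case 408: // Request Timeout
            case 504: // Gateway/Server Timeout
                return CallAction.RedialLater;

            case 404: // Not Found
            case 410: // Gone
            case 484: // Address Incomplete
            case 485: // Ambiguous
            case 604: // Does Not Exist Anywhere
                return CallAction.GiveUp;

            default:  // includes 603 Decline
                return CallAction.LogOnly;
        }
    }
}
```

Inside the Code activity this would be called as SipOutcome.Classify(ex.ResponseCode), leaving the fault handler itself to deal only with scheduling the redial or closing out the call record.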
I’ve just finished reading the Microsoft Office Communications Server 2007 R2 Resource Kit (yes I know, I read a resource kit cover to cover. I’m a geek+ this week). If you’re going to be working with OCS 2007 R2 then do yourself a favor and pick this book up.
They do a great job of covering the OCS architecture and outlining implementation scenarios. There are a lot of different bits involved with deploying OCS and they cover each of them. They also include some helpful troubleshooting advice along the way.
From a developer’s perspective it is important to understand how OCS is deployed. Understanding how certificates are used for authentication, for example, is vital before you attempt to get your first Hello World off the ground.
Unfortunately they don’t cover Speech Server or the new UCMA 2.0 API here. It would have been nice to have an overview of OCS as a development platform. But at 800+ pages I can see why they decided not to address it.
Even without API coverage this is a must-have for IT Pros and Devs working with OCS. Check it out: http://shrinkster.com/154z
When testing Speech Server applications in our lab we often use what we call a “Box-2-Box” test. This is where we point two Speech Server instances at each other such that “Box A” makes an outbound call to “Box B” over the LAN. This can be useful when load testing the application beyond the number of physical ports you have available.
One of the issues we ran into immediately, however, is that Speech Server issues a 302 Redirect upon answering a call to move the call to another TCP port. While most SIP endpoints will seamlessly handle the redirect, ironically the Speech Server MakeCall Activity isn’t one of them. Luckily there is a simple workaround to the issue.
Prior to initiating the outbound call you need to subscribe to the Redirecting event of the TelephonySession object.
TelephonySession.Redirecting += new EventHandler<Microsoft.SpeechServer.RedirectingEventArgs>(TelephonySession_Redirecting);
Then in the TelephonySession_Redirecting event handler you tell it to accept the redirect.
void TelephonySession_Redirecting(object sender, Microsoft.SpeechServer.RedirectingEventArgs e)
{
    e.Action = Microsoft.SpeechServer.RedirectAction.TryAll;
}
Wireshark is a free tool for capturing network traffic. It is an invaluable resource for troubleshooting problems with VoIP calls. It is available for download from www.wireshark.org (current version for Windows as of this post is 1.0.5).
I’m going to walk through the steps required for setting up a basic trace with Wireshark. This will give you a view of the underlying SIP traffic between your speech platform and your SIP gateway. Please note that this information isn’t specific to OCS Speech Server so if you’re using something else simply replace the OCS references with your platform.
1) Download and install Wireshark from www.wireshark.org on the machine where Speech Server is installed.
2) Now launch Wireshark and open up the “Capture” menu. Select “Interfaces” (the first item in the menu). This will open a window listing each of your network adapters, their current IP address and the number of packets currently traveling down the wire.
3) Click the “Start” button for the interface that Speech Server is using (sometimes it helps to run a call and see which interface has traffic).
4) Now you should be seeing a flood of traffic in the Wireshark window. This is a real-time view of the data traveling up and down your network connection (it is a bit scary how much traffic you see on a typical network). Of course, as it stands right now there is far too much information being shown to parse through. To narrow our focus we’ll apply a “Filter” for the traffic we want.
5) In the Wireshark filter box enter the value “sip” (all lowercase) and press Enter. This will filter out everything that isn’t SIP traffic. In this screenshot you can see the traffic for a single call.
6) The top section shows a summary of each packet including the time, source and destination. The lower section shows the contents of the selected packet from above. It is this lower area where you can see the actual SIP traffic details. By expanding the “Session Initiation Protocol” section for the first packet we can see the INVITE header that was sent to the gateway.
This is just a starting point. There is a lot of valuable functionality in Wireshark, such as decoding the audio from the call, viewing ladder diagrams of the conversation and building complex filtering rules. But every one of these features starts with this simple collection process.
There is quite a bit of information in a typical SIP conversation and often times troubleshooting involves figuring out obscure differences between the server and the gateway SIP implementation. As an example, I once found a problem between the Nuance Voice Platform and an Acme Packet SBC involving the format of the ALLOW property of the SIP INVITE (turns out Nuance wants them all delimited on one line and Acme Packet put them out as separate elements). Without Wireshark I would have been trying to diagnose this blind.
The only downside is that you can only monitor conversations between two endpoints. For developers this means you can’t easily view the conversations using the Visual Studio debugging tools. You need a gateway or remote SIP phone in order to see what information is being transmitted.
UPDATE: Marshall Harrison wrote a very similar post a year ago on this topic that I missed somehow (Google, are you failing me?). Someone should be ready to do the third version for 2010. :)
A while ago Microsoft released a patch for the nasty red X that would appear over certain workflow activities (KB950210). Unfortunately this patch only worked on 32-bit editions of Windows. If you were running Vista x64, for example, the patch would fail during installation.
After more effort than this cosmetic error deserved, I’ve been able to manually patch my Vista x64 machine. Using a great post by Heath Stewart (Extract Files from Patches, February 2006) I was able to construct the patched .DLL using the x86 edition of the patch.
First I performed an “Administrative Installation” of the x86 edition of Speech Server. This extracts the MSI into a temporary directory.
start /wait msiexec /a .\Data\mss32.msi TARGETDIR="%TMP%\mss32" /qn
I then patched the administrative installation.
start /wait msiexec /p SpeechServer2007-KB950210-x86-ENT.msp /a "%tmp%\mss32\mss32.msi"
The result of this patch is a new version of Microsoft.SpeechServer.Authoring.DialogDesigner.dll located in the %Tmp%\mss32\IDE\PrivateAssemblies directory.
To manually patch Visual Studio 2005 I simply copied the new .DLL into the Common7\IDE\PrivateAssemblies directory under Visual Studio (C:\Program Files (x86)\Microsoft Visual Studio 8\Common7\IDE\PrivateAssemblies on my machine).
For those not wanting to go through the extraction process I’ve made the patched version I extracted available here. It should work just fine, but given the nature of MSI I cannot promise that it won’t in fact destroy your computer, burn down your home and eat all your candy. So be careful.
More developers are starting to work with Office Communications Server 2007 these days. As the community has grown I’ve noticed a number of developers running into a few “gotchas” when working with the OCS outbound calling mechanism.
MSMQ Path Configuration
OCS makes use of MSMQ for managing the queue of outbound calls to be placed. This allows you to place many more calls into the queue than there are available ports (aka phone lines).
When pointing OCS at your MSMQ you need to provide the full path to the queue. This isn’t made very clear in the application configuration dialog. This coupled with the lack of validation within the dialog makes it difficult to determine why your calls are in the queue but OCS isn’t placing the calls.
For example:
Let’s say we created a private queue on our OCS machine named “MyOutboundQueue”. Inside our application configuration we’ll need to supply the full path of “.\private$\MyOutboundQueue”.
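Since the dialog won't validate the path for you, it can help to build it in code and check it before pasting it in. The following is a minimal sketch only. The QueuePaths class and its LocalPrivateQueue method are hypothetical names invented for this example and are not part of OCS or Speech Server.

```csharp
using System;

public static class QueuePaths
{
    // Builds the full path for a private queue on the local machine.
    // "." stands for this machine and "private$" marks the queue as private.
    public static string LocalPrivateQueue(string queueName)
    {
        if (string.IsNullOrEmpty(queueName))
            throw new ArgumentException("A queue name is required.", "queueName");
        if (queueName.IndexOf('\\') >= 0)
            throw new ArgumentException("Pass the bare queue name, not a path.", "queueName");

        return @".\private$\" + queueName;
    }

    public static void Main()
    {
        // Prints .\private$\MyOutboundQueue
        Console.WriteLine(LocalPrivateQueue("MyOutboundQueue"));
    }
}
```

With a reference to System.Messaging.dll, passing the resulting string to System.Messaging.MessageQueue.Exists is one way to confirm that the queue is really there before pointing the application at it.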
MSMQ Permissions
Speech applications run under IIS and, as with ASP.NET applications, run inside an Application Pool. The pool, among other things, runs as a given user. By default this is the NETWORK SERVICE user. Your queue therefore must grant read, write and delete permissions to the NETWORK SERVICE user (or whatever user you’ve configured for the Pool).
Running on Windows XP and Vista
Speech Server only supports XP and Vista for development purposes. In production it requires Windows Server 2003. As such it imposes some limitations. One of these is that your applications will only run when a debugger is attached.
The easiest way to attach a debugger is to run it from within Visual Studio 2005. Once the application is running in debug mode just close the “Voice Response Debugging Window” and your application will behave similarly to how it runs in production.
Another XP/Vista limitation is that it only supports two concurrent calls. This isn’t an issue most of the time but it can be difficult to understand why you have 96 ports configured, 100 calls in your queue and only 2 calls going out at a time.
Troubleshooting Other Issues
When you launch the application from within Visual Studio you can see any error messages in the debugger's output window. This is often the best way to determine what is going on as MSS is a bit terse with what it tosses into the Windows event log.
When building custom Speech Sequence Activities you may or may not want to show the "innards" to the consumer. Thanks to the [DesignerAttribute] we can control how the activity is rendered in the workflow.
A designer controls how components appear and behave in Visual Studio at design time. You can implement your own custom designers by implementing the IDesigner interface. In this case we'll use the System.Workflow.ComponentModel.Design.ActivityDesigner designer that ships with Windows Workflow Foundation.
I've built a number of custom activities for our team. In some cases we want the developer to understand what is going on under the covers. In other cases we want to present a "black box" so the developer doesn't need to concern themselves with how the magic happens.
By simply decorating the activity class with [DesignerAttribute(typeof(System.Workflow.ComponentModel.Design.ActivityDesigner))] we can hide the details from the user. What the developer sees is a single entity in their workflow, much like the view they receive from Speech Server's own library of controls.
For example, when the following class is dropped into a workflow the user will see every activity contained within the component:
public partial class VoicemailDetectionActivity : SpeechSequenceActivity
{
    public VoicemailDetectionActivity()
    {
        InitializeComponent();
    }
}
By adding the designer attribute the user will only see a single block representing the entire component:
[DesignerAttribute(typeof(System.Workflow.ComponentModel.Design.ActivityDesigner))]
public partial class VoicemailDetectionActivity : SpeechSequenceActivity
{
    public VoicemailDetectionActivity()
    {
        InitializeComponent();
    }
}
The Office Communications Server team announced Friday that OCS 2007 R2 will support 64-bit operating systems only.
As a part of the broad initiative across Microsoft to support 64 bit versions across many of its product lines, the next release of OCS will support 64-bit operating systems only. This decision will help meet customer demand and is a natural progression of the product that aligns with the same approach taken by the Exchange team (with Exchange 2007) and the SharePoint team (with SharePoint 2007) to support 64 bit operating systems only.
I'm not particularly surprised. One of the first things you learn when you start dealing with telephony is just how hungry recognition applications are for system resources. We’re simply unable to realize the full potential of speech recognition in a 32-bit environment.
The Speech Server API is interesting to play around with. And understanding how Speech Server works behind the scenes is invaluable in debugging. But the real value of learning the API comes when you decide to build your own custom activities.
Speech Server ships with a number of activities (aka "Speech Dialog Components"). These cover most of the common scenarios. They allow you to quickly start building applications. But as with the default Windows Forms or ASP.NET components, they are just a starting point.
There are two different types of custom controls, the Voice Response Sequence Activity and the Voice Response Composite Activity. Sequences allow you to wrap existing activities in a reusable control. You work with them much like a standard workflow. Composite activities however are built entirely using the Core API.
I'm going to build a very simple composite control. This control will act much like the existing Recording Audio activity but without requiring a prompt.
If you are interested in the complete project code you may download it here.
Creating a Voice Response Activity Library
To begin we'll create a new Voice Response Activity Library project called RecordWithoutPrompt.
During the creation process Visual Studio will ask you which Application Resources you wish to generate. We'll uncheck all of them; they are not necessary for our library.
When the project is created it adds a Sequence activity called VoiceResponseActivity1.cs by default. We'll remove this activity and add our own Voice Response Composite Activity called RecordWithoutPrompt.cs.
Core Methods
Opening the designer for our new class shows a bare-bones graphical view of the activity. From here we'll simply right-click and select "View Code" from the context menu. What you'll find is a very simple stub for the ExecuteCore method:
protected override void ExecuteCore(ActivityExecutionContext context)
{
    // Add Core API calls here.
    // base.ExecuteCore(context);
}
Every Composite Activity should contain at least two methods - ExecuteCore and CancelCore. The ExecuteCore method is the entry point for our activity. The CancelCore method is fired when the parent workflow is canceling the activity. The CancelCore method is not added by default so you'll need to add it yourself.
protected override void CancelCore(ActivityExecutionContext executionContext)
{
    base.CancelCore(executionContext);
}
Input Data Using Properties
We'll use a couple of public properties to pass information into our activity from the parent workflow - PlayBeep and EndSilenceTimeout. These will be used by our activity to control the recording behavior. This being C#, the syntax for these properties is rather straightforward.
private bool _PlayBeep = true;

[DefaultValueAttribute(true)]
public bool PlayBeep
{
    get { return _PlayBeep; }
    set { _PlayBeep = value; }
}

private TimeSpan _EndSilenceTimeout = TimeSpan.Parse("00:00:03");

[DefaultValueAttribute(typeof(TimeSpan), "00:00:03")]
[TypeConverterAttribute("Microsoft.SpeechServer.Authoring.DialogDesigner.ShortTimeSpanConverter, Microsoft.SpeechServer.Authoring.DialogDesigner, Version=2.0.3400.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35")]
public TimeSpan EndSilenceTimeout
{
    get { return _EndSilenceTimeout; }
    set { _EndSilenceTimeout = value; }
}
You'll notice that the EndSilenceTimeout uses the TypeConverterAttribute. This controls how the property is used in the designer. Rather than having to define the TimeSpan as a string ("00:00:03") the user can enter "3 sec".
Return Data Using Events
To return the recording results from our activity back to the parent workflow we'll create an event that we can bind to in the parent workflow.
public delegate void RecordingCompleteEventHandler(object sender, Microsoft.SpeechServer.RecordCompletedEventArgs e);

public event RecordingCompleteEventHandler RecordingComplete;

protected void OnRecordingComplete(Microsoft.SpeechServer.RecordCompletedEventArgs e)
{
    if (RecordingComplete != null)
        RecordingComplete(this, e);
}
Recording Audio via the Core API
We're going to kick off our audio recording inside the ExecuteCore method. We'll do so using the API to control the TelephonySession.Recorder object.
First we assign the properties we collected earlier. This will control how long the system waits after the caller stops talking until it declares the recording complete. It will also tell the recording object whether or not to play a beep prior to starting the recording process.
Next we'll wire up the RecordCompleted handler. This will process the results of our recording and close our activity.
Finally we'll kick off the recording using the RecordAsync() method.
protected override void ExecuteCore(ActivityExecutionContext context)
{
    // Assign the properties for the activity
    Workflow.TelephonySession.Recorder.EndSilenceTimeout = EndSilenceTimeout;
    Workflow.TelephonySession.Recorder.PlayBeep = PlayBeep;

    // Wire up our RecordCompleted event handler
    Workflow.TelephonySession.Recorder.RecordCompleted += new EventHandler<Microsoft.SpeechServer.RecordCompletedEventArgs>(Recorder_RecordCompleted);

    // Start the recording, writing to a uniquely named file in the temp directory
    string tempFileName = System.IO.Path.Combine(System.IO.Path.GetTempPath(), Guid.NewGuid() + ".wav");
    Workflow.TelephonySession.Recorder.RecordAsync(tempFileName);
}
Completing The Activity
Once our recording is complete we'll want to fire off our RecordingComplete event and close the activity. The Close() method is vital as it tells the parent workflow that we're done. Failing to include it will cause Speech Server to hang inside this activity.
private void Recorder_RecordCompleted(object sender, Microsoft.SpeechServer.RecordCompletedEventArgs e)
{
    // Fire our custom RecordingComplete event
    OnRecordingComplete(e);

    // Close this activity. This tells the workflow to continue.
    this.Close(e.Error);
}
If you are interested in the complete project code you may download it here. It includes a sample workflow project to test the activity with.
With the introduction of Voice Response Workflows in Speech Server 2007, Microsoft has greatly simplified voice-enabled application development. The entire process is relatively painless; even downright enjoyable. And for most applications it is all you'll need to build outstanding applications.
Sometimes, however, you find the workflow model just isn't the right fit. Large application models can be cumbersome to work with, workflows are nearly impossible to diff against prior versions, and they offer little fine-grained control over the engine. For these reasons and more it is sometimes desirable to dig down to a lower level: the Core API.
Even if you never plan to write a real application using the API, I suggest learning how it works. The workflow model is really just driving the API in the background and understanding what is going on under the covers can be very helpful.
Setting Up Your Project
Getting started with the API can be a bit confusing at first. There is very little documentation available and no built-in project templates for it.
We'll start by creating a new Voice Response Workflow Application. We'll use the project that gets generated as our foundation, removing any items we don't need.
During the creation process Visual Studio will ask you which Application Resources you wish to generate. We'll uncheck all of them; they are not necessary for our project.
Once our project is ready we'll remove the following files:
- VoiceResponseWorkflow1.cs
- PromptStrings.resx
- manifest.xml
Finally we remove all references to the VoiceResponseWorkflow1 class from Class1.cs. The resulting Class1 should look like the following:
using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.SpeechServer;
using Microsoft.SpeechServer.Dialog;

namespace CoreAPI_EmptyProject
{
    public class Class1 : IHostedSpeechApplication
    {
        private IApplicationHost _host;

        public void Start(IApplicationHost host)
        {
            if (host != null)
            {
                _host = host;
                _host.TelephonySession.CurrentUICulture = System.Globalization.CultureInfo.GetCultureInfo("en-US");
            }
            else
            {
                throw new ArgumentNullException("host");
            }
        }

        public void Stop(bool immediate)
        {
        }

        public void OnUnhandledException(Exception exception)
        {
            if (exception != null)
                _host.TelephonySession.LoggingManager.LogApplicationError(100, "An unexpected exception occurred: {0}", exception.Message);
            else
                _host.TelephonySession.LoggingManager.LogApplicationError(100, "An unknown exception occurred: {0}", System.Environment.StackTrace);

            _host.OnCompleted();
        }
    }
}
If you would like to download the completed project, it is available here. In future posts I will often use this project as my starting point.
NOTE: Some may notice that this post is very similar to one I made on my other blog. Given the nature of this new blog I felt it was worth duplicating it here as a starting point.
Hello!
My name is Marc LaFleur. I'm a software architect and developer in the Boston area. I currently work for Parlance Corporation, a managed service provider of voice-enabled applications. At Parlance I work on a number of different products using various speech technologies and platforms.
On this blog I'm going to share some of my experiences working on large-scale voice applications. I'll go into the ins and outs of building voice-enabled applications "on the edge", including the dark art of working with the Speech Server Core API. I also hope to cover some more general topics such as application portability using VoiceXML, testing voice applications, PBX integration horrors and simply how to make this stuff work.
I'd like to thank Marshall Harrison for inviting me to start a blog here on GotSpeech. As a long-time reader I'm honored to be given this opportunity.