I remember as a kid taking two cans, punching holes in the bottoms, threading string into the holes and tying knots to hold the string. Then my friend and I would hold the cans so that the string was taut, and one of us would talk into a can while the other held the second can up to their ear. We were never quite sure if we were actually hearing the voices through the cans or just hearing each other because we were so close. But it was a fun little experiment that entertained us for a while. The physics behind this assumes the bottom of the can you talk into vibrates with your voice, which in turn vibrates the string, which finally vibrates the bottom of the other can. So in theory we had built ourselves a voice user interface (the cans) along with a network (the string) that allowed us to talk to each other over a distance. It was a pretty crude interface, but nonetheless an interface.
With Microsoft Speech Server (MSS), there are a multitude of interfaces, all of which need the “cans” and “string” to tie them together. In previous installments I have discussed how a speech application is very similar to a web application. On one end you have the web server, which serves up the web pages, and on the other end you have a user sitting in front of a computer using a web browser, or in the case of a speech application, talking on a phone. The user types an address such as www.microsoft.com into their browser, and at the other end, the web server gets that request and sends back the start or default page of the web application. As the user interacts with the web application, more requests are made to the web server, and the web server responds with new or updated pages sent back to the user’s web browser.
Think of MSS as a computerized version of a person with a web browser. MSS requests a speech application from the web server, and the web server in turn sends back a web page. But instead of pictures and text, the speech server gets special instructions that tell it to play prompts and accept speech input for recognition. So fundamentally, the interface between the web server and the Speech Server is a standard web connection, but there is another interface between MSS and the telephony network. In this installment, I would like to talk in some detail about that interface.
If a user wanting to use the internet had no keyboard to “talk” to the internet with, and no screen to see the results of a request to the internet, that would make for a pretty boring user experience. So to allow a user to “talk” to the internet, we have a piece of hardware called a keyboard, and a piece of software that translates the electromechanical input from the keyboard into the text that is required to make a request. By the same token, to see results, we have software and a special circuit that together translate the ones and zeros into an electrical signal, which is sent to the computer monitor to display the results of a web request.
Setting aside the use of a telephone for a moment, it is possible to plug a microphone and speaker into the MSS computer, speak to MSS through the microphone, and hear spoken words from MSS through the speaker. But having hundreds of users with microphones and headsets plugged into one machine is quite obviously not very practical. So to accommodate lots of users from anywhere in the world, MSS somehow needs to connect to an entire telephone network. “Out of the box,” that is not something MSS can do without some special hardware and software.
MSS understands computer networks and connects easily to a web server, but it "knows" very little about connecting to a telephone network. So Microsoft relies on third-party vendors to provide that telephone interface. As with a keyboard and monitor (the hardware), there is also accompanying software (drivers) that allows any company that makes those devices to connect them to a computer running Microsoft Windows. So for MSS to connect to a telephone network, other companies make special boards that fit into an ISA slot in your computer and make the physical connection to standard telephony systems. These companies also provide software drivers for the board and a software interface between MSS and the board called the Telephony Interface Manager (TIM).
These boards and software can be purchased from several sources. However, as a way of detailing how they work, I would like to tell you about the boards and software from Cantata Technology (formerly known as Brooktrout). Cantata offers several versions of their Brooktrout TR1000 boards. For development, testing and small deployments they offer a 4-port (4 telephone line) analog board. Other configurations use ISDN digital cards with up to 96 ports. For an inexpensive test system, you can purchase a Starter Kit from Cantata that includes a 4-port analog board, the TIM software, and a 90-day license for Microsoft Speech Server. I have used this configuration to set up a small office system that connects right to my standard telephone line.
First, let’s talk about the TIM. The TIM software, as I mentioned, is shipped with the board. Without the TIM, the board would not be able to interface with MSS. The purpose of the TIM software is twofold. First, it translates the digitized audio from the board into a format that is acceptable to MSS, and it takes the audio from MSS and translates it into a format acceptable to the TR1000 boards. Second, the TIM takes instructions from MSS to do things like answer incoming telephone calls, hang up, transfer calls and place outbound calls.
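To make those two roles concrete, here is a toy Python sketch. It is not the Brooktrout TIM or any real API: the class names (ToyTim, ToyBoard), the command names, and the audio formats (unsigned 8-bit on the board side, signed 16-bit on the server side) are all assumptions I made up for illustration. The point is only that one piece of software sits in the middle, converting audio in both directions and passing call-control instructions down to the board.

```python
import struct
from enum import Enum


class Command(Enum):
    """Hypothetical call-control instructions a speech server might issue."""
    ANSWER = "answer"
    HANG_UP = "hang_up"
    TRANSFER = "transfer"
    DIAL = "dial"


class ToyBoard:
    """Hypothetical stand-in for a telephony board; it only records actions."""

    def __init__(self):
        self.log = []

    def execute(self, action, number=None):
        self.log.append((action, number))


class ToyTim:
    """Illustrative middle layer between a speech server and a board."""

    def __init__(self, board):
        self.board = board

    # Role 1: audio translation. Both formats are assumed for illustration.
    def board_to_server(self, samples_8bit: bytes) -> bytes:
        # Unsigned 8-bit PCM -> signed 16-bit little-endian PCM.
        values = [(b - 128) << 8 for b in samples_8bit]
        return struct.pack("<%dh" % len(values), *values)

    def server_to_board(self, samples_16bit: bytes) -> bytes:
        # Signed 16-bit little-endian PCM -> unsigned 8-bit PCM.
        count = len(samples_16bit) // 2
        values = struct.unpack("<%dh" % count, samples_16bit[: count * 2])
        return bytes((v >> 8) + 128 for v in values)

    # Role 2: call control. Instructions are simply forwarded to the board.
    def handle(self, command: Command, number=None):
        if command in (Command.TRANSFER, Command.DIAL) and number is None:
            raise ValueError(command.value + " needs a telephone number")
        self.board.execute(command.value, number)
```

In this sketch, a caller's audio would pass through board_to_server on its way to recognition, a prompt would pass through server_to_board on its way to the caller, and an instruction such as handle(Command.ANSWER) would end up as an action on the board.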
The installation of the Brooktrout TR1000 4-port analog telephony card and TIM was a rather simple process. The board does require a full-length board slot, so don’t try to install it in a compact desktop computer. The user manual is well written and provides excellent step-by-step installation instructions as well as troubleshooting techniques. It is important that you follow the instructions carefully. For example, they tell you to install the TR1000 software first, before installing the board. That way, when you do install the board, the Plug and Play features kick in and automatically detect and install the appropriate drivers. (The TR1000 software has recently been updated with a number of enhancements and fixes. The software is going through the final Microsoft certification for MSS 2004 R2, but has not been released as of this writing.) After you install the board, you are asked to run the configuration software to determine how you want the ports set up. For example, you can set how many times the line rings before it gets picked up, as well as whether the ports will be used for inbound or outbound calls. After the configuration, it’s time to install the TIM software. Since I was familiar with Intervoice’s TIM installation, I quickly recognized the process. The Brooktrout TIM software is really a re-branding of the Intervoice TIM.
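To picture what that port configuration amounts to, here is a small Python sketch of the choices involved. This is not the format the Brooktrout configuration software uses; the names (PortConfig, Direction, four_port_layout) and the default of two rings are hypothetical, chosen only to mirror the settings described above.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    """Whether a port takes incoming calls or places outgoing ones."""
    INBOUND = "inbound"
    OUTBOUND = "outbound"


@dataclass(frozen=True)
class PortConfig:
    """Hypothetical per-port settings, mirroring the choices in the text."""
    port: int
    rings_before_answer: int = 2  # assumed default, for illustration only
    direction: Direction = Direction.INBOUND

    def __post_init__(self):
        if self.rings_before_answer < 1:
            raise ValueError("rings_before_answer must be at least 1")


def four_port_layout():
    """Ports 0 through 3 of a 4-port board, all using the defaults above."""
    return [PortConfig(port=n) for n in range(4)]
```

A mixed setup would simply replace one entry, for example PortConfig(port=3, direction=Direction.OUTBOUND) to reserve the last line for outbound calls.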
I have installed three different versions of TIM and two different boards (the other being the Intel Dialogic board). None of them went as smoothly as this install did. I do think it’s fair to say, however, that nothing is problem free, and so I do have a couple of minor issues.
First, for some reason when installing the MMC interface for TIM, the system seemed to lock up during the install. There was no indication that the MMC console had completed its install, but upon inspection of the system, it had been installed. I decided to re-install the entire system to see if the problem repeated. The second installation went without a hitch.
Second, near the completion of the TIM installation, a pop-up box appeared saying that the IviGrp group had been installed, and it asked me if I wanted to add a user to that group. At this point I could have chosen no, but I was not really sure. I looked for documentation to help explain the question, but the documentation had not yet been installed. So I guessed and chose yes, and I was immediately taken to the Computer Management console. My next question was which user to add, and I decided not to do anything. Later, after successfully installing the TIM software (even without a new user), I was able to read and understand what the installation software had been asking me to do. It turns out that to run some of the command line utilities for the TIM software, your system user must be a member of the IviGrp group. I was later able to assign my login to that group without a problem.
Following the install of the TR1000 and the TIM, I installed Microsoft Speech Server. To test the system from end-to-end, I plugged Port 0 of the Brooktrout board into the telephone jack in the wall and dialed the number. The first time I called, the system took three rings before it answered, even though the board was set up to answer on the second ring. After it answered, it took about 10 seconds before MSS sent me the “Welcome to Microsoft Speech Server” message. The delays were likely due to the server caching all the necessary software, because on the second call, everything worked without delay.
So that’s it. When compared to developing a speech application and then configuring and optimizing the application, installing the Brooktrout TR1000 and TIM software was a breeze. In the next installment, I will talk a little about configuring MSS and how MSS knows what application to run when a caller calls the system.