Telephone numbers have been an integral part of our communication and will remain so for decades to come, irrespective of technological advancements in VoIP, SIP and multimedia communications. In this respect, “Number Portability” mandates across the globe have had, and will continue to have, a significant impact on how services, existing and future, are delivered to a phone number. Most of the initial activity on Number Portability was centered on the intra-country scope. However, with more and more countries deploying Number Portability and inter-country traffic increasing due to globalization, Number Portability can no longer be viewed on an intra-country basis. Already, more than 50 countries have mandated or deployed Number Portability. By 2010, an estimated 3 billion or more phone numbers will be impacted by Number Portability mandates. International voice minutes have crossed 300 billion annually, and the exchange of text messages across national boundaries continues to increase. Improper routing of international terminating traffic penalizes the originating operator/subscriber with additional cost and, even worse, potential degradation of service.
A popular misconception is that Number Portability impacts only those numbers which have been ported. This is not correct. Once a single number is ported in a market, routing to all phone numbers in that market will be impacted. The service originating operator has to find out if the number is ported or not for each and every number in that market and optimally route based on the current terminating carrier.
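To make the point concrete, here is a minimal Python sketch of a portability-corrected lookup. Everything in it is an assumption made for illustration: the prefixes, the carrier names, and the two in-memory tables stand in for whatever number-block data and portability database an operator actually has. The point is that every number in the market goes through the ported-number check, and only the numbers that miss fall back to the original block assignment.

```python
from typing import Optional

# Hypothetical data: original number-block assignments (prefix -> carrier)
# and a table of individually ported numbers (number -> current carrier).
RANGE_HOLDERS = {
    "4479": "CarrierA",
    "4478": "CarrierB",
}
PORTED = {
    "447900123456": "CarrierB",  # ported away from CarrierA
}


def default_carrier(number: str) -> Optional[str]:
    """Longest-prefix match against the original number-block assignments."""
    for length in range(len(number), 0, -1):
        carrier = RANGE_HOLDERS.get(number[:length])
        if carrier is not None:
            return carrier
    return None


def terminating_carrier(number: str) -> Optional[str]:
    """Check the ported-number table first, then fall back to the block holder."""
    ported_to = PORTED.get(number)
    if ported_to is not None:
        return ported_to
    return default_carrier(number)
```

A number that was never ported still costs a lookup in the ported table before the prefix match applies, which is why a single ported number affects routing for the whole market.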
A potential solution is to route all international traffic through a transit network that provides Number Portability corrected routing. But this may not be an optimal solution, because it does not leverage the direct-peering relationships that already exist between operators from different countries. In addition, it does not evolve well as new services are introduced in the future that require Number Portability corrected routing to a phone number.
A second solution is to have a Global Number Portability Registry. The recent announcement of such a registry by the GSM Association is one example. A third solution is for the originating operator to directly query the donor operator in the target country to find out if the number is ported and provide Number Portability correction at the source.
The impact of Global Number Portability applies to both traditional operators and VoIP operators. It is becoming very important to think globally when delivering services to phone numbers – as more countries mandate Number Portability, more subscribers port their numbers, and more traffic crosses national boundaries.
I recently asked a few of the very first providers to deploy VoIP what the biggest technology pain was for them in their VoIP deployment. What do you think it was? Conceptual concerns, such as security, migration path to Telco 2.0 or Telco 3.0, connectivity to Facebook, IPv6, computing clouds and power grids? Was it perhaps QoS, which has been socialized as a key obstacle since the age of 2.4 kbps modems? Or maybe operational concerns: troubleshooting, or customer care? What else could it have been?
In fact, the answer is less exciting than that: the winner in being the most long-lived and unnerving technology gap in VoIP deployments is dual-tone multi-frequency (DTMF). Tones over the telephone network are still used massively to communicate with IVRs, and this does not yet work well with VoIP. One can certainly imagine that IVRs will begin to fade out in favor of well-organized web pages. It appears very unlikely that any new applications in the Internet age would choose DTMF as their “user interface”. Still, DTMF is utilized every day. Many call centers implement initial dialogs with callers using DTMF. Other applications, like payment terminals and elevator-control systems, use in-band signaling, sometimes to the surprise of those who have attempted to migrate to VoIP seamlessly.
It is not that SIP/VoIP do not include protocols for conveying DTMF. In fact, a big problem is that they may include too many of them. Obviously, DTMF tones traversing a VoIP network as digitized audio degrade and become harder to detect if they pass through multiple points of encoding/decoding.
Two digital alternatives have therefore emerged in the marketplace. Advocates of the “DTMF tones are real-time” approach have standardized transmission of digitized tones in the RTP packets. Yet there is another camp that has seen DTMF as an application control mechanism. From this perspective DTMF steers applications, belongs by its purpose in signaling, and is related to media only by historical evolution.
As a result of this history, networks carry DTMF as in-band audio, as RTP events, as SIP signaling, or as a combination of these, which frequently keeps DTMF-based applications from operating well without extensive tuning.
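For readers who want to see what the RTP flavor looks like on the wire, here is a minimal Python sketch. It assumes the RTP approach mentioned above is the “telephone-event” payload of RFC 4733 (which replaced RFC 2833); the post itself does not name the specification. The function names are mine, and the sketch covers only the 4-byte event payload, not the RTP header or the retransmission rules.

```python
import struct

# RFC 4733 event codes: digits 0-9 are 0-9, "*" is 10, "#" is 11, A-D are 12-15.
DTMF_EVENTS = {ch: code for code, ch in enumerate("0123456789*#ABCD")}


def encode_telephone_event(digit, duration, volume=10, end=False):
    """Pack one telephone-event payload: event (8 bits), E and R bits plus
    volume (6 bits), duration in RTP timestamp units (16 bits)."""
    if not 0 <= volume <= 63:
        raise ValueError("volume must fit in 6 bits")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits")
    flags = (0x80 if end else 0x00) | volume  # R bit is left at zero
    return struct.pack("!BBH", DTMF_EVENTS[digit], flags, duration)


def decode_telephone_event(payload):
    """Unpack the first four bytes of a telephone-event payload."""
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    return {
        "event": event,
        "end": bool(flags & 0x80),
        "volume": flags & 0x3F,
        "duration": duration,
    }
```

With an 8000 Hz RTP clock, a duration of 800 corresponds to 100 ms of tone. Because the digit travels as a number rather than as audio, it survives transcoding, but only if every hop on the path agrees to use this encoding.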
This is my second in a series of posts about MSRP, or the Message Session Relay Protocol. My previous entry gave an overview of MSRP.
Like most other types of media that one can negotiate using SIP, MSRP sessions are negotiated with the Session Description Protocol (SDP) Offer/Answer model. But MSRP differs from RTP in several ways, and these differences require some different approaches in SDP.
First, MSRP allows multiple sessions to use the same TCP connection. This means you can’t identify a session by IP address and port alone, as you can with RTP. To get around this problem, MSRP defines its own URI scheme. Here’s an example:
msrp://host.example.com:2855/asfd34;tcp
In this example, “host.example.com” identifies the host. In this case, the host was identified by name; this could just as easily have been an IP address. The port is “2855”, and the session identifier is “asfd34”. The unvalued “tcp” parameter merely means that this session uses TCP. (Right now, all MSRP sessions use TCP. It could work with other reliable stream-oriented transports, such as SCTP, but the IETF has not yet defined bindings for any but TCP.) Just as the IP address and port do for RTP, the MSRP URI defines where a device wants to receive media. A given session has a separate URI for each endpoint.
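If it helps to see the pieces pulled apart, here is a small Python sketch that splits an MSRP URI of the simple form shown above into its components. The `MsrpUri` type and `parse_msrp_uri` function are names I made up for illustration. The regular expression is deliberately narrower than the real grammar: it does not handle IPv6 literals or additional URI parameters.

```python
import re
from typing import NamedTuple, Optional


class MsrpUri(NamedTuple):
    secure: bool         # True for the "msrps" scheme
    host: str            # host name or IPv4 address
    port: Optional[int]
    session_id: str
    transport: str       # currently always "tcp"


# Simplified pattern covering only scheme://host[:port]/session-id;transport
_MSRP_RE = re.compile(
    r"^(?P<scheme>msrps?)://(?P<host>[^:/;]+)(?::(?P<port>\d+))?"
    r"/(?P<session>[^;/]+);(?P<transport>[A-Za-z0-9]+)$"
)


def parse_msrp_uri(uri: str) -> MsrpUri:
    match = _MSRP_RE.match(uri)
    if match is None:
        raise ValueError(f"not a simple MSRP URI: {uri!r}")
    port = int(match["port"]) if match["port"] else None
    return MsrpUri(
        secure=match["scheme"] == "msrps",
        host=match["host"],
        port=port,
        session_id=match["session"],
        transport=match["transport"].lower(),
    )
```

Calling `parse_msrp_uri("msrp://host.example.com:2855/asfd34;tcp")` yields the host, port 2855, session identifier “asfd34” and transport “tcp” described above.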
You can also specify the use of Transport Layer Security (TLS) by using a URI scheme of “msrps”.
Since the SDP m-line syntax does not support the transfer of URIs, MSRP defines an SDP media-level attribute called “path”. A path attribute for our example would look like the following:
a=path:msrp://host.example.com:2855/asfd34;tcp
It’s called “path” because it can actually carry more than one URI. This is useful when an MSRP session crosses one or more relays. We’ll talk more about that when we cover MSRP relays in a later post.
We do not ignore the m-line and c-line completely. MSRP endpoints copy the host from the URI into the c-line, and the port into the m-line. The peer doesn’t use those fields; it looks at the path attribute instead. The fields are copied just in case something in the middle cares about them and doesn’t understand the MSRP-specific extensions. Finally, the m-line media field is set to “message”, the proto field to “TCP/MSRP”, and the fmt list to “*”. The first two identify the session as MSRP. Here’s an example:
c=IN IP4 host.example.com
m=message 2855 TCP/MSRP *
The m-line fmt field is ignored because MSRP has another extension attribute to describe allowable content formats: accept-types. The accept-types attribute carries a list of MIME format types that an endpoint understands, in order of preference. It can also include a “*” entry, meaning all types are acceptable. The following example indicates an endpoint is willing to accept any type, but prefers plain text or HTML:
a=accept-types:text/plain text/html *
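The rules so far (host into the c-line, port into the m-line, plus the accept-types and path attributes) can be sketched as one small Python helper. This is only an illustration under a few assumptions: the function name `msrp_media_lines` is hypothetical, the host is assumed to be an IPv4 address or a name for an IPv4 host, only plain TCP is covered, and the c-line is emitted at media level right after the m-line.

```python
from typing import List, Sequence


def msrp_media_lines(host: str, port: int, session_id: str,
                     accept_types: Sequence[str]) -> List[str]:
    """Build the SDP lines that describe one MSRP session over plain TCP."""
    return [
        # Media type "message" and proto "TCP/MSRP" mark this as MSRP;
        # the fmt field is "*" because accept-types carries the real formats.
        f"m=message {port} TCP/MSRP *",
        # Host copied from the URI for the benefit of MSRP-unaware middleboxes.
        f"c=IN IP4 {host}",
        # Acceptable content types, in order of preference.
        "a=accept-types:" + " ".join(accept_types),
        # The URI the peer actually uses to reach this endpoint.
        f"a=path:msrp://{host}:{port}/{session_id};tcp",
    ]


lines = msrp_media_lines("host.example.com", 2855, "asfd34",
                         ["text/plain", "text/html", "*"])
print("\n".join(lines))
```

Note that the host and port each appear twice. The copies in the c-line and m-line are there for intermediaries, while the peer reads only the path attribute.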
There’s also an “accept-wrapped-types” attribute, which is useful when you want to require the use of some envelope type such as “message/cpim”, but still negotiate the formats allowed inside that envelope. Use of “accept-wrapped-types” gets a bit too complicated for this blog post. If you’re interested, please see the RFC (RFC 4975).
Here’s an example to tie it all together. Notice that the lines I didn’t mention are treated the same as for any other media type.
v=0
o=alice 2890844526 2890844526 IN IP4 host.example.com
This post continues the series of posts on SIP-I and SIP-T deployment challenges. You may wish to read the Introduction to SIP-I and SIP-T post for some general background on these two protocols before continuing.
This post deals with the issues surrounding the establishment of an audio path before a call is completely set up.
These problems stem from the fact that SIP and ISUP have rather different models for the way the media is set up. These differences are rooted in a philosophical difference about where call progress information is generated. In the PSTN, it is typically generated by the called party’s end office. So, if a media path isn’t set up before the call is completed, the call progress tones can’t be sent. By contrast, in a SIP network, call progress information is usually generated by the calling party’s device, so the media path doesn’t matter until the called party answers.
Because it does not require a media path to convey call progress information, SIP’s design expects that the session will be completely established before media begins to flow. There is one minor exception: as a means to avoid clipping off the initial media travelling from the called party to the calling party, SIP does specify that clients are supposed to play any media received prior to the session being established. However, this provision was designed to avoid a very specific corner case, not to carry long-lived media sessions.
To further understand this behavior, keep in mind that SIP uses an offer/answer model for establishing session parameters. One endpoint sends a proposed session description, an “offer”, with the IP address at which the offerer wants to receive media and a set of acceptable session parameters. The other endpoint responds with an “answer” session description; this “answer” selects final values for the various session parameters that are to be used for the media, and also includes the IP address at which the answerer wants to receive media.
Since the media session negotiation does not typically complete until the called party answers, media sent towards the caller before the session is completely established may or may not work properly; and, since the IP address of the called party’s side only shows up in the answer, it is actually impossible to send media towards the called party before then. (Keep in mind that an RTP “session” is actually composed of two streams: one flowing towards the caller, and one flowing towards the called party. Once the offer is sent, the stream towards the caller can begin. However, the stream towards the called party cannot start until after the answer is received.)
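The asymmetry described in that parenthetical can be sketched in a few lines of Python. This is a toy model, not SIP or SDP code: the `SessionDescription` class, the function names, and the codec strings are all assumptions for illustration. The only thing it tracks is which receive addresses are known at each step.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class SessionDescription:
    address: str        # where this endpoint wants to receive media
    port: int
    codecs: List[str]   # acceptable codecs (offer) or the selected one (answer)


def make_answer(offer: SessionDescription, local_address: str,
                local_port: int, supported: Set[str]) -> SessionDescription:
    """Select the first offered codec the answerer supports and attach the
    answerer's own receive address."""
    common = [codec for codec in offer.codecs if codec in supported]
    if not common:
        raise ValueError("no codec in common")
    return SessionDescription(local_address, local_port, common[:1])


def sendable_directions(offer: Optional[SessionDescription],
                        answer: Optional[SessionDescription]) -> Dict[str, bool]:
    """A stream can only be sent once its destination address is known."""
    return {
        "towards_caller": offer is not None,         # address came in the offer
        "towards_called_party": answer is not None,  # address comes in the answer
    }
```

Before the answer arrives, `sendable_directions(offer, None)` reports that only the stream towards the caller has a destination, which is exactly the window in which early media lives.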
By contrast, ISUP expects the ability to send media on a circuit as soon as it is seized, which happens as soon as the call attempt begins, so that it can send call progress and call error tones. To further complicate matters, many deployed IVRs take advantage of this behavior by not triggering an ANM (the answer message, which establishes the session and, typically, marks the start of charging) until a human answers the call.
In other words, SIP provides a “best effort” attempt at passing media before the call is established, while ISUP has an absolute requirement for sending media before the call is completely established.
The IETF began considering this problem well before the current round of SIP specifications was published, with significant impact on the final documents. In June of 1999, a proposed “183 Session Progress” response code was described [1] for instructing an ingress gateway to suppress ringback and to use the in-band media instead. Ultimately, this solution was put aside due to a number of shortcomings, including the inability to ensure that the 183 response is actually received by the calling party. (A “183 Session Progress” response code was later added to the core SIP specification, but with very different semantics than originally proposed.)
As a result of the work on early media and PSTN interoperation, by the time the core set of SIP specifications was published as RFCs 3261 through 3265, it contained a limited set of tools that allowed the establishment of “early” media sessions. However, exact procedures for combining these tools were not finally published until December of 2004, in the form of RFC 3960.
At a high level, here’s how PSTN gateways can set up media sessions prior to the final establishment of a call:
This is pretty similar to the diagram for a basic call setup that we looked at in the introduction post. The key difference is that this diagram shows exactly when the media path is set up between each component in the system. In the PSTN, these audio paths are set up as soon as the network can: messages 2, 5, 12, and 21 all happen as soon as possible. Because the SIP network doesn’t treat media quite the same way as the PSTN, we need some extra signaling to set up the session. That’s where messages 7 through 9 come in. Message 7 contains a provisional session description “answer” for the session “offer” that was present in message 6. At this point, both gateways have enough information to exchange audio. (Messages 8 and 9 simply acknowledge the receipt of message 7; this is necessary to ensure that message 7 is delivered reliably.) In this scenario, because there is an audio path all the way to the called party’s end office, the ringback tone can be generated remotely (message 15) instead of being generated by the caller’s end office. However, even with these procedures in place, the ingress gateway cannot rely on the remote end generating ringback tones. It must monitor the media stream, and locally generate ringback if there is none present in the audio. This can lead to jarring transitions where a calling party hears ringback generated by the ingress gateway, followed by an abrupt change to ringback generated by the called party’s end office.
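Here is a rough Python sketch of the “monitor the stream and fill in ringback locally” behavior, just to show where the jarring transition comes from. It assumes the gateway samples the incoming media once per interval and gets a yes/no answer about whether early media is present; how a real gateway makes that determination (packet arrival, audio energy, and so on) is a matter of implementation and local policy, and the function names here are hypothetical.

```python
from typing import Iterable, List


def ringback_source(ringing_indicated: bool, early_media_present: bool) -> str:
    """Decide who supplies ringback for one monitoring interval."""
    if not ringing_indicated:
        return "none"      # the far end is not alerting yet
    if early_media_present:
        return "remote"    # pass through whatever the far end is sending
    return "local"         # nothing in the audio, so generate ringback here


def ringback_timeline(media_seen: Iterable[bool]) -> List[str]:
    """Apply the decision to each interval after ringing has been indicated."""
    return [ringback_source(True, seen) for seen in media_seen]


# Early media shows up two intervals after ringing starts.
print(ringback_timeline([False, False, True, True]))
```

In this run the caller hears two intervals of locally generated ringback and then an abrupt switch to the remote audio, which is the transition described above.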
Further complicating matters: even with the procedures defined in RFC 3960, the exact behavior defined for local versus remote generation of call progress and error tones remains a matter of local policy at the PSTN gateways.
However, the key problem with this approach is that the additional procedures used to set up an early session are not necessarily supported by native SIP terminals, which means that early media tends not to work properly when calling from a native SIP device. This is particularly troublesome in the case of IVRs that expect to play information to a user before establishing the call. Similarly, when a call is made to a SIP device, the gateway and end office must assume responsibility for generating call progress and error tones that would typically come from the remote end office.
There are some other early-media complications that arise when more than one egress gateway is involved, but we’ll save those for next time.