The end of IPv4, part 3: Towards a post-shortage world

May 17th, 2011 by Adam Roach under SIP

In my last two posts [1][2], I talked about the recent exhaustion of the IPv4 address pool, some of the approaches that are being considered to squeeze a little more life out of the existing IPv4 address space, and the unfortunate consequences of those approaches.

I’d like to start this entry by following up on my statement regarding the emergence of an IPv4 exchange market. A few weeks after my last post, NetworkWorld published an article on the IPv4 broker websites that emerged almost immediately after the Asia/Pacific RIR issued its last address to an ISP.

Of course, exhaustion of the IPv4 network space was hardly an unforeseen event. As early as 1995, it was obvious to the IETF that 4.2 billion addresses were simply not enough. And so, IPv4’s successor – IPv6 – was born.

IPv6 actually adds a lot of useful features that weren’t present in IPv4, but the key one that has people talking is that it has a much bigger address space than IPv4. Instead of being able to represent 4.2 billion addresses, IPv6 can represent about 340 undecillion addresses. To put that in perspective, IPv6 would allow each square millimeter of the Earth’s surface, including the oceans, to be assigned hundreds of quadrillions of addresses. Each.
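These figures are easy to sanity-check. The short Python sketch below recomputes them; the Earth surface area of roughly 510 million square kilometers is an assumed input of mine, not a number taken from the discussion above, so treat the per-millimeter result as an order-of-magnitude estimate.

```python
# Back-of-the-envelope check of the IPv4 and IPv6 address-space sizes.
# ASSUMPTION: Earth's total surface area (oceans included) is about 510 million km^2.

IPV4_BITS = 32
IPV6_BITS = 128
EARTH_SURFACE_KM2 = 510e6            # assumed value
MM2_PER_KM2 = (1_000 * 1_000) ** 2   # 1 km = 1,000,000 mm, so 1 km^2 = 1e12 mm^2

ipv4_total = 2 ** IPV4_BITS   # a little under 4.3 billion
ipv6_total = 2 ** IPV6_BITS   # about 3.4e38, i.e. 340 undecillion

earth_surface_mm2 = EARTH_SURFACE_KM2 * MM2_PER_KM2
ipv6_per_mm2 = ipv6_total / earth_surface_mm2

print(f"IPv4 addresses: {ipv4_total:,}")
print(f"IPv6 addresses: {ipv6_total:.3e}")
print(f"IPv6 addresses per square millimeter: {ipv6_per_mm2:.3e}")
```

Under that assumption the last line comes out in the hundreds of quadrillions, which is the point: the IPv6 pool is not going to run dry in any realistic scenario.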

Certainly this technology that can save us from the coming IPv4 pain must be exotic and unavailable, right? Not really. All Microsoft operating systems since Windows 2000 have supported IPv6. Linux has included IPv6 support since 1996. Macs have had IPv6 since OS X 10.2 (which came out in 2002).  In fact, there isn’t a viable operating system connected to the internet that didn’t have IPv6 support by 2003.

So it must be the core of the network holding us back, right? Together, Cisco and Juniper constitute around 80% of the core Internet router market. Cisco has had IPv6 support in its core routers since 2001, and Juniper has had support since at least 2002.

So the networks have supported it for almost a decade, and the endpoints have supported it for almost a decade… what’s the hold up?

There are two barriers left to clear: ISPs and Applications.

Exactly why ISPs don’t offer IPv6 to their customers is a complete mystery to me. Axel Pawlik, the managing director for RIPE (the European regional internet registry) summarized the issue quite succinctly at the beginning of the year: “If [ISPs] do not have any plans for IPv6 now, [they] are irresponsible. They should have that in place, if they do not have that by now something is going seriously wrong.” However, even as the last of the IPv4 addresses are being handed out, most ISPs haven’t begun any public dialog about their transition plans. Notable exceptions exist – Comcast has led the charge in deploying IPv6 to actual customers – but they’re rare exceptions. The other major U.S. ISPs are strangely silent on when they plan to roll IPv6 out to their residential customers. Right now, the only reliable way to get a true IPv6 connection in the U.S. is to work with a boutique ISP like Hurricane Electric or Global Crossing (soon to be part of Level 3).

The other hurdle is application support for IPv6. The biggest applications on the Internet – web browsers, web servers, email – already have a fairly widely deployed base of IPv6-capable software. Less-commonly used applications – for example, Slingbox media players – don’t have IPv6 support yet. Luckily, IPv6 isn’t an all-or-nothing proposition. All of the operating systems I mentioned above have the ability to operate in “dual stack” mode, where they have an IPv4 and an IPv6 address at the same time. Older apps can use IPv4 (which may subject them to the unfortunate effects I discussed in my previous post), while newer apps will be able to avoid the pain of IPv4 exhaustion by using IPv6. So this isn’t a true hurdle in the way that ISP support is; once ISPs start issuing IPv6 addresses to customers, application support will gradually improve over time. And, since the most-used applications are already IPv6 capable, things will start out pretty well anyway.
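To make the dual-stack idea concrete, here is a minimal sketch of how an application on a dual-stack host might prefer IPv6 and fall back to IPv4. The function names are hypothetical, and the "IPv6 first" ordering is a simplifying assumption of mine; real clients usually follow the operating system's address-selection policy instead.

```python
import socket

def order_candidates(addrinfos):
    """Sort getaddrinfo() results so IPv6 candidates are tried before IPv4."""
    # sorted() is stable, so the resolver's order is kept within each family.
    return sorted(addrinfos, key=lambda info: 0 if info[0] == socket.AF_INET6 else 1)

def connect_dual_stack(host, port, timeout=5.0):
    """Try each resolved address in turn, IPv6 first, falling back to IPv4."""
    candidates = socket.getaddrinfo(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM)
    last_error = None
    for family, socktype, proto, _canonname, sockaddr in order_candidates(candidates):
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock            # first address that answers wins
        except OSError as exc:
            last_error = exc       # remember the failure and try the next address
            if sock is not None:
                sock.close()
    raise last_error or OSError(f"no addresses found for {host}")
```

In practice, Python's own `socket.create_connection` already performs a similar loop over the `getaddrinfo` results, so most applications get this fallback behavior without writing it by hand.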

We’ve done a lot of work in the IETF to make sure that SIP can easily be deployed over IPv6. While the version of SIP that was published in 1999 (RFC 2543) didn’t include IPv6 support, the subsequent 2002 version (RFC 3261) did. And it was published along with an addendum to SDP to allow it to talk about IPv6 addresses for sending and receiving media. Subsequent implementation experience led to the publication of some further minor clarifications of SIP’s IPv6 handling, but the core support for IPv6 in SIP has been around for almost nine years. And the good news is that 68% of the SIP implementations present at the most recent SIP interoperability test included IPv6 support. This is up from 53% in late 2010, 36% in 2009, and 30% in 2008.
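For a sense of how small the SDP change is, the sketch below builds a bare-bones audio offer for either address family. It is only an illustration: the function names are hypothetical, the addresses come from the documentation ranges, and a real offer would carry more attributes than this.

```python
import ipaddress

def sdp_addr_type(address):
    """Return the SDP address-type token ("IP4" or "IP6") for an IP literal."""
    return "IP6" if ipaddress.ip_address(address).version == 6 else "IP4"

def build_audio_sdp(address, port, session_id=1, session_version=1):
    """Build a minimal audio-only SDP body advertising the given media address."""
    addr_type = sdp_addr_type(address)
    lines = [
        "v=0",
        f"o=- {session_id} {session_version} IN {addr_type} {address}",
        "s=-",
        f"c=IN {addr_type} {address}",   # the connection line names the address family
        "t=0 0",
        f"m=audio {port} RTP/AVP 0",     # payload type 0 is PCMU (G.711 mu-law)
    ]
    return "\r\n".join(lines) + "\r\n"

print(build_audio_sdp("2001:db8::10", 49170))
```

The only difference between the IPv4 and IPv6 cases is the `IP4` or `IP6` token and the address literal that follows it.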

In other words, things are actually looking pretty good for network applications in general, and really good for SIP applications in particular.

The good news is that most of the really hard problems – getting IPv6 support into the Internet at large, getting IPv6 onto everyone’s computer, and getting IPv6 support into the most important applications on the Internet – have all been taken care of. All we need now is for operators to step up to the plate, and the pain caused by IPv4 exhaustion begins to fade into nothing more than a closing chapter in Internet history.

The end of IPv4, part 2: Living with the Shortage

April 5th, 2011 by Adam Roach under SIP

The last time I posted, we explored the situation surrounding IPv4 address exhaustion. This time, we’re going to look at some of the implications of that shortage.

One approach to addressing the shortage that has been bandied around is the use of what are called “Carrier Grade Network Address Translators” (CG-NATs). The idea behind these CG-NATs is that ISPs would assign all of their users un-routable IP addresses in a private network. The CG-NAT would operate similarly to consumer NATs, opening and closing ports as customers send traffic to and from the network.

The major difference between CG-NATs and consumer NATs is that end users have no control over the policy of these CG-NATs. Consumer NATs typically allow applications to automatically open and close specific ports (critical for many multiplayer games) using UPnP IGD or NAT-PMP; and they allow users to specify port forwarding rules manually (to enable Slingboxes, certain types of VoIP applications, etc).
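A toy model makes the difference easier to see. The class below is a deliberately simplified sketch (all names are hypothetical, and real NATs track protocol, remote endpoint, and timeouts as well): outbound traffic creates mappings automatically, but unsolicited inbound traffic only gets through if someone was allowed to add a forwarding rule.

```python
class SimpleNat:
    """Toy port-translating NAT: one public address shared by many private hosts."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.outbound = {}   # (private_ip, private_port) -> public_port
        self.inbound = {}    # public_port -> (private_ip, private_port)

    def translate_outbound(self, private_ip, private_port):
        """Map an outgoing flow to a public port, creating the mapping on first use."""
        key = (private_ip, private_port)
        if key not in self.outbound:
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.outbound[key]

    def forward_port(self, public_port, private_ip, private_port):
        """Static forwarding rule, the kind a consumer NAT lets its owner add."""
        self.inbound[public_port] = (private_ip, private_port)

    def translate_inbound(self, public_port):
        """Return the private destination, or None if the packet must be dropped."""
        return self.inbound.get(public_port)


nat = SimpleNat("203.0.113.5")
print(nat.translate_outbound("10.0.0.7", 5060))  # ('203.0.113.5', 40000)
print(nat.translate_inbound(8080))               # None: unsolicited, so dropped
nat.forward_port(8080, "10.0.0.7", 80)
print(nat.translate_inbound(8080))               # ('10.0.0.7', 80)
```

On a consumer NAT, the user or an application can make the `forward_port` call. On a CG-NAT, that step is simply not available to them.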

So, what happens when these CG-NATs start popping up between users and the Internet? Well, a lot of stuff breaks. To be fair, the most popular applications – web browsing, email, and most instant messaging clients – will continue to work just fine. But real-time applications? They won’t fare as well. And many VPN technologies – in particular, those based on IPsec – have no hope of working at all.

To add insult to this injury, most users already have NATs deployed in their home networks (often supplied by the ISP itself), which means that deployment of CG-NATs will now put them behind two NATs. And several techniques that applications can use to successfully traverse a single NAT start to break down when multiple NATs are in the way.

But CG-NATs aren’t the only way to stretch the existing IPv4 address pool a little bit. There’s an alternate proposal being put forward by an individual in the IETF, which would effectively issue a fraction of an IP address to each customer. The basic idea is that users’ equipment would receive an IP address and a valid range of TCP and UDP ports that they are allowed to use.  It’s an imperfect solution, to be sure (for example: who gets the high-value ports, like 21, 25, 80, 443, and 993?), but at least it avoids some of the problems that are intractable in CG-NAT scenarios.
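The "high-value ports" problem shows up immediately if you sketch the simplest possible allocation. The code below assumes an equal, contiguous split of the port space among customers sharing one address; that split, the sharing ratio of 16, and the function names are my own illustrative assumptions, not details of the proposal itself.

```python
TOTAL_PORTS = 65536  # TCP and UDP port numbers run from 0 to 65535

def port_range_for(customer_index, customers_per_address):
    """Return the inclusive (first, last) port range given to one customer."""
    if not 0 <= customer_index < customers_per_address:
        raise ValueError("customer_index out of range")
    ports_each = TOTAL_PORTS // customers_per_address
    first = customer_index * ports_each
    return first, first + ports_each - 1

def owner_of_port(port, customers_per_address):
    """Return the index of the customer whose range contains the port, or None."""
    ports_each = TOTAL_PORTS // customers_per_address
    index = port // ports_each
    # Ports left over when the split is uneven belong to nobody.
    return index if index < customers_per_address else None

CUSTOMERS = 16  # hypothetical sharing ratio
print(port_range_for(0, CUSTOMERS))   # (0, 4095)
print(port_range_for(1, CUSTOMERS))   # (4096, 8191)
for port in (21, 25, 80, 443, 993):
    print(port, "belongs to customer", owner_of_port(port, CUSTOMERS))
```

With a naive split like this, every well-known port lands in the first customer's range, and the other fifteen customers cannot host a conventional web or mail server at all.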

It is also predictable that the actual economic value of an IP address will increase. An exchange market in IP addresses – whether a black market or an RIR-facilitated swap – is almost inevitable. Of course, this means that anyone requiring a whole IP address to themselves (for example, to run a web server or an IP PBX) will start paying increasingly stiff fees to continue to use that address.

Hosting just about any service requires the use of key ports on an IP address (e.g., ports 80 and 443 for web sites), so doing so requires one of these “whole IP addresses” that will be going up in price. And it’s not too hard to predict that, as the cost of hosting content goes up, the number of content sources that can afford to make information available will start going down. It will have a distinctly un-democratizing effect on the Internet at large.

Finally, the introduction of CG-NATs – if that is the path carriers take – requires additional hardware, leading to increases in both capital and operational expenses. The introduction of another choke point in the network means that the network becomes increasingly brittle and breaks more often, leading to increased customer service calls, and decreased customer satisfaction.

So, while it’s not the end of the network as we know it, life after address exhaustion will be far worse for users, application providers, and even carriers.

But perhaps it’s not as bleak as things sound. There is a far better solution to the address shortage. I’ll discuss that next time.

The end of IPv4, part 1: What’s Going On?

February 22nd, 2011 by Adam Roach under SIP

The IPv4 address shortage has officially begun.

On February 3rd, the Internet Assigned Numbers Authority (IANA) handed out the last IPv4 address blocks to the five Regional Internet Registries (RIRs).

What does this mean?

The current version of the Internet Protocol (IPv4) has enough room for about 4.2 billion addresses. The way these have been allocated, at least since the mid-90s, is that IP addresses were handed out by IANA in blocks of 16.7 million at a time to the RIRs. There are five regional registries: one each for North America, Europe, Asia/Pacific, Latin America/Caribbean Islands, and Africa.

These regional registries, in turn, hand out much smaller blocks of addresses (depending on their policies) to ISPs; who then assign them (sometimes one-at-a-time and sometimes in blocks) to their customers.

So, when an ISP runs low on IP addresses, it asks its RIR for another block. When the RIR runs low on its own blocks, it asks the IANA for another chunk of 16.7 million addresses.

There are only 256 of those chunks available, and some are reserved for special purposes. Back in 2009, the RIRs and the IANA forged an agreement: once there are only 5 chunks left, they’ll be handed out to the RIRs (one for each). And that’s what happened two weeks ago.
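The block arithmetic is easy to verify with Python's standard `ipaddress` module. In the sketch below, `10.0.0.0/8` is just a convenient example of one such block; any /8 has the same size.

```python
import ipaddress

# One IANA-sized block is a /8: the first 8 bits are fixed, the other 24 vary.
one_block = ipaddress.ip_network("10.0.0.0/8")
whole_space = ipaddress.ip_network("0.0.0.0/0")

addresses_per_block = one_block.num_addresses      # 2**24
total_addresses = whole_space.num_addresses        # 2**32
block_count = sum(1 for _ in whole_space.subnets(new_prefix=8))

print(f"Addresses per block: {addresses_per_block:,}")   # 16,777,216
print(f"Total IPv4 addresses: {total_addresses:,}")      # 4,294,967,296
print(f"Number of blocks: {block_count}")                # 256
```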

For the first time in the Internet’s history, it is possible that an RIR could run out of addresses, and IANA would have no more to give it. And this, of course, means that an ISP could get a new customer, but not have an IP address to assign to him.

There won’t be any major changes immediately — at least, none that normal Internet users will notice. The RIRs have their own policies that kick into place when this “exhaustion phase” happens (for example, in North America, an ISP can’t request addresses more often than every 3 months), but that won’t matter to end-users.

However, sometime in the next year or so, an RIR will hand out its final address. Projections are somewhat volatile, but most signs point to the Asia/Pacific region allocating its final address to an ISP sometime around September or October of this year. And that’s assuming the current situation doesn’t cause a run on addresses.

What happens after that can take a couple of paths. If ISPs are allowed to allocate addresses out of pools from other regions, then the worldwide address pool will completely dry up sometime in late summer or early fall of 2012. If the RIRs stick with allocating only to their own region, then Europe runs out summer of next year; North America runs out fall of next year; and Latin America and Africa run out sometime around early 2015.

In either case, ISPs in Europe, Asia, and North America cannot request new IPv4 addresses by the end of 2012, even using conservative models.

What happens after that is a matter of much speculation.  I’ll engage in that speculation next time.

The Realtime Web

January 11th, 2011 by Adam Roach under SIP

One of the efforts currently gaining some traction within the IETF is a push to define a framework to allow the creation of web pages that can engage in real-time voice and video communication.

For quite some time, JavaScript – a staple for every modern browser – has had primitives that allowed browser-based scripts to send and receive arbitrary data from the server that they came from. This has allowed web sites – such as Gmail and Facebook – to implement presence and instant messaging solutions. These have been performed in near-real-time, which is perfectly acceptable for text messaging (in which latencies of several seconds are unnoticeable).  However, due to its security model, this information was limited to being sent over HTTP (and hence TCP), and little to no support for specific applications was provided.

For real-time video and voice, these behaviors are problematic. TCP is unsuitable as a transport for real-time audio, and the additional latency of sending the media through a web server will typically make the experience frustrating and unusable. Further, current JavaScript APIs provide no access to microphones, cameras, or codecs that would be required to collect, encode, and decode media.

Browser add-ons – such as Flash, Silverlight, and Java – have the ability to solve some of these problems (such as access to the microphone), but still share JavaScript’s “same origin” security policy that requires them to send and receive information only to and from the same web server that the application came from.

So, while we’ve almost been there for quite some time, there are some critical components missing to enable web application developers to provide two-way voice and video communication in a web browser.

At its heart, this effort will require work within two different organizations: the IETF (for selection and definition of network issues, such as datagram transport, framing, connection management, codec selection, and protocols for conference control), and the W3C (for additional JavaScript and HTML5 mechanisms that allow secure access to these protocols, to codecs, and to multimedia input devices). In fact, it’s probably this required division of labor that has caused this work to be so slow to develop: each group correctly identified substantial portions of the problem to be outside their areas of expertise.

Although these discussions are still in their infancy, key participants have proposed some very promising constraints: the use of ICE for firewall traversal (including using TURN servers where appropriate); the use of RTP and SRTP for media conveyance; and session management that is either explicitly based on or trivially  convertible to and from SIP and/or XMPP.  This means that gatewaying to existing real-time networks – such as IMS, Google Talk, Microsoft OCS, most enterprise PBX systems, and many commodity PSTN gateways – should be a straightforward exercise.

This work, if successful, takes us closer to ubiquitous communication. Imagine the convenience of logging into your voice- and video-enabled web site from anywhere, and being able to start making calls without installing a single piece of software. Imagine being able to create a web site that allows your users to interact with customer service representatives using a videoconference, without waiting for anything to download.

I expect there to be a lot of effort put into this work between now and the March IETF meeting in Prague. At that meeting, the plan is to hold a BOF session to nail down the charter for a new working group to complete the IETF portion of this work.

Of course, this level of coordination between the IETF and the W3C has never been attempted before, and I’m sure there will be growing pains as we attempt to jointly push this work to completion. But I’m excited that we’ve taken the first steps in this journey, and have high hopes that something fruitful will come out of the effort.

Multiparty Communications in SIP: A Brief History

October 5th, 2010 by Adam Roach under SIP

A lot of the industry focus in SIP pertains to two-party interactions: a calling party and a called party; an instant message sender and an instant message recipient; and so on. But in the IETF, we’ve actually done quite a bit of work to facilitate group communications.

In fact, the SIP effort originated in the IETF’s Multiparty Multimedia Session Control (MMUSIC) working group. This work traces its roots all the way back to 1992, when the first Conferencing Control (CONFCTRL) BOF met at IETF 25, which was, itself, spawned from the Remote Conferencing Architecture (REMCONF), work that started shortly before that.

Despite this long history of conference-oriented working groups in the IETF, a lot of the thought around how to facilitate communications with three or more parties in SIP didn’t start in earnest until a decade later, with the introduction of a conferencing framework that would eventually be published as RFC 4353. This document spurred some work on preliminary conference control within the SIP protocol itself, using existing SIP tools such as REFER (RFC 3515), “Join” (RFC 3911), and “Replaces” (RFC 3891) to control conference servers. This SIP-based conference control is published as RFC 4579. Its companion document, RFC 4575, defines a means for SIP clients to learn certain advanced facts about the state of an ongoing conference, such as the list of participants and the types of media in use in the conference.

At a very high level, with SIP conference control, conferences spring into existence when users send an INVITE to a URI that corresponds either to a predefined list of users, or to a special URI (a “factory” URI) that creates a new ad-hoc conference. The conference creator can then send SIP “REFER” requests to the conference to add or delete users to and from the conference.
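As a rough sketch of what those REFER requests look like, the helper below assembles only the request line and the headers that matter for the conference operation. It is not a complete SIP message (it omits Via, From, Call-ID, CSeq, Max-Forwards, and Contact), the URIs and the function name are made up for illustration, and the `method=BYE` form for removal follows the convention described in RFC 4579.

```python
def build_refer(conference_uri, participant_uri, remove=False):
    """Build the request line and key headers of a conference REFER (abbreviated)."""
    # Adding a user refers the conference to the user's URI;
    # removing one asks the conference to send that user a BYE.
    target = f"{participant_uri};method=BYE" if remove else participant_uri
    lines = [
        f"REFER {conference_uri} SIP/2.0",
        f"To: <{conference_uri}>",
        f"Refer-To: <{target}>",
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"


conference = "sip:conf-1234@conference.example.com"   # hypothetical conference URI
print(build_refer(conference, "sip:carol@example.com"))
print(build_refer(conference, "sip:carol@example.com", remove=True))
```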

As work on the SIP conference control progressed, two facts became clear: first, the effort to develop a comprehensive conference control protocol could easily become far too large for the SIP working group to tackle while doing other work; and second, the use of SIP to provide more advanced conference control was an increasingly ill fit. As a consequence, the IETF formed the Centralized Conferencing (XCON) working group in 2003. XCON was chartered with creating a new protocol for the purpose of creating and controlling multi-party conferences.

Although initial interest in the XCON work was high, it took considerable time to rationalize the various conflicting approaches that were proposed into a unified system. Some proposals strove for simplicity, while others wanted to define arbitrarily complex systems for describing media mixing and video panel layouts. Some wanted syntactic manipulation of documents representing conference state, while others wanted semantic operations on objects representing conference participants and conferences themselves. In hindsight, this level of conflict isn’t surprising; the problem was identified as early as 1993 in the CONFCTRL notes from IETF 26: “It is difficult to design a CONFCTRL protocol that balances simplicity with a high degree of semantic flexibility, e.g., Jack Jansen concluded that different conferencing styles require entirely separate CONFCTRL protocols.”

While XCON plugged away at its chartered work, the SIP working group (and related groups like SIPPING and SIMPLE) moved forward with several extensions that relate to multiparty communications. RFC 4662 defined a mechanism for subscribing to resource state for several resources at the same time (e.g., to learn presence information for a list of friends all at once). RFC 4825 (XCAP) and RFC 4826 (the resource list XML format) defined the means to create, manipulate, and delete the members of a list, providing users the ability to dynamically change the users that a conferencing URI corresponds to.

Moving beyond these long-lived lists, later work within SIP allowed users to actually send the list of relevant URIs in the request itself, using a framework known as “URI-List Services” (RFC 5363). This framework, along with RFC 5364, defines the syntax for conveying lists of URIs in SIP message bodies (using multi-part MIME bodies) and for tagging copy control attributes (equivalent to “To,” “Cc,” and “Bcc” in email). The framework has been defined for operation with MESSAGE (RFC 5365), INVITE (RFC 5366), SUBSCRIBE (RFC 5367), and REFER (RFC 5368) so far.

URI-list services for MESSAGE allows users to send a single instant message to a special “message exploder” URI, and have that exploder copy the message to all the users listed in the URI list. The INVITE and REFER extensions allow users to apply an action to multiple conference participants when using RFC 4579 mechanisms. And the SUBSCRIBE extensions allow users to subscribe to the presence state for several users at once, without first creating the list of users with XCAP.

Of course, with the ability to send instant messages to many users at once, or to make many phones ring at the same time, comes the potential for abuse. To mitigate this, the URI-list services were published in conjunction with a consent framework (RFCs 5360, 5361, and 5362). Effectively, these consent protocols allow server operators to provide an opt-in experience for users named in URI-list services requests.

Meanwhile, XCON has been making steady and solid progress, and has finally sent its key deliverable – the conference control protocol itself – to the IESG for evaluation and publication as an RFC. At the same time, the SIP instant messaging and presence working group (SIMPLE) is nearing completion on a document that defines specific behavior for text-chat-room conferences. I expect both of these to reach RFC status some time in 2011.

However, even as this work winds down, new work is spinning up in the IETF for controlling some additional media-related aspects of conferences. Specifically, an as-yet unnamed working group is in the process of being chartered for the full-immersion conferences commonly referred to as “telepresence.” The general idea of the work to be taken on is described in the teleconference use case document, with specific proposed deliverables defined in the currently proposed working group charter.

SIP and Network Data: Simplified at Last

August 24th, 2010 by Adam Roach under SIP

Many proposed deployments of SIP are seeing an increasing number of components based on HTTP – for storing information such as feature settings, user-provisioned data, instant message archives, and user agent configuration information.

Unfortunately, while HTTP’s ubiquity makes it a good candidate for storing and retrieving this information, it doesn’t serve as a particularly good substrate for finding out as soon as the information changes. In a real-time system, finding out about these changes immediately becomes important.

There have been some efforts to make HTTP more responsive to these kinds of changes – using, for example, Comet-style approaches. In fact, the IETF has even begun work on a standardized mechanism in the HYBI working group. But Comet stretches the HTTP request-response architecture well beyond its original design goals, resulting in a less-efficient and less-scalable model than can be achieved by a purpose-built solution. And HYBI is still a very young working group, unlikely to yield usable results in the short-term.

Now, the SIP community did recognize the need to store information in the network and discover changes to such information pretty early on. It is for this exact purpose that XCAP (RFC 4825) was developed. Technically, XCAP only provides the mechanism for storing and retrieving information in the network – it is used in conjunction with two companion specifications – RFC 5874 and  RFC 5875 – to find out about changes to the information in real-time.

This XCAP approach is really very robust, scalable, and well-designed. Unfortunately, at 122 pages spread across three documents, it also ended up being rather labyrinthine. It also requires the data to be stored not just as XML, but as XML with certain special restrictions that make addressing individual elements in the document easier. As a consequence, the implementation community, industry fora, and other standards bodies – and IETF working groups, for that matter – have been somewhat loath to use XCAP.

Clearly, we need a simpler mechanism.

The forthcoming RFC 5989 defines exactly this simpler mechanism. Rather than trying to define a large, complicated framework, RFC 5989 defines a fairly minimal SIP event package. Interested clients can use this event package to request notification whenever a specified HTTP resource changes. Here’s how it works.

When a client gets an HTTP resource, it also receives what is called a “link relation.” A link relation is simply a URI that is related to the resource in some way. Link relations can be carried in the HTTP response header, in HTML bodies (using the <link> element), and in Atom bodies (using the <atom:link> element). The client also receives a unique identifier (typically, an ETag) that corresponds to the current contents of the HTTP resource. So, if the resource changes, this unique identifier changes also.

RFC 5989 defines a new link relation type that contains a SIP URI. Clients who want to know when the resource changes subscribe to this SIP URI using the SIP SUBSCRIBE method. Whenever the HTTP resource changes, the clients receive a new SIP NOTIFY message containing a new unique identifier for the changed HTTP resource. They can then compare this tag against the tag in their local copy of the resource, and download a new copy if the two differ.
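The client-side logic for this is small. The sketch below covers the two pieces that don't need a SIP stack: pulling the SIP URI out of an HTTP `Link` header, and deciding whether a notification means the local copy is stale. The relation name `monitor` is what I understand RFC 5989 to register; the function names, URIs, and ETag values are hypothetical, and the header parsing is simplified (it does not handle commas inside quoted parameter values).

```python
import re

# Matches "<uri>" followed by its parameters, up to the next comma.
LINK_RE = re.compile(r"<([^>]*)>([^,]*)")

def find_monitor_uri(link_header):
    """Return the URI of the first link whose rel includes "monitor", else None."""
    for uri, params in LINK_RE.findall(link_header):
        for param in params.split(";"):
            name, _, value = param.partition("=")
            rel_values = value.strip().strip('"').split()
            if name.strip().lower() == "rel" and "monitor" in rel_values:
                return uri
    return None

def needs_refresh(local_etag, notified_etag):
    """True when the tag carried in a NOTIFY differs from the cached copy's tag."""
    return local_etag != notified_etag


# Headers as they might arrive on the initial HTTP GET (made-up values).
link = '<sip:httpmonitor@example.com>; rel="monitor"'
cached_etag = '"abc123"'

print(find_monitor_uri(link))                  # the URI to SUBSCRIBE to
print(needs_refresh(cached_etag, '"abc123"'))  # unchanged: keep the local copy
print(needs_refresh(cached_etag, '"def456"'))  # changed: fetch the resource again
```

Everything in between, namely sending the SUBSCRIBE and receiving the NOTIFY, is ordinary SIP event handling and is left to whatever SIP stack the client already uses.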

By using this approach, clients can maintain a completely up-to-date view of the value of an HTTP resource without constantly polling the HTTP server, resorting to the long-poll approaches of Comet, or burdening the data with the restrictions and complications of XCAP.

SIP Trunking: Request Routing

July 13th, 2010 by Adam Roach under SIP

SIP trunking, broadly defined, is a service in which an Internet Telephony Service Provider (ITSP) provides service to a customer-operated Private Branch Exchange (PBX). There has been considerable work on defining parameters around commercial SIP Trunk offerings over the past few years, including the SIPconnect effort within the SIP Forum and the Business Trunking specification developed by ETSI.

One of the problems that has remained most pervasive, however, is the means by which an ITSP knows where to send messages destined for a particular customer. Early offerings frequently required manual provisioning of customer IP addresses – calls addressed to one of a customer’s phone numbers would be routed to the address that they gave the ITSP when the service was set up. Unfortunately, this approach suffers from a large number of shortcomings. For example, the additional provisioning step of gathering IP address information from customers leads to less efficient provisioning and higher operational costs. Also, this kind of setup requires customers to contact their ITSP if they ever need to change the IP address of their PBX. And, since such provisioning changes often take hours or days, this approach can leave customers without phone service for very long periods of time.

The first serious attempts to solve this problem came from the IMS network, and were modeled on the way IMS handles single users with multiple AORs. Basically, the PBX would register a single identity – a lead number, for example – and the ITSP would presume that calls for all the identities associated with that PBX should be routed to the same destination. It was a very simple solution to the problem, and it worked passably for the kinds of environments that IMS can assume (i.e., tightly controlled walled garden networks, where non-standard behavior can be provisioned into SIP servers by bilateral agreement between the ITSP and the PBX owner).

This naïve solution to the problem, however, suffered from a number of drawbacks. Significant details about processing of inbound INVITE requests were left unspecified, leading to very real deployment issues in the field. Further, this very real change to the semantics of REGISTER – that is, its nature of registering many disparate AORs instead of a single AOR – was not signaled between the PBX and the ITSP. Outside of tightly-controlled walled garden networks, this led to situations in which the ITSP or the PBX thought the IMS mechanism was in use while the other end did not. The resulting call failures – which often would involve signaling loops – were difficult to diagnose, and even more difficult to solve. The solution also suffered from being designed without significant input from SIP protocol experts, making mistakes such as defining a wildcarding syntax that is fundamentally incompatible with SIP syntax in general.

However, the key problems were far more structural than these, which could be solved by minor tweaks to the specification. In particular, while these attempts did manage to make basic calls work under the right circumstances, they were designed without regard for key registration-based mechanisms developed within the IETF. Interaction with the registration event package was added as an afterthought, and in a way that assumed everyone in the network would be aware of the new REGISTER semantics. No provisions were made for allowing the use of temporary GRUUs, which are a critical part of the ability to make and receive calls in an anonymous fashion.

To address this situation, the IETF took on work near the end of last year to specify a mechanism for registering multiple AORs with a single SIP message. This work was spurred predominantly by the SIP Forum’s SIPconnect work. Within the SIPconnect effort, it became apparent that the existing solutions weren’t sufficient for the more general architectures they wanted to enable. The resulting working group – called MARTINI – has been working at a feverish pitch over the past six months to produce a mechanism that solves the registration problem, while addressing the shortcomings of the previous mechanisms.

The proposed solution [1] has largely stabilized, and is now entering a final comment period within the MARTINI working group before being passed off to the IETF leadership for publication. At a high level, this solution sidesteps a large number of the problems that existed in prior solutions by closely simulating what would happen if the PBX sent a separate REGISTER message for each of its phone numbers. In other words, it uses REGISTER to update a registration database, in contrast to earlier solutions that were effectively updating a broader domain routing database.
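The "as if the PBX had sent one REGISTER per number" idea can be modeled in a few lines. This is a conceptual sketch only: the class, the method names, the phone numbers, and the way contacts are formed are all illustrative assumptions of mine, and it says nothing about the actual message syntax the working group settled on.

```python
class Registrar:
    """Toy registration database mapping each AOR to a contact URI."""

    def __init__(self):
        self.bindings = {}   # AOR -> contact URI

    def register(self, aor, contact):
        """Ordinary registration: one AOR, one contact."""
        self.bindings[aor] = contact

    def register_bulk(self, numbers, domain, pbx_host):
        """Treat one bulk request as if a separate REGISTER arrived per number."""
        for number in numbers:
            self.register(f"sip:{number}@{domain}", f"sip:{number}@{pbx_host}")

    def route(self, aor):
        """Look up where a request for this AOR should go, or None if unregistered."""
        return self.bindings.get(aor)


registrar = Registrar()
registrar.register_bulk(
    ["+12145550100", "+12145550101"],   # hypothetical numbers owned by the PBX
    domain="itsp.example.com",
    pbx_host="pbx.customer.example.net",
)
print(registrar.route("sip:+12145550101@itsp.example.com"))
print(registrar.route("sip:+12145559999@itsp.example.com"))   # None: not registered
```

Because each number ends up with its own ordinary binding, anything that already works against the registration database keeps working without knowing that a bulk registration was involved.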

The solution also includes significant provisions to ensure that previously-defined registration-related mechanisms in SIP remain viable for PBXes that choose to use it.

With any luck, then, we should finally have a general-purpose solution to the problem of how to route requests over a SIP trunk to a PBX finished and stabilized within the year. Combined with the other work being done in the SIP Forum SIPconnect group, this should lead to a well-defined, unified specification that allows ITSPs to quickly and confidently deploy SIP trunking services. And that can only be a good thing for SIP.

__

[1] Full Disclosure: I am the editor of the solution developed by the working group, and have been deeply involved in its design.

SIP and “Secure” Communication: What does it mean?

June 1st, 2010 by Adam Roach under SIP

One of the recurring topics in the discussion of SIP security is how you give users the information they need to make informed decisions. In most of these conversations, a parallel is drawn between web browser security and SIP security – usually, in terms of “why can’t SIP terminals have a simple lock icon that tells the user the call is secure?” And all major web browsers do have a simple visual indicator, like these two from Internet Explorer and Firefox:

[Screenshots: the lock icons shown by Internet Explorer and Firefox]

Unfortunately, the issue with SIP is significantly more difficult than that. With web browsers, you really need to ensure only two things: that the website you’re connecting to is the website you think you’re connecting to (authentication), and that no one other than you and the website can see the information you’re sending and receiving (confidentiality). For the web, this is easy to do because TLS (used by https) provides both of these properties.

With SIP, you have at least five different major problems to solve – and possibly more, depending on how you account for them: Caller-ID, Called Party Identity, Media Privacy, Media Authentication, and Signaling Confidentiality.

Caller ID and Called Party Identity

First, when a call arrives, the user is going to want to know who is calling, similar to Caller-ID on today’s PSTN. Jiri did a series of posts (1, 2, 3) detailing the need for identity in the SIP network. (While this is a good treatment of the need for identity, I think its conclusion – that we should use the same spam-prevention mechanisms as email – is a bit naïve; as Ben later points out, 94% of all email is spam, and I think we need to do better than that.)

While some techniques can be employed to “spoof” caller ID information on the PSTN, it’s difficult to do, so people generally can and do trust what their phone says when it rings. On the other hand, since SIP signaling flows all the way out to the edge of the network, this kind of identity is much easier to fake in a SIP network. Some deployment architectures have developed specialized “transitive trust” models that get you pretty close to what the PSTN provides today, but they don’t work across the general Internet, or when you transition from one architecture to another.

A more bulletproof way to convey identity is RFC 4474, which uses cryptography to let a proxy on the call path make an assertion about the calling party’s identity. Unfortunately, RFC 4474 does suffer from some deployment difficulties, such as perceived deficiencies in key distribution, the difficulty in asserting ownership of phone numbers, and bad interactions with SBCs. And while there are good answers to each of those issues, they have still slowed down acceptance of RFC 4474 as a solution.

A related issue is validation that the person you’re trying to reach is the person you’ve actually reached. For example, if Alice is trying to reach Bob but really reaches Charlie, she needs to know this to make an informed decision. This is even more important when Alice is trying to reach, for example, her bank. There are fairly benign reasons that the called party might not be who the caller was trying to reach – a call-forwarding service, for example – but it also may indicate something more nefarious. To fill this niche, RFC 4916 defines a mechanism for conveying called party identity back to a calling party. It shares RFC 4474’s strengths (cryptographic assertions, leveraging the web’s public key infrastructure), but suffers from the same drawbacks as well.

One interesting twist to the behavior of RFCs 4474 and 4916 is that they only protect the caller and called parties’ addresses, not their names. To protect things like caller names, it becomes necessary to use a mechanism like cryptographic certificates with S/MIME.

Media Privacy and Authentication

Another user expectation of “secure calls” is a guarantee that third parties cannot intercept their call. This is especially important when users make calls on a shared network, such as a public WiFi network, a hotel network, or certain types of cable networks. Unless the media itself is encrypted, anyone on the same network can use any one of a variety of easy-to-use call interception tools, including some very sophisticated, free ones, to record any call they want.

The other issue with media is ensuring that the media you receive is coming from the person you think it is. The ability to insert new media into a call can be highly damaging for certain types of calls.

Unfortunately, this area has historically suffered from too many solutions, as opposed to not enough. Luckily, the IETF finally winnowed the solution space down to a single approach for SIP media encryption: RFC 5763. There is also a competing solution in zRTP. zRTP has some interesting properties that Jiri discussed in a previous posting – but it also suffers some non-technical drawbacks (see my response at the end of that article) that are likely to limit its deployment outside of the open-source and hobbyist communities. And, while zRTP provides encryption, it requires an onerous manual step to ensure that you’re talking to the person you think you’re talking to (and, without this protection, your call can be listened to by a sophisticated attacker in the middle of the network).

Hopefully, with the recent publication of RFC 5763, we’ll start seeing more vendor support for media privacy and authentication.

Signaling Confidentiality

A final aspect of SIP security that needs to be addressed is confidentiality of the signaling information itself. For voice calls, access to the signaling allows you to figure out who called whom and when. And, while the privacy implications of exposing that kind of information are evident enough, things get much worse once you start mixing in features like instant messaging and presence: eavesdroppers on this information can learn highly sensitive information, such as the contents of instant message conversations.

Support of TLS to protect information as it passes between network entities (say, from a phone to its proxy) is required by the baseline SIP protocol, and is fairly widely implemented (on average, approximately 50% of the implementations at the SIPit interop event have had TLS support over the past few years). That’s a really good way to ensure that arbitrary third parties can’t eavesdrop on the information being sent.

But TLS doesn’t protect information from being intercepted by servers on the call path.

And while I might be happy to get my SIP service from bobs-discount-voip.com, I may be a bit more reluctant to trust them with things I send and receive via instant messages – things like my banking information. And that brings us back to the use of S/MIME certificates, which can be used to hide this kind of information from proxies on the path (while still providing them enough information to route messages correctly).

Summary

So, back to the original question: if you wanted to have a simple, visual indicator showing that a call is secure… what would it mean? Is it a promise that the phone number on the caller ID is correct? How about the name? Does it mean that the media is encrypted? And, if it is, can you be sure it’s coming from where you think it’s coming from? Is the signaling protected? And, if so, is it protected from everyone, or can proxies along the call path read it? There are so many degrees of freedom here that there’s no good way to render them all to the user in a sensible fashion. And an all-or-nothing indicator (like a single lock icon) is completely nonsensical – as you’ve seen, SIP security is just about as far from “all-or-nothing” as you can get.

At this point, sadly, it’s mostly a moot point anyway – just about all SIP service providers employ exactly none of these techniques. But as user expectations around identity and privacy start colliding with the reality of service providers’ carelessness, we’re going to run into a few challenges making sure that users can be given the information they need to make informed decisions.


SIP and NAT Traversal: If not SBCs, then how?

April 20th, 2010 by Adam Roach under SIP

Several previous entries in this blog have dealt with the issues that arise when SBCs and other back-to-back user agents (B2BUAs) are included in a SIP network. Of course, SBCs do serve useful purposes in the network – that’s why they were deployed – and you can’t really get rid of them until you understand how you’re going to do those things without an SBC in place.

One of the biggest issues that SBCs typically address is helping the audio and video sessions that are set up with SIP get through NATs and firewalls. If we get rid of SBCs, how do we do this? Luckily, the IETF has developed a suite of tools for exactly this purpose: STUN, TURN, and ICE. And, although they’ve been a long time in coming, the final RFC versions of these protocols are due to be published within the next few weeks.

STUN, defined in RFC 5389, allows clients to determine that they are behind a NAT; and, if they are, to figure out which public address and port has been assigned to them by the NAT. Depending on the kind of NAT, this may be sufficient to allow NAT traversal for media. The load on a STUN server is generally very low, since it only has to process one message exchange for each call established. There’s also an adjunct RFC, RFC 5780, which will be published soon; it allows clients to determine some of the properties of the type of NAT they’re behind.

Once a client has used STUN to determine the address assigned to it by the NAT, it can then send this address to the other SIP device as the place to send media.
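The exchange is small enough to sketch in a few lines of Python. What follows is a rough illustration based on the RFC 5389 message layout (20-byte header, magic cookie, XOR-MAPPED-ADDRESS attribute), not a complete client: the function names are my own, only IPv4 is handled, there is no network I/O, and the "response" is synthesized locally using an address from the documentation range.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id=None):
    """Build a Binding Request with no attributes (20-byte header only)."""
    if transaction_id is None:
        transaction_id = os.urandom(12)
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + transaction_id


def build_binding_response(transaction_id, addr, port):
    """Synthesize the success response a server would send (for illustration)."""
    xport = port ^ (MAGIC_COOKIE >> 16)
    xaddr = struct.unpack("!I", socket.inet_aton(addr))[0] ^ MAGIC_COOKIE
    value = struct.pack("!BBHI", 0, 0x01, xport, xaddr)  # reserved, IPv4 family
    attr = struct.pack("!HH", XOR_MAPPED_ADDRESS, len(value)) + value
    header = struct.pack("!HHI", BINDING_SUCCESS, len(attr), MAGIC_COOKIE)
    return header + transaction_id + attr


def parse_xor_mapped_address(message):
    """Return (address, port) as seen from the public side of the NAT, or None."""
    msg_type, length, cookie = struct.unpack("!HHI", message[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        raise ValueError("not a STUN Binding success response")
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", message[offset:offset + 4])
        value = message[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            xport, xaddr = struct.unpack("!HI", value[2:8])
            port = xport ^ (MAGIC_COOKIE >> 16)
            addr = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return addr, port
        offset += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    return None


tid = os.urandom(12)
request = build_binding_request(tid)
response = build_binding_response(tid, "203.0.113.7", 40000)
print(parse_xor_mapped_address(response))
```

In a real client, `request` would go out over UDP to a STUN server, and the address and port parsed from the reply are what the client would advertise in its SIP signaling.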

Figure 1: Using STUN to find an external IP address

 

TURN, defined in the forthcoming RFC 5766, uses a network server to act as a relay for client media. The SIP endpoint uses TURN to set up an association with the TURN server, and then advertises the TURN server’s address as the place that media is to be sent to. It then uses the TURN association to send and receive media through the TURN server to and from the remote endpoint. This has a significantly higher chance of success than using STUN alone. On the other hand, the load on a TURN server is generally very high, as it must relay every packet in a media session to and from the endpoint.
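A back-of-the-envelope calculation shows how lopsided the load is compared with STUN's single message exchange per call. The numbers here are assumptions chosen for illustration, not measurements: a three-minute audio call, one packet every 20 ms in each direction, and roughly 200 bytes per packet on the wire (about what G.711 audio comes to once IP, UDP, and RTP headers are included).

```python
def relayed_load(call_seconds, packet_interval_ms=20, packet_bytes=200):
    """Return (packets, bytes) a relay forwards for one two-way audio call."""
    packets_per_direction = call_seconds * 1000 // packet_interval_ms
    packets = 2 * packets_per_direction  # the relay forwards both directions
    return packets, packets * packet_bytes


packets, total_bytes = relayed_load(call_seconds=180)
print(packets, total_bytes)  # 18000 3600000
```

Under those assumptions, a single relayed call costs the TURN server 18,000 forwarded packets, where a STUN server would have handled one request and one response.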

Figure 2: Using TURN to relay media

 

ICE, defined in the upcoming RFC 5245, doesn’t have dedicated network servers per se. ICE is a technique employed by the endpoints to find the “best” viable path between the endpoints. ICE uses both STUN and TURN as means to collect potential candidate addresses. The endpoints then try these candidate addresses (along with other addresses they have, such as local IP addresses) pair-wise with the other endpoint. There’s a ranking system that ICE uses to try to find the “best” path (direct is better than through a NAT; through a NAT is better than using TURN, etc). This allows it to set up an optimal connection with the other terminal without needing detailed information about the network topology.
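The ranking can be sketched in Python using the priority formulas and recommended type preferences from the ICE specification (126 for local "host" addresses, 100 for STUN-derived "server reflexive" addresses, 0 for TURN relays). This is a simplified illustration: the function names are mine, every candidate gets the same local preference and component ID, and real ICE goes on to run connectivity checks on the pairs rather than just sorting them.

```python
TYPE_PREFERENCE = {"host": 126, "srflx": 100, "relay": 0}


def candidate_priority(cand_type, local_pref=65535, component_id=1):
    """priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)."""
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_pref << 8) + (256 - component_id)


def pair_priority(controlling, controlled):
    """Combine the two agents' candidate priorities (G and D) into a pair priority."""
    g, d = controlling, controlled
    return (min(g, d) << 32) + 2 * max(g, d) + (1 if g > d else 0)


def rank_pairs(local_types, remote_types):
    """Return (priority, local, remote) tuples, best pair first."""
    pairs = [
        (pair_priority(candidate_priority(lt), candidate_priority(rt)), lt, rt)
        for lt in local_types
        for rt in remote_types
    ]
    return sorted(pairs, reverse=True)


for priority, local, remote in rank_pairs(["host", "srflx", "relay"],
                                          ["host", "srflx", "relay"]):
    print(f"{local:>5} <-> {remote:<5} {priority}")
```

Because the pair priority is dominated by the lower of the two candidate priorities, any pair that involves a relay sorts below every pair that does not, which is how ICE keeps traffic off the TURN server whenever a cheaper path works.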

Figure 3: Using STUN and TURN with ICE

 

In practice, the application of ICE gives endpoints about the same chance of success as TURN does. The key difference is that when the endpoints use ICE, the TURN server isn’t burdened with calls that could have succeeded using STUN or direct connections.

