
Archive for the ‘SIP’ Category

Super Storm Sandy Highlighted Need for Signaling in Crisis Mode

When natural disasters like tropical storm Sandy hit, IP networks bring about a different challenge than traditional networks. Where network operators traditionally could block or throttle traffic after a storm to ensure congestion would not bring down networks, the status quo now is to have many elements of the network under the control of a 3rd party, which means operators cannot directly control all parts of their networks in a crisis.

Because IP invites many new methods for communicating, it also has to invite many new methods for managing the network. And as we see it, the network must be controlled at two different points: the packet network where the data flows, and the control plane where the signaling controls the sessions.

We also see two distinct forms of signaling: signaling in the RAN, and signaling at the core with Diameter. These forms of signaling serve different purposes. The signaling at the RAN typically establishes a data session (or a voice session, if applicable), while signaling in the core uses Diameter to authorize and authenticate subscribers. Though the latter is not invoked as frequently as RAN signaling, it is just as critical to the operation of the network.

As proven during Sandy and other natural disasters, congestion of the core signaling network is a key concern operators have to address when friends and families flood lines in search of loved ones. When the core fails, nothing works, which makes the core a critical component in the network. This was also true within the SS7 domain, where operators also blocked traffic at the core level.

But, in using a point-to-point architecture, where the Diameter end-points are actually embedded within a network element, blocking of traffic could become difficult, if not impossible. That is attributable to the fact that congestion control can be applied only at the point at which the function resides. It’s well accepted, therefore, that a centralized approach to end-to-end core network congestion control is most effective.

The Importance of a Diameter Signaling Router in Crisis Situations

Geographic redundancy and traffic control are paramount to a robust signaling network that can survive any crisis. There are countless examples of how the SS7 network survived calamities such as floods, earthquakes, fires, and even terrorist attacks. It was usually geographic redundancy and optimal routing, managed through the core rather than the end points, that made this possible.

In a Diameter world, the Diameter protocol itself does not inherently support automatic re-routing and disaster recovery functions as SS7 did, but the same can be accomplished through a centralized routing function in the network core. That’s why a Diameter routing agent like our Diameter Signaling Router (DSR) is becoming so important to preventing core signaling outages during a crisis. The DSR ensures messages reach their destination through alternative routes known to the DSR. That means the messages so important to subscriber databases like the Home Subscriber Server (HSS), policy servers (PCRF), charging systems and gateways will get through in times of disaster.

And most importantly, it means operators can continue to generate revenue from services requiring Diameter signaling, even in times of disaster.


What is a Signal?

Ever wonder what Lily Tomlin was doing when she would say “one ringy dinghy, two ringy dinghy”? Or how about Sarah in Mayberry RFD when Andy would pick up the phone, turn the crank a few times, and ask her to connect him to Aunt Bea? These are all examples of signaling being used to connect calls in the days before electronic switching. When you wanted to make a call, you turned a crank on the side of the phone, which then triggered “signaling” in the form of a light illuminating and a bell ringing on a switchboard.

The operator would then ask a series of questions so she knew how to connect your call (signaling again), after which she would manually plug a cord into a jack on the switchboard, completing the circuit to the destination, or to another operator in another city.

Signaling has changed drastically through the years, with everything involved now fully automated. Signaling allows the various elements within a network to communicate with each other regarding a specific connection. But nowadays, signaling takes many forms, depending on its purpose. There is signaling between a mobile device and the cell tower. There is signaling between the cell tower and the core network. And there is signaling within the core of the network. Regardless of its purpose, signaling up to now has been nothing more than pure overhead, contributing little to service provider revenue.

Though signaling has taken many forms over the years, the industry is now making a concerted effort to consolidate technologies and reduce the number of signaling methods used in networks to just two: Session Initiation Protocol (SIP) for connecting voice and video, and Diameter protocol for authorizing and authenticating subscribers and their devices.

Not only is Diameter used to access the subscriber databases that authorize network access, but it is also used for charging. Most importantly, Diameter is used by network elements to communicate with the Policy and Charging Rules Function (PCRF).

It is the PCRF in the Evolved Packet Core (EPC) that allows service providers to personalize services they deliver to their subscribers, whether tiered service plans, parental controls or others. The role of policy in the network continues to grow as service providers get more and more creative with the rules they can generate to control the traffic in their networks and define new services.

The PCRF not only contributes to the bottom line on the balance sheet, but it generates new revenue streams for service providers such as mobile advertising and over-the-top (OTT) application subsidies.

Never before has one function in the network represented so many new opportunities for service providers, which are literally redefining the role that they play in the mobile ecosystem. They can now offer to their subscribers more intelligent choices tailored to their lifestyles, while also engaging new partners previously seen as competitors for the purpose of creating more compelling services.

OTT players such as Google, Facebook, and YouTube depend heavily on the network to reach their subscribers, but until now have contributed little to nothing back to the service providers as compensation for the network costs. But that can change as OTT players come to realize the value of becoming partners. As that happens, signaling will continue to move to the spotlight as a revenue generator rather than a pure cost of doing business.

As that happens, Diameter will be the signaling protocol that makes monetization of OTT services possible, and it may be the one technology that changes the face of service provider business models forever.

Yet More on Proxy versus B2BUA

June 22nd, 2011 by Jiri Kuthan under SIP

It seems that for some readers I was not technical enough on this subject in the past, so let me address that, particularly with regard to transparency and feature richness.

First of all, what exactly is lack of transparency, and why is it bad? In the Internet context, transparency means the network is guaranteed not to interfere with traffic between end devices. Transparency therefore affords innovation, because new features will not be impaired by unexpected network behavior. More can be found in several RFCs, namely RFC 4924 and RFC 2775. Similarly, in the SIP context, the network represented by SIP servers is transparent if the servers interfere only a little with traffic.

The behavior of a SIP proxy server is strictly governed by RFC 3261: a compliant SIP proxy server modifies only a few header fields to mark the SIP message path (Via, Record-Route, Route), and that’s it. In the field, such a hard constraint turned out to be impractical, particularly because of NAT traversal. In many cases, SIP traffic from behind NATs advertises un-routable addresses (affected message parts: SDP payload, Contact header field). “Pragmatic” proxy server implementations then choose to give up on strict compliance and change the un-routable addresses to routable ones.
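As a rough illustration of what such a “pragmatic” rewrite involves, here is a minimal Python sketch. The function names, the regular expression, and the restriction to RFC 1918 IPv4 literals are assumptions made for this example; real proxy servers handle many more cases (SDP bodies, IPv6, hostnames) and this is not how any particular product does it.

```python
import ipaddress
import re

# Assumption for this sketch: only RFC 1918 IPv4 ranges count as un-routable.
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

# Hypothetical pattern: user part, IPv4 literal, optional port of a SIP URI.
_CONTACT_HOST = re.compile(
    r"(sips?:[^@>\s]+@)(\d{1,3}(?:\.\d{1,3}){3})(?::(\d+))?")


def is_unroutable(host: str) -> bool:
    """True if host is an IPv4 literal inside one of the RFC 1918 ranges."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False
    return any(addr in net for net in RFC1918)


def fix_contact(contact: str, src_ip: str, src_port: int) -> str:
    """Replace an un-routable Contact address with the packet's source address."""
    match = _CONTACT_HOST.search(contact)
    if match is None or not is_unroutable(match.group(2)):
        return contact  # routable or not an IPv4 literal: change nothing
    rewritten = f"{match.group(1)}{src_ip}:{src_port}"
    return contact[:match.start()] + rewritten + contact[match.end():]


print(fix_contact("<sip:alice@10.0.0.5:5060>", "198.51.100.15", 55021))
```

Note that the sketch leaves routable addresses untouched, which mirrors the point above: the proxy departs from strict compliance only when it has to.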

The key difference in a B2BUA is that rewriting SIP traffic is not the exception; it is the rule. SIP proxy servers are bound to keep messages unaltered, and implementations only deviate from compliance when necessary. In contrast, a B2BUA is not bound by any standard to keep traffic unaltered. In fact, many B2BUA implementations are built upon the concept that altering as much as possible is a feature. Some apparently suspect this is a somewhat academic concern. It is definitely not. Here is a very real-world implication of a B2BUA: you simply cannot troubleshoot traffic. If a request is changed while traversing a B2BUA, it is very hard to correlate incoming traffic with outgoing traffic. Even if, in the absence of a unique Call-ID, you try to correlate by phone numbers, there are numbers to which so many calls are routed that you will get lost in tons of traffic.

Similarly, the argument that a B2BUA produces features is fairly broken, as it in fact tends to break them. A good example is the attempt to standardize reverse verification of incoming calls as a practice. The idea has been fairly simple: when a call is coming in, and before your phone begins to ring, you ask the originating end device whether the call is genuine. The argument from B2BUA vendors has been that you cannot do this, because a B2BUA’s way of mangling traffic removes any valid reference between the original call and the verification request.

In short, a B2BUA makes aggressive changes to SIP traffic, and these changes have a negative impact both on operations and on the introduction of new features.

Some also argue that a B2BUA creates new features. That may sometimes be the case, though not always; see the reverse verification example. More precisely, this is the case when the nature of a feature requires an automaton to alter existing calls. The most frequent case I am aware of is network-terminated calls, either on exhaustion of prepaid calling card credit or because of dead end devices.

Keeping call information does not come for free though: it increases memory consumption and the effort needed to replicate data for reliability. This may sound uninteresting in the age of cheap gigabytes of memory; however, in high-density deployments that utilize servers’ memory to the full extent, it means buying more equipment and being less “green”.

What I can therefore recommend is to use “pragmatic” proxy servers for better scalability and fewer side effects, unless you need to terminate calls from inside SIP networks. It may be reassuring to find that some vendors offer servers that support both a “passive” proxy mode and an active call-state approach, which lets you change your operational mode if your network’s requirements change.

Wait a Second!

June 9th, 2011 by Robert Sparks under SIP

SIP has several robustness mechanisms that leverage being able to say “Wait a bit before you try that again.”

A 486 Busy Here response can contain a Retry-After header field, allowing the endpoint to say “Please don’t try to call here again for 30 minutes,” based perhaps on knowledge it obtained from its user’s calendar.

A 500 Server Internal Error response can use Retry-After to say “Something’s keeping me from servicing this particular request right now, but please try again in 5 seconds.”

A 503 Service Unavailable error can use Retry-After to say something much stronger: “Something’s keeping me from servicing _any_ requests. Don’t send me anything more for at least 30 seconds.” As we’ll see in a moment, this is a very strong statement – one that needs to be carefully invoked.

The SIP Events architecture provides a way for an event server (such as a presence server) to say “I’m tearing down this subscription, and I need you to resubscribe, but don’t try to do so until at least 20 seconds have passed.” It does this with a NOTIFY containing a Subscription-State header similar to this:

    Subscription-State: terminated;reason=probation;retry-after=20

These mechanisms allow servers to avoid, and even redistribute, load. A registrar handling a burst of simultaneous registrations can quickly tell some or all of them to wait, using different wait times to spread the returning load out a little. One node in a cluster of presence servers can move its subscriptions to its peers by throwing all of its subscriptions into probation, as described above, again using a range of different wait times for different subscriptions. As the clients re-establish their subscriptions, the mechanisms for finding SIP servers can distribute the subscriptions among the peers.
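To make the idea of spreading the returning load concrete, here is a minimal Python sketch that assigns a different retry-after value to each subscription being put into probation. The function name, the wait range, and the use of uniform random waits are assumptions for illustration; a real server might stagger waits deterministically or weight them by load.

```python
import random


def probation_headers(subscription_ids, min_wait=10, max_wait=60, rng=None):
    """Build one Subscription-State header per subscription, each with its
    own retry-after value drawn from [min_wait, max_wait] seconds."""
    rng = rng or random.Random()
    headers = {}
    for sub_id in subscription_ids:
        wait = rng.randint(min_wait, max_wait)  # inclusive on both ends
        headers[sub_id] = (
            "Subscription-State: terminated;reason=probation;"
            f"retry-after={wait}"
        )
    return headers


# Hypothetical subscription identifiers, for demonstration only.
for sub_id, header in probation_headers(["sub-1", "sub-2", "sub-3"]).items():
    print(sub_id, header)
```

Because each client waits a different amount of time, the resubscriptions arrive spread over the chosen window instead of all at once.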

While the mechanisms are useful in the situations I’ve described so far, and may be exactly the tools an application relying on a limited external resource like a specialized DSP needs, they aren’t sufficient to handle the general case of overload protection. The granularity the tools work at is either very small (affecting this particular method applied to this particular resource), or very large (affecting all traffic between two elements). The IETF’s SOC working group is developing richer ways to help a server avoid being overloaded.

But even with those tools, there are situations where crushing load can appear before mechanisms at the SIP layer have a chance to help. Avalanche-restart scenarios, in which whole campuses or even cities full of clients all come online at the same time (due perhaps to restoration of power), are a good example. In the extreme, action closer to the physical layer of the network (such as using firewalls to introduce the load in smaller increments) is warranted.

Finally, like most tools, using them without understanding what they do can lead to surprising results. Any code that generates a 503 Service Unavailable response, for example, deserves careful inspection. Some early proxy implementations make the mistake of forwarding 503 responses, when they should be taking a received 503 as input into generating their own final response. By blindly forwarding a 503, they are saying “Stop talking to me” instead of “I can’t find something that can handle this request,” which leads to unintended failures, such as the following:

Here, Alice and Bob are in a SIP dialog, perhaps for a phone call. Carol and Dave are in a separate dialog, either a different call or perhaps they have a subscription set up.  

[Figure: 503fig1]

Something goes wrong with Bob’s UA and it has to return a 503 to a request it received.  Proxy 2 does the wrong thing and forwards the 503.

[Figure: 503fig2]

Now Carol’s next request towards Dave can’t be forwarded through Proxy 2, even though there was nothing really preventing Proxy 2 from being able to service the request.  Carol and Dave have lost service unnecessarily and have no idea why.

[Figure: 503fig3]

Alice (or anyone else whose requests towards Dave would have taken the path from Proxy 1 to Proxy 2) can’t reach Dave either.

Proxy 2 should have returned its own response, probably a 480 Temporarily Unavailable, to the request that elicited the 503 from Bob’s UA. That way only the requests Alice was sending to Bob would be affected.
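The rule can be sketched in a few lines of Python. The function name and the default replacement code are assumptions for this example; the 480 default simply follows the suggestion above.

```python
def upstream_status(received_status: int, replacement: int = 480) -> int:
    """Return the status code a proxy sends upstream for a received response.

    A 503 received from downstream is treated as input to the proxy's own
    decision rather than forwarded, because forwarding it would tell the
    upstream neighbor to stop sending this proxy anything at all.
    """
    if received_status == 503:
        return replacement
    return received_status


print(upstream_status(503))  # the proxy answers with its own final response
print(upstream_status(486))  # other final responses pass through unchanged
```

The replacement code is left as a parameter because deployments differ; RFC 3261 section 16.7, for instance, suggests generating a 500 when a 503 is the only response received.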

MSRP Session Match Backwards Compatibility

May 24th, 2011 by Ben Campbell under SIP

My last post described the MSRP Session Match extension (aka “sessmatch”) and its purpose. I mentioned that there were some backwards compatibility issues. This week I will discuss those issues in more detail.

The session matching criteria in sessmatch were intended to be backwards compatible with RFC 4975. Remember, sessmatch relaxes the session matching rules so that only the “session identifier” component of an MSRP URI is considered when matching an MSRP request to a session. An MSRP URI that matches a session according to RFC 4975 (that is, as an exact URI match) always also matches according to sessmatch. And as long as nothing modifies an MSRP URI in the SDP offer and answer, things will still match according to RFC 4975 rules.
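The difference between the two matching rules can be sketched in Python as follows. The function names are made up for this example, and the RFC 4975 comparison is simplified to plain string equality (the RFC's actual URI comparison has additional rules, for example around case).

```python
def session_id(msrp_uri: str) -> str:
    """Extract the session identifier from msrp://host:port/session-id;transport."""
    _, _, rest = msrp_uri.partition("://")
    _, _, tail = rest.partition("/")
    return tail.split(";", 1)[0]


def matches_rfc4975(uri_in_request: str, uri_in_sdp: str) -> bool:
    """Simplified exact-match rule: the whole URI must be identical."""
    return uri_in_request == uri_in_sdp


def matches_sessmatch(uri_in_request: str, uri_in_sdp: str) -> bool:
    """Relaxed rule: only the session identifier component is compared."""
    return session_id(uri_in_request) == session_id(uri_in_sdp)


original = "msrp://bob.example.com:3464/siefkd938;tcp"
rewritten = "msrp://sbc.example.com:2855/siefkd938;tcp"  # hypothetical SBC rewrite

print(matches_rfc4975(original, rewritten))    # host and port differ
print(matches_sessmatch(original, rewritten))  # session identifier is the same
```

Any pair of URIs that passes the exact-match rule also passes the relaxed one, which is the sense in which the criteria were meant to be backwards compatible.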

But keep in mind that the whole point of sessmatch is to allow a device, for example an SBC, to modify the MSRP URIs in the SDP. So I think we can assume that, whenever sessmatch is used, something probably does modify the URIs.

As sessmatch was specified prior to version 11 of the draft, an MSRP endpoint that supports sessmatch, and is also behind an SBC, cannot talk to an endpoint that does not support sessmatch.

And even if both endpoints support sessmatch, things break down if one endpoint uses an SBC and the other uses an MSRP Relay. The sessmatch extension, as currently written, does not apply to MSRP relays. But relays use the same session matching rules as in RFC 4975. So if an SBC modifies the offer or answer, then the session will fail to match at the relay.

That problem could be mitigated if we extended sessmatch to apply to relays as well. But there’s still a more subtle problem. One very popular feature of SBCs is something called “topology hiding.” Topology hiding means that the SBC hides artifacts (such as IP addresses, host names, and URIs) that identify hosts on one side of the SBC from devices on the other side. This is done for a variety of reasons. Some providers believe their internal topology to be sensitive information. Some want to anonymize the IP addresses of their customers.

Topology hiding can be applied in any number of places. For example, it’s common to hide SIP Route, Record-Route, and Via header field values. But if an SBC performs topology hiding on SDP payloads, then MSRP relays will break.

An endpoint that uses an MSRP relay puts the entire MSRP path to the endpoint in its SDP path attribute. This may look something like the following:

a=path:msrp://relay.example.com:2855/asfd34;tcp msrp://bob.example.com:3464/siefkd938;tcp

MSRP Relays are session-stateless. They need this entire path in order to figure out where to route MSRP messages. But if a topology hiding SBC removes part of that path, the relay can’t deliver inbound messages. For example, the SBC might change the previous path attribute to look like this:

a=path:msrp://sbc.example.com:2855/asfd34;tcp 

Now when Bob’s peer tries to send an MSRP SEND request, it will probably get as far as Bob’s relay. But the relay will have no idea how to send it on to Bob.
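A small Python sketch shows why the rewritten path leaves the relay stranded. The helper names are hypothetical, and the routing model is deliberately simplified: the relay is assumed to look only at the entry that follows its own in the path.

```python
def parse_path(sdp_line: str) -> list:
    """Split an SDP 'a=path:' attribute into its ordered list of MSRP URIs."""
    prefix = "a=path:"
    if not sdp_line.startswith(prefix):
        raise ValueError("not a path attribute")
    return sdp_line[len(prefix):].split()


def next_hop(path: list):
    """Return the URI after the first entry, or None if nothing follows it."""
    return path[1] if len(path) > 1 else None


original = ("a=path:msrp://relay.example.com:2855/asfd34;tcp "
            "msrp://bob.example.com:3464/siefkd938;tcp")
hidden = "a=path:msrp://sbc.example.com:2855/asfd34;tcp "

print(next_hop(parse_path(original)))  # Bob's URI: the relay knows where to go
print(next_hop(parse_path(hidden)))    # None: nothing left to route on
```

Since the relay keeps no session state of its own, the information removed by the SBC cannot be recovered at the relay.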

Version 11 of the sessmatch draft recognizes the backward compatibility issues and proposes a new SIP option tag that indicates an endpoint both supports and is deployed in a way that it can actually use the sessmatch extension. This will at least allow endpoints to fail in a graceful way if their network policies prevent them from exchanging MSRP messages. In the best case, the endpoint behind the SBC could attempt to use an MSRP relay instead, or an SBC could change its behavior for the session to avoid incompatibilities. But in reality, providers that use an SBC in the first place are unlikely to allow such fallbacks as a matter of policy.

The option tag will help prevent the balkanization of the MSRP protocol itself. Unfortunately, the need for sessmatch in the first place shows that service providers, through incompatible policies and network designs, are likely to break the MSRP user communities into islands that can’t talk to each other.

The end of IPv4, part 3: Towards a post-shortage world

May 17th, 2011 by Adam Roach under SIP

In my last two posts [1][2], I talked about the recent exhaustion of the IPv4 address pool, some of the approaches that are being considered to squeeze a little more life out of the existing IPv4 address space, and the unfortunate consequences of those approaches.

I’d like to start this entry by following up on my statement regarding the emergence of an IPv4 exchange market. A few weeks after my last post, NetworkWorld published an article on the IPv4 broker websites that emerged almost immediately after the Asia/Pacific RIR issued its last address to an ISP.

Of course, exhaustion of the IPv4 network space was hardly an unforeseen event. As early as 1995, it was obvious to the IETF that 4.2 billion addresses were simply not enough. And so, IPv4’s successor – IPv6 – was born.

IPv6 actually adds a lot of useful features that weren’t present in IPv4, but the key one that has people talking is that it has a much bigger address space than IPv4. Instead of being able to represent 4.2 billion addresses, IPv6 can talk about 340 undecillion addresses. To put that in perspective, IPv6 will allow each square millimeter of earth, including the oceans, to be assigned over 67 million addresses. Each.
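The size comparison is easy to check with a few lines of Python. The Earth surface area of roughly 510 million square kilometers is an assumption brought in for this calculation and is not stated in the post.

```python
IPV4_BITS = 32
IPV6_BITS = 128
EARTH_SURFACE_KM2 = 510_000_000  # assumption: approximate total surface area
MM2_PER_KM2 = 10**12             # 1 km = 10**6 mm, squared

ipv4_total = 2**IPV4_BITS
ipv6_total = 2**IPV6_BITS
earth_surface_mm2 = EARTH_SURFACE_KM2 * MM2_PER_KM2
per_mm2 = ipv6_total // earth_surface_mm2

print(f"IPv4 addresses: {ipv4_total:,}")
print(f"IPv6 addresses: {ipv6_total:.3e}")
print(f"IPv6 addresses per square millimeter: {per_mm2:.3e}")
```

Under that assumption, the per-square-millimeter result comfortably exceeds the figure quoted above.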

Certainly this technology that can save us from the coming IPv4 pain must be exotic and unavailable, right? Not really. All Microsoft operating systems since Windows 2000 have supported IPv6. Linux has included IPv6 support since 1996. Macs have had IPv6 since OS X 10.2 (which came out in 2002).  In fact, there isn’t a viable operating system connected to the internet that didn’t have IPv6 support by 2003.

So it must be the core of the network holding us back, right? Together, Cisco and Juniper constitute around 80% of the core Internet router market. Cisco has had IPv6 support in its core routers since 2001, and Juniper has had support since at least 2002.

So the networks have supported it for almost a decade, and the endpoints have supported it for almost a decade… what’s the hold up?

There are two barriers left to clear: ISPs and Applications.

Exactly why ISPs don’t offer IPv6 to their customers is a complete mystery to me. Axel Pawlik, the managing director for RIPE (the European regional internet registry) summarized the issue quite succinctly at the beginning of the year: “If [ISPs] do not have any plans for IPv6 now, [they] are irresponsible. They should have that in place, if they do not have that by now something is going seriously wrong.” However, even as the last of the IPv4 addresses are being handed out, most ISPs haven’t begun any public dialog about their transition plans. Notable exceptions exist – Comcast has led the charge in deploying IPv6 to actual customers – but they’re rare exceptions. The other major U.S. ISPs are strangely silent on when they plan to roll IPv6 out to their residential customers. Right now, the only reliable way to get a true IPv6 connection in the U.S. is to work with a boutique ISP like Hurricane Electric or Global Crossing (soon to be part of Level 3).

The other hurdle is application support for IPv6. The biggest applications on the Internet – web browsers, web servers, email – already have a fairly widely deployed base of IPv6-capable software. Less-commonly used applications – for example, Slingbox media players – don’t have IPv6 support yet. Luckily, IPv6 isn’t an all-or-nothing proposition. All of the operating systems I mention above have the ability to operate in “dual stack” mode, where they have an IPv4 and IPv6 address at the same time. Older apps can use IPv4 (which may subject them to the unfortunate effects I discussed in my previous post), while newer apps will be able to avoid the pain of IPv4 exhaustion by using IPv6. So this isn’t a true hurdle in the way that ISP support is; once ISPs start issuing IPv6 addresses to customers, application support will gradually improve over time. And, since the most-used applications are already IPv6 capable, things will start out pretty good anyway.

We’ve done a lot of work in the IETF to make sure that SIP can easily be deployed over IPv6. While the version of SIP that was published in 1999 (RFC 2543) didn’t include IPv6 support, the subsequent 2002 version (RFC 3261) did. And it was published along with an addendum to SDP to allow it to talk about IPv6 addresses for sending and receiving media. Subsequent implementation experience led to the publication of some further minor clarifications of SIP’s IPv6 handling, but the core support for IPv6 in SIP has been around for almost nine years. And the good news is that 68% of the SIP implementations present at the most recent SIP interoperability test included IPv6 support. This is up from 53% in late 2010, 36% in 2009, and 30% in 2008.

In other words, things are actually looking pretty good for network applications in general, and really good for SIP applications in particular.

The good news is that most of the really hard problems – getting IPv6 support into the Internet at large, getting IPv6 onto everyone’s computer, and getting IPv6 support into the most important applications on the Internet – have all been taken care of. All we need now is for operators to step up to the plate, and the pain caused by IPv4 exhaustion will begin to fade into nothing more than a closing chapter in Internet history.

SIP Load Balancing != IP Based-Load Balancing

May 12th, 2011 by Dorgham Sisalem under SIP

When it is time to scale up a SIP infrastructure the network planner will most likely ask himself: Because DNS is not a sufficient solution, would a simple IP load balancer be OK?

A simple IP load balancer would act as a front-end for the SIP cluster and all traffic going to the SIP cluster would pass the load balancer. This can be achieved by having a DNS entry for the SIP cluster that maps the URL of the cluster to an IP address that is served by the load balancer. The IP load balancer would then distribute the incoming SIP traffic using some load distribution mechanism such as round-robin or based on the hash of the source IP address.

Such an approach might be sufficient for the case when the SIP nodes in the cluster are transaction stateless SIP proxies. In all other cases, this simple approach would not work:

  • Responses and requests for the same transaction should traverse the same nodes. Hence, the load balancer should at least be able to route the responses based on the VIA header, otherwise the response will reach a SIP node that knows nothing about the transaction and will most likely just drop the response or generate an error. This means that the load balancer will need to act as a transaction stateless proxy and parse at least the VIA headers.
  • If all requests that belong to the same dialog are expected to be processed by the same server in the cluster, then using round-robin or a hash of the source IP address will not work either. This would be the case if, for example, the SIP server is collecting and generating CDRs or the SIP server is an IVR. Why round-robin is not an option should be clear. Using a hash of the source IP address for determining the SIP node could work in a perfect world. However, a SIP client might change its IP address during a dialog, or the size of the cluster might change: if a server is added to or removed from the cluster, the hashing mechanism will lead to wrong results.
  • In some scenarios such as clusters of PSTN gateways, the nodes of the cluster might generate calls themselves. In this case the load balancer will need to be able to route the incoming responses to the right nodes. This will require the load balancer to be able to process the SIP headers and route the responses using the VIA headers.
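The problem with hashing the source IP address when the cluster size changes can be demonstrated with a short Python sketch. The node names, the client addresses, and the choice of SHA-256 with a simple modulo are assumptions for illustration, not a description of any particular load balancer.

```python
import hashlib


def pick_node(source_ip: str, nodes: list) -> str:
    """Map a source IP to a node by hashing it modulo the cluster size."""
    digest = hashlib.sha256(source_ip.encode()).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def remapped_fraction(clients: list, before: list, after: list) -> float:
    """Fraction of clients whose node changes when the cluster changes."""
    moved = sum(1 for c in clients
                if pick_node(c, before) != pick_node(c, after))
    return moved / len(clients)


clients = [f"192.0.2.{i}" for i in range(1, 201)]  # hypothetical clients
before = ["node-a", "node-b", "node-c"]
after = before + ["node-d"]                        # one server is added

print(f"{remapped_fraction(clients, before, after):.0%} of clients move")
```

With a plain modulo, adding a single node sends most clients to a different server, so any dialog in progress at that moment lands on a node that knows nothing about it.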

So, in short, a load balancer for a cluster of SIP nodes must have some SIP logic. The level of SIP logic will depend however on the usage scenario and the type of servers in the cluster as well as the expectations of the operator.
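As an example of the minimum SIP logic involved, the following Python sketch shows how a load balancer that added its own Via header could pick the destination for a response from the next Via entry. The function names and the sample addresses are hypothetical, and the parsing is far less thorough than a real SIP stack requires.

```python
def parse_via(value: str):
    """Split 'SIP/2.0/UDP host:port;param=val' into (transport, sent_by, params)."""
    proto, _, rest = value.strip().partition(" ")
    sent_by, *raw_params = rest.strip().split(";")
    params = {}
    for raw in raw_params:
        key, _, val = raw.partition("=")
        params[key.strip().lower()] = val.strip()
    return proto.rsplit("/", 1)[-1], sent_by.strip(), params


def split_host_port(sent_by: str):
    """Separate host and port, allowing a bracketed IPv6 literal as host."""
    if sent_by.startswith("["):
        host, _, tail = sent_by[1:].partition("]")
        return host, tail.lstrip(":")
    host, _, port = sent_by.partition(":")
    return host, port


def response_target(via_values: list, own_sent_by: str):
    """Pop our own top Via and return (transport, host, port) of the next one."""
    if len(via_values) < 2 or parse_via(via_values[0])[1] != own_sent_by:
        raise ValueError("response does not belong to this load balancer")
    transport, sent_by, params = parse_via(via_values[1])
    host, port = split_host_port(sent_by)
    host = params.get("received", "").strip("[]") or host
    port = params.get("rport") or port or ("5061" if transport == "TLS" else "5060")
    return transport, host, int(port)


vias = ["SIP/2.0/UDP lb.example.net:5060;branch=z9hG4bKlb1",
        "SIP/2.0/UDP 10.0.0.12:5060;branch=z9hG4bKnode7"]
print(response_target(vias, "lb.example.net:5060"))
```

Nothing here requires transaction state: the routing decision is read entirely from the message, which is what lets the load balancer stay transaction stateless.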

In general one can implement a SIP load balancer in one of two ways:

  • Transparent: The existence of the load balancer is transparent to both the clients and servers. Clients send their traffic to the load balancer, which forwards the traffic to the servers without adding any SIP headers. The servers use the load balancer as a kind of router to send their responses back to the clients. The VIA and Record-Route headers in the SIP messages leaving the load balancer will include the IP address of the load balancer. This can be achieved either by convincing the nodes in the cluster to use the IP address of the load balancer when adding a VIA or Record-Route header, or by having the load balancer manipulate the messages leaving the cluster and replace the IP addresses included in the messages with its own address.
  • Non-Transparent: The load balancer acts as an outbound proxy that receives traffic from clients, then adds VIA and possibly RR headers and forwards the traffic to some server.

The transparent mode has the advantage that the addresses of the nodes in the cluster are hidden from the clients, which provides topology hiding. Also, when the servers in the cluster support NAT traversal, then in the case of symmetric NATs the clients expect incoming calls to be routed through the same SIP server that handles the registrations and outgoing calls of the client. With the non-transparent approach, the load balancer would have to deal with the NAT traversal aspect itself. With the transparent approach, the different servers in the cluster would each be responsible for a subset of the clients, which would keep the complexity of the load balancer low and its capacity high.

A major advantage of the non-transparent approach is that the load balancer acts as a SIP proxy and can, for example, reroute requests that are rejected by an overloaded server to another one.

First SIPNOC Event

May 3rd, 2011 by Jiri Kuthan under SIP

On April 25-27, SIPNOC, a new expert event organized by the SIP Forum, took place in Washington, D.C. The event was inspired by the North American Network Operators Group (NANOG) and was aimed at providers who want to exchange hands-on experience. Almost 150 attendees discussed a variety of operational technical topics. Here are a few topics that generated strong attention:

  • Monitoring networks and traffic analysis was a topic of major interest. SIP is enjoying massive deployments, in which seeking errors is like seeking the proverbial needle in a haystack.
  • Legacy features, such as fax, continue to be in demand even though they do not yet work to perfection. The SIP Forum is addressing these shortcomings in the FoIP Task Group.
  • The IETF is perceived as a standards body with limited attention to operational aspects of its architecture. In particular, weak identity schemes and the lack of a crisp migration path to IPv6 were criticized, even though it was fairly hard to find companies working on such a migration.
  • Security continues to be under-appreciated. For example, many service providers keep using fairly simple authentication based on source IP addresses. It seems it will take a publicized incident before attention is paid to this topic.
  • Many detailed operational observations were shared. For example, a presenter demonstrated a hard-to-find 10 percent bandwidth loss caused by the interaction of G.711 with HDLC. And guess what? A frequently appearing G.711 bytecode happens to be escape-coded in HDLC’s payload, resulting in additional bandwidth consumption.

The presentations are available on the SIP Forum Website. Stay tuned — the event will continue.

 

SIP Crossing Multiple Transports and Address Families

April 26th, 2011 by Robert Sparks under SIP

At the SIPit earlier this month, the multiparty testing uncovered a few implementations that assumed that, since they only used SIP over IPv4 with protocols like TCP and UDP, they would never encounter IPv6 addresses or tokens related to other transports like SCTP or DTLS in the SIP messages they receive. That assumption is becoming less and less safe as more advanced transports are deployed and as we go further into the IPv4 to IPv6 transition.

During rendezvous, a SIP request may traverse several SIP proxies. The RFC 3263 rules for locating the server at each hop can result in different transport protocols.

For example:

[Figure 1: diagram of a SIP request traversing several proxies over different transports]

The Via stack and the Record-Route stack that are created as the message is forwarded will contain tokens and addresses from each of the traversed networks. Here’s what the Via and Record-Route header fields from the last message in the diagram might look like:

Via: SIP/2.0/UDP 192.0.2.1;received=198.51.100.15;rport=55021;branch=z9hG4bK8h23i5
Via: SIP/2.0/TLS proxy1.example.net;received=[2001:db8::2:1];branch=z9hG4bK3udn3z13
Via: SIP/2.0/SCTP proxy2.example.net;received=198.51.100.18;branch=z9hG4bK23sd093n42
Via: SIP/2.0/UDP proxy3.example.net;branch=z9hG4bK3s09ujnsnnen3
Record-Route: <sip:proxy1udp.example.net;lr>
Record-Route: <sips:proxy1tls.example.net;lr>
Record-Route: <sips:proxy2tls.example.net;lr>
Record-Route: <sip:proxy2sctp.example.net;lr>
Record-Route: <sip:proxy3sctp.example.net;lr>
Record-Route: <sip:proxy3udp.example.net;lr>

Note the double-record-route technique used by the proxies. For more information, see RFC 5658.

Now, these endpoints don’t need to be able to parse these tokens and addresses, at least not to the level of understanding their internal structure. They just need to be able to preserve them, following the requirements in the standards for including them in subsequent messages. This is where a few implementations ran into trouble, with errors from failed parses of information they didn’t need in the first place. For instance, some IPv4-only endpoints assumed they would never see IPv6 addresses in the messages they received, so they dropped the incoming message above as malformed when they came across the [2001:db8::2:1] address in the second Via header field value. These implementations didn’t really need to parse the value at all. They only needed to reflect it, unchanged, in any response they sent to the request. Following Postel’s Maxim (be conservative in what you send and liberal in what you accept) in this case would result in higher levels of interoperability.
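As a rough illustration of "preserve, don't parse", here is a minimal Python sketch. The function names (copy_via_headers, via_transport, build_response_headers) are hypothetical and not taken from any SIP stack. The sketch assumes one Via value per header line and the compact "SIP/2.0/UDP" form of the protocol token. It treats everything after the transport token as opaque text, so an IPv6 literal or an unfamiliar transport cannot cause a parse failure.

```python
def copy_via_headers(request_lines):
    """Return the Via header lines of a request, unchanged and in order."""
    return [
        line for line in request_lines
        if line.split(":", 1)[0].strip().lower() in ("via", "v")
    ]


def via_transport(via_line):
    """Extract only the transport token; the address is never interpreted."""
    value = via_line.split(":", 1)[1].strip()
    sent_protocol = value.split(None, 1)[0]   # e.g. "SIP/2.0/TLS"
    return sent_protocol.rsplit("/", 1)[1]    # e.g. "TLS"


def build_response_headers(request_lines, status_line="SIP/2.0 200 OK"):
    """Start a response by reflecting every Via value exactly as received."""
    return [status_line] + copy_via_headers(request_lines)


request = [
    "Via: SIP/2.0/UDP 192.0.2.1;received=198.51.100.15;rport=55021;branch=z9hG4bK8h23i5",
    "Via: SIP/2.0/TLS proxy1.example.net;received=[2001:db8::2:1];branch=z9hG4bK3udn3z13",
    "Via: SIP/2.0/SCTP proxy2.example.net;received=198.51.100.18;branch=z9hG4bK23sd093n42",
    "Via: SIP/2.0/UDP proxy3.example.net;branch=z9hG4bK3s09ujnsnnen3",
    "Record-Route: <sip:proxy1udp.example.net;lr>",
]
response = build_response_headers(request)
```

An IPv4-only endpoint written this way never looks inside the received parameter, so the IPv6 literal passes through untouched.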

An earlier article went deeper into how RFC 3263’s requirements provide for switching between transports like UDP, TCP, and SCTP at each hop. One thing that RFC doesn’t specify well is when to use IPv6 or IPv4 when both are available. As IPv6 becomes more available, dual-stack hosts (those able to use IPv6 and IPv4 at the same time) will encounter usability conditions that change over time. As a new IPv6 path is formed between endpoints, configuration and optimization changes may lead to IPv6 significantly outperforming IPv4 early in a given day, but not working as well (if at all) later that afternoon. How does a host choose which address family to use when it needs a new connection for a request?

Work is ongoing in the IETF to provide an algorithm to answer that question. The current working document focuses on TCP and uses a weighting parameter, P, that affects the order in which the different address families are tried and how long to delay before trying the other family.

When P is 0, both families are tried simultaneously, each starting with the A or AAAA DNS lookup, followed by a TCP connection attempt. Whichever connects successfully first causes P to be adjusted in the direction that favors the quicker family. Positive values of P favor IPv6; negative values favor IPv4. The size of the adjustment depends on several factors, including whether the quickest family was the one currently favored.

[Figure 2]

When P is non-zero, the attempt to use the second family is delayed by 10*abs(P) milliseconds.

In all cases, when the attempts using both families succeed, the connection that was established first is kept. The subsequent connection is reset.
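The timing rules above can be sketched in a few lines of Python. This is only an illustration of the behavior as described here, not the working draft's algorithm. The names attempt_schedule and winner are hypothetical, and the sketch assumes that the "second family" is the one not favored by the sign of P. How P is adjusted afterwards is left out, since the size of the adjustment is not given here.

```python
def attempt_schedule(p):
    """Return {family: start delay in ms} for weighting parameter p.

    Assumption: the favored family starts at 0 ms and the other
    family is delayed by 10 * abs(p) ms.
    """
    delay = 10 * abs(p)
    if p > 0:                       # positive P favors IPv6
        return {"IPv6": 0, "IPv4": delay}
    if p < 0:                       # negative P favors IPv4
        return {"IPv4": 0, "IPv6": delay}
    return {"IPv6": 0, "IPv4": 0}   # P == 0: try both at once


def winner(p, connect_ms):
    """Pick the family whose connection is established first.

    connect_ms maps each family to the time (ms) its lookup plus TCP
    connect takes from its own start, or None if the attempt fails.
    The other connection, if it also succeeds, would be reset.
    """
    schedule = attempt_schedule(p)
    finish = {
        family: schedule[family] + took
        for family, took in connect_ms.items()
        if took is not None
    }
    if not finish:
        return None
    return min(finish, key=finish.get)
```

For example, with P = 5 the IPv4 attempt starts 50 ms late, so an IPv6 path that needs 40 ms still beats an IPv4 path that needs only 20 ms, while an IPv6 path that needs 80 ms loses to it.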

These ideas are still being refined; the current working draft anticipates that the algorithm will be adjusted before it is proposed as a standard. For instance, there may need to be more clarity around which address family to issue the DNS queries over. There is also work proposed to specify what to do when the choices are richer than TCP over IPv6 vs. TCP over IPv4.
