Posts Tagged ‘SIP Protocol’

Multiparty Communications in SIP: A Brief History

October 5th, 2010 by Adam Roach under SIP

A lot of the industry focus in SIP pertains to two-party interactions: a calling party and a called party; an instant message sender and an instant message recipient; and so on. But in the IETF, we’ve actually done quite a bit of work to facilitate group communications.

In fact, the SIP effort originated in the IETF’s Multiparty Multimedia Session Control (MMUSIC) working group. This work traces its roots all the way back to 1992, when the first Conferencing Control (CONFCTRL) BOF met at IETF 25. That BOF was itself spawned from the Remote Conferencing Architecture (REMCONF) work, which had started shortly before.

Despite this long history of conference-oriented working groups in the IETF, a lot of the thought around how to facilitate communications with three or more parties in SIP didn’t start in earnest until a decade later, with the introduction of a conferencing framework that would eventually be published as RFC 4353. This document spurred some work on preliminary conference control within the SIP protocol itself, using existing SIP tools such as REFER (RFC 3515), “Join” (RFC 3911), and “Replaces” (RFC 3891) to control conference servers. This SIP-based conference control is published as RFC 4579. Its companion document, RFC 4575, defines a means for SIP clients to learn certain advanced facts about the state of an ongoing conference, such as the list of participants and the types of media in use in the conference.

At a very high level, with SIP conference control, conferences spring into existence when users send an INVITE either to a URI that corresponds to a predefined list of users, or to a special “factory” URI that creates a new ad-hoc conference. The conference creator can then send SIP “REFER” requests to the conference to add users to it or remove users from it.
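As a rough illustration of the REFER-based control described above, the sketch below builds the Refer-To header a conference creator might send to the conference. It assumes the RFC 4579 convention that a plain Refer-To URI asks the conference to invite that user, while a `method=BYE` URI parameter asks it to drop them. The function name and the example URIs are hypothetical, and a real REFER needs many more headers than shown here.

```python
def refer_to_header(participant_uri: str, remove: bool = False) -> str:
    """Build the Refer-To header for a REFER sent to a conference URI.

    Assumption: a bare URI means "invite this user" (INVITE is the default
    method for Refer-To), and ";method=BYE" means "remove this user".
    """
    target = participant_uri
    if remove:
        target += ";method=BYE"
    return f"Refer-To: <{target}>"


def refer_request_lines(conference_uri: str, participant_uri: str,
                        remove: bool = False) -> list[str]:
    """Return only the request line and Refer-To header (not a full message)."""
    return [
        f"REFER {conference_uri} SIP/2.0",
        refer_to_header(participant_uri, remove),
    ]


if __name__ == "__main__":
    # Hypothetical conference and participant URIs.
    for line in refer_request_lines("sip:conf42@example.com",
                                    "sip:bob@example.com", remove=True):
        print(line)
```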

As work on the SIP conference control progressed, two facts became clear: first, the effort to develop a comprehensive conference control protocol could easily become far too large for the SIP working group to tackle while doing other work; and second, the use of SIP to provide more advanced conference control was an increasingly ill fit. As a consequence, the IETF formed the Centralized Conferencing (XCON) working group in 2003. XCON was chartered with creating a new protocol for the purpose of creating and controlling multi-party conferences.

Although initial interest in the XCON work was high, it took considerable time to rationalize the various conflicting approaches that were proposed into a unified system. Some proposals strove for simplicity, while others wanted to define arbitrarily complex systems for describing media mixing and video panel layouts. Some wanted syntactic manipulation of documents representing conference state, while others wanted semantic operations on objects representing conference participants and conferences themselves. In hindsight, this level of conflict isn’t surprising; the problem was identified as early as 1993 in the CONFCTRL notes from IETF 26: “It is difficult to design a CONFCTRL protocol that balances simplicity with a high degree of semantic flexibility, e.g., Jack Jansen concluded that different conferencing styles require entirely separate CONFCTRL protocols.”

While XCON plugged away at its chartered work, the SIP working group (and related groups like SIPPING and SIMPLE) moved forward with several extensions that relate to multiparty communications. RFC 4662 defined a mechanism for subscribing to resource state for several resources at the same time (e.g., to learn presence information for a list of friends all at once). RFC 4825 (XCAP) and RFC 4826 (the resource list XML format) defined the means to create, manipulate, and delete the members of a list, providing users the ability to dynamically change the users that a conferencing URI corresponds to.

Moving beyond these long-lived lists, later work within SIP allowed users to actually send the list of relevant URIs in the request itself, using a framework known as “URI-List Services” (RFC 5363). This framework, along with RFC 5364, defines the syntax for conveying lists of URIs in SIP message bodies (using multi-part MIME bodies) and for tagging copy control attributes (equivalent to “To,” “Cc,” and “Bcc” in email). The framework has been defined for operation with MESSAGE (RFC 5365), INVITE (RFC 5366), SUBSCRIBE (RFC 5367), and REFER (RFC 5368) so far.
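To make the URI-list body a little more concrete, here is a minimal sketch that builds a resource-list document with copy control attributes. It assumes the resource-lists namespace from RFC 4826 and the `copyControl` attribute from RFC 5364. The function name and the recipient addresses are made up for illustration, and the multipart MIME wrapping that carries the list inside a SIP request is not shown.

```python
import xml.etree.ElementTree as ET

# Namespaces assumed from RFC 4826 (resource lists) and RFC 5364 (copy control).
RL_NS = "urn:ietf:params:xml:ns:resource-lists"
CP_NS = "urn:ietf:params:xml:ns:copycontrol"


def build_uri_list(recipients):
    """Return a resource-lists XML string for (uri, role) pairs.

    role is one of "to", "cc", or "bcc", mirroring the email headers.
    """
    ET.register_namespace("", RL_NS)
    ET.register_namespace("cp", CP_NS)
    root = ET.Element(f"{{{RL_NS}}}resource-lists")
    uri_list = ET.SubElement(root, f"{{{RL_NS}}}list")
    for uri, role in recipients:
        if role not in ("to", "cc", "bcc"):
            raise ValueError(f"unknown copy control value: {role!r}")
        entry = ET.SubElement(uri_list, f"{{{RL_NS}}}entry")
        entry.set("uri", uri)
        entry.set(f"{{{CP_NS}}}copyControl", role)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical recipients.
    print(build_uri_list([
        ("sip:bill@example.com", "to"),
        ("sip:joe@example.org", "cc"),
        ("sip:ted@example.net", "bcc"),
    ]))
```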

URI-list services for MESSAGE allow users to send a single instant message to a special “message exploder” URI, and have that exploder copy the message to all the users listed in the URI list. The INVITE and REFER extensions allow users to apply an action to multiple conference participants when using RFC 4579 mechanisms. And the SUBSCRIBE extensions allow users to subscribe to the presence state for several users at once, without first creating the list of users with XCAP.

Of course, with the ability to send instant messages to many users at once, or to make many phones ring at the same time, comes the potential for abuse. To mitigate this, the URI-list services were published in conjunction with a consent framework (RFCs 5360, 5361, and 5362). Effectively, these consent protocols allow server operators to provide an opt-in experience for users named in URI-list services requests.

Meanwhile, XCON has been making steady and solid progress, and has finally sent its key deliverable – the conference control protocol itself – to the IESG for evaluation and publication as an RFC. At the same time, the SIP instant messaging and presence working group (SIMPLE) is nearing completion on a document that defines specific behavior for text-chat-room conferences. I expect both of these to reach RFC status some time in 2011.

However, even as this work winds down, new work is spinning up in the IETF for controlling some additional media-related aspects of conferences. Specifically, an as-yet unnamed working group is in the process of being chartered for the full-immersion conferences commonly referred to as “telepresence.” The general idea of the work to be taken on is described in the teleconference use case document, with specific proposed deliverables defined in the currently proposed working group charter.

FAQ: Why do SIP proxies sometimes absorb responses?

June 29th, 2010 by Robert Sparks under SIP

SIP proxies play a key role in realizing SIP’s rendezvous service – helping entities that want to communicate find each other by forwarding SIP requests to the places they can best be served.

A proxy can try the request at more than one place, either one at a time, a few at a time, or all at once. This behavior is called forking.

A request may be forked by more than one proxy on the way to its destination.

Each of the places receiving the request is supposed to generate a response. The proxy is responsible for choosing the “best” response to forward back to the requester. Except for one special case, the proxy will only return one final response to the request it receives. The final responses from the other branches of the fork are dropped at the proxy. RFC 3261 section 16.7 discusses how the proxy chooses the best response.
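The selection rule can be sketched in a few lines. This is only my reading of RFC 3261 section 16.7, simplified: prefer any 6xx response, otherwise take the lowest response class present, and within the 4xx class prefer codes that tell the requester how to resubmit (401, 407, 415, 420, 484). The function name is hypothetical, 2xx responses are handled separately and are out of scope here, and a real proxy also aggregates headers such as authentication challenges.

```python
# 4xx codes that carry information affecting resubmission of the request.
RESUBMISSION_HINTS = (401, 407, 415, 420, 484)


def choose_best_response(final_codes):
    """Pick the single non-2xx final status code a forking proxy forwards.

    final_codes: status codes (300-699) collected from all fork branches.
    """
    candidates = [code for code in final_codes if 300 <= code <= 699]
    if not candidates:
        raise ValueError("no non-2xx final responses to choose from")

    # Any 6xx (global failure) wins outright.
    global_failures = [code for code in candidates if code >= 600]
    if global_failures:
        return global_failures[0]

    # Otherwise choose from the lowest response class present.
    lowest_class = min(code // 100 for code in candidates)
    in_class = [code for code in candidates if code // 100 == lowest_class]

    # Within 4xx, prefer responses that help the requester try again.
    if lowest_class == 4:
        for code in in_class:
            if code in RESUBMISSION_HINTS:
                return code
    return in_class[0]


if __name__ == "__main__":
    print(choose_best_response([486, 480, 603]))
    print(choose_best_response([486, 404, 302]))
    print(choose_best_response([486, 401, 500]))
```

Every code not returned by this function is exactly what gets absorbed at the proxy.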

SIP proxies were originally designed this way to allow the endpoints to have the same behavior whether there were proxies in the path of a request or not. (Other design decisions forced different behavior at the endpoints anyhow.) Dropping the other responses was a tradeoff, and it introduced a problem known in the specifications as HERFP (the Heterogeneous Error Response Forking Problem). When there is a mix of error responses from the various fork branches, only one is returned to the requester, but that requester might have been able to do something useful with the other responses. In the example above, the rightmost phone’s response could have included information about when to try the request again. The original requester would have learned that a future call there had a reasonable probability of being accepted.

Though the condition is known as HERFP, it also applies to non-error responses for non-INVITE requests like REGISTER or SUBSCRIBE. For any non-INVITE request, a proxy will only return one final response, whether it’s a success or error response. This is why the SIP Events extension to SIP requires that elements accepting a subscription MUST send an immediate NOTIFY.

There is one special case where a proxy might return more than one final response to a request. When a proxy sees a 200 OK to an INVITE, it is required to forward that to the requester, even if it has already forwarded a 200 OK from another branch. This exception was added to the protocol to allow the calling user to choose which person to talk to if more than one endpoint answers a call. Other protocol rules try to make this condition unlikely. When the proxy sees the first 200 OK to the INVITE, it will send a CANCEL request to all the other branches. A second 200 OK could only be received from one of those branches if it crossed that CANCEL on the wire (or in the processing at the endpoint). Unfortunately, it’s not hard to encounter that race condition in practice. It’s up to the endpoint to decide what to do if it finds itself in multiple calls after sending an INVITE. Many deployed endpoints today send an immediate BYE to peers beyond the first accepting their call.
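The race is easy to see in a toy model. The sketch below is not real proxy code: it only tracks which branches are still pending, which were sent a CANCEL, and which responses went upstream. The class and branch names are invented for illustration, and non-2xx final responses are simply held back rather than run through best-response selection.

```python
class ForkedInvite:
    """Toy model of a proxy's state for one forked INVITE."""

    def __init__(self, branches):
        self.pending = set(branches)   # branches with no final response yet
        self.cancelled = []            # branches that were sent a CANCEL
        self.forwarded = []            # (branch, status) sent to the requester
        self.held = []                 # non-2xx finals awaiting selection
        self.seen_2xx = False

    def on_final_response(self, branch, status):
        self.pending.discard(branch)
        if 200 <= status < 300:
            # Every 2xx to an INVITE is forwarded, even a second one.
            self.forwarded.append((branch, status))
            if not self.seen_2xx:
                self.seen_2xx = True
                # First 2xx: CANCEL every branch that has not answered yet.
                self.cancelled.extend(sorted(self.pending))
        else:
            self.held.append((branch, status))


if __name__ == "__main__":
    call = ForkedInvite(["desk-phone", "mobile"])
    call.on_final_response("desk-phone", 200)  # first answer, CANCEL goes out
    call.on_final_response("mobile", 200)      # answer crossed the CANCEL
    print(call.cancelled)
    print(call.forwarded)
```

In this run the mobile branch is cancelled and still produces a forwarded 200 OK, which leaves the calling endpoint in two calls at once.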

There have been a few proposals in the past for changing or extending the protocol to avoid HERFP, allowing the endpoint to learn about all the final responses that currently get absorbed at proxies. None of them achieved consensus. So far, it’s an open problem.

Can SIP be a successful protocol?

January 19th, 2010 by Dorgham Sisalem under SIP

Some time ago my colleague Jiri Kuthan recommended that I read RFC 5218. In it the authors discuss what makes protocols succeed or fail. A successful protocol is defined as one that meets its design goals and is widely deployed. The authors present some factors which they believe to be crucial for the success of a protocol and present some use cases in which they apply these factors to some successful and failed protocols. Among these factors the authors list the design, extensibility and openness of networking protocols.

While reading the RFC I started thinking about what the result would be of applying these factors to SIP:

Initial success factors: These are the factors that help a protocol become successful in the initial phase of its deployment:

  • Positive net value: SIP obviously solves a problem; namely that of establishing a session in IP networks. While SIP bears the promise of enabling all kinds of sessions it is mostly used for establishing voice calls. In this context it does not offer more functionality than traditional SS7 signaling, H.323 or Skype. The real positive net value of SIP is hence demonstrated when operators start deploying more SIP-based services such as presence and application servers that offer more flexible and intelligent communication services than we have today.
  • Incremental deployment: SIP can be deployed without having to update the network routers. However, unlike the arguably most successful Internet protocol, HTTP, it is not sufficient to provide a server and a client. For a communication service to be of use there must be a lot of clients and users available. While there are already different providers offering VoIP services using SIP with millions of users, these providers act as islands that are connected over the PSTN. Hence, in order for SIP to excel on this point, more SIP-based peering between providers is needed.
  • Open code availability: There are already different open source components needed for a SIP service. The SIP Express Router is an excellent and widely used SIP proxy. Asterisk and SEMS offer flexible and easy to use media services such as IVR or conferencing. On the user agent side, there are also different implementations of different quality.
  • Restriction free: SIP is provided as a patent-free technology for all.
  • Open specifications: The SIP specifications are provided by IETF and are open.
  • Open maintenance: SIP is maintained by the IETF and is extended and fixed continuously. While this is surely a good thing, this has also led to a load of specifications that some might claim are too much.
  • Good technical design: While SIP was hailed at the beginning as the simpler alternative to H.323, it has gained a lot of weight over the years. Taking the same comparison factors used in RFC 5218 – namely security and congestion control – SIP does not seem so perfect, as congestion control is not considered and it does not have a powerful concept for identity management. Also, solutions to deployment issues such as NAT traversal were only added at later stages.

Wild success factors: These are the factors that contribute to success and wide deployment:

  • Extensible: While designed in the early stage for simple calls, SIP is now used for multi-party calls, presence and trunking scenarios. Also, the integration of new applications and services should be rather straightforward as SIP is not restricted to a certain usage scenario.
  • Scalability: While we still do not have any experience regarding the cost and complexity of building a SIP infrastructure for hundreds of millions of users, I do not see a real reason why this could not be done.
  • Security: SIP has different mechanisms for authenticating users and protecting the signaling traffic. However, it does not have explicit mechanisms for protection against DoS attacks or fraud.

Discussion

Looking at the points above it looks like SIP has more or less a positive result on the discussed factors. However, getting positive marks on the evaluation factors does not mean that a protocol will be a success. If we evaluate Skype based on these parameters then we should conclude that Skype should fail. There is no open source code or open specifications and the net value is not much higher than PSTN or SIP. However, the number of users of Skype is higher than that of SIP.

So does this mean that SIP will become a wild success? Well, I guess the answer is a very definite maybe! The success or failure of a protocol can only be judged 5 to 10 years after finishing the standardization – so we still have a few years in front of us. But it has the needed success factors, and with more applications, peering relations and clearer business models, the chances that SIP will be wildly successful are pretty good.

Why there isn’t a successful SIP certification program

December 1st, 2009 by Robert Sparks under SIP

Over the last several years, I’ve had many conversations about building a certification program for SIP, including trying to define a few. All of those conversations have ended either in frustration or the conclusion that such a certification program is not the right thing to build.

The proponents of such a program came in each time with a lot of energy and excitement. The conversations got tough when we looked closely at what the program would actually certify. What do you test? What do you require a passing system to do? Each time, it turned out that the proponents really had a single, focused use of SIP in mind (usually simple telephony replacement). The motivation statements tended to look like “I want to buy a phone from and have it work with my service”. The participants quickly became mired in arguments about what the tests needed to ensure that should cover. They discovered that they really wanted to test for a lot of end-user visible behavior that the SIP specification itself leaves undefined.

As we tried to work further through the details, we’d frequently see arguments to profile the protocol. In very early conversations, there was pressure not to require (or even to penalize) the implementation of SIP over TCP or the use of the 100rel/PRACK extension, usually driven by a “nobody really does that” argument. It’s worth noting that both of those are required in many deployments today. The folks focusing on simple telephony didn’t want to be burdened with testing the parts of the protocol needed for presence, and vice versa. The people oriented toward business telephony had an entirely different idea of what a program should look like than the folks oriented toward single-line replacement.

In short, what people really wanted was a certification program for their particular application, not for the protocol itself. Unfortunately, at the time, I don’t think anyone involved realized that was the root of why such programs weren’t coming together.

With that realization, I’ve become even more convinced that a generic SIP certification program isn’t feasible – it wouldn’t produce a useful tool for making our ecosystem(s) better. The energy would be better focused on how the protocol is used rather than trying to certify implementation of the protocol itself.

There are a few new programs under discussion now, in the SIP Forum and other organizations, which are trying the approach of defining certification programs for an application. Those conversations seem to be going further than the earlier attempts, and I think some of them have a chance of succeeding.
