Multiparty Communications in SIP: A Brief History
A lot of the industry focus in SIP pertains to two-party interactions: a calling party and a called party; an instant message sender and an instant message recipient; and so on. But in the IETF, we’ve actually done quite a bit of work to facilitate group communications.
In fact, the SIP effort originated in the IETF’s Multiparty Multimedia Session Control (MMUSIC) working group. This work traces its roots all the way back to 1992, when the first Conferencing Control (CONFCTRL) BOF met at IETF 25. That BOF was itself an outgrowth of the Remote Conferencing Architecture (REMCONF) work, which had started shortly before.
Despite this long history of conference-oriented working groups in the IETF, serious thinking about how to support communications among three or more parties in SIP didn’t start until about a decade later, with the introduction of a conferencing framework that was eventually published as RFC 4353. That document spurred preliminary work on conference control within SIP itself, using existing SIP tools such as REFER (RFC 3515), “Join” (RFC 3911), and “Replaces” (RFC 3891) to control conference servers. This SIP-based conference control was published as RFC 4579. Its companion document, RFC 4575, defines a conference event package that lets SIP clients learn about the state of an ongoing conference, such as the list of participants and the types of media in use.
At a very high level, SIP conference control works like this: a conference springs into existence when a user sends an INVITE either to a URI that corresponds to a predefined list of users or to a special “factory” URI that creates a new ad hoc conference. The conference creator can then send SIP REFER requests to the conference to add users to it or remove them from it.
As work on SIP conference control progressed, two things became clear. First, developing a comprehensive conference control protocol could easily grow far too large for the SIP working group to tackle alongside its other work. Second, SIP itself was an increasingly poor fit for more advanced conference control. As a consequence, the IETF formed the Centralized Conferencing (XCON) working group in 2003, chartered to develop a new protocol for creating and controlling multiparty conferences.
Although initial interest in the XCON work was high, it took considerable time to reconcile the many conflicting proposals into a unified system. Some proposals strove for simplicity, while others wanted to define arbitrarily complex systems for describing media mixing and video panel layouts. Some wanted syntactic manipulation of documents representing conference state, while others wanted semantic operations on objects representing conference participants and conferences themselves. In hindsight, this level of conflict isn’t surprising; the problem was identified as early as 1993 in the CONFCTRL notes from IETF 26: “It is difficult to design a CONFCTRL protocol that balances simplicity with a high degree of semantic flexibility, e.g., Jack Jansen concluded that different conferencing styles require entirely separate CONFCTRL protocols.”
While XCON plugged away at its chartered work, the SIP working group (and related groups such as SIPPING and SIMPLE) moved forward with several extensions related to multiparty communications. RFC 4662 defined a mechanism for subscribing to the state of several resources with a single subscription (e.g., to learn presence information for a whole list of friends at once). RFC 4825 (XCAP) and RFC 4826 (the resource list XML format) defined a means to create, modify, and delete lists and their members, giving users the ability to dynamically change the set of users to which a conferencing URI corresponds.
Moving beyond these long-lived lists, later work within SIP allowed users to send the list of relevant URIs in the request itself, using a framework known as “URI-List Services” (RFC 5363). This framework, along with RFC 5364, defines the syntax for conveying lists of URIs in SIP message bodies (as part of multipart MIME bodies) and for tagging each URI with a copy control attribute (equivalent to “To,” “Cc,” and “Bcc” in email). So far, the framework has been defined for use with MESSAGE (RFC 5365), INVITE (RFC 5366), SUBSCRIBE (RFC 5367), and REFER (RFC 5368).
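To make the URI-list syntax more concrete, here is a minimal Python sketch that builds the XML list a client might carry in such a request. It assumes the resource-list format from RFC 4826 extended with the copyControl attribute from RFC 5364; the namespace URIs reflect my reading of those RFCs and should be checked against them, and the function name build_recipient_list is purely hypothetical. In an actual SIP request, this document would travel as one part of a multipart MIME body, labeled (as I understand RFC 5363) with Content-Type application/resource-lists+xml and Content-Disposition recipient-list.

```python
import xml.etree.ElementTree as ET

# Assumed namespaces: resource lists (RFC 4826) and copy control (RFC 5364).
RL_NS = "urn:ietf:params:xml:ns:resource-lists"
CP_NS = "urn:ietf:params:xml:ns:copycontrol"

# Serialize resource-lists as the default namespace and copy control as "cp:".
# Note: register_namespace() changes ElementTree's global prefix map.
ET.register_namespace("", RL_NS)
ET.register_namespace("cp", CP_NS)

# The email-like copy control roles described in the text.
COPY_CONTROL_VALUES = ("to", "cc", "bcc")


def build_recipient_list(recipients):
    """Build a resource-list XML document for a URI-list service request.

    recipients: iterable of (uri, copy_control) pairs, where copy_control
    is "to", "cc", or "bcc". Hypothetical helper, not a standard API.
    """
    root = ET.Element(f"{{{RL_NS}}}resource-lists")
    entries = ET.SubElement(root, f"{{{RL_NS}}}list")
    for uri, copy_control in recipients:
        if copy_control not in COPY_CONTROL_VALUES:
            raise ValueError(f"unsupported copyControl value: {copy_control!r}")
        # One <entry> per recipient, tagged with its copy control role.
        ET.SubElement(
            entries,
            f"{{{RL_NS}}}entry",
            {"uri": uri, f"{{{CP_NS}}}copyControl": copy_control},
        )
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_recipient_list([
        ("sip:alice@example.com", "to"),
        ("sip:bob@example.org", "cc"),
        ("sip:carol@example.net", "bcc"),
    ]))
```

The roles mirror email: as I understand RFC 5364, entries marked bcc are meant to be hidden from the other recipients, while to and cc entries may be disclosed to them.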
URI-list services for MESSAGE allow users to send a single instant message to a special “message exploder” URI and have the exploder copy the message to every user in the URI list. The INVITE and REFER extensions let users apply an action to multiple conference participants when using RFC 4579 mechanisms. And the SUBSCRIBE extension lets users subscribe to the presence state of several users at once, without first creating the list of users with XCAP.
Of course, with the ability to send instant messages to many users at once, or to make many phones ring at the same time, comes the potential for abuse. To mitigate this, the URI-list services were published in conjunction with a consent framework (RFCs 5360, 5361, and 5362). Effectively, these consent protocols allow server operators to provide an opt-in experience for users named in URI-list service requests.
Meanwhile, XCON has been making steady progress and has finally sent its key deliverable, the conference control protocol itself, to the IESG for evaluation and publication as an RFC. At the same time, the SIP instant messaging and presence working group (SIMPLE) is nearing completion of a document that defines specific behavior for text chat room conferences. I expect both of these to reach RFC status sometime in 2011.
However, even as this work winds down, new work is spinning up in the IETF to control additional media-related aspects of conferences. Specifically, an as-yet unnamed working group is being chartered for full-immersion conferencing, commonly referred to as “telepresence.” The general scope of the work is described in the teleconference use case document, and the specific deliverables are listed in the proposed working group charter.

