SIP-I and SIP-T Challenge: SIP Forking
This post continues the
series on SIP-I and SIP-T deployment challenges. You may wish to read
the Introduction to SIP-I and SIP-T post for some general background on these two
protocols before continuing.
One of the most powerful
features built into the core of the SIP protocol is called “forking.” Forking
allows any SIP proxy to send an inbound request – such as an INVITE request –
to more than one destination. It can send these requests all at once,
sequentially, in groups, or in any arbitrary combination of those options.
This feature allows the
implementation of services such as “find me, follow me,” parallel ringing,
delivery of instant messages to multiple devices, and several other interesting
capabilities.
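To make the parallel, sequential, and grouped options concrete, here is a minimal sketch of how a proxy might plan its forking. It assumes the targets carry Contact q-values and follows the common ordering described in RFC 3261: highest q-value first, with equal q-values tried in parallel. The function name `plan_forking` and the example URIs are hypothetical, and a real proxy is free to use an entirely different policy.

```python
from itertools import groupby


def plan_forking(targets):
    """Turn (uri, q) pairs into an ordered list of stages.

    Stages are tried one after another (sequential forking); every URI
    inside a single stage is tried at the same time (parallel forking).
    """
    # sorted() is stable, so targets with equal q keep their input order.
    ordered = sorted(targets, key=lambda t: t[1], reverse=True)
    return [
        [uri for uri, _ in group]
        for _, group in groupby(ordered, key=lambda t: t[1])
    ]


# Hypothetical registrations for a single user (illustrative only).
targets = [
    ("sip:alice@desk.example.com", 1.0),
    ("sip:alice@mobile.example.com", 1.0),
    ("sip:alice@voicemail.example.com", 0.1),
]
stages = plan_forking(targets)
for number, stage in enumerate(stages, start=1):
    print(f"stage {number}: {stage}")
```

With these inputs, the desk and mobile phones ring together in the first stage, and voicemail is tried only if that stage fails.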
When SIP forking occurs
during session establishment, the INVITE messages involved in setting up the
call actually travel all the way to the called party’s devices, and establish a
protocol relationship directly between the calling device and the called
devices.
The reason this was built
into the core of the SIP protocol is that, unlike many other technologies used for
real-time communication, SIP inherently supports the concept of having a single
user potentially available via several devices simultaneously. Callers are
generally interested in contacting a user, not a device – so, to support
mapping from one user to several devices, we decided to inherently provide functionality
for contacting several devices.
While it is immensely
useful, SIP forking has proven to be one of the most difficult challenges we
face when developing SIP protocol extensions in general. SIP-I and SIP-T are no
exception: forking causes problems for both signaling and audio.
The signaling problem
arises from the fact that ISUP and BICC have no inherent protocol behavior that
is analogous to SIP forking. Implementation of parallel ringing services in an
ISUP network requires termination of the call at an application server, which
re-initiates the call towards the various target devices. So, for example, if a
parallel ringing call alerts three devices, there are four ISUP calls involved:
one from the caller to the application server, and one from the application
server to each of the three devices. There is no direct relationship, from an
ISUP perspective, between the caller and the devices.
Consider the case in which
a SIP-I or SIP-T call arrives at an ingress gateway, and is forked by a
SIP proxy to two different egress gateways. The messaging looks something like
this (I’ve omitted PRACK transactions for the sake of clarity):

The INVITE messages sent
to the two egress gateways each contain the IAM message that started the call,
and each egress gateway sends that IAM on to its called end office. (Note that
the egress gateways will adjust the called party number in the IAM according to
the SIP URI in the INVITE, so it will end up indicating the two different
devices the call is being sent to.)
Assuming that both of the
called devices are available, both egress gateways will receive ACM messages from the called end offices,
which get mapped into SIP “180 Ringing” messages. Both of these messages arrive
at the ingress gateway. The first one that arrives – message 5 in the above diagram
– will have its ACM extracted by the ingress gateway, and sent back towards the
calling party. However, the gateway must be careful not to send the second ACM
(from message 7) into the ISUP network: doing so would be a protocol error,
which would cause the calling end office to tear down the call.
Depending on how much ISUP
signaling occurs prior to the called party answering, there may be several
tunneled ISUP messages that arrive at the ingress gateway while the SIP forking
is still active. The ingress gateway is responsible for taking the two
different streams of ISUP messages and converting them into a coherent set of
messages for the calling party’s end office. This can be tricky to get right,
and any errors will cause the call to fail.
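One possible way for an ingress gateway to collapse the forked ISUP streams into a single coherent one is sketched below. This is only one policy among many, not something either specification prescribes: it passes on the first ACM, follows that branch until some branch answers, and then follows only the answering branch. The class name `IngressIsupFilter`, the branch labels, and the reduction of ISUP messages to simple type strings are all assumptions made for illustration. Corner cases, such as an answer that arrives before any ACM, are ignored.

```python
class IngressIsupFilter:
    """Decide which tunneled ISUP messages reach the calling end office."""

    def __init__(self):
        self.followed_branch = None  # branch whose ACM was passed on
        self.answered_branch = None  # branch that answered the call

    def should_forward(self, branch, isup_type):
        """Return True if this tunneled message may enter the ISUP network.

        branch    -- hypothetical label for the forked leg (e.g. its To tag)
        isup_type -- ISUP message type as a string, such as "ACM" or "ANM"
        """
        if self.answered_branch is not None:
            # After answer, only the answering branch is relevant.
            return branch == self.answered_branch
        if isup_type == "ANM":
            self.answered_branch = branch
            return True
        if isup_type == "ACM":
            if self.followed_branch is None:
                self.followed_branch = branch
                return True
            return False  # a second ACM would be a protocol error
        # Other pre-answer messages: pass only those from the followed branch.
        return branch == self.followed_branch


f = IngressIsupFilter()
decisions = [
    f.should_forward("gw1", "ACM"),  # first ACM is passed on
    f.should_forward("gw2", "ACM"),  # second ACM is suppressed
    f.should_forward("gw2", "CPG"),  # not the followed branch
    f.should_forward("gw2", "ANM"),  # gw2 answers first
    f.should_forward("gw1", "CPG"),  # losing branch is ignored
]
print(decisions)
```

Even this small sketch has to make choices the specifications leave open, such as silently dropping call progress information from the branch it is not following.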
The media-related issue
with forking arises from the difference between when SIP expects media to start
flowing and when ISUP expects media to start flowing (see my
earlier entry about early media for a summary of the general issue).
Forking makes this problem much more difficult, since there can be more than
one media stream present. If both media streams are simply ringback, it doesn’t
typically make much difference which one the ingress gateway plays. But there’s
no way for the ingress gateway to know ahead of time what might arrive in the
media – it could contain ringback, an announcement, or even playout of an IVR
menu.
To further complicate
matters: if the gateway elects to play the media stream from one gateway, but
the call is answered by another gateway, the called party’s media won’t be
played out immediately. This will clip off the beginning of whatever the called
party says upon answering the phone. Even worse, it isn’t always possible to
tell which media stream belongs to which call, which means the gateway might
have to wait for the “incorrect” media stream to completely stop before it can
switch over to the proper stream. Since the only way to detect the end of an
RTP stream is via timeout, it may be a full second or longer between the called
party answering and the media being established.
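The clipping effect can be illustrated with a small sketch of a timeout-based stream selector. It assumes the gateway can only tell streams apart by their RTP SSRC, plays whichever stream it saw first, and releases that stream only after it has been silent for a timeout. The class name `EarlyMediaSelector`, the one-second timeout, and the SSRC values are assumptions for illustration rather than behavior defined by SIP-I or SIP-T.

```python
class EarlyMediaSelector:
    """Play one RTP stream at a time, switching only after a timeout."""

    def __init__(self, timeout=1.0):
        self.timeout = timeout  # seconds of silence before a stream is dropped
        self.last_seen = {}     # SSRC -> arrival time of its latest packet
        self.playing = None     # SSRC currently played to the caller

    def on_rtp_packet(self, ssrc, now):
        """Record a packet and return True if it should be played out."""
        if (self.playing is not None
                and now - self.last_seen[self.playing] > self.timeout):
            self.playing = None  # the selected stream has gone silent
        self.last_seen[ssrc] = now
        if self.playing is None:
            self.playing = ssrc
        return ssrc == self.playing


sel = EarlyMediaSelector(timeout=1.0)
played = [
    sel.on_rtp_packet(0xAAAA, 0.00),  # ringback from the first gateway
    sel.on_rtp_packet(0xBBBB, 0.01),  # second gateway's stream is ignored
    sel.on_rtp_packet(0xAAAA, 0.02),  # last packet from the first gateway
    sel.on_rtp_packet(0xBBBB, 0.50),  # called party speaks, still ignored
    sel.on_rtp_packet(0xBBBB, 1.10),  # timeout elapsed, stream switches
]
print(played)
```

In this run the answering party's audio at 0.50 seconds is discarded, and playout only switches once the first stream has been silent for more than the timeout.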
Unfortunately, neither
SIP-I nor SIP-T provides guidance for handling the issues that arise from
forking. Implementations are left to handle the problems as they see fit.
And, in many cases, there aren’t any good answers.


I was having a problem with my proxy server this morning while using SIPp in a forking scenario. This really helped me out a lot. Now I just need to put together the logic that will allow me to simulate this call flow.