Archive

Archive for November, 2010

Generic Bootstrapping Architecture

November 30th, 2010 by Ben Campbell under SIP

Readers are probably familiar with the use of the Authentication and Key Agreement (AKA) mechanism in IMS. An IMS mobile device, or UE, has an ISIM or USIM that shares a secret with the 3GPP Authentication Center. It uses the AKA-digest mechanism in SIP to authenticate with its S-CSCF and, as a side effect, generates the keying material for an IPsec ESP security association between the UE and the P-CSCF. But what about non-SIP applications? How would a carrier authenticate and authorize applications over other protocols, such as HTTP?

Enter the 3GPP Generic Bootstrapping Architecture (GBA). The GBA is part of the Generic Authentication Architecture (GAA). UEs can use the GBA for shared-secret based authentication. The GAA also includes a parallel architecture for Public Key Infrastructure (PKI) based authentication, called Support for Subscriber Certificates (SSC). GAA is described in 3GPP TR 33.919. GBA is described in 3GPP TS 33.220.

GBA is a generalization of IMS AKA. It differs from IMS AKA authentication in that it supports arbitrary application protocols, while IMS AKA only supports SIP. For example, GBA could be used to authenticate Web-based applications, or even access to an XDM, over HTTP. It could be used to authenticate email access. It could even be used for SIP applications, although since IMS already uses AKA over SIP, the author does not expect to see GBA authentication in IMS applications anytime soon.

The figure below illustrates GBA (thanks to my associate Ajay Deo for the original picture on which this one is based).

Generic Bootstrapping Protocol

At its core is the Bootstrapping Server Function (BSF). The BSF implements four primary interfaces:

Ub – HTTP interface to the UE

Zn – Interface to the Network Application Function (NAF). This interface can be either Diameter or web services (SOAP/HTTP).

Dz – Diameter interface to the Server Location Function (SLF)

Zh – Diameter interface to the Home Subscriber Server (HSS)

The Network Application Function (NAF) is an abstract placeholder for an application server for some arbitrary application protocol.

At a very high level, GBA operates as follows: The UE attempts to invoke some application at the NAF. Let’s imagine that this application is based on HTTP. The NAF then returns an indicator that the UE must first authenticate with the BSF. The UE then performs an HTTP digest-AKA handshake with the BSF. The BSF checks with the SLF to find the HSS for the given user, then retrieves the authentication vector (AV) and GBA User Security Settings (GUSS) for that user from the HSS.

Assuming success so far, the BSF derives an application-specific key (Ks_NAF) from the AV, constructs a Bootstrapping Transaction Identifier (B-TID), and hands the B-TID (but not the Ks_NAF) back to the UE. Since the UE’s USIM or ISIM contains a copy of the same master key that the BSF used to derive Ks_NAF, the UE can reconstruct Ks_NAF on its own.

The UE then tries again to communicate with the NAF, this time using B-TID and Ks_NAF as its credentials. The NAF recognizes B-TID, and queries the BSF to get the Ks_NAF value associated with that B-TID, as well as relevant parts of the GUSS. The NAF can then complete the authentication of the UE using its application-specific protocol. (For example, if the application protocol is HTTP, this could be a normal, non-AKA HTTP Digest authentication.)
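The exchange above can be sketched as a toy simulation. This is only an illustration under loud assumptions: the AKA handshake and the HSS/SLF lookups are skipped, HMAC-SHA-256 stands in for the real 3GPP key derivation, a simple HMAC challenge stands in for HTTP Digest, and the class, function, and host names are hypothetical. What it shows is the shape of the flow: only the B-TID travels back to the UE, the UE derives Ks_NAF itself, and the NAF obtains the same key from the BSF.

```python
import hashlib
import hmac
import secrets


def derive_key(master_key, label):
    """Stand-in key derivation (HMAC-SHA-256); NOT the 3GPP-defined KDF."""
    return hmac.new(master_key, label, hashlib.sha256).digest()


class BSF:
    """Toy Bootstrapping Server Function (hypothetical, heavily simplified)."""

    def __init__(self, subscriber_keys):
        self._subscriber_keys = subscriber_keys  # stands in for the HSS
        self._sessions = {}                      # B-TID -> master key

    def bootstrap(self, user):
        # The digest-AKA handshake over Ub is omitted; assume it succeeded.
        b_tid = secrets.token_hex(8) + "@bsf.example.invalid"
        self._sessions[b_tid] = self._subscriber_keys[user]
        return b_tid  # only the B-TID is returned, never the key

    def key_for_naf(self, b_tid, naf_id):
        # What a NAF would request over the Zn interface.
        return derive_key(self._sessions[b_tid], naf_id.encode())


def naf_authenticate(bsf, b_tid, naf_id, nonce, response):
    """NAF side: fetch Ks_NAF for the B-TID and check the UE's response."""
    ks_naf = bsf.key_for_naf(b_tid, naf_id)
    expected = hmac.new(ks_naf, nonce, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


# UE side: it holds the same master key and derives Ks_NAF on its own.
master = secrets.token_bytes(16)
bsf = BSF({"alice": master})
b_tid = bsf.bootstrap("alice")
naf_id = "naf.example.invalid"
nonce = secrets.token_bytes(16)  # challenge issued by the NAF
ue_ks_naf = derive_key(master, naf_id.encode())
ue_response = hmac.new(ue_ks_naf, nonce, hashlib.sha256).digest()
print(naf_authenticate(bsf, b_tid, naf_id, nonce, ue_response))  # True
```

Because the key is bound to the NAF identifier in this sketch, a key handed to one NAF is of no use for impersonating the UE at another.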

The NAF does not have to be in the same network as the BSF. This could be extremely useful if carriers outsource services to the cloud, but wish to retain control of user identities. In the roaming case, GBA calls for a “Zn Proxy” that lives in the visited network, and intermediates communication to a BSF in the home network.

There are a few variations on this theme. The 3GPP GBA definition defines a Zh' interface that the BSF can use to communicate with some legacy device in the HSS role that does not support Zh, such as an HLR or a legacy HSS. The Dz interface is optional for cases where the BSF talks to a single HSS node for all users. 3GPP2 has defined a version of GBA that can use an IS-41 interface to an HLR, or a RADIUS interface to an AAA service.

Early Media or Late Charging?

November 22nd, 2010 by Jiri Kuthan under SIP

In today’s article, I would like to address something that is frequently confusing to SIP newcomers: the concept of early media. Early media is about exchanging voice before the call actually happens – but isn’t the call actually happening once you begin to hear each other? What then is this feature good for?

That question is not rhetorical, because early media introduces a bunch of protocol traps. For example, a call can be forked in SIP to multiple destinations. Each of them can start exchanging early media with the caller, and the caller’s phone may totally confuse the caller by playing multiple conversations in parallel. The network may be confused as well, because a call in the early media phase is neither completed nor declined. The early media keeps “setting up” the call, and resources remain allocated in a server with no real impact on the service. One could misuse this behavior to mount a DoS attack on the server, or to hold an endless “early media” conversation between two cooperating parties. So why do we have this?

This artifact can only be understood with knowledge of SIP history and the effort to mimic the PSTN in SIP. In particular, PSTN queuing announcements (“please wait until an operator is available”) and gateway interoperability were the most frequently debated cases for which the notion of early media was introduced to SIP. It is not really a signaling necessity, though. A call could technically be declared established during the initial announcements, even though this phase of a call is worthless to the caller. That way the call setup would complete earlier, occupy fewer network resources, and shorten the forking race-condition window.

However, that’s not the way billing works in the PSTN model. Billing is frequently postponed to the moment when a caller gets a “real service,” such as a human representative of an airline. The SIP standard has chosen to mimic this model in the IP environment. In short, “early media” is as troublesome as its side effects appear.

As a colleague in the IETF has mentioned humorously, “It should have been named ‘late charging.’”

VoIP Interconnection: NNI vs. UNI

November 10th, 2010 by Dorgham Sisalem under SIP

In my last blog entry, I discussed why VoIP peering is needed and what is slowing its introduction. In this sequel, I will take a closer look at some of the issues operators need to deal with once they decide to peer.

SIP was designed to work in a similar manner to email services. That is, a caller that wants to reach a callee sends its requests either to the proxy server responsible for the callee or to its own proxy. The caller’s proxy then forwards the request to the callee’s proxy. DNS is used to discover the IP addresses of the involved proxies. While this model was rather successful with email services, VoIP providers decided that such a model is too open for their needs. Allowing users to access network components such as proxies, media servers or PSTN gateways was deemed too insecure. This was the moment for the SBC vendors, who introduced session border controllers that separate the end users from the VoIP service provider. SBCs terminate the SIP sessions of the users and establish new ones to the operator’s servers.


Figure 1 UNI vs. NNI scenario

When it comes to peering, operators show a similar reluctance to let other operators send traffic directly to their servers and gateways. From a high-level point of view one might ask, “Why not just use SBCs on the network-to-network interface (NNI), as was done on the user-to-network interface (UNI)?”

I would say that the main difference between the UNI and NNI stems from the traffic characteristics and security requirements. At the UNI, SBCs are usually located as close as possible to the users, and hence there are many of them, with each box dealing with a small amount of traffic. In contrast, operators will only have a couple of peering points to other operators and will route a large amount of traffic through these points. Besides scale, the kind of traffic control needed at the NNI is different from that needed at the UNI. While SBCs need to keep local registrations and support user authentication, border components at the NNI need only worry about call signaling, and no user-related processing is needed. Further, the media codecs to be supported between two operators can be negotiated beforehand in service level agreements (SLAs). Therefore there is less of a need to worry about every codec, and the need for transcoding is less urgent than at the UNI, where an operator does not know which codecs the user devices support. Further, even if transcoding were needed at the NNI, the operator can use dedicated servers with special hardware at the peering points and does not have to equip each border component with this expensive hardware.

From the security point of view the concerns are also different. SBCs need to prevent user fraud, ensure that users’ behavior conforms to the operator’s policy, and protect the network from malicious traffic. At the NNI side the concern is more about ensuring that SLAs are respected, filtering unwanted traffic, and ensuring interoperability between the SIP components of the peering partners.

Besides the security and traffic issues, border components at the NNI are expected to provide features that are not necessarily needed by an SBC. This includes providing CDRs to enable billing between operators, as well as flexible routing mechanisms that allow the NNI box to forward incoming calls directly to a gateway or application server. SBCs at the UNI are more like relay points between the user agents and the operator service platform, which is responsible for the routing and the CDR generation.

So in short, while the requirements of the UNI and NNI might seem rather similar at first, there is actually a need for border elements at the NNI that differ in their feature set and architecture from the SBCs used at the UNI. While the UNI needs lower-scale devices with more complex SIP processing capabilities and user-oriented logic, the NNI needs high-scale servers that support flexible routing, denial of service protection and SLA management.

Changing SIP’s Transaction Timers (Part 2 of 2)

November 3rd, 2010 by Robert Sparks under SIP

In a recent post we covered how the SIP non-INVITE transaction state machines work together to ensure reliable delivery of a response to a request. This post covers the INVITE transaction.

The INVITE transaction has a different shape than the non-INVITE transaction. It consists of a request, a response, and an acknowledgement to that response. This 3-way handshake was created to allow the INVITE transaction to pend indefinitely (while an INVITE-ed phone rings).

The client retransmits the request until it receives some evidence that something will respond (typically through receiving a 100 Trying response). It then waits, perhaps hours, for a final response. Once a server sends a final response, it retransmits it until receipt of the response is acknowledged with an ACK message. The associated hop-by-hop state machines and the application logic at the endpoints ensure that a retransmitted request always receives the same response, and that a retransmitted response is re-acknowledged. Both ends have to keep state to recognize the requests and responses as retransmissions, and timers in the state machines inform the elements when they can forget that state.

The INVITE transaction state machines are defined by RFC6026. They use a series of timers to control the retransmission of messages and to allow elements to recognize when things have gone wrong and a transaction should be considered a failure. As in the non-INVITE machines described previously, the values for these timers are set using a pair of parameters defined in RFC3261 named T1 and T2. That standard allows those values to be configured, and provides default values of T1=500ms and T2=4s.
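As a rough illustration of how T1 drives these timers, the sketch below computes when a client would retransmit an INVITE that never gets any response. It assumes an unreliable transport such as UDP and the RFC 3261 behavior in which the retransmit interval (Timer A) starts at T1 and doubles each time until the transaction times out at 64*T1 (Timer B). The function name is hypothetical.

```python
def invite_request_schedule(t1=0.5):
    """Return (retransmit_times, timeout) in seconds for an unanswered INVITE.

    Assumes an unreliable transport: Timer A starts at T1 and doubles on
    every retransmission; Timer B ends the transaction at 64*T1.
    """
    timeout = 64 * t1   # Timer B
    interval = t1       # Timer A, initial value
    elapsed = 0.0
    times = []
    while elapsed + interval < timeout:
        elapsed += interval
        times.append(elapsed)
        interval *= 2   # INVITE request retransmits are not capped at T2
    return times, timeout


times, timeout = invite_request_schedule()  # default T1 = 500 ms
print(times)    # [0.5, 1.5, 3.5, 7.5, 15.5, 31.5]
print(timeout)  # 32.0
```

With the default T1, the client sends the INVITE seven times in total (the original plus six retransmissions) before giving up after 32 seconds. Quadrupling T1 to 2 s stretches the same pattern out to 128 seconds.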

There are really two different types of INVITE transaction. One pattern is followed when the INVITE gets a 200-class response (one indicating the INVITE has been accepted). An entirely different pattern is used when the INVITE is rejected.

Figures: invite fail, invite success

The effects of adjusting the timers for the rejected INVITE case are very similar to what happens with non-INVITE transactions. Since the error response is carried and acknowledged hop-by-hop, the INVITE server and client state machines take care of reliable delivery of each message, including the ACK.

The effects of adjusting the timers for an accepted INVITE are quite different. In this case, the success response is not retransmitted hop-by-hop, and the ACK to it is an end-to-end acknowledgement (it may not even go through the intermediary hops). The following figure shows how the endpoints recover from a series of lost 200 OKs and ACKs. Note that in this figure, the proxy P3 did not add a Record-Route header field value to the INVITE as it forwarded it, so future dialog requests (including the ACK to a 200 OK to this INVITE) will not go through P3. In this figure, all of the elements are using the default values for T1 and T2.

invite loss

As you can see from the figure above, moving a 200 and its ACK across several unreliable hops creates a lot of exposure to packet loss, and each loss will lead to another end-to-end round-trip retransmission. If some hop is particularly lossy, the system will need as many opportunities to recover as it can get.

This points to a problem when one endpoint is configured with a different value for T1 than the value used at intermediate proxies and the other endpoint. In the figure below, the server endpoint and P3's interface towards the server endpoint are using T1=2s, T2=16s. P3's other interface and all the remaining elements are using the default values for T1 and T2.

invite mismatch

Note the effect of the different values of T1 at P2 and P3 on Timer M. Per the requirements in RFC6026, P2 will stop forwarding retransmissions of the 200 OK for this transaction 32 seconds after seeing the first one, making the majority of the retransmissions the server endpoint sent useless. Similarly, the client will stop paying attention to retransmissions of the 200 OK 64*T1 seconds after seeing one for the first time. It absorbs them at the transaction level if it is RFC6026 compliant, or at the higher application level if it is an older implementation. (RFC3261 did not require the application to keep this timer, but many implementations assumed it, reflecting the requirement for the server to declare the transaction a failure if an ACK is not received in 64*T1 seconds.)

Even more so than with non-INVITE transactions, the slower retransmission schedule from the server endpoint makes it more likely that the overall transaction fails. In general, networks that have hops with mixed values of T1 will not behave as reliably as networks that share the same value of T1 across all hops.
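The arithmetic behind the mismatch scenario can be sketched as follows. The sketch assumes that an unacknowledged 200 OK is retransmitted at an interval that starts at T1, doubles up to a cap of T2, and stops 64*T1 seconds after the first transmission. It also ignores network latency, so P2 is taken to see the first 200 OK at time zero. The function names are hypothetical.

```python
def ok_retransmit_times(t1, t2):
    """Seconds after the first 200 OK at which the server endpoint resends it.

    Assumes the interval starts at T1, doubles up to a cap of T2, and the
    server gives up 64*T1 seconds after the first transmission.
    """
    give_up = 64 * t1
    interval = t1
    elapsed = 0.0
    times = []
    while elapsed + interval < give_up:
        elapsed += interval
        times.append(elapsed)
        interval = min(2 * interval, t2)
    return times


def split_by_window(times, window):
    """Split retransmissions into those inside and outside a proxy's window."""
    useful = [t for t in times if t < window]
    wasted = [t for t in times if t >= window]
    return useful, wasted


proxy_window = 64 * 0.5                 # Timer M at P2 with default T1: 32 s
slow = ok_retransmit_times(2.0, 16.0)   # server endpoint: T1=2s, T2=16s
useful, wasted = split_by_window(slow, proxy_window)
print(useful)  # [2.0, 6.0, 14.0, 30.0]
print(wasted)  # [46.0, 62.0, 78.0, 94.0, 110.0, 126.0]
```

Under these assumptions, six of the server endpoint's ten retransmissions arrive after P2 has stopped forwarding them. An endpoint using the default values would fit all ten of its retransmissions inside the same 32-second window.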
