Author Archive

Wait a Second!

June 9th, 2011 by Robert Sparks under SIP

SIP has several robustness mechanisms that leverage being able to say “Wait a bit before you try that again.”

A 486 Busy Here response can contain a Retry-After header field, allowing the endpoint to say “Please don’t try to call here again for 30 minutes,” based perhaps on knowledge it obtained from its user’s calendar.

A 500 Server Internal Error response can use Retry-After to say “Something’s keeping me from servicing this particular request right now, but please try again in 5 seconds.”

A 503 Service Unavailable error can use Retry-After to say something much stronger: “Something’s keeping me from servicing _any_ requests. Don’t send me anything more for at least 30 seconds.” As we’ll see in a moment, this is a very strong statement – one that needs to be carefully invoked.

The SIP Events architecture provides a way for an event server (such as a presence server) to say “I’m tearing down this subscription, and I need you to resubscribe, but don’t try to do so until at least 20 seconds have passed.” It does this with a NOTIFY containing a Subscription-State header similar to this:

    Subscription-State: terminated;reason=probation;retry-after=20

These mechanisms allow servers to avoid, and even redistribute, load. A registrar handling a burst of simultaneous registrations can quickly tell some or all of them to wait, using different wait times to spread the returning load out a little. One node in a cluster of presence servers can move its subscriptions to its peers by throwing all of its subscriptions into probation, as described above, again using a range of different wait times for different subscriptions. As the clients re-establish their subscriptions, the mechanisms for finding SIP servers can distribute the subscriptions among the peers.
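
As a rough illustration of spreading the returning load, here is a minimal Python sketch that assigns each subscription its own randomly chosen wait. The function name, the subscription identifiers, and the 5 to 60 second range are assumptions made for the example; nothing in the specifications prescribes them.

```python
import random


def probation_headers(subscription_ids, min_wait=5, max_wait=60, rng=None):
    """Map each subscription id to a Subscription-State value with its own wait.

    min_wait and max_wait (seconds) are illustrative defaults, not values
    taken from any specification.
    """
    rng = rng or random.Random()
    headers = {}
    for sub_id in subscription_ids:
        # randint is inclusive on both ends, so waits fall in [min_wait, max_wait]
        wait = rng.randint(min_wait, max_wait)
        headers[sub_id] = f"terminated;reason=probation;retry-after={wait}"
    return headers


if __name__ == "__main__":
    for sub_id, value in probation_headers(["sub-1", "sub-2", "sub-3"]).items():
        print(sub_id, "Subscription-State:", value)
```

Because each subscriber is told a different wait, the resubscriptions arrive spread over the chosen window instead of all at once.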

While the mechanisms are useful in the situations I’ve described so far, and may be exactly the tools an application relying on a limited external resource like a specialized DSP needs, they aren’t sufficient to handle the general case of overload protection. The granularity the tools work at is either very small (affecting this particular method applied to this particular resource), or very large (affecting all traffic between two elements). The IETF’s SOC working group is developing richer ways to help a server avoid being overloaded.

But even with those tools, there are situations where crushing load can appear before mechanisms at the SIP layer have a chance to help. Avalanche-restart scenarios, in which whole campuses or even cities full of clients all come online at the same time (perhaps after power is restored), are a good example. In the extreme, action closer to the physical layer of the network (such as using firewalls to introduce the load in smaller increments) is warranted.

Finally, like most tools, using them without understanding what they do can lead to surprising results. Any code that generates a 503 Service Unavailable response, for example, deserves careful inspection. Some early proxy implementations make the mistake of forwarding 503 responses, when they should be taking a received 503 as input into generating their own final response. By blindly forwarding a 503, they are saying “Stop talking to me” instead of “I can’t find something that can handle this request,” which leads to unintended failures, such as the following:

Here, Alice and Bob are in a SIP dialog, perhaps for a phone call. Carol and Dave are in a separate dialog, either a different call or perhaps they have a subscription set up.  

503fig1

Something goes wrong with Bob’s UA and it has to return a 503 to a request it received.  Proxy 2 does the wrong thing and forwards the 503.

503fig2

Now Carol’s next request towards Dave can’t be forwarded through Proxy 2, even though there was nothing really preventing Proxy 2 from being able to service the request.  Carol and Dave have lost service unnecessarily and have no idea why.

503fig3

Alice (or anyone else whose requests towards Dave would have taken the path from Proxy 1 to Proxy 2) can’t reach Dave either.

Proxy 2 should have returned its own response, probably a 480 Temporarily Unavailable, to the request that elicited the 503 from Bob’s UA. That way only the requests Alice was sending to Bob would be affected.
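
The rule can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the default replacement of 480 simply follows the suggestion above, so a real proxy would choose the replacement code according to its own policy and the specifications.

```python
def status_to_forward(received_status, replacement=480):
    """Return the status code a proxy sends upstream for a downstream final response.

    A received 503 describes the downstream element, not this proxy, so it is
    replaced rather than forwarded. The replacement code is an assumption of
    this sketch and is left configurable.
    """
    if received_status == 503:
        return replacement
    return received_status


if __name__ == "__main__":
    print(status_to_forward(503))  # the downstream 503 is not passed along
    print(status_to_forward(486))  # other responses are forwarded unchanged
```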

SIP Crossing Multiple Transports and Address Families

April 26th, 2011 by Robert Sparks under SIP

At the SIPit earlier this month, the multiparty testing uncovered a few implementations that assumed that, since they only used SIP over IPv4 with protocols like TCP and UDP, they would never encounter IPv6 addresses or tokens related to other transports like SCTP or DTLS in the SIP messages they receive. That assumption is becoming less safe as more advanced transports are deployed and as we go further into the IPv4 to IPv6 transition.

During rendezvous, a SIP request may traverse several SIP proxies. The RFC 3263 rules for locating the server at each hop can result in different transport protocols.

For example:

fig1.cp

The Via stack and the Record-Route stack that are created as the message is forwarded will contain tokens and addresses from each of the traversed networks. Here’s what the Via and Record-Route header fields from the last message in the diagram might look like:

    Via: SIP/2.0/UDP 192.0.2.1;received=198.51.100.15;rport=55021;branch=z9hG4bK8h23i5
    Via: SIP/2.0/TLS proxy1.example.net;received=[2001:db8::2:1];branch=z9hG4bK3udn3z13
    Via: SIP/2.0/SCTP proxy2.example.net;received=198.51.100.18;branch=z9hG4bK23sd093n42
    Via: SIP/2.0/UDP proxy3.example.net;branch=z9hG4bK3s09ujnsnnen3
    Record-Route: <sip:proxy1udp.example.net;lr>
    Record-Route: <sips:proxy1tls.example.net;lr>
    Record-Route: <sips:proxy2tls.example.net;lr>
    Record-Route: <sip:proxy2sctp.example.net;lr>
    Record-Route: <sip:proxy3sctp.example.net;lr>
    Record-Route: <sip:proxy3udp.example.net;lr>

Note the double-record-route technique used by the proxies. For more information, see RFC 5658.

Now, these endpoints don’t need to be able to parse these tokens and addresses, at least not to the level of understanding their internal structure – they just need to be able to preserve them, following the requirements in the standards for including them in subsequent messages. This is where a few implementations ran into trouble – resulting in errors from failed parses of information they didn’t need in the first place. For instance, some IPv4-only endpoints assumed they would never see IPv6 addresses in the messages they received, so they dropped the incoming message above as malformed when their parser reached the [2001:db8::2:1] address in the second Via header field value. These implementations didn’t really need to parse the value at all. They only needed to reflect it, unchanged, in any response they send to the request. Following Postel’s Maxim in this case would result in higher levels of interoperability.
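
To make the “preserve, don’t parse” point concrete, here is a small Python sketch that collects the Via header lines of a request exactly as received so they can be copied into a response. The function name is hypothetical, and the sketch assumes one header per line with no folded (continuation) lines; it is not a complete SIP parser.

```python
def via_lines(raw_headers):
    """Return the Via header lines of a request exactly as received, in order.

    Only the header name is examined. The value (transport token, host,
    parameters) is treated as opaque text, so an IPv6 literal or an unfamiliar
    transport cannot cause a parse failure here.
    """
    kept = []
    for line in raw_headers.split("\r\n"):
        name, sep, _ = line.partition(":")
        # "v" is the compact form of the Via header field name
        if sep and name.strip().lower() in ("via", "v"):
            kept.append(line)
    return kept


if __name__ == "__main__":
    request_headers = (
        "Via: SIP/2.0/UDP 192.0.2.1;branch=z9hG4bK8h23i5\r\n"
        "Via: SIP/2.0/TLS proxy1.example.net;received=[2001:db8::2:1];branch=z9hG4bK3udn3z13\r\n"
        "To: <sip:bob@example.net>\r\n"
    )
    for line in via_lines(request_headers):
        print(line)
```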

An earlier article went deeper into how RFC3263’s requirements provide for switching between transports like UDP, TCP, and SCTP at each hop. One thing that RFC doesn’t specify well is when to use IPv6 or IPv4 when both are available. As IPv6 becomes more available, dual-stack hosts (those able to use IPv6 and IPv4 at the same time) will encounter usability conditions that change over time. As a new IPv6 path is formed between endpoints, configuration and optimization changes may lead to IPv6 significantly outperforming IPv4 early in a given day, but not work as well (if at all) later that afternoon. How does a host choose which address family to use when it needs a new connection for a request?

Work is ongoing in the IETF to provide an algorithm to answer that question. The current working document focuses on TCP and uses a weighting parameter, P, that affects the order in which the different address families are tried and how long to delay before trying the other family.

When P is 0, both families are tried simultaneously, each starting with the A or AAAA DNS lookup, followed by a TCP connection attempt. Whichever connects successfully first causes P to be adjusted in the direction that favors the quicker family. Positive values of P favor IPv6; negative values favor IPv4. The size of the adjustment depends on several factors, including whether the quickest family was the one currently favored.

fig2

When P is non-zero, the attempt to use the second family is delayed by 10*abs(P) milliseconds.

In all cases, when the attempts using both families succeed, the connection that was established first is kept. The subsequent connection is reset.
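
The ordering and delay rules described so far can be sketched in Python. The function name and the tuple it returns are assumptions made for illustration, and the sketch deliberately leaves out how P itself is adjusted, since the draft’s adjustment rules are not spelled out here.

```python
def connection_plan(p):
    """Return (first_family, second_family, delay_ms) for weighting parameter p.

    Positive p favors IPv6 and negative p favors IPv4. The attempt on the
    second family starts 10 * abs(p) milliseconds after the first. When p is
    0 the delay is 0, so both attempts start together and the order returned
    is arbitrary.
    """
    delay_ms = 10 * abs(p)
    if p >= 0:
        return ("IPv6", "IPv4", delay_ms)
    return ("IPv4", "IPv6", delay_ms)


if __name__ == "__main__":
    for p in (0, 3, -5):
        print(p, connection_plan(p))
```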

These ideas are still being refined – the current working draft anticipates that the algorithm will be adjusted before it is proposed as a standard. For instance, there may need to be more clarity around which address family to issue the DNS queries over. There is also work proposed to specify what to do when the choices are richer than TCP over IPv6 vs. TCP over IPv4.

Maintaining Communication Protocols As Security Technologies Evolve

March 16th, 2011 by Robert Sparks under SIP

There is a very visible, continuous drive to develop improved cryptographic algorithms. As time passes, increases in the understanding of the mechanics of existing algorithms and increases in raw computing power reduce the existing algorithms’ effectiveness. At some point a decision has to be made to stop using certain algorithms.

This month, the IETF released several RFCs capturing such decisions. RFC6176 requires that TLS implementations never negotiate SSL version 2. RFC6149 and RFC6150 retire the MD2 and MD4 message digest algorithms. RFC6151 provides updated guidance on when MD5 is reasonable to use.

A great deal of effort is going into the evolution of the Secure Hash Algorithm (SHA) family, including a contest between the contenders for the upcoming selection of SHA-3. At some point, SHA-1 will join the list of retired algorithms.

Deployments of real-time communication protocols, such as SIP and RTP, will require maintenance as algorithms are retired. Most of the time, that maintenance is bounded to the configuration of a single subsystem, such as TLS – this is one of the benefits of using a framework that encapsulates algorithm agility.

In other cases, adapting to a change will affect more than a subsystem. SIP’s use of MD5 for Digest Authentication, for example, is widely deployed with no mechanisms to change it through configuration. As MD5 further weakens, relying on it to obfuscate a password from an eavesdropper is unwise. Currently, this form of SIP authentication is best used only across one TLS-protected hop. Moving to a different challenge-response authentication mechanism will require significant standardization and implementation effort (such as what went into Digest-AKA in RFC3310), which might better be spent on models that don’t require challenge-response, and efforts that would allow the reuse of already algorithm-agile subsystems.

Unfortunately, even when adapting is as simple as updating or configuring one subsystem, rollout can take a surprisingly long time. There are still large commercial websites that have not adopted the defense against a well-known TLS man-in-the-middle vulnerability documented in RFC5746.

SIP’s Timer C : How long can your phone ring?

February 8th, 2011 by Robert Sparks under SIP

Recent posts describe how SIP’s INVITE and non-INVITE transactions work. These transactions are defined with very different state machines. The non-INVITE transaction is designed to complete within a fixed period of time – 32 seconds using the default values for the transaction timers. Typically, this doesn’t allow enough time for interaction with a user in determining how to respond. The INVITE transaction, on the other hand, is designed to allow the application to decide how long to wait for the transaction to complete. This allows the application to ask its user what should happen, wait for the user to notice, and eventually provide some input.

For the application of SIP most people are familiar with, this means a phone receiving an INVITE can ring until someone answers it, the caller gives up, or some other part of the system decides it’s waited long enough and makes the decision for the endpoints.

A pair of endpoints exchanging SIP directly could leave an INVITE pending indefinitely.

pendforever

They could also try to leave the INVITE pending forever with one or more proxies in the path. This forces each proxy to keep state – at least two transaction state machines per INVITE processed, more if the proxy forks the request.

proxystate

If the endpoints above collude to abandon the transaction without sending any signaling, the proxy is left holding that state potentially forever. The colluding endpoints could exchange a large number of INVITE requests and abandon them in that state. The proxy needs a way to decide that it’s waited long enough and is going to throw that state away. This is where Timer C comes in.

When the proxy forwards the INVITE, it associates an instance of Timer C with the client INVITE transaction it creates, initializing the timer to some value. If Timer C expires while the downstream INVITE transaction state machine is still pending, the proxy can force the downstream transaction to complete by sending a CANCEL. The CANCEL will either stimulate a final response to the INVITE or time out.

timerc

The specifications allow a proxy to continue to wait when Timer C fires by resetting it to a new value instead of issuing the CANCEL. For the details, see RFC3261 section 16.8.

The specifications provide little guidance on what value to use for Timer C – only that it MUST be greater than 3 minutes. A proxy operator may be tempted to set the timer to the smallest allowed value given the possibility of the colluding endpoints mentioned earlier. Unfortunately, this would lead to a poor experience for many users. As Jiri recently pointed out, many calls through PSTN gateways to large IVRs (Interactive Voice Response systems) remain in the early media state until a human agent is reached – meaning many are never actually answered at all. While you are interacting with the IVR, the call is effectively “ringing”. Imagine waiting in an answered-in-the-order-it-was-received queue for 3 minutes and then having the call hang up because the proxy between you and the gateway decided things have been ringing long enough. Alternatively, imagine that you’ve been making menu selections towards a purchase, and have just entered the last digit of your credit card number when the proxy makes this decision and hangs up the call. Did your order complete?

A proxy operator will need to evaluate the applications using the service and weigh their requirements against the cost of keeping pending call state when choosing a value for Timer C.
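
One way to picture the choice a proxy makes when Timer C fires is the Python sketch below. The “extensions left” budget is a hypothetical policy invented for this example, as are the function name and return values; the specifications only say that the proxy may either reset the timer or cancel the pending transaction.

```python
MIN_TIMER_C_SECONDS = 180  # Timer C must be greater than 3 minutes


def on_timer_c_fired(now, timer_c_seconds, extensions_left):
    """Return (action, new_deadline, extensions_left) when Timer C fires.

    "reset" means keep waiting until new_deadline; "cancel" means send a
    CANCEL downstream. extensions_left is an assumed per-call budget that an
    operator might tune for applications such as long IVR sessions.
    """
    if timer_c_seconds <= MIN_TIMER_C_SECONDS:
        raise ValueError("Timer C must be greater than 3 minutes")
    if extensions_left > 0:
        return ("reset", now + timer_c_seconds, extensions_left - 1)
    return ("cancel", None, 0)


if __name__ == "__main__":
    print(on_timer_c_fired(now=1000.0, timer_c_seconds=300, extensions_left=1))
    print(on_timer_c_fired(now=1300.0, timer_c_seconds=300, extensions_left=0))
```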

SIP Outbound

December 14th, 2010 by Robert Sparks under SIP

The Session Initiation Protocol is designed using a very flexible model of how endpoints can reach each other. Two SIP user agents can enter into a call with each other without the assistance of any intermediary proxies if they can find each other. That said, most deployments are better described with variants of the SIP trapezoid described in RFC 3261. A calling endpoint sends its signaling through one or more proxies associated with its service provider. Those messages are forwarded through one or more proxies at the receiving endpoint’s service provider before being forwarded to the receiving endpoint.

outbound fig1

Network Address Translators (NATs) and firewalls between the endpoint and its service provider can prevent the base SIP protocol from working. Most NATs will only forward traffic from the outside to the inside if the traffic corresponds to a binding created by earlier traffic from the inside to the outside. Typically, TCP connections can only be made in the outward direction. UDP traffic will not flow from the outside until one or more packets have been sent from the inside. Because of this, an endpoint sitting behind a NAT that hasn’t been generating traffic usually won’t be able to receive a call – the incoming INVITE will be blocked at the NAT.

outbound fig2

RFC 5626 provides extensions to make this better. The title of the RFC is “Managing Client-Initiated Connections in the Session Initiation Protocol (SIP),” but most implementers refer to it as SIP Outbound.

This extension allows an endpoint to establish a “flow” with a proxy on the other side of a NAT or firewall. The flow is managed with a series of keep-alive messages designed to keep NAT bindings in place, and to allow both the endpoint and proxy to discover when that particular flow is no longer working. The word “flow” was chosen instead of “connection” because a flow can be established over protocols, like UDP, that do not support the notion of a connection.

Here’s how it works at a very high level. When the endpoint sends a REGISTER request to its service provider, it indicates that it supports the SIP outbound extension, providing an identifier that uniquely identifies the endpoint (a sip.instance), and an identifier that corresponds to this registration (a reg-id). When the service provider’s proxy/registrar accepts this registration, it attaches any extra information it needs to the binding created by the registration to cause future requests that would have been routed to the Contact in the registration to be routed, instead, down whatever path the REGISTER arrived over. For instance, if the registration arrived over TCP, future requests towards the endpoint will be sent down that TCP connection. If it arrived over UDP, future requests will be sent to the address and port that the REGISTER appeared to come from.

outbound fig3

Once such a registration is established, the endpoint is expected to periodically send a keep-alive. For connection-oriented protocols like TCP, this keep-alive message is a simple Carriage-Return/Line-Feed pair (CRLF). The endpoint sends a double CRLF, and the server responds with a CRLF. For connectionless protocols, the endpoint will send a STUN Binding Request over the flow. The server will respond with a Binding Response. If the endpoint does not see the appropriate keep-alive response within a configured interval, it will begin to take corrective action. The registration may also negotiate a Flow-Timer – an interval during which the server expects to receive a keep-alive from the endpoint. In this case, the server may also monitor for flow failure, and attempt to take corrective action if keep-alives don’t arrive on time.
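
For the connection-oriented case, the exchange is simple enough to sketch in Python. This shows only the server side, and the function name and the idea of inspecting a byte buffer between SIP messages are assumptions of the example rather than anything the RFC mandates about implementation structure.

```python
PING = b"\r\n\r\n"  # double CRLF sent by the endpoint
PONG = b"\r\n"      # single CRLF sent back by the server


def handle_between_messages(buffer):
    """Server side: return (reply_bytes, remaining_buffer).

    Called with the bytes waiting on a connection when no SIP message is in
    progress. A leading double CRLF is consumed and answered with one CRLF;
    anything else is left untouched for the SIP parser.
    """
    if buffer.startswith(PING):
        return PONG, buffer[len(PING):]
    return b"", buffer


if __name__ == "__main__":
    print(handle_between_messages(b"\r\n\r\nINVITE sip:bob@example.net SIP/2.0\r\n"))
```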

This is where the registration identifier (reg-id) the client provided comes into play. SIP Outbound allows a client to set up multiple flows for the same sip.instance. The service provider will send inbound messages down whichever flow was most recently refreshed. Furthermore, SIP Outbound expects that the proxy and registrar functions at the service will be decoupled, allowing flows from the same sip.instance to be established through multiple edge proxies. If one edge proxy (or the network path to it) fails, the flow through another is standing by ready to carry inbound requests to the endpoint. Additionally, when the endpoint detects that a given flow has failed, it will create a new one using the same sip.instance and reg-id, which lets the service know this new flow replaces the failed one.
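
The bookkeeping this implies at the service can be sketched with a small Python class. The class and method names, and the use of plain strings to stand for flows, are hypothetical; a real registrar stores far more per binding.

```python
class FlowTable:
    """Track flows per (sip.instance, reg-id) and pick one for inbound requests."""

    def __init__(self):
        # (instance, reg_id) -> (flow, time last refreshed)
        self._flows = {}

    def register(self, instance, reg_id, flow, now):
        # Registering again with the same instance and reg-id replaces the old
        # flow, which is how a failed flow gets superseded by a new one.
        self._flows[(instance, reg_id)] = (flow, now)

    def remove(self, instance, reg_id):
        self._flows.pop((instance, reg_id), None)

    def flow_for(self, instance):
        """Return the most recently refreshed flow for this instance, or None."""
        candidates = [
            (refreshed, flow)
            for (inst, _reg_id), (flow, refreshed) in self._flows.items()
            if inst == instance
        ]
        if not candidates:
            return None
        return max(candidates, key=lambda c: c[0])[1]


if __name__ == "__main__":
    table = FlowTable()
    table.register("urn:uuid:example-1", 1, "flow via edge proxy A", now=10)
    table.register("urn:uuid:example-1", 2, "flow via edge proxy B", now=20)
    print(table.flow_for("urn:uuid:example-1"))
```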

 


A client uses REGISTER to create multiple flows

outbound fig6

The most recently refreshed flow carries inbound requests

Multiple flows facilitate failure recovery

This is, of course, only a high-level overview of the Outbound mechanism. The full range of features the extension creates can be found in the RFC.

SIP Outbound was released as an RFC in October of 2009. There were several implementations using the extension interoperably at the November 2010 SIPit test event.

Changing SIP’s Transaction Timers (Part 2 of 2)

November 3rd, 2010 by Robert Sparks under SIP

In a recent post we covered how the SIP non-INVITE transaction state machines work together to ensure reliable delivery of a response to a request. This post covers the INVITE transaction.

The INVITE transaction has a different shape than the non-INVITE transaction. It consists of a request, a response, and an acknowledgement to that response. This 3-way handshake was created to allow the INVITE transaction to pend indefinitely (while an INVITE-ed phone rings).

The client retransmits the request until it receives some evidence that something will respond (typically through receiving a 100 Trying response). It then waits, perhaps hours, for a final response. Once a server sends a final response, it retransmits it until receipt of the response is acknowledged with an ACK message. The associated hop-by-hop state machines and the application logic at the endpoints ensure that a retransmitted request always receives the same response, and a retransmitted response is re-acknowledged. Both ends have to keep state to recognize the requests and responses as retransmissions, and timers in the state machines will inform the elements when they can forget that state.

The INVITE transaction state machines are defined by RFC6026. They use a series of timers to control the retransmission of messages and to allow elements to recognize when things have gone wrong and a transaction should be considered a failure. As in the non-INVITE machines described previously, the values for these timers are set using a pair of parameters defined in RFC3261 named T1 and T2. That standard allows those values to be configured, and provides default values of T1=500ms and T2=4s.

There are really two different types of INVITE transaction. One pattern is followed when the INVITE gets a 200-class response (one indicating the INVITE has been accepted). An entirely different pattern is used when the INVITE is rejected.

invite fail

invite success

The effects of adjusting the timers for the rejected INVITE case are very similar to what happens with non-INVITE transactions. Since the error response is carried and acknowledged hop-by-hop, the INVITE server and client state machines take care of reliable delivery of each message, including the ACK.

The effects of adjusting the timers for an accepted INVITE are quite different. In this case, the success response is not retransmitted hop-by-hop, and its ACK may not even go through the intermediary hops. It is an end-to-end acknowledgement. The following figure shows how the endpoints recover from a series of lost 200 OKs and ACKs. Note that in this figure, the proxy P3 did not add a Record-Route header field value to the INVITE as it forwarded it, so future dialog requests (including the ACK to a 200 OK to this INVITE) will not go through P3. In this figure, all of the elements are using the default values for T1 and T2.

invite loss

As you can see from the figure above, moving a 200 and its ACK across several unreliable hops creates a lot of exposure to packet loss, and each loss will lead to another end-to-end round-trip retransmission. If some hop is particularly lossy, the system will need as many opportunities to recover as it can get.

This points to a problem when one endpoint is configured with a different value for T1 than the value used at intermediate proxies and the other endpoint. In the figure below, the server endpoint and P3’s interface towards the server endpoint are using T1=2s, T2=16s. P3’s other interface and all the remaining elements are using the default values for T1 and T2.

invite mismatch

Note the effect of the different values of T1 at P2 and P3 on Timer M. Per the requirements in RFC6026, P2 will stop forwarding retransmissions of the 200 OK for this transaction 32 seconds after seeing the first one, making the majority of the retransmissions the server endpoint sent useless. Similarly, the client will stop paying attention to retransmissions of the 200 OK 64*T1 seconds after seeing one for the first time, either absorbing them at the transaction level if it is RFC6026 compliant, or at the higher application level if it is an older implementation (RFC3261 did not require the application to keep this timer, but many implementations assumed it, reflecting the requirement for the server to declare the transaction a failure if an ACK is not received in 64*T1 seconds).

Even more so than with non-INVITE transactions, the slower retransmission schedule from the server endpoint makes it more likely that the overall transaction fails.  In general, networks that have hops with mixed values of T1 will not behave as reliably as networks that share the same value of T1 across all hops.

Changing SIP’s Transaction Timers (Part 1 of 2)

October 27th, 2010 by Robert Sparks under SIP

SIP is a request-response protocol designed to run over a wide range of transports. A SIP element will send a request, and then wait some time for a response, giving up if one doesn’t arrive “soon enough”. If the element knows it’s using a transport protocol that might lose the request before it’s delivered (such as UDP), it will resend the request a few times while it’s waiting.

The specifications define when to resend that request, and when to give up waiting for a response, by defining a set of formal transaction state machines. The element sending a request starts a client state machine the first time it emits the request. It starts a series of timers at the same time. As the timers fire, the client state machine causes the request to be retransmitted. If a response doesn’t arrive in time, one of the timers will indicate it’s time for the element to give up and declare the transaction a failure. Of course, if responses arrive before that happens, the transaction state machine transitions to a state that recognizes the receipt.

There are two basic transaction patterns in SIP, each with its own state machine definitions. One handles INVITE requests exclusively. The other handles all other SIP requests (REGISTER, BYE, SUBSCRIBE, etc.). These are referred to as non-INVITE requests.

The definition of the state machines for both kinds of transaction has changed since RFC3261 was published. The INVITE transaction state machines are currently defined by RFC6026. The non-INVITE transaction state machines are mostly defined in RFC3261, but the element’s behavior is modified by RFC4320.

This post is going to focus on the non-INVITE transaction. We’ll dive into the INVITE transaction in a future entry.

The figure below represents the non-INVITE client transaction state machine. We’re not going to go through it in detail, but please note the Timers E and F in the figure. Timer E controls when the request is retransmitted. Timer F lets the element know when to give up waiting for a response.

non invite client transaction state machine

There is a companion non-INVITE server transaction state machine that the element receiving the request will run, with its own set of timers that are companions to the timers in the client machine. These machines work together to make sure the request gets to the responder and that the response gets back to the requester.

For unreliable transports like UDP, the machines ensure this reliability using a pair of companion actions:

1) The client will retransmit an unaltered request periodically while waiting for a response

2) The server will send exactly the same answer to each copy of the request it receives. To do this, it has to remember what it sent to the original request for a while. It can’t remember that forever, so after some time (metered using Timer J) it will declare the transaction over and forget what it’s sent.

Timers E and F are set when the request is first sent to values based on a global set of defaults named T1 and T2 in the specifications. T1 was chosen to roughly represent the RTT to the responding SIP element, and is the starting point for when a client will retransmit its request. As the client continues to retransmit, it will double the interval between retransmissions (in case it was too many packets being sent that was impairing the arrival of the response in the first place). Rather than allowing the retransmission interval to get too large to be useful, it is capped at the T2 value.

RFC 3261 sets the default value for T1 to 500 milliseconds, and T2 to 4 seconds. It is critical for correct operation that T1 and T2 be the same in the companion client and server transaction state machines.

With these values, a client not receiving a response will retransmit at .5s, 1.5s, 3.5s, 7.5s, 11.5s, and every 4 seconds after that until Timer F fires. Timer F is set to 64*T1 – using the default values this is 32 seconds.
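
The schedule is easy to reproduce. Here is a short Python sketch, with a function name chosen for this example, that lists when Timer E fires before Timer F ends the transaction, assuming no response ever arrives.

```python
def retransmit_times(t1=0.5, t2=4.0):
    """Return the times (seconds after the first send) of each retransmission.

    The interval starts at T1, doubles after every retransmission, and is
    capped at T2. Retransmissions stop when Timer F (64 * T1) would fire.
    """
    timer_f = 64 * t1
    times = []
    interval = t1
    elapsed = t1
    while elapsed < timer_f:
        times.append(elapsed)
        interval = min(2 * interval, t2)
        elapsed += interval
    return times


if __name__ == "__main__":
    print(retransmit_times())          # default T1 and T2
    print(retransmit_times(2.0, 16.0)) # the slower settings used later in this post
```

With the defaults this yields ten retransmissions, the last at 31.5 seconds, just before Timer F fires at 32 seconds.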

The figure below shows a worst-case successful non-INVITE transaction using the default values of T1 and T2.

non invite loss

Now, what happens if the client and server state machines aren’t configured to use the same values of T1 and T2? Let’s assume the server is using the default values, but the client is using T1=2s, T2=16s.

non invite loss mix

The server element saw the 7th retransmission as a new request. This could be particularly problematic if, for instance, the request was a SIP MESSAGE containing “Buy 100 shares.”
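
The underlying arithmetic can be sketched in Python under some simplifying assumptions that are mine, not the figure’s: no network delay, the server answers the first copy it sees immediately (so its Timer J of 64 times the server’s T1 starts then), and every response is lost so the client keeps retransmitting. The function name and the `first_seen_at` parameter are hypothetical.

```python
def first_retransmission_seen_as_new(client_t1, client_t2, server_t1, first_seen_at):
    """Return (n, time) for the first retransmission the server no longer recognizes.

    n counts the client's retransmissions starting from 1. Returns None if
    the client gives up (Timer F) before the server forgets the transaction.
    """
    client_gives_up_at = 64 * client_t1                 # client Timer F
    server_forgets_at = first_seen_at + 64 * server_t1  # server Timer J expiry
    interval = client_t1
    elapsed = client_t1
    n = 1
    while elapsed < client_gives_up_at:
        if elapsed > server_forgets_at:
            return n, elapsed
        interval = min(2 * interval, client_t2)
        elapsed += interval
        n += 1
    return None


if __name__ == "__main__":
    # Client uses T1=2s, T2=16s; server uses the default T1=0.5s.
    print(first_retransmission_seen_as_new(2, 16, 0.5, first_seen_at=0))
    print(first_retransmission_seen_as_new(2, 16, 0.5, first_seen_at=30))
```

When both sides use the default timers the function returns None: the client gives up before the server forgets, which is exactly the property matching timers are meant to guarantee.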

Now consider a request that goes through a few SIP proxies. Each hop in the request is managed with its own pair of state machines. Each pair must share the same values of T1 and T2 or run into the problem we just described.

proxy machines

There is another problem if the pair of machines at one hop uses a different set of timers than the rest of the hops. In the figure below, the first hop uses the default timers while the second hop uses T1=2s, T2=16s.

non invite 2hops

The slower retransmission schedule on the second hop made it far more likely that the overall transaction fails. The number of chances the proxy had to receive the response was much smaller than what it would have been using the default timers. When the response finally arrived, the requesting client had long given up. In general, networks with hops with mixed values of T1 will not behave as reliably as networks that share the same value of T1 across all hops.

The INVITE transaction state machines share similar, but not identical issues. We’ll explore those in a future post.

How do SIP endpoints find the right servers?

September 22nd, 2010 by Robert Sparks under SIP

When a SIP endpoint is ready to register with a service, it has the name of the service and the Address of Record (AoR) that it wants to register under. Both of these are constructed as SIP URIs. For example, my phone might register sip:Robert.Sparks@tekelec.com by sending a REGISTER request to sip:tekelec.com. It takes the domain name from that URI and uses it to start a series of DNS queries as specified in RFC 3263: Locating SIP Servers.

The algorithms specified in that RFC allow an endpoint to learn what transport (UDP, TCP, TLS over TCP, SCTP) to use, and what IP address and port to send the message to. They also give the service provider tools to provide redundancy and load-leveling. Here’s a short overview of how it works:

overview

The endpoint will first make a query for all Naming Authority Pointer (NAPTR) records for tekelec.com. These records allow service providers to advertise various services. The records that are returned might look like this:

naptr

The service field contains strings like “SIP+D2U” identifying the service being advertised – the full set of strings currently defined for SIP is:

  SIP+D2T (SIP over TCP)
  SIPS+D2T (SIP over TLS over TCP)
  SIP+D2U (SIP over UDP)
  SIP+D2S (SIP over SCTP)
  SIPS+D2S (SIP over TLS over SCTP)

As new service strings are standardized, they will be registered with IANA.

The order and service fields allow the service provider to say things like “If you support it, you must use TCP” or “Try TCP first and if that fails, try UDP”. The numbers are processed from lowest to highest. Records with lower order values are inspected first. Once a record is found with a protocol the endpoint supports, it will only consider other records with that same order value. When multiple records appear with the same order value, they are considered in preference order.

Some examples:

If these records are returned, the service is saying “If you support TCP, use that. If it fails, stop. Only try UDP if you don’t support TCP.”

  tekelec.com. IN NAPTR 10 50 "s" "SIP+D2T" "" _sip._tcp.tekelec.com.
  tekelec.com. IN NAPTR 20 50 "s" "SIP+D2U" "" _sip._udp.tekelec.com.

If the following records are returned, the service is saying “Try SCTP first if you support it. If you don’t or it fails, try TCP. If you don’t support that, or it fails, try UDP”:

  tekelec.com. IN NAPTR 50 10 "s" "SIP+D2S" "" _sip._sctp.tekelec.com.
  tekelec.com. IN NAPTR 50 20 "s" "SIP+D2T" "" _sip._tcp.tekelec.com.
  tekelec.com. IN NAPTR 50 30 "s" "SIP+D2U" "" _sip._udp.tekelec.com.

Let’s proceed assuming those last three records were returned, and that the endpoint I’m using only supports TCP and UDP. In this case, the endpoint will use the second of those three records, learning that it should use TCP and that it should take “_sip._tcp.tekelec.com” as input into the next step.
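
The selection rule can be sketched in a few lines of Python. This is only an illustration of the ordering logic, not a full RFC 3263 implementation: the `Naptr` tuple and the `select_naptr` function are names I made up for the example, and the three records are typed in by hand rather than fetched from the DNS.

```python
from typing import NamedTuple


class Naptr(NamedTuple):
    """Hypothetical container for the NAPTR fields used in this post."""
    order: int
    preference: int
    flags: str
    service: str
    replacement: str


def select_naptr(records, supported):
    """Return the usable records, best first.

    Records with unsupported services are ignored. The lowest order value
    that still has a supported service wins, and only records with that
    order value are kept, sorted by preference.
    """
    usable = sorted(
        (r for r in records if r.service in supported),
        key=lambda r: (r.order, r.preference),
    )
    if not usable:
        return []
    winning_order = usable[0].order
    return [r for r in usable if r.order == winning_order]


records = [
    Naptr(50, 10, "s", "SIP+D2S", "_sip._sctp.tekelec.com."),
    Naptr(50, 20, "s", "SIP+D2T", "_sip._tcp.tekelec.com."),
    Naptr(50, 30, "s", "SIP+D2U", "_sip._udp.tekelec.com."),
]

# An endpoint that supports only TCP and UDP skips the SCTP record.
candidates = select_naptr(records, supported={"SIP+D2T", "SIP+D2U"})
print([c.replacement for c in candidates])
```

The first candidate is the TCP record, with UDP kept as the fallback because it shares the same order value.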

The endpoint now queries the DNS for all the SRV records matching “_sip._tcp.tekelec.com”. The SRV records returned will have this form:

[Figure: SRV record format]

The endpoint will process all records ordered by the priority field, from lowest to highest. If multiple records have the same priority, the endpoint will choose randomly from them, weighting the probability of selecting a particular record using the weight field. This gives the service provider a tool to realize a form of load distribution.

Assume the following records are returned:

_sip._tcp.tekelec.com. IN SRV   10 1 5060 crowned.tekelec.com.
_sip._tcp.tekelec.com. IN SRV   10 1 5065 crested.tekelec.com.
_sip._tcp.tekelec.com. IN SRV   10 2 6065 golden.tekelec.com.

The endpoint will randomly choose crowned.tekelec.com 1/4 of the time, crested.tekelec.com 1/4 of the time, and golden.tekelec.com 2/4 = 1/2 of the time. Let’s assume the random selection chose crested.tekelec.com. The endpoint knows to use port 5065 when it sends its request.
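
Here is a small Python sketch of that weighted choice, using the three records above. The `Srv` tuple and the function names are invented for the example. RFC 2782 spells out its own procedure for ordering records by weight; I am substituting Python's `random.choices` as a simpler stand-in that gives the same odds for the first pick, and I am assuming at least one record in the group has a non-zero weight.

```python
import random
from fractions import Fraction
from typing import NamedTuple


class Srv(NamedTuple):
    """Hypothetical container for the SRV fields used in this post."""
    priority: int
    weight: int
    port: int
    target: str


def best_priority_group(records):
    """Return the records that share the lowest priority value."""
    lowest = min(r.priority for r in records)
    return [r for r in records if r.priority == lowest]


def selection_odds(records):
    """Map each target in the best group to its chance of being picked."""
    group = best_priority_group(records)
    total = sum(r.weight for r in group)  # assumed to be non-zero
    return {r.target: Fraction(r.weight, total) for r in group}


def pick_srv(records, rng=random):
    """Pick one record from the best group, weighted by the weight field."""
    group = best_priority_group(records)
    return rng.choices(group, weights=[r.weight for r in group], k=1)[0]


records = [
    Srv(10, 1, 5060, "crowned.tekelec.com."),
    Srv(10, 1, 5065, "crested.tekelec.com."),
    Srv(10, 2, 6065, "golden.tekelec.com."),
]

for target, odds in selection_odds(records).items():
    print(target, odds)

chosen = pick_srv(records)
print("send to", chosen.target, "port", chosen.port)
```

Records with a higher priority value never enter the draw here; a real endpoint would fall back to them only after every record in the better group has failed.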

Finally, the endpoint looks up A or AAAA (depending on whether it is using IPv4 or IPv6) records for crested.tekelec.com, yielding the IP address to send the request to.

At this point the endpoint has the information it needs to send the request to the right server.

That example assumed that the endpoint was starting with a SIP URI. The RFC 3263 steps are the same whether the endpoint is preparing to send a REGISTER request or an INVITE request to start a call.

Sometimes, the endpoint starts with an E.164-formatted telephone number instead of a SIP URI. The ENUM specs define how to convert that telephone number into a URI. Once the endpoint has performed that conversion, it follows the same RFC 3263 algorithm discussed above, starting with that URI, to find the server to contact.
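
The first step of the ENUM conversion is mechanical: strip everything but the digits, reverse them, separate them with dots, and append e164.arpa. The endpoint then queries NAPTR records at that name to obtain the URI. The Python sketch below covers only the name construction; the function name and the telephone number are made up for the example, and the NAPTR lookup and rewrite that follow are not shown.

```python
def enum_domain(e164_number: str) -> str:
    """Build the ENUM query name for an E.164 number such as '+1-555-555-0123'."""
    digits = [ch for ch in e164_number if ch in "0123456789"]
    if not digits:
        raise ValueError("no digits in telephone number")
    # Reverse the digits so the most specific part of the number comes first,
    # matching how DNS names are delegated from right to left.
    return ".".join(reversed(digits)) + ".e164.arpa."


print(enum_domain("+1-555-555-0123"))
```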

Enabling Location-Based Services while Protecting Privacy

August 19th, 2010 by Robert Sparks under SIP

An increasing number of portable devices (such as cell-phones) are becoming location-aware. Services such as restaurant finders, turn-by-turn navigation tools, and social networking sites are already leveraging any location information these devices provide. So far, these services primarily use custom, proprietary mechanisms to convey location information from the devices to the application.

Standard mechanisms for representing and conveying location information have been defined. These standards recognize that carrying a simple geospatial (latitude-longitude) coordinate or a civic address isn’t sufficient. It’s also important to indicate how this location may be used. The IETF’s GEOPRIV working group has specified a Location Object format that addresses that concern, as well as a rich policy language that allows a user to control who can see their location, and with what precision it is exposed. There are many challenges related to privacy-protection in location systems, and providing this control over location precision is one of the tougher ones. It is difficult to design a system that doesn’t expose more information than intended.

Let’s look at a couple of examples where applications are using the location of a given user. These applications will query (or subscribe to) the user’s location service, which could be a network-hosted service that communicates with the user’s location-aware devices, or could be one of the actual devices.

Suppose Mary, a user in the United States, chooses an “only expose what state I’m in” policy. The simplest implementation of that policy would be to tell any asking application what state Mary’s in whenever it asks. Mary’s expectations of privacy are easily met as long as she primarily moves around within one state. But with that simple implementation, when Mary crosses a border into another state, applications learn much more than what state she’s in – they know which border she’s near. In some situations, that may be enough to deduce her location with under a mile’s worth of uncertainty. For instance, if she were to travel along US 160 from Arizona into Colorado, applications would see her location transition from Arizona to New Mexico, and then to Colorado. The interval between those transitions gives the application a good estimate of her speed, and knowing she’s traveling at highway speeds, the applications can be fairly confident she’s on 160 (there are no other roads that would allow a transition between those states with that timing).

One way for the location service to respect Mary’s privacy requirements in this case would be to obfuscate the transitions between the states in time, perhaps not exposing the brief transition through New Mexico at all.
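
To make that idea concrete, here is one possible way to blur transitions in time, sketched in Python. This is my own illustration, not an algorithm from GEOPRIV: the function name, the hold times, and the sample timeline are all assumptions. A new state is reported only after Mary has stayed in it for a randomly chosen hold time, so a brief pass through a state is never exposed.

```python
import random


def reported_states(observations, min_hold, max_hold, rng):
    """Yield (time, state) reports for time-ordered (time, state) observations.

    A new state is reported only once it has been observed continuously
    for a random hold time between min_hold and max_hold seconds.
    """
    reported = None   # state most recently told to applications
    candidate = None  # new state waiting out its hold time
    since = None      # time the candidate was first observed
    hold = None       # hold time drawn for the current candidate
    for time, state in observations:
        if reported is None:
            reported = state
            yield time, state
            continue
        if state == reported:
            candidate = None  # back in the reported state; forget the candidate
            continue
        if state != candidate:
            candidate, since = state, time
            hold = rng.uniform(min_hold, max_hold)
        if time - since >= hold:
            reported, candidate = state, None
            yield time, state


# Hypothetical timeline sampled once a minute: a one-minute pass through
# New Mexico between long stretches in Arizona and Colorado.
timeline = (
    [(t, "Arizona") for t in range(0, 960, 60)]
    + [(960, "New Mexico")]
    + [(t, "Colorado") for t in range(1020, 4000, 60)]
)

for time, state in reported_states(timeline, 600, 1200, random.Random()):
    print(time, state)
```

Applications see Arizona and, some minutes later, Colorado, with a delay they cannot predict. Whether that is enough protection depends on what else the application can observe.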

Bob might choose a seemingly easier policy to implement – “show where I am, but only to within 100 meters”. A simple implementation of that policy would be to expose Bob’s location as a circle with a radius of 100 meters, covering Bob’s current location, but randomly centered somewhere around Bob’s actual location.

The location service would return that circle as long as Bob’s actual location is inside it.

When Bob leaves the circle, the location service generates a new covering circle.

Unfortunately, if the application knows this is how the location server implements Bob’s privacy requirement, it just learned Bob’s location much more precisely than to within 100 meters. The problem is the application knows that Bob just left the old circle, so he is somewhere close to the edge of it, and is within the new circle, so the application has a very good idea of where Bob actually is.
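
A short simulation shows how much is given away. The Python sketch below implements the simple covering-circle policy under assumptions I chose for illustration: Bob walks due east, the service samples his position every 5 meters, and all function names are invented. At the moment the circle changes, Bob can be no farther past the old circle's edge than one sampling step.

```python
import math
import random

RADIUS = 100.0  # meters, the precision Bob asked for


def distance(a, b):
    """Straight-line distance between two (x, y) points in meters."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def new_covering_circle(pos, rng):
    """Return the center of a circle of RADIUS that covers pos at a random offset."""
    angle = rng.uniform(0, 2 * math.pi)
    offset = RADIUS * math.sqrt(rng.random())  # uniform over the disc
    return (pos[0] + offset * math.cos(angle), pos[1] + offset * math.sin(angle))


def walk_until_circle_changes(start, step, rng):
    """Walk east in fixed steps and stop at the first circle change.

    Returns (old_center, new_center, true_position).
    """
    pos = start
    center = new_covering_circle(pos, rng)
    while True:
        pos = (pos[0] + step, pos[1])
        if distance(pos, center) > RADIUS:
            return center, new_covering_circle(pos, rng), pos


old, new, bob = walk_until_circle_changes((0.0, 0.0), 5.0, random.Random())
overshoot = distance(bob, old) - RADIUS
print(f"Bob is {overshoot:.1f} m outside the old circle and inside the new one")
```

An application that sees both circles can narrow Bob down to the thin sliver of the new circle that hugs the old circle's edge, which is far smaller than a 100 meter circle.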

So, this simple implementation is insufficient to respect Bob’s privacy requirement – a more intricate algorithm will be required. While different location servers do not need to use the same method, a well-known algorithm with good privacy preserving properties would be very valuable. Discussions of a standard algorithm to satisfy this kind of requirement are underway in the GEOPRIV working group.
