As SIP networks mature (established public deployments are well beyond the one-million-subscriber mark), they begin to attract malicious attention as well. Reports on Edwin Pena, the German “midnight attack”, and the “Egyptian story” make it apparent that security, unlike in the early pioneering days of VoIP, is now of paramount importance.
Of particular importance is confidentiality: the ability to make sure that no one other than the intended recipient can listen to your phone call over the Internet. This is far from trivial, because the path between two parties is long and there are many parties along the way: your network administrator, your ISP, your ISP’s upstream provider, the backbone provider, anyone listening on the air (should you be using a wireless link), and possibly more. For many, concerns about the unlawful use of legal interception facilities and about industrial and political spying also weigh heavily. All of these potential interception points sit along the path of your voice packets. Confidentiality seems like a battle of one against a crowd.
Still, even when outnumbered, the privacy battle can be won with mathematics, provided it is wrapped in carefully designed communication protocols. Encryption techniques and protocols have been with us on the Internet for decades and make it very hard for privacy intruders to understand intercepted IP packets.
While the existence of an encrypted phone call remains apparent, the content is perfectly hidden.
In the rest of this post, I try to forecast which technology will prevail for securing the confidentiality of VoIP calls. I’m leaving the debate about the appropriateness and downsides of hard-to-break encryption for another post. One reason is simply that I believe good privacy mechanisms are too disruptive to be stopped, whether they are used for just purposes or not.
The IETF, the Internet’s standardization organization, has put enormous effort over the last decade into hard-to-break protocols for conveying encrypted traffic. In particular, SRTP (RFC 3711) emerged as a suitable protocol for transporting encrypted VoIP. While the capability to convey encrypted packets is necessary, it is not sufficient. Over time it became apparent that the piece still not well understood is actually “keying,” also known as key exchange or key agreement: the mechanism by which two or more parties agree on a “key” for encryption and decryption even in the presence of malicious parties. A variety of protocols have been debated over the years and abandoned, mostly on the grounds of complexity, which is generally considered the enemy of both security and adoption. Eventually two dueling protocols, zRTP and SRTP-DTLS, remained in play.
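Diffie-Hellman, the exchange zRTP builds on, is the textbook example of such a key agreement. The minimal Python sketch below shows the idea under loudly stated assumptions: the group (the prime 2**127 - 1 with base 3) is a toy chosen for readability rather than a standardized group, and the function names are hypothetical. Note that plain, unauthenticated DH only defeats passive eavesdroppers; resisting an active man-in-the-middle requires an extra verification step, and the two protocols take different routes to that verification.

```python
import hashlib
import secrets

# Toy group (assumption, illustration only): the Mersenne prime 2**127 - 1
# with base 3. Real protocols use standardized groups of 2048+ bits or
# elliptic curves.
P = 2**127 - 1
G = 3


def make_keypair() -> tuple[int, int]:
    """Return (private exponent, public value G^private mod P)."""
    private = secrets.randbelow(P - 3) + 2  # exponent in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def derive_key(own_private: int, peer_public: int) -> bytes:
    """Compute the shared DH secret and hash it into a 256-bit key."""
    shared = pow(peer_public, own_private, P)  # always < P < 2**127
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


# Each party keeps its private exponent and sends only the public value.
alice_priv, alice_pub = make_keypair()
bob_priv, bob_pub = make_keypair()

alice_key = derive_key(alice_priv, bob_pub)
bob_key = derive_key(bob_priv, alice_pub)
print(alice_key == bob_key)
```

Only the public values cross the network. An eavesdropper who captures both still has to solve a discrete-logarithm problem to recover the key, which is infeasible for properly sized groups (though not for this toy one).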
The zRTP protocol was invented and is advocated by Phil Zimmermann, famous for his brilliant PGP privacy software for email. SRTP-DTLS has been promoted by Eric Rescorla, known for his work on TLS. While both protocols can achieve confidentiality, zRTP appears to be a far simpler design and, therefore, the more likely winner.
zRTP uses Diffie-Hellman (DH) key exchange to create a key when a call is being set up. The DH key-exchange messages are multiplexed into the RTP media stream itself. That, in a nutshell, is all it does; its beauty and power lie in its simplicity. As a result, it requires no enhancements to the signaling infrastructure and bypasses problems that have troubled SIP deployments since their inception, NAT traversal in particular: wherever the media can flow, the key exchange can flow too. Because it is rather straightforward, several implementations are publicly available today for use, testing, and expert audits.
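Because zRTP messages share the media’s UDP port, an endpoint must tell them apart from ordinary RTP/SRTP packets. Here is a minimal sketch, assuming the packet layout that was later standardized in RFC 6189: a zRTP packet starts with two zero bits and carries the 32-bit magic cookie 0x5A525450 (ASCII "ZRTP") at byte offset 4, where RTP carries its timestamp, while RTP always carries version 2 in its first two bits. `classify_packet` is a hypothetical name, and a real stack may need to recognize further packet types on the same port.

```python
import struct

# Magic cookie assumed from RFC 6189: the ASCII bytes "ZRTP".
ZRTP_MAGIC_COOKIE = 0x5A525450


def classify_packet(packet: bytes) -> str:
    """Classify a UDP payload that arrived on the media port."""
    if len(packet) < 12:  # shorter than a 12-byte RTP/zRTP header
        return "unknown"
    version = packet[0] >> 6  # top two bits of the first byte
    if version == 2:
        return "rtp"  # RTP or SRTP: version field is always 2
    (cookie,) = struct.unpack_from("!I", packet, 4)  # bytes 4..7, big-endian
    if version == 0 and cookie == ZRTP_MAGIC_COOKIE:
        return "zrtp"
    return "unknown"


# Hand-built headers for illustration (field values are arbitrary).
rtp_header = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x1234)
zrtp_header = struct.pack("!HHII", 0x1000, 1, ZRTP_MAGIC_COOKIE, 0x1234)
print(classify_packet(rtp_header), classify_packet(zrtp_header))
```

The decision is made per packet on the media path alone, which is why no signaling changes are needed to carry the key exchange.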
The competing protocol, SRTP-DTLS, takes a more sophisticated approach to key exchange that relies on Public Key Infrastructure (PKI). It leans on SIP negotiation features and needs further security protocols, such as the SIP Identity extensions (RFC 4474) and Connected Identity (RFC 4916). These extensions have so far seen very limited adoption, if any, presumably because of their use of PKI and their excessive message integrity protection. The gain is that these protocols (if used and deployed together) buy us a notion of identity: we not only know that we are speaking in private, but also that a trusted provider vouches for the identity of the other party. Isn’t it comforting to know you are actually having a private conversation with a member of your family?
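To make “leans on SIP negotiation” concrete: in the DTLS-SRTP framework (later published as RFC 5763 and RFC 5764), each endpoint advertises a hash of its DTLS certificate in the SDP body of its SIP messages, using the `a=fingerprint` attribute from RFC 4572, and checks the certificate presented in the media-path DTLS handshake against it. RFC 4474 Identity is what lets a provider sign over that SDP so the fingerprint cannot be swapped in transit. The sketch below only formats such a line; the helper name is hypothetical and placeholder bytes stand in for a real DER-encoded certificate.

```python
import hashlib


def sdp_fingerprint(cert_der: bytes) -> str:
    """Format an SDP a=fingerprint attribute for a DER-encoded certificate."""
    digest = hashlib.sha256(cert_der).hexdigest().upper()
    # RFC 4572 style: uppercase hex byte pairs separated by colons.
    pairs = ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
    return f"a=fingerprint:sha-256 {pairs}"


# Placeholder bytes (assumption) stand in for a real certificate.
line = sdp_fingerprint(b"placeholder certificate bytes")
print(line)
```

The fingerprint is only as trustworthy as the signaling that carries it, which is why this design pulls in the identity extensions discussed above.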
While the notion of identity appears appealing, it is not yet clear whether it provides much value. It requires additional security vehicles that are not yet in place, and can therefore hinder deployment of the system as a whole. Even then, it has its limits. For example, it cannot reliably ensure that an office-mate won’t answer a phone call meant for your family member and impersonate her. In fact, the notion of identity on the Internet is still rather vague after all these years, and it will take a long time until we have a viable sort of “DNA match” for a VoIP call. In many real-world cases, using common sense to identify the peer on a call seems “good enough”.
What remains, though, is the argument of complexity, which strongly favors zRTP. We have learned in the past that simplicity is the winning ingredient. Simplicity keeps the number of boxes, configuration files, and other pieces to learn, buy, and manage at a minimum. At the same time, having few dependencies on other functions keeps the system simple and makes it far more resilient against failures, interop problems, and security attacks.
Perhaps even more important from the adoption point of view is the number of parties needed to support the functionality. Keeping it as low as one is the key to success: you don’t have to run from one vendor to another, request commitments and roadmaps, and pray they will stay mutually in sync.
I personally believe that simplicity was THE ingredient that allowed Ethernet to win its race against Token Ring, SMTP against X.400, and IP against the ISO protocol suite. Simplicity goes hand in hand with rapid adoption, public and proprietary implementations, field experience, public audits, and subsequent improvements targeting real problems. My observation is that by the time simple technologies reach V2.0, their more complex (and possibly more “perfect”) counterparts are still at V1.0, fighting for adoption.
In conclusion, I think that zRTP’s amazingly simple design (and simple in no way means simplistic!) will very quickly make it the confidentiality protocol of choice. Simplicity is just too appealing and disruptive to be ignored when new protocols are adopted.