Session Initiation Protocol
Tuomas Nurmela
University of Helsinki
Seminar on Transport of Multimedia Streams in Wireless Internet
Tuomas.Nurmela@teliasonera
Abstract—Session Initiation Protocol, SIP, provides control-plane signaling for IP networks. SIP enables initiating, modifying and terminating sessions for a user while remaining neutral to physical media capabilities and using other protocols to negotiate these. SIP assumes that the transport layer is inherently unreliable and therefore provides its own transport mechanisms at the application layer. For target device discovery, SIP requires application-layer routing. Besides these, the protocol is extensible and has already been extended to support the IETF presence framework and instant messaging. However, to perform in its core area, IP telephony call signaling, with regard to PSTN-IP telephony integration the protocol requires further work, especially in the area of emergency calls. 3GPP has decided to use SIP for signaling, and work is ongoing to meet 3GPP network and IP multimedia system requirements.
Keywords— SIP, SIPPING, state of standardization of SIP, SDP, session application layer routing, emergency calls
I. INTRODUCTION
New multimedia application needs drive the development of new functionality in IP networks. This is coupled with continuing pressure to enable IP-based Internet telephony, so that network providers can avoid replacing the aging telephony networks with new dedicated hardware. All this coincides with a huge amount of Internet fiber overcapacity for which demand is hard to find.
The IETF has been trying to adjust to these new conditions. In response to the challenges, it has been developing a new multimedia architecture. The architecture aims to be flexible enough to support various application needs, and deployable, enabling an incremental transition to production by standardizing interoperability mechanisms.
The capabilities of the multimedia architecture in the Internet and in wireless networks are closely linked to IETF efforts to develop a flexible Quality of Service architecture [6] as well as to ongoing research on multicast protocols [7]. However, intra-company networks, or networks close to the core with high bandwidth overcapacity, allow limited deployment without requiring either.
Multimedia handling, with soft or hard real-time requirements depending on the target use, is complicated on packet-switched networks, since Quality of Service cannot inherently be guaranteed. However, packet switching is essential in order to maintain a scalable, fault-tolerant system without resorting to expensive special-purpose equipment. The new architecture is an evolutionary step extending the current TCP/IP protocol family. It could be said to be somewhat of a compromise, forcing middleware and systems engineers to choose the correct combination across the whole stack, instead of safely choosing one of the alternative transport-layer protocols and being done with it, as in the basic Internet TCP/IP architecture.
The Session Initiation Protocol, SIP [19], is one of the protocols used in the IETF multimedia architecture. The architecture includes a number of other protocols, such as the Real-time Transport Protocol (RTP) [1] for transporting real-time data and providing QoS feedback, the Real-Time Streaming Protocol (RTSP) [2] for controlling the delivery of streaming media, the Media Gateway Control Protocol (MGCP) [3] or Megaco, jointly developed by the ITU-T and the IETF and also called H.248 [4] [5], for controlling gateways to the Public Switched Telephone Network (PSTN), and the Session Description Protocol (SDP) [10] for describing multimedia sessions.
SIP is a text-based application-layer control protocol that is mainly used to establish, modify and terminate multimedia Internet telephony calls. However, SIP is not limited to specific devices, also supporting pagers, laptops etc., nor to a specific call type, supporting one-to-one calls as well as multiparty conferences. While applications of SIP mainly involve human-to-human communication, the SIP design clearly addresses the needs of a generic user, enabling anything with an address to participate.
The establishment phase covers locating the user, negotiating whether the party wishes to accept the call, and determining the supported and required features of the communicating parties and the communication media. SIP does not define the actual session attributes, treating them as opaque payload data in order to remain independent of the communication media capabilities. Modification of a session includes changing session parameters, inviting additional participants to conference calls and invoking available data-plane services.
Currently SIP is intended to address multiple needs: these include special IP telephony needs, such as supporting caller availability status changes and emergency call connectivity, as well as supporting the IETF presence framework and new applications such as instant messaging. Originally this was not the case: the primary focus was audio and video conferencing over the Internet, and carrier-grade signaling needs came later [55].
To manage SIP development in a controlled manner, the IETF currently has two main working groups (WGs). The SIP WG concentrates on the basic functionality of SIP and its extensions, ensuring that the protocol's suitability is considered in the areas where it will be applied. The Session Initiation Proposal Investigation (SIPPING) WG concentrates on evaluating and prioritizing special SIP needs and multimedia requirements, documenting SIP extension requirements and forwarding these to the SIP WG for standardization. SIP is currently at version 2, defined in 2002 in [19], three years after the initial RFC [15]; with over 100 documents (RFCs, Internet-Drafts and working papers), the question of how to manage its development still lingers.
In addition to basic SIP development, work is being done in the IP Telephony WG to integrate SIP and PSTN signaling, in the Geographic Location Privacy (GEOPRIV) WG to extend user location-based services to cover geographical location, and in the Authentication, Authorization and Accounting (AAA) WG to support and finalize SIP security issues.
Besides IETF activities, the 3rd Generation Partnership Project (3GPP) has adopted SIP as a mandatory protocol for signaling in IP multimedia services provided to 3G devices. This ensures SIP deployment on millions of phones.
Two active forums are associated with SIP: the SIP Forum [8] promotes general awareness by providing information about SIP, whereas SIP Center [9] promotes commercialization and interoperability of vendor solutions by offering technical resources, testing environments and interoperability tests.
The paper is structured as follows. Section II describes the basic functionality of SIP, including participants, messages, application-layer routing, application-layer transport mechanisms, basic flow phases and SDP. Section III describes the layered design approach of SIP and protocol properties such as security, quality of service and performance, and discusses the scope of SIP usage and additional requirements, especially in the context of emergency call support. Section IV provides a short summary of related work, including a description of APIs, key SIP extensions, key differences from related protocols such as ITU H.323 and Cisco SCCP, and short summaries of work done to integrate SIP with the PSTN and 3GPP IP multimedia systems. Section VI draws conclusions regarding SIP.
As a clarification of terminology, the paper uses call, session and conference (multiparty session) interchangeably, mostly depending on the context. Unless otherwise cited, [19] is used as the source.
II. BASICS OF SESSION INITIATION PROTOCOL
This section describes the typical participants in a SIP infrastructure. This is followed by an introduction to SIP messaging, the routing of requests during session establishment and the SIP transport mechanisms that provide reliability for UDP-based messaging. The section concludes with the basics of SDP and an example of a SIP flow.
A. SIP components
SIP has four logical entity types (user agents, registrars, redirect servers and proxies) and an abstract service known as the location service. SIP does not define how logical entities are implemented or deployed: a single SIP element can include multiple entity types. Basic network services such as DHCP (for bootstrapping) and DNS (for resolving names to IP address, port and transport protocol) [21] are also required. Each actively participating entity is said to have a core, i.e. an identity. The abstract location service, on the other hand, is used by SIP but not defined by it. Figure 1 shows a possible configuration of a SIP-capable network.
Figure 1: SIP entities and basic network services
User agents (UA) have two roles: a client (UAC) that issues requests and receives responses, and a server (UAS) that receives requests directed to it and issues responses by accepting, rejecting or redirecting the request. A SIP user can be represented by multiple SIP addresses, each of which can point to multiple devices, and a device can be reachable through multiple SIP addresses.
The SIP address is similar to an email address and is assumed to remain stable relative to how it is defined: it can be given by a network provider (e.g. tuomas@inet.fi), relate to one's work role (e.g. admins@cs.helsinki.fi) or to an affiliated organization (e.g. tuomas@sonera.com). The address changes when the user changes network provider, moves to another job or changes organization, but not necessarily when the user switches location. For temporary changes of location, the user can have multiple SIP addresses and redirect calls to the current location.
As a SIP address can relate to multiple devices concurrently, a SIP request has to be able to fork. This is something that no other signaling protocol currently does. User agents must therefore be implemented so that they can manage multiple responses to a single request, although under normal one-to-one circumstances they receive only a single response to a request.
Registrars are responsible for maintaining user agent access information. User agents inform them of changes with a specific request containing the SIP address and the contact IP addresses of the devices bound to that SIP address. A registrar accepts requests targeted at SIP addresses within its managed domain and passes them on to the Location Service, which maintains this information.
Proxy servers are intermediaries used mainly for routing requests to another entity that is closer to the final target than the proxy itself. Proxies also allow policy enforcement and rerouting of requests.
One way of classifying proxies is by their location on the path from the UAC to the target UAS. The proxy closest to the UAC is the outbound proxy, while the proxy closest to the target UAS is the inbound proxy. All proxies between these two are intermediate proxies.
Another way of classifying proxies is by statefulness. Stateless proxies simply forward requests and responses without generating new request or response messages themselves. Stateful proxies, on the other hand, act as UASs: they respond to UAC requests with the best of the responses received from possible UASs, i.e. the one closest to the UAC's requirements. To collect multiple answers, a stateful proxy can fork the original UA request to two or more destinations. Forking can be done in unicast or in multicast to provide better support for automatic call distribution (ACD) systems. The stateful proxy gathers responses in a response context, from which it chooses the final response based on its response precedence rule set; the "best" responses are those that allow the UAC to continue the session establishment process. Once a suitable response has been selected, the proxy can cancel the outstanding branches in order to keep state management in the SIP network to a minimum.
Stateful proxies can be further divided into call stateful proxies, which maintain the state of the entire call from its establishment to its termination, and transaction stateful proxies, which maintain state for at least a single request of the UAC. All call stateful proxies are therefore transaction stateful, but the reverse does not apply. The concept of a transaction in SIP is explained later in Section III.C.
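The response precedence rule of a stateful proxy can be sketched as follows. This is a minimal sketch assuming the rules of RFC 3261 (a 2xx is forwarded at once; otherwise any 6xx is preferred, else a response from the lowest class seen); the function name and the plain list of status codes standing in for the response context are illustrative assumptions. Real proxies apply further preferences within a class, for example favoring responses that let the request be retried with credentials.

```python
def choose_best_response(final_codes):
    """Pick the final status code a stateful proxy forwards upstream.

    `final_codes` stands in for the response context: the final responses
    (>= 200) collected from the forked branches, in arrival order.
    """
    if not final_codes:
        raise ValueError("response context is empty")
    # A 2xx is forwarded as soon as it arrives.
    for code in final_codes:
        if 200 <= code < 300:
            return code
    # Otherwise a global failure (6xx) takes precedence ...
    for code in final_codes:
        if code >= 600:
            return code
    # ... else any response from the lowest class seen (3xx before 4xx before 5xx).
    lowest = min(code // 100 for code in final_codes)
    return next(code for code in final_codes if code // 100 == lowest)


print(choose_best_response([486, 302, 404]))  # 302
print(choose_best_response([404, 603, 302]))  # 603
```

A 6xx wins over a 3xx here because a global failure states that the request will fail at any destination, so trying the redirect target is pointless.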
Redirect servers redirect requests to contacts of UAs that are outside the registrar's domain. Redirect servers can be used to redirect callers to another SIP address, so that a caller does not need to know all the SIP addresses of the target user. Redirection is signaled with a specific status code (3xx) in the reply, as in HTTP/1.1 [63].
Besides user availability, it must be remembered that proxies are the centralized component of the SIP architecture. A single proxy is likely to handle small to medium deployments, but multiple proxies are likely to be needed in large domains. To alleviate possible bottlenecks, redirect servers can provide another SIP address for the UAS in order to direct the UAC to another proxy path.
The Location Service is a database that contains bindings from SIP addresses to lists of contact IP addresses. It is used by proxies and redirect servers to locate the UAS and by the registrar to update UA location information. The location service also maintains user-level availability and preferences as well as contact-address-specific capabilities. The contact-address-specific state of a device (e.g. whether it is muted or disconnected) is not maintained in the location service.
B. SIP Messages
The form of SIP request and response messages resembles HTTP/1.1: each message consists of a request line (for requests) or a status line (for responses), header fields and an optional entity body.
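The HTTP-like layout can be illustrated with a minimal Python parser: a start line, header fields as `name: value` lines, an empty line, then the optional body. The sketch assumes CRLF line endings and keeps header fields as an ordered list, because the relative order of stackable fields such as Via matters; the function names and the example message (with example.com/example.org hosts) are hypothetical, and the parser ignores much of the full SIP grammar.

```python
def parse_sip_message(raw):
    """Split a raw SIP message into (start_line, headers, body).

    Assumes CRLF line endings. Headers are returned as an ordered list of
    (name, value) pairs so that repeated fields such as Via keep their order.
    """
    head, _, body = raw.partition("\r\n\r\n")  # the empty line ends the header section
    lines = head.split("\r\n")
    start_line, headers = lines[0], []
    for line in lines[1:]:
        if line[:1] in (" ", "\t") and headers:
            # Folded continuation line: append it to the previous field value.
            name, value = headers[-1]
            headers[-1] = (name, value + " " + line.strip())
            continue
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return start_line, headers, body


def is_response(start_line):
    """Responses start with a status line such as 'SIP/2.0 180 Ringing'."""
    return start_line.startswith("SIP/2.0 ")


example = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP pc33.example.org;branch=z9hG4bK776asdhds\r\n"
    "Max-Forwards: 70\r\n"
    "To: Bob <sip:bob@example.com>\r\n"
    "From: Alice <sip:alice@example.org>;tag=1928301774\r\n"
    "Call-ID: a84b4c76e66710@pc33.example.org\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

start, headers, body = parse_sip_message(example)
print(start)               # INVITE sip:bob@example.com SIP/2.0
print(is_response(start))  # False
print(len(headers))        # 7
```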
Basic SIP 2.0 defines six request methods:
REGISTER-request is used by the UA to provide location information. The request is sent periodically to a registrar, which updates the Location Service.
INVITE-request is used to establish a session. Because an invitation can be followed by a long pause before the target party answers the phone, the method is linked to a separate reliability mechanism provided by the ACK-request.
ACK-request is used by the caller to confirm the INVITE-request exchange to the UAS, somewhat analogous to the TCP three-way handshake. The method is used regardless of the transport protocol.
OPTIONS-request enables negotiating session options without establishing a session. This supports both caller preferences (e.g. if one is in the shower and a phone with video capability rings, one may want to turn video transfer off despite the phone's capabilities; likewise, choosing a language can be useful for text-based communication or when calling a work role-based SIP address) and device preferences (e.g. which authentication protocol should be used or which algorithm is used for payload compression).
CANCEL-request is used to cancel pending requests, for example the outstanding branches of a forked request. The method does not affect an already established session.
BYE-request is used to terminate a session. The request is valid if the requestor has already established the session or is negotiating its establishment.
Extensions to SIP define a SUBSCRIBE-request, used to indicate interest in knowing when a party is available, and a NOTIFY-request for informing of status changes [16]. Implementations supporting these additional messages can automatically inform a person when the called party becomes available, by sending a NOTIFY-request after the party's state has been modified by a REGISTER-request. To work seamlessly with the PSTN, a separate PSTN-Internet Networking (PINT) Server logical entity is defined to communicate the methods to and from the PSTN and VoIP networks.
Additional extensions define the UPDATE-request [23], which is used to modify the session either during the INVITE-request exchange before the final ACK or after the INVITE-request has been resolved. However, since the UPDATE-request is not allowed to affect dialogue state (see below), specific rules govern its use.
Header fields carry caller identifiers, content type, loop-prevention information, message-ordering information, party identifiers and SIP routing information. Header fields can often be expressed in a compact form and do not require a specific order (except for the internal order of stackable header fields, such as those used in routing). Certain header fields can contain parameters that identify extensions to SIP. While standardization of parameters supports interoperability between vendors, parameters also provide a means for proprietary functionality.
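The compact forms and the ordering rule can be illustrated with a small normalization step. This is a sketch assuming the single-letter compact names listed in RFC 3261; the helper names are hypothetical. Grouping fields by their canonical name shows why the order of different header fields is irrelevant while the order within a stackable field such as Via is preserved.

```python
# Single-letter compact forms of header field names listed in RFC 3261.
COMPACT_FORMS = {
    "i": "Call-ID", "m": "Contact", "e": "Content-Encoding",
    "l": "Content-Length", "c": "Content-Type", "f": "From",
    "s": "Subject", "k": "Supported", "t": "To", "v": "Via",
}
# Header field names are case-insensitive; map lowercase names to one spelling.
_CANONICAL = {full.lower(): full for full in COMPACT_FORMS.values()}


def canonical_header_name(name):
    key = name.strip().lower()
    if key in COMPACT_FORMS:
        return COMPACT_FORMS[key]
    # Unknown (e.g. extension) header names are kept as written.
    return _CANONICAL.get(key, name.strip())


def group_headers(headers):
    """Group (name, value) pairs by canonical name, keeping per-field order."""
    grouped = {}
    for name, value in headers:
        grouped.setdefault(canonical_header_name(name), []).append(value)
    return grouped


fields = [("v", "SIP/2.0/UDP proxy.example.com;branch=z9hG4bKb"),
          ("t", "<sip:bob@example.com>"),
          ("Via", "SIP/2.0/UDP pc.example.org;branch=z9hG4bKa")]
print(group_headers(fields)["Via"])  # both Via values, topmost first
```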
Currently, the most commonly used parameter is the tag parameter, which appears in the From- and To-header fields as a random local session identifier. Together with the globally unique identifier in the Call-ID header field, the two tags identify a peer-to-peer relationship called a dialogue. Since UDP cannot establish flow state, dialogue identifiers are used together with a separate message-ordering mechanism to help with message sequencing and with proper routing, as intermediate SIP elements can distinguish SIP dialogue state [19, pp.69].
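Which tag counts as "local" depends on the side of the dialogue. Following RFC 3261, a dialogue is identified by the Call-ID together with a local and a remote tag: for the UAC the local tag is the From tag, and for the UAS it is the To tag. The sketch below assumes the header values are already available as strings; `tag_of` and `dialogue_id` are hypothetical, simplified helpers rather than a full implementation of the SIP grammar.

```python
from typing import NamedTuple, Optional


class DialogueId(NamedTuple):
    call_id: str
    local_tag: str
    remote_tag: str


def tag_of(header_value: str) -> Optional[str]:
    """Return the tag parameter of a From/To value, or None if absent."""
    # Header parameters follow the closing '>' of a name-addr, if there is one.
    params = header_value.rsplit(">", 1)[-1]
    for param in params.split(";")[1:]:
        key, _, value = param.partition("=")
        if key.strip().lower() == "tag":
            return value.strip()
    return None


def dialogue_id(call_id: str, from_value: str, to_value: str, is_uac: bool) -> DialogueId:
    from_tag, to_tag = tag_of(from_value), tag_of(to_value)
    if from_tag is None or to_tag is None:
        raise ValueError("both tags are needed to identify a dialogue")
    # The UAC's local tag is the From tag; the UAS sees the same pair mirrored.
    if is_uac:
        return DialogueId(call_id, from_tag, to_tag)
    return DialogueId(call_id, to_tag, from_tag)


call = "a84b4c76e66710@pc33.example.org"
frm = "Alice <sip:alice@example.org>;tag=1928301774"
to = "Bob <sip:bob@example.com>;tag=a6c85cf"
uac = dialogue_id(call, frm, to, is_uac=True)
uas = dialogue_id(call, frm, to, is_uac=False)
print(uac.local_tag == uas.remote_tag)  # True
```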
Section II.C contains further information on how application-layer routing is performed with header fields. Section II.D provides information on the header field used for message ordering.
Response messages are divided into provisional and final responses based on the status code. Provisional responses (status codes 1xx) indicate that the request was received and is being processed; for example, they let the requesting party know that the VoIP phone is ringing at the other end of the line. Basic SIP has no reliability mechanism for provisional responses. Final responses (status codes 2xx-6xx) indicate the resolution of a UAC request. They are divided into successes (2xx), redirects (3xx) and different types of failures (4xx-6xx), which either suggest trying again (possibly later) or indicate a global inability to provide the service.
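Since the class is determined by the first digit of the status code, the classification can be expressed as a small lookup. The class labels follow the names used in RFC 3261 for 4xx-6xx (client error, server error, global failure); the function names are illustrative.

```python
STATUS_CLASSES = {
    1: "provisional",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
    6: "global failure",
}


def classify_status(code: int) -> str:
    if not 100 <= code <= 699:
        raise ValueError(f"not a SIP status code: {code}")
    return STATUS_CLASSES[code // 100]


def is_final(code: int) -> bool:
    """Everything from 2xx upwards resolves the request; 1xx only reports progress."""
    return classify_status(code) != "provisional"


for code in (180, 200, 302, 486, 503, 603):
    print(code, classify_status(code), is_final(code))
```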
An optional entity body enables carrying (control-plane) data. It can be used to create additional functionality by defining another protocol that conforms to the SIP request-response model; the entity body thus extends the application of SIP beyond parameter-based extensions to SIP itself. The entity body can use Multipurpose Internet Mail Extensions (MIME) [65] encoding to carry different content types. The MIME type has to be indicated through separate header fields in the INVITE-request or with separate OPTIONS-requests. SIP can also tunnel itself in the entity body. In this case it uses Secure MIME (S/MIME) [66] encryption to ensure privacy, although some of the header fields used for routing still need to be in clear text. The main use of the entity body is carrying SDP, discussed in Section II.E.
C. SIP application-layer routing and SIP mobility
SIP application-layer routing includes basic routing, loop prevention and mobility support. Mobility is discussed from both the Mobile IP and the SIP perspective.
SIP application-layer routing is required mainly in the call establishment phase. As application-layer routing forms an overlay network, SIP entities have no knowledge of the actual network-layer topology or even of the load on adjacent links. As a result, the path to a device that is very close at the network layer can become burdened with extra hops.
Application-layer routing is independent of the network-layer protocol. SIP is not tied to IP addressing in any way, supporting both IPv4 and IPv6. Once the parties have located each other through call establishment, the contact addresses (IP addresses) are known and no application-layer routing is required.
The SIP response route is created along the request path. Each SIP element adds a Via-header field, forming a sequential list of the hops the request has passed. The response message is routed back based on these header fields, with each SIP element removing the Via-header field it inserted before forwarding the response to the next hop. Like many peer-to-peer application-layer routing algorithms (e.g. Chord [49], CAN [50]), SIP does not use the overlay for all transfer. UAs use it only for session establishment, or more specifically for service discovery, since direct contact addresses are exchanged during session invitation. Contact addresses can be cached by UAs and stateful proxies based on their expiration information.
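The Via mechanism behaves like a stack: each element pushes its own entry as the request passes and pops it as the response returns. In the minimal sketch below, Via entries are reduced to plain element names (a simplifying assumption; real values also carry the transport, the sent-by address and a branch parameter), and the element names are hypothetical.

```python
def add_via(via_stack, element):
    """A request passes `element`, which inserts its Via above the existing ones."""
    return [element] + via_stack


def relay_response(via_stack, element):
    """`element` removes its own (topmost) Via and returns (stack, next hop)."""
    if not via_stack or via_stack[0] != element:
        raise ValueError(f"{element} is not named in the topmost Via")
    remaining = via_stack[1:]
    return remaining, (remaining[0] if remaining else None)


def response_path(via_stack):
    """Elements a response visits, starting from the Via stack the UAS received."""
    path, hop = [], via_stack[0]
    while hop is not None:
        path.append(hop)
        via_stack, hop = relay_response(via_stack, hop)
    return path


request_path = ["alice-ua", "outbound-proxy", "inbound-proxy"]
vias = []
for element in request_path:
    vias = add_via(vias, element)

print(vias)                 # what the UAS sees, topmost first
print(response_path(vias))  # the request path in reverse
```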
If the UA (the original client, a redirect server or a stateful proxy) wants to force a specific request path, it can define a list of Route-header fields, called a route set, that explicitly indicates the target and intermediate systems. Proxies can request to remain in the path by using the Record-Route header field.
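How Record-Route turns into a route set can be sketched under the rules of RFC 3261: each record-routing proxy adds its URI on top, the UAS uses the Record-Route values from the request in order as its route set, and the UAC uses the values copied into the response in reverse order. This is a minimal sketch; the helper names and the proxy URIs (with the `lr` loose-routing parameter) are hypothetical.

```python
def record_route(record_routes, proxy_uri):
    """A proxy asking to stay in the path inserts its URI above existing values."""
    return [proxy_uri] + record_routes


def uas_route_set(record_routes_in_request):
    # The UAS takes the values in the order they appear in the request.
    return list(record_routes_in_request)


def uac_route_set(record_routes_in_response):
    # The UAC reverses them, so its first hop is the proxy nearest to it.
    return list(reversed(record_routes_in_response))


rr = []
for proxy in ("sip:p1.example.org;lr", "sip:p2.example.com;lr"):  # request passes p1, then p2
    rr = record_route(rr, proxy)

print(uac_route_set(rr))  # p1 first: nearest to the caller
print(uas_route_set(rr))  # p2 first: nearest to the callee
```

Both ends thus send subsequent in-dialogue requests through the same proxies, each starting from its own side of the path.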
Symmetric response routing [30] is a critical extension to routing, as it allows UAs to be located behind NATs. According to basic SIP, UAs are expected to use public IP addresses, which are recorded together with ports in the SIP message. NATted private IP addresses are not a problem as such, since the receiver marks the (outside) IP address in a received parameter of the Via-header field if it differs from the (inside) IP address in the sent-by parameter marked by the sender. Port translation, however, is a problem: responses are sent to the port marked by the sender, and NAT boxes typically translate that port (PAT).
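The received/rport handling at the first hop can be sketched as follows. The sketch assumes the behavior of RFC 3261 (received is added when the source address differs from sent-by) and RFC 3581 (an empty rport parameter is filled in with the source port, and received is then always added). Via parameters are modeled as a plain dictionary, and all addresses and function names are hypothetical.

```python
def stamp_via(via_params, sent_by_host, src_ip, src_port):
    """Return the top Via's parameters as amended by the receiving server."""
    params = dict(via_params)  # leave the caller's dict untouched
    if "rport" in params:
        # RFC 3581: the client asked for symmetric response routing.
        params["rport"] = str(src_port)
        params["received"] = src_ip
    elif src_ip != sent_by_host:
        # RFC 3261: note the address the request actually came from.
        params["received"] = src_ip
    return params


def response_target(params, sent_by_host, sent_by_port=5060):
    """Address and port the server sends the response back to."""
    host = params.get("received", sent_by_host)
    port = int(params["rport"]) if params.get("rport") else sent_by_port
    return host, port


# A UA at 10.0.0.5:5060 behind a NAT that maps it to 203.0.113.7:40123.
plain = stamp_via({"branch": "z9hG4bK1"}, "10.0.0.5", "203.0.113.7", 40123)
symmetric = stamp_via({"branch": "z9hG4bK1", "rport": ""}, "10.0.0.5", "203.0.113.7", 40123)
print(response_target(plain, "10.0.0.5"))      # ('203.0.113.7', 5060): wrong port
print(response_target(symmetric, "10.0.0.5"))  # ('203.0.113.7', 40123): the NAT binding
```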
To work around this problem, UACs add an rport parameter to the Via-header field in the request sent to the outbound proxy. The outbound proxy places the (outside) translated port number in the rport parameter. When it receives the response, the proxy uses the received and rport parameters to send the response to the NAT binding. The UAC must therefore be able to receive on the same port it sent the request from. Additionally, with UDP the NAT binding may need to be kept alive separately, since a binding is maintained only for a minute or so when no traffic flows; alternatively, the UAC should retransmit the INVITE-request at 20 s intervals.
Loop prevention [19, pp.173] in the application layer is an optional feature in SIP. It is done with a hop count kept in the Max-Forwards header field, which each entity on the path from the UAC to the target decrements by one. The recommended initial value of Max-Forwards is 70, which is large enough to cover large SIP deployments.
Mobility support is limited in SIP, as contact address negotiation is not meant to be ongoing or to be done during a call. To look further at mobility in the SIP context, we divide the needs into Mobile IP-managed mobility and session-layer SIP mobility.
Mobile IPv4 [67] [68] uses a home agent to represent the user to the network. The home agent tunnels IP packets sent to the mobile node to a foreign agent that is located in the visited network. The foreign agent forwards these to the care-of-address allocated for the mobile node. Packets from the mobile node are routed through the foreign agent directly towards the target host, creating what is called triangle routing.
Triangle routing adds complexity to SIP application-layer routing. Since the UAS device (e.g. a laptop) has been registered to the home agent, and the home agent manages the Mobile IP connection, the location service directs all incoming calls to the home agent, which in turn tunnels the connection. The SIP UAC will always see the home agent as the UAS address. Likewise, the SIP UAS would direct INVITE-request replies to the home agent proxy. Mobile IPv4 might be required if the visited network's firewalls only permit tunneling to a foreign agent address in the network, or if the visited network has no SIP entities at all, as for UA C in Figure 2.
Figure 2: SIP entities and Mobile IP-enabled SIP host
SIP mobility [40] [37] [38] can be divided into roaming mobility, personal mobility, session mobility and service mobility. Some of these mobility scenarios can be further divided into pre-call mobility, where mobility happens before the session is established, and mid-call mobility, where the session has been established and the user must be able to maintain it during mobility. Mid-call mobility is currently an open issue under investigation. In each case, pre-call mobility is handled with SIP redirection. All approaches require that the visited network is SIP-aware.
Roaming mobility is the situation in which a user is not in the home network. To enable pre-call mobility, the UA, after obtaining an address through DHCP, registers with the visited network's registrar. After this, it registers with the home network's registrar through the visited network's outbound proxy and the home network's proxy. To avoid this double registration, the visited domain's registrar could register on behalf of the UA, or the administrative domains of the registrars could be combined. The first option requires further specification of SIP.
Personal mobility refers to a user's ability to redirect calls to any of his or her devices. This essentially corresponds to how SIP deals with addressing, as discussed in Section II.A.
Session mobility refers to, for example, moving a session from a VoIP phone to a SIP-capable mobile phone when leaving the office. This can be done in a preplanned manner, or it can be needed because the phone's battery died, which would require automatically initiated recovery; SIP does not support this. There are at least three ways to work session mobility into SIP. First, the original sender issues a new INVITE-request to the same address, which is transferred to the new device's contact addresses and negotiated normally; this requires intervention by the sending party, and the applications must of course be able to send to multiple destinations. Second, in third-party call control (3pcc), the receiver sends an INVITE-request to the new device indicating the other party's parameters. Third, a REFER-request could be sent to the other UA, indicating the new target with which a session should be negotiated. All of these require the current device to remain fully usable until the connection is torn down.
Service mobility is about keeping access to one's personal services (e.g. calendar, buddy list). It is related to extending SIP to cover signaling between different types of services and maintaining these services. REGISTER- and NOTIFY-messages are one part of the solution. Service signaling is still under investigation.
Quick session re-establishment through a proactive make-before-break mechanism is not supported in SIP or SDP. Disconnection could happen for a number of reasons, including temporary local network failure and temporary device (e.g. cell phone) failure. SIP and SDP probably treat this as an implementation issue, since the protocols avoid making assumptions about UA capabilities.
D. SIP application-layer transport mechanisms
SIP is typically transported over UDP to avoid the TCP handshake delay, although TCP support is also mandatory. The default port for both UDP and TCP is 5060, while TLS [69] [70] encrypted TCP uses port 5061.
SCTP support is still at the draft stage [33], and the draft is not currently being developed further. It should be noted that, due to application-layer routing, the choice of SIP transport-layer protocol is not an end-to-end decision but a per-hop decision at the application layer: even if the original UA uses UDP, proxies may use another transport protocol.
Basic SIP has several application-layer transport mechanisms to overcome the problems of UDP. These include message reliability, congestion management and message ordering. SIP has no mechanism for fast session re-establishment to support recovery from connectivity failure. SIP additionally supports message-level multi-homing; however, this requires taking connection reuse and symmetric response routing issues into account and is therefore out of the scope of this paper.
Reliability is handled in two different ways, depending on whether the request is an INVITE-request or some other request.
For INVITE-requests, since it can take a while before the phone is answered, the entities send provisional responses to notify the UAC that the call is being processed. In addition, the final response is separately acknowledged (ACK) by the requestor. The UAC has a retransmission timer that is initially the estimated RTT, defaulting to 500 ms, and grows exponentially. Besides the UAC retransmission timer, the overall resolution of the INVITE-request has a 3-minute timer, after which proxies automatically time out the request.
For other requests sent over UDP, the request is retransmitted whenever no answer is received within the UAC retransmission timer. This timer likewise starts at the estimated RTT and doubles with each retransmission, but is capped at 4 seconds; the transaction as a whole times out after 64*estimated RTT (32 s with the default RTT estimate).
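The two retransmission schedules for a UDP client transaction that receives no response at all can be computed as below. This is a sketch assuming the default timers of RFC 3261 (T1 = 500 ms as the RTT estimate, T2 = 4 s as the interval cap for non-INVITE requests, and a transaction timeout of 64*T1); `send_times` is a hypothetical helper. An INVITE client stops retransmitting once a provisional response arrives, which the sketch does not model.

```python
def send_times(is_invite, t1=0.5, t2=4.0):
    """Seconds after the first send at which a UDP client (re)sends its request."""
    timeout = 64 * t1  # transaction timeout: 32 s with the default T1
    times, interval = [0.0], t1
    while times[-1] + interval < timeout:
        times.append(times[-1] + interval)
        # INVITE intervals keep doubling; non-INVITE intervals are capped at T2.
        interval = interval * 2 if is_invite else min(interval * 2, t2)
    return times


print(send_times(is_invite=True))        # [0.0, 0.5, 1.5, 3.5, 7.5, 15.5, 31.5]
print(len(send_times(is_invite=False)))  # 11
```

With the defaults, an unanswered INVITE is thus sent 7 times and any other request 11 times before the transaction gives up.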
While the reliability mechanism seems to be necessary, it should be noted that if used in mobility
