
Network Working Group                                         J. Sjoberg
Request for Comments: 4867                                 M. Westerlund
Obsoletes: 3267                                                 Ericsson
Category: Standards Track                                   A. Lakaniemi
                                                                   Nokia
                                                                  Q. Xie
                                                                Motorola
                                                              April 2007

RTP Payload Format and File Storage Format for the
Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB)
Audio Codecs

Status of This Memo

This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The IETF Trust (2007).

Abstract

This document specifies a Real-time Transport Protocol (RTP) payload
format to be used for Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate
Wideband (AMR-WB) encoded speech signals. The payload format is
designed to be able to interoperate with existing AMR and AMR-WB
transport formats on non-IP networks. In addition, a file format is
specified for transport of AMR and AMR-WB speech data in storage mode
applications such as email. Two separate media type registrations are
included, one for AMR and one for AMR-WB, specifying use of both the
RTP payload format and the storage format. This document obsoletes
RFC 3267.

Sjoberg, et al. Standards Track [Page 1]

Table of Contents

1. Introduction
2. Conventions and Acronyms
3. Background on AMR/AMR-WB and Design Principles
   3.1. The Adaptive Multi-Rate (AMR) Speech Codec
   3.2. The Adaptive Multi-Rate Wideband (AMR-WB) Speech Codec
   3.3. Multi-Rate Encoding and Mode Adaptation
   3.4. Voice Activity Detection and Discontinuous Transmission
   3.5. Support for Multi-Channel Session
   3.6. Unequal Bit-Error Detection and Protection
        3.6.1. Applying UEP and UED in an IP Network
   3.7. Robustness against Packet Loss
        3.7.1. Use of Forward Error Correction (FEC)
        3.7.2. Use of Frame Interleaving
   3.8. Bandwidth-Efficient or Octet-Aligned Mode
   3.9. AMR or AMR-WB Speech over IP Scenarios
4. AMR and AMR-WB RTP Payload Formats
   4.1. RTP Header Usage
   4.2. Payload Structure
   4.3. Bandwidth-Efficient Mode
        4.3.1. The Payload Header
        4.3.2. The Payload Table of Contents
        4.3.3. Speech Data
        4.3.4. Algorithm for Forming the Payload
        4.3.5. Payload Examples
               4.3.5.1. Single-Channel Payload Carrying a Single Frame
               4.3.5.2. Single-Channel Payload Carrying Multiple Frames
               4.3.5.3. Multi-Channel Payload Carrying Multiple Frames
   4.4. Octet-Aligned Mode
        4.4.1. The Payload Header
        4.4.2. The Payload Table of Contents and Frame CRCs
               4.4.2.1. Use of Frame CRC for UED over IP
        4.4.3. Speech Data
        4.4.4. Methods for Forming the Payload
        4.4.5. Payload Examples
               4.4.5.1. Basic Single-Channel Payload Carrying
                        Multiple Frames
               4.4.5.2. Two-Channel Payload with CRC, Interleaving,
                        and Robust Sorting
   4.5. Implementation Considerations
        4.5.1. Decoding Validation
5. AMR and AMR-WB Storage Format
   5.1. Single-Channel Header
   5.2. Multi-Channel Header
   5.3. Speech Frames
6. Congestion Control
7. Security Considerations
   7.1. Confidentiality
   7.2. Authentication and Integrity
8. Payload Format Parameters
   8.1. AMR Media Type Registration
   8.2. AMR-WB Media Type Registration
   8.3. Mapping Media Type Parameters into SDP
        8.3.1. Offer-Answer Model Considerations
        8.3.2. Usage of Declarative SDP
        8.3.3. Examples
9. IANA Considerations
10. Changes from RFC 3267
11. Acknowledgements
12. References
    12.1. Normative References
    12.2. Informative References

1. Introduction

This document obsoletes RFC 3267 and extends that specification with
offer/answer rules. See Section 10 for the changes made to this
format in relation to RFC 3267.

This document specifies the payload format for packetization of AMR
and AMR-WB encoded speech signals into the Real-time Transport
Protocol (RTP) [8]. The payload format supports transmission of
multiple channels, multiple frames per payload, the use of fast codec
mode adaptation, robustness against packet loss and bit errors, and
interoperation with existing AMR and AMR-WB transport formats on
non-IP networks, as described in Section 3.

The payload format itself is specified in Section 4. A related file
format is specified in Section 5 for transport of AMR and AMR-WB
speech data in storage mode applications such as email. In Section
8, two separate media type registrations are provided, one for AMR
and one for AMR-WB.

Even though this RTP payload format definition supports the transport
of both AMR and AMR-WB speech, it is important to remember that AMR
and AMR-WB are two different codecs and they are always handled as
different payload types in RTP.

2. Conventions and Acronyms

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [5].

The following acronyms are used in this document:

3GPP   - the Third Generation Partnership Project
AMR    - Adaptive Multi-Rate (Codec)
AMR-WB - Adaptive Multi-Rate Wideband (Codec)
CMR    - Codec Mode Request
CN     - Comfort Noise
DTX    - Discontinuous Transmission
ETSI   - European Telecommunications Standards Institute
FEC    - Forward Error Correction
SCR    - Source Controlled Rate Operation
SID    - Silence Indicator (the frames containing only CN
         parameters)
VAD    - Voice Activity Detection
UED    - Unequal Error Detection
UEP    - Unequal Error Protection

The term "frame-block" is used in this document to describe the
time-synchronized set of speech frames in a multi-channel AMR or
AMR-WB session. In particular, in an N-channel session, a frame-block
will contain N speech frames, one from each of the channels, and all
N speech frames represent exactly the same time period.
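The frame-block definition above can be illustrated with a minimal Python sketch. The function name `group_into_frame_blocks` and the representation of frames as byte strings are assumptions made for this example only; this document does not define them.

```python
from typing import List, Sequence


def group_into_frame_blocks(channels: Sequence[Sequence[bytes]]) -> List[List[bytes]]:
    """Group per-channel frame sequences into frame-blocks.

    channels[c][t] is the encoded frame of channel c for the t-th frame
    period (a hypothetical in-memory representation). Each returned
    frame-block holds N frames, one from each of the N channels, all
    covering the same time period.
    """
    if not channels:
        return []
    frame_count = len(channels[0])
    if any(len(ch) != frame_count for ch in channels):
        raise ValueError("all channels must carry the same number of frames")
    return [[ch[t] for ch in channels] for t in range(frame_count)]


# Two-channel example with placeholder frame contents.
left = [b"L0", b"L1", b"L2"]
right = [b"R0", b"R1", b"R2"]
blocks = group_into_frame_blocks([left, right])
```

In this sketch, `blocks[t]` is the frame-block for frame period t, and its length equals the number of channels in the session.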

The byte order used in this document is network byte order, i.e., the
most significant byte first. The bit order is also the most
significant bit first. This is presented in all figures as having
the most significant bit leftmost on a line and with the lowest
number. Some bit fields may wrap over multiple lines, in which case
the bits on the first line are more significant than the bits on the
next line.
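As a rough illustration of this bit-ordering convention, the following Python sketch reads a bit field from a buffer with the most significant bit first. The helper name `read_bits` and the sample buffer are hypothetical and serve only to show the numbering; they are not part of the payload format.

```python
def read_bits(data: bytes, bit_offset: int, bit_count: int) -> int:
    """Read bit_count bits starting at bit_offset, most significant bit first.

    Bit 0 is the most significant bit of data[0], i.e., the leftmost and
    lowest-numbered bit in the figures.
    """
    if bit_offset < 0 or bit_count < 0 or bit_offset + bit_count > 8 * len(data):
        raise ValueError("requested bits lie outside the buffer")
    value = 0
    for i in range(bit_offset, bit_offset + bit_count):
        byte = data[i // 8]
        bit = (byte >> (7 - i % 8)) & 1  # bit 0 of an octet is its MSB
        value = (value << 1) | bit       # earlier bits are more significant
    return value


buf = bytes([0b10110000, 0b01000000])
first_four = read_bits(buf, 0, 4)  # bits 0..3 of the first octet
wrapped = read_bits(buf, 6, 4)     # a field that wraps across two octets
```

For the wrapped field, the two bits taken from the first octet end up as the more significant half of the result, which mirrors the rule for fields that wrap over multiple lines.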

3. Background on AMR/AMR-WB and Design Principles

AMR and AMR-WB were originally designed for circuit-switched mobile
radio systems. Due to their flexibility and robustness, they are
also suitable for other real-time speech communication services over
packet-switched networks such as the Internet.

Because of the flexibility of these codecs, the behavior in a
particular application is controlled by several parameters that
select options or specify the acceptable values for a variable.
These options and variables are described in general terms at
appropriate points in the text of this specification as parameters to
be established through out-of-band means. In Section 8, all of the
parameters are specified in the form of media subtype registrations
for the AMR and AMR-WB encodings. The method used to signal these
parameters at session setup or to arrange prior agreement of the
participants is beyond the scope of this document; however, Section
8.3 provides a mapping of the parameters into the Session Description
Protocol (SDP) [11] for those applications that use SDP.

3.1. The Adaptive Multi-Rate (AMR) Speech Codec

The AMR codec was originally developed and standardized by the
European Telecommunications Standards Institute (ETSI) for GSM
cellular systems. It is now chosen by the Third Generation
Partnership Project (3GPP) as the mandatory codec for third
generation (3G) cellular systems [1].

The AMR codec is a multi-mode codec that supports eight narrow band
speech encoding modes with bit rates between 4.75 and 12.2 kbps. The
sampling frequency used in AMR is 8000 Hz and the speech encoding is
performed on 20 ms speech frames. Therefore, each encoded AMR speech
frame represents 160 samples of the original speech.

Among the eight AMR encoding modes, three are already separately
adopted as standards of their own. Particularly, the 6.7 kbps mode
is adopted as PDC-EFR [18], the 7.4 kbps mode as IS-641 codec in TDMA
[17], and the 12.2 kbps mode as GSM-EFR [16].

3.2. The Adaptive Multi-Rate Wideband (AMR-WB) Speech Codec

The Adaptive Multi-Rate Wideband (AMR-WB) speech codec [3] was
originally developed by 3GPP to be used in GSM and 3G cellular
systems.

Similar to AMR, the AMR-WB codec is also a multi-mode speech codec.
AMR-WB supports nine wide band speech coding modes with respective
bit rates ranging from 6.6 to 23.85 kbps. The sampling frequency
used in AMR-WB is 16000 Hz and the speech processing is performed on
20 ms frames. This means that each AMR-WB encoded frame represents
320 speech samples.
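The frame sizes quoted for AMR (Section 3.1) and AMR-WB follow directly from the sampling frequency and the 20 ms frame duration. The Python sketch below reproduces that arithmetic and, as an additional illustration, derives the number of speech bits per frame from a mode's bit rate. The function names are hypothetical, and only the lowest and highest bit rates named in the text are used.

```python
FRAME_DURATION_MS = 20  # frame duration used by both AMR and AMR-WB


def samples_per_frame(sampling_rate_hz: int) -> int:
    """Number of input speech samples covered by one encoded frame."""
    return sampling_rate_hz * FRAME_DURATION_MS // 1000


def speech_bits_per_frame(bit_rate_kbps: float) -> int:
    """Number of encoded speech bits in one frame at the given bit rate.

    kbps multiplied by ms yields bits; round() absorbs floating-point noise.
    """
    return round(bit_rate_kbps * FRAME_DURATION_MS)


amr_samples = samples_per_frame(8000)       # AMR, 8000 Hz sampling
amr_wb_samples = samples_per_frame(16000)   # AMR-WB, 16000 Hz sampling
amr_bits_low = speech_bits_per_frame(4.75)      # lowest AMR mode
amr_bits_high = speech_bits_per_frame(12.2)     # highest AMR mode
amr_wb_bits_low = speech_bits_per_frame(6.6)    # lowest AMR-WB mode
amr_wb_bits_high = speech_bits_per_frame(23.85) # highest AMR-WB mode
```

The sample counts evaluate to 160 for AMR and 320 for AMR-WB, matching the figures stated in the text.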

3.3. Multi-Rate Encoding and Mode Adaptation

The multi-rate encoding (i.e., multi-mode) capability of AMR and
AMR-WB is designed for preserving high speech quality under a wide
range of transmission conditions.

With AMR or AMR-WB, mobile radio systems are able to use available
bandwidth as effectively as possible. For example, in GSM it is
possible to dynamically adjust the speech encoding rate during a
session so as to continuously adapt to the varying transmission
conditions by dividing the fixed overall bandwidth between speech
data and error protective coding. This enables the best possible
trade-off between speech compression rate and error tolerance. To
perform mode adaptation, the decoder (speech receiver) needs to
signal the encoder (speech sender) the new mode it prefers. This
mode change signal is called Codec Mode Request or CMR.
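To make the idea of a mode request more concrete, the following Python sketch shows one way a receiver might pick the mode it asks the sender to use. The text does not say how a decoder chooses its preferred mode, so the selection policy here (the highest mode that fits an assumed speech bit-rate budget) and the names `AMR_MODE_RATES_KBPS` and `choose_mode_request` are purely illustrative assumptions. The table lists the eight AMR narrow band mode bit rates in kbps, from 4.75 to 12.2.

```python
# The eight AMR mode bit rates in kbps, indexed by mode number 0..7.
AMR_MODE_RATES_KBPS = (4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2)


def choose_mode_request(max_speech_rate_kbps: float) -> int:
    """Return the index of the highest AMR mode that fits the budget.

    Hypothetical policy: falls back to the lowest mode (index 0) when
    even that mode exceeds the budget.
    """
    best = 0
    for index, rate in enumerate(AMR_MODE_RATES_KBPS):
        if rate <= max_speech_rate_kbps:
            best = index
    return best


# The receiver estimates that about 8 kbps can be spent on speech data
# and would request the corresponding mode from the sender.
requested_mode = choose_mode_request(8.0)
requested_rate = AMR_MODE_RATES_KBPS[requested_mode]
```

A real system would derive the budget from measured transmission conditions; the fixed value above only demonstrates the lookup.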

Since in most sessions speech is sent in both directions between the
two ends, the mode requests from the decoder at one end to the
encoder at the other end are piggy-backed over the speech frames in
the reverse direction. In other words, there is no out-of-band
signaling needed for sending CMRs.

Every AMR or AMR-WB codec implementation is required to support all
the respective speech coding modes defined by the codec and must be
able to handle mode switching to any of the modes at any time.
However, some transport systems may impose limitations in the number
of modes supported and how often the mode can change due to bandwidth
