Teleconferencing
Shervin Shirmohammadi
University of Ottawa, Canada
Jauvane C. de Oliveira
National Laboratory for Scientific Computation, Petropolis, RJ, Brazil
Definition: Teleconferencing is an aggregation of audio conferencing, video conferencing, and data conferencing, and includes multiple participants in a live real-time session.
Teleconferencing (a.k.a. Multimedia Conference Services) consists of a live, real-time session between multiple participants with the ability to hear and see each other as well as share data and applications. Alternatively, teleconferencing can be thought of as an aggregation of audio conferencing, video conferencing, and data conferencing (or application sharing). Although a subject of interest for many years, teleconferencing has recently attracted a lot of attention due to current economic and social trends, which emphasize the need for rich media communication between geographically distributed people. Economic incentives include cutting travel costs, reducing security risks, and increasing worker availability, whereas social incentives stem from the higher expectations of technology and the greater experience of today's ordinary users.
Figure 1 shows a sample teleconferencing session in the JETS 2000 system. The participants in this session can see and hear each other, in addition to being able to do whiteboarding and application sharing. It should be noted that teleconferencing is, by nature, a live and therefore real-time multimedia application. Offline communication paradigms such as blogs, chat boards, and email are not considered to be part of teleconferencing.
Services and Requirements
A typical multimedia teleconferencing system should provide the following services:
- audio conferencing
- video conferencing
- data conferencing
- control and signaling
Since these services are provided in a live environment, communication lag and deficiencies such as delay, jitter, packet loss, and insufficient bandwidth adversely affect the execution of the teleconferencing session. This is particularly significant in networks that don’t guarantee quality of service, such as the Internet.
Audio and video are media that are continuous by nature. As such, they both suffer from network lag. However, it is well known that, from a human perception point of view, audio is affected more adversely than video in the presence of network lag. For example, if a given video frame is delayed, one can simply repeat the previous frame until the new one arrives. This causes some “unnaturalness” in the video, but it is acceptable for all practical purposes if the repeat duration is not too long. For audio streaming, however, if audio samples are delayed, they are either replaced by undesirable silence, which becomes especially irritating when it happens in the middle of a word, or they are replaced by the last samples available until new ones arrive, which causes noise. In both cases, not only does the flow of audio become unnatural, but the conversation can also become incomprehensible. Under less than desirable network conditions, participants in teleconferencing do experience such problems. To mitigate this, it is common to buffer a given amount of audio and video so that there is something available for playback in the event that delays occur. Naturally, this buffering introduces further delay and needs to be limited, especially in the context of a live session. To accommodate the transmission of audio and video over the network, protocols such as RTP (Real-time Transport Protocol) are used.
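The buffering trade-off described above can be sketched as a toy playout buffer. This is only an illustration under stated assumptions: the class name `PlayoutBuffer`, the `depth` parameter, and the two concealment policies are hypothetical and are not taken from RTP or H.323. Frames are identified by a sequence number, playback starts once `depth` frames have been buffered, and a frame that has not arrived by its playout tick is concealed either by repeating the last frame (as described for video) or by inserting silence (as described for audio).

```python
class PlayoutBuffer:
    """Toy playout (jitter) buffer keyed by sequence number.

    Hypothetical illustration: names and policies are assumptions,
    not part of RTP or H.323.
    """

    def __init__(self, depth, conceal="repeat", silence=b"\x00"):
        self.depth = depth          # frames to pre-buffer before playback starts
        self.conceal = conceal      # "repeat" last frame, or play "silence"
        self.silence = silence
        self.frames = {}            # seq -> payload, waiting to be played
        self.next_seq = 0           # sequence number due at the next tick
        self.started = False
        self.last = silence         # last payload actually played

    def push(self, seq, payload):
        """Store an arriving frame; frames that arrive too late are dropped."""
        if seq >= self.next_seq:
            self.frames[seq] = payload

    def pop(self):
        """Called once per playout tick.

        Returns None while still pre-buffering, otherwise the payload to play
        (the real frame, or a concealment substitute if it is missing).
        """
        if not self.started:
            if len(self.frames) < self.depth:
                return None
            self.started = True
        payload = self.frames.pop(self.next_seq, None)
        self.next_seq += 1
        if payload is None:
            return self.last if self.conceal == "repeat" else self.silence
        self.last = payload
        return payload
```

A larger `depth` absorbs more jitter but adds `depth` frame intervals of delay before the first frame is played, which is why the buffer has to stay small in a live session.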
Teleconferencing Standards
In any field, standards are needed for compatibility: allowing products and services offered by different vendors to interoperate. Teleconferencing is no exception. In fact, the need for standards in teleconferencing is most crucial due to the large number of network operators, product vendors, and service providers.
The International Telecommunication Union – Telecommunication Standardization Sector (ITU-T) has created a large set of recommendations that deal with teleconferencing, encompassing audio, video, data, and signaling requirements. The ITU-T F.702 recommendation describes what is known as Multimedia Conference Services. It defines the terminology used in multimedia conferencing and describes the service in terms of a functional model, configuration, roles of participants, terminal aspects, applications, and additional services. Some recommendations are umbrella standards, each containing a number of other recommendations specifically for audio, video, data, and signaling. The most widely used of these umbrella standards for teleconferencing on the Internet is ITU-T H.323.
ITU-T H.323 – Packet-based multimedia communications systems
The H.323 standard provides a foundation for real time audio, video and/or data communications across packet-based networks, such as the Internet. Support for audio is mandatory, while data and video are optional. By complying with the H.323 recommendation, multimedia products and applications from multiple vendors can interoperate, allowing users to communicate with ensured compatibility. H.323 is specifically designed for multimedia communications services over Packet Based Networks (PBN) which may not provide a guaranteed Quality of Service (QoS), such as the Internet.
The standard is quite broad in its scope. It encompasses stand-alone devices, embedded devices, point to point, and multipoint conferences. It also covers issues such as multimedia management, bandwidth management, and interoperation with various terminal types.
As an umbrella standard, H.323 specifies other standards for audio, video, data, and signaling. Figure 2 shows an example of an H.323 recommendation compliant system, illustrating audio, video, data, and signaling, and their relation with ITU-T standards and the TCP/IP protocol stack.
Call signaling and control
Call signaling and control is concerned mostly with setting up, maintaining, and taking down connections between teleconferencing parties. In addition, the recommendations define a Gateway: an optional element that provides many services and is mainly used for translating between H.323 conferencing endpoints and other terminal types. This includes translation between transmission formats (H.225.0 to H.221) and between communications procedures (H.245 to H.242). The Gateway also translates between audio and video codecs and performs call setup and clearing on both the packet-based and the switched-circuit network side. Some of the relevant standards for signaling and control are:
- H.225.0 – Call signaling protocols and media stream packetization for packet-based multimedia communication systems.
- H.235 – Security and encryption for H-series (H.323 and other H.245-based) multimedia terminals.
- H.245 – Control protocol for multimedia communication.
- Q.931 – ISDN user network interface layer 3 specification for basic call control.
- H.450.1 – Generic functional protocol for the support of supplementary services in H.323.
Audio
Audio signals transmitted over the network are digitized and, most of the time, compressed. H.323 supports a number of compression algorithms but only one is mandatory: H.323 terminals must support the G.711 recommendation for speech coding, which is 8-bit PCM sampled at 8 kHz with either A-law or µ-law companding, leading to bit rates of 56 or 64 kbps. Support for other ITU-T audio recommendations and compression schemes is optional and implementation specific, depending on the required speech quality, bit rate, computational power, and delay. Provisions for asymmetric operation of audio codecs have also been made; i.e., it is possible to send audio using one codec but receive audio using another codec. If the G.723.1 audio compression standard is provided, the terminal must be able to encode and decode at both the 5.3 kbps and the 6.3 kbps modes. If a terminal is audio only, it should also support the ITU-T G.729 recommendation. Note that if a terminal is known to be on a low-bandwidth network (<64 kbps), […] Some of the relevant standards for audio are:
- G.711 – Pulse Code Modulation (PCM) of voice frequencies.
- G.722 – 7 kHz audio-coding within 64 kbit/s.
- G.723.1 – Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s.
- G.728 – Coding of speech at 16 kbit/s using low-delay code excited linear prediction.
- G.729 – Coding of speech at 8 kbit/s using conjugate-structure algebraic code excited linear prediction (CS-ACELP).
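The G.711 figures quoted above can be illustrated with a short sketch. It uses the continuous textbook µ-law curve with µ = 255; G.711 itself specifies a segmented approximation of this curve, so the code below is not bit-exact and the function names are hypothetical. The bit-rate helper simply multiplies sample rate by bits per sample; treating the 56 kbps figure as 7 bits per sample is an assumption made here for illustration.

```python
import math

MU = 255.0  # µ parameter of the continuous µ-law curve (assumed textbook value)


def mu_law_compress(x):
    """Map a linear sample in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def quantize_8bit(y):
    """Uniformly quantize a companded value in [-1, 1] to a code in 0..255."""
    return min(255, int((y + 1.0) / 2.0 * 256))


def pcm_bitrate(sample_rate_hz, bits_per_sample):
    """Bit rate in bits per second of a constant-rate PCM stream."""
    return sample_rate_hz * bits_per_sample


# 8 kHz sampling with 8 bits per sample gives the 64 kbps figure.
rate_8bit = pcm_bitrate(8000, 8)
# Assumption: 56 kbps corresponds to carrying only 7 bits per sample.
rate_7bit = pcm_bitrate(8000, 7)
```

Companding spends more of the 256 available codes on quiet samples, which is why small inputs are boosted by `mu_law_compress` before uniform quantization.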
Video
Video support in H.323 is optional. However, if a terminal is to support video, it must at the very least support the H.261 codec at the QCIF frame format. Support for other H.261 modes or the H.263 codec is optional. During initial setup, a specific video data rate is selected as part of the capability exchange. This rate should not be exceeded for the duration of the session. The H.261 standard uses communication channels that are multiples of 64 kbps, known as px64, where p ∈ {1, 2, 3, ..., 30}. From a video encoding perspective, there are no Bidirectional or ‘B’ frames in H.261. Instead, it uses Intra or ‘I’ frames, which are fully and independently encoded, and Predicted or ‘P’ frames, which code the difference between the frame and its previous frame by using motion estimation.
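The idea behind a predicted frame can be sketched with a generic full-search block-matching routine. This is not the H.261 encoder: the function names, the block size, and the search range are arbitrary choices made for illustration, and frames are plain lists of pixel rows. For one block of the current frame, the code looks for the best-matching block in the previous frame using the sum of absolute differences (SAD), then forms the residual that would be coded together with the motion vector.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def best_motion_vector(cur, ref, bx, by, n, search):
    """Exhaustively test displacements within +/- `search` pixels.

    Returns ((dx, dy), cost) for the displacement with the lowest SAD;
    candidates that would fall outside the reference frame are skipped.
    """
    height, width = len(ref), len(ref[0])
    best, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= by + dy and by + dy + n <= height and
                      0 <= bx + dx and bx + dx + n <= width)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


def residual(cur, ref, bx, by, dx, dy, n):
    """Difference between the current block and its motion-compensated prediction."""
    return [[cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x]
             for x in range(n)] for y in range(n)]
```

When the match is good, the residual is close to zero and costs far fewer bits than coding the block independently, which is the saving a predicted frame offers over an intra frame.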
Compared to H.261, H.263 uses half-pixel motion estimation for better picture quality, and a Huffman coding table that is optimized specifically for low bit rate transmissions. H.263 defines more picture modes than H.261, as seen in Table 1. In addition, H.263 introduces PB frames, which consist of a P frame interpolated with a Bidirectional or ‘B’ frame: a frame that depends not only on a previous frame but also on a forthcoming frame. Similar to a P frame, a B frame uses motion estimation to reduce the amount of information to carry. See the related articles on “Video Compression and Coding Techniques” and “Motion Estimation” for further details.
Data Conferencing
The ITU-T has a set of standards for data conferencing and application sharing. The ITU-T T.120 recommendation summarizes the relationships amongst a set of protocols for data conferencing, providing real-time communication between two or more entities in a conference. Applications specified as part of the T.120 family include application sharing, electronic whiteboarding, file exchange, and chat.
Data conferencing is an optional capability in multimedia conferences. When supported, data conferencing enables collaboration through applications such as whiteboards, application sharing, and file transfer. The list below summarizes the related recommendations in the T.120 family:
- T.120 – Data protocols for multimedia conferencing.
- T.121 – Generic application template: It defines the generic application template (GAT), specifying guidelines for building application protocols and facilities that manage the control of the resources used by the application. T.121 is a mandatory standard for products that support T.120.
- T.122 – Multipoint communication service – Service definition: It defines the multipoint services allowing one or more participants to send data. The actual mechanism for transporting the data is defined by T.125. T.122 and T.125 together constitute the T.120 multipoint communication services (MCS).
- T.123 – Network specific data protocol stacks for multimedia conferencing: It defines the sequencing and transporting of the data and its flow control across the network.
- T.124 – Generic Conference Control: It defines the generic conference control (GCC) for initiating and maintaining multipoint data conferences. The lists of conference participants, their applications, and the latest conference information are kept here.
- T.125 – Multipoint communication service protocol specification: It defines how data is actually transmitted. One can think of T.125 as the implementation of T.122 services, among other things.
- T.126 – Multipoint still image and annotation protocol: It defines how the whiteboard application sends and receives data. Both compressed and uncompressed forms for viewing and updating are supported.
- T.127 – Multipoint binary file transfer protocol: It defines how files are transferred, sometimes simultaneously, among users. Similar to T.126, it supports both compressed and uncompressed forms.
- T.128 – Multipoint application sharing: It defines the program sharing protocol; i.e., how participants can share programs that are running locally.
Other Standards
H.320
Narrow-band visual telephone systems and terminal equipment (March 2004) supports videoconferencing over ISDN. This protocol has a long and successful history. Although it is sometimes considered a ‘legacy’ protocol, private industry still relies heavily on it, and it provides an important bridge to the PSTN.
H.321
Adaptation of H.320 visual telephone terminals to B-ISDN environments (Feb 98) provides support for videoconferencing over ATM. A number of successful systems have been built upon this technology, though the generally scarce deployment of ATM limits the overall reach of these systems.
H.322
Visual telephone systems and terminal equipment for local area networks which provide a guaranteed quality of service (Mar 96) is used over LANs that guarantee bandwidth, such as ISO-Ethernet.
H.324
Terminal for low bit rate multimedia communication (Mar 2002) provides support for low bandwidth videoconferencing over PSTN. This protocol enjoys success particularly in Asian cellular markets.
Tools
There are many teleconferencing tools currently available. Some of the most popular commercial tools are Microsoft’s NetMeeting, CU-SeeMe, ICUII, and Isabel, which was developed at the Universidad Politecnica de Madrid. Among open source solutions, OpenH323 is a widely used platform that implements the H.323 standard.
Recording a Teleconference Session
The ability to record a teleconferencing session is an important requirement for many applications. In many cases, it is necessary to play back the events that took place in a session. For example, a participant who misses a teleconferencing meeting can play back exactly what happened in the meeting, if the session was recorded. This includes conversations and discussions among participants, as well as applications and documents that were shared. Ideally, this recording should be done in a transparent way; i.e., user applications need not be modified for this recording to take place. One approach to achieve this is for the “recorder” module to join the session as a regular client, observe what is taking place, and record it. This is also referred to as non-intrusive recording. The J-VCR system is an example of a non-intrusive teleconferencing recording tool.
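The non-intrusive approach can be sketched as follows. This is a hypothetical in-memory model, not the design of J-VCR or of any real conferencing stack: `Session`, `Participant`, and `Recorder` are illustrative names, and the session simply forwards every event to all other members. The point is that the recorder uses the same `receive` interface as any other client, so neither the session nor the ordinary participants need to change.

```python
import time


class Session:
    """Hypothetical conference session that relays events to every other member."""

    def __init__(self):
        self.clients = []

    def join(self, client):
        self.clients.append(client)

    def publish(self, sender, kind, payload):
        for client in self.clients:
            if client is not sender:
                client.receive(sender.name, kind, payload)


class Participant:
    """Ordinary client: keeps what it receives in an inbox."""

    def __init__(self, name):
        self.name = name
        self.inbox = []

    def receive(self, sender, kind, payload):
        self.inbox.append((sender, kind, payload))


class Recorder(Participant):
    """Joins like a regular client, but timestamps and logs everything it sees."""

    def __init__(self, clock=time.monotonic):
        super().__init__("recorder")
        self.clock = clock   # injectable clock; an assumption made for testability
        self.log = []

    def receive(self, sender, kind, payload):
        self.log.append((self.clock(), sender, kind, payload))

    def replay(self):
        """Yield the recorded events in the order they were observed."""
        for _, sender, kind, payload in sorted(self.log, key=lambda e: e[0]):
            yield sender, kind, payload
```

A participant who missed the meeting could then iterate over `replay()` to see the audio, whiteboard, and other events in their original order; a real recorder would also have to preserve the timing between events.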