EN 301 549 Accessibility requirements for ICT products and services - 6. ICT with two-way voice communication.

6.1 Audio bandwidth for speech

Where ICT provides two-way voice communication, in order to provide good audio quality, that ICT shall be able to encode and decode two-way voice communication with a frequency range with an upper limit of at least 7 000 Hz.

NOTE 1:  For the purposes of interoperability, support of Recommendation ITU-T G.722 [i.21] is widely used.

NOTE 2:  Where codec negotiation is implemented, other standardized codecs such as Recommendation ITU‑T G.722.2 [i.22] are sometimes used so as to avoid transcoding.
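The 7 000 Hz figure can be related to sampling rate through the Nyquist limit: a sampled signal can only represent frequencies up to half its sampling rate, so a sampling rate of at least 14 000 Hz is needed. The Python sketch below is illustrative only. The function names are hypothetical, and treating a codec as a (sampling rate, nominal upper band edge) pair is a simplifying assumption, not something the clause defines.

```python
REQUIRED_UPPER_HZ = 7_000  # upper frequency limit from clause 6.1


def nyquist_limit_hz(sample_rate_hz: int) -> float:
    """Highest frequency a signal sampled at this rate can represent."""
    return sample_rate_hz / 2


def can_meet_bandwidth(sample_rate_hz: int, codec_upper_hz: int) -> bool:
    """True if both the sampling rate and the codec's nominal upper band
    edge (an assumed, caller-supplied value) reach 7 000 Hz."""
    usable_upper_hz = min(nyquist_limit_hz(sample_rate_hz), codec_upper_hz)
    return usable_upper_hz >= REQUIRED_UPPER_HZ


# Wideband speech sampled at 16 kHz with a 7 kHz upper band edge qualifies;
# narrowband telephony sampled at 8 kHz with a 3.4 kHz edge does not.
wideband_ok = can_meet_bandwidth(16_000, 7_000)
narrowband_ok = can_meet_bandwidth(8_000, 3_400)
```

A check like this only covers the nominal codec parameters; the rest of the audio path (microphone, filters, transcoding gateways) would also need to preserve the band.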

6.2 Real-Time Text (RTT) functionality

6.2.1 RTT provision

6.2.1.1 RTT communication

Where ICT is in a mode that provides a means for two-way voice communication, the ICT shall provide a means for two-way RTT communication, except where this would require design changes to add input or output hardware to the ICT.

NOTE 1:  This requirement includes those products which do not have physical display or text entry capabilities but have the capability to connect to devices that do have such capabilities. It also includes intermediate ICT between the endpoints of the communication.

NOTE 2:  There is no requirement to add: a hardware display, a hardware keyboard, or hardware to support the ability to connect to a display or keyboard, wired or wirelessly, if this hardware would not normally be provided.

NOTE 3:  For the purposes of interoperability, support of Recommendation ITU-T T.140 [i.36] is widely used.

6.2.1.2 Concurrent voice and text

Where ICT provides a means for two-way voice communication and for users to communicate by RTT, it shall allow concurrent voice and text through a single user connection.

NOTE 1:  With many-party communication, as in a conference system, it is allowed (but not required or necessarily recommended) that RTT be handled in a single display field and that "turn-taking" be necessary to avoid confusion (in the same way that turn-taking is required for those presenting/talking with voice).

NOTE 2:  With many-party communication, best practice is for hand-raising for voice users and RTT users to be handled in the same way, so that voice and RTT users are in the same queue.

NOTE 3:  With a many-party conference system that has chat as one of its features, the RTT (like the voice) would typically be separate from the chat so that RTT use does not interfere with chat (i.e. people can be messaging in the chat field while a person is presenting/talking with RTT, in the same manner that people message using the chat feature while people are talking with voice). RTT users would then use RTT for presenting and use the chat feature to message while others are presenting (via voice or RTT).

NOTE 4:  The availability of voice and RTT running concurrently (and separately from chat) can also allow the RTT field to support text captioning when someone is speaking (and it is therefore not being used for RTT since it is not the RTT user's turn to speak).

NOTE 5:  Where both server-side software and local hardware and software are required to provide voice communication, where neither part can support voice communication without the other and are sold as a unit for the voice communication function, the local and server-side components are considered a single product.

6.2.2 Display of RTT

6.2.2.1 Visually distinguishable display

Where ICT has RTT send and receive capabilities, displayed sent text shall be visually differentiated from, and separated from, received text.

NOTE:  The ability of the user to choose between having the send and receive text be displayed in-line or separately, and with options to select, allows users to display RTT in a form that works best for them. This would allow Braille users to use a single field and take turns and have text appear in the sequential way that they may need or prefer.

6.2.2.2 Programmatically determinable send and receive direction

Where ICT has RTT send and receive capabilities, the send/receive direction of transmitted/received text shall be programmatically determinable, unless the RTT is implemented as closed functionality.

NOTE:  This enables screen readers to distinguish between incoming text and outgoing text when used with RTT functionality.
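One way to keep the direction programmatically determinable is to carry it as explicit data on each piece of text rather than encoding it only in colour or layout. The sketch below is a minimal illustration under that assumption. `Direction`, `RttSegment` and `accessible_label` are hypothetical names, and a real product would expose this information through its platform accessibility API rather than through a plain string.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    SENT = "sent"
    RECEIVED = "received"


@dataclass(frozen=True)
class RttSegment:
    """A piece of RTT text tagged with its send/receive direction."""
    text: str
    direction: Direction


def accessible_label(segment: RttSegment) -> str:
    """Build a text label that states the direction explicitly, so that
    assistive technology does not depend on visual styling."""
    prefix = "You" if segment.direction is Direction.SENT else "Remote"
    return f"{prefix}: {segment.text}"


incoming = RttSegment("hello", Direction.RECEIVED)
outgoing = RttSegment("hi there", Direction.SENT)
labels = [accessible_label(incoming), accessible_label(outgoing)]
```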

6.2.2.3 Speaker identification

Where ICT has RTT capabilities, and provides speaker identification for voice, the ICT shall provide speaker identification for RTT.

NOTE:  This is necessary to enable both voice and RTT participants to know who is currently communicating, whether it be in RTT or voice.

6.2.2.4 Visual indicator of audio with RTT

Where ICT provides two-way voice communication, and has RTT capabilities, the ICT shall provide a real-time visual indicator of audio activity on the display.

NOTE 1:  The visual indicator may be a simple character position on the display that flickers on and off to reflect audio activity, or presentation of the information in another way that can be both visible to sighted users and passed on to deaf-blind users who are using a braille display.

NOTE 2:  Without this indication a person who lacks the ability to hear does not know when someone is talking.
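As a rough illustration of how an on/off audio activity indicator could be driven, the sketch below compares the root-mean-square (RMS) level of one frame of samples with a threshold. The clause does not prescribe any detection method; the function name, the assumption that samples are normalised to the range -1.0 to 1.0, and the threshold value of 0.02 are all hypothetical choices.

```python
import math
from typing import Sequence


def is_audio_active(samples: Sequence[float], threshold: float = 0.02) -> bool:
    """Return True if the RMS level of one audio frame reaches the threshold.

    Samples are assumed to be normalised to [-1.0, 1.0]; the default
    threshold is an arbitrary illustrative value, not taken from the standard.
    """
    if not samples:
        return False
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms >= threshold


silence = [0.0] * 160
speech_like = [0.5, -0.5] * 80
indicator_states = [is_audio_active(silence), is_audio_active(speech_like)]
```

The boolean result could drive a flickering character position or be passed to a braille display, as described in NOTE 1.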

6.2.3 Interoperability

Where ICT with RTT functionality interoperates with other ICT with RTT functionality (as required by clause 6.2.1.1) they shall support the applicable RTT interoperability mechanisms described below:

  a) ICT interoperating with other ICT directly connected to the Public Switched Telephone Network (PSTN), using Recommendation ITU-T V.18 [i.23] or any of its annexes for text telephony signals at the PSTN interface;
  b) ICT interoperating with other ICT using VOIP with Session Initiation Protocol (SIP) and using RTT that conforms to IETF RFC 4103 [i.13]. For ICT interoperating with other ICT using the IP Multimedia Sub‑System (IMS) to implement VOIP, the set of protocols specified in ETSI TS 126 114 [i.10], ETSI TS 122 173 [i.11] and ETSI TS 134 229 [i.12] describe how IETF RFC 4103 [i.13] would apply;
  c) ICT interoperating with other ICT using technologies other than a or b, above, using a relevant and applicable common specification for RTT exchange that is published and available for the environments in which they will be operating. This common specification shall include a method for indicating loss or corruption of characters;
  d) ICT interoperating with other ICT using a standard for RTT that has been introduced for use in any of the above environments, and is supported by all of the other active ICT that support voice and RTT in that environment.

NOTE 1:  In practice, a new standard is introduced as an alternative codec/protocol that is supported alongside the existing common standard and is used when all end-to-end components support it. Over time, technology development, combined with other factors including societal development and cost efficiency, may make older standards obsolete.

NOTE 2:  Where multiple technologies are used to provide voice communication, multiple interoperability mechanisms may be needed to ensure that all users are able to use RTT.

EXAMPLE:  A conferencing system that supports voice communication through an internet connection might provide RTT over an internet connection using a proprietary RTT method (option c). However, regardless of whether the RTT method is proprietary or non-proprietary, if the conferencing system also offers telephony communication it will also need to support options a or b to ensure that RTT is supported over the telephony connection.
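The reasoning in the EXAMPLE can be sketched as a lookup from each voice transport a product offers to the interoperability option that would apply to it. This is a simplified illustration, not a conformance tool: the `Transport` names and the function are hypothetical, the option letters follow the usage in the EXAMPLE (a for PSTN, b for SIP or IMS based VOIP, c for other technologies), and the fourth mechanism in the list (a newly introduced standard supported by all active ICT) is deliberately left out.

```python
from enum import Enum
from typing import Iterable, Set


class Transport(Enum):
    PSTN = "pstn"          # telephony over the PSTN
    SIP_VOIP = "sip"       # VOIP using SIP
    IMS_VOIP = "ims"       # VOIP using IMS
    OTHER = "other"        # any other technology, e.g. a proprietary web call


# Assumed mapping from transport to the option letter used in the EXAMPLE.
_OPTION_FOR_TRANSPORT = {
    Transport.PSTN: "a",
    Transport.SIP_VOIP: "b",
    Transport.IMS_VOIP: "b",
    Transport.OTHER: "c",
}


def required_rtt_options(transports: Iterable[Transport]) -> Set[str]:
    """Collect the interoperability options implied by the offered transports."""
    return {_OPTION_FOR_TRANSPORT[t] for t in transports}


# A conferencing system with a proprietary internet method plus PSTN dial-in
# would need both option c and option a.
conference_options = required_rtt_options([Transport.OTHER, Transport.PSTN])
```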

6.2.4 RTT responsiveness

Where ICT utilises RTT input, that RTT input shall be transmitted to the ICT network or platform on which the ICT runs within 500 ms of the time that the smallest reliably composed unit of text entry is available to the ICT for transmission. Delays due to platform or network performance shall not be included in the 500 ms limit.

NOTE 1:  For character by character input, the "smallest reliably composed unit of text entry" would be a character. For word prediction it would be a word. For some voice recognition systems, the text may not exit the recognition software until an entire word (or phrase) has been spoken. In this case, the smallest reliably composed unit of text entry available to the ICT would be the word (or phrase).

NOTE 2:  The 500 ms limit allows buffering of characters for this period before transmission so character by character transmission is not required unless the characters are generated more slowly than 1 per 500 ms.

NOTE 3:  A delay of 300 ms, or less, produces a better impression of flow to the user.
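The buffering behaviour described in NOTE 2 can be sketched as a small send buffer that releases everything it holds once the oldest unit has waited for a configured interval. This is a minimal sketch under stated assumptions: `RttBuffer` and its methods are hypothetical, timestamps are caller-supplied integers in milliseconds, the 300 ms default follows NOTE 3, and the caller is assumed to call `poll` often enough that the 500 ms limit is not exceeded between polls.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_DELAY_MS = 500  # limit from clause 6.2.4


@dataclass
class RttBuffer:
    """Collects composed text units and releases them in batches."""
    flush_interval_ms: int = 300
    _units: List[str] = field(default_factory=list, init=False)
    _oldest_ms: Optional[int] = field(default=None, init=False)

    def __post_init__(self) -> None:
        if not 0 < self.flush_interval_ms <= MAX_DELAY_MS:
            raise ValueError("flush interval must be between 1 and 500 ms")

    def add(self, unit: str, now_ms: int) -> None:
        """Store one composed unit (a character, word or phrase)."""
        if self._oldest_ms is None:
            self._oldest_ms = now_ms  # remember when the oldest unit arrived
        self._units.append(unit)

    def poll(self, now_ms: int) -> Optional[str]:
        """Return buffered text once the oldest unit has waited long enough,
        otherwise None. The buffer is emptied when text is returned."""
        if self._oldest_ms is None:
            return None
        if now_ms - self._oldest_ms < self.flush_interval_ms:
            return None
        text = "".join(self._units)
        self._units.clear()
        self._oldest_ms = None
        return text
```

For example, after `add("h", 0)` and `add("i", 100)`, a call to `poll(200)` returns nothing, while `poll(300)` returns both characters together. Platform and network delays after `poll` returns are outside the 500 ms limit, as stated in the requirement.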

6.3 Caller ID

Where ICT provides caller identification or similar telecommunications functions, the caller identification and similar telecommunications functions shall be available in text form as well as being programmatically determinable, unless the functionality is closed.

6.4 Alternatives to voice-based services

Where ICT provides real-time voice-based communication and also provides voice mail, auto-attendant, or interactive voice response facilities, the ICT shall offer users a means to access the information and carry out the tasks provided by the ICT without the use of hearing or speech.

NOTE 1:  Tasks that involve both operating the interface and perceiving the information would require that both the interface and information be accessible without use of speech or hearing.

NOTE 2:  Solutions capable of handling audio, RTT and video media could satisfy the above requirement.

6.5 Video communication

6.5.1 General (informative)

Clause 6.5 (Video communications) provides performance requirements that support users who communicate using sign language and lip-reading. For these users, good usability is achieved with a resolution of at least Quarter Video Graphics Array (QVGA, 320 x 240), a frame rate of at least 20 frames per second, and a time difference between speech audio and video that does not exceed 100 ms.

Increasing the resolution and frame rate further improves both sign language (especially finger spelling) and lip-reading, with frame rate being more important than resolution.

Time differences between audio and video (asynchronicity) can have a great impact on lip-reading, with video that lags behind audio having a greater negative effect.

End-to-end latency can be a problem in video (sign) communication. Overall delay values below 400 ms are preferred, with an increase in preference down to 100 ms. Overall delay depends on multiple factors, including e.g. network delay and video processing. For this reason a testable requirement on minimum values for overall delay cannot be produced.

NOTE:  Recommendation ITU‑T F.703 [i.37] defines and gives requirements for Total Conversation that relate to the integration of audio, RTT and video in a single user connection.

6.5.2 Resolution

Where ICT that provides two-way voice communication includes real-time video functionality, the ICT:

  1. shall support at least QVGA resolution;
  2. should preferably support at least VGA resolution.

6.5.3 Frame rate

Where ICT that provides two-way voice communication includes real-time video functionality, the ICT:

  1. shall support a frame rate of at least 20 Frames Per Second (FPS);
  2. should preferably support a frame rate of at least 30 Frames Per Second (FPS) with or without sign language in the video stream.

6.5.4 Synchronization between audio and video

Where ICT that provides two-way voice communication includes real-time video functionality, the ICT shall ensure a maximum time difference of 100 ms between the speech and video presented to the user.

NOTE:  Recent research shows that, if audio leads the video, the intelligibility suffers much more than the reverse.
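The numeric thresholds in clauses 6.5.2 to 6.5.4 can be collected into a single check of one video mode. The sketch below is illustrative only: `VideoMode` and `check_video_mode` are hypothetical names, comparing width and height separately against QVGA and VGA is an assumed interpretation of "at least", and the offset is treated as an absolute value because the requirement gives one limit for both directions. Since the clauses require that the ICT be able to support these values, a failure for one mode does not by itself mean the product fails.

```python
from dataclasses import dataclass
from typing import List, Tuple

QVGA = (320, 240)        # required minimum resolution (6.5.2)
VGA = (640, 480)         # preferred minimum resolution (6.5.2)
MIN_FPS = 20             # required frame rate (6.5.3)
PREFERRED_FPS = 30       # preferred frame rate (6.5.3)
MAX_AV_OFFSET_MS = 100   # maximum audio/video time difference (6.5.4)


@dataclass(frozen=True)
class VideoMode:
    width: int
    height: int
    fps: float
    av_offset_ms: float  # signed audio/video offset; sign convention is assumed


def _at_least(mode: VideoMode, size: Tuple[int, int]) -> bool:
    return mode.width >= size[0] and mode.height >= size[1]


def check_video_mode(mode: VideoMode) -> Tuple[List[str], List[str]]:
    """Return (failures against 'shall' items, advisories for 'should' items)."""
    failures: List[str] = []
    advisories: List[str] = []
    if not _at_least(mode, QVGA):
        failures.append("6.5.2: resolution below QVGA")
    elif not _at_least(mode, VGA):
        advisories.append("6.5.2: VGA or higher is preferred")
    if mode.fps < MIN_FPS:
        failures.append("6.5.3: frame rate below 20 FPS")
    elif mode.fps < PREFERRED_FPS:
        advisories.append("6.5.3: 30 FPS or higher is preferred")
    if abs(mode.av_offset_ms) > MAX_AV_OFFSET_MS:
        failures.append("6.5.4: audio/video offset exceeds 100 ms")
    return failures, advisories


minimal = check_video_mode(VideoMode(320, 240, 20, 100))
poor = check_video_mode(VideoMode(176, 144, 15, -150))
```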

6.5.5 Visual indicator of audio with video

Where ICT provides two-way voice communication, and includes real-time video functionality, the ICT shall provide a real-time visual indicator of audio activity.

NOTE 1:  The visual indicator may be a simple visual dot or LED, or other type of on/off indicator, that flickers to reflect audio activity.

NOTE 2:  Without this indication a person who lacks the ability to hear does not know when someone is talking.

6.5.6 Speaker identification with video (sign language) communication

Where ICT provides speaker identification for voice users, it shall provide a means for speaker identification for real-time signing and sign language users once the start of signing has been indicated.

NOTE 1:  The speaker ID can be in the same location as for voice users for multiparty calls.

NOTE 2:  This mechanism might be triggered manually by a user, or automatically where this is technically achievable.

6.6 Alternatives to video-based services

Where ICT provides real-time video-based communication and also provides answering machine, auto attendant or interactive response facilities, the ICT should offer users a means to access the information and carry out the tasks related to these facilities:

  1. for audible information, without the use of hearing;
  2. for spoken commands, without the use of speech;
  3. for visual information, without the use of vision.

NOTE:  Solutions capable of generating real-time captions or handling RTT could satisfy the above requirement.