POLQA

POLQA (Perceptual Objective Listening Quality Assessment), also known as P.OLQA, is a work item of ITU-T Study Group 12 that defines a model for predicting speech quality by analysing digital speech signals.
Measurement scope
The P.OLQA model predicts speech quality by analysing digital speech signals. Its objective predictions should come as close as possible to the quality scores obtained in subjective listening tests. Usually, the predicted value is a Mean Opinion Score (MOS).
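One common way to quantify how close objective predictions come to subjective scores is to compute the correlation and the root-mean-square error between the two sets of MOS values. The sketch below illustrates that idea only; the function names and the five score pairs are hypothetical and made up for illustration, and this is not the evaluation procedure prescribed by ITU-T.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def rmse(xs, ys):
    """Root-mean-square error between two equally long score lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

# Hypothetical data: invented MOS values for five test conditions.
subjective_mos = [1.8, 2.5, 3.1, 3.9, 4.4]  # from a listening test (made up)
predicted_mos = [2.0, 2.4, 3.3, 3.7, 4.5]   # from an objective model (made up)

correlation = pearson(subjective_mos, predicted_mos)
error = rmse(subjective_mos, predicted_mos)
print(f"correlation={correlation:.3f}, rmse={error:.3f}")
```

A correlation near 1 together with a small error would indicate that the objective model tracks the listening-test results closely.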
Technology capabilities
P.OLQA may become the successor of ITU-T P.862 PESQ after its approval by ITU-T. P.OLQA avoids weaknesses of the current P.862 model and extends it towards higher-bandwidth audio signals. Like P.862, P.OLQA supports measurements in the common telephony band (300-3400 Hz), but it also has a second operational mode for assessing wideband and so-called super-wideband speech signals (50-14000 Hz), also referred to as HD-Voice. P.OLQA also targets the assessment of speech signals recorded by an artificial head with ear simulators.
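The two frequency ranges above have a practical consequence for recordings: a sampled signal only carries frequencies below half its sample rate (the Nyquist limit), so a file must be sampled fast enough for the chosen mode. The following is a minimal sketch of that check. The `Mode` class, the constants and `can_represent` are hypothetical names introduced here, not part of any P.OLQA interface, and the sample rates in the usage lines are just common values picked as examples.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mode:
    """Hypothetical description of an operational mode by its audio band."""
    name: str
    low_hz: int
    high_hz: int

# Band limits taken from the text above.
NARROWBAND = Mode("narrowband", 300, 3400)
SUPER_WIDEBAND = Mode("super-wideband", 50, 14000)

def can_represent(mode, sample_rate_hz):
    """True if the Nyquist frequency (half the sample rate) lies above the band's upper edge."""
    return sample_rate_hz / 2 > mode.high_hz

print(can_represent(NARROWBAND, 8000))       # 4000 Hz Nyquist vs 3400 Hz band edge
print(can_represent(SUPER_WIDEBAND, 16000))  # 8000 Hz Nyquist vs 14000 Hz band edge
print(can_represent(SUPER_WIDEBAND, 48000))  # 24000 Hz Nyquist vs 14000 Hz band edge
```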
Development history
The P.OLQA activities started in ITU-T in early 2006. In mid-2009 a competition was started to evaluate several candidate models. In May 2010 ITU-T selected candidate models from three companies, OPTICOM, SwissQual and TNO (Netherlands Organisation for Applied Scientific Research), to form the future Recommendation P.OLQA. The three companies were asked to merge their approaches into a single standardized model.
Testing typology
P.OLQA, like P.862 PESQ, is a Full Reference (FR) algorithm that rates a degraded or processed speech signal in relation to the original signal. It compares each sample of the reference signal (talker side) to the corresponding sample of the degraded signal (listener side). The score reflects the perceptual differences between the two signals. The perceptual ‘psycho-acoustic’ model is based on models of human perception similar to those used in MP3 or AAC coding. Basically, the signals are analyzed in the frequency domain (in critical bands) after applying masking functions. Unmasked differences between the two signal representations are counted as distortions. Finally, the accumulated distortions in the speech file are mapped onto a 1 to 5 quality scale, as is usual for MOS tests. FR measurements deliver the highest accuracy and repeatability but can only be applied for dedicated tests in live networks (e.g. drive test tools for mobile network benchmarks).
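The general shape of a full-reference comparison can be illustrated with a toy example. The sketch below is not the P.OLQA algorithm and uses none of its actual processing steps. It assumes time-aligned signals, uses equal-width frequency bands instead of true critical bands, treats a difference as masked when it is below a fixed fraction of the reference band energy, and maps the accumulated distortion onto 1 to 5 with an arbitrary monotone function. All names and parameter values (`toy_fr_score`, `mask_ratio`, the frame length and band count) are assumptions chosen for illustration.

```python
import cmath
import math

def power_spectrum(frame):
    """Naive DFT power spectrum of one frame (bins 0 .. N/2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum

def band_energies(spectrum, n_bands):
    """Sum bins into equal-width bands (a simplification; leftover top bins are dropped)."""
    width = len(spectrum) // n_bands
    return [sum(spectrum[b * width:(b + 1) * width]) for b in range(n_bands)]

def toy_fr_score(reference, degraded, frame_len=64, n_bands=8, mask_ratio=0.1):
    """Toy full-reference score on a 1 to 5 scale (5 means no unmasked difference)."""
    if len(reference) != len(degraded):
        raise ValueError("signals must be time-aligned and equal in length")
    distortion = 0.0
    n_frames = 0
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = band_energies(power_spectrum(reference[start:start + frame_len]), n_bands)
        deg = band_energies(power_spectrum(degraded[start:start + frame_len]), n_bands)
        # Only the part of each band difference above the assumed masking threshold counts.
        unmasked = sum(max(0.0, abs(d - r) - mask_ratio * r) for r, d in zip(ref, deg))
        distortion += unmasked / (sum(ref) + 1e-12)  # normalise by reference frame energy
        n_frames += 1
    if n_frames == 0:
        raise ValueError("signals are shorter than one frame")
    mean_distortion = distortion / n_frames
    # Arbitrary monotone mapping: zero distortion gives 5, large distortion approaches 1.
    return 1.0 + 4.0 / (1.0 + mean_distortion)

# Usage: a pure tone compared with itself and with an attenuated copy.
reference = [math.sin(2 * math.pi * 5 * t / 64) for t in range(256)]
attenuated = [0.5 * x for x in reference]
same_score = toy_fr_score(reference, reference)
lower_score = toy_fr_score(reference, attenuated)
print(same_score, lower_score)
```

An identical signal yields the top of the scale, while the attenuated copy scores lower. A real model would also have to handle time alignment and level differences before comparing the signals, which this toy version leaves out.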
 