Introduction to CELP Coding
Speex is based on CELP, which stands for Code Excited Linear Prediction. This section attempts to introduce the principles behind CELP, so if you are already familiar with CELP, you can safely skip to section 7. The CELP technique is based on three ideas:
- The use of a linear prediction (LP) model to model the vocal tract
- The use of (adaptive and fixed) codebook entries as input (excitation) of the LP model
- The search performed in closed-loop in a "perceptually weighted domain"
This section describes the basic ideas behind CELP. Note that it's still incomplete.
Linear Prediction (LPC)
Linear prediction is at the base of many speech coding techniques, including CELP. The idea behind it is to predict the signal using a linear combination of its past samples:
![\begin{displaymath}
y[n]=\sum_{i=1}^{N}a_{i}x[n-i]\end{displaymath}](http://ntools.net/arc/Documents/speex/manual/img4.png)
where $y[n]$ is the linear prediction of $x[n]$. The prediction error is thus given by:
![\begin{displaymath}
e[n]=x[n]-y[n]=x[n]-\sum_{i=1}^{N}a_{i}x[n-i]\end{displaymath}](http://ntools.net/arc/Documents/speex/manual/img6.png)
The goal of the LPC analysis is to find the prediction coefficients $a_i$ that minimize the quadratic error function:
![\begin{displaymath}
E=\sum_{n=0}^{L-1}\left[e[n]\right]^{2}=\sum_{n=0}^{L-1}\left[x[n]-\sum_{i=1}^{N}a_{i}x[n-i]\right]^{2}\end{displaymath}](http://ntools.net/arc/Documents/speex/manual/img8.png)
That can be done by making all derivatives equal to zero:
![\begin{displaymath}
\frac{\partial E}{\partial a_{i}}=\frac{\partial}{\partial a_{i}}\sum_{n=0}^{L-1}\left[x[n]-\sum_{i=1}^{N}a_{i}x[n-i]\right]^{2}=0\end{displaymath}](http://ntools.net/arc/Documents/speex/manual/img10.png)
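As a sketch of the step this implies, the derivative can be expanded as follows. Two assumptions are made here: the frame is treated as zero outside its bounds (the autocorrelation method), so that the sums over products of shifted samples reduce to the autocorrelation $R(m)$ of the frame at lag $m$, and $j$ is introduced as a separate index for the coefficient being differentiated.

```latex
\begin{align*}
\frac{\partial E}{\partial a_{j}}
  &= -2\sum_{n}\left[x[n]-\sum_{i=1}^{N}a_{i}x[n-i]\right]x[n-j]=0 \\
\Rightarrow\quad
\sum_{i=1}^{N}a_{i}\sum_{n}x[n-i]\,x[n-j] &= \sum_{n}x[n]\,x[n-j] \\
\Rightarrow\quad
\sum_{i=1}^{N}a_{i}R(|i-j|) &= R(j),\qquad j=1,\ldots,N
\end{align*}
```

This gives one linear equation per coefficient, so the $N$ coefficients are the solution of an $N \times N$ linear system.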
The filter coefficients $a_i$ are computed using the Levinson-Durbin algorithm, which starts from the auto-correlation $R(m)$ of the signal $x[n]$.
![\begin{displaymath}
R(m)=\sum_{i=0}^{N-1}x[i]x[i-m]\end{displaymath}](http://ntools.net/arc/Documents/speex/manual/img12.png)
For an order $N$ filter, we have:
![\begin{displaymath}
\mathbf{R}=\left[\begin{array}{cccc}
R(0) & R(1) & \cdots & R(N-1)\\
R(1) & R(0) & \cdots & R(N-2)\\
\vdots & \vdots & \ddots & \vdots\\
R(N-1) & R(N-2) & \cdots & R(0)\end{array}\right]\end{displaymath}](http://ntools.net/arc/Documents/speex/manual/img13.png)
![\begin{displaymath}
\mathbf{r}=\left[\begin{array}{c}
R(1)\\
R(2)\\
\vdots\\
R(N)\end{array}\right]\end{displaymath}](http://ntools.net/arc/Documents/speex/manual/img14.png)
The filter coefficients $a_i$ are found by solving the system $\mathbf{R}\mathbf{a}=\mathbf{r}$. What the Levinson-Durbin algorithm does here is reduce the cost of solving the problem from $O(N^3)$ to $O(N^2)$ by exploiting the fact that the matrix $\mathbf{R}$ is Toeplitz Hermitian. Also, it can be proven that all the roots of $A(z)$ are within the unit circle, which means that $1/A(z)$ is always stable. This is in theory; in practice, because of finite precision, there are two commonly used techniques to make sure we have a stable filter. First, we multiply $R(0)$ by a number slightly above one (such as 1.0001), which is equivalent to adding noise to the signal. Also, we can apply a window to the auto-correlation, which is equivalent to filtering in the frequency domain, reducing sharp resonances.
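The procedure above can be sketched in pure Python. This is an illustration under stated assumptions, not the Speex implementation: the function names are made up for this example, the frame is treated as zero outside its bounds, and the Gaussian lag window with its `width` value is just one common choice of auto-correlation window.

```python
import math


def autocorrelation(x, order):
    """R(m) for m = 0..order, treating samples outside the frame as zero."""
    return [sum(x[i] * x[i - m] for i in range(m, len(x)))
            for m in range(order + 1)]


def condition(R, noise_factor=1.0001, width=0.01):
    """Apply the two stabilising tricks from the text (values are assumptions)."""
    # Gaussian lag window: smooths the spectrum, reducing sharp resonances.
    out = [r * math.exp(-0.5 * (width * m) ** 2) for m, r in enumerate(R)]
    # Scaling R(0) slightly acts like adding a small white-noise floor.
    out[0] *= noise_factor
    return out


def levinson_durbin(R, order):
    """Solve the Toeplitz system R a = r in O(order^2) operations.

    Returns (a, err): a[i - 1] holds a_i in x[n] ~ sum_i a_i x[n - i],
    and err is the remaining prediction error energy.
    """
    a = []
    err = R[0]
    for m in range(1, order + 1):
        # Reflection coefficient for stage m.
        acc = R[m] - sum(a[i] * R[m - 1 - i] for i in range(m - 1))
        k = acc / err
        # Update the lower-order coefficients, then append the new one.
        a = [a[i] - k * a[m - 2 - i] for i in range(m - 1)] + [k]
        err *= 1.0 - k * k
    return a, err
```

For a frame `x` and an order of 10, `levinson_durbin(condition(autocorrelation(x, 10)), 10)` would return the coefficients together with the residual energy.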
The linear prediction model represents each speech sample as a linear combination of past samples, plus an error signal called the excitation (or residual).
![\begin{displaymath}
x[n]=\sum_{i=1}^{N}a_{i}x[n-i]+e[n]\end{displaymath}](http://ntools.net/arc/Documents/speex/manual/img22.png)
In the z-domain, this can be expressed as
![\begin{displaymath}
x(z)=\frac{1}{A(z)}\: e(z)\end{displaymath}](http://ntools.net/arc/Documents/speex/manual/img23.png)
where $A(z)$ is defined as
![\begin{displaymath}
A(z)=1-\sum_{i=1}^{N}a_{i}z^{-i}\end{displaymath}](http://ntools.net/arc/Documents/speex/manual/img24.png)
We usually refer to $A(z)$ as the analysis filter and $1/A(z)$ as the synthesis filter. The whole process is called short-term prediction because it predicts the signal $x[n]$ using only the $N$ past samples, where $N$ is usually around 10.
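The two filters can be sketched as a pair of direct-form loops. This is a minimal illustration that assumes zero filter history before the first sample; the function names are hypothetical, and `a[i]` holds the coefficient $a_{i+1}$.

```python
def analysis_filter(x, a):
    """Apply A(z): e[n] = x[n] - sum_i a_i x[n-i], with zero history before n = 0."""
    e = []
    for n in range(len(x)):
        pred = sum(a[i] * x[n - 1 - i] for i in range(len(a)) if n - 1 - i >= 0)
        e.append(x[n] - pred)
    return e


def synthesis_filter(e, a):
    """Apply 1/A(z): x[n] = e[n] + sum_i a_i x[n-i], with zero history before n = 0."""
    x = []
    for n in range(len(e)):
        pred = sum(a[i] * x[n - 1 - i] for i in range(len(a)) if n - 1 - i >= 0)
        x.append(e[n] + pred)
    return x
```

Feeding the output of `analysis_filter` into `synthesis_filter` with the same coefficients reproduces the input, which is what makes the excitation a complete description of the signal once the $a_i$ are known.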
Because LPC coefficients have very little robustness to quantization, they are converted to Line Spectral Pair (LSP) coefficients, which behave much better under quantization. In particular, with LSPs it is easy to keep the filter stable.
Pitch Prediction
During voiced segments, the speech signal is periodic, so it is possible to take advantage of that property by approximating the excitation signal by a gain times the past of the excitation:
![\begin{displaymath}
e[n]\simeq p[n]=\beta e[n-T]\end{displaymath}](http://ntools.net/arc/Documents/speex/manual/img26.png)
where $T$ is the pitch period and $\beta$ is the pitch gain. We call that long-term prediction since the excitation is predicted from $e[n-T]$ with $T \gg N$.
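As an illustration of the approximation $e[n] \simeq \beta e[n-T]$, the sketch below searches a range of lags directly on the excitation and picks the pair $(T, \beta)$ with the smallest squared error. It is a simplified open-loop search, not the closed-loop search in the weighted domain that a CELP encoder actually performs; the function name, the lag range arguments, and the restriction to lags of at least one frame are assumptions made for this example.

```python
def pitch_search(history, target, t_min, t_max):
    """Find (T, beta) minimising sum_n (target[n] - beta * e[n - T])^2.

    history holds past excitation samples, ending at e[-1];
    target holds the current frame e[0] .. e[len(target) - 1].
    Lags shorter than the frame are excluded to keep the sketch simple.
    """
    if t_min < len(target) or t_max > len(history):
        raise ValueError("lag range must satisfy len(target) <= T <= len(history)")
    best_T, best_beta, best_score = t_min, 0.0, 0.0
    for T in range(t_min, t_max + 1):
        start = len(history) - T
        past = history[start:start + len(target)]  # e[n - T] for the frame
        corr = sum(t * p for t, p in zip(target, past))
        energy = sum(p * p for p in past)
        if energy <= 0.0 or corr <= 0.0:
            continue
        # With the optimal gain, the error drops by corr^2 / energy.
        score = corr * corr / energy
        if score > best_score:
            best_T, best_beta, best_score = T, corr / energy, score
    return best_T, best_beta
```

For a given lag, the gain that minimizes the squared error is the correlation divided by the energy of the delayed excitation, so only the lag needs to be searched exhaustively.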
Innovation Codebook
The final excitation is the sum of the pitch prediction and an innovation signal $c[n]$ taken from a fixed codebook, hence the name Code Excited Linear Prediction. The final excitation is given by:
![\begin{displaymath}
e[n]=p[n]+c[n]=\beta e[n-T]+c[n]\end{displaymath}](http://ntools.net/arc/Documents/speex/manual/img32.png)
The quantization of $c[n]$ is where most of the bits in a CELP codec are allocated. It represents the information that couldn't be obtained either from linear prediction or pitch prediction. In the z-domain we can represent the final signal $X(z)$ as
![\begin{displaymath}
X(z)=\frac{C(z)}{A(z)\left(1-\beta z^{-T}\right)}\end{displaymath}](http://ntools.net/arc/Documents/speex/manual/img34.png)
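The expression for $X(z)$ corresponds to two nested recursions on the decoder side: the pitch predictor rebuilds the excitation, and the synthesis filter turns it into speech. The sketch below is a hypothetical decoder loop written for illustration (the function name and the way history is passed in are assumptions); it needs at least $T$ past excitation samples and $N$ past output samples.

```python
def decode_frame(c, T, beta, a, exc_history, out_history):
    """Rebuild one frame from the innovation c, pitch (T, beta) and LPC coefficients a.

    exc_history must hold at least T past excitation samples and
    out_history at least len(a) past output samples.
    """
    exc = list(exc_history)
    out = list(out_history)
    for n in range(len(c)):
        # Long-term prediction: e[n] = beta * e[n - T] + c[n].
        e_n = beta * exc[-T] + c[n]
        exc.append(e_n)
        # Short-term synthesis 1/A(z): x[n] = e[n] + sum_i a_i x[n - i].
        pred = sum(a[i] * out[-1 - i] for i in range(len(a)))
        out.append(e_n + pred)
    return exc[len(exc_history):], out[len(out_history):]
```

For example, with `c = [1.0, 0.0, 0.0, 0.0]`, `T = 2`, `beta = 0.5`, `a = [0.5]` and all-zero histories, the excitation is `[1.0, 0.0, 0.5, 0.0]`: the single pulse reappears two samples later, scaled by the pitch gain.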
Analysis-by-Synthesis and Error Weighting
Most (if not all) modern audio codecs attempt to "shape" the noise so that it appears mostly in the frequency regions where the ear cannot detect it. For example, the ear is more tolerant of noise in parts of the spectrum that are louder, and less tolerant where the signal is quieter. That's why instead of minimizing the simple quadratic error
![\begin{displaymath}
E=\sum_{n}\left(x[n]-\overline{x}[n]\right)^{2}\end{displaymath}](http://ntools.net/arc/Documents/speex/manual/img35.png)
where $\overline{x}[n]$ is the encoder signal, we minimize the error for the perceptually weighted signal
![\begin{displaymath}
X_{w}(z)=W(z)X(z)\end{displaymath}](http://ntools.net/arc/Documents/speex/manual/img37.png)
where $W(z)$ is the weighting filter, usually of the form $W(z)=\frac{A(z/\gamma_{1})}{A(z/\gamma_{2})}$ with control parameters $\gamma_{1}>\gamma_{2}$. If the noise is white in the perceptually weighted domain, then in the signal domain its spectral shape will be of the form
![\begin{displaymath}
A_{noise}(z)=\frac{1}{W(z)}=\frac{A\left(\frac{z}{\gamma_{2}}\right)}{A\left(\frac{z}{\gamma_{1}}\right)}\end{displaymath}](http://ntools.net/arc/Documents/speex/manual/img41.png)
If a filter $A(z)$ has (complex) poles at $p_{i}$ in the $z$-plane, the filter $A(z/\gamma)$ will have its poles at $p'_{i}=\gamma p_{i}$, making it a flatter version of $A(z)$.
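In terms of coefficients, replacing $z$ by $z/\gamma$ in $A(z)=1-\sum_i a_i z^{-i}$ simply scales each $a_i$ by $\gamma^i$. The sketch below uses that to build the weighting filter; the function names are hypothetical, the filter starts from zero state, and the default values 0.9 and 0.6 for the two control parameters are assumed typical values rather than something stated above.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): a_i becomes a_i * gamma^i (a[i] holds a_{i+1})."""
    return [a_i * gamma ** (i + 1) for i, a_i in enumerate(a)]


def weighting_filter(x, a, gamma1=0.9, gamma2=0.6):
    """Apply W(z) = A(z/gamma1) / A(z/gamma2) to x, starting from zero state."""
    num = bandwidth_expand(a, gamma1)
    den = bandwidth_expand(a, gamma2)
    y = []
    for n in range(len(x)):
        v = x[n]
        for i in range(len(a)):
            if n - 1 - i >= 0:
                v -= num[i] * x[n - 1 - i]  # numerator A(z/gamma1)
                v += den[i] * y[n - 1 - i]  # denominator 1/A(z/gamma2)
        y.append(v)
    return y
```

A first-order case shows the effect: for $A(z)=1-0.8z^{-1}$ the root sits at $z=0.8$, and `bandwidth_expand([0.8], 0.5)` returns `[0.4]`, which moves it to $z=0.4$, closer to the origin and therefore a flatter response.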
Analysis-by-synthesis refers to the fact that when searching for the best pitch parameters ($T$, $\beta$) and innovation signal $c[n]$, we do not try to make the excitation $e[n]$ as close as possible to the original one (which would be simpler), but instead apply the synthesis (and weighting) filter and try to make $X_{w}(z)$ as close to the original as possible.
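A minimal sketch of such a closed-loop search over a fixed codebook is given below. Several simplifications are assumed: the filters start from zero state, the target is taken to be the weighted signal with any pitch contribution already removed, one gain is fitted per entry, and all names and the default values of the two weighting parameters are hypothetical. Real encoders use much faster search structures than this exhaustive loop.

```python
def lfilter(b, a, x):
    """Direct-form IIR filter with a[0] == 1 and zero initial state."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc)
    return y


def poly(a, gamma=1.0):
    """Coefficients of A(z/gamma) = 1 - sum_i a_i gamma^i z^-i."""
    return [1.0] + [-a_i * gamma ** (i + 1) for i, a_i in enumerate(a)]


def search_codebook(target_w, codebook, a, gamma1=0.9, gamma2=0.6):
    """Return (index, gain, error) of the entry whose weighted synthesis best fits target_w."""
    best_index, best_gain, best_err = None, 0.0, float("inf")
    for index, c in enumerate(codebook):
        synth = lfilter([1.0], poly(a), c)                    # 1/A(z)
        y = lfilter(poly(a, gamma1), poly(a, gamma2), synth)  # W(z)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        # Least-squares gain for this entry, then the resulting weighted error.
        gain = sum(t * v for t, v in zip(target_w, y)) / energy
        err = sum((t - gain * v) ** 2 for t, v in zip(target_w, y))
        if err < best_err:
            best_index, best_gain, best_err = index, gain, err
    return best_index, best_gain, best_err
```

The point of the design is that every candidate is compared after synthesis and weighting, so the entry that wins is the one that sounds closest, not the one whose raw excitation is numerically closest.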