## Problem definition

In speech processing and elsewhere, a common task is to predict an unknown vector *y* from an available observation vector *x*. Specifically, we want an estimate

`\hat y = f(x)`

such that

`\hat y \approx y.`

In particular, we will focus on *linear estimates* where

`\hat y=f(x):=x^T A,`

and where *A* is a matrix of parameters.
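As a minimal sketch of such a linear estimate, the snippet below evaluates `x^T A` with NumPy. The function name `linear_estimate`, the dimensions, and the numerical values are illustrative assumptions, not part of the original text; *x* is stored as a one-dimensional array, so the product returns a vector with one entry per column of *A*.

```python
import numpy as np

def linear_estimate(x, A):
    """Return the linear estimate y_hat = x^T A for a 1-D observation x."""
    x = np.asarray(x, dtype=float)
    A = np.asarray(A, dtype=float)
    # For a 1-D x of length n and an (n, m) matrix A, x @ A has length m.
    return x @ A

# Hypothetical example: 3-dimensional observation, 2-dimensional target.
x = np.array([1.0, 2.0, 3.0])
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y_hat = linear_estimate(x, A)
```

With these made-up values, each entry of `y_hat` is a weighted sum of the entries of `x`, with the weights taken from the corresponding column of `A`.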

## The minimum mean square estimate (MMSE)

Suppose we want to minimise the squared error of our estimate on average. The estimation error is

`e=y-\hat y`

and the squared error is the squared `L_2`-norm of the error, that is,

`\left\|e\right\|^2 = e^T e`

and its mean can be written as

`E\left[\left\|e\right\|^2\right].`
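In practice the expectation is usually not available in closed form. One common workaround, assumed here rather than taken from the text, is to replace it with an average over a finite set of samples. The sketch below does this on synthetic data: the helper `mean_squared_error`, the matrix `A_true`, the noise level, and the sample count are all hypothetical choices for illustration, with one observation per row of `X` and one target per row of `Y`.

```python
import numpy as np

def mean_squared_error(Y, Y_hat):
    """Sample average of ||e||^2 = e^T e, with one error vector per row."""
    E = np.asarray(Y, dtype=float) - np.asarray(Y_hat, dtype=float)
    # Sum of squares along each row gives e^T e; the mean approximates E[||e||^2].
    return float(np.mean(np.sum(E ** 2, axis=1)))

# Synthetic data (assumption): targets are a linear function of x plus noise.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 3))
A_true = rng.standard_normal((3, 2))
Y = X @ A_true + 0.1 * rng.standard_normal((1000, 2))

# Compare the generating parameters against an all-zero parameter matrix.
mse_true = mean_squared_error(Y, X @ A_true)
mse_zero = mean_squared_error(Y, X @ np.zeros((3, 2)))
```

Under these assumptions, `mse_true` should come out close to the noise energy and well below `mse_zero`, which illustrates why the mean squared error is a sensible criterion for choosing *A*.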