M-estimator

In statistics, M-estimators are a class of estimators whose properties are well understood. The name "M-estimators" comes from "generalized Maximum likelihood estimators", and one motivation behind M-estimators is indeed to generalize maximum likelihood estimation. A maximum likelihood estimator maximizes [\prod_{i=1}^n f(x_i)] or, equivalently, minimizes [\sum_{i=1}^n-\log f(x_i)], where f is a probability density function. In 1964, Huber proposed generalizing this to the minimization of [\sum_{i=1}^n\rho(x_i)], where [\rho] is some function (made precise later in the article). Maximum likelihood estimators are therefore a special case of M-estimators.

The aim of this article is to give an overview of M-estimators: what they are, what their properties are, and what they are used for. It assumes that the reader is familiar with undergraduate probability theory, e.g. probability distributions, probability density functions, random variables and expectation.

Types of M-estimators

M-estimators are values [\theta] minimizing [\sum_{i=1}^n\rho(x_i,\theta)]. The minimization can be carried out directly or, more simply, by differentiating with respect to [\theta] and solving for a root of the derivative. If differentiation is not possible, we speak of an M-estimator of ρ-type. Otherwise, the M-estimator is said to be of ψ-type. In most practical cases, M-estimators are of ψ-type, which form a much richer class of M-estimators.

M-estimators of ρ-type

For [p\in\mathbb{N}^*], let [(\mathcal{X},\Sigma)] and [(\Theta\subset\mathbb{R}^p,S)] be measure spaces. [\theta\in\Theta] is a vector of parameters. An M-estimator of ρ-type T is defined through a measurable function [\rho:\mathcal{X}\times\Theta\rightarrow\mathbb{R}]. It maps a probability distribution F on [\mathcal{X}] to the value [T(F)\in\Theta] (if it exists) that minimizes [\int_{\mathcal{X}}\rho(x,\theta)dF(x)]:

[T(F):=\arg\min_{\theta\in\Theta}\int_{\mathcal{X}}\rho(x,\theta)dF(x)]

For example, for the maximum likelihood estimator, [\rho(x,\theta)=-\log(f(x,\theta))], where [f(x,\theta)=\frac{\partial F(x,\theta)}{\partial x}].
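As a minimal sketch of the ρ-type definition applied to a finite sample, the following Python code minimizes the sample average of [\rho(x,\theta)=-\log(f(x,\theta))] for an exponential density [f(x,\theta)=\theta e^{-\theta x}]. The exponential model, the made-up data, and the helper names (`rho_exponential`, `objective`, `golden_section_minimize`) are assumptions chosen for illustration and do not come from the article.

```python
import math


def rho_exponential(x, theta):
    """rho(x, theta) = -log f(x, theta) for f(x, theta) = theta * exp(-theta * x)."""
    return theta * x - math.log(theta)


def objective(sample, theta):
    """Sample average of rho, i.e. the quantity the rho-type estimator minimizes."""
    return sum(rho_exponential(x, theta) for x in sample) / len(sample)


def golden_section_minimize(func, lo, hi, tol=1e-10):
    """Minimize a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if func(c) < func(d):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2.0


sample = [0.5, 1.2, 0.3, 2.0, 1.0]  # made-up data (assumption)
theta_hat = golden_section_minimize(lambda th: objective(sample, th), 0.01, 10.0)
closed_form = len(sample) / sum(sample)  # known MLE of the exponential rate
print(theta_hat, closed_form)
```

The numerical minimizer should agree with the closed-form maximum likelihood estimate (one over the sample mean) up to the accuracy of the search.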

M-estimators of ψ-type

If [\rho] is differentiable, the computation of [\widehat{\theta}] is usually much easier. An M-estimator of ψ-type T is defined through a measurable function [\psi:\mathcal{X}\times\Theta\rightarrow\mathbb{R}^p]. It maps a probability distribution F on [\mathcal{X}] to the value [T(F)\in\Theta] (if it exists) that solves the vector equation [\int_{\mathcal{X}}\psi(x,\theta)dF(x)=0] in [\theta], that is:
[\int_{\mathcal{X}}\psi(x,T(F))dF(x)=0]

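For example, for the maximum likelihood estimator, [\psi(x,\theta)=\left(\frac{\partial\log(f(x,\theta))}{\partial\theta^1},\cdots,\frac{\partial\log(f(x,\theta))}{\partial\theta^p}\right)^t], where [u^t] denotes the transpose of vector u and [f(x,\theta)=\frac{\partial F(x,\theta)}{\partial x}]. The overall sign of [\psi] does not matter here, since [\psi] enters only through an equation set to zero.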
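The ψ-type route can be sketched in the same spirit: instead of minimizing, one solves [\sum_{i=1}^n\psi(x_i,\theta)=0] for [\theta]. The Python code below does this by bisection for the score of an exponential density [f(x,\theta)=\theta e^{-\theta x}]. The model, the data, and the helper names (`psi_exponential`, `psi_sum`, `bisect_root`) are illustrative assumptions, not part of the article.

```python
def psi_exponential(x, theta):
    """Score of the exponential density: d/dtheta log(theta * exp(-theta * x))."""
    return 1.0 / theta - x


def psi_sum(sample, theta):
    """Left-hand side of the estimating equation sum_i psi(x_i, theta) = 0."""
    return sum(psi_exponential(x, theta) for x in sample)


def bisect_root(func, lo, hi, tol=1e-12):
    """Find a root of func on [lo, hi]; func(lo) and func(hi) must differ in sign."""
    f_lo = func(lo)
    if f_lo * func(hi) > 0:
        raise ValueError("root is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        f_mid = func(mid)
        if f_lo * f_mid <= 0:
            hi = mid  # sign change in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # sign change in [mid, hi]
    return (lo + hi) / 2.0


sample = [0.5, 1.2, 0.3, 2.0, 1.0]  # made-up data (assumption)
theta_hat = bisect_root(lambda th: psi_sum(sample, th), 0.01, 10.0)
print(theta_hat, len(sample) / sum(sample))
```

For this model the root of the estimating equation is one over the sample mean, so both printed values should coincide up to the bisection tolerance.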

Such an estimator is not necessarily an M-estimator of ρ-type, but if ρ has a continuous first derivative with respect to [\theta], then a necessary condition for an M-estimator of ψ-type to be an M-estimator of ρ-type is [\psi(x,\theta)=\nabla_\theta\rho(x,\theta)]. The previous definitions can easily be extended to finite samples.

Finite-sample M-estimators

In estimation, for all practical cases, one deals with finite samples. For [n\in\mathbb{N}^*], an estimator [T_n] is said to be an M-estimator of [\rho]-type (resp. of [\psi]-type) if there exists an M-estimator T of [\rho]-type (resp. of [\psi]-type) such that [\forall (X_1,\cdots,X_n)\in\mathcal{X}^n, T_n(X_1,\cdots,X_n)=T(G_n)], where [G_n=\frac{1}{n}\sum_{i=1}^n\Delta_{X_i}] is the empirical distribution of the sample, [\Delta_{X_i}] being the Dirac distribution with mass 1 at [X_i].
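To make the role of the empirical distribution concrete, here is a small Python sketch in which a discrete distribution is stored as (atom, weight) pairs, so that integrating against [G_n] reduces to a sample average. The representation, the data, the quadratic [\rho], and the helper names (`empirical_distribution`, `integrate`) are assumptions made for this illustration.

```python
def empirical_distribution(sample):
    """G_n: mass 1/n on each observation, stored as (atom, weight) pairs."""
    n = len(sample)
    return [(x, 1.0 / n) for x in sample]


def integrate(func, distribution):
    """Integral of func against a discrete distribution of (atom, weight) pairs."""
    return sum(weight * func(atom) for atom, weight in distribution)


def rho(x, theta):
    """Illustrative choice of rho (assumption): half the squared error."""
    return (x - theta) ** 2 / 2.0


sample = [2.0, 4.0, 9.0]  # made-up data (assumption)
g_n = empirical_distribution(sample)
theta = 3.0

integral_against_g_n = integrate(lambda x: rho(x, theta), g_n)
sample_average = sum(rho(x, theta) for x in sample) / len(sample)
print(integral_against_g_n, sample_average)
```

Both printed numbers are the same quantity computed two ways, which is why minimizing the integral against [G_n] amounts to minimizing a sum over the sample.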

Influence function

The following theorem is of crucial importance, as it states that the influence function of an M-estimator of [\psi]-type is proportional to its defining [\psi] function. This explains why the function [\psi] is sometimes (abusively) called "influence function".

Let T be an M-estimator of ψ-type. Let G be a probability distribution for which [T(G)] is defined and let [x\in\mathcal{X}]. Then:

[IF(x;T,G)=-\frac{\psi(x,T(G))}{\int\left[\frac{\partial\psi(y,\theta)}{\partial\theta}\right]_{\theta=T(G)}dG(y)}]

Proof:

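By definition, [\int\psi(y,T(G))dG(y)=0] for every [G] in the domain of [T]. Let [c] be a path of distributions with [c(0)=G] and [c'(0)=\Delta_x-G], for example [c(t)=G+t(\Delta_x-G)]. Then, for every t for which [T(c(t))] is defined,

[\int\psi(y,T(c(t)))d(c(t))(y)=0]

Differentiating with respect to t yields

[\frac{\partial}{\partial t}\int\psi(y,T(c(t)))d(c(t))(y)=0]

We know that [d(c(t))=td(\Delta_x-G)+dG]. Therefore,

[\frac{\partial}{\partial t}\int\psi(y,T(c(t)))td(\Delta_x-G)(y)+\frac{\partial}{\partial t}\int\psi(y,T(c(t)))dG(y)=0]

Supposing differentiation and integration can be interchanged,

[t\int\frac{\partial\psi(y,T(c(t)))}{\partial t}d(\Delta_x-G)(y)+\int\psi(y,T(c(t)))d(\Delta_x-G)(y)+\int\frac{\partial\psi(y,T(c(t)))}{\partial t}dG(y)=0]

Now, by the chain rule, [\frac{\partial\psi(y,T(c(t)))}{\partial t}=\left[\frac{\partial\psi(y,\theta)}{\partial\theta}\right]_{\theta=T(c(t))}\frac{\partial T(c(t))}{\partial t}]

Therefore, [t\int\frac{\partial\psi(y,T(c(t)))}{\partial t}d(\Delta_x-G)(y)+\int\psi(y,T(c(t)))d(\Delta_x-G)(y)+] [\int\left[\frac{\partial\psi(y,\theta)}{\partial\theta}\right]_{\theta=T(c(t))}dG(y)\frac{\partial T(c(t))}{\partial t}=0]

As this equation is valid for all such t, we can take [t=0]. The first term vanishes and, since [T(c(0))=T(G)], the second term becomes

[\int\psi(y,T(G))d(\Delta_x-G)(y)=\int\psi(y,T(G))d\Delta_x(y)-\int\psi(y,T(G))dG(y)=\psi(x,T(G))-0]

so that

[\psi(x,T(G))+\int\left[\frac{\partial\psi(y,\theta)}{\partial\theta}\right]_{\theta=T(G)}dG(y)\left[\frac{\partial T(c(t))}{\partial t}\right]_{t=0}=0]

By definition, [\left[\frac{\partial T(c(t))}{\partial t}\right]_{t=0}=d_GT(\Delta_x-G)=IF(x;T,G)], hence

[IF(x;T,G)=-\frac{\psi(x,T(G))}{\int\left[\frac{\partial\psi(y,\theta)}{\partial\theta}\right]_{\theta=T(G)}dG(y)}]

which completes the proof.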
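The result can be sanity-checked numerically in the simplest case. The Python sketch below takes [\psi(x,\theta)=\theta-x], for which [T(G)] is the mean of [G], contaminates a discrete distribution along the path [c(t)=G+t(\Delta_x-G)], and compares a finite-difference derivative at [t=0] with the value given by the formula. The discrete representation, the data, the step size, and the helper names (`mean_functional`, `contaminate`) are assumptions for illustration only.

```python
def mean_functional(distribution):
    """T(G) for psi(x, theta) = theta - x: the mean of a discrete distribution."""
    total = sum(weight for _, weight in distribution)
    return sum(weight * atom for atom, weight in distribution) / total


def contaminate(distribution, x, t):
    """c(t) = G + t * (Delta_x - G) = (1 - t) * G + t * Delta_x."""
    return [(atom, (1.0 - t) * weight) for atom, weight in distribution] + [(x, t)]


sample = [1.0, 2.0, 3.0, 6.0]  # made-up data (assumption)
g = [(xi, 1.0 / len(sample)) for xi in sample]
x = 10.0  # contamination point (assumption)
t = 1e-6  # small step for the finite difference (assumption)

finite_difference = (mean_functional(contaminate(g, x, t)) - mean_functional(g)) / t

# Formula: IF = -psi(x, T(G)) / integral of d(psi)/d(theta) dG = -(T(G) - x) / 1
theorem_value = x - mean_functional(g)
print(finite_difference, theorem_value)
```

For the mean the path is linear in t, so the finite difference matches the formula up to floating-point error; for other choices of [\psi] the agreement would only be approximate for small t.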

Examples

Mean and least-squares

The mean is simply the best approximation of a set of random variables by a constant. Let [(X_1,\cdots,X_n)] be a set of independent, identically distributed (iid) random variables with distribution F. The arithmetic mean is obtained from the orthogonal projection of this sample onto the subspace generated by the constant random variable [(1,\cdots,1)]: it is the (scalar) random variable [\overline{X}] that minimizes [\|(X_1,\cdots,X_n)-\overline{X}(1,\cdots,1)\|^2], where the squared norm is given by [\|X\|^2:=\mathbb{E}[XX^t]].

If we define [\rho(x,\theta)=\frac{(x-\theta)^2}{2}], then we can see that the mean is an M-estimator of ρ-type for this function:

[T_n(X_1,\cdots,X_n):=T(G_n)=T\left(\frac{1}{n}\sum_{i=1}^n\Delta_{X_i}\right)=\arg\min_{\theta\in\Theta}\frac{1}{n}\sum_{i=1}^n\rho(X_i,\theta)=\arg\min_{\theta\in\Theta}\sum_{i=1}^n(X_i-\theta)^2]

As this function is continuously differentiable, the mean is also an M-estimator of ψ-type for [\psi(x,\theta)=\theta-x]: solving [\frac{1}{n}\sum_{i=1}^n(\theta-X_i)=0] gives [\theta=\frac{1}{n}\sum_{i=1}^nX_i].
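The two characterizations of the mean can be checked on a small sample. The Python sketch below evaluates both the sum of [\rho(X_i,\theta)=(X_i-\theta)^2/2] and the sum of [\psi(X_i,\theta)=\theta-X_i] at the arithmetic mean. The data and the helper names (`rho_sum`, `psi_sum`) are assumptions made for this illustration.

```python
sample = [2.0, 4.0, 9.0]  # made-up data (assumption)
mean = sum(sample) / len(sample)


def rho_sum(theta):
    """Sum of rho(X_i, theta) = (X_i - theta)^2 / 2 over the sample."""
    return sum((x - theta) ** 2 / 2.0 for x in sample)


def psi_sum(theta):
    """Sum of psi(X_i, theta) = theta - X_i over the sample."""
    return sum(theta - x for x in sample)


# psi-type view: the estimating equation vanishes at the mean.
print(psi_sum(mean))

# rho-type view: moving away from the mean in either direction increases the sum.
print(rho_sum(mean), rho_sum(mean - 0.5), rho_sum(mean + 0.5))
```

The first printed value should be zero, and the first of the three values on the second line should be the smallest.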

In fact, the arithmetic mean is just a special case of linear least-squares regression. For all such estimators, the influence function is:

Indeed,

Sensible choices

 


From Wikipedia, the Free Encyclopedia.
All text is available under the terms of the GNU Free Documentation License.
