Statistical classification
Statistical classification is a statistical procedure in which individual items are placed into groups, based on quantitative information about one or more characteristics inherent in the items (referred to as traits, variables, characters, etc.) and on a training set of previously labeled items.
Formally, the problem can be stated as follows: given training data [\{(\mathbf{x}_1,y_1),\dots,(\mathbf{x}_n, y_n)\}], produce a classifier [h:\mathcal{X}\rightarrow\mathcal{Y}] that maps an object [\mathbf{x} \in \mathcal{X}] to its classification label [y \in \mathcal{Y}]. For example, if the problem is filtering spam, then [\mathbf{x}] is some representation of an email and [y] is either "Spam" or "Non-Spam".
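The formal statement above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the word-count scoring rule, the function name train, and the four training emails are all hypothetical and are not a method described in this article. The point is the shape of the problem: labeled pairs go in, and a function h from objects to labels comes out.

```python
from collections import Counter
from typing import Callable, List, Tuple

Email = str  # an object x in X (assumed representation: raw text)
Label = str  # a label y in Y: "Spam" or "Non-Spam"


def train(data: List[Tuple[Email, Label]]) -> Callable[[Email], Label]:
    """Build a classifier h: X -> Y from labeled pairs (x_i, y_i)."""
    # Count how often each word occurs in each class of the training set.
    counts = {"Spam": Counter(), "Non-Spam": Counter()}
    for text, label in data:
        counts[label].update(text.lower().split())

    def h(text: Email) -> Label:
        # Hypothetical scoring rule: sum the per-class counts of the words seen.
        words = text.lower().split()
        spam_score = sum(counts["Spam"][w] for w in words)
        ham_score = sum(counts["Non-Spam"][w] for w in words)
        return "Spam" if spam_score > ham_score else "Non-Spam"

    return h


# Made-up training data, for illustration only.
training_data = [
    ("win free money now", "Spam"),
    ("free prize claim now", "Spam"),
    ("meeting agenda for monday", "Non-Spam"),
    ("lunch on monday", "Non-Spam"),
]
h = train(training_data)
```

Calling h on a new email then returns one of the two labels. Real spam filters use far richer representations and learning rules.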
Statistical classification algorithms are typically used in pattern recognition systems.
Note: in community ecology, the term "classification" is synonymous with what is commonly known in machine learning as clustering. See the article on clustering for more information about purely unsupervised techniques.
Statistical classification techniques
Although there are many methods for classification, each solves one of three related mathematical problems.
The first is to find a map from a feature space (typically a multi-dimensional vector space) to a set of labels. This is equivalent to partitioning the feature space into regions and then assigning a label to each region. Such algorithms (e.g., the nearest neighbour algorithm) typically do not yield confidence values or class probabilities unless post-processing is applied. Another set of algorithms that solve this problem first applies unsupervised clustering to the feature space, then attempts to label each of the clusters or regions.
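As a rough sketch of the first problem, the following Python snippet implements a plain nearest neighbour rule. The Euclidean distance, the function name nearest_neighbour, and the sample points are assumptions chosen for illustration. The rule implicitly partitions the plane into regions, one per training point, and returns only a label, with no confidence or probability.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def nearest_neighbour(training_points: List[Tuple[Point, str]], x: Point) -> str:
    """Return the label of the training point closest to x (Euclidean distance)."""
    _, label = min(training_points, key=lambda pair: math.dist(pair[0], x))
    return label


# Made-up labeled points in a two-dimensional feature space.
training_points = [
    ((0.0, 0.0), "A"),
    ((0.0, 1.0), "A"),
    ((5.0, 5.0), "B"),
    ((6.0, 5.0), "B"),
]

label_near_a = nearest_neighbour(training_points, (1.0, 0.5))
label_near_b = nearest_neighbour(training_points, (5.5, 4.0))
```

Every query point falls in the region of exactly one training point, which is what makes this a map from the feature space to the label set.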
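The second problem is to consider classification as an estimation problem, where the goal is to estimate a function of the form
- [P(\mathrm{class}|\vec x) = f\left(\vec x;\vec \theta\right)]
where [\vec x] is the feature vector and [f] is parameterized by [\vec \theta]. In the Bayesian version of this problem, rather than fixing a single [\vec \theta], the prediction is averaged over all parameter values, each weighted by its probability given the training data [D]:
- [P(\mathrm{class}|\vec x) = \int f\left(\vec x;\vec \theta\right)P(\vec \theta|D) d\vec \theta]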
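The two estimation formulas above can be sketched numerically. The snippet below assumes a logistic form for f, which is only one possible choice, and approximates the integral by averaging f over a handful of parameter vectors standing in for samples from the distribution of the parameters given the data. The names f and bayesian_predict and all numeric values are hypothetical.

```python
import math
from typing import List, Sequence


def f(x: Sequence[float], theta: Sequence[float]) -> float:
    """Assumed logistic model: P(class | x) for a single parameter vector theta."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))


def bayesian_predict(x: Sequence[float], theta_samples: List[Sequence[float]]) -> float:
    """Monte Carlo stand-in for the integral: average f over samples of theta."""
    return sum(f(x, theta) for theta in theta_samples) / len(theta_samples)


x = [1.0, 2.0]

# Point estimate: a single, made-up parameter vector.
theta_point = [0.5, -0.25]
p_point = f(x, theta_point)

# Bayesian averaging: made-up vectors treated as samples from P(theta | D).
theta_samples = [[0.4, -0.25], [0.5, -0.25], [0.6, -0.25]]
p_bayes = bayesian_predict(x, theta_samples)
```

In practice the samples would come from an inference procedure such as Markov chain Monte Carlo. Here they are fixed by hand so that the sketch stays self-contained.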
Examples of classification algorithms include:
- Linear classifiers
  - Fisher's linear discriminant
  - Logistic regression
  - Naive Bayes classifier
  - Perceptron
- k-nearest neighbor
- Boosting
- Decision trees
- Neural networks
- Bayesian networks
- Support vector machines
- Hidden Markov models
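To make one entry in the list above concrete, here is a minimal sketch of the perceptron, one of the linear classifiers. It assumes labels encoded as -1 and +1, a fixed number of passes over the data, and a small, linearly separable, made-up data set. The function names are illustrative only.

```python
from typing import List, Sequence, Tuple


def train_perceptron(
    data: List[Tuple[Sequence[float], int]], epochs: int = 20, lr: float = 1.0
) -> Tuple[List[float], float]:
    """Learn weights w and bias b with the classic perceptron update rule."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:  # y is -1 or +1
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified (or on the boundary): update
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Classify x by the side of the hyperplane w.x + b = 0 on which it falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Made-up, linearly separable training set.
data = [
    ((2.0, 1.0), 1),
    ((3.0, 2.0), 1),
    ((-1.0, -1.0), -1),
    ((-2.0, -0.5), -1),
]
w, b = train_perceptron(data)
```

On separable data such as this, the update rule stops changing the weights once every training point is classified correctly.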
Application domains
- Computer vision
  - Medical imaging and medical image analysis
  - Optical character recognition
- Geostatistics
- Speech recognition
- Handwriting recognition
- Biometric identification
- Document classification
- Internet search engines
- Credit scoring
External links
- [Classifier showdown] A practical comparison of classification algorithms.
From Wikipedia, the Free Encyclopedia.
All text is available under the terms of the GNU Free Documentation License. See Wikipedia Copyrights for details.
