The algorithms we list here are used for classification, clustering, statistical learning, association analysis and link mining. The list gives you an overview of what’s available.
Algorithm/method | Description |
---|---|
Regression (Predict/classify) | Prediction and classification: linear and logistic are common |
Penalised regression (Predict/classify) | Prediction and classification with shrunken coefficients or a reduced set of variables |
Ridge regression (Prediction) | Form of penalised regression (L2 penalty); shrinks coefficients towards zero |
Lasso regression (Prediction) | Form of penalised regression (L1 penalty); sets some coefficients to exactly zero, so it also selects variables |
Partial least squares (Prediction) | Regression on a small number of latent components built from the inputs; dimension reduction |
Naïve Bayes (Predict/classify) | Create a score to predict/classify using input data and prior |
Bayesian networks (Predict/classify) | Probabilistic network model |
Neural networks (Predict/classify) | Form layers of nonlinear functions of input variables |
CART, C4.5, C5.0 (Classification) | Recursively split data into increasingly smaller subgroups |
Random forest, GBM (Classification) | Tree-based ensemble methods that combine many trees like CART |
Apriori (Association analysis) | Find association rules from frequent sets of variables or items |
SVM (Classification) | Find the maximum-margin linear function of the inputs (possibly after a kernel transformation) that separates the classes |
kNN (Classification) | Predict class based on majority vote of k nearest neighbours |
AdaBoost (Classification) | Combine many weak learners into a weighted vote; ensemble learning method |
PageRank (Pattern finding) | Find associations (rank websites) based on links (hyperlinks) |
Mixtures (Density estimation) | Describe non-standard densities, also used to find clusters |
EM (Clustering) | Iteratively fit a mixture model (commonly of normal distributions) to estimate clusters |
K-Means (Clustering) | Allocate points to the closest cluster centre based on a distance measure |
PCA, FA (Dimension reduction) | Convert input variables to smaller set of output variables |
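
To make one of the table entries concrete, here is a minimal sketch of the kNN row (majority vote of the k nearest neighbours). It is only an illustration: the function name `knn_predict`, the choice of Euclidean distance, and the toy data are my own assumptions, not something taken from a particular library or reference.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point (an assumed choice of metric)
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest points, sorting on distance only
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the labels of those neighbours
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data, made up for illustration: two small groups in 2-D
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.8, 1.6)))  # a
print(knn_predict(points, labels, (6.2, 6.4)))  # b
```

With two classes, an odd k avoids tied votes. In practice you would normally scale the inputs first, since the distance is dominated by whichever variable has the largest range.
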
Do you have a reference where these or other algorithms are explained well?