| Feature Extracting |
TF-IDF |
no |
no |
yes |
related to NLP |
| Feature Extracting |
Word2Vec |
no |
no |
yes |
related to NLP |
| Feature Extracting |
CountVectorizer |
no |
no |
yes |
|
| Feature Extracting |
FeatureHasher |
no |
no |
yes |
|
| Feature Transformation |
Tokenizer |
no |
no |
yes |
related to NLP |
| Feature Transformation |
StopWordsRemover |
no |
no |
yes |
related to NLP |
| Feature Transformation |
nn-gram |
no |
no |
yes |
related to NLP |
| Feature Transformation |
Binarizer |
no |
yes |
yes |
|
| Feature Transformation |
PCA |
no |
no |
yes |
|
| Feature Transformation |
PolynomialExpansion |
no |
no |
yes |
|
| Feature Transformation |
Discrete Cosine Transform (DCT) |
no |
no |
yes |
|
| Feature Transformation |
StringIndexer |
no |
yes |
yes |
|
| Feature Transformation |
OneHotEncoder |
no |
yes |
yes |
|
| Feature Transformation |
Normalizer |
no |
yes |
yes |
|
| Feature Transformation |
StandardScaler |
no |
no |
yes |
|
| Feature Transformation |
MinMaxScaler |
no |
yes |
yes |
|
| Feature Transformation |
MaxAbsScaler |
no |
no |
yes |
|
| Feature Transformation |
QuantileDiscretizer |
no |
no |
yes |
|
| Feature Transformation |
Imputer |
no |
yes |
yes |
|
| Feature Transformation |
Locality Sensitive Hashing (LSH) |
no |
no |
yes |
|
| Feature Selection |
Feature Extractor |
yes |
yes |
yes* |
*VectorAssembler/VectorSlicer and others Vector Transformers |
| Feature Selection |
Chi-Squared test of independence |
no |
no |
yes* |
*ChiSqSelector |
| Classification |
Binomial logistic regression |
no |
yes |
yes |
|
| Classification |
Decision tree classifier |
yes |
yes |
yes |
|
| Classification |
Linear SVM |
yes |
yes |
yes |
|
| Classification |
Random forest classifier |
no |
yes |
yes |
|
| Classification |
Gradient-boosted tree classifier |
no |
no |
yes |
|
| Classification |
Multilayer perceptron classifier |
yes |
yes |
yes |
|
| Classification |
KNN |
yes |
yes |
no |
|
| Classification |
Weigthed KNN |
yes |
yes |
no |
|
| Classification |
ANN (Approximate Nearest Neighbour) with ACD strategy |
no |
yes |
no* |
*Spark can use ANN methods via pre-built buckets with LSH but it doesn't support another method to build annoy index |
| Classification |
Naive Bayes |
no |
no |
yes |
|
| Regression |
Generalized linear regression |
no |
no |
yes* |
*supports only 4096 features |
| Regression |
Linear regression with LSQR |
yes |
yes |
no |
|
| Regression |
Linear regression with SGD |
yes |
yes |
yes |
|
| Regression |
Decision tree regression |
yes |
yes |
yes |
|
| Regression |
Random forest regression |
no |
yes |
yes |
|
| Regression |
Gradient-boosted tree regression |
no |
yes |
yes |
|
| Regularization |
L1, L2 .. Lp as parameter for trainer |
yes |
yes |
yes |
|
| Model Composition / Model Ensembles |
Ensemble as a Mean value of predictions |
no |
yes |
no* |
*supported for trees only |
| Model Composition / Model Ensembles |
Majority-based Ensemble |
no |
yes |
no* |
*supported for trees only |
| Model Composition / Model Ensembles |
Ensemble as a weighted sum of predictions |
no |
yes |
no* |
*supported for trees only |
| Clustering |
K-means |
yes |
yes |
yes |
|
| Clustering |
Latent Dirichlet allocation (LDA) |
no |
no |
yes |
|
| Clustering |
Bisecting k-means |
no |
no |
yes |
|
| Clustering |
Gaussian Mixture Model (GMM) |
no |
no |
yes |
|
| Collaborative Filtering |
ALS |
no |
no |
yes |
|
| Model selection |
TrainTest Splitting |
no |
yes |
yes |
|
| Model selection |
Cross Validation |
no |
yes |
yes |
|
| Model selection |
Parameter Grid |
no |
yes |
yes |
|
| Model selection |
Binary Evaluator |
no |
yes |
yes |
|
| Model selection |
Multi-class Evaluator |
no |
no |
yes |
|
| Metric |
Accuracy |
no |
yes |
yes |
|
| Metric |
Fmeasure |
no |
yes |
yes |
|
| Metric |
Precision |
no |
yes |
yes |
|
| Metric |
Recall |
no |
yes |
yes |
|
| Metric |
ROC AUC |
no |
no |
yes |
|
| Advanced Topics |
Genetic Algorithms |
yes |
yes |
no |
|
| Advanced Topics |
Model Export/Import |
yes |
yes |
yes* |
* PMML is supported |
| Advanced Topics |
TensorFlow Integration via tensorflow/tensorflow/contrib/ |
no |
yes |
no* |
* no projects which are parts of original TF |
| Advanced Topics |
Parallel run of particular trainings |
no |
yes |
no* |
* It supports in a tiny number of algorithms like KMeans initialization |