SVM trained by the PEGASOS Stochastic Subgradient Descent algorithm
An implementation of PEGASOS, powered by Blaze.
Complete: linear [reference] and kernel [reference] methods.
Incomplete: Random Fourier Features and Forgetron/budgeted kernel algorithms.
The linear method is notable for a training time that depends inversely on dataset size.
The kernel method's runtime does grow with the number of datapoints, but unlike batch solvers, the stochastic subgradient method has only linear memory requirements.
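For context, the linear method is based on the PEGASOS stochastic subgradient update (Shalev-Shwartz et al.). The following is a minimal sketch using Blaze types; the function name and loop structure are illustrative, not DenseSVM's actual internals:

```cpp
#include <blaze/Math.h>
#include <cstddef>
#include <random>

// Minimal sketch of the linear PEGASOS update. X holds one datapoint per
// row, y holds labels in {-1, +1}, lambda is the regularization parameter.
// The optional projection onto the ball of radius 1/sqrt(lambda) is omitted.
blaze::DynamicVector<double> pegasos(const blaze::DynamicMatrix<double> &X,
                                     const blaze::DynamicVector<int> &y,
                                     double lambda, std::size_t iterations) {
    std::mt19937_64 rng(137);
    std::uniform_int_distribution<std::size_t> dist(0, X.rows() - 1);
    blaze::DynamicVector<double> w(X.columns(), 0.0);
    for (std::size_t t = 1; t <= iterations; ++t) {
        const std::size_t i = dist(rng);        // sample one datapoint
        const double eta = 1.0 / (lambda * t);  // step size 1/(lambda * t)
        w *= 1.0 - eta * lambda;                // shrink by the regularizer
        if (y[i] * dot(row(X, i), trans(w)) < 1.0)  // hinge loss is active
            w += eta * y[i] * trans(row(X, i));     // subgradient step
    }
    return w;
}
```

Because each iteration touches a single randomly drawn datapoint, the per-iteration cost is independent of the number of examples, which is what makes the inverse dependence on dataset size possible.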
Efforts will eventually be made to accommodate multiclass classification, though only binary classification is currently supported.
Data may be provided either in sparse SVM-light format or in a dense tab-delimited format.
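For illustration, the two formats look roughly like this (the values are made up, and the dense sketch assumes the label occupies the first field of each row; consult the parser for the exact convention):

```
# Sparse (SVM-light): <label> <index>:<value> ...
+1 1:0.43 7:1.25 31:0.08
-1 2:0.96 7:0.13

# Dense (tab-delimited): label followed by feature values
+1	0.43	0.00	1.25
-1	0.00	0.96	0.13
```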
Dependency | Reference | Comments |
---|---|---|
Blaze | K. Iglberger, et al.: Expression Templates Revisited: A Performance Analysis of Current Methodologies. SIAM Journal on Scientific Computing, 34(2): C42-C69, 2012 | For optimal performance, this should be linked against BLAS and parallelized, as controlled in blaze/blaze/config/BLAS.h (see the sketch after this table) |
C++14 | | DenseSVM is currently only tested with gcc 5.2 and 6.3 |
OpenMP | | OpenMP is currently required for certain operations, but this requirement could easily be removed |
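As a sketch of what that BLAS configuration looks like (macro names taken from recent Blaze releases; verify them against your checkout):

```cpp
// blaze/blaze/config/BLAS.h (excerpt; names may vary across Blaze versions)
#define BLAZE_BLAS_MODE 1         // 1 = forward large operations to the linked BLAS
#define BLAZE_BLAS_IS_PARALLEL 1  // declare the linked BLAS as multithreaded
```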
Blaze leverages SIMD, BLAS, and expression template metaprogramming for state-of-the-art linear algebra performance.
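For instance, a compound expression like the one below is compiled by Blaze's expression templates into a single fused, SIMD-vectorized loop with no temporary vectors (a generic Blaze usage sketch, not DenseSVM code):

```cpp
#include <blaze/Math.h>

int main() {
    blaze::DynamicVector<double> a(1000, 1.0), b(1000, 2.0), c(1000, 0.5);
    // The right-hand side is a lazily evaluated expression template; it is
    // materialized in one pass when assigned to d.
    blaze::DynamicVector<double> d = 3.0 * a + b - c;
    return d[0] > 0.0 ? 0 : 1;  // use the result so it isn't optimized away
}
```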
Simply `make`.

If you wish to use floats instead of doubles, which can be twice as fast in arithmetic operations due to SIMD, use `make FLOAT_TYPE=float`.
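A plausible sketch of how such a flag is typically wired through a build (hypothetical; consult the Makefile for the actual mechanism):

```cpp
// Hypothetical plumbing: a Makefile variable like FLOAT_TYPE is usually
// forwarded to the compiler as a preprocessor define (-DFLOAT_TYPE=float),
// which then selects the scalar type used throughout the code.
#ifndef FLOAT_TYPE
#  define FLOAT_TYPE double  // default when the flag is not given
#endif
using FloatType = FLOAT_TYPE;  // e.g. blaze::DynamicVector<FloatType>
```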