| Engines | Description | Computation Model | ML Tools (see Table 5) |
| --- | --- | --- | --- |
| Spark [85] | Spark is an open source (Apache License 2.0) engine with interfaces for several well-known programming languages. According to Matei Zaharia, Spark's original developer, it can outperform Hadoop by 10× to 100× [83], depending on the use case, even when real-time requirements are considered. | Real-Time, Historical | MLlib, Mahout |
| Flink [86] | Flink is an open source (Apache License 2.0) distributed streaming dataflow engine for distributed computations over data. It provides several APIs as well as libraries for use cases such as event processing, machine learning, and graph processing. | Real-Time, Historical | SAMOA (see Table 5) |
| Storm [87] [88] | Storm is an open source (Apache License 2.0) distributed real-time computation framework for stream-based use cases such as real-time analytics and online machine learning. | Real-Time | SAMOA (see Table 5) |