- The level of distribution: TensorFlow's distribution is achieved at the graph level, facilitated by its subgraph execution; an individual component of a TensorFlow graph (a Tensor, Variable, or Operation) cannot be distributed on its own. Spark's distribution, by contrast, is achieved at the RDD level, which is the foundation of Spark. That is to say, every RDD operation, and any computational graph built on RDDs, is distributed.
- TensorFlow supports asynchronous training: asynchronous training arises naturally from the concurrent execution of replicated subgraphs, and synchronous training is also possible in distributed TensorFlow. Spark, however, only supports synchronous computation, since it follows the Bulk Synchronous Parallel (BSP) model; asynchronous training in Spark MLlib therefore hardly happens.
- TensorFlow supports the parameter-server & worker structure: in distributed TensorFlow, the user can assign a device either a ps task or a worker task. I think this feature is inherited from Google's first-generation deep-learning system, DistBelief. Stateful Variables can be assigned to PS devices. Spark has neither the PS-worker structure nor a readable-and-writable counterpart to TensorFlow's Variable; as we know, all RDDs are immutable.
- TensorFlow supports shuffling and batching of input data: backed by its queue structures, TensorFlow can batch and shuffle the input at each iteration. Users can control the batch size, whether to shuffle, and the epoch limit of the input data. Spark does not have this feature.
- Subgraph execution is a key feature of TensorFlow: in (distributed) TensorFlow, the user can specify a partial graph to execute, which provides flexibility for testing and the possibility of reuse. Spark MLlib does not support subgraph execution.
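The asynchronous-versus-synchronous contrast above can be sketched in plain Python. This is a toy simulation I'm adding for illustration, not TensorFlow or Spark API: asynchronous workers apply their gradients to a shared parameter as soon as they are ready, while BSP-style workers all compute on the same parameter value and a single averaged update is applied per superstep.

```python
import threading

# Toy objective: minimize f(w) = (w - 3)^2, whose gradient is 2*(w - 3).
def grad(w):
    return 2.0 * (w - 3.0)

# Asynchronous updates: each worker writes to the shared parameter
# without waiting for the others (as replicated subgraphs would).
def train_async(num_workers=4, steps=200, lr=0.05):
    state = {"w": 0.0}
    lock = threading.Lock()

    def worker():
        for _ in range(steps):
            with lock:  # atomic read-modify-write of the shared parameter
                state["w"] -= lr * grad(state["w"])

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["w"]

# BSP-style synchronous updates: every worker computes a gradient on the
# same parameter value, then one averaged update is applied at the barrier.
def train_sync(num_workers=4, steps=200, lr=0.05):
    w = 0.0
    for _ in range(steps):
        grads = [grad(w) for _ in range(num_workers)]  # one "superstep"
        w -= lr * sum(grads) / num_workers
    return w

print(round(train_async(), 3), round(train_sync(), 3))  # both converge near 3.0
```

Both schedules reach the optimum here; the practical difference is that the asynchronous version never stalls on a slow worker, at the cost of applying gradients computed from stale parameter values.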
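The PS-worker point can also be made concrete with a minimal sketch. The `ParameterServer` class below is hypothetical, not part of TensorFlow: it only illustrates the role a ps task plays, namely holding stateful, in-place-mutable parameters that workers pull and push, which is exactly the capability immutable RDDs lack.

```python
class ParameterServer:
    """Holds mutable, stateful parameters (like Variables placed on a ps task)."""

    def __init__(self, params):
        self._params = dict(params)

    def pull(self, name):
        # Workers read the current parameter value.
        return self._params[name]

    def push(self, name, delta):
        # Workers send updates; the server mutates its state in place.
        self._params[name] += delta

ps = ParameterServer({"w": 0.0})
for g in [1.0, 0.5, -0.25]:       # gradients arriving from workers
    ps.push("w", -0.1 * g)        # apply each update on the server
print(round(ps.pull("w"), 3))     # → -0.125
```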
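What TensorFlow's queue-based input pipeline gives the user — a batch size, a shuffle switch, and an epoch limit — can be summarized in a few lines of plain Python. This generator is an illustrative sketch of the behavior, not the TensorFlow API:

```python
import random

def batched_input(data, batch_size, shuffle=True, num_epochs=1, seed=0):
    """Yield mini-batches, reshuffling the data once per epoch."""
    rng = random.Random(seed)
    for _ in range(num_epochs):          # epoch limit
        order = list(data)
        if shuffle:
            rng.shuffle(order)           # fresh shuffle each epoch
        for i in range(0, len(order), batch_size):
            yield order[i:i + batch_size]

# One epoch over ten examples in batches of four (the last batch is smaller).
batches = list(batched_input(range(10), batch_size=4, num_epochs=1))
print([len(b) for b in batches])  # → [4, 4, 2]
```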
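Subgraph execution can likewise be demonstrated with a toy dataflow graph. The sketch below mimics what fetching a node does in TensorFlow (only the ancestors of the requested node are evaluated); the graph representation and `run` function are my own illustration, not TensorFlow code:

```python
# Toy dataflow graph: each node maps to (function, list of input node names).
graph = {
    "a": (lambda: 2, []),
    "b": (lambda: 5, []),
    "sum": (lambda a, b: a + b, ["a", "b"]),
    "prod": (lambda a, b: a * b, ["a", "b"]),
}

def run(graph, fetch, cache=None):
    """Evaluate only the subgraph that `fetch` depends on, memoizing results."""
    if cache is None:
        cache = {}
    if fetch not in cache:
        fn, deps = graph[fetch]
        cache[fetch] = fn(*(run(graph, d, cache) for d in deps))
    return cache[fetch]

print(run(graph, "sum"))   # → 7; the "prod" node is never evaluated
```

This partial evaluation is what makes it cheap to test one branch of a large model, or to reuse a shared sub-network under several different heads.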