

Scala

Basics:

- Method definition: def abc(a: Type): String = { ... }
- obj.abc reads a field (getter); obj.abc = pop invokes the setter
- val x: Type = value binds a value; variables created by val are immutable (use var for mutable ones)
- class SomeClass(x: Type) — plain constructor parameters get no getter or setter, and instances must be created with new
- case class SomeClass(x: Type) — constructor parameters get getters automatically, and new is not needed
- Instance methods go on classes; "static" methods go on singleton objects
- Tuples in Scala have a fixed size, and the elements can be of different types. Creating a tuple: val t: (S, B) = (new S, new B)
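A minimal sketch of the points above (the names Plain, Point, and the field names are illustrative, not from any particular library):

```scala
object ScalaBasicsDemo {
  // Plain class: the constructor parameter `x` is not a field,
  // so there is no getter, and instances require `new`.
  class Plain(x: Int)

  // Case class: parameters become vals with getters, and `new` is
  // not needed because the companion object provides `apply`.
  // A `var` parameter additionally gets a setter.
  case class Point(x: Int, var y: Int)

  def main(args: Array[String]): Unit = {
    val p = Point(1, 2)                     // no `new` needed
    println(p.x)                            // getter on `x`
    p.y = 5                                 // setter on the `var` parameter

    val t: (Int, String) = (42, "answer")   // fixed-size, mixed-type tuple
    println(t._1 + " " + t._2)
  }
}
```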
Recent posts

Parameters to compare performance based on

For K-means, run the algorithm while varying:

- the value of K
- the number of iterations
- the number of executors
- the size of the dataset

Then use regression trees to find the parameters that affect total execution time the most:

1. total execution time (the quantity to predict)
2. number of jobs
3. average time per job
4. number of stages
5. average time per stage
6. number of shuffle reads/writes
7. number of retries
8. number of executors
9. average number of tasks per stage
10. time per task
11. thread-pool values for the executors

Repeat for MLlib, ML, TensorFlow and Dataflow, and check which has the best performance. Check values for each executor - ????
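The sweep over K and iteration count can be sketched as below. This is a hedged illustration using a toy in-memory k-means (Lloyd's iterations, first-k initialization) so it runs without a cluster; in the actual experiments the inner call would be Spark MLlib's KMeans, with the listed metrics collected from the Spark UI:

```scala
object KMeansSweep {
  type Pt = (Double, Double)

  def dist2(a: Pt, b: Pt): Double = {
    val dx = a._1 - b._1; val dy = a._2 - b._2
    dx * dx + dy * dy
  }

  // Toy k-means: Lloyd's iterations with the first k points as initial centers.
  def kmeans(points: Seq[Pt], k: Int, iters: Int): Seq[Pt] = {
    var centers: IndexedSeq[Pt] = points.take(k).toIndexedSeq
    for (_ <- 1 to iters) {
      // Assign each point to its nearest center, then recompute means.
      val assigned = points.groupBy(p => centers.indices.minBy(i => dist2(p, centers(i))))
      centers = centers.indices.map { i =>
        assigned.get(i)
          .map(ps => (ps.map(_._1).sum / ps.size, ps.map(_._2).sum / ps.size))
          .getOrElse(centers(i)) // keep an empty cluster's center unchanged
      }
    }
    centers
  }

  def main(args: Array[String]): Unit = {
    val rnd = new scala.util.Random(0)
    val data = Seq.fill(1000)((rnd.nextDouble(), rnd.nextDouble()))
    // The sweep described above: vary K and the number of iterations,
    // recording execution time for each combination.
    for (k <- Seq(2, 4, 8); iters <- Seq(5, 20)) {
      val t0 = System.nanoTime()
      kmeans(data, k, iters)
      val ms = (System.nanoTime() - t0) / 1e6
      println(f"k=$k%d  iters=$iters%d  time=$ms%.1f ms")
    }
  }
}
```

Varying executor count and dataset size would be handled the same way, as outer loops over spark-submit configurations rather than in-process parameters.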

Comparing various tools

The level of distribution: TensorFlow achieves distribution at the Graph level, via subgraph execution; the individual components of a TensorFlow graph (Tensor/Variable/Operation) cannot themselves be distributed. Spark's distribution is achieved at the RDD level, which is the foundation of Spark, so every RDD operation and every computational graph built on RDDs is distributed. TensorFlow supports asynchronous training: it arises naturally from the concurrent execution of replicated subgraphs, and synchronous training is also possible in distributed TensorFlow. Spark only supports synchronous computation, since it follows the Bulk Synchronous Parallel (BSP) model; asynchronous training in Spark MLlib is therefore rare. TensorFlow supports a parameter-server and worker structure: in distributed TensorFlow, the user can assign each device either a ps task or a worker task. I think this feature is...
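The BSP point can be made concrete with a small sketch, not from either framework but a plain-Scala illustration of synchronous supersteps: every worker computes, then all block at a barrier before anyone starts the next step, which is why per-worker asynchronous updates do not fit Spark's execution model:

```scala
import java.util.concurrent.CyclicBarrier
import java.util.concurrent.atomic.AtomicInteger

object BspSketch {
  // Runs `workers` threads for `steps` synchronous supersteps and returns
  // the total number of (worker, step) units completed.
  def run(workers: Int, steps: Int): Int = {
    val barrier = new CyclicBarrier(workers)
    val done = new AtomicInteger(0)
    val threads = (1 to workers).map { id =>
      new Thread(() => {
        for (step <- 1 to steps) {
          done.incrementAndGet() // stand-in for the local compute phase
          barrier.await()        // barrier: no worker starts step+1 early
        }
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    done.get()
  }

  def main(args: Array[String]): Unit =
    println(s"completed units: ${run(4, 3)}") // 4 workers x 3 supersteps
}
```

In an asynchronous parameter-server setup there is no such barrier: each worker pushes gradients and pulls parameters on its own schedule.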