For K means:
Run algorithm for various
-values of K
-num of iterations
-num of executors
- size of the dataset
Regression trees to find out the parameters that affect the total execution time the most.
1. total execution time - To predict
2. number of jobs
3. avg time per job
4. number of stages
5. avg time per stage
6. Number of shuffle and reads
7. Number of Retries
8. Number of executors
9. avg number of tasks per stage
10. time per task
11. Get thread pool values for executors.
(Repeat for MLlib, ML, tensorflow and dataflow) - check which has the best performance
Check values for each executor -
????
Run algorithm for various
-values of K
-num of iterations
-num of executors
- size of the dataset
Regression trees to find out the parameters that affect the total execution time the most.
1. total execution time - To predict
2. number of jobs
3. avg time per job
4. number of stages
5. avg time per stage
6. Number of shuffle and reads
7. Number of Retries
8. Number of executors
9. avg number of tasks per stage
10. time per task
11. Get thread pool values for executors.
(Repeat for MLlib, ML, tensorflow and dataflow) - check which has the best performance
Check values for each executor -
????
Comments
Post a Comment