Why I like Tez
https://github.com/t3rmin4t0r/notes/wiki/I-Like-Tez,-DevOps-Edition-%28WIP%29?utm_source=xp&utm_medium=blog&utm_campaign=content
Tez container reuse
https://stackoverflow.com/questions/45346104/how-container-reuse-works-in-apache-tez-while-reusing-what-is-the-data-stored-i
https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.1.5/bk_performance_tuning/content/ch01s01s02.html
Tez Hive doAs
https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.1.5/bk_performance_tuning/content/ch01s01s03.html
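Set hive.server2.enable.doAs to false. With doAs disabled, HiveServer2 runs queries on YARN under the Hive service user's identity rather than each individual user's identity. This helps with security and with reuse, since sessions and containers started under one identity can be shared across users' queries.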
hive.server2.enable.doAs=false
Timeline service v2
https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html
DAG information
https://docs.cloudera.com/HDPDocuments/DAS/DAS-1.3.0/operations/content/das_viewing_the_task_level_dag_information.html
DAG Flow
https://docs.cloudera.com/HDPDocuments/DAS/DAS-1.3.0/operations/content/das_viewing_the_dag_flow.html
Apache Tez : Accelerating Hadoop Data Processing
https://slideplayer.com/slide/3373591/
https://pt.slideshare.net/hortonworks/apache-tez-accelerating-hadoop-query-processing
Tez tools: swimlanes
https://github.com/apache/tez/tree/master/tez-tools/swimlanes
https://lists.apache.org/thread/ph12rgvd6o3z077qhhqybgmfozq44rny
More tooling for isolating which vertex is taking up time (and which task)
https://github.com/apache/tez/tree/master/tez-tools/swimlanes
or alternatively run
https://github.com/t3rmin4t0r/tez-swimlanes/blob/master/vertex.py
The first one should get you a graph which looks a lot like
http://people.apache.org/~gopalv/query4.svg
and the second one should get you something which looks like
http://people.apache.org/~gopalv/q21_suppliers_who_kept_orders_waiting.svg (note skewed tail in Reducer 3)
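To make the "skewed tail" idea concrete, here is a minimal Python sketch of the kind of check the swimlane graphs let you do by eye. It is not how the swimlanes tools work. It assumes you have already extracted per-task durations for each vertex. The `task_seconds` data and the `skew_report` helper are hypothetical and made up for illustration. For each vertex it compares the slowest task to the median task, so a vertex with one straggler stands out.

```python
from statistics import median

# Hypothetical task durations in seconds, keyed by vertex name.
# These numbers are invented for illustration, not taken from a real query.
task_seconds = {
    "Map 1": [12.0, 13.5, 11.8, 12.9],
    "Reducer 2": [20.1, 19.7, 21.3, 20.4],
    "Reducer 3": [15.2, 14.8, 16.1, 95.0],  # one straggler task
}


def skew_report(durations_by_vertex):
    """Return (vertex, slowest_task_index, max/median ratio), most skewed first."""
    rows = []
    for vertex, durations in durations_by_vertex.items():
        # Index of the longest-running task in this vertex.
        slowest = max(range(len(durations)), key=durations.__getitem__)
        # Ratio of the slowest task to the median task; near 1.0 means no skew.
        ratio = durations[slowest] / median(durations)
        rows.append((vertex, slowest, ratio))
    return sorted(rows, key=lambda row: row[2], reverse=True)


report = skew_report(task_seconds)
for vertex, index, ratio in report:
    print(f"{vertex}: slowest task #{index} ran {ratio:.1f}x the median")
```

A ratio well above 1 for a single vertex, as with the made-up "Reducer 3" here, is the numeric counterpart of the long tail visible in the second graph.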