Fetch size
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
https://issues.apache.org/jira/browse/HIVE-22853
set hive.server2.thrift.resultset.default.fetch.size=200;
!set fetchsize 200
Standard JDBC lets you specify the number of rows fetched with each database round trip for a query; this number is called the fetch size. Setting the fetch size in Beeline overrides the JDBC driver's default fetch size and affects subsequent statements executed in the current session.
- A value of -1 (the default) instructs Beeline to use the JDBC driver's default fetch size
- A value of zero or greater is passed to the JDBC driver for each statement
- Any other negative value throws an exception
Usage: !set fetchsize 200
Version: 4.0.0 (HIVE-22853)
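The three cases above can be sketched as a small validation helper (a hypothetical sketch of the rules only; Beeline implements this internally, and `check_fetchsize` is not a real Beeline command):

```shell
#!/bin/sh
# Hypothetical sketch of Beeline's fetchsize handling (HIVE-22853):
#   -1            -> fall back to the JDBC driver's default
#   zero or more  -> passed to the driver for each statement
#   other negative-> rejected with an error
check_fetchsize() {
  case "$1" in
    -1)       echo "driver default" ;;
    0|[0-9]*) echo "use $1" ;;
    *)        echo "error: invalid fetch size $1" >&2; return 1 ;;
  esac
}
```

e.g. `check_fetchsize 200` corresponds to `!set fetchsize 200` in a Beeline session.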
Hive query cache
https://docs.cloudera.com/cdw-runtime/cloud/hive-performance-tuning/topics/hive-query-results-cache.html
set hive.query.results.cache.enabled=false;
Hive args
https://github.com/hortonworks/hive-testbench
cd sample-queries-tpcds
hive -i testbench.settings
hive> use tpcds_bin_partitioned_orc_1000;
hive> source query55.sql;
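The steps above can be wrapped in a loop to run each sample query in sequence. A hedged sketch (the `run_queries` helper and the `$HIVE` override are assumptions, not part of hive-testbench):

```shell
#!/bin/sh
# Hedged sketch: run each TPC-DS sample query against the benchmark DB.
# HIVE can be overridden (e.g. HIVE=echo for a dry run); defaults to hive.
HIVE=${HIVE:-hive}

run_queries() {
  db=$1   # e.g. tpcds_bin_partitioned_orc_1000
  dir=$2  # e.g. sample-queries-tpcds
  for q in "$dir"/query*.sql; do
    echo "running $q"
    # -i loads the testbench settings, --database selects the schema
    $HIVE -i testbench.settings --database "$db" -f "$q" ||
      echo "failed: $q" >&2
  done
}
```

Usage: `run_queries tpcds_bin_partitioned_orc_1000 sample-queries-tpcds`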
Hive beeline conf set
https://community.cloudera.com/t5/Support-Questions/Beeline-Configuration-hiveconf-Equivalent/td-p/163357
opt 1 jdbc:hive2://hadoop.company.com:8443/default;transportMode=http;httpPath=gateway/default/hive?hive.execution.engine=tez;tez.queue.name=di;hive.exec.parallel=true;hive.vectorized.execution.enabled=true;hive.vectorized.execution.reduce.enabled
opt 2 [LAKE] [smanjee@lake1 ~]# beeline -u "jdbc:hive2://example.com:2181,example.com:2181,example:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" --hiveconf hive.execution.engine=mr
opt 3 beeline -n $user -p $password -u "$jdbc_url" -f $script --verbose true --property-file "query.properties" --fastConnect
query.properties
hive.execution.engine=tez
tez.queue.name=di
hive.exec.parallel=true
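As a quick check, the property file can be generated and inspected locally. A sketch: `--property-file` expects plain key=value lines (no trailing semicolons); the commented beeline call mirrors opt 3 above and assumes `$user`, `$password`, `$jdbc_url`, and `$script` are already set:

```shell
#!/bin/sh
# Sketch: write the property file as plain key=value lines, then hand it
# to beeline with --property-file (invocation commented, needs a cluster).
cat > query.properties <<'EOF'
hive.execution.engine=tez
tez.queue.name=di
hive.exec.parallel=true
EOF
# beeline -n "$user" -p "$password" -u "$jdbc_url" -f "$script" \
#   --property-file query.properties --fastConnect
```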
Load dummy data
https://github.com/SistemaStrategy/HiveDataPopulator
#!/bin/bash
if [[ -z $1 ]]; then
  echo 'You must provide the number of lines to generate as the first and only argument'
else
  hexdump -v -e '5/1 "%02x""\n"' /dev/urandom |
    awk -v OFS=';' '
      NR == 1 { print "col1", "col2", "col3" }
      { print substr($0, 1, 8), substr($0, 9, 2), int(NR * 32768 * rand()) }' |
    head -n "$1" > random_values.csv
fi
hdfs dfs -put random_values.csv /tmp/
$HIVE -e "CREATE DATABASE IF NOT EXISTS test; USE test; CREATE TABLE random (col1 string,col2 string,col3 int) row format delimited fields terminated by '\073' stored as textfile;"
$HIVE -e "use test; LOAD DATA INPATH '/tmp/random_values.csv' OVERWRITE INTO TABLE random;"
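Before pushing to HDFS, the pipeline can be sanity-checked locally. A sketch reusing the hexdump/awk pipeline from the script above (`gen_rows` is a hypothetical wrapper, not part of HiveDataPopulator); it verifies each row has the three ';'-separated fields the table DDL expects:

```shell
#!/bin/sh
# Sketch: emit N lines in the same format as the generator script
# (header row on line 1, then random ';'-delimited rows).
gen_rows() {
  hexdump -v -e '5/1 "%02x""\n"' /dev/urandom |
    awk -v OFS=';' '
      NR == 1 { print "col1", "col2", "col3" }
      { print substr($0, 1, 8), substr($0, 9, 2), int(NR * 32768 * rand()) }' |
    head -n "$1"
}
# Every line should have exactly 3 fields to match the table definition.
gen_rows 5 | awk -F';' 'NF != 3 { exit 1 }'
```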