Fetch size
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
https://issues.apache.org/jira/browse/HIVE-22853
set hive.server2.thrift.resultset.default.fetch.size=200;
!set fetchsize 200
Standard JDBC lets you specify the number of rows fetched with each database round trip for a query; this number is called the fetch size. Setting the fetch size in Beeline overrides the JDBC driver's default fetch size and affects subsequent statements executed in the current session.
- A value of -1 (the default) instructs Beeline to use the JDBC driver's default fetch size
- A value of zero or greater is passed to the JDBC driver for each statement
- Any other negative value throws an exception
Usage: !set fetchsize 200
Version: 4.0.0 (HIVE-22853)
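The three cases above can be sketched as a small validation helper (a hypothetical sketch of the rules only; Beeline implements this internally, and `check_fetchsize` is not a real Beeline command):

```shell
#!/bin/sh
# Hypothetical sketch of Beeline's fetchsize handling (HIVE-22853):
#   -1            -> fall back to the JDBC driver's default
#   zero or more  -> passed to the driver for each statement
#   other negative-> rejected with an error
check_fetchsize() {
  case "$1" in
    -1)       echo "driver default" ;;
    0|[0-9]*) echo "use $1" ;;
    *)        echo "error: invalid fetch size $1" >&2; return 1 ;;
  esac
}
```

e.g. `check_fetchsize 200` corresponds to `!set fetchsize 200` in a Beeline session.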
Hive query cache
https://docs.cloudera.com/cdw-runtime/cloud/hive-performance-tuning/topics/hive-query-results-cache.html
set hive.query.results.cache.enabled=false;
Hive args
https://github.com/hortonworks/hive-testbench
cd sample-queries-tpcds
hive -i testbench.settings
hive> use tpcds_bin_partitioned_orc_1000;
hive> source query55.sql;
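The steps above can be wrapped in a loop to run each sample query in sequence. A hedged sketch (the `run_queries` helper and the `$HIVE` override are assumptions, not part of hive-testbench):

```shell
#!/bin/sh
# Hedged sketch: run each TPC-DS sample query against the benchmark DB.
# HIVE can be overridden (e.g. HIVE=echo for a dry run); defaults to hive.
HIVE=${HIVE:-hive}

run_queries() {
  db=$1   # e.g. tpcds_bin_partitioned_orc_1000
  dir=$2  # e.g. sample-queries-tpcds
  for q in "$dir"/query*.sql; do
    echo "running $q"
    # -i loads the testbench settings, --database selects the schema
    $HIVE -i testbench.settings --database "$db" -f "$q" ||
      echo "failed: $q" >&2
  done
}
```

Usage: `run_queries tpcds_bin_partitioned_orc_1000 sample-queries-tpcds`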
Hive beeline conf set
https://community.cloudera.com/t5/Support-Questions/Beeline-Configuration-hiveconf-Equivalent/td-p/163357
opt 1 jdbc:hive2://hadoop.company.com:8443/default;transportMode=http;httpPath=gateway/default/hive?hive.execution.engine=tez;tez.queue.name=di;hive.exec.parallel=true;hive.vectorized.execution.enabled=true;hive.vectorized.execution.reduce.enabled
opt 2 [LAKE] [smanjee@lake1 ~]# beeline -u "jdbc:hive2://example.com:2181,example.com:2181,example:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" --hiveconf hive.execution.engine=mr
opt 3 beeline -n $user -p $password -u "$jdbc_url" -f $script --verbose true --property-file "query.properties" --fastConnect
query.properties
hive.execution.engine=tez
tez.queue.name=di
hive.exec.parallel=true
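As a quick check, the property file can be generated and inspected locally. A sketch: `--property-file` expects plain key=value lines (no trailing semicolons); the commented beeline call mirrors opt 3 above and assumes `$user`, `$password`, `$jdbc_url`, and `$script` are already set:

```shell
#!/bin/sh
# Sketch: write the property file as plain key=value lines, then hand it
# to beeline with --property-file (invocation commented, needs a cluster).
cat > query.properties <<'EOF'
hive.execution.engine=tez
tez.queue.name=di
hive.exec.parallel=true
EOF
# beeline -n "$user" -p "$password" -u "$jdbc_url" -f "$script" \
#   --property-file query.properties --fastConnect
```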
Load dummy data
https://github.com/SistemaStrategy/HiveDataPopulator
#!/bin/bash
if [[ -z $1 ]]; then
  echo 'You must provide the number of lines to generate as the first and only argument'
else
  hexdump -v -e '5/1 "%02x""\n"' /dev/urandom |
    awk -v OFS=';' '
      NR == 1 { print "col1", "col2", "col3" }
      { print substr($0, 1, 8), substr($0, 9, 2), int(NR * 32768 * rand()) }' |
    head -n "$1" > random_values.csv
fi
hdfs dfs -put random_values.csv /tmp/
$HIVE -e "CREATE DATABASE IF NOT EXISTS test; USE test; CREATE TABLE random (col1 string,col2 string,col3 int) row format delimited fields terminated by '\073' stored as textfile;"
$HIVE -e "use test; LOAD DATA INPATH '/tmp/random_values.csv' OVERWRITE INTO TABLE random;"
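Before pushing to HDFS, the pipeline can be sanity-checked locally. A sketch reusing the hexdump/awk pipeline from the script above (`gen_rows` is a hypothetical wrapper, not part of HiveDataPopulator); it verifies each row has the three ';'-separated fields the table DDL expects:

```shell
#!/bin/sh
# Sketch: emit N lines in the same format as the generator script
# (header row on line 1, then random ';'-delimited rows).
gen_rows() {
  hexdump -v -e '5/1 "%02x""\n"' /dev/urandom |
    awk -v OFS=';' '
      NR == 1 { print "col1", "col2", "col3" }
      { print substr($0, 1, 8), substr($0, 9, 2), int(NR * 32768 * rand()) }' |
    head -n "$1"
}
# Every line should have exactly 3 fields to match the table definition.
gen_rows 5 | awk -F';' 'NF != 3 { exit 1 }'
```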