Carlos Aguni

Highly motivated self-taught IT analyst. Always learning and ready to explore new skills. An eternal apprentice.


Big Data Benchmark examples

26 Mar 2022

SLR TPC Benchmark

  • https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=tpc+ds+&btnG=
    • The Making of TPC-DS
      • https://www.tpc.org/tpcds/presentations/the_making_of_tpcds.pdf
    • Why You Should Run TPC-DS: A Workload Analysis
      • https://www.vldb.org/conf/2007/papers/industrial/p1138-poess.pdf

We Spent a Bunch of Money on AWS And All We Got Was a Bunch of Experience and Some Great Benchmark Results

https://www.singlestore.com/blog/memsql-tpc-benchmarks/

TL;DR: This is a long post because we wanted to write an in-depth overview of how we ran the benchmarks, so you can see how we achieved all these results. But if you just want a quick summary, here's how we did on the benchmarks:

  • TPC-C: SingleStore scaled performance nearly linearly over a scale factor of 50x.
  • TPC-H: SingleStore’s performance against other databases varied on this benchmark, but was faster than multiple modern scale-out database products that only support data warehousing.
  • TPC-DS: SingleStore’s performance ranged from similar to as much as two times faster than other databases. Expressed as a geometric mean, as is often done for benchmarking results, our performance was excellent.
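The TPC-DS summary above reports results as a geometric mean. As a rough illustration of why that aggregate is popular, here is a minimal sketch of the geometric mean of per-query runtimes: averaging in log space means one very slow query cannot dominate the score the way it dominates an arithmetic mean. The runtimes below are made-up values, not results from the SingleStore post.

```python
import math


def geometric_mean(runtimes_s):
    """Geometric mean of per-query runtimes in seconds; every value must be > 0."""
    if not runtimes_s:
        raise ValueError("need at least one runtime")
    if any(t <= 0 for t in runtimes_s):
        raise ValueError("runtimes must be positive")
    # Average the logarithms, then exponentiate back to seconds.
    return math.exp(sum(math.log(t) for t in runtimes_s) / len(runtimes_s))


# Hypothetical per-query runtimes (seconds), for illustration only.
runtimes = [1.0, 4.0, 16.0]
print(geometric_mean(runtimes))       # ~4.0 (cube root of 1 * 4 * 16 = 64)
print(sum(runtimes) / len(runtimes))  # 7.0, pulled up by the 16 s query
```

Python 3.8+ also ships `statistics.geometric_mean`, which computes the same quantity.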

Characterizing TPCx-BB queries, Hive, and Spark in multi-cloud environments

https://upcommons.upc.edu/bitstream/handle/2117/114812/Characterizing%20TPCx-BB%20queries%2C.pdf

  • [22] - Big Data Compendium
    • 10.1007/978-3-319-31409-9_9

Querying 6.35 Billion Records - a TPC-DS Performance and Cost Comparison between Big Data platforms Starburst Enterprise and EMR SQL engines

https://www.concurrencylabs.com/blog/starburst-enterprise-vs-aws-emr-sql-tpcds/

I created two 1 TB TPC-DS data sets (ORC and Parquet), stored in AWS S3. Data sets contain approximately 6.35 billion records stored in 24 tables. The TPC-DS standard also consists of 99 SQL queries. For this test, the following tables were partitioned as described below:

  • catalog_returns on cr_returned_date_sk
  • catalog_sales on cs_sold_date_sk
  • store_returns on sr_returned_date_sk
  • store_sales on ss_sold_date_sk
  • web_returns on wr_returned_date_sk
  • web_sales on ws_sold_date_sk
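The post lists which column each fact table is partitioned on but does not show the S3 layout. A common reason to partition fact tables on their date key is partition pruning: a query that filters on a date range only needs to read the matching partition directories. The sketch below assumes a Hive-style `column=value` directory layout (the convention used by Hive and Spark for partitioned tables); the bucket prefix `s3://example-bucket/tpcds-1tb` and the helper names are hypothetical, and the date-key values are illustrative only.

```python
from typing import Iterable, List

# Table -> partition column, as listed in the post.
PARTITION_KEYS = {
    "catalog_returns": "cr_returned_date_sk",
    "catalog_sales": "cs_sold_date_sk",
    "store_returns": "sr_returned_date_sk",
    "store_sales": "ss_sold_date_sk",
    "web_returns": "wr_returned_date_sk",
    "web_sales": "ws_sold_date_sk",
}


def partition_prefix(base_uri: str, table: str, date_sk: int) -> str:
    """Hive-style partition directory: <base>/<table>/<column>=<value>/ (assumed layout)."""
    column = PARTITION_KEYS[table]
    return f"{base_uri.rstrip('/')}/{table}/{column}={date_sk}/"


def partitions_to_scan(
    base_uri: str, table: str, partition_values: Iterable[int], lo: int, hi: int
) -> List[str]:
    """Keep only partitions whose date key lies in [lo, hi]; the others are never read."""
    return [
        partition_prefix(base_uri, table, v)
        for v in sorted(partition_values)
        if lo <= v <= hi
    ]


# Hypothetical bucket and date keys, for illustration only.
base = "s3://example-bucket/tpcds-1tb"
print(partitions_to_scan(base, "store_sales", [2450815, 2450816, 2450817], 2450816, 2450816))
```

With one partition per date key, a filter on a narrow date range touches a handful of directories instead of the whole table, which is typically what makes date partitioning worthwhile for the large `*_sales` and `*_returns` tables.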