CryptoABC.net

Optimizing Parquet String Data Compression with RAPIDS

July 17, 2024
in Blockchain
Jessie A Ellis
Jul 17, 2024 17:53

Discover how to optimize encoding and compression for Parquet string data using RAPIDS, leading to significant performance improvements.





Parquet writers offer various encoding and compression options that are turned off by default. Enabling these options can provide better lossless compression for your data, but understanding which options to use is crucial for optimal performance, according to the NVIDIA Technical Blog.

Understanding Parquet Encoding and Compression

Parquet’s encoding step reorganizes data to reduce its size while preserving access to each data point. The compression step further reduces the total size in bytes but requires decompression before accessing the data again. The Parquet format includes two delta encodings designed to optimize string data storage: DELTA_LENGTH_BYTE_ARRAY (DLBA) and DELTA_BYTE_ARRAY (DBA).
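The idea behind the two delta encodings can be illustrated with a minimal pure-Python sketch. It is not the libcudf or arrow-cpp implementation: the function names are hypothetical, and the real format additionally packs the length arrays with delta binary packing, which is omitted here. DLBA stores all string lengths separately from the concatenated bytes, while DBA stores, for each value, the length of the prefix it shares with the previous value plus the remaining suffix.

```python
def dlba_encode(values):
    """DLBA idea: keep the lengths in one array and the raw bytes in another."""
    lengths = [len(v) for v in values]
    data = b"".join(values)
    return lengths, data


def dlba_decode(lengths, data):
    """Rebuild the strings by slicing the byte buffer with the stored lengths."""
    out, pos = [], 0
    for n in lengths:
        out.append(data[pos:pos + n])
        pos += n
    return out


def dba_encode(values):
    """DBA idea: store the shared-prefix length with the previous value, plus the suffix."""
    prefix_lengths, suffixes = [], []
    prev = b""
    for v in values:
        n = 0
        limit = min(len(prev), len(v))
        while n < limit and prev[n] == v[n]:
            n += 1
        prefix_lengths.append(n)
        suffixes.append(v[n:])
        prev = v
    return prefix_lengths, suffixes


def dba_decode(prefix_lengths, suffixes):
    """Rebuild each value from the previous value's prefix and the stored suffix."""
    out, prev = [], b""
    for n, suffix in zip(prefix_lengths, suffixes):
        value = prev[:n] + suffix
        out.append(value)
        prev = value
    return out


if __name__ == "__main__":
    words = [b"apple", b"applesauce", b"apply", b"banana"]
    print(dlba_encode(words))
    print(dba_encode(words))
```

In this sketch, sorted input with long common prefixes leaves only short suffixes to store, which is consistent with the article's later remark that DBA works best on sorted or semi-sorted data.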

RAPIDS libcudf and cudf.pandas

RAPIDS is a suite of open-source accelerated data science libraries. In this context, libcudf is the CUDA C++ library for columnar data processing. It supports GPU-accelerated readers, writers, relational algebra functions, and column transformations. The Python cudf.pandas library accelerates existing pandas code by up to 150x.

Benchmarking with Kaggle String Data

A dataset of 149 string columns, with a total file size of 4.6 GB and 12 billion characters in total, was used to compare encoding and compression methods. The study found less than 1% difference in encoded size between libcudf and arrow-cpp, and a 3-8% increase in file size when using the ZSTD implementation in nvCOMP 3.0.6 compared to libzstd 1.4.8+dfsg-3build1.

String Encodings in Parquet

String data in Parquet is represented using the byte array physical type. Most writers default to RLE_DICTIONARY encoding for string data, which uses a dictionary page to map string values to integers. If the dictionary page grows too large, the writer falls back to PLAIN encoding.
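The dictionary-with-fallback behavior can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any writer's actual code: the function name and the `max_dict_bytes` limit are hypothetical, the sketch re-encodes the whole column on fallback (real writers typically switch encoding only for subsequent pages), and the RLE/bit-packing of the integer indices is left out.

```python
def dictionary_encode(values, max_dict_bytes=1024 * 1024):
    """Map each distinct string to an integer index.

    Returns (encoding_name, payload, indices). If the dictionary would grow
    beyond max_dict_bytes (a hypothetical limit), the sketch gives up and
    returns the values unchanged, labeled PLAIN.
    """
    dictionary = {}  # value -> index, in first-seen order
    indices = []
    dict_bytes = 0
    for v in values:
        idx = dictionary.get(v)
        if idx is None:
            dict_bytes += len(v)
            if dict_bytes > max_dict_bytes:
                # Dictionary too large: fall back to storing values as-is.
                return "PLAIN", list(values), None
            idx = len(dictionary)
            dictionary[v] = idx
        indices.append(idx)
    return "RLE_DICTIONARY", list(dictionary), indices


if __name__ == "__main__":
    print(dictionary_encode([b"au", b"nz", b"au", b"au"]))
    print(dictionary_encode([b"au", b"nz", b"au", b"au"], max_dict_bytes=3))
```

Columns with few distinct values stay small because each repeated string costs only one integer, whereas high-cardinality columns fill the dictionary quickly and trigger the fallback.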

Total File Size by Encoding and Compression

For the 149 string columns in the dataset, the default setting of dictionary encoding and SNAPPY compression yields a total file size of 4.6 GB. ZSTD compression produces smaller files than SNAPPY, and both produce smaller files than the uncompressed options. The best single setting for the dataset is default-ZSTD, with further reductions possible by using delta encoding under specific conditions.

When to Choose Delta Encoding

Delta encoding is beneficial for data with high cardinality or long string lengths, generally achieving smaller file sizes. For string columns with fewer than 50 characters per string, DBA encoding can provide significant file size reductions, especially for sorted or semi-sorted data.
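The column properties mentioned above (cardinality, string length, and how much adjacent values overlap) can be measured before choosing an encoding. The following is a rough sketch only: the function names and the returned keys are hypothetical, and it deliberately reports statistics rather than making the choice, since the article gives no exact thresholds beyond the 50-character figure.

```python
def shared_prefix_len(a, b):
    """Number of leading characters that two strings have in common."""
    n = 0
    limit = min(len(a), len(b))
    while n < limit and a[n] == b[n]:
        n += 1
    return n


def profile_string_column(values):
    """Summarize the properties that influence how well delta encoding works."""
    n = len(values)
    if n == 0:
        return {"distinct_ratio": 0.0, "avg_length": 0.0, "avg_shared_prefix": 0.0}
    shared = sum(shared_prefix_len(a, b) for a, b in zip(values, values[1:]))
    return {
        # Close to 1.0 means high cardinality (few repeated values).
        "distinct_ratio": len(set(values)) / n,
        "avg_length": sum(len(v) for v in values) / n,
        # Average prefix shared between neighbors; rises when data is sorted.
        "avg_shared_prefix": shared / (n - 1) if n > 1 else 0.0,
    }


if __name__ == "__main__":
    column = ["user/1001", "order/17", "user/1002", "order/18"]
    print(profile_string_column(column))
    print(profile_string_column(sorted(column)))
```

On this toy column, sorting raises the average shared prefix from 0 to 5 characters while the other two statistics stay the same, which is the kind of data where prefix-based DBA encoding has the most to gain.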

Reader and Writer Performance

The GPU-accelerated cudf.pandas library showed impressive performance compared to pandas, with 17-25x faster Parquet read speeds. Using cudf.pandas with an RMM pool further improved throughput to 552 MB/s read and 263 MB/s write speeds.

Conclusion

RAPIDS libcudf offers flexible, GPU-accelerated tools for reading and writing columnar data in formats such as Parquet, ORC, JSON, and CSV. For those looking to leverage GPU acceleration for Parquet processing, RAPIDS cudf.pandas and libcudf provide significant performance benefits.

Image source: Shutterstock



CryptoABC.net

This is an Australian online news/education portal that aims to provide the latest crypto news, real-time updates, education and reviews within Australia and around the world. Feel free to get in touch with us!


© 2021 cryptoabc.net - All rights reserved!
