CryptoABC.net

NVIDIA Unveils Mistral-NeMo-Minitron 8B Model with Superior Accuracy

August 22, 2024


Tony Kim
Aug 22, 2024 05:37

NVIDIA’s new Mistral-NeMo-Minitron 8B model demonstrates superior accuracy across nine benchmarks, utilizing advanced pruning and distillation techniques.





NVIDIA, in collaboration with Mistral AI, has announced the release of the Mistral-NeMo-Minitron 8B model, a highly advanced open-access large language model (LLM). According to the NVIDIA Technical Blog, this model surpasses other models of a similar size in terms of accuracy on nine popular benchmarks.

Advanced Model Pruning and Distillation

The Mistral-NeMo-Minitron 8B model was developed by width-pruning the larger Mistral NeMo 12B model, followed by a light retraining process using knowledge distillation. NVIDIA originally proposed this methodology in its paper "Compact Language Models via Pruning and Knowledge Distillation" and has since applied it to the NVIDIA Minitron 8B and 4B models and to the Llama-3.1-Minitron 4B model.

Model pruning involves reducing the size and complexity of a model by either dropping layers (depth pruning) or neurons and attention heads (width pruning). This process is often paired with retraining to recover any lost accuracy. Model distillation, on the other hand, transfers knowledge from a large, complex model (the teacher model) to a smaller, simpler model (the student model), aiming to retain much of the predictive power of the original model while being more efficient.
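To make width pruning concrete, here is a minimal sketch on a toy two-layer network stored as plain Python lists. The importance score used here (mean absolute activation over a small calibration batch) is an assumption chosen for illustration; the article does not say which criterion NVIDIA used, and names such as `width_prune` and `calibration_batch` are hypothetical.

```python
# Illustrative sketch only: a toy MLP with weights held in nested lists.
# The importance criterion (mean |activation|) is an assumption, not
# necessarily the one used for Mistral-NeMo-Minitron 8B.

def relu(x):
    return x if x > 0.0 else 0.0

def hidden_activations(w_in, x):
    """Hidden-layer activations for one input vector x.
    w_in[j] holds the input weights of hidden neuron j."""
    return [relu(sum(w * xi for w, xi in zip(row, x))) for row in w_in]

def width_prune(w_in, w_out, calibration_batch, keep):
    """Keep the `keep` hidden neurons with the largest mean |activation|.

    w_in:  hidden x input weights (one row per hidden neuron)
    w_out: output x hidden weights (one column per hidden neuron)
    """
    n_hidden = len(w_in)
    scores = [0.0] * n_hidden
    for x in calibration_batch:
        for j, a in enumerate(hidden_activations(w_in, x)):
            scores[j] += abs(a) / len(calibration_batch)
    # Pick the top-`keep` neurons, then restore their original order.
    ranked = sorted(range(n_hidden), key=lambda j: scores[j], reverse=True)
    kept = sorted(ranked[:keep])
    pruned_in = [w_in[j] for j in kept]                     # drop rows
    pruned_out = [[row[j] for j in kept] for row in w_out]  # drop columns
    return pruned_in, pruned_out, kept

# Toy example: 4 hidden neurons, 2 inputs, 1 output; prune to 2 neurons.
W_IN = [[1.0, 0.0], [0.0, 0.01], [0.5, 0.5], [-1.0, -1.0]]
W_OUT = [[0.3, 0.9, -0.2, 0.7]]
BATCH = [[1.0, 2.0], [2.0, 1.0]]
pruned_in, pruned_out, kept = width_prune(W_IN, W_OUT, BATCH, keep=2)
print("kept neurons:", kept)
```

The layer count is unchanged; only the hidden width shrinks, which is what distinguishes width pruning from depth pruning. In a real transformer the same idea would apply to MLP neurons, attention heads and embedding channels.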

The combination of pruning and distillation allows for the creation of progressively smaller models from a large pretrained model. This approach significantly reduces the computational cost, as only 100-400 billion tokens are needed for retraining, compared to the much larger datasets required for training from scratch.
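The retraining step after pruning is driven by a distillation objective. As a minimal sketch, assuming logit-based distillation with a KL-divergence loss (a common formulation; the article does not state the exact loss), the student is penalized for diverging from the teacher's output distribution at each token position. The function names and the `temperature` parameter are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) over the vocabulary for one token position.

    Assumption: forward KL on logits; other distillation losses exist.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

teacher = [2.0, 1.0, 0.1]
loss_same = distillation_loss(teacher, teacher)          # student matches teacher
loss_diff = distillation_loss(teacher, [0.1, 1.0, 2.0])  # student disagrees
print(loss_same, loss_diff)
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, so minimizing it over the retraining tokens pulls the pruned model back toward the teacher's behavior.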

Mistral-NeMo-Minitron 8B Performance

The Mistral-NeMo-Minitron 8B base model leads its size class on all nine reported benchmarks, outperforming both Llama 3.1 8B and Gemma 7B. The table below gives the scores:








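| Model | Training tokens | Winogrande 5-shot | ARC Challenge 25-shot | MMLU 5-shot | HellaSwag 10-shot | GSM8K 5-shot | TruthfulQA 0-shot | XLSum en (20%) 3-shot | MBPP 0-shot | HumanEval 0-shot |
|---|---|---|---|---|---|---|---|---|---|---|
| Llama 3.1 8B | 15T | 77.27 | 57.94 | 65.28 | 81.80 | 48.60 | 45.06 | 30.05 | 42.27 | 24.76 |
| Gemma 7B | 6T | 78 | 61 | 64 | 82 | 50 | 45 | 17 | 39 | 32 |
| Mistral-NeMo-Minitron 8B | 380B | 80.35 | 64.42 | 69.51 | 83.03 | 58.45 | 47.56 | 31.94 | 43.77 | 36.22 |
| Mistral NeMo 12B | N/A | 82.24 | 65.10 | 68.99 | 85.16 | 56.41 | 49.79 | 33.43 | 42.63 | 23.78 |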

Table 1. Accuracy of the Mistral-NeMo-Minitron 8B base model compared to the teacher Mistral-NeMo 12B, Gemma 7B, and Llama-3.1 8B base models. Within the 8B model class, Mistral-NeMo-Minitron 8B has the best score on every benchmark.
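The comparison in Table 1 can be checked with a few lines of code. The sketch below copies the scores from the table into a dictionary (the variable names are illustrative) and asks two questions: which model wins each benchmark within the 8B class, and on which benchmarks the 8B student scores above its 12B teacher.

```python
# Scores transcribed from Table 1; column order follows the table.
BENCHMARKS = ["Winogrande", "ARC Challenge", "MMLU", "HellaSwag", "GSM8K",
              "TruthfulQA", "XLSum en (20%)", "MBPP", "HumanEval"]
SCORES = {
    "Llama 3.1 8B": [77.27, 57.94, 65.28, 81.80, 48.60, 45.06, 30.05, 42.27, 24.76],
    "Gemma 7B": [78, 61, 64, 82, 50, 45, 17, 39, 32],
    "Mistral-NeMo-Minitron 8B": [80.35, 64.42, 69.51, 83.03, 58.45, 47.56, 31.94, 43.77, 36.22],
    "Mistral NeMo 12B": [82.24, 65.10, 68.99, 85.16, 56.41, 49.79, 33.43, 42.63, 23.78],
}
EIGHT_B_CLASS = ["Llama 3.1 8B", "Gemma 7B", "Mistral-NeMo-Minitron 8B"]
STUDENT, TEACHER = "Mistral-NeMo-Minitron 8B", "Mistral NeMo 12B"

def best_in_class(benchmark_index):
    """Model with the highest score on one benchmark, 8B class only."""
    return max(EIGHT_B_CLASS, key=lambda m: SCORES[m][benchmark_index])

winners = {name: best_in_class(i) for i, name in enumerate(BENCHMARKS)}

# Benchmarks where the pruned 8B student scores above its 12B teacher.
beats_teacher = [name for i, name in enumerate(BENCHMARKS)
                 if SCORES[STUDENT][i] > SCORES[TEACHER][i]]

print(winners)
print(beats_teacher)
```

On these numbers the student wins every benchmark in its class and also edges past the teacher on MMLU, GSM8K, MBPP and HumanEval, while the teacher stays ahead on the other five.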

Implementation and Future Work

Following the best practices of structured weight pruning and knowledge distillation, the Mistral-NeMo 12B model was width-pruned to yield the 8B target model. The process involved fine-tuning the unpruned Mistral NeMo 12B model using 127 billion tokens to correct for distribution shifts, followed by width-only pruning and distillation using 380 billion tokens.
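The token figures above can be put in perspective with some quick arithmetic. This sketch adds up the two stages described in the paragraph and compares them with the 15T training tokens listed for Llama 3.1 8B in Table 1. It counts only the tokens spent on teacher fine-tuning and distillation, not the original pretraining of the 12B teacher, whose token count is not given in the article.

```python
B = 10**9   # one billion
T = 10**12  # one trillion

teacher_finetune_tokens = 127 * B  # fine-tuning the unpruned 12B teacher
distillation_tokens = 380 * B      # width-only pruning + distillation
llama_31_8b_tokens = 15 * T        # Llama 3.1 8B, from Table 1

total_tokens = teacher_finetune_tokens + distillation_tokens
ratio = llama_31_8b_tokens / distillation_tokens  # how many times fewer
share = total_tokens / llama_31_8b_tokens         # fraction of 15T

print(f"total tokens for both stages: {total_tokens // B}B")
print(f"distillation uses about {ratio:.1f}x fewer tokens than 15T")
print(f"both stages together are about {share:.1%} of 15T")
```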

The Mistral-NeMo-Minitron 8B model outperforms similarly sized models on the reported benchmarks while using 380 billion training tokens, against 15 trillion for Llama 3.1 8B. NVIDIA plans to continue refining the distillation process to produce even smaller and more accurate models, and the technique will be gradually integrated into the NVIDIA NeMo framework for generative AI.

For further details, visit the NVIDIA Technical Blog.

© 2021 cryptoabc.net - All rights reserved!
