FuriosaAI Ends 2024 on a High Note: Llama 3.1 Performance, SDK Release, Leadership Expansion
SANTA CLARA, Calif., Dec. 19, 2024 /PRNewswire/ -- FuriosaAI, an emerging leader in AI semiconductor solutions, is closing out the year with rapid technical and customer progress on its second-generation chip, RNGD (pronounced 'Renegade'). The recently announced AI solution has achieved compelling performance metrics in real-world enterprise deployments, meeting the demand for inference with advanced large language and multimodal models.
The new performance benchmarks showcase RNGD's ability to meet industry-leading throughput demands for Llama 3.1 models, including the 8B and 70B variants, with additional optimizations already in progress. The company also announced key software features that bring advanced optimization for customers currently sampling RNGD hardware in their production environments. These achievements represent the first phase of Furiosa's vision for AI infrastructure that overcomes the inherent limitations of GPUs.
RNGD delivers winning throughput metrics with Llama 3.1 8B and 70B:
Building on the AI-native Tensor Contraction Processor (TCP) architecture of RNGD, Furiosa is redefining real-world AI deployments, delivering unmatched performance, programmability, and power efficiency. Furiosa's RNGD recently achieved a throughput of 3,200–3,300 tokens per second (TPS) when running the Llama 3.1 8B model. In single-user scenarios, RNGD consistently delivers 40–60 TPS.
Additionally, RNGD demonstrates exceptional power efficiency, consuming 181 W per card, with further optimization efforts underway. Rather than maximizing per-user speed, the company aims to keep per-user performance above typical text-reading speeds (10–20 TPS or higher) while optimizing throughput for multi-user environments.
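For readers who want to relate the throughput and power figures, the following is a back-of-the-envelope sketch rather than a Furiosa benchmark. It assumes the quoted 181 W per card was drawn while the card sustained the quoted 3,200–3,300 TPS on Llama 3.1 8B, which the release does not state explicitly; the function names are illustrative.

```python
# Figures quoted in the release for Llama 3.1 8B on one RNGD card.
THROUGHPUT_TPS = (3200, 3300)  # aggregate tokens per second (low, high)
CARD_POWER_W = 181             # watts per card

def tokens_per_joule(tps: float, watts: float) -> float:
    """Tokens per second divided by joules per second gives tokens per joule."""
    return tps / watts

def joules_per_token(tps: float, watts: float) -> float:
    """Energy spent on each generated token, in joules."""
    return watts / tps

if __name__ == "__main__":
    # ASSUMPTION: power and throughput were measured under the same load.
    for tps in THROUGHPUT_TPS:
        print(
            f"{tps} TPS at {CARD_POWER_W} W: "
            f"{tokens_per_joule(tps, CARD_POWER_W):.2f} tokens/J, "
            f"{joules_per_token(tps, CARD_POWER_W) * 1000:.1f} mJ/token"
        )
```

Under that assumption the card lands at roughly 17.7 to 18.2 tokens per joule. Host CPU, memory, and cooling power are outside the per-card figure and are not included.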
Furiosa is also advancing the performance and efficiency of the Llama 3.1 70B model, which runs effectively on just two RNGD cards. Currently, a single server supports up to 100 concurrent user queries, with ongoing optimizations aiming to achieve 8,000 TPS per server when equipped with 8 RNGD cards.
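The server-level target can be broken down per card and per model replica. This is a rough sketch under two assumptions the release does not confirm: that the 8,000 TPS target refers to Llama 3.1 70B, and that each 70B replica occupies exactly two cards, so an 8-card server hosts four replicas. All names are illustrative.

```python
# Figures quoted in the release for an 8-card RNGD server.
SERVER_CARDS = 8
TARGET_SERVER_TPS = 8000   # stated optimization target, not a measured result
CARDS_PER_REPLICA = 2      # "just two RNGD cards" for Llama 3.1 70B

def replicas_per_server(cards: int, cards_per_replica: int) -> int:
    """Number of whole model replicas that fit in one server."""
    return cards // cards_per_replica

def target_tps_per_card(server_tps: float, cards: int) -> float:
    """Server target spread evenly across the cards."""
    return server_tps / cards

def target_tps_per_replica(server_tps: float, cards: int, cards_per_replica: int) -> float:
    """Server target spread evenly across the replicas (ASSUMES an even split)."""
    return server_tps / replicas_per_server(cards, cards_per_replica)

if __name__ == "__main__":
    print("replicas per server:", replicas_per_server(SERVER_CARDS, CARDS_PER_REPLICA))
    print("target TPS per card:", target_tps_per_card(TARGET_SERVER_TPS, SERVER_CARDS))
    print(
        "target TPS per replica:",
        target_tps_per_replica(TARGET_SERVER_TPS, SERVER_CARDS, CARDS_PER_REPLICA),
    )
```

Under these assumptions the target works out to 1,000 TPS per card, or 2,000 TPS per two-card replica.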
With the release of SDK v2024.3.0, Furiosa will expand the range of preloaded models. The SDK will also add support for tensor parallelism, enabling seamless processing across multiple elements without requiring model modifications, and for torch.compile, providing the foundation for executing customized models. Integration with Hugging Face Optimum will further empower customers to leverage a broader variety of models.
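As background on tensor parallelism, the sketch below shows the general idea in plain Python: a weight matrix is split by columns, each shard is multiplied independently (on real hardware, each on its own device), and the partial outputs are concatenated. It illustrates the generic technique only and is not Furiosa's SDK or its API; every function name here is hypothetical.

```python
def matmul(x, w):
    """Multiply x (n x d) by w (d x m), both given as lists of rows."""
    return [
        [sum(row[k] * w[k][j] for k in range(len(w))) for j in range(len(w[0]))]
        for row in x
    ]

def split_columns(w, parts):
    """Split w into `parts` equal column shards."""
    cols = len(w[0])
    if cols % parts != 0:
        raise ValueError("column count must be divisible by the number of shards")
    step = cols // parts
    return [[row[s * step:(s + 1) * step] for row in w] for s in range(parts)]

def column_parallel_matmul(x, w, parts):
    """Compute x @ w shard by shard, then gather the column blocks."""
    shards = split_columns(w, parts)
    # Each partial product depends on a single shard, so the shards could be
    # processed on separate devices; here they simply run one after another.
    partials = [matmul(x, shard) for shard in shards]
    # Gather step: concatenate the column blocks row by row.
    return [sum((p[r] for p in partials), []) for r in range(len(x))]

if __name__ == "__main__":
    x = [[1, 2], [3, 4]]
    w = [[1, 0, 2, 1], [0, 1, 1, 3]]
    print(column_parallel_matmul(x, w, parts=2) == matmul(x, w))
```

The model itself is unchanged; only the placement of the weights differs, which is why a runtime can apply this kind of split without requiring model modifications.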
Advanced optimization tools delivered to early RNGD customers:
Building on these milestones, domestic and global enterprise customers are conducting tests with Furiosa to find a more efficient way to scale inference of their self-developed models than their existing setups allow. Their objective is to manage total cost of ownership (TCO) effectively as they prepare for large-scale AI adoption.
Furiosa plans to provide a high-quality AI development environment through a powerful and user-friendly SDK optimized for RNGD. The SDK v2024.1.0, currently available through the Early Access Program (EAP), is designed for high-performance processing of multiple concurrent LLM serving requests. It incorporates optimization techniques such as PagedAttention, Block KV Cache, and Continuous Batching, and supports various token sampling methods, including Greedy, Beam Search, and Top-k/p. These features allow developers to seamlessly create AI services customized to meet a wide range of requirements. The SDK and online sample will be available after the release of v2024.3.0.
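For readers unfamiliar with the sampling methods named above, here is a minimal pure-Python sketch of greedy, top-k, and top-p (nucleus) selection over a vector of logits. It illustrates the generic techniques and does not represent Furiosa's SDK interface; the function names and parameters are assumptions made for this example.

```python
import math
import random

def softmax(logits):
    """Convert logits to probabilities, subtracting the max for stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(logits):
    """Greedy decoding: always pick the highest-scoring token id."""
    return max(range(len(logits)), key=lambda i: logits[i])

def top_k_top_p_filter(logits, k=0, p=1.0):
    """Keep the k most likely tokens (k=0 disables the limit), then the
    smallest prefix whose cumulative probability reaches p, and renormalise.
    Returns a dict mapping token id to probability."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if k > 0:
        ranked = ranked[:k]
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

def sample(logits, k=0, p=1.0, rng=random):
    """Draw one token id from the filtered distribution."""
    dist = top_k_top_p_filter(logits, k, p)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]

if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5, -1.0]
    print("greedy:", greedy(logits))
    print("top-k=2 candidates:", sorted(top_k_top_p_filter(logits, k=2)))
    print("sampled:", sample(logits, k=3, p=0.9, rng=random.Random(0)))
```

Greedy is deterministic, while top-k and top-p trade determinism for diversity by limiting how far into the tail of the distribution a sample may fall. Beam search, which tracks several candidate sequences at once, is omitted here for brevity.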
Furiosa remains committed to delivering the most sustainable AI deployment solutions through rigorous optimization at an unprecedented pace.
"With RNGD now in customers' hands, we are accelerating the next generation of frontier LLMs to unlock emerging Agentic AI applications—bringing advanced reasoning capabilities to enterprise verticals, all at dramatically lower costs," said June Paik, Co-Founder and CEO of FuriosaAI.
Furiosa Expands Global Footprint with Strategic Leadership Appointment
Furiosa is scaling production and expanding its leadership team with the appointment of Alex Liu as Senior Vice President of Product and Business. A Technology Emmy Award winner and co-founder of NETINT Technologies, Alex brings over 20 years of expertise in startup management, technology innovation, and strategic leadership. At NETINT, he spearheaded groundbreaking achievements, including the development of the world's first VPU SoC, setting new industry benchmarks and securing the prestigious 2024 Technology Emmy Award. At Furiosa, Alex will lead global product management, go-to-market strategies, and partnerships to drive innovation and align the company's AI-native technologies with a vision to empower the development of planet-scale AI infrastructure.
RNGD is currently sampling with customers, and mass production will ramp up in partnership with TSMC for 2025 availability. To learn more about Furiosa, please visit https://furiosa.ai/.
About FuriosaAI
FuriosaAI is a semiconductor company dedicated to creating sustainable AI computing solutions that make powerful AI accessible to all. With its innovative Tensor Contraction Processor architecture, FuriosaAI is revolutionizing the AI hardware landscape, offering unparalleled efficiency and programmability for the most demanding AI workloads. For more information, please visit https://furiosa.ai/.
SOURCE FuriosaAI