Intel AutoRound Enables Faster & More Efficient Quantized LLM Models On Intel GPUs & CUDA-Based Devices; Crescent Island With FP8, MXFP8 & MXFP4 Confirmed. Intel's AutoRound achieves ...
#Featured #News #Sticky #CUDA #FP8 #Intel #AutoRound #CrescentIsland
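For anyone who wants to try AutoRound itself, here is a minimal sketch using Intel's open-source auto-round package as its README documents it; exact keyword arguments vary by release, and the model name below is a small placeholder rather than anything from the article.

```python
# Hedged sketch: 4-bit weight-only quantization with Intel auto-round.
# Assumes `pip install auto-round transformers`; API per the project README,
# details may differ by version. The model name is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"  # placeholder model for illustration
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Optimize the rounding of weights on a small calibration set (library defaults).
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True)
autoround.quantize()
autoround.save_quantized("./opt-125m-w4", format="auto_round")
```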
A complete guide to LLM quantization! 87.5% memory savings with INT4, 43% higher throughput with FP8. GPTQ vs AWQ vs GGUF compared, Llama 3 quantization benchmarks, under 2% quality loss down to Q4! Plus pruning + knowledge distillation for model slimming, per-hardware strategy recommendations, and QLoRA fine-tuning!
#AWQ #FP8 #GGUF #GPTQ #INT4 #INT8 #KnowledgeDistillation #Llama3 #llamacpp
doyouknow.kr/618/llm-quan...
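The 87.5% figure lines up with simple byte counting: FP32 weights take 4 bytes each, INT4 takes half a byte. A quick sketch of that arithmetic (the 8B parameter count is arbitrary, chosen only to make the numbers concrete):

```python
# Back-of-the-envelope memory math behind "87.5% savings with INT4" (weights only).
params = 8e9                     # hypothetical 8B-parameter model
fp32_gb = params * 4.0 / 1e9     # 4 bytes per weight  -> 32.0 GB
int4_gb = params * 0.5 / 1e9     # 4 bits per weight   ->  4.0 GB
saving = 1 - int4_gb / fp32_gb   # 0.875 -> the quoted 87.5% reduction
print(f"FP32: {fp32_gb:.1f} GB  INT4: {int4_gb:.1f} GB  saving: {saving:.1%}")
```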
What does Amazon's Trainium 3 change compared with Nvidia?
#IA #AWS #Amazon #Nvidia #Trainium3 #reInvent #Cloud #DataCenter #FP8 #HBM3e #EficienciaEnergética #3dediciembre #felizmiercoles
donporque.com/trainium-3-d...
FP8-Flow-MoE: A Casting-Free FP8 Recipe without Double Quantization Error
#FP8 #Precision
hgpu.org?p=30341
Exponent‑Concentrated FP8 Enables Lossless Compression of Large AI Models
ECF8 lossless 8-bit compression cuts memory by up to 26.9% and lifts throughput to as much as 177.1% of the FP32 baseline on models up to 671B parameters. getnews.me/exponent-concentrated-fp... #fp8 #ai
InfiR2 Introduces Efficient FP8 Training Recipe for Language Models
InfiR2's open-source FP8 training recipe cuts training time by up to 22% and reduces peak memory usage by 14% while matching BF16 accuracy on a 160-billion-token corpus. Read more: getnews.me/infir2-introduces-effici... #fp8 #llm #infir2
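InfiR2's exact recipe is in the linked write-up; as a generic illustration of what FP8 mixed-precision training looks like in practice, here is a minimal sketch using NVIDIA's Transformer Engine (te.fp8_autocast with a delayed-scaling recipe), one common way to run the matmuls in FP8 while keeping master weights in higher precision. This is not InfiR2's code, and the layer sizes are arbitrary.

```python
# Hedged sketch: FP8 forward/backward with NVIDIA Transformer Engine.
# Requires an FP8-capable GPU (Hopper/Ada or newer) and `pip install transformer-engine`.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)  # E4M3 fwd, E5M2 bwd
layer = te.Linear(4096, 4096, bias=True).cuda()
optimizer = torch.optim.AdamW(layer.parameters(), lr=1e-4)

x = torch.randn(16, 4096, device="cuda", requires_grad=True)
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)               # GEMM runs in FP8; scaling factors tracked by the recipe
loss = y.float().pow(2).mean() # loss computed in higher precision
loss.backward()                # gradients use the E5M2 half of the hybrid format
optimizer.step()
```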
#enlosblogs "Openness and power: DeepSeek attacks Nvidia's hegemony with open source and domestic chips" (www.enriquedans.com/2025/08/aper...) by @edans.bsky.social #DeepSeek #FP8 #UE8M0 #Nvidia #Chips #Geoestrategia
DeepSeek V3.1 sets China's A-share market on fire! The mysterious code "UE8M0" explained, and the "national destiny" bet behind Huawei Ascend. Folks, the moment DeepSeek V3.1 landed, insiders stayed calm while retail investors went wild, all over one line in the official WeChat post: "UE8M0 + FP8 ..."
#DeepSeek大模型 #AI #Agent #AI大模型 #AI科普 #AMD #A股 #Deepseek #V3.1 #FP8 #H100
Floating-Point 8: Revolutionizing AI Training with Lower Precision. Unlocking unprecedented efficiency in AI training: Floating-Point 8 (FP8) is rapidly becoming a game changer.... @cosmicmeta.io #FP8
https://u2m.io/yF1IP3rm
Floating-Point 8: Revolutionizing AI Training with Lower Precision. Explore how Floating-Point 8 (FP8) is set to enhance AI training efficiency by balancing computational speed and accuracy, as detailed by NVIDIA's insights. (Read... @cosmicmeta.io #FP8
https://u2m.io/W5Ed0OCl
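To get a concrete feel for the speed/accuracy trade-off those posts describe, recent PyTorch releases (2.1+) expose the two common FP8 encodings as dtypes; a tiny sketch showing the rounding you get when casting a few values to E4M3 and E5M2 (values chosen to stay within E4M3's ~448 maximum):

```python
import torch

x = torch.tensor([0.1234, 3.14159, 447.0])
x_e4m3 = x.to(torch.float8_e4m3fn)  # 4 exponent bits, 3 mantissa bits: finer precision, smaller range
x_e5m2 = x.to(torch.float8_e5m2)    # 5 exponent bits, 2 mantissa bits: wider range, coarser precision
print(x_e4m3.to(torch.float32))     # values rounded to the nearest representable E4M3 number
print(x_e5m2.to(torch.float32))     # coarser rounding under E5M2
```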
How to Optimize for Performance with vLLM. vLLM, a versatile and efficient LLM inference engine. Th...
www.franksworld.com/2025/05/09/how-to-optimi...
#AI #RedHat #AI/ML […]
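The linked post covers the details; as a minimal sketch of the kind of knobs it is talking about, here is vLLM's offline API with FP8 quantization enabled. The model name and tuning values are illustrative assumptions, not taken from the article.

```python
# Hedged sketch: serving an FP8-quantized model with vLLM's offline API.
# `quantization="fp8"` enables FP8 on supported GPUs; knobs below are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    quantization="fp8",            # dynamic FP8 quantization of an FP16 checkpoint
    gpu_memory_utilization=0.90,   # fraction of VRAM vLLM may reserve for weights + KV cache
    max_model_len=4096,            # cap context length to shrink the KV-cache footprint
)
params = SamplingParams(temperature=0.7, max_tokens=128)
out = llm.generate(["Explain FP8 quantization in one sentence."], params)
print(out[0].outputs[0].text)
```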
IBM Think 2025: Download a Sneak Peek of the Next-Gen Granite Models. At IBM Think 2025, IBM annou...
www.hpcwire.com/2025/05/08/ibm-think-dow...
#ShortTakes #FP8 #GraniteModels #HuggingFace #LLM #Mamba #models #MOE
DeepSeek-V3 slashes AI training costs by a factor of 11—yet delivers GPT-4-level performance. It’s powered by FP8 training and a novel MoE architecture. Could this shake up the industry?
#AI #FP8 #Innovation
#nvidia #blackwelldgxb200 #ai #generativeai #gpu #hbm3e #fp8 #fp4 #supercomputing #aiworkload
NVIDIA announces the "DGX B200" with eight Blackwell GPUs (pricing included)
youtu.be/5_TO2qxT39g