#PerformancePortability — Bluesky Posts

@hgpu.bsky.social

4 months ago

Enhancing Transformer Performance and Portability through Auto-tuning Frameworks

Transformer-based models such as BERT and GPT2 have become the foundation of many modern applications, yet their execution requires substantial computational and memory resources. To addre…

#CUDA #LLM #AutoTuning #PerformancePortability #Package

hgpu.org?p=30329

HGPU group

@hgpu.bsky.social

8 months ago

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

The rapid growth of deep learning has driven exponential increases in model parameters and computational demands. NVIDIA GPUs and their CUDA-based software ecosystem provide robust support for para…

#CUDA #LLM #Compilers #AI #PerformancePortability #Package

hgpu.org?p=29940

Amanda Randles 🧪⚛️ 👩‍🔬

@profamandarandles.bsky.social

9 months ago

🧪 Curious about high performance across GPUs? Our new paper benchmarks a parallel FSI code on CUDA, SYCL & OpenMP across top systems. See Aristotle Martin present it at #ISC2025 on June 11 at 10:45 in Hamburg!

#HPC #GPUcomputing #PerformancePortability

HGPU group

@hgpu.bsky.social

9 months ago

Thesis: Acceleration as a Service (XaaS) Source Containers

In this thesis, we address the challenge of performance portability in heterogeneous computing environments. Performance portability refers to the ability of an application to maintain high perform…

#HPC #MPI #PerformancePortability #LLM #Package

hgpu.org?p=29925

HGPU group

@hgpu.bsky.social

9 months ago

Exploring SYCL for batched kernels with memory allocations

Batched kernels with memory allocations is a common pattern in HPC, appearing in multi-dimensional FFTs, neural networks processing, or split computation of numerical operators. Its efficient suppo…

#SYCL #CUDA #PerformancePortability #Package

hgpu.org?p=29911

HGPU group

@hgpu.bsky.social

11 months ago

Concurrent Scheduling of High-Level Parallel Programs on Multi-GPU Systems

Parallel programming models can encourage performance portability by moving the responsibility for work assignment and data distribution from the programmer to a runtime system. However, analyzing …

#SYCL #TaskScheduling #PerformancePortability #HPC #Package

hgpu.org?p=29823

HGPU group

@hgpu.bsky.social

1 year ago

Leveraging LLVM OpenMP GPU Offload Optimizations for Kokkos Applications

OpenMP provides a cross-vendor API for GPU offload that can serve as an implementation layer under performance portability frameworks like the Kokkos C++ library. However, recent work identified so…

#Kokkos #CUDA #HIP #OpenMP #PerformancePortability #Package

hgpu.org?p=29747

HGPU group

@hgpu.bsky.social

1 year ago

CPU-GPU co-execution through the exploitation of hybrid technologies via SYCL

The performance and energy efficiency offered by heterogeneous systems are highly useful for modern C++ applications, but the technological variety demands adequate portability and programmability.…

#SYCL #OpenCL #CUDA #LLVM #PerformancePortability #LoadBalancing #HybridComputing

hgpu.org?p=29717

HGPU group

@hgpu.bsky.social

1 year ago

Analyzing the Performance Portability of SYCL across CPUs, GPUs, and Hybrid Systems with Protein Database Search

The high-performance computing (HPC) landscape is undergoing rapid transformation, with an increasing emphasis on energy-efficient and heterogeneous computing environments. This comprehensive study…

#SYCL #oneAPI #Bioinformatics #Databases #HPC #PerformancePortability #Package

hgpu.org?p=29596

HGPU group

@hgpu.bsky.social

1 year ago

Performance portability via C++ PSTL, SYCL, OpenMP, and HIP: the Gaia AVU-GSR case study

Applications that analyze data from modern scientific experiments will soon require a computing capacity of ExaFLOPs. The current trend to achieve such performance is to employ GPU-accelerated supe…

#HIP #SYCL #OpenMP #CUDA #PerformancePortability #HPC #Astrophysics #Package

hgpu.org?p=29555

HGPU group

@hgpu.bsky.social

1 year ago

Kokkidio: Fast, expressive, portable code, based on Kokkos and Eigen

Kokkidio is a newly developed C++ template library that combines the performance portability framework Kokkos and its strength in utilising GPUs with the expressive syntax and CPU optimisations of …

#GPU #Kokkos #PerformancePortability #Package

hgpu.org?p=29541
