Running SAM 3 on AMD Ryzen AI Max+ 395: A Complete Guide to Fixing the rocBLAS Error

I’ve been battling with AI (Claude) for 14 hours a day. Couldn’t be happier.

— Akio Shiki (@ar_akio) October 20, 2025

Hi, I’m Akio, an engineer at an AI development startup. In my previous article, I introduced SAM 3. This time, I’ll share the pitfalls I encountered when running SAM 3 on AMD hardware.

We’re constantly testing the latest AI models and hardware, and right now I have in my hands what can only be described as a monument to AMD engineering: the Ryzen AI Max+ 395.


The specs on this machine are, frankly, insane. With a large pool of high-bandwidth unified memory shared between the CPU and a powerful iGPU, this device truly shines when running massive LLMs like OpenAI’s gpt-oss-120b locally.

But that’s not what I’m doing today.

Today, it’s Meta’s latest image segmentation model: SAM 3 (Segment Anything Model 3).


“Wait, SAM 3? Isn’t that lightweight? If you want inference speed, wouldn’t an NVIDIA dGPU be a better fit?”

You’re absolutely right. No argument there.

Running SAM 3 on a Ryzen AI Max+ 395 is, in a sense, using a sledgehammer to crack a nut.

But you know what? I don’t care. The reason is simple:

“I just wanted to run the hottest new model on AMD’s latest hardware.”

This is a passion project, efficiency be damned. That said, the errors I encountered and the solutions I found should be universally valuable for AMD users. Consider this a definitive guide to conquering the rocBLAS error that virtually every Ryzen AI user will face.

The Despair: No Answers Anywhere on the Web

My setup: Windows 11, using AMD’s AI stack ROCm (HIP SDK) to run SAM 3 on PyTorch.

Setup went smoothly. Time to run the inference script! …And the moment I did, my terminal was flooded with merciless error logs.

rocBLAS error: TensileLibrary.dat not found

Ah yes, the classic AMD environment error. “TensileLibrary.dat not found.” Translation: “I can’t find the computation library for your GPU (gfx1151), so I can’t do any calculations.”

Because the Ryzen AI Max+ 395 uses the latest architecture, the official libraries haven’t fully caught up with the path configurations… a common story with newly released hardware.

The Standard “Environment Variable Spoofing” Trick Doesn’t Work?

Normally in AMD circles, when you hit this error, you use a workaround: spoofing the GPU architecture with an environment variable. Since gfx1151 (RDNA 3.5) is closely related to gfx1100, the RDNA 3 target used by the Radeon RX 7900 series, you can trick the system into thinking “I’m actually gfx1100.”

$env:HSA_OVERRIDE_GFX_VERSION = "11.0.0"

This should solve everything… or so I thought. But this time, it didn’t work. The error logs stubbornly insisted “I can’t find the files for gfx1151” and kept looking in site-packages\_rocm_sdk_libraries_gfx1151\bin.

Searching the Depths of the Internet, Finding Nothing

Even after consulting various AI assistants for solutions, the final verdict was:

🔴 Current Status: Local execution is technically impossible (as of December 2025)
Local execution is technically impossible until AMD officially releases Tensile libraries for gfx1151.

No way. There has to be a solution. I refused to give up.

“rocBLAS error gfx1151,” “Ryzen AI 300 PyTorch”… I searched Google with every keyword I could think of, dove deep into GitHub Issues and Reddit threads, but information was shockingly nonexistent.

The Ryzen AI Max series (Strix Halo) is so new that apparently no one had published a workaround for this error yet. Just as I was about to resign myself to using this as a dedicated LLM machine, I decided to go back to basics and dig through the library folders on my own PC.

The Solution: The Files Were “Hidden” All Along

If the web has no answers, look locally. The folder named in the error log indeed did not exist. However, when I searched thoroughly through site-packages, where the ROCm build of PyTorch is installed, I found an unfamiliar directory.

_rocm_sdk_libraries_custom

“Custom…?” With a bad feeling, I opened it up and found something surprising.

gfx1151’s TensileLibrary.dat

(Screenshot: gfx1151-related files inside _rocm_sdk_libraries_custom\bin\rocblas\library)

There it is! TensileLibrary_lazy_gfx1151.dat!

The RDNA 3.5 library files were included all along. But while PyTorch was looking for a folder named _rocm_sdk_libraries_gfx1151, the actual files were isolated deep within _rocm_sdk_libraries_custom. No wonder it couldn’t find them.
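If you would rather not click through folders by hand, a few lines of Python can do the digging. This is only a minimal sketch: the helper names find_tensile_libraries and expected_library_dir are made up for illustration, and it assumes your install uses the same file and folder naming I saw on my machine.

```python
from pathlib import Path
import sysconfig


def find_tensile_libraries(site_packages: Path, arch: str = "gfx1151") -> list[Path]:
    """Return every TensileLibrary*.dat file under site_packages that names arch."""
    # Assumption: the files are named like TensileLibrary_lazy_gfx1151.dat.
    return sorted(site_packages.rglob(f"TensileLibrary*{arch}*.dat"))


def expected_library_dir(site_packages: Path, arch: str = "gfx1151") -> Path:
    """Folder the error log pointed at in this article."""
    return site_packages / f"_rocm_sdk_libraries_{arch}" / "bin"


if __name__ == "__main__":
    # site-packages of the Python environment running this script
    site_packages = Path(sysconfig.get_paths()["purelib"])
    wanted = expected_library_dir(site_packages)
    print(f"Expected folder: {wanted} (exists: {wanted.is_dir()})")
    for path in find_tensile_libraries(site_packages):
        print(f"Found: {path}")
```

If the expected folder is reported as missing while files are found somewhere else, you are most likely looking at the same mismatch I hit.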

It makes sense why there was no information online. This wasn’t a configuration error. It was a folder structure mismatch, a decidedly low-tech trap.

The Complete Fix: Folder Transplant Surgery

Once you know the cause, you just need to put the files where they belong. For AMD Ryzen AI users everywhere, here’s the solution—possibly the first public documentation of this fix.

Step 1: Rescue the Files from Their Hiding Place

Open the following path in File Explorer (adjust the Python environment path for your setup):

...\site-packages\_rocm_sdk_libraries_custom\bin\rocblas\library

Copy all files in this directory (.dat files, .hsaco files, etc.).

Step 2: Create the Correct Folder Structure

Go back to the site-packages root and create a new folder hierarchy that matches what PyTorch expects:

Create a folder named _rocm_sdk_libraries_gfx1151
Inside it, create a folder named bin

Step 3: Place Files and Rename

Paste all the files you copied in Step 1 into the bin folder you just created.

As an extra precaution, duplicate TensileLibrary_lazy_gfx1151.dat and rename the copy to TensileLibrary.dat.
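For anyone who prefers a script over File Explorer, the three steps above can be sketched in Python. Treat this as an illustration under assumptions, not an official tool: the function name transplant_rocblas_files is hypothetical, the folder names are the ones from my machine, and files already present in the destination are overwritten.

```python
import shutil
from pathlib import Path


def transplant_rocblas_files(site_packages: Path, arch: str = "gfx1151") -> Path:
    """Copy the rocBLAS library files into the folder named in the error log.

    Returns the destination folder. The source files are left untouched.
    """
    # Assumed layout, taken from the paths described in this article.
    source = site_packages / "_rocm_sdk_libraries_custom" / "bin" / "rocblas" / "library"
    target = site_packages / f"_rocm_sdk_libraries_{arch}" / "bin"
    if not source.is_dir():
        raise FileNotFoundError(f"Source folder not found: {source}")

    # Step 2: create _rocm_sdk_libraries_<arch>\bin if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)

    # Steps 1 and 3: copy every file (.dat, .hsaco, etc.) into bin.
    for item in source.iterdir():
        if item.is_file():
            shutil.copy2(item, target / item.name)

    # Extra precaution: duplicate the lazy library under the generic name.
    lazy = target / f"TensileLibrary_lazy_{arch}.dat"
    if lazy.is_file():
        shutil.copy2(lazy, target / "TensileLibrary.dat")
    return target
```

Call it with the site-packages folder of your own environment, for example transplant_rocblas_files(Path(sysconfig.get_paths()["purelib"])) after importing sysconfig. Because it copies rather than moves, deleting the new folder undoes the change.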

(Screenshot: files successfully moved to the newly created folder)

The Result: SAM 3 Running on the iGPU

After fixing the folders, I ran the script again, fingers crossed.

Successful execution log (The “cuda” device label is expected: ROCm builds of PyTorch expose AMD GPUs through the same torch.cuda API, so the device is still called “cuda” even though no NVIDIA hardware is involved.)
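If you want to confirm which backend is actually behind that “cuda” label, a quick check like the following can help. It is a small sketch rather than anything from my original script: describe_torch_backend is a hypothetical helper name, and it relies on torch.version.hip, which is set on ROCm builds and None otherwise.

```python
import importlib.util


def describe_torch_backend() -> dict:
    """Summarize whether PyTorch is installed and which GPU backend it reports."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}

    import torch

    info = {
        "installed": True,
        # A version string on ROCm builds, None on CUDA or CPU-only builds.
        "hip_version": getattr(torch.version, "hip", None),
        # ROCm builds answer through the torch.cuda API as well.
        "gpu_available": torch.cuda.is_available(),
    }
    if info["gpu_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    print(describe_torch_backend())
```

On a working ROCm setup you would expect a non-empty hip_version together with gpu_available set to True.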

It worked! The error completely disappeared, and the integrated GPU was humming along running inference. VRAM usage: about 7 GB. Single-image inference: about 8 seconds. Pretty light on resources, I’d say, though real-time video is out of the question. (Man, I really want a high-end NVIDIA GPU…)

The Ryzen AI Max+ 395 is built for much heavier workloads, but there’s something satisfying about watching it breeze through a lightweight model like SAM 3. Just confirming that “the latest image models can run on AMD hardware” is a win for today.

Conclusion

The lesson from this troubleshooting adventure: “Don’t just read the error logs—examine the actual folder structure too.” Basic stuff, right?

Powerful hardware like the Ryzen AI Max+ 395 is in a transitional period where the software ecosystem (especially Windows ROCm) hasn’t caught up with hardware evolution. However, as this case shows, there are many situations where “the files exist, but the paths aren’t configured correctly.” Don’t give up—dig through those directories and you might find the solution.

To all AMD users struggling with this same error: give this “folder transplant surgery” a try. Here’s to comfortable (and slightly overpowered) local AI adventures!

If you have feedback on this article or requests for “truly heavy models” you’d like me to test on the Ryzen AI Max+ 395, drop a comment below!

Next time, I’ll be posting about combining SAM 3 with IoT cameras (ESP32-based), so stay tuned!