For years, we’ve measured web performance through the lens of latency: How fast does this script load? How quickly can the engine execute this single loop? But the era of the “Document Web” is over. We are now living in the “Compute Web” era, where browsers are expected to run local AI inference, process massive data streams, and handle complex UI states simultaneously.
Traditional benchmarks are like testing a 16-cylinder engine by checking the speed of a single piston. They don’t tell you how the engine performs under actual load.
The Single-Task Fallacy: An Outdated View
Most benchmarks (like JetStream or Speedometer) focus on sequential execution. They are great for measuring JS engine maturity and single-threaded browser performance, but they fail to account for Task Saturation.
Peak performance in a modern AI WebApp is all about how efficiently the browser can orchestrate concurrent, resource-intensive tasks, such as:
- CPU Intensive Work: Pre-processing large datasets, such as 50MB JSON payloads, in a Web Worker.
- GPU Intensive Work: Running a local AI inference model using WebGPU.
- Main Thread Work: Keeping the UI responsive and animations smooth at 60fps.
Evaluating a modern web app therefore requires a benchmark that tests this simultaneous load, because the true bottleneck is often not the raw speed of one component but the efficiency of the “handoff” and scheduling between all of them. Focusing on a single isolated variable, such as raw GPU speed, fails to capture the “Ultimate Performance” of the application under real-world conditions.
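To make “simultaneous load” concrete, here is a minimal sketch, assuming an ES module context, of the three loads running at once: a Web Worker grinding the CPU, an asynchronous GPU-style loop, and a main-thread frame monitor. The worker script and the inference stub are hypothetical placeholders, not SpeedPower.run’s actual harness.

```js
// Illustrative sketch only (ES module context), not the SpeedPower.run
// harness. Three loads run at once: CPU in a worker, GPU-style async
// work, and a main-thread frame monitor.

// 1. CPU load: "cpu-task.js" is a hypothetical worker script that parses
//    a large JSON payload and posts a message back when finished.
const worker = new Worker('cpu-task.js');
const cpuDone = new Promise((resolve) => { worker.onmessage = resolve; });
worker.postMessage({ payloadBytes: 50 * 1024 * 1024 }); // ~50MB

// 2. GPU-style load: a stand-in loop for keeping an asynchronous compute
//    queue (e.g. WebGPU inference) busy.
async function gpuLoad(iterations = 100) {
  for (let i = 0; i < iterations; i++) {
    await Promise.resolve(); // replace with one real forward pass
  }
}

// 3. Main-thread health: count frames that miss the ~16.7ms budget of 60fps.
let dropped = 0;
let last = performance.now();
requestAnimationFrame(function frame(now) {
  if (now - last > 17) dropped++;
  last = now;
  requestAnimationFrame(frame);
});

// The telling number is not any single task's speed, but how all of them
// finish when forced to share cores, memory bandwidth, and the GPU queue.
await Promise.all([cpuDone, gpuLoad()]);
console.log(`Dropped frames under load: ${dropped}`);
```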
Technical Deep-Dive
At ScaleDynamics, we’ve observed that the true bottleneck in AI-driven web apps is often the “handoff” between the CPU and GPU.
We built SpeedPower.run out of frustration. Existing browser benchmarks are too synthetic and disconnected from the real-world challenges of the modern web, where real web applications perform heavy pre/post-processing, run multiple AI models, and handle critical rendering simultaneously. Our mission is simple: to create the definitive benchmark for real-world compute performance on the modern web.
The Simultaneous Load Methodology
SpeedPower.run determines your browser and device’s maximum performance by pushing all CPUs and GPUs to their limit simultaneously.
Unlike other tools that test one thing at a time, we run multiple concurrent tasks, such as running AI inferences while also doing heavy JavaScript processing. We use all the available web compute technologies: JavaScript, WASM, WebGL, and WebGPU.
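As a rough sketch of how a page can probe which of these backends are actually available (the exact detection SpeedPower.run performs isn’t published in this post):

```js
// Rough feature probe for the four compute backends; illustrative only.
async function detectBackends() {
  const canvas = document.createElement('canvas');
  return {
    javascript: true, // always there
    wasm: typeof WebAssembly === 'object',
    webgl: Boolean(canvas.getContext('webgl2') || canvas.getContext('webgl')),
    // navigator.gpu only exists in WebGPU-capable browsers, and
    // requestAdapter() may still resolve to null on unsupported hardware.
    webgpu: Boolean(navigator.gpu && (await navigator.gpu.requestAdapter())),
  };
}

detectBackends().then((b) => console.log('Available backends:', b));
```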
How We Ensure a Fair Score (Methodology & Integrity):
The SpeedPower.run benchmark ensures a fair and accurate score by focusing purely on your device’s computational power. First, it guarantees Zero Network Interference: the test timer only starts after all large assets, including the ~400MB of AI models, are loaded into memory. Second, it runs a Warm-up Execution phase before recording the final score, giving the browser time to finish its internal optimizations (like code compilation) so the result reflects your device’s peak performance, not its initial slow-down.
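In code, the pattern reads roughly like the sketch below. The `task.loadAssets()` / `task.run()` interface is a hypothetical stand-in, not the benchmark’s real API:

```js
// Simplified sketch of the "load first, warm up, then time" pattern.
// `task.loadAssets()` and `task.run()` are a hypothetical interface.
async function benchmark(task, { warmupRuns = 5, timedRuns = 20 } = {}) {
  // Zero Network Interference: everything is in memory before timing starts.
  await task.loadAssets();

  // Warm-up Execution: let JIT compilation and caches settle first.
  for (let i = 0; i < warmupRuns; i++) {
    await task.run();
  }

  // Only steady-state executions count toward the score.
  const t0 = performance.now();
  for (let i = 0; i < timedRuns; i++) {
    await task.run();
  }
  const elapsedMs = performance.now() - t0;
  return (timedRuns / elapsedMs) * 1000; // runs per second
}
```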
To provide a reliable measurement, the methodology prioritizes Score Stability: statistical regression analysis on peak metrics smooths out system-level scheduling noise, producing a dependable result rather than a snapshot of a single moment in time. For users, the process is simple: since factors outside the benchmark’s control (like the operating system’s scheduler) can affect performance, we recommend running the test multiple times to confidently capture the highest score your device can achieve.
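The exact regression isn’t spelled out in this post, but the intuition is easy to show with a simplified stand-in: instead of trusting the single best run (which may be a fluke) or the mean (which is dragged down by scheduling noise), estimate the peak from the top slice of the sample distribution:

```js
// Illustrative stand-in: the real scoring applies regression analysis to
// peak metrics. This simplified version estimates the peak from the top
// slice of the sample distribution instead of trusting a single best run.
function stablePeak(samples, topFraction = 0.25) {
  const sorted = [...samples].sort((a, b) => b - a); // best first
  const keep = Math.max(1, Math.round(sorted.length * topFraction));
  const top = sorted.slice(0, keep);
  return top.reduce((sum, s) => sum + s, 0) / top.length;
}

// e.g. throughput samples from repeated timed runs (two noisy outliers):
console.log(stablePeak([118, 124, 97, 121, 119, 64, 122])); // 123
```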
The Benchmarks
SpeedPower.run consists of the following core benchmarks:
- JavaScript: This benchmark measures raw computational power for pre/post-processing on JS objects and JSON. It utilizes four tests from the Apple/WebKit JetStream 2 suite: Access Binary Trees, Control Flow Recursive, Regexp DNA, and String Tag Cloud. We run these tests in parallel across multiple Web Workers to measure maximum multi-core CPU processing power (a fan-out sketch appears after this list).
- AI with TensorFlow.js: We utilize TensorFlow.js to test the maturity and performance of established web AI pipelines.
  - AI Recognition TFJS: Measures the steady-state inference throughput of the BlazeFace model (via TensorFlow.js). Using a 128×128 input tensor and a pre-warmed graph, this test isolates the raw performance of the backend (JavaScript, WASM, WebGL, or WebGPU). It specifically measures the speed of the forward pass and the subsequent interpretive post-processing (decoding the highest-confidence face detection).
  - AI Classify TFJS: Measures the throughput of the MobileNetV3 Small architecture. Using a fixed 224×224 input tensor and a pre-warmed graph, this test isolates the raw performance of the backend (JavaScript, WASM, WebGL, or WebGPU). It specifically measures the speed of the forward pass and the subsequent interpretive post-processing (decoding the highest-confidence score).
- AI with Transformers.js: SpeedPower.run pushes the boundaries of next-gen in-browser AI by leveraging Transformers.js v3 for our most advanced workloads.
  - AI Classify Transformers: Measures the throughput of the MobileNetV4-Small architecture (via Transformers.js v3). It prioritizes a high-performance WebGPU backend (falling back to WebGL) with a fixed 224×224 input tensor. This score reflects the system’s capacity for parallel inference, leveraging asynchronous command queues and compute shaders to process workloads with high concurrency.
  - AI LLM Transformers: Measures the throughput of the SmolLM2-135M-Instruct causal language model (via Transformers.js v3). Using a 4-bit quantized (q4) ONNX model, this benchmark isolates the GPU runtime efficiency from model loading overhead. It captures the hardware’s ability to orchestrate multi-threaded LLM execution and real-time autoregressive decoding (a timing sketch appears after this list).
  - AI Speech Transformers: Measures the throughput of the Moonshine-Tiny automatic speech recognition (ASR) architecture. It uses a hybrid-precision model (FP32 encoder + q4 decoder) to isolate GPU runtime efficiency from audio processing overhead. The score highlights the capacity for complex, high-concurrency speech-to-text pipelines.
- Exchange: Since modern apps rely on Web Workers, the “Exchange” benchmark measures the communication bottleneck between the main thread and workers. It tests the transfer speed of postMessage IPC, Transferables, Arrays, Buffers, Objects, and OffscreenCanvas. The higher the score, the more efficiently your main thread communicates with background workers (a miniature version of this test appears after the list).
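For a flavor of the JavaScript benchmark’s fan-out, here is a minimal sketch, assuming an ES module context and a hypothetical `js-test-worker.js` that runs a named test for a fixed duration and reports its iteration count; the JetStream-derived workloads themselves are not shown:

```js
// Rough fan-out sketch (ES module context). "js-test-worker.js" is a
// hypothetical worker that runs a named test for a fixed duration and
// reports its iteration count.
const cores = navigator.hardwareConcurrency || 4;

function runInWorker(test) {
  return new Promise((resolve) => {
    const w = new Worker('js-test-worker.js');
    w.onmessage = (e) => { resolve(e.data.iterations); w.terminate(); };
    w.postMessage({ test, durationMs: 1000 });
  });
}

// One worker per logical core, all running simultaneously.
const perWorker = await Promise.all(
  Array.from({ length: cores }, () => runInWorker('regexp-dna'))
);
const total = perWorker.reduce((sum, n) => sum + n, 0);
console.log(`${total} iterations across ${cores} parallel workers`);
```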
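Loading and timing an LLM of this class with Transformers.js v3 looks roughly like the following. The `device` and `dtype` options are the library’s documented knobs; the prompt, token budget, and accounting here are illustrative assumptions, not the benchmark’s actual configuration:

```js
// Sketch of timing autoregressive decoding with Transformers.js v3
// (ES module context). Prompt, token budget, and accounting are
// illustrative assumptions, not the benchmark's configuration.
import { pipeline } from '@huggingface/transformers';

const generator = await pipeline(
  'text-generation',
  'HuggingFaceTB/SmolLM2-135M-Instruct',
  { device: 'webgpu', dtype: 'q4' } // WebGPU backend, 4-bit weights
);

await generator('warm up', { max_new_tokens: 8 }); // warm-up pass

const maxNewTokens = 64;
const t0 = performance.now();
await generator('Explain Web Workers in one sentence.', {
  max_new_tokens: maxNewTokens,
});
const seconds = (performance.now() - t0) / 1000;
// Approximate: generation may stop early at an end-of-sequence token.
console.log(`~${(maxNewTokens / seconds).toFixed(1)} tokens/s`);
```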
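And the effect the Exchange test measures is easy to reproduce in miniature: posting an ArrayBuffer with a transfer list moves ownership without a copy, while posting it without one forces a structured clone. The `echo-worker.js` script is a hypothetical worker that replies with a tiny ack on each message:

```js
// Miniature main-thread <-> worker exchange test (ES module context).
// "echo-worker.js" is a hypothetical worker replying with a tiny ack
// on every message it receives.
const worker = new Worker('echo-worker.js');
const buffer = new ArrayBuffer(64 * 1024 * 1024); // 64MB payload

function sendAndAck(payload, transferList = []) {
  return new Promise((resolve) => {
    const t0 = performance.now();
    worker.onmessage = () => resolve(performance.now() - t0);
    worker.postMessage(payload, transferList);
  });
}

// Structured clone: the 64MB buffer is copied during postMessage.
const clonedMs = await sendAndAck(buffer.slice(0));
// Transferable: ownership moves to the worker, no copy is made.
const transferredMs = await sendAndAck(buffer, [buffer]);
console.log({ clonedMs, transferredMs }); // transfer is typically far faster
```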
Architecture: No Installation Required
We were adamant that this should require zero installation or setup. By leveraging WebAssembly (WASM) and WebGPU, we get near-native access to your device’s hardware directly through the browser.
You don’t need to download a 5GB suite to see if your rig is ready for the AI web. You just click, and in 30 seconds, we saturate every available thread to find your browser’s breaking point for modern, complex applications.
Help Us Calibrate the Benchmark
We are currently collecting data across thousands of hardware/browser combinations to refine our scoring for the “Ultimate Performance” of the modern web.
We’ve seen some fascinating anomalies already, like high-end mobile ARM chips showing better task-switching efficiency than some mid-range x86 desktops due to better thermal-aware scheduling in the browser.
Run the test on your dev rig: https://speedpower.run
Does the result match your “real-world” multitasking experience? Drop your score and your hardware specs in the comments. Let’s talk about the future of the compute-heavy web.

