🏋️ InferBench 🏋️

A cost/quality/speed Leaderboard for Inference Providers!

| URL | Platform | Owner | Device | Model | Optimization | Median Inference Time | Price per Image | GenEval | HPS (v2.1) | GenAI-Bench (VQA) | DrawBench (Image Reward) | PartiPrompts (ARNIQA) | PartiPrompts (ClipIQA) | PartiPrompts (ClipScore) | PartiPrompts (Sharpness - Laplacian Variance) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| https://replicate.com/prunaai/flux.1-juiced | Replicate | Pruna AI | 1xH100 | FLUX.1-dev | extra juiced | 2.6 | 0.004 | 69.9 | 29.86 | 0.7466 | 0.9458 | 0.6591 | 0.8887 | 27.6 | 7997 |
| https://replicate.com/prunaai/flux.1-lightly-juiced | Replicate | Pruna AI | 1xH100 | FLUX.1-dev | lightly juiced | 3.57 | 0.0054 | 69.12 | 30.36 | 0.7405 | 0.9972 | 0.6789 | 0.9031 | 27.56 | 7849 |
| https://fal.ai/models/fal-ai/flux/dev | fal.ai | fal.ai | Undisclosed | FLUX.1-dev | Undisclosed | 4.06 | 0.025 | 68.72 | 29.97 | 0.7441 | 1.0084 | 0.6702 | 0.8967 | 27.61 | 7295 |
| https://replicate.com/prunaai/flux.1-juiced | Replicate | Pruna AI | 1xH100 | FLUX.1-dev | juiced | 3.14 | 0.0048 | 68.64 | 30.38 | 0.7408 | 0.9657 | 0.6762 | 0.9014 | 27.55 | 7627 |
| https://huggingface.co/black-forest-labs/FLUX.1-dev?library=diffusers | Replicate | Pruna AI | 1xH100 | FLUX.1-dev | none | 6.88 | 0.025 | 67.98 | 30.36 | 0.74 | 1.0072 | 0.6758 | 0.8968 | 27.4 | 6833 |
| https://replicate.com/black-forest-labs/flux-dev | Replicate | Black Forest Labs | 1xH100 | FLUX.1-dev | go_fast | 3.38 | 0.025 | 67.41 | 29.25 | 0.7547 | 0.9282 | 0.6356 | 0.8609 | 27.56 | 4872 |
| https://fireworks.ai/models/fireworks/flux-1-dev-fp8 | Fireworks AI | Fireworks AI | Undisclosed | FLUX.1-dev | fp8 | 4.66 | 0.014 | 65.55 | 30.26 | 0.7455 | 0.9467 | 0.6639 | 0.8478 | 27.24 | 5625 |
| https://www.together.ai/models/flux-1-dev | Together AI | Together AI | Undisclosed | FLUX.1-dev | Undisclosed | 3.38 | 0.025 | 64.61 | 30.22 | 0.7339 | 0.9463 | 0.5752 | 0.8709 | 27.31 | 4501 |
| https://aws.amazon.com/ai/generative-ai/nova/creative/ | AWS | AWS | Undisclosed | AWS Nova Canvas | Undisclosed | 3.65 | n/a | n/a | n/a | n/a | 1.07 | 0.65 | 0.954 | 28.1 | 10514 |

💡 Note: Each efficiency metric and quality metric captures only one dimension of model capacity. Rankings may vary when considering other metrics.
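
As a rough illustration of that note, the sketch below ranks a handful of FLUX.1-dev rows from the leaderboard data above on three different columns. The shortened labels, the tuple layout, and the `rank` helper are assumptions made for this example, as is the direction of each metric (lower is better for time, higher is better for GenEval and DrawBench).

```python
# A few leaderboard rows: (label, median inference time, GenEval, DrawBench).
# Values are copied from the leaderboard; labels are shortened for this example.
ROWS = [
    ("Pruna AI, extra juiced", 2.6, 69.9, 0.9458),
    ("Pruna AI, lightly juiced", 3.57, 69.12, 0.9972),
    ("fal.ai", 4.06, 68.72, 1.0084),
    ("Replicate, go_fast", 3.38, 67.41, 0.9282),
    ("Fireworks AI, fp8", 4.66, 65.55, 0.9467),
    ("Together AI", 3.38, 64.61, 0.9463),
]


def rank(rows, column, higher_is_better):
    """Return labels ordered from best to worst on a single column."""
    ordered = sorted(rows, key=lambda row: row[column], reverse=higher_is_better)
    return [row[0] for row in ordered]


by_time = rank(ROWS, 1, higher_is_better=False)
by_geneval = rank(ROWS, 2, higher_is_better=True)
by_drawbench = rank(ROWS, 3, higher_is_better=True)

for name, ranking in [("time", by_time), ("GenEval", by_geneval), ("DrawBench", by_drawbench)]:
    print(f"{name}: {ranking}")
```

In this subset, the same endpoint can lead on speed and GenEval while sitting near the bottom on DrawBench, which is why no single column should be read as the overall ranking.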

📊 Text-to-Image Leaderboard

This leaderboard compares the performance of different text-to-image providers.

We started with a comprehensive benchmark comparing our very own FLUX-juiced with the “FLUX.1 [dev]” endpoints offered by:

Replicate: https://replicate.com/black-forest-labs/flux-dev
Fal: https://fal.ai/models/fal-ai/flux/dev
Fireworks AI: https://fireworks.ai/models/fireworks/flux-1-dev-fp8
Together AI: https://www.together.ai/models/flux-1-dev

We also included the following non-FLUX providers:

AWS Nova Canvas: https://aws.amazon.com/ai/generative-ai/nova/creative/

All of these inference providers offer their own implementations, but they don’t always disclose the optimisation methods used behind the scenes, and most endpoints have different response times and performance characteristics.

For comparison purposes, we used the same generation set-up for all providers:

28 inference steps
1024×1024 resolution
Guidance scale of 3.5
H100 GPU (80GB); the hardware is only reported by Replicate
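
One way to keep these settings identical across providers is to define them once and merge them into every request. The sketch below is a minimal illustration: the `GenerationConfig` class and `build_request` helper are hypothetical, and the field names follow the keyword arguments used by the diffusers FLUX pipeline, whereas each hosted API may name its parameters differently.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class GenerationConfig:
    """Shared generation settings from the list above (hypothetical helper class)."""

    num_inference_steps: int = 28
    height: int = 1024
    width: int = 1024
    guidance_scale: float = 3.5


def build_request(prompt: str, config: GenerationConfig = GenerationConfig()) -> dict:
    """Merge one prompt with the shared settings into a single keyword dictionary."""
    return {"prompt": prompt, **asdict(config)}


request = build_request("a corgi wearing a red bow tie")
print(request)
```

Freezing the dataclass prevents a setting from being changed by accident between providers, so any difference in the results comes from the endpoint and not from the request.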

Although we tested with this specific Pruna configuration and hardware, the applied compression methods also work with other configurations and hardware!

We published a full blog post on the creation of our FLUX-juiced endpoint.

🧃 FLUX.1-dev (juiced)

FLUX.1-dev (juiced) is our optimized version of FLUX.1-dev, delivering up to 2.6x faster inference than the official Replicate API, without sacrificing image quality.
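
The speed-up can be checked against the leaderboard numbers with simple arithmetic. The sketch below assumes the reference point is the row with optimization "none"; this section does not say which row the 2.6x figure is measured against, so the ratio against the "go_fast" row is shown as well.

```python
# Median inference times copied from the leaderboard (units as reported there).
BASELINE_NO_OPTIMIZATION = 6.88  # FLUX.1-dev row with optimization "none"
REPLICATE_GO_FAST = 3.38         # Black Forest Labs endpoint on Replicate, "go_fast"
EXTRA_JUICED = 2.6               # Pruna AI "extra juiced" row


def speedup(reference_time: float, optimized_time: float) -> float:
    """How many times faster the optimized endpoint is than the reference."""
    return reference_time / optimized_time


vs_baseline = speedup(BASELINE_NO_OPTIMIZATION, EXTRA_JUICED)
vs_go_fast = speedup(REPLICATE_GO_FAST, EXTRA_JUICED)

print(f"vs. unoptimized baseline: {vs_baseline:.2f}x")
print(f"vs. go_fast endpoint:     {vs_go_fast:.2f}x")
```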

Under the hood, it uses a custom combination of:

Graph compilation for optimized execution paths
Inference-time caching for repeated operations

We won’t go deep into the internals here, but here’s the gist:

We combine compiler-level execution graph optimization with selective caching of heavy operations (like attention layers), allowing inference to skip redundant computations without any loss in fidelity.
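
To make the caching idea concrete, here is a deliberately simple sketch of reusing an expensive block's output across denoising steps. It is not Pruna's implementation: the `StepCache` class, the fixed refresh interval, and the `heavy_block` stand-in are all assumptions for illustration, and how the real pipeline decides when a cached result is still valid is not described in this section.

```python
class StepCache:
    """Reuse an expensive function's result for a few consecutive steps.

    Hypothetical helper for illustration only; not part of Pruna or any library.
    """

    def __init__(self, fn, refresh_every: int):
        self.fn = fn
        self.refresh_every = refresh_every
        self.cached = None
        self.calls = 0  # how many times the expensive function actually ran

    def __call__(self, step: int, x):
        # Recompute on the first call and then every `refresh_every` steps.
        if self.cached is None or step % self.refresh_every == 0:
            self.cached = self.fn(x)
            self.calls += 1
        return self.cached


def heavy_block(x):
    """Stand-in for a costly layer such as attention."""
    return [value * 2 for value in x]


cached_block = StepCache(heavy_block, refresh_every=4)
for step in range(28):  # 28 matches the number of inference steps in the benchmark
    out = cached_block(step, [1.0, 2.0])

print(f"expensive calls: {cached_block.calls} of 28 steps")
```

In this toy loop the input never changes, so reuse is exact. In a real diffusion model the inputs change from step to step, so a caching scheme has to choose carefully which results can be reused.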

These techniques are generalized and plug-and-play via the Pruna Pro pipeline, and can be applied to nearly any diffusion-based image model, not just FLUX. For a free but still very juicy model, you can use our open-source solution.

🧪 Try FLUX-juiced now → replicate.com/prunaai/flux.1-juiced

Sample Images

The prompts were randomly sampled from the parti-prompts dataset. The reported times represent the full duration of each API call.
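
A minimal sketch of how such a measurement could be taken: time each call from start to finish, then report the median. The helper names and the `fake_endpoint` placeholder are hypothetical; a real run would replace the placeholder with a provider's API call.

```python
import statistics
import time


def time_call(fn, *args, **kwargs) -> float:
    """Return the wall-clock duration of one call, in seconds."""
    start = time.perf_counter()
    fn(*args, **kwargs)
    return time.perf_counter() - start


def median_latency(fn, prompts) -> float:
    """Median full-call duration over a list of prompts."""
    return statistics.median(time_call(fn, prompt) for prompt in prompts)


def fake_endpoint(prompt: str) -> bytes:
    """Placeholder for a provider API call; sleeps instead of generating an image."""
    time.sleep(0.01)
    return b""


latency = median_latency(fake_endpoint, ["a red cube", "a blue sphere", "a green cone"])
print(f"median latency: {latency:.3f} s")
```

The median is less sensitive than the mean to an occasional slow call, such as a cold start or a network hiccup.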

For samples, check out the Pruna Notion page.

@article{InferBench,
    title={InferBench: A Leaderboard for Inference Providers},
    author={PrunaAI},
    year={2025},
    howpublished={\url{https://huggingface.co/spaces/PrunaAI/InferBench}}
}