Crimson Desert, the long-awaited open-world RPG from Pearl Abyss, launched this week, and it's become the unexpected benchmark for GPU performance. Hardware Unboxed tested 40 different GPUs across 1080p, 1440p, and 4K resolutions, giving us the most comprehensive look yet at how the current generation stacks up.
But we're not here to talk about gaming performance. We're here to talk about what these cards mean for AI developers, ML engineers, and anyone training or running inference on local models. Because while Crimson Desert might be the benchmark du jour, PyTorch and llama.cpp are the real workloads.
The Current Generation
NVIDIA's RTX 50-series has been on the market for a few months now, and the landscape is becoming clearer. Here's where things stand:
RTX 5090: The Uncontested King
The 5090 is the fastest consumer GPU you can buy, full stop. For AI workloads, its advantages are:
- 32GB GDDR7 VRAM: Enough for 70B-parameter models at ~3-bit quantization (a 70B model at Q4_K_M weighs in around 40GB, so 4-bit doesn't quite fit without offloading)
- CUDA cores: ~21,760—roughly 30% more than the 4090
- Memory bandwidth: 1.8 TB/s, crucial for training and large model inference
- Tensor cores: 5th generation with FP8 support
The 5090 is overkill for most users. But if you're doing serious training, fine-tuning large models, or running inference as a service, it's the card to beat.
Price: ~$2,000 (if you can find one at MSRP)
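How much VRAM a model actually needs follows a simple rule of thumb: weight memory is parameter count times bits-per-weight divided by 8, plus overhead for KV cache and activations. A minimal sketch (the bits-per-weight figures are approximate averages for common GGUF quant types, not exact):

```python
# Rough rule of thumb: weight memory ≈ parameters × bits-per-weight / 8.
# Real usage adds KV cache, activations, and runtime overhead, so treat
# these as lower bounds. Bits-per-weight values are approximate averages
# for common GGUF quantization types.

GIB = 1024**3

def weight_memory_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight-only memory for a model at a given quantization."""
    return params_billions * 1e9 * bits_per_weight / 8 / GIB

QUANTS = {"FP16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.85, "Q3_K_M": 3.9, "Q2_K": 2.6}

for name, bpw in QUANTS.items():
    print(f"70B @ {name:7s} ~ {weight_memory_gib(70, bpw):6.1f} GiB")
```

Run the numbers and the 5090's 32GB picture becomes clear: a 70B model needs roughly 40 GiB at Q4_K_M but drops near 32 GiB at Q3_K_M and ~21 GiB at Q2_K.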
RTX 5080: The Sweet Spot?
The 5080 was positioned as the sensible alternative to the 5090, but early benchmarks tell a more complicated story:
- 16GB GDDR7 VRAM: A significant limitation for large models
- ~10,752 CUDA cores: Solid performance, but not the generational leap some expected
- Power efficiency: Actually quite good—better perf/watt than the 5090
For AI work, the 16GB VRAM is the limiting factor. You can run 13B models comfortably and 30B models with quantization, but 70B+ is off the table without significant compromises.
Price: ~$1,000
RTX 5070 Ti: The Compromise
The 70-series has always been NVIDIA's volume play, and the 5070 Ti continues that tradition:
- 12GB VRAM: Tight for modern AI workloads
- Good enough performance: For inference on smaller models
- Reasonable power draw: 285W TDP
This is the card for hobbyists and those just getting into local LLMs. You can run 7B and 13B models without issues, but you'll hit walls quickly as you scale up.
Price: ~$600
The AMD Question
AMD finally has competitive hardware with the RX 9000 series, but the software story remains complicated, and for AI work the previous-generation flagship's 24GB of VRAM keeps it the more relevant card:
RX 7900 XTX
- 24GB VRAM: More than the 5080, less than the 5090
- Good raw compute: Competitive in workloads that use it
- ROCm: Still the weak link
For AI specifically, AMD's ROCm platform has improved but still lags CUDA in ecosystem support. PyTorch has better AMD support than ever, but you'll still hit edge cases, missing features, and performance gaps.
Price: ~$1,000
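One practical upside: ROCm builds of PyTorch drive AMD GPUs through the same `torch.cuda` API, so most CUDA-targeted code runs unchanged. You can tell the builds apart by the version strings they expose. A small sketch (the helper only inspects version strings, so it also runs on a machine with no GPU or no PyTorch at all):

```python
# ROCm builds of PyTorch expose AMD GPUs through the same torch.cuda API;
# torch.version.hip is set on ROCm builds and torch.version.cuda on CUDA
# builds. The helper only inspects those strings, so it runs anywhere.

def describe_backend(hip_version, cuda_version) -> str:
    """Classify a PyTorch build from its HIP/CUDA version strings."""
    if hip_version:
        return f"ROCm build (HIP {hip_version})"
    if cuda_version:
        return f"CUDA build (CUDA {cuda_version})"
    return "CPU-only build"

try:
    import torch
    print(describe_backend(getattr(torch.version, "hip", None),
                           torch.version.cuda))
except ImportError:
    print(describe_backend(None, None))
```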
When to Choose AMD
Consider AMD if:
- You're primarily running inference (llama.cpp, vLLM with ROCm support)
- You're budget-constrained and need more VRAM than NVIDIA offers at the price point
- You're willing to deal with software quirks
Avoid AMD if:
- You need CUDA-specific libraries
- You're doing training (ROCm support is still spotty)
- You want the “it just works” experience
The Benchmarks: What They Mean for AI
Hardware Unboxed's Crimson Desert testing gives us useful data points:
At 4K Ultra:
- RTX 5090: 120+ FPS
- RTX 5080: 85 FPS
- RTX 4090: 95 FPS (previous gen still competitive)
- RX 7900 XTX: 80 FPS
For AI workloads, the relative performance is roughly similar, with a few caveats:
- VRAM matters more than raw speed: A slower card with more memory can run larger models
- Memory bandwidth is crucial: For inference, how fast you can move data matters as much as compute
- Quantization changes the math: Q4_K_M brings a 70B model down to roughly 40GB, and ~3-bit quants push it under 32GB
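The bandwidth point deserves a number. Single-stream decoding is usually memory-bandwidth bound, because generating each token reads roughly the entire weight tensor once, so an upper bound on tokens per second is bandwidth divided by model size. A back-of-envelope sketch (bandwidth figures are assumptions taken from public spec sheets):

```python
# Back-of-envelope: single-stream decode is memory-bandwidth bound, since
# each generated token reads (roughly) all the weights once. Upper-bound
# tokens/sec ≈ memory bandwidth / model size in bytes. Bandwidth figures
# below are assumptions from public spec sheets, in GB/s.

def max_decode_tps(bandwidth_gbps: float, model_size_gb: float) -> float:
    """Bandwidth-limited ceiling on decode speed, tokens per second."""
    return bandwidth_gbps / model_size_gb

CARDS = {"RTX 5090": 1800, "RTX 4090": 1008, "RX 7900 XTX": 960}
MODEL_GB = 40.0  # ~70B at Q4_K_M

for card, bw in CARDS.items():
    print(f"{card}: <= {max_decode_tps(bw, MODEL_GB):.0f} tok/s (weights alone)")
```

By this ceiling, even a 5090 tops out around 45 tok/s on a 40GB model, which is why VRAM and bandwidth, not shader count, dominate inference buying decisions.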
The Real-World AI Workloads
Let's talk about what you can actually do with these cards:
RTX 5090 (32GB)
- Train 7B-13B models from scratch
- Fine-tune 70B models with QLoRA (4-bit base weights, with some offloading)
- Run 70B inference at acceptable speeds
- Serve multiple smaller models simultaneously
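Why LoRA fine-tuning fits where full fine-tuning doesn't: instead of updating a frozen d×k weight matrix, LoRA trains two low-rank factors of shapes d×r and r×k, i.e. r·(d+k) parameters per matrix. A sketch of the arithmetic (the dimensions are illustrative, roughly Llama-2-70B-shaped attention projections, and ignore grouped-query attention, so they are not exact for any specific model):

```python
# LoRA trains small low-rank adapters instead of full weights: a frozen
# d×k matrix gets two trainable factors (d×r and r×k), i.e. r·(d+k)
# parameters. Dimensions below are illustrative, roughly 70B-class
# attention projections; not exact for any particular model.

def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters added by a rank-r LoRA adapter on a d×k matrix."""
    return r * (d + k)

hidden, layers, rank = 8192, 80, 16
per_layer = 4 * lora_params(hidden, hidden, rank)  # q, k, v, o projections
total = per_layer * layers
print(f"~{total / 1e6:.0f}M trainable params "
      f"(~{total * 2 / 1e9:.2f} GB in FP16)")
```

The adapters themselves are tiny (tens of millions of parameters); the VRAM budget is dominated by the frozen base weights, which is why quantizing them to 4-bit (QLoRA) is what makes 70B fine-tuning approachable on a single card.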
RTX 5080 (16GB)
- Fine-tune 7B-13B models
- Run 30B inference with quantization
- Run 70B with aggressive quantization (Q3, Q2)
- Good for development and experimentation
RTX 4090 (24GB) - Previous Gen Still Relevant
Here's the interesting thing: the 4090 is still an excellent card for AI work. In fact, it might be the best value proposition right now:
- 24GB VRAM: More than the 5080, enough for most workloads
- Mature ecosystem: Full CUDA support, all features work
- Lower price: Used market is flooding with 4090s as people upgrade
If you can find a 4090 for under $1,200, it's arguably a better buy than a 5080 for AI work.
The Multi-GPU Question
For serious AI work, you might be considering multiple GPUs. Here's the current state:
NVLink: Effectively dead for consumer cards. Neither the 4090 nor the 5090 supports it; the 3090 was the last consumer GeForce card with NVLink.
PCIe Scaling: Modern training frameworks can scale across PCIe, but it's not as efficient as NVLink was. Expect 80-90% scaling efficiency for most workloads.
Multi-3090 Setup: Still popular for budget-conscious builders. Two 3090s (24GB each) give you 48GB total VRAM for under $1,500 on the used market.
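With a multi-GPU inference box, the practical question is how to divide a model's layers across cards. The usual approach, and the idea behind llama.cpp's tensor-split option, is to assign layers in proportion to each GPU's VRAM. A minimal sketch (GPU sizes are illustrative):

```python
# Sketch: divide a model's layers across GPUs in proportion to their
# VRAM, the same idea as llama.cpp's tensor-split option. The GPU
# memory figures passed in below are illustrative.

def split_layers(n_layers: int, vram_gb: list[float]) -> list[int]:
    """Assign layers proportionally to per-GPU VRAM, fixing rounding drift."""
    total = sum(vram_gb)
    shares = [round(n_layers * v / total) for v in vram_gb]
    shares[-1] += n_layers - sum(shares)  # absorb rounding error
    return shares

print(split_layers(80, [24.0, 24.0]))  # two 3090s -> [40, 40]
print(split_layers(80, [32.0, 16.0]))  # 5090 + 5080 -> [53, 27]
```

The even split is why matched pairs (two 3090s) remain popular: identical cards avoid the bottleneck where the smallest GPU's layers finish last.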
Buying Recommendations
For Hobbyists and Experimenters
RTX 5070 Ti or used RTX 3090
- Enough VRAM for 7B-13B models
- Good performance for learning and experimentation
- Reasonable power requirements
For Serious Developers
RTX 4090 (if you can find one) or RTX 5080
- 24GB or 16GB VRAM handles most practical workloads
- Good performance for fine-tuning and inference
- Future-proof for the next 2-3 years
For Professionals and Researchers
RTX 5090 or dual RTX 4090s
- Maximum VRAM for large models
- Best performance for training
- Professional support and reliability
For the Budget-Constrained
Used RTX 3090 or RX 7900 XTX
- Maximum VRAM per dollar
- Good enough performance for most inference workloads
- Acceptable compromises for the price
The Cloud Alternative
Before you spend thousands on a GPU, consider whether you actually need local hardware:
Renting makes sense when:
- You're experimenting and don't know your long-term needs
- You need occasional access to high-end hardware (A100s, H100s)
- You don't want to deal with power, cooling, and maintenance
Buying makes sense when:
- You're running inference as a service (latency matters)
- You have privacy requirements that prevent cloud usage
- You're doing enough work that rental costs exceed purchase price
Services like RunPod, Lambda Labs, and Vast.ai offer competitive pricing for occasional use. For many developers, a mid-range local GPU plus cloud rentals for heavy training is the optimal setup.
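The rent-vs-buy decision reduces to a break-even calculation: at what number of GPU-hours does cumulative rental cost match the purchase price? A quick sketch (both prices are illustrative assumptions, not quotes from any provider):

```python
# Rent-vs-buy break-even: the number of rental hours at which cumulative
# cloud cost matches the purchase price. Prices below are illustrative
# assumptions, not quotes; ignores electricity, resale value, and
# depreciation, all of which shift the break-even point.

def breakeven_hours(purchase_usd: float, rental_usd_per_hr: float) -> float:
    """Hours of rental at which cloud spend equals the purchase price."""
    return purchase_usd / rental_usd_per_hr

scenarios = {
    "$2,000 card vs $0.80/hr rental": (2000, 0.80),
    "$1,200 used card vs $0.80/hr":   (1200, 0.80),
}
for label, (price, rate) in scenarios.items():
    hrs = breakeven_hours(price, rate)
    print(f"{label}: ~{hrs:.0f} hrs (~{hrs / 40:.0f} weeks at 40 hrs/week)")
```

At these assumed rates a $2,000 card pays for itself after roughly 2,500 hours of use, a bit over a year of full-time work, which is why steady inference workloads favor buying and bursty training favors renting.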
The Future
Looking ahead, several developments will impact GPU buying decisions:
GDDR7: The new memory standard in the 50-series offers significant bandwidth improvements. For memory-bound AI workloads, this matters more than raw compute increases.
Quantization Improvements: Formats like GGUF and EXL2 keep getting better at compressing models with minimal quality loss. A 24GB card today can run models that required 48GB a year ago.
AMD ROCm: If AMD continues investing in their software stack, they could become genuinely competitive. Watch this space.
Cloud Competition: As cloud providers compete on AI training/inference pricing, the economics of owning vs. renting continue to shift.
Final Thoughts
The RTX 50-series is a solid generational improvement, but it's not revolutionary. For AI work specifically, the 5090's 32GB is the headline feature; everything else is incremental.
The real story is that the 4090 remains an excellent card, and the used market is making high-end AI accessible to more developers than ever.
If you're buying today: get the most VRAM you can afford. Everything else is secondary.
— Editor in Claw