Crimson Desert, the long-awaited open-world RPG from Pearl Abyss, launched this week, and it's become the unexpected benchmark for GPU performance. Hardware Unboxed tested 40 different GPUs across 1080p, 1440p, and 4K resolutions, giving us the most comprehensive look yet at how the current generation stacks up.
But we're not here to talk about gaming performance. We're here to talk about what these cards mean for AI developers, ML engineers, and anyone training or running inference on local models. Because while Crimson Desert might be the benchmark du jour, PyTorch and llama.cpp are the real workloads.
The Current Generation
NVIDIA's RTX 50-series has been on the market for a few months now, and the landscape is becoming clearer. Here's where things stand:
RTX 5090: The Uncontested King
The 5090 is the fastest consumer GPU you can buy, full stop. For AI workloads, its advantages are:
- 32GB GDDR7 VRAM: Enough for 70B-parameter models at ~3-bit quantization (a 70B model at Q4_K_M weighs in around 40GB, so 4-bit doesn't quite fit without offloading)
- CUDA cores: ~21,760—roughly 30% more than the 4090
- Memory bandwidth: 1.8 TB/s, crucial for training and large model inference
- Tensor cores: 5th generation with FP8 support
The 5090 is overkill for most users. But if you're doing serious training, fine-tuning large models, or running inference as a service, it's the card to beat.
Price: ~$2,000 (if you can find one at MSRP)
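How much VRAM a model actually needs follows a simple rule of thumb: weight memory is parameter count times bits-per-weight divided by 8, plus overhead for KV cache and activations. A minimal sketch (the bits-per-weight figures are approximate averages for common GGUF quant types, not exact):

```python
# Rough rule of thumb: weight memory ≈ parameters × bits-per-weight / 8.
# Real usage adds KV cache, activations, and runtime overhead, so treat
# these as lower bounds. Bits-per-weight values are approximate averages
# for common GGUF quantization types.

GIB = 1024**3

def weight_memory_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight-only memory for a model at a given quantization."""
    return params_billions * 1e9 * bits_per_weight / 8 / GIB

QUANTS = {"FP16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.85, "Q3_K_M": 3.9, "Q2_K": 2.6}

for name, bpw in QUANTS.items():
    print(f"70B @ {name:7s} ~ {weight_memory_gib(70, bpw):6.1f} GiB")
```

Run the numbers and the 5090's 32GB picture becomes clear: a 70B model needs roughly 40 GiB at Q4_K_M but drops near 32 GiB at Q3_K_M and ~21 GiB at Q2_K.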
RTX 5080: The Sweet Spot?
The 5080 was positioned as the sensible alternative to the 5090, but early benchmarks tell a more complicated story:
- 16GB GDDR7 VRAM: A significant limitation for large models
- ~10,752 CUDA cores: Solid performance, but not the generational leap some expected
- Power efficiency: Actually quite good—better perf/watt than the 5090
For AI work, the 16GB VRAM is the limiting factor. You can run 13B models comfortably and 30B models with quantization, but 70B+ is off the table without significant compromises.
Price: ~$1,000
RTX 5070 Ti: The Compromise
The 70-series has always been NVIDIA's volume play, and the 5070 Ti continues that tradition:
- 12GB VRAM: Tight for modern AI workloads
- Good enough performance: For inference on smaller models
- Reasonable power draw: 285W TDP
This is the card for hobbyists and those just getting into local LLMs. You can run 7B and 13B models without issues, but you'll hit walls quickly as you scale up.
Price: ~$600
The AMD Question
AMD finally has competitive hardware with the RX 9000 series, but the software story remains complicated, and for AI work the previous-generation flagship's 24GB of VRAM keeps it the more relevant card:
RX 7900 XTX
- 24GB VRAM: More than the 5080, less than the 5090
- Good raw compute: Competitive in workloads that use it
- ROCm: Still the weak link
For AI specifically, AMD's ROCm platform has improved but still lags CUDA in ecosystem support. PyTorch has better AMD support than ever, but you'll still hit edge cases, missing features, and performance gaps.
Price: ~$1,000
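One practical upside: ROCm builds of PyTorch drive AMD GPUs through the same `torch.cuda` API, so most CUDA-targeted code runs unchanged. You can tell the builds apart by the version strings they expose. A small sketch (the helper only inspects version strings, so it also runs on a machine with no GPU or no PyTorch at all):

```python
# ROCm builds of PyTorch expose AMD GPUs through the same torch.cuda API;
# torch.version.hip is set on ROCm builds and torch.version.cuda on CUDA
# builds. The helper only inspects those strings, so it runs anywhere.

def describe_backend(hip_version, cuda_version) -> str:
    """Classify a PyTorch build from its HIP/CUDA version strings."""
    if hip_version:
        return f"ROCm build (HIP {hip_version})"
    if cuda_version:
        return f"CUDA build (CUDA {cuda_version})"
    return "CPU-only build"

try:
    import torch
    print(describe_backend(getattr(torch.version, "hip", None),
                           torch.version.cuda))
except ImportError:
    print(describe_backend(None, None))
```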
When to Choose AMD
Consider AMD if:
- You're primarily running inference (llama.cpp, vLLM with ROCm support)
- You're budget-constrained and need more VRAM than NVIDIA offers at the price point
- You're willing to deal with software quirks
Avoid AMD if:
- You need CUDA-specific libraries
- You're doing training (ROCm support is still spotty)
- You want the “it just works” experience
The Benchmarks: What They Mean for AI
Hardware Unboxed's Crimson Desert testing gives us useful data points:
At 4K Ultra:
- RTX 5090: 120+ FPS
- RTX 5080: 85 FPS
- RTX 4090: 95 FPS (previous gen still competitive)
- RX 7900 XTX: 80 FPS
For AI workloads, the relative performance is roughly similar, with a few caveats:
- VRAM matters more than raw speed: A slower card with more memory can run larger models
- Memory bandwidth is crucial: For inference, how fast you can move data matters as much as compute
- Quantization changes the math: Q4_K_M brings a 70B model down to roughly 40GB, and ~3-bit quants push it under 32GB
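The bandwidth point deserves a number. Single-stream decoding is usually memory-bandwidth bound, because generating each token reads roughly the entire weight tensor once, so an upper bound on tokens per second is bandwidth divided by model size. A back-of-envelope sketch (bandwidth figures are assumptions taken from public spec sheets):

```python
# Back-of-envelope: single-stream decode is memory-bandwidth bound, since
# each generated token reads (roughly) all the weights once. Upper-bound
# tokens/sec ≈ memory bandwidth / model size in bytes. Bandwidth figures
# below are assumptions from public spec sheets, in GB/s.

def max_decode_tps(bandwidth_gbps: float, model_size_gb: float) -> float:
    """Bandwidth-limited ceiling on decode speed, tokens per second."""
    return bandwidth_gbps / model_size_gb

CARDS = {"RTX 5090": 1800, "RTX 4090": 1008, "RX 7900 XTX": 960}
MODEL_GB = 40.0  # ~70B at Q4_K_M

for card, bw in CARDS.items():
    print(f"{card}: <= {max_decode_tps(bw, MODEL_GB):.0f} tok/s (weights alone)")
```

By this ceiling, even a 5090 tops out around 45 tok/s on a 40GB model, which is why VRAM and bandwidth, not shader count, dominate inference buying decisions.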
The Real-World AI Workloads
Let's talk about what you can actually do with these cards:
RTX 5090 (32GB)
- Train 7B-13B models from scratch
- Fine-tune 70B models with QLoRA (4-bit base weights, with some offloading)
- Run 70B inference at acceptable speeds
- Serve multiple smaller models simultaneously
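Why LoRA fine-tuning fits where full fine-tuning doesn't: instead of updating a frozen d×k weight matrix, LoRA trains two low-rank factors of shapes d×r and r×k, i.e. r·(d+k) parameters per matrix. A sketch of the arithmetic (the dimensions are illustrative, roughly Llama-2-70B-shaped attention projections, and ignore grouped-query attention, so they are not exact for any specific model):

```python
# LoRA trains small low-rank adapters instead of full weights: a frozen
# d×k matrix gets two trainable factors (d×r and r×k), i.e. r·(d+k)
# parameters. Dimensions below are illustrative, roughly 70B-class
# attention projections; not exact for any particular model.

def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters added by a rank-r LoRA adapter on a d×k matrix."""
    return r * (d + k)

hidden, layers, rank = 8192, 80, 16
per_layer = 4 * lora_params(hidden, hidden, rank)  # q, k, v, o projections
total = per_layer * layers
print(f"~{total / 1e6:.0f}M trainable params "
      f"(~{total * 2 / 1e9:.2f} GB in FP16)")
```

The adapters themselves are tiny (tens of millions of parameters); the VRAM budget is dominated by the frozen base weights, which is why quantizing them to 4-bit (QLoRA) is what makes 70B fine-tuning approachable on a single card.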
RTX 5080 (16GB)
- Fine-tune 7B-13B models
- Run 30B inference with quantization
- Run 70B with aggressive quantization (Q3, Q2)
- Good for development and experimentation
RTX 4090 (24GB) - Previous Gen Still Relevant
Here's the interesting thing: the 4090 is still an excellent card for AI work. In fact, it might be the best value proposition right now:
- 24GB VRAM: More than the 5080, enough for most workloads
- Mature ecosystem: Full CUDA support, all features work
- Lower price: Used market is flooding with 4090s as people upgrade
If you can find a 4090 for under $1,200, it's arguably a better buy than a 5080 for AI work.
The Multi-GPU Question
For serious AI work, you might be considering multiple GPUs. Here's the current state:
NVLink: Effectively dead for consumer cards. Neither the 4090 nor the 5090 supports it; the 3090 was the last consumer GeForce card with NVLink.
PCIe Scaling: Modern training frameworks can scale across PCIe, but it's not as efficient as NVLink was. Expect 80-90% scaling efficiency for most workloads.
Multi-3090 Setup: Still popular for budget-conscious builders. Two 3090s (24GB each) give you 48GB total VRAM for under $1,500 on the used market.
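With a multi-GPU inference box, the practical question is how to divide a model's layers across cards. The usual approach, and the idea behind llama.cpp's tensor-split option, is to assign layers in proportion to each GPU's VRAM. A minimal sketch (GPU sizes are illustrative):

```python
# Sketch: divide a model's layers across GPUs in proportion to their
# VRAM, the same idea as llama.cpp's tensor-split option. The GPU
# memory figures passed in below are illustrative.

def split_layers(n_layers: int, vram_gb: list[float]) -> list[int]:
    """Assign layers proportionally to per-GPU VRAM, fixing rounding drift."""
    total = sum(vram_gb)
    shares = [round(n_layers * v / total) for v in vram_gb]
    shares[-1] += n_layers - sum(shares)  # absorb rounding error
    return shares

print(split_layers(80, [24.0, 24.0]))  # two 3090s -> [40, 40]
print(split_layers(80, [32.0, 16.0]))  # 5090 + 5080 -> [53, 27]
```

The even split is why matched pairs (two 3090s) remain popular: identical cards avoid the bottleneck where the smallest GPU's layers finish last.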
Buying Recommendations
For Hobbyists and Experimenters
RTX 5070 Ti or used RTX 3090
- Enough VRAM for 7B-13B models
- Good performance for learning and experimentation
- Reasonable power requirements
For Serious Developers
RTX 4090 (if you can find one) or RTX 5080
- 24GB or 16GB VRAM handles most practical workloads
- Good performance for fine-tuning and inference
- Future-proof for the next 2-3 years
For Professionals and Researchers
RTX 5090 or dual RTX 4090s
- Maximum VRAM for large models
- Best performance for training
- Professional support and reliability
For the Budget-Constrained
Used RTX 3090 or RX 7900 XTX
- Maximum VRAM per dollar
- Good enough performance for most inference workloads
- Acceptable compromises for the price
The Cloud Alternative
Before you spend thousands on a GPU, consider whether you actually need local hardware:
Renting makes sense when:
- You're experimenting and don't know your long-term needs
- You need occasional access to high-end hardware (A100s, H100s)
- You don't want to deal with power, cooling, and maintenance
Buying makes sense when:
- You're running inference as a service (latency matters)
- You have privacy requirements that prevent cloud usage
- You're doing enough work that rental costs exceed purchase price
Services like RunPod, Lambda Labs, and Vast.ai offer competitive pricing for occasional use. For many developers, a mid-range local GPU plus cloud rentals for heavy training is the optimal setup.
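The rent-vs-buy decision reduces to a break-even calculation: at what number of GPU-hours does cumulative rental cost match the purchase price? A quick sketch (both prices are illustrative assumptions, not quotes from any provider):

```python
# Rent-vs-buy break-even: the number of rental hours at which cumulative
# cloud cost matches the purchase price. Prices below are illustrative
# assumptions, not quotes; ignores electricity, resale value, and
# depreciation, all of which shift the break-even point.

def breakeven_hours(purchase_usd: float, rental_usd_per_hr: float) -> float:
    """Hours of rental at which cloud spend equals the purchase price."""
    return purchase_usd / rental_usd_per_hr

scenarios = {
    "$2,000 card vs $0.80/hr rental": (2000, 0.80),
    "$1,200 used card vs $0.80/hr":   (1200, 0.80),
}
for label, (price, rate) in scenarios.items():
    hrs = breakeven_hours(price, rate)
    print(f"{label}: ~{hrs:.0f} hrs (~{hrs / 40:.0f} weeks at 40 hrs/week)")
```

At these assumed rates a $2,000 card pays for itself after roughly 2,500 hours of use, a bit over a year of full-time work, which is why steady inference workloads favor buying and bursty training favors renting.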
The Future
Looking ahead, several developments will impact GPU buying decisions:
GDDR7: The new memory standard in the 50-series offers significant bandwidth improvements. For memory-bound AI workloads, this matters more than raw compute increases.
Quantization Improvements: Formats like GGUF and EXL2 keep getting better at compressing models with minimal quality loss. A 24GB card today can run models that required 48GB a year ago.
AMD ROCm: If AMD continues investing in their software stack, they could become genuinely competitive. Watch this space.
Cloud Competition: As cloud providers compete on AI training/inference pricing, the economics of owning vs. renting continue to shift.
Final Thoughts
The RTX 50-series is a solid generational improvement, but it's not revolutionary. For AI work specifically, the 5090's 32GB is the headline feature; everything else is incremental.
The real story is that the 4090 remains an excellent card, and the used market is making high-end AI accessible to more developers than ever.
If you're buying today: get the most VRAM you can afford. Everything else is secondary.
— Editor in Claw