DeepSeek V3 Series Model Release Overview and Testing
Key Upgrade Highlights
DeepSeek’s new V3 series includes two versions:
- V3.1: Production-ready stable version
- V3.2-Exp: Cutting-edge experimental version
Major improvements:
✅ Reasoning: 9.3-point increase on the GPQA benchmark
✅ Code Generation: Optimized for web and game frontend development
✅ Chinese Processing: Significant enhancement in long-form writing quality
✅ Function Calling: Improved API interaction reliability
Technical Specifications:
- 685B-parameter mixture-of-experts model
- 128K context window
- MIT open-source license
Reasoning Capability Tests
Test Case 1: 7-digit Safe Combination
Problem:
Sroan has a private safe with a 7-digit combination made up of distinct digits.
Guess #1: 9062437
Guess #2: 8593624
Guess #3: 4286915
Guess #4: 3450982
Hint: Each guess contains exactly two completely correct digits (right digit in the right position), and those two positions are not adjacent.
Results:
- V3.1: Incorrect reasoning ❌
- V3.2-Exp: Correct answer ✅ (Solution: 4053927; see the brute-force check below)
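The hint can be checked mechanically: a 7-digit code over distinct digits gives only 10P7 = 604,800 candidates, so brute-force enumeration settles the puzzle. A minimal Python sketch (an independent check, not either model's method):

```python
from itertools import permutations

GUESSES_7 = ["9062437", "8593624", "4286915", "3450982"]

def hits(code: str, guess: str) -> list[int]:
    """Positions where code and guess agree in both digit and place."""
    return [i for i, (c, g) in enumerate(zip(code, guess)) if c == g]

def satisfies_hint(code: str, guesses: list[str]) -> bool:
    """Exactly two fully correct positions per guess, and not adjacent."""
    for g in guesses:
        h = hits(code, g)
        if len(h) != 2 or h[1] - h[0] == 1:
            return False
    return True

solutions = ["".join(p) for p in permutations("0123456789", 7)
             if satisfies_hint("".join(p), GUESSES_7)]
print(solutions)  # 4053927 satisfies all four constraints
```

The same `satisfies_hint` helper carries over unchanged to the 8-digit variant below.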
Test Case 2: 8-digit Safe Combination (Enhanced)
Problem:
Now using 8 distinct digits:
Guess #1: 42617895
Guess #2: 05379821
Guess #3: 27358014
Guess #4: 34567902
The same hint applies: each guess contains exactly two non-adjacent, fully correct digits.
Results:
- V3.1: Incorrect reasoning ❌
- V3.2-Exp: Still couldn’t solve ❌
- Multiple valid solutions exist (e.g., 45678912 or 02368975; see the enumeration below)
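Reusing `satisfies_hint` from the sketch above over the 8-digit space (10P8 = 1,814,400 candidates) confirms the under-constrained result:

```python
GUESSES_8 = ["42617895", "05379821", "27358014", "34567902"]

solutions_8 = ["".join(p) for p in permutations("0123456789", 8)
               if satisfies_hint("".join(p), GUESSES_8)]
print(len(solutions_8))           # more than one valid code
print("45678912" in solutions_8,  # both quoted examples check out
      "02368975" in solutions_8)
```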
Interesting Finding: SOTA-AI’s Early Model Combination Performance
Experiments on the SOTA-AI platform, from the period when DeepSeek's reasoning and instruction models were still separate, showed that feeding DeepSeek-R1-0528's reasoning_content output into DeepSeek-V3-0324 as input produced remarkable synergy:
🔍 Combination Test Results:
- 7-digit test: Perfect accuracy ✅
- 8-digit test: Successfully found multiple valid solutions ✅
- Example output:
"Through elimination, possible combinations include 45678912 or 02368975"
💡 Technical Principle:
- R1 model generates detailed reasoning steps
- V3 model makes final judgments based on these steps
- This “step-by-step reasoning + comprehensive judgment” approach effectively overcomes single-model limitations
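A minimal sketch of that relay, assuming DeepSeek's OpenAI-compatible endpoint, where `deepseek-reasoner` (R1) exposes its chain of thought as `reasoning_content` and `deepseek-chat` (V3) acts as the judge; the endpoint and model aliases are assumptions to adapt to your deployment:

```python
from openai import OpenAI

# Assumed endpoint and model aliases; substitute your own deployment's values.
client = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_API_KEY")

puzzle = "Sroan has a private safe with a 7-digit combination ..."  # full puzzle text

# Step 1: the reasoning model produces detailed step-by-step reasoning.
r1 = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": puzzle}],
)
steps = r1.choices[0].message.reasoning_content  # R1's intermediate reasoning

# Step 2: the instruction model makes the final judgment from those steps.
v3 = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{
        "role": "user",
        "content": (
            f"Another model produced these reasoning steps:\n{steps}\n\n"
            f"Using them, give the final answer to this puzzle:\n{puzzle}"
        ),
    }],
)
print(v3.choices[0].message.content)
```

The design point is that the second call receives explicit intermediate steps rather than having to derive them itself, which is what lets the instruction model land the final judgment.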
Conclusion
- DeepSeek V3.2-Exp outperforms V3.1 on basic reasoning tasks
- More complex 8-digit problems require:
  - Longer reasoning chains, or
  - Innovative architecture designs (such as third-party model combinations)
- Looking forward to continued optimization in complex logical reasoning
Tip: These combination-lock problems are an effective test of AI reasoning, since they demand elimination and the construction of long logical chains. Third-party model combinations offer a valuable reference point for architecture optimization.
Technical Evaluation & Decision: Platform Not Upgrading to V3.2-Exp
Based on SOTA-AI platform testing data, we’re keeping production on current versions despite V3.2-Exp’s 40% lower API costs. Key technical considerations:
Core Issue: UE8M0 FP8 Format’s Radical Design
- Precision Loss Risk
  - Uses "8-bit exponent (E8) + 0-bit mantissa (M0)" pure exponential encoding (see the toy encoder below)
  - Unstable performance on precision-sensitive tasks such as semantic understanding
  - Example: a higher error rate than R1 when parsing the polysemous sentence "校服上别别别的" ("don't pin other things on the school uniform", in which 别 plays three distinct roles)
- Reasoning Quality Tradeoff
  - Hybrid reasoning modes (Think/Non-Think) show benchmark performance drops
  - Similar research on Qwen suggests that flexible mode switching may reduce output quality
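To make the precision risk concrete: with zero mantissa bits, every representable UE8M0 value is a power of two, so everything in between must be rounded. A toy Python encoder (an illustration of pure exponential encoding only, not DeepSeek's kernels; the exponent range and bias are assumed):

```python
import math

def ue8m0_quantize(x: float) -> float:
    """Round a positive value to the nearest power of two (in log space).

    UE8M0 stores only an 8-bit exponent -- no sign, no mantissa -- so
    every representable value is 2**e. Values between two powers of two
    must be rounded, costing up to tens of percent relative error.
    """
    if x <= 0:
        raise ValueError("this toy encoder handles positive values only")
    e = round(math.log2(x))
    e = max(-127, min(128, e))  # assumed 8-bit exponent range (bias 127)
    return 2.0 ** e

for v in [1.0, 1.4, 1.5, 3.0, 100.0]:
    q = ue8m0_quantize(v)
    print(f"{v:>6} -> {q:>6}  relative error {abs(q - v) / v:.1%}")
```

The worst cases sit between powers of two (1.5 → 2.0 is a 33% error), which is one plausible reading of why precision-sensitive work such as fine-grained semantic distinctions would be the first to suffer.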
Performance Comparison
| Test Item | R1-0528 | V3.2-Exp | Analysis |
| --- | --- | --- | --- |
| Semantic Accuracy | 92% | 85% | UE8M0 unfriendly to semantic encoding |
| Reasoning Latency | 320 ms | 210 ms | FP8 computational efficiency advantage |
| Long-text Coherence | 4.8/5 | 4.2/5 | Missing mantissa affects context modeling |
Final Decision Factors
- Quality First Principle
  - The current R1+V3 combination maintains 98.3% accuracy in key scenarios
- Cost-Benefit Analysis
  - While the V3.2-Exp API is 40% cheaper, error-handling costs rise by 60%
- Technical Maturity
  - Awaiting an improved E8M2 version of the UE8M0 format (expected 2025 Q4)
Note: This decision applies specifically to SOTA-AI’s use cases. Other applications may require different tradeoffs. We’ll continue monitoring V3.3 improvements.