AI Explained Official Podcast

Episodes

o3 breaks (some) records, but AI becomes pay-to-win

Apr 25 2025

A green card, o3 vs Gemini 2.5, 6 Benchmarks and a whole bunch of my thoughts on what on earth is happening in AI, from here to 2030. Plus, how AI is becoming pay-to-win, and why. Crazy times, 14 mins probably wasn’t enough.

https://app.grayswan.ai/ai-explained

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
00:33 - FictionLiveBench
01:37 - PHYBench
02:14 - SimpleBench
02:54 - Virology Capabilities Test
03:13 - Mathematics Performance
04:29 - Vision Benchmarks
05:43 - V* and how o3 works
06:44 - Revenue and costs for you
08:54 - Expensive RL and trade-offs
09:40 - How to spend the OOMs
13:27 - Gray Swan Arena

Green Card: https://techcrunch.com/2025/04/25/an-openai-researcher-who-worked-on-gpt-4-5-had-their-green-card-denied/
PHYBench: https://arxiv.org/pdf/2504.16074
Virology Test: https://www.virologytest.ai/
How o3 Vision Works: https://arxiv.org/pdf/2312.14135 https://x.com/sainingxie/status/1912570624523829573
Visual puzzles: https://neulab.github.io/VisualPuzzles/
Fiction Bench: https://x.com/ficlive/status/1912863028141244850
https://geobench.org/
https://simple-bench.com/
AIME 2025: https://openai.com/index/introducing-o3-and-o4-mini/
USAMO: https://x.com/mbalunovic/status/1914398518896193747
NaturalBench: https://linzhiqiu.github.io/papers/naturalbench/
Where’s Waldo: https://uk.pinterest.com/pin/492792384225896298/
IMO and AlphaProof: https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/
Crazy Revenue: https://www.theinformation.com/articles/openai-forecasts-revenue-topping-125-billion-2029-agents-new-products-gain?rc=sy0ihq
Number of Users: https://www.theinformation.com/briefings/googles-gemini-user-numbers-revealed-court?rc=sy0ihq
Subscriptions pay to win: https://www.forbes.com/sites/paulmonckton/2025/04/23/google-leak-reveals-new-gemini-ai-subscription-levels/
GPU Trade-offs: https://x.com/sama/status/1915098951067554030
RL Scale-up Amodei: https://www.darioamodei.com/post/on-deepseek-and-export-controls
Log-linear Returns: https://x.com/bobmcgrewai/status/1895228291981943265
2030 Scaling: https://epoch.ai/blog/can-ai-scaling-continue-through-2030
Model Size: https://x.com/slow_developer/status/1874554473256997201
Adam on AGI: https://x.com/TheRealAdamG/status/1913998366632968381
Papers on Patreon: https://arxiv.org/pdf/2502.01839
https://arxiv.org/pdf/2504.13837
Chollet Quote: https://x.com/fchollet/status/1912934762580447447
OpenSim: https://opensim.stanford.edu/

Non-hype Newsletter: https://signaltonoise.beehiiv.com/

15 m

o3 and o4-mini - they’re great, but easy to over-hype

Apr 16 2025

Critical analysis of the two most powerful new models behind ChatGPT, o3 and o4-mini. Not just the system cards, benchmarks, and my own tests, but some you may not have seen before. Yes, they can whip up amazing front-end in a few seconds, but you always have to ask what is in their data. Either way, they prove the gains from RL are just beginning…

https://weave-docs.wandb.ai/?utm_source=sponsorship&utm_medium=simple_bench&utm_campaign=ai_explained

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - o3 and o4-mini

https://simple-bench.com/

Plus, Teams and Pro, plus token count: https://x.com/btibor91/status/1912568994512662679

System Card: https://openai.com/index/o3-o4-mini-system-card/

Release Notes: https://openai.com/index/introducing-o3-and-o4-mini/

https://deepmind.google/technologies/gemini/pro/

https://x.com/DeryaTR_/status/1912558350794961168

https://x.com/polynoamial/status/1912564068168450396

API Pricing: https://openai.com/api/pricing/

https://aider.chat/docs/leaderboards/

Non-hype Newsletter: https://signaltonoise.beehiiv.com/

14 m

‘Speaking Dolphin’ to AI Data Dominance, 4.1 + Kling 2: 7 Developments Critically Analysed

Apr 16 2025

This pod won’t just be about the release of GPT 4.1 in the last 48 hours, o3 build-up, Kling 2.0, a sneak peek at the next OpenAI model, or even the new Dolphin language tool. It will be about 7 such stories that contextualise where we are in AI and what is happening.
https://www.emergentmind.com/

Chapters:
00:00 - Introduction
00:30 - Kling 2.0
01:35 - GPT 4.1
05:25 - o3 Build-up
07:37 - ‘Product Company’
09:31 - Safe Superintelligence
10:54 - DolphinGemma
13:16 - Data Dominance?

Kling 2.0: https://app.klingai.com/global/release-notes

Dolphin Gemma: https://blog.google/technology/ai/dolphingemma/?s=09

https://openai.com/index/gpt-4-1/

OpenAI o3 Build-up The Information: https://www.theinformation.com/articles/openais-latest-breakthrough-ai-comes-new-ideas?rc=sy0ihq

Physical reasoning: https://x.com/a_karvonen/status/1911839968990814503

Fiction Live.bench: https://x.com/ficlive/status/1911853409847906626

Altman Ted: https://www.youtube.com/watch?v=5MWT_doo68k

https://simple-bench.com/try-yourself

https://aider.chat/docs/leaderboards/

4.5: https://www.youtube.com/watch?v=6nJZopACRuQ

Geospatial reasoning: https://research.google/blog/geospatial-reasoning-unlocking-insights-with-generative-ai-and-multiple-foundation-models/

Pioneers: https://x.com/OpenAIDevs/status/1910017976256119151
Evals: https://www.youtube.com/watch?v=scsW6_2SPC4
Anthropic Updates: https://www.bloomberg.com/news/articles/2025-04-15/anthropic-is-readying-a-voice-assistant-feature-to-rival-openai?srnd=phx-ai
https://x.com/sethsaler/status/1912188383457059301

https://techcrunch.com/2025/04/12/openai-co-founder-ilya-sutskevers-safe-superintelligence-reportedly-valued-at-32b/
https://ai.meta.com/blog/llama-4-multimodal-intelligence/
https://deepmind.google/technologies/gemini/pro/
https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/
https://blog.google/products/google-cloud/ironwood-tpu-age-of-inference/
OpenAI Documentary: https://www.patreon.com/posts/one-machine-to-121940490

20 m

AI CEO: ‘Stock Crash Could Stop AI Progress’, Llama 4 Anti-climax +‘Superintelligence in 2027’...

Apr 7 2025

The latest on Llama 4, and whether it signals a slowdown in AI, or solid progress. Plus, a deep dive on that viral prediction of superintelligence by 2027, and Amodei’s cautionary words on what could stop AI progress in its tracks. o3 news, and more, as well.

Weights & Biases: https://weave-docs.wandb.ai/?utm_source=sponsorship&utm_medium=simple_bench&utm_campaign=ai_explained

DeepSeek Doc: https://www.patreon.com/posts/openai-is-not-r1-125869969

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
00:47 - Stock Crash
02:28 - Llama 4
10:55 - o3 News
11:59 - OpenAI non-profit?
13:13 - AI 2027

Llama 4 Release: https://ai.meta.com/blog/llama-4-multimodal-intelligence/

Dario Amodei Comments: https://www.youtube.com/watch?v=esCSpbDPJik

Knowledge Cut-off: https://www.llama.com/docs/model-cards-and-prompt-formats/llama4_omni/

Aider Polyglot: https://aider.chat/docs/leaderboards/

Gemini 1.5: https://arxiv.org/pdf/2403.05530

Fiction-LiveBench: https://fiction.live/stories/Fiction-liveBench-Mar-25-2025/oQdzQvKHw8JyXbN87

OpenAI Valuation: https://www.nytimes.com/2025/03/31/technology/openai-valuation-300-billion.html?login=smartlock&auth=login-smartlock

OpenAI Cybersecurity: https://www.bloomberg.com/news/articles/2024-01-16/openai-working-with-us-military-on-cybersecurity-tools-for-veterans

Deep research System Card: https://cdn.openai.com/deep-research-system-card.pdf

https://openai.com/index/paperbench/

AI 2027: https://ai-2027.com/

METR Paper: https://arxiv.org/pdf/2503.14499

OpenAI non-profit: https://openai.com/index/nonprofit-commission-guidance/

NYT Piece: https://www.nytimes.com/2025/04/03/technology/ai-futures-project-ai-2027.html?unlocked_article_code=1.804._yKi.QhwOp15Q3tcU&smid=url-share&s=09

Kokotajlo predictions 2021: https://www.lesswrong.com/posts/6Xgy6CAf2jqHhynHL/what-2026-looks-like

https://simple-bench.com/

Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Podcast: https://aiexplainedopodcast.buzzsprout.com/

24 m

Gemini 2.5 Pro - It’s a Smart Chatbot … (New Simple High Score)

Mar 28 2025

Gemini gets a new record on Simple Bench, and several other benchmarks. I’ll go deep to explore its nuances, including how it deceptively reverse engineers answers, does better on certain coding benchmarks than others, may have a universal ‘conceptual language’ …

https://weave-docs.wandb.ai/?utm_source=sponsorship&utm_medium=simple_bench&utm_campaign=ai_explained

… and more. Plus practical tips, a note on security and Kling vs Veo 2 guest appearance.

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
00:36 - Fiction Bench
02:41 - Practicality - YouTube urls + Security - cut-off date
03:42 - Coding
06:22 - WeirdML Bench
07:01 - Simple Bench Record High
11:23 - Reverse Engineering!
13:22 - Anthropic Paper
17:49 - 3 Caveats

Gemini 2.5 Updated: https://deepmind.google/technologies/gemini/

Fiction Live Bench: https://fiction.live/stories/Fiction-liveBench-Feb-19-2025/oQdzQvKHw8JyXbN87

https://simple-bench.com/

WeirdML: https://htihle.github.io/weirdml.html
https://x.com/htihle/status/1905014058228625542

Anthropic Thoughts: https://www.anthropic.com/research/tracing-thoughts-language-model
https://transformer-circuits.pub/2025/attribution-graphs/biology.html#dives-cot

https://aistudio.google.com/prompts/new_chat

Search Study: https://www.cjr.org/tow_center/we-compared-eight-ai-search-engines-theyre-all-bad-at-citing-news.php

Live bench: https://livebench.ai/#/
Paper: https://arxiv.org/pdf/2406.19314

LiveCode Bench: https://livecodebench.github.io/

SWE-Verified: https://arxiv.org/pdf/2310.06770

Non-hype Newsletter: https://signaltonoise.beehiiv.com/

21 m

Did AI Just Get Commoditized? Gemini 2.5, New DeepSeek V3, & Microsoft vs OpenAI

Mar 25 2025

Gemini 2.5 is out, on the same day as the new DeepSeek V3 (which should power DeepSeek R2). Do both models prove AI is being commoditized? Let’s find out, on this blockbuster day of AI releases. Plus exclusives from the Information, Simple indications, Vista Bench, LM Arena and more…

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
01:15 - Gemini 2.5 Benchmarks
05:46 - Long Context, Simple indication
07:08 - New DeepSeek V3-0324
09:11 - Microsoft MAI
11:48 - 90% of code but new Claude jobs

‘World’s most powerful model’: https://x.com/OfficialLoganK/status/1904580368432586975

Gemini 2.5 Release Notes: https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/#gemini-2-5-thinking

‘Commoditized’: https://the-decoder.com/microsoft-ceo-satya-nadella-says-ai-models-are-getting-commoditized/

Microsoft Information report: https://www.theinformation.com/articles/microsofts-ai-guru-wants-independence-from-openai-thats-easier-said-than-done?rc=sy0ihq

LMarena: https://x.com/lmarena_ai/status/1904581128746656099/photo/1

Free for now: https://x.com/btibor91/status/1904578053537476628

Vista Bench: https://scale.com/leaderboard/visual_language_understanding

DeepSeek V3: https://huggingface.co/deepseek-ai/DeepSeek-V3-0324

Claude Plays Pokemon: https://www.twitch.tv/claudeplayspokemon
Amodei: 100% Coding: https://www.youtube.com/watch?v=esCSpbDPJik&t=3017s

Anthropic Jobs: https://job-boards.greenhouse.io/anthropic/jobs/4020717008

Microsoft Money from Onslaught: https://www.972mag.com/microsoft-azure-openai-israeli-army-cloud/

https://simple-bench.com/

Release Date Comments: https://x.com/zacharynado/status/1904647277861318979

Non-hype Newsletter: https://signaltonoise.beehiiv.com/

14 m

Manus AI - The Calm Before the Hypestorm … (vs Deep Research + Grok 3)

Mar 13 2025

Is Manus AI the memecoin of the AI world, or legit? I’ll compare it to OpenAI’s Deep Research, Operator, Grok 3 DeepSearch and more to find out. I’ll also let you in on some of the secrets of what makes a good hype campaign, the estimated costs of Manus AI, and where it is strong. Other news (yes, Gemini image editing and research hacking, I mean you) will have to wait for a few more hours, as millions enquire about Manus AI.

https://app.grayswan.ai/arena

AI Insiders ($9!): https://www.patreon.com/AIExplained
Patreon Vid: https://www.patreon.com/posts/4-ai-trends-in-123857767

Chapters:
00:00 - Introduction
00:46 - Hype Campaign
02:40 - Single, Public Benchmark
03:12 - What is Manus AI?
04:22 - Test 1
05:12 - Cost and Rate Limits
06:15 - Test 2 vs Deep Research + Grok 3 DeepSearch
08:24 - Test 3 (not AGI)
11:10 - 4 Trends in AI in 2025
11:37 - Hype Works

Manus AI: https://manus.im/app

Xiao Hong Interview: https://www.chinatalk.media/p/manus-chinas-latest-ai-sensation

Gaia Benchmark: https://openreview.net/pdf?id=fibxvahvs3
MIT Report: https://www.technologyreview.com/2025/03/11/1113133/manus-ai-review/

Information Report: https://www.theinformation.com/articles/anthropics-claude-drives-strong-revenue-growth-while-powering-manus-sensation?rc=sy0ihq

Hype Examples: https://x.com/Saboo_Shubham_/status/1898425707401031940
https://x.com/EHuanglu/status/1899110687902978373
https://x.com/AJs_AI/status/1898756132384178291

Mistakes: https://x.com/TheXeophon/status/1898737178273829220

Tools and Code: https://x.com/peakji/status/1898994802194346408

https://operator.chatgpt.com/

Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Podcast: https://aiexplainedopodcast.buzzsprout.com/

13 m

GPT 4.5 - not so much wow

Feb 28 2025

GPT 4.5 is here, and do you remember when AI lab CEOs like Sam Altman and Dario Amodei were betting everything on scaling up base models like this one? Well, let’s find out what would have happened if the future of AI rested on models like GPT 4.5. You’ll see all the benchmarks, highlights of the paper, emotional intelligence and humor tests, Simple Bench results (Reddit was an unreliable source), and why it’s not all bad news for OpenAI.

https://www.emergentmind.com/

AI Insiders (now $9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
01:04 - Details and Benchmarks
03:04 - Emotional intelligence?
08:37 - Creative writing?
11:40 - Visual reasoning and Pricing
12:41 - Simple Performance
16:01 - End of Pretraining Scaling?
17:03 - CEO Hype
18:11 - System Card Highlights
23:32 - Karpathy Reaction

GPT 4.5 System card: https://cdn.openai.com/gpt-4-5-system-card-2272025.pdf
Release Notes: https://openai.com/index/gpt-4-5-system-card/
Altman Hype: https://x.com/sama/status/1891533802779910471
Details: https://openai.com/index/introducing-gpt-4-5/ https://x.com/OpenAI/status/1895219596317335792
End of an Era: https://x.com/wgussml/status/1895187231666774377
Anthropic Original Claim: https://techcrunch.com/2023/04/06/anthropics-5b-4-year-plan-to-take-on-openai/
Smell: https://x.com/rapha_gl/status/1895213014699385082
Bob McGrew: https://x.com/bobmcgrewai/status/1895228291981943265
Deep Research System Card: https://cdn.openai.com/deep-research-system-card.pdf
Reddit: https://www.reddit.com/r/singularity/comments/1izu1t7/gpt45_crushes_simple_bench/
API Pricing: https://openai.com/api/pricing/
LiveStream: https://www.youtube.com/watch?v=cfRYp0nItZ8&t=1s
https://simple-bench.com/

Karpathy Comparison: https://x.com/karpathy/status/1895213020982472863
https://x.com/karpathy/status/1895337579589079434

Non-hype Newsletter: https://signaltonoise.beehiiv.com/

25 m
