• Gemini 2.5 Pro - It’s a Smart Chatbot … (New Simple High Score)

  • Mar 28 2025
  • Duration: 21 min
  • Podcast

  • Summary

  • Gemini sets a new record on Simple Bench and on several other benchmarks. I’ll go deep to explore its nuances, including how it deceptively reverse-engineers answers, does better on certain coding benchmarks than others, may have a universal ‘conceptual language’ …

    https://weave-docs.wandb.ai/?utm_source=sponsorship&utm_medium=simple_bench&utm_campaign=ai_explained

    … and more. Plus practical tips, a note on security, and a Kling vs Veo 2 guest appearance.


    AI Insiders ($9!): https://www.patreon.com/AIExplained

    Chapters:
    00:00 - Introduction
    00:36 - Fiction Bench
    02:41 - Practicality - YouTube urls + Security - cut-off date
    03:42 - Coding
    06:22 - WeirdML Bench
    07:01 - Simple Bench Record High
    11:23 - Reverse Engineering!
    13:22 - Anthropic Paper
    17:49 - 3 Caveats

    Gemini 2.5 Updated: https://deepmind.google/technologies/gemini/

    Fiction Live Bench: https://fiction.live/stories/Fiction-liveBench-Feb-19-2025/oQdzQvKHw8JyXbN87

    https://simple-bench.com/

    WeirdML: https://htihle.github.io/weirdml.html
    https://x.com/htihle/status/1905014058228625542

    Anthropic Thoughts: https://www.anthropic.com/research/tracing-thoughts-language-model
    https://transformer-circuits.pub/2025/attribution-graphs/biology.html#dives-cot

    https://aistudio.google.com/prompts/new_chat

    Search Study: https://www.cjr.org/tow_center/we-compared-eight-ai-search-engines-theyre-all-bad-at-citing-news.php

    Live bench: https://livebench.ai/#/
    Paper: https://arxiv.org/pdf/2406.19314

    LiveCode Bench: https://livecodebench.github.io/

    SWE-Verified: https://arxiv.org/pdf/2310.06770


    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

