• “Catastrophic sabotage as a major threat model for human-level AI systems” by evhub

  • Nov 15 2024
  • Length: 27 mins
  • Podcast

  • Summary

  • Thanks to Holden Karnofsky, David Duvenaud, and Kate Woolverton for useful discussions and feedback.

    Following up on our recent “Sabotage Evaluations for Frontier Models” paper, I wanted to share more of my personal thoughts on why I think catastrophic sabotage is important and why I care about it as a threat model. Note that this isn’t in any way intended to be a reflection of Anthropic’s views or, for that matter, anyone’s views but my own; it’s just a collection of some of my personal thoughts.

    First, some high-level thoughts on what I want to talk about here:

    • I want to focus on a level of future capabilities substantially beyond current models, but below superintelligence: specifically something approximately human-level and substantially transformative, but not yet superintelligent.
      • While I don’t think that most of the proximate cause of AI existential risk comes from such models—I think most of the direct takeover [...]
    ---

    Outline:

    (02:31) Why is catastrophic sabotage a big deal?

    (02:45) Scenario 1: Sabotage alignment research

    (05:01) Necessary capabilities

    (06:37) Scenario 2: Sabotage a critical actor

    (09:12) Necessary capabilities

    (10:51) How do you evaluate a model's capability to do catastrophic sabotage?

    (21:46) What can you do to mitigate the risk of catastrophic sabotage?

    (23:12) Internal usage restrictions

    (25:33) Affirmative safety cases

    ---

    First published:
    October 22nd, 2024

    Source:
    https://www.lesswrong.com/posts/Loxiuqdj6u8muCe54/catastrophic-sabotage-as-a-major-threat-model-for-human

    ---

    Narrated by TYPE III AUDIO.

