• 269: Crowdstrike: Does Anyone Know the Graviton of this Situation?

  • Jul 30 2024
  • Length: 1 hr and 13 mins
  • Podcast

  • Summary

  • Welcome to episode 269 of the Cloud Pod Podcast – where the forecast is always cloudy! Justin, Matthew and Ryan are your hosts this week as we talk about – you guessed it – the Crowdstrike update that broke, well, everything! We’re also looking at Databricks, Google potentially buying Wiz, NY Summit news, and more!

    Titles we almost went with this week:
    • You can’t take Justin down; but a 23-hour flight to India (or Crowdstrike updates) can
    • Google wants Wiz, and Crowdstrike Strikes all
    • Crowdstrike, does anyone know the Graviton of this situation?
    • We are called to this summit to talk AWS AI Supremacy
    • Crowdstrike, Wiz and Chat GPT 4o Mini… oh my
    • An Impatient Wiz builds his own data centers not impacted by Crowdstrike
    A big thanks to this week’s sponsor: We’re sponsorless! Want to reach a dedicated audience of cloud engineers? Send us an email or hit us up on our Slack Channel and let’s chat!

    General News

    00:58 You Guessed It – Crowdstrike

    Microsoft, CrowdStrike outage disrupts travel and business worldwide

    Our Statement on Today’s Outage (listener note: paywall article)

    • It’s not every day you get to experience one of the largest IT Outages in history, and it even impacted our recording of the show last week.
    • CrowdStrike, a popular EDR solution, caused major disruption to the world’s IT systems with an errant update to their software that caused servers to BSOD, disrupting travel (airplanes, trains, etc.), governments, news organizations, and more.
    • CrowdStrike removed the errant file quickly, but the damage was done, with tons of systems requiring manual intervention to recover.
      • The fix required booting into Safe Mode and removing a file from the CrowdStrike directory (see the sketch after this list).
        • This was all complicated by BitLocker and a lack of local admin rights on many end-user devices.
      • In some cases, up to 15 reboots would bring a server back to life.
      • Another workaround: pull the hard drive from a broken server, attach it to a working one, manually remove the file, and swap the drive back.
    • The issue also caused a large-scale outage in Azure’s Central US region.
      • Windows-based services running on AWS were impacted as well (Amazon is a well-known large CrowdStrike customer).
    • CrowdStrike CEO George Kurtz (who happened to be the CTO at McAfee during the 2010 update fiasco that impacted McAfee clients globally) stated that he was deeply sorry and vowed to make sure every customer is fully recovered.
    • By the time of this recording, most clients should be largely recovered, and we are all anxiously waiting to hear how this could have happened.
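
    For anyone curious what the manual fix actually looked like, here is a minimal Python sketch of the publicly reported remediation step. The directory path and the C-00000291*.sys channel-file pattern come from public advisories rather than from the show, so treat them as assumptions, and only run something like this from Safe Mode with local admin rights (after getting through BitLocker recovery where needed).

# Hedged sketch of the widely reported CrowdStrike remediation: from Safe Mode,
# delete the bad channel file(s) from the CrowdStrike drivers directory.
# The path and glob pattern are taken from public reporting, not an official script.
from pathlib import Path

CROWDSTRIKE_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")  # assumed default install path
PATTERN = "C-00000291*.sys"  # channel-file pattern cited in public advisories

def remove_bad_channel_files(directory: Path = CROWDSTRIKE_DIR,
                             pattern: str = PATTERN) -> list[Path]:
    """Delete channel files matching the pattern and return the paths that were removed."""
    removed = []
    for f in directory.glob(pattern):
        f.unlink()  # requires local admin rights
        removed.append(f)
    return removed

if __name__ == "__main__":
    for path in remove_bad_channel_files():
        print(f"Removed {path}")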

    04:50 Justin – “It’s really an Achilles heel of the cloud. I mean, to fix this, you need to be able to boot a server into safe mode or into recovery mode and then remove this file manually, which requires that you have console access, which, you know, Amazon just added a couple of years ago.”
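
    For context on the console access Justin mentions, here is a rough boto3 sketch of using the EC2 serial console: you push a short-lived SSH public key to the stuck instance and then attach over the serial endpoint, no working network stack required. The instance ID, region, and key path are placeholders, and the ssh endpoint in the comment follows AWS documentation rather than anything stated in the episode.

# Hedged sketch: granting yourself EC2 serial console access with boto3 so you can
# reach an instance that is boot-looping or stuck on a BSOD.
from pathlib import Path

import boto3

def grant_serial_console_access(instance_id: str, public_key_path: str,
                                region: str = "us-east-1") -> None:
    """Push a temporary SSH public key (valid for about 60 seconds) for serial-console login."""
    client = boto3.client("ec2-instance-connect", region_name=region)
    public_key = Path(public_key_path).expanduser().read_text()
    client.send_serial_console_ssh_public_key(
        InstanceId=instance_id,
        SerialPort=0,               # only port 0 is supported
        SSHPublicKey=public_key,
    )
    # Then attach with the matching private key, e.g.:
    #   ssh <instance-id>.port0@serial-console.ec2-instance-connect.<region>.aws

if __name__ == "__main__":
    # Placeholder instance ID and key path; substitute your own.
    grant_serial_console_access("i-0123456789abcdef0", "~/.ssh/id_ed25519.pub")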

    07:45 Matthew – “It’s always fun when you’re like, okay, everyone sit down, no stupid ideas. Like these crazy ideas that you have, like end up being the ones that work, but you would never realistically have the opportunity to try them because you know, one, how often and God, I hope in your day job, you’re not actively logging into the serial port for fun or how to automate your d...

