Episodes

  • DeepSeek's Cost-Efficient Model Training ($5M vs hundreds of millions for competitors)
    Feb 22 2025

    The episode features hosts Chris Detzel and Michael Burke discussing DeepSeek, a Chinese AI company making waves in the large language model (LLM) space. Here are the key discussion points:

    Major Breakthrough in Cost Efficiency:
    - DeepSeek claimed they trained their latest model for only $5 million, compared to hundreds of millions or billions spent by competitors like OpenAI
    - This cost efficiency created market disruption, particularly affecting NVIDIA's stock as it challenged assumptions about necessary GPU resources

    Mixture of Experts (MoE) Innovation:
    - Instead of using one large model, DeepSeek uses multiple specialized "expert" models
    - Each expert model focuses on specific areas/topics
    - Uses reinforcement learning to route queries to the appropriate expert model
    - This approach reduces both training and inference costs
    - DeepSeek notably open-sourced their MoE architecture, unlike other major companies
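
    To make the routing idea concrete, here is a minimal sketch of top-k softmax gating. Many mixture-of-experts layers use this mechanism to send each input to only a few experts. It is an illustration under stated assumptions and is not DeepSeek's implementation. The toy experts, the hard-coded gate scores, and the names `route_top_k` and `moe_forward` are all hypothetical. In a real model, the scores would come from a learned router applied to each token's hidden state.

```python
import math
from typing import Callable, List, Tuple

def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_logits: List[float], k: int = 2) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their weights to sum to 1."""
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x: float, experts: List[Callable[[float], float]],
                gate_logits: List[float], k: int = 2) -> float:
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    return sum(w * experts[i](x) for i, w in route_top_k(gate_logits, k))

# Hypothetical toy experts: real experts are feed-forward sub-networks, not scalar functions.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
# Hypothetical router scores for a single input (a trained router would compute these).
gate_logits = [0.1, 2.0, 1.5, -1.0]

print(route_top_k(gate_logits, k=2))  # experts 1 and 2 are chosen
print(moe_forward(3.0, experts, gate_logits, k=2))
```

    Only the selected experts are evaluated. Per-input compute therefore scales with k rather than with the total number of experts, which is the kind of cost saving the hosts attribute to MoE.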

    Technical Infrastructure:
    - Discussion of how DeepSeek achieved results without access to NVIDIA's latest GPUs
    - Highlighted the dramatic price increase in NVIDIA GPUs (from $3,000 to $30,000-$50,000) due to AI demand
    - Explained how inference costs (serving the model) often exceed training costs

    Chain of Thought Reasoning:
    - DeepSeek open-sourced their chain of thought reasoning system
    - This allows models to break down complex questions into steps before answering
    - Improves accuracy on complicated queries, especially math problems
    - The hosts compared this open-source contribution to the field with Meta's Llama

    Broader Industry Impact:
    - Discussion of how businesses are integrating AI into their products
    - Example of ZoomInfo using AI to aggregate business intelligence and automate sales communications
    - Noted how technical barriers to AI implementation are falling thanks to platforms like Databricks

    The hosts also touched on data privacy concerns regarding Chinese tech companies entering the US market, drawing parallels to TikTok discussions. They concluded by discussing how AI tools are making technical development more accessible to non-experts and mentioned the importance of being aware of how much personal information these models collect about users.

    25 min
  • Clean Data, Business Context, and the Future of Analytics - Featuring Noy Twerski, Sherloq Co-founder & CEO
    Feb 17 2025

    This episode of Data Hurdles features an in-depth conversation with Noy Twerski, CEO and Co-founder of Sherloq, a collaborative SQL repository platform. The discussion, hosted by Chris Detzel and Michael Burke, explores several key themes in data analytics and management.

    Key Topics Covered:

    1. Introduction to Sherloq
    - Sherloq is introduced as a plugin that works inside a range of SQL editors, including those in Databricks and Snowflake as well as JetBrains IDEs
    - The platform serves as a centralized repository for SQL queries, addressing the common problem of scattered SQL code across organizations

    2. Origin Story
    - Twerski shares her background as a product manager who experienced firsthand the challenges of managing SQL queries
    - She founded the company about 2.5 years ago with her co-founder Nadav, whom she knew from their undergraduate computer science studies
    - They identified the problem through extensive user research, finding that 80% of data analysts struggled to locate their tables, fields, and SQL

    3. Business Context and AI Discussion
    - A significant portion of the conversation focuses on the relationship between SQL, business context, and AI
    - The hosts and guest discuss the challenges of automating SQL generation through AI, emphasizing the importance of business context
    - They explore why text-to-SQL solutions are more complex than they appear, particularly in enterprise settings

    4. Future Outlook
    - Discussion of Sherloq's future plans, focusing on deepening their collaborative SQL repository capabilities
    - Exploration of how the platform could serve as infrastructure for future AI capabilities
    - Consideration of data quality as an ongoing challenge in the enterprise data space

    5. Industry Insights
    - The conversation includes broader discussions about data quality, governance, and the evolution of data teams
    - Twerski shares insights about different user personas and how they approach the product differently

    Notable Aspects:
    - The episode offers perspectives on the future of data analytics and AI
    - There's a strong emphasis on practical business applications and real-world challenges
    - The hosts and guest share thoughtful insights about data quality as a persistent challenge in the industry

    The episode provides valuable insights for data professionals, particularly those interested in data management, SQL development, and the evolution of data tools in an AI-driven landscape.

    34 min
  • Top 10 MDM 2025 Platforms - Who's Rising, Who's Falling & Why It Matters
    Dec 1 2024

    The Data Hurdles Impact Index (DHII) provides a comprehensive analysis of the top Master Data Management platforms, evaluating vendors based on multi-domain capabilities, core features, AI enablement, data governance integration, architecture flexibility, total cost of ownership, market reach, and vendor stability. This inaugural DHII analysis covers ten leading MDM platforms that are shaping enterprise data management in 2025.

    The assessment, led by 20-year MDM veteran Rohit Singh Verma, Director of the Data Practice at Nvizion Solutions, examines market leaders and emerging players including Informatica, Stibo Systems, Profisee, Reltio, Ataccama, TIBCO EBX, IBM InfoSphere MDM, SAP MDM, Syndigo, and Viamedici. Each vendor is evaluated through the lens of practical implementation experience, market presence, and technological innovation.

    Key findings point to Informatica's continued dominance with its IDMC cloud offering, although it faces increasing pressure in specific domains from specialists such as Stibo Systems in product data management. The analysis highlights a significant market opportunity in the Middle East, where only a few vendors have established a strong presence. The DHII also identifies critical factors beyond technical capabilities, including the importance of system integrator networks, implementation speed, and regional market penetration.

    The evaluation exposes interesting market dynamics, such as the challenges faced by legacy vendors like IBM and SAP in keeping pace with cloud-native solutions, and the emergence of AI-enabled capabilities as a key differentiator. The analysis also addresses the persistent challenge of high implementation failure rates (estimated at 75%) and how vendors are evolving to address this through improved user interfaces, AI-assisted implementations, and stronger partner ecosystems.

    This groundbreaking DHII assessment serves as an essential guide for organizations navigating the complex MDM vendor landscape, offering insights that go beyond traditional analyst evaluations to provide a practical, implementation-focused perspective on the market's leading solutions.

    1 h 7 min
  • The Future of Data Teams in the AI Era: Insights from Alex Welch, dbt Labs' Head of Data and Analytics
    Nov 1 2024

    In this insightful episode of Data Hurdles, hosts Chris Detzel and Michael Burke sit down with Alex Welch, Head of Data and Analytics at dbt Labs, to explore the transformative impact of AI on data organizations and the future of analytics.

    With over a decade of experience in FinTech and now leading data initiatives at dbt Labs, Alex shares valuable perspectives on:

    • Data Quality & Governance:
    - The critical importance of establishing data quality frameworks
    - How to approach data governance without creating unnecessary friction
    - The balance between control and accessibility in data management

    • AI Implementation & Challenges:
    - Two major hurdles in AI adoption: data/tech debt and the skills/culture gap
    - Practical approaches to introducing AI into existing workflows
    - The importance of starting small rather than trying to "boil the ocean"

    • Future of Data Teams:
    - Emerging roles like prompt engineering specialists and AI ethics officers
    - The shift from hierarchical structures to dynamic pod-based teams
    - How human-AI collaboration will reshape organizational structures

    • Skills & Development:
    - Why traditional analytical skills remain crucial in the AI era
    - The importance of maintaining human judgment and expertise
    - How to prepare for an AI-augmented workplace

    The conversation takes an especially interesting turn when discussing practical applications of AI, including Alex's personal example of using AI for meal planning and grocery shopping automation. The hosts and guest also explore thought-provoking perspectives on maintaining human expertise while leveraging AI capabilities, emphasizing the importance of using AI to augment rather than replace human decision-making.

    The episode concludes with valuable insights about preparing organizations for emerging AI trends and the importance of considering security implications in an AI-enabled future.

    This episode is particularly relevant for:
    - Data leaders planning AI initiatives
    - Organizations navigating data quality challenges
    - Professionals interested in the future of data careers
    - Anyone looking to understand the practical implications of AI in business

    51 min
  • Data Mesh in Action: Challenges, Opportunities, and Real-World Examples with Willem Koenders
    Sep 29 2024

    In this comprehensive episode of Data Hurdles, hosts Chris Detzel and Michael Burke engage in a deep and insightful conversation with Willem Koenders, a global data strategy leader at ZS Associates, about the increasingly popular concept of data mesh.

    The episode begins with Willem providing his background and expertise in the data field, setting the stage for a rich discussion. He explains the core concept of data mesh, describing it as a domain-driven approach to data architecture that emphasizes decentralized ownership and governance of data across an organization.

    Throughout the conversation, Willem uses various analogies to make the concept more accessible, likening data mesh to a net with strategic data nodes, and comparing data assets to real estate properties that need proper management and care. These analogies help illustrate the shift from centralized data warehouses or lakes to a more distributed, domain-oriented approach.

    The hosts and guest delve into the challenges of implementing data mesh, including cultural shifts required within organizations. Willem emphasizes the importance of clear ownership, quality control, and the need for a product-oriented mindset when it comes to data assets. He discusses how data mesh can help solve long-standing issues of data quality and accessibility that many organizations face.

    Real-world examples and case studies are shared, providing listeners with practical insights into how data mesh principles are being applied across various industries. Willem talks about the financial sector's early adoption of similar concepts and how medical technology companies are now embracing data mesh to deal with evolving market demands and data-generating products.

    The conversation also covers the critical aspect of data governance in a mesh environment. Willem explains how governance needs to be balanced between centralized standards (especially for security) and domain-specific controls. He stresses the importance of enablement and providing the right tools for domain teams to manage their data effectively.

    Chris and Michael bring up the challenges of cross-functional collaboration and the often siloed nature of data work in organizations. Willem acknowledges these difficulties and discusses strategies for improving communication and alignment between different teams and roles.

    The episode explores how to measure the business impact of data mesh implementations. Willem advocates for a portfolio approach, where organizations track the value generated by specific data assets and their associated use cases, rather than focusing solely on technology investments.

    Looking to the future, the discussion touches on the potential for data mesh to become a dominant data architecture approach, especially for larger and more complex organizations. Willem expresses hope that evolving tools and technologies, including AI, will make data mesh implementation more accessible to a broader range of companies.

    Throughout the episode, the hosts and guest maintain a balanced view, acknowledging both the potential benefits and the significant challenges of adopting a data mesh approach. They emphasize that success depends not just on technology, but on organizational culture, trust, and effective communication.

    The conversation concludes with reflections on the importance of building trust between different parts of an organization and how frameworks like data mesh can facilitate better collaboration and data utilization when implemented thoughtfully.

    This episode provides listeners with a comprehensive overview of data mesh, blending theoretical concepts with practical insights and real-world examples. It offers valuable perspectives for data professionals, business leaders, and anyone interested in modern data architecture and management strategies.

    42 min
  • Revolutionizing Healthcare Data Sharing: Shubh Sinha, Integral's CEO, on Data Hurdles
    Aug 10 2024

    In this enlightening episode of "Data Hurdles," hosts Chris Detzel and Michael Burke engage in a deep conversation with Shubh Sinha, CEO and co-founder of Integral, about revolutionizing healthcare data sharing. Sinha, leveraging his experience at LiveRamp and his current leadership role at Integral, offers valuable insights into the intricate world of regulated data in healthcare. He elucidates how data fragmentation across various healthcare touchpoints creates significant challenges in comprehending a patient's complete journey. Sinha emphasizes the crucial balance between utilizing comprehensive patient data—encompassing both medical and non-medical information—and adhering strictly to evolving privacy regulations such as HIPAA, CCPA, and GDPR.

    The discussion explores Integral's innovative approach to these challenges, showcasing how their technology automates risk assessment and compliance checks for data sets, facilitating faster and more secure data sharing between healthcare entities. Sinha underscores the importance of proactive compliance in an increasingly regulated data landscape and how Integral's solutions are designed to swiftly adapt to new regulations. The conversation also addresses the impact of AI and large language models in the healthcare data space, highlighting new considerations such as bias in training data and the necessity for explainable AI in medical decision-making.

    As co-founder, Sinha provides a forward-looking perspective on the future of healthcare data, predicting a trend towards more regulated data across industries and positioning Integral as a vital link between compliance and data stacks. He envisions a future where data utility and privacy coexist harmoniously, fostering trust between healthcare providers and patients. The episode concludes with reflections on the growing importance of auditability and explainability in data-driven decisions, underscoring Integral's role in shaping a more transparent and efficient healthcare data ecosystem. This insightful discussion offers listeners a comprehensive understanding of the current challenges and innovative solutions in healthcare data sharing, highlighting how companies like Integral, under Sinha's co-leadership, are paving the way for more effective, compliant, and patient-centric healthcare data utilization.

    27 min
  • Challenging Data Management Norms: A Conversation with Malcolm Hawker, Chief Data Officer at Profisee
    Jul 27 2024

    In this insightful episode of Data Hurdles, hosts Chris Detzel and Michael Burke welcome Malcolm Hawker, Chief Data Officer at Profisee, for an in-depth discussion on the evolving landscape of data management and the role of Chief Data Officers (CDOs) in today's organizations.

    The conversation kicks off with Malcolm sharing his journey from product management to becoming a prominent figure in the data management space. He provides valuable insights into his experiences at Dun & Bradstreet and as a Gartner analyst, which have shaped his perspectives on data governance and strategy.

    A significant portion of the episode is dedicated to Malcolm's contrarian view on the data mesh architecture. He articulates why he favors the data fabric approach, challenging the underlying assumptions of data mesh and discussing the practical limitations of fully decentralized data management. This leads to a broader discussion on the importance of balancing domain autonomy with cross-functional data needs in organizations.

    The conversation then shifts to the impact of AI and machine learning on data governance. Malcolm shares his optimistic view that AI could solve complex data management challenges, particularly by automating governance processes and bridging the gap between structured and unstructured data.

    Throughout the episode, Malcolm emphasizes the need for CDOs to focus on delivering tangible value to their organizations. He criticizes the overreliance on data maturity assessments and lengthy frameworks, instead advocating for a more practical, customer-centric approach to data management. The discussion touches on the importance of quantifying the value of data initiatives and improving communication with business stakeholders.

    The hosts and Malcolm also explore emerging trends that CDOs should be aware of, including the integration of product management principles into data leadership roles, the growing importance of sustainability in data management, and the need to change the narrative around data quality from a burden to an opportunity.

    Towards the end, the conversation turns to the future of the CDO role. Malcolm expresses optimism about the long-term prospects for data leadership, while acknowledging short-term challenges. He highlights the emergence of a new generation of CDOs who are willing to question the status quo and take innovative approaches to data management.

    Throughout the episode, Malcolm's passion for data management and his commitment to driving change in the industry shine through. His candid insights and provocative ideas make for a compelling and thought-provoking discussion that challenges listeners to rethink traditional approaches to data leadership and governance.

    This Data Hurdles episode offers valuable insights for current and aspiring CDOs, data professionals, and business leaders interested in leveraging data as a strategic asset in their organizations.

    46 min
  • Stirring the Data Pot: DataKitchen's CEO, Founder, Head Chef, Christopher Bergh on Cooking Up Success
    Jun 30 2024

    This episode of Data Hurdles features an in-depth interview with Christopher Bergh, CEO and Head Chef of DataKitchen. Hosts Chris Detzel and Michael Burke engage in a wide-ranging discussion about the challenges and opportunities in data analytics and engineering.

    Key Topics Covered:

    1. Introduction and Background
      • Chris Bergh introduces DataKitchen and explains the origin and significance of the company name.
      • He shares his background in software development and transition to data analytics.
    2. Core Challenges in Data Analytics
      • Bergh emphasizes that 70-80% of a data team's work is waste.
      • He stresses the importance of focusing on eliminating waste rather than optimizing the productive 20-30%.
    3. Data Kitchen's Approach
      • The company aims to bring ideas from agile, DevOps, and lean manufacturing to data and analytics teams.
      • They focus on helping teams deliver insights to demanding customers consistently and innovatively.
    4. Key Problems in Data Teams
      • Difficulty in making quick changes and assessing their impact
      • Challenges in measuring team productivity and customer satisfaction
      • The need for better error detection and resolution in production
    5. Data Team Productivity and Happiness
      • Discussion of the high frustration levels among data professionals
      • The importance of connecting data teams with end customers for better feedback and satisfaction
    6. Data Quality and Testing
      • Bergh introduces DataKitchen's approach to automatically generating data quality validation tests
      • The importance of business context in creating effective tests
    7. Data Journey Concept
      • Bergh explains the "data journey" as a fire alarm control panel for data processes
      • The importance of having a live, actionable view of the entire data production process
    8. Observability in Data Systems
      • Discussion of the future of observability in increasingly complex data systems
      • The need for cross-tool and deep-dive monitoring capabilities
    9. Impact of AI and LLMs
      • Bergh's perspective on the role of AI and Large Language Models in data work
      • Emphasis that while AI can improve efficiency, it doesn't solve the fundamental waste problem
    10. Open Source and Community
      • DataKitchen's decision to open-source its software
      • The importance of spreading ideas and fostering community in the data space
    11. Certification and Education
      • DataKitchen's certification program and its popularity among data professionals

    Key Takeaways:

    • The most significant challenge in data analytics is addressing the 70-80% of work that is waste.
    • Connecting data teams directly with customers can significantly improve outcomes and job satisfaction.
    • Automatically generated data quality tests and visualizing the entire data production process are crucial innovations.
    • While AI and new tools can improve efficiency, they don't address the core issues of waste and system-level problems in data work.
    • Open-sourcing and community building are essential for advancing the field of data analytics and engineering.
    42 min