Bot Nirvana | AI & Automation Podcast

By: Nandan Mullakara
  • Summary

  • Bot Nirvana is a podcast on all things Intelligent Automation. We cover RPA, AI, Process Intelligence, Process Mining, and a host of other tools and techniques for intelligent automation.
    © 2023 Bot Nirvana
Episodes
  • Manish Ballal
    Feb 19 2025

    Manish Ballal is a GTM and Sales leader with over a decade of experience in the automation space. He is currently leading Generative AI initiatives at Amazon Web Services (AWS).

    He brings a wealth of experience from both large global technology companies and startups. Previously, he held leadership roles at major GSIs and had a significant tenure at Automation Anywhere.

    In this episode, we discuss:
    - Automation evolution
    - Enterprise deployments
    - Specific use cases
    - Challenges with security, AI agents
    - Process-first approach
    - Vertical Agents

    More information and Links:

    Connect with Manish: Linkedin.com/in/manishballal/
    Visit Nandan on the web at nandan.info

    26 m
  • Agentic Process Automation (APA)
    Sep 18 2024
    In this episode, we explore Agentic Process Automation (APA), a paradigm in which AI-driven agents analyze, decide, and execute complex tasks with minimal human intervention, and which could revolutionize digital automation. The discussion focuses on the ProAgent system as an example of APA. We'll unpack how ProAgent showcases the potential of AI agents through its approach to workflow construction and execution.

    Key Topics Covered
    - Introduction to Agentic Process Automation (APA)
    - Comparison between traditional Robotic Process Automation (RPA) and APA
    - ProAgent: A prime example of APA implementation
    - Key innovations of ProAgent:
      - Agentic workflow construction
      - Agentic workflow execution
    - Types of agents in ProAgent:
      - Data agents
      - Control agents
    - Case study: Using ProAgent with Google Sheets for business line management
    - Potential impacts and implications of APA on work and decision-making
    - Future developments and considerations for APA technology

    This episode was generated using Google NotebookLM, drawing insights from the paper "ProAgent: From Robotic Process Automation to Agentic Process Automation".

    Stay ahead in your AI journey with Bot Nirvana AI Mastermind.

    Podcast Transcript

    All right, everyone. Buckle up, because today's deep dive is going to be a wild ride through the future of automation. We're talking way beyond those basic "schedule this" kind of tasks. Yeah, we're diving headfirst into the realm where AI takes the wheel and handles the thinking for us. Oh, yeah, the thinking part. Yeah.

    If you could give your computer a really complex task, something that needs analysis, decision-making, maybe even a dash of creativity, that's what we're talking about. And right now, your typical automation tools, they would hit a wall. Hard. They're great at following those rigid step-by-step instructions. Like robots. Exactly.
    But when it comes to anything that requires actual brain power? Still got to do it ourselves. Well, that's where this research paper we're diving into today comes in. It's all about something called agentic process automation, or APA for short. And let me tell you, this stuff has the potential to completely change the game.

    OK, for those of us who haven't dedicated our lives to the art of automation, give us the lowdown. What is APA, and why is it such a big deal? Think about your current automation workhorse, RPA: robotic process automation. It's like that super reliable assistant who never complains but needs very specific instructions for every single step. Right. Amazing at those repetitive tasks, but needs you to hold their hand through every decision point. Exactly.

    Now, imagine that same assistant, but with a secret weapon, an AI sidekick whispering genius solutions in their ear. OK, now you're talking. That's APA in a nutshell. We're giving RPA a massive intelligence boost. So instead of just blindly following pre-programmed rules, we're talking about automation that can actually think. You got it.

    APA introduces the idea of agents, which are basically AI helpers embedded directly into the workflow. These agents can analyze data, make judgment calls based on that analysis, and even generate things like reports, all without a human meticulously laying out each step. So it's not just about automating tasks anymore. It's about automating the intelligence behind those tasks. You're catching on quickly.

    And this paper focuses on a system called ProAgent as a prime example of APA in action. All right, lay it on us. What is ProAgent? So ProAgent really highlights the potential of APA with two key innovations: agentic workflow construction and agentic workflow execution. OK, so those are some pretty hefty terms. Can you break those down for us? Let's start with how ProAgent constructs workflows. What makes it so revolutionary?
    Well, with your traditional RPA, you're stuck painstakingly designing every single step of the process. It's like writing a super detailed manual for a robot. Right, like you don't want the robot to deviate at all. Exactly. But ProAgent flips the script. Instead of you having to lay out every tiny detail... It can just, like, figure it out. You give it high-level instructions, and the LLM (that's the AI engine) actually builds the workflow for you.

    Wait, so it's like you're telling it what you want to achieve, and it figures out the how. Think of it like having an AI assistant who understands your goals and can translate those goals into a functional workflow. OK, that is seriously cool.

    And then, agentic workflow execution: that's where those agents we talked about come in, right? They're the ones actually doing the heavy lifting. You got it. ProAgent uses two types of agents: data agents and control agents. They work together like specialized teams within your automated workflow. OK, I'm really curious about these specialist teams now. Let's start with the data ...
    11 m
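The split between data agents and control agents mentioned in the APA episode above can be sketched roughly as follows. This is a minimal illustration, not ProAgent's actual code: `fake_llm`, `data_agent`, `control_agent`, and `run_workflow` are hypothetical names, and the model call is replaced by a hard-coded stub so the example runs without any external service. It assumes a data agent handles a free-form data-processing step and a control agent picks the next branch of the workflow.

```python
from typing import Dict, List


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a language-model call (assumption, not a real API)."""
    if "Which branch" in prompt:
        # Toy decision logic standing in for the model's judgment call.
        return "escalate" if "overdue" in prompt else "archive"
    # Toy "summary": echo the last line of the prompt.
    return f"Summary: {prompt.splitlines()[-1]}"


def data_agent(task: str, data: str) -> str:
    """Data agent: hands a free-form data-processing step to the model."""
    return fake_llm(f"{task}\n{data}")


def control_agent(question: str, context: str, branches: List[str]) -> str:
    """Control agent: lets the model choose the next branch instead of a fixed rule."""
    choice = fake_llm(f"Which branch? {question}\n{context}")
    # Fall back to the first branch if the model returns something unexpected.
    return choice if choice in branches else branches[0]


def run_workflow(record: str) -> Dict[str, str]:
    """A two-step workflow: summarize the record, then decide how to route it."""
    summary = data_agent("Summarize this record", record)
    branch = control_agent(
        "How should this record be handled?", record, ["archive", "escalate"]
    )
    return {"summary": summary, "branch": branch}


if __name__ == "__main__":
    print(run_workflow("invoice 17 is overdue"))
```

In a traditional RPA flow, both steps would be fixed rules written by a person; here only the surrounding workflow is fixed, while the summarizing and branching are delegated to the (stubbed) model.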
  • OCR 2.0
    Sep 18 2024
    In this podcast, we dive into the new concept of OCR 2.0, the future of OCR with LLMs. We explore how this new approach addresses the limitations of traditional OCR by introducing a unified, versatile system capable of understanding various visual languages. We discuss the innovative GOT (General OCR Theory) model, which utilizes a smaller, more efficient language model. The podcast highlights GOT's impressive performance across multiple benchmarks, its ability to handle real-world challenges, and its capacity to preserve complex document structures. We also examine the potential implications of OCR 2.0 for future human-computer interactions and visual information processing across diverse fields.

    Key Points
    - Traditional OCR vs. OCR 2.0
      - Current OCR limitations (multi-step process, prone to errors)
      - OCR 2.0: A unified, end-to-end approach
    - Principles of OCR 2.0
      - End-to-end processing
      - Low cost and accessibility
      - Versatility in recognizing various visual languages
    - GOT (General OCR Theory) Model
      - Uses a smaller, more efficient language model (Qwen)
      - Trained on diverse visual languages (text, math formulas, sheet music, etc.)
    - Training Innovations
      - Data engines for different visual languages
      - E.g. LaTeX for mathematical formulas
    - Performance and Capabilities
      - State-of-the-art results on standard OCR benchmarks
      - Outperforms larger models in some tests
      - Handles real-world challenges (blurry images, odd angles, different lighting)
    - Advanced Features
      - Formatted document OCR (preserving structure and layout)
      - Fine-grained OCR (precise text selection)
      - Generalization to untrained languages

    This episode was generated using Google NotebookLM, drawing insights from the paper "General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model".

    Stay ahead in your AI journey with Bot Nirvana AI Mastermind.

    Podcast Transcript

    All right, so we're diving into the future of OCR today. Really interesting stuff.
    Yeah, and you know how sometimes you just scan a document, you just want the text, you don't really think twice about it. Right, right. But this paper, "General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model." Catchy title. I know, right? But it's not just the title, they're proposing this whole new way of thinking about OCR. OCR 2.0, as they call it. Exactly, it's not just about text anymore. Yeah, it's really about understanding any kind of visual information, like humans do. So much bigger. It's a really ambitious goal.

    Okay, so before we get ahead of ourselves, let's back up for a second. Okay. How does traditional OCR even work? Like when you and I scan a document, what's actually going on? Well, it's kind of like, imagine an assembly line, right? First, the system has to figure out where on the page the actual text is. Find it. Right, isolate it. Then it crops those bits out. Okay. And then it tries to recognize the individual letters and words. So it's like a multi-step? Yeah, it's a whole process.

    And we've all been there, right? When one of those steps goes wrong. Oh, tell me about it. And you get that OCR output that's just… Gibberish, total gibberish. The worst. And the paper really digs into this. They're saying that whole assembly line approach, it's not just prone to errors, it's just clunky. Yeah, very inefficient. Like different fonts can throw it off. And, right, different languages? Forget it. Oh yeah, if it's not basic printed text, OCR 1.0 really struggles.

    It's like it doesn't understand the context. Yeah, exactly. It's treating information like it's just a bunch of isolated letters, instead of seeing the bigger picture, you know, the relationships between them. It doesn't get the human element of it. It's missing that human touch, that understanding of how we visually organize information. And that's a problem. A big one. Especially now, when we're just like drowning in visual information everywhere you look.
    It's true, we need something way more powerful than what we have now. We need a serious upgrade. Enter OCR 2.0. That's what they're proposing, yeah. So what's the magic formula? What makes it so different from what we're used to? Well, the paper lays out three main principles for OCR 2.0. Okay. First, it has to be end to end. It needs to be… End to end. Low cost, accessible. Got it. And most importantly, it needs to be versatile. Versatile, that's a good one.

    So okay, let's break it down. End to end: does that mean ditching that whole assembly line thing we were talking about? Exactly, yeah. Instead of all those separate steps, OCR 2.0, they're saying it should be one unified model. Okay. One model that can handle the entire process. So much simpler. And much more efficient. Okay, that makes sense. And easier to use, which is key.

    And then low cost, I mean. Oh, absolutely. That's got to be a priority. We want this to be accessible to everyone, not just… Sure. You know. Right, not just companies with tons of resources. Exactly. And the researchers were really clever about this. Yeah. They actually ...
    11 m
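The "assembly line" described in the OCR episode above (find the text, crop it, recognize it) versus a single unified model can be sketched with a toy example. This is an illustration under loud assumptions, not how any real OCR engine or the GOT model is implemented: the "page" is just a list of strings standing in for image rows, and `detect_text_rows`, `crop`, `recognize`, `pipeline_ocr`, and `end_to_end_ocr` are hypothetical names.

```python
from typing import Callable, List

Page = List[str]  # toy "image": each string stands in for one row of the page


def detect_text_rows(page: Page) -> List[int]:
    """Stage 1: figure out where the text is (here: rows that are not blank)."""
    return [i for i, row in enumerate(page) if row.strip()]


def crop(page: Page, rows: List[int]) -> List[str]:
    """Stage 2: cut the detected regions out of the page."""
    return [page[i].strip() for i in rows]


def recognize(region: str) -> str:
    """Stage 3: turn one cropped region into text (toy: already text)."""
    return region


def pipeline_ocr(page: Page) -> str:
    """OCR 1.0 style: three separate stages chained together.
    A mistake in an early stage is passed on to every later stage."""
    regions = crop(page, detect_text_rows(page))
    return "\n".join(recognize(region) for region in regions)


def end_to_end_ocr(page: Page, model: Callable[[Page], str]) -> str:
    """OCR 2.0 style: one model maps the whole page directly to output."""
    return model(page)


if __name__ == "__main__":
    page = ["", "  Hello ", "", "World"]
    print(pipeline_ocr(page))
    # Hypothetical stand-in for a unified model.
    print(end_to_end_ocr(page, lambda p: " ".join(r.strip() for r in p if r.strip())))
```

The point of the contrast is structural: the pipeline exposes three places where things can go wrong, while the end-to-end version has a single model boundary.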
