MiniMax AI Releases MiniMax-M1: A 456B Parameter Hybrid Model for Long-Context and Reinforcement Learning (RL) Tasks

The Challenge of Long-Context Reasoning in AI Models

Large reasoning models are not only designed to understand language but are also structured to think through multi-step processes that require prolonged attention spans and contextual comprehension. As the expectations from AI grow, especially in real-world and software development environments, researchers have sought architectures that can handle longer inputs and sustain deep, coherent reasoning chains without overwhelming computational costs.

Computational Constraints with Traditional Transformers

The primary difficulty in expanding these reasoning capabilities lies in the excessive computational load that comes with longer generation lengths. Traditional transformer-based models employ a softmax attention mechanism, which scales quadratically with the input size. This limits their capacity to handle long input sequences or extended chains of thought efficiently. This problem becomes even more pressing in areas that require real-time interaction or cost-sensitive applications, where inference expenses are significant.

Existing Alternatives and Their Limitations

Efforts to address this issue have yielded a range of methods, including sparse attention and linear attention variants. Some teams have experimented with state-space models and recurrent networks as alternatives to traditional attention structures. However, these innovations have seen limited adoption in the most competitive reasoning models due to either architectural complexity or a lack of scalability in real-world deployments. Even large-scale systems, such as Tencent’s Hunyuan-T1, which utilizes a novel Mamba architecture, remain closed-source, thereby restricting wider research engagement and validation.

Introduction of MiniMax-M1: A Scalable Open-Weight Model

Researchers at MiniMax AI introduced MiniMax-M1, a new open-weight, large-scale reasoning model that combines a mixture-of-experts (MoE) architecture with lightning attention. Built as an evolution of the MiniMax-Text-01 model, MiniMax-M1 contains 456 billion parameters, with 45.9 billion activated per token. It supports context lengths of up to 1 million tokens, eight times the capacity of DeepSeek R1. The model also addresses compute scalability at inference time, consuming only 25% of the FLOPs required by DeepSeek R1 at a generation length of 100,000 tokens. It was trained with large-scale reinforcement learning on a broad range of tasks, from mathematics and coding to software engineering, marking a shift toward practical, long-context AI models.

Hybrid-Attention with Lightning Attention and Softmax Blocks

To optimize this architecture, MiniMax-M1 employs a hybrid attention scheme where every seventh transformer block uses traditional softmax attention, followed by six blocks using lightning attention. This significantly reduces computational complexity while preserving performance. The lightning attention itself is I/O-aware, adapted from linear attention, and is particularly effective at scaling reasoning lengths to hundreds of thousands of tokens. For reinforcement learning efficiency, the researchers introduced a novel algorithm called CISPO. Instead of clipping token updates as traditional methods do, CISPO clips importance sampling weights, enabling stable training and consistent token contributions, even in off-policy updates.

The CISPO Algorithm and RL Training Efficiency

The CISPO algorithm proved essential in overcoming the training instability faced in hybrid architectures. In comparative studies using the Qwen2.5-32B baseline, CISPO achieved a 2x speedup compared to DAPO. Leveraging this, the full reinforcement learning cycle for MiniMax-M1 was completed in just three weeks using 512 H800 GPUs, with a rental cost of approximately $534,700. The model was trained on a diverse dataset comprising 41 logic tasks generated via the SynLogic framework and real-world software engineering environments derived from SWE-bench. These environments utilized execution-based rewards to guide performance, resulting in stronger outcomes in practical coding tasks.
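Conceptually, and based only on the description above, a CISPO-style update clips and detaches the importance-sampling weight rather than the token update itself. The minimal sketch below illustrates the idea; the clipping bound and tensor names are illustrative and not taken from the MiniMax implementation.

import torch

def cispo_style_loss(logp_new, logp_old, advantages, clip_max=2.0):
    """Illustrative sketch: the importance-sampling weight is clipped and
    detached (stop-gradient), so every token still contributes a
    REINFORCE-style gradient through its log-probability."""
    ratio = torch.exp(logp_new - logp_old)                  # per-token IS weight
    clipped_w = torch.clamp(ratio, max=clip_max).detach()   # clip the weight, not the update
    return -(clipped_w * advantages * logp_new).mean()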

Benchmark Results and Comparative Performance

MiniMax-M1 delivered compelling benchmark results. Compared to DeepSeek-R1 and Qwen3-235B, it excelled in software engineering, long-context processing, and agentic tool use. Although it trailed the latest DeepSeek-R1-0528 in math and coding contests, it surpassed both OpenAI o3 and Claude 4 Opus in long-context understanding benchmarks. Furthermore, it outperformed Gemini 2.5 Pro in the TAU-Bench agent tool use evaluation.

Conclusion: A Scalable and Transparent Model for Long-Context AI

MiniMax-M1 presents a significant step forward by offering both transparency and scalability. By addressing the dual challenge of inference efficiency and training complexity, the research team at MiniMax AI has set a precedent for open-weight reasoning models. This work not only brings a solution to compute constraints but also introduces practical methods for scaling language model intelligence into real-world applications.

Check out the Paper, Model and GitHub Page. All credit for this research goes to the researchers of this project.

OpenAI Releases an Open-Sourced Version of a Customer Service Agent Demo with the Agents SDK

OpenAI has open-sourced a new multi-agent customer service demo on GitHub, showcasing how to build domain-specialized AI agents using its Agents SDK. This project—titled openai-cs-agents-demo—models an airline customer service chatbot capable of handling a range of travel-related queries by dynamically routing requests to specialized agents. Built with a Python backend and a Next.js frontend, the system provides both a functional conversational interface and a visual trace of agent handoffs and guardrail activations.

The architecture is divided into two main components. The Python backend handles agent orchestration using the Agents SDK, while the Next.js frontend offers a chat interface and an interactive visualization of agent transitions. This setup provides transparency into the decision-making and delegation process as agents triage, respond to, or reject user queries. The demo operates with several focused agents: a Triage Agent, Seat Booking Agent, Flight Status Agent, Cancellation Agent, and an FAQ Agent. Each of these is configured with specialized instructions and tools to fulfill their specific sub-tasks.

When a user enters a request—such as “change my seat” or “cancel my flight”—the Triage Agent processes the input to determine intent and dispatches the query to the appropriate downstream agent. For example, a booking change request will be routed to the Seat Booking Agent, which can verify confirmation numbers, offer seat map choices, and finalize seat changes. If a cancellation is requested, the system hands off to the Cancellation Agent, which follows a structured flow to confirm and execute the cancellation. The demo also includes a Flight Status Agent for real-time flight inquiries and an FAQ Agent that answers general questions about baggage policies or aircraft types.

A key strength of the system lies in its integration of guardrails for safety and relevance. The demo features two: a Relevance Guardrail and a Jailbreak Guardrail. The Relevance Guardrail filters out off-topic queries—for example, rejecting prompts like “write me a poem about strawberries.” The Jailbreak Guardrail blocks attempts to circumvent system boundaries or manipulate agent behavior, such as asking the model to reveal its internal instructions. When either guardrail is triggered, the system highlights it in the trace and sends a structured error message to the user.

The Agents SDK itself serves as the orchestration backbone. Each agent is defined as a composable unit with prompt templates, tool access, handoff logic, and output schemas. The SDK handles chaining agents via “handoffs,” supports real-time tracing, and allows developers to enforce input/output constraints with guardrails. This framework is the same one powering OpenAI’s internal experiments with tool-using and reasoning agents, but now exposed in an educational and extendable format.
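As a rough sketch of how such a triage-plus-handoff setup looks with the Agents SDK, consider the following; the agent names and instructions are illustrative and not copied from the openai-cs-agents-demo repository.

from agents import Agent, Runner

faq_agent = Agent(
    name="FAQ Agent",
    instructions="Answer general questions about baggage policies and aircraft types.",
)

seat_booking_agent = Agent(
    name="Seat Booking Agent",
    instructions="Help the customer change their seat; ask for a confirmation number first.",
)

triage_agent = Agent(
    name="Triage Agent",
    instructions="Determine the customer's intent and hand off to the most suitable agent.",
    handoffs=[faq_agent, seat_booking_agent],  # candidate agents for handoff
)

result = Runner.run_sync(triage_agent, "I'd like to change my seat, please.")
print(result.final_output)

In the actual demo, guardrails and tools are attached to the agents in a similarly declarative way, and the frontend visualizes each handoff as it happens.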

Developers can run the demo locally by starting the Python backend server with Uvicorn and launching the frontend with a single npm run dev command. The entire system is configurable—developers can plug in new agents, define their own task routing strategies, and implement custom guardrails. With full transparency into prompts, decisions, and trace logs, the demo offers a practical foundation for real-world conversational AI systems in customer support or other enterprise domains.

By releasing this reference implementation, OpenAI provides a tangible example of how multi-agent coordination, tool use, and safety checks can be combined into a robust service experience. It’s particularly valuable for developers seeking to understand the anatomy of agentic systems—and how to build modular, controllable AI workflows that are both transparent and production-ready.

Check out the GitHub Page. All credit for this research goes to the researchers of this project.

Build a scalable AI video generator using Amazon SageMaker AI and CogVideoX

In recent years, the rapid advancement of artificial intelligence and machine learning (AI/ML) technologies has revolutionized various aspects of digital content creation. One particularly exciting development is the emergence of video generation capabilities, which offer unprecedented opportunities for companies across diverse industries. This technology allows for the creation of short video clips that can be seamlessly combined to produce longer, more complex videos. The potential applications of this innovation are vast and far-reaching, promising to transform how businesses communicate, market, and engage with their audiences.
Video generation technology presents a myriad of use cases for companies looking to enhance their visual content strategies. For instance, ecommerce businesses can use this technology to create dynamic product demonstrations, showcasing items from multiple angles and in various contexts without the need for extensive physical photoshoots. In the realm of education and training, organizations can generate instructional videos tailored to specific learning objectives, quickly updating content as needed without re-filming entire sequences. Marketing teams can craft personalized video advertisements at scale, targeting different demographics with customized messaging and visuals. Furthermore, the entertainment industry stands to benefit greatly, with the ability to rapidly prototype scenes, visualize concepts, and even assist in the creation of animated content.
The flexibility offered by combining these generated clips into longer videos opens up even more possibilities. Companies can create modular content that can be quickly rearranged and repurposed for different displays, audiences, or campaigns. This adaptability not only saves time and resources, but also allows for more agile and responsive content strategies. As we delve deeper into the potential of video generation technology, it becomes clear that its value extends far beyond mere convenience, offering a transformative tool that can drive innovation, efficiency, and engagement across the corporate landscape.
In this post, we explore how to implement a robust AWS-based solution for video generation that uses the CogVideoX model and Amazon SageMaker AI.
Solution overview
Our architecture delivers a highly scalable and secure video generation solution using AWS managed services. The data management layer implements three purpose-specific Amazon Simple Storage Service (Amazon S3) buckets—for input videos, processed outputs, and access logging—each configured with appropriate encryption and lifecycle policies to support data security throughout its lifecycle.
For compute resources, we use AWS Fargate for Amazon Elastic Container Service (Amazon ECS) to host the Streamlit web application, providing serverless container management with automatic scaling capabilities. Traffic is efficiently distributed through an Application Load Balancer. The AI processing pipeline uses SageMaker AI processing jobs to handle video generation tasks, decoupling intensive computation from the web interface for cost optimization and enhanced maintainability. User prompts are refined through Amazon Bedrock, which feeds into the CogVideoX-5b model for high-quality video generation, creating an end-to-end solution that balances performance, security, and cost-efficiency.
The following diagram illustrates the solution architecture.

CogVideoX model
CogVideoX is an open source, state-of-the-art text-to-video generation model capable of producing 10-second continuous videos at 16 frames per second with a resolution of 768×1360 pixels. The model effectively translates text prompts into coherent video narratives, addressing common limitations in previous video generation systems.
The model uses three key innovations:

A 3D Variational Autoencoder (VAE) that compresses videos along both spatial and temporal dimensions, improving compression efficiency and video quality
An expert transformer with adaptive LayerNorm that enhances text-to-video alignment through deeper fusion between modalities
Progressive training and multi-resolution frame pack techniques that enable the creation of longer, coherent videos with significant motion elements

CogVideoX also benefits from an effective text-to-video data processing pipeline with various preprocessing strategies and a specialized video captioning method, contributing to higher generation quality and better semantic alignment. The model’s weights are publicly available, making it accessible for implementation in various business applications, such as product demonstrations and marketing content. The following diagram shows the architecture of the model.
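Because the weights are public, the model can also be exercised directly through the Hugging Face diffusers library, outside the SageMaker pipeline described in this post. The following is a minimal local sketch; the checkpoint ID, frame count, step count, and frame rate are illustrative defaults, and a CUDA-capable GPU is assumed.

import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Load an open-weight CogVideoX checkpoint from the Hugging Face Hub
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "A bee on a flower."
# Generate a short clip and write it to disk
frames = pipe(prompt=prompt, num_inference_steps=50, num_frames=49).frames[0]
export_to_video(frames, "bee.mp4", fps=8)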

Prompt enhancement
To improve the quality of video generation, the solution provides an option to enhance user-provided prompts. This is done by instructing a large language model (LLM), in this case Anthropic’s Claude, to take a user’s initial prompt and expand upon it with additional details, creating a more comprehensive description for video creation. The prompt consists of three parts:

Role section – Defines the AI’s purpose in enhancing prompts for video generation
Task section – Specifies the instructions needed to be performed with the original prompt
Prompt section – Where the user’s original input is inserted

By adding more descriptive elements to the original prompt, this system aims to provide richer, more detailed instructions to video generation models, potentially resulting in more accurate and visually appealing video outputs. We use the following prompt template for this solution:
"""
<Role>
Your role is to enhance the user prompt that is given to you by
providing additional details to the prompt. The end goal is to
convert the user prompt into a short video clip, so it is necessary
to provide as much information as you can.
</Role>
<Task>
You must add details to the user prompt in order to enhance it for
video generation. You must provide a 1 paragraph response. No
more and no less. Only include the enhanced prompt in your response.
Do not include anything else.
</Task>
<Prompt>
{prompt}
</Prompt>
"""
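The post does not show the invocation code itself; as an illustrative sketch, the template above could be sent to Anthropic's Claude on Amazon Bedrock through boto3's Converse API roughly as follows. ENHANCEMENT_TEMPLATE is assumed to hold the template text, and the model ID and inference parameters are placeholders.

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def enhance_prompt(user_prompt: str) -> str:
    # Insert the user's original prompt into the template shown above
    body = ENHANCEMENT_TEMPLATE.format(prompt=user_prompt)
    response = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # placeholder model ID
        messages=[{"role": "user", "content": [{"text": body}]}],
        inferenceConfig={"maxTokens": 512, "temperature": 0.7},
    )
    # The enhanced prompt comes back as the first text block of the reply
    return response["output"]["message"]["content"][0]["text"]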
Prerequisites
Before you deploy the solution, make sure you have the following prerequisites:

The AWS CDK Toolkit – Install the AWS CDK Toolkit globally using npm: npm install -g aws-cdk. This provides the core functionality for deploying infrastructure as code to AWS.
Docker Desktop – This is required for local development and testing. It makes sure container images can be built and tested locally before deployment.
The AWS CLI – The AWS Command Line Interface (AWS CLI) must be installed and configured with appropriate credentials. This requires an AWS account with necessary permissions. Configure the AWS CLI using aws configure with your access key and secret.
Python Environment – You must have Python 3.11+ installed on your system. We recommend using a virtual environment for isolation. This is required for both the AWS CDK infrastructure and Streamlit application.
Active AWS account – You will need to raise a service quota request for SageMaker processing jobs to use the ml.g5.4xlarge instance type.

Deploy the solution
This solution has been tested in the us-east-1 AWS Region. Complete the following steps to deploy:

Create and activate a virtual environment:

python -m venv .venv
source .venv/bin/activate

Install infrastructure dependencies:

cd infrastructure
pip install -r requirements.txt

Bootstrap the AWS CDK (if not already done in your AWS account):

cdk bootstrap

Deploy the infrastructure:

cdk deploy -c allowed_ips='["'$(curl -s ifconfig.me)'/32"]'
To access the Streamlit UI, choose the link for StreamlitURL in the AWS CDK output logs after deployment is successful. The following screenshot shows the Streamlit UI accessible through the URL.

Basic video generation
Complete the following steps to generate a video:

Input your natural language prompt into the text box at the top of the page.
Copy this prompt to the text box at the bottom.
Choose Generate Video to create a video using this basic prompt.

The following is the output from the simple prompt “A bee on a flower.”

Enhanced video generation
For higher-quality results, complete the following steps:

Enter your initial prompt in the top text box.
Choose Enhance Prompt to send your prompt to Amazon Bedrock.
Wait for Amazon Bedrock to expand your prompt into a more descriptive version.
Review the enhanced prompt that appears in the lower text box.
Edit the prompt further if desired.
Choose Generate Video to initiate the processing job with CogVideoX.

When processing is complete, your video will appear on the page with a download option. The following is an example of an enhanced prompt and output:
"""
A vibrant yellow and black honeybee gracefully lands on a large,
blooming sunflower in a lush garden on a warm summer day. The
bee's fuzzy body and delicate wings are clearly visible as it
moves methodically across the flower's golden petals, collecting
pollen. Sunlight filters through the petals, creating a soft,
warm glow around the scene. The bee's legs are coated in pollen
as it works diligently, its antennae twitching occasionally. In
the background, other colorful flowers sway gently in a light
breeze, while the soft buzzing of nearby bees can be heard.
"""

Add an image to your prompt
If you want to include an image with your text prompt, complete the following steps:

Complete the text prompt and optional enhancement steps.
Choose Include an Image.
Upload the photo you want to use.
With both text and image now prepared, choose Generate Video to start the processing job.

The following is an example of the previous enhanced prompt with an included image.

To view more samples, check out the CogVideoX gallery.
Clean up
To avoid incurring ongoing charges, clean up the resources you created as part of this post:
cdk destroy
Considerations
Although our current architecture serves as an effective proof of concept, several enhancements are recommended for a production environment. Considerations include implementing an API Gateway with AWS Lambda-backed REST endpoints for an improved interface and authentication, introducing a queue-based architecture using Amazon Simple Queue Service (Amazon SQS) for better job management and reliability, and enhancing error handling and monitoring capabilities.
Conclusion
Video generation technology has emerged as a transformative force in digital content creation, as demonstrated by our comprehensive AWS-based solution using the CogVideoX model. By combining powerful AWS services like Fargate, SageMaker, and Amazon Bedrock with an innovative prompt enhancement system, we’ve created a scalable and secure pipeline capable of producing high-quality video clips. The architecture’s ability to handle both text-to-video and image-to-video generation, coupled with its user-friendly Streamlit interface, makes it an invaluable tool for businesses across sectors—from ecommerce product demonstrations to personalized marketing campaigns. As showcased in our sample videos, the technology delivers impressive results that open new avenues for creative expression and efficient content production at scale. This solution represents not just a technological advancement, but a glimpse into the future of visual storytelling and digital communication.
To learn more about CogVideoX, refer to CogVideoX on Hugging Face. Try out the solution for yourself, and share your feedback in the comments.

About the Authors
Nick Biso is a Machine Learning Engineer at AWS Professional Services. He solves complex organizational and technical challenges using data science and engineering. In addition, he builds and deploys AI/ML models on the AWS Cloud. His passion extends to his proclivity for travel and diverse cultural experiences.
Natasha Tchir is a Cloud Consultant at the Generative AI Innovation Center, specializing in machine learning. With a strong background in ML, she now focuses on the development of generative AI proof-of-concept solutions, driving innovation and applied research within the GenAIIC.
Katherine Feng is a Cloud Consultant at AWS Professional Services within the Data and ML team. She has extensive experience building full-stack applications for AI/ML use cases and LLM-driven solutions.
Jinzhao Feng is a Machine Learning Engineer at AWS Professional Services. He focuses on architecting and implementing large-scale generative AI and classic ML pipeline solutions. He is specialized in FMOps, LLMOps, and distributed training.

Building trust in AI: The AWS approach to the EU AI Act

As AI adoption accelerates and reshapes our future, organizations are adapting to evolving regulatory frameworks. In our report Unlocking Europe's AI Potential in the Digital Decade 2025, commissioned from Strand Partners, 68% of European businesses surveyed said they struggle to understand their responsibilities under the EU AI Act. European businesses also reported that an estimated 40% of their IT spend goes toward compliance-related costs, and those uncertain about regulations plan to invest 28% less in AI over the next year. More clarity around regulation and compliance is critical to meet the competitiveness targets set out by the European Commission.
The EU AI Act
The European Union's Artificial Intelligence Act (EU AI Act) establishes comprehensive regulations for the development, deployment, use, and provision of AI within the EU. It brings a risk-based regulatory framework with the overarching goal of protecting fundamental rights and safety. The EU AI Act entered into force on August 1, 2024, and will apply in phases, with most requirements becoming applicable over the next 14 months. The first group of obligations, on prohibited AI practices and AI literacy, became enforceable on February 2, 2025, with the remaining obligations to follow gradually.
AWS customers across industries use our AI services for a myriad of purposes, such as to provide better customer service, optimize their businesses, or create new experiences for their customers. We are actively evaluating how our services can best support customers to meet their compliance obligations, while maintaining AWS’s own compliance with the applicable provisions of the EU AI Act. As the European Commission continues to publish compliance guidance, such as the Guidelines of Prohibited AI Practices and the Guidelines on AI System Definition, we will continue to provide updates to our customers through our AWS Blog posts and other AWS channels.
The AWS approach to the EU AI Act
AWS has long been committed to AI solutions that are safe and respect fundamental rights. We take a people-centric approach that prioritizes education, science, and our customers’ needs to integrate responsible AI across the end-to-end AI lifecycle. As a leader in AI technology, AWS prioritizes trust in our AI offerings and supports the EU AI Act’s goal of promoting trustworthy AI products and services. We do this in several ways:

Amazon was among the first signatories of the EU’s AI Pact, and the first major cloud service provider to announce ISO/IEC 42001 accredited certification for its AI services, covering Amazon Bedrock, Amazon Q Business, Amazon Textract, and Amazon Transcribe.
AWS AI Service Cards and our frontier model safety framework enhance transparency. AWS AI Service Cards provide customers with a single place to find information on the intended use cases and limitations, responsible AI design choices, and performance optimization best practices for our AI services and models. Amazon’s frontier model safety framework focuses on severe risks that are unique to frontier AI models as they scale in size and capability, and which require specialized evaluation methods and safeguards.
Amazon Bedrock Guardrails helps customers implement safeguards tailored to their generative AI applications and aligned with their responsible AI policies.
Our Responsible AI Guide helps customers think through how to develop and design AI systems with responsible considerations across the AI lifecycle.
As part of our AI Ready Commitment, we provide free educational resources that extend beyond technology to encompass people, processes and culture. Our courses cover aspects like security, compliance, and governance, as well as foundational learning on introduction to responsible AI and responsible AI practices.
Our Generative AI Innovation Center offers technical guidance and best practices to help customers establish effective governance frameworks when building with our AI services.

The EU AI Act requires all AI systems to meet certain requirements for fairness, transparency, accountability, and fundamental rights protection. Taking a risk-based approach, the EU AI Act establishes different categories of AI systems with corresponding requirements, and it brings obligations for all actors across the AI supply chain, including providers, deployers, distributors, users, and importers.
AI systems deemed to pose unacceptable risks are prohibited. High-risk AI systems are allowed, but they are subject to stricter requirements for documentation, data governance, human oversight, and risk management procedures. In addition, certain AI systems (for example, those intended to interact directly with natural persons) are considered low risk and subject to transparency requirements. Apart from the requirements for AI systems, the EU AI Act also brings a separate set of obligations for providers of general-purpose AI (GPAI) models, depending on whether they pose systemic risks or not.
The EU AI Act may apply to activities both inside and outside the EU. Therefore, even if your organization is not established in the EU, you may still be required to comply with the EU AI Act. We encourage all AWS customers to conduct a thorough assessment of their AI activities to determine whether they are subject to the EU AI Act and their specific obligations, regardless of their location.
Prohibited use cases
Beginning February 2, 2025, the EU AI Act prohibits certain AI practices deemed to present unacceptable risks to fundamental rights. These prohibitions, a full list of which is available under Article 5 of the EU AI Act, generally focus on manipulative or exploitative practices that can be harmful or abusive and on the evaluation or classification of individuals based on social behavior, personal traits, or biometric data.
AWS is committed to making sure our AI services meet applicable regulatory requirements, including those of the EU AI Act. Although AWS services support a wide range of customer use case categories, none are designed or intended for practices prohibited under the EU AI Act, and we maintain this commitment through our policies, including the AWS Acceptable Use Policy, Responsible AI Policy, and Responsible Use of AI Guide.
Compliance with the EU AI Act is a shared journey: the regulation sets out distinct responsibilities for developers (providers) and deployers of AI systems. Although AWS provides the building blocks for compliant solutions, AWS customers remain responsible for assessing how their use of AWS services falls under the EU AI Act, implementing appropriate controls for their AI applications, and making sure their specific use cases comply with the EU AI Act's restrictions. We encourage AWS customers to carefully review the list of prohibited practices under the EU AI Act when building AI solutions using AWS services and to review the European Commission's recently published guidelines on prohibited practices.
Moving forward with the EU AI Act
As the regulatory landscape continues to evolve, customers should stay informed about the EU AI Act and assess how it applies to their organization’s use of AI. AWS remains engaged with EU institutions and relevant authorities across EU member states on the enforcement of the EU AI Act. We participate in industry dialogues and contribute our knowledge and experience to support balanced outcomes that safeguard against risks of this technology, particularly where AI use cases have the potential to affect individuals’ health and safety or fundamental rights, while enabling continued AI innovation in ways that will benefit all. We will continue to update our customers through our AWS ML Blog posts and other AWS channels as new guidance emerges and additional portions of the EU AI Act take effect.
If you have questions about compliance with the EU AI Act, or if you require additional information on AWS AI governance tools and resources, please contact your account representative or request to be contacted.
If you'd like to join our community of innovators, learn about upcoming events, and gain expert insights, practical guidance, and connections that help you navigate the regulatory landscape, please express interest by registering.

Update on the AWS DeepRacer Student Portal

The AWS DeepRacer Student Portal will no longer be available starting September 15, 2025. This change comes as part of the broader transition of AWS DeepRacer from a service to an AWS Solution, representing an evolution in how we deliver AI & ML education. Since its launch, the AWS DeepRacer Student Portal has helped thousands of learners begin their AI & ML journey through hands-on reinforcement learning experiences. The portal has served as a foundational stepping stone for many who have gone on to pursue career development in AI through the AWS AI & ML Scholars program, which has been re-launched with a generative AI focused curriculum.
Starting July 14, 2025, the AWS DeepRacer Student Portal will enter a maintenance phase in which new registrations will be disabled. Until September 15, 2025, existing users will retain full access to their content and training materials, with updates limited to critical security fixes; after that date, the portal will no longer be available. Going forward, AWS DeepRacer will be available as a solution in the AWS Solutions Library, providing educational institutions and organizations with greater capabilities to build and customize their own DeepRacer learning experiences.
As part of our commitment to advancing AI & ML education, we recently launched the enhanced AWS AI & ML Scholars program on May 28, 2025. This new program embraces the latest developments in generative AI, featuring hands-on experience with AWS PartyRock and Amazon Q. The curriculum focuses on practical applications of AI technologies and emerging skills, reflecting the evolving needs of the technology industry and preparing students for careers in AI. To learn more about the new AI & ML Scholars program and continue your learning journey, visit awsaimlscholars.com. In addition, users can also explore AI learning content and build in-demand cloud skills using AWS Skill Builder.
We’re grateful to the entire AWS DeepRacer Student community for their enthusiasm and engagement, and we look forward to supporting the next chapter of your AI & ML learning journey.

About the author
Jayadev Kalla is a Product Manager with the AWS Social Responsibility and Impact team, focusing on AI & ML education. His goal is to expand access to AI education through hands-on learning experiences. Outside of work, Jayadev is a sports enthusiast and loves to cook.

HtFLlib: A Unified Benchmarking Library for Evaluating Heterogeneous Federated Learning Methods Across Modalities

AI institutions develop heterogeneous models for specific tasks but face data scarcity challenges during training. Traditional federated learning (FL) supports only homogeneous model collaboration, which requires identical architectures across all clients; in practice, however, clients design model architectures for their own requirements. Moreover, locally trained models embody significant effort and intellectual property, so sharing them directly reduces participants' willingness to collaborate. Heterogeneous Federated Learning (HtFL) addresses these limitations, but the literature lacks a unified benchmark for evaluating HtFL methods across diverse domains and aspects.

Background and Categories of HtFL Methods

Existing FL benchmarks focus on data heterogeneity using homogeneous client models but neglect real scenarios that involve model heterogeneity. Representative HtFL methods fall into three main categories that address these limitations. Partial parameter sharing methods, such as LG-FedAvg, FedGen, and FedGH, maintain heterogeneous feature extractors while assuming homogeneous classifier heads for knowledge transfer. Mutual distillation methods, such as FML, FedKD, and FedMRL, train and share small auxiliary models through distillation techniques. Prototype sharing methods transfer lightweight class-wise prototypes as global knowledge: local prototypes are collected from clients and aggregated on the server to guide local training. However, it remains unclear whether existing HtFL methods perform consistently across diverse scenarios.
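To make the prototype-sharing pattern above concrete, here is a minimal, illustrative sketch of the generic idea (class-wise mean embeddings computed locally and averaged on a server); it is not code from HtFLlib or any specific method.

import torch

def local_prototypes(features: torch.Tensor, labels: torch.Tensor) -> dict:
    """Compute class-wise mean feature embeddings on one client."""
    return {int(c): features[labels == c].mean(dim=0) for c in labels.unique()}

def aggregate_prototypes(client_protos: list[dict]) -> dict:
    """Average each class prototype across the clients that have that class."""
    global_protos = {}
    for protos in client_protos:
        for c, p in protos.items():
            global_protos.setdefault(c, []).append(p)
    return {c: torch.stack(ps).mean(dim=0) for c, ps in global_protos.items()}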

Introducing HtFLlib: A Unified Benchmark

Researchers from Shanghai Jiao Tong University, Beihang University, Chongqing University, Tongji University, Hong Kong Polytechnic University, and Queen's University Belfast have proposed the first Heterogeneous Federated Learning library (HtFLlib), an easy-to-use and extensible framework for integrating multiple datasets and model heterogeneity scenarios. The library integrates:

12 datasets across various domains, modalities, and data heterogeneity scenarios
40 model architectures ranging from small to large, across three modalities
A modularized and easy-to-extend HtFL codebase with implementations of 10 representative HtFL methods
Systematic evaluations covering accuracy, convergence, computation costs, and communication costs

Datasets and Modalities in HtFLlib

HtFLlib covers detailed data heterogeneity scenarios divided into three settings: Label Skew (with Pathological and Dirichlet subsettings), Feature Shift, and Real-World. It integrates 12 datasets, including Cifar10, Cifar100, Flowers102, Tiny-ImageNet, KVASIR, COVIDx, DomainNet, Camelyon17, AG News, Shakespeare, HAR, and PAMAP2. These datasets vary significantly in domain, data volume, and class numbers, demonstrating HtFLlib's comprehensive and versatile nature. The researchers' main focus is on image data, especially the label skew setting, as image tasks are the most common across various fields. The HtFL methods are evaluated across image, text, and sensor-signal tasks to assess their respective strengths and weaknesses.

Performance Analysis: Image Modality

For image data, most HtFL methods show decreased accuracy as model heterogeneity increases. FedMRL shows particular strength through its combination of auxiliary global and local models. When heterogeneous classifiers are introduced, making partial parameter sharing methods inapplicable, FedTGP maintains superiority across diverse settings due to its adaptive prototype refinement ability. Medical dataset experiments with black-box pre-trained heterogeneous models demonstrate that HtFL improves model quality over the pre-trained models and achieves greater gains than auxiliary-model approaches such as FML. For text data, FedMRL's advantages in label skew settings diminish in real-world settings, while FedProto and FedTGP perform relatively poorly compared to image tasks.

Conclusion

In conclusion, researchers introduced HtFLlib, a framework that addresses the critical gap in HtFL benchmarking by providing unified evaluation standards across diverse domains and scenarios. HtFLlib’s modular design and extensible architecture provide a detailed benchmark for both research and practical applications in HtFL. Moreover, its ability to support heterogeneous models in collaborative learning opens the way for future research into utilizing complex pre-trained large models, black-box systems, and varied architectures across different tasks and modalities.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.

How to Build an Advanced BrightData Web Scraper with Google Gemini for AI-Powered Data Extraction

In this tutorial, we walk you through building an enhanced web scraping tool that leverages BrightData’s powerful proxy network alongside Google’s Gemini API for intelligent data extraction. You’ll see how to structure your Python project, install and import the necessary libraries, and encapsulate scraping logic within a clean, reusable BrightDataScraper class. Whether you’re targeting Amazon product pages, bestseller listings, or LinkedIn profiles, the scraper’s modular methods demonstrate how to configure scraping parameters, handle errors gracefully, and return structured JSON results. An optional ReAct-style AI agent integration also shows you how to combine LLM-driven reasoning with real-time scraping, empowering you to pose natural language queries for on-the-fly data analysis.

!pip install langchain-brightdata langchain-google-genai langgraph langchain-core google-generativeai

We install all of the key libraries needed for the tutorial in one step: langchain-brightdata for BrightData web scraping, langchain-google-genai and google-generativeai for Google Gemini integration, langgraph for agent orchestration, and langchain-core for the core LangChain framework.

import os
import json
from typing import Dict, Any, Optional
from langchain_brightdata import BrightDataWebScraperAPI
from langchain_google_genai import ChatGoogleGenerativeAI
from langgraph.prebuilt import create_react_agent

These imports prepare your environment and core functionality: os and json handle system operations and data serialization, while typing provides structured type hints. You then bring in BrightDataWebScraperAPI for BrightData scraping, ChatGoogleGenerativeAI to interface with Google’s Gemini LLM, and create_react_agent to orchestrate these components in a ReAct-style agent.

class BrightDataScraper:
    """Enhanced web scraper using BrightData API"""

    def __init__(self, api_key: str, google_api_key: Optional[str] = None):
        """Initialize scraper with API keys"""
        self.api_key = api_key
        self.scraper = BrightDataWebScraperAPI(bright_data_api_key=api_key)

        if google_api_key:
            self.llm = ChatGoogleGenerativeAI(
                model="gemini-2.0-flash",
                google_api_key=google_api_key
            )
            self.agent = create_react_agent(self.llm, [self.scraper])

    def scrape_amazon_product(self, url: str, zipcode: str = "10001") -> Dict[str, Any]:
        """Scrape Amazon product data"""
        try:
            results = self.scraper.invoke({
                "url": url,
                "dataset_type": "amazon_product",
                "zipcode": zipcode
            })
            return {"success": True, "data": results}
        except Exception as e:
            return {"success": False, "error": str(e)}

    def scrape_amazon_bestsellers(self, region: str = "in") -> Dict[str, Any]:
        """Scrape Amazon bestsellers"""
        try:
            url = f"https://www.amazon.{region}/gp/bestsellers/"
            results = self.scraper.invoke({
                "url": url,
                "dataset_type": "amazon_product"
            })
            return {"success": True, "data": results}
        except Exception as e:
            return {"success": False, "error": str(e)}

    def scrape_linkedin_profile(self, url: str) -> Dict[str, Any]:
        """Scrape LinkedIn profile data"""
        try:
            results = self.scraper.invoke({
                "url": url,
                "dataset_type": "linkedin_person_profile"
            })
            return {"success": True, "data": results}
        except Exception as e:
            return {"success": False, "error": str(e)}

    def run_agent_query(self, query: str) -> None:
        """Run AI agent with natural language query"""
        if not hasattr(self, 'agent'):
            print("Error: Google API key required for agent functionality")
            return

        try:
            for step in self.agent.stream(
                {"messages": query},
                stream_mode="values"
            ):
                step["messages"][-1].pretty_print()
        except Exception as e:
            print(f"Agent error: {e}")

    def print_results(self, results: Dict[str, Any], title: str = "Results") -> None:
        """Pretty print results"""
        print(f"\n{'='*50}")
        print(f"{title}")
        print(f"{'='*50}")

        if results["success"]:
            print(json.dumps(results["data"], indent=2, ensure_ascii=False))
        else:
            print(f"Error: {results['error']}")
        print()

The BrightDataScraper class encapsulates all BrightData web-scraping logic and optional Gemini-powered intelligence under a single, reusable interface. Its methods enable you to easily fetch Amazon product details, bestseller lists, and LinkedIn profiles, handling API calls, error handling, and JSON formatting, and even stream natural-language “agent” queries when a Google API key is provided. A convenient print_results helper ensures your output is always cleanly formatted for inspection.

def main():
    """Main execution function"""
    BRIGHT_DATA_API_KEY = "Use Your Own API Key"
    GOOGLE_API_KEY = "Use Your Own API Key"

    scraper = BrightDataScraper(BRIGHT_DATA_API_KEY, GOOGLE_API_KEY)

    print("Scraping Amazon India Bestsellers...")
    bestsellers = scraper.scrape_amazon_bestsellers("in")
    scraper.print_results(bestsellers, "Amazon India Bestsellers")

    print("Scraping Amazon Product...")
    product_url = "https://www.amazon.com/dp/B08L5TNJHG"
    product_data = scraper.scrape_amazon_product(product_url, "10001")
    scraper.print_results(product_data, "Amazon Product Data")

    print("Scraping LinkedIn Profile...")
    linkedin_url = "https://www.linkedin.com/in/satyanadella/"
    linkedin_data = scraper.scrape_linkedin_profile(linkedin_url)
    scraper.print_results(linkedin_data, "LinkedIn Profile Data")

    print("Running AI Agent Query...")
    agent_query = """
    Scrape Amazon product data for https://www.amazon.com/dp/B0D2Q9397Y?th=1
    in New York (zipcode 10001) and summarize the key product details.
    """
    scraper.run_agent_query(agent_query)

The main() function ties everything together by setting your BrightData and Google API keys, instantiating the BrightDataScraper, and then demonstrating each feature: it scrapes Amazon India’s bestsellers, fetches details for a specific product, retrieves a LinkedIn profile, and finally runs a natural-language agent query, printing neatly formatted results after each step.

if __name__ == "__main__":
    print("Installing required packages...")
    os.system("pip install -q langchain-brightdata langchain-google-genai langgraph")

    os.environ["BRIGHT_DATA_API_KEY"] = "Use Your Own API Key"

    main()

Finally, this entry-point block ensures that, when run as a standalone script, the required scraping libraries are quietly installed, and the BrightData API key is set in the environment. Then the main function is executed to initiate all scraping and agent workflows.

In conclusion, by the end of this tutorial, you’ll have a ready-to-use Python script that automates tedious data collection tasks, abstracts away low-level API details, and optionally taps into generative AI for advanced query handling. You can extend this foundation by adding support for other dataset types, integrating additional LLMs, or deploying the scraper as part of a larger data pipeline or web service. With these building blocks in place, you’re now equipped to gather, analyze, and present web data more efficiently, whether for market research, competitive intelligence, or custom AI-driven applications.

Check out the Notebook. All credit for this research goes to the researchers of this project.

Why Small Language Models (SLMs) Are Poised to Redefine Agentic AI: Efficiency, Cost, and Practical Deployment

The Shift in Agentic AI System Needs

LLMs are widely admired for their human-like capabilities and conversational skills. However, with the rapid growth of agentic AI systems, LLMs are increasingly being utilized for repetitive, specialized tasks. This shift is gaining momentum—over half of major IT companies now use AI agents, with significant funding and projected market growth. These agents rely on LLMs for decision-making, planning, and task execution, typically through centralized cloud APIs. Massive investments in LLM infrastructure reflect confidence that this model will remain foundational to AI’s future. 

SLMs: Efficiency, Suitability, and the Case Against Over-Reliance on LLMs

Researchers from NVIDIA and Georgia Tech argue that small language models (SLMs) are not only powerful enough for many agent tasks but also more efficient and cost-effective than large models. They believe SLMs are better suited for the repetitive and simple nature of most agentic operations. While large models remain essential for more general, conversational needs, they propose using a mix of models depending on task complexity. They challenge the current reliance on LLMs in agentic systems and offer a framework for transitioning from LLMs to SLMs. They invite open discussion to encourage more resource-conscious AI deployment. 

Why SLMs are Sufficient for Agentic Operations

The researchers argue that SLMs are not only capable of handling most tasks within AI agents but are also more practical and cost-effective than LLMs. They define SLMs as models that can run efficiently on consumer devices, highlighting their strengths—lower latency, reduced energy consumption, and easier customization. Since many agent tasks are repetitive and focused, SLMs are often sufficient and even preferable. The paper suggests a shift toward modular agentic systems using SLMs by default and LLMs only when necessary, promoting a more sustainable, flexible, and inclusive approach to building intelligent systems. 

Arguments for LLM Dominance

Some argue that LLMs will always outperform small models (SLMs) in general language tasks due to superior scaling and semantic abilities. Others claim centralized LLM inference is more cost-efficient due to economies of scale. There is also a belief that LLMs dominate simply because they had an early start, drawing the majority of the industry’s attention. However, the study counters that SLMs are highly adaptable, cheaper to run, and can handle well-defined subtasks in agent systems effectively. Still, the broader adoption of SLMs faces hurdles, including existing infrastructure investments, evaluation bias toward LLM benchmarks, and lower public awareness. 

Framework for Transitioning from LLMs to SLMs

To smoothly shift from LLMs to smaller, specialized models (SLMs) in agent-based systems, the process starts by securely collecting usage data while ensuring privacy. Next, the data is cleaned and filtered to remove sensitive details. Using clustering, common tasks are grouped to identify where SLMs can take over. Based on task needs, suitable SLMs are chosen and fine-tuned with tailored datasets, often using efficient techniques such as LoRA. In some cases, LLM outputs guide SLM training. This isn’t a one-time process: models should be regularly updated and refined to stay aligned with evolving user interactions and tasks.
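As one illustration of the fine-tuning step described above, a LoRA adapter can be attached to a small open model with the Hugging Face peft library; the base model and hyperparameters below are placeholders rather than choices made in the paper.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # placeholder SLM
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

lora_config = LoraConfig(
    r=16,                                  # low-rank adapter dimension
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable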

Conclusion: Toward Sustainable and Resource-Efficient Agentic AI

In conclusion, the researchers believe that shifting from large language models to SLMs could significantly improve the efficiency and sustainability of agentic AI systems, especially for tasks that are repetitive and narrowly focused. They argue that SLMs are often powerful enough, more cost-effective, and better suited for such roles compared to general-purpose LLMs. In cases requiring broader conversational abilities, using a mix of models is recommended. To encourage progress and open dialogue, they invite feedback and contributions to their stance, committing to share responses publicly. The goal is to inspire more thoughtful and resource-efficient use of AI technologies in the future.

Check out the Paper. All credit for this research goes to the researchers of this project.

Meeting summarization and action item extraction with Amazon Nova

Meetings play a crucial role in decision-making, project coordination, and collaboration, and remote meetings are common across many organizations. However, capturing and structuring key takeaways from these conversations is often inefficient and inconsistent. Manually summarizing meetings or extracting action items requires significant effort and is prone to omissions or misinterpretations.
Large language models (LLMs) offer a more robust solution by transforming unstructured meeting transcripts into structured summaries and action items. This capability is especially useful for project management, customer support and sales calls, legal and compliance, and enterprise knowledge management.
In this post, we present a benchmark of different understanding models from the Amazon Nova family available on Amazon Bedrock, to provide insights on how you can choose the best model for a meeting summarization task.
LLMs to generate meeting insights
Modern LLMs are highly effective for summarization and action item extraction due to their ability to understand context, infer topic relationships, and generate structured outputs. In these use cases, prompt engineering provides a more efficient and scalable approach compared to traditional model fine-tuning or customization. Rather than modifying the underlying model architecture or training on large labeled datasets, prompt engineering uses carefully crafted input queries to guide the model’s behavior, directly influencing the output format and content. This method allows for rapid, domain-specific customization without the need for resource-intensive retraining processes. For tasks such as meeting summarization and action item extraction, prompt engineering enables precise control over the generated outputs, making sure they meet specific business requirements. It allows for the flexible adjustment of prompts to suit evolving use cases, making it an ideal solution for dynamic environments where model behaviors need to be quickly reoriented without the overhead of model fine-tuning.
Amazon Nova models and Amazon Bedrock
Amazon Nova models, unveiled at AWS re:Invent in December 2024, are built to deliver frontier intelligence at industry-leading price performance. They’re among the fastest and most cost-effective models in their respective intelligence tiers, and are optimized to power enterprise generative AI applications in a reliable, secure, and cost-effective manner.
The understanding model family has four tiers of models: Nova Micro (text-only, ultra-efficient for edge use), Nova Lite (multimodal, balanced for versatility), Nova Pro (multimodal, balance of speed and intelligence, ideal for most enterprise needs) and Nova Premier (multimodal, the most capable Nova model for complex tasks and teacher for model distillation). Amazon Nova models can be used for a variety of tasks, from summarization to structured text generation. With Amazon Bedrock Model Distillation, customers can also bring the intelligence of Nova Premier to a faster and more cost-effective model such as Nova Pro or Nova Lite for their use case or domain. This can be achieved through the Amazon Bedrock console and APIs such as the Converse API and Invoke API.
Solution overview
This post demonstrates how to use Amazon Nova understanding models, available through Amazon Bedrock, for automated insight extraction using prompt engineering. We focus on two key outputs:

Meeting summarization – A high-level abstractive summary that distills key discussion points, decisions made, and critical updates from the meeting transcript
Action items – A structured list of actionable tasks derived from the meeting conversation that apply to the entire team or project

The following diagram illustrates the solution workflow.

Prerequisites
To follow along with this post, familiarity with calling LLMs using Amazon Bedrock is expected. For detailed steps on using Amazon Bedrock for text summarization tasks, refer to Build an AI text summarizer app with Amazon Bedrock. For additional information about calling LLMs, refer to the Invoke API and Using the Converse API reference documentation.
Solution components
We developed the two core features of the solution—meeting summarization and action item extraction—by using popular models available through Amazon Bedrock. In the following sections, we look at the prompts that were used for these key tasks.
For the meeting summarization task, we used persona assignment, prompted the LLM to generate the summary inside <summary> tags to reduce redundant opening and closing sentences, and took a one-shot approach, giving the LLM a single example so that it consistently follows the right format for summary generation. As part of the system prompt, we give clear and concise rules emphasizing the correct tone, style, length, and faithfulness toward the provided transcript.
For the action item extraction task, we gave specific instructions on generating action items in the prompts and used chain-of-thought to improve the quality of the generated action items. In the assistant message, the <action_items> tag is provided as a prefill to nudge the model's generation in the right direction and to avoid redundant opening and closing sentences.
Different model families respond to the same prompts differently, and it’s important to follow the prompting guide defined for the particular model. For more information on best practices for Amazon Nova prompting, refer to Prompting best practices for Amazon Nova understanding models.
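The exact prompts are not reproduced here; as a rough sketch of how the pieces fit together, the call below sends a system prompt, the transcript, and an <action_items> assistant prefill to a Nova model through the Converse API. The model ID, wording, and inference parameters are illustrative, and transcript is assumed to hold the meeting transcript text.

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

transcript = "..."  # meeting transcript text goes here

response = bedrock.converse(
    modelId="us.amazon.nova-lite-v1:0",  # placeholder Nova model ID
    system=[{"text": "You extract action items from meeting transcripts. "
                     "Return a concise bulleted list inside <action_items> tags."}],
    messages=[
        {"role": "user", "content": [{"text": f"<transcript>{transcript}</transcript>"}]},
        # Assistant prefill nudges the model to start directly inside the tag
        {"role": "assistant", "content": [{"text": "<action_items>"}]},
    ],
    inferenceConfig={"maxTokens": 1024, "temperature": 0.2},
)
print(response["output"]["message"]["content"][0]["text"])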
Dataset
To evaluate the solution, we used samples from the public QMSum dataset. The QMSum dataset is a benchmark for meeting summarization, featuring English language transcripts from academic, business, and governance discussions with manually annotated summaries. It evaluates LLMs on generating structured, coherent summaries from complex and multi-speaker conversations, making it a valuable resource for abstractive summarization and discourse understanding. For testing, we used 30 randomly sampled meetings from the QMSum dataset. Each meeting contained 2–5 topic-wise transcripts, with approximately 8,600 tokens per transcript on average.
Evaluation framework
Achieving high-quality outputs from LLMs in meeting summarization and action item extraction can be a challenging task. Traditional evaluation metrics such as ROUGE, BLEU, and METEOR focus on surface-level similarity between generated text and reference summaries, but they often fail to capture nuances such as factual correctness, coherence, and actionability. Human evaluation is the gold standard but is expensive, time-consuming, and not scalable. To address these challenges, you can use LLM-as-a-judge, where another LLM is used to systematically assess the quality of generated outputs based on well-defined criteria. This approach offers a scalable and cost-effective way to automate evaluation while maintaining high accuracy. In this example, we used Anthropic’s Claude 3.5 Sonnet v1 as the judge model because we found it to be most aligned with human judgment. We used the LLM judge to score the generated responses on three main metrics: faithfulness, summarization, and question answering (QA).
The faithfulness score measures how faithful a generated summary is to the source: the portion of parsed statements in the summary that are supported by the given context (for example, a meeting transcript), relative to the total number of statements.
The summarization score is an equally weighted (0.5 each) combination of the QA score and the conciseness score. The QA score measures how well a generated summary covers the meeting transcript. It first generates a list of question-and-answer pairs from the meeting transcript and then measures the portion of those questions that are answered correctly when the summary is used as context instead of the transcript. The QA score is complementary to the faithfulness score because the faithfulness score doesn't measure the coverage of a generated summary. We used the QA score only for evaluating summaries, because action items aren't supposed to cover all aspects of a meeting transcript. The conciseness score measures the ratio of the length of the generated summary to the length of the full meeting transcript.
We used a modified version of the faithfulness score and the summarization score that had much lower latency than the original implementation.
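As a rough illustration, the composite score might be computed as follows. This is a minimal sketch assuming token counts for length and the equal 0.5 weighting described above; the QA score itself would come from the LLM judge, and the modified lower-latency implementation we used is not reproduced here.

# Minimal sketch of the composite scoring described above, assuming token
# counts for length and an equal 0.5/0.5 weighting. The QA score would come
# from the LLM judge; the lower-latency implementation used in the post is
# not reproduced here.
def conciseness_score(summary: str, transcript: str) -> float:
    # Ratio of summary length to full transcript length.
    return len(summary.split()) / max(len(transcript.split()), 1)

def summarization_score(qa_score: float, summary: str, transcript: str) -> float:
    return 0.5 * qa_score + 0.5 * conciseness_score(summary, transcript)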
Results
Our evaluation of Amazon Nova models across meeting summarization and action item extraction tasks revealed clear performance-latency patterns. For summarization, Nova Premier achieved the highest faithfulness score (1.0) with a processing time of 5.34s, while Nova Pro delivered 0.94 faithfulness in 2.9s. The smaller Nova Lite and Nova Micro models provided faithfulness scores of 0.86 and 0.83 respectively, with faster processing times of 2.13s and 1.52s. In action item extraction, Nova Premier again led in faithfulness (0.83) with 4.94s processing time, followed by Nova Pro (0.8 faithfulness, 2.03s). Interestingly, Nova Micro (0.7 faithfulness, 1.43s) outperformed Nova Lite (0.63 faithfulness, 1.53s) in this particular task despite its smaller size. These measurements provide valuable insights into the performance-speed characteristics across the Amazon Nova model family for text-processing applications. The following graphs show these results. The following screenshot shows a sample output for our summarization task, including the LLM-generated meeting summary and a list of action items.

Conclusion
In this post, we showed how you can use prompting to generate meeting insights such as meeting summaries and action items using Amazon Nova models available through Amazon Bedrock. For large-scale AI-driven meeting summarization, optimizing latency, cost, and accuracy is essential. The Amazon Nova family of understanding models (Nova Micro, Nova Lite, Nova Pro, and Nova Premier) offers a practical alternative to high-end models, significantly improving inference speed while reducing operational costs. These factors make Amazon Nova an attractive choice for enterprises handling large volumes of meeting data at scale.
For more information on Amazon Bedrock and the latest Amazon Nova models, refer to the Amazon Bedrock User Guide and Amazon Nova User Guide, respectively. The AWS Generative AI Innovation Center has a group of AWS science and strategy experts with comprehensive expertise spanning the generative AI journey, helping customers prioritize use cases, build a roadmap, and move solutions into production. Check out the Generative AI Innovation Center for our latest work and customer success stories.

About the Authors
Baishali Chaudhury is an Applied Scientist at the Generative AI Innovation Center at AWS, where she focuses on advancing generative AI solutions for real-world applications. She has a strong background in computer vision, machine learning, and AI for healthcare. Baishali holds a PhD in Computer Science from the University of South Florida and completed a postdoc at Moffitt Cancer Center.
Sungmin Hong is a Senior Applied Scientist at the Amazon Generative AI Innovation Center, where he helps expedite a variety of use cases for AWS customers. Before joining Amazon, Sungmin was a postdoctoral research fellow at Harvard Medical School. He holds a Ph.D. in Computer Science from New York University. Outside of work, he prides himself on keeping his indoor plants alive for 3+ years.
Mengdie (Flora) Wang is a Data Scientist at AWS Generative AI Innovation Center, where she works with customers to architect and implement scalable Generative AI solutions that address their unique business challenges. She specializes in model customization techniques and agent-based AI systems, helping organizations harness the full potential of generative AI technology. Prior to AWS, Flora earned her Master’s degree in Computer Science from the University of Minnesota, where she developed her expertise in machine learning and artificial intelligence.
Anila Joshi has more than a decade of experience building AI solutions. As an AWSI Geo Leader at the AWS Generative AI Innovation Center, Anila pioneers innovative applications of AI that push the boundaries of possibility and accelerates the adoption of AWS services by helping customers ideate, identify, and implement secure generative AI solutions.

Building a custom text-to-SQL agent using Amazon Bedrock and Converse …

Developing robust text-to-SQL capabilities is a critical challenge in the field of natural language processing (NLP) and database management, particularly when dealing with complex queries and database structures. In this post, we introduce a straightforward but powerful text-to-SQL solution, with accompanying code, built on a custom agent implementation along with Amazon Bedrock and the Converse API.
The ability to translate natural language queries into SQL statements is a game-changer for businesses and organizations because users can now interact with databases in a more intuitive and accessible manner. However, the complexity of database schemas, relationships between tables, and the nuances of natural language can often lead to inaccurate or incomplete SQL queries. This not only compromises the integrity of the data but also hinders the overall user experience. Through a straightforward yet powerful architecture, the agent can understand your query, develop a plan of execution, create SQL statements, self-correct if there is a SQL error, and learn from its execution to improve in the future. Over time, the agent can develop a cohesive understanding of what to do and what not to do to efficiently answer user queries.
Solution overview
The solution is composed of an AWS Lambda function that contains the agent logic, Amazon DynamoDB for long-term memory retention, Anthropic's Claude Sonnet in Amazon Bedrock called through the Converse API, AWS Secrets Manager for retrieving database connection details and credentials, and Amazon Relational Database Service (Amazon RDS) hosting an example Postgres database called HR Database. The Lambda function is connected to a virtual private cloud (VPC) and communicates with DynamoDB, Amazon Bedrock, and Secrets Manager through AWS PrivateLink VPC endpoints, so the Lambda function can reach the RDS database while keeping traffic private on the AWS network.
In the demo, you interact with the agent through the Lambda function. You can provide it a natural language query, such as “How many employees are there in each department in each region?” or “What is the employee mix by gender in each region?” The following is the solution architecture.

A custom agent built using the Converse API
The Converse API is provided by Amazon Bedrock so that you can create conversational applications. It enables powerful features such as tool use. Tool use is the ability of a large language model (LLM) to choose from a list of tools, such as running SQL queries against a database, and decide which tool to use depending on the context of the conversation. Using the Converse API also means you can maintain a series of messages between the User and Assistant roles to carry out a chat with an LLM such as Anthropic's Claude 3.5 Sonnet. In this post, a custom agent called ConverseSQLAgent was created specifically for long-running agent executions and for following a plan of execution.
The Agent loop: Agent planning, self-correction, and long-term learning
The agent contains several key features: planning and carry-over, execution and tool use, SQLAlchemy and self-correction, and reflection and long-term learning using memory.
Planning and carry-over
The first step the agent takes is to create a plan of execution for the text-to-SQL task. It first thinks through what the user is asking and develops a plan for how it will fulfill the user's request. This behavior is controlled using a system prompt, which defines how the agent should behave. After the agent thinks through what it should do, it outputs the plan.
One of the challenges with long-running agent execution is that the agent will sometimes forget the plan it was supposed to execute as the context grows longer while it works through its steps. One of the primary ways to deal with this is by “carrying over” the initial plan, injecting it back into a section of the system prompt. The system prompt is part of every Converse API call, and this improves the agent's ability to follow its plan. Because the agent may revise its plan as it progresses, the plan in the system prompt is updated as new plans emerge. The following figure shows how the carry-over works.

Execution and tool use
After the plan has been created, the agent executes it one step at a time. It might decide to call on one or more tools it has access to. With the Converse API, you can pass in a toolConfig that contains the toolSpec for each tool the agent has access to. The toolSpec defines what the tool is, a description of the tool, and the parameters that the tool requires. When the LLM decides to use a tool, it outputs a tool use block as part of its response. The application, in this case the Lambda code, needs to identify that tool use block, execute the corresponding tool, append the tool result to the message list, and call the Converse API again. As shown at (a) in the following figure, you can add tools for the LLM to choose from by adding a toolConfig along with toolSpecs. Part (b) shows that in the implementation of ConverseSQLAgent, tool groups contain a collection of tools, and each tool contains the toolSpec and the callable function. The tool groups are added to the agent, which in turn adds them to the Converse API call. Tool group instructions are additional instructions on how to use the tool group that get injected into the system prompt. Although you can add descriptions to each individual tool, tool group-wide instructions enable more effective usage of the group. The sketch that follows illustrates this loop.
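The following is a minimal sketch of the tool-use loop with the Bedrock Converse API. The model ID, the invoke_sql_query toolSpec, and the run_sql helper are assumptions for illustration; the actual ConverseSQLAgent in the demo code is more elaborate.

import boto3

bedrock = boto3.client("bedrock-runtime")

def run_sql(query: str) -> str:
    # Hypothetical helper; the real agent executes the query with SQLAlchemy.
    return "department,count\nSales,42\nEngineering,77"

# Hypothetical tool definition; the demo registers tool groups with richer specs.
tool_config = {
    "tools": [{
        "toolSpec": {
            "name": "invoke_sql_query",
            "description": "Run a SQL query against the HR database and return rows.",
            "inputSchema": {"json": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            }},
        }
    }]
}

messages = [{"role": "user", "content": [{"text": "How many employees are in each department?"}]}]

while True:
    response = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # assumed model ID
        messages=messages,
        toolConfig=tool_config,
    )
    output = response["output"]["message"]
    messages.append(output)  # keep the assistant turn in the conversation
    if response["stopReason"] != "tool_use":
        break
    # Execute each tool use block and return the results to the model.
    tool_results = []
    for block in output["content"]:
        if "toolUse" in block:
            tool_use = block["toolUse"]
            tool_results.append({"toolResult": {
                "toolUseId": tool_use["toolUseId"],
                "content": [{"text": run_sql(tool_use["input"]["query"])}],
            }})
    messages.append({"role": "user", "content": tool_results})

print(next(b["text"] for b in output["content"] if "text" in b))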

SQLAlchemy and self-correction
The SQL tool group (these tools are part of the demo code provided), as shown in the preceding figure, is implemented using SQLAlchemy, which is a Python SQL toolkit you can use to interface with different databases without having to worry about database-specific SQL syntax. You can connect to Postgres, MySQL, and more without having to change your code every time.
In this post, there is an InvokeSQLQuery tool that allows the agent to execute arbitrary SQL statements. Although almost all database-specific tasks, such as looking up schemas and tables, can be accomplished through InvokeSQLQuery, it's better to provide SQLAlchemy implementations for specific tasks, such as GetDatabaseSchemas, which gets every schema in the database, greatly reducing the time it takes for the agent to generate the correct query. Think of it as giving the agent a shortcut to the information it needs. The agent can make errors when querying the database through the InvokeSQLQuery tool. In that case, the tool responds with the error it encountered, and the agent can perform self-correction to fix the query. This flow is shown in the following diagram; a minimal sketch of such a tool follows.
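Below is a minimal sketch of how an InvokeSQLQuery-style tool could be implemented with SQLAlchemy so that database errors are returned to the agent instead of raised. The connection URL is a placeholder assumption; the demo retrieves credentials from Secrets Manager and connects to the RDS Postgres HR database, and its actual implementation may differ.

from sqlalchemy import create_engine, text

# Placeholder connection URL; the demo builds this from Secrets Manager values.
engine = create_engine("postgresql+psycopg2://user:password@host:5432/hr")

def invoke_sql_query(query: str) -> str:
    """Run an arbitrary SQL statement and return rows, or the error text
    so the agent can self-correct on the next turn."""
    try:
        with engine.connect() as conn:
            rows = conn.execute(text(query)).fetchall()
        return "\n".join(str(row) for row in rows) or "Query returned no rows."
    except Exception as exc:  # surface the database error to the agent
        return f"SQL error: {exc}"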

Reflection and long-term learning using memory
Although self-correction is an important feature of the agent, the agent must also be able to learn from its mistakes to avoid repeating them in the future. Otherwise, the agent will continue to make the same mistake, greatly reducing its effectiveness and efficiency. The agent maintains a hierarchical memory structure, as shown in the following figure. The agent decides how to structure its memory; the figure shows one example of how it might do so.

The agent can reflect on its execution, learn best practices and error avoidance, and save these lessons to long-term memory. Long-term memory is implemented as a hierarchical memory structure in Amazon DynamoDB. The agent maintains a main memory that has pointers to the other memories it holds. Each memory is represented as a record in a DynamoDB table. As the agent learns through its execution and encounters errors, it can update its main memory and create new memories by maintaining an index of memories in the main memory. It can then tap into this memory in the future to avoid errors and even improve query efficiency by caching facts.
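One way such a hierarchical memory could be laid out in DynamoDB is sketched below. The table name, key schema, and attribute names are assumptions for illustration, not the demo's actual schema.

import uuid
import boto3

# Illustrative layout for the hierarchical memory described above.
# The table name and attribute names are assumptions, not the demo's schema.
dynamodb = boto3.resource("dynamodb")
memory_table = dynamodb.Table("agent-memory")

def save_memory(title: str, lesson: str) -> str:
    """Persist a new memory record and register it in the main memory index."""
    memory_id = str(uuid.uuid4())
    memory_table.put_item(Item={"memory_id": memory_id, "title": title, "body": lesson})
    # Append a pointer to the main memory record so the agent can find it later.
    memory_table.update_item(
        Key={"memory_id": "main"},
        UpdateExpression="SET memory_index = list_append(if_not_exists(memory_index, :empty), :ptr)",
        ExpressionAttributeValues={":ptr": [{"memory_id": memory_id, "title": title}], ":empty": []},
    )
    return memory_id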
Prerequisites
Before you get started, make sure you have the following prerequisites:

An AWS account with an AWS Identity and Access Management (IAM) user with permissions to deploy the CloudFormation template
The AWS Command Line Interface (AWS CLI) installed and configured for use
Python 3.11 or later
Amazon Bedrock model access to Anthropic’s Claude 3.5 Sonnet

Deploy the solution
The full code and instructions are available in GitHub in the Readme file.

Clone the code to your working environment:

git clone https://github.com/aws-samples/aws-field-samples.git

Move to the ConverseSqlAgent folder
Follow the steps in the Readme file in the GitHub repo

Cleanup
To dispose of the stack afterwards, invoke the following command:
cdk destroy
Conclusion
The development of robust text-to-SQL capabilities is a critical challenge in natural language processing and database management. Although current approaches have made progress, there remains room for improvement, particularly with complex queries and database structures. The introduction of the ConverseSQLAgent, a custom agent implementation using Amazon Bedrock and Converse API, presents a promising solution to this problem. The agent’s architecture, featuring planning and carry-over, execution and tool use, self-correction through SQLAlchemy, and reflection-based long-term learning, demonstrates its ability to understand natural language queries, develop and execute SQL plans, and continually improve its capabilities. As businesses seek more intuitive ways to access and manage data, solutions such as the ConverseSQLAgent hold the potential to bridge the gap between natural language and structured database queries, unlocking new levels of productivity and data-driven decision-making. To dive deeper and learn more about generative AI, check out these additional resources:

Amazon Bedrock
Amazon Bedrock Knowledge Bases
Generative AI use cases
Amazon Bedrock Agents
Carry out a conversation with the Converse API operations

About the authors
Pavan Kumar is a Solutions Architect at Amazon Web Services (AWS), helping customers design robust, scalable solutions on the cloud across multiple industries. With a background in enterprise architecture and software development, Pavan has contributed to creating solutions to handle API security, API management, microservices, and geospatial information system use cases for his customers. He is passionate about learning new technologies and solving, automating, and simplifying customer problems using these solutions.
Abdullah Siddiqui is a Partner Sales Solutions Architect at Amazon Web Services (AWS) based out of Toronto. He helps AWS Partners and customers build solutions using AWS services and specializes in resilience and migrations. In his spare time, he enjoys spending time with his family and traveling.
Parag Srivastava is a Solutions Architect at Amazon Web Services (AWS), helping enterprise customers with successful cloud adoption and migration. During his professional career, he has been extensively involved in complex digital transformation projects. He is also passionate about building innovative solutions around geospatial aspects of addresses.

Accelerate threat modeling with generative AI

In this post, we explore how generative AI can revolutionize threat modeling practices by automating vulnerability identification, generating comprehensive attack scenarios, and providing contextual mitigation strategies. Unlike previous automation attempts that struggled with the creative and contextual aspects of threat analysis, generative AI overcomes these limitations through its ability to understand complex system relationships, reason about novel attack vectors, and adapt to unique architectural patterns. Where traditional automation tools relied on rigid rule sets and predefined templates, AI models can now interpret nuanced system designs, infer security implications across components, and generate threat scenarios that human analysts might overlook, making effective automated threat modeling a practical reality.
Threat modeling and why it matters
Threat modeling is a structured approach to identifying, quantifying, and addressing security risks associated with an application or system. It involves analyzing the architecture from an attacker’s perspective to discover potential vulnerabilities, determine their impact, and implement appropriate mitigations. Effective threat modeling examines data flows, trust boundaries, and potential attack vectors to create a comprehensive security strategy tailored to the specific system.
In a shift-left approach to security, threat modeling serves as a critical early intervention. By implementing threat modeling during the design phase—before a single line of code is written—organizations can identify and address potential vulnerabilities at their inception point. The following diagram illustrates this workflow.

This proactive strategy significantly reduces the accumulation of security debt and transforms security from a bottleneck into an enabler of innovation. When security considerations are integrated from the beginning, teams can implement appropriate controls throughout the development lifecycle, resulting in more resilient systems built from the ground up.
Despite these clear benefits, threat modeling remains underutilized in the software development industry. This limited adoption stems from several significant challenges inherent to traditional threat modeling approaches:

Time requirements – The process takes 1–8 days to complete, with multiple iterations needed for full coverage. This conflicts with tight development timelines in modern software environments.
Inconsistent assessment – Threat modeling suffers from subjectivity. Security experts often vary in their threat identification and risk level assignments, creating inconsistencies across projects and teams.
Scaling limitations – Manual threat modeling can’t effectively address modern system complexity. The growth of microservices, cloud deployments, and system dependencies outpaces security teams’ capacity to identify vulnerabilities.

How generative AI can help
Generative AI has revolutionized threat modeling by automating traditionally complex analytical tasks that required human judgment, reasoning, and expertise. Generative AI brings powerful capabilities to threat modeling, combining natural language processing with visual analysis to simultaneously evaluate system architectures, diagrams, and documentation. Drawing from extensive security databases like MITRE ATT&CK and OWASP, these models can quickly identify potential vulnerabilities across complex systems. This dual capability of processing both text and visuals while referencing comprehensive security frameworks enables faster, more thorough threat assessments than traditional manual methods.
Our solution, Threat Designer, uses enterprise-grade foundation models (FMs) available in Amazon Bedrock to transform threat modeling. Using Anthropic’s Claude Sonnet 3.7 advanced multimodal capabilities, we create comprehensive threat assessments at scale. You can also use other available models from the model catalog or use your own fine-tuned model, giving you maximum flexibility to use pre-trained expertise or custom-tailored capabilities specific to your security domain and organizational requirements. This adaptability makes sure your threat modeling solution delivers precise insights aligned with your unique security posture.
Solution overview
Threat Designer is a user-friendly web application that makes advanced threat modeling accessible to development and security teams. Threat Designer uses large language models (LLMs) to streamline the threat modeling process and identify vulnerabilities with minimal human effort.
Key features include:

Architecture diagram analysis – Users can submit system architecture diagrams, which the application processes using multimodal AI capabilities to understand system components and relationships
Interactive threat catalog – The system generates a comprehensive catalog of potential threats that users can explore, filter, and refine through an intuitive interface
Iterative refinement – With the replay functionality, teams can rerun the threat modeling process with design improvements or modifications, and see how changes impact the system’s security posture
Standardized exports – Results can be exported in PDF or DOCX formats, facilitating integration with existing security documentation and compliance processes
Serverless architecture – The solution runs on a cloud-based serverless infrastructure, alleviating the need for dedicated servers and providing automatic scaling based on demand

The following diagram illustrates the Threat Designer architecture.

The solution is built on a serverless stack, using AWS managed services for automatic scaling, high availability, and cost-efficiency. The solution is composed of the following core components:

Frontend – AWS Amplify hosts a ReactJS application built with the Cloudscape design system, providing the UI
Authentication – Amazon Cognito manages the user pool, handling authentication flows and securing access to application resources
API layer – Amazon API Gateway serves as the communication hub, providing proxy integration between frontend and backend services with request routing and authorization
Data storage – We use the following services for storage:

Two Amazon DynamoDB tables:

The agent execution state table maintains processing state
The threat catalog table stores identified threats and vulnerabilities

An Amazon Simple Storage Service (Amazon S3) architecture bucket stores system diagrams and artifacts

Generative AI – Amazon Bedrock provides the FM for threat modeling, analyzing architecture diagrams and identifying potential vulnerabilities
Backend service – An AWS Lambda function contains the REST interface business logic, built using Powertools for AWS Lambda (Python)
Agent service – Hosted on a Lambda function, the agent service works asynchronously to manage threat analysis workflows, processing diagrams and maintaining execution state in DynamoDB

Agent service workflow
The agent service is built on LangGraph by LangChain, with which we can orchestrate complex workflows through a graph-based structure. This approach incorporates two key design patterns:

Separation of concerns – The threat modeling process is decomposed into discrete, specialized steps that can be executed independently and iteratively. Each node in the graph represents a specific function, such as image processing, asset identification, data flow analysis, or threat enumeration.
Structured output – Each component in the workflow produces standardized, well-defined outputs that serve as inputs to subsequent steps, providing consistency and facilitating downstream integrations for consistent representation.

The agent workflow follows a directed graph where processing begins at the Start node and proceeds through several specialized stages, as illustrated in the following diagram.

The workflow includes the following nodes:

Image processing – The Image processing node processes the architecture diagram image and converts it into the appropriate format for the LLM to consume
Assets – This information, along with textual descriptions, feeds into the Assets node, which identifies and catalogs system components
Flows – The workflow then progresses to the Flows node, mapping data movements and trust boundaries between components
Threats – Lastly, the Threats node uses this information to identify potential vulnerabilities and attack vectors

A critical innovation in our agent architecture is the adaptive iteration mechanism implemented through conditional edges in the graph. This feature addresses one of the fundamental challenges in LLM-based threat modeling: controlling the comprehensiveness and depth of the analysis.
The conditional edge after the Threats node enables two powerful operational modes:

User-controlled iteration – In this mode, the user specifies the number of iterations the agent should perform. With each pass through the loop, the agent enriches the threat catalog by analyzing edge cases that might have been overlooked in previous iterations. This approach gives security professionals direct control over the thoroughness of the analysis.
Autonomous gap analysis – In fully agentic mode, a specialized gap analysis component evaluates the current threat catalog. This component identifies potential blind spots or underdeveloped areas in the threat model and triggers additional iterations until it determines the threat catalog is sufficiently comprehensive. The agent essentially performs its own quality assurance, continuously refining its output until it meets predefined completeness criteria.
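The following is a minimal sketch of how the conditional edge after the Threats node could be wired with LangGraph. The node names, state fields, and stopping checks are assumptions for illustration; Threat Designer's actual graph includes the image processing, assets, flows, and gap-analysis nodes described above.

from typing import TypedDict
from langgraph.graph import StateGraph, END

class ThreatState(TypedDict):
    threats: list
    iterations: int
    max_iterations: int        # user-controlled mode
    catalog_complete: bool     # set by the gap-analysis check

def threats_node(state: ThreatState) -> ThreatState:
    # Enrich the threat catalog (LLM call omitted for brevity).
    state["iterations"] += 1
    return state

def should_iterate(state: ThreatState) -> str:
    # Autonomous mode stops when gap analysis marks the catalog complete;
    # user-controlled mode stops after the requested number of iterations.
    if state["catalog_complete"] or state["iterations"] >= state["max_iterations"]:
        return "finalize"
    return "iterate"

builder = StateGraph(ThreatState)
builder.add_node("threats", threats_node)
builder.set_entry_point("threats")
builder.add_conditional_edges("threats", should_iterate, {"iterate": "threats", "finalize": END})
graph = builder.compile()

graph.invoke({"threats": [], "iterations": 0, "max_iterations": 3, "catalog_complete": False})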

Prerequisites
Before you deploy Threat Designer, make sure you have the required prerequisites in place. For more information, refer to the GitHub repo.
Get started with Threat Designer
To start using Threat Designer, follow the step-by-step deployment instructions from the project’s README available in GitHub. After you deploy the solution, you’re ready to create your first threat model. Log in and complete the following steps:

Choose Submit threat model to initiate a new threat model.
Complete the submission form with your system details:

Required fields: Provide a title and architecture diagram image.
Recommended fields: Provide a solution description and assumptions (these significantly improve the quality of the threat model).

Configure analysis parameters:

Choose your iteration mode:

Auto (default): The agent intelligently determines when the threat catalog is comprehensive.
Manual: Specify up to 15 iterations for more control.

Configure your reasoning boost to specify how much time the model spends on analysis (available when using Anthropic’s Claude Sonnet 3.7).

Choose Start threat modeling to launch the analysis.

You can monitor progress through the intuitive interface, which displays each execution step in real time. The complete analysis typically takes 5–15 minutes, depending on system complexity and selected parameters.

When the analysis is complete, you will have access to a comprehensive threat model that you can explore, refine, and export.

Clean up
To avoid incurring future charges, delete the solution by running the ./destroy.sh script. Refer to the README for more details.
Conclusion
In this post, we demonstrated how generative AI transforms threat modeling from an exclusive, expert-driven process into an accessible security practice for all development teams. By using FMs through our Threat Designer solution, we’ve democratized sophisticated security analysis, enabling organizations to identify vulnerabilities earlier and more consistently. This AI-powered approach removes the traditional barriers of time, expertise, and scalability, making shift-left security a practical reality rather than just an aspiration—ultimately building more resilient systems without sacrificing development velocity.
Deploy Threat Designer following the README instructions, upload your architecture diagram, and quickly receive AI-generated security insights. This streamlined approach helps you integrate proactive security measures into your development process without compromising speed or innovation—making comprehensive threat modeling accessible to teams of different sizes.

About the Authors
Edvin Hallvaxhiu is a senior security architect at Amazon Web Services, specialized in cybersecurity and automation. He helps customers design secure, compliant cloud solutions.
Sindi Cali is a consultant with AWS Professional Services. She supports customers in building data-driven applications in AWS.
Aditi Gupta is a Senior Global Engagement Manager at AWS ProServe. She specializes in delivering impactful Big Data and AI/ML solutions that enable AWS customers to maximize their business value through data utilization.
Rahul Shaurya is a Principal Data Architect at Amazon Web Services. He helps and works closely with customers building data platforms and analytical applications on AWS.

Building High-Performance Financial Analytics Pipelines with Polars: L …

In this tutorial, we delve into building an advanced data analytics pipeline using Polars, a lightning-fast DataFrame library designed for optimal performance and scalability. Our goal is to demonstrate how we can utilize Polars’ lazy evaluation, complex expressions, window functions, and SQL interface to process large-scale financial datasets efficiently. We begin by generating a synthetic financial time series dataset and move step-by-step through an end-to-end pipeline, from feature engineering and rolling statistics to multi-dimensional analysis and ranking. Throughout, we demonstrate how Polars empowers us to write expressive and performant data transformations, all while maintaining low memory usage and ensuring fast execution.

import polars as pl
import numpy as np
from datetime import datetime, timedelta
import io

try:
    import polars as pl
except ImportError:
    import subprocess
    subprocess.run(["pip", "install", "polars"], check=True)
    import polars as pl

print(" Advanced Polars Analytics Pipeline")
print("=" * 50)

We begin by importing the essential libraries, including Polars for high-performance DataFrame operations and NumPy for generating synthetic data. To ensure compatibility, we add a fallback installation step for Polars in case it isn’t already installed. With the setup ready, we signal the start of our advanced analytics pipeline.

np.random.seed(42)
n_records = 100000
dates = [datetime(2020, 1, 1) + timedelta(days=i//100) for i in range(n_records)]
tickers = np.random.choice(['AAPL', 'GOOGL', 'MSFT', 'TSLA', 'AMZN'], n_records)

# Create complex synthetic dataset
data = {
    'timestamp': dates,
    'ticker': tickers,
    'price': np.random.lognormal(4, 0.3, n_records),
    'volume': np.random.exponential(1000000, n_records).astype(int),
    'bid_ask_spread': np.random.exponential(0.01, n_records),
    'market_cap': np.random.lognormal(25, 1, n_records),
    'sector': np.random.choice(['Tech', 'Finance', 'Healthcare', 'Energy'], n_records)
}

print(f" Generated {n_records:,} synthetic financial records")

We generate a rich, synthetic financial dataset with 100,000 records using NumPy, simulating daily stock data for major tickers such as AAPL and TSLA. Each entry includes key market features such as price, volume, bid-ask spread, market cap, and sector. This provides a realistic foundation for demonstrating advanced Polars analytics on a time-series dataset.

lf = pl.LazyFrame(data)

result = (
    lf
    .with_columns([
        pl.col('timestamp').dt.year().alias('year'),
        pl.col('timestamp').dt.month().alias('month'),
        pl.col('timestamp').dt.weekday().alias('weekday'),
        pl.col('timestamp').dt.quarter().alias('quarter')
    ])

    .with_columns([
        pl.col('price').rolling_mean(20).over('ticker').alias('sma_20'),
        pl.col('price').rolling_std(20).over('ticker').alias('volatility_20'),

        pl.col('price').ewm_mean(span=12).over('ticker').alias('ema_12'),

        pl.col('price').diff().alias('price_diff'),

        (pl.col('volume') * pl.col('price')).alias('dollar_volume')
    ])

    .with_columns([
        pl.col('price_diff').clip(0, None).rolling_mean(14).over('ticker').alias('rsi_up'),
        pl.col('price_diff').abs().rolling_mean(14).over('ticker').alias('rsi_down'),

        (pl.col('price') - pl.col('sma_20')).alias('bb_position')
    ])

    .with_columns([
        (100 - (100 / (1 + pl.col('rsi_up') / pl.col('rsi_down')))).alias('rsi')
    ])

    .filter(
        (pl.col('price') > 10) &
        (pl.col('volume') > 100000) &
        (pl.col('sma_20').is_not_null())
    )

    .group_by(['ticker', 'year', 'quarter'])
    .agg([
        pl.col('price').mean().alias('avg_price'),
        pl.col('price').std().alias('price_volatility'),
        pl.col('price').min().alias('min_price'),
        pl.col('price').max().alias('max_price'),
        pl.col('price').quantile(0.5).alias('median_price'),

        pl.col('volume').sum().alias('total_volume'),
        pl.col('dollar_volume').sum().alias('total_dollar_volume'),

        pl.col('rsi').filter(pl.col('rsi').is_not_null()).mean().alias('avg_rsi'),
        pl.col('volatility_20').mean().alias('avg_volatility'),
        pl.col('bb_position').std().alias('bollinger_deviation'),

        pl.len().alias('trading_days'),
        pl.col('sector').n_unique().alias('sectors_count'),

        (pl.col('price') > pl.col('sma_20')).mean().alias('above_sma_ratio'),

        ((pl.col('price').max() - pl.col('price').min()) / pl.col('price').min())
        .alias('price_range_pct')
    ])

    .with_columns([
        pl.col('total_dollar_volume').rank(method='ordinal', descending=True).alias('volume_rank'),
        pl.col('price_volatility').rank(method='ordinal', descending=True).alias('volatility_rank')
    ])

    .filter(pl.col('trading_days') >= 10)
    .sort(['ticker', 'year', 'quarter'])
)

We load our synthetic dataset into a Polars LazyFrame to enable deferred execution, allowing us to chain complex transformations efficiently. From there, we enrich the data with time-based features and apply advanced technical indicators, such as moving averages, RSI, and Bollinger bands, using window and rolling functions. We then perform grouped aggregations by ticker, year, and quarter to extract key financial statistics and indicators. Finally, we rank the results based on volume and volatility, filter out under-traded segments, and sort the data for intuitive exploration, all while leveraging Polars’ powerful lazy evaluation engine to its full advantage.

df = result.collect()
print(f"\n Analysis Results: {df.height:,} aggregated records")
print("\nTop 10 High-Volume Quarters:")
print(df.sort('total_dollar_volume', descending=True).head(10).to_pandas())

print("\n Advanced Analytics:")

pivot_analysis = (
    df.group_by('ticker')
    .agg([
        pl.col('avg_price').mean().alias('overall_avg_price'),
        pl.col('price_volatility').mean().alias('overall_volatility'),
        pl.col('total_dollar_volume').sum().alias('lifetime_volume'),
        pl.col('above_sma_ratio').mean().alias('momentum_score'),
        pl.col('price_range_pct').mean().alias('avg_range_pct')
    ])
    .with_columns([
        (pl.col('overall_avg_price') / pl.col('overall_volatility')).alias('risk_adj_score'),

        (pl.col('momentum_score') * 0.4 +
         pl.col('avg_range_pct') * 0.3 +
         (pl.col('lifetime_volume') / pl.col('lifetime_volume').max()) * 0.3)
        .alias('composite_score')
    ])
    .sort('composite_score', descending=True)
)

print("\n Ticker Performance Ranking:")
print(pivot_analysis.to_pandas())

Once our lazy pipeline is complete, we collect the results into a DataFrame and immediately review the top 10 quarters based on total dollar volume. This helps us identify periods of intense trading activity. We then take our analysis a step further by grouping the data by ticker to compute higher-level insights, such as lifetime trading volume, average price volatility, and a custom composite score. This multi-dimensional summary helps us compare stocks not just by raw volume, but also by momentum and risk-adjusted performance, unlocking deeper insights into overall ticker behavior.

print("\n SQL Interface Demo:")
pl.Config.set_tbl_rows(5)

sql_result = pl.sql("""
    SELECT
        ticker,
        AVG(avg_price) as mean_price,
        STDDEV(price_volatility) as volatility_consistency,
        SUM(total_dollar_volume) as total_volume,
        COUNT(*) as quarters_tracked
    FROM df
    WHERE year >= 2021
    GROUP BY ticker
    ORDER BY total_volume DESC
""", eager=True)

print(sql_result)

print(f"\n Performance Metrics:")
print(f" • Lazy evaluation optimizations applied")
print(f" • {n_records:,} records processed efficiently")
print(f" • Memory-efficient columnar operations")
print(f" • Zero-copy operations where possible")

print(f"\n Export Options:")
print(" • Parquet (high compression): df.write_parquet('data.parquet')")
print(" • Delta Lake: df.write_delta('delta_table')")
print(" • JSON streaming: df.write_ndjson('data.jsonl')")
print(" • Apache Arrow: df.to_arrow()")

print("\n Advanced Polars pipeline completed successfully!")
print(" Demonstrated: Lazy evaluation, complex expressions, window functions,")
print(" SQL interface, advanced aggregations, and high-performance analytics")

We wrap up the pipeline by showcasing Polars’ elegant SQL interface, running an aggregate query to analyze post-2021 ticker performance with familiar SQL syntax. This hybrid capability enables us to blend expressive Polars transformations with declarative SQL queries seamlessly. To highlight its efficiency, we print key performance metrics, emphasizing lazy evaluation, memory efficiency, and zero-copy execution. Finally, we demonstrate how easily we can export results in various formats, such as Parquet, Arrow, and JSONL, making this pipeline both powerful and production-ready. With that, we complete a full-circle, high-performance analytics workflow using Polars.

In conclusion, we’ve seen firsthand how Polars’ lazy API can optimize complex analytics workflows that would otherwise be sluggish in traditional tools. We’ve developed a comprehensive financial analysis pipeline, spanning from raw data ingestion to rolling indicators, grouped aggregations, and advanced scoring, all executed with blazing speed. Not only that, but we also tapped into Polars’ powerful SQL interface to run familiar queries seamlessly over our DataFrames. This dual ability to write both functional-style expressions and SQL makes Polars an incredibly flexible tool for any data scientist.

The post Building High-Performance Financial Analytics Pipelines with Polars: Lazy Evaluation, Advanced Expressions, and SQL Integration appeared first on MarkTechPost.

From Fine-Tuning to Prompt Engineering: Theory and Practice for Effici …

The Challenge of Fine-Tuning Large Transformer Models

Self-attention enables transformer models to capture long-range dependencies in text, which is crucial for comprehending complex language patterns. These models work efficiently with massive datasets and achieve remarkable performance without needing task-specific structures. As a result, they are widely applied across industries, including software development, education, and content generation.

A key limitation in applying these powerful models is the reliance on supervised fine-tuning. Adapting a base transformer to a specific task typically involves retraining the model with labeled data, which demands significant computational resources, sometimes amounting to thousands of GPU hours. This presents a major barrier for organizations that lack access to such hardware or seek quicker adaptation times. Consequently, there is a pressing need for methods that can elicit task-specific capabilities from pre-trained transformers without modifying their parameters.

Inference-Time Prompting as an Alternative to Fine-Tuning

To address this issue, researchers have explored inference-time techniques that guide the model’s behavior using example-based inputs, bypassing the need for parameter updates. Among these methods, in-context learning has emerged as a practical approach where a model receives a sequence of input-output pairs to generate predictions for new inputs. Unlike traditional training, these techniques operate during inference, enabling the base model to exhibit desired behaviors solely based on context. Despite their promise, there has been limited formal proof to confirm that such techniques can consistently match fine-tuned performance.

Theoretical Framework: Approximating Fine-Tuned Models via In-Context Learning

Researchers from Patched Codes, Inc. introduced a method grounded in the Turing completeness of transformers, demonstrating that a base model can approximate the behavior of a fine-tuned model using in-context learning, provided sufficient computational resources and access to the original training dataset. Their theoretical framework offers a quantifiable approach to understanding how dataset size, context length, and task complexity influence the quality of the approximation. The analysis specifically examines two task types—text generation and linear classification—and establishes bounds on dataset requirements to achieve fine-tuned-like outputs with a defined error margin.

Prompt Design and Theoretical Guarantees

The method involves designing a prompt structure that concatenates a dataset of labeled examples with a target query. The model processes this sequence, drawing patterns from the examples to generate a response. For instance, a prompt could include input-output pairs like sentiment-labeled reviews, followed by a new review whose sentiment must be predicted. The researchers constructed this process as a simulation of a Turing machine, where self-attention mimics the tape state and feed-forward layers act as transition rules. They also formalized conditions under which the total variation distance between the base and fine-tuned output distributions remains within an acceptable error ε. The paper provides a construction for this inference technique and quantifies its theoretical performance.
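As a concrete illustration of this prompt structure, the sketch below concatenates labeled sentiment examples with a target query. The formatting and examples are assumptions for illustration, not the exact construction analyzed in the paper.

# Illustrative prompt construction for in-context learning: concatenate
# labeled examples with a target query. The formatting is an assumption,
# not the exact construction analyzed in the paper.
def build_icl_prompt(examples: list[tuple[str, str]], query: str) -> str:
    lines = []
    for review, label in examples:
        lines.append(f"Review: {review}\nSentiment: {label}\n")
    lines.append(f"Review: {query}\nSentiment:")
    return "\n".join(lines)

examples = [
    ("The plot was gripping from start to finish.", "positive"),
    ("I walked out halfway through.", "negative"),
]
print(build_icl_prompt(examples, "A beautifully shot but hollow film."))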

Quantitative Results: Dataset Size and Task Complexity

The researchers provided performance guarantees based on dataset size and task type. For text generation tasks with a vocabulary of size V, the dataset must be of size O((mV/ε²) log(1/δ)) to ensure the base model approximates the fine-tuned model within an error ε across m contexts. When the output length is fixed at l, a smaller dataset of size O((l log V/ε²) log(1/δ)) suffices. For linear classification tasks where the input has dimension d, the required dataset size becomes O(d/ε), or with context constraints, O((1/ε²) log(1/δ)). These results hold under idealized assumptions but are also adapted to practical constraints such as finite context length and partial dataset availability using techniques like retrieval-augmented generation.

Implications: Towards Efficient and Scalable NLP Models

This research presents a detailed and well-structured argument demonstrating that inference-time prompting can closely match the capabilities of supervised fine-tuning, provided sufficient contextual data is supplied. It successfully identifies a path toward more resource-efficient deployment of large language models, presenting both a theoretical justification and practical techniques. The study demonstrates that leveraging a model’s latent capabilities through structured prompts is not just viable but scalable and highly effective for specific NLP tasks.

Check out the Paper. All credit for this research goes to the researchers of this project.
The post From Fine-Tuning to Prompt Engineering: Theory and Practice for Efficient Transformer Adaptation appeared first on MarkTechPost.

How to Use python-A2A to Create and Connect Financial Agents with Goog …

Python A2A is an implementation of Google’s Agent-to-Agent (A2A) protocol, which enables AI agents to communicate with each other using a shared, standardized format—eliminating the need for custom integration between services.

In this tutorial, we’ll use the decorator-based approach provided by the python-a2a library. With simple @agent and @skill decorators, you can define your agent’s identity and behavior, while the library takes care of protocol handling and message flow.

This method is perfect for quickly building useful, task-focused agents without worrying about low-level communication logic.

Installing the dependencies

To get started, you’ll need to install the python-a2a library, which provides a clean abstraction to build and run agents that follow the A2A protocol.

Open your terminal and run:

pip install python-a2a

Creating the Agents

For this tutorial, we will be creating two agents: one that calculates the EMI (equated monthly installment) on a loan given the principal, interest rate, and duration, and another that adjusts an amount for inflation over a period of years.

EMI Agent (emi_agent.py)

from python_a2a import A2AServer, skill, agent, run_server, TaskStatus, TaskState
import re

@agent(
    name="EMI Calculator Agent",
    description="Calculates EMI for a given principal, interest rate, and loan duration",
    version="1.0.0"
)
class EMIAgent(A2AServer):

    @skill(
        name="Calculate EMI",
        description="Calculates EMI given principal, annual interest rate, and duration in months",
        tags=["emi", "loan", "interest"]
    )
    def calculate_emi(self, principal: float, annual_rate: float, months: int) -> str:
        monthly_rate = annual_rate / (12 * 100)
        emi = (principal * monthly_rate * ((1 + monthly_rate) ** months)) / (((1 + monthly_rate) ** months) - 1)
        return f"The EMI for a loan of ₹{principal:.0f} at {annual_rate:.2f}% interest for {months} months is ₹{emi:.2f}"

    def handle_task(self, task):
        input_text = task.message["content"]["text"]

        # Extract values from natural language
        principal_match = re.search(r"₹?(\d{4,10})", input_text)
        rate_match = re.search(r"(\d+(\.\d+)?)\s*%", input_text)
        months_match = re.search(r"(\d+)\s*(months|month)", input_text, re.IGNORECASE)

        try:
            principal = float(principal_match.group(1)) if principal_match else 100000
            rate = float(rate_match.group(1)) if rate_match else 10.0
            months = int(months_match.group(1)) if months_match else 12

            print(f"Inputs → Principal: {principal}, Rate: {rate}, Months: {months}")
            emi_text = self.calculate_emi(principal, rate, months)

        except Exception as e:
            emi_text = f"Sorry, I couldn't parse your input. Error: {e}"

        task.artifacts = [{
            "parts": [{"type": "text", "text": emi_text}]
        }]
        task.status = TaskStatus(state=TaskState.COMPLETED)

        return task

# Run the server
if __name__ == "__main__":
    agent = EMIAgent()
    run_server(agent, port=4737)

This EMI Calculator Agent is built using the python-a2a library and follows the decorator-based approach. At the top, we use the @agent decorator to define the agent’s name, description, and version. This registers the agent so that it can communicate using the A2A protocol.

Inside the class, we define a single skill using the @skill decorator. This skill, called calculate_emi, performs the actual EMI calculation using the standard formula. The formula takes in three parameters: the loan principal, the annual interest rate, and the loan duration in months. We convert the annual rate into a monthly rate and use it to compute the monthly EMI.

The handle_task method is the core of the agent. It receives the user’s input message, extracts relevant numbers using simple regular expressions, and passes them to the calculate_emi method. 

Finally, at the bottom of the file, we launch the agent using the run_server() function on port 4737, making it ready to receive A2A protocol messages. This design keeps the agent simple, modular, and easy to extend with more skills in the future.

Inflation Agent (inflation_agent.py)

from python_a2a import A2AServer, skill, agent, run_server, TaskStatus, TaskState
import re

@agent(
    name="Inflation Adjusted Amount Agent",
    description="Calculates the future value adjusted for inflation",
    version="1.0.0"
)
class InflationAgent(A2AServer):

    @skill(
        name="Inflation Adjustment",
        description="Adjusts an amount for inflation over time",
        tags=["inflation", "adjustment", "future value"]
    )
    def handle_input(self, text: str) -> str:
        try:
            # Extract amount
            amount_match = re.search(r"₹?(\d{3,10})", text)
            amount = float(amount_match.group(1)) if amount_match else None

            # Extract rate (e.g. 6%, 7.5 percent)
            rate_match = re.search(r"(\d+(\.\d+)?)\s*(%|percent)", text, re.IGNORECASE)
            rate = float(rate_match.group(1)) if rate_match else None

            # Extract years (e.g. 5 years)
            years_match = re.search(r"(\d+)\s*(years|year)", text, re.IGNORECASE)
            years = int(years_match.group(1)) if years_match else None

            if amount is not None and rate is not None and years is not None:
                adjusted = amount * ((1 + rate / 100) ** years)
                return f"₹{amount:.2f} adjusted for {rate:.2f}% inflation over {years} years is ₹{adjusted:.2f}"

            return (
                "Please provide amount, inflation rate (e.g. 6%) and duration (e.g. 5 years).\n"
                "Example: 'What is ₹10000 worth after 5 years at 6% inflation?'"
            )
        except Exception as e:
            return f"Sorry, I couldn't compute that. Error: {e}"

    def handle_task(self, task):
        text = task.message["content"]["text"]
        result = self.handle_input(text)

        task.artifacts = [{
            "parts": [{"type": "text", "text": result}]
        }]
        task.status = TaskStatus(state=TaskState.COMPLETED)
        return task

if __name__ == "__main__":
    agent = InflationAgent()
    run_server(agent, port=4747)

This agent helps calculate how much a given amount would be worth in the future after adjusting for inflation. It uses the same decorator-based structure provided by the python-a2a library. The @agent decorator defines the metadata for this agent, and the @skill decorator registers the main logic under the name “Inflation Adjustment.”

The handle_input method is where the main processing happens. It extracts the amount, inflation rate, and number of years from the user’s input using simple regular expressions. If all three values are present, it uses the standard future value formula to calculate the inflation-adjusted amount:

Adjusted Value = amount × (1 + rate/100) ^ years.

If any value is missing, the agent returns a helpful prompt telling the user what to provide, including an example. The handle_task function connects everything by taking the user’s message, passing it to the skill function, and returning the formatted result back to the user.

Finally, the agent is launched using run_server() on port 4747, making it ready to handle A2A queries.

Creating the Agent Network

First, run both agents in two separate terminals:

python emi_agent.py

python inflation_agent.py

Each of these agents exposes a REST API endpoint (e.g. http://localhost:4737 for EMI, http://localhost:4747 for Inflation) using the A2A protocol. They listen for incoming tasks (like “calculate EMI for ₹2,00,000…”) and respond with text answers.

Now, we will add these two agents to our network

from python_a2a import AgentNetwork, A2AClient, AIAgentRouter

# Create an agent network
network = AgentNetwork(name="Economics Calculator")

# Add agents to the network
network.add("EMI", "http://localhost:4737")
network.add("Inflation", "http://localhost:4747")

Next we will create a router to intelligently direct queries to the best agent. This is a core utility of the A2A protocol—it defines a standard task format so agents can be queried uniformly, and routers can make intelligent routing decisions using LLMs.

router = AIAgentRouter(
    llm_client=A2AClient("http://localhost:5000/openai"),  # LLM for making routing decisions
    agent_network=network
)

Lastly, we will query the agents

query = "Calculate EMI for ₹200000 at 5% interest over 18 months."
agent_name, confidence = router.route_query(query)
print(f"Routing to {agent_name} with {confidence:.2f} confidence")

# Get the selected agent and ask the question
agent = network.get_agent(agent_name)
response = agent.ask(query)
print(f"Response: {response}")

query = "What is ₹1500000 worth if inflation is 9% for 10 years?"
agent_name, confidence = router.route_query(query)
print(f"Routing to {agent_name} with {confidence:.2f} confidence")

# Get the selected agent and ask the question
agent = network.get_agent(agent_name)
response = agent.ask(query)
print(f"Response: {response}")

Check out the Notebooks: inflation_agent.py, network.ipynb, and emi_agent.py.
The post How to Use python-A2A to Create and Connect Financial Agents with Google’s Agent-to-Agent (A2A) Protocol appeared first on MarkTechPost.

How Anomalo solves unstructured data quality issues to deliver trusted …

This post is co-written with Vicky Andonova and Jonathan Karon from Anomalo.
Generative AI has rapidly evolved from a novelty to a powerful driver of innovation. From summarizing complex legal documents to powering advanced chat-based assistants, AI capabilities are expanding at an increasing pace. While large language models (LLMs) continue to push new boundaries, quality data remains the deciding factor in achieving real-world impact.
A year ago, it seemed that the primary differentiator in generative AI applications would be who could afford to build or use the biggest model. But with recent breakthroughs in base model training costs (such as DeepSeek-R1) and continual price-performance improvements, powerful models are becoming a commodity. Success in generative AI is becoming less about building the right model and more about finding the right use case. As a result, the competitive edge is shifting toward data access and data quality.
In this environment, enterprises are poised to excel. They have a hidden goldmine of decades of unstructured text—everything from call transcripts and scanned reports to support tickets and social media logs. The challenge is how to use that data. Transforming unstructured files, maintaining compliance, and mitigating data quality issues all become critical hurdles when an organization moves from AI pilots to production deployments.
In this post, we explore how you can use Anomalo with Amazon Web Services (AWS) AI and machine learning (AI/ML) to profile, validate, and cleanse unstructured data collections to transform your data lake into a trusted source for production-ready AI initiatives, as shown in the following figure.

The challenge: Analyzing unstructured enterprise documents at scale
Despite the widespread adoption of AI, many enterprise AI projects fail due to poor data quality and inadequate controls. Gartner predicts that 30% of generative AI projects will be abandoned in 2025. Even the most data-driven organizations have focused primarily on using structured data, leaving unstructured content underutilized and unmonitored in data lakes or file systems. Yet, over 80% of enterprise data is unstructured (according to MIT Sloan School research), spanning everything from legal contracts and financial filings to social media posts.
For chief information officers (CIOs), chief technical officers (CTOs), and chief information security officers (CISOs), unstructured data represents both risk and opportunity. Before you can use unstructured content in generative AI applications, you must address the following critical hurdles:

Extraction – Optical character recognition (OCR), parsing, and metadata generation can be unreliable if not automated and validated. In addition, if extraction is inconsistent or incomplete, it can result in malformed data.
Compliance and security – Handling personally identifiable information (PII) or proprietary intellectual property (IP) demands rigorous governance, especially with the EU AI Act, Colorado AI Act, General Data Protection Regulation (GDPR), California Consumer Privacy Act (CCPA), and similar regulations. Sensitive information can be difficult to identify in unstructured text, leading to inadvertent mishandling of that information.
Data quality – Incomplete, deprecated, duplicative, off-topic, or poorly written data can pollute your generative AI models and Retrieval Augmented Generation (RAG) context, yielding hallucinated, out-of-date, inappropriate, or misleading outputs. Making sure that your data is high quality helps mitigate these risks (a minimal screening sketch follows this list).
Scalability and cost – Training or fine-tuning models on noisy data increases compute costs by unnecessarily growing the training dataset (training compute costs tend to grow linearly with dataset size), and processing and storing low-quality data in a vector database for RAG wastes processing and storage capacity.
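To make the data quality and compliance hurdles above concrete, here is a minimal screening sketch in Python. It is illustrative only, not Anomalo's implementation: the screen_document helper, the naive regex-based PII pattern, and the length threshold are assumptions chosen to show the kinds of checks a pipeline needs before documents reach a model or a vector store.

    import hashlib
    import re

    # Hypothetical, illustrative checks -- not Anomalo's implementation.
    # A naive email/phone regex stands in for real PII detection.
    PII_PATTERN = re.compile(
        r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"          # email addresses
        r"|\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"   # US-style phone numbers
    )

    _seen_hashes = set()  # for exact-duplicate detection across a batch

    def screen_document(text: str, min_chars: int = 200) -> list[str]:
        """Return a list of quality/compliance flags for one extracted document."""
        flags = []
        if len(text.strip()) < min_chars:
            flags.append("too_short_or_empty")   # likely failed OCR or extraction
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in _seen_hashes:
            flags.append("exact_duplicate")      # this content was already ingested
        _seen_hashes.add(digest)
        if PII_PATTERN.search(text):
            flags.append("possible_pii")         # route to legal/security review
        return flags

    if __name__ == "__main__":
        sample = "Contact jane.doe@example.com or 555-123-4567 for the contract terms."
        print(screen_document(sample))  # ['too_short_or_empty', 'possible_pii']

In practice, a production pipeline would replace the regex with dedicated PII detection and use fuzzy rather than exact duplicate matching, but even checks this simple illustrate what gets missed when ingestion is unvalidated.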

In short, generative AI initiatives often falter—not because the underlying model is insufficient, but because the existing data pipeline isn’t designed to process unstructured data and still meet high-volume, high-quality ingestion and compliance requirements. Many companies are in the early stages of addressing these hurdles and are facing these problems in their existing processes:

Manual and time-consuming – The analysis of vast collections of unstructured documents relies on manual review by employees, creating time-consuming processes that delay projects.
Error-prone – Human review is susceptible to mistakes and inconsistencies, leading to inadvertent exclusion of critical data and inclusion of incorrect data.
Resource-intensive – The manual document review process requires significant staff time that could be better spent on higher-value business activities. Budgets can’t support the level of staffing needed to vet enterprise document collections.

Although existing document analysis processes provide valuable insights, they aren’t efficient or accurate enough to meet modern business needs for timely decision-making. Organizations need a solution that can process large volumes of unstructured data and help maintain compliance with regulations while protecting sensitive information.
The solution: An enterprise-grade approach to unstructured data quality
Anomalo runs on a highly secure, scalable AWS stack that you can use to detect, isolate, and address data quality problems in unstructured data in minutes instead of weeks. This helps your data teams deliver high-value AI applications faster and with less risk. The architecture of Anomalo’s solution is shown in the following figure.

Automated ingestion and metadata extraction – Anomalo automates OCR and text parsing for PDF files, PowerPoint presentations, and Word documents stored in Amazon Simple Storage Service (Amazon S3) using auto-scaling Amazon Elastic Compute Cloud (Amazon EC2) instances, Amazon Elastic Kubernetes Service (Amazon EKS), and Amazon Elastic Container Registry (Amazon ECR).
Continuous data observability – Anomalo inspects each batch of extracted data, detecting anomalies such as truncated text, empty fields, and duplicates before the data reaches your models. In the process, it monitors the health of your unstructured pipeline, flagging surges in faulty documents or unusual data drift (for example, new file formats, an unexpected number of additions or deletions, or changes in document size). With this information reviewed and reported by Anomalo, your engineers can spend less time manually combing through logs and more time optimizing AI features, while CISOs gain visibility into data-related risks.
Governance and compliance – Built-in issue detection and policy enforcement help mask or remove PII and abusive language. If a batch of scanned documents includes personal addresses or proprietary designs, it can be flagged for legal or security review—minimizing regulatory and reputational risk. You can use Anomalo to define custom issues and metadata to be extracted from documents to solve a broad range of governance and business needs.
Scalable AI on AWS – Anomalo uses Amazon Bedrock to give enterprises a choice of flexible, scalable LLMs for analyzing document quality (see the sketch after this list). Anomalo’s modern architecture can be deployed as software as a service (SaaS) or through an Amazon Virtual Private Cloud (Amazon VPC) connection to meet your security and operational needs.
Trustworthy data for AI business applications – The validated data layer provided by Anomalo and AWS Glue helps make sure that only clean, approved content flows into your application.
Supports your generative AI architecture – Whether you fine-tune or continue pre-training an LLM to create a subject matter expert, store content in a vector database for RAG, or experiment with other generative AI architectures, clean and validated data improves application output, preserves brand trust, and mitigates business risk.
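As a rough illustration of the LLM-assisted quality checks this architecture enables, the sketch below reads extracted text files from an S3 bucket and asks a model through the Amazon Bedrock Converse API whether each document looks usable. The bucket name, prefix, model ID, prompt, and verdict labels are hypothetical; this is a sketch of the pattern, not Anomalo's product code.

    import boto3

    # Illustrative only -- bucket, prefix, model ID, and prompt are assumptions,
    # not Anomalo's implementation.
    s3 = boto3.client("s3")
    bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

    BUCKET = "example-unstructured-data"                  # hypothetical bucket of extracted text
    PREFIX = "extracted/"                                 # hypothetical prefix
    MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"   # any Bedrock text model

    PROMPT = (
        "You review extracted enterprise documents before they are indexed for RAG. "
        "Reply with one word: OK, TRUNCATED, DUPLICATE_BOILERPLATE, or OFF_TOPIC.\n\n"
        "Document:\n{doc}"
    )

    def assess_document(key: str) -> str:
        """Fetch one extracted text object and ask the model for a quality verdict."""
        body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read().decode("utf-8")
        response = bedrock.converse(
            modelId=MODEL_ID,
            messages=[{"role": "user", "content": [{"text": PROMPT.format(doc=body[:8000])}]}],
            inferenceConfig={"maxTokens": 16, "temperature": 0.0},
        )
        return response["output"]["message"]["content"][0]["text"].strip()

    if __name__ == "__main__":
        listing = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)
        for obj in listing.get("Contents", []):
            print(obj["Key"], "->", assess_document(obj["Key"]))

A real deployment would batch these calls, persist verdicts alongside document metadata, and route flagged files to human review rather than printing results.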

Impact
Using Anomalo and AWS AI/ML services for unstructured data provides these benefits:

Reduced operational burden – Anomalo’s off-the-shelf rules and evaluation engine save months of development time and ongoing maintenance, freeing time for designing new features instead of developing data quality rules.
Optimized costs – Training LLMs and ML models on low-quality data wastes precious GPU capacity, while vectorizing and storing that data for RAG increases overall operational costs, and both degrade application performance. Early data filtering cuts these hidden expenses.
Faster time to insights – Anomalo automatically classifies and labels unstructured text, giving data scientists rich data to spin up new generative prototypes or dashboards without time-consuming labeling prework.
Strengthened compliance and security – Identifying PII and adhering to data retention rules is built into the pipeline, supporting security policies and reducing the preparation needed for external audits.
Durable value – The generative AI landscape continues to evolve rapidly. Although LLM and application architecture investments may depreciate quickly, trustworthy, curated data is an investment that retains its value.

Conclusion
Generative AI has the potential to deliver massive value: Gartner estimates a 15–20% revenue increase, 15% cost savings, and a 22% productivity improvement. To achieve these results, your applications must be built on a foundation of trusted, complete, and timely data. By delivering a user-friendly, enterprise-scale solution for structured and unstructured data quality monitoring, Anomalo helps you deliver more AI projects to production faster while meeting both your user and governance requirements.
Interested in learning more? Check out Anomalo’s unstructured data quality solution and request a demo or contact us for an in-depth discussion on how to begin or scale your generative AI journey.

About the authors
Vicky Andonova is the GM of Generative AI at Anomalo, the company reinventing enterprise data quality. As a founding team member, Vicky has spent the past six years pioneering Anomalo’s machine learning initiatives, transforming advanced AI models into actionable insights that empower enterprises to trust their data. Currently, she leads a team that not only brings innovative generative AI products to market but is also building a first-in-class data quality monitoring solution specifically designed for unstructured data. Previously, at Instacart, Vicky built the company’s experimentation platform and led company-wide initiatives to improve grocery delivery quality. She holds a BE from Columbia University.
Jonathan Karon leads Partner Innovation at Anomalo. He works closely with companies across the data ecosystem to integrate data quality monitoring in key tools and workflows, helping enterprises achieve high-functioning data practices and leverage novel technologies faster. Prior to Anomalo, Jonathan created Mobile App Observability, Data Intelligence, and DevSecOps products at New Relic, and was Head of Product at a generative AI sales and customer success startup. He holds a BA in Cognitive Science from Hampshire College and has worked with AI and data exploration technology throughout his career.
Mahesh Biradar is a Senior Solutions Architect at AWS with a history in the IT and services industry. He helps SMBs in the US meet their business goals with cloud technology. He holds a Bachelor of Engineering from VJTI and is based in New York City (US).
Emad Tawfik is a seasoned Senior Solutions Architect at Amazon Web Services with more than a decade of experience. He specializes in storage and cloud solutions, crafting cost-effective and scalable architectures for customers.