Racing into the future: How AWS DeepRacer fueled my AI and ML journey

In 2018, I sat in the audience at AWS re:Invent as Andy Jassy announced AWS DeepRacer—a fully autonomous 1/18th scale race car driven by reinforcement learning. At the time, I knew little about AI or machine learning (ML). As an engineer transitioning from legacy networks to cloud technologies, I had never considered myself a developer. But AWS DeepRacer instantly captured my interest with its promise that even inexperienced developers could get involved in AI and ML.

The AWS DeepRacer League was also announced, featuring physical races at AWS Summits worldwide in 2019 and a virtual league in a simulated environment. Winners would qualify to compete for the grand champion title in Las Vegas the following year. For 2018, because AWS DeepRacer had just been unveiled, re:Invent attendees could compete in person at the MGM Grand using pre-trained models.
My colleagues and I from JigsawXYZ immediately headed to the MGM Grand after the keynote. Despite long queues, we persevered, observing others racing while we waited. Participants answered questions about driving preferences to select a pre-trained model. Unlike later competitions, racers had to physically follow the car and place it back on track when it veered off.
We noticed that the AWS-provided models were unstable and slow by today’s standards, frequently going off-track. We concluded that quickly replacing the car on the track could result in a good lap time. Using this strategy, we secured second place on the leaderboard.
The night before the finals, we learned that we had qualified because of a dropout. Panic set in as we realized we would be competing on stage in front of thousands of people while knowing little about ML. We frantically tried to train a model overnight to avoid embarrassment.
The next morning, we found ourselves in the front row of the main auditorium, next to Andy Jassy. Our boss, Rick Fish, represented our team. After an energetic introduction from Indycar commentator Ryan Myrehn, Rick set a lap time of 51.50 seconds, securing the 2018 AWS DeepRacer grand champion title!

Image 2 – Rick Fish accepting the AWS DeepRacer trophy from Matt Wood

2019: Building a community and diving deeper
Back in London, interest in AWS DeepRacer exploded. We spoke at multiple events, including hosting our own An evening with DeepRacer gathering. As the 2019 season approached, I needed to earn my own finals spot. I began training models in the AWS DeepRacer console and experimenting with the physical car, including remote control and first-person view projects.
At the 2019 London AWS Summit, I won the AWS DeepRacer Championship with a lap time of 8.9 seconds, a significant improvement from the previous year. This event also sparked the creation of the AWS DeepRacer Community, which has since grown to over 45,000 members.
My interest in understanding the inner workings of AWS DeepRacer grew. I contributed to open source projects that allowed running the training stack locally, diving deep into AWS services such as Amazon SageMaker and AWS RoboMaker. These efforts led to my nomination as an AWS Community Builder.
Working on community projects improved my skills in Python, Jupyter, numpy, pandas, and ROS. These experiences proved invaluable when I joined Unitary, an AI startup focused on reducing harmful online content. Within a year, we built a world-class inference platform processing over 2 billion video frames daily using dynamically scaled Amazon Elastic Kubernetes Service (Amazon EKS) clusters.

Image 3 – Unitary at the AWS London Summit showcasing dynamically scaled inference using 1000+ EKS nodes

2020-2023: Virtual racing and continued growth
The COVID-19 pandemic shifted AWS DeepRacer competitions online for 2020 and 2021. Despite this, exciting events like the AWS DeepRacer F1 Pro-Am kept the community engaged. The introduction of the AWS DeepRacer Evo, with stereo cameras and a lidar sensor, marked a significant hardware upgrade.
In-person racing returned in 2022, and I set a new world record at the London Summit. While I didn’t win the finals that year, the experience of competing and connecting with fellow racers remained invaluable.

Images 4 & 5 – The author hoists the trophy at the 2022 London Summit (left); AWS DeepRacer Community members and Pit Crew hosting an AWS DeepRacer workshop at re:Invent 2023 (right)

2023 brought more intense competition. Although I set another world record in London, it wasn’t enough for first place. I eventually secured a finals spot by winning a virtual league round for Europe. While my performance in the finals didn’t improve on previous results, the opportunity to reconnect with the AWS DeepRacer community was rewarding.
Conclusion: The lasting impact of AWS DeepRacer
Over the past six years, AWS DeepRacer has profoundly impacted my professional and personal life. It has helped me develop a strong foundation in AI and ML, improve my coding skills, and build a network of friends and professional contacts in the tech industry. The experience gained through AWS DeepRacer directly contributed to my success at Unitary, where we’ve achieved recognition as a top UK startup.
As the official AWS DeepRacer league comes to an end, I’m excited to see what the community will achieve next. This journey has shaped my career and life in ways I never expected when I first saw that small autonomous car on stage in 2018.
For those interested in starting their own AI and ML journey, I encourage you to explore the AWS DeepRacer resources available on the AWS website. You can also join the thriving community on Discord to connect with other enthusiasts and learn from their experiences.

About the author
Matt Camp is an AI and ML enthusiast who has been involved with AWS DeepRacer since its inception. He is currently working at Unitary, applying his skills to develop cutting-edge content moderation technology. Matt is an AWS Community Builder and continues to contribute to open source projects in the AWS DeepRacer community.

Your guide to generative AI and ML at AWS re:Invent 2024

The excitement is building for the fourteenth edition of AWS re:Invent, and as always, Las Vegas is set to host this spectacular event. This year, generative AI and machine learning (ML) will again be in focus, with exciting keynote announcements and a variety of sessions showcasing insights from AWS experts, customer stories, and hands-on experiences with AWS services. As you continue to innovate and partner with us to advance the field of generative AI, we’ve curated a diverse range of sessions to support you at every stage of your journey. These sessions are strategically organized across multiple learning topics, so there’s something valuable for everyone, regardless of your experience level.
In this attendee guide, we’re highlighting a few of our favorite sessions to give you a glimpse into what’s in store. As you browse the re:Invent catalog, select your learning topic and use the “Generative AI” area of interest tag to find the sessions most relevant to you.
The technical sessions covering generative AI are divided into six areas:
First, we’ll spotlight Amazon Q, the generative AI-powered assistant transforming software development and enterprise data utilization. These sessions, featuring Amazon Q Business, Amazon Q Developer, Amazon Q in QuickSight, and Amazon Q Connect, span the AI/ML, DevOps and Developer Productivity, Analytics, and Business Applications topics. The sessions showcase how Amazon Q can help you streamline coding, testing, and troubleshooting, as well as enable you to make the most of your data to optimize business operations. You will also explore AWS App Studio, a generative AI-powered service that empowers a new set of builders to rapidly create enterprise-grade applications using natural language, generating intelligent, secure, and scalable apps in minutes.
Second, we’ll delve into Amazon Bedrock, our fully managed service for building generative AI applications. Learn how you can use leading foundation models (FMs) from industry leaders and Amazon to build and scale your generative AI applications, and understand customization techniques like fine-tuning and Retrieval Augmented Generation (RAG). We’ll cover Amazon Bedrock Agents, capable of running complex tasks using your company’s systems and data.
Third, we’ll explore the robust AWS infrastructure services powering AI innovation, featuring Amazon SageMaker, AWS Trainium, and AWS Inferentia under the AI/ML and Compute topics. Discover how the fully managed infrastructure of SageMaker enables high-performance, low-cost ML throughout the ML lifecycle, from building and training to deploying and managing models at scale.
Fourth, we’ll address responsible AI, so you can build generative AI applications with responsible and transparent practices.
Fifth, we’ll showcase various generative AI use cases across industries.
And finally, get ready for the AWS DeepRacer League as it takes its final celebratory lap. You don’t want to miss this moment in AWS DeepRacer history, where racers will go head-to-head one last time to become the final champion. Off the race track, we will have dedicated sessions designed to help you continue your learning journey and apply your skills to the rapidly growing field of generative AI.
Visit the Generative AI Zone (GAIZ) at AWS Village in the Venetian Expo Hall to explore hands-on experiences with our newest launches and connect with our generative AI and ML specialists. Through a series of immersive exhibits, you can gain insights into AWS infrastructure for generative AI, learn about building and scaling generative AI applications, and discover how AI assistants are driving business transformation and modernization. As attendees circulate through the GAIZ, subject matter experts and Generative AI Innovation Center strategists will be on-hand to share insights, answer questions, present customer stories from an extensive catalog of reference demos, and provide personalized guidance for moving generative AI applications into production. Experience an immersive selection of innovative generative AI exhibits at the Generative AI and Innovations Pavilion through interactive displays spanning the AWS generative AI stack. Additionally, you can deep-dive into your industry-specific generative AI and ML use cases with our industry experts at the AWS Industries Pavilion.
If you’re new to re:Invent, you can attend sessions of the following types:

Keynotes – Join in person or virtually and learn about all the exciting announcements.
Innovation talks – Learn about the latest cloud technology from AWS technology leaders and discover how these advancements can help you push your business forward. These sessions will be livestreamed, recorded, and published to YouTube.
Breakout sessions – These 60-minute sessions are expected to have broad appeal, are delivered to larger audiences, and will be recorded. If you miss them, you can watch them on demand after re:Invent.
Chalk talks – Enjoy 60 minutes of content delivered to smaller audiences with an interactive whiteboarding session. Chalk talks are where discussions happen, and these offer you the greatest opportunity to ask questions or share your opinion.
Workshops – These 2-hour hands-on learning opportunities let you build a solution to a problem and understand the inner workings of the resulting infrastructure and cross-service interaction. Bring your laptop and be ready to learn!
Builders’ sessions – These highly interactive 60-minute mini-workshops are conducted in small groups of fewer than 10 attendees. Some of these appeal to beginners, and others are on specialized topics.
Code talks – These talks are similar to our popular chalk talk format, but instead of focusing on an architecture solution with whiteboarding, the speakers lead an interactive discussion featuring live coding or code samples. These 60-minute sessions focus on the actual code that goes into building a solution. Attendees are encouraged to ask questions and follow along.

If you have reserved your seat at any of the sessions, great! If not, we always set aside some spots for walk-ins, so make a plan and come to the session early.
To help you plan your agenda for this year’s re:Invent, here are some highlights of the generative AI and ML sessions. Visit the session catalog to learn about all our generative AI and ML sessions.
Keynotes

Matt Garman, Chief Executive Officer, Amazon Web Services
Tuesday December 3 | 8:00 AM – 10:30 AM (PST) | The Venetian
Join AWS CEO Matt Garman to hear how AWS is innovating across every aspect of the world’s leading cloud. He explores how we are reinventing foundational building blocks as well as developing brand-new experiences, all to empower customers and partners with what they need to build a better future.

Swami Sivasubramanian, Vice President of AI and Data
Wednesday December 4 | 8:30 AM – 10:30 AM (PST) | The Venetian
Join Dr. Swami Sivasubramanian, VP of AI and Data at AWS, to discover how you can use a strong data foundation to create innovative and differentiated solutions for your customers. Hear from customer speakers with real-world examples of how they’ve used data to support a variety of use cases, including generative AI, to create unique customer experiences.
Innovation talks

Pasquale DeMaio, Vice President & General Manager of Amazon Connect | BIZ221-INT | Generative AI for customer service
Monday December 2 | 10:30 AM – 11:30 AM (PST) | Venetian | Level 5 | Palazzo Ballroom B
Generative AI promises to revolutionize customer interactions, ushering in a new era of automation, cost efficiencies, and responsiveness. However, realizing this transformative potential requires a holistic approach that harmonizes people, processes, and technology. Through customer success stories and demonstrations of the latest AWS innovations, gain insights into operationalizing generative AI for customer service from the Vice President of Amazon Connect, Pasquale DeMaio. Whether you’re just starting your journey or well on your way, leave this talk with the knowledge and tools to unlock the transformative power of AI for customer interactions, the agent experience, and more.

Mai-Lan Tomsen Bukovec, Vice President, Technology | AIM250-INT | Modern data patterns for modern data strategies
Tuesday December 3 | 11:30 AM – 12:30 PM (PST) | Venetian | Level 5 | Palazzo Ballroom B
Every modern business is a data business, and organizations need to stay nimble to balance data growth with data-driven value. In this talk, you’ll understand how to recognize the latest signals in changing data patterns, and adapt data strategies that flex to changes in consumer behavior and innovations in technology like AI. Plus, learn how to evolve from data aggregation to data semantics to support data-driven applications while maintaining flexibility and governance. Hear from AWS customers who successfully evolved their data strategies for analytics, ML, and AI, and get practical guidance on implementing similar strategies using cutting-edge AWS tools and services.

Dilip Kumar, Vice President, Amazon Q Business | INV202-INT | Creating business breakthroughs with Amazon Q
Wednesday December 4 | 11:30 AM – 12:30 PM (PST) | Venetian | Level 5 | Palazzo Ballroom B
Get an overview of Amazon Q Business capabilities, including its ability to answer questions, provide summaries, generate content, and complete assigned tasks. Learn how Amazon Q Business goes beyond search to enable AI-powered actions. Explore how simple it is to build applications using Amazon Q Apps. Then, examine how AWS App Studio empowers a new set of builders to rapidly create business applications tailored to their organization’s needs, and discover how to build richer analytics using Amazon Q in QuickSight.

Baskar Sridharan, VP, AI/ML Services & Infrastructure | AIM276-INT | Generative AI in action: From prototype to production
Wednesday December 4 | 1:00 PM – 2:00 PM (PST) | Venetian | Level 5 | Palazzo Ballroom B
Learn how to transition generative AI from prototypes to production. This includes building custom models, implementing robust data strategies, and scaling architectures for performance and reliability. Additionally, the session will cover empowering business users to drive innovation and growth through this transformative technology.

Adam Seligman, Vice President, Developer Experience | DOP220-INT | Reimagining the developer experience at AWS
Thursday December 5 | 2:00 PM – 3:00 PM (PST) | Venetian | Level 5 | Palazzo Ballroom B
Dive into the pioneering approach AWS takes to integrating generative AI across the entire software development lifecycle. Explore the rich ecosystem of technical resources, networking opportunities, and knowledge-sharing platforms available to you with AWS. Learn from real-world examples of how AWS, developers, and software teams are using the power of generative AI to create innovative solutions that are shaping the future of software development.
Breakout sessions

DOP210: Accelerate multi-step SDLC tasks with Amazon Q Developer Agents
Monday December 2 | 8:30 AM – 9:30 AM PT
While existing AI assistants focus on code generation with close human guidance, Amazon Q Developer has a unique capability called agents that can use reasoning and planning capabilities to perform multi-step tasks beyond code generation with minimal human intervention. Its agent for software development can solve complex tasks that go beyond code suggestions, such as building entire application features, refactoring code, or generating documentation. Join this session to discover new agent capabilities that help developers go from planning to getting new features in front of customers even faster.
Reserve your seat now
AIM201: Maximize business impact with Amazon Q Apps: The Volkswagen AI journey
Monday December 2 | 10:00 AM – 11:00 AM PT
Discover how Volkswagen harnesses generative AI for optimized job matching and career growth with Amazon Q. Learn from the AWS Product Management team about the benefits of Amazon Q Business and the latest innovations in Amazon Q Apps. Then, explore how Volkswagen used these tools to streamline a job role mapping project, saving thousands of hours. Mario Duarte, Senior Director at Volkswagen Group of America, details the journey toward their first Amazon Q application that helps Volkswagen’s Human Resources build a learning ecosystem that boosts employee development. Leave the session inspired to bring Amazon Q Apps to supercharge your teams’ productivity engines.
Reserve your seat now
BSI101: Reimagine business intelligence with generative AI
Monday December 2 | 1:00 PM – 2:00 PM PT
In this session, get an overview of the generative AI capabilities of Amazon Q in QuickSight. Learn how analysts can build interactive dashboards rapidly, and discover how business users can use natural language to instantly create documents and presentations explaining data and extract insights beyond what’s available in dashboards with data Q&A and executive summaries. Hear from Availity on how 1.5 million active users are using Amazon QuickSight to distill insights from dashboards instantly, and learn how they are using Amazon Q internally to increase efficiency across their business.
Reserve your seat now
AIM272: 7 Principles for effective and cost-efficient Gen AI Apps
Monday December 2 | 2:30 PM – 3:30 PM PT
As generative AI gains traction, building effective and cost-efficient solutions is paramount. This session outlines seven guiding principles for building effective and cost-efficient generative AI applications. These principles can help businesses and developers harness generative AI’s potential while optimizing resources. Establishing objectives, curating quality data, optimizing architectures, monitoring performance, upholding ethics, and iterating improvements are crucial. With these principles, organizations can develop impactful generative AI applications that drive responsible innovation. Join this session to hear from ASAPP, a leading contact center solutions provider, as they discuss the principles they used to add generative AI-powered innovations to their software with Amazon Bedrock.
Reserve your seat now
DOP214: Unleashing generative AI: Amazon’s journey with Amazon Q Developer
Tuesday December 3 | 12:00 PM – 1:00 PM
Join us to discover how Amazon rolled out Amazon Q Developer to thousands of developers, trained them in prompt engineering, and measured its transformative impact on productivity. In this session, learn best practices for effectively adopting generative AI in your organization. Gain insights into training strategies, productivity metrics, and real-world use cases to empower your developers to harness the full potential of this game-changing technology. Don’t miss this opportunity to stay ahead of the curve and drive innovation within your team.
Reserve your seat now
AIM229: Scale FM development with Amazon SageMaker HyperPod (customer panel)
Tuesday December 3 | 2:30 PM – 3:30 PM PT
From startups to enterprises, organizations trust AWS to innovate with comprehensive, secure, and price-performant generative AI infrastructure. Amazon SageMaker HyperPod is a purpose-built infrastructure for FM development at scale. In this session, learn how leading AI companies strategize their FM development process and use SageMaker HyperPod to build state-of-the-art FMs efficiently.
Reserve your seat now
BIZ212: Elevate your contact center performance with AI‑powered analytics
Wednesday December 4 | 8:30 AM – 9:30 AM PT
AI is unlocking deeper insights about contact center performance, including customer sentiment, agent performance, and workforce scheduling. Join this session to hear how contact center managers are using AI-powered analytics in Amazon Connect to proactively identify and act on opportunities to improve customer service outcomes. Learn how Toyota utilizes analytics to detect emerging themes and unlock insights used by leaders across the enterprise.
Reserve your seat now
AIM357: Customizing models for enhanced results: Fine-tuning in Amazon Bedrock
Wednesday December 4 | 4:00 PM – 5:00 PM PT
Unleash the power of customized AI by fine-tuning generative AI models in Amazon Bedrock to achieve higher quality results. Discover how to adapt FMs like Meta’s Llama and Anthropic’s Claude models to your specific use cases and domains, boosting accuracy and efficiency. This session covers the technical process, from data preparation to model customization techniques, training strategies, deployment considerations, and post-customization evaluation. Gain the knowledge to take your generative AI applications to new heights, harnessing tailored, high-performance language processing solutions that give you a competitive advantage.
Reserve your seat now
AIM304: Using multiple agents for scalable generative AI applications
Wednesday December 4 | 5:30 PM – 6:30 PM PT
Join this session to learn how Northwestern Mutual transformed their application development support system using Amazon Bedrock multi-agent collaboration with better planning and communication among agents. Learn how they created specialized agents for different tasks like account management, repos, pipeline management, and more to help their developers go faster. Explore the significant productivity gains and efficiency improvements achieved across the organization.
Reserve your seat now
CMP208: Customer Stories: Optimizing AI performance and costs with AWS AI chips
Thursday December 5 | 12:30 PM – 1:30 PM PT
As you increase the use of generative AI to transform your business at scale, rising costs in your model development and deployment infrastructure can adversely impact your ability to innovate and deliver delightful customer experiences. AWS Trainium and AWS Inferentia deliver high-performance AI training and inference while reducing your costs by up to 50%. Attend this session to hear from AWS customers ByteDance, Ricoh, and Arcee about how they realized these benefits to grow their businesses and deliver innovative experiences to their end-users.
Reserve your seat now
AIM359: Streamline model evaluation and selection with Amazon Bedrock
Friday December 6 | 8:30 AM – 9:30 AM
Explore the robust model evaluation capabilities of Amazon Bedrock, designed to select the optimal FMs for your applications. Discover how to create and manage evaluation jobs, use automatic and human reviews, and analyze critical metrics like accuracy, robustness, and toxicity. This session provides practical steps to streamline your model selection process, providing high-quality, reliable AI deployments. Gain essential insights to enhance your generative AI applications through effective model evaluation techniques.
Reserve your seat now
AIM342: Responsible generative AI: Evaluation best practices and tools
Friday December 6 | 10:00 AM – 11:00 AM
With the newfound prevalence of applications built with large language models (LLMs) including features such as RAG, agents, and guardrails, a responsibly driven evaluation process is necessary to measure performance and mitigate risks. This session covers best practices for a responsible evaluation. Learn about open access libraries and AWS services that can be used in the evaluation process, and dive deep on the key steps of designing an evaluation plan, including defining a use case, assessing potential risks, choosing metrics and release criteria, designing an evaluation dataset, and interpreting results for actionable risk mitigation.
Chalk talks

AIM347-R1: Real-time issue resolution from machine-generated signals with gen AI
Tuesday December 3 | 1:00 PM – 2:00 PM PT
Resolving urgent service issues quickly is crucial for efficient operations and customer satisfaction. This chalk talk demonstrates how to process machine-generated signals into your contact center, allowing your knowledge base to provide real-time solutions. Discover how generative AI can identify problems, provide resolution content, and deliver it to the right person or device through text, voice, and data. Through a real-life IoT company case study, learn how to monitor devices, collect error messages, and respond to issues through a contact center framework using generative AI to accelerate solution provision and delivery, increasing uptime and reducing technician deployments.
Reserve your seat now
AIM407-R: Understand the deep security & privacy controls within Amazon Bedrock
Tuesday December 3 | 2:30 PM – 3:30 PM PT
Amazon Bedrock is designed to keep your data safe and secure, with none of your data being used to train the supported models. While the inference pathways are straightforward to understand, there are many nuances of some of the complex features of Amazon Bedrock that use your data for other non-inference purposes. This includes Amazon Bedrock Guardrails, Agents, and Knowledge Bases, along with the creation of custom models. In this chalk talk, explore the architectures, secure data flows, and complete lifecycle and usage of your data within these features, as you learn the deep details of the security capabilities in Amazon Bedrock.
Reserve your seat now
AIM352: Unlock Extensibility in AWS App Studio with JavaScript and Lambda
Wednesday December 4 | 10:30 AM – 11:30 AM PT
Looking for a better way to build applications that boost your team’s productivity and drive innovation? Explore the fastest and simplest way to build enterprise-grade applications—and how to extend your app’s potential with JavaScript and AWS Lambda. Join to learn hands-on techniques for automating workflows, creating AI-driven experiences, and integrating with popular AWS services. You’ll leave with practical skills to supercharge your application development!
Reserve your seat now
CMP329: Beyond Text: Unlock multimodal AI with AWS AI chips
Wednesday December 4 | 1:30 PM – 2:30 PM PT
Revolutionize your applications with multi-modal AI. Learn how to harness the power of AWS AI chips to create intelligent systems that understand and process text, images, and video. Explore advanced models, like Idefics2 and Chameleon, to build exceptional AI assistants capable of OCR, document analysis, visual reasoning, and creative content generation.
Reserve your seat now
AIM343-R: Advancing responsible AI: Managing generative AI risk
Wednesday December 4 | 4:00 PM – 5:00 PM
Risk assessment is an essential part of responsible AI (RAI) development and is an increasingly common requirement in AI standards and laws such as ISO 42001 and the EU AI Act. This chalk talk provides an introduction to best practices for RAI risk assessment for generative AI applications, covering controllability, veracity, fairness, robustness, explainability, privacy and security, transparency, and governance. Explore examples to estimate the severity and likelihood of potential events that could be harmful. Learn about Amazon SageMaker tooling for model governance, bias, explainability, and monitoring, and about transparency in the form of service cards as potential risk mitigation strategies.
Reserve your seat now
AIM366: Bring your gen AI models to Amazon Bedrock using Custom Model Import
Thursday December 5 | 1:00 PM – 2:00 PM
Learn how to accelerate your generative AI application development with Amazon Bedrock Custom Model Import. Seamlessly bring your fine-tuned models into a fully managed, serverless environment, and use the Amazon Bedrock standardized API and features like Amazon Bedrock Agents and Amazon Bedrock Knowledge Bases to accelerate generative AI application development. Discover how Salesforce achieved 73% cost savings while maintaining high accuracy through this capability. Walk away with knowledge on how to build a production-ready, serverless generative AI application with a fine-tuned model.
Reserve your seat now
Workshops

AIM315: Transforming intelligent document processing with generative AI
Monday December 2 | 8 AM – 10 AM PT
This workshop covers the use of generative AI models for intelligent document processing tasks. It introduces intelligent document processing and demonstrates how generative AI can enhance capabilities like multilingual OCR, document classification based on content/structure/visuals, document rule matching using RAG models, and agentic frameworks that combine generative models with decision-making and task orchestration. Attendees will learn practical applications of generative AI for streamlining and automating document-centric workflows.
Reserve your seat now
DOP308-R: Accelerating enterprise development with Amazon Q Developer
Monday December 2 | 12:00 PM – 2:00 PM PT
In this workshop, explore the transformative impact of generative AI in development. Get hands-on experience with Amazon Q Developer to learn how it can help you understand, build, and operate AWS applications. Explore the IDE to see how Amazon Q provides software development assistance, including code explanation, generation, modernization, and much more. You must bring your laptop to participate.
Reserve your seat now
BSI204-R1: Hands-on with Amazon Q in QuickSight: A step-by-step workshop
Wednesday December 4 | 1:00 PM – 3:00 PM
In this workshop, explore the generative BI capabilities of Amazon Q in QuickSight. Experience authoring visuals and refining them using natural language. Learn how business users can use natural language to generate data stories to create highly customizable narratives or slide decks from data. Discover how natural language Q&A with Amazon Q helps users gain insights beyond what is presented on dashboards while executive summaries provide an at-a-glance view of data, surfacing trends and explanations. You must bring your laptop to participate.
Reserve your seat now
AIM327: Fine-tune and deploy an LLM using Amazon SageMaker and AWS AI chips
Wednesday December 4 | 3:30 PM – 5:30 PM PT
As deep learning models have grown in size and complexity, there is a need for specialized ML accelerators to address the increasing training and inference demands of these models, while also delivering high performance, scalability, and cost-effectiveness. In this workshop, use AWS purpose-built ML accelerators, AWS Trainium and AWS Inferentia, to fine-tune and then run inference using an LLM based on the Meta Llama architecture. You must bring your laptop to participate.
Reserve your seat now
AIM402: Revolutionizing multimodal data search with Amazon Q Business
Wednesday December 4 | 3:30 PM – 5:30 PM PT
Today’s enterprises deal with data in various formats, including audio, image, video, and text, scattered across different documents. Searching through this diverse content to find useful information is a significant challenge. This workshop explores how Amazon Q Business transforms the way enterprises search and discover data across multiple formats. By utilizing cutting-edge AI and ML technologies, Amazon Q Business helps enterprises navigate their content seamlessly. Find out how this powerful tool accelerates real-world use cases by making it straightforward to extract actionable insights from multimodal datasets. You must bring your laptop to participate.
Reserve your seat now
Builders’ sessions

CMP304-R: Fine-tune Hugging Face LLMs using Amazon SageMaker and AWS Trainium
Tuesday December 3 | 2:30 PM – 3:30 PM
LLMs are pre-trained on vast amounts of data and perform well across a variety of general-purpose tasks and benchmarks without further specialized training. In practice, however, it is common to improve the performance of a pre-trained LLM by fine-tuning the model using a smaller task-specific or domain-specific dataset. In this builder’s session, learn how to use Amazon SageMaker to fine-tune a pre-trained Hugging Face LLM using AWS Trainium, and then use the fine-tuned model for inference. You must bring your laptop to participate.
Reserve your seat now
AIM328: Optimize your cloud investments using Amazon Bedrock
Thursday December 5 | 2:30 PM – 3:30 PM
Manually tracking the interconnected nature of deployed cloud resources and reviewing their utilization can be complex and time-consuming. In this builders’ session, see a demo on how you can optimize your cloud investments to maximize efficiency and cost-effectiveness. Explore a novel approach that harnesses AWS services like Amazon Bedrock, AWS CloudFormation, Amazon Neptune, and Amazon CloudWatch to analyze resource utilization and manage unused AWS resources. Using Amazon Bedrock, analyze the source code to identify the AWS resources used in the application. Apply this information to build a knowledge graph that represents the interconnected AWS resources. You must bring a laptop to participate.
Reserve your seat now
AIM403-R: Accelerate FM pre-training on Amazon SageMaker HyperPod
Monday December 2 | 2:30 PM – 3:30 PM
Amazon SageMaker HyperPod removes the undifferentiated heavy lifting involved in building and optimizing ML infrastructure for training FMs, reducing training time by up to 40%. In this builders’ session, learn how to pre-train an LLM using Slurm on SageMaker HyperPod. Explore the model pre-training workflow from start to finish, including setting up clusters, troubleshooting convergence issues, and running distributed training to improve model performance.
Reserve your seat now
Code talks

DOP315: Optimize your cloud environments in the AWS console with generative AI
Monday December 2 | 5:30 PM – 6:30 PM
Available on the AWS Management Console, Amazon Q Developer is the only AI assistant that is an expert on AWS, helping developers and IT pros optimize their AWS Cloud environments. Proactively diagnose and resolve errors and networking issues, provide guidance on architectural best practices, analyze billing information and trends, and use natural language in chat to manage resources in your AWS account. Learn how Amazon Q Developer accelerates task completion with tailored recommendations based on your specific AWS workloads, shifting from a reactive review to proactive notifications and remediation.
Reserve your seat now
AIM405: Learn to securely invoke Amazon Q Business Chat API
Wednesday December 4 | 2:30 PM – 3:30 PM
Join this code talk to learn how to use the Amazon Q Business identity-aware ChatSync API. First, hear an overview of identity-aware APIs, and then learn how to configure an identity provider as a trusted token issuer. Next, discover how your application can obtain an AWS STS token to assume a role that calls the ChatSync API. Finally, see how a client-side application uses the ChatSync API to answer questions from your documents indexed in Amazon Q Business.
Reserve your seat now
AIM406: Attain ML excellence with proficiency in Amazon SageMaker Python SDK
Wednesday December 4 | 4:30 PM – 5:30 PM
In this comprehensive code talk, delve into the robust capabilities of the Amazon SageMaker Python SDK. Explore how this powerful tool streamlines the entire ML lifecycle, from data preparation to model deployment. Discover how to use pre-built algorithms, integrate custom models seamlessly, and harness the power of popular Python libraries within the SageMaker platform. Gain hands-on experience in data management, model training, monitoring, and seamless deployment to production environments. Learn best practices and insider tips to optimize your data science workflow and accelerate your ML journey using the SageMaker Python SDK.
Reserve your seat now
AWS DeepRacer

ML enthusiasts, start your engines—AWS DeepRacer is back at re:Invent with a thrilling finale to 6 years of ML innovation! Whether you’re an ML pro or just starting out, the AWS DeepRacer championship offers an exciting glimpse into cutting-edge reinforcement learning. The action kicks off on December 2 with the Last Chance Qualifier, followed by 3 days of intense competition as 32 global finalists race for a whopping $50,000 prize pool. Don’t miss the grand finale on December 5, where top racers will battle it out on the challenging Forever Raceway in the Data Pavilion. This year, we’re taking AWS DeepRacer beyond the track with a series of four all-new workshops. These sessions are designed to help you use your reinforcement learning skills in the rapidly expanding field of generative AI. Learn to apply AWS DeepRacer skills to LLMs, explore multi-modal semantic search, and create AI-powered chatbots.
Exciting addition: We are introducing the AWS LLM League—a groundbreaking program that builds on the success of AWS DeepRacer to bring hands-on learning to the world of generative AI. The LLM League offers participants a unique opportunity to gain practical experience in model customization and fine-tuning, skills that are increasingly crucial in today’s AI landscape. Join any of the three DPR-101 sessions to demystify LLMs using your AWS DeepRacer know-how.
Make sure to check out the re:Invent content catalog for all the generative AI and ML content at re:Invent.
Let the countdown begin. See you at re:Invent!

About the authors
Mukund Birje is a Sr. Product Marketing Manager on the AIML team at AWS. In his current role, he’s focused on driving adoption of AWS data services for generative AI. He has over 10 years of experience in marketing and branding across a variety of industries. Outside of work, you can find him hiking, reading, and trying out new restaurants. You can connect with him on LinkedIn.
Dr. Andrew Kane is an AWS Principal WW Tech Lead (AI Language Services) based out of London. He focuses on the AWS Language and Vision AI services, helping our customers architect multiple AI services into a single use-case driven solution. Before joining AWS at the beginning of 2015, Andrew spent two decades working in the fields of signal processing, financial payments systems, weapons tracking, and editorial and publishing systems. He is a keen karate enthusiast (just one belt away from Black Belt) and is also an avid home-brewer, using automated brewing hardware and other IoT sensors.

Customize small language models on AWS with automotive terminology

In the rapidly evolving world of AI, the ability to customize language models for specific industries has become increasingly important. Although large language models (LLMs) are adept at handling a wide range of natural language tasks, they perform better on general-purpose tasks than on specialized ones. This can create challenges when processing text data from highly specialized domains with their own distinct terminology, or for specialized tasks where the intrinsic knowledge of the LLM is not well suited to solutions such as Retrieval Augmented Generation (RAG).
For instance, in the automotive industry, users might not always provide specific diagnostic trouble codes (DTCs), which are often proprietary to each manufacturer. These codes, such as P0300 for a generic engine misfire or C1201 for an ABS system fault, are crucial for precise diagnosis. Without these specific codes, a general purpose LLM might struggle to provide accurate information. This lack of specificity can lead to hallucinations in the generated responses, where the model invents plausible but incorrect diagnoses, or sometimes result in no answers at all. For example, if a user simply describes “engine running rough” without providing the specific DTC, a general LLM might suggest a wide range of potential issues, some of which may be irrelevant to the actual problem, or fail to provide any meaningful diagnosis due to insufficient context. Similarly, in tasks like code generation and suggestions through chat-based applications, users might not specify the APIs they want to use. Instead, they often request help in resolving a general issue or in generating code that utilizes proprietary APIs and SDKs.
Moreover, consumer-facing generative AI applications can offer valuable insights into the types of interactions coming from end users. With appropriate feedback mechanisms, these applications can also gather important data to continuously improve the behavior and responses generated by these models.
For these reasons, there is a growing trend in the adoption and customization of small language models (SLMs). SLMs are compact transformer models, primarily utilizing decoder-only or encoder-decoder architectures, typically with parameters ranging from 1–8 billion. They are generally more efficient and cost-effective to train and deploy compared to LLMs, and are highly effective when fine-tuned for specific domains or tasks. SLMs offer faster inference times, lower resource requirements, and are suitable for deployment on a wider range of devices, making them particularly valuable for specialized applications and edge computing scenarios. Additionally, more efficient techniques for customizing both LLMs and SLMs, such as Low Rank Adaptation (LoRA), are making these capabilities increasingly accessible to a broader range of customers.
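To make the LoRA idea concrete, the following is a minimal sketch of attaching LoRA adapters to a small decoder-only model with the Hugging Face peft library; the model ID, target modules, and hyperparameters are illustrative assumptions rather than values used later in this post:

# Minimal LoRA sketch with the Hugging Face peft library.
# Model ID, target modules, and hyperparameters are illustrative assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor applied to the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights are trainable

Because only the small adapter matrices are trained, LoRA keeps memory and compute requirements modest, which is one reason SLM customization is accessible on a single GPU instance.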
AWS offers a wide range of solutions for interacting with language models. Amazon Bedrock is a fully managed service that offers foundation models (FMs) from Amazon and other AI companies to help you build generative AI applications and host customized models. Amazon SageMaker is a comprehensive, fully managed machine learning (ML) service to build, train, and deploy LLMs and other FMs at scale. You can fine-tune and deploy models with Amazon SageMaker JumpStart or directly through Hugging Face containers.
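As a quick illustration of the SageMaker JumpStart path, the following sketch fine-tunes and deploys a JumpStart model with the SageMaker Python SDK; the model ID, S3 path, and request payload are assumptions for illustration only:

# Illustrative sketch of fine-tuning and deploying a model with SageMaker JumpStart.
# The model ID, S3 URI, and request payload are assumptions, not values from this post.
from sagemaker.jumpstart.estimator import JumpStartEstimator

estimator = JumpStartEstimator(
    model_id="meta-textgeneration-llama-3-1-8b-instruct",  # assumed JumpStart model ID
    environment={"accept_eula": "true"},
)

# The training channel expects a prepared instruction dataset in S3 (hypothetical path)
estimator.fit({"training": "s3://my-bucket/automotive-qa/train/"})

# Deploy the fine-tuned model behind a real-time endpoint and send a test request
predictor = estimator.deploy()
print(predictor.predict({"inputs": "What are the consequences of a loose battery cable?"}))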
In this post, we guide you through the phases of customizing SLMs on AWS, with a specific focus on automotive terminology for diagnostics as a Q&A task. We begin with the data analysis phase and progress through the end-to-end process, covering fine-tuning, deployment, and evaluation. We compare a customized SLM with a general purpose LLM, using various metrics to assess vocabulary richness and overall accuracy. We provide a clear understanding of customizing language models specific to the automotive domain and its benefits. Although this post focuses on the automotive domain, the approaches are applicable to other domains. You can find the source code for the post in the associated GitHub repository.
Solution overview
This solution uses multiple features of SageMaker and Amazon Bedrock, and can be divided into four main steps:

Data analysis and preparation – In this step, we assess the available data, understand how it can be used to develop the solution, select data for fine-tuning, and identify the required data preparation steps. We use Amazon SageMaker Studio, a comprehensive web-based integrated development environment (IDE) designed to facilitate all aspects of ML development. We also use SageMaker jobs to access additional computational power on demand, thanks to the SageMaker Python SDK.
Model fine-tuning – In this step, we prepare prompt templates for fine-tuning the SLM. For this post, we use Meta Llama 3.1 8B Instruct from Hugging Face as the SLM. We run our fine-tuning script directly from the SageMaker Studio JupyterLab environment, and we use the @remote decorator feature of the SageMaker Python SDK to launch a remote training job. The fine-tuning script uses LoRA, distributing compute across all available GPUs on a single instance.
Model deployment – When the fine-tuning job is complete and the model is ready, we have two deployment options:

Deploy in SageMaker by selecting the best instance and container options available.
Deploy in Amazon Bedrock by importing the fine-tuned model for on-demand use (a minimal sketch of this option follows this list).

Model evaluation – In this final step, we evaluate the fine-tuned model against a similar base model and a larger model available from Amazon Bedrock. Our evaluation focuses on how well the model uses specific terminology for the automotive space, as well as the improvements provided by fine-tuning in generating answers.
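For the Amazon Bedrock deployment option referenced above, the following is a minimal sketch of importing the fine-tuned model with Custom Model Import using boto3; the job name, model name, IAM role, and S3 location are placeholder assumptions:

# Minimal sketch of Amazon Bedrock Custom Model Import using boto3.
# Job name, model name, IAM role, and S3 URI are placeholder assumptions.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

response = bedrock.create_model_import_job(
    jobName="automotive-slm-import-job",
    importedModelName="automotive-llama-3-1-8b",
    roleArn="arn:aws:iam::111122223333:role/BedrockModelImportRole",
    modelDataSource={
        "s3DataSource": {
            "s3Uri": "s3://my-bucket/fine-tuned-model/"  # artifacts produced by the training job
        }
    },
)
print(response["jobArn"])

# After the import job completes, the imported model can be invoked on demand through
# the bedrock-runtime client, using the imported model's ARN as the modelId.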

The following diagram illustrates the solution architecture.
Using the Automotive_NER dataset
The Automotive_NER dataset, available on the Hugging Face platform, is designed for named entity recognition (NER) tasks specific to the automotive domain. This dataset is specifically curated to help identify and classify various entities related to the automotive industry and uses domain-specific terminologies.
The dataset contains approximately 256,000 rows; each row contains annotated text data with entities related to the automotive domain, such as car brands, models, components, descriptions of defects, consequences, and corrective actions. The terminology used to describe defects, reference components, and report error codes is standard for the automotive industry. Fine-tuning enables the language model to learn this domain terminology, improving both the vocabulary used in the generated answers and their overall accuracy.
The following table is an example of rows contained in the dataset.

1
COMPNAME
DESC_DEFECT
CONEQUENCE_DEFECT
CORRECTIVE_ACTION

2
ELECTRICAL SYSTEM:12V/24V/48V  BATTERY:CABLES
CERTAIN PASSENGER VEHICLES EQUIPPED WITH ZETEC ENGINES, LOOSE OR BROKEN  ATTACHMENTS AND MISROUTED BATTERY CABLES COULD LEAD TO CABLE INSULATION  DAMAGE.
THIS, IN TURN, COULD CAUSE THE BATTERY CABLES TO SHORT RESULTING IN HEAT  DAMAGE TO THE CABLES.  BESIDES HEAT  DAMAGE, THE “CHECK ENGINE” LIGHT MAY ILLUMINATE, THE VEHICLE MAY  FAIL TO START, OR SMOKE, MELTING, OR FIRE COULD ALSO OCCUR.
DEALERS WILL INSPECT THE BATTERY CABLES FOR THE CONDITION OF THE CABLE  INSULATION AND PROPER TIGHTENING OF THE TERMINAL ENDS.  AS NECESSARY, CABLES WILL BE REROUTED,  RETAINING CLIPS INSTALLED, AND DAMAGED BATTERY CABLES REPLACED.   OWNER NOTIFICATION BEGAN FEBRUARY 10,  2003.   OWNERS WHO DO NOT RECEIVE THE  FREE REMEDY  WITHIN A REASONABLE TIME  SHOULD CONTACT FORD AT 1-866-436-7332.

3
ELECTRICAL SYSTEM:12V/24V/48V  BATTERY:CABLES
CERTAIN PASSENGER VEHICLES EQUIPPED WITH ZETEC ENGINES, LOOSE OR BROKEN  ATTACHMENTS AND MISROUTED BATTERY CABLES COULD LEAD TO CABLE INSULATION  DAMAGE.
THIS, IN TURN, COULD CAUSE THE BATTERY CABLES TO SHORT RESULTING IN HEAT  DAMAGE TO THE CABLES.  BESIDES HEAT  DAMAGE, THE “CHECK ENGINE” LIGHT MAY ILLUMINATE, THE VEHICLE MAY  FAIL TO START, OR SMOKE, MELTING, OR FIRE COULD ALSO OCCUR.
DEALERS WILL INSPECT THE BATTERY CABLES FOR THE CONDITION OF THE CABLE  INSULATION AND PROPER TIGHTENING OF THE TERMINAL ENDS.  AS NECESSARY, CABLES WILL BE REROUTED,  RETAINING CLIPS INSTALLED, AND DAMAGED BATTERY CABLES REPLACED.   OWNER NOTIFICATION BEGAN FEBRUARY 10,  2003.   OWNERS WHO DO NOT RECEIVE THE  FREE REMEDY  WITHIN A REASONABLE TIME  SHOULD CONTACT FORD AT 1-866-436-7332.

4
EQUIPMENT:OTHER:LABELS
ON CERTAIN FOLDING TENT CAMPERS, THE FEDERAL CERTIFICATION (AND RVIA)  LABELS HAVE THE INCORRECT GROSS VEHICLE WEIGHT RATING, TIRE SIZE, AND  INFLATION PRESSURE LISTED.
IF THE TIRES WERE INFLATED TO 80 PSI, THEY COULD BLOW RESULTING IN A  POSSIBLE CRASH.
OWNERS WILL BE MAILED CORRECT LABELS FOR INSTALLATION ON THEIR  VEHICLES.   OWNER NOTIFICATION BEGAN  SEPTEMBER 23, 2002.    OWNERS SHOULD  CONTACT JAYCO AT 1-877-825-4782.

5
STRUCTURE
ON CERTAIN CLASS A MOTOR HOMES, THE FLOOR TRUSS NETWORK SUPPORT SYSTEM  HAS A POTENTIAL TO WEAKEN CAUSING INTERNAL AND EXTERNAL FEATURES TO BECOME  MISALIGNED.  THE AFFECTED VEHICLES ARE  1999 – 2003 CLASS A MOTOR HOMES MANUFACTURED ON F53 20,500 POUND GROSS  VEHICLE WEIGHT RATING (GVWR), FORD CHASSIS, AND 2000-2003 CLASS A MOTOR HOMES  MANUFACTURED ON W-22 22,000 POUND GVWR, WORKHORSE CHASSIS.
CONDITIONS CAN RESULT IN THE BOTTOMING OUT THE SUSPENSION AND  AMPLIFICATION OF THE STRESS PLACED ON THE FLOOR TRUSS NETWORK.  THE ADDITIONAL STRESS CAN RESULT IN THE  FRACTURE OF WELDS SECURING THE FLOOR TRUSS NETWORK SYSTEM TO THE CHASSIS  FRAME RAIL AND/OR FRACTURE OF THE FLOOR TRUSS NETWORK SUPPORT SYSTEM.  THE POSSIBILITY EXISTS THAT THERE COULD BE  DAMAGE TO ELECTRICAL WIRING AND/OR FUEL LINES WHICH COULD POTENTIALLY LEAD TO  A FIRE.
DEALERS WILL INSPECT THE FLOOR TRUSS NETWORK SUPPORT SYSTEM, REINFORCE  THE EXISTING STRUCTURE, AND REPAIR, AS NEEDED, THE FLOOR TRUSS NETWORK  SUPPORT.   OWNER NOTIFICATION BEGAN  NOVEMBER 5, 2002.  OWNERS SHOULD  CONTACT MONACO AT 1-800-685-6545.

6
STRUCTURE
ON CERTAIN CLASS A MOTOR HOMES, THE FLOOR TRUSS NETWORK SUPPORT SYSTEM  HAS A POTENTIAL TO WEAKEN CAUSING INTERNAL AND EXTERNAL FEATURES TO BECOME  MISALIGNED.  THE AFFECTED VEHICLES ARE  1999 – 2003 CLASS A MOTOR HOMES MANUFACTURED ON F53 20,500 POUND GROSS  VEHICLE WEIGHT RATING (GVWR), FORD CHASSIS, AND 2000-2003 CLASS A MOTOR HOMES  MANUFACTURED ON W-22 22,000 POUND GVWR, WORKHORSE CHASSIS.
CONDITIONS CAN RESULT IN THE BOTTOMING OUT THE SUSPENSION AND  AMPLIFICATION OF THE STRESS PLACED ON THE FLOOR TRUSS NETWORK.  THE ADDITIONAL STRESS CAN RESULT IN THE  FRACTURE OF WELDS SECURING THE FLOOR TRUSS NETWORK SYSTEM TO THE CHASSIS  FRAME RAIL AND/OR FRACTURE OF THE FLOOR TRUSS NETWORK SUPPORT SYSTEM.  THE POSSIBILITY EXISTS THAT THERE COULD BE  DAMAGE TO ELECTRICAL WIRING AND/OR FUEL LINES WHICH COULD POTENTIALLY LEAD TO  A FIRE.
DEALERS WILL INSPECT THE FLOOR TRUSS NETWORK SUPPORT SYSTEM, REINFORCE  THE EXISTING STRUCTURE, AND REPAIR, AS NEEDED, THE FLOOR TRUSS NETWORK  SUPPORT.   OWNER NOTIFICATION BEGAN  NOVEMBER 5, 2002.  OWNERS SHOULD  CONTACT MONACO AT 1-800-685-6545.

Data analysis and preparation on SageMaker Studio
When you’re fine-tuning LLMs, the quality and composition of your training data are crucial (quality over quantity). For this post, we implemented a sophisticated method to select 6,000 rows out of 256,000. This method uses TF-IDF vectorization to identify the most significant and the rarest words in the dataset. By selecting rows containing these words, we maintained a balanced representation of common patterns and edge cases. This improves computational efficiency and creates a high-quality, diverse subset leading to effective model training.
The first step is to open a JupyterLab application previously created in our SageMaker Studio domain.

After you clone the git repository, install the required libraries and dependencies:

pip install -r requirements.txt

The next step is to read the dataset:

from datasets import load_dataset
import pandas as pd

dataset = load_dataset("sp01/Automotive_NER")
df = pd.DataFrame(dataset['train'])

The first step of our data preparation activity is to analyze the importance of the words in our dataset, identifying both the most important (frequent and distinctive) words and the rarest words using Term Frequency–Inverse Document Frequency (TF-IDF) vectorization.
Given the dataset’s size, we decided to run the fine-tuning job using Amazon SageMaker Training.
By using the @remote function capability of the SageMaker Python SDK, we can easily run our code as a remote job.
In our case, the TF-IDF vectorization and the extraction of the top and bottom words are performed in a SageMaker training job directly from our notebook, without any code changes, by simply adding the @remote decorator on top of our function. You can define the configurations required by the SageMaker training job, such as dependencies and the training image, in a config.yaml file. For more details on the settings supported by the config file, see Using the SageMaker Python SDK.
See the following code:

SchemaVersion: '1.0'
SageMaker:
  PythonSDK:
    Modules:
      RemoteFunction:
        Dependencies: ./requirements.txt
        ImageUri: 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.4-gpu-py311
        InstanceType: ml.g5.12xlarge
        PreExecutionCommands:
          - 'export NCCL_P2P_DISABLE=1'
  Model:
    EnableNetworkIsolation: false

The next step is to define and execute our processing function:

import numpy as np
import re
from sagemaker.remote_function import remote
from sklearn.feature_extraction.text import TfidfVectorizer
import string

@remote(volume_size=10, job_name_prefix="preprocess-auto-ner-auto-merge", instance_type="ml.m4.10xlarge")
def preprocess(df,
               top_n=6000,
               bottom_n=6000):
    # Download nltk stopwords
    import nltk
    nltk.download('stopwords')
    from nltk.corpus import stopwords

    # Define a function to preprocess text
    def preprocess_text(text):
        if not isinstance(text, str):
            # Return an empty string or handle the non-string value as needed
            return ''

        # Remove punctuation
        text = re.sub(r'[%s]' % re.escape(string.punctuation), '', text)

        # Convert to lowercase
        text = text.lower()

        # Remove stop words (optional)
        stop_words = set(stopwords.words('english'))
        text = ' '.join([word for word in text.split() if word not in stop_words])

        return text

    print("Applying text preprocessing")

    # Preprocess the text columns
    df['DESC_DEFECT'] = df['DESC_DEFECT'].apply(preprocess_text)
    df['CONEQUENCE_DEFECT'] = df['CONEQUENCE_DEFECT'].apply(preprocess_text)
    df['CORRECTIVE_ACTION'] = df['CORRECTIVE_ACTION'].apply(preprocess_text)

    # Create a TfidfVectorizer object
    tfidf_vectorizer = TfidfVectorizer()

    print("Compute TF-IDF")

    # Fit and transform the text data
    X_tfidf = tfidf_vectorizer.fit_transform(df['DESC_DEFECT'] + ' ' + df['CONEQUENCE_DEFECT'] + ' ' + df['CORRECTIVE_ACTION'])

    # Get the feature names (words)
    feature_names = tfidf_vectorizer.get_feature_names_out()

    # Get the TF-IDF scores
    tfidf_scores = X_tfidf.toarray()

    top_word_indices = np.argsort(tfidf_scores.sum(axis=0))[-top_n:]
    bottom_word_indices = np.argsort(tfidf_scores.sum(axis=0))[:bottom_n]

    print("Extracting top and bottom words")

    # Get the top and bottom words
    top_words = [feature_names[i] for i in top_word_indices]
    bottom_words = [feature_names[i] for i in bottom_word_indices]

    return top_words, bottom_words

top_words, bottom_words = preprocess(df)

After we extract the top and bottom 6,000 words based on their TF-IDF scores, we classify each row in the dataset based on whether it contains any of these important or rare words. Rows are labeled ‘top’ if they contain important words, ‘bottom’ if they contain rare words, or ‘neither’ if they contain none of them:

# Create a function to check if a row contains important or rare words
def contains_important_or_rare_words(row):
    try:
        if ("DESC_DEFECT" in row.keys() and row["DESC_DEFECT"] is not None and
            "CONEQUENCE_DEFECT" in row.keys() and row["CONEQUENCE_DEFECT"] is not None and
            "CORRECTIVE_ACTION" in row.keys() and row["CORRECTIVE_ACTION"] is not None):
            text = row['DESC_DEFECT'] + ' ' + row['CONEQUENCE_DEFECT'] + ' ' + row['CORRECTIVE_ACTION']

            text_words = set(text.split())

            # Check if the row contains any important words (top_words)
            for word in top_words:
                if word in text_words:
                    return 'top'

            # Check if the row contains any rare words (bottom_words)
            for word in bottom_words:
                if word in text_words:
                    return 'bottom'

            return 'neither'
        else:
            return 'none'
    except Exception as e:
        raise e

df['word_type'] = df.apply(contains_important_or_rare_words, axis=1)

We then create a balanced subset of the dataset by selecting all rows containing important words ('top') and an equal number of rows containing rare words ('bottom'). If there aren't enough 'bottom' rows, we fill the remaining slots with 'neither' rows. The following sample shows a few labeled rows:

(index) | DESC_DEFECT | CONEQUENCE_DEFECT | CORRECTIVE_ACTION | word_type
2 | ON CERTAIN FOLDING TENT CAMPERS, THE FEDERAL C… | IF THE TIRES WERE INFLATED TO 80 PSI, THEY COU… | OWNERS WILL BE MAILED CORRECT LABELS FOR INSTA… | top
2402 | CERTAIN PASSENGER VEHICLES EQUIPPED WITH DUNLO… | THIS COULD RESULT IN PREMATURE TIRE WEAR. | DEALERS WILL INSPECT AND IF NECESSARY REPLACE … | bottom
0 | CERTAIN PASSENGER VEHICLES EQUIPPED WITH ZETEC… | THIS, IN TURN, COULD CAUSE THE BATTERY CABLES … | DEALERS WILL INSPECT THE BATTERY CABLES FOR TH… | neither

Finally, we randomly sample 6,000 rows from this balanced set:

# Select all rows from each group
top_rows = df[df['word_type'] == 'top']
bottom_rows = df[df['word_type'] == 'bottom']
neither_rows = df[df['word_type'] == 'neither']

# Combine the two groups, ensuring a balanced dataset
if len(bottom_rows) > 0:
    df = pd.concat([top_rows, bottom_rows.sample(n=len(bottom_rows), random_state=42)], ignore_index=True)
else:
    df = top_rows.copy()

# If the combined dataset is too small, fill it with 'neither' rows kept from the original dataset
if len(df) < 6000:
    remaining_rows = neither_rows.sample(n=6010 - len(df), random_state=42)
    df = pd.concat([df, remaining_rows], ignore_index=True)

# Randomly sample the final 6,000 rows
df = df.sample(n=6000, random_state=42)

Fine-tuning Meta Llama 3.1 8B with a SageMaker training job
After selecting the data, we need to prepare the resulting dataset for the fine-tuning activity. By examining the columns, we aim to adapt the model for two different tasks: describing the possible consequences of a defect, and suggesting corrective actions.
The following code is for the first prompt:

# User: 
{MFGNAME}
{COMPNAME}
{DESC_DEFECT}
# AI: 
{CONEQUENCE_DEFECT}

With this prompt, we instruct the model to highlight the possible consequences of a defect, given the manufacturer, component name, and description of the defect.
The following code is for the second prompt:

# User:
{MFGNAME}
{COMPNAME}
{DESC_DEFECT}
# AI: 
{CORRECTIVE_ACTION}

With this second prompt, we instruct the model to suggest possible corrective actions for a given defect and component of a specific manufacturer.
First, let’s split the dataset into train, test, and validation subsets:

from sklearn.model_selection import train_test_split

train, test = train_test_split(df, test_size=0.1, random_state=42)
train, valid = train_test_split(train, test_size=10, random_state=42)

Next, we create prompt templates to convert each row item into the two prompt formats previously described:

# template dataset to add prompt to each sample
def template_dataset_consequence(sample):
    # custom instruct prompt start
    prompt_template = f"""
    <|begin_of_text|><|start_header_id|>user<|end_header_id|>
    These are the information related to the defect

    Manufacturer: {{mfg_name}}
    Component: {{comp_name}}
    Description of a defect:
    {{desc_defect}}

    What are the consequences of defect?
    <|eot_id|><|start_header_id|>assistant<|end_header_id|>
    {{consequence_defect}}
    <|end_of_text|><|eot_id|>
    """
    sample["text"] = prompt_template.format(
        mfg_name=sample["MFGNAME"],
        comp_name=sample["COMPNAME"],
        desc_defect=sample["DESC_DEFECT"].lower(),
        consequence_defect=sample["CONEQUENCE_DEFECT"].lower())
    return sample

# template dataset to add prompt to each sample
def template_dataset_corrective_action(sample):
    # custom instruct prompt start
    prompt_template = f"""
    <|begin_of_text|><|start_header_id|>user<|end_header_id|>
    Manufacturer: {{mfg_name}}
    Component: {{comp_name}}

    Description of a defect:
    {{desc_defect}}

    What are the possible corrective actions?
    <|eot_id|><|start_header_id|>assistant<|end_header_id|>
    {{corrective_action}}
    <|end_of_text|><|eot_id|>
    """
    sample["text"] = prompt_template.format(
        mfg_name=sample["MFGNAME"],
        comp_name=sample["COMPNAME"],
        desc_defect=sample["DESC_DEFECT"].lower(),
        corrective_action=sample["CORRECTIVE_ACTION"].lower())
    return sample

Now we can apply the template functions template_dataset_consequence and template_dataset_corrective_action to our datasets:
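
The exact code for this step is not shown in the post; the following is a minimal sketch, assuming the pandas splits train and test are converted to Hugging Face datasets (the variable names here are illustrative):

from datasets import Dataset

# Convert the pandas splits into Hugging Face datasets
train_hf = Dataset.from_pandas(train)
test_hf = Dataset.from_pandas(test)

# Apply both prompt templates, producing four datasets
train_consequence = train_hf.map(template_dataset_consequence)
train_corrective_action = train_hf.map(template_dataset_corrective_action)
test_consequence = test_hf.map(template_dataset_consequence)
test_corrective_action = test_hf.map(template_dataset_corrective_action)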

As a final step, we concatenate the four resulting datasets for train and test:
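
A sketch of the concatenation, reusing the illustrative names from the previous snippet and producing the train_dataset and test_dataset variables used later when invoking the training function:

from datasets import concatenate_datasets

# Combine the two prompt formats into a single train and a single test dataset
train_dataset = concatenate_datasets([train_consequence, train_corrective_action]).shuffle(seed=42)
test_dataset = concatenate_datasets([test_consequence, test_corrective_action]).shuffle(seed=42)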

Our final training dataset comprises approximately 12,000 elements, split into about 11,000 for training and 1,000 for testing.
Now we can prepare the training script, define the training function train_fn, and decorate it with @remote.
The training function does the following:

Tokenizes and chunks the dataset
Sets up BitsAndBytesConfig for model quantization, specifying that the model should be loaded in 4-bit precision
Uses mixed precision for the computation by converting model parameters to bfloat16
Loads the model
Creates the LoRA configuration, specifying the rank of the update matrices (r), the scaling factor (lora_alpha), the modules to apply the LoRA update matrices to (target_modules), the dropout probability for LoRA layers (lora_dropout), the task_type, and more
Starts the training and evaluation

Because we want to distribute the training across all the available GPUs in our instance using PyTorch Distributed Data Parallel (DDP), we use the Hugging Face Accelerate library, which lets us run the same PyTorch code across distributed configurations.
To optimize memory usage, we run mixed precision training:

from accelerate import Accelerator
from huggingface_hub import login
from peft import AutoPeftModelForCausalLM, LoraConfig, get_peft_model, prepare_model_for_kbit_training
from sagemaker.remote_function import remote

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, set_seed
import transformers

# Start training
@remote(
    keep_alive_period_in_seconds=0,
    volume_size=100, job_name_prefix=f"train-{model_id.split('/')[-1].replace('.', '-')}-auto",
    use_torchrun=True,
    nproc_per_node=4)
def train_fn(
        model_name,
        train_ds,
        test_ds=None,
        lora_r=8,
        lora_alpha=16,
        lora_dropout=0.1,
        per_device_train_batch_size=8,
        per_device_eval_batch_size=8,
        gradient_accumulation_steps=1,
        learning_rate=2e-4,
        num_train_epochs=1,
        fsdp="",
        fsdp_config=None,
        gradient_checkpointing=False,
        merge_weights=False,
        seed=42,
        token=None
):

    set_seed(seed)
    accelerator = Accelerator()
    if token is not None:
        login(token=token)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # Set Tokenizer pad Token
    tokenizer.pad_token = tokenizer.eos_token
    with accelerator.main_process_first():

        # tokenize and chunk dataset
        lm_train_dataset = train_ds.map(
            lambda sample: tokenizer(sample["text"]), remove_columns=list(train_ds.features)
        )

        print(f"Total number of train samples: {len(lm_train_dataset)}")

        if test_ds is not None:

            lm_test_dataset = test_ds.map(
                lambda sample: tokenizer(sample["text"]), remove_columns=list(test_ds.features)
            )

            print(f"Total number of test samples: {len(lm_test_dataset)}")
        else:
            lm_test_dataset = None

    torch_dtype = torch.bfloat16

    # Defining additional configs for FSDP
    if fsdp != "" and fsdp_config is not None:
        bnb_config_params = {
            "bnb_4bit_quant_storage": torch_dtype
        }

        model_configs = {
            "torch_dtype": torch_dtype
        }

        fsdp_configurations = {
            "fsdp": fsdp,
            "fsdp_config": fsdp_config,
            "gradient_checkpointing_kwargs": {
                "use_reentrant": False
            },
            "tf32": True
        }

    else:
        bnb_config_params = dict()
        model_configs = dict()
        fsdp_configurations = dict()

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch_dtype,
        **bnb_config_params
    )

    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        trust_remote_code=True,
        quantization_config=bnb_config,
        attn_implementation="flash_attention_2",
        use_cache=not gradient_checkpointing,
        cache_dir="/tmp/.cache",
        **model_configs
    )

    if fsdp == "" and fsdp_config is None:
        model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=gradient_checkpointing)

    if gradient_checkpointing:
        model.gradient_checkpointing_enable()

    config = LoraConfig(
        r=lora_r,
        lora_alpha=lora_alpha,
        target_modules="all-linear",
        lora_dropout=lora_dropout,
        bias="none",
        task_type="CAUSAL_LM"
    )

    model = get_peft_model(model, config)
    # Helper defined elsewhere in the sample code; prints the number of trainable (LoRA) parameters
    print_trainable_parameters(model)

    trainer = transformers.Trainer(
        model=model,
        train_dataset=lm_train_dataset,
        eval_dataset=lm_test_dataset if lm_test_dataset is not None else None,
        args=transformers.TrainingArguments(
            per_device_train_batch_size=per_device_train_batch_size,
            per_device_eval_batch_size=per_device_eval_batch_size,
            gradient_accumulation_steps=gradient_accumulation_steps,
            gradient_checkpointing=gradient_checkpointing,
            logging_strategy="steps",
            logging_steps=1,
            log_on_each_node=False,
            num_train_epochs=num_train_epochs,
            learning_rate=learning_rate,
            bf16=True,
            ddp_find_unused_parameters=False,
            save_strategy="no",
            output_dir="outputs",
            **fsdp_configurations
        ),
        data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )

    trainer.train()

    if trainer.is_fsdp_enabled:
        trainer.accelerator.state.fsdp_plugin.set_state_dict_type("FULL_STATE_DICT")

    if merge_weights:
        output_dir = "/tmp/model"
        # merge adapter weights with base model and save
        # save int 4 model
        trainer.model.save_pretrained(output_dir, safe_serialization=False)

        if accelerator.is_main_process:
            # clear memory
            del model
            del trainer
            torch.cuda.empty_cache()

            # load PEFT model
            model = AutoPeftModelForCausalLM.from_pretrained(
                output_dir,
                torch_dtype=torch.float16,
                low_cpu_mem_usage=True,
                trust_remote_code=True,
            )

            # Merge LoRA and base model and save
            model = model.merge_and_unload()
            model.save_pretrained(
                "/opt/ml/model", safe_serialization=True, max_shard_size="2GB"
            )

    else:
        trainer.model.save_pretrained("/opt/ml/model", safe_serialization=True)

    if accelerator.is_main_process:
        tokenizer.save_pretrained("/opt/ml/model")

We can run a distributed job through the @remote decorator by setting the parameters use_torchrun and nproc_per_node, which indicate whether the SageMaker job should use torchrun as its entry point and how many GPUs to use. You can also pass optional parameters such as volume_size, subnets, and security_group_ids in the @remote decorator.
Finally, we run the job by invoking train_fn():

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

train_fn(
    model_id,
    train_ds=train_dataset,
    test_ds=test_dataset,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,
    gradient_checkpointing=True,
    num_train_epochs=1,
    merge_weights=True,
    token="<HF_TOKEN>"
)

The training job runs on a SageMaker training cluster and took about 42 minutes, distributing the computation across the 4 available GPUs of the selected ml.g5.12xlarge instance.
We chose to merge the LoRA adapter with the base model by setting the merge_weights parameter to True in train_fn(). Merging the weights provides a single, cohesive model that incorporates both the base knowledge and the domain-specific adaptations made through fine-tuning.
By merging the model, we gain flexibility in our deployment options.
Model deployment
When deploying a fine-tuned model on AWS, multiple deployment strategies are available. In this post, we explore two deployment methods:

SageMaker real-time inference – This option is designed for having full control of the inference resources. We can use a set of available instances and deployment options for hosting our model. By using the SageMaker built-in containers, such as DJL Serving or Hugging Face TGI, we can use the inference script and the optimization options provided in the container.
Amazon Bedrock Custom Model Import – This option is designed for importing and deploying custom language models. We can use this fully managed capability for interacting with the deployed model with on-demand throughput.

Model deployment with SageMaker real-time inference
SageMaker real-time inference is designed for having full control over the inference resources. It allows you to use a set of available instances and deployment options for hosting your model. By using the SageMaker built-in container Hugging Face Text Generation Inference (TGI), you can take advantage of the inference script and optimization options available in the container.
In this post, we deploy the fine-tuned model to a SageMaker endpoint for running inference, which will be used for evaluating the model in the next step.
We create the HuggingFaceModel object, which is a high-level SageMaker model class for working with Hugging Face models. The image_uri parameter specifies the container image URI for the model, and model_data points to the Amazon Simple Storage Service (Amazon S3) location containing the model artifact (automatically uploaded by the SageMaker training job). We also specify a set of environment variables to configure the number of GPUs (SM_NUM_GPUS), quantization methodology (QUANTIZE), and maximum input and total token lengths (MAX_INPUT_LENGTH and MAX_TOTAL_TOKENS).

model = HuggingFaceModel(
    image_uri=image_uri,
    model_data=f"s3://{bucket_name}/{job_name}/{job_name}/output/model.tar.gz",
    role=get_execution_role(),
    env={
        'HF_MODEL_ID': "/opt/ml/model", # path to where sagemaker stores the model
        'SM_NUM_GPUS': json.dumps(number_of_gpu), # Number of GPUs used per replica
        'QUANTIZE': 'bitsandbytes',
        'MAX_INPUT_LENGTH': '4096',
        'MAX_TOTAL_TOKENS': '8192'
    }
)

After creating the model object, we can deploy it to an endpoint using the deploy method. The initial_instance_count and instance_type parameters specify the number and type of instances to use for the endpoint. The container_startup_health_check_timeout and model_data_download_timeout parameters set the timeout values for the container startup health check and model data download, respectively.

predictor = model.deploy(
    initial_instance_count=instance_count,
    instance_type=instance_type,
    container_startup_health_check_timeout=health_check_timeout,
    model_data_download_timeout=3600
)

It takes a few minutes to deploy the model before it becomes available for inference and evaluation. The endpoint can be invoked using the AWS SDK with the boto3 client for sagemaker-runtime, or directly with the SageMaker Python SDK, using the predict API of the predictor created previously.

body = {
    'inputs': prompt,
    'parameters': {
        "top_p": 0.9,
        "temperature": 0.2,
        "max_new_tokens": 512,
        "return_full_text": False,
        "stop": [
            '<|eot_id|>',
            '<|end_of_text|>'
        ]
    }
}
response = predictor.predict(body)
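
For reference, a minimal sketch of the equivalent call through the low-level boto3 sagemaker-runtime client, taking the endpoint name from the predictor created earlier:

import json
import boto3

runtime_client = boto3.client("sagemaker-runtime")

# Send the same request body as above to the real-time endpoint
response = runtime_client.invoke_endpoint(
    EndpointName=predictor.endpoint_name,
    ContentType="application/json",
    Body=json.dumps(body),
)
result = json.loads(response["Body"].read())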

Model deployment with Amazon Bedrock Custom Model Import
Amazon Bedrock Custom Model Import is a fully managed capability, currently in public preview, designed for importing and deploying custom language models. It allows you to interact with the deployed model with on-demand throughput or by provisioning throughput.
In this section, we use the Custom Model Import feature in Amazon Bedrock for deploying our fine-tuned model in the fully managed environment of Amazon Bedrock.

After defining the model and job_name variables, we import our model from the S3 bucket by supplying it in the Hugging Face weights format.

Next, we use a preexisting AWS Identity and Access Management (IAM) role that allows Amazon Bedrock to read the model files from Amazon S3, and we create the import job resource in Amazon Bedrock for hosting our model.
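
The import job code is not shown in this post; the following is a minimal sketch using the boto3 bedrock client, where the role ARN and S3 URI are placeholders and the job_name and model name variables are assumed to match the ones defined earlier:

import boto3

bedrock = boto3.client("bedrock")

# Placeholders: replace the role ARN and S3 URI with your own values
response = bedrock.create_model_import_job(
    jobName=job_name,
    importedModelName=model_id.split("/")[-1],
    roleArn="arn:aws:iam::<ACCOUNT_ID>:role/<BEDROCK_IMPORT_ROLE>",
    modelDataSource={
        "s3DataSource": {
            "s3Uri": "s3://<BUCKET_NAME>/<MODEL_WEIGHTS_PREFIX>/"
        }
    },
)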

It takes a few minutes to deploy the model, and it can be invoked using the AWS SDK with the boto3 client for bedrock-runtime by using the invoke_model API:

fine_tuned_model_id = "<MODEL_ARN>"

body = {
    "prompt": prompt,
    "temperature": 0.1,
    "top_p": 0.9,
}

response = bedrock_client.invoke_model(
    modelId=fine_tuned_model_id,
    body=json.dumps(body)
)

Model evaluation
In this final step, we evaluate the fine-tuned model against the base models Meta Llama 3 8B Instruct and Meta Llama 3 70B Instruct on Amazon Bedrock. Our evaluation focuses on how well the model uses specific terminology for the automotive space and the improvements provided by fine-tuning in generating answers.
We evaluate the fine-tuned model's ability to understand components and error descriptions for diagnostics, as well as to identify corrective actions and consequences in the generated answers, along two dimensions.
To evaluate the quality of the generated text and whether the vocabulary and terminology used are appropriate for the task and industry, we use the Bilingual Evaluation Understudy (BLEU) score. BLEU evaluates the quality of generated text by calculating the n-gram overlap between the generated text and the reference text.
To evaluate how close the generated answer is to the expected one, we use the normalized Levenshtein distance. This metric is based on the number of character-level edits needed to transform the generated text into the reference text; we report it as a normalized similarity, so higher values indicate a closer match to the expected answer.
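
A minimal sketch of how the two metrics can be computed for a single example; the libraries used here (NLTK for BLEU and the python-Levenshtein package for edit distance) are one possible choice, not necessarily the exact implementation used for the results below:

import Levenshtein  # from the python-Levenshtein package
from nltk.translate.bleu_score import sentence_bleu

def bleu(reference: str, generated: str) -> float:
    # n-gram overlap between the generated answer and the reference answer
    return sentence_bleu([reference.split()], generated.split())

def normalized_levenshtein_similarity(reference: str, generated: str) -> float:
    # 1 - (edit distance / length of the longer string), so higher means closer to the reference
    if not reference and not generated:
        return 1.0
    distance = Levenshtein.distance(reference, generated)
    return 1.0 - distance / max(len(reference), len(generated))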
The evaluation dataset comprises 10 unseen examples of component diagnostics extracted from the original training dataset.
The prompt template for the evaluation is structured as follows:

<|begin_of_text|><|start_header_id|>user<|end_header_id|>
Manufacturer: {row['MFGNAME']}
Component: {row['COMPNAME']}

Description of a defect:
{row['DESC_DEFECT']}

What are the consequences?
<|eot_id|><|start_header_id|>assistant<|end_header_id|>

BLEU score evaluation with base Meta Llama 3 8B and 70B Instruct
The following table shows the calculated values for the BLEU score comparison (higher is better) with Meta Llama 3 8B and 70B Instruct.

Example | Dataset Index | Fine-Tuned Score | Base Score: Meta Llama 3 8B | Base Score: Meta Llama 3 70B
1 | 2733 | 0.2936 | 5.10E-155 | 4.85E-155
2 | 3382 | 0.1619 | 0.058 | 1.134E-78
3 | 1198 | 0.2338 | 1.144E-231 | 3.473E-155
4 | 2942 | 0.94854 | 2.622E-231 | 3.55E-155
5 | 5151 | 1.28E-155 | 0 | 0
6 | 2101 | 0.80345 | 1.34E-78 | 1.27E-78
7 | 5178 | 0.94854 | 0.045 | 3.66E-155
8 | 1595 | 0.40412 | 4.875E-155 | 0.1326
9 | 2313 | 0.94854 | 3.03E-155 | 9.10E-232
10 | 557 | 0.89315 | 8.66E-79 | 0.1954

By comparing the fine-tuned and base scores, we can assess the performance improvement (or degradation) achieved by fine-tuning the model in the vocabulary and terminology used.
The analysis suggests that for the analyzed cases, the fine-tuned model outperforms the base model in the vocabulary and terminology used in the generated answer. The fine-tuned model appears to be more consistent in its performance.
Normalized Levenshtein distance with base Meta Llama 3 8B and 70B Instruct
The following table shows the calculated values for the normalized Levenshtein distance comparison with Meta Llama 3 8B and 70B Instruct.

Example | Dataset Index | Fine-Tuned Score | Base Score: Meta Llama 3 8B | Base Score: Meta Llama 3 70B
1 | 2733 | 0.42198 | 0.29900 | 0.27226
2 | 3382 | 0.40322 | 0.25304 | 0.21717
3 | 1198 | 0.50617 | 0.26158 | 0.19320
4 | 2942 | 0.99328 | 0.18088 | 0.19420
5 | 5151 | 0.34286 | 0.01983 | 0.02163
6 | 2101 | 0.94309 | 0.25349 | 0.23206
7 | 5178 | 0.99107 | 0.14475 | 0.17613
8 | 1595 | 0.58182 | 0.19910 | 0.27317
9 | 2313 | 0.98519 | 0.21412 | 0.26956
10 | 557 | 0.98611 | 0.10877 | 0.32620

By comparing the fine-tuned and base scores, we can assess the performance improvement (or degradation) achieved by fine-tuning the model on the specific task or domain.
The analysis shows that the fine-tuned model clearly outperforms the base model across the selected examples, suggesting the fine-tuning process has been quite effective in improving the model’s accuracy and generalization in understanding the specific cause of the component defect and providing suggestions on the consequences.
In the evaluation analysis performed for both selected metrics, we can also highlight some areas for improvement:

Example repetition – Provide similar examples for further improvements in the vocabulary and generalization of the generated answer, increasing the accuracy of the fine-tuned model.
Evaluate different data processing techniques – In our example, we selected a subset of the original dataset by analyzing the frequency of words across the entire dataset, extracting the rows containing the most meaningful information and identifying outliers. Further curation of the dataset by properly cleaning and expanding the number of examples can increase the overall performance of the fine-tuned model.

Clean up
After you complete your training and evaluation experiments, clean up your resources to avoid unnecessary charges. If you deployed the model with SageMaker, you can delete the created real-time endpoints using the SageMaker console. Next, delete any unused SageMaker Studio resources. If you deployed the model with Amazon Bedrock Custom Model Import, you can delete the imported model using the Amazon Bedrock console.
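For example, a minimal sketch of removing the real-time endpoint through the SageMaker Python SDK predictor created earlier:

# Delete the model resource and the real-time endpoint to stop incurring charges
predictor.delete_model()
predictor.delete_endpoint()
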
Conclusion
This post demonstrated the process of customizing SLMs on AWS for domain-specific applications, focusing on automotive terminology for diagnostics. The provided steps and source code show how to analyze data, fine-tune models, deploy them efficiently, and evaluate their performance against larger base models using SageMaker and Amazon Bedrock. We further highlighted the benefits of customization by enhancing vocabulary within specialized domains.
You can evolve this solution further by implementing proper ML pipelines and LLMOps practices through Amazon SageMaker Pipelines. SageMaker Pipelines enables you to automate and streamline the end-to-end workflow, from data preparation to model deployment, enhancing reproducibility and efficiency. You can also improve the quality of training data using advanced data processing techniques. Additionally, using the Reinforcement Learning from Human Feedback (RLHF) approach can align the model response to human preferences. These enhancements can further elevate the performance of customized language models across various specialized domains. You can find the sample code discussed in this post on the GitHub repo.

About the authors
Bruno Pistone is a Senior Generative AI and ML Specialist Solutions Architect for AWS based in Milan. He works with large customers, helping them deeply understand their technical needs and design AI and machine learning solutions that make the best use of the AWS Cloud and the Amazon Machine Learning stack. His expertise includes end-to-end machine learning, machine learning industrialization, and generative AI. He enjoys spending time with his friends, exploring new places, and traveling to new destinations.
Gopi Krishnamurthy is a Senior AI/ML Solutions Architect at Amazon Web Services based in New York City. He works with large Automotive and Industrial customers as their trusted advisor to transform their Machine Learning workloads and migrate to the cloud. His core interests include deep learning and serverless technologies. Outside of work, he likes to spend time with his family and explore a wide range of music.

Shopify Marketing Apps: 21 Essential Tools for Ecommerce Stores

Shopify's App Store is packed with options. In fact, there are over 8,000 apps available! From email marketing to customer reviews to SEO, the possibilities are almost endless. But with so many choices, figuring out which ones are actually worth your time can be tricky.

Here’s the thing – not every app is a perfect fit for every business. 

The key is finding the tools that match your specific goals. Whether it’s increasing conversions, automating workflows, or building customer loyalty, choosing the right apps can make a big impact.

That’s what we are here for. 

In this post, we're spotlighting 21 Shopify marketing apps that every ecommerce brand should know about. Whether they're widely loved or hidden gems waiting to be discovered, these tools are built to deliver results. Let's jump in!

The Criteria for Choosing Your Shopify Marketing Apps

With thousands of Shopify marketing apps out there, picking the right ones for your store can feel like a challenge. To make the process easier, focus on these key criteria to ensure the apps you choose deliver real value.

1. Integration Capabilities

Your marketing apps need to play nicely with the tools you’re already using. 

Whether it’s syncing with Shopify data, connecting to your email marketing platform, or integrating with social media channels, seamless integration saves time and reduces headaches. Look for apps that fit smoothly into your existing tech stack.

2. User Experience and Ease of Use

No one has time for clunky, hard-to-navigate tools. Apps with intuitive interfaces and clear documentation let you hit the ground running without needing a degree in tech wizardry. The easier it is for you (and your team) to use, the faster you’ll see results.

3. Scalability and Pricing

Your business is growing and your apps need to grow with you. Consider tools that offer flexible pricing plans and features that can scale as your needs evolve. Bonus points if an app has a free trial or basic plan to test it out before committing.

4. Customer Support and Community Feedback

Even the best apps can hit a snag, so reliable customer support is a must. Check out user reviews and community forums to see how responsive the app’s team is to issues and updates. A strong community around an app is also a good sign that it’s trusted by other Shopify users.

It's not about having the most apps; it's about having the right ones. By keeping these criteria in mind, you can cut through the clutter and zero in on the apps that will actually help you.

The Top 21 Shopify Marketing Apps for Ecommerce Stores

Here is the full list if you want to jump to a specific tool:

Website Visitor ID X-Ray Pixel

Klaviyo

Omnisend 

Customers.ai Signal  

Privy 

Yotpo 

Smile.io 

ReferralCandy 

Justuno

PushOwl 

Recart

Stamped.io 

Gorgias 

Tidio 

Seguno 

Loox 

ReConvert

PageFly

SEO Manager 

Plug in SEO 

Customers.ai Abandoned Cart Recovery

1. Website Visitor Identification X-Ray Pixel

The Customers.ai Website Visitor Identification Pixel is a tool designed to identify anonymous visitors to your website. By capturing visitor information such as email addresses and names, it enables businesses to engage with potential customers who might have otherwise remained unknown. This data can be seamlessly integrated into your marketing and sales workflows, enhancing lead generation and customer outreach efforts.

Unique Features:

High Identification Rate: The pixel can identify 20-35% of website visitors, providing a substantial increase in potential leads.

Data Enrichment: Beyond basic identification, it enriches visitor contacts with additional consumer or business data, offering deeper insights into your audience.

Seamless Integration: The tool integrates with various marketing platforms, allowing for automated email outreach and retargeting campaigns.

Use Case:

Imagine an ecommerce store experiencing high traffic but low conversion rates. By implementing the Customers.ai pixel, the store can identify a significant portion of its anonymous visitors, obtain their contact information, and initiate personalized email campaigns. 

This targeted approach can lead to increased engagement and higher conversion rates, effectively turning previously lost opportunities into sales.

Ratings & Reviews:

G2: Customers.ai holds a rating of 4.8 out of 5 stars.

Product Hunt: The Website Visitor Identification Pixel has received positive feedback, with users highlighting its efficiency in capturing visitor information and enhancing marketing efforts.

Overall, the Customers.ai Website Visitor Identification Pixel is a valuable tool for businesses aiming to maximize their website’s potential by converting anonymous visitors into actionable leads.

Here is how to identify your website visitors with Customers.ai:

1. Sign up for a free account

If you don’t already have a Customers.ai account, sign up here (no credit card is required) and connect your business.

2. Install the x-ray pixel on your site

Installing the website identification x-ray pixel is easy and can be done through Tag Manager, Shopify, WordPress, and more.

3. Verify the x-ray pixel is firing

4. Start identifying your website visitors

That’s it! Once the pixel is installed and verified, you can start identifying your website visitors.

2. Klaviyo for Shopify Marketing App

Klaviyo is a powerful email and SMS marketing platform designed specifically for ecommerce brands. It allows businesses to create personalized, data-driven marketing campaigns using customer insights. From email automation to advanced segmentation, Klaviyo provides the tools you need to engage your audience and drive sales.

Unique Features:

Advanced Segmentation: Klaviyo’s segmentation capabilities allow you to create hyper-targeted groups based on behavior, purchase history, and engagement.

Predictive Analytics: Leverage AI to forecast customer behavior, including likely next purchase dates and lifetime value.

Multi-Channel Marketing: Combine email and SMS campaigns in one platform for a seamless customer experience.

Ecommerce Integrations: Klaviyo integrates with major platforms like Shopify, WooCommerce, and Magento, making it easy to sync data for personalized campaigns.

Use Case:

An ecommerce brand selling beauty products can use Klaviyo to segment their audience based on purchase frequency. They could set up automated flows to re-engage customers who haven’t purchased in 90 days, offering them a discount on their favorite products. The result? A boost in repeat purchases and stronger customer loyalty.

Ratings & Reviews:

G2 Rating: 4.7 out of 5 stars

Capterra Rating: 4.6 out of 5 stars

Users praise Klaviyo for its ease of use, robust analytics, and the ability to create highly personalized campaigns. However, some note that the pricing can be steep for smaller businesses.

3. Omnisend Shopify Marketing App

Omnisend is an all-in-one marketing automation platform tailored for ecommerce businesses. It enables brands to create personalized email and SMS campaigns, automate workflows, and integrate multiple channels to enhance customer engagement. Omnisend offers seamless integration with Shopify, allowing for real-time data synchronization, which facilitates targeted marketing efforts and efficient customer segmentation.

Unique Features:

Omnichannel Marketing: Combine email, SMS, push notifications, and more within a single workflow to deliver a cohesive customer experience.

Pre-built Automation Workflows: Utilize ready-made workflows for cart abandonment, welcome series, order confirmations, and more, enabling quick setup and deployment.

Advanced Segmentation: Segment your audience based on shopping behavior, purchase history, and engagement levels to send highly targeted messages.

Drag-and-Drop Email Builder: Create visually appealing emails effortlessly with a user-friendly, drag-and-drop interface.

Use Case:

Consider an online apparel store aiming to reduce cart abandonment rates. By integrating Omnisend with their Shopify store, they can set up an automated cart abandonment workflow that sends a series of personalized emails and SMS messages to customers who left items in their carts. This multi-channel approach increases the likelihood of recovering lost sales and enhances customer retention.

Ratings & Reviews:

Shopify App Store: 4.8 out of 5 stars, based on over 5,300 reviews.

G2: 4.5 out of 5 stars, with users highlighting its ease of use and robust automation features.

4. Signal Shopify Marketing App

Signal by Customers.ai is a tool designed to enhance Shopify’s marketing capabilities by identifying return visitors that might otherwise go unnoticed. Shopify’s default tracking may miss visitors after a certain period, but Signal steps in to capture and identify those high-intent return visitors who are showing renewed interest in your store.

Unique Features:

Extended Visitor Identification: Signal identifies visitors returning to your Shopify store, even after the initial tracking window has expired.

High-Intent Focus: Pinpoints customers who are likely in-market and ready to make a purchase based on their return behavior.

Data Enrichment: Provides enriched profiles with actionable insights for personalized marketing efforts.

Use Case:

A Shopify store selling home decor notices many visitors returning after browsing but not completing their purchase. With Signal, the store can identify these return visitors and send personalized email campaigns featuring the products they viewed. This targeted follow-up can re-engage customers at the right moment, increasing conversions and driving sales.

Ratings & Reviews:

G2 Rating: 4.8 out of 5 stars

Customer Feedback: Users praise Signal for its ability to capture return visitors and convert previously lost opportunities into revenue.

Signal by Customers.ai ensures you’re not missing out on high-value return visitors, giving you the tools to engage them at the perfect time and drive more sales through your Shopify store.

5. Privy Shopify Marketing App

Privy is an all-in-one marketing platform designed to help Shopify businesses grow their sales through pop-ups, email, and SMS marketing. It enables merchants to create engaging website displays, automate email campaigns, and send targeted SMS messages, all aimed at increasing conversions and customer engagement. Privy integrates seamlessly with Shopify, allowing for easy synchronization of customer data and streamlined marketing efforts.

Unique Features:

Customizable Pop-Ups: Design and launch pop-up campaigns within minutes using a drag-and-drop editor, with options like spin-to-win wheels, countdown timers, and banners.

Email and SMS Automation: Automate welcome emails, cart abandonment reminders, and other campaigns to nurture leads and drive sales.

Advanced Targeting: Target campaigns based on exit intent, cart value, and website behavior to personalize customer interactions.

Seamless Shopify Integration: Privy syncs with Shopify to pull products and photos directly into emails and track campaign performance.

Use Case:

A Shopify store specializing in handmade jewelry wants to grow its email list and reduce cart abandonment. By implementing Privy, the store can create eye-catching pop-ups offering a discount in exchange for email sign-ups. Additionally, automated cart abandonment emails and SMS messages can be set up to remind customers of their pending purchases, leading to increased conversions and a larger subscriber base.

Ratings & Reviews:

Shopify App Store: 4.6 out of 5 stars, based on over 24,900 reviews.

Customer Feedback: Users commend Privy for its ease of use, robust features, and excellent customer support, noting significant improvements in email capture rates and sales.

6. Yotpo Shopify Marketing App

Yotpo is a comprehensive platform that enables businesses to collect and leverage customer reviews and user-generated content (UGC) to build trust and drive sales. By integrating with Shopify, Yotpo allows merchants to seamlessly gather and display authentic customer feedback, enhancing the shopping experience and boosting conversion rates.

Unique Features:

Automated Review Requests: Yotpo automates the process of soliciting reviews from customers post-purchase, increasing the volume of feedback collected.

Visual UGC Integration: The platform supports the collection and display of customer photos and videos, adding a visual element to reviews that enhances credibility.

Customizable Widgets: Merchants can tailor the appearance of review displays to align with their brand aesthetics, ensuring a cohesive look across the site.

Advanced Moderation Tools: Yotpo provides robust moderation capabilities, allowing businesses to manage and respond to reviews effectively.

Use Case:

A Shopify store specializing in eco-friendly products aims to build trust with potential customers. By implementing Yotpo, the store can collect and showcase authentic reviews and photos from satisfied customers. This user-generated content serves as social proof, reassuring new visitors of the product quality and encouraging them to make a purchase.

Ratings & Reviews:

Shopify App Store: 4.7 out of 5 stars, based on over 2,000 reviews.

Customer Feedback: Users praise Yotpo for its ease of integration, user-friendly interface, and the positive impact on customer engagement and sales.

7. Smile.io Shopify Marketing App

Smile.io is a loyalty and rewards platform designed to help Shopify merchants increase customer retention and lifetime value. By integrating seamlessly with Shopify, Smile.io enables businesses to create customized loyalty programs that reward customers for various actions, such as making purchases, referring friends, or engaging on social media. This fosters a sense of community and encourages repeat business.

Unique Features:

Points Programs: Reward customers with points for specific actions, which can be redeemed for discounts or other incentives.

Referral Programs: Encourage customers to refer friends by offering rewards to both the referrer and the new customer.

VIP Programs: Create tiered loyalty programs that offer exclusive perks to your most valuable customers.

Customization: Tailor the appearance and functionality of your loyalty program to align with your brand identity.

Use Case:

A Shopify store specializing in organic skincare products wants to boost customer loyalty and increase repeat purchases. By implementing Smile.io, the store sets up a points-based system where customers earn points for every purchase, social media engagement, and referrals. Accumulated points can be redeemed for discounts on future purchases, encouraging customers to return and fostering a loyal customer base.

Ratings & Reviews:

Shopify App Store: 4.8 out of 5 stars, based on over 6,000 reviews.

Customer Feedback: Users commend Smile.io for its user-friendly interface, seamless Shopify integration, and positive impact on customer engagement and retention.

8. ReferralCandy Shopify Marketing App

ReferralCandy is a referral marketing platform designed to help Shopify merchants boost sales through word-of-mouth marketing. By integrating seamlessly with Shopify, it enables businesses to set up and manage customer referral programs, incentivizing existing customers to refer new ones. This approach leverages satisfied customers to drive new sales, enhancing customer acquisition efforts.

Unique Features:

Automated Referral Tracking: ReferralCandy automatically tracks referrals and rewards, simplifying program management.

Customizable Rewards: Merchants can tailor rewards to fit their brand, offering discounts, cash, or custom gifts to referrers and their friends.

Multi-Channel Sharing: Customers can share referral links via email, social media, or direct messaging, broadening the program’s reach.

Comprehensive Analytics: The platform provides detailed insights into referral performance, helping businesses optimize their programs.

Use Case:

A Shopify store specializing in fitness apparel aims to expand its customer base. By implementing ReferralCandy, the store sets up a program where existing customers receive a discount for each successful referral, and their referred friends also get a discount on their first purchase. This strategy encourages satisfied customers to promote the brand, leading to increased sales and a growing customer community.

Ratings & Reviews:

Shopify App Store: 4.8 out of 5 stars, based on over 1,800 reviews.

Customer Feedback: Users praise ReferralCandy for its ease of use, effective referral tracking, and positive impact on customer acquisition and sales growth.

9. Justuno Shopify Marketing App

Justuno is a conversion optimization platform that empowers Shopify merchants to enhance their website’s performance through targeted pop-ups, banners, and personalized messaging. By integrating seamlessly with Shopify, Justuno enables businesses to create engaging on-site experiences that drive conversions, increase average order value, and grow email lists.

Unique Features:

Advanced Targeting and Segmentation: Utilize over 80 targeting rules to display personalized messages based on visitor behavior, referral source, and more.

AI-Powered Product Recommendations: Leverage artificial intelligence to showcase relevant products, boosting cross-sell and upsell opportunities.

Design Flexibility: Create custom pop-ups and banners with a drag-and-drop editor, ensuring they align with your brand’s aesthetics.

A/B Testing: Test different designs and messaging to determine what resonates best with your audience, optimizing for higher conversion rates.

Use Case:

A Shopify store specializing in eco-friendly home goods aims to reduce cart abandonment and increase email subscribers. By implementing Justuno, the store sets up exit-intent pop-ups offering a discount to users about to leave without purchasing. Additionally, they create targeted banners promoting free shipping for orders over a certain amount, encouraging higher cart values. These strategies lead to a decrease in cart abandonment and a growth in their email list for future marketing efforts.

Ratings & Reviews:

Shopify App Store: 4.7 out of 5 stars, based on over 2,300 reviews.

Customer Feedback: Users commend Justuno for its robust features, ease of use, and positive impact on conversion rates and customer engagement.

10. PushOwl Shopify Marketing App

PushOwl is a web push notification app designed to help Shopify merchants re-engage visitors and boost sales. By integrating seamlessly with Shopify, PushOwl enables businesses to send real-time notifications directly to a user’s device, even when they’re not on the website. This facilitates timely communication about promotions, product updates, and cart reminders, enhancing customer engagement and driving conversions.

Unique Features:

Automated Abandoned Cart Recovery: Send automated reminders to visitors who left items in their cart, encouraging them to complete their purchase.

Personalized Notifications: Customize messages based on user behavior and preferences to increase relevance and effectiveness.

Segmentation: Target specific customer groups with tailored notifications, improving engagement rates.

Analytics and Reporting: Access detailed insights into notification performance to optimize future campaigns.

Use Case:

A Shopify store specializing in handmade crafts notices a high rate of cart abandonment. By implementing PushOwl, the store sets up automated push notifications that remind customers of their pending carts, offering a small discount as an incentive. This strategy leads to a significant increase in recovered sales and a reduction in cart abandonment rates.

Ratings & Reviews:

Shopify App Store: 4.9 out of 5 stars, based on over 3,000 reviews.

Customer Feedback: Users praise PushOwl for its ease of use, effective re-engagement capabilities, and responsive customer support.

11. Recart Shopify Marketing App

Recart is a marketing platform designed to help Shopify merchants enhance customer engagement and recover lost sales through SMS marketing and abandoned cart recovery. By integrating seamlessly with Shopify, Recart enables businesses to automate personalized SMS campaigns, capture leads, and send timely reminders to customers, thereby boosting conversions and fostering customer loyalty.

Unique Features:

SMS Marketing Automation: Create and schedule personalized SMS campaigns to engage customers with promotions, order updates, and more.

Abandoned Cart Recovery: Automatically send reminders to customers who have left items in their cart, encouraging them to complete their purchase.

List Growth Tools: Utilize pop-ups and other tools to capture email and SMS subscribers, expanding your marketing reach.

Analytics and Reporting: Access detailed insights into campaign performance to optimize strategies and improve ROI.

Use Case:

A Shopify store specializing in fitness apparel experiences a high rate of cart abandonment. By implementing Recart, the store sets up automated SMS reminders that are sent to customers who leave items in their cart. These messages include personalized content and exclusive discounts, leading to a significant increase in recovered sales and a reduction in cart abandonment rates.

Ratings & Reviews:

Shopify App Store: 4.8 out of 5 stars, based on over 5,400 reviews.

Customer Feedback: Users commend Recart for its user-friendly interface, effective SMS marketing capabilities, and positive impact on sales recovery.

12. Stamped.io Shopify Marketing App

Stamped.io is a comprehensive platform designed to help Shopify merchants collect and showcase customer reviews, ratings, and user-generated content. Additionally, it offers robust loyalty and rewards programs to enhance customer retention and engagement. By integrating seamlessly with Shopify, Stamped.io enables businesses to build trust, encourage repeat purchases, and foster a loyal customer base.

Unique Features:

Product Reviews and Ratings: Collect and display high-quality reviews, photos, and videos from customers, enriching your product pages and boosting credibility.

Loyalty and Rewards Programs: Implement points-based systems, VIP tiers, and referral programs to incentivize customer engagement and repeat purchases.

Visual Marketing: Leverage user-generated content in your marketing campaigns to enhance authenticity and drive conversions.

Net Promoter Score (NPS): Measure customer satisfaction and loyalty through integrated NPS surveys, providing valuable insights for business improvement.

Use Case:

A Shopify store specializing in eco-friendly home goods aims to build trust with potential customers and encourage repeat business. By implementing Stamped.io, the store collects authentic reviews and photos from satisfied customers, displaying them prominently on product pages. Additionally, they set up a loyalty program where customers earn points for purchases and referrals, redeemable for discounts on future orders. This strategy leads to increased customer engagement, higher conversion rates, and a growing base of loyal customers.

Ratings & Reviews:

Shopify App Store: 4.9 out of 5 stars, based on over 5,000 reviews.

Customer Feedback: Users praise Stamped.io for its ease of use, comprehensive features, and positive impact on customer engagement and sales growth.

13. Gorgias Shopify Marketing App

Gorgias is a customer support helpdesk tailored for ecommerce businesses, offering a unified platform to manage customer interactions across multiple channels, including email, live chat, phone, and social media. By integrating seamlessly with Shopify, Gorgias enables support teams to access customer data and order histories directly within support tickets, facilitating personalized and efficient responses. Additionally, Gorgias connects with over 100 apps, including marketing tools, to enhance customer engagement and streamline operations.

Unique Features:

Unified Support Inbox: Consolidate all customer communications into a single dashboard, eliminating the need to switch between platforms.

Automation and Macros: Automate repetitive tasks and create templated responses to common inquiries, improving response times and consistency.

Integration with Marketing Tools: Connect with marketing platforms like Klaviyo and Yotpo to synchronize customer data, enabling targeted marketing campaigns based on support interactions.

Real-Time Order Management: Access and manage customer orders directly within support tickets, allowing for quick resolutions to order-related queries.

Use Case:

A Shopify store specializing in custom apparel experiences a high volume of customer inquiries across various channels. By implementing Gorgias, the support team consolidates all communications into a single platform, reducing response times and improving customer satisfaction. Integration with Klaviyo allows the marketing team to segment customers based on their support interactions, enabling personalized email campaigns that address specific customer needs and preferences.

Ratings & Reviews:

Shopify App Store: 4.6 out of 5 stars, based on over 1,200 reviews.

Customer Feedback: Users commend Gorgias for its intuitive interface, robust automation features, and seamless integration with Shopify and other marketing tools, noting significant improvements in support efficiency and customer engagement.

14. Tidio Shopify Marketing App

Tidio is a comprehensive customer experience platform that combines live chat, AI-powered chatbots, and a helpdesk solution to enhance customer engagement and support for Shopify merchants. By seamlessly integrating with Shopify, Tidio enables businesses to provide real-time assistance, automate responses to common inquiries, and manage customer interactions efficiently, all from a unified dashboard.

Unique Features:

Live Chat: Offer instant support to website visitors, addressing their questions and concerns in real-time to improve customer satisfaction and boost sales.

AI Chatbots: Deploy AI-driven chatbots to handle repetitive queries, provide product recommendations, and guide customers through the purchasing process, reducing the workload on human agents.

Unified Inbox: Manage all customer messages from various channels, including email, live chat, and social media, in a single, organized inbox for streamlined communication.

Shopify Integration: Access customer order details, recommend products, and manage orders directly within the chat interface, enhancing the efficiency of support operations.

Use Case:

A Shopify store specializing in handmade crafts experiences a surge in customer inquiries during the holiday season. By implementing Tidio, the store sets up AI chatbots to handle common questions about shipping times, product availability, and order tracking. Simultaneously, live chat support is available for more complex inquiries. This combination ensures prompt responses, reduces cart abandonment, and enhances the overall customer experience, leading to increased sales during the peak season.

Ratings & Reviews:

Shopify App Store: 4.7 out of 5 stars, based on over 1,900 reviews.

Customer Feedback: Users praise Tidio for its user-friendly interface, robust feature set, and positive impact on customer engagement and conversion rates.

15. Seguno Shopify Marketing App

Seguno is an email marketing platform built exclusively for Shopify, enabling merchants to create, manage, and track email campaigns directly within their Shopify admin. This seamless integration allows for efficient marketing workflows, leveraging existing store data to personalize communications and drive sales.

Unique Features:

Shopify-Native Integration: Operate entirely within Shopify, utilizing store data for targeted email campaigns without the need for external platforms.

Automated Email Campaigns: Set up automated emails for welcome series, product reviews, and abandoned cart recovery to engage customers at critical points in their journey.

Template Library: Access a variety of customizable templates designed to align with your brand and marketing goals.

Performance Analytics: Monitor the success of your email campaigns with detailed analytics, helping to refine strategies and improve ROI.

Use Case:

A Shopify store specializing in artisanal teas aims to boost customer retention and increase repeat purchases. By implementing Seguno, the store sets up an automated welcome series to introduce new subscribers to their products and offers. Additionally, they create personalized product recommendations based on past purchases, leading to higher engagement and increased sales.

Ratings & Reviews:

Shopify App Store: 4.8 out of 5 stars, based on over 1,170 reviews.

Customer Feedback: Users commend Seguno for its seamless Shopify integration, user-friendly interface, and effective automation features that enhance email marketing efforts.

16. Loox Shopify Marketing App

Loox is a comprehensive social proof marketing platform designed for Shopify merchants, enabling the collection and display of customer reviews, photos, and videos. By integrating seamlessly with Shopify, Loox helps businesses build trust, enhance credibility, and drive sales through authentic user-generated content and referral programs.

Unique Features:

Visual Reviews: Encourage customers to submit photo and video reviews, bringing products to life and providing potential buyers with real-life perspectives.

Automated Review Requests: Send personalized, automated emails to customers post-purchase, prompting them to leave reviews and share their experiences.

Customizable Display Widgets: Showcase reviews in various formats, such as carousels, pop-ups, and badges, all customizable to match your brand’s aesthetic.

Referral Programs: Implement referral incentives, allowing satisfied customers to refer friends and family, thereby expanding your customer base organically.

Use Case:

A Shopify store specializing in handmade jewelry seeks to build trust with new visitors and encourage repeat purchases. By implementing Loox, the store collects photo reviews from customers showcasing their jewelry in everyday settings. These visual testimonials are displayed on product pages and shared on social media, providing authentic social proof. Additionally, the store sets up a referral program where customers receive discounts for referring friends, leading to increased traffic and sales.

Ratings & Reviews:

Shopify App Store: 4.9 out of 5 stars, based on over 21,800 reviews.

Customer Feedback: Users praise Loox for its ease of use, effective review collection, and positive impact on conversion rates and customer trust.

17. Reconvert Shopify Marketing App

ReConvert is a Shopify app designed to help merchants optimize their post-purchase experience by customizing the thank you page and implementing upsell strategies. By integrating seamlessly with Shopify, ReConvert enables businesses to increase average order value (AOV) and boost customer retention through personalized offers and engaging thank you pages.

Unique Features:

Thank You Page Customization: Transform the standard thank you page into a dynamic, revenue-generating asset by adding personalized product recommendations, discount codes, and engaging content.

One-Click Upsells: Offer post-purchase upsells that customers can add to their order with a single click, without needing to re-enter payment information, reducing friction and increasing conversions.

Drag-and-Drop Editor: Easily design and customize the thank you page using a user-friendly drag-and-drop interface, allowing for quick adjustments without coding knowledge.

Advanced Analytics: Monitor the performance of upsell offers and thank you page elements with detailed analytics, enabling data-driven optimization.

Use Case:

A Shopify store specializing in fitness apparel aims to increase its average order value. By implementing ReConvert, the store customizes its thank you page to include personalized product recommendations based on the customer’s purchase history. Additionally, they set up one-click upsell offers for complementary products immediately after checkout. This strategy leads to a significant increase in AOV and enhances customer satisfaction by providing relevant product suggestions.

Ratings & Reviews:

Shopify App Store: 4.9 out of 5 stars, based on over 4,300 reviews.

Customer Feedback: Users commend ReConvert for its ease of use, effective upsell features, and positive impact on revenue growth.

18. PageFly Shopify Marketing App

PageFly is a versatile page builder app designed exclusively for Shopify merchants, enabling the creation of custom landing pages, product pages, and other essential store pages without the need for coding. By integrating seamlessly with Shopify, PageFly offers a user-friendly drag-and-drop interface, allowing businesses to design high-converting pages that enhance the overall shopping experience.

Unique Features:

Extensive Template Library: Access over 100 professionally designed, fully responsive templates tailored for various niches and page types, streamlining the page creation process.

Rich Element Library: Utilize a wide array of elements, including images, videos, countdown timers, and forms, to build engaging and interactive pages.

Mobile Responsiveness: Customize pages for optimal display across all devices, ensuring a consistent and user-friendly experience for all visitors.

Third-Party Integrations: Seamlessly integrate with popular Shopify apps and tools, such as email marketing platforms and review apps, to enhance functionality and drive conversions.

Use Case:

A Shopify store specializing in eco-friendly home goods aims to launch a holiday promotion with a dedicated landing page. By implementing PageFly, the store quickly designs a visually appealing, conversion-optimized landing page featuring a countdown timer, product showcases, and a sign-up form for exclusive offers. This targeted approach leads to increased traffic, higher engagement, and a boost in holiday sales.

Ratings & Reviews:

Shopify App Store: 4.9 out of 5 stars, based on over 6,000 reviews.

Customer Feedback: Users praise PageFly for its intuitive interface, extensive customization options, and exceptional customer support, noting significant improvements in page design and conversion rates.

19. SEO Manager Shopify Marketing App

SEO Manager is a comprehensive search engine optimization app designed specifically for Shopify merchants. It offers a suite of tools to enhance your store’s visibility on search engines, thereby driving organic traffic and increasing sales. By integrating seamlessly with Shopify, SEO Manager allows you to manage and optimize various SEO elements directly within your store’s dashboard.

Unique Features:

Real-Time SEO Feedback: Receive immediate insights and suggestions to improve your store’s SEO performance.

404 Error Tracking and Management: Monitor and fix broken links to ensure a seamless user experience and maintain search engine rankings.

Google Integration: Easily connect with Google Search Console to track your store’s search performance and identify areas for improvement.

Bulk Editing: Efficiently update meta tags, titles, and descriptions across multiple products and pages.

Structured Data Support: Implement JSON-LD structured data to enhance your store’s appearance in search results with rich snippets.

Use Case:

A Shopify store specializing in handmade crafts aims to improve its online visibility and attract more organic traffic. By implementing SEO Manager, the store identifies and fixes 404 errors, optimizes product meta descriptions, and integrates structured data. These actions lead to higher search engine rankings, increased website traffic, and a boost in sales.

Ratings & Reviews:

Shopify App Store: 4.6 out of 5 stars, based on over 1,200 reviews.

Customer Feedback: Users commend SEO Manager for its user-friendly interface, comprehensive feature set, and positive impact on search engine rankings and organic traffic.

20. Plug in SEO Shopify Marketing App

Plug in SEO is a comprehensive search engine optimization tool designed specifically for Shopify merchants. It offers a suite of features to help store owners identify and rectify SEO issues, optimize content, and enhance overall search engine visibility. By integrating seamlessly with Shopify, Plug in SEO enables businesses to monitor and improve their SEO performance directly from their store’s dashboard.

Unique Features:

Automated SEO Audits: Conducts regular scans of your store to detect SEO problems, providing actionable insights and step-by-step instructions for resolution.

SEO Templating: Allows for the creation of templates to manage titles and meta descriptions efficiently across various pages, ensuring consistency and optimization.

Structured Data Support: Implements JSON-LD structured data to enhance your store’s appearance in search results with rich snippets, improving click-through rates.

Broken Link Detection: Identifies and assists in fixing broken links, ensuring a seamless user experience and maintaining search engine rankings.

Use Case:

A Shopify store specializing in artisanal home decor aims to improve its search engine rankings and attract more organic traffic. By implementing Plug in SEO, the store conducts a comprehensive audit, identifying issues such as missing meta descriptions and broken links. Utilizing the app’s templating feature, they efficiently update meta tags across all product pages. Additionally, the structured data support enhances their visibility in search results. These optimizations lead to improved search rankings, increased organic traffic, and a boost in sales.

Ratings & Reviews:

Shopify App Store: 4.6 out of 5 stars, based on over 2,500 reviews.

Customer Feedback: Users praise Plug in SEO for its user-friendly interface, comprehensive feature set, and positive impact on search engine rankings and organic traffic.

21. Abandoned Cart Recovery for Shopify

Customers.ai offers an advanced Abandoned Cart Recovery solution designed to help Shopify merchants recapture lost sales by identifying and engaging shoppers who leave items in their carts without completing the purchase. By integrating seamlessly with Shopify, this tool enables businesses to automatically reach out to potential customers, encouraging them to finalize their transactions and boosting overall revenue.

Unique Features:

Website Visitor Identification: Capture information about shoppers who abandon their carts, even if they haven’t filled out a form, allowing for targeted follow-up communications.

Automated Outreach: Send personalized emails and retargeting ads to remind customers of their abandoned carts, offering incentives to complete their purchases.

Seamless Integration: Easily connect with your existing email automation and retargeting platforms to streamline your marketing efforts.

Detailed Analytics: Monitor the performance of your abandoned cart recovery campaigns with comprehensive analytics, enabling data-driven optimizations.

Use Case:

A Shopify store specializing in eco-friendly products experiences a high rate of cart abandonment. By implementing Customers.ai’s Abandoned Cart Recovery solution, the store identifies visitors who left items in their carts and sends them personalized emails with exclusive discounts. Additionally, retargeting ads are displayed to these potential customers across various platforms. This approach leads to a significant increase in recovered sales and a reduction in cart abandonment rates.

Ratings & Reviews:

Customer Feedback: Users commend Customers.ai’s Abandoned Cart Recovery for its effectiveness in recapturing lost sales, ease of integration with Shopify, and the ability to engage customers who might have otherwise been lost.


Integrating These Shopify Marketing Apps into Your Strategy

Adding new tools to your Shopify store can be exciting, but a smooth integration is key to seeing results without disrupting your existing workflows. Here’s how to make the most of these apps:

Start Small: Begin by integrating one app at a time to avoid overwhelming your team or causing workflow disruptions. Prioritize apps that address your most pressing needs, such as improving email marketing or reducing cart abandonment.

Test and Measure: Before fully rolling out an app, run a small test to ensure it integrates seamlessly and delivers results. Use built-in analytics and KPIs, like conversion rates or engagement metrics, to track its impact and make adjustments as needed.

Train Your Team: New tools work best when everyone knows how to use them. Host training sessions or provide how-to guides for your team, ensuring they understand each app’s features and how it fits into your overall strategy.

Monitor and Optimize: Once integrated, regularly review the performance of your apps. Look for opportunities to fine-tune settings or explore advanced features that can further enhance your marketing efforts.

With a thoughtful approach, these apps can transform your marketing strategy, streamline workflows, and drive results without missing a beat.

Wrapping It Up: Unlock Your Store’s Full Potential 

Exploring the right Shopify marketing apps can open up new opportunities to engage customers, recover lost sales, and grow your business. By integrating tools that align with your specific goals, you can streamline your workflows, deliver personalized experiences, and maximize your store’s potential.

Take a moment to assess your current marketing strategy. Are there gaps you could fill with the right app? Whether it’s boosting email performance, recovering abandoned carts, or building loyalty, the tools you choose can make a big difference.

Ready to take the next step? 

Try Customers.ai for free and see how it can help you identify return visitors, recover lost sales, and enhance your marketing efforts. Start transforming your Shopify store today!


Important Next Steps

See what targeted outbound marketing is all about. Capture and engage your first 500 website visitor leads with Customers.ai X-Ray website visitor identification for free.

Talk and learn about sales outreach automation with other growth enthusiasts. Join Customers.ai Island, our Facebook group of 40K marketers and entrepreneurs who are ready to support you.

Advance your marketing performance with Sales Outreach School, a free tutorial and training area for sales pros and marketers.

Shopify Marketing App FAQs

What are Shopify marketing apps? Shopify marketing apps are tools designed to help store owners improve their marketing efforts, such as email campaigns, social media ads, SEO, and customer engagement.

How do Shopify marketing apps help grow my store? These apps streamline marketing tasks, boost customer engagement, recover abandoned carts, and drive more traffic to your store, ultimately increasing sales.

What are the best Shopify marketing apps for ecommerce stores? Popular options include Klaviyo for email marketing, Privy for pop-ups and SMS, and Yotpo for customer reviews and user-generated content.

Can Shopify marketing apps help with SEO? Yes, apps like Plug in SEO and SEO Manager are specifically designed to optimize your store’s search engine visibility.

How do I choose the right Shopify marketing apps for my store? Consider your business goals, budget, and the app’s integration capabilities with your existing tools.

Are there free Shopify marketing apps available? Yes, many apps offer free plans or trials, such as Privy, Omnisend, and Klaviyo, allowing you to test their features before committing.

What are Shopify ecommerce apps, and how do they differ from marketing apps? Shopify ecommerce apps include a wide range of tools for store management, while marketing apps focus specifically on driving traffic, engagement, and sales.

Can Shopify marketing apps integrate with other tools? Most apps integrate seamlessly with platforms like Klaviyo, Google Ads, and Facebook to enhance your marketing strategy.

What Shopify apps help with abandoned cart recovery? Apps like Customers.ai Abandoned Cart Recovery, Klaviyo, and Recart are great for re-engaging customers who didn’t complete their purchase.

Which Shopify apps can help increase customer retention? Smile.io, Yotpo, and Stamped.io are excellent for building loyalty and encouraging repeat purchases.

How do Shopify apps support email marketing? Apps like Klaviyo, Omnisend, and Seguno allow you to create, automate, and optimize email campaigns directly within Shopify.

Can I use Shopify marketing apps for social media campaigns? Yes, apps like Facebook & Instagram by Meta and Postscript help you manage and optimize social media ads and engagement.

What are the best apps for creating custom landing pages on Shopify? PageFly and Shogun are two popular apps for designing high-converting landing pages without coding.

How do Shopify marketing apps help with customer reviews? Apps like Yotpo and Loox make it easy to collect, display, and leverage customer reviews and photos to build trust and drive sales.

What’s the easiest way to set up referral programs on Shopify? ReferralCandy and Smile.io are top-rated apps for creating and managing referral programs.

How do Shopify apps improve the post-purchase experience? Apps like ReConvert let you customize thank-you pages and offer post-purchase upsells, enhancing customer satisfaction and increasing AOV.

What are the top Shopify apps for push notifications? PushOwl is a leading app for sending web push notifications to re-engage customers and drive traffic.

How do I measure the success of Shopify marketing apps? Use built-in analytics and reporting features to track key metrics like conversion rates, click-through rates, and ROI.

Do Shopify marketing apps work for small businesses? Absolutely. Many apps offer scalable features and affordable pricing, making them accessible for small stores.

What is the best Shopify marketing app for new store owners? Privy is a great starting point for new Shopify stores, offering email marketing, pop-ups, and SMS tools in one platform.

The post Shopify Marketing Apps: 21 Essential Tools for Ecommerce Stores appeared first on Customers.ai.

Meet LLaVA-o1: The First Visual Language Model Capable of Spontaneous, Systematic Reasoning Similar to GPT-o1

The development of vision-language models (VLMs) has faced challenges in handling complex visual question-answering tasks. Despite substantial advances in reasoning capabilities by large language models like OpenAI’s GPT-o1, VLMs still struggle with systematic and structured reasoning. Current models often lack the ability to organize information and engage in logical, sequential reasoning, limiting their effectiveness for tasks that require deep cognitive processing, particularly when dealing with multimodal inputs such as images combined with text. Traditional VLMs tend to generate immediate responses without a step-by-step reasoning approach, leading to errors and inconsistencies.

Meet LLaVA-o1

A team of researchers from Peking University, Tsinghua University, Peng Cheng Laboratory, Alibaba DAMO Academy, and Lehigh University has introduced LLaVA-o1: a visual language model capable of systematic reasoning, similar to GPT-o1. LLaVA-o1 is an 11-billion-parameter model designed for autonomous, multistage reasoning. It builds upon the Llama-3.2-Vision-Instruct model and introduces a structured reasoning process, addressing the limitations of previous VLMs with a more methodical approach. The key innovation in LLaVA-o1 is the implementation of four distinct reasoning stages: summary, caption, reasoning, and conclusion.

The model is fine-tuned using a dataset called LLaVA-o1-100k, derived from visual question answering (VQA) sources and structured reasoning annotations generated by GPT-4o. This enables LLaVA-o1 to perform multistage reasoning, extending capabilities similar to GPT-o1 into vision-language tasks, which have historically lagged behind text-based models.

Technical Details and Benefits

LLaVA-o1 employs a novel inference-time scaling technique called stage-level beam search. Unlike previous methods, such as best-of-N or sentence-level beam search, LLaVA-o1 generates multiple responses for each stage of its structured reasoning process and selects the best candidate at each step, ensuring higher-quality results. This structured approach maintains logical coherence throughout the reasoning process, leading to more accurate conclusions.

Fine-tuned from the Llama-3.2-11B-Vision-Instruct model, LLaVA-o1 shows an 8.9% improvement on multimodal reasoning benchmarks compared to its base model, even outperforming larger or closed-source competitors like Gemini-1.5-pro, GPT-4o-mini, and Llama-3.2-90B-Vision-Instruct. It achieves this with only 100,000 training samples, making LLaVA-o1 an efficient solution in terms of both performance and scalability. By employing structured thinking through distinct stages, LLaVA-o1 systematically addresses problems, minimizing reasoning errors common in other VLMs.

Importance and Results

LLaVA-o1 addresses a significant gap between textual and visual question-answering models by enabling systematic reasoning in vision-language tasks. Experimental results show that LLaVA-o1 improves performance across benchmarks like MMStar, MMBench, MMVet, MathVista, AI2D, and HallusionBench. It consistently surpasses its base model by over 6.9% across multimodal benchmarks, particularly in reasoning-intensive domains such as mathematical and scientific visual questions.

Stage-level beam search enhances the model’s reliability by generating and verifying multiple candidate responses for each stage, selecting the most appropriate one. This allows LLaVA-o1 to excel in complex visual tasks, compared to traditional inference scaling methods that can be inefficient. LLaVA-o1 demonstrates that structured responses are crucial for achieving high-quality, consistent reasoning, setting a new standard for similarly sized models.

Conclusion

LLaVA-o1 is a visual language model capable of systematic reasoning, similar to GPT-o1. Its four-stage reasoning structure, combined with stage-level beam search, sets a new benchmark for multimodal AI. By training on a relatively small yet strategically constructed dataset, LLaVA-o1 demonstrates that efficient and scalable multimodal reasoning is achievable without the massive resources required by larger closed-source models. LLaVA-o1 paves the way for future research on structured reasoning within vision-language models, promising more advanced capabilities in AI-driven cognitive processing across visual and textual domains.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.

The post Meet LLaVA-o1: The First Visual Language Model Capable of Spontaneous, Systematic Reasoning Similar to GPT-o1 appeared first on MarkTechPost.

Pleias Introduces Common Corpus: The Largest Multilingual Dataset for Pretraining Language Models

In recent years, the development of large language models has significantly advanced natural language processing (NLP). These models, trained on extensive datasets, can generate, understand, and analyze human language with remarkable proficiency. However, building such models requires substantial amounts of data, and access to high-quality multilingual datasets remains a considerable challenge. The scarcity of openly available, large-scale, and diverse training datasets has hindered researchers and developers from creating more inclusive and robust language models, especially for less widely spoken languages. Language barriers and limited representation have prevented NLP systems from reaching their full potential. Addressing these challenges requires a new approach that prioritizes multilingualism and open access in language model training.

The Release of Common Corpus

Pleias recently released the Common Corpus: the largest multilingual dataset for pretraining language models. This extensive dataset is a significant milestone for the NLP community, offering over two trillion tokens across dozens of languages, sourced from various open domains. Available on Hugging Face, the Common Corpus is part of the AI Alliance’s open dataset initiative, embodying a commitment to open-access data for research and innovation. Common Corpus is a collection that celebrates the diversity and breadth of the knowledge commons, containing five major categories of data: open culture, open government, open source, open science, and open web. From public reports to scientific publications, open culture resources like Wikipedia, and even permissively licensed code from GitHub, this dataset provides an unprecedented breadth of content for training multilingual models. The inclusion of these diverse data types makes it ideal for the pretraining of general-purpose language models that can understand and respond to nuanced, varied human communication.

Technical Details and Benefits

From a technical standpoint, the Common Corpus is an extraordinary achievement, serving as a multilingual data powerhouse. It includes curated data from open-access repositories like OpenAlex for scientific articles, government publications, GitHub for open-source software, and more. By leveraging multiple data domains, Pleias ensures that the dataset is not only vast but also represents a wide spectrum of real-world content. This diversity enables language models trained on Common Corpus to develop better contextual understanding and a deeper grasp of different genres and registers of language. Furthermore, its multilingual nature addresses the critical need for equitable representation across global languages, helping NLP researchers work toward a future where language technologies are not dominated by only English or a handful of widely spoken languages. The dataset, with its emphasis on open access, also helps in reducing the resource disparity between major research entities and independent or academic researchers, making advanced language technology more accessible.

Importance and Results

The release of the Common Corpus is a pivotal development for multiple reasons. The dataset not only sets a new benchmark in terms of size but also embodies a vision of shared knowledge, reproducibility, and inclusivity. It empowers researchers across the globe to develop language models that cater to a broader audience. By training on a rich multilingual dataset, future models can deliver more accurate, culturally aware, and contextually nuanced responses. Preliminary experiments have already shown promising results, with models trained on the Common Corpus exhibiting improved performance in zero-shot and few-shot settings across a variety of languages. This suggests that the scope of such a dataset can genuinely elevate language models beyond the typical monolingual or bilingual training paradigms, offering a real step forward for both academia and industry in tackling challenges like language preservation and ensuring the cultural inclusiveness of AI systems.

Conclusion

In conclusion, Pleias’ Common Corpus stands as a monumental contribution to the future of multilingual language modeling. By providing an open and comprehensive dataset, it addresses the challenges of data accessibility and diversity that have limited NLP development. With the dataset being openly available on platforms like Hugging Face, it also reflects a growing commitment within the AI community to prioritize collaboration and openness. As we move forward, resources like Common Corpus will be critical in shaping more democratic, fair, and inclusive AI systems that can truly serve a global audience.

Check out Common Corpus on HuggingFace. All credit for this research goes to the researchers of this project.

The post Pleias Introduces Common Corpus: The Largest Multilingual Dataset for Pretraining Language Models appeared first on MarkTechPost.

Fireworks AI Releases f1: A Compound AI Model Specialized in Complex Reasoning that Beats GPT-4o and Claude 3.5 Sonnet Across Hard Coding, Chat and Math Benchmarks

The field of artificial intelligence is advancing rapidly, yet significant challenges remain in developing and applying AI systems, particularly in complex reasoning. Many current AI solutions, including advanced models like GPT-4 and Claude 3.5 Sonnet, still struggle with intricate coding tasks, deep conversations, and mathematical reasoning. The limitations of individual models—no matter how sophisticated—lead to blind spots and inadequacies. Additionally, while the demand for specialized AI models for niche tasks is growing, integrating multiple specialized models into a cohesive system remains technically challenging and labor-intensive. This calls for a new approach to AI, one that combines the strengths of multiple models while simplifying their integration and development.

Fireworks AI’s f1: A New Compound AI Model

To address these challenges, Fireworks AI has introduced f1, a compound AI model designed for complex reasoning tasks. f1 integrates multiple open models at the inference layer, achieving improved performance across domains such as coding, chat, and mathematical problem-solving. Unlike conventional AI models that rely on a single inference system, f1 combines the strengths of various specialized models, providing developers with a powerful yet straightforward prompting interface. This release reflects Fireworks AI’s vision for the future of AI—systems that combine specialized tools and models to enhance performance, reliability, and control.

Technical Details

At its core, f1 is an open-model-based reasoning system designed to outperform even the latest powerhouse models like GPT-4 and Claude 3.5 Sonnet in complex tasks. The compound approach taken by Fireworks AI means that instead of using a monolithic model to solve every problem, f1 dynamically selects the most suitable open model for each specific part of a problem. This allows for an optimized solution process that is both efficient and effective. Developers can interact with f1 through a simple prompting mechanism, essentially treating prompts as a universal programming language for AI applications. With f1, developers can describe what they want to achieve without delving into the technical details—thereby reducing the development time and effort involved in creating AI applications. Fireworks AI currently offers two variants of f1: the standard f1 and a lighter version called f1-mini. Both are available in preview, accessible through the Fireworks AI Playground, allowing developers to experiment with the compound model capabilities firsthand.

The Importance of f1 and Benchmark Results

The strength of f1 lies in its integration of multiple models at the inference layer. By leveraging several open models, f1 breaks down complex tasks into smaller sub-tasks, each handled by the most suitable model. For example, in a challenging coding scenario, f1 may use one model for code understanding and another for debugging. This modularity allows f1 to solve problems with greater precision and ensures that each step is optimized for performance. Additionally, f1 simplifies sophisticated AI usage, making it more accessible to developers. The prompting mechanism bridges the gap between high-level goals and detailed execution, enabling developers of different skill levels to use compound AI without requiring deep expertise in machine learning.

Benchmark tests show that f1 surpasses GPT-4 and Claude 3.5 Sonnet in hard coding, conversation, and math benchmarks—areas where traditional AI models often face difficulties. This advancement demonstrates the potential of compound AI systems not only in achieving higher performance but also in providing enhanced reliability and fine-grained control. By integrating multiple models cohesively, f1 captures the benefits of specialization while reducing the limitations of individual models. Furthermore, Fireworks AI has designed f1 with usability in mind. Developers can gain early access to the f1 API by joining a waitlist, allowing them to incorporate f1’s capabilities into their projects ahead of general release. The Fireworks AI Playground also offers a free, hands-on experience with both f1 and f1-mini for those interested in exploring its potential.

Conclusion

Fireworks AI’s f1 model addresses the limitations of current AI models by using a compound approach that combines multiple specialized open models to enhance reasoning capabilities. By simplifying how developers interact with these capabilities through a universal prompting interface, f1 remains both powerful and accessible. As AI continues to evolve, the compound approach of f1 suggests a future where specialized models collaborate to solve complex challenges, offering a more efficient experience for developers. With the release of f1, Fireworks AI aims to create more flexible and efficient AI applications, marking an important step toward reshaping how we interact with AI.

Check out the Details here. Access f1 and f1-mini in preview with free access now on Fireworks AI Playground. All credit for this research goes to the researchers of this project.

The post Fireworks AI Releases f1: A Compound AI Model Specialized in Complex Reasoning that Beats GPT-4o and Claude 3.5 Sonnet Across Hard Coding, Chat and Math Benchmarks appeared first on MarkTechPost.

Build cost-effective RAG applications with Binary Embeddings in Amazon Titan Text Embeddings V2, Amazon OpenSearch Serverless, and Amazon Bedrock Knowledge Bases

Today, we are happy to announce the availability of Binary Embeddings for Amazon Titan Text Embeddings V2 in Amazon Bedrock Knowledge Bases and Amazon OpenSearch Serverless. With support for binary embeddings in Amazon Bedrock and a binary vector store in OpenSearch Serverless, you can build Retrieval Augmented Generation (RAG) applications in Amazon Bedrock Knowledge Bases while reducing memory usage and overall costs.
Amazon Bedrock is a fully managed service that provides a single API to access and use various high-performing foundation models (FMs) from leading AI companies. Amazon Bedrock also offers a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Using Amazon Bedrock Knowledge Bases, FMs and agents can retrieve contextual information from your company’s private data sources for RAG. RAG helps FMs deliver more relevant, accurate, and customized responses.
Amazon Titan Text Embeddings models generate meaningful semantic representations of documents, paragraphs, and sentences. Amazon Titan Text Embeddings takes as an input a body of text and generates a 1,024 (default), 512, or 256 dimensional vector. Amazon Titan Text Embeddings are offered through latency-optimized endpoint invocation for faster search (recommended during the retrieval step) and throughput-optimized batch jobs for faster indexing. With Binary Embeddings, Amazon Titan Text Embeddings V2 will represent data as binary vectors with each dimension encoded as a single binary digit (0 or 1). This binary representation will convert high-dimensional data into a more efficient format for storage and computation.
Amazon OpenSearch Serverless is a serverless deployment option for Amazon OpenSearch Service, a fully managed service that makes it simple to perform interactive log analytics, real-time application monitoring, website search, and vector search with its k-nearest neighbor (kNN) plugin. It supports exact and approximate nearest-neighbor algorithms and multiple storage and matching engines. It makes it simple for you to build modern machine learning (ML) augmented search experiences, generative AI applications, and analytics workloads without having to manage the underlying infrastructure.
The OpenSearch Serverless kNN plugin now supports 16-bit (FP16) and binary vectors, in addition to 32-bit floating point vectors (FP32). You can store the binary embeddings generated by Amazon Titan Text Embeddings V2 for lower costs by setting the kNN vector field type to binary. The vectors can be stored and searched in OpenSearch Serverless using PUT and GET APIs.
This post summarizes the benefits of this new binary vector support across Amazon Titan Text Embeddings, Amazon Bedrock Knowledge Bases, and OpenSearch Serverless, and gives you information on how you can get started. The following diagram is a rough architecture diagram with Amazon Bedrock Knowledge Bases and Amazon OpenSearch Serverless.

You can lower latency and reduce storage costs and memory requirements in OpenSearch Serverless and Amazon Bedrock Knowledge Bases with minimal reduction in retrieval quality.
We ran the Massive Text Embedding Benchmark (MTEB) retrieval data set with binary embeddings. On this data set, we reduced storage while observing a 25-times improvement in latency. Compared to full-precision (float32) embeddings, binary embeddings maintained 98.5% of the retrieval accuracy with re-ranking, and 97% without re-ranking. In end-to-end RAG benchmark comparisons with full-precision embeddings, Binary Embeddings with Amazon Titan Text Embeddings V2 retain 99.1% of the full-precision answer correctness (98.6% without re-ranking). We encourage customers to do their own benchmarks using Amazon OpenSearch Serverless and Binary Embeddings for Amazon Titan Text Embeddings V2.
OpenSearch Serverless benchmarks using the Hierarchical Navigable Small Worlds (HNSW) algorithm with binary vectors have unveiled a 50% reduction in search OpenSearch Computing Units (OCUs), translating to cost savings for users. The use of binary indexes has resulted in significantly faster retrieval times. Traditional search methods often rely on computationally intensive calculations such as L2 and cosine distances, which can be resource-intensive. In contrast, binary indexes in Amazon OpenSearch Serverless operate on Hamming distances, a more efficient approach that accelerates search queries.
In the following sections, we discuss how to generate binary embeddings with Amazon Titan Text Embeddings, use binary (and FP16) vectors with the vector engine, and enable the binary embedding option for Amazon Bedrock Knowledge Bases. To learn more about Amazon Bedrock Knowledge Bases, visit Knowledge Bases now delivers fully managed RAG experience in Amazon Bedrock.
Generate Binary Embeddings with Amazon Titan Text Embeddings V2
Amazon Titan Text Embeddings V2 now supports Binary Embeddings and is optimized for retrieval performance and accuracy across different dimension sizes (1024, 512, 256) with text support for more than 100 languages. By default, Amazon Titan Text Embeddings models produce embeddings at Floating Point 32 bit (FP32) precision. Although using a 1024-dimension vector of FP32 embeddings helps achieve better accuracy, it also leads to large storage requirements and related costs in retrieval use cases.
To generate binary embeddings in code, add the right embeddingTypes parameter in your invoke_model API request to Amazon Titan Text Embeddings V2:

import json
import boto3
import numpy as np

# Bedrock Runtime client used to invoke the embeddings model
rt_client = boto3.client("bedrock-runtime")

response = rt_client.invoke_model(
    modelId="amazon.titan-embed-text-v2:0",
    body=json.dumps(
        {
            "inputText": "What is Amazon Bedrock?",
            "embeddingTypes": ["binary", "float"]
        }
    )
)["body"].read()

# The binary embedding is returned under embeddingsByType
embedding = np.array(json.loads(response)["embeddingsByType"]["binary"], dtype=np.int8)

As shown in the request, you can ask for the binary embedding alone or for both binary and float embeddings. The returned binary embedding is a 1,024-length vector similar to:
array([0, 1, 1, ..., 0, 0, 0], dtype=int8)
For more information and sample code, refer to Amazon Titan Embeddings Text.
Configure Amazon Bedrock Knowledge Bases with Binary Vector Embeddings
You can use Amazon Bedrock Knowledge Bases to take advantage of Binary Embeddings with Amazon Titan Text Embeddings V2 and of binary and Floating Point 16 bit (FP16) vectors in the Amazon OpenSearch Serverless vector engine, without writing a single line of code. Follow these steps:

On the Amazon Bedrock console, create a knowledge base. Provide the knowledge base details, including name and description, and create a new service role or use an existing one with the relevant AWS Identity and Access Management (IAM) permissions. For information on creating service roles, refer to Service roles. Under Choose data source, choose Amazon S3, as shown in the following screenshot. Choose Next.
Configure the data source. Enter a name and description. Define the source S3 URI. Under Chunking and parsing configurations, choose Default. Choose Next to continue.
Complete the knowledge base setup by selecting an embeddings model. For this walkthrough, select Titan Text Embeddings v2. Under Embeddings type, choose Binary vector embeddings. Under Vector dimensions, choose 1024. Choose Quick Create a New Vector Store. This option will configure a new Amazon OpenSearch Serverless store that supports the binary data type.

You can check the knowledge base details after creation to monitor the data source sync status. After the sync is complete, you can test the knowledge base and check the FM’s responses.
Conclusion
As we’ve explored throughout this post, Binary Embeddings are now an option in the Amazon Titan Text Embeddings V2 models available in Amazon Bedrock, alongside the binary vector store in OpenSearch Serverless. These features significantly reduce memory and disk needs in Amazon Bedrock and OpenSearch Serverless, resulting in fewer OCUs for the RAG solution. You’ll also see better performance and lower latency, with some impact on result accuracy compared to using the full float data type (FP32). Although the drop in accuracy is minimal, you have to decide whether it suits your application. The specific benefits will vary based on factors such as the volume of data, search traffic, and storage requirements, but the examples discussed in this post illustrate the potential value.
Binary Embeddings support in Amazon OpenSearch Serverless, Amazon Bedrock Knowledge Bases, and Amazon Titan Text Embeddings V2 is available today in all AWS Regions where the services are already available. Check the Region list for details and future updates. To learn more about Amazon Bedrock Knowledge Bases, visit the Amazon Bedrock Knowledge Bases product page. For more information regarding Amazon Titan Text Embeddings, visit Amazon Titan in Amazon Bedrock. For more information on Amazon OpenSearch Serverless, visit the Amazon OpenSearch Serverless product page. For pricing details, review the Amazon Bedrock pricing page.
Give the new feature a try in the Amazon Bedrock console today. Send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS contacts and engage with the generative AI builder community at community.aws.

About the Authors
Shreyas Subramanian is a principal data scientist who helps customers use generative AI and deep learning to solve their business challenges with AWS services. Shreyas has a background in large-scale optimization and ML, and in the use of ML and reinforcement learning to accelerate optimization tasks.
Ron Widha is a Senior Software Development Manager with Amazon Bedrock Knowledge Bases, helping customers easily build scalable RAG applications.
Satish Nandi is a Senior Product Manager with Amazon OpenSearch Service. He is focused on OpenSearch Serverless and has years of experience in networking, security and AI/ML. He holds a bachelor’s degree in computer science and an MBA in entrepreneurship. In his free time, he likes to fly airplanes and hang gliders and ride his motorcycle.
Vamshi Vijay Nakkirtha is a Senior Software Development Manager working on the OpenSearch Project and Amazon OpenSearch Service. His primary interests include distributed systems.

Automate cloud security vulnerability assessment and alerting using Am …

Cloud technologies are progressing at a rapid pace. Businesses are adopting new innovations and technologies to create cutting-edge solutions for their customers. However, security is a major concern when adopting the latest technologies. Enterprises often rely on reactive security monitoring and notification techniques, but those techniques might not be sufficient to safeguard them from vulnerable assets and third-party attacks. You need to establish proper security guardrails in the cloud environment and create a proactive monitoring practice to strengthen your cloud security posture and maintain required compliance standards.
To address this challenge, this post demonstrates a proactive approach for security vulnerability assessment of your accounts and workloads, using Amazon GuardDuty, Amazon Bedrock, and other AWS serverless technologies. This approach aims to identify potential vulnerabilities proactively and provide your users with timely alerts and recommendations, avoiding reactive escalations and other damages. By implementing a proactive security monitoring and alerting system, users can receive personalized notifications in preferred channels like email, SMS, or push notifications. These alerts concisely summarize the identified security issues and provide succinct troubleshooting steps to fix the problem promptly, without the need for escalation.
GuardDuty is a threat detection service that continuously monitors for malicious activity and unauthorized behavior across your AWS environment. GuardDuty combines machine learning (ML), anomaly detection, and malicious file discovery, using both AWS and industry-leading third-party sources, to help protect AWS accounts, workloads, and data. GuardDuty integrates with Amazon EventBridge by creating an event for EventBridge for new generated vulnerability findings. This solution uses a GuardDuty findings notification through EventBridge to invoke AWS Step Functions, a serverless orchestration engine, which runs a state machine. The Step Functions state machine invokes AWS Lambda functions to get a findings summary and remediation steps through Amazon Bedrock.
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.
By using generative AI FMs on Amazon Bedrock, users can quickly analyze vast amounts of security data to identify patterns and anomalies that may indicate potential threats or breaches. Furthermore, by recognizing patterns in network traffic, user behavior, or system logs, such FMs can help identify suspicious activities or security vulnerabilities. Generative AI can make predictions about future security threats or attacks by analyzing historical security data and trends. This can help organizations proactively implement security measures to prevent breaches before they occur. This form of automation can help improve efficiency and reduce the response time to security threats.
Solution overview
The solution uses the built-in integration between GuardDuty and EventBridge to raise an event notification for any new vulnerability findings in your AWS accounts or workloads. You can configure the EventBridge rule to filter the findings based on severity so that only high-severity findings are prioritized first. The EventBridge rule invokes a Step Functions workflow. The workflow invokes a Lambda function and passes the GuardDuty findings details. The Lambda function calls Anthropic’s Claude 3 Sonnet model through Amazon Bedrock APIs with the input request. The API returns the finding summarization and mitigation steps. The Step Functions workflow sends findings and remediation notifications to the subscribers or users using Amazon Simple Notification Service (Amazon SNS). In this post, we use email notification, but you can extend the solution to send mobile text or push notifications.
The solution uses the following key services:

Amazon Bedrock – The solution integrates with Anthropic’s Claude 3 Sonnet model to provide summarized visibility into security vulnerabilities and troubleshooting steps.
Amazon EventBridge – EventBridge is a serverless event bus that helps you receive, filter, transform, route, and deliver events.
Amazon GuardDuty – The solution uses the threat detection capabilities of GuardDuty to identify and respond to threats.
IAM – With AWS Identity and Access Management (IAM), you can specify who or what can access services and resources in AWS, centrally manage fine-grained permissions, and analyze access to refine permissions across AWS. Follow the principle of least privilege to safeguard your workloads.
AWS Lambda – Lambda is a compute service that runs your code in response to events and automatically manages the compute resources, making it the fastest way to turn an idea into a modern, production, serverless application.
Amazon SNS – Amazon SNS is a managed service that provides message delivery from publishers to subscribers.
AWS Step Functions – Step Functions is a visual workflow service that helps developers use AWS services to build distributed applications, automate processes, orchestrate microservices, and create data and ML pipelines.

The following diagram illustrates the solution architecture.

The workflow includes the following steps:

GuardDuty invokes an EventBridge rule. The rule can filter the findings based on severity.

The findings are also exported to an Amazon Simple Storage Service (Amazon S3) bucket.

The EventBridge rule invokes a Step Functions workflow.
The Step Functions workflow calls a Lambda function to get the details of the vulnerability findings.
The Lambda function creates a prompt with the vulnerability details and passes it to Anthropic’s Claude 3 using Amazon Bedrock APIs. The function returns the response to the Step Functions workflow.
The Step Functions workflow calls an SNS topic with the findings details to send an email notification to subscribers. You can use your support or operations team as the subscriber for this use case.
Amazon SNS sends the email to the subscribers.
The Step Functions workflow and Lambda function logs are stored in Amazon CloudWatch. For more details, see Configure logging in the Step Functions console to store logs in CloudWatch. By default, CloudWatch logs use server-side encryption for the log data at rest.

Solution benefits
The solution provides the following benefits for end-users:

Real-time visibility – The intuitive omnichannel support solution provides a comprehensive view of your cloud environment’s security posture
Actionable insights – You can drill down into specific security alerts and vulnerabilities generated using generative AI to prioritize and respond effectively
Proactive customizable reporting – You can troubleshoot various errors before escalation by retrieving a summary of reports with action recommendations

Prerequisites
Complete the following prerequisite steps:

Enable GuardDuty in your account to generate findings.
Provision least privilege IAM permissions for AWS resources like Step Functions and Lambda functions to perform desired actions:

The Step Functions IAM role should have IAM policies to invoke the Lambda function and publish to the SNS topic.
The Lambda function needs the AWSLambdaBasicExecutionRole managed policy to publish logs and the bedrock:InvokeModel permission to call Amazon Bedrock.
Edit the access policy of the SNS topic to only allow Step Functions to publish messages to the topic.

Request access to Anthropic’s Claude 3 on Amazon Bedrock.
Turn on encryption at the SNS topic to enable server-side encryption.

Deploy the solution
Complete the following steps to deploy the solution:

On the EventBridge console, create a new rule for GuardDuty findings notifications.

The example rule in the following screenshot filters high-severity findings at severity level 8 and above. For a complete list of GuardDuty findings, refer to the GetFindings API.

On the Lambda console, create a Lambda function that will take the findings as the input and call the Amazon Bedrock API to get the summarization and mitigation steps from Anthropic’s Claude 3.

You need to provide proper IAM permissions to your Lambda function to call Amazon Bedrock APIs. You can configure parameters in the environment variables in the Lambda function. The following function uses three configuration parameters:

modelId is set as anthropic.claude-3-sonnet-20240229-v1:0
findingDetailType is set as GuardDuty Finding to filter the payload
source is set as aws.guardduty to only evaluate GuardDuty findings

import json
import boto3
import urllib.parse
import os

region = os.environ['AWS_REGION']
model_Id = os.environ['modelId']
finding_detail_type = os.environ['findingDetailType']
finding_source = os.environ['source']

# Bedrock client used to interact with APIs around models
bedrock = boto3.client(service_name='bedrock', region_name=region)

# Bedrock Runtime client used to invoke and question the models
bedrock_runtime = boto3.client(service_name='bedrock-runtime', region_name=region)

evaluator_response = []
max_tokens = 512
top_p = 1
temp = 0.5
system = ""

def lambda_handler(event, context):
    message = ""
    try:
        file_body = json.loads(json.dumps(event))
        print(finding_detail_type)
        print(finding_source)
        # Only process events that match the configured detail type and source
        if file_body['detail-type'] == finding_detail_type and file_body['source'] == finding_source and file_body['detail']:
            print(f"File contents: {file_body['detail']}")
            description = file_body["detail"]["description"]
            finding_arn = file_body["detail"]["arn"]
            try:
                body = createBedrockRequest(description)
                message = invokeModel(body)
                print(message)
                evaluator_response.append(message)
                evaluator_response.append(finding_arn)
            except Exception as e:
                print(e)
                print('Error calling model')
        else:
            message = "Invalid finding source"
    except Exception as e:
        print(e)
        print('Error getting finding id from the guard duty record')
        raise e
    return message

def createBedrockRequest(description):
    prompt = "You are an expert in troubleshooting AWS logs and sharing details with the user via an email draft as stated in <description>. Do NOT provide any preamble. Draft a professional email summary of details as stated in description. Write the recipient as - User in the email and sender in the email should be listed as - Your Friendly Troubleshooter. Skip the preamble and directly start with subject. Also, provide detailed troubleshooting steps in the email draft." + "<description>" + description + "</description>"
    messages = [{"role": "user", "content": [{"type": "text", "text": prompt}]}]
    body = json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "messages": messages,
            "temperature": temp,
            "top_p": top_p,
            "system": system
        }
    )
    return body

def invokeModel(body):
    response = bedrock_runtime.invoke_model(body=body, modelId=model_Id)
    response_body = json.loads(response.get('body').read())
    message = response_body.get('content')[0].get("text")
    return message

It’s crucial to perform prompt engineering and follow prompting best practices in order to avoid hallucinations or non-coherent responses from the LLM. In our solution, we created the following prompt to generate responses from Anthropic’s Claude 3 Sonnet:
Prompt = "You are an expert in troubleshooting AWS logs and sharing details with the user via an email draft as stated in <description>. Do NOT provide any preamble. Draft a professional email summary of details as stated in description. Write the recipient as - User in the email and sender in the email should be listed as - Your Friendly Troubleshooter. Skip the preamble and directly start with subject. Also, provide detailed troubleshooting steps in the email draft." + "<description>" + description + "</description>"
The prompt wraps the issue description in XML tags so the model treats it as distinct input, and it explicitly instructs the model to skip any preamble and jump directly into generating the answer.

On the Amazon SNS console, create an SNS topic to send notifications and add the emails of the subscribers.

The following screenshot shows the topic details with some test subscribers.

Now you can create the Step Functions state machine and integrate the Lambda and Amazon SNS calls in the workflow.

On the Step Functions console, create a new state machine and add the Lambda and Amazon SNS optimized integration.

You need to provide appropriate IAM permissions to the Step Functions role so it can call Lambda and Amazon SNS.
The following diagram illustrates the Step Functions state machine.

The following sample code shows how to use the Step Functions optimized integration with Lambda and Amazon SNS.

On the EventBridge console, add the Step Functions state machine as the target of the EventBridge rule created earlier.

As seen in the following screenshot, the rule needs to have proper IAM permission to invoke the Step Functions state machine.

Test the solution
You can test the setup by generating some sample findings on the GuardDuty console. Based on the sample findings volume, the test emails will be triggered accordingly.

Based on a sample generation, the following screenshot shows an email from Amazon SNS about a potential security risk in an Amazon Elastic Container Service (Amazon ECS) cluster. The email contains the vulnerability summary and a few mitigation steps to remediate the issue.

The following screenshot is a sample email notification about potential communication with a Bitcoin-related IP address.

This proactive approach enables users to take immediate action and remediate vulnerabilities before they escalate, reducing the risk of data breaches or security incidents. It empowers users to maintain a secure environment within their AWS accounts, fostering a culture of proactive security awareness and responsibility. Furthermore, a proactive security vulnerability assessment and remediation system can streamline the resolution process, minimizing the time and effort required to address security concerns.
Clean up
To avoid incurring unnecessary costs, complete the following steps:

Delete the following AWS resources associated with this solution:

Step Functions state machine
Lambda functions
SNS topic

You can disable GuardDuty if you’re no longer using it to avoid S3 bucket storage cost.

By cleaning up the resources created for this solution, you can prevent any ongoing charges to your AWS account.
Conclusion
By providing users with clear and actionable recommendations, this solution enables them to swiftly implement the necessary fixes, reducing the likelihood of untracked or lost tickets and enabling prompt resolution. Adopting this proactive approach not only enhances the overall security posture of AWS accounts, but also promotes a collaborative and efficient security practice within the organization, fostering a sense of ownership and accountability among users.
You can deploy this solution and integrate it with other services to have a holistic omnichannel solution. To learn more about Amazon Bedrock and AWS generative AI services, refer to the following workshops:

Amazon Bedrock Workshop
Amazon Q Developer Deep Dive

About the Authors
Shikhar Kwatra is a Sr. Partner Solutions Architect at Amazon Web Services, working with leading Global System Integrators. He has earned the title of one of the Youngest Indian Master Inventors with over 500 patents in the AI/ML and IoT domains. Shikhar aids in architecting, building, and maintaining cost-efficient, scalable cloud environments for the organization, and supports GSI partners in building strategic industry solutions on AWS.
Rajdeep Banerjee is a Senior Partner Solutions Architect at AWS helping strategic partners and clients in the AWS cloud migration and digital transformation journey. Rajdeep focuses on working with partners to provide technical guidance on AWS, collaborating with them to understand their technical requirements, and designing solutions to meet their specific needs. He is a member of the Serverless technical field community. Rajdeep is based in Richmond, Virginia.

DXC transforms data exploration for their oil and gas customers with L …

This post is co-written with Kaustubh Kambli from DXC Technology.
DXC Technology is an IT services leader with more than 130,000 employees supporting 6,000 customers in over 70 countries. DXC builds offerings across industry verticals to deliver accelerated value to their customers.
One of the sectors DXC has deep expertise in is energy. The oil and gas industry relies on discovering new drilling sites to drive growth. Data-driven insights can accelerate the process of identifying potential locations and improve decision-making. For the largest companies in the sector, shaving even a single day off the time to first oil can impact operational costs and revenue by millions of dollars.
In this post, we show you how DXC and AWS collaborated to build an AI assistant using large language models (LLMs), enabling users to access and analyze different data types from a variety of data sources. The AI assistant is powered by an intelligent agent that routes user questions to specialized tools that are optimized for different data types such as text, tables, and domain-specific formats. It uses the LLM’s ability to understand natural language, write code, and reason about conversational context.
Data plays a key role in identifying sites for oil exploration and in accelerating the time to extract oil from those sites, but data in this industry is scattered, non-standard, and of various types. These companies have remote sites, offshore drilling locations, branch offices, and corporate offices. Relevant data is also in various formats, ranging from spreadsheets to complex datasets like satellite images and GIS data. Moreover, there are industry-specific data formats like Log ASCII standard (LAS).
The assistant architecture consists of several key components powered by Anthropic’s Claude on Amazon Bedrock. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.
In this project, Amazon Bedrock enabled us to seamlessly switch between different variants of Anthropic’s Claude models. We used smaller, faster models for basic tasks such as routing, and more powerful models for complex processes such as code generation. Additionally, we took advantage of Amazon Bedrock Knowledge Bases, a managed service that enhances the LLM’s knowledge by integrating external documents. This service seamlessly integrates with FMs on Bedrock and can be set up through the console in a few minutes.
Solution overview
The solution is composed of several components powered by Anthropic’s Claude models on Bedrock:

Router – The router analyzes the user query and routes it to the appropriate tool
Custom built tools – These tools are optimized for different data sources such as file systems, tables, and LAS files
Conversational capabilities – These capabilities enable the model to understand context and rewrite queries when needed

We also use an Amazon Simple Storage Service (Amazon S3) bucket to store the data. The data is indexed by relevant tools when deploying the solution, and we use signed S3 URLs to provide access to the relevant data sources in the UI.
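For instance, a time-limited link to a matched object can be generated with boto3; the bucket name, key, and expiry below are illustrative:

import boto3

s3 = boto3.client("s3")

def presign(bucket: str, key: str, expires_in: int = 3600) -> str:
    # Return a temporary URL the UI can embed to expose a data source
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )

url = presign("exploration-data-bucket", "teapot_dome/well_logs/example.LAS")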
The following diagram illustrates the solution architecture.

In the following sections, we go over these components in detail. The examples presented in this post use the Teapot dome dataset, which describes geophysical and well log data from the Rocky Mountain Oilfield Testing Center (RMOTC) related to the Tensleep Formation and is available for educational and research purposes.
LLM-powered router
The types of questions that the chatbot can be asked can be broken down into distinct categories:

File name questions – For example, “How many 3D seg-y files do we have?” For these questions, we don’t need to look at the file content; we only need to filter by file extension and count.
File content questions – For example, “What can you say about the geology of teapot dome?” For these questions, we need to do semantic search on the file content.
Production questions – For example, “How much oil did API 490251069400 produce on March 2001?” For these questions, we need to filter the production Excel sheet (here, match on API number) and make operations on the columns (here, sum on the oil production column).
Directional survey questions – For example, “What APIs have a total depth of more than 6000 ft?” For these questions, we need to filter the directional survey Excel sheet. The process is similar to the production questions, but the data differs.
LAS files questions – For example, “What log measurements were acquired for API 490251069400?” For these questions, we need to open and process the LAS file to look at the measurements present in the file.

The way to deal with each of these questions requires different processing steps. We can design LLM-powered tools that can address each question type, but the first step upon receiving the user query is to route it to the right tool. For this, we use Anthropic’s Claude v2.1 on Amazon Bedrock with the following prompt:

routing_prompt = """

Human: You are an AI assistant that is an expert in Oil and Gas exploration.

    Use the following information as background to categorize the question

    – An API well number or API# can have up to 14 digits, sometimes divided
    by dashes.
    – There can be different unique identifiers for wells other than API #.
    – .las or .LAS refers to Log ASCII file format. It is a standard file
    format for storing well log data, which is crucial in the exploration and production of oil and gas. Well logs are detailed records of the geological formations penetrated by a borehole, and they are essential for understanding the subsurface conditions.

    Determine the right category of the question to route it to the appropriate service by following the instructions below

    – Respond with a single word (the category name).
    – Use the information provided in <description> to determine the category of
    the question.
    – If you are unable to categorize the question or it is not related to one of
    the below categories then return “unknown”.
    – Use the category names provided below. Do not make up new categories.
    – If the category is ambiguous then output all the categories that are relevant
    as a comma separated list.

    <categories>
        <category>
            <name>filename</name>
            <description>The questions about searching for files or objects or
            those related to counting files of specific types such as .pdf, .las, .xls, .sgy etc.
            </description>
        </category>

        <category>
            <name>production</name>
            <description>well production related information. This can correlate API#, Well, Date of production, Amount Produced, Formation, Section </description>
        </category>

        <category>
            <name>las</name>
            <description>related to log data or .las or .LAS or Log ASCII files.
            Except questions related
            to searching or counting the files with .las extension.
            Those belong to filesystem category. </description>
        </category>

        <category>
            <name>directional_survey</name>
            <description>directional survey contains information about multiple
            wells and associates API, Measured Depth, Inclination and Azimuth
            </description>
        </category>

        <category>
            <name>knowledge_base</name>
            <description>related to oil and gas exploration but does not fit in any of the categories above, including seismic, logging, and core analysis related questions.
            </description>
        </category>

        <category>
            <name>unrelated</name>
            <description> Question does not belong to one of the above categories and it is not related to oil and gas exploration in general. </description>
        </category>

    </categories>

    Here is the question

    <question>
    {query}
    </question>

    Return your answer in the following format

    <answer>
        <reason>$REASON_JUSTIFYING_CATEGORY</reason>
        <labels>$COMMA_SEPARATED_LABELS</labels>
    </answer>
"""

Using XML tags in the output allows you to parse out the right category for the question. You can then pass the query down to the relevant tool for further processing. Note that with the release of new powerful Anthropic models, you could also use Anthropic’s Claude Haiku on Amazon Bedrock to improve latency for the routing.
The prompt also includes guardrails to make sure queries not pertaining to oil and gas data are gently dismissed.
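A minimal sketch of how the routing step can be wired together, assuming a call_claude helper that sends a prompt to Claude on Amazon Bedrock and returns the completion text (the helper itself is not shown):

import re

def route_query(query: str, call_claude) -> list[str]:
    # Fill the {query} placeholder in the routing prompt and call the model
    completion = call_claude(routing_prompt.format(query=query))
    # Parse the category names out of the <labels> tag
    match = re.search(r"<labels>(.*?)</labels>", completion, re.DOTALL)
    if not match:
        return ["unknown"]
    return [label.strip() for label in match.group(1).split(",") if label.strip()]

# Example: route_query("How many 3D seg-y files do we have?", call_claude)
# should return ["filename"], which is then dispatched to the file search tool.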
LLM-powered tools
To optimally handle the variety of tasks for the chatbot, we built specialized tools. The tools that we built are data-type specific (text, tables, and LAS), except for the file search tool, which is task specific.
File search tool
When searching for files and information, we identified two distinct types of search. One type pertains to identifying files based on the name or extension; the other requires analyzing the contents of the file to answer a question. We call the first type file name-based search and the second semantic content-based search.
File name-based search
For this tool, we don’t need to look at the contents of the file; only at the file name. To initialize the tool, we first crawl the S3 bucket containing the data to get a list of the available files. Then for each query, the steps are as follows:

LLM call to extract file extension and keywords from the query. When searching for relevant files to answer a query, we can look for specific file extensions or keywords that might be present in the content of the files. Our approach is to first use an LLM to analyze the user’s query and extract potential file extensions and keywords that could be relevant. We then search through the list of available files, looking for matches to those file extensions and keywords identified by the LLM. This allows us to narrow down the set of files that might contain information pertinent to answering the user’s question. Because we’re working with Anthropic’s Claude models, we ask the LLM to format its answer using XML tags. This structured output allows us to parse and extract file extensions and keywords from the answer. For instance, if the question is “Do we have any *.SGY files,” the LLM response should be <file-extension>.sgy</file-extension> <keywords></keywords> because there are no keywords. On the other hand, if the question is “Can you show me the 2Dseismic base map,” the response should be <file-extension></file-extension> <keywords>2D, seismic, base, map</keywords>.
Retrieve files that match the extension or keywords identified by the LLM. Retrieval is done by doing simple string matching over the list of available files. If a file extension is extracted by the LLM, simple string matching is done on the end of the file name; if keywords are extracted by the LLM, string matching is done for each of the keywords.
LLM call to confirm that retrieved files match the user query, and provide a final answer. To reach the final answer, we build a prompt containing the user query and the files retrieved at Step 2. We also give specific output formatting instructions to the LLM. Similar to Step 1, we rely on an XML structure to parse and extract desired information. In this step, the desired outputs are as follows:

<success> – Whether the search was successful.
<resources> – The list of Amazon S3 locations that the LLM deems relevant to the user query.
<answer> – The final answer from the LLM.

To be mindful of the limited number of input and output tokens, we also implement controls to make sure the retrieved context isn’t too large and that the LLM’s answer isn’t cut off, which can happen if there are too many resources that match the user query.
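Concretely, steps 1 and 2 boil down to parsing the LLM's XML answer and doing simple string matching over the crawled file list. The following is a rough sketch under those assumptions; the tag names follow the example above and the helper names are illustrative:

import re

def parse_search_hints(llm_answer: str) -> tuple[str, list[str]]:
    # Pull the file extension and keywords out of the LLM's XML-formatted answer
    ext_match = re.search(r"<file-extension>(.*?)</file-extension>", llm_answer, re.DOTALL)
    kw_match = re.search(r"<keywords>(.*?)</keywords>", llm_answer, re.DOTALL)
    extension = ext_match.group(1).strip().lower() if ext_match else ""
    keywords = []
    if kw_match and kw_match.group(1).strip():
        keywords = [k.strip().lower() for k in kw_match.group(1).split(",")]
    return extension, keywords

def match_files(available_files: list[str], extension: str, keywords: list[str]) -> list[str]:
    # Keep files whose name ends with the extension or contains any keyword
    matches = []
    for path in available_files:
        name = path.lower()
        if (extension and name.endswith(extension)) or any(k in name for k in keywords):
            matches.append(path)
    return matches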
The following screenshot is an example of a query that’s answered using that tool. It shows the query “show me the .sgy files” and the tool’s response, which includes a list of files.

Semantic content-based search
The implementation for semantic content-based search relies on Amazon Bedrock Knowledge Bases. Amazon Bedrock Knowledge Bases provides a seamless way to implement semantic search by pointing the service to an S3 bucket containing the data. The managed service takes care of the processing, chunking, and data management when files are added or deleted from the bucket. For setup instructions, see Knowledge Bases now delivers fully managed RAG experience in Amazon Bedrock.
For a given user query that’s passed to the tool, the steps are as follows:

Use the retrieve API from Amazon Bedrock Knowledge Bases to retrieve documents semantically close to the query.
Construct a prompt with the retrieved documents formatted with XML tags—<content> for text content and <location> for the corresponding Amazon S3 location.
Call Anthropic’s Claude v2.1 model with the prompt to get the final answer. Similarly to the file name-based search, we instruct the LLM to use <success>, <answer>, and <resources> tags in the answer.

Using the retrieve_and_generate API instead of the retrieve API would produce the answer in a single step, but the two-step approach gives us more flexibility in the prompting to get the output in the desired format.
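A sketch of the retrieval step using the Bedrock Agent Runtime retrieve API; the knowledge base ID and result count are placeholders:

import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

def retrieve_chunks(query: str, kb_id: str = "KBID12345", top_k: int = 5) -> list[dict]:
    # Fetch the document chunks semantically closest to the query
    response = agent_runtime.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={"text": query},
        retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": top_k}},
    )
    # Keep the text and S3 location so they can be wrapped in <content> and <location> tags
    return [
        {
            "content": result["content"]["text"],
            "location": result["location"]["s3Location"]["uri"],
        }
        for result in response["retrievalResults"]
    ]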
The following screenshot is an example of a question answered using the semantic search tool. It shows the query “what information do you have about the geology of teapot dome?” and the tool’s response.

Tables tool
This tool is designed to filter tables and compute certain metrics from the information they contain. It uses the LLM’s ability to write Python code for data analysis. We implemented a generic tables tool that takes the following as input:

An Excel or CSV file
A description of the table (optional)
Table-specific instructions (optional)

In practice, with every new CSV or Excel file, we create a standalone tool for the router. This means that the tables tool applied to the production Excel sheet constitutes the production tool, whereas the tables tool coupled with the directional survey Excel sheet constitutes the directional survey tool.
Some out-of-the-box data analysis tools, such as LangChain’s Pandas agent, are available in open source libraries. The way these agents work is that they use an LLM to generate Python code, execute the code, and send the result of the code back to the LLM to generate a final response. However, for certain data analysis tasks, it would be preferable to directly output the result of Python code. Having an LLM generate the response as an extra step after the code execution introduces both latency and a risk for hallucination.
For example, many sample questions require filtering a DataFrame and potentially returning dozens of entries. The ability to filter a DataFrame and return the filtered results as an output was essential for our use case. To address this limitation, we wanted the LLM to generate code that we could run to obtain the desired output directly, so we built a custom agent to enable this functionality. Our custom agent also has the ability to self-correct if the generated code outputs an error. The main difference with traditional code-writing agents is that after the code is run, we return the output, whereas with traditional agents, this output is passed back to the agent to generate the final response. In our example with filtering and returning a large DataFrame, passing the DataFrame back to the agent to generate the final response would have the LLM rewrite that large DataFrame with risk of either exceeding the context window or hallucinating some of the data.
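A simplified version of that custom agent loop, assuming a call_claude helper and a pandas DataFrame df already loaded from the CSV or Excel file; the retry count and prompt wording are illustrative:

import re
import pandas as pd

def run_tables_tool(question: str, df: pd.DataFrame, call_claude, max_retries: int = 2):
    # Ask the LLM for pandas code that stores its output in `result`
    prompt = (
        "You are a data analyst. Write Python code using the pandas dataframe `df` to answer "
        "the question. Store the final output in a variable named `result` and return the "
        f"code inside <code></code> tags.\n<question>{question}</question>"
    )
    for _ in range(max_retries + 1):
        completion = call_claude(prompt)
        match = re.search(r"<code>(.*?)</code>", completion, re.DOTALL)
        if not match:
            continue
        namespace = {"df": df, "pd": pd}
        try:
            exec(match.group(1), namespace)  # run the generated analysis code
            return namespace["result"]       # return the output directly, no extra LLM call
        except Exception as error:
            # Self-correction: feed the error back and ask for fixed code
            prompt += f"\nYour previous code raised: {error}. Please return corrected code."
    raise RuntimeError("Tables tool failed to produce runnable code.")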
The following screenshot is an example of a question answered using the production data tool, which is the tables tool applied to the production data CSV file. It shows the query “What were the top 3 oil producing wells in March 2024” and the tool’s response. The response includes a DataFrame with the top 3 oil producing wells as well as the logic behind how the DataFrame was obtained.

The following code is the LLM response generated in the background; you can see in <code> tags the code that is being run to get the result in the data section of the UI. We explicitly prompt the LLM to store the final result in a variable called result so we can send it back to the user in the frontend.

<scratchpad>
To find wells with total depth greater than 6000 feet, I need to:

1. Filter the df dataframe on the 'Total Depth' column
2. Check if the value is greater than 6000
3. Handle any NaN values so they are not included
</scratchpad>

<code>
import pandas as pd
import numpy as np

result = df[df['Total Depth'].replace({np.nan: 0}) > 6000]
</code>

<answer>
I filtered the dataframe on the 'Total Depth' column, replaced any NaN values with 0 so they would not be included, and checked for values greater than 6000 feet. The final filtered dataframe is stored in the variable 'result'.
</answer>

LAS tool
The LAS tool is built to analyze Log ASCII Standard (LAS) files, which contain a wide variety of log curves and data to fully characterize the drilled formations and support petroleum engineering analysis. This file format, which is in essence a structured text file, can be efficiently queried and analyzed using the lasio Python library.
Similarly to the tables tool, we use the code-writing abilities of the LLM to generate Python code using the lasio library. We used the XML agent implemented in LangChain, along with the Python REPL tool, which allows the agent to run Python code.
Because lasio isn’t a common Python library, we give detailed instructions in the prompt on how to use it properly. The following code shows an example of the extract from the larger prompt:

""" ...
If the question is to display particular log data, you can use the `lasio` package
in Python to load every LAS file into a dataframe and display them iteratively as
follows.

    import lasio
    # Extract the LAS file names as a list from the dataframe df
    las_files = df[df['API Number'] == api_num]['las_files'].values[0]
    print(las_files)

    for las_file in las_files:  # enumerate the LAS file names in the list
        las = lasio.read(las_file)  # read a single LAS file
        las_df = las.df()  # convert the LAS data into a dataframe
        print(las_df)  # print the dataframe
... """

The following screenshot shows an example of a question answered using the LAS tool. It shows the query “What log measurements were acquired for API #490251107500” and the tool’s response, which includes processed data from the LAS files.

The following code is the LLM’s response in the backend, which contains the relevant Python code:

<tool>Python_REPL</tool><tool_input>
import pandas as pd
import lasio

df = pd.read_parquet("/home/ec2-user/SageMaker/AmazonSageMaker-DXC-GAIIC-Opibus-Repo/data/processed/teapot_dome/las_index.parq")

api_num = 490251107500
las_files = df[df['API Number'] == api_num]['las_files'].values[0]

print(f"LAS files for API #{api_num}: {las_files}")

log_measurements = []
for las_file in las_files:
    las = lasio.read(las_file)
    for curve in las.curves:
        if curve.mnemonic not in log_measurements:
            log_measurements.append(curve.mnemonic)
            
print(f"Log measurements acquired: {log_measurements}")

The following is the output of the code run by the Python REPL tool, which is returned to the LLM to generate its final answer:

"""
LAS files for API #490251107500: ['/home/ec2-user/SageMaker/AmazonSageMaker-DXC-GAIIC-Opibus-Repo/data/raw/teapot_dome/Well Log Data/LAS_log_files/Shallow_LAS_files/49025110750000_480593.LAS']
Log measurements acquired: ['DEPT', 'GRR', 'RILD', 'RILM', 'RLL', 'SPR']
"""

Lastly, we have the LLM’s final answer in the backend, before it’s parsed and returned to the UI:

 <final_answer>
    <data>
        <log_measurements>
            DEPT, GRR, RILD, RILM, RLL, SPR
        </log_measurements>
    </data>
    <resources>
        <resource>
            <location>/home/ec2-user/SageMaker/AmazonSageMaker-DXC-GAIIC-Opibus-Repo/data/raw/teapot_dome/Well Log Data/LAS_log_files/Shallow_LAS_files/49025110750000_480593.LAS</location>
        </resource>
    </resources>
 </final_answer>

Conversational capabilities
The basic router handles a single user query and isn’t aware of chat history. However, conversational context is an essential part of the user experience. For instance, when a user asks “What API produced the most oil in 2010,” a natural follow-up question would be “What measurements do we have for that API,” in which case we need to recall the API number from the previous context, or “What about 2011,” in which case we need to recall the fact that the question is to find the API that produced the most oil. To enable this ability to add follow-up questions, we added another layer before the router that takes the user query and the conversation history and rewrites the user query with context that might be missing from it. We can also use this query-rewriting layer to directly translate or summarize previous responses, without having to go to the router, which saves time for simple queries.
The following is the sample prompt for context-aware query rewriting. We give the LLM two choices: either directly reply to the question if it's a translation or summarization of a previous interaction, because this doesn't require the use of tools, or rewrite the query to forward it to an expert (the router plus the tool framework). To differentiate between the options, the LLM can use either <answer> tags or <new_query> tags in its reply. In both cases, we ask the LLM to start out by using <think> tags to logically think about which one is more appropriate. If the <answer> tag is present in the LLM's reply, we directly forward that answer to the user. Otherwise, if the <new_query> tag is present, we forward that new query to the router for appropriate tool use. We also added few-shot examples to the prompt to make the query rewriting process more explicit for the LLM, and in our experience they were instrumental to the success of query rewriting.

query_rewriting_prompt = """
You are an AI assistant that helps a human answer oil and gas questions.
You only know how to translate or rewrite previous interactions.
If the human asks for oil and gas specific knowledge, or to count and find specific
files, you should rewrite the query so it can be forwarded to an expert.
If the human tries to ask a question that is not related to oil and gas,
you should politely tell them that only oil and gas related questions are supported.

Here is the conversation between the human and the expert so far.
H is the human and E is the expert:
<history>
{history}
</history>

Here is the new query
<query>
{query}
</query>

If you can answer the question, your answer should be formatted as follows.
In the example, H is the human, E is the expert and A is you, the assistant.
<example>
H: How many wells are in section 24?
E: There are 42 wells
H: Can you rewrite that in French?
A: <think> This is a translation, I can answer.</think>
<answer>Il y a 42 puits.</answer>
</example>

<example>
H: Can you summarize that in one sentence?
A: <think> This is just rewriting, I can summarize the previous reply and
answer directly.</think>
<answer>Il y a 42 puits.</answer>
</example>

<example>
H: Who’s the queen of England?
A: <think>This is unrelated, I can tell the user I can’t answer.</think>
<answer>I am sorry but I can only answer questions about your files.</answer>
</example>

If you need to forward the question to the expert, your answer should be as follows
<example>
H: What is the depth of well x250?
E: It’s 2000 ft
H : What about well y890?
A: <think>This requires expert knowledge and some context, and I need to rewrite
the query before I ask the expert.</think>
<new_query>What is the depth of well y890?</new_query>
</example>

<example>
H: How many pdf files do I have?
A: <think>This requires to look into the file system,
I need to forward the question to the expert.</think>
<new_query>How many pdf files do I have?</new_query>
</example>

Remember, You only know how to translate or rewrite previous interactions.
If the human asks for anything other than rewriting or translating,
you should rewrite the query so it can be forwarded to an expert.
If the query needs context from previous questions, rewrite the query so
the expert can understand it; otherwise, forward the query as-is.
If the human tries to ask a question that is not related to oil and gas,
you should politely tell them that only oil and gas related questions are supported.
"""

This query rewriting step adds an extra second in terms of latency, and could be toggled off, but it greatly enhances user experience because it enables follow-up questions. Another way to handle this would have been to combine the query rewriting and the router in a single prompt, but we find that it’s more straightforward for LLMs to perform tasks separately, because they can get overwhelmed when faced with too many instructions.
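The dispatch logic around this prompt is simple; a minimal sketch, again assuming a call_claude helper and a route_to_expert function that wraps the router and tools:

import re

def handle_user_turn(query: str, history: str, call_claude, route_to_expert) -> str:
    # Fill the prompt with the chat history and the new user query
    completion = call_claude(query_rewriting_prompt.format(history=history, query=query))
    direct = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if direct:
        # Translation or summarization of a previous reply: return it as-is
        return direct.group(1).strip()
    rewritten = re.search(r"<new_query>(.*?)</new_query>", completion, re.DOTALL)
    # Forward the rewritten query (or the original one as a fallback) to the router
    return route_to_expert(rewritten.group(1).strip() if rewritten else query)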
The following is an example of a conversational flow. The user first asks about the measurements for a given API, which requires the LAS tool. Then they follow up by asking production questions about that API, all using the production tool. Each of these questions builds on previous questions, which highlights the need for query rewriting. The table shows the initial user query and corresponding LLM-rewritten query, which accounts for previous context.

User query: What log measurements were acquired for API #490251107500
Rewritten query (context aware): What log measurements were acquired for API #490251107500?

User query: How much oil did this API produce in September 2003?
Rewritten query (context aware): How much oil did API #490251107500 produce in September 2003?

User query: What about November?
Rewritten query (context aware): For API #490251107500, how much oil did it produce in November 2003?

User query: What month had the highest production that year?
Rewritten query (context aware): What steps would you take to analyze the production data for API #490251107500 and determine which month had the highest oil production in the year 2003?

User query: Get me a table of the monthly productions for that API for that year, include the monthly production and the months in the table
Rewritten query (context aware): Please provide a table of the monthly oil productions for API #490251107500 for the year 2003. This API number and year were referenced in our previous discussion.

The following screenshots show the corresponding flow in the UI and demonstrates that the tool is able to respond based on previous context.

Conclusion
In this post, we presented an AI assistant for efficient data exploration in the oil and gas industry powered by LLMs and optimized tools. The router uses the language understanding abilities of LLMs to route queries to specialized tools. We built custom tools optimized for major data types such as text, tables, and domain-specific formats like LAS. Conversational capabilities enable clarification and context-aware follow-up questions. The end-to-end solution showcases how LLMs can transform data exploration workflows through the use of specialized tools and conversational interfaces. Data exploration tasks that took hours can now be achieved in just a few minutes, dramatically reducing time to first oil for DXC’s customers.
In addition to the tools presented here, you can create additional generative AI tools to query SQL databases or analyze other industry-specific formats. Additionally, instead of creating separate table tools for each CSV dataset, the selection of the relevant dataset could be part of the tables tool itself, further reducing the need for preprocessing when onboarding the solution.
If you’re interested in building a similar AI assistant that can use multiple tools, you can get started with Amazon Bedrock Agents, a fully managed AWS solution that helps orchestrate complex tasks.

About the authors
Aude Genevay is a Senior Applied Scientist at the Generative AI Innovation Center, where she helps customers tackle critical business challenges and create value using generative AI. She holds a PhD in theoretical machine learning and enjoys turning cutting-edge research into real-world solutions.
Asif Fouzi is a Principal Solutions Architect leading a team of seasoned technologists supporting Global Service Integrators (GSI) helping GSIs such as DXC in their cloud journey. When he is not innovating on behalf of users, he likes to play guitar, travel and spend time with his family.
Kaustubh Kambli is a Senior Manager responsible for Generative AI and Cloud Analytics Delivery at DXC. His team drives innovation and AI-powered solutions to meet client needs across multiple industries in AMS region. When he’s not focused on advancing AI technologies, Kaustubh enjoys exploring new places, engaging in creative pursuits and spending quality time with his loved ones.
Anveshi Charuvaka is a Senior Applied Scientist at the Generative AI Innovation Center, where he develops Generative AI-driven solutions for customers’ critical business challenges. With a PhD in Machine Learning and over a decade of experience, he specializes in applying innovative machine learning and generative AI techniques to address complex real-world problems.
Mofijul Islam is an Applied Scientist II at the AWS Generative AI Innovation Center, where he helps customers tackle customer-centric research and business challenges using generative AI, large language models (LLM), multi-agent learning, and multimodal learning. He holds a PhD in machine learning from the University of Virginia, where his work focused on multimodal machine learning, multilingual NLP, and multitask learning. His research has been published in top-tier conferences like NeurIPS, ICLR, AISTATS, and AAAI, as well as IEEE and ACM Transactions.
Yingwei Yu is an Applied Science Manager at Generative AI Innovation Center, AWS, where he leverages machine learning and generative AI to drive innovation across industries. With a PhD in Computer Science from Texas A&M University and years of working experience in Oil&Gas industry, Yingwei brings extensive expertise in applying cutting-edge technologies to real-world applications. Outside of work, he enjoys swimming, painting, MIDI composing, and spending time with family and friends.

Meet Beepo-22B: The Unrestricted AI Finetuned Model based on Mistral Small Instruct 22B

Modern language models have transformed our daily interactions with technology, offering tools that help draft emails, write articles, code software, and much more. However, these powerful models often come with significant limitations. Many language models today are hamstrung by overly cautious guardrails that restrict certain types of information or enforce a predetermined moral stance. While these constraints exist for safety reasons, they often limit the utility of the model, leading to refusals or evasive responses even for harmless queries. Users are left feeling frustrated, needing workarounds like special prompts or complicated jailbreaks just to get a direct answer. The gap between user expectations and the model’s restrictions continues to be an area of contention, highlighting the need for a solution that respects user autonomy while maximizing the model’s potential.

Meet Beepo-22B: A Fine-Tuned Solution for User-Centric AI

Meet Beepo-22B: a fine-tuned version of Mistral Small Instruct 22B, designed with a focus on delivering completely uncensored and unrestricted responses to user instructions. Unlike other AI models, Beepo-22B prioritizes helpfulness without imposing moral judgments or requiring elaborate prompt hacks to bypass content limitations. Beepo-22B is built to provide users with a model that simply obeys their commands, performing to the best of its ability for any and all requests. This approach empowers users to access the full range of capabilities of a sophisticated AI model without needing to jump through hoops or deal with refusals.

Beepo-22B follows the Alpaca instruct format, making it compatible with other models that use similar formats, such as Mistral v3. This makes it easy for users to transition between different instruct models while still getting consistent performance. Designed for flexibility, Beepo-22B is offered in various formats, including GGUF and Safetensors, providing versatility in deployment environments. Whether for educational use, research purposes, or simply as a creative assistant, Beepo-22B’s core focus is on transparency and efficiency.

Technical Details

Beepo-22B stands out for several key reasons, both in terms of technical proficiency and user empowerment. It is based on Mistral’s Small Instruct 22B, a model known for its balanced trade-off between computational efficiency and robust language understanding. By leveraging the Alpaca instruct tuning approach, Beepo-22B manages to retain much of the model’s original intelligence and general knowledge capacity while eliminating overly restrictive behaviors that frustrate users. This fine-tuning has been achieved without abliteration—a practice in model fine-tuning that sometimes erases or dampens certain types of knowledge for safety. Instead, Beepo-22B keeps the model’s capability intact, offering both powerful functionality and user freedom.

Technically, Beepo-22B integrates with several popular tools and platforms to further enhance its utility. It is worth noting that KoboldCpp, a front-end interface, recently integrated Stable Diffusion 3.5 and Flux image generation capabilities in its latest release, offering users a seamless experience when using Beepo-22B for multimodal projects. This versatility makes Beepo-22B a compelling choice for those who seek an unrestricted model capable of handling diverse and creative tasks.

The release of Beepo-22B could be an important step towards providing users with AI that respects their autonomy. By offering an unrestricted model, it addresses a gap that has become more pronounced as other AI solutions impose heavier limitations on their capabilities. The freedom to explore the full capacity of AI technology without arbitrary constraints is crucial for those in creative, academic, or experimental fields who need access to unfiltered information and versatile AI assistance.

Moreover, early results from user tests show promising outcomes. Beepo-22B consistently delivered outputs that were highly rated in terms of accuracy, relevance, and completion of requests without refusals. The model retained its core intelligence while providing candid responses, regardless of the nature of the prompt. This kind of openness is particularly valuable for developers and researchers looking to prototype or experiment without interruptions. By retaining the foundational power of Mistral 22B, Beepo-22B has proven capable of handling sophisticated tasks, making it a valuable addition to any AI toolbox where flexibility is a priority.

Conclusion

Beepo-22B stands as an important evolution in the field of language models—a model that focuses on maximizing utility and minimizing unnecessary restrictions. It provides a pathway for those who need a reliable AI that won’t censor information or require constant prompt modifications to achieve straightforward outputs. From its roots in Mistral Small Instruct 22B to its user-friendly formats in GGUF and Safetensors, Beepo-22B is a versatile, intelligent, and obedient tool designed to serve a wide range of user needs. Whether you’re a researcher seeking unfiltered responses, a creative individual in need of inspiration, or just someone looking to interact freely with advanced AI, Beepo-22B opens up new possibilities in an ever-growing AI landscape. With its emphasis on unrestricted user interaction, Beepo-22B makes a bold statement: the future of AI can be one where the users are truly in control.

Check out the Finetuned Model on Hugging Face. All credit for this research goes to the researchers of this project.


Meet Memoripy: A Python Library that Brings Real Memory Capabilities to AI Applications

Artificial intelligence systems often struggle with retaining meaningful context over extended interactions. This limitation poses challenges for applications such as chatbots and virtual assistants, where maintaining a coherent conversation thread is essential. Most traditional AI models operate in a stateless manner, focusing solely on immediate inputs without considering the continuity of prior exchanges. This lack of effective memory leads to fragmented and inconsistent interactions, hampering the ability to build truly engaging, context-sensitive AI systems.

Meet Memoripy: A Python library that brings real memory capabilities to AI applications. Memoripy addresses the problem of maintaining conversational context by equipping AI systems with structured memory, allowing them to effectively store, recall, and build upon prior interactions. Memoripy provides both short-term and long-term memory storage, enabling AI systems to retain context from recent interactions while preserving important information over the long term. By structuring memory in a way that mimics human cognition—prioritizing recent events and retaining key details—Memoripy ensures that interactions remain relevant and coherent over time.

Memoripy organizes memory into short-term and long-term clusters, enabling the prioritization of recent interactions for immediate recall while retaining significant historical interactions for future use. This prevents the AI from becoming overwhelmed with excessive data while ensuring relevant information is accessible. Memoripy also implements semantic clustering, grouping similar memories together to facilitate efficient context retrieval. This capability allows AI systems to quickly identify and link related memories, thereby enhancing response quality. Furthermore, Memoripy incorporates memory decay and reinforcement mechanisms, whereby less useful memories gradually fade, and frequently accessed memories are reinforced, reflecting principles of human memory. Memoripy’s design emphasizes local storage, which allows developers to handle memory operations entirely on local infrastructure. This approach mitigates privacy concerns and provides flexibility in integrating with locally hosted language models, as well as with external services like OpenAI and Ollama.

To illustrate how Memoripy can be integrated into an AI application, consider the following example:

from memoripy import MemoryManager, JSONStorage

def main():
    # Replace 'your-api-key' with your actual OpenAI API key
    api_key = "your-key"
    if not api_key:
        raise ValueError("Please set your OpenAI API key.")

    # Define chat and embedding models
    chat_model = "openai"              # Choose 'openai' or 'ollama' for chat
    chat_model_name = "gpt-4o-mini"    # Specific chat model name
    embedding_model = "ollama"         # Choose 'openai' or 'ollama' for embeddings
    embedding_model_name = "mxbai-embed-large"  # Specific embedding model name

    # Choose your storage option
    storage_option = JSONStorage("interaction_history.json")

    # Initialize the MemoryManager with the selected models and storage
    memory_manager = MemoryManager(
        api_key=api_key,
        chat_model=chat_model,
        chat_model_name=chat_model_name,
        embedding_model=embedding_model,
        embedding_model_name=embedding_model_name,
        storage=storage_option
    )

    # New user prompt
    new_prompt = "My name is Khazar"

    # Load the last 5 interactions from history (for context)
    short_term, _ = memory_manager.load_history()
    last_interactions = short_term[-5:] if len(short_term) >= 5 else short_term

    # Retrieve relevant past interactions, excluding the last 5
    relevant_interactions = memory_manager.retrieve_relevant_interactions(new_prompt, exclude_last_n=5)

    # Generate a response using the last interactions and retrieved interactions
    response = memory_manager.generate_response(new_prompt, last_interactions, relevant_interactions)

    # Display the response
    print(f"Generated response:\n{response}")

    # Extract concepts for the new interaction
    combined_text = f"{new_prompt} {response}"
    concepts = memory_manager.extract_concepts(combined_text)

    # Store this new interaction along with its embedding and concepts
    new_embedding = memory_manager.get_embedding(combined_text)
    memory_manager.add_interaction(new_prompt, response, new_embedding, concepts)

if __name__ == "__main__":
    main()

In this script, the MemoryManager is initialized with the specified chat and embedding models, along with a storage option. A new user prompt is processed, and the system retrieves relevant past interactions to generate a contextually appropriate response. The interaction is then stored with its embedding and extracted concepts for future reference.

Memoripy provides an essential advancement in building AI systems that are more context-aware. The ability to retain and recall relevant information enables the development of virtual assistants, conversational agents, and customer service systems that offer more consistent and personalized interactions. For instance, a virtual assistant using Memoripy could remember user preferences or details of prior requests, thereby offering a more tailored response. Preliminary evaluations indicate that AI systems incorporating Memoripy exhibit enhanced user satisfaction, producing more coherent and contextually appropriate responses. Moreover, Memoripy’s emphasis on local storage is crucial for privacy-conscious applications, as it allows data to be handled securely without reliance on external servers.

In conclusion, Memoripy represents a significant step towards more sophisticated AI interactions by providing real memory capabilities that enhance context retention and coherence. By structuring memory in a way that closely mimics human cognitive processes, Memoripy paves the way for AI systems that can adapt based on cumulative user interactions and offer more personalized, contextually aware experiences. This library provides developers with the tools needed to create AI that not only processes inputs but also learns from interactions in a meaningful way.

Check out the GitHub Repo. All credit for this research goes to the researchers of this project.


NeuralDEM: Pioneering High-Performance Simulation of Large-Scale Particulate Systems with Multi-Branch Neural Operator Architectures

Developments in simulating particulate flows have significantly impacted industries ranging from mining to pharmaceuticals. Particulate systems consist of granular materials interacting with each other and surrounding fluids, and their accurate modeling is critical for optimizing processes. However, traditional numerical methods like the Discrete Element Method (DEM) face substantial computational limitations. These methods track particle movements and interactions by solving Newton’s equations of motion, which require enormous computational resources. Coupled with fluid dynamics simulations, DEM becomes even more demanding, making large-scale or long-duration simulations impractical for real-time applications.

One of the central challenges in this domain lies in the multiscale nature of particulate systems. Simulating millions of particles interacting over time necessitates microsecond-scale timesteps, causing simulations to run for hours or even days. Also, DEM requires extensive calibration of microscopic material properties, such as friction coefficients, to achieve accurate macroscopic results. Such calibration is tedious and error-prone, further complicating the integration of these simulations into iterative industrial workflows. Existing methods, although accurate, struggle to accommodate the vast computational demands of industrial systems with over 500,000 particles or fluid cells.

Researchers from NXAI GmbH, Institute for Machine Learning, JKU Linz, University of Amsterdam, and The Netherlands Cancer Institute developed NeuralDEM. NeuralDEM employs deep learning to replace the computationally intensive routines of DEM and CFD-DEM. This framework models particle dynamics and fluid interactions as continuous fields, significantly reducing computational complexity. By leveraging multi-branch neural operators, NeuralDEM directly predicts macroscopic behaviors such as flow regimes and transport phenomena without requiring detailed microscopic parameter calibration. This ability to generalize across diverse system conditions is a key innovation, enabling seamless simulation of varying geometries, particle properties, and flow conditions.

The architecture of NeuralDEM is built on the concept of multi-branch transformers. These neural operators process multiple physical phenomena simultaneously. For example, the framework uses primary branches to model core physics like particle displacement and fluid velocity, while auxiliary branches handle macroscopic quantities such as particle transport and mixing. This design allows NeuralDEM to simulate highly complex scenarios involving 500,000 particles and 160,000 fluid cells, as demonstrated in the fluidized bed reactor experiments. Unlike traditional DEM, NeuralDEM operates on coarser timesteps, achieving real-time simulation performance for long-duration processes.

In experimental validation, NeuralDEM was applied to hopper and fluidized bed reactor systems, showcasing its versatility and efficiency. In hopper simulations involving 250,000 particles, NeuralDEM accurately captured macroscopic flow phenomena such as mass flow and funnel flow regimes. It successfully predicted outflow rates, drainage times, and residual material volumes with minimal deviation from ground-truth DEM results. For instance, NeuralDEM estimated drainage times within 0.19 seconds of DEM calculations and predicted residual material volumes with an average error of 0.41%. These simulations required only a fraction of the computational time compared to DEM, achieving real-time performance.

In fluidized bed reactors, NeuralDEM demonstrated its capacity to model fast and transient phenomena involving strong particle-fluid interactions. Simulations with 500,000 particles and 160,000 fluid cells accurately replicated mixing behaviors, residence times, and dynamic flow patterns. The researchers highlighted NeuralDEM’s ability to simulate 28-second trajectories in just 2800 machine learning timesteps, a significant reduction compared to traditional methods. This efficiency positions NeuralDEM as a transformative tool for industrial applications requiring rapid and reliable process modeling.

The research presents key takeaways that highlight NeuralDEM’s potential as a game-changing technology:

Scalability: Successfully simulated systems with up to 500,000 particles and 160,000 fluid cells, significantly extending the applicability of numerical modeling to industrial-scale problems.

Accuracy: Achieved high fidelity in modeling complex flow regimes, with errors as low as 0.41% for residual material predictions.

Efficiency: Reduced computational times from hours to real-time, making iterative design and optimization feasible.

Generality: Demonstrated robustness across varying system parameters, including geometries, material properties, and flow velocities.

Innovation: Introduced multi-branch neural operators capable of decoupling microscopic and macroscopic modeling for enhanced flexibility and precision.

In conclusion, NeuralDEM represents a leap forward in the simulation of particulate flows, bridging the gap between computational feasibility and industrial applicability. By leveraging deep learning to address the limitations of traditional methods, NeuralDEM has redefined the landscape of numerical modeling. Its efficiency, scalability, and accuracy make it a pivotal tool for industries aiming to optimize processes and accelerate engineering cycles. The results of this research showcase a clear pathway for integrating advanced simulations into real-world workflows, unlocking new possibilities for innovation in particulate system modeling.

Check out the Paper and Project. All credit for this research goes to the researchers of this project.


Lead Scoring + Meta Ads: The Secret to Better Targeting

Ever felt like your Meta Ads budget is a black hole, swallowing your money without delivering the results you want? 

You’re not alone. 

On average, 37% of marketers say their ad spend is wasted on the wrong audience. Yikes!

But here's the good news. With lead scoring, you can rank your potential customers based on their likelihood to convert. By prioritizing the leads that are most engaged and ready to take action, you can focus your ad budget where it counts and stop wasting money on people who aren't interested.

That sounds pretty good, right? Let’s do it then. We’re gonna dive into lead scoring for Meta Ads and see how you can stop wasting money. 


Why Most Meta Ads Fail: You’re Treating Every Lead the Same

Most Meta Ads campaigns don’t fail because your product isn’t great or because people hate your brand. They fail because you’re trying to talk to everyone at once. 

And here’s the truth – I’m sorry to say that not everyone cares about what you’re offering. 

It’s the classic “spray and pray” approach. Blasting ads to every potential lead and hoping for results. This strategy often leads to wasted ad spend, low engagement, and missed opportunities to connect with the leads that actually matter.

Now, imagine this. What if your ads knew exactly who was most likely to click, buy, and rave about your brand? What if you could focus your budget on the leads that are genuinely interested and ready to act?

That’s where lead scoring (and Customers.ai Meta Ads Audiences) comes in and yes, it works. 

By prioritizing your top leads and targeting them with precision, you’ll stop spending on dead ends and start running campaigns that actually deliver results. 

Let’s break down how it all works.

What Happens When You Ignore Lead Scoring (Hint: It’s Expensive)

Let’s cut to the chase. Ignoring lead scoring is basically like throwing your Meta Ads budget into a shredder. Without it, you’re likely spending big on people who have zero interest in buying from you.

Here’s what that looks like in the wild:

Wasted Ad Spend: Without knowing which leads are worth targeting, you’re paying to show ads to people who might never convert. That’s money straight down the drain.

Low CTRs and Engagement: Sending generic ads to everyone results in fewer clicks, dismal engagement rates, and a big hit to your campaign performance.

Clogged Funnels: A flood of irrelevant leads clogs your pipeline, making it harder to identify and nurture the ones who actually matter.

The numbers don’t lie. Brands using lead scoring with Meta Ads see 20-30% higher conversion rates. And the brands that don’t? Well, let’s just say they’re spending a lot more for a lot less in return.

Don’t let this be your story. 

The Simple Math Behind Lead Scoring

Lead scoring is about categorizing your leads into three key areas:

1. Actions: What Are They Doing?

A lead’s actions provide the clearest insight into their level of engagement with your brand. Every interaction they have with your ads, website, or content can indicate how interested they are in what you offer.

Key actions to track include:

Ad Engagement: Did they click on your Meta Ad or simply scroll past it? A click is a strong signal they’re curious about your product or offer.

Website Behavior: What pages did they visit? Leads exploring your product pages, blog, or resources are engaging with your brand and moving down the funnel.

Resource Downloads: Did they download a free ebook, guide, or template? This shows they see value in your content and are likely open to further communication.

With Customers.ai, you can see who clicked your Meta ad and better understand their behavior and engagement levels.

2. Demographics: Who Are They?

Not all leads are created equal and understanding who your audience is plays a huge role in determining their value. This involves looking at demographic information such as:

Age and Gender: Are they within your ideal customer profile? For example, a lead in their early 30s might be more likely to buy athletic apparel than someone in their 60s.

Location: Do they live in areas where your product or service is available? If you’re targeting a local market, proximity can make or break a lead’s relevance.

Interests: Does their behavior or profile match the interests you’ve targeted? For instance, someone engaging with content about hiking gear is a strong lead for an outdoor equipment store.

Using Customers.ai’s demographic data and visitor analytics, you can gain insights into these demographics even if a visitor hasn’t filled out a form.

3. Intent Signals: Are They Ready to Buy?

Intent signals are the strongest indicators of whether a lead is on the verge of converting. These behaviors typically include:

Browsing Pricing Pages: This is one of the most telling signals because they’re actively evaluating costs and considering a purchase.

Spending Time on Key Pages: Are they lingering on your product descriptions or FAQ pages? The more time they spend, the closer they are to making a decision.

Engaging with Customer Support: A lead asking specific questions about features, availability, or shipping is likely near the bottom of the funnel and ready to act.

With customer journey tracking, Customers.ai can show you these intent signals in real-time. See which pages they’re visiting, how much time they are spending, how many times they’re coming back, if they’re opening emails, and much more. 

By understanding and tracking these three factors, you’ll have a clear framework for prioritizing leads and focusing your resources where they’ll make the biggest impact. And with visitor identification tools like Customers.ai, you’ll have all this data at your fingertips, ready to turn insights into action.

Using Your Lead Scoring to Create Custom Audiences

Once you’ve scored your leads, it’s time to turn that data into action using Meta Ads’ Custom Audiences. 

The best part? With Customers.ai, you can take those high-intent website visitors and create audiences that sync directly to your Meta Ads campaigns.

For instance, you can build an audience of all visitors who reached the cart but didn’t provide their information or complete the checkout. Or you can get even more granular – create an audience of everyone who visited your Buffalo Bills page and retarget them with Bills fan gear. 

With this level of data, you can craft highly targeted custom audiences that generate results and improve your ROAS.

By integrating Customers.ai with Meta Ads, you’re not just advertising. You’re making every dollar work smarter, not harder.

How to Actually Use Lead Scoring with Ecommerce Meta Ads 

Lead scoring is even more powerful when applied to ecommerce. It helps you zero in on your highest-value customers, the ones who are most likely to buy, and craft campaigns tailored to their behavior. 

Let’s simplify things with this actionable framework.

1. Score It to Pour It

Start by assigning scores to specific actions your ecommerce leads take. The more valuable the action, the higher the score. Here’s an example of how to score common ecommerce behaviors:

Form Submissions = 10 Points: They’ve shared their contact info, signaling serious interest in your brand.

Product Page Views = 5 Points: They’re browsing specific items, showing curiosity about your offerings.

Pricing Page or Shopping Cart Visit = 15 Points: This is ecommerce gold. They’re comparing prices, adding items to their cart, or checking shipping details. These actions indicate they’re close to converting.

With tools like Customers.ai that integrate directly into Salesforce, you can automate this process. By tracking website visitor behavior and syncing customer data, you can get a clear picture of who’s ready to buy and who might need more nurturing.
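In code, that point system is just a lookup and a sum. The sketch below reuses the example values above with hypothetical event names and tier cutoffs; it is not the Customers.ai or Salesforce integration itself.

```python
# Hypothetical event names and tier thresholds; point values mirror the example above.
POINTS = {
    "form_submission": 10,
    "product_page_view": 5,
    "pricing_or_cart_visit": 15,
}

def score_events(events):
    """Sum points for every tracked event a lead has triggered."""
    return sum(POINTS.get(event, 0) for event in events)

def tier(score):
    """Bucket leads so each tier can feed a different Meta Ads audience."""
    if score >= 25:
        return "high"   # ready for VIP / dynamic product ads
    if score >= 10:
        return "mid"    # retargeting and nurture ads
    return "low"

events = ["product_page_view", "product_page_view", "pricing_or_cart_visit"]
s = score_events(events)   # 5 + 5 + 15 = 25
print(s, tier(s))          # 25 high
```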

2. High Scorers, Meet VIP Ads

Your top-scoring ecommerce leads deserve the spotlight. Use Meta Ads Custom Audiences to target these high-value shoppers with precision.

Here’s how to apply this to your store:

Use Customers.ai to identify high-intent visitors, such as those who added items to their cart but didn’t check out.

Sync these leads to your Meta Ads account to create custom audiences tailored to their behavior.

Run dynamic product ads showcasing the exact items they viewed, paired with incentives like “Free Shipping on Orders Over $50!” or “Get It Before It’s Gone!”

For example, if a customer spent time on your site browsing blue shirts but didn’t complete the purchase, show them an ad featuring your bestselling blue shirts or a limited-time discount.
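Customers.ai handles the audience sync for you, but if you were preparing a customer-list Custom Audience by hand via the API, Meta expects identifiers such as emails to be normalized and SHA-256 hashed before upload. Here is a rough sketch with hypothetical visitor records:

```python
# Rough sketch with hypothetical visitor records; Customers.ai automates this sync,
# but a hand-built customer-list upload expects normalized, SHA-256 hashed emails.
import hashlib

visitors = [
    {"email": "Jane@Example.com", "added_to_cart": True,  "purchased": False},
    {"email": "sam@example.com",  "added_to_cart": True,  "purchased": True},
    {"email": "lee@example.com",  "added_to_cart": False, "purchased": False},
]

def hash_email(email):
    """Normalize (trim, lowercase), then SHA-256 hash, as Meta expects for customer lists."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()

# Cart abandoners: reached the cart but never completed checkout.
cart_abandoners = [
    hash_email(v["email"])
    for v in visitors
    if v["added_to_cart"] and not v["purchased"]
]
print(cart_abandoners)  # one hashed email, ready to upload as a Custom Audience
```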

3. Warm ’Em Up Without Burning Out

Not every ecommerce lead is ready to check out immediately. For those mid-tier leads, retargeting ads are the key to keeping your brand top of mind while gently nudging them toward a purchase.

Here’s how to nurture them:

Identify visitors who browsed your site or clicked an ad but didn’t make it to the checkout stage.

Use carousel ads to highlight related or complementary products they might love.

Incorporate customer reviews or social proof into your ads to build trust.

Example: A visitor looked at a pair of hiking boots but didn’t buy. Show them an ad featuring the boots along with a matching pair of socks or testimonials from happy customers who rave about the product.

With Customers.ai, you can segment your audience by actions taken (or not taken), ensuring your retargeting ads hit the right people at the right time.

By blending lead scoring with ecommerce insights, you’re not just running ads, you’re running smarter, more personalized campaigns. 

The Experiment: How to Prove Lead Scoring Works for Your Meta Ads

Let’s turn your skepticism into excitement with a simple experiment that puts lead scoring to the test. 

No fluff, just results. 

Here’s how to prove that using lead scoring with Meta Ads isn’t just a good idea. It’s a game-changer.

Step 1: Set Up Two Ad Campaigns

Campaign A: This is your control group. Set up a Meta Ads campaign without any lead scoring. Use broad targeting or general interest-based audiences.

Campaign B: This is your experimental group. Use lead scoring to create a Custom Audience of high-intent leads using Customers.ai. Include visitors who interacted with your site, like viewing product pages, hitting the shopping cart, or browsing pricing.

Both campaigns should use the same budget, creatives, and objectives so the only difference is the audience.

Step 2: Compare the Results

After running the campaigns for a few weeks, dig into the data:

CTR (Click-Through Rate): Are more people clicking the ads in Campaign B? A higher CTR indicates better targeting.

CPC (Cost Per Click): Is Campaign B costing less per click? Lower CPC suggests you’re focusing on quality leads.

Conversions: Which campaign drove more sales, sign-ups, or purchases? Higher conversion rates in Campaign B prove the power of targeting high-scoring leads.

Bonus points: Track ROAS (Return on Ad Spend) to see which campaign gave you the biggest bang for your buck.
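If you want to sanity-check the comparison yourself, each of these metrics is a simple ratio. The sketch below uses made-up campaign numbers purely to illustrate the formulas:

```python
# Made-up campaign numbers, used only to illustrate the metric formulas.
def report(name, impressions, clicks, spend, conversions, revenue):
    ctr = clicks / impressions * 100        # Click-through rate (%)
    cpc = spend / clicks                    # Cost per click
    conv_rate = conversions / clicks * 100  # Conversion rate (%)
    roas = revenue / spend                  # Return on ad spend
    print(f"{name}: CTR={ctr:.2f}%  CPC=${cpc:.2f}  Conv={conv_rate:.2f}%  ROAS={roas:.2f}x")

report("Campaign A (broad)",        impressions=100_000, clicks=1_200, spend=600, conversions=24, revenue=1_200)
report("Campaign B (lead scoring)", impressions=100_000, clicks=2_100, spend=600, conversions=84, revenue=4_200)
```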

Here’s what one of our clients saw using the Customers.ai Meta Ads audiences in just 30 days:

Step 3: Celebrate the Victory

If Campaign B crushes Campaign A (and it will), take a moment to celebrate. You’ve just unlocked a smarter, more effective way to run Meta Ads that drives results instead of wasting money.

The Challenge: Let the Numbers Speak

Here’s your challenge: Run this experiment for 30 days. 

Use Customers.ai to identify and score leads, sync them into a Custom Audience, and let the campaigns run. Compare the data, and let the results convince you.

Remember…lead scoring isn’t just another marketing buzzword. It’s a proven strategy to make your Meta Ads more targeted, cost-effective, and impactful. 

Real Talk: Lead Scoring Won’t Fix Bad Meta Ads

If your ads suck, no amount of lead scoring will save you. You can have the most perfectly targeted audience in the world, but if your creative doesn’t grab their attention or inspire action, you’re throwing money away.

Quick Tips for Killer Meta Ads

Want ads that convert? Here’s what you need to focus on:

Visuals That Stop the Scroll

Use high-quality images or videos that are eye-catching and relevant to your audience.

For ecommerce, showcase your products in action—think lifestyle shots, user-generated content, or dynamic product ads.

Copy That Connects

Speak directly to your audience’s needs and desires. Highlight benefits, not just features.

Keep it short and punchy. Meta Ads aren’t the place for essays.

Clear and Compelling CTAs

Be specific about what you want them to do—“Shop Now,” “Claim Your Discount,” or “Learn More.”

Avoid generic CTAs like “Click Here,” which don’t convey value.

Test, Test, Test

A/B test everything—headlines, images, offers, and CTAs—to see what resonates with your audience.

Need some inspiration to make your ads more compelling? Check out our guide, 101 Meta Ads from Top DTC Brands to Inspire Your Next Campaign. 

Conclusion: Make Every Ad Dollar Count

Your Meta Ads deserve to be smarter. 

Your budget deserves to work harder. 

And your leads? 

They deserve better targeting. 

By using lead scoring to identify your most valuable prospects and pairing that data with precision Meta Ads campaigns, you’re not just advertising, you’re creating campaigns that convert.

The best part? It’s all within your reach. Tools like Customers.ai make it simple to find high-quality leads, sync them to Meta Ads, and build campaigns that generate real results.

Ready to see the difference? 

Try Customers.ai free and start improving your Meta Ads results today!

Meet NEO: A Multi-Agent System that Automates the Entire Machine Learning Workflow

Machine learning (ML) engineers face many challenges while working on end-to-end ML projects. The typical workflow involves repetitive and time-consuming tasks like data cleaning, feature engineering, model tuning, and eventually deploying models into production. Although these steps are critical to building accurate and robust models, they often turn into a bottleneck for innovation. The workload is riddled with mundane and manual activities that take away precious hours from focusing on advanced modeling or refining core business solutions. This has created a need for solutions that can not only automate these cumbersome processes but also optimize the entire workflow for maximum efficiency.

Introducing NEO: Revolutionizing ML Automation

NEO is a multi-agent system that automates the entire machine learning workflow, transforming how ML engineers operate by acting as a fully autonomous ML engineer. Developed to eliminate the grunt work and enhance productivity, it takes on the entire ML process, including data engineering, model selection, hyperparameter tuning, and deployment. It’s like having a tireless assistant that lets engineers focus on solving high-level problems, building business value, and pushing the boundaries of what ML can do. By leveraging recent advancements in multi-step reasoning and memory orchestration, NEO doesn’t just reduce manual effort; it also boosts the quality of output.

Technical Details and Key Benefits

NEO is built on a multi-agent architecture in which specialized agents collaborate on different segments of the ML pipeline. With its capacity for multi-step reasoning, NEO can autonomously handle data preprocessing, feature extraction, and model training while selecting the most suitable algorithms and hyperparameters. Memory orchestration allows NEO to learn from previous tasks and apply that experience to improve performance over time. Its effectiveness was put to the test in 50 Kaggle competitions, where NEO secured a medal in 26% of them. To put this into perspective, the previous state of the art, OpenAI’s O1 system with AIDE scaffolding, had a success rate of 16.9%. This significant leap in benchmark results demonstrates NEO’s capacity to take on sophisticated ML challenges with greater efficiency and success.

The Impact of NEO: Why It Matters

This breakthrough is more than just a productivity enhancement; it represents a major shift in how machine learning projects are approached. By automating routine workflows, NEO empowers ML engineers to focus on innovation rather than being bogged down by repetitive tasks. The platform brings world-class ML capabilities to everyone’s fingertips, effectively democratizing access to expert-level proficiency. This ability to solve complex ML problems autonomously helps reduce the gap between expertise levels and facilitates faster project turnarounds. The results from Kaggle benchmarks confirm that NEO is capable of matching and even surpassing human experts in certain aspects of ML workflows, qualifying it as a Kaggle Grandmaster. This means NEO can bring the kind of machine learning expertise typically associated with top-tier data scientists directly into businesses and development teams, providing a major boost to overall efficiency and success rates.

Conclusion

In conclusion, NEO represents the next frontier in machine learning automation. By taking care of the tedious and repetitive parts of the workflow, it saves thousands of hours that engineers would otherwise spend on manual tasks. The use of multi-agent systems and advanced memory orchestration makes it a powerful tool for enhancing productivity and pushing the boundaries of ML capabilities.

To try out NEO, join the waitlist here.

Check out the details here. All credit for this research goes to the researchers of this project.
