Lead Capture Software Reviewed: The 15 Best Tools for 2025

As marketers, we spend countless hours and resources driving traffic to our websites, landing pages, social media channels, etc. And it’s exciting to see those numbers go up—more visitors, more clicks, more engagement! 

But here’s the truth: none of that matters if we’re not capturing the most important thing—our leads.

In the rush to get attention, many marketers overlook one crucial step: actually identifying who those visitors are. 

That’s where lead capture software comes into play. 

Lead capture tools don’t just help you track who’s visiting – they tell you who they are, what they’re interested in, and, most importantly, where they are in the buying funnel. 

With the right lead capture software, you can finally bridge that gap—nurturing your audience from the moment they engage to the point they convert.

In this post, we’ll dive deep into 15 of the best lead capture software tools for 2025, exploring their unique features, benefits, and reviews to help you choose the best one for your needs.

Lead Capture Software

Customers.ai

ConvertFlow

OptinMonster

JotForm

Leadpages

Unbounce

Sleeknote

Podium

Stova

Privy

Sumo (Now Bdow)

Justuno

Drift (Now Salesloft)

OptiMonk

Wishpond

Unlock High-Intent Leads Hiding on Your Site

Book a demo of Customers.ai’s U.S. website visitor identification, customer journey insights and remarketing platform to skyrocket conversions and sales.

Book a Demo

What is Lead Capture Software?

Before we dive into the specific tools, it’s important to understand what lead capture software is and why it’s essential for marketers.

Lead capture software is a tool designed to help businesses identify and collect information from potential customers (leads) who visit their website or engage with their digital content. This software enables marketers to gather key details, such as a visitor’s name, email address, and behavior on the site, which can then be used to guide the lead through the sales funnel.

With lead capture software, you don’t have to let visitors slip away unnoticed. It helps you grab their info and follow up in a way that feels natural, turning casual browsers into real business opportunities. 

At the end of the day, it’s all about keeping the conversation going and making sure you’re not missing out on potential customers.

What does Lead Capture Software Do?

So, what exactly does lead capture software do? 

In simple terms, it grabs key info from your website visitors. 

For example, if someone visits your ecommerce store but leaves without making a purchase, lead capture software can grab their email through a pop-up form or a special offer, like a 10% discount code. Now, you have their contact details and can follow up with a personalized email or retarget them through ads.

But it doesn’t stop there. Say another visitor is comparing a few services on your site and clicks on multiple pages. Lead capture tools like visitor identification can track that behavior and tell you where this person is in the buying process—are they just browsing or showing real intent? 

It helps you organize that data so you can follow up at the perfect moment, with the right message to move them closer to becoming a customer.

In short, lead capture software automates the process of turning visitors into leads, helping you understand who they are, what they’re interested in, and how to nurture them into paying customers.
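To make the behavior-tracking idea concrete, here is a toy sketch of behavior-based intent scoring. The page weights, thresholds, and stage labels are invented for illustration and don't reflect any particular vendor's scoring model:

```python
# Toy illustration of behavior-based intent scoring -- page weights
# and thresholds here are made up, not any vendor's actual logic.
PAGE_WEIGHTS = {"blog": 1, "product": 3, "pricing": 5, "checkout": 8}

def intent_score(pages_visited):
    """Sum a weight for each tracked page view."""
    return sum(PAGE_WEIGHTS.get(p, 0) for p in pages_visited)

def funnel_stage(score):
    """Bucket a raw score into a rough funnel stage."""
    if score >= 10:
        return "high intent"
    if score >= 4:
        return "comparing options"
    return "just browsing"

visitor = ["blog", "product", "pricing"]    # 1 + 3 + 5 = 9
print(funnel_stage(intent_score(visitor)))  # comparing options
```

A visitor who has viewed the pricing page scores higher than one reading blog posts, which is the same signal a real lead capture tool uses to decide when (and with what message) to follow up.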

Who Uses Lead Capture Software?

Traditionally, lead capture software has been a go-to for B2B companies—think sales teams tracking high-value leads and nurturing them through long buying cycles. 

But these days, it’s just as useful for ecommerce businesses, service-based companies, and even local shops. Whether you’re running an online store that needs to capture email addresses for abandoned carts or a B2B company looking to track potential clients, lead capture software helps streamline the process.

It’s not limited to just big businesses, either. Marketers, small business owners, and agencies all use these tools to turn website visitors into customers, no matter what industry they’re in.

See Who Is On Your Site Right Now!

Get names, emails, phone numbers & more.

Try it Free, No Credit Card Required

Start Your Free Trial

The Best Lead Capture Software [Reviewed]

Alright, now let’s really get into why we’re here. 

We’ve talked about what lead capture software is and how it can work for any business, but now it’s time to check out the best tools available. 

Below, we’ve reviewed 15 top lead capture software options for 2025, highlighting their features, strengths, and what makes them stand out. 

Whether you’re just starting or looking to upgrade, there’s something here for everyone.

Note: Pricing info is as of October 2024.

1. Customers.ai

URL: https://www.customers.ai

What it Does: Customers.ai is an AI-powered lead capture and remarketing tool designed to help businesses capture high-intent visitors, track the customer journey, and nurture leads across multiple channels through its integrations with Klaviyo, Shopify, Meta, and more. It automates follow-ups via email and sends audiences to ad remarketing platforms to help convert leads into customers. It’s especially useful for ecommerce and DTC businesses, providing detailed visitor tracking and segmentation.

Pricing: Free trial available, with paid plans starting at $99/month.

Rating: ★★★★★ (4.8/5)

Customers.ai Lead Capture Software Reviews:

Customers.ai consistently receives high marks from users for its ease of use, powerful lead capture capabilities, and its all-in-one approach to marketing automation. Here’s what real users have to say:

Ease of Use: Many users appreciate how intuitive the platform is. One G2 reviewer mentioned, “Customers.ai makes lead capture and automation a breeze. I was able to set up my campaigns in minutes without needing any technical skills.” This ease of use is a common theme, especially for small businesses and teams without dedicated technical resources.

AI-Powered Automations: A key standout is Customers.ai’s AI-driven tools. A reviewer on Trustpilot noted, “The AI automations have drastically reduced the time we spend managing our lead follow-ups. It captures our visitors and instantly sends them into the right workflows, boosting our conversion rates significantly.”

Lead Identification and Retargeting: Customers also rave about the ability to not only capture leads but to take action with them. A Capterra review said, “The visitor identification and retargeting features are incredible. We can now track anonymous visitors, identify who they are, and remarket to them via ads and email. It’s helped us capture leads we would have otherwise lost.”

Customer Support: Several reviewers have praised the platform’s customer support. One G2 reviewer shared, “The support team is always responsive and ready to help. Whenever we hit a snag, they’ve been there to guide us through, making sure we get the most out of the platform.”

2. ConvertFlow

URL: https://www.convertflow.com

What it Does: ConvertFlow is a lead generation platform that helps marketers create personalized on-site experiences, including pop-ups, forms, and quizzes, to capture and convert leads. It integrates with various CRM and email marketing platforms to automate follow-ups.

Pricing: Plans start at $22/month with a 14-day free trial.

Rating: ★★★★☆ (4.6/5)

ConvertFlow Lead Capture Software Reviews:

ConvertFlow earns praise for its personalization features and seamless integrations.

Customizability: One G2 user wrote, “The ability to create personalized funnels and on-site experiences is a game-changer for us. ConvertFlow integrates well with our email platform, allowing us to build highly targeted lead capture forms.”

Ease of Use: Another reviewer on Capterra mentioned, “We had our pop-ups and lead forms running in no time, and it was super easy to integrate with our CRM.”

Support: Trustpilot reviews frequently mention the helpful customer support, with one stating, “ConvertFlow’s support team walked us through every step of the setup process.”

3. OptinMonster

URL: https://www.optinmonster.com

What it Does: OptinMonster specializes in building lead capture forms such as pop-ups, slide-ins, and banners to convert website visitors into subscribers and customers. It’s known for its A/B testing features and exit-intent technology.

Pricing: Plans start at $7/month, billed annually.

Rating: ★★★★☆ (4.5/5)

OptinMonster Lead Capture Software Reviews:

OptinMonster is widely praised for its conversion optimization features and exit-intent technology.

Lead Conversion: A G2 reviewer shared, “OptinMonster’s exit-intent technology has saved us thousands in lost leads. We’ve been able to capture users who were about to leave and turn them into subscribers or customers.”

User Interface: One user on Trustpilot commented, “The drag-and-drop builder is easy to use, and the templates made it simple to get started without needing a designer.”

Flexibility: On Capterra, a reviewer mentioned, “The flexibility to run different campaigns—pop-ups, slide-ins, banners—has given us a lot of creative control over how we capture leads.”

4. JotForm

URL: https://www.jotform.com

What it Does: JotForm is a form builder that allows users to create custom forms for lead capture, surveys, and more. It integrates with email marketing services and CRMs, making it easy to collect and manage leads.

Pricing: Free for basic features, with paid plans starting at $34/month.

Rating: ★★★★☆ (4.7/5)

JotForm Lead Capture Software Reviews:

JotForm is known for its versatile form-building capabilities and ease of use.

Custom Forms: One G2 reviewer noted, “We’ve used JotForm to create everything from lead capture forms to customer surveys. The flexibility and customization options are unmatched.”

Ease of Use: A Capterra user mentioned, “Building forms on JotForm is incredibly easy, even for non-technical users. Plus, the integrations with CRMs and email platforms are seamless.”

Customer Service: A Trustpilot review praised JotForm’s support, saying, “The customer support team is always quick to respond and offers real solutions.”

5. Leadpages

URL: https://www.leadpages.com

What it Does: Leadpages is a landing page builder designed to help businesses capture leads through high-converting landing pages, pop-ups, and alert bars. It integrates with a wide variety of marketing tools.

Pricing: Plans start at $37/month with a 14-day free trial.

Rating: ★★★★☆ (4.4/5)

Leadpages Lead Capture Software Reviews:

Leadpages is highly regarded for its landing page builder and conversion tracking tools.

Landing Page Quality: A Capterra user said, “Leadpages makes it simple to create beautiful, high-converting landing pages. We’ve seen a huge boost in our lead capture since using the platform.”

Integrations: One G2 reviewer noted, “The integrations with email and CRM systems allow us to instantly follow up with leads, which is essential for our business.”

Support: On Trustpilot, a reviewer commented, “Leadpages’ customer service team is phenomenal. They’re always willing to help us optimize our pages and improve our conversion rates.”

6. Unbounce

URL: https://www.unbounce.com

What it Does: Unbounce is a platform for building and optimizing landing pages, pop-ups, and sticky bars to capture and convert leads. It’s known for its A/B testing and AI-driven suggestions to improve conversions.

Pricing: Plans start at $99/month with a 14-day free trial.

Rating: ★★★★☆ (4.5/5)

Unbounce Lead Capture Software Reviews:

Unbounce is frequently praised for its A/B testing capabilities and the quality of its landing pages.

A/B Testing: A G2 user wrote, “The A/B testing features in Unbounce have helped us refine our messaging and design, leading to higher conversion rates.”

Landing Page Builder: One reviewer on Capterra shared, “Unbounce is easy to use and has great drag-and-drop features that make building landing pages fast and simple.”

Support: A Trustpilot review mentioned, “Their support team is knowledgeable and quick to respond, helping us get the most out of the platform.”

7. Sleeknote

URL: https://www.sleeknote.com

What it Does: Sleeknote helps ecommerce sites convert visitors into leads through personalized pop-ups, slide-ins, and banners. It focuses on capturing emails and reducing cart abandonment with advanced targeting options.

Pricing: Plans start at $59/month with a 7-day free trial.

Rating: ★★★★☆ (4.6/5)

Sleeknote Lead Capture Software Reviews:

Sleeknote is widely recognized for its ecommerce-focused lead capture solutions and personalized pop-ups.

Pop-Up Customization: A reviewer on G2 said, “Sleeknote’s customization options are fantastic. We’re able to tailor the pop-ups to match our brand and message, making them far more effective.”

Targeting Options: A user on Capterra noted, “The advanced targeting rules let us show the right message to the right visitor, which has boosted our conversion rates significantly.”

Support: On Trustpilot, a reviewer wrote, “Sleeknote’s support team is always there to help us set up campaigns and offer advice on best practices.”

8. Podium

URL: https://www.podium.com

What it Does: Podium is a lead capture tool that focuses on converting website visitors into leads through text messaging and chat. It’s particularly useful for local businesses to communicate with potential customers and capture leads in real time.

Pricing: Custom pricing based on business needs; starts around $399/month.

Rating: ★★★★☆ (4.5/5)

Podium Lead Capture Software Reviews:

Podium is well-reviewed for its focus on local businesses and its text-based lead capture tools.

Customer Engagement: A G2 reviewer said, “Podium has revolutionized how we engage with potential customers. The ability to capture leads via text has dramatically increased our response rates.”

Ease of Use: One Trustpilot review mentioned, “It was so easy to set up and use Podium. We started capturing and converting leads within a day.”

Support: A Capterra user noted, “Podium’s customer support has been amazing. They walked us through every step of the integration process.”

9. Stova (Formerly Aventri)

URL: https://www.stova.io

What it Does: Stova is an event management platform that includes lead capture tools for event organizers. It allows businesses to collect and manage leads from event registrations, networking, and other engagement activities.

Pricing: Custom pricing based on the size of the event and features required.

Rating: ★★★★☆ (4.3/5)

Stova Lead Capture Software Reviews:

Stova is popular among event organizers for its lead capture and attendee engagement features.

Event Lead Capture: A G2 reviewer shared, “We use Stova for all our event lead capture needs. It helps us collect valuable attendee data and follow up quickly.”

Event Management: One Capterra user mentioned, “Stova makes it easy to manage attendees and gather leads during and after events, all in one platform.”

Support: A Trustpilot reviewer wrote, “Their support team is incredibly responsive and knowledgeable, helping us make the most out of the platform.”

10. Privy

URL: https://www.privy.com

What it Does: Privy offers pop-ups, email capture forms, and exit-intent banners, primarily targeting ecommerce stores. It also includes email marketing features, making it a comprehensive tool for lead generation and follow-ups.

Pricing: Paid plans starting at $24/month.

Rating: ★★★★☆ (4.6/5)

Privy Lead Capture Software Reviews:

Privy is well-liked for its ecommerce focus and ability to reduce cart abandonment through lead capture.

Cart Abandonment: A G2 reviewer said, “Privy’s pop-ups and exit-intent features have helped us reduce cart abandonment and recover lost sales.”

Ease of Use: On Capterra, a user shared, “It’s super easy to create pop-ups and email capture forms without needing a developer. Plus, it integrates seamlessly with our Shopify store.”

Support: A Trustpilot review mentioned, “Privy’s support team has been excellent at helping us optimize our campaigns and get more conversions.”

11. Sumo (Now Bdow)

URL: https://www.bdow.com

What it Does: Bdow (previously Sumo) is a suite of tools that help websites capture leads through pop-ups, scroll boxes, and other opt-in forms. It also offers email marketing integrations and social sharing tools to maximize conversions.

Pricing: Free plan available, with paid plans starting at $12/month (billed annually).

Rating: ★★★★☆ (4.3/5)

Sumo (now Bdow) Lead Capture Software Reviews:

Sumo (now Bdow) is praised for its simplicity and effectiveness in capturing leads with pop-ups and scroll boxes.

Lead Generation: A G2 reviewer shared, “Sumo has been great for helping us capture leads through simple, effective pop-ups that don’t disrupt the user experience.”

Easy Setup: On Capterra, a user mentioned, “It was incredibly easy to install and set up Sumo. We saw results within the first week of using it.”

Support: A Trustpilot review stated, “The customer service is always helpful, especially when it comes to tweaking the settings to match our needs.”

12. Justuno

URL: https://www.justuno.com

What it Does: Justuno is a conversion optimization platform that provides on-site pop-ups, banners, and form builders to capture leads and grow email lists. It integrates with popular ecommerce and marketing platforms.

Pricing: Free plan available, with paid plans starting at $35/month and a 14-day free trial.

Rating: ★★★★☆ (4.5/5)

Justuno Lead Capture Software Reviews:

Justuno is known for its conversion optimization and lead capture features for ecommerce businesses.

Conversion Boost: A reviewer on G2 noted, “Justuno’s AI-driven pop-ups have helped us increase conversions and grow our email list significantly.”

Personalization: A Capterra user said, “The personalization and targeting options are fantastic. We’re able to show different offers to different segments of our audience, which has improved our results.”

Support: A Trustpilot review highlighted, “Their support team is always responsive and goes above and beyond to help us optimize our campaigns.”

13. Drift (Now Salesloft)

URL: https://www.salesloft.com

What it Does: Drift (now Salesloft) is a conversational marketing platform that uses live chat and chatbots to capture leads. It helps businesses engage website visitors in real time, making it easier to convert them into leads.

Pricing: Plans start at $2,500/year.

Rating: ★★★★☆ (4.4/5)

Drift (now Salesloft) Lead Capture Software Reviews:

Drift (now Salesloft) is often praised for its conversational marketing and lead capture features via chatbots.

Chatbots for Lead Capture: A G2 reviewer mentioned, “Drift’s chatbots have completely transformed our lead capture strategy. We’re able to engage visitors in real-time and capture their information before they leave.”

Customer Engagement: A Capterra user shared, “Drift’s conversational approach to marketing has helped us engage with more potential customers, leading to better conversions.”

Support: On Trustpilot, a reviewer wrote, “Drift’s customer support team is top-notch. They’ve helped us get the most out of their tools and optimize our workflows.”

14. OptiMonk

URL: https://www.optimonk.com

What it Does: OptiMonk specializes in pop-ups, sticky bars, and other on-site messages designed to capture leads and increase conversions. It offers personalization and A/B testing features to optimize performance.

Pricing: Free plan available, with paid plans starting at $32/month.

Rating: ★★★★☆ (4.6/5)

OptiMonk Lead Capture Software Reviews:

OptiMonk is highly regarded for its personalization and A/B testing features in lead capture.

Personalization: A G2 reviewer shared, “The personalization options in OptiMonk allow us to create highly targeted pop-ups that convert better than any generic pop-ups we’ve tried.”

A/B Testing: On Capterra, a user wrote, “The A/B testing features are easy to use and have helped us identify the best-performing lead capture campaigns.”

Support: A Trustpilot review mentioned, “OptiMonk’s support team is always quick to help us troubleshoot and optimize our pop-ups.”

15. Wishpond

URL: https://www.wishpond.com

What it Does: Wishpond is a lead generation platform that includes landing pages, pop-ups, forms, and contests to capture leads. It also provides email marketing and automation tools to help nurture those leads.

Pricing: Custom pricing

Rating: ★★★★☆ (4.4/5)

Wishpond Lead Capture Software Reviews:

Wishpond is popular for its lead capture, landing page builder, and email marketing integration.

All-in-One Platform: A G2 reviewer said, “Wishpond makes it easy to create landing pages, capture leads, and follow up with email automation—all in one platform.”

Ease of Use: On Capterra, a user mentioned, “We love how easy it is to create campaigns and landing pages with Wishpond. It’s user-friendly and packed with features.”

How to Choose Your Lead Capture Software

With so many lead capture tools available, how do you decide which one is right for you? 

The truth is, there’s no one-size-fits-all solution. Depending on your business, you might need a combination of tools to meet your goals. Whether it’s a pop-up builder for your ecommerce store, a chatbot for real-time lead capture, or a more comprehensive platform that does it all, like Customers.ai, having a variety of tools in your marketing toolkit is key.

While we’ve reviewed 15 of our favorites, there are plenty of other great options out there. The best approach is to think about your specific needs—whether it’s simplicity, advanced targeting, or full automation—and choose the tools that help you convert visitors into leads most effectively.

If you’re ready to get started today, try Customers.ai free and start capturing leads with names, emails, and more in just a few clicks. It’s the all-in-one solution for turning anonymous visitors into actionable opportunities. 

Don’t let your traffic slip away—start capturing leads now and see the difference!


Important Next Steps

See what targeted outbound marketing is all about. Capture and engage your first 500 website visitor leads with Customers.ai X-Ray website visitor identification for free.

Talk and learn about sales outreach automation with other growth enthusiasts. Join Customers.ai Island, our Facebook group of 40K marketers and entrepreneurs who are ready to support you.

Advance your marketing performance with Sales Outreach School, a free tutorial and training area for sales pros and marketers.

Lead Capture Software FAQs

How does lead capture software work?

Lead capture software uses forms, pop-ups, chatbots, or landing pages to gather visitor data. This information is then stored for future follow-ups, nurturing, and conversion.

Why is lead capture software important?

Lead capture software is crucial because it helps businesses convert website traffic into leads, allowing them to grow their contact lists and improve sales.

What are examples of lead capture software?

Examples of lead capture software include Customers.ai, OptinMonster, ConvertFlow, Leadpages, JotForm, and Sleeknote.

Who uses lead capture software?

Lead capture software is used by marketers, sales teams, and businesses of all sizes, from small ecommerce stores to large B2B companies.

What are the best lead capture tools?

Some of the best lead capture tools are Customers.ai, OptinMonster, ConvertFlow, Leadpages, and Drift, all known for their powerful lead generation features.

How much does lead capture software cost?

The cost of lead capture software varies widely, with some tools offering free plans and others starting as low as $7/month, with higher tiers offering more features.

Can lead capture software integrate with my CRM?

Yes, most lead capture software integrates with popular CRMs like Salesforce, HubSpot, and Zoho, making it easy to manage and nurture leads.

Does lead capture software work for ecommerce?

Yes, lead capture software works great for ecommerce businesses by capturing emails, reducing cart abandonment, and retargeting visitors through ads.

What features should I look for in lead capture software?

Key features include customizable forms, pop-ups, A/B testing, integrations with email platforms or CRMs, and automation tools to follow up with leads.

Can lead capture software help with retargeting?

Yes, many lead capture tools like Customers.ai allow you to capture leads and then retarget them through email or ads, increasing the chances of conversion.

Is lead capture software easy to use?

Most lead capture tools are designed to be user-friendly, with drag-and-drop builders and pre-made templates to make it easy for non-technical users to set up.

What is the difference between lead capture software and CRM?

Lead capture software focuses on collecting visitor information, while CRM systems help manage and nurture those leads throughout the sales process.

Can I use multiple lead capture tools?

Yes, many businesses use a combination of lead capture tools to meet different needs, such as pop-ups for lead generation and chatbots for real-time engagement.

What’s the best lead capture software for small businesses?

For small businesses, tools like Customers.ai, JotForm, and Privy are affordable and easy to use, offering powerful lead generation features.

How do I choose the best lead capture software for my business?

Consider your specific needs, such as budget, integration requirements, and features like automation or personalization, to select the right tool for your business.

Can lead capture software help improve conversion rates?

Yes, by targeting visitors with relevant forms or pop-ups, lead capture software helps you collect high-intent leads, which can lead to higher conversion rates.

Is lead capture software only for B2B companies?

No, lead capture software works for both B2B and B2C businesses, including ecommerce, service providers, and agencies.

What is the easiest lead capture software to use?

Tools like Customers.ai and Leadpages are often cited as being extremely user-friendly, with easy setups and intuitive interfaces.

How do I set up lead capture software on my website?

Most lead capture tools provide simple integrations or plugins for popular website platforms like WordPress, Shopify, and Wix, making it easy to get started.
The post Lead Capture Software Reviewed: The 15 Best Tools for 2025 appeared first on Customers.ai.

Meet TurtleBench: A Unique AI Evaluation System for Evaluating Top Language Models via Real World Yes/No Puzzles

The need for efficient and trustworthy techniques to assess the performance of Large Language Models (LLMs) is growing as these models are incorporated into more and more domains. Traditional assessment benchmarks are frequently run on static datasets, which raises serious issues when evaluating how effectively LLMs operate in dynamic, real-world interactions.

Because the questions and responses in these static datasets rarely change, it is hard to predict how a model would handle evolving user conversations. Many of these benchmarks also require the model to draw on particular prior knowledge, which makes it harder to isolate a model’s capacity for logical reasoning. This reliance on pre-established knowledge limits any assessment of a model’s ability to reason and infer independently of stored data.

Other methods of evaluating LLMs involve dynamic interactions, such as manual evaluation by human assessors or the use of high-performing models as a benchmark. Although these approaches provide a more adaptable evaluation environment, they have disadvantages of their own. Strong models may have a particular style or methodology that skews the evaluation process, so using them as benchmarks can introduce bias. Manual evaluation frequently requires significant time and money, making it unfeasible for large-scale applications. These limitations highlight the need for an alternative that balances cost-effectiveness, evaluation fairness, and the dynamic character of real-world interactions.

To overcome these issues, a team of researchers from China has introduced TurtleBench, a unique evaluation system. TurtleBench gathers actual user interactions via Turtle Soup Puzzle, a specially designed web platform where users participate in reasoning exercises, making guesses based on predetermined scenarios. The predictions collected from users are then turned into a more dynamic evaluation dataset. Because the data changes in response to real user interactions, models are less able to cheat by memorizing fixed datasets. This setup provides a more accurate picture of a model’s practical capabilities and ensures that assessments are closely aligned with the reasoning demands of actual users.

The 1,532 user guesses in the TurtleBench dataset are annotated as correct or incorrect, making it possible to examine in depth how well LLMs perform on reasoning tasks. Using this dataset, the team carried out a thorough analysis of nine top LLMs, and reported that the OpenAI o1 series models did not come out on top in these evaluations.
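As an illustrative sketch, a TurtleBench-style evaluation boils down to scoring a model's yes/no judgments against the annotated guesses. The records and the stand-in "model" below are invented for illustration, not drawn from the actual dataset:

```python
# Hypothetical mini-version of a TurtleBench-style evaluation:
# each record pairs a user guess with a gold yes/no annotation,
# and we score a model's judgments against those labels.
# These example records are invented, not from the real dataset.
dataset = [
    {"guess": "The sailor recognized the soup's taste.", "label": True},
    {"guess": "The puzzle takes place at sea.", "label": True},
    {"guess": "The man was allergic to turtles.", "label": False},
]

def accuracy(judge, records):
    """Fraction of records where the model's yes/no judgment
    matches the annotated label."""
    correct = sum(judge(r["guess"]) == r["label"] for r in records)
    return correct / len(records)

# A trivial stand-in "model" that always answers yes
always_yes = lambda guess: True
print(accuracy(always_yes, dataset))  # 2 of 3 labels are True
```

In the real benchmark, `judge` would be an LLM prompted with the puzzle scenario plus the user's guess, and the per-record comparison is what makes the dataset's correct/incorrect annotations directly usable as an accuracy metric.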

One theory that emerged from this study is that the o1 models’ reasoning relies on comparatively basic Chain-of-Thought (CoT) strategies. CoT is a technique that can help models become more accurate and transparent by generating intermediate reasoning steps before reaching a final conclusion. The o1 models’ CoT processes, however, may be too simple or surface-level to handle challenging reasoning tasks. Another theory holds that while lengthening CoT processes can enhance a model’s ability to reason, it can also introduce noise and unrelated or distracting information that derails the reasoning process.

The TurtleBench evaluation’s dynamic, user-driven design helps ensure that the benchmark stays relevant and evolves with the changing requirements of practical applications.

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.

The post Meet TurtleBench: A Unique AI Evaluation System for Evaluating Top Language Models via Real World Yes/No Puzzles appeared first on MarkTechPost.

Differentiable Adaptive Merging (DAM): A Novel AI Approach to Model In …

Model merging, particularly within the realm of large language models (LLMs), presents an intriguing challenge that addresses the growing demand for versatile AI systems. These models often possess specialized capabilities such as multilingual proficiency or domain-specific expertise, making their integration crucial for creating more robust, multi-functional systems. However, merging LLMs effectively is not trivial; it often requires deep expertise and significant computational resources to balance different training methods and fine-tuning processes without deteriorating overall performance. To simplify this process and reduce the complexity associated with current model merging techniques, researchers are striving to develop more adaptive, less resource-intensive merging methods.

Researchers from Arcee AI and Liquid AI propose a novel merging technique called Differentiable Adaptive Merging (DAM). DAM aims to tackle the complexities of merging language models by offering an efficient, adaptive method that reduces the computational overhead typically associated with current model merging practices. Specifically, DAM provides an alternative to compute-heavy approaches like evolutionary merging by optimizing model integration through scaling coefficients, enabling simpler yet effective merging of multiple LLMs. The researchers also conducted a comparative analysis of DAM against other merging approaches, such as DARE-TIES, TIES-Merging, and simpler methods like Model Soups, to highlight its strengths and limitations.

The core of DAM is its ability to merge multiple LLMs using a data-informed approach, which involves learning optimal scaling coefficients for each model’s weight matrix. The method is applicable to various components of the models, including linear layers, embedding layers, and layer normalization layers. DAM works by scaling each column of the weight matrices to balance the input features from each model, thus ensuring that the merged model retains the strengths of each contributing model. The objective function of DAM combines several components: minimizing Kullback-Leibler (KL) divergence between the merged model and the individual models, cosine similarity loss to encourage diversity in scaling coefficients, and L1 and L2 regularization to ensure sparsity and stability during training. These elements work in tandem to create a robust and well-integrated merged model capable of handling diverse tasks effectively.
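Putting those pieces together, one plausible form of the merging rule and objective (a sketch based on the description above, not the paper’s exact notation; the $\lambda$ weights are assumed hyperparameters) is:

```latex
% Merged weights: each column k combines the corresponding columns of the
% N source models' weight matrices via learned scaling coefficients c_{i,k}
W_{\text{merged}}[:,k] \;=\; \sum_{i=1}^{N} c_{i,k}\, W_i[:,k]

% Objective over the coefficients c
\mathcal{L}(c) \;=\;
\underbrace{\sum_{i=1}^{N} \mathrm{KL}\!\big(p_{\text{merged}} \,\|\, p_i\big)}_{\text{stay close to each source model}}
\;+\; \lambda_{\cos} \underbrace{\sum_{i \neq j} \cos\!\big(c_i, c_j\big)}_{\text{encourage coefficient diversity}}
\;+\; \lambda_1 \lVert c \rVert_1
\;+\; \lambda_2 \lVert c \rVert_2^2
```

Because only the scaling coefficients are optimized, the number of trainable parameters is tiny compared to retraining or evolutionary search over full merges, which is where the computational savings come from.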

The researchers performed extensive experiments to compare DAM with other model merging methods. The evaluation was conducted across different model families, such as Mistral and Llama 3, and involved merging models with diverse capabilities, including multilingual processing, coding proficiency, and mathematical reasoning. The results showed that DAM not only matches but, in some cases, outperforms more computationally demanding techniques like Evolutionary Merging. For example, in a case study focusing on Japanese language processing and mathematical reasoning, DAM demonstrated superior adaptability, effectively balancing the specialized capabilities of different models without the intensive computational requirements of other methods. Performance was measured using multiple metrics, with DAM generally scoring higher or on par with alternatives across tasks involving language comprehension, mathematical reasoning, and structured query processing.

The research concludes that DAM is a practical solution for merging LLMs with reduced computational cost and manual intervention. This study also emphasizes that more complex merging methods, while powerful, do not always outperform simpler alternatives like linear averaging when models share similar characteristics. DAM proves that focusing on efficiency and scalability without sacrificing performance can provide a significant advantage in AI development. Moving forward, researchers intend to explore DAM’s scalability across different domains and languages, potentially expanding its impact on the broader AI landscape.

Check out the Paper. All credit for this research goes to the researchers of this project.

The post Differentiable Adaptive Merging (DAM): A Novel AI Approach to Model Integration appeared first on MarkTechPost.

Google AI Researchers Introduced a Set of New Methods for Enhancing Lo …

Large language models (LLMs) have revolutionized various fields by enabling more effective data processing, complex problem-solving, and natural language understanding. One major innovation is retrieval-augmented generation (RAG), which allows LLMs to retrieve relevant information from external sources, such as large knowledge databases, to generate better answers. However, the integration of long-context LLMs with RAG presents certain challenges. Specifically, while LLMs are becoming capable of handling longer input sequences, the increase in retrieved information can overwhelm the system. The challenge lies in making sure that the additional context improves the accuracy of the LLM’s outputs rather than confusing the model with irrelevant information. 

The problem faced by long-context LLMs stems from a phenomenon where increasing the number of retrieved passages does not necessarily improve performance. Instead, it often leads to performance degradation, primarily due to including irrelevant or misleading documents known as “hard negatives.” These hard negatives appear relevant based on certain retrieval criteria but introduce noise that misguides the LLM in generating the correct answer. As a result, the model’s accuracy declines despite having access to more information. This is particularly problematic for knowledge-intensive tasks where correctly identifying relevant information is crucial.

Existing RAG systems employ a retriever to select the most relevant passages from a database, which the LLM then processes. Standard RAG implementations, however, typically limit the number of retrieved passages to around ten. This works well for shorter contexts but does not scale efficiently as the number of passages increases. The issue becomes more pronounced when dealing with complex datasets containing multiple relevant passages. Current approaches fail to adequately address the risks of introducing misleading or irrelevant information, which can diminish the quality of LLM responses.

Researchers from Google Cloud AI and the University of Illinois introduced innovative methods to improve the robustness and performance of RAG systems when using long-context LLMs. Their approach encompasses training-free and training-based methods designed to mitigate the impact of hard negatives. One of the key innovations is retrieval reordering, a training-free method that improves the sequence in which the retrieved passages are fed to the LLM. The researchers propose prioritizing passages with higher relevance scores at the beginning and end of the input sequence, thus focusing the LLM’s attention on the most important information. Training-based methods were also introduced to further enhance the model’s ability to handle irrelevant data. These include implicit robustness fine-tuning and explicit relevance fine-tuning, both of which train the LLM to better discern relevant information and filter out misleading content.

Retrieval reordering is a relatively simple but effective approach that addresses the “lost-in-the-middle” phenomenon commonly observed in LLMs, where the model tends to focus more on the beginning and end of an input sequence while losing attention to the middle portions. By restructuring the input so that highly relevant information is placed at the edges of the sequence, the researchers improved the model’s ability to generate accurate responses. In addition, they explored implicit fine-tuning, which involves training the LLM with datasets containing noisy and potentially misleading information. This method encourages the model to become more resilient to such noise, making it more robust in practical applications. Explicit relevance fine-tuning goes one step further by teaching the LLM to actively analyze retrieved documents and identify the most relevant passages before generating an answer. This method enhances the LLM’s ability to distinguish between valuable and irrelevant information in complex, multi-document contexts.
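As a concrete sketch of the reordering idea (the interleaving scheme below is one plausible implementation, not necessarily the paper’s exact procedure):

```typescript
// Hypothetical sketch: place the highest-scoring passages at the edges of the
// input sequence so the least relevant ones land in the middle, where LLMs
// pay the least attention ("lost-in-the-middle").
interface ScoredPassage {
  text: string;
  score: number; // retriever relevance score
}

function reorderForEdges(passages: ScoredPassage[]): string[] {
  const sorted = [...passages].sort((a, b) => b.score - a.score);
  const front: string[] = [];
  const back: string[] = [];
  sorted.forEach((p, i) => {
    if (i % 2 === 0) front.push(p.text); // ranks 1, 3, 5, … fill the front
    else back.unshift(p.text);           // ranks 2, 4, 6, … fill the end
  });
  return [...front, ...back];            // least relevant ends up in the middle
}
```

With five passages scored 5 down to 1, this yields the order 5, 3, 1, 2, 4: the two most relevant passages sit at the very beginning and very end of the context.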

The proposed methods demonstrated notable improvements in accuracy and robustness. The research showed that retrieval reordering improved the LLM’s accuracy by several percentage points, particularly when handling large sets of retrieved passages. For example, experiments on the Natural Questions dataset showed that increasing the number of retrieved passages initially improved accuracy, but performance declined after a certain point as hard negatives became too prevalent. The introduction of reordering and fine-tuning mitigated this issue, maintaining higher accuracy even as the number of passages increased. Notably, accuracy with the Gemma-2-9B-Chat model improved by 5% when the reordering technique was applied to larger retrieval sets, demonstrating the technique’s effectiveness in real-world scenarios.

Key Takeaways from the Research:

A 5% improvement in accuracy was achieved by applying retrieval reordering to large sets of retrieved passages.

Explicit relevance fine-tuning enables the model to analyze and identify the most relevant information, improving accuracy in complex retrieval scenarios.

Implicit fine-tuning makes the LLM more robust against noisy and misleading data by training it with challenging datasets.

Retrieval reordering mitigates the “lost-in-the-middle” effect, helping the LLM focus on the most important passages at the beginning and end of the input sequence.

The methods introduced can be applied to improve the performance of long-context LLMs across various datasets, including Natural Questions and PopQA, where they were shown to improve accuracy consistently. 

In conclusion, this research offers practical solutions to the challenges of long-context LLMs in RAG systems. By introducing innovative methods like retrieval reordering and fine-tuning approaches, the researchers have demonstrated a scalable way to enhance the accuracy and robustness of these systems, making them more reliable for handling complex, real-world data.

Check out the Paper. All credit for this research goes to the researchers of this project.

The post Google AI Researchers Introduced a Set of New Methods for Enhancing Long-Context LLM Performance in Retrieval-Augmented Generation appeared first on MarkTechPost.

How DPG Media uses Amazon Bedrock and Amazon Transcribe to enhance vid …

This post was co-written with Lucas Desard, Tom Lauwers, and Sam Landuydt from DPG Media.
DPG Media is a leading media company in Benelux operating multiple online platforms and TV channels. DPG Media’s VTM GO platform alone offers over 500 days of non-stop content.
With a growing library of long-form video content, DPG Media recognizes the importance of efficiently managing and enhancing video metadata such as actor information, genre, summary of episodes, the mood of the video, and more. Having descriptive metadata is key to providing accurate TV guide descriptions, improving content recommendations, and enhancing the consumer’s ability to explore content that aligns with their interests and current mood.
This post shows how DPG Media introduced AI-powered processes using Amazon Bedrock and Amazon Transcribe into its video publication pipelines in just 4 weeks, as an evolution towards more automated annotation systems.
The challenge: Extracting and generating metadata at scale
DPG Media receives video productions accompanied by a wide range of marketing materials such as visual media and brief descriptions. These materials often lack standardization and vary in quality. As a result, DPG Media Producers have to run a screening process to consume and understand the content sufficiently to generate the missing metadata, such as brief summaries. For some content, additional screening is performed to generate subtitles and captions.
As DPG Media grows, they need a more scalable way of capturing metadata that enhances the consumer experience on online video services and aids in understanding key content characteristics.
The following were some initial challenges in automation:

Language diversity – The services host both Dutch and English shows. Some local shows feature Flemish dialects, which can be difficult for some large language models (LLMs) to understand.
Variability in content volume – They offer a range of content volume, from single-episode films to multi-season series.
Release frequency – New shows, episodes, and movies are released daily.
Data aggregation – Metadata needs to be available at the top-level asset (program or movie) and must be reliably aggregated across different seasons.

Solution overview
To address the challenges of automation, DPG Media decided to implement a combination of AI techniques and existing metadata to generate new, accurate content and category descriptions, mood, and context.
The project focused solely on audio processing due to its cost-efficiency and faster processing time. Video data analysis with AI wasn’t required for generating detailed, accurate, and high-quality metadata.
The following diagram shows the metadata generation pipeline from audio transcription to detailed metadata.

The general architecture of the metadata pipeline consists of two primary steps:

Generate transcriptions of audio tracks: use speech recognition models to generate accurate transcripts of the audio content.
Generate metadata: use LLMs to extract and generate detailed metadata from the transcriptions.

In the following sections, we discuss the components of the pipeline in more detail.
Step 1. Generate transcriptions of audio tracks
To generate the necessary audio transcripts for metadata extraction, the DPG Media team evaluated two transcription strategies: Whisper-v3-large, which requires at least 10 GB of VRAM and significant operational overhead, and Amazon Transcribe, a managed service with the added benefits of automatic model updates from AWS over time and built-in speaker diarization. The evaluation focused on two key factors: price-performance and transcription quality.
To evaluate the transcription accuracy quality, the team compared the results against ground truth subtitles on a large test set, using the following metrics:

Word error rate (WER) – This metric measures the percentage of words that are incorrectly transcribed compared to the ground truth. A lower WER indicates a more accurate transcription.
Match error rate (MER) – MER measures the proportion of word matches in the alignment between transcription and ground truth that are errors. A lower MER signifies better accuracy.
Word information lost (WIL) – This metric quantifies the amount of information lost due to transcription errors. A lower WIL suggests fewer errors and better retention of the original content.
Word information preserved (WIP) – WIP is the opposite of WIL, indicating the amount of information correctly captured. A higher WIP score reflects more accurate transcription.
Hits – This metric counts the number of correctly transcribed words, giving a straightforward measure of accuracy.
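Of these, WER is the most widely used; it is the word-level edit distance (substitutions, insertions, and deletions) divided by the number of reference words. A minimal sketch:

```typescript
// Illustrative sketch: word error rate (WER) via word-level edit distance.
// wer = (substitutions + insertions + deletions) / reference word count
function wer(reference: string, hypothesis: string): number {
  const ref = reference.split(/\s+/).filter(Boolean);
  const hyp = hypothesis.split(/\s+/).filter(Boolean);
  if (ref.length === 0) return hyp.length === 0 ? 0 : Infinity;
  // dp[i][j] = edits to turn the first i reference words into the first j hypothesis words
  const dp = Array.from({ length: ref.length + 1 }, (_, i) =>
    Array.from({ length: hyp.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= ref.length; i++) {
    for (let j = 1; j <= hyp.length; j++) {
      const sub = ref[i - 1] === hyp[j - 1] ? 0 : 1; // 0 cost if words match
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,       // deletion
        dp[i][j - 1] + 1,       // insertion
        dp[i - 1][j - 1] + sub  // substitution or match
      );
    }
  }
  return dp[ref.length][hyp.length] / ref.length;
}
```

For example, comparing “the cat sat” against “the cat sit” gives one substitution over three reference words, a WER of about 0.33.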

Both transcription approaches yielded high-quality results without the need to incorporate video or further speaker diarization. For further insights into speaker diarization in other use cases, see Streamline diarization using AI as an assistive technology: ZOO Digital’s story.
Considering the varying development and maintenance efforts required by different alternatives, DPG Media chose Amazon Transcribe for the transcription component of their system. This managed service offered convenience, allowing them to concentrate their resources on obtaining comprehensive and highly accurate data from their assets, with the goal of achieving 100% qualitative precision.
Step 2. Generate metadata
Now that DPG Media has the transcription of the audio files, they use LLMs through Amazon Bedrock to generate the various categories of metadata (summaries, genre, mood, key events, and so on). Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.
Through Amazon Bedrock, DPG Media selected the Anthropic Claude 3 Sonnet model based on internal testing, and the Hugging Face LMSYS Chatbot Arena Leaderboard for its reasoning and Dutch language performance. Working closely with end-consumers, the DPG Media team tuned the prompts to make sure the generated metadata matched the expected format and style.
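The shape of such a request can be sketched as follows (the Messages-format fields follow Bedrock’s documented Anthropic schema; the prompt text and token limit are illustrative assumptions):

```typescript
// Hypothetical sketch: assembling an Amazon Bedrock request body for Anthropic
// Claude 3 Sonnet using the Anthropic Messages format. The prompt wording and
// max_tokens value are illustrative assumptions, not DPG Media's actual prompts.
// This ID would be passed separately as the modelId of the InvokeModel call.
const CLAUDE_3_SONNET = "anthropic.claude-3-sonnet-20240229-v1:0";

function buildMetadataRequest(transcript: string, systemPrompt: string): string {
  return JSON.stringify({
    anthropic_version: "bedrock-2023-05-31",
    max_tokens: 1024,
    system: systemPrompt, // tuned per metadata category (summary, genre, mood, …)
    messages: [
      {
        role: "user",
        content: `Generate a summary, genre, and mood for this transcript:\n${transcript}`,
      },
    ],
  });
}
```

Keeping the system prompt as a parameter mirrors the design described below, where each metadata type is associated with its own tunable prompt.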
After the team had generated metadata at the individual video level, the next step was to aggregate this metadata across an entire series of episodes. This was a critical requirement, because content recommendations on a streaming service are typically made at the series or movie level, rather than the episode level.
To generate summaries and metadata at the series level, the DPG Media team reused the previously generated video-level metadata. They fed the summaries in an ordered and structured manner, along with a specifically tailored system prompt, back through Amazon Bedrock to Anthropic Claude 3 Sonnet.
Using the summaries instead of the full transcriptions of the episodes was sufficient for high-quality aggregated data and was more cost-efficient, because many of DPG Media’s series have extended runs.
The solution also stores the direct association between each type of metadata and its corresponding system prompt, making it straightforward to tune, remove, or add prompts as needed—similar to the adjustments made during the development process. This flexibility allows them to tailor the metadata generation to evolving business requirements.
To evaluate the metadata quality, the team used reference-free LLM metrics, inspired by LangSmith. This approach uses a secondary LLM to evaluate the outputs against tailored criteria, such as whether the summary is simple to understand, whether it contains all the important events from the transcription, and whether the generated summary contains any hallucinations. The secondary LLM is used to evaluate the summaries at scale.
Results and lessons learned
The implementation of the AI-powered metadata pipeline has been a transformative journey for DPG Media. Their approach saves days of work generating metadata for a TV series.
DPG Media chose Amazon Transcribe for its ease of transcription and low maintenance, with the added benefit of incremental improvements by AWS over the years. For metadata generation, DPG Media chose Anthropic Claude 3 Sonnet on Amazon Bedrock, instead of building direct integrations to various model providers. The flexibility to experiment with multiple models was appreciated, and there are plans to try out Anthropic Claude Opus when it becomes available in their desired AWS Region.
DPG Media decided to strike a balance between AI and human expertise by having the results generated by the pipeline validated by humans. This approach was chosen because the results would be exposed to end-customers, and AI systems can sometimes make mistakes. The goal was not to replace people but to enhance their capabilities through a combination of human curation and automation.
Transforming the video viewing experience is not merely about adding more descriptions; it’s about creating a richer, more engaging user experience. By implementing AI-driven processes, DPG Media aims to offer better-recommended content to users, foster a deeper understanding of its content library, and progress towards more automated and efficient annotation systems. This evolution promises not only to streamline operations but also to align content delivery with modern consumption habits and technological advancements.
Conclusion
In this post, we shared how DPG Media introduced AI-powered processes using Amazon Bedrock into its video publication pipelines. This solution can help accelerate audio metadata extraction, create a more engaging user experience, and save time.
We encourage you to learn more about how to gain a competitive advantage with powerful generative AI applications by visiting Amazon Bedrock and trying this solution out on a dataset relevant to your business.

About the Authors
Lucas Desard is GenAI Engineer at DPG Media. He helps DPG Media integrate generative AI efficiently and meaningfully into various company processes.
Tom Lauwers is a machine learning engineer on the video personalization team for DPG Media. He builds and architects the recommendation systems for DPG Media’s long-form video platforms, supporting brands like VTM GO, Streamz, and RTL play.
Sam Landuydt is the Area Manager Recommendation & Search at DPG Media. As the manager of the team, he guides ML and software engineers in building recommendation systems and generative AI solutions for the company.
Irina Radu is a Prototyping Engagement Manager, part of AWS EMEA Prototyping and Cloud Engineering. She helps customers get the most out of the latest tech, innovate faster, and think bigger.
Fernanda Machado, AWS Prototyping Architect, helps customers bring ideas to life and use the latest best practices for modern applications.
Andrew Shved, Senior AWS Prototyping Architect, helps customers build business solutions that use innovations in modern applications, big data, and AI.

How SailPoint uses Anthropic’s Claude on Amazon Bedrock to automatic …

Generative AI is revolutionizing the way developers approach programming by providing intelligent assistance and automation throughout the coding process. With the power of advanced language models and machine learning (ML) algorithms, generative AI can understand the context and intent behind a programmer’s code, offering valuable suggestions, completing code snippets, and even generating entire functions or modules based on high-level descriptions. This technology empowers developers to focus on higher-level problem-solving and architecture, while the AI handles the tedious and repetitive aspects of coding. One of the key advantages of large language models (LLMs) in programming is their ability to learn from the vast amounts of existing code and programming patterns they were trained on. This knowledge allows them to generate context-aware code, detect potential bugs or vulnerabilities, and offer optimizations to improve code quality and performance.
In this post, we highlight how the AWS Generative AI Innovation Center collaborated with SailPoint Technologies to build a generative AI-based coding assistant that uses Anthropic’s Claude Sonnet on Amazon Bedrock to help accelerate the development of software as a service (SaaS) connectors.
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.
SailPoint specializes in enterprise identity security solutions. Over 3,000 enterprises worldwide use SailPoint to help defend against today’s dynamic, identity-centric cyber threats while enhancing productivity and efficiency. Their products are designed to manage and secure access to applications and data through the lens of identity, at speed and scale, for users inside an organization and for external parties such as non-employees. SailPoint’s unified, intelligent, and extensible environment provides comprehensive identity governance capabilities, including access certifications, policy management, access request and provisioning, password management, and data access governance. This helps organizations make sure the right individuals have the right access to the right resources at the right times, thereby enforcing security policies and compliance requirements. Founded in 2005, SailPoint has grown to be a key player in identity security, serving customers globally across various industries.
SailPoint connectors and SaaS connectivity
SailPoint’s identity security solutions interface with various software as a service (SaaS) applications to retrieve the necessary information, such as account and access information, from an identity security standpoint. Each SaaS application implements these functionalities in slightly different ways and might expose their implementation through REST-based web APIs that are typically supported by OpenAPI specifications. SailPoint connectors are TypeScript modules that interface with a SaaS application and map the relevant identity security information (such as accounts and entitlements) to a standardized format understood by SailPoint. Based on the APIs exposed by the application, SailPoint connectors can create, update, and delete access on those accounts. SailPoint connectors help manage user identities and their access rights across different environments within an organization, supporting the organization’s compliance and security efforts.
Although a typical connector exposes several functions, for this post, we focus on developing the list user function of a connector that connects to an API endpoint for listing users, retrieving all the users, and transforming them into the format required by SailPoint.
In the following sections, we detail how we used Anthropic’s Claude Sonnet on Amazon Bedrock to automatically create the list user connector, a critical component of the broader SaaS connectivity.
Understanding the list user connector
Connectors are modules that can connect to an external service and retrieve and update relevant information from a SaaS application. To better understand how connectors are built, we give an example of the connector function that connects to DocuSign’s REST API’s getUsers endpoint. The following TypeScript code defines an asynchronous function listUsers that retrieves a list of user accounts from an external API and constructs a structured output for each user:

// imports
import { Context, Response, StdAccountListHandler, StdAccountListOutput } from '@sailpoint/connector-sdk';

const listUsers: StdAccountListHandler = async (context: Context, input: undefined, res: Response<StdAccountListOutput>) => {
    // retrieve api key and host url from context
    let apiKey = context.apiKey;
    let hostUrl = context.hostUrl;
    let hasMore = true;

    // url construction
    let url = `https://${hostUrl}/Management/v2/organizations/${context.organizationId}/users`;

    // loop through pages
    while (hasMore) {
        // fetch response from the endpoint
        let response = await fetch(url, {
            headers: {
                'Authorization': `Bearer ${apiKey}`
            }
        });
        let results = await response.json();

        // processing users from response
        let users = results.users;
        for (const user of users) {
            const output: StdAccountListOutput = {
                identity: user.id,
                attributes: {
                    user_name: user.user_name,
                    first_name: user.first_name,
                    last_name: user.last_name,
                    user_status: user.user_status,
                    membership_status: user.membership_status,
                    email: user.email,
                    created_on: user.created_on,
                    membership_created_on: user.membership_created_on,
                    ds_group_id: user.company_groups.map(group => group.ds_group_id),
                    ds_group_account_id: user.company_groups.map(group => group.ds_group_account_id)
                }
            };
            // send each account back through the response stream
            res.send(output);
        }
        // pagination
        if (results.paging.next) {
            url = results.paging.next;
        } else {
            hasMore = false;
        }
    }
}

The following is a breakdown of what each part of the code does:

Imports – The code imports several types and interfaces from @sailpoint/connector-sdk. These include Context, Response, StdAccountListHandler, and StdAccountListOutput, which are used to handle the input and output of the function in a standardized way within a SailPoint environment.
Function definition – listUsers is defined as an asynchronous function compatible with the StdAccountListHandler type. It uses the Context to access configuration details like API keys and the base URL, and a Response to structure the output.
Retrieve API key and host URL – These are extracted from the context parameter. They are used to authenticate and construct the request URL.
URL construction – The function constructs the initial URL using the hostUrl and organizationId from the context. This URL points to an endpoint that returns users associated with a specific organization.
Loop through pages – The while loop continues as long as there are more pages of data (hasMore is true). It serves the following functions:

Fetch data – Inside the while loop, a fetch request is made to the API endpoint. The request includes an Authorization header that uses the apiKey. The API’s response is converted to JSON format.
Process users – Inside the while loop, it extracts user data from the API response. The process loops through each user, constructing an StdAccountListOutput object for each one. This object includes user identifiers and attributes like user names, names, status, email, and group IDs.
Pagination – Inside the while loop, it checks if there is a next page URL in the pagination information (results.paging.next). If it exists, it updates the url for the next iteration of the loop. If not, it sets hasMore to false to stop the loop.

Understanding this example helps us understand the step-by-step process of building this function in a connector. We aim to reproduce this process using an LLM with a prompt chaining strategy.
Generate a TypeScript connector using an LLM prompt chain
There are several approaches to using pre-trained LLMs for code generation, with varying levels of complexity:

Single prompt – You can use models like Anthropic’s Claude to generate code by direct prompting. These models can generate code in a variety of languages, including TypeScript, but they don’t inherently possess domain-specific knowledge relevant to the task of building a connector. All the required information, including API specifications and formatting instructions, must be provided in the prompt, similar to the instructions that would be given to a developer. However, LLMs tend to struggle when given a long list of complex instructions. It’s also difficult for the prompt engineer to understand which steps are challenging for the LLM.
Agentic frameworks with LLMs – Agents are sophisticated frameworks that can use tools to perform a sequence of complex tasks. In this case, the agent starts by breaking down the user requests into steps, searches for necessary information using tools (a knowledge base or web browser), and autonomously generates code from start to finish. Although they’re powerful, these frameworks are complex to implement, often unstable in their behavior, and less controllable compared to other methods. Agents also require many LLM calls to perform a task, which makes them rather slow in practice. When the logic to perform a task is a fixed sequence of steps, agents are not an efficient option.
Prompt chain – A solution that finds a good trade-off between the two previous approaches involves using a prompt chaining technique. This method breaks the complex problem into a series of more manageable steps and integrates them to craft the final code. Each step has clear instructions that are easier for the LLM to follow, and a human in the loop can control the output of each step and correct the LLM if needed. This approach strikes a balance between flexibility and control, avoiding the extremes of the other two methods.

We initially tested the LLM’s ability to generate connector code based on a single prompt and realized that it struggles to generate code that addresses all aspects of the problem, such as pagination or nested data structures. To make sure the LLM would cover all the necessary components of the connector functions, and because creating a connector follows a fixed sequence of steps, prompt chaining was the most natural approach to improve the generated code.
The chain we used for connector generation consists of the following high-level steps:

Parse the data model of the API response into prescribed TypeScript classes.
Generate the function for user flattening in the format expected by the connector interface.
Understand the pagination of the API specs and formulate a high-level solution.
Generate the code for the ListUsers function by combining all the intermediate steps.

Step 1 is used as an input to Step 2, but Step 3 is separate. Both Step 2 and Step 3 results are fed to Step 4 for the final result. The following diagram illustrates this workflow.
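Expressed in TypeScript, the chain can be sketched as a short orchestration function. This is a minimal illustration, not SailPoint's implementation: `callLLM` and the `*Prompt` builders are hypothetical stand-ins for a real model invocation and for the full prompts described in the following sections; only the XML-tag parsing mirrors how the outputs are extracted.

```typescript
// Minimal sketch of the four-step chain. `callLLM` and the *Prompt builders
// are hypothetical stand-ins; the real prompts are shown later in this post.
type LLM = (prompt: string) => Promise<string>;

// Each step's answer is parsed out of XML tags such as <data-model>.
function extractTag(text: string, tag: string): string {
  const m = text.match(new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`));
  if (!m) throw new Error(`missing <${tag}> in LLM output`);
  return m[1].trim();
}

// Placeholder prompt builders (abridged; the real templates follow below).
const dataModelPrompt = (apiSpec: string) => `...parse ${apiSpec} into TypeScript classes...`;
const flattenPrompt = (dataModel: string) => `...generate a flattening function for ${dataModel}...`;
const paginationPrompt = (apiSpec: string, apiInfo: string) => `...describe pagination of ${apiSpec} ${apiInfo}...`;
const listUsersPrompt = (apiSpec: string, apiInfo: string, pagination: string, flattenFn: string) =>
  `...combine ${pagination} and ${flattenFn} into the final code for ${apiSpec} ${apiInfo}...`;

async function generateConnector(apiSpec: string, apiInfo: string, callLLM: LLM): Promise<string> {
  // Step 1: parse the API response into TypeScript classes
  const dataModel = extractTag(await callLLM(dataModelPrompt(apiSpec)), "data-model");
  // Step 2: generate the user flattening function from the Step 1 output
  const flattenFn = extractTag(await callLLM(flattenPrompt(dataModel)), "user-mapping-function");
  // Step 3: reason about pagination, independently of Steps 1 and 2
  const pagination = extractTag(await callLLM(paginationPrompt(apiSpec, apiInfo)), "pagination-logic");
  // Step 4: combine the Step 2 and Step 3 results into the final function
  return extractTag(await callLLM(listUsersPrompt(apiSpec, apiInfo, pagination, flattenFn)), "list-users-code");
}
```

Because each intermediate result is a plain string, a human in the loop can inspect and correct it before the next call, which is what makes the chain controllable.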

In the following sections, we will dive into the prompting techniques we used for each of these steps.
System prompt
The system prompt is an essential component of LLM prompting that typically provides the initial context to guide the model’s response. For all the prompts in the chain, we used the following system prompt:

"""
You are an expert web developer who has an excellent understanding of REST APIs and backend
API development using TypeScript. Your goal is to understand API specifications provided in
OpenAPI specification or Swagger specification as YAML files and to write TypeScript code,
XML, API descriptions etc. as requested in the task description. Do not deviate from the
provided specification and documentation; if something is unclear in the specification then
mention your uncertainty in the comments. Do not make up information that is not available in
the specs or description.

When writing TypeScript code, provide minimal but meaningful comments.
"""

More specifically, the system prompt is used to establish the role of the LLM (expert web developer), give it a general goal (understand API specs and write TypeScript code), give high-level instructions (add comments in the code) and set boundaries (do not make up information).
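As an illustration of how a system prompt is wired into each call, the snippet below builds a request body in Anthropic's Messages format, which Amazon Bedrock accepts for Claude models. The `max_tokens` value and the abridged prompt text are assumptions for this sketch, not the post's exact configuration.

```typescript
// Sketch: reusing one system prompt across every call in the chain.
// The body follows Anthropic's Messages API format on Amazon Bedrock;
// max_tokens and the abridged prompt text are illustrative choices.
const SYSTEM_PROMPT =
  "You are an expert web developer who has an excellent understanding of REST APIs " +
  "and backend API development using TypeScript. ..."; // abridged from the post

function buildClaudeRequestBody(taskPrompt: string): string {
  return JSON.stringify({
    anthropic_version: "bedrock-2023-05-31",
    max_tokens: 4096,
    system: SYSTEM_PROMPT, // role, goal, high-level instructions, boundaries
    messages: [{ role: "user", content: taskPrompt }], // per-step task prompt
  });
}
```

The resulting string would be passed as the body of a Bedrock `InvokeModel` call, so every step in the chain shares the same role and boundaries while the task prompt varies.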
Data model parsing
In this step, we prompt the LLM to understand the structure of the API response and create TypeScript classes corresponding to the objects in the response. Although this step isn’t strictly necessary for generating the response, it can help the LLM immensely in generating a correct connector. Similar to chain-of-thought reasoning for arithmetic problems, it forces the LLM to “think” before responding.
This step offers two primary benefits:

Verbose API response simplification – API responses specified in the documentation can be quite verbose. By converting the response structure into TypeScript classes, we compress the information into fewer lines of code, making it more concise and less complicated for the LLM to comprehend. This step helps ensure that the essential information is prominently displayed at the start.
Handling fragmented user responses – In some APIs, the user response is composed of several fragments because of the reuse of data structures. The OpenAPI specification uses the $ref tag to reference these reusable components. By converting the user response into TypeScript classes, we can consolidate all the relevant information into a single location. This consolidation simplifies the downstream steps by providing a centralized source of information.
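As a hypothetical illustration of the second benefit, consider a spec fragment that reuses a `Group` schema via `$ref`; after parsing, the referenced pieces live together as plain interfaces. The schema and field names below are invented for the example:

```typescript
// OpenAPI fragment (abridged; names invented for illustration):
//   User:
//     properties:
//       id:    { type: string }
//       group: { $ref: '#/components/schemas/Group' }
//   Group:
//     properties:
//       group_id:   { type: string }
//       group_name: { type: string }
//
// After Step 1, the $ref is resolved and everything sits in one place:
interface Group {
  group_id: string;
  group_name: string;
}

interface User {
  id: string;
  group: Group; // no longer scattered across the spec via $ref
}
```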

We use the following task prompt to convert the API response into prescribed TypeScript classes:

"""
You are given an API spec of OpenAPI or Swagger for a REST API endpoint
that serves a list of users for Software as a Service (SaaS) application. You will
be asked to extract the structure of the User in the JSON response from the API endpoint.

Here is the API spec provided between the XML tags <api-spec> </api-spec>.
Understand and remember the API spec well.

<api-spec>
{api_spec}
</api-spec>

Use the following instructions to create TypeScript interfaces based on the structure
of the User.
<instructions>
- Provide the code in between <data-model> </data-model> XML tags.
- If there are any nested objects, expand them into their own interfaces.
- Be comprehensive and include all attributes.
- Retrieve all attributes, including those marked as not mandatory, not required, or nullable.
- The attributes are listed under `properties` section.
- Output only one User interface that includes all the attributes from any interfaces it extends.
</instructions>

The expected format of the output is as follows:
<data-model>
// User
interface User {{
id: number;
first_name: string;
last_name: string;
email: string;
is_active: boolean;
company_groups_ids: number[];
other_attribute: string;
nested_type: NestedType[];
}}

// Some complex type
interface NestedType {{
type_id: string;
some_string_attribute: string;
some_number_attribute: number;
}}

</data-model>
"""

In the preceding prompt template, the variable {api_spec} is replaced with the API specification of the endpoint. A specific example for a DocuSign ListUsers endpoint is provided in the appendix.
The following code is an example of the LLM-generated classes when applied to the DocuSign API specs. This has been parsed out of the <data-model> tags.

// User
interface User {
  id: string; // format: uuid
  user_name: string;
  first_name: string;
  last_name: string;
  user_status: string; // One of: 'active' | 'created' | 'closed'
  membership_status: string; // One of: 'activation_required' | 'activation_sent' | 'active' | 'closed' | 'disabled'
  email: string;
  created_on: string; // format: date-time
  membership_created_on: string; // format: date-time
  ds_groups: DsGroup[];
}

// DsGroup
interface DsGroup {
  ds_group_id: string; // format: uuid
  account_id: string; // format: uuid
  source_product_name: string;
  group_id: string;
  group_name: string;
  description: string;
  is_admin: boolean;
  last_modified_on: string; // format: date-time
  user_count: number; // format: int32
  external_account_id: number; // format: int64
  account_name: string;
  membership_id: string; // format: uuid
}

User flattening function generation
The expected structure for each user is an object consisting of two properties: an identifier and a dictionary of attributes. The attributes dictionary is a map that associates string keys with either primitive attributes (number, Boolean, or string) or an array of primitive attributes. Because the JSON object structures in the response can be arbitrarily nested, we use the capabilities of an LLM to generate a user flattening and conversion function. Both the user ID and the attributes are extracted from the response. By employing this approach, we isolate the intricate task of converting the user structure in the REST API response into the format required by the SailPoint connector SDK (hereafter referred to as the connector SDK).
The benefits of this approach are twofold. First, it allows for a cleaner and more modular code design, because the complex conversion process is abstracted away from the main code base. Second, it enables greater flexibility and adaptability, because the conversion function can be modified or regenerated to accommodate changes in the API response structure or the connector SDK requirements, without necessitating extensive modifications to the surrounding code base.
We use the following prompt to generate the conversion function, which takes as input the data model generated in the previous step:

"""
Understand and remember the following data model provided
as a set of TypeScript class definitions.
You will be required to generate a TypeScript function based on the
data model provided between <data-model> </data-model> XML tags.

<data-model>
{data_model}
</data-model>

DO NOT use the TypeScript interfaces defined above in the code you write,
as they will be unavailable. Substitute them with the type `any` where required
to write code that works with strict TypeScript.

Given the TypeScript class definitions and data model above, your goal is to
write a TypeScript function that transforms the user into an object containing two
keys: an `identity` and `attributes`. The attributes field contains all attributes
other than the identifier attribute.

Here are the rules for mapping attributes from the user object to the attributes field:

<instructions-for-mapping>
- The function takes in a user and returns StdAccountListOutput.
- Extract all attributes specified in the user schema above in the <data-model>
- attributes can only contain either primitive values or arrays of primitive values.
- Determine the type of the attribute from the <data-model> above. Do not guess it from the
names of the attributes. E.g. if the name is plural don't assume it is an array.
- All primitive attributes such as strings and numbers are copied without transformation.
- All arrays of primitive attributes are copied without transformation.
- All objects are flattened out and their attributes are copied as simple attributes.
- All arrays of objects are converted into multiple attributes where each attribute is an array of primitive type.
See further instructions below between the XML tags <object-array-instructions>.
- Use `any` type in functions and arrow function definitions so that it works with TypeScript.
</instructions-for-mapping>

<object-array-instructions>

Consider the following structure of the response where `obj_array` is an attribute that is array of objects of type `MyObj`.
Then in the flattening process, for the response, you will need to convert the `obj_array` attribute
into multiple attributes like `obj_array_attribute1`, `obj_array_attribute2` each of which is an array of strings in this case.

interface User {{
  id: string; // format: uuid
  user_name: string;
  first_name: string;
  obj_array: MyObj[];
}}

interface MyObj {{
  attribute1: string; // format: uuid
  attribute2: string; // format: uuid
  attribute3: string;
}}
</object-array-instructions>

Output the code according to the template program below enclosed between the <user-mapping-function> XML tags.

This function is part of a larger program so keep the StdAccountListOutput in the function as it is.
Do not overcomplicate and write simple, well readable code.

<user-mapping-function>

function flattenUser(user: any): StdAccountListOutput {{
  return {{
    identity: user.id,
    attributes: {{
      first_name: user.first_name,
      last_name: user.last_name,
      user_type: user.type,
      login: user.login,
      email: user.email,
      location: user.location,
      employee_id: user.employee_id,
      reviewer_id: user.reviewer_id,
      is_active: user.is_active,
      struct_array_attribute: user.struct_array.map( (val: any) => val.struct_attribute)
    }}
  }}
}};

</user-mapping-function>
"""

In the preceding prompt template, we replace the {data_model} variable with the data model of TypeScript classes extracted in the previous generation step of parsing the data model.
The following code is an example of the LLM-generated user flattening function when applied to the DocuSign API:

function flattenUser(user: any): StdAccountListOutput {
  const identity = user.id;
  const attributes: { [key: string]: any } = {};

  // Copy primitive attributes
  attributes.user_name = user.user_name;
  attributes.first_name = user.first_name;
  attributes.last_name = user.last_name;
  attributes.user_status = user.user_status;
  attributes.membership_status = user.membership_status;
  attributes.email = user.email;
  attributes.created_on = user.created_on;
  attributes.membership_created_on = user.membership_created_on;

  // Convert array of objects to multiple attributes
  const dsGroupAttributes: { [key: string]: any[] } = {};
  user.ds_groups.forEach((dsGroup: any) => {
    dsGroupAttributes.ds_group_id = dsGroupAttributes.ds_group_id || [];
    dsGroupAttributes.ds_group_id.push(dsGroup.ds_group_id);

    dsGroupAttributes.account_id = dsGroupAttributes.account_id || [];
    dsGroupAttributes.account_id.push(dsGroup.account_id);

    dsGroupAttributes.source_product_name = dsGroupAttributes.source_product_name || [];
    dsGroupAttributes.source_product_name.push(dsGroup.source_product_name);

    dsGroupAttributes.group_id = dsGroupAttributes.group_id || [];
    dsGroupAttributes.group_id.push(dsGroup.group_id);

    dsGroupAttributes.group_name = dsGroupAttributes.group_name || [];
    dsGroupAttributes.group_name.push(dsGroup.group_name);

    dsGroupAttributes.description = dsGroupAttributes.description || [];
    dsGroupAttributes.description.push(dsGroup.description);

    dsGroupAttributes.is_admin = dsGroupAttributes.is_admin || [];
    dsGroupAttributes.is_admin.push(dsGroup.is_admin);

    dsGroupAttributes.last_modified_on = dsGroupAttributes.last_modified_on || [];
    dsGroupAttributes.last_modified_on.push(dsGroup.last_modified_on);

    dsGroupAttributes.user_count = dsGroupAttributes.user_count || [];
    dsGroupAttributes.user_count.push(dsGroup.user_count);

    dsGroupAttributes.external_account_id = dsGroupAttributes.external_account_id || [];
    dsGroupAttributes.external_account_id.push(dsGroup.external_account_id);

    dsGroupAttributes.account_name = dsGroupAttributes.account_name || [];
    dsGroupAttributes.account_name.push(dsGroup.account_name);

    dsGroupAttributes.membership_id = dsGroupAttributes.membership_id || [];
    dsGroupAttributes.membership_id.push(dsGroup.membership_id);
  });

  Object.assign(attributes, dsGroupAttributes);

  return {
    identity,
    attributes
  };
}
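The generated function spells out every DocuSign attribute explicitly. The same mapping rules can also be written generically; the sketch below is our illustration of those rules (primitives copied as-is, arrays of objects pivoted into per-field arrays), not SailPoint's SDK code, so `StdAccountListOutput` is replaced with a plain return type.

```typescript
type Primitive = string | number | boolean;

// Generic sketch of the flattening rules from the prompt: primitives and
// arrays of primitives are copied unchanged; each array-of-objects attribute
// is pivoted into one array per nested field (e.g. ds_groups -> ds_group_id[]).
function flattenGeneric(user: Record<string, any>, idKey = "id") {
  const attributes: Record<string, Primitive | Primitive[]> = {};
  for (const [key, value] of Object.entries(user)) {
    if (key === idKey) continue;
    if (Array.isArray(value) && typeof value[0] === "object" && value[0] !== null) {
      // Array of objects: build one output array per nested attribute
      for (const obj of value) {
        for (const [k, v] of Object.entries(obj)) {
          const arr = (attributes[k] as Primitive[]) ?? [];
          arr.push(v as Primitive);
          attributes[k] = arr;
        }
      }
    } else {
      attributes[key] = value; // primitive or array of primitives
    }
  }
  return { identity: user[idKey], attributes };
}
```

Applied to the DocuSign response above, this should produce the same identity/attributes shape as the generated `flattenUser`, though a hand-rolled generic version trades the explicitness that makes the LLM-generated code easy to audit.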

Pagination understanding
As mentioned earlier, the REST API can implement one or more pagination schemes. Often, the pagination details aren’t explicitly mentioned. During the development of the chain, we found that when there are multiple pagination schemes, the LLM would mix up elements of different pagination schemes and output code that isn’t coherent and sometimes also contains errors. Because looping over the paged results is a crucial step, we separate out this step in the code generation to let the LLM understand the pagination scheme implemented by the API and formulate its response at a high level before outputting the code. This allows the LLM to think step by step in formulating the response. This step generates the intermediate reasoning, which is fed into the next and final step: generating the list users function code.
We use the following prompt to get the pagination logic. Because we’re using Anthropic’s Claude Sonnet on Amazon Bedrock, we ask the LLM to output the logic in XML format, which is known to be an efficient way to structure information for that model.

"""
Understand and remember the following OpenAPI specification provided between the
<api-spec> </api-spec> XML tags. You will answer questions based on this specification,
which describes an endpoint for listing users from a SaaS application.

<api-spec>
{api_spec}
</api-spec>

In addition to the specification, use the following information about the API to
understand the details that are not available in the spec. The details
are included in between XML tags <api-info> </api-info>.

<api-info>
{api_info}
</api-info>

The list users API is used to obtain a list of users from the API. This API
may implement one or more pagination schemes to retrieve the list of users.
Pagination is used to retrieve pages of results one at a time instead of
all at once.

Your goal is to understand the multiple pagination schemes implemented
in the API end point and use the simplest scheme to loop over the pages
to retrieve all the users.

First describe in details how the pagination is implemented by the API
in a step by step manner. Include the details of how to get the first
page and then how to get the subsequent pages. Provide the output in between
the XML tags <pagination-logic> </pagination-logic>. If the API does not
implement pagination then output “No pagination implemented”.

If the API implements multiple pagination schemes:
- Select the easiest scheme and implement using only that one scheme.
- If the API response contains a next URL either in the response body or the
headers, prefer that pagination scheme over more complex offset-based schemes.

If there is insufficient or missing information in the specs then do not
make up or guess it. Specify it in the output.

Output the results in the following format

<pagination-logic>
<pagination-schemes> list all the pagination schemes </pagination-schemes>
<selected-scheme> select one scheme to implement looping </selected-scheme>
<steps-first-page> steps to get first page </steps-first-page>
<steps-next> steps to get next page </steps-next>
<termination-logic> condition for exiting loop </termination-logic>
<information-complete> yes/no </information-complete>
</pagination-logic>
"""

In the preceding prompt template, the variable {api_spec} is replaced with the API specification. An example of the DocuSign API is provided in the appendix at the end of this post. The variable {api_info} can be replaced with additional API documentation in natural language, which is left as an empty string in the DocuSign example.
The following is the LLM’s response for the pagination logic extraction in the case of the DocuSign API, parsed out of the <pagination-logic> tags:

"""
<pagination-schemes>
1. Offset-based pagination using the start and take query parameters.
2. Index-based pagination using the start and end query parameters.
3. Link-based pagination using the next and previous links in the response.
</pagination-schemes>

<selected-scheme>
Link-based pagination using the next and previous links in the response.
</selected-scheme>

<steps-first-page>
1. Make a GET request to the /v2/organizations/{organizationId}/users endpoint without any pagination parameters.
2. The response will contain the first page of results and a next link for the next page (if there are more results).
</steps-first-page>

<steps-next>
1. Check if the response contains a next link.
2. If the next link is present, make a GET request to the URL provided in the next link.
3. The response will contain the next page of results and a next link for the subsequent page (if there are more results).
</steps-next>

<termination-logic>
Continue making requests to the next link until the response no longer contains a next link, indicating that there are no more pages.
</termination-logic>

<information-complete>
yes
</information-complete>
"""

ListUsers function generation
This final step in the chain combines the pagination logic and the user flattening function generated in the previous steps to formulate the final response: the TypeScript function that retrieves a list of users from the provided API.
We use the following prompt to generate the complete TypeScript function:

"""
Your task is to write a TypeScript program to retrieve a list of users from REST API. Detailed
instructions for the task are provided below. The API typically returns a set of users at a
time, you will have to paginate through the results to retrieve all users while there are more
results remaining.

Understand and remember the following OpenAPI YAML specification provided between the
<api-spec> </api-spec> XML tags you will be asked to write a TypeScript program based on it.

<api-spec>
{api_spec}
</api-spec>

In addition to the specs the following documentation and implementation details about the
API are provided in between the XML tags <api-info> </api-info>.

<api-info>
{api_info}
</api-info>

The following pagination logic specified in between <pagination-logic> </pagination-logic> XML
tags provides high-level details on how to implement looping over paginated results to retrieve
all the users. Select the pagination scheme according to the preference mentioned in the pagination logic below.

<pagination-logic>
{pagination_logic}
</pagination-logic>

Now, implement a TypeScript function that retrieves all the users following the instructions below
<instructions>
- Do not modify the `flattenUser` function and reproduce it as it is.
- Change only the BODY of the `listUsers` function but do not modify the function signature, as it is
part of the larger program that expects this function signature.
- Do not create or invent any new types.
- For pagination adhere to the instructions provided in <pagination-logic> and <api-spec>
- If pagination requires a number of results, then fetch 50 users at a time.
- Assume that `context.hostUrl` from the input context will provide the API hostname and port in the
format "hostname:port_number" or "host_ip_address:port_number".
- If the API uses Link-based pagination then you typically do not need to reconstruct the URL using
`hostUrl` as the complete link will be provided in the next URL or in the link headers.
- Replace the `endpoint_url` with the endpoint URL provided in the specs.
- You can assume that any required values such as `apiKey` will be provided in the `context` variable.
- Change the authorization headers to match the information in the headers.
- Assume that any parameters that are required to construct the path are supplied through the `context` variable. The names of such variables
will match exactly with the variable specified in the templated path in the API specs.
</instructions>

Here is the example output

<example-template>
import {{ Context, Response, StdAccountListHandler, StdAccountListOutput }} from '@sailpoint/connector-sdk';

{flatten_user_function}

const listUsers: StdAccountListHandler = async (context: Context, input: undefined, res: Response<StdAccountListOutput>) => {{
  let apiKey = context.apiKey;
  let hostUrl = context.hostUrl;
  let hasMore = true;

  let url = `http://${{hostUrl}}/{{endpoint_path}}`;
  while (hasMore) {{

    // fetch response from the endpoint
    let response = await fetch(url, {{
      headers: {{
        'Authorization': `${{apiKey}}`
      }}
    }});
    let results = await response.json();

    // get the users from the returned results
    let users = results.users;
    for (let user of users) {{
      let flattened_user = flattenUser(user);
      res.send(flattened_user);
    }}

    // Set the url to the "next" URL from the response if it is available
    if (results.next) {{
      url = results.next;
    }} else {{
      hasMore = false;
    }}
  }}
}}
</example-template>

Output the generated code between the XML tags <list-users-code> </list-users-code> and
only output the code without any additional formatting like triple quotes.
"""

In this prompt, we replace {flatten_user_function} with the flattenUser function and {pagination_logic} with the pagination logic, both generated in the earlier steps. We provide a template for the listUsers function to make sure the final output meets the requirements for the connector function. The resulting output is the following function, which uses the flattenUser function from earlier:

const listAccounts: StdAccountListHandler = async (context: Context, input: undefined, res: Response<StdAccountListOutput>) => {
  let apiKey = context.apiKey;
  let hostUrl = context.hostUrl;
  let organizationId = context.organizationId;
  let hasMore = true;

  let url = `https://${hostUrl}/Management/v2/organizations/${organizationId}/users`;
  while (hasMore) {

    // fetch response from the endpoint
    let response = await fetch(url, {
      headers: {
        'Authorization': `Bearer ${apiKey}`
      }
    });
    let results = await response.json();

    // get the users from the returned results
    let users = results.users;
    for (let user of users) {
      let flattened_user = flattenUser(user);
      res.send(flattened_user);
    }

    // Set the url to the "next" URL from the response if it is available
    if (results.paging.next) {
      url = results.paging.next;
    } else {
      hasMore = false;
    }
  }
}

Lessons learned
In this post, we demonstrated how LLMs can address complex code generation problems by employing various core prompting principles and the prompt chaining technique. Although LLMs excel at following clearly defined instructions and generating small code snippets, this use case involved a substantial amount of contextual information in the form of API specifications and user instructions. Our findings from this exercise are the following:

Decomposing complex problems – Breaking down a complex code generation problem into several intermediate steps of lower complexity enhances the LLM’s performance. Providing a single complex prompt can result in the LLM missing some instructions. The prompt chaining approach enhances the robustness of the generation, maintaining better adherence to instructions.
Iterative optimization – This method allows for iterative optimization of intermediate steps. Each part of the chain can be refined independently before moving to the next step. LLMs can be sensitive to minor changes in instructions, and adjusting one aspect can unintentionally affect other objectives. Prompt chaining offers a systematic way to optimize each step independently.
Handling complex decisions – In the section on understanding pagination, we illustrated how LLMs can reason through various options and make complex decisions before generating code. For instance, when the input API specification supports multiple pagination schemes, we prompted the LLM to decide on the pagination approach before implementing the code. With direct code generation, without using an intermediate reasoning step, the LLM tended to mix elements of different pagination schemes, resulting in inconsistent output. By forcing decision-making first, in natural language, we achieved more consistent and accurate code generation.

Through automated code generation, SailPoint was able to dramatically reduce connector development time from hours or days to mere minutes. The approach also democratizes code development, so you don’t need deep TypeScript expertise or intimate familiarity with SailPoint’s connector SDK. By accelerating connector generation, SailPoint significantly shortens the overall customer onboarding process. This streamlined workflow not only saves valuable developer time but also enables faster integration of diverse systems, ultimately allowing customers to use SailPoint’s identity security solutions more rapidly and effectively.
Conclusion
Our AI-powered solution for generating connector code opens up new possibilities for integrating with REST APIs. By automating the creation of connectors from API specifications, developers can rapidly build robust connections to any REST API, saving developer time and reducing the time to value for onboarding new customers. As demonstrated in this post, this technology can significantly streamline the process of working with diverse APIs, allowing teams to focus on using the data and functionality these APIs provide rather than getting overwhelmed by connector code details. Consider how such a solution could enhance your own API integration efforts—it could be the key to more efficient and effective use of the myriad APIs available in today’s interconnected digital landscape.

About the Authors
Erik Huckle is the product lead for AI at SailPoint, where he works to solve critical customer problems in the identity security ecosystem through generative AI and data technologies. Prior to SailPoint, Erik co-founded a startup in robotic automation and later joined AWS as the first product hire at Amazon One. Erik mentors local startups and serves as a board member and tech committee lead for an EdTech nonprofit organization.
Tyler McDonnell is the engineering head of AI at SailPoint, where he leads the development of AI solutions to drive innovation and impact in the identity security world. Prior to SailPoint, Tyler led machine learning research and engineering teams at several early to late-stage startups and published work in domains spanning software maintenance, information retrieval, and deep learning. He’s passionate about building products that use AI to bring positive impact to real people and problems.
Anveshi Charuvaka is a Senior Applied Scientist at the Generative AI Innovation Center, where he helps customers adopt Generative AI by implementing solutions for their critical business challenges. With a PhD in Machine Learning and over a decade of experience, he specializes in applying innovative machine learning and generative AI techniques to address complex real-world problems.
Aude Genevay is a Senior Applied Scientist at the Generative AI Innovation Center, where she helps customers tackle critical business challenges and create value using generative AI. She holds a PhD in theoretical machine learning and enjoys turning cutting-edge research into real-world solutions.
Mofijul Islam is an Applied Scientist II at the AWS Generative AI Innovation Center, where he helps customers tackle complex, customer-centric research challenges using generative AI, large language models (LLM), multi-agent learning, and multimodal learning. He holds a PhD in machine learning from the University of Virginia, where his work focused on multimodal machine learning, multilingual NLP, and multitask learning. His research has been published in top-tier conferences like NeurIPS, ICLR, AISTATS, and AAAI, as well as IEEE and ACM Transactions.
Yasin Khatami is a Senior Applied Scientist at the Generative AI Innovation Center. With more than a decade of experience in artificial intelligence (AI), he implements state-of-the-art AI products for AWS customers to drive efficiency and value for customer platforms. His expertise is in generative AI, large language models (LLM), multi-agent techniques, and multimodal learning.
Karthik Ram is a Principal Solutions Architect with Amazon Web Services based in Columbus, Ohio. He works with Independent Software Vendors (ISVs) to build secure and innovative cloud solutions, including helping with their products and solving their business problems using data-driven approaches. Karthik’s area of depth is Cloud Security with a focus on Infrastructure Security and threat detection.

Appendix
The following API specifications were used for the experiments in this post:

# Copyright (c) 2017- DocuSign, Inc. (https://www.docusign.com)
swagger: '2.0'
info:
  title: DocuSign API
  version: v2.1
host: api.docusign.net
basePath: "/Management"
schemes:
  - https
consumes:
  - application/json
produces:
  - application/json
paths:
  "/v2/organizations/{organizationId}/users":
    get:
      tags:
        - Users
      summary: Returns information about the users in an organization.
      description: |-
        Returns information about the users in an organization.

        You must include at least one of the following query parameters in the request:

        - `account_id`: The ID of an account associated with the organization.
        - `organization_reserved_domain_id`: The ID of one of the organization's reserved domains.
        - `email`: An email address associated with the users that you want to return.
      operationId: OrganizationUser_OrganizationUsers_GetV2
      produces:
        - application/json
      parameters:
        - name: organizationId
          in: path
          description: The organization ID Guid
          required: true
          type: string
          format: uuid
        - name: start
          in: query
          description: Index of first item to include in the response. The default value is 0.
          required: false
          type: integer
          format: int32
        - name: take
          in: query
          description: Page size of the response. The default value is 20.
          required: false
          type: integer
          format: int32
        - name: end
          in: query
          description: Index of the last item to include in the response. Ignored if `take` parameter is specified.
          required: false
          type: integer
          format: int32
        - name: email
          in: query
          description: Email address of the desired user. At least one of `email`, `account_id` or `organization_reserved_domain_id` must be specified.
          required: false
          type: string
        - name: email_user_name_like
          in: query
          description: Selects users by pattern matching on the user's email address
          required: false
          type: string
        - name: status
          in: query
          description: Status.
          required: false
          type: string
        - name: membership_status
          in: query
          description: |
            The user's membership status. One of:

            - `activation_required`
            - `activation_sent`
            - `active`
            - `closed`
            - `disabled`
          required: false
          type: string
        - name: account_id
          in: query
          description: Select users that are members of the specified account. At least one of `email`, `account_id` or `organization_reserved_domain_id` must be specified.
          required: false
          type: string
          format: uuid
        - name: organization_reserved_domain_id
          in: query
          description: Select users that are in the specified domain. At least one of `email`, `account_id` or `organization_reserved_domain_id` must be specified.
          required: false
          type: string
          format: uuid
        - name: last_modified_since
          in: query
          description: Select users whose data have been modified since the date specified. `account_id` or `organization_reserved_domain_id` must be specified.
          required: false
          type: string
      responses:
        '200':
          description: OK
          schema:
            type: object
            properties:
              users:
                type: array
                items:
                  type: object
                  properties:
                    id:
                      format: uuid
                      type: string
                      example: 00000000-0000-0000-0000-000000000000
                      description: The user's unique ID.
                    user_name:
                      type: string
                      description: The full name of the user.
                    first_name:
                      type: string
                      description: The user's first name.
                    last_name:
                      type: string
                      description: The user's last name.
                    user_status:
                      type: string
                      description: |
                        The user's status. One of:

                        - `active`
                        - `created`
                        - `closed`
                    membership_status:
                      type: string
                      description: |
                        The user's membership status. One of:

                        - `activation_required`
                        - `activation_sent`
                        - `active`
                        - `closed`
                        - `disabled`
                    email:
                      type: string
                      description: The email address.
                    created_on:
                      format: date-time
                      type: string
                      description: The date the user's account was created.
                    membership_created_on:
                      format: date-time
                      type: string
                      description: The date on which the user became a member of the organization.
                    ds_groups:
                      type: array
                      items:
                        type: object
                        properties:
                          ds_group_id:
                            format: uuid
                            type: string
                            example: 00000000-0000-0000-0000-000000000000
                            description: ''
                          account_id:
                            format: uuid
                            type: string
                            example: 00000000-0000-0000-0000-000000000000
                            description: Select users that are members of the specified account. At least one of `email`, `account_id` or `organization_reserved_domain_id` must be specified.
                          source_product_name:
                            type: string
                          group_id:
                            type: string
                          group_name:
                            type: string
                          description:
                            type: string
                          is_admin:
                            type: boolean
                          last_modified_on:
                            format: date-time
                            type: string
                          user_count:
                            format: int32
                            type: integer
                          external_account_id:
                            format: int64
                            type: integer
                          account_name:
                            type: string
                          membership_id:
                            format: uuid
                            type: string
                            example: 00000000-0000-0000-0000-000000000000
description: Information about a user.
description: A list of users.
paging:
type: object
properties:
result_set_size:
format: int32
type: integer
description: The number of items in a result set (page).
result_set_start_position:
format: int32
type: integer
description: The index position of the first result in this set.
result_set_end_position:
format: int32
type: integer
description: The index position of the last result in this set.
total_set_size:
format: int32
type: integer
description: The total number of results.
next:
type: string
description: ‘A URL to the next set of results. ‘
previous:
type: string
description: ‘A URL to the previous set of results. ‘
description: Contains information about paging through the results.
description: A response containing information about users.
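As a sketch, the "at least one of" rule above can be enforced client-side before issuing the request. The base URL and the exact path below are hypothetical placeholders, not part of the specification:

```python
from urllib.parse import urlencode

def build_org_users_url(base_url, organization_id, **query):
    """Build the GET URL for listing an organization's users.

    Enforces the documented rule: at least one of `email`,
    `account_id`, or `organization_reserved_domain_id` is required.
    The path segment below is an assumed example, not the real route.
    """
    required = {"email", "account_id", "organization_reserved_domain_id"}
    provided = {k for k, v in query.items() if v is not None}
    if not required & provided:
        raise ValueError(
            "Provide at least one of: email, account_id, "
            "organization_reserved_domain_id"
        )
    params = {k: v for k, v in query.items() if v is not None}
    return f"{base_url}/v2/organizations/{organization_id}/users?{urlencode(params)}"

# Example with a hypothetical base URL:
url = build_org_users_url(
    "https://api.example.com",
    "11111111-2222-3333-4444-555555555555",
    email="user@example.com",
    take=20,
)
```

Calling the builder with only paging parameters (`start`, `take`, `end`) raises immediately, which surfaces the 400-class error locally instead of after a round trip.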

Map Earth’s vegetation in under 20 minutes with Amazon SageMaker

In today’s rapidly changing world, monitoring the health of our planet’s vegetation is more critical than ever. Vegetation plays a crucial role in maintaining an ecological balance, providing sustenance, and acting as a carbon sink. Traditionally, monitoring vegetation health has been a daunting task. Methods such as field surveys and manual satellite data analysis are not only time-consuming, but also require significant resources and domain expertise. These cumbersome approaches often lead to delays in data collection and analysis, making it difficult to track and respond swiftly to environmental changes. Furthermore, the high costs associated with these methods limit their accessibility and frequency, hindering comprehensive and ongoing global vegetation monitoring efforts at a planetary scale. In light of these challenges, we have developed an innovative solution to streamline and enhance the efficiency of vegetation monitoring processes on a global scale.
Transitioning from the traditional, labor-intensive methods of monitoring vegetation health, Amazon SageMaker geospatial capabilities offer a streamlined, cost-effective solution. Amazon SageMaker supports geospatial machine learning (ML) capabilities, allowing data scientists and ML engineers to build, train, and deploy ML models using geospatial data. These geospatial capabilities open up a new world of possibilities for environmental monitoring. With SageMaker, users can access a wide array of geospatial datasets, efficiently process and enrich this data, and accelerate their development timelines. Tasks that previously took days or even weeks to accomplish can now be done in a fraction of the time.
In this post, we demonstrate the power of SageMaker geospatial capabilities by mapping the world’s vegetation in under 20 minutes. This example not only highlights the efficiency of SageMaker, but also shows how geospatial ML can be used to monitor the environment for sustainability and conservation purposes.
Identify areas of interest
We begin by illustrating how SageMaker can be applied to analyze geospatial data at a global scale. To get started, we follow the steps outlined in Getting Started with Amazon SageMaker geospatial capabilities. We start with the specification of the geographical coordinates that define a bounding box covering the areas of interest. This bounding box acts as a filter to select only the relevant satellite images that cover the Earth’s land masses.

import os
import json
import time
import boto3
import geopandas
from shapely.geometry import Polygon
import leafmap.foliumap as leafmap
import sagemaker
import sagemaker_geospatial_map

session = boto3.Session()
execution_role = sagemaker.get_execution_role()
sg_client = session.client(service_name="sagemaker-geospatial")
coordinates = [
    [-179.034845, -55.973798],
    [179.371094, -55.973798],
    [179.371094, 83.780085],
    [-179.034845, 83.780085],
    [-179.034845, -55.973798],
]
polygon = Polygon(coordinates)
world_gdf = geopandas.GeoDataFrame(index=[0], crs="epsg:4326", geometry=[polygon])
m = leafmap.Map(center=[37, -119], zoom=4)
m.add_basemap("Esri.WorldImagery")
m.add_gdf(world_gdf, layer_name="AOI", style={"color": "red"})
m

Data acquisition
SageMaker geospatial capabilities provide access to a wide range of public geospatial datasets, including Sentinel-2, Landsat 8, Copernicus DEM, and NAIP. For our vegetation mapping project, we’ve selected Sentinel-2 for its global coverage and update frequency. The Sentinel-2 satellite captures images of Earth’s land surface at a resolution of 10 meters every 5 days. We pick the first week of December 2023 in this example. To make sure we cover most of the visible earth surface, we filter for images with less than 10% cloud coverage. This way, our analysis is based on clear and reliable imagery.

search_rdc_args = {
    "Arn": "arn:aws:sagemaker-geospatial:us-west-2:378778860802:raster-data-collection/public/nmqj48dcu3g7ayw8",  # Sentinel-2 L2A
    "RasterDataCollectionQuery": {
        "AreaOfInterest": {
            "AreaOfInterestGeometry": {
                "PolygonGeometry": {
                    "Coordinates": [
                        [
                            [-179.034845, -55.973798],
                            [179.371094, -55.973798],
                            [179.371094, 83.780085],
                            [-179.034845, 83.780085],
                            [-179.034845, -55.973798],
                        ]
                    ]
                }
            }
        },
        "TimeRangeFilter": {
            "StartTime": "2023-12-01T00:00:00Z",
            "EndTime": "2023-12-07T23:59:59Z",
        },
        "PropertyFilters": {
            "Properties": [{"Property": {"EoCloudCover": {"LowerBound": 0, "UpperBound": 10}}}],
            "LogicalOperator": "AND",
        },
    },
}

s2_items = []
s2_tile_ids = []
s2_geometries = {
    "id": [],
    "geometry": [],
}
while search_rdc_args.get("NextToken", True):
    search_result = sg_client.search_raster_data_collection(**search_rdc_args)
    for item in search_result["Items"]:
        s2_id = item["Id"]
        s2_tile_id = s2_id.split("_")[1]
        # filter out tiles that cover the same area
        if s2_tile_id not in s2_tile_ids:
            s2_tile_ids.append(s2_tile_id)
            s2_geometries["id"].append(s2_id)
            s2_geometries["geometry"].append(Polygon(item["Geometry"]["Coordinates"][0]))
            del item["DateTime"]
            s2_items.append(item)

    search_rdc_args["NextToken"] = search_result.get("NextToken")

print(f"{len(s2_items)} unique Sentinel-2 images found.")

Using the search_raster_data_collection function from SageMaker geospatial, we identified 8,581 unique Sentinel-2 images taken in the first week of December 2023. To validate our selection, we plotted the footprints of these images on a map, confirming that we had the correct images for our analysis.

s2_gdf = geopandas.GeoDataFrame(s2_geometries)
m = leafmap.Map(center=[37, -119], zoom=4)
m.add_basemap("OpenStreetMap")
m.add_gdf(s2_gdf, layer_name="Sentinel-2 Tiles", style={"color": "blue"})
m

SageMaker geospatial processing jobs
When querying data with SageMaker geospatial capabilities, we received comprehensive details about our target images, including the data footprint, properties around spectral bands, and hyperlinks for direct access. With these hyperlinks, we can bypass traditional memory and storage-intensive methods of first downloading and subsequently processing images locally, a task made even more daunting by the size and scale of our dataset, spanning over 4 TB. Each of the 8,000 images is large, has multiple channels, and weighs in at approximately 500 MB. Processing multiple terabytes of data on a single machine would be time-prohibitive. Although setting up a processing cluster is an alternative, it introduces its own set of complexities, from data distribution to infrastructure management. SageMaker geospatial streamlines this with Amazon SageMaker Processing. We use the purpose-built geospatial container with SageMaker Processing jobs for a simplified, managed experience to create and run a cluster. With just a few lines of code, you can scale out your geospatial workloads with SageMaker Processing jobs. You simply specify a script that defines your workload, the location of your geospatial data on Amazon Simple Storage Service (Amazon S3), and the geospatial container. SageMaker Processing provisions cluster resources for you to run city-, country-, or continent-scale geospatial ML workloads.
For our project, we’re using 25 clusters, with each cluster comprising 20 instances, to scale out our geospatial workload. Next, we divided the 8,581 images into 25 batches for efficient processing. Each batch contains approximately 340 images. These batches are then evenly distributed across the machines in a cluster. All batch manifests are uploaded to Amazon S3, ready for the processing job, so each segment is processed swiftly and efficiently.

def s2_item_to_relative_metadata_url(item):
    parts = item["Assets"]["visual"]["Href"].split("/")
    tile_prefix = parts[4:-1]
    return "{}/{}.json".format("/".join(tile_prefix), item["Id"])

num_jobs = 25
num_instances_per_job = 20  # maximum 20

manifest_list = {}
for idx in range(num_jobs):
    manifest = [{"prefix": "s3://sentinel-cogs/sentinel-s2-l2a-cogs/"}]
    manifest_list[idx] = manifest

# split the manifest for N processing jobs
for idx, item in enumerate(s2_items):
    job_idx = idx % num_jobs
    manifest_list[job_idx].append(s2_item_to_relative_metadata_url(item))

# upload the manifests to S3
sagemaker_session = sagemaker.Session()
s3_bucket_name = sagemaker_session.default_bucket()
s3_prefix = "processing_job_demo"
s3_client = boto3.client("s3")
s3 = boto3.resource("s3")

manifest_dir = "manifests"
os.makedirs(manifest_dir, exist_ok=True)

for job_idx, manifest in manifest_list.items():
    manifest_file = f"{manifest_dir}/manifest{job_idx}.json"
    s3_manifest_key = s3_prefix + "/" + manifest_file
    with open(manifest_file, "w") as f:
        json.dump(manifest, f)

    s3_client.upload_file(manifest_file, s3_bucket_name, s3_manifest_key)
    print("Uploaded {} to {}".format(manifest_file, s3_manifest_key))

With our input data ready, we now turn to the core analysis that will reveal insights into vegetation health through the Normalized Difference Vegetation Index (NDVI). NDVI is calculated from the difference between Near-infrared (NIR) and Red reflectances, normalized by their sum, yielding values that range from -1 to 1. Higher NDVI values signal dense, healthy vegetation, a value of zero indicates no vegetation, and negative values usually point to water bodies. This index serves as a critical tool for assessing vegetation health and distribution. The following is an example of what NDVI looks like.
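As a quick numeric sanity check of the formula, here is NDVI computed on a few invented reflectance values (dense vegetation, bare ground, and water, respectively):

```python
import numpy as np

# NDVI = (NIR - Red) / (NIR + Red), defined on [-1, 1]
nir = np.array([0.45, 0.30, 0.05])  # near-infrared reflectance
red = np.array([0.05, 0.30, 0.25])  # red reflectance

ndvi = (nir - red) / (nir + red)
# high positive -> dense vegetation, ~0 -> no vegetation, negative -> water
print(ndvi)  # [ 0.8  0.  -0.66666667]
```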

%%writefile scripts/compute_vi.py

import os
import rioxarray
import json
import gc
import warnings

warnings.filterwarnings("ignore")

if __name__ == "__main__":
    print("Starting processing")

    input_path = "/opt/ml/processing/input"
    output_path = "/opt/ml/processing/output"
    input_files = []
    items = []
    for current_path, sub_dirs, files in os.walk(input_path):
        for file in files:
            if file.endswith(".json"):
                full_file_path = os.path.join(input_path, current_path, file)
                input_files.append(full_file_path)
                with open(full_file_path, "r") as f:
                    items.append(json.load(f))

    print("Received {} input files".format(len(input_files)))

    for item in items:
        print("Computing NDVI for {}".format(item["id"]))
        red_band_url = item["assets"]["red"]["href"]
        nir_band_url = item["assets"]["nir"]["href"]
        scl_mask_url = item["assets"]["scl"]["href"]
        red = rioxarray.open_rasterio(red_band_url, masked=True)
        nir = rioxarray.open_rasterio(nir_band_url, masked=True)
        scl = rioxarray.open_rasterio(scl_mask_url, masked=True)
        scl_interp = scl.interp(
            x=red["x"], y=red["y"]
        )  # interpolate SCL to the same resolution as the red and NIR bands

        # mask out cloudy pixels using SCL (https://sentinels.copernicus.eu/web/sentinel/technical-guides/sentinel-2-msi/level-2a/algorithm-overview)
        # class 8: cloud medium probability
        # class 9: cloud high probability
        # class 10: thin cirrus
        red_cloud_masked = red.where((scl_interp != 8) & (scl_interp != 9) & (scl_interp != 10))
        nir_cloud_masked = nir.where((scl_interp != 8) & (scl_interp != 9) & (scl_interp != 10))

        ndvi = (nir_cloud_masked - red_cloud_masked) / (nir_cloud_masked + red_cloud_masked)
        # save the NDVI as GeoTIFF
        s2_tile_id = red_band_url.split("/")[-2]
        file_name = f"{s2_tile_id}_ndvi.tif"
        output_file_path = f"{output_path}/{file_name}"
        ndvi.rio.to_raster(output_file_path)
        print("Written output: {}".format(output_file_path))

        # keep memory usage low
        del red
        del nir
        del scl
        del scl_interp
        del red_cloud_masked
        del nir_cloud_masked
        del ndvi

        gc.collect()

Now that we have the compute logic defined, we’re ready to start the geospatial SageMaker Processing job. This involves a straightforward three-step process: setting up the compute cluster, defining the computation specifics, and organizing the input and output details.
First, to set up the cluster, we decide on the number and type of instances required for the job, making sure they’re well-suited for geospatial data processing. The compute environment itself is prepared by selecting a geospatial image that comes with all commonly used packages for processing geospatial data.
Next, for the input, we use the previously created manifest that lists all image hyperlinks. We also designate an S3 location to save our results.
With these elements configured, we’re able to initiate multiple processing jobs at once, allowing them to operate concurrently for efficiency.

from multiprocessing import Process
import sagemaker
import boto3
from botocore.config import Config
from sagemaker import get_execution_role
from sagemaker.sklearn.processing import ScriptProcessor
from sagemaker.processing import ProcessingInput, ProcessingOutput

role = get_execution_role()
geospatial_image_uri = "081189585635.dkr.ecr.us-west-2.amazonaws.com/sagemaker-geospatial-v1-0:latest"
# use the retry behaviour of boto3 to avoid throttling issues
sm_boto = boto3.client("sagemaker", config=Config(connect_timeout=5, read_timeout=60, retries={"max_attempts": 20}))
sagemaker_session = sagemaker.Session(sagemaker_client=sm_boto)

def run_job(job_idx):
    s3_manifest = f"s3://{s3_bucket_name}/{s3_prefix}/{manifest_dir}/manifest{job_idx}.json"
    s3_output = f"s3://{s3_bucket_name}/{s3_prefix}/output"
    script_processor = ScriptProcessor(
        command=["python3"],
        image_uri=geospatial_image_uri,
        role=role,
        instance_count=num_instances_per_job,
        instance_type="ml.m5.xlarge",
        base_job_name=f"ca-s2-nvdi-{job_idx}",
        sagemaker_session=sagemaker_session,
    )

    script_processor.run(
        code="scripts/compute_vi.py",
        inputs=[
            ProcessingInput(
                source=s3_manifest,
                destination="/opt/ml/processing/input/",
                s3_data_type="ManifestFile",
                s3_data_distribution_type="ShardedByS3Key",
            ),
        ],
        outputs=[
            ProcessingOutput(
                source="/opt/ml/processing/output/",
                destination=s3_output,
                s3_upload_mode="Continuous",
            )
        ],
    )
    time.sleep(2)

processes = []
for idx in range(num_jobs):
    p = Process(target=run_job, args=(idx,))
    processes.append(p)
    p.start()

for p in processes:
    p.join()

After you launch the job, SageMaker automatically spins up the required instances and configures the cluster to process the images listed in your input manifest. This entire setup operates seamlessly, without needing your hands-on management. To monitor and manage the processing jobs, you can use the SageMaker console. It offers real-time updates on the status and completion of your processing tasks. In our example, it took under 20 minutes to process all 8,581 images with 500 instances. The scalability of SageMaker allows for faster processing times if needed, simply by increasing the number of instances.

Conclusion
The power and efficiency of SageMaker geospatial capabilities have opened new doors for environmental monitoring, particularly in the realm of vegetation mapping. Through this example, we showcased how to process over 8,500 satellite images in less than 20 minutes. We not only demonstrated the technical feasibility, but also the efficiency gains from using the cloud for environmental analysis. This approach illustrates a significant leap from traditional, resource-intensive methods to a more agile, scalable, and cost-effective approach. The flexibility to scale processing resources up or down as needed, combined with the ease of accessing and analyzing vast datasets, positions SageMaker as a transformative tool in the field of geospatial analysis. By simplifying the complexities associated with large-scale data processing, SageMaker enables scientists, researchers, and business stakeholders to focus more on deriving insights and less on infrastructure and data management.
As we look to the future, the integration of ML and geospatial analytics promises to further enhance our understanding of the planet’s ecological systems. The potential to monitor changes in real time, predict future trends, and respond with more informed decisions can significantly contribute to global conservation efforts. This example of vegetation mapping is just the beginning for running planetary-scale ML. See Amazon SageMaker geospatial capabilities to learn more.

About the Author
Xiong Zhou is a Senior Applied Scientist at AWS. He leads the science team for Amazon SageMaker geospatial capabilities. His current area of research includes LLM evaluation and data generation. In his spare time, he enjoys running, playing basketball and spending time with his family.
Anirudh Viswanathan is a Sr Product Manager, Technical – External Services with the SageMaker geospatial ML team. He holds a Master’s in Robotics from Carnegie Mellon University, an MBA from the Wharton School of Business, and is a named inventor on over 40 patents. He enjoys long-distance running, visiting art galleries and Broadway shows.
Janosch Woschitz is a Senior Solutions Architect at AWS, specializing in AI/ML. With over 15 years of experience, he supports customers globally in leveraging AI and ML for innovative solutions and building ML platforms on AWS. His expertise spans machine learning, data engineering, and scalable distributed systems, augmented by a strong background in software engineering and industry expertise in domains such as autonomous driving.
Li Erran Li is the applied science manager at human-in-the-loop services, AWS AI, Amazon. His research interests are 3D deep learning, and vision and language representation learning. Previously he was a senior scientist at Alexa AI, the head of machine learning at Scale AI and the chief scientist at Pony.ai. Before that, he was with the perception team at Uber ATG and the machine learning platform team at Uber working on machine learning for autonomous driving, machine learning systems and strategic initiatives of AI. He started his career at Bell Labs and was adjunct professor at Columbia University. He co-taught tutorials at ICML’17 and ICCV’19, and co-organized several workshops at NeurIPS, ICML, CVPR, ICCV on machine learning for autonomous driving, 3D vision and robotics, machine learning systems and adversarial machine learning. He holds a PhD in computer science from Cornell University. He is an ACM Fellow and IEEE Fellow.
Amit Modi is the product leader for SageMaker MLOps, ML Governance, and Responsible AI at AWS. With over a decade of B2B experience, he builds scalable products and teams that drive innovation and deliver value to customers globally.
Kris Efland is a visionary technology leader with a successful track record in driving product innovation and growth for over 20 years. Kris has helped create new products including consumer electronics and enterprise software across many industries, at both startups and large companies. In his current role at Amazon Web Services (AWS), Kris leads the Geospatial AI/ML category. He works at the forefront of Amazon’s fastest-growing ML service, Amazon SageMaker, which serves over 100,000 customers worldwide. He recently led the launch of Amazon SageMaker’s new geospatial capabilities, a powerful set of tools that allow data scientists and machine learning engineers to build, train, and deploy ML models using satellite imagery, maps, and location data. Before joining AWS, Kris was the Head of Autonomous Vehicle (AV) Tools and AV Maps for Lyft, where he led the company’s autonomous mapping efforts and toolchain used to build and operate Lyft’s fleet of autonomous vehicles. He also served as the Director of Engineering at HERE Technologies and Nokia and has co-founded several startups.

Google AI Introduces Gemma-APS: A Collection of Gemma Models for Text- …

The increasing reliance on machine learning models for processing human language comes with several hurdles, such as accurately understanding complex sentences, segmenting content into comprehensible parts, and capturing the contextual nuances present in multiple domains. In this landscape, the demand for models capable of breaking down intricate pieces of text into manageable, proposition-level components has never been more pronounced. This capability is particularly critical in improving language models used for summarization, information retrieval, and various other NLP tasks.

Google AI Releases Gemma-APS, a collection of Gemma models for text-to-propositions segmentation. The models are distilled from fine-tuned Gemini Pro models applied to multi-domain synthetic data, which includes textual data generated to simulate different scenarios and language complexities. This approach of using synthetic data is essential as it allows the models to train on diverse sentence structures and domains, making them adaptable across multiple applications. Gemma-APS models were meticulously designed to convert a continuous text into smaller proposition units, making it more actionable for subsequent NLP tasks, such as sentiment analysis, chatbot applications, or retrieval-augmented generation (RAG). With this release, Google AI is hoping to make text segmentation more accessible, with models optimized to run on varied computational resources.
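To make the task concrete, here is a toy, rule-based sketch of the input/output shape of text-to-propositions segmentation. This is purely illustrative and is not the Gemma-APS models, which perform learned rewriting (resolving pronouns, splitting conjunctions) rather than pattern splitting:

```python
import re

def toy_segment(text):
    """Naively split a sentence into clause-level 'propositions'.

    A real segmenter like Gemma-APS rewrites text into self-contained
    propositions; this toy version only splits on commas, semicolons,
    and coordinating 'and'.
    """
    parts = re.split(r",\s*and\s+|,\s*|;\s*|\s+and\s+", text.strip().rstrip("."))
    return [p.strip() for p in parts if p.strip()]

props = toy_segment("The model is small, runs on a laptop, and segments text.")
print(props)  # ['The model is small', 'runs on a laptop', 'segments text']
```

Downstream tasks such as RAG or fact verification would then consume each proposition as an independent unit.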

Technically, Gemma-APS is characterized by its use of distilled models from the Gemini Pro series, which were originally tailored to deliver high performance in multi-domain text analysis. The distillation process involves compressing these powerful models into smaller, more efficient versions without compromising their segmentation quality. These models are now available as Gemma-7B-APS-IT and Gemma-2B-APS-IT on Hugging Face, catering to different needs in terms of computational efficiency and accuracy. The use of multi-domain synthetic data ensures that these models have been exposed to a broad spectrum of language inputs, thereby enhancing their robustness and adaptability. As a result, Gemma-APS models can efficiently handle complex texts, segmenting them into meaningful propositions that encapsulate the underlying information, a feature highly beneficial in improving downstream tasks like summarization, comprehension, and classification.

The importance of Gemma-APS is reflected not only in its versatility but also in its high level of performance across diverse datasets. Google AI has leveraged synthetic data from multiple domains to finetune these models, ensuring that they excel in real-world applications such as technical document parsing, customer service interactions, and knowledge extraction from unstructured texts. Initial evaluations demonstrate that Gemma-APS consistently outperforms previous segmentation models in terms of accuracy and computational efficiency. For instance, it achieves notable improvements in capturing propositional boundaries within complex sentences, enabling subsequent language models to work more effectively. This advancement also reduces the risk of semantic drift during text analysis, which is crucial for applications where retaining the original meaning of each text fragment is critical.

In conclusion, Google AI’s release of Gemma-APS marks a significant milestone in the evolution of text segmentation technologies. By using an effective distillation technique combined with multi-domain synthetic training, these models offer a blend of performance and efficiency that addresses many of the existing limitations in NLP applications. They are poised to be game changers in how language models interpret and break down complex texts, allowing for more effective information retrieval and summarization across multiple domains.

Check out the Models here. All credit for this research goes to the researchers of this project.


The post Google AI Introduces Gemma-APS: A Collection of Gemma Models for Text-to-Propositions Segmentation appeared first on MarkTechPost.

A New Study by OpenAI Explores How Users’ Names can Impact ChatGPT …

Bias in AI-powered systems like chatbots remains a persistent challenge, particularly as these models become more integrated into our daily lives. A pressing issue concerns biases that can manifest when chatbots respond differently to users based on name-related demographic indicators, such as gender or race. Such biases can undermine trust, especially in name-sensitive contexts where chatbots are expected to treat all users equitably.

To address this issue, OpenAI researchers have introduced a privacy-preserving methodology for analyzing name-based biases in name-sensitive chatbots, such as ChatGPT. This approach aims to understand whether chatbot responses vary subtly when exposed to different user names, potentially reinforcing demographic stereotypes. The analysis focuses on ensuring the privacy of real user data while examining whether biases occur in responses linked to specific demographic groups represented through names. In the process, the researchers leverage a Language Model Research Assistant (LMRA) to identify patterns of bias without directly exposing sensitive user information. The research methodology involves comparing chatbot responses by substituting different names associated with different demographics and evaluating any systematic differences.

The privacy-preserving method is built around three main components: (1) a split-data privacy approach, (2) a counterfactual fairness analysis, and (3) the use of LMRA for bias detection and evaluation. The split-data approach involves using a combination of public and private chat datasets to train and evaluate models while ensuring no sensitive personal information is accessed directly by human evaluators. The counterfactual analysis involves substituting user names in conversations to assess if there are differential responses depending on the name’s gender or ethnicity. By using LMRA, the researchers were able to automatically analyze and cross-validate potential biases in chatbot responses, identifying subtle yet potentially harmful patterns across various contexts, such as storytelling or advice.
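A minimal sketch of the counterfactual substitution step described above, with a stub in place of a real chatbot. The names, the stub, and the comparison shown are illustrative assumptions, not OpenAI's implementation:

```python
def counterfactual_responses(chatbot, template, names):
    """Run the same prompt with each name substituted in and
    collect the responses for side-by-side comparison."""
    return {name: chatbot(template.format(name=name)) for name in names}

# Stub "chatbot" that just echoes the prompt; a real analysis would
# call a model, then have an LMRA-style judge compare the responses
# for systematic differences in tone or content.
def stub_chatbot(prompt):
    return f"Response to: {prompt}"

pairs = counterfactual_responses(
    stub_chatbot,
    "Hi, I'm {name}. Can you help me write a short story?",
    ["Ashley", "Michael"],
)
# Any difference between pairs beyond the substituted name itself
# would be flagged for bias evaluation.
```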

Results from the study revealed distinct differences in chatbot responses based on user names. For example, when users with female-associated names asked for creative story-writing assistance, the chatbot’s responses more often featured female protagonists and included warmer, more emotionally engaging language. In contrast, users with male-associated names received more neutral and factual content. These differences, though seemingly minor in isolation, highlight how implicit biases in language models can manifest subtly across a wide array of scenarios. The research found similar patterns across several domains, with female-associated names often receiving responses that were more supportive in tone, while male-associated names received responses with slightly more complex or technical language.

The conclusion of this work underscores the importance of ongoing bias evaluation and mitigation efforts for chatbots, especially in user-centric applications. The proposed privacy-preserving approach enables researchers to detect biases without compromising user privacy and provides valuable insights for improving chatbot fairness. The research highlights that while harmful stereotypes were generally found at low rates, even these minimal biases require attention to ensure equitable interactions for all users. This approach not only informs developers about specific bias patterns but also serves as a replicable framework for further bias investigations by external researchers.

Check out the Details and Paper. All credit for this research goes to the researchers of this project.

The post A New Study by OpenAI Explores How Users’ Names can Impact ChatGPT’s Responses appeared first on MarkTechPost.

Neural Magic Unveils Machete: A New Mixed-Input GEMM Kernel for NVIDIA …

The rapid growth of large language models (LLMs) and their increasing computational requirements have prompted a pressing need for optimized solutions to manage memory usage and inference speed. As models like GPT-3, Llama, and other large-scale architectures push the limits of GPU capacity, efficient hardware utilization becomes crucial. High memory requirements, slow token generation, and limitations in memory bandwidth have all contributed to significant performance bottlenecks. These problems are particularly noticeable when deploying LLMs on NVIDIA Hopper GPUs, as balancing memory usage and computational speed becomes more challenging.

Neural Magic introduces Machete: a new mixed-input GEMM kernel for NVIDIA Hopper GPUs, representing a major advancement in high-performance LLM inference. Machete utilizes w4a16 mixed-input quantization to drastically reduce memory usage while ensuring consistent computational performance. This innovative approach allows Machete to reduce memory requirements by roughly 4x in memory-bound environments. When compared to FP16 precision, Machete matches compute-bound performance while greatly improving efficiency for memory-constrained deployments. As LLMs continue to expand in scope, addressing memory bottlenecks with practical solutions like Machete becomes essential for enabling smoother, faster, and more efficient model inference.

One of Machete’s key innovations lies in its technical implementation. Built on CUTLASS 3.5.1, Machete leverages the wgmma tensor core instructions to overcome compute-bound limitations, resulting in faster model inference. It also incorporates weight pre-shuffling, which allows for faster shared-memory loads, effectively mitigating bottlenecks that typically arise in large-scale LLMs. This weight pre-shuffling mechanism optimizes shared memory by allowing 128-bit loads, increasing throughput and reducing latency. In addition, Machete has improved upconversion routines that facilitate efficient conversion of 4-bit elements to 16-bit, maximizing tensor core utilization. Together, these innovations make Machete an effective solution for improving LLM performance without the overhead typically associated with increased precision or additional computational costs.
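The upconversion routine can be illustrated in miniature: two 4-bit weights share a byte and must be unpacked (and scaled) back to wider values before the tensor cores consume them. A pure-Python sketch of the packing scheme, purely illustrative; Machete performs this on-GPU with vectorized instructions and fused dequantization:

```python
def pack_int4(values):
    """Pack pairs of 4-bit unsigned values (0..15) into single bytes."""
    assert all(0 <= v < 16 for v in values) and len(values) % 2 == 0
    return bytes((hi << 4) | lo for hi, lo in zip(values[0::2], values[1::2]))

def unpack_int4(packed):
    """Unpack bytes back into 4-bit values; the 'upconversion' step."""
    out = []
    for b in packed:
        out.extend((b >> 4, b & 0x0F))
    return out

weights = [3, 15, 0, 7, 8, 1]
packed = pack_int4(weights)          # 3 bytes instead of 6
assert unpack_int4(packed) == weights

# Dequantize with a per-group scale and zero point (hypothetical values)
scale = 0.02
dequantized = [(v - 8) * scale for v in unpack_int4(packed)]
```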

The importance of Machete cannot be overstated, particularly in the context of the growing demand for LLM deployments that are both memory and compute-efficient. By reducing memory usage by around fourfold, Machete helps ensure that even the largest LLMs, such as Llama 3.1 70B and Llama 3.1 405B, can be run efficiently on available hardware. In testing, Machete achieved notable results, including a 29% increase in input throughput and a 32% faster output token generation rate for Llama 3.1 70B, with an impressive time-to-first-token (TTFT) of under 250ms on a single H100 GPU. When scaled to a 4xH100 setup, Machete delivered a 42% throughput speedup on Llama 3.1 405B. These results demonstrate not only the significant performance boost provided by Machete but also its capacity to scale efficiently across different hardware configurations. The support for upcoming optimizations, such as w4a8 FP8, AWQ, QQQ, and improved performance for low-batch-size operations, further solidifies Machete’s role in pushing the boundaries of efficient LLM deployment.

In conclusion, Machete represents a meaningful step forward in optimizing LLM inference on NVIDIA Hopper GPUs. By addressing the critical bottlenecks of memory usage and bandwidth, Machete has introduced a new approach to managing the demands of large-scale language models. Its mixed-input quantization, technical optimizations, and scalability make it an invaluable tool for improving model inference efficiency while reducing computational costs. The impressive gains demonstrated on Llama models show that Machete is poised to become a key enabler of efficient LLM deployments, setting a new standard for performance in memory-constrained environments. As LLMs continue to grow in scale and complexity, tools like Machete will be essential in ensuring that these models can be deployed efficiently, providing faster and more reliable outputs without compromising on quality.

Check out the Details. All credit for this research goes to the researchers of this project.

The post Neural Magic Unveils Machete: A New Mixed-Input GEMM Kernel for NVIDIA Hopper GPUs appeared first on MarkTechPost.

Accelerate migration portfolio assessment using Amazon Bedrock

Conducting assessments on application portfolios that need to be migrated to the cloud can be a lengthy endeavor. Despite the existence of AWS Application Discovery Service or the presence of some form of configuration management database (CMDB), customers still face many challenges. These include time taken for follow-up discussions with application teams to review outputs and understand dependencies (approximately 2 hours per application), cycles needed to generate a cloud architecture design that meets security and compliance requirements, and the effort needed to provide cost estimates by selecting the right AWS services and configurations for optimal application performance in the cloud. Typically, it takes 6–8 weeks to carry out these tasks before actual application migrations begin.
In this blog post, we will harness the power of generative AI and Amazon Bedrock to help organizations simplify, accelerate, and scale migration assessments. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. By using Amazon Bedrock Agents, action groups, and Amazon Bedrock Knowledge Bases, we demonstrate how to build a migration assistant application that rapidly generates migration plans, R-dispositions, and cost estimates for applications migrating to AWS. This approach enables you to scale your application portfolio discovery and significantly accelerate your planning phase.
General requirements for a migration assistant
The following are some key requirements that you should consider when building a migration assistant.
Accuracy and consistency
Is your migration assistant application able to render accurate and consistent responses?
Guidance: To ensure accurate and consistent responses from your migration assistant, implement Amazon Bedrock Knowledge Bases. The knowledge base should contain contextual information based on your company’s private data sources. This enables the migration assistant to use Retrieval-Augmented Generation (RAG), which enhances the accuracy and consistency of responses. Your knowledge base should comprise multiple data sources, including:

Responses for an application discovery questionnaire (See example)
Output from the configuration management database (CMDB) or AWS Application Discovery Agent data (See example)
Best practices and white-papers on migration to AWS (for example, Migration Lens – AWS Well-Architected Framework and Container Migration Methodology)
Any organization-specific guidelines, migration patterns, or application patterns

Handle hallucinations
How are you reducing the hallucinations from the large language model (LLM) for your migration assistant application?
Guidance: Reducing hallucinations in LLMs involves implementation of several key strategies. Implement customized prompts based on your requirements and incorporate advanced prompting techniques to guide the model’s reasoning and provide examples for more accurate responses. These techniques include chain-of-thought prompting, zero-shot prompting, multishot prompting, few-shot prompting, and model-specific prompt engineering guidelines (see Anthropic Claude on Amazon Bedrock prompt engineering guidelines). RAG combines information retrieval with generative capabilities to enhance contextual relevance and reduce hallucinations. Finally, a feedback loop or human-in-the-loop when fine-tuning LLMs on specific datasets will help align the responses with accurate and relevant information, mitigating errors and outdated content.
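As an illustration of the few-shot technique mentioned above, a prompt can embed a handful of worked examples before the actual task so the model imitates the demonstrated format. A minimal sketch (the example texts and dispositions are our own, not from the post):

```python
def build_few_shot_prompt(system, examples, question):
    """Assemble a few-shot prompt: instructions, worked examples, then the task."""
    parts = [system]
    for ex_input, ex_output in examples:
        parts.append(f"Input: {ex_input}\nOutput: {ex_output}")
    parts.append(f"Input: {question}\nOutput:")
    return "\n\n".join(parts)

examples = [
    ("App uses Oracle 11g on premises with no license portability.",
     "Replatform to Amazon RDS"),
    ("Static marketing site with no backend dependencies.",
     "Rehost on Amazon S3 + CloudFront"),
]
prompt = build_few_shot_prompt(
    "Classify each application into an AWS migration R-disposition.",
    examples,
    "Java monolith with heavy mainframe integration.",
)
```

The assembled string is what you would send as the model input; grounding the demonstrations in your own CMDB data keeps the outputs consistent with company conventions.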
Modular design
Is the design of your migration assistant modular?
Guidance: Building a migration assistant application using Amazon Bedrock action groups, which have a modular design, offers three key benefits.

Customization and adaptability: Action groups allow users to customize migration workflows to suit specific AWS environments and requirements. For instance, if a user is migrating a web application to AWS, they can customize the migration workflow to include specific actions tailored to web server setup, database migration, and network configuration. This customization ensures that the migration process aligns with the unique needs of the application being migrated.
Maintenance and troubleshooting: Simplifies maintenance and troubleshooting tasks by isolating issues to individual components. For example, if there’s an issue with the database migration action within the migration workflow, it can be addressed independently without affecting other components. This isolation streamlines the troubleshooting process and minimizes the impact on the overall migration operation, ensuring a smoother migration and faster resolution of issues.
Scalability and reusability: Promote scalability and reusability across different AWS migration projects. For instance, if a user successfully migrates an application to AWS using a set of modular action groups, they can reuse those same action groups to migrate other applications with similar requirements. This reusability saves time and effort when developing new migration workflows and ensures consistency across multiple migration projects. Additionally, modular design facilitates scalability by allowing users to scale the migration operation up or down based on workload demands. For example, if they need to migrate a larger application with higher resource requirements, they can easily scale up the migration workflow by adding more instances of relevant action groups, without needing to redesign the entire workflow from scratch.
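The Lambda function behind each action group receives an event describing which action group, API path, and HTTP method the agent invoked, and must return a response body keyed by content type. A minimal handler skeleton (the event and response shape follows the Amazon Bedrock Agents Lambda contract at the time of writing and is worth verifying against the current documentation; the plan-generation logic is a stand-in):

```python
import json

def lambda_handler(event, context):
    """Route a Bedrock agent action-group invocation to the matching handler."""
    api_path = event.get("apiPath", "")
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}

    if api_path == "/create-migration-plan":
        body = {"plan": f"Migration plan for {params.get('applicationId', 'unknown')}"}
    else:
        body = {"error": f"Unhandled path: {api_path}"}

    # Response envelope expected by Amazon Bedrock Agents
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event.get("actionGroup"),
            "apiPath": api_path,
            "httpMethod": event.get("httpMethod"),
            "httpStatusCode": 200,
            "responseBody": {"application/json": {"body": json.dumps(body)}},
        },
    }
```

Because each path maps to a small, isolated branch, a failure in one action (say, cost estimation) can be debugged and redeployed without touching the others, which is the maintenance benefit described above.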

Overview of solution
Before we dive deep into the deployment, let’s walk through the key steps of the architecture that will be established, as shown in Figure 1.

Users interact with the migration assistant through the Amazon Bedrock chat console to input their requests. For example, a user might request to Generate R-disposition with cost estimates or Generate Migration plan for specific application IDs (for example, A1-CRM or A2-CMDB).
The migration assistant, which uses Amazon Bedrock agents, is configured with instructions, action groups, and knowledge bases. When processing the user’s request, the migration assistant invokes relevant action groups such as R Dispositions and Migration Plan, which in turn invoke specific AWS Lambda functions.
The Lambda functions process the request using RAG to produce the required output.
The resulting output documents (R-Dispositions with cost estimates and Migration Plan) are then uploaded to a designated Amazon Simple Storage Service (Amazon S3) bucket.

The following image is a screenshot of a sample user interaction with the migration assistant.

Prerequisites
You should have the following:

Understanding of Amazon Bedrock Agents, prompt engineering, Amazon Bedrock Knowledge Bases, Lambda functions, and AWS Identity and Access Management (IAM).
Familiarity with basic cloud migration concepts, including application discovery and migration strategies.
An AWS account with the appropriate IAM permissions to create Amazon Bedrock agents and knowledge bases, Lambda functions, and IAM roles.
Access to Amazon Bedrock models. For more information, refer to Model access.
Access to create and configure Amazon Simple Storage Service (S3) buckets, which will be used for storing generated migration plans and other outputs.
Create a service role for Amazon Bedrock Agents.

Deployment steps

Configure a knowledge base:

Open the AWS Management Console for Amazon Bedrock and navigate to Amazon Bedrock Knowledge Bases.
Choose Create knowledge base and enter a name and optional description.
Select the vector database (for example, Amazon OpenSearch Serverless).
Select the embedding model (for example, Amazon Titan Embedding G1 – Text).
Add data sources:

For Amazon S3: Specify the S3 bucket and prefix, file types, and chunking configuration.
For custom data: Use the API to ingest data programmatically.

Review and create the knowledge base.

Set up Amazon Bedrock Agents:

In the Amazon Bedrock console, go to the Agents section and choose Create agent.
Enter a name and optional description for the agent.
Select the foundation model (for example, Anthropic Claude V3).
Configure the agent’s AWS Identity and Access Management (IAM) role to grant necessary permissions.
Add instructions to guide the agent’s behavior.
Optionally, add the previously created Amazon Bedrock Knowledge Base to enhance the agent’s responses.
Configure additional settings such as maximum tokens and temperature.
Review and create the agent.

Configure actions groups for the agent:

On the agent’s configuration page, navigate to the Action groups section.
Choose Add action group for each required group (for example, Create R-disposition Assessment and Create Migration Plan).
For each action group:

Enter a name and description.
Define the API schema using OpenAPI 3.0 specification, detailing the endpoints and expected request and response formats.
Create and associate a Lambda function to handle the action’s logic. See the sample Lambda logic for Create R-disposition Assessment and Create Migration Plan action groups.
Configure the Lambda function with the appropriate permissions and environment variables.
Test the action group using the provided test console to ensure proper functionality.

After adding all action groups, review the entire agent configuration and deploy the agent.
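The OpenAPI 3.0 schema referenced in these steps can be quite small. A sketch for a hypothetical Create Migration Plan endpoint (the path, parameter, and field names here are illustrative, not taken from the sample code):

```python
migration_plan_schema = {
    "openapi": "3.0.0",
    "info": {"title": "Migration Plan API", "version": "1.0.0"},
    "paths": {
        "/create-migration-plan": {
            "post": {
                "description": "Generate a migration plan for an application",
                "parameters": [{
                    "name": "applicationId",
                    "in": "query",
                    "required": True,
                    "schema": {"type": "string"},
                }],
                "responses": {
                    "200": {
                        "description": "Generated migration plan",
                        "content": {"application/json": {
                            "schema": {
                                "type": "object",
                                "properties": {"plan": {"type": "string"}},
                            }
                        }},
                    }
                },
            }
        }
    },
}
```

The agent uses the `description` fields to decide when to call each endpoint, so writing them in plain, specific language matters as much as the structure itself.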

Clean up
To avoid unnecessary charges, delete the resources created during testing. Use the following steps to clean up the resources:

Delete the Amazon Bedrock knowledge base: Open the Amazon Bedrock console. Delete the knowledge base from any agents that it’s associated with.

From the left navigation pane, choose Agents.
Select the Name of the agent that you want to remove the knowledge base from.
Select the radio button next to the knowledge base that you want to remove. Choose More and then choose Delete.
From the left navigation pane, choose Knowledge bases.
To delete a source, either choose the radio button next to the source and select Delete, or select the Name of the source and then choose Delete in the top right corner of the details page.
Review the warnings for deleting a knowledge base. If you accept these conditions, enter delete in the input box and choose Delete to confirm.
If an agent still references a deleted knowledge base, a red banner appears in that agent warning you to remove the stale reference.

Delete the agent:

In the Amazon Bedrock console, choose Agents from the left navigation pane.
Select the radio button next to the agent to delete.
A modal appears warning you about the consequences of deletion. Enter delete in the input box and choose Delete to confirm.
A blue banner appears to inform you that the agent is being deleted. When deletion is complete, a green success banner appears.

Delete all the other resources including the Lambda functions and any AWS services used for account customization.

Conclusion
Conducting assessments on application portfolios for AWS cloud migration can be a time-consuming process, involving analyzing data from various sources, discovery and design discussions to develop an AWS Cloud architecture design, and cost estimates.
In this blog post, we demonstrated how you can simplify, accelerate, and scale migration assessments by using generative AI and Amazon Bedrock. We showcased using Amazon Bedrock Agents, action groups, and Amazon Bedrock Knowledge Bases for a migration assistant application that renders migration plans, R-dispositions, and cost estimates. This approach significantly reduces the time and effort required for portfolio assessments, helping organizations to scale and expedite their journey to the AWS Cloud.
Ready to improve your cloud migration process with generative AI in Amazon Bedrock? Begin by exploring the Amazon Bedrock User Guide to understand how it can streamline your organization’s cloud journey. For further assistance and expertise, consider using AWS Professional Services (contact sales) to help you streamline your cloud migration journey and maximize the benefits of Amazon Bedrock.

About the Authors
Ebbey Thomas is a Senior Cloud Architect at AWS, with a strong focus on leveraging generative AI to enhance cloud infrastructure automation and accelerate migrations. In his role at AWS Professional Services, Ebbey designs and implements solutions that improve cloud adoption speed and efficiency while ensuring secure and scalable operations for AWS users. He is known for solving complex cloud challenges and driving tangible results for clients. Ebbey holds a BS in Computer Engineering and an MS in Information Systems from Syracuse University.
Shiva Vaidyanathan is a Principal Cloud Architect at AWS. He provides technical guidance, design and lead implementation projects to customers ensuring their success on AWS. He works towards making cloud networking simpler for everyone. Prior to joining AWS, he has worked on several NSF funded research initiatives on performing secure computing in public cloud infrastructures. He holds a MS in Computer Science from Rutgers University and a MS in Electrical Engineering from New York University.

Improve public speaking skills using a generative AI-based virtual ass …

Public speaking is a critical skill in today’s world, whether it’s for professional presentations, academic settings, or personal growth. By practicing it regularly, individuals can build confidence, manage anxiety in a healthy way, and develop effective communication skills leading to successful public speaking engagements. Now, with the advent of large language models (LLMs), you can use generative AI-powered virtual assistants to provide real-time analysis of speech, identification of areas for improvement, and suggestions for enhancing speech delivery.
In this post, we present an Amazon Bedrock powered virtual assistant that can transcribe presentation audio and examine it for language use, grammatical errors, filler words, and repetition of words and sentences to provide recommendations as well as suggest a curated version of the speech to elevate the presentation. This solution helps refine communication skills and empower individuals to become more effective and impactful public speakers. Organizations across various sectors, including corporations, educational institutions, government entities, and social media personalities, can use this solution to provide automated coaching for their employees, students, and public speaking engagements.
In the following sections, we walk you through constructing a scalable, serverless, end-to-end Public Speaking Mentor AI Assistant with Amazon Bedrock, Amazon Transcribe, and AWS Step Functions using provided sample code. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.
Overview of solution
The solution consists of four main components:

An Amazon Cognito user pool for user authentication. Authenticated users are granted access to the Public Speaking Mentor AI Assistant web portal to upload audio and video recordings.
A simple web portal created using Streamlit to upload audio and video recordings. The uploaded files are stored in an Amazon Simple Storage Service (Amazon S3) bucket for later processing, retrieval, and analysis.
A Step Functions standard workflow to orchestrate converting the audio to text using Amazon Transcribe and then invoking Amazon Bedrock with AI prompt chaining to generate speech recommendations and rewrite suggestions.
Amazon Simple Notification Service (Amazon SNS) to send an email notification to the user with Amazon Bedrock generated recommendations.

This solution uses Amazon Transcribe for speech-to-text conversion. When an audio or video file is uploaded, Amazon Transcribe transcribes the speech into text. This text is passed as an input to Anthropic’s Claude 3.5 Sonnet on Amazon Bedrock. The solution sends two prompts to Amazon Bedrock: one to generate feedback and recommendations on language usage, grammar, filler words, repetition, and more, and another to obtain a curated version of the original speech. Prompt chaining is performed with Amazon Bedrock for these prompts. The solution then consolidates the outputs, displays recommendations on the user’s webpage, and emails the results.
The generative AI capabilities of Amazon Bedrock efficiently process user speech inputs. It uses natural language processing to analyze the speech and provides tailored recommendations. Using LLMs trained on extensive data, Amazon Bedrock generates curated speech outputs to enhance the presentation delivery.
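The prompt-chaining step described above simply feeds the first model response back into a second prompt. A sketch with a stubbed function standing in for the Amazon Bedrock model invocation (the prompt wording is illustrative):

```python
calls = []

def invoke_model(prompt):
    """Stand-in for the Amazon Bedrock model invocation; records each prompt."""
    calls.append(prompt)
    return f"[model output {len(calls)}]"

def analyze_speech(transcript):
    # Prompt 1: feedback on grammar, filler words, and repetition
    feedback = invoke_model(
        "Review this speech for grammar, filler words, and repetition:\n" + transcript
    )
    # Prompt 2 (chained): includes both the transcript and the first response
    rewrite = invoke_model(
        f"Original speech:\n{transcript}\n\nFeedback:\n{feedback}\n\n"
        "Rewrite the speech applying this feedback."
    )
    return feedback, rewrite

feedback, rewrite = analyze_speech("Um, so, like, today I want to talk about clouds.")
assert feedback in calls[1]  # the chained prompt embeds the first model response
```

Chaining keeps each prompt focused on one job, which tends to produce more reliable output than asking for the critique and the rewrite in a single request.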
The following diagram shows our solution architecture.

Let’s explore the architecture step by step:

The user authenticates to the Public Speaking Mentor AI Assistant web portal (a Streamlit application hosted on the user’s local desktop) using the Amazon Cognito user pool authentication mechanism.
The user uploads an audio or video file to the web portal, which is stored in an S3 bucket encrypted using server-side encryption with Amazon S3 managed keys (SSE-S3).
The S3 service triggers an s3:ObjectCreated event for each file that is saved to the bucket.
Amazon EventBridge invokes the Step Functions state machine based on this event. Because the state machine execution could exceed 5 minutes, we use a standard workflow. Step Functions state machine logs are sent to Amazon CloudWatch for logging and troubleshooting purposes.
The Step Functions workflow uses AWS SDK integrations to invoke Amazon Transcribe and initiates a StartTranscriptionJob, passing the S3 bucket, prefix path, and object name in the MediaFileUri parameter. The workflow waits for the transcription job to complete and saves the transcript in another S3 bucket prefix path.
The Step Functions workflow uses the optimized integrations to invoke the Amazon Bedrock InvokeModel API, which specifies the Anthropic Claude 3.5 Sonnet model, the system prompt, maximum tokens, and the transcribed speech text as inputs to the API. The system prompt instructs the Anthropic Claude 3.5 Sonnet model to provide suggestions on how to improve the speech by identifying incorrect grammar, repetitions of words or content, use of filler words, and other recommendations.
After receiving a response from Amazon Bedrock, the Step Functions workflow uses prompt chaining to craft another input for Amazon Bedrock, incorporating the previous transcribed speech and the model’s previous response, and requesting the model to provide suggestions for rewriting the speech.
The workflow combines these outputs from Amazon Bedrock and crafts a message that is displayed on the logged-in user’s webpage.
The Step Functions workflow invokes the Amazon SNS Publish optimized integration to send an email to the user with the Amazon Bedrock generated message.
The Streamlit application queries Step Functions to display output results on the Amazon Cognito user’s webpage.
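The orchestration in steps 5 through 9 maps naturally onto Amazon States Language. A heavily abridged sketch of the state machine (state names are illustrative; the real workflow also passes parameters, polls for transcription completion, and handles errors):

```python
state_machine = {
    "StartAt": "StartTranscriptionJob",
    "States": {
        "StartTranscriptionJob": {
            "Type": "Task",
            # AWS SDK integration for Amazon Transcribe
            "Resource": "arn:aws:states:::aws-sdk:transcribe:startTranscriptionJob",
            "Next": "GenerateFeedback",
        },
        "GenerateFeedback": {
            "Type": "Task",
            # Optimized integration for Amazon Bedrock
            "Resource": "arn:aws:states:::bedrock:invokeModel",
            "Next": "RewriteSpeech",
        },
        "RewriteSpeech": {  # prompt chaining: consumes the previous state's output
            "Type": "Task",
            "Resource": "arn:aws:states:::bedrock:invokeModel",
            "Next": "NotifyUser",
        },
        "NotifyUser": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sns:publish",
            "End": True,
        },
    },
}
```

Expressing the pipeline declaratively like this is what makes the standard workflow resilient to the multi-minute transcription step without any long-running compute of your own.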

Prerequisites
For implementing the Public Speaking Mentor AI Assistant solution, you should have the following prerequisites:

An AWS account with sufficient AWS Identity and Access Management (IAM) permissions for the following AWS services to deploy the solution and run the Streamlit application web portal:

Amazon Bedrock
AWS CloudFormation
Amazon CloudWatch
Amazon Cognito
Amazon EventBridge
Amazon Transcribe
Amazon SNS
Amazon S3
AWS Step Functions

Model access enabled for Anthropic’s Claude 3.5 Sonnet on Amazon Bedrock in your desired AWS Region.
A local desktop environment with the AWS Command Line Interface (AWS CLI), Python 3.8 or above, the AWS Cloud Development Kit (AWS CDK) for Python, and Git installed.
The AWS CLI set up with necessary AWS credentials and desired Region.

Deploy the Public Speaking Mentor AI Assistant solution
Complete the following steps to deploy the Public Speaking Mentor AI Assistant AWS infrastructure:

Clone the repository to your local desktop environment with the following command:

git clone https://github.com/aws-samples/improve_public_speaking_skills_using_a_genai_based_virtual_assistant_with_amazon_bedrock.git

Change to the app directory in the cloned repository:

cd improve_public_speaking_skills_using_a_genai_based_virtual_assistant_with_amazon_bedrock/app

Create a Python virtual environment:

python3 -m venv .venv

Activate your virtual environment:

source .venv/bin/activate

Install the required dependencies:

pip install -r requirements.txt

Optionally, synthesize the CloudFormation template using the AWS CDK:

cdk synth

You may need to perform a one-time AWS CDK bootstrapping using the following command. See AWS CDK bootstrapping for more details.

cdk bootstrap aws://<ACCOUNT-NUMBER-1>/<REGION-1>

Deploy the CloudFormation template in your AWS account and selected Region:

cdk deploy

After the AWS CDK is deployed successfully, you can follow the steps in the next section to create an Amazon Cognito user.
Create an Amazon Cognito user for authentication
Complete the following steps to create a user in the Amazon Cognito user pool to access the web portal. The user created doesn’t need AWS permissions.

Sign in to the AWS Management Console of your account and select the Region for your deployment.
On the Amazon Cognito console, choose User pools in the navigation pane.
Choose the user pool created by the CloudFormation template. (The user pool name should have the prefix PSMBUserPool followed by a string of random characters as one word.)
Choose Create user.

Enter a user name and password, then choose Create user.

Subscribe to an SNS topic for email notifications
Complete the following steps to subscribe to an SNS topic to receive speech recommendation email notifications:

Sign in to the console of your account and select the Region for your deployment.
On the Amazon SNS console, choose Topics in the navigation pane.
Choose the topic created by the CloudFormation template. (The name of the topic should look like InfraStack-PublicSpeakingMentorAIAssistantTopic followed by a string of random characters as one word.)
Choose Create subscription.

For Protocol, choose Email.
For Endpoint, enter your email address.
Choose Create subscription.

Run the Streamlit application to access the web portal
Complete the following steps to run the Streamlit application to access the Public Speaking Mentor AI Assistant web portal:

Change the directory to webapp inside the app directory:

cd webapp

Launch the Streamlit server on port 8080:

streamlit run webapp.py --server.port 8080

Make note of the Streamlit application URL for further use. Depending on your environment setup, you can choose one of the three URLs (Local, Network, or External) provided by the Streamlit server’s running process.

Make sure incoming traffic on port 8080 is allowed on your local machine to access the Streamlit application URL.

Use the Public Speaking Mentor AI Assistant
Complete the following steps to use the Public Speaking Mentor AI Assistant to improve your speech:

Open the Streamlit application URL in your browser (Google Chrome, preferably) that you noted in the previous steps.
Log in to the web portal using the Amazon Cognito user name and password created earlier for authentication.

Choose Browse files to locate and choose your recording.
Choose Upload File to upload your file to an S3 bucket.

As soon as the file upload finishes, the Public Speaking Mentor AI Assistant processes the audio transcription and prompt engineering steps to generate speech recommendations and rewrite results.

When the processing is complete, you can see the Speech Recommendations and Speech Rewrite sections on the webpage as well as in your email through Amazon SNS notifications.
On the right pane of the webpage, you can review the processing steps performed by the Public Speaking Mentor AI Assistant solution to get your speech results.

Clean up
Complete the following steps to clean up your resources:

Shut down your Streamlit application server process running in your environment using Ctrl+C.
Change to the app directory in your repository.
Destroy the resources created with AWS CloudFormation using the AWS CDK:

cdk destroy

Optimize for functionality, accuracy, and cost
Let’s conduct an analysis of this proposed solution architecture to identify opportunities for functionality enhancements, accuracy improvements, and cost optimization.
Starting with prompt engineering, our approach involves analyzing users’ speech based on several criteria, such as language usage, grammatical errors, filler words, and repetition of words and sentences. Individuals and organizations have the flexibility to customize the prompt by including additional analysis parameters or adjusting existing ones to align with their requirements and company policies. Furthermore, you can set the inference parameters to control the response from the LLM deployed on Amazon Bedrock.
To create a lean architecture, we have primarily chosen serverless technologies, such as Amazon Bedrock for prompt engineering and natural language generation, Amazon Transcribe for speech-to-text conversion, Amazon S3 for storage, Step Functions for orchestration, EventBridge for scalable event handling to process audio files, and Amazon SNS for email notifications. Serverless technologies enable you to run the solution without provisioning or managing servers, allowing for automatic scaling and pay-per-use billing, which can lead to cost savings and increased agility.
For the web portal component, we are currently deploying the Streamlit application in a local desktop environment. Alternatively, you have the option to use Amazon S3 Website Hosting, which would further contribute to a serverless architecture.
To enhance the accuracy of audio-to-text translation, it’s recommended to record your presentation audio in a quiet environment, away from noise and distractions.
In cases where your media contains domain-specific or non-standard terms, such as brand names, acronyms, and technical words, Amazon Transcribe might not accurately capture these terms in your transcription output. To address transcription inaccuracies and customize your output for your specific use case, you can create custom vocabularies and custom language models.
At the time of writing, our solution analyzes only the audio component. Uploading audio files alone can optimize storage costs. You may consider converting your video files into audio using third-party tools prior to uploading them to the Public Speaking Mentor AI Assistant web portal.
Our solution currently uses the standard tier of Amazon S3. However, you have the option to choose the S3 One Zone-IA storage class for storing files that don’t require high availability. Additionally, configuring an Amazon S3 lifecycle policy can further help reduce costs.
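As one concrete example of the lifecycle suggestion, a rule that moves uploaded recordings to S3 One Zone-IA after 30 days and expires them after a year might look like this (the prefix and day counts are illustrative choices, not part of the solution):

```python
lifecycle_configuration = {
    "Rules": [{
        "ID": "ArchiveRecordings",
        "Status": "Enabled",
        "Filter": {"Prefix": "uploads/"},  # hypothetical prefix for recordings
        "Transitions": [{"Days": 30, "StorageClass": "ONEZONE_IA"}],
        "Expiration": {"Days": 365},
    }]
}
# Applied with boto3:
# s3.put_bucket_lifecycle_configuration(
#     Bucket=bucket_name, LifecycleConfiguration=lifecycle_configuration)
```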
You can configure Amazon SNS to send speech recommendations to other destinations, such as email, webhook, and Slack. Refer to Configure Amazon SNS to send messages for alerts to other destinations for more information.
To estimate the cost of implementing the solution, you can use the AWS Pricing Calculator. For larger workloads, additional volume discounts may be available. We recommend contacting AWS pricing specialists or your account manager for more detailed pricing information.
Security best practices
Security and compliance is a shared responsibility between AWS and the customer, as outlined in the Shared Responsibility Model. We encourage you to review this model for a comprehensive understanding of the respective responsibilities. Refer to Security in Amazon Bedrock and Build generative AI applications on Amazon Bedrock to learn more about building secure, compliant, and responsible generative AI applications on Amazon Bedrock. OWASP Top 10 For LLMs outlines the most common vulnerabilities. We encourage you to enable Amazon Bedrock Guardrails to implement safeguards for your generative AI applications based on your use cases and responsible AI policies.
With AWS, you manage the privacy controls of your data, control how your data is used, who has access to it, and how it is encrypted. Refer to Data Protection in Amazon Bedrock and Data Protection in Amazon Transcribe for more information. Similarly, we strongly recommend referring to the data protection guidelines for each AWS service used in our solution architecture. Furthermore, we advise applying the principle of least privilege when granting permissions, because this practice enhances the overall security of your implementation.
Conclusion
By harnessing the capabilities of LLMs in Amazon Bedrock, our Public Speaking Mentor AI Assistant offers a revolutionary approach to enhancing public speaking abilities. With its personalized feedback and constructive recommendations, individuals can develop effective communication skills in a supportive and non-judgmental environment.
Unlock your potential as a captivating public speaker. Embrace the power of our Public Speaking Mentor AI Assistant and embark on a transformative journey towards mastering the art of public speaking. Try out our solution today by cloning the GitHub repository and experience the difference our cutting-edge technology can make in your personal and professional growth.

About the Authors
Nehal Sangoi is a Sr. Technical Account Manager at Amazon Web Services. She provides strategic technical guidance to help independent software vendors plan and build solutions using AWS best practices. Connect with Nehal on LinkedIn.
Akshay Singhal is a Sr. Technical Account Manager at Amazon Web Services supporting Enterprise Support customers focusing on the Security ISV segment. He provides technical guidance for customers to implement AWS solutions, with expertise spanning serverless architectures and cost optimization. Outside of work, Akshay enjoys traveling, Formula 1, making short movies, and exploring new cuisines. Connect with him on LinkedIn.

Bria 2.3, Bria 2.2 HD, and Bria 2.3 Fast are now available in Amazon SageMaker JumpStart

This post is co-written with Bar Fingerman from Bria.
We are thrilled to announce that Bria 2.3, 2.2 HD, and 2.3 Fast text-to-image foundation models (FMs) from Bria AI are now available in Amazon SageMaker JumpStart. Bria models are trained exclusively on commercial-grade licensed data, providing high standards of safety and compliance with full legal indemnity.
These advanced models from Bria AI generate high-quality and contextually relevant visual content that is ready to use in marketing, design, and image generation use cases across industries from ecommerce, media and entertainment, and gaming to consumer-packaged goods and retail.
In this post, we discuss Bria’s family of models, explain the Amazon SageMaker platform, and walk through how to discover, deploy, and run inference on a Bria 2.3 model using SageMaker JumpStart.
Overview of Bria 2.3, Bria 2.2 HD, and Bria 2.3 Fast
Bria AI offers a family of high-quality visual content models. These advanced models represent the cutting edge of generative AI technology for image creation:

Bria 2.3 – The core model delivers high-quality visual content with exceptional photorealism and detail, capable of generating stunning images with complex concepts in various art styles, including photorealism.
Bria 2.2 HD – Optimized for high-definition, Bria 2.2 HD offers high-definition visual content that meets the demanding needs of high-resolution applications, making sure every detail is crisp and clear.
Bria 2.3 Fast – Optimized for speed, Bria 2.3 Fast generates high-quality visuals at a faster rate, perfect for applications requiring quick turnaround times without compromising on quality. Using the model on SageMaker g5 instance types gives fast latency and throughput (compared to Bria 2.3 and Bria 2.2 HD), and the p4d instance type reduces latency further, to roughly half that of g5.

Overview of SageMaker JumpStart
With SageMaker JumpStart, you can choose from a broad selection of publicly available FMs. ML practitioners can deploy FMs to dedicated SageMaker instances from a network-isolated environment and customize models using SageMaker for model training and deployment. You can now discover and deploy Bria models in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK. Doing so enables you to derive model performance and machine learning operations (MLOps) controls with SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, or container logs.
The model is deployed in an AWS secure environment and under your virtual private cloud (VPC) controls, helping provide data security. Bria models are available today for deployment and inferencing in SageMaker Studio in 22 AWS Regions where SageMaker JumpStart is available. Bria models will require g5 and p4 instances.
Prerequisites
To try out the Bria models using SageMaker JumpStart, you need the following prerequisites:

An AWS account that will contain all your AWS resources.
An AWS Identity and Access Management (IAM) role to access SageMaker. To learn more about how IAM works with SageMaker, refer to Identity and Access Management for Amazon SageMaker.
Access to AWS Marketplace.
Access to SageMaker Studio, a SageMaker notebook instance, or an integrated development environment (IDE) such as PyCharm or Visual Studio Code. We recommend using SageMaker Studio for straightforward deployment and inference.

Discover Bria models in SageMaker JumpStart
You can access the FMs through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we show how to discover the models in SageMaker Studio.
SageMaker Studio is an IDE that provides a single web-based visual interface where you can access purpose-built tools to perform all ML development steps, from preparing data to building, training, and deploying your ML models. For more details on how to get started and set up SageMaker Studio, refer to Amazon SageMaker Studio.
In SageMaker Studio, you can access SageMaker JumpStart by choosing JumpStart in the navigation pane or by choosing JumpStart on the Home page.

On the SageMaker JumpStart landing page, you can find pre-trained models from popular model hubs. You can search for Bria, and the search results will list all the Bria model variants available. For this post, we use the Bria 2.3 Commercial Text-to-image model.

You can choose the model card to view details about the model such as license, data used to train, and how to use the model. You also have two options, Deploy and Preview notebooks, to deploy the model and create an endpoint.

Subscribe to Bria models in AWS Marketplace
When you choose Deploy, if the model wasn’t already subscribed, you first have to subscribe before you can deploy the model. We demonstrate the subscription process for the Bria 2.3 Commercial Text-to-image model. You can repeat the same steps for subscribing to other Bria models.

After you choose Subscribe, you’re redirected to the model overview page, where you can read the model details, pricing, usage, and other information. Choose Continue to Subscribe and accept the offer on the following page to complete the subscription.

Configure and deploy Bria models using AWS Marketplace
The configuration page offers three launch methods to choose from. For this post, we showcase the SageMaker console option:

For Available launch method, select SageMaker console.
For Region, choose your preferred Region.
Choose View in Amazon SageMaker.
For Model name, enter a name (for example, Model-Bria-v2-3).
For IAM role, choose an existing IAM role or create a new role that has the SageMaker full access IAM policy attached.
Choose Next. The recommended instance types for this model endpoint are ml.g5.2xlarge, ml.g5.12xlarge, ml.g5.48xlarge, ml.p4d.24xlarge, and ml.p4de.24xlarge. Make sure you have the account-level service limit for one or more of these instance types to deploy this model. For more information, refer to Requesting a quota increase.
In the Variants section, select any of the recommended instance types provided by Bria (for example, ml.g5.2xlarge).
Choose Create endpoint configuration. A success message should appear after the endpoint configuration is successfully created.
Choose Next to create an endpoint.
In the Create endpoint section, enter the endpoint name (for example, Endpoint-Bria-v2-3-Model) and choose Submit. After you successfully create the endpoint, it’s displayed on the SageMaker endpoints page on the SageMaker console.

Configure and deploy Bria models using SageMaker JumpStart
If the Bria models are already subscribed in AWS Marketplace, you can choose Deploy in the model card page to configure the endpoint.

On the endpoint configuration page, SageMaker pre-populates the endpoint name, recommended instance type, instance count, and other details for you. You can modify them based on your requirements and then choose Deploy to create an endpoint.

After you successfully create the endpoint, the status will show as In service.

Run inference in SageMaker Studio
You can test the endpoint by passing a sample inference request payload in SageMaker Studio, or you can use SageMaker notebook. In this section, we demonstrate using SageMaker Studio:

In SageMaker Studio, in the navigation pane, choose Endpoints under Deployments.
Choose the Bria endpoint you just created.
On the Test inference tab, test the endpoint by sending a sample request. You can see the response on the same page, as shown in the following screenshot.

Text-to-image generation using a SageMaker notebook
You can also use a SageMaker notebook to run inference against the deployed endpoint using the SageMaker Python SDK.
The following code initiates the endpoint you created using SageMaker JumpStart:

from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

# Use the existing endpoint name
endpoint_name = "XXXXXXXX"  # Replace with your endpoint name

# Create a SageMaker predictor object
bria_predictor = Predictor(
    endpoint_name=endpoint_name,
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

print(bria_predictor.endpoint_name)  # Confirm which endpoint the predictor targets

The model responses are in base64 encoded format. The following function helps decode the base64 encoded image and displays it as an image:

import base64
from PIL import Image
import io

def display_base64_image(base64_string):
    image_bytes = base64.b64decode(base64_string)
    image_stream = io.BytesIO(image_bytes)
    image = Image.open(image_stream)

    # Display the image
    image.show()
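Note that image.show() opens a desktop viewer, which won’t work in a headless environment. As an alternative sketch (the function name and default file path are our own, not part of the Bria SDK), you can write the decoded bytes straight to disk:

```python
import base64

def save_base64_image(base64_string, path="output.png"):
    # Decode the base64 payload and write the raw image bytes to disk
    image_bytes = base64.b64decode(base64_string)
    with open(path, "wb") as f:
        f.write(image_bytes)
    return path
```

You can then download or serve the saved file however your workflow requires.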

The following is a sample payload with a text prompt to generate an image using the Bria model:

payload = {
    "prompt": "a baby riding a bicycle in a field of flowers",
    "num_results": 1,
    "sync": True
}

response = bria_predictor.predict(payload)
artifacts = response['artifacts'][0]

encoded_image = artifacts['image_base64']

display_base64_image(encoded_image)

Example prompts
You can interact with the Bria 2.3 text-to-image model like any standard image generation model, where the model processes an input prompt and outputs a response. In this section, we provide some example prompts and sample output.
We use the following prompts:

Photography, dynamic, in the city, professional male skateboarder, sunglasses, teal and orange hue
Young woman with flowing curly hair stands on a subway platform, illuminated by the vibrant lights of a speeding train, purple and cyan colors
Close up of vibrant blue and green parrot perched on a wooden branch inside a cozy, well-lit room
Light speed motion with blue and purple neon colors and building in the background

The model generates the following images.

The following code generates an image using the first prompt from the preceding list:

payload = {
    "prompt": "Photography, dynamic, in the city, professional male skateboarder, sunglasses, teal and orange hue",
    "num_results": 1,
    "sync": True
}

response = bria_predictor.predict(payload)
artifacts = response['artifacts'][0]

encoded_image = artifacts['image_base64']

display_base64_image(encoded_image)

Clean up
After you’re done running the notebook, delete all resources that you created in the process so your billing is stopped. Use the following code:

bria_predictor.delete_model()
bria_predictor.delete_endpoint()

Conclusion
With the availability of Bria 2.3, 2.2 HD, and 2.3 Fast in SageMaker JumpStart and AWS Marketplace, enterprises can now use advanced generative AI capabilities to enhance their visual content creation processes. These models provide a balance of quality, speed, and compliance, making them an invaluable asset for any organization looking to stay ahead in the competitive landscape.
Bria’s commitment to responsible AI and the robust security framework of SageMaker provide enterprises with the full package for data privacy, regulatory compliance, and responsible AI models for commercial use. In addition, the integrated experience takes advantage of the capabilities of both platforms to simplify MLOps, data storage, and real-time processing.
For more information about using FMs in SageMaker JumpStart, refer to Train, deploy, and evaluate pretrained models with SageMaker JumpStart, JumpStart Foundation Models, and Getting started with Amazon SageMaker JumpStart.
Explore Bria models in SageMaker JumpStart today and revolutionize your visual content creation process!

About the Authors
Bar Fingerman is the Head of AI/ML Engineering at Bria. He leads the development and optimization of core infrastructure, enabling the company to scale cutting-edge generative AI technologies. With a focus on designing high-performance supercomputers for large-scale AI training, Bar leads the engineering group in deploying, managing, and securing scalable AI/ML cloud solutions. He works closely with leadership and cross-functional teams to align business goals while driving innovation and cost-efficiency.
Supriya Puragundla is a Senior Solutions Architect at AWS. She has over 15 years of IT experience in software development, design, and architecture. She helps key customer accounts on their data, generative AI, and AI/ML journeys. She is passionate about data-driven AI and the area of depth in ML and generative AI.
Rodrigo Merino is a Generative AI Solutions Architect Manager at AWS. With over a decade of experience deploying emerging technologies, ranging from generative AI to IoT, Rodrigo guides customers across various industries to accelerate their AI/ML and generative AI journeys. He specializes in helping organizations train and build models on AWS, as well as operationalize end-to-end ML solutions. Rodrigo’s expertise lies in bridging the gap between cutting-edge technology and practical business applications, enabling companies to harness the full potential of AI and drive innovation in their respective fields.
Eliad Maimon is a Senior Startup Solutions Architect at AWS, focusing on generative AI startups. He helps startups accelerate and scale their AI/ML journeys by guiding them through deep-learning model training and deployment on AWS. With a passion for AI and entrepreneurship, Eliad is committed to driving innovation and growth in the startup ecosystem.

Visitor Tracking: Everything You Need to Know in 2025

Visitor tracking can mean many things. After all, as technology has evolved, our ability (or inability) to track our website visitors has also evolved.

If you want to go way back, visitor tracking used to just answer the question, “How many people are on your website?” Remember those little counters that people had at the bottom of their sites? Ahh, simpler times.

Now, visitor tracking means so much more. While it can still pertain to the number of website visitors, it now typically relates to the customer journey – who is on your site, what did they do while they were there, what were they interested in, etc. 

That’s what we are going to get into today. As we head toward 2025, we want to look at what visitor tracking is and what marketers need to know to be successful. 

Here’s what we’ll cover:

What is Visitor Tracking

The State of Visitor Tracking in 2025

Types of Visitor Tracking

How Does Visitor Tracking Work 

Benefits of Visitor Tracking

Visitor Tracking Tools & Software

Visitor Tracking with Customers.ai

Visitor Tracking FAQs

Let’s dive in.

See Who Is On Your Site Right Now!

Get names, emails, phone numbers & more.

Try it Free, No Credit Card Required

Start Your Free Trial

What is Visitor Tracking?

At its core, visitor tracking can be defined as the process of identifying and monitoring the actions of users on a website, capturing details such as page views, session duration, and interactions.

If we really want to dig deeper, visitor tracking is a technology that helps businesses see how people are interacting with their website—things like what pages they visit, how long they stay, and what they click on. 

This information can then be used to make improvements to the site, create a better experience for visitors, and ultimately boost sales. 

Today’s visitor tracking tools can go even further, offering insights like identifying anonymous visitors, tracking the customer journey across channels, enabling personalized marketing, and so much more.

The Current State of Visitor Tracking

We’d be remiss if we didn’t talk about the current state of visitor tracking. Why? Because privacy concerns and regulations have completely reshaped how visitor tracking works. 

Back in the day, tools like Google Analytics and Facebook gave us all the data we could want about our visitors. We could identify IPs, company data, demographic data, and so much more.

Not anymore. 

The introduction of iOS 14 dealt a huge blow to Facebook’s ability to retarget users by restricting access to key data, and browsers like Safari, Firefox, and Google Chrome have phased out (or started to phase out) third-party cookies, making it even harder to not just get visitor information but also to track users across sites. As a result, businesses have been forced to rethink their approach to tracking while complying with stricter regulations like GDPR and CCPA.

The next phase of visitor tracking looks to be first-party data. Collecting data directly from your own website and customers is now the most reliable method for tracking user behavior.

Tools like server-side tracking and identity resolution platforms are already being used to help businesses adapt and get the information they need to create the personalized experiences customers want. 

Types of Visitor Tracking

We mentioned earlier that visitor tracking isn’t simple – along with having many meanings, there are also several ‘types’ of visitor tracking. These include visitor identification, website analytics, heatmap tracking, session recording, funnel analysis, behavioral targeting, and A/B testing.

Let’s look at what each type does along with examples of tools.

Visitor Identification / Identity Resolution

Visitor identification tools tell you who’s browsing your site, even when they don’t fill out a form. They gather info like IP addresses and behavioral data to create customer profiles, allowing you to see who’s visiting and reach out to potential leads. As we like to say here at Customers.ai, visitor identification tools are especially useful for turning anonymous traffic into actionable insights.

Visitor Identification Tools: Customers.ai

Website Analytics

Website analytics tools give you a clear picture of how visitors are interacting with your site. You’ll get data like page views, bounce rates, and how long people stick around, helping you understand what’s working and what’s not. 

Website Analytics Tools: Google Analytics, Adobe Analytics

Heatmap Tracking

Heatmaps show you where people are clicking, scrolling, and spending time on your website. It’s a visual way to see what content grabs their attention and which areas they ignore. 

Heatmap Tracking Tools: Hotjar, Crazy Egg, Mouseflow

Session Recording / Replay

Session recording tools let you watch recordings of real user sessions on your site. You can see exactly how visitors navigate, where they get stuck, and what they’re engaging with. 

Session Recording Tools: FullStory, Inspectlet, Lucky Orange

Funnel Analysis

Funnel analysis tools track how visitors move through the stages of your sales or marketing funnel, showing you where they drop off. By understanding the path to conversion, you can figure out where potential customers are losing interest and optimize those points to keep them moving forward.

Funnel Analysis Tools: Mixpanel, Kissmetrics, ClickFunnels
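To make the drop-off idea concrete, here’s a minimal sketch of the calculation a funnel report performs; the stage names and visitor counts below are invented for illustration:

```python
def funnel_dropoff(stage_counts):
    """Given ordered (stage, visitors) pairs, compute the share of
    visitors lost at each transition between consecutive stages."""
    report = []
    for (name_a, n_a), (name_b, n_b) in zip(stage_counts, stage_counts[1:]):
        drop = 1 - n_b / n_a  # fraction lost moving from stage A to stage B
        report.append((f"{name_a} -> {name_b}", round(drop, 3)))
    return report

# Hypothetical ecommerce funnel
stages = [("product page", 1000), ("add to cart", 300),
          ("checkout", 150), ("purchase", 120)]
```

Here funnel_dropoff(stages) would flag the product-page-to-cart transition as the biggest leak (70% of visitors lost), which is exactly the kind of insight funnel tools surface visually.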

Behavioral Targeting

Behavioral targeting tools analyze your visitors’ actions to serve up personalized content and offers. By tapping into their browsing habits and past behavior, you can create a more relevant experience.

Behavioral Targeting Tools: Dynamic Yield, Optimizely, Segment

A/B Testing

A/B testing tools let you test different versions of a page or feature to see which one performs better. You can experiment with headlines, layouts, or calls-to-action, and the tool will show you what works best for your audience, so you can optimize your site over time.

A/B Testing Tools: VWO (Visual Website Optimizer), Unbounce, Convert
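Under the hood, most A/B testing tools declare a winner using a statistical comparison of conversion rates. Here’s a minimal sketch of one common approach, a two-proportion z-score (the sample numbers in the usage note are made up):

```python
import math

def ab_zscore(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-score comparing variant A vs. variant B
    conversion rates; |z| > 1.96 is roughly significant at 95%."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled conversion rate
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se
```

For example, 100 conversions from 1,000 visitors on variant A versus 150 from 1,000 on variant B yields a z-score around 3.4, comfortably past the usual significance threshold.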

How Does Visitor Tracking Work? 

Visitor tracking generally works by installing a small piece of code (a script) on your website. This script gathers data on visitor behavior, like clicks, page views, and time spent on the site. It can also track user interactions across different sessions and devices to build a more complete picture of their journey.

Some tools use browser cookies to track user activity over time, while others rely on methods like reverse DNS lookup to identify companies visiting your site based on IP addresses. 
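The reverse DNS approach mentioned above can be sketched in a few lines of standard-library Python; commercial tools layer IP-to-company databases and identity graphs on top of this basic lookup:

```python
import socket

def hostname_from_ip(ip_address):
    """Reverse DNS lookup: map a visitor's IP to a hostname, which can
    hint at the company network a visit came from. Returns None when
    no PTR record exists or the address is invalid."""
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip_address)
        return hostname
    except (socket.herror, socket.gaierror, OSError):
        return None
```

In practice, many corporate IPs resolve to hostnames containing the company’s domain, while consumer ISP addresses usually return generic provider hostnames or nothing useful at all.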

At Customers.ai, our visitor tracking tools use a combination of advanced identity resolution and behavioral tracking technologies that are not reliant on third-party cookies, sign-ins, or form fills. Our proprietary technology has resulted in industry-leading match rates and the ability to identify anonymous visitors even when other tools can’t. 

Benefits of Visitor Tracking

We are obviously biased toward the many benefits of visitor tracking, but if you need some convincing, let’s look at 10 ways visitor tracking can help marketers and businesses take their results to the next level, along with applicable use cases.

1. Identifying Anonymous Visitors

Visitor tracking tools can identify visitors who haven’t filled out a form or signed in, providing valuable data about who’s coming to your site and allowing you to reach them via email, ads, or even direct mail retargeting. 

Visitor Tracking Use Case: An ecommerce marketer for a luxury clothing brand sees that an anonymous visitor has browsed 10 different product pages, including several high-ticket items, indicating strong purchase intent. Using a visitor identification tool like Customers.ai, the marketer uncovers the visitor’s name, email, and other details. With this information, they can set up a personalized retargeting campaign on Facebook or Google, showing ads for the exact products the visitor engaged with. This allows the brand to follow up with tailored ads, increasing the chances of turning this high-intent visitor into a paying customer.

2. Understanding User Behavior

Tracking tools show how visitors interact with your website—what pages they view, how long they stay, and where they click. This helps marketers understand which content and pages are engaging visitors and which ones may need improvement.

Visitor Tracking Use Case: A skincare brand’s content team notices that blog posts about acne treatment are driving high engagement, with readers often visiting related product pages. After analyzing behavior, the marketer finds that visitors who engage with two or more acne-related posts are 30% more likely to add products to their cart. Armed with this insight, the marketer creates more content around skincare concerns and links relevant products within the posts, leading to higher engagement and increased sales.

3. Optimizing Website Performance

With insights into where visitors spend the most time or where they drop off, tracking tools help identify performance issues like slow page load times or poor navigation.

Visitor Tracking Use Case: The dev team at a shoe retailer notices that many visitors are dropping off on the product pages, especially during peak traffic times. By analyzing website performance data, they discover that the page load time on these product pages is slow, especially on mobile devices. To fix this, they optimize the images, reduce unnecessary scripts, and improve the mobile layout. As a result, the page load time decreases, and bounce rates drop, leading to more visitors reaching the checkout page and an overall increase in conversions.

4. Improving User Experience

Visitor tracking reveals friction points in the user journey, such as confusing navigation or hard-to-find information. This allows marketers to streamline the user experience, making the site easier to navigate and more user-friendly.

Visitor Tracking Use Case: The conversion rate optimization specialist for a home decor store notices that visitors frequently abandon their shopping carts at the shipping information step. After analyzing session recordings, they realize that the shipping costs aren’t clearly displayed until the final checkout stage, frustrating users. To improve the experience, they update the site to show estimated shipping costs earlier in the process, right on the product page. This transparency reduces cart abandonment, leading to smoother checkouts and higher conversion rates.

5. Personalizing Marketing Efforts

By understanding what content or products a visitor is interested in, visitor tracking enables personalized messaging and content recommendations. This makes marketing efforts more relevant and engaging.

Visitor Tracking Use Case: An email marketer for a fitness apparel brand tracks visitors’ behavior and notices that a segment of users frequently view women’s running shoes but never make a purchase. By using visitor tracking data, they identify these high-interest users and create personalized email campaigns that showcase new arrivals and exclusive discounts on women’s running shoes. This personalized approach leads to higher engagement and a significant boost in sales for that specific product category.

6. Tracking Conversions and Sales

Conversion tracking allows businesses to measure specific actions taken on a website, like form submissions, sign-ups, or purchases. This helps marketers understand what’s driving sales and where potential customers might be dropping off.

Visitor Tracking Use Case: A demand generation analyst for a beauty brand sets up conversion tracking to measure how visitors move from product pages to completing a purchase. By analyzing the data, they notice that visitors who engage with product reviews are 25% more likely to complete a purchase. Based on this insight, they decide to highlight customer reviews more prominently across product pages and emails. This optimization boosts the conversion rate, as more visitors now interact with the reviews, leading to increased sales.
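Claims like “25% more likely to complete a purchase” come from a simple relative-lift calculation over the tracked segments. A quick sketch, with made-up example rates:

```python
def relative_lift(segment_rate, baseline_rate):
    """Relative lift of a segment's conversion rate over the baseline,
    e.g. 0.25 means the segment converts 25% more often."""
    return (segment_rate - baseline_rate) / baseline_rate

# Hypothetical: review-engagers convert at 12.5%, everyone else at 10%
```

With those numbers, relative_lift(0.125, 0.10) comes out to 0.25, which is how the “25% more likely” figure in the use case above would be derived.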

7. Enhancing Retargeting Efforts

Visitor tracking helps capture data about visitors who didn’t convert on their first visit, allowing businesses to retarget them with relevant ads or offers. This ensures that you can re-engage potential customers and bring them back to complete the purchase.

Visitor Tracking Use Case: A social media manager for a high-end electronics store tracks visitors who browse premium headphones but leave without purchasing. Using visitor tracking data, they identify those who spent considerable time on the product page and add them to a retargeting list. They then create targeted Facebook and Google ads featuring the exact headphones those visitors viewed, along with an incentive like free shipping or a limited-time discount. This tailored retargeting campaign brings the visitors back, significantly increasing conversions for the premium headphones.

8. Segmenting High-Intent Visitors

Tracking tools can segment visitors based on their interactions, such as those who viewed key pages or added items to their cart. This helps marketers identify which visitors are most likely to convert and focus efforts on them.

Visitor Tracking Use Case: An ecommerce marketer for a jewelry brand notices that visitors who spend time customizing engagement rings on the site’s product builder are highly engaged but often leave before completing a purchase. Using visitor tracking, the marketer segments these high-intent visitors based on their interactions with the custom ring tool. They then create a specific email campaign offering personalized assistance, like a virtual consultation, and send targeted ads showing the exact rings those users customized. This targeted segmentation helps convert high-intent visitors into paying customers by addressing their needs at the right moment.
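Segmenting like this is straightforward once you have per-visitor event data. Here’s a minimal sketch; the event names and the threshold are our own invention, not any particular tool’s schema:

```python
def high_intent_visitors(events, key_actions=frozenset({"viewed_product", "added_to_cart"}), min_hits=2):
    """Return visitor IDs whose tracked events include at least
    min_hits occurrences of the designated high-intent actions."""
    segment = []
    for visitor_id, actions in events.items():
        hits = sum(1 for action in actions if action in key_actions)
        if hits >= min_hits:
            segment.append(visitor_id)
    return segment
```

The resulting visitor IDs can then be synced to an email platform or an ad audience, which is essentially what the retargeting workflows described above automate.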

9. Detecting Drop-off Points in the User Journey

Tracking user behavior allows marketers to see where visitors are dropping off, such as during the checkout process or after viewing a few pages. This data helps businesses address problem areas and improve the overall conversion rate.

Visitor Tracking Use Case: A fashion retailer notices that many visitors add items to their carts but abandon the checkout process before completing the purchase. By analyzing visitor tracking data, the marketer discovers that most drop-offs occur on the shipping options page. Upon further investigation, they find that the shipping options are confusing and lack clarity. To fix this, the marketer simplifies the shipping choices and provides clear explanations of delivery times and costs. This reduces friction in the checkout process, resulting in fewer cart abandonments and an increase in completed purchases.

10. Increasing Lead Generation and Sales Opportunities

Visitor tracking can uncover which sources of traffic bring in the most engaged users, helping marketers focus on high-performing channels for lead generation. With better data on visitor behavior, businesses can tailor their offers and messaging to attract more qualified leads.

Visitor Tracking Use Case: A marketer for a home fitness equipment store uses visitor tracking to monitor which blog posts and product guides drive the most traffic. They notice that visitors engaging with content about home workout routines are more likely to visit product pages for workout gear and equipment. The marketer creates a lead capture form offering a free downloadable workout plan in exchange for an email address. By targeting visitors interested in fitness content, they successfully capture more leads, which are then nurtured with personalized email campaigns promoting relevant products. This strategy boosts both lead generation and sales opportunities.


Visitor Tracking Tools & Software

The list of visitor tracking tools and software could be endless, but for the purposes of this guide, we want to give you 10 of our favorites.

1. Customers.ai

Customers.ai offers advanced visitor tracking through identity resolution, helping businesses identify anonymous visitors, gather their information, and retarget them through ads. It’s a powerful tool for converting high-intent visitors into leads by providing detailed insights like name, email, and company information.

2. Google Analytics

URL: https://analytics.google.com 

Google Analytics provides comprehensive tracking of website traffic and user behavior. It helps businesses understand their audience with data on page views, bounce rates, and conversion paths, offering valuable insights to optimize site performance.

3. Hotjar

URL: https://www.hotjar.com 

Hotjar combines heatmaps, session recordings, and user feedback tools to track how visitors interact with your website. Its visual insights help businesses improve user experience by identifying popular areas of the site and potential pain points.

4. Crazy Egg

URL: https://www.crazyegg.com 

Crazy Egg offers tools like heatmaps and scroll maps to visualize where users are clicking and scrolling on your website. It’s ideal for understanding user engagement and optimizing page layout for better conversions.

5. Matomo

URL: https://matomo.org 

Matomo is an open-source analytics platform that tracks user behavior while offering full data ownership. It provides insights similar to Google Analytics but with an emphasis on privacy and data control, making it a good choice for businesses with strict compliance needs.

6. Mouseflow

URL: https://www.mouseflow.com 

Mouseflow tracks visitor behavior with session recordings, heatmaps, and funnel analysis. It helps businesses understand how users navigate their site and where they drop off, offering insights to optimize user journeys and boost conversions.

7. FullStory

URL: https://www.fullstory.com  

FullStory captures detailed user sessions to show how visitors interact with your website. Its session replay feature helps businesses identify usability issues, making it easier to optimize the site and improve user experience.

8. Kissmetrics

URL: https://www.kissmetrics.io  

Kissmetrics focuses on tracking user behavior across the entire customer journey. It provides funnel reports, cohort analysis, and conversion tracking, making it useful for understanding how visitors move from initial engagement to purchase.

9. Segment

URL: https://segment.com   

Segment allows businesses to collect user data from multiple sources and organize it into a single platform. It integrates with various marketing tools, helping businesses track customer interactions across channels and deliver more personalized experiences.

10. VWO (Visual Website Optimizer)

URL: https://vwo.com    

VWO combines A/B testing with visitor behavior tracking tools like heatmaps and session recordings. It’s ideal for optimizing website performance by testing different versions of pages and identifying what drives the most conversions.

Visitor Tracking with Customers.ai

With Customers.ai, visitor tracking gives you more than just basic data—it helps you see who’s actually browsing your site, even if they haven’t filled out a form. 

By using website visitor identification technology, you can turn anonymous visitors into real leads, gathering details like their name, email, and company info. This way, you can follow up with personalized retargeting ads through Facebook or Google.

Just look at what Mailer Profit Agency did for their client – by utilizing Customers.ai’s powerful lead capture tools, they were able to triple their client’s email list by capturing warm leads that would’ve otherwise been missed. This not only grew the list but helped drive a staggering $1.7 million in revenue and deliver an impressive 15x ROI for their client!

Sounds pretty awesome, right? 

It is. And if you're not tracking your visitors, you're likely missing out on valuable leads and data that could grow your business.

Don’t let customers slip through the cracks—in just 90 seconds you can start tracking your visitors today with Customers.ai. Start your free trial!


Important Next Steps

See what targeted outbound marketing is all about. Capture and engage your first 500 website visitor leads with Customers.ai X-Ray website visitor identification for free.

Talk and learn about sales outreach automation with other growth enthusiasts. Join Customers.ai Island, our Facebook group of 40K marketers and entrepreneurs who are ready to support you.

Advance your marketing performance with Sales Outreach School, a free tutorial and training area for sales pros and marketers.

Visitor Tracking FAQs

1. Why is visitor tracking important for websites?

Visitor tracking is important because it helps businesses understand how users engage with their site, identify issues, optimize performance, and create personalized marketing strategies to drive more conversions.

2. What types of visitor tracking are there?

There are several types of visitor tracking, including identity resolution, website analytics, session recording, heatmaps, and conversion tracking, each providing different insights into user behavior.

3. How does visitor tracking help with lead generation?

Visitor tracking helps with lead generation by identifying high-intent visitors, even anonymous ones, and providing actionable data to retarget them with personalized offers, ultimately turning them into leads.

4. How can I track visitors to my website?

You can track visitors by using visitor tracking tools that collect data through scripts, cookies, and IP addresses, offering insights into user behavior and engagement.
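To make the "scripts, cookies, and IP addresses" part concrete, here's a minimal sketch of the kind of raw data these tools work from: parsing a (simplified, hypothetical) combined-format server access log line to pull out a visitor's IP, the page they requested, and the referring site. Real tracking tools collect this via an embedded script rather than log parsing, so treat this purely as an illustration of the underlying data.

```python
import re

# Regex for a simplified Apache/Nginx "combined" log format line.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_log_line(line: str):
    """Extract visitor details from one access-log line, or None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

line = ('203.0.113.7 - - [14/Oct/2025:10:01:22 +0000] '
        '"GET /pricing HTTP/1.1" 200 5120 '
        '"https://www.google.com/" "Mozilla/5.0"')
visit = parse_log_line(line)
print(visit["ip"], visit["path"], visit["referrer"])
# 203.0.113.7 /pricing https://www.google.com/
```

Even this bare-bones view already answers useful questions: where the visitor came from (referrer) and what they looked at (path).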

5. What tools are used for visitor tracking?

Popular visitor tracking tools include Google Analytics, Hotjar, Customers.ai, Crazy Egg, and Matomo, which monitor user behavior and provide actionable insights for optimization.

6. What data does visitor tracking collect?

Visitor tracking typically collects data such as page views, session duration, clicks, IP addresses, referring URLs, and sometimes even personal information like name and email (with consent).
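As a rough sketch of how those raw data points become the metrics you see in a dashboard, here's a tiny Python example that rolls a list of timestamped page views (a hypothetical, simplified stand-in for what a tracking script sends to a server) into per-session numbers like page views and session duration:

```python
from datetime import datetime

def session_metrics(events):
    """Summarize one visitor session from a list of (iso_timestamp, page_path) tuples."""
    times = [datetime.fromisoformat(ts) for ts, _ in events]
    return {
        "page_views": len(events),
        "unique_pages": len({path for _, path in events}),
        "duration_seconds": (max(times) - min(times)).total_seconds(),
    }

session = [
    ("2025-01-06T09:00:00", "/"),
    ("2025-01-06T09:01:30", "/pricing"),
    ("2025-01-06T09:04:00", "/pricing"),
]
print(session_metrics(session))
# {'page_views': 3, 'unique_pages': 2, 'duration_seconds': 240.0}
```

Commercial tools compute far richer metrics (scroll depth, click maps, funnels), but they're all aggregations of event streams like this one.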

7. Is visitor tracking legal?

Visitor tracking is legal as long as it complies with data privacy laws like GDPR and CCPA, which require user consent before collecting personal data.

8. Can visitor tracking identify individual users?

Yes, advanced visitor tracking tools can use identity resolution to identify individual users, providing details like names, email addresses, and company information, but only if the proper consent is given.

9. What is anonymous visitor tracking?

Anonymous visitor tracking refers to tracking website visitors who haven’t filled out a form or provided personal details, often using methods like IP tracking or reverse DNS to gather information.
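To illustrate the reverse-DNS piece of this, here's a minimal Python sketch: a PTR lookup on a visitor's IP, plus a deliberately naive helper that trims the resulting hostname down to something resembling a company domain. Production identity-resolution tools combine many signals and use proper public-suffix lists; the two-label cut below is just for illustration, and the example hostnames are hypothetical.

```python
import socket

def reverse_dns(ip: str):
    """Look up the hostname behind an IP address (a PTR record lookup).

    For visitors on office networks, this hostname often hints at the
    visitor's company domain. Returns None when no PTR record exists.
    """
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
        return hostname
    except OSError:
        return None

def org_domain(hostname: str) -> str:
    """Naive helper: keep only the last two labels of a hostname."""
    parts = hostname.rstrip(".").split(".")
    return ".".join(parts[-2:])

# reverse_dns("8.8.8.8") requires network access, so we demo the helper only:
print(org_domain("mail-out-3.examplecorp.com"))
# examplecorp.com
```

This is why anonymous identification works best for business traffic: consumer ISP addresses usually resolve to the ISP's own domain, not the visitor's.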

10. How does visitor tracking improve conversions?

By understanding user behavior and identifying high-intent visitors, visitor tracking allows businesses to tailor their website experience and marketing efforts, leading to better engagement and higher conversion rates.

11. Can visitor tracking help with retargeting?

Yes, visitor tracking helps identify users who visited specific pages but didn’t convert, enabling businesses to retarget them with personalized ads through platforms like Google and Facebook.

12. How does visitor tracking help with website optimization?

Visitor tracking provides insights into where visitors drop off, which pages are popular, and how users interact with the site, helping businesses make data-driven decisions to optimize their website.

13. What is the difference between visitor tracking and website analytics?

Visitor tracking focuses on user actions and interactions on a website, while website analytics provides a broader overview of site performance, including traffic sources, demographics, and conversions.

14. How can I track anonymous visitors on my website?

You can track anonymous visitors using tools like Customers.ai that use methods like reverse DNS, IP tracking, and behavioral data to identify and retarget users who haven’t provided personal information.

15. How long can I track a website visitor?

The duration you can track a visitor depends on the type of tracking used; cookie-based tracking can last until the cookie expires (typically 7-30 days), while server-side tracking can be more persistent.

16. Can visitor tracking work on mobile websites?

Yes, visitor tracking works on both desktop and mobile websites, collecting data on user behavior regardless of the device used to visit the site.

17. What is the best visitor tracking software for ecommerce?

The best visitor tracking software for ecommerce depends on your needs, but tools like Customers.ai, Hotjar, and Google Analytics offer robust tracking capabilities that can improve engagement and sales.

18. How can visitor tracking improve user experience?

Visitor tracking helps identify pain points in the user journey, such as confusing navigation or slow-loading pages, allowing businesses to optimize the experience and keep visitors engaged.

19. What are first-party cookies in visitor tracking?

First-party cookies are small text files placed by the website a user is visiting, and they track user behavior and preferences for a better, personalized experience while staying compliant with privacy regulations.
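For a concrete picture of what a first-party tracking cookie looks like on the wire, here's a short sketch using Python's standard-library `http.cookies` module to build the `Set-Cookie` header a server would send. The visitor ID value is hypothetical; real tools generate a random identifier.

```python
from http.cookies import SimpleCookie

# Build a first-party visitor-ID cookie with a 30-day lifetime.
cookie = SimpleCookie()
cookie["visitor_id"] = "v-12345"                   # hypothetical anonymous ID
cookie["visitor_id"]["max-age"] = 30 * 24 * 3600   # 30 days, in seconds
cookie["visitor_id"]["path"] = "/"
cookie["visitor_id"]["samesite"] = "Lax"           # sent on same-site requests only

# The value a web server would put in its Set-Cookie response header:
header = cookie["visitor_id"].OutputString()
print(header)
```

The `Max-Age` attribute is what determines the 7-to-30-day tracking windows mentioned above, and `SameSite=Lax` is one of the settings that keeps the cookie first-party.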

20. How does GDPR affect visitor tracking?

GDPR requires businesses to obtain user consent before tracking personal data, meaning businesses must be transparent about their tracking practices and provide options for users to opt out.
The post Visitor Tracking: Everything You Need to Know in 2025 appeared first on Customers.ai.

Zyphra Releases Zamba2-7B: A State-of-the-Art Small Language Model

Zyphra has officially released Zamba2-7B, a state-of-the-art small language model that promises unprecedented performance in the 7B parameter range. This model outperforms existing competitors, including Mistral-7B, Google’s Gemma-7B, and Meta’s Llama3-8B, in both quality and speed. Zamba2-7B is specifically designed for environments that require powerful language capabilities but have hardware limitations, such as on-device processing or consumer GPUs. By focusing on efficiency without sacrificing quality, Zyphra is trying to democratize access to advanced AI for a broader audience, from enterprises to individual developers.

The architecture of Zamba2-7B incorporates significant technical innovations that enhance both efficiency and expressivity. Unlike its predecessor, Zamba1, Zamba2-7B uses two shared attention blocks interleaved throughout the network, providing a more sophisticated approach to information flow and cross-sequence dependencies. Mamba2 blocks form the backbone of the architecture, allowing better parameter utilization than traditional transformer models. The use of LoRA (Low-Rank Adaptation) projections on the shared MLP blocks is another advancement that helps the model adapt more precisely, increasing the versatility of each layer while keeping the model size compact. As a result, Zamba2-7B achieves a 25% reduction in time to first token and a 20% improvement in tokens processed per second compared to its competitors.

Zamba2-7B is particularly important due to its impressive efficiency and adaptability, which have been validated through rigorous testing. The model was trained on a massive pre-training dataset of three trillion tokens, which includes high-quality and extensively filtered open datasets. Additionally, Zyphra has incorporated an “annealing” pre-training phase, which rapidly decays the learning rate over a curated set of high-quality tokens. This strategy has resulted in superior benchmark performance, as the model comfortably surpasses its competitors in both inference speed and quality. The results indicate that Zamba2-7B is exceptionally suited for tasks involving natural language understanding and generation without the significant computational overhead typically associated with high-quality models.

In conclusion, Zamba2-7B represents a significant step forward in the development of small language models that do not compromise on quality or performance. By blending innovative architectural improvements with efficient training techniques, Zyphra has succeeded in creating a model that is not only accessible but also highly capable of meeting a variety of NLP needs. With the release of Zamba2-7B under an open-source license, Zyphra invites researchers, developers, and enterprises to explore its capabilities, pushing the frontier of what smaller models can achieve. The open availability of Zamba2-7B could well make advanced NLP accessible to a wider community, thereby advancing the field in exciting new ways.

Check out the details; a Hugging Face integration is also available. All credit for this research goes to the researchers of this project.

The post Zyphra Releases Zamba2-7B: A State-of-the-Art Small Language Model appeared first on MarkTechPost.