OpenAI Releases SimpleQA: A New AI Benchmark that Measures the Factuality of Language Models

The rise of large language models has been accompanied by significant challenges, particularly around ensuring the factuality of generated responses. One persistent issue is that these models can produce outputs that are factually incorrect or even misleading, a phenomenon often called “hallucination.” These hallucinations occur when models generate confident-sounding but incorrect or unverifiable information. Given the growing reliance on AI for information, factual accuracy has become critical. However, evaluating this accuracy is not easy, especially for long-form completions filled with multiple factual claims.

OpenAI recently open-sourced SimpleQA: a new benchmark that measures the factuality of responses generated by language models. SimpleQA is unique in its focus on short, fact-seeking questions with a single, indisputable answer, making it easier to evaluate the factual correctness of model responses. Unlike other benchmarks that often become outdated or saturated over time, SimpleQA was designed to remain challenging for the latest AI models. The questions in SimpleQA were created in an adversarial manner against responses from GPT-4, ensuring that even the most advanced language models struggle to answer them correctly. The benchmark contains 4,326 questions spanning various domains, including history, science, technology, art, and entertainment, and is built to be highly evaluative of both model precision and calibration.

SimpleQA’s design follows specific principles to ensure it serves as a robust factuality benchmark. First, questions are created with high correctness in mind: each question has a reference answer determined by two independent AI trainers to ensure consistency. The dataset was curated to focus only on questions that can be answered with a single, clear response, which prevents ambiguity and makes grading simpler. Moreover, grading is carried out by a prompted ChatGPT classifier, which assesses responses as either “correct,” “incorrect,” or “not attempted.” This straightforward structure allows researchers to assess how models perform under factual constraints.
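As an illustration of the prompted-classifier idea, the following is a minimal sketch of a grader that labels a model response against the reference answer. The prompt wording, grader model, and label strings here are assumptions for illustration, not the exact grader used by SimpleQA:

# Minimal sketch of a prompted-classifier grader in the spirit of SimpleQA.
# The grading prompt, model name, and label set are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

GRADER_PROMPT = """You are grading a question-answering system.
Question: {question}
Reference answer: {reference}
Model answer: {prediction}
Reply with exactly one word: CORRECT, INCORRECT, or NOT_ATTEMPTED."""

def grade(question: str, reference: str, prediction: str) -> str:
    """Ask a chat model to label a prediction against the reference answer."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed grader model; any capable chat model works
        messages=[{"role": "user", "content": GRADER_PROMPT.format(
            question=question, reference=reference, prediction=prediction)}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()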

The diversity of questions is another key benefit of SimpleQA. It features a broad set of topics to prevent model specialization and ensure a holistic evaluation. Moreover, the dataset’s usability is enhanced by its simplicity—both questions and answers are short, which makes the benchmark fast to run and reduces variance during evaluation runs. Importantly, SimpleQA also incorporates questions that have been verified to be relevant over time, thus eliminating the influence of shifting information and making it an “evergreen” benchmark.

The importance of SimpleQA lies in its targeted evaluation of language models’ factual abilities. In a landscape where many benchmarks have been “solved” by recent models, SimpleQA is designed to remain challenging even for frontier models like GPT-4 and Claude. For instance, GPT-4o answered only about 38.4% of SimpleQA questions correctly, highlighting the benchmark’s ability to probe areas where even advanced models face difficulties. Other models, including Claude-3.5, performed similarly or worse, indicating that SimpleQA poses a consistent challenge across model types. This benchmark, therefore, provides valuable insights into the calibration and reliability of language models—particularly their ability to discern when they have enough information to answer confidently and correctly.

Moreover, SimpleQA’s grading metrics provide nuanced insights into model behavior. The benchmark calculates not only the percentage of questions answered correctly but also measures “correct given attempted,” a metric akin to precision. These two metrics are combined to derive an F-score, which offers a single-number measure of factuality. Notably, the results of SimpleQA suggest that language models tend to overstate their confidence, with a large number of incorrect attempts. The analysis reveals that while larger models demonstrate better calibration (meaning they are better at recognizing when they know the correct answer), the overall accuracy leaves room for improvement.
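As a rough illustration, the following sketch aggregates per-question grades into the metrics described above. The label strings match the grader sketch earlier, and the harmonic-mean F-score formulation is an assumption based on this description:

# Sketch of SimpleQA-style aggregate metrics computed from per-question grades.
from collections import Counter

def simpleqa_metrics(grades: list[str]) -> dict:
    counts = Counter(grades)
    total = len(grades)
    attempted = counts["CORRECT"] + counts["INCORRECT"]
    overall_correct = counts["CORRECT"] / total if total else 0.0                   # fraction of all questions answered correctly
    correct_given_attempted = counts["CORRECT"] / attempted if attempted else 0.0   # precision-like metric
    denom = overall_correct + correct_given_attempted
    f_score = 2 * overall_correct * correct_given_attempted / denom if denom else 0.0
    return {
        "overall_correct": overall_correct,
        "correct_given_attempted": correct_given_attempted,
        "f_score": f_score,
    }

print(simpleqa_metrics(["CORRECT", "INCORRECT", "NOT_ATTEMPTED", "CORRECT"]))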

SimpleQA is an important step toward improving the reliability of AI-generated information. By focusing on short, fact-based questions, it provides a practical, easy-to-use benchmark that helps evaluate a critical aspect of language models: their ability to generate factual content consistently. Given the benchmark’s adversarial design, SimpleQA sets a high bar for accuracy, encouraging researchers and developers to create models that not only generate language but do so truthfully. The open sourcing of SimpleQA provides the AI community with a valuable tool for assessing and improving the factual accuracy of language models, helping to ensure that future AI systems can be both informative and trustworthy.

Check out the Paper, Details, and GitHub Page. All credit for this research goes to the researchers of this project.

Taipan: A Novel Hybrid Architecture that Combines Mamba-2 with Selective Attention Layers (SALs)

Transformer-based architectures have revolutionized natural language processing, delivering exceptional performance across diverse language modeling tasks. However, they still face major challenges when handling long-context sequences. The self-attention mechanism in Transformers suffers from quadratic computational complexity, and their memory requirement grows linearly with context length during inference. These factors impose practical constraints on sequence length due to the high computational and memory costs. Recent advancements in recurrent-based architectures, especially State Space Models (SSMs), have shown promise as efficient alternatives for language modeling.

Existing approaches such as State Space Models (SSMs) have shown they can address these challenges of Transformer-based architectures. SSM development has progressed through several key iterations, including S4, DSS, S4D, and S5, each improving computational and memory efficiency. Recent variants such as Mamba use input-dependent state transitions to overcome the static dynamics of earlier SSMs. Despite these advances, SSM-based models still fall short in scenarios that require in-context retrieval or handling of complex long-range dependencies. Other long-context models discussed in the paper include the Recurrent Memory Transformer, LongNet, and Hyena/HyenaDNA.

Researchers from the University of Oregon, Auburn University, and Adobe Research have proposed Taipan, a hybrid architecture that combines the efficiency of Mamba with enhanced long-range dependency handling through Selective Attention Layers (SALs). While Mamba is highly efficient, it relies on the Markov assumption, which can lead to information loss for tokens that require interaction with distant context. To mitigate this, Taipan utilizes SALs that strategically select the key tokens in the input sequence that need long-range dependencies. Taipan thus balances Mamba’s efficiency with Transformer-like performance in memory-intensive tasks, extending accurate predictions to context lengths of up to 1 million tokens while preserving computational efficiency by constraining the attention budget.

Taipan leverages SALs within the Mamba framework to boost Mamba’s modeling capabilities while preserving its computational efficiency. SALs are inserted after every K Mamba-2 block, creating a hybrid structure that combines Mamba-2’s efficiency with Transformer-style attention. The core of SALs is a gating network that identifies important tokens for enhanced representation modeling. These tokens undergo feature refinement and attention-based representation augmentation, allowing Taipan to capture complex, non-Markovian dependencies. The hybrid structure balances Mamba-2’s efficiency with the expressive power of SALs, allowing Taipan to perform well in tasks that need both speed and accurate information retrieval.
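To make the selection idea concrete, the following is a minimal PyTorch sketch of a gating network that scores tokens and routes only the top-scoring ones through an extra attention pass. The dimensions, top-k selection rule, and module names are illustrative assumptions, not the paper’s implementation:

# Minimal sketch of a selective-attention idea: a gating network scores tokens and
# only the top-k tokens receive an additional attention-based refinement.
import torch
import torch.nn as nn

class SelectiveAttentionLayer(nn.Module):
    def __init__(self, d_model: int, n_heads: int = 8, keep_ratio: float = 0.25):
        super().__init__()
        self.gate = nn.Linear(d_model, 1)          # scores token importance
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.keep_ratio = keep_ratio

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) hidden states from a Mamba-style block
        scores = self.gate(x).squeeze(-1)                        # (batch, seq_len)
        k = max(1, int(self.keep_ratio * x.size(1)))
        top_idx = scores.topk(k, dim=1).indices                  # positions of selected tokens
        idx = top_idx.unsqueeze(-1).expand(-1, -1, x.size(-1))
        selected = torch.gather(x, 1, idx)                       # gather selected token states
        refined, _ = self.attn(selected, selected, selected)     # attention only over selected tokens
        out = x.clone()
        out.scatter_(1, idx, refined)                            # write refined states back in place
        return out

# Example: refine a batch of hidden states
h = torch.randn(2, 128, 512)
print(SelectiveAttentionLayer(512)(h).shape)  # torch.Size([2, 128, 512])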

Taipan consistently outperforms baseline models across most tasks for various model sizes, with the performance gap widening as the model size increases. The 1.3B Taipan model significantly improves over other baselines, suggesting its architecture effectively captures and utilizes linguistic patterns. Taipan also demonstrates superior performance in in-context retrieval tasks compared to Mamba and Jamba, while consuming fewer computational resources than Jamba. Moreover, Taipan maintains constant memory usage, offering a more efficient solution for processing long documents compared to Transformers which face challenges with linear memory scaling.

In conclusion, researchers introduced Taipan, a hybrid architecture that combines Mamba’s efficiency with improved long-range dependency handling through SALs. The experiments demonstrate Taipan’s superior performance across various scales and tasks, especially in scenarios that need extensive in-context retrieval while maintaining computational efficiency. Taipan’s architecture utilizes the insight that not all tokens require the same computational resources through its selective attention mechanism, which dynamically allocates resources based on the importance of tokens. This approach allows Taipan to balance efficiency with enhanced long-range modeling capabilities, making it a promising solution for memory-intensive tasks with long sequences.

Check out the Paper. All credit for this research goes to the researchers of this project.

Meta AI Releases LongVU: A Multimodal Large Language Model that can Address the Significant Challenge of Long Video Understanding

Understanding and analyzing long videos has been a significant challenge in AI, primarily due to the vast amount of data and computational resources required. Traditional Multimodal Large Language Models (MLLMs) struggle to process extensive video content because of limited context length. This challenge is especially evident with hour-long videos, which need hundreds of thousands of tokens to represent visual information—often exceeding the memory capacity of even advanced hardware. Consequently, these models struggle to provide consistent and comprehensive video understanding, limiting their real-world applications.

Meta AI Releases LongVU

Meta AI has released LongVU, an MLLM designed to address the challenge of long video understanding within a commonly used context length. LongVU employs a spatiotemporal adaptive compression mechanism that intelligently reduces the number of video tokens while preserving essential visual details. By leveraging a combination of DINOv2 features and cross-modal queries, LongVU effectively reduces spatial and temporal redundancies in video data, enabling the processing of long-form video sequences without losing critical information.

LongVU uses a selective frame feature reduction approach guided by text queries and leverages DINOv2’s self-supervised features to discard redundant frames. This method has a significant advantage over traditional uniform sampling techniques, which either lead to the loss of important information by discarding keyframes or become computationally infeasible by retaining too many tokens. The resulting MLLM has a lightweight design, allowing it to operate efficiently and achieve state-of-the-art results on video understanding benchmarks.

Technical Details and Benefits of LongVU

LongVU’s architecture combines DINOv2 features for frame extraction, selective frame feature reduction through text-guided cross-modal queries, and spatial token reduction based on temporal dependencies. Initially, DINOv2’s feature similarity objective is used to eliminate redundant frames, reducing the token count. LongVU then applies a cross-modal query to prioritize frames relevant to the input text query. For the remaining frames, a spatial pooling mechanism further reduces the token representation while preserving the most important visual details.

This approach maintains high performance even when processing hour-long videos. The spatial token reduction mechanism ensures that essential spatial information is retained while redundant data is eliminated. LongVU processes one-frame-per-second (1fps) sampled video input, effectively reducing the number of tokens per frame to an average of two, accommodating hour-long video sequences within an 8k context length—a common limitation for MLLMs. The architecture balances token reduction with the preservation of crucial visual content, making it highly efficient for long video processing.
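To illustrate the temporal-redundancy idea, the following is a minimal sketch that keeps a frame only when its features differ enough from the last kept frame. The cosine-similarity rule and threshold are assumptions for illustration; LongVU’s actual pipeline uses DINOv2 features together with text-guided cross-modal queries and spatial pooling:

# Sketch of temporal redundancy reduction by frame-feature similarity.
import torch
import torch.nn.functional as F

def prune_redundant_frames(frame_features: torch.Tensor, threshold: float = 0.9) -> list[int]:
    """Keep a frame only if it differs enough from the last kept frame.

    frame_features: (num_frames, feature_dim) per-frame embeddings (e.g., DINOv2 features).
    Returns the indices of frames to keep.
    """
    kept = [0]
    for i in range(1, frame_features.size(0)):
        sim = F.cosine_similarity(frame_features[i], frame_features[kept[-1]], dim=0)
        if sim < threshold:      # frame adds new visual content, so keep it
            kept.append(i)
    return kept

# Example with random features for a 1 fps, one-hour video (3600 frames)
feats = torch.randn(3600, 768)
print(len(prune_redundant_frames(feats)))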

Importance and Performance of LongVU

LongVU represents a significant breakthrough in long video understanding by overcoming the fundamental issue of limited context length faced by most MLLMs. Through spatiotemporal compression and effective cross-modal querying, LongVU achieves impressive results on key video understanding benchmarks. For example, on the VideoMME benchmark, LongVU outperforms a strong baseline model, LLaVA-OneVision, by approximately 5% in overall accuracy. Even when scaled down to a lightweight version using the Llama3.2-3B language backbone, LongVU demonstrated substantial gains, achieving a 3.4% improvement over previous state-of-the-art models in long video tasks.

LongVU’s robustness is further highlighted by its competitive results against proprietary models like GPT-4V. On the MVBench evaluation set, LongVU not only reduced the performance gap with GPT-4V but also surpassed it in some cases, demonstrating its effectiveness in understanding densely sampled video inputs. This makes LongVU particularly valuable for applications that require real-time video analysis, such as security surveillance, sports analysis, and video-based educational tools.

Conclusion

Meta AI’s LongVU is a major advancement in video understanding, especially for lengthy content. By using spatiotemporal adaptive compression, LongVU effectively addresses the challenges of processing videos with temporal and spatial redundancies, providing an efficient solution for long video analysis. Its superior performance across benchmarks highlights its edge over traditional MLLMs, paving the way for more advanced applications.

With its lightweight architecture and efficient compression, LongVU extends high-level video understanding to diverse use cases, including mobile and low-resource environments. By reducing computational costs without compromising accuracy, LongVU sets a new standard for future MLLMs.

Check out the Paper and Model on Hugging Face. All credit for this research goes to the researchers of this project.

Unlock organizational wisdom using voice-driven knowledge capture with …

Preserving and taking advantage of institutional knowledge is critical for organizational success and adaptability. This collective wisdom, comprising insights and experiences accumulated by employees over time, often exists as tacit knowledge passed down informally. Formalizing and documenting this invaluable resource can help organizations maintain institutional memory, drive innovation, enhance decision-making processes, and accelerate onboarding for new employees. However, effectively capturing and documenting this knowledge presents significant challenges. Traditional methods, such as manual documentation or interviews, are often time-consuming, inconsistent, and prone to errors. Moreover, the most valuable knowledge frequently resides in the minds of seasoned employees, who may find it difficult to articulate or lack the time to document their expertise comprehensively.
This post introduces an innovative voice-based application workflow that harnesses the power of Amazon Bedrock, Amazon Transcribe, and React to systematically capture and document institutional knowledge through voice recordings from experienced staff members. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading artificial intelligence (AI) companies such as AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Our solution uses Amazon Transcribe for real-time speech-to-text conversion, enabling accurate and immediate documentation of spoken knowledge. We then use generative AI, powered by Amazon Bedrock, to analyze and summarize the transcribed content, extracting key insights and generating comprehensive documentation.
The front-end of our application is built using React, a popular JavaScript library for creating dynamic UIs. This React-based UI seamlessly integrates with Amazon Transcribe, providing users with a real-time transcription experience. As employees speak, they can observe their words converted to text in real time, allowing immediate review and editing.
By combining the React front-end UI with Amazon Transcribe and Amazon Bedrock, we’ve created a comprehensive solution for capturing, processing, and preserving valuable institutional knowledge. This approach not only streamlines the documentation process but also enhances the quality and accessibility of the captured information, supporting operational excellence and fostering a culture of continuous learning and improvement within organizations.
Solution overview
This solution uses a combination of AWS services, including Amazon Transcribe, Amazon Bedrock, AWS Lambda, Amazon Simple Storage Service (Amazon S3), and Amazon CloudFront, to deliver real-time transcription and document generation. This solution uses a combination of cutting-edge technologies to create a seamless knowledge capture process:

User interface – A React-based front-end, distributed through Amazon CloudFront, provides an intuitive interface for employees to input voice data.
Real-time transcription – Amazon Transcribe streaming converts speech to text in real time, providing accurate and immediate transcription of spoken knowledge.
Intelligent processing – A Lambda function, powered by generative AI models through Amazon Bedrock, analyzes and summarizes the transcribed text. It goes beyond simple summarization by performing the following actions:

Extracting key concepts and terminologies.
Structuring the information into a coherent, well-organized document.

Secure storage – Raw audio files, processed information, summaries, and generated content are securely stored in Amazon S3, providing scalable and durable storage for this valuable knowledge repository. S3 bucket policies and encryption are implemented to enforce data security and compliance.

This solution uses a custom authorization Lambda function with Amazon API Gateway instead of more comprehensive identity management solutions such as Amazon Cognito. This approach was chosen for several reasons:

Simplicity – As a sample application, it doesn’t demand full user management or login functionality
Minimal user friction – Users don’t need to create accounts or log in, simplifying the user experience
Quick implementation – For rapid prototyping, this approach can be faster to implement than setting up a full user management system
Temporary credential management – Businesses can use this approach to offer secure, temporary access to AWS services without embedding long-term credentials in the application

Although this solution works well for this specific use case, it’s important to note that for production applications, especially those dealing with sensitive data or needing user-specific functionality, a more robust identity solution such as Amazon Cognito would typically be recommended.
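As a rough illustration of the temporary-credential flow, the following is a hypothetical sketch of an authorization Lambda function that returns short-lived credentials through AWS STS. The role ARN, response shape, and field names are assumptions; the actual function in the sample repository may differ:

# Hypothetical authorization Lambda returning temporary, narrowly scoped credentials
# so the browser can call Amazon Transcribe streaming directly.
import json
import boto3

sts = boto3.client("sts")

def handler(event, context):
    response = sts.assume_role(
        RoleArn="arn:aws:iam::123456789012:role/transcribe-streaming-client-role",  # assumed role ARN
        RoleSessionName="knowledge-capture-web-client",
        DurationSeconds=900,  # short-lived credentials for the browser session
    )
    creds = response["Credentials"]
    return {
        "statusCode": 200,
        "body": json.dumps({
            "Region": "us-east-1",
            "AccessKeyId": creds["AccessKeyId"],
            "SecretAccessKey": creds["SecretAccessKey"],
            "SessionToken": creds["SessionToken"],
            "Expiration": creds["Expiration"].isoformat(),
        }),
    }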
The following diagram illustrates the architecture of our solution.

The workflow includes the following steps:

Users access the front-end UI application, which is distributed through CloudFront
The React web application sends an initial request to Amazon API Gateway
API Gateway forwards the request to the authorization Lambda function
The authorization function checks the request against the AWS Identity and Access Management (IAM) role to confirm proper permissions
The authorization function sends temporary credentials back to the front-end application through API Gateway
With the temporary credentials, the React web application communicates directly with Amazon Transcribe for real-time speech-to-text conversion as the user records their input
After recording and transcription, the user sends (through the front-end UI) the transcribed texts and audio files to the backend through API Gateway
API Gateway routes the authorized request (containing transcribed text and audio files) to the orchestration Lambda function
The orchestration function sends the transcribed text for summarization
The orchestration function receives summarized text from Amazon Bedrock to generate content
The orchestration function stores the generated PDF files and recorded audio files in the artifacts S3 bucket

Prerequisites
You need the following prerequisites:

An active AWS account
Docker installed
The AWS CDK Toolkit 2.114.1+ installed and bootstrapped to the us-east-1 AWS Region
Python 3.12+ installed
Model access to Anthropic’s Claude enabled in Amazon Bedrock
An IAM user or role with access to Amazon Transcribe, Amazon Bedrock, Amazon S3, and Lambda

Deploy the solution with the AWS CDK
The AWS Cloud Development Kit (AWS CDK) is an open source software development framework for defining cloud infrastructure as code and provisioning it through AWS CloudFormation. Our AWS CDK stack deploys resources from the following AWS services:

Amazon Bedrock
Amazon CloudFront
AWS CodeBuild
Amazon EventBridge
IAM
AWS Key Management Service (AWS KMS)
AWS Lambda
Amazon S3
AWS Systems Manager Parameter Store
Amazon Transcribe
AWS WAF

To deploy the solution, complete the following steps:

Clone the GitHub repository: genai-knowledge-capture-webapp
Follow the Prerequisites section in the README.md file to set up your local environment

As of this writing, this solution supports deployment to the us-east-1 Region. The CloudFront distribution in this solution is geo-restricted to the US and Canada by default. To change this configuration, refer to the react-app-deploy.ts file in the GitHub repo.

Invoke npm install to install the dependencies
Invoke cdk deploy to deploy the solution

The deployment process typically takes 20–30 minutes. When the deployment is complete, CodeBuild will build and deploy the React application, which typically takes 2–3 minutes. After that, you can access the UI at the ReactAppUrl URL that is output by the AWS CDK.
Amazon Transcribe Streaming within React application
Our solution’s front-end is built using React, a popular JavaScript library for creating dynamic user interfaces. We integrate Amazon Transcribe streaming into our React application using the @aws-sdk/client-transcribe-streaming library. This integration enables real-time speech-to-text functionality, so users can observe their spoken words converted to text instantly.
The real-time transcription offers several benefits for knowledge capture:

With the immediate feedback, speakers can correct or clarify their statements in the moment
The visual representation of spoken words can help maintain focus and structure in the knowledge sharing process
It reduces the cognitive load on the speaker, who doesn’t need to worry about note-taking or remembering key points

In this solution, the Amazon Transcribe client is managed in a reusable React hook, useAudioTranscription.ts. An additional React hook, useAudioProcessing.ts, implements the necessary audio stream processing. Refer to the GitHub repo for more information. The following is a simplified code snippet demonstrating the Amazon Transcribe client integration:

// Create Transcribe client
transcribeClientRef.current = new TranscribeStreamingClient({
  region: credentials.Region,
  credentials: {
    accessKeyId: credentials.AccessKeyId,
    secretAccessKey: credentials.SecretAccessKey,
    sessionToken: credentials.SessionToken,
  },
});

// Create Transcribe Start Command
const transcribeStartCommand = new StartStreamTranscriptionCommand({
  LanguageCode: transcribeLanguage,
  MediaEncoding: audioEncodingType,
  MediaSampleRateHertz: audioSampleRate,
  AudioStream: getAudioStreamGenerator(),
});

// Start Transcribe session
const data = await transcribeClientRef.current.send(transcribeStartCommand);
console.log("Transcribe session established ", data.SessionId);
setIsTranscribing(true);

// Process Transcribe result stream
if (data.TranscriptResultStream) {
  try {
    for await (const event of data.TranscriptResultStream) {
      handleTranscriptEvent(event, setTranscribeResponse);
    }
  } catch (error) {
    console.error("Error processing transcript result stream:", error);
  }
}

For optimal results, we recommend using a good-quality microphone and speaking clearly. At the time of writing, the system supports major dialects of English, with plans to expand language support in future updates.
Use the application
After deployment, open the ReactAppUrl link (https://<CloudFront domain name>.cloudfront.net) in your browser (the solution supports Chrome, Firefox, Edge, Safari, and Brave browsers on Mac and Windows). A web UI opens, as shown in the following screenshot.

To use this application, complete the following steps:

Enter a question or topic.
Enter a file name for the document.
Choose Start Transcription and start recording your input for the given question or topic. The transcribed text will be shown in the Transcription box in real time.
After recording, you can edit the transcribed text.
You can also choose the play icon to play the recorded audio clips.
Choose Generate Document to invoke the backend service to generate a document from the input question and associated transcription. Meanwhile, the recorded audio clips are sent to an S3 bucket for future analysis.

The document generation process uses FMs from Amazon Bedrock to create a well-structured, professional document. The FM model performs the following actions:

Organizes the content into logical sections with appropriate headings
Identifies and highlights important concepts or terminologies
Generates a brief executive summary at the beginning of the document
Applies consistent formatting and styling
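
As a rough sketch of the summarization step described above, the orchestration function could call Amazon Bedrock’s Converse API along the following lines. The model ID, prompt wording, and inference parameters are assumptions rather than the repository’s exact values:

# Sketch of the document-generation call to Amazon Bedrock (Converse API).
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def summarize_transcript(question: str, transcript: str) -> str:
    prompt = (
        f"Topic or question: {question}\n\n"
        f"Transcribed expert input:\n{transcript}\n\n"
        "Produce a well-structured document with an executive summary, "
        "logical section headings, and highlighted key terminology."
    )
    response = bedrock.converse(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # assumed model ID
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 2000, "temperature": 0.2},
    )
    return response["output"]["message"]["content"][0]["text"]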

The audio files and generated documents are stored in a dedicated S3 bucket, as shown in the following screenshot, with appropriate encryption and access controls in place.

After the document is generated, choose View Document to open the professionally formatted PDF, built from the user’s input, in your browser through a presigned URL.

Additional information
To further enhance your knowledge capture solution and address specific use cases, consider the additional features and best practices discussed in this section.
Custom vocabulary with Amazon Transcribe
For industries with specialized terminology, Amazon Transcribe offers a custom vocabulary feature. You can define industry-specific terms, acronyms, and phrases to improve transcription accuracy. To implement this, complete the following steps:

Create a custom vocabulary file with your specialized terms
Use the Amazon Transcribe API to add this vocabulary to your account
Specify the custom vocabulary in your transcription requests
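
The steps above could be scripted with boto3 along the following lines; the vocabulary name and phrases are hypothetical examples, and for streaming transcription the vocabulary is then referenced by name in the start-stream request:

# Sketch of creating a custom vocabulary with boto3 (hypothetical name and terms).
import boto3

transcribe = boto3.client("transcribe", region_name="us-east-1")

transcribe.create_vocabulary(
    VocabularyName="org-knowledge-terms",       # hypothetical vocabulary name
    LanguageCode="en-US",
    Phrases=["LongVU", "SimpleQA", "Bedrock"],  # example domain-specific terms
)

# Poll until the vocabulary is READY, then pass its name in transcription requests
# (for streaming, the equivalent vocabulary-name parameter on the start-stream call).
status = transcribe.get_vocabulary(VocabularyName="org-knowledge-terms")["VocabularyState"]
print(status)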

Asynchronous file uploads
For handling large audio files or improving user experience, implement an asynchronous upload process:

Create a separate Lambda function for file uploads
Use Amazon S3 presigned URLs to allow direct uploads from the client to Amazon S3
Invoke the upload Lambda function using S3 Event Notifications
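
A minimal sketch of the presigned-URL approach, assuming a hypothetical bucket name and query parameter, might look like the following:

# Hypothetical upload Lambda returning a presigned S3 PUT URL for direct browser uploads.
import json
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    key = f"recordings/{event['queryStringParameters']['fileName']}"  # assumed query parameter
    url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": "knowledge-capture-artifacts", "Key": key, "ContentType": "audio/webm"},
        ExpiresIn=300,  # URL valid for 5 minutes
    )
    return {"statusCode": 200, "body": json.dumps({"uploadUrl": url})}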

Multi-topic document generation
For generating comprehensive documents covering multiple topics, refer to the following AWS Prescriptive Guidance pattern: Document institutional knowledge from voice inputs by using Amazon Bedrock and Amazon Transcribe. This pattern provides a scalable approach to combining multiple voice inputs into a single, coherent document.
Key benefits of this approach include:

Efficient capture of complex, multifaceted knowledge
Improved document structure and coherence
Reduced cognitive load on subject matter experts (SMEs)

Use captured knowledge as a knowledge base
The knowledge captured through this solution can serve as a valuable, searchable knowledge base for your organization. To maximize its utility, you can integrate with enterprise search solutions such as Amazon Bedrock Knowledge Bases to make the captured knowledge quickly discoverable. Additionally, you can set up regular review and update cycles to keep the knowledge base current and relevant.
Clean up
When you’re done testing the solution, remove it from your AWS account to avoid future costs:

Invoke cdk destroy to remove the solution
You may also need to manually remove the S3 buckets created by the solution

Summary
This post demonstrates the power of combining AWS services such as Amazon Transcribe and Amazon Bedrock with popular front-end frameworks such as React to create a robust knowledge capture solution. By using real-time transcription and generative AI, organizations can efficiently document and preserve valuable institutional knowledge, fostering innovation, improving decision-making, and maintaining a competitive edge in dynamic business environments.
We encourage you to explore this solution further by deploying it in your own environment and adapting it to your organization’s specific needs. The source code and detailed instructions are available in our genai-knowledge-capture-webapp GitHub repository, providing a solid foundation for your knowledge capture initiatives.
By embracing this innovative approach to knowledge capture, organizations can unlock the full potential of their collective wisdom, driving continuous improvement and maintaining their competitive edge.

About the Authors
Jundong Qiao is a Machine Learning Engineer at AWS Professional Service, where he specializes in implementing and enhancing AI/ML capabilities across various sectors. His expertise encompasses building next-generation AI solutions, including chatbots and predictive models that drive efficiency and innovation.
Michael Massey is a Cloud Application Architect at Amazon Web Services. He helps AWS customers achieve their goals by building highly-available and highly-scalable solutions on the AWS Cloud.
Praveen Kumar Jeyarajan is a Principal DevOps Consultant at AWS, supporting Enterprise customers and their journey to the cloud. He has 13+ years of DevOps experience and is skilled in solving myriad technical challenges using the latest technologies. He holds a Masters degree in Software Engineering. Outside of work, he enjoys watching movies and playing tennis.

Achieve multi-Region resiliency for your conversational AI chatbots wi …

Global Resiliency is a new Amazon Lex capability that enables near real-time replication of your Amazon Lex V2 bots in a second AWS Region. When you activate this feature, all resources, versions, and aliases associated after activation are synchronized across the chosen Regions. The replicated bot resources and aliases in the second Region keep the same identifiers as those in the source Region, so you can seamlessly route traffic to either Region by simply changing the Region identifier, providing uninterrupted service availability. In the event of a Regional outage or disruption, you can swiftly redirect your bot traffic to a different Region, and applications can use replicated Amazon Lex bots across Regions in an active-active or active-passive manner for improved availability and resiliency.

With Global Resiliency, you no longer need to manually manage separate bots across Regions, because the feature automatically replicates and keeps Regional configurations in sync. With just a few clicks or commands, you gain robust Amazon Lex bot replication capabilities. Applications that use Amazon Lex bots can fail over from an impaired Region seamlessly, minimizing the risk of costly downtime and maintaining business continuity. This feature streamlines the process of maintaining robust and highly available conversational applications, including interactive voice response (IVR) systems, chatbots for digital channels, and messaging platforms, providing a seamless and resilient customer experience.
In this post, we walk you through enabling Global Resiliency for a sample Amazon Lex V2 bot. We showcase the replication process of bot versions and aliases across multiple Regions. Additionally, we discuss how to handle integrations with AWS Lambda and Amazon CloudWatch after enabling Global Resiliency.
Solution overview
For this exercise, we create a BookHotel bot as our sample bot. We use an AWS CloudFormation template to build this bot, including defining intents, slots, and other required components such as a version and alias. Throughout our demonstration, we use the us-east-1 Region as the source Region, and we replicate the bot in the us-west-2 Region, which serves as the replica Region. We then replicate this bot, enable logging, and integrate it with a Lambda function.
To better understand the solution, refer to the following architecture diagram.

Enabling Global Resiliency for an Amazon Lex bot is straightforward using the AWS Management Console, AWS Command Line Interface (AWS CLI), or APIs. We walk through the instructions to replicate the bot later in this post.
After replication is successfully enabled, the bot will be replicated across Regions, providing a unified experience. This allows you to distribute IVR or chat application requests between Regions in either an active-active or active-passive setup, depending on your use case.
A key benefit of Global Resiliency is that developers can continuously work on bot improvements in the source Region, and changes are automatically synchronized to the replica Region. This streamlines the development workflow without compromising resiliency.

At the time of writing, Global Resiliency only works with predetermined pairs of Regions. For more information, see Use Global Resiliency to deploy bots to other Regions.
Prerequisites
You should have the following prerequisites:

An AWS account with administrator access
Access to Amazon Lex Global Resiliency (contact your Amazon Connect Solutions Architect or Technical Account Manager)
Working knowledge of the following services:

AWS CloudFormation
Amazon CloudWatch
AWS Lambda
Amazon Lex

Create a sample Amazon Lex bot
To set up a sample bot for our use case, refer to Manage your Amazon Lex bot via AWS CloudFormation templates. For this example, we create a bot named BookHotel in the source Region (us-east-1). Complete the following steps:

Download the CloudFormation template and deploy it in the source Region (us-east-1). For instructions, see Create a stack from the CloudFormation console.

Upon successful deployment, the BookHotel bot will be created in the source Region.

On the Amazon Lex console, choose Bots in the navigation pane and locate the BookHotel bot.

Verify that the Global Resiliency option is available under Deployment in the navigation pane. If this option isn’t visible, the Global Resiliency feature may not be enabled for your account. In this case, refer to the prerequisites section for enabling the Global Resiliency feature.

Our sample BookHotel bot has one version (Version 1, in addition to the draft version) and an alias named BookHotelDemoAlias (in addition to the TestBotAlias).
Enable Global Resiliency
To activate Global Resiliency and set up bot replication in a replica Region, complete the following steps:

On the Amazon Lex console, choose us-east-1 as your Region.
Choose Bots in the navigation pane and locate the BookHotel bot.
Under Deployment in the navigation pane, choose Global Resiliency.

You can see the replication details here. Because you haven’t enabled Global Resiliency yet, all the details are blank.

Choose Create replica to create a draft version of your bot.

In your source Region (us-east-1), after the bot replication is complete, you will see Replication status as Enabled.

Switch to the replica Region (us-west-2).

You can see that the BookHotel bot is replicated. This is a read-only replica and the bot ID in the replica Region matches the bot ID in the source Region.

Under Deployment in the navigation pane, choose Global Resiliency.

You can see the replication details here, which are the same as that in the source Region BookHotel bot.
You have verified that the bot is replicated successfully after Global Resiliency is enabled. Only new versions and aliases created from this point onward will be replicated. As a next step, we create a bot version and alias to demonstrate the replication.
Create a new bot version and alias
Complete the following steps to create a new bot version and alias:

On the Amazon Lex console in your source Region (us-east-1), navigate to the BookHotel bot.
Choose Bot versions in the navigation pane, and choose Create new version to create Version 2.

Version 2 now has Global Resiliency enabled, whereas Version 1 and the draft version do not, because they were created prior to enabling Global Resiliency.

Choose Aliases in the navigation pane, then choose Create new alias.
Create a new alias for the BookHotel bot called BookHotelDemoAlias_GR and point that to the new version.

Similarly, the BookHotelDemoAlias_GR now has Global Resiliency enabled, whereas aliases created before enabling Global Resiliency, such as BookHotelDemoAlias and TestBotAlias, don’t have Global Resiliency enabled.

Choose Global Resiliency in the navigation pane to view the source and replication details.

The details for Last replicated version are now updated to Version 2.

Switch to the replica Region (us-west-2) and choose Global Resiliency in the navigation pane.

You can see that the new Global Resiliency enabled version (Version 2) is replicated and the new alias BookHotelDemoAlias_GR is also present.
You have verified that the new version and alias were created after Global Resiliency is replicated to the replica Region. You can now make Amazon Lex runtime calls to both Regions.
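Because the bot and alias identifiers are identical in both Regions, runtime routing reduces to choosing a Region when creating the client. The following sketch uses placeholder IDs and the boto3 lexv2-runtime client to illustrate the idea:

# Sketch of Region-based routing for Amazon Lex runtime calls; IDs are placeholders.
import boto3

BOT_ID = "ABCDE12345"        # same bot ID in us-east-1 and us-west-2
BOT_ALIAS_ID = "FGHIJ67890"  # alias with Global Resiliency enabled

def send_utterance(text: str, region: str = "us-east-1") -> dict:
    lex = boto3.client("lexv2-runtime", region_name=region)
    return lex.recognize_text(
        botId=BOT_ID,
        botAliasId=BOT_ALIAS_ID,
        localeId="en_US",
        sessionId="demo-session-1",
        text=text,
    )

# Fail over by changing only the Region identifier
primary = send_utterance("I want to book a hotel", region="us-east-1")
# fallback = send_utterance("I want to book a hotel", region="us-west-2")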
Handling integrations with Lambda and CloudWatch after enabling Global Resiliency
Amazon Lex has integrations with other AWS services such as enabling custom logic with Lambda functions and logging with conversation logs using CloudWatch and Amazon Simple Storage Service (Amazon S3). In this section, we associate a Lambda function and CloudWatch group for the BookHotel bot in the source Region (us-east-1) and validate its association in the replica Region (us-west-2).

Download the CloudFormation template to deploy a sample Lambda and CloudWatch log group.
Deploy the CloudFormation stack to the source Region (us-east-1). For instructions, see Create a stack from the CloudFormation console.

This will deploy a Lambda function (book-hotel-lambda) and a CloudWatch log group (/lex/book-hotel-bot) in the us-east-1 Region.

Deploy the CloudFormation stack to the replica Region (us-west-2).

This will deploy a Lambda function (book-hotel-lambda) and a CloudWatch log group (/lex/book-hotel-bot) in the us-west-2 Region. The Lambda function name and CloudWatch log group name must be the same in both Regions.

On the Amazon Lex console in the source Region (us-east-1), navigate to the BookHotel bot.
Choose Aliases in the navigation pane, and choose the BookHotelDemoAlias_GR.
In the Languages section, choose English (US).
Select the book-hotel-lambda function and associate it with the BookHotel bot by choosing Save.
Navigate back to the BookHotelDemoAlias_GR alias, and in the Conversation logs section, choose Manage conversation logs.
Enable Text logs and select the /lex/book-hotel-bot log group, then choose Save.

Conversation text logs are now enabled for the BookHotel bot in us-east-1.

Switch to the replica Region (us-west-2) and navigate to the BookHotel bot.
Choose Aliases in the navigation pane, and choose the BookHotelDemoAlias_GR.

You can see that the conversation logs are already associated with the /lex/book-hotel-bot CloudWatch log group in the us-west-2 Region.

In the Languages section, choose English (US).

You can see that the book-hotel-lambda function is associated with the BookHotel alias.
Through this process, we have demonstrated how Lambda functions and CloudWatch log groups are automatically associated with the corresponding bot resources in the replica Region for the replicated bots, providing a seamless and consistent integration across both Regions.
Disabling Global Resiliency
You have the flexibility to disable Global Resiliency at any time. By disabling Global Resiliency, your source bot, along with its associated aliases and versions, will no longer be replicated across other Regions. In this section, we demonstrate the process to disable Global Resiliency.

On the Amazon Lex console in your source Region (us-east-1), choose Bots in the navigation pane and locate the BookHotel bot.
Under Deployment in the navigation pane, choose Global Resiliency.
Choose Disable Global Resiliency.

Enter confirm in the confirmation box and choose Delete.

This action initiates the deletion of the replicated BookHotel bot in the replica Region.
The replication status will change to Deleting, and after a few minutes, the deletion process will be complete. You will then see the Create replica option available again. If you don’t see it, try refreshing the page.

Check the Bot versions page of the BookHotel bot to confirm that Version 2 is still the latest version.
Check the Aliases page to confirm that the BookHotelDemoAlias_GR alias is still present on the source bot.

Applications referring to this alias can continue to function as normal in the source Region.

Switch to the replica Region (us-west-2) to confirm that the BookHotel bot has been deleted from this Region.

You can reenable Global Resiliency on the source Region (us-east-1) by going through the process described earlier in this post.
Clean up
To prevent incurring charges, complete the following steps to clean up the resources created during this demonstration:

Disable Global Resiliency for the bot by following the instructions detailed earlier in this post.
Delete the book-hotel-lambda-cw-stack CloudFormation stack from the us-west-2 Region. For instructions, see Delete a stack on the CloudFormation console.
Delete the book-hotel-lambda-cw-stack CloudFormation stack from the us-east-1 Region.
Delete the book-hotel-stack CloudFormation stack from the us-east-1 Region.

Integrations with Amazon Connect
Amazon Lex Global Resiliency seamlessly complements Amazon Connect Global Resiliency, providing you with a comprehensive solution for maintaining business continuity and resilience across your conversational AI and contact center infrastructure. Amazon Connect Global Resiliency enables you to automatically maintain your instances synchronized across two Regions, making sure that all configuration resources, such as contact flows, queues, and agents, are true replicas of each other.
With the addition of Amazon Lex Global Resiliency, Amazon Connect customers gain the added benefit of automated synchronization of their Amazon Lex V2 bots associated with their contact flows. This integration provides a consistent and uninterrupted experience during failover scenarios, because your Amazon Lex interactions seamlessly transition between Regions without any disruption. By combining these complementary features, you can achieve end-to-end resilience. This minimizes the risk of downtime and makes sure your conversational AI and contact center operations remain highly available and responsive, even in the case of Regional failures or capacity constraints.
Global Resiliency APIs
Global Resiliency provides API support to create and manage replicas. These are supported in the AWS CLI and AWS SDKs. In this section, we demonstrate usage with the AWS CLI.

Create a bot replica in the replica Region using the CreateBotReplica API.
Monitor the bot replication status using the DescribeBotReplica API.
List the replicated bots using the ListBotReplicas API.
List all the version replication statuses applicable for Global Resiliency using the ListBotVersionReplicas API.

This list includes only the replicated bot versions that were created after Global Resiliency was enabled. In the API response, a botVersionReplicationStatus of Available indicates that the bot version was replicated successfully.

List all the alias replication statuses applicable for Global Resiliency using the ListBotAliasReplicas API.

This list includes only the replicated bot aliases that were created after Global Resiliency was enabled. In the API response, a botAliasReplicationStatus of Available indicates that the bot alias was replicated successfully.
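Although this section demonstrates usage with the AWS CLI, the same operations are available in the AWS SDKs. The following boto3 sketch uses a placeholder bot ID, and parameter names should be verified against the current lexv2-models documentation:

# Hedged boto3 sketch of the Global Resiliency APIs listed above.
import boto3

lex_models = boto3.client("lexv2-models", region_name="us-east-1")
BOT_ID = "ABCDE12345"  # placeholder bot ID

# Create a replica in the paired Region
lex_models.create_bot_replica(botId=BOT_ID, replicaRegion="us-west-2")

# Monitor replication status
replica = lex_models.describe_bot_replica(botId=BOT_ID, replicaRegion="us-west-2")
print(replica["botReplicaStatus"])

# List replicas, replicated versions, and replicated aliases
print(lex_models.list_bot_replicas(botId=BOT_ID))
print(lex_models.list_bot_version_replicas(botId=BOT_ID, replicaRegion="us-west-2"))
print(lex_models.list_bot_alias_replicas(botId=BOT_ID, replicaRegion="us-west-2"))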
Conclusion
In this post, we introduced the Global Resiliency feature for Amazon Lex V2 bots. We discussed the process to enable Global Resiliency using the console and reviewed some of the new APIs released as part of this feature.
As the next step, you can explore Global Resiliency and apply the techniques discussed in this post to replicate bots and bot versions across Regions. This hands-on practice will solidify your understanding of managing and replicating Amazon Lex V2 bots in your solution architecture.

About the Authors
Priti Aryamane is a Specialty Consultant at AWS Professional Services. With over 15 years of experience in contact centers and telecommunications, Priti specializes in helping customers achieve their desired business outcomes with customer experience on AWS using Amazon Lex, Amazon Connect, and generative AI features.
Sanjeet Sanda is a Specialty Consultant at AWS Professional Services with over 20 years of experience in telecommunications, contact center technology, and customer experience. He specializes in designing and delivering customer-centric solutions with a focus on integrating and adapting existing enterprise call centers into Amazon Connect and Amazon Lex environments. Sanjeet is passionate about streamlining adoption processes by using automation wherever possible. Outside of work, Sanjeet enjoys hanging out with his family, having barbecues, and going to the beach.
Yogesh Khemka is a Senior Software Development Engineer at AWS, where he works on large language models and natural language processing. He focuses on building systems and tooling for scalable distributed deep learning training and real-time inference.

Create and fine-tune sentence transformers for enhanced classification …

Sentence transformers are powerful deep learning models that convert sentences into high-quality, fixed-length embeddings, capturing their semantic meaning. These embeddings are useful for various natural language processing (NLP) tasks such as text classification, clustering, semantic search, and information retrieval.
In this post, we showcase how to fine-tune a sentence transformer specifically for classifying an Amazon product into its product category (such as toys or sporting goods). We compare two different sentence transformers, paraphrase-MiniLM-L6-v2 and a proprietary Amazon large language model (LLM) called M5_ASIN_SMALL_V2.0, and evaluate their results. M5 LLMs are BERT-based LLMs fine-tuned on internal Amazon product catalog data using product title, bullet points, description, and more. They are currently being used for use cases such as automated product classification and similar product recommendations. Our hypothesis is that M5_ASIN_SMALL_V2.0 will perform better for the use case of Amazon product category classification because it was fine-tuned with Amazon product data. We test this hypothesis in the experiment described in this post.
Solution overview
In this post, we demonstrate how to fine-tune a sentence transformer with Amazon product data and how to use the resulting sentence transformer to improve classification accuracy of product categories using an XGBoost decision tree. For this demonstration, we use a public Amazon product dataset called Amazon Product Dataset 2020 from a Kaggle competition. This dataset contains the following attributes and fields:

Domain name – amazon.com
Date range – January 1, 2020, through January 31, 2020
File extension – CSV
Available fields – Uniq Id, Product Name, Brand Name, Asin, Category, Upc Ean Code, List Price, Selling Price, Quantity, Model Number, About Product, Product Specification, Technical Details, Shipping Weight, Product Dimensions, Image, Variants, SKU, Product Url, Stock, Product Details, Dimensions, Color, Ingredients, Direction To Use, Is Amazon Seller, Size Quantity Variant, and Product Description
Label field – Category

Prerequisites
Before you begin, install the following packages. You can do this in either an Amazon SageMaker notebook or your local Jupyter notebook by running the following commands:
!pip install sentencepiece --quiet
!pip install sentence_transformers --quiet
!pip install xgboost --quiet
!pip install scikit-learn --quiet

Preprocess the data
The first step needed for fine-tuning a sentence transformer is to preprocess the Amazon product data for the sentence transformer to be able to consume the data and fine-tune effectively. It involves normalizing the text data, defining the product’s main category by extracting the first category from the Category field, and selecting the most important fields from the dataset that contribute to classifying the product’s main category accurately. We use the following code for preprocessing:
import pandas as pd
from sklearn.preprocessing import LabelEncoder

data = pd.read_csv('marketing_sample_for_amazon_com-ecommerce__20200101_20200131__10k_data.csv')
data.columns = data.columns.str.lower().str.replace(' ', '_')
data['main_category'] = data['category'].str.split("|").str[0]
data["all_text"] = data.apply(
    lambda r: " ".join(
        [
            str(r["product_name"]) if pd.notnull(r["product_name"]) else "",
            str(r["about_product"]) if pd.notnull(r["about_product"]) else "",
            str(r["product_specification"]) if pd.notnull(r["product_specification"]) else "",
            str(r["technical_details"]) if pd.notnull(r["technical_details"]) else ""
        ]
    ),
    axis=1
)
label_encoder = LabelEncoder()
labels_transform = label_encoder.fit_transform(data['main_category'])
data['label'] = labels_transform
data[['all_text', 'label']]

The following screenshot shows an example of what our dataset looks like after it has been preprocessed.

Fine-tune the sentence transformer paraphrase-MiniLM-L6-v2
The first sentence transformer we fine-tune is called paraphrase-MiniLM-L6-v2. It uses the popular BERT model as its underlying architecture to transform product description text into a 384-dimensional dense vector embedding that will be consumed by our XGBoost classifier for product category classification. We use the following code to fine-tune paraphrase-MiniLM-L6-v2 using the preprocessed Amazon product data:
from sentence_transformers import SentenceTransformer

model_name = 'paraphrase-MiniLM-L6-v2'
model = SentenceTransformer(model_name)

The first step is to define a classification head that represents the 24 product categories that an Amazon product can be classified into. This classification head will be used to train the sentence transformer specifically to be more effective at transforming product descriptions according to the 24 product categories. The idea is that all product descriptions that are within the same category should be transformed into a vector embedding that is closer in distance compared to product descriptions that belong in different categories.
 The following code is for fine-tuning sentence transformer 1:
import torch.nn as nn

# Define classification head
class ClassificationHead(nn.Module):
    def __init__(self, embedding_dim, num_classes):
        super(ClassificationHead, self).__init__()
        self.linear = nn.Linear(embedding_dim, num_classes)

    def forward(self, features):
        x = features['sentence_embedding']
        x = self.linear(x)
        return x

# Define the number of classes for a classification task.
num_classes = 24
print('class number:', num_classes)
classification_head = ClassificationHead(model.get_sentence_embedding_dimension(), num_classes)

# Combine SentenceTransformer model and classification head.
class SentenceTransformerWithHead(nn.Module):
    def __init__(self, transformer, head):
        super(SentenceTransformerWithHead, self).__init__()
        self.transformer = transformer
        self.head = head

    def forward(self, input):
        features = self.transformer(input)
        logits = self.head(features)
        return logits

model_with_head = SentenceTransformerWithHead(model, classification_head)

We then set the fine-tuning parameters. For this post, we train for five epochs, optimize for cross-entropy loss, and use the AdamW optimization method. We chose five epochs because, after testing various values, we observed that the loss was minimized at epoch 5, making it the optimal number of training iterations for achieving the best classification results.
The following code is for fine-tuning sentence transformer 2:
import os
os.environ["TORCH_USE_CUDA_DSA"] = "1"
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

from sentence_transformers import SentenceTransformer, InputExample, LoggingHandler
import torch
from torch.utils.data import DataLoader
from transformers import AdamW, get_linear_schedule_with_warmup

train_sentences = data['all_text']
train_labels = data['label']
# Training parameters
num_epochs = 5
batch_size = 2
learning_rate = 2e-5

# Convert the dataset to PyTorch tensors.
train_examples = [InputExample(texts=[s], label=l) for s, l in zip(train_sentences, train_labels)]

# Customize collate_fn to convert InputExample objects into tensors.
def collate_fn(batch):
    texts = [example.texts[0] for example in batch]
    labels = torch.tensor([example.label for example in batch])
    return texts, labels

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=batch_size, collate_fn=collate_fn)

# Define the loss function, optimizer, and learning rate scheduler.
criterion = nn.CrossEntropyLoss()
optimizer = AdamW(model_with_head.parameters(), lr=learning_rate)
total_steps = len(train_dataloader) * num_epochs
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=0, num_training_steps=total_steps)

# Training loop
loss_list = []
for epoch in range(num_epochs):
    model_with_head.train()
    for step, (texts, labels) in enumerate(train_dataloader):
        labels = labels.to(model.device)
        optimizer.zero_grad()

        # Encode text and pass through classification head.
        inputs = model.tokenize(texts)
        input_ids = inputs['input_ids'].to(model.device)
        input_attention_mask = inputs['attention_mask'].to(model.device)
        inputs_final = {'input_ids': input_ids, 'attention_mask': input_attention_mask}

        # Move model_with_head to the same device
        model_with_head = model_with_head.to(model.device)
        logits = model_with_head(inputs_final)

        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()
        scheduler.step()
        if step % 100 == 0:
            print(f"Epoch {epoch}, Step {step}, Loss: {loss.item()}")

    print(f'Epoch {epoch+1}/{num_epochs}, Loss: {loss.item()}')
    model_save_path = f'./intermediate-output/epoch-{epoch}'
    model.save(model_save_path)
    loss_list.append(loss.item())

# Save the final model
model_final_save_path = 'st_ft_epoch_5'
model.save(model_final_save_path)

To observe whether our resulting fine-tuned sentence transformer improves our product category classification accuracy, we use it as our text embedder in the XGBoost classifier in the next step.
XGBoost classification
XGBoost (Extreme Gradient Boosting) classification is a machine learning technique used for classification tasks. It’s an implementation of the gradient boosting framework designed to be efficient, flexible, and portable. For this post, we have XGBoost consume the product description text embedding output of our sentence transformers and observe product category classification accuracy. The following code uses the standard paraphrase-MiniLM-L6-v2 sentence transformer, before fine-tuning, to classify Amazon products into their respective categories:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score
import xgboost as xgb

model = SentenceTransformer('paraphrase-MiniLM-L6-v2')
data['text_embedding'] = data['all_text'].apply(lambda x: model.encode(str(x)))
text_embeddings = pd.DataFrame(data['text_embedding'].tolist(), index=data.index, dtype=float)

# Convert numeric columns stored as strings to floats
numeric_columns = ['selling_price', 'shipping_weight', 'product_dimensions']  # Add more columns as needed
for col in numeric_columns:
    data[col] = pd.to_numeric(data[col], errors='coerce')

# Convert categorical columns to category type
categorical_columns = ['model_number', 'is_amazon_seller']  # Add more columns as needed
for col in categorical_columns:
    data[col] = data[col].astype('category')

X_0 = data[['selling_price', 'model_number', 'is_amazon_seller']]
X = pd.concat([X_0, text_embeddings], axis=1)
label_encoder = LabelEncoder()
data['main_category_encoded'] = label_encoder.fit_transform(data['main_category'])
y = data['main_category_encoded']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Re-encode the labels to ensure they are consecutive integers starting from 0
unique_labels = sorted(set(y_train) | set(y_test))
label_mapping = {label: idx for idx, label in enumerate(unique_labels)}

y_train = y_train.map(label_mapping)
y_test = y_test.map(label_mapping)

# Enable categorical support for XGBoost
dtrain = xgb.DMatrix(X_train, label=y_train, enable_categorical=True)
dtest = xgb.DMatrix(X_test, label=y_test, enable_categorical=True)

param = {
    'max_depth': 6,
    'eta': 0.3,
    'objective': 'multi:softmax',
    'num_class': len(label_mapping),
    'eval_metric': 'mlogloss'
}

num_round = 100
bst = xgb.train(param, dtrain, num_round)

# Evaluate the model
y_pred = bst.predict(dtest)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

Accuracy: 0.78
We observe a 78% accuracy using the stock paraphrase-MiniLM-L6-v2 sentence transformer. To observe the results of the fine-tuned paraphrase-MiniLM-L6-v2 sentence transformer, we need to update the beginning of the code as follows. All other code remains the same.
model = SentenceTransformer('st_ft_epoch_5')
data['text_embedding_finetuned'] = data['all_text'].apply(lambda x: model.encode(str(x)))
text_embeddings = pd.DataFrame(data['text_embedding_finetuned'].tolist(), index=data.index, dtype=float)
X_pa_finetuned = pd.concat([X_0, text_embeddings], axis=1)
X_train, X_test, y_train, y_test = train_test_split(X_pa_finetuned, y, test_size=0.2, random_state=42)

# Re-encode the labels to ensure they are consecutive integers starting from 0
unique_labels = sorted(set(y_train) | set(y_test))
label_mapping = {label: idx for idx, label in enumerate(unique_labels)}

y_train = y_train.map(label_mapping)
y_test = y_test.map(label_mapping)

# Build and train the XGBoost model
# Enable categorical support for XGBoost
dtrain = xgb.DMatrix(X_train, label=y_train, enable_categorical=True)
dtest = xgb.DMatrix(X_test, label=y_test, enable_categorical=True)

param = {
    'max_depth': 6,
    'eta': 0.3,
    'objective': 'multi:softmax',
    'num_class': len(label_mapping),
    'eval_metric': 'mlogloss'
}

num_round = 100
bst = xgb.train(param, dtrain, num_round)

y_pred = bst.predict(dtest)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

# Optionally, convert the predicted labels back to the original category labels
inverse_label_mapping = {idx: label for label, idx in label_mapping.items()}
y_pred_labels = pd.Series(y_pred).map(inverse_label_mapping)

Accuracy: 0.94
With the fine-tuned paraphrase-MiniLM-L6-v2 sentence transformer, we observe 94% accuracy, a 16-percentage-point increase over the 78% baseline. From this observation, we conclude that fine-tuning paraphrase-MiniLM-L6-v2 is effective for classifying Amazon product data into product categories.
Fine-tune the sentence transformer M5_ASIN_SMALL_V20
Now we create a sentence transformer from a BERT-based model called M5_ASIN_SMALL_V2.0. It’s a 40-million-parameter BERT-based model trained at M5, an internal team at Amazon specializing in fine-tuning LLMs using Amazon product data. It was distilled from a larger teacher model (approximately 5 billion parameters), which was pre-trained on a large amount of unlabeled ASIN data and pre-fine-tuned on a set of Amazon supervised learning tasks (multi-task pre-fine-tuning). It is a multi-task, multi-lingual, multi-locale, and multi-modal BERT-based encoder-only model trained on text and structured data input. Its neural network architectural details are as follows:
Model backbone:
Hidden size: 384
Number of hidden layers: 24
Number of attention heads: 16
Intermediate size: 1536
Vocabulary size: 256,035
Number of backbone parameters: 42,587,904
Number of word embedding parameters (bert.embedding.*): 98,517,504
Total number of parameters: 141,259,023
Because M5_ASIN_SMALL_V20 was pre-trained on Amazon product data specifically, we hypothesize that building a sentence transformer from it will increase the accuracy of product category classification. We complete the following steps to build a sentence transformer from M5_ASIN_SMALL_V20, fine-tune it, and input it into an XGBoost classifier to observe accuracy impact:

Load a pre-trained M5 model that you want to use as the base encoder.
Use the M5 model within the SentenceTransformer framework to create a sentence transformer.
Add a pooling layer to create fixed-size sentence embeddings from the variable-length output of the BERT model.
Combine the M5 model and pooling layer into a single model.
Fine-tune the model on a relevant dataset.

See the following code for Steps 1–3:
from sentence_transformers import models
from transformers import AutoTokenizer

# Step 1: Load the pre-trained M5 model
model_path = 'M5_ASIN_SMALL_V20'  # or your custom model path
transformer_model = models.Transformer(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Step 2: Define the pooling layer
pooling_model = models.Pooling(transformer_model.get_word_embedding_dimension(),
                               pooling_mode_mean_tokens=True)

# Step 3: Create the SentenceTransformer model
model_mean_m5_base = SentenceTransformer(modules=[transformer_model, pooling_model])

The rest of the code remains the same as fine-tuning for the paraphrase-MiniLM-L6-v2 sentence transformer, except that we use the fine-tuned M5 sentence transformer instead to create embeddings for the texts in the dataset:
loaded_model = SentenceTransformer('m5_ft_epoch_5_mean')
data['text_embedding_m5'] = data['all_text'].apply(lambda x: loaded_model.encode(str(x)))

Result
Before fine-tuning, M5_ASIN_SMALL_V20 produces results similar to paraphrase-MiniLM-L6-v2, with 78% accuracy. After fine-tuning, however, the M5_ASIN_SMALL_V20 sentence transformer outperforms the fine-tuned paraphrase-MiniLM-L6-v2: its accuracy is 98%, compared to 94%. We fine-tuned both sentence transformers for 5 epochs, because experiments showed this was the optimal number to minimize loss. The following graph summarizes our observations of accuracy improvement from fine-tuning for 5 epochs in a single comparison chart.
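The chart isn’t reproduced here; a minimal matplotlib sketch that recreates it from the accuracy values reported in this post (the original chart’s styling may differ) could look like the following:

import matplotlib.pyplot as plt
import numpy as np

# Accuracy values reported in this post (XGBoost classification).
model_names = ['paraphrase-MiniLM-L6-v2', 'M5_ASIN_SMALL_V20']
before_ft = [0.78, 0.78]   # stock sentence transformers
after_ft = [0.94, 0.98]    # fine-tuned for 5 epochs

x = np.arange(len(model_names))
width = 0.35

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(x - width / 2, before_ft, width, label='Before fine-tuning')
ax.bar(x + width / 2, after_ft, width, label='After fine-tuning (5 epochs)')
ax.set_xticks(x)
ax.set_xticklabels(model_names)
ax.set_ylabel('Classification accuracy')
ax.set_ylim(0, 1.0)
ax.legend()
plt.tight_layout()
plt.show()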

Clean up
We recommend using GPUs to fine-tune the sentence transformers, for example, ml.g5.4xlarge or ml.g4dn.16xlarge. Be sure to clean up resources to avoid incurring additional costs.
If you’re using a SageMaker notebook instance, refer to Clean up Amazon SageMaker notebook instance resources. If you’re using Amazon SageMaker Studio, refer to Delete or stop your Studio running instances, applications, and spaces.
Conclusion
In this post, we explored sentence transformers and how to use them effectively for text classification tasks. We dived deep into the sentence transformer paraphrase-MiniLM-L6-v2, demonstrated how to use a BERT-based model like M5_ASIN_SMALL_V20 to create a sentence transformer, showed how to fine-tune sentence transformers, and examined the accuracy effects of doing so.
Fine-tuning sentence transformers has proven to be highly effective for classifying product descriptions into categories, significantly enhancing prediction accuracy. As a next step, we encourage you to explore different sentence transformers from Hugging Face.
Lastly, if you want to explore M5, note that it is proprietary to Amazon and you can only access it as an Amazon partner or customer as of the time of this publication. Connect with your Amazon point of contact if you’re an Amazon partner or customer wanting to use M5, and they will guide you through M5’s offerings and how it can be used for your use case.

About the Authors
Kara Yang is a Data Scientist at AWS Professional Services in the San Francisco Bay Area, with extensive experience in AI/ML. She specializes in leveraging cloud computing, machine learning, and Generative AI to help customers address complex business challenges across various industries. Kara is passionate about innovation and continuous learning.
Farshad Harirchi is a Principal Data Scientist at AWS Professional Services. He helps customers across industries, from retail to industrial and financial services, with the design and development of generative AI and machine learning solutions. Farshad brings extensive experience in the entire machine learning and MLOps stack. Outside of work, he enjoys traveling, playing outdoor sports, and exploring board games.
James Poquiz is a Data Scientist with AWS Professional Services based in Orange County, California. He has a BS in Computer Science from the University of California, Irvine and has several years of experience working in the data domain having played many different roles. Today he works on implementing and deploying scalable ML solutions to achieve business outcomes for AWS clients.

Build a video insights and summarization engine using generative AI wi …

Professionals in a wide variety of industries have adopted digital video conferencing tools as part of their regular meetings with suppliers, colleagues, and customers. These meetings often involve exchanging information and discussing actions that one or more parties must take after the session. The traditional way to make sure information and actions aren’t forgotten is to take notes during the session, a manual and tedious process that can be error-prone, particularly in a high-activity or high-pressure scenario. Furthermore, these notes are usually personal and not stored in a central location, which is a lost opportunity for businesses to learn what does and doesn’t work, as well as how to improve their sales, purchasing, and communication processes.
This post presents a solution where you can upload a recording of your meeting (a feature available in most modern digital communication services such as Amazon Chime) to a centralized video insights and summarization engine. This engine uses artificial intelligence (AI) and machine learning (ML) services and generative AI on AWS to extract transcripts, produce a summary, and provide a sentiment for the call. The solution notes the logged actions per individual and provides suggested actions for the uploader. All of this data is centralized and can be used to improve metrics in scenarios such as sales or call centers. Many commercial generative AI solutions available are expensive and require user-based licenses. In contrast, our solution is an open-source project powered by Amazon Bedrock, offering a cost-effective alternative without those limitations.
This solution can help your organization’s sales, sales engineering, and support functions become more efficient and customer-focused by reducing the need to take notes during customer calls.
Use case overview
The organization in this scenario has noticed that during customer calls, some actions often get skipped due to the complexity of the discussions, and that there might be potential to centralize customer data to better understand how to improve customer interactions in the long run. The organization already records sessions in video format, but these videos are often kept in individual repositories, and a review of the access logs has shown that employees rarely use them in their day-to-day activities.
To increase efficiency, reduce the load, and gain better insights, this solution looks at how to use generative AI to analyze recorded videos and provide employees with valuable insights relating to their calls. It also supports audio files so you have flexibility around the type of call recordings you use. Generated call transcripts and insights include conversation summary, sentiment, a list of logged actions, and a set of suggested next best actions. These insights are stored in a central repository, unlocking the ability for analytics teams to have a single view of interactions and use the data to formulate better sales and support strategies.
Organizations typically can’t predict their call patterns, so the solution relies on AWS serverless services to scale during busy times. This enables you to keep up with peak demands, but also scale down to reduce costs during times such as seasonal holidays when the sales, engineering, and support teams are away.
This post provides guidance on how you can create a video insights and summarization engine using AWS AI/ML services. We walk through the key components and services needed to build the end-to-end architecture, offering example code snippets and explanations for each critical element that helps achieve the core functionality. This approach should enable you to understand the underlying architectural concepts and give you the flexibility to either integrate these components into existing workloads or use them as a foundation to build a new workload.
Solution overview
The following diagram illustrates the pipeline for the video insights and summarization engine.

To enable the video insights solution, the architecture uses a combination of AWS services, including the following:

Amazon API Gateway is a fully managed service that makes it straightforward for developers to create, publish, maintain, monitor, and secure APIs at scale.
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.
Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability.
AWS Lambda is an event-driven compute service that lets you run code for virtually any type of application or backend service without provisioning or managing servers. You can invoke Lambda functions from over 200 AWS services and software-as-a-service (SaaS) applications.
Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading scalability, data availability, security, and performance. You can use Amazon S3 to securely store objects and also serve static websites.
Amazon Transcribe is an automatic speech recognition (ASR) service that makes it straightforward for developers to add speech-to-text capability to their applications.

For integration between services, we use API Gateway as an event trigger for our Lambda function, and DynamoDB as a highly scalable database to store our customer details. Finally, video or audio files uploaded are stored securely in an S3 bucket.
The end-to-end solution for the video insights and summarization engine starts with the UI. We build a simple static web application hosted in Amazon S3 and deploy an Amazon CloudFront distribution to serve the static website for low latency and high transfer speeds. We use CloudFront origin access control (OAC) to secure Amazon S3 origins and permit access to the designated CloudFront distributions only. With Amazon Cognito, we are able to protect the web application from unauthenticated users.
We use API Gateway as the entry point for real-time communications between the frontend and backend of the video insights and summarization engine, while controlling access using Amazon Cognito as the authorizer. With Lambda integration, we can create a web API with an endpoint to the Lambda function.
To start the workflow, upload a raw video file directly into an S3 bucket with the pre-signed URL given through API Gateway and a Lambda function. The uploaded video is fed into Amazon Transcribe, which converts the speech of the video into a video transcript in text format. Finally, we use large language models (LLMs) available through Amazon Bedrock to summarize the video transcript and extract insights from the video content.
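The post doesn’t show the pre-signed URL step in code; a minimal sketch of the Lambda function behind API Gateway, assuming a placeholder bucket name and a hypothetical fileName field in the request body, might look like this:

import json
import boto3

s3_client = boto3.client('s3')
UPLOAD_BUCKET = 'video-insights-uploads'  # placeholder bucket name

def lambda_handler(event, context):
    # The frontend asks for a time-limited URL it can PUT the recording to.
    object_key = json.loads(event['body'])['fileName']  # hypothetical request field
    presigned_url = s3_client.generate_presigned_url(
        'put_object',
        Params={'Bucket': UPLOAD_BUCKET, 'Key': object_key},
        ExpiresIn=3600,
    )
    return {'statusCode': 200, 'body': json.dumps({'uploadUrl': presigned_url})}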
The solution stores uploaded videos and video transcripts in Amazon S3, which offers durable, highly available, and scalable data storage at a low cost. We also store the video summaries, sentiments, insights, and other workflow metadata in DynamoDB, a NoSQL database service that allows you to quickly keep track of the workflow status and retrieve relevant information from the original video.
We also use Amazon CloudWatch and Amazon EventBridge to monitor every component of the workflow in real time and respond as necessary.
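As one possible shape of that monitoring, not shown in the post itself, an EventBridge rule could react to Amazon Transcribe job state changes and route completions or failures to a notification Lambda function (the rule name and target ARN below are placeholders):

import json
import boto3

events_client = boto3.client('events')

# Placeholder rule that matches Amazon Transcribe job state changes.
events_client.put_rule(
    Name='video-insights-transcribe-state-change',
    EventPattern=json.dumps({
        'source': ['aws.transcribe'],
        'detail-type': ['Transcribe Job State Change'],
        'detail': {'TranscriptionJobStatus': ['COMPLETED', 'FAILED']},
    }),
    State='ENABLED',
)
events_client.put_targets(
    Rule='video-insights-transcribe-state-change',
    Targets=[{'Id': 'notify-lambda', 'Arn': 'arn:aws:lambda:us-east-1:111122223333:function:notify'}],  # placeholder ARN
)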
AI/ML workflow
In this post, we focus on the workflow using AWS AI/ML services to generate the summarized content and extract insights from the video transcript.
Starting with the Amazon Transcribe StartTranscriptionJob API, we transcribe the original video stored in Amazon S3 into a JSON file. The following code shows an example of this using Python:

import boto3

transcribe_client = boto3.client('transcribe')

job_args = {
    'TranscriptionJobName': jobId,
    'Media': {'MediaFileUri': media_uri},
    'MediaFormat': media_format,
    'LanguageCode': language_code,
    'Subtitles': {'Formats': ['srt']},
    'OutputBucketName': output_bucket_name,
    'OutputKey': jobId + ".json"
}
if vocabulary_name is not None:
    job_args['Settings'] = {'VocabularyName': vocabulary_name}
response = transcribe_client.start_transcription_job(**job_args)

The following is an example of our workload’s Amazon Transcribe output in JSON format:

{
    "jobName": "a37f0f27-0908-45eb-8d98-8efc3a9d4590-1698392975",
    "accountId": "8469761*****",
    "results": {
        "transcripts": [{
            "transcript": "Thank you for calling, my name is Ivy. Can I have your name?..."
        }],
        "items": [{
            "start_time": "7.809", "end_time": "8.21",
            "alternatives": [{
                "confidence": "0.998", "content": "Thank"
            }],
            "type": "pronunciation"
        },
        ...
        ]
    },
    "status": "COMPLETED"
}

As the output from Amazon Transcribe is created and stored in Amazon S3, we use Amazon S3 Event Notifications to invoke a Lambda function when the transcription job is finished and a video transcript file object has been created.
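The handler itself isn’t listed in the post; a minimal sketch of a Lambda function consuming that S3 event notification and extracting the plain-text transcript (handler and variable names are illustrative) follows:

import json
import boto3

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    # The S3 event notification carries the bucket and key of the new transcript object.
    record = event['Records'][0]['s3']
    bucket = record['bucket']['name']
    key = record['object']['key']

    # Load the Amazon Transcribe output and pull out the plain-text transcript.
    obj = s3_client.get_object(Bucket=bucket, Key=key)
    transcribe_output = json.loads(obj['Body'].read())
    raw_text = transcribe_output['results']['transcripts'][0]['transcript']

    # raw_text is then passed to the Amazon Bedrock summarization shown in the next step.
    return {'statusCode': 200, 'body': json.dumps({'transcriptCharacters': len(raw_text)})}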
In the next step of the workflow, we use LLMs available through Amazon Bedrock. LLMs are neural network-based language models containing hundreds of millions to over a trillion parameters. The ability to generate content has resulted in LLMs being widely utilized for use cases such as text generation, summarization, translation, sentiment analysis, conversational chatbots, and more. For this solution, we use Anthropic’s Claude 3 on Amazon Bedrock to summarize the original text, get the sentiment of the conversation, extract logged actions, and suggest further actions for the sales team. In Amazon Bedrock, you can also use other LLMs for text summarization such as Amazon Titan, Meta Llama 3, and others, which can be invoked using the Amazon Bedrock API.
As shown in the following Python code to summarize the video transcript, you can call the InvokeModel API to invoke the specified Amazon Bedrock model and run inference using the input provided in the request body:

import json
import boto3

# Amazon Bedrock runtime client used to invoke the model
bedrock = boto3.client('bedrock-runtime')

modelId = 'anthropic.claude-3-sonnet-20240229-v1:0'
accept = 'application/json'
contentType = 'application/json'

prompt_template = """
The following is the transcript from one of our sales representatives and our customer.
The AI is a tool that the sales representative uses to obtain a brief summary of what the conversation was about. The AI based this summary on the contents of the conversation and does not make up events that did not happen.
The transcript is:
<text>
{}
</text>
What is the 2 paragraphs summary of the conversation?
"""

PROMPT = prompt_template.format(raw_text)

body = json.dumps(
    {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": PROMPT}
                ],
            }
        ],
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "temperature": 0.1,
        "top_p": 0.9
    }
)
response = bedrock.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
response_body = json.loads(response["body"].read())
summary = response_body["content"][0]["text"]

You can invoke the endpoint with different parameters defined in the payload to impact the text summarization:

temperature – temperature is used in text generation to control the level of randomness of the output. A lower temperature value results in a more conservative and deterministic output; a higher temperature value encourages more diverse and creative outputs.
top_p – top_p, also known as nucleus sampling, is another parameter to control the diversity of the summary text. It indicates the cumulative probability threshold used to select the next token during the text generation process. Lower values of top_p result in a narrower selection of tokens with high probabilities, leading to more deterministic outputs. Conversely, higher values of top_p introduce more randomness and diversity into the generated summaries.

Although there’s no universal optimal combination of top_p and temperature for all scenarios, in the preceding code, we demonstrate sample values with high top_p and low temperature in order to generate summaries focused on key information, maintaining fidelity to the original video transcript while still introducing some degree of wording variation.
The following is another example of using Anthropic’s Claude 3 model through the Amazon Bedrock API to provide suggested actions to sales representatives based on the video transcript:

prompt_template = """
The following is the transcript from one of our sales representatives and our customer.
The AI is a tool that the sales representative uses to look into what additional actions they can use to increase sales after the session. The AI bases the suggested actions on the contents of the conversation and what it thinks might help increase the customers satisfaction and loyalty.

The transcript is:
<text>
{}
</text>

Using the transcript above, provide a bullet point format for suggested actions the sales representative could do to increase follow on sales.
"""

PROMPT = prompt_template.format(raw_text)

body = json.dumps(
    {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": PROMPT}
                ],
            }
        ],
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "temperature": 0.1,
        "top_p": 0.9
    }
)

response = bedrock.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
response_body = json.loads(response["body"].read())
suggested_actions = response_body["content"][0]["text"]

After we successfully generate video summaries, sentiments, logged actions, and suggested actions from the original video transcript, we store these insights in a DynamoDB table, which is then updated in the UI through API Gateway.
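The persistence code isn’t included in the post; a minimal sketch of that DynamoDB write, assuming a placeholder table named video-insights keyed by the transcription job ID, could look like the following:

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('video-insights')  # placeholder table name

# Persist the generated insights, keyed by the transcription job ID.
# sentiment and logged_actions are assumed to come from additional Bedrock
# prompts similar to the two shown above (not included in this post).
table.put_item(
    Item={
        'jobId': jobId,
        'summary': summary,
        'suggestedActions': suggested_actions,
        'sentiment': sentiment,
        'loggedActions': logged_actions,
        'status': 'COMPLETED',
    }
)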
The following screenshot shows a simple UI for the video insights and summarization engine. The frontend is built on Cloudscape, an open source design system for the cloud. On average, it takes less than 5 minutes and costs no more than $2 to process 1 hour of video, assuming the video’s transcript contains approximately 8,000 words.

Future improvements
The solution in this post shows how you can use AWS services with Amazon Bedrock to build a cost-effective and powerful generative AI application that allows you to analyze video content and extract insights to help teams become more efficient. This solution is just the beginning of the value you can unlock with AWS generative AI and broader ML services.
One example of how this solution could be taken further is to expand the scope to help tackle some of the logged actions from calls. The addition of services such as Amazon Bedrock Agents could help automate some of the responses, such as forwarding relevant documentation like product specifications, price lists, or even a simple recap email. All of these could save effort and time, enabling you to focus more on value-added activities.
Similarly, the centralization of all this data could allow you to create an analytics layer on top of a centralized database to help formulate more effective sales and support strategies. This data is usually lost or misplaced within organizations because people prefer different methods for note collection. The proposed solution gives you the freedom to centralize data but also augment organization data with the voice of the customer. For example, the analytics team could analyze what employees did well in calls that have a positive sentiment and offer training or guidance to help everyone achieve more positive customer interactions.
Conclusion
In this post, we described how to create a solution that ingests video and audio files to create powerful, actionable, and accurate insights that an organization can use through the power of Amazon Bedrock generative AI capabilities on AWS. The insights provided can help reduce the undifferentiated heavy lifting that customer-facing teams encounter, and also provide a centralized dataset of customer conversations that an organization can use to further improve performance.
For further information on how you can use Amazon Bedrock for your workloads, see Amazon Bedrock.

About the Authors
Simone Zucchet is a Solutions Architect Manager at AWS. With over 6 years of experience as a Cloud Architect, Simone enjoys working on innovative projects that help transform the way organizations approach business problems. He helps support large enterprise customers at AWS and is part of the Machine Learning TFC. Outside of his professional life, he enjoys working on cars and photography.
Vu San Ha Huynh is a Solutions Architect at AWS. He has a PhD in computer science and enjoys working on different innovative projects to help support large enterprise customers.
Adam Raffe is a Principal Solutions Architect at AWS. With over 8 years of experience in cloud architecture, Adam helps large enterprise customers solve their business problems using AWS.
Ahmed Raafat is a Principal Solutions Architect at AWS, with 20 years of field experience and a dedicated focus of 6 years within the AWS ecosystem. He specializes in AI/ML solutions. His extensive experience spans various industry verticals, making him a trusted advisor for numerous enterprise customers, helping them seamlessly navigate and accelerate their cloud journey.

Email Capture Software: The 15 Best Tools for Ecommerce (Reviewed)

If you’re in ecommerce, you know email isn’t just a channel – it’s a revenue machine! 

In fact, email marketing can account for up to 30% of total revenue for many brands, outpacing social media and paid ads in impact. And with an ROI of $42 for every $1 spent, email is basically the gift that keeps on giving.

But to see those results, you’ve got to build your list—and that’s where email capture software comes in. 

From visitor identification to pop-ups that sense when a visitor is about to leave to forms that feel relevant and well-timed, the latest tools make it easier than ever to turn visitors into loyal customers.

Ready to up your email game? We are going to break down not just the best email capture software out there, but we’re also going to look at how to use it, who should use it, and so much more. 

Let’s dive in!

Want to skip ahead and just see our email capture software reviews? 

Customers.ai

OptinMonster

ConvertFlow

Leadpages

Sleeknote

Justuno

Wishpond

Thrive Leads

JotForm

Mail Munch

Bloom Email Opt-in

Privy

Wisepops

OptiMonk

Poptin

See Who Is On Your Site Right Now!

Get names, emails, phone numbers & more.

Try it Free, No Credit Card Required

Start Your Free Trial

What is Email Capture Software?

Before we get into the specific tools, it’s important to understand what email capture software is and why it’s a must for ecommerce marketers.

Email capture software is a tool that helps businesses collect email addresses from website visitors and potential customers. These tools use forms, pop-ups, and lead magnets to encourage visitors to subscribe, allowing brands to build their email lists and engage customers with targeted campaigns. 

Beyond the basics, the best email capture tools offer advanced features like A/B testing, behavioral triggers, and integrations with your existing marketing stack, making it easier to capture high-quality leads. 

Instead of relying on one-size-fits-all forms, these tools let you customize how, when, and where you capture emails, ensuring you’re reaching your audience at just the right moment—like when they’re about to abandon their cart or after they’ve spent some time browsing.

Email capture software is essential for ecommerce and marketing teams looking to drive more conversions, boost revenue, and create a more personalized customer experience.

What does Email Capture Software Do?

Seems like a silly question right? Isn’t there just a straightforward answer?

Yes and no. At its core, email capture software is all about collecting email addresses from your website visitors. 

But it’s not just about slapping a form on your homepage and hoping people sign up. Modern email capture tools use smart triggers, timing, and personalization to connect with customers in ways that feel natural and engaging.

For example, let’s say you’re running an online boutique. A shopper visits the site, browses a few items, adds something to their cart, but leaves without buying or giving you their information. 

With an email capture tool like Customers.ai, you can grab their email and put them into an abandoned cart flow. Maybe the flow contains an email with a 10% discount that they love, and they immediately come back to the site and make a purchase.

Email capture tools can do a lot of things – they can recognize repeat visitors, tailor sign-up offers based on browsing behavior, and even integrate directly with your CRM or email marketing platform to start campaigns right away. 

Ultimately, they help you capture interest when it’s at its peak and move shoppers down the funnel faster.

Why is Email Capture Software Essential for Marketers?

For ecommerce marketers, capturing emails isn’t just a tactic—it’s a key to driving sales and staying connected with customers. 

But it’s not just about adding people to a list or finding another way to spam their inboxes. It’s about building relationships, boosting conversions, and creating repeat customers. 

Here are five reasons why email capture software is one of the smartest investments any ecommerce brand can make:

1. Boosts Revenue Directly

With email campaigns driving up to 30% of revenue for many ecommerce brands, capturing emails is one of the most reliable ways to secure consistent, long-term revenue. 

Unlike social or paid ads that can be here one day and gone the next, an email list lets you stay directly in touch with your customers.

2. Increases Conversions with Targeted Campaigns

On average, segmented email campaigns have a 14% higher open rate and a 100% increase in click-through rates compared to non-segmented ones. 

By capturing emails, you’re setting up a pipeline for personalized follow-up campaigns that convert. Abandoned cart reminders, product recommendations, and exclusive offers can all be tailored to individual preferences, nudging customers toward a purchase.

3. Builds Customer Loyalty and Repeat Purchases

Customers who receive personalized offers based on their preferences are more likely to return and buy again. 

And since repeat customers spend 67% more than new ones, capturing emails lets you create the kind of ongoing engagement that builds brand loyalty and increases lifetime value.

4. Reduces Dependence on Third-Party Channels

With algorithms constantly changing and the cost of social ads rising, having a direct line to your customers is more important than ever. 

Email capture software helps build a first-party data source that you fully control, so you’re not at the mercy of platform shifts.

5. Supports Data-Driven Decision-Making

Email capture tools often provide data on which forms work best, which offers are most popular, and when customers are most likely to engage. 

This data helps you refine your approach, creating campaigns that feel more relevant and hit the mark more often. 

Email capture software helps you do more than just build a list—it keeps your brand connected with customers, drives real revenue, and makes your marketing smarter. For ecommerce brands, it’s one of the best tools to have in your kit.

Top Features to Look for in Email Capture Tools

When it comes to email capture tools, not everything is as it seems. Each tool comes with its own set of features, and finding the right one means focusing on the features that truly fit your goals. 

For instance, if your priority is reducing cart abandonment, look for tools with strong exit-intent pop-up capabilities. On the other hand, if you want more detailed customer data, lead segmentation and customer journey tracking features are a must. 

Taking the time to choose a tool that matches your needs can make all the difference in how well it performs for your brand.

1. Behavioral Triggers

Look for options to trigger forms based on visitor behavior, like exit intent, scroll depth, or time on page, so you can capture attention without interrupting the experience.

2. Customizable Forms and Pop-Ups

The tool should allow you to design forms and pop-ups that match your brand’s look and feel, making them feel like a natural part of your website.

3. Anonymous Visitor Tracking

Some tools allow you to gather insights on anonymous visitors, tracking behaviors like page visits, time spent, and interactions. This data can help you understand visitor intent and prioritize high-interest leads.

4. Mobile Optimization

Since so much web traffic comes from mobile, your email capture forms must be responsive and look great on any device.

5. Integration with Email Marketing Platforms

A seamless connection to tools like Klaviyo, Mailchimp, or HubSpot lets you automate follow-up sequences instantly.

6. Custom Segmentation Based on On-Site Behavior

Some tools offer custom segmentation that groups visitors based on specific actions—like clicking certain links or viewing particular products—allowing you to create highly relevant follow-up campaigns tailored to each visitor’s interests.

7. Exit-Intent Pop-Ups

Exit-intent technology detects when a visitor is about to leave your site, giving you a last chance to capture their email with a tailored message or offer.

8. Customer Journey Tracking

Customer journey tracking lets you see the entire path a visitor takes on your site – from initial entry to purchase or exit – helping you identify key touchpoints where visitors are most engaged.

9. Analytics and Reporting

Access to data on which forms convert best, who’s signing up, and when they’re most engaged helps you fine-tune your strategy.

10. Compliance Tools

Look for built-in compliance features, like GDPR checkboxes and cookie consent banners, to ensure you’re collecting data responsibly and legally.

When choosing an email capture tool, it’s important to make sure you find one with features that fit your goals. By focusing on what matters most to your strategy, you’ll set yourself up to capture more leads and keep your audience connected.

Best Email Capture Software for 2025 (Reviewed)

We’ve talked about what email capture software is and why it’s important to have in your ecommerce toolset — now, let’s check out the best tools available. 

Below, we’ve reviewed 15 top email capture software tools for ecommerce marketers, highlighting their features, strengths, and what makes them stand out. Here’s what you need to know:

Note: Pricing info is as of October 2024

1. Customers.ai

URL: https://www.customers.ai

What it Does: Customers.ai is an AI-powered email capture and remarketing tool designed to help businesses capture high-intent visitors, track the customer journey, and nurture leads across multiple channels through its integrations with Klaviyo, Shopify, Meta, and more. It automates follow-ups via email and sends audiences to ad remarketing platforms to help convert leads into customers. It’s especially useful for ecommerce and DTC businesses, providing detailed visitor tracking and segmentation.

Pricing: Customers.ai offers flexible pricing with a free trial and paid plans starting at $99/month.

Rating: ★★★★★ (4.8/5)

Customers.ai Email Capture Software Reviews:

Customers.ai consistently receives high marks from users for its ease of use, powerful email capture capabilities, and all-in-one marketing automation. Here’s what real users have to say:

Ease of Use: Many users appreciate how intuitive the platform is. One G2 reviewer mentioned, “Customers.ai makes lead capture and automation a breeze. I was able to set up my campaigns in minutes without needing any technical skills.” This ease of use is a common theme, especially for small businesses and teams without dedicated technical resources.

AI-Powered Automations: A key standout is Customers.ai’s AI-driven tools. A reviewer on Trustpilot noted, “The AI automations have drastically reduced the time we spend managing our lead follow-ups. It captures our visitors and instantly sends them into the right workflows, boosting our conversion rates significantly.”

Lead Identification and Retargeting: Customers also rave about the ability to not only capture leads but to take action with them. A Capterra review said, “The visitor identification and retargeting features are incredible. We can now track anonymous visitors, identify who they are, and remarket to them via ads and email. It’s helped us capture leads we would have otherwise lost.”

Customer Support: Several reviewers have praised the platform’s customer support. One G2 reviewer shared, “The support team is always responsive and ready to help. Whenever we hit a snag, they’ve been there to guide us through, making sure we get the most out of the platform.”

2. OptinMonster

URL: https://www.optinmonster.com

What it Does: OptinMonster specializes in building email capture forms such as pop-ups, slide-ins, and banners to convert website visitors into subscribers and customers. It’s known for its A/B testing features and exit-intent technology.

Pricing: Plans start at $9/month, billed annually.

Rating: ★★★★☆ (4.5/5)

OptinMonster Email Capture Software Reviews:

OptinMonster is widely praised for its conversion optimization features and exit-intent technology.

Lead Conversion: A G2 reviewer shared, “OptinMonster’s exit-intent technology has saved us thousands in lost leads. We’ve been able to capture users who were about to leave and turn them into subscribers or customers.”

User Interface: One user on Trustpilot commented, “The drag-and-drop builder is easy to use, and the templates made it simple to get started without needing a designer.”

Flexibility: On Capterra, a reviewer mentioned, “The flexibility to run different campaigns—pop-ups, slide-ins, banners—has given us a lot of creative control over how we capture leads.”

3. ConvertFlow

URL: https://www.convertflow.com

What it Does: ConvertFlow is a lead generation platform that helps marketers create personalized on-site experiences, including pop-ups, forms, and quizzes, to capture and convert leads. It integrates with various CRM and email marketing platforms to automate follow-ups and email capture.

Pricing: Plans start at $99/month with a 14-day free trial.

Rating: ★★★★☆ (4.6/5)

ConvertFlow Email Capture Software Reviews:

ConvertFlow earns praise for its personalization features and seamless integrations.

Customizability: One G2 user wrote, “The ability to create personalized funnels and on-site experiences is a game-changer for us. ConvertFlow integrates well with our email platform, allowing us to build highly targeted email capture forms.”

Ease of Use: Another reviewer on Capterra mentioned, “We had our pop-ups and lead forms running in no time, and it was super easy to integrate with our CRM.”

Support: Trustpilot reviews frequently mention the helpful customer support, with one stating, “ConvertFlow’s support team walked us through every step of the setup process.”

4. Leadpages

URL: https://www.leadpages.com

What it Does: Leadpages is a landing page builder designed to help businesses capture emails through high-converting landing pages, pop-ups, and alert bars. It integrates with a wide variety of marketing tools.

Pricing: Plans start at $49/month with a 14-day free trial.

Rating: ★★★★☆ (4.4/5)

Leadpages Email Capture Software Reviews:

Leadpages is highly regarded for its landing page builder and conversion tracking tools.

Landing Page Quality: A Capterra user said, “Leadpages makes it simple to create beautiful, high-converting landing pages. We’ve seen a huge boost in our email capture since using the platform.”

Integrations: One G2 reviewer noted, “The integrations with email and CRM systems allow us to instantly follow up with leads, which is essential for our business.”

Support: On Trustpilot, a reviewer commented, “Leadpages’ customer service team is phenomenal. They’re always willing to help us optimize our pages and improve our conversion rates.”

5. Sleeknote

URL: https://www.sleeknote.com

What it Does: Sleeknote helps ecommerce sites convert visitors into leads through personalized pop-ups, slide-ins, and banners. It focuses on capturing emails and reducing cart abandonment with advanced targeting options.

Pricing: Plans start at $59/month with a 7-day free trial.

Rating: ★★★★☆ (4.6/5)

Sleeknote Email Capture Software Reviews:

Sleeknote is widely recognized for its ecommerce-focused email capture solutions and personalized pop-ups.

Pop-Up Customization: A reviewer on G2 said, “Sleeknote’s customization options are fantastic. We’re able to tailor the pop-ups to match our brand and message, making them far more effective.”

Targeting Options: A user on Capterra noted, “The advanced targeting rules let us show the right message to the right visitor, which has boosted our conversion rates significantly.”

Support: On Trustpilot, a reviewer wrote, “Sleeknote’s support team is always there to help us set up campaigns and offer advice on best practices.”

6. Justuno

URL: https://www.justuno.com

What it Does: Justuno is a conversion optimization platform that provides on-site pop-ups, banners, and form builders to capture emails and grow lead lists. It integrates with popular ecommerce and marketing platforms.

Pricing: Plans start at $29/month with a 14-day free trial.

Rating: ★★★★☆ (4.5/5)

Justuno Email Capture Software Reviews:

Justuno is known for its conversion optimization and email capture features for ecommerce businesses.

Conversion Boost: A reviewer on G2 noted, “Justuno’s AI-driven pop-ups have helped us increase conversions and grow our email list significantly.”

Personalization: A Capterra user said, “The personalization and targeting options are fantastic. We’re able to show different offers to different segments of our audience, which has improved our results.”

Support: A Trustpilot review highlighted, “Their support team is always responsive and goes above and beyond to help us optimize our campaigns.”

7. Wishpond

URL: https://www.wishpond.com

What it Does: Wishpond is a lead generation platform that includes landing pages, pop-ups, forms, and contests to capture emails. It also provides email marketing and automation tools to help nurture those leads.

Pricing: Plans start at $49/month with a 14-day free trial.

Rating: ★★★★☆ (4.4/5)

Wishpond Email Capture Software Reviews:

Wishpond is popular for its email capture, landing page builder, and email marketing integration.

All-in-One Platform: A G2 reviewer said, “Wishpond makes it easy to create landing pages, capture leads, and follow up with email automation—all in one platform.”

Ease of Use: On Capterra, a user mentioned, “We love how easy it is to create campaigns and landing pages with Wishpond. It’s user-friendly and packed with features.”

8. Thrive Leads

URL: https://thrivethemes.com

What it Does: Thrive Leads is a WordPress plugin focused on building email lists through customizable opt-in forms like pop-ups, slide-ins, and in-line forms. It includes advanced A/B testing, responsive design, and SmartLinks for personalized user targeting.

Pricing: Available as part of the Thrive Suite at $299/year, which includes all Thrive Themes tools.

Rating: ★★★★☆ (4.5/5)

Thrive Email Capture Software Reviews:

Customization: Users on G2 praise its customization options and extensive template library, which lets you easily create high-converting forms.

A/B Testing: Thrive Leads’ advanced A/B testing feature is frequently highlighted as an effective way to optimize conversion rates by testing different form types and triggers.

Ease of Use: Many reviewers find Thrive Leads intuitive for WordPress, with responsive design and mobile-specific customization options for an optimized user experience.

9. JotForm

URL: https://www.jotform.com

What it Does: JotForm is a form builder that allows users to create custom forms for email capture, surveys, and more. It integrates with email marketing services and CRMs, making it easy to collect and manage leads.

Pricing: Free for basic features, with paid plans starting at $34/month.

Rating: ★★★★☆ (4.7/5)

JotForm Email Capture Software Reviews:

JotForm is known for its versatile form-building capabilities and ease of use.

Custom Forms: One G2 reviewer noted, “We’ve used JotForm to create everything from lead capture forms to customer surveys. The flexibility and customization options are unmatched.”

Ease of Use: A Capterra user mentioned, “Building forms on JotForm is incredibly easy, even for non-technical users. Plus, the integrations with CRMs and email platforms are seamless.”

Customer Service: A Trustpilot review praised JotForm’s support, saying, “The customer support team is always quick to respond and offers real solutions.”

10. Mail Munch

URL: https://www.mailmunch.com

What it Does: MailMunch provides a range of lead capture forms, including pop-ups, embedded forms, and landing pages, with integrations for major email marketing platforms. It’s designed to help businesses grow their email lists through customizable opt-in forms.

Pricing: Free plan available; paid plans start at $13.99/month.

Rating: ★★★★☆ (4.4/5)

Mail Munch Email Capture Software Reviews:

Affordability: MailMunch receives positive feedback on Capterra for its competitive pricing and comprehensive free version.

Integrations: Users appreciate its seamless integration with popular email marketing platforms, allowing quick and easy list building.

User Experience: MailMunch is frequently noted for being user-friendly, with simple drag-and-drop editing and helpful customer support.

11. Bloom

URL: https://www.elegantthemes.com/plugins/bloom/

What it Does: Bloom by Elegant Themes is an email opt-in plugin that offers a variety of customizable form types, including pop-ups, slide-ins, and inline forms. It’s tailored for WordPress users and offers targeted display settings.

Pricing: Included with the Elegant Themes membership at $89/year.

Rating: ★★★★☆ (4.3/5)

Bloom Email Capture Software Reviews:

Ease of Use: Users on G2 appreciate Bloom’s straightforward interface and pre-designed templates, which make it easy to get started.

Targeted Triggers: Reviewers highlight its targeting options that allow users to control which forms display on specific pages or posts, making it useful for personalized marketing.

Value: Many users find Bloom offers good value as part of the Elegant Themes membership, giving access to multiple tools in addition to Bloom.

12. Privy

URL: https://www.privy.com

What it Does: Privy offers pop-ups, email capture forms, and exit-intent banners, primarily targeting ecommerce stores. It also includes email marketing features, making it a comprehensive tool for lead generation and follow-ups.

Pricing: Free plan available, with paid plans starting at $15/month.

Rating: ★★★★☆ (4.6/5)

Privy Email Capture Software Reviews:

Privy is well-liked for its ecommerce focus and ability to reduce cart abandonment through email capture.

Cart Abandonment: A G2 reviewer said, “Privy’s pop-ups and exit-intent features have helped us reduce cart abandonment and recover lost sales.”

Ease of Use: On Capterra, a user shared, “It’s super easy to create pop-ups and email capture forms without needing a developer. Plus, it integrates seamlessly with our Shopify store.”

Support: A Trustpilot review mentioned, “Privy’s support team has been excellent at helping us optimize our campaigns and get more conversions.”

13. Wisepops

URL: https://wisepops.com

What it Does: Wisepops is a lead capture tool that focuses on creating customizable pop-ups and banners for email list building. It offers advanced targeting and segmentation features and integrates with major email marketing platforms.

Pricing: Starts at $49/month, with additional costs based on visitor volume.

Rating: ★★★★☆ (4.5/5)

Wisepops Email Capture Software Reviews:

Advanced Targeting: Reviewers on Capterra commend Wisepops for its robust targeting options, allowing businesses to show the right message to the right user at the right time.

User-Friendly: Users find the interface easy to navigate, with strong support for setting up complex targeting rules.

Analytics: Wisepops’ built-in analytics receive positive reviews for helping users track the performance of each pop-up and make data-driven adjustments.

14. OptiMonk

URL: https://www.optimonk.com

What it Does: OptiMonk specializes in pop-ups, sticky bars, and other on-site messages designed to capture emails and increase conversions. It offers personalization and A/B testing features to optimize performance.

Pricing: Free plan available, with paid plans starting at $39/month.

Rating: ★★★★☆ (4.6/5)

OptiMonk Email Capture Software Reviews:

OptiMonk is highly regarded for its personalization and A/B testing features in lead capture.

Personalization: A G2 reviewer shared, “The personalization options in OptiMonk allow us to create highly targeted pop-ups that convert better than any generic pop-ups we’ve tried.”

A/B Testing: On Capterra, a user wrote, “The A/B testing features are easy to use and have helped us identify the best-performing lead capture campaigns.”

Support: A Trustpilot review mentioned, “OptiMonk’s support team is always quick to help us troubleshoot and optimize our pop-ups.”

15. Poptin

URL: https://www.poptin.com

What it Does: Poptin offers various email capture forms, including pop-ups, embedded forms, and slide-ins, designed to engage visitors and convert them into subscribers. It features A/B testing, exit-intent technology, and a range of templates.

Pricing: Free plan available; paid plans start at $19/month.

Rating: ★★★★☆ (4.6/5)

Poptin Email Capture Software Reviews:

Ease of Use: Users on G2 highlight Poptin’s ease of use, with customizable templates and a simple editor that makes setting up forms quick and easy.

Customer Support: Many reviewers mention responsive customer support that assists with setup and troubleshooting.

Conversion Optimization: Poptin is praised for its A/B testing and exit-intent features, helping businesses optimize form performance and reduce bounce rates.

With a range of features, pricing options, and customization capabilities, these email capture tools provide flexibility for different business needs and budgets. 

How to Choose Your Email Capture Software

We only listed 15, but with so many email capture software tools out there, finding the right one comes down to knowing what you need most. 

Here’s a checklist of factors to consider as you evaluate options:

Ease of Use: Choose a tool that’s easy to set up and use without needing a lot of technical expertise. An intuitive dashboard and straightforward customization options can save time and get you up and running faster.

Customizability: Look for a tool that lets you fully customize forms, pop-ups, and calls-to-action. The ability to match your brand’s style and tailor messaging will help make email capture feel seamless to your visitors.

Behavioral Triggers: Tools with advanced behavioral triggers, like exit-intent, scroll-based, and time-based pop-ups, can help you capture emails at optimal moments. Make sure the software you choose allows for flexible targeting based on visitor behavior.

Integration with Marketing Platforms: Check if the tool integrates easily with your existing platforms, like your email service provider, CRM, or analytics tools. This integration is key for automating workflows and keeping all your data in one place.

Mobile Responsiveness: With so much traffic coming from mobile devices, make sure the tool’s forms and pop-ups look great on mobile and don’t disrupt the mobile experience.

A/B Testing Capabilities: The ability to run A/B tests on different form designs, messages, and calls-to-action helps you identify what works best. This is crucial for optimizing conversion rates over time.

Visitor Identification Features: If you’re focused on ecommerce, consider tools with visitor identification capabilities. Features like anonymous tracking, IP recognition, or identity resolution give you insights into visitor behavior, even if they don’t immediately convert.

Analytics and Reporting: Strong analytics give you insights into which campaigns are working, who’s signing up, and which pages have the highest conversion rates. Look for tools that offer detailed, actionable data.

Customer Support: Reliable customer support can be a game-changer, especially if you’re integrating the tool into a larger marketing stack. Check if the tool provides live support, documentation, and resources for troubleshooting.

Compliance with Privacy Laws: Make sure the tool has built-in features for compliance with GDPR, CCPA, and other privacy regulations. This might include options like consent checkboxes and clear data policies to keep your brand protected.

Taking the time to assess these factors will ensure that your chosen tool not only captures more emails but also aligns with your goals and seamlessly integrates into your workflow.

Email Capture Software vs. Other Lead Capture Tools

I mentioned earlier that not all email capture tools are the same. It’s also important to understand that email capture software can be different from lead capture software.

While both email capture software and other lead capture tools are designed to turn visitors into leads, their approaches and goals often differ. Email capture software is focused on gathering email addresses specifically to build a long-term subscriber list that can be nurtured with targeted email campaigns, making it a powerful choice for ecommerce brands looking to drive repeat sales and customer loyalty.

On the other hand, other lead capture tools may collect a wider range of information, such as phone numbers or social media profiles, and are often geared toward immediate engagement, like connecting visitors to sales or support teams through live chat or contact forms. 

These tools are valuable for industries where leads may need immediate follow-up or one-on-one communication, such as B2B or service-based businesses.

For ecommerce brands, email capture software typically provides a higher ROI by focusing on building a list that can be regularly engaged with personalized offers and product updates, while other lead capture tools offer flexibility in capturing and acting on leads quickly in different settings.

Why Customers.ai is the Best Email Capture Software

With so many email capture tools on the market, it’s all about finding the one that checks all the right boxes for your business. 

Customers.ai stands out due to our industry-leading capture rates, helping ecommerce brands turn more visitors into subscribers and customers. Plus, with seamless integrations across top ecommerce platforms like Klaviyo and Shopify, along with a powerful API, it fits right into any tech stack without a hitch.

Customers.ai is also top-rated for its ease of use and flexibility, meaning you can set up and customize email capture within minutes! 

If you’re looking for an email capture solution that’s built to drive results, start your free trial with Customers.ai today and see the impact firsthand.

Important Next Steps

See what targeted outbound marketing is all about. Capture and engage your first 500 website visitor leads with Customers.ai X-Ray website visitor identification for free.

Talk and learn about sales outreach automation with other growth enthusiasts. Join Customers.ai Island, our Facebook group of 40K marketers and entrepreneurs who are ready to support you.

Advance your marketing performance with Sales Outreach School, a free tutorial and training area for sales pros and marketers.


Email Capture Software FAQs

Why is email capture important for ecommerce?

Email capture is crucial for ecommerce because it builds a direct line of communication with potential customers. With emails, brands can send personalized offers, abandoned cart reminders, and promotions that drive sales and customer loyalty.

What are the benefits of using email capture software?

Email capture software helps increase conversion rates, build a high-value email list, and support targeted marketing campaigns. It also reduces reliance on third-party platforms by creating a first-party data source brands control.

Can email capture software help reduce cart abandonment?

Yes, email capture software can help reduce cart abandonment by using exit-intent pop-ups to capture emails before a visitor leaves. This allows brands to follow up with abandoned cart reminders, often leading to higher recovery rates.

How does email capture software integrate with other marketing tools?

Most email capture software integrates seamlessly with email marketing platforms, CRMs, and analytics tools. This allows you to automatically add new subscribers to campaigns, segment them, and measure their engagement.

Is email capture software compliant with GDPR and other privacy laws?

Leading email capture software includes features to ensure GDPR, CCPA, and other privacy law compliance, such as consent checkboxes and data management tools. Always check if the tool has built-in compliance options before use.

What features should I look for in email capture software?

Key features include customizable forms, behavioral triggers, A/B testing, mobile responsiveness, integration with email marketing platforms, visitor identification, segmentation, analytics, and GDPR compliance.

What is an exit-intent pop-up in email capture software?

An exit-intent pop-up appears when a visitor is about to leave the website. It detects mouse movements and prompts the visitor to enter their email, offering an incentive to stay or subscribe before they exit.

Can email capture software track anonymous visitors?

Some advanced email capture tools offer visitor identification, tracking anonymous visitors to gather data on behaviors like browsing patterns. This can provide insights into high-intent users who may convert later.

How does email capture software help with retargeting?

Email capture software collects subscriber data, enabling brands to retarget those users through personalized email campaigns, abandoned cart reminders, or exclusive offers that drive conversions.

How can I improve my email capture rate?

To improve your email capture rate, use well-timed pop-ups, offer valuable incentives, and test different form designs. Behavioral triggers like exit-intent and A/B testing help optimize capture methods based on visitor behavior.

What’s the difference between email capture and lead capture software?

Email capture software focuses specifically on collecting email addresses, while lead capture software may also collect phone numbers, job titles, or other information. Email capture is ideal for ecommerce brands aiming to build an engaged email list.

How do behavioral triggers work in email capture software?

Behavioral triggers activate forms or pop-ups based on visitor actions, like scrolling, time spent on a page, or intent to leave. This approach increases capture rates by engaging visitors when they’re most likely to subscribe.

What is A/B testing in email capture software?

A/B testing in email capture software allows you to test different versions of forms or pop-ups to see which design, messaging, or placement converts best. It’s a way to continuously optimize capture rates.

How do I choose the best email capture software for my business?

Look for email capture software that fits your business goals, integrates with your marketing tools, has customizable forms, and includes features like A/B testing, behavioral triggers, and compliance options.

What’s the average ROI of email marketing for ecommerce brands?

On average, email marketing yields an ROI of $42 for every $1 spent, making email capture software a valuable investment for building a list that drives conversions and revenue.

How do pop-up forms differ from embedded forms in email capture?

Pop-up forms appear based on visitor behavior, like when they’re about to leave, while embedded forms are placed directly within a webpage, often as part of a footer or sidebar.

How can email capture software support multi-channel marketing?

By capturing email addresses, the software enables you to reach customers through emails, retargeting ads, and other channels, creating a consistent experience across platforms and increasing engagement.

Is email capture software worth it for small businesses?

Absolutely. For small businesses, email capture software provides a cost-effective way to build a customer base, boost conversions, and create personalized engagement, all of which contribute to sustainable growth.
The post Email Capture Software: The 15 Best Tools for Ecommerce (Reviewed) appeared first on Customers.ai.

LongRAG: A Robust RAG Framework for Long-Context Question Answering

Large Language Models (LLMs) have revolutionized long-context question answering (LCQA), a complex task requiring reasoning over extensive documents to provide accurate answers. While recent long-context LLMs like Gemini and GPT4-128k can process entire documents directly, they struggle with the “lost in the middle” phenomenon, where relevant information in the middle of documents often leads to suboptimal or incorrect responses. Retrieval-Augmented Generation (RAG) systems attempt to address this by using fixed-length chunking strategies, but they face their own limitations. These include the disruption of contextual structure, incomplete information in chunks, and challenges with low evidence density in long documents, where noise can impair the LLMs’ ability to identify key information accurately. These issues collectively hinder the development of reliable LCQA systems.

Multiple approaches have emerged to address the challenges in long-context question answering. Long-context LLM methods fall into two categories: training-based approaches like Position Interpolation, YaRN, and LongLoRA, which offer better performance but require significant resources, and non-fine-tuned methods such as restricted attention and context compression, which provide plug-and-play solutions at lower costs. Traditional RAG systems attempted to improve LLMs’ response quality by utilizing external knowledge sources, but their direct incorporation of retrieved chunks led to incomplete information and noise. Advanced RAG models introduced solutions like filtering retrieved knowledge, implementing chunk-free strategies to preserve semantics, and employing active retrieval mechanisms. Domain-specific fine-tuning has also emerged as a strategy to enhance RAG components, focusing on improving retrieval outcomes and generating more personalized outputs.

Researchers from Institute of Information Engineering, Chinese Academy of Sciences, School of Cyber Security, University of Chinese Academy of Sciences, Tsinghua University and Zhipu AI introduce LongRAG, a comprehensive solution to LCQA challenges through a dual-perspective, robust system paradigm comprising four plug-and-play components: a hybrid retriever, an LLM-augmented information extractor, a CoT-guided filter, and an LLM-augmented generator. The system’s innovative approach addresses both global context understanding and factual detail identification. The long-context extractor employs a mapping strategy to transform retrieved chunks into a higher-dimensional semantic space, preserving contextual relationships, while the CoT-guided filter utilizes Chain of Thought reasoning to provide global clues and precisely filter irrelevant information. This dual-perspective approach significantly enhances the system’s ability to process complex, lengthy contexts while maintaining accuracy. The system’s architecture is complemented by an automated instruction data pipeline for fine-tuning, enabling strong “instruction-following” capabilities and easy domain adaptation.

LongRAG’s architecture consists of four sophisticated components working in harmony. The hybrid retriever employs a dual-encoder structure with sliding windows for chunk segmentation, combining coarse-grained rapid retrieval with fine-grained semantic interaction through FAISS implementation. The LLM-augmented information extractor addresses scattered evidence by mapping retrieved chunks back to source paragraphs using a mapping function, preserving semantic order and contextual relationships. The CoT-guided filter implements a two-stage strategy: first generating Chain of Thought reasoning with a global perspective, then using these insights to evaluate and filter chunks based on their relevance to the question. Finally, the LLM-augmented generator synthesizes the global information and filtered factual details to produce accurate answers. The system’s effectiveness is enhanced through instruction-tuning using 2,600 high-quality data points from LRGinstruction, with models trained using advanced strategies like DeepSpeed and flash attention.
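To make the interplay of these four components more concrete, the following Python sketch outlines how such a dual-perspective pipeline could be wired together. The class and method names (retriever.retrieve, llm.chain_of_thought, and so on) are illustrative assumptions for exposition, not the authors’ actual implementation.

# Illustrative sketch of a LongRAG-style dual-perspective pipeline (assumed interfaces).
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source_paragraph: str  # lets the extractor map a chunk back to its long-context source


def long_rag_answer(question: str, retriever, llm) -> str:
    # 1) Hybrid retrieval: coarse dual-encoder recall plus fine-grained semantic re-ranking.
    chunks = retriever.retrieve(question, top_k=8)

    # 2) LLM-augmented extractor: map chunks back to source paragraphs so the
    #    global context and original semantic order are preserved.
    global_context = llm.extract(question, [c.source_paragraph for c in chunks])

    # 3) CoT-guided filter: generate chain-of-thought clues with a global view,
    #    then keep only the chunks judged relevant in light of those clues.
    cot_clues = llm.chain_of_thought(question, global_context)
    factual_details = [c for c in chunks if llm.is_relevant(question, c.text, cot_clues)]

    # 4) LLM-augmented generator: combine the global view with the filtered details.
    return llm.generate(question, global_context, factual_details)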

LongRAG demonstrates superior performance across multiple comparative dimensions. When compared to long-context LLM methods like LongAlign and LongLoRA, the system achieves higher performance across all datasets, particularly in detecting crucial factual details that other models often miss in mid-document sections. Compared to advanced RAG systems, LongRAG shows a 6.16% improvement over leading competitors like Self-RAG, primarily due to its more robust handling of factual details and complex multi-hop questions. The system’s most dramatic improvement appears in comparison to Vanilla RAG, showing up to a 17.25% performance increase, attributed to its superior preservation of coherent long-context background and structure. Notably, LongRAG’s effectiveness extends across both small and large language models, with fine-tuned ChatGLM3-6B-32k outperforming even non-fine-tuned GPT-3.5-Turbo, demonstrating the system’s robust architecture and effective instruction-following capabilities.

LongRAG emerges as a robust solution in the field of long-context question answering through its innovative dual information perspective approach. The system effectively addresses two critical challenges that have plagued existing methods: the incomplete collection of long-context information and the imprecise identification of factual information in noisy environments. Through comprehensive multidimensional experiments, LongRAG demonstrates not only superior performance over long-context LLMs, advanced RAG methods, and Vanilla RAG but also remarkable cost-effectiveness. The system’s plug-and-play components achieve better results than GPT-3.5-Turbo while using smaller parameter-size LLMs, making it a practical solution for local deployment without relying on expensive API resources.

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our newsletter. Don’t forget to join our 55k+ ML SubReddit.


The post LongRAG: A Robust RAG Framework for Long-Context Question Answering appeared first on MarkTechPost.

Researchers from Intel and Salesforce Propose SynthKG: A Multi-Step Do …

Knowledge Graph (KG) synthesis is gaining traction in artificial intelligence research because it can construct structured knowledge representations from expansive, unstructured text data. These structured graphs have pivotal applications in areas requiring information retrieval and reasoning, such as question answering, complex data summarization, and retrieval-augmented generation (RAG). KGs effectively link and organize information, enabling models to process and answer intricate queries more accurately. Despite these advantages, creating high-quality KGs from large datasets remains challenging due to the need for both coverage and efficiency, which become increasingly difficult to maintain with traditional methods when handling massive amounts of data.

One of the central problems in KG synthesis is reducing the inefficiency in generating comprehensive graphs, especially for large-scale corpora that require complex knowledge representations. Existing KG extraction techniques typically employ large language models (LLMs) capable of advanced processing but can also be computationally prohibitive. These methods generally use zero-shot or few-shot prompt-based approaches to structure KGs, often involving extensive API calls and high costs. These approaches also fall short when handling lengthy documents comprehensively, leading to issues such as incomplete data representation and significant information loss. This creates a gap between the growing demand for effective data synthesis methods and the available KG construction tools, which lack specialization for ontology-free KG evaluation and benchmarking.

In current practice, traditional methods of KG construction rely heavily on LLM prompting to derive knowledge triplets. This single-step, in-context learning approach presents several limitations. For example, the computational demand increases as the corpus grows, and each additional API call to process data increases costs. Also, there is no standardized dataset or evaluation metric for assessing document-level, ontology-free KGs, which creates further challenges for researchers aiming to benchmark the effectiveness of their models. With large-scale applications in mind, there is a compelling need for models that can manage detailed document processing efficiently without compromising data quality.

The Salesforce and Intel Labs researchers introduced SynthKG, a multi-step KG construction workflow that enhances coverage and efficiency. SynthKG breaks down document processing into manageable stages, ensuring that information remains intact by chunking documents and then processing each segment to identify entities, relations, and relevant propositions. A distilled model, Distill-SynthKG, was further developed by fine-tuning a smaller LLM using KGs generated from SynthKG. This distillation reduces the multi-step workflow into a single-step process, significantly reducing computational requirements. With Distill-SynthKG, the need for repeated LLM prompts is minimized, enabling high-quality KG generation with a fraction of the resources required by conventional approaches.

The SynthKG workflow involves document segmentation, which splits each input document into independent, semantically complete chunks. During this chunking process, entity disambiguation is applied to maintain a consistent reference for each entity across segments. For example, if an individual is introduced by full name in one chunk, all future mentions are updated to ensure contextual accuracy. This approach improves the coherence of each segment while preventing the loss of important relationships between entities. The next stage involves relation extraction, where entities and their types are identified and linked based on predefined propositions. Each KG segment is further enriched with a quadruplet format, providing an intermediate, indexable unit for better retrieval accuracy. By structuring each chunk independently, SynthKG avoids redundancy and maintains high-quality data integrity throughout the KG construction process.
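As a rough illustration of this multi-step flow, the sketch below walks a document through chunking, entity disambiguation, and quadruplet extraction. The helper names and the LLM interface are assumptions for exposition; the chunker here is a naive word-count splitter rather than the semantically complete segmentation the paper describes.

# Illustrative sketch of a SynthKG-style multi-step KG synthesis pass (assumed interfaces).

def segment_document(text: str, max_words: int = 300) -> list[str]:
    # Naive segmentation by word count; SynthKG keeps each segment semantically complete.
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def synthesize_kg(document: str, llm) -> list[tuple]:
    quadruplets = []
    entity_registry: dict[str, str] = {}  # canonical names seen so far, for disambiguation

    for chunk in segment_document(document):
        # Entity disambiguation: rewrite mentions so each entity keeps a consistent
        # reference (e.g., the full name introduced earlier) across chunks.
        chunk = llm.resolve_entities(chunk, entity_registry)

        # Propose entities, types, and relations grounded in this chunk.
        for head, relation, tail in llm.extract_propositions(chunk):
            # Store quadruplets (head, relation, tail, source chunk) as indexable
            # units that improve downstream retrieval accuracy.
            quadruplets.append((head, relation, tail, chunk))

    return quadruplets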

Distill-SynthKG has shown substantial improvements over baseline models in experimental settings. For instance, the model generated over 46.9% coverage on MuSiQue and 58.2% on 2WikiMultiHopQA in terms of triplet coverage, outperforming larger models by a margin of up to 6.26% in absolute terms across various test datasets. Regarding retrieval and question-answering tasks, Distill-SynthKG consistently surpassed the performance of even models eight times larger by reducing computational costs while enhancing retrieval accuracy. This efficiency is evident in the Graph+LLM retriever, where the KG model demonstrated a 15.2% absolute improvement in retrieval tasks, particularly when answering multi-hop reasoning questions. These results confirm the efficacy of a structured multi-step approach in maximizing KG coverage and enhancing accuracy without relying on oversized LLMs.

The experimental results highlight the success of Distill-SynthKG in delivering high-performance KG synthesis with lower computational demand. By training smaller models on high-quality document-KG pairs from SynthKG, researchers achieved improved semantic accuracy, resulting in triplet densities consistent across documents of various lengths. Also, the SynthKG model produced KGs with greater triplet density, remaining steady across documents up to 1200 words, demonstrating the workflow’s scalability. Evaluated across benchmarks such as MuSiQue and HotpotQA, the model’s improvements were validated using new KG coverage metrics, which included proxy triplet coverage and semantic matching scores. These metrics further confirmed the model’s suitability for large-scale, ontology-free KG tasks, as it successfully synthesized detailed KGs that supported high-quality retrieval and multi-hop question-answering tasks.

Key Takeaways from the research:

Efficiency: Distill-SynthKG reduces the need for repeated LLM calls by consolidating KG construction into a single-step model, cutting computational costs.

Improved Coverage: Achieved 46.9% triplet coverage on MuSiQue and 58.2% on 2WikiMultiHopQA, outperforming larger models by 6.26% on average across datasets.

Enhanced Retrieval Accuracy: A 15.2% improvement in multi-hop question-answering retrieval accuracy with Graph+LLM retrieval.

Scalability: Maintained consistent triplet density across documents of varying lengths, demonstrating suitability for large datasets.

Broader Applications: The model supports efficient KG generation for various domains, from healthcare to finance, by accurately accommodating ontology-free KGs.

In conclusion, the research findings emphasize the impact of an optimized KG synthesis process that prioritizes coverage, accuracy, and computational efficiency. Distill-SynthKG not only sets a new benchmark for KG generation but also presents a scalable solution that accommodates various domains, paving the way for more efficient retrieval and question-answering frameworks. This approach could have broad implications for advancing AI’s ability to generate and structure large-scale knowledge representations, ultimately enhancing the quality of knowledge-based applications across sectors.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our newsletter. Don’t forget to join our 55k+ ML SubReddit.

The post Researchers from Intel and Salesforce Propose SynthKG: A Multi-Step Document-Level Ontology-Free Knowledge Graphs Synthesis Workflow based on LLMs appeared first on MarkTechPost.

LLMWare Introduces Model Depot: An Extensive Collection of Small Langu …

LLMWare.ai, a pioneer in deploying and fine-tuning Small Language Models (SLMs), announced today the launch of Model Depot on Hugging Face, one of the largest collections of SLMs optimized for Intel PCs. With over 100 models spanning multiple use cases such as chat, coding, math, function calling, and embedding models, Model Depot aims to provide the open-source AI community with an unprecedented collection of the latest SLMs optimized for Intel-based PCs, in Intel’s OpenVINO as well as ONNX formats.

Using LLMWare’s Model Depot combined with LLMWare’s open-source library that provides a complete toolkit for end-to-end development of AI-enabled workflows, developers can create Retrieval Augmented Generation (RAG) and agent-based workflows using SLMs in OpenVINO format for Intel hardware users. OpenVINO is an open-source library for optimizing and deploying deep learning model inferencing capabilities, including large and small language models. Specifically designed to reduce resource demands to efficiently deploy on a range of platforms, including on-device and AI PCs, OpenVINO supports model inferencing on CPUs, GPUs and Intel NPUs. 

Similarly, ONNX provides an open-source format for AI models, both deep learning and traditional ML, with a current focus on the capabilities needed for inferencing. ONNX can be found in many frameworks, tools, and hardware and aims to enable interoperability between different frameworks.

In a recent white paper, LLMWare found that deploying 4-bit quantized small language models (1B-9B parameters) in the OpenVINO format maximizes model inference performance on Intel AI PCs. When tested on a Dell laptop with Intel Core Ultra 9 (Meteor Lake), using a 1.1B parameter BLING-Tiny-Llama model, the OpenVINO quantized format led to inference speeds that are up to 7.6x faster than PyTorch and up to 7.5x faster than GGUF.

The comparison consistently uses LLMWare’s 21-question RAG test, with the processing time measured as the total runtime for all 21 questions.

Detailed information about LLMWare’s testing methodology can be found in the white paper.

LLMWare’s goal is to provide a powerful abstraction layer for working with various inferencing capabilities. By supporting OpenVINO, ONNX and Llama.cpp all in one platform, developers are able to leverage model formats that are most performant with specific hardware capabilities of their intended users. With Model Depot, Intel PC developers are able to access SLMs that are specifically optimized for inferencing on Intel hardware. 
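As a rough sketch of what consuming one of these OpenVINO-packaged SLMs can look like on an Intel PC, the snippet below uses the Hugging Face Optimum Intel integration. The model ID is a placeholder assumption rather than a confirmed Model Depot repository name.

# Minimal sketch: running an OpenVINO-format SLM locally via Optimum Intel.
# Assumes `pip install optimum[openvino]`; the model ID below is a placeholder.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "llmware/example-slm-ov"  # hypothetical OpenVINO-packaged model repository

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id)  # loads the OpenVINO IR for local inference

prompt = "Summarize the key obligations in the attached services agreement."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))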

Providing OpenVINO and ONNX support for the most popular SLMs today, including Microsoft Phi-3, Mistral, Llama, Yi, and Qwen, as well as LLMWare’s specialized function-calling SLIM models designed for multi-step workflows and its RAG-specialized DRAGON and BLING model families, LLMWare gives developers the SLMs to easily and seamlessly build productivity-enhancing workflows that maximize the local capabilities of AI PCs.

Equipped with powerful integrated GPUs and NPUs that provide the hardware capability to run AI apps on-device, AI PCs allow enterprises to deploy many lightweight AI apps locally without exposing sensitive data or necessitating data copies in external systems. This unlocks tremendous benefits in added security, safety, and significant cost savings.

LLMWare also recently announced its strategic collaboration with Intel with its launch of Model HQ in limited release for private preview. Specifically designed for AI PCs with Intel Core Ultra Processors, Model HQ provides an out-of-the-box no-code kit for running, creating and deploying AI-enabled apps with integrated UI/UX and low-code agent workflow for easy app creation. With built-in Chatbot and Document Search and Analysis features, the app comes ready to use, with the ability to launch custom workflows directly on the device. Model HQ also comes with many enterprise-ready security and safety features such as Model Vault for model security checks, Model Safety Monitor for toxicity and bias screening, hallucination detector, AI Explainability data, Compliance and Auditing Toolkit, Privacy Filters and much more. 

“At LLMWare, we believe strongly in lowering the center of gravity in AI to enable local, private, decentralized, self-hosted deployment – with high-quality models and data pipelines optimized for safe, controlled, cost-optimized roll-outs of lightweight, customized RAG, Agent and Chat apps for enterprises of all sizes. We are thrilled to launch the Model Depot collection in open source to expand access to OpenVino and ONNX packaged models to support the AI PC roll-out over the coming months,” said Darren Oberst, Chief Technology Officer of LLMWare.

“Rise of Generative AI unlocks new application experiences that were not available with previous generations of data processing algorithms. Unique combination of powerful AI PC platform and optimization software like OpenVINO is a way to get best characteristics for local and private owned LLMs deployment without thinking of optimization details. LLMWare’s platform goes step further by allowing to use software building blocks and pretrained models to implement data processing within final application and save time to market. Combination of OpenVINO and LLMWare’s platform truly unlocks best performing Generative AI capabilities at the edge for applications,” said Yury Gorbachev, Intel Fellow, and OpenVINO Architect at Intel.

Please visit LLMWare’s Github and Hugging Face sites for its comprehensive library in open source and collection of small language models as well as llmware.ai for the latest white paper and blogs.

Thanks to AI Bloks for the thought leadership and educational article. AI Bloks has supported us in this content.
The post LLMWare Introduces Model Depot: An Extensive Collection of Small Language Models (SLMs) for Intel PCs appeared first on MarkTechPost.

Import data from Google Cloud Platform BigQuery for no-code machine le …

In the modern, cloud-centric business landscape, data is often scattered across numerous clouds and on-site systems. This fragmentation can complicate efforts by organizations to consolidate and analyze data for their machine learning (ML) initiatives.
This post presents an architectural approach to extract data from different cloud environments, such as Google Cloud Platform (GCP) BigQuery, without the need for data movement. This minimizes the complexity and overhead associated with moving data between cloud environments, enabling organizations to access and utilize their disparate data assets for ML projects.
We highlight the process of using Amazon Athena Federated Query to extract data from GCP BigQuery, using Amazon SageMaker Data Wrangler to perform data preparation, and then using the prepared data to build ML models within Amazon SageMaker Canvas, a no-code ML interface.
SageMaker Canvas allows business analysts to access and import data from over 50 sources, prepare data using natural language and over 300 built-in transforms, build and train highly accurate models, generate predictions, and deploy models to production without requiring coding or extensive ML experience.
Solution overview
The solution outlines two main steps:

Set up Amazon Athena for federated queries from GCP BigQuery, which enables running live queries in GCP BigQuery directly from Athena
Import the data into SageMaker Canvas from BigQuery using Athena as an intermediate

After the data is imported into SageMaker Canvas, you can use the no-code interface to build ML models and generate predictions based on the imported data.
You can use SageMaker Canvas to build the initial data preparation routine and generate accurate predictions without writing code. However, as your ML needs evolve or require more advanced customization, you may want to transition from a no-code environment to a code-first approach. The integration between SageMaker Canvas and Amazon SageMaker Studio allows you to operationalize the data preparation routine for production-scale deployments. For more details, refer to Seamlessly transition between no-code and code-first machine learning with Amazon SageMaker Canvas and Amazon SageMaker Studio.
The overall architecture, as seen below, demonstrates how to use AWS services to seamlessly access and integrate data from a GCP BigQuery data warehouse into SageMaker Canvas for building and deploying ML models.

The workflow includes the following steps:

Within the SageMaker Canvas interface, the user composes a SQL query to run against the GCP BigQuery data warehouse. SageMaker Canvas relays this query to Athena, which acts as an intermediary service, facilitating the communication between SageMaker Canvas and BigQuery.
Athena uses the Athena Google BigQuery connector, which uses a pre-built AWS Lambda function to enable Athena federated query capabilities. This Lambda function retrieves the necessary BigQuery credentials (service account private key) from AWS Secrets Manager for authentication purposes.
After authentication, the Lambda function uses the retrieved credentials to query BigQuery and obtain the desired result set. It parses this result set and sends it back to Athena.
Athena returns the queried data from BigQuery to SageMaker Canvas, where you can use it for ML model training and development purposes within the no-code interface.

This solution offers the following benefits:

Seamless integration – SageMaker Canvas empowers you to integrate and use data from various sources, including cloud data warehouses like BigQuery, directly within its no-code ML environment. This integration eliminates the need for additional data movement or complex integrations, enabling you to focus on building and deploying ML models without the overhead of data engineering tasks.
Secure access – The use of Secrets Manager makes sure BigQuery credentials are securely stored and accessed, enhancing the overall security of the solution.
Scalability – The serverless nature of the Lambda function and the ability in Athena to handle large datasets make this solution scalable and able to accommodate growing data volumes. Additionally, you can use multiple queries to partition the data to source in parallel.

In the next sections, we dive deeper into the technical implementation details and walk through a step-by-step demonstration of this solution.
Dataset
The steps outlined in this post provide an example of how to import data into SageMaker Canvas for no-code ML. In this example, we demonstrate how to import data through Athena from GCP BigQuery.
For our dataset, we use a synthetic dataset from a telecommunications mobile phone carrier. This sample dataset contains 5,000 records, where each record uses 21 attributes to describe the customer profile. The Churn column in the dataset indicates whether the customer left service (true/false). This Churn attribute is the target variable that the ML model should aim to predict.
The following screenshot shows an example of the dataset on the BigQuery console.

Prerequisites
Complete the following prerequisite steps:

Create a service account in GCP and a service account key.
Download the private key JSON file.
Store the JSON file in Secrets Manager:

On the Secrets Manager console, choose Secrets in the navigation pane, then choose Store a new secret.
For Secret type, select Other type of secret.
Copy the contents of the JSON file and enter it under Key/value pairs on the Plaintext tab.

If you don’t have a SageMaker domain already created, create it along with the user profile. For instructions, see Quick setup to Amazon SageMaker.
Make sure the user profile has permission to invoke Athena by confirming that the AWS Identity and Access Management (IAM) role has glue:GetDatabase and athena:GetDataCatalog permission on the resource. See the following example:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "glue:GetDatabase",
                "athena:GetDataCatalog"
            ],
            "Resource": [
                "arn:aws:glue:*:<AWS account id>:catalog",
                "arn:aws:glue:*:<AWS account id>:database/*",
                "arn:aws:athena:*:<AWS account id>:datacatalog/*"
            ]
        }
    ]
}

Register the Athena data source connector
Complete the following steps to set up the Athena data source connector:

On the Athena console, choose Data sources in the navigation pane.
Choose Create data source.
On the Choose a data source page, search for and select Google BigQuery, then choose Next.

On the Enter data source details page, provide the following information:

For Data source name, enter a name.
For Description, enter an optional description.
For Lambda function, choose Create Lambda function to configure the connection.

Under Application settings, enter the following details:

For SpillBucket, enter the name of the bucket where the function can spill data.
For GCPProjectID, enter the project ID within GCP.
For LambdaFunctionName, enter the name of the Lambda function that you’re creating.
For SecretNamePrefix, enter the secret name stored in Secrets Manager that contains GCP credentials.

Choose Deploy.

You’re returned to the Enter data source details page.

In the Connection details section, choose the refresh icon under Lambda function.
Choose the Lambda function you just created. The ARN of the Lambda function is displayed.
Optionally, for Tags, add key-value pairs to associate with this data source.

For more information about tags, see Tagging Athena resources.

Choose Next.
On the Review and create page, review the data source details, then choose Create data source.

The Data source details section of the page for your data source shows information about your new connector. You can now use the connector in your Athena queries. For information about using data connectors in queries, see Running federated queries.
To query from Athena, launch the Athena SQL editor and choose the data source you created. You should be able to run live queries against the BigQuery database.
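If you prefer to verify the connector programmatically rather than in the console, a minimal boto3 sketch such as the following can run the same kind of federated query; the S3 output location is a placeholder, and the catalog, database, and table names match the example used later in this post.

# Sketch: running a federated query against the BigQuery connector with boto3.
import time

import boto3

athena = boto3.client("athena")

response = athena.start_query_execution(
    QueryString='SELECT * FROM "bigquery"."athenabigquery"."customer_churn" LIMIT 10',
    ResultConfiguration={"OutputLocation": "s3://your-athena-results-bucket/"},  # placeholder bucket
)
query_id = response["QueryExecutionId"]

# Poll until the query finishes, then read back the first page of results.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if status in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if status == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    print(f"Returned {len(rows) - 1} data rows")  # the first row holds column headers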

Connect to SageMaker Canvas with Athena as a data source
To import data from Athena, complete the following steps:

On the SageMaker Canvas console, choose Data Wrangler in the navigation pane.
Choose Import data and prepare.
Select the Tabular option.
Choose Athena as the data source.

SageMaker Data Wrangler in SageMaker Canvas allows you to prepare, featurize, and analyze your data. You can integrate a SageMaker Data Wrangler data preparation flow into your ML workflows to simplify and streamline data preprocessing and feature engineering using little to no coding.

Choose an Athena table in the left pane from AwsDataCatalog and drag and drop the table into the right pane.

Choose Edit in SQL and enter the following SQL query:

SELECT
    state,
    account_length,
    area_code,
    phone,
    intl_plan,
    vmail_plan,
    vmail_message,
    day_mins,
    day_calls,
    day_charge,
    eve_mins,
    eve_calls,
    eve_charge,
    night_mins,
    night_calls,
    night_charge,
    intl_mins,
    intl_calls,
    intl_charge,
    custserv_calls,
    churn
FROM "bigquery"."athenabigquery"."customer_churn" ORDER BY random() LIMIT 50;

In the preceding query, bigquery is the data source name created in Athena, athenabigquery is the database name, and customer_churn is the table name.

Choose Run SQL to preview the dataset and when you’re satisfied with the data, choose Import.

When working with ML, it’s crucial to randomize or shuffle the dataset. This step is essential because you may have access to millions or billions of data points, but you don’t necessarily need to use the entire dataset for training the model. Instead, you can limit the data to a smaller subset specifically for training purposes. After you’ve shuffled and prepared the data, you can begin the iterative process of data preparation, feature evaluation, model training, and ultimately hosting the trained model.

You can process or export your data to a location that is suitable for your ML workflows. For example, you can export the transformed data as a SageMaker Canvas dataset and create an ML model from it.
After you export your data, choose Create model to create an ML model from your data.

The data is imported into SageMaker Canvas as a dataset from the specific table in Athena. You can now use this dataset to create a model.
Train a model
After your data is imported, it shows up on the Datasets page in SageMaker Canvas. At this stage, you can build a model. To do so, complete the following steps:

Select your dataset and choose Create a model.

For Model name, enter your model name (for this post, my_first_model).

SageMaker Canvas enables you to create models for predictive analysis, image analysis, and text analysis.

Because we want to categorize customers, select Predictive analysis for Problem type.
Choose Create.

On the Build page, you can see statistics about your dataset, such as the percentage of missing values and mode of the data.

For Target column, choose a column that you want to predict (for this post, churn).

SageMaker Canvas offers two types of models that can generate predictions. Quick build prioritizes speed over accuracy, providing a model in 2–15 minutes. Standard build prioritizes accuracy over speed, providing a model in 30 minutes–2 hours.

For this example, choose Quick build.

After the model is trained, you can analyze the model accuracy.
The Overview tab shows us the column impact, or the estimated importance of each column in predicting the target column. In this example, the Night_calls column has the most significant impact in predicting if a customer will churn. This information can help the marketing team gain insights that lead to taking actions to reduce customer churn. For example, we can see that both low and high CustServ_Calls increase the likelihood of churn. The marketing team can take actions to help prevent customer churn based on these learnings. Examples include creating a detailed FAQ on websites to reduce customer service calls, and running education campaigns with customers on the FAQ that can keep engagement up.

Generate predictions
On the Predict tab, you can generate both batch predictions and single predictions. Complete the following steps to generate a batch prediction:

Download the following sample inference dataset for generating predictions.
To test batch predictions, choose Batch prediction.

SageMaker Canvas allows you to generate batch predictions either manually or automatically on a schedule. To learn how to automate batch predictions on a schedule, refer to Manage automations.

For this post, choose Manual.
Upload the file you downloaded.
Choose Generate predictions.

After a few seconds, the prediction is complete, and you can choose View to see the prediction.

Optionally, choose Download to download a CSV file containing the full output. SageMaker Canvas will return a prediction for each row of data and the probability of the prediction being correct.

Optionally, you can deploy your models to an endpoint to make predictions. For more information, refer to Deploy your models to an endpoint.
Clean up
To avoid future charges, log out of SageMaker Canvas.
Conclusion
In this post, we showcased a solution to extract the data from BigQuery using Athena federated queries and a sample dataset. We then used the extracted data to build an ML model using SageMaker Canvas to predict customers at risk of churning—without writing code. SageMaker Canvas enables business analysts to build and deploy ML models effortlessly through its no-code interface, democratizing ML across the organization. This enables you to harness the power of advanced analytics and ML to drive business insights and innovation, without the need for specialized technical skills.
For more information, see Query any data source with Amazon Athena’s new federated query and Import data from over 40 data sources for no-code machine learning with Amazon SageMaker Canvas. If you’re new to SageMaker Canvas, refer to Build, Share, Deploy: how business analysts and data scientists achieve faster time-to-market using no-code ML and Amazon SageMaker Canvas.

About the authors
Amit Gautam is an AWS senior solutions architect supporting enterprise customers in the UK on their cloud journeys, providing them with architectural advice and guidance that helps them achieve their business outcomes.
Sujata Singh is an AWS senior solutions architect supporting enterprise customers in the UK on their cloud journeys, providing them with architectural advice and guidance that helps them achieve their business outcomes.

Customized model monitoring for near real-time batch inference with Am …

Real-world applications vary in inference requirements for their artificial intelligence and machine learning (AI/ML) solutions to optimize performance and reduce costs. Examples include financial systems processing transaction data streams, recommendation engines processing user activity data, and computer vision models processing video frames. In these scenarios, customized model monitoring for near real-time batch inference with Amazon SageMaker is essential, making sure the quality of predictions is continuously monitored and any deviations are promptly detected.
In this post, we present a framework to customize the use of Amazon SageMaker Model Monitor for handling multi-payload inference requests for near real-time inference scenarios. SageMaker Model Monitor monitors the quality of SageMaker ML models in production. Early and proactive detection of deviations in model quality enables you to take corrective actions, such as retraining models, auditing upstream systems, or fixing quality issues without having to monitor models manually or build additional tooling. SageMaker Model Monitor provides monitoring capabilities for data quality, model quality, bias drift in a model’s predictions, and drift in feature attribution. SageMaker Model Monitor adapts well to common AI/ML use cases and provides advanced capabilities given edge case requirements such as monitoring custom metrics, handling ground truth data, or processing inference data capture.
You can deploy your ML model to SageMaker hosting services and get a SageMaker endpoint for real-time inference. Your client applications invoke this endpoint to get inferences from the model. To reduce the number of invocations and meet custom business objectives, AI/ML developers can customize inference code to send multiple inference records in one payload to the endpoint for near real-time model predictions. Rather than using a SageMaker Model Monitoring schedule with native configurations, a SageMaker Model Monitor Bring Your Own Container (BYOC) approach meets these custom requirements. Although this advanced BYOC topic can appear overwhelming to AI/ML developers, with the right framework, there is opportunity to accelerate SageMaker Model Monitor BYOC development for customized model monitoring requirements.
In this post, we provide a BYOC framework with SageMaker Model Monitor to enable customized payload handling (such as multi-payload requests) from SageMaker endpoint data capture, use ground truth data, and output custom business metrics for model quality.
Overview of solution
SageMaker Model Monitor uses a SageMaker pre-built image using Spark Deequ, which accelerates the usage of model monitoring. Using this pre-built image occasionally becomes problematic when customization is required. For example, the pre-built image requires one inference payload per inference invocation (request to a SageMaker endpoint). However, if you’re sending multiple payloads in one invocation to reduce the number of invocations and setting up model monitoring with SageMaker Model Monitor, then you will need to explore additional capabilities within SageMaker Model Monitor.
A preprocessor script is a capability of SageMaker Model Monitor to preprocess SageMaker endpoint data capture before creating metrics for model quality. However, even with a preprocessor script, you still face a mismatch in the designed behavior of SageMaker Model Monitor, which expects one inference payload per request.
Given these requirements, we create the BYOC framework shown in the following diagram. In this example, we demonstrate setting up a SageMaker Model Monitor job for monitoring model quality.

The workflow includes the following steps:

 Before and after training an AI/ML model, an AI/ML developer creates baseline and validation data that is used downstream for monitoring model quality. For example, users can save the accuracy score of a model, or create custom metrics, to validate model quality.
An AI/ML developer creates a SageMaker endpoint including custom inference scripts. Data capture must be enabled for the SageMaker endpoint to save real-time inference data to Amazon Simple Storage Service (Amazon S3) and support downstream SageMaker Model Monitor.
A user or application sends a request including multiple inference payloads. If you have a large volume of inference records, SageMaker batch transform may be a suitable option for your use case.
The SageMaker endpoint (which includes the custom inference code to preprocess the multi-payload request) passes the inference data to the ML model, postprocesses the predictions, and sends a response to the user or application. The information pertaining to the request and response is stored in Amazon S3.
Independent of calling the SageMaker endpoint, the user or application generates ground truth for the predictions returned by the SageMaker endpoint.
A customer image (BYOC) is pushed to Amazon Elastic Container Registry (Amazon ECR) that contains code to perform the following actions:

Read input and output contracts required for SageMaker Model Monitor.
Read ground truth data.
Optionally, read any baseline constraint or validation data (such as accuracy score threshold).
Process data capture stored in Amazon S3 from the SageMaker endpoint.
Compare real-time data with ground truth and create model quality metrics.
Publish metrics to Amazon CloudWatch Logs and output a model quality report.

The AI/ML developer creates a SageMaker Model Monitor schedule and sets the custom image (BYOC) as the referable image URI.

This post uses code provided in the following GitHub repo to demonstrate the solution. The process includes the following steps:

Train a multi-classification XGBoost model using the public forest coverage dataset.
Create an inference script for the SageMaker endpoint for custom inference logic.
Create a SageMaker endpoint with data capture enabled.
Create a constraint file that contains metrics used to determine if model quality alerts should be generated.
Create a custom Docker image for SageMaker Model Monitor by using the SageMaker Docker Build CLI and push it to Amazon ECR.
Create a SageMaker Model Monitor schedule with the BYOC image.
View the custom model quality report generated by the SageMaker Model Monitor job.

Prerequisites
To follow along with this walkthrough, make sure you have the following prerequisites:

An AWS account
Access to Amazon SageMaker Studio
Applicable AWS Identity and Access Management (IAM) roles for SageMaker, Amazon S3, and Amazon ECR
Familiarity with model deployment and model monitoring concepts
The GitHub repository cloned to an environment within SageMaker Studio

Train the model
In the SageMaker Studio environment, launch a SageMaker training job to train a multi-classification model and output model artifacts to Amazon S3:

from sagemaker.xgboost.estimator import XGBoost
from sagemaker.estimator import Estimator

hyperparameters = {
    "max_depth": 5,
    "eta": 0.36,
    "gamma": 2.88,
    "min_child_weight": 9.89,
    "subsample": 0.77,
    "objective": "multi:softprob",
    "num_class": 7,
    "num_round": 50
}

xgb_estimator = XGBoost(
    entry_point="./src/train.py",
    hyperparameters=hyperparameters,
    role=role,
    instance_count=1,
    instance_type="ml.m5.2xlarge",
    framework_version="1.5-1",
    output_path=f"s3://{bucket}/{prefix_name}/models"
)

xgb_estimator.fit(
    {
        "train": train_data_path,
        "validation": validation_data_path
    },
    wait=True,
    logs=True
)

Create inference code
Before you deploy the SageMaker endpoint, create an inference script (inference.py) that contains a function to preprocess the request with multiple payloads, invoke the model, and postprocess results.
For output_fn, a payload index is created for each inference record found in the request. This enables you to merge ground truth records with data capture within the SageMaker Model Monitor job.
See the following code:
import json

import numpy as np
import pandas as pd
import xgboost as xgb

# Note: worker and encoders are serving utilities provided by the SageMaker
# XGBoost framework container's inference toolkit.


def input_fn(input_data, content_type):
    """Take request data and deserialize the data into an object for prediction.

    When an InvokeEndpoint operation is made against an Endpoint running a SageMaker model server,
    the model server receives two pieces of information:
    - The request Content-Type, for example "application/json"
    - The request data, which is at most 5 MB (5 * 1024 * 1024 bytes) in size.

    Args:
        input_data (obj): the request data.
        content_type (str): the request Content-Type.
    Returns:
        (obj): data ready for prediction. For XGBoost, this defaults to DMatrix.
    """
    if content_type == "application/json":
        request_json = json.loads(input_data)
        prediction_df = pd.DataFrame.from_dict(request_json)
        return xgb.DMatrix(prediction_df)
    else:
        raise ValueError


def predict_fn(input_data, model):
    """A predict_fn for the XGBoost framework. Calls a model on data deserialized in input_fn.

    Args:
        input_data: input data (DMatrix) for prediction deserialized by input_fn
        model: XGBoost model loaded in memory by model_fn
    Returns: a prediction
    """
    output = model.predict(input_data, validate_features=True)
    return output


def output_fn(prediction, accept):
    """Serialize the prediction for the response.

    Args:
        prediction (obj): prediction returned by predict_fn.
        accept (str): accept content-type expected by the client.
    Returns: JSON output
    """
    if accept == "application/json":
        prediction_labels = np.argmax(prediction, axis=1)
        prediction_scores = np.max(prediction, axis=1)
        output_returns = [
            {
                "payload_index": int(index),
                "label": int(label),
                "score": float(score)
            }
            for label, score, index in zip(
                prediction_labels, prediction_scores, range(len(prediction_labels))
            )
        ]
        return worker.Response(encoders.encode(output_returns, accept), mimetype=accept)
    else:
        raise ValueError

Deploy the SageMaker endpoint
Now that you have created the inference script, you can create the SageMaker endpoint:

from sagemaker.model_monitor import DataCaptureConfig

predictor = xgb_estimator.deploy(
    instance_type="ml.m5.large",
    initial_instance_count=1,
    wait=True,
    data_capture_config=DataCaptureConfig(
        enable_capture=True,
        sampling_percentage=100,
        destination_s3_uri=f"s3://{bucket}/{prefix_name}/model-monitor/data-capture"
    ),
    source_dir="./src",
    entry_point="inference.py"
)

Create constraints for model quality monitoring
In model quality monitoring, you need to compare your metric generated from ground truth and data capture with a pre-specified threshold. In this example, we use the accuracy value of the trained model on the test set as a threshold. If the newly computed accuracy metric (generated using ground truth and data capture) is lower than this threshold, a violation report will be generated and the metrics will be published to CloudWatch.
See the following code:
constraints_dict = {
    "accuracy": {
        "threshold": accuracy_value
    }
}

# Serializing json
json_object = json.dumps(constraints_dict, indent=4)

# Writing to constraints.json
with open("constraints.json", "w") as outfile:
    outfile.write(json_object)

This constraints.json file is written to Amazon S3 and will be the input for the processing job for the SageMaker Model Monitor job downstream.
Publish the BYOC image to Amazon ECR
Create a script named model_quality_monitoring.py to perform the following functions:

Read environment variables and any arguments passed to the SageMaker Model Monitor job
Read SageMaker endpoint data capture and constraint metadata configured with the SageMaker Model Monitor job
Read ground truth data from Amazon S3 using the AWS SDK for pandas
Create accuracy metrics with data capture and ground truth
Create metrics and violation reports given constraint violations
Publish metrics to CloudWatch if violations are present
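
The GitHub repository contains the full implementation; the following is only a rough skeleton of how those responsibilities might be organized in such an entry-point script. Apart from the ground_truth_s3_uri_path environment variable and the /opt/ml/processing/output path (both configured later in this post), the file locations, column names, and helper logic are illustrative assumptions.

# Illustrative skeleton of a BYOC model quality monitoring entry point (assumed paths and names).
import json
import os

import awswrangler as wr
import pandas as pd

DATA_CAPTURE_DIR = "/opt/ml/processing/input_data"  # assumption: where the job mounts data capture
OUTPUT_DIR = "/opt/ml/processing/output"            # matches the MonitoringOutput source used below
CONSTRAINTS_PATH = "/opt/ml/processing/baseline/constraints/constraints.json"  # assumption


def load_data_capture(capture_dir: str) -> pd.DataFrame:
    # Flatten captured responses: one row per payload_index in each multi-payload request.
    records = []
    for root, _, files in os.walk(capture_dir):
        for name in files:
            with open(os.path.join(root, name)) as f:
                for line in f:
                    event = json.loads(line)
                    inference_id = event["eventMetadata"]["inferenceId"]
                    for item in json.loads(event["captureData"]["endpointOutput"]["data"]):
                        records.append({
                            "inferenceId": inference_id,
                            "payload_index": item["payload_index"],
                            "label": item["label"],
                        })
    return pd.DataFrame(records)


def main():
    # Configuration passed by the SageMaker Model Monitor schedule.
    ground_truth_s3_uri = os.environ["ground_truth_s3_uri_path"]

    with open(CONSTRAINTS_PATH) as f:
        threshold = json.load(f)["accuracy"]["threshold"]

    ground_truth = wr.s3.read_json(ground_truth_s3_uri, lines=True)
    predictions = load_data_capture(DATA_CAPTURE_DIR)

    # Merge on inference ID and payload index, then compute the custom accuracy metric.
    merged = predictions.merge(ground_truth, on=["inferenceId", "payload_index"])
    accuracy = float((merged["label"] == merged["groundTruthLabel"]).mean())

    # Write a violation report when the threshold is breached; publishing the metric
    # to CloudWatch is sketched later in this post.
    if accuracy < threshold:
        report = {"violations": [{"metric": "accuracy", "value": accuracy, "threshold": threshold}]}
        with open(os.path.join(OUTPUT_DIR, "constraint_violations.json"), "w") as f:
            json.dump(report, f)


if __name__ == "__main__":
    main()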

This script serves as the entry point for the SageMaker Model Monitor job. With a custom image, the entry point script needs to be specified in the Docker image, as shown in the following code. This way, when the SageMaker Model Monitor job initiates, the specified script is run. The sm-mm-mqm-byoc:1.0 image URI is passed to the image_uri argument when you define the SageMaker Model Monitor job downstream.

FROM 683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:1.2-1-cpu-py3

RUN python3 -m pip install awswrangler

ENV PYTHONUNBUFFERED=TRUE

ADD ./src/model_quality_monitoring.py /

ENTRYPOINT ["python3", "/model_quality_monitoring.py"]

The custom BYOC image is pushed to Amazon ECR using the SageMaker Docker Build CLI:
sm-docker build . --file ./docker/Dockerfile --repository sm-mm-mqm-byoc:1.0
Create a SageMaker Model Monitor schedule
Next, you use the Amazon SageMaker Python SDK to create a model monitoring schedule. You can define the BYOC ECR image created in the previous section as the image_uri parameter.
You can customize the environment variables and arguments passed to the SageMaker Processing job when SageMaker Model Monitor runs the model quality monitoring job. In this example, the ground truth Amazon S3 URI path is passed as an environment variable and is used within the SageMaker Processing job:

sm_mm_mqm = ModelMonitor(
    role=role,
    image_uri=f"{account_id}.dkr.ecr.us-east-1.amazonaws.com/sm-mm-mqm-byoc:1.0",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    base_job_name="sm-mm-mqm-byoc",
    sagemaker_session=sess,
    env={
        "ground_truth_s3_uri_path": f"s3://{bucket}/{prefix_name}/model-monitor/mqm/ground_truth/{predictor.endpoint_name}"
    }
)

Before you create the schedule, specify the endpoint name, the Amazon S3 URI output location you want to send violation reports to, the statistics and constraints metadata files (if applicable), and any custom arguments you want to pass to your entry script within your BYOC SageMaker Processing job. In this example, the argument --create-violation-tests is passed, which creates a mock violation for demonstration purposes. SageMaker Model Monitor accepts the rest of the parameters and translates them into environment variables, which you can use within your custom monitoring job.
sm_mm_mqm.create_monitoring_schedule(
    endpoint_input=predictor.endpoint_name,
    output=MonitoringOutput(
        source="/opt/ml/processing/output",
        destination=f"s3://{bucket}/{prefix_name}/model-monitor/mqm/reports"
    ),
    statistics=f"s3://{bucket}/{prefix_name}/model-monitor/mqm/baseline-data/statistics.json",
    constraints=f"s3://{bucket}/{prefix_name}/model-monitor/mqm/baseline-data/constraints.json",
    monitor_schedule_name="sm-mm-byoc-batch-inf-schedule",
    schedule_cron_expression=CronExpressionGenerator().hourly(),
    arguments=[
        "--create-violation-tests"
    ]
)

Review the entry point script model_quality_monitoring.py to better understand how to use custom arguments and environment variables provided by the SageMaker Model Monitor job.
Observe the SageMaker Model Monitor job output
Now that the SageMaker Model Monitor resources are created, invoke the SageMaker endpoint.
In this example, the request contains a list of two payloads for which we want to collect predictions:
sm_runtime = boto3.client("sagemaker-runtime")

response = sm_runtime.invoke_endpoint(
    EndpointName=predictor.endpoint_name,
    ContentType="application/json",
    Accept="application/json",
    Body=test_records,
    InferenceId="0"
)

InferenceId is passed as an argument to the invoke_endpoint method. This ID is used downstream when merging the ground truth data with the real-time SageMaker endpoint data capture. In this example, we want to collect ground truth with the following structure.

InferenceId    payload_index    groundTruthLabel
0              0                1
0              1                0

This makes it simpler when merging the ground truth data with real-time data within the SageMaker Model Monitor custom job.
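As a rough illustration of how these ground truth records could be written (the object name and JSON Lines layout below are assumptions rather than the repository's exact format), you can upload them with the AWS SDK for pandas to the same S3 prefix that the monitoring job receives through its ground_truth_s3_uri_path environment variable:

# Hypothetical sketch: upload ground truth records as JSON Lines so the
# monitoring job can read them and merge on InferenceId and payload_index.
import awswrangler as wr
import pandas as pd

ground_truth = pd.DataFrame(
    [
        {"InferenceId": "0", "payload_index": 0, "groundTruthLabel": 1},
        {"InferenceId": "0", "payload_index": 1, "groundTruthLabel": 0},
    ]
)

wr.s3.to_json(
    df=ground_truth,
    path=f"s3://{bucket}/{prefix_name}/model-monitor/mqm/ground_truth/{predictor.endpoint_name}/labels.jsonl",
    orient="records",
    lines=True,
)
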
Because we set the SageMaker Model Monitor job to an hourly cron schedule, we can view the results at the end of the hour. In SageMaker Studio Classic, navigate to the SageMaker endpoint details page and choose the Monitoring job history tab to view status reports of the SageMaker Model Monitor job.
If an issue is found, you can choose the monitoring job name to review the report.
In this example, the custom model monitoring metric created in the BYOC job flagged an accuracy score violation of -1 (produced purposely for demonstration with the --create-violation-tests argument).

This gives you the ability to monitor model quality violations for your custom SageMaker Model Monitor job within the SageMaker Studio console. If you want to invoke CloudWatch alarms based on published CloudWatch metrics, you must create these CloudWatch metrics within your BYOC job; you can review how this is done in the model_quality_monitoring.py script. For automated model monitoring alerts, we recommend creating an Amazon Simple Notification Service (Amazon SNS) topic that email user groups subscribe to, so subscribers are alerted when a given CloudWatch metric alarm fires.
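The following is a minimal sketch of that pattern, assuming an illustrative CloudWatch namespace and metric name and a placeholder SNS topic ARN: the BYOC job publishes the metric with put_metric_data, and a CloudWatch alarm wired to the SNS topic notifies subscribers whenever the metric breaches the threshold.

import boto3

cloudwatch = boto3.client("cloudwatch")

# Published from inside the BYOC monitoring job (namespace and metric name are illustrative).
cloudwatch.put_metric_data(
    Namespace="Custom/SageMakerModelMonitor",
    MetricData=[
        {
            "MetricName": "accuracy",
            "Dimensions": [{"Name": "EndpointName", "Value": predictor.endpoint_name}],
            "Value": -1.0,
        }
    ],
)

# Created once, outside the job: alarm on the custom metric and notify an existing SNS topic.
cloudwatch.put_metric_alarm(
    AlarmName="sm-mm-mqm-accuracy-below-threshold",
    Namespace="Custom/SageMakerModelMonitor",
    MetricName="accuracy",
    Dimensions=[{"Name": "EndpointName", "Value": predictor.endpoint_name}],
    Statistic="Average",
    Period=3600,
    EvaluationPeriods=1,
    Threshold=0.8,
    ComparisonOperator="LessThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:111122223333:model-monitor-alerts"],  # placeholder SNS topic ARN
)
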
Clean up
To avoid incurring future charges, delete all resources related to the SageMaker Model Monitor schedule by completing the following steps:

Delete data capture and any ground truth data:

! aws s3 rm s3://{bucket}/{prefix_name}/model-monitor/data-capture/{predictor.endpoint_name} --recursive
! aws s3 rm s3://{bucket}/{prefix_name}/model-monitor/mqm/ground_truth/{predictor.endpoint_name} --recursive

Delete the monitoring schedule:

sm_mm_mqm.delete_monitoring_schedule()

Delete the SageMaker model and SageMaker endpoint:

predictor.delete_model()
predictor.delete_endpoint()

Conclusion
Custom business or technical requirements for a SageMaker endpoint frequently have an impact on downstream efforts in model monitoring. In this post, we provided a framework that enables you to customize SageMaker Model Monitor jobs (in this case, for monitoring model quality) to handle the use case of passing multiple inference payloads to a SageMaker endpoint.
Explore the provided GitHub repository to implement this customized model monitoring framework with SageMaker Model Monitor. You can use this framework as a starting point to monitor your custom metrics or handle other unique requirements for model quality monitoring in your AI/ML applications.

About the Authors
Joe King is a Sr. Data Scientist at AWS, bringing a breadth of data science, ML engineering, MLOps, and AI/ML architecting to help businesses create scalable solutions on AWS.
Ajay Raghunathan is a Machine Learning Engineer at AWS. His current work focuses on architecting and implementing ML solutions at scale. He is a technology enthusiast and a builder with a core area of interest in AI/ML, data analytics, serverless, and DevOps. Outside of work, he enjoys spending time with family, traveling, and playing football.
Raju Patil is a Sr. Data Scientist with AWS Professional Services. He architects, builds, and deploys AI/ML solutions to help AWS customers across different verticals overcome business challenges in a variety of AI/ML use cases.

Meta AI Silently Releases NotebookLlama: An Open Version of Google’s …

Meta has recently released NotebookLlama, an open version of Google’s NotebookLM that empowers researchers and developers with accessible, scalable solutions for interactive data analysis and documentation. NotebookLlama integrates large language models directly into an open-source notebook interface, similar to Jupyter or Google Colab, allowing users to interact with a trained LLM as they would with any other cell in a notebook environment. By providing tools to enhance both code writing and documentation, Meta’s NotebookLlama supports a community-driven model that emphasizes transparency, openness, and flexibility—qualities often lacking in proprietary AI-driven software.

Technical Details and Benefits

NotebookLlama is powered by a highly optimized version of Meta’s Llama language models, tailored for interactive document and code generation. The model employs parameter-efficient fine-tuning, enabling developers to create personalized models suited to their specific project needs. Meta has also provided the foundational model and a set of recipes for deploying NotebookLlama across various environments, whether on local servers or cloud infrastructure, significantly lowering entry barriers for smaller institutions and individual users. NotebookLlama supports multi-turn conversations, allowing for in-depth interaction between the user and the AI—ideal for debugging, code optimization, and comprehensive explanations of both code and complex concepts.
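As a generic illustration of what parameter-efficient fine-tuning looks like in practice (using the Hugging Face peft library with a placeholder model id and hyperparameters, not the actual NotebookLlama recipe), a LoRA setup wraps a frozen base model and trains only small adapter matrices:

# Generic LoRA example with the peft library; the model id and hyperparameters
# are placeholders and do not come from the NotebookLlama repository.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "meta-llama/Llama-3.2-1B"  # placeholder; substitute any causal LM you can access

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(base_model_id)

lora_config = LoraConfig(
    r=16,                                  # rank of the low-rank adapters
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()    # only the adapter weights are trainable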

Significance of NotebookLlama

NotebookLlama’s importance extends beyond its open-source nature; it is a crucial step toward creating accessible, community-driven alternatives in a space dominated by major corporations. Google’s NotebookLM, while powerful, is available only to a limited set of users and lacks the advanced customization options that many users seek, particularly for deploying models on their own infrastructure. In contrast, NotebookLlama offers full control over data usage and model interaction. Early reports from beta testers have shown promising results, especially in data science education and software development. In tests involving coding tasks and explanatory documentation, NotebookLlama demonstrated impressive results, producing code and documentation on par with, or even superior to, closed models. A community-driven benchmark on Reddit highlights NotebookLlama’s effectiveness in generating insightful commentary for complex Python scripts, achieving over 90% accuracy in generating meaningful docstrings.

Conclusion

Meta’s NotebookLlama is a significant step forward in the world of open-source AI tools. By releasing an open version of Google’s NotebookLM, Meta is democratizing access to AI-powered documentation and coding. NotebookLlama is vital for those needing flexible, secure, and customizable tools for interactive analysis, bridging the gap between proprietary AI and open access. Its open-source nature fosters collaboration and lays the groundwork for future innovations across diverse fields. With NotebookLlama, the AI community gains a more inclusive and adaptable tool, empowering users to harness AI without limitations.

Check out the GitHub Repo. All credit for this research goes to the researchers of this project.

Meet mcdse-2b-v1: A New Performant, Scalable and Efficient Multilingua …

The rise of the information era has brought an overwhelming amount of data in varied formats. Documents, presentations, and images are generated at an astonishing rate across multiple languages and domains. However, retrieving useful information from these diverse sources presents a significant challenge. Conventional retrieval models, while effective for text-based queries, struggle with complex multimodal content, such as screenshots or slide presentations. This poses particular challenges for businesses, researchers, and educators, who need to query and extract information from documents that combine text and visual elements. Addressing this challenge requires a model capable of efficiently handling such diverse content.

Introducing mcdse-2b-v1: A New Approach to Document Retrieval

Meet mcdse-2b-v1, a new AI model that allows you to embed page or slide screenshots and query them using natural language. Unlike traditional retrieval systems, which depend solely on text for indexing and searching, mcdse-2b-v1 enables users to work with screenshots or slides that contain a mixture of text, images, and diagrams. This opens up new possibilities for those who often deal with documents that are not purely text-based. With mcdse-2b-v1, you can take a screenshot of a slide presentation or an infographic-heavy document, embed it into the model, and perform natural language searches to obtain relevant information.

mcdse-2b-v1 bridges the gap between traditional text-based queries and more complex visual data, making it ideal for industries that require frequent content analysis from presentation decks, reports, or other visual documentation. This capability makes the model invaluable in content-rich environments, where manually browsing through visual-heavy documents is time-consuming and impractical. Instead of struggling to find that one slide from a presentation or manually going through dense reports, users can leverage natural language to instantly search for embedded content, saving time and improving productivity.

Technical Details and Benefits

mcdse-2b-v1 builds upon MrLight/dse-qwen2-2b-mrl-v1 and is trained using the DSE (Document Screenshot Embedding) approach. mcdse-2b-v1 is a performant, scalable, and efficient multilingual document retrieval model that can seamlessly handle mixed-content sources. It provides an embedding mechanism that effectively captures both textual and visual components, allowing for robust retrieval operations across multimodal data types.

One of the most notable features of mcdse-2b-v1 is its resource efficiency. For instance, it can embed 100 million pages in just 10 GB of space. This level of optimization makes it ideal for applications where data storage is at a premium, such as on-premises solutions or edge deployments. Additionally, the model can be shrunk by up to six times with minimal performance degradation, enabling it to work on devices with limited computational resources while still maintaining high retrieval accuracy.

Another benefit of mcdse-2b-v1 is its compatibility with commonly used frameworks like Transformers or vLLM, making it accessible for a wide range of users. This flexibility allows the model to be easily integrated into existing machine learning workflows without extensive modifications, making it a convenient choice for developers and data scientists.
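The exact processor, prompt format, and pooling for mcdse-2b-v1 are defined on its model card, so they are not reproduced here; the sketch below only illustrates the retrieval step common to embedding-based document search, scoring a query embedding against precomputed, normalized page embeddings with a dot product:

# Illustration of retrieval over precomputed page-screenshot embeddings.
# The embedding dimension and tensors are stand-ins; producing real embeddings
# with mcdse-2b-v1 follows the usage documented on its model card.
import torch
import torch.nn.functional as F

num_pages, dim = 10_000, 1536                                   # placeholder sizes
page_embeddings = F.normalize(torch.randn(num_pages, dim), dim=-1)
query_embedding = F.normalize(torch.randn(dim), dim=-1)

# Cosine similarity reduces to a dot product on L2-normalized vectors.
scores = page_embeddings @ query_embedding
top_scores, top_pages = torch.topk(scores, k=5)
print(top_pages.tolist())  # indices of the five most relevant pages or slides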

Why mcdse-2b-v1 Matters

The significance of mcdse-2b-v1 lies not only in its ability to retrieve information efficiently but also in how it democratizes access to complex document analysis. Traditional document retrieval methods require precise structuring and often overlook the rich visual elements present in modern-day documents. mcdse-2b-v1 changes this by allowing users to access information embedded within diagrams, charts, and other non-textual components as easily as they would with a text-based query.

Early results have shown that mcdse-2b-v1 consistently delivers high retrieval accuracy, even when compressed to one-sixth of its original size. This level of performance makes it practical for large-scale deployments without the typical computational expense. Additionally, its multilingual capability means it can serve a wide range of users globally, making it valuable in multinational organizations or academic settings where multiple languages are in use.

For those working on multimodal Retrieval-Augmented Generation (RAG), mcdse-2b-v1 offers a scalable solution that provides high-performance embeddings for documents that include both text and visuals. This combination enhances the ability of downstream tasks, such as answering complex user queries or generating detailed reports from multimodal input.

Conclusion

mcdse-2b-v1 addresses the challenges of multimodal document retrieval by embedding page and slide screenshots with scalability, efficiency, and multilingual capabilities. It streamlines interactions with complex documents, freeing users from the tedious process of manual searches. Users gain a powerful retrieval model that effectively handles multimodal content, recognizing the complexities of real-world data. This model reshapes how we access and interact with knowledge embedded in both text and visuals, setting a new benchmark for document retrieval.

Check out the Model on Hugging Face and Details. All credit for this research goes to the researchers of this project.