Mini-Gemini: A Simple and Effective Artificial Intelligence Framework …

Vision Language Models (VLMs) emerge from the integration of Computer Vision (CV) and Natural Language Processing (NLP). This integration seeks to mimic human-like understanding by interpreting and generating content that combines images with words, a complex challenge that has drawn the interest of researchers worldwide.

Recent developments have introduced models like LLaVA and BLIP-2, which capitalize on massive collections of image-text pairs to fine-tune cross-modal alignment. Advancements like LLaVA-Next and Otter-HD have focused on enhancing image resolution and token quality, enriching visual embeddings within LLMs, and addressing the computational challenges of processing high-resolution images. Moreover, methods such as InternLM-XComposer and auto-regressive token prediction approaches, exemplified by EMU and SEED, have sought to enable LLMs to decode images directly through extensive image-text data. While effective, these approaches have faced challenges related to latency and the need for massive training resources.

Researchers from the Chinese University of Hong Kong and SmartMore have introduced a novel framework, Mini-Gemini, that advances VLMs by enhancing multi-modal input processing. Its distinctiveness lies in employing a dual-encoder system and a novel patch info mining technique alongside a specially curated high-quality dataset. These innovations enable Mini-Gemini to process high-resolution images effectively and generate context-rich visual and textual content, setting it apart from existing models.

The methodology behind Mini-Gemini involves a dual-encoder system that includes a convolutional neural network for refined image processing, enhancing visual tokens without increasing their number. It utilizes patch info mining for detailed visual cue extraction. The framework is trained on a composite dataset, combining high-quality image-text pairs and task-oriented instructions to improve model performance and application scope. Mini-Gemini is compatible with various Large Language Models (LLMs), ranging from 2B to 34B parameters, enabling efficient any-to-any inference. This setup allows Mini-Gemini to achieve superior results in zero-shot benchmarks and supports advanced multi-modal tasks.
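To make the idea concrete, below is a hedged PyTorch sketch of one plausible realization of patch info mining, consistent with the description above: low-resolution visual tokens act as queries that mine detail from high-resolution features produced by the convolutional encoder, so the number of visual tokens passed to the LLM stays fixed. The module structure, dimensions, and residual connection are our own illustration, not the authors' released code.

```python
import torch
import torch.nn as nn

class PatchInfoMining(nn.Module):
    """Illustrative sketch: low-res visual tokens (queries) mine detail
    from high-res patch features (keys/values) via cross-attention."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, low_res_tokens, high_res_patches):
        # low_res_tokens: (B, N, D) from the standard visual encoder
        # high_res_patches: (B, M, D) from the convolutional high-resolution encoder
        q = self.q(low_res_tokens)
        k, v = self.kv(high_res_patches).chunk(2, dim=-1)
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        # same N tokens come out, enriched with high-resolution cues
        return low_res_tokens + self.proj(attn @ v)
```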

In evaluating Mini-Gemini’s effectiveness, the framework showcased leading performance in several zero-shot benchmarks. Specifically, it surpassed the Gemini Pro model in the MM-Vet and MMBench benchmarks, scoring 79.6 and 75.6, respectively. When configured with Hermes-2-Yi-34B, Mini-Gemini achieved a remarkable 70.1 score in the VQAT benchmark, outperforming the existing LLaVA-1.5 model across all evaluated metrics. These results validate Mini-Gemini’s advanced multi-modal processing capabilities, highlighting its efficiency and precision in handling complex visual and textual tasks.

To conclude, the research introduces Mini-Gemini, which advances VLMs through a dual-encoder system, patch info mining, and a high-quality dataset. Demonstrating exceptional performance across multiple benchmarks, Mini-Gemini outperforms established models, marking a significant step forward in multi-modal AI capabilities. However, as the researchers acknowledge, there is still room for improvement in Mini-Gemini’s visual comprehension and reasoning abilities, and they assert that future work will explore advanced methods for visual understanding, reasoning, and generation.


7 GPTs That Are Game-Changing For Entrepreneurs 

In the rapidly evolving world of artificial intelligence (AI), entrepreneurs find themselves at the forefront of innovation and efficiency. The advent of generative pre-trained transformers (GPT) has introduced a plethora of tools designed to streamline the entrepreneurial journey. Among these advancements, seven GPT applications stand out, promising to significantly impact how entrepreneurs operate, analyze data, and communicate their ideas.

First on the list is Data Analyst, a tool that allows entrepreneurs to upload files for instant data visualization. This application simplifies the process of understanding business metrics, providing users with a clear snapshot of their performance and areas needing attention.

Slide Creator emerges as a second indispensable tool. Recognizing the ubiquitous nature of PowerPoint presentations in the business world, Slide Creator offers an efficient solution for crafting compelling slides, saving precious time and energy that can be redirected towards more critical tasks.

For entrepreneurs brimming with ideas, Whimsical Diagrams provides a creative outlet. This tool assists in converting complex concepts into understandable flowcharts and diagrams, making it easier to share and explain ideas to teams and stakeholders.

Research plays a crucial role in entrepreneurship, and ScholarAI addresses this need by offering access to an extensive database of over 200 million research papers and books. It streamlines the search process, enabling users to find relevant information more quickly than traditional search engines.

Voxscript is another game-changer, designed to help entrepreneurs stay informed and learn efficiently. By summarizing web pages and YouTube videos, it condenses vast amounts of information into digestible insights, facilitating continuous learning and improvement.

Expanding into design, Canva makes professional graphics creation accessible to all entrepreneurs, regardless of their design expertise. This tool simplifies the process of producing high-quality visuals for marketing, presentations, and social media.

Lastly, WebPilot stands out as a comprehensive solution that encompasses search, browsing, writing, and action capabilities, along with API offerings. It represents the next level of online interaction and automation for business operations.

Key Takeaways:

Empowerment Through Visualization: Tools like Data Analyst and Whimsical Diagrams empower entrepreneurs to visualize data and ideas, enhancing understanding and communication.

Efficiency in Creation: Applications such as Slide Creator and Canva streamline the creation of presentations and graphics, freeing up time for strategic thinking.

Knowledge at Your Fingertips: ScholarAI and Voxscript revolutionize the way entrepreneurs access and digest information, accelerating research and learning.

Comprehensive Online Interaction: WebPilot offers a versatile platform for managing a wide range of online activities, from browsing to automating tasks, underlining the importance of adaptability in digital entrepreneurship.

In conclusion, these seven GPT tools offer transformative solutions for entrepreneurs, each addressing a unique aspect of the entrepreneurial journey. By integrating these tools into their daily operations, entrepreneurs can not only increase efficiency and productivity but also maintain a competitive edge in the fast-paced world of business.

Researchers from the University of Washington and Meta AI Present a Si …

Language models (LMs) have proven remarkably effective at generating coherent and fluent continuations of a prompt or document prefix. In the text generation step, they rely mostly on two sources of knowledge: (1) prior knowledge, which is learned during pretraining and stored implicitly within the model parameters; and (2) context knowledge, passed as input in the prefix context. However, it remains an open question how a pre-trained LM, particularly a vanilla LM without task-specific finetuning, balances these two knowledge sources during generation. LMs often fail to pay enough attention to the input context and generate texts that are unfaithful to it or contain hallucinations.

Previous research shows that LMs often pay insufficient attention to new information introduced in the context knowledge. This can lead to hallucination in summarization, where the generated summaries include facts not present in the input document (but learned by the LM during the training phase). Insufficient attention to context is especially problematic when the context knowledge contradicts the prior knowledge. For instance, when LLaMA is presented with the latest document, “Argentina won the FIFA World Cups in 1978, 1986 and 2022 …” in its context, it still predicts “Two” in response to the question “How many World Cups have Argentina won?”, due in part to the outdated training data from which the model learned that answer.

Researchers from the University of Washington and Meta AI present context-aware decoding (CAD), which follows a contrastive output distribution that amplifies the difference between the output probabilities when a model is used with and without context. CAD is particularly effective in overriding a model’s prior knowledge when it contradicts the provided context, leading to substantial improvements in tasks where resolving the knowledge conflict is essential.

CAD samples from a new output distribution that amplifies the difference between output probabilities with and without the context document. This yields a new form of contrastive decoding that effectively downweights the prior knowledge when more relevant contextual information is provided. CAD can be used with off-the-shelf pre-trained LMs without any additional training. The authors adjust the model’s original output probability distribution using the pointwise mutual information (PMI) between the context and the generation conditioned on the input.
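In code, this adjustment amounts to combining two forward passes per decoding step, one with the context document prepended and one without. The sketch below is a minimal illustration; the α weight, the Hugging Face-style model interface, and the two-pass structure are assumptions of this sketch rather than the paper's exact implementation.

```python
import torch

@torch.no_grad()
def cad_next_token_logits(model, ids_with_context, ids_without_context, alpha=0.5):
    """Adjusted next-token logits: (1 + alpha) * logits(y | context, query) - alpha * logits(y | query).
    `model` is assumed to be an HF-style causal LM returning an object with a `.logits` field."""
    logits_ctx = model(ids_with_context).logits[:, -1, :]        # conditioned on context + query
    logits_no_ctx = model(ids_without_context).logits[:, -1, :]  # conditioned on the query alone
    # sample or take argmax from the softmax of this adjusted score
    return (1 + alpha) * logits_ctx - alpha * logits_no_ctx
```

In practice this is applied at every decoding step, with each newly generated token appended to both input sequences before the next pair of forward passes.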

Experimentally, the authors show that CAD outperforms the standard decoding algorithm by a large margin for all eight models across both datasets. Specifically, when applied to LLaMA-30B on CNN-DM, CAD leads to a 21% increase in ROUGE-L, a 14.3% increase in factKB, and a 7.8% increase in BERT-P. These results demonstrate that CAD can effectively improve the quality and factuality of summaries generated by a diverse set of LMs.

In conclusion, researchers from the University of Washington and Meta AI present CAD, which follows a contrastive output distribution that amplifies the difference between the output probabilities when a model is used with and without context, encouraging the LM to pay sufficient attention to its context during generation. Without additional training, CAD significantly improves the faithfulness of different LM families, including OPT, GPT, LLaMA, and FLAN-T5, on summarization tasks. CAD is particularly effective in overriding a model’s prior knowledge when it contradicts the provided context, leading to substantial improvements in tasks where resolving the knowledge conflict is essential.


This AI Paper from Intel Presents a SYCL Implementation of Fully Fused …

In the field of Artificial Intelligence (AI), Multi-Layer Perceptrons (MLPs) are the foundation for many Machine Learning (ML) tasks, including partial differential equation solving, density function representation in Neural Radiance Fields (NeRFs), and ray tracing simulation using Neural Ray Tracing.

Fully connected layers, in which every neuron in a layer is connected to every neuron in the adjacent layers, are a defining characteristic of MLPs. In MLPs, every neuron’s output is independent of the outputs of its neighboring neurons in the same layer, in contrast to certain other topologies. This property makes MLPs well suited to fully fused implementations, which are essential for some computational workloads.

In recent research, a team of researchers from Intel Corporation and Ecole Polytechnique has focused on efficiently implementing narrow MLPs on Intel GPUs. Narrow MLPs feature a small, fixed number of neurons per layer and a shallow depth, i.e., a small number of layers. Despite their narrow width, narrow MLPs are universal approximators with significance in a wide range of applications. That narrow width, however, limits their performance, leading to low memory bandwidth utilization and arithmetic intensity during training and inference.

Combining the layers into a single kernel is a popular solution to these problems, as it allows the use of faster memories such as caches, shared memory, and register files. This method, called ‘fully fused MLPs,’ was previously implemented in CUDA for Nvidia GPUs.

The team states that the goal of this study is to create fully fused MLPs with a fixed layer width of 2^i neurons (where i ranges from 4 to 7) and arbitrary depth, using SYCL for Intel GPUs. Despite the fixed layer width, these MLPs remain effective universal approximators. The implementation is based on Intel’s joint matrix SYCL extensions, utilizing the XMX hardware in Intel’s Data Center GPU Max 1550.

Models requiring high data throughput with batch sizes of 2^i, where i is greater than 15, are especially well suited to this technique. Compared to comparable CUDA implementations, the SYCL version on Intel hardware performs better, particularly for 64-width MLPs. The study also indicates that this method requires fewer accesses to global memory than prior ones, which improves inference speed and theoretical peak performance.
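The memory argument can be made tangible with a back-of-envelope arithmetic-intensity estimate. The simplified traffic model below is our own assumption (it ignores caches and output writes in the fused case) and is not taken from the paper; it only shows why keeping activations on-chip matters so much for narrow layers.

```python
def arithmetic_intensity(width=64, batch=1 << 16, dtype_bytes=2):
    """Rough FLOPs-per-byte estimate for one hidden layer of a narrow MLP (half precision)."""
    flops = 2 * batch * width * width                        # one matmul: batch x width x width
    # non-fused: read inputs and weights from global memory, write outputs back
    bytes_non_fused = dtype_bytes * (2 * batch * width + width * width)
    # fused: intermediate activations stay in registers/shared local memory,
    # so only the layer weights come from global memory
    bytes_fused = dtype_bytes * (width * width)
    return flops / bytes_non_fused, flops / bytes_fused

# arithmetic_intensity() -> roughly (32, 65536) FLOPs/byte for width 64, batch 2**16
```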

Benchmarks and applications, including image compression, Neural Radiance Fields (NeRFs), and physics-informed machine learning, have been tested to demonstrate the performance improvements and possible applications. The proposed approach performs significantly better than off-the-shelf implementations, such as the CUDA PyTorch version on Nvidia’s H100 GPU and the Intel Extension for PyTorch (IPEX) on the same Intel GPU, in all tested cases.

The team has summarized their primary contributions as follows.

The first SYCL implementation for fully-fused Multi-Layer Perceptrons designed for Intel GPUs using XMX instructions has been introduced. 

The performance of the implementation has been assessed using a roofline model, which shows a rise in arithmetic intensity of up to 2.15 times compared to another fully fused implementation.

Four sample applications have been used to validate the higher performance: a regression benchmark, image compression, neural radiance fields, and physics-informed neural networks.

The implementation is noteworthy because it performs training 1.75 times faster and inference 2.84 times faster than another fully fused implementation. Its effectiveness across a variety of tasks and datasets is further demonstrated by the up to 30 times performance improvement it delivers over off-the-shelf PyTorch implementations.


Researchers from Google DeepMind and Stanford Introduce Search-Augment …

Understanding and improving the factuality of responses generated by large language models (LLMs) is critical in artificial intelligence research. The domain investigates how well these models adhere to truthfulness when answering open-ended, fact-seeking queries across various topics. Despite their advancements, LLMs often generate content containing factual inaccuracies, which poses significant reliability issues in real-world applications where accurate information is paramount.

Existing approaches to assessing the factuality of model-generated content typically rely on direct human evaluation. While valuable, this process is inherently limited by the subjectivity and variability of human judgment and by the scalability challenges of applying human labor to large datasets or models. Consequently, there is a need for more automated and objective methods to assess the accuracy of information produced by LLMs.

Researchers from Google DeepMind and Stanford University have introduced a novel automated evaluation framework named the Search-Augmented Factuality Evaluator (SAFE). This framework aims to tackle the challenge of assessing the factuality of content generated by LLMs. By automating the evaluation process, SAFE presents a scalable and efficient solution to verify the accuracy of information produced by these models, offering a significant advancement over the traditional, labor-intensive methods of fact-checking that rely heavily on human annotators.

The SAFE methodology comprehensively analyzes long-form responses generated by LLMs by breaking them down into individual facts. Each fact is then independently verified for accuracy using Google Search as a reference point. Initially, the researchers used GPT to generate LongFact, a dataset comprising approximately 16,000 facts drawn from diverse topics. The evaluation involves a multi-step reasoning process that assesses the support for each fact in the context of search results. SAFE was applied across thirteen language models spanning four model families, including Gemini, GPT, Claude, and PaLM-2, to evaluate and benchmark their factuality performance. This detailed approach ensures a thorough and objective assessment of LLM-generated content.
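The pipeline can be summarized in a few lines of Python. The helper callables below (fact splitting, query generation, support rating) are hypothetical placeholders for the LLM prompts and Google Search calls the paper describes; they are not an actual SAFE API.

```python
def safe_evaluate(response, split_into_facts, make_query, search, rate_support, max_steps=3):
    """Hypothetical sketch of the SAFE loop: decompose a long-form response into atomic
    facts and rate each one against Google Search results. All helpers are placeholders."""
    verdicts = {}
    for fact in split_into_facts(response):            # an LLM splits the response into atomic facts
        evidence = []
        for _ in range(max_steps):                     # multi-step reasoning: refine the search query
            evidence.extend(search(make_query(fact, evidence)))
        verdicts[fact] = rate_support(fact, evidence)  # e.g. "supported" / "not supported"
    return verdicts
```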

The effectiveness of SAFE is quantitatively affirmed: its evaluations align with those of human annotators on 72% of roughly 16,000 individual facts from LongFact. In a focused analysis of 100 contentious facts, SAFE’s determinations were correct 76% of the time under further scrutiny. The framework also demonstrates its economic advantages, being more than 20 times less expensive than human annotation. Benchmark tests across thirteen language models indicated that larger models, such as GPT-4-Turbo, generally achieved better factuality, with factual precision rates reaching up to 95%. SAFE thus offers a scalable, cost-effective method for accurately evaluating the factuality of LLM-generated content.

To conclude, the research introduces SAFE, an innovative framework developed by researchers from Google DeepMind and Stanford University to assess LLMs’ accuracy. SAFE’s methodology employs Google Search to verify individual facts in LLM responses, showing high alignment with human assessments. By providing a scalable, cost-efficient method for factual evaluation, this research significantly advances the field of AI, enhancing the trustworthiness and reliability of information produced by LLMs.


This Paper Reveals Insights from Reproducing OpenAI’s RLHF (Reinforc …

In recent years, there has been enormous development in pre-trained large language models (LLMs). These LLMs are trained to predict the next token given the previous tokens and, given a suitable prompt, can solve various natural language processing (NLP) tasks. However, the next-token prediction objective deviates from the fundamental aim of “outputting contents that humans prefer.”

To address this gap, Reinforcement Learning from Human Feedback (RLHF) is introduced as a pipeline to collect pair-wise human preferences, train a reward model (RM) to model these preferences, and use Reinforcement Learning (RL) to create a model that outputs contents that humans prefer. It has proven challenging to reproduce OpenAI’s RLHF pipeline in the open-source community for several reasons:

RL and RLHF have many subtle implementation details that can significantly impact training stability.

The models are challenging to evaluate; for example, assessing the quality of 800 lines of generated code snippets for a coding task.

They take a long time to train and iterate.

Hugging Face, Mila and Fuxi AI lab researchers have undertaken a unique approach, presenting a high-precision reproduction of the Reinforcement Learning from Human Feedback (RLHF) scaling behaviors reported in OpenAI’s seminal TL;DR summarization work. They meticulously created an RLHF pipeline, focusing on over 20 key implementation details. They adopted a unified learning rate for SFT, RM, and PPO training to enhance reproducibility. 

They used the transformers library’s implementation of the Pythia models in conjunction with DeepSpeed’s ZeRO Stage 2 to fit the models into GPU memory; for 6.9B PPO training, they also offloaded the reference policy and reward model to the CPU. The dropout layers were turned off during training. This is important for PPO training: with dropout active, the log probabilities of tokens are not reproducible, which makes the KL penalty calculation unreliable and causes the PPO ratios to deviate from 1 during the first epoch, creating optimization problems. For consistency, they also turned off dropout for SFT and RM training.
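Disabling dropout is straightforward in practice. The sketch below is our own illustration (the helper name and the example Pythia checkpoint are assumptions, not the authors' code): it zeroes every dropout probability in a Hugging Face model so that repeated forward passes return identical log probabilities.

```python
import torch
from transformers import AutoModelForCausalLM

def disable_dropout(model: torch.nn.Module) -> torch.nn.Module:
    """Set every dropout probability to 0 so token log-probs are deterministic across passes."""
    for module in model.modules():
        if isinstance(module, torch.nn.Dropout):
            module.p = 0.0
    return model

policy = disable_dropout(AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-1b"))
# With dropout off, the PPO importance ratio pi_new / pi_old is exactly 1 on the first epoch,
# and the KL penalty against the reference policy is computed on stable log probabilities.
```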

The PPO implementation optimizes the RLHF objective, leading to a significant increase in the overall score. Their best 6.9B model is preferred by GPT nearly 80% of the time, demonstrating its practical superiority. For their 1B-sized model, the average preference consistency across multiple random experiments is close to 0.4, indicating that the 1B model captures a different set of preferences, a finding with important implications. PPO models are also shown to outperform SFT models across all summary lengths, further reinforcing the practical relevance of the research.

In conclusion, the Hugging Face, Mila, and Fuxi AI lab researchers have reproduced the RLHF scaling behaviors reported in OpenAI’s seminal TL;DR summarization work with high precision. Their RLHF-trained Pythia models demonstrate significant gains in response quality that scale with model size. Notably, their 2.8B and 6.9B models outperform OpenAI’s released 1.3B checkpoint, underscoring the importance of model size in achieving superior results.


Provide live agent assistance for your chatbot users with Amazon Lex a …

Amazon Lex provides advanced conversational artificial intelligence (AI) capabilities to enable self-service support for your organization’s contact center. With Amazon Lex, you can implement an omnichannel strategy where customers engage via phone, websites, and messaging platforms. The bots can answer FAQs, provide self-service experiences, or triage customer requests before transferring to a human agent. Amazon Lex integrates with state-of-the-art contact centers including Amazon Connect, Genesys Cloud, and Amazon Chime SDK to facilitate a seamless omnichannel experience.
This is the second post of a two-part series. The integration of Amazon Lex with Talkdesk cloud contact center is inspired by WaFd Bank (WaFd)’s digital innovation journey to enhance customer experience. In our previous post, we described how Amazon Lex integrates with the Talkdesk cloud contact center for the voice channel. In this post, we are focusing on the chat channel to show how to use Amazon Lex and the Amazon Lex Web UI to enable live agents to interact with your customers in real time. For example, the following figure shows screenshots of a chatbot transitioning a customer to a live agent chat (courtesy of WaFd Bank).

Solution overview
The following diagram illustrates the solution architecture.
In the preceding architecture, the following sequence of steps takes place in a live customer/agent conversation:

Using the Amazon Lex Web UI, a customer asks to be connected to an agent. The associated Amazon Lex chatbot is configured with an escalation intent to process the incoming agent assistance request.
The Amazon Lex fulfillment AWS Lambda function retrieves the Talkdesk touchpoint ID and Talkdesk OAuth secrets from AWS Secrets Manager and initiates a request to Talkdesk Digital Connect using the Start a Conversation API. In the payload, the function includes information that may be useful to an agent, such as the customer sentiment or the history of previously traversed intents (see the code sketch after this list).
If the request to the Talkdesk API is successful, a Talkdesk conversation ID is returned to Amazon Lex.
The Amazon Lex fulfillment Lambda function stores the conversation ID in Amazon Lex session attributes, thus making the conversation ID accessible to the Amazon Lex Web UI.
The Amazon Lex Web UI opens a communication session with agents on the Talkdesk contact center through a WebSocket API in Amazon API Gateway.
The Lambda associated with the WebSocket API first stores the Talkdesk conversation ID to WebSocket client ID mappings in Amazon DynamoDB. Then, through the Talkdesk Send a Message API, the Lambda function sends the customer’s message to the agent on Talkdesk contact center.
Your agent responds to the customer with a message sent through the callback Rest API in API Gateway. The payload includes the conversation ID of the active conversation.
The callback Rest API is configured to support the agents’ incoming messages as well as the agent’s closing of the conversation. In order to send the agent’s message to the customer, the supporting Lambda function reads the WebSocket client ID associated to the conversation ID from the DynamoDB table. This makes sure the agent’s message is delivered to the appropriate WebSocket client ID.
The agent’s response is displayed through the Amazon Lex Web UI and the customer responds or closes the chat as appropriate. Steps 6–9 are repeated as long as the conversation remains active. If the agent ends the conversation, the customer is notified and the WebSocket connection is closed.
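To make steps 2 through 4 concrete (fetching the Talkdesk secrets, starting a conversation, and saving the conversation ID in Lex session attributes), here is a hedged Python sketch of such a fulfillment Lambda function. The secret names come from this post, but the Talkdesk URLs, payload fields, and response parsing are placeholders; consult the Talkdesk Digital Connect API reference for the actual endpoints and schemas.

```python
import json
import boto3
import urllib3

secrets_client = boto3.client("secretsmanager")
http = urllib3.PoolManager()

# Placeholder endpoints: replace with the URLs from your Talkdesk instance and the API docs.
TALKDESK_TOKEN_URL = "https://<your-instance>/oauth/token"
START_CONVERSATION_URL = "https://<your-instance>/digital-connect/conversations"

def get_secret(secret_id):
    return json.loads(secrets_client.get_secret_value(SecretId=secret_id)["SecretString"])

def lambda_handler(event, context):
    touchpoint = get_secret("dev/talkdesk/touchpoint/ids")
    client_keys = get_secret("dev/talkdesk/client/keys")

    # 1. Obtain an OAuth token with the client-credentials grant (scope: digital-connect:write).
    token_resp = http.request(
        "POST", TALKDESK_TOKEN_URL,
        fields={"grant_type": "client_credentials",
                "client_id": client_keys["client_id"],
                "client_secret": client_keys["client_secret"],
                "scope": "digital-connect:write"},
        encode_multipart=False)
    access_token = json.loads(token_resp.data)["access_token"]

    # 2. Start a conversation, passing context that may help the agent (payload fields illustrative).
    payload = {"touchpoint_id": touchpoint["touchpoint_id"],
               "message": event.get("inputTranscript", "")}
    resp = http.request(
        "POST", START_CONVERSATION_URL,
        body=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {access_token}",
                 "Content-Type": "application/json"})
    conversation_id = json.loads(resp.data).get("id")

    # 3. Store the conversation ID in Lex session attributes so the Web UI can use it.
    session_attrs = event.get("sessionState", {}).get("sessionAttributes") or {}
    session_attrs["talkdesk_conversation_id"] = conversation_id
    intent = event["sessionState"]["intent"]
    intent["state"] = "Fulfilled"
    return {"sessionState": {"sessionAttributes": session_attrs,
                             "dialogAction": {"type": "Close"},
                             "intent": intent}}
```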

In the following sections, we walk you through the steps to build the solution architecture. Dependencies among each step are cross-referenced.
Prerequisites
To implement the solution presented in this post, you should first familiarize yourself with the following AWS services and features:

Amazon API Gateway

WebSocket API operations
Rest API operations

Amazon CloudWatch
Amazon DynamoDB
AWS Identity and Access Management (IAM)
AWS Lambda
Amazon Lex

Amazon Lex Web UI

AWS Secrets Manager

Additionally, you should be familiar with the following Talkdesk services:

Talkdesk cloud contact center account
Talkdesk Digital Engagement with chat channel for agents
Talkdesk Digital Connect API – To send messages to the agent, use the following Talkdesk Digital Connect API components:

Start a Conversation – Create an identifier used to interact with an agent
Send a Message – Send a message to the agent

Prepare your Talkdesk instance for the Amazon Lex Web UI chat with an agent
This section outlines the basic steps required to configure the Talkdesk chat with agent experience using the Talkdesk Digital Connect channel. Review Talkdesk APIs for further details for any additional tasks that may be required as part of your specific implementation.
Complete the following steps:

Enable Talkdesk Digital Connect on your Talkdesk instance.
Configure your agents’ accounts and assign them to the agents’ queues.
Build a Talkdesk Studio flow.

This will be used to send chat users to an inbox for agents to assign. A sample is provided with this solution.

To create an integration for your Amazon Lex Web UI instance, in the Talkdesk Builder navigation pane, select Integrations.
On the Actions tab, configure three actions using the input and output schemas provided through the following links:

conversation_ended
conversation_started
message_created

Create a Talkdesk Digital Connect Touchpoint.
Name the Touchpoint Lex Web UI Chat and record the Touchpoint ID.

This will be stored in Secrets Manager as dev/talkdesk/touchpoint/ids.

In Talkdesk Builder, choose OAuth Clients in the navigation pane to set up OAuth credentials.
Select Grant type for Client credentials and set Scope to digital-connect:write.
Record the client ID and secret key from the Keys tab.

These will be stored in Secrets Manager as dev/talkdesk/client/keys and used to authenticate and communicate with the Talkdesk API.

In your AWS account, store the two secrets in Secrets Manager.

The following screenshot shows the details of the Touchpoint ID as a Secrets Manager secret.

The following screenshot shows the details of the client ID as a Secrets Manager secret.

Deploy the Talkdesk Amazon Lex CloudFormation template
The following AWS CloudFormation template creates all the resources of the solution architecture. This includes all necessary IAM roles to invoke API operations, run associated Lambda functions, access secrets on Secrets Manager, and store and retrieve conversation ID and WebSocket client ID pairs from DynamoDB.
To facilitate monitoring and debugging, a CloudWatch log group is created for each of the resources.
The CloudFormation template provides additional details for each of the resources.
Complete the following steps to deploy the template:

Sign in to the AWS Management Console.
Choose Launch Stack for your AWS Region to begin the CloudFormation stack creation process.

US East (N. Virginia)

US West (Oregon)

Asia Pacific (Singapore)

Asia Pacific (Sydney)

Asia Pacific (Tokyo)

Europe (Frankfurt)

Europe (Ireland)

Europe (London)

For Stack name, enter a name.
For TDAUTHHOST, enter the URL of your Talkdesk instance.
Leave the other parameters as default and choose Next.
Select the acknowledgement check boxes and choose Create stack.
After the CloudFormation template is complete, record the values for the following keys on the Outputs tab to use in later steps:

APIGatewayApiKey
BotAliasId
BotId
CallbackRestAPI
WebSocketAPIEndpoint

Update the Talkdesk instance
Log in to your Talkdesk instance and complete the following steps to update your instance:

In Talkdesk Builder, select Integrations in the navigation pane.
On the Settings tab, locate Base path and enter the callback Rest API URL you recorded earlier.
Under Other settings, set x-api-key to the value of the API Gateway key.

Deploy the Amazon Lex Web UI
The solution outlined in this post uses the Amazon Lex Web UI, a full-featured web client to deploy your Amazon Lex chatbot on your website. With the Amazon Lex Web UI, you can quickly bring your chatbot-powered application to life while minimizing time-to-value.

Choose Launch Stack for the Region in which you will use your chatbot:

US East (N. Virginia)

US West (Oregon)

Asia Pacific (Singapore)

Asia Pacific (Sydney)

Asia Pacific (Tokyo)

Europe (Frankfurt)

Europe (Ireland)

Europe (London)

For LexV2BotId, enter the value for BotId.
For LexV2BotAliasId, enter the value for BotAliasId.
Launch the stack.
When deployment is complete, locate the Amazon Simple Storage Service (Amazon S3) URL for WebAppBucket.
Navigate to the S3 bucket on the Amazon S3 console and download the lex-web-ui-loader-config.json file.
Open the file and modify or add the following parameters:

In the connect configuration section, add the new parameter talkDeskWebsocketEndpoint and set its value to the WebSocket endpoint.
In the UI configuration section, set enableLiveChat to true.

Upload the modified lex-web-ui-loader-config.json file and overwrite the previous version of the file in the S3 bucket.
Return to the CloudFormation stack Outputs tab and find the WebAppDomainName link.

This will redirect you to a full-page version of the Amazon Lex Web UI. From here, you can test the Talkdesk integration and confirm that the bot is able to connect to Talkdesk using the WebSocket connection.
Test the solution
Now you’re ready to try the Amazon Lex and Talkdesk chat interaction:

Start your Banking Bot chat window using the WebAppUrl provided as output in the CloudFormation stack.
Log in to your Talkdesk Digital Connect channel and navigate to Conversations.
In the Banking Bot chat window, request to talk to an agent.
Watch the customer’s message being delivered to the Talkdesk Conversations Inbox.
The Talkdesk agent self-assigns the conversation and starts engaging with the customer.

The following video demonstrates the chat experience.

Clean up
To clean up your resources, complete the following steps:

On the AWS CloudFormation console, select Stacks in the navigation pane.
Select the LexTalkdesk stack (or the stack name you provided), and select Delete.
Delete the stack resources by selecting Delete stack.

Conclusion
Amazon Lex brings the power of conversational self-service to your customer preferred channels, such as phone, web chat, and messaging applications. In this post, we demonstrated a solution that provides live agent assistance on your website with Amazon Lex, Amazon Lex Web UI, and Talkdesk cloud contact center. We provided a CloudFormation stack that includes DynamoDB and Lambda resources, and a Rest API and WebSocket API in API Gateway to maintain a communication session with agents in the Talkdesk contact center.
This solution is meant to be a reference architecture or a quick implementation guide that can be tailored to suit your organization’s requirements. If you need help setting up this solution, AWS Professional Services and Talkdesk are available to help you and your team through the process of selecting the right technologies for your cloud contact center.

About the authors
Grazia Russo Lassner is a Senior Consultant with the AWS Professional Services Natural Language AI team. She specialises in designing and developing conversational AI solutions using AWS technologies for customers in various industries. Outside of work, she enjoys beach weekends, reading the latest fiction books, and family time.
Austin Johnson is a Solutions Architect, helping to maintain the Lex Web UI open source library.
Chris Brown is a Principal Natural Language AI consultant at AWS focused on digital customer experiences – including mobile apps, websites, marketing campaigns, and most recently conversational AI applications. Chris is an award-winning strategist and product manager – working with the Fortune 100 to deliver the best experiences for their customers. In his free time, Chris enjoys traveling, music, art, and experiencing new cultures.
Bruno Mateus is a Principal Engineer at Talkdesk. With over 20 years of experience in the software industry, he specialises in large-scale distributed systems. When not working, he enjoys spending time outside with his family, trekking, mountain bike riding, and motorcycle riding.
Jonathan Diedrich is a Principal Solutions Consultant at Talkdesk. He works on enterprise and strategic projects to ensure technical execution and adoption. Outside of work, he enjoys ice hockey and games with the family.
Crispim Tribuna is a Senior Software Engineer at Talkdesk currently focusing on the AI-based virtual agent project. He has over 17 years of experience in computer science, with a focus on telecommunications, IPTV, and fraud prevention. In his free time, he enjoys spending time with his family, running (he has completed three marathons), and riding motorcycles.

Gmail Bulk Sender Rules: Preparing for April 1st Updates & Beyond

Back in October, Google and Yahoo announced their new bulk sender rules. 

These rules, which effectively drew a line in the sand when it came to spam complaint rates, were a big change: it was the first time any major email provider had given such specific directives.

The new rules?

Mandatory digital email signing (domain authentication)

0.3% spam complaint rate threshold

The updates, rolled out February 1, 2024, were a surprise to many and certainly presented a new challenge for email marketers – especially outbound email marketers.

In fact, as we dug into just how these new rules would impact outbound marketers, we found that a lot of people were going to be in real trouble.

The thing is, since the announcement of the new guidelines, we haven’t seen much chatter about it. 

Until now. 

It seems this is more of a slow rollout than a big change that would have an immediate impact, starting with domain authentication. 

April 1st Domain Authentication Requirements

According to Gmail’s group product manager, Neil Kumaran, all bulk senders* will be required to authenticate their email beginning April 1, 2024. 

What’s key here is the word required.

While email authentication should certainly be the norm by now, the fact Google is requiring it shows how serious they are. 

They are so serious that they are flat-out going to reject emails that don’t meet the requirements!

From Forbes:

“Starting from April 1, Google will reject emails from bulk senders unless they meet new authentication requirements. This strict rule is aimed at reducing the amount of spam that lands in Gmail inboxes and enhancing the security of Gmail users. By implementing these new requirements, Google is aiming to prevent malicious actors from using unauthenticated or compromised domains to deliver their dangerous payloads and reduce unwanted spam.”

As you can see, there are real repercussions here for those who choose not to adhere to the Google guidelines. 

Sounds like it’s time to get your emails in line if they aren’t already.

*Note: Current bulk sender rules refer to 5,000+ email sends to Gmail users

Staying in Line with New Google Guidelines

Authenticating your email account is simple and honestly, the majority of ESPs have this capability built in. 

Here are links to a few of our partner sites:

How to Authenticate Your Email with Sendgrid

How to Verify a Domain with Sendlane

What is DMARC and How Do I Set it Up on Klaviyo?

Along with ensuring your email is authenticated, there are other parts of the spam rules that still need to be adhered to, including unsubscribe links. 

In our original deep dive, we noted that clear unsubscribes were crucial. Make it easy for people to unsubscribe and they are less likely to mark you as spam.

Well, come June 1st, not only do bulk senders need to have an unsubscribe link, they have to offer a one-click unsubscribe option and honor requests within 48 hours.

As for the spam complaint rate threshold?

I think we are going to see more about that as these updates continue to roll out. 

In the meantime, make sure you are creating an outbound email marketing program that focuses on warm leads, personalized messaging, and best practices.

How Can Customers.ai Help Prevent Spam Penalties?

One of the main reasons emails get marked as spam is because the recipient has never heard of the sender. 

Our inboxes are already so crowded – don’t bother me with something I’ve never even heard of, right?

That’s where Customers.ai comes in. 

We identify people who are already on your site. 

People who are engaging with high-intent pages.

People who haven’t received an email or ad previously.

These are the people you want to target. 

Forget the cold email lists of the past you had to buy. These are the people who are going to open your emails, engage, improve your email deliverability, and drive up revenue!

Ready to see how we can help?

Sign up for free and start identifying visitors who actually want to hear from you.


Important Next Steps

See what targeted outbound marketing is all about. Capture and engage your first 500 website visitor leads with Customers.ai X-Ray website visitor identification for free.

Talk and learn about sales outreach automation with other growth enthusiasts. Join Customers.ai Island, our Facebook group of 40K marketers and entrepreneurs who are ready to support you.

Advance your marketing performance with Sales Outreach School, a free tutorial and training area for sales pros and marketers.


FedFixer: A Machine Learning Algorithm with the Dual Model Structure t …

In today’s world, where data is distributed across various locations and privacy is paramount, Federated Learning (FL) has emerged as a game-changing solution. It enables multiple parties to train machine learning models collaboratively without sharing their data, ensuring that sensitive information remains locally stored and protected. However, a significant challenge arises when the data labels provided by human annotators are imperfect, leading to heterogeneous label noise distributions across different parties involved in the federated learning process. This issue can severely undermine the performance of FL models, hindering their ability to generalize effectively and make accurate predictions.

Researchers have explored various approaches to address label noise in FL, broadly classified into coarse-grained and fine-grained methods. Coarse-grained methods focus on strategies at the client level, such as selectively choosing clients with low noise ratios or identifying clean client sets. On the other hand, fine-grained methods concentrate on techniques at the sample level, aiming to identify and filter out noisy label samples from individual clients.

However, a common limitation of these existing methods is that they often overlook the inherent heterogeneity of label noise distributions across clients. This heterogeneity can arise from varying true class distributions or personalized human labeling errors, making it challenging to achieve substantial performance improvements.

To tackle this issue head-on, a team of researchers from Xi’an Jiaotong University, Leiden University, Docta AI, California State University, Monterey Bay, and the University of California, Santa Cruz, has proposed FedFixer. This innovative algorithm leverages a dual model structure consisting of a global model and a personalized model. The global model benefits from aggregated updates across clients, robustly representing the overall data distribution.

Conversely, the personalized model is specifically designed to adapt to the unique characteristics of each client’s data, including client-specific samples and label noise patterns.

In their groundbreaking approach, the researchers behind FedFixer have incorporated two key regularization techniques to combat the potential overfitting of the dual models, particularly the personalized model, which is trained on limited local data.

The first technique is a confidence regularizer, which modifies the traditional Cross-Entropy loss function to alleviate the impact of unconfident predictions caused by label noise. By incorporating a term that encourages the model to produce confident predictions, the confidence regularizer guides the model towards better fitting the clean dataset, reducing the influence of noisy label samples.

The second technique is a distance regularizer, which constrains the disparity between the personalized and global models. This regularizer is implemented by adding a term to the loss function that penalizes the deviation of the personalized model’s parameters from the global model’s parameters. The distance regularizer acts as a stabilizing force, preventing the personalized model from overfitting to local noisy data due to the limited sample size available on each client.

Furthermore, FedFixer employs an alternative update strategy for the dual models during the local training. The global and personalized models are updated using the samples selected by each other’s model. This alternating update process leverages the complementary strengths of the two models, effectively decreasing the risk of error accumulation from a single model over time.
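A hedged sketch of one local training round is given below. The small-loss sample-selection rule, the regularizer weight, and the loss forms are illustrative assumptions; in particular, the paper's confidence regularizer is only approximated here by plain cross-entropy, while the distance regularizer and the alternating cross-model selection match the structure described above.

```python
import torch
import torch.nn.functional as F

def local_round(global_model, personal_model, loader, opt_g, opt_p, lam=0.1):
    """Illustrative FedFixer-style local update: each model selects likely-clean samples
    for the other, and the personalized model is tethered to the global one."""
    for x, y in loader:
        with torch.no_grad():
            loss_g = F.cross_entropy(global_model(x), y, reduction="none")
            loss_p = F.cross_entropy(personal_model(x), y, reduction="none")
            keep_for_p = loss_g <= loss_g.median()   # global model picks samples for the personal model
            keep_for_g = loss_p <= loss_p.median()   # personal model picks samples for the global model

        # Update the global model on samples selected by the personalized model.
        opt_g.zero_grad()
        F.cross_entropy(global_model(x[keep_for_g]), y[keep_for_g]).backward()
        opt_g.step()

        # Update the personalized model on samples selected by the global model,
        # plus a distance regularizer keeping its weights close to the global weights.
        opt_p.zero_grad()
        loss = F.cross_entropy(personal_model(x[keep_for_p]), y[keep_for_p])
        dist = sum((wp - wg.detach()).pow(2).sum()
                   for wp, wg in zip(personal_model.parameters(), global_model.parameters()))
        (loss + lam * dist).backward()
        opt_p.step()
```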

The researchers conducted extensive experiments on benchmark datasets, including MNIST, CIFAR-10, and Clothing1M, with varying degrees of label noise and heterogeneity. The results demonstrate that FedFixer outperforms existing state-of-the-art methods, particularly in highly heterogeneous label noise scenarios. For example, on the CIFAR-10 dataset with a non-IID distribution, a noisy client ratio of 1.0, and a lower bound noise level of 0.5, FedFixer achieved an accuracy of 59.01%, up to 10% higher than other methods.

To illustrate the potential real-world impact, consider a healthcare application where federated learning is employed to collaboratively train diagnostic models across multiple hospitals while preserving patient data privacy. In such a scenario, label noise can arise due to variations in medical expertise, subjective interpretations, or human errors during the annotation process. FedFixer’s ability to handle heterogeneous label noise distributions would be invaluable, as it could effectively filter out mislabeled data and improve the generalization performance of the diagnostic models, ultimately leading to more accurate and reliable predictions that could save lives.

In conclusion, the research paper introduces FedFixer, an innovative approach to mitigating the impact of heterogeneous label noise in Federated Learning. By employing a dual model structure with regularization techniques and alternative updates, FedFixer effectively identifies and filters out noisy label samples across clients, improving generalization performance, especially in highly heterogeneous label noise scenarios. The proposed method’s effectiveness has been extensively validated through experiments on benchmark datasets, demonstrating its potential for real-world applications where data privacy and label noise are significant concerns, such as in the healthcare domain or any other field where accurate and reliable predictions are crucial.


Researchers at the University of Maryland Propose a Unified Machine Le …

Continual Learning (CL) is a method that focuses on gaining knowledge from dynamically changing data distributions. This technique mimics real-world scenarios and helps improve the performance of a model as it encounters new data while retaining previous information. However, CL faces a challenge called catastrophic forgetting, in which the model forgets or overwrites previous knowledge when learning new information.

Researchers have introduced various methods to address this limitation of CL. Strategies like Bayesian-based techniques, regularization-driven solutions, and memory-replay-oriented methodologies have been developed. However, they lack a cohesive framework and a standardized terminology for their formulation. In this research paper, the authors from the University of Maryland, College Park, and JD Explore Academy introduce a unified and general framework for CL that encompasses and reconciles these existing methods.

Their work is inspired by the ability of the human brain to selectively forget certain things to enable more efficient cognitive processes. The researchers have introduced a refresh learning mechanism that first unlearns and then relearns the current loss function. Forgetting less relevant details enables the model to learn new tasks without significantly impacting its performance on previously learned tasks. This mechanism has a seamless integration capability and is easily compatible with existing CL methods, allowing for an enhanced overall performance.

The researchers demonstrated the capabilities of their method by providing an in-depth theoretical analysis. They showed that their method minimized the Fisher Information Matrix weighted gradient norm of the loss function and encouraged the flattening of the loss landscape, which resulted in an improved generalization.
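As a rough illustration (not the authors' exact update rule, which is motivated by the Fisher-information analysis above), the unlearn-then-relearn idea can be sketched as a brief gradient ascent step on the current mini-batch loss followed by the usual descent step.

```python
import torch

def refresh_step(model, loss_fn, x, y, optimizer, unlearn_lr=1e-3):
    """Illustrative refresh-learning step: briefly unlearn (ascend) then relearn (descend)."""
    # Unlearn: a small gradient ascent step on the current loss.
    model.zero_grad()
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p.add_(unlearn_lr * p.grad)   # move up the loss surface

    # Relearn: the standard gradient descent step on the same batch.
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()
```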

The researchers also conducted various experiments on different datasets, including CIFAR10, CIFAR100, and Tiny-ImageNet, to assess the effectiveness of their method. The results showed that by using the refresh plug-in, the performance of the compared methods improved significantly, highlighting the effectiveness and general applicability of the refresh mechanism.

In conclusion, the authors of this research paper address the limitations associated with CL by introducing a unified framework that encompasses and reconciles the existing methods. They also introduce a novel approach called refresh learning that enables models to unlearn or forget less relevant information, which improves their overall performance. They validated their work through various experiments that demonstrated the effectiveness of their method. This research represents a significant advancement in the field of CL and offers a unified and adaptable solution.


This AI Paper Explores the Impact of Model Compression on Subgroup Rob …

The significant computational demands of large language models (LLMs) have hindered their adoption across various sectors. This hindrance has shifted attention towards compression techniques designed to reduce the model size and computational needs without major performance trade-offs. This pivot is crucial in Natural Language Processing (NLP), facilitating applications from document classification to advanced conversational agents. A pressing concern in this transition is ensuring compressed models maintain robustness towards minority subgroups in datasets defined by specific labels and attributes. 

Previous works have focused on knowledge distillation, pruning, quantization, and vocabulary transfer, which aim to retain the essence of the original models in much smaller footprints. Similar efforts have explored the effects of model compression on classes or attributes in images, such as imbalanced classes and sensitive attributes. These approaches have shown promise in maintaining overall performance metrics; however, their impact on the nuanced metric of subgroup robustness remains underexplored.

A research team from the University of Sussex, BCAM Severo Ochoa Strategic Lab on Trustworthy Machine Learning, Monash University, and expert.ai have proposed a comprehensive investigation into the effects of model compression on the subgroup robustness of BERT language models. The study uses MultiNLI, CivilComments, and SCOTUS datasets to explore 18 different compression methods, including knowledge distillation, pruning, quantization, and vocabulary transfer.

The methodology employed in this study involved training each compressed BERT model using Empirical Risk Minimization (ERM) with five distinct initializations. The aim was to gauge the models’ efficacy through metrics like average accuracy, worst-group accuracy (WGA), and overall model size. Different datasets required tailored approaches for fine-tuning, involving variable epochs, batch sizes, and learning rates specific to each. For methods involving vocabulary transfer, an initial phase of masked-language modeling was conducted before the fine-tuning process, ensuring the models were adequately prepared for the compression’s impact.
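Worst-group accuracy is simply the minimum per-group accuracy over the (label, attribute) subgroups. The small helper below makes the metric concrete; the function name and signature are our own illustration rather than the paper's evaluation code.

```python
from collections import defaultdict

def worst_group_accuracy(preds, labels, groups):
    """Accuracy per subgroup (e.g. a (label, attribute) pair) and the minimum over groups."""
    correct, total = defaultdict(int), defaultdict(int)
    for p, y, g in zip(preds, labels, groups):
        total[g] += 1
        correct[g] += int(p == y)
    per_group = {g: correct[g] / total[g] for g in total}
    return min(per_group.values()), per_group
```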

Findings highlight significant variances in model performance across different compression techniques. For instance, in the MultiNLI dataset, models like TinyBERT6 outperformed the baseline BERTBase model, showcasing an 85.26% average accuracy with a notable 72.74% worst-group accuracy (WGA). Conversely, when applied to the SCOTUS dataset, a stark performance drop was observed, with some models’ WGA collapsing to 0%, indicating a critical threshold of model capacity for effectively managing subgroup robustness. 

To conclude, this research sheds light on the nuanced impacts of model compression techniques on the robustness of BERT models towards minority subgroups across several datasets. The analysis highlighted that compression methods can improve the performance of language models on minority subgroups, but this effectiveness can vary depending on the dataset and weight initialization after compression. The study’s limitations include focusing on English language datasets and not considering combinations of compression methods.


Advanced RAG patterns on Amazon SageMaker

Today, customers of all industries—whether it’s financial services, healthcare and life sciences, travel and hospitality, media and entertainment, telecommunications, software as a service (SaaS), and even proprietary model providers—are using large language models (LLMs) to build applications like question and answering (QnA) chatbots, search engines, and knowledge bases. These generative AI applications are not only used to automate existing business processes, but also have the ability to transform the experience for customers using these applications. With the advancements being made with LLMs like the Mixtral-8x7B Instruct, derivative of architectures such as the mixture of experts (MoE), customers are continuously looking for ways to improve the performance and accuracy of generative AI applications while allowing them to effectively use a wider range of closed and open source models.
A number of techniques are typically used to improve the accuracy and performance of an LLM’s output, such as fine-tuning with parameter efficient fine-tuning (PEFT), reinforcement learning from human feedback (RLHF), and performing knowledge distillation. However, when building generative AI applications, you can use an alternative solution that allows for the dynamic incorporation of external knowledge and allows you to control the information used for generation without the need to fine-tune your existing foundational model. This is where Retrieval Augmented Generation (RAG) comes in, specifically for generative AI applications as opposed to the more expensive and robust fine-tuning alternatives we’ve discussed. If you’re implementing complex RAG applications into your daily tasks, you may encounter common challenges with your RAG systems such as inaccurate retrieval, increasing size and complexity of documents, and overflow of context, which can significantly impact the quality and reliability of generated answers.
This post discusses RAG patterns to improve response accuracy using LangChain and tools such as the parent document retriever in addition to techniques like contextual compression in order to enable developers to improve existing generative AI applications.
Solution overview
In this post, we demonstrate the use of Mixtral-8x7B Instruct text generation combined with the BGE Large En embedding model to efficiently construct a RAG QnA system on an Amazon SageMaker notebook using the parent document retriever tool and contextual compression technique. The following diagram illustrates the architecture of this solution.

You can deploy this solution with just a few clicks using Amazon SageMaker JumpStart, a fully managed platform that offers state-of-the-art foundation models for various use cases such as content writing, code generation, question answering, copywriting, summarization, classification, and information retrieval. It provides a collection of pre-trained models that you can deploy quickly and with ease, accelerating the development and deployment of machine learning (ML) applications. One of the key components of SageMaker JumpStart is the Model Hub, which offers a vast catalog of pre-trained models, such as the Mixtral-8x7B, for a variety of tasks.
Mixtral-8x7B uses an MoE architecture. This architecture allows different parts of a neural network to specialize in different tasks, effectively dividing the workload among multiple experts. This approach enables the efficient training and deployment of larger models compared to traditional architectures.
One of the main advantages of the MoE architecture is its scalability. By distributing the workload across multiple experts, MoE models can be trained on larger datasets and achieve better performance than traditional models of the same size. Additionally, MoE models can be more efficient during inference because only a subset of experts needs to be activated for a given input.
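As a rough illustration of the routing idea (a toy sketch, not Mixtral's actual implementation), the following shows a gating function that scores all experts but evaluates only the top-k of them for each input:

import numpy as np

def moe_forward(x, experts, gate_w, top_k=2):
    """Toy mixture-of-experts layer: route the input to its top-k experts only."""
    logits = gate_w @ x                        # one gating score per expert
    top = np.argsort(logits)[-top_k:]          # indices of the top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                   # softmax over the selected experts only
    # Only the selected experts are evaluated, which is what keeps inference efficient
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Example: 4 experts (simple linear maps), but only 2 run for any given input
rng = np.random.default_rng(0)
dim, num_experts = 8, 4
experts = [lambda v, W=rng.normal(size=(dim, dim)): W @ v for _ in range(num_experts)]
gate_w = rng.normal(size=(num_experts, dim))
output = moe_forward(rng.normal(size=dim), experts, gate_w)
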
For more information on Mixtral-8x7B Instruct on AWS, refer to Mixtral-8x7B is now available in Amazon SageMaker JumpStart. The Mixtral-8x7B model is made available under the permissive Apache 2.0 license, for use without restrictions.
In this post, we discuss how you can use LangChain to create effective and more efficient RAG applications. LangChain is an open source Python library designed to build applications with LLMs. It provides a modular and flexible framework for combining LLMs with other components, such as knowledge bases, retrieval systems, and other AI tools, to create powerful and customizable applications.
We walk through constructing a RAG pipeline on SageMaker with Mixtral-8x7B. We use the Mixtral-8x7B Instruct text generation model with the BGE Large En embedding model to create an efficient QnA system using RAG on a SageMaker notebook. We use an ml.t3.medium instance to demonstrate deploying LLMs via SageMaker JumpStart, which can be accessed through a SageMaker-generated API endpoint. This setup allows for the exploration, experimentation, and optimization of advanced RAG techniques with LangChain. We also illustrate the integration of the FAISS Embedding store into the RAG workflow, highlighting its role in storing and retrieving embeddings to enhance the system’s performance.
We perform a brief walkthrough of the SageMaker notebook. For more detailed and step-by-step instructions, refer to the Advanced RAG Patterns with Mixtral on SageMaker Jumpstart GitHub repo.
The need for advanced RAG patterns
Advanced RAG patterns are essential to improve upon the current capabilities of LLMs in processing, understanding, and generating human-like text. As the size and complexity of documents increase, representing multiple facets of the document in a single embedding can lead to a loss of specificity. Although it’s essential to capture the general essence of a document, it’s equally crucial to recognize and represent the varied sub-contexts within. This is a challenge you are often faced with when working with larger documents. Another challenge with RAG is that with retrieval, you aren’t aware of the specific queries that your document storage system will deal with upon ingestion. This could lead to information most relevant to a query being buried under text (context overflow). To mitigate failure and improve upon the existing RAG architecture, you can use advanced RAG patterns (parent document retriever and contextual compression) to reduce retrieval errors, enhance answer quality, and enable complex question handling.
With the techniques discussed in this post, you can address key challenges associated with external knowledge retrieval and integration, enabling your application to deliver more precise and contextually aware responses.
In the following sections, we explore how parent document retrievers and contextual compression can help you deal with some of the problems we’ve discussed.
Parent document retriever
In the previous section, we highlighted challenges that RAG applications encounter when dealing with extensive documents. To address these challenges, parent document retrievers categorize and designate incoming documents as parent documents. These documents are recognized for their comprehensive nature but aren’t directly utilized in their original form for embeddings. Rather than compressing an entire document into a single embedding, parent document retrievers dissect these parent documents into child documents. Each child document captures distinct aspects or topics from the broader parent document. Following the identification of these child segments, individual embeddings are assigned to each, capturing their specific thematic essence (see the following diagram). During retrieval, the parent document is invoked. This technique provides targeted yet broad-ranging search capabilities, furnishing the LLM with a wider perspective. Parent document retrievers provide LLMs with a twofold advantage: the specificity of child document embeddings for precise and relevant information retrieval, coupled with the invocation of parent documents for response generation, which enriches the LLM’s outputs with a layered and thorough context.

Contextual compression
To address the issue of context overflow discussed earlier, you can use contextual compression to compress and filter the retrieved documents in alignment with the query’s context, so only pertinent information is kept and processed. This is achieved through a combination of a base retriever for initial document fetching and a document compressor for refining these documents by paring down their content or excluding them entirely based on relevance, as illustrated in the following diagram. This streamlined approach, facilitated by the contextual compression retriever, greatly enhances RAG application efficiency by providing a method to extract and utilize only what’s essential from a mass of information. It tackles the issue of information overload and irrelevant data processing head-on, leading to improved response quality, more cost-effective LLM operations, and a smoother overall retrieval process. Essentially, it’s a filter that tailors the information to the query at hand, making it a much-needed tool for developers aiming to optimize their RAG applications for better performance and user satisfaction.

Prerequisites
If you’re new to SageMaker, refer to the Amazon SageMaker Development Guide.
Before you get started with the solution, create an AWS account. When you create an AWS account, you get a single sign-on (SSO) identity that has complete access to all the AWS services and resources in the account. This identity is called the AWS account root user.
Signing in to the AWS Management Console using the email address and password that you used to create the account gives you complete access to all the AWS resources in your account. We strongly recommend that you do not use the root user for everyday tasks, even the administrative ones.
Instead, adhere to the security best practices in AWS Identity and Access Management (IAM), and create an administrative user and group. Then securely lock away the root user credentials and use them to perform only a few account and service management tasks.
The Mixtral-8x7b model requires an ml.g5.48xlarge instance. SageMaker JumpStart provides a simplified way to access and deploy over 100 different open source and third-party foundation models. In order to launch an endpoint to host Mixtral-8x7B from SageMaker JumpStart, you may need to request a service quota increase to access an ml.g5.48xlarge instance for endpoint usage. You can request service quota increases through the console, AWS Command Line Interface (AWS CLI), or API to allow access to those additional resources.
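As an aside not covered in the original walkthrough, the increase can also be requested programmatically with boto3's Service Quotas client; the quota code below is a placeholder that you would look up in the Service Quotas console:

import boto3

# Hypothetical sketch: request one ml.g5.48xlarge instance for SageMaker endpoint usage.
# Replace the placeholder QuotaCode with the code shown for that quota in the console.
client = boto3.client("service-quotas")
response = client.request_service_quota_increase(
    ServiceCode="sagemaker",
    QuotaCode="L-XXXXXXXX",  # placeholder for "ml.g5.48xlarge for endpoint usage"
    DesiredValue=1.0,
)
print(response)
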
Set up a SageMaker notebook instance and install dependencies
To get started, create a SageMaker notebook instance and install the required dependencies. Refer to the GitHub repo to ensure a successful setup. After you set up the notebook instance, you can deploy the model.
You can also run the notebook locally in your preferred integrated development environment (IDE). Make sure that you have JupyterLab installed.
Deploy the model
Deploy the Mixtral-8X7B Instruct LLM model on SageMaker JumpStart:

# Import the JumpStartModel class from the SageMaker JumpStart library
from sagemaker.jumpstart.model import JumpStartModel

# Specify the model ID for the HuggingFace Mixtral 8x7b Instruct LLM model
model_id = "huggingface-llm-mixtral-8x7b-instruct"
model = JumpStartModel(model_id=model_id)
llm_predictor = model.deploy()

Deploy the BGE Large En embedding model on SageMaker JumpStart:

# Specify the model ID for the HuggingFace BGE Large EN Embedding model
model_id = "huggingface-sentencesimilarity-bge-large-en"
text_embedding_model = JumpStartModel(model_id=model_id)
embedding_predictor = text_embedding_model.deploy()

Set up LangChain
After importing all the necessary libraries and deploying the Mixtral-8x7B model and BGE Large En embeddings model, you can now set up LangChain. For step-by-step instructions, refer to the GitHub repo.
Data preparation
In this post, we use several years of Amazon’s Letters to Shareholders as a text corpus to perform QnA on. For more detailed steps to prepare the data, refer to the GitHub repo.
Question answering
Once the data is prepared, you can use the wrapper provided by LangChain, which wraps around the vector store and takes input for the LLM. This wrapper performs the following steps (a sketch of one way such a wrapper might be constructed follows the list):

Take the input question.
Create a question embedding.
Fetch relevant documents.
Incorporate the documents and the question into a prompt.
Invoke the model with the prompt and generate the answer in a readable manner.
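The notebook defines this wrapper, the llm object, and the FAISS vector store as described in the GitHub repo. As a minimal sketch, assuming LangChain's VectorStoreIndexWrapper and an already-built FAISS store (vectorstore_faiss, constructed as shown later in this post), it might look like the following:

from langchain.indexes.vectorstore import VectorStoreIndexWrapper

# Wrap the FAISS vector store so a single query() call runs the retrieve-then-generate loop
wrapper_store_faiss = VectorStoreIndexWrapper(vectorstore=vectorstore_faiss)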

Now that the vector store is in place, you can start asking questions:

from langchain.prompts import PromptTemplate

# Wrap the user query in the Mixtral Instruct prompt format
prompt_template = """<s>[INST]
{query}
[/INST]"""
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["query"]
)
query = "How has AWS evolved?"
answer = wrapper_store_faiss.query(question=PROMPT.format(query=query), llm=llm)
print(answer)
AWS, or Amazon Web Services, has evolved significantly since its initial launch in 2006. It started as a feature-poor service, offering only one instance size, in one data center, in one region of the world, with Linux operating system instances only. There was no monitoring, load balancing, auto-scaling, or persistent storage at the time. However, AWS had a successful launch and has since grown into a multi-billion-dollar service.

Over the years, AWS has added numerous features and services, with over 3,300 new ones launched in 2022 alone. They have expanded their offerings to include Windows, monitoring, load balancing, auto-scaling, and persistent storage. AWS has also made significant investments in long-term inventions that have changed what’s possible in technology infrastructure.

One example of this is their investment in chip development. AWS has also seen a robust new customer pipeline and active migrations, with many companies opting to move to AWS for the agility, innovation, cost-efficiency, and security benefits it offers. AWS has transformed how customers, from start-ups to multinational companies to public sector organizations, manage their technology infrastructure.

Regular retriever chain
In the preceding scenario, we explored the quick and straightforward way to get a context-aware answer to your question. Now let’s look at a more customizable option with the help of RetrievalQA, where you can customize how the documents fetched should be added to the prompt using the chain_type parameter. Also, in order to control how many relevant documents should be retrieved, you can change the k parameter in the following code to see different outputs. In many scenarios, you might want to know which source documents the LLM used to generate the answer. You can get those documents in the output using return_source_documents, which returns the documents that are added to the context of the LLM prompt. RetrievalQA also allows you to provide a custom prompt template that can be specific to the model.

from langchain.chains import RetrievalQA

prompt_template = """<s>[INST]
Use the following pieces of context to provide a concise answer to the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}

[/INST]"""
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore_faiss.as_retriever(
        search_type="similarity", search_kwargs={"k": 3}
    ),
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT},
)

Let’s ask a question:

query = "How did AWS evolve?"
result = qa({"query": query})
print(result['result'])
AWS (Amazon Web Services) evolved from an initially unprofitable investment to an $85B annual revenue run rate business with strong profitability, offering a wide range of services and features, and becoming a significant part of Amazon’s portfolio. Despite facing skepticism and short-term headwinds, AWS continued to innovate, attract new customers, and migrate active customers, offering benefits such as agility, innovation, cost-efficiency, and security. AWS also expanded its long-term investments, including chip development, to provide new capabilities and change what’s possible for its customers.

Parent document retriever chain
Let’s look at a more advanced RAG option with the help of ParentDocumentRetriever. When working with document retrieval, you may encounter a trade-off between storing small chunks of a document for accurate embeddings and larger documents to preserve more context. The parent document retriever strikes that balance by splitting and storing small chunks of data.
We use a parent_splitter to divide the original documents into larger chunks called parent documents and a child_splitter to create smaller child documents from the original documents:

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

# This text splitter is used to create the parent documents
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000)

# This text splitter is used to create the child documents
# It should create documents smaller than the parent
child_splitter = RecursiveCharacterTextSplitter(chunk_size=400)

# The vector store used to index the child chunks
# (sagemaker_embeddings is assumed to be the LangChain embeddings wrapper around the
# deployed BGE endpoint, defined as in the GitHub repo)
vectorstore_faiss = FAISS.from_documents(
    child_splitter.split_documents(documents),
    sagemaker_embeddings,
)

The child documents are then indexed in a vector store using embeddings. This enables efficient retrieval of relevant child documents based on similarity. To retrieve relevant information, the parent document retriever first fetches the child documents from the vector store. It then looks up the parent IDs for those child documents and returns the corresponding larger parent documents.
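The retriever variable used in the next snippet comes from the notebook; as a sketch of how it might be wired together, assuming LangChain's ParentDocumentRetriever and an in-memory document store for the parents:

from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore

# The docstore holds the full parent documents; the vector store holds child-chunk embeddings
store = InMemoryStore()
retriever = ParentDocumentRetriever(
    vectorstore=vectorstore_faiss,
    docstore=store,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)
# Split, embed, and index the documents, keeping track of parent-child links
retriever.add_documents(documents)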

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT},
)

Let’s ask a question:

query = "How did AWS evolve?"
result = qa({"query": query})
print(result['result'])
AWS (Amazon Web Services) started with a feature-poor initial launch of the Elastic Compute Cloud (EC2) service in 2006, providing only one instance size, in one data center, in one region of the world, with Linux operating system instances only, and without many key features like monitoring, load balancing, auto-scaling, or persistent storage. However, AWS’s success allowed them to quickly iterate and add the missing capabilities, eventually expanding to offer various flavors, sizes, and optimizations of compute, storage, and networking, as well as developing their own chips (Graviton) to push price and performance further. AWS’s iterative innovation process required significant investments in financial and people resources over 20 years, often well in advance of when it would pay out, to meet customer needs and improve long-term customer experiences, loyalty, and returns for shareholders.

Contextual compression chain
Let’s look at another advanced RAG option called contextual compression. One challenge with retrieval is that you usually don’t know the specific queries your document storage system will face when you ingest data into the system. This means that the information most relevant to a query may be buried in a document with a lot of irrelevant text. Passing that full document through your application can lead to more expensive LLM calls and poorer responses.
The contextual compression retriever addresses the challenge of retrieving relevant information from a document storage system, where the pertinent data may be buried within documents containing a lot of irrelevant text. By compressing and filtering the retrieved documents based on the given query context, only the most relevant information is returned.
To use the contextual compression retriever, you’ll need:

A base retriever – This is the initial retriever that fetches documents from the storage system based on the query
A document compressor – This component takes the initially retrieved documents and shortens them by reducing the contents of individual documents or dropping irrelevant documents altogether, using the query context to determine relevance

Adding contextual compression with an LLM chain extractor
First, wrap your base retriever with a ContextualCompressionRetriever. You’ll add an LLMChainExtractor, which will iterate over the initially returned documents and extract from each only the content that is relevant to the query.

from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

text_splitter = RecursiveCharacterTextSplitter(
# Set a really small chunk size, just to show.
chunk_size=1000,
chunk_overlap=100,
)

docs = text_splitter.split_documents(documents)
retriever = FAISS.from_documents(
docs,
sagemaker_embeddings,
).as_retriever()

compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
base_compressor=compressor, base_retriever=retriever
)

compressed_docs = compression_retriever.get_relevant_documents(
    "How was Amazon impacted by COVID-19?"
)

Initialize the chain using the ContextualCompressionRetriever with an LLMChainExtractor and pass the prompt in via the chain_type_kwargs argument.

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=compression_retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT},
)

Let’s ask a question:

query = "How did AWS evolve?"
result = qa({"query": query})
print(result['result'])
AWS evolved by starting as a small project inside Amazon, requiring significant capital investment and facing skepticism from both inside and outside the company. However, AWS had a head start on potential competitors and believed in the value it could bring to customers and Amazon. AWS made a long-term commitment to continue investing, resulting in over 3,300 new features and services launched in 2022. AWS has transformed how customers manage their technology infrastructure and has become an $85B annual revenue run rate business with strong profitability. AWS has also continuously improved its offerings, such as enhancing EC2 with additional features and services after its initial launch.

Filter documents with an LLM chain filter
The LLMChainFilter is a slightly simpler but more robust compressor that uses an LLM chain to decide which of the initially retrieved documents to filter out and which ones to return, without manipulating the document contents:

from langchain.retrievers.document_compressors import LLMChainFilter

_filter = LLMChainFilter.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
base_compressor=_filter, base_retriever=retriever
)

compressed_docs = compression_retriever.get_relevant_documents(
    "How was Amazon impacted by COVID-19?"
)
print(compressed_docs)

Initialize the chain using the ContextualCompressionRetriever with an LLMChainFilter and pass the prompt in via the chain_type_kwargs argument.

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=compression_retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT},
)

Let’s ask a question:

query = "How did AWS evolve?"
result = qa({"query": query})
print(result['result'])
AWS (Amazon Web Services) evolved by initially launching feature-poor but iterating quickly based on customer feedback to add necessary capabilities. This approach allowed AWS to launch EC2 in 2006 with limited features and then continuously add new functionalities, such as additional instance sizes, data centers, regions, operating system options, monitoring tools, load balancing, auto-scaling, and persistent storage. Over time, AWS transformed from a feature-poor service to a multi-billion-dollar business by focusing on customer needs, agility, innovation, cost-efficiency, and security. AWS now has an $85B annual revenue run rate and offers over 3,300 new features and services each year, catering to a wide range of customers from start-ups to multinational companies and public sector organizations.

Compare results
The following comparison shows how each technique responds to two queries: Query 1 ("How did AWS evolve?") and Query 2 ("Why is Amazon successful?"). For each technique, the first paragraph is the response to Query 1, the second paragraph is the response to Query 2, and the third paragraph analyzes the results.

Regular Retriever Chain Output
AWS (Amazon Web Services) evolved from an initially unprofitable investment to an $85B annual revenue run rate business with strong profitability, offering a wide range of services and features, and becoming a significant part of Amazon’s portfolio. Despite facing skepticism and short-term headwinds, AWS continued to innovate, attract new customers, and migrate active customers, offering benefits such as agility, innovation, cost-efficiency, and security. AWS also expanded its long-term investments, including chip development, to provide new capabilities and change what’s possible for its customers.
Amazon is successful due to its continuous innovation and expansion into new areas such as technology infrastructure services, digital reading devices, voice-driven personal assistants, and new business models like the third-party marketplace. Its ability to scale operations quickly, as seen in the rapid expansion of its fulfillment and transportation networks, also contributes to its success. Additionally, Amazon’s focus on optimization and efficiency gains in its processes has resulted in productivity improvements and cost reductions. The example of Amazon Business highlights the company’s capability to leverage its e-commerce and logistics strengths in different sectors.
Based on the responses from the regular retriever chain, we notice that although it provides long answers, it suffers from context overflow and fails to mention significant details from the corpus in response to the query. The regular retrieval chain is not able to capture nuances with depth or contextual insight, potentially missing critical aspects of the document.

Parent Document Retriever Output
AWS (Amazon Web Services) started with a feature-poor initial launch of the Elastic Compute Cloud (EC2) service in 2006, providing only one instance size, in one data center, in one region of the world, with Linux operating system instances only, and without many key features like monitoring, load balancing, auto-scaling, or persistent storage. However, AWS’s success allowed them to quickly iterate and add the missing capabilities, eventually expanding to offer various flavors, sizes, and optimizations of compute, storage, and networking, as well as developing their own chips (Graviton) to push price and performance further. AWS’s iterative innovation process required significant investments in financial and people resources over 20 years, often well in advance of when it would pay out, to meet customer needs and improve long-term customer experiences, loyalty, and returns for shareholders.
Amazon is successful due to its ability to constantly innovate, adapt to changing market conditions, and meet customer needs in various market segments. This is evident in the success of Amazon Business, which has grown to drive roughly $35B in annualized gross sales by delivering selection, value, and convenience to business customers. Amazon’s investments in ecommerce and logistics capabilities have also enabled the creation of services like Buy with Prime, which helps merchants with direct-to-consumer websites drive conversion from views to purchases.
The parent document retriever delves deeper into the specifics of AWS’s growth strategy, including the iterative process of adding new features based on customer feedback and the detailed journey from a feature-poor initial launch to a dominant market position, while providing a context-rich response. Responses cover a wide range of aspects, from technical innovations and market strategy to organizational efficiency and customer focus, providing a holistic view of the factors contributing to success along with examples. This can be attributed to the parent document retriever’s targeted yet broad-ranging search capabilities.

LLM Chain Extractor: Contextual Compression Output
AWS evolved by starting as a small project inside Amazon, requiring significant capital investment and facing skepticism from both inside and outside the company. However, AWS had a head start on potential competitors and believed in the value it could bring to customers and Amazon. AWS made a long-term commitment to continue investing, resulting in over 3,300 new features and services launched in 2022. AWS has transformed how customers manage their technology infrastructure and has become an $85B annual revenue run rate business with strong profitability. AWS has also continuously improved its offerings, such as enhancing EC2 with additional features and services after its initial launch.
Based on the provided context, Amazon’s success can be attributed to its strategic expansion from a book-selling platform to a global marketplace with a vibrant third-party seller ecosystem, early investment in AWS, innovation in introducing the Kindle and Alexa, and substantial growth in annual revenue from 2019 to 2022. This growth led to the expansion of the fulfillment center footprint, creation of a last-mile transportation network, and building a new sortation center network, which were optimized for productivity and cost reductions.
The LLM chain extractor maintains a balance between covering key points comprehensively and avoiding unnecessary depth. It dynamically adjusts to the query’s context, so the output is directly relevant and comprehensive.

LLM Chain Filter: Contextual Compression Output
AWS (Amazon Web Services) evolved by initially launching feature-poor but iterating quickly based on customer feedback to add necessary capabilities. This approach allowed AWS to launch EC2 in 2006 with limited features and then continuously add new functionalities, such as additional instance sizes, data centers, regions, operating system options, monitoring tools, load balancing, auto-scaling, and persistent storage. Over time, AWS transformed from a feature-poor service to a multi-billion-dollar business by focusing on customer needs, agility, innovation, cost-efficiency, and security. AWS now has an $85B annual revenue run rate and offers over 3,300 new features and services each year, catering to a wide range of customers from start-ups to multinational companies and public sector organizations.
Amazon is successful due to its innovative business models, continuous technological advancements, and strategic organizational changes. The company has consistently disrupted traditional industries by introducing new ideas, such as an ecommerce platform for various products and services, a third-party marketplace, cloud infrastructure services (AWS), the Kindle e-reader, and the Alexa voice-driven personal assistant. Additionally, Amazon has made structural changes to improve its efficiency, such as reorganizing its US fulfillment network to decrease costs and delivery times, further contributing to its success.
Similar to the LLM chain extractor, the LLM chain filter makes sure that although the key points are covered, the output is efficient for customers looking for concise and contextual answers.

Upon comparing these different techniques, we can see that in contexts like detailing AWS’s transition from a simple service to a complex, multi-billion-dollar entity, or explaining Amazon’s strategic successes, the regular retriever chain lacks the precision the more sophisticated techniques offer, leading to less targeted information. Although very few differences are visible between the advanced techniques discussed, they are by far more informative than regular retriever chains.
For customers in industries such as healthcare, telecommunications, and financial services who are looking to implement RAG in their applications, the limitations of the regular retriever chain in providing precision, avoiding redundancy, and effectively compressing information make it less suited to fulfilling these needs compared to the more advanced parent document retriever and contextual compression techniques. These techniques are able to distill vast amounts of information into the concentrated, impactful insights that you need, while helping improve price-performance.
Clean up
When you’re done running the notebook, delete the resources you created in order to avoid accrual of charges for the resources in use:

# Delete resources
llm_predictor.delete_model()
llm_predictor.delete_endpoint()
embedding_predictor.delete_model()
embedding_predictor.delete_endpoint()

Conclusion
In this post, we presented a solution that allows you to implement the parent document retriever and contextual compression chain techniques to enhance the ability of LLMs to process and generate information. We tested out these advanced RAG techniques with the Mixtral-8x7B Instruct and BGE Large En models available with SageMaker JumpStart. We also explored using persistent storage for embeddings and document chunks and integration with enterprise data stores.
The techniques we performed not only refine the way LLM models access and incorporate external knowledge, but also significantly improve the quality, relevance, and efficiency of their outputs. By combining retrieval from large text corpora with language generation capabilities, these advanced RAG techniques enable LLMs to produce more factual, coherent, and context-appropriate responses, enhancing their performance across various natural language processing tasks.
SageMaker JumpStart is at the center of this solution. With SageMaker JumpStart, you gain access to an extensive assortment of open and closed source models, streamlining the process of getting started with ML and enabling rapid experimentation and deployment. To get started deploying this solution, navigate to the notebook in the GitHub repo.

About the Authors
Niithiyn Vijeaswaran is a Solutions Architect at AWS. His area of focus is generative AI and AWS AI Accelerators. He holds a Bachelor’s degree in Computer Science and Bioinformatics. Niithiyn works closely with the Generative AI GTM team to enable AWS customers on multiple fronts and accelerate their adoption of generative AI. He’s an avid fan of the Dallas Mavericks and enjoys collecting sneakers.
Sebastian Bustillo is a Solutions Architect at AWS. He focuses on AI/ML technologies with a profound passion for generative AI and compute accelerators. At AWS, he helps customers unlock business value through generative AI. When he’s not at work, he enjoys brewing a perfect cup of specialty coffee and exploring the world with his wife.
Armando Diaz is a Solutions Architect at AWS. He focuses on generative AI, AI/ML, and Data Analytics. At AWS, Armando helps customers integrating cutting-edge generative AI capabilities into their systems, fostering innovation and competitive advantage. When he’s not at work, he enjoys spending time with his wife and family, hiking, and traveling the world.
Dr. Farooq Sabir is a Senior Artificial Intelligence and Machine Learning Specialist Solutions Architect at AWS. He holds PhD and MS degrees in Electrical Engineering from the University of Texas at Austin and an MS in Computer Science from Georgia Institute of Technology. He has over 15 years of work experience and also likes to teach and mentor college students. At AWS, he helps customers formulate and solve their business problems in data science, machine learning, computer vision, artificial intelligence, numerical optimization, and related domains. Based in Dallas, Texas, he and his family love to travel and go on long road trips.
Marco Punio is a Solutions Architect focused on generative AI strategy, applied AI solutions and conducting research to help customers hyper-scale on AWS. Marco is a digital native cloud advisor with experience in the FinTech, Healthcare & Life Sciences, Software-as-a-service, and most recently, in Telecommunications industries. He is a qualified technologist with a passion for machine learning, artificial intelligence, and mergers & acquisitions. Marco is based in Seattle, WA and enjoys writing, reading, exercising, and building applications in his free time.
AJ Dhimine is a Solutions Architect at AWS. He specializes in generative AI, serverless computing and data analytics. He is an active member/mentor in Machine Learning Technical Field Community and has published several scientific papers on various AI/ML topics. He works with customers, ranging from start-ups to enterprises, to develop AWSome generative AI solutions. He is particularly passionate about leveraging Large Language Models for advanced data analytics and exploring practical applications that address real-world challenges. Outside of work, AJ enjoys traveling, and is currently at 53 countries with a goal of visiting every country in the world.

Efficient continual pre-training LLMs for financial domains

Large language models (LLMs) are generally trained on large publicly available datasets that are domain agnostic. For example, Meta’s Llama models are trained on datasets such as CommonCrawl, C4, Wikipedia, and ArXiv. These datasets encompass a broad range of topics and domains. Although the resulting models yield amazingly good results for general tasks, such as text generation and entity recognition, there is evidence that models trained with domain-specific datasets can further improve LLM performance. For example, the training data used for BloombergGPT is 51% domain-specific documents, including financial news, filings, and other financial materials. The resulting LLM outperforms LLMs trained on non-domain-specific datasets when tested on finance-specific tasks. The authors of BloombergGPT concluded that their model outperforms all other models tested for four of the five financial tasks. The model provided even better performance when tested for Bloomberg’s internal financial tasks by a wide margin—as much as 60 points better (out of 100). Although you can learn more about the comprehensive evaluation results in the paper, the following sample captured from the BloombergGPT paper can give you a glimpse of the benefit of training LLMs using financial domain-specific data. As shown in the example, the BloombergGPT model provided correct answers while other non-domain-specific models struggled:

Input: CEO of Assicurazioni Generali?
BloombergGPT: Philippe Donnet
GPT-NeoX: Antonio De Lorenzo, Simone Gambarini, Enrico Zanetti
FLAN-T5-XXL: John M Forsyth, Christopher K Peters, {empty string}

Input: CEO of Silicon Valley Bank?
BloombergGPT: Greg Becker
GPT-NeoX: John Hodor, Mike Fridgen, Peter R Briger
FLAN-T5-XXL: Christian Klein, John M Forsyth, John M Forsyth

Input: CEO of IHS Markit Ltd?
BloombergGPT: Lance Uggla
GPT-NeoX: John F Lechleiter, David Steward, Niall O Broin
FLAN-T5-XXL: William M Hawkins, John M Forsyth, John M Forsyth

Input: CEO of Walgreens Boots Alliance?
BloombergGPT: Rosalind Brewer
GPT-NeoX: Mark V Dei, Stefano Pessina, Stefano Pessina
FLAN-T5-XXL: Christian Klein, John M Forsyth, John M Forsyth

Input: CEO of Citigroup Inc?
BloombergGPT: Jane Fraser
GPT-NeoX: Michael L Corbat, Michael L Corbat, Michael L Corbat
FLAN-T5-XXL: Christian Sewing, John M Forsyth, John M Forsyth

This post provides a guide to training LLMs specifically for the financial domain. We cover the following key areas:

Data collection and preparation – Guidance on sourcing and curating relevant financial data for effective model training
Continual pre-training vs. fine-tuning – When to use each technique to optimize your LLM’s performance
Efficient continual pre-training – Strategies to streamline the continual pre-training process, saving time and resources

This post brings together the expertise of the applied science research team within Amazon Finance Technology and the AWS Worldwide Specialist team for the Global Financial Industry. Some of the content is based on the paper Efficient Continual Pre-training for Building Domain Specific Large Language Models.
Collecting and preparing finance data
Domain continual pre-training necessitates a large-scale, high-quality, domain-specific dataset. The following are the main steps for domain dataset curation:

Identify data sources – Potential data sources for domain corpus include open web, Wikipedia, books, social media, and internal documents.
Domain data filters – Because the ultimate goal is to curate a domain corpus, you might need to apply additional steps to filter out samples that are irrelevant to the target domain. This removes useless text from the corpus used for continual pre-training and reduces training cost.
Preprocessing – You might consider a series of preprocessing steps to improve data quality and training efficiency. For example, certain data sources can contain a fair number of noisy tokens; deduplication is considered a useful step to improve data quality and reduce training cost.

To develop financial LLMs, you can use two important data sources: News CommonCrawl and SEC filings. An SEC filing is a financial statement or other formal document submitted to the US Securities and Exchange Commission (SEC). Publicly listed companies are required to file various documents regularly. This creates a large number of documents over the years. News CommonCrawl is a dataset released by CommonCrawl in 2016. It contains news articles from news sites all over the world.
News CommonCrawl is available on Amazon Simple Storage Service (Amazon S3) in the commoncrawl bucket at crawl-data/CC-NEWS/. You can get the listings of files using the AWS Command Line Interface (AWS CLI) and the following command:

aws s3 ls --recursive s3://commoncrawl/crawl-data/CC-NEWS/

In Efficient Continual Pre-training for Building Domain Specific Large Language Models, the authors use a URL- and keyword-based approach to filter financial news articles from generic news. Specifically, the authors maintain a list of important financial news outlets and a set of keywords related to financial news. An article is identified as financial news if it either comes from a financial news outlet or contains any of the keywords in its URL. This simple yet effective approach enables you to identify financial news from not only financial news outlets but also finance sections of generic news outlets.
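As a minimal sketch of such a filter (the outlet list and keywords below are illustrative placeholders, not the ones used in the paper):

from urllib.parse import urlparse

# Illustrative lists only; the paper's actual outlets and keywords are not reproduced here
FINANCIAL_OUTLETS = {"bloomberg.com", "reuters.com", "ft.com"}
FINANCIAL_KEYWORDS = ("finance", "market", "stock", "economy", "earnings")

def is_financial_news(url: str) -> bool:
    """Keep an article if it comes from a known financial outlet
    or if any financial keyword appears in its URL."""
    domain = urlparse(url).netloc.lower().removeprefix("www.")
    if domain in FINANCIAL_OUTLETS:
        return True
    return any(keyword in url.lower() for keyword in FINANCIAL_KEYWORDS)

print(is_financial_news("https://www.example.com/business/finance/rate-decision"))  # True
print(is_financial_news("https://www.example.com/sports/match-report"))             # False
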
SEC filings are available online through the SEC’s EDGAR (Electronic Data Gathering, Analysis, and Retrieval) database, which provides open data access. You can scrape the filings from EDGAR directly, or use APIs in Amazon SageMaker with a few lines of code, for any period of time and for a large number of tickers (i.e., the SEC assigned identifier). To learn more, refer to SEC Filing Retrieval.
The following table summarizes the key details of both data sources.

Data source         Coverage     Size
News CommonCrawl    2016-2022    25.8 billion words
SEC Filing          1993-2022    5.1 billion words

The authors go through a few extra preprocessing steps before the data is fed into a training algorithm. First, they observe that SEC filings contain noisy text due to the removal of tables and figures, so they remove short sentences that are deemed to be table or figure labels. Second, they apply a locality-sensitive hashing algorithm to deduplicate the news articles and filings. For SEC filings, they deduplicate at the section level instead of the document level. Lastly, they concatenate documents into a long string, tokenize it, and chunk the tokenized sequence into pieces of the maximum input length supported by the model to be trained. This improves the throughput of continual pre-training and reduces the training cost.
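The concatenate-tokenize-chunk step can be sketched as follows (an illustrative snippet, not the paper's code; the Pythia tokenizer and a 2,048-token context length are assumptions here):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-1b")
max_len = 2048  # assumed maximum input length of the model being trained

def pack_documents(documents, separator="<|endoftext|>"):
    """Concatenate documents, tokenize once, and split the token stream into fixed-length chunks."""
    text = separator.join(documents)
    input_ids = tokenizer(text)["input_ids"]
    return [input_ids[i : i + max_len] for i in range(0, len(input_ids), max_len)]

chunks = pack_documents(["First SEC filing text ...", "A financial news article ..."])
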
Continual pre-training vs. fine-tuning
Most available LLMs are general purpose and lack domain-specific abilities. Domain LLMs have shown considerable performance in medical, finance, or scientific domains. For an LLM to acquire domain-specific knowledge, there are four methods: training from scratch, continual pre-training, instruction fine-tuning on domain tasks, and Retrieval Augmented Generation (RAG).
In traditional models, fine-tuning is usually used to create task-specific models for a domain. This means maintaining multiple models for multiple tasks like entity extraction, intent classification, sentiment analysis, or question answering. With the advent of LLMs, the need to maintain separate models has become obsolete by using techniques like in-context learning or prompting. This saves the effort required to maintain a stack of models for related but distinct tasks.
Intuitively, you can train LLMs from scratch with domain-specific data. Although most of the work to create domain LLMs has focused on training from scratch, it is prohibitively expensive. For example, the GPT-4 model costs over $100 million to train. These models are trained on a mix of open domain data and domain data. Continual pre-training can help models acquire domain-specific knowledge without incurring the cost of pre-training from scratch because you pre-train an existing open domain LLM on only the domain data.
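For illustration only (the paper's actual training configuration is not reproduced here), continual pre-training of an open model on a domain corpus can be sketched with the Hugging Face Trainer and a causal language modeling objective:

from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "EleutherAI/pythia-1b"  # the paper continually pre-trains the Pythia suite
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# domain_corpus.txt stands in for the curated financial corpus described earlier
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=2048),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finpythia-cpt", per_device_train_batch_size=1,
                           num_train_epochs=1, learning_rate=1e-5),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal LM objective
)
trainer.train()
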
With instruction fine-tuning on a task, you can’t make the model acquire domain knowledge, because the LLM only acquires the domain information contained in the instruction fine-tuning dataset. Unless a very large dataset for instruction fine-tuning is used, it is not enough to acquire domain knowledge. Sourcing high-quality instruction datasets is usually challenging and is the reason to use LLMs in the first place. Also, instruction fine-tuning on one task can affect performance on other tasks (as seen in this paper). However, instruction fine-tuning is more cost-effective than either of the pre-training alternatives.
The following figure compares traditional task-specific fine-tuning vs. the in-context learning paradigm with LLMs.
RAG is the most effective way of guiding an LLM to generate responses grounded in a domain. Although it can guide a model to generate responses by providing facts from the domain as auxiliary information, it doesn’t acquire the domain-specific language because the LLM is still relying on non-domain language style to generate the responses.
Continual pre-training is a middle ground between pre-training and instruction fine-tuning in terms of cost while being a strong alternative for gaining domain-specific knowledge and style. It can provide a general model over which further instruction fine-tuning on limited instruction data can be performed. Continual pre-training can be a cost-effective strategy for specialized domains where the set of downstream tasks is large or unknown and labeled instruction tuning data is limited. In other scenarios, instruction fine-tuning or RAG might be more suitable.
To learn more about fine-tuning, RAG, and model training, refer to Fine-tune a foundation model, Retrieval Augmented Generation (RAG), and Train a Model with Amazon SageMaker, respectively. For this post, we focus on efficient continual pre-training.
Methodology of efficient continual pre-training
Continual pre-training consists of the following methodology:

Domain-Adaptive Continual Pre-training (DACP) – In the paper Efficient Continual Pre-training for Building Domain Specific Large Language Models, the authors continually pre-train the Pythia language model suite on the financial corpus to adapt it to the finance domain. The objective is to create financial LLMs by feeding data from the whole financial domain into an open-sourced model. Because the training corpus contains all the curated datasets in the domain, the resultant model should acquire finance-specific knowledge, thereby becoming a versatile model for various financial tasks. This results in FinPythia models.
Task-Adaptive Continual Pre-training (TACP) – The authors pre-train the models further on labeled and unlabeled task data to tailor them for specific tasks. In certain circumstances, developers may prefer models delivering better performance on a group of in-domain tasks rather than a domain-generic model. TACP is designed as continual pre-training aiming to enhance performance on targeted tasks, without requirements for labeled data. Specifically, the authors continually pre-train the open sourced models on the task tokens (without labels). The primary limitation of TACP lies in constructing task-specific LLMs instead of foundation LLMs, owing to the sole use of unlabeled task data for training. Although DACP uses a much larger corpus, it is prohibitively expensive. To balance these limitations, the authors propose two approaches that aim to build domain-specific foundation LLMs while preserving superior performance on target tasks:

Efficient Task-Similar DACP (ETS-DACP) – The authors propose selecting a subset of the financial corpus that is highly similar to the task data using embedding similarity. This subset is used for continual pre-training to make it more efficient. Specifically, the authors continually pre-train the open sourced LLM on a small corpus extracted from the financial corpus that is close to the target tasks in distribution. This can help improve task performance because it adapts the model to the distribution of task tokens without requiring labeled data (see the sketch after this list).
Efficient Task-Agnostic DACP (ETA-DACP) – The authors propose using metrics like perplexity and token type entropy that don’t require task data to select samples from financial corpus for efficient continual pre-training. This approach is designed to deal with scenarios where task data is unavailable or more versatile domain models for the broader domain are preferred. The authors adopt two dimensions to select data samples that are important for obtaining domain information from a subset of pre-training domain data: novelty and diversity. Novelty, measured by the perplexity recorded by the target model, refers to the information that was unseen by the LLM before. Data with high novelty indicates novel knowledge for the LLM, and such data is viewed as more difficult to learn. This updates generic LLMs with intensive domain knowledge during continual pre-training. Diversity, on the other hand, captures the diversity of distributions of token types in the domain corpus, which has been documented as a useful feature in the research of curriculum learning on language modeling.
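As a rough sketch of the task-similar selection idea (not the authors' code), one can score every candidate document by its cosine similarity to the centroid of the task-data embeddings and keep the highest-scoring subset:

import numpy as np

def select_task_similar(corpus_embeddings, task_embeddings, budget):
    """corpus_embeddings: (N, d) embeddings of candidate financial documents
    task_embeddings:     (M, d) embeddings of the unlabeled task data
    budget:              number of documents to keep for continual pre-training"""
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    centroid = normalize(task_embeddings).mean(axis=0)
    scores = normalize(corpus_embeddings) @ centroid   # cosine similarity to the task centroid
    return np.argsort(-scores)[:budget]                # indices of the most task-similar documents

keep = select_task_similar(np.random.randn(1000, 384), np.random.randn(50, 384), budget=100)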

The following figure compares an example of ETS-DACP (left) vs. ETA-DACP (right).

The authors adopt two sampling schemes to actively select data points from the curated financial corpus: hard sampling and soft sampling. The former is done by first ranking the financial corpus by the corresponding metrics and then selecting the top-k samples, where k is predetermined according to the training budget. For the latter, the authors assign sampling weights to each data point according to the metric values, and then randomly sample k data points to meet the training budget. Both schemes are sketched below.
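A minimal sketch of the two schemes, assuming scores holds the per-document metric values (for example, perplexity, entropy, or similarity) and k is the training budget:

import numpy as np

def hard_sample(scores, k):
    """Rank by the metric and keep the top-k documents."""
    return np.argsort(-scores)[:k]

def soft_sample(scores, k, rng=np.random.default_rng(0)):
    """Turn the metric values into sampling weights and draw k documents at random."""
    weights = scores - scores.min() + 1e-8   # shift so every weight is positive
    weights = weights / weights.sum()
    return rng.choice(len(scores), size=k, replace=False, p=weights)

scores = np.random.rand(10_000)
selected_hard = hard_sample(scores, k=1_000)
selected_soft = soft_sample(scores, k=1_000)
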
Result and analysis
The authors evaluate the resulting financial LLMs on an array of financial tasks to investigate the efficacy of continual pre-training:

Financial Phrase Bank – A sentiment classification task on financial news.
FiQA SA – An aspect-based sentiment classification task based on financial news and headlines.
Headline – A binary classification task on whether a headline on a financial entity contains certain information.
NER – A financial named entity extraction task based on the credit risk assessment sections of SEC reports. Words in this task are annotated with PER, LOC, ORG, and MISC.

Because the financial LLMs are not instruction fine-tuned, the authors evaluate models in a 5-shot setting for each task for the sake of robustness. On average, FinPythia 6.9B outperforms Pythia 6.9B by 10% across the four tasks, which demonstrates the efficacy of domain-specific continual pre-training. For the 1B model, the improvement is less pronounced, but performance still improves by 2% on average.
The following figure illustrates the performance difference before and after DACP on both models.

The following figure showcases two qualitative examples generated by Pythia 6.9B and FinPythia 6.9B. For two finance-related questions regarding an investor manager and a financial term, Pythia 6.9B doesn’t understand the term or recognize the name, whereas FinPythia 6.9B generates detailed answers correctly. The qualitative examples demonstrate that continual pre-training enables the LLMs to acquire domain knowledge during the process.

The following table compares various efficient continual pre-training approaches. ETA-DACP-ppl is ETA-DACP based on perplexity (novelty), and ETA-DACP-ent is based on entropy (diversity). ETS-DACP-com is similar to DACP with data selection by averaging all three metrics. The following are a few takeaways from the results:

Data selection methods are efficient – They surpass standard continual pre-training with just 10% of the training data. Efficient continual pre-training, including Task-Similar DACP (ETS-DACP), Task-Agnostic DACP based on entropy (ETA-DACP-ent), and Task-Similar DACP based on all three metrics (ETS-DACP-com), outperforms standard DACP on average despite being trained on only 10% of the financial corpus.
Task-aware data selection works best, in line with research on small language models – ETS-DACP records the best average performance among all the methods, and ETS-DACP-com, which selects data based on all three metrics, records the second-best task performance. This suggests that using unlabeled task data is still an effective approach to boost task performance in the case of LLMs.
Task-agnostic data selection is a close second – ETA-DACP-ent follows the performance of the task-aware data selection approach, implying that we could still boost task performance by actively selecting high-quality samples not tied to specific tasks. This paves the way to build financial LLMs for the whole domain while achieving superior task performance.

One critical question regarding continual pre-training is whether it negatively affects performance on non-domain tasks. The authors also evaluate the continually pre-trained model on four widely used generic tasks: ARC, MMLU, TruthfulQA, and HellaSwag, which measure question answering, reasoning, and completion abilities. The authors find that continual pre-training does not adversely affect non-domain performance. For more details, refer to Efficient Continual Pre-training for Building Domain Specific Large Language Models.
Conclusion
This post offered insights into data collection and continual pre-training strategies for training LLMs for the financial domain. You can start training your own LLMs for financial tasks using Amazon SageMaker Training or Amazon Bedrock today.

About the Authors
Yong Xie is an applied scientist in Amazon FinTech. He focuses on developing large language models and Generative AI applications for finance.
Karan Aggarwal is a Senior Applied Scientist with Amazon FinTech with a focus on Generative AI for finance use-cases. Karan has extensive experience in time-series analysis and NLP, with a particular interest in learning from limited labeled data.
Aitzaz Ahmad is an Applied Science Manager at Amazon where he leads a team of scientists building various applications of Machine Learning and Generative AI in Finance. His research interests are in NLP, Generative AI, and LLM Agents. He received his PhD in Electrical Engineering from Texas A&M University.
Qingwei Li is a Machine Learning Specialist at Amazon Web Services. He received his Ph.D. in Operations Research after he broke his advisor’s research grant account and failed to deliver the Nobel Prize he promised. Currently he helps customers in financial service build machine learning solutions on AWS.
Raghvender Arni leads the Customer Acceleration Team (CAT) within AWS Industries. The CAT is a global cross-functional team of customer facing cloud architects, software engineers, data scientists, and AI/ML experts and designers that drives innovation via advanced prototyping, and drives cloud operational excellence via specialized technical expertise.

The Role of Symmetry Breaking in Machine Learning: A Study on Equivari …

Symmetry is a fundamental characteristic where an object remains unchanged under certain transformations and is a key inductive bias that enhances model performance and efficiency. Therefore, understanding and leveraging the concept of symmetry has emerged as a cornerstone for designing more efficient and effective neural network models. Researchers have consistently sought ways to exploit this property, leading to significant breakthroughs that span various machine-learning applications.

One of the main challenges identified in this domain is the limitation of equivariant functions in neural networks to break symmetry at the level of individual data samples adaptively. This constraint hampers the versatility of neural networks, especially in fields requiring nuanced interpretation of symmetrical data, such as physics, where phenomena like phase transitions demand a departure from initial symmetrical states.

Recent approaches to managing symmetries in neural networks have centered around the principle of equivariance. This principle ensures a coherent transformation of outputs in response to changes in the inputs dictated by symmetry operations. While this method preserves the integrity of data’s structural properties through computational layers, it falls short when the need arises to break the symmetry in data, a requirement in numerous scientific and optimization problems.
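For background (a standard argument, not a result specific to this paper), the strict equivariance condition makes the limitation explicit. An equivariant map $f$ satisfies $f(g \cdot x) = g \cdot f(x)$ for all $g \in G$. If an input is itself symmetric, so that $g \cdot x = x$ for some $g$, then $f(x) = f(g \cdot x) = g \cdot f(x)$: the output inherits every symmetry of the input. A strictly equivariant network therefore cannot map a symmetric input to a less symmetric output, which is precisely the adaptive, sample-level symmetry breaking that relaxed equivariance is designed to allow.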

A research team from Mila-Quebec AI Institute and McGill University has proposed a novel method termed “relaxed equivariance.” This concept extends the boundaries of equivariant neural networks by allowing the intentional breaking of input symmetries. By embedding relaxed equivariance within equivariant multilayer perceptrons (E-MLPs), the researchers offer a refined alternative to injecting noise to induce symmetry breaking. 

Relaxed equivariance enables outputs to adapt to input transformations without preserving all input symmetries, offering a more controlled alternative to traditional noise-induced symmetry breaking. The method integrates into E-MLPs by applying weight matrices aligned with symmetry subgroups, enabling effective symmetry breaking in the linear layers, while point-wise activation functions compatible with permutation groups satisfy the relaxed equivariance requirement and ensure compositional compatibility. This design allows more precise and controlled handling of symmetry in data, enhancing the adaptability and efficiency of neural network models.

The proposed framework for symmetry breaking in deep learning has applications in multiple domains, such as physics modeling, graph representation learning, combinatorial optimization, and equivariant decoding. The details are as follows:

In physics modeling, symmetry breaking is important for describing phase transitions and bifurcations in dynamical systems.

In graph representation learning, breaking symmetry is necessary to avoid unwanted symmetries arising from the graph structure itself.

In combinatorial optimization, breaking symmetry is required to handle degeneracies caused by symmetry and identify a single solution.

In conclusion, the efforts of the Mila-Quebec AI Institute and McGill University research team mark a pivotal development in the ongoing quest to harness the full potential of symmetries in machine learning. By pioneering the concept of relaxed equivariance, they have not only broadened the theoretical landscape of neural network design but also unlocked new possibilities for practical applications across a spectrum of disciplines. This work enriches the understanding of equivariant networks and sets a new benchmark for developing machine-learning models capable of expertly handling the intricacies of symmetry and asymmetry in data.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter. Join our Telegram Channel, Discord Channel, and LinkedIn Group.

If you like our work, you will love our newsletter.

Don’t Forget to join our 39k+ ML SubReddit
The post The Role of Symmetry Breaking in Machine Learning: A Study on Equivariant Functions and E-MLPs appeared first on MarkTechPost.

This AI Paper from Microsoft Presents SiMBA: A Simplified Mamba-based Architecture for Vision and Multivariate Time Series

The evolution of language models is shifting from Large Language Models (LLMs) toward Small Language Models (SLMs). At the core of both lies the transformer, their shared building block. While transformers have demonstrated outstanding performance across domains through their attention networks, attention suffers from several issues, including low inductive bias and quadratic complexity in the input sequence length.

State Space Models (SSMs) such as S4 have emerged to address these issues and to handle longer sequence lengths. However, S4 has been less effective at modeling information-dense data, particularly in domains such as computer vision, and it faces challenges in discrete scenarios such as genomic data. Mamba, a selective state space sequence modeling technique, was recently proposed to address the difficulty typical state space models have in handling long sequences efficiently. However, Mamba has stability issues: its training loss does not converge when scaling to large networks on computer vision datasets.
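For intuition about the complexity gap described above, the toy comparison below (illustrative only, not from the paper) contrasts attention's n-by-n score matrix with a linear-time, state-space-style scan over the same sequence; the recurrence shown is a toy diagonal one, not Mamba's selective scan.

```python
import torch

n, d = 1024, 64
x = torch.randn(n, d)

# Self-attention materializes an n x n score matrix: compute and memory grow quadratically in n.
scores = (x @ x.T) / d**0.5                 # shape (n, n)
attn_out = torch.softmax(scores, dim=-1) @ x

# A state-space-style recurrence touches each token once: cost grows linearly in n.
decay = torch.sigmoid(torch.randn(d))       # per-channel state decay (toy, fixed)
state = torch.zeros(d)
ssm_out = []
for t in range(n):
    state = decay * state + (1 - decay) * x[t]
    ssm_out.append(state)
ssm_out = torch.stack(ssm_out)              # shape (n, d)
```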

The researchers from Microsoft introduced SiMBA, a new architecture that incorporates Mamba for sequence modeling and introduces Einstein FFT (EinFFT) as a new channel modeling technique. SiMBA effectively addresses the instability observed in Mamba when scaling to large networks. The work also surveys models based on convolutions, transformers, MLP-mixers, spectral mixers, and state space methods, and introduces hybrid models that combine convolution with transformers or spectral approaches.

The channel-mixing component of SiMBA comprises three main parts: a spectral transformation, a spectral gating network based on Einstein matrix multiplication, and an inverse spectral transformation. EinFFT performs frequency-domain channel mixing by applying Einstein matrix multiplication to complex-number representations, enabling the extraction of key data patterns with enhanced global visibility and energy concentration. Mamba combined with an MLP for channel mixing bridges the performance gap for small-scale networks but can exhibit the same stability issues in large networks; combined with EinFFT, Mamba resolves these stability issues for both small and large networks.
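To make the channel-mixing idea more concrete, here is a loose sketch of an EinFFT-style block: a spectral transformation, a block-wise complex Einstein-summation mix, and an inverse spectral transformation. This is not the authors' implementation; the block count, the axis chosen for the FFT, the initialization, and the omission of the nonlinear spectral gating are all simplifying assumptions.

```python
import torch
import torch.nn as nn

class EinFFTStyleChannelMixer(nn.Module):
    """Simplified EinFFT-style channel mixing: FFT -> block-wise complex
    Einstein-summation mix -> inverse FFT. The actual EinFFT also applies a
    nonlinear spectral gating, omitted here for brevity."""

    def __init__(self, dim: int, num_blocks: int = 4):
        super().__init__()
        assert dim % num_blocks == 0
        self.num_blocks = num_blocks
        self.block_dim = dim // num_blocks
        # Complex weights stored as separate real and imaginary parts.
        self.w_real = nn.Parameter(0.02 * torch.randn(num_blocks, self.block_dim, self.block_dim))
        self.w_imag = nn.Parameter(0.02 * torch.randn(num_blocks, self.block_dim, self.block_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, channels)
        b, n, c = x.shape
        x_freq = torch.fft.fft(x, dim=-1)                   # complex spectrum over channels
        x_freq = x_freq.reshape(b, n, self.num_blocks, self.block_dim)
        w = torch.complex(self.w_real, self.w_imag)         # (blocks, d, d) complex weights
        # Einstein matrix multiplication: mix channels within each block.
        x_freq = torch.einsum("bnkd,kde->bnke", x_freq, w)
        x_freq = x_freq.reshape(b, n, c)
        return torch.fft.ifft(x_freq, dim=-1).real          # back to the real domain

# Example: mix channels of a (batch=2, tokens=16, channels=64) activation map.
mixer = EinFFTStyleChannelMixer(dim=64)
y = mixer(torch.randn(2, 16, 64))
```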

SiMBA demonstrates superior performance across multiple evaluation metrics, including Mean Squared Error (MSE) and Mean Absolute Error (MAE), outperforming state-of-the-art models. These results highlight the effectiveness of the SiMBA architecture in handling diverse time series forecasting tasks and modalities, solidifying its position as a leading model in the field. On the ImageNet-1K dataset, the model achieves a remarkable 84.0% top-1 accuracy, surpassing prominent convolutional networks such as ResNet-101 and ResNet-152, as well as leading transformers such as EffNet, ViT, Swin, and DeiT.

The major contributions of the researchers in this paper are the following:

EinFFT: A new channel modeling technique, EinFFT, is proposed to solve the stability issue in Mamba. It uses Fourier transforms with nonlinearity to model eigenvalues as negative real numbers, which resolves the instability.

SiMBA: Researchers propose an optimized Mamba architecture for computer vision tasks called SiMBA. This architecture uses EinFFT for channel modeling and Mamba for token mixing to handle inductive bias and computational complexity.

Performance Gap: SiMBA is the first SSM to close the performance gap with state-of-the-art attention-based transformers on the ImageNet dataset and six standard time series datasets. 

In conclusion, the researchers from Microsoft have proposed SiMBA, a new architecture that utilizes EinFFT for channel modeling and Mamba for sequence modeling. SiMBA allows for exploring various alternatives for sequence modeling, such as S4, long convolutions, Hyena, H3, RWKV, and even newer state space models. SiMBA also bridges the performance gap that most state space models have with state-of-the-art transformers on both vision and time series datasets.

Check out the Paper and Github. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter. Join our Telegram Channel, Discord Channel, and LinkedIn Group.

If you like our work, you will love our newsletter.

Don’t Forget to join our 39k+ ML SubReddit
The post This AI Paper from Microsoft Presents SiMBA: A Simplified Mamba-based Architecture for Vision and Multivariate Time Series appeared first on MarkTechPost.