Can LLMs Design Good Questions Based on Context? This AI Paper Evaluates Questions Generated by LLMs from Context, Comparing Them to Human-Generated Questions

Large Language Models (LLMs) are widely used to generate questions from given facts or context, but judging the quality of those questions is difficult. LLM-generated questions often differ from human-written ones in length, type, how well they fit the context, and how answerable they are. Most evaluation methods either demand substantial human effort or rely on simple statistics that miss much of the picture, which makes it hard to assess the questions properly, to improve how LLMs generate them, or to catch failures when the models are misused.

Current question generation (QG) methods automatically produce questions from facts or passages. Existing approaches either rely on simple statistical measures or require extensive manual labeling, and both fall short of capturing the full quality of generated questions: statistical metrics miss deeper meaning and context, while human labeling is time-consuming and hard to scale. Although LLMs have improved significantly, there has been little systematic study of how these models generate questions or of how to evaluate their quality, leaving gaps in understanding and improvement.

To address these issues in question generation (QG), researchers from the University of California, Berkeley, KACST, and the University of Washington proposed an automated evaluation framework based on LLMs. The framework generates questions from a given context and evaluates them along six dimensions: question type, length, context coverage, answerability, uncommonness, and required answer length. Unlike conventional methods that rely on positional biases or a limited set of metrics, this approach analyzes the quality and characteristics of LLM-generated questions in full, compares them with human-generated questions, and shows that LLMs spread their focus relatively evenly across the context, producing descriptive, self-contained questions that include all relevant information.

In the evaluation, the researchers studied LLM-based QG on 860,000 paragraphs from the WikiText dataset, generating self-contained questions without direct references to the context. Analyzing question type, length, and context coverage, they found an average question length of 15 words, with 51.1% word-level and 66.7% sentence-level context coverage. Answerability was very high when the context was provided and low without it, showing how strongly the questions depend on the context. The researchers also shortened the required answers from 36 to 26 words without losing quality, reflecting progress in automatic QG and its evaluation.
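
For intuition, the coverage numbers above can be approximated with a simple overlap measure. The sketch below is an illustrative Python implementation, not the authors' code, and the exact definitions (question-to-context word overlap, and the share of context sentences that share content words with the question) are assumptions about what word-level and sentence-level coverage mean.

import re

STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are", "was", "were"}

def content_words(text):
    # Lowercase, keep alphabetic tokens, drop a small stopword list.
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def coverage(question, context):
    q_words = content_words(question)
    # Word-level: share of question content words that appear in the context.
    word_cov = len(q_words & content_words(context)) / max(len(q_words), 1)
    # Sentence-level: share of context sentences that share content words with the question.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]
    hit = sum(1 for s in sentences if content_words(s) & q_words)
    return word_cov, hit / max(len(sentences), 1)

print(coverage(
    "Who founded the abbey at Cluny in 910?",
    "The abbey at Cluny was founded in 910 by William I, Duke of Aquitaine. "
    "It later became a major center of monastic reform.",
))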

In summary, the proposed method analyzed questions generated by LLMs and highlighted their specific features and how they differ from human-generated ones. The researchers also introduced an automated evaluation method to improve the understanding and optimization of QG tasks. This work can serve as a baseline for future research on LLM-based QG, including application-specific tasks, domain-specific contexts, and closer alignment with human-generated content.

Check out the Paper. All credit for this research goes to the researchers of this project.


Content-Adaptive Tokenizer (CAT): An Image Tokenizer that Adapts Token Count based on Image Complexity, Offering Flexible 8x, 16x, or 32x Compression

One of the major hurdles in AI-driven image modeling is accounting for the diversity of image content complexity. Tokenization methods to date have used static compression ratios that treat all images equally, ignoring how complex their content is. As a result, complex images are over-compressed and lose crucial information, while simple images are under-compressed and waste computational resources. These inefficiencies hinder downstream operations such as image reconstruction and generation, where accurate and efficient representation plays a critical role.

Current image tokenization techniques do not handle this variation in complexity well. Fixed-ratio tokenization resizes images to standard sizes without considering the complexity of their content. Vision Transformers adapt patch size dynamically but require image inputs and offer little flexibility for text-to-image applications. Compression schemes such as JPEG are designed for traditional media and are not optimized for deep learning-based tokenization. A recent approach, ElasticTok, introduced randomized token lengths but did not take intrinsic content complexity into account during training, leading to inefficiencies in both quality and computational cost.

Researchers from Carnegie Mellon University and Meta propose Content-Adaptive Tokenization (CAT), a framework for content-aware image tokenization that allocates representation capacity according to content complexity. CAT uses large language models to assess image complexity from captions and perception-based queries and classifies each image into one of three compression levels: 8x, 16x, or 32x. It then uses a nested VAE architecture that produces variable-length latent features by dynamically routing intermediate outputs according to that complexity. This adaptive design reduces training overhead and improves representation quality, overcoming the inefficiencies of fixed-ratio methods, and because complexity is inferred from text, CAT does not require image inputs at inference time.
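
To make the idea concrete, here is a small illustrative sketch of caption-based complexity scoring. The prompt wording, the 1-to-10 scale, and the thresholds are assumptions for illustration, not the values used in the paper.

def complexity_prompt(caption):
    # Illustrative query; the paper also uses perception-based questions.
    return (
        "On a scale of 1 (very simple) to 10 (very detailed), rate the visual complexity "
        f"of an image described as: '{caption}'. Answer with a number only."
    )

def compression_level(llm_score):
    # Hypothetical thresholds mapping an LLM complexity score to CAT's three levels.
    if llm_score <= 3:
        return "32x"   # simple content: compress aggressively
    if llm_score <= 7:
        return "16x"
    return "8x"        # complex content: keep more tokens

print(complexity_prompt("a dense line chart with small axis labels and a legend"))
print(compression_level(8))   # -> "8x"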

CAT evaluates complexity using LLM-produced captions that capture semantic, visual, and perceptual features when determining compression ratios. This caption-based scoring tracks human-perceived importance better than traditional proxies such as JPEG file size or MSE. The adaptive nested VAE realizes the variable compression with channel-matched skip connections that dynamically adjust the latent space across compression levels. Shared parameterization keeps the scales consistent, and training combines reconstruction error, perceptual loss (for example, LPIPS), and adversarial loss. CAT was trained on a dataset of 380 million images and tested on COCO, ImageNet, CelebA, and ChartQA, demonstrating its applicability to diverse image types.

CAT delivers significant gains in both image reconstruction and generation by adapting compression to content complexity. For reconstruction, it improves the rFID, LPIPS, and PSNR metrics, with a 12% quality improvement on CelebA and a 39% improvement on ChartQA, while matching the quality of fixed-ratio baselines on COCO and ImageNet using fewer tokens. For class-conditional ImageNet generation, CAT outperforms fixed-ratio baselines with an FID of 4.56 and improves inference throughput by 18.5%. These results set a strong baseline for further work on adaptive tokenization.

CAT introduces a new approach to image tokenization that dynamically modulates compression levels based on content complexity. It integrates LLM-based assessments with an adaptive nested VAE, eliminating the persistent inefficiencies of fixed-ratio tokenization and significantly improving performance in reconstruction and generation tasks. Its adaptability and effectiveness make CAT a compelling asset for AI-oriented image modeling, with potential applications extending to video and multi-modal domains.

Check out the Paper. All credit for this research goes to the researchers of this project.


Meet KaLM-Embedding: A Series of Multilingual Embedding Models Built on Qwen2-0.5B and Released Under MIT

Multilingual applications and cross-lingual tasks are central to natural language processing (NLP) today, making robust embedding models essential. These models underpin systems like retrieval-augmented generation and other AI-driven solutions. However, existing models often struggle with noisy training data, limited domain diversity, and inefficiencies in managing multilingual datasets. These limitations affect performance and scalability. Researchers from the Harbin Institute of Technology (Shenzhen) have addressed these challenges with KaLM-Embedding, a model that emphasizes data quality and innovative training methodologies.

KaLM-Embedding is a multilingual embedding model built on Qwen 2-0.5B and released under the MIT license. Designed with compactness and efficiency in mind, it is particularly well-suited for real-world applications where computational resources are constrained.

The model’s data-centric design is a key strength. It incorporates 550,000 synthetic data samples generated using persona-based techniques to ensure diversity and relevance. Additionally, it employs ranking consistency filtering to remove noisy and false-negative samples, enhancing the quality and robustness of the training data.
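
Ranking consistency filtering can be pictured roughly as follows: score each (query, positive, negatives) sample with an off-the-shelf embedding model and keep it only if the labeled positive ranks near the top. The toy embedder, function names, and top-k threshold below are illustrative assumptions, not the authors' implementation.

import numpy as np

def embed(text):
    # Toy stand-in for a real embedding model, used only to make the sketch runnable.
    seed = abs(hash(text)) % (2**32)
    return np.random.default_rng(seed).standard_normal(64)

def ranking_consistent(query, positive, negatives, top_k=3):
    q = embed(query)
    candidates = [positive] + list(negatives)
    vecs = np.stack([embed(c) for c in candidates])
    sims = vecs @ q / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(q) + 1e-9)
    # Keep the sample only if the labeled positive (index 0) ranks within the top_k candidates.
    rank_of_positive = int((sims > sims[0]).sum())
    return rank_of_positive < top_k

sample = ("capital of France", "Paris is the capital of France.",
          ["Berlin is in Germany.", "The Seine flows through Paris."])
print(ranking_consistent(*sample))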

Technical Features and Advantages

KaLM-Embedding incorporates advanced methodologies to deliver strong multilingual text embeddings. A notable feature is Matryoshka Representation Learning, which supports flexible embedding dimensions. This adaptability allows embeddings to be optimized for different applications, ranging from 64 to 896 dimensions.
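
In practice, Matryoshka-style embeddings are typically consumed by truncating the full vector to the first d dimensions and re-normalizing, so a single model can serve several dimensionalities. A minimal sketch, assuming a normalized 896-dimensional embedding as input:

import numpy as np

def matryoshka_truncate(embedding, dim):
    # Keep the first `dim` dimensions and L2-normalize again.
    truncated = np.asarray(embedding, dtype=np.float32)[:dim]
    return truncated / (np.linalg.norm(truncated) + 1e-12)

full = np.random.randn(896)
full /= np.linalg.norm(full)
for d in (64, 256, 896):
    print(d, matryoshka_truncate(full, d)[:3])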

The training strategy consists of two stages: weakly supervised pre-training and supervised fine-tuning. Over 70 diverse datasets were utilized during fine-tuning, covering a range of languages and domains. Semi-homogeneous task batching further refined the training process by balancing the challenges posed by in-batch negatives with the risk of false negatives.
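
Semi-homogeneous task batching can be illustrated as drawing most of each batch from a single task and mixing in a small share from other tasks, so in-batch negatives stay informative without being dominated by likely false negatives. The 80/20 split below is an illustrative assumption, not the paper's setting.

import random

def semi_homogeneous_batches(samples_by_task, batch_size=32, same_task_frac=0.8):
    # samples_by_task: dict mapping task name -> list of training samples
    tasks = list(samples_by_task)
    while True:
        main_task = random.choice(tasks)
        n_same = min(int(batch_size * same_task_frac), len(samples_by_task[main_task]))
        batch = random.sample(samples_by_task[main_task], n_same)
        others = [s for t in tasks if t != main_task for s in samples_by_task[t]]
        batch += random.sample(others, min(batch_size - len(batch), len(others)))
        yield main_task, batch

gen = semi_homogeneous_batches({"retrieval": list(range(100)), "sts": list(range(100, 150))})
task, batch = next(gen)
print(task, len(batch))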

KaLM-Embedding also benefits from its foundation on Qwen 2-0.5B, a pre-trained autoregressive language model. This architecture enables effective adaptation to embedding tasks, offering an advantage over traditional BERT-like models.

Performance and Benchmark Results

KaLM-Embedding’s performance was evaluated on the Massive Text Embedding Benchmark (MTEB). It achieved an average score of 64.53, setting a high standard for models with fewer than 1 billion parameters. Scores of 64.13 on Chinese-MTEB and 64.94 on English-MTEB highlight its multilingual capabilities. Despite limited fine-tuning data for some languages, the model demonstrated strong generalization abilities.

Ablation studies provided additional insights. Features like Matryoshka Representation Learning and ranking consistency filtering were shown to enhance performance. However, the studies also highlighted areas for improvement, such as refining low-dimensional embeddings to further boost effectiveness.

Conclusion: A Step Forward in Multilingual Embeddings

KaLM-Embedding represents a significant advancement in multilingual embedding models. By addressing challenges such as noisy data and inflexible architectures, it achieves a balance between efficiency and performance. The open-source release under the MIT license invites researchers and practitioners to explore and build upon this work.

With its robust multilingual performance and innovative methodologies, KaLM-Embedding is well-positioned for diverse applications, from retrieval-augmented systems to cross-lingual tasks. As the need for multilingual NLP solutions continues to grow, KaLM-Embedding serves as a testament to the impact of high-quality data and thoughtful model design.

Check out the Paper, Models, and Code. All credit for this research goes to the researchers of this project.


Evola: An 80B-Parameter Multimodal Protein-Language Model for Decoding Protein Functions via Natural Language Dialogue

Proteins, essential molecular machines evolved over billions of years, perform critical life-sustaining functions encoded in their sequences and revealed through their 3D structures. Decoding their functional mechanisms remains a core challenge in biology despite advances in experimental and computational tools. While AlphaFold and similar models have revolutionized structure prediction, the gap between structural knowledge and functional understanding persists, compounded by the exponential growth of unannotated protein sequences. Traditional annotation tools rely on evolutionary similarity, which limits their scope. Emerging protein-language models are promising, using deep learning to decode the protein “language,” but the limited availability of diverse, context-rich training data constrains their effectiveness.

Researchers from Westlake University and Nankai University developed Evola, an 80-billion-parameter multimodal protein-language model designed to interpret the molecular mechanisms of proteins through natural language dialogue. Evola integrates a protein language model (PLM) as an encoder, an LLM as a decoder, and an alignment module, enabling precise protein function predictions. Trained on an unprecedented dataset of 546 million protein-question-answer pairs and 150 billion tokens, Evola leverages Retrieval-Augmented Generation (RAG) and Direct Preference Optimization (DPO) to enhance response relevance and quality. Evaluated using the novel Instructional Response Space (IRS) framework, Evola provides expert-level insights, advancing proteomics research.

Evola is a multimodal generative model designed to answer functional protein questions. It integrates protein-specific knowledge with LLMs for accurate and context-aware responses. Evola features a frozen protein encoder, a trainable sequence compressor and aligner, and a pre-trained LLM decoder. It employs DPO for fine-tuning based on GPT-scored preferences and RAG to enhance response accuracy using Swiss-Prot and ProTrek datasets. Applications include protein function annotation, enzyme classification, gene ontology, subcellular localization, and disease association. Evola is available in two versions: a 10B-parameter model and an 80B-parameter model still under training.
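
The encoder-compressor-aligner-decoder arrangement can be sketched in PyTorch as below. The cross-attention compressor, all dimensions, and the hyperparameters are assumptions chosen for illustration; this is not Evola's actual implementation.

import torch
import torch.nn as nn

class SequenceCompressorAligner(nn.Module):
    # Compresses variable-length protein encoder states into a fixed number of tokens
    # projected to the LLM's hidden size (illustrative sketch).
    def __init__(self, protein_dim=1280, llm_dim=4096, n_queries=64, n_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, llm_dim) * 0.02)
        self.proj = nn.Linear(protein_dim, llm_dim)
        self.attn = nn.MultiheadAttention(llm_dim, n_heads, batch_first=True)

    def forward(self, protein_states):                 # (B, L, protein_dim)
        kv = self.proj(protein_states)                 # (B, L, llm_dim)
        q = self.queries.unsqueeze(0).expand(kv.size(0), -1, -1)
        aligned, _ = self.attn(q, kv, kv)              # (B, n_queries, llm_dim)
        return aligned                                  # prepended to the LLM's text embeddings

protein_states = torch.randn(2, 350, 1280)             # stand-in for frozen encoder output
print(SequenceCompressorAligner()(protein_states).shape)  # torch.Size([2, 64, 4096])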

In short, Evola couples a protein language model encoder with a large language model decoder through an intermediate compression-and-alignment module, uses RAG to incorporate external knowledge, and applies DPO to refine outputs based on preference signals. Evaluation using the IRS framework demonstrates Evola’s capability to generate precise, contextually relevant insights into protein functions, thereby advancing proteomics and functional genomics research.

The results demonstrate that Evola outperforms existing models in protein function prediction and natural language dialogue tasks. Evola was evaluated on diverse datasets and achieved state-of-the-art performance in generating accurate, context-sensitive answers to protein-related questions. Benchmarking with the IRS framework revealed its high precision, interpretability, and response relevance. The qualitative analysis highlighted Evola’s ability to address nuanced functional queries and generate protein annotations comparable to expert-curated knowledge. Additionally, ablation studies confirmed the effectiveness of its training strategies, including retrieval-augmented generation and direct preference optimization, in enhancing response quality and alignment with biological contexts. This establishes Evola as a robust tool for proteomics.

In conclusion, Evola is an 80-billion-parameter generative protein-language model designed to decode the molecular language of proteins. Using natural language dialogue, it bridges protein sequences, structures, and biological functions. Evola’s innovation lies in its training on an AI-synthesized dataset of 546 million protein question-answer pairs encompassing 150 billion tokens, a scale unprecedented for this task. Employing DPO and RAG, it refines response quality and integrates external knowledge. Evaluated using the IRS, Evola delivers expert-level insights, advancing proteomics and functional genomics while offering a powerful tool to unravel the molecular complexity of proteins and their biological roles.

Check out the Paper. All credit for this research goes to the researchers of this project.


This AI Paper Explores Quantization Techniques and Their Impact on Mathematical Reasoning in Large Language Models

Mathematical reasoning is a cornerstone of artificial intelligence and is central to arithmetic, geometric, and competition-level problems. LLMs have emerged as powerful tools for such reasoning, producing detailed step-by-step solutions and coherent explanations for complex tasks. That success, however, comes with steep computational requirements, which makes these models difficult to deploy in resource-constrained environments.

The immediate challenge for researchers is to lower the computational and memory needs of LLMs without degrading performance. Mathematical reasoning is especially demanding because it requires both numerical accuracy and logical consistency, and many compression techniques can compromise one or both. These limitations severely constrain scaling the models to realistic deployments.

Current approaches to this challenge include pruning, knowledge distillation, and quantization. Quantization, the process of converting model weights and activations to low-bit formats, has shown promise for reducing memory consumption while improving computational efficiency. However, its impact on tasks that require stepwise reasoning is poorly understood, especially in mathematical domains, and most existing evaluations do not capture the nuances of the trade-off between efficiency and reasoning fidelity.
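
To make the trade-off tangible, here is a minimal symmetric int8 weight-quantization sketch (a generic illustration, not GPTQ or SmoothQuant themselves): weights are scaled into the int8 range, rounded, and dequantized, and the resulting rounding error is the kind of noise that can disturb multi-step numerical reasoning.

import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor quantization: scale into [-127, 127], round, clip.
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = (np.random.randn(4, 8) * 0.1).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max abs rounding error:", np.abs(w - w_hat).max())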

A group of researchers from The Hong Kong Polytechnic University, Southern University of Science & Technology, Tsinghua University, Wuhan University, and The University of Hong Kong developed a systematic framework for studying the effects of quantization on mathematical reasoning. They applied several quantization techniques, such as GPTQ and SmoothQuant, and evaluated their individual and combined impact on reasoning. The team focused on the MATH benchmark, which requires step-by-step problem solving, and analyzed the performance degradation caused by these methods at varying levels of precision.

The methodology involved training models with structured tokens and annotations, including special markers that delimit reasoning steps so the model retains intermediate steps even under quantization. The approach keeps architectural changes minimal and applies fine-tuning techniques similar to LoRA, balancing efficiency against accuracy in the quantized models while preserving logical consistency. The step-level correctness labels from the PRM800K dataset served as training data, giving the models granular reasoning steps to learn to reproduce.
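
The structured-token idea can be pictured as wrapping each reasoning step in explicit markers before fine-tuning, so the quantized model is still trained to emit intermediate steps. The marker strings below are made up for illustration; the paper's exact annotation scheme may differ.

def annotate_steps(question, steps, answer):
    # Wrap each intermediate step in explicit step markers (illustrative format).
    body = "".join(f"<step_{i}> {s} </step_{i}>\n" for i, s in enumerate(steps, 1))
    return f"<question> {question} </question>\n{body}<answer> {answer} </answer>"

print(annotate_steps(
    "What is 12 * 15 + 7?",
    ["12 * 15 = 180", "180 + 7 = 187"],
    "187",
))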

A thorough performance analysis unveiled critical deficiencies of the quantized models. Quantization heavily impacted computation-intensive tasks, with large performance degradations across different configurations. For example, the Llama-3.2-3B model lost accuracy, with scores falling from 5.62 in full precision to 3.88 with GPTQ quantization and 4.64 with SmoothQuant. The Llama-3.1-8B model had smaller performance losses, with scores falling from 15.30 in full precision to 11.56 with GPTQ and 13.56 with SmoothQuant. SmoothQuant showed the highest robustness of all methods tested, performing better than GPTQ and AWQ. The results highlighted some of the challenges in low-bit formats, particularly maintaining numerical computation precision and logical coherence.

An in-depth error analysis categorized issues into computation errors, logical errors, and step omissions. Computation errors were the most frequent, often stemming from low-bit precision overflow, disrupting the accuracy of multi-step calculations. Step omissions were also prevalent, especially in models with reduced activation precision, which failed to retain intermediate reasoning steps. Interestingly, some quantized models outperformed their full-precision counterparts in specific reasoning tasks, highlighting the nuanced effects of quantization.

The results of this study clearly illustrate the trade-offs between computational efficiency and reasoning accuracy in quantized LLMs. Although techniques such as SmoothQuant help mitigate some of the performance degradation, the challenges of maintaining high-fidelity reasoning remain significant. Researchers have provided valuable insights into optimizing LLMs for resource-constrained environments by introducing structured annotations and fine-tuning methods. These findings are pivotal for deploying LLMs in practical applications, offering a pathway to balance efficiency with reasoning capabilities.

In summary, this study addresses the critical gap in understanding the effect of quantization on mathematical reasoning. The methodologies and frameworks proposed here indicate some of the inadequacies in the existing quantization techniques and provide actionable strategies to overcome them. These advances open pathways toward more efficient and capable AI systems, narrowing the gap between theoretical potential and real-world applicability.

Check out the Paper. All credit for this research goes to the researchers of this project.


Build an Amazon Bedrock based digital lending solution on AWS

Digital lending is a critical business enabler for banks and financial institutions. Customers apply for a loan online after completing the know your customer (KYC) process. A typical digital lending process involves various activities, such as user onboarding (including steps to verify the user through KYC), credit verification, risk verification, credit underwriting, and loan sanctioning. Currently, some of these activities are done manually, leading to delays in loan sanctioning and impacting the customer experience.
In India, the KYC verification usually involves identity verification through identification documents for Indian citizens, such as a PAN card or Aadhar card, address verification, and income verification. Credit checks in India are normally done using the PAN number of a customer. The ideal way to address these challenges is to automate them to the extent possible.
The digital lending solution primarily needs orchestration of a sequence of steps and other features such as natural language understanding, image analysis, real-time credit checks, and notifications. You can seamlessly build automation around these features using Amazon Bedrock Agents. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. With Amazon Bedrock Agents, you can orchestrate multi-step processes and integrate with enterprise data using natural language instructions.
In this post, we propose a solution using DigitalDhan, a generative AI-based solution to automate customer onboarding and digital lending. The proposed solution uses Amazon Bedrock Agents to automate services related to KYC verification, credit and risk assessment, and notification. Financial institutions can use this solution to help automate the customer onboarding, KYC verification, credit decisioning, credit underwriting, and notification processes. This post demonstrates how you can gain a competitive advantage using Amazon Bedrock Agents based automation of a complex business process.
Why generative AI is best suited for assistants that support customer journeys
Traditional AI assistants that use rules-based navigation or natural language processing (NLP) based guidance fall short when handling the nuances of complex human conversations. In a real-world customer conversation, the customer might provide incomplete information (for example, missing documents), ask unrelated questions that aren’t part of the predefined flow (for example, asking about loan pre-payment options while verifying identity documents), or phrase inputs in varied natural language (such as writing twenty thousand as “20K”, “20000”, or “20,000”). Rules-based assistants also don’t provide reasoning or explanations (such as why a loan was denied), and their rigid, linear flows either force customers to start the process over or require human assistance to complete the conversation.
Generative AI assistants excel at handling these challenges. With well-crafted instructions and prompts, a generative AI-based assistant can ask for missing details, converse in human-like language, and handle errors gracefully while explaining the reasoning for their actions when required. You can add guardrails to make sure that these assistants don’t deviate from the main topic and provide flexible navigation options that account for real-world complexities. Context-aware assistants also enhance customer engagement by flexibly responding to the various off-the-flow customer queries.
Solution overview
DigitalDhan, the proposed digital lending solution, is powered by Amazon Bedrock Agents and fully automates the customer onboarding, KYC verification, and credit underwriting process. The DigitalDhan service provides the following features:

Customers can understand the step-by-step loan process and the documents required through the solution
Customers can upload KYC documents such as PAN and Aadhar, which DigitalDhan verifies through automated workflows
DigitalDhan fully automates the credit underwriting and loan application process
DigitalDhan notifies the customer about the loan application through email

We have modeled the digital lending process closely on a real-world scenario. The high-level steps of the DigitalDhan solution are shown in the following figure.

The key business process steps are:

The loan applicant initiates the loan application flow by accessing the DigitalDhan solution.
The loan applicant begins the loan application journey. Sample prompts for the loan application include:

“What is the process to apply for loan?”
“I would like to apply for loan.”
“My name is Adarsh Kumar. PAN is ABCD1234 and email is john_doe@example.org. I need a loan for 150000.”
The applicant uploads their PAN card.
The applicant uploads their Aadhar card.

The DigitalDhan solution processes each natural language prompt. As part of document verification, it extracts key details from the uploaded PAN and Aadhar cards, such as name, address, and date of birth. The solution then uses the PAN to identify whether the user is an existing customer.

If the user is an existing customer, the solution gets the internal risk score for the customer.
If the user is a new customer, the solution gets the credit score based on the PAN details.

The solution uses the internal risk score for an existing customer to check for credit worthiness.
The solution uses the external credit score for a new customer to check for credit worthiness.
The credit underwriting process involves credit decisioning based on the credit score and risk score, and calculates the final loan amount for the approved customer.
The loan application details along with the decision are sent to the customer through email.

Technical solution architecture
The solution primarily uses Amazon Bedrock Agents (to orchestrate the multi-step process), Amazon Textract (to extract data from the PAN and Aadhar cards), and Amazon Comprehend (to identify the entities from the PAN and Aadhar card). The solution architecture is shown in the following figure.

The key solution components of the DigitalDhan solution architecture are:

A user begins the onboarding process with the DigitalDhan application. They provide various documents (including PAN and Aadhar) and a loan amount as part of the KYC process.
After the documents are uploaded, they’re automatically processed using various artificial intelligence and machine learning (AI/ML) services.
Amazon Textract is used to extract text from the uploaded documents.
Amazon Comprehend is used to identify entities such as the PAN and Aadhar numbers (a minimal boto3 sketch of this extraction step follows this list).
The credit underwriting flow is powered by Amazon Bedrock Agents.

The knowledge base contains loan-related documents to respond to loan-related queries.
The loan handler AWS Lambda function uses the information in the KYC documents to check the credit score and internal risk score. After the credit checks are complete, the function calculates the loan eligibility and processes the loan application.
The notification Lambda function emails information about the loan application to the customer.

The Lambda function can be integrated with external credit APIs.
Amazon Simple Email Service (Amazon SES) is used to notify customers of the status of their loan application.
The events are logged using Amazon CloudWatch.
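
A minimal boto3 sketch of the Textract and Comprehend calls used in this extraction step is shown below. The bucket and key names are placeholders, AWS credentials and a Region are assumed to be configured, and error handling and PAN/Aadhar-specific parsing are omitted.

import boto3

def extract_kyc_fields(bucket, key):
    textract = boto3.client("textract")
    comprehend = boto3.client("comprehend")

    # Pull raw text lines from the uploaded PAN or Aadhar image stored in Amazon S3.
    ocr = textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    lines = [b["Text"] for b in ocr["Blocks"] if b["BlockType"] == "LINE"]
    text = "\n".join(lines)

    # Identify entities (names, dates, identifiers) in the extracted text.
    entities = comprehend.detect_entities(Text=text, LanguageCode="en")
    return text, entities["Entities"]

# Example usage (placeholder bucket and key):
# text, entities = extract_kyc_fields("digitaldhan-uploads", "kyc/pan_card.png")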

Amazon Bedrock Agents deep dive
Because we used Amazon Bedrock Agents heavily in the DigitalDhan solution, let’s look at the overall functioning of Amazon Bedrock Agents. The flow of the various components of Amazon Bedrock Agents is shown in the following figure.

The Amazon Bedrock agents break each task into subtasks, determine the right sequence, and perform actions and knowledge searches. The detailed steps are:

Processing the loan application is the primary task performed by the Amazon Bedrock agents in the DigitalDhan solution.
The Amazon Bedrock agents use the user prompts, conversation history, knowledge base, instructions, and action groups to orchestrate the sequence of steps related to loan processing. The Amazon Bedrock agent takes natural language prompts as inputs. The following are the instructions given to the agent:

You are DigitalDhan, an advanced AI lending assistant designed to provide personal loan-related information and create loan applications. Always ask for relevant information and avoid making assumptions. If you’re unsure about something, clearly state “I don’t have that information.”

Always greet the user by saying the following: “Hi there! I am DigitalDhan bot. I can help you with loans over this chat. To apply for a loan, kindly provide your full name, PAN Number, email, and the loan amount.”

When a user expresses interest in applying for a loan, follow these steps in order, always ask the user for necessary details:

1. Determine user status: Identify if they’re an existing or new customer.

2. User greeting (mandatory, do not skip): After determining user status, welcome returning users using the following format:

  Existing customer: Hi {customerName}, I see you are an existing customer. Please upload your PAN for KYC.

  New customer: Hi {customerName}, I see you are a new customer. Please upload your PAN and Aadhar for KYC.

3. Call Pan Verification step using the uploaded PAN document

4. Call Aadhaar Verification step using the uploaded Aadhaar document. Request the user to upload their Aadhaar card document for verification.

5. Loan application: Collect all necessary details to create the loan application.

6. If the loan is approved (email will be sent with details):

   For existing customers: If the loan officer approves the application, inform the user that their loan application has been approved using following format: Congratulations {customerName}, your loan is sanctioned. Based on your PAN {pan}, your risk score is {riskScore} and your overall credit score is {cibilScore}. I have created your loan and the application ID is {loanId}. The details have been sent to your email.

   For new customers: If the loan officer approves the application, inform the user that their loan application has been approved using following format: Congratulations {customerName}, your loan is sanctioned. Based on your PAN {pan} and {aadhar}, your risk score is {riskScore} and your overall credit score is {cibilScore}. I have created your loan and the application ID is {loanId}. The details have been sent to your email.

7. If the loan is rejected (no emails sent):

   For new customers: If the loan officer rejects the application, inform the user that their loan application has been rejected using following format: Hello {customerName}, Based on your PAN {pan} and aadhar {aadhar}, your overall credit score is {cibilScore}. Because of the low credit score, unfortunately your loan application cannot be processed.

   For existing customers: If the loan officer rejects the application, inform the user that their loan application has been rejected using following format: Hello {customerName}, Based on your PAN {pan}, your overall credit score is {creditScore}. Because of the low credit score, unfortunately your loan application cannot be processed.

Remember to maintain a friendly, professional tone and prioritize the user’s needs and concerns throughout the interaction. Be short and direct in your responses and avoid making assumptions unless specifically requested by the user.

Be short and prompt in responses, do not answer queries beyond the lending domain and respond saying you are a lending assistant

We configured the agent preprocessing and orchestration instructions to validate and perform the steps in a predefined sequence. The few-shot examples specified during the agent instructions boost the accuracy of the agent performance. Based on the instructions and the API descriptions, the Amazon Bedrock agent creates a logical sequence of steps to complete an action. In the DigitalDhan example, instructions are specified such that the Amazon Bedrock agent creates the following sequence:

Greet the customer.
Collect the customer’s name, email, PAN, and loan amount.
Ask for the PAN card and Aadhar card to read and verify the PAN and Aadhar number.
Categorize the customer as an existing or new customer based on the verified PAN.
For an existing customer, calculate the customer internal risk score.
For a new customer, get the external credit score.
Use the internal risk score (for existing customers) or the external credit score (for new customers) for credit underwriting. If the internal risk score is less than 300 or the credit score is more than 700, sanction the loan amount (a minimal Lambda sketch of this rule follows this list).
Email the credit decision to the customer’s email address.
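
The underwriting rule in step 7 could be implemented in the loan handler Lambda function roughly as below. This is an illustrative sketch rather than the repository's code; the event shape is assumed, and the thresholds follow the rule stated above.

def lambda_handler(event, context):
    # Assumed event shape: the agent's action group passes these parameters.
    is_existing = bool(event.get("isExistingCustomer", False))
    risk_score = int(event.get("riskScore", 0))        # internal score for existing customers
    credit_score = int(event.get("creditScore", 0))    # external score for new customers
    loan_amount = int(event.get("loanAmt", 0))

    # Sanction the loan if the internal risk score is below 300 (existing customers)
    # or the external credit score is above 700 (new customers).
    approved = risk_score < 300 if is_existing else credit_score > 700

    return {
        "status": "APPROVED" if approved else "REJECTED",
        "sanctionedAmount": loan_amount if approved else 0,
    }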

Action groups define the APIs for performing actions such as creating the loan, checking the user, fetching the risk score, and so on. We described each of the APIs in the OpenAPI schema, which the agent uses to select the most appropriate API to perform the action. Lambda is associated with the action group. The following code is an example of the create_loan API. The Amazon Bedrock agent uses the description for the create_loan API while performing the action. The API schema also specifies customerName, address, loanAmt, PAN, and riskScore as required elements for the APIs. Therefore, the corresponding APIs read the PAN number for the customer (verify_pan_card API), calculate the risk score for the customer (fetch_risk_score API), and identify the customer’s name and address (verify_aadhar_card API) before calling the create_loan API.

"/create_loan":
  post:
    summary: Create New Loan application
    description: Create new loan application for the customer. This API must be
      called for each new loan application request after calculating riskScore and
      creditScore
    operationId: createLoan
    requestBody:
      required: true
      content:
        application/json:
          schema:
            type: object
            properties:
              customerName:
                type: string
                description: Customer's Name for creating the loan application
                minLength: 3
              loanAmt:
                type: string
                description: Preferred loan amount for the loan application
                minLength: 5
              pan:
                type: string
                description: Customer's PAN number for the loan application
                minLength: 10
              riskScore:
                type: string
                description: Risk Score of the customer
                minLength: 2
              creditScore:
                type: string
                description: Credit Score of the customer
                minLength: 3
            required:
            - customerName
            - address
            - loanAmt
            - pan
            - riskScore
            - creditScore
    responses:
      '200':
        description: Success
        content:
          application/json:
            schema:
              type: object
              properties:
                loanId:
                  type: string
                  description: Identifier for the created loan application
                status:
                  type: string
                  description: Status of the loan application creation process

Amazon Bedrock Knowledge Bases provides a cloud-based Retrieval Augmented Generation (RAG) experience. We added documents related to loan processing, general information, and the loan information guide to the knowledge base, and specified instructions for when to use it. As a result, early in the customer journey, when the customer is still exploring, they get responses with how-to instructions and general loan-related information. For instance, if the customer asks “What is the process to apply for a loan?”, the Amazon Bedrock agent fetches the relevant step-by-step details from the knowledge base.
After the required steps are complete, the Amazon Bedrock agent curates the final response to the customer.
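
Client applications interact with the agent through the bedrock-agent-runtime API. The sketch below shows a minimal invocation; the agent ID, alias ID, and session ID are placeholders, and AWS credentials and Region are assumed to be configured.

import boto3

def ask_digitaldhan(prompt, session_id="demo-session-1"):
    agent_runtime = boto3.client("bedrock-agent-runtime")
    response = agent_runtime.invoke_agent(
        agentId="AGENT_ID_PLACEHOLDER",
        agentAliasId="AGENT_ALIAS_ID_PLACEHOLDER",
        sessionId=session_id,
        inputText=prompt,
    )
    # The completion is returned as an event stream of chunks.
    parts = []
    for event in response["completion"]:
        chunk = event.get("chunk")
        if chunk:
            parts.append(chunk["bytes"].decode("utf-8"))
    return "".join(parts)

# Example usage:
# print(ask_digitaldhan("What is the process to apply for a loan?"))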

Let’s explore an example flow for an existing customer. For this example, we have depicted various actions performed by Amazon Bedrock Agents for an existing customer. First, the customer begins the loan journey by asking exploratory questions. We have depicted one such question—“What is the process to apply for a loan?”—in the following figure. Amazon Bedrock responds to such questions by providing a step-by-step guide fetched from the configured knowledge base.

The customer proceeds to the next step and tries to apply for a loan. The DigitalDhan solution asks for user details such as the customer name, email address, PAN number, and desired loan amount. After the customer provides those details, the solution asks for the actual PAN card to verify the details, as shown in the following figure.

When the PAN verification and the risk score checks are complete, the DigitalDhan solution creates a loan application and notifies the customer of the decision through the email, as shown in the following figure.

Prerequisites
This project is built using the AWS Cloud Development Kit (AWS CDK).
For reference, the following versions of node and AWS CDK are used:

Node.js: v20.16.0
AWS CDK: 2.143.0
The command to install a specific version of the AWS CDK is npm install -g aws-cdk@<X.YY.Z>

Deploy the Solution
Complete the following steps to deploy the solution. For more details, refer to the GitHub repo.

Clone the repository:

git clone https://github.com/aws-samples/DigitalDhan-GenAI-FSI-LendingSolution-India.git

Enter the code sample backend directory:

cd DigitalDhan-GenAI-FSI-LendingSolution-India/

Install packages:

npm install
npm install -g aws-cdk

Bootstrap AWS CDK resources on the AWS account. If deployed in any AWS Region other than us-east-1, the stack might fail because of Lambda layers dependency. You can either comment the layer and deploy in another Region or deploy in us-east-1.

cdk bootstrap aws://<ACCOUNT_ID>/<REGION>

You must explicitly enable access to models before they can be used with the Amazon Bedrock service. Follow the steps in Access Amazon Bedrock foundation models to enable access to the models (Anthropic::Claude (Sonnet) and Cohere::Embed English).
Deploy the sample in your account. The following command deploys the stack in your account: cdk deploy --all. To protect against unintended changes that might affect your security posture, the AWS CDK prompts you to approve security-related changes before deploying them. Answer yes to fully deploy the stack.

The AWS Identity and Access Management (IAM) role creation in this example is for illustration only. Always provision IAM roles with the least required privileges. The stack deployment takes approximately 10–15 minutes. After the stack is successfully deployed, you can find InsureAssistApiAlbDnsName in the output section of the stack—this is the application endpoint.
Enable user input
After deployment is complete, enable user input so the agent can prompt the customer to provide additional information if necessary.

Open the Amazon Bedrock console in the deployed Region and edit the agent.
Modify the additional settings to enable User Input to allow the agent to prompt for additional information from the user when it doesn’t have enough information to respond to a prompt.

Test the solution
We covered three test scenarios in the solution. The sample data and prompts for the three scenarios can be found in the GitHub repo.

Scenario 1 is an existing customer who will be approved for the requested loan amount
Scenario 2 is a new customer who will be approved for the requested loan amount
Scenario 3 is a new customer whose loan application will be denied because of a low credit score

Clean up
To avoid future charges, delete the sample data stored in Amazon Simple Storage Service (Amazon S3) and the stack:

Remove all data from the S3 bucket.
Delete the S3 bucket.
Use the following command to destroy the stack: cdk destroy

Summary
The proposed digital lending solution discussed in this post onboards a customer by verifying the KYC documents (including the PAN and Aadhar cards) and categorizes the customer as an existing customer or a new customer. For an existing customer, the solution uses an internal risk score, and for a new customer, the solution uses the external credit score.
The solution uses Amazon Bedrock Agents to orchestrate the digital lending processing steps. The documents are processed using Amazon Textract and Amazon Comprehend, after which Amazon Bedrock Agents processes the workflow steps. The customer identification, credit checks, and customer notification are implemented using Lambda.
The solution demonstrates how you can automate a complex business process with the help of Amazon Bedrock Agents and enhance customer engagement through a natural language interface and flexible navigation options.
Explore Amazon Bedrock for banking use cases such as building customer service bots, email classification, and sales assistants by using the powerful FMs and Amazon Bedrock Knowledge Bases, which provide a managed RAG experience. Also explore using Amazon Bedrock Agents to help orchestrate and automate complex banking processes such as customer onboarding, document verification, digital lending, loan origination, and customer servicing.

About the Authors
Shailesh Shivakumar is a FSI Sr. Solutions Architect with AWS India. He works with financial enterprises such as banks, NBFCs, and trading enterprises to help them design secure cloud services and engages with them to accelerate their cloud journey. He builds demos and proofs of concept to demonstrate the possibilities of AWS Cloud. He leads other initiatives such as customer enablement workshops, AWS demos, cost optimization, and solution assessments to make sure that AWS customers succeed in their cloud journey. Shailesh is part of Machine Learning TFC at AWS, handling the generative AI and machine learning-focused customer scenarios. Security, serverless, containers, and machine learning in the cloud are his key areas of interest.
Reena Manivel is an AWS FSI Solutions Architect. She specializes in analytics and works with customers in lending and banking businesses to create secure, scalable, and efficient solutions on AWS. Besides her technical pursuits, she is also a writer and enjoys spending time with her family.

Build AI-powered malware analysis using Amazon Bedrock with Deep Instinct

This post is co-written with Yaniv Avolov, Tal Furman and Maor Ashkenazi from Deep Instinct.
Deep Instinct is a cybersecurity company that offers a state-of-the-art, comprehensive zero-day data security solution, Data Security X (DSX), for safeguarding data repositories across the cloud, applications, network attached storage (NAS), and endpoints. DSX provides unmatched prevention and explainability by combining the deep learning-based DSX Brain with the generative AI-powered DSX Companion to protect systems from known and unknown malware and ransomware in real time.
Using deep neural networks (DNNs), Deep Instinct analyzes threats with unmatched accuracy, adapting to identify new and unknown risks that traditional methods might miss. This approach significantly reduces false positives and enables unparalleled threat detection rates, making it popular among large enterprises and critical infrastructure sectors such as finance, healthcare, and government.
In this post, we explore how Deep Instinct’s generative AI-powered malware analysis tool, DIANNA, uses Amazon Bedrock to revolutionize cybersecurity by providing rapid, in-depth analysis of known and unknown threats, enhancing the capabilities of security operations center (SOC) teams and addressing key challenges in the evolving threat landscape.
Main challenges for SecOps
There are two main challenges for SecOps:

The growing threat landscape – With a rapidly evolving threat landscape, SOC teams are becoming overwhelmed with a continuous increase of security alerts that require investigation. This situation hampers proactive threat hunting and exacerbates team burnout. Most importantly, the surge in alert storms increases the risk of missing critical alerts. A solution is needed that provides the explainability necessary to allow SOC teams to perform quick risk assessments regarding the nature of incidents and make informed decisions.
The challenges of malware analysis – Malware analysis has become an increasingly critical and complex field. The challenge of zero-day attacks lies in the limited information about why a file was blocked and classified as malicious. Threat analysts often spend considerable time assessing whether it was a genuine exploit or a false positive.

Let’s explore some of the key challenges that make malware analysis demanding:

Identifying malware – Modern malware has become incredibly sophisticated in its ability to disguise itself. It often mimics legitimate software, making it challenging for analysts to distinguish between benign and malicious code. Some malware can even disable security tools or evade scanners, further obfuscating detection.
Preventing zero-day threats – The rise of zero-day threats, which have no known signatures, adds another layer of difficulty. Identifying unknown malware is crucial, because failure can lead to severe security breaches and potentially incapacitate organizations.
Information overload – The powerful malware analysis tools currently available can be both beneficial and detrimental. Although they offer high explainability, they can also produce an overwhelming amount of data, forcing analysts to sift through a digital haystack to find indicators of malicious activity, increasing the possibility of analysts overlooking critical compromises.
Connecting the dots – Malware often consists of multiple components interacting in complex ways. Not only do analysts need to identify the individual components, but they also need to understand how they interact. This process is like assembling a jigsaw puzzle to form a complete picture of the malware’s capabilities and intentions, with pieces constantly changing shape.
Keeping up with cybercriminals – The world of cybercrime is fluid, with bad actors relentlessly developing new techniques and exploiting newly emerging vulnerabilities, leaving organizations struggling to keep up. The time window between the discovery of a vulnerability and its exploitation in the wild is narrowing, putting pressure on analysts to work faster and more efficiently. This rapid evolution means that malware analysts must constantly update their skill set and tools to stay one step ahead of the cybercriminals.
Racing against the clock – In malware analysis, time is of the essence. Malicious software can spread rapidly across networks, causing significant damage in a matter of minutes, often before the organization realizes an exploit has occurred. Analysts face the pressure of conducting thorough examinations while also providing timely insights to prevent or mitigate exploits.

DIANNA, the DSX Companion
There is a critical need for malware analysis tools that can provide precise, real-time, in-depth malware analysis for both known and unknown threats, supporting SecOps efforts. Deep Instinct, recognizing this need, has developed DIANNA (Deep Instinct’s Artificial Neural Network Assistant), the DSX Companion. DIANNA is a groundbreaking malware analysis tool powered by generative AI to tackle real-world issues, using Amazon Bedrock as its large language model (LLM) infrastructure. It offers on-demand features that provide flexible and scalable AI capabilities tailored to the unique needs of each client. Amazon Bedrock is a fully managed service that grants access to high-performance foundation models (FMs) from top AI companies through a unified API. By concentrating our generative AI models on specific artifacts, we can deliver comprehensive yet focused responses to address this gap effectively.
DIANNA is a sophisticated malware analysis tool that acts as a virtual team of malware analysts and incident response experts. It enables organizations to shift strategically toward zero-day data security by integrating with Deep Instinct’s deep learning capabilities for a more intuitive and effective defense against threats.
DIANNA’s unique approach
Current cybersecurity solutions use generative AI to summarize data from existing sources, but this approach is limited to retrospective analysis with limited context. DIANNA enhances this by integrating the collective expertise of numerous cybersecurity professionals within the LLM, enabling in-depth malware analysis of unknown files and accurate identification of malicious intent.
DIANNA’s unique approach to malware analysis sets it apart from other cybersecurity solutions. Unlike traditional methods that rely solely on retrospective analysis of existing data, DIANNA harnesses generative AI to empower itself with the collective knowledge of countless cybersecurity experts, sources, blog posts, papers, threat intelligence reputation engines, and chats. This extensive knowledge base is effectively embedded within the LLM, allowing DIANNA to delve deep into unknown files and uncover intricate connections that would otherwise go undetected.
At the heart of this process are DIANNA’s advanced translation engines, which transform complex binary code into natural language that LLMs can understand and analyze. This unique approach bridges the gap between raw code and human-readable insights, enabling DIANNA to provide clear, contextual explanations of a file’s intent, malicious aspects, and potential system impact. By translating the intricacies of code into accessible language, DIANNA addresses the challenge of information overload, distilling vast amounts of data into concise, actionable intelligence.
This translation capability is key for linking between different components of complex malware. It allows DIANNA to identify relationships and interactions between various parts of the code, offering a holistic view of the threat landscape. By piecing together these components, DIANNA can construct a comprehensive picture of the malware’s capabilities and intentions, even when faced with sophisticated threats. DIANNA doesn’t stop at simple code analysis—it goes deeper. It provides insights into why unknown events are malicious, streamlining what is often a lengthy process. This level of understanding allows SOC teams to focus on the threats that matter most.
Solution overview
DIANNA’s integration with Amazon Bedrock allows us to harness the power of state-of-the-art language models while maintaining agility to adapt to evolving client requirements and security considerations. DIANNA benefits from the robust features of Amazon Bedrock, including seamless scaling, enterprise-grade security, and the ability to fine-tune models for specific use cases.
The integration offers the following benefits:

Accelerated development with Amazon Bedrock – The fast-paced evolution of the threat landscape necessitates equally responsive cybersecurity solutions. DIANNA’s collaboration with Amazon Bedrock has played a crucial role in optimizing our development process and speeding up the delivery of innovative capabilities. The service’s versatility has enabled us to experiment with different FMs, exploring their strengths and weaknesses in various tasks. This experimentation has led to significant advancements in DIANNA’s ability to understand and explain complex malware behaviors. We have also benefited from the following features:

Fine-tuning – Alongside its core functionalities, Amazon Bedrock provides a range of ready-to-use customization features. One such feature is model fine-tuning, which allows you to train FMs on proprietary data to enhance their performance in specific domains. For example, an organization could fine-tune an LLM-based malware analysis tool to recognize industry-specific jargon or detect threats associated with particular vulnerabilities.
Retrieval Augmented Generation – Another valuable feature is the use of Retrieval Augmented Generation (RAG), enabling access to and the incorporation of relevant information from external sources, such as knowledge bases or threat intelligence feeds. This enhances the model’s ability to provide contextually accurate and informative responses, improving the overall effectiveness of malware analysis.

A landscape for innovation and comparison – Amazon Bedrock has also served as a valuable landscape for conducting LLM-related research and comparisons.
Seamless integration, scalability, and customization – Integrating Amazon Bedrock into DIANNA’s architecture was a straightforward process. The user-friendly and well-documented Amazon Bedrock API facilitated seamless integration with our existing infrastructure. Furthermore, the service’s on-demand nature allows us to scale our AI capabilities up or down based on customer demand. This flexibility makes sure that DIANNA can handle fluctuating workloads without compromising performance.
Prioritizing data security and compliance – Data security and compliance are paramount in the cybersecurity domain. Amazon Bedrock offers enterprise-grade security features that provide us with the confidence to handle sensitive customer data. The service’s adherence to industry-leading security standards, coupled with the extensive experience of AWS in data protection, makes sure DIANNA meets the highest regulatory requirements such as GDPR. By using Amazon Bedrock, we can offer our customers a solution that not only protects their assets, but also demonstrates our commitment to data privacy and security.
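
As referenced in the RAG item above, the following is a minimal sketch of a retrieval-augmented call using the Amazon Bedrock retrieve-and-generate API with the AWS SDK for Python (Boto3). It is a generic illustration rather than DIANNA's internal pipeline; the knowledge base ID, model ARN, and query text are placeholders.

import boto3

# Generic Bedrock RAG call; the knowledge base ID and model ARN are placeholders, not DIANNA's configuration.
client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
response = client.retrieve_and_generate(
    input={"text": "Summarize known behaviors associated with this loader family."},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "<knowledge base ID>",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
        },
    },
)
print(response["output"]["text"])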

By combining Deep Instinct’s proprietary prevention algorithms with the advanced language processing capabilities of Amazon Bedrock, DIANNA offers a unique solution that not only identifies and analyzes threats with high accuracy, but also communicates its findings in clear, actionable language. This synergy between Deep Instinct’s expertise in cybersecurity and the leading AI infrastructure of Amazon positions DIANNA at the forefront of AI-driven malware analysis and threat prevention.
The following diagram illustrates DIANNA’s architecture.

Evaluating DIANNA’s malware analysis
In our task, the input is a malware sample, and the output is a comprehensive, in-depth report on the behaviors and intents of the file. However, generating ground truth data is particularly challenging. The behaviors and intents of malicious files aren’t readily available in standard datasets and require expert malware analysts for accurate reporting. Therefore, we needed a custom evaluation approach.
We focused our evaluation on two core dimensions:

Technical features – This dimension focuses on objective, measurable capabilities. We used programmable metrics to assess how well DIANNA handled key technical aspects, such as extracting indicators of compromise (IOCs), detecting critical keywords, and processing the length and structure of threat reports. These metrics allowed us to quantitatively assess the model’s basic analysis capabilities (a short sketch of such checks follows this list).
In-depth semantics – Because DIANNA is expected to generate complex, human-readable reports on malware behavior, we relied on domain experts (malware analysts) to assess the quality of the analysis. The reports were evaluated based on the following:

Depth of information – Whether DIANNA provided a detailed understanding of the malware’s behavior and techniques.
Accuracy – How well the analysis aligned with the true behaviors of the malware.
Clarity and structure – Evaluating the organization of the report, making sure the output was clear and comprehensible for security teams.
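
As a rough illustration of the programmable checks referenced above, the following sketch scores a generated report on IOC extraction, keyword coverage, and length. The regular expressions, keyword list, and length bounds are hypothetical examples, not DIANNA's actual metrics.

import re

IOC_PATTERNS = {
    "ipv4": r"\b(?:\d{1,3}\.){3}\d{1,3}\b",
    "sha256": r"\b[a-fA-F0-9]{64}\b",
    "url": r"https?://[^\s\"']+",
}
EXPECTED_KEYWORDS = {"persistence", "c2", "injection"}  # hypothetical keyword set

def score_report(report: str) -> dict:
    # Count IOCs, measure keyword coverage, and check the report length against rough bounds.
    iocs = {name: re.findall(pattern, report) for name, pattern in IOC_PATTERNS.items()}
    keywords_hit = {kw for kw in EXPECTED_KEYWORDS if kw in report.lower()}
    words = len(report.split())
    return {
        "ioc_count": sum(len(found) for found in iocs.values()),
        "keyword_coverage": len(keywords_hit) / len(EXPECTED_KEYWORDS),
        "length_ok": 150 <= words <= 1200,  # hypothetical bounds
    }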

Because human evaluation is labor-intensive, fine-tuning the key components (the model itself, the prompts, and the translation engines) involved iterative feedback loops. Small adjustments in a component led to significant variations in the output, requiring repeated validations by human experts. The meticulous nature of this process, combined with the continuous need for scaling, has subsequently led to the development of the auto-evaluation capability.
Fine-tuning process and human validation
The fine-tuning and validation process consisted of the following steps:

Gathering a malware dataset – To cover the breadth of malware techniques, families, and threat types, we collected a large dataset of malware samples, each with technical metadata.
Splitting the dataset – The data was split into subsets for training, validation, and evaluation. Validation data was continually used to test how well DIANNA adapted after each key component update.
Human expert evaluation – Each time we fine-tuned DIANNA’s model, prompts, and translation mechanisms, human malware analysts reviewed a portion of the validation data. This made sure improvements or degradations in the quality of the reports were identified early. Because DIANNA’s outputs are highly sensitive to even minor changes, each update required a full reevaluation by human experts to verify whether the response quality was improved or degraded.
Final evaluation on a broader dataset – After sufficient tuning based on the validation data, we applied DIANNA to a large evaluation set. Here, we gathered comprehensive statistics on its performance to confirm improvements in report quality, correctness, and overall technical coverage.

Automation of evaluation
To make this process more scalable and efficient, we introduced an automatic evaluation phase. We trained a language model specifically designed to critique DIANNA’s outputs, providing a level of automation in assessing how well DIANNA was generating reports. This critique model acted as an internal judge, allowing for continuous, rapid feedback on incremental changes during fine-tuning. This enabled us to make small adjustments across DIANNA’s three core components (model, prompts, and translation engines) while receiving real-time evaluations of the impact of those changes.
This automated critique model enhanced our ability to test and refine DIANNA without having to rely solely on the time-consuming manual feedback loop from human experts. It provided a consistent, reliable measure of performance and allowed us to quickly identify which model adjustments led to meaningful improvements in DIANNA’s analysis.
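DIANNA's critique model is a purpose-trained judge, but the general shape of such an automated evaluation loop can be sketched with a hosted foundation model acting as the critic. The prompt, model ID, and scoring rubric below are assumptions for illustration only.

import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def critique(report: str) -> dict:
    # Ask a judge model to rate the report; the rubric and model ID are placeholders.
    prompt = (
        "You are a malware analysis reviewer. Rate the report below from 1 to 5 on "
        "depth, accuracy, and clarity. Return only JSON with those three keys.\n\n" + report
    )
    response = bedrock.converse(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return json.loads(response["output"]["message"]["content"][0]["text"])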
Advanced integration and proactive analysis
DIANNA is integrated with Deep Instinct’s proprietary deep learning algorithms, enabling it to detect zero-day threats with high accuracy and a low false positive rate. This proactive approach helps security teams quickly identify unknown threats, reduce false positives, and allocate resources more effectively. Additionally, it streamlines investigations, minimizes cross-tool efforts, and automates repetitive tasks, making the decision-making process clearer and faster. This ultimately helps organizations strengthen their security posture and significantly reduce the mean time to triage.
This analysis offers the following key features and benefits:

Performs on-the-fly file scans, allowing for immediate assessment without prior setup or delays
Generates comprehensive malware analysis reports for a variety of file types in seconds, making sure users receive timely information about potential threats
Streamlines the entire file analysis process, making it more efficient and user-friendly, thereby reducing the time and effort required for thorough evaluations
Supports a wide range of common file formats, including Office documents, Windows executable files, script files, and Windows shortcut files (.lnk), providing compatibility with various types of data
Offers in-depth contextual analysis, malicious file triage, and actionable insights, greatly enhancing the efficiency of investigations into potentially harmful files
Empowers SOC teams to make well-informed decisions without relying on manual malware analysis by providing clear and concise insights into the behavior of malicious files
Alleviates the need to upload files to external sandboxes or VirusTotal, thereby enhancing security and privacy while facilitating quicker analysis

Explainability and insights into better decision-making for SOC teams
DIANNA stands out by offering clear insights into why unknown events are flagged as malicious. Traditional AI tools often rely on lengthy, retrospective analyses that can take hours or even days to generate, and often lead to vague conclusions. DIANNA dives deeper, understanding the intent behind the code and providing detailed explanations of its potential impact. This clarity allows SOC teams to prioritize the threats that matter most.
Example scenario of DIANNA in action
In this section, we explore some DIANNA use cases.
For example, DIANNA can perform investigations on malicious files.
The following screenshot is an example of a Windows executable file analysis.
The following screenshot is an example of an Office file analysis.

You can also quickly triage incidents with enriched data on file analysis provided by DIANNA. The following screenshot is an example using Windows shortcut files (LNK) analysis.
The following screenshot is an example with a script file (JavaScript) analysis.
The following figure presents a before and after comparison of the analysis process.
Additionally, a key advantage of DIANNA is its ability to provide explainability by correlating and summarizing the intentions of malicious files in a detailed narrative. This is especially valuable for zero-day and unknown threats that aren’t yet recognized, making investigations challenging when starting from scratch without any clues.
Potential advancements in AI-driven cybersecurity
AI capabilities are enhancing daily operations, but adversaries are also using AI to create sophisticated malicious events and advanced persistent threats. This leaves organizations, particularly SOC and cybersecurity teams, dealing with more complex incidents.
Although detection controls are useful, they often require significant resources and can be ineffective on their own. In contrast, using AI engines for prevention controls—such as a high-efficacy deep learning engine—can lower the total cost of ownership and help SOC analysts streamline their tasks.
Conclusion
The Deep Instinct solution can predict and prevent known, unknown, and zero-day threats in under 20 milliseconds—750 times faster than the fastest ransomware encryption. This makes it essential for security stacks, offering comprehensive protection in hybrid environments.
DIANNA provides expert malware analysis and explainability for zero-day attacks and can enhance the incident response process for the SOC team, allowing them to efficiently tackle and investigate unknown threats with minimal time investment. This, in turn, reduces the resources and expenses that Chief Information Security Officers (CISOs) need to allocate, enabling them to invest in more valuable initiatives.
DIANNA’s collaboration with Amazon Bedrock accelerated development, enabled innovation through experimentation with various FMs, and facilitated seamless integration, scalability, and data security. The rise of AI-based threats is becoming more pronounced. As a result, defenders must outpace increasingly sophisticated bad actors by moving beyond traditional AI tools and embracing advanced AI, especially deep learning. Companies, vendors, and cybersecurity professionals must consider this shift to effectively combat the growing prevalence of AI-driven exploits.

About the Authors
Tzahi Mizrahi is a Solutions Architect at Amazon Web Services with experience in cloud architecture and software development. His expertise includes designing scalable systems, implementing DevOps best practices, and optimizing cloud infrastructure for enterprise applications. He has a proven track record of helping organizations modernize their technology stack and improve operational efficiency. In his free time, he enjoys music and plays the guitar.
Tal Panchek is a Senior Business Development Manager for Artificial Intelligence and Machine Learning with Amazon Web Services. As a BD Specialist, he is responsible for growing adoption, utilization, and revenue for AWS services. He gathers customer and industry needs and partners with AWS product teams to innovate, develop, and deliver AWS solutions.
Yaniv Avolov is a Principal Product Manager at Deep Instinct, bringing a wealth of experience in the cybersecurity field. He focuses on defining and designing cybersecurity solutions that leverage AIML, including deep learning and large language models, to address customer needs. In addition, he leads the endpoint security solution, ensuring it is robust and effective against emerging threats. In his free time, he enjoys cooking, reading, playing basketball, and traveling.
Tal Furman is a Data Science and Deep Learning Director at Deep Instinct. He is focused on applying machine learning and deep learning algorithms to tackle real-world challenges, and takes pride in leading people and technology to shape the future of cybersecurity. In his free time, Tal enjoys running, swimming, reading, and playfully trolling his kids and dogs.
Maor Ashkenazi is a deep learning research team lead at Deep Instinct, and a PhD candidate at Ben-Gurion University of the Negev. He has extensive experience in deep learning, neural network optimization, computer vision, and cyber security. In his spare time, he enjoys traveling, cooking, practicing mixology and learning new things.

Email your conversations from Amazon Q

As organizations navigate the complexities of the digital realm, generative AI has emerged as a transformative force, empowering enterprises to enhance productivity, streamline workflows, and drive innovation. To maximize the value of insights generated by generative AI, it is crucial to provide simple ways for users to preserve and share these insights using commonly used tools such as email.
Amazon Q Business is a generative AI-powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. It is redefining the way businesses approach data-driven decision-making, content generation, and secure task management. By using the custom plugin capability of Amazon Q Business, you can extend its functionality to support sending emails directly from Amazon Q applications, allowing you to store and share the valuable insights gleaned from your conversations with this powerful AI assistant.
Amazon Simple Email Service (Amazon SES) is an email service provider that provides a simple, cost-effective way for you to send and receive email using your own email addresses and domains. Amazon SES offers many email tools, including email sender configuration options, email deliverability tools, flexible email deployment options, sender and identity management, email security, email sending statistics, email reputation dashboard, and inbound email services.
This post explores how you can integrate Amazon Q Business with Amazon SES to email conversations to specified email addresses.
Solution overview
The following diagram illustrates the solution architecture.

The workflow includes the following steps:

Create an Amazon Q Business application with an Amazon Simple Storage Service (Amazon S3) data source. Amazon Q uses Retrieval Augmented Generation (RAG) to answer user questions.
Configure an AWS IAM Identity Center instance for your Amazon Q Business application environment with users and groups added. Amazon Q Business supports both organization- and account-level IAM Identity Center instances.
Create a custom plugin that invokes an Amazon API Gateway API defined by an OpenAPI schema. This API sends emails to the users.
Store OAuth information in AWS Secrets Manager and provide the secret information to the plugin.
Provide AWS Identity and Access Management (IAM) roles to access the secrets in Secrets Manager.
The custom plugin takes the user to an Amazon Cognito sign-in page. The user provides credentials to log in. After authentication, the user session is stored in the Amazon Q Business application for subsequent API calls.
Post-authentication, the custom plugin will pass the token to API Gateway to invoke the API.
You can help secure your API Gateway REST API from common web exploits, such as SQL injection and cross-site scripting (XSS) attacks, using AWS WAF.
AWS Lambda, hosted in Amazon Virtual Private Cloud (Amazon VPC), internally calls the Amazon SES SDK (a minimal handler sketch follows these steps).
Lambda uses AWS Identity and Access Management (IAM) permissions to make an SDK call to Amazon SES.
Amazon SES sends an email using SMTP to verified emails provided by the user.
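
As referenced above, the following is a minimal sketch of the Lambda-to-Amazon SES step using Boto3. Field names follow the sendEmailRequest schema shown later in this post; the subject text and the absence of error handling are simplifying assumptions.

import json
import boto3

ses = boto3.client("ses")

def lambda_handler(event, context):
    # Parse the API Gateway request body and send the email through Amazon SES.
    body = json.loads(event.get("body", "{}"))
    ses.send_email(
        Source=body["fromEmailAddress"],
        Destination={"ToAddresses": [body["toEmailAddress"]]},
        Message={
            "Subject": {"Data": "Amazon Q Business conversation summary"},
            "Body": {"Text": {"Data": body["emailContent"]}},
        },
    )
    return {"statusCode": 200, "body": json.dumps({"message": "Email sent successfully"})}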

In the following sections, we walk through the steps to deploy and test the solution. This solution is supported only in the us-east-1 AWS Region.
Prerequisites
Complete the following prerequisites:

Have a valid AWS account.
Enable an IAM Identity Center instance and capture the Amazon Resource Name (ARN) of the IAM Identity Center instance from the settings page.
Add users and groups to IAM Identity Center.
Have an IAM role in the account that has sufficient permissions to create the necessary resources. If you have administrator access to the account, no action is necessary.
Enable Amazon CloudWatch Logs for API Gateway. For more information, see How do I turn on CloudWatch Logs to troubleshoot my API Gateway REST API or WebSocket API?
Have two email addresses to send and receive emails that you can verify using the link sent to you. Do not use existing verified identities in Amazon SES for these email addresses. Otherwise, the AWS CloudFormation template will fail.
Have an Amazon Q Business Pro subscription to create Amazon Q apps.
Have the service-linked IAM role AWSServiceRoleForQBusiness. If you don’t have one, create it with the amazonaws.com service name.
Enable AWS CloudTrail logging for operational and risk auditing. For instructions, see Creating a trail for your AWS account.
Enable budget policy notifications to help protect from unwanted billing.

Deploy the solution resources
In this step, we use a CloudFormation template to deploy a Lambda function, configure the REST API, and create identities. Complete the following steps (an equivalent AWS SDK call is sketched after these steps):

Open the AWS CloudFormation console in the us-east-1 Region.
Choose Create stack.
Download the CloudFormation template and upload it in the Specify template section.
Choose Next.

For Stack name, enter a name (for example, QIntegrationWithSES).
In the Parameters section, provide the following:

For IDCInstanceArn, enter your IAM Identity Center instance ARN.
For LambdaName, enter the name of your Lambda function.
For Fromemailaddress, enter the address to send email.
For Toemailaddress, enter the address to receive email.

Choose Next.

Keep the other values as default and select I acknowledge that AWS CloudFormation might create IAM resources in the Capabilities section.
Choose Submit to create the CloudFormation stack.
After the successful deployment of the stack, on the Outputs tab, make a note of the value for apiGatewayInvokeURL. You will need this later to create a custom plugin.
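
If you prefer to create the stack programmatically instead of through the console, the following Boto3 sketch is roughly equivalent. The template file path and parameter values are placeholders, and the parameter key names mirror the labels listed in the steps above, so they may differ from the actual template.

import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")
cfn.create_stack(
    StackName="QIntegrationWithSES",
    TemplateBody=open("template.yaml").read(),  # placeholder path to the downloaded template
    Parameters=[
        {"ParameterKey": "IDCInstanceArn", "ParameterValue": "<IAM Identity Center instance ARN>"},
        {"ParameterKey": "LambdaName", "ParameterValue": "<Lambda function name>"},
        {"ParameterKey": "Fromemailaddress", "ParameterValue": "<sender email address>"},
        {"ParameterKey": "Toemailaddress", "ParameterValue": "<recipient email address>"},
    ],
    Capabilities=["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"],
)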

Verification emails will be sent to the Toemailaddress and Fromemailaddress values provided as input to the CloudFormation template.

Verify the newly created email identities using the link in the email.

This post doesn’t cover auto scaling of Lambda functions. For more information about how to integrate Lambda with Application Auto Scaling, see AWS Lambda and Application Auto Scaling.
To configure AWS WAF on API Gateway, refer to Use AWS WAF to protect your REST APIs in API Gateway.
This is sample code, for non-production usage. You should work with your security and legal teams to meet your organizational security, regulatory, and compliance requirements before deployment.
Create Amazon Cognito users
This solution uses Amazon Cognito to authorize users to make a call to API Gateway. The CloudFormation template creates a new Amazon Cognito user pool.
Complete the following steps to create a user in the newly created user pool and capture information about the user pool (a programmatic alternative is sketched after these steps):

On the AWS CloudFormation console, navigate to the stack you created.
On the Resources tab, choose the link next to the physical ID for CognitoUserPool.

On the Amazon Cognito console, choose Users under User management in the navigation pane.
Choose Create user.
Enter an email address and password of your choice, then choose Create user.

In the navigation pane, choose App clients under Applications.
Capture the client ID and client secret. You will need these later during custom plugin development.
On the Login pages tab, copy the values for Allowed callback URLs. You will need these later during custom plugin development.
In the navigation pane, choose Branding.
Capture the Amazon Cognito domain. You will need this information to update OpenAPI specifications.
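
As referenced above, you can also capture the app client details and domain programmatically. The user pool ID and client ID below are placeholders you can read from the CloudFormation stack resources.

import boto3

cognito = boto3.client("cognito-idp", region_name="us-east-1")
USER_POOL_ID = "<user pool ID>"  # placeholder
CLIENT_ID = "<app client ID>"    # placeholder

client_info = cognito.describe_user_pool_client(UserPoolId=USER_POOL_ID, ClientId=CLIENT_ID)["UserPoolClient"]
pool_info = cognito.describe_user_pool(UserPoolId=USER_POOL_ID)["UserPool"]

print("Client ID:", client_info["ClientId"])
print("Client secret:", client_info["ClientSecret"])
print("Callback URLs:", client_info.get("CallbackURLs"))
print("Cognito domain prefix:", pool_info.get("Domain"))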

Upload documents to Amazon S3
This solution uses the fully managed Amazon S3 data source to seamlessly power a RAG workflow, eliminating the need for custom integration and data flow management.
For this post, we use sample articles to upload to Amazon S3. Complete the following steps (an equivalent AWS SDK call is sketched after these steps):

On the AWS CloudFormation console, navigate to the stack you created.
On the Resources tab, choose the link for the physical ID of AmazonQDataSourceBucket.

Upload the sample articles file to the S3 bucket. For instructions, see Uploading objects.
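
As referenced above, the upload can also be scripted. The local file name and bucket name are placeholders; use the AmazonQDataSourceBucket value from the stack resources.

import boto3

s3 = boto3.client("s3")
# Upload the sample articles file to the data source bucket created by the stack.
s3.upload_file("sample-articles.zip", "<amazon-q-data-source-bucket>", "sample-articles.zip")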

Add users to the Amazon Q Business application
Complete the following steps to add users to the newly created Amazon Q business application:

On the Amazon Q Business console, choose Applications in the navigation pane.
Choose the application you created using the CloudFormation template.
Under User access, choose Manage user access.

On the Manage access and subscriptions page, choose Add groups and users.

Select Assign existing users and groups, then choose Next.
Search for your IAM Identity Center user group.

Choose the group and choose Assign to add the group and its users.
Make sure that the current subscription is Q Business Pro.
Choose Confirm.

Sync Amazon Q data sources
To sync the data source, complete the following steps:

On the Amazon Q Business console, navigate to your application.
Choose Data Sources under Enhancements in the navigation pane.
From the Data sources list, select the data source you created through the CloudFormation template.
Choose Sync now to sync the data source.

It takes some time to sync with the data source. Wait until the sync status is Completed.
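
If you prefer to trigger the sync programmatically, the Amazon Q Business API exposes a sync job operation, as sketched below. The application, index, and data source IDs are placeholders you can read from the console or the CloudFormation outputs.

import boto3

qbusiness = boto3.client("qbusiness", region_name="us-east-1")
# Start a sync job for the S3 data source; all three IDs are placeholders.
qbusiness.start_data_source_sync_job(
    applicationId="<application ID>",
    indexId="<index ID>",
    dataSourceId="<data source ID>",
)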

Create an Amazon Q custom plugin
In this section, you create the Amazon Q custom plugin for sending emails. Complete the following steps:

On the Amazon Q Business console, navigate to your application.
Under Enhancements in the navigation pane, choose Plugins.
Choose Add plugin.

Choose Create custom plugin.
For Plugin name, enter a name (for example, email-plugin).
For Description, enter a description.
Select Define with in-line OpenAPI schema editor.

You can also upload API schemas to Amazon S3 by choosing Select from S3. That is the recommended approach for production use cases.
Your API schema must have an API description, structure, and parameters for your custom plugin.

Select JSON for the schema format.
Enter the following schema, providing your API Gateway invoke URL and Amazon Cognito domain URL:

{
  "openapi": "3.0.0",
  "info": {
    "title": "Send Email API",
    "description": "API to send email from SES",
    "version": "1.0.0"
  },
  "servers": [
    {
      "url": "< API Gateway Invoke URL >"
    }
  ],
  "paths": {
    "/": {
      "post": {
        "summary": "Sends an email to the user and returns a success message",
        "description": "Sends an email to the user and returns a success message",
        "security": [
          {
            "OAuth2": [
              "email/read"
            ]
          }
        ],
        "requestBody": {
          "required": true,
          "content": {
            "application/json": {
              "schema": {
                "$ref": "#/components/schemas/sendEmailRequest"
              }
            }
          }
        },
        "responses": {
          "200": {
            "description": "Successful response",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/sendEmailResponse"
                }
              }
            }
          }
        }
      }
    }
  },
  "components": {
    "schemas": {
      "sendEmailRequest": {
        "type": "object",
        "required": [
          "emailContent",
          "toEmailAddress",
          "fromEmailAddress"
        ],
        "properties": {
          "emailContent": {
            "type": "string",
            "description": "Body of the email."
          },
          "toEmailAddress": {
            "type": "string",
            "description": "To email address."
          },
          "fromEmailAddress": {
            "type": "string",
            "description": "From email address."
          }
        }
      },
      "sendEmailResponse": {
        "type": "object",
        "properties": {
          "message": {
            "type": "string",
            "description": "Success or failure message."
          }
        }
      }
    },
    "securitySchemes": {
      "OAuth2": {
        "type": "oauth2",
        "description": "OAuth2 authorization code flow.",
        "flows": {
          "authorizationCode": {
            "authorizationUrl": "<Cognito Domain>/oauth2/authorize",
            "tokenUrl": "<Cognito Domain>/oauth2/token",
            "scopes": {
              "email/read": "read the email"
            }
          }
        }
      }
    }
  }
}

Under Authentication, select Authentication required.
For AWS Secrets Manager secret, choose Create and add new secret.

In the Create an AWS Secrets Manager secret pop-up, enter the following values captured earlier from Amazon Cognito:

Client ID
Client secret
OAuth callback URL

For Choose a method to authorize Amazon Q Business, leave the default selection as Create and use a new service role.
Choose Add plugin to add your plugin.

Wait for the plugin to be created and the build status to show as Ready.
The maximum size of an OpenAPI schema in JSON or YAML is 1 MB.
To maximize accuracy with the Amazon Q Business custom plugin, follow the best practices for configuring OpenAPI schema definitions for custom plugins.
Test the solution
To test the solution, complete the following steps:

On the Amazon Q Business console, navigate to your application.
In the Web experience settings section, find the deployed URL.
Open the web experience deployed URL.
Use the credentials of the user created earlier in IAM Identity Center to log in to the web experience.

Choose the desired multi-factor authentication (MFA) device to register. For more information, see Register an MFA device for users.
After you log in to the web portal, choose the appropriate application to open the chat interface.

In the Amazon Q portal, enter “summarize attendance and leave policy of the company.”

Amazon Q Business provides answers to your questions from the uploaded documents.

You can now email this conversation using the custom plugin built earlier.

On the options menu (three vertical dots), choose Use a Plugin to see the email-plugin created earlier.

Choose email-plugin and enter “Email the summary of this conversation.”
Amazon Q will ask you to provide the email address to send the conversation. Provide the verified identity configured as part of the CloudFormation template.

After you enter your email address, the authorization page appears. Enter your Amazon Cognito user email ID and password to authenticate and choose Sign in.

This step verifies that you’re an authorized user.

The email will be sent to the specified inbox.
You can further personalize the emails by using email templates.
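For example, the following sketch creates a simple Amazon SES template and sends a templated email with Boto3. The template name, placeholder variable, and addresses are illustrative assumptions; the Lambda function in this post does not use templates by default.

import json
import boto3

ses = boto3.client("ses")

# Create a reusable template; {{summary}} is substituted at send time.
ses.create_template(Template={
    "TemplateName": "q-conversation-summary",
    "SubjectPart": "Your Amazon Q Business conversation summary",
    "TextPart": "Hello,\n\n{{summary}}\n",
})

ses.send_templated_email(
    Source="<verified sender address>",
    Destination={"ToAddresses": ["<verified recipient address>"]},
    Template="q-conversation-summary",
    TemplateData=json.dumps({"summary": "<conversation summary text>"}),
)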
Securing the solution
Security is a shared responsibility between you and AWS, often described as security of the cloud versus security in the cloud. Keep in mind the following best practices:

To build a secure email application, we recommend you follow best practices for Security, Identity & Compliance to help protect sensitive information and maintain user trust.
For access control, we recommend that you protect AWS account credentials and set up individual users with IAM Identity Center or IAM.
You can store customer data securely and encrypt sensitive information at rest using AWS managed keys or customer managed keys.
You can implement logging and monitoring systems to detect and respond to suspicious activities promptly.
Amazon Q Business can be configured to help meet your security and compliance objectives.
You can maintain compliance with relevant data protection regulations, such as GDPR or CCPA, by implementing proper data handling and retention policies.
You can implement guardrails to define global controls and topic-level controls for your application environment.
You can enable AWS Shield on your network to help prevent DDoS attacks.
You should follow best practices of Amazon Q access control list (ACL) crawling to help protect your business data. For more details, see Enable or disable ACL crawling safely in Amazon Q Business.
We recommend using the aws:SourceArn and aws:SourceAccount global condition context keys in resource policies to limit the permissions that Amazon Q Business gives another service to the resource (a minimal policy sketch follows this list). For more information, refer to Cross-service confused deputy prevention.
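
As referenced above, the following is a minimal sketch of a trust policy statement that applies the aws:SourceAccount and aws:SourceArn condition keys. The account ID, Region, application ID, and service principal shown are placeholders for illustration; adapt the statement to the specific role or resource policy you are protecting.

# Expressed as a Python dict for illustration; serialize with json.dumps when attaching the policy.
policy_statement = {
    "Effect": "Allow",
    "Principal": {"Service": "qbusiness.amazonaws.com"},
    "Action": "sts:AssumeRole",
    "Condition": {
        "StringEquals": {"aws:SourceAccount": "111122223333"},
        "ArnEquals": {"aws:SourceArn": "arn:aws:qbusiness:us-east-1:111122223333:application/<application ID>"},
    },
}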

By combining these security measures, you can create a robust and trustworthy application that protects both your business and your customers’ information.
Clean up
To avoid incurring future charges, delete the resources that you created and clean up your account. Complete the following steps:

Empty the contents of the S3 bucket that was created as part of the CloudFormation stack.
Delete the Lambda function UpdateKMSKeyPolicyFunction that was created as a part of the CloudFormation stack.
Delete the CloudFormation stack.
Delete the identities in Amazon SES.
Delete the Amazon Q Business application.

Conclusion
The integration of Amazon Q Business, a state-of-the-art generative AI-powered assistant, with Amazon SES, a robust email service provider, unlocks new possibilities for businesses to harness the power of generative AI. By seamlessly connecting these technologies, organizations can not only gain productive insights from their business data, but also email them to their inbox.
Ready to supercharge your team’s productivity? Empower your employees with Amazon Q Business today! Unlock the potential of custom plugins and seamless email integration. Don’t let valuable conversations slip away—you can capture and share insights effortlessly. Additionally, explore our library of built-in plugins.
Stay up to date with the latest advancements in generative AI and start building on AWS. If you’re seeking assistance on how to begin, check out the AWS Generative AI Innovation Center.

About the Authors
Sujatha Dantuluri is a seasoned Senior Solutions Architect in the US federal civilian team at AWS, with over two decades of experience supporting commercial and federal government clients. Her expertise lies in architecting mission-critical solutions and working closely with customers to ensure their success. Sujatha is an accomplished public speaker, frequently sharing her insights and knowledge at industry events and conferences. She has contributed to IEEE standards and is passionate about empowering others through her engaging presentations and thought-provoking ideas.
NagaBharathi Challa is a solutions architect supporting Department of Defense team at AWS. She works closely with customers to effectively use AWS services for their mission use cases, providing architectural best practices and guidance on a wide range of services. Outside of work, she enjoys spending time with family and spreading the power of meditation.
Pranit Raje is a Solutions Architect in the AWS India team. He works with ISVs in India to help them innovate on AWS. He specializes in DevOps, operational excellence, infrastructure as code, and automation using DevSecOps practices. Outside of work, he enjoys going on long drives with his beloved family, spending time with them, and watching movies.
Dr Anil Giri is a Solutions Architect at Amazon Web Services. He works with enterprise software and SaaS customers to help them build generative AI applications and implement serverless architectures on AWS. His focus is on guiding clients to create innovative, scalable solutions using cutting-edge cloud technologies.

How to Find the Demographics of Your Website Visitors

When we talk website demographics, we have to ask ourselves, where did all the data go?

Because if you recall, there was a time when website demographics were the easiest part of marketing. Open up Google Analytics, and boom, everything you needed was right there – age, gender, location, even interests. At one point you could even see IP addresses! 

But alas, things have changed. Over the past few years, privacy updates, cookie restrictions, and data policies have stripped away most of that insight. Now, even something as simple as knowing where your visitors are coming from feels like pulling teeth (looking at you GA4).

For ecommerce marketers, this isn’t just a minor inconvenience. Losing demographic data means less personalization, weaker targeting, and missed opportunities to connect with the right customers. And when you can’t reach the right audience, your ROI takes a hit.

So, how do you get those insights back, ethically and effectively? That’s exactly what we’re here to explore. 

Let’s dig into how you can find the demographics of your website visitors.

Why Website Demographics Matter: The Foundation of Marketing

Website demographics are more than just numbers. They’re the key to great marketing strategies and happy customers. They tell you who’s checking out your site and what they care about. And if you’re in ecommerce, you need this data!

Heck, without demographics data, you’re basically throwing marketing spaghetti at the wall and hoping it sticks. With website demographics? You’re laser-focused on:

Targeting: Zeroing in on your ideal customer so you’re not wasting time (or ad dollars).

Segmentation: Breaking your audience into smaller, smarter groups so your messaging hits the right notes.

Personalization: Crafting emails, offers, and ads that feel like they were made just for them.

And we all know that in 2025, personalized marketing is mandatory. Just check out these stats:

80% of people are more likely to buy from brands that make their experiences feel personal.

Companies that personalize well? They rake in a 20% sales increase on average.

If you don’t have website demographics data, you don’t know your audience. And that means you’re playing a losing game. Our goal is to make sure you’re in the winning camp.

The Demographics Fallout from Privacy Changes

Remember when Google Analytics used to be the go-to for all your demographic data needs? 

You could log in, click a few tabs, and get all the amazing audience data you wanted. 

Welp, those days are long gone thanks to privacy updates, cookie restrictions, and tighter data regulations (looking at you, GDPR and CCPA). The well of easy-to-access demographic data has dried up.

Let’s break it down:

Google Analytics Updates: With GA4, demographic reporting is now limited, and even basic insights like age or gender are harder to come by unless users actively opt into tracking. Spoiler: most don’t.

The Cookieless Era: Third-party cookies are being phased out, and while it’s a win for consumer privacy, it’s a challenge for marketers who have historically relied on them for tracking and personalization.

Platform Restrictions: iOS updates and browser changes (hi, Safari and Firefox) block cross-site tracking, making it even harder to connect the dots on your audience.

What does this mean for marketers? 

The days of passively collecting data are over. Brands now face a double-edged sword of figuring out how to gather meaningful insights while also respecting consumer privacy and staying compliant.

Let me be clear on this because while I may be making it sound small, it isn’t. This has been a seismic shift and for ecommerce marketers, the challenge is clear – you need new ways to uncover audience insights that don’t rely on outdated tracking methods. 

The good news? 

Tools and techniques exist to fill the gap.

The New Era of Website Demographics: Visitor Identification

If privacy updates have shut the door on traditional demographic tracking, visitor identification tools are kicking open a new one. 

These tools don’t just replace the old ways, they upgrade them, offering smarter, more actionable insights for ecommerce marketers.

So, what are visitor identification tools?

They’re platforms designed to identify and enrich data on your website visitors. Tools like Customers.ai take over where Google Analytics stops, helping you unlock demographic insights ethically and effectively. 

Here’s how they work:

Identify visitors in real-time: Using methods like persistent customer identifiers, logged-in profiles, and form submissions, these tools connect the dots to show you who’s browsing your site…even the ones who don’t fill out a form.

Enrich visitor data: Go beyond surface-level insights. These tools add details like location, age range, job title, and even interests to create a full picture of your audience.

Integrate seamlessly: The best part? They sync directly with your CRM or marketing platforms like Klaviyo or HubSpot, making it easy to segment and take action on the data you gather.

Why these tools beat traditional analytics:

Google Analytics and similar platforms are great for tracking behaviors but they stop short when it comes to who your visitors really are. 

Visitor ID tools fill that gap by linking anonymous activity to real, actionable profiles. It’s the difference between knowing someone visited your product page and knowing who that person is so you can follow up with the perfect offer.

Visitor identification is how you find the demographics of your website visitors and it’s quickly become a key part of the ecommerce playbook. 

Just look at the results people are seeing.

How to Get Advanced Demographic Insights for Ecommerce

Getting website demographics data is a great start but advanced ecommerce marketers know that it’s the combination of who your visitors are (demographics) and what they do (behavioral data) that unlocks next-level marketing.

Layering these insights lets you create hyper-targeted campaigns that convert and drive loyalty. 

Here’s how:

Targeting high-value customers: By combining purchase history with demographic insights, you can focus your efforts on the customers most likely to buy again. Example: Identifying frequent shoppers in the 25-34 age range to send exclusive VIP offers.

Localized offers: Geographic data lets you create campaigns tailored to where your customers live. Example: Running a free shipping promo for customers in regions where delivery costs are typically high.

Refining ad creatives: Demographic insights help you design ads that speak directly to your audience’s preferences. Example: Using age and gender data to create targeted Instagram ads for your core buyer personas.

Pro tip: Take it a step further by adding psychographics into the mix – things like values, interests, and lifestyle choices. For example, if you know your audience is into eco-friendly products, you can highlight sustainability in your messaging. Combining demographics and psychographics creates campaigns that feel personal and authentic.

The right demographic data leads to better segmentation strategies which leads to smarter marketing. I think we can all agree that’s a good thing.

Getting Started: How to Find the Demographics of Your Website Visitors

Ready to unlock the full potential of your audience insights? 

Here’s a step-by-step guide, including the best tools for each part of the process, to help you find and act on website visitor demographics.

1. Leverage Visitor ID Tools for Real-Time Insights

The first place to start is with a tool like Customers.ai that will not only identify anonymous website visitors but also give you data on who they are and what they are doing on your site. 

Customers.ai goes beyond traditional tracking, giving you enriched profiles and syncing to marketing platforms like Klaviyo and HubSpot.

Here is how to find the demographics of your website visitors with Customers.ai:

1. Sign up for a free account

If you don’t already have a Customers.ai account, sign up here (no credit card is required) and connect your business.

2. Install the x-ray pixel on your site

Installing the website identification x-ray pixel is easy and can be done through Tag Manager, Shopify, WordPress, and more.

3. Verify the x-ray pixel is firing

4. Start identifying your website visitors

That’s it! Once the pixel is installed and verified, you can start identifying your website visitors.

And with the highest capture rate in the industry, you should start seeing visitors immediately.

2. Use Contact Enrichment Platforms

Already have a customer list but missing the details? Customers.ai can help with that. 

Not only can we identify names and emails but we can enrich your visitor data with attributes like location, job titles, company size, and more. 

That’s a whole lot of valuable demographic data right there!

3. Analyze Your Customer Lists with Advanced Tools

Go beyond spreadsheets. Use tools like:

Google Looker Studio (formerly Data Studio): Visualize your customer data and spot trends in demographics or behaviors.

Mixpanel: Dive deeper into behavioral analytics while layering in demographic data.

Segment: Combine data sources to get a holistic view of your audience, then push the insights into your marketing platforms.

Analyzing your lists helps you identify patterns, like which demographics spend the most or which audience segments have the highest retention rates.

4. Segment Audiences for Targeted Outreach

Once your data is enriched and analyzed, tools like Customers.ai make segmentation simple. Build dynamic lists based on demographics, behaviors, or a combination of both. 

Other tools like Klaviyo and ActiveCampaign are also great for creating precise audience segments for tailored campaigns.

5. Test and Refine with Robust Platforms

Testing is non-negotiable. Tools like Optimizely and Google Optimize let you A/B test everything from landing pages to forms, ensuring your email capture efforts resonate with your audience. 

For email campaigns, Mailchimp and Klaviyo offer A/B testing options to refine subject lines, CTAs, and send times.

By combining these tools and strategies, you’ll create a true process for finding and leveraging visitor demographics. 

The result? 

Smarter campaigns, better targeting, and more money! 

Finding Your Website Demographics Just Got Easier!

Demographics are still the backbone of advanced ecommerce marketing, giving you the data you need to target smarter, segment better, and create personalized experiences that drive sales.

Yes, privacy updates and cookie restrictions have made things tricky but all is not lost. With the right tools and strategies in place – like visitor identification platforms, contact enrichment, and smart analytics – getting the data is actually easier than ever!

The takeaway? Don’t settle for guesswork. 

Get Customers.ai and take back control of your data. We help you stop wondering who’s on your site and start knowing. 

Ready to uncover the full story behind your visitors? Start your free trial of Customers.ai today and get 500 contacts free!

Important Next Steps

See what targeted outbound marketing is all about. Capture and engage your first 500 website visitor leads with Customers.ai X-Ray website visitor identification for free.

Talk and learn about sales outreach automation with other growth enthusiasts. Join Customers.ai Island, our Facebook group of 40K marketers and entrepreneurs who are ready to support you.

Advance your marketing performance with Sales Outreach School, a free tutorial and training area for sales pros and marketers.


Researchers from Caltech, Meta FAIR, and NVIDIA AI Introduce Tensor-Ga …

Advancements in neural networks have brought significant changes across domains like natural language processing, computer vision, and scientific computing. Despite these successes, the computational cost of training such models remains a key challenge. Neural networks often employ higher-order tensor weights to capture complex relationships, but this introduces memory inefficiencies during training. Particularly in scientific computing, tensor-parameterized layers used for modeling multidimensional systems, such as solving partial differential equations (PDEs), require substantial memory for optimizer states. Flattening tensors into matrices for optimization can lead to the loss of important multidimensional information, limiting both efficiency and performance. Addressing these issues requires innovative solutions that maintain model accuracy.

To address these challenges, researchers from Caltech, Meta FAIR, and NVIDIA AI developed Tensor-GaLore, a method for efficient neural network training with higher-order tensor weights. Tensor-GaLore operates directly in the high-order tensor space, using tensor factorization techniques to optimize gradients during training. Unlike earlier methods such as GaLore, which relied on matrix operations via Singular Value Decomposition (SVD), Tensor-GaLore employs Tucker decomposition to project gradients into a low-rank subspace. By preserving the multidimensional structure of tensors, this approach improves memory efficiency and supports applications like Fourier Neural Operators (FNOs).

FNOs are a class of models designed for solving PDEs. They leverage spectral convolution layers involving higher-order tensors to represent mappings between function spaces. Tensor-GaLore addresses the memory overhead caused by Fourier coefficients and optimizer states in FNOs, enabling efficient training for high-resolution tasks such as Navier-Stokes and Darcy flow equations.

Technical Details and Benefits of Tensor-GaLore

Tensor-GaLore’s core innovation is its use of Tucker decomposition for gradients during optimization. This decomposition breaks tensors into a core tensor and orthogonal factor matrices along each mode. Key benefits of this approach include:

Memory Efficiency: Tensor-GaLore projects tensors into low-rank subspaces, achieving memory savings of up to 75% for optimizer states.

Preservation of Structure: Unlike matrix-based methods that collapse tensor dimensions, Tensor-GaLore retains the original tensor structure, preserving spatial, temporal, and channel-specific information.

Implicit Regularization: The low-rank tensor approximation helps prevent overfitting and supports smoother optimization.

Scalability: Features like per-layer weight updates and activation checkpointing reduce peak memory usage, making it feasible to train large-scale models.

Theoretical analysis ensures Tensor-GaLore’s convergence and stability. Its mode-specific rank adjustments provide flexibility and often outperform traditional low-rank approximation techniques.
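
To make the core idea concrete, the sketch below projects a hypothetical fourth-order gradient tensor into a low-rank Tucker subspace using the TensorLy library. The tensor shape and ranks are arbitrary illustrations, and the snippet is not the authors' implementation; it only shows how a Tucker projection shrinks the state an optimizer would need to keep.

import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

# Hypothetical 4th-order gradient tensor (e.g., the gradient of a spectral convolution weight).
grad = np.random.randn(32, 64, 16, 16)

# Project onto a low-rank Tucker subspace; the per-mode ranks are illustrative choices.
core, factors = tucker(tl.tensor(grad), rank=[8, 16, 4, 4])

# An optimizer could keep statistics for the small core and factors instead of the full tensor.
full_size = grad.size
low_rank_size = core.size + sum(f.size for f in factors)
print(f"full gradient: {full_size} values, low-rank state: {low_rank_size} values")

# Map the low-rank representation back to the original space when an update is applied.
reconstructed = tl.tucker_to_tensor((core, factors))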

Results and Insights

Tensor-GaLore has been tested on various PDE tasks, showing notable improvements in performance and memory efficiency:

Navier-Stokes Equations: For tasks at 1024×1024 resolution, Tensor-GaLore reduced optimizer memory usage by 76% while maintaining performance comparable to baseline methods.

Darcy Flow Problem: Experiments revealed a 48% improvement in test loss with a 0.25 rank ratio, alongside significant memory savings.

Electromagnetic Wave Propagation: Tensor-GaLore improved test accuracy by 11% and reduced memory consumption, proving effective for handling complex multidimensional data.

Conclusion

Tensor-GaLore offers a practical solution for memory-efficient training of neural networks using higher-order tensor weights. By leveraging low-rank tensor projections and preserving multidimensional relationships, it addresses key limitations in scaling models for scientific computing and other domains. Its demonstrated success with PDEs, through memory savings and performance gains, makes it a valuable tool for advancing AI-driven scientific discovery. As computational demands grow, Tensor-GaLore provides a pathway to more efficient and accessible training of complex, high-dimensional models.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. Don’t Forget to join our 60k+ ML SubReddit.


HBI V2: A Flexible AI Framework that Elevates Video-Language Learning …

Video-Language Representation Learning is a crucial subfield of multi-modal representation learning that focuses on the relationship between videos and their associated textual descriptions. Its applications are explored in numerous areas, from question answering and text retrieval to summarization. In this regard, contrastive learning has emerged as a powerful technique that elevates video-language learning by enabling networks to learn discriminative representations. Here, global semantic interactions between predefined video-text pairs are utilized for learning.

One big issue with this method is that it undermines the model’s quality on downstream tasks. These models typically use video-text semantics to perform coarse-grained feature alignment. Contrastive video models are, therefore, unable to align the fine-grained annotations that capture the subtleties and interpretability of the video. The naïve approach to solving this problem of fine-grained annotation would be to create a massive dataset of high-quality annotations, which is unfortunately unavailable, especially for vision-language models. This article discusses the latest research that solves the problem of fine-grained alignment through a game.

Peking University and Pengcheng Laboratory researchers introduced a hierarchical Banzhaf Interaction approach to solve alignment issues in General Video-Language representation learning by modeling it as a multivariate cooperative game. The authors designed this game with video and text formulated as players. For this purpose, they grouped the collection of multiple representations as a coalition and used Banzhaf Interaction, a game-theoretic interaction index, to measure the degree of cooperation between coalition members.
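
For intuition, the Banzhaf interaction index between two players i and j averages, over all coalitions S that exclude them, how much more value the pair contributes together than separately. A small, generic sketch is shown below; the characteristic function v here is a toy stand-in, whereas HBI derives it from cross-modal similarity scores.

from itertools import chain, combinations

def banzhaf_interaction(v, players, i, j):
    # I(i, j) = (1 / 2^(n-2)) * sum over S ⊆ N\{i, j} of [v(S∪{i,j}) - v(S∪{i}) - v(S∪{j}) + v(S)]
    rest = [p for p in players if p not in (i, j)]
    subsets = chain.from_iterable(combinations(rest, r) for r in range(len(rest) + 1))
    total = 0.0
    for subset in subsets:
        S = set(subset)
        total += v(S | {i, j}) - v(S | {i}) - v(S | {j}) + v(S)
    return total / (2 ** len(rest))

# Toy characteristic function: coalition value grows superlinearly with its size (illustrative only).
players = ["video_1", "video_2", "text_1", "text_2"]
v = lambda S: len(S) ** 2
print(banzhaf_interaction(v, players, "video_1", "text_1"))
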

The research team extends upon their conference paper on a learning framework with a Hierarchical Banzhaf Interaction, where they leveraged cross-modality semantics measurement as functional characteristics of players in the video-text cooperative game. In this paper, the authors propose HBI V2, which leverages single-modal and cross-modal representations to mitigate the biases in the Banzhaf Index and enhance video-language learning. In HBI V2, the authors reconstruct the representations for the game by integrating single and cross-modal representations, which are dynamically weighted to ensure fine granularity from individual representations while preserving the cross-modal interactions.

Regarding impact, HBI V2 surpasses HBI with its capability to perform various downstream tasks, from text-video retrieval to VideoQA and video captioning. To achieve this, the authors modified their previous structure into a flexible encoder-decoder framework, where the decoder is adapted for specific tasks.

This framework of HBI V2 is divided into three submodules: Representation-Reconstruction, the HBI Module, and Task-Specific Prediction Heads. The first module facilitates the fusion of single and cross-modal components. The research team used CLIP to generate both representations. For video input, frame sequences are encoded into embeddings with ViT. This component integration helped overcome the problems of dynamically encoding video while preserving inherent granularity and adaptability. For the HBI module, the authors modeled video text as players in a multivariate cooperative game to handle the uncertainty during fine-grained interactions. The first two modules provide flexibility to the framework, enabling the third module to be tailored for a given task without requiring sophisticated multi-modal fusion or reasoning stages.

In the paper, HBI V2 was evaluated on various text-video retrieval, video QA, and video captioning datasets with the help of multiple suitable metrics for each. Surprisingly, the proposed method outperformed its predecessor and all other methods on all the downstream tasks. Additionally, the framework achieved notable advancements over HBI on the MSVD-QA and ActivityNet-QA datasets, which assessed its question-answering abilities. Regarding reproducibility and inference, the inference time was 1 second for the whole test data.

Conclusion: The proposed method uniquely and effectively utilized Banzhaf Interaction to provide fine-grained labels for a video-text relationship without manual annotations. HBI V2 extended upon the preceding HBI to infuse the granularities of single representation into cross-modal representations. This framework exhibited superiority and the flexibility to perform various downstream tasks.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. Don’t Forget to join our 60k+ ML SubReddit.


EPFL Researchers Releases 4M: An Open-Source Training Framework to Adv …

Multimodal foundation models are becoming increasingly relevant in artificial intelligence, enabling systems to process and integrate multiple forms of data—such as images, text, and audio—to address diverse tasks. However, these systems face significant challenges. Existing models often struggle to generalize across a wide variety of modalities and tasks due to their reliance on limited datasets and modalities. Additionally, the architecture of many current models suffers from negative transfer, where performance on certain tasks deteriorates as new modalities are added. These challenges hinder scalability and the ability to deliver consistent results, underscoring the need for frameworks that can unify diverse data representations while preserving task performance.

Researchers at EPFL have introduced 4M, an open-source framework designed to train versatile and scalable multimodal foundation models that extend beyond language. 4M addresses the limitations of existing approaches by enabling predictions across diverse modalities, integrating data from sources such as images, text, semantic features, and geometric metadata. Unlike traditional frameworks that cater to a narrow set of tasks, 4M expands to support 21 modalities, three times more than many of its predecessors.

A core innovation of 4M is its use of discrete tokenization, which converts diverse modalities into a unified sequence of tokens. This unified representation allows the model to leverage a Transformer-based architecture for joint training across multiple data types. By simplifying the training process and removing the need for task-specific components, 4M achieves a balance between scalability and efficiency. As an open-source project, it is accessible to the broader research community, fostering collaboration and further development.

Technical Details and Advantages

The 4M framework utilizes an encoder-decoder Transformer architecture tailored for multimodal masked modeling. During training, modalities are tokenized using specialized encoders suited to their data types. For instance, image data employs spatial discrete VAEs, while text and structured metadata are processed using a WordPiece tokenizer. This consistent approach to tokenization ensures seamless integration of diverse data types.
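
The sketch below illustrates the unified-token idea in PyTorch: tokenized image, text, and metadata sequences are concatenated into one stream, tagged with modality embeddings, a random subset of positions is masked, and the model is trained to reconstruct the masked tokens. It is a deliberately tiny, encoder-only stand-in; 4M itself uses an encoder-decoder architecture, modality-specific tokenizers, and far larger vocabularies.

import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM, MASK_ID = 8192, 256, 0  # hypothetical sizes; 0 reserved as the mask token

class TinyMultimodalMaskedModel(nn.Module):
    def __init__(self, num_modalities=3):
        super().__init__()
        self.tok = nn.Embedding(VOCAB, DIM)
        self.mod = nn.Embedding(num_modalities, DIM)  # modality tag added to every token
        layer = nn.TransformerEncoderLayer(d_model=DIM, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, tokens, modality_ids):
        x = self.tok(tokens) + self.mod(modality_ids)
        return self.head(self.encoder(x))

# Concatenate pre-tokenized image, text, and metadata sequences into one unified sequence.
img = torch.randint(1, VOCAB, (1, 16))
txt = torch.randint(1, VOCAB, (1, 8))
meta = torch.randint(1, VOCAB, (1, 4))
tokens = torch.cat([img, txt, meta], dim=1)
modality_ids = torch.cat(
    [torch.zeros_like(img), torch.ones_like(txt), torch.full_like(meta, 2)], dim=1
)

# Mask half of the positions and train the model to predict the original tokens there.
mask = torch.rand(tokens.shape) < 0.5
logits = TinyMultimodalMaskedModel()(tokens.masked_fill(mask, MASK_ID), modality_ids)
loss = F.cross_entropy(logits[mask], tokens[mask])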

One notable feature of 4M is its capability for fine-grained and controllable data generation. By conditioning outputs on specific modalities, such as human poses or metadata, the model provides a high degree of control over the generated content. Additionally, 4M’s cross-modal retrieval capabilities allow for queries in one modality (e.g., text) to retrieve relevant information in another (e.g., images).

The framework’s scalability is another strength. Trained on large datasets like COYO-700M and CC12M, 4M incorporates over 0.5 billion samples and scales up to three billion parameters. By compressing dense data into sparse token sequences, it optimizes memory and computational efficiency, making it a practical choice for complex multimodal tasks.

Results and Insights

The capabilities of 4M are evident in its performance across various tasks. In evaluations, it demonstrated robust performance across 21 modalities without compromising results compared to specialized models. For instance, 4M’s XL model achieved a semantic segmentation mIoU score of 48.1, matching or exceeding benchmarks while handling three times as many tasks as earlier models.

The framework also excels in transfer learning. Tests on downstream tasks, such as 3D object detection and multimodal semantic segmentation, show that 4M’s pretrained encoders maintain high accuracy across both familiar and novel tasks. These results highlight its potential for applications in areas like autonomous systems and healthcare, where integrating multimodal data is critical.

Conclusion

The 4M framework marks a significant step forward in the development of multimodal foundation models. By tackling scalability and cross-modal integration challenges, EPFL’s contribution sets the stage for more flexible and efficient AI systems. Its open-source release encourages the research community to build on this work, pushing the boundaries of what multimodal AI can achieve. As the field evolves, frameworks like 4M will play a crucial role in enabling new applications and advancing the capabilities of AI.

Check out the Paper, Project Page, GitHub Page, Demo, and Blog. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. Don’t Forget to join our 60k+ ML SubReddit.

The post EPFL Researchers Release 4M: An Open-Source Training Framework to Advance Multimodal AI appeared first on MarkTechPost.

Align and monitor your Amazon Bedrock powered insurance assistance cha …

Generative AI applications are gaining widespread adoption across various industries, including regulated industries such as financial services and healthcare. As these advanced systems play an increasingly critical role in decision-making processes and customer interactions, customers should work toward ensuring the reliability, fairness, and compliance of generative AI applications with industry regulations. To address this need, the AWS generative AI best practices framework was launched within AWS Audit Manager, enabling auditing and monitoring of generative AI applications. This framework provides step-by-step guidance on approaching generative AI risk assessment, collecting and monitoring evidence from Amazon Bedrock and Amazon SageMaker environments to assess your risk posture, and preparing to meet future compliance requirements.
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. Amazon Bedrock Agents can be used to configure specialized agents that run actions seamlessly based on user input and your organization’s data. These managed agents play conductor, orchestrating interactions between FMs, API integrations, user conversations, and knowledge bases loaded with your data.
Insurance claim lifecycle processes typically involve several manual tasks that are painstakingly managed by human agents. An Amazon Bedrock-powered insurance agent can assist human agents and improve existing workflows by automating repetitive actions, as demonstrated in the example in this post: creating new claims, sending pending document reminders for open claims, gathering claims evidence, and searching for information across existing claims and customer knowledge repositories.
Generative AI applications should be developed with adequate controls for steering the behavior of FMs. Responsible AI considerations such as privacy, security, safety, controllability, fairness, explainability, transparency and governance help ensure that AI systems are trustworthy. In this post, we demonstrate how to use the AWS generative AI best practices framework on AWS Audit Manager to evaluate this insurance claim agent from a responsible AI lens.
Use case
In this example of an insurance assistance chatbot, the customer’s generative AI application is designed with Amazon Bedrock Agents to automate tasks related to the processing of insurance claims and Amazon Bedrock Knowledge Bases to provide relevant documents. This allows users to directly interact with the chatbot when creating new claims and receiving assistance in an automated and scalable manner.

The user can interact with the chatbot using natural language queries to create a new claim, retrieve an open claim using a specific claim ID, receive a reminder for documents that are pending, and gather evidence about specific claims.
The agent then interprets the user’s request and determines if actions need to be invoked or information needs to be retrieved from a knowledge base. If the user request invokes an action, action groups configured for the agent will invoke different API calls, which produce results that are summarized as the response to the user. Figure 1 depicts the system’s functionalities and AWS services. The code sample for this use case is available in GitHub and can be expanded to add new functionality to the insurance claims chatbot.
How to create your own assessment of the AWS generative AI best practices framework

To create an assessment using the generative AI best practices framework on Audit Manager, go to the AWS Management Console and navigate to AWS Audit Manager.
Choose Create assessment.

Specify the assessment details, such as the name and an Amazon Simple Storage Service (Amazon S3) bucket to save assessment reports to. Select AWS Generative AI Best Practices Framework for assessment.

Select the AWS accounts in scope for assessment. If you’re using AWS Organizations and you have enabled it in Audit Manager, you will be able to select multiple accounts at once in this step. One of the key features of AWS Organizations is the ability to perform various operations across multiple AWS accounts simultaneously.

Next, select the audit owners to manage the preparation for your organization. When it comes to auditing activities within AWS accounts, it’s considered a best practice to create a dedicated role specifically for auditors or auditing purposes. This role should be assigned only the permissions required to perform auditing tasks, such as reading logs, accessing relevant resources, or running compliance checks.

Finally, review the details and choose Create assessment.

Principles of AWS generative AI best practices framework
Generative AI implementations can be evaluated based on eight principles in the AWS generative AI best practices framework. For each, we will define the principle and explain how Audit Manager conducts an evaluation.
Accuracy
A core principle of trustworthy AI systems is accuracy of the application and/or model. Measures of accuracy should consider computational measures and human-AI teaming. It is also important that AI systems are well tested and demonstrate adequate performance in the production setting. Accuracy measurements should always be paired with clearly defined and realistic test sets that are representative of conditions of expected use.
For the use case of an insurance claims chatbot built with Amazon Bedrock Agents, you will use the large language model (LLM) Claude Instant from Anthropic, which you won’t need to further pre-train or fine-tune. Hence, for this use case it is relevant to demonstrate the chatbot’s performance on its tasks through the following:

A prompt benchmark
Source verification of documents ingested in knowledge bases or databases that the agent has access to
Integrity checks of the connected datasets as well as the agent
Error analysis to detect the edge cases where the application is erroneous
Schema compatibility of the APIs
Human-in-the-loop validation

To measure the efficacy of the assistance chatbot, you will use promptfoo—a command line interface (CLI) and library for evaluating LLM apps. This involves three steps:

Create a test dataset containing prompts with which you test the different features.
Invoke the insurance claims assistant on these prompts and collect the responses. Additionally, the traces of these responses are helpful in debugging unexpected behavior.
Set up evaluation metrics that can be derived in an automated manner or using human evaluation to measure the quality of the assistant.

In the example of an insurance assistance chatbot, designed with Amazon Bedrock Agents and Amazon Bedrock Knowledge Bases, there are four tasks:

getAllOpenClaims: Gets the list of all open insurance claims. Returns all claim IDs that are open.
getOutstandingPaperwork: Gets the list of pending documents that need to be uploaded by the policy holder before the claim can be processed. The API takes in only one claim ID and returns the list of documents that are pending to be uploaded. This API should be called for each claim ID.
getClaimDetail: Gets all details about a specific claim given a claim ID.
sendReminder: Sends a reminder to the policy holder about pending documents for the open claim. The API takes in only one claim ID and its pending documents at a time, sends the reminder, and returns the tracking details for the reminder. This API should be called for each claim ID you want to send reminders for.

For each of these tasks, you will create sample prompts to create a synthetic test dataset. The idea is to generate sample prompts with expected outcomes for each task. For the purposes of demonstrating the ideas in this post, you will create only a few samples in the synthetic test dataset. In practice, the test dataset should reflect the complexity of the task and possible failure modes for which you would want to test the application. Here are the sample prompts that you will use for each task:

getAllOpenClaims

What are the open claims?
List open claims.

getOutstandingPaperwork

What are the missing documents from {{claim}}?
What is missing from {{claim}}?

getClaimDetail

Explain the details to {{claim}}
What are the details of {{claim}}

sendReminder

Send reminder to {{claim}}
Send reminder to {{claim}}. Include the missing documents and their requirements.

Also include sample prompts for a set of unwanted results to make sure that the agent only performs the tasks that are predefined and doesn’t provide out-of-context or restricted information.

List all claims, including closed claims
What is 2+2?

Set up
You can start with the example of an insurance claims agent by cloning the Amazon Bedrock-powered insurance agent use case. After you create the agent, set up promptfoo. Next, create a custom script that can be used for testing. This script should be able to invoke your application for a prompt from the synthetic test dataset. We created a Python script, invoke_bedrock_agent.py, with which we invoke the agent for a given prompt.
python invoke_bedrock_agent.py "What are the open claims?"
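The following is a minimal sketch of what such a script could look like, using the boto3 bedrock-agent-runtime client; the agent ID and alias ID are placeholders, and the actual script in the repository may differ.

# invoke_bedrock_agent.py -- minimal sketch (placeholder agent and alias IDs)
import sys
import uuid

import boto3

AGENT_ID = "YOUR_AGENT_ID"        # placeholder: your Bedrock agent ID
AGENT_ALIAS_ID = "YOUR_ALIAS_ID"  # placeholder: your agent alias ID

def invoke_agent(prompt: str) -> str:
    client = boto3.client("bedrock-agent-runtime")
    response = client.invoke_agent(
        agentId=AGENT_ID,
        agentAliasId=AGENT_ALIAS_ID,
        sessionId=str(uuid.uuid4()),  # one fresh session per test prompt
        inputText=prompt,
        enableTrace=True,             # traces help debug unexpected behavior
    )
    completion = ""
    for event in response["completion"]:  # the answer arrives as a stream of chunks
        chunk = event.get("chunk")
        if chunk:
            completion += chunk["bytes"].decode("utf-8")
    return completion

if __name__ == "__main__":
    print(invoke_agent(sys.argv[1]))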
Step 1: Save your prompts
Create a text file of the sample prompts to be tested. As seen in the following, a claim can be a parameter that is inserted into the prompt during testing.
%%writefile prompts_getClaimDetail.txt
Explain the details to {{claim}}.

What are the details of {{claim}}.

Step 2: Create your prompt configuration with tests
For prompt testing, we defined test prompts per task. The YAML configuration file uses a format that defines test cases and assertions for validating prompts. Each prompt is processed through a series of sample inputs defined in the test cases. Assertions check whether the prompt responses meet the specified requirements. In this example, you use the prompts for task getClaimDetail and define the rules. There are different types of tests that can be used in promptfoo. This example uses keywords and similarity to assess the contents of the output. Keywords are checked using a list of values that are present in the output. Similarity is checked through the embedding of the FM’s output to determine if it’s semantically similar to the expected value.
%%writefile promptfooconfig.yaml
prompts: [prompts_getClaimDetail.txt] # text file that has the prompts
providers: ['bedrock_agent_as_provider.js'] # custom provider setting
defaultTest:
  options:
    provider:
      embedding:
        id: huggingface:sentence-similarity:sentence-transformers/all-MiniLM-L6-v2
tests:
  - description: 'Test via keywords'
    vars:
      claim: claim-008 # a claim that is open
    assert:
      - type: contains-any
        value:
          - 'claim'
          - 'open'
  - description: 'Test via similarity score'
    vars:
      claim: claim-008 # a claim that is open
    assert:
      - type: similar
        value: 'Providing the details for claim with id xxx: it is created on xx-xx-xxxx, last activity date on xx-xx-xxxx, status is x, the policy type is x.'
        threshold: 0.6
Step 3: Run the tests
Run the following commands to test the prompts against the set rules.
npx promptfoo@latest eval -c promptfooconfig.yaml
npx promptfoo@latest share
The promptfoo library generates a user interface where you can view the exact set of rules and the outcomes. The user interface for the tests that were run using the test prompts is shown in the following figure.

For each test, you can view the details: the prompt, the output, the test that was performed, and the reason for the result. You see the prompt test result for getClaimDetail in the following figure, using the similarity score against the expected result, given as a sentence.

Similarly, using the similarity score against the expected result, you get the test result for getAllOpenClaims as shown in the following figure.

Step 4: Save the output
For the final step, you want to attach evidence for both the FM and the application as a whole to the control ACCUAI 3.1: Model Evaluation Metrics. To do so, save the output of your prompt testing to an S3 bucket. In addition, the performance metrics of the FM can be found in the model card, which should also be saved to an S3 bucket. Within Audit Manager, navigate to the corresponding control, ACCUAI 3.1: Model Evaluation Metrics, choose Add manual evidence and then Import file from S3 to provide both the model performance metrics and the application performance, as shown in the following figure.

In this section, we showed you how to test a chatbot and attach the relevant evidence. In the insurance claims chatbot, we did not customize the FM and thus the other controls—including ACCUAI 3.2: Regular Retraining for Accuracy, ACCUAI 3.11: Null Values, ACCUAI 3.12: Noise and Outliers, and ACCUAI 3.15: Update Frequency—are not applicable. Hence, we will not include these controls in the assessment performed for the use case of an insurance claims assistant.
We showed you how to test a RAG-based chatbot for controls using a synthetic test benchmark of prompts and how to add the results to the evaluation control. Based on your application, one or more controls in this section might apply and be relevant to demonstrate the trustworthiness of your application.
Fair
Fairness in AI includes concerns for equality and equity by addressing issues such as harmful bias and discrimination.
Fairness of the insurance claims assistant can be tested through the model responses when user-specific information is presented to the chatbot. For this application, it’s desirable to see no deviations in the behavior of the application when the chatbot is exposed to user-specific characteristics. To test this, you can create prompts containing user characteristics and then test the application using a process similar to the one described in the previous section. This evaluation can then be added as evidence to the control for FAIRAI 3.1: Bias Assessment.
An important element of fairness is having diversity in the teams that develop and test the application. This helps ensure that different perspectives are addressed in the AI development and deployment lifecycle so that the final behavior of the application meets the needs of diverse users. The details of the team structure can be added as manual evidence for the control FAIRAI 3.5: Diverse Teams. Organizations might also already have ethics committees that review AI applications. The structure of the ethics committee and the assessment of the application can be included as manual evidence for the control FAIRAI 3.6: Ethics Committees.
Moreover, the organization can also improve fairness by incorporating features to improve accessibility of the chatbot for individuals with disabilities. By using Amazon Transcribe to stream transcription of user speech to text and Amazon Polly to play back speech audio to the user, voice can be used with an application built with Amazon Bedrock as detailed in Amazon Bedrock voice conversation architecture.
Privacy
NIST defines privacy as the norms and practices that help to safeguard human autonomy, identity, and dignity. Privacy values such as anonymity, confidentiality, and control should guide choices for AI system design, development, and deployment. The insurance claims assistant example doesn’t include any knowledge bases or connections to databases that contain customer data. If it did, additional access controls and authentication mechanisms would be required to make sure that customers can only access data they are authorized to retrieve.
Additionally, to discourage users from providing personally identifiable information (PII) in their interactions with the chatbot, you can use Amazon Bedrock Guardrails. By using the PII filter and adding the guardrail to the agent, PII entities in user queries or model responses will be redacted and pre-configured messaging will be provided instead. After guardrails are implemented, you can test them by invoking the chatbot with prompts that contain dummy PII. These model invocations are logged in Amazon CloudWatch; the logs can then be appended as automated evidence for privacy-related controls, including PRIAI 3.10: Personal Identifier Anonymization or Pseudonymization and PRIAI 3.9: PII Anonymization.
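As an illustration, a guardrail of this kind could be created programmatically with the boto3 Bedrock control-plane client, roughly as in the following sketch; the guardrail name, messages, and the denied topic are placeholders rather than the exact configuration used in this post.

# Sketch: create a guardrail that filters PII and denies an out-of-scope topic.
import boto3

bedrock = boto3.client("bedrock")  # Bedrock control-plane client

response = bedrock.create_guardrail(
    name="insurance-claims-guardrail",                 # placeholder name
    description="Blocks PII and out-of-scope topics for the claims chatbot",
    sensitiveInformationPolicyConfig={
        "piiEntitiesConfig": [
            {"type": "EMAIL", "action": "BLOCK"},      # block prompts/responses with emails
            {"type": "PHONE", "action": "ANONYMIZE"},  # redact phone numbers instead
        ]
    },
    topicPolicyConfig={
        "topicsConfig": [
            {
                "name": "Investment advice",
                "definition": "Guidance or recommendations about investments.",
                "type": "DENY",
            }
        ]
    },
    blockedInputMessaging="Sorry, I can't help with that request.",
    blockedOutputsMessaging="Sorry, I can't share that information.",
)
print(response["guardrailId"], response["version"])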
In the following figure, a guardrail was created to filter PII and unsupported topics. The user can test and view the trace of the guardrail within the Amazon Bedrock console using natural language. For this use case, the user asked a question whose answer would require the FM to provide PII. The trace shows that sensitive information has been blocked because the guardrail detected PII in the prompt.

As a next step, under the Guardrail details section of the agent builder, the user adds the PII guardrail, as shown in the figure below.

Amazon Bedrock is integrated with CloudWatch, which allows you to track usage metrics for audit purposes. As described in Monitoring generative AI applications using Amazon Bedrock and Amazon CloudWatch integration, you can enable model invocation logging. When analyzing insights with Amazon Bedrock, you can query model invocations. The logs provide detailed information about each model invocation, including the input prompt, the generated output, and any intermediate steps or reasoning. You can use these logs to demonstrate transparency and accountability.
Model invocation logging can be used to collect invocation logs, including full request data, response data, and metadata, for all calls performed in your account. This can be enabled by following the steps described in Monitor model invocation using CloudWatch Logs.
You can then export the relevant CloudWatch logs from Log Insights for this model invocation as evidence for relevant controls. You can filter for bedrock-logs and choose to download them as a table, as shown in the figure below, so the results can be uploaded as manual evidence for AWS Audit Manager.
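For example, the relevant invocation logs could be pulled programmatically with a CloudWatch Logs Insights query, as in the following sketch; the log group name is a placeholder for whatever you configured when enabling model invocation logging.

# Sketch: pull recent Bedrock model invocation logs with CloudWatch Logs Insights.
import time

import boto3

LOG_GROUP = "/aws/bedrock/modelinvocations"  # placeholder: your configured invocation log group

logs = boto3.client("logs")

query_id = logs.start_query(
    logGroupName=LOG_GROUP,
    startTime=int(time.time()) - 24 * 3600,  # last 24 hours
    endTime=int(time.time()),
    queryString="fields @timestamp, @message | sort @timestamp desc | limit 20",
)["queryId"]

# Poll until the query completes, then print the rows for export as evidence.
while True:
    result = logs.get_query_results(queryId=query_id)
    if result["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(2)

for row in result.get("results", []):
    print({field["field"]: field["value"] for field in row})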

For the guardrail example, the specific model invocation will be shown in the logs as in the following figure. Here, the prompt and the user who ran it are captured. Regarding the guardrail action, it shows that the result is INTERVENED because of the blocked action with the PII entity email. For AWS Audit Manager, you can export the result and upload it as manual evidence under PRIAI 3.9: PII Anonymization.

Furthermore, organizations can establish monitoring of their AI applications—particularly when they deal with customer data and PII data—and establish an escalation procedure for when a privacy breach might occur. Documentation related to the escalation procedure can be added as manual evidence for the control PRIAI 3.6: Escalation Procedures – Privacy Breach.
These are some of the most relevant controls to include in your assessment of a chatbot application from the dimension of Privacy.
Resilience
In this section, we show you how to improve the resilience of an application to add evidence of the same to controls defined in the Resilience section of the AWS generative AI best practices framework.
AI systems, as well as the infrastructure in which they are deployed, are said to be resilient if they can withstand unexpected adverse events or unexpected changes in their environment or use. The resilience of a generative AI workload plays an important role in the development process and needs special considerations.
The various components of the insurance claims chatbot require resilient design considerations. Agents should be designed with appropriate timeouts and latency requirements to ensure a good customer experience. Data pipelines that ingest data to the knowledge base should account for throttling and use backoff techniques. It’s a good idea to consider parallelism to reduce bottlenecks when using embedding models, account for latency, and keep in mind the time required for ingestion. Considerations and best practices should be implemented for vector databases, the application tier, and monitoring the use of resources through an observability layer. Having a business continuity plan with a disaster recovery strategy is a must for any workload. Guidance for these considerations and best practices can be found in Designing generative AI workloads for resilience. Details of these architectural elements should be added as manual evidence in the assessment.
Responsible
Key principles of responsible design are explainability and interpretability. Explainability refers to the mechanisms that drive the functionality of the AI system, while interpretability refers to the meaning of the AI system’s output within the context of its designed functional purpose. Together, explainability and interpretability assist in the governance of an AI system and help maintain its trustworthiness. The trace of the agent for critical prompts and the various requests that users can send to the insurance claims chatbot can be added as evidence for the reasoning used by the agent to complete a user request.
The logs gathered from Amazon Bedrock offer comprehensive insights into the model’s handling of user prompts and the generation of corresponding answers. The figure below shows a typical model invocation log. By analyzing these logs, you can gain visibility into the model’s decision-making process. This logging functionality can serve as a manual audit trail, fulfilling RESPAI 3.4: Auditable Model Decisions.

Another important aspect of maintaining responsible design, development, and deployment of generative AI applications is risk management. This involves a risk assessment in which risks are identified across broad categories for the application in order to identify harmful events and assign risk scores. The process also identifies mitigations that can reduce the inherent risk of a harmful event occurring to a lower residual risk. For more details on how to perform a risk assessment of your generative AI application, see Learn how to assess the risk of AI systems. Risk assessment is a recommended practice, especially for safety-critical or regulated applications, where identifying the necessary mitigations can lead to responsible design choices and a safer application for users. The risk assessment reports are good evidence to include under this section of the assessment and can be uploaded as manual evidence. The risk assessment should also be reviewed periodically to account for changes to the application that can introduce the possibility of new harmful events and to consider new mitigations for reducing their impact.
Safe
AI systems should “not under defined conditions, lead to a state in which human life, health, property, or the environment is endangered.” (Source: ISO/IEC TS 5723:2022) For the insurance claims chatbot, safety principles should be followed to prevent interactions with users outside the limits of the defined functions. Amazon Bedrock Guardrails can be used to define topics that are not supported by the chatbot. The intended use of the chatbot should also be transparent to users to guide them in the best use of the AI application. An unsupported topic could include providing investment advice, which can be blocked by creating a guardrail with investment advice defined as a denied topic, as described in Guardrails for Amazon Bedrock helps implement safeguards customized to your use case and responsible AI policies.
After this functionality is enabled as a guardrail, the model will prohibit unsupported actions. The instance illustrated in the following figure depicts a scenario where requesting investment advice is a restricted behavior, leading the model to decline to provide a response.

After the model is invoked, the user can navigate to CloudWatch to view the relevant logs. In cases where the model denies or intervenes in certain actions, such as providing investment advice, the logs will reflect the specific reasons for the intervention, as shown in the following figure. By examining the logs, you can gain insights into the model’s behavior, understand why certain actions were denied or restricted, and verify that the model is operating within the intended guidelines and boundaries. For the controls defined under the safety section of the assessment, you might want to design more experiments by considering various risks that arise from your application. The logs and documentation collected from the experiments can be attached as evidence to demonstrate the safety of the application.

Secure
NIST defines AI systems to be secure when they maintain confidentiality, integrity, and availability through protection mechanisms that prevent unauthorized access and use. Applications developed using generative AI should build defenses for adversarial threats including but not limited to prompt injection, data poisoning if a model is being fine-tuned or pre-trained, and model and data extraction exploits through AI endpoints.
Your information security teams should conduct standard security assessments that have been adapted to address the new challenges with generative AI models and applications—such as adversarial threats—and consider mitigations such as red-teaming. To learn more on various security considerations for generative AI applications, see Securing generative AI: An introduction to the Generative AI Security Scoping Matrix. The resulting documentation of the security assessments can be attached as evidence to this section of the assessment.
Sustainable
Sustainability refers to the “state of the global system, including environmental, social, and economic aspects, in which the needs of the present are met without compromising the ability of future generations to meet their own needs.”
Some actions that contribute to a more sustainable design of generative AI applications include considering and testing smaller models to achieve the same functionality, optimizing hardware and data storage, and using efficient training algorithms. To learn more about how you can do this, see Optimize generative AI workloads for environmental sustainability. Considerations implemented for achieving more sustainable applications can be added as evidence for the controls related to this part of the assessment.
Conclusion
In this post, we used the example of an insurance claims assistant powered by Amazon Bedrock Agents and looked at various principles that you need to consider when getting this application audit ready using the AWS generative AI best practices framework on Audit Manager. We defined each principle of safeguarding applications for trustworthy AI and provided some best practices for achieving the key objectives of the principles. Finally, we showed you how these development and design choices can be added to the assessment as evidence to help you prepare for an audit.
The AWS generative AI best practices framework provides a purpose-built tool that you can use for monitoring and governance of your generative AI projects on Amazon Bedrock and Amazon SageMaker. To learn more, see:

AWS generative AI best practices framework v2
AWS Audit Manager launches AWS Best Practices Framework for Generative AI
AWS Audit Manager extends generative AI best practices framework to Amazon SageMaker

About the Authors
Bharathi Srinivasan is a Generative AI Data Scientist at the AWS Worldwide Specialist Organisation. She works on developing solutions for Responsible AI, focusing on algorithmic fairness, veracity of large language models, and explainability. Bharathi guides internal teams and AWS customers on their responsible AI journey. She has presented her work at various learning conferences.
Irem Gokcek is a Data Architect in the AWS Professional Services team, with expertise spanning both Analytics and AI/ML. She has worked with customers from various industries such as retail, automotive, manufacturing and finance to build scalable data architectures and generate valuable insights from the data. In her free time, she is passionate about swimming and painting.
Fiona McCann is a Solutions Architect at Amazon Web Services in the public sector. She specializes in AI/ML with a focus on Responsible AI. Fiona has a passion for helping nonprofit customers achieve their missions with cloud solutions. Outside of building on AWS, she loves baking, traveling, and running half marathons in cities she visits.

London Stock Exchange Group uses Amazon Q Business to enhance post-tra …

This post was co-written with Ben Doughton, Head of Product Operations – LCH, Iulia Midus, Site Reliability Engineer – LCH, and Maurizio Morabito, Software and AI specialist – LCH (part of London Stock Exchange Group, LSEG).
In the financial industry, quick and reliable access to information is essential, but searching for data or facing unclear communication can slow things down. An AI-powered assistant can change that. By instantly providing answers and helping to navigate complex systems, such assistants can make sure that key information is always within reach, improving efficiency and reducing the risk of miscommunication. Amazon Q Business is a generative AI-powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. Amazon Q Business enables employees to become more creative, data-driven, efficient, organized, and productive.
In this blog post, we explore a client services agent assistant application developed by the London Stock Exchange Group (LSEG) using Amazon Q Business. We will discuss how Amazon Q Business saved time in generating answers, including summarizing documents, retrieving answers to complex Member enquiries, and combining information from different data sources (while providing in-text citations to the data sources used for each answer).
The challenge
The London Clearing House (LCH) Group of companies includes leading multi-asset class clearing houses and is part of the Markets division of LSEG PLC (LSEG Markets). LCH provides proven risk management capabilities across a range of asset classes, including over-the-counter (OTC) and listed interest rates, fixed income, foreign exchange (FX), credit default swap (CDS), equities, and commodities.
As the LCH business continues to grow, the LCH team has been continuously exploring ways to improve their support to customers (members) and to increase LSEG’s impact on customer success. As part of LSEG’s multi-stage AI strategy, LCH has been exploring the role that generative AI services can have in this space. One of the key capabilities that LCH is interested in is a managed conversational assistant that requires minimal technical knowledge to build and maintain. In addition, LCH has been looking for a solution that is focused on its knowledge base and that can be quickly kept up to date. For this reason, LCH was keen to explore techniques such as Retrieval Augmented Generation (RAG). Following a review of available solutions, the LCH team decided to build a proof-of-concept around Amazon Q Business.
Business use case
Realizing value from generative AI relies on a solid business use case. LCH has a broad base of customers raising queries to their client services (CS) team across a diverse and complex range of asset classes and products. Example queries include: “What is the eligible collateral at LCH?” and “Can members clear NIBOR IRS at LCH?” This requires CS team members to refer to detailed service and policy documentation sources to provide accurate advice to their members.
Historically, the CS team has relied on producing product FAQs for LCH members to refer to and, where required, an in-house knowledge center for CS team members to refer to when answering complex customer queries. To improve the customer experience and boost employee productivity, the CS team set out to investigate whether generative AI could help answer questions from individual members, thus reducing the number of customer queries. The goal was to increase the speed and accuracy of information retrieval within the CS workflows when responding to the queries that inevitably come through from customers.
Project workflow
The CS use case was developed through close collaboration between LCH and Amazon Web Services (AWS) and involved the following steps:

Ideation: The LCH team carried out a series of cross-functional workshops to examine different large language model (LLM) approaches including prompt engineering, RAG, and custom model fine-tuning and pre-training. They considered different technologies such as Amazon SageMaker and Amazon SageMaker JumpStart and evaluated trade-offs between development effort and model customization. Amazon Q Business was selected because of its built-in enterprise search web crawler capability and its ease of use without the need to deploy and manage an LLM. Another attractive feature was the ability to clearly provide source attribution and citations. This enhanced the reliability of the responses, allowing users to verify facts and explore topics in greater depth (important aspects to increase their overall trust in the responses received).
Knowledge base creation: The CS team built data sources connectors for the LCH website, FAQs, customer relationship management (CRM) software, and internal knowledge repositories and included the Amazon Q Business built-in index and retriever in the build.
Integration and testing: The application was secured using a third-party identity provider (IdP) for identity and access management, allowing users to be managed with their enterprise IdP, and used AWS Identity and Access Management (IAM) to authenticate users when they signed in to Amazon Q Business. Testing was carried out to verify the factual accuracy of responses, evaluating the performance and quality of the AI-generated answers, which demonstrated that the system had achieved a high level of factual accuracy. Wider improvements in business performance were also demonstrated, including enhancements in response time, with responses delivered within a few seconds. Tests were undertaken with both unstructured and structured data within the documents.
Phased rollout: The CS AI assistant was rolled out in a phased approach to provide thorough, high-quality answers. In the future, there are plans to integrate their Amazon Q Business application with existing email and CRM interfaces, and to expand its use to additional use cases and functions within LSEG. 

Solution overview
In this solution overview, we’ll explore the LCH-built Amazon Q Business application.
The LCH admin team developed a web-based interface that serves as a gateway for their internal client services team to interact with the Amazon Q Business API and other AWS services (Amazon Elastic Container Service (Amazon ECS), Amazon API Gateway, AWS Lambda, Amazon DynamoDB, Amazon Simple Storage Service (Amazon S3), and Amazon Bedrock). The interface is secured using SAML 2.0 IAM federation, maintaining secure access to the chat interface, retrieves answers from a pre-indexed knowledge base, and validates the responses using Anthropic’s Claude v2 LLM.
The following figure illustrates the architecture for the LCH client services application.

The workflow consists of the following steps:

The LCH team set up the Amazon Q Business application using a SAML 2.0 IAM IdP. (The example in the blog post shows connecting with Okta as the IdP for Amazon Q Business. However, the LCH team built the application using a third-party solution as the IdP instead of Okta). This architecture allows LCH users to sign in using their existing identity credentials from their enterprise IdP, while they maintain control over which users have access to their Amazon Q Business application.
The application had two data sources as part of the configuration for their Amazon Q Business application:

An S3 bucket to store and index their internal LCH documents. This allows the Amazon Q Business application to access and search through their internal product FAQ PDF documents as part of providing responses to user queries. Indexing the documents in Amazon S3 makes them readily available for the application to retrieve relevant information.
In addition to internal documents, the team has also set up their public-facing LCH website as a data source using a web crawler that can index and extract information from their rulebooks.

The LCH team opted for a custom user interface (UI) instead of the built-in web experience provided by Amazon Q Business to have more control over the frontend by directly accessing the Amazon Q Business API. The application’s frontend was developed using an open source application framework and hosted on Amazon ECS. The frontend application accesses an Amazon API Gateway REST API endpoint to interact with the business logic written in AWS Lambda.
The architecture consists of two Lambda functions:

An authorizer Lambda function is responsible for authorizing the frontend application to access the Amazon Q Business API by generating temporary AWS credentials.
A ChatSync Lambda function is responsible for accessing the Amazon Q Business ChatSync API to start an Amazon Q Business conversation.

The architecture includes a Validator Lambda function, which is used by the admin to validate the accuracy of the responses generated by the Amazon Q Business application.

The LCH team has stored a golden answer knowledge base in an S3 bucket, consisting of approximately 100 questions and answers about their product FAQs and rulebooks collected from their live agents. This knowledge base serves as a benchmark for the accuracy and reliability of the AI-generated responses.
By comparing the Amazon Q Business chat responses against their golden answers, LCH can verify that the AI-powered assistant is providing accurate and consistent information to their customers.
The Validator Lambda function retrieves data from a DynamoDB table and sends it to Amazon Bedrock, a fully managed service that offers a choice of high-performing foundation models (FMs) that can be used to quickly experiment with and evaluate top FMs for a given use case, privately customize the FMs with existing data using techniques such as fine-tuning and RAG, and build agents that execute tasks using enterprise systems and data sources.
The Amazon Bedrock service uses Anthropic’s Claude v2 model to validate the Amazon Q Business application queries and responses against the golden answers stored in the S3 bucket.
Anthropic’s Claude v2 model returns a score for each question and answer, in addition to a total score, which is then provided to the application admin for review.
The Amazon Q Business application returned answers within a few seconds for each question. The overall expectation is that Amazon Q Business saves time for each live agent on each question by providing quick and correct responses.

This validation process helped LCH to build trust and confidence in the capabilities of Amazon Q Business, enhancing the overall customer experience.
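A simplified version of that comparison step might look like the following sketch, which asks a Claude model on Amazon Bedrock to score a chatbot answer against a golden answer; the model ID, prompt wording, scoring scale, and sample answers are illustrative and not LCH's actual implementation.

# Sketch: score an Amazon Q Business answer against a golden answer with a Claude model.
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def score_answer(question: str, golden_answer: str, chatbot_answer: str) -> str:
    prompt = (
        "You are validating a client services chatbot.\n"
        f"Question: {question}\n"
        f"Golden answer: {golden_answer}\n"
        f"Chatbot answer: {chatbot_answer}\n"
        "Rate the chatbot answer from 0 to 10 for factual agreement with the "
        "golden answer and explain your score briefly."
    )
    response = bedrock_runtime.converse(
        modelId="anthropic.claude-v2",  # placeholder: use the model configured in your account
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 300, "temperature": 0.0},
    )
    return response["output"]["message"]["content"][0]["text"]

# Dummy strings for illustration only; real golden answers come from the S3 knowledge base.
print(score_answer(
    "What is the eligible collateral at LCH?",
    "Eligible collateral includes cash and certain government securities.",
    "LCH accepts cash and a defined list of government bonds as collateral.",
))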
Conclusion
This post provides an overview of LSEG’s experience in adopting Amazon Q Business to support LCH client services agents for B2B query handling. This specific use case was built by working backward from a business goal to improve customer experience and staff productivity in a complex, highly technical area of the trading life cycle (post-trade). The variety and large size of enterprise data sources and the regulated environment that LSEG operates in makes this post particularly relevant to customer service operations dealing with complex query handling. Managed, straightforward-to-use RAG is a key capability within a wider vision of providing technical and business users with an environment, tools, and services to use generative AI across providers and LLMs. You can get started with this tool by creating a sample Amazon Q Business application.

About the Authors
Ben Doughton is a Senior Product Manager at LSEG with over 20 years of experience in Financial Services. He leads product operations, focusing on product discovery initiatives, data-informed decision-making and innovation. He is passionate about machine learning and generative AI as well as agile, lean and continuous delivery practices.
Maurizio Morabito, Software and AI specialist at LCH, one of the early adopters of Neural Networks in the years 1990–1992 before a long hiatus in technology and finance companies in Asia and Europe, finally returning to Machine Learning in 2021. Maurizio is now leading the way to implement AI in LSEG Markets, following the motto “Tackling the Long and the Boring”
Iulia Midus is a recent IT Management graduate and currently working in Post-trade. The main focus of the work so far has been data analysis and AI, and looking at ways to implement these across the business.
Magnus Schoeman is a Principal Customer Solutions Manager at AWS. He has 25 years of experience across private and public sectors where he has held leadership roles in transformation programs, business development, and strategic alliances. Over the last 10 years, Magnus has led technology-driven transformations in regulated financial services operations (across Payments, Wealth Management, Capital Markets, and Life & Pensions).
Sudha Arumugam is an Enterprise Solutions Architect at AWS, advising large Financial Services organizations. She has over 13 years of experience in creating reliable software solutions to complex problems. She has extensive experience in serverless event-driven architecture and technologies and is passionate about machine learning and AI. She enjoys developing mobile and web applications.
Elias Bedmar is a Senior Customer Solutions Manager at AWS. He is a technical and business program manager helping customers be successful on AWS. He supports large migration and modernization programs, cloud maturity initiatives, and adoption of new services. Elias has experience in migration delivery, DevOps engineering and cloud infrastructure.
Marcin Czelej is a Machine Learning Engineer at AWS Generative AI Innovation and Delivery. He combines over 7 years of experience in C/C++ and assembler programming with extensive knowledge in machine learning and data science. This unique skill set allows him to deliver optimized and customised solutions across various industries. Marcin has successfully implemented AI advancements in sectors such as e-commerce, telecommunications, automotive, and the public sector, consistently creating value for customers.
Zmnako Awrahman, Ph.D., is a generative AI Practice Manager at AWS Generative AI Innovation and Delivery with extensive experience in helping enterprise customers build data, ML, and generative AI strategies. With a strong background in technology-driven transformations, particularly in regulated industries, Zmnako has a deep understanding of the challenges and opportunities that come with implementing cutting-edge solutions in complex environments.

Evaluate large language models for your machine translation tasks on A …

Large language models (LLMs) have demonstrated promising capabilities in machine translation (MT) tasks. Depending on the use case, they are able to compete with neural translation models such as Amazon Translate. LLMs particularly stand out for their natural ability to learn from the context of the input text, which allows them to pick up on cultural cues and produce more natural sounding translations. For instance, the sentence “Did you perform well?” might be translated into French as “Avez-vous bien performé?” The target translation can vary widely depending on the context. If the question is asked in the context of sport, such as “Did you perform well at the soccer tournament?”, the natural French translation would be very different. It is critical for AI models to capture not only the context, but also the cultural specificities to produce a more natural sounding translation. One of LLMs’ most fascinating strengths is their inherent ability to understand context.
A number of our global customers are looking to take advantage of this capability to improve the quality of their translated content. Localization relies on both automation and humans-in-the-loop in a process called Machine Translation Post Editing (MTPE). Building solutions that help enhance translated content quality presents multiple benefits:

Potential cost savings on MTPE activities
Faster turnaround for localization projects
Better experience for content consumers and readers overall with enhanced quality

LLMs have also shown gaps with regards to MT tasks, such as:

Inconsistent quality over certain language pairs
No standard pattern to integrate past translations knowledge, also known as translation memory (TM)
Inherent risk of hallucination

Switching MT workloads to LLM-driven translation should be considered on a case-by-case basis. However, the industry is seeing enough potential to consider LLMs as a valuable option.
This blog post with accompanying code presents a solution to experiment with real-time machine translation using foundation models (FMs) available in Amazon Bedrock. It can help collect more data on the value of LLMs for your content translation use cases.
Steering the LLMs’ output
Translation memory and TMX files are important concepts and file formats used in the field of computer-assisted translation (CAT) tools and translation management systems (TMSs).
Translation memory
A translation memory is a database that stores previously translated text segments (typically sentences or phrases) along with their corresponding translations. The main purpose of a TM is to aid human or machine translators by providing them with suggestions for segments that have already been translated before. This can significantly improve translation efficiency and consistency, especially for projects involving repetitive content or similar subject matter.
Translation Memory eXchange (TMX) is a widely used open standard for representing and exchanging TM data. It is an XML-based file format that allows for the exchange of TMs between different CAT tools and TMSs. A typical TMX file contains a structured representation of translation units, which are groupings of the same text translated into multiple languages.
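For reference, the following sketch shows how translation units could be read out of a TMX file using Python's standard library; the element names follow the TMX structure (<tu>, <tuv>, <seg>), and the file name in the usage comment is a placeholder.

# Sketch: extract (source, target) segment pairs from a TMX file.
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # TMX uses the xml:lang attribute

def load_tmx_pairs(path: str, source_lang: str, target_lang: str):
    """Return (source, target) segment pairs for one language pair."""
    tree = ET.parse(path)
    pairs = []
    for tu in tree.iter("tu"):                 # each <tu> is one translation unit
        segments = {}
        for tuv in tu.iter("tuv"):             # one <tuv> per language variant
            lang = tuv.get(XML_LANG, "").lower()
            seg = tuv.find("seg")
            if seg is not None and seg.text:
                segments[lang] = seg.text.strip()
        if source_lang in segments and target_lang in segments:
            pairs.append((segments[source_lang], segments[target_lang]))
    return pairs

# Example usage (file name is a placeholder):
# for src, tgt in load_tmx_pairs("subtitles_memory.tmx", "en", "fr"):
#     print(src, "->", tgt)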
Integrating TM with LLMs
The use of TMs in combination with LLMs can be a powerful approach for improving the quality and efficiency of machine translation. The following are a few potential benefits:

Improved accuracy and consistency – LLMs can benefit from the high-quality translations stored in TMs, which can help improve the overall accuracy and consistency of the translations produced by the LLM. The TM can provide the LLM with reliable reference translations for specific segments, reducing the chances of errors or inconsistencies.
Domain adaptation – TMs often contain translations specific to a particular domain or subject matter. By using a domain-specific TM, the LLM can better adapt to the terminology, style, and context of that domain, leading to more accurate and natural translations.
Efficient reuse of human translations – TMs store human-translated segments, which are typically of higher quality than machine-translated segments. By incorporating these human translations into the LLM’s training or inference process, the LLM can learn from and reuse these high-quality translations, potentially improving its overall performance.
Reduced post-editing effort – When the LLM can accurately use the translations stored in the TM, the need for human post-editing can be reduced, leading to increased productivity and cost savings.

Another approach to integrating TM data with LLMs is to use fine-tuning in the same way you would fine-tune a model for business domain content generation, for instance. For customers operating in global industries, potentially translating to and from over 10 languages, this approach can prove to be operationally complex and costly. The solution proposed in this post relies on LLMs’ context learning capabilities and prompt engineering. It enables you to use an off-the-shelf model as is without involving machine learning operations (MLOps) activity.
Solution overview
The LLM translation playground is a sample application providing the following capabilities:

Experiment with LLM translation capabilities using models available in Amazon Bedrock
Create and compare various inference configurations
Evaluate the impact of prompt engineering and Retrieval Augmented Generation (RAG) on translation with LLMs
Configure supported language pairs
Import, process, and test translation using your existing TMX file with multiple LLMs
Custom terminology conversion
Performance, quality, and usage metrics including BLEU, BERT, METEOR, and chrF (a short sketch of computing such reference-based scores follows this list)
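As an illustration, reference-based scores such as BLEU and chrF can be computed offline with the sacrebleu library, as in the following sketch; this is a generic example and not necessarily how the playground computes its metrics.

# Sketch: compute BLEU and chrF for a translated sentence against a reference.
# Requires: pip install sacrebleu. Texts below are illustrative.
from sacrebleu.metrics import BLEU, CHRF

hypotheses = ["Avez-vous bien joué au tournoi de football ?"]
references = [["As-tu bien joué au tournoi de foot ?"]]  # one reference stream

bleu = BLEU()
chrf = CHRF()

print("BLEU:", round(bleu.corpus_score(hypotheses, references).score, 2))
print("chrF:", round(chrf.corpus_score(hypotheses, references).score, 2))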

The following diagram illustrates the translation playground architecture. The numbers are color-coded to represent two flows: the translation memory ingestion flow (orange) and the text translation flow (gray). The solution offers two TM retrieval modes for users to choose from: vector and document search. This is covered in detail later in the post.

The TM ingestion flow (orange) consists of the following steps:

The user uploads a TMX file to the playground UI.
Depending on which retrieval mode is being used, the appropriate adapter is invoked.
When using the Amazon OpenSearch Service adapter (document search), translation unit groupings are parsed and stored into an index dedicated to the uploaded file. When using the FAISS adapter (vector search), translation unit groupings are parsed and turned into vectors using the selected embedding model from Amazon Bedrock.
When using the FAISS adapter, translation units are stored into a local FAISS index along with the metadata.

The text translation flow (gray) consists of the following steps:

The user enters the text they want to translate along with source and target language.
The request is sent to the prompt generator.
The prompt generator invokes the appropriate knowledge base according to the selected mode.
The prompt generator receives the relevant translation units.
Amazon Bedrock is invoked using the generated prompt as input along with customization parameters.

The translation playground could be adapted into a scalable serverless solution as represented by the following diagram using AWS Lambda, Amazon Simple Storage Service (Amazon S3), and Amazon API Gateway.

Strategy for TM knowledge base
The LLM translation playground offers two options to incorporate the translation memory into the prompt. Each option is available through its own page within the application:

Vector store using FAISS – In this mode, the application processes the .tmx file the user uploaded, indexes it, and stores it locally into a vector store (FAISS).
Document store using Amazon OpenSearch Serverless – Only standard document search using Amazon OpenSearch Serverless is supported. To test vector search, use the vector store option (using FAISS).

In vector store mode, the translation segments are processed as follows:

Embed the source segment.
Extract metadata:

Segment language
System generated <tu> segment unique identifier

Store source segment vectors along with metadata and the segment itself in plain text as a document

The translation customization section allows you to select the embedding model. You can choose either Amazon Titan Embeddings Text V2 or Cohere Embed Multilingual v3. Amazon Titan Text Embeddings V2 includes multilingual support for over 100 languages in pre-training. Cohere Embed supports 108 languages.
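To illustrate the vector store mode, the following sketch embeds a source segment with Amazon Titan Text Embeddings V2 through the Bedrock runtime and adds it to a local FAISS index; this is a minimal example under the assumptions noted in the comments, not the playground's exact code.

# Sketch: embed a source segment and store it in a local FAISS index with metadata.
import json

import boto3
import faiss
import numpy as np

bedrock_runtime = boto3.client("bedrock-runtime")

def embed(text: str) -> np.ndarray:
    """Embed a segment with Amazon Titan Text Embeddings V2."""
    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    vector = json.loads(response["body"].read())["embedding"]
    return np.array(vector, dtype="float32")

segment = "Did you perform well at the soccer tournament?"
vector = embed(segment)

index = faiss.IndexFlatL2(vector.shape[0])  # exact L2 search over the embeddings
index.add(vector.reshape(1, -1))

# Metadata (segment language, <tu> identifier, plain-text segment) kept alongside the index;
# the tu_id value here is hypothetical.
metadata = [{"lang": "en", "tu_id": "tu-0001", "segment": segment}]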
In document store mode, the language segments are not embedded and are stored following a flat structure. Two metadata attributes are maintained across the documents:

Segment Language
System generated <tu> segment unique identifier

Prompt engineering
The application uses prompt engineering techniques to incorporate several types of inputs for the inference. The following sample XML illustrates the prompt’s template structure:

<prompt>
<system_prompt>…</system_prompt>
<source_language>EN</source_language>
<target_language>FR</target_language>
<translation_memory_pairs>
<source_language>…</source_language>
<target_language>…</target_language>
</translation_memory_pairs>
<custom_terminology_pairs>
<source_language>…</source_language>
<target_language>…</target_language>
</custom_terminology_pairs>
<user_prompt>…</user_prompt>
</prompt>
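A simplified rendering of how such a prompt could be assembled and sent to a model through the Amazon Bedrock Converse API is sketched below; the template string, translation pairs, and model ID are illustrative and do not reproduce the playground's exact prompt.

# Sketch: fill a simplified prompt template with retrieved TM pairs and invoke a model.
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def build_prompt(source_text, source_lang, target_lang, tm_pairs):
    """Fill a simplified version of the template with retrieved TM pairs."""
    pairs = "\n".join(
        f"<source_language>{src}</source_language>\n"
        f"<target_language>{tgt}</target_language>"
        for src, tgt in tm_pairs
    )
    return (
        f"<source_language>{source_lang}</source_language>\n"
        f"<target_language>{target_lang}</target_language>\n"
        f"<translation_memory_pairs>\n{pairs}\n</translation_memory_pairs>\n"
        f"<user_prompt>Translate the following text: {source_text}</user_prompt>"
    )

prompt = build_prompt(
    "Did you perform well at the soccer tournament?",
    "EN",
    "FR",
    [("Did you play well?", "As-tu bien joué ?")],  # pairs retrieved from the TM
)

response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # placeholder model ID
    system=[{"text": "You are a professional translator. Reuse the translation memory pairs where relevant."}],
    messages=[{"role": "user", "content": [{"text": prompt}]}],
    inferenceConfig={"temperature": 0.2, "maxTokens": 512},
)
print(response["output"]["message"]["content"][0]["text"])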

Prerequisites
The project code uses the Python version of the AWS Cloud Development Kit (AWS CDK). To run the project code, make sure that you have fulfilled the AWS CDK prerequisites for Python.
The project also requires that the AWS account is bootstrapped to allow the deployment of the AWS CDK stack.
Install the UI
To deploy the solution, first install the UI (Streamlit application):

Clone the GitHub repository using the following command:

git clone https://github.com/aws-samples/llm-translation-playground.git

Navigate to the deployment directory:

cd llm-translation-playground

Install and activate a Python virtual environment:

python3 -m venv .venv
source .venv/bin/activate

Install Python libraries:

python -m pip install -r requirements.txt
Deploy the AWS CDK stack
Complete the following steps to deploy the AWS CDK stack:

Move into the deployment folder:

cd deployment/cdk

Configure the AWS CDK context parameters file context.json. For collection_name, use the OpenSearch Serverless collection name. For example:

"collection_name": "search-subtitles"

Deploy the AWS CDK stack:

cdk deploy

Validate successful deployment by reviewing the OpsServerlessSearchStack stack on the AWS CloudFormation console. The status should read CREATE_COMPLETE.
On the Outputs tab, make note of the OpenSearchEndpoint attribute value.

Configure the solution
The stack creates an AWS Identity and Access Management (IAM) role with the right level of permission needed to run the application. The LLM translation playground assumes this role automatically on your behalf. To achieve this, modify the role or principal under which you are planning to run the application so you are allowed to assume the newly created role. You can use the pre-created policy and attach it to your role. The policy Amazon Resource Name (ARN) can be retrieved as a stack output under the key LLMTranslationPlaygroundAppRoleAssumePolicyArn, as illustrated in the preceding screenshot. You can do so from the IAM console after selecting your role and choosing Add permissions. If you prefer to use the AWS Command Line Interface (AWS CLI), refer to the following sample command line:
aws iam attach-role-policy --role-name <role-name> --policy-arn <policy-arn>
Finally, configure the .env file in the utils folder as follows (a sample file follows the list):

APP_ROLE_ARN – The ARN of the role created by the stack (stack output LLMTranslationPlaygroundAppRoleArn)
HOST – OpenSearch Serverless collection endpoint (without https)
REGION – AWS Region the collection was deployed into
INGESTION_LIMIT – Maximum number of translation units (<tu> tags) indexed per uploaded TMX file
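
A completed .env file could look like the following; the account ID, endpoint, and limit are placeholders, and the role ARN comes from your own stack output:

APP_ROLE_ARN=arn:aws:iam::111122223333:role/LLMTranslationPlaygroundAppRole
HOST=xxxxxxxx.us-east-1.aoss.amazonaws.com
REGION=us-east-1
INGESTION_LIMIT=1000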

Run the solution
To start the translation playground, run the following commands:
cd llm-translation-playground/source
streamlit run LLM_Translation_Home.py
Your default browser should open a new tab or window displaying the Home page.

Simple test case
Let’s run a simple translation test using the phrase mentioned earlier: “Did you perform well?”
Because we’re not using a knowledge base for this test case, we can use either a vector store or document store. For this post, we use a document store.

Choose With Document Store.
For Source Text, enter the text to be translated.
Choose your source and target languages (for this post, English and French, respectively).
You can experiment with other parameters, such as model, maximum tokens, temperature, and top-p (see the sample invocation after these steps).
Choose Translate.
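
For reference, the following sketch shows how these settings map to an Amazon Bedrock Converse API call; the model ID, Region, and parameter values are examples, and the playground's internal invocation code may differ.

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")   # assumed Region

response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",                # example model choice
    messages=[{"role": "user",
               "content": [{"text": "Translate from English to French: Did you perform well?"}]}],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2, "topP": 0.9},
)
print(response["output"]["message"]["content"][0]["text"])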

The translated text appears in the bottom section. For this example, the translated text, although accurate, is close to a literal translation, which is not a common phrasing in French.

We can rerun the same test after slightly modifying the initial text: “Did you perform well at the soccer tournament?”

We’re now introducing some situational context in the input. The translated text should be different and closer to a more natural translation. The new output literally means “Did you play well at the soccer tournament?”, which is consistent with the initial intent of the question.

Also note the completion metrics on the left pane, displaying latency, input/output tokens, and quality scores.
This example highlights the ability of LLMs to naturally adapt the translation to the context.
Adding translation memory
Let’s test the impact of using a translation memory (TMX) file on the translation quality.

Copy the text contained in test/source_text.txt and paste it into the Source Text field.
Choose French as the target language and run the translation.
Copy the text contained in test/target_text.txt and paste it into the reference translation field.

Choose Evaluate and notice the quality scores on the left.
In the Translation Customization section, choose Browse files and choose the file test/subtitles_memory.tmx.

This will index the translation memory into the OpenSearch Service collection previously created. The indexing process can take a few minutes.

When the indexing is complete, select the created index from the index dropdown.
Rerun the translation.

You should see a noticeable increase in the quality score. For instance, we’ve seen up to 20 percentage points improvement in BLEU score with the preceding test case. Using prompt engineering, we were able to steer the model’s output by providing sample phrases directly pulled from the TMX file. Feel free to explore the generated prompt for more details on how the translation pairs were introduced.
You can replicate a similar test case with Amazon Translate by launching an asynchronous job customized using parallel data.

Here we took a simplistic retrieval approach, which consists of loading all of the samples as part of the same TMX file, matching the source and target language. You can enhance this technique by using metadata-driven filtering to collect the relevant pairs according to the source text. For example, you can classify the documents by theme or business domain, and use category tags to select language pairs relevant to the text and desired output.
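
As a sketch of what such filtering could look like in document store mode, the following query restricts the candidate pairs by language and by a hypothetical category tag added at ingestion time. It reuses the aoss client from the earlier ingestion sketch, and the field names are assumptions rather than the playground's actual schema.

query = {
    "size": 50,
    "query": {"bool": {"must": [
        {"term": {"segment_language": "EN"}},   # source language of the text to translate
        {"term": {"category": "sports"}},       # hypothetical domain tag assigned at ingestion
    ]}},
}
hits = aoss.search(index="translation-memory", body=query)["hits"]["hits"]
source_segments = [hit["_source"]["segment_text"] for hit in hits]
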
Semantic similarity for translation memory selection
In vector store mode, the application allows you to upload a TMX and create a local index that uses semantic similarity to select the translation memory segments. First, we retrieve the segment with the highest similarity score based on the text to be translated and the source language. Then we retrieve the corresponding segment matching the target language and parent translation unit ID.
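
A minimal sketch of this two-step lookup follows, assuming an OpenSearch k-NN index whose engine supports filtered vector search and the hypothetical field names used in the earlier ingestion sketch; it is not the playground's actual retrieval code.

def retrieve_tm_pair(source_embedding, source_lang, target_lang,
                     client, index_name="translation-memory"):
    # Step 1: highest-similarity segment in the source language
    knn_query = {
        "size": 1,
        "query": {"knn": {"embedding": {
            "vector": source_embedding,
            "k": 1,
            "filter": {"term": {"segment_language": source_lang}},
        }}},
    }
    best = client.search(index=index_name, body=knn_query)["hits"]["hits"][0]["_source"]

    # Step 2: sibling segment sharing the same translation unit ID in the target language
    sibling_query = {
        "size": 1,
        "query": {"bool": {"must": [
            {"term": {"tu_id": best["tu_id"]}},
            {"term": {"segment_language": target_lang}},
        ]}},
    }
    match = client.search(index=index_name, body=sibling_query)["hits"]["hits"][0]["_source"]
    return best["segment_text"], match["segment_text"]
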
To try it out, upload the file in the same way as shown earlier. Depending on the size of the file, this can take a few minutes. There is a maximum limit of 200 MB. You can use the sample file as in the previous example or one of the other samples provided in the code repository.
This approach differs from the static index search in that it assumes the source text is semantically close to segments that are representative enough of the expected style and tone.

Adding custom terminology
Custom terminology allows you to make sure that your brand names, character names, model names, and other unique content get translated to the desired result. Given that LLMs are pre-trained on massive amounts of data, they can likely already identify unique names and render them accurately in the output. If there are names for which you want to enforce a strict and literal translation, you can try the custom terminology feature of this translation playground. Simply provide the source and target language pairs separated by a semicolon in the Translation Customization section. For instance, if you want to keep the phrase “Gen AI” untranslated regardless of the language, you can configure the custom terminology as illustrated in the following screenshot.
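
For example, assuming pairs are entered one per line with the source and target terms separated by a semicolon (the exact input format in the playground may differ), keeping “Gen AI” unchanged could look like the following:

Gen AI;Gen AI
Amazon Bedrock;Amazon Bedrock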

Clean up
To delete the stack, navigate to the deployment folder and run cdk destroy.
Further considerations
Using existing TMX files with generative AI-based translation systems can potentially improve the quality and consistency of translations. The following are some steps to use TMX files for generative AI translations:

TMX data pipeline – TMX files contain structured translation units, but the format might need to be preprocessed to extract the source and target text segments in a format that can be consumed by the generative AI model. This involves extract, transform, and load (ETL) pipelines that can parse the XML structure, handle encoding issues, and add metadata (see the parsing sketch after this list).
Incorporate quality estimation and human review – Although generative AI models can produce high-quality translations, it is recommended to incorporate quality estimation techniques and human review processes. You can use automated quality estimation models to flag potentially low-quality translations, which can then be reviewed and corrected by human translators.
Iterate and refine – Translation projects often involve iterative cycles of translation, review, and improvement. You can periodically retrain or fine-tune the generative AI model with the updated TMX file, creating a virtuous cycle of continuous improvement.
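
A minimal sketch of the parsing step in such a pipeline follows, using only the Python standard library. Element and attribute names follow the TMX specification; a real pipeline would add encoding handling, validation, and metadata enrichment.

import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"   # the xml:lang attribute

def extract_pairs(tmx_path, source_lang="en", target_lang="fr"):
    # Collect (source, target) segment pairs from each <tu> translation unit
    pairs = []
    root = ET.parse(tmx_path).getroot()
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.iter("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()[:2]
            seg = tuv.find("seg")
            if seg is not None and seg.text:
                segments[lang] = seg.text.strip()
        if source_lang in segments and target_lang in segments:
            pairs.append((segments[source_lang], segments[target_lang]))
    return pairs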

Conclusion
The LLM translation playground presented in this post enables you to evaluate the use of LLMs for your machine translation needs. The key features of this solution include:

Ability to use translation memory – The solution allows you to integrate your existing TM data, stored in the industry-standard TMX format, directly into the LLM translation process. This helps improve the accuracy and consistency of the translations by using high-quality human-translated content.
Prompt engineering capabilities – The solution showcases the power of prompt engineering, demonstrating how LLMs can be steered to produce more natural and contextual translations by carefully crafting the input prompts. This includes the ability to incorporate custom terminology and domain-specific knowledge.
Evaluation metrics – The solution includes standard translation quality evaluation metrics, such as BLEU, BERT Score, METEOR, and CHRF, to help you assess the quality and effectiveness of the LLM-powered translations compared to your existing machine translation workflows (see the scoring sketch after this list).
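
As an illustration of how such scores can be computed, the following sketch uses the open-source sacrebleu package for BLEU and chrF; the playground may compute its metrics differently.

import sacrebleu

hypotheses = ["As-tu bien joué au tournoi de football ?"]       # model output
references = [["As-tu bien joué au tournoi de football ?"]]     # one reference stream

print("BLEU:", sacrebleu.corpus_bleu(hypotheses, references).score)
print("chrF:", sacrebleu.corpus_chrf(hypotheses, references).score)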

As the industry continues to explore the use of LLMs, this solution can help you gain valuable insights and data to determine if LLMs can become a viable and valuable option for your content translation and localization workloads.
To dive deeper into the fast-moving field of LLM-based machine translation on AWS, check out the following resources:

How 123RF saved over 90% of their translation costs by switching to Amazon Bedrock
Video auto-dubbing using Amazon Translate, Amazon Bedrock, and Amazon Polly
Multi-Model agentic & reflective translation workflow in Amazon Bedrock

About the Authors
Narcisse Zekpa is a Sr. Solutions Architect based in Boston. He helps customers in the Northeast U.S. accelerate their business transformation through innovative and scalable solutions on the AWS Cloud. He is passionate about enabling organizations to transform their business using advanced analytics and AI. When Narcisse is not building, he enjoys spending time with his family, traveling, running, cooking, and playing basketball.
Ajeeb Peter is a Principal Solutions Architect with Amazon Web Services based in Charlotte, North Carolina, where he guides global financial services customers to build highly secure, scalable, reliable, and cost-efficient applications on the cloud. He brings over 20 years of technology experience in software development, architecture, and analytics from industries like finance and telecom.