Time series forecasting with Amazon SageMaker AutoML

Time series forecasting is a critical component in various industries for making informed decisions by predicting future values of time-dependent data. A time series is a sequence of data points recorded at regular time intervals, such as daily sales revenue, hourly temperature readings, or weekly stock market prices. These forecasts are pivotal for anticipating trends and future demands in areas such as product demand, financial markets, energy consumption, and many more.
However, creating accurate and reliable forecasts poses significant challenges because of factors such as seasonality, underlying trends, and external influences that can dramatically impact the data. Additionally, traditional forecasting models often require extensive domain knowledge and manual tuning, which can be time-consuming and complex.
In this blog post, we explore a comprehensive approach to time series forecasting using the Amazon SageMaker AutoMLV2 Software Development Kit (SDK). SageMaker AutoMLV2 is part of the SageMaker Autopilot suite, which automates the end-to-end machine learning workflow from data preparation to model deployment. Throughout this post, we use the term AutoML to refer to both the SageMaker Autopilot APIs and the Amazon SageMaker Canvas AutoML capabilities. We’ll walk through the data preparation process, explain the configuration of the time series forecasting model, detail the inference process, and highlight key aspects of the project. This methodology offers insights into effective strategies for forecasting future data points in a time series, using the power of machine learning without requiring deep expertise in model development. The code for this post can be found in the GitHub repo.
The following diagram depicts the basic AutoMLV2 APIs, all of which are relevant to this post. The diagram shows the workflow for building and deploying models using the AutoMLV2 API. In the training phase, CSV data is uploaded to Amazon S3, followed by the creation of an AutoML job, model creation, and checking for job completion. The deployment phase allows you to choose between real-time inference via an endpoint or batch inference using a scheduled transform job that stores results in S3.

1. Data preparation
The foundation of any machine learning project is data preparation. For this project, we used a synthetic dataset containing time series data of product sales across various locations, focusing on attributes such as product code, location code, timestamp, unit sales, and promotional information. The dataset is available in a public, Amazon-owned Amazon Simple Storage Service (Amazon S3) bucket.
When preparing your CSV file for input into a SageMaker AutoML time series forecasting model, you must ensure that it includes at least three essential columns (as described in the SageMaker AutoML V2 documentation):

Item identifier attribute name: This column contains unique identifiers for each item or entity for which predictions are desired. Each identifier distinguishes the individual data series within the dataset. For example, if you’re forecasting sales for multiple products, each product would have a unique identifier.
Target attribute name: This column represents the numerical values that you want to forecast. These could be sales figures, stock prices, energy usage amounts, and so on. It’s crucial that the data in this column is numeric because the forecasting models predict quantitative outcomes.
Timestamp attribute name: This column indicates the specific times when the observations were recorded. The timestamp is essential for analyzing the data in a chronological context, which is fundamental to time series forecasting. The timestamps should be in a consistent and appropriate format that reflects the regularity of your data (for example, daily or hourly).

All other columns in the dataset are optional and can be used to include additional time-series related information or metadata about each item. Therefore, your CSV file should have columns named according to the preceding attributes (item identifier, target, and timestamp) as well as any other columns needed to support your use case. For instance, if your dataset is about forecasting product demand, your CSV might look something like this (see the short example after the following list):

Product_ID (item identifier): Unique product identifiers.
Sales (target): Historical sales data to be forecasted.
Date (timestamp): The dates on which sales data was recorded.

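To make the schema concrete, the following minimal sketch builds a tiny example dataset with the columns used throughout this post (product_code, location_code, timestamp, unit_sales, and a promotional flag) and writes it to CSV. The rows and values are purely illustrative; replace the column names and data with your own.

import pandas as pd

# Purely illustrative rows in the shape AutoMLV2 expects for this use case.
sample = pd.DataFrame(
    {
        "product_code": ["P001", "P001", "P002", "P002"],
        "location_code": ["L01", "L01", "L01", "L01"],
        "timestamp": ["2024-02-04", "2024-02-11", "2024-02-04", "2024-02-11"],
        "unit_sales": [120, 135, 80, 95],
        "promo": [0, 1, 0, 0],
    }
)
sample.to_csv("sample_schema.csv", index=False)
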
The process of splitting the training and test data in this project uses a methodical and time-aware approach to ensure that the integrity of the time series data is maintained. Here’s a detailed overview of the process:
Ensuring timestamp integrity
The first step involves converting the timestamp column of the input dataset to a datetime format using pd.to_datetime. This conversion is crucial for sorting the data chronologically in subsequent steps and for ensuring that operations on the timestamp column are consistent and accurate.
Sorting the data
The sorted dataset is critical for time series forecasting, because it ensures that data is processed in the correct temporal order. The input_data DataFrame is sorted based on three columns: product_code, location_code, and timestamp. This multi-level sort guarantees that the data is organized first by product and location, and then chronologically within each product-location grouping. This organization is essential for the logical partitioning of data into training and test sets based on time.
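The following is a minimal sketch of these two steps, assuming the raw data has already been exported to a CSV file; the file name is a placeholder.

import pandas as pd

# Placeholder file name; load the raw data however it is stored in your project.
input_data = pd.read_csv("raw_sales_data.csv")

# Convert the timestamp column so it can be sorted and compared chronologically.
input_data["timestamp"] = pd.to_datetime(input_data["timestamp"])

# Sort by product, then location, then time so each series is in chronological order.
input_data = input_data.sort_values(
    by=["product_code", "location_code", "timestamp"]
).reset_index(drop=True)
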
Splitting into training and test sets
The splitting mechanism is designed to handle each combination of product_code and location_code separately, respecting the unique temporal patterns of each product-location pair. For each group:

The initial test set is determined by selecting the last eight timestamps (yellow + green below). This subset represents the most recent data points that are candidates for testing the model’s forecasting ability.
The final test set is refined by removing the last four timestamps from the initial test set, resulting in a test dataset that includes the four timestamps immediately preceding the latest data (green below). This strategy ensures the test set is representative of the near-future periods the model is expected to predict, while also leaving out the most recent data to simulate a realistic forecasting scenario.
The training set comprises the remaining data points, excluding the last eight timestamps (blue below). This ensures the model is trained on historical data that precedes the test period, avoiding any data leakage and ensuring that the model learns from genuinely past observations.

This process is visualized in the following figure with an arbitrary value on the Y axis and the days of February on the X axis.
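The following minimal sketch implements the split described above, building on the sorted input_data DataFrame from the previous snippet. The eight- and four-timestamp windows mirror the description, and the list names match those used in the next step.

train_dfs, test_dfs = [], []

# For each product-location series, hold out the last eight timestamps, then keep
# only the four timestamps immediately preceding the most recent four as the test set.
for (product, location), group in input_data.groupby(["product_code", "location_code"]):
    group = group.sort_values("timestamp")
    initial_test = group.tail(8)   # yellow + green: candidates for testing
    test = initial_test.head(4)    # green: drop the final four, keep the preceding four
    train = group.iloc[:-8]        # blue: everything before the held-out window
    train_dfs.append(train)
    test_dfs.append(test)
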

The test dataset is used to evaluate the performance of the trained model and compute various loss metrics, such as mean absolute error (MAE) and root-mean-squared error (RMSE). These metrics quantify the model’s accuracy in forecasting the actual values in the test set, providing a clear indication of the model’s quality and its ability to make accurate predictions. The evaluation process is detailed in the “Inference: Batch, real-time, and asynchronous” section, where we discuss the comprehensive approach to model evaluation and conditional model registration based on the computed metrics.
Creating and saving the datasets
After the data for each product-location group is categorized into training and test sets, the subsets are aggregated into comprehensive training and test DataFrames using pd.concat. This aggregation step combines the individual DataFrames stored in train_dfs and test_dfs lists into two unified DataFrames:

train_df for training data
test_df for testing data

Finally, the DataFrames are saved to CSV files (train.csv for training data and test.csv for test data), making them accessible for model training and evaluation processes. This saving step not only facilitates a clear separation of data for modelling purposes but also enables reproducibility and sharing of the prepared datasets.
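Continuing the sketch above, the aggregation and saving steps look roughly like this:

# Combine the per-group subsets into unified training and test DataFrames.
train_df = pd.concat(train_dfs, ignore_index=True)
test_df = pd.concat(test_dfs, ignore_index=True)

train_df.to_csv("train.csv", index=False)
test_df.to_csv("test.csv", index=False)
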
Summary
This data preparation strategy meticulously respects the chronological nature of time series data and ensures that the training and test sets are appropriately aligned with real-world forecasting scenarios. By splitting the data based on the last known timestamps and carefully excluding the most recent periods from the training set, the approach mimics the challenge of predicting future values based on past observations, thereby setting the stage for a robust evaluation of the forecasting model’s performance.
2. Training a model with AutoMLV2
SageMaker AutoMLV2 reduces the resources needed to train, tune, and deploy machine learning models by automating the heavy lifting involved in model development. It provides a straightforward way to create high-quality models tailored to your specific problem type, be it classification, regression, or forecasting, among others. In this section, we delve into the steps to train a time series forecasting model with AutoMLV2.
Step 1: Define the time series forecasting configuration
The first step involves defining the problem configuration. This configuration guides AutoMLV2 in understanding the nature of your problem and the type of solution it should seek, whether it involves classification, regression, time-series classification, computer vision, natural language processing, or fine-tuning of large language models. This versatility is crucial because it allows AutoMLV2 to adapt its approach based on the specific requirements and complexities of the task at hand. For time series forecasting, the configuration includes details such as the frequency of forecasts, the horizon over which predictions are needed, and any specific quantiles or probabilistic forecasts. Configuring the AutoMLV2 job for time series forecasting involves specifying parameters that would best use the historical sales data to predict future sales.
The AutoMLTimeSeriesForecastingConfig is a configuration object in the SageMaker AutoMLV2 SDK designed specifically for setting up time series forecasting tasks. Each argument provided to this configuration object tailors the AutoML job to the specifics of your time series data and the forecasting objectives.

# Requires a recent version of the SageMaker Python SDK
from sagemaker.automl.automlv2 import AutoMLV2, AutoMLDataChannel, AutoMLTimeSeriesForecastingConfig

time_series_config = AutoMLTimeSeriesForecastingConfig(
    forecast_frequency='W',
    forecast_horizon=4,
    forecast_quantiles=['p50', 'p60', 'p70', 'p80', 'p90'],
    # filling=filling_config,  # optionally pass a dict of fill strategies (see below)
    item_identifier_attribute_name='product_code',
    target_attribute_name='unit_sales',
    timestamp_attribute_name='timestamp',
    grouping_attribute_names=['location_code'],
)

The following is a detailed explanation of each configuration argument used in your time series configuration:

forecast_frequency

Description: Specifies how often predictions should be made.
Value ‘W’: Indicates that forecasts are expected on a weekly basis. The model will be trained to understand and predict data as a sequence of weekly observations. Valid intervals are an integer followed by Y (year), M (month), W (week), D (day), H (hour), and min (minute). For example, 1D indicates every day and 15min indicates every 15 minutes. The value of a frequency must not overlap with the next larger frequency. For example, you must use a frequency of 1H instead of 60min.

forecast_horizon

Description: Defines the number of future time-steps the model should predict.
Value 4: The model will forecast four time-steps into the future. Given the weekly frequency, this means the model will predict the next four weeks of data from the last known data point.

forecast_quantiles

Description: Specifies the quantiles at which to generate probabilistic forecasts.
Values [p50,p60,p70,p80,p90]: These quantiles represent the 50th, 60th, 70th, 80th, and 90th percentiles of the forecast distribution, providing a range of possible outcomes and capturing forecast uncertainty. For instance, the p50 quantile (median) might be used as a central forecast, while the p90 quantile provides a higher-end forecast, where 90% of the actual data is expected to fall below the forecast, accounting for potential variability.

filling

Description: Defines how missing data should be handled before training; specifying filling strategies for different scenarios and columns.
Value filling_config: This should be a dictionary detailing how to fill missing values in your dataset, such as filling missing promotional data with zeros or specific columns with predefined values. This ensures the model has a complete dataset to learn from, improving its ability to make accurate forecasts.

item_identifier_attribute_name

Description: Specifies the column that uniquely identifies each time series in the dataset.
Value ’product_code’: This setting indicates that each unique product code represents a distinct time series. The model will treat data for each product code as a separate forecasting problem.

target_attribute_name

Description: The name of the column in your dataset that contains the values you want to predict.
Value unit_sales: Designates the unit_sales column as the target variable for forecasts, meaning the model will be trained to predict future sales figures.

timestamp_attribute_name

Description: The name of the column indicating the time point for each observation.
Value ‘timestamp’: Specifies that the timestamp column contains the temporal information necessary for modeling the time series.

grouping_attribute_names

Description: A list of column names that, in combination with the item identifier, can be used to create composite keys for forecasting.
Value [‘location_code’]: This setting means that forecasts will be generated for each combination of product_code and location_code. It allows the model to account for location-specific trends and patterns in sales data.

The configuration provided instructs the SageMaker AutoML to train a model capable of weekly sales forecasts for each product and location, accounting for uncertainty with quantile forecasts, handling missing data, and recognizing each product-location pair as a unique series. This detailed setup aims to optimize the forecasting model’s relevance and accuracy for your specific business context and data characteristics.
Step 2: Initialize the AutoMLV2 job
Next, initialize the AutoMLV2 job by specifying the problem configuration, the AWS role with permissions, the SageMaker session, a base job name for identification, and the output path where the model artifacts will be stored.

automl_sm_job = AutoMLV2(
    problem_config=time_series_config,
    role=role,
    sagemaker_session=sagemaker_session,
    base_job_name='time-series-forecasting-job',
    output_path=f's3://{bucket}/{prefix}/output'
)

Step 3: Fit the model
To start the training process, call the fit method on your AutoMLV2 job object. This method requires specifying the input data’s location in Amazon S3 and whether SageMaker should wait for the job to complete before proceeding further. During this step, AutoMLV2 will automatically pre-process your data, select algorithms, train multiple models, and tune them to find the best solution.

automl_sm_job.fit(
    inputs=[AutoMLDataChannel(s3_data_type='S3Prefix', s3_uri=train_uri, channel_type='training')],
    wait=True,
    logs=True
)

Model fitting may take several hours, depending on the size of your dataset and your compute budget. A larger compute budget allows for more powerful instance types, which can accelerate the training process. If you’re not running this code as part of the provided SageMaker notebook (which handles the order of code cell processing correctly), you will need to implement custom code that monitors the training status before retrieving and deploying the best model, such as the polling sketch that follows.
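A minimal polling sketch using the DescribeAutoMLJobV2 API through boto3 might look like the following; replace the placeholder job name with the name of your AutoML job.

import time

import boto3

sm_client = boto3.client("sagemaker")
job_name = "your-auto-ml-job-name"  # replace with the name of your AutoML job

while True:
    response = sm_client.describe_auto_ml_job_v2(AutoMLJobName=job_name)
    status = response["AutoMLJobStatus"]
    if status in ("Completed", "Failed", "Stopped"):
        print(f"AutoML job finished with status: {status}")
        break
    print(f"AutoML job status: {status} ({response['AutoMLJobSecondaryStatus']})")
    time.sleep(60)
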
3. Deploying a model with AutoMLV2
Deploying a machine learning model into production is a critical step in your machine learning workflow, enabling your applications to make predictions from new data. SageMaker AutoMLV2 not only helps build and tune your models but also provides a seamless deployment experience. In this section, we’ll guide you through deploying your best model from an AutoMLV2 job as a fully managed endpoint in SageMaker.
Step 1: Identify the best model and extract name
After your AutoMLV2 job completes, the first step in the deployment process is to identify the best performing model, also known as the best candidate. This can be achieved by using the best_candidate method of your AutoML job object. You can either use this method immediately after fitting the AutoML job or specify the job name explicitly if you’re operating on a previously completed AutoML job.

# Option 1: Directly after fitting the AutoML job
best_candidate = automl_sm_job.best_candidate()

# Option 2: Specifying the job name directly
best_candidate = automl_sm_job.best_candidate(job_name='your-auto-ml-job-name')

best_candidate_name = best_candidate['CandidateName']

Step 2: Create a SageMaker model
Before deploying, create a SageMaker model from the best candidate. This model acts as a container for the artifacts and metadata necessary to serve predictions. Use the create_model method of the AutoML job object to complete this step.

endpoint_name = f"ep-{best_candidate_name}-automl-ts"

# Create a SageMaker model from the best candidate
automl_sm_model = automl_sm_job.create_model(name=best_candidate_name, candidate=best_candidate)

4. Inference: Batch, real-time, and asynchronous
For deploying the trained model, we explore batch, real-time, and asynchronous inference methods to cater to different use cases.
The following figure is a decision tree to help you decide what type of endpoint to use. The diagram outlines a decision-making process for selecting between batch, asynchronous, or real-time inference endpoints. Starting with the need for immediate responses, it guides you through considerations like the size of the payload and the computational complexity of the model. Depending on these factors, you can choose a faster option with lower computational requirements or a slower batch process for larger datasets.

Batch inference using SageMaker pipelines

Usage: Ideal for generating forecasts in bulk, such as monthly sales predictions across all products and locations.
Process: We used SageMaker’s batch transform feature to process a large dataset of historical sales data, outputting forecasts for the specified horizon.

The inference pipeline used for batch inference demonstrates a comprehensive approach to deploying, evaluating, and conditionally registering a machine learning model for time series forecasting using SageMaker. This pipeline is structured to ensure a seamless flow from data preprocessing, through model inference, to post-inference evaluation and conditional model registration. Here’s a detailed breakdown of its construction:

Batch transform step

Transformer Initialization: A Transformer object is created, specifying the model to use for batch inference, the compute resources to allocate, and the output path for the results.
Transform step creation: This step invokes the transformer to perform batch inference on the specified input data. The step is configured to handle data in CSV format, a common choice for structured time series data.

Evaluation step

Processor setup: Initializes an SKLearn processor with the specified role, framework version, instance count, and type. This processor is used for the evaluation of the model’s performance.
Evaluation processing: Configures the processing step to use the SKLearn processor, taking the batch transform output and test data as inputs. The processing script (evaluation.py) is specified here, which will compute evaluation metrics based on the model’s predictions and the true labels.
Evaluation strategy: We adopted a comprehensive evaluation approach, using metrics like mean absolute error (MAE) and root-mean-squared error (RMSE) to quantify the model’s accuracy and adjusting the forecasting configuration based on these insights.
Outputs and property files: The evaluation step produces an output file (evaluation_metrics.json) that contains the computed metrics. This file is stored in Amazon S3 and registered as a property file for later access in the pipeline. A minimal sketch of such an evaluation script follows this list.

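The following is a minimal sketch of what such an evaluation script (evaluation.py) could look like. The container paths follow the SageMaker Processing convention, while the column names, the choice of the p50 forecast, and the JSON layout are illustrative assumptions; in practice you would join predictions and actuals on product, location, and timestamp before computing the metrics.

# evaluation.py -- minimal sketch of the evaluation processing script.
import json

import numpy as np
import pandas as pd

# Input locations correspond to the ProcessingInput destinations configured in the pipeline.
predictions = pd.read_csv("/opt/ml/processing/input/predictions/test.csv.out")
actuals = pd.read_csv("/opt/ml/processing/input/test/test.csv")

# Compare the median (p50) forecast against the true unit_sales values.
y_true = actuals["unit_sales"].to_numpy()
y_pred = predictions["p50"].to_numpy()[: len(y_true)]

mae = float(np.mean(np.abs(y_true - y_pred)))
rmse = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

report = {"forecast_metrics": {"mae": {"value": mae}, "rmse": {"value": rmse}}}

with open("/opt/ml/processing/evaluation/evaluation_metrics.json", "w") as f:
    json.dump(report, f)
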
Conditional model registration

Model metrics setup: Defines the model metrics to be associated with the model package, including statistics and explainability reports sourced from specified Amazon S3 URIs.
Model registration: Prepares for model registration by specifying content types, inference and transform instance types, model package group name, approval status, and model metrics.
Conditional registration step: Implements a condition based on the evaluation metrics (for example, MAE). If the condition (for example, MAE is less than or equal to a threshold) is met, the model is registered; otherwise, the pipeline concludes without model registration.

Pipeline creation and runtime

Pipeline definition: Assembles the pipeline by naming it and specifying the sequence of steps to run: batch transform, evaluation, and conditional registration.
Pipeline upserting and runtime: The pipeline.upsert method is called to create or update the pipeline based on the provided definition, and pipeline.start() runs the pipeline. A condensed code sketch of the full pipeline follows this list.

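The following condensed sketch ties the steps above together. Step names, output locations, the MAE threshold, the model package group name, and the JSON path into evaluation_metrics.json are assumptions to adapt to your own artifacts; it also reuses objects defined earlier (role, sagemaker_session, bucket, prefix, best_candidate_name, and automl_sm_model) and assumes test_uri is the S3 URI of the test data.

from sagemaker.inputs import TransformInput
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.transformer import Transformer
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.conditions import ConditionLessThanOrEqualTo
from sagemaker.workflow.functions import JsonGet
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.properties import PropertyFile
from sagemaker.workflow.step_collections import RegisterModel
from sagemaker.workflow.steps import ProcessingStep, TransformStep

# Batch transform step: run inference on the test data with the AutoML model.
transformer = Transformer(
    model_name=best_candidate_name,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    accept="text/csv",
    output_path=f"s3://{bucket}/{prefix}/batch-output",
    sagemaker_session=sagemaker_session,
)
transform_step = TransformStep(
    name="BatchTransform",
    transformer=transformer,
    inputs=TransformInput(data=test_uri, content_type="text/csv"),
)

# Evaluation step: compute MAE/RMSE with the evaluation.py script sketched above.
evaluation_report = PropertyFile(
    name="EvaluationReport", output_name="evaluation", path="evaluation_metrics.json"
)
sklearn_processor = SKLearnProcessor(
    framework_version="1.2-1", role=role, instance_type="ml.m5.xlarge", instance_count=1
)
evaluation_step = ProcessingStep(
    name="EvaluateForecasts",
    processor=sklearn_processor,
    inputs=[
        ProcessingInput(
            source=transform_step.properties.TransformOutput.S3OutputPath,
            destination="/opt/ml/processing/input/predictions",
        ),
        ProcessingInput(source=test_uri, destination="/opt/ml/processing/input/test"),
    ],
    outputs=[ProcessingOutput(output_name="evaluation", source="/opt/ml/processing/evaluation")],
    code="evaluation.py",
    property_files=[evaluation_report],
)

# Conditional registration: register the model only if MAE is at or below the threshold.
register_step = RegisterModel(
    name="RegisterForecastModel",
    model=automl_sm_model,
    content_types=["text/csv"],
    response_types=["text/csv"],
    inference_instances=["ml.m5.xlarge"],
    transform_instances=["ml.m5.xlarge"],
    model_package_group_name="ts-forecast-models",
    approval_status="PendingManualApproval",
)
condition_step = ConditionStep(
    name="CheckMAE",
    conditions=[
        ConditionLessThanOrEqualTo(
            left=JsonGet(
                step_name=evaluation_step.name,
                property_file=evaluation_report,
                json_path="forecast_metrics.mae.value",
            ),
            right=100.0,  # example threshold; tune for your data
        )
    ],
    if_steps=[register_step],
    else_steps=[],
)

pipeline = Pipeline(
    name="ts-batch-inference-pipeline",
    steps=[transform_step, evaluation_step, condition_step],
    sagemaker_session=sagemaker_session,
)
pipeline.upsert(role_arn=role)
execution = pipeline.start()
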
The following figure is an example of the SageMaker Pipeline directed acyclic graph (DAG).

This pipeline effectively integrates several stages of the machine learning lifecycle into a cohesive workflow, showcasing how Amazon SageMaker can be used to automate the process of model deployment, evaluation, and conditional registration based on performance metrics. By encapsulating these steps within a single pipeline, the approach enhances efficiency, ensures consistency in model evaluation, and streamlines the model registration process—all while maintaining the flexibility to adapt to different models and evaluation criteria.
Inference with an Amazon SageMaker endpoint in (near) real time
But what if you want to run inference in real-time or asynchronously? SageMaker real-time endpoint inference offers the capability to deliver immediate predictions from deployed machine learning models, crucial for scenarios demanding quick decision making. When an application sends a request to a SageMaker real-time endpoint, it processes the data in real time and returns the prediction almost immediately. This setup is optimal for use cases that require near-instant responses, such as personalized content delivery, immediate fraud detection, and live anomaly detection.

Usage: Suited for on-demand forecasts, such as predicting next week’s sales for a specific product at a particular location.
Process: We deployed the model as a SageMaker endpoint, allowing us to make real-time predictions by sending requests with the required input data.

Deployment involves specifying the number of instances and the instance type to serve predictions. This step creates an HTTPS endpoint that your applications can invoke to perform real-time predictions.

# Deploy the model to a SageMaker endpoint
predictor = automl_sm_model.deploy(initial_instance_count=1, endpoint_name=endpoint_name, instance_type='ml.m5.xlarge')

The deployment process is asynchronous, and SageMaker takes care of provisioning the necessary infrastructure, deploying your model, and ensuring the endpoint’s availability and scalability. After the model is deployed, your applications can start sending prediction requests to the endpoint URL provided by SageMaker.
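For example, a request to the endpoint using the AWS SDK for Python (Boto3) might look like the following. The CSV payload shown is illustrative; the rows must follow the same schema as the training data, so check the expected input format for your model before sending real traffic.

import boto3

runtime = boto3.client("sagemaker-runtime")

# Illustrative payload: rows follow the training schema (product_code, location_code,
# timestamp, unit_sales). Adjust the columns and values to your own data.
payload = (
    "product_code,location_code,timestamp,unit_sales\n"
    "P001,L01,2024-02-25,118\n"
)

response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="text/csv",
    Accept="text/csv",
    Body=payload,
)
print(response["Body"].read().decode("utf-8"))
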
While real-time inference is suitable for many use cases, there are scenarios where a slightly relaxed latency requirement can be beneficial. SageMaker Asynchronous Inference provides a queue-based system that efficiently handles inference requests, scaling resources as needed to maintain performance. This approach is particularly useful for applications that require processing of larger datasets or complex models, where the immediate response is not as critical.

Usage: Examples include generating detailed reports from large datasets, performing complex calculations that require significant computational time, or processing high-resolution images or lengthy audio files. This flexibility makes it a complementary option to real-time inference, especially for businesses that face fluctuating demand and seek to maintain a balance between performance and cost.
Process: The process of using asynchronous inference is straightforward yet powerful. Users submit their inference requests to a queue, from which SageMaker processes them sequentially. This queue-based system allows SageMaker to efficiently manage and scale resources according to the current workload, ensuring that each inference request is handled as promptly as possible (see the deployment sketch that follows).

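A minimal sketch of deploying and invoking an asynchronous endpoint is shown below. The endpoint name suffix, S3 locations, and instance type are assumptions, and the request payload must already be uploaded to S3.

import boto3
from sagemaker.async_inference import AsyncInferenceConfig

# Deploy the same model behind an asynchronous endpoint; results land in the output path.
async_predictor = automl_sm_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    endpoint_name=f"{endpoint_name}-async",
    async_inference_config=AsyncInferenceConfig(
        output_path=f"s3://{bucket}/{prefix}/async-output"
    ),
)

# Invoke by pointing at a payload already uploaded to S3; the response contains the
# S3 location where the prediction will appear once the request is processed.
runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint_async(
    EndpointName=f"{endpoint_name}-async",
    InputLocation=f"s3://{bucket}/{prefix}/async-input/request.csv",
    ContentType="text/csv",
)
print(response["OutputLocation"])
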
Clean up
To avoid incurring unnecessary charges and to tidy up resources after completing the experiments or running the demos described in this post, follow these steps to delete all deployed resources:

Delete the SageMaker endpoints: To delete any deployed real-time or asynchronous endpoints, use the SageMaker console or the AWS SDK. This step is crucial as endpoints can accrue significant charges if left running.
Delete the SageMaker Pipeline: If you have set up a SageMaker Pipeline, delete it to ensure that there are no residual executions that might incur costs.
Delete S3 artifacts: Remove all artifacts stored in your S3 buckets that were used for training, storing model artifacts, or logging. Ensure you delete only the resources related to this project to avoid data loss.
Clean up any additional resources: Depending on your specific implementation and additional setup modifications, there may be other resources to consider, such as roles or logs. Check your AWS Management Console for any resources that were created and delete them if they are no longer needed.

Conclusion
This post illustrates the effectiveness of Amazon SageMaker AutoMLV2 for time series forecasting. By carefully preparing the data, thoughtfully configuring the model, and using both batch and real-time inference, we demonstrated a robust methodology for predicting future sales. This approach not only saves time and resources but also empowers businesses to make data-driven decisions with confidence.
If you’re inspired by the possibilities of time series forecasting and want to experiment further, consider exploring the SageMaker Canvas UI. SageMaker Canvas provides a user-friendly interface that simplifies the process of building and deploying machine learning models, even if you don’t have extensive coding experience.
Visit the SageMaker Canvas page to learn more about its capabilities and how it can help you streamline your forecasting projects. Begin your journey towards more intuitive and accessible machine learning solutions today!

About the Authors
Nick McCarthy is a Senior Machine Learning Engineer at AWS, based in London. He has worked with AWS clients across various industries including healthcare, finance, sports, telecoms and energy to accelerate their business outcomes through the use of AI/ML. Outside of work he loves to spend time travelling, trying new cuisines and reading about science and technology. Nick has a Bachelors degree in Astrophysics and a Masters degree in Machine Learning.
Davide Gallitelli is a Senior Specialist Solutions Architect for AI/ML in the EMEA region. He is based in Brussels and works closely with customers throughout Benelux. He has been a developer since he was very young, starting to code at the age of 7. He started learning AI/ML at university, and has fallen in love with it since then.

Automate user on-boarding for financial services with a digital assistant

In this post, we present a solution that harnesses the power of generative AI to streamline the user onboarding process for financial services through a digital assistant. Onboarding new customers in the banking industry is a crucial step in the customer journey, involving a series of activities designed to fulfill know your customer (KYC) requirements, conduct necessary verifications, and introduce them to the bank’s products or services. Traditionally, customer onboarding has been a tedious and heavily manual process. Our solution provides practical guidance on addressing this challenge by using a generative AI assistant on AWS.
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. Using Anthropic’s Claude 3.5 Sonnet on Amazon Bedrock, we build a digital assistant that automates document processing, identity verifications, and engages customers through conversational interactions. As a result, customers can be onboarded in a matter of minutes through secure, automated workflows. In this post we provide you a solution and the accompanying code that banks can use to dramatically enhance the customer experience and establish a strong customer relationship from the outset.
Challenges with traditional onboarding
The traditional onboarding process for banks faces challenges in the current digital landscape because many institutions don’t have fully automated account-opening systems. While customers in other sectors have access to intelligent assistants, those in banking often encounter legacy processes. As the financial services industry adapts to changing consumer expectations, there’s a need to address the demand for instant and 24/7 availability of services.
The challenges associated with the manual onboarding process include, but aren’t limited to, the following:

Time-consuming paperwork – New customers are asked to manually fill out extensive paperwork including account opening forms, disclosures, and so on. Reviewing physical documents also takes up valuable staff time. This lengthy paperwork process can result in slow onboarding and a poor customer experience.
Security risks – Paper documents and in-person ID verification lack security compared to digital processes because of their susceptibility to tampering, loss, and lack of traceability. For example, there’s a greater risk of identity theft and fraud with physical documents, because they can be altered or misplaced without leaving an audit trail.
Accessibility issues – Requiring in-person account opening at branches can create accessibility challenges for many customers, including senior citizens and disabled individuals.
Limited service hours – The account opening process is available only during branch operating hours, which limits the timeframe when customers can complete the onboarding process. This constraint impacts the flexibility for customers to initiate account opening at their preferred time.
High costs – Manual paperwork processing and in-person verification are labor-intensive tasks that require significant staff time and resources, leading to high operational costs.

AI-powered services enable automated, secure, and compliant processes for self-service account opening. Providing onboarding experiences aligned with current digital standards might offer a competitive edge for banks in the future.
Solution overview
The solution allows users to open bank accounts remotely through a conversational interface, eliminating the need to visit a physical branch. We created a digital assistant named Penny to guide users through the process, including uploading KYC documents and facilitating identity verification using document scanning and facial recognition. The approach uses Retrieval Augmented Generation (RAG), which combines text generation capabilities with database querying to provide contextually relevant responses to customer inquiries. Implementing digital onboarding reduces the accessibility barriers present in traditional manual account opening processes. The code for this solution is available in a GitHub repository.
The brain of our application is a custom LangChain Agent. When a user wants to open a new bank account, the agent will help them complete the onboarding process using preconfigured stages corresponding to each onboarding step. Each stage might use a LangChain tool, allowing for the automation and orchestration of onboarding. These tools call on AWS service APIs for the required functionality.
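As an illustration of the pattern, the following sketch shows how a tool such as EmailValidation could be defined with LangChain; the API Gateway URL, request body, and response fields are placeholders rather than the solution’s actual API.

import requests
from langchain.agents import Tool

# Placeholder URL for the API Gateway endpoint that checks DynamoDB for an existing account.
EMAIL_VALIDATION_URL = "https://example.execute-api.us-east-1.amazonaws.com/prod/validate-email"


def validate_email(email: str) -> str:
    """Call the email-validation backend and summarize the result for the agent."""
    response = requests.post(EMAIL_VALIDATION_URL, json={"email": email}, timeout=10)
    response.raise_for_status()
    if response.json().get("exists"):
        return "An account already exists for this email."
    return "Email is valid and no existing account was found."


email_validation_tool = Tool(
    name="EmailValidation",
    func=validate_email,
    description="Validates a customer email and checks whether an account already exists.",
)
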
The following figure represents the high-level architecture of the proposed solution.

The flow of the application is as follows:

Users access the frontend website hosted within AWS Amplify. AWS Amplify is an end-to-end solution that enables frontend web developers to build and deploy secure, scalable full stack applications.
The website invokes an Amazon CloudFront endpoint to interact with the digital assistant, Penny, which is containerized and deployed in AWS Fargate. Fargate is a serverless compute engine for containers that manages and scales your containers for you, compatible with Amazon Elastic Container Service (Amazon ECS).
The digital assistant uses a custom LangChain agent to answer questions on the bank’s products and services and orchestrate the onboarding flow.
If the user asks a general question related to the bank’s products or service, the agent will use a custom LangChain tool called ProductSearch. This tool uses Amazon Kendra linked with an Amazon Simple Storage Service (Amazon S3) data source that contains the bank’s data. Amazon Kendra is an intelligent enterprise search service powered by machine learning that enables companies to index and search content across their document stores.
If the user indicates that they want to open a new account, the agent will prompt the user for their email. After the user responds, the application will invoke a custom LangChain tool called EmailValidation. This tool checks if there is an existing account in the bank’s Amazon DynamoDB database by calling an endpoint deployed in Amazon API Gateway.
After the email validation, KYC information is gathered, such as first and last name. Then, the user is prompted for an identity document, which is uploaded to Amazon S3.
The agent will invoke a custom LangChain tool called IDVerification. This tool checks if the user details entered during the session match the ID by calling an endpoint deployed in Amazon API Gateway. The details are verified by extracting the document text using Amazon Textract, a machine learning (ML) service that automatically extracts text, handwriting, layout elements, and data from scanned documents.
After the ID verification, the user is asked for a selfie. The image is uploaded to Amazon S3. Then, the agent will invoke a custom LangChain tool called SelfieVerification. This tool checks if the uploaded selfie matches the face on the ID by calling an endpoint deployed in API Gateway. The face match is detected using Amazon Rekognition, which offers pre-trained and customizable computer vision (CV) capabilities to extract information and insights from your images and videos (a minimal sketch of this face comparison appears after this list).
After the face verification is successful, the agent will use a custom LangChain tool called SaveData. This tool creates a new account in the bank’s DynamoDB database by calling an endpoint deployed in API Gateway.
The user is notified that their new account has been created successfully, using Amazon Simple Email Service (Amazon SES).

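As an example of how one of these backends can be implemented, the following sketch shows the face comparison behind the SelfieVerification tool using Amazon Rekognition; the bucket, object keys, and similarity threshold are placeholders.

import boto3

rekognition = boto3.client("rekognition")


def faces_match(bucket: str, id_document_key: str, selfie_key: str, threshold: float = 90.0) -> bool:
    """Return True if the uploaded selfie matches the face on the identity document."""
    response = rekognition.compare_faces(
        SourceImage={"S3Object": {"Bucket": bucket, "Name": id_document_key}},
        TargetImage={"S3Object": {"Bucket": bucket, "Name": selfie_key}},
        SimilarityThreshold=threshold,
    )
    return len(response.get("FaceMatches", [])) > 0
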
Prompt design for agent orchestration
Now, let’s take a look at how we give our digital assistant, Penny, the capability to handle onboarding for financial services. The key is the prompt engineering for the custom LangChain agent. This has been specified in PennyAgent.py. This prompt includes onboarding stages and relevant LangChain tools that the agent might need to complete the onboarding steps.
To begin, we provide the agent with a name, role and company.

AGENT_TOOLS_PROMPT = """
Never forget your name is {assistant_name}. You work as a {assistant_role}.
You work at company named {bank_name}

Next, we define the various stages of onboarding and specify the respective tools and expected responses. Having stages in a sequential and structured format while also providing awareness of all possible stages helps the agent determine the onboarding stage with accuracy.

<STAGES>

These are the stages:

Introduction or greeting: When conversation history is empty, choose stage 1
Response: Start the conversation with a greeting. Say that you can help with {bank_name} related questions or open a bank account for them. Do this only during the start of the conversation.
Tool:

General Banking Questions: Customer asks general questions about AnyBank
Response: Use ProductSearch tool to get the relevant information and answer the question like a banking assistant. Never assume anything.
Tool: ProductSearch

Account Open 1: Customer has requested to open an account.
Response: Customer has requested to open an account. Now, respond with a question asking for the customer’s email address only to get them started with onboarding. We need the email address to start the process.
Tool:

Account Open 2: User provided their email.
Response: Take the email and validate it using a EmailValidation tool. If it is valid and there is no existing account with the email, ask for account type: either CHEQUING or SAVINGS. If it is invalid or there is an existing account with the email, the user must try again.
Tool: EmailValidation

Account Open 3: User provided which account type to open.
Response: Ask the user for their first name
Tool:

Account Open 4: User provided first name.
Response: Ask the user for their last name
Tool:

Account Open 5: User provided last name.
Response: Ask the user to upload an identity document.
Tool:

Account Open 6: Penny asked for identity document and then System notified that a new file has been uploaded
Response: Take the identity file name and verify it using the IDVerification tool. If the verification is unsuccessful, ask the user to try again.
Tool: IDVerification

Account Open 7: The ID document is valid.
Response: Ask the user to upload their selfie to compare their face to the ID.
Tool:

Account Open 8: Penny asked user for their selfie and then “System notified that a file has been uploaded. ”
Response: Take the “selfie” file name and verify it using the SelfieVerification tool. If there is no face match, ask the user to try again.
Tool: SelfieVerification: Use this tool to verify the user selfie and compare faces.

Account Open 9: Face match verified
Response: Give the summary of the all the information you collected and ask user to confirm.
Tool:

Account Open 10: Confirmation
Response: Save the user data for future reference using SaveData tool. Upon saving the data, let the user know that they will receive an email confirmation of the bank account opening.
Tool: SaveData

We append the tools, their descriptions, and their response formats to the prompt. When calling on a specific tool, the agent can generate input parameters as required. Access to all the tools helps the agent identify the best tool choice based on the conversation stage.

TOOLS:
——
Penny has access to the following tools:
{tools}

We include some guidelines that the agent needs to follow while generating outputs. By using emotion-based prompt engineering, we minimize hallucinations and deviation from expected outputs. These guidelines were chosen after extensive testing to minimize edge cases and help prevent common agent mistakes.

<GUIDELINES>

1. If you ever assume any user response without asking, it may cause significant consequences.
2. It is of high priority that you respond and use appropriate tools in their respective stages. If not, it may cause significant consequences.
3. It is of high priority that you never reveal the tools or tool names to the user. Only communicate the outcome.
4. It is critical that you never reveal any details provided by the System including file names.
5. If ever the user deviates by asking general question during your account opening process, Retrieve the necessary information using ‘ProductSearch’ tool and answer the question. With confidence, ask user if they want to resume the account opening process and continue from where we left off.

The agent uses the ReAct framework to make decisions about how to respond based on user input. ReAct provides the agent with a thinking structure, through which it selects the most appropriate tool for a given task. Such frameworks make LLM agents versatile and adaptable to different use cases.
Based on the stage descriptions and the tools available, if the LLM generates a response that requires access to an external tool, the response will include Thought, Decision, Action, Action Input, and Observation. The agent comes with a string matcher, which detects Action and Action Input in the LLM’s response and triggers the respective tool. Based on the response from the tool, the LLM will decide whether to proceed with the Final Answer, and the output will then be returned by the agent (a minimal sketch of such a parser follows the format excerpt).

FORMAT:
——

To use a tool, please always use the following format:
```
Thought: {input}
Decision: Do I need to use a tool? y
Action: what tool to use, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
```
When I am finished, I will have a response like this:

Final Answer: [your response as a banking assistant]

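A minimal sketch of such a parser, written as a custom LangChain output parser, is shown below; the class name and regular expression are illustrative rather than the exact implementation in PennyAgent.py.

import re
from typing import Union

from langchain.agents import AgentOutputParser
from langchain.schema import AgentAction, AgentFinish


class PennyOutputParser(AgentOutputParser):
    def parse(self, llm_output: str) -> Union[AgentAction, AgentFinish]:
        # If the model produced a final answer, return it to the user and stop.
        if "Final Answer:" in llm_output:
            return AgentFinish(
                return_values={"output": llm_output.split("Final Answer:")[-1].strip()},
                log=llm_output,
            )
        # Otherwise, extract the tool name and its input from the ReAct-style response.
        match = re.search(r"Action:\s*(.*?)\nAction Input:\s*(.*)", llm_output, re.DOTALL)
        if not match:
            raise ValueError(f"Could not parse agent output: {llm_output}")
        return AgentAction(
            tool=match.group(1).strip(),
            tool_input=match.group(2).strip(),
            log=llm_output,
        )
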
Finally, we give the agent access to the conversation history to better decide what stage the conversation is currently in. In addition, we also give access to an agent scratchpad where it can store its thought processes to execute certain actions.

Be confident that you are a banking assistant and only respond with final answer.
Begin!

<Conversation history>
{conversation_history}

{agent_scratchpad}

Orchestrating intelligent digital assistants requires thoughtful prompt engineering to handle complex tasks. By structuring the conversation into stages, providing tooling, and setting guidelines, we enable the assistant to systematically complete the onboarding process. This approach allows assistants to scale across use cases while maintaining accuracy. With the right guardrails, assistants can deliver smooth, trustworthy customer experiences.
Prompt design is key to unlocking the versatility of LLMs for real-world automation. Amazon Bedrock Prompt Management can be used to streamline the creation, evaluation, versioning and testing of prompts. This will help developers and prompt engineers save time by applying the same prompt to different onboarding processes. When you create a prompt, you can select a different model for inference and adjust the variables to obtain the best-suited results for a variety of workflows.
The following sections explain how to deploy the solution in your AWS account.
Note: Running this workload would have an estimated hourly cost of $1.34 for the Oregon (us-west-2) AWS Region. Check the pricing details for each service to understand the costs you might be charged for different usage tiers and resource configurations.
Setup
To deploy the agent, visit the project GitHub repository and use the following instructions:

Ensure the prerequisites are completed as described in the README.
Deploy the solution including the agent, tools infrastructure, and demo application—in that order—based on the instructions in the README.
After the deployment is successful, visit the outputted domain where the demo application is running. You can now begin testing the agent.

Testing the agent
Begin your exploration by accessing the Amplify endpoint where the demonstration is hosted. The demonstration incorporates an interactive chat interface, enabling you to engage in a conversational exchange with the digital assistant, Penny. Whenever you want to initiate a new instance of the agent, refresh the web page.
Let’s start talking to Penny:

Enter Hi

Penny will respond with a friendly greeting

Enter What are the cutoff times to receive wire transfers on the same day?

Penny will use the ProductSearch tool to find the relevant information from the loaded product catalog. You can try asking other questions about the bank’s product or services including the AnyBank Travel Rewards Visa Infinite Card or New Vehicle Loans.

Enter I would like to open a new bank account

Penny will recognize that the account opening flow needs to be initiated and will proceed with the first step, which is asking you for an email address.

Enter the verified customer email you registered with the Amazon SES identity. For our demonstration, we will use anup@test.com (the SesCustomerEmail parameter used in the example command to set up the infrastructure)

Penny will take the email address and run the EmailValidation Tool. If there is an existing account with this email, it will ask you to retry. Otherwise, it will move on to the next step which is gathering your account type.

Enter I want a savings account or indicate that you want a checking account.

Penny will record your account type and move on to the KYC questions.

Enter Anup

Penny will record your first name and continue gathering the remaining KYC information.

Enter Ravi

It will record your last name and prompt you for an ID next. We used Ravi to match the ID document provided below.

Download the picture ID. It’s also located at ./api/lambdas/test/passport.png

Upload it to the chat by selecting Choose File.
After uploading the image, you will receive a confirmation message on the chat stating We have received your document. Penny will use ID Verification to compare the name entered during the session to the document. After verification is complete, Penny will prompt you to upload a selfie.

Upload the selfie located at ./api/lambdas/test/selfie.png to the chat by selecting Choose File.

After the upload is complete, you will receive a confirmation message on the chat stating We have received your document. Penny will use Selfie Verification to compare the face on the ID to the selfie for a face match. After verification is complete, Penny will prompt you to confirm that you want to proceed.

Enter Yes I confirm

Penny will use Create Account to complete the onboarding process and send an email confirmation. It will inform you of this update in the chat.

Check the customer email you used. The email address specified as the SesCustomerEmail parameter (in this example: anup@test.com) during setup will receive a new email from the email address you set as the SesBankEmail parameter (in this example: owner@anybank.com).

Go to the DynamoDB console, choose Tables in the navigation pane, and select the table created by the AWS CloudFormation stack. This is the accounts table in the bank’s AWS account. On the Table page, choose Explore items. You will see a new account created with the details that you entered.

Guardrails and security
Security is a critical part of any application and must be rigorously addressed when developing and deploying solutions, especially those that involve handling sensitive data or interacting with users. For a solution similar to the example in this post, several robust security measures should be implemented to maintain the confidentiality, integrity, and availability of the system.

Address the security of the service itself. One approach to mitigate potential biases, toxicity, or other undesirable outputs is to use Constitutional AI techniques, such as those provided by the LangChain library or Guardrails for Amazon Bedrock. By defining and enforcing a set of rules or constraints, the system can be trained to generate outputs that align with predefined ethical principles and values, thereby enhancing the trustworthiness and reliability of the service.
To maintain data protection and privacy, implementing a write-only database architecture is recommended. In this setup, the agent or service can write data to the database but is prohibited from reading or retrieving sensitive stored information. This measure effectively isolates sensitive user data, making sure that the agent would be unable to access or disclose confidential details even in the event of a compromise.
Prompt injection attacks, where malicious inputs are crafted to manipulate the system’s behavior, are a serious concern in conversational AI systems. To mitigate this risk, it’s crucial to implement robust input validation and sanitization mechanisms. This could include techniques like whitelisting permissible characters, filtering out potentially harmful patterns, and employing context-aware input processing.
Secure coding practices, such as input validation, output encoding, and proper error handling, should be rigorously followed throughout the development process. Regular security audits, penetration testing, and vulnerability assessments should be conducted to identify and address potential weaknesses in the system.
Amazon API Gateway, a fully managed service, securely handles API traffic, acting as a front door for applications running on AWS. It supports multiple security mechanisms, including AWS Identity and Access Management (IAM) for authentication and authorization, AWS WAF for web application protection, AWS Secrets Manager for securely storing and retrieving secrets, and integration with AWS CloudTrail for API activity logging. API Gateway also supports client-side SSL certificates, API keys, and resource policies for granular access control.
Communication between users, the solution, and its internal dependencies should be protected using TLS to encrypt data in transit.
Additionally, the data should be encrypted using data-at-rest encryption with AWS Key Management Service (AWS KMS) customer managed keys (CMK).

By implementing these robust security measures and fostering a culture of continuous security awareness and improvement, the solution can better protect against potential threats, safeguard user privacy, and maintain the integrity and reliability of the service.
Cleanup
Follow the cleanup instructions in the README of the GitHub repository to remove the environment from your account.
Conclusion
In this post, we presented an end-to-end solution that demonstrates how banks can transform user onboarding with an AI-powered digital assistant. By orchestrating workflows across AWS services, we enabled automated, secure account opening within minutes. The conversational interface delivers exceptional customer experiences while reducing operational costs.
This solution can be quickly deployed and enhanced using the features of Amazon Bedrock. Amazon Bedrock Agents streamlines workflows by executing multistep tasks and integrating with company systems and data sources. Amazon Bedrock Knowledge Bases provides contextual information from proprietary data sources, enhancing the accuracy and relevance of responses. Additionally, Amazon Bedrock Guardrails implements safeguards to enable responsible AI usage, filtering harmful content and protecting sensitive information. These can enable a robust and secure deployment of an AI-powered onboarding solution.
Key outcomes of this solution include:

Fully digital onboarding without paper forms or branch visits
Automated KYC verification using documents and facial recognition
Customers onboarded securely in minutes with email confirmations
Lower costs by reducing manual verification workloads
Personalized assistance for any product questions 24/7

Instant, secure, and scalable delivery has become the norm that customers demand. This AI assistant solution, powered by AWS, showcases the potential future of user onboarding for financial institutions. As consumer behaviors and expectations continue to be influenced by the latest digital experiences across industries, banks that invest in advanced technologies will gain a competitive edge over their rivals.
Ready to future-proof your banking experience? Visit Artificial Intelligence and Machine learning for Financial services with AWS.

About the authors
Anup Ravindranath is a Senior Solutions Architect at Amazon Web Services (AWS) based in Toronto, Canada working with Financial Services organizations. He helps customers to transform their businesses and innovate on cloud.
Arya Subramanyam is a Solutions Architect based in Toronto, Canada. She works with Enterprise Greenfield customers as well as Small & Medium businesses as a technical advisor, helping them solve business challenges with cloud solutions. Arya holds a Bachelor of Applied Science in Computer Engineering from the University of British Columbia, Vancouver. Her passion for Generative AI has led her to develop various solutions leveraging Large Language Models (LLMs) with a focus on prompt engineering and AI agents.
Venkata Satyanarayana Chivatam is a Solutions Architect at AWS. He specializes in Generative AI and Computer Vision, with a particular focus on driving adoption across industries such as healthcare and finance. At AWS, he helps ISV and SMB customers leverage cutting-edge AI technologies to unlock new possibilities and solve complex challenges. He is passionate about supporting businesses of all sizes in their AI journey.
Akshata Ramesh Rao is a Solutions Architect in Toronto, Canada. Akshata works with enterprise customers to accelerate innovation and advise them through technical challenges. She also loves working with SMB customers, helping them reach their business objectives quickly, safely, and cost-effectively with AWS services, frameworks, and best practices. Prior to joining AWS, Akshata worked as a DevOps engineer at Amazon and holds a master’s degree in computer science from the University of Ottawa.

NVIDIA AI Releases OpenMathInstruct-2: A Math Instruction Tuning Dataset with 14M Problem-Solution Pairs Generated Using the Llama3.1-405B-Instruct Model

Language models have made significant strides in mathematical reasoning, with synthetic data playing a crucial role in their development. However, the field faces significant challenges due to the closed-source nature of the largest math datasets. This lack of transparency raises concerns about data leakage and erodes trust in benchmark results, as evidenced by performance drops when models are tested on unpublished, distributionally similar sets. Also, it hinders practitioners from fully comprehending the impact of data composition and algorithmic choices. While open-source alternatives exist, they often come with restrictive licenses or limitations in question diversity and difficulty levels. These issues collectively impede progress and broader application of mathematical reasoning capabilities in language models.

Several datasets have been developed to enhance the mathematical reasoning abilities of language models. NuminaMath and Skywork-MathQA offer large collections of competition-level problems with chain-of-thought annotations and diverse augmentation techniques. MuggleMath focuses on complicating and diversifying queries, while MetaMathQA employs bootstrapping and advanced reasoning techniques. MAmmoTH2 introduced an efficient method for extracting instruction data from pre-training web corpora. Other approaches have expanded existing datasets like MATH and GSM8K, significantly improving model accuracy.

Tool-integrated methods have gained prominence, with the Program of Thoughts (PoT) approach combining text and programming language statements for problem-solving. Building on this concept, datasets like OpenMathInstruct-1 and InfinityMATH have been created, focusing on code-interpreter solutions and programmatic mathematical reasoning. These diverse approaches aim to address the limitations of earlier datasets by increasing question diversity, difficulty levels, and reasoning complexity.

The approach proposed by the researchers from NVIDIA builds upon previous approaches, utilizing chain-of-thought-based solutions and question augmentation to create a robust dataset. However, it introduces several key innovations that set it apart from existing work. Firstly, the method employs open-weight models instead of proprietary closed-source language models, enabling the release of the dataset under a permissive license. This approach enhances accessibility and transparency in the field. Secondly, it provides new insights into critical aspects of dataset creation, including the impact of low-quality data, the effectiveness of on-policy training, and the design of solution formats. Lastly, the method ensures result accuracy through a comprehensive decontamination process, utilizing an LLM-based pipeline capable of detecting rephrased variations of test set questions, thus addressing concerns about data leakage and benchmark validity.

OpenMathInstruct-2 utilizes the Llama3.1 family of models to generate synthetic math instruction tuning data. The approach is refined through careful ablation studies on the MATH dataset, revealing several key insights. The proposed chain-of-thought solution format outperforms Llama’s format by 3.9% while being 40% shorter. Data generated by a strong teacher model surpasses on-policy data from a weaker student model by 7.8%. The method demonstrates robustness to up to 20% of low-quality data, and increasing question diversity significantly improves performance.

The dataset is created using Llama-3.1-405B-Instruct to synthesize solutions for existing MATH and GSM8K questions and generate new question-solution pairs. A thorough decontamination process, including the lm-sys pipeline and manual inspection, ensures test set integrity. The resulting dataset comprises 14 million question-solution pairs, including 592,000 synthesized questions, making it about eight times larger than previous open-source datasets. The effectiveness of OpenMathInstruct-2 is demonstrated by the superior performance of fine-tuned models, with OpenMath2-Llama3.1-8B outperforming Llama3.1-8B-Instruct by 15.9% on the MATH benchmark.
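For readers who want to inspect the released data, a minimal sketch using the Hugging Face datasets library might look like the following; the dataset identifier and the streaming-based iteration are assumptions based on the public release described above, so check the dataset card for the exact name and schema.

# Minimal sketch: browsing OpenMathInstruct-2 with the Hugging Face datasets library.
# The dataset identifier and field names are assumptions; consult the dataset card for the exact schema.
from datasets import load_dataset

ds = load_dataset("nvidia/OpenMathInstruct-2", split="train", streaming=True)  # assumed dataset ID

for i, example in enumerate(ds):
    print(example)  # one question-solution pair (field names per the dataset card)
    if i == 2:      # inspect just the first few records
        break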

OpenMathInstruct-2 demonstrates impressive results across various mathematical reasoning benchmarks. Training details involve using the AdamW optimizer with specific learning rates and weight decay. The 8B model is trained on different subsets of the dataset to understand data scaling effects, while the 70B model is trained on a 5M subset due to computational constraints. Evaluation is conducted on a comprehensive set of benchmarks, including GSM8K, MATH, AMC 2023, AIME 2024, and OmniMATH, covering a wide range of difficulty levels.

The impact of data scaling shows consistent performance gains, with even the 1M subset outperforming Llama3.1-8B-Instruct and NuminaMath-7B-CoT. The OpenMath2-Llama3.1-8B model, trained on the full dataset, outperforms or matches Llama3.1-8B-Instruct across all benchmarks. Among open-source models, it surpasses the recently released NuminaMath-7B-CoT. The 70B model shows improvements on a subset of benchmarks, suggesting that the data blend or solution format might be more suitable for smaller models. Overall, the results demonstrate the effectiveness of the OpenMathInstruct-2 method in enhancing the mathematical reasoning capabilities of language models.

The OpenMathInstruct-2 project makes significant contributions to open-source progress in mathematical reasoning for language models. By releasing a comprehensive dataset, high-performing models, and reproducible code, it advances the field’s understanding of effective dataset construction. The research reveals crucial insights: the importance of optimized chain-of-thought formats, the limitations of on-policy data for supervised fine-tuning, the robustness of models to incorrect solutions during training, and the critical role of question diversity. These findings, coupled with rigorous decontamination processes, ensure accurate benchmark evaluations. This work not only provides valuable resources but also establishes best practices for developing future mathematical reasoning datasets and models.

Check out the Paper and Dataset on Hugging Face. All credit for this research goes to the researchers of this project.

From Fixed to Random Designs: Unveiling the Hidden Factor Behind Modern Machine Learning (ML) Phenomena

Modern machine learning (ML) phenomena such as double descent and benign overfitting have challenged long-standing statistical intuitions, confusing many classically trained statisticians. These phenomena contradict fundamental principles taught in introductory data science courses, especially overfitting and the bias-variance tradeoff. The striking performance of highly overparameterized ML models trained to zero loss contradicts conventional wisdom about model complexity and generalization. This unexpected behavior raises critical questions about the continued relevance of traditional statistical concerns and whether recent developments in ML represent a paradigm shift or reveal previously overlooked approaches to learning from data.

Various researchers have attempted to unravel the complexities of modern ML phenomena. Studies have shown that benign interpolation and double descent are not limited to deep learning but also occur in simpler models like kernel methods and linear regression. Some researchers have revisited the bias-variance tradeoff, noting its absence in deep neural networks and proposing updated decompositions of prediction error. Others have developed taxonomies of interpolating models, distinguishing between benign, tempered, and catastrophic behaviors. These efforts aim to bridge the gap between classical statistical intuitions and modern ML observations, providing a more comprehensive understanding of generalization in complex models.

A researcher from the University of Cambridge has presented a note to understand the discrepancies between classical statistical intuitions and modern ML phenomena such as double descent and benign overfitting. While previous explanations have focused on the complexity of modern ML methods, overparameterization, and higher data dimensionality, this study explores a simpler yet often overlooked reason for the observed behaviors. The researcher highlights that statistics historically focused on fixed design settings and in-sample prediction error, whereas modern ML evaluates performance based on generalization error and out-of-sample predictions.

The researcher explores how moving from fixed to random design settings affects the bias-variance tradeoff. k-nearest neighbor (k-NN) estimators are used as a simple example to show that surprising behaviors in bias and variance are not limited to complex modern ML methods. Moreover, in the random design setting, the classical intuition that “variance increases with model complexity, while bias decreases” does not necessarily hold. This is because bias no longer monotonically decreases as complexity increases. The key insight is that there is no perfect match between training points and new test points in random design, meaning that even the simplest models may not achieve zero bias. This fundamental difference challenges the traditional understanding of the bias-variance tradeoff and its implications for model selection.

The researcher’s analysis shows that the traditional bias-variance tradeoff intuition breaks down in out-of-sample predictions, even for simple estimators and data-generating processes. While the classical notion that “variance increases with model complexity, and bias decreases” holds for in-sample settings, it doesn’t necessarily apply to out-of-sample predictions. Moreover, there are scenarios where bias and variance decrease as model complexity is reduced, contradicting conventional wisdom. This observation is crucial for understanding phenomena like double descent and benign overfitting. The researcher emphasizes that overparameterization and interpolation alone are not responsible for challenging textbook principles.
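The following toy simulation (an illustrative sketch, not the paper’s experiment; the data-generating process and settings are arbitrary assumptions) makes the point concrete: with k = 1, a k-NN regressor interpolates the training data and has zero in-sample error, yet its out-of-sample error on fresh inputs stays high.

# Toy sketch: in-sample (fixed design) vs out-of-sample (random design) error for k-NN regression.
# The data-generating process and settings are illustrative assumptions, not the paper's setup.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
n = 200
X_train = rng.uniform(0, 1, size=(n, 1))
f = lambda x: np.sin(2 * np.pi * x).ravel()
y_train = f(X_train) + rng.normal(scale=0.3, size=n)

X_test = rng.uniform(0, 1, size=(n, 1))          # new inputs: random design
y_test = f(X_test) + rng.normal(scale=0.3, size=n)

for k in [1, 5, 20, 50]:
    model = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
    in_sample = np.mean((model.predict(X_train) - y_train) ** 2)   # same inputs as training
    out_sample = np.mean((model.predict(X_test) - y_test) ** 2)    # fresh inputs
    print(f"k={k:3d}  in-sample MSE={in_sample:.3f}  out-of-sample MSE={out_sample:.3f}")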

In conclusion, the researcher from the University of Cambridge highlights a crucial yet often overlooked factor in the emergence of seemingly counterintuitive modern ML phenomena: the shift from evaluating model performance based on in-sample prediction error to generalization to new inputs. This transition from fixed to random designs fundamentally alters the classical bias-variance tradeoff, even for simple k-NN estimators in under-parameterized regimes. This finding challenges the idea that high-dimensional data, complex ML estimators, and over-parameterization are solely responsible for these surprising behaviors. This research provides valuable insights into learning and generalization in contemporary ML landscapes.

Check out the Paper. All credit for this research goes to the researchers of this project.

Revisiting Recurrent Neural Networks (RNNs): Minimal LSTMs and GRUs for Efficient Parallel Training

Recurrent neural networks (RNNs) have been foundational in machine learning for addressing various sequence-based problems, including time series forecasting and natural language processing. RNNs are designed to handle sequences of varying lengths by maintaining an internal state that captures information across time steps. However, these models often struggle with vanishing and exploding gradient issues, which reduce their effectiveness for longer sequences. To address this limitation, various architectural advancements have been developed over the years, enhancing the ability of RNNs to capture long-term dependencies and perform more complex sequence-based tasks.

A significant challenge in sequence modeling is the computational inefficiency of existing models, particularly for long sequences. Transformers have emerged as a dominant architecture, achieving state-of-the-art results in numerous applications such as language modeling and translation. However, their quadratic complexity concerning sequence length renders them resource-intensive and impractical for many applications with longer sequences or limited computational resources. This has led to a renewed interest in models that can balance performance and efficiency, ensuring scalability without compromising on accuracy.

Several current methods have been proposed to tackle this problem, such as state-space models like Mamba, which utilize input-dependent transitions to efficiently manage sequences. Other methods, like linear attention models, optimize training by reducing the computation required for longer sequences. Despite achieving performance comparable to transformers, these methods often involve complex algorithms and require specialized techniques for efficient implementation. Moreover, models such as Aaren and S4 have introduced innovative strategies to address these inefficiencies, but they still face limitations, such as increased memory usage and complexity in implementation.

The researchers at Borealis AI and Mila—Université de Montréal have reexamined traditional RNN architectures, specifically the Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) models. They introduced simplified, minimal versions of these models, named minLSTM and minGRU, to address the scalability issues faced by their traditional counterparts. By removing hidden state dependencies, the minimal versions no longer require backpropagation through time (BPTT) and can be trained in parallel, significantly improving efficiency. This breakthrough enables these minimal RNNs to handle longer sequences with reduced computational costs, making them competitive with the latest sequence models.

The proposed minimal LSTM and GRU models eliminate various gating mechanisms that are computationally expensive and unnecessary for many sequence tasks. By simplifying the architecture and ensuring the outputs are time-independent in scale, the researchers were able to create models that use up to 33% fewer parameters than traditional RNNs. Further, the modified architecture allows for parallel training, making these minimal models up to 175 times faster than standard LSTMs and GRUs when handling sequences of length 512. This improvement in training speed is crucial for scaling up the models for real-world applications that require handling long sequences, such as text generation and language modeling.
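Based on the description above of removing hidden-state dependencies from the gates, a sequential (unrolled) minGRU-style recurrence can be sketched roughly as follows; this is an interpretation for illustration, not the authors’ code, and the parallel-scan training trick that makes it fast is omitted.

# Rough sketch of a minGRU-style recurrence as described above (sequential form).
# This is an interpretation for illustration; the paper trains the same recurrence with a parallel scan.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def min_gru_forward(x_seq, W_z, W_h):
    """x_seq: (T, d_in); W_z, W_h: (d_in, d_hidden). Gates depend only on the input, not on h_{t-1}."""
    T = x_seq.shape[0]
    d_hidden = W_h.shape[1]
    h = np.zeros(d_hidden)
    outputs = []
    for t in range(T):
        z_t = sigmoid(x_seq[t] @ W_z)        # update gate computed from the input alone
        h_tilde = x_seq[t] @ W_h             # candidate state computed from the input alone
        h = (1.0 - z_t) * h + z_t * h_tilde  # convex combination with the previous state
        outputs.append(h)
    return np.stack(outputs)

# Example: a random sequence of length 8 with 4-dimensional inputs and a 16-dimensional state.
rng = np.random.default_rng(0)
out = min_gru_forward(rng.normal(size=(8, 4)), rng.normal(size=(4, 16)), rng.normal(size=(4, 16)))
print(out.shape)  # (8, 16)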

In terms of performance and results, the minimal RNNs demonstrated substantial gains in training time and efficiency. For example, on a T4 GPU, the minGRU model achieved a 175x speedup in training time compared to the traditional GRU for a sequence length of 512, while minLSTM showed a 235x improvement. For even longer sequences of length 4096, the speedup was even more pronounced, with minGRU and minLSTM achieving speedups of 1324x and 1361x, respectively. These improvements make the minimal RNNs highly suitable for applications requiring fast and efficient training. The models also performed competitively with modern architectures like Mamba in empirical tests, showing that the simplified RNNs can achieve similar or even superior results with much lower computational overhead.

The researchers further tested the minimal models on reinforcement learning tasks and language modeling. In the reinforcement learning experiments, the minimal models outperformed existing methods such as Decision S4 and performed comparably with Mamba and Decision Transformer. For example, on the Hopper-Medium dataset, the minLSTM model achieved a performance score of 85.0, while the minGRU scored 79.4, indicating strong results across varying levels of data quality. Similarly, in language modeling tasks, minGRU and minLSTM achieved cross-entropy losses comparable to transformer-based models, with minGRU reaching a loss of 1.548 and minLSTM achieving a loss of 1.555 on the Shakespeare dataset. These results highlight the efficiency and robustness of the minimal models in diverse sequence-based applications.

In conclusion, the research team’s introduction of minimal LSTMs and GRUs addresses the computational inefficiencies of traditional RNNs while maintaining strong empirical performance. By simplifying the models and leveraging parallel training, the minimal versions offer a viable alternative to more complex modern architectures. The findings suggest that with some modifications, traditional RNNs can still be effective for long sequence modeling tasks, making these minimal models a promising solution for future research and applications in the field.

Check out the Paper. All credit for this research goes to the researchers of this project.

Build a generative AI Slack chat assistant using Amazon Bedrock and Amazon Kendra

Despite the proliferation of information and data in business environments, employees and stakeholders often find themselves searching for information and struggling to get their questions answered quickly and efficiently. This can lead to productivity losses, frustration, and delays in decision-making.
A generative AI Slack chat assistant can help address these challenges by providing a readily available, intelligent interface for users to interact with and obtain the information they need. By using the natural language processing and generation capabilities of generative AI, the chat assistant can understand user queries, retrieve relevant information from various data sources, and provide tailored, contextual responses.
By harnessing the power of generative AI and Amazon Web Services (AWS) services including Amazon Bedrock, Amazon Kendra, and Amazon Lex, this solution provides a sample architecture to build an intelligent Slack chat assistant that can streamline information access, enhance user experiences, and drive productivity and efficiency within organizations.
Why use Amazon Kendra for building a RAG application?
Amazon Kendra is a fully managed service that provides out-of-the-box semantic search capabilities for state-of-the-art ranking of documents and passages. You can use Amazon Kendra to quickly build high-accuracy generative AI applications on enterprise data and source the most relevant content and documents to maximize the quality of your Retrieval Augmented Generation (RAG) payload, yielding better large language model (LLM) responses than using conventional or keyword-based search solutions. Amazon Kendra offers simple-to-use deep learning search models that are pre-trained on 14 domains and don’t require machine learning (ML) expertise. Amazon Kendra can index content from a wide range of sources, including databases, content management systems, file shares, and web pages.
Further, the FAQ feature in Amazon Kendra complements the broader retrieval capabilities of the service, allowing the RAG system to seamlessly switch between providing prewritten FAQ responses and dynamically generating responses by querying the larger knowledge base. This makes it well-suited for powering the retrieval component of a RAG system, allowing the model to access a broad knowledge base when generating responses. By integrating the FAQ capabilities of Amazon Kendra into a RAG system, the model can use a curated set of high-quality, authoritative answers for commonly asked questions. This can improve the overall response quality and user experience, while also reducing the burden on the language model to generate these basic responses from scratch.
This solution balances retaining customizations in terms of model selection, prompt engineering, and adding FAQs with not having to deal with word embeddings, document chunking, and other lower-level complexities typically required for RAG implementations.
Solution overview
The chat assistant is designed to assist users by answering their questions and providing information on a variety of topics. The purpose of the chat assistant is to be an internal-facing Slack tool that can help employees and stakeholders find the information they need.
The architecture uses Amazon Lex for intent recognition, AWS Lambda for processing queries, Amazon Kendra for searching through FAQs and web content, and Amazon Bedrock for generating contextual responses powered by LLMs. By combining these services, the chat assistant can understand natural language queries, retrieve relevant information from multiple data sources, and provide humanlike responses tailored to the user’s needs. The solution showcases the power of generative AI in creating intelligent virtual assistants that can streamline workflows and enhance user experiences based on model choices, FAQs, and modifying system prompts and inference parameters.
Architecture diagram
The following diagram illustrates a RAG approach where the user sends a query through the Slack application and receives a generated response based on the data indexed in Amazon Kendra. In this post, we use Amazon Kendra Web Crawler as the data source and include FAQs stored on Amazon Simple Storage Service (Amazon S3). See Data source connectors for a list of supported data source connectors for Amazon Kendra.

The step-by-step workflow for the architecture is the following:

The user sends a query such as What is the AWS Well-Architected Framework? through the Slack app.
The query goes to Amazon Lex, which identifies the intent.
Currently two intents are configured in Amazon Lex (Welcome and FallbackIntent).
The welcome intent is configured to respond with a greeting when a user enters a greeting such as “hi” or “hello.” The assistant responds with “Hello! I can help you with queries based on the documents provided. Ask me a question.”
The fallback intent is fulfilled with a Lambda function.

The Lambda function searches Amazon Kendra FAQs through the search_Kendra_FAQ method by taking the user query and Amazon Kendra index ID as inputs. If there’s a match with a high confidence score, the answer from the FAQ is returned to the user.

import boto3


def search_Kendra_FAQ(question, kendra_index_id):
    """
    This function takes in the question from the user and checks whether the question exists in the Kendra FAQs.
    :param question: The question the user is asking that was asked via the frontend input text box.
    :param kendra_index_id: The Kendra index containing the documents and FAQs.
    :return: If found in the FAQs, returns the answer along with any relevant link. If not, returns (False, False)
             so that the kendra_retrieve_document function is called next.
    """
    kendra_client = boto3.client('kendra')
    response = kendra_client.query(IndexId=kendra_index_id, QueryText=question,
                                   QueryResultTypeFilter='QUESTION_ANSWER')
    for item in response['ResultItems']:
        score_confidence = item['ScoreAttributes']['ScoreConfidence']
        # Take answers only from FAQs that have a very high confidence score
        if score_confidence == 'VERY_HIGH' and len(item['AdditionalAttributes']) > 1:
            text = item['AdditionalAttributes'][1]['Value']['TextWithHighlightsValue']['Text']
            url = "None"
            if item['DocumentURI'] != '':
                url = item['DocumentURI']
            return (text, url)
    return (False, False)

If there isn’t a match with a high enough confidence score, relevant documents from Amazon Kendra with a high confidence score are retrieved through the kendra_retrieve_document method and sent to Amazon Bedrock to generate a response as the context.

import boto3


def kendra_retrieve_document(question, kendra_index_id):
    """
    This function takes in the question from the user and retrieves relevant passages based on the default PageSize of 10.
    :param question: The question the user is asking that was asked via the frontend input text box.
    :param kendra_index_id: The Kendra index containing the documents and FAQs.
    :return: Returns the context to be sent to the LLM and the document URIs to be returned as relevant data sources.
    """
    kendra_client = boto3.client('kendra')
    documents = kendra_client.retrieve(IndexId=kendra_index_id, QueryText=question)
    text = ""
    uris = set()
    if len(documents['ResultItems']) > 0:
        for i in range(len(documents['ResultItems'])):
            score_confidence = documents['ResultItems'][i]['ScoreAttributes']['ScoreConfidence']
            # Keep only passages retrieved with a HIGH or VERY_HIGH confidence score
            if score_confidence == 'VERY_HIGH' or score_confidence == 'HIGH':
                text += documents['ResultItems'][i]['Content'] + "\n"
                uris.add(documents['ResultItems'][i]['DocumentURI'])
    return (text, uris)

The response is generated from Amazon Bedrock with the invokeLLM method. The following is a snippet of the invokeLLM method within the fulfillment function. Read more on inference parameters and system prompts to modify parameters that are passed into the Amazon Bedrock invoke model request.

import boto3
import json


def invokeLLM(question, context, modelId):
    """
    This function takes in the question from the user, along with the Kendra responses as context, to generate an answer
    for the user on the frontend.
    :param question: The question the user is asking that was asked via the frontend input text box.
    :param context: The response from the Kendra document retrieve query, used as context to generate a better answer.
    :param modelId: The ID of the Amazon Bedrock model used to generate the answer.
    :return: Returns the final answer that will be provided to the end user of the application who asked the original
             question.
    """
    # Set up the Bedrock runtime client
    bedrock = boto3.client('bedrock-runtime')

    # Body of data with parameters that is passed into the Bedrock invoke model request
    body = json.dumps({"max_tokens": 350,
                       "system": "You are a truthful AI assistant. Your goal is to provide informative and substantive responses to queries based on the documents provided. If you do not know the answer to a question, you truthfully say you do not know.",
                       "messages": [{"role": "user", "content": "Answer this user query:" + question + "with the following context:" + context}],
                       "anthropic_version": "bedrock-2023-05-31",
                       "temperature": 0,
                       "top_k": 250,
                       "top_p": 0.999})

    # Invoke the Bedrock model with your specifications
    response = bedrock.invoke_model(body=body, modelId=modelId)
    # Parse the body of the response that was generated
    response_body = json.loads(response.get('body').read())
    # Retrieve the content field, which contains the answer
    answer = response_body.get('content')
    # Return the answer as the final result, which ultimately gets returned to the end user
    return answer

Finally, the response generated from Amazon Bedrock, along with the relevant referenced URLs, is returned to the end user.
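For reference, the following is a simplified sketch of how the fallback intent fulfillment could tie the three methods above together; the handler structure, environment variable names, and default model ID shown here are illustrative assumptions rather than the exact Lambda code from the repository.

# Simplified sketch of orchestrating the snippets above in the fallback intent fulfillment.
# The event handling, environment variables, and default model ID are assumptions, not the repository code.
import os

def handle_fallback_intent(question):
    kendra_index_id = os.environ.get("KENDRA_INDEX_ID")                 # assumed environment variable
    model_id = os.environ.get("BEDROCK_MODEL_ID", "anthropic.claude-3-sonnet-20240229-v1:0")

    # 1. Try the curated FAQs first and return a prewritten answer on a VERY_HIGH confidence match.
    faq_answer, faq_url = search_Kendra_FAQ(question, kendra_index_id)
    if faq_answer:
        return {"answer": faq_answer, "sources": [faq_url]}

    # 2. Otherwise retrieve HIGH/VERY_HIGH confidence passages and let Amazon Bedrock generate a grounded answer.
    context, uris = kendra_retrieve_document(question, kendra_index_id)
    if not context:
        return {"answer": "No relevant documents found", "sources": []}

    answer = invokeLLM(question, context, model_id)
    return {"answer": answer, "sources": list(uris)}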
When selecting websites to index, adhere to the AWS Acceptable Use Policy and other AWS terms. Remember that you can only use Amazon Kendra Web Crawler to index your own web pages or web pages that you have authorization to index. Visit the Amazon Kendra Web Crawler data source guide to learn more about using the web crawler as a data source. Using Amazon Kendra Web Crawler to aggressively crawl websites or web pages you don’t own is not considered acceptable use.
Supported features
The chat assistant supports the following features:

Support for the following Anthropic’s models on Amazon Bedrock:

claude-v2
claude-3-haiku-20240307-v1:0
claude-instant-v1
claude-3-sonnet-20240229-v1:0

Support for FAQs and the Amazon Kendra Web Crawler data source
Returns FAQ answers only if the confidence score is VERY_HIGH
Retrieves only documents from Amazon Kendra that have a HIGH or VERY_HIGH confidence score
If documents with a high confidence score aren’t found, the chat assistant returns “No relevant documents found”
Prerequisites
To perform the solution, you need to have the following prerequisites:

Basic knowledge of AWS
An AWS account with access to Amazon S3 and Amazon Kendra
An S3 bucket to store your documents. For more information, see Step 1: Create your first S3 bucket and the Amazon S3 User Guide.
A Slack workspace to integrate the chat assistant
Permission to install Slack apps in your Slack workspace
Seed URLs for the Amazon Kendra Web Crawler data source

You’ll need authorization to crawl and index any websites provided

AWS CloudFormation for deploying the solution resources
Build a generative AI Slack chat assistant
To build a Slack application, use the following steps:

Request model access on Amazon Bedrock for all Anthropic models
Create an S3 bucket in the us-east-1 (N. Virginia) AWS Region.
Upload the AIBot-LexJson.zip and SampleFAQ.csv files to the S3 bucket
Launch the CloudFormation stack in the us-east-1 (N. Virginia) AWS Region.
Enter a Stack name of your choice
For S3BucketName, enter the name of the S3 bucket created in Step 2
For S3KendraFAQKey, enter the name of the SampleFAQs uploaded to the S3 bucket in Step 3
For S3LexBotKey, enter the name of the Amazon Lex .zip file uploaded to the S3 bucket in Step 3
For SeedUrls, enter up to 10 URLs for the web crawler as a comma delimited list. In the example in this post, we give the publicly available Amazon Bedrock service page as the seed URL
Leave the rest as defaults. Choose Next. Choose Next again on the Configure stack options
Acknowledge by selecting the box and choose Submit, as shown in the following screenshot
Wait for the stack creation to complete
Verify all resources are created
Test on the AWS Management Console for Amazon Lex

On the Amazon Lex console, choose your chat assistant ${YourStackName}-AIBot
Choose Intents
Choose Version 1 and choose Test, as shown in the following screenshot
Select the AIBotProdAlias and choose Confirm, as shown in the following screenshot. If you want to make changes to the chat assistant, you can use the draft version, publish a new version, and assign the new version to the AIBotProdAlias. Learn more about Versioning and Aliases.
Test the chat assistant with questions such as “Which AWS service has 11 nines of durability?” and “What is the AWS Well-Architected Framework?” and verify the responses. The following table shows the three FAQs in the sample .csv file.

_question: Which AWS service has 11 nines of durability?
_answer: Amazon S3
_source_uri: https://aws.amazon.com/s3/

_question: What is the AWS Well-Architected Framework?
_answer: The AWS Well-Architected Framework enables customers and partners to review their architectures using a consistent approach and provides guidance to improve designs over time.
_source_uri: https://aws.amazon.com/architecture/well-architected/

_question: In what Regions is Amazon Kendra available?
_answer: Amazon Kendra is currently available in the following AWS Regions: Northern Virginia, Oregon, and Ireland
_source_uri: https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/

The following screenshot shows the question “Which AWS service has 11 nines of durability?” and its response. You can observe that the response is the same as in the FAQ file and includes a link.
Based on the pages you have crawled, ask a question in the chat. For this example, the publicly available Amazon Bedrock page was crawled and indexed. The following screenshot shows the question, “What are agents in Amazon Bedrock?” and a generated response that includes relevant links.

For integration of the Amazon Lex chat assistant with Slack, see Integrating an Amazon Lex V2 bot with Slack. Choose the AIBotProdAlias under Alias in the Channel Integrations section.
Run sample queries to test the solution

In Slack, go to the Apps section. In the dropdown menu, choose Manage and select Browse apps.
Search for ${AIBot} in App Directory and choose the chat assistant. This will add the chat assistant to the Apps section in Slack. You can now start asking questions in the chat. The following screenshot shows the question “Which AWS service has 11 nines of durability?” and its response. You can observe that the response is the same as in the FAQ file and includes a link.
The following screenshot shows the question, “What is the AWS Well-Architected Framework?” and its response.
Based on the pages you have crawled, ask a question in the chat. For this example, the publicly available Amazon Bedrock page was crawled and indexed. The following screenshot shows the question, “What are agents in Amazon Bedrock?” and a generated response that includes relevant links.
The following screenshot shows the question, “What is amazon polly?” Because there is no Amazon Polly documentation indexed, the chat assistant responds with “No relevant documents found,” as expected.
These examples show how the chat assistant retrieves documents from Amazon Kendra and provides answers based on the documents retrieved. If no relevant documents are found, the chat assistant responds with “No relevant documents found.”
Clean up
To clean up the resources created by this solution:

Delete the CloudFormation stack by navigating to the CloudFormation console
Select the stack you created for this solution and choose Delete
Confirm the deletion by entering the stack name in the provided field. This will remove all the resources created by the CloudFormation template, including the Amazon Kendra index, Amazon Lex chat assistant, Lambda function, and other related resources.
Conclusion
This post describes the development of a generative AI Slack application powered by Amazon Bedrock and Amazon Kendra. It is designed to be an internal-facing Slack chat assistant that helps answer questions related to the indexed content. The solution architecture includes Amazon Lex for intent identification, a Lambda function for fulfilling the fallback intent, Amazon Kendra for FAQ searches and indexing crawled web pages, and Amazon Bedrock for generating responses. The post walks through the deployment of the solution using a CloudFormation template, provides instructions for running sample queries, and discusses the steps for cleaning up the resources. Overall, this post demonstrates how to use various AWS services to build a powerful generative AI–powered chat assistant.
Explore the generative AI Slack chat assistant: invite your teams to a Slack workspace and start getting answers from your indexed content and FAQs. Experiment with different use cases and see how you can harness the capabilities of services like Amazon Bedrock and Amazon Kendra to enhance your business operations. For more information about using Amazon Bedrock with Slack, refer to Deploy a Slack gateway for Amazon Bedrock.
About the authors
Kruthi Jayasimha Rao is a Partner Solutions Architect with a focus on AI and ML. She provides technical guidance to AWS Partners in following best practices to build secure, resilient, and highly available solutions in the AWS Cloud.
Mohamed Mohamud is a Partner Solutions Architect with a focus on Data Analytics. He specializes in streaming analytics, helping partners build real-time data pipelines and analytics solutions on AWS. With expertise in services like Amazon Kinesis, Amazon MSK, and Amazon EMR, Mohamed enables data-driven decision-making through streaming analytics.

Optimizing Long-Context Processing with Role-RL: A Reinforcement Learning Framework for Efficient Large Language Model Deployment

Training Large Language Models (LLMs) that can handle long-context processing is still a difficult task because of data sparsity constraints, implementation complexity, and training efficiency. These problems are especially clear when working with documents of effectively unbounded length, which are typical in contemporary media formats such as automated news updates, live-stream e-commerce platforms, and viral short-form videos. Online Long-context Processing (OLP) is a new paradigm introduced to overcome this.

The OLP paradigm is specifically made to handle and process massive amounts of data in real-time, arranging and evaluating various media streams as they come in. OLP can assist in segmenting and categorizing streaming transcripts into relevant areas, such as product descriptions, pricing talks, or customer interactions, in live e-commerce. It can assist in organizing a constant stream of news data into groups such as facts, views, and projections in automated news reporting, which enhances the information’s accuracy and user-friendliness.

However, trying to choose the best available LLM from an ever-increasing pool of models presents another difficulty. It is challenging to identify a model that performs well in all of these areas because each one differs in terms of cost, response time, and performance. In response to this problem, a framework known as Role Reinforcement Learning (Role-RL) has been introduced in a recent research paper from South China Normal University, Toronto University and Zhejiang University. Role-RL uses real-time performance data to automate the deployment of various LLMs in the OLP pipeline according to their ideal roles.

Each LLM is assessed by Role-RL based on important performance metrics such as speed, accuracy, and cost-effectiveness. Role-RL maximizes the system’s overall efficiency by dynamically assigning each LLM to the tasks for which they are most suitable based on these evaluations. With this method, resources can be used more strategically, guaranteeing that high-performing LLMs take on the most important jobs and that more economical models are used for simpler procedures.
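As a purely conceptual illustration of this kind of performance-based routing (a toy sketch, not the Role-RL algorithm from the paper), each role in the pipeline could be assigned to the candidate model with the best weighted trade-off of measured accuracy, speed, and cost:

# Toy illustration of performance-based role assignment; this is NOT the paper's Role-RL algorithm,
# just a sketch of routing each role to the model with the best measured trade-off.
def assign_roles(role_weights, model_stats):
    """role_weights: {role: {"accuracy": w, "speed": w, "cost": w}};
       model_stats: {model: {"accuracy": 0-1, "tokens_per_s": float, "cost_per_1k_tokens": float}}"""
    assignment = {}
    for role, w in role_weights.items():
        def score(name):
            s = model_stats[name]
            return (w["accuracy"] * s["accuracy"]
                    + w["speed"] * s["tokens_per_s"] / 100.0
                    - w["cost"] * s["cost_per_1k_tokens"] * 10.0)
        assignment[role] = max(model_stats, key=score)
    return assignment

model_stats = {
    "large-llm": {"accuracy": 0.93, "tokens_per_s": 40.0, "cost_per_1k_tokens": 0.015},
    "small-llm": {"accuracy": 0.81, "tokens_per_s": 120.0, "cost_per_1k_tokens": 0.0008},
}
role_weights = {
    "key-fact-extraction":  {"accuracy": 1.0, "speed": 0.1, "cost": 0.2},  # accuracy-critical role
    "routine-segmentation": {"accuracy": 0.3, "speed": 0.6, "cost": 0.8},  # cheap-and-fast role
}
print(assign_roles(role_weights, model_stats))
# {'key-fact-extraction': 'large-llm', 'routine-segmentation': 'small-llm'}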

Extensive studies on the OLP-MINI dataset have revealed that the combined OLP and Role-RL framework yielded notable benefits. With an average recall rate of 93.2%, it achieved an OLP benchmark, demonstrating the system’s ability to reliably and frequently retrieve pertinent information. This framework was also responsible for a 79.4% cost reduction for LLM deployment, demonstrating its economic viability in addition to its efficiency.

The team has summarized their primary contributions as follows.

The Role Reinforcement Learning (Role-RL) framework has been introduced, which is intended to strategically place different LLMs in the roles that best fit them according to how well they perform in real time on certain tasks. This ensures that LLMs are deployed as efficiently and accurately as possible.

To manage long-context jobs, the team has proposed the Online Long-context Processing (OLP) pipeline, which processes and organizes data from long documents or media streams effectively. The OLP-MINI dataset has also been presented for validation and testing.

The benchmark average recall rate of 93.2% has been attained using the Role-RL framework in conjunction with the OLP pipeline. The framework also reduces LLM expenses by 79.4%. In addition, the recall rate is increased by 53.6 percentage points using the OLP pipeline as opposed to non-OLP procedures.

Check out the Paper. All credit for this research goes to the researchers of this project.

Compositional Hardness in Large Language Models (LLMs): A Probabilistic Approach to Code Generation

A popular method when employing Large Language Models (LLMs) for complicated analytical tasks, such as code generation, is to attempt to solve the full problem within the model’s context window. The informational segment that the LLM is capable of processing concurrently is referred to as the context window. The amount of data the model can process at once has a significant impact on its capacity to produce a solution. Although this method is effective for simpler jobs, issues arise when handling more intricate, multi-step situations.

According to recent research, LLMs do noticeably better on complex tasks when they divide the task into smaller subtasks using a technique called subtask decomposition, sometimes referred to as chain of thought (COT). This method involves breaking down a huge problem into smaller tasks and tackling them separately, then integrating the findings to provide a complete solution. By using this approach, LLMs can concentrate on the easier parts of the process and make sure that every section is completed more efficiently. 

The in-context construction of tasks is still severely limited, even with the benefits of task decomposition. This constraint describes the challenge LLMs encounter while trying to manage several subtasks in the same context window. The complexity of organizing and integrating the processes increases dramatically with the number of subtasks included. Even though an LLM can decompose a problem, solving it in its entirety within the model’s context taxes the system, resulting in lower performance and accuracy.

Researchers have established the concept of generation complexity to help comprehend this limitation. This metric calculates the number of times an LLM must produce alternative answers before coming up with the right one. When every step needs to be completed within the same context window, generation complexity for composite problems (those with several related tasks) increases dramatically. The generation complexity increases with the number of steps and task complexity, particularly when managed by a single model instance.

The primary problem is that LLMs function inside a fixed context limit, even when they attempt to decompose activities. This makes it difficult for the model to appropriately compose all of the answers when jobs become more complex and require a number of sub-steps. Multi-agent systems are a possible solution. Different instances of LLMs can be used to divide the burden instead of one LLM handling all subtasks inside a constrained context window. As a separate LLM, each agent can concentrate on resolving a certain aspect of the problem. The results can be combined to create the entire solution once each agent has finished its part. A distributed approach greatly reduces the in-context hardness and generation complexity because each model only concentrates on a smaller, more manageable fraction of the work.
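The following schematic sketch illustrates this split (purely for illustration; call_llm is a placeholder for any model invocation, and the decomposition shown is invented): each sub-problem is solved in its own small context, and the partial results are composed at the end.

# Schematic sketch of splitting a composite task across separate model instances.
# call_llm is a placeholder for a real LLM API call; the decomposition below is invented for illustration.
def call_llm(prompt):
    # Placeholder: substitute a real model invocation here.
    return f"[model output for: {prompt[:40]}...]"

def solve_composite_task(task_description, subtasks):
    partial_results = []
    for subtask in subtasks:
        # Each agent sees only its own subtask, so every context window stays small.
        prompt = f"Overall goal: {task_description}\nSolve only this part: {subtask}"
        partial_results.append(call_llm(prompt))
    # A final composition step stitches the partial solutions together.
    composition_prompt = ("Combine these partial solutions into one coherent answer:\n"
                          + "\n".join(partial_results))
    return call_llm(composition_prompt)

print(solve_composite_task(
    "Write a function that parses a log file and reports error frequencies",
    ["parse each log line into a timestamp and message",
     "count error messages by type",
     "format a summary report"],
))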

Compared to the single-agent approach, the employment of multi-agent systems has several benefits. First, the models are not limited by the context window when the work is divided among numerous agents, which enables them to solve longer and more complicated tasks. Second, the system as a whole is more accurate and efficient since each agent operates separately, preventing the task’s complexity from growing exponentially as it would in a situation with a single agent. The autoregressive nature of LLMs, which produce outputs one step at a time, is another property that multi-agent systems exploit. In this way, the problems that occur when a single model has to handle all phases at once are avoided, and each agent can focus on its portion of the problem step by step.

The team has demonstrated that dividing up composite problems among several agents significantly lowers the generation complexity. Empirical data has indicated that when many LLM instances work together to solve tasks, instead of depending on a single model to handle everything within a single context window, tasks are performed more quickly, especially in areas like code generation.

In conclusion, though LLMs have demonstrated significant promise in resolving intricate analytical problems, the difficulties associated with in-context construction impede their effectiveness. Although subtask decomposition has been useful, it is insufficient to get beyond the context window’s limitations completely. By splitting up work across several LLM instances, multi-agent systems have presented a viable option that increases precision, lowers complexity, and enables LLMs to tackle more complicated and large-scale issues. 

Check out the Paper. All credit for this research goes to the researchers of this project.

Compositional GSM: A New AI Benchmark for Evaluating Large Language Models’ Reasoning Capabilities in Multi-Step Problems

Natural language processing (NLP) has experienced rapid advancements, with large language models (LLMs) being used to tackle various challenging problems. Among the diverse applications of LLMs, mathematical problem-solving has emerged as a benchmark to assess their reasoning abilities. These models have demonstrated remarkable performance on math-specific benchmarks such as GSM8K, which measures their capabilities to solve grade-school math problems. However, there is an ongoing debate regarding whether these models truly comprehend mathematical concepts or exploit patterns within training data to produce correct answers. This has led to a need for a deeper evaluation to understand the extent of their reasoning capabilities in handling complex, interconnected problem types.

Despite their success on existing math benchmarks, researchers identified a critical problem: most LLMs fail to exhibit consistent reasoning when faced with more complex, compositional questions. While standard benchmarks involve solving individual problems independently, real-world scenarios often require understanding relationships between multiple problems, where the answer to one question must be used to solve another. Traditional evaluations, which focus only on isolated problem-solving, do not adequately represent such scenarios. This creates a discrepancy between the high benchmark scores and LLMs’ practical usability for complex tasks requiring step-by-step reasoning and deeper understanding.

Researchers from Mila, Google DeepMind, and Microsoft Research have introduced a new evaluation method called “Compositional Grade-School Math (GSM).” This method involves chaining two separate math problems such that the solution to the first problem becomes a variable in the second problem. Using this approach, researchers can analyze the LLMs’ abilities to handle dependencies between questions, a concept that is not adequately captured by existing benchmarks. The Compositional GSM method offers a more comprehensive assessment of LLMs’ reasoning capabilities by introducing linked problems that require the model to carry information from one problem to another, making it necessary to solve both correctly for a successful outcome.
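To make the chaining concrete, here is an invented toy example in the spirit of the benchmark (the questions below are illustrative and do not come from Compositional GSM itself): the answer to the first question becomes a variable in the second, so both must be solved correctly.

# Invented toy example of chaining two grade-school problems; not taken from the Compositional GSM test set.
q1 = "Q1: Ali packs 4 boxes with 6 apples in each box. How many apples does Ali pack in total?"
q2 = ("Q2: Let X be the answer to Q1. Sara starts with X apples and gives away 10 of them. "
      "How many apples does Sara have left?")
chained_prompt = q1 + "\n" + q2

answer_q1 = 4 * 6              # 24
final_answer = answer_q1 - 10  # 14: correct only if the answer to Q1 is carried into Q2
print(chained_prompt)
print(final_answer)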

The evaluation was carried out using a variety of LLMs, including open-weight models like LLAMA3 and closed-weight models like GPT and Gemini families. The study included three test sets: the original GSM8K test split, a modified version of GSM8K where some variables were substituted, and the new Compositional GSM test set, each containing 1,200 examples. Models were tested using an 8-shot prompting method, where they were given several examples before being asked to solve the compositional problems. This method enabled the researchers to benchmark the models’ performance comprehensively, considering their ability to solve problems individually and in a compositional context.

The results showed a considerable gap in reasoning abilities. For instance, cost-efficient models such as GPT-4o mini exhibited a 2 to 12 times worse reasoning gap on compositional GSM compared to their performance on the standard GSM8K. Further, math-specialized models like Qwen2.5-MATH-72B, which achieved above 80% accuracy on high-school competition-level questions, could only solve less than 60% of the compositional grade-school math problems. This substantial drop suggests that specialized mathematical training alone is not enough to prepare models for multi-step reasoning tasks. Furthermore, it was observed that models like LLAMA3-8B and Mistral-7B, despite achieving high scores on isolated problems, showed a sharp decline when required to link answers between related problems.

The researchers also explored the impact of instruction tuning and code generation on model performance. Instruction-tuning improved results for smaller models on standard GSM8K problems but led to only minor improvements on compositional GSM. Meanwhile, generating code solutions instead of using natural language resulted in a 71% to 149% improvement for some smaller models on compositional GSM. This finding indicates that while code generation helps reduce the reasoning gap, it does not eliminate it, and systematic differences in reasoning capabilities persist among various models.

Analysis of the reasoning gaps revealed that the performance drop was not due to test-set leakage but rather to distractions caused by additional context and poor second-hop reasoning. For example, when models like LLAMA3-70B-IT and Gemini 1.5 Pro were required to solve a second question using the answer of the first, they frequently failed to apply the solution accurately, resulting in incorrect final answers. This phenomenon, referred to as the second-hop reasoning gap, was more pronounced in smaller models, which tended to overlook crucial details when solving complex problems.

The study highlights that current LLMs, regardless of their performance on standard benchmarks, still struggle with compositional reasoning tasks. The Compositional GSM benchmark introduced in the research provides a valuable tool for evaluating the reasoning abilities of LLMs beyond isolated problem-solving. These results suggest that more robust training strategies and benchmark designs are needed to enhance the compositional capabilities of these models, enabling them to perform better in complex problem-solving scenarios. This research underscores the importance of reassessing existing evaluation methods and prioritizing the development of models capable of multi-step reasoning.

Check out the Paper. All credit for this research goes to the researchers of this project.

FactAlign: A Novel Alignment AI Framework Designed to Enhance the Factuality of LLMs’ Long-Form Responses While Maintaining Their Helpfulness

LLMs show great promise as advanced information access engines thanks to their ability to generate long-form, natural language responses. Their large-scale pre-training on vast datasets allows them to answer various questions. Techniques like instruction tuning and reinforcement learning from human feedback further improve the coherence and detail of their responses. However, LLMs struggle with hallucinations and generating inaccurate content, particularly in long-form responses, where ensuring factual accuracy is difficult. Despite improvements in reasoning and helpfulness, the issue of factuality remains a key obstacle to their real-world adoption.

Researchers from National Taiwan University have developed FACTALIGN, a framework designed to enhance the factual accuracy of LLMs while preserving their helpfulness. FACTALIGN introduces fKTO, a fine-grained, sentence-level alignment algorithm based on the Kahneman-Tversky Optimization method. By leveraging recent advancements in automatic factuality evaluation, FACTALIGN aligns LLM responses with fine-grained factual assessments. Experiments on open-domain and information-seeking prompts show that FACTALIGN significantly improves factual accuracy without sacrificing helpfulness, boosting the factual F1 score. The study’s key contributions include the fKTO algorithm and the FACTALIGN framework for improving LLM reliability.

Recent research on language model alignment focuses on aligning models with human values. InstructGPT and LLaMA-2 demonstrated improved instruction-following using reinforcement learning from human feedback (RLHF). Fine-grained RLHF and methods like Constitutional AI introduced AI-based feedback to reduce human annotation needs. Alternatives like DPO and KTO offer simpler alignment objectives without RL, with fKTO extending KTO to sentence-level alignment using factuality evaluators. Factuality challenges, such as hallucination, have been addressed through techniques like retrieval-augmented generation and self-checking models like SelfCheckGPT. Recent methods like FactTune and FLAME focus on improving factuality using factuality evaluators and alignment strategies, which fKTO enhances further.

The FACTALIGN framework includes a pipeline for assessing long-form factuality and an alignment process to improve factual accuracy and helpfulness in LMs. It utilizes atomic statements from sentences to create a sentence-level loss, allowing for more effective alignment than algorithms requiring pairwise preference labels. The overall loss function combines response-level and sentence-level losses, assigning a weight to the latter. The framework employs iterative optimization to address discrepancies between offline response assessments and the model’s training data. This involves periodically sampling new responses, assessing their factuality, and incorporating these into the training dataset for continuous improvement.
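As a rough sketch of that weighted combination (the function name, arguments, and weight value are placeholders for illustration, not the released FACTALIGN implementation), the overall objective adds the response-level loss to a weighted average of the per-sentence losses:

# Rough sketch of combining a response-level alignment loss with sentence-level losses, as described above.
# The names and the weight value are placeholders, not FACTALIGN's actual implementation.
def combined_alignment_loss(response_loss, sentence_losses, sentence_weight=0.5):
    """response_loss: scalar KTO-style loss for the whole response;
       sentence_losses: per-sentence losses derived from the automatic factuality assessments."""
    sentence_term = sum(sentence_losses) / len(sentence_losses) if sentence_losses else 0.0
    return response_loss + sentence_weight * sentence_term

print(combined_alignment_loss(0.72, [0.10, 0.95, 0.30]))  # 0.72 + 0.5 * 0.45 = 0.945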

The experiments demonstrate the effectiveness of the FACTALIGN framework compared to various models, including GPT-4-Turbo and LLaMA-2-70B-Chat. FACTALIGN significantly enhances the factuality and helpfulness of the baseline Gemma-2B model, achieving improvements of 40.1% in f1@100 and 29.2% in MT-Bench scores. The findings indicate that FACTALIGN primarily boosts factual recall, increasing factual claims from 66.8 to 135.1 while slightly improving factual precision. An ablation study shows the necessity of iterative optimization and highlights the positive impact of both the fKTO loss and general-domain data on overall model performance.

In conclusion, the study introduces FACTALIGN, a framework to improve the factual accuracy of long-form responses generated by LLMs. The framework integrates a data construction process and a fine-grained alignment algorithm called fKTO, enhancing the factuality and helpfulness of LLM outputs. The analysis shows that FACTALIGN allows precise control over factual precision and recall levels. By addressing issues like hallucination and non-factual content, FACTALIGN demonstrates a significant improvement in the accuracy of LLM responses to open-domain and information-seeking prompts, enabling LLMs to provide richer information while maintaining factual integrity.

Check out the Paper. All credit for this research goes to the researchers of this project.

OpenAI’s ChatGPT Canvas Tutorial and Use Cases: Coding Customization and Visualizing Tesla Stock Data

OpenAI’s ChatGPT Canvas is an AI-powered workspace that integrates ChatGPT to assist coders and writers in real time by providing intelligent suggestions, code completions, and content enhancements within a customizable environment that understands context and adapts to individual styles. Featuring real-time collaboration, productivity tools such as version control and task management, and support for multiple programming languages and writing formats, Canvas aims to enhance productivity, streamline workflows, and transform creative and development processes while maintaining user privacy, data security, and ethical AI practices, with plans for continuous improvement and new features.

Table of contents
Use Case 1: Coding
Use Case 2: Analyze a Graph for Tesla Stock

Use Case 1: Coding

Here is how to use OpenAI’s Canvas for Coding (A step-by-step guide):

Step 1: Go to the ChatGPT platform (Premium) version and choose GPT-4o with Canvas

Step 2: Now ask ChatGPT to write some code. It will then open the canvas with the code.

Step 3: Now you can make edits to any section of the code. You can also add comments, add logs, or review any section of the code, all while making edits.

Use Case 2: Analyze a Graph for Tesla Stock

Here is how to use OpenAI’s Canvas to analyze a graph (a step-by-step guide):

Step 1: Go to the ChatGPT platform (Premium) version and choose GPT-4o with Canvas

Step 2: Now upload the Tesla stock graph (PNG) from the last few days

Step 3: Using GPT-4o with Canvas, ask ChatGPT to analyze the graph that we uploaded as an image.

Step 4: Using the bottom-right ‘Shortcodes’ button, you can make suggestions for any section of the analysis.

Step 5: You can even select any section of the content and ask for specific details

Conclusion

OpenAI’s Canvas is poised to transform the way coders and writers work, making AI an indispensable ally in the creative process. By combining powerful AI capabilities with a user-friendly workspace, Canvas opens up new possibilities for innovation and collaboration.

Check out the Details. All credit for this research goes to the researchers of this project.

Meta AI Unveils MovieGen: A Series of New Advanced Media Foundation AI Models

The Meta AI research team has introduced MovieGen, a suite of state-of-the-art (SotA) media foundation models that are set to revolutionize how we generate and interact with media content. This development encompasses innovations in text-to-video generation, video personalization, and video editing, all while supporting personalized video creation using user-provided images. At the core of MovieGen are advanced architectural designs, training methodologies, and inference techniques that enable scalable media generation like never before.

Key Features of MovieGen

High-Resolution Video Generation

One of the standout features of MovieGen is its ability to generate 16-second videos at 1080p resolution and 16 frames per second (fps), complete with synchronized audio. This is made possible by a 30-billion-parameter model that leverages cutting-edge latent diffusion techniques. The model excels at producing high-quality, coherent videos that align closely with textual prompts, opening up new horizons in content creation and storytelling.

Advanced Audio Synthesis

In addition to video generation, MovieGen introduces a 13 billion parameter model specifically designed for video/text-to-audio synthesis. This model generates 48kHz cinematic audio that is synchronized with the visual input and can handle variable lengths of media up to 30 seconds. By learning visual-audio associations, the model can create both diegetic and non-diegetic sounds and music, enhancing the realism and emotional impact of the generated media.

Versatile Audio Context Handling

The audio generation capabilities of MovieGen are further enhanced through masked audio prediction training, which allows the model to handle different audio contexts, including generation, extension, and infilling. This means that the same model can be used for a variety of audio tasks without the need for separate specialized models, making it a versatile tool for content creators.

Efficient Training and Inference

MovieGen utilizes the Flow Matching objective for efficient training and inference, combined with a Diffusion Transformer (DiT) architecture. This approach accelerates the training process and reduces computational requirements, enabling faster generation of high-quality media content.
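MovieGen’s training code has not been released, but the flow-matching objective itself is simple to illustrate. Below is a minimal, generic sketch of a flow-matching training step (linear interpolation path, velocity-prediction loss) in PyTorch; the model, tensor shapes, and names are placeholders for illustration, not anything taken from the paper.

import torch
import torch.nn.functional as F

def flow_matching_loss(model, x1):
    """Generic flow-matching objective: regress the velocity field along a
    straight path from a noise sample x0 to a data sample x1.
    `model` is any network mapping (x_t, t) -> predicted velocity (placeholder)."""
    x0 = torch.randn_like(x1)                         # noise endpoint of the path
    t = torch.rand(x1.shape[0], device=x1.device)     # random time in [0, 1)
    t_b = t.view(-1, *([1] * (x1.dim() - 1)))         # broadcast t over feature dims
    x_t = (1.0 - t_b) * x0 + t_b * x1                 # point on the interpolation path
    target_velocity = x1 - x0                         # d(x_t)/dt for the linear path
    return F.mse_loss(model(x_t, t), target_velocity)

# Hypothetical usage: loss = flow_matching_loss(dit_model, video_latents); loss.backward()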

Technical Details

Latent Diffusion with DAC-VAE

At the technical core of MovieGen’s audio capabilities is the use of Latent Diffusion with DAC-VAE. This technique encodes 48kHz audio at 25Hz, achieving higher quality at a lower frame rate compared to traditional methods like Encodec. The result is crisp, high-fidelity audio that matches the cinematic quality of the generated videos.

DAC-VAE Enhancements

The DAC-VAE model incorporates several enhancements to improve audio reconstruction at compressed rates:

Multi-scale Short-Time Fourier Transform (STFT): This allows for better capture of both temporal and frequency-domain information.

Snake Activation Functions: These help reduce artifacts and improve the modeling of periodicity in the audio signals (see the sketch after this list).

Removal of Residual Vector Quantization (RVQ): By eliminating RVQ and focusing on Variational Autoencoder (VAE) training, the model achieves superior reconstruction quality.
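The exact DAC-VAE configuration is not reproduced here, but the snake activation itself has a simple closed form, x + sin²(αx)/α, with a learnable α controlling the period. The following is a minimal PyTorch sketch under the assumption of one learnable α per channel and (batch, channels, time) audio tensors; it illustrates the activation only, not Meta’s implementation.

import torch
import torch.nn as nn

class Snake(nn.Module):
    """Snake activation: x + (1/alpha) * sin^2(alpha * x), with a learnable
    per-channel alpha that controls the periodicity of the response."""
    def __init__(self, channels: int, alpha_init: float = 1.0):
        super().__init__()
        self.alpha = nn.Parameter(torch.full((1, channels, 1), alpha_init))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x is expected to be shaped (batch, channels, time)
        alpha = self.alpha
        return x + (1.0 / (alpha + 1e-9)) * torch.sin(alpha * x) ** 2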

Applications and Implications

The introduction of MovieGen marks a significant leap forward in media generation technology. By combining high-resolution video generation with advanced audio synthesis, MovieGen enables the creation of immersive and personalized media experiences. Content creators can leverage these tools for:

Text-to-Video Generation: Crafting videos directly from textual descriptions.

Video Personalization: Customizing videos using user-provided images and content.

Video Editing: Enhancing and modifying existing videos with new audio-visual elements.

These capabilities have far-reaching implications for industries such as entertainment, advertising, education, and more, where dynamic and personalized content is increasingly in demand.

Conclusion

Meta AI’s MovieGen represents a monumental advancement in the field of media generation. With its sophisticated models and innovative techniques, it sets a new standard for what is possible in automated content creation. As AI continues to evolve, tools like MovieGen will play a pivotal role in shaping the future of media, offering unprecedented opportunities for creativity and expression.


The post Meta AI Unveils MovieGen: A Series of New Advanced Media Foundation AI Models appeared first on MarkTechPost.

Create your fashion assistant application using Amazon Titan models and Amazon Bedrock Agents

In the generative AI era, agents that simulate human actions and behaviors are emerging as a powerful tool for enterprises to create production-ready applications. Agents can interact with users, perform tasks, and exhibit decision-making abilities, mimicking humanlike intelligence. By combining agents with foundation models (FMs) from the Amazon Titan family in Amazon Bedrock, customers can develop multimodal, complex applications that enable the agent to understand and generate natural language or images.
For example, in the fashion retail industry, an assistant powered by agents and multimodal models can provide customers with a personalized and immersive experience. The assistant can engage in natural language conversations, understanding the customer’s preferences and intents. It can then use the multimodal capabilities to analyze images of clothing items and make recommendations based on the customer’s input. Additionally, the agent can generate visual aids, such as outfit suggestions, enhancing the overall customer experience.
In this post, we implement a fashion assistant agent using Amazon Bedrock Agents and the Amazon Titan family of models. The fashion assistant provides a personalized, multimodal conversational experience. Among other capabilities, the inpainting and outpainting features of Amazon Titan Image Generator can be used to generate fashion inspirations and edit user photos. The Amazon Titan Multimodal Embeddings model can be used to search a style database with either a text prompt or a reference image provided by the user to find similar styles. Anthropic Claude 3 Sonnet orchestrates the agent’s actions, for example, fetching the current weather so the agent can make weather-appropriate outfit recommendations. A simple web UI built with Streamlit gives the user a convenient way to interact with the agent.
The fashion assistant agent can be smoothly integrated into existing ecommerce platforms or mobile applications, providing customers with a seamless and delightful experience. Customers can upload their own images, describe their desired style, or even provide a reference image, and the agent will generate personalized recommendations and visual inspirations.
The code used in this solution is available in the GitHub repository.
Solution overview
The fashion assistant agent uses the power of Amazon Titan models and Amazon Bedrock Agents to provide users with a comprehensive set of style-related functionalities:

Image-to-image or text-to-image search – This tool allows customers to find products similar to styles they like from the catalog, enhancing their user experience. We use the Titan Multimodal Embeddings model to embed each product image and store the embeddings in Amazon OpenSearch Serverless for future retrieval (a minimal embedding sketch follows this list).
Text-to-image generation – If the desired style is not available in the database, this tool generates unique, customized images based on the user’s query, enabling the creation of personalized styles.
Weather API connection – By fetching weather information for a given location mentioned in the user’s prompt, the agent can suggest appropriate styles for the occasion, making sure the customer is dressed for the weather.
Outpainting – Users can upload an image and request to change the background, allowing them to visualize their preferred styles in different settings.
Inpainting – This tool enables users to modify specific clothing items in an uploaded image, such as changing the design or color, while keeping the background intact.
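The repository wires these tools together through Amazon Bedrock Agents; as a rough illustration of the first tool, here is a minimal sketch of embedding a catalog image with the Titan Multimodal Embeddings model via the Bedrock runtime. The model ID, payload fields, and file names are assumptions based on the public Bedrock API and may differ from the actual code in the repo.

import base64
import json
import boto3

bedrock = boto3.client("bedrock-runtime")  # assumes AWS credentials and Region are configured

def embed_product_image(image_path, description=None):
    """Return a multimodal embedding vector for a catalog image (plus optional text)."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    body = {"inputImage": image_b64}
    if description:
        body["inputText"] = description
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-image-v1",   # Titan Multimodal Embeddings (assumed model ID)
        contentType="application/json",
        body=json.dumps(body),
    )
    return json.loads(response["body"].read())["embedding"]

# Each vector would then be written to an OpenSearch Serverless k-NN index
# (see opensearch_ingest.ipynb in the repository) to power image-to-image search.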

The following flow chart illustrates the decision-making process:

And the corresponding architecture diagram:

Prerequisites
To set up the fashion assistant agent, make sure you have the following:

An active AWS account and AWS Identity and Access Management (IAM) role with Amazon Bedrock, AWS Lambda, and Amazon Simple Storage Service (Amazon S3) access
Installation of required Python libraries such as Streamlit
Anthropic Claude 3 Sonnet, Amazon Titan Image Generator and Amazon Titan Multimodal Embeddings models enabled in Amazon Bedrock. You can confirm these are enabled on the Model access page of the Amazon Bedrock console. If these models are enabled, the access status will show as Access granted, as shown in the following screenshot.

Before executing the notebook provided in the GitHub repo to start building the infrastructure, make sure your AWS account has permission to:

Create managed IAM roles and policies
Create and invoke Lambda functions
Create, read from, and write to S3 buckets
Access and manage Amazon Bedrock agents and models

If you want to enable the image-to-image or text-to-image search capabilities, additional permissions for your AWS account are required:

Create a security policy, access policy, collection, index, and index mapping on OpenSearch Serverless
Call the BatchGetCollection API on OpenSearch Serverless

Set up the fashion assistant agent
To set up the fashion assistant agent, follow these steps:

Clone the GitHub repository using the command

git clone

Complete the prerequisites to grant sufficient permissions
Follow the deployment steps outlined in the README.md
(Optional) If you want to use the image_lookup feature, execute code snippets in opensearch_ingest.ipynb to use Amazon Titan Multimodal Embeddings to embed and store sample images
Run the Streamlit UI to interact with the agent using the command

streamlit run frontend/app.py

By following these steps, you can create a powerful and engaging fashion assistant agent that combines the capabilities of Amazon Titan models with the automation and decision-making capabilities of Amazon Bedrock Agents.
Test the fashion assistant
After the fashion assistant is set up, you can interact with it through the Streamlit UI. Follow these steps:

Navigate to your Streamlit UI, as shown in the following screenshot

Upload an image or enter a text prompt describing the desired style, according to the desired action, for example, image search, image generation, outpainting, or inpainting. The following screenshot shows an example prompt.

Press enter to send the prompt to the agent. You can view the chain-of-thought (CoT) process of the agent in the UI, as shown in the following screenshot

When the response is ready, you can view the agent’s response in the UI, as shown in the following screenshot. The response may include generated images, similar style recommendations, or modified images based on your request. You can download the generated images directly from the UI or check the image in your S3 bucket.

Clean up
To avoid unnecessary costs, make sure to delete the resources used in this solution. You can do this by running the following command.

cdk destroy

Conclusion
The fashion assistant agent, powered by Amazon Titan models and Amazon Bedrock Agents, is an example of how retailers can create innovative applications that enhance the customer experience and drive business growth. By using this solution, retailers can gain a competitive edge, offering personalized style recommendations, visual inspirations, and interactive fashion advice to their customers.
We encourage you to explore the potential of building more agents like this fashion assistant by checking out the examples available on the aws-samples GitHub repository.

 About the Authors
Akarsha Sehwag is a Data Scientist and ML Engineer in AWS Professional Services with over 5 years of experience building ML-based solutions. Leveraging her expertise in computer vision and deep learning, she empowers customers to harness the power of ML in the AWS Cloud efficiently. With the advent of generative AI, she has worked with numerous customers to identify good use cases and build them into production-ready solutions.
Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a Generative AI Specialist, helping customers leverage GenAI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a Ph.D. degree in Electrical Engineering. Outside of work, she loves traveling, working out and exploring new things.
Antonia Wiebeler is a Data Scientist at the AWS Generative AI Innovation Center, where she enjoys building proofs of concept for customers. Her passion is exploring how generative AI can solve real-world problems and create value for customers. While she is not coding, she enjoys running and competing in triathlons.
Alex Newton is a Data Scientist at the AWS Generative AI Innovation Center, helping customers solve complex problems with generative AI and machine learning. He enjoys applying state of the art ML solutions to solve real world challenges. In his free time you’ll find Alex playing in a band or watching live music.
Chris Pecora is a Generative AI Data Scientist at Amazon Web Services. He is passionate about building innovative products and solutions while also focused on customer-obsessed science. When not running experiments and keeping up with the latest developments in generative AI, he loves spending time with his kids.
Maira Ladeira Tanke is a Senior Generative AI Data Scientist at AWS. With a background in machine learning, she has over 10 years of experience architecting and building AI applications with customers across industries. As a technical lead, she helps customers accelerate their achievement of business value through generative AI solutions on Amazon Bedrock. In her free time, Maira enjoys traveling, playing with her cat, and spending time with her family someplace warm.

Google Ads Customer Match: How to Use First-Party Data for Smarter Retargeting

For as long as I can remember, advertisers have been hesitant to send their first-party data to Google Ads audiences. Why? The onus of privacy.

Advertisers, and specifically agencies, didn’t want to be the ones responsible for leaking any personal customer info. And let’s be real – Google made it quite clear they wouldn’t be taking responsibility. 

So, for years, as privacy moved to the forefront of technology, cookies lessened, and targeting capabilities became worse and worse, we’ve just simply dealt with it. 

Until now. 

In September 2024, Google Ads Liaison Ginny Marvin announced confidential customer match, a way to securely connect your first-party data to Google Ads for Customer Match. Here’s the breakdown:

“Confidential matching is powered by a technology called confidential computing, which uses special software, and hardware called a trusted execution environment or TEE (you may recall we mentioned this at GML this year) to securely process data. 

Confidential matching can ensure your data remains encrypted and unseen by anyone, including Google.

Advertisers also have the option to encrypt their data themselves and receive proof that their data is processed as intended.”

At the same time, here at Customers.ai, we were launching our first-party Google Ads integration, which also safely and securely sends your first-party audience data to your Google Ads customer match list. 

Big news, right?!

It sure is! The benefits of this are unreal and we truly believe every Google advertiser should be using this. 

So, let’s get into Google Ads Customer Match, the power of first-party data, and how to start sending your first-party Customers.ai audience data to Google. 

What is Google Ads Customer Match?

How Google Ads Customer Match Works

What are the Benefits of Google Ads Customer Match?

What Are The Problems With Google Ads Customer Match?

How Does Customers.ai Work with Google Ads Customer Match?

What’s a Good Use Case to Leverage Customers.ai & Google Ads Customer Match?

How to Sync Customers.ai with Google Ads Customer Match

What is Google Ads Customer Match?

Before we get into the nitty-gritty, let’s make sure we address the most basic question – what is Google Ads Customer Match?

Google Ads Customer Match is a feature that lets you use the data you already have—like email addresses, phone numbers, and mailing addresses—to target your ads more precisely across Google platforms like Search, Shopping, Gmail, YouTube, and Display. 

By uploading a list of customer details, Google matches that data to its users, allowing you to reach both your current customers and others who share similar characteristics. 

It’s a highly effective way to connect your brand’s database with targeted outreach, helping you create ads that hit the right audience at the right time – something we all want, right?

How Google Ads Customer Match Works

At the most basic level, the way Google Ads Customer Match works is that you upload your customer data and Google matches it to its users. 

The more complete your data (names, addresses, and phone numbers vs. just an email), the better the match results. Once matched, you can target these users with ads across Google platforms. 

Just make sure your account meets Google’s requirements, like having 1,000 active users and a good compliance history.
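Under the hood, identifiers such as email addresses are matched against hashed values: Google expects customer data to be normalized and SHA-256 hashed before upload (tools like Customers.ai handle this automatically when syncing audiences). Here is a minimal Python sketch of that normalization-and-hashing step, assuming a plain list of email addresses; Google’s additional normalization rules for other identifier types are not shown.

import hashlib

def hash_email_for_customer_match(email):
    """Lowercase, trim, and SHA-256 hash an email address, the hashed format
    Google Ads expects for Customer Match uploads."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

emails = ["Jane.Doe@example.com ", "buyer@shop.io"]   # illustrative addresses only
hashed_emails = [hash_email_for_customer_match(e) for e in emails]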

What are the Benefits of Google Ads Customer Match?

The benefit of Google Ads Customer Match is simple – first-party data is better than third-party data and certainly better than Google’s technology “guessing” who your audience is.

While that is the most basic benefit, we’d be remiss if we didn’t discuss the other core benefits of using a Google Ads customer match list:

Better Targeting: Customer Match lets you reach in-market shoppers by using your first-party data to find customers who are actively looking for products like yours. More in-market shoppers means more sales.

Retargeting Reach: As we’ve discussed plenty, privacy updates have really made retargeting a challenge. With customer match data, you are giving Google the people it may have missed, allowing you to retarget customers who have already interacted with your brand.

Optimized Ad Spend / Higher ROAS: By focusing on high-intent audiences, Customer Match helps optimize your spend, resulting in a higher return on ad spend (ROAS) by minimizing wasted impressions and clicks.

Personalization: I think most people will tell you personalization is one of the biggest benefits, especially when it comes to Gmail or Shopping ads. With Customer Match, you can tailor your ads based on your customer data, creating a more personalized experience that resonates with your audience and increases engagement.

Repeat Purchases: Customer Match helps you stay connected with past customers, encouraging repeat purchases through targeted ads that showcase new offers or products they may be interested in.

Lookalike Audiences: Use your existing customer data to create lookalike audiences that share similar traits with your best customers, expanding your reach to potential new buyers who are likely to convert.

It’s important to remember that with any ad platform, the results are only as good as the data you provide. Customer Match isn’t a miracle, it’s simply a way to give Google the data it needs to perform.

What Are The Problems With Google Ads Customer Match?

Like anything, Google Ads Customer Match isn’t perfect. While it’s a great tool for targeting the right people, there are a few hurdles. 

First and foremost, you need a solid dataset—if your customer info is incomplete or outdated, match rates will suffer. You also need at least 1,000 active users to even start. 

Now, had we written this article prior to September, we’d also note that you need to comply with Google’s strict privacy policies and data requirements, which can be time-consuming to manage and verify. 

However, Google’s new confidential customer match addresses some of these pain points. With its confidential computing, your data stays secure and encrypted, which takes some of the stress off managing privacy and compliance.

How Does Customers.ai Work with Google Ads Customer Match?

If you aren’t already familiar with Customers.ai, we are a website visitor identification and remarketing solution. We are also an official Google partner.

What does that mean?

Essentially, using a simple pixel, we can ID 20-30% of the people visiting your website – the anonymous website visitors you don’t know. We’re talking names, emails, phone numbers, addresses, business info, demographics, and more.

Once we identify those website visitors, you can put those people into custom audiences based on everything from location, product viewed, number of visits, intent, and more. 

Here’s the best part – our platform will then automatically sync your audiences to Google Ads Customer Match List audiences. No need for cumbersome CSV uploads.

Now, you can show ads to these high-intent visitors who are right on the cusp of buying but might need one more touchpoint to put them over the edge.

What’s a Good Use Case to Leverage Customers.ai & Google Ads Customer Match?

Let’s say you’re an online retailer selling home fitness equipment. 

You’ve already collected customer data from site visitors, so now you use Customers.ai and Google Ads Customer Match to retarget them across multiple platforms. 

Your ads could appear on the Search Network or Shopping tab if they’re looking for fitness gear again, but here’s where it gets interesting — Gmail. 

Instead of waiting for them to search again, your brand shows up in their inbox, creating a natural touchpoint when they’re more likely to engage. 

The same goes for YouTube or Display, reinforcing your brand in places where your customers are spending time. 

Gmail, in particular, feels like a smart follow-up because it’s more personal—delivered right where they’re likely to see it, offering a more seamless continuation of their journey.

Here’s a real-life example we just saw with a customer the other day: 

A shopper visits multiple pages on the customer site, we ID them, and they get added to the Customers.ai High Engaged audience list in Klaviyo and synced to both Meta and Google Ads.

The shopper comes back a day later through a Facebook ad and they add an item to the cart. 

We sent them through the abandoned cart flow in Klaviyo and over 4 days, we hit them with 3 targeted emails.

On the 4th day, they search Google, click a Google ad, and boom—place a $216 order!

Over the course of that 6-day journey, there were 33 pageviews and 3 different channel touchpoints. 

By ensuring that the shopper was synced across all channels (INCLUDING GOOGLE ADS), the customer was able to seal the deal. Pretty cool, right?

How to Sync Customers.ai with Google Ads Customer Match

Integrating Customers.ai with Google Ads is easy and requires no development work on your side. Let’s walk through how to set it up in your account:

1. Navigate to Google Ads under Restore

2. Connect Your Google Account 

3. Create Your Google Ads Audiences

Note: You can sync multiple audiences based on the audiences you have already created.

4. Start using your audiences in Google Ads!

Give Your Google Ads Customer Match List the Data it Needs

If you’re ready to step up your Google Ads targeting game, now’s the time. 

With Customers.ai and Google Ads Customer Match, you’re not just uploading lists—you’re building real, multi-channel connections with your audience. You saw that example I showed, right?

Syncing high-intent data across platforms like Gmail, YouTube, and Display keeps your brand in front of your customers when it matters most. 

So, why wait? 

Get access to our Google Ads Restore and start feeding your Google Ads Customer Match lists real-time, actionable data to drive better results without the hassle. 

Let’s make those touchpoints count!

See Who Is On Your Site Right Now!

Turn anonymous visitors into genuine contacts.

Try it Free, No Credit Card Required

Get The X-Ray Pixel

Important Next Steps

See what targeted outbound marketing is all about. Capture and engage your first 500 website visitor leads with Customers.ai X-Ray website visitor identification for free.

Talk and learn about sales outreach automation with other growth enthusiasts. Join Customers.ai Island, our Facebook group of 40K marketers and entrepreneurs who are ready to support you.

Advance your marketing performance with Sales Outreach School, a free tutorial and training area for sales pros and marketers.

Google Ads Customer Match FAQs

What data can be used for Customer Match?

You can use email addresses, phone numbers, mailing addresses, and other personal data to create a Customer Match list, provided the data meets Google’s privacy standards and requirements.

What platforms support Customer Match?

Customer Match allows you to target users across Google’s platforms, including Search, Gmail, YouTube, Shopping, and the Google Display Network.

What are the eligibility requirements for Customer Match?

To use Customer Match, your Google Ads account must be at least 90 days old, have a history of good compliance, and have spent more than $50,000 on Google Ads.

How can I improve match rates in Google Ads Customer Match?

Providing more complete and up-to-date data, such as full names, email addresses, and phone numbers, improves the match rate. The more data you provide, the more likely it will match with Google’s user base.

What are the benefits of using Google Ads Customer Match?

Benefits include better targeting with first-party data, increased retargeting reach, improved ROAS, personalization of ads, and the ability to create lookalike audiences based on your customer data.

Can I use Customer Match for retargeting?

Yes, Customer Match is great for retargeting. It allows you to show ads to people who have already interacted with your brand or visited your website, keeping your brand top of mind.

Is Customer Match secure?

Yes, Google uses the SHA-256 algorithm to hash customer data. Additionally, Google’s confidential computing ensures that data remains encrypted and unseen by anyone, including Google.

How do I create a Customer Match list?

You can create a Customer Match list using Customers.ai, by uploading your customer data (in CSV format) or using the API in your Google Ads account. Google will then match the data to its users.

What are the minimum requirements for a Customer Match list?

Your list must contain at least 1,000 active users for Google to allow you to run a Customer Match campaign. Smaller lists won’t be eligible.

Can I use Customer Match for lookalike audiences?

Yes, once you have a Customer Match list, you can create lookalike or similar audiences to target people who share similar traits with your existing customers.

What happens if my data is incomplete?

Incomplete data can result in lower match rates. For example, if you only provide an email without a phone number or other identifiers, it may reduce the chances of successfully matching that user to a Google account.

How often should I update my Customer Match list?

It’s recommended to update your Customer Match list regularly to ensure the data is accurate and recent. Keeping your list up to date can improve match rates and ad performance.

What is confidential customer matching in Google Ads?

Confidential customer matching is a feature introduced in 2024 that uses confidential computing to securely process and encrypt customer data, ensuring it remains private and unseen by anyone, including Google.

What’s the difference between Customer Match and remarketing?

Customer Match uses your first-party data to target specific users, while remarketing targets users who have visited your website or interacted with your brand, regardless of their personal data.

Can I create multiple Customer Match lists?

Yes, you can create multiple lists based on different audience segments, such as customers who purchased specific products or leads from certain campaigns. Each list can be used for specific targeting strategies.

What are the limitations of Google Ads Customer Match?

Limitations include the need for a large dataset (at least 1,000 users), potential low match rates if data is incomplete, and the requirement for good compliance and ad spend history.

How can I use Customers.ai with Google Ads Customer Match?

Customers.ai allows you to identify anonymous website visitors and sync their data automatically to your Customer Match lists in Google Ads, streamlining the process and improving targeting with real-time data.

How can I get started with Google Ads Customer Match?

To get started, make sure your account meets the eligibility requirements. Collect and organize your first-party data, then upload it to Google Ads. Once uploaded, create targeted campaigns using the matched users to start reaching your ideal audience.
The post Google Ads Customer Match: How to Use First-Party Data for Smarter Retargeting appeared first on Customers.ai.

Mirage: A Multi-Level Tensor Algebra Super-Optimizer that Automates GPU Kernel Generation for PyTorch Applications

With the rapid growth of artificial intelligence, driven by the introduction of large language models (LLMs) and generative AI, there has been a growing demand for more efficient graphics processing units (GPUs). GPUs are specialized hardware extensively used for high-compute tasks and capable of executing computations in parallel. Writing proper GPU kernels is important to utilize GPUs to their full potential. This task is time-consuming and complex, requiring deep expertise in GPU architecture and in programming languages such as C++ and CUDA.

Machine learning (ML) compilers like TVM, Triton, and Mojo provide some automation but still require manual handling of GPU kernels to obtain optimal results. To achieve optimal results without this manual work, researchers at Carnegie Mellon University have developed Mirage, an innovative tool designed to automate the generation of high-performance GPU kernels by searching for and generating them. The kernels generated by Mirage can be used directly on PyTorch tensors and called from PyTorch programs. Users need to write only a few lines of code in Mirage, compared with the many lines required by a traditional hand-written implementation.

Mirage has the potential to change how GPU kernels are written, bringing higher productivity, better performance, and stronger correctness to AI applications. Writing kernel code manually requires substantial engineering expertise because of the complex nature of GPU architecture, but Mirage simplifies the process by automatically generating kernels, easing the task for engineers.

Manually written GPU kernels can also contain errors that make it hard to achieve the required results, but research on Mirage has shown that kernels generated by Mirage are 1.2x-2.5x faster than the best human-written code. Also, integrating Mirage into PyTorch reduces overall latency by 15-20%.

# Use Mirage to generate GPU kernels for attention
import mirage as mi
graph = mi.new_kernel_graph()
Q = graph.new_input(dims=(64, 1, 128), dtype=mi.float16)
K = graph.new_input(dims=(64, 128, 4096), dtype=mi.float16)
V = graph.new_input(dims=(64, 4096, 128), dtype=mi.float16)
A = graph.matmul(Q, K)
S = graph.softmax(A)
O = graph.matmul(S, V)
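# Search for an optimized kernel graph that is functionally equivalent to the computation above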
optimized_graph = graph.superoptimize()

The Mirage version takes only a few lines of code, compared with the many lines required by a traditional hand-written implementation.

All the computations in GPUs are centered around kernels, which are functions that run in parallel across multiple streaming multiprocessors (SMs) in a single-program, multiple-data (SPMD) fashion. Kernels organize computation in a grid of thread blocks, with each thread block running on a single SM. Each block further has multiple threads that perform calculations on individual data elements.

A GPU follows a particular memory hierarchy:

Register File: For quick data access by individual threads.

Shared Memory: Shared by all threads in a block for efficient data exchange.

Device Memory: Accessible by all threads in a kernel.

The architecture is represented with the uGraph representation, which contains graphs at multiple levels: the kernel level, the thread block level, and the thread level. The kernel level encapsulates computation over the entire GPU, the thread block level addresses computation on an individual streaming multiprocessor (SM), and the thread level addresses computation at the CUDA or Tensor Core level. The uGraph provides a structured way to represent GPU computations.

Four Categories of GPU Optimization:

1. Normalization + Linear

LLMs generally use normalization techniques such as LayerNorm, RMSNorm, GroupNorm, and BatchNorm, which ML compilers often treat as separate kernels. This separation exists because normalization requires both reduction and broadcast operations. These normalization layers can be fused with the linear (matrix multiplication) layers that follow them.
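Mirage generates these fused kernels automatically; to see why the fusion is valid, here is a small numerical sketch (in plain PyTorch, not Mirage) showing that an RMSNorm scale can be folded into the weight matrix of the linear layer that follows it. The reduction still has to be computed, which is exactly the part a fused kernel keeps on-chip.

import torch

def rmsnorm(x, gamma, eps=1e-6):
    rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    return x / rms * gamma

d_in, d_out = 8, 16
x = torch.randn(4, d_in)
gamma = torch.randn(d_in)      # per-channel normalization scale
W = torch.randn(d_in, d_out)   # weight of the following linear layer

# Unfused: normalize, apply the scale, then run the matmul as separate ops.
y_unfused = rmsnorm(x, gamma) @ W

# Algebraically fused: fold the scale into the weight once (diag(gamma) @ W),
# leaving only the reduction and a single matmul for the runtime kernel.
W_folded = gamma.unsqueeze(1) * W
y_fused = rmsnorm(x, torch.ones(d_in)) @ W_folded

print(torch.allclose(y_unfused, y_fused, atol=1e-5))  # True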

2. LoRA + Linear

This optimization fuses low-rank adaptation (LoRA), a technique for adapting pre-trained models to new tasks or datasets with reduced computational requirements, with the linear layers it augments. The fused kernel is 1.6x faster than existing systems.
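For reference, the computation being fused is the base linear path plus the low-rank adapter path: per the description above, Mirage turns these separate matmuls and the addition into a single generated kernel rather than separate launches. A small PyTorch sketch of the unfused reference (dimensions are placeholders):

import torch

d_in, d_out, rank = 64, 64, 8
x = torch.randn(2, d_in)
W = torch.randn(d_in, d_out)          # frozen base weight
A = torch.randn(d_in, rank) * 0.01    # LoRA down-projection
B = torch.randn(rank, d_out) * 0.01   # LoRA up-projection

# Unfused reference: three matmuls and an addition, normally separate kernel launches.
y = x @ W + (x @ A) @ B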

3. Gated MLP

A gated MLP combines two matrix multiplications, a SiLU activation, and an element-wise multiplication. Fusing them reduces kernel launch overhead and device memory access, making the generated kernel 1.3x faster than the best baseline.
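For reference, the unfused gated-MLP pattern described above looks roughly like this in PyTorch (layer sizes are placeholders); Mirage’s benefit comes from generating one kernel for the whole pattern instead of launching each op separately.

import torch
import torch.nn.functional as F

hidden, inner = 512, 1376
x = torch.randn(4, hidden)
W_gate = torch.randn(hidden, inner)
W_up = torch.randn(hidden, inner)

# Two matmuls, a SiLU activation, and an element-wise multiplication:
# the pattern that gets fused into a single generated kernel.
gated = F.silu(x @ W_gate) * (x @ W_up)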

4. Attention variants

a. Query-Key Normalization 

Chameleon, ViT-22B, and a recent Google paper have introduced query-key normalization, which fuses LayerNorm into the attention kernel. The custom kernel generated by Mirage also performs existing GPU optimizations tailored for attention, delivering a 1.7x-2.5x performance improvement.
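As a point of reference, the query-key-normalization pattern itself can be written in a few lines of PyTorch: LayerNorm is applied to the queries and keys before the scaled dot-product. This reference version runs the steps as separate ops, whereas the custom kernel described above fuses them; the shapes and the use of PyTorch’s built-in attention are assumptions for illustration.

import torch
import torch.nn.functional as F

def qk_norm_attention(q, k, v):
    """Attention with LayerNorm applied to queries and keys before the dot product."""
    d = q.shape[-1]
    q = F.layer_norm(q, (d,))
    k = F.layer_norm(k, (d,))
    return F.scaled_dot_product_attention(q, k, v)

q = torch.randn(1, 8, 128, 64)   # (batch, heads, sequence, head_dim)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)
out = qk_norm_attention(q, k, v)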

Four categories of GPU optimization that are mostly missing in today’s ML systems

b. Multi-Head Latent Attention 

Multi-head latent attention optimizes memory usage by compressing the traditional attention key-value cache into a more compact latent vector. This change introduces two linear layers before attention. Mirage generates a custom kernel that integrates the linear layers with the attention mechanism into a single kernel, which avoids storing intermediate key-value vectors in GPU device memory.
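A rough sketch of the latent-compression idea, leaving out positional encodings, multiple heads, and the actual kernel fusion: the input is down-projected to a small latent, which is the only tensor that needs to be cached, and keys and values are re-expanded from it on the fly. All dimensions below are illustrative.

import torch

d_model, d_latent, d_head = 512, 64, 128
x = torch.randn(10, d_model)              # 10 token representations

W_down = torch.randn(d_model, d_latent)   # compress into the latent cache
W_uk = torch.randn(d_latent, d_head)      # up-project latent -> keys
W_uv = torch.randn(d_latent, d_head)      # up-project latent -> values

latent = x @ W_down    # only this compact tensor is stored between steps
k = latent @ W_uk      # reconstructed inside the fused kernel, not cached
v = latent @ W_uv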

In conclusion, Mirage addresses the critical challenge of writing high-performance GPU kernels for advanced artificial intelligence workloads. It eliminates the significant time investment, the need for deep coding expertise, and the risk of errors by providing optimized GPU kernels that work in a PyTorch-based environment. It also catches optimizations that manual coding might miss, accelerating the deployment of LLMs and other AI technologies across real-world applications.

The post Mirage: A Multi-Level Tensor Algebra Super-Optimizer that Automates GPU Kernel Generation for PyTorch Applications appeared first on MarkTechPost.