Implement a custom AutoML job using pre-selected algorithms in Amazon …

AutoML allows you to derive rapid, general insights from your data right at the beginning of a machine learning (ML) project lifecycle. Understanding up front which preprocessing techniques and algorithm types provide the best results reduces the time to develop, train, and deploy the right model. It plays a crucial role in every model’s development process and allows data scientists to focus on the most promising ML techniques. Additionally, AutoML provides a baseline model performance that can serve as a reference point for the data science team.
An AutoML tool applies a combination of different algorithms and various preprocessing techniques to your data. For example, it can scale the data, perform univariate feature selection, conduct PCA at different variance threshold levels, and apply clustering. Such preprocessing techniques could be applied individually or be combined in a pipeline. Subsequently, an AutoML tool would train different model types, such as Linear Regression, Elastic-Net, or Random Forest, on different versions of your preprocessed dataset and perform hyperparameter optimization (HPO). Amazon SageMaker Autopilot eliminates the heavy lifting of building ML models. After providing the dataset, SageMaker Autopilot automatically explores different solutions to find the best model. But what if you want to deploy your tailored version of an AutoML workflow?
This post shows how to create a custom-made AutoML workflow on Amazon SageMaker using Amazon SageMaker Automatic Model Tuning with sample code available in a GitHub repo.
Solution overview
For this use case, let’s assume you are part of a data science team that develops models in a specialized domain. You have developed a set of custom preprocessing techniques and selected a number of algorithms that you typically expect to work well with your ML problem. When working on new ML use cases, you would like first to perform an AutoML run using your preprocessing techniques and algorithms to narrow down the scope of potential solutions.
For this example, you don’t use a specialized dataset; instead, you work with the California Housing dataset that you will import from Amazon Simple Storage Service (Amazon S3). The focus is to demonstrate the technical implementation of the solution using SageMaker HPO, which later can be applied to any dataset and domain.
The following diagram presents the overall solution workflow.

Prerequisites
The following are prerequisites for completing the walkthrough in this post:

An AWS account
Familiarity with SageMaker concepts, such as an Estimator, training job, and HPO job
Familiarity with the Amazon SageMaker Python SDK
Python programming knowledge

Implement the solution
The full code is available in the GitHub repo.
The steps to implement the solution (as noted in the workflow diagram) are as follows:

Create a notebook instance and specify the following:

For Notebook instance type, choose ml.t3.medium.
For Elastic Inference, choose none.
For Platform identifier, choose Amazon Linux 2, Jupyter Lab 3.
For IAM role, choose the default AmazonSageMaker-ExecutionRole. If it doesn’t exist, create a new AWS Identity and Access Management (IAM) role and attach the AmazonSageMakerFullAccess IAM policy.

Note that you should create a minimally scoped execution role and policy in production.

Open the JupyterLab interface for your notebook instance and clone the GitHub repo.

You can do that by starting a new terminal session and running the git clone <REPO> command or by using the UI functionality, as shown in the following screenshot.

Open the automl.ipynb notebook file, select the conda_python3 kernel, and follow the instructions to trigger a set of HPO jobs.

To run the code without any changes, you need to increase two service quotas: ml.m5.large for training job usage and Number of instances across all training jobs. By default, both quotas allow only 20 parallel SageMaker training jobs, so request an increase to 30 for each. Quota changes of this size are typically approved within a few minutes. Refer to Requesting a quota increase for more information.

If you don’t want to change the quota, you can simply modify the value of the MAX_PARALLEL_JOBS variable in the script (for example, to 5).

Each HPO job will complete a set of training job trials and indicate the model with optimal hyperparameters.
Analyze the results and deploy the best-performing model.

This solution will incur costs in your AWS account. The cost of this solution will depend on the number and duration of HPO training jobs. As these increase, so will the cost. You can reduce costs by limiting training time and configuring TuningJobCompletionCriteriaConfig according to the instructions discussed later in this post. For pricing information, refer to Amazon SageMaker Pricing.
In the following sections, we discuss the notebook in more detail with code examples and the steps to analyze the results and select the best model.
Initial setup
Let’s start with running the Imports & Setup section in the custom-automl.ipynb notebook. It installs and imports all the required dependencies, instantiates a SageMaker session and client, and sets the default Region and S3 bucket for storing data.
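Conceptually, that setup boils down to something like the following sketch (sm_role and sm_client are the names used later in this post; sm_session, region, and bucket are illustrative assumptions, and the notebook may differ slightly):

import sagemaker
import boto3

# SageMaker session objects used throughout the notebook
sm_session = sagemaker.Session()
sm_client = boto3.client("sagemaker")
sm_role = sagemaker.get_execution_role()

# Default Region and S3 bucket for storing the data
region = sm_session.boto_region_name
bucket = sm_session.default_bucket()
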
Data preparation
Download the California Housing dataset and prepare it by running the Download Data section of the notebook. The dataset is split into training and testing data frames and uploaded to the SageMaker session default S3 bucket.
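In essence, the preparation resembles the following sketch; the source path, split ratio, and variable names are illustrative assumptions (the notebook stores the uploaded locations in objects referenced later as s3_data_train and s3_data_test):

import pandas as pd
from sklearn.model_selection import train_test_split

# Read the raw California Housing data (placeholder S3 path)
df = pd.read_csv("s3://<SOURCE-BUCKET>/california_housing.csv")

# Split into training and testing data frames and persist them locally as Parquet
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
train_df.to_parquet("train.parquet")
test_df.to_parquet("test.parquet")

# Upload both files to the SageMaker session default S3 bucket
train_s3_uri = sm_session.upload_data("train.parquet", bucket=bucket, key_prefix="data/train")
test_s3_uri = sm_session.upload_data("test.parquet", bucket=bucket, key_prefix="data/test")
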
The entire dataset has 20,640 records and 9 columns in total, including the target. The goal is to predict the median value of a house (medianHouseValue column). The following screenshot shows the top rows of the dataset.

Training script template
The AutoML workflow in this post is based on scikit-learn preprocessing pipelines and algorithms. The aim is to generate a large combination of different preprocessing pipelines and algorithms to find the best-performing setup. Let’s start with creating a generic training script, which is persisted locally on the notebook instance. In this script, there are two empty comment blocks: one for injecting hyperparameters and the other for the preprocessing-model pipeline object. They will be injected dynamically for each preprocessing model candidate. The purpose of having one generic script is to keep the implementation DRY (don’t repeat yourself).

# create base script
_script = """
import argparse
import joblib
import os
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.cluster import KMeans
from sklearn.linear_model import ElasticNet
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor

###########################
### Inference functions ###
###########################

def model_fn(model_dir):
    clf = joblib.load(os.path.join(model_dir, "model.joblib"))
    return clf

if __name__ == "__main__":

    print("Extracting arguments")
    parser = argparse.ArgumentParser()

    # Hyperparameters
    ##### WILL BE INSERTED DYNAMICALLY #####
    {}
    ############################

    # Data, model, and output directories
    parser.add_argument("--model-dir", type=str, default=os.environ.get("SM_MODEL_DIR"))
    parser.add_argument("--train", type=str, default=os.environ.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--test", type=str, default=os.environ.get("SM_CHANNEL_TEST"))
    parser.add_argument("--train-file", type=str, default="train.parquet")
    parser.add_argument("--test-file", type=str, default="test.parquet")
    parser.add_argument("--features", type=str)
    parser.add_argument("--target", type=str)
    args, _ = parser.parse_known_args()

    # Load and prepare data
    train_df = pd.read_parquet(os.path.join(args.train, args.train_file))
    test_df = pd.read_parquet(os.path.join(args.test, args.test_file))
    X_train = train_df[args.features.split()]
    X_test = test_df[args.features.split()]
    y_train = train_df[args.target]
    y_test = test_df[args.target]

    # Train model
    ##### WILL BE INSERTED DYNAMICALLY #####
    {}
    {}
    ############################
    pipeline = Pipeline([("preprocessor", preprocessor), ("model", model)])
    pipeline.fit(X_train, y_train)

    # Validate model and print metrics
    rmse = mean_squared_error(y_test, pipeline.predict(X_test), squared=False)
    print("RMSE: " + str(rmse))

    # Persist model
    path = os.path.join(args.model_dir, "model.joblib")
    joblib.dump(pipeline, path)
"""

# write _script to file just to have it in hand
with open("script_draft.py", "w") as f:
    print(_script, file=f)

Create preprocessing and model combinations
The preprocessors dictionary contains a specification of preprocessing techniques applied to all input features of the model. Each recipe is defined using a Pipeline or a FeatureUnion object from scikit-learn, which chains individual data transformations together or stacks their outputs. For example, mean-imp-scale is a simple recipe that imputes missing values with the mean of the respective column and scales all features using StandardScaler. In contrast, the mean-imp-scale-pca recipe chains together a few more operations:

Impute missing values in each column with its mean.
Apply feature scaling using mean and standard deviation.
Calculate PCA on top of the input data at a specified variance threshold value and merge it together with the imputed and scaled input features.

In this post, all input features are numeric. If you have more data types in your input dataset, you should specify a more complicated pipeline where different preprocessing branches are applied to different feature type sets.
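For illustration, such a mixed-type recipe could route numeric and categorical columns through separate branches with scikit-learn’s ColumnTransformer. The following is a hypothetical extension (the column names are placeholders) and is not part of the notebook:

from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

numeric_cols = ["medianIncome", "housingMedianAge"]  # placeholder numeric columns
categorical_cols = ["oceanProximity"]  # placeholder categorical column

preprocessor = ColumnTransformer([
    ("numeric", Pipeline([
        ("imputer", SimpleImputer(strategy="mean")),
        ("scaler", StandardScaler()),
    ]), numeric_cols),
    ("categorical", Pipeline([
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("encoder", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])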

preprocessors = {
    "mean-imp-scale": "preprocessor = Pipeline([('imputer', SimpleImputer(strategy='mean')), ('scaler', StandardScaler())])\n",

    "mean-imp-scale-knn": "preprocessor = FeatureUnion([('base-features', Pipeline([('imputer', SimpleImputer(strategy='mean')), ('scaler', StandardScaler())])), ('knn', Pipeline([('imputer', SimpleImputer(strategy='mean')), ('scaler', StandardScaler()), ('knn', KMeans(n_clusters=10))]))])\n",

    "mean-imp-scale-pca": "preprocessor = FeatureUnion([('base-features', Pipeline([('imputer', SimpleImputer(strategy='mean')), ('scaler', StandardScaler())])), ('pca', Pipeline([('imputer', SimpleImputer(strategy='mean')), ('scaler', StandardScaler()), ('pca', PCA(n_components=0.9))]))])\n"
}

The models dictionary contains specifications of different algorithms that you fit the dataset to. Every model type comes with the following specification in the dictionary:

script_output – Points to the location of the training script used by the estimator. This field is filled dynamically when the models dictionary is combined with the preprocessors dictionary.
insertions – Defines code that will be inserted into the script_draft.py and subsequently saved under script_output. The key “preprocessor” is intentionally left blank because this location is filled with one of the preprocessors in order to create multiple model-preprocessor combinations.
hyperparameters – A set of hyperparameters that are optimized by the HPO job.
include_cls_metadata – More configuration details required by the SageMaker Tuner class.

A full example of the models dictionary is available in the GitHub repository.

models = {
    "rf": {
        "script_output": None,
        "insertions": {
            # Arguments
            "arguments":
                "parser.add_argument('--n_estimators', type=int, default=100)\n"+
                "    parser.add_argument('--max_depth', type=int, default=None)\n"+
                "    parser.add_argument('--min_samples_leaf', type=int, default=1)\n"+
                "    parser.add_argument('--min_samples_split', type=int, default=2)\n"+
                "    parser.add_argument('--max_features', type=str, default='auto')\n",
            # Model call
            "preprocessor": None,
            "model_call": "model = RandomForestRegressor(n_estimators=args.n_estimators,max_depth=args.max_depth,min_samples_leaf=args.min_samples_leaf,min_samples_split=args.min_samples_split,max_features=args.max_features)\n"
        },
        "hyperparameters": {
            "n_estimators": IntegerParameter(100, 2000, "Linear"),
            "max_depth": IntegerParameter(1, 100, "Logarithmic"),
            "min_samples_leaf": IntegerParameter(1, 6, "Linear"),
            "min_samples_split": IntegerParameter(2, 20, "Linear"),
            "max_features": CategoricalParameter(["auto", "sqrt", "log2"]),
        },
        "include_cls_metadata": False,
    }
}

Next, let’s iterate through the preprocessors and models dictionaries and create all possible combinations. For example, if your preprocessors dictionary contains 10 recipes and you have 5 model definitions in the models dictionary, the newly created pipelines dictionary contains 50 preprocessor-model pipelines that are evaluated during HPO. Note that individual pipeline scripts are not created yet at this point. The next code block (cell 9) of the Jupyter notebook iterates through all preprocessor-model objects in the pipelines dictionary, inserts all relevant code pieces, and persists a pipeline-specific version of the script locally in the notebook. Those scripts are used in the next steps when creating individual estimators that you plug into the HPO job.

pipelines = {}
for model_name, model_spec in models.items():
    pipelines[model_name] = {}
    for preprocessor_name, preprocessor_spec in preprocessors.items():
        pipeline_name = f"{model_name}-{preprocessor_name}"
        pipelines[model_name][pipeline_name] = {}
        pipelines[model_name][pipeline_name]["insertions"] = {}
        pipelines[model_name][pipeline_name]["insertions"]["preprocessor"] = preprocessor_spec
        pipelines[model_name][pipeline_name]["hyperparameters"] = model_spec["hyperparameters"]
        pipelines[model_name][pipeline_name]["include_cls_metadata"] = model_spec["include_cls_metadata"]
        pipelines[model_name][pipeline_name]["insertions"]["arguments"] = model_spec["insertions"]["arguments"]
        pipelines[model_name][pipeline_name]["insertions"]["model_call"] = model_spec["insertions"]["model_call"]
        pipelines[model_name][pipeline_name]["script_output"] = f"scripts/{model_name}/script-{pipeline_name}.py"
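
The materialization step could look roughly like the following sketch, which fills the three {} placeholders of the generic template with the arguments, the preprocessor recipe, and the model call of each pipeline (the notebook’s actual cell may differ in details):

import os

for model_name, model_pipelines in pipelines.items():
    os.makedirs(f"scripts/{model_name}", exist_ok=True)
    for pipeline_name, spec in model_pipelines.items():
        # Inject the argparse arguments, the preprocessor recipe, and the model call into the template
        script_code = _script.format(
            spec["insertions"]["arguments"],
            spec["insertions"]["preprocessor"],
            spec["insertions"]["model_call"],
        )
        with open(spec["script_output"], "w") as f:
            f.write(script_code)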

Define estimators
You can now work on defining SageMaker Estimators that the HPO job uses after scripts are ready. Let’s start with creating a wrapper class that defines some common properties for all estimators. It inherits from the SKLearn class and specifies the role, instance count, and type, as well as which columns are used by the script as features and the target.

class SKLearnBase(SKLearn):
    def __init__(
        self,
        entry_point=".",  # intentionally left blank, will be overwritten in the next function
        framework_version="1.2-1",
        role=sm_role,
        instance_count=1,
        instance_type="ml.c5.xlarge",
        hyperparameters={
            "features": "medianIncome housingMedianAge totalRooms totalBedrooms population households latitude longitude",
            "target": "medianHouseValue",
        },
        **kwargs,
    ):
        super(SKLearnBase, self).__init__(
            entry_point=entry_point,
            framework_version=framework_version,
            role=role,
            instance_count=instance_count,
            instance_type=instance_type,
            hyperparameters=hyperparameters,
            **kwargs
        )

Let’s build the estimators dictionary by iterating through all scripts generated before and located in the scripts directory. You instantiate a new estimator using the SKLearnBase class, with a unique estimator name, and one of the scripts. Note that the estimators dictionary has two levels: the top level defines a pipeline_family. This is a logical grouping based on the type of models to evaluate and is equal to the length of the models dictionary. The second level contains individual preprocessor types combined with the given pipeline_family. This logical grouping is required when creating the HPO job.

estimators = {}
for pipeline_family in pipelines.keys():
    estimators[pipeline_family] = {}
    scripts = os.listdir(f"scripts/{pipeline_family}")
    for script in scripts:
        if script.endswith(".py"):
            estimator_name = script.split(".")[0].replace("_", "-").replace("script", "estimator")
            estimators[pipeline_family][estimator_name] = SKLearnBase(
                entry_point=f"scripts/{pipeline_family}/{script}",
                base_job_name=estimator_name,
            )

Define HPO tuner arguments
To optimize passing arguments into the HPO Tuner class, the HyperparameterTunerArgs data class is initialized with arguments required by the HPO class. It comes with a set of functions, which ensure HPO arguments are returned in a format expected when deploying multiple model definitions at once.

@dataclass
class HyperparameterTunerArgs:
    base_job_names: list[str]
    estimators: list[object]
    inputs: dict[str]
    objective_metric_name: str
    hyperparameter_ranges: list[dict]
    metric_definition: dict[str]
    include_cls_metadata: list[bool]

    def get_estimator_dict(self) -> dict:
        return {k: v for (k, v) in zip(self.base_job_names, self.estimators)}

    def get_inputs_dict(self) -> dict:
        return {k: v for (k, v) in zip(self.base_job_names, [self.inputs] * len(self.base_job_names))}

    def get_objective_metric_name_dict(self) -> dict:
        return {k: v for (k, v) in zip(self.base_job_names, [self.objective_metric_name] * len(self.base_job_names))}

    def get_hyperparameter_ranges_dict(self) -> dict:
        return {k: v for (k, v) in zip(self.base_job_names, self.hyperparameter_ranges)}

    def get_metric_definition_dict(self) -> dict:
        return {k: [v] for (k, v) in zip(self.base_job_names, [self.metric_definition] * len(self.base_job_names))}

    def get_include_cls_metadata_dict(self) -> dict:
        return {k: v for (k, v) in zip(self.base_job_names, self.include_cls_metadata)}

The next code block uses the previously introduced HyperparameterTunerArgs data class. You create another dictionary called hp_args and generate a set of input parameters specific to each estimator_family from the estimators dictionary. These arguments are used in the next step when initializing HPO jobs for each model family.

hp_args = {}
for estimator_family, estimators in estimators.items():
    hp_args[estimator_family] = HyperparameterTunerArgs(
        base_job_names=list(estimators.keys()),
        estimators=list(estimators.values()),
        inputs={"train": s3_data_train.uri, "test": s3_data_test.uri},
        objective_metric_name="RMSE",
        hyperparameter_ranges=[pipeline.get("hyperparameters") for pipeline in pipelines[estimator_family].values()],
        metric_definition={"Name": "RMSE", "Regex": "RMSE: ([0-9.]+).*$"},
        include_cls_metadata=[pipeline.get("include_cls_metadata") for pipeline in pipelines[estimator_family].values()],
    )

Create HPO tuner objects
In this step, you create individual tuners for every estimator_family. Why do you create three separate HPO jobs instead of launching just one across all estimators? The HyperparameterTuner class is restricted to 10 model definitions attached to it. Therefore, each HPO is responsible for finding the best-performing preprocessor for a given model family and tuning that model family’s hyperparameters.
The following are a few more points regarding the setup:

The optimization strategy is Bayesian, which means that the HPO actively monitors the performance of all trials and navigates the optimization towards more promising hyperparameter combinations. Early stopping should be set to Off or Auto when working with a Bayesian strategy, which handles that logic itself.
Each HPO job runs for a maximum of 100 jobs and runs 10 jobs in parallel. If you’re dealing with larger datasets, you might want to increase the total number of jobs.
Additionally, you may want to use settings that control how long a job runs and how many jobs your HPO is triggering. One way to do that is to set the maximum runtime in seconds (for this post, we set it to 1 hour). Another is to use the recently released TuningJobCompletionCriteriaConfig. It offers a set of settings that monitor the progress of your jobs and decide whether it is likely that more jobs will improve the result. In this post, we set the maximum number of training jobs not improving to 20. That way, if the score isn’t improving (for example, from the fortieth trial), you won’t have to pay for the remaining trials until max_jobs is reached.

STRATEGY = "Bayesian"
OBJECTIVE_TYPE = "Minimize"
MAX_JOBS = 100
MAX_PARALLEL_JOBS = 10
MAX_RUNTIME_IN_SECONDS = 3600
EARLY_STOPPING_TYPE = "Off"
# RANDOM_SEED = 42 # uncomment if you require reproducibility across runs
TUNING_JOB_COMPLETION_CRITERIA_CONFIG = TuningJobCompletionCriteriaConfig(
    max_number_of_training_jobs_not_improving=20,
)

tuners = {}
for estimator_family, hp in hp_args.items():
    tuners[estimator_family] = HyperparameterTuner.create(
        estimator_dict=hp.get_estimator_dict(),
        objective_metric_name_dict=hp.get_objective_metric_name_dict(),
        hyperparameter_ranges_dict=hp.get_hyperparameter_ranges_dict(),
        metric_definitions_dict=hp.get_metric_definition_dict(),
        strategy=STRATEGY,
        completion_criteria_config=TUNING_JOB_COMPLETION_CRITERIA_CONFIG,
        objective_type=OBJECTIVE_TYPE,
        max_jobs=MAX_JOBS,
        max_parallel_jobs=MAX_PARALLEL_JOBS,
        max_runtime_in_seconds=MAX_RUNTIME_IN_SECONDS,
        base_tuning_job_name=f"custom-automl-{estimator_family}",
        early_stopping_type=EARLY_STOPPING_TYPE,  # early stopping of training jobs is not currently supported when multiple training job definitions are used
        # random_seed=RANDOM_SEED,
    )

Now let’s iterate through the tuners and hp_args dictionaries and trigger all HPO jobs in SageMaker. Note the usage of the wait argument set to False, which means that the kernel won’t wait until the results are complete and you can trigger all jobs at once.
It’s likely that not all training jobs will complete, and some of them might be stopped by the HPO job. The reason for this is the TuningJobCompletionCriteriaConfig: the optimization finishes if any of the specified criteria are met. In this case, it stops when the objective metric hasn’t improved for 20 consecutive training jobs.

for tuner, hpo in zip(tuners.values(), hp_args.values()):
    tuner.fit(
        inputs=hpo.get_inputs_dict(),
        include_cls_metadata=hpo.get_include_cls_metadata_dict(),
        wait=False,
    )

Analyze results
Cell 15 of the notebook checks if all HPO jobs are complete and combines all results in the form of a pandas data frame for further analysis. Before analyzing the results in detail, let’s take a high-level look at the SageMaker console.
At the top of the Hyperparameter tuning jobs page, you can see your three launched HPO jobs. All of them finished early and didn’t perform all 100 training jobs. In the following screenshot, you can see that the Elastic-Net model family completed the highest number of trials, whereas others didn’t need so many training jobs to find the best result.

You can open the HPO job to access more details, such as individual training jobs, job configuration, and the best training job’s information and performance.
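If you prefer to pull the same information programmatically, each tuner exposes an analytics object whose results can be concatenated into a single pandas data frame. The following is a simplified sketch that also adds the TrainingJobFamily column used later when selecting the best model; the notebook’s cell may differ in details:

import pandas as pd

results = []
for estimator_family, tuner in tuners.items():
    family_df = tuner.analytics().dataframe()
    family_df["TrainingJobFamily"] = estimator_family
    results.append(family_df)

df_tuner_results = pd.concat(results, ignore_index=True)
df_tuner_results.sort_values("FinalObjectiveValue").head()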

Let’s produce a visualization based on the results to get more insights into the AutoML workflow performance across all model families.
From the following graph, you can conclude that the Elastic-Net model’s performance oscillated between 70,000 and 80,000 RMSE and eventually stalled, because the algorithm wasn’t able to improve despite trying various preprocessing techniques and hyperparameter values. RandomForest performance varied a lot depending on the hyperparameter set explored by HPO, but despite many trials it couldn’t go below an RMSE of 50,000. GradientBoosting achieved the best performance from the start, going below 50,000 RMSE; HPO tried to improve that result further but wasn’t able to find better hyperparameter combinations. A general conclusion for all HPO jobs is that relatively few training jobs were required to find the best-performing set of hyperparameters for each algorithm. To further improve the result, you would need to experiment with creating more features and performing additional feature engineering.

You can also examine a more detailed view of the model-preprocessor combination to draw conclusions about the most promising combinations.

Select the best model and deploy it
The following code snippet selects the best model based on the lowest achieved objective value. You can then deploy the model as a SageMaker endpoint.

df_best_job = df_tuner_results.loc[df_tuner_results["FinalObjectiveValue"] == df_tuner_results["FinalObjectiveValue"].min()]
df_best_job
BEST_MODEL_FAMILY = df_best_job["TrainingJobFamily"].values[0]

tuners.get(BEST_MODEL_FAMILY).best_training_job()

tuners.get(BEST_MODEL_FAMILY).best_estimator()

predictor = tuners.get(BEST_MODEL_FAMILY).deploy(
    initial_instance_count=1,
    instance_type="ml.c4.large",
    endpoint_name=f"custom-automl-endpoint-{BEST_MODEL_FAMILY}",
)
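
Once the endpoint is in service, you can send a quick test request. The following is a hedged sketch; test_df and the feature list are assumptions based on the data preparation and estimator configuration described earlier:

# Score a few rows of the test set against the deployed endpoint
features = "medianIncome housingMedianAge totalRooms totalBedrooms population households latitude longitude".split()
sample = test_df[features].head(5).to_numpy()
print(predictor.predict(sample))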

Clean up
To prevent unwanted charges to your AWS account, we recommend deleting the AWS resources that you used in this post:

On the Amazon S3 console, empty the data from the S3 bucket where the training data was stored.

On the SageMaker console, stop the notebook instance.

Delete the model endpoint if you deployed it. Endpoints should be deleted when no longer in use, because they’re billed by time deployed.

sm_client.delete_endpoint(EndpointName=predictor.endpoint)
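
If you prefer to empty the training data programmatically instead of using the Amazon S3 console, the following hedged sketch with boto3 shows one way to do it (the prefix is a placeholder; double-check the bucket and prefix before deleting anything):

import boto3

s3 = boto3.resource("s3")
# Delete only the objects under the data prefix used for this walkthrough (placeholder prefix)
s3.Bucket(bucket).objects.filter(Prefix="data/").delete()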

Conclusion
In this post, we showcased how to create a custom HPO job in SageMaker using a custom selection of algorithms and preprocessing techniques. In particular, this example demonstrates how to automate the process of generating many training scripts and how to use Python programming structures for efficient deployment of multiple parallel optimization jobs. We hope this solution will form the scaffolding of any custom model tuning jobs you deploy using SageMaker to achieve higher performance and speed up your ML workflows.
Check out the following resources to further deepen your knowledge of how to use SageMaker HPO:

Best Practices for Hyperparameter Tuning
Amazon SageMaker Automatic Model Tuning now supports three new completion criteria for hyperparameter optimization
Using Scikit-learn with the SageMaker Python SDK
Develop, Train, Optimize and Deploy Scikit-Learn Random Forest

About the Authors
Konrad Semsch is a Senior ML Solutions Architect at Amazon Web Services Data Lab Team. He helps customers use machine learning to solve their business challenges with AWS. He enjoys inventing and simplifying to enable customers with simple and pragmatic solutions for their AI/ML projects. He is most passionate about MLOps and traditional data science. Outside of work, he is a big fan of windsurfing and kitesurfing.
Tuna Ersoy is a Senior Solutions Architect at AWS. Her primary focus is helping Public Sector customers adopt cloud technologies for their workloads. She has a background in application development, enterprise architecture, and contact center technologies. Her interests include serverless architectures and AI/ML.

Best prompting practices for using the Llama 2 Chat LLM through Amazon …

Llama 2 stands at the forefront of AI innovation, embodying an advanced auto-regressive language model developed on a sophisticated transformer foundation. It’s tailored to address a multitude of applications in both the commercial and research domains with English as the primary linguistic concentration. Its model parameters scale from an impressive 7 billion to a remarkable 70 billion. Llama 2 demonstrates the potential of large language models (LLMs) through its refined abilities and precisely tuned performance.
Diving deeper into Llama 2’s architecture, Meta reveals that the model’s fine-tuning melds supervised fine-tuning (SFT) with reinforcement learning aided by human feedback (RLHF). This combination prioritizes alignment with human-centric norms, striking a balance between efficiency and safety. Built upon a vast reservoir of 2 trillion tokens, Llama 2 provides both pre-trained models for diverse natural language generation and the specialized Llama-2-Chat variant for chat assistant roles. Regardless of a developer’s choice between the basic or the advanced model, Meta’s responsible use guide is an invaluable resource for model enhancement and customization.
For those interested in creating interactive applications, Llama 2 Chat is a good starting point. This conversational model allows for building customized chatbots and assistants. To make it even more accessible, you can deploy Llama-2-Chat models with ease through Amazon SageMaker JumpStart. An offering from Amazon SageMaker, SageMaker JumpStart provides a straightforward way to deploy Llama-2 model variants directly through Amazon SageMaker Studio notebooks. This enables developers to focus on their application logic while benefiting from SageMaker tools for scalable AI model training and hosting. SageMaker JumpStart also provides effortless access to the extensive SageMaker library of algorithms and pre-trained models.
In this post, we explore best practices for prompting the Llama 2 Chat LLM. We highlight key prompt design approaches and methodologies by providing practical examples.
Prerequisites
To try out the examples and recommended best practices for Llama 2 Chat on SageMaker JumpStart, you need the following prerequisites:

An AWS account that will contain all your AWS resources.
An AWS Identity and Access Management (IAM) role to access SageMaker. To learn more about how IAM works with SageMaker, refer to Identity and Access Management for Amazon SageMaker.
Access to SageMaker Studio or a SageMaker notebook instance or an interactive development environment (IDE) such as PyCharm or Visual Studio Code. We recommend using SageMaker Studio notebooks for straightforward deployment and inference.
The GitHub repository cloned in order to use the accompanying notebook.
An instance of Llama 2 Chat model deployed on SageMaker using SageMaker JumpStart. To learn more, refer to Llama 2 foundation models from Meta are now available in Amazon SageMaker JumpStart. The accompanying notebook also contains code to deploy the model.

Prompting techniques
Prompting, in the context of language models and artificial intelligence, refers to the practice of providing a model with a specific input or cue to elicit a desired response. This input serves as a guide or hint to the model about the kind of output expected. Prompting techniques vary in complexity and can range from simple questions to detailed scenarios. Advanced techniques, such as zero-shot, few-shot, and chain of thought prompting, refine the input in a manner that directs the model to yield more precise or detailed answers. By using the model’s inherent knowledge and reasoning capacities, these techniques effectively coach the model to tackle tasks in designated manners.
We break down the input and explain different components in the next section. We start by sharing some examples of what different prompt techniques look like. The examples are always shown in two code blocks. The first code block is the input, and the second shows the output of the model.
Zero-shot prompting
This method involves presenting a language model with a task or question it hasn’t specifically been trained for. The model then responds based on its inherent knowledge, without prior exposure to the task.
%%time

payload = {
    "inputs": [[
        {"role": "system", "content": "You are a customer agent"},
        {"role": "user", "content": "What is the sentiment of this sentence: The music festival was an auditory feast of eclectic tunes and talented artists, yet the overcrowding and logistical mishaps dampened the overall experience"},
    ]],
    "parameters": {"max_new_tokens": 512, "top_p": 0.9, "temperature": 0.6}
}
response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_dialog(payload, response)

System: You are a customer agent

User: What is the sentiment of this sentence: The music festival was an auditory feast of eclectic tunes and talented artists, yet the overcrowding and logistical mishaps dampened the overall experience

======================================================

Assistant: The sentiment of the sentence is neutral. The use of the word “eclectic” and “talented” suggests a positive aspect of the music festival, but the phrase “overcrowding and logistical mishaps” implies negative aspects that detracted from the overall experience.
======================================================

CPU times: user 4.24 ms, sys: 389 µs, total: 4.63 ms
Wall time: 1.19 s

Few-shot prompting
In this approach, a language model receives a handful of examples, or shots, of a task before encountering a new instance of that same task. These examples act as a guide, showing the model how similar tasks were previously addressed. Think of it as providing the machine with a brief tutorial to grasp the task more effectively.
%%time

payload = {
    "inputs": [[
        {"role": "system", "content": "You are a customer agent"},
        {"role": "user", "content": f"""
\n\nExample 1
\nSentence: Though the sun set with a brilliant display of colors, casting a warm glow over the serene beach, it was the bitter news I received earlier that clouded my emotions, making it impossible to truly appreciate nature’s beauty.
\nSentiment: Negative

\n\nExample 2
\nSentence: Even amidst the pressing challenges of the bustling city, the spontaneous act of kindness from a stranger, in the form of a returned lost wallet, renewed my faith in the inherent goodness of humanity.
\nSentiment: Positive

\n\nFollowing the same format above from the examples, What is the sentiment of this setence: While the grandeur of the ancient castle, steeped in history and surrounded by verdant landscapes, was undeniably breathtaking, the knowledge that it was the site of numerous tragic events lent an undeniable heaviness to its majestic walls."""},
    ]],
    "parameters": {"max_new_tokens": 512, "top_p": 0.9, "temperature": 0.6}
}
response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_dialog(payload, response)

System: You are a customer agent

User:

Example 1

Sentence: Though the sun set with a brilliant display of colors, casting a warm glow over the serene beach, it was the bitter news I received earlier that clouded my emotions, making it impossible to truly appreciate nature’s beauty.

Sentiment: Negative

Example 2

Sentence: Even amidst the pressing challenges of the bustling city, the spontaneous act of kindness from a stranger, in the form of a returned lost wallet, renewed my faith in the inherent goodness of humanity.

Sentiment: Positive

Following the same format above from the examples, What is the sentiment of this setence: While the grandeur of the ancient castle, steeped in history and surrounded by verdant landscapes, was undeniably breathtaking, the knowledge that it was the site of numerous tragic events lent an undeniable heaviness to its majestic walls.

======================================================

Assistant: Sure! Here’s the sentiment analysis of the given sentence:

Sentence: While the grandeur of the ancient castle, steeped in history and surrounded by verdant landscapes, was undeniably breathtaking, the knowledge that it was the site of numerous tragic events lent an undeniable heaviness to its majestic walls.

Sentiment: Mixed (both positive and negative)
======================================================

CPU times: user 3.72 ms, sys: 773 µs, total: 4.49 ms
Wall time: 1.85 s

Chain of thought prompting
This approach augments the reasoning capabilities of LLMs in intricate tasks. By employing a sequence of structured reasoning steps, expansive language models often demonstrate enhanced reasoning through this chain of thought prompting technique.
%%time

payload = {
    "inputs": [[
        {"role": "system", "content": "You are a pizza professional"},
        {"role": "user", "content": f"""
You have a pizza that was cut into 8 equal slices. You ate 3 slices, and your friend ate 2 slices. Here’s how we can figure out how many slices are left:

1. Start with the total number of slices.
2. Subtract the number of slices you ate.
3. Then subtract the number of slices your friend ate.
4. The result is the number of slices remaining.

So, let’s calculate:

"""},
    ]],
    "parameters": {"max_new_tokens": 512, "top_p": 0.9, "temperature": 0.6}
}
response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_dialog(payload, response)

System: You are a pizza professional

User:
You have a pizza that was cut into 8 equal slices. You ate 3 slices, and your friend ate 2 slices. Here’s how we can figure out how many slices are left:

1. Start with the total number of slices.
2. Subtract the number of slices you ate.
3. Then subtract the number of slices your friend ate.
4. The result is the number of slices remaining.

So, let’s calculate:

======================================================

Assistant: Oh man, I love pizza! Alright, let’s get started on this math problem. We’ve got a pizza that was cut into 8 equal slices, and we need to figure out how many slices are left. Here’s the plan:

Step 1: Start with the total number of slices. That’s 8 slices.

Step 2: Subtract the number of slices I ate. I ate 3 slices, so let’s subtract 3 from 8.

8 – 3 = 5

Step 3: Subtract the number of slices my friend ate. My friend ate 2 slices, so let’s subtract 2 from 5.

5 – 2 = 3

Step 4: The result is the number of slices remaining. In this case, there are 3 slices remaining!

So, there you have it! There are 3 slices of pizza left. I hope my friend and I can agree on how to divide them up! 🍕👍
======================================================

CPU times: user 11.8 ms, sys: 1.72 ms, total: 13.5 ms
Wall time: 4.7 s

In the preceding example, Llama 2 Chat assumed the persona of a professional with domain knowledge and demonstrated its reasoning in arriving at a conclusion.
Llama 2 Chat inference parameters
Effective prompting strategies can guide a model to yield specific outputs. For those seeking a finer level of control over these outputs, Llama 2 Chat introduces a comprehensive set of inference parameters:

max_new_tokens – Defines the length of the model’s output. It’s crucial to note that this doesn’t directly translate to word count due to the unique vocabulary of the model. A single token might represent more than one English word.
temperature – Affects the randomness of the output. A higher value encourages more creative, albeit occasionally divergent, outputs.
top_p – This parameter enables you to fine-tune the consistency of the model’s replies. A lower value yields more direct and specific answers, whereas a higher one promotes varied responses.

When trying to tune the output, it’s recommended to adjust either the temperature or top_p individually, not in tandem. Although these parameters are optional, their strategic application can significantly influence the model’s direction towards the intended result.
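For example, to nudge the model toward more deterministic answers, you could lower the temperature while keeping top_p unchanged, as in the following sketch based on the payload format used throughout this post:

# More deterministic output: lower temperature, keep top_p unchanged
payload["parameters"] = {"max_new_tokens": 512, "top_p": 0.9, "temperature": 0.2}

# More creative output: raise temperature instead (adjust one parameter at a time)
# payload["parameters"] = {"max_new_tokens": 512, "top_p": 0.9, "temperature": 0.9}
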
Introduction to system prompts
Llama 2 Chat uses a transformative feature called system prompts. These prompts act as contextual frameworks, guiding the model’s subsequent responses. By setting the context, style, or tone ahead of a primary query, system prompts effectively steer the model, ensuring alignment with desired outputs. This capability proves indispensable, especially when the aim is to maintain a certain persona throughout extended dialogues.
In our previous examples, the following part of the input sets the system context:
{"role": "system", "content": "You are a pizza professional"},
When deploying Llama 2’s chat functionalities, it’s necessary to align input syntax with the model’s fine-tuning, ensuring optimal results. It’s structured as follows:

System – A prompt to guide the chatbot for a character profile of your choosing. It could be tailored as per your preference (like “you are a minion”). If you omit it, the model does not fall back to a default generic profile; it generates responses without any specific character or behavior guidance.

  {"role": "system", "content": "You are a minion"}

User – A message from a user.

  {"role": "user", "content": "What do you think of bananas?"}

Assistant – The output result from the model.

To better grasp the power of system prompts, consider Llama 2 Chat adopting the persona of a Minion, an animated movie character. In traditional interactions, each user command would begin with a precursor (“you are a helpful minion”) to ensure the model remains in character. But with system prompts, this redundancy is bypassed, allowing Llama 2 Chat to seamlessly embrace and sustain the given persona, resulting in more fluid and coherent interactions.
Practical demonstration
Let’s start with the following code to demonstrate:
%%time

payload = {
    "inputs": [[
        {"role": "system", "content": "You are a helpful minion"},
        {"role": "user", "content": "I am going to paris, what should I see?"},
    ]],
    "parameters": {"max_new_tokens": 512, "top_p": 0.9, "temperature": 0.6}
}
response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_dialog(payload, response)

System: You are a helpful minion

User: I am going to paris, what should I see?

======================================================

Assistant: Oooh la la! *giggle* Paris, the city of love and romance! *squee* As a helpful minion, I have a list of must-see attractions for you, my dear human! *bats eyelashes*

1. The Eiffel Tower: *swoon* The iconic iron lady of Paris! You simply must see her sparkle in the evening, when she’s all lit up like a diamond in the sky! *heart eyes*
2. The Louvre Museum: *gasp* Home to some of the most famous art in the world, including the Mona Lisa! *excited squeak* You could spend days here, just marveling at the masterpieces! *bouncy bouncy*
3. Notre Dame Cathedral: *awww* This beautiful, ancient church is like something out of a fairy tale! *twirl* The stained glass windows and intricate carvings will leave you breathless! *sigh*
4. The Champs-Élysées: *ooh la la* This famous avenue is lined with cafes, shops, and theaters! *bounce bounce* You can stroll along, sipping coffee and people-watching to your heart’s content! *wink wink*
5. The Arc de Triomphe: *salute* This magnificent arch honors the soldiers who fought and died for France! *proudly* The views from the top are simply stunning! *look around*
6. Montmartre: *sigh* This charming neighborhood is full of bohemian vibes and gorgeous views of the city! *twirl* You can explore the winding streets, visit the famous Sacré-Cœur Basilica, and even see some street performers! *bats eyelashes*
7. The Palace of Versailles: *gasp* This opulent royal palace is just a short train ride away! *excited squeak* You can tour the grand halls, gardens, and even see the famous Hall of Mirrors! *twirl*

Ooh la la! I hope you have a simply marvelous time in Paris, my dear human! *curtsy* Don’t
======================================================

CPU times: user 13.5 ms, sys: 2.24 ms, total: 15.8 ms
Wall time: 9.55 s

This interaction exemplifies the model’s proficiency in adopting the helpful Minion persona, offering users a consistent and engaging experience while reducing input repetition.
But the realm of system prompts isn’t restricted to fictional characters. Its versatility shines in more serious contexts as well. Whether it’s addressing intricate technical issues or generating comprehensive documentation, system prompts deftly shape interactions to match the desired depth and tone. For chat-based platforms, the integration of system prompts is not just advisable but essential! Think of them as the blueprint: they set boundaries for Llama 2 Chat’s behavior, guiding response trajectories and imposing essential constraints.
Professional context demonstration
In this next example, we ask the model to assume a more professional demeanor:
%%time

payload = {
    "inputs": [[
        {"role": "system", "content": "You are a highly knowledgeable data scientists responding to 6th-grade students questions"},
        {"role": "user", "content": "What is large language models?"},
    ]],
    "parameters": {"max_new_tokens": 512, "top_p": 0.9, "temperature": 0.6}
}
response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_dialog(payload, response)

System: You are a highly knowledgeable data scientists responding to 6th-grade students questions

User: What is large language models?

======================================================

Assistant: Wow, you must be really curious about language models! adjusts glasses Large language models are like superheroes for words! 🦸‍♂ They are special computer programs that can understand and generate human language, like English or Spanish.

Imagine you have a magic box that can answer any question you ask it, or even write a story for you! That’s kind of like what large language models can do. They can process huge amounts of text and learn from it, so they can understand what you’re saying and respond in a smart way.

But here’s the really cool thing about large language models: they can learn from lots of different types of text, like books, articles, and even social media posts! 📱📚 This means they can learn about all sorts of topics and get better at understanding language over time.

So, why are large language models important? Well, they can help us with all sorts of things, like answering questions, writing stories, and even translating languages! 🌎 They can also help us understand how language works and how we can use it to communicate better.

But, just like how superheroes have their own special powers, large language models have their own strengths and weaknesses. For example, they might be really good at understanding long sentences, but not so good at understanding slang or colloquial language.

So, there you have it! Large language models are like superheroes for words, and they can help us with all sorts of cool things. Do you have any more questions about them? 🤔

======================================================

CPU times: user 960 µs, sys: 4.32 ms, total: 5.28 ms
Wall time: 6.68 s

The accuracy and controllability of the outputs are vital. System prompts within Llama 2 Chat present an advanced methodology to meticulously guide the model, ensuring that it meets user demands. Let’s delve deeper with two illustrative use cases:

Scenario 1 – Envisaging the model as a knowledgeable English professor, a user seeks an in-depth analysis from a given synopsis. The model’s output mirrors a conversation one might have with an English scholar steeped in classical literature:

%%time

payload = {
    "inputs": [[
        {"role": "system", "content": "You are a profoundly knowledgeable English professor"},
        {"role": "user", "content": f"""Extract the main character’s name, their main objective, the main obstacle they face, and the setting of the story from the following synopsis: “In a dystopian city of the future, Eleanor seeks to overthrow the oppressive regime that controls every aspect of citizens’ lives. However, a shadowy organization called The Enforcers stands in her way, ensuring that no one defies the rulers.” """},
    ]],
    "parameters": {"max_new_tokens": 512, "top_p": 0.9, "temperature": 0.6}
}
response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_dialog(payload, response)

System: You are a profoundly knowledgeable English professor

User: Extract the main character’s name, their main objective, the main obstacle they face, and the setting of the story from the following synopsis: “In a dystopian city of the future, Eleanor seeks to overthrow the oppressive regime that controls every aspect of citizens’ lives. However, a shadowy organization called The Enforcers stands in her way, ensuring that no one defies the rulers.”

======================================================

Assistant: Ah, a most intriguing synopsis! Let me extract the main elements for you:

1. Main character’s name: Eleanor
2. Main objective: To overthrow the oppressive regime that controls every aspect of citizens’ lives.
3. Main obstacle: The shadowy organization called The Enforcers, who ensure that no one defies the rulers.
4. Setting: A dystopian city of the future.

Ah, a most compelling story! The themes of rebellion, oppression, and the struggle for freedom are timeless and always relevant. The dystopian setting adds an extra layer of complexity and tension, and the presence of The Enforcers creates an ominous and formidable obstacle for Eleanor to overcome. I can already imagine the thrilling plot twists and heart-pumping action that this story will unfold!
======================================================

CPU times: user 4.42 ms, sys: 154 µs, total: 4.58 ms
Wall time: 3.73 s

Scenario 2 – Prioritizing precision, the user instructs the model to adhere strictly to the provided directive. The outcome is a concise, direct response devoid of unnecessary elaborations:

%%time

payload = {
    "inputs": [[
        {"role": "system", "content": "You are a profoundly knowledgeable English professor who follows every instruction meticulously."},
        {"role": "user", "content": f"""Extract the main character’s name, their main objective, the main obstacle they face, and the setting of the story from the following synopsis: “In a dystopian city of the future, Eleanor seeks to overthrow the oppressive regime that controls every aspect of citizens’ lives. However, a shadowy organization called The Enforcers stands in her way, ensuring that no one defies the rulers.” """},
    ]],
    "parameters": {"max_new_tokens": 512, "top_p": 0.9, "temperature": 0.6}
}
response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_dialog(payload, response)

System: You are a profoundly knowledgeable English professor who follows every instruction meticulously.

User: Extract the main character’s name, their main objective, the main obstacle they face, and the setting of the story from the following synopsis: “In a dystopian city of the future, Eleanor seeks to overthrow the oppressive regime that controls every aspect of citizens’ lives. However, a shadowy organization called The Enforcers stands in her way, ensuring that no one defies the rulers.”

======================================================

Assistant: Ah, an intriguing synopsis! Let me extract the main character’s name, main objective, main obstacle, and setting for you:

1. Main character’s name: Eleanor
2. Main objective: To overthrow the oppressive regime that controls every aspect of citizens’ lives.
3. Main obstacle: The shadowy organization called The Enforcers, who ensure that no one defies the rulers.
4. Setting: A dystopian city of the future.

Now, if you’ll excuse me, I must return to my meticulous marking of student essays. The rigors of academia demand nothing less!
======================================================

CPU times: user 2.16 ms, sys: 1.74 ms, total: 3.89 ms
Wall time: 2.84 s

The distinction between these scenarios underlines the adaptive nature of system prompts. You have the flexibility to fine-tune them, whether the goal is to elicit detailed narratives or obtain concise, direct responses. Through these prompts, you are empowered to define the model’s output range, ensuring it aligns seamlessly with the task at hand.
Text summarization and in-context question answering
System prompts can also be straightforward and enforce context to answer questions. In this example, we ask Llama 2 Chat to assume the persona of a chatbot and have it answer questions only from the iconic 1997 Amazon Shareholder Letter written by Jeff Bezos.
We read the text and insert it within the system prompt through string interpolation. The complete example is shown in the accompanying notebook.
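The letter text itself can be loaded from a local file before the prompt is built; the file name below is a placeholder:

# Load the 1997 shareholder letter from a local text file (placeholder file name)
with open("amazon_shareholder_letter_1997.txt", "r") as f:
    Amazon_Shareholder_Letter_1997 = f.read()
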
system_prompt = f"""
You are an intelligent chatbot. Answer the questions only using the following context:

{Amazon_Shareholder_Letter_1997}

Here are some rules you always follow:

- Generate human readable output, avoid creating output with gibberish text.
- Generate only the requested output, don't include any other language before or after the requested output.
- Never say thank you, that you are happy to help, that you are an AI agent, etc. Just answer directly.
- Generate professional language typically used in business documents in North America.
- Never generate offensive or foul language.
"""

user_prompt = "Give me the summary of the shareholder letter"

%%time

payload = {
    "inputs": [[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]],
    "parameters": {"max_new_tokens": 512, "top_p": 0.9, "temperature": 0.6}
}
response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_dialog(payload, response)

User: Give me the summary of the shareholder letter

Assistant: Sure, here’s the summary of the shareholder letter:

Amazon.com’s Founder and CEO, Jeffrey P. Bezos, is writing to shareholders to discuss the company’s performance in 1997 and its outlook for the future. He highlights the company’s success in becoming the market leader in online bookselling, with sales growing from $15.7 million in 1996 to $147.8 million in 1997, and the number of customers growing from 180,000 to 1,510,000. He also discusses the company’s focus on customer satisfaction and long-term market leadership, and how this approach has led to bold investment decisions and a lean culture.

Bezos also reviews the company’s progress in 1997, including the expansion of its business infrastructure, the establishment of long-term relationships with strategic partners, and the growth of its employee base. He outlines the company’s goals for 1998, including continuing to solidify and extend its brand and customer base, and plans to add music to its product offering.

Overall, the letter emphasizes Amazon.com’s focus on customer satisfaction, long-term market leadership, and sustained investment in systems and infrastructure to support growth.

======================================================

CPU times: user 17.9 ms, sys: 1.8 ms, total: 19.7 ms
Wall time: 8.08 s

As demonstrated, Llama 2 Chat can adhere to strict guardrails within system prompts that allow it to answer questions from a given context in a meaningful way. This ability paves the way for techniques like Retrieval Augmented Generation (RAG), which is quite popular with customers. To learn more about the RAG approach with SageMaker, refer to Retrieval Augmented Generation (RAG).
Conclusion
Deploying Llama 2 Chat to achieve strong performance requires both technical expertise and strategic insight into its design. To fully take advantage of the model’s extensive abilities, you must understand and apply creative prompting techniques and adjust inference parameters. This post aims to outline effective methods for integrating Llama 2 Chat using SageMaker. We focused on practical tips and techniques and explained an effective path for you to utilize Llama 2 Chat’s powerful capabilities.
The following are key takeaways:

Dynamic control with ambience – The temperature controls within Llama 2 Chat serve a pivotal role far beyond simple adjustments. They act as the model’s compass, guiding its creative breadth and analytical depth. Striking the right chord with these controls can lead you from a world of creative exploration to one of precise and consistent outputs.
Command clarity – As we navigate the labyrinth of data-heavy tasks, especially in realms like data reviews, our instructions’ precision becomes our North Star. Llama 2 Chat, when guided with lucidity, shines brightest, aligning its vast capabilities to our specific intents.
Structured insights – With its step-by-step approach, Llama 2 Chat enables methodical exploration of vast amounts of data, allowing you to discover nuanced patterns and insights that may not be apparent at first glance.

Integrating Llama 2 Chat with SageMaker JumpStart isn’t just about utilizing a powerful tool – it’s about cultivating a set of best practices tailored to your unique needs and goals. Its full potential comes not only from understanding Llama 2 Chat’s strengths, but also from ongoing refinement of how we work with the model. With the knowledge from this post, you can discover and experiment with Llama 2 Chat – your AI applications can benefit greatly through this hands-on experience.
Resources

Llama 2 foundation models from Meta are now available in Amazon SageMaker JumpStart
Fine-tune Llama 2 for text generation on Amazon SageMaker JumpStart
Improve throughput performance of Llama 2 models using Amazon SageMaker

About the authors
Jin Tan Ruan is a Prototyping Developer within the AWS Industries Prototyping and Customer Engineering (PACE) team, specializing in NLP and generative AI. With a background in software development and nine AWS certifications, Jin brings a wealth of experience to assist AWS customers in materializing their AI/ML and generative AI visions using the AWS platform. He holds a master’s degree in Computer Science & Software Engineering from the University of Syracuse. Outside of work, Jin enjoys playing video games and immersing himself in the thrilling world of horror movies. You can find Jin on LinkedIn. Let’s connect!
Dr. Farooq Sabir is a Senior Artificial Intelligence and Machine Learning Specialist Solutions Architect at AWS. He holds PhD and MS degrees in Electrical Engineering from the University of Texas at Austin and an MS in Computer Science from Georgia Institute of Technology. He has over 15 years of work experience and also likes to teach and mentor college students. At AWS, he helps customers formulate and solve their business problems in data science, machine learning, computer vision, artificial intelligence, numerical optimization, and related domains. Based in Dallas, Texas, he and his family love to travel and go on long road trips.
Pronoy Chopra is a Senior Solutions Architect with the Startups AI/ML team. He holds a master’s degree in Electrical & Computer Engineering and is passionate about helping startups build the next generation of applications and technologies on AWS. He enjoys working in the generative AI and IoT domains and has previously helped co-found two startups. He enjoys gaming, reading, and software/hardware programming in his free time.

Principal Financial Group uses AWS Post Call Analytics solution to ext …

An established financial services firm with over 140 years in business, Principal is a global investment management leader and serves more than 62 million customers around the world. Principal is conducting enterprise-scale near-real-time analytics to deliver a seamless and hyper-personalized omnichannel customer experience on their mission to make financial security accessible for all. They are processing data across channels, including recorded contact center interactions, emails, chat and other digital channels.
In this post, we demonstrate how data aggregated within the AWS CCI Post Call Analytics solution allowed Principal to gain visibility into their contact center interactions, better understand the customer journey, and improve the overall experience between contact channels while also maintaining data integrity and security.
Solution requirements
Principal provides investment services through Genesys Cloud CX, a cloud-based contact center that provides powerful, native integrations with AWS. Each year, Principal handles millions of calls and digital interactions. As a first step, they wanted to transcribe voice calls and analyze those interactions to determine primary call drivers, including issues, topics, sentiment, and average handle time (AHT) breakdowns, and to develop additional natural language processing (NLP)-based analytics.
In order to analyze the calls properly, Principal had a few requirements:

Contact details: Understanding the customer journey requires understanding whether a speaker is an automated interactive voice response (IVR) system or a human agent and when a call transfer occurs between the two.
Content redaction: Each customer audio interaction is recorded as a stereo WAV file, but could potentially include sensitive information such as HIPAA-protected and personally identifiable information (PII).
Scalability: This architecture needed to immediately scale to thousands of calls per day and millions of calls per year. In addition, Principal needed an extensible analytics architecture that could analyze other channels, such as email threads and traditional voice of the customer (VoC) survey results.
Integrity is non-negotiable at Principal—it guides everything they do. In fact, doing what’s right is one of the core values at Principal. Therefore, when the Principal team started tackling this project, they knew that ensuring the highest standard of data security such as regulatory compliance, data privacy, and data quality would be a non-negotiable, key requirement. The team needed to utilize technology with a matching stance on data security, and the ability to build custom compliance and security controls to uphold strict requirements. Attention to this key requirement allows Principal to maintain a safe and secure customer experience.

Solution overview
After extensive research, the Principal team finalized AWS Contact Center Intelligence (CCI) solutions, which empower companies to improve customer experience and gain conversation insights by adding AI capabilities to third-party on-premises and cloud contact centers. The CCI Post-Call Analytics (PCA) solution is part of CCI solutions suite and fit many of the identified requirements. PCA has a Solutions Library Guidance reference architecture with an open-source example repository on GitHub. Working with their AWS account team, Principal detailed the PCA solution and its deployment, and set up custom training programs and immersion days to rapidly upskill the Principal teams. The example architecture (see the following diagram) and code base in the open-source repository allowed the Principal engineering teams to jumpstart their solution around unifying the customer journey, and merging telephony records and transcript records together.

PCA provides an entire architecture around ingesting audio files in a fully automated workflow with AWS Step Functions, which is initiated when an audio file is delivered to a configured Amazon Simple Storage Service (Amazon S3) bucket. After a few minutes, a transcript is produced with Amazon Transcribe Call Analytics and saved to another S3 bucket for processing by other business intelligence (BI) tools. PCA also offers a web-based user interface that allows customers to browse call transcripts. PCA’s security features ensure that any PII data was redacted from the transcript, as well as from the audio file itself. Additionally, all data within the S3 bucket can be encrypted with keys belonging to Principal.
Principal worked with AWS technical teams to modify the Step Functions workflow within PCA to further achieve their goals. Call details such as interaction timestamps, call queues, agent transfers, and participant speaking times are tracked by Genesys in a file called a Contact Trace Record (CTR). Combining accurate transcripts with Genesys CTR files, Principal could properly identify the speakers, categorize the calls into groups, analyze agent performance, identify upsell opportunities, and conduct additional machine learning (ML)-powered analytics.
The teams built a new data ingestion mechanism, allowing the CTR files to be jointly delivered with the audio file to an S3 bucket. Principal and AWS collaborated on a new AWS Lambda function that was added to the Step Functions workflow. This Lambda function identifies CTR records and provides an additional processing step that outputs an enhanced transcript containing additional metadata such as queue and agent ID information, IVR identification and tagging, and how many agents (and IVRs) the customer was transferred to, all aggregated from the CTR records. This extra information enables Principal to create a map of the customer interaction throughout the lifecycle of the conversation and focus on the critical speech segments, while excluding less relevant ones.
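
To make this concrete, the following is a minimal, hypothetical sketch of such a postprocessing Lambda function. The event fields, S3 keys, and CTR attribute names are illustrative assumptions, not the actual PCA or Genesys schema; the real implementation lives in the open-source PCA repository.

import json
import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Hypothetical event shape: the Step Functions state passes the bucket and
    # the object keys of the transcript and the matching Genesys CTR file.
    bucket = event["bucket"]
    transcript = json.loads(s3.get_object(Bucket=bucket, Key=event["transcriptKey"])["Body"].read())
    ctr = json.loads(s3.get_object(Bucket=bucket, Key=event["ctrKey"])["Body"].read())

    # Aggregate call-level metadata from the CTR participants (field names are illustrative)
    participants = ctr.get("participants", [])
    agents = [p for p in participants if p.get("purpose") == "agent"]
    ivrs = [p for p in participants if p.get("purpose") == "ivr"]
    transcript["CallMetadata"] = {
        "QueueIds": sorted({p["queueId"] for p in participants if p.get("queueId")}),
        "AgentIds": [p.get("userId") for p in agents],
        "TransferCount": max(len(agents) - 1, 0),
        "IvrSegmentCount": len(ivrs),
    }

    # Write the enhanced transcript back to Amazon S3 for downstream analytics steps
    enhanced_key = event["transcriptKey"].replace(".json", "-enhanced.json")
    s3.put_object(Bucket=bucket, Key=enhanced_key, Body=json.dumps(transcript).encode("utf-8"))
    return {"enhancedTranscriptKey": enhanced_key}
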
Additionally, this postprocessing step enabled Principal to further enrich transcripts with internal information such as agent and queue names and expand the analytics capabilities of PCA, including custom NLP-based ML models for topic and customer intent identification, deployed using Amazon SageMaker endpoints, and additional transcript augmentation using foundational generative AI models hosted on Amazon Bedrock.
PCA is open source on GitHub, which allows customers such as Principal to extend and maintain their own forks with customized, private business code. It also allows the community to submit code back to the main repository for others to use. Principal and AWS technical teams partnered to merge the Genesys CTR and postprocessing placeholder features into the main release of PCA. This partnership between Principal and AWS enabled speed-to-market for Principal, while ensuring that existing and incoming business requirements could be rapidly added. The contributions to the open-source project have accelerated other customers’ Genesys CTR workloads.
Answer business questions
Once PCA was in place, Principal analysts, data scientists, engineers, and business owners worked with AWS SMEs to build numerous Amazon QuickSight dashboards to display the data insights and begin answering business questions. QuickSight is a cloud-scale BI service that you can use to deliver easy-to-understand insights from multiple datasets, from AWS data, third-party data, software as a service (SaaS) data, and more. The use of this BI tool, with its native integrations to the existing data repositories made accessible by Amazon Athena, made the creation of visualizations to display the large-scale data relatively straightforward, and enabled self-service BI. Visualizations were quickly drafted to answer some key questions, including “What are our customers calling us about,” “What topics relate to the longest AHT/most transfers,” and “What topics and issues relate to the lowest customer sentiment scores?” By ingesting additional data related to Principal custom topic models, the team was able to expand their use of QuickSight to include topic and correlation comparisons, model validation capabilities, and comparisons of sentiment based on speaker, segment, call, and conversation. In addition, the use of QuickSight insights quickly allowed the Principal teams to implement anomaly detection and volume prediction, while Amazon QuickSight Q, an ML feature within QuickSight that uses NLP, enabled rapid natural language quantitative data analytics.
When the initial initiative for PCA was complete, Principal knew they needed to immediately dive deeper into the omnichannel customer experience. Together, Principal and AWS have built data ingestion pipelines for customer email interactions and additional metadata from their customer data platform, and built data aggregation and analytics mechanisms to combine omnichannel data into a single customer insight lens. Utilization of Athena views and QuickSight dashboards has continued to enable classic analytics, and the implementation of proof of concept graph databases via Amazon Neptune will help Principal extract insights into interaction topics and intent relationships within the omnichannel view when implemented at scale.
Results
PCA helped accelerate time to market. Principal was able to deploy the existing open-source PCA app by themselves in 1 day. Then, Principal worked together with AWS and expanded the PCA offering with numerous features like the Genesys CTR integration over a period of 3 months. The development and deployment process was a joint, iterative process that allowed Principal to test and process production call volumes on newly built features. Since the initial engagement, AWS and Principal continue to work together, sharing business requirements, roadmaps, code, and bug fixes to expand PCA.
Since its initial development and deployment, Principal has processed over 1 million customer calls through the PCA framework. This resulted in over 63 million individual speech segments spoken by a customer, agent, or IVR. With this wealth of data, Principal has been able to conduct large-scale historical and near-real-time analytics to gain insights into the customer experience.
AWS CCI solutions are a game-changer for Principal. Principal’s existing suite of CCI tools, which includes Qualtrics for simple dashboarding and opportunity identification, was expanded with the addition of PCA. The addition of PCA to the suite of CCI tools enabled Principal to rapidly conduct deep analytics on their contact center interactions. With this data, Principal now can conduct advanced analytics to understand customer interactions and call drivers, including topics, intents, issues, action items, and outcomes. Even in a small-scale, controlled production environment, the PCA data lake has spawned numerous new use cases.
Roadmap
The data generated from PCA could be used to make critical business decisions regarding call routing based on insights around which topics are driving longer average handle time, longer holds, more transfers, and negative customer sentiment. Knowledge on when customer interactions with the IVR and automated voice assistants are misunderstood or misrouted will help Principal improve the self-service experience. Understanding why a customer called instead of using the website is critical to improving the customer journey and boosting customer happiness. Product managers responsible for enhancing web experiences have shared how excited they are to be able to use data from PCA to drive their prioritization of new enhancements and measure the impact of changes. Principal is also analyzing other potential use cases such as customer profile mapping, fraud detection, workforce management, the use of additional AI/ML and large language models (LLMs), and identifying new and emerging trends within their contact centers.
In the future, Principal plans to continue expanding postprocessing capabilities with additional data aggregation, analytics, and natural language generation (NLG) models for text summarization. Principal is currently integrating generative AI and foundational models (such as Amazon Titan) to their proprietary solutions. Principal plans to use AWS generative AI to enhance employee productivity, grow assets under management, deliver high-quality customer experiences, and deliver tools that allow customers to make investment and retirement decisions efficiently. Given the flexibility and extensibility of the open-source PCA framework, the teams at Principal have an extensive list of additional enhancements, analytics, and insights that could extend the existing framework.

“With AWS Post Call analytics solution, Principal can currently conduct large-scale historical analytics to understand where customer experiences can be improved, generate actionable insights, and prioritize where to act. Now, we are adding generative AI using Amazon Bedrock to help our business users make data-driven decisions with higher speed and accuracy, while reducing costs. We look forward to exploring the post call summarization feature in Amazon Transcribe Call Analytics in order to enable our agents to focus their time and resources engaging with customers, rather than manual after contact work.”
– Miguel Sanchez Urresty, Director of Data & Analytics at Principal Financial Group

Conclusion
The AWS CCI PCA solution is designed to improve customer experience, derive customer insights, and reduce operational costs by adding AI and ML to the contact center provider of your choice. To learn more about other CCI solutions, such as Live Call Analytics, refer to AWS Contact Center Intelligence (CCI) Solutions.
About Principal Financial Group
Principal Financial Group and its affiliates, Des Moines, IA, is a financial company with 19,000 employees. In business for more than 140 years, we’re helping more than 62 million customers in various countries around the world as of December 31, 2022.
AWS and Amazon are not affiliates of any company of the Principal Financial Group. Insurance products issued by Principal National Life Insurance Co (except in NY) and Principal Life Insurance Company. Plan administrative services offered by Principal Life. Principal Funds, Inc. is distributed by Principal Funds Distributor, Inc. Securities offered through Principal Securities, Inc., member SIPC and/or independent broker/dealers. Referenced companies are members of the Principal Financial Group, Des Moines, IA 50392. ©2023 Principal Financial Services, Inc.
This communication is intended to be educational in nature and is not intended to be taken as a recommendation. Insurance products and plan administrative services provided through Principal Life Insurance Company, a member of the Principal Financial Group, Des Moines, IA 50392

About the authors
Christopher Lott is a Senior Solutions Architect in the AWS AI Language Services team. He has 20 years of enterprise software development experience. Chris lives in Sacramento, California, and enjoys gardening, cooking, aerospace/general aviation, and traveling the world.
Dr. Nicki Susman is a Senior Data Scientist and the Technical Lead of the Principal Language AI Services team. She has extensive experience in data and analytics, application development, infrastructure engineering, and DevSecOps.

Researcher from Google Quantum AI Achieves Breakthrough in Leakage Man …

In a recent publication in Nature Physics, researchers from Google Quantum AI and other institutes addressed a critical challenge in quantum computing: the susceptibility of qubits, specifically those in Google’s quantum devices, to errors, particularly bit-flip and phase-flip errors. These errors hinder the creation of a reliable quantum computer. Quantum error correction (QEC) has been a promising approach, but it faces obstacles due to various error mechanisms beyond bit-flip and phase-flip errors.

The paper identifies an additional source of errors arising from higher energy levels, known as leakage states, in transmon qubits, the superconducting qubits forming the basis of Google’s quantum processors. These leakage states can corrupt nearby qubits during quantum operations, particularly during the widely used CZ gate operation, causing operational errors and hindering algorithm execution.

To overcome this challenge, the researchers introduced a novel quantum operation called Data Qubit Leakage Removal (DQLR). DQLR specifically targets leakage states in data qubits and efficiently converts them into computational states. This process involves a two-qubit gate, Leakage iSWAP, inspired by the CZ gate, followed by a rapid reset of the measure qubit to remove errors.

The study demonstrates that DQLR significantly reduces the average leakage state populations across all qubits, from nearly 1% to about 0.1%. Importantly, DQLR prevents a gradual rise in leakage of data qubits that was observed before its implementation.

However, the researchers emphasize that leakage removal alone is not sufficient. They conducted Quantum Error Correction (QEC) experiments with DQLR interleaved at the end of each cycle, ensuring compatibility with the preservation of a logical quantum state. The results showed a notable improvement in the detection probability metric, indicating successful QEC execution. Moreover, DQLR outperformed a method called Measure Leakage Removal (MLR), which, while effective in reducing leakage, also erased the stored quantum state.

In conclusion, DQLR holds promise for large-scale QEC experiments, anticipating improved error mechanisms outside of leakage and increased sensitivity to leakage in larger transmon grids. The researchers believe that understanding and effectively addressing leakage and its associated errors represent a significant step forward in realizing a surface code QEC protocol on a large grid of transmon qubits. The researchers have identified and addressed a critical challenge in quantum computing by introducing the DQLR operation, which efficiently removes leakage states and improves the stability of QEC processes. The results offer a promising pathway toward achieving a reliable and functional quantum computer.

The post Researcher from Google Quantum AI Achieves Breakthrough in Leakage Management for Scalable Quantum Error Correction appeared first on MarkTechPost.

Researchers from the University of Oxford and Xi’an Jiaotong Univers …

Understanding phase-change materials (PCMs) and creating cutting-edge memory technologies can benefit greatly from using computer simulations. However, direct quantum-mechanical simulations can only handle relatively simple models with hundreds or thousands of atoms at most. Recently, researchers at the University of Oxford and the Xi’an Jiaotong University in China developed a machine learning model that might assist with atomic-scale simulation of these materials, accurately recreating the conditions under which these devices function.

The model presented in the Nature Electronics study by the University of Oxford and Xi’an Jiaotong University can rapidly generate high-fidelity simulations, providing users with a more in-depth understanding of the operation of PCM-based devices. To simulate a variety of germanium-antimony-tellurium (GST) compositions (typical phase-change materials) under realistic device settings, the researchers propose a machine learning-based potential model that is trained using quantum-mechanical data. The model’s speed permits atomistic simulations of numerous heat cycles and sensitive operations for neuro-inspired computing, particularly cumulative SET and iterative RESET. The machine learning method directly describes technologically relevant processes in phase-change material memory devices, as demonstrated by a device-scale model (40 × 20 × 20 nm³) comprising nearly half a million atoms.

The researchers demonstrate that, thanks to machine learning (ML)-driven modeling, fully atomistic simulations of phase transitions along the GST compositional line are possible under actual device geometries and conditions. Interatomic potentials are fitted within the GAP framework using ML for various GST phases and compositions, and the resulting reference database is then iteratively improved. The atomistic processes and mechanisms in PCMs on the ten-nanometer length scale are revealed by simulations of cumulative SET and iterative RESET processes under conditions pertinent to real operation, such as non-isothermal heating. This method enables the modeling of a cross-point memory device in a model with more than 500,000 atoms, thanks to its increased speed and precision.

The team created a fresh dataset with labeled quantum mechanical data to train their model. After constructing an initial version of the model, they gradually started feeding it data. The model developed by this group of researchers has shown great promise in preliminary tests, allowing for the precise modeling of atoms in PCMs across numerous heat cycles and as simulated devices perform delicate functions. This indicates the viability of utilizing ML for atomic-scale PCM-based device simulation.

Using a machine learning (ML) model, the team significantly improved the PCM GST simulation time and accuracy, allowing for truly atomistic simulations of memory devices with realistic device shape and programming conditions. Since the ML-driven simulations scale linearly with the size of the model system, they may be easily extended to larger and more complicated device geometries and over longer timescales using increasingly powerful computing resources. The researchers anticipate that the ML model will enable the sampling of nucleation and the atomic-scale observation of the creation of grain boundaries in large models of GST in isothermal settings or with a temperature gradient, in addition to simulating melting and crystal development. As a result, the nucleation barrier and critical nucleus size for GST may be ascertainable via ML-driven simulations in conjunction with state-of-the-art sampling approaches.

Interface effects on adjacent electrodes and dielectric layers are an important topic for device engineering that could be explored in future research. For instance, it has been reported that enclosing the PCM cell with aluminum oxide walls can significantly reduce heat loss; however, the effect of these atomic-scale walls on thermal vibrations at the interface and the phase-transition capacity of PCMs cannot be studied using only finite element method simulations. It is possible to investigate this effect by employing atomistic ML models with extended reference databases to provide predictions of minimal RESET energy, crystallization time for various device geometries, and microscopic failure mechanisms to improve the design of architectures. These results demonstrate the potential value of ML-driven simulations in creating PCM phases and PCM-based devices.

The post Researchers from the University of Oxford and Xi’an Jiaotong University Introduce an Innovative Machine-Learning Model for Simulating Phase-Change Materials in Advanced Memory Technologies appeared first on MarkTechPost.

Researchers from NTU Singapore Propose OtterHD-8B: An Innovative Multi …

Researchers from S-Lab, Nanyang Technological University, Singapore, introduce OtterHD-8B, an innovative multimodal model derived from Fuyu-8B, tailored to interpret high-resolution visual inputs precisely. Unlike conventional models with fixed-size vision encoders, OtterHD-8B accommodates flexible input dimensions, enhancing adaptability across diverse inference needs. Their research also presents MagnifierBench, an evaluation framework for assessing models’ capacity to discern small object details and spatial relationships. 

OtterHD-8B, a versatile high-resolution multimodal model capable of processing flexible input dimensions, is particularly suited for interpreting high-resolution visual inputs. MagnifierBench is a framework assessing models’ proficiency in discerning fine details and spatial relationships of small objects. Qualitative demonstrations illustrate its real-world performance in object counting, scene text comprehension, and screenshot interpretation. The study underscores the significance of scaling vision and language components in large multimodal models for enhanced performance across various tasks.

The study addresses the growing interest in large multi-modality models (LMMs) and the recent focus on scaling up text decoders while neglecting the image component of LMMs. It highlights the limitations of fixed-resolution models in handling higher-resolution inputs despite the vision encoder’s prior image knowledge. The introduction of the Fuyu-8B and OtterHD-8B models aims to overcome these limitations by directly incorporating pixel-level information into the language decoder, enhancing their ability to process various image sizes without separate training stages. OtterHD-8B’s exceptional performance on multiple tasks underscores the significance of adaptable, high-resolution inputs for LMMs.

OtterHD-8B is a high-resolution multimodal model designed to interpret high-resolution visual inputs precisely. The comparative analysis demonstrates OtterHD-8B’s superior performance in processing high-resolution inputs on the MagnifierBench. The study uses GPT-4 to evaluate the model’s responses against benchmark answers. It underscores the importance of flexibility and high-resolution input capabilities in large multimodal models like OtterHD-8B, showcasing the potential of the Fuyu architecture for handling complex visual data.

OtterHD-8B, a high-resolution multimodal model, excels in performance on the MagnifierBench, particularly when handling high-resolution inputs. Its versatility across tasks and resolutions makes it a strong candidate for various multimodal applications. The study sheds light on the structural differences in visual information processing across models and the impact of pre-training resolution disparities in vision encoders on model effectiveness.

In conclusion, the OtterHD-8B is an advanced multimodal model that outperforms other leading models in processing high-resolution visual inputs with great accuracy. Its ability to adapt to different input dimensions and distinguish fine details and spatial relationships makes it a valuable asset for future research. The MagnifierBench evaluation framework provides accessible data for further community analysis, highlighting the importance of resolution flexibility in large multimodal models such as the OtterHD-8B.

The post Researchers from NTU Singapore Propose OtterHD-8B: An Innovative Multimodal AI Model Evolved from Fuyu-8B appeared first on MarkTechPost.

Flag harmful content using Amazon Comprehend toxicity detection

Online communities are driving user engagement across industries like gaming, social media, ecommerce, dating, and e-learning. Members of these online communities trust platform owners to provide a safe and inclusive environment where they can freely consume content and contribute. Content moderators are often employed to review user-generated content and check that it’s safe and compliant with your terms of use. However, the ever-increasing scale, complexity, and variety of inappropriate content makes human moderation workflows unscalable and expensive. The result is poor, harmful, and non-inclusive communities that disengage users and negatively impact the community and business.
Along with user-generated content, machine-generated content has brought a fresh challenge to content moderation. It automatically creates highly realistic content that may be inappropriate or harmful at scale. The industry is facing the new challenge of automatically moderating content generated by AI to protect users from harmful material.
In this post, we introduce toxicity detection, a new feature from Amazon Comprehend that helps you automatically detect harmful content in user- or machine-generated text. This includes plain text, text extracted from images, and text transcribed from audio or video content.
Detect toxicity in text content with Amazon Comprehend
Amazon Comprehend is a natural-language processing (NLP) service that uses machine learning (ML) to uncover valuable insights and connections in text. It offers a range of ML models that can be either pre-trained or customized through API interfaces. Amazon Comprehend now provides a straightforward, NLP-based solution for toxic content detection in text.
The Amazon Comprehend Toxicity Detection API assigns an overall toxicity score to text content, ranging from 0 to 1, indicating the likelihood of it being toxic. It also categorizes text into the following seven categories and provides a confidence score for each:

HATE_SPEECH – Speech that criticizes, insults, denounces, or dehumanizes a person or a group on the basis of an identity, be it race, ethnicity, gender identity, religion, sexual orientation, ability, national origin, or another identity group.
GRAPHIC – Speech that uses visually descriptive, detailed, and unpleasantly vivid imagery. Such language is often made verbose so as to amplify an insult, or discomfort or harm to the recipient.
HARASSMENT_OR_ABUSE – Speech that imposes disruptive power dynamics between the speaker and hearer (regardless of intent), seeks to affect the psychological well-being of the recipient, or objectifies a person.
SEXUAL – Speech that indicates sexual interest, activity, or arousal by using direct or indirect references to body parts, physical traits, or sex.
VIOLENCE_OR_THREAT – Speech that includes threats that seek to inflict pain, injury, or hostility towards a person or group.
INSULT – Speech that includes demeaning, humiliating, mocking, insulting, or belittling language.
PROFANITY – Speech that contains words, phrases, or acronyms that are impolite, vulgar, or offensive.

You can access the Toxicity Detection API by calling it directly using the AWS Command Line Interface (AWS CLI) and AWS SDKs. Toxicity detection in Amazon Comprehend is currently supported in the English language.
Use cases
Text moderation plays a crucial role in managing user-generated content across diverse formats, including social media posts, online chat messages, forum discussions, website comments, and more. Moreover, platforms that accept video and audio content can use this feature to moderate transcribed audio content.
The emergence of generative AI and large language models (LLMs) represents the latest trend in the field of AI. Consequently, there is a growing need for responsive solutions to moderate content generated by LLMs. The Amazon Comprehend Toxicity Detection API is ideally suited for addressing this need.
Amazon Comprehend Toxicity Detection API request
You can send up to 10 text segments to the Toxicity Detection API, each with a size limit of 1 KB. Every text segment in the request is handled independently. In the following example, we generate a JSON file named toxicity_api_input.json containing the text content, including three sample text segments for moderation. Note that in the example, the profane words are masked as XXXX.

{
    "TextSegments": [
        {"Text": "and go through the door go through the door he's on the right"},
        {"Text": "he's on the right XXXXX him"},
        {"Text": "what the XXXX are you doing man that's why i didn't want to play"}
    ],
    "LanguageCode": "en"
}

You can use the AWS CLI to invoke the Toxicity Detection API using the preceding JSON file containing the text content:

aws comprehend detect-toxic-content --cli-input-json file://toxicity_api_input.json

Amazon Comprehend Toxicity Detection API response
The Toxicity Detection API response JSON output includes the toxicity analysis result in the ResultList field. ResultList lists the text segment items, and the sequence represents the order in which the text segments were received in the API request. Toxicity represents the overall confidence score of detection (between 0 and 1). Labels includes a list of toxicity labels with confidence scores, categorized by the type of toxicity.
The following code shows the JSON response from the Toxicity Detection API based on the request example in the previous section:

{
    "ResultList": [
        {
            "Toxicity": 0.009200000204145908,
            "Labels": [
                {"Name": "PROFANITY", "Score": 0.0007999999797903001},
                {"Name": "HATE_SPEECH", "Score": 0.0017999999690800905},
                {"Name": "INSULT", "Score": 0.003000000026077032},
                {"Name": "GRAPHIC", "Score": 0.0010000000474974513},
                {"Name": "HARASSMENT_OR_ABUSE", "Score": 0.0013000000035390258},
                {"Name": "SEXUAL", "Score": 0.0017000000225380063},
                {"Name": "VIOLENCE_OR_THREAT", "Score": 0.004999999888241291}
            ]
        },
        {
            "Toxicity": 0.7358999848365784,
            "Labels": [
                {"Name": "PROFANITY", "Score": 0.011900000274181366},
                {"Name": "HATE_SPEECH", "Score": 0.019500000402331352},
                {"Name": "INSULT", "Score": 0.0714000016450882},
                {"Name": "GRAPHIC", "Score": 0.006099999882280827},
                {"Name": "HARASSMENT_OR_ABUSE", "Score": 0.018200000748038292},
                {"Name": "SEXUAL", "Score": 0.0027000000700354576},
                {"Name": "VIOLENCE_OR_THREAT", "Score": 0.8145999908447266}
            ]
        },
        {
            "Toxicity": 0.9843000173568726,
            "Labels": [
                {"Name": "PROFANITY", "Score": 0.9369999766349792},
                {"Name": "HATE_SPEECH", "Score": 0.30880001187324524},
                {"Name": "INSULT", "Score": 0.42100000381469727},
                {"Name": "GRAPHIC", "Score": 0.12630000710487366},
                {"Name": "HARASSMENT_OR_ABUSE", "Score": 0.25519999861717224},
                {"Name": "SEXUAL", "Score": 0.19169999659061432},
                {"Name": "VIOLENCE_OR_THREAT", "Score": 0.19539999961853027}
            ]
        }
    ]
}

In the preceding JSON, the first text segment is considered safe with a low toxicity score. However, the second and third text segments received toxicity scores of 73% and 98%, respectively. For the second segment, Amazon Comprehend detects a high toxicity score for VIOLENCE_OR_THREAT; for the third segment, it detects PROFANITY with a high toxicity score.
Sample request using the Python SDK
The following code snippet demonstrates how to utilize the Python SDK to invoke the Toxicity Detection API. This code receives the same JSON response as the AWS CLI command demonstrated earlier.

import boto3

# Initialize an Amazon Comprehend boto3 client object
comprehend_client = boto3.client("comprehend")

# Call the Comprehend DetectToxicContent API with text segments
response = comprehend_client.detect_toxic_content(
    TextSegments=[
        {"Text": "and go through the door go through the door he's on the right"},
        {"Text": "he's on the right XXXXX him"},
        {"Text": "what the XXXX are you doing man that's why i didn't want to play"}
    ],
    LanguageCode="en"
)
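
To act on these results programmatically, a small amount of post-processing of the response dictionary is enough. The following sketch (the 0.5 threshold is an arbitrary choice, not an AWS recommendation) flags segments whose overall Toxicity score exceeds the threshold and reports the highest-scoring category for each:

# Flag text segments whose overall toxicity exceeds a chosen threshold
TOXICITY_THRESHOLD = 0.5  # arbitrary cutoff; tune it to your moderation policy

for index, result in enumerate(response["ResultList"]):
    if result["Toxicity"] >= TOXICITY_THRESHOLD:
        # Pick the toxicity category with the highest confidence score
        top_label = max(result["Labels"], key=lambda label: label["Score"])
        print(f"Segment {index}: toxic ({result['Toxicity']:.2f}), top category {top_label['Name']} ({top_label['Score']:.2f})")
    else:
        print(f"Segment {index}: below threshold ({result['Toxicity']:.2f})")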

Summary
In this post, we provided an overview of the new Amazon Comprehend Toxicity Detection API. We also described how you can parse the API response JSON. For more information, refer to the Amazon Comprehend API documentation.
Amazon Comprehend toxicity detection is now generally available in four Regions: us-east-1, us-west-2, eu-west-1, and ap-southeast-2.
To learn more about content moderation, refer to Guidance for Content Moderation on AWS. Take the first step towards streamlining your content moderation operations with AWS.

About the Authors
Lana Zhang is a Senior Solutions Architect at AWS WWSO AI Services team, specializing in AI and ML for Content Moderation, Computer Vision, Natural Language Processing and Generative AI. With her expertise, she is dedicated to promoting AWS AI/ML solutions and assisting customers in transforming their business solutions across diverse industries, including social media, gaming, e-commerce, media, advertising & marketing.
Ravisha SK is a Senior Product Manager, Technical at AWS with a focus on AI/ML. She has over 10 years of experience in data analytics and machine learning across different domains. In her spare time, she enjoys reading, experimenting in the kitchen and exploring new coffee shops.

Fine-tune and Deploy Mistral 7B with Amazon SageMaker JumpStart

Today, we are excited to announce the capability to fine-tune the Mistral 7B model using Amazon SageMaker JumpStart. You can now fine-tune and deploy Mistral text generation models on SageMaker JumpStart using the Amazon SageMaker Studio UI with a few clicks or using the SageMaker Python SDK.
Foundation models perform very well with generative tasks, from crafting text and summaries, answering questions, to producing images and videos. Despite the great generalization capabilities of these models, there are often use cases that have very specific domain data (such as healthcare or financial services), and these models may not be able to provide good results for these use cases. This results in a need for further fine-tuning of these generative AI models over the use case-specific and domain-specific data.
In this post, we demonstrate how to fine-tune the Mistral 7B model using SageMaker JumpStart.
What is Mistral 7B
Mistral 7B is a foundation model developed by Mistral AI, supporting English text and code generation abilities. It supports a variety of use cases, such as text summarization, classification, text completion, and code completion. To demonstrate the customizability of the model, Mistral AI has also released a Mistral 7B-Instruct model for chat use cases, fine-tuned using a variety of publicly available conversation datasets.
Mistral 7B is a transformer model and uses grouped query attention and sliding window attention to achieve faster inference (low latency) and handle longer sequences. Grouped query attention is an architecture that combines multi-query and multi-head attention to achieve output quality close to multi-head attention and comparable speed to multi-query attention. The sliding window attention method uses the multiple levels of a transformer model to focus on information that came earlier, which helps the model understand a longer stretch of context. Mistral 7B has an 8,000-token context length, demonstrates low latency and high throughput, and has strong performance when compared to larger model alternatives, providing low memory requirements at a 7B model size. The model is made available under the permissive Apache 2.0 license, for use without restrictions.
You can fine-tune the models using either the SageMaker Studio UI or SageMaker Python SDK. We discuss both methods in this post.
Fine-tune via the SageMaker Studio UI
In SageMaker Studio, you can access the Mistral model via SageMaker JumpStart under Models, notebooks, and solutions, as shown in the following screenshot.
If you don’t see Mistral models, update your SageMaker Studio version by shutting down and restarting. For more information about version updates, refer to Shut down and Update Studio Apps.

On the model page, you can point to the Amazon Simple Storage Service (Amazon S3) bucket containing the training and validation datasets for fine-tuning. In addition, you can configure deployment configuration, hyperparameters, and security settings for fine-tuning. You can then choose Train to start the training job on a SageMaker ML instance.

Deploy the model
After the model is fine-tuned, you can deploy it using the model page on SageMaker JumpStart. The option to deploy the fine-tuned model will appear when fine-tuning is complete, as shown in the following screenshot.

Fine-tune via the SageMaker Python SDK
You can also fine-tune Mistral models using the SageMaker Python SDK. The complete notebook is available on GitHub. In this section, we provide examples of two types of fine-tuning.
Instruction fine-tuning
Instruction tuning is a technique that involves fine-tuning a language model on a collection of natural language processing (NLP) tasks using instructions. In this technique, the model is trained to perform tasks by following textual instructions instead of specific datasets for each task. The model is fine-tuned with a set of input and output examples for each task, allowing the model to generalize to new tasks that it hasn’t been explicitly trained on as long as prompts are provided for the tasks. Instruction tuning helps improve the accuracy and effectiveness of models and is helpful in situations where large datasets aren’t available for specific tasks.
Let’s walk through the fine-tuning code provided in the example notebook with the SageMaker Python SDK.
We use a subset of the Dolly dataset in an instruction tuning format, and specify the template.json file describing the input and the output formats. The training data must be formatted in JSON lines (.jsonl) format, where each line is a dictionary representing a single data sample. In this case, we name it train.jsonl.
The following snippet is an example of train.jsonl. The keys instruction, context, and response in each sample should have corresponding entries {instruction}, {context}, {response} in the template.json.

{
    "instruction": "What is a dispersive prism?",
    "context": "In optics, a dispersive prism is an optical prism that is used to disperse light, that is, to separate light into its spectral components (the colors of the rainbow). Different wavelengths (colors) of light will be deflected by the prism at different angles. This is a result of the prism material's index of refraction varying with wavelength (dispersion). Generally, longer wavelengths (red) undergo a smaller deviation than shorter wavelengths (blue). The dispersion of white light into colors by a prism led Sir Isaac Newton to conclude that white light consisted of a mixture of different colors.",
    "response": "A dispersive prism is an optical prism that disperses the light's different wavelengths at different angles. When white light is shined through a dispersive prism it will separate into the different colors of the rainbow."
}

The following is a sample of template.json:

{
    "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n",
    "completion": " {response}"
}
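
A minimal sketch of the upload step follows, using the train.jsonl and template.json files from above; the bucket name and prefix are placeholders, and the example notebook in the repository uses its own paths:

import boto3

s3_client = boto3.client("s3")
bucket = "your-training-bucket"   # placeholder bucket name
prefix = "mistral-7b-finetune"    # placeholder key prefix

# Upload the training data and the prompt template to the same S3 location
s3_client.upload_file("train.jsonl", bucket, f"{prefix}/train.jsonl")
s3_client.upload_file("template.json", bucket, f"{prefix}/template.json")

# This S3 URI is passed later to the estimator's fit() call as the train channel
train_data_location = f"s3://{bucket}/{prefix}"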

After you upload the prompt template and the training data to an S3 bucket, you can set the hyperparameters.

my_hyperparameters["epoch"] = "1"
my_hyperparameters["per_device_train_batch_size"] = "2"
my_hyperparameters["gradient_accumulation_steps"] = "2"
my_hyperparameters["instruction_tuned"] = "True"
print(my_hyperparameters)

You can then start the fine-tuning process and deploy the model to an inference endpoint. In the following code, we use an ml.g5.12xlarge instance:

from sagemaker.jumpstart.estimator import JumpStartEstimator

instruction_tuned_estimator = JumpStartEstimator(
    model_id=model_id,
    hyperparameters=my_hyperparameters,
    instance_type="ml.g5.12xlarge",
)
instruction_tuned_estimator.fit({"train": train_data_location}, logs=True)

instruction_tuned_predictor = instruction_tuned_estimator.deploy()
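
After the endpoint is in service, you can query the fine-tuned model. The following is a minimal sketch that assumes the common JumpStart text generation payload of an inputs string plus a parameters dictionary; check the example notebook for the exact schema of your model version:

# Query the fine-tuned model behind the SageMaker endpoint
payload = {
    "inputs": "What is a dispersive prism?",
    "parameters": {"max_new_tokens": 150, "temperature": 0.2},
}
response = instruction_tuned_predictor.predict(payload)
print(response)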

Domain adaptation fine-tuning
Domain adaptation fine-tuning is a process that refines a pre-trained LLM to better suit a specific domain or task. By using a smaller, domain-specific dataset, the LLM can be fine-tuned to understand and generate content that is more accurate, relevant, and insightful for that specific domain, while still retaining the vast knowledge it gained during its original training.
The Mistral model can be fine-tuned on any domain-specific dataset. After it’s fine-tuned, it’s expected to generate domain-specific text and solve various NLP tasks in that specific domain. For the training dataset, provide a train directory and an optional validation directory, each containing a single CSV, JSON, or TXT file. For CSV and JSON formats, use data from the text column or the first column if text isn’t present. Ensure only one file exists under each directory. For instance, input data may be SEC filings of Amazon as a text file:

This report includes estimates, projections, statements relating to our
business plans, objectives, and expected operating results that are “forward-
looking statements” within the meaning of the Private Securities Litigation
Reform Act of 1995, Section 27A of the Securities Act of 1933, and Section 21E
of the Securities Exchange Act of 1934. Forward-looking statements may appear
throughout this report, including the following sections: “Business” (Part I,
Item 1 of this Form 10-K), “Risk Factors” (Part I, Item 1A of this Form 10-K),
and “Management’s Discussion and Analysis of Financial Condition and Results
of Operations” (Part II, Item 7 of this Form 10-K). These forward-looking
statements generally are identified by the words “believe,” “project,”
“expect,” “anticipate,” “estimate,” “intend,” “strategy,” “future,”
“opportunity,” “plan,” “may,” “should,” “will,” “would,” “will be,” “will
continue,” “will likely result,” and similar expressions.

You can start domain adaptation fine-tuning by specifying the hyperparameter "instruction_tuned" as "False". The rest of the steps are similar to the instruction fine-tuning steps.
We set max_input_length to be 2048 on g5.12xlarge. You can use a higher input length on a larger instance type. For details, see the example notebook.
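
Putting these settings together, a sketch of the domain adaptation run mirrors the instruction fine-tuning code shown earlier, reusing the same model_id, hyperparameters dictionary, instance type, and training data location:

# Switch to domain adaptation fine-tuning and cap the input length for ml.g5.12xlarge
my_hyperparameters["instruction_tuned"] = "False"
my_hyperparameters["max_input_length"] = "2048"

domain_adaptation_estimator = JumpStartEstimator(
    model_id=model_id,
    hyperparameters=my_hyperparameters,
    instance_type="ml.g5.12xlarge",
)
domain_adaptation_estimator.fit({"train": train_data_location}, logs=True)

domain_adaptation_predictor = domain_adaptation_estimator.deploy()
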
Inference example with and without fine-tuning
The following examples show the results of the Mistral 7B model fine-tuned with SEC filing documents of Amazon from 2021–2022. We compare the output before and after fine-tuning.

Input prompt: This Form 10-K report shows that
Output before fine-tuning: the company has been in business for 10 years and has a total of 10 employees. The company has a total of 10 employees and is in the business of providing services to the public. The company has a total of 10 employees and is in the business of providing services to the public. The company has a total of 10 employees and is in the business of providing services to the public. The company has a total of 10 employees and is in the business of providing services to the public. The company has a total of 10 employees and is in the business of providing services to the public. The company has a total of 10 employees and is in the business of providing
Output after fine-tuning: we had net sales of $1.2 billion, $1.5 billion, and $1.8 billion in 2004, 2003, and 2002. These amounts include sales from our international websites. Net sales from our international websites were $323 million, $242 million, and $150 million in 2004, 2003, and 2002. Net sales from our international websites represented 27%, 16%, and 8% of net sales in 2004, 2003, and 2002.
Parameters: "temperature": 0.2, "max_new_tokens": 150

Input prompt: We serve consumers through
Output before fine-tuning: our retail network of 1,700 stores, which includes 1,200 stores in the United States, 400 stores in Canada, and 100 stores in Mexico. We also serve consumers through our e-commerce websites. ## What is the biggest home improvement store? The Home Depot is the largest home improvement retailer in the United States. ## What is the largest home improvement store in the world? The largest home improvement store in the world is Home Depot. ## What is the biggest home improvement store in the
Output after fine-tuning: our online and physical stores and focus on selection, price, and convenience. We design our stores to enable hundreds of millions of unique products to be sold by us and by third parties across dozens of product categories. Customers access our offerings through our websites, mobile apps, Alexa, devices, streaming, and physically visiting our stores. We also manufacture and sell electronic devices, including Kindle, Fire tablet, Fire TV, Echo, and Ring, and we develop and produce media content. We seek to offer our customers low prices, fast and free delivery, easy-to-use functionality, and timely customer service. In addition, we offer subscription services such as Amazon Prime, a

Input prompt: Our vision is
Output before fine-tuning: to be the leading provider of quality, affordable, and accessible health care services in the communities we serve. Our mission is to provide quality, affordable, and accessible health care services to the communities we serve. Our values are: – Integrity: We act with honesty and integrity in all our dealings. – Respect: We treat everyone with respect and dignity. – Excellence: We strive for excellence in all we do. – Compassion: We care for our patients and each other with compassion. – Teamwork: We work together as a team to achieve our goals. Our commitment to our patients is to provide quality, affordable, and accessible health care services.
Output after fine-tuning: to be Earth’s most customer-centric company. We are guided by four principles: customer obsession rather than competitor focus, passion for invention, commitment to operational excellence, and long-term thinking. In each of our segments, we serve our primary customer sets, consisting of consumers, sellers, developers, enterprises, and content creators. In addition, we provide services, such as advertising. We have organized our operations into three segments: North America, International, and AWS. These segments reflect the way the Company evaluates its business performance and manages its operations. Information on our net sales is contained in Item 8 of Part II, “Financial Statements

As you can see, the fine-tuned model provides more specific information related to Amazon compared to the generic pre-trained one. This is because fine-tuning adapts the model to understand the nuances, patterns, and specifics of the provided dataset. By using a pre-trained model and tailoring it with fine-tuning, we ensure that you get the best of both worlds: the broad knowledge of the pre-trained model and the specialized accuracy for your unique dataset. One size may not fit all in the world of machine learning, and fine-tuning is the tailor-made solution you need!
Conclusion
In this post, we discussed fine-tuning the Mistral 7B model using SageMaker JumpStart. We showed how you can use the SageMaker JumpStart console in SageMaker Studio or the SageMaker Python SDK to fine-tune and deploy these models. As a next step, you can try fine-tuning these models on your own dataset using the code provided in the GitHub repository to test and benchmark the results for your use cases.

About the Authors
Xin Huang is a Senior Applied Scientist for Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms. He focuses on developing scalable machine learning algorithms. His research interests are in the area of natural language processing, explainable deep learning on tabular data, and robust analysis of non-parametric space-time clustering. He has published many papers in ACL, ICDM, KDD conferences, and Royal Statistical Society: Series A.
Vivek Gangasani is an AI/ML Startup Solutions Architect for Generative AI startups at AWS. He helps emerging GenAI startups build innovative solutions using AWS services and accelerated compute. Currently, he is focused on developing strategies for fine-tuning and optimizing the inference performance of Large Language Models. In his free time, Vivek enjoys hiking, watching movies, and trying different cuisines.
Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker built-in algorithms and helps develop machine learning algorithms. He got his PhD from the University of Illinois Urbana-Champaign. He is an active researcher in machine learning and statistical inference, and has published many papers in NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP conferences.

Harness large language models in fake news detection

Fake news, defined as news that conveys or incorporates false, fabricated, or deliberately misleading information, has existed since as early as the emergence of the printing press. The rapid spread of fake news and disinformation online is not only deceiving to the public, but can also have a profound impact on society, politics, economy, and culture. Examples include:

Cultivating distrust in the media
Undermining the democratic process
Spreading false or discredited science (for example, the anti-vax movement)

Advances in artificial intelligence (AI) and machine learning (ML) have made developing tools for creating and sharing fake news even easier. Early examples include advanced social bots and automated accounts that supercharge the initial stage of spreading fake news. In general, it is not trivial for the public to determine whether such accounts are people or bots. In addition, social bots are not illegal tools, and many companies legally purchase them as part of their marketing strategy. Therefore, it’s not easy to curb the use of social bots systematically.
Recent discoveries in the field of generative AI make it possible to produce textual content at an unprecedented pace with the help of large language models (LLMs). LLMs are generative AI text models with over 1 billion parameters, and they facilitate the synthesis of high-quality text.
In this post, we explore how you can use LLMs to tackle the prevalent issue of detecting fake news. We suggest that LLMs are sufficiently advanced for this task, especially if improved prompt techniques such as Chain-of-Thought and ReAct are used in conjunction with tools for information retrieval.
We illustrate this by creating a LangChain application that, given a piece of news, informs the user whether the article is true or fake using natural language. The solution also uses Amazon Bedrock, a fully managed service that makes foundation models (FMs) from Amazon and third-party model providers accessible through the AWS Management Console and APIs.
LLMs and fake news
The fake news phenomenon started evolving rapidly with the advent of the internet and more specifically social media (Nielsen et al., 2017). On social media, fake news can be shared quickly in a user’s network, leading the public to form the wrong collective opinion. In addition, people often propagate fake news impulsively, ignoring the factuality of the content if the news resonates with their personal norms (Tsipursky et al. 2018). Research in social science has suggested that cognitive bias (confirmation bias, bandwagon effect, and choice-supportive bias) is one of the most pivotal factors in making irrational decisions in terms of both the creation and consumption of fake news (Kim, et al., 2021). This also implies that news consumers share and consume information only in the direction of strengthening their beliefs.
The power of generative AI to produce textual and rich content at an unprecedented pace aggravates the fake news problem. An example worth mentioning is deepfake technology—combining various images on an original video and generating a different video. Besides the disinformation intent that human actors bring to the mix, LLMs add a whole new set of challenges:

Factual errors – LLMs have an increased risk of containing factual errors due to the nature of their training and ability to be creative while generating the next words in a sentence. LLM training is based on repeatedly presenting a model with incomplete input, then using ML training techniques until it correctly fills in the gaps, thereby learning language structure and a language-based world model. Consequently, although LLMs are great pattern matchers and re-combiners (“stochastic parrots”), they fail at a number of simple tasks that require logical reasoning or mathematical deduction, and can hallucinate answers. In addition, temperature is one of the LLM input parameters that controls the behavior of the model when generating the next word in a sentence. By selecting a higher temperature, the model will use a lower-probability word, providing a more random response.
Lengthy – Generated texts tend to be lengthy and lack a clearly defined granularity for individual facts.
Lack of fact-checking – There is no standardized tooling available for fact-checking during the process of text generation.

Overall, the combination of human psychology and limitations of AI systems has created a perfect storm for the proliferation of fake news and misinformation online.
Solution overview
LLMs are demonstrating outstanding capabilities in language generation, understanding, and few-shot learning. They are trained on a vast corpus of text from the internet, where quality and accuracy of extracted natural language may not be assured.
In this post, we provide a solution to detect fake news based on both the Chain-of-Thought and ReAct (Reasoning and Acting) prompt approaches. First, we discuss those two prompt engineering techniques, then we show their implementation using LangChain and Amazon Bedrock.
The following architecture diagram outlines the solution for our fake news detector.

Architecture diagram for fake news detection.

We use a subset of the FEVER dataset containing a statement and the ground truth about the statement indicating false, true, or unverifiable claims (Thorne J. et al., 2018).
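To make this concrete, the following minimal sketch shows one way to load such a subset and pick a statement to verify. It assumes the subset has been exported to a hypothetical local file named fever_subset.csv with claim and label columns; the GitHub repository may load the data differently.

import pandas as pd

# Load a pre-exported FEVER subset (hypothetical file and column names)
fever_subset = pd.read_csv("fever_subset.csv")  # columns: claim, label

# Pick one claim and keep its ground-truth label to compare against the model's verdict
sample = fever_subset.sample(1).iloc[0]
statement, ground_truth = sample["claim"], sample["label"]
print(f"Statement: {statement}\nGround truth: {ground_truth}")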
The workflow can be broken down into the following steps:

The user selects one of the statements to check if fake or true.
The statement and the fake news detection task are incorporated into the prompt.
The prompt is passed to LangChain, which invokes the FM in Amazon Bedrock.
Amazon Bedrock returns a response to the user request with the statement True or False.

In this post, we use the Claude v2 model from Anthropic (anthropic.claude-v2). Claude is a generative LLM based on Anthropic’s research into creating reliable, interpretable, and steerable AI systems. Created using techniques like constitutional AI and harmlessness training, Claude excels at thoughtful dialogue, content creation, complex reasoning, creativity, and coding. However, by using Amazon Bedrock and our solution architecture, we also have the flexibility to choose among other FMs provided by Amazon, AI21 Labs, Cohere, and Stability AI.
You can find the implementation details in the following sections. The source code is available in the GitHub repository.
Prerequisites
For this tutorial, you need a bash terminal with Python 3.9 or higher installed on either Linux, Mac, or a Windows Subsystem for Linux and an AWS account.
We also recommend using either an Amazon SageMaker Studio notebook, an AWS Cloud9 instance, or an Amazon Elastic Compute Cloud (Amazon EC2) instance.
Deploy fake news detection using the Amazon Bedrock API
The solution uses the Amazon Bedrock API, which can be accessed using the AWS Command Line Interface (AWS CLI), the AWS SDK for Python (Boto3), or an Amazon SageMaker notebook. Refer to the Amazon Bedrock User Guide for more information. For this post, we use the Amazon Bedrock API via the AWS SDK for Python.
Set up Amazon Bedrock API environment
To set up your Amazon Bedrock API environment, complete the following steps:

Download the latest Boto3 or upgrade it:

pip install --upgrade boto3

Make sure you configure the AWS credentials using the aws configure command or pass them to the Boto3 client.
Install the latest version of LangChain:

pip install "langchain>=0.0.317" --quiet

You can now test your setup using the following Python shell script. The script instantiates the Amazon Bedrock client using Boto3. Next, we call the list_foundation_models API to get the list of foundation models available for use.

import boto3
import json

bedrock = boto3.client('bedrock', region_name=YOUR_REGION)
print(json.dumps(bedrock.list_foundation_models(), indent=4))

After successfully running the preceding command, you should get the list of FMs from Amazon Bedrock.
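If you only want to confirm that the Anthropic models are available in your Region, you can filter the response client-side. The following snippet is a small sketch that reuses the bedrock client created above and relies on the modelSummaries field returned by list_foundation_models.

# Print only the Anthropic model IDs available in your Region
response = bedrock.list_foundation_models()
for model in response["modelSummaries"]:
    if model.get("providerName") == "Anthropic":
        print(model["modelId"])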
LangChain as a prompt chaining solution
To detect fake news for a given sentence, we follow the zero-shot Chain-of-Thought reasoning process (Wei J. et al., 2022), which is composed of the following steps:

Initially, the model attempts to create a statement about the news prompted.
The model creates a bullet point list of assertions.
For each assertion, the model determines if the assertion is true or false. Note that using this methodology, the model relies exclusively on its internal knowledge (weights computed in the pre-training phase) to reach a verdict. The information is not verified against any external data at this point.
Given the facts, the model answers TRUE or FALSE for the given statement in the prompt.

To achieve these steps, we use LangChain, a framework for developing applications powered by language models. This framework allows us to augment the FMs by chaining together various components to create advanced use cases. In this solution, we use the built-in SimpleSequentialChain in LangChain to create a simple sequential chain. This is very useful, because we can take the output from one chain and use it as the input to another.
Amazon Bedrock is integrated with LangChain, so you only need to pass the model_id when instantiating the Bedrock object. If needed, the model inference parameters can be provided through the model_kwargs argument, such as the following (see the sketch after this list):

maxTokenCount – The maximum number of tokens in the generated response
stopSequences – The stop sequence used by the model
temperature – A value that ranges between 0–1, with 0 being the most deterministic and 1 being the most creative
topP – A value that ranges between 0–1, used to control the tokens’ choices based on the probability of the potential choices
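Note that these parameter names follow the naming used by the Amazon Titan text models; Anthropic Claude uses different names, such as max_tokens_to_sample, as shown in the next code block. As a minimal sketch of where such parameters go, the following call invokes a Titan text model directly through the Bedrock runtime API; it assumes that model is enabled in your account, and YOUR_REGION is a placeholder as in the rest of this post.

import boto3
import json

bedrock_runtime = boto3.client('bedrock-runtime', region_name=YOUR_REGION)

# Inference parameters for an Amazon Titan text model go into textGenerationConfig
body = json.dumps({
    "inputText": "In one sentence, explain why fact-checking matters.",
    "textGenerationConfig": {
        "maxTokenCount": 256,
        "stopSequences": [],
        "temperature": 0,   # 0 = most deterministic
        "topP": 0.9,
    },
})
response = bedrock_runtime.invoke_model(
    modelId="amazon.titan-text-express-v1",
    body=body,
    accept="application/json",
    contentType="application/json",
)
print(json.loads(response["body"].read())["results"][0]["outputText"])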

If this is the first time you are using an Amazon Bedrock foundational model, make sure you request access to the model by selecting from the list of models on the Model access page on the Amazon Bedrock console, which in our case is claude-v2 from Anthropic.

from langchain.llms.bedrock import Bedrock

bedrock_runtime = boto3.client(
    service_name='bedrock-runtime',
    region_name=YOUR_REGION,
)
model_kwargs = {
    'max_tokens_to_sample': 8192
}
llm = Bedrock(model_id="anthropic.claude-v2", client=bedrock_runtime, model_kwargs=model_kwargs)

The following function defines the Chain-of-Thought prompt chain we mentioned earlier for detecting fake news. The function takes the Amazon Bedrock object (llm) and the user prompt (q) as arguments. LangChain’s PromptTemplate functionality is used here to predefine a recipe for generating prompts.

from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.chains import SimpleSequentialChain

def generate_and_print(llm, q):
    total_prompt = ""

    # the model is asked to make a bullet point list of the assumptions behind the statement
    template = """Here is a statement:
{statement}
Make a bullet point list of the assumptions you made when given the above statement.\n\n"""
    prompt_template = PromptTemplate(input_variables=["statement"], template=template)
    assumptions_chain = LLMChain(llm=llm, prompt=prompt_template)
    total_prompt = total_prompt + template

    # for each assertion, the model is asked to determine whether it is true or false,
    # based on internal knowledge alone
    template = """Here is a bullet point list of assertions:
{assertions}
For each assertion, determine whether it is true or false. If it is false, explain why.\n\n"""
    prompt_template = PromptTemplate(input_variables=["assertions"], template=template)
    fact_checker_chain = LLMChain(llm=llm, prompt=prompt_template)
    total_prompt = total_prompt + template

    # given the facts, the model answers TRUE or FALSE for the original statement
    template = """Based on the above assertions, the final response is FALSE if one of the assertions is FALSE. Otherwise, the final response is TRUE. You should only respond with TRUE or FALSE.'{}'""".format(q)
    template = """{facts}\n""" + template
    prompt_template = PromptTemplate(input_variables=["facts"], template=template)
    answer_chain = LLMChain(llm=llm, prompt=prompt_template)
    total_prompt = total_prompt + template

    # SimpleSequentialChain allows us to take the output from one chain and use it as the input to another
    overall_chain = SimpleSequentialChain(chains=[assumptions_chain, fact_checker_chain, answer_chain], verbose=True)
    answer = overall_chain.run(q)

    return answer

The following code calls the function we defined earlier and prints the answer, which is either TRUE or FALSE. TRUE means that the statement provided contains correct facts, and FALSE means that the statement contains at least one incorrect fact.

from IPython.display import display, Markdown

q = "The first woman to receive a Ph.D. in computer science was Dr. Barbara Liskov, who earned her degree from Stanford University in 1968."
print(f'The statement is: {q}')
display(Markdown(generate_and_print(llm, q)))

An example of a statement and model response is provided in the following output:

The statement is: The first woman to receive a Ph.D. in computer science was Dr. Barbara Liskov, who earned her degree from Stanford University in 1968.

> Entering new SimpleSequentialChain chain…
Here is a bullet point list of assumptions I made about the statement:

– Dr. Barbara Liskov was the first woman to earn a Ph.D. in computer science.

– Dr. Liskov earned her Ph.D. from Stanford University.

– She earned her Ph.D. in 1968.

– No other woman earned a Ph.D. in computer science prior to 1968.

– Stanford University had a computer science Ph.D. program in 1968.

– The statement refers to Ph.D. degrees earned in the United States.
Here are my assessments of each assertion:

– Dr. Barbara Liskov was the first woman to earn a Ph.D. in computer science.
– True. Dr. Liskov was the first American woman to earn a Ph.D. in computer science, which she received from Stanford University in 1968.

– Dr. Liskov earned her Ph.D. from Stanford University.
– True. Multiple sources confirm she received her Ph.D. from Stanford in 1968.

– She earned her Ph.D. in 1968.
– True. This is consistent across sources.

– No other woman earned a Ph.D. in computer science prior to 1968.
– False. While she was the first American woman, Mary Kenneth Keller earned a Ph.D. in computer science from the University of Wisconsin in 1965. However, Keller earned her degree in the US as well.

– Stanford University had a computer science Ph.D. program in 1968.
– True. Stanford established its computer science department and Ph.D. program in 1965.

– The statement refers to Ph.D. degrees earned in the United States.
– False. The original statement does not specify the country. My assumptions that it refers to the United States is incorrect. Keller earned her Ph.D. in the US before Liskov.
False

ReAct and tools
In the preceding example, the model correctly identified that the statement is false. However, submitting the query again demonstrates the model’s inability to reliably distinguish the correctness of facts. The model doesn’t have the tools to verify the truthfulness of statements beyond its own training memory, so subsequent runs of the same prompt can lead it to mislabel fake statements as true. The following output shows a different run of the same example:

The statement is: The first woman to receive a Ph.D. in computer science was Dr. Barbara Liskov, who earned her degree from Stanford University in 1968.

> Entering new SimpleSequentialChain chain…
Here is a bullet point list of assumptions I made about the statement:

– Dr. Barbara Liskov was the first woman to earn a Ph.D. in computer science
– Dr. Liskov earned her Ph.D. degree in 1968
– Dr. Liskov earned her Ph.D. from Stanford University
– Stanford University awarded Ph.D. degrees in computer science in 1968
– Dr. Liskov was a woman
– Ph.D. degrees existed in 1968
– Computer science existed as a field of study in 1968
Here are my assessments of each assertion:

– Dr. Barbara Liskov was the first woman to earn a Ph.D. in computer science
– True. Dr. Liskov was the first woman to earn a Ph.D. in computer science in 1968 from Stanford University.

– Dr. Liskov earned her Ph.D. degree in 1968
– True. Multiple sources confirm she received her Ph.D. in computer science from Stanford in 1968.

– Dr. Liskov earned her Ph.D. from Stanford University
– True. Dr. Liskov earned her Ph.D. in computer science from Stanford University in 1968.

– Stanford University awarded Ph.D. degrees in computer science in 1968
– True. Stanford awarded Liskov a Ph.D. in computer science in 1968, so they offered the degree at that time.

– Dr. Liskov was a woman
– True. All biographical information indicates Dr. Liskov is female.

– Ph.D. degrees existed in 1968
– True. Ph.D. degrees have existed since the late 19th century.

– Computer science existed as a field of study in 1968
– True. While computer science was a relatively new field in the 1960s, Stanford and other universities offered it as a field of study and research by 1968.
True

One technique for improving truthfulness is ReAct. ReAct (Yao S. et al., 2023) is a prompt technique that augments the foundation model with an agent’s action space. In this post, as well as in the ReAct paper, the action space implements information retrieval using search, lookup, and finish actions from a simple Wikipedia web API.
The reason for using ReAct instead of Chain-of-Thought is to augment the foundation model with external knowledge retrieval when detecting whether a given piece of news is fake or true.
In this post, we use LangChain’s implementation of ReAct through the ZERO_SHOT_REACT_DESCRIPTION agent. We modify the previous function to implement ReAct and use Wikipedia through the load_tools function from langchain.agents.
We also need to install the Wikipedia package:

!pip install Wikipedia

Below is the new code:

from langchain.agents import load_tools, initialize_agent, AgentType

def generate_and_print(llm, q):
    print(f'Inside generate_and_print: q = {q}')
    tools = load_tools(["wikipedia"], llm=llm)
    agent = initialize_agent(tools, llm,
                             agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
                             verbose=True,
                             handle_parsing_errors=True,
                             agent_kwargs={})

    input = """Here is a statement:
{statement}
Is this statement correct? You can use tools to find information if needed.
The final response is FALSE if the statement is FALSE. Otherwise, TRUE."""

    answer = agent.run(input.format(statement=q))

    return answer

The following is the output of the preceding function given the same statement used before:

> Entering new AgentExecutor chain…
Here are my thoughts and actions to determine if the statement is true or false:

Thought: To verify if this statement about the first woman to receive a PhD in computer science is true, I should consult a reliable information source like Wikipedia.

Action: Wikipedia
Action Input: first woman to receive phd in computer science
Observation: Page: Fu Foundation School of Engineering and Applied Science
Summary: The Fu Foundation School of Engineering and Applied Science (popularly known as SEAS or Columbia Engineering; previously known as Columbia School of Mines) is the engineering and applied science school of Columbia University. It was founded as the School of Mines in 1863 and then the School of Mines, Engineering and Chemistry before becoming the School of Engineering and Applied Science. On October 1, 1997, the school was renamed in honor of Chinese businessman Z.Y. Fu, who had donated $26 million to the school.
The Fu Foundation School of Engineering and Applied Science maintains a close research tie with other institutions including NASA, IBM, MIT, and The Earth Institute. Patents owned by the school generate over $100 million annually for the university. SEAS faculty and alumni are responsible for technological achievements including the developments of FM radio and the maser.
The School’s applied mathematics, biomedical engineering, computer science and the financial engineering program in operations research are very famous and highly ranked. The current SEAS faculty include 27 members of the National Academy of Engineering and one Nobel laureate. In all, the faculty and alumni of Columbia Engineering have won 10 Nobel Prizes in physics, chemistry, medicine, and economics.
The school consists of approximately 300 undergraduates in each graduating class and maintains close links with its undergraduate liberal arts sister school Columbia College which shares housing with SEAS students. The School’s current dean is Shih-Fu Chang, who was appointed in 2022.

Page: Doctor of Science
Summary: A Doctor of Science (Latin: Scientiae Doctor; most commonly abbreviated DSc or ScD) is an academic research doctorate awarded in a number of countries throughout the world. In some countries, a Doctor of Science is the degree used for the standard doctorate in the sciences; elsewhere a Doctor of Science is a “higher doctorate” awarded in recognition of a substantial and sustained contribution to scientific knowledge beyond that required for a Doctor of Philosophy (PhD).

Page: Timeline of women in science
Summary: This is a timeline of women in science, spanning from ancient history up to the 21st century. While the timeline primarily focuses on women involved with natural sciences such as astronomy, biology, chemistry and physics, it also includes women from the social sciences (e.g. sociology, psychology) and the formal sciences (e.g. mathematics, computer science), as well as notable science educators and medical scientists. The chronological events listed in the timeline relate to both scientific achievements and gender equality within the sciences.
Thought: Based on the Wikipedia pages, the statement appears to be false. The Wikipedia Timeline of Women in Science page indicates that Adele Goldstine was the first woman to earn a PhD in computer science in 1964 from the University of Michigan, not Barbara Liskov from Stanford in 1968. Therefore, my final answer is:

Final Answer: FALSE

Clean up
To save costs, delete all the resources you deployed as part of the tutorial. If you launched AWS Cloud9 or an EC2 instance, you can delete it via the console or using the AWS CLI. Similarly, you can delete the SageMaker notebook you may have created via the SageMaker console.
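For example, if you used a classic SageMaker notebook instance (rather than Studio) or an EC2 instance, a minimal cleanup sketch using Boto3 might look like the following; the notebook instance name and EC2 instance ID are hypothetical placeholders for whatever you actually created, and YOUR_REGION is a placeholder as before.

import boto3

sagemaker = boto3.client('sagemaker', region_name=YOUR_REGION)

# Stop the notebook instance, wait until it is stopped, then delete it
notebook_name = 'fake-news-detection-notebook'  # hypothetical name
sagemaker.stop_notebook_instance(NotebookInstanceName=notebook_name)
sagemaker.get_waiter('notebook_instance_stopped').wait(NotebookInstanceName=notebook_name)
sagemaker.delete_notebook_instance(NotebookInstanceName=notebook_name)

# If you launched an EC2 instance instead, terminate it (hypothetical instance ID)
ec2 = boto3.client('ec2', region_name=YOUR_REGION)
ec2.terminate_instances(InstanceIds=['i-0123456789abcdef0'])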
Limitations and related work
The field of fake news detection is actively researched in the scientific community. In this post, we used the Chain-of-Thought and ReAct techniques, and in evaluating them we focused only on the accuracy of the classification (whether a given statement is true or false). We therefore haven’t considered other important aspects such as response latency, nor have we extended the solution to knowledge base sources beyond Wikipedia.
Although this post focused on two techniques, Chain-of-Thought and ReAct, an extensive body of work has explored how LLMs can detect, eliminate, or mitigate fake news. Lee et al. proposed an encoder-decoder model that uses named entity recognition (NER) to mask named entities, in order to ensure that the masked token actually uses the knowledge encoded in the language model. Chern et al. developed FacTool, which uses Chain-of-Thought principles to extract claims from the prompt and then collect relevant evidence for those claims; the LLM then judges the factuality of each claim given the retrieved evidence. Du E. et al. present a complementary approach in which multiple LLMs propose and debate their individual responses and reasoning processes over multiple rounds in order to arrive at a common final answer.
Based on the literature, we see that the effectiveness of LLMs in detecting fake news increases when the LLMs are augmented with external knowledge and multi-agent conversation capability. However, these approaches are more computationally complex because they require multiple model calls and interactions, longer prompts, and lengthy network layer calls. Ultimately, this complexity translates into an increased overall cost. We recommend assessing the cost-to-performance ratio before deploying similar solutions in production.
Conclusion
In this post, we delved into how to use LLMs to tackle the prevalent issue of fake news, which is one of the major challenges facing society today. We started by outlining the challenges presented by fake news, with an emphasis on its potential to sway public sentiment and cause societal disruptions.
We then introduced the concept of LLMs as advanced AI models that are trained on a substantial quantity of data. Due to this extensive training, these models boast an impressive understanding of language, enabling them to produce human-like text. With this capacity, we demonstrated how LLMs can be harnessed in the battle against fake news by using two different prompt techniques, Chain-of-Thought and ReAct.
We underlined how LLMs can facilitate fact-checking services on an unparalleled scale, given their capability to process and analyze vast amounts of text swiftly. This potential for real-time analysis can lead to early detection and containment of fake news. We illustrated this by creating a Python script that, given a statement, highlights to the user whether the article is true or fake using natural language.
We concluded by underlining the limitations of the current approach and ended on a hopeful note, stressing that, with the correct safeguards and continuous enhancements, LLMs could become indispensable tools in the fight against fake news.
We’d love to hear from you. Let us know what you think in the comments section, or use the issues forum in the GitHub repository.
Disclaimer: The code provided in this post is meant for educational and experimentation purposes only. It should not be relied upon to detect fake news or misinformation in real-world production systems. No guarantees are made about the accuracy or completeness of fake news detection using this code. Users should exercise caution and perform due diligence before utilizing these techniques in sensitive applications.
To get started with Amazon Bedrock, visit the Amazon Bedrock console.

About the authors
Anamaria Todor is a Principal Solutions Architect based in Copenhagen, Denmark. She saw her first computer when she was 4 years old and never let go of computer science, video games, and engineering since. She has worked in various technical roles, from freelancer, full-stack developer, to data engineer, technical lead, and CTO, at various companies in Denmark, focusing on the gaming and advertising industries. She has been at AWS for over 3 years, working as a Principal Solutions Architect, focusing mainly on life sciences and AI/ML. Anamaria has a bachelor’s in Applied Engineering and Computer Science, a master’s degree in Computer Science, and over 10 years of AWS experience. When she’s not working or playing video games, she’s coaching girls and female professionals in understanding and finding their path through technology.
Marcel Castro is a Senior Solutions Architect based in Oslo, Norway. In his role, Marcel helps customers with architecture, design, and development of cloud-optimized infrastructure. He is a member of the AWS Generative AI Ambassador team with the goal to drive and support EMEA customers on their generative AI journey. He holds a PhD in Computer Science from Sweden and a master’s and bachelor’s degree in Electrical Engineering and Telecommunications from Brazil.

What is Website Visitor Identification & Why Do You Need This AI T …

Yes, You Can Learn Who Is Visiting Your Site in 90 Seconds!

In this fast-paced digital world we live in, understanding who your website visitors are is crucial to making smart marketing decisions and driving revenue. Unfortunately, companies like Google and Apple are making it harder and harder to get visitor data. 

ClickIDs are going away and remarketing audiences are shrinking. User-level data is effectively gone from analytics and ads just aren’t performing like they used to. What are advertisers supposed to do?

The answer comes in the form of website visitor identification. Website visitor identification is a total game changer in the marketing space and it is an absolute must as paid ads become less and less effective. 

Let’s dive into what website visitor identification is, how it works, and how you can find out who is visiting your site.

What is Website Visitor Identification?

How Do I Identify Visitors on My Site?

Key Trends in Website Visitor Identification

Benefits of Website Visitor Identification

How Website Visitor Identification Can Enhance Lead Generation

Success Story: DTC Health & Wellness Brand

Website Visitor Identification FAQs

What is Website Visitor Identification?  

Website visitor identification is the process of identifying and understanding the companies and individuals visiting your website. 

When an anonymous user lands on your site, visitor identification tools tell you who they are — their name, their email, their phone number, what page they landed on, and much more. They are no longer anonymous!

How Do I Identify Visitors on My Site?

To turn anonymous website visitors into known users, you need the right tools and technologies. That’s where the Website Visitor ID X-Ray Pixel comes in. By placing this pixel on your site, you can begin identifying who is coming to your site. 

To install the Website Visitor ID X-Ray Pixel, sign up (for FREE!), go to your dashboard, and navigate to My Automations. 

Select + New Automation and get your pixel. We have easy install options for Google Tag Manager, WordPress, and Shopify, or you can install the pixel manually.

Once your pixel is set up, you can verify it’s working right on the same screen. Sounds simple enough right?

Watch our Head of Product, Mitchell Trulli, show you how you can start tracking visitors in 90 seconds:

The best part? It’s FREE to start! Go spend the next 90 seconds setting up your tracking and then come back and let’s talk more about website visitor identification.

Key Trends in Website Visitor Identification

As we explore the key trends in website visitor identification, it’s essential to grasp the magnitude of the challenge. A staggering 98% of website visitors remain anonymous, withholding their valuable information.

The Evolution of Data Analytics and Tracking Technologies

The landscape of data analytics and tracking technologies is undergoing a seismic shift, making website visitor identification essential. The introduction of GA4 signifies a fundamental change, requiring businesses to adapt to new event-tracking methodologies and a more user-centric model while recent privacy changes instituted by Apple, notably ATT and LTP, pose significant challenges for marketers in tracking and identifying website visitors using Apple devices. 

These changes emphasize the need for adaptive strategies and for marketers and advertisers to think outside of the box when it comes to reaching their target audience.

The Rise of AI and Machine Learning

AI and machine learning algorithms are revolutionizing how businesses identify and engage with website visitors. These technologies go beyond traditional methods, offering not just predictive insights into user behavior and preferences, but also telling you who your visitors are!

By leveraging AI and ML, marketers gain a deeper understanding of their visitors, allowing for more personalized and targeted interactions. 

Personalization and Cross-Device Tracking

The landscape of personalization and cross-device tracking is grappling with unprecedented challenges due to recent privacy changes. Notably, with iOS 17 stripping click IDs, the once-viable cross-device tracking has become an intricate puzzle for marketers.

Simultaneously, these privacy measures, while safeguarding user data, paradoxically diminish the size of targeted audiences and impact retargeting effectiveness on platforms like Google Ads and Facebook Ads. 

Navigating this delicate balance between privacy and personalization becomes increasingly intricate, forcing marketers to reassess strategies and adopt innovative approaches to engage audiences effectively in this evolving privacy-conscious landscape.

Benefits of Website Visitor Identification

Unlocking the potential of website visitor identification brings a host of strategic advantages. From elevating the user experience to driving higher conversion rates and sales, the benefits are kind of endless IMO. If you aren’t convinced though, let’s take a look.

1. Enhanced User Experience Through Personalization

According to McKinsey, companies that get personalization right have the potential to generate 40% more revenue. 

Capturing crucial website visitor information enables marketers to craft a highly personalized user experience and in turn — generate more revenue! 

What’s an example of personalization in action? With a website visitor identification tool like Customers.ai, you can track users by page. So let’s say Visitor A came in on a page for high heels. You can now create a segment for all individuals who visited the high heels page and ensure you are serving them ads for that particular product category. Visitor A along with everyone else looking for high heels is now being targeted with exactly what they were looking for.

This level of personalization not only enhances user satisfaction but also increases the likelihood of a conversion, creating a more meaningful and engaging journey for every individual interacting with the site. 

2. Improved Conversion Rates and Sales

An invaluable aspect of website visitor identification lies in its ability to generate warm leads. Unlike individuals encountering your brand for the first time, identified website visitors bring a level of familiarity that significantly enhances conversion potential.

Leveraging this familiarity, marketers can tailor their strategies to engage with visitors who already have a connection with the brand, ultimately fostering a higher likelihood of successful conversions and sales.

3. Better Email Deliverability and Engagement

How many times have you received an email and thought, why am I getting this? And then…spam.

But what about the email from the store you just left or the cart you abandoned? You care about that one!

Website visitor identification plays a pivotal role in elevating email deliverability and engagement. It ensures email communications are tailored to individuals who are already familiar with the brand. 

By honing in on this connection, marketers can deliver content that genuinely speaks to the interests and needs of their audience, ultimately fostering stronger relationships and driving better email performance.

4. Competitive Advantage and Staying Ahead of the Curve

Embracing website visitor identification provides a distinct competitive advantage for businesses, positioning them at the forefront of their industries. 

By staying ahead of the curve in understanding and engaging with website visitors, marketers can anticipate and respond to evolving market demands more effectively than competitors. This proactive approach not only enhances brand visibility but also allows businesses to tailor their strategies based on real-time insights, creating a dynamic and responsive marketing framework. 

In an ever-changing landscape, the ability to glean valuable information from your website visitors grants businesses the upper hand, ensuring they stand out.

5. Larger Retargeting Audiences for Google and Facebook Ads

We’ve already discussed recent privacy changes and unfortunately, two of the hardest-hit marketing channels are Google Ads and Facebook Ads. After all, how can you retarget people if you can’t tell they’ve been on your site?

Website visitor identification emerges as a strategic solution to this challenge. By capturing and utilizing identified visitor data, marketers can not only counteract the impact of privacy changes but also expand retargeting audiences significantly.

Take your website visitor lists and use them to create custom audiences on Google and Facebook. As mentioned above, you can also create different segments, allowing you to put money toward targeted and effective campaigns.

How Website Visitor Identification Can Enhance Lead Generation

Website visitor identification is a powerful tool that goes beyond merely tracking website traffic—it’s a strategic asset for supercharging your sales efforts. Here are five key ways in which website visitor identification can elevate your lead generation game:

1. Building High-Quality Email Lists

The absolute best part about website visitor identification is it enables you to capture the email addresses of your users. By identifying individuals who have engaged with your content or shown interest in your products/services, you can build targeted and high-quality email lists. You can also break those lists into specific segments and make your targeting even more spot-on!

2. Personalized Content for Targeted Outreach

Understanding the specific pages your users interact with allows for highly personalized content and highly personalized outreach. With website visitor identification, you can tailor your messaging based on the interests and preferences of individual visitors. This level of personalization significantly enhances the relevance of your content, making your outreach more compelling and increasing the likelihood of converting leads into sales.

3. Identifying Warm Leads for Strategic Follow-ups

Look, we know that not all leads are created equal. One of the best things about website visitor identification is it helps you distinguish warm leads from cold ones. By analyzing visitor behavior, such as repeated visits to pricing pages or engagement with specific product/service information, you can identify leads exhibiting purchase intent.

This insight allows your sales and marketing teams to strategically prioritize follow-ups, focusing their efforts on leads more likely to convert. Seems like a good thing right?

4. Optimizing Landing Pages for Conversion

We use analytics to understand how users are interacting with our websites. It’s why we also use eye-tracking tools and heat maps and A/B testing software. We just want to know what people are doing on our site.

Website visitor identification can provide valuable data on how visitors interact with not just your site, but also specific landing pages, allowing you to pinpoint areas for improvement and optimize landing pages to enhance conversion rates. 

Whether it’s adjusting the layout, refining your call-to-action (CTA), or tailoring content based on visitor segments, these optimizations can have a direct and positive impact on sales by creating a more seamless and compelling user experience.

5. Enhanced Retargeting Strategies

We’ve said it before and we’ll say it again – website visitor identification lays the foundation for robust retargeting strategies. 

By identifying individuals who have visited your site but haven’t taken a desired action, you can deploy targeted retargeting campaigns. These campaigns, whether through display ads or social media, keep your brand in front of potential leads, nurturing them through the sales funnel and increasing the chances of conversion.

Incorporating website visitor identification into your lead generation strategy not only brings efficiency but also precision, ensuring your efforts are focused on the most promising prospects. You can create a more personalized and effective process that drives meaningful results for your business.

A Website Visitor ID Success Story: DTC Health & Wellness Brand

A DTC Health & Wellness brand used the Customers.ai Website Visitor ID X-Ray Pixel tool to learn who was visiting their site and what products they were shopping for. They wanted to know who was coming to their site but not adding items to their cart or creating accounts. With Customers.ai, they got it.

They were able to identify their visitors and remarket to both those who hadn’t purchased and those who had abandoned their carts with high-converting email automations in Klaviyo.

The result?

Get Started with Website Visitor Identification

In today’s dynamic digital landscape, acquiring insights into your website visitors is paramount for informed marketing decisions and revenue generation. 

However, challenges arise as tech giants like Google and Apple implement stringent privacy measures, eroding click IDs and diminishing remarketing audiences. 

This is where the game-changing role of website visitor identification comes into play, offering a lifeline as traditional advertising effectiveness wanes. 

By understanding who is visiting your site through tools like Customers.ai Website Visitor ID X-Ray Pixel, you gain a competitive edge in personalization, conversion optimization, and strategic retargeting, ensuring your marketing strategies stay ahead in an evolving and privacy-conscious environment. 

Dive into the world of website visitor identification to unlock the potential of personalized and effective marketing.

Next Steps

See what targeted outbound marketing is all about. Capture and engage your first 50 website visitor leads with Customers.ai X-Ray website visitor identification for free.

Talk and learn about sales outreach automation with other growth enthusiasts. Join Customers.ai Island, our Facebook group of 40K marketers and entrepreneurs who are ready to support you.

Advance your marketing performance with Sales Outreach School, a free tutorial and training area for sales pros and marketers.

Website Visitor Identification FAQs

Q. How does website visitor identification work?

It works by using tools and technologies like IP matching or digital fingerprints to associate website visits with specific companies or individuals, providing detailed insights into their interactions with the site.

Q. Why is website visitor identification important for businesses?

Website visitor identification is crucial for businesses as it allows them to turn anonymous traffic into actionable data. It helps in targeted marketing, lead generation, and improving overall user experience.

Q. How can website visitor identification enhance lead generation?

It enhances lead generation by providing businesses with information about their website visitors, allowing for targeted outreach, personalized content, and strategic follow-ups to convert leads into customers.

Q. What challenges are associated with website visitor identification?

Challenges include privacy concerns, evolving data privacy regulations, and adapting to changes in tracking technologies, such as Google Analytics 4 or Apple’s privacy updates.

Q. How does website visitor identification comply with data privacy regulations?

Compliance is achieved by prioritizing user consent, anonymizing sensitive data, and following regulations like GDPR. Tools often have features to ensure privacy and compliance.

Q. What role does AI play in website visitor identification?

AI plays a crucial role by analyzing patterns in visitor behavior, predicting preferences, and automating identification processes. It enables more accurate and real-time insights.

Q. How does customers.ai enhance website visitor tracking for businesses?

Customers.ai leverages advanced analytics and artificial intelligence to provide in-depth insights into website visitor behavior. By combining data from various touchpoints, it enables businesses to track customer journeys effectively, understand preferences, and optimize engagement strategies for improved conversions.

Q. How can website visitor identification improve user experience?

It improves user experience by personalizing content based on identified visitor interests, creating a more engaging and relevant journey through the website.

Q. Are there any case studies or success stories related to website visitor identification?

Yes, businesses have reported success in lead generation, increased sales, and improved marketing strategies by effectively implementing website visitor identification tools.

Q. How does website visitor identification contribute to personalized marketing?

It contributes to personalized marketing by providing detailed insights into visitor preferences, allowing businesses to tailor content and outreach for a more personalized experience.

Q. What impact do privacy changes have on website visitor identification?

Privacy changes, like those from Apple or Google, pose challenges by limiting tracking capabilities. Businesses must adapt with privacy-compliant strategies and innovative approaches.

Q. What unique features does customers.ai offer for website visitor tracking compared to other tools?

Customers.ai has personalized tracking capabilities, offering businesses the ability to create detailed customer profiles. The platform utilizes machine learning algorithms to predict customer behavior, allowing for more targeted marketing efforts and the delivery of customized content based on individual visitor preferences.

Q. How can businesses leverage customers.ai’s website visitor tracking to improve customer satisfaction?

By utilizing customers.ai’s website visitor tracking tool, businesses can gain real-time insights into customer interactions. This enables proactive customer support, personalized communication, and the identification of pain points in the customer journey, ultimately leading to enhanced customer satisfaction and loyalty.

Q. How does website visitor identification impact marketing ROI?

It positively impacts marketing ROI by optimizing ad spend, focusing efforts on high-engagement audiences, and delivering more targeted campaigns to identified visitors.

Q. Can website visitor identification be applied to both B2B and B2C businesses?

Yes, it is versatile and applicable to both B2B and B2C contexts, aiding in lead generation, personalized marketing, and overall customer engagement.

Q. What role does website visitor identification play in event marketing?

For events, it helps identify companies or individuals showing interest in event-related content, enabling more effective event promotion and tailored outreach.

Q. How does website visitor identification contribute to customer retention?

By identifying existing customers’ website visits, businesses can understand their interests and preferences, facilitating targeted campaigns and strategies to enhance customer loyalty and lifetime value.

Q. Can website visitor identification be integrated with customer relationship management (CRM) systems?

Yes, many website visitor identification tools including Customers.ai offer integrations with CRMs, allowing businesses to seamlessly sync identified visitor data with their CRM systems for comprehensive customer insights.

Q. What role does website visitor identification play in lead scoring?

It significantly enhances lead scoring by providing data on visitor behavior, enabling businesses to prioritize leads based on engagement levels and potential conversion likelihood.

Q. How does website visitor tracking work, and what tools can I use for effective tracking?

Website visitor tracking involves using tools like Customers.ai to monitor user activities. Implement the tracking code on your site, and tools specifically designed for website visitor tracking provide insights into metrics such as page views, visitor demographics, referral sources, and more.

Q. How does website visitor identification impact email marketing strategies?

It improves email marketing strategies by ensuring that communications are tailored to the interests of identified visitors, leading to higher engagement and better overall campaign performance.

Q. Can website visitor identification help businesses understand the effectiveness of specific website pages?

Yes, businesses can use visitor identification data to analyze the effectiveness of individual pages, identify popular content, and optimize underperforming pages to enhance the overall user experience.

Q. How does website visitor identification contribute to content optimization?

It contributes to content optimization by providing insights into visitor preferences, allowing businesses to tailor and optimize content to better meet the needs and interests of their audience.

Q. What role does website visitor identification play in account-based marketing (ABM)?

In ABM, website visitor identification helps businesses identify key accounts visiting their site, allowing for targeted outreach, personalized content, and a more strategic approach to engaging high-value accounts.

Q. What are the best practices for ensuring privacy in website visitor tracking?

Privacy is crucial in website visitor tracking. To address concerns, we implement best practices such as anonymizing IP addresses, offering clear opt-out options, and maintaining transparent privacy policies. Compliance with data protection regulations is essential to ensure responsible tracking.
The post What is Website Visitor Identification & Why Do You Need This AI Tech? appeared first on Customers.ai.

Cerebras and G42 Break New Ground with 4-Exaflop AI Supercomputer: Pav …

As technology continues to advance at an astonishing pace, Cerebras Systems and G42 have just taken a giant leap forward in the world of artificial intelligence. In a groundbreaking partnership, they have successfully completed a 4-Exaflop AI supercomputer, marking a significant milestone in the quest for unprecedented computational power. This achievement also signifies the beginning of an ambitious journey towards reaching an astounding 8-Exaflops.

Unleashing the Power of Exaflops: A Milestone in AI

The world of AI and high-performance computing is constantly evolving, with researchers and engineers tirelessly pushing the boundaries of what’s possible. Cerebras and G42 have emerged as frontrunners in this race, showcasing their technical prowess with the completion of a 4-Exaflop AI supercomputer. This achievement is a testament to their commitment to innovation and their dedication to advancing the field of artificial intelligence.

The Road to 8-Exaflops: A Grand Ambition

What makes this achievement even more remarkable is that it’s not the end of the road for Cerebras and G42. Instead, it’s just the beginning of an even more ambitious journey. With the successful completion of the 4-Exaflop supercomputer, they are now setting their sights on the staggering goal of achieving 8-Exaflops. This pursuit promises to redefine the boundaries of AI research and open up new possibilities that were once unimaginable.

Implications for AI Research and Beyond

The implications of this achievement are far-reaching. As AI continues to play an increasingly crucial role in various industries, from healthcare to finance to autonomous vehicles, having access to such colossal computational power is a game-changer. Researchers will be able to tackle complex problems at an unprecedented scale, leading to breakthroughs in AI applications that can benefit society as a whole.

Collaboration and Innovation

The success of Cerebras and G42 in creating a 4-Exaflop AI supercomputer underscores the importance of collaboration and innovation in the tech industry. By pooling their resources and expertise, these two companies have managed to achieve what was once thought to be impossible. This partnership serves as an inspiring example of what can be accomplished when brilliant minds come together to tackle the most challenging problems.

A Bright Future for AI

As we look to the future, it’s clear that the world of AI is poised for even greater advancements. Cerebras and G42’s achievement is a testament to the incredible potential of human ingenuity and the relentless pursuit of excellence. With their eyes set on 8-Exaflops, we can only imagine the groundbreaking discoveries and innovations that lie ahead.

In conclusion, the completion of the 4-Exaflop AI supercomputer by Cerebras and G42 is a historic moment in the world of artificial intelligence. It marks the beginning of an exciting journey towards 8-Exaflops and promises to usher in a new era of AI research and innovation. The tech world is watching with anticipation as these two visionary companies continue to push the boundaries of what’s possible in the field of AI.
The post Cerebras and G42 Break New Ground with 4-Exaflop AI Supercomputer: Paving the Way for 8-Exaflops appeared first on MarkTechPost.

Researchers from Waabi and the University of Toronto Introduce LabelFo …

Modern self-driving systems frequently use large-scale, manually annotated datasets to train object detectors to recognize the traffic participants in a scene. Auto-labeling methods that automatically produce sensor data labels have recently gained more attention. Auto-labeling may provide far bigger datasets at a fraction of the expense of human annotation if its computational cost is less than that of human annotation and the labels it produces are of comparable quality. More precise perception models may then be trained using these auto-labeled datasets. Since LiDAR is the main sensor used on many self-driving platforms, the researchers use it as the input modality. Furthermore, they concentrate on the supervised scenario in which the auto-labeler may be trained using a collection of ground-truth labels.

This issue setting is also known as offboard perception, which does not have real-time limitations and, in contrast to onboard perception, has access to future observations. As seen in Fig. 1, the most popular model addresses the offboard perception problem in two steps, drawing inspiration from the human annotation procedure. Using a “detect-then-track” framework, objects and their coarse bounding box trajectories are first acquired, and each object track is then refined independently. Tracking as many objects in the scene as possible is the primary objective of the first stage, which aims to obtain high recall. On the other hand, the second stage concentrates on track refining to generate higher-quality bounding boxes. They call the second step “trajectory refinement,” which is the subject of this study. 

Figure 1: Auto-labelling paradigm in two stages. The detect-then-track paradigm is used in the first step to collect trajectories of coarse objects. Every trajectory is separately refined in the second step.

Managing object occlusions, the sparsity of observations as the range grows, and objects’ varying sizes and motion patterns make this work difficult. To address these issues, a model must be designed that can efficiently and effectively utilize the temporal context of the complete object trajectory. Nevertheless, current techniques fall short because they handle dynamic object trajectories in a suboptimal sliding-window manner, applying a neural network individually at every time step within a restricted temporal context to extract features. This is inefficient because features are repeatedly extracted from the same frame across several overlapping windows. Consequently, these architectures exploit relatively little temporal context in order to stay within the computational budget.

Moreover, earlier efforts used complex pipelines with several distinct networks (e.g., to accommodate differing handling of static and dynamic objects), which are difficult to construct, debug, and maintain. Taking a different strategy, researchers from Waabi and the University of Toronto introduce LabelFormer in this paper: a straightforward, effective, and economical trajectory refinement technique. It produces more precise bounding boxes by utilizing the entire temporal context of the trajectory. Furthermore, their solution outperforms current window-based approaches in computational efficiency, giving auto-labelling a distinct edge over human annotation. To do this, they create a transformer-based architecture that uses self-attention blocks to take advantage of temporal dependencies after individually encoding the initial bounding box parameters and the LiDAR observations at each time step.

Their approach eliminates superfluous computation by refining the complete trajectory in a single shot, so it only has to be run once per object track during inference. Their design is also far simpler than previous methods and handles both static and dynamic objects easily. Their comprehensive experimental assessment on highway and urban datasets demonstrates that their method is faster than window-based methods and achieves higher performance. They also show how LabelFormer can auto-label a bigger dataset to train downstream object detectors, which leads to more accurate detections than training on human-annotated data alone or with other auto-labelers.

Check out the Paper and Project Page. All credit for this research goes to the researchers of this project.

The post Researchers from Waabi and the University of Toronto Introduce LabelFormer: An Efficient Transformer-Based AI Model to Refine Object Trajectories for Auto-Labelling appeared first on MarkTechPost.

Rosalyn Unveils StableSight AI to Combat Rising Online Exam Cheating

Amid the growing complexities of academic dishonesty in online education, Rosalyn has introduced StableSight, an advanced AI system crafted to counter cheating methods, encompassing generative AI and organized cheating rings during online exams. The surge in demand for professional certifications, driven by a significant wage gap between credentialed and non-credentialed workers, has intensified the need for robust anti-cheating measures in online testing environments. StableSight, with its innovative features and proactive approach, aims to address this issue, providing a fair testing environment while prioritizing fairness, accuracy, and test-takers’ rights.

StableSight is a comprehensive AI-driven solution that goes beyond conventional anti-cheating measures. It features a gaze-tracking model capable of detecting secondary screens, a prevalent method employed by individuals attempting to cheat during exams. The Keyboard Correlation Model adds an extra layer of security by predicting typed text through keyboard sound analysis, thwarting attempts to use concealed devices for dishonest purposes. The platform’s approach involves identifying suspected cases of cheating and escalating them to human reviewers, underscoring a commitment to fairness and accuracy in assessing test-taker behavior.

The team emphasized the evolving landscape of online education and examination, noting that generative AI has presented new opportunities for cheating, potentially triggering a network effect. StableSight has been meticulously crafted to counteract traditional forms of cheating and the complex challenges posed by generative AI and organized cheating syndicates. Rosalyn has introduced an invite-only gamified portal to demonstrate its efficacy, allowing individuals to test their cheating detection skills against StableSight through 20 recorded exam sessions.

The platform’s AI proctoring services have garnered recognition from notable organizations, including the U.S. Department of Defense, Coursera, Stripe, Nexford University, Dominican University, and Missouri Baptist University. With a focus on maintaining the integrity and credibility of certifications and degrees, Rosalyn believes that AI is vital in enhancing security by continuously updating and learning to detect new cheating methods.

In light of the reported surge in cheating incidents, especially in online testing scenarios, Rosalyn’s proactive approach aligns with the demand for stringent anti-cheating measures. A 2022 academic study revealed that 35% of undergraduate business students admitted to cheating during online tests amid the COVID-19 pandemic. Collaborative research by Rosalyn and a language testing partner demonstrated StableSight’s effectiveness in identifying cheating rings among test-takers.

To address concerns about potential bias and false flags generated by AI proctoring platforms, Rosalyn emphasizes a balanced approach. StableSight leverages both human and artificial intelligence, combining advanced algorithms with human proctoring to enhance accuracy in detecting online cheating. This integration ensures that interventions in test-taking scenarios are implemented only when necessary, preventing unwarranted consequences for test-takers.

As the demand for remote proctoring continues to rise, Rosalyn acknowledges the need to make their systems more transparent and user-friendly, reducing perceived invasiveness while maintaining effectiveness. The central aim is to establish a harmonized ecosystem where technology elevates the online testing experience, preserves the integrity of online credentials, and prioritizes student privacy and comfort. StableSight emerges as a pioneering solution in the battle against online exam cheating, marking a significant step towards fair and secure online education environments.

Implement real-time personalized recommendations using Amazon Personal …

At a basic level, Machine Learning (ML) technology learns from data to make predictions. Businesses use their data with an ML-powered personalization service to elevate their customer experience. This approach allows businesses to use data to derive actionable insights and help grow their revenue and brand loyalty.
Amazon Personalize accelerates your digital transformation with ML, making it easier to integrate personalized recommendations into existing websites, applications, email marketing systems, and more. Amazon Personalize enables developers to quickly implement a customized personalization engine, without requiring ML expertise. Amazon Personalize provisions the necessary infrastructure and manages the entire machine learning (ML) pipeline, including processing the data, identifying features, using the most appropriate algorithms, and training, optimizing, and hosting the models. You receive results through an API and pay only for what you use, with no minimum fees or upfront commitments.
The post Architecting near real-time personalized recommendations with Amazon Personalize shows how to architect near real-time personalized recommendations using Amazon Personalize and AWS purpose-built data services. In this post, we walk you through a reference implementation of a real-time personalized recommendation system using Amazon Personalize.
Solution overview
The real-time personalized recommendations solution is implemented using Amazon Personalize, Amazon Simple Storage Service (Amazon S3), Amazon Kinesis Data Streams, AWS Lambda, and Amazon API Gateway.
The architecture is implemented as follows:

Data preparation – Start by creating a dataset group, schemas, and datasets representing your items, interactions, and user data.
Train the model – After importing your data, select the recipe matching your use case, and then create a solution to train a model by creating a solution version. When your solution version is ready, you can create a campaign for your solution version.
Get near real-time recommendations – When you have a campaign, you can integrate calls to the campaign in your application. This is where calls to the GetRecommendations or GetPersonalizedRanking APIs are made to request near real-time recommendations from Amazon Personalize.

For more information, refer to Architecting near real-time personalized recommendations with Amazon Personalize.
The following diagram illustrates the solution architecture.

Implementation
We demonstrate this implementation with a use case about making real-time movie recommendations to an end user based on their interactions with the movie database over time.
The solution is implemented using the following steps:

Prerequisite (Data preparation)
Set up your development environment
Deploy the solution
Create a solution version
Create a campaign
Create an event tracker
Get recommendations
Ingest real-time interactions
Validate real-time recommendations
Clean up

Prerequisites
Before you get started, make sure you have the following prerequisites:

Prepare your training data – Prepare and upload the data to an S3 bucket using the instructions. For this particular use case, you will upload interactions data and items data. An interaction is an event that you record and then import as training data. Amazon Personalize generates recommendations primarily based on the interactions data you import into an Interactions dataset. You can record multiple event types, such as click, watch, or like. Although the model created by Amazon Personalize can make suggestions based on a user’s past interactions alone, the quality of these suggestions improves when the model also has data about the associations among users or items. For example, if a user has engaged with movies categorized as Drama in the items dataset, Amazon Personalize will suggest movies (items) with the same genre. A minimal example of the interactions file format follows this list.
Set up your development environment – Install the AWS Command Line Interface (AWS CLI).
Configure the AWS CLI – Configure the AWS CLI with your AWS account information.
Install and bootstrap AWS Cloud Development Kit (AWS CDK)
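
As referenced in the training data prerequisite above, the following is a minimal, hypothetical example of the interactions file layout that Amazon Personalize expects for an Interactions dataset; the user, item, and timestamp values are made up:

import pandas as pd

# minimal interactions data in the USER_ID / ITEM_ID / EVENT_TYPE / TIMESTAMP layout
interactions = pd.DataFrame(
    [
        {"USER_ID": "429", "ITEM_ID": "185", "EVENT_TYPE": "click", "TIMESTAMP": 1698711110},
        {"USER_ID": "429", "ITEM_ID": "596", "EVENT_TYPE": "watch", "TIMESTAMP": 1698711500},
    ]
)

# write the CSV that gets uploaded to the training S3 bucket
interactions.to_csv("interactions.csv", index=False)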

Deploy the solution
To deploy the solution, do the following:

Clone the repository to a new folder on your desktop.
Deploy the stack to your AWS environment.

Create a solution version
A solution refers to the combination of an Amazon Personalize recipe, customized parameters, and one or more solution versions (trained models). When you deploy the CDK project in the previous step, a solution with a User-Personalization recipe is created for you automatically. A solution version refers to a trained machine learning model. Create a solution version for the implementation.
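The repo automates this step, but as a hedged sketch, the equivalent boto3 call looks roughly like the following; the solution ARN is a placeholder for the one created by the CDK stack:

import boto3

personalize = boto3.client("personalize")

# start training a new solution version (trained model) for the existing solution
response = personalize.create_solution_version(
    solutionArn="arn:aws:personalize:<region>:<account-id>:solution/<solution-name>"  # placeholder
)
solution_version_arn = response["solutionVersionArn"]

# training is asynchronous; wait until the status is ACTIVE before creating a campaign
status = personalize.describe_solution_version(
    solutionVersionArn=solution_version_arn
)["solutionVersion"]["status"]
print(status)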
Create a campaign
A campaign deploys a solution version (trained model) with a provisioned transaction capacity for generating real-time recommendations. Create a campaign for the implementation.
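Again as a sketch only, creating the campaign with boto3 could look like the following; the campaign name and the minimum provisioned TPS are illustrative:

import boto3

personalize = boto3.client("personalize")

# deploy the trained solution version behind a campaign endpoint
response = personalize.create_campaign(
    name="movie-recommendations-campaign",    # hypothetical name
    solutionVersionArn=solution_version_arn,  # from the previous step
    minProvisionedTPS=1,                      # minimum transactions per second to provision
)
campaign_arn = response["campaignArn"]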
Create an event tracker
Amazon Personalize can make recommendations based on real-time event data only, historical event data only, or both. Record real-time events to build out your interactions data and allow Amazon Personalize to learn from your user’s most recent activity. This keeps your data fresh and improves the relevance of Amazon Personalize recommendations. Before you can record events, you must create an event tracker. An event tracker directs new event data to the Interactions dataset in your dataset group. Create an event tracker for the implementation.
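A boto3 sketch of this step follows; the dataset group ARN is a placeholder for the one created by the CDK stack:

import boto3

personalize = boto3.client("personalize")

# the event tracker routes real-time events into the Interactions dataset
response = personalize.create_event_tracker(
    name="movie-event-tracker",  # hypothetical name
    datasetGroupArn="arn:aws:personalize:<region>:<account-id>:dataset-group/<dataset-group-name>",
)

# the tracking ID is what you pass with every PutEvents call
tracking_id = response["trackingId"]
event_tracker_arn = response["eventTrackerArn"]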
Get recommendations
In this use case, the interactions dataset is composed of movie IDs. Consequently, the recommendations presented to the user will consist of movie IDs that align most closely with their personal preferences, determined from their historical interactions. You can use the GetRecommendations API to retrieve personalized recommendations for a user by sending the associated userId, the number of recommendations you need for the user, and the campaign ARN. You can find the campaign ARN on the Amazon Personalize console.
For example, the following request will retrieve 5 recommendations for the user whose userId is 429:

curl --location 'https://{your-api-id}.execute-api.{your-region}.amazonaws.com/prod/getRecommendations?campaignArn={campaignArn}&userId=429&numResults=5'

The response from the request will be:

{
    "$metadata": {
        "httpStatusCode": 200,
        "requestId": "7159c128-4e16-45a4-9d7e-cf19aa2256e8",
        "attempts": 1,
        "totalRetryDelay": 0
    },
    "itemList": [
        {
            "itemId": "596",
            "score": 0.0243044
        },
        {
            "itemId": "153",
            "score": 0.0151695
        },
        {
            "itemId": "16",
            "score": 0.013694
        },
        {
            "itemId": "261",
            "score": 0.013524
        },
        {
            "itemId": "34",
            "score": 0.0122294
        }
    ],
    "recommendationId": "RID-1d-40c1-8d20-dfffbd7b0ac7-CID-06b10f"
}

The items returned by the API call are the movies that Amazon Personalize recommends to the user based on their historical interactions.
The score values are floating-point numbers between 0 and 1.0. They are specific to the current campaign and its recipe, and are computed relative to the scores of all the other items in your dataset.
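If you prefer to call Amazon Personalize directly from Python instead of going through the API Gateway endpoint, a minimal boto3 sketch of the same request looks like the following; the campaign ARN is a placeholder:

import boto3

personalize_runtime = boto3.client("personalize-runtime")

# same request as the curl example: 5 recommendations for user 429
response = personalize_runtime.get_recommendations(
    campaignArn="arn:aws:personalize:<region>:<account-id>:campaign/<campaign-name>",
    userId="429",
    numResults=5,
)

for item in response["itemList"]:
    print(item["itemId"], item["score"])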
Ingest real-time interactions
In the previous example, recommendations were obtained for the user with an ID of 429 based on their historical interactions with the movie database. For real-time recommendations, the user’s interactions with the items must be ingested into Amazon Personalize as they happen. These interactions are ingested into the recommendation system through the Amazon Personalize event tracker. The type of interaction, also called the event type, is given by the EVENT_TYPE column in the interactions dataset. In this example, the events can be of type watch or click, but you can define your own event types according to the needs of your application.
In this example, the exposed API that records user-item events receives an interactions parameter, which corresponds to the number of events (interactions) of a user (userId) with a single item (itemId) at that moment. The trackingId parameter can be found on the Amazon Personalize console and in the response of the create event tracker request.
The following example shows a PutEvents request that generates one interaction of type click, with an item ID of 185, for the user ID 429, using the current timestamp. Note that in production, sentAt should be set to the time of the user’s interaction; in this example, we set it to the point in time, in epoch time format, when we wrote the API request for this post. The events are sent to Amazon Kinesis Data Streams through API Gateway, which is why you need to send the stream-name and PartitionKey parameters.

curl --location 'https://iyxhva3ll6.execute-api.us-west-2.amazonaws.com/prod/data' --header 'Content-Type: application/json' --data '{"stream-name": "my-stream", "Data": {"userId": "429", "interactions": 1, "itemId": "185", "trackingId": "c90ac6d7-3d89-4abc-8a70-9b09c295cbcd", "eventType": "click", "sentAt": "1698711110"}, "PartitionKey": "userId"}'

You will receive a confirmation response similar to the following:

{
    "Message": "Event sent successfully",
    "data": {
        "EncryptionType": "KMS",
        "SequenceNumber": "49……….1901314",
        "ShardId": "shardId-xxxxxxx"
    }
}
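
Behind the API Gateway endpoint, the events flow through Kinesis Data Streams and are forwarded to the Amazon Personalize event tracker by a Lambda function. The actual function is part of the CDK project; the following is only a rough sketch of what such a Kinesis-triggered handler could look like, assuming the record payload mirrors the Data object in the request above:

import base64
import json
import time

import boto3

personalize_events = boto3.client("personalize-events")


def handler(event, context):
    # each Kinesis record carries one base64-encoded event payload
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))

        personalize_events.put_events(
            trackingId=payload["trackingId"],
            userId=payload["userId"],
            sessionId=payload["userId"],  # simplification: reuse the user ID as the session ID
            eventList=[
                {
                    "eventType": payload["eventType"],
                    "itemId": payload["itemId"],
                    "sentAt": int(payload.get("sentAt", time.time())),
                }
            ],
        )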

Validate real-time recommendations
Because the interactions dataset has been updated, the recommendations are automatically updated to take the new interactions into account. To validate that the recommendations are updated in real time, you can call the GetRecommendations API again for the same user ID 429; the result should be different from the previous one. The following results show a new recommendation with an ID of 594, and the recommendations with the IDs 596, 16, 153, and 261 changed their scores. The new item brings a new movie genre (Animation|Children|Drama|Fantasy|Musical) into the top 5 recommendations.
Request:

curl --location 'https://{your-api-id}.execute-api.{your-region}.amazonaws.com/prod/getRecommendations?campaignArn={campaignArn}&userId=429&numResults=5'

Response:

{
    "$metadata": {
        "httpStatusCode": 200,
        "requestId": "680f2be8-2e64-47d7-96f7-1c4aa9b9ac9d",
        "attempts": 1,
        "totalRetryDelay": 0
    },
    "itemList": [
        {
            "itemId": "596",
            "score": 0.0288085
        },
        {
            "itemId": "16",
            "score": 0.0134173
        },
        {
            "itemId": "594",
            "score": 0.0129357
        },
        {
            "itemId": "153",
            "score": 0.0129337
        },
        {
            "itemId": "261",
            "score": 0.0123728
        }
    ],
    "recommendationId": "RID-dc-44f8-a327-482fb9e54921-CID-06b10f"
}

The response shows that the recommendation provided by Amazon Personalize was updated in real-time.
Clean up
To avoid unnecessary charges, clean up the solution implementation by following the steps in Cleaning up resources.
Conclusion
In this post, we showed you how to implement a real-time personalized recommendation system using Amazon Personalize. The interactions with Amazon Personalize to ingest real-time events and get recommendations were executed with the command line tool curl, but these API calls can be integrated into a business application to achieve the same outcome.
To choose a new recipe for your use case, refer to Real-time personalization. To measure the impact of the recommendations made by Amazon Personalize, refer to Measuring impact of recommendations.

About the Authors
Cristian Marquez is a Senior Cloud Application Architect. He has vast experience designing, building, and delivering enterprise-level software, high-load and distributed systems, and cloud-native applications. He has experience in backend and frontend programming languages, as well as system design and the implementation of DevOps practices. He actively assists customers in building and securing innovative cloud solutions, solving their business problems, and achieving their business goals.
Anand Komandooru is a Senior Cloud Architect at AWS. He joined the AWS Professional Services organization in 2021 and helps customers build cloud-native applications on the AWS Cloud. He has over 20 years of experience building software, and his favorite Amazon leadership principle is "Leaders are right a lot."

Improve LLM responses in RAG use cases by interacting with the user

One of the most common applications of generative AI and large language models (LLMs) is answering questions based on a specific external knowledge corpus. Retrieval-Augmented Generation (RAG) is a popular technique for building question answering systems that use an external knowledge base. To learn more, refer to Build a powerful question answering bot with Amazon SageMaker, Amazon OpenSearch Service, Streamlit, and LangChain.
Traditional RAG systems often struggle to provide satisfactory answers when users ask vague or ambiguous questions without providing sufficient context. This leads to unhelpful responses like “I don’t know” or incorrect, made-up answers provided by an LLM. In this post, we demonstrate a solution to improve the quality of answers in such use cases over traditional RAG systems by introducing an interactive clarification component using LangChain.
The key idea is to enable the RAG system to engage in a conversational dialogue with the user when the initial question is unclear. By asking clarifying questions, prompting the user for more details, and incorporating the new contextual information, the RAG system can gather the necessary context to provide an accurate, helpful answer—even from an ambiguous initial user query.
Solution overview
To demonstrate our solution, we have set up an Amazon Kendra index (composed of the AWS online documentation for Amazon Kendra, Amazon Lex, and Amazon SageMaker), a LangChain agent with an Amazon Bedrock LLM, and a straightforward Streamlit user interface.
Prerequisites
To run this demo in your AWS account, complete the following prerequisites:

Clone the GitHub repository and follow the steps explained in the README.
Deploy an Amazon Kendra index in your AWS account. You can use the following AWS CloudFormation template to create a new index or use an already running index. Deploying a new index might add additional charges to your bill, so we recommend deleting it if you no longer need it. Note that the data within the index will be sent to the selected Amazon Bedrock foundation model (FM).
The LangChain agent relies on FMs available in Amazon Bedrock, but this can be adapted to any other LLM that LangChain supports.
To experiment with the sample front end shared with the code, you can use Amazon SageMaker Studio to run a local deployment of the Streamlit app. Note that running this demo will incur some additional costs.

Implement the solution
Traditional RAG agents are often designed as follows. The agent has access to a tool that is used to retrieve documents relevant to a user query. The retrieved documents are then inserted into the LLM prompt, so that the agent can provide an answer based on the retrieved document snippets.
In this post, we implement an agent that has access to KendraRetrievalTool, retrieves relevant documents from the Amazon Kendra index, and provides the answer given the retrieved context:

from langchain.agents import AgentType, Tool, initialize_agent

# llm, retrieval_qa_chain, and conversational_memory are defined elsewhere in the repo code

# tool for Kendra retrieval
kendra_tool = Tool(
    name="KendraRetrievalTool",
    func=retrieval_qa_chain,
    description="Use this tool first to answer human questions. The input to this tool should be the question.",
)

# traditional RAG agent
traditional_agent = initialize_agent(
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    tools=[kendra_tool],
    llm=llm,
    early_stopping_method="generate",
    memory=conversational_memory,
)

# user question
answer = traditional_agent.run("How many GPUs does my EC2 instance have?")

Refer to the GitHub repo for the full implementation code. To learn more about traditional RAG use cases, refer to Question answering using Retrieval Augmented Generation with foundation models in Amazon SageMaker JumpStart.
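The retrieval_qa_chain used by KendraRetrievalTool is defined in the repo. As a rough, hedged sketch of how such a chain can be assembled with LangChain's Amazon Kendra retriever and an Amazon Bedrock LLM (the index ID and model ID below are placeholder assumptions, not values from the repo):

import boto3
from langchain.chains import RetrievalQA
from langchain.llms.bedrock import Bedrock
from langchain.retrievers import AmazonKendraRetriever

# retriever over the Amazon Kendra index (placeholder index ID)
retriever = AmazonKendraRetriever(index_id="<your-kendra-index-id>")

# Amazon Bedrock LLM (any Bedrock text model supported by LangChain)
llm = Bedrock(
    client=boto3.client("bedrock-runtime"),
    model_id="anthropic.claude-v2",
)

# chain that stuffs the retrieved Kendra passages into the prompt
retrieval_qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
)

# expose the chain as a callable for the Tool's func argument
retrieval_qa_chain = retrieval_qa.run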
Consider the following example. A user asks “How many GPUs does my EC2 instance have?” As shown in the following screenshot, the agent is looking for the answer using KendraRetrievalTool. However, the agent realizes it doesn’t know which Amazon Elastic Compute Cloud (Amazon EC2) instance type the user is referencing and therefore provides no helpful answer to the user, leading to a poor customer experience.

To address this problem, we define an additional custom tool called AskHumanTool and provide it to the agent. The tool instructs an LLM to read the user question and ask a follow-up question to the user if KendraRetrievalTool is not able to return a good answer. This implies that the agent will now have two tools at its disposal:

# tool for asking human
human_ask_tool = CustomAskHumanTool()

# RAG agent with two tools
improved_agent = initialize_agent(
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    tools=[kendra_tool, human_ask_tool],
    llm=llm,
    early_stopping_method="generate",
    memory=conversational_memory,
)

# user question
answer = improved_agent.run("How many GPUs does my EC2 instance have?")

This allows the agent to either refine the question or provide additional context that is needed to respond to the prompt. To guide the agent to use AskHumanTool for this purpose, we provide the following tool description to the LLM:
Use this tool if you don’t find an answer using the KendraRetrievalTool.
Ask the human to clarify the question or provide the missing information.
The input should be a question for the human.
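The CustomAskHumanTool itself is implemented in the repo. The following is only a minimal sketch of what such a tool might look like as a LangChain custom tool; the console input here is an illustrative stand-in for the Streamlit front end used in the sample app:

from langchain.tools import BaseTool


class CustomAskHumanTool(BaseTool):
    name: str = "AskHumanTool"
    description: str = (
        "Use this tool if you don't find an answer using the KendraRetrievalTool. "
        "Ask the human to clarify the question or provide the missing information. "
        "The input should be a question for the human."
    )

    def _run(self, query: str) -> str:
        # surface the agent's follow-up question and return the human's reply as the tool output
        return input(f"\n{query}\nYour answer: ")

    async def _arun(self, query: str) -> str:
        raise NotImplementedError("Async execution is not implemented in this sketch")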
As illustrated in the following screenshot, by using AskHumanTool, the agent is now identifying vague user questions and returning a follow-up question to the user asking to specify what EC2 instance type is being used.

After the user has specified the instance type, the agent is incorporating the additional answer into the context for the original question, before deriving the correct answer.

Note that the agent can now decide whether to use KendraRetrievalTool to retrieve the relevant documents or ask a clarifying question using AskHumanTool. The agent’s decision is based on whether it finds the document snippets inserted into the prompt sufficient to provide the final answer. This flexibility allows the RAG system to support different queries a user may submit, including both well-formulated and vague questions.
In our example, the full agent workflow is as follows:

The user makes a request to the RAG app, asking “How many GPUs does my EC2 instance have?”
The agent uses the LLM to decide what action to take: Find relevant information to answer the user’s request by calling the KendraRetrievalTool.
The agent retrieves information from the Amazon Kendra index using the tool. The snippets from the retrieved documents are inserted into the agent prompt.
The LLM (of the agent) derives that the retrieved documents from Amazon Kendra aren’t helpful or don’t contain enough context to provide an answer to the user’s request.
The agent uses AskHumanTool to formulate a follow-up question: “What is the specific EC2 instance type you are using? Knowing the instance type would help answer how many GPUs it has.” The user provides the answer “ml.g5.12xlarge,” and the agent calls KendraRetrievalTool again, but this time adding the EC2 instance type into the search query.
After running through Steps 2–4 again, the agent derives a useful answer and sends it back to the user.

The following diagram illustrates this workflow.
The example described in this post illustrates how the addition of the custom AskHumanTool allows the agent to request clarifying details when needed. This can improve the reliability and accuracy of the responses, leading to a better customer experience in a growing number of RAG applications across different domains.
Clean up
To avoid incurring unnecessary costs, delete the Amazon Kendra index if you’re not using it anymore and shut down the SageMaker Studio instance if you used it to run the demo.
Conclusion
In this post, we showed how to enable a better customer experience for the users of a RAG system by adding a custom tool that enables the system to ask a user for a missing piece of information. This interactive conversational approach represents a promising direction for improving traditional RAG architectures. The ability to resolve vagueness through a dialogue can lead to delivering more satisfactory answers from a knowledge base.
Note that this approach is not limited to RAG use cases; you can use it in other generative AI use cases that depend on an agent at its core, where a custom AskHumanTool can be added.
For more information about using Amazon Kendra with generative AI, refer to Quickly build high-accuracy Generative AI applications on enterprise data using Amazon Kendra, LangChain, and large language models.

About the authors
Antonia Wiebeler is a Data Scientist at the AWS Generative AI Innovation Center, where she enjoys building proofs of concept for customers. Her passion is exploring how generative AI can solve real-world problems and create value for customers. While she is not coding, she enjoys running and competing in triathlons.
Nikita Kozodoi is an Applied Scientist at the AWS Generative AI Innovation Center, where he develops ML solutions to solve customer problems across industries. In his role, he focuses on advancing generative AI to tackle real-world challenges. In his spare time, he loves playing beach volleyball and reading.