Use Amazon SageMaker Studio to build a RAG question answering solution …

Retrieval Augmented Generation (RAG) allows you to provide a large language model (LLM) with access to data from external knowledge sources such as repositories, databases, and APIs without the need to fine-tune it. When using generative AI for question answering, RAG enables LLMs to answer questions with the most relevant, up-to-date information and optionally cite their data sources for verification.
A typical RAG solution for knowledge retrieval from documents uses an embeddings model to convert the data from the data sources to embeddings and stores these embeddings in a vector database. When a user asks a question, it searches the vector database and retrieves documents that are most similar to the user’s query. Next, it combines the retrieved documents and the user’s query in an augmented prompt that is sent to the LLM for text generation. There are two models in this implementation: the embeddings model and the LLM that generates the final response.
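Before diving into the SageMaker implementation, the following minimal, self-contained sketch illustrates that flow with toy stand-ins (character-count "embeddings" and an in-memory list as the "vector database"); the real solution later in this post swaps in the BGE embeddings model, Pinecone, and Llama 2 for these placeholders.
from math import sqrt

def embed(text):
    # Toy embedding: character-frequency vector; a real solution uses an embeddings model
    return [text.lower().count(c) for c in "abcdefghijklmnopqrstuvwxyz"]

def similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)) or 1.0
    return dot / norm

documents = ["Doc chunk about topic A.", "Doc chunk about topic B."]
index = [(doc, embed(doc)) for doc in documents]              # stands in for the vector database

def retrieve(question, k=1):
    # Return the k document chunks most similar to the question
    query = embed(question)
    return [doc for doc, vec in sorted(index, key=lambda d: -similarity(query, d[1]))[:k]]

question = "Tell me about topic A"
context = "\n".join(retrieve(question))
augmented_prompt = f"Context:\n{context}\n\nQuestion: {question}"  # this prompt is sent to the LLM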
In this post, we demonstrate how to use Amazon SageMaker Studio to build a RAG question answering solution.
Using notebooks for RAG-based question answering
Implementing RAG typically entails experimenting with various embedding models, vector databases, text generation models, and prompts, while also debugging your code until you achieve a functional prototype. Amazon SageMaker offers managed Jupyter notebooks equipped with GPU instances, enabling you to rapidly experiment during this initial phase without spinning up additional infrastructure. There are two options for using notebooks in SageMaker. The first option is fast launch notebooks available through SageMaker Studio. In SageMaker Studio, the integrated development environment (IDE) purpose-built for ML, you can launch notebooks that run on different instance types and with different configurations, collaborate with colleagues, and access additional purpose-built features for machine learning (ML). The second option is using a SageMaker notebook instance, which is a fully managed ML compute instance running the Jupyter Notebook app.
In this post, we present a RAG solution that augments the model’s knowledge with additional data from external knowledge sources to provide more accurate responses specific to a custom domain. We use a single SageMaker Studio notebook running on an ml.g5.2xlarge instance (1 A10G GPU) and Llama-2-7b-chat-hf from Hugging Face Hub, the fine-tuned version of Llama 2 7b that is optimized for dialog use cases. We use two AWS Media & Entertainment Blog posts as the sample external data, which we convert into embeddings with the BAAI/bge-small-en-v1.5 embeddings model. We store the embeddings in Pinecone, a vector database that offers high-performance search and similarity matching. We also discuss how to transition from experimenting in the notebook to deploying your models to SageMaker endpoints for real-time inference when you complete your prototyping. The same approach can be used with different models and vector databases.
Solution overview
The following diagram illustrates the solution architecture.

Implementing the solution consists of two high-level steps: developing the solution using SageMaker Studio notebooks, and deploying the models for inference.
Develop the solution using SageMaker Studio notebooks
Complete the following steps to start developing the solution:

Load the Llama-2 7b chat model from Hugging Face Hub in the notebook.
Create a PromptTemplate with LangChain and use it to create prompts for your use case.
For 1–2 example prompts, add relevant static text from external documents as prompt context and assess if the quality of the responses improves.
Assuming that the quality improves, implement the RAG question answering workflow:

Gather the external documents that can help the model better answer the questions in your use case.
Load the BGE embeddings model and use it to generate embeddings of these documents.
Store these embeddings in a Pinecone index.
When a user asks a question, perform a similarity search in Pinecone and add the content from the most similar documents to the prompt’s context.

Deploy the models to SageMaker for inference at scale
When you hit your performance goals, you can deploy the models to SageMaker to be used by generative AI applications:

Deploy the Llama-2 7b chat model to a SageMaker real-time endpoint.
Deploy the BAAI/bge-small-en-v1.5 embeddings model to a SageMaker real-time endpoint.
Use the deployed models in your question answering generative AI applications.

In the following sections, we walk you through the steps of implementing this solution in SageMaker Studio notebooks.
Prerequisites
To follow the steps in this post, you need to have an AWS account and an AWS Identity and Access Management (IAM) role with permissions to create and access the solution resources. If you are new to AWS, see Create a standalone AWS account.
To use SageMaker Studio notebooks in your AWS account, you need a SageMaker domain with a user profile that has permissions to launch the SageMaker Studio app. If you are new to SageMaker Studio, the Quick Studio setup is the fastest way to get started. With a single click, SageMaker provisions the SageMaker domain with default presets, including setting up the user profile, IAM role, IAM authentication, and public internet access. The notebook for this post assumes an ml.g5.2xlarge instance type. To review or increase your quota, open the AWS Service Quotas console, choose AWS Services in the navigation pane, choose Amazon SageMaker, and refer to the value for Studio KernelGateway apps running on ml.g5.2xlarge instances.

After confirming your quota limit, you need to complete the prerequisites for using Llama 2 7b chat.
Llama 2 7b chat is available under the Llama 2 license. To access Llama 2 on Hugging Face, you need to complete a few steps first:

Create a Hugging Face account if you don’t have one already.
Complete the form “Request access to the next version of Llama” on the Meta website.
Request access to Llama 2 7b chat on Hugging Face.

After you have been granted access, you can create a new access token to access models. To create an access token, navigate to the Settings page on the Hugging Face website.
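The model loading code later in this post references an hf_access_token variable; a minimal way to provide your token without hard-coding it in the notebook is to prompt for it at runtime, for example:
import getpass

# Paste the Hugging Face access token you created on the Settings page
hf_access_token = getpass.getpass("Hugging Face access token: ")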
You need to have an account with Pinecone to use it as a vector database. Pinecone is available on AWS via the AWS Marketplace. The Pinecone website also offers the option to create a free account that comes with permissions to create a single index, which is sufficient for the purposes of this post. To retrieve your Pinecone keys, open the Pinecone console and choose API Keys.
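The Pinecone client code later in this post reads the API key and environment from environment variables; one way to make them available is to set them near the top of the notebook (the values shown are placeholders to replace with your own):
import os

# Values from the Pinecone console (API Keys page)
os.environ["PINECONE_API_KEY"] = "<your-pinecone-api-key>"
os.environ["PINECONE_ENV"] = "<your-pinecone-environment>"  # for example, gcp-starter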

Set up the notebook and environment
To follow the code in this post, open SageMaker Studio and clone the following GitHub repository. Next, open the notebook studio-local-gen-ai/rag/RAG-with-Llama-2-on-Studio.ipynb and choose the PyTorch 2.0.0 Python 3.10 GPU Optimized image, Python 3 kernel, and ml.g5.2xlarge as the instance type. If this is your first time using SageMaker Studio notebooks, refer to Create or Open an Amazon SageMaker Studio Notebook.

To set up the development environment, you need to install the necessary Python libraries, as demonstrated in the following code:
%%writefile requirements.txt
sagemaker>=2.175.0
transformers==4.33.0
accelerate==0.21.0
datasets==2.13.0
langchain==0.0.297
pypdf>=3.16.3
pinecone-client
sentence_transformers
safetensors>=0.3.3
!pip install -U -r requirements.txt
Load the pre-trained model and tokenizer
After you have imported the required libraries, you can load the Llama-2 7b chat model along with its corresponding tokenizer from Hugging Face. The loaded model artifacts are stored in a local directory within SageMaker Studio, which enables you to swiftly reload them into memory whenever you need to resume your work at a later time.
import torch

from transformers import (
AutoTokenizer,
LlamaTokenizer,
LlamaForCausalLM,
GenerationConfig,
AutoModelForCausalLM
)
import transformers

tg_model_id = "meta-llama/Llama-2-7b-chat-hf" #the model id in Hugging Face
tg_model_path = f"./tg_model/{tg_model_id}" #the local directory where the model will be saved

tg_model = AutoModelForCausalLM.from_pretrained(tg_model_id, token=hf_access_token, do_sample=True, use_safetensors=True, device_map="auto", torch_dtype=torch.float16)
tg_tokenizer = AutoTokenizer.from_pretrained(tg_model_id, token=hf_access_token)

tg_model.save_pretrained(save_directory=tg_model_path, from_pt=True)
tg_tokenizer.save_pretrained(save_directory=tg_model_path, from_pt=True)
Ask a question that requires up-to-date information
You can now start using the model and ask questions. Llama-2 chat models expect the prompt to adhere to the following format:
<s>[INST] <<SYS>>
system_prompt
<</SYS>>
{{ user_message }} [/INST]
You can use the PromptTemplate from LangChain to create a recipe based on the prompt format, so that you can easily create prompts going forward:
from langchain import PromptTemplate

template = """<s>[INST] <<SYS>>\nYou are an assistant for question-answering tasks. You are helpful and friendly. Use the following pieces of retrieved context to answer the query. If you don't know the answer, you just say I don't know. Use three sentences maximum and keep the answer concise.
<</SYS>>\n
{context}\n
{question} [/INST]
"""
prompt_template = PromptTemplate( template=template, input_variables=['context','question'] )
Let’s ask the model a question that needs recent information from 2023. You can use LangChain and specifically the LLMChain type of chain and pass as parameters the LLM, the prompt template you created earlier, and the question:
question = "When can I visit the AWS M&E Customer Experience Center in New York City?"

tg_tokenizer.add_special_tokens( {"pad_token": "[PAD]"} )
tg_tokenizer.padding_side = "left"

tg_pipe = transformers.pipeline(task='text-generation', model=tg_model, tokenizer=tg_tokenizer, num_return_sequences=1, eos_token_id=tg_tokenizer.eos_token_id, pad_token_id=tg_tokenizer.eos_token_id, max_new_tokens=400, temperature=0.7)

from langchain.chains import LLMChain
from langchain.llms import HuggingFacePipeline

llm=HuggingFacePipeline(pipeline=tg_pipe, model_kwargs={'temperature':0.7})
llm_chain = LLMChain(llm=llm, prompt=prompt_template)
no_context_response = llm_chain.predict(context="", question=question)
print(no_context_response)
We get the following generated answer:

Thank you for reaching out! The AWS M&E Customer Experience Center in New York City is currently closed for visits due to the COVID-19 pandemic. However, you can check their official website or social media accounts for any updates on when the center will reopen. In the meantime, you can explore their virtual tours and resources available online.

Improve the answer by adding context to the prompt
The answer we generated is not entirely true. Let’s see if we can improve it by providing some context. You can add an extract from the post AWS announces new M&E Customer Experience Center in New York, which includes updates on the topic from 2023:
context = """Media and entertainment (M&E) customers continue to face challenges in creating more content, more quickly, and distributing it to more endpoints than ever before in their quest to delight viewers globally. Amazon Web Services (AWS), along with AWS Partners, have showcased the rapid evolution of M&E solutions for years at industry events like the National Association of Broadcasters (NAB) Show and the International Broadcast Convention (IBC). Until now, AWS for M&E technology demonstrations were accessible in this way just a few weeks out of the year. Customers are more engaged than ever before; they want to have higher quality conversations regarding user experience and media tooling. These conversations are best supported by having an interconnected solution architecture for reference. Scheduling a visit of the M&E Customer Experience Center will be available starting November 13th, please send an email to AWS-MediaEnt-CXC@amazon.com."""
Use the LLMChain again and pass the preceding text as context:
context_response = llm_chain.predict(context=context, question=question)
print(context_response)
The new response answers the question with up-to-date information:

You can visit the AWS M&E Customer Experience Center in New York City starting from November 13th. Please send an email to AWS-MediaEnt-CXC@amazon.com to schedule a visit.

We have confirmed that adding the right context improves the model’s performance. Now you can focus your efforts on finding and adding the right context for the question asked; in other words, implement RAG.
Implement RAG question answering with BGE embeddings and Pinecone
At this juncture, you must decide on the sources of information to enhance the model’s knowledge. These sources could be internal webpages or documents within your organization, or publicly available data sources. For the purposes of this post and for the sake of simplicity, we have chosen two AWS Blog posts published in 2023:

AWS announces new M&E Customer Experience Center in New York
AWS Media Services awarded industry accolades

These posts are already available as PDF documents in the data directory of the project in SageMaker Studio for quick access. To divide the documents into manageable chunks, you can employ the RecursiveCharacterTextSplitter method from LangChain:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFDirectoryLoader

loader = PyPDFDirectoryLoader(“./data/”)

documents = loader.load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=5
)
docs = text_splitter.split_documents(documents)
Next, use the BGE embeddings model bge-small-en created by the Beijing Academy of Artificial Intelligence (BAAI) that is available on Hugging Face to generate the embeddings of these chunks. Download and save the model in the local directory in Studio. We use fp32 so that it can run on the instance’s CPU.
em_model_name = "BAAI/bge-small-en"
em_model_path = f"./em-model"

from transformers import AutoModel
# Load model from HuggingFace Hub
em_model = AutoModel.from_pretrained(em_model_name, torch_dtype=torch.float32)
em_tokenizer = AutoTokenizer.from_pretrained(em_model_name, device="cuda")

# save model to disk
em_tokenizer.save_pretrained(save_directory=f"{em_model_path}/model", from_pt=True)
em_model.save_pretrained(save_directory=f"{em_model_path}/model", from_pt=True)
em_model.eval()
Use the following code to create an embedding_generator function, which takes the document chunks as input and generates the embeddings using the BGE model:
# Tokenize sentences
def tokenize_text(_input, device):
    return em_tokenizer(
        [_input],
        padding=True,
        truncation=True,
        return_tensors='pt'
    ).to(device)

# Run embedding task as a function with model and text sentences as input
def embedding_generator(_input, normalize=True):
    # Compute token embeddings
    with torch.no_grad():
        embedded_output = em_model(
            **tokenize_text(
                _input,
                em_model.device
            )
        )
    sentence_embeddings = embedded_output[0][:, 0]
    # normalize embeddings
    if normalize:
        sentence_embeddings = torch.nn.functional.normalize(
            sentence_embeddings,
            p=2,
            dim=1
        )

    return sentence_embeddings[0, :].tolist()

sample_sentence_embedding = embedding_generator(docs[0].page_content)
print(f"Embedding size of the document -->", len(sample_sentence_embedding))
In this post, we demonstrate a RAG workflow using Pinecone, a managed, cloud-native vector database that also offers an API for similarity search. You are free to rewrite the following code to use your preferred vector database.
We initialize a Pinecone python client and create a new vector search index using the embedding model’s output length. We use LangChain’s built-in Pinecone class to ingest the embeddings we created in the previous step. It needs three parameters: the documents to ingest, the embeddings generator function, and the name of the Pinecone index.
import os
import pinecone

pinecone.init(
    api_key = os.environ["PINECONE_API_KEY"],
    environment = os.environ["PINECONE_ENV"]
)

#check if index already exists, if not we create it
index_name = "rag-index"
if index_name not in pinecone.list_indexes():
    pinecone.create_index(
        name=index_name,
        dimension=len(sample_sentence_embedding), ## 384 for bge-small-en
        metric='cosine'
    )

#insert the embeddings
from langchain.vectorstores import Pinecone
vector_store = Pinecone.from_documents(
    docs,
    embedding_generator,
    index_name=index_name
)
With the Llama-2 7B chat model loaded into memory and the embeddings integrated into the Pinecone index, you can now combine these elements to enhance Llama 2’s responses for our question-answering use case. To achieve this, you can employ the LangChain RetrievalQA, which augments the initial prompt with the most similar documents from the vector store. By setting return_source_documents=True, you gain visibility into the exact documents used to generate the answer as part of the response, allowing you to verify the accuracy of the answer.
from langchain.chains import RetrievalQA
import textwrap

#helper method to improve the readability of the response
def print_response(llm_response):
    temp = [textwrap.fill(line, width=100) for line in llm_response['result'].split('\n')]
    response = '\n'.join(temp)
    print(f"{llm_response['query']}\n \n{response}\n \n Source Documents:")
    for source in llm_response["source_documents"]:
        print(source.metadata)

llm_qa_chain = RetrievalQA.from_chain_type(
    llm=llm, #the Llama-2 7b chat model
    chain_type='stuff',
    retriever=vector_store.as_retriever(search_kwargs={"k": 2}), # perform similarity search in Pinecone
    return_source_documents=True, #show the documents that were used to answer the question
    chain_type_kwargs={"prompt": prompt_template}
)
print_response(llm_qa_chain(question))
We get the following answer:

Q: When can I visit the AWS M&E Customer Experience Center in New York City?
A: I’m happy to help! According to the context, the AWS M&E Customer Experience Center in New York City will be available for visits starting on November 13th. You can send an email to AWS-MediaEnt-CXC@amazon.com to schedule a visit.
Source Documents:
{'page': 4.0, 'source': 'data/AWS announces new M&E Customer Experience Center in New York City _ AWS for M&E Blog.pdf'}
{'page': 2.0, 'source': 'data/AWS announces new M&E Customer Experience Center in New York City _ AWS for M&E Blog.pdf'}

Let’s try a different question:
question2 = "How many awards have AWS Media Services won in 2023?"
print_response(llm_qa_chain(question2))
We get the following answer:

Q: How many awards have AWS Media Services won in 2023?
A: According to the blog post, AWS Media Services have won five industry awards in 2023.
Source Documents:
{'page': 0.0, 'source': 'data/AWS Media Services awarded industry accolades _ AWS for M&E Blog.pdf'}
{'page': 1.0, 'source': 'data/AWS Media Services awarded industry accolades _ AWS for M&E Blog.pdf'}

After you have established a sufficient level of confidence, you can deploy the models to SageMaker endpoints for real-time inference. These endpoints are fully managed and offer support for auto scaling.
SageMaker offers large model inference using Large Model Inference containers (LMIs), which we can utilize to deploy our models. These containers come equipped with pre-installed open source libraries like DeepSpeed, facilitating the implementation of performance-enhancing techniques such as tensor parallelism during inference. Additionally, they use DJLServing as a pre-built integrated model server. DJLServing is a high-performance, universal model-serving solution that offers support for dynamic batching and worker auto scaling, thereby increasing throughput.
In our approach, we use the SageMaker LMI with DJLServing and DeepSpeed Inference to deploy the Llama-2-chat 7b and BGE models to SageMaker endpoints running on ml.g5.2xlarge instances, enabling real-time inference. If you want to follow these steps yourself, refer to the accompanying notebook for detailed instructions.
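The accompanying notebook contains the exact deployment code. Purely as a hedged sketch of the approach, deploying a packaged model with an LMI (DJL-DeepSpeed) container through the SageMaker Python SDK looks roughly like the following; the container version, S3 model artifact path, and endpoint name are placeholders, and the model.tar.gz is assumed to bundle the model weights with a serving.properties file. The variable name tg_sm_model matches the deployed text generation model referenced in the code that follows.
import sagemaker
from sagemaker import image_uris
from sagemaker.model import Model

session = sagemaker.Session()
role = sagemaker.get_execution_role()

# Retrieve a Large Model Inference (DJL-DeepSpeed) container image for the current Region
lmi_image_uri = image_uris.retrieve(
    framework="djl-deepspeed",
    region=session.boto_region_name,
    version="0.23.0",  # placeholder version; check the notebook for the one actually used
)

tg_sm_model = Model(
    image_uri=lmi_image_uri,
    model_data="s3://<your-bucket>/llama-2-7b-chat/model.tar.gz",  # placeholder artifact path
    role=role,
    sagemaker_session=session,
)

tg_sm_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    endpoint_name="llama-2-7b-chat-endpoint",  # placeholder endpoint name
)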
You will require two ml.g5.2xlarge instances for deployment. To review or increase your quota, open the AWS Service Quotas console, choose AWS Services in the navigation pane, choose Amazon SageMaker, and refer to the value for ml.g5.2xlarge for endpoint usage.

The following steps outline the process of deploying custom models for the RAG workflow on a SageMaker endpoint:

Deploy the Llama-2 7b chat model to a SageMaker real-time endpoint running on an ml.g5.2xlarge instance for fast text generation.
Deploy the BAAI/bge-small-en-v1.5 embeddings model to a SageMaker real-time endpoint running on an ml.g5.2xlarge instance. Alternatively, you can deploy your own embedding model.
Ask a question and use the LangChain RetrievalQA to augment the prompt with the most similar documents from Pinecone, this time using the model deployed in the SageMaker real-time endpoint:

from langchain.llms import SagemakerEndpoint

# convert your local LLM into SageMaker endpoint LLM
llm_sm_ep = SagemakerEndpoint(
    endpoint_name=tg_sm_model.endpoint_name, # <-- Your text-gen model endpoint name
    region_name=region,
    model_kwargs={
        "temperature": 0.05,
        "max_new_tokens": 512
    },
    content_handler=content_handler,
)

llm_qa_smep_chain = RetrievalQA.from_chain_type(
    llm=llm_sm_ep, # <-- This uses SageMaker Endpoint model for inference
    chain_type='stuff',
    retriever=vector_store.as_retriever(search_kwargs={"k": 2}),
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt_template}
)

Verify that the SageMaker endpoint hosting the embedding model works as expected so that it can be used for future document ingestion:

import json
import boto3

smr_client = boto3.client("sagemaker-runtime") # SageMaker runtime client used to invoke the endpoint

response_model = smr_client.invoke_endpoint(
    EndpointName=em_sm_model.endpoint_name, # <-- Your embedding model endpoint name
    Body=json.dumps({
        "text": "This is a sample text"
    }),
    ContentType="application/json",
)

outputs = json.loads(response_model["Body"].read().decode("utf8"))['outputs']
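If you want to call the embedding endpoint through LangChain, for example to use it as the embedding function for future document ingestion, a minimal sketch wrapping it with LangChain's SagemakerEndpointEmbeddings class could look like the following. The JSON request and response keys ("text" and "outputs") mirror the invocation above; the content handler is an assumption you should adapt to your model server's actual contract.
import json
from typing import Dict, List

from langchain.embeddings import SagemakerEndpointEmbeddings
from langchain.embeddings.sagemaker_endpoint import EmbeddingsContentHandler

class BGEContentHandler(EmbeddingsContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, inputs: List[str], model_kwargs: Dict) -> bytes:
        # The endpoint above expects {"text": ...}; adapt this if your contract differs
        return json.dumps({"text": inputs[0]}).encode("utf-8")

    def transform_output(self, output) -> List[List[float]]:
        response = json.loads(output.read().decode("utf-8"))
        return [response["outputs"]]

em_embeddings = SagemakerEndpointEmbeddings(
    endpoint_name=em_sm_model.endpoint_name,  # the embedding endpoint deployed earlier
    region_name=region,
    content_handler=BGEContentHandler(),
)

print(len(em_embeddings.embed_query("This is a sample text")))  # should match the index dimension (384)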
Clean up
Complete the following steps to clean up your resources:

When you have finished working in your SageMaker Studio notebook, make sure you shut down the ml.g5.2xlarge instance to avoid any charges by choosing the stop icon. You can also set up lifecycle configuration scripts to automatically shut down resources when they are not used.

If you deployed the models to SageMaker endpoints, run the following code at the end of the notebook to delete the endpoints:

#delete your text generation endpoint
sm_client.delete_endpoint(
EndpointName=tg_sm_model.endpoint_name
)
# delete your text embedding endpoint
sm_client.delete_endpoint(
EndpointName=em_sm_model.endpoint_name
)

Finally, run the following line to delete the Pinecone index:

pinecone.delete_index(index_name)
Conclusion
SageMaker notebooks provide a straightforward way to kickstart your journey with Retrieval Augmented Generation. They allow you to experiment interactively with various models, configurations, and questions without spinning up additional infrastructure. In this post, we showed how to enhance the performance of Llama 2 7b chat in a question answering use case using LangChain, the BGE embeddings model, and Pinecone. To get started, launch SageMaker Studio and run the notebook available in the following GitHub repo. Please share your thoughts in the comments section!

About the authors
Anastasia Tzeveleka is a Machine Learning and AI Specialist Solutions Architect at AWS. She works with customers in EMEA and helps them architect machine learning solutions at scale using AWS services. She has worked on projects in different domains including Natural Language Processing (NLP), MLOps and Low Code No Code tools.
Pranav Murthy is an AI/ML Specialist Solutions Architect at AWS. He focuses on helping customers build, train, deploy and migrate machine learning (ML) workloads to SageMaker. He previously worked in the semiconductor industry developing large computer vision (CV) and natural language processing (NLP) models to improve semiconductor processes. In his free time, he enjoys playing chess and traveling.

KT’s journey to reduce training time for a vision transformers model …

KT Corporation is one of the largest telecommunications providers in South Korea, offering a wide range of services including fixed-line telephone, mobile communication, internet, and AI services. KT’s AI Food Tag is an AI-based dietary management solution that identifies the type and nutritional content of food in photos using a computer vision model. This vision model developed by KT relies on a model pre-trained with a large amount of unlabeled image data to analyze the nutritional content and calorie information of various foods. The AI Food Tag can help patients with chronic diseases such as diabetes manage their diets. KT used AWS and Amazon SageMaker to train this AI Food Tag model 29 times faster than before and optimize it for production deployment with a model distillation technique. In this post, we describe KT’s model development journey and success using SageMaker.
Introducing the KT project and defining the problem
The AI Food Tag model pre-trained by KT is based on the vision transformers (ViT) architecture and has more model parameters than their previous vision model to improve accuracy. To shrink the model size for production, KT is using a knowledge distillation (KD) technique to reduce the number of model parameters without a significant impact on accuracy. With knowledge distillation, the pre-trained model is called a teacher model, and a lightweight output model is trained as a student model, as illustrated in the following figure. The lightweight student model has fewer model parameters than the teacher, which reduces memory requirements and allows for deployment on smaller, less expensive instances. The student maintains acceptable accuracy, even though it’s smaller, by learning from the outputs of the teacher model.

The teacher model remains unchanged during KD, but the student model is trained using the output logits of the teacher model as labels to calculate loss. With this KD paradigm, both the teacher and the student need to fit into a single GPU’s memory for training. KT initially used two GPUs (A100 80 GB) in their internal, on-premises environment to train the student model, but the process took about 40 days to cover 300 epochs. To accelerate training and generate a student model in less time, KT partnered with AWS. Together, the teams significantly reduced model training time. This post describes how the team used Amazon SageMaker Training, the SageMaker Data Parallelism Library, Amazon SageMaker Debugger, and Amazon SageMaker Profiler to successfully develop a lightweight AI Food Tag model.
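This post doesn't show KT's actual loss function. Purely as an illustration of logit-based distillation, a common PyTorch formulation uses a temperature-scaled KL divergence between the teacher's and student's softened logits; the random tensors below stand in for real model outputs:
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions, then pull the student toward the teacher with KL divergence
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, soft_targets, reduction="batchmean") * temperature**2

# Toy example with random logits standing in for teacher and student outputs
teacher_logits = torch.randn(8, 100)                       # teacher is frozen, no gradient needed
student_logits = torch.randn(8, 100, requires_grad=True)   # only the student is updated
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()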
Building a distributed training environment with SageMaker
SageMaker Training is a managed machine learning (ML) training environment on AWS that provides a suite of features and tools to simplify the training experience and can be useful in distributed computing, as illustrated in the following diagram.

SageMaker customers can also access built-in Docker images with various pre-installed deep learning frameworks and the necessary Linux, NCCL, and Python packages for model training. Data scientists or ML engineers who want to run model training can do so without the burden of configuring training infrastructure or managing Docker and the compatibility of different libraries.
During a 1-day workshop, we were able to set up a distributed training configuration based on SageMaker within KT’s AWS account, accelerate KT’s training scripts using the SageMaker Distributed Data Parallel (DDP) library, and even test a training job using two ml.p4d.24xlarge instances. In this section, we describe KT’s experience working with the AWS team and using SageMaker to develop their model.
In the proof of concept, we wanted to speed up a training job by using the SageMaker DDP library, which is optimized for AWS infrastructure during distributed training. To change from PyTorch DDP to SageMaker DDP, you simply need to declare the torch_smddp package and change the backend to smddp, as shown in the following code:

import smdistributed.dataparallel.torch.torch_smddp

dist.init_process_group(backend='smddp',
                        rank=args.rank,
                        world_size=args.world_size)

To learn more about the SageMaker DDP library, refer to SageMaker’s Data Parallelism Library.
Analyzing the causes of slow training speed with the SageMaker Debugger and Profiler
The first step in optimizing and accelerating a training workload involves understanding and diagnosing where bottlenecks occur. For KT’s training job, we measured the training time per iteration of the data loader, forward pass, and backward pass:

1 iter time - dataloader : 0.00053 sec, forward : 7.77474 sec, backward: 1.58002 sec
2 iter time - dataloader : 0.00063 sec, forward : 0.67429 sec, backward: 24.74539 sec
3 iter time - dataloader : 0.00061 sec, forward : 0.90976 sec, backward: 8.31253 sec
4 iter time - dataloader : 0.00060 sec, forward : 0.60958 sec, backward: 30.93830 sec
5 iter time - dataloader : 0.00080 sec, forward : 0.83237 sec, backward: 8.41030 sec
6 iter time - dataloader : 0.00067 sec, forward : 0.75715 sec, backward: 29.88415 sec
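These per-phase numbers came from simple wall-clock instrumentation around each part of the training step. A minimal sketch of that kind of measurement is shown below; train_loader, student_model, criterion, optimizer, and device are placeholders for objects defined in the training script.
import time
import torch

data_end = time.time()
for step, (images, labels) in enumerate(train_loader, start=1):   # train_loader: placeholder
    t_data = time.time() - data_end             # time spent waiting on the data loader

    start = time.time()
    outputs = student_model(images.to(device))  # student_model, device: placeholders
    loss = criterion(outputs, labels.to(device))
    torch.cuda.synchronize()                    # wait for GPU work so the timing is meaningful
    t_forward = time.time() - start

    start = time.time()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    torch.cuda.synchronize()
    t_backward = time.time() - start

    print(f"{step} iter time - dataloader : {t_data:.5f} sec, "
          f"forward : {t_forward:.5f} sec, backward: {t_backward:.5f} sec")
    data_end = time.time()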
Looking at the time in the standard output for each iteration, we saw that the backward pass’s run time fluctuated significantly from iteration to iteration. This variation is unusual and can impact total training time. To find the cause of this inconsistent training speed, we first tried to identify resource bottlenecks by utilizing the System Monitor (SageMaker Debugger UI), which allows you to debug training jobs on SageMaker Training and view the status of resources such as CPU, GPU, network, and I/O on the managed training instances over configurable time intervals.
The SageMaker Debugger UI provides detailed and essential data that can help identify and diagnose bottlenecks in a training job. Specifically, the CPU utilization line chart and the per-instance CPU/GPU utilization heat maps caught our eye.
In the CPU utilization line chart, we noticed that some CPU cores were being used at 100%.

In the heat map (where darker colors indicate higher utilization), we noted that a few CPU cores had high utilization throughout the training, whereas GPU utilization wasn’t consistently high over time.

From here, we began to suspect that one of the reasons for the slow training speed was a CPU bottleneck. We reviewed the training script code to see if anything was causing the CPU bottleneck. The most suspicious part was the large value of num_workers in the data loader, so we changed this value to 0 or 1 to reduce CPU utilization. We then ran the training job again and checked the results.
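For reference, num_workers is an argument of the PyTorch DataLoader, so the change amounts to something like the following; the dataset and batch size shown are placeholders.
from torch.utils.data import DataLoader

# A large num_workers spawns many data-loading processes; dropping it to 0 or 1
# reduces the CPU pressure observed in the Debugger charts
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=1)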
The following screenshots show the CPU utilization line chart, GPU utilization, and heat map after mitigating the CPU bottleneck.

By simply changing num_workers, we saw a significant decrease in CPU utilization and an overall increase in GPU utilization. This was an important change that improved training speed significantly. Still, we wanted to see where we could optimize GPU utilization. For this, we used SageMaker Profiler.
SageMaker Profiler helps identify optimization clues by providing visibility into utilization by operations, including tracking GPU and CPU utilization metrics and kernel consumption of GPU/CPU within training scripts. It helps users understand which operations are consuming resources. First, to use SageMaker Profiler, you need to add ProfilerConfig to the function that invokes the training job using the SageMaker SDK, as shown in the following code:

from sagemaker import ProfilerConfig, Profiler
from sagemaker.debugger import (ProfilerRule, rule_configs)

rules = [ProfilerRule.sagemaker(rule_configs.ProfilerReport())]
profiler_config = ProfilerConfig(profile_params = Profiler(cpu_profiling_duration=3600))

from sagemaker.pytorch import PyTorch

region_name = 'us-west-2'
image_uri = f'763104351884.dkr.ecr.{region_name}.amazonaws.com/pytorch-training:2.0.0-gpu-py310-cu118-ubuntu20.04-sagemaker'

estimator = PyTorch(
    entry_point='train.py',
    source_dir='src',
    role=role,
    image_uri=image_uri,
    instance_count=4,
    instance_type='ml.p4d.24xlarge',
    distribution={'smdistributed': {'dataparallel': {'enabled': True}}},
    profiler_config=profiler_config,
    hyperparameters=hyperparameters,
    sagemaker_session=sagemaker_session,
)

In the SageMaker Python SDK, you have the flexibility to add the annotate functions for SageMaker Profiler to select code or steps in the training script that need profiling. The following is an example of the code that you should declare for SageMaker Profiler in the training scripts:

import smppy

SMProf = smppy.SMProfiler.instance()
config = smppy.Config()
config.profiler = {
    "EnableCuda": "1",
}
SMProf.configure(config)
SMProf.start_profiling()

with smppy.annotate("Forward"):
    student_out = student_model(inp)
with smppy.annotate("Backward"):
    loss.backward()

SMProf.stop_profiling()

After adding the preceding code, if you run a training job using the training scripts, you can get information about the time consumed by GPU kernel operations (as shown in the following figure) after the training runs for a period of time. In the case of KT’s training scripts, we ran it for one epoch and got the following results.

When we checked the five GPU kernel operations with the highest consumption times in the SageMaker Profiler results, we found that for the KT training script, the most time is consumed by the matrix product operation, which is a general matrix multiplication (GEMM) operation on GPUs. With this important insight from the SageMaker Profiler, we began investigating ways to accelerate these operations and improve GPU utilization.
Speeding up training time
We reviewed various ways to reduce computation time of matrix multiplication and applied two PyTorch functions.
Shard optimizer states with ZeroRedundancyOptimizer
The Zero Redundancy Optimizer (ZeRO) technique, used in DeepSpeed, enables efficient training of large models with better training speed by eliminating the redundancies in memory used by the model. ZeroRedundancyOptimizer in PyTorch uses this technique of sharding the optimizer state to reduce memory usage per process in Distributed Data Parallel (DDP) training. DDP synchronizes gradients in the backward pass so that all optimizer replicas iterate over the same parameters and gradient values, but instead of each process holding the full optimizer state for all model parameters, the state is sharded across the DDP processes so that each process maintains only its own shard, reducing memory usage.
To use it, keep your existing optimizer by passing its class as optimizer_class, and construct a ZeroRedundancyOptimizer with the model parameters and the learning rate as its other arguments.

from torch.distributed.optim import ZeroRedundancyOptimizer

student_optimizer = ZeroRedundancyOptimizer(
    student_model.parameters(),
    optimizer_class=torch.optim.AdamW,
    lr=initial_lr
)

Automatic mixed precision
Automatic mixed precision (AMP) uses the torch.float32 data type for some operations and torch.bfloat16 or torch.float16 for others, to speed up computation and reduce memory usage. Because deep learning models are typically more sensitive to exponent bits than to fraction bits in their computations, and torch.bfloat16 has the same number of exponent bits as torch.float32, models can learn quickly with minimal loss of accuracy. torch.bfloat16 is only supported on GPUs with the NVIDIA Ampere architecture (A100) or newer, which are available on instances such as ml.p4d.24xlarge, ml.p4de.24xlarge, and ml.p5.48xlarge.
To apply AMP, declare torch.cuda.amp.autocast in the training scripts as shown in the following code and set dtype to torch.bfloat16.

with torch.cuda.amp.autocast(dtype=torch.bfloat16):
    teacher = teacher_model(input_data)
    student = student_model(input_data)
    loss = loss(teacher, student, target)
    loss.requires_grad_(True)

loss.backward()
student_optimizer.step()
student_optimizer.zero_grad(set_to_none=True)

Results in SageMaker Profiler
After applying the two functions to the training scripts and running a training job for one epoch again, we checked the top five GPU kernel operations by consumption time in SageMaker Profiler. The following figure shows our results.

We can see that the GEMM operation, which was at the top of the list before applying the two Torch functions, has disappeared from the top five operations, replaced by the ReduceScatter operation, which typically occurs in distributed training.
Training speed results of the KT distilled model
We increased the training batch size by 128 to take advantage of the memory savings from applying the two Torch functions, resulting in a final batch size of 1,152 instead of 1,024. The training of the final student model was able to run 210 epochs in 1 day; the training time and speedup between KT’s internal training environment and SageMaker are summarized in the following table.

Training Environment | Training GPU Spec. | Number of GPUs | Training Time (hours) | Epochs | Hours per Epoch | Reduction Ratio
KT’s internal training environment | A100 (80 GB) | 2 | 960 | 300 | 3.20 | 29
Amazon SageMaker | A100 (40 GB) | 32 | 24 | 210 | 0.11 | 1

The scalability of AWS allowed us to complete the training job 29 times faster than before by using 32 GPUs instead of the 2 used on premises. As a result, using more GPUs on SageMaker significantly reduced training time at a comparable overall training cost.
Conclusion
Park Sang-min (Vision AI Serving Technology Team Leader) from the AI2XL Lab in KT’s Convergence Technology Center commented on the collaboration with AWS to develop the AI Food Tag model:

“Recently, as there are more transformer-based models in the vision field, the model parameters and required GPU memory are increasing. We are using lightweight technology to solve this issue, and it takes a lot of time, about a month to learn once. Through this PoC with AWS, we were able to identify the resource bottlenecks with help of SageMaker Profiler and Debugger, resolve them, and then use SageMaker’s data parallelism library to complete the training in about one day with optimized model code on four ml.p4d.24xlarge instances.”

SageMaker helped save Sang-min’s team weeks of time in model training and development.
Based on this collaboration on the vision model, AWS and the SageMaker team will continue to collaborate with KT on various AI/ML research projects to improve model development and service productivity through applying SageMaker capabilities.
To learn more about related features in SageMaker, check out the following:

Train machine learning models
SageMaker’s Data Parallelism Library
Use Amazon SageMaker Debugger to debug and improve model performance
Use Amazon SageMaker Profiler to profile activities on AWS compute resources

About the authors
Youngjoon Choi, AI/ML Expert SA, has experienced enterprise IT in various industries such as manufacturing, high-tech, and finance as a developer, architect, and data scientist. He conducted research on machine learning and deep learning, specifically on topics like hyperparameter optimization and domain adaptation, presenting algorithms and papers. At AWS, he specializes in AI/ML across industries, providing technical validation using AWS services for distributed training/large scale models and building MLOps. He proposes and reviews architectures, aiming to contribute to the expansion of the AI/ML ecosystem.
Jung Hoon Kim is an account SA of AWS Korea. Based on experiences in applications architecture design, development and systems modeling in various industries such as hi-tech, manufacturing, finance and public sector, he is working on AWS Cloud journey and workloads optimization on AWS for enterprise customers.
Rock Sakong is a researcher at KT R&D. He has conducted research and development for vision AI in various fields, mainly facial attributes (gender, glasses, hats, and so on) and face recognition technology. Currently, he is working on lightweight technology for vision models.
Manoj Ravi is a Senior Product Manager for Amazon SageMaker. He is passionate about building next-gen AI products and works on software and tools to make large-scale machine learning easier for customers. He holds an MBA from Haas School of Business and a Masters in Information Systems Management from Carnegie Mellon University. In his spare time, Manoj enjoys playing tennis and pursuing landscape photography.
Robert Van Dusen is a Senior Product Manager with Amazon SageMaker. He leads frameworks, compilers, and optimization techniques for deep learning training.

25 Black Friday Cyber Monday Tips for 2023 and Beyond

With only two weeks until the Black Friday Cyber Monday (BFCM) showdown, ecommerce experts, including our founder and CEO, Larry Kim, gathered for a crash course in savvy last-minute strategies. 

Hosted by our friends at Justuno, eight industry pros spilled the tea on CRO, reviews, loyalty, and more. Each expert dished out their standout BFCM tips, turning this AMA into a goldmine for businesses hustling for success.

Dive into the recap, catch the highlights, and gear up for ecommerce greatness!

Tip 1: Expand Email Captures with Website ID Pixel

Expand your email captures by identifying anonymous website visitors. Install the Customers.ai Website Visitor ID X-Ray pixel on your site to gather contact information from visitors and capture 15% to 20% of website visitors’ contact details. To manage the influx of leads, implement filters based on factors like scroll depth, time on site, or engagement with specific pages.

Tip 2: Recover Retargeting Audience Reach

Recent iOS 14 and Google Chrome cookie policy changes have resulted in new retargeting challenges. To overcome these limitations and expand your retargeting audiences, leverage customer IDs gathered from website visits. Extracting additional information, such as phone numbers and secondary emails, can enhance the targeting capabilities and potentially double or triple the size of remarketing audiences.

Tip 3: Maximize Revenue Recovery from Abandoned Carts

Leverage the customer ID data and synchronize with platforms like Shopify or Klaviyo to enhance cart abandonment sequences. By sending more leads into these sequences, including those who didn’t log in or add items to their cart, you can maximize revenue recovery efforts.

Tip 4: (Bonus Tip!)

Install the Website Visitor ID X-Ray Pixel on your site and start identifying who your visitors are. 

To install the Website Visitor ID X-Ray Pixel, sign up (for FREE!), go to your dashboard, and navigate to My Automations. 

Select + New Automation and get your pixel. We have easy install options for Google Tag Manager, WordPress, and Shopify, or you can install the pixel manually.

Back to the recap and all of the great Black Friday Cyber Monday tips from our presenters.

SJ Carcamo, Partner Marketing Manager at Justuno

Tip 5: Curate Experiences with Source Data Precision

Understanding your audience begins with identified and anonymous visitors. Whether they’re in your email list or entirely new, tailor experiences based on source data. Pinpoint your top traffic sources and invest where it matters. Focus on key touchpoints such as welcome pop-ups, exit intents, cart abandonment, banners, and post-purchase prompts. 

Tip 6: Incorporate Last Touch Scenarios with Exit and Cart Abandonment Popups

Incorporate a last-touch scenario using behavioral triggers to deliver the right message at the right time. Plus, with behavioral segmentation, you already have the creative. Use existing creative from other channels and adapt them for onsite pop-ups. 

Tip 7: Free Gifts and Redemption Offers

Engage your customers at the pinnacle of their intent—add to cart and post-purchase. Offer a free gift or redemption. It’s not just about discounts; it’s a strategic move to increase average order value (AOV) and foster repeat purchases.

Andrew Christison, Co-Founder Retencity

Tip 8: Create a Detailed Segment By Segment Email Strategy

Sending to your entire list during this high-traffic period can lead to deliverability issues. Don’t do it. Tailor your segments to ensure relevance, steering clear of the risk of being blacklisted. 

Tip 9: Leverage Email Intimacy Effectively

Lean into the unique intimacy of email. People subscribe to fewer brands on email than on other channels. Coordinate your email and SMS strategies to create a cohesive experience and connect with your audience during the festive season.

Tip 10: Align the Lifecycle Customer Experience

Maintain consistency across all customer touchpoints. Ensure that pop-ups, automations, and emails align with the brand’s Black Friday and Cyber Monday promotions. Have backup assignments and email layouts ready in case of out-of-stock situations to avoid last-minute panic.

Stephanie Charon, Senior Customer Success Manager at Postscript

Tip 11: Craft Irresistible Black Friday Specials

Start by addressing your Black Friday specials strategically. Avoid conflicting offers that may lead customers to double-stack coupons. Inject a sense of urgency into your cart abandonment messages, emphasizing that this is the best sale of the year. 

Tip 12: Strategic SMS Reminders for Maximum Impact

Send strategic reminders. A staggering 80% of SMS-attributed orders occur within the first 20 minutes of receiving a message. Don’t underestimate the power of same-day follow-ups.

Tip 13: Ensure Maximum Deliverability

When sending multiple-day campaigns, include an opt-out option to maintain compliance and build trust with carriers. Schedule messages off the hour and prioritize SMS over MMS during this intense period.

Virgil Ghic, Co-Founder WeSupply

Tip 14: Proactive Shipment Monitoring and Notification

The key is to control the controllables and maintain customer trust, and proactive shipment monitoring is a must. By automatically notifying customers about delays and offering compensation or discounts strategically, you can turn a potentially negative experience into a positive one.

Tip 15: Elevate Customer Experience with Personalized Tracking

Elevate the customer experience by sending personalized emails with embedded tracking information. This approach ensures customers are informed about delays, out-for-delivery status, and delivery, all through SMS or integrated scripts. 

Tip 16: Turn Exchanges into Upselling Opportunities

Exchanges aren’t just about replacing products; they’re an opportunity to upsell. Converting a return into an exchange, especially for a more expensive item, can lead to additional revenue.

Erik Swanson, Director of Customer Success at Refersion

Tip 17: Bulletproof Tracking and Testing

Nothing dampens a Black Friday campaign like tracking issues. Beyond adopting a reliable tracking solution, ensure every link, landing page, and checkout flow is meticulously tested. 

Tip 18: Incentivize Effectively

Standing out in the affiliate and influencer crowd requires proactive communication and compelling incentives, such as gamifying campaigns. A gamified approach, coupled with unique rewards, sets your brand apart. 

Tip 19: Make it Fun and Valuable for Affiliates

Affiliates are bombarded with promotions during Black Friday. Make it fun and valuable and think outside the box. Create exclusive bundles, limited-time offers, or unique products available only through affiliate links. The goal is to make the promotion enticing for both affiliates and customers.

Oren Charnoff, Co-Founder & CEO Fondue

Tip 20: Ditch Default Coupons for Exotic Discounts

Rather than defaulting to conventional discounts, get creative with exotic and diverse discount types. From cashback to BOGO offers and free gifts, an engaging discount strategy can combat coupon blindness.

Tip 21: Resist the “FML” Mindset

Don’t succumb to the “FML” (Forget My List) mindset during the Black Friday rush. Instead, use this as an opportune moment to leverage the increased traffic and convert first-time shoppers into long-term customers. 

Tip 22: Make Discounts an Ongoing Engagement, Not a Holiday Fling

Don’t use discounts solely for first-time purchases. Instead, utilize discounts that encourage ongoing engagement, ensuring customers return for more.

Teala Beischer, Product Marketing Manager at Stamped

Tip 23: Elevate Customer Reviews with Context

Contextualized reviews enhance the customer experience. Whether it’s apparel, wellness, or any product, incorporate photo reviews. Ensure that the widget showcasing photo reviews on product detail pages (PDPs) is enabled and optimized for Black Friday.

Tip 24: Tap into Your Existing Audience for Loyalty

Explore the untapped potential within your existing customer base. Leverage loyalty programs and VIP initiatives for existing customers or create exclusive pre-sale campaigns for VIP customers. 

Tip 25: Maximize Post-Purchase Strategies

Offer special rewards, such as bonus points, to customers making purchases during the holiday season. This not only encourages loyalty program enrollment but also sets the stage for collecting valuable reviews from this newly engaged segment.

Let’s “Wrap it Up”

And that’s a wrap on our BFCM AMA adventure! From engaging CRO tips to the secrets of digital gifting experiences, we’ve unlocked a treasure trove of wisdom for the ecommerce world. 

As the BFCM weekend approaches, let these actionable takeaways guide your journey to success. Cheers to the thriving ecommerce community and the boundless possibilities ahead!


This AI Paper from MIT Introduces a Novel Approach to Robotic Manipula …

A team of researchers from MIT and the Institute of AI and Fundamental Interactions (IAIFI) has introduced a groundbreaking framework for robotic manipulation, addressing the challenge of enabling robots to understand and manipulate objects in unpredictable and cluttered environments. The problem at hand is the need for robots to have a detailed understanding of 3D geometry, which is often lacking in 2D image features.

Currently, many robotic tasks require both spatial and semantic understanding. For instance, a warehouse robot may need to pick up an item from a cluttered storage bin based on a text description in a product manifest. This necessitates the ability to grasp objects with stable affordances based on both their geometric properties and semantic attributes.

To bridge this gap between 2D image features and 3D geometry, the researchers developed a framework called Feature Fields for Robotic Manipulation (F3RM). This approach leverages distilled feature fields, combining accurate 3D geometry with rich semantics from 2D foundation models. The key idea is to use pre-trained vision and vision-language models to extract features and distill them into 3D feature fields.

The F3RM framework involves three main components: feature field distillation, representing 6-DOF poses with feature fields, and open-text language guidance. Distilled Feature Fields (DFFs) extend the concept of Neural Radiance Fields (NeRF) by including an additional output to reconstruct dense 2D features from a vision model, which allows the model to map a 3D position to a feature vector, incorporating both spatial and semantic information.

For pose representation, the researchers use a set of query points in the gripper’s coordinate frame, which are sampled from a 3D Gaussian. These points are transformed into the world frame, and the features are weighted based on the local geometry. The resulting feature vectors are concatenated into a representation of the pose.
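The paper's implementation is not reproduced here. Purely as a loose illustration of the idea just described, the sketch below samples query points from a Gaussian in the gripper frame, transforms them into the world frame, weights the queried features by local geometry, and concatenates them; all names, shapes, and the toy feature field are hypothetical stand-ins.
import torch

def pose_features(rotation, translation, feature_field, query_points):
    # query_points: (N, 3) points sampled once from a 3D Gaussian in the gripper frame
    world_points = query_points @ rotation.T + translation      # transform gripper -> world frame
    features, density = feature_field(world_points)             # per-point feature vector and density
    weights = torch.softmax(density, dim=0)                     # weight features by local geometry
    return (features * weights.unsqueeze(-1)).reshape(-1)       # concatenate into one pose descriptor

# Toy stand-in for a distilled feature field: random features and densities per query point
def toy_feature_field(points):
    return torch.randn(points.shape[0], 16), torch.rand(points.shape[0])

query_points = torch.randn(32, 3) * 0.05                        # rough 3D Gaussian around the gripper
descriptor = pose_features(torch.eye(3), torch.zeros(3), toy_feature_field, query_points)
print(descriptor.shape)                                          # torch.Size([512])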

The framework also includes the ability to incorporate open-text language commands for object manipulation. The robot receives natural language queries specifying the object to manipulate during testing. It then retrieves relevant demonstrations, initializes coarse grasps, and optimizes the grasp pose based on the provided language guidance.

In terms of results, the researchers conducted experiments on grasping and placing tasks, as well as language-guided manipulation. The system could reason about density, color, and distance between items. Experiments with cups, mugs, screwdriver handles, and caterpillar ears showed successful runs. The robot could generalize to objects that differ significantly in shape, appearance, materials, and poses. It also successfully responded to free-text natural language commands, even for new categories of objects not seen during demonstrations.

In conclusion, the F3RM framework offers a promising solution to the challenge of open-set generalization for robotic manipulation systems. By combining 2D visual priors with 3D geometry and incorporating natural language guidance, it paves the way for robots to handle complex tasks in diverse and cluttered environments. While there are still limitations, such as the time it takes to model each scene, the framework holds significant potential for advancing the field of robotics and automation.

Check out the Paper and Project. All credit for this research goes to the researchers of this project.
The post This AI Paper from MIT Introduces a Novel Approach to Robotic Manipulation: Bridging the 2D-to-3D Gap with Distilled Feature Fields and Vision-Language Models appeared first on MarkTechPost.

Zhejiang University Researchers Propose UrbanGIRAFFE to Tackle Control …

UrbanGIRAFFE, an approach proposed by researchers from Zhejiang University for photorealistic image synthesis, is introduced for controllable camera pose and scene contents. Addressing challenges in generating urban scenes for free camera viewpoint control and scene editing, the model employs a compositional and controllable strategy, utilizing a coarse 3D panoptic prior that includes the layout distribution of uncountable stuff and countable objects. The approach breaks down the scene into stuff, objects, and sky, facilitating diverse controllability, such as large camera movement, stuff editing, and object manipulation. 

In conditional image synthesis, prior methods have excelled, particularly those leveraging Generative Adversarial Networks (GANs) to generate photorealistic images. While existing approaches condition image synthesis on semantic segmentation maps or layouts, the focus has predominantly been on object-centric scenes, neglecting complex, unaligned urban scenes. UrbanGIRAFFE, a dedicated 3D-aware generative model for urban scenes, addresses these limitations, offering diverse controllability for large camera movements, stuff editing, and object manipulation.

GANs have proven effective in generating controllable and photorealistic images in conditional image synthesis. However, existing methods are limited to object-centric scenes and struggle with urban scenes, hindering free camera viewpoint control and scene editing. UrbanGIRAFFE breaks down scenes into stuff, objects, and sky, leveraging semantic voxel grids and object layouts as priors to enable diverse controllability, including significant camera movements and scene manipulations. 

UrbanGIRAFFE innovatively dissects urban scenes into uncountable stuff, countable objects, and the sky, employing prior distributions for stuff and things to untangle complex urban environments. The model features a conditioned stuff generator utilizing semantic voxel grids as stuff prior for integrating coarse semantic and geometry information. An object layout prior facilitates learning an object generator from cluttered scenes. Trained end-to-end with adversarial and reconstruction losses, the model leverages ray-voxel and ray-box intersection strategies to optimize sampling locations, reducing the number of required sampling points. 

In a comprehensive evaluation, the proposed UrbanGIRAFFE method surpasses various 2D and 3D baselines on synthetic and real-world datasets, showcasing superior controllability and fidelity. Qualitative assessments on the KITTI-360 dataset show that UrbanGIRAFFE outperforms GIRAFFE in background modeling, enabling enhanced stuff editing and camera viewpoint control. Ablation studies on KITTI-360 affirm the efficacy of UrbanGIRAFFE’s architectural components, including reconstruction loss, object discriminator, and innovative object modeling. Adopting a moving averaged model during inference further enhances the quality of generated images.

UrbanGIRAFFE innovatively addresses the complex task of controllable 3D-aware image synthesis for urban scenes, achieving remarkable versatility in camera viewpoint manipulation, semantic layout, and object interactions. Leveraging a 3D panoptic prior, the model effectively disentangles scenes into stuff, objects, and sky, facilitating compositional generative modeling. The approach underscores UrbanGIRAFFE’s advancement in 3D-aware generative models for intricate, unbounded scenes. Future directions include integrating a semantic voxel generator for novel scene sampling and exploring lighting control through light-ambient color disentanglement. The significance of the reconstruction loss is emphasized for maintaining fidelity and producing diverse results, especially for infrequently encountered semantic classes.

Future work for UrbanGIRAFFE includes incorporating a semantic voxel generator for novel scene sampling, enhancing the method’s ability to generate diverse and novel urban scenes. There is a plan to explore lighting control by disentangling light from ambient color, aiming to provide more fine-grained control over the visual aspects of the generated scenes. One potential way to improve the quality of generated images is to use a moving average model during inference.

Check out the Paper, Github, and Project. All credit for this research goes to the researchers of this project.

The post Zhejiang University Researchers Propose UrbanGIRAFFE to Tackle Controllable 3D Aware Image Synthesis for Challenging Urban Scenes appeared first on MarkTechPost.

Semantic Hearing: A Machine Learning-Based Novel Capability for Hearab …

Researchers from the University of Washington and Microsoft have introduced a cutting-edge concept: noise-canceling headphones with semantic hearing capabilities driven by advanced machine learning algorithms. This innovation empowers wearers to cherry-pick the sounds they wish to hear while eliminating all other auditory distractions.

The team elaborated on the central hurdle that propelled their work. They highlighted the problem with current noise-canceling headphones, emphasizing that they lack the real-time intelligence needed to discern and isolate specific sounds from the ambient environment. Consequently, keeping the wearer’s auditory experience in sync with their visual perception is a critical concern. Any delay in processing auditory stimuli is unacceptable; it must happen almost instantaneously.

Unlike conventional noise-canceling headphones that primarily focus on muffling incoming sounds or filtering selected frequencies, this pioneering prototype takes a divergent approach. It employs a classification system for incoming sounds, allowing users to personalize their auditory experience by choosing what they want to hear.

The prototype’s potential was demonstrated through a series of trials. These ranged from holding conversations amidst vacuum cleaner noise to tuning out street chatter to focus on bird calls and even mitigating construction clatter while remaining attentive to traffic honks. The device facilitated meditation by silencing ambient noises, except for an alarm signaling the session’s end.

The crux of achieving rapid sound processing lies in leveraging a more potent device than what can be integrated into headphones: the user’s smartphone. This device hosts a specialized neural network explicitly designed for binaural sound extraction—a pioneering feat, according to the researchers.

During the experimentation, the team successfully operated with 20 distinct sound classes, showcasing that their transformer-based network executes within a mere 6.56 milliseconds on a connected smartphone. The real-world assessments in novel indoor and outdoor scenarios confirm the proof-of-concept system’s efficacy in extracting target sounds while preserving spatial cues in its binaural output.

This pioneering stride in noise-canceling technology holds vast promise for enhancing user experiences in diverse settings. By allowing individuals to curate their auditory environment in real time, these next-generation headphones transcend the limitations of their predecessors. As the team continues refining this innovation and prepares for code publication, the prospects for a future where personalized soundscapes are at our fingertips seem ever closer to reality.

Check out the Paper and Project. All credit for this research goes to the researchers of this project.

The post Semantic Hearing: A Machine Learning-Based Novel Capability for Hearable Devices to Focus on or Ignore Specific Sounds in Real Environments while Maintaining Spatial Awareness appeared first on MarkTechPost.

This AI Paper Introduces LCM-LoRA: Revolutionizing Text-to-Image Gener …

Latent Diffusion Models are generative models used in machine learning, particularly in probabilistic modeling. These models aim to capture a dataset’s underlying structure or latent variables, often focusing on generating realistic samples or making predictions. They describe the evolution of a system over time, transforming a set of random variables from an initial distribution to a desired distribution through a series of diffusion steps.

These models are based on ODE-solver methods. Despite reducing the number of inference steps needed, they still demand significant computational overhead, especially when incorporating classifier-free guidance. Distillation methods such as Guided-Distill are promising but remain limited by their intensive computational requirements.

To tackle such issues, Latent Consistency Models (LCMs) have emerged. Their approach treats the reverse diffusion process as an augmented probability flow ODE problem, innovatively predicting the solution in the latent space and bypassing the need for iterative numerical ODE solvers. Remarkably, only 1 to 4 inference steps are needed to synthesize high-resolution images.

Researchers at Tsinghua University extend the LCM’s potential by applying LoRA distillation to Stable Diffusion models, including SD-V1.5, SSD-1B, and SDXL. They have expanded LCM’s scope to larger models with significantly less memory consumption while achieving superior image generation quality. For specialized datasets, such as those for anime, photo-realistic, or fantasy images, additional steps are necessary, such as employing Latent Consistency Distillation (LCD) to distill a pre-trained LDM into an LCM or directly fine-tuning an LCM using LCF. However, can one achieve fast, training-free inference on custom datasets?

To answer this, the team introduces LCM-LoRA, a universal training-free acceleration module that can be plugged directly into various fine-tuned Stable Diffusion models. Within the LoRA framework, the resultant LoRA parameters can be seamlessly integrated into the original model parameters. The team has demonstrated the feasibility of employing LoRA for the Latent Consistency Model (LCM) distillation process. The LCM-LoRA parameters can also be combined directly with other LoRA parameters fine-tuned on datasets of particular styles, enabling the generation of images in specific styles with minimal sampling steps and no further training. Thus, LCM-LoRA represents a universally applicable accelerator for diverse image-generation tasks.

This innovative approach significantly reduces the need for iterative steps, enabling the rapid generation of high-fidelity images from text inputs and setting a new standard for state-of-the-art performance. LoRA significantly trims the volume of parameters to be modified, thereby enhancing computational efficiency and permitting model refinement with considerably less data.

Check out the Paper and Github. All credit for this research goes to the researchers of this project.

The post This AI Paper Introduces LCM-LoRA: Revolutionizing Text-to-Image Generative Tasks with Advanced Latent Consistency Models and LoRA Distillation appeared first on MarkTechPost.

Moderate your Amazon IVS live stream using Amazon Rekognition

Amazon Interactive Video Service (Amazon IVS) is a managed live streaming solution designed for a quick and straightforward setup, letting you build interactive video experiences and handling video content from ingestion to delivery.
With the increased usage of live streaming, the need for effective content moderation becomes even more crucial. User-generated content (UGC) presents complex challenges for safety. Many companies rely on human moderators to monitor video streams, which is time-consuming, error-prone, and doesn’t scale with business growth speed. An automated moderation solution supporting a human in the loop (HITL) is increasingly needed.
Amazon Rekognition Content Moderation, a capability of Amazon Rekognition, automates and streamlines image and video moderation workflows without requiring machine learning (ML) experience. In this post, we explain the common practice of live stream visual moderation with a solution that uses the Amazon Rekognition Image API to moderate live streams. You can deploy this solution to your AWS account using the AWS Cloud Development Kit (AWS CDK) package available in our GitHub repo.
Moderate live stream visual content
The most common approach for UGC live stream visual moderation involves sampling images from the stream and utilizing image moderation to receive near-real-time results. Live stream platforms can use flexible rules to moderate visual content. For instance, platforms with younger audiences might have strict rules about adult content and certain products, whereas others might focus on hate symbols. These platforms establish different rules to match their policies effectively. Combining human and automatic review, a hybrid process is a common design approach. Certain streams will be stopped automatically, but human moderators will also assess whether a stream violates platform policies and should be deactivated.
The following diagram illustrates the conceptual workflow of a near-real-time moderation system, designed with loose coupling to the live stream system.

The workflow contains the following steps:

The live stream service (or the client app) samples image frames from video streams based on a specific interval.
A rules engine evaluates moderation guidelines, determining the frequency of stream sampling and the applicable moderation categories, all within predefined policies. This process involves the utilization of both ML and non-ML algorithms.
The rules engine alerts human moderators upon detecting violations in the video streams.
Human moderators assess the result and deactivate the live stream.

Moderating UGC live streams is distinct from classic video moderation in media. It caters to diverse regulations. How frequently images are sampled from video frames for moderation is typically determined by the platform’s Trust & Safety policy and the service-level agreement (SLA). For instance, if a live stream platform aims to stop channels within 3 minutes for policy violations, a practical approach is to sample every 1–2 minutes, allowing time for human moderators to verify and take action. Some platforms require flexible moderation frequency control. For instance, highly reputable streamers may need less moderation, whereas new ones require closer attention. This also enables cost-optimization by reducing sampling frequency.
Cost is an important consideration in any live stream moderation solution. As UGC live stream platforms rapidly expand, moderating concurrent streams at a high frequency can raise cost concerns. The solution presented in this post is designed to optimize cost by allowing you to define moderation rules to customize sample frequency, ignore similar image frames, and other techniques.
Recording Amazon IVS stream content to Amazon S3
Amazon IVS offers native solutions for recording stream content to an Amazon Simple Storage Service (Amazon S3) bucket and generating thumbnails—image frames from a video stream. It generates thumbnails every 60 seconds by default and provides users the option to customize the image quality and frequency. Using the AWS Management Console, you can create a recording configuration and link it to an Amazon IVS channel. When a recording configuration is associated with a channel, the channel’s live streams are automatically recorded to the specified S3 bucket.
There are no Amazon IVS charges for using the auto-record to Amazon S3 feature or for writing to Amazon S3. There are charges for Amazon S3 storage, Amazon S3 API calls that Amazon IVS makes on behalf of the customer, and serving the stored video to viewers. For details about Amazon IVS costs, refer to Costs (Low-Latency Streaming).
Amazon Rekognition Moderation APIs
In this solution, we use the Amazon Rekognition DetectModerationLabel API to moderate Amazon IVS thumbnails in near-real time. Amazon Rekognition Content Moderation provides pre-trained APIs to analyze a wide range of inappropriate or offensive content, such as violence, nudity, hate symbols, and more. For a comprehensive list of Amazon Rekognition Content Moderation taxonomies, refer to Moderating content.
The following code snippet demonstrates how to call the Amazon Rekognition DetectModerationLabel API to moderate images within an AWS Lambda function using the Python Boto3 library:

import boto3

# Initialize the Amazon Rekognition client object
rekognition = boto3.client('rekognition')

# Call the Rekognition Image moderation API
response = rekognition.detect_moderation_labels(
    Image={'S3Object': {'Bucket': data_bucket, 'Name': s3_key}}
)

The following is an example response from the Amazon Rekognition Image Moderation API:

{
    "ModerationLabels": [
        {
            "Confidence": 99.9290542602539,
            "Name": "Female Swimwear Or Underwear",
            "ParentName": "Suggestive"
        }
    ],
    "ModerationModelVersion": "6.1"
}
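
To act on this response, a rules engine can compare each returned label against per-category confidence thresholds. The following is a minimal, hypothetical sketch of that check; the rule structure (category names and thresholds) is illustrative only and is not the configuration format used by the sample app:

# Hypothetical evaluation of a DetectModerationLabels response against
# per-category confidence thresholds. Category names and thresholds are
# illustrative; the sample app uses its own rule configuration.
rules = {
    'Suggestive': 80.0,
    'Violence': 60.0,
    'Hate Symbols': 50.0,
}

def find_violations(response, rules):
    violations = []
    for label in response.get('ModerationLabels', []):
        # Match on the top-level category when present, else the label itself
        category = label.get('ParentName') or label['Name']
        threshold = rules.get(category)
        if threshold is not None and label['Confidence'] >= threshold:
            violations.append(label)
    return violations

violations = find_violations(response, rules)
if violations:
    print(f'This frame violates {len(violations)} moderation rule(s)')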

For additional examples of the Amazon Rekognition Image Moderation API, refer to our Content Moderation Image Lab.
Solution overview
This solution integrates with Amazon IVS by reading thumbnail images from an S3 bucket and sending images to the Amazon Rekognition Image Moderation API. It provides choices for stopping the stream automatically and human-in-the-loop review. You can configure rules for the system to automatically halt streams based on conditions. It also includes a light human review portal, empowering moderators to monitor streams, manage violation alerts, and stop streams when necessary.
In this section, we briefly introduce the system architecture. For more detailed information, refer to the GitHub repo.
The following screen recording displays the moderator UI, enabling them to monitor active streams with moderation warnings, and take actions such as stopping the stream or dismissing warnings.

Users can customize moderation rules, controlling video stream sample frequency per channel, configuring Amazon Rekognition moderation categories with confidence thresholds, and enabling similarity checks, which ensures performance and cost-optimization by avoiding processing redundant images.
The following screen recording displays the UI for managing a global configuration.

The solution uses a microservices architecture, which consists of two key components loosely coupled with Amazon IVS.

Rules engine
The rules engine forms the backbone of the live stream moderation system. It is a live processing service that enables near-real-time moderation. It uses Amazon Rekognition to moderate images, validates results against customizable rules, employs image hashing algorithms to recognize and exclude similar images, and can halt streams automatically or alert the human review subsystem upon rule violations. The service integrates with Amazon IVS through Amazon S3-based image reading and facilitates API invocation via Amazon API Gateway.
The following architecture diagram illustrates the near-real-time moderation workflow.

There are two methods to trigger the rules engine processing workflow:

S3 file trigger – When a new image is added to the S3 bucket, the workflow starts. This is the recommended way for Amazon IVS integration.
REST API call – You can make a RESTful API call to API Gateway with the image bytes in the request body. The API stores the image in an S3 bucket, triggering near-real-time processing. This approach is fitting for images captured by the client side of the live stream app and transmitted over the internet.

The image processing workflow, managed by AWS Step Functions, involves several steps:

Check the sample frequency rule. Processing halts if the previous sample time is too recent.
If enabled in the config, perform a similarity check using image hash algorithms (see the sketch after this list). The process skips the image if it’s similar to the previous one received for the same channel.
Use the Amazon Rekognition Image Moderation API to assess the image against configured rules, applying a confidence threshold and ignoring unnecessary categories.
If the moderation result violates any rules, send notifications to an Amazon Simple Notification Service (Amazon SNS) topic, alerting downstream systems with moderation warnings.
If the auto stop moderation rule is violated, the Amazon IVS stream will be stopped automatically.
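
For the similarity check in step 2, one common approach is perceptual hashing. The following is a hedged sketch that assumes the open-source imagehash and Pillow libraries and an illustrative distance threshold; the algorithm and the way previous hashes are stored in the sample app may differ:

# Hypothetical perceptual-hash similarity check between the current thumbnail
# and the previous thumbnail processed for the same channel.
import imagehash
from PIL import Image

HASH_DISTANCE_THRESHOLD = 5  # illustrative value; smaller is stricter

def is_similar_to_previous(current_image_path, previous_hash_hex):
    current_hash = imagehash.phash(Image.open(current_image_path))
    if previous_hash_hex is None:
        return False, str(current_hash)
    previous_hash = imagehash.hex_to_hash(previous_hash_hex)
    distance = current_hash - previous_hash  # Hamming distance between hashes
    return distance <= HASH_DISTANCE_THRESHOLD, str(current_hash)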

The design manages rules through a Step Functions state machine, providing a drag-and-drop GUI for flexible workflow definition. You can extend the rules engine by incorporating additional Step Functions workflows.
Monitoring and management dashboard
The monitoring and management dashboard is a web application with a UI that lets human moderators monitor Amazon IVS live streams. It provides near-real-time moderation alerts, allowing moderators to stop streams or dismiss warnings. The web portal also empowers administrators to manage moderation rules for the rules engine. It supports two types of configurations:

Channel rules – You can define rules for specific channels.
Global rules – These rules apply to all or a subset of Amazon IVS channels that lack specific configurations. You can define a regular expression to apply the global rule to Amazon IVS channel names matching a pattern. For example: .* applies to all channels. /^test-/ applies to channels with names starting with test-.

The system is a serverless web app, featuring a static React front end hosted on Amazon S3 with Amazon CloudFront for caching. Authentication is handled by Amazon Cognito. Data is served through API Gateway and Lambda, with state storage in Amazon DynamoDB. The following diagram illustrates this architecture.

The monitoring dashboard is a lightweight demo app that provides essential features for moderators. To enhance functionality, you can extend the implementation to support multiple moderators with a management system and reduce latency by implementing a push mechanism using WebSockets.
Moderation latency
The solution is designed for near-real-time moderation, with latency measured across two separate subsystems:

Rules engine workflow – The rules engine workflow, from receiving an image to sending notifications via Amazon SNS, averages under 2 seconds. This service promptly handles images through a Step Functions state machine. The Amazon Rekognition Image Moderation API processes an image in under 500 milliseconds for average file sizes below 1 MB. (These findings are based on tests conducted with the sample app, meeting near-real-time requirements.) In Amazon IVS, you have the option to select different thumbnail resolutions to adjust the image size.
Monitoring web portal – The monitoring web portal subscribes to the rules engine’s SNS topic. It records warnings in a DynamoDB table, while the website UI fetches the latest warnings every 10 seconds. This design showcases a lightweight demonstration of the moderator’s view. To further reduce latency, consider implementing a WebSocket to instantly push warnings to the UI upon their arrival via Amazon SNS.

Extend the solution
This post focuses on live stream visual content moderation. However, the solution is intentionally flexible, capable of accommodating complex business rules and extensible to support other media types, including moderating chat messages and audio in live streams. You can enhance the rules engine by introducing new Step Functions state machine workflows with upstream dispatching logic. We’ll delve deeper into live stream text and audio moderation using AWS AI services in upcoming posts.
Summary
In this post, we provided an overview of a sample solution that showcases how to moderate Amazon IVS live stream videos using Amazon Rekognition. You can experience the sample app by following the instructions in the GitHub repo and deploying it to your AWS account using the included AWS CDK package.
Learn more about content moderation on AWS. Take the first step towards streamlining your content moderation operations with AWS.

About the Authors
Lana Zhang is a Senior Solutions Architect at AWS WWSO AI Services team, specializing in AI and ML for Content Moderation, Computer Vision, Natural Language Processing and Generative AI. With her expertise, she is dedicated to promoting AWS AI/ML solutions and assisting customers in transforming their business solutions across diverse industries, including social media, gaming, e-commerce, media, advertising & marketing.
Tony Vu is a Senior Partner Engineer at Twitch. He specializes in assessing partner technology for integration with Amazon Interactive Video Service (IVS), aiming to develop and deliver comprehensive joint solutions to our IVS customers.

Retrieval-Augmented Generation with LangChain, Amazon SageMaker JumpSt …

Generative AI models have the potential to revolutionize enterprise operations, but businesses must carefully consider how to harness their power while overcoming challenges such as safeguarding data and ensuring the quality of AI-generated content.
The Retrieval-Augmented Generation (RAG) framework augments prompts with external data from multiple sources, such as document repositories, databases, or APIs, to make foundation models effective for domain-specific tasks. This post presents the capabilities of the RAG model and highlights the transformative potential of MongoDB Atlas with its Vector Search feature.
MongoDB Atlas is an integrated suite of data services that accelerate and simplify the development of data-driven applications. Its vector data store seamlessly integrates with operational data storage, eliminating the need for a separate database. This integration enables powerful semantic search capabilities through Vector Search, a fast way to build semantic search and AI-powered applications.
Amazon SageMaker enables enterprises to build, train, and deploy machine learning (ML) models. Amazon SageMaker JumpStart provides pre-trained models and data to help you get started with ML. You can access, customize, and deploy pre-trained models and data through the SageMaker JumpStart landing page in Amazon SageMaker Studio with just a few clicks.
Amazon Lex is a conversational interface that helps businesses create chatbots and voice bots that engage in natural, lifelike interactions. By integrating Amazon Lex with generative AI, businesses can create a holistic ecosystem where user input seamlessly transitions into coherent and contextually relevant responses.
Solution overview
The following diagram illustrates the solution architecture.

In the following sections, we walk through the steps to implement this solution and its components.
Set up a MongoDB cluster
To create a free tier MongoDB Atlas cluster, follow the instructions in Create a Cluster. Set up the database access and network access.
Deploy the SageMaker embedding model
You can choose the embedding model (ALL MiniLM L6 v2) on the SageMaker JumpStart Models, notebooks, solutions page.

Choose Deploy to deploy the model.
Verify the model is successfully deployed and verify the endpoint is created.

Vector embedding
Vector embedding is a process of converting a text or image into a vector representation. With the following code, we can generate vector embeddings with SageMaker JumpStart and update the collection with the created vector for every document:
payload = {"text_inputs": [document[field_name_to_be_vectorized]]}
query_response = query_endpoint_with_json_payload(json.dumps(payload).encode('utf-8'))
embeddings = parse_response_multiple_texts(query_response)

# update the document
update = {'$set': {vector_field_name: embeddings[0]}}
collection.update_one(query, update)
The preceding code shows how to update a single document in a collection. To update all documents, follow the instructions.
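
For illustration, the following hedged sketch iterates over every document that does not yet have an embedding and writes one back, reusing the helper functions from the preceding snippet (query_endpoint_with_json_payload and parse_response_multiple_texts are assumed to be defined as in the SageMaker JumpStart example code; field names are placeholders):

# Hypothetical loop that vectorizes every document in the collection.
# field_name_to_be_vectorized, vector_field_name, and the helper functions
# are assumed to be defined as in the preceding snippet.
import json

for document in collection.find({vector_field_name: {'$exists': False}}):
    payload = {'text_inputs': [document[field_name_to_be_vectorized]]}
    query_response = query_endpoint_with_json_payload(json.dumps(payload).encode('utf-8'))
    embeddings = parse_response_multiple_texts(query_response)

    collection.update_one(
        {'_id': document['_id']},
        {'$set': {vector_field_name: embeddings[0]}},
    )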
MongoDB vector data store
MongoDB Atlas Vector Search is a new feature that allows you to store and search vector data in MongoDB. Vector data is a type of data that represents a point in a high-dimensional space. This type of data is often used in ML and artificial intelligence applications. MongoDB Atlas Vector Search uses a technique called k-nearest neighbors (k-NN) to search for similar vectors. k-NN works by finding the k most similar vectors to a given vector. The most similar vectors are the ones that are closest to the given vector in terms of the Euclidean distance.
Storing vector data next to operational data can improve performance by reducing the need to move data between different storage systems. This is especially beneficial for applications that require real-time access to vector data.
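To build intuition for the k-NN search described above, the following self-contained sketch finds the k closest stored vectors to a query vector by Euclidean distance with NumPy. Atlas Vector Search performs the equivalent lookup at scale using an index rather than this brute-force scan:

# Brute-force k-NN by Euclidean distance, for intuition only.
import numpy as np

def k_nearest(query_vector, stored_vectors, k=3):
    distances = np.linalg.norm(stored_vectors - query_vector, axis=1)
    nearest_indices = np.argsort(distances)[:k]
    return nearest_indices, distances[nearest_indices]

stored = np.random.rand(1000, 384)  # 1,000 embeddings of dimension 384
query = np.random.rand(384)
indices, dists = k_nearest(query, stored, k=3)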
Create a Vector Search index
The next step is to create a MongoDB Vector Search index on the vector field you created in the previous step. MongoDB uses the knnVector type to index vector embeddings. The vector field should be represented as an array of numbers (BSON int32, int64, or double data types only).
Refer to Review knnVector Type Limitations for more information about the limitations of the knnVector type.
The following code is a sample index definition:
{
  "mappings": {
    "dynamic": true,
    "fields": {
      "egVector": {
        "dimensions": 384,
        "similarity": "euclidean",
        "type": "knnVector"
      }
    }
  }
}

Note that the dimensions value must match your embeddings model’s output dimension (384 for the ALL MiniLM L6 v2 model used here).
Query the vector data store
You can query the vector data store using the Vector Search aggregation pipeline. It uses the Vector Search index and performs a semantic search on the vector data store.
The following code is a sample search definition:
{
  $search: {
    "index": "<index name>", // optional, defaults to "default"
    "knnBeta": {
      "vector": [<array-of-numbers>],
      "path": "<field-to-search>",
      "filter": {<filter-specification>},
      "k": <number>,
      "score": {<options>}
    }
  }
}
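
The following is a hedged sketch of running this search from Python with PyMongo, assuming the user's question has already been converted to an embedding with the SageMaker endpoint and that the index and field names match the index definition shown earlier:

# Hypothetical semantic search using the Atlas Vector Search aggregation stage.
# query_embedding is the embedding of the user's question; index and field
# names are placeholders that must match your index definition.
results = collection.aggregate([
    {
        '$search': {
            'index': 'default',
            'knnBeta': {
                'vector': query_embedding,
                'path': 'egVector',
                'k': 3,
            },
        }
    }
])

for doc in results:
    print(doc[field_name_to_be_vectorized])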

Deploy the SageMaker large language model
SageMaker JumpStart foundation models are pre-trained large language models (LLMs) that are used to solve a variety of natural language processing (NLP) tasks, such as text summarization, question answering, and natural language inference. They are available in a variety of sizes and configurations. In this solution, we use the Hugging Face FLAN-T5-XL model.
Search for the FLAN-T5-XL model in SageMaker JumpStart.

Choose Deploy to set up the FLAN-T5-XL model.

Verify the model is deployed successfully and the endpoint is active.

Create an Amazon Lex bot
To create an Amazon Lex bot, complete the following steps:

On the Amazon Lex console, choose Create bot.

For Bot name, enter a name.
For Runtime role, select Create a role with basic Amazon Lex permissions.
Specify your language settings, then choose Done.
Add a sample utterance in the NewIntent UI and choose Save intent.
Navigate to the FallbackIntent that was created for you by default and toggle Active in the Fulfillment section.
Choose Build and after the build is successful, choose Test.
Before testing, choose the gear icon.
Specify the AWS Lambda function that will interact with MongoDB Atlas and the LLM to provide responses. To create the Lambda function, follow these steps (a simplified sketch of the function's flow appears after this list).
You can now interact with the LLM.
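
At a high level, that Lambda function embeds the user's utterance, retrieves the most similar documents from MongoDB Atlas, and sends an augmented prompt to the FLAN-T5-XL endpoint. The following is a simplified, hedged sketch of that flow; the endpoint names, response keys, and the Amazon Lex event field are illustrative assumptions rather than the exact code used in this solution:

# Hypothetical Lambda handler outline: embed the question, retrieve context
# from MongoDB Atlas Vector Search, and query the FLAN-T5-XL endpoint.
import json
import boto3

sagemaker_runtime = boto3.client('runtime.sagemaker')

def handler(event, context):
    question = event['inputTranscript']  # assumed Amazon Lex event field

    # 1. Embed the question (payload shape of the JumpStart embedding model is assumed)
    emb_response = sagemaker_runtime.invoke_endpoint(
        EndpointName='my-embedding-endpoint',  # placeholder name
        ContentType='application/json',
        Body=json.dumps({'text_inputs': [question]}),
    )
    query_embedding = json.loads(emb_response['Body'].read())['embedding'][0]

    # 2. Retrieve the closest documents from Atlas (see the $search stage shown earlier)
    context_docs = run_vector_search(query_embedding)  # assumed helper function

    # 3. Build an augmented prompt and call the FLAN-T5-XL endpoint
    prompt = f"Answer based on the context.\nContext: {context_docs}\nQuestion: {question}"
    llm_response = sagemaker_runtime.invoke_endpoint(
        EndpointName='my-flan-t5-xl-endpoint',  # placeholder name
        ContentType='application/json',
        Body=json.dumps({'text_inputs': prompt}),
    )
    answer = json.loads(llm_response['Body'].read())['generated_texts'][0]

    # A real handler must wrap the answer in the Amazon Lex response format
    return answer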

Clean up
To clean up your resources, complete the following steps:

Delete the Amazon Lex bot.
Delete the Lambda function.
Delete the LLM SageMaker endpoint.
Delete the embeddings model SageMaker endpoint.
Delete the MongoDB Atlas cluster.

Conclusion
In this post, we showed how to create a simple bot that uses MongoDB Atlas semantic search and integrates with a model from SageMaker JumpStart. This bot allows you to quickly prototype user interaction with different LLMs in SageMaker JumpStart while pairing them with context originating in MongoDB Atlas.
As always, AWS welcomes feedback. Please leave your feedback and questions in the comments section.

About the authors

Igor Alekseev is a Senior Partner Solution Architect at AWS in the Data and Analytics domain. In his role, Igor works with strategic partners, helping them build complex, AWS-optimized architectures. Prior to joining AWS, he implemented many projects in the big data domain as a Data/Solution Architect, including several data lakes in the Hadoop ecosystem. As a Data Engineer, he was involved in applying AI/ML to fraud detection and office automation.
Babu Srinivasan is a Senior Partner Solutions Architect at MongoDB. In his current role, he works with AWS to build the technical integrations and reference architectures for AWS and MongoDB solutions. He has more than two decades of experience in database and cloud technologies. He is passionate about providing technical solutions to customers working with multiple Global System Integrators (GSIs) across multiple geographies.

Build a foundation model (FM) powered customer service bot with agents …

From enhancing the conversational experience to agent assistance, there are plenty of ways that generative artificial intelligence (AI) and foundation models (FMs) can help deliver faster, better support. With the increasing availability and diversity of FMs, it’s difficult to experiment and keep up-to-date with the latest model versions. Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon. With Amazon Bedrock’s comprehensive capabilities, you can easily experiment with a variety of top FMs, customize them privately with your data using techniques such as fine-tuning and Retrieval Augmented Generation (RAG).
Agents for Amazon Bedrock
In July, AWS announced the preview of agents for Amazon Bedrock, a new capability for developers to create fully managed agents in a few clicks. Agents extend FMs to run complex business tasks—from booking travel and processing insurance claims to creating ad campaigns and managing inventory—all without writing any code. With fully managed agents, you don’t have to worry about provisioning or managing infrastructure.
In this post, we provide a step-by-step guide with building blocks to create a customer service bot. We use a text generation model (Anthropic Claude V2) and agents for Amazon Bedrock for this solution. We provide an AWS CloudFormation template to provision the resources needed for building this solution. Then we walk you through steps to create an agent for Amazon Bedrock.
ReAct Prompting
FMs determine how to solve user-requested tasks with a technique called ReAct. It’s a general paradigm that combines reasoning and acting with FMs. ReAct prompts FMs to generate verbal reasoning traces and actions for a task. This allows the system to perform dynamic reasoning to create, maintain, and adjust plans for acting while incorporating additional information into the reasoning. The structured prompts include a sequence of question-thought-action-observation examples.

The question is the user-requested task or problem to solve.
The thought is a reasoning step that helps demonstrate to the FM how to tackle the problem and identify an action to take.
The action is an API that the model can invoke from an allowed set of APIs.
The observation is the result of carrying out the action.
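
To make this concrete, the following is a hypothetical trace for the shoe retail use case introduced later in this post; the API names and the exact wording the agent generates are illustrative:

Question: I am looking for running shoes. Can you place an order for me?
Thought: I need the customer's details before I can recommend shoes, so I should call the customer details API.
Action: get_customer_details(customer_id)
Observation: The customer's preferred activity is running and their shoe size is 10.
Thought: Now I can look up shoes that match the running activity.
Action: list_shoes(activity="running")
Observation: Three running shoe models are available in size 10.
...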

Components in agents for Amazon Bedrock
Behind the scenes, agents for Amazon Bedrock automate the prompt engineering and orchestration of user-requested tasks. They can securely augment the prompts with company-specific information to provide responses back to the user in natural language. The agent breaks the user-requested task into multiple steps and orchestrates subtasks with the help of FMs. Action groups are tasks that the agent can perform autonomously. Action groups are mapped to an AWS Lambda function and related API schema to perform API calls. The following diagram depicts the agent structure.

Solution overview
We use a shoe retailer use case to build the customer service bot. The bot helps customers purchase shoes by providing options in a humanlike conversation. Customers converse with the bot in natural language with multiple steps invoking external APIs to accomplish subtasks. The following diagram illustrates the sample process flow.

The following diagram depicts a high-level architecture of this solution.

You can create an agent with Amazon Bedrock-supported FMs such as Anthropic Claude V2.
Attach API schema, residing in an Amazon Simple Storage Service (Amazon S3) bucket, and a Lambda function containing the business logic to the agent. (Note: This is a one-time setup step.)
The agent uses customer requests to create a prompt using the ReAct framework. It, then, uses the API schema to invoke corresponding code in the Lambda function.
You can perform a variety of tasks, including sending email notifications, writing to databases, and triggering application APIs in the Lambda functions.

In this post, we use the Lambda function to retrieve customer details, list shoes matching customer-preferred activity, and finally, place orders. Our code is backed by an in-memory SQLite database. You can use similar constructs to write to a persistent data store.
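The following is a hedged sketch of what that in-memory SQLite-backed business logic might look like; the table schema and function names are illustrative assumptions rather than the exact code in the accompanying repository, and the real action group Lambda function must also map these helpers to the API paths defined in its OpenAPI schema:

# Hypothetical in-memory SQLite helpers behind the agent's action group.
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, preferred_activity TEXT)')
conn.execute('CREATE TABLE shoes (id INTEGER PRIMARY KEY, model TEXT, activity TEXT, price REAL)')
conn.execute("INSERT INTO customers VALUES (1, 'Jane Doe', 'running')")
conn.execute("INSERT INTO shoes VALUES (10, 'Trail Runner X', 'running', 129.99)")

def get_customer_details(customer_id):
    row = conn.execute(
        'SELECT id, name, preferred_activity FROM customers WHERE id = ?',
        (customer_id,),
    ).fetchone()
    return {'id': row[0], 'name': row[1], 'preferred_activity': row[2]}

def list_shoes(activity):
    rows = conn.execute(
        'SELECT id, model, price FROM shoes WHERE activity = ?',
        (activity,),
    ).fetchall()
    return [{'id': r[0], 'model': r[1], 'price': r[2]} for r in rows]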
Prerequisites
To implement the solution provided in this post, you should have an AWS account and access to Amazon Bedrock with agents enabled (currently in preview). Use the AWS CloudFormation template, which launches in the us-east-1 Region, to create the resource stack needed for the solution.

The CloudFormation template creates two IAM roles. Update these roles to apply least-privilege permissions as discussed in Security best practices. Refer to the documentation to learn what IAM features are available to use with agents for Amazon Bedrock.

LambdaBasicExecutionRole with Amazon S3 full access and CloudWatch access for logging.
AmazonBedrockExecutionRoleForAgents with Amazon S3 full access and Lambda full access.

Important: Agents for Amazon Bedrock must have the role name prefixed by AmazonBedrockExecutionRoleForAgents_*
Bedrock Agents setup
In the next two sections, we will walk you through creating and testing an agent.
Create an agent for Amazon Bedrock
To create an agent, open the Amazon Bedrock console and choose Agents in the left navigation pane. Then select Create Agent.

This starts the agent creation workflow.

Provide agent details: Give the agent a name and description (optional). Select the service role created by the CloudFormation stack and select Next.

Select a foundation model: On the Select model screen, choose a model. Provide clear and precise instructions to the agent about what tasks to perform and how to interact with users.

Add action groups: An action is a task the agent can perform by making API calls. A set of actions comprise an action group. You provide an API schema that defines all the APIs in the action group. You must provide an API schema in the OpenAPI schema JSON format. The Lambda function contains the business logic needed to perform API calls. You must associate a Lambda function to each action group.
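
For reference, a minimal, hypothetical OpenAPI schema describing a single action might look like the following; the path, operationId, and response shape are illustrative and must match whatever the associated Lambda function implements:

{
  "openapi": "3.0.0",
  "info": { "title": "Shoe retail APIs", "version": "1.0.0" },
  "paths": {
    "/customer/{customerId}": {
      "get": {
        "operationId": "getCustomerDetails",
        "description": "Returns the customer profile and preferred activity",
        "parameters": [
          {
            "name": "customerId",
            "in": "path",
            "required": true,
            "schema": { "type": "integer" }
          }
        ],
        "responses": {
          "200": {
            "description": "Customer details",
            "content": {
              "application/json": {
                "schema": {
                  "type": "object",
                  "properties": {
                    "name": { "type": "string" },
                    "preferredActivity": { "type": "string" }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}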

Give the action group a name and a description for the action. Select the Lambda function, provide an API schema file and select Next.

In the final step, review the agent configuration and select Create Agent.

Test and deploy agents for Amazon Bedrock

Test the agent: After the agent is created, a dialog box shows the agent overview along with a working draft. The Amazon Bedrock console provides a UI to test your agent.

Deploy: After successful testing, you can deploy your agent. To deploy an agent in your application, you must create an alias. Amazon Bedrock then automatically creates a version for that alias.

The following actions occur with the preceding agent setup and the Lambda code provided with this post:

The agent creates a prompt from the developer-provided instructions (such as “You are an agent that helps customers purchase shoes.”), API schemas needed to complete the tasks, and data source details. The automatic prompt creation saves weeks of experimenting with prompts for different FMs.
The agent orchestrates the user-requested task, such as “I am looking for shoes,” by breaking it into smaller subtasks such as getting customer details, matching the customer-preferred activity with shoe activity, and placing shoe orders. The agent determines the right sequence of tasks and handles error scenarios along the way.

The following screenshot displays some example responses from the agent.

By selecting Show trace for each response, a dialog box shows the reasoning technique used by the agent and the final response generated by the FM.

Cleanup
To avoid incurring future charges, delete the resources. You can do this by deleting the stack from the CloudFormation console.

Feel free to download and test the code used in this post from the GitHub agents for Amazon Bedrock repository. You can also invoke the agents for Amazon Bedrock programmatically; an example Jupyter Notebook is provided in the repository.
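As an illustration of programmatic invocation, the following hedged sketch uses the boto3 bedrock-agent-runtime client; the agent ID and alias ID are placeholders, and the response handling shown here is an assumption rather than the repository's exact notebook code:

# Hypothetical programmatic invocation of an agent for Amazon Bedrock.
import uuid
import boto3

client = boto3.client('bedrock-agent-runtime')

response = client.invoke_agent(
    agentId='AGENT_ID',             # placeholder
    agentAliasId='AGENT_ALIAS_ID',  # placeholder
    sessionId=str(uuid.uuid4()),    # keeps conversational context per user
    inputText='I am looking for shoes',
)

# The completion is returned as an event stream of chunks
answer = ''
for event in response['completion']:
    if 'chunk' in event:
        answer += event['chunk']['bytes'].decode('utf-8')
print(answer)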
Conclusion
Agents for Amazon Bedrock can help you increase productivity, improve your customer service experience, or automate DevOps tasks. In this post, we showed you how to set up agents for Amazon Bedrock to create a customer service bot.
We encourage you to learn more by reviewing additional features of Amazon Bedrock. You can use the example code provided in this post to create your implementation. Try our workshop to gain hands-on experience with Amazon Bedrock.

About the Authors
Amit Arora is an AI and ML Specialist Architect at Amazon Web Services, helping enterprise customers use cloud-based machine learning services to rapidly scale their innovations. He is also an adjunct lecturer in the MS data science and analytics program at Georgetown University in Washington D.C.
Manju Prasad is a Senior Solutions Architect within Strategic Accounts at Amazon Web Services. She focuses on providing technical guidance in a variety of domains, including AI/ML, to a marquee M&E customer. Prior to joining AWS, she worked for companies in the financial services sector and at a startup.
Archana Inapudi is a Senior Solutions Architect at AWS supporting Strategic Customers. She has over a decade of experience helping customers design and build data analytics, and database solutions. She is passionate about using technology to provide value to customers and achieve business outcomes.

Use foundation models to improve model accuracy with Amazon SageMaker

Photo by Scott Webb on Unsplash

Determining the value of housing is a classic example of using machine learning (ML). A significant influence was made by Harrison and Rubinfeld (1978), who published a groundbreaking paper and dataset that became known informally as the Boston housing dataset. This seminal work proposed a method for estimating housing prices as a function of numerous dimensions, including air quality, which was the principal focus of their research. Almost 50 years later, the estimation of housing prices has become an important teaching tool for students and professionals interested in using data and ML in business decision-making.
In this post, we discuss the use of an open-source model specifically designed for the task of visual question answering (VQA). With VQA, you can ask a question of a photo using natural language and receive an answer to your question—also in plain language. Our goal in this post is to inspire and demonstrate what is possible using this technology. We propose using this capability with the Amazon SageMaker platform of services to improve regression model accuracy in an ML use case, and independently, for the automated tagging of visual images.
We provide a corresponding YouTube video that demonstrates what is discussed here. Video playback will start midway to highlight the most salient point. We suggest you follow this reading with the video to reinforce and gain a richer understanding of the concept.
Foundation models
This solution centers on the use of a foundation model published to the Hugging Face model repository. Here, we use the term foundation model to describe an artificial intelligence (AI) capability that has been pre-trained on a large and diverse body of data. Foundation models can sometimes be ready to use without the burden of training a model from zero. Some foundation models can be fine-tuned, which means teaching them additional patterns that are relevant to your business but missing from the original, generalized published model. Fine-tuning is sometimes needed to deliver correct responses that are unique to your use case or body of knowledge.
In the Hugging Face repository, there are several VQA models to choose from. We selected the model with the most downloads at the time of this writing. Although this post demonstrates the ability to use a model from an open-source model repository, the same concept would apply to a model you trained from zero or used from another trusted provider.
A modern approach to a classic use case
Home price estimation has traditionally occurred through tabular data where features of the property are used to inform price. Although there can be hundreds of features to consider, some fundamental examples are the size of the home in the finished space, the number of bedrooms and bathrooms, and the location of the residence.
Machine learning is capable of incorporating diverse input sources beyond tabular data, such as audio, still images, motion video, and natural language. In AI, the term multimodal refers to the use of a variety of media types, such as images and tabular data. In this post, we show how to use multimodal data to find and liberate hidden value locked up in the abundant digital exhaust produced by today’s modern world.
With this idea in mind, we demonstrate the use of foundation models to extract latent features from images of the property. By utilizing insights found in the images, not previously available in the tabular data, we can improve the accuracy of the model. Both the images and tabular data discussed in this post were originally made available and published to GitHub by Ahmed and Moustafa (2016).
A picture is worth a thousand words
Now that we understand the capabilities of VQA, let’s consider the two following images of kitchens. How would you assess the home’s value from these images? What are some questions you would ask yourself? Each picture may elicit dozens of questions in your mind. Some of those questions may lead to meaningful answers that improve a home valuation process.

Photos credit Francesca Tosolini (L) and Sidekix Media (R) on Unsplash
The following table provides anecdotal examples of VQA interactions by showing questions alongside their corresponding answers. Answers can come in the form of categorical, continuous value, or binary responses.

Example Question
Example Answer from Foundation Model

What are the countertops made from?
granite, tile, marble, laminate, etc.

Is this an expensive kitchen?
yes, no

How many separated sinks are there?
0, 1, 2

Reference architecture
In this post, we use Amazon SageMaker Data Wrangler to ask a uniform set of visual questions for thousands of photos in the dataset. SageMaker Data Wrangler is purpose-built to simplify the process of data preparation and feature engineering. By providing more than 300 built-in transformations, SageMaker Data Wrangler helps reduce the time it takes to prepare tabular and image data for ML from weeks to minutes. Here, SageMaker Data Wrangler combines data features from the original tabular set with photo-born features from the foundation model for model training.
Next, we build a regression model with the use of Amazon SageMaker Canvas. SageMaker Canvas can build a model, without writing any code, and deliver preliminary results in as little as 2–15 minutes. In the section that follows, we provide a reference architecture used to make this solution guidance possible.
Many popular models from Hugging Face and other providers are one-click deployable with Amazon SageMaker JumpStart. There are hundreds of thousands of models available in these repositories. For this post, we choose a model not available in SageMaker JumpStart, which requires a custom deployment. As shown in the following figure, we deploy a Hugging Face model for inference using an Amazon SageMaker Studio notebook. The notebook is used to deploy an endpoint for real-time inference. The notebook uses assets that include the Hugging Face binary model, a pointer to a container image, and a purpose-built inference.py script that matches the model’s expected input and output. As you read this, the mix of available VQA models may change. The important thing is to review the available VQA models at the time you read this and be prepared to deploy the model you choose, which will have its own API request and response contract.
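
For orientation, the following is a minimal sketch of what such an inference.py might look like for a ViLT-based VQA model (the model family cited in the References), assuming the Hugging Face PyTorch inference container conventions (model_fn and predict_fn) and the question/image JSON payload shown later in this post; the actual script must match the specific model you choose to deploy:

# Hypothetical inference.py for a ViLT VQA model served on a SageMaker endpoint.
# Assumes the transformers library is available in the container and that
# requests arrive as JSON with a question and a base64-encoded image.
import base64
import io

from PIL import Image
from transformers import ViltForQuestionAnswering, ViltProcessor

def model_fn(model_dir):
    processor = ViltProcessor.from_pretrained(model_dir)
    model = ViltForQuestionAnswering.from_pretrained(model_dir)
    return {'processor': processor, 'model': model}

def predict_fn(data, model_artifacts):
    processor = model_artifacts['processor']
    model = model_artifacts['model']

    question = data['question']
    image_bytes = base64.b64decode(data['image'])
    image = Image.open(io.BytesIO(image_bytes)).convert('RGB')

    encoding = processor(image, question, return_tensors='pt')
    outputs = model(**encoding)
    predicted_idx = outputs.logits.argmax(-1).item()
    return {'predicted_answer': model.config.id2label[predicted_idx]}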

After the VQA model is served by the SageMaker endpoint, we use SageMaker Data Wrangler to orchestrate the pipeline that ultimately combines tabular data and features extracted from the digital images and reshape the data for model training. The next figure offers a view of how the full-scale data transformation job is run.
In the following figure, we use SageMaker Data Wrangler to orchestrate data preparation tasks and SageMaker Canvas for model training. First, SageMaker Data Wrangler uses Amazon Location Service to convert ZIP codes available in the raw data into latitude and longitude features. Second, SageMaker Data Wrangler is able to coordinate sending thousands of photos to a SageMaker hosted endpoint for real-time inference, asking a uniform set of questions per scene. This results in a rich array of features that describe characteristics observed in kitchens, bathrooms, home exteriors, and more. After data has been prepared by SageMaker Data Wrangler, a training dataset is available in Amazon Simple Storage Service (Amazon S3). Using the S3 data as an input, SageMaker Canvas is able to train a model, in as little as 2–15 minutes, without writing any code.

Data transformation using SageMaker Data Wrangler
The following screenshot shows a SageMaker Data Wrangler workflow. The workflow begins with thousands of photos of homes stored in Amazon S3. Next, a scene detector determines the scene, such as kitchen or bathroom. Finally, a scene-specific set of questions are asked of the images, resulting in a richer, tabular dataset available for training.

The following is an example of the SageMaker Data Wrangler custom transformation code used to interact with the foundation model and obtain information about pictures of kitchens. In the preceding screenshot, if you were to choose the kitchen features node, the following code would appear:

from botocore.config import Config
import json
import boto3
import base64
from pyspark.sql.functions import col, udf, struct, lit

def get_answer(question, image):

    encoded_input_image = base64.b64encode(bytearray(image)).decode()

    payload = {
        "question": question,
        "image": encoded_input_image
    }

    payload = json.dumps(payload).encode('utf-8')
    response = boto3.client('runtime.sagemaker', config=Config(region_name='us-west-2')).invoke_endpoint(EndpointName='my-vqa-endpoint-name', ContentType='application/json', Body=payload)
    return json.loads(response['Body'].read())["predicted_answer"]

vqaUDF = udf(lambda q, img: get_answer(q, img))

# process only images of kitchen type
df = df[df['scene'] == 'kitchen']

visual_questions = [
    ('kitchen_floor_composition', 'what is the floor made of'),
    ('kitchen_floor_color', 'what color is the floor'),
    ('kitchen_counter_composition', 'what is the countertop made of'),
    ('kitchen_counter_color', 'what color is the countertop'),
    ('kitchen_wall_composition', 'what are the walls made of'),
    ('kitchen_refrigerator_stainless', 'is the refrigerator stainless steel'),
    ('kitchen_refrigerator_builtin', 'is there a built-in refrigerator'),
    ('kitchen_refrigerator_visible', 'is a refrigerator visible'),
    ('kitchen_cabinet_composition', 'what are the kitchen cabinets made of'),
    ('kitchen_cabinet_wood', 'what type of wood are the kitchen cabinets'),
    ('kitchen_window', 'does the kitchen have windows'),
    ('kitchen_expensive', 'is this an expensive kitchen'),
    ('kitchen_large', 'is this a large kitchen'),
    ('kitchen_recessed_lights', 'are there recessed lights')
]

for i in visual_questions:
    df = df.withColumn(i[0], vqaUDF(lit(i[1]), col('image_col.data')))

As a security consideration, you must first enable SageMaker Data Wrangler to call your SageMaker real-time endpoint through AWS Identity and Access Management (IAM). Similarly, any AWS resources you invoke through SageMaker Data Wrangler will need similar allow permissions.
Data structures before and after SageMaker Data Wrangler
In this section, we discuss the structure of the original tabular data and the enhanced data. The enhanced data contains new data features relative to this example use case. In your application, take time to imagine the diverse set of questions available in your images to help your classification or regression task. The idea is to imagine as many questions as possible and then test them to make sure they do provide value-add.
Structure of original tabular data
As described in the source GitHub repo, the sample dataset contains 535 tabular records including four images per property. The following table illustrates the structure of the original tabular data.

Feature
Comment

Number of bedrooms
.

Number of bathrooms
.

Area (square feet)
.

ZIP Code
.

Price
This is the target variable to be predicted.

Structure of enhanced data
The following table illustrates the enhanced data structure, which contains several new features derived from the images.

Feature
Comment

Number of bedrooms
.

Number of bathrooms
.

Area (square feet)
.

Latitude
Computed by passing original ZIP code into Amazon Location Service. This is the centroid value for the ZIP.

Longitude
Computed by passing original ZIP code into Amazon Location Service. This is the centroid value for the ZIP.

Does the bedroom contain a vaulted ceiling?
0 = no; 1 = yes

Is the bathroom expensive?
0 = no; 1 = yes

Is the kitchen expensive?
0 = no; 1 = yes

Price
This is the target variable to be predicted.

Model training with SageMaker Canvas
A SageMaker Data Wrangler processing job fully prepares and makes the entire tabular training dataset available in Amazon S3. Next, SageMaker Canvas addresses the model building phase of the ML lifecycle. Canvas begins by opening the S3 training set. Being able to understand a model is often a key customer requirement. Without writing code, and within a few clicks, SageMaker Canvas provides rich, visual feedback on model performance. As seen in the screenshot in the following section, SageMaker Canvas shows how single features inform the model.
Model trained with original tabular data and features derived from real-estate images
We can see from the following screenshot that features developed from images of the property were important. Based on these results, the question “Is this kitchen expensive” from the photo was more significant than “number of bedrooms” in the original tabular set, with feature importance values of 7.08 and 5.498, respectively.

The following screenshot provides important information about the model. First, the residual graph shows most points in the set clustering around the purple shaded zone. Here, two outliers were manually annotated outside SageMaker Canvas for this illustration. These outliers represent significant gaps between the true home value and the predicted value. Additionally, the R2 value, which has a possible range of 0–100%, is shown at 76%. This indicates the model is imperfect and doesn’t have enough information points to fully account for all the variety to fully estimate home values.
We can use outliers to find and propose additional signals to build a more comprehensive model. For example, these outlier properties may include a swimming pool or be located on large plots of land. The dataset didn’t include these features; however, you may be able to locate this data and train a new model with “has swimming pool” included as an additional feature. Ideally, on your next attempt, the R2 value would increase and the MAE and RMSE values would decrease.

Model trained without features derived from real-estate images
Finally, before moving to the next section, let’s explore if the features from the images were helpful. The following screenshot provides another SageMaker Canvas trained model without the features from the VQA model. We see the model error rate has increased, from an RMSE of 282K to an RMSE of 352K. From this, we can conclude that three simple questions from the images improved model accuracy by about 20%. Not shown, but to be complete, the R2 value for the following model deteriorated as well, dropping to a value of 62% from a value of 76% with the VQA features provided. This is an example of how SageMaker Canvas makes it straightforward to quickly experiment and use a data-driven approach that yields a model to serve your business need.

Looking ahead
Many organizations are becoming increasingly interested in foundation models, especially since general pre-trained transformers (GPTs) officially became a mainstream topic of interest in December 2022. A large portion of the interest in foundation models is centered on large language models (LLM) tasks; however, there are other diverse use cases available, such as computer vision and, more narrowly, the specialized VQA task described here.
This post is an example to inspire the use of multimodal data to solve industry use cases. Although we demonstrated the use and benefit of VQA in a regression model, it can also be used to label and tag images for subsequent search or business workflow routing. Imagine being able to search for properties listed for sale or rent. Suppose you want to find a property with tile floors or marble countertops. Today, you might have to get a long list of candidate properties and filter yourself by sight as you browse through each candidate. Instead, imagine being able to filter listings that contain these features—even if a person didn’t explicitly tag them. In the insurance industry, imagine the ability to estimate claim damages, or route next actions in a business workflow from images. In social media platforms, photos could be auto-tagged for subsequent use.
Summary
This post demonstrated how to use computer vision enabled by a foundation model to improve a classic ML use case using the SageMaker platform. As part of the solution proposed, we located a popular VQA model available on a public model registry and deployed it using a SageMaker endpoint for real-time inference.
Next, we used SageMaker Data Wrangler to orchestrate a workflow in which uniform questions were asked of the images in order to generate a rich set of tabular data. Finally, we used SageMaker Canvas to train a regression model. It’s important to note that the sample dataset was very simple and, therefore, imperfect by design. Even so, SageMaker Canvas makes it easy to understand model accuracy and seek out additional signals to improve the accuracy of a baseline model.
We hope this post has encouraged you to use the multimodal data your organization may possess. Additionally, we hope the post has inspired you to consider model training as an iterative process. A great model can be achieved with some patience. Models that are near-perfect may be too good to be true, perhaps the result of target leakage or overfitting. An ideal scenario would begin with a model that is good, but not perfect. Using errors, losses, and residual plots, you can obtain additional data signals to increase the accuracy from your initial baseline estimate.
AWS offers the broadest and deepest set of ML services and supporting cloud infrastructure, putting ML in the hands of every developer, data scientist, and expert practitioner. If you’re curious to learn more about the SageMaker platform, including SageMaker Data Wrangler and SageMaker Canvas, please reach out to your AWS account team and start a conversation. Also, consider reading more about SageMaker Data Wrangler custom transformations.
References
Ahmed, E. H., & Moustafa, M. (2016). House price estimation from visual and textual features. IJCCI 2016 – Proceedings of the 8th International Joint Conference on Computational Intelligence, 3, 62–68.
Harrison Jr., D., & Rubinfeld, D. L. (1978). Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management, 5(1), 81–102.
Kim, W., Son, B., & Kim, I. (2021). ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision. Proceedings of the 38th International Conference on Machine Learning, PMLR 139:5583–5594.

About The Author
Charles Laughlin is a Principal AI/ML Specialist Solution Architect and works in the Amazon SageMaker service team at AWS. He helps shape the service roadmap and collaborates daily with diverse AWS customers to help transform their businesses using cutting-edge AWS technologies and thought leadership. Charles holds a M.S. in Supply Chain Management and a Ph.D. in Data Science.

Boosting Email Deliverability: 5 Strategies for DTC Marketers

The success of email marketing hinges on a fundamental metric: email deliverability. Studies reveal that nearly 20% of emails never reach their intended recipients.

This is pretty troubling for DTC marketers when you consider that the return on investment for email marketing is unmatched.

According to a Data & Marketing Association (DMA) 2019 report, the average ROI for email marketing in the United States was $42 for every dollar spent. Meanwhile, display ads, social media marketing, and search advertising return an estimated $2 to $10 for every dollar spent.

Mastering the art of ensuring messages land in the inbox is more crucial than ever. The potential for email marketing is hindered when emails languish in spam folders or, worse, go undelivered.

Today I’m going to show you how to improve email deliverability so you can get messages in prospects’ inboxes, run more successful email outreach campaigns, and improve lead generation.

So how can you get emails into the inbox so you can connect with your ideal prospects?

With a strategic approach, an understanding of email delivery and spam-filtering algorithms, and the right email warmup tools, you can run an email outreach process that puts your messages at the top of the inbox, where your targeted ideal prospects will see them.

Follow along with this guide to email deliverability:

What is Email Deliverability?

Why is Email Deliverability Important to DTC Marketers?

What is a Good Email Deliverability Rate?

What Affects Email Deliverability?

What are Some Emerging Trends in Email Deliverability?

How do You Ensure Email Deliverability?

How Can DTC Marketers Increase Email Deliverability Rates?

How to Increase Email Deliverability Rates with Customers.ai?

Email Deliverability FAQs

What is Email Deliverability?

Email deliverability measures the rate at which emails get delivered to recipients. Sales and marketing teams use this metric to make sure prospects receive their campaigns successfully.

But deliverability is about more than just if your email arrived or if it bounced back undelivered. It also refers to where your email landed—and whether it went to the inbox or to the spam folder.

Maintaining a high deliverability rate is critical for sales reps who use targeted outbound outreach. With good deliverability, you have a better chance of:

Building relationships with your target prospects

Getting prospects to respond to your offer

Generating and qualifying warm leads

Closing more deals for your business

Creating a healthy email account that you can use for future outreach


Why is Email Deliverability Important to DTC Marketers?

Email deliverability is crucial for Direct-to-Consumer (DTC) marketers for several reasons:

Customer Engagement: Email is a primary channel for engaging with customers. Ensuring that emails reach the inbox enables DTC marketers to communicate directly with their audience, providing updates, promotions, and valuable content.

Brand Visibility: Deliverability impacts the visibility of your brand. If emails consistently land in spam folders, your brand may be overlooked, affecting brand recall and customer trust.

Campaign Effectiveness: Successful outreach campaigns rely on emails being seen by the intended recipients. Improving deliverability increases the chances of your marketing campaigns reaching the target audience and driving desired actions.

Conversion Rates: Emails that land in the inbox are more likely to be opened and acted upon. Improved deliverability contributes to higher open rates, click-through rates, and ultimately, better conversion rates for DTC marketers.

Customer Relationship Building: Email is a personal and direct communication channel. By ensuring emails are delivered, DTC marketers can build and nurture relationships with customers, fostering loyalty and repeat business.

Regulatory Compliance: Compliance with email marketing regulations, such as GDPR or CAN-SPAM, is crucial for DTC marketers. Good deliverability practices contribute to compliance and help avoid legal issues.

Data Insights: Reliable deliverability allows marketers to gather accurate data on the performance of their email campaigns. This data is valuable for refining strategies, understanding customer behavior, and optimizing future campaigns.

What is a Good Email Deliverability Rate?

For targeted outbound email outreach campaigns, Customers.ai aims for delivery rates of around 90%. That means the vast majority of emails land in the inbox, while just 10% end up hard or soft bounced or blocked altogether.

Although a 90% deliverability rate is good for targeted outbound emailing, it’s just one factor that defines a successful campaign. It’s also helpful to look at the following metrics (a quick sketch for computing them follows the list):

Open rate: the percentage of people who read your messages

Click rate: the percentage of people who visit links in your messages

Response rate: the percentage of people who respond to your messages

Bounce rate: the percentage of emails that return undelivered

Spam report rate: the percentage of people who file spam complaints with their email provider
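
As a quick illustration, here is a minimal sketch that computes the rates above from raw campaign counts; the dictionary and its field names are hypothetical, and your email platform may define the denominators slightly differently.

```python
# A small sketch computing the campaign metrics listed above from raw counts.
# The input dictionary and its field names are hypothetical.
campaign = {
    "sent": 1000, "delivered": 910, "bounced": 90,
    "opened": 380, "clicked": 120, "replied": 45, "spam_reports": 3,
}

deliverability_rate = campaign["delivered"] / campaign["sent"]
open_rate = campaign["opened"] / campaign["delivered"]
click_rate = campaign["clicked"] / campaign["delivered"]
response_rate = campaign["replied"] / campaign["delivered"]
bounce_rate = campaign["bounced"] / campaign["sent"]
spam_report_rate = campaign["spam_reports"] / campaign["delivered"]

print(f"Deliverability: {deliverability_rate:.1%}, open: {open_rate:.1%}, "
      f"click: {click_rate:.1%}, response: {response_rate:.1%}, "
      f"bounce: {bounce_rate:.1%}, spam reports: {spam_report_rate:.2%}")
```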

What Affects Email Deliverability?

It’s easy to assume your deliverability rate is good, especially if you have a great offer and you’re targeting prospects who are a great fit for your product or service. Yet maintaining a good rate is a constant challenge.

Here are some of the key factors that affect the email deliverability metric:

Email account age: Did you just set up a new email account yesterday and start sending cold outreach right away? Brand new email addresses have no history, which can trigger spam filters. To warm up a new account, use the tutorial below.

Sending volume: Are you planning to send hundreds of cold emails per day? New email accounts typically have a limit of about 100 emails per day. If you attempt to send more, you could run into deliverability issues.

Send times: Did you mass email your entire prospect list at once? Sending all your cold outreach for the day at the same time can compromise deliverability, especially for large sales outreach campaigns.

Email server type: Are you using a POP3 server or web-based email to run outreach campaigns? Without the right type of email server, your emails could run into problems with inconsistent delivery.

Email authentication: Have your emails been authenticated? If you aren’t using tools like SPF and DKIM to authenticate your emails, your recipients’ ISPs may think your messages aren’t legitimate.

Engagement rates: Do your prospects ever open, click on, or respond to your cold emails? If your engagement rates are low, your emails could get flagged as spam—which can impact deliverability.

Sender reputation: Both IP and domain reputations act as critical factors in email deliverability. A positive sender reputation significantly influences whether emails reach the intended inbox or get flagged as spam. Maintaining a favorable sender reputation is imperative.

What Are Some Emerging Trends in Email Deliverability?

AI-Powered Predictive Analytics: As the digital landscape evolves, AI-driven predictive analytics is emerging as a game-changer in email deliverability. Harnessing the power of artificial intelligence, marketers can now anticipate potential deliverability issues before they arise. By analyzing vast datasets, AI identifies patterns, refines targeting strategies, and optimizes content, ensuring messages consistently reach the right audience with precision.

Interactive Email Experiences: The future of email deliverability lies in interactivity. Marketers are increasingly incorporating dynamic, interactive elements into their emails, transforming static messages into engaging experiences. From clickable carousels to live polls, interactive content not only captivates recipients but also enhances the overall user experience. As ISPs recognize the value of engaged users, emails with interactive elements are more likely to land in the coveted inbox.

Enhanced Email Authentication Protocols: With cyber threats on the rise, email authentication protocols are evolving to bolster security and trust. DMARC (Domain-based Message Authentication, Reporting, and Conformance) adoption is becoming more widespread, providing an additional layer of protection against phishing and unauthorized use of domains. As ISPs prioritize authenticated emails, adhering to these protocols is integral to maintaining a positive sender reputation.

Leveraging AI for Personalization: In the era of hyper-personalization, leveraging AI for personalization is a trend reshaping email deliverability. Artificial intelligence not only refines audience segmentation but also dynamically tailors content based on individual preferences and behaviors. As recipients receive more relevant and personalized content, the likelihood of engagement increases, positively impacting deliverability metrics.

Privacy Compliance and Its Impact on Deliverability: As privacy regulations tighten globally, adherence to compliance standards is emerging as a pivotal factor in email deliverability. Marketers navigating the landscape of data protection laws, such as GDPR and CCPA, are not only ensuring legal compliance but also building trust with their audience. Prioritizing privacy compliance is essential for maintaining a positive sender reputation and fostering a secure email environment.

How Do You Ensure Email Deliverability?

With a smart strategy, you can increase the chance of getting your emails in prospects’ inboxes.

Here are 5 email deliverability tips from Janine Du Toit, Customers.ai Director of Marketing Operations, and resident email deliverability expert:

Use an SMTP Server: Always use an SMTP server for cold email marketing. She likens SMTP servers to phone books for email domain senders—using one with a good reputation can boost deliverability.

Moderate Send Volume: Even when you use an SMTP mail server, it’s important to moderate your send volume and your send times. Start your campaign slowly (no more than 100 cold emails per day) and send out emails at intermittent times throughout the day.

Use Multiple Senders: Use multiple email senders if you plan to launch a larger campaign and need to send more than 100 emails per day.

Authenticate Your Emails: Don’t forget to authenticate your emails using a DKIM and SPF checker (see the DNS-lookup sketch after this list). With these tools, you’ll be able to send emails more reliably and get better inbox placement.

Generate Engagement! Create cold email campaigns that get opens, clicks, and responses. Every time a prospect opens your email, clicks a link, or responds to your offer, the engagement boosts the health of your email account.
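
As one way to sanity-check authentication, here is a hedged sketch that looks up SPF, DKIM, and DMARC TXT records with the dnspython library. The domain and DKIM selector are placeholders; your email provider determines the actual selector it publishes.

```python
# A minimal sketch of checking SPF, DKIM, and DMARC DNS records with dnspython
# (pip install dnspython). The domain and DKIM selector below are hypothetical.
import dns.resolver

DOMAIN = "example.com"        # hypothetical sending domain
DKIM_SELECTOR = "default"     # hypothetical selector; your provider tells you the real one

def txt_records(name):
    """Return the TXT records for a DNS name, or an empty list if none exist."""
    try:
        return [r.to_text() for r in dns.resolver.resolve(name, "TXT")]
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return []

print("SPF:  ", [r for r in txt_records(DOMAIN) if "v=spf1" in r])
print("DKIM: ", txt_records(f"{DKIM_SELECTOR}._domainkey.{DOMAIN}"))
print("DMARC:", txt_records(f"_dmarc.{DOMAIN}"))
```

Missing or empty output for any of these lookups is a signal that receiving mail servers may not be able to verify your messages as legitimate.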


How Can DTC Marketers Increase Email Deliverability Rates?

Embarking on the journey of elevating email deliverability rates is a mission-critical endeavor for DTC marketers. From mastering the art of personalization to optimizing send frequency, the goal is not just inbox placement but genuine audience engagement. Here are a few tips for superior email deliverability.

1. Focus on Segmentation and Personalization

Tailoring content for different audiences and incorporating personalized subject lines not only enhances engagement but also lays the foundation for improved deliverability. This can be done in several ways.

Tailor Content to Different Audiences: In the realm of Direct-to-Consumer (DTC) email marketing, one size rarely fits all. For example, someone looking for a pair of women’s jeans likely doesn’t want to be emailed about a men’s sneaker sale. Implementing audience segmentation allows you to tailor your content based on distinct customer profiles. By understanding the unique preferences and behaviors of different segments, you can craft messages that resonate more deeply, increasing the likelihood of engagement and, consequently, positively impacting deliverability. With Customers.ai, it’s easy to identify which visitors landed on which pages and segment them into specific audiences.

Personalize Subject Lines: Elevate your email game by embracing the power of personalized subject lines. DTC marketers can significantly enhance open rates by incorporating dynamic content, recipient names, or targeted offers into subject lines. A personalized touch not only captures attention but also signals to recipients that your message is tailored to their interests, bolstering the chances of your emails making it to the inbox.

2. Quality Content and Engagement

In the dynamic landscape of DTC marketing, the value of quality content cannot be overstated. Your email content has to be relevant and it has to be valuable – actively encouraging subscriber interaction. These practices not only captivate your audience but also contribute to a robust sender reputation, fortifying your path to superior email deliverability.

Create Relevant and Valuable Content: The backbone of any successful email campaign lies in the quality of its content. DTC marketers must prioritize crafting content that is not only relevant but also valuable to their audience. Providing insightful information, exclusive offers, or personalized recommendations ensures that subscribers find genuine value in your emails, fostering a positive sender reputation and bolstering deliverability. Additionally, make sure it’s personalized! With a tool like the Customers.ai AI writer, you can create content that is not only interesting but also completely relevant.

Encourage Subscriber Interaction: Boosting subscriber interaction is a strategic move towards improved deliverability. DTC marketers can implement engagement-driving elements such as clickable calls-to-action, surveys, or interactive content. When recipients actively engage with your emails, it sends positive signals to ISPs, reinforcing your sender reputation and increasing the chances of future messages reaching the inbox.

Leverage User-Generated Content: Harness the power of your community by incorporating user-generated content into your emails. DTC marketers can enhance authenticity and engagement by showcasing real customer experiences. UGC not only adds a personal touch to your emails but also contributes positively to your sender reputation.

3. Optimize Send Frequency and Timing

Mastering the art of email timing is a game-changer in the dynamic landscape of DTC marketing. Your email frequency and send times play a crucial role in captivating your audience without overwhelming them. Striking the perfect balance ensures that your messages are not only noticed but also engaged with, contributing to improved deliverability.

Find the Sweet Spot for Send Frequency: DTC marketers must navigate the delicate balance between staying top-of-mind and avoiding email fatigue. Experiment with different send frequencies to identify the optimal cadence for your audience segments. By tailoring your approach to subscriber preferences, you not only enhance engagement but also solidify your position in their inbox.

Perfect Your Timing with Analytics: Leverage analytics tools to decipher the ideal moments for sending emails to different segments of your audience. By understanding when your subscribers are most active, you can strategically time your campaigns, maximizing the chances of inbox visibility and interaction (a simple analysis sketch follows below).

Optimizing your send frequency and timing is a nuanced process that requires continuous refinement but is super important to email deliverability success.
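
As an illustration of the analytics step, here is a small sketch that ranks send hours by open rate using pandas. It assumes a hypothetical export of past sends with a timestamp column and a 0/1 opened flag; the file and column names are placeholders.

```python
# A hedged sketch of finding the send hours with the highest open rates from
# past campaign data. The CSV export and its column names are hypothetical.
import pandas as pd

events = pd.read_csv("send_log.csv", parse_dates=["sent_at"])  # hypothetical export
events["send_hour"] = events["sent_at"].dt.hour

open_rate_by_hour = (
    events.groupby("send_hour")["opened"]   # "opened" assumed to be 0 or 1 per send
          .mean()
          .sort_values(ascending=False)
)
print(open_rate_by_hour.head())             # best-performing send hours first
```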

4. Maintain a Clean and Updated Subscriber List

In the dynamic world of DTC email marketing, a clean and updated subscriber list is a must. Ensuring your list is pristine not only enhances the relevance of your campaigns but also prevents potential deliverability pitfalls.

Regular List Purge for Peak Performance: DTC marketers, take note—regularly auditing and purging your subscriber list is a proactive step toward optimal deliverability. Identify and remove inactive or disengaged subscribers to maintain a list populated with genuinely interested recipients. This meticulous curation not only improves open rates but also signals to ISPs that your content is eagerly anticipated.

Update Contacts to Stay Relevant: The key to a thriving DTC email strategy lies in relevance. Regularly update subscriber details to reflect changes in preferences, ensuring your campaigns remain aligned with evolving customer needs. By keeping your contact information current, you not only minimize bounce rates but also foster a positive sender reputation.

Embrace Automation for Efficiency: Streamline list maintenance with automation tools that effortlessly manage subscriber updates and removals. Automation not only saves time but also ensures your list is consistently refined, allowing you to focus on delivering content that truly resonates with your engaged audience.

5. Monitor and Address Deliverability Issues Promptly

Vigilance is the linchpin of successful email deliverability in the world of DTC marketing. Proactive monitoring and swift resolution of deliverability issues are paramount to maintaining a pristine sender reputation and ensuring your messages consistently reach the intended audience.

Regularly Assess Key Metrics: DTC marketers, make it a habit to regularly assess critical deliverability metrics. Keep a close eye on bounce rates, spam complaints, and open rates to detect any anomalies. By identifying issues early on, you can address them swiftly, preventing potential setbacks to your sender reputation.

Investigate Bounces and Complaints: Bounces and spam complaints are red flags that demand immediate attention. Dive deep into the reasons behind bounces and proactively investigate spam complaints. Understanding the root causes allows you to rectify issues promptly, demonstrating to ISPs that you are committed to delivering valuable, solicited content.

Utilize Email Authentication Protocols: Implement and regularly review email authentication protocols like SPF, DKIM, and DMARC. These protocols not only enhance email security but also contribute to better deliverability. Regular checks ensure that your emails are not only reaching inboxes but are also protected from phishing attempts and unauthorized use of your domain.

The journey to increased deliverability rates is an ongoing commitment. By integrating personalized strategies, maintaining a clean subscriber list, and promptly addressing issues, DTC marketers can forge a path to sustained success.

How to Increase Email Deliverability Rates with Customers.ai

Improving email deliverability doesn’t have to be as difficult as you might think. With Customers.ai’s email warmup tools, you can automatically improve deliverability and avoid bounced emails as you send cold outreach campaigns.

Here’s a 3-step process to increase email deliverability to 90% using Customers.ai Website Visitor ID X-Ray tool and AI Email Writer to create targeted outbound emails to a warm audience:

Start Collecting Email Addresses

Set Up Your Email Automation

Warm Up Your Email Accounts

Step #1: Start Collecting Email Addresses

The first step is adding the Customers.ai Website Visitor ID X-Ray pixel. In 90 seconds, you can start collecting the email addresses of the people who visit your website. Turn anonymous visitors into a warm audience you can use! The Website Visitor ID X-Ray is easy to add – we have several ways to install it, including GTM, Shopify, WordPress, and more.

In the dashboard, navigate to My Automations, select + New Automation, and get your pixel. Make sure to test it to verify it’s working. 

Step #2: Set Up Your Email Automation

Once you have the X-Ray pixel working and have connected your email, you can start setting up automations to target this particular audience. On your Customers.ai dashboard, click the + Automation button in the main menu, and select X-Ray Lead Generation Pixel.

For enhanced targeting, you can create both automations and emails by landing page.

Step #3: Warm Up Your Email Accounts

It’s important to remember that your X-Ray list is not cold. These are actual users who have visited your site. To help with deliverability, Customers.ai will automatically detect if an email is active, ensuring you are not wasting email sends or seeing a high number of bounces.

Once you have your automation set up and your email configured, you can start sending!

Boost Email Deliverability With the Website Visitor ID X-Ray Pixel

Hear that? It’s the sound of your emails landing in inboxes and getting responses from your target prospects.

With Customers.ai’s Website Visitor ID X-Ray tool and our email warmup capabilities, your team can generate more revenue.

Curious how these tools can benefit your cold outreach efforts? Get started now with a free trial of Customers.ai.

Important Next Steps

See what targeted outbound marketing is all about. Capture and engage your first 50 website visitor leads with Customers.ai X-Ray website visitor identification for free.

Talk and learn about sales outreach automation with other growth enthusiasts. Join Customers.ai Island, our Facebook group of 40K marketers and entrepreneurs who are ready to support you.

Advance your marketing performance with Sales Outreach School, a free tutorial and training area for sales pros and marketers.


Email Deliverability FAQs

Q. What factors affect email deliverability?

Email deliverability is influenced by various factors, including sender reputation, content quality, engagement rates, authentication protocols (SPF, DKIM, DMARC), and list hygiene.

Q. How can I improve my email deliverability?

Focus on maintaining a positive sender reputation, sending relevant content, engaging with your audience, authenticating your emails using SPF, DKIM, and DMARC, and regularly cleaning your email list to remove inactive or bouncing addresses.

Q. Why are my emails going to the spam folder?

Emails can land in the spam folder due to poor sender reputation, spammy content, low engagement, or insufficient authentication. Ensure your emails comply with best practices and monitor your sender reputation.

Q. What is sender reputation, and how is it measured?

Sender reputation is a score assigned to a sender’s domain or IP address based on factors like email engagement, spam complaints, and bounce rates. Internet Service Providers (ISPs) use this score to determine whether to deliver an email to the inbox or mark it as spam.

Q. What role does customer engagement play in DTC email deliverability?

DTC brands heavily rely on customer engagement. High engagement rates, such as opens, clicks, and purchases, contribute positively to sender reputation, enhancing the chances of DTC emails reaching the inbox.

Q. How can DTC brands personalize emails without compromising deliverability?

Personalization is key for DTC marketing. Ensure you have clean and segmented lists, use dynamic content based on customer preferences, and avoid excessive personalization that might trigger spam filters.

Q. How should DTC marketers handle re-engagement campaigns to maintain email deliverability?

Implement re-engagement campaigns to win back inactive subscribers. Clearly communicate the value proposition, offer incentives, and, if necessary, provide an easy way for subscribers to opt-out to maintain list hygiene.

Q. What steps can DTC marketers take to avoid spam traps and maintain a clean email list?

Regularly clean your email list by removing inactive or bouncing addresses. Use confirmed opt-ins and monitor engagement metrics. Avoid purchasing email lists, as they may contain spam traps, and implement double opt-in processes to verify subscriber intent. Regularly update your list hygiene practices to align with industry standards.

Q. What role does a sender’s reputation play in email deliverability, and how can it impact overall email deliverability rates?

A sender’s reputation is a crucial factor in email deliverability. It is a measure of the sender’s trustworthiness based on their sending history. A positive reputation enhances email deliverability, ensuring that emails land in the inbox rather than being flagged as spam.

Q. What are the common challenges faced by marketers in maintaining high email deliverability, and how can they overcome these challenges?

Marketers often face challenges in maintaining high email deliverability due to factors such as changing ISP algorithms, evolving spam filters, and subscriber engagement fluctuations. Overcoming these challenges involves staying updated on industry best practices, regularly monitoring and adjusting sending strategies, and maintaining a clean and engaged email list.

Q. What impact does the quality of email content have on email deliverability?

The quality of email content significantly influences email deliverability. Marketers should create relevant, engaging, and non-spammy content. Avoiding trigger words, optimizing HTML code, and personalizing content based on subscriber preferences are best practices that contribute to improved email deliverability.

Q. How can DTC marketers identify and prevent issues that may hurt email deliverability?

Proactively monitor metrics like bounce rates and spam complaints. Address issues promptly, remove inactive subscribers, and stay compliant with regulations.

Q. How do email deliverability rates impact the success of an email marketing campaign?

Deliverability rates directly impact campaign success. Improve rates with audience segmentation, personalized content, optimal send times, and regular list maintenance.

Q. What are the consequences of neglecting efforts to improve email deliverability?

Neglecting efforts can result in spam marking and damage to reputation. Prioritize ongoing education, invest in tools, and allocate resources for list maintenance and compliance.

Microsoft’s Azure AI Model Catalog Expands with Groundbreaking Artificial Intelligence Models

Microsoft has unveiled a significant expansion of its Azure AI Model Catalog, incorporating a range of foundation and generative AI models. This move marks a major advancement in the field of artificial intelligence, bringing together diverse and innovative technologies.

Diverse Additions to the AI Catalog

The Azure AI Model Catalog now includes 40 new models and introduces 4 new modalities, including text-to-image and image embedding capabilities. Key additions are:

Stable Diffusion Models: Developed by Stability AI and CompVis, these models excel in text-to-image and inpainting tasks, offering robust and consistent output for creative content generation.

Falcon Models from TII: Featuring 7 and 40 billion parameters, Falcon models are optimized for inferencing and surpass many open-source models in performance.

Code Llama from Meta: A range of generative text models designed to assist in coding tasks, ranging from 7 to 34 billion parameters.

NVIDIA Nemotron: This 8-billion parameter model from NVIDIA offers a variety of functionalities, including chat and Q&A, compatible with the NVIDIA NeMo Framework.

SAM (Segment Anything Model) from Meta: An image segmentation tool capable of creating high-quality object masks from simple input prompts.

Models as a Service (MaaS)

In a strategic move, Microsoft has also introduced the concept of Models as a Service (MaaS). This service will enable professional developers to integrate AI models such as Llama 2 from Meta, Command from Cohere, Jais from G42, and Mistral’s premium models into their applications as API endpoints. This spares developers the complexity of provisioning resources and managing hosting.
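
As a rough illustration of what consuming such an endpoint can look like, the sketch below posts a chat request over HTTPS. The endpoint URL, header names, and payload schema here are placeholders, not Azure’s documented contract.

```python
# A hedged sketch of calling a model exposed as an API endpoint, in the spirit
# of the MaaS offering described above. The endpoint URL, request schema, and
# header names are placeholders for illustration only.
import os
import requests

ENDPOINT = "https://<your-maas-endpoint>.example/v1/chat/completions"  # placeholder URL
API_KEY = os.environ["MAAS_API_KEY"]  # assumed to be provisioned alongside the endpoint

payload = {
    "messages": [{"role": "user", "content": "Summarize retrieval augmented generation in one sentence."}],
    "max_tokens": 128,
}
resp = requests.post(
    ENDPOINT,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```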

Innovative Models Highlight

Jais: A 13-billion parameter model developed by G42, trained on a dataset of 395 billion tokens, including 116 billion Arabic tokens. Jais is a significant stride forward for AI in the Arabic world.

Mistral: A large language model with 7.3 billion parameters, known for its faster inference and longer response sequences due to its grouped-query attention and sliding-window attention features.

Phi Models: Including Phi-1.5 and Phi-2, these transformers demonstrate improved reasoning capabilities and safety measures, with applications in a variety of fields from writing to logical reasoning.

Future-Focused Innovations

Microsoft’s commitment to responsible AI and cutting-edge innovation is evident in its approach to AI model development and integration. The company has focused on optimizing the user experience in the Azure AI model catalog and improving GPU utilization for better performance. Furthermore, Microsoft has introduced new optimizations for fine-tuning LLMs, addressing the challenges associated with GPU memory requirements.

In conclusion, Microsoft’s expansion of the Azure AI Model Catalog represents a significant leap in AI technology, offering a diverse array of models and services. These advancements not only cater to the needs of professional developers but also pave the way for innovative applications across various sectors.

Unlock Advancing AI Video Understanding with MM-VID for GPT-4V(ision)

Across the globe, individuals create myriad videos daily, including user-generated live streams, video-game live streams, short clips, movies, sports broadcasts, and advertising. As a versatile medium, videos convey information and content through various modalities, such as text, visuals, and audio. Developing methods capable of learning from these diverse modalities is crucial for designing cognitive machines with enhanced capabilities to analyze uncurated real-world videos, transcending the limitations of hand-curated datasets.

However, the richness of this representation introduces numerous challenges for exploring video understanding, particularly when confronting extended-duration videos. Grasping the nuances of long videos, especially those exceeding an hour, necessitates sophisticated methods of analyzing images and audio sequences across multiple episodes. This complexity increases with the need to extract information from diverse sources, distinguish speakers, identify characters, and maintain narrative coherence. Furthermore, answering questions based on video evidence demands a deep comprehension of the content, context, and subtitles.

In live streaming and gaming video, additional challenges emerge in processing dynamic environments in real-time, requiring semantic understanding and the ability to engage in long-term strategic planning.

In recent times, considerable progress has been achieved in large pre-trained and video-language models, showcasing their proficient reasoning capabilities for video content. However, these models are typically trained on concise clips (e.g., 10-second videos) or predefined action classes. Consequently, these models may encounter limitations in providing a nuanced understanding of intricate real-world videos.

The complexity of understanding real-world videos involves identifying individuals in the scene and discerning their actions. It is also necessary to pinpoint when and how these actions occur, and to recognize subtle nuances and visual cues across different scenes. The primary objective of this work is to confront these challenges and explore methodologies directly applicable to real-world video understanding. The approach involves deconstructing extended video content into coherent narratives and then employing these generated stories for video analysis.

Recent strides in Large Multimodal Models (LMMs), such as GPT-4V(ision), have marked significant breakthroughs in processing both input images and text for multimodal understanding. This has spurred interest in extending the application of LMMs to the video domain. The study reported in this article introduces MM-VID, a system that integrates specialized tools with GPT-4V for video understanding. The overview of the system is illustrated in the figure below.

Upon receiving an input video, MM-VID initiates multimodal pre-processing, encompassing scene detection and automatic speech recognition (ASR), to gather crucial information from the video. Subsequently, the input video is segmented into multiple clips based on the scene detection algorithm. GPT-4V is then employed, utilizing clip-level video frames as input to generate detailed descriptions for each video clip. Finally, GPT-4 produces a coherent script for the entire video, conditioned on clip-level video descriptions, ASR, and available video metadata. The generated script empowers MM-VID to execute a diverse array of video tasks.
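
The description above maps naturally onto a staged pipeline. The following Python skeleton is only a structural sketch of that flow; every helper is a hypothetical, unimplemented stand-in for the specialized tools and GPT-4V/GPT-4 calls the system integrates, not the authors’ code.

```python
# A structural sketch of the MM-VID flow described above. Every helper below is
# a hypothetical stand-in (left unimplemented) for the specialized tools and the
# GPT-4V / GPT-4 calls the system integrates.
def detect_scenes(video_path):
    """Would return a list of (start, end, sampled_frames) clips via scene detection."""
    raise NotImplementedError

def transcribe_audio(video_path):
    """Would return an ASR transcript for the full video."""
    raise NotImplementedError

def describe_clip_with_gpt4v(frames):
    """Would send clip-level frames to GPT-4V and return a text description."""
    raise NotImplementedError

def compose_script_with_gpt4(descriptions, transcript, metadata):
    """Would ask GPT-4 to fuse clip descriptions, ASR, and metadata into one script."""
    raise NotImplementedError

def mm_vid_pipeline(video_path, metadata):
    clips = detect_scenes(video_path)                    # 1. multimodal pre-processing
    transcript = transcribe_audio(video_path)            #    plus ASR
    descriptions = [describe_clip_with_gpt4v(frames)     # 2. GPT-4V description per clip
                    for (_, _, frames) in clips]
    return compose_script_with_gpt4(descriptions,        # 3. coherent script for the video
                                    transcript, metadata)
```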

Some examples taken from the study are reported below.

This was the summary of MM-VID, a novel AI system integrating specialized tools with GPT-4V for video understanding. If you are interested and want to learn more about it, please feel free to refer to the links cited below. 

Check out the Paper and Project Page. All credit for this research goes to the researchers of this project.


Humane Launches Revolutionary AI-Powered Wearable: The AI Pin

In a bold move that could redefine personal computing, Humane, a company founded by former Apple designers, has unveiled the AI Pin, a wearable device integrating advanced artificial intelligence. Priced at $699, the AI Pin is ready for pre-order on November 16, with a subscription model for added services.

Design and Functionality

The AI Pin, small and square-shaped, magnetically attaches to clothing, doubling as a hidden battery pack. Despite its compact size, the device is striking, featuring a smooth glass surface and aluminum housing.

Innovative Features

Powered by a Qualcomm Snapdragon processor with 4GB of RAM and 32GB of storage, the AI Pin uses smart sensors that respond to gestures. Notably, it activates by touch rather than always listening, enhancing privacy.

The AI Pin’s standout feature is its integration with OpenAI’s GPT-4, enabling it to provide personalized responses and perform tasks like call handling and real-time language translation. Its camera can recognize objects, and a built-in laser projector can display information on the user’s hand, creating a screenless display.

Privacy and Connectivity

Emphasizing privacy, Humane assures that user data will not train the AI but only enhance personalization. The device supports Bluetooth 5.1, allowing for private conversations with headsets.

Subscription and Services

The AI Pin comes with a $24 monthly subscription, offering unlimited calling, texting, and data through T-Mobile. Users can manage the device through Humane’s website, syncing contacts and accessing services like music.

Funding and Vision

Humane has raised $230 million, reflecting significant investor confidence. The company’s vision, as articulated by co-founder Imran Chaudhri, is to usher in a new era of seamless, screenless, and sensing personal computing.

Implications and Expectations

The AI Pin’s launch represents a significant leap in wearable technology, potentially setting new standards in how we interact with devices and AI. Its unique combination of features, along with Humane’s commitment to privacy, positions it as a notable contender in the evolving landscape of personal technology.