Meet TravelPlanner: A Comprehensive AI Benchmark Designed to Evaluate …

One of the most intriguing challenges is enabling AI agents to emulate human-like planning abilities. Such capabilities would allow these agents to navigate complex, real-world scenarios, a largely unmastered task. Traditional AI planning efforts have primarily focused on controlled environments with predictable variables and outcomes. However, the unpredictable nature of real-world settings, with their myriad constraints and variables, demands a far more sophisticated approach to planning.

Researchers from Fudan University, Ohio State University, and Pennsylvania State University, Meta AI have developed TravelPlanner, a comprehensive benchmark designed to assess AI agents’ planning skills in more lifelike situations. TravelPlanner is not just another dataset; it’s a meticulously crafted testbed that simulates the multifaceted task of planning travel. It challenges AI agents with a scenario many humans routinely handle: organizing a multi-day travel itinerary. This involves balancing various factors within a user’s specified needs, such as budget constraints, accommodation preferences, and transportation logistics.

The brilliance of TravelPlanner provides a sandbox environment enriched with nearly four million data records, including detailed information on cities, attractions, accommodations, and more. AI agents must use this wealth of data to craft travel plans that adhere to predefined constraints, such as staying within budget or selecting pet-friendly accommodations. This process requires the agent to engage in a series of decision-making steps, from choosing the right information-gathering tools to synthesizing the collected data into a coherent plan.

Despite the sophistication of current AI technologies, agents’ performance on the TravelPlanner benchmark has been notably modest. For instance, even advanced models like GPT-4, equipped with state-of-the-art language processing capabilities, achieved a success rate of only 0.6%. This result underscores the considerable gap between AI’s current planning capabilities and the demands of real-world task management. While AI can understand and generate human-like text to some great extent, translating this understanding into practical, real-world planning actions is a different challenge altogether.

The introduction of TravelPlanner represents a pivotal moment in AI research. It shifts the focus from traditional, constrained planning tasks to the broader, more complex domain of real-world problem-solving. This benchmark highlights the limitations of current AI models in handling dynamic, multifaceted planning tasks and sets a new direction for future research. By tackling the challenges presented by TravelPlanner, researchers can push the boundaries of what AI agents can achieve, moving closer to creating AI that can navigate the complexities of the real world with the same ease as humans.

In conclusion, TravelPlanner offers a unique and challenging platform for advancing AI planning capabilities. Its introduction into the field is a benchmark for AI performance and a beacon guiding future efforts. As AI continues to evolve, the quest to bridge the gap between theoretical planning models and their practical application in real-world scenarios remains a key frontier in research. TravelPlanner is at the forefront of this exciting journey.

Check out the Paper and Project. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and Google News. Join our 37k+ ML SubReddit, 41k+ Facebook Community, Discord Channel, and LinkedIn Group.

If you like our work, you will love our newsletter..

Don’t Forget to join our Telegram Channel

The post Meet TravelPlanner: A Comprehensive AI Benchmark Designed to Evaluate the Planning Abilities of Language Agents in Real-World Scenarios Across Multiple Dimensions appeared first on MarkTechPost.

Meet Functionary: A Language Model that can Interpret and Execute Func …

MeetKai has been a notable player in the dynamic field of conversational AI since its establishment in 2018. Originally known as a “Conversational Intelligence Company,” the company has undergone significant changes. The recent shift in focus from general language models (LLMs) to function calling led to the introduction of Functionary, an open-source LLM.

Functionary is a meticulously designed system for interpreting and executing functions or plugins. Its notable feature lies in its ability to activate functions judiciously, understand their output, and trigger them only when necessary. The vision for Functionary is clear – it is exclusively crafted for function calling, enabling distinct applications beyond the conventional use of language models.

The rationale behind choosing Functionary lies in its capability to empower LLMs to interact with the outside world beyond their internal knowledge base. It is particularly valuable for applications requiring up-to-date information and integration with external productivity tools. While larger models like GPT-4 may be comprehensive, they can be slow and expensive. Functionary offers a targeted solution, providing faster and more cost-effective inference with accuracy surpassing GPT-3.5 and approaching GPT-4.

Functionary’s drop-in compatibility is a noteworthy feature. It seamlessly integrates with OpenAI’s platform, allowing for effortless migration for existing users. Developers can incorporate Functionary into their systems, call Python functions, and extract results for responsive interactions. With support for platforms like PyTorch and vLLM, as well as tools like chatlab, Functionary integration becomes straightforward.

The introduction of Functionary aligns with MeetKai’s overarching vision for the metaverse. The company envisions a future where the metaverse goes beyond the digital realm and becomes an integral extension of our reality. By leveraging AI to create specific worlds, MeetKai aims to empower users and developers to craft unique experiences. Functionary represents the company’s commitment to this mission, enhancing AI-human interactions and elevating metaverse experiences. The ongoing efforts include:

Developing more advanced versions of the model.

Ensuring faster inference times.

Comprehensive integration across the platform.

The researchers invite Developers, innovators, and visionaries to join MeetKai to shape the future of applied Generative AI.

The post Meet Functionary: A Language Model that can Interpret and Execute Functions/Plugins appeared first on MarkTechPost.

Unveiling EVA-CLIP-18B: A Leap Forward in Open-Source Vision and Multi …

In recent years, LMMs have rapidly expanded, leveraging CLIP as a foundational vision encoder for robust visual representations and LLMs as versatile tools for reasoning across various modalities. However, while LLMs have grown to over 100 billion parameters, the vision models they rely on need to be bigger, hindering their potential. Scaling up contrastive language-image pretraining (CLIP) is essential to enhance both vision and multimodal models, bridging the gap and enabling more effective handling of diverse data types.

Researchers from the Beijing Academy of Artificial Intelligence and Tsinghua University have unveiled EVA-CLIP-18B, the largest open-source CLIP model yet, boasting 18 billion parameters. Despite training on just 6 billion samples, it achieves an impressive 80.7% zero-shot top-1 accuracy across 27 image classification benchmarks, surpassing prior models like EVA-CLIP. Notably, this advancement is achieved with a modest dataset of 2 billion image-text pairs, openly available and smaller than those used in other models. EVA-CLIP-18B showcases the potential of EVA-style weak-to-strong visual model scaling, with hopes of fostering further research in vision and multimodal foundation models.

EVA-CLIP-18B is the largest and most powerful open-source CLIP model, with 18 billion parameters. It outperforms its predecessor EVA-CLIP (5 billion parameters) and other open-source CLIP models by a large margin in terms of zero-shot top-1 accuracy on 27 image classification benchmarks. The principles of EVA and EVA-CLIP guide the scaling-up procedure of EVA-CLIP-18B. The EVA philosophy follows a weak-to-strong paradigm, where a small EVA-CLIP model serves as the vision encoder initialization for a larger EVA-CLIP model. This iterative scaling process stabilizes and accelerates the training of larger models.

EVA-CLIP-18B, an 18-billion-parameter CLIP model, is trained on a 2 billion image-text pairs dataset from LAION-2B and COYO-700M. Following the EVA and EVA-CLIP principles, it employs a weak-to-strong paradigm, where a smaller EVA-CLIP model initializes a larger one, stabilizing and expediting training. Evaluation across 33 datasets, including image and video classification and image-text retrieval, demonstrates its efficacy. The scaling process involves distilling knowledge from a small EVA-CLIP model to a larger EVA-CLIP, with the training dataset mostly fixed to showcase the effectiveness of the scaling philosophy. Notably, the approach yields sustained performance gains, exemplifying the effectiveness of progressive weak-to-strong scaling.

EVA-CLIP-18B, boasting 18 billion parameters, showcases outstanding performance across various image-related tasks. It achieves an impressive 80.7% zero-shot top-1 accuracy across 27 image classification benchmarks, surpassing its predecessor and other CLIP models by a significant margin. Moreover, linear probing on ImageNet-1K outperforms competitors like InternVL-C with an average top-1 accuracy of 88.9. Zero-shot image-text retrieval on Flickr30K and COCO datasets achieves an average recall of 87.8, significantly surpassing competitors. EVA-CLIP-18B exhibits robustness across different ImageNet variants, demonstrating its versatility and high performance across 33 widely used datasets.

In conclusion, EVA-CLIP-18B is the largest and highest-performing open-source CLIP model, boasting 18 billion parameters. Applying EVA’s weak-to-strong vision scaling principle achieves exceptional zero-shot top-1 accuracy across 27 image classification benchmarks. This scaling approach consistently improves performance without reaching saturation, pushing the boundaries of vision model capabilities. Notably, EVA-CLIP-18B exhibits robustness in visual representations, maintaining performance across various ImageNet variants, including adversarial ones. Its versatility and effectiveness are demonstrated across multiple datasets, spanning image classification, image-text retrieval, and video classification tasks, marking a significant advancement in CLIP model capabilities.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and Google News. Join our 36k+ ML SubReddit, 41k+ Facebook Community, Discord Channel, and LinkedIn Group.

If you like our work, you will love our newsletter..

Don’t Forget to join our Telegram Channel

The post Unveiling EVA-CLIP-18B: A Leap Forward in Open-Source Vision and Multimodal AI Models appeared first on MarkTechPost.

Code Llama 70B is now available in Amazon SageMaker JumpStart

Today, we are excited to announce that Code Llama foundation models, developed by Meta, are available for customers through Amazon SageMaker JumpStart to deploy with one click for running inference. Code Llama is a state-of-the-art large language model (LLM) capable of generating code and natural language about code from both code and natural language prompts. You can try out this model with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms, models, and ML solutions so you can quickly get started with ML. In this post, we walk through how to discover and deploy the Code Llama model via SageMaker JumpStart.
Code Llama
Code Llama is a model released by Meta that is built on top of Llama 2. This state-of-the-art model is designed to improve productivity for programming tasks for developers by helping them create high-quality, well-documented code. The models excel in Python, C++, Java, PHP, C#, TypeScript, and Bash, and have the potential to save developers’ time and make software workflows more efficient.
It comes in three variants, engineered to cover a wide variety of applications: the foundational model (Code Llama), a Python specialized model (Code Llama Python), and an instruction-following model for understanding natural language instructions (Code Llama Instruct). All Code Llama variants come in four sizes: 7B, 13B, 34B, and 70B parameters. The 7B and 13B base and instruct variants support infilling based on surrounding content, making them ideal for code assistant applications. The models were designed using Llama 2 as the base and then trained on 500 billion tokens of code data, with the Python specialized version trained on an incremental 100 billion tokens. The Code Llama models provide stable generations with up to 100,000 tokens of context. All models are trained on sequences of 16,000 tokens and show improvements on inputs with up to 100,000 tokens.
The model is made available under the same community license as Llama 2.
Foundation models in SageMaker
SageMaker JumpStart provides access to a range of models from popular model hubs, including Hugging Face, PyTorch Hub, and TensorFlow Hub, which you can use within your ML development workflow in SageMaker. Recent advances in ML have given rise to a new class of models known as foundation models, which are typically trained on billions of parameters and are adaptable to a wide category of use cases, such as text summarization, digital art generation, and language translation. Because these models are expensive to train, customers want to use existing pre-trained foundation models and fine-tune them as needed, rather than train these models themselves. SageMaker provides a curated list of models that you can choose from on the SageMaker console.
You can find foundation models from different model providers within SageMaker JumpStart, enabling you to get started with foundation models quickly. You can find foundation models based on different tasks or model providers, and easily review model characteristics and usage terms. You can also try out these models using a test UI widget. When you want to use a foundation model at scale, you can do so without leaving SageMaker by using pre-built notebooks from model providers. Because the models are hosted and deployed on AWS, you can rest assured that your data, whether used for evaluating or using the model at scale, is never shared with third parties.
Discover the Code Llama model in SageMaker JumpStart
To deploy the Code Llama 70B model, complete the following steps in Amazon SageMaker Studio:

On the SageMaker Studio home page, choose JumpStart in the navigation pane.
Search for Code Llama models and choose the Code Llama 70B model from the list of models shown. You can find more information about the model on the Code Llama 70B model card. The following screenshot shows the endpoint settings. You can change the options or use the default ones.
Accept the End User License Agreement (EULA) and choose Deploy. This will start the endpoint deployment process, as shown in the following screenshot.

Deploy the model with the SageMaker Python SDK
Alternatively, you can deploy through the example notebook by choosing Open Notebook within model detail page of Classic Studio. The example notebook provides end-to-end guidance on how to deploy the model for inference and clean up resources.
To deploy using notebook, we start by selecting an appropriate model, specified by the model_id. You can deploy any of the selected models on SageMaker with the following code:

from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id=”meta-textgeneration-llama-codellama-70b”)
predictor = model.deploy(accept_eula=False) # Change EULA acceptance to True

This deploys the model on SageMaker with default configurations, including default instance type and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel. Note that by default, accept_eula is set to False. You need to set accept_eula=True to deploy the endpoint successfully. By doing so, you accept the user license agreement and acceptable use policy as mentioned earlier. You can also download the license agreement.
Invoke a SageMaker endpoint
After the endpoint is deployed, you can carry out inference by using Boto3 or the SageMaker Python SDK. In the following code, we use the SageMaker Python SDK to call the model for inference and print the response:

def print_response(payload, response):
print(payload[“inputs”])
print(f”> {response[0][‘generated_text’]}”)
print(“n==================================n”)

The function print_response takes a payload consisting of the payload and model response and prints the output. Code Llama supports many parameters while performing inference:

max_length – The model generates text until the output length (which includes the input context length) reaches max_length. If specified, it must be a positive integer.
max_new_tokens – The model generates text until the output length (excluding the input context length) reaches max_new_tokens. If specified, it must be a positive integer.
num_beams – This specifies the number of beams used in the greedy search. If specified, it must be an integer greater than or equal to num_return_sequences.
no_repeat_ngram_size – The model ensures that a sequence of words of no_repeat_ngram_size is not repeated in the output sequence. If specified, it must be a positive integer greater than 1.
temperature – This controls the randomness in the output. Higher temperature results in an output sequence with low-probability words, and lower temperature results in an output sequence with high-probability words. If temperature is 0, it results in greedy decoding. If specified, it must be a positive float.
early_stopping – If True, text generation is finished when all beam hypotheses reach the end of sentence token. If specified, it must be Boolean.
do_sample – If True, the model samples the next word as per the likelihood. If specified, it must be Boolean.
top_k – In each step of text generation, the model samples from only the top_k most likely words. If specified, it must be a positive integer.
top_p – In each step of text generation, the model samples from the smallest possible set of words with cumulative probability top_p. If specified, it must be a float between 0 and 1.
return_full_text – If True, the input text will be part of the output generated text. If specified, it must be Boolean. The default value for it is False.
stop – If specified, it must be a list of strings. Text generation stops if any one of the specified strings is generated.

You can specify any subset of these parameters while invoking an endpoint. Next, we show an example of how to invoke an endpoint with these arguments.
Code completion
The following examples demonstrate how to perform code completion where the expected endpoint response is the natural continuation of the prompt.
We first run the following code:

prompt = “””
import socket

def ping_exponential_backoff(host: str):
“””

payload = {
“inputs”: prompt,
“parameters”: {“max_new_tokens”: 256, “temperature”: 0.2, “top_p”: 0.9},
}
response = predictor.predict(payload)
print_response(payload, response)

We get the following output:

“””
Pings the given host with exponential backoff.
“””
timeout = 1
while True:
try:
socket.create_connection((host, 80), timeout=timeout)
return
except socket.error:
timeout *= 2

For our next example, we run the following code:

prompt = “””
import argparse
def main(string: str):
print(string)
print(string[::-1])
if __name__ == “__main__”:
“””

payload = {
“inputs”: prompt,
“parameters”: {“max_new_tokens”: 256, “temperature”: 0.2, “top_p”: 0.9},
}
predictor.predict(payload)

We get the following output:

parser = argparse.ArgumentParser(description=’Reverse a string’)
parser.add_argument(‘string’, type=str, help=’String to reverse’)
args = parser.parse_args()
main(args.string)

Code generation
The following examples show Python code generation using Code Llama.
We first run the following code:

prompt = “””
Write a python function to traverse a list in reverse.
“””

payload = {
“inputs”: prompt,
“parameters”: {“max_new_tokens”: 256, “temperature”: 0.2, “top_p”: 0.9},
}
response = predictor.predict(payload)
print_response(payload, response)

We get the following output:

def reverse(list1):
for i in range(len(list1)-1,-1,-1):
print(list1[i])

list1 = [1,2,3,4,5]
reverse(list1)

For our next example, we run the following code:

prompt = “””
Write a python function to to carry out bubble sort.
“””

payload = {
“inputs”: prompt,
“parameters”: {“max_new_tokens”: 256, “temperature”: 0.1, “top_p”: 0.9},
}
response = predictor.predict(payload)
print_response(payload, response)

We get the following output:

def bubble_sort(arr):
n = len(arr)
for i in range(n):
for j in range(0, n-i-1):
if arr[j] > arr[j+1]:
arr[j], arr[j+1] = arr[j+1], arr[j]
return arr

arr = [64, 34, 25, 12, 22, 11, 90]
print(bubble_sort(arr))

These are some of the examples of code-related tasks using Code Llama 70B. You can use the model to generate even more complicated code. We encourage you to try it using your own code-related use cases and examples!
Clean up
After you have tested the endpoints, make sure you delete the SageMaker inference endpoints and the model to avoid incurring charges. Use the following code:

predictor.delete_endpoint()

Conclusion
In this post, we introduced Code Llama 70B on SageMaker JumpStart. Code Llama 70B is a state-of-the-art model for generating code from natural language prompts as well as code. You can deploy the model with a few simple steps in SageMaker JumpStart and then use it to carry out code-related tasks such as code generation and code infilling. As a next step, try using the model with your own code-related use cases and data.

About the authors
Dr. Kyle Ulrich is an Applied Scientist with the Amazon SageMaker JumpStart team. His research interests include scalable machine learning algorithms, computer vision, time series, Bayesian non-parametrics, and Gaussian processes. His PhD is from Duke University and he has published papers in NeurIPS, Cell, and Neuron.
Dr. Farooq Sabir is a Senior Artificial Intelligence and Machine Learning Specialist Solutions Architect at AWS. He holds PhD and MS degrees in Electrical Engineering from the University of Texas at Austin and an MS in Computer Science from Georgia Institute of Technology. He has over 15 years of work experience and also likes to teach and mentor college students. At AWS, he helps customers formulate and solve their business problems in data science, machine learning, computer vision, artificial intelligence, numerical optimization, and related domains. Based in Dallas, Texas, he and his family love to travel and go on long road trips.
June Won is a product manager with SageMaker JumpStart. He focuses on making foundation models easily discoverable and usable to help customers build generative AI applications. His experience at Amazon also includes mobile shopping application and last mile delivery.

Microsoft’s TAG-LLM: An AI Weapon for Decoding Complex Protein Struc …

The seamless integration of Large Language Models (LLMs) into the fabric of specialized scientific research represents a pivotal shift in the landscape of computational biology, chemistry, and beyond. Traditionally, LLMs excel in broad natural language processing tasks but falter when navigating the complex terrains of domains rich in specialized terminologies and structured data formats, such as protein sequences and chemical compounds. This limitation constrains the utility of LLMs in these critical areas and curtails the potential for AI-driven innovations that could revolutionize scientific discovery and application.

Addressing this challenge, a groundbreaking framework developed at Microsoft Research, TAG-LLM, emerges. It is designed to harness LLMs’ general capabilities while tailoring their prowess to specialized domains. At the heart of TAG-LLM lies a system of meta-linguistic input tags, ingeniously conditioning the LLM to navigate domain-specific landscapes adeptly. These tags, conceptualized as continuous vectors, are ingeniously appended to the model’s embedding layer, enabling it to recognize and process specialized content with unprecedented accuracy.

The ingenuity of TAG-LLM unfolds through a meticulously structured methodology comprising three stages. Initially, domain tags are cultivated using unsupervised data, capturing the essence of domain-specific knowledge. This foundational step is crucial, allowing the model to acquaint itself with the unique linguistic and symbolic representations endemic to each specialized field. Subsequently, these domain tags undergo a process of enrichment, being infused with task-relevant information that further refines their utility. The culmination of this process sees the introduction of function tags tailored to guide the LLM across a myriad of tasks within these specialized domains. This tripartite approach leverages the inherent knowledge embedded within LLMs and equips them with the flexibility and precision required for domain-specific tasks.

The prowess of TAG-LLM is vividly illustrated through its exemplary performance across a spectrum of tasks involving protein properties, chemical compound characteristics, and drug-target interactions. Compared to existing models and fine-tuning approaches, TAG-LLM demonstrates superior efficacy, underscored by its ability to outperform specialized models tailored to these tasks. This remarkable achievement is a testament to TAG-LLM’s robustness and highlights its potential to catalyze significant advancements in scientific research and applications.

Beyond its immediate applications, the implications of TAG-LLM extend far into scientific inquiry and discovery. TAG-LLM opens new avenues for leveraging AI to advance our understanding and capabilities within these fields by bridging the gap between general-purpose LLMs and the nuanced requirements of specialized domains. Its versatility and efficiency present a compelling solution to the challenges of applying AI to technical and scientific research, promising a future where AI-driven innovations are at the forefront of scientific breakthroughs and applications.

TAG-LLM stands as a beacon of innovation, embodying the confluence of AI and specialized scientific research. Its development addresses a critical challenge in applying LLMs to technical domains and sets the stage for a new era of scientific discovery powered by AI. The journey of TAG-LLM from concept to realization underscores the transformative potential of AI in revolutionizing our approach to scientific research, heralding a future where the boundaries of what can be achieved through AI-driven science are continually expanded.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and Google News. Join our 36k+ ML SubReddit, 41k+ Facebook Community, Discord Channel, and LinkedIn Group.

If you like our work, you will love our newsletter..

Don’t Forget to join our Telegram Channel

The post Microsoft’s TAG-LLM: An AI Weapon for Decoding Complex Protein Structures and Chemical Compounds! appeared first on MarkTechPost.

This AI Paper Unveils Mixed-Precision Training for Fourier Neural Oper …

Neural operators, specifically the Fourier Neural Operators (FNO), have revolutionized how researchers approach solving partial differential equations (PDEs), a cornerstone problem in science and engineering. These operators have shown exceptional promise in learning mappings between function spaces, pivotal for accurately simulating phenomena like climate modeling and fluid dynamics. Despite their potential, the substantial computational resources required for training these models, especially in GPU memory and processing power, pose significant challenges.

The research’s core problem lies in optimizing neural operator training to make it more feasible for real-world applications. Traditional training approaches demand high-resolution data, which in turn requires extensive memory and computational time, limiting the scalability of these models. This issue is particularly pronounced when deploying neural operators for solving complex PDEs across various scientific domains.

While effective, current methodologies for training neural operators need to work on memory usage and computational speed inefficiencies. These limitations become stark barriers when dealing with high-resolution data, a necessity for ensuring the accuracy and reliability of solutions produced by neural operators. As such, there is a pressing need for innovative approaches that can mitigate these challenges without compromising on model performance.

The research introduces a mixed-precision training technique for neural operators, notably the FNO, aiming to reduce memory requirements and enhance training speed significantly. This method leverages the inherent approximation error in neural operator learning, arguing that full precision in training is not always necessary. By rigorously analyzing the approximation and precision errors within FNOs, the researchers establish that a strategic reduction in precision can maintain a tight approximation bound, thus preserving the model’s accuracy while optimizing memory use.

Delving deeper, the proposed method optimizes tensor contractions, a memory-intensive step in FNO training, by employing a targeted approach to reduce precision. This optimization addresses the limitations of existing mixed-precision techniques. Through extensive experiments, it demonstrates a reduction in GPU memory usage by up to 50% and an improvement in training throughput by 58% without significant loss in accuracy.

The remarkable outcomes of this research showcase the method’s effectiveness across various datasets and neural operator models, underscoring its potential to transform neural operator training. By achieving similar levels of accuracy with significantly lower computational resources, this mixed-precision training approach paves the way for more scalable and efficient solutions to complex PDE-based problems in science and engineering.

In conclusion, the presented research provides a compelling solution to the computational challenges of training neural operators to solve PDEs. By introducing a mixed-precision training method, the research team has opened new avenues for making these powerful models more accessible and practical for real-world applications. The approach conserves valuable computational resources and maintains the high accuracy essential for scientific computations, marking a significant step forward in the field of computational science.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and Google News. Join our 36k+ ML SubReddit, 41k+ Facebook Community, Discord Channel, and LinkedIn Group.

If you like our work, you will love our newsletter..

Don’t Forget to join our Telegram Channel

The post This AI Paper Unveils Mixed-Precision Training for Fourier Neural Operators: Bridging Efficiency and Precision in High-Resolution PDE Solutions appeared first on MarkTechPost.

Meet Hawkeye: A Unified Deep Learning-based Fine-Grained Image Recogni …

In recent years, notable advancements in the design and training of deep learning models have led to significant improvements in image recognition performance, particularly on large-scale datasets. Fine-Grained Image Recognition (FGIR) represents a specialized domain focusing on the detailed recognition of subcategories within broader semantic categories. Despite the progress facilitated by deep learning, FGIR remains a formidable challenge, with wide-ranging applications in smart cities, public safety, ecological protection, and agricultural production.

The primary hurdle in FGIR revolves around discerning subtle visual disparities crucial for distinguishing objects with highly similar overall appearances but varying fine-grained features. Existing FGIR methods can generally be categorized into three paradigms: recognition by localization-classification subnetworks, recognition by end-to-end feature encoding, and recognition with external information.

While some methods from these paradigms have been made available as open-source, a unified open-needs-to-be library currently lacks. This absence poses a significant obstacle for new researchers entering the field, as different methods often rely on disparate deep-learning frameworks and architectural designs, necessitating a steep learning curve for each. Moreover, the absence of a unified library often compels researchers to develop their code from scratch, leading to redundant efforts and less reproducible results due to variations in frameworks and setups.

To tackle this, researchers at the Nanjing University of Science and Technology introduce Hawkeye, a PyTorch-based library for Fine-Grained Image Recognition (FGIR) built upon a modular architecture, prioritizing high-quality code and human-readable configuration. With its deep learning capabilities, Hawkeye offers a comprehensive solution tailored specifically for FGIR tasks.

Hawkeye encompasses 16 representative methods spanning six paradigms in FGIR, providing researchers with a holistic understanding of current state-of-the-art techniques. Its modular design facilitates easy integration of custom methods or enhancements, enabling fair comparisons with existing approaches. The FGIR training pipeline in Hawkeye is structured into multiple modules integrated within a unified pipeline. Users can override specific modules, ensuring flexibility and customization while minimizing code modifications.

Emphasizing code readability, Hawkeye simplifies each module within the pipeline to enhance comprehensibility. This approach aids beginners in quickly grasping the training process and the functions of each component.

Hawkeye provides YAML configuration files for each method, allowing users to conveniently modify hyperparameters related to the dataset, model, optimizer, etc. This streamlined approach enables users to efficiently tailor experiments to their specific requirements.

Check out the Paper and Github. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and Google News. Join our 36k+ ML SubReddit, 41k+ Facebook Community, Discord Channel, and LinkedIn Group.

If you like our work, you will love our newsletter..

Don’t Forget to join our Telegram Channel

The post Meet Hawkeye: A Unified Deep Learning-based Fine-Grained Image Recognition Toolbox Built on PyTorch appeared first on MarkTechPost.

Detect anomalies in manufacturing data using Amazon SageMaker Canvas

With the use of cloud computing, big data and machine learning (ML) tools like Amazon Athena or Amazon SageMaker have become available and useable by anyone without much effort in creation and maintenance. Industrial companies increasingly look at data analytics and data-driven decision-making to increase resource efficiency across their entire portfolio, from operations to performing predictive maintenance or planning.
Due to the velocity of change in IT, customers in traditional industries are facing a dilemma of skillset. On the one hand, analysts and domain experts have a very deep knowledge of the data in question and its interpretation, yet often lack the exposure to data science tooling and high-level programming languages such as Python. On the other hand, data science experts often lack the experience to interpret the machine data content and filter it for what is relevant. This dilemma hampers the creation of efficient models that use data to generate business-relevant insights.
Amazon SageMaker Canvas addresses this dilemma by providing domain experts a no-code interface to create powerful analytics and ML models, such as forecasts, classification, or regression models. It also allows you to deploy and share these models with ML and MLOps specialists after creation.
In this post, we show you how to use SageMaker Canvas to curate and select the right features in your data, and then train a prediction model for anomaly detection, using the no-code functionality of SageMaker Canvas for model tuning.
Anomaly detection for the manufacturing industry
At the time of writing, SageMaker Canvas focuses on typical business use cases, such as forecasting, regression, and classification. For this post, we demonstrate how these capabilities can also help detect complex abnormal data points. This use case is relevant, for instance, to pinpoint malfunctions or unusual operations of industrial machines.
Anomaly detection is important in the industry domain, because machines (from trains to turbines) are normally very reliable, with times between failures spanning years. Most data from these machines, such as temperature senor readings or status messages, describes the normal operation and has limited value for decision-making. Engineers look for abnormal data when investigating root causes for a fault or as warning indicators for future faults, and performance managers examine abnormal data to identify potential improvements. Therefore, the typical first step in moving towards data-driven decision-making relies on finding that relevant (abnormal) data.
In this post, we use SageMaker Canvas to curate and select the right features in data, and then train a prediction model for anomaly detection, using SageMaker Canvas no-code functionality for model tuning. Then we deploy the model as a SageMaker endpoint.
Solution overview
For our anomaly detection use case, we train a prediction model to predict a characteristic feature for the normal operation of a machine, such as the motor temperature indicated in a car, from influencing features, such as the speed and recent torque applied in the car. For anomaly detection on a new sample of measurements, we compare the model predictions for the characteristic feature with the observations provided.
For the example of the car motor, a domain expert obtains measurements of the normal motor temperature, recent motor torque, ambient temperature, and other potential influencing factors. These allow you to train a model to predict the temperature from the other features. Then we can use the model to predict the motor temperature on a regular basis. When the predicted temperature for that data is similar to the observed temperature in that data, the motor is working normally; a discrepancy will point to an anomaly, such as the cooling system failing or a defect in the motor.
The following diagram illustrates the solution architecture.

The solution consists of four key steps:

The domain expert creates the initial model, including data analysis and feature curation using SageMaker Canvas.
The domain expert shares the model via the Amazon SageMaker Model Registry or deploys it directly as a real-time endpoint.
An MLOps expert creates the inference infrastructure and code translating the model output from a prediction into an anomaly indicator. This code typically runs inside an AWS Lambda function.
When an application requires an anomaly detection, it calls the Lambda function, which uses the model for inference and provides the response (whether or not it’s an anomaly).

Prerequisites
To follow along with this post, you must meet the following prerequisites:

The domain expert user has access to Sagemaker Canvas.
The MLOps expert user has access to a SageMaker notebook and the AWS Management Console. For more information, refer to Getting Started with the AWS Management Console.
The domain expert has access to the dataset they want to use to train their anomaly detection model in CSV or another format that SageMaker Canvas supports.

Create the model using SageMaker
The model creation process follows the standard steps to create a regression model in SageMaker Canvas. For more information, refer to Getting started with using Amazon SageMaker Canvas.
First, the domain expert loads relevant data into SageMaker Canvas, such as a time series of measurements. For this post, we use a CSV file containing the (synthetically generated) measurements of an electrical motor. For details, refer to Import data into Canvas. The sample data used is available for download as a CSV.

Curate the data with SageMaker Canvas
After the data is loaded, the domain expert can use SageMaker Canvas to curate the data used in the final model. For this, the expert selects those columns that contain characteristic measurements for the problem in question. More precisely, the expert selects columns that are related to each other, for instance, by a physical relationship such as a pressure-temperature curve, and where a change in that relationship is a relevant anomaly for their use case. The anomaly detection model will learn the normal relationship between the selected columns and indicate when data doesn’t conform to it, such as an abnormally high motor temperature given the current load on the motor.
In practice, the domain expert needs to select a set of suitable input columns and a target column. The inputs are typically the collection of quantities (numeric or categorical) that determine a machine’s behavior, from demand settings, to load, speed, or ambient temperature. The output is typically a numeric quantity that indicates the performance of the machine’s operation, such as a temperature measuring energy dissipation or another performance metric changing when the machine runs under suboptimal conditions.
To illustrate the concept of what quantities to select for input and output, let’s consider a few examples:

For rotating equipment, such as the model we build in this post, typical inputs are the rotation speed, torque (current and history), and ambient temperature, and the targets are the resulting bearing or motor temperatures indicating good operational conditions of the rotations
For a wind turbine, typical inputs are the current and recent history of wind speed and rotor blade settings, and the target quantity is the produced power or rotational speed
For a chemical process, typical inputs are the percentage of different ingredients and the ambient temperature, and targets are the heat produced or the viscosity of the end product
For moving equipment such as sliding doors, typical inputs are the power input to the motors, and the target value is the speed or completion time for the movement
For an HVAC system, typical inputs are the achieved temperature difference and load settings, and the target quantity is the energy consumption measured

Ultimately, the right inputs and targets for a given equipment will depend on the use case and anomalous behavior to detect, and are best known to a domain expert who is familiar with the intricacies of the specific dataset.
In most cases, selecting suitable input and target quantities means selecting the right columns only and marking the target column (for this example, bearing_temperature). However, a domain expert can also use the no-code features of SageMaker Canvas to transform columns and refine or aggregate the data. For instance, you can extract or filter specific dates or timestamps from the data that are not relevant. SageMaker Canvas supports this process, showing statistics on the quantities selected, allowing you to understand if a quantity has outliers and spread that may affect the results of the model.
Train, tune, and evaluate the model
After the domain expert has selected suitable columns in the dataset, they can train the model to learn the relationship between the inputs and outputs. More precisely, the model will learn to predict the target value selected from the inputs.
Normally, you can use the SageMaker Canvas Model Preview option. This provide a quick indication of the model quality to expect, and allows you to investigate the effect that different inputs have on the output metric. For instance, in the following screenshot, the model is most affected by the motor_speed and ambient_temperature metrics when predicting bearing_temperature. This is sensible, because these temperatures are closely related. At the same time, additional friction or other means of energy loss are likely to affect this.
For the model quality, the RMSE of the model is an indicator how well the model was able to learn the normal behavior in the training data and reproduce the relationships between the input and output measures. For instance, in the following model, the model should be able to predict the correct motor_bearing temperature within 3.67 degrees Celsius, so we can consider a deviation of the real temperature from a model prediction that is larger than, for example, 7.4 degrees as an anomaly. The real threshold that you would use, however, will depend on the sensitivity required in the deployment scenario.

Finally, after the model evaluation and tuning is finished, you can start the complete model training that will create the model to use for inference.
Deploy the model
Although SageMaker Canvas can use a model for inference, productive deployment for anomaly detection requires you to deploy the model outside of SageMaker Canvas. More precisely, we need to deploy the model as an endpoint.
In this post and for simplicity, we deploy the model as an endpoint from SageMaker Canvas directly. For instructions, refer to Deploy your models to an endpoint. Make sure to take note of the deployment name and consider the pricing of the instance type you deploy to (for this post, we use ml.m5.large). SageMaker Canvas will then create a model endpoint that can be called to obtain predictions.

In industrial settings, a model needs to undergo thorough testing before it can be deployed. For this, the domain expert will not deploy it, but instead share the model to the SageMaker Model Registry. Here, an MLOps operations expert can take over. Typically, that expert will test the model endpoint, evaluate the size of computing equipment required for the target application, and determine most cost-efficient deployment, such as deployment for serverless inference or batch inference. These steps are normally automated (for instance, using Amazon Sagemaker Pipelines or the Amazon SDK).

Use the model for anomaly detection
In the previous step, we created a model deployment in SageMaker Canvas, called canvas-sample-anomaly-model. We can use it to obtain predictions of a bearing_temperature value based on the other columns in the dataset. Now, we want to use this endpoint to detect anomalies.
To identify anomalous data, our model will use the prediction model endpoint to get the expected value of the target metric and then compare the predicted value against the actual value in the data. The predicted value indicates the expected value for our target metric based on the training data. The difference of this value therefore is a metric for the abnormality of the actual data observed. We can use the following code:

# We are using pandas dataframes for data handling
import pandas as pd
import boto3,json
sm_runtime_client = boto3.client(‘sagemaker-runtime’)

# Configuration of the actual model invocation
endpoint_name=”canvas-sample-anomaly-model”
# Name of the column in the input data to compare with predictions
TARGET_COL=’bearing_temperature’

def do_inference(data, endpoint_name):
# Example Code provided by Sagemaker Canvas
body = data.to_csv(header=False, index=True).encode(“utf-8”)
response = sm_runtime_client.invoke_endpoint(Body = body,
EndpointName = endpoint_name,
ContentType = “text/csv”,
Accept = “application/json”,
)
return json.loads(response[“Body”].read())

def input_transformer(input_data, drop_cols = [ TARGET_COL ] ):
# Transform the input: Drop the Target column
return input_data.drop(drop_cols,axis =1 )

def output_transformer(input_data,response):
# Take the initial input data and compare it to the response of the prediction model
scored = input_data.copy()
scored.loc[ input_data.index,’prediction_’+TARGET_COL ] = pd.DataFrame(
response[ ‘predictions’ ],
index = input_data.index
)[‘score’]
scored.loc[ input_data.index,’error’ ] = (
scored[ TARGET_COL ]-scored[ ‘prediction_’+TARGET_COL ]
).abs()
return scored

# Run the inference
raw_input = pd.read_csv(MYFILE) # Read my data for inference
to_score = input_transformer(raw_input) # Prepare the data
predictions = do_inference(to_score, endpoint_name) # create predictions
results = output_transformer(to_score,predictions) # compare predictions & actuals

The preceding code performs the following actions:

The input data is filtered down to the right features (function “input_transformer“).
The SageMaker model endpoint is invoked with the filtered data (function “do_inference“), where we handle input and output formatting according to the sample code provided when opening the details page of our deployment in SageMaker Canvas.
The result of the invocation is joined to the original input data and the difference is stored in the error column (function “output_transform“).

Find anomalies and evaluate anomalous events
In a typical setup, the code to obtain anomalies is run in a Lambda function. The Lambda function can be called from an application or Amazon API Gateway. The main function returns an anomaly score for each row of the input data—in this case, a time series of an anomaly score.
For testing, we can also run the code in a SageMaker notebook. The following graphs show the inputs and output of our model when using the sample data. Peaks in the deviation between predicted and actual values (anomaly score, shown in the lower graph) indicate anomalies. For instance, in the graph, we can see three distinct peaks where the anomaly score (difference between expected and real temperature) surpasses 7 degrees Celsius: the first after a long idle time, the second at a steep drop of bearing_temperature, and the last where bearing_temperature is high compared to motor_speed.

In many cases, knowing the time series of the anomaly score is already sufficient; you can set up a threshold for when to warn of a significant anomaly based on the need for model sensitivity. The current score then indicates that a machine has an abnormal state that needs investigation. For instance, for our model, the absolute value of the anomaly score is distributed as shown in the following graph. This confirms that most anomaly scores are below the (2xRMS=)8 degrees found during training for the model as the typical error. The graph can help you choose a threshold manually, such that the right percentage of the evaluated samples are marked as anomalies.

If the desired output are events of anomalies, then the anomaly scores provided by the model require refinement to be relevant for business use. For this, the ML expert will typically add postprocessing to remove noise or large peaks on the anomaly score, such as adding a rolling mean. In addition, the expert will typically evaluate the anomaly score by a logic similar to raising an Amazon CloudWatch alarm, such as monitoring for the breach of a threshold over a specific duration. For more information about setting up alarms, refer to Using Amazon CloudWatch alarms. Running these evaluations in the Lambda function allows you to send warnings, for instance, by publishing a warning to an Amazon Simple Notification Service (Amazon SNS) topic.
Clean up
After you have finished using this solution, you should clean up to avoid unnecessary cost:

In SageMaker Canvas, find your model endpoint deployment and delete it.
Log out of SageMaker Canvas to avoid charges for it running idly.

Summary
In this post, we showed how a domain expert can evaluate input data and create an ML model using SageMaker Canvas without the need to write code. Then we showed how to use this model to perform real-time anomaly detection using SageMaker and Lambda through a simple workflow. This combination empowers domain experts to use their knowledge to create powerful ML models without additional training in data science, and enables MLOps experts to use these models and make them available for inference flexibly and efficiently.
A 2-month free tier is available for SageMaker Canvas, and afterwards you only pay for what you use. Start experimenting today and add ML to make the most of your data.

About the author
Helge Aufderheide is an enthusiast of making data usable in the real world with a strong focus on Automation, Analytics and Machine Learning in Industrial Applications, such as Manufacturing and Mobility.

Enhance Amazon Connect and Lex with generative AI capabilities

Effective self-service options are becoming increasingly critical for contact centers, but implementing them well presents unique challenges.
Amazon Lex provides your Amazon Connect contact center with chatbot functionalities such as automatic speech recognition (ASR) and natural language understanding (NLU) capabilities through voice and text channels. The bot takes natural language speech or text input, recognizes the intent behind the input, and fulfills the user’s intent by invoking the appropriate response.
Callers can have diverse accents, pronunciation, and grammar. Combined with background noise, this can make it challenging for speech recognition to accurately understand statements. For example, “I want to track my order” may be misrecognized as “I want to truck my holder.” Failed intents like these frustrate customers who have to repeat themselves, get routed incorrectly, or are escalated to live agents—costing businesses more.
Amazon Bedrock democratizes foundational model (FM) access for developers to effortlessly build and scale generative AI-based applications for the modern contact center. FMs delivered by Amazon Bedrock, such as Amazon Titan and Anthropic Claude, are pretrained on internet-scale datasets that gives them strong NLU capabilities such as sentence classification, question and answer, and enhanced semantic understanding despite speech recognition errors.
In this post, we explore a solution that uses FMs delivered by Amazon Bedrock to enhance intent recognition of Amazon Lex integrated with Amazon Connect, ultimately delivering an improved self-service experience for your customers.
Overview of solution
The solution uses Amazon Connect, Amazon Lex , AWS Lambda, and Amazon Bedrock in the following steps:

An Amazon Connect contact flow integrates with an Amazon Lex bot via the GetCustomerInput block.
When the bot fails to recognize the caller’s intent and defaults to the fallback intent, a Lambda function is triggered.
The Lambda function takes the transcript of the customer utterance and passes it to a foundation model in Amazon Bedrock
Using its advanced natural language capabilities, the model determines the caller’s intent.
The Lambda function then directs the bot to route the call to the correct intent for fulfillment.

By using Amazon Bedrock foundation models, the solution enables the Amazon Lex bot to understand intents despite speech recognition errors. This results in smooth routing and fulfillment, preventing escalations to agents and frustrating repetitions for callers.
The following diagram illustrates the solution architecture and workflow.

In the following sections, we look at the key components of the solution in more detail.
Lambda functions and the LangChain Framework
When the Amazon Lex bot invokes the Lambda function, it sends an event message that contains bot information and the transcription of the utterance from the caller. Using this event message, the Lambda function dynamically retrieves the bot’s configured intents, intent description, and intent utterances and builds a prompt using LangChain, which is an open source machine learning (ML) framework that enables developers to integrate large language models (LLMs), data sources, and applications.
An Amazon Bedrock foundation model is then invoked using the prompt and a response is received with the predicted intent and confidence level. If the confidence level is greater than a set threshold, for example 80%, the function returns the identified intent to Amazon Lex with an action to delegate. If the confidence level is below the threshold, it defaults back to the default FallbackIntent and an action to close it.
In-context learning, prompt engineering, and model invocation
We use in-context learning to be able to use a foundation model to accomplish this task. In-context learning is the ability for LLMs to learn the task using just what’s in the prompt without being pre-trained or fine-tuned for the particular task.
In the prompt, we first provide the instruction detailing what needs to be done. Then, the Lambda function dynamically retrieves and injects the Amazon Lex bot’s configured intents, intent descriptions, and intent utterances into the prompt. Finally, we provide it instructions on how to output its thinking and final result.
The following prompt template was tested on text generation models Anthropic Claude Instant v1.2 and Anthropic Claude v2. We use XML tags to better improve the performance of the model. We also add room for the model to think before identifying the final intent to better improve its reasoning for choosing the right intent. The {intent_block} contains the intent IDs, intent descriptions, and intent utterances. The {input} block contains the transcribed utterance from the caller. Three backticks (“`) are added at the end to help the model output a code block more consistently. A <STOP> sequence is added to stop it from generating further.

“””
Human: You are a call center agent. You try to understand the intent given an utterance from the caller.

The available intents are as follows, the intent of the caller is highly likely to be one of these.
<intents>
{intents_block} </intents>
The output format is:
<thinking>
</thinking>

<output>
{{
“intent_id”: intent_id,
“confidence”: confidence
}}
</output><STOP>

For the given utterance, you try to categorize the intent of the caller to be one of the intents in <intents></intents> tags.
If it does not match any intents or the utterance is blank, respond with FALLBCKINT and confidence of 1.0.
Respond with the intent name and confidence between 0.0 and 1.0.
Put your thinking in <thinking></thinking> tags before deciding on the intent.

Utterance: {input}

Assistant: “`”””

After the model has been invoked, we receive the following response from the foundation model:

<thinking>
The given utterance is asking for checking where their shipment is. It matches the intent order status.
</thinking>

{
“intent”: “ORDERSTATUSID”,
“confidence”: 1.0
}
“`

Filter available intents based on contact flow session attributes
When using the solution as part of an Amazon Connect contact flow, you can further enhance the ability of the LLM to identify the correct intent by specifying the session attribute available_intents in the “Get customer input” block with a comma-separated list of intents, as shown in the following screenshot. By doing so, the Lambda function will only include these specified intents as part of the prompt to the LLM, reducing the number of intents that the LLM has to reason through. If the available_intents session attribute is not specified, all intents in the Amazon Lex bot will be used by default.

Lambda function response to Amazon Lex
After the LLM has determined the intent, the Lambda function responds in the specific format required by Amazon Lex to process the response.
If a matching intent is found above the confidence threshold, it returns a dialog action type Delegate to instruct Amazon Lex to use the selected intent and subsequently return the completed intent back to Amazon Connect. The response output is as follows:

{
“sessionState”: {
“dialogAction”: {
“type”: “Delegate”
},
“intent”: {
“name”: intent,
“state”: “InProgress”,
}
}
}

If the confidence level is below the threshold or an intent was not recognized, a dialog action type Close is returned to instruct Amazon Lex to close the FallbackIntent, and return the control back to Amazon Connect. The response output is as follows:

{
“sessionState”: {
“dialogAction”: {
“type”: “Close”
},
“intent”: {
“name”: intent,
“state”: “Fulfilled”,
}
}
}

The complete source code for this sample is available in GitHub.
Prerequisites
Before you get started, make sure you have the following prerequisites:

A basic understanding of the Amazon Connect contact center solution using Amazon Lex and Amazon Bedrock
An AWS account with an AWS Identity and Access Management (IAM) user with permissions to deploy the CloudFormation template
The AWS Command Line Interface (AWS CLI) installed and configured for use
Docker installed and running for building the Lambda container image
Python 3.9 or later, to package Python code for the Lambda function
jq installed

Implement the solution
To implement the solution, complete the following steps:

Clone the repository

git clone https://github.com/aws-samples/amazon-connect-with-amazon-lex-genai-capabilities
cd amazon-connect-with-amazon-lex-genai-capabilities

Run the following command to initialize the environment and create an Amazon Elastic Container Registry (Amazon ECR) repository for our Lambda function’s image. Provide the AWS Region and ECR repository name that you would like to create.

bash ./scripts/build.sh region-name repository-name

Update the ParameterValue fields in the scripts/parameters.json file:

ParameterKey (“AmazonECRImageUri”) – Enter the repository URL from the previous step.
ParameterKey (“AmazonConnectName”) – Enter a unique name.
ParameterKey (“AmazonLexBotName”) – Enter a unique name.
ParameterKey (“AmazonLexBotAliasName”) – The default is “prodversion”; you can change it if needed.
ParameterKey (“LoggingLevel”) – The default is “INFO”; you can change it if required. Valid values are DEBUG, WARN, and ERROR.
ParameterKey (“ModelID”) – The default is “anthropic.claude-instant-v1”; you can change it if you need to use a different model.
ParameterKey (“AmazonConnectName”) – The default is “0.75”; you can change it if you need to update the confidence score.

Run the command to generate the CloudFormation stack and deploy the resources:

bash ./scripts/deploy.sh region cfn-stack-name

If you don’t want to build the contact flow from scratch in Amazon Connect, you can import the sample flow provided with this repository filelocation: /contactflowsample/samplecontactflow.json.

Log in to your Amazon Connect instance. The account must be assigned a security profile that includes edit permissions for flows.
On the Amazon Connect console, in the navigation pane, under Routing, choose Contact flows.
Create a new flow of the same type as the one you are importing.
Choose Save and Import flow.
Select the file to import and choose Import.

When the flow is imported into an existing flow, the name of the existing flow is updated, too.

Review and update any resolved or unresolved references as necessary.
To save the imported flow, choose Save. To publish, choose Save and Publish.
After you upload the contact flow, update the following configurations:

Update the GetCustomerInput blocks with the correct Amazon Lex bot name and version.
Under Manage Phone Number, update the number with the contact flow or IVR imported earlier.

Verify the configuration
Verify that the Lambda function created with the CloudFormation stack has an IAM role with permissions to retrieve bots and intent information from Amazon Lex (list and read permissions), and appropriate Amazon Bedrock permissions (list and read permissions).
In your Amazon Lex bot, for your configured alias and language, verify that the Lambda function was set up correctly. For the FallBackIntent, confirm that Fulfillmentis set to Active to be able to run the function whenever the FallBackIntent is triggered.

At this point, your Amazon Lex bot will automatically run the Lambda function and the solution should work seamlessly.
Test the solution
Let’s look at a sample intent, description, and utterance configuration in Amazon Lex and see how well the LLM performs with sample inputs that contains typos, grammar mistakes, and even a different language.
The following figure shows screenshots of our example. The left side shows the intent name, its description, and a single-word sample utterance. Without much configuration on Amazon Lex, the LLM is able to predict the correct intent (right side). In this test, we have a simple fulfillment message from the correct intent.

Clean up
To clean up your resources, run the following command to delete the ECR repository and CloudFormation stack:

bash ./scripts/cleanup.sh region repository-name cfn-stack-name

Conclusion
By using Amazon Lex enhanced with LLMs delivered by Amazon Bedrock, you can improve the intent recognition performance of your bots. This provides a seamless self-service experience for a diverse set of customers, bridging the gap between accents and unique speech characteristics, and ultimately enhancing customer satisfaction.
To dive deeper and learn more about generative AI, check out these additional resources:

How contact center leaders can prepare for generative AI
Generative AI Use Cases
How Technology Leaders Can Prepare for Generative AI
How Your Organization Can Prepare for Generative AI
An introduction to generative AI with Swami Sivasubramanian

For more information on how you can experiment with the generative AI-powered self-service solution, see Deploy self-service question answering with the QnABot on AWS solution powered by Amazon Lex with Amazon Kendra and large language models.

About the Authors
Hamza Nadeem is an Amazon Connect Specialist Solutions Architect at AWS, based in Toronto. He works with customers throughout Canada to modernize their Contact Centers and provide solutions to their unique customer engagement challenges and business requirements. In his spare time, Hamza enjoys traveling, soccer and trying new recipes with his wife.
Parag Srivastava is a Solutions Architect at Amazon Web Services (AWS), helping enterprise customers with successful cloud adoption and migration. During his professional career, he has been extensively involved in complex digital transformation projects. He is also passionate about building innovative solutions around geospatial aspects of addresses.
Ross Alas is a Solutions Architect at AWS based in Toronto, Canada. He helps customers innovate with AI/ML and Generative AI solutions that leads to real business outcomes. He has worked with a variety of customers from retail, financial services, technology, pharmaceutical, and others. In his spare time, he loves the outdoors and enjoying nature with his family.
Sangeetha Kamatkar is a Solutions Architect at Amazon Web Services (AWS), helping customers with successful cloud adoption and migration. She works with customers to craft highly scalable, flexible, and resilient cloud architectures that address customer business problems. In her spare time, she listens to music, watch movies and enjoy gardening during summer time.

Customers.ai + Advertisers = A Match Made in Heaven

In the ever-evolving landscape of digital marketing, the ability to target and retarget audiences with precision is not just an advantage; it’s the backbone of successful advertising campaigns. 

Yet, this critical component of digital strategy faces unprecedented challenges. The rise of ad blockers, stringent privacy regulations, and the gradual phasing out of third-party cookies are creating significant obstacles for advertisers on platforms like Facebook. 

These hurdles compromise the effectiveness of tools such as the Facebook pixel, long relied upon for tracking website visitors and refining ad targeting strategies.

Enter our Website Visitor ID X-Ray pixel: Whether the Facebook pixel is blocked or third-party cookies are rendered obsolete, our tool ensures that your ad targeting remains not only viable but exceptionally precise. 

How? By identifying 20% of your anonymous web visitors. These contacts are easily synced back to Facebook to build super powerful retargeting and lookalike audiences (that are way more effective than what Facebook was providing). 

This blog post will delve into the synergy between our innovative product and Facebook ads, a pairing we believe is a match made in heaven. 

We’ll explore how our tool not only complements but significantly enhances Facebook ad performance by enabling advertisers to build powerful retargeting and lookalike audiences. Join us as we unveil how to turn the challenges of today’s digital advertising landscape into opportunities, ensuring your campaigns reach their full potential.

What happened to digital ad targeting?

Using Restore

Why Restore Works

AI-Powered Advertising

How to Unlock AI and Lead Capture Tech for 10X Return on Ad Spend

HOSTED BY

Larry Kim

Founder and CEO, Customers.ai

Free Webinar: Watch Now

What happened to digital ad targeting? 

In today’s digital marketplace, the precision of ad targeting plays a pivotal role in the success of online advertising campaigns. For platforms like Facebook, this precision hinges on the ability to track user interactions, preferences, and behaviors across the web. However, the landscape of digital advertising is undergoing a seismic shift, presenting advertisers with a series of unprecedented challenges.

The Rise of Ad Blockers

One of the most significant hurdles to effective online advertising is the growing use of ad blockers. With more users installing ad-blocking software to enhance their browsing experience, the ability of businesses to track user activity and serve relevant ads is severely compromised. This trend not only diminishes the reach of advertising campaigns but also impacts the accuracy of targeting, leading to decreased engagement and conversion rates.

Privacy Regulations and the Phasing Out of Third-Party Cookies

Privacy concerns have led to stringent regulations such as the General Data Protection Regulation (GDPR) in Europe and the California Consumer Privacy Act (CCPA) in the United States. These regulations restrict the use of personal data for advertising purposes, making it challenging for advertisers to personalize their campaigns.

Compounding this issue is the industry-wide move away from third-party cookies. Major browsers are phasing out support for these tracking mechanisms in favor of protecting user privacy. This shift marks a significant turning point for advertisers who have long relied on third-party cookies to gather insights into user behavior and preferences.

Limitations of the Facebook Pixel

The Facebook pixel, a critical tool for tracking conversions, optimizing campaigns, and building targeted audiences, is also feeling the pressure. With the advent of privacy-focused browser features and extensions that block tracking scripts, the effectiveness of the Facebook pixel is waning. Advertisers are finding it increasingly difficult to capture the full spectrum of user activity on their websites, leading to incomplete data and less effective ad targeting.

The Convergence of Challenges

These challenges are not isolated incidents but part of a broader transformation in the digital advertising ecosystem. Ad blockers, privacy regulations, and the decline of third-party cookies are converging to create an environment where traditional methods of tracking and targeting are becoming less effective. This shift demands innovative solutions that can navigate the new landscape while respecting user privacy and delivering on the promise of precise, effective ad targeting.

See Who Is On Your Site Right Now!

Turn anonymous visitors into genuine contacts.

Try it Free, No Credit Card Required

Get The X-Ray Pixel

Using Restore from Customers.ai to Supercharge Facebook Ads

These obstacles have been causing digital advertisers heartburn for years. Our Restore tool, powered by our Website Visitor ID X-Ray Pixel, is a powerful solution that provides better results than advertisers could generate even in the halcyon days of 2014! 

In other words, say goodbye to the Tums! 

So, how does it work? 

Our Website Visitor ID X-Ray tool identifies your visitors

The first step is installing our pixel on your website. This process is just like installing the Facebook pixel except for one thing: after ours is installed, it works way better! 

The Facebook pixel is blocked by Chrome, iOS, Safari, and other browsers. Ours isn’t! Our tool will identify 20-25% of your website visitors. This information includes: 

Name

Email

Landing Page

Time on Page

And more! 

Send Everyone Identified to Facebook Ads

Retargeting and Lookalike audiences are great tools because they’re a very safe landing spot for users who aren’t yet high-intent buyers. 

You can build a relationship with them through digital advertising and eventually put them in an email nurture campaign and turn them into a lifetime customer! 

Build Targeted Ad Campaigns for Users 

Even better than general retargeting is product-specific retargeting! 

Imagine, for example, that you sell both Yellow Shirts and Blue Hats insurance. 

Yyour X-Ray data shows you which page they landed on, you can get more specific! 

This means that you can segment them!

So you can show yellow shirt-specific ads to the yellow shirt folks and blue hat-specific ads to the blue hat crew! More personalization is always better. 

Creating audiences like this is dead simple, too. 

In your Customers.ai account, navigate to your My Leads tab and click on “Audiences.” 

You’ll see a big list of all your contacts. Then select “Add FIlter.”

In the Attribute drop-down, select “Landing Page URL” and in the Operator drop-down select “Equals.” Paste the landing page URL in the Value section. 

Then just save it as an audience and you’re good to send it to your Facebook account! 

Why Restore Works

In the quest for precision and efficiency in digital advertising, our visitor identification tool not only presents a solution to the challenges outlined earlier but stands as a superior alternative to traditional methods, especially when it comes to data accuracy and comprehensiveness. This section delves into how our technology enhances Facebook ad campaigns by providing data that is more accurate and comprehensive than what Facebook alone can offer.

A More Comprehensive View of Your Audience

Beyond accuracy, our tool offers a more comprehensive understanding of your audience. By identifying 20-25% of website visitors that would otherwise remain anonymous, our technology uncovers a wealth of actionable insights that Facebook’s native tools cannot replicate. This broader dataset includes visitors who are typically lost due to ad blockers or privacy settings, providing a fuller picture of who your audience really is.

The data gleaned from our tool encompasses a wide array of visitor actions, preferences, and behaviors. This depth of insight allows advertisers to build richer, more detailed audience profiles. When used in conjunction with Facebook’s ad platform, it enables the creation of retargeting and lookalike audiences that are not just larger but more precisely tailored to match potential customers.

Enhancing ROI on Ad Spend

The combination of accuracy and comprehensiveness significantly impacts the return on investment (ROI) for your ad campaigns. With more precise targeting comes higher engagement rates, as your ads are more likely to resonate with an audience that has been finely segmented based on reliable data. This leads to better conversion rates and a more efficient use of your advertising budget.

Moreover, by expanding the pool of identifiable visitors, our tool allows for the optimization of ad spend across a wider audience without sacrificing relevance. Advertisers can confidently scale their campaigns, knowing that they are reaching out to potential customers with a high propensity to engage or convert, thanks to the depth and accuracy of the data provided.

Overcoming Facebook’s Limitations

While Facebook remains a powerful platform for digital advertising, its limitations in data collection and audience identification are increasingly apparent in a privacy-conscious world. Our tool fills these gaps, offering a complementary solution that bolsters Facebook’s targeting capabilities. By providing a richer, more accurate dataset, our technology not only overcomes the challenges posed by the digital advertising landscape but also sets a new standard for what advertisers can achieve when precision and privacy coexist.

These powerful tools give you a big leg-up on your competitors, stuck with Facebook’s bad data. See for yourself how many visitors we could identify for you here:

Convert Website Visitors into Real Contacts!

Identify who is visiting your site with name, email and more. Get 500 contacts for free!

Please enable JavaScript in your browser to complete this form.Website / URL *Grade my website

See what targeted outbound marketing is all about. Capture and engage your first 500 website visitor leads with Customers.ai X-Ray website visitor identification for free.

Talk and learn about sales outreach automation with other growth enthusiasts. Join Customers.ai Island, our Facebook group of 40K marketers and entrepreneurs who are ready to support you.

Advance your marketing performance with Sales Outreach School, a free tutorial and training area for sales pros and marketers.

The post Customers.ai + Advertisers = A Match Made in Heaven appeared first on Customers.ai.

The Advertiser Survival Guide for the End of the Cookie Era

It’s the year of the cookie and while we’ve had plenty of time to prepare for what we are calling the “end of the cookie era”, it feels like marketers still aren’t ready.

And it turns out it feels that way because well, they aren’t. According to Martech, 75% of marketers still rely heavily on cookies and another 64% plan to increase their spending on cookie-based activations this year.

It may seem kind of backwards, relying on a technology that’s being phased out, but the reality is that marketers need ad platforms for leads, customers, sales, and revenue. 

That isn’t going to change.

What is going to change (and already has) is the ability to measure campaigns effectively, allocate budgets appropriately, and reach the right people in the right places. 

The bad news is it’s only going to get more challenging.

Which is why we brought together our Founder and CEO Larry Kim and InfoTrust’s D2C Industry Team Manager Courtney Fenstermaker to discuss what advertisers need to do to survive this so called cookiepocolypse.

Let’s jump into what they had to say.

Start Stockpiling Your First-Party Data

The demise of third-party cookies signals a return to first-party data supremacy – the data that you own and no one can take from you.

According to Kim, marketers and advertisers need to develop robust first-party data strategies to maintain and enhance customer engagement and targeting precision. 

While that can sound overwhelming (and expensive), it’s actually easier said than done. Here are Kim’s tips for gathering and stockpiling your first-party data.

1. Invest in a Tracking Pixel to ID Visitors

In the wake of third-party cookie deprecation, tracking pixels are the solution for marketers aiming to gather actionable insights about their audience. 

A pixel like Customers.ai Website Visitor ID X-Ray can help you not only identify up to 20% of your visitors but will also allow you to retarget, email, and add these visitors directly to your CRM. 

Here are a few benefits of having a visitor tracking pixel in place: 

Direct Data Collection: Custom tracking pixels enable businesses to collect data directly from their website visitors without relying on third-party sources. This first-party data is more accurate, relevant, and compliant with privacy regulations.

Enhanced Customer Journey Mapping: This is perhaps one of the biggest losses of the removal of cookies. But by implementing your own tracking solution, you can gain a comprehensive view of the customer journey across their digital ecosystem. Just like the old days, this detailed mapping allows for the identification of key touchpoints, preferences, and behaviors, enabling more effective targeting and optimization of the customer experience.

Privacy Compliance and Trust: With growing concerns over privacy and data protection, custom tracking pixels offer a way to respect user consent while still gathering valuable insights. By controlling the data collection process, businesses can ensure transparency and build trust with their audience.

2. Use First-Party Data to Restore Ad Retargeting & Attribution

We’ve talked a lot here about retargeting (here, here, and here for instance). 

The privacy changes over the past 5 years have all but decimated retargeting campaigns and this latest focus on cookies is only going to make it worse. 

The cool thing is, first-party data makes retargeting work again! 

We know that remarketing engagement is 2-3x higher than non-remarketing and certainly more cost effective. 

By capturing visitor information directly on your website and pumping it into your remarketing campaigns, you can retarget like it’s 2019!

3. Create Advanced Audience Segments with Your First Party Data

Larry Kim’s third tip for navigating the end of the cookie era focuses on using first-party persistent identifiers to create sophisticated audience segmentations and connecting these segments to specific marketing actions. 

Again, it’s actually much easier than it sounds! 

Let’s use Customers.ai as the example again. 

A visitor comes to your site, the tracking pixel is able to identity them. 

You now know their name, email address, business role, company, LinkedIn profile, and much more. 

Based on their actions, let’s say visiting your request a demo page, they are put into your existing automation for people who request a demo. 

We are able to track if they opened the email, what other pages they might’ve visited, what actions they might have taken, and all of this is sent to your CRM.

Here’s another example. 

A visitor comes to your site and reads a blog post about lead generation. 

They are not a high-intent visitor so shouldn’t be put into the email automation but they are a great fit for a retargeting campaign focused around anyone who visited pages about lead generation. 

The key here is using the first-party data you have to not just create sophisticated audience segmentations, but to connect these segments to specific marketing actions. The result is better targeting, better attribution, and more money.

It’s Time to Adapt to a Privacy-First World

While Larry Kim focused on some of the more tactical things you can do to capture first-party data, Courtney Fenstermaker offered a broader view of the issue, focusing on adapting marketing strategies in a privacy-centric landscape while maximizing data utility. 

Here are Fenstermaker’s tips for doing just that:

1. Actioning Data with a Privacy Lense

While this tip might seem broad at first glance, it’s all about recognizing the gold mine we’re sitting on: our data. 

Fenstermaker reminds us that, even in a world where privacy is king and data is being removed, there’s still a ton we can do with the data we have, from deep dives into analytics to fine-tuning our ad targeting. It’s really about seeing the full spectrum of possibilities and making the most of what we’ve got, without stepping on anyone’s privacy toes.

First off, she’s big on using data to get the lay of the land. What’s happening with our audience? Who’s engaging, who’s not, and why? This isn’t just number-crunching; it’s about telling the story of our users’ journey with our brand. 

And then there’s A/B testing—our primary weapon for figuring out what works and what doesn’t, right there on our own turf. 

That leads into how we can take our data game to the next level with modeling and AI. AI really has the ability to give us the edge we need to stay ahead. And when it comes to enriching our data, she’s all about mixing and matching datasets to get a fuller picture of our audience, which is pure gold for targeting.

Last but not least, what this is all really about is reaching our customers in ways they actually appreciate. It’s about reading the room (or the data) and knowing whether to slide into their emails, pop up in their social feeds, or nudge them with a push notification. 

It’s this blend of savvy data use and genuine respect for privacy that can help us nail our marketing goals while keeping our audience’s trust intact. 

2. Be Aware of Privacy Legislation Updates

Courtney Fenstermaker’s second tip may seem obviously but is actually one of the most challenging parts of the whole privacy issue.

Staying on top of privacy changes from state to state is almost impossible. 

Not with the IAPP US State Privacy Legislation Tracker. 

The tracker can help marketers staying on top of privacy laws in the US quickly and easily. So, whether you’re in Pennsylvania or pitching to folks in California, this resource is your go-to for navigating the complex world of privacy laws without missing a beat.

3. Question Your Data & Assess Your Risk

The challenge is real: identifying users and tailoring marketing strategies has never been trickier, potentially dialing down the effectiveness of our tactics. 

Yet, within this challenge lies an opportunity to rethink and revamp how we connect with our audience. It’s about diving deeper, beyond the surface data, to truly understand and segment our customer base, focusing on creating more meaningful connections. 

Whether it’s leveraging AI, modeling, or just getting smart with the data we have, the goal is to refine our approach to reach even those who remain elusive. 

This strategic shift isn’t just about adaptation; it’s about reimagining our engagement to keep pace with the industry’s evolution, ensuring our marketing remains relevant and impactful.

Additionally, a practical tip is to simply ask, “What now?” 

Dive into your website and do a full inventory. 

Pinpoint which platforms and tools you’re using and how they’re affected by the new privacy rules. 

Often, we overlook the extent of our reliance on various technologies, from analytics to customer engagement tools. Understanding which ones might be at risk helps clarify how prepared (or not) your business is. 

Fenstermaker calls it a data compliance or cookie assessment—a deep dive to see where you stand and how to adjust. 

A Path Forward in the Post-Cookie World

This webinar really was a goldmine of tactics for marketers struggling with the cookie apocalypse. 

The key here is to adapt and shift. 

It’s all about doubling down on first-party data, putting privacy at the heart of everything we do, and using tech to really get what makes our customers tick. 

With Larry Kim and Courtney Fenstermaker guiding the way, we’ve got ourselves a solid game plan to keep our marketing strong, legal, and totally on point in this new era.

Interested in learning more about how Customers.ai is helping marketers adapt their strategies and future-proof their marketing campaigns?

Contact our team for more information or install the Customers.ai Website Visitor ID X-Ray pixel right now. 

It’s free, it takes less than 90 seconds, and you can capture up to 5,000 contacts for now charge.

See Who Is On Your Site Right Now!

Turn anonymous visitors into genuine contacts.

Try it Free, No Credit Card Required

Get The X-Ray Pixel

Important Next Steps

See what targeted outbound marketing is all about. Capture and engage your first 500 website visitor leads with Customers.ai X-Ray website visitor identification for free.

Talk and learn about sales outreach automation with other growth enthusiasts. Join Customers.ai Island, our Facebook group of 40K marketers and entrepreneurs who are ready to support you.

Advance your marketing performance with Sales Outreach School, a free tutorial and training area for sales pros and marketers.

The post The Advertiser Survival Guide for the End of the Cookie Era appeared first on Customers.ai.

Skeleton-based pose annotation labeling using Amazon SageMaker Ground …

Pose estimation is a computer vision technique that detects a set of points on objects (such as people or vehicles) within images or videos. Pose estimation has real-world applications in sports, robotics, security, augmented reality, media and entertainment, medical applications, and more. Pose estimation models are trained on images or videos that are annotated with a consistent set of points (coordinates) defined by a rig. To train accurate pose estimation models, you first need to acquire a large dataset of annotated images; many datasets have tens or hundreds of thousands of annotated images and take significant resources to build. Labeling mistakes are important to identify and prevent because model performance for pose estimation models is heavily influenced by labeled data quality and data volume.
In this post, we show how you can use a custom labeling workflow in Amazon SageMaker Ground Truth specifically designed for keypoint labeling. This custom workflow helps streamline the labeling process and minimize labeling errors, thereby reducing the cost of obtaining high-quality pose labels.
Importance of high-quality data and reducing labeling errors
High-quality data is fundamental for training robust and reliable pose estimation models. The accuracy of these models is directly tied to the correctness and precision of the labels assigned to each pose keypoint, which, in turn, depends on the effectiveness of the annotation process. Additionally, having a substantial volume of diverse and well-annotated data ensures that the model can learn a broad range of poses, variations, and scenarios, leading to improved generalization and performance across different real-world applications. The acquisition of these large, annotated datasets involves human annotators who carefully label images with pose information. While labeling points of interest within the image, it’s useful to see the skeletal structure of the object while labeling in order to provide visual guidance to the annotator. This is helpful for identifying labeling errors before they are incorporated into the dataset like left-right swaps or mislabels (such as marking a foot as a shoulder). For example, a labeling error like the left-right swap made in the following example can easily be identified by the crossing of the skeleton rig lines and the mismatching of the colors. These visual cues help labelers recognize mistakes and will result in a cleaner set of labels.
Due to the manual nature of labeling, obtaining large and accurate labeled datasets can be cost-prohibitive and even more so with an inefficient labeling system. Therefore, labeling efficiency and accuracy are critical when designing your labeling workflow. In this post, we demonstrate how to use a custom SageMaker Ground Truth labeling workflow to quickly and accurately annotate images, reducing the burden of developing large datasets for pose estimation workflows.

Overview of solution
This solution provides an online web portal where the labeling workforce can use a web browser to log in, access labeling jobs, and annotate images using the crowd-2d-skeleton user interface (UI), a custom UI designed for keypoint and pose labeling using SageMaker Ground Truth. The annotations or labels created by the labeling workforce are then exported to an Amazon Simple Storage Service (Amazon S3) bucket, where they can be used for downstream processes like training deep learning computer vision models. This solution walks you through how to set up and deploy the necessary components to create a web portal as well as how to create labeling jobs for this labeling workflow.
The following is a diagram of the overall architecture.

This architecture is comprised of several key components, each of which we explain in more detail in the following sections. This architecture provides the labeling workforce with an online web portal hosted by SageMaker Ground Truth. This portal allows each labeler to log in and see their labeling jobs. After they’ve logged in, the labeler can select a labeling job and begin annotating images using the custom UI hosted by Amazon CloudFront. We use AWS Lambda functions for pre-annotation and post-annotation data processing.
The following screenshot is an example of the UI.

The labeler can mark specific keypoints on the image using the UI. The lines between keypoints will be automatically drawn for the user based on a skeleton rig definition that the UI uses. The UI allows many customizations, such as the following:

Custom keypoint names
Configurable keypoint colors
Configurable rig line colors
Configurable skeleton and rig structures

Each of these are targeted features to improve the ease and flexibility of labeling. Specific UI customization details can be found in the GitHub repo and are summarized later in this post. Note that in this post, we use human pose estimation as a baseline task, but you can expand it to labeling object pose with a pre-defined rig for other objects as well, such as animals or vehicles. In the following example, we show how this can be applied to label the points of a box truck.

SageMaker Ground Truth
In this solution, we use SageMaker Ground Truth to provide the labeling workforce with an online portal and a way to manage labeling jobs. This post assumes that you’re familiar with SageMaker Ground Truth. For more information, refer to Amazon SageMaker Ground Truth.
CloudFront distribution
For this solution, the labeling UI requires a custom-built JavaScript component called the crowd-2d-skeleton component. This component can be found on GitHub as part of Amazon’s open source initiatives. The CloudFront distribution will be used to host the crowd-2d-skeleton.js, which is needed by the SageMaker Ground Truth UI. The CloudFront distribution will be assigned an origin access identity, which will allow the CloudFront distribution to access the crowd-2d-skeleton.js residing in the S3 bucket. The S3 bucket will remain private and no other objects in this bucket will be available via the CloudFront distribution due to restrictions we place on the origin access identity through a bucket policy. This is a recommended practice for following the least-privilege principle.
Amazon S3 bucket
We use the S3 bucket to store the SageMaker Ground Truth input and output manifest files, the custom UI template, images for the labeling jobs, and the JavaScript code needed for the custom UI. This bucket will be private and not accessible to the public. The bucket will also have a bucket policy that restricts the CloudFront distribution to only being able to access the JavaScript code needed for the UI. This prevents the CloudFront distribution from hosting any other object in the S3 bucket.
Pre-annotation Lambda function
SageMaker Ground Truth labeling jobs typically use an input manifest file, which is in JSON Lines format. This input manifest file contains metadata for a labeling job, acts as a reference to the data that needs to be labeled, and helps configure how the data should be presented to the annotators. The pre-annotation Lambda function processes items from the input manifest file before the manifest data is input to the custom UI template. This is where any formatting or special modifications to the items can be done before presenting the data to the annotators in the UI. For more information on pre-annotation Lambda functions, see Pre-annotation Lambda.
Post-annotation Lambda function
Similar to the pre-annotation Lambda function, the post-annotation function handles additional data processing you may want to do after all the labelers have finished labeling but before writing the final annotation output results. This processing is done by a Lambda function, which is responsible for formatting the data for the labeling job output results. In this solution, we are simply using it to return the data in our desired output format. For more information on post-annotation Lambda functions, see Post-annotation Lambda.
Post-annotation Lambda function role
We use an AWS Identity and Access Management (IAM) role to give the post-annotation Lambda function access to the S3 bucket. This is needed to read the annotation results and make any modifications before writing out the final results to the output manifest file.
SageMaker Ground Truth role
We use this IAM role to give the SageMaker Ground Truth labeling job the ability to invoke the Lambda functions and to read the images, manifest files, and custom UI template in the S3 bucket.
Prerequisites
For this walkthrough, you should have the following prerequisites:

Familiarity with SageMaker Ground Truth labeling jobs and the workforce portal
Familiarity with the AWS Cloud Development Kit (AWS CDK)
An AWS account with the permissions to deploy the AWS CDK stack
A SageMaker Ground Truth private workforce
Python 3.9+ installed
The AWS CDK installed

For this solution, we use the AWS CDK to deploy the architecture. Then we create a sample labeling job, use the annotation portal to label the images in the labeling job, and examine the labeling results.
Create the AWS CDK stack
After you complete all the prerequisites, you’re ready to deploy the solution.
Set up your resources
Complete the following steps to set up your resources:

Download the example stack from the GitHub repo.
Use the cd command to change into the repository.
Create your Python environment and install required packages (see the repository README.md for more details).
With your Python environment activated, run the following command:

cdk synth

Run the following command to deploy the AWS CDK:

cdk deploy

Run the following command to run the post-deployment script:

python scripts/post_deployment_script.py

Create a labeling job
After you have set up your resources, you’re ready to create a labeling job. For the purposes of this post, we create a labeling job using the example scripts and images provided in the repository.

CD into the scripts directory in the repository.
Download the example images from the internet by running the following code:

python scripts/download_example_images.py

This script downloads a set of 10 images, which we use in our example labeling job. We review how to use your own custom input data later in this post.

Create a labeling job by running to following code:

python scripts/create_example_labeling_job.py <Labeling Workforce ARN>

This script takes a SageMaker Ground Truth private workforce ARN as an argument, which should be the ARN for a workforce you have in the same account you deployed this architecture into. The script will create the input manifest file for our labeling job, upload it to Amazon S3, and create a SageMaker Ground Truth custom labeling job. We take a deeper dive into the details of this script later in this post.
Label the dataset
After you have launched the example labeling job, it will appear on the SageMaker console as well as the workforce portal.

In the workforce portal, select the labeling job and choose Start working.

You’ll be presented with an image from the example dataset. At this point, you can use the custom crowd-2d-skeleton UI to annotate the images. You can familiarize yourself with the crowd-2d-skeleton UI by referring to User Interface Overview. We use the rig definition from the COCO keypoint detection dataset challenge as the human pose rig. To reiterate, you can customize this without our custom UI component to remove or add points based on your requirements.
When you’re finished annotating an image, choose Submit. This will take you to the next image in the dataset until all images are labeled.

Access the labeling results
When you have finished labeling all the images in the labeling job, SageMaker Ground Truth will invoke the post-annotation Lambda function and produce an output.manifest file containing all of the annotations. This output.manifest will be stored in the S3 bucket. In our case, the location of the output manifest should follow the S3 URI path s3://<bucket name> /labeling_jobs/output/<labeling job name>/manifests/output/output.manifest. The output.manifest file is a JSON Lines file, where each line corresponds to a single image and its annotations from the labeling workforce. Each JSON Lines item is a JSON object with many fields. The field we are interested in is called label-results. The value of this field is an object containing the following fields:

dataset_object_id – The ID or index of the input manifest item
data_object_s3_uri – The image’s Amazon S3 URI
image_file_name – The image’s file name
image_s3_location – The image’s Amazon S3 URL
original_annotations – The original annotations (only set and used if you are using a pre-annotation workflow)
updated_annotations – The annotations for the image
worker_id – The workforce worker who made the annotations
no_changes_needed – Whether the no changes needed check box was selected
was_modified – Whether the annotation data differs from the original input data
total_time_in_seconds – The time it took the workforce worker to annotation the image

With these fields, you can access your annotation results for each image and do calculations like average time to label an image.
Create your own labeling jobs
Now that we have created an example labeling job and you understand the overall process, we walk you through the code responsible for creating the manifest file and launching the labeling job. We focus on the key parts of the script that you may want to modify to launch your own labeling jobs.
We cover snippets of code from the create_example_labeling_job.py script located in the GitHub repository. The script starts by setting up variables that are used later in the script. Some of the variables are hard-coded for simplicity, whereas others, which are stack dependent, will be imported dynamically at runtime by fetching the values created from our AWS CDK stack.

# Setup/get variables values from our CDK stack
s3_upload_prefix = “labeling_jobs”
image_dir = ‘scripts/images’
manifest_file_name = “example_manifest.txt”
s3_bucket_name = read_ssm_parameter(‘/crowd_2d_skeleton_example_stack/bucket_name’)
pre_annotation_lambda_arn = read_ssm_parameter(‘/crowd_2d_skeleton_example_stack/pre_annotation_lambda_arn’)
post_annotation_lambda_arn = read_ssm_parameter(‘/crowd_2d_skeleton_example_stack/post_annotation_lambda_arn’)
ground_truth_role_arn = read_ssm_parameter(‘/crowd_2d_skeleton_example_stack/sagemaker_ground_truth_role’)
ui_template_s3_uri = f”s3://{s3_bucket_name}/infrastructure/ground_truth_templates/crowd_2d_skeleton_template.html”
s3_image_upload_prefix = f'{s3_upload_prefix}/images’
s3_manifest_upload_prefix = f'{s3_upload_prefix}/manifests’
s3_output_prefix = f'{s3_upload_prefix}/output’

The first key section in this script is the creation of the manifest file. Recall that the manifest file is a JSON lines file that contains the details for a SageMaker Ground Truth labeling job. Each JSON Lines object represents one item (for example, an image) that needs to be labeled. For this workflow, the object should contain the following fields:

source-ref – The Amazon S3 URI to the image you wish to label.
annotations – A list of annotation objects, which is used for pre-annotating workflows. See the crowd-2d-skeleton documentation for more details on the expected values.

The script creates a manifest line for each image in the image directory using the following section of code:

# For each image in the image directory lets create a manifest line
manifest_items = []
for filename in os.listdir(image_dir):
if filename.endswith(‘.jpg’) or filename.endswith(‘.png’):
img_path = os.path.join(
image_dir,
filename
)
object_name = os.path.join(
s3_image_upload_prefix,
filename
).replace(“\”, “/”)

# upload to s3_bucket
s3_client.upload_file(img_path, s3_bucket_name, object_name)
f
# add it to manifest file
manifest_items.append({
“source-ref”: f’s3://{s3_bucket_name}/{object_name}’,
“annotations”: [],
})

If you want to use different images or point to a different image directory, you can modify that section of the code. Additionally, if you’re using a pre-annotation workflow, you can update the annotations array with a JSON string consisting of the array and all its annotation objects. The details of the format of this array are documented in the crowd-2d-skeleton documentation.
With the manifest line items now created, you can create and upload the manifest file to the S3 bucket you created earlier:

# Create Manifest file
manifest_file_contents = “n”.join([json.dumps(mi) for mi in manifest_items])
with open(manifest_file_name, “w”) as file_handle:
file_handle.write(manifest_file_contents)

# Upload manifest file
object_name = os.path.join(
s3_manifest_upload_prefix,
manifest_file_name
).replace(“\”, “/”)
s3_client.upload_file(manifest_file_name, s3_bucket_name, object_name)

Now that you have created a manifest file containing the images you want to label, you can create a labeling job. You can create the labeling job programmatically using the AWS SDK for Python (Boto3). The code to create a labeling job is as follows:

# Create labeling job
client = boto3.client(“sagemaker”)
now = int(round(datetime.now().timestamp()))
response = client.create_labeling_job(
LabelingJobName=f”crowd-2d-skeleton-example-{now}”,
LabelAttributeName=”label-results”,
InputConfig={
“DataSource”: {
“S3DataSource”: {“ManifestS3Uri”: f’s3://{s3_bucket_name}/{object_name}’},
},
“DataAttributes”: {},
},
OutputConfig={
“S3OutputPath”: f”s3://{s3_bucket_name}/{s3_output_prefix}/”,
},
RoleArn=ground_truth_role_arn,
HumanTaskConfig={
“WorkteamArn”: workteam_arn,
“UiConfig”: {“UiTemplateS3Uri”: ui_template_s3_uri},
“PreHumanTaskLambdaArn”: pre_annotation_lambda_arn,
“TaskKeywords”: [“example”],
“TaskTitle”: f”Crowd 2D Component Example {now}”,
“TaskDescription”: “Crowd 2D Component Example”,
“NumberOfHumanWorkersPerDataObject”: 1,
“TaskTimeLimitInSeconds”: 28800,
“TaskAvailabilityLifetimeInSeconds”: 2592000,
“MaxConcurrentTaskCount”: 123,
“AnnotationConsolidationConfig”: {
“AnnotationConsolidationLambdaArn”: post_annotation_lambda_arn
},
},
)
print(response)

The aspects of this code you may want to modify are LabelingJobName, TaskTitle, and TaskDescription. The LabelingJobName is the unique name of the labeling job that SageMaker will use to reference your job. This is also the name that will appear on the SageMaker console. TaskTitle serves a similar purpose, but doesn’t need to be unique and will be the name of the job that appears in the workforce portal. You may want to make these more specific to what you are labeling or what the labeling job is for. Lastly, we have the TaskDescription field. This field appears in the workforce portal to provide extra context to the labelers as to what the task is, such as instructions and guidance for the task. For more information on these fields as well as the others, refer to the create_labeling_job documentation.
Make adjustments to the UI
In this section, we go over some of the ways you can customize the UI. The following is a list of the most common potential customizations to the UI in order to adjust it to your modeling task:

You can define which keypoints can be labeled. This includes the name of the keypoint and its color.
You can change the structure of the skeleton (which keypoints are connected).
You can change the line colors for specific lines between specific keypoints.

All of these UI customizations are configurable through arguments passed into the crowd-2d-skeleton component, which is the JavaScript component used in this custom workflow template. In this template, you will find the usage of the crowd-2d-skeleton component. A simplified version is shown in the following code:

<crowd-2d-skeleton
imgSrc=”{{ task.input.image_s3_uri | grant_read_access }}”
keypointClasses='<keypoint classes>’
skeletonRig='<skeleton rig definition>’
skeletonBoundingBox='<skeleton bounding box size>’
initialValues=”{{ task.input.initial_values }}”
>

In the preceding code example, you can see the following attributes on the component: imgSrc, keypointClasses, skeletonRig, skeletonBoundingBox, and intialValues. We describe each attribute’s purpose in the following sections, but customizing the UI is as straightforward as changing the values for these attributes, saving the template, and rerunning the post_deployment_script.py we used previously.
imgSrc attribute
The imgSrc attribute controls which image to show in the UI when labeling. Usually, a different image is used for each manifest line item, so this attribute is often populated dynamically using the built-in Liquid templating language. You can see in the previous code example that the attribute value is set to {{ task.input.image_s3_uri | grant_read_access }}, which is Liquid template variable that will be replaced with the actual image_s3_uri value when the template is being rendered. The rendering process starts when the user opens an image for annotation. This process grabs a line item from the input manifest file and sends it to the pre-annotation Lambda function as an event.dataObject. The pre-annotation function takes take the information it needs from the line item and returns a taskInput dictionary, which is then passed to the Liquid rendering engine, which will replace any Liquid variables in your template. For example, let’s say you have a manifest file with the following line:

{“source-ref”: “s3://my-bucket/exmaple.jpg”, “annotations”: []}

This data would be passed to the pre-annotation function. The following code shows how the function extracts the values from the event object:

def lambda_handler(event, context):
print(“Pre-Annotation Lambda Triggered”)
data_object = event[“dataObject”] # this comes directly from the manifest file
annotations = data_object[“annotations”]

taskInput = {
“image_s3_uri”: data_object[“source-ref”],
“initial_values”: json.dumps(annotations)
}
return {“taskInput”: taskInput, “humanAnnotationRequired”: “true”}

The object returned from the function in this case would look like the following code:

{
“taskInput”: {
“image_s3_uri”: “s3://my-bucket/exmaple.jpg”,
“annotations”: “[]”
},
“humanAnnotationRequired”: “true”
}

The returned data from the function is then available to the Liquid template engine, which replaces the template values in the template with the data values returned by the function. The result would be something like the following code:

<crowd-2d-skeleton
imgSrc=”s3://my-bucket/exmaple.jpg” <– This was “injected” into template
keypointClasses='<keypoint classes>’
skeletonRig='<skeleton rig definition>’
skeletonBoundingBox='<skeleton bounding box size>’
initialValues=”[]”
>

keypointClasses attribute
The keypointClasses attribute defines which keypoints will appear in the UI and be used by the annotators. This attribute takes a JSON string containing a list of objects. Each object represents a keypoint. Each keypoint object should contain the following fields:

id – A unique value to identify that keypoint.
color – The color of the keypoint represented as an HTML hex color.
label – The name or keypoint class.
x – This optional attribute is only needed if you want to use the draw skeleton functionality in the UI. The value for this attribute is the x position of the keypoint relative to the skeleton’s bounding box. This value is usually obtained by the Skeleton Rig Creator tool. If you are doing keypoint annotations and don’t need to draw an entire skeleton at once, you can set this value to 0.
y – This optional attribute is similar to x, but for the vertical dimension.

For more information on the keypointClasses attribute, see the keypointClasses documentation.
skeletonRig attribute
The skeletonRig attribute controls which keypoints should have lines drawn between them. This attribute takes a JSON string containing a list of keypoint label pairs. Each pair informs the UI which keypoints to draw lines between. For example, ‘[[“left_ankle”,”left_knee”],[“left_knee”,”left_hip”]]’ informs the UI to draw lines between “left_ankle” and “left_knee” and draw lines between “left_knee” and “left_hip”. This can be generated by the Skeleton Rig Creator tool.
skeletonBoundingBox attribute
The skeletonBoundingBox attribute is optional and only needed if you want to use the draw skeleton functionality in the UI. The draw skeleton functionality is the ability to annotate entire skeletons with a single annotation action. We don’t cover this feature in this post. The value for this attribute is the skeleton’s bounding box dimensions. This value is usually obtained by the Skeleton Rig Creator tool. If you are doing keypoint annotations and don’t need to draw an entire skeleton at once, you can set this value to null. It is recommended to use the Skeleton Rig Creator tool to get this value.
initialValues attribute
The initialValues attribute is used to pre-populate the UI with annotations obtained from another process (such as a previous labeling job or a machine learning model). This is useful when running adjustment or review jobs. The data for this attribute is usually populated dynamically, in the same way as described for the imgSrc attribute. More details can be found in the crowd-2d-skeleton documentation.
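
For example, assuming your pre-annotation function returns the existing annotations under the initial_values key (as in the earlier Lambda example), the template could reference that value with a Liquid variable like the following; the exact variable name depends on the key your function returns:

initialValues="{{ task.input.initial_values }}"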
Clean up
To avoid incurring future charges, you should delete the objects in your S3 bucket and delete your AWS CDK stack. You can delete the S3 objects via the Amazon S3 console or the AWS Command Line Interface (AWS CLI). After you have deleted all of the S3 objects in the bucket, you can destroy the AWS CDK stack by running the following code:

cdk destroy

This will remove the resources you created earlier.
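
If you prefer the AWS CLI for the first step (emptying the bucket before you run cdk destroy), a command along the following lines works; the bucket name here is a placeholder, so substitute your own:

aws s3 rm s3://<your-labeling-bucket> --recursive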
Considerations
Additional steps may be needed to productionize your workflow. Here are some considerations, depending on your organization's risk profile:

Adding access and application logging
Adding a web application firewall (WAF)
Adjusting IAM permissions to follow least privilege

Conclusion
In this post, we shared the importance of labeling efficiency and accuracy in building pose estimation datasets. To help with both items, we showed how you can use SageMaker Ground Truth to build custom labeling workflows to support skeleton-based pose labeling tasks, aiming to enhance efficiency and precision during the labeling process. We showed how you can further extend the code and examples to various custom pose estimation labeling requirements.
We encourage you to use this solution for your labeling tasks and to engage with AWS for assistance or inquiries related to custom labeling workflows.

About the Authors
Arthur Putnam is a Full-Stack Data Scientist in AWS Professional Services. Arthur’s expertise is centered around developing and integrating front-end and back-end technologies into AI systems. Outside of work, Arthur enjoys exploring the latest advancements in technology, spending time with his family, and enjoying the outdoors.
Ben Fenker is a Senior Data Scientist in AWS Professional Services and has helped customers build and deploy ML solutions in industries ranging from sports to healthcare to manufacturing. He has a Ph.D. in physics from Texas A&M University and 6 years of industry experience. Ben enjoys baseball, reading, and raising his kids.
Jarvis Lee is a Senior Data Scientist with AWS Professional Services. He has been with AWS for over six years, working with customers on machine learning and computer vision problems. Outside of work, he enjoys riding bicycles.

Build generative AI chatbots using prompt engineering with Amazon Reds …

With the advent of generative AI solutions, organizations are finding different ways to apply these technologies to gain an edge over their competitors. Intelligent applications, powered by advanced foundation models (FMs) trained on huge datasets, can now understand natural language, interpret meaning and intent, and generate contextually relevant and human-like responses. This is fueling innovation across industries, with generative AI demonstrating immense potential to enhance countless business processes, including the following:

Accelerate research and development through automated hypothesis generation and experiment design
Uncover hidden insights by identifying subtle trends and patterns in data
Automate time-consuming documentation processes
Provide better customer experience with personalization
Summarize data from various knowledge sources
Boost employee productivity by providing software code recommendations

Amazon Bedrock is a fully managed service that makes it straightforward to build and scale generative AI applications. Amazon Bedrock offers a choice of high-performing foundation models from leading AI companies, including AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon, via a single API. It enables you to privately customize the FMs with your data using techniques such as fine-tuning, prompt engineering, and Retrieval Augmented Generation (RAG), and build agents that run tasks using your enterprise systems and data sources while complying with security and privacy requirements.
In this post, we discuss how to use the comprehensive capabilities of Amazon Bedrock to perform complex business tasks and improve the customer experience by providing personalization using the data stored in a database like Amazon Redshift. We use prompt engineering techniques to develop and optimize the prompts with the data that is stored in a Redshift database to efficiently use the foundation models. We build a personalized generative AI travel itinerary planner as part of this example and demonstrate how we can personalize a travel itinerary for a user based on their booking and user profile data stored in Amazon Redshift.
Prompt engineering
Prompt engineering is the process of creating and designing user inputs that guide generative AI solutions to generate the desired outputs. You can choose the most appropriate phrases, formats, words, and symbols that guide the foundation models, and in turn the generative AI applications, to interact with users more meaningfully. You can use creativity and trial-and-error methods to create a collection of input prompts, so the application works as expected. Prompt engineering makes generative AI applications more efficient and effective. You can encapsulate open-ended user input inside a prompt before passing it to the FMs. For example, a user may enter an incomplete problem statement like, "Where to purchase a shirt." Internally, the application's code uses an engineered prompt that says, "You are a sales assistant for a clothing company. A user, based in Alabama, United States, is asking you where to purchase a shirt. Respond with the three nearest store locations that currently stock a shirt." The foundation model then generates more relevant and accurate information.
The prompt engineering field is evolving constantly and needs creative expression and natural language skills to tune the prompts and obtain the desired output from FMs. A prompt can contain any of the following elements:

Instruction – A specific task or instruction you want the model to perform
Context – External information or additional context that can steer the model to better responses
Input data – The input or question that you want to find a response for
Output indicator – The type or format of the output
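
As a minimal sketch of how these elements come together in application code (the function and variable names below are our own, not part of this solution), the clothing-store example above could be assembled like this:

def build_prompt(user_location, user_question):
    # Instruction: the task the model should perform
    instruction = "You are a sales assistant for a clothing company."
    # Context plus input data: external information and the open-ended user question
    context = f"A user, based in {user_location}, is asking you: {user_question}"
    # Output indicator: the format or type of output we expect
    output_indicator = "Respond with the three nearest store locations that currently stock the item."
    return "\n".join([instruction, context, output_indicator])

prompt = build_prompt("Alabama, United States", "Where to purchase a shirt")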

You can use prompt engineering for various enterprise use cases across different industry segments, such as the following:

Banking and finance – Prompt engineering empowers language models to generate forecasts, conduct sentiment analysis, assess risks, formulate investment strategies, generate financial reports, and ensure regulatory compliance. For example, you can use large language models (LLMs) for a financial forecast by providing data and market indicators as prompts.
Healthcare and life sciences – Prompt engineering can help medical professionals optimize AI systems to aid in decision-making processes, such as diagnosis, treatment selection, or risk assessment. You can also engineer prompts to facilitate administrative tasks, such as patient scheduling, record keeping, or billing, thereby increasing efficiency.
Retail – Prompt engineering can help retailers implement chatbots to address common customer requests like queries about order status, returns, payments, and more, using natural language interactions. This can increase customer satisfaction and also allow human customer service teams to dedicate their expertise to intricate and sensitive customer issues.

In the following example, we implement a use case from the travel and hospitality industry: a personalized travel itinerary planner for customers who have upcoming travel plans. We demonstrate how we can build a generative AI chatbot that interacts with users by enriching the prompts with the user profile data that is stored in the Redshift database. We then send this enriched prompt to an LLM, specifically, Anthropic's Claude on Amazon Bedrock, to obtain a customized travel plan.
Amazon Redshift has announced a feature called Amazon Redshift ML that makes it straightforward for data analysts and database developers to create, train, and apply machine learning (ML) models using familiar SQL commands in Redshift data warehouses. However, this post uses LLMs hosted on Amazon Bedrock to demonstrate general prompt engineering techniques and their benefits.
Solution overview
We have all searched the internet for things to do in a destination before or during a vacation. In this solution, we demonstrate how we can generate a custom, personalized travel itinerary that users can reference, generated based on their hobbies, interests, favorite foods, and more. The solution uses their booking data to look up the cities they are going to, along with the travel dates, and comes up with a precise, personalized list of things to do. This solution can be used by the travel and hospitality industry to embed a personalized travel itinerary planner within their travel booking portal.
This solution contains two major components. First, we extract the user's information like name, location, hobbies, interests, and favorite food, along with their upcoming travel booking details. Second, we stitch this information into a user prompt and pass it to Anthropic's Claude on Amazon Bedrock to obtain a personalized travel itinerary. The following diagram provides a high-level overview of the workflow and the components involved in this architecture.

First, the user logs in to the chatbot application, which is hosted behind an Application Load Balancer and authenticated using Amazon Cognito. The user provides their user ID through the chatbot interface, and it is sent to the prompt engineering module. The user's information like name, location, hobbies, interests, and favorite food is extracted from the Redshift database, along with their upcoming travel booking details like travel city, check-in date, and check-out date.
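
The following sketch illustrates, under our own assumptions, what the final step of that module might look like: sending the enriched prompt to Anthropic's Claude on Amazon Bedrock. The model ID and request body follow the Claude text-completions format on Bedrock (newer Claude versions use the messages format instead), and the Redshift lookup that builds enriched_prompt is omitted here for brevity.

import json
import boto3

bedrock = boto3.client("bedrock-runtime")

def get_itinerary(enriched_prompt):
    # Claude's text-completions format expects the Human/Assistant framing
    body = json.dumps({
        "prompt": f"\n\nHuman: {enriched_prompt}\n\nAssistant:",
        "max_tokens_to_sample": 1024,
        "temperature": 0.5,
    })
    response = bedrock.invoke_model(
        modelId="anthropic.claude-v2",  # assumed model ID; use the Claude model you enabled
        body=body,
        accept="application/json",
        contentType="application/json",
    )
    return json.loads(response["body"].read())["completion"]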
Prerequisites
Before you deploy this solution, make sure you have the following prerequisites set up:

A valid AWS account.
An AWS Identity and Access Management (IAM) role in the account that has sufficient permissions to create the necessary resources. If you have administrator access to the account, no action is necessary.
An SSL certificate created and imported into AWS Certificate Manager (ACM). For more details, refer to Importing a certificate.

Deploy this solution
Use the following steps to deploy this solution in your environment. The code used in this solution is available in the GitHub repo.
The first step is to make sure the account and the AWS Region where the solution is being deployed have access to Amazon Bedrock base models.

On the Amazon Bedrock console, choose Model access in the navigation pane.
Choose Manage model access.
Select the Anthropic Claude model, then choose Save changes.

It may take a few minutes for the access status to change to Access granted.

Next, we use the following AWS CloudFormation template to deploy an Amazon Redshift Serverless cluster along with all the related components, including the Amazon Elastic Compute Cloud (Amazon EC2) instance to host the webapp.

Choose Launch Stack to launch the CloudFormation stack:
Provide a stack name and SSH keypair, then create the stack.
On the stack’s Outputs tab, save the values for the Redshift database workgroup name, secret ARN, URL, and Amazon Redshift service role ARN.

Now you’re ready to connect to the EC2 instance using SSH.

Open an SSH client.
Locate your private key file that was entered while launching the CloudFormation stack.
Change the permissions of the private key file to 400 (chmod 400 id_rsa).
Connect to the instance using its public DNS or IP address. For example:

ssh -i "id_rsa" ec2-user@ec2-54-xxx-xxx-187.compute-1.amazonaws.com

Update the configuration file personalized-travel-itinerary-planner/core/data_feed_config.ini with the Region, workgroup name, and secret ARN that you saved earlier.
Run the following command to create the database objects that contain the user information and travel booking data:

python3 ~/personalized-travel-itinerary-planner/core/redshift_ddl.py

This command creates the travel schema along with the tables named user_profile and hotel_booking.

Run the following command to launch the web service:

streamlit run ~/personalized-travel-itinerary-planner/core/chatbot_app.py --server.port=8080 &

In the next steps, you create a user account to log in to the app.

On the Amazon Cognito console, choose User pools in the navigation pane.
Select the user pool that was created as part of the CloudFormation stack (travelplanner-user-pool).
Choose Create user.
Enter a user name, email, and password, then choose Create user.

Now you can update the callback URL in Amazon Cognito.

On the travelplanner-user-pool user pool details page, navigate to the App integration tab.
In the App client list section, choose the client that you created (travelplanner-client).
In the Hosted UI section, choose Edit.
For URL, enter the URL that you copied from the CloudFormation stack output (make sure to use lowercase).
Choose Save changes.

Test the solution
Now we can test the bot by asking it questions.

In a new browser window, enter the URL you copied from the CloudFormation stack output and log in using the user name and password that you created. Change the password if prompted.
Enter the user ID whose information you want to use (for this post, we use user ID 1028169).
Ask any question to the bot.

The following are some example questions:

Can you plan a detailed itinerary for my July trip?
Should I carry a jacket for my upcoming trip?
Can you recommend some places to travel in March?

Using the user ID you provided, the prompt engineering module will extract the user details and design a prompt, along with the question asked by the user, as shown in the following screenshot.

The highlighted text in the preceding screenshot is the user-specific information that was extracted from the Redshift database and stitched together with some additional instructions. The elements of a good prompt such as instruction, context, input data, and output indicator are also called out.
After you pass this prompt to the LLM, we get the following output. In this example, the LLM created a custom travel itinerary for the specific dates of the user’s upcoming booking. It also took into account the user’s hobbies, interests, and favorite food while planning this itinerary.

Clean up
To avoid incurring ongoing charges, clean up your infrastructure.

On the AWS CloudFormation console, choose Stacks in the navigation pane.
Select the stack that you created and choose Delete.

Conclusion
In this post, we demonstrated how we can engineer prompts using data that is stored in Amazon Redshift and can be passed on to Amazon Bedrock to obtain an optimized response. This solution provides a simplified approach for building a generative AI application using proprietary data residing in your own database. By engineering tailored prompts based on the data in Amazon Redshift and having Amazon Bedrock generate responses, you can take advantage of generative AI in a customized way using your own datasets. This allows for more specific, relevant, and optimized output than would be possible with more generalized prompts. The post shows how you can integrate AWS services to create a generative AI solution that unleashes the full potential of these technologies with your data.
Stay up to date with the latest advancements in generative AI and start building on AWS. If you’re seeking assistance on how to begin, check out the Generative AI Innovation Center.

About the Authors
Ravikiran Rao is a Data Architect at AWS and is passionate about solving complex data challenges for various customers. Outside of work, he is a theatre enthusiast and an amateur tennis player.
Jigna Gandhi is a Sr. Solutions Architect at Amazon Web Services, based in the Greater New York City area. She has over 15 years of strong experience in leading several complex, highly robust, and massively scalable software solutions for large-scale enterprise applications.
Jason Pedreza is a Senior Redshift Specialist Solutions Architect at AWS with data warehousing experience handling petabytes of data. Prior to AWS, he built data warehouse solutions at Amazon.com and Amazon Devices. He specializes in Amazon Redshift and helps customers build scalable analytic solutions.
Roopali Mahajan is a Senior Solutions Architect with AWS based out of New York. She thrives on serving as a trusted advisor for her customers, helping them navigate their journey on cloud. Her day is spent solving complex business problems by designing effective solutions using AWS services. During off-hours, she loves to spend time with her family and travel.

Is Identity Resolution the Future of Marketing?

Identity resolution has long been key to understanding the customer journey, traditionally enabling enterprise businesses to consolidate the many, many data points they have into a cohesive customer profile. 

However, with recent shifts in the digital space, notably the phasing out of third-party cookies and increased privacy regulations, marketers and businesses of all sizes are taking notice of this technology. 

In fact, a staggering 87% of marketers agree that identity resolution is key to future-proofing their campaigns! 

As tools like Google Analytics evolve and restrict access to once readily available data, marketers are leaning into identity resolution to navigate attribution.

It makes us wonder, "Is identity resolution the future of marketing?"

The answer? It certainly feels like it.

Let’s dig a little deeper and find out just why identity resolution is so important to the marketing landscape and this new age where data is sacred and attribution feels impossible.

See Who Is On Your Site Right Now!

Turn anonymous visitors into genuine contacts.

Try it Free, No Credit Card Required

Get The X-Ray Pixel

Understanding Identity Resolution

At its heart, identity resolution seems pretty simple. But there is a lot to it, and understanding what it is and the role it plays in business and marketing is important.

What is Identity Resolution?

Identity resolution is a data processing technique used by marketers to aggregate and analyze interactions and touchpoints from various sources to build a singular, cohesive identity of a consumer. 

This process involves matching customer data points, such as email addresses, social media profiles, and device identifiers, across different channels and platforms. 

By doing so, it allows businesses to understand and track an individual’s journey with their brand, enabling personalized marketing efforts and a more comprehensive understanding of consumer behavior.

Ok…that’s the boring definition. 

Identity resolution basically takes all the things you know about your customers and prospects and puts them into one place (think name, email, address, business role, family connections, hobbies, etc).

For marketers, this is GOLD!

What are the Two Types of Identity Resolution?

It’s important to note there are two types of identity resolution: deterministic and probabilistic.

Deterministic matching operates on the principle of exactness, linking customer records by pinpointing exact matches among identifiers such as email addresses, phone numbers, or usernames. This method is best when there’s a wealth of first-party data at one’s disposal.

On the flip side, probabilistic matching takes a more nuanced approach, estimating the probability that different identifiers—think IP addresses, device types, browsers, or operating systems—belong to the same customer. While this approach might not offer the certainty of deterministic matching, it opens doors when first-party data is scarce or when expanding reach is the goal.

While no one likes what essentially feels like "guesswork", sometimes probabilistic matching is the only way to go.
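
For the technically curious, here is a purely illustrative sketch of the difference; the record fields, weights, and threshold are assumptions for the example, not how any particular platform implements matching:

# Deterministic: link two records only on an exact, normalized identifier match
def deterministic_match(record_a, record_b):
    return record_a["email"].strip().lower() == record_b["email"].strip().lower()

# Probabilistic: score weaker signals and link when the combined score clears a threshold
def probabilistic_match(record_a, record_b, threshold=0.8):
    weights = {"ip_address": 0.4, "device_type": 0.2, "browser": 0.2, "operating_system": 0.2}
    score = sum(
        weight for field, weight in weights.items()
        if record_a.get(field) is not None and record_a.get(field) == record_b.get(field)
    )
    return score >= threshold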

What Technologies Are Used in Identity Resolution?

There are a lot of technologies that go into identity resolution. After all, to get a true picture of an individual, you need to track them both online and in real life. 

Here's a breakdown of how to think about it:

The Collectors: Data Management Platforms (DMPs)

DMPs collect and organize data from a variety of sources, enabling marketers to better understand and segment their audiences. They are essential for targeting and personalization efforts in marketing campaigns.

The Organizers: Customer Relationship Management (CRM) Systems

CRMs store and manage customer interactions across different channels, providing a unified view of the customer journey. This helps businesses tailor their communications and improve customer relationships.

The Brainiacs: Artificial Intelligence (AI) and Machine Learning (ML)

AI and ML technologies analyze data to identify patterns and predict future behaviors. They enhance identity resolution by improving the accuracy of matching customer data across various sources.

The Connectors: Identity Graphs

Identity graphs compile data from multiple channels and devices to create comprehensive profiles of individual customers. They are key to understanding customer behavior and delivering consistent marketing messages.

The Protectors: Blockchain Technology

Blockchain offers a secure and transparent way to manage identity data. Its decentralized nature helps protect against fraud and ensures the integrity of customer information.

These technologies collectively empower marketers to achieve a holistic understanding of their customers, driving more effective and efficient marketing strategies.

Convert Website Visitors into Real Contacts!

Identify who is visiting your site with name, email and more. Get 500 contacts for free!


The Benefits of Identity Resolution for Marketers

Over the past few years, marketers have seen more and more of their data disappear. 

In 2020, we saw iOS 14 wipe out ad audiences and in 2024, we are seeing cookies crumble to the wayside. 

Oh, and did I mention GA4? 

The result is an urgent need for a solution that gives us our data back and helps us make smarter marketing decisions. 

Let’s look at a few ways identity resolution benefits marketers.

1. Personalized Marketing at Scale

With identity resolution, marketers can move beyond broad segmentation and generic messaging to tailor their outreach with precision. 

This capability is only possible because of the rich, unified customer profiles that identity resolution provides – preferences, behaviors, interaction histories, and more. 

With this data, marketers can craft messages that speak directly to the needs and interests of each customer, enhancing engagement and fostering loyalty.

Additionally, the scale at which personalized marketing can be achieved with identity resolution is totally unprecedented. 

Automated technologies and sophisticated algorithms allow for the processing of huge amounts of data, ensuring that personalized experiences can be delivered to not just hundreds, but millions of customers simultaneously. 

This scalability ensures that businesses of all sizes can leverage the power of personalization to compete effectively in their markets.

2. Enhanced Targeting and Retargeting Campaigns

As we’ve discussed many times, one of the biggest victims of privacy changes is retargeting. 

Campaigns that used to perform unbelievably well now barely make a dent. Unless you have extremely targeted and segmented first-party lists, you might as well kiss retargeting success goodbye.

That’s where identity resolution comes in. It gives you the first-party data you need to retarget efficiently. 

That first-party data can be put directly into your retargeting campaigns, allowing you to reach the right people, with the right message, in the right place.

The real bonus is that by accurately identifying and understanding customers across devices and platforms, marketers can ensure that their advertising dollars are being spent on the most receptive audiences. 

3. Attribution and Customer Journey Mapping

Remember when Google Analytics would tell you which pages a user landed on, which pages they visited, and which channel they converted from?

That feels like a lifetime ago and I hate to tell you, it’s not getting any better. 

Unless….

You guessed it – identity resolution!

Identity resolution gives us back some of the attribution and customer journey mapping data we used to have. 

If you are using the Customers.ai Website Visitor ID X-Ray pixel for example, you can see the pages each user visited, if they were sent an email, and if they made a purchase. Crazy right?!

By accurately linking each interaction to the same individual, we can map out the entire customer journey, from initial awareness through to conversion. 

This granular view allows marketers to attribute conversions and other key actions to the specific channels, campaigns, and messages that influenced them, providing invaluable insights into what works and what doesn’t.

Even better, understanding the customer journey at this level helps marketers to optimize their strategies in real time, reallocating resources to the most effective channels and touchpoints. 

The role of identity resolution in marketing cannot be overstated. By leveraging the comprehensive insights provided by identity resolution, marketers can deliver the right message, to the right person, at the right time, maximizing impact and driving sales.

Future Trends in Identity Resolution

Identity resolution may not be a new concept but it does feel like it’s starting to make waves in the marketing space.

Previously, CDPs and identity resolution platforms were only accessible to big companies with big budgets. 

Not anymore. 

Emerging trends and technologies, such as AI and blockchain, are making this technology more accessible and redefining how marketers understand and interact with their audiences. 

These advancements promise not only to enhance the accuracy and efficiency of marketing programs but also to reshape consumer experiences in exciting new ways. Let's look at a couple of trends impacting the future of identity resolution.

The AI Revolution in Identity Resolution

AI is already taking identity resolution to new heights (hello….CUSTOMERS.AI). 

With its ability to process and analyze data at unprecedented scales, AI is improving the accuracy of matching algorithms, enabling more precise identification of individuals across devices and platforms. 

This leap in precision will allow for even more personalized marketing strategies, as businesses will be able to understand their customers’ needs and preferences in real time, responding with tailored messages that hit the mark every time.

Moreover, AI’s predictive capabilities are set to revolutionize customer journey mapping, forecasting future behaviors based on past interactions. 

Blockchain: Privacy and Security

Customers.ai is a 100% privacy-compliant solution but we do understand people’s concerns when they hear about this groundbreaking technology.

As consumers become more aware of their digital footprint, the demand for transparent and secure data practices continues to rise.

Blockchain technology offers a promising solution to the growing concerns around privacy and data security in identity resolution. 

By creating a decentralized record of customer identities, blockchain can provide a secure framework that respects user privacy while still allowing for personalized marketing. 

This balance between personalization and privacy is the holy grail for marketers, and blockchain could be the key to achieving it.

Marketing Technology Integrations

The potential for identity resolution to integrate seamlessly with other marketing technologies is immense. 

The goal? A unified marketing ecosystem where every piece of customer data is leveraged to its fullest potential, delivering a cohesive and personalized customer experience across all touchpoints.

Such a connected ecosystem would not only streamline marketing operations but also provide deeper insights into customer behavior, enabling more strategic decision-making. 

As identity resolution becomes more intertwined with other marketing technologies, we can expect to see a more holistic approach to customer engagement, where every interaction is informed by a comprehensive understanding of the individual’s journey.

Implementing Identity Resolution in Your Marketing Strategy

If you aren’t convinced identity resolution is the future of marketing, we encourage you to continue exploring the data that is out there. 

If you are, and you're ready to take the next step, then Customers.ai is a great entry point:

Contact Capture & Data Enrichment

At the heart of Customers.ai is our ability to identify website visitors. This includes names, emails, business profiles, company data, LinkedIn profiles, and more. 

Additionally, once we have contacts captured, we can enrich each individual with additional data points.

Precision Targeting and Personalization

With Customers.ai, precision targeting becomes more than a buzzword—it’s a reality. 

Our X-Ray tool allows you to not only capture visitor data but also segment it into specific audiences.

This capability allows for the delivery of highly personalized content, offers, and messages that resonate with the individual preferences and needs of each customer.

Enhanced Attribution and Journey Mapping

The attribution and customer journey mapping that was taken away is no longer out of reach.

By providing a clear and cohesive view of the customer journey across channels and devices, marketers can attribute conversions accurately and understand the impact of various touchpoints. 

This insight is invaluable for optimizing marketing strategies, reallocating budget to the most effective channels, and tailoring customer experiences to encourage loyalty and repeat business.

Ready to Take the Next Step in Your Identity Resolution Journey?

Customers.ai represents a pivotal advancement in identity resolution, offering marketers the tools they need to engage their audiences more effectively, understand their behaviors and preferences in greater depth, and drive meaningful interactions at every touchpoint. 

With Customers.ai, businesses are well-equipped to meet the challenges of modern marketing and harness the full power of personalized engagement.

If you are ready to get started, Customers.ai is free to try and easy to set up.

To install the Website Visitor ID X-Ray Pixel, sign up (for FREE!), go to your dashboard, and navigate to My Automations. 

Select + New Automation and get your pixel. We have easy install options for Google Tag Manager, WordPress, and Shopify, or you can install the pixel manually.

If you want to learn more, don't hesitate to contact us or reach out to our sales team with questions.

Important Next Steps

See what targeted outbound marketing is all about. Capture and engage your first 500 website visitor leads with Customers.ai X-Ray website visitor identification for free.

Talk and learn about sales outreach automation with other growth enthusiasts. Join Customers.ai Island, our Facebook group of 40K marketers and entrepreneurs who are ready to support you.

Advance your marketing performance with Sales Outreach School, a free tutorial and training area for sales pros and marketers.

Convert Website Visitors into Real Contacts!

Identify who is visiting your site with name, email and more. Get 500 contacts for free!


Identity Resolution FAQs

Q. What is identity resolution in marketing?

Identity resolution in marketing is the process of integrating multiple identifiers across different devices and platforms to build a cohesive, unified view of an individual consumer, enabling personalized marketing strategies.

Q. How does identity resolution work?

Identity resolution works by collecting data from various sources, matching and linking identifiers like email addresses, social media profiles, and device IDs to create a comprehensive profile of a consumer’s interactions with a brand.

Q. Why is identity resolution important?

Identity resolution is crucial for understanding consumer behavior across channels, enabling personalized marketing, improving customer experiences, and increasing the effectiveness of targeting and retargeting campaigns.

Q. What are the benefits of identity resolution?

Benefits include enhanced targeting accuracy, personalized customer experiences, improved attribution and ROI, and the ability to engage consumers consistently across multiple touchpoints.

Q. What challenges does identity resolution face?

Challenges include data privacy regulations, the phasing out of third-party cookies, data fragmentation, and ensuring accuracy and completeness of customer profiles.

Q. How does identity resolution impact data privacy?

Identity resolution must balance personalized marketing with data privacy, adhering to regulations like GDPR and CCPA by managing consumer data transparently and securely.

Q. What technologies are used in identity resolution?

Technologies include Data Management Platforms (DMPs), Customer Relationship Management (CRM) systems, Artificial Intelligence (AI), Machine Learning (ML), and blockchain for secure data management.

Q. How does AI enhance identity resolution?

AI enhances identity resolution by improving the accuracy of data matching and prediction of consumer behavior, allowing for more sophisticated personalization and targeting.

Q. Can identity resolution work without third-party cookies?

Yes, identity resolution can work without third-party cookies by relying on first-party data, contextual targeting, and emerging technologies like AI and blockchain for data matching.

Q. What is deterministic matching in identity resolution?

Deterministic matching involves linking customer data across channels and devices using precise identifiers like email addresses or phone numbers, ensuring high accuracy.

Q. What is probabilistic matching in identity resolution?

Probabilistic matching estimates the likelihood that different identifiers belong to the same consumer, using data points like IP addresses and device types, useful when precise data is limited.

Q. How do identity graphs facilitate identity resolution?

Identity graphs compile and connect individual data points across channels and devices into a unified profile, enabling a comprehensive view of consumer behavior.

Q. What role does identity resolution play in attribution modeling?

Identity resolution enables accurate attribution modeling by linking every touchpoint and conversion to the correct consumer journey, clarifying the impact of each marketing channel.

Q. How can marketers improve identity resolution?

Marketers can improve identity resolution by investing in quality data sources, adopting advanced technologies like AI, and ensuring compliance with privacy regulations.

Q. What is the future of identity resolution?

The future of identity resolution includes increased reliance on AI and machine learning, the integration of blockchain for security, and adaptation to privacy-focused web environments.

Q. How does identity resolution enhance customer journey mapping?

Identity resolution provides detailed insights into each consumer’s path to purchase, allowing marketers to tailor experiences and communications at every stage of the journey.

Q. Can identity resolution improve online advertising?

Yes, by enabling precise targeting and retargeting based on unified customer profiles, identity resolution significantly improves the relevance and effectiveness of online advertising.

Q. What is the difference between identity resolution and data enrichment?

Identity resolution focuses on linking customer data across touchpoints to build a unified profile, while data enrichment involves adding external data to existing profiles to enhance understanding.

Q. How does blockchain technology benefit identity resolution?

Blockchain offers a secure, transparent framework for managing identity data, potentially enhancing privacy and trust in personalized marketing practices.

Q. What is the impact of GDPR on identity resolution?

GDPR requires stringent data protection measures and consumer consent, impacting how identity resolution processes handle and link EU citizens’ data.

Q. How do CRM systems complement identity resolution?

CRM systems provide a central repository for customer data, which, when integrated with identity resolution practices, enhances the understanding and personalization of customer interactions.

Q. How does identity resolution support omnichannel marketing?

Identity resolution enables consistent, personalized marketing messages across all channels by providing a unified view of the customer, essential for omnichannel strategies.

Q. What are identity graphs and how are they created?

Identity graphs are databases that link an individual’s identifiers across different platforms and devices, created by aggregating and matching data from various sources.

Q. How does identity resolution affect retargeting campaigns?

Identity resolution improves retargeting campaigns by accurately identifying and following consumers across devices, leading to more relevant and effective ad placements.

Q. What is the significance of first-party data in identity resolution?

First-party data is crucial for identity resolution in a privacy-focused era, offering a reliable, consent-based source of customer information for personalization efforts.

Q. How can businesses ensure accuracy in identity resolution?

Businesses can enhance accuracy by continually updating their data, utilizing advanced matching algorithms, and integrating various data sources for a comprehensive view.

Q. What role does machine learning play in identity resolution?

Machine learning algorithms analyze and predict consumer behaviors, improving the efficiency and accuracy of matching customer data across platforms and devices.

Q. How can identity resolution drive customer loyalty?

By enabling personalized interactions that meet consumers’ needs and preferences, identity resolution fosters positive experiences, enhancing customer satisfaction and loyalty.

Q. What are the best practices for implementing identity resolution?

Best practices include prioritizing data quality, ensuring privacy compliance, leveraging AI and machine learning for data matching, and integrating identity resolution with broader marketing strategies.

Q. How does identity resolution influence marketing ROI?

Identity resolution enhances marketing ROI by improving targeting accuracy, reducing ad waste, and delivering personalized experiences that drive conversions and customer retention.
The post Is Identity Resolution the Future of Marketing? appeared first on Customers.ai.

Meet MouSi: A Novel PolyVisual System that Closely Mirrors the Complex …

Current challenges faced by large vision-language models (VLMs) include limitations in the capabilities of individual visual components and issues arising from excessively long visual tokens. These challenges pose constraints on the model’s ability to accurately interpret complex visual information and lengthy contextual details. Recognizing the importance of overcoming these hurdles for improved performance and versatility, this paper introduces a novel approach!

The proposed solution involves leveraging ensemble expert techniques to synergize the strengths of individual visual encoders, encompassing skills in image-text matching, OCR, and image segmentation, among others. This methodology incorporates a fusion network to harmonize the processing of outputs from diverse visual experts, effectively bridging the gap between image encoders and pre-trained language models (LLMs).

Numerous researchers have highlighted deficiencies in the CLIP encoder, citing challenges such as its inability to reliably capture basic spatial factors in images and its susceptibility to object hallucination. Given the diverse capabilities and limitations of various vision models, a pivotal question arises: How can one harness the strengths of multiple visual experts to synergistically enhance overall performance?

Inspired by biological systems, the approach taken here adopts a poly-visual-expert perspective, akin to the operation of the vertebrate visual system. In the pursuit of developing Vision-Language Models (VLMs) with poly-visual experts, three primary concerns come to the forefront:

The effectiveness of poly-visual experts

Optimal integration of multiple experts

Prevention of exceeding the maximum input length of the LLM when multiple visual experts are used

A candidate pool comprising six renowned experts, including CLIP, DINOv2, LayoutLMv3, Convnext, SAM, and MAE, was constructed to assess the effectiveness of multiple visual experts in VLMs. Employing LLaVA-1.5 as the base setup, single-expert, double-expert, and triple-expert combinations were explored across eleven benchmarks. The results, as depicted in Figure 1, demonstrate that with an increasing number of visual experts, VLMs gain richer visual information (attributed to more visual channels), leading to an overall improvement in the upper limit of multimodal capability across various benchmarks.

Figure 1: Left: Comparing InstructBLIP, Qwen-VL-Chat, and LLaVA-1.5-7B, poly-visual-expert MouSi achieves SoTA on a broad range of nine benchmarks. Right: Performance of the best models with different numbers of experts on nine benchmark datasets. Overall, triple experts are better than double experts, which in turn are better than a single expert.

Furthermore, the paper explores various positional encoding schemes aimed at mitigating issues associated with lengthy image feature sequences. This addresses concerns related to position overflow and length limitations. For instance, in the implemented technique, there is a substantial reduction in positional occupancy in models like SAM, from 4096 to a more efficient and manageable 64 or even down to 1.

Experimental results showcased the consistently superior performance of VLMs employing multiple experts compared to isolated visual encoders. The integration of additional experts marked a significant performance boost, highlighting the effectiveness of this approach in enhancing the capabilities of vision-language models. The authors illustrate that the poly-visual approach significantly elevates the performance of VLMs, surpassing the accuracy and depth of understanding achieved by existing models.

The demonstrated results align with the hypothesis that a cohesive assembly of expert encoders can bring about a substantial enhancement in the ability of VLMs to handle intricate multimodal inputs. To wrap it up, the research shows that combining different visual experts makes VLMs work better: the models understand complex information more effectively, current shortcomings are addressed, and the resulting VLMs are stronger. In the future, this approach could change how we bring vision and language together!

Check out the Paper and Github. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and Google News. Join our 36k+ ML SubReddit, 41k+ Facebook Community, Discord Channel, and LinkedIn Group.

If you like our work, you will love our newsletter.

Don’t Forget to join our Telegram Channel

The post Meet MouSi: A Novel PolyVisual System that Closely Mirrors the Complex and Multi-Dimensional Nature of Biological Visual Processing appeared first on MarkTechPost.