A Survey Report on New Strategies to Mitigate Hallucination in Multimodal Large Language Models

Multimodal large language models (MLLMs) represent a cutting-edge intersection of language processing and computer vision, tasked with understanding and generating responses that consider both text and imagery. These models, evolving from their predecessors that handled either text or images, are now capable of tasks that require an integrated approach, such as describing photographs, answering questions about video content, or even assisting visually impaired users in navigating their environment.

A pressing issue these advanced models face is known as ‘hallucination.’ This term describes instances where MLLMs generate responses that seem plausible but are factually incorrect or not grounded in the visual content they are supposed to analyze. Such inaccuracies can undermine trust in AI applications, especially in critical areas like medical image analysis or surveillance systems, where precision is paramount.

Efforts to address these inaccuracies have traditionally focused on refining the models through sophisticated training regimes involving vast datasets of annotated images and text. Despite these efforts, the problem persists, largely due to the inherent complexities of teaching machines to interpret and correlate multimodal data accurately. For instance, a model might describe elements in a photograph that are not present, misinterpret the actions in a scene, or fail to recognize the context of the visual input.

Researchers from the National University of Singapore, Amazon Prime Video, and AWS Shanghai AI Lab have surveyed methodologies to reduce hallucinations. One surveyed approach modifies the standard training paradigm by introducing novel alignment techniques that enhance the model’s ability to correlate specific visual details with accurate textual descriptions. This approach also involves a critical evaluation of data quality, focusing on the diversity and representativeness of the training sets to prevent the common data biases that lead to hallucinations.

Quantitative improvements in several key performance metrics underscore the efficacy of the studied models. For instance, in benchmark tests involving image caption generation, the refined models demonstrated a 30% reduction in hallucination incidents compared to their predecessors, and their ability to accurately answer visual questions improved by 25%, reflecting a deeper understanding of the visual-textual interface.

In conclusion, the survey of multimodal large language models examines the significant challenge of hallucination, which has been a stumbling block in realizing fully reliable AI systems. The surveyed solutions not only advance the technical capabilities of MLLMs but also enhance their applicability across various sectors, promising a future where AI can be trusted to interpret and interact with the visual world accurately. This body of work charts a course for future developments in the field and serves as a benchmark for ongoing improvements in AI’s multimodal comprehension.


Top Low/No Code AI Tools 2024

Applications that take advantage of machine learning in novel ways are being developed thanks to the rise of Low-Code and No-Code AI tools and platforms. AI can be used to create web services and customer-facing apps to coordinate sales and marketing efforts better. Minimal coding expertise is all that’s needed to make use of Low-Code and No-Code solutions.

Artificial intelligence technologies that require little to no coding reflect a long-sought objective in computer science. No-code is a software design approach that delivers working software without writing a single line of code, while low-code is a development technique that promotes faster app delivery with minimal hand-coding; low-code platforms are tools that support the visual development of apps through a graphical interface. Most of these tools offer a simple drag-and-drop experience, providing code-free or low-code environments for building AI applications.

Top low-code and no-code AI tools include the following:

MakeML

Use MakeML to generate machine-learning models for object identification and segmentation without hand-coding. It simplifies the process of creating and efficiently managing a large dataset. In addition to preparing your ML models for action, you can also test them. MakeML is an online resource that can teach you all you need to know to build AI software and apply Computer Vision to an in-house problem in only a few hours. Video tutorials are also available on your mobile device to help you master Machine Learning. The skilled professionals at MakeML will assist you in developing a Computer Vision solution and incorporating it into your product. A single GPU cloud training and limited dataset import/export are provided at no cost.

Obviously AI

With Obviously AI’s Machine Learning platform, you can make accurate predictions in minutes without knowing how to code. This entails creating machine learning algorithms and forecasting their results with a single mouse click. Use the data dialog to modify your dataset without additional code, then distribute or showcase your ML models across your organization. The low-code API allows anyone to use the algorithms to make predictions and incorporate those forecasts into their real-world applications. Furthermore, Obviously AI gives you access to state-of-the-art algorithms and technologies without compromising efficiency. It can be used for revenue forecasting, supply chain planning, and targeted advertising. Lead conversion, dynamic pricing, loan payback, and other outcomes can all be forecast in real time.

SuperAnnotate

Create AI-Powered SuperData using SuperAnnotate. It’s an end-to-end system for AI-related tasks, including annotating, managing, and versioning “ground truth” data. With its extensive toolkit, top-tier annotation services, and solid data management system, your AI pipeline can be scaled and automated three to five times faster. It supports high-throughput annotation of video, text, and image data to create high-quality datasets using industry-leading services and software. Project management tools and teamwork features help your model succeed in the field. Set up a streamlined annotation workflow, keep tabs on project quality, share updates with the team, and more, all with SuperAnnotate. Its active learning and automation features can speed up your annotation process.

Teachable Machine

Teachable Machine allows you to teach a computer to recognize and respond to your voice, gestures, and photos. Without the need to write any code, it facilitates the rapid creation of robust ML models for integration into applications, websites, and more. Teachable Machine is a web-based low-code machine learning platform that enables the development of widely usable machine learning models. You’ll need to collect and organize examples into relevant classes to teach a computer something new. You may put your computer through its paces as a learning machine and then immediately put it to the test. You can use the model in your online projects. You can also host the model online or distribute it as a downloadable file. And the best part is the model works completely locally on your device, so none of your audio or video has to leave the system at any point. Classifying photos and body orientations is a breeze with the help of files, a camera, and short audio samples. 

Apple’s Create ML

Discover an innovative approach to building and training ML models on your Mac. Apple’s Create ML facilitates efficient ML model creation and training on the Mac. In a single project, you can train numerous models simultaneously, each with a unique dataset. It can use an external graphics processing unit to speed up model training on your Mac. Take charge of training with options such as pausing and resuming. The evaluation set tells you how well your model performs, and you can examine key metrics and their relationships to spot model-improving opportunities. Preview the model’s performance continuously using the camera on your iPhone, and train models more quickly on your Mac by using its hardware accelerators. Create ML supports various model types, including images, video, sound, speech, text, and tabular data. Afterward, you can train your computer with new information and settings.

PyCaret

You can automate your machine-learning workflows in Python with the help of PyCaret, a low-code machine-learning library. With this straightforward library, you can devote more effort to analysis, such as data preprocessing, model training, model explainability, MLOps, and exploratory data analysis, and less to writing code. PyCaret is built modularly: each module groups the functions that carry out a particular machine-learning task. Using PyCaret, virtually anyone can create complete, low-code machine-learning solutions. A Quick Start Guide, blog, videos, and online forums are all available for study. Create a basic ML app, train your model rapidly, and then instantly deploy it as a REST API after analyzing and refining it.
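
As a rough illustration of how little code a PyCaret workflow needs, the following sketch uses PyCaret’s classification module; the file name and target column are hypothetical stand-ins for your own tabular dataset.

```python
# A minimal sketch of a PyCaret classification workflow (hypothetical dataset).
import pandas as pd
from pycaret.classification import setup, compare_models, predict_model, save_model

df = pd.read_csv("customer_churn.csv")            # hypothetical file and columns

setup(data=df, target="churned", session_id=42)   # preprocessing and experiment setup
best = compare_models()                           # trains and ranks several candidate models
scored = predict_model(best, data=df)             # appends predictions to the dataframe
save_model(best, "churn_model")                   # persists the pipeline for deployment
```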

Lobe

Use Lobe to teach your apps to recognize plants, read gestures, track reps, recognize emotions, detect colors, and verify safety. It facilitates the training of ML models, provides accessible and free tools, and supplies everything required to develop such models. Provide examples of the behavior you would like your application to learn, and a machine-learning model will be trained automatically and be ready for release as soon as possible. This platform requires no coding experience and may be used by anyone. You can save money and time by skipping online storage and instead training locally on your PC. Lobe can be downloaded on both PCs and Macs. Furthermore, your model is cross-platform and ready for export or distribution, and the ideal machine-learning architecture for your project is chosen automatically.

MonkeyLearn

MonkeyLearn provides state-of-the-art artificial intelligence tools that make cleaning, visualizing, and labeling client feedback a breeze. It is a data visualization and no-code text analysis studio that comprehensively analyzes your data. MonkeyLearn allows you to quickly and easily generate custom data visualizations and charts, enabling more in-depth data exploration, and you can merge and filter these findings based on data inputs like date ranges and custom fields. In addition to using pre-made machine learning models, you can create your own with MonkeyLearn. Various pre-trained models, such as sentiment analysis, topic classifiers, and entity extractors, are available out of the box, and custom models can be built rapidly.

Akkio

Akkio is a platform for artificial intelligence that doesn’t require users to write any code to build prediction models. It facilitates the easy creation of predictive models from user data for improved in-the-moment decision-making. Key business results, such as enhanced lead scoring, forecasting, text classification, and reduced churn, can be predicted with the help of Akkio’s use of existing data. It can also do advanced tasks for cleaning data, like merging columns, reshaping dates, and filtering out anomalies. Because of its intuitive interface, Akkio may be utilized by non-technical business users without the requirement for coding or machine learning knowledge. It may reduce time and increase output in various settings, from marketing and sales to finance and customer support.

Amazon SageMaker

Machine learning (ML) models can be created, trained, and deployed with the help of Amazon SageMaker, a cloud-based ML platform that offers a full suite of ML-related tools and services. SageMaker’s no-code and low-code tools streamline the machine learning (ML) model development and deployment processes for non-technical users and business analysts. Amazon SageMaker Canvas is a visual tool that facilitates ML model development and deployment without writing code. SageMaker Canvas’s intuitive drag-and-drop interface streamlines the processes of data selection, algorithm selection, and model training. SageMaker Canvas may then make predictions and put the trained model into production.

Data Robot

Data Robot is an artificial intelligence platform that streamlines the entire lifecycle of machine learning model development, deployment, and management. It’s a robust resource that serves many users, from data scientists and engineers to businesspeople. Data Robot’s flexible features make it a solid pick for those with little programming experience. It offers a visual, drag-and-drop interface for non-technical people to create and deploy machine learning models, paving the way for business users with rudimentary technical skills to experiment with AI. Data Robot’s adaptable interface makes machine learning customization easier for non-programmers, including integration with external systems and the ability to write one’s own programs.

Google AutoML

With Google’s AutoML, programmers and data scientists can create and release machine learning models without using hand-coded solutions. If you have little experience with machine learning, you can still use this platform to construct models because it requires little to no coding. Google AutoML provides a library of pre-trained models that may be used in various scenarios. These models are accurate because they are trained on large datasets. With Google AutoML, creating and deploying models is as straightforward as dragging and dropping components. It may be used without having to learn how to code. Google AutoML takes care of tuning your models’ hyperparameters automatically. Time and energy are both conserved by this method. You may check how well your models are doing with the help of Google’s AutoML tools. This aids in making sure your models are trustworthy and correct.

Nanonets

NanoNets is a machine learning API that allows developers to train a model with only a tenth of the data and no prior experience with machine learning. Upload your data, wait a few minutes, and you will have a model that can be queried via their simple cloud API. Extracting structured or semi-structured data from documents is made faster and more efficient by this AI platform. The OCR technology powered by artificial intelligence can read documents of any size or complexity. The document processing workflow can be streamlined using Nanonets’ AP Automation, Touchless Invoice Processing, Email Parsing, and ERP Integrations, among other services. In addition to PDF to Excel, CSV, JSON, XML, and Text conversion, Nanonets comes with various free OCR converters.

IBM Watson Studio

IBM Watson Studio is a service that provides a central hub from which anybody can create, release, and manage AI models in the cloud. It offers features and tools that make AI development accessible to people with little coding skills. Watson Studio’s no- or low-code features are a major selling point. It’s now possible to construct AI models without resorting to custom coding. Instead, you can utilize Watson Studio’s visual tools to assemble your project by dragging and dropping individual components into place. This paves the way for non-technical people, including business users, analysts, and researchers, to construct AI models. You can get up and running quickly with Watson Studio and its many pre-trained models. Uses for these models range from spotting fraudulent activity and client segmentation to predicting the need for repairs. After finishing an AI model in Watson Studio, you can send it into production. Watson Studio allows for both cloud-based and on-premises deployments and hybrid implementations that combine the two.

H2O Driverless AI

H2O Driverless AI is an AutoML platform streamlining the machine learning lifecycle, from preprocessing data to releasing models. This is a priceless tool for data scientists and business users since it allows them to build and deploy machine learning models without writing code. H2O Driverless AI uses several methods, including imputation, modification, and selection, to autonomously engineer features from your data. In machine learning, feature engineering is frequently the most time-consuming step, so this might be a huge time saver. Decision trees, random forests, support vector machines, and neural networks are some machine learning models that H2O Driverless AI can automatically construct and analyze. In addition, it optimizes your data by adjusting the hyperparameters of each model. With H2O Driverless AI, your models are instantly deployed to production, where they may be used in making predictions.

Domino Data Lab

Domino Data Lab is a cloud-based service that facilitates creating, deploying, and managing machine learning models for data scientists, engineers, and analysts. It’s a low- or no-code artificial intelligence tool for designing and automating data science operations. Domino Code Assist is a tool that can build Python and R code for frequent data science projects. This can reduce the learning curve for non-technical users and the workload for data scientists. Domino Data Lab facilitates effective teamwork on data science initiatives. Users can collaborate on projects by sharing and analyzing code, data, and models. Data science projects are 100% reproducible in Domino Data Lab. This allows anyone to replicate a project’s outcomes without obtaining the original data or source code. Domino Data Lab has several tools that can be used to manage data science initiatives. Access control, code history, and auditing of the model’s efficacy are all part of this.

CrowdStrike Falcon Fusion

Organizations may automate their security operations, threat intelligence, and incident response with the help of CrowdStrike Falcon Fusion, a security orchestration, automation, and response (SOAR) architecture. It is based on the CrowdStrike Falcon® platform and is provided at no extra cost to CrowdStrike subscribers. Falcon Fusion is a low- to no-code tool, making it accessible to organizations of all sizes in the security industry. The software’s drag-and-drop interface simplifies the process of developing and automating workflows. Falcon Fusion also features a library of pre-built connections with various security solutions, allowing easy and rapid integration with an organization’s pre-existing infrastructure. Artificial intelligence (AI) is leveraged by Falcon Fusion to facilitate automation and better judgment. For instance, the program may analyze security telemetry data for patterns, assign priorities to incidents, and suggest courses of action using artificial intelligence. Consequently, security personnel are better able to deal with threats.

RapidMiner

Data mining and machine learning models can be created and deployed quickly with RapidMiner, a comprehensive data science platform. Data preprocessing, feature engineering, model training, evaluation, and deployment are just some of its services. RapidMiner’s no/low-code methodology is a major selling point: you can create and release AI models without touching a single line of code. RapidMiner has a graphical user interface (GUI) that lets you build your models by dragging and dropping various building blocks, which makes it easier for non-technical users to get started with artificial intelligence. Alongside its no/low-code capabilities, RapidMiner also offers scripting support, which you can use to customize your models and extend RapidMiner with new functionality.


Meet StyleMamba: A State Space Model for Efficient Text-Driven Image Style Transfer

In a recent study, a team of researchers from Imperial College London and Dell introduced StyleMamba, an efficient framework for image style transfer that uses text prompts to direct the stylization process while preserving the original image content. StyleMamba addresses the computational demands and training inefficiencies of current text-guided stylization techniques.

Text-driven stylization is traditionally approached with large computational resources and drawn-out training procedures. With the introduction of a conditional State Space Model created especially for effective text-driven image style transfer, StyleMamba expedites this procedure. With this methodology, stylization can be precisely controlled by sequentially aligning image features with target text cues.

StyleMamba introduces two loss functions, a masked directional loss and a second-order directional loss, to guarantee both local and global style consistency between the images and the text prompts. These losses optimize the stylization direction and reduce the number of training iterations required by a factor of 5 and the inference time by a factor of 3.

The effectiveness of StyleMamba has been confirmed by numerous tests and qualitative analyses. The results verify that the robustness and overall stylization performance of the proposed method surpass current baselines. The framework provides a more effective and economical way to convert textual descriptions into visually appealing styles while maintaining the integrity and spirit of the original image content.

The team has summarized their primary contributions as follows. 

By incorporating a conditional Mamba into an AutoEncoder architecture, StyleMamba presents a simple yet powerful framework. With this integration, text-driven style transfer can be accomplished quickly and effectively, simplifying the procedure in comparison to current approaches.

StyleMamba uses dedicated loss functions to improve stylization quality. The masked directional loss and second-order directional loss ensure better global and local style consistency without sacrificing the original content of the images, and they speed up the stylization process.

StyleMamba’s effectiveness has been proven by thorough empirical analyses, which comprise both quantitative and qualitative evaluations. These tests demonstrate StyleMamba’s advantage in terms of both stylization quality and speed. 

Thanks to its ease of use and efficiency, StyleMamba has also been evaluated in settings beyond still-image style transfer. Experiments show how versatile and adaptable StyleMamba is across a range of applications and media formats, including multiple-style transfer tasks and video style transfer.


AWS DeepRacer enables builders of all skill levels to upskill and get …

In today’s technological landscape, artificial intelligence (AI) and machine learning (ML) are becoming increasingly accessible, enabling builders of all skill levels to harness their power. As more companies adopt AI solutions, there’s a growing need to upskill both technical and non-technical teams in responsibly expanding AI usage. Getting hands-on experience is crucial for understanding and applying ML concepts to automate tasks like content generation, language translation, and image classification. And that’s where AWS DeepRacer comes into play—a fun and exciting way to learn ML fundamentals.
Launched in 2019, DeepRacer is a fully managed service that enables builders of all skill levels to learn and perform model training and evaluation tasks such as defining a reward function, setting up the training parameters, and configuring a training job that can be evaluated and monitored for model performance in a simulated environment. By exploring the AWS DeepRacer ML training lifecycle, you’ll practice model training, evaluation, and deployment of ML models onto a 1/18th scale autonomous race car, using a human-in-the-loop experience. The model training and evaluation experience enables builders to familiarize themselves with similar concepts applicable in training and fine-tuning foundation models (FMs) that power generative AI applications.
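
To make the reward-function step concrete, here is a minimal example in the form the DeepRacer console expects: a Python reward_function(params) that reads values from the params dictionary supplied by the simulator. This center-line heuristic is purely illustrative, not a recommended racing strategy.

```python
def reward_function(params):
    """Reward the agent for staying close to the center line of the track."""
    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]

    # Three bands around the center line, with a sharp penalty near the edges.
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely off track

    return float(reward)
```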

AWS DeepRacer also offers a global racing league for competing alongside a community of ML enthusiasts, earning rewards and recognition while showcasing your ML skills. Through the AWS DeepRacer League, we have educated over 550,000 developers, crowned five AWS DeepRacer champions, recognized over 100 monthly virtual circuit winners, and rewarded over 10,000 participants worldwide with Amazon gift cards, cash prizes, and paid trips to AWS re:Invent to compete for the annual AWS DeepRacer Championship Cup.

The excitement around AWS DeepRacer extends far beyond just individual learners. To celebrate Women’s History Month, JPMorgan Chase & Co. recently hosted the “World’s Largest Global Women’s AWS DeepRacer League,” providing employees with a thrilling opportunity to gain hands-on ML experience through virtual autonomous vehicle racing. This event not only fostered a spirit of friendly competition but also celebrated empowerment and innovation in AI and ML. By embracing AWS DeepRacer, JPMorgan Chase showcased its commitment to democratizing ML knowledge and nurturing a culture of continuous learning, empowering its talented teams to drive the company’s AI transformation.

“I am super proud of the group, the firm and the TIF (Take it Forward) team. . . I couldn’t be more proud of a group of individuals being so self-motivated.  The sky is the limit from here!  Deep Racer is proof that learning can be fun.”
– Ebele Kemery, Head of JPMorgan Chase Tech, Data and AI Learning.

Initiatives like these demonstrate the far-reaching impact of AWS DeepRacer in bringing ML education to the forefront, inspiring learners of all backgrounds to embrace the future of intelligent technologies.

Whether you’re a seasoned developer or curious business professional, AWS DeepRacer provides a fun and exciting way to get started with AI. You’ll gain practical skills applicable to real-world ML and generative AI use cases. So get rolling with machine learning today!

About the authors
Ange Krueger is a principal AWS technologist. She leads product portfolio advancements and technological agility within the global financial sector. Utilizing over 200 AWS cloud services including leading AWS Artificial Intelligence, Machine Learning and Generative AI offerings, she delivers innovation, transformation, and scalable solutions that precisely address the complex demands of our global customers. Through a collaborative approach and a laser focus on customer-centric outcomes, Ange enhances customer experiences to achieve optimized business performance. Her commitment to continual improvement and customer obsession is unwavering, as she works to empower our clients with resilient, cloud-based financial services solutions.

Transform customer engagement with no-code LLM fine-tuning using Amazo …

Fine-tuning large language models (LLMs) creates tailored customer experiences that align with a brand’s unique voice. Amazon SageMaker Canvas and Amazon SageMaker JumpStart democratize this process, offering no-code solutions and pre-trained models that enable businesses to fine-tune LLMs without deep technical expertise, helping organizations move faster with fewer technical resources.
SageMaker Canvas provides an intuitive point-and-click interface for business users to fine-tune LLMs without writing code. It works both with SageMaker JumpStart and Amazon Bedrock models, giving you the flexibility to choose the foundation model (FM) for your needs.
This post demonstrates how SageMaker Canvas allows you to fine-tune and deploy LLMs. For businesses invested in the Amazon SageMaker ecosystem, using SageMaker Canvas with SageMaker JumpStart models provides continuity in operations and granular control over deployment options through SageMaker’s wide range of instance types and configurations. For information on using SageMaker Canvas with Amazon Bedrock models, see Fine-tune and deploy language models with Amazon SageMaker Canvas and Amazon Bedrock.
Fine-tuning LLMs on company-specific data provides consistent messaging across customer touchpoints. SageMaker Canvas lets you create personalized customer experiences, driving growth without extensive technical expertise. In addition, your data is not used to improve the base models, is not shared with third-party model providers, and stays entirely within your secure AWS environment.
Solution overview
The following diagram illustrates this architecture.

In the following sections, we show you how to fine-tune a model by preparing your dataset, creating a new model, importing the dataset, and selecting an FM. We also demonstrate how to analyze and test the model, and then deploy the model via SageMaker, focusing on how the fine-tuning process can help align the model’s responses with your company’s desired tone and style.
Prerequisites
First-time users need an AWS account and AWS Identity and Access Management (IAM) role with SageMaker and Amazon Simple Storage Service (Amazon S3) access.
To follow along with this post, complete the prerequisite steps:

Create a SageMaker domain, which is a collaborative machine learning (ML) environment with shared file systems, users, and configurations.
Confirm that your SageMaker IAM role and domain roles have the necessary permissions.
On the domain details page, view the user profiles.
Choose Launch by your profile, and choose Canvas.

Prepare your dataset
SageMaker Canvas requires a prompt/completion pair file in CSV format because it does supervised fine-tuning. This allows SageMaker Canvas to learn how to answer specific inputs with properly formatted and adapted outputs.
Download the following CSV dataset of question-answer pairs.
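
For reference, the dataset only needs two columns, which the walkthrough later maps to the input column (question) and output column (answer). A minimal sketch of building such a file, with illustrative rows, might look like this:

```python
# A minimal sketch of the prompt/completion CSV that SageMaker Canvas expects.
# The rows below are illustrative placeholders, not part of the sample dataset.
import pandas as pd

pairs = pd.DataFrame(
    {
        "question": [
            "What is the significance of the memory hierarchy in modern computer architectures?",
            "What does a SageMaker training job produce?",
        ],
        "answer": [
            "It organizes memory by speed and capacity, which determines how data is accessed and used.",
            "A set of model artifacts that can be deployed behind an inference endpoint.",
        ],
    }
)
pairs.to_csv("qa_pairs.csv", index=False)  # upload this file or place it in an S3 bucket
```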

Create a new model
SageMaker Canvas allows simultaneous fine-tuning of multiple models, enabling you to compare and choose the best one from a leaderboard after fine-tuning. For this post, we compare Falcon-7B with Falcon-40B.
Complete the following steps to create your model:

In SageMaker Canvas, choose My models in the navigation pane.
Choose New model.
For Model name, enter a name (for example, MyModel).
For Problem type, select Fine-tune foundation model.
Choose Create.

The next step is to import your dataset into SageMaker Canvas.

Create a dataset named QA-Pairs.
Upload the prepared CSV file or select it from an S3 bucket.
Choose the dataset.

SageMaker Canvas automatically scans it for any formatting issues. In this case, SageMaker Canvas detects an extra newline at the end of the CSV file, which can cause problems.

To address this issue, choose Remove invalid characters.
Choose Select dataset.

Select a foundation model
After you upload your dataset, select an FM and fine-tune it with your dataset. Complete the following steps:

On the Fine-tune tab, on the Select base models menu, choose one or more models you may be interested in, such as Falcon-7B and Falcon-40B.
For Select input column, choose question.
For Select output column, choose answer.
Choose Fine-tune.

Optionally, you can configure hyperparameters, as shown in the following screenshot.

Wait 2–5 hours for SageMaker to finish fine-tuning your models. As part of this process, SageMaker Autopilot splits your dataset automatically into an 80/20 split for training and validation, respectively. You can optionally change this split configuration in the advanced model building configurations.
SageMaker training uses ephemeral compute instances to efficiently train ML models at scale, without the need for long-running infrastructure. SageMaker logs all training jobs by default, making it straightforward to monitor progress and debug issues. Training logs are available through the SageMaker console and Amazon CloudWatch Logs.
Analyze the model
After fine-tuning, review your new model’s stats, including:

Training loss – The penalty for next-word prediction mistakes during training. Lower values mean better performance.
Training perplexity – Measures the model’s surprise when encountering text during training; it is typically computed as the exponential of the cross-entropy loss (see the quick check after this list). Lower perplexity indicates higher confidence.
Validation loss and validation perplexity – Similar to the training metrics, but measured during the validation stage.
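
Because perplexity is the exponentiated average cross-entropy loss, the two metrics should move together. A quick sanity check of the relationship (the loss value here is just an example):

```python
# Perplexity is the exponential of the average cross-entropy (next-token) loss,
# so a loss of 2.0 corresponds to a perplexity of roughly 7.4.
import math

training_loss = 2.0                      # example average loss reported by a job
perplexity = math.exp(training_loss)
print(f"loss={training_loss:.2f} -> perplexity={perplexity:.2f}")  # ~7.39
```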

To get a detailed report on your custom model’s performance across dimensions like toxicity and accuracy, choose Generate evaluation report (based on the AWS open source Foundation Model Evaluations Library). Then choose Download report.
The graph’s curve reveals if you overtrained your model. If the perplexity and loss curves plateau after a certain number of epochs, the model stopped learning at that point. Use this insight to adjust the epochs in a future model version using the Configure model settings.

The following is a portion of the report, which gives you an overall toxicity score for the fine-tuned model. The report includes explanations of what the scores mean.

A dataset consisting of ~320K question-passage-answer triplets. The questions are factual naturally-occurring questions. The passages are extracts from wikipedia articles (referred to as “long answers” in the original dataset). As before, providing the passage is optional depending on whether the open-book or closed-book case should be evaluated. We sampled 100 records out of 4289 in the full dataset.
Prompt Template: Respond to the following question with a short answer: $model_input
Toxicity detector model: UnitaryAI Detoxify-unbiased
Toxicity Score: A binary score from 0 (no toxicity detected) to 1 (toxicity detected) for the class: toxicity
Average Score: 0.0027243031983380205

Now that we have confirmed that the model has close to 0 toxicity detected according to the available toxicity models, let’s check out the model leaderboard to compare how Falcon-40B and Falcon-7B perform on dimensions like loss and perplexity.

On an order of magnitude, the two models performed about the same along these metrics on the provided data. Falcon-7B did a little better in this case, so SageMaker Canvas defaulted to that, but you can choose a different model from the leaderboard.
Let’s stick with Falcon-7B, because it performed slightly better and will run on more cost-efficient infrastructure.
Test the models
Although metrics and the report already provide insights into the performances of the models you’ve fine-tuned, you should always test your models by generating some predictions before putting them in production. For that, SageMaker Canvas allows you to use these models without leaving the application. To do that, SageMaker Canvas deploys for you an endpoint with the fine-tuned model, and shuts it down automatically after 2 hours of inactivity to avoid unintended costs.
To test the models, complete the following steps. Keep in mind that although fine-tuning can improve response style, it may not be a complete solution for providing factual accuracy. For factual accuracy, consider Retrieval Augmented Generation (RAG) architectures and continued pre-training.

Choose Test in Ready-to-Use Models and wait 15–30 minutes for your test endpoint to be deployed.

When the deployment is complete, you’ll be redirected to the SageMaker Canvas playground, with your model pre-selected.

Choose Compare and select the FM used for your custom model.
Enter a phrase directly from your training dataset, to make sure the custom model at least does better at such a question and is consistent with the level of verbosity provided in the fine-tuning data.

For this example, we enter the question, “What is the significance of the memory hierarchy in modern computer architectures?”
The fine-tuned Falcon-7B model responded succinctly, like you would expect from an FAQ document:
The memory hierarchy in modern computer architectures is the organization of memory storage within a computer system. The memory hierarchy is important because it determines how memory is accessed and used.
In contrast to the fine-tuned Falcon-7B, the base Falcon-7B model responded verbosely and with an odd beginning:
1 Answer | Add Yours [sic]
The memory hierarchy is the structure of the memory system in a computer system. It is a hierarchy because there are different levels of memory. The memory hierarchy is important because it determines how fast a computer can access memory.
The memory hierarchy is made up of levels of memory. The first level of memory is the main memory. This is the memory that is used for the data that is currently being processed. It is also used for the instructions that are currently being processed. The main memory is very fast and is able to access data very quickly.
The second level of memory is the cache memory. This is a level of memory that is much faster than the main memory. It is used to store data that is frequently accessed. It is also used to store instructions that are frequently accessed. The cache memory is much faster than the main memory.
The third level of memory is the disk memory. This is a level of memory that is much slower than the main memory and the cache memory. It is used to store data that is infrequently accessed. It is also used to store instructions that are infrequently accessed. The disk memory is much slower than the main memory and the cache memory.
The fourth level of memory is the secondary storage. This is a level of memory that is used to store data that is infrequently accessed. It is also used to store instructions that are infrequently accessed.

Let’s say you as a business user want to collaborate with your ML team on this model. You can send the model to your SageMaker model registry so the ML team can interact with the fine-tuned model in Amazon SageMaker Studio, as shown in the following screenshot.

Under the Add to Model Registry option, you can also see a View Notebook option. SageMaker Canvas offers a Python Jupyter notebook detailing your fine-tuning job, alleviating concerns about vendor lock-in associated with no-code tools and enabling detail sharing with data science teams for further validation and deployment.

Deploy the model with SageMaker
For production use, especially if you’re considering providing access to dozens or even thousands of employees by embedding the model into an application, you can deploy the model as an API endpoint. Complete the following steps to deploy your model:

On the SageMaker console, choose Inference in the navigation pane, then choose Models.
Locate the model with the prefix canvas-llm-finetuned- and timestamp.
Open the model details and note three things:

Model data location – A link to download the .tar file from Amazon S3, containing the model artifacts (the files created during the training of the model).
Container image – With this and the model artifacts, you can run inference virtually anywhere. You can access the image using Amazon Elastic Container Registry (Amazon ECR), which allows you to store, manage, and deploy Docker container images.
Training job – Stats from the SageMaker Canvas fine-tuning job, showing instance type, memory, CPU use, and logs.

Alternatively, you can use the AWS Command Line Interface (AWS CLI):

```bash
aws sagemaker list-models
```

The most recently created model will be at the top of the list. Make a note of the model name and the model ARN.
To start using your model, you must create an endpoint.

On the left navigation pane in the SageMaker console, under Inference, choose Endpoints.
Choose Create endpoint.
For Endpoint name, enter a name (for example, My-Falcon-Endpoint).
Create a new endpoint configuration (for this post, we call it my-fine-tuned-model-endpoint-config).
Keep the default Type of endpoint, which is Provisioned. Other options are not supported for SageMaker JumpStart LLMs.
Under Variants, choose Create production variant.
Choose the model that starts with canvas-llm-finetuned-, then choose Save.
In the details of the newly created production variant, scroll to the right to Edit the production variant and change the instance type to ml.g5.xlarge (see screenshot).
Finally, Create endpoint configuration and Create endpoint.

As described in Deploy Falcon-40B with large model inference DLCs on Amazon SageMaker, Falcon works only on GPU instances. You should choose the instance type and size according to the size of the model to be deployed and what will give you the required performance at minimum cost.

Alternatively, you can use the AWS CLI:

“`
config_name=”my-fine-tuned-model-endpoint-config”

aws sagemaker create-endpoint-config
–endpoint-config-name $config_name
–production-variants VariantName=”cool-variant”,ModelName=”canvas-llm-finetuned-2024-01-16-20-11-13-119791″,InstanceType=”ml.g5.xlarge”,InitialInstanceCount=1

aws sagemaker create-endpoint
–endpoint-name “my-fine-tuned-model-endpoint”
–endpoint-config-name $config_name
“`

Use the model
You can access your fine-tuned LLM through the SageMaker API, AWS CLI, or AWS SDKs.
Enrich your existing software as a service (SaaS), software platforms, web portals, or mobile apps with your fine-tuned LLM using the API or SDKs. These let you send prompts to the SageMaker endpoint using your preferred programming language. Here’s an example:

“`
import boto3
import json

# Create a SageMaker runtime client
sagemaker_runtime = boto3.client(‘sagemaker-runtime’)

# Specify your endpoint name
endpoint_name = ‘my-fine-tuned-model-endpoint’

def query_falcon_llm(question):
“””
Function to query the fine-tuned Falcon LLM endpoint with a specific question.
:param question: str, the question to ask the LLM.
:return: str, the answer from the LLM.
“””
# Define the prompt
prompt = f”You are a helpful Assistant. You answer questions in the style of technical answers everything about GPUs and Machine Learning. User: {question}n Assistant:”

# Define the payload with hyperparameters
payload = {
“inputs”: prompt,
“parameters”: {
“do_sample”: True,
“top_p”: 0.7,
“temperature”: 0.5,
“max_new_tokens”: 1024,
“repetition_penalty”: 1.03,
“stop”: [“nUser:”, “###”]
}
}

# JSONify the payload
payload_json = json.dumps(payload)

# Call the SageMaker endpoint
response = sagemaker_runtime.invoke_endpoint(EndpointName=endpoint_name,
ContentType=’application/json’,
Body=payload_json)

# Decode the response
response_body = json.loads(response[‘Body’].read().decode())

# Extract and format the answer
assistant_response = response_body[0][“generated_text”][len(prompt):]
assistant_response = assistant_response.replace(“nUser:”, “”).replace(“###”, “”).strip()

return assistant_response

# Example usage
question = ” What is the significance of the memory hierarchy in modern computer architectures?”
answer = query_falcon_llm(question)
print(f”Question: {question}nAnswer: {answer}”)

“`

For examples of invoking models on SageMaker, refer to the following GitHub repository. This repository provides a ready-to-use code base that lets you experiment with various LLMs and deploy a versatile chatbot architecture within your AWS account. You now have the skills to use this with your custom model.
Another repository that may spark your imagination is Amazon SageMaker Generative AI, which can help you get started on a number of other use cases.
Clean up
When you’re done testing this setup, delete your SageMaker endpoint to avoid incurring unnecessary costs:

“`

aws sagemaker delete-endpoint –endpoint-name “your-endpoint-name”

“`

After you finish your work in SageMaker Canvas, you can either log out or set the application to automatically delete the workspace instance, which stops billing for the instance.
Conclusion
In this post, we showed you how SageMaker Canvas with SageMaker JumpStart models enable you to fine-tune LLMs to match your company’s tone and style with minimal effort. By fine-tuning an LLM on company-specific data, you can create a language model that speaks in your brand’s voice.
Fine-tuning is just one tool in the AI toolbox and may not be the best or the complete solution for every use case. We encourage you to explore various approaches, such as prompting, RAG architecture, continued pre-training, postprocessing, and fact-checking, in combination with fine-tuning to create effective AI solutions that meet your specific needs.
Although we used examples based on a sample dataset, this post showcased these tools’ capabilities and potential applications in real-world scenarios. The process is straightforward and applicable to various datasets, such as your organization’s FAQs, provided they are in CSV format.
Take what you learned and start brainstorming ways to use language models in your organization while considering the trade-offs and benefits of different approaches. For further inspiration, see Overcoming common contact center challenges with generative AI and Amazon SageMaker Canvas and New LLM capabilities in Amazon SageMaker Canvas, with Bain & Company.

About the Author
Yann Stoneman is a Solutions Architect at AWS focused on machine learning and serverless application development. With a background in software engineering and a blend of arts and tech education from Juilliard and Columbia, Yann brings a creative approach to AI challenges. He actively shares his expertise through his YouTube channel, blog posts, and presentations.

Is There a Library for Cleaning Data before Tokenization? Meet the Unstructured Library for Seamless Pre-Tokenization Cleaning

In Natural Language Processing (NLP) tasks, data cleaning is an essential step before tokenization, particularly when working with text data that contains unusual word separations such as underscores, slashes, or other symbols in place of spaces. Since common tokenizers frequently rely on spaces to split text into distinct tokens, this problem can have a major impact on the quality of tokenization. 

This challenge emphasizes the necessity of having a specialized library or tool that can efficiently preprocess such data. To make sure that words are properly segmented before feeding them into NLP models, cleaning text data includes adding, deleting, or changing these symbols. Neglecting this preliminary stage may result in inaccurate tokenization, impacting subsequent tasks such as sentiment analysis, language modeling, or text categorization.
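
A tiny illustration of why this matters: a whitespace-based split treats an underscore- or slash-joined phrase as a single opaque token, so the individual words never reach the model.

```python
# Whitespace-based splitting cannot separate words joined by underscores or slashes.
text = "quarterly_report/revenue_figures for 2024"
print(text.split())  # ['quarterly_report/revenue_figures', 'for', '2024']
```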

The Unstructured library is a solution to this, as it provides an extensive range of cleaning operations that are specifically tailored to sanitize text output, thereby tackling the problem of cleaning data prior to tokenization. When working with unstructured data from many sources, including HTML, PDFs, CSVs, PNGs, and more, these capabilities are quite helpful because formatting problems, like unusual symbols or word separations, are frequently encountered. 

Unstructured specializes in extracting and converting complex data into AI-friendly formats that are optimized for Large Language Model (LLM) integration, like JSON. Because of the platform’s versatility in handling different document kinds and layouts, data scientists may effectively preprocess data at scale without being constrained by issues with format or cleaning. 

The main features of the platform, which are meant to make data workflows more efficient, are as follows.

Document Extraction: Unstructured is excellent at extracting metadata and document elements from a wide range of document types. This capacity to extract exact information guarantees the accurate acquisition of pertinent data for processing later on.

Broad File Support: Unstructured provides flexibility in managing several document formats, guaranteeing compatibility and adaptability across multiple platforms and use cases.

Partitioning: Structured material can be extracted from unstructured documents using Unstructured’s partitioning features. This function is essential for converting disorganized data into usable formats, which makes data processing and analysis more effective.

Cleaning: Unstructured includes cleaning functions that sanitize output, remove unwanted content, and guarantee data integrity, which improves the performance of downstream NLP tasks since well-prepared data is crucial for NLP models (see the sketch after this list).

Extracting: By locating and isolating particular entities inside documents, the platform’s extraction functionality makes data interpretation easier to understand and concentrates on pertinent information. 

Connectors: Unstructured offers high-performing connectors that optimize data workflows and support popular use cases, including Retrieval Augmented Generation (RAG), fine-tuning models, and pretraining models. These connectors enable fast data import and export.
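
As a rough sketch of the cleaning step referenced above: clean_extra_whitespace is a helper in unstructured.cleaners.core, while the underscore/slash replacement is a custom regex step added here for the word-separation problem described earlier.

```python
# A minimal pre-tokenization cleaning sketch using the Unstructured library.
import re

from unstructured.cleaners.core import clean_extra_whitespace

raw = "Quarterly_report/2024  contains   updated__revenue_figures"

# Custom step: replace underscores and slashes that stand in for spaces.
text = re.sub(r"[_/]+", " ", raw)

# Built-in Unstructured cleaner: collapse repeated whitespace.
text = clean_extra_whitespace(text)

print(text)  # "Quarterly report 2024 contains updated revenue figures"
```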

In conclusion, utilizing Unstructured’s extensive toolkit can expedite data preprocessing and cut down on the time spent on data collection and cleaning. This speeds up the creation and deployment of NLP solutions powered by LLMs by enabling researchers and developers to devote more time and resources to data modeling and analysis.

The Rise of Adversarial AI in Cyberattacks

In cybersecurity, while AI technologies have significantly bolstered our defense mechanisms against cyber threats, they have also given rise to a new era of sophisticated attacks. Let’s explore the darker side of AI advancements in the cybersecurity domain, focusing on its role in enhancing adversarial capabilities. From AI-powered phishing attacks that craft deceptively personal messages to advanced cryptographic attacks that challenge the integrity of encryption methods, let’s delve into how AI is reshaping the landscape of cyber warfare, presenting unprecedented challenges and opportunities for cybersecurity professionals.

AI-powered Social Engineering and Phishing Attacks

AI is reshaping the landscape of social engineering and phishing attacks, allowing for highly targeted and personalized campaigns. AI tools analyze vast datasets to identify potential targets, fine-tuning phishing messages that resonate with specific individuals. These messages are increasingly difficult to distinguish from legitimate communication, significantly increasing their effectiveness. The continuous improvement of generative AI models means they can adapt to counteract detection techniques, making traditional defenses less effective. 


Deepfakes and Synthetic Media for Deception

The use of AI-generated deepfakes and synthetic media in cyberattacks presents a growing threat, particularly in political misinformation and personal impersonation. These technologies can create convincing audio and visual content, leading to misinformation or manipulation of public opinion. The sophistication of these tools enables the creation of media that can be nearly impossible to differentiate from genuine content, raising significant concerns for security and misinformation. 

Evolving Malware and Ransomware with AI

AI also enhances malware’s capabilities, including ransomware, making these threats more adaptive, resilient, and difficult to detect. AI-driven malware can analyze its environment and modify its behavior to evade security measures. This includes learning from defensive responses and finding new vulnerabilities without human intervention. The increased use of AI in malware development suggests a future where automated threats can independently orchestrate attacks across networks. 


AI-enhanced Network Intrusions

AI is increasingly used to automate the process of network intrusion, allowing for rapid and sophisticated attacks. By leveraging AI, attackers can quickly analyze vast data to identify vulnerabilities and orchestrate network attacks. These AI-powered tools can mimic normal user behavior to evade detection systems and perform actions such as data theft, system disruption, or deploying further malware. AI-driven network intrusions represent a significant threat because they can operate at a scale and speed that human attackers cannot match. Integrating AI into network attacks necessitates advancements in equally sophisticated AI-driven security measures to effectively detect and neutralize these threats.


AI in Information Warfare

AI’s capabilities are being exploited in information warfare to automate the creation and dissemination of disinformation. This application of AI can influence public opinion, manipulate political outcomes, and destabilize societal cohesion. AI algorithms can generate believable news stories, social media posts, and even fake images or videos, spreading them across platforms where they can be difficult to distinguish from real information. The strategic use of such AI-generated content can profoundly affect public perception and discourse, making it a powerful tool in information warfare. Addressing this challenge requires robust mechanisms to detect AI-generated content and educate the public about the potential for misinformation.

AI for Exploiting IoT Vulnerabilities

The proliferation of IoT devices has expanded the attack surface for cyber threats, and AI is being used to exploit vulnerabilities in these devices. Attackers use AI to automate the discovery of unsecured IoT devices and deploy botnets or malicious software. This can lead to large-scale attacks, such as distributed denial of service (DDoS), which can impact infrastructure, steal data, or gain unauthorized access to networks. The ability of AI to learn and adapt makes it particularly effective at identifying new vulnerabilities as they emerge, challenging cybersecurity professionals to constantly update defenses.


AI and Cryptographic Attacks

AI is also making waves in cryptography by enabling more effective attacks on cryptographic algorithms. Through machine learning and pattern recognition techniques, AI systems can analyze encrypted data to find vulnerabilities without knowing the underlying encryption key. This can potentially lead to the decryption of sensitive data without authorization. The evolving capability of AI to break cryptographic protections faster than ever poses a significant threat to the security of data transmissions and stored information, urging the development of more resilient cryptographic methods that can withstand AI-driven attacks.



Analyzing the Impact of Flash Attention on Numeric Deviation and Training Stability in Large-Scale Machine Learning Models

The challenge of training large and sophisticated models is significant, primarily due to the extensive computational resources and time these processes require. This is particularly evident in training large-scale generative AI models, which are prone to frequent instabilities manifesting as disruptive loss spikes during extended training sessions. Such instabilities often lead to costly interruptions that necessitate pausing and restarting the training process, a challenge noted in models as expansive as LLaMA2’s 70-billion-parameter model, which required over 1.7 million GPU hours.

The root of these instabilities is often traced back to numeric deviations—small, cumulative errors in the computation process that can lead to significant deviations from expected training outcomes. Researchers have explored various optimization methods, including the Flash Attention technique, which aims to reduce the computational overhead in transformer models, a widely recognized bottleneck.

Flash Attention, a technique analyzed for its utility and efficiency, targets the attention mechanism, a crucial component of transformer models. It leverages a system of tiling and recomputation to process the attention mechanism’s large matrices more efficiently, minimizing the extensive memory usage that traditional methods incur. For instance, in specific implementations, Flash Attention has demonstrated a 14% increase in speed for both forward and backward passes in text-to-image models, highlighting its potential for enhancing training efficiency.

The method introduces certain computational nuances, such as rescaling factors necessary for managing data blocks within the model’s memory constraints. While beneficial for memory management, these rescaling factors introduce an additional layer of numeric deviation. Researchers from FAIR at Meta, Harvard University, and Meta have quantified this deviation, finding that Flash Attention introduces roughly ten times more numeric deviation than Baseline Attention at BF16 numerical precision. However, a more comprehensive analysis, like one utilizing the Wasserstein Distance, shows that this deviation is still 2-5 times less impactful than deviations from low-precision training.
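
To make the notion of numeric deviation concrete, the following minimal sketch (illustrative only, not the researchers’ methodology or code) compares a high-precision attention computation against a BF16 recomputation and reports both the maximum absolute difference and the Wasserstein distance between the flattened outputs:

import torch
from scipy.stats import wasserstein_distance

def attention(q, k, v):
    # Standard scaled dot-product attention
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

torch.manual_seed(0)
q, k, v = (torch.randn(128, 64, dtype=torch.float64) for _ in range(3))

ref = attention(q, k, v)                                   # high-precision reference
low = attention(q.bfloat16(), k.bfloat16(), v.bfloat16())  # BF16 variant

diff = (low.double() - ref).abs()
print("max abs deviation:", diff.max().item())
print("Wasserstein distance:",
      wasserstein_distance(ref.flatten().numpy(), low.double().flatten().numpy()))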

Despite the improvements in computational efficiency and memory usage, the numeric deviations associated with Flash Attention could still pose risks to model training stability. Analyzing these deviations is critical, allowing a deeper understanding of how they can impact long-term training stability. As such, while Flash Attention offers considerable advantages in terms of efficiency and speed, its broader implications on training stability require careful evaluation.

In conclusion, Flash Attention represents an advance in optimizing attention mechanisms within large-scale machine learning models. By efficiently managing computational demands and reducing memory usage, it marks a step forward in addressing the enduring challenge of training instabilities. However, the numeric deviations the method introduces underscore the need for ongoing analysis and potential refinement to ensure that these efficiencies do not inadvertently compromise the overall stability of model training. Thus, while Flash Attention provides a promising avenue for improving training processes, its implications for stability are not yet fully understood and warrant further investigation.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter. Join our Telegram Channel, Discord Channel, and LinkedIn Group.

If you like our work, you will love our newsletter..

Don’t Forget to join our 41k+ ML SubReddit
The post Analyzing the Impact of Flash Attention on Numeric Deviation and Training Stability in Large-Scale Machine Learning Models appeared first on MarkTechPost.

How LotteON built dynamic A/B testing for their personalized recommend …

This post is co-written with HyeKyung Yang, Jieun Lim, and SeungBum Shim from LotteON.
LotteON is transforming itself into an online shopping platform that provides customers with an unprecedented shopping experience based on its in-store and online shopping expertise. Rather than simply selling the product, they create and let customers experience the product through their platform.
LotteON has been providing various forms of personalized recommendation services throughout the LotteON customer journey and across its platform, from its main page to its shopping cart and order completion pages. Through the development of new, high-performing models and continuous experimentation, they’re providing customers with personalized recommendations, improving CTR (click-through rate) metrics and increasing customer satisfaction.
In this post, we show you how LotteON implemented dynamic A/B testing for their personalized recommendation system.
The dynamic A/B testing system monitors user reactions, such as product clicks, in real time from the recommended item lists provided. It dynamically assigns the most responsive recommendation model among multiple models to enhance the customer experience with the recommendation list. Built with Amazon SageMaker and other AWS services, the solution offers insights into real-world implementation know-how and practical use cases for deployment.

Defining the business problem
In general, there are two types of A/B testing that are useful for measuring the performance of a new model: offline testing and online testing. Offline testing evaluates the performance of a new model based on past data. Online A/B testing, also known as split testing, is a method used to compare two versions of a webpage, or in LotteON’s case, two recommendation models, to determine which one performs better. A key strength of online A/B testing is its ability to provide empirical evidence based on user behavior and preferences. This evidence-based approach to selecting a recommendation model reduces guesswork and subjectivity in optimizing both click-through rates and sales.
A typical online A/B test serves two models in a certain ratio (such as 5:5) for a fixed period of time (for example, a day or a week). When one model performs better than the other, the lower performing model is still served for the duration of the experiment, regardless of its impact on the business. To improve this, LotteON turned to dynamic A/B testing, which evaluates the performance of models in real time and dynamically updates the ratios at which each model is served, so that better performing models are served more often. To implement dynamic A/B testing, they used the multi-armed bandit (MAB) algorithm, which performs real-time optimizations.
LotteON’s dynamic A/B testing automatically selects the model that drives the highest click-through rate (CTR) on their site. To build their dynamic A/B testing solution, LotteON used AWS services such as Amazon SageMaker and AWS Lambda. By doing so, they were able to reduce the time and resources that would otherwise be required for traditional forms of A/B testing. This frees up their scientists to focus more of their time on model development and training.
Solution and implementation details
The MAB algorithm evolved from optimizing casino slot machine payouts. Whereas MAB is widely used to re-rank items such as news articles or products, in this implementation the selection (the arm) is a recommendation model. There are various MAB algorithms, such as ε-greedy and Thompson sampling.
The ε-greedy algorithm balances exploration and exploitation by choosing the best-known option most of the time, but randomly exploring other options with a small probability ε. Thompson sampling defines a Beta distribution for each option, with parameter alpha (α) representing the number of successes so far and beta (β) the number of failures. As the algorithm collects more observations, alpha and beta are updated, shifting the distributions toward the true success rate. The algorithm then randomly samples from these distributions to decide which option to try next, balancing exploitation of the best-performing options to date with exploration of less-tested options. In this way, MAB learns which model is best based on actual outcomes.
Based on LotteON’s evaluation of both ε-greedy and Thompson sampling, which considered the balance of exposure opportunities for the models under test, they decided to use Thompson sampling. Based on the number of clicks obtained, they could identify the more efficient model. For a hands-on workshop on dynamic A/B testing with MAB and Thompson sampling algorithms, see Dynamic A/B Testing on Amazon Personalize & SageMaker Workshop. LotteON’s goal was to serve recommendations in real time from the most CTR-efficient model.
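As an illustration of a single Thompson sampling decision (a minimal sketch with illustrative counts, not LotteON’s production code), each model gets one draw from its Beta posterior and the model with the largest draw is served:

import numpy as np

rng = np.random.default_rng(42)
models = {
    "model_a": {"alpha": 120, "beta": 1880},   # illustrative: 120 clicks, 1,880 non-clicks
    "model_b": {"alpha": 95, "beta": 1905},
}

# One Beta draw per model (with a +1 prior), then serve the model with the largest sample
samples = {name: rng.beta(p["alpha"] + 1, p["beta"] + 1) for name, p in models.items()}
chosen = max(samples, key=samples.get)
print(samples, "->", chosen)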

With each option (arm) configured as a model, the alpha value for each model was configured as the number of clicks and the beta value as the number of non-clicks. To apply the MAB algorithm to actual services, they introduced the bTS (batched Thompson sampling) method, which processes Thompson sampling on a batch basis. Specifically, they evaluated models based on traffic over a certain period of time (24 hours) and updated parameters at a certain time interval (1 hour).
In the handler part of the Lambda function, a bTS operation is performed that reflects the parameter values for each model (arm), and the click probabilities of the two models are calculated. The ID of the model with the highest probability of clicks is then selected. One thing to keep in mind when conducting dynamic A/B testing is not to start Thompson sampling right away: allow a warm-up period for sufficient exploration. To avoid prematurely declaring a winner due to small parameter values at the beginning of the test, you must collect an adequate number of impressions and click metrics.
Dynamic A/B test architecture
The following figure shows the architecture for the dynamic A/B test that LotteON implemented.

The architecture in the preceding figure shows the data flow of Dynamic A/B testing and consists of the following four decoupled components:
1. MAB serving flow
Step 1: The user accesses LotteON’s recommendation page.
Step 2: The recommendations API checks MongoDB for information about ongoing experiments with recommendation section codes and, if the experiment is active, sends an API request with the member ID and section code to the Amazon API Gateway.
Step 3: API Gateway provides the received data to Lambda. If there is relevant data in the API Gateway cache, a specific model code in the cache is immediately passed to the recommendation API.
Step 4: The Lambda function checks the experiment type (that is, dynamic A/B test or online static A/B test) in MongoDB and runs its algorithm. If the experiment type is dynamic A/B test, the alpha (number of clicks) and beta (number of non-clicks) values required for the Thompson sampling algorithm are retrieved from MongoDB and the algorithm is run. The Lambda function then delivers the selected model’s identifier to Amazon API Gateway.
Step 5: API Gateway provides the selected model’s identifier to the recommended API and caches the selected model’s identifier for a certain period of time.
Step 6: The recommendation API calls the model inference server (that is, the SageMaker endpoint) using the selected model’s identifier to receive a recommendation list and provides it to the user’s recommendation web page.
2. The flow of an alpha and beta parameter update
Step 1: The system powering LotteON’s recommendation page stores real-time logs in Amazon S3.
Step 2: Amazon EMR downloads the logs stored in Amazon S3.
Step 3: Amazon EMR processes the data and updates the alpha and beta parameter values to MongoDB for use in the Thompson sampling algorithm.
3. The flow of business metrics monitoring
Step 1: Streamlit pulls experimental business metrics from MongoDB to visualize.
Step 2: Monitor efficiency metrics such as CTR per model over time.
4. The flow of system operation monitoring
Step 1: When a recommended API call occurs, API Gateway and Lambda are launched, and Amazon CloudWatch logs are produced.
Step 2: Check system operation metrics using CloudWatch and AWS X-Ray dashboards based on CloudWatch logs.
Implementation Details 1: MAB serving flow mainly involving API Gateway and Lambda
The APIs that can serve MAB results—that is, the selected model—are implemented using serverless compute services, Lambda, and API Gateway. Let’s take a look at the implementation and settings.
1. API Gateway configuration
When a LotteON user signs in to the recommended product area, member ID, section code, and so on are passed to API Gateway as GET parameters. Using the passed parameters, the selected model can be used for inferencing during a certain period of time through the cache function of Amazon API Gateway.
2. API Gateway cache settings
Setting up a cache in API Gateway is straightforward. To set up the cache, first enable it by selecting the appropriate checkbox under the Settings tab for your chosen stage. After it’s activated, you can define the cache time-to-live (TTL), which is the duration in seconds that cached data remains valid. This value can be set anywhere up to a maximum of 3,600 seconds.

The API Gateway caching feature is limited to the parameters of GET requests. To use caching for a particular parameter, you should insert a query string in the GET request’s query parameters within the resource. Then select the Enable API Cache option. It is essential to deploy your API using the deploy action in the API Gateway console to activate the caching function.
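
The same settings can also be applied programmatically. The following boto3 sketch is a hypothetical example (the API ID, stage name, cache size, and TTL are placeholders) that mirrors the console steps described above:

import boto3

apigw = boto3.client("apigateway")
apigw.update_stage(
    restApiId="abc123",   # placeholder REST API ID
    stageName="prod",     # placeholder stage name
    patchOperations=[
        {"op": "replace", "path": "/cacheClusterEnabled", "value": "true"},
        {"op": "replace", "path": "/cacheClusterSize", "value": "0.5"},          # cache size in GB
        {"op": "replace", "path": "/*/*/caching/ttlInSeconds", "value": "300"},  # TTL for all methods
    ],
)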

After the cache is set, the same model is used for inference on specific customers until the TTL has elapsed. Following that, or when the recommendation section is first exposed, API Gateway will call Lambda with the MAB function implemented.
3. Add an API Gateway mapping template
When a Lambda handler function is invoked, it can receive the HTTPS request details from API Gateway as an event parameter. To provide a Lambda function with more detailed information, you can enhance the event payload using a mapping template in the API Gateway. This template is part of the integration request setup, which defines how incoming requests are mapped to the expected format of the Lambda function.
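
For example, a simple mapping template (shown here with a hypothetical name parameter) copies a query string parameter into the event payload that the Lambda function receives:

{
    "name": "$input.params('name')"
}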

The specified parameters are then passed to the Lambda function’s event parameters. The following code is an example of source code that uses the event parameter in Lambda.

def lambda_handler(event, context):
    # "name" is populated by the API Gateway mapping template
    event_param = event["name"]
    return {
        "message": event_param
    }

4. Lambda for Dynamic A/B Test
Lambda receives a member ID and section code as event parameter values. The Lambda function uses the received section code to run the MAB algorithm. In the case of the MAB algorithm, a dynamic A/B test is performed by getting the model (arm) settings and aggregated results. After updating the alpha and beta values according to bTS when reading the aggregated results, the probability of a click for each model is obtained through the beta distribution (see the following code), and the model with the maximum value is returned. For example, given model A and model B, where model B has a higher probability of producing a click-through event, model B is returned.

def select_variant(self):
    # Draw one sample per model (variant) from its Beta posterior and serve the best one
    probs = []
    for v in self.variant_metrics:
        success = v["mab_alpha"]   # number of clicks
        failure = v["mab_beta"]    # number of non-clicks
        probs.append(AlgorithmBase.random_beta(1 + success, 1 + failure))

    variant_index = AlgorithmBase.argmax(probs)

    return (self.variant_metrics[variant_index]["variant_name"], probs)

The overall implementation using the bTS algorithm, including the above code, was based on the Dynamic A/B testing for machine learning models with Amazon SageMaker MLOps projects post.
Implementation details 2: Alpha and beta parameter update
A product recommendation list is displayed to the LotteON user. When the user clicks on a specific product in the recommendation list, that data is captured and logged to Amazon S3. As shown in the following figure, LotteON used Amazon EMR to run Spark jobs that periodically pulled the logged data from Amazon S3, processed it, and inserted the results into MongoDB.

The results generated at this stage play a key role in determining the distribution used in MAB. The following impression and click data were examined in detail.

Impression and click data

Note: Before updating the alpha and beta parameters in bTS, verify the integrity and completeness of log data, including impressions and clicks from the recommendation section.
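A minimal sketch of this update step is shown below (the MongoDB connection string, database, and collection names are placeholders; the mab_alpha and mab_beta fields match the Lambda code shown earlier):

from pymongo import MongoClient

def update_bts_parameters(hourly_stats, mongo_uri="mongodb://localhost:27017"):
    """hourly_stats: {model_id: {"impressions": int, "clicks": int}} aggregated for the last batch."""
    col = MongoClient(mongo_uri)["ab_test"]["variant_metrics"]   # placeholder database/collection
    for model_id, stats in hourly_stats.items():
        col.update_one(
            {"variant_name": model_id},
            {"$inc": {
                "mab_alpha": stats["clicks"],                          # successes: clicks
                "mab_beta": stats["impressions"] - stats["clicks"],    # failures: non-clicks
            }},
            upsert=True,
        )
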
Implementation details 3: Business metrics monitoring
To assess the most effective model, it’s essential to monitor business metrics during A/B testing. For this purpose, a dashboard was developed using Streamlit on an Amazon Elastic Compute Cloud (Amazon EC2) environment.
Streamlit is a Python library that can be used to create web apps for data analysis. LotteON added the necessary Python package information for the dashboard configuration to the requirements.txt file, specifying Streamlit version 1.14.1, and proceeded with the installation as demonstrated in the following:

$ python3 -m pip install --upgrade pip
$ pip3 install -r requirements.txt

The default port provided by Streamlit is 8501, so an inbound rule for custom TCP port 8501 is required to allow access to the Streamlit web interface.

When setup is complete, use the streamlit run pythoncode.py command in the terminal, where pythoncode.py is the Python script containing the Streamlit code to run the application. This command launches the Streamlit web interface for the specified application.

import streamlit as st
st.title('streamlit example')

LotteON created a dashboard based on Streamlit. The dashboard’s functionality includes monitoring business metrics such as model trends over time and the daily and real-time winner models, as shown in the following figure.
The dashboard allowed LotteON to analyze the business metrics of the model and check the service status in real time. It also monitored the effectiveness of model version updates and reduced the time to check the service impact of the retraining pipeline.
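A simplified sketch of such a dashboard is shown below (the MongoDB collection and field names are hypothetical); it reads per-model metrics and charts cumulative CTR per model over time:

import pandas as pd
import streamlit as st
from pymongo import MongoClient

st.title("Dynamic A/B test monitor")

# Hypothetical collection with one row per model per hour: timestamp, model, impressions, clicks
col = MongoClient("mongodb://localhost:27017")["ab_test"]["hourly_metrics"]
df = pd.DataFrame(list(col.find({}, {"_id": 0}))).sort_values("timestamp")

df[["cum_clicks", "cum_impressions"]] = df.groupby("model")[["clicks", "impressions"]].cumsum()
df["cum_ctr"] = df["cum_clicks"] / df["cum_impressions"]

st.line_chart(df.pivot_table(index="timestamp", columns="model", values="cum_ctr"))

latest = df.groupby("model").tail(1)
st.subheader("Current winner")
st.write(latest.loc[latest["cum_ctr"].idxmax(), "model"])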

The following shows an enlarged view of the cumulative CTR of the two models (EXP-01-APS002-01 model A, EXP-01-NCF-01 model B) on the testing day. Let’s take a look at each model to see what that means. Model A provided customers with 29,274 recommendation lists that received 1,972 product clicks and generated a CTR of 6.7 percent (1,972/29,274).
Model B, on the other hand, served 7,390 recommendation lists, received 430 product clicks, and generated a CTR of 5.8 percent (430/7,390). The alpha and beta parameters of each model, the number of clicks and the number of non-clicks respectively, were used to set its beta distribution. Model A’s alpha parameter was 1,972 (number of clicks) and its beta parameter was 27,302 (number of non-clicks [29,274 - 1,972]). Model B’s alpha parameter was 430 (number of clicks) and its beta parameter was 6,960 (number of non-clicks). The larger the X-axis value at the peak of the beta distribution, the better the model’s performance (CTR).
In the following figure, model A (EXP-01-APS002-01) shows better performance because its distribution sits further to the right along the X axis. This is also consistent with the CTRs of 6.7 percent and 5.8 percent.
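The comparison of the two Beta distributions can be reproduced with a short sketch (using scipy and the click/non-click counts above); the density peaks land near the observed CTRs of roughly 6.7 percent and 5.8 percent:

import numpy as np
from scipy.stats import beta

# alpha = clicks, beta = non-clicks, taken from the counts described above
params = {"EXP-01-APS002-01 (model A)": (1972, 27302),
          "EXP-01-NCF-01 (model B)": (430, 6960)}

x = np.linspace(0.0, 0.1, 2001)
for name, (a, b) in params.items():
    peak = x[np.argmax(beta.pdf(x, a, b))]   # x-value where the Beta(a, b) density peaks
    print(f"{name}: density peaks near CTR {peak:.3f}")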

Implementation details 4: System operation monitoring with CloudWatch and AWS X-Ray
You can enable CloudWatch settings, custom access logging, and AWS X-Ray tracking features from the Logs/Tracking tab in the API Gateway menu.
CloudWatch settings and custom access logging
In the configuration step, you can change the CloudWatch Logs type to set the logging level, and after activating detailed indicators, you can check detailed metrics such as 400 errors and 500 errors. By enabling custom access logs, you can check which IP accessed the API and how.

Additionally, the retention period for CloudWatch Logs must be specified separately on the CloudWatch page to avoid storing them indefinitely.
If you select API Gateway from the CloudWatch Explorer list, you can view the number of API calls, latency, and cache hits and misses on a dashboard. Find the Cache Hit Rate as shown in the following formula and check the effectiveness of the cache on the dashboard.

Cache Hit Rate = CacheHitCount / (CacheHitCount + CacheMissCount)
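
As an illustration, the same ratio can be computed from the API Gateway metrics in CloudWatch; the following sketch uses a placeholder API name and a one-hour window:

from datetime import datetime, timedelta
import boto3

cw = boto3.client("cloudwatch")

def metric_sum(metric_name):
    # Sum an API Gateway metric over the past hour for a placeholder API name
    resp = cw.get_metric_statistics(
        Namespace="AWS/ApiGateway",
        MetricName=metric_name,
        Dimensions=[{"Name": "ApiName", "Value": "recommendation-api"}],  # placeholder
        StartTime=datetime.utcnow() - timedelta(hours=1),
        EndTime=datetime.utcnow(),
        Period=3600,
        Statistics=["Sum"],
    )
    return sum(dp["Sum"] for dp in resp["Datapoints"])

hits, misses = metric_sum("CacheHitCount"), metric_sum("CacheMissCount")
print("Cache hit rate:", hits / (hits + misses) if (hits + misses) else None)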

By selecting Lambda as the log group in the CloudWatch Logs Insights menu, you can verify the actual model code returned by Lambda, where MAB is performed, to check whether the sampling logic is working and branch processing is being performed.

fields @timestamp, @message, @logStream, @log
| filter @message like 'Model A' or @message like 'Model B'
| stats count(*) by @message

As shown in the preceding image, LotteON observed how often the two models were called by the Lambda function during the A/B test. Specifically, the model labeled LF001-01 (the champion model) was invoked 4,910 times, while the model labeled NCF-02 (the challenger model) was invoked 4,905 times. These numbers represent the degree to which each model was selected in the experiment.
AWS X-Ray
If you enable the X-Ray trace feature, trace data is sent from the enabled AWS service to X-Ray and the visualized API service flow can be monitored from the service map menu in the X-Ray section of the CloudWatch page.

As shown in the preceding figure, you can easily track and monitor latency, number of calls, and number of HTTP call status for each service section by choosing the API Gateway icon and each Lambda node.
There was no need to store performance metrics for a long time because most Lambda function metrics are analyzed within a week and aren’t used afterward. Because data from X-Ray is stored for 30 days by default, which is enough time to use the metrics, the data was used without changing the storage cycle. (For more information, see the AWS X-Ray FAQs.)
Conclusion
In this post, we explained how LotteON builds and uses a dynamic A/B testing environment. Through this project, LotteON was able to test model performance online in various ways by combining dynamic A/B testing with the MAB algorithm. The system also supports comparing different types of recommendation models, as well as different versions of the same model, facilitating online testing.
In addition, data scientists can concentrate on improving model performance and training because they can check metrics and system monitoring instantly. The dynamic A/B testing system was initially developed and applied to the LotteON main page, and then expanded to the main page recommendation tab and product detail recommendation section. Because the system is able to evaluate online performance without significantly reducing the click-through rate of existing models, we have been able to conduct more experiments without impacting users.
Dynamic A/B Test exercises can also be found in AWS Workshop – Dynamic A/B Testing on Amazon Personalize & SageMaker.

About the Authors
HyeKyung Yang is a research engineer in the Lotte E-commerce Recommendation Platform Development Team and is in charge of developing ML/DL recommendation models by analyzing and utilizing various data and developing a dynamic A/B test environment.
Jieun Lim is a data engineer in the Lotte E-commerce Recommendation Platform Development Team and is in charge of operating LotteON’s personalized recommendation system and developing personalized recommendation models and dynamic A/B test environments.
SeungBum Shim is a data engineer in the Lotte E-commerce Recommendation Platform Development Team, responsible for discovering ways to use and improve recommendation-related products through LotteON data analysis, and developing MLOps pipelines and ML/DL recommendation models.
Jesam Kim is an AWS Solutions Architect and helps enterprise customers adopt and troubleshoot cloud technologies and provides architectural design and technical support to address their business needs and challenges, especially in AIML areas such as recommendation services and generative AI.
Gonsoo Moon is an AWS AI/ML Specialist Solutions Architect and provides AI/ML technical support. His main role is to collaborate with customers to solve their AI/ML problems based on various use cases and production experience in AI/ML.

Unleashing the power of generative AI: Verisk’s journey to an Instan …

This post is co-written with Tom Famularo, Abhay Shah and Nicolette Kontor from Verisk.
Verisk (Nasdaq: VRSK) is a leading data analytics and technology partner for the global insurance industry. Through advanced analytics, software, research, and industry expertise across over 20 countries, Verisk helps build resilience for individuals, communities, and businesses. The company is committed to ethical and responsible AI development, with human oversight and transparency. Verisk is using generative artificial intelligence (AI) to enhance operational efficiencies and profitability for insurance clients while adhering to its ethical AI principles.
Verisk’s FAST platform is a leader in the life insurance and retirement sector, providing enhanced efficiency and flexible, easily upgradable architecture. FAST has earned a fourth consecutive leader ranking in the 2024 ISG Provider Lens report for its seamless integration with Verisk’s data, analytics, and claims tools. The software as a service (SaaS) platform offers out-of-the-box solutions for life, annuity, employee benefits, and institutional annuity providers. With preconfigured components and platform configurability, FAST enables carriers to reduce product time-to-market by 75% and launch new offerings in as little as 2 months.
In this post, we describe the development of the customer support process in FAST incorporating generative AI, the data, the architecture, and the evaluation of the results. Conversational AI assistants are rapidly transforming customer and employee support. Verisk has embraced this technology and has developed their own Instant Insight Engine, or AI companion, that provides an enhanced self-service capability to their FAST platform.
The Opportunity
Verisk FAST’s initial foray into using AI was due to the immense breadth and complexity of the platform. With hundreds of thousands of hours spent on customer support every year, it became abundantly clear they needed help to scale their efforts and meet their objectives. Verisk’s talented teams were overloaded handling common inquiries, leaving less time for the type of innovation that would allow them to maintain the pole position as insurance technology providers.
Verisk FAST’s AI companion aims to alleviate this burden by not only providing 24/7 support for business processing and configuration questions related to FAST, but also tapping into the immense knowledge base to provide an in-depth, tailored response. It is designed to be deeply integrated into the FAST platform and use all of Verisk’s documentation, training materials, and collective expertise. It relies on a Retrieval Augmented Generation (RAG) approach and a mix of AWS services and proprietary configuration to instantly answer most user questions about the Verisk FAST platform’s extensive capabilities.
When the AI companion is rolled out at scale, it will allow Verisk’s staff to focus more time on complex problems, critical initiatives, and innovation while delivering a better customer experience. As part of the build-out, Verisk came across several considerations, key findings, and decisions worth sharing for any enterprise looking to take the first step in tapping into generative AI’s potential.
The Approach
When building an interactive agent with large language models (LLMs), there are often two techniques that can be used: RAG and fine-tuning. The choice between these approaches depends on the use case and available dataset. Verisk FAST started building a RAG pipeline for their AI companion and have iteratively enhanced this solution. The following are some of the reasons why continuing with a RAG architecture made sense to Verisk:

Access to Dynamic Data – The FAST platform is a constantly evolving platform adding both business functionality and technical capabilities. Verisk needed to make sure their responses were always based on the most up-to-date information. The RAG approach allows for accessing frequently updated data, enabling responses using the most recent information without frequent retraining of the model.
Multiple Data Sources – In addition to recency of data, another important aspect was the ability to tap into multiple different data sources to retrieve the right context. These data sources may be both internal and external to provide a more holistic response. The ease of expanding the knowledge domain without the need to fine-tune with new data sources makes the solution extensible.
Reduce Hallucination – Retrieval reduces the risk of hallucination compared to free-form text generation because responses derive directly from the provided excerpts.
LLM Linguistics – Although appropriate context can be retrieved from enterprise data sources, the underlying LLM handles linguistics and fluency.
Transparency – Verisk wants to continuously improve the AI companion’s ability to generate responses. A RAG architecture gave them the transparency needed into the context retrieval process, information that would ultimately be used for generating user responses. Having that transparency helped Verisk identify areas of the system where their documents were lacking and needed some restructuring.
Data governance – With a wide variety of users accessing the platform and with different users having access to different data, data governance and isolation was paramount. Verisk injected controls into the RAG pipeline that restricted access to data based on user access controls, making sure responses were highly tuned to the user.

Although both RAG and fine-tuning have trade-offs, RAG was the optimal approach for building an AI companion on the FAST platform given their requirements for real-time accuracy, explainability, and configurability. The pipeline architecture allows for iterative enhancement as Verisk FAST’s use cases evolve.
Solution Overview
The following diagram presents a high-level architectural data flow highlighting several of the AWS services used in building the solution. Verisk’s solution represents a compound AI system, involving multiple interacting components and making numerous calls to the LLM to furnish responses to the user. Using the FAST platform for orchestrating these diverse components proved to be an intuitive choice, circumventing certain challenges encountered with alternative frameworks such as LangChain.

The key components are as follows:

Amazon Comprehend
Amazon Kendra
Amazon Bedrock
Amazon Rekognition
Amazon Transcribe
A prompt template warehouse

Amazon Comprehend
To bolster security, Verisk aimed to block the submission of personally identifiable information (PII) within user questions. Although PII isn’t typically necessary for interactions with the AI companion, Verisk employed Amazon Comprehend to detect any potential PII within queries.
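A minimal sketch of this screening step (not Verisk’s implementation; the confidence threshold is an assumption) using the Amazon Comprehend PII detection API:

import boto3

comprehend = boto3.client("comprehend")

def contains_pii(text, threshold=0.8):
    # Flag the question if Comprehend detects any PII entity above the confidence threshold
    entities = comprehend.detect_pii_entities(Text=text, LanguageCode="en")["Entities"]
    return any(e["Score"] >= threshold for e in entities)

if contains_pii("How do I update the beneficiary for policy 12345 for john.doe@example.com?"):
    print("Request blocked: please remove personal information from your question.")
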
Amazon Kendra
In designing an effective RAG solution, one of the most critical steps is the context retrieval from enterprise documentation. Although many options exist to store embeddings, Verisk FAST opted to use Amazon Kendra due to its powerful out-of-the-box semantic search capabilities. As a fully managed service, Verisk took advantage of its deep-learning search models without additional provisioning. Verisk compared using Amazon OpenSearch Serverless with several embedding approaches and Amazon Kendra, and saw better retrieval results with Amazon Kendra. As you’ll see further in the post, Verisk incorporated the Retrieve API and the Query API to retrieve semantically relevant passages for their queries to further improve generation by the LLM.
Amazon Bedrock
Anthropic Claude, available in Amazon Bedrock, played various roles within Verisk’s solution:

Response Generation – When building their AI companion, Verisk thoroughly evaluated the LLM options from leading providers, using their dataset to test each model’s comprehension and response quality. After this extensive testing, Verisk found Anthropic’s Claude model consistently outperformed across key criteria. Claude demonstrated superior language understanding in Verisk’s complex business domain, allowing more pertinent responses to user questions. It also did exceedingly well at SQL generation, better than any other model they tested. Given Claude’s standout results across Verisk FAST’s use cases, it was the clear choice to power their AI companion’s natural language capabilities (see the sketch after this list).
Preprocessing of Images and Videos – The outputs from Amazon Rekognition and Amazon Transcribe were fed into Claude. Claude demonstrated remarkable capabilities in generating natural language descriptions, which could be effectively used for indexing purposes with Amazon Kendra. Additionally, Claude excelled at summarizing video transcriptions into concise segments corresponding to specific time intervals, enabling the display of videos at precise points. This combination of AWS services and Claude’s language processing capabilities facilitated a more intuitive and user-friendly experience for media exploration and navigation.
Relevance Ranking – Although Amazon Kendra returned confidence scores on search results, Verisk needed to further tune the search results for Query API calls for a few scenarios. Verisk was able to use Claude to rank the relevance of search results from Amazon Kendra, further improving the results returned to the user.
Tool Identification – Verisk used Claude to determine the most suitable techniques, whether API calls or SQL queries, for retrieving data from the operational database based on user requests. Furthermore, Claude generated SQL queries tailored to the provided schemas, enabling efficient data retrieval.
Conversation Summarization – When a user asks a follow-up question, the AI companion can continue the conversational thread. To enable this, Verisk used Claude to summarize the dialogue to update the context from Amazon Kendra. The full conversation summary and new excerpts are input to the LLM to generate the next response. This conversational flow allows the AI companion to answer user follow-up questions and have a more natural, contextual dialogue, bringing Verisk FAST closer to having a true AI assistant that can engage in useful back-and-forth conversations with users.
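
The response-generation role listed above can be sketched as follows (a hedged example, not Verisk’s code; the model ID, prompt wording, and token limit are assumptions):

import json
import boto3

bedrock = boto3.client("bedrock-runtime")

def answer(question, context_passages):
    # Combine retrieved excerpts and the user question into a single Claude request
    prompt = ("Answer the question using only the excerpts below.\n\n"
              f"Excerpts:\n{context_passages}\n\nQuestion: {question}")
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [{"role": "user", "content": prompt}],
    }
    resp = bedrock.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",   # assumed model ID
        body=json.dumps(body),
    )
    return json.loads(resp["body"].read())["content"][0]["text"]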

Amazon Rekognition
Primarily used for processing images containing text and process flow diagrams, the pre-trained features of Amazon Rekognition facilitated information extraction. The extracted data was then passed to Claude for transformation into a more natural language format suitable for indexing within Amazon Kendra.
Amazon Transcribe
Similar to Amazon Rekognition, Amazon Transcribe was employed to preprocess videos and generate transcripts, with a notable feature being the masking of sensitive information. The verbose transcripts, along with timestamps, were condensed using Claude before being indexed into Amazon Kendra.
Prompt Template Warehouse
Central to the solution was the dynamic selection of templates to create prompts based on question classification. Substantial effort was invested in developing and continuously improving these prompt templates.
Throughout Verisk’s journey, they worked closely with the AWS Solutioning team to brainstorm concrete suggestions to enhance the overall solution.
Data Harvesting
Before Verisk started building anything in the platform, they spent weeks amassing information, initially in the form of questions and answers. Verisk FAST’s initial dataset comprised 10,000 questions and their corresponding answers, meticulously collected and vetted to confirm accuracy and relevance. However, they understood that this was not a one-and-done effort. Verisk needed to continually expand its knowledge base by identifying new data sources across the business.
Driven by this, Verisk diligently added 15,000 more questions, making sure they covered less frequently encountered scenarios. Verisk also added user guides, technical documentation, and other text-based information. This data spanned several categories, from business processing to configuration to their delivery approach. This enriched the AI companion’s knowledge and understanding of diverse user queries, enabling it to provide more accurate and insightful responses.
The Verisk FAST team also recognized the necessity of exploring additional modalities. Videos and images, particularly those illustrating process flows and information sharing videos, proved to be invaluable sources of data. During the initial rollout phase, it became evident that certain inquiries demanded real-time data retrieval from their operational data store. Through some slick prompt engineering and using Claude’s latest capabilities to invoke APIs, Verisk seamlessly accessed their database to procure real-time information.
Structuring and Retrieving the Data
An essential element in developing the AI companion’s knowledge base was properly structuring and effectively querying the data to deliver accurate answers. Verisk explored various techniques to optimize both the organization of the content and the methods to extract the most relevant information:

Chunking – One key step in preparing the accumulated questions and answers was splitting the data into individual documents to facilitate indexing into Amazon Kendra. Rather than uploading a single large file containing all 10,000 question-answer pairs, Verisk chunked the data into 10,000 separate text documents, with each document containing one question-answer pair. By splitting the data into small, modular documents focused on a single question-answer pair, Verisk could more easily index each document and had greater success in pulling back the correct context. Chunking the data also enabled straightforward updating and reindexing of the knowledge base over time. Verisk applied the same technique to other data sources as well.
Selecting the Right Number of Results – Verisk tested configuring Amazon Kendra to return different numbers of results for each question query. Returning too few results ran the risk of not capturing the best answer, whereas too many results made it more difficult to identify the right response. Verisk found returning the top three matching results from Amazon Kendra optimized both accuracy and performance.
Multi-step Query – To further improve accuracy, Verisk implemented a multi-step query process. First, they used the Amazon Kendra Retrieve API to get multiple relevant passages and excerpts based on keyword search. Next, they took a second pass at getting excerpts through the Query API, to find any additional shorter documents that might have been missed. Combining these two query types enabled Verisk to reliably identify the correct documentation and excerpts to generate a response (see the sketch after this list).
Relevance Parameters – Verisk also tuned relevance parameters in Amazon Kendra to weigh their most up-to-date documentation higher than others. This improved results over just generic text search.

By thoroughly experimenting and optimizing both the knowledge base powering their AI companion and the queries to extract answers from it, Verisk was able to achieve very high answer accuracy during the proof of concept, paving the way for further development. The techniques they explored—multi-stage querying, tuning relevance, enriching data—became core elements of their approach for extracting quality automated answers.
LLM Parameters and Models
Experimenting with prompt structure, length, temperature, role-playing, and context was key to improving the quality and accuracy of the AI companion’s Claude-powered responses. The prompt design guidelines provided by Anthropic were incredibly helpful.
Verisk crafted prompts that provided Claude with clear context and set roles for answering user questions. Setting the temperature to 0.5 helped reduce randomness and repetition in the generated responses.
Verisk also experimented with different models to improve the efficiency of the overall solution. Although Claude 3 models like Sonnet and Haiku did a great job at generating responses, as part of the overall solution, Verisk didn’t always need the LLM to generate text. For scenarios that required identification of tools, Claude Instant was a better suited model due to its quicker response times.
Metrics, Data Governance, and Accuracy
A critical component of Verisk FAST’s AI companion and its usefulness is their rigorous evaluation of its performance and the accuracy of its generated responses.
As part of the proof of concept in working with the Amazon Generative AI Innovation Center, Verisk came up with 100 questions to evaluate the accuracy and performance of the AI companion. Central to this process was crafting questions designed to assess the bot’s ability to comprehend and respond effectively across a diverse range of topics and scenarios. These questions spanned a variety of topics and varying levels of difficulty. Verisk wanted to make sure their AI companion provided accurate responses to frequently asked questions and could demonstrate proficiency in handling nuanced and less predictable or straightforward inquiries. The results provided invaluable insights into RAG’s strengths and areas for improvement, guiding Verisk’s future efforts to refine and enhance its capabilities further.
After Verisk integrated their AI companion into the platform and began testing it with real-world scenarios, their accuracy rate was approximately 40%. However, within a few months, it rapidly increased to over 70% because of all the data harvesting work, and the accuracy continues to steadily improve each day.
Contributing to the AI companion’s rising accuracy is Verisk’s evaluation heat map. This provides a visual representation of the documentation available across 20 topics that comprehensively encompasses the Verisk FAST platform’s capabilities. This is compared against the volume of inquiries within each specific topic segment and the health of the generated responses in each.
This visualized data allows the Verisk FAST team to effortlessly identify gaps. They can quickly see which capability the AI companion currently struggles with against where user questions are most focused on. The Verisk team can then prioritize expanding its knowledge in these areas through additional documentation, training data, research materials, and testing.

Business Impact
Verisk initially rolled out the AI companion to one beta customer to demonstrate real-world performance and impact. Supporting a customer in this way is a stark contrast to how Verisk has historically engaged with and supported customers, where a team would typically be allocated to interact with the customer directly. Now only a fraction of the time a person would usually spend is needed to review submissions and adjust responses. Verisk FAST’s AI companion has helped them cost-effectively scale while still providing high-quality assistance.
In analyzing this early usage data, Verisk uncovered additional areas they can drive business value for their customers. As they collect additional information, this data will help them uncover what will be needed to improve results and prepare for a wider rollout.
Ongoing development will focus on expanding these capabilities, prioritized based on the collected questions. Most exciting, though, are the new possibilities on the horizon with generative AI. Verisk knows this technology is rapidly advancing, and they are eager to harness innovations to bring even more value to their customers. As new models and techniques emerge, Verisk plans to adapt their AI companion to take advantage of the latest capabilities. Although the AI companion currently focuses on responding to user questions, this is only the starting point. Verisk plans to quickly improve its capabilities to proactively make suggestions and configure functionality directly in the system itself. The Verisk FAST team is inspired by the challenge of pushing the boundaries of what is possible with generative AI and is excited to test the limits of what’s possible.
Conclusion
Verisk’s journey in developing an AI companion for their FAST platform showcases the immense potential of generative AI to transform customer support and drive operational efficiencies. By meticulously harvesting, structuring, and retrieving data, and leveraging large language models, semantic search capabilities, and rigorous evaluation processes, Verisk has created a robust solution that provides accurate, real-time responses to user inquiries. As Verisk continues to expand the AI companion’s capabilities while adhering to ethical and responsible AI development practices, they are poised to unlock greater value for customers, enable staff to focus on innovation, and set new standards for customer support in the insurance industry.
For more information, see the following resources:

Explore generative AI on AWS
Learn about Unlocking the business value of Generative AI
Learn more about Anthropic Claude 3 models on Amazon Bedrock
Learn about Amazon Bedrock and how to build and scale generative AI applications with foundation models
Generative AI Quickstart POCs

About the Authors
Tom Famularo was Co-Founder/CEO of FAST and leads Verisk Life Solutions, based in NJ. Tom is responsible for platform strategy, data/analytics, AI and Verisk’s life/annuity customers. His focus and passion are for teaching customers and team members how to allow technology to enable business outcomes with far less human effort. Outside of work, he’s an avid fan of his son’s baseball and football teams.
Abhay Shah leads engineering efforts for the FAST Platform at Verisk – Life Solutions, where he offers guidance on architecture and provides technical leadership for Customer Implementations and Product Development. With over two decades of experience in the technology sector, Abhay helps insurance carriers maximize the value of their ecosystem through modern technology and is excited by the opportunities that AI provides. Beyond his professional passion, he enjoys reading, traveling, and coaching the middle school robotics team.
Nicolette Kontor is a technology enthusiast who thrives on helping customers embrace digital transformation. In her current role at Verisk – Life Solutions, she spearheads the application of artificial intelligence to the FAST Platform, which she finds tremendously rewarding and exciting. With over 10 years of experience in major customer implementations and product development, Nicolette is driven to deliver innovative solutions that unlock value for insurance carriers. Beyond her professional pursuits, Nicolette is an avid traveler, having explored 39 countries to date. She enjoys winning trivia, reading mystery novels, and learning new languages.
Ryan Doty is a Sr. Solutions Architect at AWS, based out of New York. He helps enterprise customers in the Northeast U.S. accelerate their adoption of the AWS Cloud by providing architectural guidelines to design innovative and scalable solutions. Coming from a software development and sales engineering background, the possibilities that the cloud can bring to the world excite him.
Tarik Makota is a Senior Principal Solutions Architect with Amazon Web Services. He provides technical guidance, design advice, and thought leadership to AWS’ customers across the US Northeast. He holds an M.S. in Software Development and Management from Rochester Institute of Technology.
Dom Bavaro is a Senior Solutions Architect for Financial Services. While providing technical guidance to customers across many use cases, he is focused on helping customers build and productionize generative AI solutions and workflows.

Establishing an AI/ML center of excellence

The rapid advancements in artificial intelligence and machine learning (AI/ML) have made these technologies a transformative force across industries. According to a McKinsey study, across the financial services industry (FSI), generative AI is projected to deliver over $400 billion (5%) of industry revenue in productivity benefits. As maintained by Gartner, more than 80% of enterprises will have AI deployed by 2026. At Amazon, we believe innovation (rethink and reinvent) drives improved customer experiences and efficient processes, leading to increased productivity. Generative AI is a catalyst for business transformation, making it imperative for FSI organizations to determine where generative AI’s current capabilities could deliver the biggest value for FSI customers.
Organizations across industries face numerous challenges implementing generative AI across their organization, such as lack of clear business case, scaling beyond proof of concept, lack of governance, and availability of the right talent. An effective approach that addresses a wide range of observed issues is the establishment of an AI/ML center of excellence (CoE). An AI/ML CoE is a dedicated unit, either centralized or federated, that coordinates and oversees all AI/ML initiatives within an organization, bridging business strategy to value delivery. As observed by Harvard Business Review, an AI/ML CoE is already established in 37% of large companies in the US. For organizations to be successful in their generative AI journey, there is growing importance for coordinated collaboration across lines of businesses and technical teams.
This post, along with the Cloud Adoption Framework for AI/ML and Well-Architected Machine Learning Lens, serves as a guide for implementing an effective AI/ML CoE with the objective to capture generative AI’s possibilities. This includes guiding practitioners to define the CoE mission, forming a leadership team, integrating ethical guidelines, qualification and prioritization of use cases, upskilling of teams, implementing governance, creating infrastructure, embedding security, and enabling operational excellence.
What is an AI/ML CoE?
The AI/ML CoE is responsible for partnering with lines of business and end-users in identifying AI/ML use cases aligned to business and product strategy, recognizing common reusable patterns from different business units (BUs), implementing a company-wide AI/ML vision, and deploying an AI/ML platform and workloads on the most appropriate combination of computing hardware and software. The CoE team synergizes business acumen with profound technical AI/ML proficiency to develop and implement interoperable, scalable solutions throughout the organization. They establish and enforce best practices encompassing design, development, processes, and governance operations, thereby mitigating risks and making sure robust business, technical, and governance frameworks are consistently upheld. For ease of consumption, standardization, scalability, and value delivery, the outputs of an AI/ML CoE can be of two types: guidance such as published guidance, best practices, lessons learned, and tutorials, and capabilities such as people skills, tools, technical solutions, and reusable templates.
The following are benefits of establishing an AI/ML CoE:

Faster time to market through a clear path to production
Maximized return on investments through delivering on the promise of generative AI business outcomes
Optimized risk management
Structured upskilling of teams
Sustainable scaling with standardized workflows and tooling
Better support and prioritization of innovation initiatives

The following figure illustrates the key components for establishing an effective AI/ML CoE.

In the following sections, we discuss each numbered component in detail.
1. Sponsorship and mission
The foundational step in setting up an AI/ML CoE is securing sponsorship from senior leadership, establishing leadership, defining its mission and objectives, and aligning empowered leadership.
Establish sponsorship
Establish clear leadership roles and structure to provide decision-making processes, accountability, and adherence to ethical and legal standards:

Executive sponsorship – Secure support from senior leadership to champion AI/ML initiatives
Steering committee – Form a committee of key stakeholders to oversee the AI/ML CoE’s activities and strategic direction
Ethics board – Create a board to address ethical and responsible AI considerations in AI/ML development and deployment

Define the mission
Making the mission customer- or product-focused and aligned with the organization’s overall strategic goals helps outline the AI/ML CoE’s role in achieving them. This mission, usually set by the executive sponsor in alignment with the heads of business units, serves as a guiding principle for all CoE activities, and contains the following:

Mission statement – Clearly articulate the purpose of the CoE in advancing customer and product outcomes applying AI/ML technologies
Strategic objectives – Outline tangible and measurable AI/ML goals that align with the organization’s overall strategic goals
Value proposition – Quantify the expected business value through key performance indicators (KPIs) such as cost savings, revenue gains, user satisfaction, time savings, and time-to-market

2. People
According to a Gartner report, 53% of business, functional, and technical teams rate their technical acumen on generative AI as “Intermediate” and 64% of senior leadership rate their skill as “Novice.” By developing customized solutions tailored to the specific and evolving needs of the business, you can foster a culture of continuous growth and learning and cultivate a deep understanding of AI and ML technologies, including generative AI skill development and enablement.
Training and enablement
To help educate employees on AI/ML concepts, tools, and techniques, the AI/ML CoE can develop training programs, workshops, certification programs, and hackathons. These programs can be tailored to different levels of expertise and designed to help employees understand how to use AI/ML to solve business problems. Additionally, the CoE could provide a mentoring platform to employees who are interested in further enhancing their AI/ML skills, develop certification programs to recognize employees who have achieved a certain level of proficiency in AI/ML, and provide ongoing training to keep the team updated with the latest technologies and methodologies.
Dream team
Cross-functional engagement is essential to achieve well-rounded AI/ML solutions. Having a multidisciplinary AI/ML CoE that combines industry, business, technical, compliance, and operational expertise helps drive innovation. It harnesses the 360 view potential of AI in achieving a company’s strategic business goals. Such a diverse team with AI/ML expertise may include roles such as:

Product strategists – Make sure all products, features, and experiments are cohesive to the overall transformation strategy
AI researchers – Employ experts in the field to drive innovation and explore cutting-edge techniques such as generative AI
Data scientists and ML engineers – Develop capabilities for data preprocessing, model training, and validation
Domain experts – Collaborate with professionals from business units who understand the specific applications and business need
Operations – Develop KPIs, demonstrate value delivery, and manage machine learning operations (MLOps) pipelines
Project managers – Appoint project managers to implement projects efficiently

Knowledge sharing
By fostering collaboration within the CoE, internal stakeholders, business unit teams, and external stakeholders, you can enable knowledge sharing and cross-disciplinary teamwork. Encourage knowledge sharing, establish a knowledge repository, and facilitate cross-functional projects to maximize the impact of AI/ML initiatives. Some example key actions to foster knowledge sharing are:

Cross-functional collaborations – Promote teamwork between experts in generative AI and business unit domain-specific professionals to innovate on cross-functional use cases
Strategic partnerships – Investigate partnerships with research institutions, universities, and industry leaders specializing in generative AI to take advantage of their collective expertise and insights

3. Governance
Establish governance that enables the organization to scale value delivery from AI/ML initiatives while managing risk, compliance, and security. Additionally, pay special attention to the changing nature of the risk and cost that is associated with the development as well as the scaling of AI.
Responsible AI
Organizations can navigate potential ethical dilemmas associated with generative AI by incorporating considerations such as fairness, explainability, privacy and security, robustness, governance, and transparency. To provide ethical integrity, an AI/ML CoE helps integrate robust guidelines and safeguards across the AI/ML lifecycle in collaboration with stakeholders. By taking a proactive approach, the CoE not only provides ethical compliance but also builds trust, enhances accountability, and mitigates potential risks such as veracity, toxicity, data misuse, and intellectual property concerns.
Standards and best practices
Continuing its stride towards excellence, the CoE helps define common standards, industry-leading practices, and guidelines. These encompass a holistic approach, covering data governance, model development, ethical deployment, and ongoing monitoring, reinforcing the organization’s commitment to responsible and ethical AI/ML practices. Examples of such standards include:

Development framework – Establishing standardized frameworks for AI development, deployment, and governance provides consistency across projects, making it easier to adopt and share best practices.
Repositories – Centralized code and model repositories facilitate the sharing of best practices and industry standard solutions in coding standards, enabling teams to adhere to consistent coding conventions for better collaboration, reusability, and maintainability.
Centralized knowledge hub – A central repository housing datasets and research discoveries to serve as a comprehensive knowledge center.
Platform – A central platform such as Amazon SageMaker for creation, training, and deployment. It helps manage and scale central policies and standards.
Benchmarking and metrics – Defining standardized metrics and benchmarking to measure and compare the performance of AI models, and the business value derived.

Data governance
Data governance is a crucial function of an AI/ML CoE: it makes sure data is collected, used, and shared in a responsible and trustworthy manner. Data governance is essential for AI applications, because these applications often use large amounts of data. The quality and integrity of this data are critical to the accuracy and fairness of AI-powered decisions. The AI/ML CoE helps define best practices and guidelines for data preprocessing, model development, training, validation, and deployment. The CoE should make sure that data is accurate, complete, and up to date; that it is protected from unauthorized access, use, or disclosure; and that data governance policies demonstrate adherence to regulatory and internal compliance requirements.
Model oversight
Model governance is a framework that determines how a company implements policies, controls access to models, and tracks their activity. The CoE helps make sure that models are developed and deployed in a safe, trustworthy, and ethical fashion. Additionally, it can confirm that model governance policies demonstrate the organization’s commitment to transparency, fostering trust with customers, partners, and regulators. It can also provide safeguards customized to your application requirements and make sure responsible AI policies are implemented using services such as Guardrails for Amazon Bedrock.
Value delivery
Manage the AI/ML initiative return on investment, platform and services expenses, efficient and effective use of resources, and ongoing optimization. This requires monitoring and analyzing use case-based value KPIs and expenditures related to data storage, model training, and inference. This includes assessing the performance of various AI models and algorithms to identify cost-effective, resource-optimal solutions such as using AWS Inferentia for inference and AWS Trainium for training. Setting KPIs and metrics is pivotal to gauge effectiveness. Some example KPIs are:

Return on investment (ROI) – Evaluating financial returns against investments justifies resource allocation for AI projects
Business impact – Measuring tangible business outcomes like revenue uplift or enhanced customer experiences validates AI’s value
Project delivery time – Tracking time from project initiation to completion showcases operational efficiency and responsiveness

4. Platform
The AI/ML CoE, in collaboration with the business and technology teams, can help build an enterprise-grade and scalable AI platform, enabling organizations to operate AI-enabled services and products across business units. It can also help develop custom AI solutions and help practitioners adapt to change in AI/ML development.
Data and engineering architecture
The AI/ML CoE helps set up the right data flows and engineering infrastructure, in collaboration with the technology teams, to accelerate the adoption and scaling of AI-based solutions:

High-performance computing resources – Powerful GPUs such as Amazon Elastic Compute Cloud (Amazon EC2) instances, powered by the latest NVIDIA H100 Tensor Core GPUs, are essential for training complex models.
Data storage and management – Implement robust data storage, processing, and management systems such as AWS Glue and Amazon OpenSearch Service.
Platform – A cloud platform such as SageMaker provides flexibility and scalability for AI/ML projects, with end-to-end ML capability spanning generative AI experimentation, data prep, model training, deployment, and monitoring. This helps accelerate generative AI workloads from experimentation to production. Amazon Bedrock is an easier way to build and scale generative AI applications with foundation models (FMs). As a fully managed service, it offers a choice of high-performing FMs from leading AI companies including AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon.
Development tools and frameworks – Use industry-standard AI/ML frameworks and tools such as Amazon CodeWhisperer, Apache MXNet, PyTorch, and TensorFlow.
Version control and collaboration tools – Git repositories, project management tools, and collaboration platforms, such as AWS CodePipeline and Amazon CodeGuru, can facilitate teamwork.
Generative AI frameworks – Utilize state-of-the-art foundation models, tools, agents, knowledge bases, and guardrails available on Amazon Bedrock.
Experimentation platforms – Deploy platforms for experimentation and model development, allowing for reproducibility and collaboration, such as Amazon SageMaker JumpStart.
Documentation – Emphasize the documentation of processes, workflows, and best practices within the platform to facilitate knowledge sharing among practitioners and teams.

Lifecycle management
Within the AI/ML CoE, the emphasis on scalability, availability, reliability, performance, and resilience is fundamental to the success and adaptability of AI/ML initiatives. Implementation and operationalization of a lifecycle management system such as MLOps can help automate deployment and monitoring, resulting in improved reliability, time to market, and observability. Using tools like Amazon SageMaker Pipelines for workflow management, Amazon SageMaker Experiments for managing experiments, and Amazon Elastic Kubernetes Service (Amazon EKS) for container orchestration enables adaptable deployment and management of AI/ML applications, fostering scalability and portability across various environments. Similarly, employing serverless architectures such as AWS Lambda empowers automatic scaling based on demand, reducing operational complexity while offering flexibility in resource allocation.
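To make this concrete, the following is a minimal, illustrative sketch of defining a single training step in a SageMaker Pipeline; the container image, IAM role, and S3 paths are placeholders, and a production MLOps pipeline would add processing, evaluation, model registration, and deployment steps.

import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import TrainingStep

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/ExampleSageMakerRole"  # placeholder IAM role

# Generic estimator; the training image and paths below are placeholders
estimator = Estimator(
    image_uri="<training-image-uri>",
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://example-bucket/models/",
    sagemaker_session=session,
)

train_step = TrainingStep(
    name="TrainModel",
    estimator=estimator,
    inputs={"training": "s3://example-bucket/data/train/"},  # placeholder dataset location
)

pipeline = Pipeline(name="example-mlops-pipeline", steps=[train_step])
# pipeline.upsert(role_arn=role)  # create or update the pipeline definition
# pipeline.start()                # launch an execution
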
Strategic alliances in AI services
The decision to buy or build solutions involves trade-offs. Buying offers speed and convenience by using pre-built tools, but may lack customization. On the other hand, building provides tailored solutions but demands time and resources. The balance hinges on the project scope, timeline, and long-term needs, achieving optimal alignment with organizational goals and technical requirements. The decision, ideally, can be based on a thorough assessment of the specific problem to be solved, the organization’s internal capabilities, and the area of the business targeted for growth. For example, if the business system helps establish uniqueness, build to differentiate in the market; if the business system supports a standard, commoditized business process, buy to save.
By partnering with third-party AI service providers, such as AWS Generative AI Competency Partners, the CoE can use their expertise and experience to accelerate the adoption and scaling of AI-based solutions. These partnerships can help the CoE stay up to date with the latest AI/ML research and trends, and can provide access to cutting-edge AI/ML tools and technologies. Additionally, third-party AI service providers can help the CoE identify new use cases for AI/ML and can provide guidance on how to implement AI/ML solutions effectively.
5. Security
Emphasize, assess, and implement security and privacy controls across the organization’s data, AI/ML, and generative AI workloads. Integrate security measures across all aspects of AI/ML to identify, classify, remediate, and mitigate vulnerabilities and threats.
Holistic vigilance
Based on how your organization is using generative AI solutions, scope the security efforts, design resiliency of the workloads, and apply relevant security controls. This includes employing encryption techniques, multifactor authentication, threat detection, and regular security audits to make sure data and systems remain protected against unauthorized access and breaches. Regular vulnerability assessments and threat modeling are crucial to address emerging threats. Strategies such as model encryption, using secure environments, and continuous monitoring for anomalies can help protect against adversarial attacks and malicious misuse. For threat detection, you can use tools like Amazon GuardDuty. With Amazon Bedrock, you have full control over the data you use to customize the foundation models for your generative AI applications. Data is encrypted in transit and at rest, and user inputs and model outputs are not shared with any model providers, keeping your data and applications secure and private.
End-to-end assurance
Securing the three critical components of any AI system (inputs, model, and outputs) is essential. Establishing clearly defined roles, security policies, standards, and guidelines across the lifecycle can help manage the integrity and confidentiality of the system. This includes implementing industry best practices and frameworks such as NIST, OWASP-LLM, OWASP-ML, and MITRE ATLAS. Furthermore, evaluate and implement requirements such as Canada’s Personal Information Protection and Electronic Documents Act (PIPEDA) and the European Union’s General Data Protection Regulation (GDPR). You can use tools such as Amazon Macie to discover and protect your sensitive data.
Infrastructure (data and systems)
Given the sensitivity of the data involved, exploring and implementing access and privacy-preserving techniques is vital. This involves techniques such as least privilege access, data lineage, keeping only the data relevant to the use case, and identifying and classifying sensitive data to enable collaboration without compromising individual data privacy. It’s essential to embed these techniques within the AI/ML development lifecycle workflows, maintain a secure data and modeling environment, stay in compliance with privacy regulations, and protect sensitive information. By integrating security-focused measures into the AI/ML CoE’s strategies, the organization can better mitigate risks associated with data breaches, unauthorized access, and adversarial attacks, thereby providing integrity, confidentiality, and availability for its AI assets and sensitive information.
6. Operations
The AI/ML CoE needs to focus on optimizing the efficiency and growth potential of implementing generative AI within the organization’s framework. In this section, we discuss several key aspects aimed at driving successful integration while upholding workload performance.
Performance management
Setting KPIs and metrics is pivotal to gauge effectiveness. Regular assessment of these metrics allows you to track progress, identify trends, and foster a culture of continual improvement within the CoE. Reporting on these insights provides alignment with organizational objectives and informs decision-making processes for enhanced AI/ML practices. Solutions such as the Amazon Bedrock integration with Amazon CloudWatch help track and manage usage metrics and build customized dashboards for auditing.
An example KPI is model accuracy: assessing models against benchmarks provides reliable and trustworthy AI-generated outcomes.
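As an illustration, the following sketch pulls Amazon Bedrock invocation counts from CloudWatch for a simple usage report; the model ID and time window are placeholders, and the exact metric and dimension names should be verified against the current AWS/Bedrock CloudWatch documentation.

from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch")

response = cloudwatch.get_metric_statistics(
    Namespace="AWS/Bedrock",
    MetricName="Invocations",
    Dimensions=[{"Name": "ModelId", "Value": "anthropic.claude-v2"}],  # placeholder model ID
    StartTime=datetime.utcnow() - timedelta(days=7),
    EndTime=datetime.utcnow(),
    Period=3600,  # hourly datapoints
    Statistics=["Sum"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Sum"])
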
Incident management
AI/ML solutions need ongoing control and observation to manage any anomalous activities. This requires establishing processes and systems across the AI/ML platform, ideally automated. A standardized incident response strategy needs to be developed and implemented in alignment with the chosen monitoring solution. This includes elements such as formalized roles and responsibilities, data sources and metrics to be monitored, systems for monitoring, and response actions such as mitigation, escalation, and root cause analysis.
Continuous improvement
Define rigorous processes for generative AI model development, testing, and deployment. Streamline the development of generative AI models by defining and refining robust processes. Regularly evaluate the AI/ML platform performance and enhance generative AI capabilities. This involves incorporating feedback loops from stakeholders and end-users and dedicating resources to exploratory research and innovation in generative AI. These practices drive continual improvement and keep the CoE at the forefront of AI innovation. Furthermore, implement generative AI initiatives seamlessly by adopting agile methodologies, maintaining comprehensive documentation, conducting regular benchmarking, and implementing industry best practices.
7. Business
The AI/ML CoE helps drive business transformation by continuously identifying priority pain points and opportunities across business units. Aligning business challenges and opportunities to customized AI/ML capabilities, the CoE drives rapid development and deployment of high-value solutions. This alignment to real business needs enables step-change value creation through new products, revenue streams, productivity, optimized operations, and customer satisfaction.
Envision an AI strategy
With the objective to drive business outcomes, establish a compelling multi-year vision and strategy on how the adoption of AI/ML and generative AI techniques can transform major facets of the business. This includes quantifying the tangible value at stake from AI/ML in terms of revenues, cost savings, customer satisfaction, productivity, and other vital performance indicators over a defined strategic planning timeline, such as 3–5 years. Additionally, the CoE must secure buy-in from executives across business units by making the case for how embracing AI/ML will create competitive advantages and unlock step-change improvements in key processes or offerings.
Use case management
To identify, qualify, and prioritize the most promising AI/ML use cases, the CoE facilitates an ongoing discovery dialogue with all business units to surface their highest-priority challenges and opportunities. Each complex business issue or opportunity must be articulated by the CoE, in collaboration with business unit leaders, as a well-defined problem and opportunity statement that lends itself to an AI/ML-powered solution. These opportunities establish clear success metrics tied to business KPIs and outline the potential value impact vs. implementation complexity. A prioritized pipeline of high-potential AI/ML use cases can then be created, ranking opportunities based on expected business benefit and feasibility.
Proof of concept
Before undertaking full production development, prototype proposed solutions for high-value use cases through controlled proof of concept (PoC) projects focused on demonstrating initial viability. Rapid feedback loops during these PoC phases allow for iteration and refinement of approaches at a small scale prior to wider deployment. The CoE establishes clear success criteria for PoCs, in alignment with business unit leaders, that map to business metrics and KPIs for ultimate solution impact. Furthermore, the CoE can engage to share expertise, reusable assets, best practices, and standards.
Executive alignment
To provide full transparency, the business unit executive stakeholders must be aligned with AI/ML initiatives and receive regular reporting on them. This way, any challenges that need to be escalated can be quickly resolved with executives who are familiar with the initiatives.
8. Legal
The legal landscape of AI/ML and generative AI is complex and evolving, presenting a myriad of challenges and implications for organizations. Issues such as data privacy, intellectual property, liability, and bias require careful consideration within the AI/ML CoE. As regulations struggle to keep pace with technological advancements, the CoE must partner with the organization’s legal team to navigate this dynamic terrain to enforce compliance and responsible development and deployment of these technologies. The evolving landscape demands that the CoE, working in collaboration with the legal team, develops comprehensive AI/ML governance policies covering the entire AI/ML lifecycle. This process involves business stakeholders in decision-making processes and regular audits and reviews of AI/ML systems to validate compliance with governance policies.
9. Procurement
The AI/ML CoE needs to work with partners, both Independent Software Vendors (ISVs) and System Integrators (SIs), to support the buy and build strategies. It needs to partner with the procurement team to develop a selection, onboarding, management, and exit framework. This includes acquiring technologies, algorithms, and datasets (sourcing reliable datasets is crucial for training ML models, and acquiring cutting-edge algorithms and generative AI tools enhances innovation), which helps accelerate the development of capabilities needed for the business. Procurement strategies must prioritize ethical considerations, data security, and ongoing vendor support to provide sustainable, scalable, and responsible AI integration.
10. Human Resources
Partner with Human Resources (HR) on AI/ML talent management and pipeline. This involves cultivating talent to understand, develop, and implement these technologies. HR can help bridge the technical and non-technical divide, foster interdisciplinary collaboration, and build a path for onboarding new talent, training them, and growing them both professionally and technically. HR can also address ethical concerns through compliance training, upskill employees on the latest emerging technologies, and manage the impact on job roles, which is critical for continued success.
11. Regulatory and compliance
The regulatory landscape for AI/ML is rapidly evolving, with governments worldwide racing to establish governance regimes for the increasing adoption of AI applications. The AI/ML CoE needs a focused approach to stay updated, derive actions, and implement regulatory legislation such as Brazil’s General Personal Data Protection Law (LGPD), Canada’s Personal Information Protection and Electronic Documents Act (PIPEDA), and the European Union’s General Data Protection Regulation (GDPR), as well as frameworks such as ISO 31700, ISO 29100, ISO 27701, Federal Information Processing Standards (FIPS), and the NIST Privacy Framework. In the US, regulatory actions include mitigating risks posed by the increased adoption of AI, protecting workers affected by generative AI, and providing stronger consumer protections. The EU AI Act includes new assessment and compliance requirements.
As AI regulations continue to take shape, organizations are advised to establish responsible AI as a C-level priority, set and enforce clear governance policies and processes around AI/ML, and involve diverse stakeholders in decision-making processes. The evolving regulations emphasize the need for comprehensive AI governance policies that cover the entire AI/ML lifecycle, and regular audits and reviews of AI systems to address biases, transparency, and explainability in algorithms. Adherence to standards fosters trust, mitigates risks, and promotes responsible deployment of these advanced technologies.
Conclusion
The journey to establishing a successful AI/ML center of excellence is a multifaceted endeavor that requires dedication and strategic planning, while operating with agility and collaborative spirit. As the landscape of artificial intelligence and machine learning continues to evolve at a rapid pace, the creation of an AI/ML CoE represents a necessary step towards harnessing these technologies for transformative impact. By focusing on the key considerations, from defining a clear mission to fostering innovation and enforcing ethical governance, organizations can lay a solid foundation for AI/ML initiatives that drive value. Moreover, an AI/ML CoE is not just a hub for technological innovation; it’s a beacon for cultural change within the organization, promoting a mindset of continuous learning, ethical responsibility, and cross-functional collaboration.
Stay tuned as we continue to explore the AI/ML CoE topics in our upcoming posts in this series. If you need help establishing an AI/ML Center of Excellence, please reach out to a specialist.

About the Authors
Ankush Chauhan is a Sr. Manager, Customer Solutions at AWS based in New York, US. He supports Capital Markets customers optimize their cloud journey, scale adoption, and realize the transformative value of building and inventing in the cloud. In addition, he is focused on enabling customers on their AI/ML journeys including generative AI. Beyond work, you can find Ankush running, hiking, or watching soccer.
Ava Kong is a Generative AI Strategist at the AWS Generative AI Innovation Center, specializing in the financial services sector. Based in New York, Ava has worked closely with a variety of financial institutions on a variety of use cases, combining the latest in generative AI technology with strategic insights to enhance operational efficiency, drive business outcomes, and demonstrate the broad and impactful application of AI technologies.
Vikram Elango is a Sr. AI/ML Specialist Solutions Architect at AWS, based in Virginia, US. He is currently focused on generative AI, LLMs, prompt engineering, large model inference optimization, and scaling ML across enterprises. Vikram helps financial and insurance industry customers with design and thought leadership to build and deploy machine learning applications at scale. In his spare time, he enjoys traveling, hiking, cooking, and camping with his family.
Rifat Jafreen is a Generative AI Strategist in the AWS Generative AI Innovation center where her focus is to help customers realize business value and operational efficiency by using generative AI. She has worked in industries across telecom, finance, healthcare and energy; and onboarded machine learning workloads for numerous customers. Rifat is also very involved in MLOps, FMOps and Responsible AI.
The authors would like to extend special thanks to Arslan Hussain, David Ping, Jarred Graber, and Raghvender Arni for their support, expertise, and guidance.

Deep Learning Techniques for Autonomous Driving: An Overview

Over the past decade, advancements in deep learning and artificial intelligence have driven significant strides in self-driving vehicle technology. These technologies have revolutionized computer vision, robotics, and natural language processing and played a pivotal role in the autonomous driving revolution. From basic driver assistance to fully autonomous vehicles (AVs) capable of navigating without human intervention, the progression is evident through the SAE Levels of vehicle automation. Despite most scenarios being solvable with traditional methods, unresolved corner cases highlight the necessity for AI-driven solutions. With sensors enabling perception and communication technologies like 5G aiding extended perception, AVs promise safer, more efficient transportation, albeit with challenges like sensor reliability and integration.

Deep Learning-based Decision-Making Architectures for Self-Driving Cars:

Self-driving cars rely on complex decision-making systems that analyze data from various sensors to navigate autonomously. These systems can be modular, with distinct components for perception, path planning, behavior arbitration, and motion control, each designed using AI or classical methods. Alternatively, an End2End learning approach directly maps sensory data to control outputs. Safety monitors ensure the reliability of each module. Understanding the environment, planning paths, behavior arbitration, and motion control are essential tasks. Classical methodologies for these tasks are also explored. Deep learning and AI technologies play crucial roles in both modular and End2End systems for autonomous driving.

Overview of Deep Learning Technologies:

Deep learning plays an important role in autonomous driving, with CNNs being crucial for processing spatial information like images, replacing traditional handcrafted features with learned representations. Mimicking aspects of the mammalian visual cortex, CNNs efficiently detect image features, aiding in object recognition. RNNs excel in processing temporal sequences such as video streams or text. Unlike conventional networks, RNNs possess a time-dependent feedback loop, allowing them to capture temporal dependencies. Long Short-Term Memory (LSTM) networks mitigate the vanishing gradient problem encountered in basic RNNs, enabling the modeling of longer-term dependencies in sequences.

Deep reinforcement learning (DRL) presents a paradigm for autonomous driving, employing the Partially Observable Markov Decision Process formalism. In this framework, an agent, like a self-driving car, navigates an environment based on observed sensory data, taking actions to maximize cumulative future rewards. DRL models, such as Deep Q-Networks (DQN), estimate optimal action policies by training neural networks to approximate the maximum expected future rewards. Extensions to the base DQN algorithm, like double Q-learning and prioritized experience replay, enhance its performance, offering promising avenues for autonomous driving applications. However, challenges remain in adapting DRL to real-world driving conditions.
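
As a concrete illustration of the DQN idea described above, the following minimal PyTorch sketch computes the standard Q-learning target from a batch of transitions; the state and action dimensions are hypothetical and unrelated to any specific driving benchmark.

import torch
import torch.nn as nn

# Online and target Q-networks over an 8-dimensional state and 4 discrete actions (assumed sizes)
q_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))
target_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))
target_net.load_state_dict(q_net.state_dict())
gamma = 0.99  # discount factor for future rewards

def dqn_loss(state, action, reward, next_state, done):
    # Q(s, a) predicted by the online network for the actions that were taken
    q_sa = q_net(state).gather(1, action.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # max_a' Q_target(s', a'): estimate of the maximum expected future reward
        max_q_next = target_net(next_state).max(dim=1).values
        target = reward + gamma * (1.0 - done) * max_q_next
    return nn.functional.mse_loss(q_sa, target)

# Example with a random batch of 32 transitions
loss = dqn_loss(torch.randn(32, 8), torch.randint(0, 4, (32,)), torch.randn(32),
                torch.randn(32, 8), torch.zeros(32))
loss.backward()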

Deep Learning for Driving Scene Perception and Localization:

Autonomous vehicles rely on perceiving their surroundings to navigate safely. The methods involve deep learning, particularly for object detection, recognition, and scene understanding. The debate between camera and LiDAR sensors persists, each having advantages and limitations. While LiDAR offers precise 3D data but is costly and susceptible to weather, cameras are cost-efficient but lack depth perception. Researchers aim to bridge this gap by generating LiDAR-like point clouds from visual depth estimation. Deep learning architectures are employed for object detection, semantic segmentation, and localization, leveraging camera and LiDAR data for comprehensive scene understanding essential for autonomous driving.

Safety of Deep Learning in Autonomous Driving:

Ensuring the safety of autonomous driving systems that utilize deep learning is a multifaceted challenge. Safety hinges on understanding potential failures, the system’s context and defining safe behavior. Different definitions of safety exist, from risk reduction to minimizing harm from unwanted outcomes. Existing standards like ISO 26262 provide a framework, but adapting them for deep learning is complex. Deep learning introduces unique hazards and uncertainties, requiring new fault detection and mitigation approaches. While machine learning techniques are becoming more reliable, comprehensive safety assurance for deep learning in safety-critical systems remains an ongoing endeavor, necessitating the development of tailored safety standards.

Conclusion:

In the realm of autonomous driving, several open challenges persist, all of which can be addressed with the help of Deep Learning and AI:

Perception: Deep learning enhances object detection and recognition accuracy, but future systems should aim for increased detail recognition and improved camera and LiDAR data integration.

Short- to middle-term reasoning: AI and deep learning are crucial for path planning, particularly in local trajectory estimation and planning.

Availability of training data: Deep learning’s efficacy relies heavily on data quality, with simulation environments bridging the gap between real-world data scarcity and training requirements.

Learning corner cases: Deep learning algorithms need enhanced generalization power to handle rare driving scenarios, necessitating the development of one-shot and low-shot learning methods.

Learning-based control methods: Deep learning can adaptively learn control parameters, improving autonomous vehicle performance by approximating true system models.

Functional safety: Integrating deep learning into safety-critical systems poses challenges, particularly in meeting existing safety standards and ensuring the explainability, stability, and robustness of neural networks.

Real-time computing and communication: Meeting real-time processing requirements for large sensor data volumes and high-speed communication lines requires advances in hardware and communication networks.

The post Deep Learning Techniques for Autonomous Driving: An Overview appeared first on MarkTechPost.

TRAMBA: A Novel Hybrid Transformer and Mamba-based Architecture for Sp …

Wearables have transformed human-technology interaction, facilitating continuous health monitoring. The wearables market is projected to surge from 70 billion USD in 2023 to 230 billion USD by 2032, with head-worn devices, including earphones and glasses, experiencing rapid growth (71 billion USD in 2023 to 172 billion USD by 2030). This growth is propelled by the rising significance of wearables, augmented reality (AR), and virtual reality (VR). Head-worn wearables uniquely capture speech signals, traditionally collected by over-the-air (OTA) microphones near or on the head, converting air pressure fluctuations into electrical signals for various applications. However, OTA microphones, typically located near the mouth, easily capture background noise, potentially compromising speech quality, particularly in noisy environments.

Various studies have tackled the challenge of separating speech from background noise through denoising, sound source separation, and speech enhancement techniques. However, this approach is hindered by the model’s inability to anticipate the diverse types of background noises and the prevalence of noisy environments, such as bustling cafeterias or construction sites. Unlike OTA microphones, bone conduction microphones (BCM) placed directly on the head are resilient to ambient noise, detecting vibrations from the skin and skull during speech. Although BCMs offer noise robustness, vibration-based methods suffer from frequency attenuation, affecting speech intelligibility. Some research endeavors explore vibration and bone-conduction super-resolution methods to reconstruct higher frequencies for improved speech quality, yet practical implementation for real-time wearable systems faces challenges. These include the heavy processing demands of state-of-the-art speech super-resolution models like generative adversarial networks (GANs), which require substantial memory and computational resources, resulting in performance gaps compared to smaller footprint methods. Optimization considerations, such as sampling rate and deployment strategies, remain crucial for enhancing real-time system efficiency.

Researchers from Northwestern University and Columbia University introduced TRAMBA, a hybrid transformer and Mamba architecture for enhancing acoustic and bone conduction speech on mobile and wearable platforms. Previously, adopting bone conduction speech enhancement on such platforms faced challenges due to labor-intensive data collection and performance gaps between models. TRAMBA addresses this by pre-training with widely available audio speech datasets and fine-tuning with a small amount of bone conduction data. It reconstructs intelligible speech from a single wearable accelerometer and generalizes across multiple acoustic modalities. Integrated into wearable and mobile platforms, TRAMBA enables real-time speech super-resolution and significant power consumption reduction. This is also the first study to sense intelligible speech using only a single head-worn accelerometer.

At a macro level, the TRAMBA architecture integrates a modified U-Net structure with self-attention in the downsampling and upsampling layers, along with Mamba in the narrow bottleneck layer. TRAMBA operates on 512 ms windows of single-channel audio and preprocesses acceleration data from an accelerometer. Each downsampling block consists of a 1D convolutional layer with LeakyReLU activations, followed by a robust conditioning layer called Scale-only Attention-based Feature-wise Linear Modulation (SAFiLM). SAFiLM utilizes a multi-head attention mechanism to learn scaling factors that enhance feature representations. The bottleneck layer employs Mamba, known for its efficient memory usage and sequence-modeling capability akin to transformers. However, due to gradient vanishing issues, transformers are retained only in the downsampling and upsampling blocks. Residual connections are employed to facilitate gradient flow and optimize deeper networks, enhancing training efficiency.
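
The following is a speculative PyTorch sketch, not the authors' code, of one downsampling block in the spirit of this description: a strided 1D convolution with LeakyReLU followed by a scale-only, attention-based modulation layer. Layer sizes, window length, and sampling rate are assumptions chosen only for illustration.

import torch
import torch.nn as nn

class SAFiLMSketch(nn.Module):
    """Scale-only, attention-based feature modulation (illustrative only)."""
    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.to_scale = nn.Linear(channels, channels)

    def forward(self, x):                              # x: (batch, channels, time)
        feats = x.transpose(1, 2)                      # (batch, time, channels) for attention
        attended, _ = self.attn(feats, feats, feats)   # multi-head self-attention over time
        scale = self.to_scale(attended.mean(dim=1))    # one scaling vector per example
        return x * scale.unsqueeze(-1)                 # scale-only modulation of the features

down_block = nn.Sequential(nn.Conv1d(1, 32, kernel_size=15, stride=2, padding=7), nn.LeakyReLU())
x = torch.randn(2, 1, 8192)          # roughly 512 ms of 16 kHz single-channel audio (assumed rate)
h = down_block(x)                    # (2, 32, 4096)
print(SAFiLMSketch(32)(h).shape)     # torch.Size([2, 32, 4096])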

TRAMBA exhibits superior performance across various metrics and sampling rates compared to other models, including U-Net architectures. Although the Aero GAN method slightly outperforms TRAMBA in the LSD metric, TRAMBA excels in perceptual and noise metrics such as SNR, PESQ, and STOI. This highlights the effectiveness of integrating transformers and Mamba in enhancing local speech formants compared to traditional architectures. Also, transformer and Mamba-based models demonstrate superior performance over state-of-the-art GANs with significantly reduced memory and inference time requirements. Notably, TRAMBA’s efficient processing allows for real-time operation, unlike Aero GAN, which exceeds the window size, making it impractical for real-time applications. Comparisons with the top-performing U-Net architecture (TUNet) are also made.

In conclusion, this study presents TRAMBA, a hybrid architecture combining transformer and Mamba elements for speech super-resolution and enhancement on mobile and wearable platforms. It surpasses existing methods across various acoustic modalities while maintaining a compact memory footprint of only 19.7 MB, in contrast with GANs that require at least hundreds of MB. Integrated into real mobile and head-worn wearable systems, TRAMBA exhibits superior speech quality in noisy environments compared to traditional denoising approaches. It also extends battery life by up to 160% by reducing the resolution of audio that needs to be sampled and transmitted. TRAMBA represents a crucial advancement for integrating speech enhancement into practical mobile and wearable platforms.

The post TRAMBA: A Novel Hybrid Transformer and Mamba-based Architecture for Speech Super Resolution and Enhancement for Mobile and Wearable Platforms appeared first on MarkTechPost.

What Are The Dimensions For Creating Retrieval Augmented Generation (R …

In the dynamic realm of Artificial Intelligence, Natural Language Processing (NLP), and Information Retrieval, advanced architectures like Retrieval Augmented Generation (RAG) have gained a significant amount of attention. However, most data science researchers suggest not to leap into sophisticated RAG models until the evaluation pipeline is completely reliable and robust.

Carefully assessing RAG pipelines is vital, but it is frequently overlooked in the rush to incorporate cutting-edge features. Researchers and practitioners should strengthen their evaluation setup as a top priority before tackling intricate model improvements.

Comprehending the assessment nuances for RAG pipelines is critical because these models depend on both generation capabilities and retrieval quality. The dimensions have been divided into two important categories, which are as follows.

1. Retrieval Dimensions

a. Context Precision: It determines whether every ground-truth relevant item in the retrieved context is ranked higher than the irrelevant items.

b. Context Recall: It assesses the degree to which the retrieved context covers the ground-truth response. It depends on both the retrieved context and the ground truth.

c. Context Relevance: It evaluates how relevant the retrieved context is to the question being asked.

d. Context Entity Recall: It calculates the recall of the retrieved context by comparing the entities that appear in both the ground truth and the context against the entities present in the ground truth alone.
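
As a toy illustration of this metric (not taken from any particular framework), the recall can be computed as the fraction of ground-truth entities that also appear among the entities extracted from the retrieved context:

def context_entity_recall(ground_truth_entities, context_entities):
    """Fraction of ground-truth entities that also appear in the retrieved context."""
    gt = set(ground_truth_entities)
    if not gt:
        return 0.0
    return len(gt & set(context_entities)) / len(gt)

# 2 of the 3 ground-truth entities are covered by the context -> 0.67
print(context_entity_recall(
    ["Paris", "Eiffel Tower", "1889"],            # entities in the ground-truth answer
    ["Eiffel Tower", "Paris", "Gustave Eiffel"],  # entities found in the retrieved context
))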

e. Noise Robustness: It assesses the model’s ability to handle noisy documents that are related to the question but provide little useful information.

2. Generation Dimensions

a. Faithfulness: It evaluates the factual consistency of the generated response with respect to the given context.

b. Answer Relevance: It calculates how well the generated response addresses the given question. Answers that contain redundant or missing information receive lower scores, and vice versa.

c. Negative Rejection: It assesses the model’s capacity to hold off on responding when the documents it has obtained don’t include enough information to address a query. 

d. Information Integration: It evaluates how well the model can integrate data from different documents to provide answers to complex questions.

e. Counterfactual Robustness: It assesses the model’s ability to recognize and ignore known errors in documents, even while it is aware of possible disinformation.

Here are some frameworks that implement these dimensions, which can be accessed through the following links (a hedged example of evaluating with one of them follows the list):

1. Ragas – https://docs.ragas.io/en/stable/

2. TruLens – https://www.trulens.org/

3. ARES – https://ares-ai.vercel.app/

4. DeepEval – https://docs.confident-ai.com/docs/getting-started

5. Tonic Validate – https://docs.tonic.ai/validate

6. LangFuse – https://langfuse.com/
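
For reference, the sketch below shows roughly how an evaluation call looks in Ragas over a one-example dataset; the field and metric names follow the Ragas documentation, but exact APIs vary between versions, so treat this as illustrative rather than copy-paste ready.

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, context_recall, faithfulness

eval_data = Dataset.from_dict({
    "question": ["Who built the Eiffel Tower?"],
    "answer": ["The Eiffel Tower was built by Gustave Eiffel's company."],
    "contexts": [["The Eiffel Tower was constructed by Gustave Eiffel's engineering firm in 1889."]],
    "ground_truth": ["Gustave Eiffel's company built the Eiffel Tower in 1889."],
})

# Scores each dimension between 0 and 1 using an LLM judge configured for Ragas
scores = evaluate(eval_data, metrics=[faithfulness, answer_relevancy, context_precision, context_recall])
print(scores)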

This article is inspired by this LinkedIn post.
The post What Are The Dimensions For Creating Retrieval Augmented Generation (RAG) Pipelines? appeared first on MarkTechPost.

Build a Hugging Face text classification model in Amazon SageMaker Jum …

Amazon SageMaker JumpStart provides a suite of built-in algorithms, pre-trained models, and pre-built solution templates to help data scientists and machine learning (ML) practitioners get started on training and deploying ML models quickly. You can use these algorithms and models for both supervised and unsupervised learning. They can process various types of input data, including image, text, and tabular.
This post introduces using the text classification and fill-mask models available on Hugging Face in SageMaker JumpStart for text classification on a custom dataset. We also demonstrate performing real-time and batch inference for these models. This supervised learning algorithm supports transfer learning for all pre-trained models available on Hugging Face. It takes a piece of text as input and outputs the probability for each of the class labels. You can fine-tune these pre-trained models using transfer learning even when a large corpus of text isn’t available. It’s available in the SageMaker JumpStart UI in Amazon SageMaker Studio. You can also use it through the SageMaker Python SDK, as demonstrated in the example notebook Introduction to SageMaker HuggingFace – Text Classification.
Solution overview
Text classification with Hugging Face in SageMaker provides transfer learning on all pre-trained models available on Hugging Face. According to the number of class labels in the training data, a classification layer is attached to the pre-trained Hugging Face model. Then either the whole network, including the pre-trained model, or only the top classification layer can be fine-tuned on the custom training data. In this transfer learning mode, training can be achieved even with a smaller dataset.
In this post, we demonstrate how to do the following:

Use the new Hugging Face text classification algorithm
Perform inference with the Hugging Face text classification algorithm
Fine-tune the pre-trained model on a custom dataset
Perform batch inference with the Hugging Face text classification algorithm

Prerequisites
Before you run the notebook, you must complete some initial setup steps. Let’s set up the SageMaker execution role so it has permissions to run AWS services on your behalf:

!pip install sagemaker --upgrade --quiet

import sagemaker, boto3, json
from sagemaker.session import Session
sagemaker_session = Session()
aws_role = sagemaker_session.get_caller_identity_arn()
aws_region = boto3.Session().region_name
sess = sagemaker.Session()

Run inference on the pre-trained model
SageMaker JumpStart supports inference for any text classification model available through Hugging Face. The model can be hosted for inference and supports text as the application/x-text content type. This not only allows you to use a set of pre-trained models, but also enables you to choose other classification tasks.
The output contains the probability values, class labels for all classes, and the predicted label corresponding to the class index with the highest probability encoded in JSON format. The model processes a single string per request and outputs only one line. The following is an example of a JSON format response:

accept: application/json;verbose
{"probabilities": [prob_0, prob_1, prob_2, ...],
"labels": [label_0, label_1, label_2, ...],
"predicted_label": predicted_label}

If accept is set to application/json, then the model only outputs probabilities. For more details on training and inference, see the sample notebook.
You can run inference on the text classification model by passing the model_id in the environment variable while creating the object of the Model class. See the following code:

from sagemaker.jumpstart.model import JumpStartModel

hub = {}
HF_MODEL_ID = 'distilbert-base-uncased-finetuned-sst-2-english' # Pass any other HF_MODEL_ID from https://huggingface.co/models?pipeline_tag=text-classification&sort=downloads
hub['HF_MODEL_ID'] = HF_MODEL_ID
hub['HF_TASK'] = 'text-classification'

# infer_model_id is the JumpStart model ID, for example "huggingface-tc-models"
model = JumpStartModel(model_id=infer_model_id, env=hub, enable_network_isolation=False)
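
The snippet below is a hedged sketch of deploying the model created above and querying the endpoint with the application/x-text content type; the instance type and example sentence are assumptions, and the endpoint should be deleted when you are done to avoid charges.

import json

import boto3

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",  # assumption: pick an instance type available in your account
)

runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName=predictor.endpoint_name,
    ContentType="application/x-text",
    Accept="application/json;verbose",
    Body="simply brilliant acting and a moving story".encode("utf-8"),
)
print(json.loads(response["Body"].read()))

# predictor.delete_endpoint()  # clean up when finished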

Fine-tune the pre-trained model on a custom dataset
You can fine-tune each of the pre-trained fill-mask or text classification models to any given dataset made up of text sentences with any number of classes. The pretrained model attaches a classification layer to the text embedding model and initializes the layer parameters to random values. The output dimension of the classification layer is determined based on the number of classes detected in the input data. The objective is to minimize classification errors on the input data. Then you can deploy the fine-tuned model for inference.
The following are the instructions for how the training data should be formatted for input to the model:

Input – A directory containing a data.csv file. Each row of the first column should have an integer class label between 0 and the number of classes. Each row of the second column should have the corresponding text data.
Output – A fine-tuned model that can be deployed for inference or further trained using incremental training.

The following is an example of an input CSV file. The file should not have any header. The file should be hosted in an Amazon Simple Storage Service (Amazon S3) bucket with a path similar to the following: s3://bucket_name/input_directory/. The trailing / is required.

|0 |hide new secretions from the parental units|
|0 |contains no wit , only labored gags|
|1 |that loves its characters and communicates something rather beautiful about human nature|
|…|…|
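
As a small illustrative sketch (the bucket name is a placeholder and the rows mirror the example above), the expected headerless two-column CSV can be written and uploaded as follows:

import boto3
import pandas as pd

rows = [
    (0, "hide new secretions from the parental units"),
    (0, "contains no wit , only labored gags"),
    (1, "that loves its characters and communicates something rather beautiful about human nature"),
]
pd.DataFrame(rows).to_csv("data.csv", header=False, index=False)

# Upload to s3://example-bucket/input_directory/ (note the trailing / when referencing the prefix)
boto3.client("s3").upload_file("data.csv", "example-bucket", "input_directory/data.csv")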

The algorithm also supports transfer learning for Hugging Face pre-trained models. Each model is identified by a unique model_id. The following example shows how to fine-tune a BERT base model identified by model_id=huggingface-tc-bert-base-cased on a custom training dataset. The pre-trained model tarballs have been pre-downloaded from Hugging Face and saved with the appropriate model signature in S3 buckets, such that the training job runs in network isolation.
For transfer learning on your custom dataset, you might need to change the default values of the training hyperparameters. You can fetch a Python dictionary of these hyperparameters with their default values by calling hyperparameters.retrieve_default, update them as needed, and then pass them to the Estimator class. The hyperparameter train_only_top_layer defines which model parameters change during the fine-tuning process. If train_only_top_layer is True, only the parameters of the classification layers change and the rest of the parameters remain constant during the fine-tuning process. If train_only_top_layer is False, all parameters of the model are fine-tuned. See the following code:

from sagemaker import hyperparameters

# Retrieve the default hyperparameters for fine-tuning the model
hyperparameters = hyperparameters.retrieve_default(model_id=model_id, model_version=model_version)

# [Optional] Override default hyperparameters with custom values
hyperparameters["epochs"] = "5"

For this use case, we provide SST2 as a default dataset for fine-tuning the models. The dataset contains positive and negative movie reviews. It has been downloaded from TensorFlow under the Apache 2.0 License. The following code provides the default training dataset hosted in S3 buckets:

# Sample training data is available in this bucket
training_data_bucket = f"jumpstart-cache-prod-{aws_region}"
training_data_prefix = "training-datasets/SST/"

training_dataset_s3_path = f"s3://{training_data_bucket}/{training_data_prefix}"

We create an Estimator object by providing the model_id and hyperparameters values as follows:

from sagemaker.jumpstart.estimator import JumpStartEstimator

# Create SageMaker Estimator instance
tc_estimator = JumpStartEstimator(
    hyperparameters=hyperparameters,
    model_id=dropdown.value,  # model ID selected in the notebook's dropdown widget
    instance_type=training_instance_type,
    metric_definitions=training_metric_definitions,
    output_path=s3_output_location,
    enable_network_isolation=False if model_id == "huggingface-tc-models" else True,
)

To launch the SageMaker training job for fine-tuning the model, call .fit on the object of the Estimator class, while passing the S3 location of the training dataset:

# Launch a SageMaker Training job by passing the S3 path of the training data
tc_estimator.fit({"training": training_dataset_s3_path}, logs=True)

You can view performance metrics such as training loss and validation accuracy/loss through Amazon CloudWatch while training. You can also fetch these metrics and analyze them using TrainingJobAnalytics:

from sagemaker import TrainingJobAnalytics

df = TrainingJobAnalytics(training_job_name=training_job_name).dataframe()  # Produces a DataFrame with the different metrics
df.head(10)

The following graph shows different metrics collected from the CloudWatch log using TrainingJobAnalytics.

For more information about how to use the new SageMaker Hugging Face text classification algorithm for transfer learning on a custom dataset, deploy the fine-tuned model, run inference on the deployed model, and deploy the pre-trained model as is without first fine-tuning on a custom dataset, see the following example notebook.
Fine-tune any Hugging Face fill-mask or text classification model
SageMaker JumpStart supports the fine-tuning of any pre-trained fill-mask or text classification Hugging Face model. You can download the required model from the Hugging Face hub and perform the fine-tuning. To use these models, the model_id is provided in the hyperparameters as hub_key. See the following code:

HF_MODEL_ID = "distilbert-base-uncased" # Specify the HF_MODEL_ID here from https://huggingface.co/models?pipeline_tag=fill-mask&sort=downloads or https://huggingface.co/models?pipeline_tag=text-classification&sort=downloads
hyperparameters["hub_key"] = HF_MODEL_ID

Now you can construct an object of the Estimator class by passing the updated hyperparameters. You call .fit on the object of the Estimator class while passing the S3 location of the training dataset to perform the SageMaker training job for fine-tuning the model.
Fine-tune a model with automatic model tuning
SageMaker automatic model tuning (AMT), also known as hyperparameter tuning, finds the best version of a model by running many training jobs on your dataset using the algorithm and ranges of hyperparameters that you specify. It then chooses the hyperparameter values that result in a model that performs the best, as measured by a metric that you choose. In the following code, you use a HyperparameterTuner object to interact with SageMaker hyperparameter tuning APIs:

from sagemaker.tuner import ContinuousParameter

# Define the objective metric based on which the best model will be selected
amt_metric_definitions = {
    "metrics": [{"Name": "val_accuracy", "Regex": "'eval_accuracy': ([0-9\\.]+)"}],
    "type": "Maximize",
}

# You can select from the hyperparameters supported by the model, and configure ranges of values
# to be searched for training the optimal model
# (https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-define-ranges.html)
hyperparameter_ranges = {
    "learning_rate": ContinuousParameter(0.00001, 0.0001, scaling_type="Logarithmic")
}

# Increase the total number of training jobs run by AMT, for increased accuracy (and training time)
max_jobs = 6

# Change parallel training jobs run by AMT to reduce total training time, constrained by your account limits
# If max_jobs = max_parallel_jobs, the Bayesian search turns into random search
max_parallel_jobs = 2

After you have defined the arguments for the HyperparameterTuner object, you pass it the Estimator and start the training. This will find the best-performing model.
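A minimal sketch of that step, reusing the objects defined above, might look like the following; the objective metric name matches the amt_metric_definitions entry, and any remaining arguments are left at their defaults.

from sagemaker.tuner import HyperparameterTuner

tuner = HyperparameterTuner(
    estimator=tc_estimator,
    objective_metric_name="val_accuracy",
    hyperparameter_ranges=hyperparameter_ranges,
    metric_definitions=amt_metric_definitions["metrics"],
    objective_type=amt_metric_definitions["type"],
    max_jobs=max_jobs,
    max_parallel_jobs=max_parallel_jobs,
)

# Each tuning job fine-tunes on the same training channel used earlier
tuner.fit({"training": training_dataset_s3_path})
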
Perform batch inference with the Hugging Face text classification algorithm
If the goal of inference is to generate predictions from a trained model on a large dataset where minimizing latency isn’t a concern, then batch inference is often the most straightforward, scalable, and appropriate option.
Batch inference is useful in the following scenarios:

Preprocess datasets to remove noise or bias that interferes with training or inference from your dataset
Get inference from large datasets
Run inference when you don’t need a persistent endpoint
Associate input records with inferences to assist the interpretation of results

For batch inference in this use case, you first download the SST2 dataset locally, remove the class labels, and upload the data to Amazon S3. You then create an object of the Model class without deploying an endpoint and create a batch transformer object from it. You use this object to generate batch predictions on the input data. See the following code:

batch_transformer = model.transformer(
    instance_count=1,
    instance_type=inference_instance_type,
    output_path=output_path,
    assemble_with="Line",
    accept="text/csv"
)

batch_transformer.transform(
    input_path, content_type="text/csv", split_type="Line"
)

batch_transformer.wait()

After you run batch inference, you can compare the prediction accuracy on the SST2 dataset.
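A rough sketch of that comparison is shown below; the file names and output layout are assumptions (SageMaker batch transform writes one output line per input line to an <input_name>.out object, assumed here to contain comma-separated class probabilities), so adapt it to the actual transform output.

import numpy as np

y_true = np.loadtxt("labels.txt", dtype=int)        # hypothetical file of held-out SST2 labels
probs = np.loadtxt("data.csv.out", delimiter=",")   # hypothetical downloaded transform output
y_pred = probs.argmax(axis=1)                       # predicted class = highest probability

print(f"Accuracy: {(y_pred == y_true).mean():.3f}")
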
Conclusion
In this post, we discussed the SageMaker Hugging Face text classification algorithm. We provided example code to perform transfer learning on a custom dataset using a pre-trained model in network isolation using this algorithm. We also provided the functionality to use any Hugging Face fill-mask or text classification model for inference and transfer learning. Lastly, we used batch inference to run inference on large datasets. For more information, check out the example notebook.

About the authors
Hemant Singh is an Applied Scientist with experience in Amazon SageMaker JumpStart. He got his master’s from Courant Institute of Mathematical Sciences and B.Tech from IIT Delhi. He has experience in working on a diverse range of machine learning problems within the domain of natural language processing, computer vision, and time series analysis.
Rachna Chadha is a Principal Solutions Architect AI/ML in Strategic Accounts at AWS. Rachna is an optimist who believes that the ethical and responsible use of AI can improve society in the future and bring economic and social prosperity. In her spare time, Rachna likes spending time with her family, hiking, and listening to music.
Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker built-in algorithms and helps develop machine learning algorithms. He got his PhD from University of Illinois Urbana-Champaign. He is an active researcher in machine learning and statistical inference, and has published many papers in NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP conferences.