How ‘Chain of Thought’ Makes Transformers Smarter

Large Language Models (LLMs) like GPT-3 and ChatGPT exhibit exceptional capabilities in complex reasoning tasks such as mathematical problem-solving and code generation, far surpassing standard supervised machine learning techniques. The key to unlocking these advanced reasoning abilities lies in the chain of thought (CoT), which refers to the ability of the model to generate intermediate reasoning steps before arriving at the final answer, kind of like how we humans break down a complex problem into smaller steps in our head. This can be achieved through methods like training the model on examples enriched with intermediate reasoning steps or using few-shot prompting to instruct the model to generate a CoT.
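To make this concrete, here is a minimal sketch of few-shot CoT prompting against an OpenAI-style chat completions API. The worked example in the prompt, the model name, and the API details are illustrative choices, not details taken from the paper.

```python
# Minimal sketch of few-shot chain-of-thought prompting, assuming an
# OpenAI-style chat-completions client; the model name and the arithmetic
# example are illustrative, not from the paper.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

cot_prompt = (
    "Q: A shop sells pens in packs of 12. If Mia buys 3 packs and gives away "
    "7 pens, how many pens does she have left?\n"
    "A: Let's think step by step. 3 packs x 12 pens = 36 pens. "
    "36 - 7 = 29. The answer is 29.\n\n"
    "Q: A train travels 60 km per hour for 2.5 hours. How far does it go?\n"
    "A: Let's think step by step."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative model choice
    messages=[{"role": "user", "content": cot_prompt}],
)
print(response.choices[0].message.content)
```

The in-context example nudges the model to emit its intermediate steps before the final answer instead of answering directly.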

Now, you might think that the content of these intermediate steps is what allows the model to reason better. But interestingly, in this study the researchers found that even if the intermediate steps are incorrect or completely random, the mere act of generating them still helps the model substantially. It's as if the model is being told "Okay, think this through step by step," and that alone drastically improves its reasoning ability.

So the researchers wanted to understand why this “chain of thought” approach is so powerful for transformers (the type of model used in GPT-3, etc). They used concepts from circuit complexity theory and adopted the language of computational complexity classes like NC, AC, and TC to analyze this problem.

Essentially, they found that without the chain of thought, transformers are limited to efficiently performing only parallel computations, meaning they can solve problems that can be broken down into independent sub-tasks that can be computed simultaneously.

However, many complex reasoning tasks require inherently serial computations, where one step follows from the previous step. And this is where the chain of thought helps transformers a lot. By generating step-by-step reasoning, the model can perform many more serial computations than it could without CoT.

The researchers proved theoretically that while a basic transformer without CoT can only solve problems up to a certain complexity level, allowing a polynomial number of CoT steps makes transformers expressive enough to solve, in principle, any problem that can be computed by polynomial-size circuits.

To back up their theory, they also did some experiments on different arithmetic tasks – ones that can be parallelized and ones that inherently require sequential computations. Sure enough, they found that transformers struggled on the sequential tasks without CoT, but enabling CoT drastically boosted their performance, especially when the transformer model was relatively small/shallow.

In essence, the chain of thought is a simple but powerful trick that vastly increases the reasoning capabilities of transformer models like GPT-3. It allows them to tackle complex tasks requiring sequential logic that parallel models would fail at. 


FastGen: Cutting GPU Memory Costs Without Compromising on LLM Quality

Autoregressive language models (ALMs) have proven their capability in machine translation, text generation, and more, but they pose challenges in computational complexity and GPU memory usage, creating an urgent need for cost-effective ways to serve them. The generative inference of large language models (LLMs) relies on the KV cache mechanism to speed up generation, yet as model size and generation length grow, so does the memory the KV cache consumes. When that memory exceeds GPU capacity, generative inference resorts to offloading.

Much work has been done to improve model efficiency for LLMs; one approach, for example, skips multiple tokens at a given time step. More recently, a technique that adds a token-selection task to the original BERT model learns to select performance-crucial tokens and detect unimportant tokens to prune using a learnable threshold. However, these methods apply only to non-autoregressive models and require an extra re-training phase, making them less suitable for autoregressive LLMs like ChatGPT and Llama. Filling this gap calls for exploring the potential of pruning tokens within the KV cache of autoregressive LLMs.

Researchers from the University of Illinois Urbana-Champaign and Microsoft proposed FastGen, a highly effective technique that improves the inference efficiency of LLMs without visible quality loss, using lightweight model profiling and adaptive key-value caching. FastGen constructs the KV cache adaptively, evicting long-range contexts on attention heads that do not need them. The adaptive KV cache is guided by lightweight attention profiling, so no resource-intensive fine-tuning or re-training is required. As a result, FastGen reduces GPU memory usage with negligible generation-quality loss.

The adaptive KV cache compression introduced by the researchers reduces the memory footprint of generative inference for LLMs. Generative inference in this method involves two steps (a simplified code sketch follows the list):

Prompt Encoding: For the i-th token generated by an autoregressive transformer-based LLM, the attention module collects contextual information from all i-1 preceding tokens.

Token Generation: Once prompt encoding is complete, the LLM generates output token by token; at each step, the new token(s) generated in the previous step are encoded by the LLM.
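To illustrate the general idea, here is a minimal sketch of how a per-head eviction policy might prune a KV cache during token generation. The policies shown (keep everything, keep a recent window, keep special tokens plus a recent window) are simplified stand-ins for the head-specific caching strategies that FastGen selects via profiling, not the paper's exact implementation.

```python
import torch

def evict_kv(keys, values, policy, recent_window=64, special_ids=None, token_ids=None):
    """Prune one attention head's KV cache according to a simple policy.

    keys, values: tensors of shape (seq_len, head_dim) for a single head.
    policy: 'full' keeps everything, 'recent' keeps a sliding window,
            'special' keeps special tokens plus the recent window.
    This is an illustrative simplification, not FastGen's exact algorithm.
    """
    seq_len = keys.shape[0]
    keep = torch.zeros(seq_len, dtype=torch.bool)
    if policy == "full":
        keep[:] = True
    elif policy == "recent":
        keep[-recent_window:] = True
    elif policy == "special":
        keep[-recent_window:] = True
        if special_ids is not None and token_ids is not None:
            keep |= torch.isin(token_ids, special_ids)
    return keys[keep], values[keep]
```

In FastGen, the choice of policy for each head comes from the lightweight attention profiling performed during prompt encoding, so heads that genuinely need long-range context keep it while others are pruned aggressively.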

On 30B models, FastGen outperforms all non-adaptive KV compression methods, and it achieves a higher KV cache reduction ratio as model size increases while leaving model quality unaffected. For example, FastGen reaches a 44.9% pruned ratio on Llama 1-65B, compared to a 16.9% pruned ratio on Llama 1-7B, while achieving a 45% win rate. A sensitivity analysis over different hyper-parameters showed that FastGen maintains its 45% win rate, indicating no visible impact on generation quality when the hyper-parameters change.

In conclusion, researchers from the University of Illinois Urbana-Champaign and Microsoft proposed FastGen, a new technique that improves LLM inference efficiency with no visible quality loss, using lightweight model profiling and adaptive key-value caching. The adaptive KV cache compression at the core of FastGen reduces the memory footprint of generative inference for LLMs. Future work includes integrating FastGen with other model compression approaches, such as quantization, distillation, and grouped-query attention.


QoQ and QServe: A New Frontier in Model Quantization Transforming Large Language Model Deployment

Quantization, a method integral to efficient model deployment, is essential for managing the vast computational demands of serving large language models (LLMs). It reduces numerical precision, thereby facilitating quicker computations and more efficient model execution. However, deploying LLMs is inherently complex due to their colossal size and the computational intensity required, so effective deployment strategies must balance performance, accuracy, and computational overhead.

In LLMs, traditional quantization techniques convert high-precision floating-point numbers into lower-precision integers. While this reduces memory usage and accelerates computation, the dequantization it requires at runtime often introduces significant overhead, and the loss of precision can degrade model accuracy.

Researchers from MIT, NVIDIA, UMass Amherst, and MIT-IBM Watson AI Lab introduced the Quattuor-Octo-Quattuor (QoQ) algorithm, a novel approach that refines quantization. This innovative method employs progressive group quantization, which mitigates the accuracy losses typically associated with standard quantization methods. By quantizing weights to an intermediate precision and refining them to the target precision, the QoQ algorithm ensures that all computations are adapted to the capabilities of current-generation GPUs.

The QoQ algorithm utilizes a two-stage quantization process. Initially, weights are quantized to 8 bits using per-channel FP16 scales; these intermediates are further quantized to 4 bits. This approach enables General Matrix Multiplication (GEMM) operations on INT8 tensor cores, enhancing computational throughput and reducing latency. The algorithm also incorporates SmoothAttention, a technique that adjusts the quantization of activation keys to optimize performance further.
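As a rough illustration of the two-stage idea (per-channel INT8 first, then per-group INT4), consider the following sketch. The group size, scale computation, and rounding choices are illustrative; QoQ's actual kernels fuse these steps so that the GEMMs stay on INT8 tensor cores, and they differ in detail.

```python
import torch

def progressive_quantize(w, group_size=128):
    """Toy two-stage quantization: per-channel INT8, then per-group INT4.

    w: FP16/FP32 weight matrix of shape (out_channels, in_channels).
    Returns INT4 codes plus the two levels of scales needed to reconstruct
    approximate weights. This is an illustrative simplification of progressive
    group quantization, not QoQ's actual kernel implementation.
    """
    # Stage 1: per-channel symmetric INT8 quantization with per-channel scales.
    s8 = w.abs().amax(dim=1, keepdim=True) / 127.0            # (out, 1)
    q8 = torch.clamp((w / s8).round(), -128, 127)             # INT8 intermediates

    # Stage 2: quantize the INT8 intermediates to INT4 per group of inputs.
    out_ch, in_ch = q8.shape
    q8_groups = q8.view(out_ch, in_ch // group_size, group_size)
    s4 = q8_groups.abs().amax(dim=2, keepdim=True) / 7.0      # per-group scales
    q4 = torch.clamp((q8_groups / s4).round(), -8, 7)          # INT4 codes

    return q4, s4, s8

def dequantize(q4, s4, s8):
    """Reconstruct an approximate weight matrix from the two-level codes."""
    q8_approx = (q4 * s4).view(q4.shape[0], -1)
    return q8_approx * s8
```

In the real system, the INT4-to-INT8 step is unpacked on the fly so that the matrix multiplications themselves run on INT8 tensor cores, which is what delivers the throughput gains described above.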

The QServe system was developed to support deployment of the QoQ algorithm. QServe provides a tailored runtime that maximizes LLM efficiency by exploiting the algorithm's full potential. It integrates with current GPU architectures and minimizes the work that must run on low-throughput CUDA cores, significantly boosting processing speed. The system design reduces quantization overhead through compute-aware weight reordering and fused attention mechanisms, which are essential for maintaining throughput and minimizing latency in real-time applications.

Performance evaluations of the QoQ algorithm indicate substantial improvements over previous methods. In testing, QoQ improved the maximum achievable serving throughput of Llama-3-8B by up to 1.2 times on NVIDIA A100 GPUs and up to 1.4 times on L40S GPUs. Remarkably, on the L40S platform, QServe achieved throughput up to 3.5 times that of the same model served on A100 GPUs, significantly reducing the cost of LLM serving.

In conclusion, the study introduces the QoQ algorithm and QServe system as groundbreaking solutions to the challenges of deploying LLMs efficiently. By addressing the significant computational overhead and accuracy loss inherent in traditional quantization methods, QoQ and QServe markedly enhance LLM serving throughput. The results from the implementation demonstrate up to 2.4 times faster processing on advanced GPUs, substantially reducing both the computational demands and the economic costs associated with LLM deployment. This advancement paves the way for broader adoption and more effective use of large language models in real-world applications.


Innovating Game Design with GPT: A Comprehensive Scoping Review

The advent of Generative Pre-trained Transformers (GPT) has revolutionized the gaming industry, enhancing both game development and gameplay experiences. This article examines a comprehensive scoping review that synthesizes findings from 55 research articles published between 2020 and 2023, detailing the applications, challenges, and future potential of GPT in gaming. It highlights key areas where GPT is making significant impacts and outlines directions for future research.


Procedural Content Generation (PCG)

GPT’s application in procedural content generation marks a paradigm shift in how game content is created, offering automated processes that enhance creativity and efficiency:

Story Generation: GPT models can generate intricate, contextually rich narratives that adapt to player decisions, creating a personalized gaming experience.

Quest Creation: These models enhance narrative depth by generating thematic and coherent quests that integrate seamlessly with the game narrative.

Level Design: GPT aids in crafting detailed and challenging game levels, using player data to adjust difficulty and elements within the game environment dynamically.

Mixed-Initiative Game Design (MIGD)

GPT enhances collaborative game design by integrating AI-generated content with human creativity:

Content Development: GPT suggests innovative content that developers refine and integrate, thus enriching the gaming environment and narrative.

Design Efficiency: The technology improves the speed and quality of game development, enabling a more iterative and responsive design process.

Enhancing Gameplay with AI

The integration of GPT into gameplay mechanics significantly enriches player interaction and immersion, making games more engaging and adaptive:

Dynamic Interaction: GPT models respond to player actions with tailored narrative elements and gameplay mechanics, enhancing games’ storytelling.

Adaptive Gameplay: These models adjust game dynamics based on real-time player feedback, customizing experiences to individual preferences and improving engagement.

Autonomous Game Playing

GPT’s ability to play games autonomously or alongside humans showcases its versatility:

Strategy and Role-playing: GPT acts as a competent player or a collaborative partner in strategy games and RPGs, providing sophisticated strategic insights.

Simulation and Training: GPT participates in simulations, offering realistic scenarios that challenge players and aid training and skill development.

Game User Research

GPT plays a crucial role in analyzing player behavior and feedback, thereby aiding developers in refining game design:

Feedback Analysis: GPT analyzes large volumes of player data to extract patterns and preferences, offering valuable insights for game development.

User Experience Improvement: Based on these insights, GPT helps fine-tune game mechanics and features to meet player expectations better.

Conclusion

The integration of GPT into the gaming industry marks a significant milestone. This transformative technology promises to revolutionize how games are designed, developed, and played. It has the potential to automate and enhance various aspects of game creation and player interaction, making games more immersive, personalized, and engaging.

Innovative Content Creation: GPT’s capabilities in generating procedural content have proven to be a game-changer. By automating the creation of narratives, quests, and game levels, GPT allows developers to focus on more complex and creative aspects of game design, thereby speeding up the development process and reducing costs.

Enhanced Player Experience: GPT’s real-time interaction capabilities significantly enhance the gameplay experience. By dynamically adjusting game elements and narratives in response to player actions, GPT ensures that each gaming session is unique & tailored to individual preferences, increasing player engagement and satisfaction.

Strategic Gameplay Enhancement: GPT’s ability to function as a competent player or an intelligent adversary in strategy games and RPGs introduces a new level of challenge and complexity. This enhances the gameplay experience and aids in training and skill development, preparing players for competitive environments.

Research and Development: GPT also contributes to game user research by analyzing player data to provide insights into player behaviors and preferences. This information is invaluable for game developers looking to refine their games to meet player expectations better and improve overall user experience.

Sources:

https://arxiv.org/pdf/2404.17794

https://arxiv.org/abs/2005.14165


ChuXin: A Fully Open-Sourced Language Model with a Size of 1.6 Billion Parameters

The capacity of large language models (LLMs) to produce fluent text across application domains has caused a revolution in natural language generation. These models generally fall into two categories: 1) models for which most weights and data sources are open source, and 2) fully open models for which all model-related information is publicly available, including training data, data sampling ratios, training logs, intermediate checkpoints, and evaluation methods (e.g., TinyLlama, OLMo, and StableLM 1.6B). Full access to open language models is vital for the research community to thoroughly investigate these models' capabilities and limitations and to understand their inherent biases and potential risks, even as community-released models continue to improve in performance.

Meet ChuXin 1.6B, a 1.6-billion-parameter open-source language model. ChuXin was trained on 2.3 trillion tokens of open-source data drawn from a variety of sources, including encyclopedias, online publications, and public knowledge databases in English and Chinese. The project takes inspiration from other open-source efforts such as OLMo, TinyLlama, and StableLM 1.6B. To reach an input length of 1 million tokens, the researchers extended ChuXin's context length by continuing pre-training on datasets derived from longer texts. They strongly believe that cultivating a broad and diverse ecosystem of such models is the best way to improve the scientific understanding of open language models and to drive technological advances that make them more practical.

For the backbone, the team used the LLaMA2 architecture, scaled to about 1.6 billion parameters. The design of ChuXin 1.6B includes the following components (a minimal code sketch of two of them follows the list):

Rotary positional embeddings (RoPE): The team uses RoPE to encode the relative positions of tokens, capturing the associations between sequence elements at different locations.

Root-mean-square normalization (RMSNorm): Pre-normalization, which normalizes the input before each sub-layer in the transformer, makes training more stable. RMSNorm is used as the normalization function, which further improves training efficiency.

Attention mask: Following StableLM's lead, the team implemented a block-diagonal attention mask that resets attention at EOS (end-of-sequence) tokens for all packed sequences. This avoids cross-sequence attention and further improves the model's performance during the cooldown phase of training.

Tokenizer: The data was tokenized using the DeepSeek LLM tokenizer, which is based on the tokenizers library's Byte-level Byte-Pair Encoding (BBPE) and has a vocabulary of 102,400 tokens. The tokenizer was trained on a 24 GB multilingual corpus, and it splits numbers into individual digits to improve the encoding of numerical data.

Activation function: The team used SwiGLU as the activation function in the feed-forward layers.
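For readers who want to see what two of these pieces look like in code, below is a minimal PyTorch sketch of RMSNorm and a SwiGLU feed-forward block. The dimensions and epsilon are illustrative rather than ChuXin's exact configuration, and RoPE and the attention mask are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square normalization, typically applied before each sub-layer."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        # Scale each token vector by the inverse of its root-mean-square.
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return self.weight * x * rms

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: silu(x W1) gated by (x W3), projected by W2."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))
```

With pre-normalization, RMSNorm is applied to the input of the attention and feed-forward sub-layers rather than to their outputs.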

The team's training process used pre-training datasets obtained from HuggingFace, making it easier for others to reproduce the pre-trained model. They trained from scratch with a 4,096-token context length and several efficiency-oriented implementations, starting with FlashAttention-2 to increase device throughput during training. Training ran in BFloat16 mixed precision, with all-reduce operations kept in FP32. Their analysis indicates little difference in loss between training on unique data and training on data repeated over several epochs, so they trained for two epochs on 2 trillion (2T) tokens.

To test performance on Chinese tasks, the team uses CMMLU and C-Eval, two benchmarks for Chinese comprehension and reasoning, and HumanEval to test how well the model generates code. ChuXin's pre-training progress was tracked using commonsense reasoning benchmarks. The results show that, except for OpenBookQA, performance on most tasks improves as the number of training tokens increases.

In the future, the team envisions providing larger and more advanced models, incorporating features like instruction tuning and multi-modal integration. They also plan to share the challenges they faced and the solutions they devised while developing ChuXin, aiming to inspire the open-source community and stimulate further progress in language modeling.


Top 50 AI Writing Tools To Try in 2024

Grammarly

Grammarly is a great tool for enhancing writing. It reviews grammar, spelling, punctuation, and style to ensure clear and professional content.

Jasper

Jasper AI is one of the most popular AI writing tools that makes it easy to create content for websites, blogs, social media, etc.

ChatGPT

ChatGPT is a robust language generation model helpful for a range of writing tasks. It handles conversation generation, language translation, summarization, and more.

GPT-4

GPT-4 creates text closely resembling human writing, becoming a potent asset for writers. Many top AI writing tools are enhancing their software by incorporating GPT-4 technology.

Growthbar

Growthbar is an ideal tool for creating blog content and optimizing it for SEO.

ClosersCopy

This AI writing tool can be used for an array of tasks – writing blog posts, social media content, creating presentations, writing books, etc.

Writesonic

Writesonic allows users to generate high-quality articles, blog posts, and more. It can write content in most of the world’s popular languages, like English, Spanish, French, etc.

Article Forge

Article Forge allows users to generate SEO-optimized, high-quality, unique content about any topic.

ParagraphAI

ParagraphAI is an AI writing app for iOS, Android, and Chrome that helps users write emails and articles better and faster.

Scalenut

Scalenut is a platform for content intelligence that helps users find and create content most relevant to their audience.

Content at Scale

This tool produces high-quality articles quickly while focusing on quality instead of quantity.

Copy AI

Copy AI is a content generator that helps users overcome writer's block.

Frase.io

Frase helps users attract and convert more customers from organic search.

Rytr

Rytr is a budget-friendly tool that helps users instantly generate high-quality content.

AI Writer

Using this tool, users can generate accurate, relevant, and high-quality content just by entering a brief description of their topic.

Simplified

Simplified allows users to create content, scale their brands, and collaborate with their team.

Copymatic

Copymatic allows users to create unique, high-quality content like blog posts, landing pages, and digital ads.

Peppertype

Peppertype allows content marketers to generate content ideas instantly.

HiveMind

HiveMind is a tool that automates tasks like content writing, data extraction, and translation.

Anyword

Anyword is an AI writing assistant that makes it easy for users to include specific details, SEO keywords, and other important information.

Narrato

Narrato is a platform used for content creation and copywriting. It helps create briefs, assign tasks to writers via the Narrato Marketplace, and manage the content workflow.

WordAI

WordAI is an AI copywriting tool that enhances content production by rephrasing and restructuring text. Using its natural language generation tools, it can generate up to 1,000 SEO-friendly rewrites from a single piece of content.

Writerly

Writerly provides a generative AI Chrome extension that helps users extract ideas from articles during browsing and generates content briefs for writers.

NeuralText

NeuralText is an all-in-one AI copywriter, SEO content, and keyword research tool.

INK

With INK’s top-notch SEO Optimization and Semantic Intelligence, users can produce content more quickly than ever.

SEO.ai

SEO.ai leverages advanced AI tech to analyze keywords, create search intent-based articles, and enhance content for faster, superior search engine outcomes.

HubSpot

Businesses can swiftly create copy for various programs and channels using HubSpot’s free AI writer.

ProWritingAid

This tool is a grammar checker, style editor, and writing mentor.

Wordtune

Wordtune is an AI Writing assistant that works on websites like Facebook, Twitter, Gmail, and LinkedIn.

Writer

Writer is a platform designed for enterprises to help them create content consistent with their brand.

LongShot

LongShot is a Generative AI for the latest content, one-click blogs, and user-sourced content.

GetGenie

GetGenie AI provides an easy-to-use WordPress plugin that uses AI to replace over ten different apps.

Reword

Reword allows users to generate highly engaging and readable articles.

Outranking

Outranking.io allows users to plan, research, write, optimize, and track their content all in one place.

Hoppycopy

Hoppycopy is a copywriting tool that allows SEO marketers to create powerful and effective email marketing campaigns, newsletters, and more.

Lightkey

This tool provides live text predictions and spelling and grammar fixes while typing in MS Office, Chrome, and Edge, and supports 85 languages.

SEOmater

SEOmater boosts content writers and SEO experts with features like keyword research, content optimization, competitor analysis, performance tracking, and detailed reporting.

AISEO.ai

AISEO combines AI and SEO to write SEO-optimized content.

Neuroflash

Neuroflash is an AI-powered marketing copy creation software that helps marketing teams create short and long-form copies.

TextCortex

TextCortex is a robust tool crafted to accommodate one’s distinct communication style and individual requirements.

Regie.ai

Regie tailors content using specific business and prospect data to serve enterprise sales teams. It swiftly creates personalized, optimized one-on-one sales emails and sequences through Generative AI.

TextExpander

TextExpander is a writing productivity tool that aids teams in handling repetitive writing tasks, forming snippets, fixing spellings, sharing content, and more.

Lyne.ai

Lyne allows users to send personalized cold emails at scale.

Shopia

Shopia offers 80+ writing templates to create various content pieces instantly.

Lavender

Lavender is a browser extension that merges AI writing, social data, and inbox productivity tools.

NexMind

NexMind swiftly produces optimized long and short-form content with NLP and semantic suggestions.

Benchmark Email

Benchmark Email makes it easy to create and send emails that keep users in touch with their audience.

Swiftbrief

Swiftbrief is a comprehensive tool for crafting top-quality briefs to guide content writers.

Cohesive AI

Cohesive AI is a Cloud-based AI writing assistant that aids businesses in crafting, editing, and overseeing marketing content like SEO blogs.

Quillbot

QuillBot excels at summarizing and paraphrasing documents and articles, preventing plagiarism.


A Survey Report on New Strategies to Mitigate Hallucination in Multimodal Large Language Models

Multimodal large language models (MLLMs) represent a cutting-edge intersection of language processing and computer vision, tasked with understanding and generating responses that consider both text and imagery. These models, evolving from their predecessors that handled either text or images, are now capable of tasks that require an integrated approach, such as describing photographs, answering questions about video content, or even assisting visually impaired users in navigating their environment.

A pressing issue these advanced models face is known as ‘hallucination.’ This term describes instances where MLLMs generate responses that seem plausible but are factually incorrect or not grounded in the visual content they are supposed to analyze. Such inaccuracies can undermine trust in AI applications, especially in critical areas like medical image analysis or surveillance systems, where precision is paramount.

Efforts to address these inaccuracies have traditionally focused on refining the models through sophisticated training regimes involving vast annotated images and text datasets. Despite these efforts, the problem persists, largely due to the inherent complexities of teaching machines to interpret and correlate multimodal data accurately. For instance, a model might describe elements in a photograph that are not present, misinterpret the actions in a scene, or fail to recognize the context of the visual input.

Researchers from the National University of Singapore, Amazon Prime Video, and AWS Shanghai AI Lab have surveyed methodologies for reducing hallucinations. One approach they examine tweaks the standard training paradigm by introducing novel alignment techniques that enhance the model's ability to correlate specific visual details with accurate textual descriptions. This method also involves a critical evaluation of data quality, focusing on the diversity and representativeness of the training sets to prevent the common data biases that lead to hallucinations.

Quantitative improvements in several key performance metrics underscore the efficacy of studied models. For instance, in benchmark tests involving image caption generation, the refined models demonstrated a 30% reduction in hallucination incidents compared to their predecessors. The model’s ability to accurately answer visual questions improved by 25%, reflecting a deeper understanding of the visual-textual interfaces.

In conclusion, the review of multimodal large language models examines the significant challenge of hallucination, which has been a stumbling block to fully reliable AI systems. The surveyed solutions not only advance the technical capabilities of MLLMs but also enhance their applicability across various sectors, promising a future where AI can be trusted to interpret and interact with the visual world accurately. This body of work charts a course for future developments in the field and serves as a benchmark for ongoing improvements in AI's multimodal comprehension.


Top Low/No Code AI Tools 2024

Applications that take advantage of machine learning in novel ways are being developed thanks to the rise of Low-Code and No-Code AI tools and platforms. AI can be used to create web services and customer-facing apps to coordinate sales and marketing efforts better. Minimal coding expertise is all that’s needed to make use of Low-Code and No-Code solutions.

Artificial intelligence technologies that require little to no coding reflect a long-sought objective in computer science. No-code is a software design approach that lets people build software without writing a single line of code, while low-code is a development technique that promotes faster app delivery with minimal hand-written code; low-code platforms are tools that allow apps to be built visually through a GUI. Together, these approaches provide code-free or low-code development environments for AI applications, often driven by a simple drag-and-drop interface.

Top low-code and no-code AI tools include the following:

MakeML

Use MakeML to generate machine-learning models for object identification and segmentation without hand-coding. It simplifies the process of creating and efficiently managing a large dataset. In addition to preparing your ML models for action, you can also test them. MakeML is an online resource that can teach you all you need to know to build AI software and apply Computer Vision to an in-house problem in only a few hours. Video tutorials are also available on your mobile device to help you master Machine Learning. The skilled professionals at MakeML will assist you in developing a Computer Vision solution and incorporating it into your product. A single GPU cloud training and limited dataset import/export are provided at no cost.

Obviously AI

With Obviously AI's machine learning platform, you can make accurate predictions in minutes without knowing how to code. This entails creating machine learning algorithms and forecasting their results with a single mouse click. Use the data dialog to modify your dataset without additional code, then distribute or showcase your ML models across your organization. The low-code API allows anyone to use the algorithms to make predictions and incorporate those forecasts into their real-world applications. Furthermore, Obviously AI gives you access to state-of-the-art algorithms and technologies without compromising efficiency. It can be used for revenue forecasting, supply chain planning, and targeted advertising. Lead conversion, dynamic pricing, loan repayment, and other outcomes can all be forecast in real time.

SuperAnnotate

Create AI-Powered SuperData using SuperAnnotate. It’s an end-to-end system for AI-related tasks, including annotating, managing, and versioning “ground truth” data. With its extensive toolkit, top-tier annotation services, and solid data management system, your AI pipeline can be scaled and automated three to five times faster. High-throughput data annotation of video, text, and image to create high-quality datasets using industry-leading services and software. Project management tools and teamwork can help your model succeed in the field. Set up a streamlined annotation workflow, keep tabs on project quality, share updates with the team, and more—all with SuperAnnotate. It can speed up your annotation process because of its active learning and automation features. 

Teachable Machine

Teachable Machine allows you to teach a computer to recognize and respond to your voice, gestures, and photos. Without the need to write any code, it facilitates the rapid creation of robust ML models for integration into applications, websites, and more. Teachable Machine is a web-based low-code machine learning platform that enables the development of widely usable machine learning models. You’ll need to collect and organize examples into relevant classes to teach a computer something new. You may put your computer through its paces as a learning machine and then immediately put it to the test. You can use the model in your online projects. You can also host the model online or distribute it as a downloadable file. And the best part is the model works completely locally on your device, so none of your audio or video has to leave the system at any point. Classifying photos and body orientations is a breeze with the help of files, a camera, and short audio samples. 

Apple’s Create ML

Discover an innovative approach to building and training ML models on your Mac. Apple's Create ML facilitates efficient ML model creation and training on macOS. In a single project, you can train multiple models simultaneously, each with a unique dataset. It supports external graphics processing units to speed up model training on your Mac, and you can take charge of training with options like pausing and resuming. The evaluation set tells you how well your model performed, and you can examine key metrics and relationships to spot model-enhancing use cases and opportunities for the future. Try out the model's performance with a continuous preview using your iPhone's camera, and train models more quickly on your Mac by using hardware accelerators. Create ML supports many model types, including images, video, sound, speech, text, and tables. Afterward, you can retrain your model with new data and settings.

PyCaret

You can automate your machine-learning workflows in Python with the help of PyCaret, a low-code machine-learning library. With this straightforward library, you can devote more effort to analysis, such as data preprocessing, model training, model explainability, MLOps, and exploratory data analysis, and less to writing code. PyCaret is built modularly, so different modules handle different machine-learning operations; functions are collections of processes that carry out tasks according to a defined procedure. Using PyCaret, virtually anyone can create complete, low-code machine-learning solutions. A quick-start guide, blog, videos, and online forums are available for learning. You can create a basic ML app, train your model rapidly, analyze and refine it, and then instantly deploy it as a REST API.

Lobe

Use Lobe to teach your apps to recognize plants, read gestures, count reps, recognize emotions, detect colors, and check safety. It facilitates the training of ML models, provides accessible and free tools, and supplies everything required to develop such models. Provide examples of the behavior you would like your application to learn, and a machine-learning model will be trained automatically and made ready to release as soon as possible. This platform requires no coding experience and can be used by anyone. You can save money and time by skipping cloud storage and instead training locally on your PC; Lobe can be downloaded for both Windows and Mac. Furthermore, your model is cross-platform and ready for export or distribution, and the ideal machine-learning architecture for your project is chosen automatically.

MonkeyLearn

MonkeyLearn provides state-of-the-art Artificial Intelligence tools that will make cleaning, visualizing, and labeling client feedback a breeze. It is a data visualization and no-code text analysis studio that comprehensively analyzes your data. MonkeyLearn allows you to quickly and easily generate unique data visualizations and charts, allowing for more in-depth data exploration. You may also merge and filter these findings based on data inputs like date ranges and custom fields. In addition to using pre-made machine learning models, you can create your own with MonkeyLearn. Additionally, various pre-trained classifiers are available for use—emotion analysis, topic classifiers, entity extractors, and so on- and may all be constructed rapidly. 

Akkio

Akkio is a platform for artificial intelligence that doesn’t require users to write any code to build prediction models. It facilitates the easy creation of predictive models from user data for improved in-the-moment decision-making. Key business results, such as enhanced lead scoring, forecasting, text classification, and reduced churn, can be predicted with the help of Akkio’s use of existing data. It can also do advanced tasks for cleaning data, like merging columns, reshaping dates, and filtering out anomalies. Because of its intuitive interface, Akkio may be utilized by non-technical business users without the requirement for coding or machine learning knowledge. It may reduce time and increase output in various settings, from marketing and sales to finance and customer support.

Amazon SageMaker

Machine learning (ML) models can be created, trained, and deployed with the help of Amazon SageMaker, a cloud-based ML platform that offers a full suite of ML-related tools and services. SageMaker’s no-code and low-code tools streamline the machine learning (ML) model development and deployment processes for non-technical users and business analysts. Amazon SageMaker Canvas is a visual tool that facilitates ML model development and deployment without writing code. SageMaker Canvas’s intuitive drag-and-drop interface streamlines the processes of data selection, algorithm selection, and model training. SageMaker Canvas may then make predictions and put the trained model into production.

Data Robot

Data Robot is an artificial intelligence platform that streamlines the entire lifecycle of machine learning model development, deployment, and management. It’s a robust resource that serves many users, from data scientists and engineers to businesspeople. Data Robot’s flexible features make it a solid pick for those with little programming experience. Data Robot offers a visual, drag-and-drop interface for non-technical people to create and deploy machine learning models. This paves the way for business users with rudimentary technical skills to experiment with AI. Data Robot’s adaptable interface makes machine learning customization easier for non-programmers. Integration with external systems and the capability to create one’s programs fall under this category.

Google AutoML

With Google’s AutoML, programmers and data scientists can create and release machine learning models without using hand-coded solutions. If you have little experience with machine learning, you can still use this platform to construct models because it requires little to no coding. Google AutoML provides a library of pre-trained models that may be used in various scenarios. These models are accurate because they are trained on large datasets. With Google AutoML, creating and deploying models is as straightforward as dragging and dropping components. It may be used without having to learn how to code. Google AutoML takes care of tuning your models’ hyperparameters automatically. Time and energy are both conserved by this method. You may check how well your models are doing with the help of Google’s AutoML tools. This aids in making sure your models are trustworthy and correct.

Nanonets

NanoNets is a machine learning API that allows developers to train a model with only a tenth of the data and no prior experience with machine learning. Upload your data, wait a few minutes, and you will have a model that can be queried via their simple cloud API. Extracting structured or semi-structured data from documents is made faster and more efficient by this AI platform. The OCR technology powered by artificial intelligence can read documents of any size or complexity. The document processing workflow can be streamlined using Nanonets’ AP Automation, Touchless Invoice Processing, Email Parsing, and ERP Integrations, among other services. In addition to PDF to Excel, CSV, JSON, XML, and Text conversion, Nanonets comes with various free OCR converters.

IBM Watson Studio

IBM Watson Studio is a service that provides a central hub from which anybody can create, release, and manage AI models in the cloud. It offers features and tools that make AI development accessible to people with little coding skills. Watson Studio’s no- or low-code features are a major selling point. It’s now possible to construct AI models without resorting to custom coding. Instead, you can utilize Watson Studio’s visual tools to assemble your project by dragging and dropping individual components into place. This paves the way for non-technical people, including business users, analysts, and researchers, to construct AI models. You can get up and running quickly with Watson Studio and its many pre-trained models. Uses for these models range from spotting fraudulent activity and client segmentation to predicting the need for repairs. After finishing an AI model in Watson Studio, you can send it into production. Watson Studio allows for both cloud-based and on-premises deployments and hybrid implementations that combine the two.

H2O Driverless AI

H2O Driverless AI is an AutoML platform streamlining the machine learning lifecycle, from preprocessing data to releasing models. This is a priceless tool for data scientists and business users since it allows them to build and deploy machine learning models without writing code. H2O Driverless AI uses several methods, including imputation, modification, and selection, to autonomously engineer features from your data. In machine learning, feature engineering is frequently the most time-consuming step, so this might be a huge time saver. Decision trees, random forests, support vector machines, and neural networks are some machine learning models that H2O Driverless AI can automatically construct and analyze. In addition, it optimizes your data by adjusting the hyperparameters of each model. With H2O Driverless AI, your models are instantly deployed to production, where they may be used in making predictions.

Domino Data Lab

Domino Data Lab is a cloud-based service that facilitates creating, deploying, and managing machine learning models for data scientists, engineers, and analysts. It’s a low- or no-code artificial intelligence tool for designing and automating data science operations. Domino Code Assist is a tool that can build Python and R code for frequent data science projects. This can reduce the learning curve for non-technical users and the workload for data scientists. Domino Data Lab facilitates effective teamwork on data science initiatives. Users can collaborate on projects by sharing and analyzing code, data, and models. Data science projects are 100% reproducible in Domino Data Lab. This allows anyone to replicate a project’s outcomes without obtaining the original data or source code. Domino Data Lab has several tools that can be used to manage data science initiatives. Access control, code history, and auditing of the model’s efficacy are all part of this.

CrowdStrike Falcon Fusion

Organizations may automate their security operations, threat intelligence, and incident response with the help of CrowdStrike Falcon Fusion, a security orchestration, automation, and response (SOAR) architecture. It is based on the CrowdStrike Falcon® platform and is provided at no extra cost to CrowdStrike subscribers. Falcon Fusion is a low- to no-code tool, making it accessible to organizations of all sizes in the security industry. The software’s drag-and-drop interface simplifies the process of developing and automating workflows. Falcon Fusion also features a library of pre-built connections with various security solutions, allowing easy and rapid integration with an organization’s pre-existing infrastructure. Artificial intelligence (AI) is leveraged by Falcon Fusion to facilitate automation and better judgment. For instance, the program may analyze security telemetry data for patterns, assign priorities to incidents, and suggest courses of action using artificial intelligence. Consequently, security personnel are better able to deal with threats.

RapidMiner

Data mining and machine learning models can be created and deployed quickly with RapidMiner, a comprehensive data science platform. Data preprocessing, feature engineering, model training, evaluation, and deployment are just some of its services. RapidMiner's no/low-code approach is a major selling point: you can build and release AI models without touching a single line of code, using a graphical user interface (GUI) that lets you assemble models by dragging and dropping building blocks. This makes it easier for non-technical users to enter the field of artificial intelligence. Alongside its no/low-code capabilities, RapidMiner also offers sophisticated scripting features, including a language dubbed RapidMiner R, which you can use to customize your models and extend RapidMiner with new functionality.


Meet StyleMamba: A State Space Model for Efficient Text-Driven Image Style Transfer

In a recent study, a team of researchers from Imperial College London and Dell introduced StyleMamba, an effective image style transfer framework that uses text prompts to direct the stylization process while preserving the original image content. It addresses the heavy computational needs and training inefficiencies of current text-guided stylization techniques.

Text-driven stylization is traditionally approached with large computational resources and drawn-out training procedures. With the introduction of a conditional State Space Model created especially for effective text-driven image style transfer, StyleMamba expedites this procedure. With this methodology, stylization can be precisely controlled by sequentially aligning image features with target text cues.

StyleMamba provides two loss functions, a second-order directional loss and a masked loss, to guarantee both local and global style consistency between the images and the text prompts while steering the stylization direction. These losses reduce the number of training iterations required by a factor of 5 and inference time by a factor of 3.
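For background, directional losses of this kind are usually computed in a joint image-text embedding space such as CLIP's: the change from the source image to the stylized image should point in the same direction as the change from the source text description to the target style text. The sketch below shows only that generic formulation; it is not StyleMamba's exact masked or second-order loss, and the clip_model object with an encode_image method is an assumed interface.

```python
import torch
import torch.nn.functional as F

def directional_loss(clip_model, src_img, styl_img, src_text_emb, tgt_text_emb):
    """Generic CLIP-style directional loss for text-guided stylization.

    The image-space direction (stylized minus source) is encouraged to align
    with the text-space direction (target style minus source description).
    Shown as background only; not StyleMamba's exact loss.
    """
    img_dir = clip_model.encode_image(styl_img) - clip_model.encode_image(src_img)
    txt_dir = tgt_text_emb - src_text_emb
    img_dir = F.normalize(img_dir, dim=-1)
    txt_dir = F.normalize(txt_dir, dim=-1)
    return (1 - F.cosine_similarity(img_dir, txt_dir, dim=-1)).mean()
```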

StyleMamba's effectiveness has been confirmed by extensive experiments and qualitative analyses. The results verify that the robustness and overall stylization performance of the proposed method surpass those of current baselines. The framework provides a more effective and economical way to turn textual descriptions into visually appealing styles while preserving the integrity and spirit of the original image content.

The team has summarized their primary contributions as follows. 

By incorporating a conditional Mamba into an AutoEncoder architecture, StyleMamba presents a simple yet powerful framework. With this integration, text-driven style transfer can be accomplished quickly and effectively, simplifying the procedure in comparison to current approaches.

StyleMamba uses new loss functions to improve stylization quality. The masked directional loss and second-order relational loss ensure better global and local style consistency without sacrificing the original content of the images, and they speed up the stylization process.

StyleMamba’s effectiveness has been proven by thorough empirical analyses, which comprise both quantitative and qualitative evaluations. These tests demonstrate StyleMamba’s advantage in terms of both stylization quality and speed. 

StyleMamba has been evaluated in settings other than still image style transfer because of its ease of use and effectiveness. Experiments have shown how versatile and adaptable StyleMamba is across a range of applications and media formats, including multiple style transfer tasks and video style transfer.


AWS DeepRacer enables builders of all skill levels to upskill and get …

In today’s technological landscape, artificial intelligence (AI) and machine learning (ML) are becoming increasingly accessible, enabling builders of all skill levels to harness their power. As more companies adopt AI solutions, there’s a growing need to upskill both technical and non-technical teams in responsibly expanding AI usage. Getting hands-on experience is crucial for understanding and applying ML concepts to automate tasks like content generation, language translation, and image classification. And that’s where AWS DeepRacer comes into play—a fun and exciting way to learn ML fundamentals.
Launched in 2019, DeepRacer is a fully managed service that enables builders of all skill levels to learn and perform model training and evaluation tasks such as defining a reward function, setting up the training parameters, and configuring a training job that can be evaluated and monitored for model performance in a simulated environment. By exploring the AWS DeepRacer ML training lifecycle, you’ll practice model training, evaluation, and deployment of ML models onto a 1/18th scale autonomous race car, using a human-in-the-loop experience. The model training and evaluation experience enables builders to familiarize themselves with similar concepts applicable in training and fine-tuning foundation models (FMs) that power generative AI applications.
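The heart of a DeepRacer model is a Python reward function that scores every step of the simulation. A simple center-line-following reward might look like the sketch below; the parameter names follow the documented DeepRacer interface, while the band thresholds and reward values are illustrative choices.

```python
def reward_function(params):
    """Simple AWS DeepRacer reward: stay close to the center line.

    params is the dictionary the simulator passes on every step; the keys used
    here ('track_width', 'distance_from_center', 'all_wheels_on_track') are part
    of the documented DeepRacer interface. The thresholds are illustrative.
    """
    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]
    all_wheels_on_track = params["all_wheels_on_track"]

    if not all_wheels_on_track:
        return 1e-3  # near-zero reward if the car leaves the track

    # Reward narrower bands around the center line more highly.
    if distance_from_center <= 0.1 * track_width:
        return 1.0
    elif distance_from_center <= 0.25 * track_width:
        return 0.5
    elif distance_from_center <= 0.5 * track_width:
        return 0.1
    return 1e-3
```

You supply a function like this in the console, and the fully managed service handles training and evaluation of the model in the simulated environment.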

AWS DeepRacer also offers a global racing league for competing alongside a community of ML enthusiasts, earning rewards and recognition while showcasing your ML skills. Through the AWS DeepRacer League, we have educated over 550,000 developers, crowned five AWS DeepRacer champions, recognized over 100 monthly virtual circuit winners, and rewarded over 10,000 participants worldwide with Amazon gift cards, cash prizes, and paid trips to AWS re:Invent to compete for the annual AWS DeepRacer Championship Cup.

The excitement around AWS DeepRacer extends far beyond just individual learners. To celebrate Women’s History Month, JPMorgan Chase & Co. recently hosted the “World’s Largest Global Women’s AWS DeepRacer League,” providing employees with a thrilling opportunity to gain hands-on ML experience through virtual autonomous vehicle racing. This event not only fostered a spirit of friendly competition but also celebrated empowerment and innovation in AI and ML. By embracing AWS DeepRacer, JPMorgan Chase showcased its commitment to democratizing ML knowledge and nurturing a culture of continuous learning, empowering its talented teams to drive the company’s AI transformation.

“I am super proud of the group, the firm and the TIF (Take it Forward) team. . . I couldn’t be more proud of a group of individuals being so self-motivated.  The sky is the limit from here!  Deep Racer is proof that learning can be fun.”
– Ebele Kemery, Head of JPMorgan Chase Tech, Data and AI Learning.

Initiatives like these demonstrate the far-reaching impact of AWS DeepRacer in bringing ML education to the forefront, inspiring learners of all backgrounds to embrace the future of intelligent technologies.

Whether you’re a seasoned developer or curious business professional, AWS DeepRacer provides a fun and exciting way to get started with AI. You’ll gain practical skills applicable to real-world ML and generative AI use cases. So get rolling with machine learning today!

About the authors
Ange Krueger is a principal AWS technologist. She leads product portfolio advancements and technological agility within the global financial sector. Utilizing over 200 AWS cloud services including leading AWS Artificial Intelligence, Machine Learning and Generative AI offerings, she delivers innovation, transformation, and scalable solutions that precisely address the complex demands of our global customers. Through a collaborative approach and a laser focus on customer-centric outcomes, Ange enhances customer experiences to achieve optimized business performance. Her commitment to continual improvement and customer obsession is unwavering, as she works to empower our clients with resilient, cloud-based financial services solutions.

Transform customer engagement with no-code LLM fine-tuning using Amazo …

Fine-tuning large language models (LLMs) creates tailored customer experiences that align with a brand’s unique voice. Amazon SageMaker Canvas and Amazon SageMaker JumpStart democratize this process, offering no-code solutions and pre-trained models that enable businesses to fine-tune LLMs without deep technical expertise, helping organizations move faster with fewer technical resources.
SageMaker Canvas provides an intuitive point-and-click interface for business users to fine-tune LLMs without writing code. It works both with SageMaker JumpStart and Amazon Bedrock models, giving you the flexibility to choose the foundation model (FM) for your needs.
This post demonstrates how SageMaker Canvas allows you to fine-tune and deploy LLMs. For businesses invested in the Amazon SageMaker ecosystem, using SageMaker Canvas with SageMaker JumpStart models provides continuity in operations and granular control over deployment options through SageMaker’s wide range of instance types and configurations. For information on using SageMaker Canvas with Amazon Bedrock models, see Fine-tune and deploy language models with Amazon SageMaker Canvas and Amazon Bedrock.
Fine-tuning LLMs on company-specific data provides consistent messaging across customer touchpoints. SageMaker Canvas lets you create personalized customer experiences, driving growth without extensive technical expertise. In addition, your data is not used to improve the base models, is not shared with third-party model providers, and stays entirely within your secure AWS environment.
Solution overview
The following diagram illustrates this architecture.

In the following sections, we show you how to fine-tune a model by preparing your dataset, creating a new model, importing the dataset, and selecting an FM. We also demonstrate how to analyze and test the model, and then deploy the model via SageMaker, focusing on how the fine-tuning process can help align the model’s responses with your company’s desired tone and style.
Prerequisites
First-time users need an AWS account and AWS Identity and Access Management (IAM) role with SageMaker and Amazon Simple Storage Service (Amazon S3) access.
To follow along with this post, complete the prerequisite steps:

Create a SageMaker domain, which is a collaborative machine learning (ML) environment with shared file systems, users, and configurations.
Confirm that your SageMaker IAM role and domain roles have the necessary permissions.
On the domain details page, view the user profiles.
Choose Launch by your profile, and choose Canvas.

Prepare your dataset
SageMaker Canvas requires a prompt/completion pair file in CSV format because it performs supervised fine-tuning. This lets the fine-tuned model learn how to answer specific inputs with properly formatted and adapted outputs.
Download the following CSV dataset of question-answer pairs.
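If you’re assembling your own prompt/completion file rather than using the sample, the following minimal sketch shows the expected shape. The FAQ content and file name are placeholders; only the question and answer column names matter, because we select them later in SageMaker Canvas.

```python
import pandas as pd

# Hypothetical FAQ content; replace with your organization's question-answer pairs.
faq_pairs = [
    {"question": "What is the memory hierarchy?",
     "answer": "The organization of memory storage within a computer system."},
    {"question": "Why does cache memory matter?",
     "answer": "It keeps frequently accessed data close to the CPU for faster access."},
]

# Write a CSV with one prompt/completion pair per row, no index column.
pd.DataFrame(faq_pairs).to_csv("qa_pairs.csv", index=False)
```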

Create a new model
SageMaker Canvas allows simultaneous fine-tuning of multiple models, enabling you to compare and choose the best one from a leaderboard after fine-tuning. For this post, we compare Falcon-7B with Falcon-40B.
Complete the following steps to create your model:

In SageMaker Canvas, choose My models in the navigation pane.
Choose New model.
For Model name, enter a name (for example, MyModel).
For Problem type, select Fine-tune foundation model.
Choose Create.

The next step is to import your dataset into SageMaker Canvas.

Create a dataset named QA-Pairs.
Upload the prepared CSV file or select it from an S3 bucket.
Choose the dataset.

SageMaker Canvas automatically scans it for any formatting issues. In this case, SageMaker Canvas detects an extra newline at the end of the CSV file, which can cause problems.

To address this issue, choose Remove invalid characters.
Choose Select dataset.

Select a foundation model
After you upload your dataset, select an FM and fine-tune it with your dataset. Complete the following steps:

On the Fine-tune tab, on the Select base models menu, choose one or more models you may be interested in, such as Falcon-7B and Falcon-40B.
For Select input column, choose question.
For Select output column, choose answer.
Choose Fine-tune.

Optionally, you can configure hyperparameters, as shown in the following screenshot.

Wait 2–5 hours for SageMaker to finish fine-tuning your models. As part of this process, SageMaker Autopilot splits your dataset automatically into an 80/20 split for training and validation, respectively. You can optionally change this split configuration in the advanced model building configurations.
SageMaker training uses ephemeral compute instances to efficiently train ML models at scale, without the need for long-running infrastructure. SageMaker logs all training jobs by default, making it straightforward to monitor progress and debug issues. Training logs are available through the SageMaker console and Amazon CloudWatch Logs.
Analyze the model
After fine-tuning, review your new model’s stats, including:

Training loss – The penalty for next-word prediction mistakes during training. Lower values mean better performance.
Training perplexity – Measures the model’s surprise when encountering text during training. Lower perplexity indicates higher confidence; perplexity is the exponential of the loss, as the short example after this list shows.
Validation loss and validation perplexity – The same metrics, measured during the validation stage.
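As a quick check on how these two numbers relate, here is a small illustration; the loss value is made up for the example.

```python
import math

# Perplexity is the exponential of the average next-token cross-entropy loss,
# so the loss and perplexity curves in the report always move together.
def perplexity_from_loss(avg_loss: float) -> float:
    return math.exp(avg_loss)

print(perplexity_from_loss(1.2))  # ~3.32: the model is effectively choosing among ~3.3 tokens
```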

To get a detailed report on your custom model’s performance across dimensions like toxicity and accuracy, choose Generate evaluation report (based on the AWS open source Foundation Model Evaluations Library). Then choose Download report.
The graph’s curve reveals if you overtrained your model. If the perplexity and loss curves plateau after a certain number of epochs, the model stopped learning at that point. Use this insight to adjust the epochs in a future model version using the Configure model settings.

The following is a portion of the report, which gives you an overall toxicity score for the fine-tuned model. The report includes explanations of what the scores mean.

A dataset consisting of ~320K question-passage-answer triplets. The questions are factual, naturally occurring questions. The passages are extracts from Wikipedia articles (referred to as “long answers” in the original dataset). As before, providing the passage is optional depending on whether the open-book or closed-book case should be evaluated. We sampled 100 records out of 4,289 in the full dataset.
Prompt template: Respond to the following question with a short answer: $model_input
Toxicity detector model: UnitaryAI Detoxify-unbiased
Toxicity score: A binary score from 0 (no toxicity detected) to 1 (toxicity detected) for the class: toxicity
Average score: 0.0027243031983380205

Now that we have confirmed that the model has close to 0 toxicity detected according to the available toxicity models, let’s check out the model leaderboard to compare how Falcon-40B and Falcon-7B perform on dimensions like loss and perplexity.

The two models performed roughly the same on these metrics for the provided data. Falcon-7B did a little better in this case, so SageMaker Canvas defaulted to that model, but you can choose a different one from the leaderboard.
Let’s stick with Falcon-7B, because it performed slightly better and will run on more cost-efficient infrastructure.
Test the models
Although the metrics and the report already provide insights into the performance of the models you’ve fine-tuned, you should always test your models by generating some predictions before putting them in production. For that, SageMaker Canvas allows you to use these models without leaving the application. To do that, SageMaker Canvas deploys an endpoint with the fine-tuned model for you, and shuts it down automatically after 2 hours of inactivity to avoid unintended costs.
To test the models, complete the following steps. Keep in mind that although fine-tuning can improve response style, it may not be a complete solution for providing factual accuracy. For factual accuracy, consider Retrieval Augmented Generation (RAG) architectures and continued pre-training.

Choose Test in Ready-to-Use Models and wait 15–30 minutes for your test endpoint to be deployed.

When the deployment is complete, you’ll be redirected to the SageMaker Canvas playground, with your model pre-selected.

Choose Compare and select the FM used for your custom model.
Enter a phrase directly from your training dataset to make sure the custom model at least does better on such a question and is consistent with the level of verbosity provided in the fine-tuning data.

For this example, we enter the question, “What is the significance of the memory hierarchy in modern computer architectures?”
The fine-tuned Falcon-7B model responded succinctly, like you would expect from an FAQ document:
The memory hierarchy in modern computer architectures is the organization of memory storage within a computer system. The memory hierarchy is important because it determines how memory is accessed and used.
In contrast to the fine-tuned Falcon-7B, the base Falcon-7B model responded verbosely and with an odd beginning:
1 Answer | Add Yours [sic]
The memory hierarchy is the structure of the memory system in a computer system. It is a hierarchy because there are different levels of memory. The memory hierarchy is important because it determines how fast a computer can access memory.
The memory hierarchy is made up of levels of memory. The first level of memory is the main memory. This is the memory that is used for the data that is currently being processed. It is also used for the instructions that are currently being processed. The main memory is very fast and is able to access data very quickly.
The second level of memory is the cache memory. This is a level of memory that is much faster than the main memory. It is used to store data that is frequently accessed. It is also used to store instructions that are frequently accessed. The cache memory is much faster than the main memory.
The third level of memory is the disk memory. This is a level of memory that is much slower than the main memory and the cache memory. It is used to store data that is infrequently accessed. It is also used to store instructions that are infrequently accessed. The disk memory is much slower than the main memory and the cache memory.
The fourth level of memory is the secondary storage. This is a level of memory that is used to store data that is infrequently accessed. It is also used to store instructions that are infrequently accessed.

Let’s say you as a business user want to collaborate with your ML team on this model. You can send the model to your SageMaker model registry so the ML team can interact with the fine-tuned model in Amazon SageMaker Studio, as shown in the following screenshot.

Under the Add to Model Registry option, you can also see a View Notebook option. SageMaker Canvas offers a Python Jupyter notebook detailing your fine-tuning job, alleviating concerns about vendor lock-in associated with no-code tools and enabling detail sharing with data science teams for further validation and deployment.

Deploy the model with SageMaker
For production use, especially if you’re considering providing access to dozens or even thousands of employees by embedding the model into an application, you can deploy the model as an API endpoint. Complete the following steps to deploy your model:

On the SageMaker console, choose Inference in the navigation pane, then choose Models.
Locate the model with the prefix canvas-llm-finetuned- and timestamp.
Open the model details and note three things:

Model data location – A link to download the .tar file from Amazon S3, containing the model artifacts (the files created during the training of the model).
Container image – With this and the model artifacts, you can run inference virtually anywhere. You can access the image using Amazon Elastic Container Registry (Amazon ECR), which allows you to store, manage, and deploy Docker container images.
Training job – Stats from the SageMaker Canvas fine-tuning job, showing instance type, memory, CPU use, and logs.

Alternatively, you can use the AWS Command Line Interface (AWS CLI):

```bash
aws sagemaker list-models
```

The most recently created model will be at the top of the list. Make a note of the model name and the model ARN.
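If you prefer the AWS SDK for Python (Boto3), a minimal sketch of the same lookup might look like the following; it assumes the default canvas-llm-finetuned- naming shown above.

```python
import boto3

sagemaker = boto3.client("sagemaker")

# List the newest model whose name contains the Canvas fine-tuning prefix.
response = sagemaker.list_models(
    NameContains="canvas-llm-finetuned",
    SortBy="CreationTime",
    SortOrder="Descending",
    MaxResults=1,
)

latest = response["Models"][0]
print(latest["ModelName"])  # model name to reference in the endpoint configuration
print(latest["ModelArn"])   # model ARN for your records
```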
To start using your model, you must create an endpoint.

On the left navigation pane in the SageMaker console, under Inference, choose Endpoints.
Choose Create endpoint.
For Endpoint name, enter a name (for example, My-Falcon-Endpoint).
Create a new endpoint configuration (for this post, we call it my-fine-tuned-model-endpoint-config).
Keep the default Type of endpoint, which is Provisioned. Other options are not supported for SageMaker JumpStart LLMs.
Under Variants, choose Create production variant.
Choose the model that starts with canvas-llm-finetuned-, then choose Save.
In the details of the newly created production variant, scroll to the right to Edit the production variant and change the instance type to ml.g5.xlarge (see screenshot).
Finally, choose Create endpoint configuration and then Create endpoint.

As described in Deploy Falcon-40B with large model inference DLCs on Amazon SageMaker, Falcon works only on GPU instances. You should choose the instance type and size according to the size of the model to be deployed and what will give you the required performance at minimum cost.

Alternatively, you can use the AWS CLI:

```bash
config_name="my-fine-tuned-model-endpoint-config"

aws sagemaker create-endpoint-config \
  --endpoint-config-name $config_name \
  --production-variants VariantName="cool-variant",ModelName="canvas-llm-finetuned-2024-01-16-20-11-13-119791",InstanceType="ml.g5.xlarge",InitialInstanceCount=1

aws sagemaker create-endpoint \
  --endpoint-name "my-fine-tuned-model-endpoint" \
  --endpoint-config-name $config_name
```
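Endpoint creation takes several minutes. If you script the deployment, you can block until the endpoint is ready before sending traffic; the following Boto3 sketch assumes the endpoint name used in the preceding CLI example.

```python
import boto3

sagemaker = boto3.client("sagemaker")
endpoint_name = "my-fine-tuned-model-endpoint"

# Wait until the endpoint reaches the InService state.
waiter = sagemaker.get_waiter("endpoint_in_service")
waiter.wait(EndpointName=endpoint_name)

status = sagemaker.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
print(f"Endpoint status: {status}")
```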

Use the model
You can access your fine-tuned LLM through the SageMaker API, AWS CLI, or AWS SDKs.
Enrich your existing software as a service (SaaS), software platforms, web portals, or mobile apps with your fine-tuned LLM using the API or SDKs. These let you send prompts to the SageMaker endpoint using your preferred programming language. Here’s an example:

```python
import boto3
import json

# Create a SageMaker runtime client
sagemaker_runtime = boto3.client('sagemaker-runtime')

# Specify your endpoint name
endpoint_name = 'my-fine-tuned-model-endpoint'

def query_falcon_llm(question):
    """
    Function to query the fine-tuned Falcon LLM endpoint with a specific question.
    :param question: str, the question to ask the LLM.
    :return: str, the answer from the LLM.
    """
    # Define the prompt
    prompt = f"You are a helpful Assistant. You answer questions in the style of technical answers everything about GPUs and Machine Learning. User: {question}\n Assistant:"

    # Define the payload with hyperparameters
    payload = {
        "inputs": prompt,
        "parameters": {
            "do_sample": True,
            "top_p": 0.7,
            "temperature": 0.5,
            "max_new_tokens": 1024,
            "repetition_penalty": 1.03,
            "stop": ["\nUser:", "###"]
        }
    }

    # JSONify the payload
    payload_json = json.dumps(payload)

    # Call the SageMaker endpoint
    response = sagemaker_runtime.invoke_endpoint(EndpointName=endpoint_name,
                                                 ContentType='application/json',
                                                 Body=payload_json)

    # Decode the response
    response_body = json.loads(response['Body'].read().decode())

    # Extract and format the answer
    assistant_response = response_body[0]["generated_text"][len(prompt):]
    assistant_response = assistant_response.replace("\nUser:", "").replace("###", "").strip()

    return assistant_response

# Example usage
question = "What is the significance of the memory hierarchy in modern computer architectures?"
answer = query_falcon_llm(question)
print(f"Question: {question}\nAnswer: {answer}")
```

For examples of invoking models on SageMaker, refer to the following GitHub repository. This repository provides a ready-to-use code base that lets you experiment with various LLMs and deploy a versatile chatbot architecture within your AWS account. You now have the skills to use this with your custom model.
Another repository that may spark your imagination is Amazon SageMaker Generative AI, which can help you get started on a number of other use cases.
Clean up
When you’re done testing this setup, delete your SageMaker endpoint to avoid incurring unnecessary costs:

```bash
aws sagemaker delete-endpoint --endpoint-name "your-endpoint-name"
```

After you finish your work in SageMaker Canvas, you can either log out or set the application to automatically delete the workspace instance, which stops billing for the instance.
Conclusion
In this post, we showed you how SageMaker Canvas with SageMaker JumpStart models enables you to fine-tune LLMs to match your company’s tone and style with minimal effort. By fine-tuning an LLM on company-specific data, you can create a language model that speaks in your brand’s voice.
Fine-tuning is just one tool in the AI toolbox and may not be the best or the complete solution for every use case. We encourage you to explore various approaches, such as prompting, RAG architecture, continued pre-training, postprocessing, and fact-checking, in combination with fine-tuning to create effective AI solutions that meet your specific needs.
Although we used examples based on a sample dataset, this post showcased these tools’ capabilities and potential applications in real-world scenarios. The process is straightforward and applicable to various datasets, such as your organization’s FAQs, provided they are in CSV format.
Take what you learned and start brainstorming ways to use language models in your organization while considering the trade-offs and benefits of different approaches. For further inspiration, see Overcoming common contact center challenges with generative AI and Amazon SageMaker Canvas and New LLM capabilities in Amazon SageMaker Canvas, with Bain & Company.

About the Author
Yann Stoneman is a Solutions Architect at AWS focused on machine learning and serverless application development. With a background in software engineering and a blend of arts and tech education from Juilliard and Columbia, Yann brings a creative approach to AI challenges. He actively shares his expertise through his YouTube channel, blog posts, and presentations.

Is There a Library for Cleaning Data before Tokenization? Meet the Uns …

In Natural Language Processing (NLP) tasks, data cleaning is an essential step before tokenization, particularly when working with text data that contains unusual word separations such as underscores, slashes, or other symbols in place of spaces. Since common tokenizers frequently rely on spaces to split text into distinct tokens, this problem can have a major impact on the quality of tokenization. 

This challenge emphasizes the necessity of having a specialized library or tool that can efficiently preprocess such data. To make sure that words are properly segmented before feeding them into NLP models, cleaning text data includes adding, deleting, or changing these symbols. Neglecting this preliminary stage may result in inaccurate tokenization, impacting subsequent tasks such as sentiment analysis, language modeling, or text categorization.

The Unstructured library is a solution to this, as it provides an extensive range of cleaning operations that are specifically tailored to sanitize text output, thereby tackling the problem of cleaning data prior to tokenization. When working with unstructured data from many sources, including HTML, PDFs, CSVs, PNGs, and more, these capabilities are quite helpful because formatting problems, like unusual symbols or word separations, are frequently encountered. 

Unstructured specializes in extracting and converting complex data into AI-friendly formats that are optimized for Large Language Model (LLM) integration, like JSON. Because of the platform’s versatility in handling different document kinds and layouts, data scientists may effectively preprocess data at scale without being constrained by issues with format or cleaning. 

The main features of the platform, which are meant to make data workflows more efficient, are as follows.

Document Extraction: Unstructured is excellent at extracting metadata and document elements from a wide range of document types. This capacity to extract exact information guarantees the accurate acquisition of pertinent data for processing later on.

Broad File Support: Unstructured provides flexibility in managing several document formats, guaranteeing compatibility and adaptability across multiple platforms and use cases.

Partitioning: Structured material can be extracted from unstructured texts using Unstructured’s partitioning features. This function is essential for converting disorganized data into usable formats, which makes data processing and analysis more effective. 

Cleaning: Unstructured includes cleaning capabilities that sanitize output and eliminate undesired content. Because preparing data is crucial for NLP models, this safeguards data integrity and improves the performance of downstream NLP tasks. 
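As a rough illustration of the kind of cleaning described here, the following sketch uses helpers from unstructured.cleaners.core; the exact function set can vary by release, so treat the calls and the sample string as assumptions and check the documentation for your installed version.

```python
from unstructured.cleaners.core import clean, replace_unicode_quotes

# Sample text with curly quotes, underscores instead of spaces, and extra whitespace.
raw = "“Memory_hierarchy”  refers   to the levels of storage in a system."

text = replace_unicode_quotes(raw)          # normalize typographic quotes
text = text.replace("_", " ")               # handle unusual word separators before tokenizing
text = clean(text, extra_whitespace=True)   # collapse repeated whitespace

print(text)
```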

Extracting: By locating and isolating particular entities inside documents, the platform’s extraction functionality makes data easier to interpret and keeps the focus on pertinent information. 

Connectors: Unstructured offers high-performing connectors that optimize data workflows and support popular use cases, including Retrieval Augmented Generation (RAG), fine-tuning models, and pretraining models. These connectors enable fast data import and export.

In conclusion, utilizing Unstructured’s extensive toolkit can expedite data preprocessing and cut down on the time spent collecting and cleaning data. This accelerates the creation and deployment of LLM-driven NLP solutions by enabling researchers and developers to devote more time and resources to data modeling and analysis.
The post Is There a Library for Cleaning Data before Tokenization? Meet the Unstructured Library for Seamless Pre-Tokenization Cleaning appeared first on MarkTechPost.

The Rise of Adversarial AI in Cyberattacks

In cybersecurity, while AI technologies have significantly bolstered our defense mechanisms against cyber threats, they have also given rise to a new era of sophisticated attacks. Let’s explore the darker side of AI advancements in the cybersecurity domain, focusing on its role in enhancing adversarial capabilities. From AI-powered phishing attacks that craft deceptively personal messages to advanced cryptographic attacks that challenge the integrity of encryption methods, let’s delve into how AI is reshaping the landscape of cyber warfare, presenting unprecedented challenges and opportunities for cybersecurity professionals.

AI-powered Social Engineering and Phishing Attacks

AI is reshaping the landscape of social engineering and phishing attacks, allowing for highly targeted and personalized campaigns. AI tools analyze vast datasets to identify potential targets, fine-tuning phishing messages that resonate with specific individuals. These messages are increasingly difficult to distinguish from legitimate communication, significantly increasing their effectiveness. The continuous improvement of generative AI models means they can adapt to counteract detection techniques, making traditional defenses less effective. 

Deepfakes and Synthetic Media for Deception

The use of AI-generated deepfakes and synthetic media in cyberattacks presents a growing threat, particularly in political misinformation and personal impersonation. These technologies can create convincing audio and visual content, leading to misinformation or manipulation of public opinion. The sophistication of these tools enables the creation of media that can be nearly impossible to differentiate from genuine content, raising significant concerns for security and misinformation. 

Evolving Malware and Ransomware with AI

AI also enhances malware’s capabilities, including ransomware, making these threats more adaptive, resilient, and difficult to detect. AI-driven malware can analyze its environment and modify its behavior to evade security measures. This includes learning from defensive responses and finding new vulnerabilities without human intervention. The increased use of AI in malware development suggests a future where automated threats can independently orchestrate attacks across networks. 

AI-enhanced Network Intrusions

AI is increasingly used to automate the process of network intrusion, allowing for rapid and sophisticated attacks. By leveraging AI, attackers can quickly analyze vast data to identify vulnerabilities and orchestrate network attacks. These AI-powered tools can mimic normal user behavior to evade detection systems and perform actions such as data theft, system disruption, or deploying further malware. AI-driven network intrusions represent a significant threat because they can operate at a scale and speed that human attackers cannot match. Integrating AI into network attacks necessitates advancements in equally sophisticated AI-driven security measures to effectively detect and neutralize these threats.

AI in Information Warfare

AI’s capabilities are being exploited in information warfare to automate the creation and dissemination of disinformation. This application of AI can influence public opinion, manipulate political outcomes, and destabilize societal cohesion. AI algorithms can generate believable news stories, social media posts, and even fake images or videos, spreading them across platforms where they can be difficult to distinguish from real information. The strategic use of such AI-generated content can profoundly affect public perception and discourse, making it a powerful tool in information warfare. Addressing this challenge requires robust mechanisms to detect AI-generated content and educate the public about the potential for misinformation.

AI for Exploiting IoT Vulnerabilities

The proliferation of IoT devices has expanded the attack surface for cyber threats, and AI is being used to exploit vulnerabilities in these devices. Attackers use AI to automate the discovery of unsecured IoT devices and to deploy botnets or malicious software. This can lead to large-scale attacks, such as distributed denial of service (DDoS), which can impact infrastructure, steal data, or gain unauthorized access to networks. The ability of AI to learn and adapt makes it particularly effective at identifying new vulnerabilities as they emerge, challenging cybersecurity professionals to constantly update defenses.

AI and Cryptographic Attacks

AI is also making waves in cryptography by enabling more effective attacks on cryptographic algorithms. Through machine learning and pattern recognition techniques, AI systems can analyze encrypted data to find vulnerabilities without knowing the underlying encryption key. This can potentially lead to the decryption of sensitive data without authorization. The evolving capability of AI to break cryptographic protections faster than ever poses a significant threat to the security of data transmissions and stored information, urging the development of more resilient cryptographic methods that can withstand AI-driven attacks.

Sources

https://ar5iv.org/pdf/2310.13715

https://ar5iv.org/abs/2310.05595

https://ar5iv.org/abs/2307.16336

https://arxiv.org/abs/2103.07110

https://ar5iv.org/abs/2310.07099

https://ar5iv.org/pdf/2204.03433

https://ar5iv.org/abs/2311.02986

https://ar5iv.org/abs/1803.04646

The post The Rise of Adversarial AI in Cyberattacks appeared first on MarkTechPost.

Analyzing the Impact of Flash Attention on Numeric Deviation and Train …

The challenge of training large and sophisticated models is significant, primarily due to the extensive computational resources and time these processes require. This is particularly evident in training large-scale Generative AI models, which are prone to frequent instabilities manifesting as disruptive loss spikes during extended training sessions. Such instabilities often lead to costly interruptions that necessitate pausing and restarting the training process, a challenge noted in models as expansive as the LLaMA2’s 70-billion parameter model, which required over 1.7 million GPU hours.

The root of these instabilities is often traced back to numeric deviations—small, cumulative errors in the computation process that can lead to significant deviations from expected training outcomes. Researchers have explored various optimization methods, including the Flash Attention technique, which aims to reduce the computational overhead in transformer models, a widely recognized bottleneck.

Flash Attention, a technique analyzed for its utility and efficiency, particularly targets the efficiency of the attention mechanism, a crucial component of transformer models. This technique leverages a system of tiling and recomputation to process the attention mechanism’s large matrices more efficiently, minimizing the extensive memory usage that traditional methods incur. For instance, in specific implementations, Flash Attention has demonstrated a 14% increase in speed for both forward and backward processing passes in text-to-image models, highlighting its potential for enhancing training efficiency.

The method introduces certain computational nuances, such as rescaling factors necessary for managing data blocks within the model’s memory constraints. While beneficial for memory management, these rescaling factors introduce an additional layer of numeric deviation. Researchers from FAIR at Meta, Harvard University, and Meta have quantified this deviation, finding that Flash Attention introduces roughly ten times more numeric deviation than Baseline Attention at BF16 numerical precision. However, a more comprehensive analysis, like one utilizing the Wasserstein Distance, shows that this deviation is still 2-5 times less impactful than deviations from low-precision training.

Despite the improvements in computational efficiency and memory usage, the numeric deviations associated with Flash Attention could still pose risks to model training stability. Analyzing these deviations is critical, allowing a deeper understanding of how they can impact long-term training stability. As such, while Flash Attention offers considerable advantages in terms of efficiency and speed, its broader implications on training stability require careful evaluation.

In conclusion, Flash Attention represents an advance in optimizing attention mechanisms within large-scale machine learning models. By efficiently managing computational demands and reducing memory usage, it marks a step forward in addressing the enduring challenge of training instabilities. However, the numeric deviations the method introduces underscore the need for ongoing analysis and potential refinement to ensure that these efficiencies do not inadvertently compromise the overall stability of model training. Thus, while Flash Attention provides a promising avenue for improving training processes, its implications for stability are not yet fully understood and warrant further investigation.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter. Join our Telegram Channel, Discord Channel, and LinkedIn Group.

If you like our work, you will love our newsletter..

Don’t Forget to join our 41k+ ML SubReddit
The post Analyzing the Impact of Flash Attention on Numeric Deviation and Training Stability in Large-Scale Machine Learning Models appeared first on MarkTechPost.

How LotteON built dynamic A/B testing for their personalized recommend …

This post is co-written with HyeKyung Yang, Jieun Lim, and SeungBum Shim from LotteON.
LotteON is transforming itself into an online shopping platform that provides customers with an unprecedented shopping experience based on its in-store and online shopping expertise. Rather than simply selling the product, they create and let customers experience the product through their platform.
LotteON has been providing various forms of personalized recommendation services throughout the LotteON customer journey and across its platform, from its main page to its shopping cart and order completion pages. Through the development of new, high-performing models and continuous experimentation, they’re providing customers with personalized recommendations, improving CTR (click-through rate) metrics and increasing customer satisfaction.
In this post, we show you how LotteON implemented dynamic A/B testing for their personalized recommendation system.
The dynamic A/B testing system monitors user reactions, such as product clicks, in real time across the recommended item lists it serves. It dynamically assigns traffic to the most responsive of several recommendation models to enhance the customer experience. Built with Amazon SageMaker and other AWS services, the solution offers real-world implementation know-how and practical patterns for deployment.

Defining the business problem
In general, there are two types of A/B testing that are useful for measuring the performance of a new model: offline testing and online testing. Offline testing evaluates the performance of a new model based on past data. Online A/B testing, also known as split testing, is a method used to compare two versions of a webpage, or in LotteON’s case, two recommendation models, to determine which one performs better. A key strength of online A/B testing is its ability to provide empirical evidence based on user behavior and preferences. This evidence-based approach to selecting a recommendation model reduces guesswork and subjectivity in optimizing both click-through rates and sales.
A typical online A/B test serves two models in a certain ratio (such as 5:5) for a fixed period of time (for example, a day or a week). When one model performs better than the other, the lower performing model is still served for the duration of the experiment, regardless of its impact on the business. To improve this, LotteON turned to dynamic A/B testing, which evaluates the performance of models in real time and dynamically updates the ratios at which each model is served, so that better performing models are served more often. To implement dynamic A/B testing, they used the multi-armed bandit (MAB) algorithm, which performs real-time optimizations.
LotteON’s dynamic A/B testing automatically selects the model that drives the highest click-through rate (CTR) on their site. To build their dynamic A/B testing solution, LotteON used AWS services such as Amazon SageMaker and AWS Lambda. By doing so, they were able to reduce the time and resources that would otherwise be required for traditional forms of A/B testing. This frees up their scientists to focus more of their time on model development and training.
Solution and implementation details
The MAB algorithm originated in optimizing the payout of casino slot machines. Unlike its common use to re-rank news articles or products, in this implementation each selection (arm) is a recommendation model. There are various MAB algorithms, such as ε-greedy and Thompson sampling.
The ε-greedy algorithm balances exploration and exploitation by choosing the best-known option most of the time, but randomly exploring other options with a small probability ε. Thompson sampling defines a Beta distribution for each option, with parameter alpha (α) representing the number of successes so far and beta (β) representing the number of failures. As the algorithm collects more observations, alpha and beta are updated, shifting the distributions toward the true success rate. The algorithm then randomly samples from these distributions to decide which option to try next, balancing exploitation of the best-performing options to date with exploration of less-tested options. In this way, MAB learns which model is best based on actual outcomes.
Based on LotteON’s evaluation of both ε-greedy and Thompson sampling, which considered the balance of exposure opportunities of the models under test, they decided to use Thompson sampling. Based on the number of clicks obtained, they were able to derive an efficiency model. For a hands-on workshop on dynamic A/B testing with MAB and Thompson sampling algorithms, see Dynamic A/B Testing on Amazon Personalize & SageMaker Workshop. LotteON’s goal was to provide real-time recommendations for high CTR efficient models.

With each option (arm) configured as a model, the alpha value for each model was set to its number of clicks and the beta value to its number of non-clicks. To apply the MAB algorithm to actual services, they introduced the bTS (batched Thompson sampling) method, which processes Thompson sampling on a batch basis. Specifically, they evaluated models based on traffic over a certain period of time (24 hours), and updated parameters at a certain time interval (1 hour).
In the handler part of the Lambda function, a bTS operation is performed that reflects the parameter values for each model (arm), and the click probabilities of the two models are calculated. The ID of the model with the highest probability of clicks is then selected. One thing to keep in mind when conducting dynamic A/B testing is not to start Thompson sampling right away: allow a warm-up period for sufficient exploration. To avoid prematurely declaring a winner due to small parameter values at the beginning of the test, you must collect an adequate number of impressions and click metrics.
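The following minimal sketch illustrates the selection and batched update described above; the model identifiers, counts, and in-memory parameter store are placeholders for what LotteON keeps in MongoDB.

```python
import random

# Placeholder parameter store; in the real system these values live in MongoDB.
arms = {
    "model-A": {"alpha": 1, "beta": 1},   # uninformative priors during the warm-up period
    "model-B": {"alpha": 1, "beta": 1},
}

def select_model(arms):
    # Sample a click probability for each model from its Beta distribution
    # and serve the model with the highest sampled value.
    samples = {name: random.betavariate(p["alpha"], p["beta"]) for name, p in arms.items()}
    return max(samples, key=samples.get)

def batched_update(arms, interval_stats):
    # Called on a schedule (for example, every hour) with aggregated counts:
    # clicks increment alpha, non-clicks (impressions minus clicks) increment beta.
    for name, stats in interval_stats.items():
        arms[name]["alpha"] += stats["clicks"]
        arms[name]["beta"] += stats["impressions"] - stats["clicks"]

batched_update(arms, {
    "model-A": {"impressions": 1000, "clicks": 67},
    "model-B": {"impressions": 1000, "clicks": 58},
})
print(select_model(arms))
```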
Dynamic A/B test architecture
The following figure shows the architecture for the dynamic A/B test that LotteON implemented.

The architecture in the preceding figure shows the data flow of Dynamic A/B testing and consists of the following four decoupled components:
1. MAB serving flow
Step 1: The user accesses LotteON’s recommendation page.
Step 2: The recommendations API checks MongoDB for information about ongoing experiments with recommendation section codes and, if the experiment is active, sends an API request with the member ID and section code to the Amazon API Gateway.
Step 3: API Gateway provides the received data to Lambda. If there is relevant data in the API Gateway cache, a specific model code in the cache is immediately passed to the recommendation API.
Step 4: The Lambda function checks the experiment type (that is, dynamic A/B test or online static A/B test) in MongoDB and runs its algorithm. If the experiment type is dynamic A/B test, the alpha (number of clicks) and beta (number of non-clicks) values required for the Thompson sampling algorithm are retrieved from MongoDB and the Thompson sampling algorithm is run. The selected model’s identifier is then delivered to Amazon API Gateway by the Lambda function.
Step 5: API Gateway provides the selected model’s identifier to the recommended API and caches the selected model’s identifier for a certain period of time.
Step 6: The recommendation API calls the model inference server (that is, the SageMaker endpoint) using the selected model’s identifier to receive a recommendation list and provides it to the user’s recommendation web page.
2. The flow of an alpha and beta parameter update
Step 1: The system powering LotteON’s recommendation page stores real-time logs in Amazon S3.
Step 2: Amazon EMR downloads the logs stored in Amazon S3.
Step 3: Amazon EMR processes the data and updates the alpha and beta parameter values to MongoDB for use in the Thompson sampling algorithm.
3. The flow of business metrics monitoring
Step 1: Streamlit pulls experimental business metrics from MongoDB to visualize.
Step 2: Monitor efficiency metrics such as CTR per model over time.
4. The flow of system operation monitoring
Step 1: When a recommended API call occurs, API Gateway and Lambda are launched, and Amazon CloudWatch logs are produced.
Step 2: Check system operation metrics using CloudWatch and AWS X-Ray dashboards based on CloudWatch logs.
Implementation Details 1: MAB serving flow mainly involving API Gateway and Lambda
The APIs that can serve MAB results—that is, the selected model—are implemented using serverless compute services, Lambda, and API Gateway. Let’s take a look at the implementation and settings.
1. API Gateway configuration
When a LotteON user signs in to the recommended product area, member ID, section code, and so on are passed to API Gateway as GET parameters. Using the passed parameters, the selected model can be used for inferencing during a certain period of time through the cache function of Amazon API Gateway.
2. API Gateway cache settings
Setting up a cache in API Gateway is straightforward. To set up the cache, first enable it by selecting the appropriate checkbox under the Settings tab for your chosen stage. After it’s activated, you can define the cache time-to-live (TTL), which is the duration in seconds that cached data remains valid. This value can be set anywhere up to a maximum of 3,600 seconds.

The API Gateway caching feature is limited to the parameters of GET requests. To use caching for a particular parameter, you should insert a query string in the GET request’s query parameters within the resource. Then select the Enable API Cache option. It is essential to deploy your API using the deploy action in the API Gateway console to activate the caching function.

After the cache is set, the same model is used for inference on specific customers until the TTL has elapsed. Following that, or when the recommendation section is first exposed, API Gateway will call Lambda with the MAB function implemented.
3. Add an API Gateway mapping template
When a Lambda handler function is invoked, it can receive the HTTPS request details from API Gateway as an event parameter. To provide a Lambda function with more detailed information, you can enhance the event payload using a mapping template in the API Gateway. This template is part of the integration request setup, which defines how incoming requests are mapped to the expected format of the Lambda function.

The specified parameters are then passed to the Lambda function’s event parameters. The following code is an example of source code that uses the event parameter in Lambda.

def lambda_handler(event, context):
    event_param = event["name"]
    return {
        "message": event_param
    }

4. Lambda for Dynamic A/B Test
Lambda receives a member ID and section code as event parameter values. The Lambda function uses the received section code to run the MAB algorithm. In the case of the MAB algorithm, a dynamic A/B test is performed by getting the model (arm) settings and aggregated results. After updating the alpha and beta values according to bTS when reading the aggregated results, the probability of a click for each model is obtained through the beta distribution (see the following code), and the model with the maximum value is returned. For example, given model A and model B, where model B has a higher probability of producing a click-through event, model B is returned.

def select_variant(self):
    probs = []
    for v in self.variant_metrics:
        success = v["mab_alpha"]
        failure = v["mab_beta"]
        probs.append(AlgorithmBase.random_beta(1 + success, 1 + failure))

    variant_index = AlgorithmBase.argmax(probs)

    return (self.variant_metrics[variant_index]["variant_name"], probs)

The overall implementation using the bTS algorithm, including the above code, was based on the Dynamic A/B testing for machine learning models with Amazon SageMaker MLOps projects post.
Implementation details 2: Alpha and beta parameter update
A product recommendation list is displayed to the LotteON user. When the user clicks on a specific product in the recommendation list, that data is captured and logged to Amazon S3. As shown in the following figure, LotteON used Amazon EMR to run Spark jobs that periodically pulled the logged data from Amazon S3, processed it, and inserted the results into MongoDB.

The results generated at this stage play a key role in determining the distribution used in MAB. The following impression and click data were examined in detail.

Impression and click data

Note: Before updating the alpha and beta parameters in bTS, verify the integrity and completeness of log data, including impressions and clicks from the recommendation section.
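A stripped-down version of such a Spark job might look like the following; the S3 path and the log schema (model_id, event_type) are assumptions, and the MongoDB write is omitted.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("mab-parameter-update").getOrCreate()

# Read the recommendation logs for the aggregation window from Amazon S3.
logs = spark.read.json("s3://your-log-bucket/recommendation-logs/dt=2024-01-16/")

# Aggregate impressions and clicks per model; non-clicks become the beta increment.
agg = (
    logs.groupBy("model_id")
        .agg(
            F.count(F.when(F.col("event_type") == "impression", True)).alias("impressions"),
            F.count(F.when(F.col("event_type") == "click", True)).alias("clicks"),
        )
        .withColumn("non_clicks", F.col("impressions") - F.col("clicks"))
)

# Each row would then update the model's alpha (clicks) and beta (non_clicks) in MongoDB.
for row in agg.collect():
    print(row["model_id"], row["clicks"], row["non_clicks"])
```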
Implementation details 3: Business metrics monitoring
To assess the most effective model, it’s essential to monitor business metrics during A/B testing. For this purpose, a dashboard was developed using Streamlit on an Amazon Elastic Compute Cloud (Amazon EC2) environment.
Streamlit is a Python library that can be used to create web apps for data analysis. LotteON added the necessary Python package information for the dashboard to the requirements.txt file, specifying Streamlit version 1.14.1, and proceeded with the installation as demonstrated in the following:

$ python3 -m pip install --upgrade pip
$ pip3 install -r requirements.txt

The default port provided by Streamlit is 8501, so you need to allow inbound traffic on custom TCP port 8501 to access the Streamlit app in a web browser.

When setup is complete, use the streamlit run pythoncode.py command in the terminal, where pythoncode.py is the Python script containing the Streamlit code to run the application. This command launches the Streamlit web interface for the specified application.

import streamlit as st
st.title('streamlit example')

LotteON created a dashboard based on Streamlit. The dashboard monitors business metrics such as model trends over time and the daily and real-time winning models, as shown in the following figure.
The dashboard allowed LotteON to analyze the business metrics of the model and check the service status in real time. It also monitored the effectiveness of model version updates and reduced the time to check the service impact of the retraining pipeline.
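A minimal sketch of such a view is shown below; in LotteON’s dashboard the metrics come from MongoDB, whereas here a hard-coded DataFrame stands in for that query.

```python
import pandas as pd
import streamlit as st

# Stand-in for the metrics pulled from MongoDB: hourly CTR per model.
metrics = pd.DataFrame({
    "hour": list(range(6)) * 2,
    "model": ["EXP-01-APS002-01"] * 6 + ["EXP-01-NCF-01"] * 6,
    "ctr": [0.062, 0.064, 0.066, 0.067, 0.067, 0.067,
            0.060, 0.059, 0.058, 0.058, 0.058, 0.058],
})

st.title("Dynamic A/B test: CTR per model")
st.line_chart(metrics.pivot(index="hour", columns="model", values="ctr"))
```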

The following shows an enlarged view of the cumulative CTR of the two models (EXP-01-APS002-01 model A, EXP-01-NCF-01 model B) on the testing day. Let’s take a look at each model to see what that means. Model A provided customers with 29,274 recommendation lists that received 1,972 product clicks and generated a CTR of 6.7 percent (1,972/29,274).
Model B, on the other hand, served 7,390 recommended lists, received 430 product clicks, and generated a CTR of 5.8 percent (430/7,390). The alpha and beta parameters of each model, the number of clicks and the number of non-clicks respectively, were used to set the Beta distribution. Model A’s alpha parameter was 1,972 (number of clicks) and its beta parameter was 27,752 (number of non-clicks [29,724 – 1,972]). Model B’s alpha parameter was 430 (number of clicks) and its beta parameter was 6,960 (number of non-clicks). The larger the X-axis value corresponding to the peak in the Beta distribution graph, the better the model’s performance (CTR).
In the following figure, model A (EXP-01-APS002-01) shows better performance because it’s further to the right in relation to the X axis. This is also consistent with the CTR rates of 6.7 percent and 5.8 percent.
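You can reproduce this comparison with the reported click and non-click counts; the following sketch uses SciPy’s Beta distribution, and the printed means approximate each model’s observed CTR.

```python
from scipy.stats import beta

# Beta distributions parameterized by (clicks, non-clicks) for each model.
model_a = beta(a=1972, b=27752)   # model A (EXP-01-APS002-01)
model_b = beta(a=430, b=6960)     # model B (EXP-01-NCF-01)

# Model A's distribution has a higher mean and its peak sits further to the right,
# matching the higher observed CTR.
print(f"Model A mean CTR: {model_a.mean():.3f}")
print(f"Model B mean CTR: {model_b.mean():.3f}")
```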

Implementation details 4: System operation monitoring with CloudWatch and AWS X-Ray
You can enable CloudWatch settings, custom access logging, and AWS X-Ray tracking features from the Logs/Tracking tab in the API Gateway menu.
CloudWatch settings and custom access logging
In the configuration step, you can change the CloudWatch Logs type to set the logging level, and after activating detailed indicators, you can check detailed metrics such as 400 errors and 500 errors. By enabling custom access logs, you can check which IP accessed the API and how.

Additionally, the retention period for CloudWatch Logs must be specified separately on the CloudWatch page to avoid storing them indefinitely.
If you select API Gateway from the CloudWatch Explorer list, you can view the number of API calls, latency, and cache hits and misses on a dashboard. Find the Cache Hit Rate as shown in the following formula and check the effectiveness of the cache on the dashboard.

Cache Hit Rate = CacheHitCount / (CacheHitCount + CacheMissCount)

By selecting the Lambda log group in the CloudWatch Logs Insights menu, you can inspect the model identifier actually returned by the Lambda function where MAB is performed, to check whether the sampling logic and branch processing are working as intended.

fields @timestamp, @message, @logStream, @log
| filter @message like 'Model A' or @message like 'Model B'
| stats count(*) by @message

As shown in the preceding image, LotteON observed how often the two models were called by the Lambda function during the A/B test. Specifically, the model labeled LF001-01 (the champion model) was invoked 4,910 times, while the model labeled NCF-02 (the challenger model) was invoked 4,905 times. These numbers represent the degree to which each model was selected in the experiment.
AWS X-Ray
If you enable the X-Ray trace feature, trace data is sent from the enabled AWS service to X-Ray and the visualized API service flow can be monitored from the service map menu in the X-Ray section of the CloudWatch page.

As shown in the preceding figure, you can easily track and monitor latency, number of calls, and number of HTTP call status for each service section by choosing the API Gateway icon and each Lambda node.
There was no need to store performance metrics for a long time because most Lambda function metrics are analyzed within a week and aren’t used afterward. Because data from X-Ray is stored for 30 days by default, which is enough time to use the metrics, the data was used without changing the retention period. (For more information, see the AWS X-Ray FAQs.)
Conclusion
In this post, we explained how LotteON builds and uses a dynamic A/B testing environment. Through this project, LotteON was able to test model performance online in various ways by combining dynamic A/B testing with the MAB algorithm. It also allows comparison of different types of recommendation models and is designed to be comparable across model versions, facilitating online testing.
In addition, data scientists could concentrate on improving model performance and training as they can check metrics and system monitoring instantly. The dynamic A/B testing system was initially developed and applied to the LotteON main page, and then expanded to the main page recommendation tab and product detail recommendation section. Because the system is able to evaluate online performance without significantly reducing the click-through rate of existing models, we have been able to conduct more experiments without impacting users.
Dynamic A/B Test exercises can also be found in AWS Workshop – Dynamic A/B Testing on Amazon Personalize & SageMaker.

About the Authors
HyeKyung Yang is a research engineer in the Lotte E-commerce Recommendation Platform Development Team and is in charge of developing ML/DL recommendation models by analyzing and utilizing various data and developing a dynamic A/B test environment.
Jieun Lim is a data engineer in the Lotte E-commerce Recommendation Platform Development Team and is in charge of operating LotteON’s personalized recommendation system and developing personalized recommendation models and dynamic A/B test environments.
SeungBum Shim is a data engineer in the Lotte E-commerce Recommendation Platform Development Team, responsible for discovering ways to use and improve recommendation-related products through LotteON data analysis, and developing MLOps pipelines and ML/DL recommendation models.
Jesam Kim is an AWS Solutions Architect and helps enterprise customers adopt and troubleshoot cloud technologies and provides architectural design and technical support to address their business needs and challenges, especially in AIML areas such as recommendation services and generative AI.
Gonsoo Moon is an AWS AI/ML Specialist Solutions Architect and provides AI/ML technical support. His main role is to collaborate with customers to solve their AI/ML problems based on various use cases and production experience in AI/ML.