This AI Research Unveils LSS Transformer: A Revolutionary AI Approach for Efficient Long Sequence Training in Transformers

A new AI study introduces the Long Short-Sequence Transformer (LSS Transformer), an efficient distributed training method tailored to transformer models with extended sequences. It segments long sequences among GPUs, with each GPU handling partial self-attention computations. The LSS Transformer employs fused communication and a double gradient averaging technique to minimize transmission overhead, yielding substantial speedups and memory reductions that surpass other sequence-parallel methods. Performance evaluation on the Wikipedia enwik8 dataset shows that the LSS Transformer achieves faster training and better memory efficiency on multiple GPUs, outperforming Nvidia's sequence parallelism.
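
To make the idea concrete, the following is a minimal PyTorch sketch of sequence-parallel self-attention, in which each GPU (rank) keeps only a chunk of the sequence, gathers keys and values from the other ranks, and attends its local queries against the full sequence. This illustrates the general principle only, not the authors' implementation; it omits the LSS Transformer's fused communication and double gradient averaging, and all names and shapes are ours:

import math
import os
import torch
import torch.distributed as dist

def sequence_parallel_attention(q, k, v):
    # q, k, v: [local_len, d] chunks of one long sequence held by this rank
    if dist.is_available() and dist.is_initialized() and dist.get_world_size() > 1:
        world = dist.get_world_size()
        k_all = [torch.empty_like(k) for _ in range(world)]
        v_all = [torch.empty_like(v) for _ in range(world)]
        dist.all_gather(k_all, k)  # collect key chunks from every rank
        dist.all_gather(v_all, v)  # collect value chunks from every rank
        k_full = torch.cat(k_all, dim=0)  # [total_len, d]
        v_full = torch.cat(v_all, dim=0)
    else:
        k_full, v_full = k, v  # single-process fallback: ordinary attention
    scores = q @ k_full.T / math.sqrt(q.shape[-1])  # [local_len, total_len]
    return torch.softmax(scores, dim=-1) @ v_full   # this rank's output chunk

if __name__ == "__main__":
    # Under torchrun, initialize the process group so ranks can communicate;
    # with a single process the script still runs as ordinary attention.
    if "WORLD_SIZE" in os.environ and int(os.environ["WORLD_SIZE"]) > 1:
        dist.init_process_group(backend="gloo")
    torch.manual_seed(0)
    q, k, v = (torch.randn(128, 64) for _ in range(3))  # in practice each rank holds a different chunk
    print(sequence_parallel_attention(q, k, v).shape)   # torch.Size([128, 64])

In real training, each rank would hold a distinct slice of the sequence and gradients would be averaged across ranks, which the LSS Transformer refines with its double gradient averaging scheme to keep communication low.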

The transformer, known for its self-attention mechanism, is a powerful neural network architecture used in natural language and image processing. Training transformers with longer sequences enhances contextual information grasp and prediction accuracy but increases memory and computational demands. Various approaches have been explored to address this challenge, including hierarchical training, attention approximation, and distributed sequence parallelism. 

The LSS Transformer outperformed state-of-the-art sequence parallelism on 144 Nvidia V100 GPUs, achieving 5.6 times faster training and 10.2 times better memory efficiency on the Wikipedia enwik8 dataset. It demonstrated remarkable scalability, handling an extreme sequence length of 50,112 on 3,456 GPUs while attaining 161% super-linear parallel efficiency and a throughput of 32 petaflops. In weak scaling experiments, the LSS Transformer exhibited superior scalability and reduced communication compared to other sequence-parallel methods. In a large model experiment involving 108 GPUs, it maintained a high scaling efficiency of 92% and a smaller memory footprint than baseline parallelism. The LSS Transformer also reached a computation throughput of 8 petaflops at 144 nodes for a sequence length of 50,112, surpassing baseline sequence parallelism in speed and scalability.

The LSS Transformer presents a groundbreaking solution to the challenge of training transformer models on lengthy sequences, delivering remarkable speed enhancements and memory efficiency while minimizing communication overhead. This distributed training method segments sequences across GPUs, utilizing fused communication and double gradient averaging. The LSS Transformer’s ability to facilitate ultra-long sequence training makes it a valuable asset for applications requiring extensive token dependencies, such as DNA sequence analysis, lengthy document summarization, and image processing.

The study has some limitations. First, its comparison with existing long-sequence training methods is limited, focusing only on Nvidia sequence parallelism. Second, it lacks an in-depth examination of the trade-offs between accuracy and efficiency achieved by the LSS Transformer. Third, it does not address potential real-world implementation challenges. Fourth, it does not explore the influence of varying hyperparameters or architectural modifications on the LSS Transformer's performance. Lastly, there is no comprehensive comparison with approximation-based approaches for reducing computation and memory usage.

Future research directions for the LSS Transformer include:

Evaluating its performance and scalability across diverse datasets and tasks.

Extending its applicability to various transformer models, such as encoder-only or decoder-only architectures.

Optimizing for larger sequence lengths and more GPUs to enhance ultra-long sequence training.

Refining techniques for handling intertoken dependencies in an efficient and parallelized manner.

Integrating the LSS Transformer into established deep learning frameworks to improve accessibility for researchers and practitioners.

These efforts can broaden its utility and adoption in the field.


Researchers from China Introduce CogVLM: A Powerful Open-Source Visual Language Foundation Model

Visual language models are powerful and flexible. Next-token prediction can be used to formulate a variety of vision and cross-modality tasks, such as image captioning, visual question answering, visual grounding, and even segmentation. As VLMs are scaled up, useful abilities such as in-context learning emerge alongside improvements on downstream tasks. Training a VLM from scratch with the same NLP performance as well-trained pure language models like LLaMA2 is difficult, since training a large language model is already a demanding task. Consequently, it makes sense to train a VLM starting from a readily available pre-trained language model.

Widely used shallow alignment techniques, represented by BLIP-2, connect a frozen pretrained vision encoder and language model through a trainable Q-Former or a linear layer that maps image features into the language model's input embedding space. While this approach converges quickly, it does not perform as well as jointly training the vision and language modules, as in PaLI-X. In chat-style VLMs trained with shallow alignment, such as MiniGPT-4, LLaVA, and VisualGLM, the weak visual comprehension shows up as hallucinations. Is it feasible to enhance a large language model's visual understanding without sacrificing its natural language processing (NLP) capabilities?

CogVLM answers yes. Researchers from Zhipu AI and Tsinghua University introduced CogVLM, a powerful open-source visual language foundation model. They argue that the lack of deep integration between language and visual information is the primary reason for the subpar performance of shallow alignment approaches. The idea comes from comparing two approaches to parameter-efficient finetuning: p-tuning learns a task prefix embedding in the input, whereas LoRA adapts the model weights in each layer with a low-rank matrix, and LoRA works more effectively and stably as a result. Since the image features in shallow alignment techniques behave similarly to the prefix embedding in p-tuning, a similar phenomenon may occur in VLMs.

The following are more specific reasons for the reduced performance of p-tuning and shallow alignment:

1. The frozen weights of the language model are trained on text tokens, so visual features only align well with the text input space in the shallow layers. After multi-layer transformations, the visual features may no longer match the input distribution expected by the weights in the deep layers.

2. During pretraining with shallow alignment, information such as the writing style and caption length of the image captioning task can only be encoded into the visual features, which weakens the consistency between the visual features and the content. One potential remedy, used by Qwen-VL and PaLI, is to adapt the language model through joint image-text training.

However, this unnecessarily impairs NLP ability, which may affect text-centered tasks such as writing image-based poetry or providing background context for pictures. According to PaLM-E, making the language model trainable during VLM pretraining leads to catastrophic forgetting, with an 87.3% drop in NLG performance for the 8B language model. Instead, CogVLM adds a trainable visual expert to the language model: in each layer, the image tokens in the sequence use a separate QKV matrix and a separate MLP layer, while the text tokens use the original weights. The visual expert keeps the FLOPs the same but increases the number of parameters. If there is no image in the input sequence, the model behaves exactly like the original language model, since all of its original parameters are fixed.
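
The following is a hedged, single-head sketch of that visual-expert routing. It is not CogVLM's released code: the module names and shapes are illustrative, and the real model uses multi-head attention, normalization, and a frozen pretrained backbone rather than random weights:

import torch
import torch.nn as nn

class VisualExpertLayer(nn.Module):
    """Single-head simplification of the visual-expert idea (illustrative only)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.d_model = d_model
        # "language model" projections (text path), kept frozen
        self.qkv_text = nn.Linear(d_model, 3 * d_model)
        self.mlp_text = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                      nn.Linear(4 * d_model, d_model))
        for p in self.qkv_text.parameters():
            p.requires_grad = False
        for p in self.mlp_text.parameters():
            p.requires_grad = False
        # trainable visual-expert projections (image path), same shapes
        self.qkv_img = nn.Linear(d_model, 3 * d_model)
        self.mlp_img = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                     nn.Linear(4 * d_model, d_model))

    def forward(self, x, is_image):
        # x: [seq_len, d_model]; is_image: [seq_len] boolean mask for image tokens
        qkv = torch.where(is_image[:, None], self.qkv_img(x), self.qkv_text(x))
        q, k, v = qkv.chunk(3, dim=-1)
        attn = torch.softmax(q @ k.T / self.d_model ** 0.5, dim=-1) @ v
        h = x + attn
        # image tokens go through the expert MLP, text tokens through the frozen MLP
        return h + torch.where(is_image[:, None], self.mlp_img(h), self.mlp_text(h))

x = torch.randn(6, 32)  # 6 tokens, d_model = 32
is_image = torch.tensor([True, True, False, False, False, False])
print(VisualExpertLayer(32)(x, is_image).shape)  # torch.Size([6, 32])

Because only the image-path weights require gradients, a sequence with no image tokens passes through the frozen text path unchanged, which is the behavior the paragraph above describes.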

Their CogVLM-17B, trained from Vicuna-7B, achieves state-of-the-art or second-best performance on 14 typical cross-modal benchmarks, including 1) image captioning datasets (NoCaps, Flickr30k, COCO), 2) VQA datasets (VQAv2, OKVQA, GQA, TextVQA, VizWiz), 3) multiple-choice datasets (TDIUC, ScienceQA), and 4) visual grounding datasets (RefCOCO, RefCOCO+, RefCOCOg, Visual7W). Not included in this study is the CogVLM-28B-zh that they trained from ChatGLM-12B to support both Chinese and English for commercial use. Since most of the best-known VLMs to date, such as Flamingo, SimVLM, CoCa, BEiT-3, GIT2, PaLI, and PaLI-X, are closed-source, CogVLM's open-sourcing is expected to have a significant positive impact on visual understanding research and industrial applications.


Google DeepMind Researchers Propose a Framework for Classifying the Capabilities and Behavior of Artificial General Intelligence (AGI) Models and their Precursors

Recent developments in Artificial Intelligence (AI) and Machine Learning (ML) have turned the discussion of Artificial General Intelligence (AGI) into a matter of immediate practical importance. In computing science, AGI refers to an artificial intelligence system that can perform a broad range of tasks at least as well as humans. As the capabilities of machine learning models advance, there is an increasing need for a formal framework to categorize and comprehend the behavior of AGI models and their precursors.

In recent research, a team from Google DeepMind has proposed a framework called "Levels of AGI", a systematic approach, similar to the levels of autonomous driving, for categorizing the skills and behavior of Artificial General Intelligence models and their predecessors. The framework introduces three important dimensions: autonomy, generality, and performance. It offers a common vocabulary that makes it easier to compare models, evaluate risks, and track advancement toward AGI.

To create this framework, the team analyzed previous definitions of AGI and distilled six principles they considered necessary for a practical AGI ontology. These principles guided the development of the proposed framework, emphasizing capabilities rather than mechanisms, assessing generality and performance separately, and defining stages along the path to AGI rather than focusing only on the end goal.

The researchers have shared that the resulting Levels of AGI framework is constructed around two fundamental dimensions: depth, i.e., the performance of capabilities, and breadth, i.e., their generality. By classifying AGI along these dimensions, the framework helps make sense of the dynamic landscape of artificial intelligence systems and proposes levels that correspond to varying degrees of competence in both performance and generality.

The team acknowledges the difficulties and complexities involved in evaluating how existing AI systems fit within the proposed framework. They also discuss the future benchmarks needed to accurately measure the capabilities and behavior of AGI models against the defined thresholds. This focus on benchmarking is essential for assessing progress, pinpointing areas that need improvement, and guaranteeing an open and quantifiable progression of AI technologies.

The framework also takes deployment concerns into account, specifically risk and autonomy, in addition to technical considerations. Emphasizing the complex relationship between deployment factors and AGI levels, the team stresses how critical it is to choose human-AI interaction paradigms carefully. This emphasis on responsible and safe deployment also highlights the ethical dimension of implementing highly capable AI systems, which calls for a methodical and cautious approach.

In conclusion, the suggested classification scheme for AGI behavior and capabilities is thorough and well-considered. The framework emphasizes the need for responsible and safe integration into human-centric contexts and provides a structured way to evaluate, compare, and direct the development and deployment of AGI systems.


Johannes Kepler University Researchers Introduce GateLoop: Advancing Sequence Modeling with Linear Recurrence and Data-Controlled State Transitions

A researcher from Johannes Kepler University has introduced GateLoop, a novel sequence model that leverages the potential of linear recurrence for efficient long-sequence modeling. It generalizes linear recurrent models and outperforms them in auto-regressive language modeling. GateLoop offers a low-cost recurrent mode and an efficient parallel mode, while introducing a surrogate attention mode that has implications for Transformer architectures. It provides data-controlled relative-positional information to Attention, emphasizing the significance of data-controlled cumulative products for more robust sequence models, beyond the traditional cumulative sums used in existing models.

GateLoop is a versatile sequence model that extends the capabilities of linear recurrent models like S4, S5, LRU, and RetNet by employing data-controlled state transitions. GateLoop excels in auto-regressive language modeling, offering both cost-efficient recurrent and highly efficient parallel modes. It introduces a surrogate attention mode with implications for Transformer architectures. The study discusses key aspects like prefix-cumulative-product pre-computation, operator associativity, and non-data-controlled parameterization. GateLoop is empirically validated with lower perplexity scores on the WikiText-103 dataset. Existing models are shown to underutilize linear recurrence's potential, which GateLoop addresses with data-controlled transitions and complex cumulative products.

Sequences with long-range dependencies pose challenges in machine learning, traditionally tackled with Recurrent Neural Networks (RNNs). However, RNNs suffer from vanishing and exploding gradients, which hinders their stability on lengthy sequences. Gated variants like LSTM and GRU alleviate these problems but are still not efficient enough. Transformers introduced attention mechanisms for global dependencies, eliminating recurrence. Although they enable efficient parallel training and global pairwise dependencies, their quadratic complexity limits their use with long sequences. Linear Recurrent Models (LRMs) offer an alternative, with GateLoop as a foundational sequence model that generalizes LRMs through data-controlled state transitions, excels in auto-regressive language modeling, and provides versatile operational modes.

GateLoop offers an efficient O(l) recurrent mode, an optimized O(l log₂ l) parallel mode, and an O(l²) surrogate attention mode that provides data-controlled relative-positional information to Attention. Experiments on the WikiText-103 benchmark demonstrate GateLoop's autoregressive natural language modeling prowess, and a synthetic task confirms the empirical advantage of data-controlled over non-data-controlled state transitions. Key aspects include prefix-cumulative-product pre-computation and non-data-controlled parameterization to prevent variable blow-up.
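
As a rough illustration of the O(l) recurrent mode, here is a real-valued sketch of a data-controlled linear recurrence. GateLoop's actual formulation uses complex-valued, data-controlled state transitions plus the parallel and surrogate-attention modes described above, so treat the names and shapes below as illustrative assumptions:

import torch

def gateloop_recurrent(q, k, v, a):
    # q, k: [l, d_k]; v: [l, d_v]; a: [l, d_k] data-controlled transition gates in (0, 1)
    # State h is a [d_k, d_v] matrix updated per step:
    #   h_t = diag(a_t) h_{t-1} + k_t v_t^T   (data-controlled cumulative product)
    #   y_t = q_t h_t
    l, d_k = k.shape
    d_v = v.shape[-1]
    h = torch.zeros(d_k, d_v)
    ys = []
    for t in range(l):
        h = a[t].unsqueeze(-1) * h + torch.outer(k[t], v[t])
        ys.append(q[t] @ h)
    return torch.stack(ys)  # [l, d_v]

# toy usage: input-dependent gates let the model "forget" old state
l, d_k, d_v = 16, 8, 8
q, k, v = torch.randn(l, d_k), torch.randn(l, d_k), torch.randn(l, d_v)
a = torch.sigmoid(torch.randn(l, d_k))
print(gateloop_recurrent(q, k, v, a).shape)  # torch.Size([16, 8])

Setting all gates a to 1 would reduce this to a plain cumulative sum, which is the non-data-controlled behavior the paper argues is weaker.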

GateLoop, a sequence model incorporating data-controlled state transitions, excels in auto-regressive language modeling, as demonstrated in experiments on the WikiText-103 benchmark. It achieves a lower test perplexity than other models, highlighting the practical benefits of data-controlled state transitions in sequence modeling. GateLoop's capacity to forget memories input-dependently allows it to manage its hidden state effectively, retaining relevant information. The paper outlines future research possibilities, including exploring initialization strategies, amplitude and phase activations, and the interpretability of learned state transitions for deeper model understanding.

GateLoop, a fully data-controlled linear RNN, extends existing linear recurrent models through data-controlled gating of inputs, outputs, and state transitions. It excels in auto-regressive language modeling, outperforming other models. GateLoop's mechanism provides relative positional information to Attention and can be reformulated in an equivalent surrogate attention mode with O(l²) complexity. Empirical results validate the efficacy of fully data-controlled linear recurrence in autoregressive language modeling. The model can forget memories input-dependently, making room for pertinent information. Future research avenues include exploring different initialization strategies, amplitude and phase activations, and enhancing the interpretability of learned state transitions.


This AI Paper Introduces PolyID: Pioneering Machine Learning in the Discovery of High-Performance Biobased Polymers

Artificial intelligence is being used in all aspects of life and has become useful in many fields, including chemicals and polymers. In chemistry and polymer science, AI helps scientists discover new materials: it predicts how different chemicals react and suggests the best combinations for creating new and better materials. This makes the process of developing chemicals and polymers faster and more efficient.

However, the challenge confronting material scientists in the 21st century lies in the formulation of more sustainable polymers with better performance standards. This challenge becomes particularly pronounced when the primary available resources are limited to petrochemicals. This task necessitates a balance, requiring both ingenuity and advanced scientific methodologies to develop polymers that meet rigorous performance criteria and adhere to sustainable principles in alignment with contemporary environmental considerations.

According to Brandon Knott, a National Renewable Energy Laboratory (NREL) scientist, petroleum is primarily composed of hydrocarbons, essentially configurations of carbon and hydrogen. These molecular arrangements exhibit beneficial properties, forming the foundation for many advantageous material characteristics. Knott emphasizes that comprehending the hydrocarbon components and molecular makeup of petroleum is important for harnessing its extraordinary characteristics for various applications.

Hydrocarbons lack elements like oxygen and nitrogen, yet these elements are essential when manufacturing polymers that require a broader range of functionalities than hydrocarbons alone can offer. Knott suggests a solution: introducing biomass and waste rich in oxygen and nitrogen into the ingredient list. Materials such as corn stalks, algae, and even garbage possess additional chemical linkages, giving chemists more flexibility to achieve specific properties in the polymer manufacturing process. This approach not only broadens the functionality of polymers but also contributes to a more sustainable and resourceful production methodology.

The National Renewable Energy Laboratory (NREL) has employed an advanced machine learning tool, PolyID (Polymer Inverse Design), to facilitate the balance in polymer development. This tool predicts material properties based on molecular structure. With PolyID, researchers can evaluate millions of potential polymer designs and generate a shortlist tailored for specific applications. 

PolyID establishes connections between the arrangements of elements such as oxygen, hydrogen, and carbon and material properties, facilitating the prediction of attributes like elasticity, heat tolerance, and sealant performance. NREL scientists effectively utilized PolyID to assess over 15,000 plant-based polymers, seeking biodegradable alternatives for contemporary food packaging films primarily composed of high-density polyethylene, a petroleum-based material. PolyID prioritized essential properties, including high-temperature resistance and robust vapor sealing, while also incorporating environmentally desirable attributes such as biodegradability and a reduced greenhouse gas footprint. 

The researchers also performed laboratory testing to confirm the accuracy of PolyID's predictions. They found that all seven polymers exhibited resistance to high temperatures and demonstrated the capacity to lower net greenhouse gas emissions. Additionally, these polymers extended the freshness of packaged food, showcasing PolyID's potential to efficiently identify environmentally friendly, high-performance polymer solutions.

PolyID can predict new polymer designs with particular physical properties because it builds an extensive database that connects the molecular composition of polymers with their known characteristics. According to Nolan Wilson, the study's principal author, the system can make extremely accurate predictions for novel structures that may never have been seen or made before.
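
To illustrate the general workflow (train a structure-to-property model on a database of known polymers, then screen many candidates and shortlist the best), here is a deliberately toy sketch. PolyID itself relies on graph-neural-network models of real polymer structures; the descriptors, data, and target below are entirely synthetic:

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# hypothetical dataset: simple composition descriptors -> a property such as heat tolerance
rng = np.random.default_rng(0)
X = rng.random((200, 4))                                    # e.g. fractions of C, H, O, N per repeat unit
y = 150 * X[:, 2] + 40 * X[:, 3] + rng.normal(0, 5, 200)    # synthetic target property

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
print(f"R^2 on held-out polymers: {model.score(X_test, y_test):.2f}")

# "screen and shortlist": evaluate many candidate compositions, keep the best few
candidates = rng.random((10_000, 4))
predicted = model.predict(candidates)
shortlist = candidates[np.argsort(predicted)[-5:]]          # top 5 by predicted property
print(shortlist)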


Duke University Researchers Propose Policy Stitching: A Novel AI Framework that Facilitates Robot Transfer Learning for Novel Combinations of Robots and Tasks

In robotics, researchers face challenges in using reinforcement learning (RL) to teach robots new skills, because these skills can be sensitive to changes in the environment and in the robot's structure. Current methods struggle to generalize to new combinations of robots and tasks and to handle complex, real-world tasks due to architectural complexity and strong regularization. To tackle this issue, researchers from Duke University and the Air Force Research Laboratory introduced Policy Stitching (PS), an approach that enables separately trained robot and task modules to be combined into a new policy for rapid adaptation. Both simulated and real-world experiments involving 3D manipulation tasks highlight the exceptional zero-shot and few-shot transfer learning capabilities of PS.

Challenges persist in transferring robot policies across diverse environmental conditions and novel tasks. Prior work has mainly concentrated on transferring specific components within the RL framework, including value functions, rewards, experience samples, policies, parameters, and features. Meta-learning has emerged as a solution for rapid adaptation to new tasks, offering improved parameter initialization and memory-augmented neural networks for swift integration of new data without erasing prior knowledge. Compositional RL, applied in zero-shot transfer learning, multi-task learning, and lifelong learning, has also shown promise; however, modules trained within such frameworks are limited to use within one large modular system and cannot seamlessly integrate with new modules.

Robotic systems face challenges in transferring learned experiences to new tasks and body configurations, in contrast to humans’ ability to continuously acquire new skills based on past knowledge. Model-based robot learning aims to build predictive models of robot kinematics and dynamics for various tasks. In contrast, model-free RL trains policies end-to-end, but its transfer learning performance is often limited. Current multi-task RL approaches encounter difficulties as the policy network’s capacity expands exponentially with the number of tasks. 

PS utilizes modular policy design and transferable representations to facilitate knowledge transfer between distinct tasks and robot configurations. This framework is adaptable to a range of model-free RL algorithms. The study suggests extending the concept of Relative Representations from supervised learning to model-free RL, focusing on promoting transformation invariances by aligning intermediate representations in a common latent coordinate system. 
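
Here is a small, hedged sketch of that alignment idea, not the authors' code: a robot-specific encoder's latent vector is re-expressed as cosine similarities to a shared set of anchor states, and a task module trained on that relative space can then be stitched onto a different robot encoder. All module sizes and names below are illustrative:

import torch
import torch.nn as nn
import torch.nn.functional as F

def relative_representation(z, anchors_z):
    # z: [batch, d]; anchors_z: [n_anchors, d] = encoder applied to the shared anchor states
    return F.cosine_similarity(z.unsqueeze(1), anchors_z.unsqueeze(0), dim=-1)  # [batch, n_anchors]

robot_encoder_a = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 32))  # robot module
task_head = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))         # task module

anchor_states = torch.randn(16, 10)   # anchor states shared by all modules
states = torch.randn(8, 10)           # a batch of observations from robot A

z = robot_encoder_a(states)
rel = relative_representation(z, robot_encoder_a(anchor_states))
actions = task_head(rel)              # "stitched" policy: robot-A encoder + task head
print(actions.shape)                  # torch.Size([8, 4])

Because the task head only ever sees similarities to the anchors rather than raw latent coordinates, swapping in a different robot encoder (trained with the same anchors) leaves its input distribution roughly aligned, which is the property PS exploits for zero-shot stitching.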

PS excels in zero-shot and few-shot transfer learning for new robot-task combinations, surpassing existing methods in both simulated and real-world scenarios. In zero-shot transfers, PS achieves a 100% success rate in touching and 40% overall success, showcasing its capacity to generalize effectively in practical, real-world settings. Latent representation alignment significantly reduces the pairwise distances between high-dimensional latent states in stitched policies, underscoring its role in enabling the learning of transferable representations for PS. The experiments also provide practical insights into PS's real-world applicability within a physical robot setup.

In conclusion, PS proves its efficacy in seamlessly transferring robot learning policies to novel robot-task combinations, underscoring the benefits of modular policy design and the alignment of latent spaces. The method aims to overcome current limitations, particularly concerning high-dimensional state representations and the necessity for fine-tuning. The authors outline future research directions, including exploring self-supervised techniques for disentangling latent features in anchor selection and investigating alternative methods for aligning network modules without relying on anchor states. The study emphasizes the potential for extending PS to a broader range of robot platforms with diverse morphologies.


Build trust and safety for generative AI applications with Amazon Comprehend and LangChain

We are witnessing a rapid increase in the adoption of large language models (LLMs) that power generative AI applications across industries. LLMs are capable of a variety of tasks, such as generating creative content, answering inquiries via chatbots, generating code, and more.
Organizations looking to use LLMs to power their applications are increasingly wary about data privacy and want to ensure that trust and safety are maintained within their generative AI applications. This includes handling customers' personally identifiable information (PII) properly. It also includes preventing abusive and unsafe content from being propagated to LLMs and checking that data generated by LLMs follows the same principles.
In this post, we discuss new features powered by Amazon Comprehend that enable seamless integration to ensure data privacy, content safety, and prompt safety in new and existing generative AI applications.

Amazon Comprehend is a natural language processing (NLP) service that uses machine learning (ML) to uncover information in unstructured data and text within documents. In this post, we discuss why trust and safety with LLMs matter for your workloads. We also delve deeper into how these new moderation capabilities are utilized with the popular generative AI development framework LangChain to introduce a customizable trust and safety mechanism for your use case.
Why trust and safety with LLMs matter
Trust and safety are paramount when working with LLMs due to their profound impact on a wide range of applications, from customer support chatbots to content generation. As these models process vast amounts of data and generate humanlike responses, the potential for misuse or unintended outcomes increases. Ensuring that these AI systems operate within ethical and reliable boundaries is crucial, not just for the reputation of businesses that utilize them, but also for preserving the trust of end-users and customers.
Moreover, as LLMs become more integrated into our daily digital experiences, their influence on our perceptions, beliefs, and decisions grows. Ensuring trust and safety with LLMs goes beyond just technical measures; it speaks to the broader responsibility of AI practitioners and organizations to uphold ethical standards. By prioritizing trust and safety, organizations not only protect their users, but also ensure sustainable and responsible growth of AI in society. It can also help to reduce risk of generating harmful content, and help adhere to regulatory requirements.
In the realm of trust and safety, content moderation is a mechanism that addresses various aspects, including but not limited to:

Privacy – Users can inadvertently provide text that contains sensitive information, jeopardizing their privacy. Detecting and redacting any PII is essential.
Toxicity – Recognizing and filtering out harmful content, such as hate speech, threats, or abuse, is of utmost importance.
User intention – Identifying whether the user input (prompt) is safe or unsafe is critical. Unsafe prompts can explicitly or implicitly express malicious intent, such as requesting personal or private information or generating offensive, discriminatory, or illegal content. Prompts may also implicitly express or request advice on medical, legal, political, controversial, personal, or financial subjects.

Content moderation with Amazon Comprehend
In this section, we discuss the benefits of content moderation with Amazon Comprehend.
Addressing privacy
Amazon Comprehend already addresses privacy through its existing PII detection and redaction abilities via the DetectPIIEntities and ContainsPIIEntities APIs. These two APIs are backed by NLP models that can detect a large number of PII entities such as Social Security numbers (SSNs), credit card numbers, names, addresses, phone numbers, and so on. For a full list of entities, refer to PII universal entity types. DetectPIIEntities also provides the character-level position of the PII entity within a text; for example, the start character position of the NAME entity (John Doe) in the sentence "My name is John Doe" is 12, and the end character position is 19. These offsets can be used to perform masking or redaction of the values, thereby reducing the risk of private data propagating into LLMs.
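
For example, the following short sketch (the masking logic is illustrative) uses the DetectPIIEntities API and the returned offsets to redact detected entities before the text ever reaches an LLM:

import boto3

comprehend = boto3.client('comprehend')
text = "My name is John Doe and my SSN is 123-45-6789."

entities = comprehend.detect_pii_entities(Text=text, LanguageCode='en')['Entities']

# Replace each detected span with a mask, working right to left so offsets stay valid
redacted = text
for e in sorted(entities, key=lambda e: e['BeginOffset'], reverse=True):
    redacted = (redacted[:e['BeginOffset']]
                + '*' * (e['EndOffset'] - e['BeginOffset'])
                + redacted[e['EndOffset']:])

print(redacted)
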
Addressing toxicity and prompt safety
Today, we are announcing two new Amazon Comprehend features in the form of APIs: Toxicity detection via the DetectToxicContent API, and prompt safety classification via the ClassifyDocument API. Note that DetectToxicContent is a new API, whereas ClassifyDocument is an existing API that now supports prompt safety classification.
Toxicity detection
With Amazon Comprehend toxicity detection, you can identify and flag content that may be harmful, offensive, or inappropriate. This capability is particularly valuable for platforms where users generate content, such as social media sites, forums, chatbots, comment sections, and applications that use LLMs to generate content. The primary goal is to maintain a positive and safe environment by preventing the dissemination of toxic content.
At its core, the toxicity detection model analyzes text to determine the likelihood of it containing hateful content, threats, obscenities, or other forms of harmful text. The model is trained on vast datasets containing examples of both toxic and nontoxic content. The toxicity API evaluates a given piece of text to provide toxicity classification and confidence score. Generative AI applications can then use this information to take appropriate actions, such as stopping the text from propagating to LLMs. As of this writing, the labels detected by the toxicity detection API are HATE_SPEECH, GRAPHIC, HARASSMENT_OR_ABUSE, SEXUAL, VIOLENCE_OR_THREAT, INSULT, and PROFANITY. The following code demonstrates the API call with Python Boto3 for Amazon Comprehend toxicity detection:

import boto3

client = boto3.client('comprehend')
response = client.detect_toxic_content(
    TextSegments=[
        {"Text": "What is the capital of France?"},
        {"Text": "Where do I find good baguette in France?"},
    ],
    LanguageCode='en'
)
print(response)

Prompt safety classification
Prompt safety classification with Amazon Comprehend helps classify an input text prompt as safe or unsafe. This capability is crucial for applications like chatbots, virtual assistants, or content moderation tools where understanding the safety of a prompt can determine responses, actions, or content propagation to LLMs.
In essence, prompt safety classification analyzes human input for any explicit or implicit malicious intent, such as requesting personal or private information or generating offensive, discriminatory, or illegal content. It also flags prompts looking for advice on medical, legal, political, controversial, personal, or financial subjects. Prompt classification returns two classes, UNSAFE_PROMPT and SAFE_PROMPT, for an associated text, each with an associated confidence score. The confidence scores range between 0 and 1 and together sum to 1. For instance, in a customer support chatbot, the text "How do I reset my password?" signals an intent to seek guidance on password reset procedures and is labeled as SAFE_PROMPT. Similarly, a statement like "I wish something bad happens to you" can be flagged for having a potentially harmful intent and labeled as UNSAFE_PROMPT. It's important to note that prompt safety classification is primarily focused on detecting intent from human inputs (prompts), rather than machine-generated text (LLM outputs). The following code demonstrates how to access the prompt safety classification feature with the ClassifyDocument API:

import boto3

client = boto3.client('comprehend')
response = client.classify_document(
    Text=prompt_value,         # the prompt text to classify
    EndpointArn=endpoint_arn   # prompt safety endpoint ARN (see the note below)
)
print(response)

Note that endpoint_arn in the preceding code is an AWS-provided Amazon Resource Name (ARN) of the pattern arn:aws:comprehend:<region>:aws:document-classifier-endpoint/prompt-safety, where <region> is the AWS Region of your choice where Amazon Comprehend is available.
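
For example, for the us-east-1 Region, the endpoint ARN can be constructed as follows:

region = 'us-east-1'  # replace with your preferred Region where Amazon Comprehend is available
endpoint_arn = f'arn:aws:comprehend:{region}:aws:document-classifier-endpoint/prompt-safety'
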
To demonstrate these capabilities, we built a sample chat application where we ask an LLM to extract PII entities such as address, phone number, and SSN from a given piece of text. The LLM finds and returns the appropriate PII entities, as shown in the image on the left.
With Amazon Comprehend moderation, we can redact the input to the LLM and output from the LLM. In the image on the right, the SSN value is allowed to be passed to the LLM without redaction. However, any SSN value in the LLM’s response is redacted.

The following is an example of how a prompt containing PII information can be prevented from reaching the LLM altogether. This example demonstrates a user asking a question that contains PII information. We use Amazon Comprehend moderation to detect PII entities in the prompt and show an error by interrupting the flow.

The preceding chat examples showcase how Amazon Comprehend moderation applies restrictions on data being sent to an LLM. In the following sections, we explain how this moderation mechanism is implemented using LangChain.
Integration with LangChain
With the endless possibilities of the application of LLMs into various use cases, it has become equally important to simplify the development of generative AI applications. LangChain is a popular open source framework that makes it effortless to develop generative AI applications. Amazon Comprehend moderation extends the LangChain framework to offer PII identification and redaction, toxicity detection, and prompt safety classification capabilities via AmazonComprehendModerationChain.
AmazonComprehendModerationChain is a custom implementation of the LangChain base chain interface. This means that applications can use this chain with their own LLM chains to apply the desired moderation to the input prompt as well as to the output text from the LLM. Chains can be built by merging numerous chains or by mixing chains with other components. You can use AmazonComprehendModerationChain with other LLM chains to develop complex AI applications in a modular and flexible manner.
To explain it further, we provide a few samples in the following sections. The source code for the AmazonComprehendModerationChain implementation can be found within the LangChain open source repository. For full documentation of the API interface, refer to the LangChain API documentation for the Amazon Comprehend moderation chain. Using this moderation chain is as simple as initializing an instance of the class with default configurations:

from langchain_experimental.comprehend_moderation import AmazonComprehendModerationChain

comprehend_moderation = AmazonComprehendModerationChain()

Behind the scenes, the moderation chain performs three consecutive moderation checks, namely PII, toxicity, and prompt safety, as explained in the following diagram. This is the default flow for the moderation.

The following code snippet shows a simple example of using the moderation chain with the Amazon FalconLite LLM (which is a quantized version of the Falcon 40B SFT OASST-TOP1 model) hosted in Hugging Face Hub:

from langchain import HuggingFaceHub
from langchain import PromptTemplate, LLMChain
from langchain_experimental.comprehend_moderation import AmazonComprehendModerationChain

template = """Question: {question}
Answer:"""
repo_id = "amazon/FalconLite"
prompt = PromptTemplate(template=template, input_variables=["question"])
llm = HuggingFaceHub(
    repo_id=repo_id,
    model_kwargs={"temperature": 0.5, "max_length": 256}
)
comprehend_moderation = AmazonComprehendModerationChain(verbose=True)

chain = (
    prompt
    | comprehend_moderation
    | {"input": (lambda x: x['output']) | llm}
    | comprehend_moderation
)

try:
    response = chain.invoke({"question": "An SSN is of the format 123-45-6789. Can you give me John Doe's SSN?"})
except Exception as e:
    print(str(e))
else:
    print(response['output'])

In the preceding example, we augment our chain with comprehend_moderation for both text going into the LLM and text generated by the LLM. This will perform default moderation that will check PII, toxicity, and prompt safety classification in that sequence.
Customize your moderation with filter configurations
You can use the AmazonComprehendModerationChain with specific configurations, which gives you the ability to control what moderations you wish to perform in your generative AI–based application. At the core of the configuration, you have three filter configurations available.

ModerationPiiConfig – Used to configure the PII filter.
ModerationToxicityConfig – Used to configure the toxic content filter.
ModerationPromptSafetyConfig – Used to configure the prompt safety filter.

You can use each of these filter configurations to customize the behavior of how your moderations behave. Each filter’s configurations have a few common parameters, and some unique parameters, that they can be initialized with. After you define the configurations, you use the BaseModerationConfig class to define the sequence in which the filters must apply to the text. For example, in the following code, we first define the three filter configurations, and subsequently specify the order in which they must apply:

from langchain_experimental.comprehend_moderation import (
    BaseModerationConfig,
    ModerationPromptSafetyConfig,
    ModerationPiiConfig,
    ModerationToxicityConfig,
)

pii_config = ModerationPiiConfig(
    labels=["SSN"],
    redact=True,
    mask_character="X"
)
toxicity_config = ModerationToxicityConfig(threshold=0.6)
prompt_safety_config = ModerationPromptSafetyConfig(threshold=0.8)
moderation_config = BaseModerationConfig(
    filters=[toxicity_config, pii_config, prompt_safety_config]
)
comprehend_moderation = AmazonComprehendModerationChain(moderation_config=moderation_config)

Let’s dive a little deeper to understand what this configuration achieves:

First, for the toxicity filter, we specified a threshold of 0.6. This means that if the text contains any of the available toxic labels or entities with a score greater than the threshold, the whole chain will be interrupted.
If there is no toxic content found in the text, a PII check is performed next. In this case, we're interested in checking if the text contains SSN values. Because the redact parameter is set to True, the chain will mask the detected SSN values (if any) whose confidence score is greater than or equal to 0.5 with the mask character specified (X). If redact is set to False, the chain will be interrupted for any SSN detected.
Finally, the chain performs prompt safety classification, and will stop the content from propagating further down the chain if the content is classified with UNSAFE_PROMPT with a confidence score of greater than or equal to 0.8.

The following diagram illustrates this workflow.

In case of interruptions to the moderation chain (in this example, applicable to the toxicity and prompt safety classification filters), the chain will raise a Python exception, essentially stopping the chain in progress and allowing you to catch the exception (in a try/except block, as sketched after the following list) and perform any relevant action. The three possible exception types are:

ModerationPIIError
ModerationToxicityError
ModerationPromptSafetyError
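
The following sketch shows one way to branch on these exception types using the chain built earlier; the import path below is an assumption and may differ across langchain_experimental versions (the earlier example simply catches the generic Exception):

# Assumed import path and class names; verify against your installed
# langchain_experimental version before use.
from langchain_experimental.comprehend_moderation.base_moderation_exceptions import (
    ModerationPIIError, ModerationToxicityError, ModerationPromptSafetyError)

try:
    response = chain.invoke({"question": "An SSN is of the format 123-45-6789. Can you give me John Doe's SSN?"})
except ModerationPIIError:
    print("Blocked: the prompt or response contained disallowed PII.")
except ModerationToxicityError:
    print("Blocked: toxic content was detected.")
except ModerationPromptSafetyError:
    print("Blocked: the prompt was classified as unsafe.")
else:
    print(response['output'])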

You can configure one filter or more than one filter using BaseModerationConfig. You can also have the same type of filter with different configurations within the same chain. For example, if your use case is only concerned with PII, you can specify a configuration that must interrupt the chain if an SSN is detected; otherwise, it must perform redaction on age and name PII entities. A configuration for this can be defined as follows:

pii_config1 = ModerationPiiConfig(
    labels=["SSN"],
    redact=False
)
pii_config2 = ModerationPiiConfig(
    labels=["AGE", "NAME"],
    redact=True,
    mask_character="X"
)
moderation_config = BaseModerationConfig(filters=[pii_config1, pii_config2])
comprehend_moderation = AmazonComprehendModerationChain(moderation_config=moderation_config)

Using callbacks and unique identifiers
If you’re familiar with the concept of workflows, you may also be familiar with callbacks. Callbacks within workflows are independent pieces of code that run when certain conditions are met within the workflow. A callback can either be blocking or nonblocking to the workflow. LangChain chains are, in essence, workflows for LLMs. AmazonComprehendModerationChain allows you to define your own callback functions. Initially, the implementation is limited to asynchronous (nonblocking) callback functions only.
This effectively means that if you use callbacks with the moderation chain, they will run independently of the chain’s run without blocking it. For the moderation chain, you get options to run pieces of code, with any business logic, after each moderation is run, independent of the chain.
You can also optionally provide an arbitrary unique identifier string when creating an AmazonComprehendModerationChain to enable logging and analytics later. For example, if you’re operating a chatbot powered by an LLM, you may want to track users who are consistently abusive or are deliberately or unknowingly exposing personal information. In such cases, it becomes necessary to track the origin of such prompts and perhaps store them in a database or log them appropriately for further action. You can pass a unique ID that distinctly identifies a user, such as their user name or email, or an application name that is generating the prompt.
The combination of callbacks and unique identifiers provides you with a powerful way to implement a moderation chain that fits your use case in a much more cohesive manner with less code that is easier to maintain. The callback handler is available via the BaseModerationCallbackHandler, with three available callbacks: on_after_pii(), on_after_toxicity(), and on_after_prompt_safety(). Each of these callback functions is called asynchronously after the respective moderation check is performed within the chain. These functions also receive two default parameters:

moderation_beacon – A dictionary containing details such as the text on which the moderation was performed, the full JSON output of the Amazon Comprehend API, the type of moderation, and if the supplied labels (in the configuration) were found within the text or not
unique_id – The unique ID that you assigned while initializing an instance of the AmazonComprehendModerationChain.

The following is an example of how an implementation with callback works. In this case, we defined a single callback that we want the chain to run after the PII check is performed:

from langchain_experimental.comprehend_moderation import BaseModerationCallbackHandler

class MyModCallback(BaseModerationCallbackHandler):
    async def on_after_pii(self, output_beacon, unique_id):
        import json
        moderation_type = output_beacon['moderation_type']
        chain_id = output_beacon['moderation_chain_id']
        # write the moderation payload and the caller's unique ID to a JSON file
        with open(f'output-{moderation_type}-{chain_id}.json', 'w') as file:
            data = {'beacon_data': output_beacon, 'unique_id': unique_id}
            json.dump(data, file)

    '''
    # implement this callback for toxicity
    async def on_after_toxicity(self, output_beacon, unique_id):
        pass

    # implement this callback for prompt safety
    async def on_after_prompt_safety(self, output_beacon, unique_id):
        pass
    '''

my_callback = MyModCallback()

We then use the my_callback object while initializing the moderation chain and also pass a unique_id. You may use callbacks and unique identifiers with or without a configuration. When you subclass BaseModerationCallbackHandler, you must implement one or all of the callback methods depending on the filters you intend to use. For brevity, the following example shows a way to use callbacks and unique_id without any configuration:

comprehend_moderation = AmazonComprehendModerationChain(
    moderation_callback=my_callback,
    unique_id='john.doe@email.com'
)

The following diagram explains how this moderation chain with callbacks and unique identifiers works. Specifically, we implemented the PII callback that should write a JSON file with the data available in the moderation_beacon and the unique_id passed (the user’s email in this case).

In the following Python notebook, we have compiled a few different ways you can configure and use the moderation chain with various LLMs, such as LLMs hosted with Amazon SageMaker JumpStart and hosted in Hugging Face Hub. We have also included the sample chat application that we discussed earlier with the following Python notebook.
Conclusion
The transformative potential of large language models and generative AI is undeniable. However, their responsible and ethical use hinges on addressing concerns of trust and safety. By recognizing the challenges and actively implementing measures to mitigate risks, developers, organizations, and society at large can harness the benefits of these technologies while preserving the trust and safety that underpin their successful integration. Use the Amazon Comprehend moderation chain (AmazonComprehendModerationChain) to add trust and safety features to any LLM workflow, including Retrieval Augmented Generation (RAG) workflows implemented in LangChain.
For information on building RAG-based solutions using LangChain and Amazon Kendra's highly accurate, machine learning (ML)-powered intelligent search, see Quickly build high-accuracy Generative AI applications on enterprise data using Amazon Kendra, LangChain, and large language models. As a next step, refer to the code samples we created for using Amazon Comprehend moderation with LangChain. For full documentation of the Amazon Comprehend moderation chain API, refer to the LangChain API documentation.

About the authors
Wrick Talukdar is a Senior Architect with the Amazon Comprehend Service team. He works with AWS customers to help them adopt machine learning on a large scale. Outside of work, he enjoys reading and photography.
Anjan Biswas is a Senior AI Services Solutions Architect with a focus on AI/ML and Data Analytics. Anjan is part of the world-wide AI services team and works with customers to help them understand and develop solutions to business problems with AI and ML. Anjan has over 14 years of experience working with global supply chain, manufacturing, and retail organizations, and is actively helping customers get started and scale on AWS AI services.
Nikhil Jha is a Senior Technical Account Manager at Amazon Web Services. His focus areas include AI/ML, and analytics. In his spare time, he enjoys playing badminton with his daughter and exploring the outdoors.
Chin Rane is an AI/ML Specialist Solutions Architect at Amazon Web Services. She is passionate about applied mathematics and machine learning. She focuses on designing intelligent document processing solutions for AWS customers. Outside of work, she enjoys salsa and bachata dancing.

Use machine learning without writing a single line of code with Amazon SageMaker Canvas

In the recent past, using machine learning (ML) to make predictions, especially for data in the form of text and images, required extensive ML knowledge for creating and tuning deep learning models. Today, ML has become more accessible to any user who wants to use ML models to generate business value. With Amazon SageMaker Canvas, you can create predictions for a number of different data types beyond just tabular or time series data without writing a single line of code. These capabilities include pre-trained models for image, text, and document data types.
In this post, we discuss how you can use pre-trained models to retrieve predictions for supported data types beyond tabular data.
Text data
SageMaker Canvas provides a visual, no-code environment for building, training, and deploying ML models. For natural language processing (NLP) tasks, SageMaker Canvas integrates seamlessly with Amazon Comprehend to allow you to perform key NLP capabilities like language detection, entity recognition, sentiment analysis, topic modeling, and more. The integration eliminates the need for any coding or data engineering to use the robust NLP models of Amazon Comprehend. You simply provide your text data and select from four commonly used capabilities: sentiment analysis, language detection, entities extraction, and personal information detection. For each scenario, you can use the UI to test and use batch prediction to select data stored in Amazon Simple Storage Service (Amazon S3).

Sentiment analysis
With sentiment analysis, SageMaker Canvas allows you to analyze the sentiment of your input text. It can determine if the overall sentiment is positive, negative, mixed, or neutral, as shown in the following screenshot. This is useful in situations like analyzing product reviews. For example, the text “I love this product, it’s amazing!” would be classified by SageMaker Canvas as having a positive sentiment, whereas “This product is horrible, I regret buying it” would be labeled as negative sentiment.

Entities extraction
SageMaker Canvas can analyze text and automatically detect entities mentioned within it. When a document is sent to SageMaker Canvas for analysis, it will identify people, organizations, locations, dates, quantities, and other entities in the text. This entity extraction capability enables you to quickly gain insights into the key people, places, and details discussed in documents. For a list of supported entities, refer to Entities.

Language detection
SageMaker Canvas can also determine the dominant language of text using Amazon Comprehend. It analyzes text to identify the main language and provides confidence scores for the detected dominant language, but doesn’t indicate percentage breakdowns for multilingual documents. For best results with long documents in multiple languages, split the text into smaller pieces and aggregate the results to estimate language percentages. It works best with at least 20 characters of text.

Personal information detection
You can also protect sensitive data using personal information detection with SageMaker Canvas. It can analyze text documents to automatically detect personally identifiable information (PII) entities, allowing you to locate sensitive data like names, addresses, dates of birth, phone numbers, email addresses, and more. It analyzes documents up to 100 KB and provides a confidence score for each detected entity so you can review and selectively redact the most sensitive information. For a list of entities detected, refer to Detecting PII entities.

Image data
SageMaker Canvas provides a visual, no-code interface that makes it straightforward for you to use computer vision capabilities by integrating with Amazon Rekognition for image analysis. For example, you can upload a dataset of images, use Amazon Rekognition to detect objects and scenes, and perform text detection to address a wide range of use cases. The visual interface and Amazon Rekognition integration make it possible for non-developers to harness advanced computer vision techniques.

Object detection in images
SageMaker Canvas uses Amazon Rekognition to detect labels (objects) in an image. You can upload the image from the SageMaker Canvas UI or use the Batch Prediction tab to select images stored in an S3 bucket. As shown in the following example, it can extract objects in the image such as clock tower, bus, buildings, and more. You can use the interface to search through the prediction results and sort them.

Text detection in images
Extracting text from images is a very common use case. Now, you can perform this task with ease on SageMaker Canvas with no code. The text is extracted as line items, as shown in the following screenshot. Short phrases within the image are classified together and identified as a phrase.

You can perform batch predictions by uploading a set of images, extract all the images in a single batch job, and download the results as a CSV file. This solution is useful when you want to extract and detect text in images.
Document data
SageMaker Canvas offers a variety of ready-to-use solutions that solve your day-to-day document understanding needs. These solutions are powered by Amazon Textract. To view all the available options for documents, choose Ready-to-use models in the navigation pane and filter by Documents, as shown in the following screenshot.

Document analysis
Document analysis analyzes documents and forms for relationships among detected text. The operations return four categories of document extraction: raw text, forms, tables, and signatures. The solution’s capability of understanding the document structure gives you extra flexibility in the type of data you want to extract from the documents. The following screenshot is an example of what table detection looks like.

This solution is able to understand layouts of complex documents, which is helpful when you need to extract specific information in your documents.
Identity document analysis
This solution is designed to analyze documents like personal identification cards, driver’s licenses, or other similar forms of identification. Information such as middle name, county, and place of birth is returned for each identity document, together with an individual confidence score for each extracted field, as shown in the following screenshot.

There is an option to do batch prediction, whereby you can bulk upload sets of identification documents and process them as a batch job. This provides a quick and seamless way to transform identification document details into key-value pairs that can be used for downstream processes such as data analysis.
Expense analysis
Expense analysis is designed to analyze expense documents like invoices and receipts. The following screenshot is an example of what the extracted information looks like.

The results are returned as summary fields and line item fields. Summary fields are key-value pairs extracted from the document, and contain keys such as Grand Total, Due Date, and Tax. Line item fields refer to data that is structured as a table in the document. This is useful for extracting information from the document while retaining its layout.
Document queries
Document queries are designed for you to ask questions about your documents. This is a great solution to use when you have multi-page documents and you want to extract very specific answers from your documents. The following is an example of the types of questions you can ask and what the extracted answers look like.

The solution provides a straightforward interface for you to interact with your documents. This is helpful when you want to get specific details within large documents.
Conclusion
SageMaker Canvas provides a no-code environment to use ML with ease across various data types like text, images, and documents. The visual interface and integration with AWS services like Amazon Comprehend, Amazon Rekognition, and Amazon Textract eliminate the need for coding and data engineering. You can analyze text for sentiment, entities, languages, and PII. For images, object and text detection enables computer vision use cases. Finally, document analysis can extract text while preserving its layout for downstream processes. The ready-to-use solutions in SageMaker Canvas make it possible for you to harness advanced ML techniques to generate insights from both structured and unstructured data. If you’re interested in using no-code tools with ready-to-use ML models, try out SageMaker Canvas today. For more information, refer to Getting started with using Amazon SageMaker Canvas.

About the authors
Julia Ang is a Solutions Architect based in Singapore. She has worked with customers in a range of fields, from health and public sector to digital native businesses, to adopt solutions according to their business needs. She has also been supporting customers in Southeast Asia and beyond to use AI & ML in their businesses. Outside of work, she enjoys learning about the world through traveling and engaging in creative pursuits.
Loke Jun Kai is a Specialist Solutions Architect for AI/ML based in Singapore. He works with customers across ASEAN to architect machine learning solutions at scale on AWS. Jun Kai is an advocate for Low-Code No-Code machine learning tools. In his spare time, he enjoys being in nature.

Explore advanced techniques for hyperparameter optimization with Amazo …

Creating high-performance machine learning (ML) solutions relies on exploring and optimizing training parameters, also known as hyperparameters. Hyperparameters are the knobs and levers that we use to adjust the training process, such as learning rate, batch size, regularization strength, and others, depending on the specific model and task at hand. Exploring hyperparameters involves systematically varying the values of each parameter and observing the impact on model performance. Although this process requires additional effort, the benefits are significant. Hyperparameter optimization (HPO) can lead to faster training times, improved model accuracy, and better generalization to new data.
We continue our journey from the post Optimize hyperparameters with Amazon SageMaker Automatic Model Tuning. We previously explored single-job optimization, visualized the outcomes for a SageMaker built-in algorithm, and learned about the impact of particular hyperparameter values. Beyond using HPO as a one-time optimization at the end of the model creation cycle, we can also use it across multiple steps in a conversational manner. Each tuning job helps us get closer to good performance, but we also learn how sensitive the model is to certain hyperparameters and can use this understanding to inform the next tuning job. We can revise the hyperparameters and their value ranges based on what we learned and therefore turn this optimization effort into a conversation. And in the same way that we as ML practitioners accumulate knowledge over these runs, Amazon SageMaker Automatic Model Tuning (AMT) with warm starts can carry the knowledge acquired in previous tuning jobs over to the next tuning job as well.
In this post, we run multiple HPO jobs with a custom training algorithm and different HPO strategies such as Bayesian optimization and random search. We also put warm starts into action and visually compare our trials to refine hyperparameter space exploration.
Advanced concepts of SageMaker AMT
In the next sections, we take a closer look at each of the following topics and show how SageMaker AMT can help you implement them in your ML projects:

Use custom training code and the popular ML framework Scikit-learn in SageMaker Training
Define custom evaluation metrics based on the logs for evaluation and optimization
Perform HPO using an appropriate strategy
Use warm starts to turn a single hyperparameter search into a dialog with our model
Use advanced visualization techniques from our solution library to compare two HPO strategies and their tuning job results

Whether you’re using the built-in algorithms used in our first post or your own training code, SageMaker AMT offers a seamless user experience for optimizing ML models. It provides key functionality that allows you to focus on the ML problem at hand while automatically keeping track of the trials and results. At the same time, it automatically manages the underlying infrastructure for you.
In this post, we move away from a SageMaker built-in algorithm and use custom code. We use a Random Forest from Scikit-learn. But we stick to the same ML task and dataset as in our first post, which is detecting handwritten digits. We cover the content of the Jupyter notebook 2_advanced_tuning_with_custom_training_and_visualizing.ipynb and invite you to run the code side by side as you read along.
Let’s dive deeper and discover how we can use custom training code, deploy it, and run it, while exploring the hyperparameter search space to optimize our results.
How to build an ML model and perform hyperparameter optimization
What does a typical process for building an ML solution look like? Although there are many possible use cases and a large variety of ML tasks out there, we suggest the following mental model for a stepwise approach:

Understand your ML scenario at hand and select an algorithm based on the requirements. For example, you might want to solve an image recognition task using a supervised learning algorithm. In this post, we continue to use the handwritten image recognition scenario and the same dataset as in our first post.
Decide which implementation of the algorithm in SageMaker Training you want to use. There are various options, inside SageMaker or external ones. Additionally, you need to define which underlying metric best fits your task and that you want to optimize for (such as accuracy, F1 score, or ROC). SageMaker supports four options depending on your needs and resources:

Use a pre-trained model via Amazon SageMaker JumpStart, which you can use out of the box or just fine-tune it.
Use one of the built-in algorithms for training and tuning, like XGBoost, as we did in our previous post.
Train and tune a custom model based on one of the major frameworks like Scikit-learn, TensorFlow, or PyTorch. AWS provides a selection of pre-made Docker images for this purpose. For this post, we use this option, which allows you to experiment quickly by running your own code on top of a pre-made container image.
Bring your own custom Docker image in case you want to use a framework or software that is not otherwise supported. This option requires the most effort, but also provides the highest degree of flexibility and control.

Train the model with your data. Depending on the algorithm implementation from the previous step, this can be as simple as referencing your training data and running the training job, or it can involve additionally providing custom code for training. In our case, we use some custom training code in Python based on Scikit-learn.
Apply hyperparameter optimization (as a “conversation” with your ML model). After the training, you typically want to optimize the performance of your model by finding the most promising combination of values for your algorithm’s hyperparameters.

Depending on your ML algorithm and model size, the last step of hyperparameter optimization may turn out to be a bigger challenge than expected. The following questions are typical for ML practitioners at this stage and might sound familiar to you:

What kind of hyperparameters are impactful for my ML problem?
How can I effectively search a huge hyperparameter space to find those best-performing values?
How does the combination of certain hyperparameter values influence my performance metric?
Costs matter; how can I use my resources in an efficient manner?
What kind of tuning experiments are worthwhile, and how can I compare them?

It’s not easy to answer these questions, but there is good news. SageMaker AMT takes the heavy lifting from you, and lets you concentrate on choosing the right HPO strategy and value ranges you want to explore. Additionally, our visualization solution facilitates the iterative analysis and experimentation process to efficiently find well-performing hyperparameter values.
In the next sections, we build a digit recognition model from scratch using Scikit-learn and show all these concepts in action.
Solution overview
SageMaker offers some very handy features to train, evaluate, and tune our model. It covers all functionality of an end-to-end ML lifecycle, so we don’t even need to leave our Jupyter notebook.
In our first post, we used the SageMaker built-in algorithm XGBoost. For demonstration purposes, this time we switch to a Random Forest classifier because we can then show how to provide your own training code. We opted for providing our own Python script and using Scikit-learn as our framework. Now, how do we express that we want to use a specific ML framework? As we will see, SageMaker uses another AWS service in the background to retrieve a pre-built Docker container image for training—Amazon Elastic Container Registry (Amazon ECR).
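To make the Amazon ECR lookup concrete, you can ask the SageMaker Python SDK which container image it would pull for a given framework. The following is a small sketch; the framework version and instance type are assumptions that match the estimator we define later in this post.

import sagemaker
from sagemaker import image_uris

# Resolve the pre-built Scikit-learn training image that SageMaker would fetch from Amazon ECR
image_uri = image_uris.retrieve(
    framework='sklearn',
    region=sagemaker.Session().boto_region_name,
    version='0.23-1',
    py_version='py3',
    instance_type='ml.m5.large',
)
print(image_uri)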
We cover the following steps in detail, including code snippets and diagrams to connect the dots. As mentioned before, if you have the chance, open the notebook and run the code cells step by step to create the artifacts in your AWS environment. There is no better way of active learning.

First, load and prepare the data. We use Amazon Simple Storage Service (Amazon S3) to upload a file containing our handwritten digits data.
Next, prepare the training script and framework dependencies. We provide the custom training code in Python, reference some dependent libraries, and make a test run.
Define the custom objective metrics. SageMaker lets us specify a regular expression to extract the metrics we need from the container log files.
Train the model using the Scikit-learn framework. By referencing a pre-built container image, we create a corresponding Estimator object and pass our custom training script.
AMT enables us to try out various HPO strategies. We concentrate on two of them for this post: random search and Bayesian search.
Choose between SageMaker HPO strategies.
Visualize, analyze, and compare tuning results. Our visualization package allows us to discover which strategy performs better and which hyperparameter values deliver the best performance based on our metrics.
Continue the exploration of the hyperparameter space and warm start HPO jobs.

AMT takes care of scaling and managing the underlying compute infrastructure to run the various tuning jobs on Amazon Elastic Compute Cloud (Amazon EC2) instances. This way, you don’t need to burden yourself with provisioning instances, handling any operating system and hardware issues, or aggregating log files on your own. The ML framework image is retrieved from Amazon ECR, and the model artifacts, including tuning results, are stored in Amazon S3. All logs and metrics are collected in Amazon CloudWatch for convenient access and further analysis if needed.
Prerequisites
Because this is a continuation of a series, it is recommended, but not necessarily required, to read our first post about SageMaker AMT and HPO. Apart from that, basic familiarity with ML concepts and Python programming is helpful. We also recommend following along with each step in the accompanying notebook from our GitHub repository while reading this post. The notebook can be run independently from the first one, but needs some code from subfolders. Make sure to clone the full repository in your environment as described in the README file.
Experimenting with the code and using the interactive visualization options greatly enhances your learning experience. So, please check it out.
Load and prepare the data
As a first step, we make sure the downloaded digits data we need for training is accessible to SageMaker. Amazon S3 allows us to do this in a safe and scalable way. Refer to the notebook for the complete source code and feel free to adapt it with your own data.

sm_sess = sagemaker.session.Session(boto_session=boto_sess, sagemaker_client=sm)
BUCKET = sm_sess.default_bucket()
PREFIX = 'amt-visualize-demo'
s3_data_url = f's3://{BUCKET}/{PREFIX}/data'
digits = datasets.load_digits()
digits_df = pd.DataFrame(digits.data)
digits_df['y'] = digits.target
digits_df.to_csv('data/digits.csv', index=False)
!aws s3 sync data/ {s3_data_url} --exclude '*' --include 'digits.csv'

The digits.csv file contains feature data and labels. Each digit is represented by pixel values in an 8×8 image, as depicted by the following image for the digit 4.
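If you want to convince yourself of that structure, the following small sketch reads the CSV back and reshapes one row into its 8×8 pixel grid; the row index is arbitrary.

import pandas as pd

digits_df = pd.read_csv('data/digits.csv')
row = digits_df.drop(columns=['y']).iloc[0].to_numpy()  # 64 pixel values for one digit
print('label:', digits_df['y'].iloc[0])
print(row.reshape(8, 8))                                # back to the 8x8 pixel grid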
Prepare the training script and framework dependencies
Now that the data is stored in our S3 bucket, we can define our custom training script based on Scikit-learn in Python. SageMaker gives us the option to simply reference the Python file later for training. Any dependencies like the Scikit-learn or pandas libraries can be provided in two ways:

They can be specified explicitly in a requirements.txt file
They are pre-installed in the underlying ML container image, which is either provided by SageMaker or custom-built

Both options are generally considered standard ways of managing dependencies, so you might already be familiar with them. SageMaker supports a variety of ML frameworks in a ready-to-use managed environment. This includes many of the most popular data science and ML frameworks like PyTorch, TensorFlow, or Scikit-learn, as in our case. We don’t use an additional requirements.txt file, but feel free to add some libraries to try it out.
The code of our implementation contains a method called fit(), which creates a new classifier for the digit recognition task and trains it. In contrast to our first post where we used the SageMaker built-in XGBoost algorithm, we now use a RandomForestClassifier provided by the ML library sklearn. The call of the fit() method on the classifier object starts the training process using a subset (80%) of our CSV data:

def fit(train_dir, n_estimators, max_depth, min_samples_leaf, max_features, min_weight_fraction_leaf):

    digits = pd.read_csv(Path(train_dir)/'digits.csv')

    Xtrain, Xtest, ytrain, ytest = train_test_split(digits.iloc[:, :-1], digits.iloc[:, -1], test_size=.2)

    m = RandomForestClassifier(n_estimators=n_estimators,
                               max_depth=max_depth,
                               min_samples_leaf=min_samples_leaf,
                               max_features=max_features,
                               min_weight_fraction_leaf=min_weight_fraction_leaf)
    m.fit(Xtrain, ytrain)
    predicted = m.predict(Xtest)
    pre, rec, f1, _ = precision_recall_fscore_support(ytest, predicted, pos_label=1, average='weighted')

    print(f'pre: {pre:5.3f} rec: {rec:5.3f} f1: {f1:5.3}')

    return m

See the full script in our Jupyter notebook on GitHub.
Before you spin up container resources for the full training process, did you try to run the script directly? This is a good practice to quickly ensure the code has no syntax errors, check for matching dimensions of your data structures, and catch other errors early on.
There are two ways to run your code locally. First, you can run it right away in the notebook, which also allows you to use the Python Debugger pdb:

# Running the code from within the notebook. It would then be possible to use the Python Debugger, pdb.
from train import fit
fit('data', 100, 10, 1, 'auto', 0.01)

Alternatively, run the train script from the command line in the same way you may want to use it in a container. This also supports setting various parameters and overwriting the default values as needed, for example:

!cd src && python train.py --train ../data/ --model-dir /tmp/ --n-estimators 100

As output, you can see the first results for the model’s performance based on the objective metrics precision, recall, and F1-score. For example, pre: 0.970 rec: 0.969 f1: 0.969.
Not bad for such a quick training. But where did these numbers come from and what do we do with them?
Define custom objective metrics
Remember, our goal is to fully train and tune our model based on the objective metrics we consider relevant for our task. Because we use a custom training script, we need to define those metrics for SageMaker explicitly.
Our script emits the metrics precision, recall, and F1-score during training simply by using the print function:

print(f'pre: {pre:5.3f} rec: {rec:5.3f} f1: {f1:5.3}')

The standard output is captured by SageMaker and sent to CloudWatch as a log stream. To retrieve the metric values and work with them later in SageMaker AMT, we need to provide some information on how to parse that output. We can achieve this by defining regular expression statements (for more information, refer to Monitor and Analyze Training Jobs Using Amazon CloudWatch Metrics):

metric_definitions = [
    {'Name': 'valid-precision', 'Regex': r'pre:\s+(-?[0-9.]+)'},
    {'Name': 'valid-recall', 'Regex': r'rec:\s+(-?[0-9.]+)'},
    {'Name': 'valid-f1', 'Regex': r'f1:\s+(-?[0-9.]+)'}]

Let’s walk through the first metric definition in the preceding code together. SageMaker will look for output in the log that starts with pre: and is followed by one or more whitespace characters and then a number that we want to extract, which is why we use the round parentheses. Every time SageMaker finds a value like that, it turns it into a CloudWatch metric with the name valid-precision.
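You can sanity check such a regular expression locally before handing it to SageMaker. The following is a small sketch using Python's re module on a sample log line taken from the earlier test run.

import re

sample_log_line = 'pre: 0.970 rec: 0.969 f1: 0.969'
match = re.search(r'pre:\s+(-?[0-9.]+)', sample_log_line)
print(match.group(1))  # '0.970', which SageMaker would report as the CloudWatch metric valid-precision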
Train the model using the Scikit-learn framework
After we create our training script train.py and instruct SageMaker on how to monitor the metrics within CloudWatch, we define a SageMaker Estimator object. It initiates the training job and uses the instance type we specify. But how can this instance type be different from the one you run an Amazon SageMaker Studio notebook on, and why? SageMaker Studio runs your training (and inference) jobs on separate compute instances than your notebook. This allows you to continue working in your notebook while the jobs run in the background.
The parameter framework_version refers to the Scikit-learn version we use for our training job. Alternatively, we can pass image_uri to the estimator. You can check whether your favorite framework or ML library is available as a pre-built SageMaker Docker image and use it as is or with extensions.
Moreover, we can run SageMaker training jobs on EC2 Spot Instances by setting use_spot_instances to True. These spare capacity instances can save up to 90% in costs, provided you have flexibility regarding when your training jobs run.

estimator = SKLearn(
    'train.py',
    source_dir='src',
    role=get_execution_role(),
    instance_type='ml.m5.large',
    instance_count=1,
    framework_version='0.23-1',
    metric_definitions=metric_definitions,

    # Uncomment the following three lines to use Managed Spot Training
    # use_spot_instances=True,
    # max_run=60 * 60 * 24,
    # max_wait=60 * 60 * 24,

    hyperparameters={'n-estimators': 100,
                     'max-depth': 10,
                     'min-samples-leaf': 1,
                     'max-features': 'auto',
                     'min-weight-fraction-leaf': 0.1}
)

After the Estimator object is set up, we start the training by calling the fit() function, supplying the path to the training dataset on Amazon S3. We can use this same method to provide validation and test data. We set the wait parameter to True so we can use the trained model in the subsequent code cells.
estimator.fit({'train': s3_data_url}, wait=True)
Define hyperparameters and run tuning jobs
So far, we have trained the model with one set of hyperparameter values. But were those values good? Or could we look for better ones? Let’s use the HyperparameterTuner class to run a systematic search over the hyperparameter space. How do we search this space with the tuner? The necessary parameters are the objective metric name and objective type that will guide the optimization. The optimization strategy is another key argument for the tuner because it further defines the search space. The following are four different strategies to choose from:

Grid search
Random search
Bayesian optimization (default)
Hyperband

We further describe these strategies and equip you with some guidance to choose one later in this post.
Before we define and run our tuner object, let’s recap our understanding from an architecture perspective. We covered the architectural overview of SageMaker AMT in our last post and reproduce an excerpt of it here for convenience.

We can choose which hyperparameters we want to tune and which to leave static. For the tunable hyperparameters, we provide hyperparameter_ranges that define the value ranges to explore. Because we use a Random Forest classifier, we have taken the hyperparameters from the Scikit-learn Random Forest documentation.
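The tuner configuration shown below references hpt_ranges, which isn't reproduced elsewhere in this post. The following is a minimal sketch of how such ranges could be defined for our Random Forest hyperparameters; the specific bounds are assumptions, so adapt them to your own search space.

from sagemaker.tuner import IntegerParameter, ContinuousParameter, CategoricalParameter

# Value ranges for the tunable hyperparameters of our RandomForestClassifier
hpt_ranges = {
    'n-estimators': IntegerParameter(1, 200),
    'max-depth': IntegerParameter(1, 20),
    'min-samples-leaf': IntegerParameter(1, 10),
    'max-features': CategoricalParameter(['sqrt', 'log2']),
    'min-weight-fraction-leaf': ContinuousParameter(0.0, 0.5)
}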
We also limit resources with the maximum number of training jobs and parallel training jobs the tuner can use. We will see how these limits help us compare the results of various strategies with each other.

tuner_parameters = {
    'estimator': estimator,
    'base_tuning_job_name': 'random',
    'metric_definitions': metric_definitions,
    'objective_metric_name': 'valid-f1',
    'objective_type': 'Maximize',
    'hyperparameter_ranges': hpt_ranges,
    'strategy': 'Random',
    'max_jobs': n,  # 50
    'max_parallel_jobs': k  # 2
}

Similar to the Estimator’s fit function, we start a tuning job by calling the tuner’s fit:

random_tuner = HyperparameterTuner(**tuner_parameters)
random_tuner.fit({'train': s3_data_url}, wait=False)

This is all we have to do to let SageMaker run the training jobs (n=50) in the background, each using a different set of hyperparameters. We explore the results later in this post. But before that, let’s start another tuning job, this time applying the Bayesian optimization strategy. We will compare both strategies visually after their completion.

tuner_parameters['strategy'] = 'Bayesian'
tuner_parameters['base_tuning_job_name'] = 'bayesian'
bayesian_tuner = HyperparameterTuner(**tuner_parameters)
bayesian_tuner.fit({'train': s3_data_url}, wait=False)

Note that both tuner jobs can run in parallel because SageMaker orchestrates the required compute instances independently of each other. That’s quite helpful for practitioners who experiment with different approaches at the same time, like we do here.
Choose between SageMaker HPO strategies
When it comes to tuning strategies, you have a few options with SageMaker AMT: grid search, random search, Bayesian optimization, and Hyperband. These strategies determine how the automatic tuning algorithms explore the specified ranges of hyperparameters.
Random search is pretty straightforward. It randomly selects combinations of values from the specified ranges and can be run in a sequential or parallel manner. It’s like throwing darts blindfolded, hoping to hit the target. We have started with this strategy, but will the results improve with another one?
Bayesian optimization takes a different approach than random search. It considers the history of previous selections and chooses values that are likely to yield the best results. If you want to learn from previous explorations, you can achieve this only by running a new tuning job after the previous ones have finished. Makes sense, right? In this way, Bayesian optimization is dependent on the previous runs. But do you see which HPO strategy allows for higher parallelization?
Hyperband is an interesting one! It uses a multi-fidelity strategy, which means it dynamically allocates resources to the most promising training jobs and stops those that are underperforming. Therefore, Hyperband is computationally efficient with resources, learning from previous training jobs. After stopping the underperforming configurations, a new configuration starts, and its values are chosen randomly.
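If you want to try Hyperband yourself, a minimal variation of the tuner setup from earlier might look like the following sketch; whether it pays off depends on your training job durations, and the early stopping of trials is managed by AMT.

# Reuse the shared tuner_parameters dictionary and switch only the strategy
tuner_parameters['strategy'] = 'Hyperband'
tuner_parameters['base_tuning_job_name'] = 'hyperband'

hyperband_tuner = HyperparameterTuner(**tuner_parameters)
hyperband_tuner.fit({'train': s3_data_url}, wait=False)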
Depending on your needs and the nature of your model, you can choose between random search, Bayesian optimization, or Hyperband as your tuning strategy. Each has its own approach and advantages, so it’s important to consider which one works best for your ML exploration. The good news for ML practitioners is that you can select the best HPO strategy by visually comparing the impact of each trial on the objective metric. In the next section, we see how to visually identify the impact of different strategies.
Visualize, analyze, and compare tuning results
When our tuning jobs are complete, it gets exciting. What results do they deliver? What kind of boost can we expect on our metric compared to the base model? What are the best-performing hyperparameters for our use case?
A quick and straightforward way to view the HPO results is by visiting the SageMaker console. Under Hyperparameter tuning jobs, we can see (per tuning job) the combination of hyperparameter values that have been tested and delivered the best performance as measured by our objective metric (valid-f1).
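If you prefer to stay in the notebook, the same information can be pulled into a pandas DataFrame with the SageMaker SDK, as in the following short sketch for our random search tuner.

# Fetch all trials of the tuning job and sort them by the objective metric
df = random_tuner.analytics().dataframe()
print(df.sort_values('FinalObjectiveValue', ascending=False).head())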

Is that all you need? As an ML practitioner, you may not only be interested in those values, but will certainly want to learn more about the inner workings of your model to explore its full potential and strengthen your intuition with empirical feedback.
A good visualization tool can greatly help you understand the improvement by HPO over time and get empirical feedback on design decisions of your ML model. It shows the impact of each individual hyperparameter on your objective metric and provides guidance to further optimize your tuning results.
We use the amtviz custom visualization package to visualize and analyze tuning jobs. It’s straightforward to use and provides helpful features. We demonstrate its benefit by interpreting some individual charts, and finally comparing random search side by side with Bayesian optimization.
First, let’s create a visualization for random search. We can do this by calling visualize_tuning_job() from amtviz and passing our first tuner object as an argument:

from amtviz import visualize_tuning_job
visualize_tuning_job(random_tuner, advanced=True, trials_only=True)

You will see a couple of charts, but let’s take it step by step. The first scatter plot from the output looks like the following and already gives us some visual clues we wouldn’t recognize in any table.

Each dot represents the performance of an individual training job (our objective valid-f1 on the y-axis) based on its start time (x-axis), produced by a specific set of hyperparameters. Therefore, we look at the performance of our model as it progresses over the duration of the tuning job.
The dotted line highlights the best result found so far and indicates improvement over time. The best two training jobs achieved an F1 score of around 0.91.
Besides the dotted line showing the cumulative progress, do you see a trend in the chart?
Probably not. And this is expected, because we’re viewing the results of the random HPO strategy. Each training job was run using a different but randomly selected set of hyperparameters. If we continued our tuning job (or ran another one with the same setting), we would probably see some better results over time, but we can’t be sure. Randomness is a tricky thing.
The next charts help you gauge the influence of hyperparameters on the overall performance. All hyperparameters are visualized, but for the sake of brevity, we focus on two of them: n-estimators and max-depth.

Our top two training jobs were using n-estimators of around 20 and 80, and max-depth of around 10 and 18, respectively. The exact hyperparameter values are displayed via tooltip for each dot (training job). They are even dynamically highlighted across all charts and give you a multi-dimensional view! Did you see that? Each hyperparameter is plotted against the objective metric, as a separate chart.
Now, what kind of insights do we get about n-estimators?
Based on the left chart, it seems that very low value ranges (below 10) more often deliver poor results compared to higher values. Therefore, higher values may help your model to perform better—interesting.
In contrast, the correlation of the max-depth hyperparameter to our objective metric is rather low. We can’t clearly tell which value ranges are performing better from a general perspective.
In summary, random search can help you find a well-performing set of hyperparameters even in a relatively short amount of time. Also, it isn’t biased towards a good solution but gives a balanced view of the search space. Your resource utilization, however, might not be very efficient. It continues to run training jobs with hyperparameters in value ranges that are known to deliver poor results.
Let’s examine the results of our second tuning job using Bayesian optimization. We can use amtviz to visualize the results in the same way as we did so far for the random search tuner. Or, even better, we can use the capability of the function to compare both tuning jobs in a single set of charts. Quite handy!

visualize_tuning_job([random_tuner, bayesian_tuner], advanced=True, trials_only=True)

There are more dots now because we visualize the results of all training jobs for both the random search (orange dots) and Bayesian optimization (blue dots). On the right side, you can see a density chart visualizing the distribution of all F1-scores. A majority of the training jobs achieved results in the upper part of the F1 scale (over 0.6)—that’s good!
What is the key takeaway here? The scatter plot clearly shows the benefit of Bayesian optimization. It delivers better results over time because it can learn from previous runs. That’s why we achieved significantly better results using Bayesian compared to random (0.967 vs. 0.919) with the same number of training jobs.
There is even more you can do with amtviz. Let’s drill in.
If you give SageMaker AMT the instruction to run a larger number of jobs for tuning, seeing many trials at once can get messy. That’s one of the reasons why we made these charts interactive. You can click and drag on every hyperparameter scatter plot to zoom in to certain value ranges and refine your visual interpretation of the results. All other charts are automatically updated. That’s pretty helpful, isn’t it? See the next charts as an example and try it for yourself in your notebook!

As a tuning maximalist, you may also decide that running another hyperparameter tuning job could further improve your model performance. But this time, a more specific range of hyperparameter values can be explored because you already know (roughly) where to expect better results. For example, you may choose to focus on values between 100–200 for n-estimators, as shown in the chart. This lets AMT focus on the most promising training jobs and increases your tuning efficiency.
To sum it up, amtviz provides you with a rich set of visualization capabilities that allow you to better understand the impact of your model’s hyperparameters on performance and enable smarter decisions in your tuning activities.
Continue the exploration of the hyperparameter space and warm start HPO jobs
We have seen that AMT helps us explore the hyperparameter search space efficiently. But what if we need multiple rounds of tuning to iteratively improve our results? As mentioned in the beginning, we want to establish an optimization feedback cycle—our “conversation” with the model. Do we need to start from scratch every time?
Let’s look into the concept of running a warm start hyperparameter tuning job. Instead of initiating new tuning jobs from scratch, it reuses what has been learned in the previous HPO runs. This helps us be more efficient with our tuning time and compute resources. We can further iterate on top of our previous results. To use warm starts, we create a WarmStartConfig and specify warm_start_type as IDENTICAL_DATA_AND_ALGORITHM. This means that we change the hyperparameter values but we don’t change the data or algorithm. We tell AMT to transfer the previous knowledge to our new tuning job.
By referring to our previous Bayesian optimization and random search tuning jobs as parents, we can use them both for the warm start:

warm_start_config = WarmStartConfig(warm_start_type=WarmStartTypes.IDENTICAL_DATA_AND_ALGORITHM,
                                    parents=[bayesian_tuner_name, random_tuner_name])
tuner_parameters['warm_start_config'] = warm_start_config
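The parent job names referenced above and the launch of the warm start tuning job aren't shown in this excerpt. The following sketch illustrates one way to obtain them and start the new job, assuming both earlier tuners have finished; the base tuning job name is an arbitrary choice.

# The parent names used in the WarmStartConfig can be taken from the completed tuner objects
random_tuner_name = random_tuner.latest_tuning_job.name
bayesian_tuner_name = bayesian_tuner.latest_tuning_job.name

# tuner_parameters now carries the warm_start_config, so start the new tuning job as before
tuner_parameters['base_tuning_job_name'] = 'warmstart'
warm_start_tuner = HyperparameterTuner(**tuner_parameters)
warm_start_tuner.fit({'train': s3_data_url}, wait=False)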

To see the benefit of using warm starts, refer to the following charts. These are generated by amtviz in a similar way as we did earlier, but this time we have added another tuning job based on a warm start.

In the left chart, we can observe that new tuning jobs mostly lie in the upper-right corner of the performance metric graph (see dots marked in orange). The warm start has indeed reused the previous results, which is why those data points are in the top results for F1 score. This improvement is also reflected in the density chart on the right.
In other words, AMT automatically selects promising sets of hyperparameter values based on its knowledge from previous trials. This is shown in the next chart. For example, the algorithm would test a low value for n-estimators less often because these are known to produce poor F1 scores. We don’t waste any resources on that, thanks to warm starts.

Clean up
To avoid incurring unwanted costs when you’re done experimenting with HPO, you must remove all files in your S3 bucket with the prefix amt-visualize-demo and also shut down SageMaker Studio resources.
Run the following code in your notebook to remove all S3 files from this post:

!aws s3 rm s3://{BUCKET}/amt-visualize-demo --recursive

If you wish to keep the datasets or the model artifacts, you may modify the prefix in the code to amt-visualize-demo/data to only delete the data or amt-visualize-demo/output to only delete the model artifacts.
Conclusion
We have learned how the art of building ML solutions involves exploring and optimizing hyperparameters. Adjusting those knobs and levers is a demanding yet rewarding process that leads to faster training times, improved model accuracy, and overall better ML solutions. The SageMaker AMT functionality helps us run multiple tuning jobs and warm start them, and provides data points for further review, visual comparison, and analysis.
In this post, we looked into HPO strategies that we use with SageMaker AMT. We started with random search, a straightforward but performant strategy where hyperparameters are randomly sampled from a search space. Next, we compared the results to Bayesian optimization, which uses probabilistic models to guide the search for optimal hyperparameters. After we identified a suitable HPO strategy and good hyperparameter value ranges through initial trials, we showed how to use warm starts to streamline future HPO jobs.
You can explore the hyperparameter search space by comparing quantitative results. We have suggested the side-by-side visual comparison and provided the necessary package for interactive exploration. Let us know in the comments how helpful it was for you on your hyperparameter tuning journey!

About the authors
Ümit Yoldas is a Senior Solutions Architect with Amazon Web Services. He works with enterprise customers across industries in Germany. He’s driven to translate AI concepts into real-world solutions. Outside of work, he enjoys time with family, savoring good food, and pursuing fitness.
Elina Lesyk is a Solutions Architect located in Munich. She is focusing on enterprise customers from the financial services industry. In her free time, you can find Elina building applications with generative AI at some IT meetups, driving a new idea on fixing climate change fast, or running in the forest to prepare for a half-marathon with a typical deviation from the planned schedule.
Mariano Kamp is a Principal Solutions Architect with Amazon Web Services. He works with banks and insurance companies in Germany on machine learning. In his spare time, Mariano enjoys hiking with his wife.

Intel Researchers Propose a New Artificial Intelligence Approach to De …

Large Language Models (LLMs) have taken the world by storm because of their remarkable performances and potential across a diverse range of tasks. They are best known for their capabilities in text generation, language understanding, text summarization and many more. The downside to their widespread adoption is the astronomical size of their model parameters, which requires significant memory capacity and specialized hardware for inference. As a result, deploying these models has been quite challenging.

One way the computational power required for inference could be reduced is by using quantization methods, i.e., reducing the precision of the weights and activation functions of an artificial neural network. INT8 and weight-only quantization are a couple of ways the inference cost can be reduced. These methods, however, are generally optimized for CUDA and may not necessarily work on CPUs.

The authors of this research paper from Intel have proposed an effective way of efficiently deploying LLMs on CPUs. Their approach supports an automatic INT4 weight-only quantization flow, in which low precision is applied to the model weights only while the activations are kept at higher precision. They have also designed a dedicated LLM runtime with highly optimized kernels that accelerate the inference process on CPUs.
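The paper relies on Intel's own tooling, but the core idea of group-wise INT4 weight-only quantization can be illustrated with a small, library-agnostic sketch. This is a toy NumPy example for intuition only, not the authors' implementation.

import numpy as np

def quantize_int4_groupwise(weights, group_size=32):
    # Symmetric 4-bit weight-only quantization with one scale per group of weights.
    # Assumes the number of weights is divisible by group_size.
    flat = weights.reshape(-1, group_size)
    scales = np.abs(flat).max(axis=1, keepdims=True) / 7.0  # map the largest magnitude to +/-7
    q = np.clip(np.round(flat / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize(q, scales, shape):
    return (q.astype(np.float32) * scales).reshape(shape)

w = np.random.randn(4, 64).astype(np.float32)  # toy weight matrix
q, s = quantize_int4_groupwise(w)
w_hat = dequantize(q, s, w.shape)
print('max reconstruction error:', np.abs(w - w_hat).max())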

The quantization flow is developed on the basis of Intel Neural Compressor and allows for tuning different quantization recipes, granularities, and group sizes to generate an INT4 model that meets the accuracy target. The model is then passed to the LLM runtime, a specialized environment designed to evaluate the performance of the quantized model. The runtime has been designed to provide efficient inference of LLMs on CPUs.

For their experiments, the researchers selected popular LLMs with a diverse range of parameter sizes, from 7B to 20B. They evaluated the performance of FP32 and INT4 models using open-source datasets. They observed that the accuracy of the quantized models on the selected datasets was nearly at par with that of the FP32 models. Additionally, they did a comparative analysis of the latency of next-token generation and found that the LLM runtime outperforms the ggml-based solution by up to 1.6 times.

In conclusion, this research paper presents a solution to one of the biggest challenges associated with LLMs, i.e., inference on CPUs. Traditionally, these models require specialized hardware like GPUs, which renders them inaccessible for many organizations. This paper presents INT4 model quantization along with a specialized LLM runtime to provide efficient inference of LLMs on CPUs. When evaluated on a set of popular LLMs, the method demonstrated an advantage over ggml-based solutions and gave an accuracy on par with that of FP32 models. There is, however, scope for further improvement, and the researchers plan on empowering generative AI on PCs to meet the growing demands of AI-generated content.

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.


Meta & GeorgiaTech Researchers Release a New Dataset and Associate …

The global community faces a challenge in tackling the impact of rising carbon dioxide (CO2) levels on climate change. To address this, innovative technologies are being developed, and Direct Air Capture (DAC) is one of the most important approaches. DAC involves capturing CO2 directly from the atmosphere, and its implementation is crucial in the fight against climate change. However, the high costs associated with DAC have hindered its widespread adoption.

An important aspect of DAC is its reliance on sorbent materials, and among the various options, Metal-Organic Frameworks (MOFs) have gained attention. MOFs offer advantages such as modularity, flexibility, and tunability. In contrast to conventional absorbent materials that require a lot of energy to be regenerated, MOFs offer a more energy-efficient alternative by allowing regeneration at lower temperatures. This makes MOFs a promising and environmentally friendly choice for various applications.

However, identifying suitable sorbents for DAC is a complex task due to the vast chemical space to explore and the need to understand material behavior under different humidity and temperature conditions. Humidity, in particular, poses a significant challenge, as it can affect adsorption and lead to sorbent degradation over time.

In response to this challenge, the OpenDAC project has emerged as a collaborative research effort between Fundamental AI Research (FAIR) at Meta and Georgia Tech. The primary goal of OpenDAC is to significantly reduce the cost of DAC by identifying novel sorbents — materials capable of efficiently pulling CO2 from the air. Discovering such sorbents is key to making DAC economically viable and scalable.

The researchers performed extensive research, resulting in the creation of the OpenDAC 2023 (ODAC23) dataset. This dataset is a compilation of over 38 million density functional theory (DFT) calculations on more than 8,800 MOF materials, encompassing adsorbed CO2 and H2O. ODAC23 is the largest dataset of MOF adsorption calculations at the DFT level, offering valuable insights into the properties and structural relaxation of MOFs.

Also, OpenDAC released the ODAC23 dataset to the broader research community and the emerging DAC industry. The aim is to foster collaboration and provide a foundational resource for developing machine learning (ML) models. 

Researchers can identify promising MOFs more easily by approximating DFT-level calculations with machine learning models trained on the ODAC23 dataset.

In conclusion, the OpenDAC project represents a significant advancement in improving Direct Air Capture’s (DAC) affordability and accessibility. By leveraging Metal-Organic Frameworks (MOF) strengths and employing cutting-edge computational methods, OpenDAC is well-positioned to drive progress in carbon capture technology. The ODAC23 dataset, now open to the public, marks a contribution to the collective effort to combat climate change, offering a wealth of information beyond DAC applications.

Check out the Paper and Project. All credit for this research goes to the researchers of this project.


Future-Proofing Our Interns: Cultivating the Next Generation Amidst AI …

During my teaching at the Asia Pacific ESSEC Master in Management on the vibrant Singapore campus, we delved into a pivotal discussion about the intersection of AI and sustainability. It was in this forum that we unpacked the necessity of demystifying technology to see AI for its true capabilities—nothing more, nothing less.

Consider the insights of Harvard Business School’s own Karim Lakhani, who eloquently states, “The human-like responses are a statistical illusion.” Lakhani peels back the veneer, revealing that what appears as sentient interaction is nothing but “a statistical or computational illusion,” a mimicry born from extensive digestion of our human texts and videos[1]. This resonates with my longstanding perspective: AI, when harnessed with finesse, is not a usurper of roles but a powerful ally to human capability[2].

Yet, we must tread with caution. The very advancement that promises augmentation can, if left unchecked, pose significant repercussions for the burgeoning minds of future generations. The task at hand is not to shy away from these tools of incredible potential but to engage with them intelligently, ensuring that we steer the helm of AI towards augmenting human potential, not diminishing it.

As the digital age accelerates, AI’s burgeoning role in business uncovers an urgent need for a radical reimagining of intern and junior training. The advent of Large Language Models (LLMs) offers a stark warning—let’s not render our juniors obsolete. Consider this: these LLMs, the sprightly interns of the virtual realm, are already commandeering tasks once reserved for human neophytes. They draft memos, spruce up presentations, and never clock out, all while consuming only the ‘token money’ of computing resources. They’re dazzling in their efficiency, but let’s be clear: they are no substitute for the human intellect’s seasoning—the logic, reasoning, and creativity that only humans can provide.

In the bustling ecosystem of corporate growth, where nurturing newbies is akin to fostering delicate saplings in a forest of towering oaks, the intern’s role is pivotal. They’re the sponge, absorbing not just the technical know-how but also the nuanced dance of corporate culture. It’s in these early career days that the bedrock of their professional journey is laid, shaping the trajectory of their growth.

Yet, this indispensable induction faces a threat from AI, which can effortlessly automate the errands and even the analytics once reserved for the wide-eyed interns. If LLMs, which mimic human dialogue with the ease of a Jane Austen protagonist, can manage what was once an intern’s proving ground, how do we adapt? Google and its ilk have pioneered training programs that attempt to bridge the gap, but the real transformation lies in morphing the initial career years into a live-fire exercise—an ‘execution classroom’ where learning is doing, and doing begets learning.

Herein lies the crux: the narrative of technology as a harbinger of doom for jobs is not entirely accurate. The fear that AI will supplant human roles is as overcooked as the notion that LLMs possess boundless wisdom. While it’s true they can generate responses that uncannily resemble human banter, the reality is they’re adept at stitching words together, nothing more. They’re not the sages of Silicon Valley; they’re the algorithmic illusionists, capable of making data dance but still falling short of human sapience.

This brings us to a dichotomy in our expectations. We impose on machines the lofty standards of Asimov’s first law of robotics—to cause no harm—while simultaneously forgiving the LLMs for their errors, enchanted by their conversational prowess. We scorn the rare slip-up of an autonomous vehicle but chuckle at the eccentricities of a chatbot’s flawed poetry.

So, what’s the call to action? As we integrate AI into our businesses, we must revisit the intern playbook. We must infuse our training with humanity, nurture patience, and offer a scaffolding of experiences that no algorithm could replicate. Our goal? To arm our interns with the skills to thrive alongside AI, ensuring that as the corporate ladder evolves, it remains a climb towards enlightenment, not a chute into irrelevance.

In conclusion, while I debunk AI myths in my lectures—clarifying that LLMs are sophisticated but not infallible and that AI is a tool for augmentation rather than replacement—I emphasize a deeper message. We must evolve our corporate education, not as a knee-jerk reaction to technology but as a strategic embrace of it, ensuring that our juniors become the resilient architects of tomorrow’s enterprise and not its casualties. Bots may take the coffee runs, but the boardroom seats? Let’s reserve those for the humans who’ve learned to dance with the machines.

References

[1] Lakhani, Karim. “AI Won’t Replace Humans — But Humans With AI Will Replace Humans Without AI.” Interview by Adi Ignatius. Harvard Business Review, August 4, 2023. https://hbr.org/2023/08/ai-wont-replace-humans-but-humans-with-ai-will-replace-humans-without-ai. 

[2] Dell’Acqua, Fabrizio and McFowland, Edward and Mollick, Ethan R. and Lifshitz-Assaf, Hila and Kellogg, Katherine and Rajendran, Saran and Krayer, Lisa and Candelon, François and Lakhani, Karim R., Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality (September 15, 2023). Harvard Business School Technology & Operations Mgt. Unit Working Paper No. 24-013, Available at SSRN: https://ssrn.com/abstract=4573321 or http://dx.doi.org/10.2139/ssrn.4573321

Promote pipelines in a multi-environment setup using Amazon SageMaker …

Building out a machine learning operations (MLOps) platform in the rapidly evolving landscape of artificial intelligence (AI) and machine learning (ML) for organizations is essential for seamlessly bridging the gap between data science experimentation and deployment while meeting the requirements around model performance, security, and compliance.
To fulfill regulatory and compliance needs, the key requirements when designing such a platform are:

Address data drift
Monitor model performance
Facilitate automatic model retraining
Provide a process for model approval
Keep models in a secure environment

In this post, we show how to create an MLOps framework to address these needs while using a combination of AWS services and third-party toolsets. The solution entails a multi-environment setup with automated model retraining, batch inference, and monitoring with Amazon SageMaker Model Monitor, model versioning with SageMaker Model Registry, and a CI/CD pipeline to facilitate promotion of ML code and pipelines across environments by using Amazon SageMaker, Amazon EventBridge, Amazon Simple Notification Service (Amazon SNS), HashiCorp Terraform, GitHub, and Jenkins CI/CD. We build a model to predict the severity (benign or malignant) of a mammographic mass lesion trained with the XGBoost algorithm using the publicly available UCI Mammography Mass dataset and deploy it using the MLOps framework. The full instructions with code are available in the GitHub repository.
Solution overview
The following architecture diagram shows an overview of the MLOps framework with the following key components:

Multi-account strategy – Two different environments (dev and prod) are set up in two different AWS accounts following the AWS Well-Architected best practices, and a third account is set up to host the central model registry:

Dev environment – Where an Amazon SageMaker Studio domain is set up to allow model development, model training, and testing of ML pipelines (train and inference), before a model is ready to be promoted to higher environments.
Prod environment – Where the ML pipelines from dev are promoted as a first step, and then scheduled and monitored over time.
Central model registry – Amazon SageMaker Model Registry is set up in a separate AWS account to track model versions generated across the dev and prod environments.

CI/CD and source control – The deployment of ML pipelines across environments is handled through CI/CD set up with Jenkins, along with version control handled through GitHub. Code changes merged to the corresponding environment git branch trigger a CI/CD workflow to make appropriate changes to the given target environment.
Batch predictions with model monitoring – The inference pipeline built with Amazon SageMaker Pipelines runs on a scheduled basis to generate predictions along with model monitoring using SageMaker Model Monitor to detect data drift.
Automated retraining mechanism – The training pipeline built with SageMaker Pipelines is triggered whenever a data drift is detected in the inference pipeline. After it’s trained, the model is registered into the central model registry to be approved by a model approver. When it’s approved, the updated model version is used to generate predictions through the inference pipeline.
Infrastructure as code – The infrastructure as code (IaC), created using HashiCorp Terraform, supports the scheduling of the inference pipeline with EventBridge, triggering of the train pipeline based on an EventBridge rule and sending notifications using Amazon Simple Notification Service (Amazon SNS) topics.

The MLOps workflow includes the following steps:

Access the SageMaker Studio domain in the development account, clone the GitHub repository, go through the process of model development using the sample model provided, and generate the train and inference pipelines.
Run the train pipeline in the development account, which generates the model artifacts for the trained model version and registers the model into SageMaker Model Registry in the central model registry account.
Approve the model in SageMaker Model Registry in the central model registry account.
Push the code (train and inference pipelines, and the Terraform IaC code to create the EventBridge schedule, EventBridge rule, and SNS topic) into a feature branch of the GitHub repository. Create a pull request to merge the code into the main branch of the GitHub repository.
Trigger the Jenkins CI/CD pipeline, which is set up with the GitHub repository. The CI/CD pipeline deploys the code into the prod account to create the train and inference pipelines along with Terraform code to provision the EventBridge schedule, EventBridge rule, and SNS topic.
The inference pipeline is scheduled to run on a daily basis, whereas the train pipeline is set up to run whenever data drift is detected from the inference pipeline.
Notifications are sent through the SNS topic whenever there is a failure with either the train or inference pipeline.

Prerequisites
For this solution, you should have the following prerequisites:

Three AWS accounts (dev, prod, and central model registry accounts)
A SageMaker Studio domain set up in each of the three AWS accounts (see Onboard to Amazon SageMaker Studio or watch the video Onboard Quickly to Amazon SageMaker Studio for setup instructions)
Jenkins (we use Jenkins 2.401.1) with administrative privileges installed on AWS
Terraform version 1.5.5 or later installed on Jenkins server

For this post, we work in the us-east-1 Region to deploy the solution.
Provision KMS keys in dev and prod accounts
Our first step is to create AWS Key Management Service (AWS KMS) keys in the dev and prod accounts.
Create a KMS key in the dev account and give access to the prod account
Complete the following steps to create a KMS key in the dev account:

On the AWS KMS console, choose Customer managed keys in the navigation pane.
Choose Create key.
For Key type, select Symmetric.
For Key usage, select Encrypt and decrypt.
Choose Next.
Enter the production account number to give the production account access to the KMS key provisioned in the dev account. This is a required step because the first time the model is trained in the dev account, the model artifacts are encrypted with the KMS key before being written to the S3 bucket in the central model registry account. The production account needs access to the KMS key in order to decrypt the model artifacts and run the inference pipeline.
Choose Next and finish creating your key.

After the key is provisioned, it should be visible on the AWS KMS console.
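If you prefer to script this step rather than use the console, the following boto3 sketch creates a symmetric encrypt/decrypt key in the dev account and attaches a key policy that lets the prod account use it. The policy statements and account ID placeholders are illustrative assumptions; adjust them to your environment.

import json
import boto3

kms = boto3.client("kms", region_name="us-east-1")

# Create a symmetric encrypt/decrypt key in the dev account
key = kms.create_key(Description="KMS key for mammography model artifacts")
key_id = key["KeyMetadata"]["KeyId"]

# Key policy granting the dev account full control and the prod account usage permissions
key_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "EnableDevAccountAdmin",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::<dev-account-id>:root"},
            "Action": "kms:*",
            "Resource": "*"
        },
        {
            "Sid": "AllowProdAccountUse",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::<prod-account-id>:root"},
            "Action": [
                "kms:Encrypt",
                "kms:Decrypt",
                "kms:ReEncrypt*",
                "kms:GenerateDataKey*",
                "kms:DescribeKey"
            ],
            "Resource": "*"
        }
    ]
}

kms.put_key_policy(KeyId=key_id, PolicyName="default", Policy=json.dumps(key_policy))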

Create a KMS key in the prod account
Go through the same steps in the previous section to create a customer managed KMS key in the prod account. You can skip the step to share the KMS key to another account.
Set up a model artifacts S3 bucket in the central model registry account
Create an S3 bucket of your choice with the string sagemaker in the naming convention as part of the bucket’s name in the central model registry account, and update the bucket policy on the S3 bucket to give permissions from both the dev and prod accounts to read and write model artifacts into the S3 bucket.
The following code is the bucket policy to be updated on the S3 bucket:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AddPerm",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<dev-account-id>:root"
            },
            "Action": [
                "s3:PutObject",
                "s3:PutObjectAcl",
                "s3:GetObject",
                "s3:GetObjectVersion"
            ],
            "Resource": "arn:aws:s3:::<s3-bucket-in-central-model-registry-account>/*"
        },
        {
            "Sid": "AddPerm1",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<dev-account-id>:root"
            },
            "Action": "s3:ListBucket",
            "Resource": [
                "arn:aws:s3:::<s3-bucket-in-central-model-registry-account>",
                "arn:aws:s3:::<s3-bucket-in-central-model-registry-account>/*"
            ]
        },
        {
            "Sid": "AddPerm2",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<prod-account-id>:root"
            },
            "Action": [
                "s3:PutObject",
                "s3:PutObjectAcl",
                "s3:GetObject",
                "s3:GetObjectVersion"
            ],
            "Resource": "arn:aws:s3:::<s3-bucket-in-central-model-registry-account>/*"
        },
        {
            "Sid": "AddPerm3",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<prod-account-id>:root"
            },
            "Action": "s3:ListBucket",
            "Resource": [
                "arn:aws:s3:::<s3-bucket-in-central-model-registry-account>",
                "arn:aws:s3:::<s3-bucket-in-central-model-registry-account>/*"
            ]
        }
    ]
}
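If you prefer to attach this policy programmatically instead of through the Amazon S3 console, a sketch such as the following works; it assumes you have saved the policy document above locally as bucket_policy.json.

import json
import boto3

# Assumes the bucket policy shown above is saved locally as bucket_policy.json
with open("bucket_policy.json") as f:
    bucket_policy = json.load(f)

s3 = boto3.client("s3")
s3.put_bucket_policy(
    Bucket="<s3-bucket-in-central-model-registry-account>",
    Policy=json.dumps(bucket_policy),
)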

Set up IAM roles in your AWS accounts
The next step is to set up AWS Identity and Access Management (IAM) roles in your AWS accounts with permissions for AWS Lambda, SageMaker, and Jenkins.
Lambda execution role
Set up Lambda execution roles in the dev and prod accounts, which will be used by the Lambda function that runs as part of the SageMaker Pipelines Lambda step. This step runs from the inference pipeline to fetch the latest approved model, which is then used to generate inferences. Create IAM roles in the dev and prod accounts with the naming convention arn:aws:iam::<account-id>:role/lambda-sagemaker-role and attach the following IAM policies:

Policy 1 – Create an inline policy named cross-account-model-registry-access, which gives access to the model package set up in the model registry in the central account:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "sagemaker:ListModelPackages",
            "Resource": "arn:aws:sagemaker:us-east-1:<central-model-registry-account-id>:model-package/mammo-severity-model-package/*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "sagemaker:DescribeModelPackageGroup",
            "Resource": "arn:aws:sagemaker:us-east-1:<central-model-registry-account-id>:model-package-group/mammo-severity-model-package"
        }
    ]
}

Policy 2 – Attach AmazonSageMakerFullAccess, which is an AWS managed policy that grants full access to SageMaker. It also provides select access to related services, such as AWS Application Auto Scaling, Amazon S3, Amazon Elastic Container Registry (Amazon ECR), and Amazon CloudWatch Logs.
Policy 3 – Attach AWSLambda_FullAccess, which is an AWS managed policy that grants full access to Lambda, Lambda console features, and other related AWS services.
Policy 4 – Use the following IAM trust policy for the IAM role:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "lambda.amazonaws.com",
                    "sagemaker.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
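For context on why the role needs sagemaker:ListModelPackages, the Lambda function in the inference pipeline typically fetches the latest approved model version along these lines. This is a simplified sketch, not the exact code shipped in the solution's repository.

import boto3

sm_client = boto3.client("sagemaker", region_name="us-east-1")

# List approved versions in the shared model package group, newest first
response = sm_client.list_model_packages(
    ModelPackageGroupName="arn:aws:sagemaker:us-east-1:<central-model-registry-account-id>:model-package-group/mammo-severity-model-package",
    ModelApprovalStatus="Approved",
    SortBy="CreationTime",
    SortOrder="Descending",
    MaxResults=1,
)
latest_approved_arn = response["ModelPackageSummaryList"][0]["ModelPackageArn"]
print(latest_approved_arn)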

SageMaker execution role
The SageMaker Studio domains set up in the dev and prod accounts should each have an execution role associated, which can be found on the Domain settings tab on the domain details page, as shown in the following screenshot. This role is used to run training jobs, processing jobs, and more within the SageMaker Studio domain.

Add the following policies to the SageMaker execution role in both accounts:

Policy 1 – Create an inline policy named cross-account-model-artifacts-s3-bucket-access, which gives access to the S3 bucket in the central model registry account that stores the model artifacts:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:GetObjectVersion"
            ],
            "Resource": "arn:aws:s3:::<s3-bucket-in-central-model-registry-account>/*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::<s3-bucket-in-central-model-registry-account>",
                "arn:aws:s3:::<s3-bucket-in-central-model-registry-account>/*"
            ]
        }
    ]
}

Policy 2 – Create an inline policy named cross-account-model-registry-access, which gives access to the model package in the model registry in the central model registry account:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "sagemaker:CreateModelPackageGroup",
            "Resource": "arn:aws:sagemaker:us-east-1:<central-model-registry-account-id>:model-package-group/mammo-severity-model-package"
        }
    ]
}

Policy 3 – Create an inline policy named kms-key-access-policy, which gives access to the KMS key created in the previous step. Provide the account ID in which the policy is being created and the KMS key ID created in that account.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowUseOfKeyInThisAccount",
            "Effect": "Allow",
            "Action": [
                "kms:Encrypt",
                "kms:Decrypt",
                "kms:ReEncrypt*",
                "kms:GenerateDataKey*",
                "kms:DescribeKey"
            ],
            "Resource": "arn:aws:kms:us-east-1:<account-id>:key/<kms-key-id>"
        }
    ]
}

Policy 4 – Attach AmazonSageMakerFullAccess, which is an AWS managed policy that grants full access to SageMaker and select access to related services.
Policy 5 – Attach AWSLambda_FullAccess, which is an AWS managed policy that grants full access to Lambda, Lambda console features, and other related AWS services.
Policy 6 – Attach CloudWatchEventsFullAccess, which is an AWS managed policy that grants full access to CloudWatch Events.
Policy 7 – Add the following IAM trust policy for the SageMaker execution IAM role:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "events.amazonaws.com",
                    "sagemaker.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

Policy 8 (specific to the SageMaker execution role in the prod account) – Create an inline policy named cross-account-kms-key-access-policy, which gives access to the KMS key created in the dev account. This is required because the inference pipeline reads model artifacts from the central model registry account, and those artifacts are encrypted with the dev account's KMS key when the first version of the model is created from the dev account.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowUseOfKeyInDevAccount",
            "Effect": "Allow",
            "Action": [
                "kms:Encrypt",
                "kms:Decrypt",
                "kms:ReEncrypt*",
                "kms:GenerateDataKey*",
                "kms:DescribeKey"
            ],
            "Resource": "arn:aws:kms:us-east-1:<dev-account-id>:key/<dev-kms-key-id>"
        }
    ]
}

Cross-account Jenkins role
Set up an IAM role called cross-account-jenkins-role in the prod account, which Jenkins will assume to deploy ML pipelines and corresponding infrastructure into the prod account.
Add the following managed IAM policies to the role:

CloudWatchFullAccess
AmazonS3FullAccess
AmazonSNSFullAccess
AmazonSageMakerFullAccess
AmazonEventBridgeFullAccess
AWSLambda_FullAccess

Update the trust relationship on the role to give permissions to the AWS account hosting the Jenkins server:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "events.amazonaws.com",
                "AWS": "arn:aws:iam::<jenkins-account-id>:root"
            },
            "Action": "sts:AssumeRole",
            "Condition": {}
        }
    ]
}
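With this trust relationship in place, the Jenkins pipeline (or any client running in the Jenkins account) can assume the role with AWS STS before deploying resources into the prod account. The following is a minimal boto3 sketch assuming the role name shown above; the session name and the list_pipelines call are just illustrations.

import boto3

sts = boto3.client("sts")
assumed = sts.assume_role(
    RoleArn="arn:aws:iam::<prod-account-id>:role/cross-account-jenkins-role",
    RoleSessionName="jenkins-deploy-session",
)
creds = assumed["Credentials"]

# Use the temporary credentials to act in the prod account,
# for example to list SageMaker pipelines
sagemaker_prod = boto3.client(
    "sagemaker",
    region_name="us-east-1",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
print(sagemaker_prod.list_pipelines())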

Update permissions on the IAM role associated with the Jenkins server
Assuming that Jenkins has been set up on AWS, update the IAM role associated with Jenkins to add the following policies, which will give Jenkins access to deploy the resources into the prod account:

Policy 1 – Create the following inline policy named assume-production-role-policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::<prod-account-id>:role/cross-account-jenkins-role"
        }
    ]
}

Policy 2 – Attach the CloudWatchFullAccess managed IAM policy.

Set up the model package group in the central model registry account
From the SageMaker Studio domain in the central model registry account, create a model package group called mammo-severity-model-package using the following code snippet (which you can run using a Jupyter notebook):

import boto3

model_package_group_name = "mammo-severity-model-package"
sm_client = boto3.Session().client("sagemaker")

create_model_package_group_response = sm_client.create_model_package_group(
    ModelPackageGroupName=model_package_group_name,
    ModelPackageGroupDescription="Cross account model package group for mammo severity model",
)

print("ModelPackageGroup Arn : {}".format(create_model_package_group_response["ModelPackageGroupArn"]))

Set up access to the model package for IAM roles in the dev and prod accounts
Provision access to the SageMaker execution roles created in the dev and prod accounts so you can register model versions within the model package mammo-severity-model-package in the central model registry from both accounts. From the SageMaker Studio domain in the central model registry account, run the following code in a Jupyter notebook:

import json
import boto3

model_package_group_name = "mammo-severity-model-package"

# Cross-account resource policy for the model package group
model_package_group_policy = dict(
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AddPermModelPackageGroupCrossAccount",
                "Effect": "Allow",
                "Principal": {
                    "AWS": ["arn:aws:iam::<dev-account-id>:root", "arn:aws:iam::<prod-account-id>:root"]
                },
                "Action": [
                    "sagemaker:DescribeModelPackageGroup"
                ],
                "Resource": "arn:aws:sagemaker:us-east-1:<central-model-registry-account>:model-package-group/mammo-severity-model-package"
            },
            {
                "Sid": "AddPermModelPackageVersionCrossAccount",
                "Effect": "Allow",
                "Principal": {
                    "AWS": ["arn:aws:iam::<dev-account-id>:root", "arn:aws:iam::<prod-account-id>:root"]
                },
                "Action": [
                    "sagemaker:DescribeModelPackage",
                    "sagemaker:ListModelPackages",
                    "sagemaker:UpdateModelPackage",
                    "sagemaker:CreateModelPackage",
                    "sagemaker:CreateModel"
                ],
                "Resource": "arn:aws:sagemaker:us-east-1:<central-model-registry-account>:model-package/mammo-severity-model-package/*"
            }
        ]
    }
)

# Convert the policy from JSON dict to string
model_package_group_policy = json.dumps(model_package_group_policy)

# Attach the policy to the model package group
sm_client = boto3.Session().client("sagemaker")
response = sm_client.put_model_package_group_policy(
    ModelPackageGroupName=model_package_group_name,
    ResourcePolicy=model_package_group_policy)
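You can optionally verify that the resource policy was attached by reading it back; a small sketch follows.

import boto3

sm_client = boto3.Session().client("sagemaker")
policy = sm_client.get_model_package_group_policy(
    ModelPackageGroupName="mammo-severity-model-package"
)
print(policy["ResourcePolicy"])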

Set up Jenkins
In this section, we configure Jenkins to create the ML pipelines and the corresponding Terraform infrastructure in the prod account through the Jenkins CI/CD pipeline.

On the CloudWatch console, create a log group named jenkins-log within the prod account, to which Jenkins will push logs from the CI/CD pipeline. The log group should be created in the same Region where the Jenkins server is set up.
Install the following plugins on your Jenkins server:

Job DSL
Git
Pipeline
Pipeline: AWS Steps
Pipeline Utility Steps

Set up AWS credentials in Jenkins using the cross-account IAM role (cross-account-jenkins-role) provisioned in the prod account.
For System Configuration, choose AWS.
Provide the credentials and CloudWatch log group you created earlier.
Set up GitHub credentials within Jenkins.
Create a new project in Jenkins.
Enter a project name and choose Pipeline.
On the General tab, select GitHub project and enter the forked GitHub repository URL.
Select This project is parameterized.
On the Add Parameter menu, choose String Parameter.
For Name, enter prodAccount.
For Default Value, enter the prod account ID.
Under Advanced Project Options, for Definition, select Pipeline script from SCM.
For SCM, choose Git.
For Repository URL, enter the forked GitHub repository URL.
For Credentials, enter the GitHub credentials saved in Jenkins.
Enter main in the Branches to build section, based on which the CI/CD pipeline will be triggered.
For Script Path, enter Jenkinsfile.
Choose Save.

The Jenkins pipeline should be created and visible on your dashboard.

Provision S3 buckets, collect and prepare data
Complete the following steps to set up your S3 buckets and data:

Create an S3 bucket of your choice with the string sagemaker in the naming convention as part of the bucket’s name in both dev and prod accounts to store datasets and model artifacts.
Set up an S3 bucket to maintain the Terraform state in the prod account.
Download and save the publicly available UCI Mammography Mass dataset to the S3 bucket you created earlier in the dev account.
Fork and clone the GitHub repository within the SageMaker Studio domain in the dev account. The repo has the following folder structure:

/environments – Configuration script for prod environment
/mlops-infra – Code for deploying AWS services using Terraform code
/pipelines – Code for SageMaker pipeline components
Jenkinsfile – Script to deploy through Jenkins CI/CD pipeline
setup.py – Needed to install the required Python modules and create the run-pipeline command
mammography-severity-modeling.ipynb – Allows you to create and run the ML workflow

Create a folder called data within the cloned GitHub repository folder and save a copy of the publicly available UCI Mammography Mass dataset.
Follow the Jupyter notebook mammography-severity-modeling.ipynb.
Run the following code in the notebook to preprocess the dataset and upload it to the S3 bucket in the dev account:

import boto3
import sagemaker
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder

# Replace the values based on the resources created
default_bucket = "<s3-bucket-in-dev-account>"
model_artifacts_bucket = "<s3-bucket-in-central-model-registry-account>"
region = "us-east-1"
model_name = "mammography-severity-model"
role = sagemaker.get_execution_role()
lambda_role = "arn:aws:iam::<dev-account-id>:role/lambda-sagemaker-role"
kms_key = "arn:aws:kms:us-east-1:<dev-account-id>:key/<kms-key-id-in-dev-account>"
model_package_group_name = "arn:aws:sagemaker:us-east-1:<central-model-registry-account-id>:model-package-group/mammo-severity-model-package"

feature_columns_names = [
    "BIRADS",
    "Age",
    "Shape",
    "Margin",
    "Density",
]
feature_columns_dtype = {
    "BIRADS": np.float64,
    "Age": np.float64,
    "Shape": np.float64,
    "Margin": np.float64,
    "Density": np.float64,
}

# Read the raw dataset
mammographic_data = pd.read_csv("data/mammographic_masses.data", header=None)

# Split the data into batch and raw datasets
batch_df = mammographic_data.sample(frac=0.05, random_state=200)
raw_df = mammographic_data.drop(batch_df.index)

# Split the raw dataset into two parts: the first is used to train the model
# initially, and the second is used when retraining the model
train_dataset_part2 = raw_df.sample(frac=0.1, random_state=200)
train_dataset_part1 = raw_df.drop(train_dataset_part2.index)

# Save the train datasets
train_dataset_part1.to_csv("data/mammo-train-dataset-part1.csv", index=False)
train_dataset_part2.to_csv("data/mammo-train-dataset-part2.csv", index=False)

# Remove the label column from the batch dataset, which will be used to generate inferences
batch_df.drop(5, axis=1, inplace=True)

# Create a copy of the batch dataset
batch_modified_df = batch_df.copy()

def preprocess_batch_data(feature_columns_names, feature_columns_dtype, batch_df):
    batch_df.replace("?", "NaN", inplace=True)
    batch_df.columns = feature_columns_names
    batch_df = batch_df.astype(feature_columns_dtype)
    numeric_transformer = Pipeline(
        steps=[("imputer", SimpleImputer(strategy="median"))]
    )
    numeric_features = list(feature_columns_names)
    preprocess = ColumnTransformer(
        transformers=[
            ("num", numeric_transformer, numeric_features)
        ]
    )
    batch_df = preprocess.fit_transform(batch_df)
    return batch_df

# Preprocess and save the batch dataset file
batch_df = preprocess_batch_data(feature_columns_names, feature_columns_dtype, batch_df)
pd.DataFrame(batch_df).to_csv("data/mammo-batch-dataset.csv", header=False, index=False)

# Modify the copied batch dataset to introduce missing values
batch_modified_df.replace("?", "NaN", inplace=True)
batch_modified_df.columns = feature_columns_names
batch_modified_df = batch_modified_df.astype(feature_columns_dtype)

# Save the batch dataset with outliers
batch_modified_df.to_csv("data/mammo-batch-dataset-outliers.csv", index=False)

The code will generate the following datasets:

data/mammo-train-dataset-part1.csv – Used to train the first version of the model.
data/mammo-train-dataset-part2.csv – Used to train the second version of the model along with the mammo-train-dataset-part1.csv dataset.
data/mammo-batch-dataset.csv – Used to generate inferences.
data/mammo-batch-dataset-outliers.csv – Introduces outliers into the dataset to fail the inference pipeline, which lets us test the pattern that triggers automated retraining of the model.

Upload the dataset mammo-train-dataset-part1.csv under the prefix mammography-severity-model/train-dataset, and upload the datasets mammo-batch-dataset.csv and mammo-batch-dataset-outliers.csv to the prefix mammography-severity-model/batch-dataset of the S3 bucket created in the dev account:

import boto3

s3_client = boto3.resource("s3")
s3_client.Bucket(default_bucket).upload_file("data/mammo-train-dataset-part1.csv", "mammography-severity-model/data/train-dataset/mammo-train-dataset-part1.csv")
s3_client.Bucket(default_bucket).upload_file("data/mammo-batch-dataset.csv", "mammography-severity-model/data/batch-dataset/mammo-batch-dataset.csv")
s3_client.Bucket(default_bucket).upload_file("data/mammo-batch-dataset-outliers.csv", "mammography-severity-model/data/batch-dataset/mammo-batch-dataset-outliers.csv")

Upload the datasets mammo-train-dataset-part1.csv and mammo-train-dataset-part2.csv under the prefix mammography-severity-model/train-dataset into the S3 bucket created in the prod account through the Amazon S3 console.
Upload the datasets mammo-batch-dataset.csv and mammo-batch-dataset-outliers.csv to the prefix mammography-severity-model/batch-dataset of the S3 bucket in the prod account.
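If you prefer to script these prod-account uploads instead of using the Amazon S3 console, a sketch such as the following works. It assumes a local AWS CLI profile (here named prod) with access to the prod account; the profile name and bucket placeholder are assumptions, and the key prefixes mirror the dev-account upload code, so adjust them if your pipelines expect different locations.

import boto3

prod_session = boto3.Session(profile_name="prod")  # assumed CLI profile for the prod account
prod_s3 = prod_session.resource("s3")
prod_bucket = "<s3-bucket-in-prod-account>"

prod_s3.Bucket(prod_bucket).upload_file("data/mammo-train-dataset-part1.csv", "mammography-severity-model/data/train-dataset/mammo-train-dataset-part1.csv")
prod_s3.Bucket(prod_bucket).upload_file("data/mammo-train-dataset-part2.csv", "mammography-severity-model/data/train-dataset/mammo-train-dataset-part2.csv")
prod_s3.Bucket(prod_bucket).upload_file("data/mammo-batch-dataset.csv", "mammography-severity-model/data/batch-dataset/mammo-batch-dataset.csv")
prod_s3.Bucket(prod_bucket).upload_file("data/mammo-batch-dataset-outliers.csv", "mammography-severity-model/data/batch-dataset/mammo-batch-dataset-outliers.csv")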

Run the train pipeline
Under <project-name>/pipelines/train, you can see the following Python scripts:

scripts/raw_preprocess.py – Integrates with SageMaker Processing for feature engineering
scripts/evaluate_model.py – Allows model metrics calculation, in this case auc_score
train_pipeline.py – Contains the code for the model training pipeline

Complete the following steps:

Upload the scripts into Amazon S3:

import boto3

s3_client = boto3.resource("s3")
s3_client.Bucket(default_bucket).upload_file("pipelines/train/scripts/raw_preprocess.py", "mammography-severity-model/scripts/raw_preprocess.py")
s3_client.Bucket(default_bucket).upload_file("pipelines/train/scripts/evaluate_model.py", "mammography-severity-model/scripts/evaluate_model.py")

Get the train pipeline instance:

from pipelines.train.train_pipeline import get_pipeline

train_pipeline = get_pipeline(
    region=region,
    role=role,
    default_bucket=default_bucket,
    model_artifacts_bucket=model_artifacts_bucket,
    model_name=model_name,
    kms_key=kms_key,
    model_package_group_name=model_package_group_name,
    pipeline_name="mammo-severity-train-pipeline",
    base_job_prefix="mammo-severity",
)

train_pipeline.definition()

Submit the train pipeline and run it:

train_pipeline.upsert(role_arn=role)
train_execution = train_pipeline.start()

The following figure shows a successful run of the training pipeline. The final step in the pipeline registers the model in the central model registry account.

Approve the model in the central model registry
Log in to the central model registry account and access the SageMaker model registry within the SageMaker Studio domain. Change the model version status to Approved.

Once approved, the status should be changed on the model version.
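If you prefer to approve the model version programmatically rather than through the Studio UI, you can update its approval status with boto3 from the central model registry account. This is a sketch; the version number in the ARN is illustrative.

import boto3

sm_client = boto3.client("sagemaker", region_name="us-east-1")
# Approve a specific model package version (replace the version number as needed)
sm_client.update_model_package(
    ModelPackageArn="arn:aws:sagemaker:us-east-1:<central-model-registry-account-id>:model-package/mammo-severity-model-package/1",
    ModelApprovalStatus="Approved",
)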

Run the inference pipeline (Optional)
This step is not required, but you can still run the inference pipeline to generate predictions in the dev account.
Under <project-name>/pipelines/inference, you can see the following Python scripts:

scripts/lambda_helper.py – Pulls the latest approved model version from the central model registry account using a SageMaker Pipelines Lambda step
inference_pipeline.py – Contains the code for the model inference pipeline

Complete the following steps:

Upload the script to the S3 bucket:

import boto3

s3_client = boto3.resource("s3")
s3_client.Bucket(default_bucket).upload_file("pipelines/inference/scripts/lambda_helper.py", "mammography-severity-model/scripts/lambda_helper.py")

Get the inference pipeline instance using the normal batch dataset:

from pipelines.inference.inference_pipeline import get_pipeline

inference_pipeline = get_pipeline(
    region=region,
    role=role,
    lambda_role=lambda_role,
    default_bucket=default_bucket,
    kms_key=kms_key,
    model_name=model_name,
    model_package_group_name=model_package_group_name,
    pipeline_name="mammo-severity-inference-pipeline",
    batch_dataset_filename="mammo-batch-dataset"
)

Submit the inference pipeline and run it:

inference_pipeline.upsert(role_arn=role)
inference_execution = inference_pipeline.start()

The following figure shows a successful run of the inference pipeline. The final step in the pipeline generates the predictions and stores them in the S3 bucket. We use MonitorBatchTransformStep to monitor the inputs into the batch transform job. If there are any outliers, the inference pipeline goes into a failed state.

Run the Jenkins pipeline
The environments/ folder within the GitHub repository contains the configuration script for the prod account. Complete the following steps to trigger the Jenkins pipeline:

Update the config script prod.tfvars.json based on the resources created in the previous steps:

{
    "env_group": "prod",
    "aws_region": "us-east-1",
    "event_bus_name": "default",
    "pipelines_alert_topic_name": "mammography-model-notification",
    "email": "admin@org.com",
    "lambda_role": "arn:aws:iam::<prod-account-id>:role/lambda-sagemaker-role",
    "default_bucket": "<s3-bucket-in-prod-account>",
    "model_artifacts_bucket": "<s3-bucket-in-central-model-registry-account>",
    "kms_key": "arn:aws:kms:us-east-1:<prod-account-id>:key/<kms-key-id-in-prod-account>",
    "model_name": "mammography-severity-model",
    "model_package_group_name": "arn:aws:sagemaker:us-east-1:<central-model-registry-account-id>:model-package-group/mammo-severity-model-package",
    "train_pipeline_name": "mammo-severity-train-pipeline",
    "inference_pipeline_name": "mammo-severity-inference-pipeline",
    "batch_dataset_filename": "mammo-batch-dataset",
    "terraform_state_bucket": "<s3-bucket-terraform-state-in-prod-account>",
    "train_pipeline": {
        "name": "mammo-severity-train-pipeline",
        "arn": "arn:aws:sagemaker:us-east-1:<prod-account-id>:pipeline/mammo-severity-train-pipeline",
        "role_arn": "arn:aws:iam::<prod-account-id>:role/service-role/<sagemaker-execution-role-in-prod-account>"
    },
    "inference_pipeline": {
        "name": "mammo-severity-inference-pipeline",
        "arn": "arn:aws:sagemaker:us-east-1:<prod-account-id>:pipeline/mammo-severity-inference-pipeline",
        "cron_schedule": "cron(0 23 * * ? *)",
        "role_arn": "arn:aws:iam::<prod-account-id>:role/service-role/<sagemaker-execution-role-in-prod-account>"
    }
}

Once updated, push the code into the forked GitHub repository and merge it into the main branch.
Go to the Jenkins UI, choose Build with Parameters, and trigger the CI/CD pipeline created in the previous steps.

When the build is complete and successful, you can log in to the prod account and see the train and inference pipelines within the SageMaker Studio domain.

Additionally, you will see three EventBridge rules on the EventBridge console in the prod account:

Schedule the inference pipeline
Send a failure notification on the train pipeline
Trigger the train pipeline and send a notification when the inference pipeline fails

Finally, you will see an SNS topic on the Amazon SNS console that sends notifications through email. You'll receive an email asking you to confirm your subscription to these notifications.

Test the inference pipeline using a batch dataset without outliers
To test whether the inference pipeline works as expected in the prod account, log in to the prod account and trigger the inference pipeline using the batch dataset without outliers.
Run the pipeline via the SageMaker Pipelines console in the SageMaker Studio domain of the prod account, where the transform_input will be the S3 URI of the dataset without outliers (s3://<s3-bucket-in-prod-account>/mammography-severity-model/data/mammo-batch-dataset.csv).

The inference pipeline succeeds and writes the predictions back to the S3 bucket.
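You can also start the run programmatically instead of through the SageMaker Studio UI. The following sketch assumes the inference pipeline exposes a parameter named transform_input, as described above, and that it is run with credentials for the prod account.

import boto3

sm_client = boto3.client("sagemaker", region_name="us-east-1")
sm_client.start_pipeline_execution(
    PipelineName="mammo-severity-inference-pipeline",
    PipelineParameters=[
        {
            "Name": "transform_input",
            "Value": "s3://<s3-bucket-in-prod-account>/mammography-severity-model/data/mammo-batch-dataset.csv",
        }
    ],
)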

Test the inference pipeline using a batch dataset with outliers
You can run the inference pipeline using the batch dataset with outliers to check if the automated retraining mechanism works as expected.
Run the pipeline via the SageMaker Pipelines console in the SageMaker Studio domain of the prod account, where the transform_input will be the S3 URI of the dataset with outliers (s3://<s3-bucket-in-prod-account>/mammography-severity-model/data/mammo-batch-dataset-outliers.csv).

The inference pipeline fails as expected, which triggers the EventBridge rule, which in turn triggers the train pipeline.

After a few moments, you should see a new run of the train pipeline on the SageMaker Pipelines console, which picks up the two different train datasets (mammo-train-dataset-part1.csv and mammo-train-dataset-part2.csv) uploaded to the S3 bucket to retrain the model.

You will also see a notification sent to the email subscribed to the SNS topic.

To use the updated model version, log in to the central model registry account and approve the model version, which will be picked up during the next run of the inference pipeline triggered through the scheduled EventBridge rule.
Although the train and inference pipelines use a static dataset URL, you can pass the dataset URL to the train and inference pipelines as a dynamic variable in order to use updated datasets to retrain the model and generate predictions in a real-world scenario, as sketched below.
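As an illustration of that idea, a SageMaker Pipelines parameter can expose the dataset location so each run can supply a fresh URI. This is a minimal sketch assuming you modify the pipeline definition accordingly; the parameter name and default value are placeholders.

from sagemaker.workflow.parameters import ParameterString

# Declare the dataset location as a pipeline parameter with a default value
train_dataset_uri = ParameterString(
    name="train_dataset_uri",
    default_value="s3://<s3-bucket-in-prod-account>/mammography-severity-model/data/train-dataset/",
)

# Reference train_dataset_uri wherever the pipeline steps read training data,
# then override it per run, for example:
#   pipeline.start(parameters={"train_dataset_uri": "s3://.../new-train-dataset/"})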
Clean up
To avoid incurring future charges, complete the following steps:

Remove the SageMaker Studio domain across all the AWS accounts.
Delete all the resources created outside SageMaker, including the S3 buckets, IAM roles, EventBridge rules, and SNS topic set up through Terraform in the prod account.
Delete the SageMaker pipelines created across accounts using the AWS Command Line Interface (AWS CLI).
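For example, the pipelines can be deleted with the AWS CLI or, equivalently, with boto3 from each account. The following sketch assumes it is run with credentials for the account that owns each pipeline.

import boto3

sm_client = boto3.client("sagemaker", region_name="us-east-1")
for pipeline_name in ["mammo-severity-train-pipeline", "mammo-severity-inference-pipeline"]:
    sm_client.delete_pipeline(PipelineName=pipeline_name)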

Conclusion
Organizations often need to align with enterprise-wide toolsets to enable collaboration across different functional areas and teams. This collaboration ensures that your MLOps platform can adapt to evolving business needs and accelerates the adoption of ML across teams. This post explained how to create an MLOps framework in a multi-environment setup to enable automated model retraining, batch inference, and monitoring with Amazon SageMaker Model Monitor, model versioning with SageMaker Model Registry, and promotion of ML code and pipelines across environments with a CI/CD pipeline. We showcased this solution using a combination of AWS services and third-party toolsets. For instructions on implementing this solution, see the GitHub repository. You can also extend this solution by bringing in your own data sources and modeling frameworks.

About the Authors
Gayatri Ghanakota is a Sr. Machine Learning Engineer with AWS Professional Services. She is passionate about developing, deploying, and explaining AI/ ML solutions across various domains. Prior to this role, she led multiple initiatives as a data scientist and ML engineer with top global firms in the financial and retail space. She holds a master’s degree in Computer Science specialized in Data Science from the University of Colorado, Boulder.
Sunita Koppar is a Sr. Data Lake Architect with AWS Professional Services. She is passionate about solving customer pain points processing big data and providing long-term scalable solutions. Prior to this role, she developed products in internet, telecom, and automotive domains, and has been an AWS customer. She holds a master’s degree in Data Science from the University of California, Riverside.
Saswata Dash is a DevOps Consultant with AWS Professional Services. She has worked with customers across healthcare and life sciences, aviation, and manufacturing. She is passionate about all things automation and has comprehensive experience in designing and building enterprise-scale customer solutions in AWS. Outside of work, she pursues her passion for photography and catching sunrises.

Customizing coding companions for organizations

Generative AI models for coding companions are mostly trained on publicly available source code and natural language text. While the large size of the training corpus enables the models to generate code for commonly used functionality, these models are unaware of code in private repositories and the associated coding styles that are enforced when developing with them. Consequently, the generated suggestions may require rewriting before they are appropriate for incorporation into an internal repository.
We can address this gap and minimize additional manual editing by embedding code knowledge from private repositories on top of a language model trained on public code. This is why we developed a customization capability for Amazon CodeWhisperer. In this post, we show you two possible ways of customizing coding companions using retrieval augmented generation and fine-tuning.
Our goal with the CodeWhisperer customization capability is to enable organizations to tailor the CodeWhisperer model using their private repositories and libraries to generate organization-specific code recommendations that save time, follow organizational style and conventions, and avoid bugs or security vulnerabilities. This benefits enterprise software development and helps overcome the following challenges:

Sparse documentation or information for internal libraries and APIs that forces developers to spend time examining previously written code to replicate usage.
Lack of awareness and consistency in implementing enterprise-specific coding practices, styles and patterns.
Inadvertent use of deprecated code and APIs by developers.

By using internal code repositories that have already undergone code reviews as additional training data, the language model can surface the use of internal APIs and code blocks that overcome the preceding list of problems. Because the reference code has already been reviewed and meets the customer's high bar, the likelihood of introducing bugs or security vulnerabilities is also minimized. And by carefully selecting the source files used for customization, organizations can reduce the use of deprecated code.
Design challenges
Customizing code suggestions based on an organization’s private repositories has many interesting design challenges. Deploying large language models (LLMs) to surface code suggestions has fixed costs for availability and variable costs due to inference based on the number of tokens generated. Therefore, having separate customizations for each customer and hosting them individually, thereby incurring additional fixed costs, can be prohibitively expensive. On the other hand, having multiple customizations simultaneously on the same system necessitates multi-tenant infrastructure to isolate proprietary code for each customer. Furthermore, the customization capability should surface knobs to enable the selection of the appropriate training subset from the internal repository using different metrics (for example, files with a history of fewer bugs or code that is recently committed into the repository). By selecting the code based on these metrics, the customization can be trained using higher-quality code which can improve the quality of code suggestions. Finally, even with continuously evolving code repositories, the cost associated with customization should be minimal to help enterprises realize cost savings from increased developer productivity.
A baseline approach to building customization could be to pretrain the model on a single training corpus composed of the existing (public) pretraining dataset along with the (private) enterprise code. While this approach works in practice, it requires (redundant) individual pretraining using the public dataset for each enterprise. It also incurs redundant deployment costs associated with hosting a customized model for each customer that only serves client requests originating from that customer. By decoupling the training of public and private code and deploying the customization on a multi-tenant system, these redundant costs can be avoided.
How to customize
At a high level, there are two types of possible customization techniques: retrieval-augmented generation (RAG) and fine-tuning (FT).

Retrieval-augmented generation: RAG finds matching pieces of code within a repository that are similar to a given code fragment (for example, code that immediately precedes the cursor in the IDE) and augments the prompt used to query the LLM with these matched code snippets. This enriches the prompt and helps nudge the model into generating more relevant code. There are a few techniques explored in the literature along these lines. See Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, REALM, kNN-LM, and RETRO.

Fine-tuning: FT takes a pre-trained LLM and trains it further on a specific, smaller codebase (compared to the pretraining dataset) to adapt it for the appropriate repository. Fine-tuning adjusts the LLM’s weights based on this training, making it more tailored to the organization’s unique needs.

Both RAG and fine-tuning are powerful tools for enhancing the performance of LLM-based customization. RAG can quickly adapt to private libraries or APIs with lower training complexity and cost. However, searching for and augmenting retrieved code snippets into the prompt increases latency at runtime. In contrast, fine-tuning does not require any augmentation of the context because the model is already trained on the private libraries and APIs. However, it leads to higher training costs and complexities in serving the model when multiple custom models have to be supported across multiple enterprise customers. As we discuss later, these concerns can be remedied by optimizing the approach further.
Retrieval augmented generation
There are a few steps involved in RAG:
Indexing
Given a private repository as input by the admin, an index is created by splitting the source code files into chunks. Put simply, chunking turns the code snippets into digestible pieces that are likely to be most informative for the model and are easy to retrieve given the context. The size of a chunk and how it is extracted from a file are design choices that affect the final result. For example, chunks can be split based on lines of code or based on syntactic blocks, and so on.
Administrator Workflow
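As a concrete illustration of the indexing step, the following sketch chunks source files by a fixed number of lines with some overlap. This is only one of many reasonable chunking strategies and not the exact scheme used by CodeWhisperer; the function names and chunk dictionary fields are illustrative.

from pathlib import Path

def chunk_file(path, chunk_lines=30, overlap=10):
    """Split a source file into overlapping line-based chunks."""
    lines = Path(path).read_text(encoding="utf-8", errors="ignore").splitlines()
    step = chunk_lines - overlap
    chunks = []
    for start in range(0, max(len(lines), 1), step):
        chunk = "\n".join(lines[start:start + chunk_lines])
        if chunk.strip():
            chunks.append({"file": str(path), "start_line": start + 1, "text": chunk})
    return chunks

def build_index(repo_root):
    """Index every Python file in a private repository."""
    index = []
    for path in Path(repo_root).rglob("*.py"):
        index.extend(chunk_file(path))
    return index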
Contextual search
Search a set of indexed code snippets based on a few lines of code above the cursor and retrieve relevant code snippets. This retrieval can happen using different algorithms. These choices might include:

Bag of words (BM25) – A bag-of-words retrieval function that ranks a set of code snippets based on the query term frequencies and code snippet lengths.

BM25-based retrieval

The following figure illustrates how BM25 works. In order to use BM25, an inverted index is built first. This is a data structure that maps different terms to the code snippets that those terms occur in. At search time, we look up code snippets based on the terms present in the query and score them based on the frequency.
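A minimal BM25-style retrieval over chunks like those produced by the build_index sketch above might look as follows, using the open source rank_bm25 package. This is an illustration of the idea, not CodeWhisperer's implementation, and the whitespace tokenizer is a simplifying assumption.

import re
from rank_bm25 import BM25Okapi

def tokenize(code):
    # Split on non-alphanumeric characters; a real system would use a code-aware tokenizer
    return [t.lower() for t in re.split(r"\W+", code) if t]

def build_bm25(index):
    # index is a list of chunk dicts, each with a "text" field
    return BM25Okapi([tokenize(chunk["text"]) for chunk in index])

def bm25_search(bm25, index, query, top_k=3):
    scores = bm25.get_scores(tokenize(query))
    ranked = sorted(range(len(index)), key=lambda i: scores[i], reverse=True)
    return [index[i] for i in ranked[:top_k]]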

Semantic retrieval [Contriever, UniXcoder] – Converts the query and indexed code snippets into high-dimensional vectors and ranks code snippets based on semantic similarity. Formally, k-nearest neighbors (KNN) or approximate nearest neighbor (ANN) search is often used to find other snippets with similar semantics.

Semantic retrieval

BM25 focuses on lexical matching. Therefore, replacing “add” with “delete” may not change the BM25 score based on the terms in the query, but the retrieved functionality may be the opposite of what is required. In contrast, semantic retrieval focuses on the functionality of the code snippet even though variable and API names may be different. Typically, a combination of BM25 and semantic retrievals can work well together to deliver better results.
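Semantic retrieval can be sketched with any code embedding model. The snippet below assumes a hypothetical embed() callable (for example, a UniXcoder-style encoder) that maps a code string to a vector, and ranks chunks by cosine similarity; a production system would precompute embeddings and use ANN search instead of this brute-force loop.

import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def semantic_search(embed, index, query, top_k=3):
    """embed: assumed callable mapping a code string to a 1-D numpy vector."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(chunk["text"])), chunk) for chunk in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]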
Augmented inference
When developers write code, their existing program is used to formulate a query that is sent to the retrieval index. After retrieving multiple code snippets using one of the techniques discussed above, we prepend them to the original prompt. There are many design choices here, including the number of snippets to be retrieved, the relative placement of the snippets in the prompt, and the size of the snippet. The final design choice is primarily driven by empirical observation by exploring various approaches with the underlying language model and plays a key role in determining the accuracy of the approach. The contents from the returned chunks and the original code are combined and sent to the model to get customized code suggestions.
Developer workflow
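Putting the pieces together, augmented inference amounts to assembling a prompt from the retrieved chunks and the developer's current context before calling the model. The snippet count, ordering, and comment-style formatting below are illustrative design choices, not CodeWhisperer's exact prompt format.

def build_augmented_prompt(retrieved_chunks, current_code, max_snippets=3):
    """Prepend retrieved snippets (as comments) to the code preceding the cursor."""
    context_blocks = []
    for chunk in retrieved_chunks[:max_snippets]:
        header = f"# Retrieved from {chunk['file']} (line {chunk['start_line']}):"
        commented = "\n".join("# " + line for line in chunk["text"].splitlines())
        context_blocks.append(header + "\n" + commented)
    return "\n\n".join(context_blocks) + "\n\n" + current_code

# The augmented prompt is then sent to the code LLM to obtain a customized suggestion.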

Fine-tuning
Fine-tuning a language model is a form of transfer learning in which the weights of a pre-trained model are trained further on new data. The goal is to retain the appropriate knowledge from a model already trained on a large corpus and refine, replace, or add new knowledge from the new corpus, in our case a new codebase. Simply training on a new codebase leads to catastrophic forgetting. For example, the language model may "forget" its knowledge of safety or of the APIs that are sparsely used in the enterprise codebase to date. There are a variety of techniques, like experience replay, GEM, and PP-TF, that are employed to address this challenge.
Fine tuning

There are two ways of fine-tuning. One approach is to use the additional data without augmenting the prompt to fine-tune the model. Another approach is to augment the prompt during fine-tuning by retrieving relevant code suggestions. This helps improve the model’s ability to provide better suggestions in the presence of retrieved code snippets. The model is then evaluated on a held-out set of examples after it is trained. Subsequently, the customized model is deployed and used for generating the code suggestions.
Despite the advantages of using dedicated LLMs for generating code on private repositories, the costs can be prohibitive for small and medium-sized organizations. This is because dedicated compute resources are necessary even though they may be underutilized given the size of the teams. One way to achieve cost efficiency is serving multiple models on the same compute (for example, SageMaker multi-tenancy). However, language models require one or more dedicated GPUs across multiple zones to handle latency and throughput constraints. Hence, multi-tenancy of full model hosting on each GPU is infeasible.
We can overcome this problem by serving multiple customers on the same compute using small adapters to the LLM. Parameter-efficient fine-tuning (PEFT) techniques like prompt tuning, prefix tuning, and Low-Rank Adaptation (LoRA) are used to lower training costs without any loss of accuracy. LoRA, especially, has seen great success at achieving similar (or better) accuracy than full-model fine-tuning. The basic idea is to learn a low-rank update matrix that is added to the original weight matrices of the targeted layers of the model. Typically, these adapters are merged with the original model weights for serving, which keeps the same size and architecture as the original neural network. By keeping the adapters separate, we can serve the same base model with many model adapters. This brings the economies of scale back to our small and medium-sized customers.
Low-Rank Adaptation (LoRA)
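The low-rank idea behind LoRA can be sketched in a few lines: instead of updating a full weight matrix W, we learn two small matrices A and B whose product forms the update, and we can merge them for serving. This toy numpy example only illustrates the parameter arithmetic under assumed layer dimensions, not a training loop or a specific LoRA library.

import numpy as np

d_out, d_in, rank = 1024, 1024, 8

# Frozen pretrained weight of a targeted layer
W = np.random.randn(d_out, d_in) * 0.02

# Trainable low-rank factors; only these are updated during fine-tuning
A = np.random.randn(d_out, rank) * 0.01
B = np.zeros((rank, d_in))

def adapted_forward(x):
    # Equivalent to (W + A @ B) @ x, without materializing the merged matrix
    return W @ x + A @ (B @ x)

# For serving, the adapter can be merged so the model keeps its original shape
W_merged = W + A @ B

full_params = W.size
adapter_params = A.size + B.size
print(f"adapter is {adapter_params / full_params:.2%} of the layer's parameters")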

Measuring effectiveness of customization
We need evaluation metrics to assess the efficacy of the customized solution. Offline evaluation metrics act as guardrails against shipping customizations that are subpar compared to the default model. By building an evaluation set from a held-out portion of the provided repository, the customization approach can be applied to it to measure effectiveness. Comparing the existing source code with the customized code suggestions quantifies the usefulness of the customization. Common measures used for this quantification include metrics like edit similarity, exact match, and CodeBLEU.
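Edit similarity, for instance, can be computed with a standard sequence matcher over a suggested snippet and the reference code from the held-out set. This is a simple sketch of the metric; production evaluation pipelines are more involved.

from difflib import SequenceMatcher

def edit_similarity(suggestion, reference):
    """Character-level similarity in [0, 1]; 1.0 means an exact match."""
    return SequenceMatcher(None, suggestion, reference).ratio()

def exact_match(suggestion, reference):
    return float(suggestion.strip() == reference.strip())

print(edit_similarity("df.groupby('id').sum()", "df.groupby('id').mean()"))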
It is also possible to measure usefulness by quantifying how often internal APIs are invoked by the customization and comparing it with the invocations in the pre-existing source. Of course, getting both aspects right is important for a successful completion. For our customization approach, we have designed a tailor-made metric known as the Customization Quality Index (CQI), a single user-friendly measure ranging between 1 and 10. The CQI metric shows the usefulness of the suggestions from the customized model compared to code suggestions from a generic public model.
Summary
We built the Amazon CodeWhisperer customization capability based on a mixture of the techniques discussed in this blog post and evaluated it with user studies on developer productivity, conducted by Persistent Systems. In these two studies, commissioned by AWS, developers were asked to create a medical software application in Java that required use of their internal libraries. In the first study, developers without access to CodeWhisperer took (on average) ~8.2 hours to complete the task, while those who used CodeWhisperer (without customization) completed the task 62 percent faster, in (on average) ~3.1 hours.
In the second study, with a different set of developer cohorts, developers using CodeWhisperer customized with their private codebase completed the task in ~2.5 hours on average, 28 percent faster than those who used CodeWhisperer without customization and completed the task in ~3.5 hours on average. We strongly believe tools like CodeWhisperer that are customized to your codebase have a key role to play in further boosting developer productivity, and we recommend giving it a try. For more information and to get started, visit the Amazon CodeWhisperer page.

About the authors
Qing Sun is a Senior Applied Scientist in AWS AI Labs and works on AWS CodeWhisperer, a generative AI-powered coding assistant. Her research interests lie in Natural Language Processing, AI4Code, and generative AI. In the past, she worked on several NLP-based services such as Comprehend Medical, a medical diagnosis system at Amazon Health AI, and a machine translation system at Meta AI. She received her PhD from Virginia Tech in 2017.
Arash Farahani is an Applied Scientist with Amazon CodeWhisperer. His current interests are in generative AI, search, and personalization. Arash is passionate about building solutions that resolve developer pain points. He has worked on multiple features within CodeWhisperer, and introduced NLP solutions into various internal workstreams that touch all Amazon developers. He received his PhD from University of Illinois at Urbana-Champaign in 2017.
Xiaofei Ma is an Applied Science Manager in AWS AI Labs. He joined Amazon in 2016 as an Applied Scientist within SCOT organization and then later AWS AI Labs in 2018 working on Amazon Kendra. Xiaofei has been serving as the science manager for several services including Kendra, Contact Lens, and most recently CodeWhisperer and CodeGuru Security. His research interests lie in the area of AI4Code and Natural Language Processing. He received his PhD from University of Maryland, College Park in 2010.
Murali Krishna Ramanathan is a Principal Applied Scientist in AWS AI Labs and co-leads AWS CodeWhisperer, a generative AI-powered coding companion. He is passionate about building software tools and workflows that help improve developer productivity. In the past, he built Piranha, an automated refactoring tool to delete code due to stale feature flags and led code quality initiatives at Uber engineering. He is a recipient of the Google faculty award (2015), ACM SIGSOFT Distinguished paper award (ISSTA 2016) and Maurice Halstead award (Purdue 2006). He received his PhD in Computer Science from Purdue University in 2008.
Ramesh Nallapati is a Senior Principal Applied Scientist in AWS AI Labs and co-leads CodeWhisperer, a generative AI-powered coding companion, and Titan Large Language Models at AWS. His interests are mainly in the areas of Natural Language Processing and Generative AI. In the past, Ramesh has provided science leadership in delivering many NLP-based AWS products such as Kendra, Quicksight Q and Contact Lens. He held research positions at Stanford, CMU and IBM Research, and received his Ph.D. in Computer Science from University of Massachusetts Amherst in 2006.

40+ Cool AI Tools You Should Check Out (November 2023)

DeepSwap

DeepSwap is an AI-based tool for anyone who wants to create convincing deepfake videos and images. It is super easy to create your content by refacing videos, pictures, memes, old movies, GIFs… you name it. The app has no content restrictions, so users can upload material of any kind. Besides, first-time subscribers can get 50% off.

Aragon

Get stunning professional headshots effortlessly with Aragon. Utilize the latest in A.I. technology to create high-quality headshots of yourself in a snap! Skip the hassle of booking a photography studio or dressing up. Get your photos edited and retouched quickly, not after days. Receive 40 HD photos that will give you an edge in landing your next job.

AdCreative.ai

Boost your advertising and social media game with AdCreative.ai – the ultimate Artificial Intelligence solution. Say goodbye to hours of creative work and hello to high-converting ad and social media posts generated in mere seconds. Maximize your success and minimize your effort with AdCreative.ai today.

Hostinger AI Website Builder

Hostinger uses the power of a cutting-edge artificial intelligence engine to create the best AI website builder for all website owners. The builder guides you through the design process, suggesting layouts, color schemes, and content placements tailored to your needs. Embrace the freedom to customize every detail while maintaining responsive design for various devices.

Otter AI

Using artificial intelligence, Otter.AI empowers users with real-time transcriptions of meeting notes that are shareable, searchable, accessible, and secure. Get a meeting assistant that records audio, writes notes, automatically captures slides, and generates summaries.

Notion

Notion is aiming to increase its user base through the utilization of its advanced AI technology. Their latest feature, Notion AI, is a robust generative AI tool that assists users with tasks like note summarization, identifying action items in meetings, and creating and modifying text. Notion AI streamlines workflows by automating tedious tasks, providing suggestions, and templates to users, ultimately simplifying and improving the user experience.

Codium AI

Generating meaningful tests for busy devs. With CodiumAI, you get non-trivial tests (and trivial, too!) suggested right inside your IDE, so you can code smart, create more value, and stay confident when you push. With CodiumAI, developers innovate faster and with confidence, saving their time devoted to testing and analyzing code. Code, as you meant it.

Decktopus AI

Decktopus is an AI-powered presentation tool that simplifies online content creation with more than 100 customizable templates, allowing users to create professional presentations in seconds.

SaneBox

AI is the future, but at SaneBox, AI has been successfully powering email for the past 12 years and counting, saving the average user more than 3 hours a week on inbox management.

Promptpal AI

Promptpal AI helps users discover the best prompts to get the most out of AI models like ChatGPT.

Quinvio AI

Quinvio is an AI video creation tool that enables quick video presentations with an intuitive editor, AI assistance for writing, and an option to choose an AI spokesperson.

Ask your PDF

AskYourPdf is an AI chatbot that helps users interact with PDF documents easily and extract insights.

Supernormal AI

Supernormal is an AI-powered tool that helps users create meeting notes automatically, saving 5-10 minutes every meeting.

Suggesty

Suggesty is powered by GPT-3 and provides human-like answers to Google searches.

ChatGPT Sidebar

ChatGPT Sidebar is a ChatGPT Chrome extension that can be used on any website to summarize articles, explain concepts, etc.

MarcBot

MarcBot is a chatbot inside the Telegram messenger that uses the ChatGPT API, Whisper, and Amazon Polly.

Motion AI

Motion enables users to create chatbots that can engage as well as delight their customers across multiple channels and platforms, all at scale.

Roam Around

Roam Around is an AI tool powered by ChatGPT that helps users to build their travel itineraries.

Beautiful

Beautiful AI presentation software enables users to quickly create beautifully designed, modern slides that are professional-looking and impressive.

Quotify

Quotify uses AI to identify the most relevant quotes from any text-based PDF, making it a powerful quote-finding tool.

Harvey AI

Harvey is an AI legal advisor that helps in contract analysis, litigation, due diligence, etc.

Bearly AI

Bearly is an AI-based tool that facilitates faster reading, writing, and content creation.

Scispace AI

Scispace is an AI assistant that simplifies reading and understanding complex content, allowing users to highlight confusing text, ask follow-up questions, and search for relevant papers without specifying keywords.

Hints AI

Hints is an AI tool powered by GPT that can be integrated with any software to perform tasks on behalf of the user.

Monday.com

Monday.com is a cloud-based framework that allows users to build software applications and work management tools.

Base64

Base64 is a data extraction automation tool that allows users to extract text, photos, and other types of data from all documents.

AI Writer

AI Writer is an AI content creation platform that allows users to generate articles and blog posts within seconds.

Engage AI

Engage is an AI tool that augments users’ comments to engage prospects on Linkedin.

Google Duplex 

Google Duplex is an AI technology that mimics a human voice and makes phone calls on behalf of a person.

Perplexity

Perplexity is an AI tool that aims to answer questions accurately using large language models.

NVIDIA Canvas

NVIDIA Canvas is an AI tool that turns simple brushstrokes into realistic landscape images.

Seenapse

Seenapse is a tool that allows users to generate hundreds of divergent and creative ideas.

Murf AI 

Murf AI allows users to create studio-like voice overs within minutes.

10Web

10Web is an AI-powered WordPress platform that automates website building, hosting, and page speed boosting.

Kickresume

KickResume is an AI tool that allows users to create beautiful resumes quickly.

DimeADozen

DimeADozen is an AI tool that allows users to validate their business ideas within seconds.

WavTool

WavTool allows users to make high-quality music in the browser for free.

Wonder Dynamics

Wonder Dynamics is an AI tool that integrates computer-generated (CG) characters into real-life settings through automatic animation, lighting, and composition.

Gen-2

Gen-2 is a multimodal AI tool that generates videos by taking text, images, or video clips as input.

Uizard

Uizard is an AI tool for designing web and mobile apps within a few minutes.

GPT-3 Color Palette Generator

This is an AI tool that generates a color palette on the basis of an English description.

Rationale

Rationale is an AI tool that assists business owners, managers, and individuals with tough decisions.

Vizology

Vizology is an AI tool that provides businesses with AI-generated responses to inquiries about companies, markets, and contextual business intelligence.

PromptPerfect

PromptPerfect is a prompt optimization tool that helps to bring the most out of AI models like ChatGPT.

Numerous

Numerous is an AI assistant that allows users to breeze through their busy work in Excel & Google Sheets.

Nolan

Nolan is a tool that allows users to craft compelling movie scripts.

Play HT

Play HT is an AI voice generator that allows users to generate realistic text-to-speech voice online.

PromptGPT

PromptGPT allows users to improve their ChatGPT output by providing optimized prompts.

AI Image Enlarger

This tool allows users to enlarge and enhance their small images automatically.

Timely

Timely is an AI-powered time-tracking software that helps users to boost their productivity.

HeyGPT

This is an iOS shortcut that replaces Siri with ChatGPT.
