Introducing popularity tuning for Similar-Items in Amazon Personalize

Amazon Personalize now enables popularity tuning for its Similar-Items recipe (aws-similar-items). Similar-Items generates recommendations that are similar to the item that a user selects, helping users discover new items in your catalog based on the previous behavior of all users and item metadata. Previously, this capability was only available for SIMS, the other Related_Items recipe within Amazon Personalize.
Every customer’s item catalog and the way that users interact with it are unique to their business. When recommending similar items, some customers may want to place more emphasis on popular items because they increase the likelihood of user interaction, while others may want to de-emphasize popular items to surface recommendations that are more similar to the selected item but are less widely known. This launch gives you more control over the degree to which popularity influences Similar-Items recommendations, so you can tune the model to meet your particular business needs.
In this post, we show you how to tune popularity for the Similar-Items recipe. Specify a value closer to 0 to include more popular items, or a value closer to 1 to place less emphasis on popularity.
Example use cases
To explore the impact of this new feature in greater detail, let’s review two examples. [1]
First, we used the Similar-Items recipe to find recommendations similar to Disney’s 1994 movie The Lion King (IMDB record). When the popularity discount is set to 0, Amazon Personalize recommends movies that have a high frequency of occurrence in the dataset (that is, popular movies). In this example, the movie Seven (a.k.a. Se7en), which occurs 19,295 times in the dataset, is recommended at rank 3.

By tuning the popularity discount to a value of 0.4 for The Lion King recommendations, we see the rank of Seven drop to 4. We also see movies from the Children genre, like Babe, Beauty and the Beast, Aladdin, and Snow White and the Seven Dwarfs, recommended at a higher rank despite their lower overall popularity in the dataset.

Let’s explore another example. We used the Similar-Items recipe to find recommendations similar to Disney and Pixar’s 1995 movie Toy Story (IMDB record). When the popularity discount is set to 0, Amazon Personalize recommends movies that have a high frequency of occurrence in the dataset. In this example, the movie Twelve Monkeys (a.k.a. 12 Monkeys), which occurs 6,678 times in the dataset, is recommended at rank 5.

By tuning the popularity discount to a value of 0.4 for Toy Story recommendations, we see that Twelve Monkeys is no longer recommended in the top 10. We also see movies from the Children genre, like Aladdin, Toy Story 2, and A Bug’s Life, recommended at a higher rank despite their lower overall popularity in the dataset.

Placing greater emphasis on more popular content can help increase the likelihood that users will engage with item recommendations. Reducing the emphasis on popularity may surface recommendations that are more relevant to the queried item but less popular with users. You can tune the degree of importance placed on popularity to meet your business needs for a specific personalization campaign.
Implement popularity tuning
To tune popularity for the Similar-Items recipe, configure the popularity_discount_factor hyperparameter via the AWS Management Console, the AWS SDKs, or the AWS Command Line Interface (AWS CLI).
The following sample code sets the popularity discount factor to 0.5 via the AWS SDK for Python (Boto3):

import boto3

personalize = boto3.client("personalize")

# dsg_arn is the ARN of your dataset group
response = personalize.create_solution(
    name="movie_lens-with-popularity-discount-0_5",
    recipeArn="arn:aws:personalize:::recipe/aws-similar-items",
    datasetGroupArn=dsg_arn,
    solutionConfig={
        "algorithmHyperParameters": {
            # Set the preferred value of the popularity discount here
            "popularity_discount_factor": "0.50"
        }
    }
)
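
After the solution is created, you train a model by creating a solution version and deploy it to a campaign before querying for similar items. The following is a minimal Boto3 sketch of those follow-on calls; the campaign ARN and item ID are placeholders for illustration, not values from the example above.

personalize_runtime = boto3.client("personalize-runtime")

# Train a model (solution version) for the solution created above
version_response = personalize.create_solution_version(
    solutionArn=response["solutionArn"]
)

# After the solution version is ACTIVE and deployed to a campaign,
# query the campaign for items similar to a given item.
recommendations = personalize_runtime.get_recommendations(
    campaignArn="arn:aws:personalize:us-east-1:111122223333:campaign/similar-items-campaign",  # placeholder
    itemId="362",   # placeholder item ID to find similar items for
    numResults=10
)
for item in recommendations["itemList"]:
    print(item["itemId"])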

The following screenshot shows setting the popularity discount factor to 0.3 on the Amazon Personalize console.

Conclusion
With popularity tuning, you can now further refine the Similar-Items recipe within Amazon Personalize to control the degree to which popularity influences item recommendations. This gives you greater control over defining the end-user experience and what is included or excluded in your Similar-Items recommendations.
For more details on how to implement popularity tuning for the Similar-Items recipe, refer to the Amazon Personalize documentation.
References
[1] Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4, Article 19 (December 2015), 19 pages. DOI=http://dx.doi.org/10.1145/2827872

About the Authors
Julia McCombs Clark is a Sr. Technical Product Manager on the Amazon Personalize team.
Nihal Harish is a Software Development Engineer on the Amazon Personalize team.

Best AI Games (2023)

Some industry insiders claim that the most useful applications of artificial intelligence in video games are the ones that go under the radar. AI in video games is always evolving, and each kind of game uses it in its own way. AI programs the responses and behaviors of non-playable characters, which is crucial because those characters need to seem credible and capable.

The advancement of artificial intelligence technologies is largely attributable to work done in the gaming industry. In the past, AI was employed to help you plan your next move; in today’s games, it is used to improve aesthetics and fix problems that arise during play. That said, artificial intelligence is not confined to games, and deep learning has more practical uses than amusement alone. Sophia the Robot, for instance, is used as a tool to educate today’s youth about AI for the benefit of the future.

F.E.A.R.

First Encounter Assault Recon is a first-person shooter horror game with psychological elements, available for the Xbox 360, PlayStation 3, and Microsoft Windows. It’s one of the best artificial intelligence games and the first entry in the F.E.A.R. series, developed by Monolith Productions and published by Sierra Entertainment, a Vivendi Universal Games imprint. It’s a shame that so few people talk about this fantastic first-person shooter, which had engaging gameplay, difficult enemy encounters, and superior artificial intelligence. F.E.A.R. was the first video game to incorporate Goal Oriented Action Planning (GOAP), a form of artificial intelligence that enables opponents to act like humans, making gunfights more exciting and memorable.

The Last of Us

Sony Interactive Entertainment’s 2013 survival horror game The Last of Us has garnered a passionate fanbase. It follows Joel and Ellie through a devastating epidemic, and AI dominates the survival experience. Each character has a distinct personality and reacts differently to player actions, so encounters can play out along various paths. Non-playable characters may help the player in danger or ambush them, and when even your comrades run out of bullets, the fight falls to you. The companions are introspective and resourceful: even without orders, Ellie takes down adversaries and uses her surroundings to locate opponents. The AI here goes well beyond simply moving the story forward.

Splinter Cell: Blacklist

All Blacklist operations share the same overarching objective: evade security. The guard AI is quite impressive here, and artificial intelligence has always been a point of fascination in the Splinter Cell games. It’s a challenging stealth game, almost like a chess match, and computers are famously good at chess. You enter a zone, locate all the guards, plan your route, and proceed with the task. It’s more challenging than it sounds, though: the guards are trained to recognize and respond to the slightest visual and auditory shifts.

XCOM: Enemy Unknown

The 2012 XCOM reboot’s AI was a major factor in the game’s popularity. Its developers reasoned that an AI that behaved cleverly would make the game even better, and technological progress made possible “a system that assigned a quantitative value to every conceivable activity.” Because of its limited movement options, XCOM’s AI has to carefully plan the most efficient course of action for each turn; this is one of the game’s most recognizable features. It considers how close you are to the nearest objective, whether you’re near any hostile aliens, how many enemies there are, how they behave, and so on. Other prospective game makers should consider adopting this approach.

Halo: CE

The Halo series is another popular video game franchise well known for its formidable computer opponents. This is one of the primary reasons the Covenant and the Flood have become such recognizable adversaries in the Halo games. Combat Evolved, the first game in the series, marked a watershed moment in the evolution of video game AI. Some of the tactics that Grunts, Brutes, and similar foes use are unique to this franchise and can’t be found elsewhere. Halo: Reach is yet another entry that successfully utilizes artificial intelligence.

Minecraft

Since its full release in 2011, Minecraft has consistently impressed. With no predetermined goals, many players find it a fun sandbox experience, and depending on your approach to building your Minecraft world, you might experience a lot of pleasure or a lot of stress. For those who enjoy a serious challenge, Minecraft offers a variety of difficulty settings, and fans enjoy both the adventure mode and the spectator mode. In general, though, this game can go on forever; it’s very similar to online Lego games in that you constantly build. The game uses AI to adapt to how you play it, and each new world that players make is different from the last. The AI helps preserve the integrity of players’ worlds while maintaining their individuality.

Rocket League

When it comes to artificial intelligence games, Rocket League ranks high. The game gives players the football-meets-cars dynamic they didn’t know they wanted: the premise is simply football played from rocket-powered vehicles, which players use to kick and pass the ball. The game’s AI is subtle, which is most noticeable in the early phases of a match when ball-handling techniques come into play, but Rocket League knows how to put AI to good use.

Stockfish

Stockfish, a free and open-source chess engine, is easily accessible online and is among the best showcases of game-playing artificial intelligence. Because of its open-source nature, it undergoes regular reviews and updates, much like encrypted messaging apps, and every few months the system is upgraded and made more challenging. In the game, you play a chess match against the computer, and few individuals have ever succeeded in beating it.

Google Quick Draw

Beautiful, over-the-top video games aren’t always the most entertaining and engaging, and Google Quick Draw is a perfect illustration of that. Developed by the inventive technologist Jonas Jongejan, it’s a kind of Pictionary with AI: players draw the prompt the computer suggests, and AI recognizes the doodles in-game. The computer learns more about objects, people, and locations with every stroke and line players draw. Quick Draw is a fun game that can be found instantly with a Google search, and it’s a great stepping stone for anyone curious about machine learning.

FIFA

Thanks to its long history, FIFA has established its dominance over the sports game industry; almost every gamer has tried FIFA at least once, which keeps the series from losing its appeal over time. The most recent FIFA games use an AI technology described as football knowledge. Much as AI governs world generation in other games, here it ensures the ball follows realistic physics. Dribblers are given more opportunities to practice and develop their abilities, and the AI’s strategy can be uncovered through your teammates, making it easier (or harder, depending on your play style) for you to take control of the game.

Red Dead Redemption 2

AI manages the non-playable characters in Red Dead Redemption 2, and machine learning technologies bring their individuality to life. Every action reacts to your decisions, and the reactions are virtually always realistic: some people might make fun of your clothes, and your weaponry could accidentally kill a helpless insect. These details are small on their own, but combined with the AI technology they make for far more interesting gameplay.

Half-Life

Half-Life, released in 1998, is among the most innovative video games ever created. It reached a wide audience and demonstrated how important AI is to the gaming business. Without a shadow of a doubt, the Marines are one of the most jaw-dropping aspects of Half-Life; how these forces attempt to creep up on the player is fascinating.

Grand Theft Auto 5

Rockstar has made great strides in artificial intelligence, and Grand Theft Auto 5 is another prime example. It’s a fantastic demonstration of how great a video game can be when the artificial intelligence is spot on: pedestrians are more intelligent than ever and respond to player input creatively and almost instantly.

Middle Earth: Shadow Of Mordor

The Nemesis System is one of the most distinctive elements that sets Shadow of Mordor apart from other games. The first game is still quite well remembered, even though Shadow of War improved on it. When discussing games with impressive artificial intelligence, it would be unwise to understate the Nemesis System’s potentially limitless applications, and those passionate about it can’t wait to see how other game designers build on the concept.

Darkforest

Facebook has already begun running AI experiments across its product line, including its augmented reality glasses, and games are no exception. Using artificial intelligence, Facebook created Darkforest, a program that plays Go, a game with a nearly infinite number of possible move sequences, well enough to stand in for a human competitor. Darkforest (and its successor, Darkfores2) uses a hybrid of neural networks and search-based techniques to choose its next best action: it anticipates your next move and evaluates it accordingly. Players often regard Darkforest as a formidable AI test. When it counts, a game of Go demands weighing probability, statistics, and tried-and-true methods, and machine learning is used to analyze and play with all of these factors. This AI-versus-human clash is among the toughest to date.

AlphaGo Zero

Go can be played whenever the player wants, and its Chinese roots as a game of trapping your opponent’s stones give it simple rules that make it a fair contest between AI and humans. Like chess, a game of Go ends after all meaningful moves have been made; once the stones have been placed and captured, the winner is the player who controls the most. Like Darkforest, AlphaGo Zero uses advanced search tree algorithms to anticipate moves: one network proposes the next move, while another estimates the eventual winner. The computerized opponent gets smarter over time thanks to machine learning and, unlike humans, it never seems to tire of playing. The artificial intelligence powering AlphaGo has already defeated the best Go players in the world; it’s time for the next challengers to throw their hats in the ring.

Metal Gear Solid V: The Phantom Pain

The artificial intelligence in Metal Gear Solid games is usually quite advanced for its time; as stealth games, they need to feature difficult AI, and Metal Gear Solid V: The Phantom Pain has the best in the series. Each mission in The Phantom Pain can be accomplished in various ways, and the AI implements countermeasures if players rely too much on only one or two strategies. Enemies start donning beefier helmets if they’re repeatedly shot in the head, they bring additional lights if players keep attacking at night, and the military uses mortars to counter snipers who fire from afar. Metal Gear Solid V’s enemies are skilled tacticians who force you to adapt and stay one step ahead of them.

Left 4 Dead 2

The player-versus-player mode in Left 4 Dead 2 is robust, but whether players are engaged in cooperative or competitive play, the AI Director is always present. It determines the location and timing of enemy spawns, the availability of goods, and the number of Special Infected encountered, and its abilities in this area are unparalleled. The AI Director constantly switches things up to keep players guessing: encounters aren’t overcrowded with foes but are delicately calibrated to keep players on edge and feeling threatened. It guarantees that every single run through a campaign will be unique.

Stellaris 

Many AIs in strategy games cannot compete with human players; the complexity and variety of these games make it extremely difficult to create an AI that can provide a fair challenge. Cheating is a common way for games to compensate: sometimes the AI gets a subtle advantage, like more information, and sometimes the benefit is more obvious, like more time or money. Stellaris is an intricate strategy game with a heavy emphasis on the economy, where the aim is to amass resources and expand your realm. At higher difficulties, the AI receives bonuses to keep up, and it quickly catches up if it falls behind. The AI also receives regular updates that expand its capabilities, thanks to Paradox Interactive’s Custodian initiative, and the fact that it can handle all of this is a credit to the designers.

Resident Evil 2

Most enemies in Resident Evil 2 aren’t particularly bright: they shamble toward the player to close the gap and engage in melee combat, which makes perfect sense, since they’re zombies. Mr. X changes everything. Throughout the game, he poses a constant danger to Leon Kennedy and Claire Redfield as they move through the Raccoon City Police Department. Mr. X walks straight at the player, making him easy to kite, but this is done solely so that the game can be completed. As a hunter, he generally exhibits much more nuanced behavior: if he loses track of the player, he will search for them carefully and react to loud noises like gunfire or fighting, and instead of charging in to disturb a fight, he will stand back and watch as a zombie savages the player.

Alien: Isolation

The xenomorph that stalks you for the entirety of Alien: Isolation is a big part of the game’s appeal: it’s a perfect predator and a film horror icon. The game captures Alien’s rising tension as the player learns their opponent is smart. The xenomorph’s intelligence is its most remarkable quality: it remembers the player’s strategies and counters them, raising the difficulty. It becomes increasingly vigilant if the player repeatedly uses the same hiding place, learns to disregard techniques that are used over and over, and eventually figures out how to avoid the player’s flamethrower, causing them to waste ammunition trying to scare it away.



Exploring AVFormer: Google AI’s Innovative Approach to Augment Audio-Only Models with Visual Information & Streamlined Domain Adaptation

One of the biggest obstacles facing automated speech recognition (ASR) systems is their inability to adapt to novel, unbounded domains. Audiovisual ASR (AV-ASR) is a technique for enhancing the accuracy of ASR systems on multimodal video, especially when the audio is noisy. This capability is invaluable for videos shot “in the wild,” where the speaker’s mouth might not be in view. Models for this task are often large, comprising both visual and audio encoders, while the datasets available for the task tend to be small.

Like other AV-ASR work, such models are typically trained and tested only on instructional videos, and as experiments by Google’s research team demonstrate, they perform badly when applied to novel domains from a single training dataset. Meanwhile, several newly released massive audio-only models have been heavily optimized through self-supervised pretraining and large-scale supervised training on audio-only data from audiobook corpora like LibriLight and LibriSpeech. These models have billions of parameters, are widely available, and show impressive cross-domain generalization. The idea is to recycle the massive investment in their training by reusing their weights, drawing inspiration from recent efforts that adapt frozen foundation models for use in a variety of domains.

While retaining the advantages of audio-only pretraining for zero-shot generalization, the resulting models integrate visual inputs in a lightweight manner to enable AV-ASR. The AVFormer framework uses light projection layers and trainable adapters to infuse visual input into a frozen ASR model.

The researchers demonstrate that these components can be trained with minimal extra training time and parameters on a modest amount of weakly labeled video data, which reduces the potential for domain shift and catastrophic forgetting associated with end-to-end fine-tuning. They also incorporate a simple curriculum during training to guarantee consistency in the fine-tuning of these adapters, which they show is essential for the model to correctly interpret auditory and visual data in tandem. Finally, they show that the model beats state-of-the-art zero-shot approaches on three AV-ASR benchmarks from various domains while maintaining respectable performance on audio-only baselines.

The target is zero-shot generalization across AV domains without sacrificing quality on audio-only benchmarks. A state-of-the-art ASR model is used as a starting point and then modified for unconstrained AV-ASR. The following two elements are used to incorporate visual features, derived from a robust pretrained visual model, into the ASR model:

A linear projection layer maps visual features into the audio token embedding space.

Minimally invasive adapters are introduced into the frozen ASR encoder to facilitate domain adaptation.

Here are some of the architecture’s most crucial parts:

A frozen Conformer encoder and decoder

A visual encoder and projection layers for extracting features from images and projecting them into the token space

Lightweight adapter layers added to the audio backbone to enable domain adaptation

To facilitate domain adaptation across modalities, the architecture combines a frozen Conformer encoder-decoder model and a frozen CLIP encoder (frozen layers shown in gray with a lock symbol) with two lightweight trainable modules: a visual projection layer (shown in orange) and bottleneck adapters (shown in blue). The researchers recommend a two-stage curriculum learning approach, with the first phase training the adapters (blue) without any visual tokens and the second phase tuning the visual projection layer (orange) while keeping the rest of the model static.
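
The following is a minimal PyTorch-style sketch of this arrangement, written to illustrate the idea rather than reproduce the authors’ code; the module names, dimensions, and the placement of the adapter (applied to the encoder output here, rather than inside each encoder layer) are simplifying assumptions.

import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    # Small residual adapter; in the paper, adapters of this kind sit inside the frozen encoder layers.
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

class AVFormerSketch(nn.Module):
    # Frozen ASR encoder plus frozen visual encoder; only the visual projection
    # and the adapter are trainable.
    def __init__(self, asr_encoder, visual_encoder, visual_dim, audio_dim):
        super().__init__()
        self.asr_encoder = asr_encoder        # e.g., a Conformer encoder (frozen)
        self.visual_encoder = visual_encoder  # e.g., a CLIP image encoder (frozen)
        for module in (self.asr_encoder, self.visual_encoder):
            for p in module.parameters():
                p.requires_grad = False
        self.visual_projection = nn.Linear(visual_dim, audio_dim)  # trained in phase 2
        self.adapter = BottleneckAdapter(audio_dim)                # trained in phase 1

    def forward(self, audio_tokens, frames):
        with torch.no_grad():
            visual_feats = self.visual_encoder(frames)         # (batch, n_frames, visual_dim)
        visual_tokens = self.visual_projection(visual_feats)   # project into audio token space
        tokens = torch.cat([visual_tokens, audio_tokens], dim=1)  # prepend visual tokens
        return self.adapter(self.asr_encoder(tokens))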

The researchers evaluate AVFormer’s zero-shot performance on the How2, VisSpeech, and Ego4D AV-ASR benchmarks against BEST-RQ, the audio-only version of the model, and AVATAR, the state of the art in AV-ASR. AVFormer surpasses both even when AVATAR and BEST-RQ are trained on LibriSpeech and the complete HowTo100M dataset. Notably, that training requires updating 600M parameters for BEST-RQ but only 4M parameters for AVFormer, which therefore needs only a small subset of the training data (5% of HowTo100M). They also compare AVFormer to an audio-only baseline trained on LibriSpeech and find that it outperforms that as well.

Zero-shot performance is compared with the state of the art on several AV-ASR datasets, and performance on the audio-only LibriSpeech benchmark is also reported. Lower WER percentages indicate better performance. While AVATAR and BEST-RQ are fine-tuned on all of HowTo100M, AVFormer’s small set of fine-tuned parameters allows it to perform well with as little as 5% of the dataset.

The researchers present AVFormer as an efficient tool for adapting frozen, state-of-the-art ASR models to AV-ASR. The method is practical and effective, as its zero-shot performance shows. As ASR models grow in size and complexity, tuning the full parameter set of pretrained models across domains becomes problematic; this method is parameter efficient, allowing simultaneous domain transfer and visual input blending.

Check out the paper and blog article for more details.


Meet STEVE-1: An Instructable Generative AI Model For Minecraft That Follows Both Text And Visual Instructions And Only Costs $60 To Train

Powerful AI models can now be operated and interacted with via language commands, making them widely accessible and adaptable. Stable Diffusion, which turns natural language into a picture, and ChatGPT, which can respond to messages written in natural language and carry out various tasks, are examples of such models. While training these models can cost anywhere from tens of thousands to millions of dollars, there has been a similarly exciting development: strong open-source foundation models, such as LLaMA, can be fine-tuned with surprisingly little computation and data to become instruction-following.

In this research, researchers from the University of Toronto and the Vector Institute for Artificial Intelligence investigate the viability of such a strategy in sequential decision-making domains. Unlike in the text and image domains, diverse data for sequential decision-making is highly costly to collect and frequently lacks an easy-to-use “instruction” label of the kind captions provide for pictures. Building on previous developments in instruction-tuned LLMs like Alpaca, they suggest adapting pretrained generative behavior models using instruction data. Two foundation models for the well-known open-ended video game Minecraft were released in the last year: MineCLIP, a model for aligning text and video clips, and VPT, a model of behavior.

This has created a fascinating opportunity to investigate instruction-following in Minecraft’s sequential decision-making domain. Because VPT was trained on 70,000 hours of Minecraft playtime, the agent has an extensive understanding of the Minecraft world. Just as the enormous potential of LLMs was unlocked by aligning them to follow instructions, the VPT model could exhibit broad, controllable behavior if it is fine-tuned to follow directions. Specifically, the researchers show how to fine-tune VPT to follow short-horizon text instructions using just $60 of compute and around 2,000 instruction-labeled trajectory segments.

Their methodology is influenced by unCLIP, which was used to develop the well-known text-to-image model DALL-E 2. They break the challenge of designing an instruction-following Minecraft agent into two parts: a VPT model fine-tuned to accomplish visual goals represented in the MineCLIP latent space, and a prior model that converts text instructions into MineCLIP visual embeddings. Rather than relying on expensive text-instruction labels, they fine-tune VPT via behavioral cloning on self-supervised data produced by hindsight relabeling, using visual MineCLIP embeddings as goals.
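
As a rough illustration of hindsight relabeling in this setting, the sketch below pairs each timestep with the embedding of what actually happened a little later in the same trajectory, so the data effectively labels itself; the function names, data layout, and fixed horizon are assumptions made for illustration rather than the authors’ implementation.

def hindsight_relabel(trajectory, mineclip_embed, horizon=16):
    # trajectory: list of (observation, action, frame) tuples from unlabeled gameplay
    # mineclip_embed: function mapping a short clip of frames to a latent goal vector
    examples = []
    for t in range(len(trajectory) - horizon):
        obs, action, _ = trajectory[t]
        # The "goal" is whatever the player actually achieved `horizon` steps later.
        future_frames = [frame for _, _, frame in trajectory[t + 1 : t + 1 + horizon]]
        goal = mineclip_embed(future_frames)
        examples.append((obs, goal, action))  # train the policy to reach this goal
    return examples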

They combine this unCLIP-style decomposition with classifier-free guidance to develop their agent, dubbed STEVE-1, which considerably exceeds the benchmark set by Baker et al. for open-ended command following in Minecraft using low-level controls (mouse and keyboard) and raw pixel inputs.
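
Classifier-free guidance, as it is commonly formulated, amounts to interpolating between goal-conditioned and unconditioned predictions at inference time. The sketch below shows that generic idea with an assumed policy interface; it is not STEVE-1’s actual code.

def guided_action_logits(policy, observation, goal_embedding, guidance_scale=3.0):
    # Assumes policy(observation, goal) returns action logits and accepts goal=None
    # to mean "no conditioning"; both are conventions of this sketch.
    cond = policy(observation, goal_embedding)  # goal-conditioned prediction
    uncond = policy(observation, None)          # unconditioned prediction
    # Push the output toward the conditioned prediction, away from the unconditioned one.
    return uncond + guidance_scale * (cond - uncond)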

The following are their primary contributions: 

• They develop STEVE-1, a Minecraft agent that executes open-ended text and visual commands with high accuracy. They conduct in-depth analyses of the agent, demonstrating that it can carry out a variety of short-horizon tasks in Minecraft, and show that straightforward prompt chaining can significantly boost performance on longer-horizon operations like building and crafting.

• They explain how to build STEVE-1 with just $60 of compute, demonstrating that unCLIP and classifier-free guidance are crucial for strong performance in sequential decision-making.

• They release the STEVE-1 model weights, evaluation scripts, and training scripts to encourage future study of instructable, open-ended sequential decision-making agents.

The website has video demos of the agent in the game.

Check out the paper, code, and project page for more details.


Accelerate PyTorch with DeepSpeed to train large language models with Intel Habana Gaudi-based DL1 EC2 instances

Training large language models (LLMs) with billions of parameters can be challenging. In addition to designing the model architecture, researchers need to set up state-of-the-art distributed training techniques like mixed precision support, gradient accumulation, and checkpointing. With large models, the training setup is even more challenging because the available memory in a single accelerator device bounds the size of models that can be trained using only data parallelism, and model parallel training requires an additional level of modification to the training code. Libraries such as DeepSpeed (an open-source deep learning optimization library for PyTorch) address some of these challenges and can help accelerate model development and training.
In this post, we set up training on the Intel Habana Gaudi-based Amazon Elastic Compute Cloud (Amazon EC2) DL1 instances and quantify the benefits of using a scaling framework such as DeepSpeed. We present scaling results for an encoder-type transformer model (BERT with 340 million to 1.5 billion parameters). For the 1.5-billion-parameter model, we achieved a scaling efficiency of 82.7% across 128 accelerators (16 dl1.24xlarge instances) using DeepSpeed ZeRO stage 1 optimizations. The optimizer states were partitioned by DeepSpeed to train large models using the data parallel paradigm. This approach has been extended to train a 5-billion-parameter model using data parallelism. We also used Gaudi’s native support of the BF16 data type for reduced memory size and increased training performance compared to using the FP32 data type. As a result, we achieved pre-training (phase 1) model convergence within 16 hours (our target was to train a large model within a day) for the BERT 1.5-billion-parameter model using the wikicorpus-en dataset.
Training setup
We provisioned a managed compute cluster comprising 16 dl1.24xlarge instances using AWS Batch. We developed an AWS Batch workshop that illustrates the steps to set up the distributed training cluster with AWS Batch. Each dl1.24xlarge instance has eight Habana Gaudi accelerators, each with 32 GB of memory and a full mesh RoCE network between cards, with a total bi-directional interconnect bandwidth of 700 Gbps each (see Amazon EC2 DL1 instances Deep Dive for more information). The dl1.24xlarge cluster also used four AWS Elastic Fabric Adapters (EFA), with a total of 400 Gbps interconnect between nodes.
The workshop shows the distributed training setup using AWS Batch and, in particular, the multi-node parallel jobs feature to launch large-scale containerized training jobs on fully managed clusters. More specifically, a fully managed AWS Batch compute environment is created with DL1 instances. The containers are pulled from Amazon Elastic Container Registry (Amazon ECR) and launched automatically into the instances in the cluster based on the multi-node parallel job definition. The workshop concludes by running a multi-node, multi-HPU data parallel training of a BERT model (340 million to 1.5 billion parameters) using PyTorch and DeepSpeed.
BERT 1.5B pre-training with DeepSpeed
Habana SynapseAI v1.5 and v1.6 support DeepSpeed ZeRO1 optimizations. The Habana fork of the DeepSpeed GitHub repository includes the modifications necessary to support the Gaudi accelerators. There is full support of distributed data parallel (multi-card, multi-instance), ZeRO1 optimizations, and BF16 data types.
All these features are enabled in the BERT 1.5B model reference repository, which introduces a 48-layer, 1,600-hidden-dimension, 25-head bi-directional encoder model derived from a BERT implementation. The repository also contains the baseline BERT Large model implementation: a 24-layer, 1,024-hidden, 16-head, 340-million-parameter neural network architecture. The pre-training modeling scripts are derived from the NVIDIA Deep Learning Examples repository to download the wikicorpus_en data, preprocess the raw data into tokens, and shard the data into smaller h5 datasets for distributed data parallel training. You can adopt this generic approach to train your custom PyTorch model architectures on your own datasets using DL1 instances.
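
For orientation, the following is a minimal sketch of how a training script typically hands DeepSpeed a ZeRO stage 1 plus BF16 configuration; the exact settings used in the reference repository (and any Habana-specific initialization) may differ, so treat the model and the values below as placeholders.

import torch
import deepspeed

# Toy stand-in model; the reference repository uses BERT, but any nn.Module works here.
model = torch.nn.Linear(1024, 1024)

ds_config = {
    "train_micro_batch_size_per_gpu": 16,
    "gradient_accumulation_steps": 24,   # 16 x 24 = 384 effective batch size per accelerator
    "bf16": {"enabled": True},           # BF16 reduces memory and boosts throughput vs. FP32
    "zero_optimization": {"stage": 1},   # ZeRO stage 1 partitions optimizer states
    "optimizer": {"type": "Adam", "params": {"lr": 0.0015}},
}

model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# Training then follows the usual DeepSpeed pattern:
#   loss = model_engine(inputs)
#   model_engine.backward(loss)
#   model_engine.step()
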
Pre-training (phase 1) scaling results
For pre-training large models at scale, we mainly focused on two aspects of the solution: training performance, as measured by the time to train, and cost-effectiveness of arriving at a fully converged solution. Next, we dive deeper into these two metrics with BERT 1.5B pre-training as an example.
Scaling performance and time to train
We start by measuring the performance of the BERT Large implementation as a baseline for scalability. The following table lists the measured throughput of sequences per second from 1-8 dl1.24xlarge instances (with eight accelerator devices per instance). Using the single-instance throughput as baseline, we measured the efficiency of scaling across multiple instances, which is an important lever to understand the price-performance training metric.

Number of Instances | Number of Accelerators | Sequences per Second | Sequences per Second per Accelerator | Scaling Efficiency
1 | 8 | 1,379.76 | 172.47 | 100.0%
2 | 16 | 2,705.57 | 169.10 | 98.04%
4 | 32 | 5,291.58 | 165.36 | 95.88%
8 | 64 | 9,977.54 | 155.90 | 90.39%

The following figure illustrates the scaling efficiency.
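
As a sanity check, the scaling efficiency figures in the table above can be reproduced from the raw throughput numbers; the short calculation below shows how (the values are copied from the table).

# Measured throughput (sequences/second) at 8, 16, 32, and 64 accelerators
measurements = {8: 1379.76, 16: 2705.57, 32: 5291.58, 64: 9977.54}

baseline_per_accel = measurements[8] / 8  # 172.47 sequences/second per accelerator

for accelerators, seq_per_sec in measurements.items():
    per_accel = seq_per_sec / accelerators
    efficiency = per_accel / baseline_per_accel
    print(f"{accelerators} accelerators: {per_accel:.2f} seq/s per accelerator, "
          f"{efficiency:.2%} scaling efficiency")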

For BERT 1.5B, we modified the hyperparameters for the model in the reference repository to guarantee convergence. The effective batch size per accelerator was set to 384 (for maximum memory utilization), with micro-batches of 16 per step and 24 steps of gradient accumulation. Learning rates of 0.0015 and 0.003 were used for 8 and 16 nodes, respectively. With these configurations, we achieved convergence of the phase 1 pre-training of BERT 1.5B across 8 dl1.24xlarge instances (64 accelerators) in approximately 25 hours, and 15 hours across 16 dl1.24xlarge instances (128 accelerators). The following figure shows the average loss as a function of number of training epochs, as we scale up the number of accelerators.

With the configuration described earlier, we obtained 85% strong scaling efficiency with 64 accelerators and 83% with 128 accelerators, from a baseline of 8 accelerators in a single instance. The following table summarizes the parameters.

Number of Instances | Number of Accelerators | Sequences per Second | Sequences per Second per Accelerator | Scaling Efficiency
1 | 8 | 276.66 | 34.58 | 100.0%
8 | 64 | 1,883.63 | 29.43 | 85.1%
16 | 128 | 3,659.15 | 28.59 | 82.7%

The following figure illustrates the scaling efficiency.

Conclusion
In this post, we evaluated support for DeepSpeed by Habana SynapseAI v1.5/v1.6 and how it helps scale LLM training on Habana Gaudi accelerators. Pre-training of a 1.5-billion-parameter BERT model took 16 hours to converge on a cluster of 128 Gaudi accelerators, with 82.7% strong scaling efficiency. We encourage you to take a look at the architecture demonstrated in the AWS workshop and consider adopting it to train custom PyTorch model architectures using DL1 instances.

About the authors
Mahadevan Balasubramaniam is a Principal Solutions Architect for Autonomous Computing with nearly 20 years of experience in the area of physics-infused deep learning, building, and deploying digital twins for industrial systems at scale. Mahadevan obtained his PhD in Mechanical Engineering from the Massachusetts Institute of Technology and has over 25 patents and publications to his credit.
RJ is an engineer in Search M5 team leading the efforts for building large scale deep learning systems for training and inference. Outside of work he explores different cuisines of food and plays racquet sports.
Sundar Ranganathan is the Head of Business Development, ML Frameworks on the Amazon EC2 team. He focuses on large-scale ML workloads across AWS services like Amazon EKS, Amazon ECS, Elastic Fabric Adapter, AWS Batch, and Amazon SageMaker. His experience includes leadership roles in product management and product development at NetApp, Micron Technology, Qualcomm, and Mentor Graphics.
Abhinandan Patni is a Senior Software Engineer at Amazon Search. He focuses on building systems and tooling for scalable distributed deep learning training and real time inference.
Pierre-Yves Aquilanti is Head of Frameworks ML Solutions at Amazon Web Services where he helps develop the industry’s best cloud based ML Frameworks solutions. His background is in High Performance Computing and prior to joining AWS, Pierre-Yves was working in the Oil & Gas industry. Pierre-Yves is originally from France and holds a Ph.D. in Computer Science from the University of Lille.

Retrain ML models and automate batch predictions in Amazon SageMaker Canvas

You can now retrain machine learning (ML) models and automate batch prediction workflows with updated datasets in Amazon SageMaker Canvas, making it easier to continually improve model performance and drive efficiency. An ML model’s effectiveness depends on the quality and relevance of the data it’s trained on. As time progresses, the underlying patterns, trends, and distributions in the data may change. By updating the dataset, you ensure that the model learns from the most recent and representative data, improving its ability to make accurate predictions. Canvas now supports updating datasets automatically and manually, enabling you to use the latest version of the tabular, image, and document dataset for training ML models.
After the model is trained, you may want to run predictions on it. Running batch predictions on an ML model enables processing multiple data points simultaneously instead of making predictions one by one. Automating this process provides efficiency, scalability, and timely decision-making. After the predictions are generated, they can be further analyzed, aggregated, or visualized to gain insights, identify patterns, or make informed decisions based on the predicted outcomes. Canvas now supports setting up an automated batch prediction configuration and associating a dataset to it. When the associated dataset is refreshed, either manually or on a schedule, a batch prediction workflow will be triggered automatically on the corresponding model. Results of the predictions can be viewed inline or downloaded for later review.
In this post, we show how to retrain ML models and automate batch predictions using updated datasets in Canvas.
Overview of solution
For our use case, we play the part of a business analyst for an ecommerce company. Our product team wants us to determine the most critical metrics that influence a shopper’s purchase decision. For this, we train an ML model in Canvas with a customer website online session dataset from the company. We evaluate the model’s performance and, if needed, retrain the model with additional data to see whether it improves on the existing model. To do so, we use the auto update dataset capability in Canvas and retrain our existing ML model with the latest version of the training dataset. Then we configure automatic batch prediction workflows: when the corresponding prediction dataset is updated, a batch prediction job is automatically triggered on the model and the results are made available for us to review.
The workflow steps are as follows:

Upload the downloaded customer website online session data to Amazon Simple Storage Service (Amazon S3) and create a new training dataset in Canvas. For the full list of supported data sources, refer to Importing data in Amazon SageMaker Canvas.
Build ML models and analyze their performance metrics. Refer to the steps on how to build a custom ML Model in Canvas and evaluate a model’s performance.
Set up auto update on the existing training dataset and upload new data to the Amazon S3 location backing this dataset. Upon completion, it should create a new dataset version.
Use the latest version of the dataset to retrain the ML model and analyze its performance.
Set up automatic batch predictions on the better performing model version and view the prediction results.

You can perform these steps in Canvas without writing a single line of code.
Overview of data
The dataset consists of feature vectors belonging to 12,330 sessions. The dataset was formed so that each session would belong to a different user in a 1-year period to avoid any tendency to a specific campaign, special day, user profile, or period. The following table outlines the data schema.

Column Name | Data Type | Description
Administrative | Numeric | Number of pages visited by the user for user account management-related activities.
Administrative_Duration | Numeric | Amount of time spent in this category of pages.
Informational | Numeric | Number of pages of this type (informational) that the user visited.
Informational_Duration | Numeric | Amount of time spent in this category of pages.
ProductRelated | Numeric | Number of pages of this type (product related) that the user visited.
ProductRelated_Duration | Numeric | Amount of time spent in this category of pages.
BounceRates | Numeric | Percentage of visitors who enter the website through that page and exit without triggering any additional tasks.
ExitRates | Numeric | Average exit rate of the pages visited by the user. This is the percentage of people who left the site from that page.
Page Values | Numeric | Average page value of the pages visited by the user. This is the average value for a page that a user visited before landing on the goal page or completing an ecommerce transaction (or both).
SpecialDay | Binary | Indicates the closeness of the site visiting time to a specific special day (such as Mother’s Day or Valentine’s Day), when sessions are more likely to end with a transaction.
Month | Categorical | Month of the visit.
OperatingSystems | Categorical | Operating system of the visitor.
Browser | Categorical | Browser used by the visitor.
Region | Categorical | Geographic region from which the session was started by the visitor.
TrafficType | Categorical | Traffic source through which the user entered the website.
VisitorType | Categorical | Whether the customer is a new user, returning user, or other.
Weekend | Binary | Whether the customer visited the website on the weekend.
Revenue | Binary | Whether a purchase was made.

Revenue is the target column, which we will use to predict whether or not a shopper will purchase a product.
The first step is to download the dataset that we will use. Note that this dataset is courtesy of the UCI Machine Learning Repository.
Prerequisites
For this walkthrough, complete the following prerequisite steps:

Split the downloaded CSV that contains 20,000 rows into multiple smaller chunk files (a short script sketch for this data preparation follows this list).

This is so that we can showcase the dataset update functionality. Ensure all the CSV files have the same headers; otherwise, you may run into schema mismatch errors while creating a training dataset in Canvas.

Create an S3 bucket and upload online_shoppers_intentions1-3.csv to the S3 bucket.

Set aside 1,500 rows from the downloaded CSV to run batch predictions on after the ML model is trained.
Remove the Revenue column from these files so that when you run batch prediction on the ML model, that is the value your model will be predicting.

Ensure all the predict*.csv files have the same headers; otherwise, you may run into schema mismatch errors while creating a prediction (inference) dataset in Canvas.

Perform the necessary steps to set up a SageMaker domain and Canvas app.
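
If you prefer to script this preparation, the following is a minimal sketch using pandas and Boto3; the file names, split sizes, and S3 key prefixes are placeholders that mirror the steps above and are not part of the Canvas workflow itself.

import boto3
import pandas as pd

bucket = "dataset-update-demo"  # the demo bucket referenced later in this post
df = pd.read_csv("online_shoppers_intention.csv")  # the downloaded dataset

# Hold back rows for batch predictions and drop the target column from them
predict_df = df.tail(1500).drop(columns=["Revenue"])
train_df = df.iloc[: len(df) - 1500]

# Split the training data into six chunk files with identical headers
s3 = boto3.client("s3")
chunk_size = len(train_df) // 6 + 1
for i in range(6):
    chunk = train_df.iloc[i * chunk_size : (i + 1) * chunk_size]
    filename = f"online_shoppers_intentions{i + 1}.csv"
    chunk.to_csv(filename, index=False)
    # Upload only the first three chunks now; the remaining files are uploaded
    # later to trigger the automatic dataset update.
    if i < 3:
        s3.upload_file(filename, bucket, filename)

# One 500-row file for the prediction dataset; more predict*.csv files can follow later
predict_df.head(500).to_csv("predict1.csv", index=False)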

Create a dataset
To create a dataset in Canvas, complete the following steps:

In Canvas, choose Datasets in the navigation pane.
Choose Create and choose Tabular.
Give your dataset a name. For this post, we call our training dataset OnlineShoppersIntentions.
Choose Create.
Choose your data source (for this post, our data source is Amazon S3).

Note that as of this writing, the dataset update functionality is only supported for Amazon S3 and locally uploaded data sources.

Select the corresponding bucket and upload the CSV files for the dataset.

You can now create a dataset with multiple files.

Preview all the files in the dataset and choose Create dataset.

We now have version 1 of the OnlineShoppersIntentions dataset with three files created.

Choose the dataset to view the details.

The Data tab shows a preview of the dataset.

Choose Dataset details to view the files that the dataset contains.

The Dataset files pane lists the available files.

Choose the Version History tab to view all the versions for this dataset.

We can see our first dataset version has three files. Any subsequent version will include all the files from previous versions and will provide a cumulative view of the data.

Train an ML model with version 1 of the dataset
Let’s train an ML model with version 1 of our dataset.

In Canvas, choose My models in the navigation pane.
Choose New model.
Enter a model name (for example, OnlineShoppersIntentionsModel), select the problem type, and choose Create.
Select the dataset. For this post, we select the OnlineShoppersIntentions dataset.

By default, Canvas will pick up the most current dataset version for training.

On the Build tab, choose the target column to predict. For this post, we choose the Revenue column.
Choose Quick build.

The model training will take 2–5 minutes to complete. In our case, the trained model gives us a score of 89%.

Set up automatic dataset updates
Let’s update our dataset using the auto update functionality, bring in more data, and see if the model performance improves with the new version of the dataset. Datasets can also be updated manually.

On the Datasets page, select the OnlineShoppersIntentions dataset and choose Update dataset.
You can either choose Manual update, which is a one-time update option, or Automatic update, which allows you to automatically update your dataset on a schedule. For this post, we showcase the automatic update feature.

You’re redirected to the Auto update tab for the corresponding dataset. We can see that Enable auto update is currently disabled.

Toggle Enable auto update to on and specify the data source (as of this writing, Amazon S3 data sources are supported for auto updates).
Select a frequency and enter a start time.
Save the configuration settings.

An auto update dataset configuration has been created. It can be edited at any time. When a corresponding dataset update job is triggered on the specified schedule, the job will appear in the Job history section.

Next, let’s upload the online_shoppers_intentions4.csv, online_shoppers_intentions5.csv, and online_shoppers_intentions6.csv files to our S3 bucket.

We can view our files in the dataset-update-demo S3 bucket.

The dataset update job will get triggered at the specified schedule and create a new version of the dataset.

When the job is complete, dataset version 2 will have all the files from version 1 and the additional files processed by the dataset update job. In our case, version 1 has three files and the update job picked up three additional files, so the final dataset version has six files.

We can view the new version that was created on the Version history tab.

The Data tab contains a preview of the dataset and provides a list of all the files in the latest version of the dataset.

Retrain the ML model with an updated dataset
Let’s retrain our ML model with the latest version of the dataset.

On the My models page, choose your model.
Choose Add version.
Select the latest dataset version (v2 in our case) and choose Select dataset.
Keep the target column and build configuration similar to the previous model version.

When the training is complete, let’s evaluate the model performance. The following screenshot shows that adding additional data and retraining our ML model has helped improve our model performance.

Create a prediction dataset
With an ML model trained, let’s create a dataset for predictions and run batch predictions on it.

On the Datasets page, create a tabular dataset.
Enter a name and choose Create.
In our S3 bucket, upload one file with 500 rows to predict.

Next, we set up auto updates on the prediction dataset.

Toggle Enable auto update to on and specify the data source.
Select the frequency and specify a starting time.
Save the configuration.

Automate the batch prediction workflow on an auto updated predictions dataset
In this step, we configure our auto batch prediction workflows.

On the My models page, navigate to version 2 of your model.
On the Predict tab, choose Batch prediction and Automatic.
Choose Select dataset to specify the dataset to generate predictions on.
Select the predict dataset that we created earlier and choose Choose dataset.
Choose Set up.

We now have an automatic batch prediction workflow. This will be triggered when the Predict dataset is automatically updated.

Now let’s upload more CSV files to the predict S3 folder.

This operation will trigger an auto update of the predict dataset.

This will in turn trigger the automatic batch prediction workflow and generate predictions for us to view.

We can view all automations on the Automations page.

Thanks to the automatic dataset update and automatic batch prediction workflows, we can use the latest version of the tabular, image, and document dataset for training ML models, and build batch prediction workflows that get automatically triggered on every dataset update.
Clean up
To avoid incurring future charges, log out of Canvas. Canvas bills you for the duration of the session, and we recommend logging out of Canvas when you’re not using it. Refer to Logging out of Amazon SageMaker Canvas for more details.
Conclusion
In this post, we discussed how we can use the new dataset update capability to build new dataset versions and train our ML models with the latest data in Canvas. We also showed how we can efficiently automate the process of running batch predictions on updated data.
To start your low-code/no-code ML journey, refer to the Amazon SageMaker Canvas Developer Guide.
Special thanks to everyone who contributed to the launch.

About the Authors
Janisha Anand is a Senior Product Manager on the SageMaker No/Low-Code ML team, which includes SageMaker Canvas and SageMaker Autopilot. She enjoys coffee, staying active, and spending time with her family.
Prashanth is a Software Development Engineer at Amazon SageMaker and mainly works with SageMaker low-code and no-code products.
Esha Dutta is a Software Development Engineer at Amazon SageMaker. She focuses on building ML tools and products for customers. Outside of work, she enjoys the outdoors, yoga, and hiking.

Expedite the Amazon Lex chatbot development lifecycle with Test Workbench

Amazon Lex is excited to announce Test Workbench, a new bot testing solution that provides tools to simplify and automate the bot testing process. During bot development, testing is the phase where developers check whether a bot meets the specific requirements, needs, and expectations by identifying errors, defects, or bugs in the system before scaling. Testing helps validate bot performance on several fronts, such as conversational flow (understanding user queries and responding accurately), intent overlap handling, and consistency across modalities. However, testing is often manual, error-prone, and non-standardized. Test Workbench standardizes automated test management by allowing chatbot development teams to generate, maintain, and execute test sets with a consistent methodology and avoid custom scripting and ad-hoc integrations.
In this post, you will learn how Test Workbench streamlines automated testing of a bot’s voice and text modalities and provides accuracy and performance measures for parameters such as audio transcription, intent recognition, and slot resolution for both single utterance inputs and multi-turn conversations. This allows you to quickly identify bot improvement areas and maintain a consistent baseline to measure accuracy over time and observe any accuracy regression due to bot updates.
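
For context, the kind of ad-hoc test harness that Test Workbench replaces often looks something like the Boto3 sketch below, where each utterance is sent to the bot and the recognized intent is compared against an expected value; the bot ID, alias ID, and test cases are placeholders.

import boto3

lex_runtime = boto3.client("lexv2-runtime")

# Hypothetical expected results for a retail bot
test_cases = [
    {"utterance": "I want to order a t-shirt", "expected_intent": "OrderItem"},
    {"utterance": "Where is my package", "expected_intent": "CheckOrderStatus"},
]

for i, case in enumerate(test_cases):
    response = lex_runtime.recognize_text(
        botId="BOT_ID",             # placeholder
        botAliasId="BOT_ALIAS_ID",  # placeholder
        localeId="en_US",
        sessionId=f"test-session-{i}",
        text=case["utterance"],
    )
    recognized = response["sessionState"]["intent"]["name"]
    status = "PASS" if recognized == case["expected_intent"] else "FAIL"
    print(f"{status}: '{case['utterance']}' -> {recognized}")
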
Amazon Lex is a fully managed service for building conversational voice and text interfaces. Amazon Lex helps you build and deploy chatbots and virtual assistants on websites, contact center services, and messaging channels. Amazon Lex bots help increase interactive voice response (IVR) productivity, automate simple tasks, and drive operational efficiencies across the organization. Test Workbench for Amazon Lex standardizes and simplifies the bot testing lifecycle, which is critical to improving bot design.
Features of Test Workbench
Test Workbench for Amazon Lex includes the following features:

Generate test datasets automatically from a bot’s conversation logs
Upload manually built test set baselines
Perform end-to-end testing of single input or multi-turn conversations
Test both audio and text modalities of a bot
Review aggregated and drill-down metrics for bot dimensions:

Speech transcription
Intent recognition
Slot resolution (including multi-valued slots or composite slots)
Context tags
Session attributes
Request attributes
Runtime hints
Time delay in seconds

Prerequisites
To test this feature, you should have the following:

An AWS account with administrator access
A sample retail bot imported via the Amazon Lex console (for more information, refer to importing a bot)
A test set source, either from:

Conversation logs enabled for the bot to store bot interactions, or
A sample retail test set that can be imported following the instructions provided in this post

In addition, you should have knowledge and understanding of the following services and features:

Amazon Lex
Amazon CloudWatch
AWS Identity and Access Management (IAM)

Create a test set
To create your test set, complete the following steps:

On the Amazon Lex console, under Test workbench in the navigation pane, choose Test sets.

You can review a list of existing test sets, including basic information such as name, description, number of test inputs, modality, and status. In the following steps, you can choose between generating a test set from the conversation logs associated with the bot or uploading an existing manually built test set in a CSV file format.

Choose Create test set.

Generating test sets from conversation logs allows you to do the following:

Include real multi-turn conversations from the bot’s logs in CloudWatch
Include audio logs and conduct tests that account for real speech nuances, background noises, and accents
Speed up the creation of test sets

Uploading a manually built test set allows you to do the following:

Test new bots for which there is no production data
Perform regression tests on existing bots for any new or modified intents, slots, and conversation flows
Test carefully crafted and detailed scenarios that specify session attributes and request attributes

To generate a test set, complete the following steps. To upload a manually built test set, skip to step 7.

Choose Generate a baseline test set.
Choose your options for Bot name, Bot alias, and Language.
For Time range, set a time range for the logs.
For Existing IAM role, choose a role.

Ensure that the IAM role is able to grant you access to retrieve information from the conversation logs. Refer to Creating IAM roles to create an IAM role with the appropriate policy.

If you prefer to use a manually created test set, select Upload a file to this test set.
For Upload a file to this test set, choose from the following options:

Select Upload from S3 bucket to upload a CSV file from an Amazon Simple Storage Service (Amazon S3) bucket.
Select Upload a file to this test set to upload a CSV file from your computer.

You can use the sample test set provided in this post. For more information about templates, choose the CSV Template link on the page.

For Modality, select the modality of your test set, either Text or Audio.

Test Workbench provides testing support for audio and text input formats.

For S3 location, enter the S3 bucket location where the results will be stored.
Optionally, choose an AWS Key Management Service (AWS KMS) key to encrypt output transcripts.
Choose Create.

Your newly created test set will be listed on the Test sets page with one of the following statuses:

Ready for annotation – For test sets generated from Amazon Lex bot conversation logs, the annotation step serves as a manual gating mechanism to ensure quality test inputs. By annotating values for expected intents and expected slots for each test line item, you indicate the “ground truth” for that line. The test results from the bot run are collected and compared against the ground truth to mark test results as pass or fail. This line level comparison then allows for creating aggregated measures.
Ready for testing – This indicates that the test set is ready to be executed against an Amazon Lex bot.
Validation error – Uploaded test files are checked for errors such as exceeding maximum supported length, invalid characters in intent names, or invalid Amazon S3 links containing audio files. If the test set is in the Validation error state, download the file showing the validation details to see test input issues or errors on a line-by-line basis. Once they are addressed, you can manually upload the corrected test set CSV into the test set.

Execute a test set
A test set is decoupled from a bot, so the same test set can be run against a different bot or bot alias in the future as your business use case evolves. To report performance metrics of a bot against the baseline test data, complete the following steps:

Import the sample bot definition and build the bot (refer to Importing a bot for guidance).
On the Amazon Lex console, choose Test sets in the navigation pane.
Choose your validated test set.

Here you can review basic information about the test set and the imported test data.

Choose Execute test.
Choose the appropriate options for Bot name, Bot alias, and Language.
For Test type, select Audio or Text.
For Endpoint selection, select either Streaming or Non-streaming.
Choose Validate discrepancy to validate your test dataset.

Before executing a test set, you can validate test coverage, including identifying intents and slots present in the test set but not in the bot. This early warning helps set tester expectations for test failures that would otherwise be unexpected. If discrepancies between your test dataset and your bot are detected, the Execute test page updates with the View details button.

Intents and slots found in the test data set but not in the bot alias are listed as shown in the following screenshots.

After you validate the discrepancies, choose Execute to run the test.
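If you prefer to drive the same run programmatically, the Test Workbench operations are also exposed through the Amazon Lex V2 models API. The following is a minimal sketch using boto3; the IDs are placeholders, and the parameter names shown here are assumptions based on the console workflow, so verify them against the lexv2-models API reference:

import boto3

lex_models = boto3.client("lexv2-models")

# Start a text, non-streaming test run against a bot alias (all IDs are placeholders)
response = lex_models.start_test_execution(
    testSetId="EXAMPLETESTSETID",
    target={
        "botAliasTarget": {
            "botId": "EXAMPLEBOTID",
            "botAliasId": "TSTALIASID",
            "localeId": "en_US",
        }
    },
    apiMode="NonStreaming",
    testExecutionModality="Text",
)
print("Started test execution:", response["testExecutionId"])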

Review results
The performance measures generated after executing a test set help you identify areas of bot design that need improvements and are useful for expediting bot development and delivery to support your customers. Test Workbench provides insights on intent classification and slot resolution in end-to-end conversation and single-line input level. The completed test runs are stored with timestamps in your S3 bucket, and can be used for future comparative reviews.

On the Amazon Lex console, choose Test results in the navigation pane.
Choose the test result ID for the results you want to review.

On the next page, the test results include a breakdown organized into four main tabs: Overall results, Conversation results, Intent and slot results, and Detailed results.
Overall results
The Overall results tab contains three main sections:

Test set input breakdown — A chart showing the total number of end-to-end conversations and single input utterances in the test set.
Single input breakdown — A chart showing the number of passed or failed single inputs.
Conversation breakdown — A chart showing the number of passed or failed multi-turn inputs.

For test sets run in audio modality, speech transcription charts are provided to show the number of passed or failed speech transcriptions on both single input and conversation types. In audio modality, a single input or multi-turn conversation could pass the speech transcription test, yet fail the overall end-to-end test. This can be caused, for instance, by a slot resolution or an intent recognition issue.

Conversation results
Test Workbench helps you drill down into conversation failures that can be attributed to specific intents or slots. The Conversation results tab is organized into three main areas, covering all intents and slots used in the test set:

Conversation pass rates — A table used to visualize which intents and slots are responsible for possible conversation failures.
Conversation intent failure metrics — A bar graph showing the top five worst performing intents in the test set, if any.
Conversation slot failure metrics — A bar graph showing the top five worst performing slots in the test set, if any.

Intent and slot results
The Intent and slot results tab provides drill-down metrics for bot dimensions such as intent recognition and slot resolution.

Intent recognition metrics — A table showing the intent recognition success rate.
Slot resolution metrics — A table showing the slot resolution success rate, by each intent.

Detailed results
You can access a detailed report of the executed test run on the Detailed results tab. A table is displayed to show the actual transcription, output intent, and slot values in a test set. The report can be downloaded as a CSV for further analysis.

The line-level output provides insights to help improve the bot design and boost accuracy. For instance, misrecognized or missed speech inputs such as branded words can be added to a custom vocabulary or as utterances under an intent.
To further improve conversation design, you can refer to this post, which outlines best practices for using ML to create a bot that will delight your customers by accurately understanding them.
Conclusion
In this post, we presented the Test Workbench for Amazon Lex, a native capability that standardizes a chatbot automated testing process and allows developers and conversation designers to streamline and iterate quickly through bot design and development.
We look forward to hearing how you use this new functionality of Amazon Lex and welcome feedback! For any questions, bugs, or feature requests, please reach us through AWS re:Post for Amazon Lex or your AWS Support contacts.
To learn more, see Amazon Lex FAQs and the Amazon Lex V2 Developer Guide.

About the authors
Sandeep Srinivasan is a Product Manager on the Amazon Lex team. As a keen observer of human behavior, he is passionate about customer experience. He spends his waking hours at the intersection of people, technology, and the future.
Grazia Russo Lassner is a Senior Consultant with the AWS Professional Services Natural Language AI team. She specializes in designing and developing conversational AI solutions using AWS technologies for customers in various industries. Outside of work, she enjoys beach weekends, reading the latest fiction books, and family.

Using An Artificial Intelligence Algorithm, Researchers at MIT and McMaster University have identified a new Antibiotic that can Kill a Type of Bacteria that is Responsible for Many Drug-Resistant Infections

MIT and McMaster University researchers have utilized artificial intelligence (AI) to discover a new antibiotic that effectively kills drug-resistant bacteria, particularly Acinetobacter baumannii, a species commonly found in hospitals. This bacterium is associated with severe infections such as pneumonia and meningitis, and it is a leading cause of infections among wounded soldiers. The rise of antibiotic-resistant bacteria necessitates the development of new antibiotics, and the use of AI in drug discovery holds great promise.

The researchers employed a machine-learning algorithm to evaluate nearly 7,000 chemical compounds and identify a potential drug that inhibits the growth of Acinetobacter baumannii. The AI algorithm was trained to recognize patterns in extensive data sets and predict the inhibitory properties of chemical compounds. This approach enables the identification of novel antibiotics with distinct chemical structures compared to existing drugs.

In their initial study, the team successfully trained the AI algorithm to identify compounds that could inhibit the growth of E. coli, yielding a molecule named halicin. Halicin demonstrated the ability to kill multiple bacterial species resistant to conventional treatment. Building on this success, the researchers focused on combatting A. baumannii, a significant threat due to its multidrug resistance.

To train their computational model, the researchers exposed A. baumannii to various chemical compounds and observed their inhibitory effects. The AI algorithm analyzed the chemical structures of these compounds and learned to associate specific features with growth inhibition. Next, the algorithm analyzed over 6,000 compounds from the Drug Repurposing Hub at the Broad Institute, quickly identifying a few hundred top candidates. From there, the team selected 240 compounds for experimental testing in the laboratory, prioritizing those with structurally distinct properties from existing antibiotics.

The tests yielded nine potential antibiotics, including one particularly potent compound. Originally investigated as a diabetes drug, this compound effectively kills A. baumannii while leaving other bacterial species unaffected. This narrow spectrum of activity minimizes the risk of bacterial resistance and reduces harm to beneficial gut bacteria that aid in preventing opportunistic infections.

The researchers named the potent antibiotic abaucin and demonstrated its efficacy in treating A. baumannii wound infections in mice. Lab tests confirmed its effectiveness against various drug-resistant strains of A. baumannii isolated from human patients. Further investigations revealed that abaucin interferes with lipoprotein trafficking, a cellular process involved in protein transportation. Notably, abaucin selectively targets A. baumannii despite this process being present in all Gram-negative bacteria. The researchers suggest that subtle differences in how A. baumannii performs lipoprotein trafficking contribute to the drug’s selectivity.

The team is collaborating with McMaster researchers to optimize abaucin’s medicinal properties for potential use in patients. Additionally, they plan to apply their AI modeling approach to identify potential antibiotics for other drug-resistant infections caused by bacteria such as Staphylococcus aureus and Pseudomonas aeruginosa.

The successful application of AI in identifying a novel antibiotic highlights its potential to accelerate and expand the search for effective treatments against drug-resistant bacteria. This research addresses the urgent need for new antibiotics and demonstrates the power of AI in revolutionizing the field of drug discovery.

Check out the paper and the reference MIT article for more details.

Salesforce AI Research Introduces CodeTF: A One-Stop Transformer Library For Code Large Language Models (CodeLLM)

Over the past few years, AI has caused seismic shifts in the software engineering industry. Basic source code analysis is at the heart of the machine learning-based methodologies that have traditionally been used for code intelligence tasks in software engineering. These tasks aim to enhance the source code’s quality and maintainability by better comprehending, analyzing, and altering it. Deep learning models have recently demonstrated promising results on more difficult code intelligence tasks, such as code generation, code completion, code summarization, and code retrieval. These models are typically Transformer-based large language models (LLMs) pretrained on large-scale code data (“Code LLMs”).

Despite LLMs’ clear benefits, most developers still find it difficult and time-consuming to create and implement such models from scratch. Expert software developers and ML researchers are required to create scalable and serviceable models for production environments. A major barrier is the inconsistent interfaces between models, datasets, and application tasks, which means that developing and deploying Code LLMs requires a great deal of repetitive work.

Salesforce AI Research presents CodeTF, an open-source and all-inclusive library for Transformer-based Code LLMs. CodeTF’s standardized user interface makes it simple to access and modify code modules independently. A core module tailored to code-based data and models is the basis for other key components, including model training, inference, and datasets. This design philosophy makes standardized integration with commercially available models and datasets possible.

This library provides access to a wide variety of pretrained Transformer-based LLMs and coding tasks within the uniform framework of CodeTF. CodeTF supports several Code LLM architectures, including encoder-only, decoder-only, and encoder-decoder models. CodeTF provides a mechanism for rapidly loading and serving pretrained models, custom models, and datasets, including several widely used benchmarks like HumanEval and APPS. Library users can rapidly reproduce and implement state-of-the-art models with a unified interface, and they can also incorporate new models and benchmarks as they see fit.

Because code must follow the strict grammatical rules of its programming language, code data often requires more stringent preprocessing and transformation techniques than data in other domains like vision and text. CodeTF therefore presents a more robust set of data processing features, such as Abstract Syntax Tree (AST) parsers for multiple programming languages based on tree-sitter, tools for extracting code attributes like method names, identifiers, variable names, and comments, and utilities for efficiently processing and manipulating code data for model training, fine-tuning, and evaluation. These capabilities are critical for preprocessing code into a form that language models can understand. For example, CodeT5’s multi-objective learning technique requires, among other things, the extraction of function names and the identification of identifier positions.

The proposed library enables users to take advantage of cutting-edge developments in code intelligence research and development by giving access to state-of-the-art models, fine-tuning and evaluation tools, and a variety of popular datasets. 

Check out the paper and the GitHub repository for more details.

Hey AI-Pa! Draw Me a Story: TaleCrafter is an AI Method that can Generate Interactive Visuals for Stories

Generative AI has come a long way recently. We are all familiar with ChatGPT, diffusion models, and more at this point. These tools are becoming more and more integrated into our daily lives. We now use ChatGPT as an assistant for our daily tasks, MidJourney to assist the design process, and more AI tools to ease our routine work.

The advancement of generative AI models has enabled unique use cases that were difficult to achieve previously. For example, we have seen someone write and illustrate an entire children’s book using generative AI models. This was a great example of how generative AI can revolutionize storytelling, which we have been doing the same way for ages.

Visual storytelling is a powerful method of conveying narrative content effectively to diverse audiences. Its applications in education and entertainment, such as children’s books, are vast. We know that we can generate stories and illustrations separately using generative AI models, but can we actually use them to generate a visual story consistently? The question then becomes: given a story in plain text and the portrait images of a few characters, can we generate a series of images to express the story visually?

To have an accurate visual representation of a narrative, story visualization must meet several vital requirements. Firstly, maintaining identity consistency is crucial to depict characters and environments consistently throughout the frames or scenes. Secondly, the visual content should closely align with the textual narrative, accurately representing the events and interactions described in the story. Lastly, a clear and logical layout of objects and characters within the generated images aids in seamlessly guiding the viewer’s attention through the narrative, facilitating understanding.

Generative AI has been used to propose several story visualization methods. Early work relied on GAN or VAE-based methods and text encoders to project text into a latent space, generating images conditioned on the textual input. While these approaches demonstrated promise, they faced challenges in generalizing to new actors, scenes, and layout arrangements. Recent attempts at zero-shot story visualization investigated the potential of adapting to new characters and scenes using pre-trained models. However, these methods lacked support for multiple characters and did not consider the importance of layout and local object structures within the generated images.

So, should we just give up on having an AI-based story visualization system? Are these limitations too difficult to be tackled? Of course not! Time to meet TaleCrafter.

TaleCrafter can generate visual stories. Source: https://arxiv.org/pdf/2305.18247.pdf

TaleCrafter is a novel and versatile interactive story visualization system that overcomes the limitations of previous approaches. The system consists of four key components: story-to-prompt generation (S2P), text-to-layout generation (T2L), controllable text-to-image generation (C-T2I), and image-to-video animation (I2V).

These components work together to address the requirements of a story visualization system. The story-to-prompt generation (S2P) component leverages a large language model to generate prompts that depict the visual content of images based on instructions derived from the story. The text-to-layout generation (T2L) component utilizes the generated prompt to produce an image layout that offers location guidance for the main subjects. Then, the controllable text-to-image generation (C-T2I) module, the core component of the visualization system, renders images conditioned on the layout, local sketch, and prompt. Finally, the image-to-video animation (I2V) component enriches the visualization process by animating the generated images, providing a more vivid and engaging presentation of the story.

Overview of TaleCrafter. Source: https://arxiv.org/pdf/2305.18247.pdf

TaleCrafter‘s main contributions lie in two key aspects. Firstly, the proposed story visualization system leverages large language and pre-trained text-to-image (T2I) models to generate a video from plain text stories. This versatile system can handle multiple novel characters and scenes, overcoming the limitations of previous approaches that were limited to specific datasets. Secondly, the controllable text-to-image generation module (C-T2I) emphasizes identity preservation for multiple characters and provides control over layout and local object structures, enabling interactive editing and customization.

Check out the paper and the GitHub repository for more details.

Build high-performance ML models using PyTorch 2.0 on AWS – Part 1

PyTorch is a machine learning (ML) framework that is widely used by AWS customers for a variety of applications, such as computer vision, natural language processing, content creation, and more. With the recent PyTorch 2.0 release, AWS customers can now do the same things they could with PyTorch 1.x, but faster and at scale, with improved training speeds, lower memory usage, and enhanced distributed capabilities. Several new technologies, including torch.compile, TorchDynamo, AOTAutograd, PrimTorch, and TorchInductor, have been included in the PyTorch 2.0 release. Refer to PyTorch 2.0: Our next generation release that is faster, more Pythonic and Dynamic as ever for details.
This post demonstrates the performance and ease of running large-scale, high-performance distributed ML model training and deployment using PyTorch 2.0 on AWS. This post further walks through a step-by-step implementation of fine-tuning a RoBERTa (Robustly Optimized BERT Pretraining Approach) model for sentiment analysis using AWS Deep Learning AMIs (AWS DLAMI) and AWS Deep Learning Containers (DLCs) on Amazon Elastic Compute Cloud (Amazon EC2 p4d.24xlarge), with an observed 42% speedup when used with PyTorch 2.0 torch.compile + bf16 + fused AdamW. The fine-tuned model is then deployed on an AWS Graviton-based C7g EC2 instance on Amazon SageMaker, with an observed 10% speedup compared to PyTorch 1.13.
The following figure shows a performance benchmark of fine-tuning a RoBERTa model on Amazon EC2 p4d.24xlarge with AWS PyTorch 2.0 DLAMI + DLC.

Refer to Optimized PyTorch 2.0 inference with AWS Graviton processors for details on AWS Graviton-based instance inference performance benchmarks for PyTorch 2.0.
Support for PyTorch 2.0 on AWS
PyTorch 2.0 support is not limited to the services and compute shown in the example use case in this post; it extends to many others on AWS, which we discuss in this section.
Business requirement
Many AWS customers, across a diverse set of industries, are transforming their businesses by using artificial intelligence (AI), specifically in the area of generative AI and large language models (LLMs) that are designed to generate human-like text. These are large deep learning models trained with hundreds of billions of parameters. The growth in model sizes is increasing training time from days to weeks, and even months in some cases. This is driving an exponential increase in training and inference costs, which requires, more than ever, a framework such as PyTorch 2.0 with built-in support for accelerated model training, and the optimized infrastructure of AWS tailored to specific workload and performance needs.
Choice of compute
AWS provides PyTorch 2.0 support on the broadest choice of powerful compute, high-speed networking, and scalable high-performance storage options that you can use for any ML project or application and customize to fit your performance and budget requirements. This is manifested in the diagram in the next section; in the bottom tier, we provide a broad selection of compute instances powered by AWS Graviton, Nvidia, AMD, and Intel processors.
For model deployments, you can use ARM-based processors such as the recently announced AWS Graviton-based instance that provides inference performance for PyTorch 2.0 with up to 3.5 times the speed for Resnet50 compared to the previous PyTorch release, and up to 1.4 times the speed for BERT, making AWS Graviton-based instances the fastest compute-optimized instances on AWS for CPU-based model inference solutions.
Choice of ML services
To use AWS compute, you can select from a broad set of global cloud-based services for ML development, compute, and workflow orchestration. This choice allows you to align with your business and cloud strategies and run PyTorch 2.0 jobs on the platform of your choice. For instance, if you have on-premises restrictions or existing investments in open-source products, you can use Amazon EC2, AWS ParallelCluster, or AWS UltraCluster to run distributed training workloads based on a self-managed approach. You could also use a fully managed service like SageMaker for a cost-optimized, fully managed, and production-scale training infrastructure. SageMaker also integrates with various MLOps tools, which allows you to scale your model deployment, reduce inference costs, manage models more effectively in production, and reduce operational burden.
Similarly, if you have existing Kubernetes investments, you can also use Amazon Elastic Kubernetes Service (Amazon EKS) and Kubeflow on AWS to implement an ML pipeline for distributed training or use an AWS-native container orchestration service like Amazon Elastic Container Service (Amazon ECS) for model training and deployments. Options to build your ML platform are not limited to these services; you can pick and choose depending on your organizational requirements for your PyTorch 2.0 jobs.

Enabling PyTorch 2.0 with AWS DLAMI and AWS DLC
To use the aforementioned stack of AWS services and powerful compute, you have to install an optimized compiled version of the PyTorch 2.0 framework and its required dependencies, many of which are independent projects, and test them end to end. You may also need CPU-specific libraries for accelerated math routines, GPU-specific libraries for accelerated math and inter-GPU communication routines, and GPU drivers that need to be aligned with the GPU compiler used to compile the GPU libraries. If your jobs require large-scale multi-node training, you need an optimized network that can provide the lowest latency and highest throughput. After you build your stack, you need to regularly scan and patch it for security vulnerabilities and rebuild and retest the stack after every framework version upgrade.
AWS helps reduce this heavy lifting by offering a curated and secure set of frameworks, dependencies, and tools to accelerate deep learning in the cloud through AWS DLAMIs and AWS DLCs. These pre-built and tested machine images and containers are optimized for deep learning on EC2 Accelerated Computing instance types, allowing you to scale out to multiple nodes for distributed workloads more efficiently and easily. They include a pre-built Elastic Fabric Adapter (EFA), the Nvidia GPU stack, and many deep learning frameworks (TensorFlow, MXNet, and PyTorch, with the latest 2.0 release) for high-performance distributed deep learning training. You don’t need to spend time installing and troubleshooting deep learning software and drivers or building ML infrastructure, nor do you have to incur the recurring cost of patching these images for security vulnerabilities or recreating the images after every new framework version upgrade. Instead, you can focus on the higher value-added effort of training jobs at scale in a shorter amount of time and iterating on your ML models faster.

Solution overview
Considering that training on GPU and inference on CPU is a popular use case for AWS customers, we have included as part of this post a step-by-step implementation of a hybrid architecture (as shown in the following diagram). We will explore the art-of-the-possible and use a P4 EC2 instance with BF16 support, initialized with the Base GPU DLAMI including NVIDIA drivers, CUDA, NCCL, the EFA stack, and the PyTorch 2.0 DLC, for fine-tuning a RoBERTa sentiment analysis model that gives you control and flexibility to use any open-source or proprietary libraries. Then we use SageMaker for a fully managed model hosting infrastructure to host our model on AWS Graviton3-based C7g instances. We picked C7g on SageMaker because it’s proven to reduce inference costs by up to 50% relative to comparable EC2 instances for real-time inference on SageMaker. The following diagram illustrates this architecture.

The model training and hosting in this use case consists of the following steps:

Launch a GPU DLAMI-based EC2 Ubuntu instance in your VPC and connect to your instance using SSH.
After you log in to your EC2 instance, download the AWS PyTorch 2.0 DLC.
Run your DLC container with a model training script to fine-tune the RoBERTa model.
After model training is complete, package the saved model, inference scripts, and a few metadata files into a tar file that SageMaker inference can use and upload the model package to an Amazon Simple Storage Service (Amazon S3) bucket.
Deploy the model using SageMaker and create an HTTPS inference endpoint. The SageMaker inference endpoint holds a load balancer and one or more instances of your inference container in different Availability Zones. You can deploy either multiple versions of the same model or entirely different models behind this single endpoint. In this example, we host a single model.
Invoke your model endpoint by sending it test data and verify the inference output.

In the following sections, we showcase fine-tuning a RoBERTa model for sentiment analysis. RoBERTa was developed by Facebook AI and improves on the popular BERT model by modifying key hyperparameters and pretraining on a larger corpus, which leads to improved performance compared to vanilla BERT.
We use the transformers library by Hugging Face to get the RoBERTa model pre-trained on approximately 124 million tweets, and we fine-tune it on the Twitter dataset for sentiment analysis.
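As a rough illustration of what the training script sets up, the following sketch loads a tweet-pretrained RoBERTa checkpoint and the tweet_eval sentiment dataset with the transformers and datasets libraries. The checkpoint name is an assumption for illustration; the actual script used in this post lives in the repository cloned in step 3:

from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed tweet-pretrained RoBERTa checkpoint; the post's script may pin a different one
model_id = "cardiffnlp/twitter-roberta-base-sentiment-latest"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# TweetEval sentiment split: three classes (negative, neutral, positive)
dataset = load_dataset("tweet_eval", "sentiment")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

encoded = dataset.map(tokenize, batched=True)
print(encoded["train"][0].keys())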
Prerequisites
Make sure you meet the following prerequisites:

You have an AWS account.
Make sure you’re in the us-west-2 Region to run this example. (This example is tested in us-west-2; however, you can run it in any other Region.)
Create a role with the name sagemakerrole. Add managed policies AmazonSageMakerFullAccess and AmazonS3FullAccess to give SageMaker access to S3 buckets.
Create an EC2 role with the name ec2_role. Use the following permission policy:

#Refer - Make sure EC2 role has following policies
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "ecr:BatchGetImage",
                "ecr:BatchCheckLayerAvailability",
                "ecr:CompleteLayerUpload",
                "ecr:GetDownloadUrlForLayer",
                "ecr:InitiateLayerUpload",
                "ecr:PutImage",
                "ecr:UploadLayerPart",
                "ecr:GetAuthorizationToken",
                "s3:*",
                "s3-object-lambda:*",
                "iam:Get*",
                "iam:PassRole",
                "sagemaker:*"
            ],
            "Resource": "*"
        }
    ]
}

1. Launch your development instance
We create a p4d.24xlarge instance that offers 8 NVIDIA A100 Tensor Core GPUs in us-west-2:

#STEP 1.1
For a short guide on launching your instance, read the Getting Started with Amazon EC2 documentation.

When selecting the AMI, follow the release notes to run this command using the AWS Command Line Interface (AWS CLI) to find the AMI ID to use in us-west-2:

#STEP 1.2 - This requires AWS CLI credentials to call the ec2 describe-images API (ec2:DescribeImages).
aws ec2 describe-images --region us-west-2 --owners amazon --filters 'Name=name,Values=Deep Learning Base GPU AMI (Ubuntu 20.04) ????????' 'Name=state,Values=available' --query 'reverse(sort_by(Images, &CreationDate))[:1].ImageId' --output text

Make sure the size of the gp3 root volume is 200 GiB.
EBS volume encryption is not enabled by default. Consider changing this when moving this solution to production.
2. Download a Deep Learning Container
AWS DLCs are available as Docker images in Amazon Elastic Container Registry Public, a managed AWS container image registry service that is secure, scalable, and reliable. Each Docker image is built for training or inference on a specific deep learning framework version, Python version, with CPU or GPU support. Select the PyTorch 2.0 framework from the list of available Deep Learning Containers images.
Complete the following steps to download your DLC:
a. SSH to the instance. By default, the security group used with EC2 opens up the SSH port to all; consider restricting this if you are moving this solution to production:

#STEP 2.1 – Use Public IP
ssh -i ~/.ssh/<pub_key> ubuntu@<IP_ADDR>

#Refer – Output: Notice python3.9 package that we will use to run and install Inference scripts

__| __|_ )
_| ( / Deep Learning Base GPU AMI (Ubuntu 20.04)
___|___|___|

Welcome to Ubuntu 20.04.6 LTS (GNU/Linux 5.15.0-1035-aws x86_64v)

* Please note that Amazon EC2 P2 Instance is not supported on current DLAMI.
* Supported EC2 instances: G3, P3, P3dn, P4d, P4de, G5, G4dn.
NVIDIA driver version: 525.85.12
Default CUDA version: 11.2

Utility libraries are installed in /usr/bin/python3.9.
To access them, use /usr/bin/python3.9.

b. Set the environment variables required to run the remaining steps of this implementation:

#STEP 2.2
Attach the role “ec2_role” to your EC2 instance from the AWS console.

#STEP 2.3
Follow the steps here to create an S3 bucket in the us-west-2 Region

#STEP 2.4 - Set environment variables
#Bucket created in step 2.3
export S3_BUCKET=<your-s3-bucket>
export PYTHON_V=python3.9
export SAGEMAKER_ROLE=$(aws iam get-role --role-name sagemakerrole --output text --query 'Role.Arn')
aws configure set default.region 'us-west-2'

Amazon ECR supports public image repositories with resource-based permissions using AWS Identity and Access Management (IAM) so that specific users or services can access images.
c. Log in to the DLC registry:

#STEP 2.5 - login
aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-west-2.amazonaws.com

#Refer - Output
Login Succeeded

d. Pull the latest PyTorch 2.0 container with GPU support in us-west-2

#STEP 2.6 - pull the latest DLC PyTorch image
docker pull 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training:2.0.0-gpu-py310-cu118-ubuntu20.04-ec2

#Refer - Output
7608715873ec: Pull complete
a0bad51e1731: Pull complete
f7778ea3b9cc: Pull complete
....

Digest: sha256:1ab0d477345a11970d811cc252bc461dd70859f15caa19a65198e7941953e6b8
Status: Downloaded newer image for 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training:2.0.0-gpu-py310-cu118-ubuntu20.04-ec2
763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training:2.0.0-gpu-py310-cu118-ubuntu20.04-ec2

If you get the error "no space left on device", make sure you increase the EC2 EBS volume to 200 GiB and then extend the Linux file system.
3. Clone the latest scripts adapted to PyTorch 2.0
Clone the scripts with the following code:

#STEP 3.1
cd $HOME
git clone https://github.com/aws-samples/aws-deeplearning-labs.git
cd aws-deeplearning-labs/workshop/twitter_lm/scripts/
export ml_working_dir=$PWD

Because we’re using the Hugging Face transformers API at the latest version, 4.28.1, which already supports PyTorch 2.0, we added the following arguments to the Trainer API in train_sentiment.py to enable the new PyTorch 2.0 features:

Torch compile – Experience an average 43% speedup on Nvidia A100 GPUs with a single line of change.
BF16 datatype – New data type support (Brain Floating Point) for Ampere or newer GPUs.
Fused AdamW optimizer – Fused AdamW implementation to further speed up training. This stochastic optimization method modifies the typical implementation of weight decay in Adam by decoupling weight decay from the gradient update.

#Refer - updated training config
training_args = TrainingArguments(
    do_eval=True,
    evaluation_strategy='epoch',
    output_dir='test_trainer',
    logging_dir='test_trainer',
    logging_strategy='epoch',
    save_strategy='epoch',
    num_train_epochs=10,
    learning_rate=1e-05,
    # pytorch 2.0.0 specific args
    torch_compile=True,
    bf16=True,
    optim='adamw_torch_fused',
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    load_best_model_at_end=True,
    metric_for_best_model='recall',
)
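For reference, outside the Trainer API the same three features map to a few lines of plain PyTorch 2.0. The following is a minimal sketch with a toy linear model standing in for the fine-tuned network; it is meant only to show where torch.compile, BF16 autocast, and the fused AdamW kernel plug in:

import torch
import torch.nn as nn

# Toy stand-in for the fine-tuned model (assumption for illustration only)
model = nn.Linear(768, 3).cuda()
compiled_model = torch.compile(model)                                   # graph capture + TorchInductor backend
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, fused=True)  # fused AdamW kernel

inputs = torch.randn(16, 768, device="cuda")
labels = torch.randint(0, 3, (16,), device="cuda")

# BF16 autocast for Ampere or newer GPUs
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = nn.functional.cross_entropy(compiled_model(inputs), labels)

loss.backward()
optimizer.step()
optimizer.zero_grad()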

4. Build a new Docker image with dependencies
We extend the pre-built PyTorch 2.0 DLC image to install the Hugging Face transformer and other libraries that we need to fine-tune our model. This allows you to use the included tested and optimized deep learning libraries and settings without having to create an image from scratch. See the following code:

#STEP 4.1 - Create a Dockerfile with the following content
printf 'FROM 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training:2.0.0-gpu-py310-cu118-ubuntu20.04-ec2
RUN pip install scikit-learn evaluate transformers xformers
' > Dockerfile

#STEP 4.2 - Build the new Docker image
docker build -f Dockerfile -t pytorch2.0:roberta-sentiment-analysis .

5. Start training using the container
Run the following Docker command to begin fine-tuning the model on the tweet_eval sentiment dataset. We’re using the Docker container arguments (shared memory size, max locked memory, and stack size) recommended by Nvidia for deep learning workloads.

#STEP 5.1 - run docker container for model training
docker run --net=host --uts=host --ipc=host --shm-size=1g --ulimit stack=67108864 --ulimit memlock=-1 --gpus all -v "/home/ubuntu:/workspace" pytorch2.0:roberta-sentiment-analysis python /workspace/aws-deeplearning-labs/workshop/twitter_lm/scripts/train_sentiment.py

You should expect the following output. The script first downloads the TweetEval dataset, which consists of seven heterogeneous tasks on Twitter data, all framed as multi-class tweet classification. The tasks include irony, hate, offensive, stance, emoji, emotion, and sentiment.
The script then downloads the base model and starts the fine-tuning process. Training and evaluation metrics are reported at the end of each epoch.

#Refer - Output
{'loss': 0.6927, 'learning_rate': 9e-06, 'epoch': 1.0}
{'eval_loss': 0.6144512295722961, 'eval_recall': 0.7129473901625799, 'eval_runtime': 3.2694, 'eval_samples_per_second': 611.74, 'eval_steps_per_second': 4.894, 'epoch': 1.0}
{'loss': 0.5554, 'learning_rate': 8.000000000000001e-06, 'epoch': 2.0}
{'eval_loss': 0.5860999822616577, 'eval_recall': 0.7312511094156663, 'eval_runtime': 3.3918, 'eval_samples_per_second': 589.655, 'eval_steps_per_second': 4.717, 'epoch': 2.0}
{'loss': 0.5084, 'learning_rate': 7e-06, 'epoch': 3.0}
{'eval_loss': 0.6119785308837891, 'eval_recall': 0.730757638985487, 'eval_runtime': 3.592, 'eval_samples_per_second': 556.791, 'eval_steps_per_second': 4.454, 'epoch': 3.0}

Performance statistics
With PyTorch 2.0 and the latest Hugging Face transformers library 4.28.1, we observed a 42% speedup on a single p4d.24xlarge instance with 8 A100 40GB GPUs. The performance improvement comes from a combination of torch.compile, the BF16 data type, and the fused AdamW optimizer. The following code shows the final result of two training runs, with and without the new features:

#Refer performance statistics
without torch.compile + bf16 + fused adamw:
{'eval_loss': 0.7532123327255249, 'eval_recall': 0.7315191840508296, 'eval_runtime': 3.7641, 'eval_samples_per_second': 531.341, 'eval_steps_per_second': 4.251, 'epoch': 10.0}
{'train_runtime': 1891.5635, 'train_samples_per_second': 241.15, 'train_steps_per_second': 1.887, 'train_loss': 0.4372138784713104, 'epoch': 10.0}

with torch.compile + bf16 + fused adamw:
{'eval_loss': 0.7548801898956299, 'eval_recall': 0.7251081080195005, 'eval_runtime': 3.5685, 'eval_samples_per_second': 560.453, 'eval_steps_per_second': 4.484, 'epoch': 10.0}
{'train_runtime': 1095.388, 'train_samples_per_second': 416.428, 'train_steps_per_second': 3.259, 'train_loss': 0.44210514314368327, 'epoch': 10.0}

6. Test the trained model locally before preparing for SageMaker inference
You can find the following files under $ml_working_dir/saved_model/ after training:

#Refer – model training artifacts
config.json
merges.txt
pytorch_model.bin
special_tokens_map.json
tokenizer.json
tokenizer_config.json
vocab.json

Let’s make sure we can run inference locally before preparing for SageMaker inference. We can load the saved model and run inference locally using the test_trained_model.py script:

#STEP 6.1 - run docker container to test model inference
docker run --net=host --uts=host --ipc=host --ulimit stack=67108864 --ulimit memlock=-1 --gpus all -v "/home/ubuntu:/workspace" pytorch2.0:roberta-sentiment-analysis python /workspace/aws-deeplearning-labs/workshop/twitter_lm/scripts/test_trained_model.py

You should expect the following output with the input "Covid cases are increasing fast!":

#Refer - Output
[{'label': 'negative', 'score': 0.854185163974762}]

7. Prepare the model tarball for SageMaker inference
Under the directory where the model is located, make a new directory called code:

#STEP 7.1 – set permissions
cd $ml_working_dir
sudo chown ubuntu:ubuntu saved_model
cd saved_model
mkdir code

In the new directory, create the file inference.py and add the following to it:

#STEP 7.2 - write inference.py
printf 'import json
from transformers import pipeline

REQUEST_CONTENT_TYPE = "application/x-text"
STR_DECODE_CODE = "utf-8"
RESULT_CLASS = "sentiment"
RESULT_SCORE = "score"

def model_fn(model_dir):
    sentiment_analysis = pipeline(
        "sentiment-analysis",
        model=model_dir,
        tokenizer=model_dir,
        return_all_scores=True
    )
    return sentiment_analysis

def input_fn(request_body, request_content_type):
    if request_content_type == REQUEST_CONTENT_TYPE:
        input_data = request_body.decode(STR_DECODE_CODE)
        return input_data

def predict_fn(input_data, model):
    return model(input_data)

def output_fn(prediction, accept):
    class_label = None
    score = -1
    for _pred in prediction[0]:
        if _pred["score"] > score:
            score = _pred["score"]
            class_label = _pred["label"]
    return json.dumps({RESULT_CLASS: class_label, RESULT_SCORE: score})' > code/inference.py

Make another file in the same directory called requirements.txt and put transformers in it. SageMaker installs the dependencies in requirements.txt in the inference container for you.

#STEP 7.3 - write requirements.txt
printf 'transformers' > code/requirements.txt

In the end, you should have the following folder structure:

#Refer – inference package folder structure
code/
code/inference.py
code/requirements.txt
config.json
merges.txt
pytorch_model.bin
special_tokens_map.json
tokenizer.json
tokenizer_config.json
vocab.json

The model is ready to be packaged and uploaded to Amazon S3 for use with SageMaker inference:

#STEP 7.4 – Create inference package tar file and upload it to S3
sudo tar -cvpzf ./personal-roberta-base-sentiment.tar.gz -C ./ .
aws s3 cp ./personal-roberta-base-sentiment.tar.gz s3://$S3_BUCKET

8. Deploy the model on a SageMaker AWS Graviton instance
New generations of CPUs offer a significant performance improvement in ML inference due to specialized built-in instructions. In this use case, we use the SageMaker fully managed hosting infrastructure with AWS Graviton3-based C7g instances. AWS has also measured up to a 50% cost savings for PyTorch inference with AWS Graviton3-based EC2 C7g instances across Torch Hub ResNet50 and multiple Hugging Face models, relative to comparable EC2 instances.
To deploy the models to AWS Graviton instances, we use AWS DLCs that provide support for PyTorch 2.0 and TorchServe 0.8.0, or you can bring your own containers that are compatible with the ARMv8.2 architecture.
We use the model we trained earlier: s3://<your-s3-bucket>/personal-roberta-base-sentiment.tar.gz. If you haven’t used SageMaker before, review Get Started with Amazon SageMaker.
To start, make sure the SageMaker package is up to date:

#STEP 8.1 – Install SageMaker library
cd $ml_working_dir
$PYTHON_V -m pip install -U sagemaker

Because this is an example, create a file called start_endpoint.py and add the following code. This will be the Python script to start a SageMaker inference endpoint with the model:

#STEP 8.2 - write start_endpoint.py
printf '# Import some needed modules
from sagemaker import get_execution_role, Session, image_uris
from sagemaker.model import Model
import boto3
import os

model_name = "pytorch-roberta-model"

# Setup SageMaker session
region = boto3.Session().region_name
role = os.environ.get("SAGEMAKER_ROLE")
sm_client = boto3.client("sagemaker", region_name=region)
sagemaker_session = Session()
bucket = os.environ.get("S3_BUCKET")

# Select container. In our case, it is Graviton
container_uri = image_uris.retrieve(
    region="us-west-2",
    framework="pytorch",
    version="2.0.0",
    image_scope="inference_graviton")

# Set model parameters
model = Model(
    image_uri=container_uri,
    model_data=f"s3://{bucket}/personal-roberta-base-sentiment.tar.gz",
    role=role,
    name=model_name,
    sagemaker_session=sagemaker_session
)

# Deploy model
endpoint = model.deploy(
    initial_instance_count=1,
    instance_type="ml.c7g.4xlarge",
    endpoint_name="sm-endpoint-" + model_name
)' > start_endpoint.py

We’re using ml.c7g.4xlarge for the instance and are retrieving PT 2.0 with an image scope inference_graviton. This is our AWS Graviton3 instance.
Next, we create the file that runs the prediction. We do these as separate scripts so we can run the predictions as many times as we want. Create predict.py with the following code:

#STEP 8.3 - write predict.py
printf 'import boto3
from boto3 import Session, client

model_name = "pytorch-roberta-model"
data = "Writing data to analyze sentiments and see how the data is viewed"

sagemaker_runtime = boto3.client("sagemaker-runtime", region_name="us-west-2")
endpoint_name = "sm-endpoint-" + model_name
print("Calling model:" + endpoint_name)
response = sagemaker_runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=bytes(data, "utf-8"),
    ContentType="application/x-text",
)
print(response["Body"].read().decode("utf-8"))' > predict.py

With the scripts generated, we can now start an endpoint, do predictions against the endpoint, and clean up when we’re done:

#Step 8.4 - Start the SageMaker inference endpoint
$PYTHON_V start_endpoint.py

#Step 8.5 - Do a prediction; this can be run as many times as we like
$PYTHON_V predict.py

#Refer - Prediction output
Calling model:sm-endpoint-pytorch-roberta-model
{"sentiment": "neutral", "score": 0.9342969059944153}

9. Clean up
Lastly, we want to clean up from this example. Create cleanup.py and add the following code:

#STEP 9.1 CleanUp Script
printf 'from boto3 import client

model_name = "pytorch-roberta-model"
endpoint_name = "sm-endpoint-" + model_name

sagemaker_client = client("sagemaker", region_name="us-west-2")
sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
sagemaker_client.delete_endpoint_config(EndpointConfigName=endpoint_name)
sagemaker_client.delete_model(ModelName=model_name)' > cleanup.py

#Step 9.2 Cleanup
$PYTHON_V cleanup.py

Conclusion
AWS DLAMIs and DLCs have become the go-to standard for running deep learning workloads on a broad selection of compute and ML services on AWS. Along with using framework-specific DLCs on AWS ML services, you can also use a single framework on Amazon EC2, which removes the heavy lifting necessary for developers to build and maintain deep learning applications. Refer to Release Notes for DLAMI and Available Deep Learning Containers Images to get started.
This post showed one of many possibilities to train and serve your next model on AWS and discussed several formats that you can adopt to meet your business objectives. Give this example a try or use our other AWS ML services to expand the data productivity for your business. We have included a simple sentiment analysis problem so that customers new to ML can understand how simple it is to get started with PyTorch 2.0 on AWS. We will be covering more advanced use cases, models, and AWS technologies in upcoming blog posts.

About the authors
Kanwaljit Khurmi is a Principal Solutions Architect at Amazon Web Services. He works with AWS customers to provide guidance and technical assistance, helping them improve the value of their solutions when using AWS. Kanwaljit specializes in helping customers with containerized and machine learning applications.
Mike Schneider is a Systems Developer, based in Phoenix AZ. He is a member of Deep Learning containers, supporting various Framework container images, to include Graviton Inference. He is dedicated to infrastructure efficiency and stability.
Lai Wei is a Senior Software Engineer at Amazon Web Services. He is focusing on building easy to use, high-performance and scalable deep learning frameworks for accelerating distributed model training. Outside of work, he enjoys spending time with his family, hiking, and skiing.

Arrange your transcripts into paragraphs with Amazon Transcribe

Amazon Transcribe is a speech recognition service that generates transcripts from video and audio files in multiple supported languages and accents. It comes with a rich set of features, including automatic language identification, multi-channel and multi-speaker support, custom vocabularies, and transcript redaction.
Amazon Transcribe supports two modes of operation: batch and streaming. In batch mode, a transcription job is created to process files residing in an Amazon Simple Storage Service (Amazon S3) bucket; in streaming mode, the audio source is integrated in real time with Amazon Transcribe through HTTP/2 calls or WebSockets.
In this post, we explore how to automatically arrange the generated transcript into paragraphs while in batch mode, increasing the readability of the generated transcript.
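For orientation, a batch transcription job like the ones discussed here can be started with a few lines of boto3. The bucket, object key, and job name below are placeholders:

import boto3

transcribe = boto3.client("transcribe")

# Start a batch job against an audio file already in Amazon S3 (names are placeholders)
transcribe.start_transcription_job(
    TranscriptionJobName="paragraphs-demo-job",
    Media={"MediaFileUri": "s3://amzn-s3-demo-bucket/audio/meeting.mp3"},
    MediaFormat="mp3",
    LanguageCode="en-US",
    OutputBucketName="amzn-s3-demo-bucket",
)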
Transcription output
Amazon Transcribe uses JSON representation for its output. It provides the transcription result in two different formats: text format and itemized format.
Text format provides the transcript altogether, as a block of text, whereas itemized format provides the transcript in the form of timely ordered transcribed items, along with additional metadata per item. Both formats exist in parallel in the output file.
Depending on the features selected during transcription job creation, Amazon Transcribe creates additional and enriched views of the transcription result. See the following example code:

{
    "jobName": "2x-speakers_2x-channels",
    "accountId": "************",
    "results": {
        "transcripts": [
            {
                "transcript": "Hi, welcome."
            }
        ],
        "speaker_labels": [
            {
                "channel_label": "ch_0",
                "speakers": 2,
                "segments": [
                ]
            },
            {
                "channel_label": "ch_1",
                "speakers": 2,
                "segments": [
                ]
            }
        ],
        "channel_labels": {
            "channels": [
            ],
            "number_of_channels": 2
        },
        "items": [
        ],
        "segments": [
        ]
    },
    "status": "COMPLETED"
}
The views are as follows:

Transcripts – Represented by the transcripts element, it contains only the text format of the transcript. In multi-speaker, multi-channel scenarios, concatenation of all transcripts is provided as a single block.
Speakers – Represented by the speaker_labels element, it contains both the text and itemized formats of the transcript grouped by speaker. It’s available only when the multi-speakers feature is enabled.
Channels – Represented by the channel_labels element, it contains both the text and itemized formats of the transcript, grouped by channel. It’s available only when the multi-channels feature is enabled.
Items – Represented by the items element, it contains only the itemized format of the transcript. In multi-speaker, multi-channel scenarios, items are enriched with additional properties, indicating speaker and channel.
Segments – Represented by the segments element, it contains both the text and itemized formats of the transcript, grouped by alternative transcription. It’s available only when the alternative results feature is enabled.
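The following sketch shows how these views can be read from the downloaded output file; the file name is a placeholder:

import json

# Load the transcription job output (placeholder file name)
with open("transcription-output.json") as f:
    output = json.load(f)

results = output["results"]
full_text = results["transcripts"][0]["transcript"]   # text format
items = results["items"]                              # itemized format with per-item metadata
print(full_text)
print(f"{len(items)} items; first item type: {items[0]['type']}")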

Transcription metadata in the items view
In the items view, items are provided in the form of a timely ordered list, with every item containing additional metadata information:

{
    "results": {
        "items": [
            {
                "channel_label": "ch_0",
                "start_time": "1.509",
                "speaker_label": "spk_0",
                "end_time": "2.21",
                "alternatives": [
                    {
                        "confidence": "0.999",
                        "content": "Hi"
                    }
                ],
                "type": "pronunciation"
            },
            {
                "channel_label": "ch_0",
                "speaker_label": "spk_0",
                "alternatives": [
                    {
                        "confidence": "0.0",
                        "content": ","
                    }
                ],
                "type": "punctuation"
            },
            {
                "channel_label": "ch_0",
                "start_time": "2.22",
                "speaker_label": "spk_0",
                "end_time": "2.9",
                "alternatives": [
                    {
                        "confidence": "0.999",
                        "content": "welcome"
                    }
                ],
                "type": "pronunciation"
            },
            {
                "channel_label": "ch_0",
                "speaker_label": "spk_0",
                "alternatives": [
                    {
                        "confidence": "0.0",
                        "content": "."
                    }
                ],
                "type": "punctuation"
            }
        ]
    }
}

The metadata is as follows:

Type – The type value indicates whether the specific item is a punctuation or a pronunciation. Examples of supported punctuations are comma, full stop, and question mark.
Alternatives – An array of objects containing the actual transcription, along with the confidence level, ordered by confidence level. When the alternative results feature is not enabled, this list always contains exactly one item.

Confidence – An indication of how confident Amazon Transcribe is about the correctness of transcription. It uses values from 0–1, with 1 indicating 100% confidence.
Content – The transcribed word.

Start time – A time pointer of the audio or video file indicating the start of the item in ss.SSS format.
End time – A time pointer of the audio or video file indicating the end of the item in ss.SSS format.
Channel label – The channel identifier, which is present in the item only when the channel identification feature was enabled in the job configuration.
Speaker label – The speaker identifier, which is present in the item only when the speaker partitioning feature was enabled in the job configuration.

Identifying paragraphs
Identification of paragraphs relies on metadata information in the items view. In particular, we utilize start and end time information along with transcription type and content to identify sentences and then decide which sentences are the best candidates for paragraph entry points.
A sentence is considered to be a list of transcription items that exists between punctuation items that indicate a full stop. Exceptions to this are the start and end of the transcript, which are by default sentence boundaries. The following figure shows an example of these items. Sentence identification is straightforward with Amazon Transcribe because punctuation is an out-of-the-box feature, with supported types including comma, full stop, and question mark. In this concept, we use the full stop as the sentence boundary.
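A minimal sketch of this sentence-splitting step, assuming items is the itemized list parsed from the output JSON, might look like the following:

def split_into_sentences(items):
    """Group transcription items into sentences, closing a sentence at each full stop."""
    sentences, current = [], []
    for item in items:
        current.append(item)
        if item["type"] == "punctuation" and item["alternatives"][0]["content"] == ".":
            sentences.append(current)
            current = []
    if current:  # trailing items without a final full stop still form a sentence
        sentences.append(current)
    return sentences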
Not every sentence should be a paragraph point. To identify paragraphs, we introduce a new insight at the sentence level called a start delay, as illustrated in the following figure. We use a start delay to define the time delay the speaker introduces to the pronunciation of the current sentence in comparison to the previous one. Calculation of the start delay requires the start time of the current sentence and end time of the previous one per speaker. Because Amazon Transcribe provides start and end times per item, the calculation requires the usage of the first and last items of the current and previous sentences, respectively.
Knowing the start delays of every sentence, we can apply statistical analysis and figure out the significance of every delay in comparison to the total population of delays. In our context, significant delays are those that exceed the population’s typical duration. The following graph shows an example. For this concept, we decided to accept sentences with start delays greater than the mean value as significant, and introduce a paragraph point at the beginning of every such sentence. Apart from the mean value, there are other options, like accepting all start delays greater than the median, the third quartile, or the upper fence value of the population.
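Continuing the sketch, start delays can be computed from the first and last timed items of consecutive sentences, and a sentence whose delay exceeds the mean becomes a paragraph entry point. For brevity, this version ignores the per-speaker refinement described above:

from statistics import mean

def paragraph_points(sentences):
    """Return indices of sentences whose start delay exceeds the mean start delay."""
    def start(sentence):  # punctuation items carry no timestamps, so skip them
        return float(next(i["start_time"] for i in sentence if "start_time" in i))

    def end(sentence):
        return float(next(i["end_time"] for i in reversed(sentence) if "end_time" in i))

    delays = [0.0] + [max(0.0, start(cur) - end(prev))
                      for prev, cur in zip(sentences, sentences[1:])]
    threshold = mean(delays)
    return [idx for idx, delay in enumerate(delays) if idx > 0 and delay > threshold]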
We add one more step to the paragraph identification process, taking into consideration the number of words contained in each paragraph. When a paragraph contains a significant number of words, we run a split operation, thereby adding one more paragraph to the final result.
In the context of word counts, we define as significant the word counts that exceed the upper fence value. We make this decision deliberately, so that split operations are restricted to the paragraphs that truly behave as outliers in our results. The following graph shows an example. The split operation selects the new paragraph entry point using the maximum sentence start delay insight: the new paragraph is introduced at the sentence that exhibits the maximum start delay inside the current paragraph. Splits can be repeated until no word count exceeds the selected boundary, in our case the upper fence value. The following figure shows an example.
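
A sketch of the split step, reusing the start-delay insight from above and taking the upper fence as Q3 + 1.5 * IQR (paragraphs are represented as lists of sentences; a single split pass is shown, which can be repeated until no paragraph exceeds the fence):

import statistics

def word_count(paragraph):
    return sum(
        1 for sentence in paragraph for item in sentence
        if item["type"] == "pronunciation"
    )

def upper_fence(values):
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 + 1.5 * (q3 - q1)

def split_long_paragraphs(paragraphs):
    """Split paragraphs whose word count exceeds the upper fence of all word counts."""
    counts = [word_count(p) for p in paragraphs]
    fence = upper_fence(counts)
    result = []
    for paragraph, count in zip(paragraphs, counts):
        if count <= fence or len(paragraph) < 2:
            result.append(paragraph)
            continue
        # Introduce the new paragraph at the sentence with the maximum start delay
        delays = start_delays(paragraph)
        split_at = max(range(1, len(paragraph)), key=lambda i: delays[i])
        result.extend([paragraph[:split_at], paragraph[split_at:]])
    return result
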
Conclusion
In this post, we presented a concept to automatically introduce paragraphs to your transcripts, without manual intervention, based on the metadata Amazon Transcribe provides along with the actual transcript. This concept is not language- or accent-specific, because it relies on non-linguistic metadata to suggest paragraph entry points. Future variations can include grammatical or semantic information on a per-language basis, further enhancing the paragraph identification logic.
If you have feedback about this post, submit your comments in the comments section. We look forward to hearing from you. Check out Amazon Transcribe Features for additional features that will help you get the most value out of your transcripts.

About the Authors
Kostas Tzouvanas is an Enterprise Solution Architect at Amazon Web Services. He helps customers architect cloud-based solutions to achieve their business potential. His main focus is trading platforms and high performance computing systems. He is also passionate about genomics and bioinformatics.
Pavlos Kaimakis is an Enterprise Solutions Architect looking after enterprise customers in GR/CY/MT, using his experience to help them design and implement solutions that drive value. Pavlos has spent most of his career in the product and customer support sector, from both an engineering and a management perspective. Pavlos loves traveling and is always up for exploring new places in the world.

Build machine learning-ready datasets from the Amazon SageMaker offlin …

Amazon SageMaker Feature Store is a purpose-built service to store and retrieve feature data for use by machine learning (ML) models. Feature Store provides an online store capable of low-latency, high-throughput reads and writes, and an offline store that provides bulk access to all historical record data. Feature Store handles the synchronization of data between the online and offline stores.
Because model development is an iterative process, customers will frequently query the offline store and build various datasets for model training. Currently, there are several ways to access features in the offline store, including running SQL queries with Amazon Athena or using Spark SQL in Apache Spark. However, these patterns require writing ad hoc (and sometimes complex) SQL statements, which isn’t always suitable for the data scientist persona.
Feature Store recently extended the SageMaker Python SDK to make it easier to create datasets from the offline store. With this release, you can use a new set of methods in the SDK to create datasets without writing SQL queries. These new methods support common operations such as time travel, filtering duplicate records, and joining multiple feature groups while ensuring point-in-time accuracy.
In this post, we demonstrate how to use the SageMaker Python SDK to build ML-ready datasets without writing any SQL statements.
Solution overview
To demonstrate the new functionality, we work with two datasets: leads and web marketing metrics. These datasets can be used to build a model that predicts if a lead will convert into a sale given marketing activities and metrics captured for that lead.
The leads data contains information on prospective customers who are identified using Lead_ProspectID. The features for a lead (for example, LeadSource) can be updated over time, which results in a new record for that lead. The Lead_EventTime represents the time in which each record is created. The following screenshot shows an example of this data.

The web marketing metrics data tracks the engagement metrics for a lead, where each lead is identified using the Web_ProspectID. The Web_EventTime represents the time in which the record was created. Unlike the leads feature group, there is only one record per lead in this feature group. The following screenshot shows an example of this data.

We walk through the key parts of the sagemaker-feature-store-offline-sdk.ipynb notebook, which demonstrates the following steps:

Create a dataset from a feature group.
Join multiple feature groups.
Create a point-in-time join between a feature group and a dataset based on a set of events at specific timestamps.
Retrieve feature history within a specific time range.
Retrieve features as of a specific timestamp.

Prerequisites
You need the following prerequisites:

An AWS account.
A SageMaker Jupyter notebook instance. Access the code from the GitHub repository and upload it to your notebook instance.
You can also run the notebook in an Amazon SageMaker Studio environment, which is an IDE for ML development. You can clone the GitHub repo via a terminal inside the Studio environment using the following command:

git clone https://github.com/aws-samples/amazon-sagemaker-feature-store-offline-queries.git

We assume a feature group for the leads data has been created using the existing FeatureGroup.create method, and can be referenced using the variable base_fg. For more information on feature groups, refer to Create Feature Groups.
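
For reference, the following is a minimal sketch of how such a feature group could be created; variable names such as leads_df, s3_bucket_name, and role_arn are assumptions, and the notebook in the repository contains the actual setup:

import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

feature_store_session = sagemaker.Session()
base_fg = FeatureGroup(name="leads-feature-group", sagemaker_session=feature_store_session)

# Infer feature definitions from the leads DataFrame (assumed to be loaded already)
base_fg.load_feature_definitions(data_frame=leads_df)

base_fg.create(
    s3_uri=f"s3://{s3_bucket_name}/offline-store",  # offline store location
    record_identifier_name="Lead_ProspectID",
    event_time_feature_name="Lead_EventTime",
    role_arn=role_arn,  # IAM role with Feature Store permissions
    enable_online_store=True,
)
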
Create a dataset from a feature group
To create a dataset using the SageMaker SDK, we use the new FeatureStore class, which contains the create_dataset method. This method accepts a base feature group that may be joined with other feature groups or DataFrames. We start by providing the leads feature group as the base and an Amazon Simple Storage Service (Amazon S3) path to store the dataset:

from sagemaker.feature_store.feature_store import FeatureStore

feature_store = FeatureStore(sagemaker_session=feature_store_session)
ds1_builder = feature_store.create_dataset(
    base=base_fg,
    output_path=f"s3://{s3_bucket_name}/dataset_query_results",
)

The create_dataset method returns a DatasetBuilder object, which can be used to generate a dataset from one or multiple feature groups (which we demonstrate in the next section). To create a simple dataset consisting of only the leads features, we invoke the to_csv_file method. This runs a query in Athena to retrieve the features from the offline store, and saves the results to the specified S3 path.

csv, query = ds1_builder.to_csv_file()
# Show the S3 location of the CSV file
print(f'CSV file: {csv}')

Join multiple feature groups
With the SageMaker SDK, you can easily join multiple feature groups to build a dataset. You can also perform join operations between an existing Pandas DataFrame and one or more feature groups. The base feature group is an important concept for joins: it is the feature group that has other feature groups or the Pandas DataFrame joined to it.
While creating the dataset using the create_dataset function, we use the with_feature_group method, which performs an inner join between the base feature group and another feature group using the record identifier and the target feature name in the base feature group. In our example, the base feature group is the leads feature group, and the target feature group is the web marketing feature group. The with_feature_group method accepts the following arguments:

feature_group – This is the feature group we are joining with. In our code sample, the target feature group is created by using the web marketing dataset.
target_feature_name_in_base – The name of the feature in the base feature group that we’re using as a key in the join. We use Lead_ProspectID as the record identifier for the base feature group.
included_feature_names – The list of feature names from the joined feature group to include in the dataset. In our code sample, we use this field to specify the web marketing features that we want in the dataset.

The following code shows an example of creating a dataset by joining the base feature group with the target feature group:

join_builder = feature_store.create_dataset(
    base=base_fg,
    output_path=f"s3://{s3_bucket_name}/dataset_query_results",
).with_feature_group(
    feature_group=target_fg,
    target_feature_name_in_base="Lead_ProspectID",
    included_feature_names=[
        "Web_ProspectID",
        "LastCampaignActivity",
        "PageViewsPerVisit",
        "TotalTimeOnWebsite",
        "TotalWebVisits",
        "AttendedMarketingEvent",
        "OrganicSearch",
        "ViewedAdvertisement",
    ],
)

You can extend the join operations to include multiple feature groups by adding the with_feature_group method at the end of the preceding code example and defining the required arguments for the new feature group. You can also perform join operations with an existing DataFrame by defining the base to be your existing Pandas DataFrame and joining it with the feature groups of interest. The following code sample shows how to create a dataset from an existing Pandas DataFrame and an existing feature group:

ds2_builder = feature_store.create_dataset(
    base=new_records_df2,  # Pandas DataFrame
    event_time_identifier_feature_name="Lead_EventTime",
    record_identifier_feature_name="Lead_ProspectID",
    output_path=f"s3://{s3_bucket_name}/dataset_query_results",
).with_feature_group(base_fg, "Lead_ProspectID", ["LeadSource"])

For more examples on these various configurations, refer to Create a Dataset from your Feature Groups.
Create a point-in-time join
One of the most powerful capabilities of this enhancement is the ability to perform point-in-time joins simply, without the need to write complex SQL code. When building ML models, data scientists need to avoid data leakage or target leakage, which is accidentally using data during model training that wouldn't be available at the time of prediction. For instance, if we're trying to predict credit card fraud, we should exclude transactions that arrive after the fraudulent charge we're trying to predict; otherwise, the model could train on post-fraud information that wouldn't be available at prediction time, making it generalize less well.
Retrieval of point-in-time accurate feature data requires you to supply an entity DataFrame that provides a set of record IDs (or primary keys) and corresponding event times that serve as the cutoff time for the event. This retrieval mechanism is sometimes referred to as row-level time travel, because it allows a different time constraint to be applied for each row key. To perform point-in-time joins with the SageMaker SDK, we use the DatasetBuilder class and provide the entity DataFrame as the base argument.
In the following code, we create a simple entity DataFrame with two records. We set the event times, used to indicate the cutoff time, near the middle of the time series data (mid-January 2023):

# Create an events (entity table) DataFrame to pass timestamps for the point-in-time join
events = [
    ['2023-01-20T00:00:00Z', record_id1],
    ['2023-01-15T00:00:00Z', record_id2],
]
df_events = pd.DataFrame(events, columns=['Event_Time', 'Lead_ProspectID'])

When we use the point_in_time_accurate_join functionality with the create_dataset call, the internal query excludes all records with timestamps later than the cutoff times supplied, returning the latest feature values that would have been available at the time of the event:

# Create a DatasetBuilder using the point_in_time_accurate_join method
pit_builder = (
    feature_store.create_dataset(
        base=df_events,
        event_time_identifier_feature_name='Event_Time',
        record_identifier_feature_name='Lead_ProspectID',
        output_path=f"s3://{s3_bucket_name}/{s3_prefix}/dataset_query_results",
    )
    .with_feature_group(base_fg, "Lead_ProspectID")
    .point_in_time_accurate_join()
    .with_number_of_recent_records_by_record_identifier(1)
)

Notice that there are only two records in the DataFrame returned by the point-in-time join. This is because we only submitted two record IDs in the entity DataFrame, one for each Lead_ProspectID we want to retrieve. The point-in-time criterion specifies that a record's event time (stored in the Lead_EventTime field) must be less than the cutoff time.

Additionally, we instruct the query to retrieve only the latest record that meets this criterion, because we have applied the with_number_of_recent_records_by_record_identifier method. When used in conjunction with the point_in_time_accurate_join method, this allows the caller to specify how many records to return from those that meet the point-in-time join criteria.
Compare point-in-time join results with Athena query results
To verify the output returned by the SageMaker SDK point_in_time_accurate_join function, we compare it to the result of an Athena query. First, we create a standard Athena query using a SELECT statement tied to the specific table created by the Feature Store runtime. This table name can be found by referencing the table_name field after instantiating the athena_query from the FeatureGroup API:

SELECT * FROM "sagemaker_featurestore"."off_sdk_fg_lead_1682348629"
WHERE "off_sdk_fg_lead_1682348629"."Lead_ProspectID" = '5e84c78f-6438-4d91-aa96-b492f7e91029'
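
As a rough sketch, the table name used above can be retrieved programmatically from the feature group, and the query can be run through the same AthenaQuery object (the S3 output prefix is an assumption):

# Get the offline store table name directly from the feature group
athena_query = base_fg.athena_query()
print(athena_query.table_name)  # for example, off_sdk_fg_lead_1682348629

# Run the comparison query and load the results into a DataFrame
query_string = (
    f'SELECT * FROM "sagemaker_featurestore"."{athena_query.table_name}" '
    "WHERE \"Lead_ProspectID\" = '5e84c78f-6438-4d91-aa96-b492f7e91029'"
)
athena_query.run(
    query_string=query_string,
    output_location=f"s3://{s3_bucket_name}/athena_query_results",  # assumed output prefix
)
athena_query.wait()
athena_df = athena_query.as_dataframe()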

The Athena query doesn’t contain any point-in-time join semantics, so it returns all records that match the specified record_id (Lead_ProspectID).
Next, we use the Pandas library to sort the Athena results by event times for easy comparison. The records with timestamps later than the event times specified in the entity DataFrame (for example, 2023-01-15T00:00:00Z) submitted to the point_in_time_accurate_join don’t show up in the point-in-time results. Because we additionally specified that we only want a single record from the preceding create_dataset code, we only get the latest record prior to the cutoff time. By comparing the SageMaker SDK results with the Athena query results, we see that the point-in-time join function returned the proper records.
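
A sketch of that comparison, assuming athena_df from the query sketch above, reusing pit_builder and record_id2 from earlier, and assuming the event times are stored as ISO 8601 strings (as in the entity DataFrame):

# Export the point-in-time dataset built earlier
pit_df, _ = pit_builder.to_dataframe()

# Sort the Athena results by event time, newest first, for easy comparison
athena_sorted = athena_df.sort_values(by="Lead_EventTime", ascending=False)

# Keep only the records at or before the cutoff supplied in the entity DataFrame
cutoff = "2023-01-15T00:00:00Z"
eligible = athena_sorted[athena_sorted["Lead_EventTime"] <= cutoff]

# The point-in-time join should return only the latest eligible record for this ID
print(eligible.head(1))
print(pit_df[pit_df["Lead_ProspectID"] == record_id2])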

Therefore, we have confidence that we can use the SageMaker SDK to perform row-level time travel and avoid target leakage. Furthermore, this capability works across multiple feature groups that may be refreshed on completely different timelines.
Retrieve feature history within a specific time range
We also want to demonstrate the use of specifying a time range window when joining the feature groups to form a dataset. The time window is defined using with_event_time_range, which accepts two inputs, starting_timestamp and ending_timestamp, and returns a dataset builder object. In our code sample, we set the retrieval time window for 1 full day from 2022-07-01 00:00:00 until 2022-07-02 00:00:00.
The following code shows how to create a dataset with the specified event time window while joining the base feature group with the target feature group:

# Set up the event time window in seconds of Unix epoch time
# Start at 07/01/2022 and set the time window to one day
start_ts = 1656633600
time_window = 86400

# Using hard-coded timestamps from the dataset, then adding the time window
datetime_start = datetime.fromtimestamp(start_ts)
datetime_end = datetime.fromtimestamp(start_ts + time_window)
print(f"Setting retrieval time window: {datetime_start} until {datetime_end}")

time_window_builder = (
    feature_store.create_dataset(
        base=base_fg,
        output_path=f"s3://{s3_bucket_name}/dataset_query_results",
    )
    .with_feature_group(
        feature_group=target_fg,
        target_feature_name_in_base="Lead_ProspectID",
        included_feature_names=[
            "Web_ProspectID", "LastCampaignActivity", "PageViewsPerVisit",
            "TotalTimeOnWebsite", "TotalWebVisits", "AttendedMarketingEvent",
            "OrganicSearch", "ViewedAdvertisement",
        ],
    )
    .with_event_time_range(starting_timestamp=datetime_start, ending_timestamp=datetime_end)
)

We also confirm the difference between the sizes of the dataset created using with_event_time_range by exporting to a Pandas DataFrame with the to_dataframe() method and displaying the data. Notice how the result set has only a fraction of the original 10,020 records, because it only retrieves records whose event_time is within the 1-day time period.
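
For example, a quick size check (a sketch; to_dataframe returns the DataFrame together with the underlying query string, and the 10,020-record figure refers to the sample dataset described above):

# Export the time-windowed dataset and inspect its size
time_window_df, query_string = time_window_builder.to_dataframe()
print(f"Records in the 1-day window: {len(time_window_df)} (full dataset: 10,020)")
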
Retrieve features as of a specific timestamp
The DatasetBuilder as_of method retrieves features from a dataset that meet a timestamp-based constraint, which the caller provides as an argument to the function. This mechanism is useful for scenarios such as rerunning experiments on previously collected data, backtesting time series models, or building a dataset from a previous state of the offline store for data auditing purposes. This functionality is sometimes referred to as time travel because it essentially rolls back the data store to an earlier date and time. This time constraint is also referred to as the cutoff timestamp.
In our sample code, we first create the cutoff timestamp by reading the write_time value for the last record written to the Feature Store, the one written with put_record. Then we provide this cutoff timestamp to the DatasetBuilder as an argument to the as_of method:

# Create a dataset using the as-of cutoff timestamp
print(f"Using cut-off time: {asof_cutoff_datetime}")
as_of_builder = (
    feature_store.create_dataset(
        base=base_fg,
        output_path=f"s3://{s3_bucket_name}/{s3_prefix}/dataset_query_results",
    )
    .with_feature_group(
        feature_group=target_fg,
        target_feature_name_in_base="Lead_ProspectID",
        included_feature_names=["Web_ProspectID", "Web_EventTime", "TotalWebVisits"],
    )
    .as_of(asof_cutoff_datetime)
)

It’s important to note that the as_of method applies the time constraint to the internal write_time field, which is automatically generated by Feature Store. The write_time field represents the actual timestamp when the record was written to the data store. This is different from other methods, like point_in_time_accurate_join and with_event_time_range, which use the client-provided event_time field as a comparator.
Clean up
Be sure to delete all the resources created as part of this example to avoid incurring ongoing charges. This includes the feature groups and the S3 bucket containing the offline store data.
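
A sketch of one way to do this from the notebook, assuming the feature group variables and bucket name used throughout this post:

import boto3

# Delete the feature groups created for this example
base_fg.delete()
target_fg.delete()

# Remove the dataset query results written to the S3 bucket
# (the offline store prefix can be cleaned up similarly)
s3 = boto3.resource("s3")
bucket = s3.Bucket(s3_bucket_name)
bucket.objects.filter(Prefix="dataset_query_results").delete()
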
SageMaker Python SDK experience vs. writing SQL
The new methods in the SageMaker Python SDK allow you to quickly create datasets and move on to the training step of the ML lifecycle. To show the time and effort that can be saved, let’s examine a use case where we need to join two feature groups while retrieving the features within a specified time frame. The following figure compares the Python code used to query the offline Feature Store with the equivalent SQL required to create the same dataset.

As you can see, the same operation of joining two feature groups requires you to create a long, complex SQL query, whereas it can be accomplished using just the with_feature_group and with_event_time_range methods in the SageMaker Python SDK.
Conclusion
The new offline store methods in the SageMaker Python SDK allow you to query your offline features without having to write complex SQL statements. This provides a seamless experience for customers who are accustomed to writing Python code during model development. For more information about feature groups, refer to Create a Dataset From Your Feature Groups and Feature Store APIs: Feature Group.
The full example in this post can be found in the GitHub repository. Give it a try and let us know your feedback in the comments.

About the Authors
Paul Hargis has focused his efforts on machine learning at several companies, including AWS, Amazon, and Hortonworks. He enjoys building technology solutions and teaching people how to leverage them. Paul likes to help customers expand their machine learning initiatives to solve real-world problems. Prior to his role at AWS, he was lead architect for Amazon Exports and Expansions, helping amazon.com improve the experience for international shoppers.
Mecit Gungor is an AI/ML Specialist Solution Architect at AWS helping customers design and build AI/ML solutions at scale. He covers a wide range of AI/ML use cases for Telecommunication customers and currently focuses on Generative AI, LLMs, and training and inference optimization. He can often be found hiking in the wilderness or playing board games with his friends in his free time.
Tony Chen is a Machine Learning Solutions Architect at AWS, helping customers design scalable and robust machine learning capabilities in the cloud. As a former data scientist and data engineer, he leverages his experience to help tackle some of the most challenging problems organizations face with operationalizing machine learning.
Sovik Kumar Nath is an AI/ML solution architect with AWS. He has extensive experience in end-to-end designs and solutions for machine learning; business analytics within financial, operational, and marketing analytics; healthcare; supply chain; and IoT. Outside work, Sovik enjoys traveling and watching movies.

Revolutionizing Scene Reconstruction with Break-A-Scene: The Future of AI-Powered Object Extraction and Remixing

Humans naturally possess the ability to break down complicated scenes into component elements and imagine them in various scenarios. Given a snapshot of a ceramic artwork showing a creature reclining on a bowl, one might easily picture the same creature in multiple attitudes and locales, or imagine the same bowl in a new environment. Today’s generative models, however, struggle with tasks of this nature. Recent research personalizes large-scale text-to-image models by optimizing freshly added specialized text embeddings or by fine-tuning the model weights on many pictures of a single concept, enabling the synthesis of instances of this concept in new situations.

In this study, researchers from the Hebrew University of Jerusalem, Google Research, Reichman University, and Tel Aviv University present a novel scenario for textual scene decomposition: given a single image of a scene that might include several concepts of various types, their objective is to extract a dedicated text token for each concept. This permits the creation of novel images from textual prompts that highlight a specific concept or combinations of several concepts. The concepts to be learned or extracted from the customization task are not always apparent, which makes the problem potentially ambiguous. Previous works have dealt with this ambiguity by focusing on a single concept at a time and using a variety of photographs that show the concept in various settings. However, alternative methods are required when transitioning to a single-image setting.

They specifically suggest adding a series of masks to the input image to provide further information about the concepts they want to extract. These masks may be free-form masks that the user supplies or masks produced by an automated segmentation method. Adapting the two primary techniques, TI and DB, to this setting reveals a reconstruction-editability tradeoff: whereas TI fails to properly reconstruct the concepts in a new context, DB loses context control due to overfitting. In this study, the authors suggest a unique customization pipeline that successfully strikes a compromise between maintaining the identity of the learned concepts and preventing overfitting.

Figure 1 provides an overview of their methodology, which has four main parts: (1) a union-sampling approach, in which a new subset of the tokens is sampled each time, trains the model to handle various combinations of the generated concepts; (2) to prevent overfitting, a two-phase training regime first optimizes only the newly inserted tokens with a high learning rate, then continues with the model weights at a reduced learning rate; (3) a masked diffusion loss is used to reconstruct the desired concepts; and (4) a unique cross-attention loss promotes disentanglement between the learned concepts.

Their pipeline contains two phases, which are shown in Figure 1. To reconstruct the input image, they first designate a set of dedicated text tokens (called handles), freeze the model weights, and optimize the handles. In the second phase, they continue to refine the handles while switching over to fine-tuning the model weights. Their method strongly emphasizes disentangled concept extraction, that is, ensuring that each handle is connected to just one target concept. They also observe that the customization procedure cannot be performed independently for each concept if the model is to generate images showcasing combinations of concepts. In response to this finding, they offer union sampling, a training approach that meets this need and improves the generation of concept combinations.

They do this by utilizing the masked diffusion loss, a modified variation of the standard diffusion loss, which guarantees that each custom handle can reproduce its intended concept. This loss alone, however, does not penalize a handle that becomes linked to more than one concept. Their main finding is that they can penalize such entanglement by additionally imposing a loss on the cross-attention maps, which are known to correlate with the scene layout. Thanks to this additional loss, each handle concentrates solely on the areas covered by its target concept. They also propose several automatic metrics for the task to compare their methodology against the baselines.

Their contributions are, in order: (1) they introduce the novel task of textual scene decomposition; (2) they propose a novel method for this task that strikes a balance between concept fidelity and scene editability by learning a set of disentangled concept handles; and (3) they suggest several automatic evaluation metrics and use them, along with a user study, to demonstrate the effectiveness of their approach. The user study shows that human assessors also prefer their method, and in the final section they suggest several applications for their technique.

Check out the Paper and Project Page.

The post Revolutionizing Scene Reconstruction with Break-A-Scene: The Future of AI-Powered Object Extraction and Remixing appeared first on MarkTechPost.

Meet GPTutor: A ChatGPT-Powered Programming Tool For Code Explanation Provided As A VSCode Extension

In recent years, the need for competent programmers has increased the number of people learning to code. However, a teacher shortage makes it difficult to create tailored learning experiences, and students often struggle with unfamiliar programming languages and difficult code samples. Natural language generation (NLG) models, such as ChatGPT, can transform programming education by providing tailored training. These models comprehend difficult programming concepts and deliver human-like explanations, giving learners access to personalized lectures, code examples, and explanations.

In this context, a research team from Taiwan recently published a paper introducing GPTutor, a ChatGPT-powered programming tool delivered as a Visual Studio Code extension that leverages the ChatGPT API to provide comprehensive explanations for programming code. The main idea of the proposed tool is to use NLG models as programming tutors that provide code explanations. By leveraging the OpenAI ChatGPT API, GPTutor retrieves the pertinent code and generates precise and concise explanations. Existing NLG applications are limited in offering comprehensive, accurate, and up-to-date descriptions of programming code; GPTutor aims to overcome these limitations by analyzing the source code itself.

The source code for GPTutor is freely available on GitHub, and the extension has been published on the Visual Studio Code Extension Marketplace. Users install the extension, set their OpenAI API key, and optionally select the GPT model. They can then hover over a code block in the supported language (currently Move) to receive explanations, comments, or audits for the selected code. Students, programming teachers, and coding boot camp instructors have all expressed satisfaction with GPTutor’s user-friendly interface and its capacity to deliver adequate code explanations. Users are especially impressed by GPTutor’s ability to include the pertinent source code of functions in its prompts, resulting in more thorough explanations. In addition, comparative evaluations demonstrate that GPTutor outperforms vanilla ChatGPT and GitHub Copilot in delivering accurate code explanations.

The authors of the paper propose several areas of future work for GPTutor. One key focus is enhancing performance and personalization by applying prompt programming techniques. This involves optimizing prompts and employing heuristic search methods to identify relevant code, with the ultimate goal of providing personalized explanations and an enhanced user experience. Furthermore, the authors plan to evaluate the effectiveness of GPTutor in real-world scenarios by observing student interactions with the tool during programming assignments. This evaluation will involve collaborating with coding course lecturers and utilizing appropriate analysis techniques to assess the relationship between student grades and the frequency of GPTutor usage.

In conclusion, GPTutor is a ChatGPT-powered programming tool that addresses challenges in programming education by providing comprehensive code explanations. It has received positive feedback from users, and future work includes enhancing performance and personalization through prompt programming techniques.

GPTutor will also be evaluated in real-world scenarios to measure its impact on student learning outcomes. Observing how students interact with the tool during programming assignments and analyzing the correlation between grades and GPTutor usage frequency will validate its effectiveness as an educational tool. GPTutor continues to evolve as a valuable tool for programming education.

Check out the Paper and Plugin.

The post Meet GPTutor: A ChatGPT-Powered Programming Tool For Code Explanation Provided As A VSCode Extension appeared first on MarkTechPost.