Build brand loyalty by recommending actions to your users with Amazon Personalize Next Best Action

Amazon Personalize is excited to announce the new Next Best Action (aws-next-best-action) recipe, which helps you determine the best actions to suggest to your individual users so you can increase brand loyalty and conversion.
Amazon Personalize is a fully managed machine learning (ML) service that makes it effortless for developers to deliver highly personalized user experiences in real time. It enables you to improve customer engagement by powering personalized product and content recommendations in websites, applications, and targeted marketing campaigns. You can get started without any prior ML experience, using APIs to easily build sophisticated personalization capabilities in a few clicks. All your data is encrypted to be private and secure.
In this post, we show you how to use the Next Best Action recipe to personalize action recommendations based on each user’s past interactions, needs, and behavior.
Solution overview
With the rapid growth of digital channels and technology advances that make hyper-personalization more accessible, brands struggle to determine what actions will maximize engagement for each individual user. Brands either show the same actions to all users or rely on traditional user segmentation approaches to recommend actions to each user cohort. However, these approaches are no longer sufficient, because every user expects a unique experience and tends to abandon brands that don’t understand their needs. Furthermore, brands are unable to update the action recommendations in real time due to the manual nature of the process.
With Next Best Action, you can determine the actions that have the highest likelihood of engaging each individual user based on their preferences, needs, and history. Next Best Action takes the in-session interests of each user into account and provides action recommendations in real time. You can recommend actions such as enrolling in loyalty programs, signing up for a newsletter or magazine, exploring a new category, downloading an app, and other actions that encourage conversion. This will enable you to improve each user’s experience by providing them with recommendations on actions across their user journey that will help promote long-term brand engagement and revenue. It will also help improve your return on marketing investment by recommending the action that each user has a high likelihood of taking.
AWS Partners like Credera are excited by the personalization possibilities that the Amazon Personalize Next Best Action recipe will unlock for their customers.

“Amazon Personalize is a world-class machine learning solution that enables companies to create meaningful customer experiences across a wide array of use cases without extensive rework or up-front implementation cost that is typically required of these types of solutions. We are really excited about the addition of the Next Best Action capability that will enable customers to provide personalized action recommendations, significantly improving their digital experiences and driving additional business value. Specifically, we expect anyone working within the retail or content space to see an improved experience for their customers and higher conversions as a direct result of using Amazon Personalize. We are extremely thrilled to be a launch partner with AWS on this release and looking forward to empowering businesses to drive ML-based personalized solutions with Next Best Action.”
– Jason Goth, Partner and Chief Technology Officer, Credera.

Example use cases
To explore the impact of this new feature in greater detail, let’s review an example by taking three users: A (User_id 11999), B (User_id 17141), and C (User_id 8103), who are in different stages of their user journey while making purchases on a website. We then see how Next Best Action suggests the optimal actions for each user based on their past interactions and preferences.
First, we look at the action interactions dataset to understand how users have interacted with actions in the past. The following example shows the three users and their different shopping patterns. User A is a frequent buyer and has shopped mostly in the “Beauty & Grooming” and “Jewelry” categories in the past. User B is a casual buyer who has made a few purchases in the “Electronics” category in the past, and User C is a new user on the website who has made their first purchase in the “Clothing” category.

| User Type | User_id | Actions | Action_Event_Type | Timestamp |
| --- | --- | --- | --- | --- |
| User A | 11999 | Purchase in “Beauty & Grooming” category | taken | 2023-09-17 20:03:05 |
| User A | 11999 | Purchase in “Beauty & Grooming” category | taken | 2023-09-18 19:28:38 |
| User A | 11999 | Purchase in “Beauty & Grooming” category | taken | 2023-09-20 17:49:52 |
| User A | 11999 | Purchase in “Jewelry” category | taken | 2023-09-26 18:36:16 |
| User A | 11999 | Purchase in “Beauty & Grooming” category | taken | 2023-09-30 19:21:05 |
| User A | 11999 | Download the mobile app | taken | 2023-09-30 19:29:35 |
| User A | 11999 | Purchase in “Jewelry” category | taken | 2023-10-01 19:35:47 |
| User A | 11999 | Purchase in “Beauty & Grooming” category | taken | 2023-10-04 19:19:34 |
| User A | 11999 | Purchase in “Jewelry” category | taken | 2023-10-06 20:38:55 |
| User A | 11999 | Purchase in “Beauty & Grooming” category | taken | 2023-10-10 20:17:07 |
| User B | 17141 | Purchase in “Electronics” category | taken | 2023-09-29 20:17:49 |
| User B | 17141 | Purchase in “Electronics” category | taken | 2023-10-02 00:38:08 |
| User B | 17141 | Purchase in “Electronics” category | taken | 2023-10-07 11:04:56 |
| User C | 8103 | Purchase in “Clothing” category | taken | 2023-09-26 18:30:56 |

Traditionally, brands either show the same actions to all users or employ user segmentation strategies to recommend actions to their user base. The following table is an example of a brand showing the same set of actions to all users. These actions may or may not be relevant to the users, reducing their engagement with the brand.

| User Type | User_id | Action Recommendations | Rank of Action |
| --- | --- | --- | --- |
| User A | 11999 | Subscribe to Loyalty Program | 1 |
| User A | 11999 | Download the mobile app | 2 |
| User A | 11999 | Purchase in “Electronics” category | 3 |
| User B | 17141 | Subscribe to Loyalty Program | 1 |
| User B | 17141 | Download the mobile app | 2 |
| User B | 17141 | Purchase in “Electronics” category | 3 |
| User C | 8103 | Subscribe to Loyalty Program | 1 |
| User C | 8103 | Download the mobile app | 2 |
| User C | 8103 | Purchase in “Electronics” category | 3 |

Now let’s use Next Best Action to recommend actions for each user. After you define the actions eligible for recommendations, the aws-next-best-action recipe returns a ranked list of actions, personalized for each user, based on user propensity (the probability of a user taking a particular action, ranging between 0.0–1.0) and value of that action, if provided. For the purpose of this post, we only consider user propensity.
In the following example, we see that for User A (frequent buyer), Subscribe to Loyalty Program is the top recommended action with a propensity score of 1.00, which means that this user is most likely to enroll in the loyalty program because they have made numerous purchases. Therefore, recommending the action Subscribe to Loyalty Program to User A has a high probability of increasing User A’s engagement.

| User Type | User_id | Action Recommendations | Rank of Action | Propensity Score |
| --- | --- | --- | --- | --- |
| User A | 11999 | Subscribe to Loyalty Program | 1 | 1.00 |
| User A | 11999 | Purchase in “Jewelry” category | 2 | 0.86 |
| User A | 11999 | Purchase in “Beauty & Grooming” category | 3 | 0.85 |
| User B | 17141 | Purchase in “Electronics” category | 1 | 0.78 |
| User B | 17141 | Subscribe to Loyalty Program | 2 | 0.71 |
| User B | 17141 | Purchase in “Smart Homes” category | 3 | 0.66 |
| User C | 8103 | Purchase in “Handbags & Shoes” category | 1 | 0.60 |
| User C | 8103 | Download the mobile app | 2 | 0.48 |
| User C | 8103 | Purchase in “Clothing” category | 3 | 0.46 |

Similarly, User B (casual buyer persona) has a higher probability of continuing to purchase in the “Electronics” category and of buying new products in a similar category, “Smart Homes”. Therefore, Next Best Action recommends prioritizing the actions Purchase in “Electronics” category and Purchase in “Smart Homes” category. This means that if you prompt User B to buy products in these two categories, it can lead to greater engagement. We also notice that the action Subscribe to Loyalty Program is recommended to User B, but with a lower propensity score of 0.71 compared to User A, whose propensity score is 1.00. This is because users who have a deeper history and are further along their shopping journey benefit more from loyalty programs because of the added benefits, and are highly likely to interact more.
Finally, we see that the next best action for User C is purchasing in the “Handbags & Shoes” category, which is similar to their previous action, Purchase in “Clothing” category. We also see that the propensity score for Download the mobile app (0.48) is lower than that of Purchase in “Handbags & Shoes” category (0.60). This means that if you encourage User C to purchase products in a complementary category (“Handbags & Shoes”) rather than to download the mobile app, they are more likely to stick with your brand and continue shopping in the future.
For more details on how to implement the Next Best Action (aws-next-best-action) recipe, refer to the documentation.
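The following is a minimal sketch of how you might retrieve these recommendations with the AWS SDK for Python (Boto3) once a campaign backed by the aws-next-best-action recipe is deployed. The campaign ARN, user ID, and response field names shown here are illustrative assumptions; check the GetActionRecommendations API reference for the exact request and response shapes.

```python
import boto3

# Amazon Personalize runtime client; the GetActionRecommendations API was
# introduced alongside the Next Best Action recipe.
personalize_runtime = boto3.client("personalize-runtime")

# The campaign ARN below is a placeholder for a campaign trained with the
# aws-next-best-action recipe; replace it with your own.
response = personalize_runtime.get_action_recommendations(
    campaignArn="arn:aws:personalize:us-east-1:123456789012:campaign/next-best-action-demo",
    userId="11999",   # User A from the example above
    numResults=3,     # return the top 3 actions
)

# The response contains a ranked list of actions with propensity-based scores;
# field names below follow the API documentation at the time of writing.
for action in response.get("actionList", []):
    print(action["actionId"], action["score"])
```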
Conclusion
The new Next Best Action recipe in Amazon Personalize helps you recommend the right actions to the right user in real time based on their individual behavior and needs. This will enable you to maximize user engagement and lead to greater conversion rates.
For more information about Amazon Personalize, see the Amazon Personalize Developer Guide.

About the Authors
Shreeya Sharma is a Sr. Technical Product Manager working with AWS AI/ML on Amazon Personalize. She has a background in computer science engineering, technology consulting, and data analytics. In her spare time, she enjoys traveling, performing theatre, and trying out new adventures.
Pranesh Anubhav is a Senior Software Engineer for Amazon Personalize. He is passionate about designing machine learning systems to serve customers at scale. Outside of his work, he loves playing soccer and is an avid follower of Real Madrid.
Aniket Deshmukh is an Applied Scientist in AWS AI labs supporting Amazon Personalize. Aniket works in the general area of recommendation systems, contextual bandits, and multi-modal deep learning.

Check Out This New AI System Called Student of Games (SoG) that is capable of both Beating Humans at a Variety of Games and Learning to Play New Ones

There is a long tradition of using games as AI performance indicators. Search and learning-based approaches performed well in various perfect information games, while game theory-based methods performed well in a few imperfect information poker variations. By combining directed search, self-play learning, and game-theoretic reasoning, AI researchers from EquiLibre Technologies, Sony AI, Amii, and Midjourney, working with Google’s DeepMind project, propose Student of Games, a general-purpose algorithm that unifies earlier efforts. With its high empirical performance in large perfect and imperfect information games, Student of Games is a significant step toward developing universal algorithms applicable in any setting. The authors show that, with increasing computational and approximation power, Student of Games is robust and eventually achieves flawless play. Student of Games performs strongly in chess and Go, beats the strongest openly available agent in heads-up no-limit Texas hold ’em poker, and defeats the state-of-the-art agent in Scotland Yard, an imperfect information game that illustrates the value of guided search, learning, and game-theoretic reasoning.

To demonstrate how far artificial intelligence has progressed, a computer was taught to play a board game and then improved to the point where it could beat humans at the game. With this latest study, the team has made significant progress toward creating artificial general intelligence, where a computer can perform tasks previously thought impossible for a machine.

Most board game-playing computers have been designed to play just one game, like chess. By designing and constructing such systems, scientists have created a form of constrained artificial intelligence. The researchers behind this new project have developed an intelligent system that can compete in games that require a wide range of abilities.

What is SoG – “Student Of Games”?

SoG combines search, learning, and game-theoretic analysis into a single algorithm with many practical applications. It pairs a growing-tree counterfactual regret minimization (GT-CFR) technique for learning counterfactual value-and-policy networks (CVPNs) with sound self-play. In particular, SoG is a sound algorithm for both perfect and imperfect information games: it is guaranteed to produce a better approximation of minimax-optimal strategies as computational resources increase. This is also demonstrated empirically in Leduc poker, where additional search refines the approximation at test time, unlike pure RL systems that do not use search.

Why is SoG so effective?

SoG employs a technique called growing-tree counterfactual regret minimization (GT-CFR), a form of local search that may be performed at any time and that builds subgames non-uniformly, giving more weight to the subgames associated with the most important future states. Further, SoG employs a learning technique called sound self-play, which trains value-and-policy networks based on game results and recursive sub-searches applied to situations discovered in earlier searches. As a significant step toward universal algorithms that can be learned in any situation, SoG exhibits good performance across multiple problem domains with perfect and imperfect information. In imperfect information games, standard applications of search face well-known issues.

Summary of Algorithms

The SoG method uses sound self-play to instruct the agent: when making a choice, each player uses a well-tuned GT-CFR search coupled with a CVPN to produce a policy for the current state, which is then used to randomly sample an action. GT-CFR is a two-stage process that begins with the present public state and ends with a grown tree. During the regret update phase, the CFR statistics of the current public tree are updated. During the expansion phase, new public states are added to the tree using simulation-based expansion trajectories. A GT-CFR iteration comprises one run of the regret update phase and one run of the expansion phase.
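To make the regret-update phase more concrete, the sketch below shows the standard regret-matching step that CFR-style algorithms (including GT-CFR) use to turn accumulated counterfactual regrets at a decision point into a strategy. This is generic CFR machinery, not the authors’ code, and the surrounding tree-growing logic of GT-CFR is only summarized in comments.

```python
import numpy as np

def regret_matching(cumulative_regrets: np.ndarray) -> np.ndarray:
    """Convert cumulative counterfactual regrets into a strategy (core CFR step)."""
    positive = np.maximum(cumulative_regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    # No action has positive regret: fall back to a uniform strategy.
    return np.full(len(cumulative_regrets), 1.0 / len(cumulative_regrets))

# Schematic GT-CFR iteration (as described above, not the paper's implementation):
#   1. Regret-update phase: run CFR updates (regret matching plus counterfactual
#      value backups) over the current public tree, querying the CVPN at leaves.
#   2. Expansion phase: add new public states to the tree along simulated
#      expansion trajectories, concentrating effort on important future states.
```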

Training data for the value and policy networks is generated throughout the self-play process: search queries (public belief states queried by the CVPN during the GT-CFR regret update phase) and full-game trajectories. The search queries must be resolved to update the value network based on counterfactual value targets. The policy network can be adjusted to targets derived from the full-game trajectories. The actors create the self-play data (and answer inquiries) while the trainers discover and implement new networks and occasionally refresh the actors.

Some Limitations

The use of betting abstractions in poker might be abandoned in favor of a generic action-reduction policy for vast action spaces.

SoG currently requires enumerating each public state’s information, which can be prohibitively expensive in some games; a generative model that samples world states and operates on the sampled subset could approximate it.

Strong performance in challenge domains often requires a large amount of computational resources; an intriguing question is whether or not this level of performance is attainable with fewer resources.

The research team believes SoG has the potential to excel at other kinds of games because of its ability to teach itself how to play nearly any game, and it has already beaten rival AI systems and humans at Go, chess, Scotland Yard, and Texas hold ’em poker.

Check out the Paper. All credit for this research goes to the researchers of this project.


Meet snnTorch: An Open-Source Python Package for Performing Gradient-based Learning with Spiking Neural Networks

In artificial intelligence, efficiency and environmental impact have become paramount concerns. Addressing this, Jason Eshraghian from UC Santa Cruz developed snnTorch, an open-source Python library implementing spiking neural networks, drawing inspiration from the brain’s remarkable efficiency in processing data. The crux, highlighted in the research, lies in the inefficiency of traditional neural networks and their escalating environmental footprint.

Traditional neural networks lack the elegance of the brain’s processing mechanisms. Spiking neural networks emulate the brain by activating neurons only when there’s input, in contrast to conventional networks that continually process data. Eshraghian aims to infuse AI with the efficiency observed in biological systems, providing a tangible solution to environmental concerns arising from the energy-intensive nature of current neural networks.
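As a small illustration of this event-driven behavior, the sketch below uses snnTorch’s leaky integrate-and-fire (Leaky) neuron, which emits a spike only when its membrane potential crosses a threshold. The layer sizes and input data here are arbitrary demo values, and the exact API may vary slightly between snnTorch versions.

```python
import torch
import torch.nn as nn
import snntorch as snn

fc = nn.Linear(10, 1)       # maps a 10-dimensional input to a single neuron
lif = snn.Leaky(beta=0.9)   # leaky integrate-and-fire neuron; beta is the membrane decay rate

mem = lif.init_leaky()      # initialize the membrane potential
x = torch.rand(20, 10)      # 20 time steps of random input (arbitrary demo data)

spikes = []
for step in range(x.shape[0]):
    cur = fc(x[step])           # input current at this time step
    spk, mem = lif(cur, mem)    # the neuron spikes only when its membrane crosses threshold
    spikes.append(spk)

print(torch.stack(spikes).sum().item(), "spikes over 20 time steps")
```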

snnTorch, a pandemic-born passion project, has gained traction, surpassing 100,000 downloads. Its applications range from NASA’s satellite tracking work to collaborations with companies like Graphcore that optimize AI chips. snnTorch is committed to harnessing the brain’s power efficiency and seamlessly integrating it into AI functionality. Eshraghian, with a chip design background, sees the potential for optimizing computing chips through software and hardware co-design for maximum power efficiency.

As snnTorch adoption grows, so does the need for educational resources. Eshraghian’s paper, a companion to the library, serves a dual purpose: documenting the code and providing an educational resource for brain-inspired AI. It takes an exceptionally honest approach, acknowledging the unsettled nature of neuromorphic computing, sparing students frustration in a field where even experts grapple with uncertainty.

The research’s honesty extends to its presentation, featuring code blocks—a departure from conventional research papers. These blocks, with explanations, underline the unsettled nature of certain areas, offering transparency in an often opaque field. Eshraghian aims to provide a resource he wished he had during his coding journey. This transparency resonates positively with reports of the research used in onboarding at neuromorphic hardware startups.

The research explores the limitations and opportunities of brain-inspired deep learning, recognizing the gap in understanding brain processes compared to AI models. Eshraghian suggests a path forward: identifying correlations and discrepancies. One key difference is the brain’s inability to revisit past data, focusing on real-time information—an opportunity for enhanced energy efficiency crucial for sustainable AI.

The research delves into a fundamental neuroscience concept: neurons that “fire together, wire together.” Although this principle is traditionally seen as opposed to deep learning’s backpropagation, the researcher proposes a complementary relationship, opening avenues for exploration. Collaborating with biomolecular engineering researchers on cerebral organoids bridges the gap between biological models and computing research. Incorporating “wetware” into the software/hardware co-design paradigm, this multidisciplinary approach promises insights into brain-inspired learning.

In conclusion, snnTorch and its paper mark a milestone in the journey toward brain-inspired AI. Its success underscores the demand for energy-efficient alternatives to traditional neural networks. The researcher’s transparent and educational approach fosters a collaborative community dedicated to pushing neuromorphic computing boundaries. As guided by snnTorch insights, the field holds the potential to revolutionize AI and deepen our understanding of processes in the human brain.

Check out the Paper and Project. All credit for this research goes to the researchers of this project.


Meet HyperHuman: A Novel AI Framework for Hyper-Realistic Human Generation with Latent Structural Diffusion

The generation of hyper-realistic human images from user-defined conditions, such as text and pose, is meaningful for various applications, including image animation and virtual try-ons. Numerous efforts have been made to explore the task of controllable human image generation. Early methods either relied on variational auto-encoders (VAEs) in a reconstruction manner or improved realism through generative adversarial networks (GANs). Despite the creation of high-quality images by some methods, challenges like unstable training and limited model capacity confined them to small datasets with low diversity.

The recent emergence of diffusion models (DMs) has introduced a new paradigm for realistic synthesis, becoming the predominant architecture in generative AI. However, leading text-to-image (T2I) models like Stable Diffusion and DALL·E 2 still struggle to create human images with coherent anatomy, such as arms and legs, and natural poses. The primary challenge lies in the non-rigid deformations of the human form, which require structural information that is difficult to convey through text prompts alone.

Recent works, such as ControlNet and T2I-Adapter, have attempted to enable structural control for image generation by introducing a learnable branch to modulate pre-trained DMs, like Stable Diffusion, in a plug-and-play manner. However, these approaches suffer from feature discrepancies between the main and auxiliary branches, resulting in inconsistency between control signals (e.g., pose maps) and generated images. HumanSD proposes directly inputting the body skeleton into the diffusion U-Net through channel-wise concatenation to address this. However, this method is confined to generating artistic-style images with limited diversity. Additionally, human content is synthesized only with pose control, neglecting other crucial structural information like depth maps and surface-normal maps.

The work reported in this article proposes a unified framework, HyperHuman, to generate in-the-wild human images with high realism and diverse layouts. Its overview is illustrated in the figure below.

The key insight is recognizing the inherently structural nature of human images across multiple granularities, from coarse-level body skeletons to fine-grained spatial geometry. Capturing such correlations between explicit appearance and latent structure in one model is essential for generating coherent and natural human images. The paper establishes a large-scale human-centric dataset called HumanVerse, containing 340 million in-the-wild human images with comprehensive annotations. Based on this dataset, two modules are designed for hyper-realistic controllable human image generation: the Latent Structural Diffusion Model and the Structure-Guided Refiner. The former augments the pre-trained diffusion backbone to simultaneously denoise the RGB image, depth map, and surface-normal map, ensuring spatial alignment among the denoised textures and structures.

Due to such meticulous design, the modeling of image appearance, spatial relationships, and geometry occurs collaboratively within a unified network. Each branch complements the others, incorporating both structural awareness and textural richness. An enhanced noise schedule eliminates low-frequency information leakage, ensuring uniform depth and surface-normal values in local regions. Employing the same timestep for each branch enhances learning and facilitates feature fusion. With spatially-aligned structure maps, the Structure-Guided Refiner composes predicted conditions for detailed, high-resolution image generation. Additionally, a robust conditioning scheme is designed to alleviate the impact of error accumulation in the two-stage generation pipeline.
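As a purely conceptual sketch (not HyperHuman’s actual architecture), the snippet below illustrates the basic idea of denoising several spatially aligned modalities with one shared network and a single shared timestep embedding, so that texture and structure are predicted jointly. All layer shapes, sizes, and the network design are invented for illustration.

```python
import torch
import torch.nn as nn

class JointDenoiser(nn.Module):
    """Toy denoiser predicting RGB, depth, and surface normals from one shared trunk."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        # 3 (RGB) + 1 (depth) + 3 (normal) noisy channels in, shared features out.
        self.trunk = nn.Sequential(
            nn.Conv2d(7, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.SiLU(),
        )
        self.t_embed = nn.Linear(1, hidden)  # one timestep embedding shared by all branches
        self.rgb_head = nn.Conv2d(hidden, 3, 3, padding=1)
        self.depth_head = nn.Conv2d(hidden, 1, 3, padding=1)
        self.normal_head = nn.Conv2d(hidden, 3, 3, padding=1)

    def forward(self, rgb, depth, normal, t):
        h = self.trunk(torch.cat([rgb, depth, normal], dim=1))
        h = h + self.t_embed(t.view(-1, 1))[:, :, None, None]  # same timestep for every branch
        return self.rgb_head(h), self.depth_head(h), self.normal_head(h)

model = JointDenoiser()
rgb, depth, normal = torch.randn(2, 3, 64, 64), torch.randn(2, 1, 64, 64), torch.randn(2, 3, 64, 64)
out_rgb, out_depth, out_normal = model(rgb, depth, normal, t=torch.rand(2))
```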

A comparison with state-of-the-art techniques is reported below.

The first 4×4 grid of each row contains the input skeleton, jointly denoised normal, depth, and coarse RGB (512×512) as computed by HyperHuman.

This was the summary of HyperHuman, a novel AI framework for generating in-the-wild human images with high realism and diverse layouts. If you are interested and want to learn more about it, please feel free to refer to the links cited below. 

Check out the Paper and Project. All credit for this research goes to the researchers of this project.


This AI Research Proposes a Fully Automated Solution for Consistent Character Generation with the Sole Input being a Text Prompt

A key component of many creative projects is the capacity of the created visual content to remain consistent across different situations, as seen in Figure 1. These include drawing book illustrations, building brands, making comics, presentations, websites, and more. Establishing brand identification, enabling narrative, improving communication, and fostering emotional connection all depend on this consistency. This study intends to address the problem of text-to-image generative models’ inability to generate images consistently despite their increasingly amazing capabilities. 

Figure 1: The Chosen One: The approach distills a representation that allows for consistent portrayal of the same character in new circumstances given a text prompt identifying a character.

They specifically discuss the challenge of consistent character generation, in which they derive a representation that allows them to generate consistent portrayals of the same character in new circumstances, given an input text prompt specifying the character. Even though they discuss characters frequently in this paper, their work is relevant to general visual subjects. Think of an illustrator creating a Plasticine cat figure, for instance. Using a prompt that describes the character with a cutting-edge text-to-image model yields a range of inconsistent results, as shown in Figure 2. In contrast, the study demonstrates how to distill a dependable depiction of the cat (2nd row), which may subsequently be applied to portray the same character in various circumstances.

Figure 2: Consistency of identity: The technique yields the same cat, while a traditional text-to-image diffusion model creates multiple cats (all according to the input text) given the command “a Plasticine of a cute baby cat with big eyes.”

An array of ad hoc solutions has already been born out of the necessity for consistent character creation and the broad appeal of text-to-image generative models. These include generating visual variants and manually sorting them by resemblance, or using celebrity names as prompts to create consistent individuals. Unlike these haphazard, labor-intensive methods, they provide a completely automated, systematic strategy for reliable character creation. The scholarly works that deal with personalization and narrative development are the ones most directly related to this setting. A few of these techniques take many user-supplied photos and create a representation of a specific character. Others rely on textual inversion of an existing portrayal of a human face, or cannot generalize to new characters outside the training set.

In this study, researchers from Google Research, The Hebrew University of Jerusalem, Tel Aviv University, and Reichman University contend that producing a consistent character is often more important than visually replicating a certain appearance in many applications. As a result, they tackle a novel context in which their goal is to automatically extract a coherent depiction of a persona that need only adhere to one natural language description. Their approach allows for creating a novel, consistent character that does not necessarily need to mirror any current visual portrayal because it does not require any photos of the target character as input. Their fully automated approach to the consistent character generation challenge is predicated on the idea that groups of pictures with common traits would be present in an adequately large set of created images for a given prompt. 

It is possible to derive a representation from such a cluster that encapsulates the “common ground” amongst its pictures. They can improve the consistency of the output images while adhering to the original input prompt by repeating the procedure with this representation. First, they create a gallery of images based on the given language prompt, and then use a pre-trained feature extractor to embed those images in a Euclidean space. They then group these embeddings into clusters and select the most cohesive cluster as input for a customization technique that seeks a consistent identity. The next gallery of images, which still depicts the input prompt but should show better consistency, is then created using the resulting model.
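The sketch below outlines that generate-embed-cluster-personalize loop. The helper names generate_gallery, embed_images, and personalize_model are hypothetical placeholders for a text-to-image model, a pre-trained feature extractor, and a customization step such as fine-tuning or textual inversion; only the clustering and cohesion-based selection are spelled out.

```python
import numpy as np
from sklearn.cluster import KMeans

def most_cohesive_cluster(embeddings: np.ndarray, n_clusters: int = 5) -> np.ndarray:
    """Cluster image embeddings and return the indices of the tightest cluster."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embeddings)
    best, best_spread = None, np.inf
    for c in range(n_clusters):
        members = np.where(labels == c)[0]
        center = embeddings[members].mean(axis=0)
        spread = np.linalg.norm(embeddings[members] - center, axis=1).mean()
        # A real implementation would also weigh cluster size, not just tightness.
        if spread < best_spread:
            best, best_spread = members, spread
    return best

# Hypothetical outer loop (generate_gallery, embed_images, personalize_model are placeholders):
# model = base_text_to_image_model
# for _ in range(num_iterations):              # repeat until the gallery looks consistent
#     gallery = generate_gallery(model, prompt)
#     emb = embed_images(gallery)              # pre-trained feature extractor
#     chosen = most_cohesive_cluster(emb)
#     model = personalize_model(model, [gallery[i] for i in chosen])
```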

This procedure is repeated iteratively until convergence. They perform user research and evaluate their strategy quantitatively and qualitatively against many baselines. Lastly, they demonstrate several applications. To summarize, their contributions consist of three main parts:

They define the task of consistent character generation.

They provide a unique approach to this work.

They conduct user research and quantitative and qualitative evaluation of their technique to show its efficacy.

Check out the Paper and Project Page. All credit for this research goes to the researchers of this project.


NVIDIA AI Research Releases HelpSteer: A Multiple Attribute Helpfulness Preference Dataset for STEERLM with 37k Samples

In the significantly advancing field of Artificial Intelligence (AI) and Machine Learning (ML), developing intelligent systems that smoothly align with human preferences is crucial. The development of Large Language Models (LLMs), which seek to imitate humans by generating content and answering questions like a human, has led to massive popularity in AI. 

SteerLM, which has been recently introduced as a technique for supervised fine-tuning, gives end users more control over model responses during inference. In contrast to traditional methods like Reinforcement Learning from Human Feedback (RLHF), SteerLM uses a multi-dimensional collection of expressly stated qualities. This gives users the ability to direct AI to produce responses that satisfy preset standards, such as helpfulness, and allow customization based on particular requirements.

The criterion differentiating more helpful responses from less helpful ones is not well-defined in the open-source datasets currently available for training language models on helpfulness preferences. As a result, models trained on these datasets sometimes unintentionally learn to favor specific dataset artifacts, such as giving longer responses more weight than they actually have, even when those responses aren’t that helpful. 

To overcome this challenge, a team of researchers from NVIDIA has introduced a dataset called HELPSTEER, an extensive compilation created to annotate the many elements that influence how helpful responses are. The dataset contains 37,000 samples, with annotations for verbosity, coherence, accuracy, and complexity, as well as an overall helpfulness rating for every response. These characteristics go beyond a straightforward length-based preference to offer a more nuanced view of what constitutes a truly helpful response.

The team used the Llama 2 70B model with the STEERLM approach to train language models efficiently on this dataset. Without using training data from more powerful models such as GPT-4, the final model outperformed all other open models, achieving a score of 7.54 on MT-Bench. This demonstrates how well the HELPSTEER dataset works to improve language model performance and to address the issues found in other datasets.

The HELPSTEER dataset has been made available by the team under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. Language researchers and developers can use this publicly available dataset to continue developing and testing helpfulness-preference-focused language models. The dataset can be accessed on Hugging Face at https://huggingface.co/datasets/nvidia/HelpSteer.
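Because the dataset is hosted on the Hugging Face Hub, it can be loaded with the datasets library as sketched below. The attribute column names shown (helpfulness, correctness, coherence, complexity, verbosity) are assumed from the dataset card at the time of writing; verify them against the dataset page before relying on them.

```python
from datasets import load_dataset

# Load the publicly available HelpSteer dataset from the Hugging Face Hub.
helpsteer = load_dataset("nvidia/HelpSteer")

example = helpsteer["train"][0]
print(example["prompt"][:200])
print(example["response"][:200])

# Each response is annotated on a small integer scale for several attributes
# (column names assumed from the dataset card).
for attr in ["helpfulness", "correctness", "coherence", "complexity", "verbosity"]:
    print(attr, example[attr])
```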

The team has summarized their primary contributions as follows,

A 37k-sample helpfulness dataset has been developed consisting of annotated responses for accuracy, coherence, complexity, verbosity, and overall helpfulness.

Llama 2 70B has been trained using the dataset and achieved a leading MT-Bench score of 7.54, without relying on data generated by proprietary models such as GPT-4.

The dataset has been made publicly available under a CC-BY-4.0 license to promote community access for further study and development based on the findings.

In conclusion, the HELPSTEER dataset is a valuable contribution that fills a significant gap in currently available open-source datasets. The dataset has demonstrated its efficacy in teaching language models to prioritize characteristics such as accuracy, coherence, complexity, and verbosity, leading to improved outcomes.

Check out the Paper and Dataset. All credit for this research goes to the researchers of this project.


Learn How to Generate 3D Avatars from 2D Image Collections with this Novel AI Technique

Generative models, such as Generative Adversarial Networks (GANs), have the capacity to generate lifelike images of objects and dressed individuals after being trained on an extensive image collection. Although the resulting output is a 2D image, numerous applications necessitate diverse and high-quality virtual 3D avatars. These avatars should allow pose and camera viewpoint control while ensuring 3D consistency. To address the demand for 3D avatars, the research community explores generative models capable of automatically generating 3D shapes of humans and clothing based on input parameters like body pose and shape. Despite considerable advancements, most existing methods overlook texture and rely on precise and clean 3D scans of humans for training. Acquiring such scans is expensive, limiting their availability and diversity.

Developing a method for learning the generation of 3D human shapes and textures from unstructured image data presents a challenging and under-constrained problem. Each training instance exhibits unique shapes and appearances, observed only once from specific viewpoints and poses. While recent progress in 3D-aware GANs has shown impressive results for rigid objects, these methods face difficulties in generating realistic humans due to the complexity of human articulation. Although some recent work demonstrates the feasibility of learning articulated humans, existing approaches struggle with limited quality, resolution, and challenges in modeling loose clothing.

The paper reported in this article introduces a novel method for 3D human generation from 2D image collections, achieving state-of-the-art image and geometry quality while effectively modeling loose clothing.

The overview of the proposed method is illustrated below.

This method adopts a monolithic design capable of modeling both the human body and loose clothing, departing from the approach of representing humans with separate body parts. Multiple discriminators are incorporated to enhance geometric detail and focus on perceptually important regions.

A novel generator design is proposed to address the goal of high image quality and flexible handling of loose clothing, modeling 3D humans holistically in a canonical space. The articulation module, Fast-SNARF, is responsible for the movement and positioning of body parts and is adapted to the generative setting. Additionally, the model adopts empty-space skipping, optimizing and accelerating the rendering of areas with no significant content to improve overall efficiency.

The modular 2D discriminators are guided by normal information, meaning they consider the directionality of surfaces in the 3D space. This guidance helps the model focus on regions that are perceptually important for human observers, contributing to a more accurate and visually pleasing outcome. Furthermore, the discriminators prioritize geometric details, enhancing the overall quality of the generated images. This improvement likely contributes to a more realistic and visually appealing representation of the 3D human models.

The experimental results reported above demonstrate a significant improvement of the proposed method over previous 3D- and articulation-aware methods in terms of geometry and texture quality, validated quantitatively, qualitatively, and through perceptual studies.

In summary, this contribution includes a generative model of articulated 3D humans with state-of-the-art appearance and geometry, an efficient generator for loose clothing, and specialized discriminators enhancing visual and geometric fidelity. The authors plan to release the code and models for further exploration.

Check out the Paper and Project Page. All credit for this research goes to the researchers of this project.


Accelerating AI/ML development at BMW Group with Amazon SageMaker Studio

This post is co-written with Marc Neumann, Amor Steinberg and Marinus Krommenhoek from BMW Group.
The BMW Group – headquartered in Munich, Germany – is driven by 149,000 employees worldwide and manufactures in over 30 production and assembly facilities across 15 countries. Today, the BMW Group is the world’s leading manufacturer of premium automobiles and motorcycles, and provider of premium financial and mobility services. The BMW Group sets trends in production technology and sustainability as an innovation leader with an intelligent material mix, a technological shift towards digitalization, and resource-efficient production.
In an increasingly digital and rapidly changing world, BMW Group’s business and product development strategies rely heavily on data-driven decision-making. With that, the need for data scientists and machine learning (ML) engineers has grown significantly. These skilled professionals are tasked with building and deploying models that improve the quality and efficiency of BMW’s business processes and enable informed leadership decisions.
Data scientists and ML engineers require capable tooling and sufficient compute for their work. Therefore, BMW established a centralized ML/deep learning infrastructure on premises several years ago and continuously upgraded it. To pave the way for the growth of AI, BMW Group needed to make a leap regarding scalability and elasticity while reducing operational overhead, software licensing, and hardware management.
In this post, we will talk about how BMW Group, in collaboration with AWS Professional Services, built its Jupyter Managed (JuMa) service to address these challenges. JuMa is a service of BMW Group’s AI platform for its data analysts, ML engineers, and data scientists that provides a user-friendly workspace with an integrated development environment (IDE). It is powered by Amazon SageMaker Studio and provides JupyterLab for Python and Posit Workbench for R. This offering enables BMW ML engineers to perform code-centric data analytics and ML, increases developer productivity by providing self-service capability and infrastructure automation, and tightly integrates with BMW’s centralized IT tooling landscape.
JuMa is now available to all data scientists, ML engineers, and data analysts at BMW Group. The service streamlines ML development and production workflows (MLOps) across BMW by providing a cost-efficient and scalable development environment that facilitates seamless collaboration between data science and engineering teams worldwide. This results in faster experimentation and shorter idea validation cycles. Moreover, the JuMa infrastructure, which is based on AWS serverless and managed services, helps reduce operational overhead for DevOps teams and allows them to focus on enabling use cases and accelerating AI innovation at BMW Group.
Challenges of growing an on-premises AI platform
Prior to introducing the JuMa service, BMW teams worldwide were using two on-premises platforms that provided teams with JupyterHub and RStudio environments. These platforms were too limited in terms of CPU, GPU, and memory to support the scaling of AI at BMW Group. Scaling them would have meant managing more on-premises hardware, more software licenses, and support fees, requiring significant up-front investment and high maintenance effort. In addition, only limited self-service capabilities were available, requiring high operational effort from the DevOps teams. More importantly, the use of these platforms was misaligned with BMW Group’s IT cloud-first strategy. For example, teams using these platforms lacked an easy way to migrate their AI/ML prototypes to industrialized solutions running on AWS. In contrast, the data science and analytics teams already using AWS directly for experimentation needed to also take care of building and operating their AWS infrastructure while ensuring compliance with BMW Group’s internal policies, local laws, and regulations. This included a range of configuration and governance activities, from ordering AWS accounts and limiting internet access to using allow-listed packages and keeping Docker images up to date.
Overview of solution
JuMa is a fully managed multi-tenant, security hardened AI platform service built on AWS with SageMaker Studio at the core. By relying on AWS serverless and managed services as the main building blocks of the infrastructure, the JuMa DevOps team doesn’t need to worry about patching servers, upgrading storage, or managing any other infrastructure components. The service handles all those processes automatically, providing a powerful technical platform that is generally up to date and ready to use.
JuMa users can effortlessly order a workspace via a self-service portal to create a secure and isolated development and experimentation environment for their teams. After a JuMa workspace is provisioned, the users can launch JupyterLab or Posit workbench environments in SageMaker Studio with just a few clicks and start the development immediately, using the tools and frameworks they are most familiar with. JuMa is tightly integrated with a range of BMW Central IT services, including identity and access management, roles and rights management, BMW Cloud Data Hub (BMW’s data lake on AWS) and on-premises databases. The latter helps AI/ML teams seamlessly access required data, given they are authorized to do so, without needing to build data pipelines. Furthermore, the notebooks can be integrated into the corporate Git repositories to collaborate using version control.
The solution abstracts away all technical complexities associated with AWS account management, configuration, and customization for AI/ML teams, allowing them to fully focus on AI innovation. The platform ensures that the workspace configuration meets BMW’s security and compliance requirements out of the box.
The following diagram describes the high-level context view of the architecture.

User journey
BMW AI/ML team members can order their JuMa workspace using BMW’s standard catalog service. After approval by the line manager, the ordered JuMa workspace is provisioned by the platform fully automatically. The workspace provisioning workflow includes the following steps (as numbered in the architecture diagram).

A data scientist team orders a new JuMa workspace in BMW’s Catalog. JuMa automatically provisions a new AWS account for the workspace. This ensures full isolation between the workspaces following the federated model account structure mentioned in SageMaker Studio Administration Best Practices.
JuMa configures a workspace (which is a SageMaker domain) that only allows predefined Amazon SageMaker features required for experimentation and development, specific custom kernels, and lifecycle configurations. It also sets up the required subnets and security groups that ensure the notebooks run in a secure environment.
After the workspaces are provisioned, the authorized users log in to the JuMa portal and access the SageMaker Studio IDE within their workspace using a SageMaker pre-signed URL. Users can choose between opening a SageMaker Studio private space or a shared space. Shared spaces encourage collaboration between different members of a team that can work in parallel on the same notebooks, whereas private spaces allow for a development environment for solitary workloads.
Using the BMW data portal, users can request access to on-premises databases or data stored in BMW’s Cloud Data Hub, making it available in their workspace for development and experimentation, from data preparation and analysis to model training and validation.

After an AI model is developed and validated in JuMa, AI teams can use the MLOps service of the BMW AI platform to deploy it to production quickly and effortlessly. This service provides users with a production-grade ML infrastructure and pipelines on AWS using SageMaker, which can be set up in minutes with just a few clicks. Users simply need to host their model on the provisioned infrastructure and customize the pipeline to meet their specific use case needs. In this way, the AI platform covers the entire AI lifecycle at BMW Group.
JuMa features
Following best practice architecting on AWS, the JuMa service was designed and implemented according to the AWS Well-Architected Framework. Architectural decisions of each Well-Architected pillar are described in detail in the following sections.
Security and compliance
To assure full isolation between the tenants, each workspace receives its own AWS account, where the authorized users can jointly collaborate on analytics tasks as well as on developing and experimenting with AI/ML models. The JuMa portal itself enforces isolation at runtime using policy-based isolation with AWS Identity and Access Management (IAM) and the JuMa user’s context. For more information about this strategy, refer to Run-time, policy-based isolation with IAM.
Data scientists can only access their domain through the BMW network via pre-signed URLs generated by the portal. Direct internet access is disabled within their domain. Their SageMaker domain privileges are built using Amazon SageMaker Role Manager personas to ensure least privilege access to the AWS services needed for development, such as SageMaker, Amazon Athena, Amazon Simple Storage Service (Amazon S3), and AWS Glue. This role implements ML guardrails (such as those described in Governance and control), including enforcing that ML training occurs either in Amazon Virtual Private Cloud (Amazon VPC) or without internet access, and allowing only the use of JuMa’s custom vetted and up-to-date SageMaker images.
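A minimal sketch of how a portal backend can generate such a pre-signed URL with Boto3 is shown below. The domain ID, user profile name, and expiry values are placeholders, and in JuMa’s case this call is wrapped by the portal rather than invoked by end users directly.

```python
import boto3

sagemaker = boto3.client("sagemaker")

# Placeholder identifiers; a real portal would look these up for the requesting user.
response = sagemaker.create_presigned_domain_url(
    DomainId="d-exampledomainid",
    UserProfileName="data-scientist-jane",
    SessionExpirationDurationInSeconds=1800,  # how long the Studio session stays valid
    ExpiresInSeconds=300,                     # how long the URL itself can be redeemed
)

print(response["AuthorizedUrl"])  # hand this URL back to the user through the portal
```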
Because JuMa is designed for development, experimentation, and ad-hoc analysis, it implements retention policies to remove data after 30 days. To access data whenever needed and store it for long term, JuMa seamlessly integrates with the BMW Cloud Data Hub and BMW on-premises databases.
Finally, JuMa supports multiple Regions to comply with specific local legal requirements that, for example, require data to be processed locally to preserve BMW’s data sovereignty.
Operational excellence
Both the JuMa platform backend and workspaces are implemented with AWS serverless and managed services. Using those services helps minimize the effort of the BMW platform team maintaining and operating the end-to-end solution, striving to be a no-ops service. Both the workspace and portal are monitored using Amazon CloudWatch logs, metrics, and alarms to check key performance indicators (KPIs) and proactively notify the platform team of any issues. Additionally, the AWS X-Ray distributed tracing system is used to trace requests throughout multiple components and annotate CloudWatch logs with workspace-relevant context.
All changes to the JuMa infrastructure are managed and implemented through automation using infrastructure as code (IaC). This helps reduce manual efforts and human errors, increase consistency, and ensure reproducible and version-controlled changes across both JuMa platform backend workspaces. Specifically, all workspaces are provisioned and updated through an onboarding process built on top of AWS Step Functions, AWS CodeBuild, and Terraform. Therefore, no manual configuration is required to onboard new workspaces to the JuMa platform.
Cost optimization
By using AWS serverless services, JuMa ensures on-demand scalability, pre-approved instance sizes, and a pay-as-you-go model for the resources used during the development and experimentation activities per the AI/ML teams’ needs. To further optimize costs, the JuMa platform monitors and identifies idle resources within SageMaker Studio and shuts them down automatically to prevent expenses for non-utilized resources.
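A simplified sketch of such a cleanup job is shown below: it lists the Studio apps in a domain and deletes those considered idle. How idleness is detected (for example, from CloudWatch metrics or lifecycle-configuration heartbeats) is left as a placeholder, and the domain ID is an assumed value.

```python
import boto3

sagemaker = boto3.client("sagemaker")
DOMAIN_ID = "d-exampledomainid"  # placeholder workspace domain ID

def is_idle(app: dict) -> bool:
    """Placeholder: a real check would inspect CloudWatch metrics or activity heartbeats."""
    return False

# Pagination is omitted for brevity; list_apps returns a NextToken for large domains.
apps = sagemaker.list_apps(DomainIdEquals=DOMAIN_ID)["Apps"]
for app in apps:
    user = app.get("UserProfileName")
    if not user:
        continue  # skip apps running in shared spaces in this simple sketch
    if app["Status"] == "InService" and app["AppType"] != "JupyterServer" and is_idle(app):
        # Deleting a KernelGateway app shuts down its underlying compute instance.
        sagemaker.delete_app(
            DomainId=DOMAIN_ID,
            UserProfileName=user,
            AppType=app["AppType"],
            AppName=app["AppName"],
        )
```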
Sustainability
JuMa replaces BMW’s two on-premises platforms for analytics and deep learning workloads that consume a considerable amount of electricity and produce CO2 emissions even when not in use. By migrating AI/ML workloads from on premises to AWS, BMW will slash its environmental impact by decommissioning the on-premises platforms.
Furthermore, the auto shutdown of idle resources, data retention policies, and workspace usage reports to owners implemented in JuMa help further minimize the environmental footprint of running AI/ML workloads on AWS.
Performance efficiency
By using SageMaker Studio, BMW teams benefit from an easy adoption of the latest SageMaker features that can help accelerate their experimentation. For example, they can use Amazon SageMaker JumpStart capabilities to use the latest pre-trained, open source models. Additionally, it helps reduce AI/ML team efforts moving from experimentation to solution industrialization, because the development environment provides the same AWS core services but restricted to development capabilities.
Reliability
SageMaker Studio domains are deployed in a VPC-only mode to manage internet access and only allow access to intended AWS services. The network is deployed in two Availability Zones to protect against a single point of failure, achieving greater resiliency and availability of the platform to its users.
Changes to JuMa workspaces are automatically deployed and tested to development and integration environments, using IaC and CI/CD pipelines, before upgrading customer environments.
Finally, data stored in Amazon Elastic File System (Amazon EFS) for SageMaker Studio domains is kept after volumes are deleted for backup purposes.
Conclusion
In this post, we described how BMW Group in collaboration with AWS ProServe developed a fully managed AI platform service on AWS using SageMaker Studio and other AWS serverless and managed services.
With JuMa, BMW’s AI/ML teams are empowered to unlock new business value by accelerating experimentation as well as time-to-market for disruptive AI solutions. Furthermore, by migrating from its on-premises platform, BMW can reduce the overall operational efforts and costs while also increasing sustainability and the overall security posture.
To learn more about running your AI/ML experimentation and development workloads on AWS, visit Amazon SageMaker Studio.

About the Authors
Marc Neumann is the head of the central AI Platform at BMW Group. He is responsible for developing and implementing strategies to use AI technology for business value creation across the BMW Group. His primary goal is to ensure that the use of AI is sustainable and scalable, meaning it can be consistently applied across the organization to drive long-term growth and innovation. Through his leadership, Neumann aims to position the BMW Group as a leader in AI-driven innovation and value creation in the automotive industry and beyond.
Amor Steinberg is a Machine Learning Engineer at BMW Group and the service lead of Jupyter Managed, a new service that aims to provide a code-centric analytics and machine learning workbench for engineers and data scientists at the BMW Group. His past experience as a DevOps engineer at financial institutions gave him a unique understanding of the challenges facing banks in the European Union and of the balance they must keep between striving for technological innovation, complying with laws and regulations, and maximizing security for customers.
Marinus Krommenhoek is a Senior Cloud Solution Architect and a Software Developer at BMW Group. He is enthusiastic about modernizing the IT landscape with state-of-the-art services that add high value and are easy to maintain and operate. Marinus is a big advocate of microservices, serverless architectures, and agile working. He has a record of working with distributed teams across the globe within large enterprises.
Nicolas Jacob Baer is a Principal Cloud Application Architect at AWS ProServe with a strong focus on data engineering and machine learning, based in Switzerland. He works closely with enterprise customers to design data platforms and build advanced analytics and ML use cases.
Joaquin Rinaudo is a Principal Security Architect at AWS ProServe. He is passionate about building solutions that help developers improve their software quality. Prior to AWS, he worked across multiple domains in the security industry, from mobile security to cloud and compliance-related topics. In his free time, Joaquin enjoys spending time with family and reading science-fiction novels.
Shukhrat Khodjaev is a Senior Global Engagement Manager at AWS ProServe. He specializes in delivering impactful big data and AI/ML solutions that enable AWS customers to maximize their business value through data utilization.

Automating product description generation with Amazon Bedrock

In today’s ever-evolving world of ecommerce, the influence of a compelling product description cannot be overstated. It can be the decisive factor that turns a potential visitor into a paying customer or sends them clicking off to a competitor’s site. The manual creation of these descriptions across a vast array of products is a labor-intensive process, and it can slow down the velocity of new innovation. This is where Amazon Bedrock with its generative AI capabilities steps in to reshape the game. In this post, we dive into how Amazon Bedrock is transforming the product description generation process, empowering e-retailers to efficiently scale their businesses while conserving valuable time and resources.
Unlocking the power of generative AI in retail
Generative AI has captured the attention of boards and CEOs worldwide, prompting them to ask, “How can we leverage generative AI for our business?” One of the most promising applications of generative AI in ecommerce is using it to craft product descriptions. Retailers and brands have invested significant resources in testing and evaluating the most effective descriptions, and generative AI excels in this area.
Creating engaging and informative product descriptions for a vast catalog is a monumental task, especially for global ecommerce platforms. Manual translation and adaptation of product descriptions for each market consumes time and resources. This results in generic or incomplete descriptions, leading to reduced sales and customer satisfaction.
The power of Amazon Bedrock: AI-generated product descriptions
Amazon Bedrock is a fully managed service that simplifies generative AI development, offering high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API. It provides a comprehensive set of capabilities for building generative AI applications while ensuring privacy and security are maintained. With Amazon Bedrock, you can experiment with various FMs and customize them privately using techniques like fine-tuning and Retrieval Augmented Generation (RAG). The platform enables you to create managed agents for complex business tasks without the need for coding, such as booking travel, processing insurance claims, creating ad campaigns, and managing inventory.
For example, ecommerce platforms can initially generate basic product descriptions that include size, color, and price. However, Amazon Bedrock’s flexibility allows these descriptions to be fine-tuned to incorporate customer reviews, integrate brand-specific language, and highlight specific product features, resulting in tailored descriptions that resonate with the target audience. Moreover, Amazon Bedrock offers access to foundation models from Amazon and leading AI startups through an intuitive API, making the entire process seamless and efficient.
Using AI can have the following impact on the product description process:

Faster approvals – Vendors experience a streamlined process, moving from product listing to approval in under an hour, eliminating frustrating delays
Improved product listing velocity – When automated, your ecommerce marketplace sees a surge in product listings, offering consumers access to the latest merchandise nearly instantaneously
Future-proofing – By embracing cutting-edge AI, you secure your position as a forward-looking platform ready to meet evolving market demands
Innovation – This solution liberates teams from mundane tasks, allowing them to focus on higher-value work and fostering a culture of innovation

Solution overview
Before we dive into the technical details, let's take a high-level look at what this solution offers. The solution allows you to create and manage product descriptions for your ecommerce platform. It empowers your platform to:

Generate descriptions from text – With the power of generative AI, Amazon Bedrock can convert plain text descriptions into vivid, informative, and captivating product descriptions.
Craft images – Beyond text, it can also craft images that align perfectly with the product descriptions, enhancing the visual appeal of your listings.
Enhance existing content – Do you have existing product descriptions that need a fresh perspective? Amazon Bedrock can take your current content and make it even more compelling and engaging.

This solution is available in the AWS Solutions Library. We’ve provided detailed instructions in the accompanying README file. The README file contains all the information you need to get started, from requirements to deployment guidelines.
The system architecture comprises several core components:

UI portal – This is the user interface (UI) designed for vendors to upload product images.
Amazon Rekognition – Amazon Rekognition is an image analysis service that detects objects, text, and labels in images.
Amazon Bedrock – Foundation models in Amazon Bedrock use the labels detected by Amazon Rekognition to generate product descriptions.
AWS Lambda – AWS Lambda provides serverless compute for processing.
Product database – The central repository stores vendor products, images, labels, and generated descriptions. This could be any database of your choice. Note that in this solution, all of the storage is in the UI.
Admin portal – This portal provides oversight of the system and product listings, ensuring smooth operation. This is not part of the solution; we’ve added it for understanding.

The following diagram illustrates the flow of data and interactions within the system.

The workflow includes the following steps (a minimal code sketch follows the list):

The client initiates a request to the Amazon API Gateway REST API.
Amazon API Gateway passes the request to AWS Lambda through a proxy integration.
When operating on product image inputs, AWS Lambda calls Amazon Rekognition to detect objects in the image.
AWS Lambda calls LLMs hosted by Amazon Bedrock, such as the Amazon Titan language models, to generate product descriptions.
The response is passed back from AWS Lambda to Amazon API Gateway.
Finally, the HTTP response from Amazon API Gateway is returned to the client.
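For illustration, the following minimal Python sketch shows what steps 3 and 4 could look like inside the Lambda function. It is only a sketch: the event shape, the prompt wording, and the choice of the Amazon Titan Text Express model are assumptions, not part of the solution published in the AWS Solutions Library.

import base64
import json

import boto3

rekognition = boto3.client("rekognition")
bedrock = boto3.client("bedrock-runtime")

def handler(event, context):
    # Step 3: detect labels in the uploaded product image (assumed to arrive base64-encoded).
    image_bytes = base64.b64decode(event["image_base64"])
    labels = rekognition.detect_labels(Image={"Bytes": image_bytes}, MaxLabels=10)
    label_names = [label["Name"] for label in labels["Labels"]]

    # Step 4: ask an Amazon Titan text model on Amazon Bedrock to draft a description.
    prompt = (
        "Write an engaging ecommerce product description for a product "
        f"with these attributes: {', '.join(label_names)}."
    )
    response = bedrock.invoke_model(
        modelId="amazon.titan-text-express-v1",  # assumed model choice
        body=json.dumps({"inputText": prompt}),
    )
    description = json.loads(response["body"].read())["results"][0]["outputText"]

    # Steps 5 and 6: return the description through API Gateway to the client.
    return {"statusCode": 200, "body": json.dumps({"description": description})}

In practice, the Lambda execution role would also need permissions for rekognition:DetectLabels and bedrock:InvokeModel.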

Example use case
Imagine a vendor uploads a product image of shoes, and Amazon Rekognition identifies key attributes like “white shoes,” “sneaker,” and “durable.” The Amazon Bedrock Titan AI takes this information and generates a product description like, “Here is a draft product description for a canvas running shoe based on the product photo: Introducing the Canvas Runner, the perfect lightweight sneaker for your active lifestyle. This running shoe features a breathable canvas upper with leather accents for a stylish, classic look. The lace-up design provides a secure fit, while the padded tongue and collar add comfort. Inside, a removable cushioned insole supports and comforts your feet. The EVA midsole absorbs shock with each step, reducing fatigue. Flex grooves in the rubber outsole ensure flexibility and traction. With its simple, retro-inspired style, the Canvas Runner seamlessly transitions from workouts to everyday wear. Whether you’re running errands or running miles, this versatile sneaker will keep you moving in comfort and style.”
Design details
Let’s explore the components in more detail:

User interface:

Front end – The front end of the vendor portal allows vendors to upload product images and displays product listings.
API calls – The portal communicates with the backend through APIs to process images and generate descriptions.

Amazon Rekognition:

Image analysis – Triggered by API calls, Amazon Rekognition analyzes images and detects objects, text, and labels.
Label output – It outputs label data derived from the analysis.

Amazon Bedrock:

NLP text generation – Amazon Bedrock uses the Amazon Titan natural language processing (NLP) model to generate textual descriptions.
Label integration – It takes the labels detected by Amazon Rekognition as input to generate product descriptions.
Style matching – Amazon Bedrock provides fine-tuning capabilities for Amazon Titan models to ensure that the generated descriptions match the style of the platform.

AWS Lambda:

Processing – Lambda handles the API calls to services.

Product database:

Flexible database – The product database is chosen based on customer preferences and requirements. Note this is not provided as part of the solution.

Additional capabilities
This solution goes beyond just generating product descriptions. It offers two more incredible options:

Image and description generation from text – With the power of generative AI, Amazon Bedrock can take text descriptions and create corresponding images along with detailed product descriptions (see the sketch after this list). Consider the potential:

Instantly visualizing products from text.
Automating image creation for large catalogs.
Enhancing customer experience with rich visuals.
Reducing content creation time and costs.

Description enhancement – If you already have existing product descriptions, Amazon Bedrock can enhance them. Simply supply the text and the prompt, and Amazon Bedrock will skillfully enhance and enrich the content, rendering it highly captivating and engaging for your customers.
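As an illustration of the image generation capability described above, the following sketch invokes an image model on Amazon Bedrock from Python. The request and response schema shown here is the one used by the Stability AI SDXL model; this is an assumption for the sketch, and other image models (such as the Amazon Titan image model) use different body formats.

import base64
import json

import boto3

bedrock = boto3.client("bedrock-runtime")

def generate_product_image(prompt: str, out_path: str = "product.png") -> str:
    # Invoke the Stability AI SDXL model (assumed model choice) with a text prompt.
    response = bedrock.invoke_model(
        modelId="stability.stable-diffusion-xl-v1",
        body=json.dumps({
            "text_prompts": [{"text": prompt}],
            "cfg_scale": 7,
            "steps": 30,
        }),
    )
    payload = json.loads(response["body"].read())
    # The SDXL response returns generated images as base64-encoded artifacts.
    image_b64 = payload["artifacts"][0]["base64"]
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(image_b64))
    return out_path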

Conclusion
In the fiercely competitive world of ecommerce, staying at the forefront of innovation is imperative. Amazon Bedrock offers a transformative capability for e-retailers looking to enhance their product content, optimize their listing process, and drive sales. With the power of AI-generated product descriptions, businesses can create compelling, informative, and culturally relevant content that resonates deeply with customers. The future of ecommerce has arrived, and it’s driven by machine learning with Amazon Bedrock.
Are you ready to unlock the full potential of AI-powered product descriptions? Take the next step in revolutionizing your ecommerce platform. Visit the AWS Solutions Library and explore how Amazon Bedrock can transform your product descriptions, streamline your processes, and boost your sales. It’s time to supercharge your ecommerce with Amazon Bedrock!

About the Authors
Dhaval Shah is a Senior Solutions Architect at AWS, specializing in Machine Learning. With a strong focus on digital native businesses, he empowers customers to leverage AWS and drive their business growth. As an ML enthusiast, Dhaval is driven by his passion for creating impactful solutions that bring positive change. In his leisure time, he indulges in his love for travel and cherishes quality moments with his family.
Doug Tiffan is the Head of World Wide Solution Strategy for Fashion & Apparel at AWS. In his role, Doug works with Fashion & Apparel executives to understand their goals and align with them on the best solutions. Doug has over 30 years of experience in retail, holding several merchandising and technology leadership roles. Doug holds a BBA from Texas A&M University and is based in Houston, Texas.
Nikhil Sharma is a Solutions Architecture Leader at Amazon Web Services (AWS) where he and his team of Solutions Architects help AWS customers solve critical business challenges using AWS cloud technologies and services.
Kevin Bell is a Sr. Solutions Architect at AWS based in Seattle. He has been building things in the cloud for about 10 years. You can find him online as @bellkev on GitHub.
Nipun Chagari is a Principal Solutions Architect based in the Bay Area, CA. Nipun is passionate about helping customers adopt Serverless technology to modernize applications and achieve their business objectives. His recent focus has been on assisting organizations in adopting modern technologies to enable digital transformation. Apart from work, Nipun finds joy in playing volleyball, cooking and traveling with his family.
Marshall Bunch is a Solutions Architect at AWS helping North American customers design secure, scalable and cost-effective workloads in the cloud. His passion lies in solving age-old business problems where data and the newest technologies enable novel solutions. Beyond his professional pursuits, Marshall enjoys hiking and camping in Colorado’s beautiful Rocky Mountains.
Altaaf Dawoodjee is a Solutions Architect Leader that supports AdTech customers in the Digital Native Business (DNB) segment at Amazon Web Services (AWS). He has over 20 years of experience in technology and has deep expertise in analytics. He is passionate about helping drive successful business outcomes for his customers leveraging the AWS Cloud.
Scott Bell is a dynamic leader and innovator with 25+ years of technology management experience. He is passionate about leading and developing teams that provide technology to meet the challenges of global users and businesses. He has extensive experience leading technology teams that deliver global solutions supporting 35+ languages. He is also passionate about the way AI and generative AI are transforming businesses and addressing customers' currently unmet needs.
Sachin Shetti is a Principal Customer Solution Manager at AWS. He is passionate about helping enterprises succeed and realize significant benefits from cloud adoption, driving everything from basic migration to large-scale cloud transformation across people, processes, and technology. Prior to joining AWS, Sachin worked as a software developer for over 12 years and held multiple senior leadership positions leading technology delivery and transformation in healthcare, financial services, retail, and insurance. He has an Executive MBA and a Bachelor’s degree in Mechanical Engineering.

Optimizing costs for Amazon SageMaker Canvas with automatic shutdown o …

Amazon SageMaker Canvas is a rich, no-code Machine Learning (ML) and Generative AI workspace that has allowed customers all over the world to more easily adopt ML technologies to solve old and new challenges thanks to its visual, no-code interface. It does so by covering the ML workflow end to end: whether you're looking for powerful data preparation and AutoML, managed endpoint deployment, simplified MLOps capabilities, or ready-to-use models powered by AWS AI services and Generative AI, SageMaker Canvas can help you achieve your goals.
As companies of all sizes adopt SageMaker Canvas, customers asked for ways to optimize cost. As defined in the AWS Well-Architected Framework, a cost-optimized workload fully uses all resources, meets your functional requirements, and achieves an outcome at the lowest possible price point.
Today, we're introducing a new way to further optimize costs for SageMaker Canvas applications. SageMaker Canvas now collects Amazon CloudWatch metrics that provide insight into app usage and idleness. Customers can use this information to automatically shut down idle SageMaker Canvas applications and avoid incurring unintended costs.
In this post, we’ll show you how to automatically shut down idle SageMaker Canvas apps to control costs by using a simple serverless architecture. Templates used in this post are available in GitHub.
Understanding and tracking costs
Education is always the first step toward understanding and controlling costs for any workload, whether on premises or in the cloud. Let's start by reviewing the SageMaker Canvas pricing model. In a nutshell, SageMaker Canvas has a pay-as-you-go pricing model based on two dimensions:

Workspace instance – formerly known as session time, this is the cost associated with running the SageMaker Canvas app
AWS service charges – costs associated with training models, deploying endpoints, and generating inferences (the resources spun up by SageMaker Canvas)

Customers always have full control over the resources that are launched by SageMaker Canvas and can keep track of costs associated with the SageMaker Canvas app by using the AWS Billing and Cost Management service. For more information, refer to Manage billing and cost in SageMaker Canvas.
To limit the cost associated with the workspace instances, as a best practice you should log out rather than simply closing the browser tab. To log out, choose the Log out button on the left panel of the SageMaker Canvas app.

Automatically shutting down SageMaker Canvas applications
For IT administrators who are looking to provide automated controls for shutting down SageMaker Canvas applications and keeping costs under control, there are two approaches:

Shut down applications on a schedule (for example, every day at 19:00 or every Friday at 18:00)
Automatically shut down idle applications (for example, when the application hasn't been used for two hours)

Shut down applications on a schedule

Scheduled shutdown of SageMaker Canvas applications can be achieved with very little effort by using a cron expression (with an Amazon EventBridge cron rule) and a compute component (an AWS Lambda function) that calls the Amazon SageMaker DeleteApp API. This approach has been discussed in the Provision and manage ML environments with Amazon SageMaker Canvas using AWS CDK and AWS Service Catalog post, and implemented in the associated GitHub repository.
One of the advantages of the above architecture is that it is very simple to duplicate in order to achieve scheduled creation of the SageMaker Canvas app. By using a combination of scheduled creation and scheduled deletion, a cloud administrator can make sure that the SageMaker Canvas application is ready to be used whenever users start their business day (for example, 9 AM on a work day), and that the app also automatically shuts down at the end of the business day (for example, 7 PM on a work day, and always over the weekend). All that is needed is to change the line of code calling the DeleteApp API to call CreateApp instead, and to update the cron expression to reflect the desired app creation time.
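As an illustration, the Lambda function behind such a schedule can be as small as the following sketch. The domain ID, user profile name, and schedule shown here are placeholders; in the linked repository these values are provisioned for you.

import boto3

sagemaker = boto3.client("sagemaker")

# Invoked by an Amazon EventBridge cron rule, for example cron(0 19 ? * MON-FRI *).
def handler(event, context):
    # Placeholder identifiers -- replace with your own domain and user profile.
    sagemaker.delete_app(
        DomainId="d-xxxxxxxxxxxx",
        UserProfileName="my-user-profile",
        AppType="Canvas",
        AppName="default",
    )

Swapping the delete_app call for create_app (with the same identifiers) and adjusting the cron expression yields the scheduled-creation variant described above.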
While this approach is very easy to implement and test, a drawback of the suggested architecture is that it does not take into account whether an application is currently being used, shutting it down regardless of its activity status. Depending on the situation, this might cause friction with active users, who might suddenly see their session terminated.
You can retrieve the template associated with this architecture from the following GitHub repository:

Automatically shut down idle applications

Starting today, Amazon SageMaker Canvas emits CloudWatch metrics that provide insight into app usage and idleness. This allows an administrator to define a solution that reads the idleness metric, compares it against a threshold, and defines a specific logic for automatic shutdown. A more detailed overview of the idleness metric emitted by SageMaker Canvas is provided later in this post.
To achieve automatic shutdown of SageMaker Canvas applications based on the idleness metric, we provide an AWS CloudFormation template. This template consists of three main components (an illustrative alarm definition follows the list):

An Amazon CloudWatch Alarm, which runs a query to check the MAX value of the TimeSinceLastActive metric. If this value is greater than a threshold provided as input to the CloudFormation template, it triggers the rest of the automation. This query can be run on a single user profile, on a single domain, or across all domains. According to the level of control that you wish to have, you can use:

the all-domains-all-users template, which checks this across all users and all domains in the region where the template is deployed
the one-domain-all-users template, which checks this across all users in one domain in the region where the template is deployed
the one-domain-one-user template, which checks this for one user profile, in one domain, in the region where the template is deployed

The alarm state change creates an event on the default event bus in Amazon EventBridge, which has an Amazon EventBridge rule set up to trigger an AWS Lambda function.
The AWS Lambda function identifies which SageMaker Canvas app has been idle for more than the specified threshold, and deletes it with the DeleteApp API.
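To make the first component concrete, the following boto3 sketch shows one way such an alarm could be defined for the all-domains-all-users case. It is illustrative only; the CloudFormation templates in the repository are the authoritative definitions, and the alarm name, threshold (2 hours), and period (20 minutes) shown here simply mirror the defaults discussed later in this post.

import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="canvas-idle-apps",  # placeholder name
    EvaluationPeriods=1,
    Threshold=7200,  # 2 hours of idleness, in seconds
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
    Metrics=[
        {
            "Id": "q1",
            # Metrics Insights query: the maximum idle time across all Canvas apps.
            "Expression": 'SELECT MAX(TimeSinceLastActive) FROM "/aws/sagemaker/Canvas/AppActivity"',
            "Period": 1200,
            "ReturnData": True,
        }
    ],
)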

You can retrieve the AWS CloudFormation templates associated with this architecture from the following GitHub repository:

How the SageMaker Canvas idleness metric works
SageMaker Canvas emits a TimeSinceLastActive metric in the /aws/sagemaker/Canvas/AppActivity namespace, which shows the number of seconds that the app has been idle with no user activity. We can use this new metric to trigger an automatic shutdown of the SageMaker Canvas app when it has been idle for a defined period. SageMaker Canvas exposes the TimeSinceLastActive with the following schema:
{
  "Namespace": "/aws/sagemaker/Canvas/AppActivity",
  "Dimensions": [
    [
      "DomainId",
      "UserProfileName"
    ]
  ],
  "Metrics": [
    {
      "Name": "TimeSinceLastActive",
      "Unit": "Seconds",
      "Value": 12345
    }
  ]
}
The key components of this metric are as follows:

Dimensions, in particular DomainId and UserProfileName, which allow an administrator to pinpoint which applications are idle across all domains and users
Value of the metric, which indicates the number of seconds since the last activity in the SageMaker Canvas application. SageMaker Canvas considers the following as activity:

Any action taken in the SageMaker Canvas application (clicking a button, transforming a dataset, generating an in-app inference, deploying a model)
Using a ready-to-use model or interacting with the Generative AI models using the chat interface
A batch inference scheduled to run at a specific time; for more information, refer to Manage automations

This metric can be read via the Amazon CloudWatch API, such as GetMetricData. For example, using the AWS SDK for Python (Boto3):
import boto3, datetime

cw = boto3.client("cloudwatch")

metric_data_results = cw.get_metric_data(
    MetricDataQueries=[
        {
            "Id": "q1",
            "Expression": 'SELECT MAX(TimeSinceLastActive) FROM "/aws/sagemaker/Canvas/AppActivity" GROUP BY DomainId, UserProfileName',
            "Period": 900,
        }
    ],
    StartTime=datetime.datetime(2023, 1, 1),
    EndTime=datetime.datetime.now(),
    ScanBy="TimestampAscending",
)
The Python query extracts the MAX value of TimeSinceLastActive from the namespace associated with SageMaker Canvas after grouping these values by DomainId and UserProfileName.
Deploying and testing the auto-shutdown solution
To deploy the auto-shutdown stack, do the following:

Download the AWS CloudFormation template for the solution you want to implement from the GitHub repository referenced above. Choose whether you want to implement a solution for all SageMaker domains, for a single SageMaker domain, or for a single user.
Update template parameters:

The idle timeout – the time (in seconds) that the SageMaker Canvas app is allowed to stay idle before it gets shut down; the default value is 2 hours
The alarm period – the aggregation time (in seconds) used by the CloudWatch alarm to evaluate the idle timeout; the default value is 20 minutes
(optional) SageMaker Domain ID and user profile name

Deploy the CloudFormation stack to create the resources

Once deployed (which should take less than two minutes), the AWS Lambda function and Amazon CloudWatch alarm are configured to automatically shut down the Canvas app when idle. To test the auto-shutdown solution, do the following (a verification snippet follows the list):

Make sure that the SageMaker Canvas app is running within the right domain and with the right user profile (if you have configured them).
Stop using the SageMaker Canvas app and wait for the idle timeout period (by default, 2 hours)
Verify that the app is stopped after being idle for the threshold time by confirming that the CloudWatch alarm was triggered and, after the automation ran, returned to the OK state.
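If you prefer to verify this programmatically rather than in the console, a quick check with Boto3 might look like the following. The alarm name and domain ID are placeholders; use the names created by your stack and your own domain.

import boto3

cloudwatch = boto3.client("cloudwatch")
sagemaker = boto3.client("sagemaker")

# Check the alarm state (placeholder alarm name).
alarms = cloudwatch.describe_alarms(AlarmNames=["canvas-idle-apps"])
print(alarms["MetricAlarms"][0]["StateValue"])  # back to OK once the automation has run

# Confirm that no Canvas app is left running in the domain (placeholder domain ID).
apps = sagemaker.list_apps(DomainIdEquals="d-xxxxxxxxxxxx")
canvas_apps = [
    app for app in apps["Apps"]
    if app["AppType"] == "Canvas" and app["Status"] != "Deleted"
]
print(canvas_apps)  # expected to be empty after the idle app was shut down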

In our test, we have set the idle timeout period to two hours (7200 seconds). In the following graph plotted by Amazon CloudWatch Metrics, you can see that the SageMaker Canvas app has been emitting the TimeSinceLastActive metric until the threshold was met (1), which triggered the alarm. Once the alarm was triggered, the AWS Lambda function was executed, which deleted the app and brought the metric back below the threshold (2).

Conclusion
In this post, we implemented an automated shutdown solution for idle SageMaker Canvas apps using AWS Lambda, a CloudWatch alarm, and the newly emitted idleness metric from SageMaker Canvas. Thanks to this solution, customers can not only optimize costs for their ML workloads but also avoid unintended charges for applications that they forgot were running in their SageMaker domain.
We’re looking forward to seeing what new use cases and workloads customers can solve with the peace of mind brought by this solution. For more examples of how SageMaker Canvas can help you achieve your business goals, refer to the following posts:

Predict customer churn with no-code machine learning using Amazon SageMaker Canvas
Publish predictive dashboards in Amazon QuickSight using ML predictions from Amazon SageMaker Canvas
Democratize computer vision defect detection for manufacturing quality using no-code machine learning with Amazon SageMaker Canvas
Use no-code machine learning to derive insights from product reviews using Amazon SageMaker Canvas sentiment analysis and text analysis models
Empower your business users to extract insights from company documents using Amazon SageMaker Canvas Generative AI

To learn how you can run production-level workloads with Amazon SageMaker Canvas, refer to the following posts:

Build, Share, Deploy: how business analysts and data scientists achieve faster time-to-market using no-code ML and Amazon SageMaker Canvas
Retrain ML models and automate batch predictions in Amazon SageMaker Canvas using updated datasets
Operationalize ML models built in Amazon SageMaker Canvas to production using the Amazon SageMaker Model Registry
Provision and manage ML environments with Amazon SageMaker Canvas using AWS CDK and AWS Service Catalog

About the authors
Davide Gallitelli is a Senior Specialist Solutions Architect for AI/ML. He is based in Brussels and works closely with customers all around the globe that are looking to adopt Low-Code/No-Code Machine Learning technologies, and Generative AI. He has been a developer since he was very young, starting to code at the age of 7. He started learning AI/ML at university, and has fallen in love with it since then.
Huong Nguyen is a Sr. Product Manager at AWS. She is leading the data ecosystem integration for SageMaker, with 14 years of experience building customer-centric and data-driven products for both enterprise and consumer spaces.
Gunjan Garg is a Principal Engineer on the Amazon SageMaker team at AWS, providing technical leadership for the product. She has worked in several roles in the AI/ML org for the last 5 years and is currently focused on Amazon SageMaker Canvas.
Ziyao Huang is a Software Development Engineer with Amazon SageMaker Data Wrangler. He is passionate about building great products that make ML easy for customers. Outside of work, Ziyao likes to read and hang out with his friends.

Decoding Complex AI Models: Purdue Researchers Transform Deep Learning Predictions into Topological Maps

The highly parameterized nature of complex prediction models makes describing and interpreting their prediction strategies difficult. Researchers have introduced a novel approach using topological data analysis (TDA) to solve the issue. These models, including machine learning, neural networks, and AI models, have become standard tools in various scientific fields but are often difficult to interpret due to their extensive parameterization.

The researchers from Purdue University recognized the need for a tool that could transform these intricate models into a more understandable format. They leveraged TDA to construct Reeb networks, providing a topological view that facilitates the interpretation of prediction strategies. The method was applied to various domains, showcasing its scalability across large datasets.

The proposed Reeb networks are essentially discretizations of topological structures, allowing for the visualization of prediction landscapes. Each node in the Reeb network represents a local simplification of the prediction space, computed as a cluster of data points with similar predictions. Nodes are connected based on shared data points, revealing informative relationships between predictions and training data.

One significant application of this approach is in detecting labeling errors in training data. The Reeb networks proved effective in identifying ambiguous regions or prediction boundaries, guiding further investigation into potential errors. The method also demonstrated utility in understanding generalization in image classification and inspecting predictions related to pathogenic mutations in the BRCA1 gene.

Comparisons were drawn with widely used visualization techniques such as tSNE and UMAP, highlighting the Reeb networks’ ability to provide more information about the boundaries between predictions and relationships between training data and predictions.

The construction of Reeb networks involves prerequisites such as a large set of data points with unknown labels, known relationships among data points, and a real-valued guide to each predicted value. The researchers employed a recursive splitting and merging procedure called GTDA (graph-based TDA) to build the Reeb net from the original data points and graph. The method is scalable, as demonstrated by its analysis of 1.3 million images in ImageNet.

In practical applications, the Reeb network framework was applied to a graph neural network predicting product types on Amazon based on reviews. It revealed key ambiguities in product categories, emphasizing the limitations of prediction accuracy and suggesting the need for label improvements. Similar insights were gained when applying the framework to a pretrained ResNet50 model on the Imagenet dataset, providing a visual taxonomy of images and uncovering ground truth labeling errors.

The researchers also showcased the application of Reeb networks in understanding predictions related to malignant gene mutations, particularly in the BRCA1 gene. The networks highlighted localized components in the DNA sequence and their mapping to secondary structures, aiding interpretation.

In conclusion, the researchers anticipate that topological inspection techniques, such as Reeb networks, will play a crucial role in translating complex prediction models into actionable human-level insights. The method’s ability to identify issues from labeling errors to protein structure suggests its broad applicability and potential as an early diagnostic tool for prediction models.

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.


Researchers from Microsoft Research and Tsinghua University Proposed Skeleton-of-Thought (SoT): A New Artificial Intelligence Approach to Accelerate Generation of LLMs

Large Language Models (LLMs), such as GPT-4 and LLaMA, have undoubtedly transformed the technological landscape. However, sluggish processing speed is a recurring challenge limiting their widespread applicability. Despite their remarkable capabilities, the time it takes to obtain responses from LLMs hinders their effectiveness, particularly in latency-critical applications like chatbots, copilots, and industrial controllers. Recognizing the need for a solution that addresses this fundamental problem, Microsoft Research and Tsinghua University researchers have introduced an innovative approach named Skeleton-of-Thought (SoT).

Traditionally, efforts to enhance LLMs’ speed have involved intricate modifications to the models, systems, or hardware. However, the research team takes a different route with SoT. Unlike conventional methods, SoT refrains from making extensive changes to LLMs and treats them as black boxes instead. The focus shifts from altering the internal workings of the models to optimizing the organization of their output content. The proposed solution prompts LLMs to follow a unique two-stage process. In the first stage, the LLM is directed to derive a skeleton of the answer. Subsequently, in the second stage, the LLM is tasked with the parallel expansion of multiple points within the skeleton. This approach introduces a novel means of boosting LLM response times without requiring complex adjustments to the model architecture.

The methodology of SoT involves breaking down the content generation process into two distinctive stages. Firstly, the LLM is prompted to construct a skeleton of the answer. This initial step aligns with how humans often approach problem-solving by outlining a high-level structure. The second stage leverages this skeleton to execute parallel expansion, enabling the LLM to address multiple points simultaneously. Remarkably, this approach is applicable to open-source models like LLaMA and API-based models such as GPT-4, showcasing its versatility.
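To make the two-stage idea concrete, here is a minimal, model-agnostic Python sketch. The prompt wording, the call_llm placeholder, and the point parsing are illustrative assumptions, not the paper's exact prompts.

import re
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt: str) -> str:
    """Placeholder for a call to any LLM API (open-source or API-based)."""
    raise NotImplementedError

def skeleton_of_thought(question: str, max_workers: int = 8) -> str:
    # Stage 1: ask the model for a short skeleton of the answer (a numbered list of points).
    skeleton = call_llm(
        f"Question: {question}\n"
        "Give only a concise skeleton of the answer as 3-10 numbered points, a few words each."
    )
    points = [p.strip() for p in re.findall(r"^\s*\d+\.\s*(.+)$", skeleton, re.M)]

    # Stage 2: expand every skeleton point in parallel, then stitch the answer together.
    def expand(point: str) -> str:
        return call_llm(
            f"Question: {question}\nSkeleton: {skeleton}\n"
            f"Expand only this point in 1-2 sentences: {point}"
        )

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        expansions = list(pool.map(expand, points))
    return "\n".join(f"{i + 1}. {p} {e}" for i, (p, e) in enumerate(zip(points, expansions)))

For API-based models the parallel expansions are issued as concurrent requests, which is where the end-to-end latency reduction comes from.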

To evaluate the effectiveness of SoT, the research team conducted extensive tests on 12 recently released models, spanning both open-source and API-based categories. The team observed substantial speed-ups by utilizing the Vicuna-80 dataset, which includes questions from various domains like coding, math, writing, and roleplay. SoT achieved speed-ups ranging from 1.13x to 2.39x on eight of the 12 models. Crucially, these speed-ups were attained without sacrificing answer quality. The team used metrics from FastChat and LLMZoo to assess the quality of SoT's answers, showcasing its ability to maintain or improve response quality across diverse question categories.

In conclusion, SoT emerges as a promising solution to the persistent challenge of slow LLMs. The research team’s innovative approach of treating LLMs as black boxes and focusing on data-level efficiency optimization provides a fresh perspective on accelerating content generation. By prompting LLMs to construct a skeleton of the answer and then executing parallel expansion, SoT introduces an effective means of improving response times. The results from the evaluation demonstrate not only considerable speed-ups but also the ability to maintain or enhance answer quality, addressing the dual challenges of efficiency and effectiveness. This work opens up avenues for future exploration in dynamic thinking processes for artificial intelligence, encouraging a shift towards more efficient and versatile language models.

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.


NVIDIA AI Researchers Propose Tied-LoRA: A Novel Artificial Intelligence Approach that Aims to Improve the Parameter Efficiency of the Low-rank Adaptation (LoRA) Methods

A group of researchers from NVIDIA has developed a new technique called Tied-LoRA, which aims to improve the parameter efficiency of the Low-rank Adaptation (LoRA) method. The approach uses weight tying and selective training to find the optimal balance between performance and the number of trainable parameters. The researchers conducted experiments on different tasks and base language models and found that there are trade-offs between efficiency and performance.

Recent advances in parameter-efficient fine-tuning techniques include LoRA, which reduces trainable parameters through low-rank matrix approximations. AdaLoRA is an extension of LoRA that introduces dynamic rank adjustment and combines adapter tuning with LoRA. Another technique is VeRA, proposed by Kopiczko, which reduces parameters through frozen matrices and trainable scaling vectors. QLoRA uses quantized base models to achieve memory-efficient LoRA. This study applies weight tying to low-rank weight matrices, further enhancing parameter efficiency.

In addressing the computational expense of fine-tuning LLMs for downstream tasks, Tied-LoRA is a novel approach that combines weight tying and selective training to enhance the parameter efficiency of LoRA. It explores different parameter training/freezing and weight-tying combinations through systematic experiments on diverse tasks and base language models. The researchers identify a specific Tied-LoRA configuration that achieves comparable performance while utilizing only 13% of the parameters compared to the standard LoRA method.

Tied-LoRA is a method that enhances the parameter efficiency of the LoRA approach by combining weight tying and selective training. It involves applying weight tying to the low-rank matrices in LoRA, sharing the same low-rank matrices across layers of the base language model, thereby reducing the number of trainable parameters. It explores various combinations of parameter training/freezing and weight tying to achieve an optimal balance between performance and trainable parameters. The proposed Tied-LoRA configurations are evaluated on diverse tasks, demonstrating efficiency across data settings, including translation and mathematical reasoning.
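The following PyTorch sketch illustrates only the core weight-tying idea: a single low-rank pair A and B shared by every adapted layer, with small per-layer scaling vectors left trainable. It is a simplified illustration under those assumptions, not the authors' exact Tied-LoRA configurations.

import torch
import torch.nn as nn

class TiedLoRALinear(nn.Module):
    """Frozen linear layer plus a low-rank update whose A/B matrices are shared (tied)
    across layers; only small per-layer scaling vectors stay layer-specific."""

    def __init__(self, base: nn.Linear, shared_A: nn.Parameter, shared_B: nn.Parameter):
        super().__init__()
        self.base = base.requires_grad_(False)   # frozen pretrained weights
        self.A = shared_A                        # (r, in_features), tied across layers
        self.B = shared_B                        # (out_features, r), tied across layers
        r = shared_A.shape[0]
        self.u = nn.Parameter(torch.ones(r))                   # per-layer scaling
        self.v = nn.Parameter(torch.zeros(base.out_features))  # per-layer scaling (zero init)

    def forward(self, x):
        delta = (x @ self.A.t()) * self.u       # project down with the tied A
        delta = (delta @ self.B.t()) * self.v   # project up with the tied B, scale per layer
        return self.base(x) + delta

# One tied low-rank pair serves every adapted layer in the model.
in_f, out_f, r = 1024, 1024, 8
shared_A = nn.Parameter(torch.randn(r, in_f) * 0.01)
shared_B = nn.Parameter(torch.randn(out_f, r) * 0.01)
layers = [TiedLoRALinear(nn.Linear(in_f, out_f), shared_A, shared_B) for _ in range(4)]

Freezing or training different subsets of A, B, u, and v would correspond to the different Tied-LoRA configurations (such as vBuA) explored in the paper.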

In experiments across diverse tasks and two base language models, different Tied-LoRA configurations demonstrated trade-offs between efficiency and performance. A specific Tied-LoRA configuration, vBuA, outperformed others, achieving comparable performance. vBuA was identified as the optimal option, maintaining performance while reducing parameters by 87%. Evaluations on tasks like extractive question answering, summarization, and mathematical reasoning showcased Tied-LoRA’s ability to enhance parameter efficiency while preserving competitive performance significantly.

After conducting experiments across various tasks, it has been found that Tied-LoRA is a paradigm that enhances the parameter efficiency of the LoRA method by utilizing weight tying and selective training. The results suggest that Tied-LoRA performs competitively on tasks such as commonsense NLI, extractive QA, and summarization. Moreover, it offers improved parameter efficiency without compromising performance, utilizing only 13% of the parameters of standard LoRA. However, discussing limitations and comparisons with other parameter-efficiency methods is important to identify potential areas for future exploration.

Check out the Paper. All credit for this research goes to the researchers of this project.


Microsoft Research Introduces Florence-2: A Novel Vision Foundation Model with a Unified Prompt-based Representation for a Variety of Computer Vision and Vision-Language Tasks

There has been a noticeable trend in Artificial General Intelligence (AGI) systems toward using pre-trained, adaptable representations, which provide task-agnostic advantages in various applications. Natural language processing (NLP) is a good example of this tendency since sophisticated models demonstrate flexibility with thorough knowledge covering several domains and tasks with straightforward instructions. The popularity of NLP encourages a complementary strategy in computer vision. Unique obstacles arise from the necessity for broad perceptual capacities in universal representation for various vision-related activities. Whereas natural language processing (NLP) focuses mostly on text, computer vision has to handle complex visual data such as characteristics, masked contours, and object placement. In computer vision, achieving universal representation necessitates skillful handling of various challenging tasks arranged in two dimensions, as shown in Figure 1. 

Figure 1

Spatial Hierarchy: The model has to recognize spatial information at different sizes, comprehending fine-grained pixel details and image-level ideas. To support the complex spatial hierarchy in vision, the model must be capable of managing a range of granularities.

Semantic Granularity: In computer vision, universal representation should cover a range of semantic granularities. The paradigm moves from abstract titles to more detailed explanations, providing flexible comprehension for various uses. 

This pursuit is characterized by distinctiveness and substantial challenges. A key hurdle is the need for more comprehensive annotated data, hindering the development of a foundational model capable of capturing the intricate nuances of spatial hierarchy and semantic granularity. Existing datasets, such as ImageNet, COCO, and Flickr30k Entities, tailored for specialized applications, are extensively labeled by humans. To overcome this constraint, it is imperative to generate extensive annotations for each image on a larger scale. Another challenge is the absence of a unified architecture that seamlessly integrates spatial hierarchy and semantic granularity in computer vision. With task-specific design, traditional models perform well in tasks like semantic segmentation, object identification, and picture captioning. However, creating a complete, cohesive model that can adjust to different vision tasks in a task-independent way is crucial, even taking on new duties with little to no task-specific fine-tuning.

Through unified pre-training and network design, the model pioneers the integration of spatial, temporal, and multi-modal features in computer vision. The first evolutionary iteration excels in transfer learning through task-specific fine-tuning using customized adapters and pre-training with noisy text-image pairings. However, its reliance on big task-specific datasets and adapters results in gaps when it comes to tackling the two major issues mentioned above. In this work, researchers from Azure provide a universal backbone that is attained using multitask learning with rich visual annotations. This leads to a prompt-based, unified representation for various vision tasks, which successfully tackles the issues of incomplete comprehensive data and lack of a uniform architecture.

Large-scale, high-quality annotated data is necessary for multitask learning. Rather than depending on time-consuming human annotation, their data engine creates an extensive visual dataset named FLD, which has 5.4B annotations for 126M photos. There are two effective processing modules in this engine. The first module departs from the conventional single and manual annotation strategy by using specialized models to annotate photos jointly and autonomously. Similar to the wisdom-of-crowds theory, many models collaborate to create a consensus, resulting in a more impartial and trustworthy picture interpretation. Using models that have already been learned, the second module iteratively refines and filters these automatic annotations.

Their model uses a sequence-to-sequence (seq2seq) architecture, integrating an image encoder and a multi-modality encoder-decoder by leveraging this large dataset. This architecture supports a range of vision tasks without requiring task-specific architectural adjustments, in line with the NLP community's goal of flexible model creation with a uniform foundation. Every annotation in the dataset is consistently standardized into textual outputs. This enables the consistent optimization of a single multitask learning strategy using the same loss function as the goal. The result is a flexible vision foundation model that can handle a range of tasks, including object recognition, captioning, and grounding, all under the control of a single model with standardized parameters. Textual prompts are utilized to activate tasks, consistent with the methodology employed by large language models (LLMs).

Their method achieves a universal representation and has wide-ranging use in many visual tasks. Key findings consist of:

The model is a flexible vision foundation model that provides new state-of-the-art zero-shot performance in tasks including referring expression comprehension on RefCOCO, visual grounding on Flickr30k, and captioning on COCO.

Notwithstanding its small size, it competes with more specialized models after being fine-tuned using publicly available human-annotated data. Most notably, the improved model sets new benchmark state-of-the-art scores on RefCOCO.

The pre-trained backbone outperforms supervised and self-supervised models on downstream tasks: COCO object detection and instance segmentation, and ADE20K semantic segmentation. Their model, which uses the Mask R-CNN, DINO, and UperNet frameworks, delivers significant increases of 6.9, 5.5, and 5.9 points on the COCO and ADE20K datasets, respectively, and quadruples the training efficiency of models pre-trained on ImageNet.

Check out the Paper. All credit for this research goes to the researchers of this project.


University of Pennsylvania Researchers have Developed a Machine Learning Framework for Gauging the Efficacy of Vision-Based AI Features by Conducting a Battery of Tests on OpenAI's ChatGPT-Vision

The GPT-Vision model has caught everyone’s attention. People are excited about its ability to understand and generate content related to text and images. However, there’s a challenge – we don’t know precisely what GPT-Vision is good at and where it falls short. This lack of understanding can be risky, primarily if the model is used in critical areas where mistakes could have serious consequences.

Traditionally, researchers evaluate AI models like GPT-Vision by collecting extensive data and using automatic metrics for measurement. However, the researchers introduce an alternative approach: an example-driven analysis. Instead of analyzing vast amounts of data, the focus shifts to a small number of specific examples. This approach is considered scientifically rigorous and has proven effective in other fields.

To address the challenge of comprehending GPT-Vision’s capabilities, a team of researchers from the University of Pennsylvania has proposed a formalized AI method inspired by social science and human-computer interaction. This machine learning-based method provides a structured framework for evaluating the model’s performance, emphasizing a deep understanding of its real-world functionality.

The suggested evaluation method involves five stages: data collection, data review, theme exploration, theme development, and theme application. Drawing from grounded theory and thematic analysis, established techniques in social science, this method is designed to offer profound insights even with a relatively small sample size.

To illustrate the effectiveness of this evaluation process, the researchers applied it to a specific task: generating alt text for scientific figures. Alt text is crucial for conveying image content to individuals with visual impairments. The analysis reveals that while GPT-Vision displays impressive capabilities, it tends to rely too heavily on textual information, is sensitive to prompt wording, and struggles with understanding spatial relationships.

In conclusion, the researchers emphasize that this example-driven qualitative analysis not only identifies limitations in GPT-Vision but also showcases a thoughtful approach to understanding and evaluating new AI models. The goal is to prevent potential misuse of these models, particularly in situations where errors could have severe consequences.