Decoding AI Cognition: Unveiling the Color Perception of Large Language Models through Cognitive Psychology Methods

Researchers are pushing the boundaries of what machines can comprehend and replicate of human cognitive processes. A groundbreaking study unveils an approach to peering into the minds of Large Language Models (LLMs), particularly focusing on GPT-4’s understanding of color. This research signifies a shift from traditional neural network analysis towards methodologies inspired by cognitive psychology, offering fresh insights into how AI systems conceptualize and process information.

The challenge of interpreting AI models lies in their complexity and the opaque nature of their internal workings. Previous techniques often involve dissecting the activation patterns of artificial neurons, an approach that increasingly falls short as models grow in sophistication. The study introduces an ingenious methodology borrowed from cognitive psychology’s playbook. The team posits that, just as human mental representations can be inferred from behavior, so can those of AI systems through their responses to specific probes.

The researchers from Princeton University and the University of Warwick employed direct sampling and Markov Chain Monte Carlo (MCMC) methods to interrogate GPT-4’s mental representations, focusing on color perception. This methodological choice is pivotal, offering a more nuanced and efficient way to understand AI’s thoughts. By simulating scenarios where GPT-4 is presented with choices or tasks related to color, the study aims to map out the AI’s conceptualization of color space, akin to how one might study human cognition.

What sets this study apart is its detailed methodology, which comprises direct prompting, direct sampling, MCMC, and Gibbs sampling to probe GPT-4’s perception of color. This multifaceted approach reflects a significant methodological leap. For instance, direct prompting involves asking GPT-4 to generate HSL (Hue, Saturation, Lightness) color codes for given objects, while direct sampling evaluates GPT-4’s binary responses to randomly selected colors. Meanwhile, adaptive methods like MCMC and Gibbs sampling iteratively refine the AI’s responses, allowing for a dynamic and nuanced exploration of its color representations.
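The paper’s exact prompts and sampler settings are not reproduced here, but the general shape of such a behavioral probe can be sketched as follows. In this rough, assumption-laden Python sketch, a hypothetical ask_llm_prefers() helper stands in for an API call that asks the model to choose between two HSL colors for an object, and its binary choices drive a simple MCMC-style random walk over color space:

import random

def propose(hsl, step=15.0):
    # Random-walk proposal: perturb hue, saturation, and lightness slightly.
    h, s, light = hsl
    return ((h + random.uniform(-step, step)) % 360,
            min(100.0, max(0.0, s + random.uniform(-step, step))),
            min(100.0, max(0.0, light + random.uniform(-step, step))))

def ask_llm_prefers(obj, candidate, current):
    # Hypothetical helper: prompt the model to pick which HSL color better
    # matches `obj`, returning True if it picks the candidate.
    raise NotImplementedError("wrap your LLM API call here")

def mcmc_color_probe(obj, n_samples=200, start=(0.0, 50.0, 50.0)):
    # Treat the model's binary choices as the acceptance rule and collect the
    # chain of visited colors as an estimate of its representation of `obj`.
    current, samples = start, []
    for _ in range(n_samples):
        candidate = propose(current)
        if ask_llm_prefers(obj, candidate, current):
            current = candidate
        samples.append(current)
    return samples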

By applying these behavioral methods, the researchers uncovered that adaptive techniques, namely MCMC and Gibbs sampling, were particularly effective in mirroring human-like color representations within GPT-4. This alignment between AI’s and humans’ conceptualizations of color underscores the potential of these methods to probe and understand the internal representations of LLMs accurately.

The implications of this research extend far beyond the specific realm of color perception:

It marks a paradigm shift in AI research, moving from static, neuron-focused analyses toward dynamic, behaviorally informed methodologies. This transition opens up new avenues for exploring the cognitive capabilities of AI systems in a manner that more closely resembles human psychological research.

The success of adaptive sampling methods in this study paves the way for their application in other domains of AI research, suggesting a broad utility for these techniques in uncovering the intricacies of AI cognition.

The study lays the groundwork for future research to demystify AI systems’ thought processes by demonstrating the feasibility and effectiveness of applying cognitive psychology methods to AI. This approach could lead to more interpretable and human-like AI models, bridging the gap between human cognition and artificial intelligence.


Google DeepMind Unveils MusicRL: A Pretrained Autoregressive MusicLM Model of Discrete Audio Tokens Finetuned with Reinforcement Learning to Maximise Sequence-Level Rewards

In the fascinating world of artificial intelligence and music, a team at Google DeepMind has made a groundbreaking stride. Their creation, MusicRL, is a beacon in the journey of music generation, leveraging the nuances of human feedback to shape the future of how machines understand and create music. This innovation stems from a simple yet profound realization: music, at its core, is a deeply personal and subjective experience. Traditional models, while technically proficient, often fall short of capturing the essence that makes music resonate on a personal level. MusicRL challenges this status quo by not merely generating music but sculpting it according to the listener’s preferences.

The brilliance of MusicRL lies in its methodology, a sophisticated dance between technology and human emotion. At its foundation is MusicLM, an autoregressive model that serves as the canvas for MusicRL’s creativity. The model then undergoes a process akin to learning from the collective wisdom of its audience, employing reinforcement learning to refine its outputs. This isn’t just algorithmic training; it’s a dialogue between creator and consumer, where each note and harmony is shaped by human touch. The system was exposed to a dataset of 300,000 pairwise preferences, a testament to its commitment to understanding the vast landscape of human musical taste.
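MusicRL’s training code is not shown in this post, and the details below are assumptions, but the core ingredient of learning from pairwise preferences can be illustrated with a small, generic PyTorch sketch. A hypothetical audio encoder produces clip embeddings, and a reward model is trained with a Bradley-Terry-style loss so that preferred clips score higher; such a reward would then drive reinforcement learning fine-tuning of the generator:

import torch
import torch.nn as nn

class RewardModel(nn.Module):
    # Scores a clip embedding; trained so preferred clips score higher.
    def __init__(self, dim=512):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, emb):
        return self.score(emb).squeeze(-1)

def preference_loss(reward_model, emb_preferred, emb_rejected):
    # Bradley-Terry-style objective: maximize P(preferred clip > rejected clip).
    return -torch.nn.functional.logsigmoid(
        reward_model(emb_preferred) - reward_model(emb_rejected)).mean()

reward_model = RewardModel()
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-4)
# `pairs` would yield (embedding_of_preferred, embedding_of_rejected) batches
# produced by a hypothetical audio encoder applied to the preference dataset:
# for emb_pos, emb_neg in pairs:
#     loss = preference_loss(reward_model, emb_pos, emb_neg)
#     optimizer.zero_grad(); loss.backward(); optimizer.step()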

The results of this endeavor are nothing short of remarkable. MusicRL doesn’t just perform; it enchants, offering a listening experience that users prefer over the baseline models in extensive evaluations. The numbers speak volumes, with MusicRL’s versions consistently outshining their predecessors in head-to-head comparisons. This isn’t merely a win in technical excellence but a victory in capturing the elusive spark that ignites human emotion through music. The dual versions, MusicRL-R and MusicRL-U, each fine-tuned with different facets of human feedback, showcase the model’s versatility in adapting to and reflecting the diversity of human preferences.

What sets MusicRL apart is not just its technical prowess but its philosophical underpinning—the recognition of music as an expression of the human experience. This approach has opened new doors in AI-generated music beyond replicating sound to creating emotionally resonant and personally tailored musical experiences. The implications are vast, from personalized music creation to new forms of interactive musical experiences, heralding a future where AI and human creativity harmonize in unprecedented ways.

MusicRL is more than a technological achievement; it’s a step towards a new understanding of how we interact with and appreciate music. It challenges us to rethink the role of AI in creative processes, inviting a future where technology not only replicates but enriches the human experience. As we stand on the brink of this new era, MusicRL serves as a beacon, illuminating the path toward a world where music is not just heard but felt, deeply and personally, across the spectrum of human emotion.


How BigBasket improved AI-enabled checkout at their physical stores using Amazon SageMaker

This post is co-written with Santosh Waddi and Nanda Kishore Thatikonda from BigBasket.
BigBasket is India’s largest online food and grocery store. They operate in multiple ecommerce channels such as quick commerce, slotted delivery, and daily subscriptions. You can also buy from their physical stores and vending machines. They offer a large assortment of over 50,000 products across 1,000 brands, and are operating in more than 500 cities and towns. BigBasket serves over 10 million customers.
In this post, we discuss how BigBasket used Amazon SageMaker to train their computer vision model for Fast-Moving Consumer Goods (FMCG) product identification, which helped them reduce training time by approximately 50% and save costs by 20%.
Customer challenges
Today, most supermarkets and physical stores in India provide manual checkout at the checkout counter. This has two issues:

It requires additional manpower, weight stickers, and repeated training for the in-store operational team as they scale.
In most stores, the checkout counter is different from the weighing counters, which adds to the friction in the customer purchase journey. Customers often lose the weight sticker and have to go back to the weighing counters to collect one again before proceeding with the checkout process.

Self-checkout process
BigBasket introduced an AI-powered checkout system in their physical stores that uses cameras to distinguish items uniquely. The following figure provides an overview of the checkout process.

The BigBasket team was running open source, in-house ML algorithms for computer vision object recognition to power AI-enabled checkout at their Fresho (physical) stores. They faced the following challenges in operating the existing setup:

With the continuous introduction of new products, the computer vision model needed to continuously incorporate new product information. The system needed to handle a large catalog of over 12,000 Stock Keeping Units (SKUs), with new SKUs being continually added at a rate of over 600 per month.
To keep pace with new products, a new model was produced each month using the latest training data. It was costly and time consuming to train the models frequently to adapt to new products.
BigBasket also wanted to reduce the training cycle time to improve the time to market. As the number of SKUs grew, model training time increased roughly linearly, which hurt their time to market because training runs were both frequent and lengthy.
Data augmentation for model training and manually managing the complete end-to-end training cycle was adding significant overhead. BigBasket was running this on a third-party platform, which incurred significant costs.

Solution overview
We recommended that BigBasket rearchitect their existing FMCG product detection and classification solution using SageMaker to address these challenges. Before moving to full-scale production, BigBasket tried a pilot on SageMaker to evaluate performance, cost, and convenience metrics.
Their objective was to fine-tune an existing computer vision machine learning (ML) model for SKU detection. We used a convolutional neural network (CNN) architecture with ResNet152 for image classification. A sizable dataset of around 300 images per SKU was estimated for model training, resulting in over 4 million total training images. For certain SKUs, we augmented data to encompass a broader range of environmental conditions.
The following diagram illustrates the solution architecture.

The complete process can be summarized into the following high-level steps:

Perform data cleansing, annotation, and augmentation.
Store data in an Amazon Simple Storage Service (Amazon S3) bucket.
Use SageMaker and Amazon FSx for Lustre for efficient data augmentation.
Split data into train, validation, and test sets. We used FSx for Lustre and Amazon Relational Database Service (Amazon RDS) for fast parallel data access.
Use a custom PyTorch Docker container including other open source libraries.
Use SageMaker Distributed Data Parallelism (SMDDP) for accelerated distributed training.
Log model training metrics.
Copy the final model to an S3 bucket.

BigBasket used SageMaker notebooks to train their ML models and were able to easily port their existing open source PyTorch and other open source dependencies to a SageMaker PyTorch container and run the pipeline seamlessly. This was the first benefit seen by the BigBasket team, because there were hardly any changes needed to the code to make it compatible to run on a SageMaker environment.
The model network consists of a ResNet 152 architecture followed by fully connected layers. We froze the low-level feature layers and retained the weights acquired through transfer learning from the ImageNet model. The total model parameters were 66 million, consisting of 23 million trainable parameters. This transfer learning-based approach helped them use fewer images at the time of training, and also enabled faster convergence and reduced the total training time.
Building and training the model within Amazon SageMaker Studio provided an integrated development environment (IDE) with everything needed to prepare, build, train, and tune models. Augmenting the training data using techniques like cropping, rotating, and flipping images helped improve the model training data and model accuracy.
Model training was accelerated by 50% through the use of the SMDDP library, which includes optimized communication algorithms designed specifically for AWS infrastructure. To improve data read/write performance during model training and data augmentation, we used FSx for Lustre for high-performance throughput.
Their starting training data size was over 1.5 TB. We used two Amazon Elastic Compute Cloud (Amazon EC2) p4d.24xlarge instances, each with 8 GPUs and 40 GB of memory per GPU. For SageMaker distributed training, the instances need to be in the same AWS Region and Availability Zone. Also, training data stored in an S3 bucket needs to be in the same Availability Zone. This architecture also allows BigBasket to change to other instance types or add more instances to the current architecture to cater to any significant data growth or achieve further reduction in training time.
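The post does not include the training launch code, but a job of this shape is typically configured along the following lines with the SageMaker Python SDK. This is a minimal sketch, not BigBasket’s implementation; the entry point script, IAM role, file system ID, paths, and network IDs are placeholders:

from sagemaker.pytorch import PyTorch
from sagemaker.inputs import FileSystemInput

# Training data served from FSx for Lustre for high-throughput reads.
train_input = FileSystemInput(
    file_system_id="fs-0123456789abcdef0",      # placeholder FSx for Lustre ID
    file_system_type="FSxLustre",
    directory_path="/fsx/train",
    file_system_access_mode="ro",
)

estimator = PyTorch(
    entry_point="train.py",                     # assumed training script
    role="arn:aws:iam::111111111111:role/SageMakerExecutionRole",
    framework_version="1.13",
    py_version="py39",
    instance_count=2,                           # two p4d.24xlarge nodes, 8 GPUs each
    instance_type="ml.p4d.24xlarge",
    # Enable the SageMaker distributed data parallel (SMDDP) library.
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
    subnets=["subnet-0abc"],                    # VPC configuration is required for FSx access
    security_group_ids=["sg-0abc"],
)

estimator.fit({"train": train_input})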
How the SMDDP library helped reduce training time, cost, and complexity
In traditional distributed data training, the training framework assigns ranks to GPUs (workers) and creates a replica of your model on each GPU. During each training iteration, the global data batch is divided into pieces (batch shards) and a piece is distributed to each worker. Each worker then proceeds with the forward and backward pass defined in your training script on each GPU. Finally, model weights and gradients from the different model replicas are synced at the end of the iteration through a collective communication operation called AllReduce. After each worker and GPU has a synced replica of the model, the next iteration begins.
The SMDDP library is a collective communication library that improves the performance of this distributed data parallel training process. The SMDDP library reduces the communication overhead of the key collective communication operations such as AllReduce. Its implementation of AllReduce is designed for AWS infrastructure and can speed up training by overlapping the AllReduce operation with the backward pass. This approach achieves near-linear scaling efficiency and faster training speed by optimizing kernel operations between CPUs and GPUs.
Note the following calculations:

The size of the global batch is (number of nodes in a cluster) * (number of GPUs per node) * (per batch shard)
A batch shard (small batch) is a subset of the dataset assigned to each GPU (worker) per iteration
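As a concrete illustration of this arithmetic and of the per-worker loop described above, the following is a minimal sketch using standard PyTorch DistributedDataParallel with a toy model and dataset; SMDDP replaces the underlying collective communication (the AllReduce) with its AWS-optimized implementation rather than changing this overall pattern:

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

# Global batch = (nodes) * (GPUs per node) * (per-GPU batch shard),
# e.g. 2 nodes * 8 GPUs * 32 = 512 samples per iteration.
nodes, gpus_per_node, per_gpu_shard = 2, 8, 32
global_batch = nodes * gpus_per_node * per_gpu_shard  # 512

# Toy stand-ins for the real dataset and model.
dataset = TensorDataset(torch.randn(4096, 16), torch.randint(0, 10, (4096,)))
net = torch.nn.Linear(16, 10)

dist.init_process_group(backend="nccl")                      # launched via torchrun, one process per GPU
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)
model = DDP(net.cuda(local_rank), device_ids=[local_rank])   # one model replica per worker
sampler = DistributedSampler(dataset)                        # hands each worker its batch shard
loader = DataLoader(dataset, batch_size=per_gpu_shard, sampler=sampler)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for x, y in loader:                                          # forward and backward pass per worker
    loss = torch.nn.functional.cross_entropy(model(x.cuda(local_rank)), y.cuda(local_rank))
    loss.backward()                                          # gradients synced via AllReduce
    optimizer.step()
    optimizer.zero_grad()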

BigBasket used the SMDDP library to reduce their overall training time. With FSx for Lustre, we improved data read/write throughput during model training and data augmentation. With data parallelism, BigBasket was able to achieve almost 50% faster and 20% cheaper training compared to other alternatives, delivering the best performance on AWS. SageMaker automatically shuts down the training pipeline post-completion. The project completed successfully with 50% faster training time in AWS (4.5 days in AWS vs. 9 days on their legacy platform).
At the time of writing this post, BigBasket has been running the complete solution in production for more than 6 months and scaling the system by catering to new cities, with new stores being added every month.

“Our partnership with AWS on migration to distributed training using their SMDDP offering has been a great win. Not only did it cut down our training times by 50%, it was also 20% cheaper. In our entire partnership, AWS has set the bar on customer obsession and delivering results—working with us the whole way to realize promised benefits.”
– Keshav Kumar, Head of Engineering at BigBasket.

Conclusion
In this post, we discussed how BigBasket used SageMaker to train their computer vision model for FMCG product identification. The implementation of an AI-powered automated self-checkout system delivers an improved retail customer experience through innovation, while eliminating human errors in the checkout process. Accelerating new product onboarding by using SageMaker distributed training reduces SKU onboarding time and cost. Integrating FSx for Lustre enables fast parallel data access for efficient model retraining with hundreds of new SKUs monthly. Overall, this AI-based self-checkout solution provides an enhanced shopping experience devoid of frontend checkout errors. The automation and innovation have transformed their retail checkout and onboarding operations.
SageMaker provides end-to-end ML development, deployment, and monitoring capabilities such as a SageMaker Studio notebook environment for writing code, data acquisition, data tagging, model training, model tuning, deployment, monitoring, and much more. If your business is facing any of the challenges described in this post and wants to save time to market and improve cost, reach out to the AWS account team in your Region and get started with SageMaker.

About the Authors
Santosh Waddi is a Principal Engineer at BigBasket who brings over a decade of expertise in solving AI challenges. With a strong background in computer vision, data science, and deep learning, he holds a postgraduate degree from IIT Bombay. Santosh has authored notable IEEE publications and, as a seasoned tech blog author, he has also made significant contributions to the development of computer vision solutions during his tenure at Samsung.
Nanda Kishore Thatikonda is an Engineering Manager leading the Data Engineering and Analytics at BigBasket. Nanda has built multiple applications for anomaly detection and has a patent filed in a similar space. He has worked on building enterprise-grade applications, building data platforms in multiple organizations and reporting platforms to streamline decisions backed by data. Nanda has over 18 years of experience working in Java/J2EE, Spring technologies, and big data frameworks using Hadoop and Apache Spark.
Sudhanshu Hate is a Principal AI & ML Specialist with AWS and works with clients to advise them on their MLOps and generative AI journey. In his previous role, he conceptualized, created, and led teams to build a ground-up, open source-based AI and gamification platform, and successfully commercialized it with over 100 clients. Sudhanshu has to his credit a couple of patents; has written 2 books, several papers, and blogs; and has presented his point of view in various forums. He has been a thought leader and speaker, and has been in the industry for nearly 25 years. He has worked with Fortune 1000 clients across the globe and most recently is working with digital native clients in India.
Ayush Kumar is a Solutions Architect at AWS. He works with a wide variety of AWS customers, helping them adopt the latest modern applications and innovate faster with cloud-native technologies. You’ll find him experimenting in the kitchen in his spare time.

Amazon SageMaker Feature Store now supports cross-account sharing, discovery, and access

Amazon SageMaker Feature Store is a fully managed, purpose-built repository to store, share, and manage features for machine learning (ML) models. Features are inputs to ML models used during training and inference. For example, in an application that recommends a music playlist, features could include song ratings, listening duration, and listener demographics. Features are used repeatedly by multiple teams, and feature quality is critical to ensure a highly accurate model. Also, when features used to train models offline in batch are made available for real-time inference, it’s hard to keep the two feature stores synchronized. SageMaker Feature Store provides a secured and unified store to process, standardize, and use features at scale across the ML lifecycle.
SageMaker Feature Store now makes it effortless to share, discover, and access feature groups across AWS accounts. This new capability promotes collaboration and minimizes duplicate work for teams involved in ML model and application development, particularly in enterprise environments with multiple accounts spanning different business units or functions.
With this launch, account owners can grant access to select feature groups by other accounts using AWS Resource Access Manager (AWS RAM). After they’re granted access, users of those accounts can conveniently view all of their feature groups, including the shared ones, through Amazon SageMaker Studio or SDKs. This enables teams to discover and utilize features developed by other teams, fostering knowledge sharing and efficiency. Additionally, usage details of shared resources can be monitored with Amazon CloudWatch and AWS CloudTrail. For a deep dive, refer to Cross account feature group discoverability and access.
In this post, we discuss the why and how of a centralized feature store with cross-account access. We show how to set it up and run a sample demonstration, as well as the benefits you can get by using this new capability in your organization.
Who needs a cross-account feature store
Organizations need to securely share features across teams to build accurate ML models, while preventing unauthorized access to sensitive data. SageMaker Feature Store now allows granular sharing of features across accounts via AWS RAM, enabling collaborative model development with governance.
SageMaker Feature Store provides purpose-built storage and management for ML features used during training and inferencing. With cross-account support, you can now selectively share features stored in one AWS account with other accounts in your organization.
For example, the analytics team may curate features like customer profile, transaction history, and product catalogs in a central management account. These need to be securely accessed by ML developers in other departments like marketing, fraud detection, and so on to build models.
The following are key benefits of sharing ML features across accounts:

Consistent and reusable features – Centralized sharing of curated features improves model accuracy by providing consistent input data to train on. Teams can discover and directly consume features created by others instead of duplicating them in each account.
Feature group access control – You can grant access to only the specific feature groups required for an account’s use case. For example, the marketing team may only get access to the customer profile feature group needed for recommendation models.
Collaboration across teams – Shared features allow disparate teams like fraud, marketing, and sales to collaborate on building ML models using the same reliable data instead of creating siloed features.
Audit trail for compliance – Administrators can monitor feature usage by all accounts centrally using CloudTrail event logs. This provides an audit trail required for governance and compliance.

Delineating producers from consumers in cross-account feature stores
In the realm of machine learning, the feature store acts as a crucial bridge, connecting those who supply data with those who harness it. This dichotomy can be effectively managed using a cross-account setup for the feature store. Let’s demystify this using the following personas and a real-world analogy:

Data and ML engineers (owners and producers) – They lay the groundwork by feeding data into the feature store
Data scientists (consumers) – They extract and utilize this data to craft their models

Data engineers serve as architects sketching the initial blueprint. Their task is to construct and oversee efficient data pipelines. Drawing data from source systems, they mold raw data attributes into discernible features. Take “age,” for instance. Although it merely represents the span between now and one’s birthdate, its interpretation might vary across an organization. Ensuring quality, uniformity, and consistency is paramount here. Their aim is to feed data into a centralized feature store, establishing it as the undisputed reference point.
ML engineers refine these foundational features, tailoring them for mature ML workflows. In the context of banking, they might deduce statistical insights from account balances, identifying trends and flow patterns. The hurdle they often face is redundancy. It’s common to see repetitive feature creation pipelines across diverse ML initiatives.
Imagine data scientists as gourmet chefs scouting a well-stocked pantry, seeking the best ingredients for their next culinary masterpiece. Their time should be invested in crafting innovative data recipes, not in reassembling the pantry. The hurdle at this juncture is discovering the right data. A user-friendly interface, equipped with efficient search tools and comprehensive feature descriptions, is indispensable.
In essence, a cross-account feature store setup meticulously segments the roles of data producers and consumers, ensuring efficiency, clarity, and innovation. Whether you’re laying the foundation or building atop it, knowing your role and tools is pivotal.
The following diagram shows two different data scientist teams, from two different AWS accounts, who share and use the same central feature store to select the best features needed to build their ML models. The central feature store is located in a different account managed by data engineers and ML engineers, where the data governance layer and data lake are usually situated.

Cross-account feature group controls
With SageMaker Feature Store, you can share feature group resources across accounts. The resource owner account shares resources with the resource consumer accounts. There are two distinct categories of permissions associated with sharing resources:

Discoverability permissions – Discoverability means being able to see feature group names and metadata. When you grant discoverability permission, all feature group entities in the account that you share from (resource owner account) become discoverable by the accounts that you are sharing with (resource consumer accounts). For example, if you make the resource owner account discoverable by the resource consumer account, then principals of the resource consumer account can see all feature groups contained in the resource owner account. This permission is granted to resource consumer accounts by using the SageMaker catalog resource type.
Access permissions – When you grant an access permission, you do so at the feature group resource level (not the account level). This gives you more granular control over granting access to data. The type of access permissions that can be granted are read-only, read/write, and admin. For example, you can select only certain feature groups from the resource owner account to be accessible by principals of the resource consumer account, depending on your business needs. This permission is granted to resource consumer accounts by using the feature group resource type and specifying feature group entities.

The following example diagram visualizes sharing the SageMaker catalog resource type granting the discoverability permission vs. sharing a feature group resource type entity with access permissions. The SageMaker catalog contains all of your feature group entities. When granted a discoverability permission, the resource consumer account can search and discover all feature group entities within the resource owner account. A feature group entity contains your ML data. When granted an access permission, the resource consumer account can access the feature group data, with access determined by the relevant access permission.

Solution overview
Complete the following steps to securely share features between accounts using SageMaker Feature Store:

In the source (owner) account, ingest datasets and prepare normalized features. Organize related features into logical groups called feature groups.
Create a resource share to grant cross-account access to specific feature groups. Define allowed actions like get and put, and restrict access only to authorized accounts.
In the target (consumer) accounts, accept the AWS RAM invitation to access shared features. Review the access policy to understand permissions granted.

Developers in target accounts can now retrieve shared features using the SageMaker SDK, join with additional data, and use them to train ML models. The source account can monitor access to shared features by all accounts using CloudTrail event logs. Audit logs provide centralized visibility into feature usage.
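Once the share is accepted, reading a shared feature group from a consumer account looks like any other Feature Store call. The following is a minimal sketch, not part of the official walkthrough; the Region, feature group name, and record identifier are placeholders, and depending on your setup the shared feature group may need to be referenced by its full ARN rather than by name:

import boto3

# Online store read from the consumer account.
featurestore_runtime = boto3.client("sagemaker-featurestore-runtime", region_name="eu-central-1")

record = featurestore_runtime.get_record(
    FeatureGroupName="customer-profile-feature-group",   # shared feature group (placeholder)
    RecordIdentifierValueAsString="customer-123",
)
print(record["Record"])   # list of {"FeatureName": ..., "ValueAsString": ...} entries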
With these steps, you can enable teams across your organization to securely use shared ML features for collaborative model development.
Prerequisites
We assume that you have already created feature groups and ingested the corresponding features inside your owner account. For more information about getting started, refer to Get started with Amazon SageMaker Feature Store.
Grant discoverability permissions
First, we demonstrate how to share our SageMaker Feature Store catalog in the owner account. Complete the following steps:

In the owner account of the SageMaker Feature Store catalog, open the AWS RAM console.
Under Shared by me in the navigation pane, choose Resource shares.
Choose Create resource share.
Enter a resource share name and choose SageMaker Resource Catalogs as the resource type.
Choose Next.
For discoverability-only access, enter AWSRAMPermissionSageMakerCatalogResourceSearch for Managed permissions.
Choose Next.
Enter your consumer account ID and choose Add. You may add several consumer accounts.
Choose Next and complete your resource share.

Now the shared SageMaker Feature Store catalog should show up on the Resource shares page.

You can achieve the same result by using the AWS Command Line Interface (AWS CLI) with the following command (provide your AWS Region, owner account ID, and consumer account ID):

aws ram create-resource-share \
    --name MyCatalogFG \
    --resource-arns arn:aws:sagemaker:REGION:OWNERACCOUNTID:sagemaker-catalog/DefaultFeatureGroupCatalog \
    --principals CONSACCOUNTID \
    --permission-arns arn:aws:ram::aws:permission/AWSRAMPermissionSageMakerCatalogResourceSearch

Accept the resource share invite
To accept the resource share invite, complete the following steps:

In the target (consumer) account, open the AWS RAM console.
Under Shared with me in the navigation pane, choose Resource shares.
Choose the new pending resource share.
Choose Accept resource share.

You can achieve the same result using the AWS CLI with the following command:

aws ram get-resource-share-invitations

From the output of the preceding command, retrieve the value of resourceShareInvitationArn and then accept the invitation with the following command:

aws ram accept-resource-share-invitation \
    --resource-share-invitation-arn RESOURCESHAREINVITATIONARN

The workflow is the same for sharing feature groups with another account via AWS RAM.
After you share some feature groups with the target account, you can inspect the SageMaker Feature Store, where you can observe that the new catalog is available.

Grant access permissions
With access permissions, we can grant permissions at the feature group resource level. Complete the following steps:

In the owner account of the SageMaker Feature Store catalog, open the AWS RAM console.
Under Shared by me in the navigation pane, choose Resource shares.
Choose Create resource share.
Enter a resource share name and choose SageMaker Feature Groups as the resource type.
Select one or more feature groups to share.
Choose Next.
For read/write access, enter AWSRAMPermissionSageMakerFeatureGroupReadWrite for Managed permissions.
Choose Next.
Enter your consumer account ID and choose Add. You may add several consumer accounts.
Choose Next and complete your resource share.

Now the shared catalog should show up on the Resource shares page.

You can achieve the same result by using the AWS CLI with the following command (provide your Region, owner account ID, consumer account ID, and feature group name):

aws ram create-resource-share \
    --name MyCatalogFG \
    --resource-arns arn:aws:sagemaker:REGION:OWNERACCOUNTID:feature-group/FEATUREGROUPNAME \
    --principals CONSACCOUNTID \
    --permission-arns arn:aws:ram::aws:permission/AWSRAMPermissionSageMakerFeatureGroupReadWrite

There are three types of access that you can grant to feature groups:

AWSRAMPermissionSageMakerFeatureGroupReadOnly – The read-only privilege allows resource consumer accounts to read records in the shared feature groups and view details and metadata
AWSRAMPermissionSageMakerFeatureGroupReadWrite – The read/write privilege allows resource consumer accounts to write records to, and delete records from, the shared feature groups, in addition to read permissions
AWSRAMPermissionSagemakerFeatureGroupAdmin – The admin privilege allows the resource consumer accounts to update the description and parameters of features within the shared feature groups and update the configuration of the shared feature groups, in addition to read/write permissions

Accept the resource share invite
To accept the resource share invite, complete the following steps:

In the target (consumer) account, open the AWS RAM console.
Under Shared with me in the navigation pane, choose Resource shares.
Choose the new pending resource share.
Choose Accept resource share.

The process of accepting the resource share using the AWS CLI is the same as for the previous discoverability section, with the get-resource-share-invitations and accept-resource-share-invitation commands.
Sample notebooks showcasing this new capability
Two notebooks were added to the SageMaker Feature Store Workshop GitHub repository in the folder 09-module-security/09-03-cross-account-access:

m9_03_nb1_cross-account-admin.ipynb – This needs to be launched on your admin or owner AWS account
m9_03_nb2_cross-account-consumer.ipynb – This needs to be launched on your consumer AWS account

The first script shows how to create the discoverability resource share for existing feature groups at the admin or owner account and share it with another consumer account programmatically using the AWS RAM API create_resource_share(). It also shows how to grant access permissions to existing feature groups at the owner account and share these with another consumer account using AWS RAM. You need to provide your consumer AWS account ID before running the notebook.
The second script accepts the AWS RAM invitations to discover and access cross-account feature groups from the owner level. Then it shows how to discover cross-account feature groups that are on the owner account and list these on the consumer account. You can also see how to access, in read/write mode, cross-account feature groups that are on the owner account and perform the following operations from the consumer account: describe(), get_record(), ingest(), and delete_record().
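The notebooks carry the full walkthrough; conceptually, the consumer-side operations reduce to calls like the following sketch with the SageMaker Python SDK and the Feature Store runtime client. The feature group name, record values, and column names are placeholders and must match the schema of the actual shared feature group:

import time
import pandas as pd
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()
fg = FeatureGroup(name="customer-profile-feature-group", sagemaker_session=session)

print(fg.describe())                            # metadata of the shared feature group

# With read/write permissions, new records can be ingested from the consumer account.
df = pd.DataFrame({"customer_id": ["customer-456"],
                   "age": [31],
                   "event_time": [time.time()]})
fg.ingest(data_frame=df, max_workers=1, wait=True)

# Deleting a record goes through the Feature Store runtime client.
runtime = session.boto_session.client("sagemaker-featurestore-runtime")
runtime.delete_record(FeatureGroupName=fg.name,
                      RecordIdentifierValueAsString="customer-456",
                      EventTime=str(time.time()))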
Conclusion
The SageMaker Feature Store cross-account capability offers several compelling benefits. Firstly, it facilitates seamless collaboration by enabling sharing of feature groups across multiple AWS accounts. This enhances data accessibility and utilization, allowing teams in different accounts to use shared features for their ML workflows.
Additionally, the cross-account capability enhances data governance and security. With controlled access and permissions through AWS RAM, organizations can maintain a centralized feature store while ensuring that each account has tailored access levels. This not only streamlines data management, but also strengthens security measures by limiting access to authorized users.
Furthermore, the ability to share feature groups across accounts simplifies the process of building and deploying ML models in a collaborative environment. It fosters a more integrated and efficient workflow, reducing redundancy in data storage and facilitating the creation of robust models with shared, high-quality features. Overall, the Feature Store’s cross-account capability optimizes collaboration, governance, and efficiency in ML development across diverse AWS accounts. Give it a try, and let us know what you think in the comments.

About the Authors
Ioan Catana is a Senior Artificial Intelligence and Machine Learning Specialist Solutions Architect at AWS. He helps customers develop and scale their ML solutions in the AWS Cloud. Ioan has over 20 years of experience, mostly in software architecture design and cloud engineering.
Philipp Kaindl is a Senior Artificial Intelligence and Machine Learning Solutions Architect at AWS. With a background in data science and mechanical engineering, his focus is on empowering customers to create lasting business impact with the help of AI. Outside of work, Philipp enjoys tinkering with 3D printers, sailing, and hiking.
Dhaval Shah is a Senior Solutions Architect at AWS, specializing in machine learning. With a strong focus on digital native businesses, he empowers customers to use AWS and drive their business growth. As an ML enthusiast, Dhaval is driven by his passion for creating impactful solutions that bring positive change. In his leisure time, he indulges in his love for travel and cherishes quality moments with his family.
Mizanur Rahman is a Senior Software Engineer for Amazon SageMaker Feature Store with over 10 years of hands-on experience specializing in AI and ML. With a strong foundation in both theory and practical applications, he holds a Ph.D. in Fraud Detection using Machine Learning, reflecting his dedication to advancing the field. His expertise spans a broad spectrum, encompassing scalable architectures, distributed computing, big data analytics, micro services and cloud infrastructures for organizations.

Apple AI Research Releases MLLM-Guided Image Editing (MGIE) to Enhance Instruction-based Image Editing via Learning to Produce Expressive Instructions

The use of advanced design tools has brought about revolutionary transformations in the fields of multimedia and visual design. As an important development in the field of picture modification, instruction-based image editing has increased the process’s control and flexibility. Natural language commands are used to change photographs, removing the requirement for detailed explanations or particular masks to direct the editing process. 

However, a typical problem occurs when human instructions are too brief for current systems to understand and carry out properly. Multimodal Large Language Models (MLLMs) come into the picture to address this challenge. MLLMs demonstrate impressive cross-modal comprehension skills, easily combining textual and visual data. These models do exceptionally well at producing visually informed and linguistically accurate responses. 

In their recent research, a team of researchers from UC Santa Barbara and Apple has explored how MLLMs can revolutionize instruction-based image editing, resulting in the creation of MLLM-Guided Image Editing (MGIE). MGIE operates by learning to extract expressive instructions from human input, giving clear direction for the image alteration process that follows.

Through end-to-end training, the model incorporates this understanding into the editing process, capturing the visual creativity that is inherent in these instructions. By integrating MLLMs, MGIE understands and interprets brief instructions in their broader context, overcoming the constraints imposed by human directions that are too terse on their own.
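The following is a purely conceptual Python sketch of that flow, not the authors’ code; the object names and methods are invented to mirror the description above:

def mgie_edit(image, brief_instruction, mllm, edit_model):
    # Conceptual flow only: the MLLM expands a terse instruction into an
    # expressive, visually grounded one, which then conditions the editor.
    # In the actual work, both components are trained end to end.
    expressive_instruction = mllm.derive_instruction(image, brief_instruction)
    return edit_model(image, expressive_instruction)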

In order to determine MGIE’s effectiveness, the team has carried out a thorough analysis covering several aspects of image editing. This involved testing its performance in local editing tasks, global photo optimization, and Photoshop-style adjustments. The experiment outcomes highlighted how important expressive instructions are to instruction-based image modification.

MGIE showed a significant improvement in both automated measures and human evaluation by utilizing MLLMs. This enhancement is accomplished while preserving competitive inference efficiency, guaranteeing that the model is useful for practical, real-world applications in addition to being effective.

The team has summarised their primary contributions as follows.

A unique approach called MGIE has been introduced, which includes learning an editing model and Multimodal Large Language Models (MLLMs) simultaneously.

Expressive instructions that are cognizant of visual cues have been added to provide clear direction during the image editing process.

Numerous aspects of image editing have been examined, such as local editing, global photo optimization, and Photoshop-style modification.

The efficacy of MGIE has been evaluated by qualitative comparisons, including several editing features. The effects of expressive instructions that are cognizant of visual cues on image editing have been assessed through extensive trials.

In conclusion, instruction-based image editing, which is made possible by MLLMs, represents a substantial advancement in the search for more understandable and effective image alteration. As a concrete example of this, MGIE highlights how expressive instructions may be used to improve the overall quality and user experience of image editing jobs. The results of the study have emphasized the importance of these instructions by showing that MGIE improves editing performance in a variety of editing jobs.


This AI Paper Proposes Two Types of Convolution, Pixel Difference Convolution (PDC) and Binary Pixel Difference Convolution (Bi-PDC), to Enhance the Representation Capacity of Convolutional Neural Networks (CNNs)

Deep convolutional neural networks (DCNNs) have been a game-changer for several computer vision tasks. These include object identification, object recognition, image segmentation, and edge detection. Much of this advancement has been enabled by the ever-growing size and compute of DNNs, but that same growth in size and power consumption makes them difficult to deploy on embedded, wearable, and Internet of Things (IoT) devices, as well as drones, which have restricted computing resources and low power budgets. Despite their high accuracy, such computationally expensive DNNs pose significant challenges to sustainability, environmental friendliness, and broad economic viability. As a result, many people are interested in finding ways to maximize the energy efficiency of DNNs through algorithm and hardware optimization.

Model quantization, efficient neural architecture search, compact network design, knowledge distillation, and tensor decomposition are among the most popular DNN compression and acceleration approaches.

Researchers from the University of Oulu, the National University of Defense Technology, the Chinese Academy of Sciences, and the Aviation University of Air Force aim to improve DCNN efficiency by delving into the inner workings of deep features. Network depth and convolution are the two primary components of a DCNN that determine its expressive power. In the first case, a deep convolutional neural network (DCNN) learns a series of hierarchical representations that map to higher abstraction levels. The second method is known as convolution, and it involves exploring image patterns with local operators that are translation invariant. This is similar to how local descriptors are extracted in conventional frameworks for shallow image representation. Although Local Binary Patterns (LBP), Histogram of Oriented Gradients (HOG), and Sorted Random Projections (SRPs) are well-known for their discriminative power and robustness in describing fine-grained image information, the conventional shallow BoW pipeline may restrict their use. But in contrast, DCNNs’ traditional convolutional layer merely records pixel intensity cues, leaving out important information about the image’s microstructure, such as higher-order local gradients.

The researchers wanted to explore how to merge conventional local descriptors with DCNNs for the best of both worlds. They found that higher-order local differential information, which is overlooked by conventional convolution, can effectively capture microtexture information and was already effective before deep learning; consequently, they believe that this area deserves more attention and should be investigated further.

Their recent work provides two convolutional layers, PDC and Bi-PDC, which can augment vanilla convolution by capturing higher-order local differential information. They work well with preexisting DCNNs and are computationally efficient. They want to improve the commonly used CNN architectures for vision applications by creating a generic convolution operation called PDC. The LBP mechanism is incorporated into the basic convolution operations in their PDC design so that filters can probe local pixel differences instead of pixel intensities. To extract rich higher-order feature statistics from distinct encoding orientations, they build three PDC instances—Central PDC, Angular PDC, and Radial PDC—using different LBP probing algorithms. 
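The paper’s reference implementation is not reproduced in this summary, but the central-difference variant can be sketched roughly as follows in PyTorch, using the reparameterized form y = sum_i w_i (x_i - x_c), which equals a vanilla convolution minus a kernel-sum term; layer sizes and initialization are illustrative only:

import torch
import torch.nn as nn
import torch.nn.functional as F

class CentralPDC(nn.Module):
    # Illustrative central pixel difference convolution:
    # y = sum_i w_i * (x_i - x_center) = conv(x, w) - x_center * sum_i(w_i).
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(out_ch, in_ch, kernel_size, kernel_size))
        self.stride, self.padding = stride, padding

    def forward(self, x):
        # Vanilla convolution over the local neighborhood.
        y = F.conv2d(x, self.weight, stride=self.stride, padding=self.padding)
        # Subtract the center-pixel response weighted by the kernel sum.
        w_sum = self.weight.sum(dim=(2, 3), keepdim=True)     # shape (out_ch, in_ch, 1, 1)
        y_center = F.conv2d(x, w_sum, stride=self.stride, padding=0)
        return y - y_center

# Example: CentralPDC(3, 16)(torch.randn(1, 3, 64, 64)).shape  -> (1, 16, 64, 64)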

There are three notable characteristics of PDC in general. 

Feature maps are enhanced in diversity because they can generate features with high-order information that complement features produced by vanilla convolutions. 

In addition, it is completely differentiable and can be easily integrated into any network design for comprehensive optimization. 

Users can improve efficiency by using it with other network acceleration techniques, such as network binarization.

They create a new compact DCNN architecture called Pixel Difference Network (PiDiNet) for the edge detection task using the suggested PDC. As mentioned in their paper, PiDiNet is the first deep network to perform at a human level on the widely used BSDS500 dataset without requiring ImageNet pretraining.

To show that their method works for both low-level tasks (like edge detection) and high-level ones (like image classification and facial recognition), they construct two highly efficient DCNN architectures using PDC and Bi-PDC: PiDiNet and the Binary Pixel Difference Network (Bi-PiDiNet), the latter of which flexibly combines Bi-PDC with vanilla binary convolution. These architectures can efficiently recognize objects in images by capturing both zeroth-order and higher-order local image information. Careful design makes Bi-PiDiNet both smaller and more accurate.

The proposed PiDiNet and Bi-PiDiNet outperform the state-of-the-art in terms of efficiency and accuracy in extensive experimental evaluations conducted on widely used datasets for edge detection, image classification, and facial recognition. PiDiNet and Bi-PiDiNet are new proposals that could improve the efficiency of edge vision tasks by using lightweight deep models. 

The researchers note that much room remains for future work on PDC and Bi-PDC. At the micro level, different pattern-probing methodologies can be explored to produce (Bi-)PDC instances for specific tasks. At the macro level, optimally combining multiple (Bi-)PDC instances could further improve a network. They anticipate that numerous semantically low- and high-level computer vision (CV) tasks, such as object detection, salient object detection, and face behavior analysis, will benefit from the suggested (Bi-)PDC due to its capacity to capture high-order information.


Google Research Introduces TimesFM: A Single Forecasting Model Pre-Trained on a Large Time-Series Corpus of 100B Real World Time-Points

Time series forecasting is an important task in machine learning and is frequently used in various domains such as finance, manufacturing, healthcare, and natural sciences. Researchers from Google introduced a decoder-only model for the task, called TimesFM, based on pretraining a patched-decoder style attention model on a large time-series corpus comprising both real-world and synthetic datasets. Time series data, collected at regular intervals over time, plays a crucial role in predicting future values. Traditional methods like ARIMA and GARCH have been widely used. The recent advancements in deep learning, particularly in large language models (LLMs) for Natural Language Processing (NLP), have opened new ways for researchers to handle time series forecasting by applying these models to the task.

The existing deep learning models such as DeepAR, Temporal Convolutions, and NBEATS are popular for time series forecasting, outperforming traditional statistical methods. There has been recent work on reusing or fine-tuning large language models (LLMs) like GPT-3 and LLaMA-2 for time series forecasting. In the paper, the researchers aim to investigate if a model pre-trained on massive amounts of time-series data can learn temporal patterns useful for accurate forecasting on previously unseen datasets.

TimesFM’s architecture involves a stacked transformer with a patched-decoder style attention mechanism inspired by successful patch-based modeling in long-horizon forecasting. The proposed model uses decoder-only training, which allows the model to predict the future by seeing different numbers of input patches in parallel. The data for training includes both real-world and synthetic data. The real-world data is taken from diverse sources like Google Trends and Wiki Pageviews, while the synthetic data is generated from statistical models like ARIMA.
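The exact TimesFM configuration is not given in this summary, but the patched-decoder idea of cutting a series into fixed-length input patches that a causal transformer consumes, then projecting each position to a longer output patch, can be illustrated roughly as follows; the patch lengths, model sizes, and use of a generic PyTorch transformer are assumptions:

import torch
import torch.nn as nn

class PatchedDecoderSketch(nn.Module):
    # Rough sketch: embed fixed-length input patches, run a causally masked
    # Transformer over the patch sequence, and project each position to an
    # output patch that forecasts the next stretch of the series.
    def __init__(self, input_patch=32, output_patch=128, d_model=256, n_layers=4, n_heads=4):
        super().__init__()
        self.input_patch = input_patch
        self.embed = nn.Linear(input_patch, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, output_patch)

    def forward(self, series):                      # series: (batch, context_len)
        b, t = series.shape                         # context_len must be a multiple of input_patch
        patches = series.reshape(b, t // self.input_patch, self.input_patch)
        h = self.embed(patches)
        mask = nn.Transformer.generate_square_subsequent_mask(h.size(1))
        h = self.decoder(h, mask=mask)              # each patch attends only to earlier patches
        return self.head(h)                         # (batch, n_patches, output_patch)

# Example: PatchedDecoderSketch()(torch.randn(8, 512)).shape  -> (8, 16, 128)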

Experiments demonstrate that TimesFM achieves impressive zero-shot forecasting performance. Not only is the model’s performance impressive, but it is also more efficient than existing models in parameter size and pretraining data. The model is evaluated on public datasets from Darts, Monash, and Informer, showcasing its ability to generalize and outperform specialized baselines.

Trained on a wide corpus of synthetic and real-world data, TimesFM is a groundbreaking time series foundation model. The model’s unique architecture, which includes a patched-decoder attention mechanism and decoder-only training, contributes to its strong zero-shot forecasting performance. TimesFM’s ability to outperform baselines across multiple datasets demonstrates the potential of large pre-trained models for time series forecasting, providing a promising avenue for reducing training data and computational requirements in this field.


How Booking.com modernized its ML experimentation framework with Amazon SageMaker

This post is co-written with Kostia Kofman and Jenny Tokar from Booking.com.
As a global leader in the online travel industry, Booking.com is always seeking innovative ways to enhance its services and provide customers with tailored and seamless experiences. The Ranking team at Booking.com plays a pivotal role in ensuring that the search and recommendation algorithms are optimized to deliver the best results for their users.

Sharing in-house resources with other internal teams, the Ranking team’s machine learning (ML) scientists often encountered long wait times to access resources for model training and experimentation – challenging their ability to rapidly experiment and innovate. Recognizing the need for a modernized ML infrastructure, the Ranking team embarked on a journey to use the power of Amazon SageMaker to build, train, and deploy ML models at scale.
Booking.com collaborated with AWS Professional Services to build a solution to accelerate the time-to-market for improved ML models through the following improvements:

Reduced wait times for resources for training and experimentation
Integration of essential ML capabilities such as hyperparameter tuning
A reduced development cycle for ML models

Reduced wait times would mean that the team could quickly iterate and experiment with models, gaining insights at a much faster pace. Using SageMaker on-demand instances allowed for a tenfold reduction in wait times. Essential ML capabilities such as hyperparameter tuning and model explainability were lacking on premises. The team’s modernization journey introduced these features through Amazon SageMaker Automatic Model Tuning and Amazon SageMaker Clarify. Finally, the team’s aspiration was to receive immediate feedback on each change made in the code, reducing the feedback loop from minutes to an instant, and thereby reducing the development cycle for ML models.
In this post, we delve into the journey undertaken by the Ranking team at Booking.com as they harnessed the capabilities of SageMaker to modernize their ML experimentation framework. By doing so, they not only overcame their existing challenges, but also improved their search experience, ultimately benefiting millions of travelers worldwide.
Approach to modernization
The Ranking team consists of several ML scientists who each need to develop and test their own model offline. When a model is deemed successful according to the offline evaluation, it can be moved to production A/B testing. If it shows online improvement, it can be deployed to all the users.
The goal of this project was to create a user-friendly environment for ML scientists to easily run customizable Amazon SageMaker Model Building Pipelines to test their hypotheses without the need to code long and complicated modules.
One of the several challenges faced was adapting the existing on-premises pipeline solution for use on AWS. The solution involved two key components:

Modifying and extending existing code – The first part of our solution involved the modification and extension of our existing code to make it compatible with AWS infrastructure. This was crucial in ensuring a smooth transition from on-premises to cloud-based processing.
Client package development – A client package was developed that acts as a wrapper around SageMaker APIs and the previously existing code. This package combines the two, enabling ML scientists to easily configure and deploy ML pipelines without coding.

SageMaker pipeline configuration
Customizability is key to the model building pipeline, and it was achieved through config.ini, an extensive configuration file. This file serves as the control center for all inputs and behaviors of the pipeline.
Available configurations inside config.ini include:

Pipeline details – The practitioner can define the pipeline’s name, specify which steps should run, determine where outputs should be stored in Amazon Simple Storage Service (Amazon S3), and select which datasets to use
AWS account details – You can decide which Region the pipeline should run in and which role should be used
Step-specific configuration – For each step in the pipeline, you can specify details such as the number and type of instances to use, along with relevant parameters

The following code shows an example configuration file:

[BUILD]
pipeline_name = ranking-pipeline
steps = DATA_TRANSFORM, TRAIN, PREDICT, EVALUATE, EXPLAIN, REGISTER, UPLOAD
train_data_s3_path = s3://…

[AWS_ACCOUNT]
region = eu-central-1

[DATA_TRANSFORM_PARAMS]
input_data_s3_path = s3://…
compression_type = GZIP
….
[TRAIN_PARAMS]
instance_count = 3
instance_type = ml.g5.4xlarge
epochs = 1
enable_sagemaker_debugger = True

[PREDICT_PARAMS]
instance_count = 3
instance_type = ml.g5.4xlarge

[EVALUATE_PARAMS]
instance_type = ml.m5.8xlarge
batch_size = 2048

[EXPLAIN_PARAMS]
check_job_instance_type = ml.c5.xlarge
generate_baseline_with_clarify = False
….

config.ini is a version-controlled file managed by Git, representing the minimal configuration required for a successful training pipeline run. During development, local configuration files that are not version-controlled can be utilized. These local configuration files only need to contain settings relevant to a specific run, introducing flexibility without complexity. The pipeline creation client is designed to handle multiple configuration files, with the latest one taking precedence over previous settings.
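The precedence behavior can be illustrated with a minimal sketch using Python's standard configparser, which applies files in the order they are read so that values in later files override earlier ones; config.local.ini is a hypothetical local override file used only for illustration, not part of the actual client package.

import configparser

def load_config(paths):
    """Read configuration files in order; values in later files override earlier ones."""
    config = configparser.ConfigParser()
    found = config.read(paths)  # silently skips files that do not exist
    print(f"Loaded configuration from: {found}")
    return config

# Version-controlled defaults first, then an optional local, non-version-controlled override.
config = load_config(["config.ini", "config.local.ini"])

instance_type = config.get("TRAIN_PARAMS", "instance_type", fallback="ml.g5.4xlarge")
epochs = config.getint("TRAIN_PARAMS", "epochs", fallback=1)
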
SageMaker pipeline steps
The pipeline is divided into the following steps:

Train and test data preparation – Terabytes of raw data are copied to an S3 bucket and processed with AWS Glue Spark jobs, resulting in data that is structured and formatted for compatibility with the downstream steps.
Train – The training step uses the TensorFlow estimator for SageMaker training jobs. Training occurs in a distributed manner using Horovod, and the resulting model artifact is stored in Amazon S3. For hyperparameter tuning, a hyperparameter optimization (HPO) job can be initiated, selecting the best model based on the objective metric.
Predict – In this step, a SageMaker Processing job uses the stored model artifact to make predictions. This process runs in parallel on available machines, and the prediction results are stored in Amazon S3.
Evaluate – A PySpark processing job evaluates the model using a custom Spark script. The evaluation report is then stored in Amazon S3.
Condition – After evaluation, a decision is made regarding the model's quality. This decision is based on a condition metric defined in the configuration file. If the evaluation is positive, the model is registered as approved; otherwise, it's registered as rejected. In both cases, the evaluation report and, if generated, the explainability report are recorded in the model registry.
Package model for inference – Using a processing job, if the evaluation results are positive, the model is packaged, stored in Amazon S3, and made ready for upload to the internal ML portal.
Explain – SageMaker Clarify generates an explainability report.

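The steps above can be assembled into a pipeline with the SageMaker Python SDK; the following is a minimal sketch that shows only the training step. It is not the team's client package, and the entry point, role ARN, S3 path, and framework versions are hypothetical placeholders; the remaining steps are attached in the same way through ProcessingStep, ConditionStep, and the model registry.

from sagemaker.tensorflow import TensorFlow
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.steps import TrainingStep

session = PipelineSession()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder role ARN

# Distributed TensorFlow training with Horovod (MPI), mirroring the TRAIN step above.
estimator = TensorFlow(
    entry_point="train.py",  # hypothetical training script
    role=role,
    instance_count=3,
    instance_type="ml.g5.4xlarge",
    framework_version="2.11",
    py_version="py39",
    distribution={"mpi": {"enabled": True, "processes_per_host": 1}},
    sagemaker_session=session,
)

train_step = TrainingStep(
    name="TRAIN",
    step_args=estimator.fit({"train": "s3://my-bucket/train/"}),  # placeholder S3 input
)

pipeline = Pipeline(name="ranking-pipeline", steps=[train_step], sagemaker_session=session)
pipeline.upsert(role_arn=role)  # create or update the pipeline definition
pipeline.start()  # launch a run
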
Two distinct repositories are used. The first repository contains the definition and build code for the ML pipeline, and the second repository contains the code that runs inside each step, such as processing, training, prediction, and evaluation. This dual-repository approach allows for greater modularity, and enables science and engineering teams to iterate independently on ML code and ML pipeline components.
The following diagram illustrates the solution workflow.

Automatic model tuning
Training ML models requires an iterative approach of multiple training experiments to build a robust and performant final model for business use. The ML scientists have to select the appropriate model type, build the correct input datasets, and adjust the set of hyperparameters that control the model learning process during training.
The selection of appropriate values for hyperparameters for the model training process can significantly influence the final performance of the model. However, there is no unique or defined way to determine which values are appropriate for a specific use case. Most of the time, ML scientists will need to run multiple training jobs with slightly different sets of hyperparameters, observe the model training metrics, and then try to select more promising values for the next iteration. This process of tuning model performance is also known as hyperparameter optimization (HPO), and can at times require hundreds of experiments.
The Ranking team used to perform HPO manually in their on-premises environment because they could only launch a very limited number of training jobs in parallel. Therefore, they had to run HPO sequentially, test and select different combinations of hyperparameter values manually, and regularly monitor progress. This prolonged the model development and tuning process and limited the overall number of HPO experiments that could run in a feasible amount of time.
With the move to AWS, the Ranking team was able to use the automatic model tuning (AMT) feature of SageMaker. AMT enables Ranking ML scientists to automatically launch hundreds of training jobs within hyperparameter ranges of interest to find the best performing version of the final model according to the chosen metric. The Ranking team is now able to choose between four different automatic tuning strategies for their hyperparameter selection:

Grid search – AMT will expect all hyperparameters to be categorical values, and it will launch training jobs for each distinct categorical combination, exploring the entire hyperparameter space.
Random search – AMT will randomly select hyperparameter value combinations within the provided ranges. Because there is no dependency between different training jobs and parameter value selection, multiple parallel training jobs can be launched with this method, speeding up the optimal parameter selection process.
Bayesian optimization – AMT uses a Bayesian optimization implementation to choose the next set of hyperparameter values, treating the search as a regression problem. It considers previously tested hyperparameter combinations and their impact on the training objective when proposing new values, enabling smarter parameter selection with fewer experiments, but it launches training jobs largely sequentially so that each new selection can learn from previous trainings.
Hyperband – AMT will use intermediate and final results of the training jobs it’s running to dynamically reallocate resources towards training jobs with hyperparameter configurations that show more promising results while automatically stopping those that underperform.

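For illustration, the following is a minimal sketch of how a tuning job with one of these strategies could be configured using the SageMaker Python SDK; the entry point, role ARN, objective metric and its log regex, hyperparameter ranges, S3 input, and job counts are hypothetical placeholders rather than the Ranking team's actual settings.

from sagemaker.tensorflow import TensorFlow
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

estimator = TensorFlow(
    entry_point="train.py",  # hypothetical training script
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder role ARN
    instance_count=1,
    instance_type="ml.g5.4xlarge",
    framework_version="2.11",
    py_version="py39",
)

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:ndcg",  # hypothetical metric emitted by the training script
    metric_definitions=[
        {"Name": "validation:ndcg", "Regex": "validation ndcg: ([0-9\\.]+)"}
    ],
    hyperparameter_ranges={
        "learning_rate": ContinuousParameter(1e-5, 1e-2, scaling_type="Logarithmic"),
        "batch_size": IntegerParameter(256, 4096),
    },
    strategy="Bayesian",  # alternatives: "Random", "Grid", "Hyperband"
    objective_type="Maximize",
    max_jobs=100,  # total training jobs in the tuning job
    max_parallel_jobs=10,  # experiments allowed to run concurrently
)

tuner.fit({"train": "s3://my-bucket/train/"})  # placeholder S3 input
print(tuner.best_training_job())  # name of the best job according to the objective metric
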
AMT on SageMaker enabled the Ranking team to reduce the time spent on hyperparameter tuning for their model development by letting them, for the first time, run multiple parallel experiments, use automatic tuning strategies, and complete tens of training job runs within days, something that wasn't feasible on premises.
Model explainability with SageMaker Clarify
Model explainability enables ML practitioners to understand the nature and behavior of their ML models by providing valuable insights for feature engineering and selection decisions, which in turn improves the quality of the model predictions. The Ranking team wanted to evaluate their explainability insights in two ways: understand how feature inputs affect model outputs across their entire dataset (global interpretability), and also be able to discover input feature influence for a specific model prediction on a data point of interest (local interpretability). With this data, Ranking ML scientists can make informed decisions on how to further improve their model performance and account for the challenging prediction results that the model would occasionally provide.
SageMaker Clarify enables you to generate model explainability reports using Shapley Additive exPlanations (SHAP) when training your models on SageMaker, supporting both global and local model interpretability. In addition to model explainability reports, SageMaker Clarify supports running analyses for pre-training bias metrics, post-training bias metrics, and partial dependence plots. The analysis runs as a SageMaker Processing job within the AWS account and integrates directly with SageMaker Pipelines.
The global interpretability report will be automatically generated in the job output and displayed in the Amazon SageMaker Studio environment as part of the training experiment run. If this model is then registered in SageMaker model registry, the report will be additionally linked to the model artifact. Using both of these options, the Ranking team was able to easily track back different model versions and their behavioral changes.
To explore input feature impact on a single prediction (local interpretability values), the Ranking team enabled the parameter save_local_shap_values in the SageMaker Clarify jobs and was able to load them from the S3 bucket for further analyses in the Jupyter notebooks in SageMaker Studio.

The preceding images show an example of what a model explainability report looks like for an arbitrary ML model.
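As an illustration of the mechanism, the following is a minimal sketch of configuring a SageMaker Clarify explainability job with local SHAP values enabled; the role ARN, S3 paths, model name, label, and feature names are hypothetical placeholders rather than the Ranking team's configuration.

import sagemaker
from sagemaker import clarify

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder role ARN
feature_names = ["feature_1", "feature_2", "feature_3"]  # hypothetical feature columns

clarify_processor = clarify.SageMakerClarifyProcessor(
    role=role,
    instance_count=1,
    instance_type="ml.c5.xlarge",
    sagemaker_session=sagemaker.Session(),
)

shap_config = clarify.SHAPConfig(
    baseline=[[0.0] * len(feature_names)],  # hypothetical all-zeros baseline record
    num_samples=100,
    agg_method="mean_abs",  # aggregation used for the global report
    save_local_shap_values=True,  # keep per-record SHAP values for local interpretability
)

data_config = clarify.DataConfig(
    s3_data_input_path="s3://my-bucket/validation/",  # placeholder input
    s3_output_path="s3://my-bucket/clarify-output/",  # placeholder output
    label="relevance_label",  # hypothetical target column
    headers=feature_names + ["relevance_label"],
    dataset_type="text/csv",
)

model_config = clarify.ModelConfig(
    model_name="ranking-model",  # hypothetical SageMaker model name
    instance_count=1,
    instance_type="ml.m5.xlarge",
    accept_type="text/csv",
)

clarify_processor.run_explainability(
    data_config=data_config,
    model_config=model_config,
    explainability_config=shap_config,
)
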
Training optimization
The rise of deep learning (DL) has led to ML becoming increasingly reliant on computational power and vast amounts of data. ML practitioners commonly face the hurdle of efficiently using resources when training these complex models. When you run training on large compute clusters, various challenges arise in optimizing resource utilization, including issues like I/O bottlenecks, kernel launch delays, memory constraints, and underutilized resources. If the configuration of the training job is not fine-tuned for efficiency, these obstacles can result in suboptimal hardware usage, prolonged training durations, or even incomplete training runs. These factors increase project costs and delay timelines.
Profiling CPU and GPU usage helps you understand these inefficiencies, determine the hardware resource consumption (time and memory) of the various TensorFlow operations in your model, resolve performance bottlenecks, and, ultimately, make the model run faster.
The Ranking team used the framework profiling feature of Amazon SageMaker Debugger (now deprecated in favor of Amazon SageMaker Profiler) to optimize these training jobs. This allows you to track all activities on CPUs and GPUs, such as CPU and GPU utilization, kernel runs on GPUs, kernel launches on CPUs, sync operations, memory operations across GPUs, latencies between kernel launches and corresponding runs, and data transfer between CPUs and GPUs.
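A minimal sketch of enabling framework profiling on a training job is shown below, assuming a TensorFlow estimator similar to the one used for the training step; the entry point, role ARN, profiled step range, and sampling interval are illustrative, and new workloads should use SageMaker Profiler instead of this deprecated feature.

from sagemaker.debugger import FrameworkProfile, ProfilerConfig
from sagemaker.tensorflow import TensorFlow

# Capture detailed framework-level activity for a short window of training steps,
# alongside system metrics sampled every 500 milliseconds.
profiler_config = ProfilerConfig(
    system_monitor_interval_millis=500,
    framework_profile_params=FrameworkProfile(start_step=5, num_steps=10),
)

estimator = TensorFlow(
    entry_point="train.py",  # hypothetical training script
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder role ARN
    instance_count=1,
    instance_type="ml.g5.4xlarge",
    framework_version="2.10",
    py_version="py39",
    profiler_config=profiler_config,
)
estimator.fit({"train": "s3://my-bucket/train/"})  # placeholder S3 input
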
The Ranking team also used the TensorFlow Profiler feature of TensorBoard, which further helped profile the TensorFlow model training. SageMaker is now further integrated with TensorBoard, bringing TensorBoard's visualization tools into SageMaker training and domains so that you can perform model debugging tasks using the TensorBoard visualization plugins.
With the help of these two tools, the Ranking team optimized their TensorFlow model, identified bottlenecks, and reduced the average training step time from 350 milliseconds to 140 milliseconds on CPU and from 170 milliseconds to 70 milliseconds on GPU, speedups of 60% and 59%, respectively.
Business outcomes
The migration efforts centered around enhancing availability, scalability, and elasticity, which collectively brought the ML environment towards a new level of operational excellence, exemplified by the increased model training frequency and decreased failures, optimized training times, and advanced ML capabilities.
Model training frequency and failures
The number of monthly model training jobs increased fivefold, leading to significantly more frequent model optimizations. Furthermore, the new ML environment led to a reduction in the failure rate of pipeline runs, dropping from approximately 50% to 20%. The failed job processing time decreased drastically, from over an hour on average to a negligible 5 seconds. This has strongly increased operational efficiency and decreased resource wastage.
Optimized training time
The migration brought with it efficiency increases through SageMaker-based GPU training. This shift decreased model training time to a fifth of its previous duration. Previously, the training processes for deep learning models consumed around 60 hours on CPU; this was streamlined to approximately 12 hours on GPU. This improvement not only saves time but also expedites the development cycle, enabling faster iterations and model improvements.
Advanced ML capabilities
Central to the migration’s success is the use of the SageMaker feature set, encompassing hyperparameter tuning and model explainability. Furthermore, the migration allowed for seamless experiment tracking using Amazon SageMaker Experiments, enabling more insightful and productive experimentation.
Most importantly, the new ML experimentation environment supported the successful development of a new model that is now in production. This model is a deep learning model rather than a tree-based one and has introduced noticeable improvements in online model performance.
Conclusion
This post provided an overview of the collaboration between AWS Professional Services and Booking.com that resulted in the implementation of a scalable ML framework and successfully reduced the time-to-market of ML models for its Ranking team.
The Ranking team at Booking.com learned that migrating to the cloud and SageMaker has proved beneficial, and that adopting machine learning operations (MLOps) practices allows their ML engineers and scientists to focus on their craft and increase development velocity. The team is sharing these learnings and the work done with the entire ML community at Booking.com through talks and dedicated sessions with ML practitioners, where they share the code and capabilities. We hope this post can serve as another way to share the knowledge.
AWS Professional Services is ready to help your team develop scalable and production-ready ML on AWS. For more information, see AWS Professional Services or reach out through your account manager.

About the Authors
Laurens van der Maas is a Machine Learning Engineer at AWS Professional Services. He works closely with customers building their machine learning solutions on AWS, specializes in distributed training, experimentation and responsible AI, and is passionate about how machine learning is changing the world as we know it.
Daniel Zagyva is a Data Scientist at AWS Professional Services. He specializes in developing scalable, production-grade machine learning solutions for AWS customers. His experience extends across different areas, including natural language processing, generative AI and machine learning operations.
Kostia Kofman is a Senior Machine Learning Manager at Booking.com, leading the Search Ranking ML team, overseeing Booking.com’s most extensive ML system. With expertise in Personalization and Ranking, he thrives on leveraging cutting-edge technology to enhance customer experiences.
Jenny Tokar is a Senior Machine Learning Engineer at Booking.com’s Search Ranking team. She specializes in developing end-to-end ML pipelines characterized by efficiency, reliability, scalability, and innovation. Jenny’s expertise empowers her team to create cutting-edge ranking models that serve millions of users every day.
Aleksandra Dokic is a Senior Data Scientist at AWS Professional Services. She enjoys supporting customers to build innovative AI/ML solutions on AWS and she is excited about business transformations through the power of data.
Luba Protsiva is an Engagement Manager at AWS Professional Services. She specializes in delivering Data and GenAI/ML solutions that enable AWS customers to maximize their business value and accelerate speed of innovation.

This AI Paper Introduces StepCoder: A Novel Reinforcement Learning Fra …

Large language models (LLMs) are advancing the automation of computer code generation in artificial intelligence. These sophisticated models, trained on extensive datasets of programming languages, have shown remarkable proficiency in crafting code snippets from natural language instructions. Despite their prowess, aligning these models with the nuanced requirements of human programmers remains a significant hurdle. While effective to a degree, traditional methods often fall short when faced with complex, multi-faceted coding tasks, leading to outputs that, although syntactically correct, may only partially capture the intended functionality.

Enter StepCoder, an innovative reinforcement learning (RL) framework designed by research teams from Fudan NLPLab, Huazhong University of Science and Technology, and KTH Royal Institute of Technology to tackle the nuanced challenges of code generation. At its core, StepCoder aims to refine the code creation process, making it more aligned with human intent and significantly more efficient. The framework distinguishes itself through two main components: the Curriculum of Code Completion Subtasks (CCCS) and Fine-Grained Optimization (FGO). Together, these mechanisms address the twin challenges of exploration in the vast space of potential code solutions and the precise optimization of the code generation process.

CCCS revolutionizes exploration by segmenting the daunting task of generating long code snippets into manageable subtasks. This systematic breakdown simplifies the model’s learning curve, enabling it to tackle increasingly complex coding requirements gradually with greater accuracy. As the model progresses, it navigates from completing simpler chunks of code to synthesizing entire programs based solely on human-provided prompts. This step-by-step escalation makes the exploration process more tractable and significantly enhances the model’s capability to generate functional code from abstract requirements.

The FGO component complements CCCS by honing in on the optimization process. It leverages a dynamic masking technique to focus the model's learning on executed code segments, disregarding irrelevant portions. This targeted optimization ensures that the learning process is directly tied to the functional correctness of the code, as determined by the outcomes of unit tests. The result is a model that generates code that is not only syntactically correct but also functionally sound and more closely aligned with the programmer's intentions.
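
To make the masking idea concrete, here is a toy numerical sketch (not the paper's implementation and not tied to any specific RL library): per-token log-probabilities and advantages for a generated snippet are combined into a policy-gradient style objective, but only tokens belonging to code segments that the unit tests actually executed contribute to the loss.

import numpy as np

# Toy per-token quantities for one generated code snippet.
log_probs = np.array([-0.2, -0.5, -0.1, -0.9, -0.3, -0.4])  # log pi(token | context)
advantages = np.full(6, 0.8)  # advantage estimate broadcast over tokens
executed_mask = np.array([1, 1, 1, 0, 0, 1])  # 0 marks tokens in unexecuted segments

# Fine-grained optimization: unexecuted segments are masked out, so the update is
# driven only by code whose behavior the unit tests observed.
masked = executed_mask * advantages * log_probs
loss = -masked.sum() / executed_mask.sum()
print(round(float(loss), 4))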

The efficacy of StepCoder was rigorously tested against existing benchmarks, showcasing superior performance in generating code that met complex requirements. The framework’s ability to navigate the output space more efficiently and produce functionally accurate code sets a new standard in automated code generation. Its success lies in the technological innovation it represents and its approach to learning, which closely mirrors the incremental nature of human skill acquisition.

This research marks a significant milestone in bridging the gap between human programming intent and machine-generated code. StepCoder’s novel approach to tackling the challenges of code generation highlights the potential for reinforcement learning to transform how we interact with and leverage artificial intelligence in programming. As we move forward, the insights gleaned from this study offer a promising path toward more intuitive, efficient, and effective tools for code generation, paving the way for advancements that could redefine the landscape of software development and artificial intelligence.


The post This AI Paper Introduces StepCoder: A Novel Reinforcement Learning Framework for Code Generation appeared first on MarkTechPost.

Meet UniDep: A Tool that Streamlines Python Project Dependency Managem …

Handling dependencies in Python projects can often become daunting, especially when dealing with a mix of Python and non-Python packages. The constant juggling between different dependency files can lead to confusion and inefficiencies in the development process. Meet UniDep, a tool designed to streamline and simplify Python dependency management, making it an invaluable asset for developers, particularly in research, data science, robotics, AI, and ML projects.

Unified Dependency File

UniDep introduces a unified approach to managing Conda and Pip dependencies in a single file, using requirements.yaml or pyproject.toml. This eliminates the need to maintain separate files, such as requirements.txt and environment.yaml, simplifying the entire dependency landscape.

Build System Integration

One of UniDep’s notable features is its seamless integration with Setuptools and Hatchling. This ensures automatic dependency handling during the installation process, making it a breeze to set up development environments with just a single command: 

`unidep install ./your-package`.

One-Command Installation

UniDep’s `unidep install` command effortlessly handles Conda, Pip, and local dependencies, providing a comprehensive solution for developers seeking a hassle-free installation process.

Monorepo-Friendly

For projects within a monorepo structure, UniDep excels in rendering multiple requirements.yaml or pyproject.toml files into a single Conda environment.yaml file. This ensures consistent global and per-subpackage conda-lock files, simplifying dependency management across interconnected projects.

Platform-Specific Support

UniDep acknowledges the diversity of operating systems and architectures by allowing developers to specify dependencies tailored to different platforms. This ensures a smooth experience when working across various environments.

pip-compile Integration

UniDep integrates with pip-compile, enabling the generation of fully pinned requirements.txt files from requirements.yaml or pyproject.toml files. This promotes environment reproducibility and stability.

Integration with conda-lock

UniDep enhances the functionality of conda-lock by allowing the generation of fully pinned conda-lock.yml files from one or more requirements.yaml or pyproject.toml files. This tight integration ensures consistency in dependency versions, which is crucial for reproducible environments.

Nerd Stats

Developed in Python, UniDep boasts over 99% test coverage, full typing support, adherence to Ruff’s rules, extensibility, and minimal dependencies.

UniDep proves particularly useful when setting up complete development environments that require both Python and non-Python dependencies, such as CUDA, compilers, etc. Its one-command installation and support for various platforms make it a valuable tool in fields like research, data science, robotics, AI, and ML.

Real-World Application

UniDep shines in monorepos with multiple dependent projects, although many such projects are private. A public example, home-assistant-streamdeck-yaml, showcases UniDep’s efficiency in handling system dependencies across different platforms.

UniDep emerges as a powerful ally for developers seeking simplicity and efficiency in Python dependency management. Whether you prefer Conda or Pip, UniDep streamlines the process, making it an essential tool for anyone dealing with complex development environments. Try UniDep now and witness a significant boost in your development process.

The post Meet UniDep: A Tool that Streamlines Python Project Dependency Management by Unifying Conda and Pip Packages in a Single System appeared first on MarkTechPost.

This AI Paper from Stanford and Google DeepMind Unveils How Efficient …

Artificial intelligence has seen remarkable advancements with the development of large language models (LLMs). Thanks to techniques like reinforcement learning from human feedback (RLHF), their performance on various tasks has significantly improved. However, the challenge lies in synthesizing novel content solely based on human feedback.

One of the core challenges in advancing LLMs is optimizing their learning process from human feedback. This feedback is obtained through a process where models are presented with prompts and generate responses, with human raters indicating their preferences. The goal is to refine the models’ responses to align more closely with human preferences. However, this method requires many interactions, posing a bottleneck for rapid model improvement.

Current methodologies for training LLMs involve passive exploration, where models generate responses based on predefined prompts without actively seeking to optimize the learning from feedback. One such approach is to use Thompson sampling, where queries are generated based on uncertainty estimates represented by an epistemic neural network (ENN). The choice of exploration scheme is critical, and double Thompson sampling has proven effective in generating high-performing queries. Other schemes include Boltzmann exploration and infomax. While these methods have been instrumental in the initial stages of LLM development, they are not optimized for efficiency and often require an impractical number of human interactions to achieve notable improvements.

Researchers at Google Deepmind and Stanford University have introduced a novel approach to active exploration, utilizing double Thompson sampling and ENN for query generation. This method allows the model to actively seek out feedback that is most informative for its learning, significantly reducing the number of queries needed to achieve high-performance levels. The ENN provides uncertainty estimates that guide the exploration process, enabling the model to make more informed decisions on which queries to present for feedback.

In the experimental setup, agents generate responses to 32 prompts, forming queries evaluated by a preference simulator. The feedback is used to refine their reward models at the end of each epoch. Agents explore the response space by selecting the most informative pairs from a pool of 100 candidates, utilizing a multi-layer perceptron (MLP) architecture with two hidden layers of 128 units each or an ensemble of 10 MLPs for epistemic neural networks (ENN).
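
As a concrete illustration of the selection mechanism, the following toy sketch (with random features standing in for real response representations, and linear reward models standing in for the ENN) shows how double Thompson sampling can pick a query pair: two ensemble members are sampled, and each contributes its highest-scoring candidate, so the pair reflects disagreement within the ensemble rather than a single point estimate.

import numpy as np

rng = np.random.default_rng(0)

candidates = rng.normal(size=(100, 16))  # toy features for 100 candidate responses
ensemble = rng.normal(size=(10, 16))  # toy weight vectors, one per ensemble member

def double_thompson_sample(candidates, ensemble, rng):
    """Sample two reward models and return each one's favorite candidate as the query pair."""
    i, j = rng.choice(len(ensemble), size=2, replace=False)
    first = int(np.argmax(candidates @ ensemble[i]))
    second = int(np.argmax(candidates @ ensemble[j]))
    if second == first:
        # Keep the pair distinct by taking the runner-up under the second draw.
        second = int(np.argsort(candidates @ ensemble[j])[-2])
    return first, second

print("Query pair of candidate indices:", double_thompson_sample(candidates, ensemble, rng))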

The results highlight the effectiveness of double Thompson sampling (TS) over other exploration methods like Boltzmann exploration and infomax, especially in utilizing uncertainty estimates for improved query selection. While Boltzmann exploration shows promise at lower temperatures, double TS consistently outperforms the others by making better use of uncertainty estimates from the ENN reward model. This approach accelerates the learning process and demonstrates the potential for efficient exploration to dramatically reduce the volume of human feedback required, marking a significant advance in training large language models.

In conclusion, this research showcases the potential for efficient exploration to overcome the limitations of traditional training methods. The team has opened new avenues for rapid and effective model enhancement by leveraging advanced exploration algorithms and uncertainty estimates. This approach promises to accelerate innovation in LLMs and highlights the importance of optimizing the learning process for the broader advancement of artificial intelligence.


The post This AI Paper from Stanford and Google DeepMind Unveils How Efficient Exploration Boosts Human Feedback Efficacy in Enhancing Large Language Models appeared first on MarkTechPost.

CMU Researchers Introduce VisualWebArena: An AI Benchmark Designed to …

The field of Artificial Intelligence (AI) has always had a long-standing goal of automating everyday computer operations using autonomous agents. Web-based autonomous agents with the ability to reason, plan, and act are a potential way to automate a variety of computer operations. However, the main obstacle to accomplishing this goal is creating agents that can operate computers with ease, process textual and visual inputs, understand complex natural language commands, and carry out activities to accomplish predetermined goals. The majority of existing benchmarks in this area have predominantly concentrated on text-based agents.

In order to address these challenges, a team of researchers from Carnegie Mellon University has introduced VisualWebArena, a benchmark designed and developed to evaluate the performance of multimodal web agents on realistic and visually stimulating challenges. This benchmark includes a wide range of complex web-based challenges that assess several aspects of autonomous multimodal agents’ abilities.

In VisualWebArena, agents are required to read image-text inputs accurately, decipher natural language instructions, and perform activities on websites in order to accomplish user-defined goals. A comprehensive assessment has been carried out on the most advanced Large Language Model (LLM)–based autonomous agents, which include many multimodal models. Text-only LLM agents have been found to have certain limitations through both quantitative and qualitative analysis. The gaps in the capabilities of the most advanced multimodal language agents have also been disclosed, thus offering insightful information.

The team has shared that VisualWebArena consists of 910 realistic activities in three different online environments: Reddit, Shopping, and Classifieds. While the Shopping and Reddit environments are carried over from WebArena, the Classifieds environment is a new addition based on real-world data. Unlike WebArena, which does not have this visual requirement, all challenges offered in VisualWebArena are visually grounded and require a thorough grasp of the content for effective resolution. Because images are used as input, about 25.2% of the tasks require understanding interleaved image and text content.

The study has thoroughly compared the current state-of-the-art Large Language Models and Vision-Language Models (VLMs) in terms of their autonomy. The results have demonstrated that powerful VLMs outperform text-based LLMs on VisualWebArena tasks. The highest-achieving VLM agents attain a success rate of 16.4%, which is significantly lower than the human performance of 88.7%.

An important discrepancy between open-sourced and API-based VLM agents has also been found, highlighting the necessity of thorough assessment metrics. A unique VLM agent has also been suggested, which draws inspiration from the Set-of-Marks prompting strategy. This new approach has shown significant performance benefits, especially on graphically complex web pages, by streamlining the action space. By addressing the shortcomings of LLM agents, this VLM agent has offered a possible way to improve the capabilities of autonomous agents in visually complex web contexts.

In conclusion, VisualWebArena is an amazing solution for providing a framework for assessing multimodal autonomous language agents as well as offering knowledge that may be applied to the creation of more powerful autonomous agents for online tasks.


The post CMU Researchers Introduce VisualWebArena: An AI Benchmark Designed to Evaluate the Performance of Multimodal Web Agents on Realistic and Visually Stimulating Challenges appeared first on MarkTechPost.

Tiny Titans Triumph: The Surprising Efficiency of Compact LLMs Exposed …

In the rapidly advancing field of natural language processing (NLP), the advent of large language models (LLMs) has significantly transformed the landscape. These models have shown remarkable success in understanding and generating human-like text across various tasks without specific training. However, the deployment of such models in real-world scenarios is often hindered by their substantial demand for computational resources. This challenge has prompted researchers to explore the efficacy of smaller, more compact LLMs in tasks such as meeting summarization, where the balance between performance and resource utilization is crucial.

Traditionally, text summarization, particularly meeting transcripts, has relied on models requiring large annotated datasets and significant computational power for training. While these models achieve impressive results, their practical application is limited due to the high costs associated with their operation. Recognizing this barrier, a recent study explored whether smaller LLMs could serve as a viable alternative to their larger counterparts. This research focused on the industrial application of meeting summarization, comparing the performance of fine-tuned compact LLMs, such as FLAN-T5, TinyLLaMA, and LiteLLaMA, against zero-shot larger LLMs.

The study’s methodology was thorough, employing a range of compact and larger LLMs in an extensive evaluation. The compact models were fine-tuned on specific datasets, while the larger models were tested in a zero-shot manner, meaning they were not specifically trained on the task at hand. This approach allowed for directly comparing the models’ abilities to summarize meeting content accurately and efficiently.

Remarkably, the research findings indicated that certain compact LLMs, notably FLAN-T5, could match or even surpass the performance of larger LLMs in summarizing meetings. FLAN-T5, with its 780M parameters, demonstrated comparable or superior results to larger LLMs with parameters ranging from 7B to over 70B. This revelation points to the potential of compact LLMs to offer a cost-effective solution for NLP applications, striking an optimal balance between performance and computational demand.
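
For readers unfamiliar with the compact models involved, the following minimal sketch shows how the roughly 780M-parameter FLAN-T5 variant (google/flan-t5-large on the Hugging Face Hub) can be loaded with the transformers library and prompted to summarize a short transcript; the prompt format and example transcript are illustrative only, and the study additionally fine-tuned the compact models rather than using them out of the box.

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/flan-t5-large"  # roughly 780M parameters
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

transcript = (
    "Alice: Let's move the release to Friday. "
    "Bob: Agreed, QA needs two more days. "
    "Alice: I'll update the stakeholders today."
)
prompt = f"Summarize the following meeting transcript:\n{transcript}"

inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
summary_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))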

The performance evaluation highlighted FLAN-T5's exceptional capability in the meeting summarization task. For instance, FLAN-T5's performance was on par with, if not better than, many larger zero-shot LLMs, underscoring its efficiency and effectiveness. This result highlights the potential of compact models to revolutionize how we deploy NLP solutions in real-world settings, particularly in scenarios where computational resources are limited.

In conclusion, the exploration into the viability of compact LLMs for meeting summarization tasks has unveiled promising prospects. The standout performance of models like FLAN-T5 suggests that smaller LLMs can punch above their weight, offering a feasible alternative to their larger counterparts. This breakthrough has significant implications for deploying NLP technologies, indicating a path forward where efficiency and performance go hand in hand. As the field continues to evolve, the role of compact LLMs in bridging the gap between cutting-edge research and practical application will undoubtedly be a focal point of future studies.


The post Tiny Titans Triumph: The Surprising Efficiency of Compact LLMs Exposed! appeared first on MarkTechPost.

This AI Paper Introduces PirateNets: A Novel AI System Designed to Fac …

With the world of computational science continually evolving, physics-informed neural networks (PINNs) stand out as a groundbreaking approach for tackling forward and inverse problems governed by partial differential equations (PDEs). These models incorporate physical laws into the learning process, promising a significant leap in predictive accuracy and robustness. 

But as PINNs grow in depth and complexity, their performance paradoxically declines. This counterintuitive phenomenon stems from the intricacies of multi-layer perceptron (MLP) architectures and their initialization schemes, often leading to poor trainability and unstable results.

Current physics-informed machine learning methodologies include refining neural network architecture, enhancing training algorithms, and employing specialized initialization techniques. Despite these efforts, the search for an optimal solution remains ongoing. Efforts such as embedding symmetries and invariances into models and formulating tailored loss functions have been pivotal.

A team of researchers from the University of Pennsylvania, Duke University, and North Carolina State University have introduced Physics-Informed Residual Adaptive Networks (PirateNets), an architecture designed to harness the full potential of deep PINNs. By introducing adaptive residual connections, PirateNets offers a dynamic framework that allows the model to start as a shallow network and progressively deepen during training. This innovative approach addresses the initialization challenges and enhances the network's capacity to learn and generalize from physical laws.

PirateNets integrates random Fourier features as an embedding function to mitigate spectral bias and efficiently approximate high-frequency solutions. This architecture employs dense layers augmented with gating operations across each residual block, where the forward pass involves point-wise activation functions coupled with adaptive residual connections. Key to their design, trainable parameters within the skip connections modulate each block’s nonlinearity, culminating in the network’s final output being a linear amalgamation of initial layer embeddings. At inception, PirateNets resemble a linear blend of basis functions, enabling inductive bias control. This setup facilitates an optimal initial guess for the network, leveraging data from diverse sources to overcome deep network initialization challenges inherent in PINNs.
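
The gating idea can be sketched in a few lines of PyTorch; this is a simplified illustration of adaptive residual connections with a trainable gate initialized to zero, not the authors' architecture, which also includes Fourier-feature embeddings, random weight factorization, and additional gating operations.

import torch
import torch.nn as nn

class AdaptiveResidualBlock(nn.Module):
    """Residual block whose nonlinear branch is scaled by a trainable gate alpha.

    With alpha initialized to zero the block starts as the identity, so a stack of
    such blocks begins training as a shallow (linear) map of the input embeddings
    and only deepens as optimization moves alpha away from zero.
    """

    def __init__(self, width: int):
        super().__init__()
        self.dense1 = nn.Linear(width, width)
        self.dense2 = nn.Linear(width, width)
        self.alpha = nn.Parameter(torch.zeros(1))  # trainable gate, initialized to 0

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.tanh(self.dense2(torch.tanh(self.dense1(x))))
        return self.alpha * h + (1.0 - self.alpha) * x

# Example: three adaptive residual blocks on 128-dimensional embeddings.
blocks = nn.Sequential(*[AdaptiveResidualBlock(128) for _ in range(3)])
print(blocks(torch.randn(16, 128)).shape)  # torch.Size([16, 128])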

The effectiveness of PirateNet is validated through rigorous benchmarks, outshining Modified MLP with its sophisticated architecture. Utilizing random Fourier features for coordinate embedding and employing Modified MLP as the backbone, enhanced by random weight factorization (RWF) and Tanh activation, PirateNet adheres to exact periodic boundary conditions. The training uses mini-batch gradient descent with Adam optimizer, incorporating a learning rate schedule of warm-up and exponential decay. PirateNet demonstrates superior performance and faster convergence across benchmarks, achieving record-breaking results for the Allen-Cahn and Korteweg–De Vries equations. Ablation studies further confirm its scalability, robustness, and the effectiveness of its components, solidifying PirateNet’s prowess in effectively addressing complex, nonlinear problems.

In conclusion, the development of PirateNets signifies a remarkable achievement in computational science. PirateNets paves the way for more accurate and robust predictive models by integrating physical principles with deep learning. This research addresses the inherent challenges of PINNs and opens new routes for scientific exploration, promising to revolutionize our approach to solving complex problems governed by PDEs.


The post This AI Paper Introduces PirateNets: A Novel AI System Designed to Facilitate Stable and Efficient Training of Deep Physics-Informed Neural Network Models appeared first on MarkTechPost.

Stanford Researchers Introduce RAPTOR: A Novel Tree-based Retrieval Sy …

Retrieval-augmented language models often retrieve only short chunks from a corpus, limiting overall document context. This decreases their ability to adapt to changes in the world state and incorporate long-tail knowledge. Existing retrieval-augmented approaches also have shortcomings; the one tackled here is that most existing methods retrieve only a few short, contiguous text chunks, which limits their ability to represent and leverage large-scale discourse structure. This is particularly relevant for thematic questions that require integrating knowledge from multiple parts of a text, such as understanding an entire book.

Recent developments in Large Language Models (LLMs) demonstrate their effectiveness as standalone knowledge stores, encoding facts within their parameters. Fine-tuning on downstream tasks further enhances their performance. However, challenges arise in updating LLMs with evolving world knowledge. An alternative approach involves indexing text in an information retrieval system and presenting retrieved information to LLMs for current domain-specific knowledge. Existing retrieval-augmented methods are limited to retrieving only short, contiguous text chunks, hindering the representation of large-scale discourse structure, which is crucial for thematic questions and a comprehensive understanding of texts such as those in the NarrativeQA dataset.

The researchers from Stanford University propose RAPTOR, an innovative indexing and retrieval system designed to address limitations in existing methods. RAPTOR utilizes a tree structure to capture a text's high-level and low-level details. It clusters text chunks, generates summaries for the clusters, and constructs a tree from the bottom up. This structure enables loading different levels of text chunks into an LLM's context, facilitating efficient and effective answering of questions at various levels. The key contribution is using text summarization for retrieval augmentation, enhancing context representation across different scales, as demonstrated in experiments on long document collections.

RAPTOR addresses issues of semantic depth and connection by constructing a recursive tree structure that captures both broad thematic comprehension and granular details. The process involves segmenting the retrieval corpus into chunks, embedding them using SBERT, and clustering them with a soft clustering algorithm based on Gaussian Mixture Models (GMMs) and Uniform Manifold Approximation and Projection (UMAP). The resulting tree structure allows for efficient querying through tree traversal or a collapsed tree approach, enabling retrieval of relevant information at different levels of specificity.
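
The embed, reduce, and soft-cluster stage can be sketched as follows; the toy chunks, model name, and clustering parameters are illustrative rather than the paper's exact configuration, and the summarization and recursion that build the upper tree levels are omitted.

import numpy as np
import umap
from sentence_transformers import SentenceTransformer
from sklearn.mixture import GaussianMixture

# Toy corpus chunks; in RAPTOR these are contiguous segments of the retrieval corpus.
chunks = [
    "Chapter 1 introduces the protagonist and her village.",
    "The village is destroyed by a storm in chapter 2.",
    "Chapter 3 follows her journey to the capital city.",
    "A council in the capital debates how to respond to the storm.",
]

# 1) Embed the chunks with SBERT.
embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(chunks)

# 2) Reduce dimensionality with UMAP before clustering.
reduced = umap.UMAP(n_neighbors=2, n_components=2, metric="cosine").fit_transform(embeddings)

# 3) Soft-cluster with a Gaussian mixture: each chunk receives a membership probability
#    per cluster, so a chunk can contribute to more than one cluster summary.
gmm = GaussianMixture(n_components=2, random_state=0).fit(reduced)
print(np.round(gmm.predict_proba(reduced), 2))

# In the full method, the chunks of each cluster are summarized by an LLM, the summaries
# become the nodes of the next tree level, and the procedure repeats recursively.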

RAPTOR outperforms baseline methods across three question-answering datasets: NarrativeQA, QASPER, and QuALITY. Control comparisons using UnifiedQA 3B as the reader show consistent superiority of RAPTOR over BM25 and DPR. Paired with GPT-4, RAPTOR achieves state-of-the-art results on QASPER and QuALITY datasets, showcasing its effectiveness in handling thematic and multi-hop queries. The contribution of the tree structure is validated, demonstrating the significance of upper-level nodes in capturing a broader understanding and enhancing retrieval capabilities.

In conclusion, Stanford University researchers introduce RAPTOR, an innovative tree-based retrieval system that enhances the knowledge of large language models with contextual information across different abstraction levels. RAPTOR constructs a hierarchical tree structure through recursive clustering and summarization, facilitating the effective synthesis of information from diverse sections of retrieval corpora. Controlled experiments showcase RAPTOR’s superiority over traditional methods, establishing new benchmarks in various question-answering tasks. Overall, RAPTOR proves to be a promising approach for advancing the capabilities of language models through enhanced contextual retrieval.


The post Stanford Researchers Introduce RAPTOR: A Novel Tree-based Retrieval System that Augments the Parametric Knowledge of LLMs with Contextual Information appeared first on MarkTechPost.