Meta AI Launches Massively Multilingual Speech (MMS) Project: Introducing Speech-To-Text, Text-To-Speech, And More For 1,000+ Languages

Speech technology has advanced significantly over the past decade, allowing it to be incorporated into a wide range of consumer products. Training a good machine learning model for tasks such as speech recognition requires a large amount of labeled data, in this case many thousands of hours of transcribed audio, and such data exists for only a small fraction of languages. For instance, of the 7,000+ languages in use today, only about 100 are supported by current speech recognition systems.

Recently, self-supervised speech representations have drastically reduced the amount of labeled data needed to build speech systems. Despite this progress, major current efforts still cover only around 100 languages.

Meta's Massively Multilingual Speech (MMS) project combines wav2vec 2.0 with a new dataset that contains labeled data for over 1,100 languages and unlabeled data for almost 4,000 languages to address some of these obstacles. Based on their findings, the Massively Multilingual Speech models outperform state-of-the-art methods while supporting ten times as many languages.

Since the largest available speech datasets cover at most around 100 languages, the team's initial goal was to collect audio data for hundreds more. They therefore turned to religious texts such as the Bible, which have been translated into many languages and whose translations have been studied extensively in text-based machine translation research. People have recorded themselves reading these translations and made the audio files available online. From these recordings, the researchers compiled a collection of New Testament readings in over 1,100 languages, yielding an average of 32 hours of data per language.

Their investigation reveals that the proposed models perform similarly well for male and female voices, even though this data is from a specific domain and is typically read by male speakers. Even though the recordings are religious, the research indicates that this does not unduly bias the model toward producing more religious language. According to the researchers, this is because they employ a Connectionist Temporal Classification strategy, which is more limited than large language models (LLMs) or sequence-to-sequence models for voice recognition.
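To make that constraint concrete, the following toy PyTorch sketch (illustrative only, not the MMS training code) shows the CTC setup the paragraph refers to: frame-level log-probabilities from an acoustic encoder are tied directly to a reference transcript through the CTC loss, leaving far less room for free-form generation than an autoregressive language model.

import torch
import torch.nn as nn

# Toy CTC setup: 50 audio frames, a 32-character vocabulary (index 0 = blank).
torch.manual_seed(0)
num_frames, batch_size, vocab_size, transcript_len = 50, 1, 32, 12

# Frame-level log-probabilities, e.g. the output of a speech encoder plus a linear head.
log_probs = torch.randn(num_frames, batch_size, vocab_size).log_softmax(dim=-1)

# Reference transcript encoded as character indices (1..31).
targets = torch.randint(1, vocab_size, (batch_size, transcript_len))
input_lengths = torch.full((batch_size,), num_frames, dtype=torch.long)
target_lengths = torch.full((batch_size,), transcript_len, dtype=torch.long)

# CTC scores every frame against the reference transcript, which is why the
# trained model has little opportunity to drift toward domain-specific wording.
loss = nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)
print(loss.item())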

The team preprocessed the data by combining a highly efficient forced alignment approach, which can handle recordings of 20 minutes or longer, with an alignment model trained on data from over 100 languages. To eliminate potentially skewed data, they ran several iterations of this procedure plus a cross-validation filtering step based on model accuracy. They integrated the alignment algorithm into PyTorch and released the alignment model publicly so that other researchers can use it to create new speech datasets.

Thirty-two hours of data per language is not enough to train conventional supervised speech recognition models. The team therefore relied on wav2vec 2.0, which drastically reduces the amount of labeled data required to train effective systems. Specifically, they trained self-supervised models on roughly 500,000 hours of speech data in over 1,400 languages, approximately five times more languages than any previous effort.
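For readers who want to experiment with the released checkpoints, the sketch below shows one way to run CTC-based recognition with a multilingual wav2vec 2.0 model through the Hugging Face transformers library. The model identifier facebook/mms-1b-all, the language code, and the adapter call reflect the public MMS release as we understand it and should be checked against the model card; the audio here is a silent placeholder.

import torch
from transformers import AutoProcessor, Wav2Vec2ForCTC

# Assumed checkpoint name from the public MMS release; verify on the Hugging Face Hub.
model_id = "facebook/mms-1b-all"
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# MMS ships per-language adapters; "eng" is used here only as an example code.
processor.tokenizer.set_target_lang("eng")
model.load_adapter("eng")

# Placeholder input: one second of 16 kHz silence. Replace with real mono audio.
waveform = torch.zeros(16000)
inputs = processor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted_ids = torch.argmax(logits, dim=-1)[0]
print(processor.decode(predicted_ids))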

The researchers used pre-existing benchmark datasets such as FLEURS to assess the performance of models trained on the Massively Multilingual Speech data. Using a 1B-parameter wav2vec 2.0 model, they trained a multilingual speech recognition system on over 1,100 languages. Performance degrades only slightly as the number of languages grows: the character error rate increases by roughly 0.4% when going from 61 to 1,107 languages, while language coverage grows nearly 18-fold.

Comparing models trained on the Massively Multilingual Speech data with OpenAI's Whisper, the researchers found that the former achieve half the word error rate while covering 11 times as many languages. This illustrates that the model compares favorably with the state of the art in speech recognition.

The team also used their datasets along with publicly available datasets such as FLEURS and CommonVoice to train a language identification (LID) model for more than 4,000 languages, and then evaluated it on the FLEURS LID task. The findings show that performance remains excellent even though 40 times as many languages are supported. They also developed speech synthesis systems for more than 1,100 languages, even though most existing text-to-speech models are trained on single-speaker voice datasets.

The team foresees a world where one model can handle many speech tasks across all languages. While they did train individual models for each task—recognition, synthesis, and identification of language—they believe that in the future, a single model will be able to handle all of these functions and more, improving performance in every area.


TU Delft Researchers Introduce a New Approach to Enhance the Performance of Deep Learning Algorithms for VPR Applications

Researchers have developed an innovative method to enhance visual recognition systems by densifying feature points within images. This approach shows great promise in computer vision, offering improved efficiency and accuracy in various applications like image processing and object detection.

The new approach, known as densification, aims to overcome the limitations of traditional visual recognition models that often struggle to identify objects in complex or crowded scenes. Densification involves increasing the density of feature points within an image, providing a more comprehensive representation of its content.

The implementation of densification involves a multi-step process. First, the input image is captured, and critical feature points are extracted using existing algorithms. These feature points are then used to generate a dense point cloud representation, which contains a more significant number of densely distributed feature points than traditional sparse feature point methods.
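As a rough, generic illustration of the sparse-versus-dense distinction described above (not the authors' implementation), the following OpenCV sketch computes SIFT descriptors once at detected keypoints and once on a regular grid that covers the whole image; the file name is a placeholder.

import cv2

# Placeholder image path; any grayscale photograph will do.
image = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()

# Sparse: descriptors only at the keypoints the detector happens to find.
sparse_kp, sparse_desc = sift.detectAndCompute(image, None)

# Dense: place a keypoint every `step` pixels so the whole scene is covered.
step = 8
grid = [
    cv2.KeyPoint(float(x), float(y), step)
    for y in range(0, image.shape[0], step)
    for x in range(0, image.shape[1], step)
]
dense_kp, dense_desc = sift.compute(image, grid)

print(f"sparse points: {len(sparse_kp)}, dense points: {len(dense_kp)}")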

The researchers developed a specialized deep learning architecture called the DenseNet to leverage the dense point cloud representation. This model consists of multiple layers that progressively refine the extracted features, leading to more accurate recognition and classification of objects within the image.

Experimental results have demonstrated the advantages of the densification approach. It has shown higher accuracy rates and better overall performance than conventional sparse feature point methods, particularly in challenging scenarios. The dense point cloud representation has also improved robustness against occlusions, clutter, and varying viewpoints.

Densification has the potential to revolutionize various applications in visual recognition. Autonomous driving, for example, can enhance object detection capabilities, allowing vehicles to better identify and respond to pedestrians, cyclists, and other vehicles in real time. In surveillance systems, densification can improve object recognition accuracy in crowded areas, reducing false alarms and enhancing security measures.

The benefits of densification extend beyond traditional computer vision domains. Its ability to recognize and classify objects within complex scenes makes it suitable for robotics, industrial automation, and augmented reality applications. By providing more precise and comprehensive visual information, densification improves the performance and reliability of these systems.

Future investigations may explore different deep learning architectures, refine feature extraction algorithms, and expand the densification scope to other visual recognition areas.

In conclusion, densification offers a promising advancement in visual recognition systems. Increasing the density of feature points within images enhances accuracy, robustness, and overall object identification and classification performance. Its potential applications span computer vision, autonomous systems, surveillance, robotics, and other fields, and ongoing research will likely uncover further advancements and practical implementations in the near future.


Best Free AI Logo Makers of 2023

Your logo is the first thing your consumers or prospective customers in the target audience see often. A logo’s visual appeal is crucial to conveying the professionalism and trustworthiness of a company.

The logo represents your firm in the marketplace, which is why a thoughtfully created logo that effectively conveys your brand values is essential. Whatever kind of business you plan to launch, whether it's a blog, a coaching practice, or an online store, the logo and overall layout will eventually need your attention.

Designing an effective logo takes some talent, because creating one that truly represents your business is a labor-intensive process. Fortunately, today's AI tools can lend a hand: logos can be created in minutes. Simply find a suitable AI logo generator, feed it some text, and wait a moment.

Below is a selection of AI logo creation tools.

Logomaster.ai

Businesses can now generate stunning logos in seconds with the help of Logomaster.ai, an AI-powered online logo builder. Logomaster.ai's intuitive interface means anybody can use it, regardless of design experience. With 100+ polished templates and AI assistance, it's ideal for freelancers, solopreneurs, and small businesses. If you need a logo but don't want to spend money on a designer, Logomaster.ai is a solid alternative. Logomaster.ai's logos are free to use for any purpose, commercial or not. Over 3,000 businesses around the world rely on it, and it has a 4.7-star rating based on 1,958 customer reviews. For companies that want to use their logo in both print and digital media, Logomaster.ai offers a professional logo package that contains everything a designer would supply.

LogoAI 

LogoAI is a branding tool that streamlines the processes of logo design, identity creation, and brand marketing via automation. An AI-driven engine processes logo information, design standards, and brand development guidelines. LogoAI creates logos in any size or format and coordinates documents, including letterhead, business cards, social media posts, posters, and flyers. It provides several logo examples and layouts that may be modified to suit specific requirements. A brand hub on the platform allows instant activation of a unique brand bundle. LogoAI provides a quick and simple method for creating a strong brand identity.

Looka 

Looka, the rebranded version of Logojoy, is an AI-powered platform for creating a polished logo and brand identity with little effort, and it is available at no cost. First, the Logo Maker uses artificial intelligence to rapidly produce hundreds of logo prototypes based on user input; the user can then fine-tune the layout to their liking. The Brand Kit generates promotional materials from the chosen logo, color scheme, and typefaces, with templates for business cards, social media profiles, email signatures, and more. Looka users can also adapt their profile and cover images for many social media platforms, including YouTube, Twitter, and Facebook.

GraphicSprings 

Logos and other visuals can be made quickly and simply with GraphicSprings, a free AI logo generator. Its drag-and-drop interface makes it easy to create professional-looking logos and designs. You get free access to a large image library for use in your designs, along with plenty of design-related resources in their collection. Your logo's colors, typefaces, and text boxes can all be adjusted to your liking. The application also includes industry-specific templates for fields as varied as medicine, cosmetics, manufacturing, and more.

Canva 

One of the top artificial intelligence logo-generating tools is now available inside Canva. A logo for your company may be made quickly and simply from one of more than a thousand available templates. The UI is straightforward enough for newcomers to use with ease. Logos may be created and edited using a variety of tools, making the process accessible to those with less familiarity with graphic design programs like Adobe Photoshop. Canva is not limited to creating graphics for social media or blog headers. 

Logoshi 

Logoshi is an AI-powered logo generator for making custom logos, icons, and more. You can start making logos immediately, with no learning curve to worry about: a user-friendly interface guides you through each step of the creative process to keep you productive and inspire a striking design. It isn't limited to logos, either; you can also create AI-generated profile pictures and Zoom backgrounds. Logoshi provides the tools necessary to build a successful brand.

DesignMantic 

Do you prefer a logo with a cube or an infinity symbol? With DesignMantic’s free artificial intelligence logo generator, you can create a high-quality, up-to-date logo in minutes, no matter your level of design expertise. Many different pre-made logos are available, and you may customize them to your heart’s content by swapping out icons and adding your text. In addition to being searchable by keywords and industries, their designs are readily editable in their free logo builder studio. Their free logo maker is a treasure trove of logo design ideas for every industry, from IT to fashion.

Hatchful 

Hatchful is an AI-powered logo generator that can help you develop fresh ideas for your company's branding. It is built by Shopify, one of the most widely used e-commerce platforms worldwide, which offers a range of services and products including web development, marketing automation, and e-commerce. Users can choose from hundreds of templates on Hatchful and tailor them to their own company. Because the tools are so straightforward, you can dive right in without thinking about the technical side.

Logo Garden

Logo Garden is an automated logo maker that can create a logo for your company at no cost. The design of the page is basic and straightforward. Navigation is also clear. Logos may be produced in seconds and adjusted to meet the user’s needs. Once within their logo creator tool, you may customize the logo by selecting different symbols, fonts, and colors. Thanks to their simple drag-and-drop interface, you can create beautiful designs with these programs even if you’ve never touched a graphics editor before. In addition, they provide creative advice to help you create a logo representing your business.  

Ucraft

Ucraft’s free logo AI maker makes it simple to change an existing logo and save the finished product for later use. Using the Ucraft logo maker’s intuitive interface, you can choose from over a million symbols and hundreds of shapes, fonts, and colors to create a custom logo. If you’re looking for inspiration or step-by-step instructions on creating a logo that stands out, they provide articles like that.

Adobe Express

If you need a high-quality logo but can't afford to hire a designer, try Adobe Express, a free AI logo maker. You can have a logo designed for you by artificial intelligence using one of the many available templates, and then use the provided tools to tweak the color scheme or fonts if needed. Alternatively, you can use Adobe's vector-based tools to create your own. You can also add motion to your logos for a livelier display, and high-quality AI-created logos can be shared across your various digital and printed channels directly from Adobe Express.

Zyro

Zyro AI's free logo generator is an easy-to-use online tool for making unique logos. Its artificial intelligence algorithms are trained on hundreds of existing logos and generate new designs with similar visual characteristics and typography. The procedure is straightforward: choose a logo style and decide how many colors you'd like to use. Next, assemble your design by picking out shapes and text styles. Once you're done, save the design as an image file and call it a day.

Namecheap

Namecheap's Logo Maker is a free, unlimited logo generator with a simple design wizard. It uses AI and machine learning to create a logo from your answers to a few design questions. The design's overall appearance is up to you, right down to the font, typeface, and color scheme, and the result can be used for personal or professional purposes if you want to keep costs down.

Tailor Brands

Tailor Brands provides everything you need to create custom graphics and logos for your business. The tool’s intuitive design ensures a speedy learning curve. A logo and design may be presented to you in under two minutes. AI algorithms are the brains behind the tool. The logo maker works by having you enter information about your business, your preferred design aesthetic, and any icons you may want to include. The logo generator then produces many logos from which to choose.

Stable Diffusion

Stable Diffusion is a unique new AI art generator that can visualize words. You can use it to make everything from logos to cartoons to anime to landscape imagery. Keep in mind that Stable Diffusion is about more than making AI logos; it's a testing ground for logos, posters, and other visual works of art. However, a logo designed in Stable Diffusion cannot be exported in a vector format: the program isn't concerned with file formats, editing features, or the like, since its only purpose is to translate ideas into images.

DALL-E

DALL-E is another cutting-edge AI program that can create pictures from words. OpenAI, the business that developed DALL-E, is a well-funded research lab and corporation on the verge of transforming the artificial intelligence (AI) industry. You may utilize this program to make your wildest dreams a reality. With only a few words as input, this program can whip up paintings, cartoons, logos, and other graphic design work. Despite its many convenient vector export features, this is more than just a machine-learning logo creator. Instead, it’s a multi-purpose artificial art generator.


Amazon SageMaker XGBoost now offers fully distributed GPU training

Amazon SageMaker provides a suite of built-in algorithms, pre-trained models, and pre-built solution templates to help data scientists and machine learning (ML) practitioners get started on training and deploying ML models quickly. You can use these algorithms and models for both supervised and unsupervised learning. They can process various types of input data, including tabular, image, and text.
The SageMaker XGBoost algorithm allows you to easily run XGBoost training and inference on SageMaker. XGBoost (eXtreme Gradient Boosting) is a popular and efficient open-source implementation of the gradient boosted trees algorithm. Gradient boosting is a supervised learning algorithm that attempts to accurately predict a target variable by combining an ensemble of estimates from a set of simpler and weaker models. The XGBoost algorithm performs well in ML competitions because of its robust handling of a variety of data types, relationships, distributions, and the variety of hyperparameters that you can fine-tune. You can use XGBoost for regression, classification (binary and multiclass), and ranking problems. You can use GPUs to accelerate training on large datasets.
Today, we are happy to announce that SageMaker XGBoost now offers fully distributed GPU training.
Starting with version 1.5-1 and above, you can now utilize all GPUs when using multi-GPU instances. The new feature addresses the need for fully distributed GPU training when dealing with large datasets: you can use multiple GPU-powered Amazon Elastic Compute Cloud (Amazon EC2) instances and utilize all GPUs on each instance.
Distributed GPU training with multi-GPU instances
With SageMaker XGBoost version 1.2-2 or later, you can use one or more single-GPU instances for training. The hyperparameter tree_method needs to be set to gpu_hist. When using more than one instance (distributed setup), the data needs to be divided among instances in the same way as for non-GPU distributed training, as described in XGBoost Algorithm. Although this option is performant and can be used in various training setups, it doesn't extend to using all GPUs when choosing multi-GPU instances such as g5.12xlarge.
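As a hedged illustration of this existing single-GPU distributed setup (placeholder S3 paths, example hyperparameters), the snippet below shards the training channel across instances and enables the GPU histogram tree method:

from sagemaker.inputs import TrainingInput

# Shard the training data so each single-GPU instance reads only its own slice.
train_input = TrainingInput(
    "s3://<bucket>/<prefix>/train/",  # placeholder S3 URI
    content_type="text/csv",
    distribution="ShardedByS3Key",
)

# GPU training is enabled per instance through the tree_method hyperparameter.
hyperparams = {
    "objective": "reg:squarederror",
    "num_round": "500",
    "tree_method": "gpu_hist",
}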
With SageMaker XGBoost version 1.5-1 and above, you can now use all GPUs on each instance when using multi-GPU instances. The ability to use all GPUs in a multi-GPU instance is enabled by integrating the Dask framework.
You can use this setup to complete training quickly. Apart from saving time, this option is also useful for working around blockers such as maximum usable instance (soft) limits, or when a training job is unable to provision a large number of single-GPU instances for some reason.
The configurations to use this option are the same as the previous option, except for the following differences:

Add the new hyperparameter use_dask_gpu_training with string value true.
When creating TrainingInput, set the distribution parameter to FullyReplicated, whether using single or multiple instances. The underlying Dask framework will carry out the data load and split the data among Dask workers. This is different from the data distribution setting for all other distributed training with SageMaker XGBoost.

Note that splitting data into smaller files still applies for Parquet, where Dask will read each file as a partition. Because you’ll have a Dask worker per GPU, the number of files should be greater than instance count * GPU count per instance. Also, making each file too small and having a very large number of files can degrade performance. For more information, see Avoid Very Large Graphs. For CSV, we still recommend splitting up large files into smaller ones to reduce data download time and enable quicker reads. However, it’s not a requirement.
Currently, the supported input formats with this option are:

text/csv
application/x-parquet

The following input mode is supported:

File mode

The code will look similar to the following:

import sagemaker
from sagemaker.session import Session
from sagemaker.inputs import TrainingInput

role = sagemaker.get_execution_role()
region = sagemaker.Session().boto_region_name
session = Session()

bucket = "<Specify S3 Bucket>"
prefix = "<Specify S3 prefix>"

hyperparams = {
    "objective": "reg:squarederror",
    "num_round": "500",
    "verbosity": "3",
    "tree_method": "gpu_hist",
    "eval_metric": "rmse",
    "use_dask_gpu_training": "true",
}

output_path = "s3://{}/{}/output".format(bucket, prefix)

content_type = "application/x-parquet"
instance_type = "ml.g4dn.2xlarge"

xgboost_container = sagemaker.image_uris.retrieve("xgboost", region, "1.5-1")
xgb_script_mode_estimator = sagemaker.estimator.Estimator(
    image_uri=xgboost_container,
    hyperparameters=hyperparams,
    role=role,
    instance_count=1,
    instance_type=instance_type,
    output_path=output_path,
    max_run=7200,
)

train_data_uri = "<specify the S3 URI for the training dataset>"
validation_data_uri = "<specify the S3 URI for the validation dataset>"

# Keep the channels fully replicated; Dask loads and splits the data among workers.
train_input = TrainingInput(
    train_data_uri, content_type=content_type, distribution="FullyReplicated"
)

validation_input = TrainingInput(
    validation_data_uri, content_type=content_type, distribution="FullyReplicated"
)

xgb_script_mode_estimator.fit({"train": train_input, "validation": validation_input})

The following screenshots show a successful training job log from the notebook.

Benchmarks
We benchmarked evaluation metrics to ensure that the model quality didn’t deteriorate with the multi-GPU training path compared to single-GPU training. We also benchmarked on large datasets to ensure that our distributed GPU setups were performant and scalable.
Billable time refers to the absolute wall-clock time. Training time is only the XGBoost training time, measured from the train() call until the model is saved to Amazon Simple Storage Service (Amazon S3).
Performance benchmarks on large datasets
The use of multi-GPU is usually appropriate for large datasets with complex training. We created a dummy dataset with 2,497,248,278 rows and 28 features for testing. The dataset was 150 GB and composed of 1,419 files. Each file was sized between 105–115 MB. We saved the data in Parquet format in an S3 bucket. To simulate somewhat complex training, we used this dataset for a binary classification task, with 1,000 rounds, to compare performance between the single-GPU training path and the multi-GPU training path.
The following table contains the billable training time and performance comparison between the single-GPU training path and the multi-GPU training path.

Single-GPU Training Path

Instance Type    Instance Count    Billable Time per Instance (s)    Training Time (s)
g4dn.xlarge      20                Out of Memory                     -
g4dn.2xlarge     20                Out of Memory                     -
g4dn.4xlarge     15                1710                              1551.9
g4dn.4xlarge     16                1592                              1412.2
g4dn.4xlarge     17                1542                              1352.2
g4dn.4xlarge     18                1423                              1281.2
g4dn.4xlarge     19                1346                              1220.3

Multi-GPU Training Path (with Dask)

Instance Type    Instance Count    Billable Time per Instance (s)    Training Time (s)
g4dn.12xlarge    7                 Out of Memory                     -
g4dn.12xlarge    8                 1143                              784.7
g4dn.12xlarge    9                 1039                              710.73
g4dn.12xlarge    10                978                               676.7
g4dn.12xlarge    12                940                               614.35

We can see that using multi-GPU instances results in lower training time and lower overall time. The single-GPU training path retains some advantages: each instance downloads and reads only part of the data, so data download time is lower, and it doesn't incur Dask's overhead, which is why the gap between training time and total time is smaller. However, because it uses more GPUs, the multi-GPU setup can cut training time significantly.
You should use an EC2 instance that has enough compute power to avoid out of memory errors when dealing with large datasets.
It’s possible to reduce total time further with the single-GPU setup by using more instances or more powerful instances. However, in terms of cost, it might be more expensive. For example, the following table shows the training time and cost comparison with a single-GPU instance g4dn.8xlarge.

Single-GPU Training Path

Instance Type    Instance Count    Billable Time per Instance (s)    Cost ($)
g4dn.8xlarge     15                1679                              15.22
g4dn.8xlarge     17                1509                              15.51
g4dn.8xlarge     19                1326                              15.22

Multi-GPU Training Path (with Dask)

Instance Type    Instance Count    Billable Time per Instance (s)    Cost ($)
g4dn.12xlarge    8                 1143                              9.93
g4dn.12xlarge    10                978                               10.63
g4dn.12xlarge    12                940                               12.26

Cost calculation is based on the On-Demand price for each instance. For more information, refer to Amazon EC2 G4 Instances.
Model quality benchmarks
For model quality, we compared evaluation metrics between the Dask GPU option and the single-GPU option, and ran training on various instance types and instance counts. For different tasks, we used different datasets and hyperparameters, with each dataset split into training, validation, and test sets.
For a binary classification (binary:logistic) task, we used the HIGGS dataset in CSV format. The training split of the dataset has 9,348,181 rows and 28 features. The number of rounds used was 1,000. The following table summarizes the results.

Multi-GPU Training with Dask

Instance Type    GPUs per Instance    Instance Count    Billable Time per Instance (s)    Accuracy (%)    F1 (%)    ROC AUC (%)
g4dn.2xlarge     1                    1                 343                               75.97           77.61     84.34
g4dn.4xlarge     1                    1                 413                               76.16           77.75     84.51
g4dn.8xlarge     1                    1                 413                               76.16           77.75     84.51
g4dn.12xlarge    4                    1                 157                               76.16           77.74     84.52

For regression (reg:squarederror), we used the NYC green cab dataset (with some modifications) in Parquet format. The training split of the dataset has 72,921,051 rows and 8 features. The number of rounds used was 500. The following table shows the results.

Multi-GPU Training with Dask

Instance Type    GPUs per Instance    Instance Count    Billable Time per Instance (s)    MSE      R2        MAE
g4dn.2xlarge     1                    1                 775                               21.92    0.7787    2.43
g4dn.4xlarge     1                    1                 770                               21.92    0.7787    2.43
g4dn.8xlarge     1                    1                 705                               21.92    0.7787    2.43
g4dn.12xlarge    4                    1                 253                               21.93    0.7787    2.44

Model quality metrics are similar between the multi-GPU (Dask) training option and the existing training option. Model quality remains consistent when using a distributed setup with multiple instances or GPUs.
Conclusion
In this post, we gave an overview of how you can use different instance type and instance count combinations for distributed GPU training with SageMaker XGBoost. For most use cases, you can use single-GPU instances. This option provides a wide range of instances to use and is very performant. You can use multi-GPU instances for training with large datasets and lots of rounds. It can provide quick training with a smaller number of instances. Overall, you can use SageMaker XGBoost’s distributed GPU setup to immensely speed up your XGBoost training.
To learn more about SageMaker and distributed training using Dask, check out Amazon SageMaker built-in LightGBM now offers distributed training using Dask.

About the Authors
Dhiraj Thakur is a Solutions Architect with Amazon Web Services. He works with AWS customers and partners to provide guidance on enterprise cloud adoption, migration, and strategy. He is passionate about technology and enjoys building and experimenting in the analytics and AI/ML space.
Dewan Choudhury is a Software Development Engineer with Amazon Web Services. He works on Amazon SageMaker’s algorithms and JumpStart offerings. Apart from building AI/ML infrastructures, he is also passionate about building scalable distributed systems.
Dr. Xin Huang is an Applied Scientist for Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms. He focuses on developing scalable machine learning algorithms. His research interests are in the area of natural language processing, explainable deep learning on tabular data, and robust analysis of non-parametric space-time clustering. He has published many papers in ACL, ICDM, KDD conferences, and Royal Statistical Society: Series A journal.
Tony Cruz

Analyze Amazon SageMaker spend and determine cost optimization opportunities based on usage

In 2021, we launched AWS Support Proactive Services as part of the AWS Enterprise Support plan. Since its introduction, we have helped hundreds of customers optimize their workloads, set guardrails, and improve visibility of their machine learning (ML) workloads’ cost and usage.
In this series of posts, we share lessons learned about optimizing costs in Amazon SageMaker. In Part 1, we showed how to get started using AWS Cost Explorer to identify cost optimization opportunities in SageMaker. In this post, we focus on SageMaker inference environments: real-time inference, batch transform, asynchronous inference, and serverless inference.

Analyze Amazon SageMaker spend and determine cost optimization opportunities based on usage:

Part 1: Overview
Part 2: SageMaker notebooks and SageMaker Studio
Part 3: Processing and Data Wrangler jobs
Part 4: Training jobs
Part 5: Hosting

SageMaker offers multiple inference options for you to pick from based on your workload requirements:

Real-time inference for online, low latency, or high throughput requirements
Batch transform for offline, scheduled processing and when you don’t need a persistent endpoint
Asynchronous inference for when you have large payloads with long processing times and want to queue requests
Serverless inference for when you have intermittent or unpredictable traffic patterns and can tolerate cold starts

In the following sections, we discuss each inference option in more detail.
SageMaker real-time inference
When you create an endpoint, SageMaker attaches an Amazon Elastic Block Store (Amazon EBS) storage volume to the Amazon Elastic Compute Cloud (Amazon EC2) instance that hosts the endpoint. This is true for all instance types that don't come with SSD storage. Because the d* instance types come with NVMe SSD storage, SageMaker doesn't attach an EBS storage volume to these ML compute instances. Refer to Host instance storage volumes for the size of the storage volumes that SageMaker attaches for each instance type for a single endpoint and for a multi-model endpoint.
The cost of SageMaker real-time endpoints is based on the per instance-hour consumed for each instance while the endpoint is running, the cost of GB-month of provisioned storage (EBS volume), as well as the GB data processed in and out of the endpoint instance, as outlined in Amazon SageMaker Pricing. In Cost Explorer, you can view real-time endpoint costs by applying a filter on the usage type. The names of these usage types are structured as follows:

REGION-Host:instanceType (for example, USE1-Host:ml.c5.9xlarge)
REGION-Host:VolumeUsage.gp2 (for example, USE1-Host:VolumeUsage.gp2)
REGION-Hst:Data-Bytes-In (for example, USE2-Hst:Data-Bytes-In)
REGION-Hst:Data-Bytes-Out (for example, USW2-Hst:Data-Bytes-Out)

As shown in the following screenshot, filtering by the usage type Host: will show a list of real-time hosting usage types in an account.

You can either select specific usage types or select Select All and choose Apply to display the cost breakdown of SageMaker real-time hosting usage. To see the cost and usage breakdown by instance hours, you need to de-select all the REGION-Host:VolumeUsage.gp2 usage types before applying the usage type filter. You can also apply additional filters such as account number, EC2 instance type, cost allocation tag, Region, and more. The following screenshot shows cost and usage graphs for the selected hosting usage types.
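If you prefer to script this analysis instead of working in the console, the same breakdown can be pulled with the Cost Explorer API. The following sketch, with placeholder dates, filters SageMaker cost for one example month and keeps only the real-time hosting usage types, mirroring the Host: filter applied above.

import boto3

ce = boto3.client("ce")

# Unblended SageMaker cost for an example month, grouped by usage type.
response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2023-05-01", "End": "2023-06-01"},  # placeholder dates
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    Filter={"Dimensions": {"Key": "SERVICE", "Values": ["Amazon SageMaker"]}},
    GroupBy=[{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
)

# Keep only the real-time hosting usage types (REGION-Host:...).
for group in response["ResultsByTime"][0]["Groups"]:
    usage_type = group["Keys"][0]
    if "Host:" in usage_type:
        amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
        print(f"{usage_type}: ${amount:.2f}")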

Additionally, you can explore the cost associated with one or more hosting instances by using the Instance type filter. The following screenshot shows cost and usage breakdown for hosting instance ml.p2.xlarge.

Similarly, the cost for GB data processed in and processed out can be displayed by selecting the associated usage types as an applied filter, as shown in the following screenshot.

After you have achieved your desired results with filters and groupings, you can either download your results by choosing Download as CSV or save the report by choosing Save to report library. For general guidance on using Cost Explorer, refer to AWS Cost Explorer’s New Look and Common Use Cases.
Optionally, you can enable AWS Cost and Usage Reports (AWS CUR) to gain insights into the cost and usage data for your accounts. AWS CUR contains hourly AWS consumption details. It’s stored in Amazon Simple Storage Service (Amazon S3) in the payer account, which consolidates data for all the linked accounts. You can run queries to analyze trends in your usage and take appropriate action to optimize cost. Amazon Athena is a serverless query service that you can use to analyze the data from AWS CUR in Amazon S3 using standard SQL. More information and example queries can be found in the AWS CUR Query Library.
You can also feed AWS CUR data into Amazon QuickSight, where you can slice and dice it any way you’d like for reporting or visualization purposes. For instructions, see How do I ingest and visualize the AWS Cost and Usage Report (CUR) into Amazon QuickSight.
You can obtain resource-level information such as endpoint ARN, endpoint instance types, hourly instance rate, daily usage hours, and more from AWS CUR. You can also include cost-allocation tags in your query for an additional level of granularity. The following example query returns real-time hosting resource usage for the last 3 months for the given payer account:

SELECT
  bill_payer_account_id,
  line_item_usage_account_id,
  line_item_resource_id AS endpoint_arn,
  line_item_usage_type,
  DATE_FORMAT((line_item_usage_start_date), '%Y-%m-%d') AS day_line_item_usage_start_date,
  SUM(CAST(line_item_usage_amount AS DOUBLE)) AS sum_line_item_usage_amount,
  line_item_unblended_rate,
  SUM(CAST(line_item_unblended_cost AS DECIMAL(16,8))) AS sum_line_item_unblended_cost,
  line_item_blended_rate,
  SUM(CAST(line_item_blended_cost AS DECIMAL(16,8))) AS sum_line_item_blended_cost,
  line_item_line_item_description,
  line_item_line_item_type
FROM
  customer_all
WHERE
  line_item_usage_start_date >= date_trunc('month', current_date - interval '3' month)
  AND line_item_product_code = 'AmazonSageMaker'
  AND line_item_line_item_type IN ('DiscountedUsage', 'Usage', 'SavingsPlanCoveredUsage')
  AND line_item_usage_type like '%Host%'
  AND line_item_operation = 'RunInstance'
  AND bill_payer_account_id = 'xxxxxxxxxxxx'
GROUP BY
  bill_payer_account_id,
  line_item_usage_account_id,
  line_item_resource_id,
  line_item_usage_type,
  line_item_unblended_rate,
  line_item_blended_rate,
  line_item_line_item_type,
  DATE_FORMAT((line_item_usage_start_date), '%Y-%m-%d'),
  line_item_line_item_description
ORDER BY
  line_item_resource_id, day_line_item_usage_start_date

The following screenshot shows the results obtained from running the query using Athena. For more information, refer to Querying Cost and Usage Reports using Amazon Athena.

The result of the query shows that endpoint mme-xgboost-housing with ml.x4.xlarge instance is reporting 24 hours of runtime for multiple consecutive days. The instance rate is $0.24/hour and the daily cost for running for 24 hours is $5.76.
AWS CUR results can help you identify patterns of endpoints running for consecutive days in each of the linked accounts, as well as endpoints with the highest monthly cost. This can also help you decide whether the endpoints in non-production accounts can be deleted to save cost.
Optimize costs for real-time endpoints
From a cost management perspective, it’s important to identify under-utilized (or over-sized) instances and bring the instance size and counts, if required, in line with workload requirements. Common system metrics like CPU/GPU utilization and memory utilization are written to Amazon CloudWatch for all hosting instances. For real-time endpoints, SageMaker makes several additional metrics available in CloudWatch. Some of the commonly monitored metrics include invocation counts and invocation 4xx/5xx errors. For a full list of metrics, refer to Monitor Amazon SageMaker with Amazon CloudWatch.
The metric CPUUtilization provides the sum of each individual CPU core's utilization. Each core's utilization ranges from 0 to 100%; for example, if there are four CPUs, the CPUUtilization range is 0–400%. The metric MemoryUtilization is the percentage of memory used by the containers on an instance, with a range of 0–100%. The following screenshot shows an example of the CloudWatch metrics CPUUtilization and MemoryUtilization for an endpoint instance ml.m4.10xlarge, which comes with 40 vCPUs and 160 GiB of memory.

These metrics graphs show maximum CPU utilization of approximately 3,000%, which is the equivalent of 30 vCPUs. This means that this endpoint isn’t utilizing more than 30 vCPUs out of the total capacity of 40 vCPUs. Similarly, the memory utilization is below 6%. Using this information, you can possibly experiment with a smaller instance that can match this resource need. Furthermore, the CPUUtilization metric shows a classic pattern of periodic high and low CPU demand, which makes this endpoint a good candidate for auto scaling. You can start with a smaller instance and scale out first as your compute demand changes. For information, see Automatically Scale Amazon SageMaker Models.
SageMaker is great for testing new models because you can easily deploy them into an A/B testing environment using production variants, and you only pay for what you use. Each production variant runs on its own compute instance and you’re charged per instance-hour consumed for each instance while the variant is running.
SageMaker also supports shadow variants, which have the same components as a production variant and run on their own compute instance. With shadow variants, SageMaker automatically deploys the model in a test environment, routes a copy of the inference requests received by the production model to the test model in real time, and collects performance metrics such as latency and throughput. This enables you to validate any new candidate component of your model serving stack before promoting it to production.
When you’re done with your tests and aren’t using the endpoint or the variants extensively anymore, you should delete it to save cost. Because the model is stored in Amazon S3, you can recreate it as needed. You can automatically detect these endpoints and take corrective actions (such as deleting them) by using Amazon CloudWatch Events and AWS Lambda functions. For example, you can use the Invocations metric to get the total number of requests sent to a model endpoint and then detect if the endpoints have been idle for the past number of hours (with no invocations over a certain period, such as 24 hours).
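A minimal sketch of that idle-endpoint check follows, assuming a 24-hour lookback, the default AllTraffic variant name, and standard CloudWatch and SageMaker APIs; the deletion call is left commented out so the example is non-destructive.

import boto3
from datetime import datetime, timedelta

sm = boto3.client("sagemaker")
cw = boto3.client("cloudwatch")

lookback_hours = 24  # example idle threshold

for endpoint in sm.list_endpoints(StatusEquals="InService")["Endpoints"]:
    name = endpoint["EndpointName"]
    stats = cw.get_metric_statistics(
        Namespace="AWS/SageMaker",
        MetricName="Invocations",
        Dimensions=[
            {"Name": "EndpointName", "Value": name},
            {"Name": "VariantName", "Value": "AllTraffic"},  # assumed variant name
        ],
        StartTime=datetime.utcnow() - timedelta(hours=lookback_hours),
        EndTime=datetime.utcnow(),
        Period=3600,
        Statistics=["Sum"],
    )
    total_invocations = sum(point["Sum"] for point in stats["Datapoints"])
    if total_invocations == 0:
        print(f"{name} had no invocations in the last {lookback_hours} hours")
        # sm.delete_endpoint(EndpointName=name)  # uncomment after review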
If you have several under-utilized endpoint instances, consider hosting options such as multi-model endpoints (MMEs), multi-container endpoints (MCEs), and serial inference pipelines to consolidate usage to fewer endpoint instances.
For real-time and asynchronous inference model deployment, you can optimize cost and performance by deploying models on SageMaker using AWS Graviton. AWS Graviton is a family of processors designed by AWS that provide the best price performance and are more energy efficient than their x86 counterparts. For guidance on deploying an ML model to AWS Graviton-based instances and details on the price performance benefit, refer to Run machine learning inference workloads on AWS Graviton-based instances with Amazon SageMaker. SageMaker also supports AWS Inferentia accelerators through the ml.inf2 family of instances for deploying ML models for real-time and asynchronous inference. You can use these instances on SageMaker to achieve high performance at a low cost for generative artificial intelligence (AI) models, including large language models (LLMs) and vision transformers.
In addition, you can use Amazon SageMaker Inference Recommender to run load tests and evaluate the price performance benefits of deploying your model on these instances. For additional guidance on automatically detecting idle SageMaker endpoints, as well as instance right-sizing and auto scaling for SageMaker endpoints, refer to Ensure efficient compute resources on Amazon SageMaker.
SageMaker batch transform
Batch inference, or offline inference, is the process of generating predictions on a batch of observations. Offline predictions are suitable for larger datasets and in cases where you can afford to wait several minutes or hours for a response.
The cost for SageMaker batch transform is based on the per instance-hour consumed for each instance while the batch transform job is running, as outlined in Amazon SageMaker Pricing. In Cost Explorer, you can explore batch transform costs by applying a filter on the usage type. The name of this usage type is structured as REGION-Tsform:instanceType (for example, USE1-Tsform:ml.c5.9xlarge).
As shown in the following screenshot, filtering by usage type Tsform: will show a list of SageMaker batch transform usage types in an account.

You can either select specific usage types or select Select All and choose Apply to display the cost breakdown of batch transform instance usage for the selected types. As mentioned earlier, you can also apply additional filters. The following screenshot shows cost and usage graphs for the selected batch transform usage types.

Optimize costs for batch transform
SageMaker batch transform only charges you for the instances used while your jobs are running. If your data is already in Amazon S3, there is no cost for reading input data from Amazon S3 or writing output data to Amazon S3. SageMaker attempts to upload all output objects to Amazon S3; if all uploads succeed, the batch transform job is marked as complete, and if one or more objects fail, the job is marked as failed.
Charges for batch transform jobs apply in the following scenarios:

The job is successful
Failure due to ClientError and the model container is SageMaker or a SageMaker managed framework
Failure due to AlgorithmError or ClientError and the model container is your own custom container (BYOC)

The following are some of the best practices for optimizing a SageMaker batch transform job. These recommendations can reduce the total runtime of your batch transform job, thereby lowering costs:

Set BatchStrategy to MultiRecord and SplitType to Line if you need the batch transform job to make mini batches from the input file. If it can’t automatically split the dataset into mini batches, you can divide it into mini batches by putting each batch in a separate input file, placed in the data source S3 bucket.
Make sure that the batch size fits into the memory. SageMaker usually handles this automatically; however, when dividing batches manually, this needs to be tuned based on the memory.
Batch transform partitions the S3 objects in the input by key and maps those objects to instances. When you have multiple files, one instance might process input1.csv, and another instance might process input2.csv. If you have one input file but initialize multiple compute instances, only one instance processes the input file and the rest of the instances are idle. Make sure the number of files is equal to or greater than the number of instances.
If you have a large number of small files, it may be beneficial to combine multiple files into a small number of bigger files to reduce Amazon S3 interaction time.
If you're using the CreateTransformJob API, you can reduce the time it takes to complete batch transform jobs by using optimal values for parameters such as MaxPayloadInMB, MaxConcurrentTransforms, or BatchStrategy (a hedged configuration sketch follows this list):

MaxConcurrentTransforms indicates the maximum number of parallel requests that can be sent to each instance in a transform job. The ideal value for MaxConcurrentTransforms is equal to the number of vCPU cores in an instance.
MaxPayloadInMB is the maximum allowed size of the payload, in MB. The value in MaxPayloadInMB must be greater than or equal to the size of a single record. To estimate the size of a record in MB, divide the size of your dataset by the number of records. To ensure that the records fit within the maximum payload size, we recommend using a slightly larger value. The default value is 6 MB.
MaxPayloadInMB must not be greater than 100 MB. If you specify the optional MaxConcurrentTransforms parameter, then the value of (MaxConcurrentTransforms * MaxPayloadInMB) must also not exceed 100 MB.
For cases where the payload might be arbitrarily large and is transmitted using HTTP chunked encoding, set the MaxPayloadInMB value to 0. This feature works only in supported algorithms. Currently, SageMaker built-in algorithms do not support HTTP chunked encoding.
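The following sketch shows one way to set these parameters through the SageMaker Python SDK's Transformer class; the model name, instance choice, and S3 paths are placeholders, and the concurrency value is sized roughly to the instance's vCPU count as recommended above.

from sagemaker.transformer import Transformer

# Placeholder model name and S3 paths.
transformer = Transformer(
    model_name="<existing-sagemaker-model>",
    instance_count=2,
    instance_type="ml.c5.4xlarge",
    strategy="MultiRecord",            # BatchStrategy
    max_payload=6,                     # MaxPayloadInMB (must stay <= 100 MB)
    max_concurrent_transforms=16,      # roughly the vCPU count of ml.c5.4xlarge
    output_path="s3://<bucket>/<prefix>/batch-output/",
)

transformer.transform(
    data="s3://<bucket>/<prefix>/batch-input/",
    content_type="text/csv",
    split_type="Line",                 # SplitType=Line so mini batches can be formed
)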

Batch inference tasks are usually good candidates for horizontal scaling. Each worker within a cluster can operate on a different subset of data without the need to exchange information with other workers. AWS offers multiple storage and compute options that enable horizontal scaling. If a single instance is not sufficient to meet your performance requirements, consider using multiple instances in parallel to distribute the workload. For key considerations when architecting batch transform jobs, refer to Batch Inference at Scale with Amazon SageMaker.
Continuously monitor the performance metrics of your SageMaker batch transform jobs using CloudWatch. Look for bottlenecks, such as high CPU or GPU utilization, memory usage, or network throughput, to determine if you need to adjust instance sizes or configurations.
SageMaker uses the Amazon S3 multipart upload API to upload results from a batch transform job to Amazon S3. If an error occurs, the uploaded results are removed from Amazon S3. In some cases, such as when a network outage occurs, an incomplete multipart upload might remain in Amazon S3. To avoid incurring storage charges, we recommend that you add the S3 bucket policy to the S3 bucket lifecycle rules. This policy deletes incomplete multipart uploads that might be stored in the S3 bucket. For more information, see Managing your storage lifecycle.

SageMaker asynchronous inference
Asynchronous inference is a great choice for cost-sensitive workloads with large payloads and burst traffic. Requests can take up to 1 hour to process and have payload sizes of up to 1 GB, so it’s more suitable for workloads that have relaxed latency requirements.
Invocation of asynchronous endpoints differs from real-time endpoints. Rather than passing a request payload synchronously with the request, you upload the payload to Amazon S3 and pass an S3 URI as a part of the request. Internally, SageMaker maintains a queue with these requests and processes them. During endpoint creation, you can optionally specify an Amazon Simple Notification Service (Amazon SNS) topic to receive success or error notifications. When you receive the notification that your inference request has been successfully processed, you can access the result in the output Amazon S3 location.
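Invocation therefore looks like the following sketch, where the payload already sits in Amazon S3; the endpoint name and URI are placeholders.

import boto3

runtime = boto3.client("sagemaker-runtime")

# The request body is not sent inline; only its S3 location is passed.
response = runtime.invoke_endpoint_async(
    EndpointName="<async-endpoint-name>",                # placeholder
    InputLocation="s3://<bucket>/<prefix>/payload.json",  # placeholder
    ContentType="application/json",
)

# SageMaker returns the S3 location where the result will appear once the
# queued request has been processed.
print(response["OutputLocation"])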
The cost for asynchronous inference is based on the per instance-hour consumed for each instance while the endpoint is running, cost of GB-month of provisioned storage, as well as GB data processed in and out of the endpoint instance, as outlined in Amazon SageMaker Pricing. In Cost Explorer, you can filter asynchronous inference costs by applying a filter on the usage type. The name of this usage type is structured as REGION-AsyncInf:instanceType (for example, USE1-AsyncInf:ml.c5.9xlarge). Note that GB volume and GB data processed usage types are the same as real-time endpoints, as mentioned earlier in this post.
As shown in the following screenshot, filtering by the usage type AsyncInf: in Cost Explorer displays a cost breakdown by asynchronous endpoint usage types.

To see the cost and usage breakdown by instance hours, you need to de-select all the REGION-Host:VolumeUsage.gp2 usage types before applying the usage type filter. You can also apply additional filters. Resource-level information such as endpoint ARN, endpoint instance types, hourly instance rate, and daily usage hours can be obtained from AWS CUR. The following is an example of an AWS CUR query to obtain asynchronous hosting resource usage for the last 3 months:

SELECT
  bill_payer_account_id,
  line_item_usage_account_id,
  line_item_resource_id AS endpoint_arn,
  line_item_usage_type,
  DATE_FORMAT((line_item_usage_start_date), '%Y-%m-%d') AS day_line_item_usage_start_date,
  SUM(CAST(line_item_usage_amount AS DOUBLE)) AS sum_line_item_usage_amount,
  line_item_unblended_rate,
  SUM(CAST(line_item_unblended_cost AS DECIMAL(16,8))) AS sum_line_item_unblended_cost,
  line_item_blended_rate,
  SUM(CAST(line_item_blended_cost AS DECIMAL(16,8))) AS sum_line_item_blended_cost,
  line_item_line_item_description,
  line_item_line_item_type
FROM
  customer_all
WHERE
  line_item_usage_start_date >= date_trunc('month', current_date - interval '3' month)
  AND line_item_product_code = 'AmazonSageMaker'
  AND line_item_line_item_type IN ('DiscountedUsage', 'Usage', 'SavingsPlanCoveredUsage')
  AND line_item_usage_type like '%AsyncInf%'
  AND line_item_operation = 'RunInstance'
GROUP BY
  bill_payer_account_id,
  line_item_usage_account_id,
  line_item_resource_id,
  line_item_usage_type,
  line_item_unblended_rate,
  line_item_blended_rate,
  line_item_line_item_type,
  DATE_FORMAT((line_item_usage_start_date), '%Y-%m-%d'),
  line_item_line_item_description
ORDER BY
  line_item_resource_id, day_line_item_usage_start_date

The following screenshot shows the results obtained from running the AWS CUR query using Athena.

The result of the query shows that endpoint sagemaker-abc-model-5 with ml.m5.xlarge instance is reporting 24 hours of runtime for multiple consecutive days. The instance rate is $0.23/hour and the daily cost for running for 24 hours is $5.52.
As mentioned earlier, AWS CUR results can help you identify patterns of endpoints running for consecutive days, as well as endpoints with the highest monthly cost. This can also help you decide whether the endpoints in non-production accounts can be deleted to save cost.
Optimize costs for asynchronous inference
Just like the real-time endpoints, the cost for asynchronous endpoints is based on the instance type usage. Therefore, it’s important to identify under-utilized instances and resize them based on the workload requirements. In order to monitor asynchronous endpoints, SageMaker makes several metrics such as ApproximateBacklogSize, HasBacklogWithoutCapacity, and more available in CloudWatch. These metrics can show requests in the queue for an instance and can be used for auto scaling an endpoint. SageMaker asynchronous inference also includes host-level metrics. For information on host-level metrics, see SageMaker Jobs and Endpoint Metrics. These metrics can show resource utilization that can help you right-size the instance.
SageMaker supports auto scaling for asynchronous endpoints. Unlike real-time hosted endpoints, asynchronous inference endpoints support scaling down to zero instances by setting the minimum capacity to zero. For asynchronous endpoints, SageMaker strongly recommends that you create a target-tracking scaling policy configuration for a deployed model (variant). You need to define a scaling policy that scales on the ApproximateBacklogPerInstance custom metric and set the MinCapacity value to zero.
Asynchronous inference enables you to save on costs by auto scaling the instance count to zero when there are no requests to process, so you only pay when your endpoint is processing requests. Requests that are received when there are zero instances are queued for processing after the endpoint scales up. Therefore, for use cases that can tolerate a cold start penalty of a few minutes, you can optionally scale down the endpoint instance count to zero when there are no outstanding requests and scale back up as new requests arrive. Cold start time depends on the time required to launch a new endpoint from scratch. Also, if the model itself is big, then the time can be longer. If your job is expected to take longer than the 1-hour processing time, you may want to consider SageMaker batch transform.
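A hedged sketch of that scale-to-zero configuration with Application Auto Scaling follows; the endpoint and variant names are placeholders, the target value is an example, and the custom metric name should be verified against the current SageMaker asynchronous inference metric list.

import boto3

autoscaling = boto3.client("application-autoscaling")
resource_id = "endpoint/<async-endpoint-name>/variant/AllTraffic"  # placeholder

# Allow the variant to scale all the way down to zero instances.
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=0,
    MaxCapacity=5,
)

# Target tracking on the per-instance backlog metric discussed above; verify the
# exact metric name in the SageMaker asynchronous inference documentation.
autoscaling.put_scaling_policy(
    PolicyName="async-backlog-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 5.0,  # desired backlog per instance (example value)
        "CustomizedMetricSpecification": {
            "MetricName": "ApproximateBacklogSizePerInstance",
            "Namespace": "AWS/SageMaker",
            "Dimensions": [{"Name": "EndpointName", "Value": "<async-endpoint-name>"}],
            "Statistic": "Average",
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 300,
    },
)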
Additionally, you may also consider your request’s queued time combined with the processing time to choose the instance type. For example, if your use case can tolerate hours of wait time, you can choose a smaller instance to save cost.
For additional guidance on instance right-sizing and auto scaling for SageMaker endpoints, refer to Ensure efficient compute resources on Amazon SageMaker.
Serverless inference
Serverless inference allows you to deploy ML models for inference without having to configure or manage the underlying infrastructure. Based on the volume of inference requests your model receives, SageMaker serverless inference automatically provisions, scales, and turns off compute capacity. As a result, you pay for only the compute time to run your inference code and the amount of data processed, not for idle time. For serverless endpoints, instance provisioning is not necessary. You need to provide the memory size and maximum concurrency. Because serverless endpoints provision compute resources on demand, your endpoint may experience a few extra seconds of latency (cold start) for the first invocation after an idle period. You pay for the compute capacity used to process inference requests, billed by the millisecond, GB-month of provisioned storage, and the amount of data processed. The compute charge depends on the memory configuration you choose.
In Cost Explorer, you can filter serverless endpoints costs by applying a filter on the usage type. The name of this usage type is structured as REGION-ServerlessInf:Mem-MemorySize (for example, USE2-ServerlessInf:Mem-4GB). Note that GB volume and GB data processed usage types are the same as real-time endpoints.
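If you prefer to pull this breakdown programmatically rather than through the console, a minimal sketch using the Cost Explorer API might look like the following (the dates, usage type value, and grouping are placeholders to adapt):

```python
import boto3

ce = boto3.client("ce")

response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2023-04-01", "End": "2023-05-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    # Example serverless inference usage type; adjust the Region prefix and memory size.
    Filter={"Dimensions": {"Key": "USAGE_TYPE", "Values": ["USE2-ServerlessInf:Mem-4GB"]}},
    GroupBy=[{"Type": "DIMENSION", "Key": "LINKED_ACCOUNT"}],
)

for result in response["ResultsByTime"]:
    for group in result["Groups"]:
        print(group["Keys"], group["Metrics"]["UnblendedCost"]["Amount"])
```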
You can see the cost breakdown by applying additional filters such as account number, instance type, Region, and more. The following screenshot shows the cost breakdown by applying filters for the serverless inference usage type.

Optimize cost for serverless inference
When configuring your serverless endpoint, you can specify the memory size and maximum number of concurrent invocations. SageMaker serverless inference auto-assigns compute resources proportional to the memory you select. If you choose a larger memory size, your container has access to more vCPUs. With serverless inference, you only pay for the compute capacity used to process inference requests, billed by the millisecond, and the amount of data processed. The compute charge depends on the memory configuration you choose. The memory sizes you can choose are 1024 MB, 2048 MB, 3072 MB, 4096 MB, 5120 MB, and 6144 MB. The pricing increases with the memory size increments, as explained in Amazon SageMaker Pricing, so it’s important to select the correct memory size. As a general rule, the memory size should be at least as large as your model size. However, it’s a good practice to refer to memory utilization when deciding the endpoint memory size, in addition to the model size itself.
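As an illustration of where these two settings live, a minimal sketch of creating a serverless endpoint configuration with boto3 (the config, model, and endpoint names are placeholders, and the model must already exist in SageMaker):

```python
import boto3

sm = boto3.client("sagemaker")

sm.create_endpoint_config(
    EndpointConfigName="my-serverless-config",  # placeholder name
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": "my-model",  # placeholder: an existing SageMaker model
            "ServerlessConfig": {
                "MemorySizeInMB": 4096,  # should be at least as large as your model
                "MaxConcurrency": 10,
            },
        }
    ],
)

sm.create_endpoint(
    EndpointName="my-serverless-endpoint",
    EndpointConfigName="my-serverless-config",
)
```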
General best practices for optimizing SageMaker inference costs
Optimizing hosting costs isn’t a one-time event. It’s a continuous process of monitoring deployed infrastructure, usage patterns, and performance, and also keeping a keen eye on new innovative solutions that AWS releases that could impact cost. Consider the following best practices:

Choose an appropriate instance type – SageMaker supports multiple instance types, each with varying combinations of CPU, GPU, memory, and storage capacities. Based on your model’s resource requirements, choose an instance type that provides the necessary resources without over-provisioning. For information about available SageMaker instance types, their specifications, and guidance on selecting the right instance, refer to Ensure efficient compute resources on Amazon SageMaker.
Test using local mode – To detect failures and debug faster, it’s recommended to test the code and container (in the case of BYOC) in local mode before running the inference workload on a remote SageMaker instance. Local mode is a great way to test your scripts before running them in a SageMaker managed hosting environment (see the sketch after this list).
Optimize models to be more performant – Unoptimized models can lead to longer runtimes and use more resources. You can choose to use more or bigger instances to improve performance; however, this leads to higher costs. By optimizing your models to be more performant, you may be able to lower costs by using fewer or smaller instances while keeping the same or better performance characteristics. You can use Amazon SageMaker Neo with SageMaker inference to automatically optimize models. For more details and samples, see Optimize model performance using Neo.
Use tags and cost management tools – To maintain visibility into your inference workloads, it’s recommended to use tags as well as AWS cost management tools such as AWS Budgets, the AWS Billing console, and the forecasting feature of Cost Explorer. You can also explore SageMaker Savings Plans as a flexible pricing model. For more information about these options, refer to Part 1 of this series.
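As referenced in the local mode item above, here is a minimal sketch of deploying a model locally with the SageMaker Python SDK before moving to a managed endpoint. The model artifact, role, entry point, and framework versions are placeholders, and local mode assumes Docker plus the sagemaker[local] extra are installed:

```python
from sagemaker.pytorch import PyTorchModel

# Placeholder artifact and role; in local mode the container runs on your machine.
model = PyTorchModel(
    model_data="s3://my-bucket/model/model.tar.gz",
    role="arn:aws:iam::111122223333:role/SageMakerRole",
    entry_point="inference.py",
    framework_version="1.13",
    py_version="py39",
)

# instance_type="local" (or "local_gpu") runs the same container locally for fast debugging.
predictor = model.deploy(initial_instance_count=1, instance_type="local")
print(predictor.predict([[0.5, 1.2, 3.4]]))
predictor.delete_endpoint()
```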

Conclusion
In this post, we provided guidance on cost analysis and best practices when using SageMaker inference options. As machine learning establishes itself as a powerful tool across industries, training and running ML models needs to remain cost-effective. SageMaker offers a wide and deep feature set for facilitating each step in the ML pipeline and provides cost optimization opportunities without impacting performance or agility. Reach out to your AWS team for cost guidance on your SageMaker workloads.

Refer to the following posts in this series for more information about optimizing cost for SageMaker:

Part 1: Overview
Part 2: SageMaker notebooks and SageMaker Studio
Part 3: Processing and Data Wrangler jobs
Part 4: Training jobs
Part 5: Hosting

About the Authors
Deepali Rajale is a Senior AI/ML Specialist at AWS. She works with enterprise customers providing technical guidance with best practices for deploying and maintaining AI/ML solutions in the AWS ecosystem. She has worked with a wide range of organizations on various deep learning use cases involving NLP and computer vision. She is passionate about empowering organizations to leverage generative AI to enhance their user experience. In her spare time, she enjoys movies, music, and literature.
Uri Rosenberg is the AI & ML Specialist Technical Manager for Europe, Middle East, and Africa. Based out of Israel, Uri works to empower enterprise customers on all things ML to design, build, and operate at scale. In his spare time, he enjoys cycling, hiking, and rock and roll climbing.

Analyze Amazon SageMaker spend and determine cost optimization opportu …

In 2021, we launched AWS Support Proactive Services as part of the AWS Enterprise Support plan. Since its introduction, we’ve helped hundreds of customers optimize their workloads, set guardrails, and improve the visibility of their machine learning (ML) workloads’ cost and usage.
In this series of posts, we share lessons learned about optimizing costs in Amazon SageMaker. In this post, we focus on SageMaker training jobs.

Analyze Amazon SageMaker spend and determine cost optimization opportunities based on usage:

Part 1: Overview
Part 2: SageMaker notebooks and SageMaker Studio
Part 3: Processing and Data Wrangler jobs
Part 4: Training jobs
Part 5: Hosting

SageMaker training jobs
SageMaker training jobs are asynchronous batch processes with built-in features for ML model training and optimization.
With SageMaker training jobs, you can bring your own algorithm or choose from more than 25 built-in algorithms. SageMaker supports various data sources and access patterns, distributed training including heterogeneous clusters, as well as experiment management features and automatic model tuning.
The cost of a training job is based on the resources you use (instances and storage) for the duration (in seconds) that those instances are running. This includes the time training takes place and, if you’re using the warm pool feature, the keep alive period you configure. In Part 1, we showed how to get started using AWS Cost Explorer to identify cost optimization opportunities in SageMaker. You can filter training costs by applying a filter on the usage type. The names of these usage types are as follows:

REGION-Train:instanceType (for example, USE1-Train:ml.m5.large)
REGION-Train:VolumeUsage.gp2 (for example, USE1-Train:VolumeUsage.gp2)

To view a breakdown of your training costs in Cost Explorer, you can enter train: as a prefix for Usage type. If you filter only for hours used (see the following screenshot), Cost Explorer will generate two graphs: Cost and Usage. This view will help you prioritize your optimization opportunities and identify which instances are long-running and costly.

Before optimizing an existing training job, we recommend following the best practices covered in Optimizing costs for machine learning with Amazon SageMaker: test your code locally and use local mode for testing, use pre-trained models where possible, and consider managed spot training (which can reduce cost by up to 90% compared to On-Demand instances).
When an On-Demand job is launched, it goes through five phases: Starting, Downloading, Training, Uploading, and Completed. You can see those phases and descriptions on the training job’s page on the SageMaker console.

From a pricing perspective, you are charged for the Downloading, Training, and Uploading phases.
Reviewing these phases is a first step in diagnosing where to optimize your training costs. In this post, we discuss the Downloading and Training phases.
Downloading phase
In the preceding example, the Downloading phase took less than a minute. However, if data downloading is a big factor in your training cost, you should consider the data source you are using and your access method. SageMaker training jobs natively support three data sources: Amazon Elastic File System (Amazon EFS), Amazon Simple Storage Service (Amazon S3), and Amazon FSx for Lustre. For Amazon S3, SageMaker offers three managed ways for your algorithm to access the training data: File mode (data is downloaded to the instance block storage), Pipe mode (data is streamed to the instance, thereby eliminating the Downloading phase), and Fast File mode (which combines the ease of use of File mode with the performance of Pipe mode). For detailed guidance on choosing the right data source and access method, refer to Choose the best data source for your Amazon SageMaker training job.
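For instance, switching from the default File mode to Fast File mode is a small change in the SageMaker Python SDK. In this sketch, the training image URI, role, bucket, and instance type are placeholders:

```python
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

estimator = Estimator(
    image_uri="<training-image-uri>",  # placeholder training container
    role="arn:aws:iam::111122223333:role/SageMakerRole",
    instance_count=1,
    instance_type="ml.g4dn.xlarge",
)

# FastFile streams objects from S3 on demand, effectively eliminating the Downloading phase.
train_input = TrainingInput(
    s3_data="s3://my-bucket/training-data/",
    input_mode="FastFile",  # alternatives: "File" or "Pipe"
)

estimator.fit({"train": train_input})
```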
When using managed spot training, any repeated Downloading phases that occurred due to interruption are not charged (so you’re only charged for the duration of the data download one time).
It’s important to note that although SageMaker training jobs support the data sources we mentioned, they are not mandatory. In your training code, you can implement any method for downloading the training data from any source (provided that the training instance can access it). There are additional ways to speed up download time, such as using the Boto3 API with multiprocessing to download files concurrently, or using third-party libraries such as WebDataset or s5cmd for faster download from Amazon S3. For more information, refer to Parallelizing S3 Workloads with s5cmd.
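As an example of the first option, here is a hedged sketch of downloading a list of S3 objects concurrently with boto3; a thread pool stands in for the multiprocessing approach mentioned above, and the bucket name, keys, and local directory are placeholders:

```python
import os
from concurrent.futures import ThreadPoolExecutor

import boto3

s3 = boto3.client("s3")
bucket = "my-training-bucket"  # placeholder bucket
keys = [f"shards/part-{i:05d}.tar" for i in range(64)]  # placeholder object keys
local_dir = "/opt/ml/input/data/train"  # placeholder local directory
os.makedirs(local_dir, exist_ok=True)


def download(key: str) -> str:
    local_path = os.path.join(local_dir, os.path.basename(key))
    s3.download_file(bucket, key, local_path)
    return local_path


# Download shards in parallel instead of sequentially.
with ThreadPoolExecutor(max_workers=16) as pool:
    for path in pool.map(download, keys):
        print("downloaded", path)
```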
Training phase
Optimizing the Training phase cost consists of optimizing two vectors: choosing the right infrastructure (instance family and size), and optimizing the training itself. We can roughly divide training instances into two categories: accelerated GPU-based, mostly for deep learning models, and CPU-based for common ML frameworks. For guidance on selecting the right instance family for training, refer to Ensure efficient compute resources on Amazon SageMaker. If your training requires GPU instances, we recommend referring to the video How to select Amazon EC2 GPU instances for deep learning.
As general guidance, if your workload does require an NVIDIA GPU, we found customers gain significant cost savings with two Amazon Elastic Compute Cloud (Amazon EC2) instance types: ml.g4dn and ml.g5. The ml.g4dn instance is equipped with the NVIDIA T4 and offers a particularly low cost per GB of GPU memory. The ml.g5 instance is equipped with the NVIDIA A10G Tensor Core GPU and has the lowest cost per FP32 CUDA FLOP.
AWS offers specific cost saving features for deep learning training:

Amazon SageMaker Training Compiler, which implements optimizations to reduce training time on GPU instances
Trn1 instances, powered by AWS Trainium accelerators, which are purpose built for high-performance training while offering up to 50% cost-to-train savings over comparable GPU-based instances

In order to right-size and optimize your instance, you should first look at the Amazon CloudWatch metrics the training jobs are generating. For more information, refer to SageMaker Jobs and Endpoint Metrics. You can further use CloudWatch custom algorithm metrics to monitor the training performance.
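A minimal sketch of pulling instance utilization for a training job from CloudWatch follows; the job name, time window, and host value are placeholders, and the namespace and dimension shown follow the SageMaker jobs metrics documentation (verify them for your setup):

```python
from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch")

job_name = "my-training-job"  # placeholder training job name
end = datetime.utcnow()
start = end - timedelta(hours=3)

for metric in ["CPUUtilization", "GPUUtilization", "MemoryUtilization"]:
    stats = cloudwatch.get_metric_statistics(
        Namespace="/aws/sagemaker/TrainingJobs",
        MetricName=metric,
        Dimensions=[{"Name": "Host", "Value": f"{job_name}/algo-1"}],
        StartTime=start,
        EndTime=end,
        Period=300,
        Statistics=["Average", "Maximum"],
    )
    for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
        print(metric, point["Timestamp"], point["Average"], point["Maximum"])
```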

These metrics can indicate bottlenecks or over-provisioning of resources. For example, if you observe high CPU utilization with low GPU utilization, you can address the issue by using heterogeneous clusters. Another example is consistently low CPU utilization throughout the job duration, which suggests you can reduce the instance size.
If you’re using distributed training, you should test different distribution methods (tower, Ring-AllReduce, mirrored, and so on) to validate maximum utilization and fine-tune your framework parameters accordingly (for an example, see Best practices for TensorFlow 1.x acceleration training on Amazon SageMaker). It’s important to highlight that you can use the SageMaker distribution API and libraries like SageMaker Distributed Data Parallel, SageMaker Model Parallel, and SageMaker Sharded Data Parallel, which are optimized for AWS infrastructure and help reduce training costs.
Note that distributed training doesn’t necessarily scale linearly and might introduce some overhead, which will affect the overall runtime.
For deep learning models, another optimization technique is using mixed precision. Mixed precision can speed up training, thereby reducing both training time and memory usage with minimal to no impact on model accuracy. For more information, see the Train with Data Parallel and Model Parallel section in Distributed Training in Amazon SageMaker.
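For example, in PyTorch, mixed precision is typically enabled with autocast and a gradient scaler. The toy model, data, and hyperparameters below are placeholders standing in for a real training script:

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Toy model and random data stand in for a real training setup.
model = nn.Sequential(nn.Linear(256, 512), nn.ReLU(), nn.Linear(512, 10)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

for step in range(10):
    x = torch.randn(64, 256, device=device)
    y = torch.randint(0, 10, (64,), device=device)

    optimizer.zero_grad(set_to_none=True)
    # The forward pass runs in float16 where safe; master weights stay in float32.
    with torch.autocast(device_type=device, dtype=torch.float16, enabled=(device == "cuda")):
        loss = loss_fn(model(x), y)

    # The scaler guards against float16 gradient underflow.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```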
Finally, optimizing framework-specific parameters can have a significant impact in optimizing the training process. SageMaker automatic model tuning finds hyperparameters that perform the best, as measured by an objective metric that you choose. Setting the training time as an objective metric and framework configuration as hyperparameters can help remove bottlenecks and reduce overall training time. For an example of optimizing the default TensorFlow settings and removing a CPU bottleneck, refer to Aerobotics improves training speed by 24 times per sample with Amazon SageMaker and TensorFlow.
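A hedged sketch of such a tuning job with the SageMaker Python SDK, treating framework settings as hyperparameters and minimizing a training-time metric emitted by your script; the estimator is assumed to exist already, and the metric name, regex, and parameter names are illustrative, not prescribed:

```python
from sagemaker.tuner import CategoricalParameter, HyperparameterTuner, IntegerParameter

# `estimator` is an existing SageMaker Estimator whose training script logs a line
# such as "epoch_seconds: 123.4" that the regex below can capture.
tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="epoch_seconds",
    objective_type="Minimize",
    metric_definitions=[{"Name": "epoch_seconds", "Regex": "epoch_seconds: ([0-9\\.]+)"}],
    hyperparameter_ranges={
        # Framework/configuration knobs treated as hyperparameters (illustrative names).
        "num_data_workers": IntegerParameter(2, 16),
        "intra_op_threads": IntegerParameter(1, 8),
        "mixed_precision": CategoricalParameter(["true", "false"]),
    },
    max_jobs=20,
    max_parallel_jobs=4,
)

tuner.fit({"train": "s3://my-bucket/training-data/"})  # placeholder S3 location
```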
Another opportunity for optimizing both download and processing time is to consider training on a subset of your data. If your data consists of multiple duplicate entries or features with low information gain, you might be able to train on a subset of data and reduce downloading and training time as well as use a smaller instance and Amazon Elastic Block Store (Amazon EBS) volume. For an example, refer to Use a data-centric approach to minimize the amount of data required to train Amazon SageMaker models. Also, Amazon SageMaker Data Wrangler can simplify the analysis and creation of training samples. For more information, refer to Create random and stratified samples of data with Amazon SageMaker Data Wrangler.
SageMaker Debugger
To ensure efficient training and resource utilization, SageMaker can profile your training job using Amazon SageMaker Debugger. Debugger offers built-in rules to alert on common issues affecting your training, such as CPU bottlenecks, GPU memory increase, or I/O bottlenecks, and you can also create your own rules. You can access and analyze the generated report in Amazon SageMaker Studio. For more information, refer to Amazon SageMaker Debugger UI in Amazon SageMaker Studio Experiments. The following screenshot shows the Debugger view in Studio.

You can drill down into the Python operators and functions (the Top operations on GPU section) that are run to perform the training job. The Debugger built-in rules for profiling watch for framework operation-related issues, including excessive training initialization time due to data downloading before training starts and step duration outliers in training loops. Note that although using the built-in rules is free, costs for custom rules apply based on the instance you configure for the duration of the training job and the storage attached to it.
Conclusion
In this post, we provided guidance on cost analysis and best practices when training ML models using SageMaker training jobs. As machine learning establishes itself as a powerful tool across industries, training and running ML models needs to remain cost-effective. SageMaker offers a wide and deep feature set for facilitating each step in the ML pipeline and provides cost optimization opportunities without impacting performance or agility.

Refer to the following posts in this series for more information about optimizing cost for SageMaker:

Part 1: Overview
Part 2: SageMaker notebooks and SageMaker Studio
Part 3: Processing and Data Wrangler jobs
Part 4: Training jobs
Part 5: Hosting

About the Authors
Deepali Rajale is a Senior AI/ML Specialist at AWS. She works with enterprise customers providing technical guidance with best practices for deploying and maintaining AI/ML solutions in the AWS ecosystem. She has worked with a wide range of organizations on various deep learning use cases involving NLP and computer vision. She is passionate about empowering organizations to leverage generative AI to enhance their user experience. In her spare time, she enjoys movies, music, and literature.
Uri Rosenberg is the AI & ML Specialist Technical Manager for Europe, Middle East, and Africa. Based out of Israel, Uri works to empower enterprise customers on all things ML to design, build, and operate at scale. In his spare time, he enjoys cycling, hiking, and increasing entropy.

Stability AI Releases StableStudio: An Open Source Design Suite For Ge …

The innovative AI startup Stability AI, renowned for its text-to-image model Stable Diffusion, has made headlines with the announcement of its new project, StableStudio, an open-source version of its commercial AI-powered design software, DreamStudio. The move reflects the company’s stated aim of collaborative development and a bet that an open community can move faster in generative AI and art than any individual company’s closed-door efforts. It is also seen as a way to stay competitive following recent generative AI investments from tech giants like Google, Microsoft, and Amazon.

In a recent post, Stability AI expressed its views on the future of generative AI.

They believe an open, collaborative, and community-driven development would better facilitate the expansion of generative AI. They expressed their vision of working with the broader community to create a best-in-class user interface, granting users full control over the creative potential of generative AI.

DreamStudio started as an animation studio built around the open-source generative art model Disco Diffusion, then gradually shifted its focus to image generation with the introduction of Stable Diffusion. That shift put DreamStudio in close competition with rival generative image platforms like Midjourney and NightCafe.

While StableStudio and DreamStudio share many similarities, there are some important differences. StableStudio does not include DreamStudio branding or Stability-specific account features such as billing and API management. In addition, the backend API calls have been replaced by a plug-in system.

Even though StableStudio shares a vision of collaborative development, some skeptics view the release as an attempt by Stability AI to outsource the development of DreamStudio to the open-source community. While this perspective isn’t entirely baseless, it is also true that Stability AI is under considerable pressure to monetize its diverse efforts, spanning art, animation, biomedicine, and generative audio.

Stability AI’s CEO, Emad Mostaque, has hinted at taking the company public with an initial public offering (IPO). Interestingly, according to recent reports, despite raising over $100 million in venture capital in October last year at a valuation exceeding $1 billion, the company has been slow to generate revenue and is rapidly burning through its cash reserves.

This new approach also comes with an existential challenge for Stability AI. The company did not develop Stable Diffusion entirely in-house; it partnered with research organizations to create and build upon it, focusing primarily on providing cloud access to the computational power required for training AI models rather than developing its own models.

That approach is evolving. Several weeks ago, Stability AI released a suite of text-generating AI models aimed at competing with systems like OpenAI’s GPT-4 and ChatGPT. On top of that, it unveiled Stable Diffusion XL (SDXL), an enhanced version of the original model that introduces significant improvements, such as better generation of hands.

This bold open-sourcing move aligns strategically with the company’s ongoing efforts to secure more funding. Time will tell how it turns out for them.

Check out the Project Page.


Researchers from UC Berkeley Introduce Gorilla: A Finetuned LLaMA-base …

A recent breakthrough in the field of Artificial Intelligence is the introduction of Large Language Models (LLMs). These models enable machines to understand language more effectively, making the most of Natural Language Processing (NLP) and Natural Language Understanding (NLU). They perform well on a wide range of tasks, including text summarization, question answering, content generation, and language translation. They understand complex textual prompts, even those requiring reasoning and logic, and identify patterns and relationships within the data.

Though language models have shown incredible performance and developed significantly in recent times, demonstrating competence in a variety of tasks, they still struggle to use tools through API calls efficiently. Even prominent LLMs like GPT-4 struggle to generate precise input arguments and frequently recommend inappropriate API calls. To address this issue, researchers from UC Berkeley and Microsoft Research have proposed Gorilla, a finetuned LLaMA-based model that beats GPT-4 at producing API calls. Gorilla helps choose the appropriate API, improving LLMs’ capacity to work with external tools to carry out particular tasks.

The team of researchers has also created APIBench, a dataset made up of a sizable corpus of APIs with overlapping functionality. The dataset was built by crawling public model hubs such as TorchHub, TensorHub, and HuggingFace for their ML APIs. Every API from TorchHub and TensorHub is included, and the top 20 models from HuggingFace are chosen for each task category. Additionally, the researchers generate ten synthetic user query prompts for each API using the self-instruct method.

Using this APIBench dataset and document retrieval, the researchers finetuned Gorilla. The 7-billion-parameter model outperforms GPT-4 on the correctness of API calls and reduces hallucination errors. The document retriever’s effective integration with Gorilla demonstrates the possibility for LLMs to use tools more precisely. Gorilla’s improved API call generation and its ability to adapt to documentation changes improve the applicability and dependability of the model’s results. This development is important because it allows LLMs to keep up with regularly updated documentation, giving users more accurate and current information.

One of the examples shared by the researchers shows how Gorilla correctly recognizes tasks and offers fully qualified API results. In the API calls generated by the models, GPT-4 produced API requests for hypothetical models, demonstrating a lack of comprehension of the task, while Claude chose the wrong library, showing an inability to identify the right resources. Gorilla, in contrast, correctly recognized the task. Gorilla thus differs from GPT-4 and Claude in that its API call generation is accurate, demonstrating both its enhanced performance and its task comprehension.

In conclusion, Gorilla is a notable addition to the landscape of language models because it directly addresses the problem of writing API calls. Its capabilities reduce problems related to hallucination and reliability.

Check out the Paper, Github Link, and Project Page.


Meet LIMA: A New 65B Parameter LLaMa Model Fine-Tuned On 1000 Carefull …

Language models develop general-purpose representations transferable to almost any language interpretation or generation job by being pretrained to predict the next token at an astounding scale. Different approaches to aligning language models have been put forth to facilitate this transfer, with particular emphasis on instruction tuning over sizable datasets with millions of examples and, more recently, reinforcement learning from human feedback (RLHF) gathered over millions of interactions with human annotators. For existing alignment techniques to function at ChatGPT level, large compute and specialized data resources are needed.

However, the researchers show that with a good pretrained language model, very strong performance can be obtained by fine-tuning on just 1,000 properly chosen training instances. Their hypothesis is that alignment can be a quick and simple process in which the model learns the format or style for engaging users, surfacing the skills and knowledge already acquired during pretraining. To verify this idea, they collect 1,000 instances that resemble authentic user prompts paired with excellent replies. They choose 750 of the best questions and responses from online discussion boards like Stack Exchange and wikiHow, evaluating them for quality and variety.

They also manually compose 250 instances of questions and answers, emphasizing a consistent response style in the vein of an AI assistant and optimizing for task diversity. Researchers from Meta AI, Carnegie Mellon University, the University of Southern California, and Tel Aviv University train LIMA, a 65B-parameter LLaMa model, on this collection of 1,000 examples. Three hundred difficult test questions are used to compare LIMA against contemporary language models and products. In a study of human preference, LIMA surpasses OpenAI’s DaVinci003, which was trained with RLHF, as well as a 65B-parameter replica of Alpaca, which was trained on 52,000 samples.

Although humans frequently prefer GPT-4, Claude, and Bard replies over LIMA responses, this is not always the case; LIMA yields equivalent or preferable results in 43%, 46%, and 58% of cases, respectively. Repeating the human-preference annotations with GPT-4 as the annotator confirms these findings. When LIMA replies are evaluated on an absolute scale, 88% satisfy the prompt’s requirements and 50% are rated outstanding. Ablation tests show significant gains from improving data quality and sharply diminishing returns from increasing data quantity without also increasing prompt diversity.

Furthermore, they discover that LIMA can carry on coherent multi-turn dialogue despite having no dialogue examples in its training data, and that including 30 hand-crafted dialogue chains in training further enhances this capacity. Overall, these results show the effectiveness of pretraining and its relative value over reinforcement learning and large-scale instruction tuning. They demonstrate how a robust pretrained language model can be tuned to provide outstanding, competitive results on various prompts using 1,000 well-chosen samples. There are, however, drawbacks to this strategy.

The mental effort required to create such examples is enormous and difficult to scale up. Second, while LIMA normally provides strong replies, an unlucky sample during decoding or an adversarial prompt can often result in a weak response, making LIMA less resilient than product-grade models. Nevertheless, the evidence presented in this work shows that it is possible to address the difficult alignment problem in a straightforward way.

Check out the Pre-Print Paper.


MIT Researchers Present A new computer vision system turns any shiny o …

Valuable and often concealed information about one’s immediate surroundings can be gleaned from an object’s reflection. By repurposing them as cameras, one can perform previously inconceivable imaging feats, such as looking through walls or up into the sky. This is challenging because several factors influence reflections, including the object’s geometry, the material’s qualities, the 3D environment, and the observer’s viewpoint. By internally deconstructing the object’s geometry and radiance from the specular radiance being reflected onto it, humans can derive depth and semantic clues about the occluded portions of the surroundings.

Computer vision researchers at MIT and Rice have developed a method of using reflections to produce images of the real environment. Using reflections, they transform shiny objects into “cameras,” giving the impression that the user is gazing at the world through the “lenses” of commonplace items like a ceramic coffee cup or a metallic paperweight.

The method used by the researchers involves transforming shiny objects of undetermined geometry into radiance-field cameras. The main idea is to use the object’s surface as a digital sensor to record reflected light from the surrounding environment in two dimensions.

The researchers demonstrate that recovering the environment’s radiance field enables novel view synthesis: rendering views of the scene that are directly visible only to the glossy object, not to the observer. Furthermore, the radiance field can be used to image occluders created by nearby objects in the scene. The method is trained end to end on multiple photographs of the object to simultaneously estimate its geometry, diffuse radiance, and the radiance field of its 5D environment.

The research aims to separate the object from its reflections so that the object may “see” the world as if it were a camera and record its surroundings. Computer vision has struggled with reflections for some time because they are a distorted 2D representation of a 3D scene whose shape is unknown.

Researchers model the object’s surface as a virtual sensor, collecting the 2D projection of the 5D environment radiance field around the object to create a 3D representation of the world as the thing sees it. Most of the environment’s radiance field is obscured except via the object’s reflections. Beyond field-of-view, novel-view synthesis, or the rendering of novel views that are only directly visible to the glossy object in the scene but not to the observer, is made possible by the use of environment radiance fields, which also allow for depth and radiance estimation from the object to its surroundings.

In summing up, the team did the following:

They demonstrate how implicit surfaces can be transformed into virtual sensors with the ability to capture 3D images of their environments using only virtual cones.

Together, they calculate the object’s 5D ambient radiance field and estimate its diffuse radiance.

They demonstrate how to use the light field of the surrounding environment to generate fresh viewpoints invisible to the human eye.

This project aims to reconstruct the 5D radiance field of the surroundings from many photographs of a shiny item whose shape and albedo are unknown. Glare from reflective surfaces reveals scene elements outside the frame of view. Specifically, the surface normals and curvature of the glossy object determine how the observer’s images are mapped onto the real world.

Researchers often lack accurate information about the object’s shape or the reflected scene, which contributes to the distortion. The glossy object’s own color and texture can also blend in with the reflections. Furthermore, it isn’t easy to discern depth in reflected scenes, since reflections are two-dimensional projections of a three-dimensional environment.

The team of researchers overcame these obstacles. They begin by photographing the shiny object from various angles, capturing a variety of reflections. ORCa (Objects as Radiance-Field Cameras) is the name of their three-stage process.

ORCa captures multiview reflections by imaging the object from various angles; these are then used to estimate the depth between the glossy object and other objects in the scene, as well as the shape of the glossy object itself. ORCa’s 5D radiance field model captures additional information about the strength and direction of light rays coming from and hitting each point in the image, which enables more precise depth estimates. Because the scene is represented as a 5D radiance field rather than a 2D image, the user can see details that corners or other obstacles would otherwise obscure. The researchers explain that once ORCa has collected the 5D radiance field, the user can position a virtual camera anywhere in the area and generate the synthetic image that camera would produce. The user might also alter the appearance of an item, say from ceramic to metallic, or insert virtual objects into the scene.

By expanding the definition of the radiance field beyond the traditional direct-line-of-sight radiance field, researchers can open new avenues of inquiry into the environment and the objects inside it. Using projected virtual views and depth, the work can open up possibilities in virtual item insertion and 3D perception, such as extrapolating information from outside the camera’s field of vision.

Check out the Paper and Project Page.


Meet QLORA: An Efficient Finetuning Approach That Reduces Memory Usage …

Large language models (LLMs) may be improved via finetuning, which also allows for adding or removing desired behaviors. However, finetuning big models is prohibitively costly; for example, a LLaMA 65B parameter model consumes more than 780 GB of GPU RAM when finetuned in standard 16-bit mode. Although more recent quantization approaches can lessen the memory footprint of LLMs, these methods only work for inference and fail during training. Researchers from the University of Washington developed QLORA, which quantizes a pretrained model to 4-bit precision using a cutting-edge, high-fidelity scheme and then adds a sparse set of learnable low-rank adapter weights that are updated by backpropagating gradients through the quantized weights. They show for the first time that a quantized 4-bit model can be finetuned without degrading performance.

Compared to a 16-bit fully finetuned baseline, QLORA reduces the average memory needs of finetuning a 65B parameter model from more than 780 GB of GPU RAM to 48 GB without sacrificing runtime or predictive performance. The largest publicly accessible models to date are now fine-tunable on a single GPU, a huge change in the accessibility of LLM finetuning. The team trains the Guanaco family of models using QLORA; their largest model reaches 99.3% of ChatGPT’s performance on the Vicuna benchmark after 24 hours of finetuning on a single professional GPU, effectively closing the gap to ChatGPT. The second-best model reaches 97.8% of ChatGPT’s performance level on the Vicuna benchmark while being trainable in less than 12 hours on a single consumer GPU.

The following techniques in QLORA are intended to lower memory use without compromising performance: (1) 4-bit NormalFloat, a quantization data type for normally distributed data that is information-theoretically optimal and produces better empirical results than 4-bit Integers and 4-bit Floats. (2) Double Quantization, which quantizes the quantization constants themselves, saving on average 0.37 bits per parameter (around 3 GB for a 65B model). (3) Paged Optimizers, which use NVIDIA unified memory to prevent memory spikes caused by gradient checkpointing when processing a mini-batch with a long sequence. With these techniques, their smallest Guanaco model (7B parameters) uses under 5 GB of memory while outperforming a 26 GB Alpaca model on the Vicuna benchmark by more than 20 percentage points.
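A minimal sketch of this recipe as exposed in the Hugging Face ecosystem (bitsandbytes plus PEFT); the base model name, adapter rank, and target modules below are illustrative assumptions, not the paper’s exact configuration:

```python
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "huggyllama/llama-7b"  # placeholder base model

# 4-bit NormalFloat quantization with double quantization, as described above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Low-rank adapters are the only trainable parameters; the 4-bit base stays frozen.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # illustrative choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```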

They incorporate these contributions into a refined LoRA strategy that includes adapters at every network layer and therefore nearly eliminates the accuracy trade-offs identified in earlier work. QLORA’s efficiency makes it possible to analyze instruction finetuning and chatbot performance across model sizes in greater detail than conventional finetuning would allow, given its memory cost. As a result, the team trains over a thousand models using a variety of instruction-tuning datasets, model architectures, and parameter counts ranging from 80M to 65B. They demonstrate that QLORA restores 16-bit performance, train Guanaco, an advanced chatbot, and examine patterns in the learned models.

First, they find that data quality matters considerably more than dataset size for instruction following: a 9k-sample dataset (OASST1) outperforms a 450k-sample dataset (FLAN v2, subsampled) on chatbot performance. Second, they demonstrate that strong Massive Multitask Language Understanding (MMLU) benchmark performance does not always translate into strong Vicuna chatbot benchmark performance, and vice versa; in other words, dataset suitability matters more than scale for a given task. They also offer a thorough evaluation of chatbot performance using human raters and GPT-4.

Models compete against one another in tournament-style matches to determine the best response to a given prompt, with GPT-4 or human annotators deciding which player wins each game. Elo scores, computed from the tournament outcomes, are used to rank chatbot performance. They find that GPT-4 and human judgments mostly agree on the ranking of model performance, but there are also some areas of stark divergence. They therefore draw attention to the fact that model-based evaluation carries uncertainties, even though it is a cheaper option than human annotation.

They add qualitative analysis of Guanaco models to their chatbot benchmark findings, identifying instances of success and failure that the quantitative benchmarks did not capture. They publish all model generations with GPT-4 and human annotations to aid future research. They incorporate their techniques into the Hugging Face transformers stack, open-source their software and CUDA kernels, and make them widely available. For 32 distinct open-sourced, improved models, they provide a collection of adapters for model sizes of 7/13/33/65B trained on 8 different instruction-following datasets. The code repository is public, along with a demo that can be hosted on Colab.

Check out the Paper, Code, and Colab.


LLMs Outperform Reinforcement Learning- Meet SPRING: An Innovative Pro …

SPRING is an LLM-based policy that outperforms Reinforcement Learning algorithms in an interactive environment requiring multi-task planning and reasoning. 

A group of researchers from Carnegie Mellon University, NVIDIA, Ariel University, and Microsoft have investigated the use of Large Language Models (LLMs) for understanding and reasoning with human knowledge in the context of games. They propose a two-stage approach called SPRING, which involves studying an academic paper and then using a Question-Answer (QA) framework to justify the knowledge obtained.

More details about SPRING

In the first stage, the authors read the LaTeX source code of the original paper by Hafner (2021) to extract prior knowledge. They employed an LLM to extract relevant information, including game mechanics and desirable behaviors documented in the paper. They then utilized a QA summarization framework similar to Wu et al. (2023) to generate QA dialogue based on the extracted knowledge, enabling SPRING to handle diverse contextual information.

The second stage focused on in-context chain-of-thought reasoning using LLMs to solve complex games. They constructed a directed acyclic graph (DAG) as a reasoning module, where questions are nodes and dependencies between questions are represented as edges. For example, the question “For each action, are the requirements met?” is linked to the question “What are the top 5 actions?” within the DAG, establishing a dependency from the latter question to the former.

LLM answers are computed for each node/question by traversing the DAG in topological order. The final node in the DAG represents the question about the best action to take, and the LLM’s answer is directly translated into an environmental action.
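To make the traversal concrete, here is a hedged toy sketch of the DAG idea; the questions, edges, and the ask_llm stub are hypothetical illustrations, not the paper’s exact prompts or code:

```python
from graphlib import TopologicalSorter


def ask_llm(question: str, parent_answers: dict[str, str]) -> str:
    """Hypothetical LLM call; in SPRING this would prompt the model with the game
    context plus the answers to the question's parent nodes."""
    return f"<answer to: {question}>"


# Map each question to its prerequisite questions (illustrative subset of the DAG).
questions = {
    "What are the top 5 actions?": [],
    "For each action, are the requirements met?": ["What are the top 5 actions?"],
    "What is the best action to take?": ["For each action, are the requirements met?"],
}

# Traverse the DAG in topological order so parent answers exist before they are needed.
answers: dict[str, str] = {}
for question in TopologicalSorter(questions).static_order():
    parents = {p: answers[p] for p in questions[question]}
    answers[question] = ask_llm(question, parents)

# The final node's answer is mapped to an environment action.
print(answers["What is the best action to take?"])
```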

Experiments and Results

The Crafter Environment, introduced by Hafner (2021), is an open-world survival game with 22 achievements organized in a tech tree of depth 7. The game is represented as a grid world with top-down observations and a discrete action space consisting of 17 options. The observations also provide information about the player’s current inventory state, including health points, food, water, rest levels, and inventory items.

The authors compared SPRING and popular RL methods on the Crafter benchmark. Subsequently, experiments and analysis were carried out on different components of their architecture to examine the impact of each part on the in-context “reasoning” abilities of the LLM.

Source: https://arxiv.org/pdf/2305.15486.pdf

The authors compared the performance of various RL baselines to SPRING with GPT-4, conditioned on the environment paper by Hafner (2021). SPRING surpasses previous state-of-the-art (SOTA) methods by a significant margin, achieving an 88% relative improvement in game score and a 5% improvement in reward compared to the best-performing RL method by Hafner et al. (2023).

Notably, SPRING leverages prior knowledge from reading the paper and requires zero training steps, while RL methods typically necessitate millions of training steps.

Source: https://arxiv.org/pdf/2305.15486.pdf

The above figure represents a plot of unlock rates for different tasks, comparing SPRING to popular RL baselines. SPRING, empowered by prior knowledge, outperforms RL methods by more than ten times on achievements such as “Make Stone Pickaxe,” “Make Stone Sword,” and “Collect Iron,” which are deeper in the tech tree (up to depth 5) and challenging to reach through random exploration. 

Moreover, SPRING performs perfectly on achievements like “Eat Cow” and “Collect Drink.” At the same time, model-based RL frameworks like Dreamer-V3 have significantly lower unlock rates (over five times lower) for “Eat Cow” due to the challenge of reaching moving cows through random exploration. Importantly, SPRING does not take action “Place Stone” since it was not discussed as beneficial for the agent in the paper by Hafner (2021), even though it could be easily achieved through random exploration.

Limitations

One limitation of using an LLM for interacting with the environment is the need for object recognition and grounding. However, this limitation doesn’t exist in environments that provide accurate object information, such as contemporary games and virtual reality worlds. While pre-trained visual backbones struggle with games, they perform reasonably well in real-world-like environments. Recent advancements in visual-language models indicate potential for reliable solutions in visual-language understanding in the future.

Conclusion

In summary, the SPRING framework showcases the potential of Large Language Models (LLMs) for game understanding and reasoning. By leveraging prior knowledge from academic papers and employing in-context chain-of-thought reasoning, SPRING outperforms previous state-of-the-art methods on the Crafter benchmark, achieving substantial improvements in game score and reward. The results highlight the power of LLMs in complex game tasks and suggest future advancements in visual-language models could address existing limitations, paving the way for reliable and generalizable solutions.

Check out the Paper.


UC Berkeley Researchers Introduce Video Prediction Rewards (VIPER): An …

Designing a reward function by hand is time-consuming and can result in unintended consequences. This is a major roadblock in developing reinforcement learning (RL)-based generic decision-making agents.

Previous video-based learning methods have rewarded agents whose current observations most resemble those of experts. Because rewards are conditioned solely on the current observation, they cannot capture meaningful activities over time, and generalization is hindered by adversarial training techniques that lead to mode collapse.

UC Berkeley researchers have developed a novel method for extracting rewards from video prediction models called Video Prediction Rewards for reinforcement learning (VIPER). VIPER can learn reward functions from raw videos and generalize to untrained domains.

First, VIPER uses expert-generated videos to train a video prediction model. The video prediction model is then used to train a reinforcement learning agent to maximize the log-likelihood of its trajectories under the video model, which pushes the distribution of the agent’s trajectories toward the distribution of the video model. By using the video model’s likelihoods directly as a reward signal, the agent can be trained to follow a trajectory distribution similar to the video model’s. Unlike rewards defined at the observation level, rewards provided by video models quantify the temporal consistency of behavior. Evaluating likelihoods is also much faster than performing video model rollouts, which allows shorter training times and more interactions with the environment.
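Conceptually, the per-step reward is the video model’s log-likelihood of the agent’s next observation given recent frames. A hedged, pseudocode-level sketch follows; video_model and its log_prob interface are hypothetical stand-ins, not the released implementation:

```python
import torch


def viper_reward(video_model, past_frames: torch.Tensor, next_frame: torch.Tensor) -> torch.Tensor:
    """Hypothetical interface: the pretrained video prediction model scores how likely
    the agent's next observation is under the expert video distribution."""
    with torch.no_grad():
        log_likelihood = video_model.log_prob(next_frame, context=past_frames)
    # Used directly as the RL reward (optionally combined with an exploration bonus).
    return log_likelihood
```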

Across 15 DMC tasks, 6 RLBench tasks, and 7 Atari tasks, the team conducts a thorough study and demonstrates that VIPER can achieve expert-level control without using task rewards. According to the findings, VIPER-trained RL agents beat adversarial imitation learning across the board. Because VIPER only supplies the reward signal, it is agnostic to which RL agent is used. Video models already generalize to arm/task combinations not encountered during training, even in the small-dataset regime.

The researchers think using big, pre-trained conditional video models will make more flexible reward functions possible. With the help of recent breakthroughs in generative modeling, they believe their work provides the community with a foundation for scalable reward specification from unlabeled films.

Check out the Paper and Project.


Unleash Your Potential: 8 Must-Try AI Tools for Instant Results

TLDV

tl;dv is an AI-powered platform designed to improve the effectiveness of remote meetings. Google Meet and Zoom meetings may be captured, highlighted, edited, shared, and downloaded. Users may also benefit from statistics that light up their sessions’ workings. tl;dv has a wide range of applications, including the ability to record online meetings for free, to cut down on meeting time, to have more effective meetings, to share meeting insights faster, to scale recruitment, onboarding, and training, to amplify the customer’s voice, to increase transparency, to facilitate cross-functional collaboration, to work across time zones, and to ensure effective follow-up.

Kickresume 

Around 2,000,000 people from all around the globe have used Kickresume to write their resumes and cover letters. It offers various resources, including professional-quality templates vetted by HR professionals, to help users craft their strongest resumes. Kickresume protects the privacy of its users by letting them choose which cookies to accept. It has resources, including a resume and cover letter builder, a website creator, an artificial intelligence resume writer, an artificial intelligence cover letter writer, a resume checker, a jobs board called Pyjama, and resume and cover letter samples. You may use the instructions to improve your resume and cover letter writing skills.

10Web 

10Web’s AI-enhanced WordPress platform streamlines the development and maintenance processes for websites. Tools like an automated WordPress host, BuddyBoss hosting, one-click migration, real-time backup and security, and a page speed optimizer are all part of the platform. Using the Elementor-based drag-and-drop editor, users may easily construct or replicate any website with AI in the AI Builder. With extensions for popular WordPress plugins like Yoast, Classic Editor, and more, the AI Assistant makes it easy to produce high-quality material quickly and easily. The fastest, totally automated WordPress hosting, powered by Google Cloud, is provided by Automated WordPress Hosting. With PageSpeed Booster, websites may achieve a PageSpeed score of 90 or higher, boost their Core Web Vitals, and run more efficiently. The platform also has e-commerce widgets, menus, forms, sliders, 50+ premium widgets, full-site construction features, and 20+ content templates.

CHARTGPT 

The artificial intelligence tool CHARTGPT automatically generates charts from user-supplied text. The program reads your data description and then employs React, Next.js, OpenAI, and Tailwind CSS to render eye-catching charts. Users may save time and work by just typing in their data and having the tool turn it into a graph. Presenters, data analysts, and researchers will find it particularly useful for creating concise data visualizations. CHARTGPT’s use of OpenAI guarantees reliable output charts and the capability to process complicated and unstructured data. The results are presented attractively, facilitating both comprehension and dissemination.

Yatter Plus

Yatter Plus is an artificial intelligence–driven WhatsApp chatbot that alleviates your daily stresses by responding promptly to your inquiries and worries. Yatter Plus is like having a personal assistant at your disposal all the time, with the added convenience of communicating with it through text. The AI-powered helper has several useful functions, such as providing instantaneous responses to any question, translating across languages, doing computations, and more. Yatter Plus eliminates the need to search several search results before finding what you’re looking for. Therefore, Yatter Plus helps you save time, allowing you to work more effectively. You can use Yatter Plus on your favorite messaging software, WhatsApp, and it’s really simple.

Glasp 

Glasp is an artificial intelligence (AI)-driven app that sums up YouTube videos in seconds. It uses cutting-edge NLP and ML technology to provide consumers with a clear, short synopsis of the video in question. Glasp combines the strengths of ChatGPT and YouTube to create summaries that help users save time. The extension may be downloaded and installed on Chrome, Safari, and other web browsers in seconds. Glasp may be accessed on social networking sites like Twitter, LinkedIn, Medium, Substack, Discord, and Slack. Glasp summarizes YouTube videos and allows users to highlight and organize phrases and ideas from the web and access the knowledge shared by individuals with similar interests.

Monic AI

Monic.ai is developing a digital headquarters for learning management supplemented with AI. In addition to spaced repetition and deep file search (searching inside the sentences of your picture files), this platform facilitates various learning techniques. User-generated, unstructured content like PowerPoint presentations, books, and handwritten notes may be used to create interactive learning tools like flashcards and simulated tests for students and teachers.

ChatABC

By providing customers with features like team collaboration, a library of prompts, and an always-available service, ChatABC.ai is an improved version of ChatGPT. Users may choose from many models, such as ChatGPT and GPT-4, and place conversations in distinct folders for easy access. In addition, customers may upload papers and have their inquiries regarding those documents answered by the AI assistant. The software comes with over 150 pre-made suggestions; users may also make their own. There is a free plan, paid options for individuals, small groups, and corporations, and a bespoke plan.

This article is based on this tweet thread.

Meet GANonymization: A Novel Face Anonymization Framework With Facial …

In recent years, with the exponential growth in the availability of personal data and the rapid advancement of technology, concerns regarding privacy and security have been amplified. As a result, data anonymization has become more important because it plays a crucial role in protecting people’s privacy and preventing accidental sharing of sensitive information.

Data anonymization methods like generalization, suppression, randomization, and perturbation are commonly used to protect privacy while sharing and analyzing data. However, these methods have weaknesses. Generalization can cause information loss and reduced accuracy, suppression may result in incomplete data sets, randomization techniques can leave room for re-identification attacks, and perturbation can introduce noise that impacts data quality. Striking a balance between privacy and data utility is crucial when implementing these methods to overcome their limitations effectively.

Acquiring and sharing sensitive face data can be particularly difficult, especially when making datasets publicly available. However, there are promising opportunities in using facial data for tasks such as emotion recognition. To address these challenges, a research team from Germany proposed a novel approach to face anonymization that focuses on emotion recognition.

The authors introduce GANonymization, a novel face anonymization framework that preserves facial expressions. The framework utilizes a generative adversarial network (GAN) to synthesize an anonymized version of a face based on a high-level representation.

The GANonymization framework consists of four components: face extraction, face segmentation, facial landmark extraction, and re-synthesis. In the face extraction step, the RetinaFace framework detects and extracts visible faces, which are then aligned and resized to meet the GAN's input requirements. Face segmentation removes the background so that only the face remains. Facial landmarks are extracted with a MediaPipe face-mesh model, providing an abstract representation of the facial shape, and these landmarks are projected onto a 2D image. Finally, a pix2pix GAN architecture performs the re-synthesis, trained on landmark/image pairs from the CelebA dataset. The GAN generates realistic face images from the landmark representations, preserving facial expressions while discarding identifying traits.
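For orientation, below is a rough Python sketch of that pipeline under stated assumptions: the landmark step uses MediaPipe's FaceMesh API, while `detect_and_crop_face` (standing in for RetinaFace detection, alignment, and segmentation) and the `generator` argument (a pix2pix-style network trained on landmark/photo pairs) are hypothetical placeholders, not the authors' released code.

```python
# Rough sketch of the GANonymization pipeline; helpers marked as stand-ins are
# assumptions for illustration, not the published implementation.
import cv2
import numpy as np
import mediapipe as mp


def detect_and_crop_face(image_bgr: np.ndarray, size: int = 256) -> np.ndarray:
    """Stand-in for RetinaFace detection, alignment, and background segmentation:
    here we simply resize the frame; a real pipeline would crop the detected face
    box and mask out the background."""
    return cv2.resize(image_bgr, (size, size))


def landmarks_to_image(face_bgr: np.ndarray, size: int = 256) -> np.ndarray:
    """Extract MediaPipe face-mesh landmarks and rasterize them onto a blank
    canvas, producing the abstract 2D landmark image fed to the generator."""
    canvas = np.zeros((size, size, 3), dtype=np.uint8)
    with mp.solutions.face_mesh.FaceMesh(static_image_mode=True) as face_mesh:
        results = face_mesh.process(cv2.cvtColor(face_bgr, cv2.COLOR_BGR2RGB))
    if not results.multi_face_landmarks:
        return canvas  # no face found: return an empty landmark image
    for lm in results.multi_face_landmarks[0].landmark:
        x, y = int(lm.x * size), int(lm.y * size)
        cv2.circle(canvas, (x, y), 1, (255, 255, 255), -1)
    return canvas


def anonymize(image_bgr: np.ndarray, generator) -> np.ndarray:
    """Full pass: crop -> landmark image -> re-synthesis. `generator` is an
    assumed image-to-image model that maps a landmark image to a realistic face."""
    face = detect_and_crop_face(image_bgr)
    landmark_img = landmarks_to_image(face)
    return generator(landmark_img)
```

Because only the landmark geometry survives this round trip, the synthesized face keeps the expression but not the original identity cues, which is the core idea of the framework.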

To evaluate the effectiveness of the proposed approach, the research team conducted a comprehensive experimental investigation covering anonymization performance, preservation of emotional expressions, and the impact of training an emotion recognition model on anonymized data. Anonymization performance was compared against DeepPrivacy2 on the WIDER dataset, while preservation of emotional expressions was assessed on the AffectNet, CK+, and FACES datasets. In both inference and training scenarios, the proposed approach outperformed DeepPrivacy2 at preserving emotional expressions across these datasets, and the findings likewise showed an advantage over DeepPrivacy2 in anonymization performance. These results contribute to understanding and advancing face anonymization techniques, particularly for retaining emotional information while ensuring privacy protection.

In conclusion, this article presented GANonymization, a face anonymization framework that uses a generative adversarial network (GAN) to preserve facial expressions while removing identifying traits. The experimental investigation demonstrated the approach's effectiveness in both anonymization performance and preservation of emotional expressions, outperforming the comparative method DeepPrivacy2 in both respects. These findings advance face anonymization techniques and highlight the potential for retaining emotional information while ensuring privacy protection.

Check out the Paper and Github link.

The post Meet GANonymization: A Novel Face Anonymization Framework With Facial Expression-Preserving Abilities appeared first on MarkTechPost.