Deploy a serverless ML inference endpoint of large language models using FastAPI, AWS Lambda, and Amazon API Gateway

For data scientists, moving machine learning (ML) models from proof of concept to production often presents a significant challenge. One of the main challenges can be deploying a well-performing, locally trained model to the cloud for inference and use in other applications. It can be cumbersome to manage the process, but with the right tool, you can significantly reduce the required effort.
Amazon SageMaker inference makes it easy for you to deploy ML models into production to make predictions at scale, providing a broad selection of ML infrastructure and model deployment options to help meet all kinds of ML inference needs. SageMaker Serverless Inference, which became generally available in April 2022, is a good fit for workloads that have idle periods between traffic spurts and can tolerate cold starts. The endpoints scale out automatically based on traffic and take away the undifferentiated heavy lifting of selecting and managing servers. Alternatively, you can use AWS Lambda directly to expose your models and deploy your ML applications using your preferred open-source framework, which can prove to be more flexible and cost-effective.
FastAPI is a modern, high-performance web framework for building APIs with Python. It stands out for developing serverless applications with RESTful microservices and for use cases requiring ML inference at scale across multiple industries. Its ease of use and built-in functionality, such as automatic API documentation, make it a popular choice among ML engineers for deploying high-performance inference APIs. You can define and organize your routes with FastAPI's out-of-the-box functionality to handle growing business logic as needed, test locally, host the app on Lambda, and then expose it through a single API gateway, bringing an open-source web framework to Lambda without heavy lifting or refactoring your code.
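As an illustration of this pattern, the following minimal sketch (not the code from this post's repository) shows a FastAPI app wrapped with the Mangum adapter so the same app can be tested locally and hosted as a Lambda handler behind API Gateway:

```python
# Minimal FastAPI-on-Lambda sketch; route names and messages are illustrative.
from fastapi import FastAPI
from mangum import Mangum  # adapter that translates API Gateway events to ASGI

app = FastAPI()

@app.get("/")
def root():
    # Simple health-check style route, handy for local testing with uvicorn
    return {"message": "hello world"}

# Lambda entry point: API Gateway invokes this handler, Mangum forwards to FastAPI
handler = Mangum(app)
```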
This post shows you how to easily deploy and run serverless ML inference by exposing your ML model as an endpoint using FastAPI, Docker, Lambda, and Amazon API Gateway. We also show you how to automate the deployment using the AWS Cloud Development Kit (AWS CDK).
Solution overview
The following diagram shows the architecture of the solution we deploy in this post.

Prerequisites
You must have the following prerequisites:

Python3 installed, along with virtualenv for creating and managing virtual environments in Python
aws-cdk v2 installed on your system in order to be able to use the AWS CDK CLI
Docker installed and running on your local machine

Test if all the necessary software is installed:

The AWS Command Line Interface (AWS CLI) is needed. Log in to your account and choose the Region where you want to deploy the solution.
Use the following code to check your Python version:

python3 --version

Check if virtualenv is installed for creating and managing virtual environments in Python. Strictly speaking, this is not a hard requirement, but it will make your life easier and make it easier to follow along with this post. Use the following code:

python3 -m virtualenv --version

Check if cdk is installed. This will be used to deploy our solution.

cdk --version

Check if Docker is installed. Our solution will make your model accessible through a Docker image to Lambda. To build this image locally, we need Docker.

docker --version

Make sure Docker is up and running with the following code:

docker ps

How to structure your FastAPI project using AWS CDK
We use the following directory structure for our project (ignoring some boilerplate AWS CDK code that is immaterial in the context of this post):

```

fastapi_model_serving
│
└───.venv
│
└───fastapi_model_serving
│   │   __init__.py
│   │   fastapi_model_serving_stack.py
│   │
│   └───model_endpoint
│       └───docker
│       │       Dockerfile
│       │       serving_api.tar.gz
│       │
│       └───runtime
│           └───serving_api
│               │   requirements.txt
│               │   serving_api.py
│               │
│               └───custom_lambda_utils
│                   └───model_artifacts
│                   │       ...
│                   └───scripts
│                           inference.py
│
└───templates
│   └───api
│   │       api.py
│   └───dummy
│           dummy.py
│
│   app.py
│   cdk.json
│   README.md
│   requirements.txt
│   init-lambda-code.sh

```

The directory follows the recommended structure of AWS CDK projects for Python.
The most important part of this repository is the fastapi_model_serving directory. It contains the code that will define the AWS CDK stack and the resources that are going to be used for model serving.
The fastapi_model_serving directory contains the model_endpoint subdirectory, which holds all the assets that make up our serverless endpoint: the Dockerfile used to build the Docker image that Lambda will use, the Lambda function code that uses FastAPI to handle inference requests and route them to the correct endpoint, and the artifacts of the model that we want to deploy. Specifically, model_endpoint contains the following:

docker – This subdirectory contains the following:
Dockerfile – Used to build the image for the Lambda function with all the artifacts (Lambda function code, model artifacts, and so on) in the right place so that they can be used without issues.
serving_api.tar.gz – A tarball that contains all the assets from the runtime folder that are necessary for building the Docker image. We discuss how to create the .tar.gz file later in this post.
runtime – This subdirectory contains the following:
serving_api – The code for the Lambda function and its dependencies specified in the requirements.txt file.
custom_lambda_utils – Includes an inference script that loads the necessary model artifacts so that the model can be passed to the serving_api that then exposes it as an endpoint (an illustrative sketch follows this list).
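The following is a hedged sketch of such an inference script; it is not the repository's actual inference.py, and the pipeline type, paths, and function names are assumptions based on the DistilBERT question answering model described later in this post:

```python
# Hypothetical sketch of custom_lambda_utils/scripts/inference.py
from transformers import pipeline

# Path to the model artifacts packaged into the Docker image (assumption for illustration)
MODEL_DIR = "custom_lambda_utils/model_artifacts"

def load_qa_pipeline():
    # Returns a question answering pipeline built from the packaged artifacts
    return pipeline("question-answering", model=MODEL_DIR, tokenizer=MODEL_DIR)

def predict(qa_pipeline, question: str, context: str) -> dict:
    # Returns a dict with the answer, confidence score, and character offsets
    return qa_pipeline(question=question, context=context)
```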

Additionally, we have the templates directory, which provides a template of folder structures and files where you can define your customized code and APIs following the sample we went through earlier. The templates directory contains dummy code that you can use to create new Lambda functions:

dummy – Contains the code that implements the structure of an ordinary Lambda function using the Python runtime (an illustrative sketch follows this list)
api – Contains the code that implements a Lambda function that wraps a FastAPI endpoint around an existing API gateway
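The following is a minimal sketch of what such an ordinary Lambda handler looks like; it is illustrative only and not the template's actual code:

```python
# Illustrative sketch of an ordinary Lambda handler, like templates/dummy/dummy.py
import json

def handler(event, context):
    # Echo back the incoming event; replace this with your own business logic
    return {"statusCode": 200, "body": json.dumps({"received": event})}
```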

Deploy the solution
By default, the code is deployed inside the eu-west-1 region. If you want to change the Region, you can change the DEPLOYMENT_REGION context variable in the cdk.json file.
Keep in mind, however, that the solution tries to deploy a Lambda function on top of the arm64 architecture, and that this feature might not be available in all Regions. In this case, you need to change the architecture parameter in the fastapi_model_serving_stack.py file, as well as the first line of the Dockerfile inside the Docker directory, to host this solution on the x86 architecture.
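For illustration, the architecture switch in the AWS CDK stack could look roughly like the following sketch; the construct IDs, paths, memory size, and timeout here are assumptions, not necessarily the values used in fastapi_model_serving_stack.py:

```python
# Hypothetical sketch of the relevant part of fastapi_model_serving_stack.py
from aws_cdk import Duration, Stack, aws_lambda as _lambda
from constructs import Construct

class FastapiModelServingStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        model_endpoint_fn = _lambda.DockerImageFunction(
            self, "ServingApi",
            code=_lambda.DockerImageCode.from_image_asset(
                "fastapi_model_serving/model_endpoint/docker"
            ),
            # Switch to _lambda.Architecture.X86_64 if arm64 Lambda functions are
            # not available in your Region; the Dockerfile base image must match.
            architecture=_lambda.Architecture.ARM_64,
            memory_size=4096,
            timeout=Duration.seconds(300),
        )
```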
To deploy the solution, complete the following steps:

Run the following command to clone the GitHub repository: git clone https://github.com/aws-samples/lambda-serverless-inference-fastapi
Because we want to showcase that the solution works with model artifacts that you train locally, we include a sample model artifact of a pretrained DistilBERT model from the Hugging Face model hub for a question answering task in the serving_api.tar.gz file. The download can take around 3–5 minutes. Now, let's set up the environment.
Run make prep to download the pretrained model that will be deployed from the Hugging Face model hub into the ./model_endpoint/runtime/serving_api/custom_lambda_utils/model_artifacts directory. The command also creates a virtual environment and installs all dependencies that are needed. You only need to run this command once. It can take around 5 minutes (depending on your internet bandwidth) because it needs to download the model artifacts.
Run make package_model to package the model artifacts inside a .tar.gz archive that will be used inside the Docker image built in the AWS CDK stack. Run this command whenever you change the model artifacts or the API itself so that your serving endpoint is always packaged with the most up-to-date version. With the artifacts in place, we can deploy the AWS CDK stack to your AWS account.
Run cdk bootstrap if it’s your first time deploying an AWS CDK app into an environment (account + Region combination):

make cdk_bootstrap
This stack includes resources that are needed for the toolkit’s operation. For example, the stack includes an Amazon Simple Storage Service (Amazon S3) bucket that is used to store templates and assets during the deployment process. Because we’re building Docker images locally in this AWS CDK deployment, we need to ensure that the Docker daemon is running before we can deploy this stack via the AWS CDK CLI.
To check whether or not the Docker daemon is running on your system, use the following command:

docker ps
If you don’t get an error message, you should be ready to deploy the solution.
Deploy the solution with the following command:

make deploy
This step can take around 5–10 minutes due to building and pushing the Docker image.

Troubleshooting
If you’re a Mac user, you may encounter an error when logging into Amazon Elastic Container Registry (Amazon ECR) with the Docker login, such as Error saving credentials … not implemented. For example:

exited with error code 1: Error saving credentials: error storing credentials - err: exit status 1,...dial unix backend.sock: connect: connection refused

Before you can use Lambda on top of Docker containers inside the AWS CDK, you may need to change the ~/.docker/config.json file. More specifically, you might have to change the credsStore parameter in ~/.docker/config.json to osxkeychain. That solves Amazon ECR login issues on a Mac.
Run real-time inference
After your AWS CloudFormation stack is deployed successfully, go to the Outputs tab for your stack on the AWS CloudFormation console and open the endpoint URL. Now our model is accessible via the endpoint URL and we’re ready to run real-time inference.
Navigate to the URL to check that you see the "hello world" message, and add /docs to the address to check that you can load the interactive Swagger UI page. There might be some cold start time, so you may need to wait or refresh a few times.

On the landing page of the FastAPI Swagger UI, you can run the API via the root / or via /question.
From /, you can run the API and get the "hello world" message.
From /question, you can run ML inference on the model we deployed for a question answering use case. For example, we use "What is the color of my car now?" as the question and "My car used to be blue but I painted red" as the context.

When you choose Execute, based on the given context, the model will answer the question with a response, as shown in the following screenshot.

In the response body, you can see the answer with the confidence score from the model. You could also experiment with other examples or embed the API in your existing application.
Alternatively, you can run the inference via code. Here is one example written in Python, using the requests library:

import requests

# Replace the placeholders with your API Gateway endpoint ID and Region
url = "https://<YOUR_API_GATEWAY_ENDPOINT_ID>.execute-api.<YOUR_ENDPOINT_REGION>.amazonaws.com/prod/question"

params = {
    "question": "What is the color of my car now?",
    "context": "My car used to be blue but I painted red",
}

response = requests.get(url, params=params)

print(response.text)

The code outputs a string similar to the following:

'{"score":0.6947233080863953,"start":38,"end":41,"answer":"red"}'

If you are interested in learning more about deploying generative AI and large language models on AWS, check out the following posts:

Deploy Serverless Generative AI on AWS Lambda with OpenLLaMa
Deploy large language models on AWS Inferentia2 using large model inference containers

Clean up
Inside the root directory of your repository, run the following code to clean up your resources:

make destroy

Conclusion
In this post, we showed how you can use Lambda to deploy your trained ML model using your preferred web application framework, such as FastAPI. We provided a detailed code repository that you can deploy, and you retain the flexibility to switch to whichever trained model artifacts you produce. Performance can depend on how you implement and deploy the model.
You are welcome to try it out yourself, and we’re excited to hear your feedback!

About the Authors
Tingyi Li is an Enterprise Solutions Architect from AWS based in Stockholm, Sweden, supporting customers in the Nordics. She enjoys helping customers with the architecture, design, and development of cloud-optimized infrastructure solutions. She specializes in AI and machine learning and is interested in empowering customers with intelligence in their AI/ML applications. In her spare time, she is also a part-time illustrator who writes novels and plays the piano.
Demir Catovic is a Machine Learning Engineer from AWS based in Zurich, Switzerland. He engages with customers and helps them implement scalable and fully functional ML applications. He is passionate about building and productionizing machine learning applications for customers and is always keen to explore new trends and cutting-edge technologies in the AI/ML world.

A New AI Research from the University of Maryland, College Park Develops an AI System that can Reconstruct 3D Scenes from Reflections in the Human Eye

The human eye is a remarkable organ: it enables vision while also reflecting information about the surrounding environment. We normally think of the eyes as two lenses that direct light onto the photosensitive cells of the retina, but if you look into someone else's eyes, you also see light reflected from the cornea. When you photograph another person's eyes, you effectively turn them into a pair of mirrors in the imaging system. Because the light that reaches the observer's retina and the light that reflects off their eyes come from the same source, the resulting pictures contain details about the environment the observer is viewing.

Earlier experiments have recovered a panoramic representation of the world the observer sees from an image of their two eyes. Follow-up investigations have studied applications including relighting, focused object estimation, grip position detection, and personal recognition. Given recent developments in 3D vision and graphics, the authors ask whether they can go beyond reconstructing a single panoramic environment map or spotting patterns: is it feasible to recover the observer's surroundings in three dimensions? This work addresses these questions by reconstructing a 3D scene from a sequence of eye images. The starting observation is that as our heads move naturally, our eyes capture and reflect information from several viewpoints.

Researchers from the University of Maryland offer a new technique for creating 3D reconstructions of an observer's environment from eye images, fusing earlier ground-breaking work with recent developments in neural rendering. Their method uses a stationary camera and extracts multi-view cues from eye images captured as the head moves, unlike the usual NeRF capture setup, which requires a moving camera to acquire multi-view information (frequently followed by camera pose estimation). Though conceptually simple, reconstructing a 3D NeRF from eye images is difficult in practice. The first difficulty is source separation: they must distinguish the scene reflections from the complex iris textures of human eyes.

These complicated patterns make the 3D reconstruction process more ambiguous. In contrast to the clean photographs of the scene normally assumed in regular captures, the visual signals collected here are intrinsically mixed with iris textures, which throws off pixel correspondence and makes the reconstruction technique more difficult. Estimating the corneal pose presents a second difficulty: eyes are small and hard to localize precisely from image observations, yet the precision of their positions and 3D orientations is crucial for multi-view reconstruction.

To overcome these difficulties, the authors of this study repurpose NeRF for training on eye images by adding two essential elements: a) texture decomposition, which uses a simple radial prior to make it easier to separate the iris texture from the overall radiance field, and b) eye pose refinement, which improves pose estimation accuracy despite the challenges posed by the small size of eyes. To assess the performance and efficacy of their technique, they create a synthetic dataset of a complex indoor environment with photos that capture the reflection from an artificial cornea with a realistic texture. They also use a real-world setup with several objects to take pictures of eyes. They conduct considerable experiments on synthetic and real captured eye images to support several design decisions in their methodology.

These are their main contributions: 

• They offer a brand-new technique for creating 3D reconstructions of an observer’s environment from eye scans, fusing past ground-breaking work with the most recent developments in neural rendering. 

• They considerably enhance the quality of the reconstructed radiance field by introducing a radial prior for the breakdown of iris texture in eye pictures. 

• They solve the special problem of collecting characteristics from human eyes by developing a cornea pose refining process that reduces noisy pose estimations of eyeballs. 

These developments broaden the scope of 3D scene reconstruction through neural rendering to handle partially corrupted image observations obtained from eye reflections. This creates new opportunities for research and development in the broader field of accidental imaging to reveal and capture 3D scenes outside the visible line of sight. Their website has several videos showcasing their developments in action.

Figure 1 shows the reconstruction of a radiance field using eye reflections. A person's eye is highly reflective. The authors demonstrate that, using only the reflections in the subject's eyes, it is possible to reconstruct and render the 3D scene the person is viewing from a series of frames that record a moving head.

Check Out The Paper and Project.


The Philosophy Course for ChatGPT: This AI Research Explores the Behavior of LLMs in Dialogue Agents

2023 is the year of LLMs: ChatGPT, GPT-4, LLaMA, and more, with a new model taking the spotlight one after another. These models have revolutionized the field of natural language processing and are being increasingly utilized across various domains.

LLMs possess the remarkable ability to exhibit a wide range of behaviors, including engaging in dialogue, which can lead to a compelling illusion of conversing with a human-like interlocutor. However, it is important to recognize that LLM-based dialogue agents differ significantly from human beings in several respects.

Our language skills are developed through embodied interaction with the world. We, as individuals, acquire cognitive capacities and linguistic abilities through socialization and immersion in a community of language users. This part happens faster in babies, and as we grow old, our learning process slows down; but the fundamentals stay the same.

In contrast, LLMs are disembodied neural networks trained on vast amounts of human-generated text, with the primary objective of predicting the next word or token based on a given context. Their training revolves around learning statistical patterns from language data rather than through the direct experience of the physical world.

Despite these differences, we tend to use LLMs to mimic humans, in chatbots, assistants, and similar applications. This approach, however, poses a challenging dilemma: how do we describe and understand LLMs' behavior?

It is natural to employ familiar folk-psychological language, using terms like “knows,” “understands,” and “thinks” to describe dialogue agents, as we would with human beings. However, when taken too literally, such language promotes anthropomorphism, exaggerating the similarities between AI systems and humans while obscuring their profound differences.

So how do we approach this dilemma? How can we describe the terms “understanding” and “knowing” for AI models? Let’s jump into the Role Play paper. 

In this paper, the authors propose adopting alternative conceptual frameworks and metaphors to think and talk about LLM-based dialogue agents effectively. They advocate for two primary metaphors: viewing the dialogue agent as role-playing a single character or as a superposition of simulacra within a multiverse of possible characters. These metaphors offer different perspectives on understanding the behavior of dialogue agents and have their own distinct advantages.

Example of Autoregressive sampling. Source: https://arxiv.org/pdf/2305.16367.pdf

The first metaphor describes the dialogue agent as playing a specific character. When given a prompt, the agent tries to continue the conversation in a way that matches the assigned role or persona. It aims to respond according to the expectations associated with that role.

The second metaphor sees the dialogue agent as a collection of different characters from various sources. These agents have been trained on a wide range of materials like books, scripts, interviews, and articles, which gives them a lot of knowledge about different types of characters and storylines. As the conversation goes on, the agent adjusts its role and persona based on the training data it has, allowing it to adapt and respond in character.

Example of turn-taking in dialogue agents. Source: https://arxiv.org/pdf/2305.16367.pdf

By adopting this framework, researchers and users can explore important aspects of dialogue agents, like deception and self-awareness, without mistakenly attributing these concepts to humans. Instead, the focus shifts to understanding how dialogue agents behave in role-playing scenarios and the various characters they can imitate.

In conclusion, dialogue agents based on LLM possess the ability to simulate human-like conversations, but they differ significantly from actual human language users. By using alternative metaphors, such as seeing dialogue agents as role-players or combinations of simulations, we can better comprehend and discuss their behavior. These metaphors provide insights into the complex dynamics of LLM-based dialogue systems, enabling us to appreciate their creative potential while recognizing their fundamental distinctness from human beings.

Check Out The Paper.


Revolutionizing Drug Discovery: Machine Learning Model Identifies Potential Age-Defying Compounds and Paves Way for Future Complex Disease Treatment

Aging and other diseases, such as cancer, type 2 diabetes, osteoarthritis, and viral infection, all involve cellular senescence as a stress response. Targeted removal of senescent cells is gaining popularity, although few senolytics are known because their molecular targets need to be better understood. Here, scientists describe finding three senolytics using relatively inexpensive machine learning algorithms trained entirely on previously published data. After computational screening of multiple chemical libraries, they confirmed the senolytic action of ginkgetin, periplocin, and oleandrin in human cell lines undergoing different types of senescence. The chemicals are as effective as well-established senolytics, with oleandrin proving more effective than current gold standards against its target. The method reduced drug screening expenses by a factor of several hundred, and it shows that AI can make the most of limited and varied drug screening data. This opens the door to novel, data-driven methods for the early stages of drug discovery.

Although senolytics have shown considerable promise in relieving symptoms of numerous diseases in mice, the elimination of senescent cells has also been linked to several negative outcomes, including the impairment of processes like wound healing and liver function. Despite promising findings, only two drugs have shown efficacy in clinical studies for their senolytic action.

Some good senolytics have been developed in the past. However, they are generally harmful to healthy cells. Now, researchers at Scotland's University of Edinburgh have developed a novel approach to identify chemical compounds that can remove these faulty cells without harming healthy ones.

They constructed a machine learning model to identify compounds with senolytic qualities and trained it to do so. The training data was compiled from various sources, such as academic articles and commercial patents, and combined with chemicals from two existing chemical libraries that include a wide range of FDA-approved or clinical-stage compounds. To avoid biasing the machine learning system, the dataset included 2,523 substances with both senolytic and non-senolytic characteristics. After applying the algorithm to a database of over 4,000 compounds, 21 promising candidates were found.

Three compounds, ginkgetin, periplocin, and oleandrin, were shown during testing to eliminate senescent cells without affecting healthy cells, making them good candidates. The results showed that oleandrin was the most effective of the three. All three are common components of herbal remedies.

The oleander plant (Nerium oleander) is the source of oleandrin, a substance with comparable effects to the cardiac medication digoxin, which is used to treat heart failure and certain irregular heart rhythms (arrhythmias). Anticancer, anti-inflammatory, anti-HIV, antibacterial, and antioxidant effects have all been observed in oleandrin. The therapeutic window for oleandrin in humans is small, as it is highly toxic over therapeutic levels. Therefore, selling or using it as a food additive or pharmaceutical is illegal.

Like oleandrin, ginkgetin has been shown to have beneficial effects against cancer, inflammation, and microbes, as well as antioxidant and neuroprotective effects on the nervous system. Ginkgetin comes from the Ginkgo (Ginkgo biloba) tree, the oldest living tree species, whose leaves and seeds have been used in herbal medicine in China for thousands of years. The tree's dried leaves are used to create Ginkgo biloba extract, which is sold without a prescription and is a top-selling herbal supplement in the United States and Europe.

According to the study authors, their results show that the chemicals are as effective as, if not more so than, the senolytics identified in earlier studies. They claim that their machine-learning-based approach was so effective that it cut down on the number of compounds required to be screened by a factor of over 200.

The team believes their AI-based strategy is a major step forward in discovering effective treatments for serious diseases. Several novel features in this technique set it apart from standard AI use in the pharmaceutical industry. 

First, it doesn’t require additional funds to be spent on in-house experimental characterization of training compounds because it uses only published data for model training.

Second, senolysis is a rare molecular property, and there are few senolytics reported in the literature, so the machine learning models were trained on a much smaller dataset than is typically considered in the field. The method’s effectiveness indicates that machine learning can make the most of literature data, even though such material is often more diverse and limited in scope than one may anticipate. 

Third, phenotypic indicators of pharmacological activity were used in target-agnostic model training. Many conditions impose a significant economic and societal burden but for which few or no targets are known; for these conditions, phenotypic drug discovery presents an opportunity to expand the number of chemical starting points that can be advanced through the discovery pipeline.

Check Out The Paper and Reference Article.


How Light & Wonder built a predictive maintenance solution for gaming machines on AWS

This post is co-written with Aruna Abeyakoon and Denisse Colin from Light and Wonder (L&W).
Headquartered in Las Vegas, Light & Wonder, Inc. is the leading cross-platform global game company that provides gambling products and services. Working with AWS, Light & Wonder recently developed an industry-first secure solution, Light & Wonder Connect (LnW Connect), to stream telemetry and machine health data from roughly half a million electronic gaming machines distributed across its casino customer base globally, once LnW Connect reaches its full potential. Over 500 machine events are monitored in near-real time to give a full picture of machine conditions and their operating environments. Using data streamed through LnW Connect, L&W aims to create a better gaming experience for its end-users as well as bring more value to its casino customers.
Light & Wonder teamed up with the Amazon ML Solutions Lab to use events data streamed from LnW Connect to enable machine learning (ML)-powered predictive maintenance for slot machines. Predictive maintenance is a common ML use case for businesses with physical equipment or machinery assets. With predictive maintenance, L&W can get advanced warning of machine breakdowns and proactively dispatch a service team to inspect the issue. This will reduce machine downtime and avoid significant revenue loss for casinos. With no remote diagnostic system in place, issue resolution by the Light & Wonder service team on the casino floor can be costly and inefficient, while severely degrading the customer gaming experience.
The nature of the project is highly exploratory—this is the first attempt at predictive maintenance in the gaming industry. The Amazon ML Solutions Lab and L&W team embarked on an end-to-end journey from formulating the ML problem and defining the evaluation metrics, to delivering a high-quality solution. The final ML model combines CNN and Transformer, which are the state-of-the-art neural network architectures for modeling sequential machine log data. The post presents a detailed description of this journey, and we hope you will enjoy it as much as we do!
In this post, we discuss the following:

How we formulated the predictive maintenance problem as an ML problem with a set of appropriate metrics for evaluation
How we prepared data for training and testing
Data preprocessing and feature engineering techniques we employed to obtain performant models
Performing a hyperparameter tuning step with Amazon SageMaker Automatic Model Tuning
Comparisons between the baseline model and the final CNN+Transformer model
Additional techniques we used to improve model performance, such as ensembling

Background
In this section, we discuss the issues that necessitated this solution.
Dataset
Slot machine environments are highly regulated and are deployed in an air-gapped environment. In LnW Connect, an encryption process was designed to provide a secure and reliable mechanism for the data to be brought into an AWS data lake for predictive modeling. The aggregated files are encrypted and the decryption key is only available in AWS Key Management Service (AWS KMS). A cellular-based private network into AWS is set up through which the files were uploaded into Amazon Simple Storage Service (Amazon S3).
LnW Connect streams a wide range of machine events, such as start of game, end of game, and more. The system collects over 500 different types of events. As shown in the following table, each event is recorded along with a timestamp of when it happened and the ID of the machine recording the event. LnW Connect also records when a machine enters a non-playable state, and it will be marked as a machine failure or breakdown if it doesn't recover to a playable state within a sufficiently short time span.

| Machine ID | Event Type ID | Timestamp |
| --- | --- | --- |
| 0 | E1 | 2022-01-01 00:17:24 |
| 0 | E3 | 2022-01-01 00:17:29 |
| 1000 | E4 | 2022-01-01 00:17:33 |
| 114 | E234 | 2022-01-01 00:17:34 |
| 222 | E100 | 2022-01-01 00:17:37 |

In addition to dynamic machine events, static metadata about each machine is also available. This includes information such as machine unique identifier, cabinet type, location, operating system, software version, game theme, and more, as shown in the following table. (All the names in the table are anonymized to protect customer information.)

| Machine ID | Cabinet Type | OS | Location | Game Theme |
| --- | --- | --- | --- | --- |
| 276 | A | OS_Ver0 | AA Resort & Casino | StormMaiden |
| 167 | B | OS_Ver1 | BB Casino, Resort & Spa | UHMLIndia |
| 13 | C | OS_Ver0 | CC Casino & Hotel | TerrificTiger |
| 307 | D | OS_Ver0 | DD Casino Resort | NeptunesRealm |
| 70 | E | OS_Ver0 | EE Resort & Casino | RLPMealTicket |

Problem definition
We treat the predictive maintenance problem for slot machines as a binary classification problem. The ML model takes in the historical sequence of machine events and other metadata and predicts whether a machine will encounter a failure in a 6-hour future time window. If a machine will break down within 6 hours, it is deemed a high-priority machine for maintenance. Otherwise, it is low priority. The following figure gives examples of low-priority (top) and high-priority (bottom) samples. We use a fixed-length look-back time window to collect historical machine event data for prediction. Experiments show that longer look-back time windows improve model performance significantly (more details later in this post).
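To make the label definition concrete, the following is a minimal labeling sketch; it is illustrative only, and the column names and failure log format are assumptions rather than L&W's actual code:

```python
import pandas as pd

HORIZON = pd.Timedelta(hours=6)

def label_sample(failures: pd.DataFrame, machine_id: int, prediction_time: pd.Timestamp) -> int:
    """Return 1 (high priority) if the machine has any recorded failure within
    the 6-hour window after prediction_time, else 0 (low priority)."""
    machine_failures = failures.loc[failures["machine_id"] == machine_id, "timestamp"]
    in_horizon = machine_failures.between(prediction_time, prediction_time + HORIZON)
    return int(in_horizon.any())

# Example usage with a tiny failure log
failures = pd.DataFrame({
    "machine_id": [0, 114],
    "timestamp": pd.to_datetime(["2022-01-01 03:00:00", "2022-01-02 10:00:00"]),
})
print(label_sample(failures, machine_id=0, prediction_time=pd.Timestamp("2022-01-01 00:00:00")))    # 1
print(label_sample(failures, machine_id=114, prediction_time=pd.Timestamp("2022-01-01 00:00:00")))  # 0
```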

Modeling challenges
We faced a couple of challenges solving this problem:

We have a huge amount of event logs, containing around 50 million events a month (from approximately 1,000 game samples). Careful optimization is needed in the data extraction and preprocessing stage.
Event sequence modeling was challenging due to the extremely uneven distribution of events over time. A 3-hour window can contain anywhere from tens to thousands of events.
Machines are in a good state most of the time and the high-priority maintenance is a rare class, which introduced a class imbalance issue.
New machines are added continuously to the system, so we had to make sure our model can handle prediction on new machines that have never been seen in training.

Data preprocessing and feature engineering
In this section, we discuss our methods for data preparation and feature engineering.
Feature engineering
Slot machine feeds are streams of unequally spaced time series events; for example, the number of events in a 3-hour window can range from tens to thousands. To handle this imbalance, we used event frequencies instead of the raw sequence data. A straightforward approach is aggregating the event frequency for the entire look-back window and feeding it into the model. However, when using this representation, the temporal information is lost, and the order of events is not preserved. We instead used temporal binning by dividing the time window into N equal sub-windows and calculating the event frequencies in each. The final features of a time window are the concatenation of all its sub-window features. Increasing the number of bins preserves more temporal information. The following figure illustrates temporal binning on a sample window.

First, the sample time window is split into two equal sub-windows (bins); we used only two bins here for simplicity for illustration. Then, the counts of the events E1, E2, E3, and E4 are calculated in each bin. Lastly, they are concatenated and used as features.
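The following is a minimal pandas sketch of this binning step; it is illustrative only, not the team's actual preprocessing code, and the column names are assumptions:

```python
import pandas as pd

def bin_event_counts(events: pd.DataFrame, window_start: pd.Timestamp,
                     window_end: pd.Timestamp, num_bins: int) -> pd.Series:
    """Split [window_start, window_end) into num_bins equal sub-windows and count
    events of each type per sub-window; the flattened counts form the feature vector."""
    window = events[(events["timestamp"] >= window_start) & (events["timestamp"] < window_end)]
    bins = pd.interval_range(start=window_start, end=window_end, periods=num_bins)
    sub_window = pd.cut(window["timestamp"], bins=bins)
    counts = window.groupby([sub_window, "event_type_id"]).size().unstack(fill_value=0)
    return counts.stack()  # concatenation of per-bin, per-event-type frequencies

# Example with two bins over a 3-hour window
events = pd.DataFrame({
    "timestamp": pd.to_datetime(["2022-01-01 00:17:24", "2022-01-01 00:17:29", "2022-01-01 02:10:00"]),
    "event_type_id": ["E1", "E3", "E1"],
})
features = bin_event_counts(events, pd.Timestamp("2022-01-01 00:00"),
                            pd.Timestamp("2022-01-01 03:00"), num_bins=2)
print(features)
```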
Along with the event frequency-based features, we used machine-specific features like software version, cabinet type, game theme, and game version. Additionally, we added features related to the timestamps to capture the seasonality, such as hour of the day and day of the week.
Data preparation
To extract data efficiently for training and testing, we utilize Amazon Athena and the AWS Glue Data Catalog. The events data is stored in Amazon S3 in Parquet format and partitioned according to day/month/hour. This facilitates efficient extraction of data samples within a specified time window. We use data from all machines in the latest month for testing and the rest of the data for training, which helps avoid potential data leakage.
ML methodology and model training
In this section, we discuss our baseline model with AutoGluon and how we built a customized neural network with SageMaker automatic model tuning.
Building a baseline model with AutoGluon
With any ML use case, it's important to establish a baseline model to be used for comparison and iteration. We used AutoGluon to explore several classic ML algorithms. AutoGluon is an easy-to-use AutoML tool that uses automatic data processing, hyperparameter tuning, and model ensembling. The best baseline was achieved with a weighted ensemble of gradient boosted decision tree models. The ease of use of AutoGluon helped us in the discovery stage to navigate quickly and efficiently through a wide range of possible data and ML modeling directions.
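A baseline along these lines takes only a few lines of code with AutoGluon, as in the following hedged sketch; the file names, label column, and evaluation metric are assumptions, not the project's actual configuration:

```python
# Hypothetical AutoGluon baseline; assumes a tabular train/test split with a
# binary "high_priority" label column built from the binned features.
from autogluon.tabular import TabularDataset, TabularPredictor

train_data = TabularDataset("train_features.csv")
test_data = TabularDataset("test_features.csv")

predictor = TabularPredictor(label="high_priority", eval_metric="average_precision")
predictor.fit(train_data)

scores = predictor.predict_proba(test_data)  # probability of high priority
print(predictor.leaderboard(test_data))      # compares the ensembled models
```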
Building and tuning a customized neural network model with SageMaker automatic model tuning
After experimenting with different neural network architectures, we built a customized deep learning model for predictive maintenance. Our model surpassed the AutoGluon baseline model by 121% in recall at 80% precision. The final model ingests historical machine event sequence data, time features such as hour of the day, and static machine metadata. We utilize SageMaker automatic model tuning jobs to search for the best hyperparameters and model architectures.
The following figure shows the model architecture. We first normalize the binned event sequence data by average frequencies of each event in the training set to remove the overwhelming effect of high-frequency events (start of game, end of game, and so on). The embeddings for individual events are learnable, while the temporal feature embeddings (day of the week, hour of the day) are extracted using the package GluonTS. Then we concatenate the event sequence data with the temporal feature embeddings as the input to the model. The model consists of the following layers:

Convolutional layers (CNN) – Each CNN layer consists of two 1-dimensional convolutional operations with residual connections. The output of each CNN layer has the same sequence length as the input to allow for easy stacking with other modules. The total number of CNN layers is a tunable hyperparameter.
Transformer encoder layers (TRANS) – The output of the CNN layers is fed together with the positional encoding to a multi-head self-attention structure. We use TRANS to directly capture temporal dependencies instead of using recurrent neural networks. Here, binning of the raw sequence data (reducing length from thousands to hundreds) helps alleviate the GPU memory bottlenecks, while keeping the chronological information to a tunable extent (the number of the bins is a tunable hyperparameter).
Aggregation layers (AGG) – The final layer combines the metadata information (game theme type, cabinet type, locations) to produce the priority level probability prediction. It consists of several pooling layers and fully connected layers for incremental dimension reduction. The multi-hot embeddings of metadata are also learnable, and don’t go through CNN and TRANS layers because they don’t contain sequential information.

We use the cross-entropy loss with class weights as tunable hyperparameters to adjust for the class imbalance issue. In addition, the numbers of CNN and TRANS layers are crucial hyperparameters with the possible values of 0, which means specific layers may not always exist in the model architecture. This way, we have a unified framework where the model architectures are searched along with other usual hyperparameters.
We utilize SageMaker automatic model tuning, also known as hyperparameter optimization (HPO), to efficiently explore model variations and the large search space of all hyperparameters. Automatic model tuning receives the customized algorithm, training data, and hyperparameter search space configurations, and searches for best hyperparameters using different strategies such as Bayesian, Hyperband, and more with multiple GPU instances in parallel. After evaluating on a hold-out validation set, we obtained the best model architecture with two layers of CNN, one layer of TRANS with four heads, and an AGG layer.
We used the following hyperparameter ranges to search for the best model architecture:

from sagemaker.tuner import CategoricalParameter, ContinuousParameter, IntegerParameter

hyperparameter_ranges = {
    # Learning rate
    "learning_rate": ContinuousParameter(5e-4, 1e-3, scaling_type="Logarithmic"),
    # Class weights
    "loss_weight": ContinuousParameter(0.1, 0.9),
    # Number of input bins
    "num_bins": CategoricalParameter([10, 40, 60, 120, 240]),
    # Dropout rate
    "dropout_rate": CategoricalParameter([0.1, 0.2, 0.3, 0.4, 0.5]),
    # Model embedding dimension
    "dim_model": CategoricalParameter([160, 320, 480, 640]),
    # Number of CNN layers
    "num_cnn_layers": IntegerParameter(0, 10),
    # CNN kernel size
    "cnn_kernel": CategoricalParameter([3, 5, 7, 9]),
    # Number of transformer layers
    "num_transformer_layers": IntegerParameter(0, 4),
    # Number of transformer attention heads
    "num_heads": CategoricalParameter([4, 8]),
    # Number of RNN layers
    "num_rnn_layers": IntegerParameter(0, 10),  # optional
    # RNN input dimension size
    "dim_rnn": CategoricalParameter([128, 256]),
}
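For reference, ranges like these could be passed to a SageMaker automatic model tuning job roughly as follows; this is a hedged sketch, and the training image, role, objective metric name, and regex are assumptions rather than the project's actual configuration:

```python
from sagemaker.estimator import Estimator
from sagemaker.tuner import HyperparameterTuner

estimator = Estimator(
    image_uri="<YOUR_TRAINING_IMAGE_URI>",
    role="<YOUR_SAGEMAKER_EXECUTION_ROLE_ARN>",
    instance_count=1,
    instance_type="ml.g4dn.xlarge",
)

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:arhp",
    metric_definitions=[{"Name": "validation:arhp", "Regex": "validation_arhp=([0-9\\.]+)"}],
    hyperparameter_ranges=hyperparameter_ranges,  # the dictionary defined above
    objective_type="Maximize",
    strategy="Bayesian",
    max_jobs=50,
    max_parallel_jobs=5,
)

tuner.fit({"train": "s3://<YOUR_BUCKET>/train/", "validation": "s3://<YOUR_BUCKET>/validation/"})
```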

To further improve model accuracy and reduce model variance, we trained the model with multiple independent random weight initializations, and aggregated the result with mean values as the final probability prediction. There is a trade-off between more computing resources and better model performance, and we observed that 5–10 should be a reasonable number in the current use case (results shown later in this post).
Model performance results
In this section, we present the model performance evaluation metrics and results.
Evaluation metrics
Precision is very important for this predictive maintenance use case. Low precision means reporting more false maintenance calls, which drives costs up through unnecessary maintenance. Because average precision (AP) doesn’t fully align with the high precision objective, we introduced a new metric named average recall at high precisions (ARHP). ARHP is equal to the average of recalls at 60%, 70%, and 80% precision points. We also used precision at top K% (K=1, 10), AUPR, and AUROC as additional metrics.
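Computed from a standard precision-recall curve, ARHP can be expressed roughly as in the following illustrative sketch (not the team's evaluation code):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def average_recall_at_high_precisions(y_true, y_score, precision_points=(0.6, 0.7, 0.8)):
    """ARHP: average of the best achievable recall at each target precision level."""
    precision, recall, _ = precision_recall_curve(y_true, y_score)
    recalls = []
    for p in precision_points:
        feasible = recall[precision >= p]
        recalls.append(feasible.max() if feasible.size else 0.0)
    return float(np.mean(recalls))

# Tiny example
y_true = [0, 0, 1, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9]
print(average_recall_at_high_precisions(y_true, y_score))
```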
Results
The following table summarizes the results using the baseline and the customized neural network models, with 7/1/2022 as the train/test split point. Experiments show that increasing the window length and sample data size both improve the model performance, because they contain more historical information to help with the prediction. Regardless of the data settings, the neural network model outperforms AutoGluon in all metrics. For example, recall at the fixed 80% precision is increased by 121%, which enables you to quickly identify more malfunctioned machines if using the neural network model.

| Model | Window length/Data size | AUROC | AUPR | ARHP | Recall@Prec0.6 | Recall@Prec0.7 | Recall@Prec0.8 | Prec@top1% | Prec@top10% |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AutoGluon baseline | 12H/500k | 66.5 | 36.1 | 9.5 | 12.7 | 9.3 | 6.5 | 85 | 42 |
| Neural Network | 12H/500k | 74.7 | 46.5 | 18.5 | 25 | 18.1 | 12.3 | 89 | 55 |
| AutoGluon baseline | 48H/1mm | 70.2 | 44.9 | 18.8 | 26.5 | 18.4 | 11.5 | 92 | 55 |
| Neural Network | 48H/1mm | 75.2 | 53.1 | 32.4 | 39.3 | 32.6 | 25.4 | 94 | 65 |

The following figures illustrate the effect of using ensembles to boost the neural network model performance. All the evaluation metrics shown on the x-axis are improved, with higher mean (more accurate) and lower variance (more stable). Each box-plot is from 12 repeated experiments, from no ensembles to 10 models in ensembles (x-axis). Similar trends persist in all metrics besides the Prec@top1% and Recall@Prec80% shown.
After factoring in the computational cost, we observe that using 5–10 models in ensembles is suitable for Light & Wonder datasets.

Conclusion
Our collaboration has resulted in the creation of a groundbreaking predictive maintenance solution for the gaming industry, as well as a reusable framework that could be utilized in a variety of predictive maintenance scenarios. The adoption of AWS technologies such as SageMaker automatic model tuning enables Light & Wonder to navigate new opportunities using near-real-time data streams. Light & Wonder is starting the deployment on AWS.
If you would like help accelerating the use of ML in your products and services, please contact the Amazon ML Solutions Lab program.

About the authors
Aruna Abeyakoon is the Senior Director of Data Science & Analytics at Light & Wonder Land-based Gaming Division. Aruna leads the industry-first Light & Wonder Connect initiative and supports both casino partners and internal stakeholders with consumer behavior and product insights to make better games, optimize product offerings, manage assets, and support health monitoring and predictive maintenance.
Denisse Colin is a Senior Data Science Manager at Light & Wonder, a leading cross-platform global game company. She is a member of the Gaming Data & Analytics team helping develop innovative solutions to improve product performance and customers’ experiences through Light & Wonder Connect.
Tesfagabir Meharizghi is a Data Scientist at the Amazon ML Solutions Lab where he helps AWS customers across various industries such as gaming, healthcare and life sciences, manufacturing, automotive, and sports and media, accelerate their use of machine learning and AWS cloud services to solve their business challenges.
Mohamad Aljazaery is an applied scientist at Amazon ML Solutions Lab. He helps AWS customers identify and build ML solutions to address their business challenges in areas such as logistics, personalization and recommendations, computer vision, fraud prevention, forecasting and supply chain optimization.
Yawei Wang is an Applied Scientist at the Amazon ML Solution Lab. He helps AWS business partners identify and build ML solutions to address their organization’s business challenges in a real-world scenario.
Yun Zhou is an Applied Scientist at the Amazon ML Solutions Lab, where he helps with research and development to ensure the success of AWS customers. He works on pioneering solutions for various industries using statistical modeling and machine learning techniques. His interest includes generative models and sequential data modeling.
Panpan Xu is an Applied Science Manager with the Amazon ML Solutions Lab at AWS. She works on research and development of machine learning algorithms for high-impact customer applications in a variety of industrial verticals to accelerate their AI and cloud adoption. Her research interests include model interpretability, causal analysis, human-in-the-loop AI, and interactive data visualization.
Raj Salvaji leads Solutions Architecture in the Hospitality segment at AWS. He works with hospitality customers by providing strategic guidance, technical expertise to create solutions to complex business challenges. He draws on 25 years of experience in multiple engineering roles across Hospitality, Finance and Automotive industries.
Shane Rai is a Principal ML Strategist with the Amazon ML Solutions Lab at AWS. He works with customers across a diverse spectrum of industries to solve their most pressing and innovative business needs using AWS’s breadth of cloud-based AI/ML services.

Deepmind Researchers Open-Source TAPIR: A New AI Model for Tracking Any Point (TAP) that Effectively Tracks a Query Point in a Video Sequence

Computer vision is one of the most popular fields of Artificial Intelligence. The models developed using computer vision are able to derive meaningful information from different types of media, be it digital images, videos, or any other visual inputs. It teaches machines how to perceive and understand visual information and then act upon the details. Computer vision has taken a significant leap forward with the introduction of a new model called Tracking Any Point with per-frame Initialization and Temporal Refinement (TAPIR). TAPIR has been designed with the aim of effectively tracking a specific point of interest in a video sequence.

Developed by a team of researchers from Google DeepMind and the Visual Geometry Group (VGG) in the Department of Engineering Science at the University of Oxford, the algorithm behind the TAPIR model consists of two stages: a matching stage and a refinement stage. In the matching stage, the TAPIR model analyzes each frame of the video sequence separately to find a suitable candidate point match for the query point. This step seeks to identify the point most likely related to the query point in each frame, and the procedure is carried out frame by frame to ensure that the TAPIR model can follow the query point's movement across the video.

The matching stage in which candidate point matches are identified is followed by the employment of the refinement stage. In this stage, the TAPIR model updates both the trajectory, which is the path followed by the query point, and the query features based on local correlations and thus takes into account the surrounding information in each frame to improve the accuracy and precision of tracking the query point. The refining stage improves the model’s capacity to precisely track the movement of the query point and adjust to variations in the video sequence by integrating local correlations.

For the evaluation of the TAPIR model, the team used the TAP-Vid benchmark, a standardized evaluation dataset for video tracking tasks. The results showed that the TAPIR model performs significantly better than the baseline techniques. The performance improvement was measured using a metric called Average Jaccard (AJ), on which the TAPIR model achieves an approximate 20% absolute improvement compared to other methods on the DAVIS (Densely Annotated VIdeo Segmentation) benchmark.

The model has been designed to facilitate fast parallel inference on long video sequences, i.e., it can process multiple frames simultaneously, improving the efficiency of tracking tasks. The team has mentioned that the model can be applied live, enabling it to process and keep track of points as new video frames are added. It can track 256 points on a 256×256 video at a rate of about 40 frames per second (fps) and can also be expanded to handle films with higher resolution, giving it flexibility in how it handles videos of various sizes and quality.

The team has provided two online Google Colab demos for the users to try TAPIR without installation. The first Colab demo enables users to run the model on their own videos, providing an interactive experience to test and observe the model’s performance. The second demo focuses on running TAPIR in an online fashion. Also, the users can run TAPIR live by tracking points on their own webcams with a modern GPU by cloning the codebase provided.

Check Out The Paper and Project.


AI Will Eat Itself? This AI Paper Introduces A Phenomenon Called Model Collapse That Refers To A Degenerative Learning Process Where Models Start Forgetting Improbable Events Over Time

Using Stable Diffusion, pictures can be made from just words. GPT-2, GPT-3(.5), and GPT-4 performed amazingly on many language challenges, and the public was first exposed to these types of language models through ChatGPT. Large language models (LLMs) have established themselves as a permanent fixture and are expected to alter the entire online text and imagery ecosystem drastically. Training from massive web-scraped data can only be maintained if the provenance of that data is given due consideration. Indeed, as content generated by LLMs is increasingly included in data scraped from the Internet, data acquired from true human interactions with these systems will become more valuable.

Researchers from Britain and Canada find that model collapse occurs when one model learns from data generated by another. This degenerative process causes models to lose track of the genuine underlying data distribution over time, even when no change has occurred. They illustrate this phenomenon by providing case studies of model failure in the context of the Gaussian Mixture Model, the Variational Autoencoder, and the Large Language Model. They demonstrate how, over successive generations, acquired behaviors converge to an estimate with extremely minimal variance and how this loss of knowledge about the true distribution begins with the disappearance of the tails. In addition, they demonstrate that this outcome is inevitable even in scenarios with nearly optimal conditions for long-term learning, i.e., no function estimation error.

The researchers conclude by discussing the broader effects of model collapse. They point out how important access to the raw data is for determining where the tails of the underlying distribution matter. Thus, if material generated by LLMs is posted on the Internet at scale, polluting the data collected to train them, data on genuine human interactions with LLMs will become increasingly valuable.

Model Collapse: What Is It?

When one generation of learned generative models feeds its output into the training of the next, the later generation is corrupted because it was trained on contaminated data and thus misinterprets the world. Model collapse can be classified as either "early" or "late," depending on when it occurs. In the early stage of model collapse, the model starts to lose information about the distribution's tails; in the late stage, the model entangles different modes of the original distributions and converges to a distribution that bears little resemblance to the original, often with very small variance.

In this approach, which considers many models throughout time, models do not forget previously learned data but instead begin misinterpreting what they perceive to be real by reinforcing their ideas, in contrast to the catastrophic forgetting process. This occurs due to two distinct mistake sources that, when combined throughout generations, lead to a departure from the original model. One particular mistake mechanism is crucial to the process; it would survive past the first generation.

Model Collapse: Causes

The primary and secondary causes of model collapse are as follows:

The primary cause is statistical approximation error, which arises because only a finite number of samples is available; it diminishes as the sample size approaches infinity.

The secondary cause is functional approximation error, which arises when function approximators are not sufficiently expressive (or are occasionally too expressive beyond the original distribution).

Each of these factors can make model collapse more or less likely. Better approximation power is a double-edged sword: greater expressiveness can amplify statistical noise as easily as it can suppress it, so it does not by itself guarantee a better approximation of the underlying distribution.

Model collapse is said to occur in all recursively trained generative models, affecting every generation of models. The researchers construct simple mathematical models of this process that exhibit the collapse and can be used to derive analytical expressions for quantities of interest, with the aim of quantifying how the different error types affect the final approximation of the original distribution.
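To make the statistical approximation mechanism concrete, the following is a minimal illustrative sketch (not code from the paper) that repeatedly fits a single Gaussian to samples drawn from the previous generation’s fit. Over many generations, the estimate drifts and its variance tends to collapse, which mirrors the tail-loss behavior described above:

# Minimal sketch of recursive training on self-generated data (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

mu, sigma = 0.0, 1.0   # generation 0: the true data distribution
n_samples = 20         # a small, finite sample is the source of the approximation error

for generation in range(1, 201):
    data = rng.normal(mu, sigma, size=n_samples)  # data produced by the previous model
    mu, sigma = data.mean(), data.std()           # fit the next generation by maximum likelihood
    if generation % 40 == 0:
        print(f"generation {generation:3d}: mu={mu:+.3f}, sigma={sigma:.3f}")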

The researchers show that model collapse can be triggered by training on data from another generative model, which induces a shift in distribution; as a result, the model misinterprets the training problem. Long-term learning requires maintaining access to the original data source and keeping data not produced by LLMs readily available over time. It remains unclear how content generated by LLMs can be tracked at scale, which raises questions about the provenance of content scraped from the Internet and the need to distinguish it from other data. Community-wide coordination is one approach to ensuring that all parties involved in LLM development and deployment communicate and share the information needed to settle provenance questions. Without data crawled from the Internet before the technology’s widespread adoption, or direct access to data produced by humans at scale, it may become increasingly difficult to train subsequent versions of LLMs.

Check Out The Paper and Reference Article. Don’t forget to join our 24k+ ML SubReddit, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more. If you have any questions regarding the above article or if we missed anything, feel free to email us at Asif@marktechpost.com

The post AI Will Eat Itself? This AI Paper Introduces A Phenomenon Called Model Collapse That Refers To A Degenerative Learning Process Where Models Start Forgetting Improbable Events Over Time appeared first on MarkTechPost.

50+ New Cutting-Edge AI Tools (July 2023)

New AI tools are being developed at a rapid pace, with more introduced regularly. Check out some AI tools below that can enhance your daily routines.

tl;dv

Powered by the GPT model, this tool is a meeting recorder for Zoom and Google Meet. tl;dv transcribes and summarizes the calls for the user.

Otter AI

Using artificial intelligence, Otter.AI empowers users with real-time transcriptions of meeting notes that are shareable, searchable, accessible, and secure.

Taskade

Taskade is an AI productivity tool that helps users manage their tasks and projects efficiently.

Notion AI

Notion AI is a writing assistant that helps users write, brainstorm, edit, and summarize right inside the Notion workspace.

Bing

Microsoft has launched the AI-powered Bing search engine, which is like having a research assistant, personal planner, and creative partner whenever the user searches the web.

Bard

Bard is a chatbot developed by Google that helps to boost productivity and bring ideas to life.

Forefront

Forefront AI is a platform that offers free access to GPT-4, image generation, custom personas, and shareable chats, thereby empowering businesses with improved efficiency and user experience.

Merlin

Merlin is a ChatGPT extension that helps users finish any task on any website, providing features like a blog summarizer and an AI writer for Gmail.

WNR AI

WNR AI provides AI templates that convert a simple form into an optimized prompt to extract the best results from AI.

Chat ABC

Chat ABC is a better alternative to ChatGPT, providing features like a prompt library, team collaboration, etc.

Paperpal AI

Paperpal is an AI language assistant and online academic writing tool that identifies language errors and provides instant suggestions to the user.

Monic AI

Monic is an AI tool that makes learning interactive by turning notes, slides, articles, and textbooks into mock tests.

ChartGPT

ChartGPT is a tool that transforms simple text into beautiful charts.

Trinka AI

Trinka is a grammar checker and language enhancement writing assistant.

Scholarcy

Scholarcy reads the user’s articles, reports, and textbooks and converts them into flashcards.

Lavender

Lavender is a sales email assistant that helps users to write better emails.

Regie

Regie is a content platform for revenue teams that allows users to create and publish sales sequences to their sales engagement platform.

Warmer

Warmer is an AI email personalization tool that helps users increase the response rates of their cold emails.

Twain

Twain is a communication assistant that helps users to write clear and confident outreach messages that get answers.

Octane

Octane is a platform for data collection and personalized Facebook Messenger and SMS automation.

10Web

10Web is an automated website builder that improves the core web vitals of users’ websites.

Uncody

Uncody is a landing page generator that allows users to build professional-looking websites easily.

Dora

Dora AI allows users to create editable websites just from an input prompt.

Durable

Durable is an AI website builder that allows users to instantly create websites complete with images and copy.

Replit

Replit is a web-based Integrated Development Environment (IDE) that enables users to build projects online.

Consensus

Consensus is an AI-powered search engine that extracts findings directly from scientific research.

Writesonic

Writesonic is an AI writer that generates SEO-friendly content for blogs, Google ads, Facebook ads, and Shopify for free.

Yatter Plus

Yatter Plus is a WhatsApp chatbot that answers all user queries, questions, and concerns in seconds.

Typewise

Typewise is a text prediction software that boosts enterprise productivity.

Cohere

Cohere is a tool that provides access to advanced LLMs and NLP tools through APIs.

Quickchat

Quickchat is a conversational AI assistant empowering companies to build their multilingual chatbots.

Kaizan

Kaizan is a Client Intelligence Platform that allows its users to retain their clients and grow revenue.

Looka

Looka is an AI-powered logo maker that enables entrepreneurs to easily create a professional logo and brand identity. 

Namecheap

Namecheap is a free logo generator tool for businesses.

LogoAI

LogoAI is a brand-building platform for crafting polished logos, developing cohesive brand identities, and streamlining brand promotion through automation.

Stockimg

Stockimg is an AI image generator that creates logos, book covers, and posters.

Brandmark

Brandmark is an AI-powered logo, business card, and social media graphics designer.

Panopreter

Panopreter is a text-to-speech tool that converts digital content into audio.

Speechelo

Speechelo is a tool that generates human-sounding voiceovers from text.

Synthesys

Synthesys is a platform that allows users to create multilingual voiceovers and videos effortlessly.

Speechify

Speechify is an AI voice generator capable of converting texts into natural-sounding voices.

Murf

Murf is an AI voice generator that makes the process of voiceovers effortless.

Pictory

Pictory is an AI video generator that creates short videos from long-form content.

Synthesia

Synthesia generates professional videos by simply taking text as input.

Veed.io

Veed.io is an AI-powered video editing platform that allows users to add images and subtitles, convert text to video, and much more.

Colossyan

Colossyan allows users to create videos from text within minutes and auto-translate to dozens of languages.

GetIMG

GetIMG allows users to generate original images at scale, edit photos, and create custom AI models.

Shutterstock

Shutterstock allows users to create unique AI photos using text prompts.

NightCafe

NightCafe is an AI art generator that allows users to create an artwork within seconds.

Artbreeder

Using Artbreeder, users can make simple collages from shapes and images by describing them with a prompt.

Stablecog

Stablecog is an open-source, free, and multilingual AI image generator.

Speak AI

Speak AI allows marketing teams to turn unstructured audio, video, and text into insights using NLP.

AISEO

AISEO is an AI-powered writing assistant which allows users to generate SEO-optimized content within minutes.

Lumen5

Lumen5 is an AI-powered video creation platform that allows users to easily create engaging video content within minutes.

Spellbook

Spellbook uses LLMs like GPT-4 to draft contracts faster.

The post 50+ New Cutting-Edge AI Tools (July 2023) appeared first on MarkTechPost.

Use the AWS CDK to deploy Amazon SageMaker Studio lifecycle configurat …

Amazon SageMaker Studio is the first fully integrated development environment (IDE) for machine learning (ML). Studio provides a single web-based visual interface where you can perform all ML development steps required to prepare data, as well as build, train, and deploy models. Lifecycle configurations are shell scripts triggered by Studio lifecycle events, such as starting a new Studio notebook. You can use lifecycle configurations to automate customization for your Studio environment. This customization includes installing custom packages, configuring notebook extensions, preloading datasets, and setting up source code repositories. For example, as an administrator for a Studio domain, you may want to save costs by having notebook apps shut down automatically after long periods of inactivity.
The AWS Cloud Development Kit (AWS CDK) is a framework for defining cloud infrastructure through code and provisioning it through AWS CloudFormation stacks. A stack is a collection of AWS resources that can be programmatically updated, moved, or deleted. AWS CDK constructs are the building blocks of AWS CDK applications, representing the blueprint to define cloud architectures.
In this post, we show how to use the AWS CDK to set up Studio, use Studio lifecycle configurations, and enable its access for data scientists and developers in your organization.
Solution overview
The modularity of lifecycle configurations allows you to apply them to all users in a domain or to specific users. This way, you can set up lifecycle configurations and reference them in the Studio kernel gateway or Jupyter server quickly and consistently. The kernel gateway is the entry point to interact with a notebook instance, whereas the Jupyter server represents the Studio instance. This enables you to apply DevOps best practices and meet safety, compliance, and configuration standards across all AWS accounts and Regions. For this post, we use Python as the main language, but the code can be easily changed to other AWS CDK supported languages. For more information, refer to Working with the AWS CDK.
Prerequisites
To get started, make sure you have the following prerequisites:

The AWS Command Line Interface (AWS CLI) installed.
The AWS CDK installed. For more information, refer to Getting started with the AWS CDK and Working with the AWS CDK in Python.
An AWS profile with permissions to create AWS Identity and Access Management (IAM) roles, Studio domains, and Studio user profiles.
Python 3+.

Clone the GitHub repository
First, clone the GitHub repository.
As you clone the repository, you can observe that we have a classic AWS CDK project with the directory studio-lifecycle-config-construct, which contains the construct and resources required to create lifecycle configurations.
AWS CDK constructs
The file we want to inspect is aws_sagemaker_lifecycle.py. This file contains the SageMakerStudioLifeCycleConfig construct we use to set up and create lifecycle configurations.
The SageMakerStudioLifeCycleConfig construct provides the framework for building lifecycle configurations using a custom AWS Lambda function and shell code read in from a file. The construct contains the following parameters:

ID – The ID of the construct (here, the name of the current project).
studio_lifecycle_config_content – The base64-encoded content of the lifecycle script.
studio_lifecycle_tags – Labels you assign to organize Amazon resources. They are entered as key-value pairs and are optional for this configuration.
studio_lifecycle_config_app_type – JupyterServer applies to the Jupyter server itself, and the KernelGateway app corresponds to a running SageMaker image container.

For more information on the Studio notebook architecture, refer to Dive deep into Amazon SageMaker Studio Notebooks architecture.
The following is a code snippet of the Studio lifecycle config construct (aws_sagemaker_lifecycle.py):

from aws_cdk import CustomResource, Duration
from aws_cdk import aws_iam as iam
from aws_cdk import aws_lambda as lambda_
from aws_cdk import custom_resources
from constructs import Construct


class SageMakerStudioLifeCycleConfig(Construct):
    def __init__(
        self,
        scope: Construct,
        id: str,
        studio_lifecycle_config_content: str,
        studio_lifecycle_config_app_type: str,
        studio_lifecycle_config_name: str,
        **kwargs,
    ):
        super().__init__(scope, id)
        self.studio_lifecycle_content = studio_lifecycle_config_content
        self.studio_lifecycle_config_name = studio_lifecycle_config_name
        self.studio_lifecycle_config_app_type = studio_lifecycle_config_app_type

        # IAM role assumed by the Lambda function that manages the lifecycle configuration
        lifecycle_config_role = iam.Role(
            self,
            "SmStudioLifeCycleConfigRole",
            assumed_by=iam.ServicePrincipal("lambda.amazonaws.com"),
        )

        lifecycle_config_role.add_to_policy(
            iam.PolicyStatement(
                resources=[f"arn:aws:sagemaker:{scope.region}:{scope.account}:*"],
                actions=[
                    "sagemaker:CreateStudioLifecycleConfig",
                    "sagemaker:ListUserProfiles",
                    "sagemaker:UpdateUserProfile",
                    "sagemaker:DeleteStudioLifecycleConfig",
                    "sagemaker:AddTags",
                ],
            )
        )

        # Lambda function backing the custom resource that creates the lifecycle configuration
        create_lifecycle_script_lambda = lambda_.Function(
            self,
            "CreateLifeCycleConfigLambda",
            runtime=lambda_.Runtime.PYTHON_3_8,
            timeout=Duration.minutes(3),
            code=lambda_.Code.from_asset(
                "../mlsl-cdk-constructs-lib/src/studiolifecycleconfigconstruct"
            ),
            handler="onEvent.handler",
            role=lifecycle_config_role,
            environment={
                "studio_lifecycle_content": self.studio_lifecycle_content,
                "studio_lifecycle_config_name": self.studio_lifecycle_config_name,
                "studio_lifecycle_config_app_type": self.studio_lifecycle_config_app_type,
            },
        )

        config_custom_resource_provider = custom_resources.Provider(
            self,
            "ConfigCustomResourceProvider",
            on_event_handler=create_lifecycle_script_lambda,
        )

        studio_lifecycle_config_custom_resource = CustomResource(
            self,
            "LifeCycleCustomResource",
            service_token=config_custom_resource_provider.service_token,
        )
        # Expose the ARN of the created lifecycle configuration to consumers of the construct
        self.studio_lifecycle_config_arn = (
            studio_lifecycle_config_custom_resource.get_att_string("StudioLifecycleConfigArn")
        )

After you import and install the construct, you can use it. The following code snippet shows how to create a lifecycle config using the construct in a stack either in app.py or another construct:

my_studio_lifecycle_config = SageMakerStudioLifeCycleConfig(
    self,
    "MLSLBlogPost",
    studio_lifecycle_config_content="base64content",
    studio_lifecycle_config_name="BlogPostTest",
    studio_lifecycle_config_app_type="JupyterServer",
)
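The studio_lifecycle_config_content value is the lifecycle shell script encoded as base64. As a minimal sketch (the script below is just a placeholder example, not part of the repository), you could produce that value as follows:

import base64

# Placeholder lifecycle script: runs when the Jupyter server app starts.
lifecycle_script = """#!/bin/bash
set -eux
pip install --upgrade pip
"""

# SageMaker Studio expects the lifecycle configuration content base64-encoded.
encoded_content = base64.b64encode(lifecycle_script.encode("utf-8")).decode("utf-8")
print(encoded_content)  # pass this value as studio_lifecycle_config_content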

Deploy AWS CDK constructs
To deploy your AWS CDK stack, run the following commands in the location where you cloned the repository.
The command may be python instead of python3 depending on your path configurations.

Create a virtual environment:

For macOS/Linux, use python3 -m venv .cdk-venv.
For Windows, use python -m venv .cdk-venv.

Activate the virtual environment:

For macOS/Linux, use source .cdk-venv/bin/activate.
For Windows, use .cdk-venv/Scripts/activate.bat.
For PowerShell, use .cdk-venv/Scripts/activate.ps1.

Install the required dependencies:

pip install -r requirements.txt
pip install -r requirements-dev.txt

At this point, you can optionally synthesize the CloudFormation template for this code:

cdk synth

Deploy the solution with the following commands:

aws configure
cdk bootstrap
cdk deploy

When the stack is successfully deployed, you should be able to view the stack on the CloudFormation console.

You will also be able to view the lifecycle configuration on the SageMaker console.

Choose the lifecycle configuration to view the shell code that runs as well as any tags you assigned.

Attach the Studio lifecycle configuration
There are multiple ways to attach a lifecycle configuration. In this section, we present two methods: using the AWS Management Console, and programmatically using the infrastructure provided.
Attach the lifecycle configuration using the console
To use the console, complete the following steps:

On the SageMaker console, choose Domains in the navigation pane.
Choose the domain name you’re using and the current user profile, then choose Edit.
Select the lifecycle configuration you want to use and choose Attach.

From here, you can also set it as default.

Attach the lifecycle configuration programmatically
You can also retrieve the ARN of the Studio lifecycle configuration created by the construct and attach it to the Studio construct programmatically. The following code shows the lifecycle configuration ARN being passed to a Studio construct:

default_user_settings=sagemaker.CfnDomain.UserSettingsProperty(
    execution_role=self.sagemaker_role.role_arn,
    jupyter_server_app_settings=sagemaker.CfnDomain.JupyterServerAppSettingsProperty(
        default_resource_spec=sagemaker.CfnDomain.ResourceSpecProperty(
            instance_type="system",
            lifecycle_config_arn=my_studio_lifecycle_config.studio_lifecycle_config_arn,
        )
    )
)

Clean up
Complete the steps in this section to clean up your resources.
Delete the Studio lifecycle configuration
To delete your lifecycle configuration, complete the following steps:

On the SageMaker console, choose Studio lifecycle configurations in the navigation pane.
Select the lifecycle configuration, then choose Delete.

Delete the AWS CDK stack
When you’re done with the resources you created, you can destroy your AWS CDK stack by running the following command in the location where you cloned the repository:

cdk destroy

When asked to confirm the deletion of the stack, enter yes.
You can also delete the stack on the AWS CloudFormation console with the following steps:

On the AWS CloudFormation console, choose Stacks in the navigation pane.
Choose the stack that you want to delete.
In the stack details pane, choose Delete.
Choose Delete stack when prompted.

If you run into any errors, you may have to manually delete some resources depending on your account configuration.
Conclusion
In this post, we discussed how Studio serves as an IDE for ML workloads. Studio offers lifecycle configuration support, which allows you to set up custom shell scripts to perform automated tasks, or set up development environments at launch. We used AWS CDK constructs to build the infrastructure for the custom resource and lifecycle configuration. Constructs are synthesized into CloudFormation stacks that are then deployed to create the custom resource and lifecycle script that is used in Studio and the notebook kernel.
For more information, visit Amazon SageMaker Studio.

About the Authors
Cory Hairston is a Software Engineer with the Amazon ML Solutions Lab. He currently works on providing reusable software solutions.
Alex Chirayath is a Senior Machine Learning Engineer at the Amazon ML Solutions Lab. He leads teams of data scientists and engineers to build AI applications to address business needs.
Gouri Pandeshwar is an Engineer Manager at the Amazon ML Solutions Lab. He and his team of engineers are working to build reusable solutions and frameworks that help accelerate adoption of AWS AI/ML services for customers’ business use cases.

Boost agent productivity with Salesforce integration for Live Call Ana …

As a contact center agent, would you rather focus on having productive customer conversations or get distracted by having to look up customer information and knowledge articles that could exist in various systems? We’ve all been there. Having a productive conversation while multitasking is challenging. A single negative experience may put a dent on a customer’s perception of your brand.
The Live Call Analytics with Agent Assist (LCA) open-source solution addresses these challenges by providing features such as AI-powered agent assistance, call transcription, call summarization, and much more. As part of our effort to meet the needs of your agents, we strive to add features based on your feedback and our own experience helping contact center operators.
One of the features we added is the ability to write your own AWS Lambda hooks for the start of call and post-call, so you can apply custom processing to calls as they occur. This makes it easier to integrate custom logic with the LCA architecture without complex modifications to the original source code. It also lets you update LCA stack deployments more easily and quickly than if you were modifying the code directly.
Today, we are excited to announce a feature that lets you integrate LCA with your Customer Relationship Management (CRM) system, built on top of the pre- and post-call Lambda hooks.

In this post, we walk you through setting up the LCA/CRM integration with Salesforce.
Solution overview
LCA now has two additional Lambda hooks:

Start of call Lambda hook – The LCA Call Event/Transcript Processor invokes this hook at the beginning of each call. This function can implement custom logic that applies to the beginning of call processing, such as retrieving call summary details logged into a case in a CRM.
Post-call summary Lambda hook – The LCA Call Event/Transcript Processor invokes this hook after the call summary is processed. This function can implement custom logic that’s relevant to post-processing, for example, updating the call summary in a CRM system (a minimal sketch of this idea follows this list).
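The event payload that LCA passes to these hooks isn’t reproduced here, so the following is only a minimal sketch of the Salesforce side of a post-call summary hook using the simple-salesforce library; the function signature, placeholder credentials, and the way the phone number and summary are obtained are assumptions for illustration rather than the solution’s actual code.

from simple_salesforce import Salesforce

def update_case_with_summary(caller_phone: str, summary: str) -> None:
    # Placeholder credentials for illustration only; the integration stack takes
    # these as deployment parameters rather than hard-coding them.
    sf = Salesforce(
        username="user@example.com",
        password="password",
        security_token="security-token",
    )

    # Match the caller's phone number to a Salesforce contact record.
    contacts = sf.query(
        f"SELECT Id FROM Contact WHERE Phone = '{caller_phone}' LIMIT 1"
    )
    if not contacts["records"]:
        return  # no matching contact; a real hook would log this and exit

    contact_id = contacts["records"][0]["Id"]

    # Update the contact's most recent case with the call summary,
    # or create a new case if none exists yet.
    cases = sf.query(
        "SELECT Id FROM Case "
        f"WHERE ContactId = '{contact_id}' ORDER BY CreatedDate DESC LIMIT 1"
    )
    if cases["records"]:
        sf.Case.update(cases["records"][0]["Id"], {"Description": summary})
    else:
        sf.Case.create({"ContactId": contact_id, "Description": summary})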

The following diagram illustrates the start of call and post-call (summary) Lambda hooks that integrate with Salesforce to look up and update case records, respectively.

Here are the steps we walk you through:

Set up Salesforce to allow the custom Lambda hooks to look up or update the case records.
Deploy the LCA and Salesforce integration stacks.
Update the LCA stack with the Salesforce integration Lambda hooks and perform validations.

Prerequisites
You need the following prerequisites:

An existing Salesforce organization. Sign up for a free Salesforce Developer Edition organization, if you don’t have one.
An AWS account. If you don’t have one, sign up at https://aws.amazon.com.
The AWS Command Line Interface (AWS CLI) version 2 installed.
The AWS Serverless Application Model Command Line Interface (AWS SAM CLI), to build and deploy your SAM application.

Create a Salesforce connected app
To set up your Salesforce app, complete the following steps:

Log in to your Salesforce org and go to Setup.
Search for App Manager and choose App Manager.
Choose New Connected App.
For Connected App Name, enter a name.
For Contact Email, enter a valid email.
Select Enable OAuth Settings and enter a value for Callback URL.
Under Available OAuth Scopes, choose Manage user data via APIs (api).
Select Require Secret for Webserver Flow and Require Secret for Refresh Token Flow.
Choose Save.
Under API (Enable OAuth Settings), choose Manage Consumer Details.
Verify your identity if prompted.
Copy the consumer key and consumer secret.

You need these when deploying the AWS Serverless Application Model (AWS SAM) application.
Get your Salesforce access token
If you don’t already have an access token, you need to obtain one. Before doing this, make sure that you’re prepared to update any applications that are using an access token because this step creates a new one and may invalidate the prior tokens.

Find your personal information by choosing Settings from View profile on the top right.
Choose Reset My Security Token followed by Reset Security Token.
Make note of the new access token that you receive via email.

Create a Salesforce customer contact record for each caller
The Lambda function that performs case look-up and update matches the caller’s phone number with a contact record in Salesforce. To create a new contact, complete the following steps (a scripted alternative is shown after the list):

Log in to your Salesforce org.
Under App Launcher, search for and choose Service Console.
On the Service Console page, choose Contacts from the drop-down list, then choose New.
Enter a valid phone number under the Phone field of the New Contact page.
Enter other contact details and choose Save.
Repeat Steps 1–5 for any caller that makes a phone call and test the integration.
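If you prefer to script this step instead of using the Service Console, a minimal sketch with the simple-salesforce library might look like the following; the credentials and phone number are placeholders, and this is not part of the solution’s deployment code:

from simple_salesforce import Salesforce

# Placeholder credentials for your Salesforce Developer org.
sf = Salesforce(
    username="user@example.com",
    password="password",
    security_token="security-token",
)

# Create a contact whose Phone matches the number you will call from during testing.
result = sf.Contact.create(
    {"FirstName": "Test", "LastName": "Caller", "Phone": "+15555550100"}
)
print(result)  # e.g. {'id': '003...', 'success': True, 'errors': []}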

Deploy the LCA stack
Complete the following steps to deploy the LCA stack:

Follow the instructions under the Deploy the CloudFormation stack section of Live call analytics and agent assist for your contact center with Amazon language AI services.
Make sure that you choose ANTHROPIC, SAGEMAKER, or LAMBDA for the End of Call Transcript Summary parameter. See Transcript Summarization for more details.

The stacks take about 45 minutes to deploy.

After the main stack shows CREATE_COMPLETE, on the Outputs tab, make a note of the Kinesis data stream ARN (CallDataStreamArn).

Deploy the Salesforce integration stack
To deploy the Salesforce integration stack, complete the following steps:

Open a command-line terminal and run the following commands:

git clone https://github.com/aws-samples/amazon-transcribe-live-call-analytics.git
cd amazon-transcribe-live-call-analytics/plugins/salesforce-integration
sam build
sam deploy --guided

Use the following parameter reference when responding to the guided deployment prompts:

AWS Region – The Region where you have deployed the LCA solution
SalesforceUsername – The user name of your Salesforce organization that has permissions to read and create cases
SalesforcePassword – The password associated with your Salesforce user name
SalesforceAccessToken – The access token you obtained earlier
SalesforceConsumerKey – The consumer key you copied earlier
SalesforceConsumerSecret – The consumer secret you obtained earlier
SalesforceHostUrl – The login URL of your Salesforce organization
SalesforceAPIVersion – The Salesforce API version (choose default or v56.0)
LCACallDataStreamArn – The Kinesis data stream ARN (CallDataStreamArn) obtained earlier

After the stack successfully deploys, make a note of StartOfCallLambdaHookFunctionArn and PostCallSummaryLambdaHookFunctionArn from the outputs displayed on your terminal.

Update LCA Stack
Complete the following steps to update the LCA stack:

On the AWS CloudFormation console, update the main LCA stack.
Choose Use current template.
For Lambda Hook Function ARN for Custom Start of Call Processing (existing), provide the StartOfCallLambdaHookFunctionArn that you obtained earlier.
For Lambda Hook Function ARN for Custom Post Processing, after the Call Transcript Summary is processed (existing), provide the PostCallSummaryLambdaHookFunctionArn that you obtained earlier.
Make sure that End of Call Transcript Summary is not DISABLED.

Validate the integration
Make a test call and make sure you can see the beginning of call AGENT ASSIST and post-call AGENT ASSIST transcripts. Refer to the Explore live call analysis and agent assist features section of the Live call analytics and agent assist for your contact center with Amazon language AI services post for guidance.
Clean up
To avoid incurring charges, clean up your resources by following these instructions when you are finished experimenting with this solution:

On the AWS CloudFormation console, delete the LCA stacks that you deployed. This deletes resources that were created by deploying the solution. The recording S3 buckets, DynamoDB table, and CloudWatch log groups are retained after the stack is deleted to avoid deleting your data.
On your terminal, run sam delete to delete the Salesforce integration Lambda functions.
Follow the instructions in Deactivate a Developer Edition Org to deactivate your Salesforce Developer org.

Conclusion
In this post, we demonstrated how the Live-Call Analytics sample project can accelerate your adoption of real-time contact center analytics and integration. Rather than building from scratch, we show how to use the existing code base with the pre-built integration points with the start of call and post-call Lambda hooks. This enhances agent productivity by integrating with Salesforce to look up and update case records. Explore our open-source project and enhance the CRM pre- and post-call Lambda hooks to accommodate your use case.

About the Authors
Kishore Dhamodaran is a Senior Solutions Architect at AWS.
Bob Strahan is a Principal Solutions Architect in the AWS Language AI Services team.
Christopher Lott is a Senior Solutions Architect in the AWS AI Language Services team. He has 20 years of enterprise software development experience. Chris lives in Sacramento, California and enjoys gardening, aerospace, and traveling the world.
Babu Srinivasan is a Sr. Specialist SA – Language AI services in the World Wide Specialist organization at AWS, with over 24 years of experience in IT and the last 6 years focused on the AWS Cloud. He is passionate about AI/ML. Outside of work, he enjoys woodworking and entertains friends and family (sometimes strangers) with sleight of hand card magic.

20+ Best AI Tools For Startups (2023)

Workplace creativity, analysis, and decision-making are all being revolutionized by AI. Today, artificial intelligence capabilities present a tremendous opportunity for businesses to hasten expansion and better control internal processes. Artificial intelligence applications are vast, ranging from automation and predictive analytics to personalization and content development. Here is a rundown of the best artificial intelligence tools that can give young businesses a leg up and speed up their expansion.

AdCreative.ai

Boost your advertising and social media game with AdCreative.ai – the ultimate Artificial Intelligence solution. Say goodbye to hours of creative work and hello to the high-converting ad and social media posts generated in mere seconds. Maximize your success and minimize your effort with AdCreative.ai today.

DALL·E 2

OpenAI’s DALLE 2 is a cutting-edge AI art generator that creates unique and creative visuals from a single text input. Its AI model was trained on a huge dataset of images and textual descriptions to produce detailed and visually attractive images in response to written requests. Startups can use DALLE 2 to create images in advertisements and on their websites and social media pages. Businesses can save time and money by not manually sourcing or creating graphics from the start, thanks to this method of generating different images from text. 

Otter AI

Using artificial intelligence, Otter.AI empowers users with real-time transcriptions of meeting notes that are shareable, searchable, accessible, and secure. Get a meeting assistant that records audio, writes notes, automatically captures slides, and generates summaries.

Notion

Notion is aiming to increase its user base through its advanced AI technology. Its latest feature, Notion AI, is a robust generative AI tool that assists users with tasks like summarizing notes, identifying action items in meetings, and creating and modifying text. Notion AI streamlines workflows by automating tedious tasks and providing suggestions and templates to users, ultimately simplifying and improving the user experience.

Motion

Motion is a clever tool that uses AI to create daily schedules that account for your meetings, tasks, and projects. Say goodbye to the hassle of planning and hello to a more productive life.

Jasper

With its outstanding content production features, Jasper, an advanced AI content generator, is making waves in the creative industry. Jasper, considered the best in its area, aids new businesses in producing high-quality content across multiple media with minimal time and effort investment. The tool’s efficiency stems from recognizing human writing patterns, which facilitates groups’ rapid production of interesting content. To stay ahead of the curve, entrepreneurs may use Jasper as an AI-powered companion to help them write better copy for landing pages and product descriptions and more intriguing and engaging social media posts.

Lavender

Lavender, a real-time AI Email Coach, is widely regarded as a game-changer in the sales industry, helping thousands of SDRs, AEs, and managers improve their email response rates and productivity. Competitive sales environments make effective communication skills crucial to success. Startups may capitalize on the competition by using Lavender to boost their email response rate and forge deeper relationships with prospective customers.

Speak AI

Speak is a speech-to-text software driven by artificial intelligence that makes it simple for academics and marketers to transform linguistic data into useful insights without custom programming. Startups can acquire an edge and strengthen customer relationships by transcribing user interviews, sales conversations, and product reviews. In addition, they can examine rivals’ material to spot trends in keywords and topics and use this information to their advantage. In addition, marketing groups can utilize speech-to-text transcription to make videos and audio recordings more accessible and generate written material that is search engine optimization (SEO) friendly and can be used in various contexts.  

GitHub Copilot

Recently, GitHub released an AI tool called GitHub Copilot, which can translate natural language questions into code recommendations in dozens of languages. This artificial intelligence (AI) tool was trained on billions of lines of code using OpenAI Codex to detect patterns in the code and make real-time, in-editor suggestions of code that implement full functionalities. A startup’s code quality, issue fixes, and feature deliveries can all benefit greatly from using GitHub Copilot. Moreover, GitHub Copilot enables developers to be more productive and efficient by handling the mundane aspects of coding so that they can concentrate on the bigger picture.

Olivia by Paradox

For faster hiring across all industries and geographies, businesses can turn to Olivia, a conversational recruiting tool developed by Paradox. This AI-powered conversational interface may be used for candidate screening, FAQs, interview scheduling, and new hire onboarding. With Olivia, entrepreneurs may locate qualified people for even the most technical positions and reclaim the hours spent on administrative activities.

Lumen5

Lumen5 is a marketing team-focused video production platform that allows for developing high-quality videos with zero technical requirements. Lumen5 uses Machine Learning to automate video editing, allowing users to quickly and easily produce high-quality videos. Startups can quickly and easily create high-quality films for social media, advertising, and thought leadership with the help of the platform’s built-in media library, which provides access to millions of stock footage, photographs, and music tracks. In addition, AI can help firms swiftly convert blog entries to videos or Zoom recordings into interesting snippets for other marketing channels.

Spellbook by Rally

Spellbook is an artificial intelligence (AI) tool that leverages OpenAI’s GPT-3 to review and recommend language for your contracts without you having to leave the comfort of a Word document. It was trained on billions of lines of legal text. Startups can use this AI tool when drafting and reviewing agreements and external contracts to identify aggressive terms, list missing clauses and definitions, and raise red flags. Spellbook can also generate new clauses and recommend common topics of negotiation based on the agreement’s context.

Grammarly

Grammarly is an AI-powered writing app that flags and corrects grammar errors as you type. A machine learning algorithm trained on a massive dataset of documents containing known faults drives the system. Enter your content (or copy and paste it) into Grammarly, and the program will check it for mistakes. Furthermore, the program “reads” the mood of your work and makes suggestions accordingly. You can choose to consider the recommendations or not. As an AI tool, Grammarly automates a process that previously required human intervention (in this case, proofreading). Use an AI writing checker like Grammarly, and you’ll save yourself a ton of time.

ChatBot

Chatbots are one of the most well-known uses of artificial intelligence. Computer programs called “chatbots” attempt to pass as humans in online conversations. They process user input using NLP algorithms that enable them to respond appropriately. From assisting customers to promoting products, chatbots have many potential applications. Chatbots on websites and mobile apps have increased in recent years to provide constant help to customers. Whether answering basic questions or solving complex problems, chatbots are up to the challenge. In addition, businesses can use them to make suggestions to customers, such as offering related items or services.

Zendesk

Keeping track of customer support inquiries can take time and effort, especially for smaller organizations. Zendesk is an artificial intelligence (AI)-powered platform for managing customer assistance. Zendesk goes above and beyond the capabilities of chatbots by discovering trends and patterns in customer service inquiries. Useful metrics are automatically gathered, such as typical response times and most often encountered issues. It also finds the most popular articles in your knowledge base so you can prioritize linking to them. An intuitive dashboard displays all this information for a bird’s-eye view of your customer service.

Timely

Timely is an AI-powered calendar app that will revolutionize how you schedule your day. It integrates with your regular software to make tracking time easier for your business. Track your team’s efficiency, identify time-consuming tasks, and understand how your company spends its resources. Timely is a fantastic tool for increasing the effectiveness and efficiency of your team. You can see how your staff spends their time in real-time and adjust workflows accordingly.

AIReflex

If you own an online store, you understand the ongoing threat of fraud. Companies lose billions of dollars annually to credit card fraud, which can also hurt your reputation. Through the analysis of client behavior patterns, fraud can be prevented with the help of AI. Machine learning algorithms are used by businesses like aiReflex to sift through client data in search of signs of fraud. It would be impractical and time-consuming to inspect every transaction manually. However, this can be automated with the help of AI, which will keep an eye on all of your financial dealings and flag anything that looks fishy. Your company will be safe from fraudulent activity if you take this precaution.

Murf AI

Murf is an artificial intelligence–powered text-to-speech tool. It has a wide range of applications, from speech generation for corporate training to use in audiobook and podcast production. It is a highly flexible tool that may also be used for voiceovers in promotional videos or infomercials. Murf is a wonderful option if you need to generate a speech but don’t have the funds to hire a professional voice actor. Choosing a realistic-sounding voice from their more than 120 options in 20 languages is easy. Their studio is easy to use, and you may incorporate audio, video, and still photographs into your production. As a bonus, you have complete command over the rate, pitch, and intonation of your recording, allowing you to mimic the performance of a trained voice actor.

ChatGPT

OpenAI’s ChatGPT is a massive language model built on the GPT-3.5 framework. It can produce logical and appropriate answers to various inquiries because it has been trained on large text data. Because ChatGPT can automate customer care and support, it has helped startups provide 24/7 help without hiring a huge customer service department. For instance, the Indian food delivery firm Swiggy has used ChatGPT to enhance customer service and shorten response times, resulting in happier and more loyal customers.

BARD by Google

Google’s Bard uses the Language Model for Dialogue Applications (LaMDA) as an artificially intelligent chatbot and content-generating tool. Its sophisticated communication abilities have been of great use to new businesses. New companies have used Bard to improve their software development, content creation, and customer service. For example, virtual assistant startup Robin AI has implemented Bard to boost customer service and answer quality. Startups can now provide more tailored and interesting user experiences because of Bard’s intelligent and context-aware dialogue production, increasing customer satisfaction and revenue.

BEAUTIFUL.AI

Small business owners and founders often need persuasive presentations to win over investors and new clientele. Create great presentations without spending hours in PowerPoint or Slides by using Beautiful.ai. The software will automatically generate engaging slides from the data you provide, like text and graphics. Over 60 editable slide templates and multiple presentation layouts are available on Beautiful.ai. Try it out and see if it helps you make a better impression.

DUMME

If you want to reach millennials and other young people with short attention spans, you need a presence on TikTok and Instagram. Dumme is a useful tool for extracting key moments from longer videos and podcasts to make shorts (short videos to share on social media). You can use Dumme to pick the best moments from any video or audio you upload and turn them into shorts. It automatically creates a short video with a title, description, and captions suitable for sharing online, so you can produce social media clips without spending hours in front of a computer.

Cohere Generate

Cohere Generate is a language AI platform from the firm Cohere. It helps organizations and startups save time and effort in creating large-scale, personalized text content. It employs NLP and machine learning algorithms to develop content that fits the brand’s voice and tone. Use this tool to boost your startup’s online visibility, expand your reach, and strengthen your content marketing strategy.

Synthesia

Synthesia is a cutting-edge video synthesis platform that has been a huge boon to the video production efforts of new businesses. It uses artificial intelligence to eliminate the need for costly and time-consuming video shoots by fusing a human performer’s facial emotions and lip movements with the audio. To improve their advertising campaigns, product presentations, and customer onboarding procedures, startups may use Synthesia to create tailored video content at scale. For instance, entrepreneurs can produce multilingual, locally adapted videos or dynamic video ads with little to no more work. Synthesia gives young companies the tools to reach more people at a lower cost per unit while still delivering high-quality content.

The post 20+ Best AI Tools For Startups (2023) appeared first on MarkTechPost.

Meet Seal: An AI Framework that Pursues ‘Segment Any Point Cloud Seq …

Large Language Models (LLMs) have taken the Artificial Intelligence community by storm. Their recent impact and incredible performance have contributed to a wide range of industries such as healthcare, finance, and entertainment. Well-known foundation models such as GPT-3.5, GPT-4, DALL·E 2, and BERT perform extraordinary tasks and ease our lives by generating unique content from just a short natural language prompt.

Recent vision foundation models (VFMs) like SAM, X-Decoder, and SEEM have driven many advances in computer vision. Although VFMs have made tremendous progress on 2D perception tasks, research on 3D VFMs remains limited, and researchers have suggested that extending current 2D VFMs to 3D perception tasks is required. One crucial 3D perception task is the segmentation of point clouds captured by LiDAR sensors, which is essential for the safe operation of autonomous vehicles.

Existing point cloud segmentation techniques mainly rely on sizable datasets that have been annotated for training; however, labeling point clouds is time-consuming and difficult. To overcome all the challenges, a team of researchers has introduced Seal, a framework that uses vision foundation models for segmenting diverse automotive point cloud sequences. Inspired by cross-modal representation learning, Seal gathers semantically rich knowledge from VFMs to support self-supervised representation learning on automotive point clouds. The main idea is to develop high-quality contrastive samples for cross-modal representation learning using a 2D-3D relationship between LiDAR and camera sensors.

Seal possesses three key properties: scalability, consistency, and generalizability.

Scalability – Seal makes use of VFMs by transferring their outputs onto point clouds, doing away with the necessity for 2D or 3D annotations during the pretraining phase. This scalability lets it handle vast amounts of data and helps eliminate the time-consuming need for human annotation.

Consistency – The architecture enforces spatial and temporal links at both the camera-to-LiDAR and point-to-segment stages. Seal enables efficient cross-modal representation learning by capturing the interactions between the vision (camera) and LiDAR sensors, which helps ensure that the learned representations incorporate pertinent and coherent information from both modalities.

Generalizability – Seal enables knowledge transfer to downstream applications involving various point cloud datasets. It generalizes to datasets with different resolutions, sizes, and levels of cleanliness or contamination, and to both real and synthetic data.

Some of the key contributions mentioned by the team are –

The proposed framework, Seal, is scalable, consistent, and generalizable, and is designed to capture semantic-aware spatial and temporal consistency.

It allows the extraction of useful features from automobile point cloud sequences.

The authors have stated that this study is the first to use 2D vision foundation models for self-supervised representation learning on a significant scale of 3D point clouds.

Across 11 different point cloud datasets with various data configurations, Seal has performed better than earlier methods in both linear probing and fine-tuning for downstream applications.

For evaluation, the team performed tests on eleven distinct point cloud datasets to assess Seal’s performance. The outcomes demonstrated Seal’s superiority over existing approaches. On the nuScenes dataset, Seal achieved a remarkable mean Intersection over Union (mIoU) of 45.0% after linear probing, surpassing random initialization by 36.9% mIoU and outperforming previous state-of-the-art methods by 6.1% mIoU. Seal also showed significant performance gains in twenty different few-shot fine-tuning tasks across all eleven tested point cloud datasets.

Check Out The Paper, Github, and Tweet. Don’t forget to join our 24k+ ML SubReddit, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more. If you have any questions regarding the above article or if we missed anything, feel free to email us at Asif@marktechpost.com

The post Meet Seal: An AI Framework that Pursues ‘Segment Any Point Cloud Sequences’ by Leveraging 2D Vision Foundation Models for Self-Supervised Learning on Large-Scale 3D Point Clouds appeared first on MarkTechPost.

Researchers From Max Plank Propose MIME: A Generative AI Model that Ta …

Humans are always interacting with their surroundings. They move about a space, touch things, sit on chairs, or sleep on beds. These interactions detail how the scene is set up and where the objects are. A mime is a performer who uses their comprehension of such relationships to create a rich, imaginative, 3D environment with nothing more than their body movements. Can they teach a computer to mimic human actions and make the appropriate 3D scene? Numerous fields, including architecture, gaming, virtual reality, and the synthesis of synthetic data, might benefit from this technique. For instance, there are substantial datasets of 3D human motion, such as AMASS, but these datasets seldom include details on the 3D setting in which they were collected. 

Could believable 3D scenes be created for all the motions in AMASS? If so, AMASS could be used to make training data with realistic human-scene interaction. To answer these questions, the researchers developed a novel technique called MIME (Mining Interaction and Movement to infer 3D Environments), which creates believable interior 3D scenes based on 3D human motion. What makes this possible? The fundamental assumptions are as follows: (1) human motion through space indicates the absence of objects, essentially defining regions of the scene free of furniture; and (2) when the human is in contact with the scene, this constrains the type and placement of 3D objects; for instance, a sitting person must be seated on a chair, sofa, bed, and so on.

Figure 1: Estimating 3D scenes from human movement. They recreate realistic 3D settings in which the motion may have occurred given 3D human motion (left), such as that obtained from motion capture or body-worn sensors. Their generative model is able to generate several realistic scenarios (right) with proper human-scene interaction that take into account the locations and postures of the person.

Researchers from the Max Planck Institute for Intelligent Systems in Germany and Adobe created MIME, a transformer-based auto-regressive 3D scene generation technique, to give these intuitions tangible form. Given an empty floor plan and a human motion sequence, MIME predicts the furniture that will come into contact with the human. Additionally, it predicts believable items that do not come into contact with people but fit in with other objects and respect the free-space constraints implied by people’s motions. To condition the 3D scene generation on human motion, they partition the motion into contact and non-contact snippets. They estimate potential contact poses using POSA. For the non-contact poses, they project the foot vertices onto the ground plane to establish the room’s free space, which they record as 2D floor maps.

The contact vertices predicted by POSA create 3D bounding boxes that reflect the contact poses and the associated 3D human body models. The transformer takes this data as input and autoregressively predicts the objects that satisfy the contact and free-space criteria; see Fig. 1. To train MIME, they expanded the large-scale synthetic scene dataset 3D-FRONT to create a new dataset named 3D-FRONT HUMAN. They automatically add people to the 3D scenes, including non-contact people (a series of walking motions and people standing) and contact people (people sitting, touching, and lying). To do this, they use static contact poses from RenderPeople scans and motion sequences from AMASS.

MIME creates a realistic 3D scene layout for the input motion at inference time, represented as 3D bounding boxes. They choose 3D models from the 3D-FUTURE collection based on this arrangement; then, they fine-tune their 3D placement based on geometric constraints between the human positions and the scene. Their method produces a 3D scene that supports human contact and motion while placing convincing objects in free space, unlike pure 3D scene generation systems such as ATISS. In contrast to Pose2Room, a recent pose-conditioned generative model, their approach also generates items not in contact with the person, anticipating the complete scene rather than individual objects. They show that their approach works without any adjustments on real recorded motion sequences, such as PROX-D.

In conclusion, they contribute the following: 

• A brand-new motion-conditioned generative model for 3D room scenes that auto-regressively creates things that come into contact with people while avoiding occupying motion-defined vacant space. 

• A brand-new 3D scene dataset made up of interacting people and people in free space was created by filling 3D FRONT with motion data from AMASS and static contact/standing poses from RenderPeople.

The code is available on GitHub along with a video demo. They also have a video explanation of their approach.

Check Out The Paper, Github, and Project. Don’t forget to join our 24k+ ML SubReddit, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more. If you have any questions regarding the above article or if we missed anything, feel free to email us at Asif@marktechpost.com

The post Researchers From Max Plank Propose MIME: A Generative AI Model that Takes 3D Human Motion Capture and Generates Plausible 3D Scenes that are Consistent with the Motion appeared first on MarkTechPost.

Reduce energy consumption of your machine learning workloads by up to …

Machine learning (ML) engineers have traditionally focused on striking a balance between model training and deployment cost vs. performance. Increasingly, sustainability (energy efficiency) is becoming an additional objective for customers. This is important because training ML models and then using the trained models to make predictions (inference) can be highly energy-intensive tasks. In addition, more and more applications around us have become infused with ML, and new ML-powered applications are conceived every day. A popular example is OpenAI’s ChatGPT, which is powered by a state-of-the-art large language model (LLM). For reference, GPT-3, an earlier-generation LLM, has 175 billion parameters and requires months of non-stop training on a cluster of thousands of accelerated processors. The Carbontracker study estimates that training GPT-3 from scratch may emit up to 85 metric tons of CO2 equivalent, using clusters of specialized hardware accelerators.
There are several ways AWS is enabling ML practitioners to lower the environmental impact of their workloads. One way is through providing prescriptive guidance around architecting your AI/ML workloads for sustainability. Another way is by offering managed ML training and orchestration services such as Amazon SageMaker Studio, which automatically tears down and scales up ML resources when not in use, and provides a host of out-of-the-box tooling that saves cost and resources. Another major enabler is the development of energy efficient, high-performance, purpose-built accelerators for training and deploying ML models.
The focus of this post is on hardware as a lever for sustainable ML. We present the results of recent performance and power draw experiments conducted by AWS that quantify the energy efficiency benefits you can expect when migrating your deep learning workloads from other inference- and training-optimized accelerated Amazon Elastic Compute Cloud (Amazon EC2) instances to AWS Inferentia and AWS Trainium. Inferentia and Trainium are AWS’s recent additions to its portfolio of purpose-built accelerators specifically designed by Amazon’s Annapurna Labs for ML inference and training workloads.
AWS Inferentia and AWS Trainium for sustainable ML
To provide you with realistic numbers of the energy savings potential of AWS Inferentia and AWS Trainium in a real-world application, we have conducted several power draw benchmark experiments. We have designed these benchmarks with the following key criteria in mind:

First, we wanted to make sure that we captured the direct energy consumption attributable to the test workload, including not just the ML accelerator but also the compute, memory, and network. Therefore, in our test setup, we measured power draw at the instance level.
Second, when running the training and inference workloads, we ensured that all instances were operating at their respective physical hardware limits and took measurements only after that limit was reached to ensure comparability.
Finally, we wanted to be certain that the energy savings reported in this post could be achieved in a practical real-world application. Therefore, we used common customer-inspired ML use cases for benchmarking and testing.

The results are reported in the following sections.
Inference experiment: Real-time document understanding with LayoutLM
Inference, as opposed to training, is a continuous, unbounded workload that doesn’t have a defined completion point. It therefore makes up a large portion of the lifetime resource consumption of an ML workload. Getting inference right is key to achieving high performance, low cost, and sustainability (better energy efficiency) along the full ML lifecycle. With inference tasks, customers are usually interested in achieving a certain inference rate to keep up with the ingest demand.
The experiment presented in this post is inspired by a real-time document understanding use case, which is a common application in industries like banking or insurance (for example, for claims or application form processing). Specifically, we select LayoutLM, a pre-trained transformer model used for document image processing and information extraction. We set a target SLA of 1,000,000 inferences per hour, a value often considered real time, and then specify two hardware configurations capable of meeting this requirement: one using Amazon EC2 Inf1 instances, featuring AWS Inferentia, and one using comparable accelerated EC2 instances optimized for inference tasks. Throughout the experiment, we track several indicators to measure inference performance, cost, and energy efficiency of both hardware configurations. The results are presented in the following figure.

Figure: Performance, cost, and energy efficiency results of the inference benchmarks

AWS Inferentia delivers 6.3 times higher inference throughput. As a result, with Inferentia, you can run the same real-time LayoutLM-based document understanding workload on fewer instances (6 AWS Inferentia instances vs. 33 other inference-optimized accelerated EC2 instances, an 82% reduction), use less than a tenth of the energy in the process (a 92% reduction), and achieve a significantly lower cost per inference (USD 2 vs. USD 25 per million inferences, a 91% cost reduction).
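For context, the following is a minimal sketch of a single LayoutLM inference call using the Hugging Face transformers library. It is illustrative only and is not the benchmark code behind these measurements: the checkpoint name, label count, and word bounding boxes are assumptions, and in a real document understanding pipeline the words and boxes would come from an OCR step.

import torch
from transformers import LayoutLMTokenizer, LayoutLMForTokenClassification

# Hypothetical checkpoint and label count, for illustration only.
tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = LayoutLMForTokenClassification.from_pretrained("microsoft/layoutlm-base-uncased", num_labels=5)
model.eval()

# Example words with normalized (0-1000) bounding boxes; in practice these come from OCR.
words = ["Invoice", "Total:", "1,200.00", "USD"]
boxes = [[50, 60, 150, 80], [50, 100, 120, 120], [130, 100, 220, 120], [230, 100, 280, 120]]

# Tokenize word by word so every sub-token inherits its word's box.
tokens, token_boxes = [], []
for word, box in zip(words, boxes):
    word_tokens = tokenizer.tokenize(word)
    tokens.extend(word_tokens)
    token_boxes.extend([box] * len(word_tokens))

# Add the special tokens and their conventional boxes.
input_ids = tokenizer.convert_tokens_to_ids([tokenizer.cls_token] + tokens + [tokenizer.sep_token])
token_boxes = [[0, 0, 0, 0]] + token_boxes + [[1000, 1000, 1000, 1000]]

input_ids = torch.tensor([input_ids])
bbox = torch.tensor([token_boxes])
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    outputs = model(input_ids=input_ids, bbox=bbox, attention_mask=attention_mask)
print(outputs.logits.shape)  # (1, sequence_length, num_labels)

Hitting a target of 1,000,000 inferences per hour then comes down to provisioning enough instances so that the aggregate throughput of calls like this one meets the demand.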
Training experiment: Training BERT Large from scratch
Training, as opposed to inference, is a finite process that is repeated much less frequently. ML engineers are typically interested in high cluster performance to reduce training time while keeping cost under control. Energy efficiency is a secondary (yet growing) concern. With AWS Trainium, there is no trade-off decision: ML engineers can benefit from high training performance while also optimizing for cost and reducing environmental impact.
To illustrate this, we select BERT Large, a popular language model used for natural language understanding use cases such as chatbot-based question answering and conversational response prediction. Training a well-performing BERT Large model from scratch typically requires 450 million sequences to be processed. We compare two cluster configurations, each with a fixed size of 16 instances and capable of training BERT Large from scratch (450 million sequences processed) in less than a day. The first uses traditional accelerated EC2 instances. The second setup uses Amazon EC2 Trn1 instances featuring AWS Trainium. Again, we benchmark both configurations in terms of training performance, cost, and environmental impact (energy efficiency). The results are shown in the following figure.

Figure: Performance, cost, and energy efficiency results of the training benchmarks

In the experiments, AWS Trainium-based instances outperformed the comparable training-optimized accelerated EC2 instances by a factor of 1.7 in terms of sequences processed per hour, cutting the total training time by 43% (2.3 hours versus 4 hours on comparable accelerated EC2 instances). As a result, when using a Trainium-based instance cluster, the total energy consumption for training BERT Large from scratch is approximately 29% lower compared to a same-sized cluster of comparable accelerated EC2 instances. Again, these performance and energy efficiency benefits also come with significant cost improvements: the cost to train the BERT Large model is approximately 62% lower on Trainium instances (USD 787 versus USD 2,091 per full training run).
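As a quick sanity check, the headline percentages follow directly from the raw figures reported above (a back-of-the-envelope calculation, not additional benchmark data):

# Speedup and savings implied by the reported raw numbers.
trainium_hours, baseline_hours = 2.3, 4.0
trainium_cost, baseline_cost = 787, 2091

speedup = baseline_hours / trainium_hours            # roughly 1.7x more sequences per hour
time_reduction = 1 - trainium_hours / baseline_hours  # roughly 0.43, i.e. ~43% shorter training time
cost_reduction = 1 - trainium_cost / baseline_cost    # roughly 0.62, i.e. ~62% lower training cost

print(f"speedup ~{speedup:.2f}x, time saved ~{time_reduction:.2%}, cost saved ~{cost_reduction:.2%}")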
Getting started with AWS purpose-built accelerators for ML
Although the experiments conducted here all use standard models from the natural language processing (NLP) domain, AWS Inferentia and AWS Trainium excel with many other complex model architectures, including LLMs and the most challenging generative AI architectures that users are building (such as GPT-3). These accelerators do particularly well with models of over 10 billion parameters and with computer vision models like Stable Diffusion (see Model Architecture Fit Guidelines for more details). Indeed, many of our customers are already using Inferentia and Trainium for a wide variety of ML use cases.
To run your end-to-end deep learning workloads on AWS Inferentia- and AWS Trainium-based instances, you can use AWS Neuron. Neuron is an end-to-end software development kit (SDK) that includes a deep learning compiler, runtime, and tools that are natively integrated into the most popular ML frameworks like TensorFlow and PyTorch. You can use the Neuron SDK to easily port your existing TensorFlow or PyTorch deep learning ML workloads to Inferentia and Trainium and start building new models using the same well-known ML frameworks. For easier setup, use one of our Amazon Machine Images (AMIs) for deep learning, which come with many of the required packages and dependencies. Even simpler: you can use Amazon SageMaker Studio, which natively supports TensorFlow and PyTorch on Inferentia and Trainium (see the aws-samples GitHub repo for an example).
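As a rough illustration of the porting workflow, the following sketch compiles a Hugging Face PyTorch model for Neuron with torch_neuronx.trace. It assumes an Inf2 or Trn1 instance with the Neuron SDK installed and uses a generic BERT checkpoint as a stand-in for your own model; treat it as a starting point under those assumptions rather than a complete recipe.

import torch
import torch_neuronx  # part of the AWS Neuron SDK; requires an Inf2/Trn1 instance
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Stand-in model; torchscript=True makes the model return plain tuples, which keeps tracing simple.
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, torchscript=True)
model.eval()

# Example inputs with a fixed sequence length; Neuron compiles for the shapes it is traced with.
encoded = tokenizer("An example claims document to classify",
                    padding="max_length", max_length=128, truncation=True, return_tensors="pt")
example_inputs = (encoded["input_ids"], encoded["attention_mask"])

# Ahead-of-time compilation for the Neuron accelerator.
neuron_model = torch_neuronx.trace(model, example_inputs)

# Run inference on the compiled model like a regular TorchScript module.
with torch.no_grad():
    outputs = neuron_model(*example_inputs)
print(outputs[0].shape)  # classification logits

The traced model can typically be persisted with torch.jit.save and reloaded in your serving code, so compilation happens once rather than at every startup.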
One final note: while Inferentia and Trainium are purpose built for deep learning workloads, many less complex ML algorithms (for example, XGBoost, LightGBM, and even some CNNs) can perform well on CPU-based instances. In these cases, a migration to AWS Graviton3 may significantly reduce the environmental impact of your ML workloads. AWS Graviton-based instances use up to 60% less energy for the same performance than comparable EC2 instances.
Conclusion
There is a common misconception that running ML workloads in a sustainable and energy-efficient fashion means sacrificing performance or cost. With AWS purpose-built accelerators for machine learning, ML engineers don’t have to make that trade-off. Instead, they can run their deep learning workloads on highly specialized, purpose-built deep learning hardware, such as AWS Inferentia and AWS Trainium, that significantly outperforms comparable accelerated EC2 instance types, delivering lower cost, higher performance, and up to 90% better energy efficiency, all at the same time. To start running your ML workloads on Inferentia and Trainium, check out the AWS Neuron documentation or spin up one of the sample notebooks. You can also watch the AWS re:Invent 2022 talk on Sustainability and AWS silicon (SUS206), which covers many of the topics discussed in this post.

About the Authors
Karsten Schroer is a Solutions Architect at AWS. He supports customers in leveraging data and technology to drive sustainability of their IT infrastructure and build data-driven solutions that enable sustainable operations in their respective verticals. Karsten joined AWS following his PhD studies in applied machine learning & operations management. He is truly passionate about technology-enabled solutions to societal challenges and loves to dive deep into the methods and application architectures that underlie these solutions.
Kamran Khan is a Sr. Technical Product Manager at AWS Annapurna Labs. He works closely with AI/ML customers to shape the roadmap for AWS purpose-built silicon innovations coming out of Amazon’s Annapurna Labs. His specific focus is on accelerated deep learning chips, including AWS Trainium and AWS Inferentia. Kamran has 18 years of experience in the semiconductor industry and over a decade of experience helping developers achieve their ML goals.

SalesForce AI Researchers Introduce Mask-free OVIS: An Open-Vocabulary …

Instance segmentation refers to the computer vision task of identifying and differentiating multiple objects that belong to the same class within an image by treating them as distinct entities. Over the past few years, there has been a significant upturn in the number of instance segmentation techniques, thanks to rapid advancements in deep learning. For instance, architectures built on convolutional neural networks (CNNs), such as Mask R-CNN, are widely used for instance segmentation. The defining characteristic of such techniques is that they combine object detection with pixel-wise segmentation to identify objects and generate accurate masks for each instance within an image, leading to a better understanding of the overall picture.
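To make the idea concrete, here is a minimal sketch of instance segmentation with a pretrained Mask R-CNN from torchvision (a recent version with the weights API is assumed). It is a generic illustration of the technique, not code from the paper discussed here, and the confidence threshold is an arbitrary example value.

import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

# Load a Mask R-CNN pretrained on COCO (about 80 object categories).
model = maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# One dummy RGB image with values in [0, 1]; replace with a real image tensor.
image = torch.rand(3, 480, 640)

with torch.no_grad():
    prediction = model([image])[0]

# Keep only confident detections; each instance gets its own box, label, and soft mask.
keep = prediction["scores"] > 0.5
boxes = prediction["boxes"][keep]    # (N, 4) bounding boxes
labels = prediction["labels"][keep]  # (N,) COCO class indices
masks = prediction["masks"][keep]    # (N, 1, H, W) per-instance masks
print(f"{len(boxes)} instances above the threshold")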

However, existing detection models have a notable limitation in the number of base categories they can identify. Previous work has shown that a detection model trained on the COCO dataset can detect roughly 80 categories; any additional categories would necessitate human annotation, which is laborious and time-consuming. To counter this, Open Vocabulary (OV) methods leverage image-caption pairs and vision-language models to learn new categories. However, there are vast differences in supervision between base and novel categories, which often leads to overfitting on base categories and poor generalization to novel ones. As a result, there is a strong need for a method that enhances these detection models to detect new categories without much human intervention, making them more practical and scalable for real-world applications.

To address this issue, researchers at Salesforce AI have devised a method in which bounding-box and instance-mask annotations are generated from image-caption pairs. Their proposed method, the Mask-free OVIS pipeline, takes advantage of weak supervision by utilizing pseudo-mask annotations derived from a vision-language model to learn both base and novel categories. This approach eliminates the need for laborious human annotation and addresses the issue of overfitting. Experimental evaluations demonstrate that their methodology surpasses existing state-of-the-art open-vocabulary instance segmentation models. Moreover, their research has been accepted at the Computer Vision and Pattern Recognition Conference (CVPR) 2023.

The pipeline consists of two main stages: pseudo-mask generation and open-vocabulary instance segmentation. In the first stage, a pseudo-mask annotation is created for the object of interest from the image-caption pair. Using a pre-trained vision-language model, the object’s name serves as a text prompt to localize the object, and an iterative masking process with GradCAM refines the pseudo-mask so that it covers the entire object accurately. In the second stage, a weakly supervised segmentation (WSS) network uses the previously generated bounding boxes and is trained to select the proposal with the highest overlap with the GradCAM activation map. Finally, a Mask R-CNN model is trained on the generated pseudo annotations, completing the pipeline.
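The proposal-selection step can be pictured with the following toy sketch: among candidate boxes, keep the one whose region captures the largest share of a GradCAM activation map. The activation map, proposals, and overlap score here are all synthetic stand-ins; the actual Mask-free OVIS implementation is in the authors’ code release.

import numpy as np

def activation_overlap(cam, box):
    """Fraction of total CAM activation that falls inside box = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return float(cam[y1:y2, x1:x2].sum() / (cam.sum() + 1e-8))

# Synthetic GradCAM map with a hot spot roughly covering the object of interest.
cam = np.zeros((224, 224), dtype=np.float32)
cam[60:120, 80:160] = 1.0

# Hypothetical region proposals as (x1, y1, x2, y2) boxes.
proposals = [(0, 0, 100, 100), (70, 50, 170, 130), (150, 150, 220, 220)]

# Pick the proposal with the highest overlap with the activation map as the pseudo box.
scores = [activation_overlap(cam, box) for box in proposals]
best_box = proposals[int(np.argmax(scores))]
print("selected pseudo-box:", best_box, "overlap:", round(max(scores), 3))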

The pipeline thus eliminates the need for any human involvement by harnessing pre-trained vision-language models and weakly supervised models to automatically generate pseudo-mask annotations, which can be employed as additional training data. To evaluate their pipeline, the researchers conducted several experiments on widely used datasets such as MS-COCO and OpenImages. The findings demonstrate that employing pseudo annotations in their approach leads to strong performance in detection and instance segmentation tasks, surpassing other methods that depend on human annotations. The vision-language guided approach to pseudo-annotation generation devised by the researchers at Salesforce paves the way for more advanced and precise instance segmentation models that eliminate the need for human annotators.

Check out the paper, the project page, and the reference article for more details.

The post SalesForce AI Researchers Introduce Mask-free OVIS: An Open-Vocabulary Instance Segmentation Mask Generator appeared first on MarkTechPost.