WAYVE Introduces GAIA-1: A New Generative AI Model for Autonomy that Creates Realistic Driving Videos by Leveraging Video, Text, and Action Inputs

The automotive industry has long pursued the goal of autonomous driving, recognizing its potential to revolutionize transportation and enhance road safety. However, developing autonomous systems that can effectively navigate complex real-world scenarios has proven to be a significant challenge. A cutting-edge generative AI model called GAIA-1 has been introduced in response to this challenge, designed explicitly for autonomy.

GAIA-1 is a research model that utilizes video, text, and action inputs to generate realistic driving videos while offering fine-grained control over ego-vehicle behavior and scene features. Its unique capability to manifest the generative rules of the real world represents a significant advancement in embodied AI, allowing artificial systems to comprehend and replicate real-world practices and behaviors. The introduction of GAIA-1 opens up limitless possibilities for innovation in the field of autonomy, facilitating enhanced and accelerated training of autonomous driving technology.

The GAIA-1 model is a multi-modal approach that leverages video, text, and action inputs to generate realistic driving videos. By training on a vast corpus of real-world UK urban driving data, the model learns to predict subsequent frames in a video sequence, exhibiting autoregressive prediction capabilities similar to those of large language models (LLMs). GAIA-1 goes beyond being a standard generative video model by functioning as a true world model. It comprehends and disentangles important driving concepts such as vehicles, pedestrians, road layouts, and traffic lights, providing precise control over ego-vehicle behavior and other scene features.
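
To make the autoregressive world-model idea concrete, the toy sketch below treats video, text, and action inputs as if they were already discretized into a shared token vocabulary and trains a causal Transformer to predict the next token. This is purely an illustration under those assumptions; it is not Wayve's GAIA-1 architecture or training code, and the tokenizers that would produce such tokens are omitted.

import torch
import torch.nn as nn

vocab_size, d_model, seq_len = 1024, 256, 128

embed = nn.Embedding(vocab_size, d_model)
layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
backbone = nn.TransformerEncoder(layer, num_layers=4)
head = nn.Linear(d_model, vocab_size)

def next_token_loss(tokens):
    # tokens: (batch, seq_len) interleaved video/text/action token ids
    causal_mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1) - 1)
    hidden = backbone(embed(tokens[:, :-1]), mask=causal_mask)  # condition only on past tokens
    logits = head(hidden)                                       # predict the next token at each position
    return nn.functional.cross_entropy(
        logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1)
    )

loss = next_token_loss(torch.randint(0, vocab_size, (2, seq_len)))

In GAIA-1 itself the heavy lifting is done by learned tokenizers and a far larger model; the point here is only the next-token training signal that lets a world model roll future scenes forward.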

One of the remarkable achievements of GAIA-1 is its ability to manifest the underlying generative rules of the world. Through extensive training on diverse driving data, the model internalizes the inherent structure and patterns of the real world, generating highly realistic and diverse driving scenes. This breakthrough signifies a significant step toward realizing embodied AI, where artificial systems can interact with the world and comprehend and reproduce its rules and behaviors.

A crucial component of autonomous driving is a world model, a representation of the world based on accumulated knowledge and observations. World models enable predictions of future events, a fundamental requirement for autonomous driving. These models can be learned simulators or mental "what if" thought experiments for model-based reinforcement learning and planning. By incorporating world models into driving models, a better understanding of human decisions can be achieved, leading to improved generalization in real-world situations. GAIA-1 builds on over five years of research into prediction and world models, refining approaches such as future prediction, driving simulation, bird's-eye-view prediction, and learned world models.

Additionally, GAIA-1 can extrapolate beyond its training data, enabling it to imagine scenarios it has never encountered. This capability is valuable for safety evaluation, as it allows the model to generate simulated data representing incorrect driving behaviors, which can be used to evaluate driving models in a safe and controlled environment.

In conclusion, GAIA-1 represents a game-changing generative AI research model with immense potential for advancements in research, simulation, and training within the autonomy field. Its ability to generate realistic and diverse driving scenes opens new possibilities for training autonomous systems to navigate complex real-world scenarios more effectively. Continued research and insights on GAIA-1 are eagerly anticipated as it continues to push the boundaries of autonomous driving.

Check Out The Reference Article.


Meet LLM-Blender: A Novel Ensembling Framework to Attain Consistently Superior Performance by Leveraging the Diverse Strengths of Multiple Open-Source Large Language Models (LLMs)

Large Language Models have shown remarkable performance across a massive range of tasks. From producing unique and creative content and answering questions to translating languages and summarizing textual paragraphs, LLMs have been successful in imitating humans. Some well-known LLMs like GPT, BERT, and PaLM have been in the headlines for accurately following instructions and accessing vast amounts of high-quality data. Models like GPT-4 and PaLM are not open-source, which prevents anyone from understanding their architectures and training data. On the other hand, the open-source nature of LLMs like Pythia, LLaMA, and Flan-T5 gives researchers the opportunity to fine-tune and improve the models on custom instruction datasets. This enables the development of smaller and more efficient LLMs like Alpaca, Vicuna, OpenAssistant, and MPT.

There is no single open-source LLM that leads the market, and the best LLMs for various examples can differ greatly from one another. Therefore, in order to continuously produce improved answers for each input, it is essential to dynamically ensemble these LLMs. Biases, errors, and uncertainties can be reduced by integrating the distinctive contributions of various LLMs, thus resulting in outcomes that more closely match human preferences. To address this, researchers from the Allen Institute for Artificial Intelligence, the University of Southern California, and Zhejiang University have proposed LLM-BLENDER, an ensembling framework that consistently obtains superior performance by utilizing the many advantages of several open-source large language models. 

LLM-BLENDER consists of two modules: PAIRRANKER and GENFUSER. Their design reflects the observation that the optimal LLM can vary significantly across examples. PAIRRANKER, the first module, has been developed to identify minute quality differences among candidate outputs. It uses a pairwise comparison technique in which the original input and two candidate outputs from different LLMs serve as inputs. A cross-attention encoder such as RoBERTa jointly encodes the input and the candidate pair, and PAIRRANKER uses this encoding to determine which of the two candidates is better.

The second module, GENFUSER, focuses on merging the top-ranked candidates to generate an improved output, drawing on the strengths of the chosen candidates while minimizing their weaknesses. The goal is an output superior to that of any single LLM.
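
As a rough sketch of this rank-then-fuse pipeline (not the authors' released implementation), the snippet below assumes a hypothetical pair_ranker(input, a, b) cross-encoder that returns a positive score when candidate a is preferred, and a hypothetical gen_fuser(input, candidates) sequence-to-sequence model:

from itertools import combinations

def rank_candidates(user_input, candidates, pair_ranker):
    # Aggregate pairwise comparisons into per-candidate win counts.
    wins = {c: 0 for c in candidates}
    for a, b in combinations(candidates, 2):
        score = pair_ranker(user_input, a, b)  # > 0 means a beats b
        wins[a if score > 0 else b] += 1
    return sorted(candidates, key=lambda c: wins[c], reverse=True)

def llm_blender(user_input, candidates, pair_ranker, gen_fuser, top_n=3):
    ranked = rank_candidates(user_input, candidates, pair_ranker)
    return gen_fuser(user_input, ranked[:top_n])  # fuse the top-ranked outputs into one answer

The real PAIRRANKER aggregates pairwise scores more carefully than a simple win count, but the sketch conveys the overall flow: rank all candidates pairwise, then fuse the best few.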

For evaluation, the team has provided a benchmark dataset called MixInstruct, which incorporates Oracle pairwise comparisons and combines various instruction datasets. This dataset uses 11 popular open-source LLMs to generate multiple candidates for each input across various instruction-following tasks. It comprises training, validation, and test examples with Oracle comparisons for automatic evaluation. These oracle comparisons have been used to give candidate outputs a ground truth ranking, allowing the performance of LLM-BLENDER and other benchmark techniques to be assessed.

The experimental findings have shown that LLM-BLENDER performs much better across a range of evaluation metrics than individual LLMs and baseline techniques. It establishes a sizable performance gap, showing that the LLM-BLENDER ensembling methodology results in higher-quality output than using a single LLM or baseline method. PAIRRANKER's selections outperform individual LLMs on reference-based metrics and GPT-Rank, and through efficient fusion, GENFUSER significantly improves response quality by building on the top picks from PAIRRANKER.

LLM-BLENDER has also outperformed individual LLMs, like Vicuna, and has thus shown great potential for improving LLM deployment and research through ensemble learning.

Check Out The Paper, Project, and Github.


Onboard users to Amazon SageMaker Studio with Active Directory group-specific IAM roles

Amazon SageMaker Studio is a web-based integrated development environment (IDE) for machine learning (ML) that lets you build, train, debug, deploy, and monitor your ML models. For provisioning Studio in your AWS account and Region, you first need to create an Amazon SageMaker domain—a construct that encapsulates your ML environment. More concretely, a SageMaker domain consists of an associated Amazon Elastic File System (Amazon EFS) volume, a list of authorized users, and a variety of security, application, policy, and Amazon Virtual Private Cloud (Amazon VPC) configurations.
When creating your SageMaker domain, you can choose to use either AWS IAM Identity Center (successor to AWS Single Sign-On) or AWS Identity and Access Management (IAM) for user authentication methods. Both authentication methods have their own set of use cases; in this post, we focus on SageMaker domains with IAM Identity Center, or single sign-on (SSO) mode, as the authentication method.
With SSO mode, you set up an SSO user and group in IAM Identity Center and then grant access to either the SSO group or user from the Studio console. Currently, all SSO users in a domain inherit the domain’s execution role. This may not work for all organizations. For instance, administrators may want to set up IAM permissions for a Studio SSO user based on their Active Directory (AD) group membership. Furthermore, because administrators are required to manually grant SSO users access to Studio, the process may not scale when onboarding hundreds of users.
In this post, we provide prescriptive guidance for provisioning SSO users to Studio with least-privilege permissions based on AD group membership. This guidance enables you to quickly scale onboarding of hundreds of users to Studio while maintaining your security and compliance posture.
Solution overview
The following diagram illustrates the solution architecture.

The workflow to provision AD users in Studio includes the following steps:

1. Set up a Studio domain in SSO mode.
2. For each AD group:
   a. Set up your Studio execution role with appropriate fine-grained IAM policies.
   b. Record an entry in the AD group-role mapping Amazon DynamoDB table. Alternatively, you can adopt a naming standard for IAM role ARNs based on the AD group name and derive the IAM role ARN without needing to store the mapping in an external database.
3. Sync your AD users, groups, and memberships to AWS IAM Identity Center:
   a. If you're using an identity provider (IdP) that supports SCIM, use the SCIM API integration with IAM Identity Center.
   b. If you are using self-managed AD, you may use AD Connector.
4. When the AD group is created in your corporate AD, complete the following steps:
   a. Create a corresponding SSO group in IAM Identity Center.
   b. Associate the SSO group to the Studio domain using the SageMaker console.
5. When an AD user is created in your corporate AD, a corresponding SSO user is created in IAM Identity Center.
6. When the AD user is assigned to an AD group, an IAM Identity Center API (CreateGroupMembership) is invoked, and SSO group membership is created.
7. The preceding event is logged in AWS CloudTrail with the name AddMemberToGroup.
8. An Amazon EventBridge rule listens to CloudTrail events and matches the AddMemberToGroup rule pattern.
9. The EventBridge rule triggers the target AWS Lambda function.
10. The Lambda function calls IAM Identity Center APIs, gets the SSO user and group information, and performs the following steps to create the Studio user profile (CreateUserProfile) for the SSO user:
    a. Look up the DynamoDB table to fetch the IAM role corresponding to the AD group.
    b. Create a user profile with the SSO user and the IAM role obtained from the lookup table.
    c. The SSO user is granted access to Studio.
11. The SSO user is redirected to the Studio IDE via the Studio domain URL.

Note that, as of writing, Step 4b (associate the SSO group to the Studio domain) needs to be performed manually by an admin using the SageMaker console at the SageMaker domain level.
Set up a Lambda function to create the user profiles
The solution uses a Lambda function to create the Studio user profiles. We provide the following sample Lambda function that you can copy and modify to meet your needs for automating the creation of the Studio user profile. This function performs the following actions:

1. Receive the CloudTrail AddMemberToGroup event from EventBridge.
2. Retrieve the Studio DOMAIN_ID from the environment variable (you can alternatively hard-code the domain ID, or use a DynamoDB table if you have multiple domains).
3. Read from a dummy mapping table that matches AD groups to execution roles. You can change this to fetch from the DynamoDB table if you're using a table-driven approach. If you use DynamoDB, your Lambda function's execution role needs permissions to read from the table as well.
4. Retrieve the SSO user and AD group membership information from IAM Identity Center, based on the CloudTrail event data.
5. Create a Studio user profile for the SSO user, with the SSO details and the matching execution role.

import os
import json
import boto3

DOMAIN_ID = os.environ.get('DOMAIN_ID', 'd-xxxx')

def lambda_handler(event, context):

    print({"Event": event})

    client = boto3.client('identitystore')
    sm_client = boto3.client('sagemaker')

    event_detail = event['detail']
    group_response = client.describe_group(
        IdentityStoreId=event_detail['requestParameters']['identityStoreId'],
        GroupId=event_detail['requestParameters']['groupId'],
    )
    group_name = group_response['DisplayName']

    user_response = client.describe_user(
        IdentityStoreId=event_detail['requestParameters']['identityStoreId'],
        UserId=event_detail['requestParameters']['member']['memberId']
    )
    user_name = user_response['UserName']
    print(f"Event details: {user_name} has been added to {group_name}")

    mapping_dict = {
        "ad-group-1": "<execution-role-arn>",
        "ad-group-2": "<execution-role-arn>"
    }

    user_role = mapping_dict.get(group_name)

    if user_role:
        response = sm_client.create_user_profile(
            DomainId=DOMAIN_ID,
            SingleSignOnUserIdentifier="UserName",
            SingleSignOnUserValue=user_name,
            # if the SSO user_name value is an email,
            # add logic to handle it since Studio user profiles don't accept @ character
            UserProfileName=user_name,
            UserSettings={
                "ExecutionRole": user_role
            }
        )
        print(response)
    else:
        response = "Group is not authorized to use SageMaker. Doing nothing."
        print(response)
    return {
        'statusCode': 200,
        'body': json.dumps(response)
    }

Note that by default, the Lambda execution role doesn't have access to create user profiles or list SSO users. After you create the Lambda function, open the function's execution role in IAM and attach the following policy as an inline policy, scoping it down as needed based on your organization's requirements.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "identitystore:DescribeGroup",
                "identitystore:DescribeUser"
            ],
            "Effect": "Allow",
            "Resource": "*"
        },
        {
            "Action": "sagemaker:CreateUserProfile",
            "Effect": "Allow",
            "Resource": "*"
        },
        {
            "Action": "iam:PassRole",
            "Effect": "Allow",
            "Resource": [
                "<list-of-studio-execution-roles>"
            ]
        }
    ]
}

Set up the EventBridge rule for the CloudTrail event
EventBridge is a serverless event bus service that you can use to connect your applications with data from a variety of sources. In this solution, we create a rule-based trigger: EventBridge listens to events and matches against the provided pattern and triggers a Lambda function if the pattern match is successful. As explained in the solution overview, we listen to the AddMemberToGroup event. To set it up, complete the following steps:

1. On the EventBridge console, choose Rules in the navigation pane.
2. Choose Create rule.
3. Provide a rule name, for example, AddUserToADGroup.
4. Optionally, enter a description.
5. Select default for the event bus.
6. Under Rule type, choose Rule with an event pattern, then choose Next.
7. On the Build event pattern page, choose Event source as AWS events or EventBridge partner events.
8. Under Event pattern, choose the Custom patterns (JSON editor) tab and enter the following pattern:

{
    "source": ["aws.sso-directory"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["sso-directory.amazonaws.com"],
        "eventName": ["AddMemberToGroup"]
    }
}

9. Choose Next.
10. On the Select target(s) page, choose AWS service as the target type, Lambda function as the target, and the function you created earlier, then choose Next.
11. Choose Next on the Configure tags page, then choose Create rule on the Review and create page.

After you’ve set the Lambda function and the EventBridge rule, you can test out this solution. To do so, open your IdP and add a user to one of the AD groups with the Studio execution role mapped. Once you add the user, you can verify the Lambda function logs to inspect the event and also see the Studio user provisioned automatically. Additionally, you can use the DescribeUserProfile API call to verify that the user is created with appropriate permissions.
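
As a small, hedged illustration (the domain ID and profile name below are placeholders, not values from this post), the same check can be scripted with boto3:

import boto3

sm = boto3.client("sagemaker")
profile = sm.describe_user_profile(DomainId="d-xxxx", UserProfileName="<sso-user-name>")
# Confirm the profile exists and which execution role it received
print(profile["Status"], profile.get("UserSettings", {}).get("ExecutionRole"))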
Supporting multiple Studio accounts
To support multiple Studio accounts with the preceding architecture, we recommend the following changes:

Set up an AD group mapped to each Studio account level.
Set up a group-level IAM role in each Studio account.
Set up or derive the group to IAM role mapping.
Set up a Lambda function that performs cross-account role assumption based on the mapped IAM role ARN and then creates the user profile in the target account (see the sketch after this list).
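
A minimal sketch of that cross-account step follows, assuming a placeholder cross-account role ARN that your group-to-account mapping would supply (illustrative only, not code from this post):

import boto3

def sagemaker_client_for_account(role_arn, session_name="studio-onboarding"):
    # Assume the mapped role in the target Studio account, then build a SageMaker
    # client with the temporary credentials so CreateUserProfile runs in that account.
    creds = boto3.client("sts").assume_role(
        RoleArn=role_arn, RoleSessionName=session_name
    )["Credentials"]
    return boto3.client(
        "sagemaker",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )

sm_target = sagemaker_client_for_account("arn:aws:iam::<target-account-id>:role/<cross-account-role>")
# sm_target.create_user_profile(...) as in the Lambda function shown earlier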

Deprovisioning users
When a user is removed from their AD group, you should remove their access from the Studio domain as well. With SSO, when a user is removed, the user is disabled in IAM Identity Center automatically if the AD to IAM Identity Center sync is in place, and their Studio application access is immediately revoked.
However, the user profile on Studio still persists. You can add a similar workflow with CloudTrail and a Lambda function to remove the user profile from Studio. The EventBridge trigger should now listen for the DeleteGroupMembership event. In the Lambda function, complete the following steps (a sketch follows the list):

1. Obtain the user profile name from the user and group IDs.
2. List all running apps for the user profile using the ListApps API call, filtering by the UserProfileNameEquals parameter. Make sure to check for the paginated response, to list all apps for the user.
3. Delete all running apps for the user and wait until all apps are deleted. You can use the DescribeApp API to view the app's status.
4. When all apps are in a Deleted state (or Failed), delete the user profile.
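
The following is a hedged sketch of those steps in boto3 (not a production-ready Lambda; error handling, timeouts, and shared spaces are omitted):

import time
import boto3

sm = boto3.client("sagemaker")

def deprovision_user(domain_id, user_profile_name):
    # 1. List every app for the user profile, handling pagination.
    paginator = sm.get_paginator("list_apps")
    apps = []
    for page in paginator.paginate(DomainIdEquals=domain_id, UserProfileNameEquals=user_profile_name):
        apps.extend(page["Apps"])

    # 2. Request deletion of any app that is not already deleted or being deleted.
    for app in apps:
        if app["Status"] not in ("Deleted", "Deleting", "Failed"):
            sm.delete_app(DomainId=domain_id, UserProfileName=user_profile_name,
                          AppType=app["AppType"], AppName=app["AppName"])

    # 3. Wait until every app reports Deleted or Failed.
    for app in apps:
        while sm.describe_app(DomainId=domain_id, UserProfileName=user_profile_name,
                              AppType=app["AppType"], AppName=app["AppName"])["Status"] not in ("Deleted", "Failed"):
            time.sleep(15)

    # 4. Finally, remove the Studio user profile.
    sm.delete_user_profile(DomainId=domain_id, UserProfileName=user_profile_name)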

With this solution in place, ML platform administrators can maintain group memberships in one central location and automate the Studio user profile management through EventBridge and Lambda functions.
The following code shows a sample CloudTrail event:

"AddMemberToGroup":
{
    "eventVersion": "1.08",
    "userIdentity": {
        "type": "Unknown",
        "accountId": "<account-id>",
        "accessKeyId": "30997fec-b566-4b8b-810b-60934abddaa2"
    },
    "eventTime": "2022-09-26T22:24:18Z",
    "eventSource": "sso-directory.amazonaws.com",
    "eventName": "AddMemberToGroup",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "54.189.184.116",
    "userAgent": "Okta SCIM Client 1.0.0",
    "requestParameters": {
        "identityStoreId": "d-906716eb24",
        "groupId": "14f83478-a061-708f-8de4-a3a2b99e9d89",
        "member": {
            "memberId": "04c8e458-a021-702e-f9d1-7f430ff2c752"
        }
    },
    "responseElements": null,
    "requestID": "b24a123b-afb3-4fb6-8650-b0dc1f35ea3a",
    "eventID": "c2c0873b-5c49-404c-add7-f10d4a6bd40c",
    "readOnly": false,
    "eventType": "AwsApiCall",
    "managementEvent": true,
    "recipientAccountId": "<account-id>",
    "eventCategory": "Management",
    "tlsDetails": {
        "tlsVersion": "TLSv1.2",
        "cipherSuite": "ECDHE-RSA-AES128-GCM-SHA256",
        "clientProvidedHostHeader": "up.sso.us-east-1.amazonaws.com"
    }
}

The following code shows a sample Studio user profile API request:

aws sagemaker create-user-profile \
    --domain-id d-xxxxxx \
    --user-profile-name ssouserid \
    --single-sign-on-user-identifier 'UserName' \
    --single-sign-on-user-value 'ssouserid' \
    --user-settings ExecutionRole=arn:aws:iam::<account-id>:role/<role-name>

Conclusion
In this post, we discussed how administrators can scale Studio onboarding for hundreds of users based on their AD group membership. We demonstrated an end-to-end solution architecture that organizations can adopt to automate and scale their onboarding process to meet their agility, security, and compliance needs. If you're looking for a scalable solution to automate your user onboarding, try this solution, and leave your feedback below! For more information about onboarding to Studio, see Onboard to Amazon SageMaker Domain.

About the authors
Ram Vittal is an ML Specialist Solutions Architect at AWS. He has over 20 years of experience architecting and building distributed, hybrid, and cloud applications. He is passionate about building secure and scalable AI/ML and big data solutions to help enterprise customers with their cloud adoption and optimization journey to improve their business outcomes. In his spare time, he rides his motorcycle and walks with his 2-year-old sheep-a-doodle!
Durga Sury is an ML Solutions Architect in the Amazon SageMaker Service SA team. She is passionate about making machine learning accessible to everyone. In her 4 years at AWS, she has helped set up AI/ML platforms for enterprise customers. When she isn’t working, she loves motorcycle rides, mystery novels, and hiking with her 5-year-old husky.

Meet Otter: A Cutting-Edge AI Model that Leverages a Large-Scale Dataset Called MIMIC-IT to Achieve State-of-the-Art Performances in Perception and Reasoning Benchmarks

Multi-modal models strive to integrate data from diverse sources, including written language, pictures, and videos, to execute various functions. These models have demonstrated considerable potential in comprehending and generating content that fuses visual and textual data.

A crucial component of multi-modal models is instruction tuning, which involves fine-tuning the model based on natural language directives. This enables the model to grasp user intentions better and generate precise and pertinent responses. Instruction tuning has been effectively employed in large language models (LLMs) like GPT-2 and GPT-3, enabling them to follow instructions to accomplish real-world tasks.

Existing approaches to multi-modal models can be categorized into the system design perspective and the end-to-end trainable models perspective. The system design perspective connects different models using a dispatch scheduler such as ChatGPT but lacks training flexibility and can be costly. The end-to-end trainable models perspective integrates models from other modalities but may have high training costs or limited flexibility. In addition, previous instruction-tuning datasets for multi-modal models lack in-context examples. Recently, a new approach proposed by a research team from Singapore introduces in-context instruction tuning and constructs datasets with contextual examples to fill this gap.

The main contributions of this work include:

The introduction of the MIMIC-IT dataset for instruction tuning in multi-modal models.

The development of the Otter model with improved instruction-following and in-context learning abilities.

The optimization of OpenFlamingo implementation for easier accessibility.

These contributions provide researchers with a valuable dataset, an enhanced model, and a more user-friendly framework for advancing multi-modal research.

Concretely, the authors introduce the MIMIC-IT dataset, which aims to enhance OpenFlamingo's instruction comprehension capabilities while preserving its in-context learning capacity. The dataset consists of image-instruction-answer triplets with corresponding context, and OpenFlamingo, the underlying framework, generates text for a queried image-text pair based on in-context examples.

During training, the Otter model follows the OpenFlamingo paradigm, freezing the pretrained encoders and fine-tuning specific modules. The training data follows a particular format with an image, a user instruction, "GPT"-generated answers, and an [endofchunk] token. The model is trained using a cross-entropy loss, with a designated token separating the answers used for the prediction objective.

The authors integrated Otter into Hugging Face Transformers, allowing easy reuse and integration into researchers’ pipelines. They optimized the model for training on 4×RTX-3090 GPUs and supported Fully Sharded Data Parallel (FSDP) and DeepSpeed for improved efficiency. They also offer a script for converting the original OpenFlamingo checkpoint into the Hugging Face Model format. Regarding demonstrations, Otter performs better in following user instructions and exhibits advanced reasoning abilities compared to OpenFlamingo. It demonstrates the ability to handle complex scenarios and apply contextual knowledge. Otter also supports multi-modal in-context learning and performs well in visual question-answering tasks, leveraging information from images and contextual examples to provide comprehensive and accurate answers.

In conclusion, this research contributes to multi-modal models by introducing the MIMIC-IT dataset, enhancing the Otter model with improved instruction-following and in-context learning abilities, and optimizing the implementation of OpenFlamingo for easier accessibility. Integrating Otter into Hugging Face Transformers enables researchers to leverage the model with minimal effort. The demonstrated capabilities of Otter in following user instructions, reasoning in complex scenarios, and performing multi-modal in-context learning showcase the advancements in multi-modal understanding and generation. These contributions provide valuable resources and insights for future research and development in multi-modal models.

Check Out The Paper, Project and Github.


Best AI Sales Assistant Tools for 2023

Artificial intelligence sales assistant solutions, often called virtual sales assistants, aid sales representatives by automating various duties. Using these AI-powered sales tools, sales and marketing teams can spend less time on mundane chores and more time on strategic initiatives. This entails more than just automating chats; it also involves screening leads. Covid-19's push toward online selling has made artificial intelligence sales assistants increasingly important.

There is some duplication of features between AI sales assistants and other types of sales analytics tools, chatbots, and AI applications. However, they are increasingly adept at automating routine sales procedures while providing valuable foresight. Let’s check out some of the artificial intelligence sales assistant apps.

Warmer.ai

Reaching out to people you have yet to meet is essential for finding new business leads and talent. However, locating relevant data about these prospects and writing an effective first email can be challenging. This is where Warmer.ai shines. Using AI, Warmer.ai helps personalize emails by filling in recommended touchpoints such as the prospect's accolades, interests, job titles, and other information. It improves response rates, meeting bookings, and efficiency, allowing sales teams to devote more time to closing deals and less time to administrative activities.

Drift

Drift is a platform that shortens the sales cycle by speeding up the lead qualification process. It doesn’t need users to fill out forms or wait for a response, instead focusing on immediate interaction. Chatbots are at the heart of the sales assistant tool, allowing customers to get their queries answered and set up appointments with representatives. Integrating with other marketing tools and tailoring the experience to each visitor are two of the most important elements.

Dooly

Dooly.ai integrates with Salesforce, a widely used customer relationship management tool, to aid businesses. Dooly streamlines this procedure to save time waiting for the app to launch or switching between tabs. It’s a convenient way for teams to change several transactions simultaneously. Staying on top of your deals and their development is made easier with key tools like meeting notes, note templates, pipeline updates, and a task manager.

Troops

Troops is a tool that can be used with Slack and Microsoft Teams to automate notifications and other tasks. It uses AI to communicate with other sales tools like Salesforce. That way, your team may spend as little time as possible moving between systems. Signals are real-time messages about actions that affect income and are an important feature. Deal Rooms can centralize customer information in Slack and enhance team collaboration. Thanks to Command, every built-in tool can be edited with a single line of code.

TopOpps

TopOpps uses AI for many aspects of the sales process, including training and development, activity tracking, pipeline management, and forecasting. This eliminates many mundane, repetitive tasks that sales teams would otherwise have to deal with daily. For instance, management can avoid rash decisions about crucial sales KPIs thanks to accurate sales forecasting. Also, information such as appointments and other deal metrics may be captured automatically and uploaded to your CRM in real-time.

Exceed.ai

Lead qualification is simplified with Exceed.ai’s AI interactions. The scheduling of meetings is automated as well. This frees up time that might otherwise be spent searching for downloads, allowing your account executives to better prepare for meetings with potential clients. Each prospect is interacted with by an AI bot at some stage. Depending on your preferences, it can send messages by text, email, or website. The meeting will be scheduled, and prospects will be primed to hear your sales pitch.

Tact.ai

Engage consumers on any platform with a conversational interface reminiscent of WhatsApp with the help of Tact.ai. In doing so, it hopes to transform CRMs from passive record-keeping tools into interactive channels for two-way communication between businesses and their customers. One of their services, called Tact Assistant, eliminates the need for your representatives to interact directly with customers. Tact Portal is an online hub where your customers interact with your business in a way tailored to the service they receive from you.

SalesDirector

Sales teams need to record a lot of data regularly. The AI sales assistant tool SalesDirector records this information mechanically. Your managers can make informed decisions thanks to the analytics and insights provided by this system. In addition to Google Data Studio, Power BI, Tableau, Einstein, and MongoDB integration are key features. It also includes a historical trends section that can be used to inform analytics-driven strategy development.

Zoovu

Zoovu is an artificial intelligence (AI)-based sales tool that assists buyers in locating specific products. It transforms your unstructured data into a more manageable, human-friendly interface so that visitors to your site can easily learn more about what you provide. Virtual assistants and automated programs are also included to help the user. The idea is to give your business a leg up on the competition by guiding potential customers through the preliminary stages of your sales funnel.

People.ai

People.ai was developed specifically for use with Salesforce. In particular, it speeds up the onboarding process for new representatives and boosts their productivity, allowing your sales staff to generate more revenue for your business. Regarding marketing, People.ai can help you save time upfront by populating your contact list with qualified leads. This sales assistant app also includes sales forecasting, giving your management an idea of what to expect, so you will know whether more effort is required in the upcoming quarter.

ChatSpot 

ChatSpot is an artificial intelligence-driven chat platform that integrates with HubSpot CRM. The tool's AI features facilitate fluent conversation with users. The company states that its software is competitive with ChatGPT for HubSpot. DALL-E 2 powers the tool to respond to users in a more nuanced way based on their circumstances. ChatSpot's primary goal is to supply a full-featured chat environment that can handle a variety of consumer interactions. It's useful for tracking leads, assisting customers, and bolstering sales. The software is meant to work with HubSpot CRM, giving users one place to manage their customer interactions. Sign up for early access while the tool is still in public alpha.

Managr.ai

If you want to make more sales, Managr.ai is an artificial intelligence sales assistant for you. It takes notes on your sales calls using NLP and then generates emails and tasks on the fly. Managr.ai can forecast the sales pipeline and surface the most promising prospects. Time-consuming sales tasks can be completely automated with its help, allowing you to devote more time to fostering connections with clients and securing sales. During sales calls, you can rest assured that you will remember everything thanks to Managr.ai's AI-powered note-taking. As a result, your follow-up emails and presentations will be more precise and convincing. With Managr.ai's pipeline and opportunity management tools, you can always see exactly where your sales stand, which facilitates both opportunity discovery and goal-attainment tracking.

Veloxy.io

Veloxy.io, an AI sales platform, can assist if you want to close more deals. It employs machine learning to help you find promising customers, vet potential leads, and monitor sales performance. Sales enablement tools are available on Veloxy.io and can be used to make compelling sales presentations. Using Veloxy.io to automate your sales processes can save you time, allowing you to devote more time to fostering connections with clients and securing sales. AI-driven lead scoring and qualification tools on Veloxy.io let you zero in on the most promising leads and maximize your sales potential. You can see the entire sales process laid out before you with its help, which facilitates both opportunity discovery and goal-attainment tracking.

Namora AI

With Namora AI, you'll close more business. It takes NLP notes on sales calls and creates emails and tasks. Namora AI can anticipate the sales pipeline's greatest opportunities. Namora AI can record sales calls and then summarize them using natural language processing, producing action items and discovering follow-ups. Namora AI automates sales operations to save time, leaving you extra time to cultivate new clientele and close deals. You'll never miss a crucial meeting or sales-call point again with Namora AI's AI-powered note-taking, so your follow-up emails and presentations will be clearer. You can see the full sales process with Namora AI's pipeline and opportunity management, which aids in goal tracking and opportunity identification.

Saile.ai 

Saile.ai’s AI automates sales prospecting for sales managers. Sailebots, Saile.ai’s patent-pending personality-driven AI bots, can curate leads, verify and engage decision-makers, and deliver actionable opportunities without human intervention. Saile.ai helps sales professionals save time, increase productivity, and close more business. This technology allows sales managers to prospect less and close more deals. Sales leaders may focus on completing deals since Sailebots automate prospecting. It can help sales managers increase output by automating regular tasks. It can also help sales managers close more deals. 

Pod 

Pod is an AI assistant for B2B salespeople. It supports account executives' pipeline management with GPT. AI playbooks help salespeople close deals faster by guiding them on which stakeholders to connect with and when. Pod centralizes CRM changes, account strategies, and notes, and it uniquely combines artificial intelligence and ChatGPT to streamline sales team procedures. Sales managers can now evaluate pipelines in minutes, saving time for training and development. Revenue operations teams can also accelerate sales rep onboarding and process compliance.


UC San Diego and Qualcomm Researchers Unleash Natural Program: A Powerful Tool for Effortless Verification of Rigorous Reasoning Chains in Natural Language – An AI Game Changer

One of the most notable recent advancements in Artificial Intelligence is the development of Large Language Models (LLMs). ChatGPT, developed by OpenAI and based on the GPT-3.5 and GPT-4 architectures, has drawn headlines for generating content and answering questions much as a human would. Its ability to produce creative and precise content enables it to support problem-solving in almost all industries. With the addition of Chain-of-Thought (CoT) prompting, the impact of LLMs like GPT-3.5 has grown further, as CoT helps them generate more comprehensive and elaborate reasoning processes through a series of intermediate steps.

Though CoT offers many advantages, its emphasis on intermediate reasoning phases occasionally causes hallucinations and compounded errors, which makes it difficult for the models to generate consistent and accurate reasoning processes. Considerable effort has gone into enabling LLMs to perform explicit and rigorous deductive reasoning, drawing inspiration from how humans deliberately apply deductive logic to solve problems. To address these challenges, a team of researchers has introduced the Natural Program, a natural-language-based deductive reasoning format that uses the inherent power of natural language to achieve deductive reasoning.

The team has mentioned that this approach breaks down the reasoning verification process into a number of sequential sub-processes. Each sub-process is given only the context and premises required for its particular step, and this decomposition makes the verification process more approachable. The authors used publicly accessible models such as OpenAI's GPT-3.5-turbo (175B) to run trials on arithmetic and commonsense datasets to show the effectiveness of their Natural Program-based verification technique. The outcomes demonstrated how well the strategy increases the dependability of reasoning processes produced by large language models.

The Natural Program format enables language models to generate precise reasoning steps, ensuring that subsequent steps are more rigorously grounded in prior steps. The language models perform reasoning self-verification in a step-by-step manner using this structure, and the resulting reasoning stages are more rigorous and reliable since a verification procedure is integrated into each level of deductive reasoning.
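
A hedged sketch of what this step-wise self-verification loop could look like is shown below. The query_llm helper and the prompt wording are hypothetical placeholders, not the paper's exact templates; the point is that each verification sub-process sees only the premises and earlier steps it explicitly depends on.

def query_llm(prompt):
    # Placeholder: plug in your preferred chat-completion API here.
    raise NotImplementedError

def verify_reasoning(premises, steps):
    # premises: list of strings; steps: list of {"uses": [indices], "claim": str}
    statements = list(premises)
    for step in steps:
        context = "\n".join(statements[i] for i in step["uses"])
        prompt = (
            "Given only the following statements:\n"
            f"{context}\n"
            f'Does this conclusion follow deductively? "{step["claim"]}"\n'
            "Answer Yes or No."
        )
        if not query_llm(prompt).strip().lower().startswith("yes"):
            return False  # reject the chain as soon as one step fails verification
        statements.append(step["claim"])  # verified steps become premises for later steps
    return True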

Some of the key contributions mentioned by the team are –

With the introduction of the Natural Program format, the team has proposed a framework for rigorous deductive reasoning, which is suitable for verification and can be simply produced by in-context learning.

It has been shown that the lengthy deductive reasoning processes written in the proposed Natural Program format may be reliably self-verified by using step-by-step subprocesses that only cover the prerequisite context and premises.

Through experiments, the team has shown how effectively the framework enhances the accuracy, dependability, and interpretability of LLM-generated reasoning stages and solutions.

In conclusion, this framework seems promising for enhancing the deductive reasoning capabilities of language models.

Check Out The Paper and Github.


Meet TARDIS: An AI Framework that Identifies Singularities in Complex Spaces and Captures Singular Structures and Local Geometric Complexity in Image Data

We are deluged with enormous volumes of data from many domains, including scientific, medical, social media, and educational data, and analyzing such data is a crucial requirement. With the increasing amount of data, it is important to have approaches for extracting simple and meaningful representations from complex data. Previous methods share the assumption that the data lies close to a low-dimensional manifold despite having a large ambient dimension, and they seek the lowest-dimensional manifold that best characterizes the data.

Manifold learning methods are used in representation learning, where high-dimensional data is transformed into a lower-dimensional space while keeping crucial data features intact. Though the manifold hypothesis works for most types of data, it doesn't work well for data with singularities. Singularities are regions where the manifold assumption breaks down, and they can contain important information. These regions violate the smoothness or regularity properties of a manifold.

Researchers have proposed a topological framework called TARDIS (Topological Algorithm for Robust DIscovery of Singularities) to address the challenge of identifying and characterizing singularities in data. This unsupervised representation learning framework detects singular regions in point cloud data and has been designed to be agnostic to the geometric or stochastic properties of the data, only requiring a notion of the intrinsic dimension of neighborhoods. It aims to tackle two key aspects – quantifying the local intrinsic dimension and assessing the manifoldness of a point across multiple scales. 

The authors have mentioned that quantifying the local intrinsic dimension measures the effective dimensionality of a data point’s neighborhood. The framework has achieved this by using topological methods, particularly persistent homology, which is a mathematical tool used to study the shape and structure of data across different scales. It estimates the intrinsic dimension of a point’s neighborhood by applying persistent homology, which gives information on the local geometric complexity. This local intrinsic dimension measures the degree to which the data point is manifold and indicates whether it conforms to the low-dimensional manifold assumption or behaves differently.
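
For intuition only, the snippet below estimates local intrinsic dimension with a simple k-nearest-neighbor maximum-likelihood estimator (Levina and Bickel, 2004). This is not the TARDIS algorithm, which relies on persistent homology, but it illustrates what a per-point "intrinsic dimension of the neighborhood" measures.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def local_intrinsic_dimension(points, k=10):
    # Return a per-point intrinsic-dimension estimate from k nearest-neighbor distances.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(points)
    dist, _ = nn.kneighbors(points)
    dist = dist[:, 1:]                         # drop the zero self-distance
    log_ratios = np.log(dist[:, -1:] / dist[:, :-1])
    return (k - 1) / log_ratios.sum(axis=1)    # MLE estimator of Levina & Bickel

# Points on a circle embedded in 3D should get estimates near 1 (a one-dimensional curve).
theta = np.random.rand(500) * 2 * np.pi
circle = np.stack([np.cos(theta), np.sin(theta), np.zeros_like(theta)], axis=1)
print(local_intrinsic_dimension(circle).mean())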

The Euclidicity Score, which evaluates a point’s manifoldness on different scales, quantifies a point’s departure from Euclidean behavior, revealing the existence of singularities or non-manifold structures. The framework captures differences in a point’s manifoldness by taking Euclidicity into account at various scales, making it possible to spot singularities and comprehend local geometric complexity.

The team has provided theoretical guarantees on the approximation quality of this framework for certain classes of spaces, including manifolds. They have run experiments on a variety of datasets, from high-dimensional image collections to spaces with known singularities, to validate their theory. These findings showed how well the approach identifies and processes non-manifold portions in data, shedding light on the limitations of the manifold hypothesis and exposing important data hidden in singular regions.

In conclusion, this approach effectively questions the manifold hypothesis and efficiently detects singularities, the points that violate the manifoldness assumption.

Check Out The Paper and Github link.


MetaVERTU Revolutionizes Smartphone Market with ChatGPT Integration, Redefining Conversational Capabilities and Pioneering AI-Driven Luxury

In a notable turn of events, Vertu, a luxury smartphone brand, announced a new project integrating ChatGPT into its upcoming devices, MetaVertu. The news made headlines on April 24 and was reported by the authoritative Chinese media outlet Jinsefinance. This development reached the market just before the highly anticipated launch of ChatGPT on Apple's App Store on May 19.

Vertu is a premium smartphone manufacturer formerly owned by Nokia. The company claims to provide a best-in-class experience and service to its users. Although it doesn't offer any major technological advancement in terms of hardware, it claims best-in-class encryption, global GSM SIM coverage, a best-in-class camera, and other general features, along with a unique concierge service entitlement. Driven by the philosophy of "If you can spend $20,000 on a watch, then why not on your smartphone," it aims to better cater to the cellular needs of an elite clientele.

MetaVertu is now integrated with ChatGPT, which promises an unparalleled user experience, offering a wide range of features and benefits that puts it ahead of the competition. Unlike Apple's App Store, where you have to pay a subscription fee of $19.9, MetaVertu has decided to give out free access to ChatGPT and applications based on it. The company is claiming affordability as its unique selling point for users searching for an exceptional AI-powered conversational experience.

When a user accesses ChatGPT on the MetaSpace platform, that user gains access to a comprehensive set of functionalities. The ChatGPT app, known as V-GPT, enables seamless one-click login and unrestricted conversations at no cost (as opposed to the paid models on other platforms), and it also supports voice input for user queries. Users can also engage in dialogues with various AI personalities such as AI Buddha, a comic, or even a dream interpreter, which underscores how versatile and entertaining a conversational experience they are set to deliver.

MetaVertu has laid out ambitious plans for the future following the ChatGPT integration. The company is counting on the new GPT-4-based release of ChatGPT, which it expects to introduce new custom AI roles and lead to the creation of personal AI roles tailored to each user. The company plans to integrate voice chat capabilities and deploy various tools for different scenarios. These tools will include an emotional assistant for managing emotional intelligence, conflict resolution, and blame shifting; an efficiency expert offering reporting, OKR (Objectives and Key Results) composition, and translation tools; and a copywriting genius specializing in marketing and everyday written content.

Notably, Vertu disclosed this ChatGPT integration on April 24, well before May 19, demonstrating a visionary commitment to pioneering AI integration and redefining the smartphone landscape.

In conclusion, Vertu's integration of ChatGPT into its latest MetaVERTU smartphone series ushers in a new age of conversational capabilities. The affordability, versatility, and customization offered by MetaVERTU make it unique, and the move positions Vertu as a pioneering force in the AI-integrated smartphone market. With its ambitious plans for future updates and tools, it will be interesting to see how the platform evolves.

Check Out The Reference Article and Website.


Meta AI Unveils Revolutionary I-JEPA: A Groundbreaking Leap in Computer Vision That Emulates Human and Animal Learning and Reasoning

Humans pick up a tremendous quantity of background information about the world just by watching it. Since last year, the Meta team has been working on computers that can learn internal models of how the world functions, letting them learn much more quickly, plan out how to do challenging jobs, and quickly adapt to novel conditions. For the system to be effective, these representations must be learned directly from unlabeled input, such as images or sounds, rather than from manually assembled labeled datasets. This learning process is known as self-supervised learning.

Generative architectures are trained by obscuring or erasing parts of the data used to train the model. This could be done with an image or text. They then make educated guesses about what pixels or words are missing or distorted. However, a major drawback of generative approaches is that the model attempts to fill in any gaps in knowledge, notwithstanding the inherent uncertainty of the real world. 

Researchers at Meta have just unveiled their first artificial intelligence model built on this approach. By comparing abstract representations of images (rather than the pixels themselves), their Image Joint Embedding Predictive Architecture (I-JEPA) can learn and improve over time.

According to the researchers, the JEPA will be free of the biases and problems that plague invariance-based pretraining because it does not involve collapsing representations from numerous views/augmentations of an image to a single point.

The goal of I-JEPA is to fill in knowledge gaps using a representation closer to how individuals think. The proposed multi-block masking method is another important design option that helps direct I-JEPA toward developing semantic representations. 

I-JEPA’s predictor can be considered a limited, primitive world model that can describe spatial uncertainty in a still image based on limited contextual information. In addition, the semantic nature of this world model allows it to make inferences about previously unknown parts of the image rather than relying solely on pixel-level information.
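
A toy sketch of the JEPA-style training signal is shown below: predict the representation of a hidden target block from the representation of a visible context block, with the loss computed in representation space rather than pixel space. The tiny MLP encoders, the missing positional conditioning of the predictor, and the omitted exponential-moving-average update of the target encoder are all simplifications; this is not Meta's I-JEPA code.

import torch
import torch.nn as nn
import torch.nn.functional as F

dim = 128
patch = 16 * 16 * 3   # a flattened image patch stands in for a ViT token

context_encoder = nn.Sequential(nn.Linear(patch, dim), nn.GELU(), nn.Linear(dim, dim))
target_encoder = nn.Sequential(nn.Linear(patch, dim), nn.GELU(), nn.Linear(dim, dim))
predictor = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
opt = torch.optim.AdamW(list(context_encoder.parameters()) + list(predictor.parameters()), lr=1e-4)

def jepa_step(context_patches, target_patches):
    z_context = context_encoder(context_patches)    # representations of the visible context block
    with torch.no_grad():
        z_target = target_encoder(target_patches)   # regression targets, no gradient
    loss = F.mse_loss(predictor(z_context), z_target)  # loss lives in representation space
    opt.zero_grad()
    loss.backward()
    opt.step()
    # In I-JEPA the target encoder is an EMA copy of the context encoder (omitted here).
    return loss.item()

print(jepa_step(torch.randn(8, patch), torch.randn(8, patch)))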

To visualize the model's outputs when asked to predict masked regions, the researchers trained a stochastic decoder that transfers the I-JEPA predicted representations back into pixel space. This qualitative analysis demonstrates that the model can learn global representations of visual objects without losing track of where those objects are in the frame.

Pre-training with I-JEPA uses relatively few computing resources. It doesn't incur the overhead of applying complex data augmentations to provide different views. The findings suggest that I-JEPA can learn robust, off-the-shelf semantic representations without custom view augmentations. In linear probing and semi-supervised evaluation on ImageNet-1K, it also beats pixel- and token-reconstruction techniques.

Despite not relying on manually produced data augmentations, I-JEPA holds its own against other pretraining methods on semantic tasks. It also outperforms these approaches on basic vision tasks like object counting and depth prediction. Because it uses a less complex model with a more flexible inductive bias, I-JEPA is adaptable to a wider range of scenarios.

The team believes that the potential of JEPA models in areas like video interpretation is quite promising. Using and scaling up such self-supervised approaches to build a broad model of the world is a huge step forward.

Check Out The Paper and Github.


Researchers from Microsoft and UC Santa Barbara Propose LONGMEM: An AI Framework that Enables LLMs to Memorize Long History

Large language models (LLMs) have greatly improved the state-of-the-art in various understanding and generation tasks, revolutionizing natural language processing. Most LLMs benefit from self-supervised training over huge corpora, gathering information from a fixed-sized local context and displaying emergent skills, including zero-shot prompting, in-context learning, and Chain-of-Thought (CoT) reasoning. The input length restriction of present LLMs prevents them from generalizing to real-world applications, such as long-horizon planning, where the capacity to handle long-form material beyond a fixed-sized session is crucial.

The simplest solution to the length-limit problem is to scale up the input context length. For improved long-range dependence, GPT-3, for example, raises the input length from GPT-2's 1k tokens to 2k tokens. In-context dense attention is nevertheless severely constrained by the quadratic computational complexity of Transformer self-attention, and this technique often requires computationally expensive training from scratch. Another new line of research, which still mostly requires training from scratch, focuses on in-context sparse attention to avoid the quadratic cost of self-attention.

The Memorizing Transformer (MemTRM) is a well-known study that approximates in-context sparse attention through dense attention to both in-context tokens and memorized tokens retrieved from a non-differentiable memory for Transformers. By scaling the resulting language model to handle up to 65k tokens, MemTRM delivers significant perplexity gains when modeling long books or papers. However, MemTRM's linked memory approach, which uses a single model for both encoding and fusing memory for language modeling, introduces the memory staleness problem during training: as the model parameters are updated, cached earlier representations in memory can drift in distribution from those of the latest model, reducing the usefulness of memory augmentation.

In this paper, authors from UCSB and Microsoft Research propose the LONGMEM framework, which enables language models to cache long-form prior context or knowledge in a non-differentiable memory bank and exploit it via a decoupled memory module, addressing the memory staleness problem. They design a novel residual side network (SideNet) to achieve decoupled memory. A frozen backbone LLM extracts the paired attention keys and values from the previous context into the memory bank. The attention query of the current input is then used in the SideNet's memory-augmented layer to retrieve the cached keys and values of earlier contexts. The associated memory augmentations are fused into the learned hidden states via a joint attention process.

Newly built cross-network residual connections between the SideNet and the frozen backbone LLM enable better knowledge transfer from the pretrained backbone. The pretrained LLM can thus be adapted to utilize long-context memory by training the residual SideNet to retrieve and fuse memory-augmented long context. Their decoupled memory system offers two primary advantages. First, the decoupled frozen backbone LLM and SideNet in the proposed architecture separate memory retrieval and fusion from the encoding of prior inputs into memory.

This efficiently addresses the problem of memory staleness, since the backbone LLM serves only as the long-context knowledge encoder while the residual SideNet serves as the memory retriever and reader. Second, directly adapting the LLM with memory augmentations is computationally inefficient and prone to catastrophic forgetting. Because the backbone LLM is frozen throughout the memory-augmented adaptation stage, LONGMEM retains access to previously learned knowledge and avoids catastrophic forgetting. Depending on the downstream tasks, LONGMEM can load different kinds of long-form text and knowledge into the memory bank.
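To make the decoupled-memory idea concrete, here is a minimal PyTorch-style sketch, not the authors' implementation: a small side module attends jointly over the current chunk and (key, value) pairs cached by a frozen backbone, with gradients blocked from flowing into the memory bank. The single-head, mask-free attention and all dimensions are simplifications.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryAugmentedAttention(nn.Module):
    """Toy version of a SideNet memory layer: queries from the current chunk attend
    jointly over local context and cached (key, value) pairs from a memory bank."""

    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out = nn.Linear(d_model, d_model)
        self.scale = d_model ** -0.5

    def forward(self, x, mem_k, mem_v):
        # x: (batch, local_len, d); mem_k / mem_v: (batch, mem_len, d) cached by the
        # frozen backbone, so .detach() keeps gradients out of the memory bank.
        q = self.q_proj(x)
        k = torch.cat([self.k_proj(x), mem_k.detach()], dim=1)
        v = torch.cat([self.v_proj(x), mem_v.detach()], dim=1)
        attn = F.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        return self.out(attn @ v)

# Usage sketch with random tensors standing in for backbone hidden states.
layer = MemoryAugmentedAttention(d_model=64)
x = torch.randn(2, 16, 64)        # current chunk
mem_k = torch.randn(2, 128, 64)   # retrieved keys from earlier chunks
mem_v = torch.randn(2, 128, 64)   # retrieved values from earlier chunks
fused = layer(x, mem_k, mem_v)    # (2, 16, 64)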

They focus on two illustrative cases: memory-augmented in-context learning with thousands of task-relevant demonstration examples, and language modeling with full-length book contexts. They assess how well the proposed LONGMEM performs on several long-text language modeling tasks and on memory-augmented in-context learning for language understanding. Experimental findings show that their model consistently surpasses strong baselines in long-text modeling and in-context learning. The approach substantially improves long-context language modeling, reducing perplexity by 1.38 to 1.62 across various length splits of the Gutenberg-2022 corpus.

Remarkably, their model greatly outperforms current strong x-former baselines, attaining state-of-the-art performance of 40.5% identification accuracy on ChapterBreak, a challenging long-context modeling benchmark. Lastly, compared to MemTRM and baselines without memory augmentation, LONGMEM shows strong in-context learning gains on common NLU tasks.

Check Out The Paper and Github link. Don’t forget to join our 24k+ ML SubReddit, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more. If you have any questions regarding the above article or if we missed anything, feel free to email us at Asif@marktechpost.com

Check Out 100’s AI Tools in AI Tools Club
The post Researchers from Microsoft and UC Santa Barbara Propose LONGMEM: An AI Framework that Enables LLMs to Memorize Long History appeared first on MarkTechPost.

This AI Paper Proposes A Zero-Shot Personalized Lip2Speech Synthesis M …

A team of researchers from the University of Science and Technology of China has developed a novel machine-learning model for lip-to-speech (Lip2Speech) synthesis. The model is capable of generating personalized synthesized speech in zero-shot conditions, meaning it can make predictions for data classes it did not encounter during training. The researchers built their approach around a variational autoencoder, a generative neural-network model that encodes and decodes data.

Lip2Speech synthesis involves predicting spoken words based on the movements of a person’s lips, and it has various real-world applications. For example, it can help patients who cannot produce speech sounds communicate with others, add sound to silent movies, restore speech in noisy or damaged videos, and even recover conversations from silent CCTV footage. While some machine learning models have shown promise in Lip2Speech applications, they often struggle with real-time performance and are not trained using zero-shot learning approaches.

Typically, to achieve zero-shot Lip2Speech synthesis, machine learning models require reliable video recordings of speakers to extract additional information about their speech patterns. However, in cases where only silent or unintelligible videos of a speaker’s face are available, this information cannot be accessed. The researchers’ model aims to address this limitation by generating speech that matches the appearance and identity of a given speaker without relying on recordings of their actual speech.

The team proposed a zero-shot personalized Lip2Speech synthesis method that utilizes face images to control speaker identities. They employed a variational autoencoder to disentangle speaker identity and linguistic content representations, allowing speaker embeddings to control the voice characteristics of synthetic speech for unseen speakers. Additionally, they introduced associated cross-modal representation learning to enhance the ability of face-based speaker embeddings (FSE) in voice control.
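Below is a minimal sketch of the general idea described above, not the paper's actual architecture: a variational encoder turns lip-frame features into a content representation, a face encoder produces a speaker embedding, and a decoder conditions on both to predict speech features. All module names, feature sizes, and the mel-spectrogram target are illustrative assumptions.

import torch
import torch.nn as nn

class ZeroShotLip2Speech(nn.Module):
    """Illustrative only: disentangle linguistic content (from lip frames) and
    speaker identity (from a face image), then decode speech features."""

    def __init__(self, d_content=128, d_speaker=64, n_mels=80):
        super().__init__()
        # Content branch: encode a sequence of lip-frame features into mean/log-variance.
        self.lip_encoder = nn.GRU(input_size=512, hidden_size=256, batch_first=True)
        self.to_mu = nn.Linear(256, d_content)
        self.to_logvar = nn.Linear(256, d_content)
        # Speaker branch: a face-image embedding controls voice characteristics.
        self.face_encoder = nn.Sequential(nn.Linear(2048, 256), nn.ReLU(), nn.Linear(256, d_speaker))
        # Decoder: conditions on content + speaker embedding to predict mel frames.
        self.decoder = nn.GRU(input_size=d_content + d_speaker, hidden_size=256, batch_first=True)
        self.to_mel = nn.Linear(256, n_mels)

    def forward(self, lip_feats, face_feats):
        # lip_feats: (B, T, 512) per-frame visual features; face_feats: (B, 2048).
        h, _ = self.lip_encoder(lip_feats)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        spk = self.face_encoder(face_feats).unsqueeze(1).expand(-1, z.size(1), -1)
        out, _ = self.decoder(torch.cat([z, spk], dim=-1))
        return self.to_mel(out), mu, logvar  # mel prediction plus terms for a KL loss

# Toy usage: 2 clips of 40 frames each, with random stand-in features.
model = ZeroShotLip2Speech()
mel, mu, logvar = model(torch.randn(2, 40, 512), torch.randn(2, 2048))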

To evaluate the performance of their model, the researchers conducted a series of tests. The results were remarkable, as the model generated synthesized speech that accurately matched a speaker’s lip movements and their age, gender, and overall appearance. The potential applications of this model are extensive, ranging from assistive tools for individuals with speech impairments to video editing software and aid for police investigations. The researchers highlighted the effectiveness of their proposed method through extensive experiments, demonstrating that the synthetic utterances were more natural and aligned with the personality of the input video compared to other methods. Importantly, this work represents the first attempt at zero-shot personalized Lip2Speech synthesis using a face image rather than reference audio to control voice characteristics.

In conclusion, the researchers have developed a machine-learning model for Lip2Speech synthesis that excels in zero-shot conditions. The model can generate personalized synthesized speech that aligns with a speaker’s appearance and identity by leveraging a variational autoencoder and face images. The successful performance of this model opens up possibilities for various practical applications, such as aiding individuals with speech impairments, enhancing video editing tools, and assisting in police investigations.

Check Out The Paper and Reference Article. Don’t forget to join our 24k+ ML SubReddit, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more. If you have any questions regarding the above article or if we missed anything, feel free to email us at Asif@marktechpost.com

Check Out 100’s AI Tools in AI Tools Club
The post This AI Paper Proposes A Zero-Shot Personalized Lip2Speech Synthesis Method: A Synthetic Speech Model To Match Lip Movements appeared first on MarkTechPost.

Researchers From UC Berkeley and Google Introduce an AI Framework that …

The domain of Artificial Intelligence (AI) is evolving and advancing with the release of every new model and solution. Large Language Models (LLMs), which have recently gained enormous popularity thanks to their remarkable abilities, are a major driver of this rise. Subdomains of AI, whether Natural Language Processing, Natural Language Understanding, or Computer Vision, are all progressing rapidly. One research area that has recently garnered a lot of interest from the AI and deep learning communities is Visual Question Answering (VQA), the task of answering open-ended text-based questions about an image.

Systems adopting Visual Question Answering attempt to answer natural-language questions about an input image correctly; they are designed to understand the contents of an image much as humans do and to communicate the findings effectively. Recently, a team of researchers from UC Berkeley and Google Research proposed an approach called CodeVQA that addresses visual question answering through modular code generation. CodeVQA formulates VQA as a program synthesis problem and utilizes code-writing language models that take questions as input and generate code as output.

This framework’s main goal is to create Python programs that call pre-trained visual models and combine their outputs to produce answers. The generated programs manipulate the visual model outputs and derive a solution using arithmetic and conditional logic. In contrast to previous approaches, this framework needs only a pre-trained language model, pre-trained visual models based on image-caption pairs, and a small number of VQA examples to support in-context learning.

To extract specific visual information from the image, such as captions, pixel locations of objects, or image-text similarity scores, CodeVQA uses primitive visual APIs wrapped around visual language models. The generated code coordinates these APIs to gather the necessary data, then uses the full expressiveness of Python, including arithmetic, logical structures, loops, and other programming constructs, to analyze the data and reason its way to a solution.
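To make the mechanism concrete, here is a hedged sketch of the kind of program such a framework might generate for the question "Are there more cars than trucks?". The primitive names (find_objects, query) and the dictionary stand-in for visual-model outputs are illustrative, not CodeVQA's actual API.

def find_objects(image, name):
    """Stub: in a real system, an object detector or VLM returns detected instances of `name`."""
    return image.get(name, [])

def query(image, question):
    """Stub: in a real system, a captioning/VQA model answers a simple sub-question."""
    return "unknown"

def answer(image):
    # A program a code-writing LLM might emit for: "Are there more cars than trucks?"
    cars = find_objects(image, "car")
    trucks = find_objects(image, "truck")
    # Arithmetic and conditional logic combine the visual-model outputs into an answer.
    return "yes" if len(cars) > len(trucks) else "no"

# Toy usage with a dict standing in for real visual-model outputs.
print(answer({"car": ["c1", "c2"], "truck": ["t1"]}))  # -> "yes"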

For evaluation, the team compared the performance of the new technique to a few-shot baseline that does not use code generation. COVR and GQA were the two benchmark datasets used: the GQA dataset includes multihop questions created from manually annotated scene graphs of individual Visual Genome photos, and the COVR dataset contains multihop questions about sets of images from the Visual Genome and imSitu datasets. The results showed that CodeVQA outperformed the baseline on both datasets, improving accuracy by at least 3% on COVR and by about 2% on GQA.

The team has mentioned that CodeVQA is simple to deploy and utilize because it doesn’t require any additional training. It makes use of pre-trained models and a limited number of VQA samples for in-context learning, which aids in tailoring the created programs to particular question-answer patterns. To sum up, this framework is powerful and makes use of the strength of pre-trained LMs and visual models, providing a modular and code-based approach to VQA.

Check Out The Paper and GitHub link. Don’t forget to join our 24k+ ML SubReddit, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more. If you have any questions regarding the above article or if we missed anything, feel free to email us at Asif@marktechpost.com

Check Out 100’s AI Tools in AI Tools Club
The post Researchers From UC Berkeley and Google Introduce an AI Framework that Formulates Visual Question Answering as Modular Code Generation appeared first on MarkTechPost.

SambaSafety automates custom R workload, improving driver safety with …

At SambaSafety, their mission is to promote safer communities by reducing risk through data insights. Since 1998, SambaSafety has been the leading North American provider of cloud-based mobility risk management software for organizations with commercial and non-commercial drivers. SambaSafety serves more than 15,000 global employers and insurance carriers with driver risk and compliance monitoring, online training and deep risk analytics, as well as risk pricing solutions. Through the collection, correlation and analysis of driver record, telematics, corporate and other sensor data, SambaSafety not only helps employers better enforce safety policies and reduce claims, but also helps insurers make informed underwriting decisions and background screeners perform accurate, efficient pre-hire checks.
Not all drivers present the same risk profile: the more time spent behind the wheel, the higher the risk exposure. SambaSafety’s team of data scientists has developed complex and proprietary modeling solutions designed to quantify this risk profile accurately. However, they sought support to deploy this solution for batch and real-time inference in a consistent and reliable manner.
In this post, we discuss how SambaSafety used AWS machine learning (ML) and continuous integration and continuous delivery (CI/CD) tools to deploy their existing data science application for batch inference. SambaSafety worked with AWS Advanced Consulting Partner Firemind to deliver a solution that used AWS CodeStar, AWS Step Functions, and Amazon SageMaker for this workload. With AWS CI/CD and AI/ML products, SambaSafety’s data science team didn’t have to change their existing development workflow to take advantage of continuous model training and inference.
Customer use case
SambaSafety’s data science team had long been using the power of data to inform their business. They had several skilled engineers and scientists building insightful models that improved the quality of risk analysis on their platform. The challenges faced by this team were not related to data science. SambaSafety’s data science team needed help connecting their existing data science workflow to a continuous delivery solution.
SambaSafety’s data science team maintained several script-like artifacts as part of their development workflow. These scripts performed several tasks, including data preprocessing, feature engineering, model creation, model tuning, and model comparison and validation. These scripts were all run manually when new data arrived in their environment for training. Additionally, these scripts didn’t perform any model versioning or hosting for inference. SambaSafety’s data science team had developed manual workarounds to promote new models to production, but this process became time-consuming and labor-intensive.
To free up SambaSafety’s highly skilled data science team to innovate on new ML workloads, SambaSafety needed to automate the manual tasks associated with maintaining existing models. Furthermore, the solution needed to replicate the manual workflow used by SambaSafety’s data science team, and make decisions about proceeding based on the outcomes of these scripts. Finally, the solution had to integrate with their existing code base. The SambaSafety data science team used a code repository solution external to AWS; the final pipeline had to be intelligent enough to trigger based on updates to their code base, which was written primarily in R.
Solution overview
The following diagram illustrates the solution architecture, which was informed by one of the many open-source architectures maintained by SambaSafety’s delivery partner Firemind.

The solution delivered by Firemind for SambaSafety’s data science team was built around two ML pipelines. The first ML pipeline trains a model using SambaSafety’s custom data preprocessing, training, and testing scripts. The resulting model artifact is deployed for batch and real-time inference to model endpoints managed by SageMaker. The second ML pipeline facilitates the inference request to the hosted model. In this way, the pipeline for training is decoupled from the pipeline for inference.
One of the complexities in this project was replicating the manual steps taken by the SambaSafety data scientists. The team at Firemind used Step Functions and SageMaker Processing to complete this task. Step Functions allows you to run discrete tasks in AWS using AWS Lambda functions, Amazon Elastic Kubernetes Service (Amazon EKS) workers, or, in this case, SageMaker. SageMaker Processing allows you to define jobs that run on managed ML instances within the SageMaker ecosystem. Each Step Functions execution maintains its own logs, run history, and details on the success or failure of the job.
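As a minimal sketch of how a custom R script can be run as a SageMaker Processing job from Python (not Firemind's actual code), the snippet below assumes a custom container image with Rscript installed; the image URI, IAM role, S3 paths, and script name are placeholders.

from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput

# Placeholder values: a custom ECR image with R installed, plus a SageMaker execution role.
r_processor = ScriptProcessor(
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/r-processing:latest",
    command=["Rscript"],
    role="<sagemaker-execution-role-arn>",
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

# Each Step Functions state could launch one such job (preprocessing, training,
# validation, and so on) and branch on its success or failure.
r_processor.run(
    code="preprocess.R",  # one of the team's existing R scripts
    inputs=[ProcessingInput(source="s3://<bucket>/raw/", destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(source="/opt/ml/processing/output", destination="s3://<bucket>/features/")],
)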
The team used Step Functions and SageMaker, together with Lambda, to handle the automation of training, tuning, deployment, and inference workloads. The only remaining piece was the continuous integration of code changes to this deployment pipeline. Firemind implemented a CodeStar project that maintained a connection to SambaSafety’s existing code repository. When the industrious data science team at SambaSafety posts an update to a specific branch of their code base, CodeStar picks up the changes and triggers the automation.
Conclusion
SambaSafety’s new serverless MLOps pipeline had a significant impact on their capability to deliver. The integration of data science and software development enables their teams to work together seamlessly. Their automated model deployment solution reduced time to delivery by up to 70%.
SambaSafety also had the following to say:

“By automating our data science models and integrating them into their software development lifecycle, we have been able to achieve a new level of efficiency and accuracy in our services. This has enabled us to stay ahead of the competition and deliver innovative solutions to clients. Our clients will greatly benefit from this with the faster turnaround times and improved accuracy of our solutions.”

SambaSafety brought this problem to their AWS account team. AWS account and solutions architecture teams worked to identify this solution by sourcing from our robust partner network. Connect with your AWS account team to identify similar transformative opportunities for your business.

About the Authors
Dan Ferguson is an AI/ML Specialist Solutions Architect (SA) on the Private Equity Solutions Architecture team at Amazon Web Services. Dan helps Private Equity backed portfolio companies leverage AI/ML technologies to achieve their business objectives.
Khalil Adib is a Data Scientist at Firemind, driving the innovation Firemind can provide to their customers around the magical worlds of AI and ML. Khalil tinkers with the latest and greatest tech and models, ensuring that Firemind are always at the bleeding edge.
Jason Mathew is a Cloud Engineer at Firemind, leading the delivery of projects for customers end-to-end from writing pipelines with IaC, building out data engineering with Python, and pushing the boundaries of ML. Jason is also the key contributor to Firemind’s open source projects.

Meet AdANNS: A Novel Framework that Leverages Adaptive Representations …

To retrieve information relevant to a given query, large-scale web search engines train an encoder to embed the query and then feed the embedding to an approximate nearest neighbor search (ANNS) pipeline. Learned representations are typically rigid, high-dimensional vectors that are used as-is throughout the ANNS pipeline; because they must be large enough to accurately capture tail queries and data points, they make retrieval computationally expensive.

An integral part of retrieval pipelines is semantic search over learned representations. At a minimum, a semantic search approach learns a neural network to embed queries and a large number (N) of data points in a d-dimensional vector space. Existing semantic search algorithms use the same learned information, a rigid representation (RR), at every stage of ANNS. That is, although ANNS indices expose a wide range of parameters for searching the design space to optimize the accuracy-compute trade-off, the dimensionality of the input representation is customarily assumed to be fixed.

Different stages of ANNS can use adaptive representations of varying capacities to achieve significantly better accuracy-compute trade-offs than would be possible with rigid representations, i.e., stages of ANNS that can get away with more approximate computation should use a lower-capacity representation of the same data point. Researchers offer AdANNS, a novel ANNS design framework that takes advantage of the adaptability afforded by Matryoshka Representations.
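The following toy numpy sketch illustrates this core idea, not the AdANNS implementation: matryoshka-style embeddings let the cheap coarse stage use only the first few dimensions of each vector, while the final re-ranking uses the full vector. The random data, random centroids (standing in for a trained coarse quantizer), and all sizes are arbitrary assumptions.

import numpy as np

rng = np.random.default_rng(0)
N, d_full, d_coarse, n_clusters = 5_000, 256, 32, 64

# Matryoshka-style embeddings: any prefix of a vector is itself a usable,
# lower-capacity representation of the same data point.
db = rng.standard_normal((N, d_full)).astype(np.float32)
query = rng.standard_normal(d_full).astype(np.float32)

# Stage 1 (cheap): shortlist candidates using only the first d_coarse dimensions.
# Random centroids stand in here for a trained coarse quantizer (e.g., k-means).
centroids = db[rng.choice(N, n_clusters, replace=False), :d_coarse]
assign = np.argmin(((db[:, :d_coarse, None] - centroids.T[None]) ** 2).sum(axis=1), axis=1)
nearest = np.argmin(((query[:d_coarse] - centroids) ** 2).sum(axis=1))
candidates = np.flatnonzero(assign == nearest)

# Stage 2 (precise): re-rank the shortlist with the full d_full-dimensional vectors.
dists = ((db[candidates] - query) ** 2).sum(axis=1)
print(candidates[np.argsort(dists)[:10]])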

Researchers show state-of-the-art accuracy-compute trade-offs using novel AdANNS-based ANNS building blocks, such as search data structures (AdANNS-IVF) and quantization (AdANNS-OPQ). AdANNS-IVF, for instance, achieves 1.5% higher accuracy than rigid-representation-based IVF on ImageNet retrieval at the same compute budget, and reaches accuracy parity while running 90x faster on the same dataset. AdANNS-OPQ, a 32-byte variant of OPQ built using flexible representations, matches the accuracy of the 64-byte OPQ baseline on Natural Questions. They also demonstrate that the benefits of AdANNS extend to state-of-the-art composite ANNS indices that combine search structures and quantization. Finally, they show that ANNS indices constructed without adaptation, using matryoshka representations, can still be searched in a compute-aware manner with AdANNS.

Visit https://github.com/RAIVNLab/AdANNS to get the source code. 

Key Features

Improved accuracy-compute trade-offs are achieved by using AdANNS to develop new search data structures and quantization techniques.

AdANNS-IVF can be deployed up to 90x faster than traditional IVF while increasing accuracy by up to 1.5%.

AdANNS-OPQ matches the accuracy of the gold-standard OPQ at a fraction of the cost.

The AdANNS-powered search data structure (AdANNS-IVF) and quantization (AdANNS-OPQ) significantly outperform state-of-the-art alternatives regarding the accuracy-compute trade-off.

In addition to enabling compute-aware elastic search during inference, AdANNS generalizes to state-of-the-art composite ANNS indices.

AdANNS – Adaptive ANNS

AdANNS is a system for enhancing the accuracy-compute trade-off for semantic search components that takes advantage of the inherent flexibility of matryoshka representations. There are two main parts to the typical ANNS pipeline: (a) a search data structure that indexes and stores data points; and (b) a query-point computation method that provides the (rough) distance between a query and a set of data points.

The study demonstrates that AdANNS can improve the performance of both ANNS subsystems, quantifying the gains in terms of the accuracy-compute trade-off. Specifically, the researchers introduce AdANNS-IVF, an AdANNS-based index structure analogous to the common IVF structure and the related ScaNN structure. They also introduce representation adaptivity into OPQ, a de facto standard quantization scheme, via AdANNS-OPQ. AdANNS-IVFOPQ, an AdANNS variant of IVFOPQ, and AdANNS-DiskANN, a variant of DiskANN, are two further examples of composite methods demonstrated by the researchers. Compared to IVF indices built on RRs, AdANNS-IVF is experimentally shown to offer a substantially better accuracy-compute trade-off, and AdANNS-OPQ is shown to be as accurate as OPQ on RRs while significantly cheaper.

AdANNS is designed around search structures that can accommodate various large-scale use cases, each with unique resource requirements for training and inference. In practice, however, users cannot always search this design space because of index construction and storage costs.

In conclusion

AdANNS was proposed by a group of researchers from the University of Washington, Google Research, and Harvard University to enhance the accuracy-compute trade-off by utilizing adaptive representations across many stages of ANNS pipelines. Compared to traditional ANNS building blocks, which employ the same inflexible representation throughout, AdANNS takes advantage of the inherent flexibility of matryoshka representations to construct superior building blocks. For the two primary ANNS building blocks—search data structures (AdANNS-IVF) and quantization (AdANNS-OPQ)—AdANNS achieves SOTA accuracy-compute trade-off. Finally, by combining AdANNS-based building blocks, improved real-world composite ANNS indices may be constructed, allowing for compute-aware elastic search and reducing costs by as much as 8x compared to strong baselines.

Check Out The Paper and Github. Don’t forget to join our 23k+ ML SubReddit, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more. If you have any questions regarding the above article or if we missed anything, feel free to email us at Asif@marktechpost.com

Check Out 100’s AI Tools in AI Tools Club
The post Meet AdANNS: A Novel Framework that Leverages Adaptive Representations for Different Phases of ANNS Pipelines to Improve the Accuracy-Compute Tradeoff appeared first on MarkTechPost.

Revolutionizing Image Editing: Introducing Clipdrop’s Uncrop – The …

Clipdrop has recently launched its new tool, “Uncrop,” in its suite of AI-based image processing tools. Uncrop is built on Stable Diffusion XL, Stability AI’s state-of-the-art text-to-image foundation model.

Uncrop specializes in image outpainting, relying on AI to change the dimensions of an image with content-aware outpainted fills that try to match the natural lighting and composition of the source image.

Image outpainting is the process of adjusting an image, typically with AI, so that its dimensions fit a specified layout without looking artificial. When the provided photo is smaller than the required output dimensions, the AI inspects the image and its composition and, with the help of generative models, “paints” the remaining area to be filled.
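A similar effect can be approximated with open-source tools; the sketch below (not Clipdrop's implementation) uses a diffusion inpainting pipeline: the source image is pasted onto a larger canvas and the padded region is masked so the model fills it in. The model ID, prompt, target size, and the assumption of a CUDA GPU are all illustrative choices.

import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

def uncrop(src_path, target_w=1024, target_h=768, prompt="photo, natural lighting"):
    """Pad the source image onto a larger canvas, then let an inpainting model
    fill the masked (padded) region -- a simple form of outpainting."""
    src = Image.open(src_path).convert("RGB")
    canvas = Image.new("RGB", (target_w, target_h))
    x, y = (target_w - src.width) // 2, (target_h - src.height) // 2
    canvas.paste(src, (x, y))
    # Mask convention: white = area to generate, black = area to keep.
    mask = Image.new("L", (target_w, target_h), 255)
    mask.paste(Image.new("L", src.size, 0), (x, y))

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
    ).to("cuda")
    return pipe(prompt=prompt, image=canvas, mask_image=mask,
                width=target_w, height=target_h).images[0]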

This simple-looking tool addresses a real bottleneck in the content industry. Most social media platforms require images of different dimensions in different places, whether for ad campaigns, banners, posters, or any other digital asset. Creating and scaling these assets across multiple platforms costs agencies and individuals significant time and money. Uncrop by Clipdrop solves exactly this problem in one click, saving time and money and helping brands and individuals achieve the reach they want in the digital era.

Here are some samples created with Uncrop. The tool is available for free and runs in near real time, another of its best features; low model inference time is essential for scaling up.

One thing to note: Uncrop works remarkably well on high-resolution images with few objects and little noise. Lower-quality images with many things happening in them, by contrast, may produce less accurate results; the company is working on these cases and hopes to address them.

IMAGE-1

IMAGE-2

As you can see in image 1, the lower part of the image has been outpainted in a photo-realistic manner with various options.

Image 2, on the other hand, would need some rework, but it provides a place to start from, which can save some time in post-production.

Clipdrop, by Stability AI, aims to solve most of the issues content creators face and to provide them with generative AI services without the technical overhead and expense of maintaining such technology in-house. With a diverse range of tools, such as background removal, image relighting, image upscaling, text-to-image generation, Uncrop, and even text removal from pictures, it is fair to say Clipdrop provides almost every service a content creator needs. It all comes packaged as an API service, making it easy to integrate and use.

Their use cases range from content creation and real-estate design modeling to the automotive industry and advertising agencies.

Recently there has been an influx of technologies targeting this use case; Meta’s AI Sandbox is a classic example aimed at the same business problem.

While Uncrop’s output is quite impressive in both production quality and turnaround time, it faces fierce competition from giants such as Meta and Adobe. Adobe, too, has launched a generative AI feature known as “Generative Fill,” integrated into new versions of existing products such as Photoshop, which is extremely powerful even for spot filling and object generation. It will be interesting to see how Stability AI evolves to meet customer expectations and how this strong competition shapes the future of the products it is set to release.

Check Out The Reference Article. Don’t forget to join our 23k+ ML SubReddit, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more. If you have any questions regarding the above article or if we missed anything, feel free to email us at Asif@marktechpost.com

Check Out 100’s AI Tools in AI Tools Club
The post Revolutionizing Image Editing: Introducing Clipdrop’s Uncrop – The Game-Changing Aspect Ratio Editor You’ve Been Waiting For! appeared first on MarkTechPost.