B2B Outbound Sales: Your Guide to Highly Targeted Outreach

In the world of B2B sales, reaching your target audience has become a bit like finding a needle in a haystack. 

It’s a challenge, and with the recent email spam filter updates from the likes of Google and Yahoo, getting into those coveted inboxes is becoming even trickier. 

It’s why we launched B2B by Customers.ai. For years, B2B sales teams have relied on tools like Demandbase, ZoomInfo, and Clearbit to identify who has visited their site. The problem, as most people know, is that these tools identify the companies visiting you. But companies aren’t buying from you. People are!

And now, Customers.ai bridges that gap. We can tell you who is visiting your site at the individual level, giving you names, business emails, LinkedIn profiles, and more!

It’s a true game changer and we are excited to see just how fast B2B sales teams can improve their outreach efforts. 

We aren’t done helping yet though. Along with the launch of our B2B solution, we are going to dive deep into the dynamic landscape of B2B outbound sales, unpacking the challenges, and serving up actionable strategies to help your sales team not just survive but thrive in this inbox battleground. Let’s do it!

What is B2B Outbound Sales?

Challenges in B2B Outbound Sales

5 Strategies for Highly Targeted B2B Sales Outreach

How to Do B2B Outbound Sales with Customers.ai

B2B Outbound Sales FAQs

What is B2B Outbound Sales?

To start, we have to understand what B2B outbound sales means: 

B2B outbound sales, short for Business-to-Business outbound sales, is a proactive approach where businesses initiate direct communication with potential customers to generate leads and close deals. 

Unlike inbound strategies that rely on attracting prospects organically, outbound sales involve reaching out to a carefully identified target audience through channels like email, phone calls, and social media. 

The goal is to create meaningful connections, understand customer needs, and showcase how a product or service can address those needs.

Some sales pros use outbound sales and cold outreach interchangeably. But they aren’t quite the same.

Cold outreach refers to connecting with truly cold leads, or potential customers who haven’t interacted with your business before.

Outbound sales can refer to reaching out to prospects who have shown intent—by visiting your website or interacting with your social media content.

See Who Is On Your Site Right Now!

Turn anonymous visitors into genuine contacts.

Try it Free, No Credit Card Required

Get The X-Ray Pixel

Challenges in B2B Outbound Sales

We know that B2B buyers rely on email to make purchasing decisions. In fact, according to MarketingCharts, 41% say they use email to source information. Unfortunately, with the average person receiving over 100 emails per day, getting your email to stand out is no easy feat. 

Getting in the inbox isn’t the only challenge facing B2B sales teams though. Let’s look at the current landscape, the common challenges we are seeing, and how Customers.ai can help overcome them.

1. Identifying the Right Prospects

Who are your prospects? Challenges arise in the initial phase of B2B outbound sales when identifying the right prospects. Defining your Ideal Customer Profile (ICP) is crucial, as it provides a clear blueprint of your target audience’s characteristics and needs.

How Customers.ai Can Help:

Our Website Visitor ID X-Ray pixel tells you who is on your site, helping you identify the right prospects. 

By capturing real-time data on website visitors, you can see who is visiting your site, what they are interested in, and tailor your outreach strategies based on genuine insights into their behavior and preferences.

It’s like having a personal detective for your website, revealing the clues that lead straight to your most promising leads.

To install the Website Visitor ID X-Ray Pixel, sign up (for FREE!), go to your dashboard, and navigate to My Automations. 

Select + New Automation and get your pixel. We have easy install options for Google Tag Manager, WordPress, and Shopify, or you can install the pixel manually.

2. Building and Maintaining a High-Quality Contact List

Ensuring a high-quality contact list is a perpetual challenge in B2B outbound sales. Strategies for data hygiene play a pivotal role, involving regular cleansing to eliminate outdated or irrelevant contacts. 

Furthermore, leveraging CRM tools for contact management streamlines the process, offering functionalities for segmentation, personalization, and efficient list maintenance.

How Customers.ai Can Help:

With our Signs of Life Detector, we automatically know if an email is valid, ensuring you aren’t wasting time on the wrong people. 

Additionally, Customers.ai integrates with your CRM, offering seamless contact management, segmentation, and personalization, making it the go-to ally for keeping your contact list not just extensive, but exceptionally high-quality. 

3. Crafting Compelling Outreach Messages

Crafting messages that resonate with your target audience is a persistent challenge in B2B sales outreach. 

Tailoring messages to specific industries or personas demonstrates a deep understanding of the recipient’s context, while A/B testing for message optimization allows for continuous improvement by experimenting with different elements such as subject lines and content.
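To make A/B testing for message optimization concrete, here is a minimal sketch (in TypeScript, with purely illustrative numbers) of a two-proportion z-test comparing open rates for two subject-line variants:

```typescript
// Minimal A/B test sketch: compare open rates of two subject-line variants
// with a two-proportion z-test. The counts below are made up for illustration.
function zTest(opensA: number, sentA: number, opensB: number, sentB: number): number {
  const pA = opensA / sentA;
  const pB = opensB / sentB;
  const pPooled = (opensA + opensB) / (sentA + sentB);
  const se = Math.sqrt(pPooled * (1 - pPooled) * (1 / sentA + 1 / sentB));
  return (pA - pB) / se; // |z| > 1.96 is roughly significant at the 5% level
}

// Variant A: 120 opens out of 1,000 sends; Variant B: 90 opens out of 1,000.
const z = zTest(120, 1000, 90, 1000);
console.log(z > 1.96 ? "A wins" : "keep testing");
```

The same comparison works for reply rates or click rates; the point is to let the numbers, not intuition, decide which subject line or body copy graduates into your sequences.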

How Customers.ai Can Help:

Customers.ai makes outreach easy – the key is our AI email writer. 

Tailoring messages to specific industries or personas becomes a breeze, as the AI Email Writer understands the nuances and preferences of your target audience and can inject a personalized touch based on the information captured.

It’s like having a virtual wordsmith that not only understands your audience but also knows the exact words that will capture their attention.

4. Avoiding Promotions Tabs & Spam Folders

With the ever-evolving algorithms of email providers, the challenge extends to avoiding promotions tabs and spam folders. Staying out of these sections requires meticulous attention to email content relevance and adherence to best practices. 

Encouraging recipients to whitelist your emails is an additional proactive step to improve deliverability.

How Customers.ai Can Help:

By exclusively targeting individuals who have actively engaged with your site, we ensure that your emails enter inboxes as warm messages. 

This tailored approach not only enhances deliverability but also mitigates the risk of being marked as spam. Coupled with strong personalization, your communications should not only reach your audience but also grab their attention in the best way possible.

5. Navigating Gatekeepers and Overcoming Objections

Gatekeepers can be formidable barriers in B2B outbound sales, necessitating techniques for engaging gatekeepers effectively. 

Strategies that establish a connection and convey the value of your outreach can prove instrumental. 

Simultaneously, addressing objections is an inherent part of the process, and addressing common objections effectively involves anticipating and preparing persuasive responses that transform objections into opportunities for meaningful engagement.

How Customers.ai Can Help:

Customers.ai helps navigate gatekeepers and overcome objections by providing a unique advantage—insight into exactly what pages your prospects have visited. 

With this powerful feature, segmentation becomes more than just a strategy; it becomes a precision tool. 

By tailoring your outreach to resonate with the specific products or services your prospects have shown interest in, you’re not just navigating gatekeepers, you’re opening doors.

Convert Website Visitors into Real Contacts!

Identify who is visiting your site with name, email and more. Get 50 contacts for free!


5 Strategies for Highly Targeted B2B Sales Outreach

You will always face challenges, but having a toolkit of highly targeted outreach strategies isn’t just an advantage—it’s a necessity. Here are five strategies to help you cut through the noise, connect with the right prospects, and elevate your outreach game.

1. Segmentation and Personalization

Segmenting your audience is imperative when it comes to sales outreach. By categorizing your audience based on shared characteristics or behaviors, you can tailor your messages to be more relevant and impactful. This not only boosts engagement but also allows for a more strategic allocation of resources, ensuring your efforts are laser-focused on the right targets.

It also lends itself to personalization. Personalization goes beyond just addressing someone by their first name. Use the data you have to craft messages that resonate with their specific interests, pain points, or product preferences, creating a connection that feels not just personalized but genuinely tailored to their needs.
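As a sketch of what behavior-based segmentation can look like in practice, the snippet below (TypeScript, with hypothetical field names, not a Customers.ai API) buckets captured visitors by the product area they last viewed:

```typescript
// Hypothetical visitor records; field names are illustrative, not a real API.
type Visitor = { name: string; email: string; lastPage: string };

// Bucket visitors by the top-level section of the last page they viewed,
// e.g. "/pricing/teams" -> "pricing", so each segment gets tailored messaging.
function segmentByPage(visitors: Visitor[]): Map<string, Visitor[]> {
  const segments = new Map<string, Visitor[]>();
  for (const v of visitors) {
    const key = v.lastPage.split("/")[1] || "general";
    if (!segments.has(key)) segments.set(key, []);
    segments.get(key)!.push(v);
  }
  return segments;
}
```

Each resulting segment can then feed its own email sequence, so a visitor who browsed pricing pages hears a different pitch than one who read a blog post.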

2. Multi-Channel Outreach Approach

We mentioned earlier that 41% of B2B buyers rely on email to source information. Those same people also rely on social media, internet searches, and publications for information. A holistic approach to outreach is a must. 

A multi-channel strategy ensures that your message reaches your audience through their preferred platforms, maximizing the chances of engagement. Plus, by diversifying your communication channels, you amplify your outreach efforts and meet your prospects where they are.

But remember, consistency is key. Make sure that your messaging across email, phone, and social media aligns seamlessly, creating a unified brand voice. This not only strengthens your brand identity but also enhances the overall customer experience, fostering trust and recognition across various touchpoints.

3. Content Marketing Integration

Content is how you capture your audience’s attention. By understanding the topics and products that resonate with your prospects, you can create content that not only educates but also adds value to their decision-making process. It positions you as an authority in your industry and makes your outreach more compelling.

However, content marketing efforts don’t always feel aligned with outreach efforts, and they should be. Content is a powerful ally in your outreach journey. 

Use the data you have to create personalized email campaigns or targeted social media messaging. Aligning your content with your outreach efforts enhances engagement and reinforces the value you offer.

4. Utilizing Technology and Automation

Two of the biggest challenges facing any organization are resources and time. You have to build something that is scalable. 

Automation tools help you handle repetitive tasks, allowing your team to focus on building genuine connections. From email campaigns to lead scoring, these tools revolutionize how you approach outbound sales.

While automation brings efficiency, maintaining personalization is paramount. Customers.ai ensures that your automated outreach retains a human touch by incorporating personalization features. Whether it’s dynamically adjusting email content based on prospect behavior or timing outreach for maximum impact, our best practices keep your automated efforts personalized and effective.

5. Continuous Monitoring and Iteration

Perpetual improvement in your outbound sales strategy can only be done through continuous monitoring. Utilize real-time analytics and tracking tools to keep a finger on the pulse of your campaigns. This proactive approach allows you to spot trends, identify successful strategies, and promptly address any areas that may need adjustment.

You also can’t stay stagnant. Iterate based on results for ongoing enhancement. Use the insights gained from monitoring key performance indicators to adapt and improve on your outreach strategies. 

Whether it’s refining your messaging, adjusting target segments, or experimenting with different channels, a commitment to ongoing enhancement ensures your outreach efforts remain agile and effective in the ever-evolving landscape of B2B sales.


How to Do B2B Outbound Sales with Customers.ai

To excel at outbound sales, you need an automated process and powerful tools. We’ll show you how to set up a solid sales strategy with Customers.ai.

Step #1: Know Your Ideal Customer

You could spend time cold emailing a huge list of prospects who may or may not be a good fit for your offer. But you’re probably going to waste a ton of time chatting with donkey leads.

Instead, focus on connecting with your target audience. That starts with knowing your ideal customer. Grab your buyer persona and make a note of:

Demographics, including age, gender, and location

Interests and behaviors that define your customers

Problems and challenges your customers deal with

How your solution can help potential customers

As we mentioned earlier, install the Website Visitor ID X-Ray Pixel and start gathering data on your website visitors.


Step #2: Fine-Tune Your Prospect List

Now you know exactly who you’re looking for—but you’re going to need a hand finding those prospects. Along with identifying who is on your site, you can also find prospects who fit your ICP through our Consumer Directory.

The Consumer Directory is our lead database of over 250,000,000 consumers in the United States. Filter, search, and purchase Consumer Directory leads and add them to a custom outreach automation to qualify and close leads.

Once you’ve identified the best prospects for you, you can plug their contact information right into your existing campaigns or build new ones.

Step #3: Warm Up Email Accounts

You’re almost ready to start sending targeted outbound outreach. But before you start reaching out to prospects, it’s important to warm up your email address.

Customers.ai has a built-in email warmup tool. All you have to do is click Integrations to link all the email addresses you want to use for outreach.

Not sure how many to add? If you’re using new email accounts, keep in mind that you probably shouldn’t send more than 100 emails from each account per day.

Once you connect your email addresses, click the Configure Sending button in your automation. Select Multiple Emails to give Customers.ai permission to rotate senders.
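The mechanics of rotating senders under a daily cap can be sketched like this (TypeScript; a simplified illustration, not Customers.ai’s actual implementation):

```typescript
// Simplified sketch of sender rotation: hand out the next sender that still
// has quota under a conservative ~100 emails/day cap for new accounts.
const DAILY_CAP = 100;

class SenderPool {
  private sentToday = new Map<string, number>();
  constructor(private senders: string[]) {}

  next(): string | null {
    for (const sender of this.senders) {
      const count = this.sentToday.get(sender) ?? 0;
      if (count < DAILY_CAP) {
        this.sentToday.set(sender, count + 1);
        return sender;
      }
    }
    return null; // every account has hit today's cap; resume tomorrow
  }
}

// Two warmed-up accounts can safely cover roughly 200 sends per day.
const pool = new SenderPool(["sdr1@example.com", "sdr2@example.com"]);
```

The takeaway for sizing your setup: divide your target daily volume by ~100 to estimate how many warmed-up accounts you need to connect.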

Step #4: Send Outbound Outreach

Then open the email template at the top of the automation and write your first outreach message. This is a great opportunity to introduce your business, product, or service and offer something of value. To make your message sound more natural, use Customers.ai’s AI Email Writer. 

Once you have your message, configure the waiting period before the next message, and repeat for each step in your sequence. 

Continue to add email follow-ups to your automation and use your sales team’s benchmarks to decide on the right number of follow-ups for your list.

Superpower Your B2B Outbound Sales Efforts

In the dynamic realm of B2B sales, the challenges are real and the competition fierce. The recent email spam filter updates from industry giants like Google and Yahoo have added an extra layer of complexity, making inbox visibility an even trickier feat.

That’s why we introduced B2B by Customers.ai. Traditional tools identify companies visiting your site, but we understand that people make the decisions. With our Website Visitor ID X-Ray pixel, we bridge the gap, providing individual-level insights into your site visitors—names, business emails, LinkedIn profiles, and more.

But we’re not stopping there. As we launch our B2B solution, we’re delving deep into the intricacies of B2B outbound sales, unraveling challenges, and serving up actionable strategies. In this inbox battleground, survival is not enough; thriving is the goal.

B2B outbound sales is not just about sending emails; it’s about sending the right emails to the right people. Whether it’s identifying the right prospects, building a high-quality contact list, crafting compelling messages, avoiding spam folders, or navigating gatekeepers, Customers.ai is your ally in this journey.

Ready to transform your B2B outbound sales game? Let’s do it together. Explore B2B by Customers.ai now and unlock a new era of precision and effectiveness in your outreach efforts.

Important Next Steps

See what targeted outbound marketing is all about. Capture and engage your first 50 website visitor leads with Customers.ai X-Ray website visitor identification for free.

Talk and learn about sales outreach automation with other growth enthusiasts. Join Customers.ai Island, our Facebook group of 40K marketers and entrepreneurs who are ready to support you.

Advance your marketing performance with Sales Outreach School, a free tutorial and training area for sales pros and marketers.


B2B Outbound Sales FAQs

Q. What’s the difference between inbound and outbound sales?

Whether you’re using inbound or outbound tactics, you have the same end goal: to close the deal. But the way you start and nurture relationships with prospects differs, depending on which approach you choose.

With inbound, prospects find the organization through social media, the company website, or another marketing channel. They request information or sign up for a list and receive a series of general marketing campaigns.

With outbound, sales reps use one of the channels above to initiate a connection with a potential customer. Sales reps typically tailor their offers for customer segments so they appeal to prospects’ unique needs and challenges.

Because outbound is more targeted, it tends to have a higher signal-to-noise ratio—as long as you’re reaching out to the right prospects. That means outbound is ideal for finding unicorns in a sea of donkeys.

Q. What are the most effective outbound channels?

It’s easy to assume that outbound sales is all about cold calling. But in practice, phone calls often work best at the end of the sales process, when hot leads are ready to hear your pitch.

Let’s look at a few of the best places to reach out to outbound leads:

Email outreach is the ideal channel to start with since it’s great for introducing your business and offering something of value. With email, you can continue to connect with prospects over time, increasing trust and sharing solutions.

SMS outreach is ideal for prospects who are closer to completing their customer journeys. Because SMS inboxes are private spaces for your inner circle, it’s a good idea to build trust first before connecting with prospects over text.

Social selling via channels like Facebook and Instagram is helpful for building relationships and engaging with leads. Once potential customers show intent by commenting or sending a DM, you can shift the conversation to email or text.

Q. Who does outbound sales?

A successful sales team typically needs three types of representatives:

Sales development reps (SDRs) handle the first few steps of the sales process. They generate leads and then nurture and qualify them before handing them over to an account executive.

Business development reps (BDRs) are experts at prospecting and finding potential customers who fit the target audience. They excel at lead generation—turning cold prospects into warm leads so that account executives can get the sale.

Lead response reps (LRRs) often focus on higher-intent prospects who have indicated interest in the business. Like SDRs, they nurture and qualify leads before tasking the account executive with closing the deal.

Q. What are the best outbound sales tools?

To close more deals, sales teams need tools that automate manual tasks and organize prospects:

Prospecting data tools connect you with your ideal customers so you can focus on building relationships. Tools like Customers.ai RoboBDR let you create customer segments for targeted sales outreach.

Outreach automation tools send targeted messages to prospect lists and follow up on key channels over time. Tools like Customers.ai automate email, SMS, and social media sequences to streamline prospecting and lead qualification.

Customer relationship management (CRM) tools keep track of prospects and touchpoints so you know where everyone is in their customer journey. CRMs like HubSpot integrate with Customers.ai so you can streamline your efforts.

Q: What is the difference between B2B outbound sales and cold outreach?

Cold outreach involves contacting entirely new leads, while B2B outbound sales may include reaching out to prospects who have shown some level of interest or intent.

Q. How can I identify the right prospects for my B2B outbound sales efforts?

Define your Ideal Customer Profile (ICP) and leverage tools like Customers.ai’s Website Visitor ID X-Ray pixel to see who is actively engaging with your site.

Q: What challenges do B2B sales teams face in reaching their target audience via email?

Email saturation is a challenge, with the average person receiving over 100 emails per day. Standing out in crowded inboxes requires strategic and personalized messaging.

Q: How does segmentation and personalization enhance B2B sales outreach?

Segmentation categorizes your audience, allowing you to tailor messages for relevance. Personalization goes beyond names, crafting messages that resonate with specific interests or pain points.

Q: What role does multi-channel outreach play in B2B sales strategies?

Multi-channel outreach ensures your message reaches prospects through preferred platforms, enhancing engagement. Consistent messaging across channels builds brand trust.

Q: Why is content marketing integration crucial for B2B outbound sales?

Content marketing adds value to the customer’s decision-making process. Aligning content with outreach efforts positions your brand as an industry authority.

Q: How can technology and automation improve B2B outbound sales efficiency?

Automation tools handle repetitive tasks, optimizing resource use. Customers.ai’s automation retains a personalized touch, balancing efficiency with genuine connections.

Q: What are the benefits of continuous monitoring and iteration in outbound sales?

Continuous monitoring allows real-time adjustments, spotting trends and addressing issues promptly. Iteration based on results ensures ongoing enhancement and adaptation to market dynamics.

Q: How does Customers.ai help navigate gatekeepers and overcome objections?

Customers.ai provides insights into pages prospects visited, enabling precise segmentation. This tailoring of outreach to specific interests transforms objections into opportunities.

Q: Can B2B outbound sales be successful without an automated process?

While possible, an automated process, as offered by Customers.ai, streamlines tasks, saves time, and ensures scalability, making success more achievable in the competitive B2B landscape.

Q: Why is email deliverability crucial in B2B outbound sales?

Email deliverability determines whether your messages reach the intended inboxes. Customers.ai’s warm-up tools and targeted engagement contribute to improved deliverability.

Q: How does B2B outbound sales contribute to building a brand’s authority?

B2B outbound sales positions your brand as an industry expert by delivering valuable content and personalized messages, fostering trust and credibility among your target audience.

Q: What role does predictive analytics play in B2B outbound sales strategies?

Predictive analytics analyzes historical data to forecast outcomes and behaviors. Customers.ai integrates predictive analytics, helping make informed decisions and optimize resources.

Q: Can B2B outbound sales efforts be successful without continuous audience monitoring?

Continuous monitoring is vital for staying agile. It allows for real-time adjustments and ensures your outreach efforts align with changing customer behaviors and market dynamics.

Q: How does B2B outbound sales adapt to changes in customer preferences and behaviors?

Iteration based on continuous monitoring and data-driven insights ensures B2B outbound sales strategies evolve to meet changing customer preferences and behaviors effectively.

Q: What challenges does B2B outbound sales face in the era of information overload?

Standing out in a sea of information is a challenge. Crafting personalized and compelling messages, as well as utilizing targeted segmentation, helps overcome information overload.

Q: How does B2B outbound sales contribute to lead generation and conversion?

B2B outbound sales actively generates leads by initiating direct communication. Effective lead generation, combined with personalized outreach, contributes to higher conversion rates.

Q: Is there a limit to the number of follow-ups in B2B outbound sales campaigns?

The number of follow-ups depends on your industry and target audience. Test and adapt based on your specific benchmarks to find the optimal number for your campaigns.

Q: How does B2B outbound sales address the balance between automation and personalization?

B2B outbound sales tools like Customers.ai strike a balance by automating repetitive tasks while ensuring personalization features retain a human touch in outreach efforts.

Q: Can B2B outbound sales strategies be effective without a deep understanding of the target audience?

Deep understanding of the target audience is foundational. Identifying buyer personas, preferences, and pain points ensures B2B outbound sales strategies resonate and drive meaningful engagement.
The post B2B Outbound Sales: Your Guide to Highly Targeted Outreach appeared first on Customers.ai.

Meet GPT Crawler: An AI Tool that can Crawl a Site to Generate Knowledge Files to Create a Custom GPT from One or Multiple URLs

Imagine being able to build unique GPT models by extracting knowledge from web pages. Meet GPT Crawler: an AI tool that can crawl a site to generate knowledge files, letting you create your own custom GPT from one or multiple URLs.

Using GPT, a large language model trained on an enormous corpus of text and code, GPT Crawler extracts knowledge from webpages with impressive efficiency and accuracy. Unlike typical web crawlers that only gather raw data, GPT Crawler uses natural language processing techniques to interpret the context and meaning of the information it encounters. This makes it possible to recognize and extract important data, such as relationships, facts, and concepts, turning unstructured web material into organized knowledge.

Here’s a short custom GPT developed by researchers to assist in answering common concerns about using and integrating Builder.io; all it needs is the URL to the Builder documentation: https://chat.openai.com/g/g-kywiqipmR-builder-io-assistant

You can get started in four easy steps:

Clone the repository.

Install dependencies.

Configure the crawler.

Run your crawler.

Command and configuration instructions are available on the GitHub page.

There are other approaches, such as using Docker to run in a container.
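For reference, a typical run looks like the config below, adapted from the gpt-crawler README at the time of writing; option names may have changed, so verify against the GitHub page:

```typescript
// config.ts at the root of the gpt-crawler repo. Typical setup:
//   git clone https://github.com/BuilderIO/gpt-crawler && cd gpt-crawler
//   npm i
//   npm start
import { Config } from "./src/config";

export const defaultConfig: Config = {
  url: "https://www.builder.io/c/docs/developers", // where crawling starts
  match: "https://www.builder.io/c/docs/**",       // only follow matching links
  maxPagesToCrawl: 50,                             // stop after this many pages
  outputFileName: "output.json",                   // knowledge file to upload
};
```

The `match` glob is what keeps the crawl scoped to the documentation you care about instead of the whole site.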

Upload your data to OpenAI

The crawl will create a file named output.json at the root of the project. Upload that file to OpenAI to construct your assistant or custom GPT.

You can also quickly share your knowledge with others by creating a custom GPT. To design and use custom GPTs, you may need a premium ChatGPT subscription.

Additionally, you can build a personalized assistant on top of your generated knowledge, which you can then include in your product.

The Way Forward

GPT Crawler and similar tools are expected to become significantly more important for information extraction, creating customized GPT models, and individualized AI interactions as GPT technology develops. It opens up a world of possibilities for knowledge management, content production, and AI-powered applications because of its capacity to bridge the gap between unstructured web material and organized information. Without question, GPT Crawler is a game-changer in artificial intelligence because it has the potential to completely transform how humans interact with information.
The post Meet GPT Crawler: An AI Tool that can Crawl a Site to Generate Knowledge Files to Create a Custom GPT from One or Multiple URLs appeared first on MarkTechPost.

Learn how to assess the risk of AI systems

Artificial intelligence (AI) is a rapidly evolving field with the potential to improve and transform many aspects of society. In 2023, the pace of adoption of AI technologies has accelerated further with the development of powerful foundation models (FMs) and a resulting advancement in generative AI capabilities.
At Amazon, we have launched multiple generative AI services, such as Amazon Bedrock and Amazon CodeWhisperer, and have made a range of highly capable generative models available through Amazon SageMaker JumpStart. These services are designed to support our customers in unlocking the emerging capabilities of generative AI, including enhanced creativity, personalized and dynamic content creation, and innovative design. They can also enable AI practitioners to make sense of the world as never before—addressing language barriers, climate change, accelerating scientific discoveries, and more.
To realize the full potential of generative AI, however, it’s important to carefully reflect on any potential risks. First and foremost, this benefits the stakeholders of the AI system by promoting responsible and safe development and deployment, and by encouraging the adoption of proactive measures to address potential impact. Consequently, establishing mechanisms to assess and manage risk is an important process for AI practitioners to consider and has become a core component of many emerging AI industry standards (for example, ISO 42001, ISO 23894, and NIST RMF) and legislation (such as EU AI Act).
In this post, we discuss how to assess the potential risk of your AI system.
What are the different levels of risk?
While it might be easier to start looking at an individual machine learning (ML) model and the associated risks in isolation, it’s important to consider the details of the specific application of such a model and the corresponding use case as part of a complete AI system. In fact, a typical AI system is likely to be based on multiple different ML models working together, and an organization might be looking to build multiple different AI systems. Consequently, risks can be evaluated for each use case and at different levels, namely model risk, AI system risk, and enterprise risk.
Enterprise risk encompasses the broad spectrum of risks that an organization may face, including financial, operational, and strategic risks. AI system risk focuses on the impact associated with the implementation and operation of AI systems, whereas ML model risk pertains specifically to the vulnerabilities and uncertainties inherent in ML models.
In this post, we focus on AI system risk, primarily. However, it’s important to note that all different levels of risk management within an organization should be considered and aligned.
How is AI system risk defined?
Risk management in the context of an AI system can be a path to minimize the effect of uncertainty or potential negative impacts, while also providing opportunities to maximize positive impacts. Risk itself is not a potential harm but the effect of uncertainty on objectives. According to the NIST Risk Management Framework (NIST RMF), risk can be estimated as a multiplicative measure of an event’s probability of occurring times the magnitude of the consequences of the corresponding event.

There are two aspects to risk: inherent risk and residual risk. Inherent risk represents the amount of risk the AI system exhibits in absence of mitigations or controls. Residual risk captures the remaining risks after factoring in mitigation strategies.
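The relationship between these two kinds of risk can be sketched numerically. In the toy sketch below, the 1–5 scales and the mitigation factor are illustrative assumptions, not values prescribed by the NIST RMF:

```python
# Toy sketch of the NIST RMF-style risk estimate: risk = likelihood x severity.
# The 1-5 scales and the 0.6 mitigation factor are illustrative assumptions.

def risk_score(likelihood: int, severity: int) -> int:
    """Inherent risk of an event, each input on an assumed 1-5 scale."""
    return likelihood * severity

def residual_risk(inherent: float, mitigation_factor: float) -> float:
    """Remaining risk after mitigations reduce the inherent estimate."""
    return inherent * (1 - mitigation_factor)

inherent = risk_score(likelihood=4, severity=5)            # 20: likely and severe
residual = residual_risk(inherent, mitigation_factor=0.6)  # 8.0 after controls
```

The point of the sketch is only the shape of the calculation: inherent risk is estimated first, and residual risk is whatever remains after you account for the mitigations you actually put in place.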
Always keep in mind that risk assessment is a human-centric activity that requires organization-wide efforts; these efforts range from ensuring all relevant stakeholders are included in the assessment process (such as product, engineering, science, sales, and security teams) to assessing how social perspectives and norms influence the perceived likelihood and consequences of certain events.
Why should your organization care about risk evaluation?
Establishing risk management frameworks for AI systems can benefit society at large by promoting the safe and responsible design, development and operation of AI systems. Risk management frameworks can also benefit organizations through the following:

Improved decision-making – By understanding the risks associated with AI systems, organizations can make better decisions about how to mitigate those risks and use AI systems in a safe and responsible manner
Increased compliance planning – A risk assessment framework can help organizations prepare for risk assessment requirements in relevant laws and regulations
Building trust – By demonstrating that they are taking steps to mitigate the risks of AI systems, organizations can show their customers and stakeholders that they are committed to using AI in a safe and responsible manner

How to assess risk?

As a first step, an organization should consider describing the AI use case that needs to be assessed and identify all relevant stakeholders. A use case is a specific scenario or situation that describes how users interact with an AI system to achieve a particular goal. When creating a use case description, it can be helpful to specify the business problem being solved, list the stakeholders involved, characterize the workflow, and provide details regarding key inputs and outputs of the system.
When it comes to stakeholders, it’s easy to overlook some. The following figure is a good starting point to map out AI stakeholder roles.

Source: “Information technology – Artificial intelligence – Artificial intelligence concepts and terminology”.

An important next step of the AI system risk assessment is to identify potentially harmful events associated with the use case. In considering these events, it can be helpful to reflect on different dimensions of responsible AI, such as fairness and robustness, for example. Different stakeholders might be affected to different degrees along different dimensions. For example, a low robustness risk for an end-user could be the result of an AI system exhibiting minor disruptions, whereas a low fairness risk could be caused by an AI system producing negligibly different outputs for different demographic groups.
To estimate the risk of an event, you can use a likelihood scale in combination with a severity scale to measure the probability of occurrence as well as the degree of consequences. A helpful starting point when developing these scales might be the NIST RMF, which suggests using qualitative nonnumerical categories ranging from very low to very high risk, or semi-quantitative assessment principles such as scales (for example, 1–10), bins, or otherwise representative numbers. After you have defined the likelihood and severity scales for all relevant dimensions, you can use a risk matrix scheme to quantify the overall risk per stakeholder along each relevant dimension. The following figure shows an example risk matrix.

Using this risk matrix, we can consider an event with low severity and rare likelihood of occurring as very low risk. Keep in mind that the initial assessment will be an estimate of inherent risk, and risk mitigation strategies can help lower the risk levels further. The process can then be repeated to generate a rating for any remaining residual risk per event. If there are multiple events identified along the same dimension, it can be helpful to pick the highest risk level among all to create a final assessment summary.
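A risk matrix like this is easy to encode. The following minimal sketch assumes five-point qualitative scales and illustrative score cutoffs; the thresholds are not prescribed by the NIST RMF and would be tuned to your own matrix:

```python
# Hypothetical five-point scales and cutoffs for a risk matrix sketch.
LIKELIHOOD = ["rare", "unlikely", "possible", "likely", "frequent"]
SEVERITY = ["very low", "low", "medium", "high", "very high"]

def risk_level(likelihood: str, severity: str) -> str:
    """Map a (likelihood, severity) pair to an overall risk rating."""
    score = (LIKELIHOOD.index(likelihood) + 1) * (SEVERITY.index(severity) + 1)
    if score <= 4:
        return "very low"
    if score <= 8:
        return "low"
    if score <= 12:
        return "medium"
    if score <= 16:
        return "high"
    return "very high"
```

With these assumed cutoffs, an event with low severity and rare likelihood maps to "very low" risk, matching the reading of the example matrix above.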
Using the final assessment summary, organizations will have to define what risk levels are acceptable for their AI systems as well as consider relevant regulations and policies.
AWS commitment
Through engagements with the White House and UN, among others, we are committed to sharing our knowledge and expertise to advance the responsible and secure use of AI. Along these lines, Amazon’s Adam Selipsky recently represented AWS at the AI Safety Summit with heads of state and industry leaders in attendance, further demonstrating our dedication to collaborating on the responsible advancement of artificial intelligence.
As AI continues to advance, risk assessment is becoming increasingly important and useful for organizations looking to build and deploy AI responsibly. By establishing a risk assessment framework and risk mitigation plan, organizations can reduce the risk of potential AI-related incidents and earn trust with their customers, as well as reap benefits such as improved reliability, improved fairness for different demographics, and more.
Go ahead and get started on your journey of developing a risk assessment framework in your organization and share your thoughts in the comments.
Also check out an overview of generative AI risks published on Amazon Science: Responsible AI in the generative era, and explore the range of AWS services that can support you on your risk assessment and mitigation journey: Amazon SageMaker Clarify, Amazon SageMaker Model Monitor, AWS CloudTrail, as well as the model governance framework.

About the Authors
Mia C. Mayer is an Applied Scientist and ML educator at AWS Machine Learning University, where she researches and teaches safety, explainability, and fairness of machine learning and AI systems. Throughout her career, Mia established several university outreach programs, acted as a guest lecturer and keynote speaker, and presented at numerous large learning conferences. She also helps internal teams and AWS customers get started on their responsible AI journey.
Denis V. Batalov is a 17-year Amazon veteran with a PhD in Machine Learning. Denis worked on such exciting projects as Search Inside the Book, Amazon Mobile apps, and Kindle Direct Publishing. Since 2013 he has helped AWS customers adopt AI/ML technology as a Solutions Architect. Currently, Denis is a Worldwide Tech Leader for AI/ML, responsible for the functioning of AWS ML Specialist Solutions Architects globally. Denis is a frequent public speaker; you can follow him on Twitter @dbatalov.
Dr. Sara Liu is a Senior Technical Program Manager with the AWS Responsible AI team. She works with a team of scientists, dataset leads, ML engineers, researchers, as well as other cross-functional teams to raise the responsible AI bar across AWS AI services. Her current projects involve developing AI service cards, conducting risk assessments for responsible AI, creating high-quality evaluation datasets, and implementing quality programs. She also helps internal teams and customers meet evolving AI industry standards.

Lost Conversions, Found: 12 Pre-Cart Abandonment Strategies

With 97% of potential customers exiting a website before reaching the shopping cart, understanding pre-cart abandonment is essential. 

These visitors represent a huge opportunity – and a huge missed opportunity! After all, these people are familiar with your brand, interested in your business, and interested in buying what you have to sell. You have to figure out how to capture their information and how to get them into the funnel.  

That’s where pre-cart abandonment strategies come in. But we have to go beyond the usual best practices. Yes, we know that factors like slow websites and poor design lead to abandonment. That’s not enough. We need to make the extra effort to minimize abandonment and optimize the overall conversion funnel. 

Let’s explore 12 pre-cart abandonment strategies to address these challenges and guide potential customers back into the sales journey.

Website Visitor Identification

Targeted Remarketing Campaigns

Exit-Intent Popups

Email Capture Strategies

Personalized Product Recommendations

Optimized Call-to-Action (CTA) Buttons

User-Friendly Navigation

Data Analysis for Personalization

Limited-Time Promotions

Live Chat Support

Clear Shipping and Return Information

Social Proof and Trust Badges

Convert Website Visitors into Real Contacts!

Identify who is visiting your site with name, email and more. Get 50 contacts for free!


1. Website Visitor Identification

We’ve talked a lot about website visitor identification lately, and we are going to continue talking about it. Being able to identify who is on your website – names, emails, companies, phone numbers, etc – is a secret weapon for any online business. 

It’s also unbelievably easy to do. With the Customers.ai Website Visitor ID X-Ray Pixel, you can start identifying anonymous visitors in 90 seconds. Here’s how:

Step 1: To install the Website Visitor ID X-Ray Pixel, sign up (for FREE!), go to your dashboard, and navigate to My Automations. 

Step 2: Select + New Automation and get your pixel. We have easy install options for Google Tag Manager, WordPress, and Shopify, or you can install the pixel manually.

From there, you can build email campaigns, remarketing campaigns, sales outreach strategies, and so much more.

2. Targeted Remarketing Campaigns

Targeted remarketing campaigns are the perfect way to re-engage visitors who left but have displayed interest in your products.

Retargeting ads let you gently remind visitors of the products they explored, and by leveraging personalized ad creative, you can speak directly to their preferences. 

The key to engaging these people lies in the content itself. You must create compelling content that serves as a persuasive nudge and encourages them to return and finalize their purchase. 

Unfortunately for marketers, retargeting audiences are shrinking, and first-party data ownership has become imperative.  

Sophisticated tools like the Customers.ai Website Visitor ID X-Ray pixel mentioned above allow you to capture that first-party data and build sizable, precisely targeted remarketing lists.

3. Exit-Intent Popups

Strategically timed to appear as visitors show signs of leaving, exit-intent popups present a golden opportunity to capture crucial information. 

By offering exclusive deals, enticing discounts, or valuable content, you can incentivize users to share their email addresses, laying the foundation for a robust subscriber list. 

Exit-intent popups are a proactive approach that not only helps retain those looking to jump ship but also opens avenues for sustained engagement and targeted marketing efforts. Turn exits into entrances for future conversions.

4. Email Capture Strategies

The key to an effective pre-cart email capture strategy is to go beyond the conventional. 

Craft compelling lead magnets—be it newsletters, exclusive offers, or downloadable resources—that entice visitors to share their email addresses willingly. Strategically placed opt-in forms on pre-cart pages serve as the gateway to building a robust subscriber list. 

By creating engaging and unobtrusive opportunities for subscription, you not only capture valuable contact information but also pave the way for establishing a direct line of communication with potential customers. 

5. Personalized Product Recommendations

Personalized product recommendations are a must for any pre-cart abandonment strategy. 

Implementing advanced algorithms that scrutinize user behavior enables you to offer tailor-made suggestions. By showcasing these recommendations on pre-cart pages—crafted based on individual browsing history and preferences—you enhance user engagement and significantly boost the likelihood of a conversion. 

Here’s the thing. The power of personalization lies in its ability to resonate with individuals. Just like email capture strategies and remarketing campaigns, your personalization strategy requires both data and creativity. 

6. Optimized Call-to-Action (CTA) Buttons

The journey from visitor to customer often hinges on the effectiveness of your Call-to-Action (CTA) buttons. 

Design clear and compelling CTAs that seamlessly direct visitors through the conversion funnel. 

Here is an example of a CTA we use to promote our webinar, Beyond Abandoned Cart to Abandoned Product View Revenue:

Ecommerce Webinar

Beyond Abandoned Cart to Abandoned Product View Revenue

with Email Deliverability Hacks & AI Tools

Watch The Webinar

To refine and amplify their impact, use A/B testing methodologies to experiment with different CTA variations. A/B testing allows you to pinpoint the most effective elements, ensuring your CTAs not only capture attention but also prompt decisive action. 
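As a toy illustration of that A/B testing step, a two-proportion z-test (a standard statistical test, not a Customers.ai feature; the traffic numbers below are made up) can tell you whether the difference between two CTA variants is likely real:

```python
import math

def ab_ztest(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-proportion z-test for comparing CTA click-through rates.
    Returns the z statistic; |z| > 1.96 is significant at roughly 95%."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)          # pooled conversion rate
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# hypothetical test: variant A converts 100/1000, variant B 150/1000
z = ab_ztest(100, 1000, 150, 1000)  # well above 1.96, so B likely wins
```

The same function returns a z near zero when the two variants perform identically, which is your cue to keep testing rather than declare a winner.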

By optimizing your CTA buttons, you help visitors take the next step in the buyer journey, ultimately enhancing the overall conversion rate of your pre-cart abandonment strategy.

7. User-Friendly Navigation

The first impression often begins with navigation, making it a pivotal aspect of your pre-cart strategy. 

You must optimize your website’s navigation to guarantee a seamless and user-friendly experience for every visitor. 

Streamline menus and eliminate unnecessary complexities

Offer clear and intuitive pathways that guide users to their desired products

Reduce friction in the navigation process

A user-friendly interface not only enhances the overall browsing experience but also sets the stage for increased engagement and conversions in the pre-cart phase.

8. Data Analysis for Personalization

Data, data, data. You must use the data you have at your disposal to gain insights into visitor behavior and preferences and form a comprehensive understanding of their unique journeys. 

This valuable data serves as the foundation for a personalized user experience, allowing you to tailor pre-cart pages with precision. From curated product offerings to the personalized recommendations we mentioned above, using data analysis enhances the relevance of your content, making each interaction more meaningful for the visitor. 
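Here is a minimal sketch of how such behavioral data can drive a personalized pre-cart page. The event schema and catalog fields are hypothetical assumptions, and a real system would use far richer signals:

```python
from collections import Counter

def recommend(events: list, catalog: list, k: int = 3) -> list:
    """Toy recommender: rank catalog items by how often the visitor
    browsed each item's category. Schemas here are hypothetical."""
    category_counts = Counter(e["category"] for e in events)
    # Counter returns 0 for unseen categories, so unbrowsed items sink.
    return sorted(catalog, key=lambda item: -category_counts[item["category"]])[:k]

# example: a visitor who browsed shoes twice and hats once
events = [{"category": "shoes"}, {"category": "shoes"}, {"category": "hats"}]
catalog = [
    {"id": 1, "category": "hats"},
    {"id": 2, "category": "shoes"},
    {"id": 3, "category": "shoes"},
    {"id": 4, "category": "bags"},
]
top = recommend(events, catalog)  # shoe items first, then the hat
```

Even this crude frequency count captures the core idea: the pre-cart page should lead with whatever the visitor has already shown interest in.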

9. Limited-Time Promotions

Inject a sense of urgency into your pre-cart abandonment strategy with the strategic use of limited-time promotions. 

By offering exclusive discounts or time-sensitive deals, you create an atmosphere that encourages immediate action from potential customers. 

The trick here is to display these promotions on pre-cart pages, ensuring that visitors can’t overlook the offers. This tactic not only captures attention but also taps into the psychology of urgency, prompting users to make quicker decisions and move closer to completing their purchase. 

Limited-time promotions can help drive immediate conversions and instill a sense of excitement and anticipation.

10. Live Chat Support

Offering real-time assistance to visitors ensures that their questions and concerns are addressed promptly, reducing the likelihood of abandonment. 

Beyond immediate support, live chat can give you some really valuable data. Use those live chat interactions to see what customers care about. What are they asking? What are their concerns? 

This data becomes a powerful tool, enabling you to tailor personalized recommendations based on the specific needs and preferences expressed during chat sessions. 

Live chat support lets you enhance customer satisfaction and create a dynamic and interactive environment that fosters trust and loyalty.

11. Clear Shipping and Return Information

Build trust and alleviate customer concerns by prioritizing transparent communication about shipping and return policies on your pre-cart pages. 

Make sure information regarding shipping costs and return procedures is easily accessible. It can help eliminate any uncertainties potential customers may have and prevent site abandonment. 

By addressing these concerns proactively, you create a sense of reliability and openness, reinforcing the trustworthiness of your brand. Clear and upfront communication fosters a positive perception and streamlines the decision-making process for visitors. The result? An increased likelihood of conversions in the pre-cart phase.

12. Social Proof and Trust Badges

92% of consumers trust recommendations from friends and family more than all other forms of marketing. Why? Because they trust those people, and that trust carries over to your business. 

Establishing trust is paramount in the world of ecommerce, and leveraging social proof and trust badges is a potent strategy in your pre-cart approach. 

Display customer testimonials and reviews directly on your pre-cart pages. It builds credibility and instills confidence in potential buyers. Additionally, incorporate trust badges and security certifications visibly, assuring visitors of the safety and reliability of their transactions. 

By highlighting positive experiences and emphasizing security measures, you create an environment customers feel comfortable in, leading to higher conversions in the critical pre-cart phase.


Preparing Your Pre-Cart Abandonment Strategy

Mastering pre-cart abandonment is the key to unlocking untapped sales opportunities. 

By embracing innovative strategies such as website visitor identification, targeted remarketing campaigns, personalized product recommendations, and more, businesses can not only recapture the attention of departing visitors but also guide them toward completing their purchases. 

Plus, as cookie tracking continues to go away and tech companies make it harder and harder to market to individuals, implementing these comprehensive pre-cart strategies becomes essential for not just retaining potential customers but also ensuring a dynamic and engaging online shopping experience.

One of the easiest ways to start is with Customers.ai. Download the Website Visitor ID X-Ray pixel for free and start capturing these pre-cart visitors.

See Who Is On Your Site Right Now!

Turn anonymous visitors into genuine contacts.

Try it Free, No Credit Card Required

Get The X-Ray Pixel

Important Next Steps

See what targeted outbound marketing is all about. Capture and engage your first 50 website visitor leads with Customers.ai X-Ray website visitor identification for free.

Talk and learn about sales outreach automation with other growth enthusiasts. Join Customers.ai Island, our Facebook group of 40K marketers and entrepreneurs who are ready to support you.

Advance your marketing performance with Sales Outreach School, a free tutorial and training area for sales pros and marketers.

Pre-Cart Abandonment FAQs

Q: What is pre-cart abandonment in e-commerce?

Pre-cart abandonment refers to users leaving a website before adding items to their shopping cart, indicating a lost potential sale.

Q: How to track pre-cart abandonment on an e-commerce site?

Use analytics tools to monitor user behavior, focusing on page exits before the checkout process.

Q: Why do customers abandon their carts before adding items?

Common reasons include high shipping costs, complicated checkout processes, and unexpected fees.

Q: What are effective strategies to reduce pre-cart abandonment?

Streamline the checkout process, offer transparent pricing, and provide personalized recommendations.

Q: Can exit-intent pop-ups help prevent pre-cart abandonment?

Yes, strategically implemented exit-intent pop-ups can offer incentives or assistance, potentially retaining users.

Q: How does mobile optimization impact pre-cart abandonment?

Mobile optimization is crucial as more users shop on mobile devices, ensuring a seamless and user-friendly experience.

Q: Are discounts effective in preventing pre-cart abandonment?

Discounts can be effective if strategically used to address specific concerns that lead to abandonment.

Q: What role does real-time customer support play in preventing pre-cart abandonment?

Real-time support, like live chat, can address customer concerns immediately, potentially saving a sale.

Q: How does website loading speed affect pre-cart abandonment?

Slow loading times can frustrate users and lead to abandonment; optimize your site for better performance.

Q: Is offering a guest checkout option effective in reducing pre-cart abandonment?

Yes, providing a guest checkout option simplifies the process, reducing barriers for users who don’t want to create an account.

Q: Can retargeting ads help recover potential pre-cart abandoners?

Yes, retargeting ads remind users of their abandoned items and offer incentives to encourage them to return and complete the purchase.

Q: How can email marketing address pre-cart abandonment?

Email campaigns can send personalized reminders, incentives, and re-engage users who abandoned their carts.

Q: Is conducting surveys effective in understanding reasons for pre-cart abandonment?

Yes, surveys provide valuable insights into reasons for abandonment, helping you make targeted improvements.

Q: How can social proof be used to reduce pre-cart abandonment?

Displaying social proof, like customer testimonials, builds trust and alleviates concerns that lead to abandonment.

Q: What impact does a clear return policy have on pre-cart abandonment?

A transparent return policy reassures customers, addressing concerns about potential issues with purchased products.

Q: Is A/B testing recommended for optimizing the checkout process?

Yes, A/B testing allows you to experiment with different checkout elements, identifying and implementing improvements.

Q: How does transparency in pricing reduce pre-cart abandonment?

Transparent pricing builds trust, reducing the likelihood of users abandoning their carts due to unexpected costs.

Q: Why is personalization important in reducing pre-cart abandonment?

Personalization enhances the shopping experience, making users more likely to complete a purchase.

Q: What role do trust signals play in preventing pre-cart abandonment?

Trust signals, such as secure payment options and clear privacy policies, instill confidence and reduce abandonment.

Q: How does offering limited-time promotions impact pre-cart abandonment?

Limited-time promotions create a sense of urgency, motivating users to complete their purchases to take advantage of the offer.

Q: How can social media integration reduce pre-cart abandonment?

Social media integration allows users to share and discuss potential purchases, creating a sense of community and trust.

Q: What impact does a user-friendly product search have on pre-cart abandonment?

A user-friendly product search reduces frustration, helping users quickly find and add items to their carts.

Q: Why is simplicity in design crucial for preventing pre-cart abandonment?

Simple and intuitive design reduces cognitive load, making the shopping experience more enjoyable and reducing abandonment.

Q: How does addressing common objections in product descriptions impact pre-cart abandonment?

Addressing objections in product descriptions provides clarity, helping users make informed decisions and reducing abandonment.

Q: Can gamification elements be used to reduce pre-cart abandonment?

Yes, gamification elements, like progress bars or rewards for completing the checkout, can make the process more engaging, reducing abandonment.
The post Lost Conversions, Found: 12 Pre-Cart Abandonment Strategies appeared first on Customers.ai.

McMaster University and FAIR Meta Researchers Propose a Novel Machine Learning Approach by Parameterizing the Electronic Density with a Normalizing Flow Ansatz

Researchers from McMaster University and FAIR Meta have developed a new machine learning (ML) technique for orbital-free density functional theory (OF-DFT). This ML method optimizes the total energy function and successfully replicates electronic density across various chemical systems. The approach has been applied to simulate lithium hydride, hydrogen, and water molecules, and the memory-efficient gradient optimization method enhances accuracy by optimizing the Laplacian operator and solving Hartree and external potential functionals.

There are existing methods to calculate molecular electronic energy, such as the traditional Kohn-Sham density functional theory (KS-DFT), which relies on molecular orbitals. A less explored alternative, OF-DFT, instead minimizes the energy directly with respect to the electron density and is more suitable for complex systems.

OF-DFT is an electron density-centric computational approach in quantum chemistry and condensed matter physics, offering advantages over KS-DFT for large systems. It determines ground-state properties through electron density minimization, aligning with the Hohenberg-Kohn theorems. The new work introduces a unique approach using a normalizing flow ansatz to parameterize and optimize the electronic density, successfully replicating it for diverse chemical systems.

The proposed method for optimizing total energy function in OF-DFT involves employing a normalizing flow ansatz to parameterize electronic density across various chemical systems. It is achieved through continuous normalizing flows that transform electronic density by solving ordinary differential equations using a neural network. Gradient-based algorithms are used for total energy optimization, while Monte Carlo sampling is utilized for relevant quantities. Also, a memory-efficient gradient optimization method is employed for solving the Laplacian operator and functionals related to the Hartree and external potentials in OF-DFT.
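At the heart of any normalizing flow ansatz is the change-of-variables formula, which lets a simple base density be reshaped by an invertible map while remaining properly normalized. The one-dimensional toy sketch below illustrates only that mechanism; the paper itself uses continuous flows defined by neural ODEs, not this hand-written discrete map:

```python
import math

def gauss_logpdf(x: float) -> float:
    """Log density of a standard normal base distribution."""
    return -0.5 * (x * x + math.log(2 * math.pi))

def flow_density(z: float, inverse, log_abs_det_jac_inv) -> float:
    """Change of variables: p(z) = p_base(f^-1(z)) * |det J_{f^-1}(z)|."""
    x = inverse(z)
    return math.exp(gauss_logpdf(x) + log_abs_det_jac_inv(z))

# Toy invertible map f(x) = 2x, so f^-1(z) = z/2 and |det J_{f^-1}| = 1/2.
# The resulting density is a normal with doubled standard deviation.
density = flow_density(0.0, lambda z: z / 2, lambda z: math.log(0.5))
```

Because the Jacobian term keeps the transformed density integrating to one, the same construction lets the electronic density satisfy its normalization constraint by design rather than through an explicit penalty.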

The method successfully modeled diatomic molecules, specifically LiH, and conducted extensive simulations of hydrogen and water molecules. The model accurately replicated electronic density in various chemical systems, exhibiting changes in density and potential energy surface during the optimization of H2 and H2O molecules. Comparative analysis with the Hartree-Fock model using the STO-3G basis set demonstrated higher density around nuclei in the continuous normalizing flow model. The density functional value was computed using an exponential moving average throughout the optimization process.

In conclusion, the OF-DFT approach utilizing continuous normalizing flows for density transformation is a promising constraint-free solution for accurately describing electronic density and potential energy surfaces across various chemical systems. Its ability to replicate high density around nuclei, as demonstrated in the study with molecules such as LiH, hydrogen, and water, highlights its potential for further refinement and application.

Future work in OF-DFT electronic structure calculations could involve:

Refining the normalizing flow ansatz for electronic density.

Extending the continuous normalizing flow approach to more complex chemical systems.

Conducting comparative analyses to assess the accuracy of the CNF model.

Integrating the CNF model with other machine learning techniques to improve efficiency and precision.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to join our 33k+ ML SubReddit, 41k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

If you like our work, you will love our newsletter.
The post McMaster University and FAIR Meta Researchers Propose a Novel Machine Learning Approach by Parameterizing the Electronic Density with a Normalizing Flow Ansatz appeared first on MarkTechPost.

‘Lookahead Decoding’: A Parallel Decoding Algorithm to Accelerate …

Although large language models (LLMs) such as GPT-4 and LLaMA are rapidly reimagining modern-day applications, their inference is slow and difficult to optimize because it is based on autoregressive decoding. The delay of an LLM request mostly depends on the answer length of the request or, equivalently, the number of decoding steps, because each autoregressive decoding step yields only one token at a time. Unfortunately, current GPUs’ parallel processing capacity is generally underutilized because each decoding step does not take advantage of it. This presents a problem for many practical LLM applications like chatbots and personal assistants, which rely on instantaneous responses and therefore must regularly produce long sequences with low latency.

Auto-regressive decoding can be sped up with the use of speculative decoding methods like Medusa and OSD, which use a “guess-and-verify” strategy in which a preliminary model makes predictions about several possible tokens in the future, and the original LLM checks these predictions in parallel. These methods can reduce latency by taking advantage of situations where fewer decoding steps are required. They do, however, have some restrictions. To begin, the token acceptance rate, or, equivalently, how correctly the draft model can anticipate the outputs of the main model, is the upper bound on the maximum speedup that speculative decoding-based approaches may achieve. Second, developing a reliable preliminary model is not easy; it typically necessitates more training and careful adjustment to account for variations in traffic over time.

A new study by LMSYS ORG presents lookahead decoding, a novel accurate decoding technique developed to address these difficulties. Although it is computationally prohibitive to decode many subsequent tokens in a single step, it has been observed that an LLM can produce numerous orthogonal n-grams simultaneously. These n-grams could potentially fit into future parts of the created sequence. The traditional Jacobi iteration method is adapted for parallel decoding, which allows autoregressive decoding to be seen as the solution of nonlinear equations. The n-grams that are produced are recorded, checked, and then, if appropriate, incorporated into the sequence. Lookahead decoding is particularly notable since it: 

Requires no preliminary (draft) model, which simplifies rollout.

Linearly reduces the total number of decoding steps relative to the log(FLOPs) invested per step.

The researchers demonstrate that lookahead decoding significantly decreases latency by 1.5x-2.3x with almost no increase in computational burden. Perhaps most significantly, it enables the tradeoff of processing for reduced latency, albeit with diminishing benefits.

They have created their implementation to make lookahead decoding work with huggingface/transformers, so users can significantly boost the efficiency of the native generation function with only a few lines of code. 

Jacobi iteration is a time-tested technique for solving nonlinear systems, and it can also be applied to LLM inference to generate tokens in parallel without needing a draft model. Since each step of Jacobi decoding involves an LLM forward pass over more than one token, it requires significantly more FLOPs than each step of autoregressive decoding, although the parallel processing capacity of GPUs generally keeps this from causing a wallclock slowdown. Even so, the researchers observed several difficulties in translating Jacobi decoding into real-world speedups: while it can decode many tokens over a series of steps, it often gets their positions wrong, and even correctly predicted tokens are frequently replaced in subsequent iterations. As a result, few iterations manage to decode and correctly place multiple tokens simultaneously, which nullifies the point of parallel decoding.

Lookahead decoding can circumvent its shortcomings by capitalizing on Jacobi Decoding’s capacity to generate parallel n-grams. Each new token at a point is decoded using the values at that position in previous iterations, as seen in Jacobi decoding. Many n-grams are formed due to this process, which builds a timeline of historical tokens at each token position. To use this, lookahead decoding will gather and cache these n-grams based on their trajectories. Lookahead decoding simultaneously checks promising n-grams from the cache while performing parallel decoding using Jacobi iterations for future tokens.

Each lookahead decoding phase is split into two parallel branches—the lookahead branch and the verification branch—to improve efficiency. To produce n-grams from the Jacobi iteration trajectory, the lookahead branch keeps a constant-sized, two-dimensional window. At the same time, candidates for n-grams that show promise are chosen and checked by the verification branch.
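A toy sketch of the verification branch (again illustrative, with a stand-in greedy model rather than the authors' implementation): a cached candidate n-gram is checked token by token against the model, and the longest verified prefix is accepted in a single step.

```python
# Toy verification-branch sketch. The n-gram cache would map a token to
# previously observed continuations; candidates whose first token matches
# the last accepted token are checked against the model, and the longest
# verified prefix is appended in one step.

def f(prefix):
    # stand-in for a greedy LLM step
    return (sum(prefix) * 3 + 1) % 5

def verify_and_accept(seq, candidate):
    """Accept the longest prefix of candidate that the model agrees with."""
    accepted = []
    for tok in candidate:
        if f(seq + accepted) == tok:
            accepted.append(tok)
        else:
            break
    return accepted

seq = [2, 0]
# Suppose the lookahead branch cached this 3-gram for the current position
# (the first two tokens happen to be correct, the last one is not):
candidate = [f(seq), f(seq + [f(seq)]), 9]
accepted = verify_and_accept(seq, candidate)
assert len(accepted) == 2                 # two tokens accepted in one step
```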

Since memory bandwidth is the primary bottleneck in LLM decoding, the researchers combine the lookahead and verification branches into a single pass, taking advantage of the GPU’s parallel processing capacity while concealing any associated overheads. 

The team tested different sizes of LLaMA-2-Chat and CodeLLaMA on MT-bench, HumanEval, and GSM8K to see how effective their look-ahead decoding is. The lookahead decoding technique delivers speedup without the need for fine-tuning or preliminary models. Under fp16 precision, they assess the 7B, 13B, and 33B models on a single A100 GPU and the 70B model on two A100 GPUs with pipeline parallelism.

LLaMA-2-Chat on MT-Bench: across many model configurations, the speedup achieved by lookahead decoding is around 1.5x.

CodeLLaMA on HumanEval: CodeLLaMA's latency is reduced by more than two times when using lookahead decoding on HumanEval. This is because code contains many easily guessable n-grams.

CodeLLaMA-Instruct on GSM8K: applying CodeLLaMA-Instruct to GSM8K's mathematical problems, lookahead decoding reduces latency by 1.8x.

The post ‘Lookahead Decoding’: A Parallel Decoding Algorithm to Accelerate LLM Inference appeared first on MarkTechPost.

ETH Zurich Researchers Introduce UltraFastBERT: A BERT Variant that Uses 0.3% of its Neurons during Inference while Performing on Par with Similar BERT Models

The development of UltraFastBERT by researchers at ETH Zurich addressed the problem of reducing the number of neurons used during inference while maintaining performance levels similar to other models. It was achieved through fast feedforward networks (FFFs), which resulted in a significant speedup compared to baseline implementations.

The researchers at ETH Zurich have released code, a benchmarking setup, and model weights to support the method. They also suggest exploring multiple FFF trees for joint computation and a potential application in large language models like GPT-3, and the study proposes further acceleration through hybrid sparse tensors and device-specific optimizations.

UltraFastBERT shows efficient language modeling with selective engagement during inference. It replaces the feedforward networks of traditional models with simplified FFFs, using consistent activation functions and all-node output weights while eliminating biases. Multiple FFF trees collaboratively compute intermediate layer outputs, allowing for diverse architectures. The provided high-level CPU and PyTorch implementations yield substantial speedups, and the research explores potential acceleration through multiple FFF trees and suggests replacing large language model feedforward networks with FFFs. Intel MKL and NVIDIA cuBLAS are proposed for device-specific optimization.
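To illustrate the conditional execution behind FFFs, here is a toy inference sketch (not the ETH Zurich implementation): the neurons form a binary tree stored in heap order, the sign of each node's activation routes the input to one child, and only logarithmically many of the tree's neurons are touched per input.

```python
# Toy fast feedforward network (FFF) inference sketch, illustrative only.
# A depth-d binary tree of neurons is stored in heap order; a forward pass
# evaluates only d of the 2^d - 1 neurons.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, depth = 8, 4, 3
n_nodes = 2 ** depth - 1                    # 7 neurons in total

W_in = rng.standard_normal((n_nodes, d_in))    # node input weights (no biases)
W_out = rng.standard_normal((n_nodes, d_out))  # node output weights

def fff_forward(x):
    y = np.zeros(d_out)
    node, visited = 0, 0
    while node < n_nodes:
        a = W_in[node] @ x                  # activation of the current neuron
        y += max(a, 0.0) * W_out[node]      # ReLU-gated contribution
        node = 2 * node + (1 if a > 0 else 2)  # heap-order child: sign routes
        visited += 1
    return y, visited

x = rng.standard_normal(d_in)
y, visited = fff_forward(x)
assert visited == depth                     # only log-many neurons evaluated
```

The routing decision is what makes inference selective: of the 4,095 neurons in a depth-12 tree, a pass of this kind would touch only 12.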

UltraFastBERT achieves comparable performance to BERT-base, using only 0.3% of its neurons during inference. Trained on a single GPU for a day, it retains at least 96.0% of GLUE predictive performance. UltraFastBERT-1×11-long matches BERT-base performance with 0.3% of its neurons. Performance decreases with deeper fast feedforward networks, but excluding CoLA, all UltraFastBERT models preserve at least 98.6% of predictive performance. Comparisons show significant speedups with quick feedforward layers, achieving 48x to 78x more immediate inference on CPU and a 3.15x speedup on GPU, suggesting potential for large model replacements.

In conclusion, UltraFastBERT is a modification of BERT that achieves efficient language modeling while using only a small fraction of its neurons during inference. The model employs FFFs for substantial speedup, with the provided CPU and PyTorch implementations achieving 78x and 40x speedups, respectively. The study suggests potential further acceleration by implementing primitives for conditional neural execution. Despite using only 0.3% of its neurons, UltraFastBERT’s best model matches BERT-base performance, showcasing the potential for efficient language modeling. UltraFastBERT showcases potential advancements in efficient language modeling, paving the way for faster and resource-friendly models in the future.

The proposed avenues for further research include implementing efficient FFF inference using hybrid vector-level sparse tensors and device-specific optimizations. Exploring the full potential of conditional neural execution for accelerated language modeling is suggested. The potential optimization of large language models by replacing feedforward networks with FFFs is discussed. Future work could focus on reproducible implementations in popular frameworks like PyTorch or TensorFlow and extensive benchmarking to evaluate the performance and practical implications of UltraFastBERT and similar efficient language models.


The post ETH Zurich Researchers Introduce UltraFastBERT: A BERT Variant that Uses 0.3% of its Neurons during Inference while Performing on Par with Similar BERT Models appeared first on MarkTechPost.

Introducing three new NVIDIA GPU-based Amazon EC2 instances

Amazon Elastic Compute Cloud (Amazon EC2) accelerated computing portfolio offers the broadest choice of accelerators to power your artificial intelligence (AI), machine learning (ML), graphics, and high performance computing (HPC) workloads. We are excited to announce the expansion of this portfolio with three new instances featuring the latest NVIDIA GPUs: Amazon EC2 P5e instances powered by NVIDIA H200 GPUs, Amazon EC2 G6 instances featuring NVIDIA L4 GPUs, and Amazon EC2 G6e instances powered by NVIDIA L40S GPUs. All three instances will be available in 2024, and we look forward to seeing what you can do with them.
AWS and NVIDIA have collaborated for over 13 years and have pioneered large-scale, highly performant, and cost-effective GPU-based solutions for developers and enterprises across the spectrum. We have combined NVIDIA’s powerful GPUs with differentiated AWS technologies such as the AWS Nitro System, 3,200 Gbps of Elastic Fabric Adapter (EFA) v2 networking, hundreds of GB/s of data throughput with Amazon FSx for Lustre, and exascale computing with Amazon EC2 UltraClusters to deliver the most performant infrastructure for AI/ML, graphics, and HPC. Coupled with other managed services such as Amazon Bedrock, Amazon SageMaker, and Amazon Elastic Kubernetes Service (Amazon EKS), these instances provide developers with the industry’s best platform for building and deploying generative AI, HPC, and graphics applications.
High-performance and cost-effective GPU-based instances for AI, HPC, and graphics workloads
To power the development, training, and inference of the largest large language models (LLMs), EC2 P5e instances will feature NVIDIA’s latest H200 GPUs, which offer 141 GB of HBM3e GPU memory, 1.7 times larger and 1.4 times faster than that of H100 GPUs. This boost in GPU memory, along with up to 3,200 Gbps of EFA networking enabled by the AWS Nitro System, will enable you to continue to build, train, and deploy your cutting-edge models on AWS.
EC2 G6e instances, featuring NVIDIA L40S GPUs, are built to provide developers with a broadly available option for training and inference of publicly available LLMs, as well as support the increasing adoption of Small Language Models (SLM). They are also optimal for digital twin applications that use NVIDIA Omniverse for describing and simulating across 3D tools and applications, and for creating virtual worlds and advanced workflows for industrial digitalization.
EC2 G6 instances, featuring NVIDIA L4 GPUs, will deliver a lower-cost, energy-efficient solution for deploying ML models for natural language processing, language translation, video and image analysis, speech recognition, and personalization as well as graphics workloads, such as creating and rendering real-time, cinematic-quality graphics and game streaming.

About the Author
Chetan Kapoor is the Director of Product Management for the Amazon EC2 Accelerated Computing Portfolio.

Boost inference performance for LLMs with new Amazon SageMaker containers

Today, Amazon SageMaker launches a new version (0.25.0) of Large Model Inference (LMI) Deep Learning Containers (DLCs) with added support for NVIDIA’s TensorRT-LLM library. With these upgrades, you can effortlessly access state-of-the-art tooling to optimize large language models (LLMs) on SageMaker and achieve price-performance benefits: the Amazon SageMaker LMI TensorRT-LLM DLC reduces latency by 33% on average and improves throughput by 60% on average for the Llama2-70B, Falcon-40B, and CodeLlama-34B models, compared to the previous version.
LLMs have seen an unprecedented growth in popularity across a broad spectrum of applications. However, these models are often too large to fit on a single accelerator or GPU device, making it difficult to achieve low-latency inference and scale. SageMaker offers LMI DLCs to help you maximize the utilization of available resources and improve performance. The latest LMI DLCs offer continuous batching support for inference requests to improve throughput, efficient inference collective operations to improve latency, Paged Attention V2 (which improves the performance of workloads with longer sequence lengths), and the latest TensorRT-LLM library from NVIDIA to maximize performance on GPUs. LMI DLCs offer a low-code interface that simplifies compilation with TensorRT-LLM by just requiring the model ID and optional model parameters; all of the heavy lifting required with building a TensorRT-LLM optimized model and creating a model repo is managed by the LMI DLC. In addition, you can use the latest quantization techniques—GPTQ, AWQ, and SmoothQuant—that are available with LMI DLCs. As a result, with LMI DLCs on SageMaker, you can accelerate time-to-value for your generative AI applications and optimize LLMs for the hardware of your choice to achieve best-in-class price-performance.
In this post, we dive deep into the new features with the latest release of LMI DLCs, discuss performance benchmarks, and outline the steps required to deploy LLMs with LMI DLCs to maximize performance and reduce costs.
New features with SageMaker LMI DLCs
In this section, we discuss three new features with SageMaker LMI DLCs.
SageMaker LMI now supports TensorRT-LLM
SageMaker now offers NVIDIA’s TensorRT-LLM as part of the latest LMI DLC release (0.25.0), enabling state-of-the-art optimizations like SmoothQuant, FP8, and continuous batching for LLMs when using NVIDIA GPUs. TensorRT-LLM opens the door to ultra-low latency experiences that can greatly improve performance. The TensorRT-LLM SDK supports deployments ranging from single-GPU to multi-GPU configurations, with additional performance gains possible through techniques like tensor parallelism. To use the TensorRT-LLM library, choose the TensorRT-LLM DLC from the available LMI DLCs and set engine=MPI among other settings such as option.model_id. The following diagram illustrates the TensorRT-LLM tech stack.

Efficient inference collective operations
In a typical deployment of LLMs, model parameters are spread across multiple accelerators to accommodate the requirements of a large model that can’t fit on a single accelerator. This enhances inference speed by enabling each accelerator to carry out partial calculations in parallel. Afterwards, a collective operation is introduced to consolidate these partial results at the end of these processes, and redistribute them among the accelerators.
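A minimal numpy sketch of why the collective operation is needed: each "accelerator" holds a shard of the weights, computes a partial product in parallel, and an all-reduce-style sum consolidates the partial results into the full output.

```python
# Sketch of why a collective (all-reduce) follows tensor-parallel matmul.
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 6))
x = rng.standard_normal(6)

# Shard the weight's input dimension across two "accelerators".
W0, W1 = W[:, :3], W[:, 3:]
x0, x1 = x[:3], x[3:]

partial0 = W0 @ x0          # computed on device 0
partial1 = W1 @ x1          # computed on device 1

# The collective operation consolidates the partial results (all-reduce sum).
y = partial0 + partial1
assert np.allclose(y, W @ x)
```

In a real deployment this sum runs over the interconnect between GPUs, which is exactly the communication path the new SageMaker collective operation accelerates on P4d instances.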
For P4D instance types, SageMaker implements a new collective operation that speeds up communication between GPUs. As a result, you get lower latency and higher throughput with the latest LMI DLCs compared to previous versions. Furthermore, this feature is supported out of the box with LMI DLCs, and you don’t need to configure anything to use this feature because it’s embedded in the SageMaker LMI DLCs and is exclusively available for Amazon SageMaker.
Quantization support
SageMaker LMI DLCs now support the latest quantization techniques, including pre-quantized models with GPTQ, Activation-aware Weight Quantization (AWQ), and just-in-time quantization like SmoothQuant.
GPTQ allows LMI to run popular INT3 and INT4 models from Hugging Face. It offers the smallest possible model weights that can fit on a single GPU/multi-GPU. LMI DLCs also support AWQ inference, which allows faster inference speed. Finally, LMI DLCs now support SmoothQuant, which allows INT8 quantization to reduce the memory footprint and computational cost of models with minimal loss in accuracy. Currently, we allow you to do just-in-time conversion for SmoothQuant models without any additional steps. GPTQ and AWQ need to be quantized with a dataset to be used with LMI DLCs. You can also pick up popular pre-quantized GPTQ and AWQ models to use on LMI DLCs. To use SmoothQuant, set option.quantize=smoothquant with engine=DeepSpeed in serving.properties. A sample notebook using SmoothQuant for hosting GPT-Neox on ml.g5.12xlarge is located on GitHub.
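As a concrete illustration of the settings just described, a minimal serving.properties for just-in-time SmoothQuant might look like the following sketch. The engine and option.quantize values come from the text above; the model ID and parallelism degree are illustrative placeholders.

```properties
engine=DeepSpeed
option.model_id=EleutherAI/gpt-neox-20b
option.quantize=smoothquant
option.tensor_parallel_degree=4
```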
Using SageMaker LMI DLCs
You can deploy your LLMs on SageMaker using the new LMI DLCs 0.25.0 without any changes to your code. SageMaker LMI DLCs use DJL serving to serve your model for inference. To get started, you just need to create a configuration file that specifies settings like model parallelization and inference optimization libraries to use. For instructions and tutorials on using SageMaker LMI DLCs, refer to Model parallelism and large model inference and our list of available SageMaker LMI DLCs.
The DeepSpeed container includes a library called LMI Distributed Inference Library (LMI-Dist). LMI-Dist is an inference library used to run large model inference with the best optimization used in different open-source libraries, across vLLM, Text-Generation-Inference (up to version 0.9.4), FasterTransformer, and DeepSpeed frameworks. This library incorporates open-source popular technologies like FlashAttention, PagedAttention, FusedKernel, and efficient GPU communication kernels to accelerate the model and reduce memory consumption.
TensorRT-LLM is an open-source library released by NVIDIA in October 2023. We optimized the TensorRT-LLM library for inference speedup and created a toolkit to simplify the user experience by supporting just-in-time model conversion. This toolkit enables users to provide a Hugging Face model ID and deploy the model end-to-end. It also supports continuous batching with streaming. You can expect approximately 1–2 minutes to compile the Llama-2 7B and 13B models, and around 7 minutes for the 70B model. If you want to avoid this compilation overhead during SageMaker endpoint setup and instance scaling, we recommend using ahead-of-time (AOT) compilation with our tutorial to prepare the model. We also accept any TensorRT-LLM model built for Triton Server that can be used with LMI DLCs.
Performance benchmarking results
We compared the performance of the latest SageMaker LMI DLCs version (0.25.0) to the previous version (0.23.0). We conducted experiments on the Llama-2 70B, Falcon 40B, and CodeLlama 34B models to demonstrate the performance gain with TensorRT-LLM and efficient inference collective operations (available on SageMaker).
SageMaker LMI containers come with a default handler script to load and host models, providing a low-code option. You also have the option to bring your own script if you need to do any customizations to the model loading steps. You need to pass the required parameters in a serving.properties file. This file contains the required configurations for the Deep Java Library (DJL) model server to download and host the model. The following code is the serving.properties used for our deployment and benchmarking:


The engine parameter is used to define the runtime engine for the DJL model server. We can specify the Hugging Face model ID or Amazon Simple Storage Service (Amazon S3) location of the model using the model_id parameter. The task parameter is used to define the natural language processing (NLP) task. The tensor_parallel_degree parameter sets the number of devices over which the tensor parallel modules are distributed. The use_custom_all_reduce parameter is set to true for GPU instances that have NVLink enabled to speed up model inference; you can set this for P4d, P4de, P5, and other instances with NVLink-connected GPUs. The output_formatter parameter sets the output format. The max_rolling_batch_size parameter sets the limit for the maximum number of concurrent requests. The model_loading_timeout parameter sets the timeout value for downloading and loading the model to serve inference. For more details on the configuration options, refer to Configurations and settings.
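A representative serving.properties consistent with the parameters described here might look like the following sketch; all values are illustrative placeholders rather than the exact benchmark configuration.

```properties
engine=MPI
option.model_id=meta-llama/Llama-2-70b-hf
option.task=text-generation
option.tensor_parallel_degree=8
option.use_custom_all_reduce=true
option.output_formatter=json
option.max_rolling_batch_size=64
option.model_loading_timeout=1800
```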
Llama-2 70B
The following are the performance comparison results of Llama-2 70B. Latency reduced by 28% and throughput increased by 44% for concurrency of 16, with the new LMI TensorRT LLM DLC.

Falcon 40B
The following figures compare Falcon 40B. Latency reduced by 36% and throughput increased by 59% for concurrency of 16, with the new LMI TensorRT LLM DLC.

CodeLlama 34B
The following figures compare CodeLlama 34B. Latency reduced by 36% and throughput increased by 77% for concurrency of 16, with the new LMI TensorRT LLM DLC.

Recommended configuration and container for hosting LLMs
With the latest release, SageMaker is providing two containers: 0.25.0-deepspeed and 0.25.0-tensorrtllm. The DeepSpeed container contains DeepSpeed and the LMI Distributed Inference Library (LMI-Dist). The TensorRT-LLM container includes NVIDIA’s TensorRT-LLM library to accelerate LLM inference.
We recommend the deployment configuration illustrated in the following diagram.

To get started, refer to the sample notebooks:

Deploy Llama-2 70B using the TRT-LLM 0.25.0 LMI container
Deploy Llama-2 70B using the DeepSpeed 0.25.0 LMI container

In this post, we showed how you can use SageMaker LMI DLCs to optimize LLMs for your business use case and achieve price-performance benefits. To learn more about LMI DLC capabilities, refer to Model parallelism and large model inference. We’re excited to see how you use these new capabilities from Amazon SageMaker.

About the authors
Michael Nguyen is a Senior Startup Solutions Architect at AWS, specializing in leveraging AI/ML to drive innovation and develop business solutions on AWS. Michael holds 12 AWS certifications and has a BS/MS in Electrical/Computer Engineering and an MBA from Penn State University, Binghamton University, and the University of Delaware.
Rishabh Ray Chaudhury is a Senior Product Manager with Amazon SageMaker, focusing on Machine Learning inference. He is passionate about innovating and building new experiences for Machine Learning customers on AWS to help scale their workloads. In his spare time, he enjoys traveling and cooking. You can find him on LinkedIn.
Qing Lan is a Software Development Engineer in AWS. He has been working on several challenging products in Amazon, including high performance ML inference solutions and high performance logging system. Qing’s team successfully launched the first Billion-parameter model in Amazon Advertising with very low latency required. Qing has in-depth knowledge on the infrastructure optimization and Deep Learning acceleration.
Jian Sheng is a Software Development Engineer at Amazon Web Services who has worked on several key aspects of machine learning systems. He has been a key contributor to the SageMaker Neo service, focusing on deep learning compilation and framework runtime optimization. Recently, he has directed his efforts and contributed to optimizing the machine learning system for large model inference.
Vivek Gangasani is an AI/ML Startup Solutions Architect for Generative AI startups at AWS. He helps emerging GenAI startups build innovative solutions using AWS services and accelerated compute. Currently, he is focused on developing strategies for fine-tuning and optimizing the inference performance of Large Language Models. In his free time, Vivek enjoys hiking, watching movies, and trying different cuisines.
Harish Tummalacherla is a Software Engineer with the Deep Learning Performance team at SageMaker. He works on performance engineering for serving large language models efficiently on SageMaker. In his spare time, he enjoys running, cycling, and ski mountaineering.

Simplify data prep for generative AI with Amazon SageMaker Data Wrangler

Generative artificial intelligence (generative AI) models have demonstrated impressive capabilities in generating high-quality text, images, and other content. However, these models require massive amounts of clean, structured training data to reach their full potential. Most real-world data exists in unstructured formats like PDFs, which requires preprocessing before it can be used effectively.
According to IDC, unstructured data accounts for over 80% of all business data today. This includes formats like emails, PDFs, scanned documents, images, audio, video, and more. While this data holds valuable insights, its unstructured nature makes it difficult for AI algorithms to interpret and learn from it. According to a 2019 survey by Deloitte, only 18% of businesses reported being able to take advantage of unstructured data.
As AI adoption continues to accelerate, developing efficient mechanisms for digesting and learning from unstructured data becomes even more critical. This could involve better preprocessing tools, semi-supervised learning techniques, and advances in natural language processing. Companies that use their unstructured data most effectively will gain significant competitive advantages from AI. Clean data is also important for good model performance: extracted text often contains large amounts of gibberish and boilerplate (e.g., leftover HTML markup), data scraped from the internet often contains many duplicates, and data from social media, reviews, or other user-generated content can contain toxic or biased material that you may need to filter out with preprocessing steps. There can also be a lot of low-quality or bot-generated text, which can be filtered out using accompanying metadata (e.g., filtering out customer service responses that received low customer ratings).
Data preparation is important at multiple stages in Retrieval Augmented Generation (RAG) models. The knowledge source documents need preprocessing, like cleaning text and generating semantic embeddings, so they can be efficiently indexed and retrieved. The user’s natural language query also requires preprocessing, so it can be encoded into a vector and compared to document embeddings. After retrieving relevant contexts, they may need additional preprocessing, like truncation, before being concatenated to the user’s query to create the final prompt for the foundation model.

Amazon SageMaker Canvas now supports comprehensive data preparation capabilities powered by Amazon SageMaker Data Wrangler. With this integration, SageMaker Canvas provides customers with an end-to-end no-code workspace to prepare data and build and use ML and foundation models, accelerating the time from data to business insights. You can now easily discover and aggregate data from over 50 data sources, and explore and prepare data using over 300 built-in analyses and transformations in SageMaker Canvas’ visual interface.
Solution overview
In this post, we work with a PDF documentation dataset: the Amazon Bedrock user guide. We show how to preprocess this dataset for RAG; specifically, we clean the data and create RAG artifacts to answer questions about its content. Consider the following machine learning (ML) problem: a user asks a large language model (LLM) the question “How to filter and search models in Amazon Bedrock?”. The LLM has not seen the documentation during the training or fine-tuning stage, so it wouldn’t be able to answer the question and would most probably hallucinate. Our goal in this post is to find a relevant piece of text from the PDF (i.e., RAG) and attach it to the prompt, thereby enabling the LLM to answer questions specific to this document.
Below, we show how you can do all these main preprocessing steps from Amazon SageMaker Canvas (powered by Amazon SageMaker Data Wrangler):

Extract text from a PDF document (powered by Amazon Textract)
Redact sensitive information (powered by Amazon Comprehend)
Chunk the text into pieces
Create embeddings for each piece (powered by Amazon Bedrock)
Upload the embeddings to a vector database (powered by Amazon OpenSearch)
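As a plain-Python illustration of the chunking step in the list above (the Canvas example snippets perform this inside a PySpark custom transform; the chunk size and overlap here are arbitrary choices):

```python
# Minimal text-chunking sketch: split text into overlapping character
# windows so each piece can be embedded independently.
def chunk_text(text, chunk_size=200, overlap=50):
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
        start += chunk_size - overlap     # step forward, keeping some overlap
    return chunks

doc = "x" * 500
pieces = chunk_text(doc)
assert all(len(p) <= 200 for p in pieces)
```

Overlap between chunks helps preserve context that would otherwise be cut at a chunk boundary; real pipelines often chunk on token or sentence boundaries instead of raw characters.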

For this walkthrough, you should have the following:

An AWS account with permissions to create AWS Identity and Access Management (AWS IAM) policies and roles
Access to Amazon SageMaker, an instance of Amazon SageMaker Studio, and a user for Studio. For more information about prerequisites, see Getting started with using Amazon SageMaker Canvas.
Access to Amazon Bedrock models. Follow the guidelines for model access.
Access to Amazon Comprehend. The Amazon SageMaker Studio execution role must have permission to call the Amazon Comprehend DetectPiiEntities action.
Access to Amazon Textract. The Amazon SageMaker Studio execution role must have permission to call Amazon Textract.
Read and write access to an Amazon Simple Storage Service (Amazon S3) bucket.
Access to Amazon OpenSearch as a vector database. The choice of vector database is an important architectural decision. There are several good options to consider, each with their own strengths. In this example, we have chosen Amazon OpenSearch as our vector database.

Note: Create OpenSearch Service domains following the instructions here. For simplicity, let’s pick the option with a master username and password for fine-grained access control. Once the domain is created, create a vector index with the following mappings; the vector dimension of 1536 aligns with Amazon Titan embeddings:

PUT knowledge-base-index
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "text_content": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "text_content_v": {
        "type": "knn_vector",
        "dimension": 1536
      }
    }
  }
}

Build a data flow
In this section, we cover how we can build a data flow to extract text and metadata from PDFs, clean and process the data, generate embeddings using Amazon Bedrock, and index the data in Amazon OpenSearch.
Launch SageMaker Canvas
To launch SageMaker Canvas, complete the following steps:

On the Amazon SageMaker Console, choose Domains in the navigation pane.
Choose your domain.
On the launch menu, choose Canvas.

Create a dataflow
Complete the following steps to create a data flow in SageMaker Canvas:

On the SageMaker Canvas home page, choose Data Wrangler.
Choose Create on the right side of the page, then enter a data flow name and select Create.
This will land on a data flow page.
Choose Import data, select tabular data.

Now let’s import the data from Amazon S3 bucket:

Choose Import data and select Tabular from the drop-down list.
For Data Source, select Amazon S3 from the drop-down list.
Navigate to the meta data file with PDF file locations, and choose the file.
Now the metadata file is loaded into the data preparation flow, and we can proceed to add the next steps to transform the data and index it into Amazon OpenSearch. In this case, the file has the following metadata, with the location of each file in an Amazon S3 directory.

To add a new transform, complete the following steps:

Choose the plus sign and choose Add Transform.
Choose Add Step and choose Custom Transform.
You can create a custom transform using Pandas, PySpark, Python user-defined functions, or SQL (PySpark). Choose Python (PySpark) for this use case.
Enter a name for the step. From the example code snippets, browse and select Extract text from PDF. Make the necessary changes to the code snippet and select Add.
Let’s add a step to redact personally identifiable information (PII) from the extracted data by leveraging Amazon Comprehend. Choose Add Step, choose Custom Transform, and select Python (PySpark).

From the example code snippets, browse and select Mask PII. Make the necessary changes to the code snippet and select Add.

The next step is to chunk the text content. Choose Add Step, choose Custom Transform, and select Python (PySpark).

From the example code snippets, browse and select Chunk text. Make the necessary changes to the code snippet and select Add.

Next, let’s convert the text content to vector embeddings using the Amazon Bedrock Titan Embeddings model. Choose Add Step, choose Custom Transform, and select Python (PySpark).

From the example code snippets, browse and select Generate text embedding with Bedrock. Make the necessary changes to the code snippet and select Add.

Now we have vector embeddings available for the PDF file contents. Let’s go ahead and index the data into Amazon OpenSearch. Choose Add Step, choose Custom Transform, and select Python (PySpark). You’re free to rewrite the following code to use your preferred vector database. For simplicity, we are using a master username and password to access the OpenSearch APIs; for production workloads, select an option according to your organization’s policies.

from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType
import json
import requests

text_column = "text_redacted_chunks_embedding"
output_column = text_column + "_response"

headers = {"Content-Type": "application/json", "kbn-xsrf": "true", "osd-xsrf": "true", "security_tenant": "global"}
index_name = "s3_vector_data_v1"

# Placeholders: supply your OpenSearch domain endpoint and master credentials
opensearch_url = f"https://<your-domain-endpoint>/{index_name}/_doc"

def index_data(text_redacted_chunks, text_redacted_chunks_embedding):
    # Index a chunk and its embedding as one document
    input_json = json.dumps({"text_content": text_redacted_chunks[-1],
                             "text_content_v": text_redacted_chunks_embedding[-1]})
    response = requests.request(method="POST",
                                url=opensearch_url,
                                headers=headers,
                                data=input_json,
                                auth=(master_user, master_pass))
    return response.content

indexing_udf = udf(index_data, StringType())
df = df.withColumn("index_response",
                   indexing_udf(col("text_redacted_chunks"), col("text_redacted_chunks_embedding")))

Finally, the dataflow created would be as follows:

With this data flow, the data from the PDF file has been read and indexed with vector embeddings in Amazon OpenSearch. Now it’s time to create a file with queries to run against the indexed data and save it to an Amazon S3 location. We’ll point our search data flow at that file and write the corresponding results to a new file in an Amazon S3 location.
Preparing a prompt
After we create a knowledge base out of our PDF, we can test it by searching the knowledge base for a few sample queries. We’ll process each query as follows:

Generate an embedding for the query (powered by Amazon Bedrock)
Query the vector database for the nearest-neighbor context (powered by Amazon OpenSearch)
Combine the query and the context into the prompt
Query the LLM with the prompt (powered by Amazon Bedrock)
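The third step above, combining the user query with the retrieved context into the final prompt, can be sketched in a few lines; the prompt template wording here is purely illustrative.

```python
# Sketch of prompt assembly for RAG: truncate the retrieved context before
# concatenating it with the user's query, as described above.
def build_prompt(query, contexts, max_context_chars=2000):
    context = "\n\n".join(contexts)[:max_context_chars]
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

prompt = build_prompt(
    "How to filter and search models in Amazon Bedrock?",
    ["Amazon Bedrock lets you filter models by provider and modality."],
)
assert "Question: How to filter" in prompt
```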
On the SageMaker Canvas home page, choose Data preparation.
Choose Create on the right side of the page, then enter a data flow name and select Create.

Now let’s load the user questions and then create a prompt by combining the question and the similar documents. This prompt is provided to the LLM for generating an answer to the user question.

Let’s load a CSV file with user questions. Choose Import Data and select Tabular from the drop-down list.
For Data Source, select Amazon S3 from the drop-down list. Alternatively, you can choose to upload a file with user queries.
Let’s add a custom transformation to convert the data into vector embeddings, followed by searching related embeddings from Amazon OpenSearch, before sending a prompt to Amazon Bedrock with the query and context from knowledge base. To generate embeddings for the query, you can use the same example code snippet Generate text embedding with Bedrock mentioned in Step #7 above.
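As a rough sketch of what that embedding step looks like: the model ID `amazon.titan-embed-text-v1` and the `embedding` response field below are assumptions based on the Amazon Titan embeddings API, and the Bedrock call itself is left commented out; defer to the referenced example code snippet for the exact call.

```python
import json

def titan_embedding_request(text):
    """Request body for the Amazon Titan text embeddings model
    (assumed model ID 'amazon.titan-embed-text-v1')."""
    return json.dumps({"inputText": text})

def parse_titan_embedding(response_body):
    """Pull the embedding vector out of the model's JSON response
    (the 'embedding' field name is an assumption)."""
    return json.loads(response_body)["embedding"]

# Sketch of the Bedrock call itself (not run here):
# import boto3
# bedrock = boto3.client("bedrock-runtime")
# resp = bedrock.invoke_model(modelId="amazon.titan-embed-text-v1",
#                             body=titan_embedding_request("What is RAG?"),
#                             contentType="application/json",
#                             accept="application/json")
# vector = parse_titan_embedding(resp["body"].read())
```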

Let’s invoke the Amazon OpenSearch API to search relevant documents for the generated vector embeddings. Add a custom transform with Python (PySpark).

from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType
import json
import requests

text_column = "Queries_embedding"
output_column = text_column + "_response"

headers = {"Content-Type": "application/json", "kbn-xsrf": "true", "osd-xsrf": "true", "security_tenant": "global"}
index_name = 's3_vector_data_v1'

def search_data(text_column_embedding):
    # k-NN search for the documents nearest to the query embedding.
    input_json = json.dumps({"size": 3,
                             "query": {"knn": {"text_content_v": {"vector": text_column_embedding, "k": 3}}}})
    response = requests.request(method="GET",
                                url=f"https://{opensearch_domain_endpoint}/{index_name}/_search",
                                headers=headers,
                                data=input_json,
                                auth=(master_user, master_pass))
    return response.content

# The UDF returns the raw response body, so a string return type is used.
search_udf = udf(search_data, StringType())
df = df.withColumn(output_column, search_udf(col(text_column)))

Let’s add a custom transform to call the Amazon Bedrock API for the query response, passing the documents from the Amazon OpenSearch knowledge base as context. From the example code snippets, browse and select Query Bedrock with context. Make the necessary changes to the code snippet and select Add.

In summary, the RAG-based question answering dataflow is as follows:

ML practitioners spend a lot of time crafting feature engineering code, applying it to their initial datasets, training models on the engineered datasets, and evaluating model accuracy. Given the experimental nature of this work, even the smallest project leads to multiple iterations. The same feature engineering code is often run again and again, wasting time and compute resources on repeating the same operations. In large organizations, this can cause an even greater loss of productivity because different teams often run identical jobs or even write duplicate feature engineering code because they have no knowledge of prior work. To avoid reprocessing features, we’ll export our data flow to an Amazon SageMaker pipeline. Choose the + button to the right of the final query step, select Export data flow, and choose Run SageMaker Pipeline (via Jupyter notebook).

Cleaning up
To avoid incurring future charges, delete or shut down the resources you created while following this post. Refer to Logging out of Amazon SageMaker Canvas for more details.
In this post, we showed Amazon SageMaker Canvas’s end-to-end capabilities by assuming the role of a data professional preparing data for an LLM. Interactive data preparation enabled quick cleaning, transformation, and analysis of the data to engineer informative features. By removing coding complexities, SageMaker Canvas allowed rapid iteration to create a high-quality training dataset. This accelerated workflow leads directly into building, training, and deploying a performant machine learning model for business impact. With its comprehensive data preparation capabilities and unified experience from data to insights, SageMaker Canvas empowers users to improve their ML outcomes.
We encourage you to learn more by exploring Amazon SageMaker Data Wrangler, Amazon SageMaker Canvas, Amazon Titan models, Amazon Bedrock, and Amazon OpenSearch Service to build a solution using the sample implementation provided in this post and a dataset relevant to your business. If you have questions or suggestions, then please leave a comment.

About the Authors
Ajjay Govindaram is a Senior Solutions Architect at AWS. He works with strategic customers who are using AI/ML to solve complex business problems. His experience lies in providing technical direction as well as design assistance for modest to large-scale AI/ML application deployments. His knowledge ranges from application architecture to big data, analytics, and machine learning. He enjoys listening to music while resting, experiencing the outdoors, and spending time with his loved ones.
Nikita Ivkin is a Senior Applied Scientist at Amazon SageMaker Data Wrangler with interests in machine learning and data cleaning algorithms.

Meet LQ-LoRA: A Variant of LoRA that Allows Low-Rank Quantized Matrix …

In the rapidly advancing era of Artificial Intelligence, the introduction of Large Language Models (LLMs) has transformed the way machines and humans interact with each other. Recent months have seen an exponential increase in the number of LLMs developed, with remarkable capabilities and advanced algorithms. Models like GPT-3.5, GPT-4, LLaMA, and PaLM have demonstrated exceptional human-like abilities in Natural Language Understanding (NLU), processing, translation, summarization, and even content generation.

These LLMs are trained on massive amounts of data, but adapting them to new datasets is challenging: full fine-tuning carries substantial compute and memory requirements. To address the issue of memory efficiency in LLM fine-tuning, a team of researchers has recently built on the idea of parameter-efficient fine-tuning methods.

By learning a smaller, fine-tuned extension to the original pretrained model, these techniques lower the amount of memory needed for fine-tuning. Low-Rank Adaptation (LoRA), a popular strategy for efficient LLM adaptation, re-parametrizes the weight matrix of the pretrained model as the frozen original weights plus a low-rank update, and fine-tunes only the two low-rank factors, L1 and L2; the pretrained weights remain unchanged.

Researchers have enhanced the memory efficiency of LoRA by applying it to a quantized pretrained model. Quantization decreases the precision of the model’s parameters to conserve memory, but when the quantization is aggressive, the standard zero initialization of the low-rank components may no longer be optimal. To overcome the quantization error, the team has introduced a variant of LoRA called LQ-LoRA.

LQ-LoRA breaks down the weight matrix into a quantized component, Q, and a low-rank component, L1L2, using an iterative technique influenced by Principal Component Analysis (PCA). In LQ-LoRA, L1 and L2 are refined during adaptation and initialized to capture the high-variance subspaces of the initial weight matrix.
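In symbols, the decomposition described above can be written as follows (using the article’s L1, L2 notation; the shapes and the exact form of the quantization operator are illustrative):

```latex
% LoRA: only the low-rank factors are fine-tuned; W_0 stays frozen
W \;\approx\; W_0 + L_1 L_2, \qquad
L_1 \in \mathbb{R}^{d \times r},\;
L_2 \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)

% LQ-LoRA: the frozen component is quantized, chosen iteratively
% to minimize the residual left over after the low-rank part
W \;\approx\; Q + L_1 L_2, \qquad
Q = \mathrm{Quantize}\!\left(W - L_1 L_2\right)
```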

The team has shared that this work uses integer linear programming to find a mixed quantization method to solve the problem of applying the same quantization configuration to all layers. Given an overall desired bit rate, this technique permits assigning various configurations, including bits and block size, to each matrix. 

The team has adapted RoBERTa and LLaMA-2 models of varying sizes (including the 7B and 70B LLaMA-2 variants) using LQ-LoRA. The findings show that LQ-LoRA performs better than GPTQ-LoRA and strong QLoRA baselines. The ability to train a 2.5-bit LLaMA-2 model on the OpenAssistant benchmark that is competitive with a model fine-tuned using 4-bit QLoRA shows that the suggested approach allows for more aggressive quantization.

LQ-LoRA has also shown strong performance in model compression after calibration on a language modeling dataset. Despite the decreased bit rate, the team was able to produce a 2.75-bit LLaMA-2-70B model that is competitive with the original full-precision model. This indicates that the suggested method may be able to drastically lower the memory needs of large language models without sacrificing functionality on particular tasks.

In conclusion, LQ-LoRA is a significant turning point in the development of language models. Its method of memory-efficient adaptation and data-aware considerations, along with dynamic quantization parameter tuning, can definitely lead to a paradigm shift in the field of Artificial Intelligence.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to join our 33k+ ML SubReddit, 41k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

If you like our work, you will love our newsletter.
The post Meet LQ-LoRA: A Variant of LoRA that Allows Low-Rank Quantized Matrix Decomposition for Efficient Language Model Finetuning appeared first on MarkTechPost.

Redefining Transformers: How Simple Feed-Forward Neural Networks Can M …

Researchers from ETH Zurich analyze the efficacy of utilizing standard shallow feed-forward networks to emulate the attention mechanism in the Transformer model, a leading architecture for sequence-to-sequence tasks. Key attention mechanism elements in the Transformer are replaced with simple feed-forward networks trained through knowledge distillation. Rigorous ablation studies and experiments with various replacement network types and sizes underscore the adaptability of shallow feed-forward networks in emulating attention mechanisms, highlighting their potential to simplify complex sequence-to-sequence architectures.

The research emphasizes the adaptability of shallow feed-forward networks in replicating attention mechanisms, and the study employs BLEU scores as the evaluation metric. While the replacement networks successfully replicate the behavior of the self-attention layers in the encoder and decoder, replacing the cross-attention mechanism poses challenges, leading to notably lower BLEU scores. The research sheds light on the limitations and potential of this approach.

The study explores the viability of replacing attention layers in the original Transformer model with shallow feed-forward networks for sequence-to-sequence tasks, particularly in language translation. Inspired by the computational overheads associated with attention mechanisms, the study investigates whether external feed-forward networks can effectively mimic their behavior. The research focuses on training these networks to substitute key attention components. It aims to assess their capability in modeling attention mechanisms and their potential as an alternative in sequence-to-sequence tasks.

The approach employs knowledge distillation to train shallow feed-forward networks, using intermediate activations from the original Transformer model as the teacher model. A comprehensive ablation study introduces four methods for replacing the attention mechanism in the Transformer’s encoder. Evaluated on the IWSLT2017 dataset using the BLEU metric, the proposed approaches demonstrate comparable performance to the original Transformer. It provides empirical evidence and detailed implementation specifics in the appendix, establishing the effectiveness of these methods in sequence-to-sequence tasks, particularly language translation.
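As a toy illustration of the distillation setup described above (not the paper’s actual models: a fixed linear map stands in for the teacher’s attention activations, and a linear layer stands in for the shallow feed-forward student):

```python
import numpy as np

rng = np.random.default_rng(0)

# Teacher: a fixed "attention-like" mapping whose outputs we want to imitate.
W_teacher = rng.normal(size=(4, 4))
teacher = lambda x: x @ W_teacher

# Student: a single linear layer trained by gradient descent to match the
# teacher's intermediate activations (the knowledge-distillation objective).
W_student = np.zeros((4, 4))
lr = 0.1
for _ in range(500):
    x = rng.normal(size=(8, 4))             # a batch of random inputs
    err = x @ W_student - teacher(x)        # distillation residual
    W_student -= lr * x.T @ err / len(x)    # gradient step on the MSE loss
```

After training, the student reproduces the teacher’s mapping almost exactly; the real study does the analogous thing with feed-forward networks matching Transformer activations on IWSLT2017.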

Results indicate that these models can match the original’s performance, showcasing the efficacy of shallow feed-forward networks as attention-layer alternatives. Ablation studies offer insights into replacement network types and sizes, affirming their viability. However, replacing the cross-attention mechanism in the decoder significantly degrades performance, suggesting that while shallow networks excel at self-attention, they struggle to emulate the more complex cross-attention interactions in the Transformer model.

In conclusion, the study on attentionless Transformers highlights the need for advanced optimization techniques like knowledge distillation for training these models from scratch. While less specialized architectures may have potential for advanced tasks, replacing the cross-attention mechanism in the decoder with feed-forward networks can significantly reduce performance, revealing the challenges in capturing complex cross-attention interactions.

Future work could optimize hyperparameters using advanced techniques like Bayesian optimization to enhance translation quality and address size bottlenecks. Exploring more complex feed-forward networks, especially for the decoder’s cross-attention, may improve capturing complexity. Investigating alternative architectures for improved expressiveness in cross-attention is a promising research direction. The generalizability of attentionless Transformers to diverse sequence-to-sequence tasks warrants exploration. Further experiments and ablation studies can provide deeper insights, potentially refining the approach and optimizing feed-forward networks emulating attention mechanisms.

The post Redefining Transformers: How Simple Feed-Forward Neural Networks Can Mimic Attention Mechanisms for Efficient Sequence-to-Sequence Tasks appeared first on MarkTechPost.

Researchers from UC San Diego Introduce EUGENe: An Easy-to-Use Deep Le …

Deep learning is being used in nearly every sphere of life, and it has had a particularly big impact on biomedical research: a system that can get better at tasks with little human help has changed the way scientists study medicine and disease.

It is especially impactful in genomics, the field of biology that investigates how DNA is organized into genes and the processes through which those genes are activated or deactivated within individual cells.

Researchers at the University of California, San Diego, have formulated a new deep-learning platform that can be quickly and easily adapted to suit various genomics projects. Hannah Carter, Ph.D., associate professor in the Department of Medicine at UC San Diego School of Medicine, said that each cell has the same DNA, but how that DNA is expressed changes what cells look like and what they do.

EUGENe uses modules and sub-packages to facilitate essential functions within a genomics deep learning workflow. These functions include (1) extracting, transforming, and loading sequence data from various file formats; (2) instantiating, initializing, and training diverse model architectures; and (3) evaluating and interpreting model behavior.

While deep learning holds the potential to offer valuable insights into the diverse biological processes governing genetic variation, its implementation poses challenges for researchers without extensive computer science expertise. The researchers said that the objective was to develop a platform that enables genomics researchers to streamline their deep learning data analysis, facilitating the extraction of predictions from raw data with greater ease and efficiency.

Even though only about 2% of the total genome consists of genes encoding specific proteins, the remaining 98%, often denoted as junk DNA due to its purported lack of known function, plays a pivotal role in determining the timing, location, and manner in which certain genes are activated. Understanding the roles of these non-coding genome sections has been a top priority for genomics researchers. Deep learning has proven to be a powerful tool for achieving this goal, though using it effectively can be difficult.

Adam Klie, a Ph.D. student in the Carter lab and the first author of the study, said that many existing platforms require many hours of coding and data wrangling. He noted that numerous projects require researchers to start their work from scratch, demanding expertise that may not be readily available to all labs interested in this domain.

To evaluate its efficacy, the researchers tested EUGENe by attempting to replicate the findings of three previous genomics studies that used a variety of sequencing data types. In the past, analyzing such diverse data sets would require integrating several different technological platforms.

EUGENe demonstrated remarkable flexibility, effectively replicating the outcomes of every investigation. This flexibility highlights the platform’s ability to manage a wide range of sequencing data and its potential as an adaptable instrument for genomics research.

EUGENe shows adaptability to different DNA sequencing data types and support for various deep learning models. The researchers aim to broaden its scope to encompass a wider array of data types, including single-cell sequencing data, and plan to make EUGENe accessible to research groups worldwide.

Carter expressed enthusiasm about the project’s collaborative potential, saying that one of the exciting things about this project is that the more people use the platform, the better they can make it over time, which will be essential as deep learning continues to evolve rapidly.

The post Researchers from UC San Diego Introduce EUGENe: An Easy-to-Use Deep Learning Genomics Software appeared first on MarkTechPost.

Amazon Transcribe announces a new speech foundation model-powered ASR …

Amazon Transcribe is a fully managed automatic speech recognition (ASR) service that makes it straightforward for you to add speech-to-text capabilities to your applications. Today, we are happy to announce a next-generation multi-billion parameter speech foundation model-powered system that expands automatic speech recognition to over 100 languages. In this post, we discuss some of the benefits of this system, how companies are using it, and how to get started. We also provide an example of the transcription output below.
Transcribe’s speech foundation model is trained using best-in-class, self-supervised algorithms to learn the inherent universal patterns of human speech across languages and accents. It is trained on millions of hours of unlabeled audio data from over 100 languages. The training recipes are optimized through smart data sampling to balance the training data between languages, ensuring that traditionally under-represented languages also reach high accuracy levels.
Carbyne is a software company that develops cloud-based, mission-critical contact center solutions for emergency call responders. Carbyne’s mission is to help emergency responders save lives, and language can’t get in the way of their goals. Here is how they use Amazon Transcribe to pursue their mission:

“AI-powered Carbyne Live Audio Translation is directly aimed at helping improve emergency response for the 68 million Americans who speak a language other than English at home, in addition to the up to 79 million foreign visitors to the country annually. By leveraging Amazon Transcribe’s new multilingual foundation model powered ASR, Carbyne will be even better equipped to democratize life-saving emergency services, because Every. Person. Counts.”
– Alex Dizengof, Co-Founder and CTO of Carbyne.

By leveraging the speech foundation model, Amazon Transcribe delivers significant accuracy improvements of between 20% and 50% across most languages. On telephony speech, which is a challenging and data-scarce domain, the accuracy improvement is between 30% and 70%. In addition to substantial accuracy improvement, this large ASR model also delivers improvements in readability with more accurate punctuation and capitalization. With the advent of generative AI, thousands of enterprises are using Amazon Transcribe to unlock rich insights from their audio content. With significantly improved accuracy and support for over 100 languages, Amazon Transcribe will positively impact all such use cases. All existing and new customers using Amazon Transcribe in batch mode can access speech foundation model-powered speech recognition without needing any change to either the API endpoint or input parameters.
The new ASR system delivers several key features across all the 100+ languages related to ease of use, customization, user safety, and privacy. These include features such as automatic punctuation, custom vocabulary, automatic language identification, speaker diarization, word-level confidence scores, and custom vocabulary filter. The system’s expanded support for different accents, noise environments, and acoustic conditions enables you to produce more accurate outputs and thereby helps you effectively embed voice technologies in your applications.
Enabled by the high accuracy of Amazon Transcribe across different accents and noise conditions, its support for a large number of languages, and its breadth of value-added feature sets, thousands of enterprises will be empowered to unlock rich insights from their audio content, as well as increase the accessibility and discoverability of their audio and video content across various domains. For instance, contact centers transcribe and analyze customer calls to identify insights and subsequently improve customer experience and agent productivity. Content producers and media distributors automatically generate subtitles using Amazon Transcribe to improve content accessibility.
Get started with Amazon Transcribe
You can use the AWS Command Line Interface (AWS CLI), AWS Management Console, and various AWS SDKs for batch transcriptions and continue to use the same StartTranscriptionJob API to get performance benefits from the enhanced ASR model without needing to make any code or parameter changes on your end. For more information about using the AWS CLI and the console, refer to Transcribing with the AWS CLI and Transcribing with the AWS Management Console, respectively.
The first step is to upload your media files into an Amazon Simple Storage Service (Amazon S3) bucket, an object storage service built to store and retrieve any amount of data from anywhere. Amazon S3 offers industry-leading durability, availability, performance, security, and virtually unlimited scalability at very low cost. You can choose to save your transcript in your own S3 bucket, or have Amazon Transcribe use a secure default bucket. To learn more about using S3 buckets, see Creating, configuring, and working with Amazon S3 buckets.
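Starting a batch job programmatically is a one-call affair. The following is a minimal sketch using boto3’s `start_transcription_job`; the job name, bucket names, and S3 URI are illustrative placeholders, and automatic language identification is enabled rather than a fixed language code.

```python
def build_transcription_job(job_name, media_uri, output_bucket=None):
    """Assemble StartTranscriptionJob parameters. With IdentifyLanguage=True,
    Amazon Transcribe detects the spoken language automatically; omitting
    OutputBucketName lets Transcribe use its secure default bucket."""
    params = {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "IdentifyLanguage": True,
    }
    if output_bucket:
        params["OutputBucketName"] = output_bucket
    return params

# Sketch of the actual call (not run here; names are placeholders):
# import boto3
# transcribe = boto3.client("transcribe")
# transcribe.start_transcription_job(**build_transcription_job(
#     "my-job", "s3://my-bucket/audio/meeting.wav", "my-output-bucket"))
```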
Transcription output
Amazon Transcribe uses JSON representation for its output. It provides the transcription result in two different formats: text format and itemized format. Nothing changes with respect to the API endpoint or input parameters.
The text format provides the transcript as a block of text, whereas the itemized format provides the transcript as time-ordered transcribed items, along with additional metadata per item. Both formats exist in parallel in the output file.
Depending on the features you select when creating the transcription job, Amazon Transcribe creates additional and enriched views of the transcription result. See the following example code:

{
    "jobName": "2x-speakers_2x-channels",
    "accountId": "************",
    "results": {
        "transcripts": [{
            "transcript": "Hi, welcome."
        }],
        "speaker_labels": [{
            "channel_label": "ch_0",
            "speakers": 2,
            "segments": [...]
        }, {
            "channel_label": "ch_1",
            "speakers": 2,
            "segments": [...]
        }],
        "channel_labels": {
            "channels": [...],
            "number_of_channels": 2
        },
        "items": [...],
        "segments": [...]
    },
    "status": "COMPLETED"
}

The views are as follows:

Transcripts – Represented by the transcripts element, it contains only the text format of the transcript. In multi-speaker, multi-channel scenarios, concatenation of all transcripts is provided as a single block.
Speakers – Represented by the speaker_labels element, it contains the text and itemized formats of the transcript grouped by speaker. It’s available only when the multi-speakers feature is enabled.
Channels – Represented by the channel_labels element, it contains the text and itemized formats of the transcript, grouped by channel. It’s available only when the multi-channels feature is enabled.
Items – Represented by the items element, it contains only the itemized format of the transcript. In multi-speaker, multi-channel scenarios, items are enriched with additional properties, indicating speaker and channel.
Segments – Represented by the segments element, it contains the text and itemized formats of the transcript, grouped by alternative transcription. It’s available only when the alternative results feature is enabled.
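As a sketch, pulling the plain transcript and basic speaker information out of such an output document might look like this (assuming the JSON shape shown in the example above):

```python
import json

def summarize_transcript(output_json):
    """Extract the block-of-text transcript and per-channel speaker counts
    from an Amazon Transcribe output document (shape as in the example)."""
    doc = json.loads(output_json)
    results = doc["results"]
    # The transcripts view concatenates all speech into text blocks.
    text = " ".join(t["transcript"] for t in results["transcripts"])
    # The speaker_labels view is present only when diarization is enabled.
    speakers = {lbl["channel_label"]: lbl["speakers"]
                for lbl in results.get("speaker_labels", [])}
    return {"status": doc["status"], "text": text, "speakers": speakers}
```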

At AWS, we are constantly innovating on behalf of our customers. By extending the language support in Amazon Transcribe to over 100 languages, we enable our customers to serve users from diverse linguistic backgrounds. This not only enhances accessibility, but also opens up new avenues for communication and information exchange on a global scale. To learn more about the features discussed in this post, check out the features page and the What’s New post.

About the authors
Sumit Kumar is a Principal Product Manager, Technical at AWS AI Language Services team. He has 10 years of product management experience across a variety of domains and is passionate about AI/ML. Outside of work, Sumit loves to travel and enjoys playing cricket and Lawn-Tennis.
Vivek Singh is a Senior Manager, Product Management at AWS AI Language Services team. He leads the Amazon Transcribe product team. Prior to joining AWS, he held product management roles across various other Amazon organizations such as consumer payments and retail. Vivek lives in Seattle, WA and enjoys running, and hiking.

Drive hyper-personalized customer experiences with Amazon Personalize …

Today, we are excited to announce three launches that will help you enhance personalized customer experiences using Amazon Personalize and generative AI. Whether you’re looking for a managed solution or build your own, you can use these new capabilities to power your journey.
Amazon Personalize is a fully managed machine learning (ML) service that makes it easy for developers to deliver personalized experiences to their users. It enables you to improve customer engagement by powering personalized product and content recommendations in websites, applications, and targeted marketing campaigns, with no ML expertise required. Using recipes (algorithms prepared for specific use cases) provided by Amazon Personalize, you can offer diverse personalization experiences like “recommend for you”, “frequently bought together”, guidance on next best actions, and targeted marketing campaigns with user segmentation.
Generative AI is quickly transforming how enterprises do business. Gartner predicts that “by 2026, more than 80% of enterprises will have used generative AI APIs or models, or deployed generative AI-enabled applications in production environments, up from less than 5% in 2023.” While generative AI can quickly create content, it alone is not enough to provide the higher degree of personalization needed to adapt to the ever-changing and nuanced preferences of individual users. Many companies are actively seeking solutions to enhance user experience using Amazon Personalize and generative AI.
FOX Corporation (FOX) produces and distributes news, sports, and entertainment content.

“We are integrating generative AI with Amazon Personalize in order to deliver hyper-personalized experiences to our users. Amazon Personalize has helped us achieve high levels of automation in content customization. For instance, FOX Sports experienced a 400% increase in viewership content starts post-event when applied. Now, we are augmenting generative AI with Amazon Bedrock to our pipeline in order to help our content editors generate themed collections. We look forward to exploring features such as Amazon Personalize Content Generator and Personalize on LangChain to further personalize those collections for our users.”
– Daryl Bowden, Executive Vice President of Technology Platforms.

Announcing Amazon Personalize Content Generator to make recommendations more compelling
Amazon Personalize has launched Content Generator, a new generative AI-powered capability that helps companies make recommendations more compelling by identifying thematic connections between the recommended items. This capability can elevate the recommendation experience beyond standard phrases like “People who bought this also bought…” to more engaging taglines such as “Rise and Shine” for a breakfast food collection, enticing users to click and purchase.
To explore the impact of Amazon Personalize Content Generator in detail, let’s look at two examples.
Use case 1: Carousel titles for movie collections
A micro-genre is a specialized subcategory within a broader genre of film, music, or other forms of media. Streaming platforms use micro-genres to enhance user experience by allowing viewers or listeners to discover content that aligns with their specific tastes and interests. By recommending media content with micro-genres, streaming platforms cater to diverse preferences, ultimately increasing user engagement and satisfaction.
Now you can use Amazon Personalize Content Generator to create carousel titles for micro-genre collections. First, import your datasets of users’ interactions and items into Amazon Personalize for training. You upload a list of itemId values as your seed items. Next, create a batch inference job selecting Themed recommendations with Content Generator on the Amazon Personalize console or setting batch-inference-job-mode to THEME_GENERATION in the API configuration.

As the batch inference output, you will get a set of similar items and a theme for each seed item. We also provide item-theme relevance scores that you can use to set a threshold and show only items that are strongly related to the theme. The following is an example of the output:

"theme": "Movies with a strong female lead",

"theme": "Romantic movies for a cozy night in",

Subsequently, you can replace the generic phrase “More like X” with the output theme from Amazon Personalize Content Generator to make the recommendations more compelling.
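Applying the relevance-score threshold mentioned above is a simple filter. This is an illustrative sketch only: the `itemId` and `score` field names stand in for whatever keys the batch inference output actually uses.

```python
def filter_by_theme_relevance(items, threshold=0.7):
    """Keep only recommended items strongly related to the generated theme.
    Each item is assumed to carry an 'itemId' and a relevance 'score' in
    [0, 1]; field names are illustrative, not the exact API keys."""
    return [item["itemId"] for item in items if item["score"] >= threshold]

recommended = [
    {"itemId": "movie-1", "score": 0.91},
    {"itemId": "movie-2", "score": 0.42},
    {"itemId": "movie-3", "score": 0.77},
]
# Only items meeting the threshold are shown under the generated theme.
shown = filter_by_theme_relevance(recommended)
```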

Use case 2: Subject lines for marketing emails
Email marketing, although cost-effective, often struggles with low open rates and high unsubscribe rates. The decision to open an email critically depends on how attractive the subject line is, because it’s the first thing recipients see along with the sender’s name. However, scripting appealing subject lines can often be tedious and time-consuming.
Now with Amazon Personalize Content Generator, you can create compelling subject lines or headlines in the email body more efficiently, further personalizing your email campaigns. You follow the same process of data ingestion, training, and creating a batch inference job as in the previous use case. The following is an example of a marketing email that incorporates output from Amazon Personalize using Content Generator, including a set of recommended items and a generated subject line:

Subject: Cleaning Products That Will Make Your Life Sparkle!

Dear <user name>,

Are you ready to transform your cleaning routine into an effortless and enjoyable experience? Explore our top-tier selections:

Robot Vacuum Cleaners <picture>
Window Cleaning Kits <picture>
Scrub Brushes with Ergonomic Handles <picture>
Microfiber Cloths <picture>
Eco-Friendly Cleaning Sprays <picture>

These examples showcase how Amazon Personalize Content Generator can assist you in creating a more engaging browsing experience or a more effective marketing campaign. For more detailed instructions, refer to Themed batch recommendations.
Announcing LangChain integration to seamlessly integrate Amazon Personalize with the LangChain framework
LangChain is a powerful open-source framework that allows for integration with large language models (LLMs). LLMs are typically versatile but may struggle with domain-specific tasks where deeper context and nuanced responses are needed. LangChain empowers developers in such scenarios to build modules (agents/chains) for their specific generative AI tasks. They can also introduce context and memory into LLMs by connecting and chaining LLM prompts to address varying use cases.
We are excited to launch LangChain integration. With this new capability, builders can use the Amazon Personalize custom chain on LangChain to seamlessly integrate Amazon Personalize with generative AI solutions. Adding a personalized touch to generative AI solutions helps you create more tailored and relevant interactions with end-users. The following code snippet demonstrates how you can invoke Amazon Personalize, retrieve recommendations for a campaign or recommender, and seamlessly feed it into your generative AI applications within the LangChain ecosystem. You can also use this for sequential chains.

from langchain.utilities import AmazonPersonalize
from langchain.chains import AmazonPersonalizeChain
from langchain.llms.bedrock import Bedrock

client = AmazonPersonalize(
    recommender_arn=recommender_arn,
    credentials_profile_name="default",
    region_name="us-west-2",
)

bedrock_llm = Bedrock(model_id="anthropic.claude-v2", region_name="us-west-2")

# Create the Amazon Personalize chain
chain = AmazonPersonalizeChain.from_llm(llm=bedrock_llm, client=client)
response = chain({"user_id": "1"})

You can use this capability to craft personalized marketing copies, generate concise summaries for recommended content, recommend products or content in chatbots, and build next-generation customer experiences with your creativity.
Amazon Personalize now enables you to return metadata in inference response to improve generative AI workflow
Amazon Personalize now improves your generative AI workflow by enabling you to return item metadata as part of the inference output. Getting recommendations along with metadata makes it more convenient to provide additional context to LLMs. This additional context, such as genre and product description, can help the models gain a deeper understanding of item attributes to generate more relevant content.
Amazon Personalize supports this capability for both custom recipes and domain optimized recommenders. When creating a campaign or a recommender, you can enable the option to return metadata with recommendation results, or adjust the setting by updating the campaign or recommender. You can select up to 10 metadata fields and 50 recommendation results to return metadata during an inference call, either through the Amazon Personalize API or the Amazon Personalize console.
The following is an example in the API:

## Create a campaign with metadata enabled
example_name = 'metadata_response_enabled_campaign'
create_campaign_response = personalize.create_campaign(
    name = example_name,
    solutionVersionArn = example_solution_version_arn,
    minProvisionedTPS = 1,
    campaignConfig = {"enableMetadataWithRecommendations": True}
)

## GetRecommendations with metadata columns
metadataMap = {"ITEMS": ["genres", "num"]}
response = personalize_runtime.get_recommendations(
    campaignArn=example_campaign_arn,
    userId="0001", itemId="0002",
    metadataColumns=metadataMap, numResults=2
)

## Example response with metadata
'itemList': [
    {'itemId': '356',
     'metadata': {'genres': 'Comedy', 'num': '0.6103248'}},
    {'itemId': '260',
     'metadata': {'genres': 'Action|Adventure', 'num': '0.074548'}}
]
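Once metadata comes back with the recommendations, a small helper can turn it into prompt context for an LLM. This is a sketch, not part of the Amazon Personalize API; it assumes the response item list has the shape shown in the example above.

```python
def build_prompt_context(item_list):
    """Format recommended items plus their metadata as context lines for an LLM prompt."""
    lines = []
    for item in item_list:
        meta = item.get("metadata", {})
        # Join metadata fields like "genres: Comedy, num: 0.6103248".
        detail = ", ".join(f"{key}: {value}" for key, value in meta.items())
        lines.append(f"- item {item['itemId']} ({detail})")
    return "\n".join(lines)

# Item list mimicking the GetRecommendations response above.
item_list = [
    {"itemId": "356", "metadata": {"genres": "Comedy", "num": "0.6103248"}},
    {"itemId": "260", "metadata": {"genres": "Action|Adventure", "num": "0.074548"}},
]
print(build_prompt_context(item_list))
```

The resulting text can be appended to an LLM prompt so the model sees item attributes such as genre when generating copy about the recommendations.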

At AWS, we are constantly innovating on behalf of our customers. With these new launches powered by Amazon Personalize and Amazon Bedrock, we aim to enrich every aspect of the builder and user experience, elevating efficiency and end-user satisfaction. To learn more about the capabilities discussed in this post, check out Amazon Personalize features and the Amazon Personalize Developer Guide.

About the Authors
Jingwen Hu is a Senior Technical Product Manager working with AWS AI/ML on the Amazon Personalize team. In her spare time, she enjoys traveling and exploring local food.
Pranav Agarwal is a Senior Software Engineer with AWS AI/ML and works on architecting software systems and building AI-powered recommender systems at scale. Outside of work, he enjoys reading, running, and ice-skating.
Rishabh Agrawal is a Senior Software Engineer working on AI services at AWS. In his spare time, he enjoys hiking, traveling, and reading.
Ashish Lal is a Senior Product Marketing Manager who leads product marketing for AI services at AWS. He has 9 years of marketing experience and has led the product marketing effort for intelligent document processing. He got his master’s in Business Administration at the University of Washington.