Klaviyo Tools: 20 Must-Have Solutions for Smarter Email Marketing

Email marketing isn’t as easy as it used to be. With inboxes crammed full of promotions, newsletters, and offers, getting your emails noticed (and actually read) has become a real challenge. 

The days of sending a basic email blast and hoping for the best are long gone. Now, it takes strategy, precision, and the right tools to stand out in the crowd.

That’s where Klaviyo comes in. 

Trusted by over 100,000 businesses, Klaviyo helps you connect with your audience on a deeper level, with campaigns that are timely, personalized, and effective. From advanced segmentation to automation, Klaviyo and the tools built around it equip you to tackle the challenges of modern email marketing with ease.

That’s what we are going to help with today. In this post, we’ll explore the top Klaviyo tools that make hitting your email marketing goals not just possible but practical, even if you’re working with a small team or limited time.

So whether you’re looking to improve engagement, increase conversions, or just stop your emails from disappearing into the spam folder, there’s a Klaviyo tool for that. Let’s jump in. 

Why Klaviyo?

How Klaviyo Benefits Different Types of Businesses

Criteria for Selecting the Best Klaviyo Tools

20 Klaviyo Tools to Add to Your Tech Stack Right Now

How to Integrate Klaviyo Tools into Your Marketing Strategy

Your Klaviyo Tools Questions Answered


Why Klaviyo? Key Features for Smarter Email Marketing

Klaviyo has become a go-to email marketing tool for ecommerce for a reason. Well, actually – a few very good reasons. 

The main one being that Klaviyo isn’t just your standard ESP. It offers a full suite of tools and integrations that are laser-focused on helping you connect with your audience in smarter, more meaningful ways.

Let’s break down some of the standout features that make Klaviyo a favorite for businesses of all shapes and sizes:

Advanced Segmentation for Targeted Campaigns

Say goodbye to generic, one-size-fits-all email blasts. Klaviyo’s segmentation features let you divide your audience based on behavior, purchase history, engagement, and more. 

Want to send a special discount to customers who haven’t bought anything in a while? Done. Or maybe you want to target your most loyal fans with exclusive offers? 

Klaviyo’s segmentation makes it easy.

Personalization That Actually Feels Personal

Forget those cringey “Dear [First Name]” emails. 

Klaviyo takes personalization up a notch with dynamic content that adapts based on the data you have about each subscriber. 

From product recommendations to tailored messages based on past interactions, Klaviyo makes your emails feel like they were written just for each individual customer.

Powerful Automation to Save Time and Drive Results

Running an effective email campaign can be a time-suck, especially if you’re manually handling everything. 

Klaviyo’s automation flows let you set up a series of emails that trigger based on specific customer actions—like welcoming a new subscriber, nudging an abandoned cart, or re-engaging customers who’ve gone a bit quiet. 

Once you set them up, these flows work in the background, helping you nurture leads and boost conversions without lifting a finger.

Data-Driven Insights for Smarter Decisions

Klaviyo goes beyond just showing you who opened your emails. With in-depth analytics and reporting, you can track everything from engagement rates to conversion metrics. 

Plus, Klaviyo’s predictive analytics can forecast things like a customer’s lifetime value or when they’re most likely to buy again. In other words, it’s not just numbers. It’s real insights that help you make better marketing decisions. We can appreciate that, right?


How Klaviyo Benefits Different Types of Businesses

Klaviyo isn’t a one-size-fits-all tool. It’s designed to meet the unique needs of different businesses. 

Whether you’re running an ecommerce store, a SaaS company, or a small business, Klaviyo’s features adapt to help you connect with your audience, drive engagement, and boost results. 

Here’s how Klaviyo works its magic across various industries.

Ecommerce: Klaviyo is practically tailor-made for ecommerce. Integrating with platforms like Shopify, WooCommerce, and BigCommerce, Klaviyo pulls in customer data from your store and helps you create campaigns based on purchase history, abandoned carts, and browsing behavior. The result? Highly personalized emails that turn casual browsers into loyal buyers.

SaaS Companies: For SaaS businesses, Klaviyo’s segmentation and automation capabilities make it easy to stay in touch with your users throughout their lifecycle. Send targeted onboarding sequences, encourage feature adoption, and re-engage customers who might be at risk of churning.

Small Businesses: Don’t let the robust feature set intimidate you. Klaviyo is designed to be intuitive. Small businesses benefit from Klaviyo’s plug-and-play templates, straightforward segmentation, and automation workflows, letting them compete with bigger brands on customer experience without a hefty time commitment.

No matter your industry or business size, Klaviyo gives you the tools to run smarter, more engaging campaigns that get results. 

Criteria for Selecting the Best Klaviyo Tools

With so many tools and features at your disposal, narrowing down the best ones for your business can be tricky. 

Here are a few key factors to keep in mind when choosing which Klaviyo tools to invest your time in:

1. Usability and Interface

Imagine having a powerful tool at your fingertips, but it’s so complicated to use that it takes hours just to set up a single campaign. Ugh, no thanks. 

A user-friendly interface means your team can dive right in without needing extensive training. For instance, a clean, intuitive interface lets you quickly adjust segments or create automated flows without wasting time or risking mistakes.

2. Integration Capabilities

Let’s say you’re running an ecommerce store on Shopify. If your email tool integrates seamlessly, it can automatically sync customer data, purchase histories, and product details, allowing you to send personalized product recommendations. 

Without this integration, you’d have to export and import data manually, which not only takes time but increases the risk of errors.

3. Pricing and Scalability

A tool may look affordable at first, but as your subscriber list grows, costs can skyrocket if it isn’t designed to scale with you. You don’t want to find yourself in a position where your tool becomes prohibitively expensive as your audience grows. 

Choosing a tool with scalable pricing means you can expand your marketing efforts without budget surprises down the line.

4. Customer Support and Community

Even the best tools come with a learning curve, and issues will inevitably arise. Imagine launching a campaign at a crucial time, only to hit a technical glitch. 

A reliable support team or an active community forum means you can get quick help when you need it, reducing downtime and stress.

5. Customization and Flexibility

Your marketing needs are unique, so rigid tools can feel like a straitjacket. For example, if you’re unable to customize your automation flows to fit specific customer journeys, you’re limited in how well you can nurture leads. 

Flexible tools let you tailor campaigns to match your business’s exact needs, keeping your marketing aligned with your goals as they evolve.

These examples show why each criterion matters when picking Klaviyo tools, ensuring you’re set up for efficiency, growth, and adaptability in your email marketing strategy.

20 Klaviyo Tools to Add to Your Tech Stack Right Now

When it comes to making the most out of Klaviyo, having the right tools in your corner can make all the difference. 

Klaviyo’s built-in features are powerful on their own, but combining them with a few third-party integrations can take your email marketing to the next level. 

Below, we’ve rounded up the essential Klaviyo tools and integrations that every ecommerce marketer should have on their radar.

Signal: Return Visitor Identification

X-Ray Email Capture

Justuno

Smile.io

Aftership

Email Template Builder

Shopify for Klaviyo

Email Validator

Okendo

A/B Testing Tool

Zendesk

Custom Audience Builder

Automation Flows

Postscript

Google Ads

Canva

Predictive Analytics

Yotpo

Contact Data Enrichment

Recharge

1. Signal: Klaviyo Return Visitor Identification

Overview

Signal by Customers.ai is a tool designed to enhance Klaviyo’s tracking capabilities by identifying return visitors that Klaviyo may miss due to its 7-day cookie expiration. By integrating Signal with Klaviyo, you can capture and engage with high-intent visitors who are in-market and ready to buy. When someone returns to your site after the 7-day window, they’re showing renewed interest, and Signal ensures you can identify and reach out to them at this crucial moment. This allows you to engage with warm leads at the perfect time, ultimately boosting conversions and driving sales.

Why You Need This Klaviyo Tool

Klaviyo’s standard tracking relies on cookies that expire after seven days, which can result in missed opportunities to re-engage visitors who return after this period. Signal addresses this limitation by extending the tracking window, allowing you to recognize and connect with returning visitors beyond the initial seven days. This means more accurate data, better segmentation, and the ability to tailor your marketing efforts to a wider audience, ultimately leading to increased conversions and customer retention.

2. X-Ray Email Capture & Visitor Identification 

Overview 

The X-Ray Pixel from Customers.ai is a powerful tool that enhances your Klaviyo setup by identifying anonymous website visitors and capturing their email addresses. Once integrated, it enables you to automatically funnel these new contacts into your existing Klaviyo flows, such as welcome sequences and abandoned cart reminders, without requiring any additional input from the visitor. 

Why You Need This Klaviyo Tool 

Traditional email capture methods often rely on visitors voluntarily providing their information, which can result in missed opportunities. The X-Ray Pixel addresses this gap by proactively identifying and capturing the contact details of high-intent visitors who may not have completed a form. By seamlessly integrating these contacts into your Klaviyo flows, you can engage potential customers at critical points in their buying journey, leading to increased conversions and a more personalized customer experience. 

3. Justuno Lead Capture Forms 

Overview

Justuno is a conversion optimization platform that enhances your Klaviyo experience by providing advanced lead capture forms and pop-ups. 

By integrating Justuno with Klaviyo, you can effectively grow your email list and segment subscribers based on their interactions. This seamless connection ensures that the data collected through Justuno’s engaging forms is automatically synced with your Klaviyo account, streamlining your marketing efforts. 

Why You Need This Klaviyo Tool

Capturing leads is a critical component of any ecommerce strategy. Justuno’s customizable pop-ups and forms are designed to engage visitors and encourage them to subscribe, providing you with valuable contact information. When integrated with Klaviyo, these leads are instantly added to your email lists, allowing for immediate segmentation and targeted marketing campaigns. This integration not only simplifies the lead capture process but also enhances your ability to deliver personalized content, ultimately leading to higher conversion rates and improved customer retention.

4. Smile.io Loyalty & Rewards Program 

Overview

Smile.io is a loyalty and rewards platform that makes it easy to keep your best customers coming back. When integrated with Klaviyo, Smile.io syncs loyalty data, like points balance, referral links, and VIP status, so you can create targeted email campaigns that celebrate and reward your most engaged customers.

Why You Need This Klaviyo Tool

Customer retention is key, and this integration makes it simple. With Smile.io and Klaviyo working together, you can automatically send personalized emails that remind customers of their points, highlight upcoming rewards, or celebrate their VIP achievements. By recognizing and rewarding your loyal customers, you’ll encourage repeat purchases, boost customer lifetime value, and strengthen brand loyalty—no extra work required.

5. AfterShip Shipping & Tracking Platform

Overview

AfterShip is a shipping and tracking platform that, when integrated with Klaviyo, keeps your customers in the loop with real-time shipping updates and delivery notifications. This tool automatically syncs order tracking information, so your customers know exactly where their package is and when it will arrive.

Why You Need This Klaviyo Tool

Keeping customers informed post-purchase is crucial for building trust and improving the overall experience. With AfterShip and Klaviyo, you can send timely notifications straight to your customers’ inboxes, from shipping confirmations to delivery alerts. This transparency reduces “Where’s my order?” inquiries and gives your customers peace of mind, leading to a smoother shopping experience and better post-purchase satisfaction.

6. Klaviyo Email Template Builder 

Overview

Klaviyo’s Email Template Builder is a user-friendly, drag-and-drop tool that lets you design professional, on-brand emails without any coding skills. With customizable blocks and pre-designed layouts, you can quickly create visually appealing, responsive emails that look great on any device.

Why You Need This Klaviyo Tool

A polished email is crucial for capturing attention in your subscribers’ inboxes, and Klaviyo’s builder makes it easy to create standout designs. Whether you’re building a promotional blast or a nurturing welcome email, Klaviyo’s Template Builder helps you create beautiful emails that engage your audience. The best part? No designer needed!

7. Shopify Klaviyo Integration

Overview

Klaviyo’s Shopify Integration allows you to connect your Shopify store with Klaviyo, seamlessly syncing customer data, purchase history, and product details. This powerful integration ensures that every interaction your customers have with your store is captured, setting you up to create targeted, personalized campaigns based on real-time customer behavior.

Why You Need This Klaviyo Tool

With Klaviyo and Shopify working together, you can take your email marketing to a whole new level. Imagine being able to automatically follow up with customers who abandoned their carts, send personalized product recommendations, or reward repeat shoppers, all based on their actual shopping behavior. This integration lets you tap into your Shopify data to create emails that feel personal and relevant.

8. Email Validator Tool for Klaviyo

Overview

The Customers.ai email validation tool helps keep your email list clean by checking every address that goes into Klaviyo for accuracy and validity. With this tool, you can ensure that every email in your list is active and correctly formatted, which means fewer bounces and better deliverability for your campaigns.

Why You Need This Klaviyo Tool

Sending emails to outdated or incorrect addresses isn’t just a waste of time, it can also hurt your sender reputation and impact deliverability. By using the Customers.ai email validator with Klaviyo, you can automatically filter out invalid emails before they cause issues. This way, your messages reach real people, your engagement rates improve, and your domain reputation stays strong.

9. Okendo Customer Review Platform

Overview

Okendo is a customer review platform that integrates with Klaviyo to help you gather and showcase genuine feedback from your customers. With Okendo, you can collect reviews, ratings, and even photos or videos from customers, bringing authentic, user-generated content right into your marketing mix.

Why You Need This Klaviyo Tool

Nothing builds trust like hearing from real customers, and Okendo makes it easy to put that feedback front and center. By connecting Okendo with Klaviyo, you can send personalized review requests to recent buyers, encouraging them to share their experiences. Plus, you can use this content in your emails, showing potential customers that your products are the real deal.

10. Klaviyo A/B Testing Tool

Overview

Klaviyo’s A/B Testing feature lets you test different variations of your email campaigns to see which one performs best. You can experiment with subject lines, email content, call-to-action buttons, and more, giving you the data to make informed decisions that improve your email marketing results.

Why You Need This Klaviyo Tool

Guesswork doesn’t cut it in email marketing. With A/B Testing, you can see exactly what resonates with your audience and what doesn’t. By testing different elements, you can refine your campaigns to boost open rates, clicks, and conversions. This tool is essential for marketers looking to continuously improve and maximize the impact of every email they send.

11. Zendesk Support Platform

Overview

The Zendesk and Klaviyo integration brings your customer support and marketing efforts together by syncing support tickets and customer interactions directly into Klaviyo. This integration allows you to view support history alongside your marketing data, making it easier to understand each customer’s journey.

Why You Need This Klaviyo Tool

Customer experience doesn’t stop at the purchase. It includes every interaction they have with your brand. With Zendesk data in Klaviyo, you can segment customers based on their support history and tailor your messaging accordingly. For example, you might send an extra-thoughtful follow-up to recent support ticket submitters or offer re-engagement emails to customers who had unresolved issues. This integration ensures that your marketing stays relevant and customer-focused, helping to build trust and loyalty over time.

12. Custom Audience Builder for Email

Overview

The custom audience builder from Customers.ai takes segmentation to the next level, allowing you to create highly specific audience groups based on behaviors, interests, and demographics. This tool integrates seamlessly with Klaviyo, giving you a powerful way to refine your targeting and deliver messages that resonate with each unique audience segment.

Why You Need This Klaviyo Tool

Effective marketing is all about reaching the right people with the right message. The custom audience builder lets you go beyond basic segments, enabling you to target hyper-specific audiences, like high-value shoppers or return visitors. With this level of detail, you can craft personalized campaigns that speak directly to each group’s needs.

13. Klaviyo Automation Flows

Overview

Klaviyo’s Automation Flows allow you to set up automated email sequences triggered by specific customer actions, like signing up, abandoning a cart, or completing a purchase. These flows ensure that every customer receives timely, relevant emails that guide them through their journey with your brand.

Why You Need This Klaviyo Tool

Consistency is key in building customer relationships, and Klaviyo’s Automation Flows make it easy to stay connected without the manual effort. By setting up flows for key touchpoints, such as a welcome series for new subscribers or reminders for abandoned carts, you can engage customers at crucial moments. This not only saves you time but also keeps customers engaged, nurtures leads, and increases the chances of conversion.

14. Postscript SMS Marketing 

Overview

Postscript lets you bring SMS into the mix alongside your Klaviyo emails, so you’re not just hoping customers open an email—they’re also getting your message right on their phone. It’s all about reaching people where they actually are and adding that personal touch.

Why You Need This Klaviyo Tool

Let’s face it. Inboxes get crowded. A quick text can be the nudge your customers need, whether it’s a reminder about an abandoned cart or a flash sale heads-up. With Postscript, you can keep your email and SMS campaigns in sync, making sure customers get the message loud and clear, wherever they are. It’s like covering all the bases without feeling spammy.

15. Google Ads Klaviyo Integration

Overview

Linking your Google Ads account with Klaviyo means you can turn your email lists and customer segments into targeted ad audiences on Google. With this integration, you can reach your subscribers across search and display ads, focusing on retargeting existing customers or finding new ones who match your best buyers.

Why You Need This Klaviyo Tool

Your email list is packed with valuable customer data, so why not put it to work outside the inbox? By syncing Klaviyo with Google Ads, you can show relevant ads to people who already know your brand or reach new potential customers who are similar to your top shoppers. It’s an easy way to keep your brand front and center while making the most of your hard-earned customer insights.

16. Canva for Klaviyo 

Overview

Canva’s design tools make it easy to create on-brand visuals, and when you connect Canva with Klaviyo, you can bring those polished designs straight into your emails. This integration lets you design eye-catching templates and graphics without leaving your Klaviyo workflow, so your emails look as good as they perform.

Why You Need This Klaviyo Tool

A great-looking email can be the difference between a quick delete and an engaged reader. With Canva integrated into Klaviyo, you can keep all your marketing materials consistent and professional without needing a designer. From promotional graphics to custom email templates, it’s a quick way to make your emails look sharp and on-brand every time.

17. Klaviyo Predictive Analytics

Overview

Klaviyo’s Predictive Analytics feature uses data to take the guesswork out of your marketing. It looks at your customers’ past actions to predict things like when they’ll buy next or how much they’re likely to spend over time. It’s kind of like a crystal ball for your customer insights.

Why You Need This Klaviyo Tool

Knowing who’s likely to buy again (and when) lets you time your campaigns perfectly and send the right offers to the right people. With Predictive Analytics, you can focus on high-value customers, target people right before they’re ready to shop, and even prevent potential churn. It’s a smart way to use your data to keep customers engaged and loyal.

18. Yotpo Customer Review Platform

Overview

Yotpo is a go-to platform for collecting customer reviews, ratings, and user-generated content, and it integrates smoothly with Klaviyo. By bringing in real customer feedback and photos, you can add authentic social proof to your emails, boosting trust and encouraging more conversions.

Why You Need This Klaviyo Tool

People trust real reviews more than any sales pitch, and Yotpo makes it easy to show off that credibility. With Yotpo and Klaviyo working together, you can automatically ask for reviews after purchases, send thank-you messages, and feature customer photos and ratings in your emails. It’s an easy, effective way to build trust, showcase happy customers, and make your emails more compelling.

19. Contact Data Enrichment for Klaviyo

Overview

Customers.ai’s Contact Data Enrichment tool boosts your Klaviyo contact profiles by filling in the gaps. It enriches the data of your website visitors, adding valuable details like job titles, company information, and social media profiles. With this extra layer of information, you get a fuller picture of who’s interacting with your brand.

Why You Need This Klaviyo Tool

Knowing more about your contacts means you can create messages that hit closer to home. With enriched data, you can segment your audience more effectively, tailor your campaigns, and deliver content that truly resonates. It’s all about sending the right message to the right people.

20. Recharge Subscription Billing Platform 

Overview

Recharge is a subscription billing platform built for businesses with recurring products. When you integrate Recharge with Klaviyo, you can create targeted email flows specifically for subscription customers—think renewal reminders, upsell offers, and special rewards for loyal subscribers.

Why You Need This Klaviyo Tool

Managing subscriptions isn’t just about getting customers to sign up; it’s about keeping them engaged for the long haul. With Recharge and Klaviyo working together, you can stay connected with your subscribers at every stage, reminding them of upcoming renewals, offering incentives, and reducing churn. It’s a smart way to build loyalty and keep those repeat customers coming back.


How to Integrate Klaviyo Tools into Your Marketing Strategy

Integrating Klaviyo tools isn’t just about flipping a switch. It’s about aligning them with your marketing goals to get the most out of each feature. Here’s a quick guide to making Klaviyo work for you:

Assessing Your Current Strategy

Before diving in, take a step back and assess what’s working (and what’s not) in your current email strategy. 

Are you missing key touchpoints? Are your messages personalized enough? 

Figuring out where Klaviyo can make the biggest impact will help you prioritize which tools to implement first, whether it’s advanced segmentation, automated flows, or data enrichment.

Setting Up Klaviyo Tools Step-by-Step

Once you know where you want to start, set up each Klaviyo tool one step at a time. 

Begin with the basics, like connecting your ecommerce platform or setting up core automation flows (like welcome series and abandoned cart emails). Then, dive deeper with integrations, A/B testing, or predictive analytics. 

Taking it step-by-step keeps things manageable and ensures each tool is set up correctly before moving on.

Training Your Team

Getting the whole team on board is key to making Klaviyo tools part of your everyday marketing strategy. Offer training sessions, set up shared documentation, or even create quick how-to guides for the most-used tools. 

When everyone knows how to use Klaviyo effectively, you’ll be able to make the most of its features and keep your campaigns running smoothly.

Integrating Klaviyo tools into your strategy is all about finding where they fit best, implementing with care, and empowering your team to use them effectively.

The Bottom Line on Klaviyo Tools

Klaviyo offers a powerful suite of tools and integrations that can transform your email marketing strategy from good to great. Whether you’re looking to personalize campaigns, streamline your automation, or capture high-intent visitors, Klaviyo’s features and compatible third-party tools give you everything you need to connect with your audience in meaningful ways.

By thoughtfully integrating tools like advanced segmentation, predictive analytics, SMS, and data enrichment, you can engage your customers at every stage of their journey, all while keeping your brand’s voice and message consistent. From small tweaks to major campaign upgrades, Klaviyo’s tools are designed to help you make smarter, data-driven decisions that boost engagement and build loyalty.

Ready to see what Customers.ai can do for your Klaviyo setup? Get a free Klaviyo signal audit and start driving more sales today!


Important Next Steps

See what targeted outbound marketing is all about. Capture and engage your first 500 website visitor leads with Customers.ai X-Ray website visitor identification for free.

Talk and learn about sales outreach automation with other growth enthusiasts. Join Customers.ai Island, our Facebook group of 40K marketers and entrepreneurs who are ready to support you.

Advance your marketing performance with Sales Outreach School, a free tutorial and training area for sales pros and marketers.

Klaviyo Tool FAQs

What are the essential Klaviyo tools for ecommerce marketing?

The essential Klaviyo tools for ecommerce include segmentation, automation flows, email template builder, predictive analytics, and integrations with platforms like Shopify. These tools help create personalized campaigns that resonate with customers and drive conversions.

How do Klaviyo tools help improve customer engagement?

Klaviyo’s tools, such as segmentation and automation, allow you to reach customers with relevant messages at the right time, boosting engagement. By sending personalized emails based on behavior and interests, customers are more likely to engage with your brand.

Which Klaviyo tools are best for personalized marketing?

Advanced segmentation and predictive analytics are Klaviyo’s go-to tools for personalized marketing. These tools help you create tailored campaigns for different customer segments based on actions, purchase history, and predicted future behavior.

What Klaviyo tools should I use to increase email open rates?

To increase open rates, leverage Klaviyo’s A/B testing and segmentation tools. A/B testing allows you to experiment with subject lines, while segmentation ensures emails are relevant, making customers more likely to open them.

How can Klaviyo tools boost my ecommerce sales?

Klaviyo’s automation flows, such as abandoned cart reminders and post-purchase emails, drive conversions by targeting customers at crucial points in their journey. Additionally, personalized product recommendations can encourage repeat purchases.

Are there free Klaviyo tools available for small businesses?

Yes, Klaviyo offers a free plan that includes basic segmentation, email campaigns, and essential analytics, making it accessible for small businesses looking to improve their email marketing without upfront costs.

What are the top Klaviyo tools for email automation?

Klaviyo’s automation flows, including welcome series, abandoned cart, and post-purchase sequences, are top tools for automating key touchpoints. These tools help maintain consistent communication and nurture leads automatically.

How can Klaviyo tools enhance my segmentation strategy?

Klaviyo’s advanced segmentation lets you group customers based on behaviors, interests, and demographics, enabling more targeted campaigns. With precise segments, you can send highly relevant messages to each audience, improving engagement.

What Klaviyo tools are best for customer retention?

Predictive analytics and loyalty program integrations are essential Klaviyo tools for retention. They allow you to identify high-value customers, re-engage at-risk customers, and send rewards or incentives to keep customers loyal.

Can Klaviyo tools help me manage customer data effectively?

Yes, Klaviyo’s integrations with platforms like Shopify and CRM systems make it easy to sync and manage customer data. The platform’s segmentation tools also allow you to organize and target customers based on various data points.

Which Klaviyo tools are ideal for new ecommerce stores?

For new stores, Klaviyo’s basic segmentation, welcome series automation, and integration with ecommerce platforms like Shopify are ideal starting points. These tools help build initial customer relationships and gather data for future campaigns.

How do I choose the right Klaviyo tools for my business needs?

Consider your marketing goals. If retention is a priority, tools like automation flows and predictive analytics are key. For lead generation, focus on segmentation and list growth tools to build your audience effectively.

What advanced Klaviyo tools should I know about?

Advanced Klaviyo tools include predictive analytics for forecasting customer behavior, multi-channel marketing capabilities with SMS, and integrations for personalized customer journeys across various platforms.

What Klaviyo tools work best with Shopify?

Klaviyo’s Shopify integration works well with abandoned cart flows, purchase-based segmentation, and product recommendations. Together, they help you create campaigns that directly address Shopify customer behavior.

How do Klaviyo tools support multi-channel marketing?

Klaviyo’s tools enable email and SMS campaigns, allowing you to reach customers on different channels. You can create cohesive multi-channel campaigns that connect with customers wherever they prefer to engage.

Are there Klaviyo tools that improve email deliverability?

Klaviyo offers tools like email validation integrations to keep your list clean and deliverable. Additionally, its segmentation capabilities ensure emails are sent to engaged users, which can help improve deliverability rates.

What Klaviyo tools can help with abandoned cart recovery?

Klaviyo’s abandoned cart flow is designed specifically for cart recovery. You can set up automated emails to remind customers of items left in their cart, encouraging them to complete their purchase.

How do Klaviyo tools integrate with other marketing platforms?

Klaviyo integrates with platforms like Shopify, Facebook Ads, Google Ads, and loyalty programs like Smile.io. These integrations allow for seamless data sharing, enabling unified, data-driven marketing campaigns.

Can Klaviyo tools help me create effective email flows?

Yes, Klaviyo’s automation flows let you create customized email sequences based on customer actions. Flows like welcome series, order follow-ups, and re-engagement emails keep customers engaged and drive conversions.

How do Klaviyo tools enhance customer segmentation?

Klaviyo’s advanced segmentation enables you to group customers by specific criteria, such as behavior, interests, and purchase history. This helps deliver more relevant messages, improving engagement and conversion rates.

Which Klaviyo tools help with collecting customer feedback?

Klaviyo’s integrations with review platforms like Yotpo make it easy to gather and display customer feedback. You can also use email flows to request reviews post-purchase, helping build social proof and credibility.

What Klaviyo tools can track customer behavior effectively?

Klaviyo’s customer insights and analytics tools track behavior like browsing activity, purchase history, and engagement with emails. This data informs more effective segmentation and personalized campaign strategies.

Are there Klaviyo tools specifically for loyalty programs?

Yes, Klaviyo integrates with loyalty platforms like Smile.io to incorporate loyalty data into campaigns. This allows you to reward loyal customers and segment based on loyalty program engagement.

How do Klaviyo tools assist in predictive analytics?

Klaviyo’s predictive analytics tools use AI to forecast customer behavior, such as likely next purchase and lifetime value. These insights enable more strategic, data-driven marketing efforts focused on high-value customers.

What are the latest Klaviyo tools for ecommerce marketers?

Klaviyo continuously updates its offerings. Some recent tools for ecommerce marketers include enhanced SMS capabilities, multi-channel analytics, and more refined predictive analytics for better audience targeting.

Transcribe, translate, and summarize live streams in your browser with …

Live streaming has been gaining immense popularity in recent years, attracting an ever-growing number of viewers and content creators across various platforms. From gaming and entertainment to education and corporate events, live streams have become a powerful medium for real-time engagement and content consumption. However, as the reach of live streams expands globally, language barriers and accessibility challenges have emerged, limiting the ability of viewers to fully comprehend and participate in these immersive experiences.
Recognizing this need, we have developed a Chrome extension that harnesses the power of AWS AI and generative AI services, including Amazon Bedrock, an AWS managed service to build and scale generative AI applications with foundation models (FMs). This extension aims to revolutionize the live streaming experience by providing real-time transcription, translation, and summarization capabilities directly within your browser.
With this extension, viewers can seamlessly transcribe live streams into text, enabling them to follow along with the content even in noisy environments or when listening to audio is not feasible. Moreover, the extension’s translation capabilities open up live streams to a global audience, breaking down language barriers and fostering more inclusive participation. By offering real-time translations into multiple languages, viewers from around the world can engage with live content as if it were delivered in their first language.
In addition, the extension’s capabilities extend beyond mere transcription and translation. Using the advanced natural language processing and summarization capabilities of FMs available through Amazon Bedrock, the extension can generate concise summaries of the content being transcribed in real time. This innovative feature empowers viewers to catch up with what is being presented, making it simpler to grasp key points and highlights, even if they have missed portions of the live stream or find it challenging to follow complex discussions.
In this post, we explore the approach behind building this powerful extension and provide step-by-step instructions to deploy and use it in your browser.
Solution overview
The solution is powered by two AWS AI services, Amazon Transcribe and Amazon Translate, along with Amazon Bedrock, a fully managed service that allows you to build generative AI applications. The solution also uses Amazon Cognito user pools and identity pools for managing authentication and authorization of users, Amazon API Gateway REST APIs, AWS Lambda functions, and an Amazon Simple Storage Service (Amazon S3) bucket.
After deploying the solution, you can access the following features:

Live transcription and translation – The Chrome extension transcribes and translates audio streams for you in real time using Amazon Transcribe, an automatic speech recognition service. This feature also integrates with Amazon Transcribe automatic language identification for streaming transcriptions—with a minimum of 3 seconds of audio, the service can automatically detect the dominant language and generate a transcript without you having to specify the spoken language.
Summarization – The Chrome extension uses FMs such as Anthropic’s Claude 3 models on Amazon Bedrock to summarize content being transcribed, so you can grasp key ideas of your live stream by reading the summary.

Live transcription is currently available in the languages supported by Amazon Transcribe streaming (including Chinese, English, French, German, Hindi, Italian, Japanese, Korean, Brazilian Portuguese, Spanish, and Thai), while translation is available in the over 75 languages currently supported by Amazon Translate.
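To make this concrete, here is a minimal sketch of how a client could call Amazon Transcribe streaming with automatic language identification using the AWS SDK for JavaScript v3 (the same SDK the extension’s web application is built on). The audio capture pipeline, Region, and candidate language list are illustrative assumptions, not the extension’s actual source code.

// Minimal sketch: stream PCM audio chunks to Amazon Transcribe streaming with
// automatic language identification. Credentials, Region, and the audio source
// are placeholders for illustration, not the extension's real implementation.
import {
  TranscribeStreamingClient,
  StartStreamTranscriptionCommand,
} from "@aws-sdk/client-transcribe-streaming";

const client = new TranscribeStreamingClient({ region: "us-west-2" });

// Assumed helper: wraps raw PCM chunks (e.g., captured tab audio) as audio events.
async function* audioStream(pcmChunks) {
  for await (const chunk of pcmChunks) {
    yield { AudioEvent: { AudioChunk: chunk } };
  }
}

export async function transcribe(pcmChunks) {
  const command = new StartStreamTranscriptionCommand({
    IdentifyLanguage: true,               // let the service detect the dominant language
    LanguageOptions: "en-US,es-US,fr-FR", // candidate languages (assumption)
    MediaEncoding: "pcm",
    MediaSampleRateHertz: 16000,
    AudioStream: audioStream(pcmChunks),
  });

  const response = await client.send(command);
  for await (const event of response.TranscriptResultStream) {
    const results = event.TranscriptEvent?.Transcript?.Results ?? [];
    for (const result of results) {
      const text = result.Alternatives?.[0]?.Transcript;
      console.log(result.IsPartial ? "[partial]" : "[final]", text);
    }
  }
}

In the extension itself, the chunks would come from the captured browser tab (and optionally the microphone), and the results would be rendered in the side panel rather than logged.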
The following diagram illustrates the architecture of the application.

The solution workflow includes the following steps:

A Chrome browser is used to access the desired live streamed content, and the extension is activated and displayed as a side panel. The extension delivers a web application implemented using the AWS SDK for JavaScript and the AWS Amplify JavaScript library.
The user signs in by entering a user name and a password. Authentication is performed against the Amazon Cognito user pool. After a successful login, the Amazon Cognito identity pool is used to provide the user with the temporary AWS credentials required to access application features. For more details about the authentication and authorization flows, refer to Accessing AWS services using an identity pool after sign-in.
The extension interacts with Amazon Transcribe (StartStreamTranscription operation), Amazon Translate (TranslateText operation), and Amazon Bedrock (InvokeModel operation). Interactions with Amazon Bedrock are handled by a Lambda function, which implements the application logic underlying an API made available using API Gateway.
The user is provided with the transcription, translation, and summary of the content playing inside the browser tab. The summary is stored inside an S3 bucket, which can be emptied using the extension’s Clean Up feature.

In the following sections, we walk through how to deploy the Chrome extension and the underlying backend resources and set up the extension, then we demonstrate using the extension in a sample use case.
Prerequisites
For this walkthrough, you should have the following prerequisites:

A computer with Google Chrome installed on it
An AWS account
Access to one or more Amazon Bedrock models (for more information, see Access Amazon Bedrock foundation models)
An AWS Identity and Access Management (IAM) user with the AdministratorAccess policy granted (for production, we recommend restricting access as needed)
The AWS Command Line Interface (AWS CLI) installed and configured to use with your AWS account
The AWS CDK CLI installed
Node.js and npm installed

Deploy the backend
The first step consists of deploying an AWS Cloud Development Kit (AWS CDK) application that automatically provisions and configures the required AWS resources, including:

An Amazon Cognito user pool and identity pool that allow user authentication
An S3 bucket, where transcription summaries are stored
Lambda functions that interact with Amazon Bedrock to perform content summarization
IAM roles that are associated with the identity pool and have permissions required to access AWS services

Complete the following steps to deploy the AWS CDK application:

Using a command line interface (Linux shell, macOS Terminal, Windows command prompt or PowerShell), clone the GitHub repository to a local directory, then open the directory:

git clone https://github.com/aws-samples/aws-transcribe-translate-summarize-live-streams-in-browser.git
cd aws-transcribe-translate-summarize-live-streams-in-browser

Open the cdk/bin/config.json file and populate the following configuration variables:

{
  "prefix": "aaa123",
  "aws_region": "us-west-2",
  "bedrock_region": "us-west-2",
  "bucket_name": "summarization-test",
  "bedrock_model_id": "anthropic.claude-3-sonnet-20240229-v1:0"
}

The template launches in the us-east-2 AWS Region by default. To launch the solution in a different Region, change the aws_region parameter accordingly. Make sure to select a Region in which all the AWS services in scope (Amazon Transcribe, Amazon Translate, Amazon Bedrock, Amazon Cognito, API Gateway, Lambda, Amazon S3) are available.
The Region used for bedrock_region can be different from aws_region because you might have access to Amazon Bedrock models in a Region different from the Region where you want to deploy the project.
By default, the project uses Anthropic’s Claude 3 Sonnet as a summarization model; however, you can use a different model by changing the bedrock_model_id in the configuration file. For the complete list of model IDs, see Amazon Bedrock model IDs. When selecting a model for your deployment, don’t forget to check that the desired model is available in your preferred Region; for more details about model availability, see Model support by AWS Region.

If you have never used the AWS CDK on this account and Region combination, you will need to run the following command to bootstrap the AWS CDK on the target account and Region (otherwise, you can skip this step):

npx cdk bootstrap aws://{targetAccountId}/{targetRegion}

Navigate to the cdk sub-directory, install dependencies, and deploy the stack by running the following commands:

cd cdk
npm i
npx cdk deploy

Confirm the deployment of the listed resources by entering y.

Wait for AWS CloudFormation to finish the stack creation.
You need to use the CloudFormation stack outputs to connect the frontend to the backend. After the deployment is complete, you have two options.
The preferred option is to use the provided postdeploy.sh script to automatically copy the cdk configuration parameters to a configuration file by running the following command, still in the /cdk folder:

./scripts/postdeploy.sh

Alternatively, you can copy the configuration manually:

Open the AWS CloudFormation console in the same Region where you deployed the resources.
Find the stack named AwsStreamAnalysisStack.
On the Outputs tab, take note of the output values to complete the next steps.

Set up the extension
Complete the following steps to get the extension ready for transcribing, translating, and summarizing live streams:

Open the src/config.js file. Based on how you chose to collect the CloudFormation stack outputs, follow the appropriate step:

If you used the provided automation, check whether the values inside the src/config.js file have been automatically updated with the corresponding values.
If you copied the configuration manually, populate the src/config.js file with the values you noted. Use the following format:

const config = {
  "aws_project_region": "{aws_region}", // The same you have used as aws_region in cdk/bin/config.json
  "aws_cognito_identity_pool_id": "{CognitoIdentityPoolId}", // From CloudFormation outputs
  "aws_user_pools_id": "{CognitoUserPoolId}", // From CloudFormation outputs
  "aws_user_pools_web_client_id": "{CognitoUserPoolClientId}", // From CloudFormation outputs
  "bucket_s3": "{BucketS3Name}", // From CloudFormation outputs
  "bedrock_region": "{bedrock_region}", // The same you have used as bedrock_region in cdk/bin/config.json
  "api_gateway_id": "{APIGatewayId}" // From CloudFormation outputs
};

Take note of the CognitoUserPoolId, which will be needed in a later step to create a new user.

In the command line interface, move back to the aws-transcribe-translate-summarize-live-streams-in-browser directory with a command similar to the following:

cd ~/aws-transcribe-translate-summarize-live-streams-in-browser

Install dependencies and build the package by running the following commands:

npm i
npm run build

Open your Chrome browser and navigate to chrome://extensions/.

Make sure that developer mode is enabled by toggling the icon on the top right corner of the page.

Choose Load unpacked and upload the build directory, which can be found inside the local project folder aws-transcribe-translate-summarize-live-streams-in-browser.
Grant permissions to your browser to record your screen and audio:

Identify the newly added Transcribe, translate and summarize live streams (powered by AWS) extension.
Choose Details and then Site Settings.
In the Microphone section, choose Allow.

Create a new Amazon Cognito user:

On the Amazon Cognito console, choose User pools in the navigation pane.
Choose the user pool with the CognitoUserPoolId value noted from the CloudFormation stack outputs.
On the Users tab, choose Create user and configure this user’s verification and sign-in options.

See a walkthrough of Steps 4-6 in the animated image below. For additional details, refer to Creating a new user in the AWS Management Console.

Use the extension
Now that the extension is set up, you can interact with it by completing these steps:

On the browser tab, choose the Extensions icon.
Choose (right-click) the Transcribe, translate and summarize live streams (powered by AWS) extension and choose Open side panel.
Log in using the credentials created in the Amazon Cognito user pool from the previous step.
Close the side panel.

You’re now ready to experiment with the extension.

Open a new tab in the browser, navigate to a website featuring an audio/video stream, and open the extension (choose the Extensions icon, then choose the option menu (three dots) next to AWS transcribe, translate, and summarize, and choose Open side panel).
Use the Settings pane to update the settings of the application:

Mic in use – The Mic not in use setting is used to record only the audio of the browser tab for a live video stream. Mic in use is used for a real-time meeting where your microphone is recorded as well.
Transcription language – This is the language of the live stream to be recorded (set to auto to allow automatic identification of the language).
Translation language – This is the language in which the live stream will be translated and the summary will be printed. After you choose the translation language and start the recording, you can’t change your choice for the ongoing live stream. To change the translation language for the transcript and summary, you will have to record it from scratch.

Choose Start recording to start recording, and start exploring the Transcription and Translation tabs.

Content on the Translation tab will appear with a few seconds of delay compared to what you see on the Transcription tab. When transcribing speech in real time, Amazon Transcribe incrementally returns a stream of partial results until it generates the final transcription for a speech segment. This Chrome extension has been implemented to translate text only after a final transcription result is returned.
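As a rough sketch of that logic (illustrative names and wiring, not the extension’s source), a handler could skip partial results and forward only final segments to Amazon Translate:

// Sketch: translate only final (non-partial) transcription segments.
// The event shape matches Amazon Transcribe streaming results; the rest is illustrative.
import { TranslateClient, TranslateTextCommand } from "@aws-sdk/client-translate";

const translate = new TranslateClient({ region: "us-west-2" });

async function handleTranscriptEvent(event, targetLanguage) {
  const results = event.TranscriptEvent?.Transcript?.Results ?? [];
  for (const result of results) {
    if (result.IsPartial) continue; // wait until Transcribe finalizes this speech segment

    const text = result.Alternatives?.[0]?.Transcript;
    if (!text) continue;

    const { TranslatedText } = await translate.send(
      new TranslateTextCommand({
        Text: text,
        SourceLanguageCode: "auto",         // let Amazon Translate detect the source language
        TargetLanguageCode: targetLanguage, // e.g., "es"
      })
    );
    console.log(TranslatedText); // the extension would append this to the Translation tab instead
  }
}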

Expand the Summary section and choose Get summary to generate a summary. The operation will take a few seconds.
Choose Stop recording to stop recording.
Choose Clear all conversations in the Clean Up section to delete the summary of the live stream from the S3 bucket.

See the extension in action in the video below.

Troubleshooting
If you receive the error “Extension has not been invoked for the current page (see activeTab permission). Chrome pages cannot be captured.”, check the following:

Make sure you’re using the extension on the tab where you first opened the side panel. If you want to use it on a different tab, stop the extension, close the side panel, and choose the extension icon again to run it.
Make sure you have given permissions for audio recording in the web browser.

If you can’t get the summary of the live stream, make sure you have stopped the recording and then request the summary. You can’t change the language of the transcript and summary after the recording has started, so remember to choose it appropriately before you start the recording.
Clean up
When you’re done with your tests, to avoid incurring future charges, delete the resources created during this walkthrough by deleting the CloudFormation stack:

On the AWS CloudFormation console, choose Stacks in the navigation pane.
Choose the stack AwsStreamAnalysisStack.
Take note of the CognitoUserPoolId and CognitoIdentityPoolId values among the CloudFormation stack outputs, which will be needed in the following step.
Choose Delete stack and confirm deletion when prompted.

Because the Amazon Cognito resources won’t be automatically deleted, delete them manually:

On the Amazon Cognito console, locate the CognitoUserPoolId and CognitoIdentityPoolId values previously retrieved in the CloudFormation stack outputs.
Select both resources and choose Delete.

Conclusion
In this post, we showed you how to deploy a code sample that uses AWS AI and generative AI services to access features such as live transcription, translation, and summarization. You can follow the steps we provided to start experimenting with the browser extension.
To learn more about how to build and scale generative AI applications, refer to Transform your business with generative AI.

About the Authors
Luca Guida is a Senior Solutions Architect at AWS; he is based in Milan and he supports independent software vendors in their cloud journey. With an academic background in computer science and engineering, he started developing his AI/ML passion at university; as a member of the natural language processing and generative AI community within AWS, Luca helps customers be successful while adopting AI/ML services.
Chiara Relandini is an Associate Solutions Architect at AWS. She collaborates with customers from diverse sectors, including digital native businesses and independent software vendors. After focusing on ML during her studies, Chiara supports customers in using generative AI and ML technologies effectively, helping them extract maximum value from these powerful tools.
Arian Rezai Tabrizi is an Associate Solutions Architect based in Milan. She supports enterprises across various industries, including retail, fashion, and manufacturing, on their cloud journey. Drawing from her background in data science, Arian assists customers in effectively using generative AI and other AI technologies.

Accelerate your financial statement analysis with Amazon Bedrock and g …

The financial and banking industry can significantly enhance investment research by integrating generative AI into daily tasks like financial statement analysis. By taking advantage of advanced natural language processing (NLP) capabilities and data analysis techniques, you can streamline common tasks like these in the financial industry:

Automating data extraction – The manual data extraction process to analyze financial statements can be time-consuming and prone to human errors. Generative AI models can automate finding and extracting financial data from documents like 10-Ks, balance sheets, and income statements. Foundation models (FMs) are trained to identify and extract relevant information like expenses, revenue, and liabilities.
Trend analysis and forecasting – Identifying trends and forecasting requires domain expertise and advanced mathematics. This limits the ability for individuals to run one-time reporting, while creating dependencies within an organization on a small subset of employees. Generative AI applications can analyze financial data and identify trends and patterns while forecasting future financial performance, all without manual intervention from an analyst. Removing the manual analysis step and allowing the generative AI model to build a report analyzing trends in the financial statement can increase the organization’s agility to make quick market decisions.
Financial reporting statements – Writing detailed financial analysis reports manually can be time-consuming and resource intensive. Dedicated resources to generate financial statements can create bottlenecks within the organization, requiring specialized roles to handle the translation of financial data into a consumable narrative. FMs can summarize financial statements, highlighting key metrics found through trend analysis and providing insights. An automated report writing process not only provides consistency and speed, but minimizes resource constraints in the financial reporting process.

Amazon Bedrock is a fully managed service that makes FMs from leading AI startups and Amazon available through an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case. Amazon Bedrock offers a serverless experience, so you can get started quickly, privately customize FMs with your own data, and quickly integrate and deploy them into your applications using AWS tools without having to manage infrastructure.
In this post, we demonstrate how to deploy a generative AI application that can accelerate your financial statement analysis on AWS.
Solution overview
Building a generative AI application with Amazon Bedrock to analyze financial statements involves a series of steps, from setting up the environment to deploying the model and integrating it into your application.
The following diagram illustrates an example solution architecture using AWS services.
  
The workflow consists of the following steps:

The user interfaces with a web or mobile application, where they upload financial documents.
Amazon API Gateway manages and routes the incoming request from the UI.
An AWS Lambda function is invoked when new documents are added to the Amazon Simple Storage Service (Amazon S3) bucket.
Amazon Bedrock analyzes the documents stored in Amazon S3. The analysis results are returned to the S3 bucket through a Lambda function and stored there.
Amazon DynamoDB provides a fast, scalable way to store and retrieve metadata and analysis results to display to users.
Amazon Simple Notification Service (Amazon SNS) sends notifications about the status of document processing to the application user.

In the following sections, we discuss the key considerations in each step to build and deploy a generative AI application.
Prepare the data
Gather the financial statements you want to analyze. These can be balance sheets, income statements, cash flow statements, and so on. Make sure the data is clean and in a consistent format. You might need to preprocess the data to remove noise and standardize the format. Preprocessing the data will transform the raw data into a state that can be efficiently used for model training. This is often necessary due to messiness and inconsistencies in real-world data. The outcome is to have consistent data for the model to ingest. The two most common types of data preprocessing are normalization and standardization.
Normalization rescales the numerical columns in a dataset to a common scale, typically 0–1 (min-max scaling). Putting every feature on the same range keeps columns with large absolute values from dominating the analysis and reduces the influence of extreme values. Normalizing the dataset is especially useful when you're working with a significant amount of data and the distribution of the features is unknown.
Standardization rescales the values of a dataset to have zero mean and unit variance, the characteristics of a standard normal distribution. Standardized values are easier to compare, process, analyze, and store across systems. Standardization is beneficial when the feature distribution is well understood and the values aren't constrained to a particular range.
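As a minimal illustration (assuming the statements have already been parsed into numeric columns; the column names below are hypothetical), the two techniques look like this in Python with pandas:

import pandas as pd

# Hypothetical numeric columns extracted from a batch of income statements.
df = pd.DataFrame({"revenue": [120.0, 98.5, 143.2], "expenses": [80.1, 75.4, 90.0]})

# Normalization: rescale each column to the 0-1 range (min-max scaling).
normalized = (df - df.min()) / (df.max() - df.min())

# Standardization: rescale each column to zero mean and unit variance (z-score).
standardized = (df - df.mean()) / df.std()

print(normalized.round(3))
print(standardized.round(3))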
Choose your model
Amazon Bedrock gives you the power of choice by providing a flexible and scalable environment that allows you to access and use multiple FMs from leading AI model providers. This flexibility enables you to select the most appropriate models for your specific use cases, whether you’re working on tasks like NLP, text generation, image generation, or other AI-driven applications.
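For example, before committing to a model you can enumerate the FMs available to your account with the Amazon Bedrock control-plane API. This is a minimal sketch; how you filter the list will depend on your use case:

import boto3

# The "bedrock" client exposes control-plane operations such as listing models;
# runtime inference uses the separate "bedrock-runtime" client.
bedrock = boto3.client("bedrock")

summaries = bedrock.list_foundation_models()["modelSummaries"]

# Keep text-output models, since financial statement analysis is a text task.
text_models = sorted(
    m["modelId"] for m in summaries if "TEXT" in m.get("outputModalities", [])
)
print("\n".join(text_models))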
Deploy the model
If you don’t already have access to Amazon Bedrock FMs, you’ll need to request access through the Amazon Bedrock console. Then you can use the Amazon Bedrock console to deploy the chosen model. Configure the deployment settings according to your application’s requirements.
Develop the backend application
Create a backend service to interact with the deployed model. This service will handle requests from the frontend, send data to the model, and process the model's responses. You can build it with Lambda and API Gateway, or expose another preferred REST API endpoint.
Use the Amazon Bedrock API to send financial statements to the model and receive the analysis results.
The following is an example of the backend code.
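As a minimal sketch (not the post's original code), the Lambda handler below reads a newly uploaded statement from Amazon S3, sends it to an Anthropic Claude model on Amazon Bedrock, stores the result, and notifies the user. The model ID, table name, and topic ARN are placeholder assumptions, and the uploaded statements are assumed to be plain text:

import json
import boto3

s3 = boto3.client("s3")
bedrock = boto3.client("bedrock-runtime")
dynamodb = boto3.resource("dynamodb")
sns = boto3.client("sns")

# Hypothetical resource names -- replace with the ones created in your account.
RESULTS_TABLE = "financial-analysis-results"
STATUS_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:financial-analysis-status"
MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"  # any Bedrock text model you have access to

def handler(event, context):
    """Triggered by S3 object-created events; analyzes the uploaded statement with Bedrock."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    # 1. Read the uploaded financial statement (assumed to be plain text here).
    document = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

    # 2. Ask the model for a structured analysis of the statement.
    prompt = (
        "Summarize the key metrics, trends, and risks in the following "
        f"financial statement:\n\n{document}"
    )
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": [{"type": "text", "text": prompt}]}],
    })
    response = bedrock.invoke_model(modelId=MODEL_ID, body=body)
    analysis = json.loads(response["body"].read())["content"][0]["text"]

    # 3. Persist the result to S3 and record metadata in DynamoDB for the UI to query.
    s3.put_object(Bucket=bucket, Key=f"analysis/{key}.txt", Body=analysis.encode("utf-8"))
    dynamodb.Table(RESULTS_TABLE).put_item(
        Item={"document_key": key, "analysis_key": f"analysis/{key}.txt"}
    )

    # 4. Notify the user that processing is complete.
    sns.publish(TopicArn=STATUS_TOPIC_ARN, Message=f"Analysis ready for {key}")
    return {"statusCode": 200}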

Develop the frontend UI
Create a frontend interface for users to upload financial statements and view analysis results. This can be a web or mobile application. Make sure the frontend can send financial statement data to the backend service and display the analysis results.
Conclusion
In this post, we discussed the benefits of building a generative AI application powered by Amazon Bedrock to accelerate the analysis of financial documents. Stakeholders will be able to use AWS services to deploy and manage large language models (LLMs) that help improve the efficiency of pulling insights from common documents like 10-Ks, balance sheets, and income statements.
For more information on working with generative AI on AWS, visit the AWS Skill Builder generative AI training modules.
For instructions on building frontend applications and full-stack applications powered by Amazon Bedrock, refer to Front-End Web & Mobile on AWS and Create a Fullstack, Sample Web App powered by Amazon Bedrock.

About the Author
Jason D’Alba is an AWS Solutions Architect leader focused on enterprise applications, helping customers architect highly available and scalable data and AI solutions.

Qwen Open Sources the Powerful, Diverse, and Practical Qwen2.5-Coder Series (0.5B/1.5B/3B/7B/14B/32B)

In the world of software development, there is a constant need for more intelligent, capable, and specialized coding language models. While existing models have made significant strides in automating code generation, completion, and reasoning, several issues persist. The main challenges include inefficiency in dealing with a diverse range of coding tasks, lack of domain-specific expertise, and difficulty in applying models to real-world coding scenarios. Despite the rise of many large language models (LLMs), code-specific models have often struggled to compete with their proprietary counterparts, especially in terms of versatility and applicability. The need for a model that not only performs well on standard benchmarks but also adapts to diverse environments has never been greater.

Qwen2.5-Coder: A New Era of Open CodeLLMs

Qwen has open-sourced the “Powerful,” “Diverse,” and “Practical” Qwen2.5-Coder series, dedicated to continuously promoting the development of open CodeLLMs. The Qwen2.5-Coder series is built upon the Qwen2.5 architecture, leveraging its advanced architecture and expansive tokenizer to enhance the efficiency and accuracy of coding tasks. Qwen has made a significant stride by open-sourcing these models, making them accessible to developers, researchers, and industry professionals. This family of coder models offers a range of sizes from 0.5B to 32B parameters, providing flexibility for a wide variety of coding needs. The release of Qwen2.5-Coder-32B-Instruct comes at an opportune moment, presenting itself as the most capable and practical coder model of the Qwen series. It highlights Qwen’s commitment to fostering innovation and advancing the field of open-source coding models.

Technical Details

Technically, Qwen2.5-Coder models have undergone extensive pretraining on a vast corpus of over 5.5 trillion tokens, which includes public code repositories and large-scale web-crawled data containing code-related texts. The model architecture is shared across different model sizes—1.5B and 7B parameters—featuring 28 layers with variances in hidden sizes and attention heads. Moreover, Qwen2.5-Coder has been fine-tuned using synthetic datasets generated by its predecessor, CodeQwen1.5, incorporating an executor to ensure only executable code is retained, thereby reducing hallucination risks. The models have also been designed to be versatile, supporting various pretraining objectives such as code generation, completion, reasoning, and editing.

State-of-the-Art Performance

One of the reasons why Qwen2.5-Coder stands out is its demonstrated performance across multiple evaluation benchmarks. It has consistently achieved state-of-the-art (SOTA) performance in over 10 benchmarks, including HumanEval and BigCodeBench, surpassing even some larger models. Specifically, Qwen2.5-Coder-7B-Base achieved higher accuracy on HumanEval and MBPP benchmarks compared to models like StarCoder2 and DeepSeek-Coder of comparable or even greater sizes. The Qwen2.5-Coder series also excels in multi-programming language capabilities, demonstrating balanced proficiency across eight languages—such as Python, Java, and TypeScript. Additionally, Qwen2.5-Coder’s long-context capabilities are notably strong, making it suitable for handling repository-level code and effectively supporting inputs up to 128k tokens.

Scalability and Accessibility

Furthermore, the availability of models in various parameter sizes (ranging from 0.5B to 32B), along with the option of quantized formats like GPTQ, AWQ, and GGUF ensures that Qwen2.5-Coder can cater to a wide range of computational requirements. This scalability is crucial for developers and researchers who may not have access to high-end computational resources but still need to benefit from powerful coding capabilities. Qwen2.5-Coder’s versatility in supporting different formats makes it more accessible for practical use, allowing for broader adoption in diverse applications. Such adaptability makes the Qwen2.5-Coder family a vital tool for promoting the development of open-source coding assistants.
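As a quick sketch of how the open checkpoints can be used, the snippet below loads one of the instruct variants with Hugging Face transformers and asks it for a small coding task. The model ID is assumed to follow Qwen's usual Hugging Face naming, and a smaller variant can be substituted if GPU memory is limited:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Coder-7B-Instruct"  # assumed Hugging Face model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Build a chat-formatted prompt for a simple code generation request.
messages = [{"role": "user", "content": "Write a Python function that checks if a string is a palindrome."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate and print only the newly generated tokens.
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))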

Conclusion

The open sourcing of the Qwen2.5-Coder series marks a significant step forward in the development of coding language models. By releasing models that are powerful, diverse, and practical, Qwen has addressed key limitations of existing code-specific models. The combination of state-of-the-art performance, scalability, and flexibility makes the Qwen2.5-Coder family a valuable asset for the global developer community. Whether you are looking to leverage the capabilities of a 0.5B model or need the expansive power of a 32B variant, the Qwen2.5-Coder family aims to meet the needs of a diverse range of users. Now is indeed the perfect time to explore the possibilities with Qwen’s best coder model ever, the Qwen2.5-Coder-32B-Instruct, as well as its versatile family of smaller coders. Let’s welcome this new era of open-source coding language models that continue to push the boundaries of innovation and accessibility.

Check out the Paper, Models on Hugging Face, Demo, and Details. All credit for this research goes to the researchers of this project.

The post Qwen Open Sources the Powerful, Diverse, and Practical Qwen2.5-Coder Series (0.5B/1.5B/3B/7B/14B/32B) appeared first on MarkTechPost.

Hugging Face Releases Sentence Transformers v3.3.0: A Major Leap for NLP Efficiency

Natural Language Processing (NLP) has rapidly evolved in the last few years, with transformers emerging as a game-changing innovation. Yet, there are still notable challenges when using NLP tools to develop applications for tasks like semantic search, question answering, or document embedding. One key issue has been the need for models that not only perform well but also work efficiently on a range of devices, especially those with limited computational resources, such as CPUs. Models tend to require substantial processing power to yield high accuracy, and this trade-off often leaves developers choosing between performance and practicality. Additionally, deploying large models with specialized functionalities can be cumbersome due to storage constraints and expensive hosting requirements. In response, continual innovations are essential to keep pushing NLP tools towards greater efficiency, cost-effectiveness, and usability for a broader audience.

Hugging Face Just Released Sentence Transformers v3.3.0

Hugging Face just released Sentence Transformers v3.3.0, and it’s a major update with significant advancements! This latest version is packed with features that address performance bottlenecks, enhance usability, and offer new training paradigms. Notably, the v3.3.0 update brings a groundbreaking 4.5x speedup for CPU inference by integrating OpenVINO’s int8 static quantization. There are also additions to facilitate training using prompts for a performance boost, integration of Parameter-Efficient Fine-Tuning (PEFT) techniques, and seamless evaluation capabilities through NanoBEIR. The release shows Hugging Face’s commitment to not just improving accuracy but also enhancing computational efficiency, making these models more accessible across a wide range of use cases.

Technical Details and Benefits

The technical enhancements in Sentence Transformers v3.3.0 revolve around making the models more practical for deployment while retaining high levels of accuracy. The integration of OpenVINO Post-Training Static Quantization allows models to run 4.78 times faster on CPUs with an average performance drop of only 0.36%. This is a game-changer for developers deploying on CPU-based environments, such as edge devices or standard servers, where GPU resources are limited or unavailable. A new method, export_static_quantized_openvino_model, has been introduced to make quantization straightforward.

Another major feature is the introduction of training with prompts. By simply adding strings like “query: ” or “document: ” as prompts during training, the performance in retrieval tasks improves significantly. For instance, experiments show a 0.66% to 0.90% improvement in NDCG@10, a metric for evaluating ranking quality, without any additional computational overhead. The addition of PEFT support means that training adapters on top of base models is now more flexible. PEFT allows for efficient training of specialized components, reducing memory requirements and enabling cheap deployment of multiple configurations from a single base model. Seven new methods have been introduced to add or load adapters, making it easy to manage different adapters and switch between them seamlessly.
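As a rough sketch of how two of these features surface in code (the model name is only an example, and the OpenVINO backend assumes the optional OpenVINO dependencies are installed):

from sentence_transformers import SentenceTransformer

# Load a model on the OpenVINO backend for faster CPU inference.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", backend="openvino")

# Encode with the prompt strings discussed above; the same strings should be
# used at training and at inference time for the reported gains to apply.
query_emb = model.encode(["how do I cancel an order?"], prompt="query: ")
doc_emb = model.encode(["Orders can be cancelled from the account page."], prompt="document: ")

print(model.similarity(query_emb, doc_emb))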

Why This Release is Important

The v3.3.0 release addresses the pressing needs of NLP practitioners aiming to balance efficiency, performance, and usability. The introduction of OpenVINO quantization is crucial for deploying transformer models in production environments with limited hardware capabilities. For instance, the reported 4.78x speed improvement on CPU-based inference makes it possible to use high-quality embeddings in real-time applications where previously the computational cost would have been prohibitive. The prompt-based training also illustrates how relatively minor adjustments can yield significant performance gains. A 0.66% to 0.90% improvement in retrieval tasks is a remarkable enhancement, especially when it comes at no extra cost.

PEFT integration allows for more scalability in training and deploying models. It is particularly beneficial in environments where resources are shared, or there is a need to train specialized models with minimal computational load. The new ability to evaluate on NanoBEIR, a collection of 13 datasets focused on retrieval tasks, adds an extra layer of assurance that the models trained using v3.3.0 can generalize well across diverse tasks. This evaluation framework allows developers to validate their models on real-world retrieval scenarios, offering a benchmarked understanding of their performance and making it easy to track improvements over time.

Conclusion

The Sentence Transformers v3.3.0 release from Hugging Face is a significant step forward in making state-of-the-art NLP more accessible and usable across diverse environments. With substantial CPU speed improvements through OpenVINO quantization, prompt-based training to enhance performance without extra cost, and the introduction of PEFT for more scalable model management, this update ticks all the right boxes for developers. It ensures that models are not just powerful but also efficient, versatile, and easier to integrate into various deployment scenarios. Hugging Face continues to push the envelope, making complex NLP tasks more feasible for real-world applications while fostering innovation that benefits both researchers and industry professionals alike.

Check out the GitHub Page. All credit for this research goes to the researchers of this project.


The post Hugging Face Releases Sentence Transformers v3.3.0: A Major Leap for NLP Efficiency appeared first on MarkTechPost.

DeepMind Released AlphaFold 3 Inference Codebase, Model Weights and An On-Demand Server

DeepMind has once again taken a significant step in computational biology with the release of AlphaFold 3’s inference codebase, model weights, and an on-demand server. This update brings unprecedented capabilities to the already transformative AlphaFold platform, extending its reach beyond proteins to accurately predict the structure and interactions of almost all of life’s molecules, including nucleic acids, ligands, ions, and modified residues, all in one unified platform. Let’s explore the implications and the technological leap represented by AlphaFold 3.

Addressing the Challenges in Biomolecular Structure Prediction

The accurate prediction of biomolecular structures is one of the most pressing challenges in biology and medicine. Complex biological processes, such as protein synthesis, signal transduction, and drug interactions, rely on intricate molecular structures and precise interactions. Despite significant advances with tools like AlphaFold 2, a considerable gap remained in modeling complexes that include various molecular types such as nucleic acids, ions, and other modifications. Traditional methods are typically domain-specific and fail to generalize well across diverse biomolecular entities. They also suffer from substantial computational requirements, resulting in delays that hinder quick experimentation and practical therapeutic design. To address these challenges, a more generalized, high-accuracy solution was needed—this is where AlphaFold 3 steps in.

DeepMind Releases AlphaFold 3

DeepMind recently released the inference codebase, model weights, and an on-demand server for AlphaFold 3. This release makes it easier for researchers and developers worldwide to integrate the power of AlphaFold into their workflows. Compared to its predecessor, AlphaFold 2, AlphaFold 3 offers a more sophisticated architecture capable of predicting the joint structure of biomolecular complexes, including proteins, DNA, RNA, ligands, ions, and even chemical modifications. This version is designed to accommodate highly complex interactions within biological systems, and the release includes access to model weights, allowing researchers to directly replicate or extend the existing capabilities.

The on-demand server makes AlphaFold 3 accessible without the need for substantial computational infrastructure. By simply providing sequence or structure input, users can query the server to obtain high-accuracy structural predictions, significantly lowering the barrier for research institutions and companies without advanced computational capabilities.

Technical Details

AlphaFold 3 introduces a diffusion-based architecture, significantly improving accuracy for predicting biomolecular interactions. Unlike AlphaFold 2, which mainly focused on proteins, AlphaFold 3 employs a generalized architecture capable of predicting structures for a broader range of biomolecular types. The new “pairformer” replaces AlphaFold 2’s “evoformer” as the central processing module, simplifying the process and improving efficiency. The system operates by directly predicting atomic coordinates using a diffusion model, removing the need for specific torsion angle predictions and stereochemical handling that added complexity in earlier models.

The multiscale nature of the diffusion process enhances the accuracy of predictions by reducing stereochemical losses and eliminating the need for multiple-sequence alignments. As shown in the benchmarks, AlphaFold 3 significantly outperforms traditional tools like AutoDock Vina and RoseTTAFold All-Atom, providing far greater accuracy in protein-ligand interactions and protein-nucleic acid complexes. These advancements not only make AlphaFold 3 more versatile but also drastically reduce the computational burden, allowing broader adoption across industries that need accurate biomolecular structures.

Importance of This Release

The release of AlphaFold 3 is monumental for many reasons. First and foremost, it fills a critical gap in our understanding of complex biomolecular interactions that involve not just proteins but multiple classes of molecules. The updated architecture of AlphaFold 3 can model almost any type of complex found in the Protein Data Bank (PDB). For instance, AlphaFold 3 demonstrated substantial improvement over previous versions, particularly in predicting antibody-antigen interactions, protein-ligand binding, and nucleic acid interactions with impressive accuracy across datasets like PoseBusters and CASP15 RNA targets. The performance metrics showed significant uplift across these tasks, with AlphaFold 3 achieving accuracy levels that outpaced traditional docking and nucleic acid prediction tools.

With improved on-demand availability, AlphaFold 3 empowers research into diseases that involve complex protein-DNA or protein-ligand interactions, such as cancer and neurodegenerative diseases, by providing reliable structural models for these intricate systems. Its ability to handle complex chemical modifications and predict accurate structures even in the presence of modifications (like glycosylation or phosphorylation) makes it invaluable for drug design and discovery. As such, AlphaFold 3 represents a step towards integrating computational models more effectively into therapeutic research, enhancing our capacity to design precise interventions at the molecular level.

Conclusion

DeepMind’s release of AlphaFold 3 has taken the world of structural biology into new territory. By including model weights, inference code, and an on-demand server, DeepMind has opened the door for researchers across disciplines to harness cutting-edge technology without prohibitive infrastructure requirements. AlphaFold 3’s advancements in structure prediction—spanning proteins, nucleic acids, ligands, and more—promise to accelerate our understanding of biomolecular interactions, potentially leading to significant breakthroughs in drug development and molecular biology.

Check out the Paper, Codebase, and Details. All credit for this research goes to the researchers of this project.

The post DeepMind Released AlphaFold 3 Inference Codebase, Model Weights and An On-Demand Server appeared first on MarkTechPost.

Fine-tune Meta Llama 3.2 text generation models for generative AI inference using Amazon SageMaker JumpStart

Generative AI models have seen tremendous growth, offering cutting-edge solutions for text generation, summarization, code generation, and question answering. Despite their versatility, these models often struggle when applied to niche or domain-specific tasks because their pre-training is typically based on large, generalized datasets. To address these gaps and maximize their utility in specialized scenarios, fine-tuning with domain-specific data is essential to boost accuracy and relevance.
Meta’s newly launched Llama 3.2 series sets a new benchmark in generative AI with its advanced multimodal capabilities and optimized performance across diverse hardware platforms. The collection spans lightweight models like Llama-3.2-1B and Llama-3.2-3B, which support up to 128,000 tokens of context and are tailored for edge devices. These models are ideal for on-device applications such as real-time summarization, instruction following, and multilingual text generation. On the other end of the spectrum, the larger Llama-3.2-11B and Llama-3.2-90B models offer powerful vision-enabled capabilities for tasks such as image understanding, document analysis, and visual grounding. This allows for sophisticated use cases like generating captions for images, interpreting complex graphs, and reasoning over visual data. For instance, the Meta Llama 3.2 models can analyze sales data presented in a graph to provide actionable insights or locate specific objects on a map using natural language instructions.
In this post, we demonstrate how to fine-tune Meta’s latest Llama 3.2 text generation models, Llama 3.2 1B and 3B, using Amazon SageMaker JumpStart for domain-specific applications. By using the pre-built solutions available in SageMaker JumpStart and the customizable Meta Llama 3.2 models, you can unlock the models’ enhanced reasoning, code generation, and instruction-following capabilities to tailor them for your unique use cases. Whether you’re working in finance, healthcare, or any other specialized field, fine-tuning these models will allow you to bridge the gap between general AI capabilities and domain-specific expertise.
Solution overview
SageMaker JumpStart is a robust feature within the SageMaker machine learning (ML) environment, offering practitioners a comprehensive hub of publicly available and proprietary foundation models (FMs). This managed service accelerates the ML development process by providing access to a growing list of cutting-edge models from leading model hubs and providers. You can quickly evaluate, compare, and select FMs based on predefined quality and responsibility metrics for tasks such as article summarization and image generation.
SageMaker JumpStart allows for full customization of pre-trained models to suit specific use cases using your own data. Deployment to production environments is streamlined through the user interface or SDK, enabling rapid integration into applications. The platform also supports organizational collaboration by allowing the sharing of artifacts, including models and notebooks, to expedite model building and deployment. Administrators can manage the visibility of models within the organization, enhancing governance and security.
Furthermore, SageMaker JumpStart enables practitioners to deploy models to dedicated SageMaker instances within a network-isolated environment, maintaining compliance and data protection. By using the robust training and deployment capabilities available in SageMaker, you can customize and scale models to meet diverse ML requirements efficiently.
Prerequisites
To try out this solution using SageMaker JumpStart, you’ll need the following prerequisites:

An AWS account that will contain all of your AWS resources.
An AWS Identity and Access Management (IAM) role to access SageMaker. To learn more about how IAM works with SageMaker, refer to Identity and Access Management for Amazon SageMaker.
Access to Amazon SageMaker Studio or a SageMaker notebook instance, or an interactive development environment (IDE) such as PyCharm or Visual Studio Code. We recommend using SageMaker Studio for straightforward deployment and inference.

Fine-tune Meta Llama 3.2 text generation models
In this section, we demonstrate how to fine-tune Meta Llama 3.2 text generation models. We first look at fine-tuning through the SageMaker Studio UI without writing any code, and then cover how to fine-tune the model using the SageMaker Python SDK.
No-code fine-tuning using the SageMaker Studio UI
SageMaker JumpStart provides access to publicly available and proprietary FMs from third-party and proprietary providers. Data scientists and developers can quickly prototype and experiment with various ML use cases, accelerating the development and deployment of ML applications. It helps reduce the time and effort required to build ML models from scratch, allowing teams to focus on fine-tuning and customizing the models for their specific use cases. These models are released under different licenses designated by their respective sources. It’s essential to review and adhere to the applicable license terms before downloading or using these models to make sure they’re suitable for your intended use case.
You can access the Meta Llama 3.2 FMs through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we cover how to discover these models in SageMaker Studio.
SageMaker Studio is an IDE that offers a web-based visual interface for performing the ML development steps, from data preparation to model building, training, and deployment. For instructions on getting started and setting up SageMaker Studio, refer to Amazon SageMaker Studio.

In SageMaker Studio, access SageMaker JumpStart by choosing JumpStart in the navigation pane. You’re presented with the list of public models offered by SageMaker, where you can explore other models from other providers.

To start using the Meta Llama 3.2 models, under Providers, choose Meta. You’re presented with a list of the models available.

Choose the Meta Llama 3.2 1B Instruct model. Here you can view the model details, as well as train, deploy, optimize, and evaluate the model.

For this demonstration, we choose Train.

On this page, you can point to the Amazon Simple Storage Service (Amazon S3) bucket containing the training and validation datasets for fine-tuning.

In addition, you can configure the hyperparameters, deployment settings, and security settings for fine-tuning.
Choose Submit to start the training job on a SageMaker ML instance.

Accept the Llama 3.2 Community License Agreement to initiate the fine-tuning process.

Deploy the model
After the model is fine-tuned, you can deploy it using the model page on SageMaker JumpStart. The option to deploy the fine-tuned model will appear when fine-tuning is finished, as shown in the following screenshot.

You can also deploy the model from this view. You can configure endpoint settings such as the instance type, number of instances, and endpoint name. You will need to accept the End User License Agreement (EULA) before you can deploy the model.

Fine-tune using the SageMaker Python SDK
You can also fine-tune Meta Llama 3.2 models using the SageMaker Python SDK. A sample notebook with the full instructions can be found on GitHub. The following code example demonstrates how to fine-tune the Meta Llama 3.2 1B model:

import os
import boto3
from sagemaker.session import Session
from sagemaker.jumpstart.estimator import JumpStartEstimator

# To fine-tune the Llama 3.2 3B model available on JumpStart, change model_id to `meta-textgeneration-llama-3-2-3b`.
model_id = "meta-textgeneration-llama-3-2-1b"
accept_eula = "true"
estimator = JumpStartEstimator(
    model_id=model_id, environment={"accept_eula": accept_eula}
)

# By default, instruction tuning is disabled. To use an instruction tuning dataset, set instruction_tuned="True".
estimator.set_hyperparameters(instruction_tuned="True", epoch="5", max_input_length="1024")
estimator.fit({"training": train_data_location})

The code sets up a SageMaker JumpStart estimator for fine-tuning the Meta Llama 3.2 large language model (LLM) on a custom training dataset. It configures the estimator with the desired model ID, accepts the EULA, enables instruction tuning by setting instruction_tuned="True", sets the number of training epochs, and initiates the fine-tuning process.
When the fine-tuning job is complete, you can deploy the fine-tuned model directly from the estimator, as shown in the following code. As part of the deploy settings, you can define the instance type you want to deploy the model on. For the full list of deployment parameters, refer to the deploy parameters in the SageMaker SDK documentation.

finetuned_predictor = estimator.deploy(instance_type='ml.g5.xlarge')

After the endpoint is up and running, you can perform an inference request against it using the predictor object as follows:

prompt = "Your prompt goes here"
payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 256},
}
response = finetuned_predictor.predict(payload)
response.get("generated_text")

For the full list of predictor parameters, refer to the predictor object in the SageMaker SDK documentation.
Fine-tuning technique
Language models such as Meta Llama are more than 10 GB or even 100 GB in size. Fine-tuning such large models requires instances with significantly higher CUDA memory. Furthermore, training these models can be very slow due to their size. Therefore, for efficient fine-tuning, we use the following optimizations:

Low-Rank Adaptation (LoRA) – This is a type of parameter efficient fine-tuning (PEFT) for efficient fine-tuning of large models. In this method, we freeze the whole model and only add a small set of adjustable parameters or layers into the model. For instance, instead of training all 3 billion parameters for Meta Llama 3.2 3B, we can fine-tune less than 1% of the parameters. This helps significantly reduce the memory requirement because we only need to store gradients, optimizer states, and other training-related information for only 1% of the parameters. Furthermore, this helps reduce both training time and cost. For more details on this method, refer to LoRA: Low-Rank Adaptation of Large Language Models.
Int8 quantization – Even with optimizations such as LoRA, models like Meta Llama 70B require significant computational resources for training. To reduce the memory footprint during training, we can employ Int8 quantization. Quantization typically reduces the precision of the floating-point data types. Although this decreases the memory required to store model weights, it can potentially degrade the performance due to loss of information. However, Int8 quantization utilizes only a quarter of the precision compared to full-precision training, but it doesn’t incur significant degradation in performance. Instead of simply dropping bits, Int8 quantization rounds the data from one type to another, preserving the essential information while optimizing memory usage. To learn about Int8 quantization, refer to int8(): 8-bit Matrix Multiplication for Transformers at Scale.
Fully Sharded Data Parallel (FSDP) – This is a type of data parallel training algorithm that shards the model’s parameters across data parallel workers and can optionally offload part of the training computation to the CPUs. Although the parameters are sharded across different GPUs, computation of each microbatch is local to the GPU worker. It shards parameters more uniformly and achieves optimized performance through communication and computation overlapping during training.

The following table lists the JumpStart model IDs, default instance types, and supported fine-tuning instance types for the two Meta Llama 3.2 models.

Model: Meta Llama 3.2 1B
JumpStart Model IDs: meta-textgeneration-llama-3-2-1b, meta-textgeneration-llama-3-2-1b-instruct
Default Instance Type: ml.g5.2xlarge
Supported Instance Types for Fine-Tuning: ml.g5.2xlarge, ml.g5.4xlarge, ml.g5.8xlarge, ml.g5.12xlarge, ml.p3dn.24xlarge, ml.g4dn.12xlarge, ml.p5.48xlarge

Model: Meta Llama 3.2 3B
JumpStart Model IDs: meta-textgeneration-llama-3-2-3b, meta-textgeneration-llama-3-2-3b-instruct
Default Instance Type: ml.g5.12xlarge
Supported Instance Types for Fine-Tuning: ml.g5.12xlarge, ml.g5.24xlarge, ml.g5.48xlarge, ml.p3dn.24xlarge, ml.g4dn.12xlarge, ml.p5.48xlarge

Other instance types may also work for fine-tuning. When using p3 instances, training will be done with 32-bit precision because bfloat16 is not supported on these instances. Therefore, the training job would consume double the amount of CUDA memory when training on p3 instances compared to g5 instances.
Training dataset format
SageMaker JumpStart currently supports datasets in both domain adaptation format and instruction tuning format. In this section, we specify an example dataset in both formats. For more details, refer to the Dataset formatting section in the appendix.
Domain adaptation format
You can fine-tune the Meta Llama 3.2 text generation model on domain-specific datasets, enabling it to generate relevant text and tackle various natural language processing (NLP) tasks within a particular domain using few-shot prompting. This fine-tuning process involves providing the model with a dataset specific to the target domain. The dataset can be in various formats, such as CSV, JSON, or TXT files. For example, if you want to fine-tune the model for the domain of financial reports and filings, you could provide it with a text file containing SEC filings from a company like Amazon. The following is an excerpt from such a filing:

This report includes estimates, projections, statements relating to our
business plans, objectives, and expected operating results that are “forward-
looking statements” within the meaning of the Private Securities Litigation
Reform Act of 1995, Section 27A of the Securities Act of 1933, and Section 21E
of the Securities Exchange Act of 1934. Forward-looking statements may appear
throughout this report, including the following sections: “Business” (Part I,
Item 1 of this Form 10-K), “Risk Factors” (Part I, Item 1A of this Form 10-K),
and “Management’s Discussion and Analysis of Financial Condition and Results
of Operations” (Part II, Item 7 of this Form 10-K). These forward-looking
statements generally are identified by the words “believe,” “project,”
“expect,” “anticipate,” “estimate,” “intend,” “strategy,” “future,”
“opportunity,” “plan,” “may,” “should,” “will,” “would,” “will be,” “will
continue,” “will likely result,” and similar expressions.

Instruction tuning format
In instruction fine-tuning, the model is fine-tuned for a set of NLP tasks described using instructions. This helps improve the model’s performance for unseen tasks with zero-shot prompts. For the instruction tuning dataset format, you provide a template.json file describing the input and output formats and a train.jsonl file with one training example per line.
The template.json file always has the following JSON format:

{
  "prompt": "<<Prompt goes here along with question or context or instruction>>",
  "completion": "<<completion goes here depending on the activity, for ex: answer for Q&A or summary for Summarization task>>"
}

For instance, the following examples show the template.json and train.jsonl files for the Dolly and Dialogsum datasets.

Dataset: Dolly
Use Case: Question Answering
template.json:
{ "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n", "completion": " {response}" }
train.jsonl:
{ "instruction": "Who painted the Two Monkeys", "context": "Two Monkeys or Two Chained Monkeys is a 1562 painting by Dutch and Flemish Renaissance artist Pieter Bruegel the Elder. The work is now in the Gemäldegalerie (Painting Gallery) of the Berlin State Museums.", "response": "The two Monkeys or Two Chained Monkeys is a 1562 painting by Dutch and Flemish Renaissance artist Pieter Bruegel the Elder. The work is now in the Gemaeldegalerie (Painting Gallery) of the Berlin State Museums." }

Dataset: Dialogsum
Use Case: Text Summarization
template.json:
{ "prompt": "Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation.\n\n### Instruction:\n{dialogue}\n\n", "completion": " {summary}" }
train.jsonl:
{ "dialogue": "#Person1#: Where do these flower vases come from? \n#Person2#: They are made a town nearby. The flower vases are made of porcelain and covered with tiny bamboo sticks. \n#Person1#: Are they breakable? \n#Person2#: No. They are not only ornmamental, but also useful. \n#Person1#: No wonder it's so expensive. ", "summary": "#Person2# explains the flower vases' materials and advantages and #Person1# understands why they're expensive." }
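A minimal sketch of assembling these two files and uploading them to Amazon S3 is shown below; the bucket name, prefix, and example record are placeholders:

import json
import boto3

template = {
    "prompt": (
        "Below is an instruction that describes a task, paired with an input that provides "
        "further context. Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n"
    ),
    "completion": " {response}",
}

# One JSON object per line; replace with your own instruction-tuning records.
records = [
    {"instruction": "Summarize the passage.", "context": "Example passage ...", "response": "Example summary."},
]

with open("template.json", "w") as f:
    json.dump(template, f)
with open("train.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

s3 = boto3.client("s3")
bucket, prefix = "my-training-bucket", "llama-3-2-finetuning"  # placeholders
s3.upload_file("template.json", bucket, f"{prefix}/template.json")
s3.upload_file("train.jsonl", bucket, f"{prefix}/train.jsonl")

# Pass this S3 prefix as train_data_location to estimator.fit().
train_data_location = f"s3://{bucket}/{prefix}/"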

Supported hyperparameters for training
The fine-tuning process for Meta Llama 3.2 models allows you to customize various hyperparameters, each of which can influence factors such as memory consumption, training speed, and the performance of the fine-tuned model. At the time of writing this post, the following are the default hyperparameter values. For the most up-to-date information, refer to the SageMaker Studio console, because these values may be subject to change. An example of passing these values to the estimator follows the list.

int8_quantization – If True, the model is loaded with 8-bit precision for training. Default for Meta Llama 3.2 1B and Meta Llama 3.2 3B is False.
enable_fsdp – If True, training uses FSDP. Default for Meta Llama 3.2 1B and Meta Llama 3.2 3B is True.
epoch – The number of passes that the fine-tuning algorithm takes through the training dataset. Must be an integer greater than 1. Default is 5.
learning_rate – The rate at which the model weights are updated after working through each batch of training examples. Must be a positive float greater than 0. Default is 0.0001.
lora_r – LoRA R dimension. Must be a positive integer. Default is 8.
lora_alpha – LoRA Alpha. Must be a positive integer. Default is 32.
target_modules – Target modules for LoRA fine-tuning. You can specify a subset of ['q_proj','v_proj','k_proj','o_proj','gate_proj','up_proj','down_proj'] modules as a string separated by a comma without any spaces. Default is q_proj,v_proj.
lora_dropout – LoRA dropout. Must be a positive float between 0–1. Default is 0.05.
instruction_tuned – Whether to instruction-train the model or not. At most, one of instruction_tuned and chat_dataset can be True. Must be True or False. Default is False.
chat_dataset – If True, dataset is assumed to be in chat format. At most, one of instruction_tuned and chat_dataset can be True. Default is False.
add_input_output_demarcation_key – For an instruction tuned dataset, if this is True, a demarcation key ("### Response:\n") is added between the prompt and completion before training. Default is True.
per_device_train_batch_size – The batch size per GPU core/CPU for training. Default is 4.
per_device_eval_batch_size – The batch size per GPU core/CPU for evaluation. Default is 1.
max_train_samples – For debugging purposes or quicker training, truncate the number of training examples to this value. Value -1 means using all of the training samples. Must be a positive integer or -1. Default is -1.
max_val_samples – For debugging purposes or quicker training, truncate the number of validation examples to this value. Value -1 means using all of the validation samples. Must be a positive integer or -1. Default is -1.
seed – Random seed that will be set at the beginning of training. Default is 10.
max_input_length – Maximum total input sequence length after tokenization. Sequences longer than this will be truncated. If -1, max_input_length is set to the minimum of 1024 and the maximum model length defined by the tokenizer. If set to a positive value, max_input_length is set to the minimum of the provided value and the model_max_length defined by the tokenizer. Must be a positive integer or -1. Default is -1.
validation_split_ratio – If the validation channel is None, the ratio of the train-validation split from the training data. Must be between 0–1. Default is 0.2.
train_data_split_seed – If validation data is not present, this fixes the random splitting of the input training data to training and validation data used by the algorithm. Must be an integer. Default is 0.
preprocessing_num_workers – The number of processes to use for preprocessing. If None, the main process is used for preprocessing. Default is None.
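These hyperparameters are passed to the JumpStart estimator as strings before calling fit. The following sketch shows an instruction-tuning configuration with an explicit LoRA setup; the values are purely illustrative:

from sagemaker.jumpstart.estimator import JumpStartEstimator

estimator = JumpStartEstimator(
    model_id="meta-textgeneration-llama-3-2-1b",
    environment={"accept_eula": "true"},
)

estimator.set_hyperparameters(
    instruction_tuned="True",          # use the template.json / train.jsonl format
    epoch="3",
    learning_rate="0.0001",
    lora_r="8",
    lora_alpha="32",
    lora_dropout="0.05",
    target_modules="q_proj,v_proj",    # comma-separated, no spaces
    per_device_train_batch_size="4",
    max_input_length="512",
    enable_fsdp="True",
    int8_quantization="False",
)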

Instance types and compatible hyperparameters
The memory requirement during fine-tuning may vary based on several factors:

Model type – The 1B model has the smallest GPU memory requirement and the 3B model has a higher memory requirement
Max input length – A higher value of input length leads to processing more tokens at a time and as such requires more CUDA memory
Batch size – A larger batch size requires larger CUDA memory and therefore requires larger instance types
Int8 quantization – If using Int8 quantization, the model is loaded into low precision mode and therefore requires less CUDA memory

To help you get started, we provide a set of combinations of different instance types, hyperparameters, and model types that can be successfully fine-tuned. You can select a configuration as per your requirements and availability of instance types. We fine-tuned both models on a variety of settings with three epochs on a subset of the Dolly dataset with summarization examples.
The results for fine-tuning the models are shown in the appendix at the end of this post. As we can see from these results, fine-tuning improves summarization compared to non-fine-tuned models.
Meta Llama 3.2 1B fine-tuning with various hyperparameters
The following table summarizes the different hyperparameters for fine-tuning Meta Llama 3.2 1B.

Instance Type | Max Input Length | Per Device Training Batch Size | Int8 Quantization | Enable FSDP | Time Taken (Minutes)
ml.g5.2xlarge | 1024 | 4 | FALSE | TRUE | 11.3
ml.g5.2xlarge | 1024 | 8 | FALSE | TRUE | 11.12
ml.g5.2xlarge | 1024 | 4 | FALSE | FALSE | 14.55
ml.g5.2xlarge | 2048 | 4 | FALSE | TRUE | 10.95
ml.g5.2xlarge | 1024 | 4 | TRUE | FALSE | 17.82
ml.g5.2xlarge | 2048 | 4 | TRUE | FALSE | 17.4
ml.g5.2xlarge | 1024 | 8 | TRUE | FALSE | 16.97
ml.g5.4xlarge | 1024 | 8 | FALSE | TRUE | 11.28
ml.g5.4xlarge | 1024 | 4 | FALSE | TRUE | 11.48
ml.g5.4xlarge | 2048 | 4 | FALSE | TRUE | 11.27
ml.g5.4xlarge | 1024 | 4 | FALSE | FALSE | 14.8
ml.g5.4xlarge | 1024 | 4 | TRUE | FALSE | 17.38
ml.g5.4xlarge | 1024 | 8 | TRUE | FALSE | 16.63
ml.g5.4xlarge | 2048 | 4 | TRUE | FALSE | 16.8
ml.g5.8xlarge | 1024 | 4 | FALSE | TRUE | 11.12
ml.g5.8xlarge | 2048 | 4 | FALSE | TRUE | 10.87
ml.g5.8xlarge | 1024 | 8 | FALSE | TRUE | 10.88
ml.g5.8xlarge | 1024 | 4 | FALSE | FALSE | 14.47
ml.g5.8xlarge | 1024 | 4 | TRUE | FALSE | 17.82
ml.g5.8xlarge | 1024 | 8 | TRUE | FALSE | 17.13
ml.g5.8xlarge | 2048 | 4 | TRUE | FALSE | 17.13
ml.g5.12xlarge | 2048 | 4 | FALSE | FALSE | 14.72
ml.g5.12xlarge | 1024 | 4 | FALSE | TRUE | 10.45
ml.g5.12xlarge | 1024 | 8 | TRUE | FALSE | 17.23
ml.g5.12xlarge | 1024 | 8 | FALSE | FALSE | 14.03
ml.g5.12xlarge | 1024 | 4 | FALSE | FALSE | 14.22
ml.g5.12xlarge | 1024 | 4 | TRUE | FALSE | 18.07
ml.g5.12xlarge | 2048 | 4 | TRUE | FALSE | 18.15
ml.g5.12xlarge | 2048 | 4 | FALSE | TRUE | 8.45
ml.g5.12xlarge | 1024 | 8 | FALSE | TRUE | 8.87
ml.g4dn.12xlarge | 1024 | 8 | FALSE | TRUE | 21.15
ml.g4dn.12xlarge | 1024 | 4 | TRUE | FALSE | 35.12
ml.g4dn.12xlarge | 1024 | 4 | FALSE | TRUE | 22.42
ml.g4dn.12xlarge | 1024 | 4 | FALSE | FALSE | 34.62
ml.g4dn.12xlarge | 2048 | 4 | FALSE | TRUE | 23.25

Meta Llama 3.2 3B fine-tuning with various hyperparameters
The following table summarizes the different hyperparameters for fine-tuning Meta Llama 3.2 3B.

Instance Type | Max Input Length | Per Device Training Batch Size | Int8 Quantization | Enable FSDP | Time Taken (Minutes)
ml.g5.12xlarge | 1024 | 8 | TRUE | FALSE | 29.18
ml.g5.12xlarge | 2048 | 4 | TRUE | FALSE | 29.8
ml.g5.12xlarge | 1024 | 4 | FALSE | FALSE | 26.2
ml.g5.12xlarge | 1024 | 8 | FALSE | TRUE | 12.88
ml.g5.12xlarge | 2048 | 4 | FALSE | TRUE | 11.8
ml.g5.12xlarge | 1024 | 4 | FALSE | TRUE | 14.98
ml.g5.12xlarge | 1024 | 4 | TRUE | FALSE | 30.05
ml.g5.12xlarge | 1024 | 4 | TRUE | FALSE | 29.87
ml.g5.24xlarge | 1024 | 4 | FALSE | FALSE | 25.97
ml.g5.24xlarge | 1024 | 4 | FALSE | TRUE | 14.65
ml.g5.24xlarge | 1024 | 4 | TRUE | FALSE | 29.32
ml.g5.24xlarge | 2048 | 4 | TRUE | FALSE | 29.77
ml.g5.24xlarge | 1024 | 8 | TRUE | FALSE | 28.78
ml.g5.24xlarge | 2048 | 4 | FALSE | TRUE | 11.62
ml.g5.24xlarge | 1024 | 8 | FALSE | TRUE | 12.38
ml.g5.48xlarge | 1024 | 8 | FALSE | TRUE | 14.25
ml.g5.48xlarge | 1024 | 4 | FALSE | FALSE | 26.2
ml.g5.48xlarge | 2048 | 4 | FALSE | TRUE | 13.32
ml.g5.48xlarge | 1024 | 4 | FALSE | TRUE | 16.73
ml.g5.48xlarge | 1024 | 4 | TRUE | FALSE | 30.3
ml.g5.48xlarge | 2048 | 4 | FALSE | FALSE | 28.7
ml.g5.48xlarge | 1024 | 8 | FALSE | FALSE | 25.6
ml.g5.48xlarge | 1024 | 8 | TRUE | FALSE | 29.33
ml.g5.48xlarge | 2048 | 4 | TRUE | FALSE | 30.63

Recommendations on instance types and hyperparameters
When fine-tuning for the model’s accuracy, keep in mind the following:

Larger models such as 3B provide better performance than 1B
Performance without Int8 quantization is better than performance with Int8 quantization

Note the following training time and CUDA memory requirements:

Setting int8_quantization=True decreases the memory requirement.
The combination of per_device_train_batch_size, int8_quantization, and enable_fsdp settings affects the training times. When using a larger batch size with FSDP enabled, the training times are faster compared to using a larger batch size without FSDP.
Decreasing per_device_train_batch_size and max_input_length reduces the memory requirement and therefore can be run on smaller instances. However, setting very low values may increase the training time.
If you’re not using Int8 quantization (int8_quantization=False), use FSDP (enable_fsdp=True) for faster and more efficient training.

When choosing the instance type, consider the following:

At the time of writing this post, the G5 instances provided the most efficient training among the supported instance types. However, because AWS regularly updates and introduces new instance types, we recommend that you validate the recommended instance type for Meta Llama 3.2 fine-tuning in the SageMaker documentation or SageMaker console before proceeding.
Training time largely depends on the number of GPUs and the amount of CUDA memory available. Training times on instances with the same number of GPUs (for example, ml.g5.2xlarge and ml.g5.4xlarge) are therefore roughly the same, so you can use the more cost-effective instance for training (ml.g5.2xlarge).

To learn about the cost of training per instance, refer to Amazon EC2 G5 Instances.
If your dataset is in instruction tuning format, where each sample consists of an instruction (input) and the desired model response (completion), and these input+completion sequences are short (for example, 50–100 words), using a high value for max_input_length can lead to poor performance. This is because the model may struggle to focus on the relevant information when dealing with a large number of padding tokens, and it can also lead to inefficient use of computational resources. The default value of -1 corresponds to a max_input_length of 1024 for Meta Llama models. We recommend setting max_input_length to a smaller value (for example, 200–400) when working with datasets containing shorter input+completion sequences to mitigate these issues and potentially improve the model’s performance and efficiency.
Lastly, due to the high demand of the G5 instances, you may experience unavailability of these instances in your AWS Region with the error “CapacityError: Unable to provision requested ML compute capacity. Please retry using a different ML instance type.” If you experience this error, retry the training job or try a different Region.
Issues when fine-tuning large models
In this section, we discuss two issues when fine-tuning very large models.
Disable output compression
By default, the output of a training job is a trained model that is compressed in a .tar.gz format before it’s uploaded to Amazon S3. However, for large models like the 70B model, this compression step can be time-consuming, taking more than 4 hours. To mitigate this delay, it’s recommended to use the disable_output_compression feature supported by the SageMaker training environment. When disable_output_compression is set to True, the model is uploaded without any compression, which can significantly reduce the time taken for large model artifacts to be uploaded to Amazon S3. The uncompressed model can then be used directly for deployment or further processing. The following code shows how to pass this parameter into the SageMaker JumpStart estimator:

estimator = JumpStartEstimator(
    model_id=model_id,
    environment={"accept_eula": "true"},
    disable_output_compression=True,
)

SageMaker Studio kernel timeout issue
The SageMaker Studio kernel is only used to initiate the training job, and its status doesn’t affect the ongoing training process. After the training job starts, the compute resources allocated for the job will continue running the training process, regardless of whether the SageMaker Studio kernel remains active or times out. If the kernel times out during the lengthy training process, you can still deploy the endpoint after training is complete using the training job name with the following code:

from sagemaker.jumpstart.estimator import JumpStartEstimator
training_job_name = <<<INSERT_TRAINING_JOB_NAME>>>

attached_estimator = JumpStartEstimator.attach(training_job_name, model_id)
attached_estimator.logs()
predictor = attached_estimator.deploy()

To find the training job name, navigate to the SageMaker console and under Training in the navigation pane, choose Training jobs. Identify the training job name and substitute it in the preceding code.
Clean up
To prevent incurring unnecessary charges, it’s recommended to clean up the deployed resources when you’re done using them. You can remove the deployed model with the following code:

predictor.delete_predictor()

Conclusion
As generative AI models continue to evolve, their effectiveness hinges on the ability to adapt and specialize for domain-specific applications. Meta’s Llama 3.2 series, with its innovative multimodal features and flexible deployment options, provides a powerful foundation for building tailored AI solutions. By fine-tuning these models using SageMaker JumpStart, organizations can transform generalized capabilities into highly specialized tools, enhancing precision and delivering meaningful results for complex, real-world problems. Whether you’re aiming to improve document analysis, automate visual interpretation, or generate domain-specific content, Meta Llama 3.2 models, fine-tuned to your needs, can bridge the gap between broad AI functionalities and targeted expertise, driving impactful outcomes in your field.
In this post, we discussed fine-tuning Meta Llama 3.2 text generation models using SageMaker JumpStart. We showed that you can use the SageMaker JumpStart console in SageMaker Studio or the SageMaker Python SDK to fine-tune and deploy these models. We also discussed the fine-tuning technique, instance types, and supported hyperparameters. In addition, we outlined recommendations for optimized training based on various tests we carried out.
As shown in the results of fine-tuning the models over two datasets, fine-tuning improves summarization compared to non-fine-tuned models.
As a next step, you can try fine-tuning these models on your own dataset using the code provided in the GitHub repository to test and benchmark the results for your use cases.

About the Authors
Pavan Kumar Rao Navule is a Solutions Architect at Amazon Web Services, where he works with ISVs in India to help them innovate on the AWS platform. He is specialized in architecting AI/ML and generative AI services at AWS. Pavan is a published author for the book “Getting Started with V Programming.” In his free time, Pavan enjoys listening to the great magical voices of Sia and Rihanna.
Jin Tan Ruan is a Prototyping Developer at AWS, part of the AWSI Strategic Prototyping and Customer Engineering (PACE) team, where he focuses on NLP and generative AI. With nine AWS certifications and a robust background in software development, Jin uses his expertise to help AWS strategic customers bring their AI/ML and generative AI projects to life. He holds a Master’s degree in Machine Learning and Software Engineering from Syracuse University. Outside of work, Jin is an avid gamer and a fan of horror films. You can find Jin on LinkedIn to learn more!

Appendix
In this section, we present the results for fine-tuning the Meta Llama 3.2 1B and 3B text generation models on different datasets. This section also covers the dataset formatting for domain adaptation and instruction fine-tuning techniques.
Results for fine-tuning the Meta Llama 3.2 1B text generation model on the Dolly dataset

Each example below is shown in the following order: the input prompt, the ground truth response, the response from the non-fine-tuned model, and the response from the fine-tuned model.

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. ### Instruction: Write down some points on Yugoslav Cup using given paragraph as a base. ### Input: The Yugoslav Cup was a tournament for which clubs from all tiers of the football pyramid were eligible to enter. In addition, amateur teams put together by individual Yugoslav People’s Army garrisons and various factories and industrial plants were also encouraged to enter, which meant that each cup edition could have several thousands of teams in its preliminary stages. These teams would play through a number of qualifying rounds before reaching the first round proper, in which they would be paired with top-flight teams. ### Response:
1. Clubs from all levels of the football pyramid were eligible to participate in the Yugoslav Cup. 2. Additionally, amateur teams assembled by numerous enterprises and industrial plants as well as individual Yugoslav People’s Army garrisons were encouraged to compete, which meant that each cup edition may include thousands of teams in its preliminary rounds. 3. Prior to the first round proper, where they would be partnered with top-flight teams, these teams would compete in a number of qualification rounds.
The Yugoslav Cup was a tournament for which clubs from all tiers of the football pyramid were eligible to enter. In addition, amateur teams put together by individual Yugoslav People’s Army garrisons and various factories and industrial plants were also encouraged to enter, which meant that each cup edition could have several thousands of teams in its preliminary stages. These teams would play through a number of qualifying rounds before reaching the first round proper, in which they would be paired with top-flight teams. ### Input: The
1. The Yugoslav Cup was a tournament for which clubs from all tiers of the football pyramid were eligible to enter. In addition, amateur teams put together by individual Yugoslav People’s Army garrisons and various factories and industrial plants were also encouraged to enter, which meant that each cup edition could have several thousands of teams in its preliminary stages. These teams would play through a number of qualifying rounds before reaching the first round proper, in which they would be paired with top-flight teams.

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. ### Instruction: What role did Danbury, Connecticut play in the American Revolution? ### Input: During the Revolutionary War, Danbury was an important military supply depot for the Continental Army. Sybil Ludington, 16-year-old daughter of American Colonel Henry Ludington, is said to have made a 40-mile ride in the early hours of the night on April 26, 1777, to warn the people of Danbury and her father’s forces in Putnam County, New York, of the approach of British regulars, helping them muster in defense; these accounts, originating from the Ludington family, are questioned by modern scholars. During the following day on April 26, 1777, the British, under Major General William Tryon, burned and sacked Danbury, but fatalities were limited due to Ludington’s warning. The central motto on the seal of the City of Danbury is Restituimus, (Latin for “We have restored”), a reference to the destruction caused by the Loyalist army troops. The American General David Wooster was mortally wounded at the Battle of Ridgefield by the British forces which had raided Danbury, but at the beginning of the battle, the Americans succeeded in driving the British forces down to Long Island Sound. Wooster is buried in Danbury’s Wooster Cemetery; the private Wooster School in Danbury also was named in his honor. ### Response:
Danbury, Connecticut served as an important military depot for the Continental Army during the revolutionary war. In April 26, 1777 the British Army attacked, burned and sacked the city, but the American troop were able to drive the British forces back to the Long Island sound.
Danbury was an important military supply depot for the Continental Army. Sybil Ludington, 16-year-old daughter of American Colonel Henry Ludington, is said to have made a 40-mile ride in the early hours of the night on April 26, 1777, to warn the people of Danbury and her father’s forces in Putnam County, New York, of the approach of British regulars, helping them muster in defense; these accounts, originating from the Ludington family,
During the Revolutionary War, Danbury was an important military supply depot for the Continental Army. Sybil Ludington, 16-year-old daughter of American Colonel Henry Ludington, is said to have made a 40-mile ride in the early hours of the night on April 26, 1777, to warn the people of Danbury and her father’s forces in Putnam County, New York, of the approach of British regulars, helping them muster in defense; these accounts, originating from

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. ### Instruction: Give me a bulleted list of the main responses to “Bad Blood” listed in this text, including whether they were positive or negative. ### Input: “Bad Blood” is a song by American singer-songwriter Taylor Swift, taken from her fifth studio album 1989 (2014). Swift wrote the song with its producers Max Martin and Shellback. The album track is a pop song with stomping drums. A hip hop remix of “Bad Blood”, featuring American rapper Kendrick Lamar and additional production by Ilya, was released as the fourth single from 1989 on May 17, 2015, by Big Machine and Republic Records. The lyrics are about feelings of betrayal by a close friend. Upon the album’s release, critics expressed mixed opinions about “Bad Blood”, with some complimenting Swift’s defiant attitude and dubbing it an album highlight, while others criticized its production and lyrics. The remixed single was praised for the reworked instrumentation and Lamar’s verses, which others considered to be out of place on a pop song. Media outlets speculated that American singer Katy Perry is the subject of the song. The remix received a Grammy nomination for Best Pop Duo/Group Performance. “Bad Blood” was supported by a high-budget music video directed by Joseph Kahn and produced by Swift. It featured an ensemble cast consisting of many singers, actresses and fashion models, which received wide media coverage. Critics praised the video for its cinematic and futuristic visuals inspired by neo-noir styles. It won a Grammy Award for Best Music Video, and two MTV Video Music Awards for the Video of the Year and Best Collaboration. Commercially, “Bad Blood” reached number one in Australia, Canada, New Zealand, and Scotland, as well as the United States, where it topped the Billboard Hot 100, Adult Top 40 and Mainstream Top 40 charts. It has been certified triple platinum in Australia and Canada, and 6× platinum in the US. ### Response:
* positive about Taylor Swift’s defiant attitude * negative about the song’s production and lyrics * positive about the instrumentation and verses * positive about the accompanying music video’s visuals * positive in terms of topping several charts
1. “Bad Blood” is a song by American singer-songwriter Taylor Swift, taken from her fifth studio album 1989 (2014). Swift wrote the song with its producers Max Martin and Shellback. The album track is a pop song with stomping drums. A hip hop remix of “Bad Blood”, featuring American rapper Kendrick Lamar and additional production by Ilya, was released as the fourth single from 1989 on May 17, 2015, by Big Machine and Republic
1. The song is a pop song with stomping drums. 2. A hip hop remix of “Bad Blood”, featuring American rapper Kendrick Lamar and additional production by Ilya, was released as the fourth single from 1989 on May 17, 2015, by Big Machine and Republic Records. 3. The lyrics are about feelings of betrayal by a close friend. 4. Upon the album’s release, critics expressed mixed opinions about “Bad Blood”, with some complimenting Swift’s

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. ### Instruction: What are the highlights of the Apollo 11 mission? ### Input: Apollo 11 (July 16–24, 1969) was the American spaceflight that first landed humans on the Moon. Commander Neil Armstrong and lunar module pilot Buzz Aldrin landed the Apollo Lunar Module Eagle on July 20, 1969, at 20:17 UTC, and Armstrong became the first person to step onto the Moon’s surface six hours and 39 minutes later, on July 21 at 02:56 UTC. Aldrin joined him 19 minutes later, and they spent about two and a quarter hours together exploring the site they had named Tranquility Base upon landing. Armstrong and Aldrin collected 47.5 pounds (21.5 kg) of lunar material to bring back to Earth as pilot Michael Collins flew the Command Module Columbia in lunar orbit, and were on the Moon’s surface for 21 hours, 36 minutes before lifting off to rejoin Columbia. ### Response:
Apollo 11 was the culmination of a multi year space race instigated by the Cold War, leading to the United States landing the first humans on the Moon. The mission had 3 personnel, required multiple years of preparation, and consisted of 8 key phases; launch, flight, lunar orbit, lunar descent, landing, lunar surface operations, lunar ascent, return, splashdown, and Quarantine. The Apollo 11 mission remains the most famous mission ever in the NASA program. The 3 astronauts were Michael Collins, Buzz Aldrin, and Neil Armstrong.
The Apollo 11 mission was a historic event that changed the course of human history. It was the first time humans had landed on the Moon, and it was the first time humans had walked on the Moon. The mission was a success, and it was a major step forward in the exploration of space. The mission was also a major step forward in the exploration of the Moon, and it was a major step forward in the exploration of the Moon, and it was a major step forward in the exploration
The Apollo 11 mission was the first manned mission to land humans on the Moon. The mission was led by Neil Armstrong and Buzz Aldrin, who landed the Apollo Lunar Module Eagle on July 20, 1969, at 20:17 UTC, and Armstrong became the first person to step onto the Moon’s surface six hours and 39 minutes later, on July 21 at 02:56 UTC. Aldrin joined him 19 minutes later, and they spent about two and a

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. ### Instruction: Extract the key elements from the paragraph about Apple and list them in bullet points ### Input: Apple Inc. is an American multinational technology company headquartered in Cupertino, California. Apple is the largest technology company by revenue, totaling US$394.3 billion in 2022. As of March 2023, Apple is the world’s biggest company by market capitalization. As of June 2022, Apple is the fourth-largest personal computer vendor by unit sales and second-largest mobile phone manufacturer. It is one of the Big Five American information technology companies, alongside Alphabet (known for Google), Amazon, Meta (known for Facebook), and Microsoft. Apple was founded as Apple Computer Company on April 1, 1976, by Steve Wozniak, Steve Jobs and Ronald Wayne to develop and sell Wozniak’s Apple I personal computer. It was incorporated by Jobs and Wozniak as Apple Computer, Inc. in 1977. The company’s second computer, the Apple II, became a best seller and one of the first mass-produced microcomputers. Apple went public in 1980 to instant financial success. The company developed computers featuring innovative graphical user interfaces, including the 1984 original Macintosh, announced that year in a critically acclaimed advertisement. By 1985, the high cost of its products, and power struggles between executives, caused problems. Wozniak stepped back from Apple amicably and pursued other ventures, while Jobs resigned bitterly and founded NeXT, taking some Apple employees with him. As the market for personal computers expanded and evolved throughout the 1990s, Apple lost considerable market share to the lower-priced duopoly of the Microsoft Windows operating system on Intel-powered PC clones (also known as “Wintel”). In 1997, weeks away from bankruptcy, the company bought NeXT to resolve Apple’s unsuccessful operating system strategy and entice Jobs back to the company. Over the next decade, Jobs guided Apple back to profitability through a number of tactics including introducing the iMac, iPod, iPhone and iPad to critical acclaim, launching the “Think different” campaign and other memorable advertising campaigns, opening the Apple Store retail chain, and acquiring numerous companies to broaden the company’s product portfolio. When Jobs resigned in 2011 for health reasons, and died two months later, he was succeeded as CEO by Tim Cook. Apple became the first publicly traded U.S. company to be valued at over $1 trillion in August 2018, then $2 trillion in August 2020, and $3 trillion in January 2022. As of January 2023, it was valued at around $2.2 trillion. The company receives criticism regarding the labor practices of its contractors, its environmental practices, and its business ethics, including anti-competitive practices and materials sourcing. Nevertheless, the company has a large following and enjoys a high level of brand loyalty. It is ranked as one of the world’s most valuable brands. ### Response:
1. Apple is an American multination technology company with revenues of over $394.3 billion as of 2022. 2. Apple is the 4th largest personal computer vendor by unit sales and the second-largest mobile phone manufacturer. 3. Apple was founded by Steve Jobs, Steve Wozniak, and Rold Wayne on April 1, 1976. 4. Steve Jobs guided Apple to profitability through a number of tactics including introducing the iMac, iPod, iPhone, and Ipad when it was facing losses. 5. Apple was the first publicly traded US company to be valued at over $1 trillion in August 2018.
1. Apple Inc. is an American multinational technology company headquartered in Cupertino, California. Apple is the largest technology company by revenue, totaling US$394.3 billion in 2022. As of March 2023, Apple is the world’s biggest company by market capitalization. As of June 2022, Apple is the fourth-largest personal computer vendor by unit sales and second-largest mobile phone manufacturer. It is one of the Big Five American information technology companies, alongside Alphabet (known for Google
1. Apple Inc. is an American multinational technology company headquartered in Cupertino, California. 2. Apple is the largest technology company by revenue, totaling US$394.3 billion in 2022. 3. As of March 2023, Apple is the world’s biggest company by market capitalization. 4. As of June 2022, Apple is the fourth-largest personal computer vendor by unit sales and second-largest mobile phone manufacturer. 5. It is one of the Big Five American information

Results for fine-tuning the Meta Llama 3.2 1B text generation model on the Dialogsum dataset

Each example below lists, in order: the input prompt, the ground truth response, the response from the non-fine-tuned model, and the response from the fine-tuned model.

Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation. ### Instruction: #Person1#: Wow, Amy, look at that number. #Person2#: What’s so great about it? #Person1#: What do you mean? It’s a scrim. #Person2#: Not to me, I wouldn’t be caught dead in that. #Person1#: I love that kind of stuff. #Person2#: Yeah, you seem to like those styles. #Person1#: Sure do. I get a kick out of looking punk. #Person2#: This one looks really expensive though. #Person1#: I don’t care how much it costs. I’m rolling in it. #Person2#: Lucky you! I can’t buy anything else. I’m flat broke. #Person1#: I got a replace my party outfit, I trashed it last week. #Person2#: Why did you drag me into this? I’ve spent a week salary. #Person1#: Are you on the level? You’re the one who called me, remember? #Person2#: Just joking, T. I’ve got some wonderful clothes. Let’s go and have a close look at your outfit. ### Response:
#Person1# trashed the party outfit so #Person1# needs a replacement. Amy jokes that she is broke. They then go to see the outfit.
#Person1#: I’m not sure I like it. It’s a little too much. #Person2#: I think it’s great. It’s a little bit of everything. #Person1#: I’m not sure I like it. It’s a little too much. #Person2#: I think it’s great. It’s a little bit of everything. #Person1#: I’m not sure I like it. It’s a little too much. #Person2#: I think it’s great. It’s a little bit of everything. #Person1#: I’m not sure I like it. It’s a little too much. #Person2#: I think it’s great. It’s a little bit of everything. #Person1#: I’m not sure I like it. It’s a little too much. #Person2#: I think it’s great. It’s a little bit of everything. #Person1#: I’m not sure I like it. It’s a little too much. #Person2#: I think it’s great. It’s a little bit of everything. #Person1#: I’m not sure I like it. It’s a little too much. #Person2#: I think it’s great. It’s a little bit of everything. #Person1#: I’m not sure I like it. It’s a little too much. #Person2#: I think it’s great. It’s a little bit of everything. #Person1#:
 #Person1# and Amy are talking about the number of Amy’s outfit. #Person1# loves that kind of stuff and Amy is flat broke.

Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation. ### Instruction: #Person1#: So who does he think he is, anyway, I can’t believe the way that my boss has been ordering everyone around lately, I mean, it’s now like he is the CEO, or anything. #Person2#: Um. . . Actually I am guessing you didn’t get the MEMO. Your boss was selected to fill the vacancy in the CEO slot. He actually is the CEO now, or will be, as soon as the official press releases and hoopla is done with. #Person1#: Yikes! you are joking, right? I can’t believe it. No wonder he is so stressed and bossy lately. I thought he was just being ostentatious by giving orders to people and all the different departments. What big shoes to fill! #Person2#: No kidding! When the last CEO left, we were worried about the future of the company. #Person1#: What does a CEO do anyway? I know the CEO makes the most money in the company, but what does he actually do? #Person2#: He is responsible to the board of directors for everything that happens in the company. He or she must give leadership to all company officers, as CEO is also responsible for providing the guidance of philosophy of the company, and acting as official representative, or face of the company. #Person1#: Must be one smart guy. ### Response:
#Person1# complains about the style of CEO. #Person2# says that the boss is already a CEO candidate. #Person2# answers #Person1#’s question about the CEO’s function in a company.
#Person1#: I think you are right. I mean, I can’t believe that my boss is the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the
 #Person1# thinks #Person1#’s boss is being bossy lately. #Person2# tells #Person1# #Person2#’s boss is the CEO now. #Person1# thinks it’s a big shoes to fill.

Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation. ### Instruction: #Person1#: What does a housing go for in south mountain district? #Person2#: It depends on the type of the house you’re looking for. #Person1#: We’re looking for a three-bedroom home. #Person2#: What’s your general price range? #Person1#: Under 300, 000 RIB, we suppose. #Person2#: Umm, I think I would be hard to find something in that range in this area. You know the environment in the south mountain district is the nicest. #Person1#: What about a two-bedroom house? #Person2#: A two-bedroom house in this area is generally around 250, 000 RIB. #Person1#: I see. #Person2#: I recommend you buy a two-bedroom house. It is very suitable for a couple, and some two-bedroom houses are large enough for a one-child family. #Person1#: Okay, I will take it into consideration. ### Response:
#Person1# wants a three-bedroom home with a price under 300,000 RIB. #Person2# suggests buying a two-bedroom house for a one-child family with around 250, 000 RIB.
#Person1#: What does a housing go for in south mountain district? #Person2#: It depends on the type of the house you’re looking for. #Person1#: We’re looking for a three-bedroom home. #Person2#: What’s your general price range? #Person1#: Under 300, 000 RIB, we suppose. #Person2#: Umm, I think I would be hard to find something in that range in this area. You know the environment in the south mountain district is the nicest. #Person1#: What about a two-bedroom house? #Person2#: A two-bedroom house in this area is generally around 250, 000 RIB. #Person1#: I see. #Person2#: I recommend you buy a two-bedroom house. It is very suitable for a couple, and some two-bedroom houses are large enough for a one-child family. #Person1#: Okay, I will take it into consideration.
#Person1# and #Person2# are looking for a three-bedroom home. #Person2# recommends #Person1# to buy a two-bedroom house.

Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation. ### Instruction: #Person1#: Mr. Sharp, thank you for giving me this interview. So, my first question is, are you happy with your life? #Person2#: Am I happy? You may know you are the first person to have asked me this question directly. The others dared not because they were afraid I might say yes. They wanted me to say no so that they could make up stories about me. #Person1#: What kind of stories? #Person2#: Brian Sharpe, author of over 100 historical novels, lives in a lonely house 30 miles away from Sydney. But is he happy? #Person1#: Is that why you have refused many interviews? #Person2#: I’m tired of being told that I must be unhappy because I’m rich and successful. I’m not unhappy. #Person1#: Understood. And I’m wondering if you follow a similar pattern of writing. #Person2#: I know what you mean. Do I write similar stories? I say no. #Person1#: So, how do you create your heroes? #Person2#: My heroes live? They think for themselves. All their stories are drawn from life, my life, my friends’ lives, including my friends in many books. #Person1#: Sounds good to me. ### Response:
#Person1# interviews Mr. Sharp. Mr. Sharp tells #Person1# he’s happy and he doesn’t write similar stories. His stories are drawn from life.
#Person1#: I’m glad you’re happy. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe
Mr. Sharp is happy with his life. He tells #Person1# he doesn’t follow a similar pattern of writing. He thinks for himself and all his stories are drawn from life.

Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation. ### Instruction: #Person1#: What’s the weather forecast for today? #Person2#: Cloudy in the morning, overcast in the afternoon. #Person1#: Any rain? #Person2#: No, but I wouldn’t be surprised if it rained tomorrow. #Person1#: I wouldn’t either. It’s been looking like rain for several days. ### Response:
#Person2# tells #Person1# the weather forecast for today.
#Person1#: It’s been looking like rain for several days. #Person2#: I wouldn’t be surprised if it rained tomorrow. #Person1#: I wouldn’t either. It’s been looking like rain for several days. #Person2#: It’s been looking like rain for several days. #Person1#: It’s been looking like rain for several days. #Person2#: I wouldn’t be surprised if it rained tomorrow. #Person1#: I wouldn’t either. It’s been looking like rain for several days. #Person2#: It’s been looking like rain for several days. #Person1#: It’s been looking like rain for several days. #Person2#: I wouldn’t be surprised if it rained tomorrow. #Person1#: I wouldn’t either. It’s been looking like rain for several days. #Person2#: It’s been looking like rain for several days. #Person1#: It’s been looking like rain for several days. #Person2#: I wouldn’t be surprised if it rained tomorrow. #Person1#: I wouldn’t either. It’s been looking like rain for several days. #Person2#: It’s been looking like rain for several days. #Person1#: It’s been looking like rain for several days. #Person2#: I wouldn’t be surprised if it rained tomorrow. #Person1#: I wouldn’t either. It’s been looking like rain for several days. #Person2#:
#Person1# and #Person2# talk about the weather forecast.

Results for fine-tuning the Meta Llama 3.2 3B text generation model on the Dolly dataset

Each example below lists, in order: the input prompt, the ground truth response, the response from the non-fine-tuned model, and the response from the fine-tuned model.

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. ### Instruction: Highlight relevant facts related to Heidelberg ### Input: Heidelberg (German: [ˈhaɪdl̩bɛʁk] (listen); Palatine German: Heidlberg) is a city in the German state of Baden-Württemberg, situated on the river Neckar in south-west Germany. As of the 2016 census, its population was 159,914, of which roughly a quarter consisted of students. Located about 78 km (48 mi) south of Frankfurt, Heidelberg is the fifth-largest city in Baden-Württemberg. Heidelberg is part of the densely populated Rhine-Neckar Metropolitan Region. Heidelberg University, founded in 1386, is Germany’s oldest and one of Europe’s most reputable universities. Heidelberg is a scientific hub in Germany and home to several internationally renowned research facilities adjacent to its university, including the European Molecular Biology Laboratory and four Max Planck Institutes. The city has also been a hub for the arts, especially literature, throughout the centuries, and it was designated a “City of Literature” by the UNESCO Creative Cities Network. Heidelberg was a seat of government of the former Electorate of the Palatinate and is a popular tourist destination due to its romantic cityscape, including Heidelberg Castle, the Philosophers’ Walk, and the Baroque old town. ### Response:
Heidelberg has the oldest University in Germany. The city is a scientific hub and home to several internationally renowned research facilities. It was designated a “City of Literature” by the UNESCO Creative Cities Network.
Heidelberg is a city in the German state of Baden-Württemberg, situated on the river Neckar in south-west Germany. As of the 2016 census, its population was 159,914, of which roughly a quarter consisted of students. Located about 78 km (48 mi) south of Frankfurt, Heidelberg is the fifth-largest city in Baden-Württemberg. Heidelberg is part of the densely populated Rhine-Neckar Metropolitan Region.
Heidelberg is a city in the German state of Baden-Württemberg, situated on the river Neckar in south-west Germany. As of the 2016 census, its population was 159,914, of which roughly a quarter consisted of students. Located about 78 km (48 mi) south of Frankfurt, Heidelberg is the fifth-largest city in Baden-Württemberg. Heidelberg is part of the densely populated Rhine-Neckar Metropolitan Region.

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. ### Instruction: What is the Indianapolis 500? ### Input: The Indianapolis 500, formally known as the Indianapolis 500-Mile Race, and commonly called the Indy 500, is an annual automobile race held at Indianapolis Motor Speedway (IMS) in Speedway, Indiana, United States, an enclave suburb of Indianapolis. The event is traditionally held over Memorial Day weekend, usually the last weekend of May. It is contested as part of the IndyCar Series, the top level of American open-wheel car racing, a formula colloquially known as “Indy car racing”. The track itself is nicknamed the “Brickyard”, as the racing surface was paved in brick in the fall of 1909. One yard of brick remains exposed at the start/finish line. The event, billed as The Greatest Spectacle in Racing, is considered part of the Triple Crown of Motorsport along with the 24 Hours of Le Mans and the Monaco Grand Prix, with which it typically shares a date. ### Response:
The Indianapolis 500, formally known as the Indianapolis 500-Mile Race, and commonly called the Indy 500, is an annual automobile race held at Indianapolis Motor Speedway (IMS) in Speedway, Indiana, United States, an enclave suburb of Indianapolis. The event is traditionally held over Memorial Day weekend, usually the last weekend of May. It is contested as part of the IndyCar Series, the top level of American open-wheel car racing, a formula colloquially known as “Indy car racing”. The track itself is nicknamed the “Brickyard”, as the racing surface was paved in brick in the fall of 1909. One yard of brick remains exposed at the start/finish line. The event, billed as The Greatest Spectacle in Racing, is considered part of the Triple Crown of Motorsport along with the 24 Hours of Le Mans and the Monaco Grand Prix, with which it typically shares a date. The inaugural race was held in 1911 and was won by Ray Harroun. The event celebrated its 100th anniversary in 2011, and the 100th running was held in 2016. The event was put on hiatus twice, from 1917 to 1918 due to World War I and from 1942 to 1945 due to World War II. In two different periods, the race was part of FIA World Championships; between 1925 and 1928, the World Manufacturers’ Championship and between 1950 and 1960, the World Drivers’ Championship. Marcus Ericsson is the current champion. The most successful drivers are A. J. Foyt, Al Unser Sr., Rick Mears and Hélio Castroneves, each of whom has won the race four times. The active driver with the most victories is Hélio Castroneves. Rick Mears holds the record for most career pole positions with six. The most successful car owner is Roger Penske, owner of the Indianapolis Motor Speedway and Team Penske, which has 18 total wins and 18 poles. Penske also has five wins at the IndyCar Grand Prix, held on the combined road course. The event is steeped in tradition, in pre-race ceremonies, post-race celebrations, and race procedure. The most noteworthy and most popular traditions are the 33-car field lining up three-wide for the start, the annual singing of the chorus of “Back Home Again in Indiana,” and the victory lane bottle of milk. Also unique is that qualifying requires the driver to complete four, rather than one, timed laps. Qualifying has a separate weekend. The official attendance is not disclosed by Speedway management, but the permanent seating capacity is upwards of 250,000, and infield patrons raise the race-day attendance to approximately 300,000.
The Indianapolis 500 is an annual automobile race held at Indianapolis Motor Speedway (IMS) in Speedway, Indiana, United States, an enclave suburb of Indianapolis. The event is traditionally held over Memorial Day weekend, usually the last weekend of May. It is contested as part of the IndyCar Series, the top level of American open-wheel car racing, a formula colloquially known as “Indy car racing”. The track itself is nicknamed the “Brickyard”, as the racing surface was paved in
The Indianapolis 500 is an annual automobile race held at Indianapolis Motor Speedway (IMS) in Speedway, Indiana, United States, an enclave suburb of Indianapolis. The event is traditionally held over Memorial Day weekend, usually the last weekend of May. It is contested as part of the IndyCar Series, the top level of American open-wheel car racing, a formula colloquially known as “Indy car racing”. The track itself is nicknamed the “Brickyard”, as the racing surface was paved in

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. ### Instruction: What are some interesting facts about Nelson Mandel Boulevard described on this paragraph? ### Input: Nelson Mandela Boulevard (Persian: بلوار نلسون ماندلا) (old name: Jordan Street and Africa Boulevard) still known as Jordan is an affluent and upper-class district in northern Tehran, and some people draw similarities between Jordan District as being much alike Kensington District in London, UK as the area is a mixture of residential and commercial locale, filled with the homes and businesses of many politicians, diplomats, expatriates, and artists. Before the Iranian Revolution in 1979 it was called Jordan street, named after the American presbyterian missionary Samuel M. Jordan and used to be one of the most popular avenue in Tehran, Iran. Renamed Nelson Mandela Boulevard in recent years, it is amongst the most famous streets in north Tehran after Valiasr Street which is the longest conventional street in the Middle East, spanning a south–north axis. It is also famous for being one of the liveliest streets of Tehran, experiencing regular traffic jams even at 2:00A.M during Summer. ### Response:
It holds a mixture of residential and commercial locale from upper-class citizens, like politicians, diplomats, expatriates and artists. It used to be called Jordan street before the Iranian revolution. It’s amongst the most famous streets of Tehran, experiencing regular traffic jams even at 2:00 AM during Summer.
Nelson Mandela Boulevard is a street in Tehran, Iran. It is one of the most famous streets in north Tehran after Valiasr Street which is the longest conventional street in the Middle East, spanning a south–north axis. It is also famous for being one of the liveliest streets of Tehran, experiencing regular traffic jams even at 2:00A.M during Summer. The street is named after the South African anti-apartheid revolutionary and former President of South Africa, Nelson Mandela.
1. Nelson Mandela Boulevard is a street in northern Tehran, Iran. 2. It was named after the American presbyterian missionary Samuel M. Jordan. 3. It was renamed Nelson Mandela Boulevard in recent years. 4. It is amongst the most famous streets in north Tehran. 5. It is also famous for being one of the liveliest streets of Tehran.

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. ### Instruction: What are the Wimbledon Championships? ### Input: The Wimbledon Championships, commonly known simply as Wimbledon, is the oldest tennis tournament in the world and is widely regarded as the most prestigious. It has been held at the All England Club in Wimbledon, London, since 1877 and is played on outdoor grass courts, with retractable roofs over the two main courts since 2019. ### Response:
The Wimbledon Championships, commonly known simply as Wimbledon, is the oldest tennis tournament in the world and is widely regarded as the most prestigious. It has been held at the All England Club in Wimbledon, London, since 1877 and is played on outdoor grass courts, with retractable roofs over the two main courts since 2019. Wimbledon is one of the four Grand Slam tennis tournaments, the others being the Australian Open, the French Open, and the US Open. Wimbledon is the only major still played on grass, the traditional tennis playing surface. Also, it is the only Grand Slam that retains a night-time curfew, though matches can now continue until 11.00 pm under the lights. The tournament traditionally takes place over two weeks in late June and early July, starting on the last Monday in June and culminating with the Ladies’ and Gentlemen’s Singles Finals, scheduled for the Saturday and Sunday at the end of the second week. Five major events are held each year, with additional junior and invitational competitions also taking place. In 2009, Wimbledon’s Centre Court was fitted with a retractable roof to lessen the loss of playing time due to rain. A roof was operational over No. 1 Court from 2019, when a number of other improvements were made, including adding cushioned seating, a table and 10 independently operable cameras per court to capture the games. Wimbledon traditions include a strict all-white dress code for competitors, and royal patronage. Strawberries and cream are traditionally consumed at the tournament. Unlike other tournaments, advertising is minimal and low key from official suppliers such as Slazenger and Rolex. The relationship with Slazenger is the world’s longest-running sporting sponsorship, providing balls for the tournament since 1902. Due to the COVID-19 pandemic, 2020 Wimbledon was cancelled, the first cancellation of the tournament since World War II. The rescheduled 134th edition was staged from 28 June 2021 to 11 July 2021, following from the 2020 cancellation. The 135th edition was played between 27 June 2022 and 10 July 2022, and regularly scheduled play occurred on the middle Sunday for the first time. It marks the centenary of the inaugural championships staged at the Centre Court. The ATP, ITF, and WTA did not award ranking points for the 2022 tournament, due to controversy over the tournament excluding players representing Russia and Belarus. The 2023 Wimbledon Championships will be the 136th staging and will run from 3 July 2023 to 16 July 2023 and it will be the first event of King Charles III since the death of the former patron, Queen Elizabeth II on 8 September 2022.
The Wimbledon Championships, commonly known simply as Wimbledon, is the oldest tennis tournament in the world and is widely regarded as the most prestigious. It has been held at the All England Club in Wimbledon, London, since 1877 and is played on outdoor grass courts, with retractable roofs over the two main courts since 2019. The tournament consists of five men’s singles events and five women’s singles events. The men’s singles is the most prestigious of the five men’s singles events, and
The Wimbledon Championships, commonly known simply as Wimbledon, is the oldest tennis tournament in the world and is widely regarded as the most prestigious. It has been held at the All England Club in Wimbledon, London, since 1877 and is played on outdoor grass courts, with retractable roofs over the two main courts since 2019.

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. ### Instruction: What are the different type of coats of dogs? ### Input: The coats of domestic dogs are of two varieties: “double” being familiar with dogs (as well as wolves) originating from colder climates, made up of a coarse guard hair and a soft down hair, or “single”, with the topcoat only. Breeds may have an occasional “blaze”, stripe, or “star” of white fur on their chest or underside. Premature graying can occur in dogs from as early as one year of age; this is associated with impulsive behaviors, anxiety behaviors, fear of noise, and fear of unfamiliar people or animals. ### Response:
The coat of a dog varies depending on its breed. Typically a dog could have either a “double” coat or a “single” coat. “Double” coats are originated from breeds that live in colder climates. This type of coat consist of a coarse guard hair and a soft down hair. Dogs with “single” coat only has the topcoat.
The coats of domestic dogs are of two varieties: “double” being familiar with dogs (as well as wolves) originating from colder climates, made up of a coarse guard hair and a soft down hair, or “single”, with the topcoat only. Breeds may have an occasional “blaze”, stripe, or “star” of white fur on their chest or underside. Premature graying can occur in dogs from as early as one year of age; this is associated with impulsive behaviors
There are two types of coats of dogs: double and single. Double coats are made up of a coarse guard hair and a soft down hair. Single coats only have a topcoat.

Results for fine-tuning the Meta Llama 3.2 3B text generation model on the Dialogsum dataset

Each example below lists, in order: the input prompt, the ground truth response, the response from the non-fine-tuned model, and the response from the fine-tuned model.

Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation. ### Instruction: #Person1#: Farmington? Would you take a look at these catalogue ads that Roger did? I don’t think we can use them. #Person2#: Hmm. I see what you mean. Have you spoken with Roger? He usually does better work than this. #Person1#: I know, Mr. Farmington. But, I’m not sure that I should bring this up with him. I think it would be better if you spoke to him. #Person2#: All right. Leave these ads with me. I’ll speak to him this morning. ### Response:
#Person1# and Farmington feel dissatisfied with the ads that Roger did. Farmington will speak to Roger.
The conversation between Person1 and Person2 revolves around a set of catalogue ads that Person1 has found. Person1 expresses their concerns about the quality of the ads, suggesting that they may not be suitable for use. Person2 agrees with Person1’s concerns and suggests that they should speak with Roger, the person responsible for creating the ads. Person2 offers to take a look at the ads and speak with Roger in the morning, indicating that they will address the issue. The conversation is focused on finding a solution to the problem with the catalogue ads.
#Person1# thinks the catalogue ads are not good. Mr. Farmington will speak to Roger.

Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation. ### Instruction: #Person1#: May I help you? #Person2#: My daughter. She is missing. I don’t know where she is. #Person1#: What dose she look like? #Person2#: She has blond hair and blue eyes. #Person1#: What was she wearing? #Person2#: She has a yellow dress on and red sneakers. #Person1#: When did you last see her? #Person2#: I just saw her down the street. I don’t know where she is. I don’t know. . . #Person1#: How long has it been? #Person2#: Oh, it’s been uh. . . fifteen minutes. ### Response:
#Person2# comes to #Person1# to report that #Person2#’s daughter is missing. #Person1# asks about her appearance and whereabouts.
The conversation between Person1 and Person2 revolves around a missing child. Person2 reports that their daughter is missing, and Person1 offers to help. Person2 describes the child’s appearance, including her blond hair, blue eyes, yellow dress, and red sneakers. Person1 asks about the last time Person2 saw the child and when it was. Person2 mentions that they saw the child down the street, but are unsure of her current location. Person1 presses for more information, asking how long it has been since Person2 last saw the child. Person2 estimates that it has been 15 minutes. The conversation is a search for information and a possible lead to locate the missing child.
#Person2# tells #Person1# that #Person2#’s daughter is missing.

Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation. ### Instruction: #Person1#: Hey, Ray, what are you doing right now? #Person2#: Not much. Joann. Do you want to hang out? #Person1#: Yes, I do. I’m at home myself right now with nothing much to do. #Person2#: Me, too. What would you like to do? #Person1#: Well, we could go to a movie. Have you see Shrek 2? #Person2#: I have, actually. How about the movie, Million Dollar Baby with Clint Eastwood? #Person1#: Hmm. . . I’Ve seen that, too. We could go for a walk in the park. #Person2#: We could, but it looks like it’s going to rain soon. #Person1#: I guess that’s out. Why don’t we go shopping? #Person2#: I’d really rather not. I’ll be too tempted to buy something that I can’t afford! #Person1#: Ok. I guess we’d be not do that, either, then. What do you want to do? #Person2#: I don’t know. Do you want to go bowling at the new bowling alley on 1st street? #Person1#: I would like to, but I injured my wrist last weekend. #Person2#: Let’s see. Do you want to go to a bar and sing some karaoke? #Person1#: That sounds interesting, but I can’t drink alcohol with the medicine I’m taking. #Person2#: Doesn’t sound like that’s be fun, then. Do you want to come here? #Person1#: I don’t have a car. #Person2#: Goodness, we’re not having much luck here, are we? #Person1#: No. do you have a car? #Person2#: No, but I could take the bus. #Person1#: Oh, good. Why don’t you take the bus to mine and we could play Mah Tiang? #Person2#: That sounds great! Have you eaten? #Person1#: Not yet. I’ll cook dinner and have it ready by the time you get here. #Person2#: Perfect. Are you prepared to give me all your money? #Person1#: Huh? We are going to play for money, aren’t we? #Person2#: Oh, right. I suppose so. As long as you promise to lose! #Person1#: Hey. . . as long as I’m not alone on a Saturday night, I don’t really care what I do! #Person2#: See you soon. #Person1#: Bye! ### Response:
Ray and Joann have nothing much to do at home and decide to go out. They discuss some choices, including watching a movie, going shopping, and going to a bar. But finally, Ray decides to go to Joann’s home. They will have dinner and play Mah Tiang together.
Ray and Joann were discussing what to do on a Saturday night. They initially considered going to a movie, but decided against it because of the rain. They also ruled out going shopping because Joann was worried about overspending. Ray suggested bowling, but Joann’s injured wrist made that impossible. They then considered going to a bar to sing karaoke, but Ray couldn’t drink due to his medication. Joann suggested coming to Ray’s place, but he didn’t have a car. They decided to play Mahjong instead, and Joann asked if Ray had eaten. Finally, they agreed to play a game of Mahjong for money, with Joann hoping to win. The conversation ended with Ray joking that he didn’t care what he did as long as he wasn’t alone on a Saturday night.
#Person1# and Ray are looking for something to do. They try to decide on a movie, a walk, shopping, bowling, karaoke, and Mah Tiang. Finally, they decide to play Mah Tiang.

Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation. ### Instruction: #Person1#: The weather in Beijing is just beautiful! #Person2#: Autumn is the best season in Beijing, you know. #Person1#: Yes, I can see. The summer heat is over, and the winter cold is still far away. #Person2#: What is the weather like in Washington at this time of this year? #Person1#: Autumn is also very nice in Washington. #Person2#: Is it very cold in winter there? #Person1#: No, it isn’t. But sometimes it snows a lot in winter. #Person2#: Does it rain a lot in summer there? #Person1#: Oh yes! Sometimes it rains very heavily. How about here in Beijing? #Person2#: There’s a lot of rain in summer, but it seldom rains in autumn. #Person1#: I feel it’s rather dry here. #Person2#: That’s true. Many people from the south don’t like the weather here. #Person1#: Are you from the south? #Person2#: No, I’m from the north. I was born in Beijing, actually. #Person1#: I see. so this is your hometown? ### Response:
#Person2# describes the weather of summer, autumn, and winter in Beijing and #Person1# talks about how the weather is like in Washington.
Person1 and Person2 are discussing the weather in Beijing and Washington. They both agree that autumn is a great season in both cities. Person1 mentions that the summer heat is over and the winter cold is still far away, which suggests that the weather in Beijing is mild in autumn. Person2 asks about the weather in Washington, and Person1 responds that autumn is also nice there. They discuss the weather in Washington, with Person2 asking about the cold in winter and Person1 saying it’s not very cold, but sometimes it snows. Person2 then asks about the rain in summer, and Person1 says it rains heavily. Person2 mentions that Beijing has a lot of rain in summer, but not in autumn, which Person1 agrees with. Person1 mentions that Beijing is dry in autumn, and Person2 says that many people from the south don’t like the weather there. Person1 asks if Person2 is from the south, and Person2 responds that they are from the north, born in Beijing. Person1 then asks if Beijing is Person2’s hometown, and Person2 confirms that it is. Note: The response is a summary of the conversation, not a rephrased version of the original text. It aims to capture the main points and ideas discussed between the two people. Here is the response in a more formal tone: The conversation between Person1 and Person2 revolves around the weather in Beijing and Washington. They both appreciate the beauty of autumn in both
#Person1# and #Person2# talk about the weather in Beijing and Washington. #Person2# is from the north and was born in Beijing.

Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation. ### Instruction: #Person1#: What are you doing this weekend? #Person2#: My brother-in-law is having a small get-together at his house and he invited me. #Person1#: Is it a family thing or just friends? #Person2#: A bit of both. Some cousins, aunts and uncles will be there, but also some friends from the neighborhood. #Person1#: Is your great uncle Rick going to be there? He is really funny. #Person2#: Yeah he is going to be there with his step-son and his ex-wife. #Person1#: You mean your sister? #Person2#: No, Rick is actually my great uncle, so he is my grandmother’s brother. #Person1#: You lost me. #Person2#: I’ll explain later, let’s go. ### Response:
#Person2#’s brother-in-law invites #Person2# to attend a small get-together with #Person2#’s relatives and friends. #Person1# is confused about #Person2#’s family relationship.
Person1 and Person2 are discussing a weekend gathering at Person2’s brother-in-law’s house. The event is a mix of family and friends, with some cousins, aunts, uncles, and neighbors attending. Person1 asks if Person2’s great uncle Rick, who is known for being funny, will be there. Person2 confirms that Rick will be attending, along with his step-son and ex-wife, who is Person2’s sister. Person1 seems confused about the family relationship, and Person2 agrees to explain later. The conversation is light-hearted and casual, with a focus on the upcoming social gathering.
#Person2# will attend a small get-together at #Person2#’s brother-in-law’s house. #Person1# asks #Person2# about the people who will be there.

Dataset formatting
We currently offer two types of fine-tuning: instruction fine-tuning and domain adaptation fine-tuning. You can switch between the two training methods by setting the parameter instruction_tuned to True or False.
Domain adaptation format
The text generation model can be fine-tuned on any domain-specific dataset to incorporate domain-specific knowledge and language patterns. After fine-tuning on the domain-specific dataset, the model is expected to generate more relevant and accurate text within that domain. Although few-shot prompting can also guide the model towards domain-specific generation, the fine-tuning process plays a crucial role in adapting the model’s understanding and generation capabilities to the target domain. The combination of fine-tuning on domain data and effective prompting techniques can enable the model to perform various NLP tasks within that specific domain more effectively.
For input to the model, use a training directory and an optional validation directory. Each directory contains a CSV, JSON, or TXT file. For CSV and JSON files, the train or validation data is taken from the column called text, or from the first column if no column called text is found. The train directory (and the validation directory, if provided) should each contain exactly one file.
The output is a trained model that can be deployed for inference.
The following is an example of a TXT file for fine-tuning the text generation model. The TXT file contains SEC filings of Amazon from 2021–2022:

This report includes estimates, projections, statements relating to our business plans, objectives,
and expected operating results that are “forward-looking statements” within the meaning of the Private
Securities Litigation Reform Act of 1995, Section 27A of the Securities Act of 1933, and Section 21E
of the Securities Exchange Act of 1934. Forward-looking statements may appear throughout this report,
including the following sections: “Business” (Part I, Item 1 of this Form 10-K), “Risk Factors”
(Part I, Item 1A of this Form 10-K), and “Management’s Discussion and Analysis of Financial Condition
and Results of Operations” (Part II, Item 7 of this Form 10-K). These forward-looking statements
generally are identified by the words “believe,” “project,” “expect,” “anticipate,” “estimate,”
“intend,” “strategy,” “future,” “opportunity,” “plan,” “may,” “should,” “will,” “would,”
“will be,” “will continue,” “will likely result,” and similar expressions. Forward-looking
statements are based on current expectations and assumptions that are subject to
risks and uncertainties that may cause actual results to differ materially.
We describe risks and uncertainties that could cause actual results and
events to differ materially in “Risk Factors,” “Management’s Discussion and
Analysis of Financial Condition and Results of Operations,” and “Quantitative
and Qualitative Disclosures about Market Risk” (Part II, Item 7A of this Form 10-K).
Readers are cautioned not to place undue reliance on forward-looking statements,
which speak only as of the date they are made. We undertake no obligation
to update or revise publicly any forward-looking statements, whether because
of new information, future events, or otherwise. GENERAL Embracing Our Future …
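As an illustration, a domain adaptation training file like the one above could be staged in Amazon S3 as follows. This is a minimal sketch: the sec_filings_text variable and the key prefix are placeholders, and the resulting S3 URI would be passed as the training channel when you launch fine-tuning.

import sagemaker

session = sagemaker.Session()

# Write the domain-specific text (for example, SEC filings) to a single TXT file.
with open("train.txt", "w") as f:
    f.write(sec_filings_text)  # placeholder variable holding your domain text

# Upload the file; the returned S3 URI is used as the training channel.
train_data_s3_uri = session.upload_data(path="train.txt", key_prefix="llama-3-2-domain-adaptation/train")
print(train_data_s3_uri)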

Instruction fine-tuning
The text generation model can be instruction-tuned on any text data, provided that the data is in the expected format. The instruction-tuned model can be further deployed for inference. By default, instruction tuning is set to False. Therefore, to use an instruction tuning dataset, you set instruction_tuned="True".
For input, you can use a training and optional validation directory. The training and validation directories should contain one or multiple JSON lines (.jsonl) formatted files. In particular, the train directory can also contain an optional *.json file describing the input and output formats.
The best model is selected according to the validation loss, calculated at the end of each epoch. If a validation set is not given, an (adjustable) percentage of the training data is automatically split and used for validation.
The training data must be formatted in a JSON lines (.jsonl) format, where each line is a dictionary representing a single data sample. All training data must be in a single folder; however, it can be saved in multiple .jsonl files. The .jsonl file extension is mandatory. The training folder can also contain a template.json file describing the input and output formats. If no template file is given, the following template will be used:

{
    "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n",
    "completion": "{response}"
}
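For instance, a single training line paired with this default template might look like the following; the values are purely illustrative and drawn from the Wimbledon example earlier in this appendix:

{"instruction": "What are the Wimbledon Championships?", "context": "The Wimbledon Championships, commonly known simply as Wimbledon, is the oldest tennis tournament in the world.", "response": "Wimbledon is the oldest and most prestigious tennis tournament in the world, held at the All England Club in London since 1877."}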

In this case, the data in the JSON lines entries must include the instruction, context, and response fields referenced by the template. If a custom template is provided, it must also use prompt and completion keys to define the input and output templates. The following is a sample custom template:

{
    "prompt": "question: {question} context: {context}",
    "completion": "{answer}"
}

Here, the data in the JSON lines entries must include the question, context, and answer fields.
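For example, a single line of a .jsonl training file that matches this custom template might look like the following; the values are purely illustrative, based on the Yugoslav Cup paragraph earlier in this appendix:

{"question": "What was the Yugoslav Cup?", "context": "The Yugoslav Cup was a tournament for which clubs from all tiers of the football pyramid were eligible to enter.", "answer": "A football tournament open to clubs from all tiers of the Yugoslav football pyramid."}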
The output is a trained model that can be deployed for inference.
We provide a subset of Amazon's SEC filings data, downloaded from the publicly available EDGAR system. For instructions on accessing the data, refer to Accessing EDGAR Data.
License: Creative Commons Attribution-ShareAlike License (CC BY-SA 4.0)

Discover insights with the Amazon Q Business Microsoft Teams connector

Microsoft Teams is an enterprise collaboration tool that allows you to build a unified workspace for real-time collaboration and communication, meetings, and file and application sharing. You can exchange and store valuable organizational knowledge within Microsoft Teams.
Microsoft Teams data is often siloed across different teams, channels, and chats, making it difficult to get a unified view of organizational knowledge. Also, important information gets buried in lengthy chat threads or lost in channel backlogs over time.
You can use Amazon Q Business to solve those challenges. Amazon Q Business is a generative AI-powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. It empowers employees to be more creative, data-driven, efficient, prepared, and productive.
Integrating Amazon Q with Microsoft Teams enables you to index all disparate data into a single searchable repository. You can use natural language capabilities to ask questions to surface relevant insights from Microsoft Teams data. With Amazon Q, you don’t have to constantly switch between different Microsoft Teams workspaces and apps to find information. You can query for Microsoft Teams data alongside other enterprise data sources from one interface with proper access controls.
In this post, we show how to connect your Microsoft Teams with Amazon Q using the Amazon Q Business Microsoft Teams connector. We also walk through the connector’s capabilities and common challenges faced when setting it up.
Overview of the Amazon Q Business Microsoft Teams connector
A data source connector is a mechanism for integrating and synchronizing data from multiple repositories into one container index. When you use the data source connector, Amazon Q will have its own index where you can add and sync documents. A document is a unit of data, and how documents are counted varies by connector. Amazon Q automatically maps built-in fields to attributes in your data source when it crawls and indexes documents. If a built-in field doesn’t have a default mapping, or if you want to map additional index fields, custom field mappings can help you specify how a data source attribute maps to your Amazon Q application. For a Microsoft Teams data source, Amazon Q supports the following document types:

Chat messages – Each chat message is a single document
Chat attachments – Each chat attachment is a single document
Channel posts – Each channel post is a single document
Channel wikis – Each channel wiki is a single document
Channel attachments – Each channel attachment is a single document
Meeting chats – Each meeting chat is a single document
Meeting files – Each meeting file is a single document
Meeting notes – Each meeting note is a single document
Calendar meeting (meeting detail) – Each calendar meeting is a single document

Refer to Microsoft Teams data source connector field mappings for which fields are supported for each supported data type. You can also see Supported document formats in Amazon Q Business to understand which document formats (such as CSV and PDF) are supported for files.
The Amazon Q Business Microsoft Teams connector supports OAuth 2.0 with Client Credentials Flow to authenticate Amazon Q to access your Microsoft Teams instance. Amazon Q requires your Microsoft Teams client ID and client secret to be stored in AWS Secrets Manager.
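If you prefer to create that secret programmatically rather than through the console, a minimal sketch with boto3 might look like the following; the secret name and JSON key names are illustrative assumptions, and the Amazon Q console can also create the secret for you during data source setup.

import boto3
import json

secretsmanager = boto3.client("secretsmanager")

# Store the Microsoft Teams client ID and client secret created in the Azure portal.
# The secret name and key names below are placeholders for illustration.
secretsmanager.create_secret(
    Name="QBusiness-MicrosoftTeams-secret",
    SecretString=json.dumps({
        "clientId": "<your-azure-application-client-id>",
        "clientSecret": "<your-azure-client-secret-value>",
    }),
)
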
Amazon Q crawls access control lists (ACLs) and identity information for authorization. Amazon Q indexes the ACL information that’s attached to a document along with the document itself. The information includes the user email address and the group name for the local group or federated group. Then, Amazon Q filters chat responses based on the end-user’s access to documents. Your Amazon Q users can only access the documents that they have permission to access in Microsoft Teams. An Amazon Q Business connector updates the changes in the ACLs each time your data source content is crawled.
Overview of solution
The following diagram illustrates the solution architecture. In our solution, we configure Microsoft Teams as a data source for an Amazon Q application using the Amazon Q Business Microsoft Teams connector. Amazon Q uses credentials stored in Secrets Manager to access Microsoft Teams. Amazon Q crawls and indexes the documents and ACL information. The user is authenticated by AWS IAM Identity Center. When a user submits a query to the Amazon Q application, Amazon Q retrieves the user and group information and provides answers based on documents that the user has access to.

Prerequisites
Before you set up the Amazon Q Business Microsoft Teams connector, complete the following prerequisite steps in Microsoft Teams.
First, prepare Microsoft users that have the Microsoft Teams license attached. You can do this through the Microsoft 365 admin center by referring to Assign licenses by using the Licenses page. If you don’t have a Microsoft user account yet, see Add users and assign licenses at the same time.
Next, prepare the Microsoft 365 tenant ID and OAuth 2.0 credentials containing a client ID, client secret, user name, and password, which are required to authenticate Amazon Q to access Microsoft Teams.

Create a Microsoft Teams account in Microsoft 365. For instructions, refer to How do I get Microsoft Teams?
Register an application in the Microsoft Azure Portal:

Log in to the Microsoft Azure Portal with your Microsoft credentials.
On the App registrations page, choose New Registration to register an application. For instructions, refer to Quickstart: Register an application with the Microsoft identity platform.
Copy your Microsoft 365 tenant ID and client ID. You can find them on the overview page of your application.

Create your credentials:

In the Certificates & secrets section of your application page, choose New Client Secret.
Complete the Description and Expires fields and choose Add.
Save the secret ID and secret value to use them later when you configure the Amazon Q Business Microsoft Teams connector.

Make sure you saved the secret value before moving on to other pages. The value is only visible when you create the secret.

Add necessary permissions:

In the API Permissions section of your application page, choose Add a Permission.
Choose Microsoft Graph to add the necessary permissions.
Select your necessary permissions. Refer to Prerequisites for connecting Amazon Q Business to Microsoft Teams for the list of required permissions for Amazon Q to access each document type of Microsoft Teams. Also, review Microsoft Graph permissions reference to understand the scope of each permission.
Choose Add permissions, and confirm that you successfully added the necessary permissions.

After you successfully configure the application in the Azure AD portal, you can add some test data in your Microsoft Teams account:

Log in to Microsoft Teams with your Microsoft Teams user account.
Add some sample data in the Microsoft Teams chat, calendar, and wiki.

The following screenshot shows an example of information added to the Microsoft Teams chat.

The following screenshot shows an example of information added to the Microsoft Teams calendar.

Create an Amazon Q Business application
An Amazon Q application is the primary resource that you will use to create a chat solution. Complete the following steps to create the application:

On the Amazon Q Business console, choose Applications in the navigation pane.
Choose Create application.
For Application name, enter a name for your application.
For Access management method, choose AWS IAM Identity Center.
For Quick start user, choose the users you will give access to this application:

If users are not created yet in your IAM Identity Center, choose Add new users and groups, and Add and assign new users.
Choose Add new users; enter values for Username, First name, Last name, and Email address; and choose Next. This user name must be the same as your Microsoft Teams user name.
Choose Add, then Assign.

For Select subscription, choose your preferred Amazon Q subscription plan for users. For this post, we choose Q Business Lite. Refer to Amazon Q Business pricing to understand the differences between Q Business Lite and Q Business Pro.
For Application details, leave it as the default setting.
Choose Create.

Create and configure a Microsoft Teams data source
Complete the following steps to set up your data source:

Choose Data sources in the navigation pane on your application page.
Choose Select retriever:

For Retrievers, choose Native.
For Index provisioning, choose the model that fits your application needs. For this post, choose Starter.
For Number of units, enter 1. Each unit is 20,000 documents or 200 MB, whichever comes first. Refer to the document type table discussed in the solution overview to understand how a document is counted for Microsoft Teams data, and set the appropriate units for the data volume of your Microsoft Teams account.
Choose Confirm.

Choose Add data source on the Data sources page.
Choose Microsoft Teams.
In the Name and description section, enter a name and description for your data source.
In the Source section, for Tenant ID, enter the tenant ID you saved in the prerequisite steps. Your Microsoft tenant ID is different from your organization name or domain.
In the Authorization section, for Manage ACLs, choose Enable ACLs.

After you enable ACLs, the data source needs to be deleted and recreated to disable ACLs.

In the Authentication section, for AWS Secrets Manager secret, choose your Secrets Manager secret that stores your Microsoft Teams client ID and client secret. If you don’t have one, choose Create and add new secret and provide that information.
For Payment model, choose a licensing and payment model for your Microsoft Teams account.

Some Microsoft Teams APIs in Microsoft Graph can choose a licensing and payment model using the model query parameter. Refer to Payment models and licensing requirements for Microsoft Teams APIs for more details.

In the Configure VPC and security group section, choose your resources if you want to use a virtual private cloud (VPC).
In the IAM role section, create a new service role to access your repository credentials and index content or choose an existing IAM role.
In the Sync scope section, provide the following information to configure the sync scope for your setup. These settings will significantly affect your crawling and indexing time.

For Sync contents, select the content to sync.
Enter a value for Maximum file size.

Under Additional configuration, provide the following optional information:

For Calendar crawling, enter the date range for which the connector will crawl your calendar content.
For User email, enter the user emails you want to include in your application.
For Team names, add patterns to include or exclude teams found in Microsoft Teams from your application.
For Channel names, add patterns to include or exclude channels found in Microsoft Teams from your application.
For Attachment regex patterns, add regular expression patterns to include or exclude certain attachments for all supported entities. You can add up to 100 patterns.

In the Sync mode section, select how you want to update your index when your data source content changes. We recommend using New, modified, or deleted content sync to only sync new, modified, or deleted content, and shorten the time of the data sync.
In the Sync run schedule section, choose how often Amazon Q will sync with your data source. For details, see Sync run schedule.
In the Tags section, you can optionally add tags.
Choose Add data source.
Navigate to Data source details and choose Sync now to begin crawling and indexing data from your data source.

When the sync job finishes, your data source is ready to use.
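If you want to trigger or script the sync outside the console, the same action can be started through the Amazon Q Business API; a rough sketch with boto3 is shown below, where the application, index, and data source IDs are placeholders you would copy from your own environment.

import boto3

qbusiness = boto3.client("qbusiness")

# IDs below are placeholders; copy them from your Amazon Q Business application
qbusiness.start_data_source_sync_job(
    applicationId="<your-application-id>",
    indexId="<your-index-id>",
    dataSourceId="<your-data-source-id>",
)
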
Run sample queries
When your data sync is complete, you can run some queries through the Amazon Q web experience.

On the application details page, navigate to the Web experience settings section and choose the link for Deployed URL.
Sign in with your IAM Identity Center user name and password (plus multi-factor authentication codes if you configured them). If this is your first time logging in, find the invitation email in your inbox and set up a password by following the instructions in the prompt.
Enter your queries in the Amazon Q prompt.

The following screenshots show some example queries.
Index aggregated Teams channel posts
With the recent enhancement, Amazon Q Business can now aggregate channel posts into a single document. This improves accuracy and maximizes the use of an index unit.
The following screenshots show a channel post that takes the form of an original post by a user and other users responding, and a sample query for the information on the post. The Teams connector aggregates this post thread as a single document.

Troubleshooting and frequently asked questions
In this section, we discuss some common issues and how to troubleshoot.
Amazon Q Business isn’t answering any questions
The most common reasons are that your documents haven’t been indexed successfully or your Amazon Q user doesn’t have access to them. Review the error message in the Sync run history section on your data source details page. Amazon CloudWatch Logs are also available for you to investigate document-level errors. For user permissions, make sure you logged in with the correct Amazon Q user, and check that the user name matches the user name in Microsoft Teams. If you still see the issue, open an AWS Support case to investigate further.
The connector is unable to sync or the document isn’t indexed
This could happen due to a few reasons. A synchronization job typically fails when there is a configuration error in the index or the data source. The following are common scenarios:

Your IAM role attached to your connector doesn’t have enough permission to access the required AWS services (for example, Secrets Manager). We recommend creating a new service role for your connector.
Your connector doesn’t have the correct credentials to access Microsoft Teams. Review the Microsoft tenant ID, client ID, and client secrets provided to your connector.
The payment and license model you chose for your connector doesn’t match the required license to call some Microsoft Teams APIs. Review your license and try different ones.
Your Amazon Q application has reached the maximum limit to ingest documents. Increase the number of units for index provisioning in your Amazon Q application.
Your Microsoft Graph API calls during your sync might have temporarily faced throttling limits on the number of concurrent calls to a service to prevent overuse of resources. Adjust your sync scope and sync mode of your data source connector to reduce the number of operations per request.

The data source contents are updated, but Amazon Q Business answers using old data
Your Amazon Q index might not have the latest data yet. Make sure you chose the right sync schedule. If you need to immediately sync the data, choose Sync now.
How to determine if the reason you can’t see answers is due to ACLs
Run the same query from two different users who have different ACL permissions in Microsoft Teams.
How to sync documents without ACLs
For the Microsoft Teams connector, you have the option to disable ACLs when you create a data source. When ACLs are disabled for a data source, all documents ingested by the data source become accessible to all end-users of the Amazon Q Business application. To turn off ACLs, you need to be granted the DisableAclOnDataSource IAM action. If ACLs are disabled during creation, you can enable them at a later time; however, after you enable ACLs, they can’t be turned off directly. To disable ACLs again, you need to delete and recreate the data source. Refer to Set up required permissions for more detail.
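As a rough illustration, an identity-based policy statement granting that action might look like the following; the resource ARN is a placeholder, and you should follow Set up required permissions for the exact policy your setup needs.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "qbusiness:DisableAclOnDataSource",
            "Resource": "arn:aws:qbusiness:<region>:<account-id>:application/<application-id>"
        }
    ]
}
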
Clean up
To avoid incurring future charges, clean up any resources created as part of this solution.

Delete the Amazon Q Business Microsoft Teams connector so any data indexed from the source is removed from the Amazon Q application.
Remove users and unsubscribe the Amazon Q subscription if you created them for your testing.
If you created a new Amazon Q application for your testing, delete the application.

Conclusion
In this post, we discussed how to configure the Amazon Q Business Microsoft Teams connector to index chats, messages, wikis, and files. We showed how Amazon Q enables you to discover insights from your Microsoft Teams workspace more quickly and respond to your needs faster.
To further improve the search relevance, you can enable metadata search, which was announced on October 15, 2024. When you connect Amazon Q Business to your data, your data source connector crawls relevant metadata or attributes associated with a document. Amazon Q Business can now use the connector metadata to get more relevant responses for user queries. Refer to Configuring metadata controls in Amazon Q Business for more details. You can also use the metadata boosting feature. This allows you to fine-tune the way Amazon Q prioritizes your content to generate the most accurate answer.
To learn more about the Amazon Q Business Microsoft Teams connector, refer to Connecting Microsoft Teams to Amazon Q Business. We also recommend reviewing Best practices for data source connector configuration in Amazon Q Business.

About the Author
Genta Watanabe is a Senior Technical Account Manager at Amazon Web Services. He spends his time working with strategic automotive customers to help them achieve operational excellence. His areas of interest are machine learning and artificial intelligence. In his spare time, Genta enjoys spending quality time with his family and traveling.

Salesforce AI Research Introduces Moirai-MoE: A MoE Time Series Foundation Model that Achieves Token-Level Model Specialization Autonomously

Time series forecasting has long been integral to finance, healthcare, meteorology, and supply chain management. Its main objective is to predict future data points based on historical observations, which can be challenging due to the complex and varying nature of time series data. Recent advancements in machine learning, particularly foundation models, have transformed this domain by creating generalized models capable of handling various time series without specialized, case-specific training. These foundation models mark a significant shift from traditional approaches that required multiple models tailored to specific datasets. However, the diversity in time series characteristics, such as variations in frequency, seasonality, and underlying patterns, continues to present substantial challenges for unified model training.

A key problem in time series forecasting is handling data heterogeneity effectively. Time series data from different sources vary significantly regarding frequency, distribution, and structure. Current forecasting models often rely on human-defined frequency-based specialization to address this diversity. However, frequency alone is not a reliable indicator of a time series pattern, as data with similar frequencies may exhibit distinct behaviors. Conversely, data with different frequencies may display similar patterns. This approach therefore fails to capture the complexity and diversity inherent in real-world time series. Another challenge lies in the non-stationary nature of time series data, where the statistical properties of the data change over time, making it difficult to model accurately with frequency-based grouping.

Existing time series forecasting methods attempt to address data variability with varied approaches. For instance, models such as TEMPO and UniTime incorporate language-based prompts to help the model discern different data sources, achieving limited dataset-level specialization. Other models, like TimesFM, maintain frequency-specific embedding dictionaries to aid in distinguishing between data types based on frequency. However, many models, including the widely recognized Chronos series, opt for a generalized structure without specialized modules, at the cost of greater model complexity and large parameter demands. The challenge with these methods is their inability to fully capture the diverse nature of time series data: frequency alone does not reliably correlate with underlying data patterns, leading to inefficiencies and compromised model accuracy.

Researchers from Salesforce AI Research, the National University of Singapore, and the Hong Kong University of Science and Technology introduced an innovative model called MOIRAI-MoE. MOIRAI-MoE integrates a sparse mixture of experts (MoE) within its Transformer architecture, allowing token-level specialization without human-defined frequency heuristics. This data-driven approach minimizes dependency on predefined frequency-based layers and uses a single input/output projection layer, enabling the model to automatically capture and represent diverse patterns. By achieving token-level specialization, MOIRAI-MoE provides a more flexible and efficient solution capable of better representing the unique characteristics of varied time series data without requiring distinct models for each frequency category.

MOIRAI-MoE’s architecture leverages a gating function that assigns each token to an appropriate expert within the Transformer layers based on token clustering derived from a pretrained model. This clustering approach is guided by the Euclidean distance to centroids, allowing tokens with similar patterns to be processed by the same expert while specialized experts handle diverse tokens. By incorporating 32 expert networks, each focusing on unique time series characteristics, MOIRAI-MoE effectively reduces computational overhead while enhancing its ability to generalize across different data types. This approach enables MOIRAI-MoE to excel in representing non-stationary time series data by dynamically adapting to pattern shifts within the data.
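
As a conceptual illustration of that routing idea (not the authors' implementation), assigning each token to the expert with the nearest centroid could be sketched as follows, where token_embeddings and centroids are hypothetical tensors of shape (num_tokens, dim) and (num_experts, dim):

import torch

def assign_experts(token_embeddings: torch.Tensor, centroids: torch.Tensor) -> torch.Tensor:
    # Euclidean distance from every token to every expert centroid: (num_tokens, num_experts)
    distances = torch.cdist(token_embeddings, centroids)
    # Route each token to the expert whose centroid is closest
    return distances.argmin(dim=-1)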

Extensive testing across 39 datasets demonstrated the superior performance of MOIRAI-MoE in both in-distribution and zero-shot forecasting scenarios. For in-distribution forecasting, MOIRAI-MoE outperformed its dense model counterpart by up to 17%, showcasing a significant improvement in accuracy while utilizing up to 65 times fewer activated parameters than other leading models, including TimesFM and Chronos. In zero-shot forecasting, where the model was tested on datasets not included in the training data, MOIRAI-MoE’s performance surpassed traditional models. In these tests, MOIRAI-MoE achieved a 3-14% improvement in continuous ranked probability score (CRPS) and an 8-16% improvement in mean absolute scaled error (MASE) over prior models. These results underscore the model’s robust generalization ability without requiring task-specific training.

This research presents key takeaways that highlight the advancements MOIRAI-MoE brings to time series forecasting:

Data-Driven Specialization: By achieving token-level specialization through a sparse mixture of experts, MOIRAI-MoE overcomes the limitations of human-defined frequency specialization, allowing for a more nuanced representation of time series diversity.

Computational Efficiency: The model’s sparse expert activation drastically reduces computational demands, achieving up to 65 times fewer activated parameters while maintaining high accuracy.

Performance Gains: Testing on diverse datasets confirmed that MOIRAI-MoE surpasses dense models and foundational models like TimesFM and Chronos, achieving a 17% improvement over dense counterparts in in-distribution tests.

Scalability and Generalization: MOIRAI-MoE demonstrates strong zero-shot performance, making it highly applicable to real-world forecasting tasks without requiring specialized training for each application, which is critical in diverse applications like finance, healthcare, and climate modeling.

In conclusion, MOIRAI-MoE represents a major advancement in time series forecasting by introducing a flexible, data-driven approach that overcomes the limitations of frequency-based specialization. With its sparse mixture of expert architecture, MOIRAI-MoE addresses the diverse and non-stationary nature of time series data and achieves significant computational efficiency and performance gains. This novel approach underscores the potential of token-level specialization, paving the way for future improvements in time series foundation models and expanding the utility of zero-shot forecasting across various industries and applications.

Check out the Paper. All credit for this research goes to the researchers of this project.
The post Salesforce AI Research Introduces Moirai-MoE: A MoE Time Series Foundation Model that Achieves Token-Level Model Specialization Autonomously appeared first on MarkTechPost.

SambaNova and Hugging Face Simplify AI Chatbot Integration with One-Click Deployment

The deployment of AI chatbots has long been a significant challenge for organizations, particularly for those without the necessary technical expertise or infrastructure to support advanced AI models. Developing AI chatbots requires training complex models, managing cloud resources, optimizing inference, and maintaining compatibility across platforms. As a result, many businesses find themselves either compromising on performance or outsourcing their AI projects, both of which can be costly and time-consuming. The high barrier to entry has made AI chatbot deployment particularly challenging for small to medium-sized enterprises (SMEs) and individual developers, limiting the widespread adoption of conversational AI across industries.

One-Click Integration: Simplifying Deployment

SambaNova and Hugging Face are changing the AI chatbot landscape with their new one-click integration, designed to make deployment accessible to a broader audience. The collaboration between these two companies enables developers to seamlessly deploy advanced AI models for chatbots with minimal configuration and setup. This one-click integration aims to streamline the process of getting an AI chatbot up and running, reducing complexity, costs, and the need for extensive technical knowledge. By combining SambaNova’s expertise in hardware acceleration and Hugging Face’s extensive collection of pre-trained models, this integration provides a holistic solution that addresses multiple pain points in the chatbot deployment process.

Technical Details and Benefits

At the core of this integration lies a powerful collaboration that blends SambaNova’s Reconfigurable Dataflow Architecture (RDA) with Hugging Face’s open-source AI models and tools. The technical aspect revolves around providing optimized hardware infrastructure through SambaNova’s DataScale system, which is well-suited for AI workloads, along with Hugging Face’s model repository. With the click of a button, developers can deploy advanced models like GPT-3 or Bloom directly to a scalable and efficient environment without worrying about the underlying infrastructure. The integration simplifies deployment and ensures that the models run efficiently on high-performance systems, thereby improving inference speeds and enhancing the user experience. Furthermore, the collaboration allows developers to leverage SambaNova’s support for large-scale model training while benefiting from Hugging Face’s popular transformers library, which is known for its user-friendly interface.

The Importance of One-Click Integration

This one-click integration is significant for several reasons. It allows developers—even those with limited AI experience—to quickly build and deploy sophisticated conversational agents without getting bogged down in infrastructure details. Early reports from developers using this solution indicate substantial time savings, with some citing deployment processes being reduced from weeks to mere hours. This ease of deployment also translates into quicker iterations and improvements, enabling businesses to be more agile in their chatbot strategies. For enterprises that rely on customer interaction, the reduced complexity and increased speed of deploying chatbots can enhance customer service and drive engagement. Additionally, the availability of pre-trained models from Hugging Face means developers can tailor chatbot behavior to their specific needs with relatively little customization, further boosting the accessibility of AI tools.

Getting Started with One-Click Integration

For developers looking to try out the service, the process is quite simple. Start by visiting SambaNova Cloud’s API website to obtain an access token. Next, use Python to execute the following three lines of code:

import gradio as gr
import sambanova_gradio
gr.load("Meta-Llama-3.1-70B-Instruct-8k", src=sambanova_gradio.registry, accept_token=True).launch()

The final step involves clicking “Deploy to Hugging Face” and entering the SambaNova token. In just a few seconds, a fully functional AI chatbot will be available on Hugging Face’s Spaces platform, ready for use.

Conclusion

In conclusion, the partnership between SambaNova and Hugging Face marks a significant step forward in democratizing AI chatbot technology. The one-click integration they have introduced makes the deployment of powerful chatbots feasible for a much wider range of users, from individual developers to large enterprises. By reducing technical barriers and leveraging powerful, optimized infrastructure, SambaNova and Hugging Face are pushing the boundaries of what’s possible with conversational AI, encouraging further innovation and enabling more organizations to benefit from advanced AI solutions.

Check out the Source and Details. All credit for this research goes to the researchers of this project.
The post SambaNova and Hugging Face Simplify AI Chatbot Integration with One-Click Deployment appeared first on MarkTechPost.

Cerebras Systems Revolutionizes AI Inference: 3x Faster with Llama 3.1-70B at 2,100 Tokens per Second

Artificial Intelligence (AI) continues to evolve rapidly, but with that evolution comes a host of technical challenges that need to be overcome for the technology to truly flourish. One of the most pressing challenges today lies in inference performance. Large language models (LLMs), such as those used in GPT-based applications, demand a high volume of computational resources. The bottleneck occurs during inference—the stage where trained models generate responses or predictions. This stage often faces constraints due to the limitations of current hardware solutions, making the process slow, energy-intensive, and cost-prohibitive. As models become larger, traditional GPU-based solutions are increasingly falling short in terms of both speed and efficiency, limiting the transformative potential of AI in real-time applications. This situation creates a need for faster, more efficient solutions to keep pace with the demands of modern AI workloads.

Cerebras Systems Inference Gets 3x Faster! Llama 3.1-70B at 2,100 Tokens per Second

Cerebras Systems has made a significant breakthrough, claiming that its inference process is now three times faster than before. Specifically, the company has achieved a staggering 2,100 tokens per second with the Llama 3.1-70B model. This means that Cerebras Systems is now 16 times faster than the fastest GPU solution currently available. This kind of performance leap is akin to an entire generation upgrade in GPU technology, like moving from the NVIDIA A100 to the H100, but all accomplished through a software update. Moreover, it is not just larger models that benefit from this increase—Cerebras is delivering 8 times the speed of GPUs running the much smaller Llama 3.1-3B, which is 23 times smaller in scale. Such impressive gains underscore the promise that Cerebras brings to the field, making high-speed, efficient inference available at an unprecedented rate.

Technical Improvements and Benefits

The technical innovations behind Cerebras’ latest leap in performance include several under-the-hood optimizations that fundamentally enhance the inference process. Critical kernels such as matrix multiplication (MatMul), reduce/broadcast, and element-wise operations have been entirely rewritten and optimized for speed. Cerebras has also implemented asynchronous wafer I/O computation, which allows for overlapping data communication and computation, ensuring the maximum utilization of available resources. In addition, advanced speculative decoding has been introduced, effectively reducing latency without sacrificing the quality of generated tokens. Another key aspect of this improvement is that Cerebras maintained 16-bit precision for the original model weights, ensuring that this boost in speed does not compromise model accuracy. All of these optimizations have been verified through meticulous artificial analysis to guarantee they do not degrade the output quality, making Cerebras’ system not only faster but also trustworthy for enterprise-grade applications.

Transformative Potential and Real-World Applications

The implications of this performance boost are far-reaching, especially when considering the practical applications of LLMs in sectors like healthcare, entertainment, and real-time communication. GSK, a pharmaceutical giant, has highlighted how Cerebras’ improved inference speed is fundamentally transforming their drug discovery process. According to Kim Branson, SVP of AI/ML at GSK, Cerebras’ advances in AI are enabling intelligent research agents to work faster and more effectively, providing a critical edge in the competitive field of medical research. Similarly, LiveKit—a platform that powers ChatGPT’s voice mode—has seen a drastic improvement in performance. Russ d’Sa, CEO of LiveKit, remarked that what used to be the slowest step in their AI pipeline has now become the fastest. This transformation is enabling instantaneous voice and video processing capabilities, opening new doors for advanced reasoning, real-time intelligent applications, and enabling up to 10 times more reasoning steps without increasing latency. The data shows that the improvements are not just theoretical; they are actively reshaping workflows and reducing operational bottlenecks across industries.

Conclusion

Cerebras Systems has once again proven its dedication to pushing the boundaries of AI inference technology. With a threefold increase in inference speed and the ability to process 2,100 tokens per second with the Llama 3.1-70B model, Cerebras is setting a new benchmark for what’s possible in AI hardware. By focusing on both software and hardware optimizations, Cerebras is helping AI transcend the limits of what was previously achievable—not only in speed but also in efficiency and scalability. This latest leap means more real-time, intelligent applications, more robust AI reasoning, and a smoother, more interactive user experience. As we move forward, these kinds of advancements are critical in ensuring that AI remains a transformative force across industries. With Cerebras leading the charge, the future of AI inference looks faster, smarter, and more promising than ever.

Check out the Details. All credit for this research goes to the researchers of this project.
The post Cerebras Systems Revolutionizes AI Inference: 3x Faster with Llama 3.1-70B at 2,100 Tokens per Second appeared first on MarkTechPost.

Assembly AI Introduces Universal-2: The Next Leap in Speech-to-Text Technology

In recent years, Automatic Speech Recognition (ASR) technology has gained significant traction, transforming industries ranging from healthcare to customer support. However, achieving accurate transcription across diverse languages, accents, and noisy environments remains challenging. Current speech-to-text models often face issues like inaccuracies in understanding complex accents, handling domain-specific terminology, and dealing with background noise. The need for a more robust, adaptable, and scalable speech-to-text solution is evident, especially as the demand for such technology rises with the proliferation of AI-driven applications in day-to-day life.

Assembly AI Introduces Universal-2: A New Speech-to-Text Model with Major Improvements

In response to these challenges, Assembly AI has introduced Universal-2, a new speech-to-text model designed to offer significant improvements over its predecessor, Universal-1. This upgraded model aims to enhance transcription accuracy across a broader spectrum of languages, accents, and scenarios. Assembly AI’s Universal-2 leverages cutting-edge advancements in deep learning and speech processing, enabling a more nuanced understanding of human speech even in challenging conditions like poor audio quality or heavy background noise. According to Assembly AI, the release of Universal-2 is a milestone in their journey toward creating the most comprehensive and accurate ASR solution in the industry.

The Universal-2 model has been built on top of the previous version with substantial refinements in architecture and training methodologies. It introduces enhanced multilingual support, making it a truly versatile ASR solution capable of delivering high-quality results across various languages and dialects. One of the key differentiators of Universal-2 is its ability to maintain consistent performance even in low-resource settings, meaning that the model doesn’t falter when transcribing under less-than-ideal conditions. This makes it ideal for applications like call centers, podcasts, and multilingual meetings where speech quality can vary significantly. Additionally, Universal-2 is designed with scalability in mind, offering developers an easy integration experience with a wide array of APIs for rapid deployment.

Technical Details and Benefits of Universal-2

Universal-2 is based on an ASR decoder architecture called the Recurrent Neural Network Transducer (RNN-T). Compared to Universal-1, the model employs a broader training dataset, encompassing diverse speech patterns, multiple dialects, and varying audio qualities. This broader dataset helps the model learn to be more adaptive and precise, reducing the word error rate (WER) compared to its predecessor.

Moreover, the improvements in noise robustness allow Universal-2 to handle real-world audio scenarios more effectively. It has also been optimized for faster processing speeds, enabling near real-time transcription—a crucial feature for applications in sectors like customer service, live broadcasting, and automated meeting transcription. These technical enhancements help bridge the gap between human-level understanding and machine-level transcription, which has long been a target for AI researchers and developers.

The Importance of Universal-2 and Its Performance Metrics

The introduction of Universal-2 is a significant step forward for the ASR industry. Enhanced accuracy and robustness mean that businesses can rely on transcription services with increased confidence, even when dealing with complex audio environments. Assembly AI has reported a notable decrease in the word error rate of Universal-2—a 32% reduction compared to Universal-1. This improvement translates into fewer transcription errors, better customer experiences, and higher efficiency for tasks such as subtitling videos, generating meeting notes, or powering voice-controlled applications.

Another critical aspect is Universal-2’s enhanced performance across different languages and accents. In an increasingly interconnected world, the ability to accurately transcribe non-English languages or handle strong regional accents opens up new opportunities for businesses and services. This broader applicability makes Universal-2 highly valuable in regions where language diversity poses a challenge to conventional ASR systems. By pushing the envelope on multilingual support, Assembly AI continues to make strides in democratizing access to cutting-edge AI technologies.

Conclusion

With Universal-2, Assembly AI is setting a new standard in the speech-to-text landscape. The model’s enhanced accuracy, speed, and adaptability make it a robust choice for developers and businesses looking to leverage the latest in ASR technology. By addressing previous challenges, such as the need for better noise handling and multilingual support, Universal-2 not only builds upon the strengths of its predecessor but also introduces new capabilities that make speech recognition more accessible and effective for a wider range of applications. As industries continue to integrate AI-driven tools into their workflows, advancements like Universal-2 bring us closer to seamless human-computer communication, laying the groundwork for more intuitive and efficient interactions.

Check out the Details. All credit for this research goes to the researchers of this project.
The post Assembly AI Introduces Universal-2: The Next Leap in Speech-to-Text Technology appeared first on MarkTechPost.

ADOPT: A Universal Adaptive Gradient Method for Reliable Convergence without Hyperparameter Tuning

Adam is widely used in deep learning as an adaptive optimization algorithm, but it struggles with convergence unless the hyperparameter β2 is adjusted based on the specific problem. Attempts to fix this, like AMSGrad, require the impractical assumption of uniformly bounded gradient noise, which doesn’t hold in cases with Gaussian noise, as seen in variational autoencoders and diffusion models. Other methods, such as AdaShift, address convergence in limited scenarios but aren’t effective for general problems. Recent studies suggest Adam can converge by fine-tuning β2 per task, though this approach is complex and problem-specific, warranting further exploration for universal solutions.

Researchers from The University of Tokyo introduced ADOPT. This new adaptive gradient method achieves optimal convergence at an O(1/√T) rate without requiring specific choices for β2 or the bounded noise assumption. ADOPT addresses Adam’s non-convergence by excluding the current gradient from the second moment estimate and adjusting the order of momentum and normalization updates. Experiments across diverse tasks—such as image classification, generative modeling, language processing, and reinforcement learning—show ADOPT’s superior performance over Adam and its variants. The method also converges reliably in challenging cases, including scenarios where Adam and AMSGrad struggle.
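
Based on that description alone, and not on the authors' reference code, a rough per-parameter sketch of the update might look like the following; the hyperparameter defaults are illustrative assumptions.

import torch

def adopt_step(param, grad, state, lr=1e-3, beta1=0.9, beta2=0.9999, eps=1e-6):
    # First step: initialize the second-moment estimate from the first gradient
    if "v" not in state:
        state["m"] = torch.zeros_like(param)
        state["v"] = grad * grad
        return
    m, v = state["m"], state["v"]
    # Normalize with the *previous* second moment, excluding the current gradient
    normalized = grad / torch.clamp(v.sqrt(), min=eps)
    # Apply momentum after normalization, then take the step
    m.mul_(beta1).add_(normalized, alpha=1 - beta1)
    param.add_(m, alpha=-lr)
    # Only now fold the current gradient into the second-moment estimate
    v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)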

This study focuses on minimizing an objective function that depends on a parameter vector by using first-order stochastic optimization methods. Rather than working with the exact gradient, they rely on an estimate known as the stochastic gradient. Since the function may be nonconvex, the goal is to find a stationary point where the gradient is zero. Standard analyses for convergence in this area generally make several key assumptions: the function has a minimum bound, the stochastic gradient provides an unbiased estimate of the gradient, the function changes smoothly, and the variance of the stochastic gradient is uniformly limited. For adaptive methods like Adam, an additional assumption about the gradient variance is often made to simplify convergence proofs. The researchers apply a set of assumptions to investigate how adaptive gradient methods converge without relying on the stricter assumption that the gradient noise remains bounded.

Prior research suggests that while basic stochastic gradient descent often converges in nonconvex settings, adaptive gradient methods like Adam are widely used in deep learning due to their flexibility. However, Adam sometimes fails to converge, even in convex cases. A modified version called AMSGrad was developed to address this, which introduces a non-decreasing scaling of the learning rate by updating the second-moment estimate with a maximum function. Still, AMSGrad’s convergence is based on the stronger assumption of uniformly bounded gradient noise, which is not valid in all scenarios, such as in certain generative models. Therefore, the researchers propose a new adaptive gradient update approach that aims to ensure reliable convergence without relying on stringent assumptions about gradient noise, addressing Adam’s limitations regarding convergence and parameter dependencies.

The ADOPT algorithm is evaluated across various tasks to verify its performance and robustness compared to Adam and AMSGrad. Starting with a toy problem, ADOPT successfully converges where Adam does not, especially under high-gradient noise conditions. Testing with an MLP on the MNIST dataset and a ResNet on CIFAR-10 shows that ADOPT achieves faster and more stable convergence. ADOPT also outperforms Adam in applications such as Swin Transformer-based ImageNet classification, NVAE generative modeling, and GPT-2 pretraining under noisy gradient conditions and yields improved scores in LLaMA-7B language model finetuning on the MMLU benchmark.

The study addresses the theoretical limitations of adaptive gradient methods like Adam, which need specific hyperparameter settings to converge. To resolve this, the authors introduce ADOPT, an optimizer that achieves optimal convergence rates across various tasks without problem-specific tuning. ADOPT overcomes Adam’s limitations by altering the momentum update order and excluding the current gradient from second-moment calculations, ensuring stability across tasks like image classification, NLP, and generative modeling. The work bridges theory and application in adaptive optimization, although future research may explore more relaxed assumptions to generalize ADOPT’s effectiveness further.

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.
The post ADOPT: A Universal Adaptive Gradient Method for Reliable Convergence without Hyperparameter Tuning appeared first on MarkTechPost.

Gemini AI Now Accessible Through the OpenAI Library for Streamlined Use

In an exciting update for developers, Google has launched Gemini, a new AI model that promises to be more accessible and developer-friendly. Gemini, designed to rival models like OpenAI’s GPT-4, has been made easier to access and integrate into various applications, thanks to Google’s recent initiatives. If you’re a developer exploring powerful alternatives or complementary tools to OpenAI, here’s why Gemini might be the right fit.

Gemini Joins OpenAI Library: Streamlining Access

Google’s Gemini is now accessible through the OpenAI library, providing a seamless experience for developers already familiar with OpenAI’s tools. This integration enables developers to leverage Gemini directly alongside other AI models in their existing workflows. Google’s step towards integrating Gemini into popular ecosystems reduces the friction that often accompanies adopting new AI technologies.

The inclusion of Gemini in the OpenAI library means developers won’t need to overhaul their existing code or pipelines. Instead, they can experiment with Gemini’s capabilities within the tools they already use, providing a straightforward path to enhancing or complementing their AI-driven applications. This flexibility is particularly attractive to developers seeking to optimize or expand their software’s capabilities with minimal disruption.

A Simplified Migration Path for Developers

Migrating to a new AI platform can be daunting, particularly when developers have invested significant time in integrating existing models. Google recognizes this challenge and has provided comprehensive support for those looking to transition to Gemini. The recently introduced migration tools and detailed documentation are geared towards making this switch as painless as possible. Developers familiar with OpenAI’s API can easily transition their code, thanks to syntactic similarities and sample guides.

Python Code Example:

from openai import OpenAI

client = OpenAI(
    api_key="gemini_api_key",
    base_url="https://generativelanguage.googleapis.com/v1beta/"
)

response = client.chat.completions.create(
    model="gemini-1.5-flash",
    n=1,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain to me how AI works"}
    ]
)

print(response.choices[0].message)

Gemini’s compatibility with existing OpenAI model interfaces is a key highlight. Google has also focused on offering performance that matches or exceeds the reliability and speed of competitive models, making it a suitable replacement or addition for developers concerned about scaling their AI capabilities. The migration aids include examples that help adapt prompts, tweak fine-tuning processes, and adjust implementation details—all meant to foster a smooth experience.

One of Gemini’s standout features is its focus on improved contextual understanding, which is designed to support more nuanced and complex tasks. Google aims to address some of the current limitations observed in traditional AI models, such as maintaining coherence over extended interactions or understanding domain-specific terminology. Gemini’s training has benefited from Google’s extensive data resources, ensuring robust performance across a wide variety of use cases.

Check out the Source here. All credit for this research goes to the researchers of this project.
The post Gemini AI Now Accessible Through the OpenAI Library for Streamlined Use appeared first on MarkTechPost.

Is Your LLM Agent Enterprise-Ready? Salesforce AI Research Introduces CRMArena: A Novel AI Benchmark Designed to Evaluate AI Agents on Realistic Tasks Grounded on Professional Work Environments

Customer Relationship Management (CRM) has become integral to business operations as the center for managing customer interactions, data, and processes. Integrating advanced AI into CRM can transform these systems by automating routine processes, delivering personalized experiences, and streamlining customer service efforts. As organizations increasingly adopt AI-driven approaches, the need for intelligent agents capable of performing complex CRM tasks has grown. Large language models (LLMs) are at the forefront of this movement, potentially enhancing CRM systems by automating complex decision-making and data management tasks. However, deploying these agents requires robust, realistic benchmarks to ensure they can handle the complexities typical of CRM environments, which include managing multifaceted data objects and following specific interaction protocols.

Existing tools such as WorkArena, WorkBench, and Tau-Bench provide elementary assessments for CRM agent performance. Still, these benchmarks primarily evaluate simple operations, such as data navigation and filtering, and do not capture the complex dependencies and dynamic interrelations typical of CRM data. For instance, these tools fall short in modeling relationships between objects, such as orders linked to customer accounts or cases spanning multiple touchpoints. This lack of complexity limits organizations from understanding the full capabilities of LLM agents, creating an ongoing need for a more comprehensive evaluation framework. One of the key challenges in this field is the lack of benchmarks that accurately reflect the intricate, interconnected tasks required in real CRM systems.

Salesforce’s AI Research team addressed this gap by introducing CRMArena, a sophisticated benchmark developed specifically to evaluate the capabilities of AI agents in CRM environments. Unlike previous tools, CRMArena simulates a real-world CRM system complete with complex data interconnections, enabling a robust evaluation of AI agents on professional CRM tasks. The development process involved collaboration with CRM domain experts who contributed to the design of nine realistic tasks based on three distinct personas: service agents, analysts, and managers. These tasks include essential CRM functions, such as monitoring agent performance, handling complex customer inquiries, and analyzing data trends to improve service. CRMArena includes 1,170 unique queries across these nine tasks, providing a comprehensive platform for testing CRM-specific scenarios.

The architecture of CRMArena is grounded in a CRM schema modeled after Salesforce’s Service Cloud. The data generation pipeline produces an interconnected dataset of 16 objects, such as accounts, orders, and cases, with complex dependencies that mirror real-world CRM environments. To enhance realism, CRMArena integrates latent variables replicating dynamic business conditions, such as seasonal buying trends and agent skill variations. This high level of interconnectivity, which involves an average of 1.31 dependencies per object, ensures that CRMArena represents CRM environments accurately, presenting agents with challenges similar to those they would face in professional settings. Additionally, CRMArena’s setup supports both UI and API access to CRM systems, allowing for direct interactions through API calls and realistic response handling.

Performance testing with CRMArena has revealed that current state-of-the-art LLM agents struggle with CRM tasks. Using the ReAct prompting framework, the highest-performing agent achieved only 38.2% task completion. When supplemented with specialized function-calling tools, performance improved to a completion rate of 54.4%, highlighting a significant performance gap. The tasks evaluated included challenging functions such as Named Entity Disambiguation (NED), Policy Violation Identification (PVI), and Monthly Trend Analysis (MTA), all requiring agents to analyze and interpret complex data. Separately, over 90% of domain experts confirmed that the synthetic data environment felt authentic, with over 77% rating individual objects within the CRM system as “realistic” or “very realistic.” These insights reveal critical gaps in the LLM agents’ ability to understand nuanced dependencies in CRM data, and those gaps must be addressed before AI-driven CRM can be fully deployed.

CRMArena’s ability to deliver high-fidelity testing comes from its two-tiered quality assurance process. The data generation pipeline is optimized to maintain diversity across various data objects, using a mini-batch prompting approach that limits content duplication. Further, CRMArena’s quality assurance processes include format and content verification to ensure the consistency and accuracy of generated data. Regarding query formulation, CRMArena consists of a mix of answerable and non-answerable queries, with non-answerable queries making up 30% of the total. These are designed to test the agents’ capability to identify and handle questions that do not have solutions, thus closely mirroring real CRM environments where information may not always be immediately available.

Key Takeaways from the research on CRMArena include:

CRM Task Coverage: CRMArena includes nine diverse CRM tasks representing service agents, analysts, and managers, covering over 1,170 unique queries.

Data Complexity: CRMArena involves 16 interconnected objects, averaging 1.31 dependencies per object, achieving realism in CRM modeling.

Realism Validation: Over 90% of domain experts rated CRMArena’s test environment as realistic or very realistic, indicating the high validity of its synthetic data.

Agent Performance: Leading LLM agents completed only 38.2% of tasks using standard prompting and 54.4% with function-calling tools, underscoring challenges in current AI capabilities.

Non-Answerable Queries: About 30% of CRMArena’s queries are non-answerable, pushing agents to identify and appropriately handle incomplete information.

In conclusion, the introduction of CRMArena highlights significant advancements and key insights in assessing AI agents for CRM tasks. CRMArena is a major contributor to the CRM industry, offering a scalable, accurate, and rigorous benchmark for evaluating agent performance in CRM environments. As the research demonstrates, there is a substantial gap between the current capabilities of AI agents and the high-performance standards required in CRM systems. CRMArena’s extensive testing framework provides a necessary tool for developing and refining AI agents to meet these demands.

Check out the Paper. All credit for this research goes to the researchers of this project.
The post Is Your LLM Agent Enterprise-Ready? Salesforce AI Research Introduces CRMArena: A Novel AI Benchmark Designed to Evaluate AI Agents on Realistic Tasks Grounded on Professional Work Environments appeared first on MarkTechPost.