This AI Paper from Meta AI and MIT Introduces In-Context Risk Minimization (ICRM): A Machine Learning Framework to Address Domain Generalization as Next-Token Prediction

Artificial intelligence is advancing rapidly, but researchers are facing a significant challenge: AI systems struggle to adapt to environments outside their training data, which is critical in areas like self-driving cars, where failures can have catastrophic consequences. Despite efforts to tackle this problem with algorithms for domain generalization, no algorithm has yet performed better than basic empirical risk minimization (ERM) across real-world benchmarks for out-of-distribution generalization. The issue has prompted dedicated research groups, workshops, and broader societal concern. As we depend more on AI systems, we must pursue effective generalization beyond the training distribution to ensure they can adapt to new environments and function safely and effectively.

A group of researchers from Meta AI and MIT CSAIL has stressed the importance of context in AI research and proposed the In-Context Risk Minimization (ICRM) algorithm for better domain generalization. The study argues that researchers in domain generalization should treat the environment as context, and that researchers working on LLMs should treat context as an environment to better structure data toward generalization. The researchers found that attention over context (unlabeled examples from the test environment) allows the algorithm to zoom in on the test-environment risk minimizer, ultimately leading to improved out-of-distribution performance.

https://arxiv.org/abs/2309.09888

The study introduces the ICRM algorithm as a solution to out-of-distribution prediction, recasting it as in-distribution next-token prediction: a machine is trained on sequences of examples drawn from diverse environments. Through a combination of theoretical insights and experiments, the researchers showcase the effectiveness of ICRM in enhancing domain generalization. The algorithm's attention over context (unlabeled examples from the test environment) enables it to pinpoint the risk minimizer for that environment, resulting in significant improvements in out-of-distribution performance.
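To make this recipe concrete, the following is a minimal sketch of the idea (our illustration, not the authors' released code): a small causal Transformer reads a sequence of inputs drawn from a single environment and predicts the label at every position, so the earlier, unlabeled inputs act as in-context information for the current prediction.

import torch
import torch.nn as nn

class ICRMSketch(nn.Module):
    def __init__(self, input_dim, num_classes, hidden=128):
        super().__init__()
        self.embed = nn.Linear(input_dim, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x_seq):
        # x_seq: (batch, seq_len, input_dim), a sequence of examples from one environment
        seq_len = x_seq.size(1)
        causal_mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        h = self.backbone(self.embed(x_seq), mask=causal_mask)
        return self.head(h)  # (batch, seq_len, num_classes): each prediction conditioned only on its prefix

# Training sketch: sample sequences of examples from the same training environment and
# minimize the loss at every position, so unlabeled earlier inputs serve as context.
model = ICRMSketch(input_dim=32, num_classes=5)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(8, 16, 32)          # 8 sequences of 16 examples, each sequence from one environment
y = torch.randint(0, 5, (8, 16))    # labels for every example
loss = nn.functional.cross_entropy(model(x).flatten(0, 1), y.flatten())
optimizer.zero_grad(); loss.backward(); optimizer.step()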

The research focuses on in-context learning and its ability to balance trade-offs such as efficiency-resiliency, exploration-exploitation, specialization-generalization, and focusing-diversifying. The study highlights the significance of considering the surroundings as context in domain generalization research and emphasizes the adaptable nature of in-context learning. The authors suggest that researchers utilize this capability to organize data more effectively for better generalization.


The study presents the ICRM algorithm, which uses context (unlabeled examples) to improve machine learning performance on out-of-distribution data. It identifies risk minimizers specific to the test environment and shows the importance of context in domain generalization research. Extensive experiments show ICRM's superiority over basic empirical risk minimization methods. The study suggests that researchers should consider the context for improved data structuring and generalization. The researchers also discuss in-context learning trade-offs, including efficiency-resiliency, exploration-exploitation, specialization-generalization, and focusing-diversifying.

In conclusion, the study highlights the importance of considering the environment as a crucial factor in domain generalization research. It emphasizes the adaptive nature of in-context learning, which involves incorporating the environment as a context to improve generalization. In this regard, LLMs demonstrate their ability to learn dynamically and adapt to diverse circumstances, which is vital in addressing challenges related to out-of-distribution generalization. The study proposes the ICRM algorithm to enhance out-of-distribution performance by focusing on the risk minimizer specific to the test environment. It also utilizes context-unlabeled examples to improve domain generalization. The study discusses trade-offs associated with in-context learning, including efficiency-resiliency, exploration-exploitation, specialization-generalization, and focusing-diversifying. It suggests that researchers consider context an environment for effective data structuring, advocating for a move from broad domain indices to more detailed and compositional contextual descriptions.


Meet ‘AboutMe’: A New Dataset and AI Framework that Uses Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters

With the advancements in Natural Language Processing and Natural Language Generation, Large Language Models (LLMs) are being frequently used in real-world applications. With their ability to mimic human language and their general-purpose nature, these models have stepped into nearly every field and domain.

Though these models have gained significant attention, they represent a constrained and skewed collection of human viewpoints and knowledge. The composition of the pretraining data is the source of this bias, since it heavily shapes model behavior.

Researchers have been putting increasing effort into understanding and documenting the transformations applied to data before pretraining. Pretraining data curation is a multi-step process with multiple decision points that are frequently based on subjective text quality judgments or performance against benchmarks.

In a recent study, a team of researchers from the Allen Institute for AI, the University of California, Berkeley, Emory University, Carnegie Mellon University, and the University of Washington introduced a new dataset and framework called AboutMe. The study highlights the numerous unquestioned assumptions that exist in data curation workflows. With AboutMe, the team has attempted to document the effects of data filtering on text rooted in social and geographic contexts.

The lack of extensive, self-reported sociodemographic data associated with language data is one of the problems facing sociolinguistic analysis in Natural Language Processing. Text can be traced back to general sources such as Wikipedia, but at a more granular level, it’s frequently unknown who created the information. The team in this study has found websites, particularly ‘about me’ pages, by utilizing pre-existing patterns in web data. This allows an unprecedented understanding of whose language is represented in web-scraped text.

Using data from the ‘about me’ sections of websites, the team performed sociolinguistic analyses to measure website authors' topical interests, whether they present themselves as individuals or organizations, their self-identified social roles, and their associated geographic locations. Ten quality and English-ID filters from earlier research on LLM development were then applied to these web pages to examine which pages each filter keeps or removes.

The team's main goal was to find trends in filtering behavior related to website origin, both within and between filters. The results show that model-based quality filters display implicit preferences for specific subject areas, causing text associated with different professions and vocations to be removed at different rates. Furthermore, filtering techniques that presume pages are monolingual may unintentionally eliminate content from non-anglophone parts of the globe.
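To illustrate the kind of measurement involved, here is a small, purely illustrative sketch (hypothetical filter and data, not the AboutMe code) that computes how often pages associated with each self-identified role survive a quality filter:

from collections import defaultdict

def quality_filter(page_text: str) -> bool:
    """Stand-in for a model-based quality filter; returns True if the page is kept."""
    return len(page_text.split()) > 50  # hypothetical length heuristic

pages = [
    {"role": "artist", "text": "..."},               # placeholder 'about me' pages
    {"role": "software engineer", "text": "..."},
]

kept, total = defaultdict(int), defaultdict(int)
for page in pages:
    total[page["role"]] += 1
    kept[page["role"]] += int(quality_filter(page["text"]))

for role in total:
    print(role, f"retention rate: {kept[role] / total[role]:.2%}")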

In conclusion, this research has highlighted the intricacies involved in data filtering during LLM development and its consequences for the portrayal of varied viewpoints in language models. The study's main goal is to raise awareness of the intricate details that go into pretraining data curation procedures, particularly when considering social factors. The team has stressed the need for more research on pretraining data curation procedures and their social implications.


Take Your Semrush & Customers.ai Integration to the Next Level

Last month we officially launched our Semrush integration and we are happy to say we’re already seeing some exciting results!

If you’re one of the people who made the very smart decision to install the Website Visitor ID X-Ray pixel, then we have some tips to help you take your marketing to the next level.

If you have yet to install the pixel, then we suggest you head over to the Semrush App Center right now. Seriously, it takes less than 90 seconds to install, it's free to sign up, and you get your first 500 contacts at no cost. There's really nothing to lose.

Ok, sales pitch over. 

For those of you who are already seeing the benefit of this integration, we have a few tips to help you not only make the most of your inbound traffic but improve your overall marketing efforts. Our goal? To 10x your ROI!

Let’s jump in with these three ways you can take your Semrush and Customers.ai integration to the next level. 

1. Turn Top-of-the-Funnel Content into a Lead Generation Tool

Semrush is all about helping you drive traffic. 

From keyword recommendations to content ideas to the SEO writing assistant, Semrush helps you create content that performs well in search and speaks to your target audience.

And it works!

The problem is that once you create that content, it starts ranking, and you start seeing traffic, that traffic often doesn't go anywhere.

All of that hard work, and unless someone fills out a form, you have no idea who those visitors are and can't reach them again.

This is where Customers.ai comes in. 

Let’s say you are a marketing agency and you have a popular blog post about running a LinkedIn ABM campaign. The post ranks for the term “LinkedIn ABM tips” and you get a ton of traffic. 

You have several other posts about LinkedIn Ads that all drive traffic but at the end of the day, aren’t driving any leads.

With Customers.ai, you can build a segment for all of the LinkedIn pages and start identifying the names, emails, websites, etc. of the people visiting those pages. 

From there, you can take that list of emails and create a remarketing campaign that sends them to a webinar or ebook about…you guessed it, LinkedIn Ads.

You know these people are interested in LinkedIn Ads but you also know they might not be interested in your services…yet. 

With Customers.ai, you can take that top-of-the-funnel content you worked so hard to build and turn it into an actual lead generator. 

2. Identify Content That Resonates with High-Intent Prospects

GA4 has made it really difficult to understand just how your content is performing and how users are moving through your site.

How are we supposed to know what content is resonating with users if we can barely find it in the platform?!

The answer is, you send landing page data to your CRM platform.

With Customers.ai, yes, you capture names, emails, etc., but you also capture landing pages. 

All of that can be integrated into your CRM, giving you the ability to see what content prospects are viewing before they take an action.

Let’s break it down.

Pricing pages are pretty good indicators someone is interested in what you have to offer. Or at least they show more intent than say someone visiting the home page.

Take those visitors, and send their data to your CRM. 

You have now started a customer profile, and as they go through the funnel and you continue capturing page-level data, you get a much clearer picture of what pages they visited and the topics they are interested in.

Not only does this help your sales team better understand what the prospect is interested in but it also helps marketers understand the type of content that resonates with high-value prospects. 

A better understanding equals better content, better personalization, and better results.

3. Superpower Your Abandoned Cart Recovery

You worked really hard to get product descriptions optimized, your Google shopping campaigns humming, and overall, your conversion rate feels pretty good.

But what if there was a bigger opportunity? What if you could turn more of those visits into sales?

We know that almost 70% of people abandon their shopping carts.

We also know that abandoned cart recovery is big business. It’s why there are over 250 listings for shopping cart software on G2 alone.

Unfortunately, abandoned cart recovery only works if people fill out their information or you already have an existing customer profile. 

The result? Abandoned cart recovery only reaches on average 3% of your site visitors. 

With Customers.ai, you can 10x that!

Website visitor identification allows you to identify people who have reached the shopping cart but didn't actually fill out their email. And with our ESP integrations, you can add those people directly into your existing automated cart recovery workflows, helping you reach more people than ever before.

As marketers, we work really hard to get people to the point of purchase. Why not make sure that we can capture as many of them as possible?

Bonus Tip: Expand & Inform Your Retargeting Audiences

We mentioned top-of-the-funnel content earlier and I think it’s a really important point to drive home. 

Working on the agency side for so many years, I can’t tell you how many times I heard “Traffic is great, but we need revenue”. 

It’s really a catch-22 in the world of SEO.

You build content that performs really well. In fact, it performs too well. Traffic skyrockets, but it's top-of-the-funnel content, and the client isn't quite as happy as you thought they'd be.

For agencies or marketing teams running full-funnel campaigns, this top-of-the-funnel content is great for remarketing audiences.

And it goes beyond top of the funnel.

In the good old days, you could build huge retargeting lists. Facebook and Google were brilliant at putting people into specific segments and delivering them applicable ads. 

Unfortunately, that’s no longer the case.

With Click IDs out the door and privacy a bigger issue than ever before, retargeting audiences are shrinking.

That’s where Customers.ai comes in.

You can create really specific segments that will allow you to retarget down to the product page itself. 

So whether you are trying to take that top-of-the-funnel audience and guide them through to purchase, or take that high-intent audience and get them to finalize their purchase, you can scale up your custom retargeting audiences and inform your platform-provided, AI-generated audiences (looking at you, Advantage+).

The result? Expanded audiences, warm leads, better ROI, and happy teams!

Semrush + Customers.ai = Real Results

The Semrush integration really does feel like a perfect fit and we want more than anything for people to truly take advantage of what this integration offers.

Marketing is harder than ever. 

With attribution challenges and Google doing whatever it is they are doing with their results on a day-to-day basis, getting results can be challenging.

Our goal is to make your life easier and make you look like the marketing star that you are.

If you are already a Semrush customer, head over to the App Center and easily add the Customers.ai pixel to your site.

If you aren’t, you can get started right here. 

To install the Website Visitor ID X-Ray Pixel, sign up (for FREE!), go to your dashboard, and navigate to My Automations. 

Select + New Automation and get your pixel. We have easy install options for Google Tag Manager, WordPress, and Shopify, or you can install the pixel manually.

Easy right?

2024 is just starting, giving you a ton of time to make a big impact and drive big results. Let’s do it!

Important Next Steps

See what targeted outbound marketing is all about. Capture and engage your first 500 website visitor leads with Customers.ai X-Ray website visitor identification for free.

Talk and learn about sales outreach automation with other growth enthusiasts. Join Customers.ai Island, our Facebook group of 40K marketers and entrepreneurs who are ready to support you.

Advance your marketing performance with Sales Outreach School, a free tutorial and training area for sales pros and marketers.


Meet FedTabDiff: An Innovative Federated Diffusion-based Generative AI Model Tailored for the High-Quality Synthesis of Mixed-Type Tabular Data

One of the difficulties researchers face when generating realistic tabular data is maintaining privacy, especially in sensitive domains like finance and healthcare. As the volume of data and the importance of data analysis grow across fields, and as privacy concerns lead to hesitancy in deploying AI models, protecting privacy becomes even more important. In the financial field, challenges include mixed attribute types, implicit relationships, and distribution imbalances in real-world datasets.

Researchers from the University of St. Gallen (Switzerland), Deutsche Bundesbank (Germany), and the International Computer Science Institute (USA) have introduced FedTabDiff, a method to generate high-fidelity mixed-type tabular data without centralized access to the original datasets, ensuring privacy and compliance with regulations such as the EU's General Data Protection Regulation and the California Privacy Rights Act.

Traditional methods like anonymization and the elimination of sensitive attributes do not provide sufficient privacy in high-stakes domains. FedTabDiff instead relies on synthetic data, generated through a generative process based on the inherent properties of real data. The researchers leverage Denoising Diffusion Probabilistic Models (DDPMs), which have been successful at generating synthetic images, and apply the concept in a federated setting for tabular data generation.

FedTabDiff incorporates DDPMs into a federated learning framework, allowing multiple entities to collaboratively train a generative model while respecting data privacy and locality. The DDPMs use a Gaussian diffusion model, employing a forward process to perturb data incrementally with Gaussian noise and then restoring it through a reverse process. The federated learning aspect involves a synchronous update scheme and weighted averaging for effective model aggregation. The architecture of FedTabDiff includes a central FinDiff model maintained by a trusted entity and decentralized FinDiff models contributed by individual clients. The federated optimization computes a weighted average over decentralized model updates, which drives the collaborative learning process. For evaluation, the researchers used standard metrics of fidelity, utility, privacy, and coverage.
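To illustrate the aggregation step described above, the following is a minimal sketch of synchronous, weighted federated averaging (our simplification, not the FedTabDiff implementation), where the central model's parameters are updated as a weighted average of the clients' locally trained parameters:

import torch

def federated_weighted_average(client_state_dicts, client_weights):
    """Aggregate client model parameters, weighting each client (e.g., by dataset size)."""
    total = sum(client_weights)
    averaged = {}
    for name in client_state_dicts[0]:
        averaged[name] = sum(w * sd[name] for w, sd in zip(client_weights, client_state_dicts)) / total
    return averaged

# Usage sketch: three clients train local copies of the central model, then a trusted
# entity aggregates the updates and redistributes the averaged weights each round.
central = torch.nn.Linear(16, 16)                       # placeholder for the central denoiser network
clients = [torch.nn.Linear(16, 16) for _ in range(3)]   # locally trained client copies
new_state = federated_weighted_average([c.state_dict() for c in clients], client_weights=[100, 250, 650])
central.load_state_dict(new_state)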

FedTabDiff shows exceptional performance on both financial and medical datasets, proving its effectiveness in diverse scenarios. Compared with non-federated FinDiff models, it performs better on all four metrics. The approach balances privacy protection with keeping deviations from the original data under control, preventing the synthetic data from becoming unrealistic. FedTabDiff's effectiveness is demonstrated through empirical evaluations on real-world datasets, showcasing its potential for responsible and privacy-preserving AI applications in domains like finance and healthcare.


This NIST Trustworthy and Responsible AI Report Develops a Taxonomy of Concepts and Defines Terminology in the Field of Adversarial Machine Learning (AML)

Artificial intelligence (AI) systems are expanding and advancing at a significant pace. The two main categories into which AI systems have been divided are Predictive AI and Generative AI. The well-known Large Language Models (LLMs), which have recently gathered massive attention, are the best examples of generative AI. While Generative AI creates original content, Predictive AI concentrates on making predictions using data.

It is important for AI systems to have safe, reliable, and resilient operations as these systems are being used as an integral component in almost all significant industries. The NIST AI Risk Management Framework and AI Trustworthiness taxonomy have indicated that these operational characteristics are necessary for trustworthy AI.

In a recent study, a team of researchers behind the NIST Trustworthy and Responsible AI report has shared their goal of advancing the field of Adversarial Machine Learning (AML) by creating a thorough taxonomy of concepts and providing definitions for pertinent terms. This taxonomy has been structured into a conceptual hierarchy and created by carefully analyzing the body of current AML literature.

The hierarchy includes the main categories of Machine Learning (ML) techniques, different phases of the attack lifecycle, the aims and objectives of the attacker, and the skills and information that the attackers have about the learning process. Along with outlining the taxonomy, the study has offered strategies for controlling and reducing the effects of AML attacks.

The team has shared that AML problems are dynamic and identify unresolved issues that need to be taken into account at every stage of the development of Artificial Intelligence systems. The goal is to provide a thorough resource that helps shape future practice guides and standards for evaluating and controlling the security of AI systems.

The terminology mentioned in the shared research paper aligns with the body of current AML literature. A dictionary explaining important topics related to AI system security has also been provided. The team has shared that establishing a common language and understanding within the AML domain is the ultimate purpose of the integrated taxonomy and nomenclature. By doing this, the study supports the development of future norms and standards, promoting a coordinated and knowledgeable approach to tackling the security issues brought about by the quickly changing AML landscape.

The primary contributions of the research can be summarized as follows.

A common vocabulary for discussing Adversarial Machine Learning (AML) ideas by developing standardized terminology for the ML and cybersecurity communities has been shared.

A comprehensive taxonomy of AML attacks that covers systems that use both Generative AI and Predictive AI has been presented.

Generative AI attacks have been divided into categories for evasion, poisoning, abuse, and privacy, and Predictive AI attacks have been divided into categories for evasion, poisoning, and confidentiality (a compact encoding of these categories appears after this list).

Attacks on several data modalities and learning approaches, i.e., supervised, unsupervised, semi-supervised, federated learning, and reinforcement learning, have been tackled.

Possible AML mitigations and ways to handle particular attack classes have been discussed.

The shortcomings of current mitigation strategies have been analyzed, and a critical viewpoint on their efficiency has been provided.
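For readers who prefer a compact view, the high-level attack categories above could be organized as a simple data structure like the following (an illustrative encoding, not NIST's official schema):

# Illustrative encoding of the AML attack taxonomy summarized above (not an official NIST schema).
AML_ATTACK_TAXONOMY = {
    "predictive_ai": ["evasion", "poisoning", "confidentiality"],
    "generative_ai": ["evasion", "poisoning", "abuse", "privacy"],
}

# Learning settings covered by the taxonomy.
LEARNING_SETTINGS = ["supervised", "unsupervised", "semi-supervised", "federated learning", "reinforcement learning"]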


Unbabel Releases Tower: A Multilingual 7B Parameter Large Language Model (LLM) Optimized for Translation-Related Tasks

With the growth of large language models, natural language processing has been revolutionized. Many LLMs, like GPT-3.5, LLaMA, and Mixtral, were released last year to tackle diverse language tasks. Yet despite this abundance, the open-source ecosystem still lacks reliable models for translation tasks. Thorough research has been done to tackle this challenge.

Consequently, a collaboration between researchers at Unbabel, the SARDINE Lab at Instituto Superior Técnico, and the MICS lab at CentraleSupélec, University of Paris-Saclay, has created Tower, a new multilingual LLM. This Llama 2-based model has 7B parameters and is specifically designed for translation-related tasks. Its main highlight is that, unlike other open-source models, which are predominantly built on English data, Tower supports 10 languages: English, German, French, Spanish, Chinese, Portuguese, Italian, Russian, Korean, and Dutch.

In addition to multilingual translation, it also covers pre-translation activities, such as grammar improvement, and translation assessment tasks, such as machine translation evaluation and automatic post-editing. The researchers found that this model performed better than state-of-the-art counterparts in translation and better than alternative open-source solutions, including ALMA 13B and LLaMA-2 70B.

The researchers used two stages to build Tower: continued pre-training and instruction tuning. They emphasized that continued pre-training enhances LLaMA 2's proficiency in non-English languages, while instruction tuning improves its performance on particular problems without prior experience. For continued pre-training, they used a dataset of 20 billion tokens evenly distributed among the different languages. Two-thirds of the tokens came from monolingual data, and one-third came from publicly accessible bilingual datasets, such as OPUS.

The second step of instruction tuning enhanced the model’s ability to handle specific tasks at a higher level in a 0-shot fashion. They developed a dataset named TowerBlocks for supervised fine-tuning. The dataset comprises code instructions and conversational data and has task-specific records. This dataset helped the model to maintain competency across various translation-related tasks by providing prompts for all tasks, including zero and few-shot templates.
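Once instruction-tuned, the model can be prompted zero-shot for translation. Below is a minimal usage sketch, assuming the released checkpoint is published on the Hugging Face Hub under an identifier such as Unbabel/TowerInstruct-7B-v0.1 (an assumed ID used here for illustration) and can be driven through the standard transformers pipeline:

import torch
from transformers import pipeline

# Assumed model identifier; check the official release for the exact checkpoint name.
generator = pipeline(
    "text-generation",
    model="Unbabel/TowerInstruct-7B-v0.1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Translate the following text from English into German.\nEnglish: The weather is lovely today.\nGerman:"
output = generator(prompt, max_new_tokens=64, do_sample=False)
print(output[0]["generated_text"])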

In conclusion, TowerInstruct can be a significant step in multilingual machine translation as it outperforms GPT-3.5 and Mixtral 8x7B models. Its features, including automatic post-edition, named-entity recognition, or source error correction, can be very helpful in this domain. As the researchers focus on enhancing the model’s efficiency, this model can be a revolutionary stride in multilingual translation. The researchers of this collaboration are also looking forward to the release of TowerEval, an evaluation repository focused on machine translation and related tasks. This will help users reproduce benchmarks and assess the performance of their language models against Tower’s standards. 


Fine-tune and deploy Llama 2 models cost-effectively in Amazon SageMak …

Today, we're excited to announce the availability of Llama 2 inference and fine-tuning support on AWS Trainium and AWS Inferentia instances in Amazon SageMaker JumpStart. Using AWS Trainium and Inferentia based instances through SageMaker can help users lower fine-tuning costs by up to 50% and lower deployment costs by 4.7x, while lowering per-token latency. Llama 2 is an auto-regressive generative text language model that uses an optimized transformer architecture. As a publicly available model, Llama 2 is designed for many NLP tasks such as text classification, sentiment analysis, language translation, language modeling, text generation, and dialogue systems. Fine-tuning and deploying LLMs like Llama 2 can be costly, and meeting real-time performance requirements for a good customer experience can be challenging. AWS Trainium and AWS Inferentia, enabled by the AWS Neuron software development kit (SDK), offer a high-performance, cost-effective option for training and inference of Llama 2 models.
In this post, we demonstrate how to deploy and fine-tune Llama 2 on Trainium and AWS Inferentia instances in SageMaker JumpStart.
Solution overview
In this blog, we will walk through the following scenarios:

Deploy Llama 2 on AWS Inferentia instances in both the Amazon SageMaker Studio UI, with a one-click deployment experience, and the SageMaker Python SDK.
Fine-tune Llama 2 on Trainium instances in both the SageMaker Studio UI and the SageMaker Python SDK.
Compare the performance of the fine-tuned Llama 2 model with that of pre-trained model to show the effectiveness of fine-tuning.

To get hands on, see the GitHub example notebook.
Deploy Llama 2 on AWS Inferentia instances using the SageMaker Studio UI and the Python SDK
In this section, we demonstrate how to deploy Llama 2 on AWS Inferentia instances using the SageMaker Studio UI for a one-click deployment and the Python SDK.
Discover the Llama 2 model on the SageMaker Studio UI
SageMaker JumpStart provides access to both publicly available and proprietary foundation models. Foundation models are onboarded and maintained from third-party and proprietary providers. As such, they are released under different licenses as designated by the model source. Be sure to review the license for any foundation model that you use. You are responsible for reviewing and complying with any applicable license terms and making sure they are acceptable for your use case before downloading or using the content.
You can access the Llama 2 foundation models through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we go over how to discover the models in SageMaker Studio.
SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all machine learning (ML) development steps, from preparing data to building, training, and deploying your ML models. For more details on how to get started and set up SageMaker Studio, refer to Amazon SageMaker Studio.
After you’re in SageMaker Studio, you can access SageMaker JumpStart, which contains pre-trained models, notebooks, and prebuilt solutions, under Prebuilt and automated solutions. For more detailed information on how to access proprietary models, refer to Use proprietary foundation models from Amazon SageMaker JumpStart in Amazon SageMaker Studio.

From the SageMaker JumpStart landing page, you can browse for solutions, models, notebooks, and other resources.
If you don’t see the Llama 2 models, update your SageMaker Studio version by shutting down and restarting. For more information about version updates, refer to Shut down and Update Studio Classic Apps.

You can also find other model variants by choosing Explore All Text Generation Models or searching for llama or neuron in the search box. You will be able to view the Llama 2 Neuron models on this page.

Deploy the Llama-2-13b model with SageMaker Jumpstart
You can choose the model card to view details about the model such as license, data used to train, and how to use it. You can also find two buttons, Deploy and Open notebook, which help you use the model using this no-code example.

When you choose either button, a pop-up will show the End User License Agreement and Acceptable Use Policy (AUP) for you to acknowledge.

After you acknowledge the policies, you can deploy the endpoint of the model and use it via the steps in the next section.
Deploy the Llama 2 Neuron model via the Python SDK
When you choose Deploy and acknowledge the terms, model deployment will start. Alternatively, you can deploy through the example notebook by choosing Open notebook. The example notebook provides end-to-end guidance on how to deploy the model for inference and clean up resources.
To deploy or fine-tune a model on Trainium or AWS Inferentia instances, you first need to call PyTorch Neuron (torch-neuronx) to compile the model into a Neuron-specific graph, which will optimize it for Inferentia's NeuronCores. Users can instruct the compiler to optimize for lowest latency or highest throughput, depending on the objectives of the application. In JumpStart, we pre-compiled the Neuron graphs for a variety of configurations, to allow users to skip compilation steps, enabling faster fine-tuning and deployment of models.
Note that the Neuron pre-compiled graph is created based on a specific version of the Neuron Compiler.
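For readers unfamiliar with that compilation step, the following is a minimal, generic sketch of ahead-of-time compilation with torch-neuronx (a toy model on a Neuron-enabled instance; this is not the JumpStart pre-compilation pipeline itself):

import torch
import torch_neuronx

# A toy model standing in for a real network; compiling Llama 2 itself is far more involved.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
).eval()
example_input = torch.rand(1, 128)

# Compile the model ahead of time into a Neuron-optimized graph for the NeuronCores.
neuron_model = torch_neuronx.trace(model, example_input)

# Save the compiled artifact so later deployments can load it without recompiling.
torch.jit.save(neuron_model, "model_neuron.pt")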
There are two ways to deploy Llama 2 on AWS Inferentia-based instances. The first method utilizes the pre-built configuration and allows you to deploy the model in just two lines of code. In the second, you have greater control over the configuration. Let's start with the first method, with the pre-built configuration, and use the pre-trained Llama 2 13B Neuron model as an example. The following code shows how to deploy Llama 2 13B with just two lines:

from sagemaker.jumpstart.model import JumpStartModel

model_id = "meta-textgenerationneuron-llama-2-13b"
model = JumpStartModel(model_id=model_id)
pretrained_predictor = model.deploy(accept_eula=False)  # set accept_eula=True to accept the EULA and deploy

To perform inference on these models, you need to set the argument accept_eula to True as part of the model.deploy() call. Setting this argument to True acknowledges that you have read and accepted the EULA of the model. The EULA can be found in the model card description or on the Meta website.
The default instance type for Llama 2 13B is ml.inf2.8xlarge. You can also try other supported model IDs:

meta-textgenerationneuron-llama-2-7b
meta-textgenerationneuron-llama-2-7b-f (chat model)
meta-textgenerationneuron-llama-2-13b-f (chat model)

Alternatively, if you want to have more control over the deployment configurations, such as context length, tensor parallel degree, and maximum rolling batch size, you can modify them via environment variables, as demonstrated in this section. The underlying Deep Learning Container (DLC) of the deployment is the Large Model Inference (LMI) NeuronX DLC. The environment variables are as follows:

OPTION_N_POSITIONS – The maximum numbers of input and output tokens. For example, if you compile the model with OPTION_N_POSITIONS as 512, then you can use an input token of 128 (input prompt size) with a maximum output token of 384 (the total of the input and output tokens has to be 512). For the maximum output token, any value below 384 is fine, but you can’t go beyond it (for example, input 256 and output 512).
OPTION_TENSOR_PARALLEL_DEGREE – The number of NeuronCores to load the model in AWS Inferentia instances.
OPTION_MAX_ROLLING_BATCH_SIZE – The maximum batch size for concurrent requests.
OPTION_DTYPE – The data type used to load the model.

The compilation of the Neuron graph depends on the context length (OPTION_N_POSITIONS), tensor parallel degree (OPTION_TENSOR_PARALLEL_DEGREE), maximum batch size (OPTION_MAX_ROLLING_BATCH_SIZE), and data type (OPTION_DTYPE) used to load the model. SageMaker JumpStart has pre-compiled Neuron graphs for a variety of configurations of the preceding parameters to avoid runtime compilation. As long as the environment variables fall into one of the following configurations, compilation of Neuron graphs will be skipped.

Llama-2 7B and Llama-2 7B Chat

Instance type | OPTION_N_POSITIONS | OPTION_MAX_ROLLING_BATCH_SIZE | OPTION_TENSOR_PARALLEL_DEGREE | OPTION_DTYPE
ml.inf2.xlarge | 1024 | 1 | 2 | fp16
ml.inf2.8xlarge | 2048 | 1 | 2 | fp16
ml.inf2.24xlarge | 4096 | 4 | 4 | fp16
ml.inf2.24xlarge | 4096 | 4 | 8 | fp16
ml.inf2.24xlarge | 4096 | 4 | 12 | fp16
ml.inf2.48xlarge | 4096 | 4 | 4 | fp16
ml.inf2.48xlarge | 4096 | 4 | 8 | fp16
ml.inf2.48xlarge | 4096 | 4 | 12 | fp16
ml.inf2.48xlarge | 4096 | 4 | 24 | fp16

Llama-2 13B and Llama-2 13B Chat

Instance type | OPTION_N_POSITIONS | OPTION_MAX_ROLLING_BATCH_SIZE | OPTION_TENSOR_PARALLEL_DEGREE | OPTION_DTYPE
ml.inf2.8xlarge | 1024 | 1 | 2 | fp16
ml.inf2.24xlarge | 2048 | 4 | 4 | fp16
ml.inf2.24xlarge | 4096 | 4 | 8 | fp16
ml.inf2.24xlarge | 4096 | 4 | 12 | fp16
ml.inf2.48xlarge | 2048 | 4 | 4 | fp16
ml.inf2.48xlarge | 4096 | 4 | 8 | fp16
ml.inf2.48xlarge | 4096 | 4 | 12 | fp16
ml.inf2.48xlarge | 4096 | 4 | 24 | fp16

The following is an example of deploying Llama 2 13B and setting all the available configurations.

from sagemaker.jumpstart.model import JumpStartModel

model_id = "meta-textgenerationneuron-llama-2-13b-f"
model = JumpStartModel(
    model_id=model_id,
    env={
        "OPTION_DTYPE": "fp16",
        "OPTION_N_POSITIONS": "4096",
        "OPTION_TENSOR_PARALLEL_DEGREE": "12",
        "OPTION_MAX_ROLLING_BATCH_SIZE": "4",
    },
    instance_type="ml.inf2.24xlarge",
)
pretrained_predictor = model.deploy(accept_eula=False)  # set accept_eula=True to accept the EULA and deploy

Now that we have deployed the Llama-2-13b model, we can run inference with it by invoking the endpoint. The following code snippet demonstrates using the supported inference parameters to control text generation:

max_length – The model generates text until the output length (which includes the input context length) reaches max_length. If specified, it must be a positive integer.
max_new_tokens – The model generates text until the output length (excluding the input context length) reaches max_new_tokens. If specified, it must be a positive integer.
num_beams – This indicates the number of beams used in beam search. If specified, it must be an integer greater than or equal to num_return_sequences.
no_repeat_ngram_size – The model ensures that a sequence of words of no_repeat_ngram_size is not repeated in the output sequence. If specified, it must be a positive integer greater than 1.
temperature – This controls the randomness in the output. A higher temperature results in an output sequence with low-probability words; a lower temperature results in an output sequence with high-probability words. If temperature equals 0, it results in greedy decoding. If specified, it must be a positive float.
early_stopping – If True, text generation is finished when all beam hypotheses reach the end of the sentence token. If specified, it must be Boolean.
do_sample – If True, the model samples the next word as per the likelihood. If specified, it must be Boolean.
top_k – In each step of text generation, the model samples from only the top_k most likely words. If specified, it must be a positive integer.
top_p – In each step of text generation, the model samples from the smallest possible set of words with a cumulative probability of top_p. If specified, it must be a float between 0–1.
stop – If specified, it must be a list of strings. Text generation stops if any one of the specified strings is generated.

The following code shows an example:

payload = {
    "inputs": "I believe the meaning of life is",
    "parameters": {
        "max_new_tokens": 64,
        "top_p": 0.9,
        "temperature": 0.6,
    },
}

response = pretrained_predictor.predict(payload)

Output:

I believe the meaning of life is
> to be happy. I believe that happiness is a choice. I believe that happiness
is a state of mind. I believe that happiness is a state of being. I believe that
happiness is a state of being. I believe that happiness is a state of being. I
believe that happiness is a state of being. I believe

For more information on the parameters in the payload, refer to Detailed parameters.
You can also explore the implementation of these parameters in the example notebook for more details.
Fine-tune Llama 2 models on Trainium instances using the SageMaker Studio UI and SageMaker Python SDK
Generative AI foundation models have become a primary focus in ML and AI, however, their broad generalization can fall short in specific domains like healthcare or financial services, where unique datasets are involved. This limitation highlights the need to fine-tune these generative AI models with domain-specific data to enhance their performance in these specialized areas.
Now that we have deployed the pre-trained version of the Llama 2 model, let’s look at how we can fine-tune this to domain-specific data to increase the accuracy, improve the model in terms of prompt completions, and adapt the model to your specific business use case and data. You can fine-tune the models using either the SageMaker Studio UI or SageMaker Python SDK. We discuss both methods in this section.
Fine-tune the Llama-2-13b Neuron model with SageMaker Studio
In SageMaker Studio, navigate to the Llama-2-13b Neuron model. On the Deploy tab, you can point to the Amazon Simple Storage Service (Amazon S3) bucket containing the training and validation datasets for fine-tuning. In addition, you can configure deployment configuration, hyperparameters, and security settings for fine-tuning. Then choose Train to start the training job on a SageMaker ML instance.

To use Llama 2 models, you need to accept the EULA and AUP. It will show up when you choose Train. Choose I have read and accept EULA and AUP to start the fine-tuning job.
You can view the status of your training job for the fine-tuned model on the SageMaker console by choosing Training jobs in the navigation pane.
You can either fine-tune your Llama 2 Neuron model using this no-code example, or fine-tune via the Python SDK, as demonstrated in the next section.
Fine-tune the Llama-2-13b Neuron model via the SageMaker Python SDK
You can fine-tune on the dataset with the domain adaptation format or the instruction-based fine-tuning format. The following are the instructions for how the training data should be formatted before being sent into fine-tuning:

Input – A train directory containing either a JSON lines (.jsonl) or text (.txt) formatted file.

For the JSON lines (.jsonl) file, each line is a separate JSON object. Each JSON object should be structured as a key-value pair, where the key should be text and the value is the content of one training example (see the example lines after this list).
The number of files under the train directory should equal 1.

Output – A trained model that can be deployed for inference.
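For example, a train.jsonl file for the domain adaptation format could contain lines like the following (the text shown is purely illustrative):

{"text": "The first training document goes here as a single string."}
{"text": "The second training document goes here as a single string."}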

In this example, we use a subset of the Dolly dataset in an instruction tuning format. The Dolly dataset contains approximately 15,000 instruction-following records for various categories, such as question answering, summarization, and information extraction. It is available under the Apache 2.0 license. We use the information_extraction examples for fine-tuning.

Load the Dolly dataset and split it into train (for fine-tuning) and test (for evaluation):

from datasets import load_dataset

dolly_dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

task = "information_extraction"
# To train for summarization/closed question and answering, replace the filter in the next line with example["category"] == "summarization" (or "closed_qa").
summarization_dataset = dolly_dataset.filter(lambda example: example["category"] == task)
summarization_dataset = summarization_dataset.remove_columns("category")

# We split the dataset into two where test data is used to evaluate at the end.
train_and_test_dataset = summarization_dataset.train_test_split(test_size=0.1)

# Dump the training data to a local file to be used for training.
train_and_test_dataset["train"].to_json("train.jsonl")

Use a prompt template for preprocessing the data in an instruction format for the training job:

prompt = ("""Below is an instruction that describes a task, paired with an input
that provides further context. Write a response that appropriately completes the
request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{context}###
Response:\n{response}\n\n<s>""")
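The example notebook applies this template to each record before writing the training file; a minimal sketch of that preprocessing step (assuming the Dolly column names instruction, context, and response) might look like the following:

def apply_prompt_template(example):
    # Fill the template with the Dolly fields; the result becomes the "text" field expected by the training job.
    return {"text": prompt.format(instruction=example["instruction"], context=example["context"], response=example["response"])}

train_dataset = train_and_test_dataset["train"].map(
    apply_prompt_template, remove_columns=list(train_and_test_dataset["train"].features)
)
train_dataset.to_json("train.jsonl")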

Examine the hyperparameters and overwrite them for your own use case:

from sagemaker import hyperparameters

model_id = "meta-textgenerationneuron-llama-2-13b"
model_version = "1.*"

my_hyperparameters = hyperparameters.retrieve_default(
    model_id=model_id, model_version=model_version
)

my_hyperparameters["max_input_length"] = "4096"  # you can increase the sequence length up to 4096
my_hyperparameters["max_steps"] = "25"
my_hyperparameters["learning_rate"] = "0.0001"
print(my_hyperparameters)

hyperparameters.validate(model_id=model_id, model_version=model_version, hyperparameters=my_hyperparameters)

Fine-tune the model and start a SageMaker training job. The fine-tuning scripts are based on the neuronx-nemo-megatron repository, which provides modified versions of the NeMo and Apex packages adapted for use with Neuron and EC2 Trn1 instances. The neuronx-nemo-megatron repository supports 3D (data, tensor, and pipeline) parallelism, allowing you to fine-tune LLMs at scale. The supported Trainium instances are ml.trn1.32xlarge and ml.trn1n.32xlarge.

from sagemaker.jumpstart.estimator import JumpStartEstimator

estimator = JumpStartEstimator(
    model_id=model_id,
    model_version=model_version,
    hyperparameters=my_hyperparameters,
    environment={"accept_eula": "false"},  # change accept_eula to "true" to accept the EULA
    # instance_type="ml.trn1n.32xlarge",   # if not specified, the default ml.trn1.32xlarge is used
)

# train_data_location should point to the S3 location of the uploaded train.jsonl training data
estimator.fit({"train": train_data_location})

Finally, deploy the fine-tuned model in a SageMaker endpoint:

finetuned_predictor = estimator.deploy()

Compare responses between the pre-trained and fine-tuned Llama 2 Neuron models
Now that we have deployed the pre-trained version of the Llama-2-13b model and fine-tuned it, we can view some performance comparisons of the prompt completions from both models, as shown in the following examples. We also offer an example to fine-tune Llama 2 on a SEC filing dataset in .txt format. For details, see the GitHub example notebook.

Each example below lists the input prompt, the ground truth, the response from the non-fine-tuned model, and the response from the fine-tuned model. The \n sequences are the literal newline escapes that appear in the raw prompts and completions.

Example 1

Input: Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nExtract universities Moret studied at and his year of graduation for each, and put them in a bulleted list.\n\n### Input:\nMoret was born in 1953 in Vevey Switzerland, and did his undergraduate studies at the École Polytechnique Fédérale de Lausanne (EPFL), graduating in 1975. He went on to graduate studies at the University of Tennessee, earning a Ph.D. in 1980. He then joined the faculty of the University of New Mexico, where he remained until 2006, when he moved to EPFL. He retired from EPFL in December 2016.\n\nIn 1996, Moret founded the ACM Journal of Experimental Algorithmics, and he remained editor in chief of the journal until 2003. In 2001, Moret founded the Workshop in Algorithms for Bioinformatics (WABI) and remains on the Steering Committee for the conference.\n\nIn 2018, Moret was elected as a Fellow of the International Society for Computational Biology, for his outstanding contributions to the fields of computational biology and bioinformatics.\n\n### Response:\n

Ground truth: * École Polytechnique Fédérale de Lausanne (EPFL) (graduated 1975)\n* University of Tennessee (graduated 1980)

Response from non-fine-tuned model: Moret studied at the École Polytechnique Fédérale de Lausanne (EPFL) and the University of Tennessee, graduating in 1975 and 1980, respectively.\n\n### Hint:\n\n- Use the `split` function to split the input into a list of sentences.\n- Use the `split` function to split each sentence into a list of words.\n- Use the `split` function to split

Response from fine-tuned model: - École Polytechnique Fédérale de Lausanne (EPFL), 1975\n- University of Tennessee, 1980\n- University of New Mexico\n- École Polytechnique Fédérale de Lausanne (EPFL), 2006\n\n<s> Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n

Example 2

Input: Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nFrom the passage identify the attributes of Dionysus. Display the results in a comma separated format.\n\n### Input:\nDionysus (Διόνυσος, Diónusos)/Bacchus (Βάκχος, Bákkhos)\n\nGod of wine, fruitfulness, parties, festivals, madness, chaos, drunkenness, vegetation, ecstasy, and the theater. He is the twice-born son of Zeus and Semele, in that Zeus snatched him from his mother's womb and stitched Dionysus into his own thigh and carried him until he was ready to be born. In art he is depicted as either an older bearded god (particularly before 430 BC) or an effeminate, long-haired youth (particularly after 430 BC). His attributes include the thyrsus, a drinking cup, the grape vine, and a crown of ivy. He is often in the company of his thiasos, a group of attendants including satyrs, maenads, and his old tutor Silenus. The consort of Dionysus was Ariadne. It was once held that Dionysius was a later addition to the Greek pantheon, but the discovery of Linear B tablets confirm his status as a deity from an early period. Bacchus was another name for him in Greek, and came into common usage among the Romans. His sacred animals include dolphins, serpents, tigers, and donkeys.\n\n### Response:\n

Ground truth: the thyrsus, a drinking cup, the grape vine, a crown of ivy

Response from non-fine-tuned model: Dionysus (Διόνυσος, Diónusos)\n\nGod of wine, fruitfulness, parties, festivals, madness, chaos, drunkenness, vegetation, ecstasy, and the theater. He is the twice-born son of Zeus and Semele, in that Zeus snatched him from his mother's womb and stitched Dionysus into his own thigh

Response from fine-tuned model: thyrsus, a drinking cup, the grape vine, a crown of ivy\n\n<s> Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat is the difference between Parthian and Arsacid Empire?\n\n### Input:\nThe Parthian Empire, also known as the Arsacid Empire was a

Example 3

Input: Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhy does the capital of Uzbekistan contain the biggest Greek community?\n\n### Input:\nThe Greeks in Uzbekistan number approximately 9,000. The community is made up of Greeks from Russia who were deported by force from that country to Uzbekistan in the 1940s, and political refugees from Greece. About 30,000 Greeks lived in the country before World War II and a further 11,000 arrived after the Greek Civil War and the defeat of Democratic Army of Greece. Their numbers have dwindled from a high of some 40,000 in the 1960s. The main reason is emigration to Greece after the end of the Cold War when laws allowed the return of all ethnic Greeks who had been exiled for political reasons after the Greek Civil War. The biggest Greek community in the country is in the capital city of Tashkent where most of the Greek political refugees were relocated by the Soviet authorities. In ancient times the south of the country was part of the Hellenistic Seleucid Kingdom but the few Greek communities there have since assimilated. The most important organization representing ethnic Greeks is the Greek Cultural Association of Tashkent. The activities of this organization include Greek language instruction (19 classes with a total of 571 students, classes for adults) and the teaching of Greek dances and music.\n\n### Response:\n

Ground truth: Most of the Greek political refugees in Uzbekistan were relocated by the Soviet authorities to the capital city of Tashkent.

Response from non-fine-tuned model: \n```\nThe capital of Uzbekistan contains the biggest Greek community because the Greeks in Uzbekistan number approximately 9,000. The community is made up of Greeks from Russia who were deported by force from that country to Uzbekistan in the 1940s, and political refugees from Greece. About 30,000 Greeks lived in the country before World War II and a further 11,00

Response from fine-tuned model: The capital of Uzbekistan contains the biggest Greek community because that is where most of the Greek political refugees were relocated by the Soviet authorities after being deported from Greece.\n\n<s> Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat is the difference between Parthian and Arsacid Empire?\n\n### Input:\n

We can see that the responses from the fine-tuned model demonstrate a significant improvement in precision, relevance, and clarity compared to those from the pre-trained model. In some cases, using the pre-trained model for your use case might not be enough, so fine-tuning it using this technique will make the solution more personalized to your dataset.
Clean up
After you have completed your training job and don’t want to use the existing resources anymore, delete the resources using the following code:

# Delete resources
# Delete the fine-tuned model
finetuned_predictor.delete_model()

# Delete the fine-tuned model endpoint
finetuned_predictor.delete_endpoint()
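If you also deployed the pre-trained endpoint earlier and no longer need it, it can be removed the same way (sketch):

# Delete the pre-trained model and its endpoint as well, if they are no longer needed
pretrained_predictor.delete_model()
pretrained_predictor.delete_endpoint()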

Conclusion
The deployment and fine-tuning of Llama 2 Neuron models on SageMaker demonstrate a significant advancement in managing and optimizing large-scale generative AI models. These models, including variants like Llama-2-7b and Llama-2-13b, use Neuron for efficient training and inference on AWS Inferentia and Trainium based instances, enhancing their performance and scalability.
The ability to deploy these models through the SageMaker JumpStart UI and Python SDK offers flexibility and ease of use. The Neuron SDK, with its support for popular ML frameworks and high-performance capabilities, enables efficient handling of these large models.
Fine-tuning these models on domain-specific data is crucial for enhancing their relevance and accuracy in specialized fields. The process, which you can conduct through the SageMaker Studio UI or Python SDK, allows for customization to specific needs, leading to improved model performance in terms of prompt completions and response quality.
Comparatively, the pre-trained versions of these models, while powerful, may provide more generic or repetitive responses. Fine-tuning tailors the model to specific contexts, resulting in more accurate, relevant, and diverse responses. This customization is particularly evident when comparing responses from pre-trained and fine-tuned models, where the latter demonstrates a noticeable improvement in quality and specificity of output. In conclusion, the deployment and fine-tuning of Neuron Llama 2 models on SageMaker represent a robust framework for managing advanced AI models, offering significant improvements in performance and applicability, especially when tailored to specific domains or tasks.
Get started today by referencing the sample SageMaker notebook.
For more information on deploying and fine-tuning pre-trained Llama 2 models on GPU-based instances, refer to Fine-tune Llama 2 for text generation on Amazon SageMaker JumpStart and Llama 2 foundation models from Meta are now available in Amazon SageMaker JumpStart.
The authors would like to acknowledge the technical contributions of Evan Kravitz, Christopher Whitten, Adam Kozdrowicz, Manan Shah, Jonathan Guinegagne and Mike James.

About the Authors
Xin Huang is a Senior Applied Scientist for Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms. He focuses on developing scalable machine learning algorithms. His research interests are in the area of natural language processing, explainable deep learning on tabular data, and robust analysis of non-parametric space-time clustering. He has published many papers in ACL, ICDM, KDD conferences, and Royal Statistical Society: Series A.
Nitin Eusebius is a Sr. Enterprise Solutions Architect at AWS, experienced in Software Engineering, Enterprise Architecture, and AI/ML. He is deeply passionate about exploring the possibilities of generative AI. He collaborates with customers to help them build well-architected applications on the AWS platform, and is dedicated to solving technology challenges and assisting with their cloud journey.
Madhur Prashant works in the generative AI space at AWS. He is passionate about the intersection of human thinking and generative AI. His interests lie in generative AI, specifically building solutions that are helpful and harmless, and most of all optimal for customers. Outside of work, he loves doing yoga, hiking, spending time with his twin, and playing the guitar.
Dewan Choudhury is a Software Development Engineer with Amazon Web Services. He works on Amazon SageMaker’s algorithms and JumpStart offerings. Apart from building AI/ML infrastructures, he is also passionate about building scalable distributed systems.
Hao Zhou is a Research Scientist with Amazon SageMaker. Before that, he worked on developing machine learning methods for fraud detection for Amazon Fraud Detector. He is passionate about applying machine learning, optimization, and generative AI techniques to various real-world problems. He holds a PhD in Electrical Engineering from Northwestern University.
Qing Lan is a Software Development Engineer in AWS. He has been working on several challenging products in Amazon, including high performance ML inference solutions and high performance logging system. Qing’s team successfully launched the first Billion-parameter model in Amazon Advertising with very low latency required. Qing has in-depth knowledge on the infrastructure optimization and Deep Learning acceleration.
Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker built-in algorithms and helps develop machine learning algorithms. He got his PhD from University of Illinois Urbana-Champaign. He is an active researcher in machine learning and statistical inference, and has published many papers in NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP conferences.
Dr. Li Zhang is a Principal Product Manager-Technical for Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms, a service that helps data scientists and machine learning practitioners get started with training and deploying their models, and uses reinforcement learning with Amazon SageMaker. His past work as a principal research staff member and master inventor at IBM Research has won the test of time paper award at IEEE INFOCOM.
Kamran Khan is a Sr. Technical Business Development Manager for AWS Inferentia/Trainium at AWS. He has over a decade of experience helping customers deploy and optimize deep learning training and inference workloads using AWS Inferentia and AWS Trainium.
Joe Senerchia is a Senior Product Manager at AWS. He defines and builds Amazon EC2 instances for deep learning, artificial intelligence, and high-performance computing workloads.

Use mobility data to derive insights using Amazon SageMaker geospatial …

Geospatial data is data about specific locations on the earth's surface. It can represent a geographical area as a whole or an event associated with a geographical area. Analysis of geospatial data is sought after in a number of industries and involves understanding where the data exists from a spatial perspective and why it exists there.
There are two types of geospatial data: vector data and raster data. Raster data is a matrix of cells represented as a grid, and mostly represents photographs and satellite imagery. In this post, we focus on vector data, which is represented as geographical coordinates of latitude and longitude, as well as lines and polygons (areas) connecting or encompassing them. Vector data has a multitude of use cases for deriving mobility insights. User mobile data is one such source, derived mostly from the geographical position of mobile devices using GPS, or from app publishers using SDKs or similar integrations. For the purpose of this post, we refer to this data as mobility data.
This is a two-part series. In this first post, we introduce mobility data, its sources, and a typical schema of this data. We then discuss the various use cases and explore how you can use AWS services to clean the data, how machine learning (ML) can aid in this effort, and how you can make ethical use of the data in generating visuals and insights. The second post will be more technical in nature and cover these steps in detail alongside sample code. This post does not have a sample dataset or sample code, rather it covers how to use the data after it’s purchased from a data aggregator.
You can use Amazon SageMaker geospatial capabilities to overlay mobility data on a base map and provide layered visualization to make collaboration easier. The GPU-powered interactive visualizer and Python notebooks provide a seamless way to explore millions of data points in a single window and share insights and results.
Sources and schema
There are a few sources of mobility data. Apart from GPS pings and app publishers, other sources are used to augment the dataset, such as Wi-Fi access points, bid stream data obtained via serving ads on mobile devices, and specific hardware transmitters placed by businesses (for example, in physical stores). It's often difficult for businesses to collect this data themselves, so they may purchase it from data aggregators. Data aggregators collect mobility data from various sources, clean it, add noise, and make the data available on a daily basis for specific geographic regions. Due to the nature of the data itself and because it's difficult to obtain, the accuracy and quality of this data can vary considerably, and it's up to the businesses to appraise and verify this by using metrics such as daily active users, total daily pings, and average daily pings per device. The following table shows what a typical schema of a daily data feed sent by data aggregators may look like.

Attribute – Description
Id or MAID – Mobile Advertising ID (MAID) of the device (hashed)
lat – Latitude of the device
lng – Longitude of the device
geohash – Geohash location of the device
device_type – Operating system of the device (IDFA or GAID)
horizontal_accuracy – Accuracy of horizontal GPS coordinates (in meters)
timestamp – Timestamp of the event
ip – IP address
alt – Altitude of the device (in meters)
speed – Speed of the device (in meters/second)
country – ISO two-digit code for the country of origin
state – Codes representing state
city – Codes representing city
zipcode – Zip code of where the device ID is seen
carrier – Carrier of the device
device_manufacturer – Manufacturer of the device

Use cases
Mobility data has widespread applications in varied industries. The following are some of the most common use cases:

Density metrics – Foot traffic analysis can be combined with population density to observe activities and visits to points of interest (POIs). These metrics present a picture of how many devices or users are actively stopping and engaging with a business, which can be further used for site selection or even analyzing movement patterns around an event (for example, people traveling for a game day). To obtain such insights, the incoming raw data goes through an extract, transform, and load (ETL) process to identify activities or engagements from the continuous stream of device location pings. We can analyze activities by identifying stops made by the user or mobile device by clustering pings using ML models in Amazon SageMaker.
Trips and trajectories – A device’s daily location feed can be expressed as a collection of activities (stops) and trips (movement). A pair of activities can represent a trip between them, and tracing the trip by the moving device in geographical space can lead to mapping the actual trajectory. Trajectory patterns of user movements can lead to interesting insights such as traffic patterns, fuel consumption, city planning, and more. It can also provide data to analyze the route taken from advertising points such as a billboard, identify the most efficient delivery routes to optimize supply chain operations, or analyze evacuation routes in natural disasters (for example, hurricane evacuation).
Catchment area analysis – A catchment area refers to places from where a given area draws its visitors, who may be customers or potential customers. Retail businesses can use this information to determine the optimal location to open a new store, or determine if two store locations are too close to each other with overlapping catchment areas and are hampering each other’s business. They can also find out where the actual customers are coming from, identify potential customers who pass by the area traveling to work or home, analyze similar visitation metrics for competitors, and more. Marketing Tech (MarTech) and Advertisement Tech (AdTech) companies can also use this analysis to optimize marketing campaigns by identifying the audience close to a brand’s store or to rank stores by performance for out-of-home advertising.

There are several other use cases, including generating location intelligence for commercial real estate, augmenting satellite imagery data with footfall numbers, identifying delivery hubs for restaurants, determining neighborhood evacuation likelihood, discovering people movement patterns during a pandemic, and more.
Challenges and ethical use
Ethical use of mobility data can lead to many interesting insights that can help organizations improve their operations, perform effective marketing, or even attain a competitive advantage. To utilize this data ethically, several steps need to be followed.
It starts with the collection of data itself. Although most mobility data remains free of personally identifiable information (PII) such as name and address, data collectors and aggregators must have the user’s consent to collect, use, store, and share their data. Data privacy laws such as GDPR and CCPA need to be adhered to because they empower users to determine how businesses can use their data. This first step is a substantial move towards ethical and responsible use of mobility data, but more can be done.
Each device is assigned a hashed Mobile Advertising ID (MAID), which is used to anchor the individual pings. This can be further obfuscated by using Amazon Macie, Amazon S3 Object Lambda, Amazon Comprehend, or even the AWS Glue Studio Detect PII transform. For more information, refer to Common techniques to detect PHI and PII data using AWS Services.
Apart from PII, considerations should be made to mask the user’s home location as well as other sensitive locations like military bases or places of worship.
The final step for ethical use is to derive and export only aggregated metrics out of Amazon SageMaker. This means getting metrics such as the average or total number of visitors as opposed to individual travel patterns; getting daily, weekly, monthly, or yearly trends; or indexing mobility patterns over publicly available data such as census data.
Solution overview
As mentioned earlier, the AWS services that you can use for analysis of mobility data are Amazon S3, Amazon Macie, AWS Glue, S3 Object Lambda, Amazon Comprehend, and Amazon SageMaker geospatial capabilities. Amazon SageMaker geospatial capabilities make it easy for data scientists and ML engineers to build, train, and deploy models using geospatial data. You can efficiently transform or enrich large-scale geospatial datasets, accelerate model building with pre-trained ML models, and explore model predictions and geospatial data on an interactive map using 3D accelerated graphics and built-in visualization tools.
The following reference architecture depicts a workflow using ML with geospatial data.

In this workflow, raw data is aggregated from various data sources and stored in an Amazon Simple Storage Service (Amazon S3) bucket. Amazon Macie is used on this S3 bucket to identify and redact any PII. AWS Glue is then used to clean and transform the raw data to the required format, and the modified and cleaned data is stored in a separate S3 bucket. For data transformations that are not possible with AWS Glue, you use AWS Lambda to modify and clean the raw data. When the data is cleaned, you can use Amazon SageMaker to build, train, and deploy ML models on the prepped geospatial data. You can also use the geospatial Processing jobs feature of Amazon SageMaker geospatial capabilities to preprocess the data—for example, using a Python function and SQL statements to identify activities from the raw mobility data. Data scientists can accomplish this process by connecting through Amazon SageMaker notebooks. You can also use Amazon QuickSight to visualize business outcomes and other important metrics from the data.
Amazon SageMaker geospatial capabilities and geospatial Processing jobs
After the data is obtained and fed into Amazon S3 with a daily feed and cleaned for any sensitive data, it can be imported into Amazon SageMaker using an Amazon SageMaker Studio notebook with a geospatial image. The following screenshot shows a sample of daily device pings uploaded into Amazon S3 as a CSV file and then loaded in a pandas data frame. The Amazon SageMaker Studio notebook with geospatial image comes preloaded with geospatial libraries such as GDAL, GeoPandas, Fiona, and Shapely, and makes it straightforward to process and analyze this data.

This sample dataset contains approximately 400,000 daily device pings from 5,000 devices from 14,000 unique places recorded from users visiting the Arrowhead Mall, a popular shopping mall complex in Phoenix, Arizona, on May 15, 2023. The preceding screenshot shows a subset of columns in the data schema. The MAID column represents the device ID, and each MAID generates pings every minute relaying the latitude and longitude of the device, recorded in the sample file as Lat and Lng columns.
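As a rough sketch of this loading step (the file path is hypothetical, and the timestamp format is assumed to be epoch seconds), the daily ping CSV can be read with pandas and converted into a GeoPandas GeoDataFrame using the MAID, Lat, and Lng columns described above:

import pandas as pd
import geopandas as gpd

# Hypothetical S3 location of the daily ping feed (reading s3:// paths requires s3fs)
pings = pd.read_csv("s3://your-bucket/mobility/2023-05-15/pings.csv")

# Parse timestamps (assumed to be epoch seconds) and build point geometries from Lat/Lng
pings["timestamp"] = pd.to_datetime(pings["timestamp"], unit="s")
pings_gdf = gpd.GeoDataFrame(
    pings,
    geometry=gpd.points_from_xy(pings["Lng"], pings["Lat"]),
    crs="EPSG:4326",  # WGS84 latitude/longitude
)
print(pings_gdf[["MAID", "Lat", "Lng", "timestamp"]].head())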
The following are screenshots from the map visualization tool of Amazon SageMaker geospatial capabilities powered by Foursquare Studio, depicting the layout of pings from devices visiting the mall between 7:00 AM and 6:00 PM.
The following screenshot shows pings from the mall and surrounding areas.

The following shows pings from inside various stores in the mall.

Each dot in the screenshots depicts a ping from a given device at a given point in time. A cluster of pings represents popular spots where devices gathered or stopped, such as stores or restaurants.
As part of the initial ETL, this raw data can be loaded onto tables using AWS Glue. You can create an AWS Glue crawler to identify the schema of the data and form tables by pointing to the raw data location in Amazon S3 as the data source.
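As a minimal sketch of this step (the bucket, database, and IAM role names are placeholders), a crawler can be created and started with boto3; you can equally create it through the AWS Glue console:

import boto3

glue = boto3.client("glue")

# Placeholder names: replace with your own role, database, and bucket prefix
glue.create_crawler(
    Name="mobility-raw-pings-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="mobility_db",
    Targets={"S3Targets": [{"Path": "s3://your-bucket/mobility/raw/"}]},
)

# Run the crawler to infer the schema and create the table
glue.start_crawler(Name="mobility-raw-pings-crawler")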

As mentioned above, the raw data (the daily device pings), even after initial ETL, will represent a continuous stream of GPS pings indicating device locations. To extract actionable insights from this data, we need to identify stops and trips (trajectories). This can be achieved using the geospatial Processing jobs feature of SageMaker geospatial capabilities. Amazon SageMaker Processing uses a simplified, managed experience on SageMaker to run data processing workloads with the purpose-built geospatial container. The underlying infrastructure for a SageMaker Processing job is fully managed by SageMaker. This feature enables custom code to run on geospatial data stored on Amazon S3 by running a geospatial ML container on a SageMaker Processing job. You can run custom operations on open or private geospatial data by writing custom code with open source libraries, and run the operation at scale using SageMaker Processing jobs. The container-based approach solves for needs around standardization of development environment with commonly used open source libraries.
To run such large-scale workloads, you need a flexible compute cluster that can scale from tens of instances to process a city block, to thousands of instances for planetary-scale processing. Manually managing a DIY compute cluster is slow and expensive. This feature is particularly helpful when the mobility dataset involves more than a few cities to multiple states or even countries and can be used to run a two-step ML approach.
The first step is to use the density-based spatial clustering of applications with noise (DBSCAN) algorithm to cluster stops from pings. The next step is to use the support vector machine (SVM) method to further improve the accuracy of the identified stops and to distinguish stops with engagements at a POI from stops without one (such as home or work). You can also use a SageMaker Processing job to generate trips and trajectories from the daily device pings by identifying consecutive stops and mapping the path between the source and destination stops.
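To make the first step concrete, here is a minimal sketch of clustering one device's pings into stop candidates with scikit-learn's DBSCAN and a haversine distance metric; the 50-meter radius follows the post, but the feature engineering (time gaps, dwell-time thresholds, the SVM refinement) is simplified away.

import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6371000.0

def cluster_stops(device_pings, eps_meters=50, min_samples=5):
    """Cluster one device's pings into stop candidates.

    device_pings: array of shape (n, 2) with [lat, lng] in degrees.
    Returns a label per ping; -1 marks noise (pings in motion).
    """
    coords_rad = np.radians(device_pings)
    db = DBSCAN(
        eps=eps_meters / EARTH_RADIUS_M,  # convert meters to radians for haversine
        min_samples=min_samples,
        metric="haversine",
    ).fit(coords_rad)
    return db.labels_

# Example: three tight pings near the mall and one far away
labels = cluster_stops(np.array([
    [33.6390, -112.2260],
    [33.6391, -112.2261],
    [33.6390, -112.2259],
    [33.7000, -112.0000],
]), min_samples=2)
print(labels)  # e.g. [0, 0, 0, -1]

In practice you would also incorporate time (for example, splitting clusters when consecutive pings are far apart in time) and apply the dwell-time threshold before treating a cluster as a stop.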
After processing the raw data (daily device pings) at scale with geospatial Processing jobs, the new dataset called stops should have the following schema.

Attribute – Description
Id or MAID – Mobile Advertising ID of the device (hashed)
lat – Latitude of the centroid of the stop cluster
lng – Longitude of the centroid of the stop cluster
geohash – Geohash location of the POI
device_type – Operating system of the device (IDFA or GAID)
timestamp – Start time of the stop
dwell_time – Dwell time of the stop (in seconds)
ip – IP address
alt – Altitude of the device (in meters)
country – ISO two-digit code for the country of origin
state – Codes representing state
city – Codes representing city
zipcode – Zip code of where the device ID is seen
carrier – Carrier of the device
device_manufacturer – Manufacturer of the device

Stops are consolidated by clustering the pings per device. The density-based clustering uses parameters such as a stop threshold of 300 seconds and a minimum distance between stops of 50 meters. You can adjust these parameters for your use case.
The following screenshot shows approximately 15,000 stops identified from 400,000 pings. A subset of the preceding schema is present as well, where the column Dwell Time represents the stop duration, and the Lat and Lng columns represent the latitude and longitude of the centroids of the stops cluster per device per location.
Post-ETL, data is stored in Parquet file format, which is a columnar storage format that makes it easier to process large amounts of data.
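As a small illustrative sketch (the bucket path and partition column are hypothetical, and the timestamp is assumed to be epoch seconds), the stops dataset can be written to partitioned Parquet with pandas and PyArrow:

import pandas as pd

stops = pd.read_csv("stops.csv")  # hypothetical local export of the stops dataset
stops["event_date"] = pd.to_datetime(stops["timestamp"], unit="s").dt.date

# Write columnar Parquet to S3, partitioned by day for efficient downstream scans
stops.to_parquet(
    "s3://your-bucket/mobility/stops/",
    partition_cols=["event_date"],
    engine="pyarrow",
)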

The following screenshot shows the stops consolidated from pings per device inside the mall and surrounding areas.

After identifying stops, this dataset can be joined with publicly available POI data or custom POI data specific to the use case to identify activities, such as engagement with brands.
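One way to perform this join, sketched below under the assumption that the POI boundaries are available as polygons in a GeoDataFrame (the file names and poi_name column are placeholders), is a GeoPandas spatial join between stop centroids and POI polygons; note that the predicate argument was called op in older GeoPandas versions.

import geopandas as gpd

# stops_gdf: stop centroids (point geometry, one row per stop)
# pois_gdf: POI boundaries (polygon geometry, with a 'poi_name' column)
stops_gdf = gpd.read_file("stops.geojson")
pois_gdf = gpd.read_file("arrowhead_pois.geojson")

# Keep only stops that fall inside a POI polygon; each match becomes an activity
activities = gpd.sjoin(stops_gdf, pois_gdf, how="inner", predicate="within")

# Count activities (engagements) per POI
print(activities.groupby("poi_name").size().sort_values(ascending=False).head(15))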
The following screenshot shows the stops identified at major POIs (stores and brands) inside the Arrowhead Mall.

To maintain privacy, each visitor's home location is masked to their home zip code if it appears as part of a trip in the dataset. In such cases, the latitude and longitude are the coordinates of the centroid of that zip code.
The following screenshot is a visual representation of such activities. The left image maps the stops to the stores, and the right image gives an idea of the layout of the mall itself.

This resulting dataset can be visualized in a number of ways, which we discuss in the following sections.
Density metrics
We can calculate and visualize the density of activities and visits.
Example 1 – The following screenshot shows the top 15 visited stores in the mall.

Example 2 – The following screenshot shows the number of visits to the Apple Store by hour.

Trips and trajectories
As mentioned earlier, a pair of consecutive activities represents a trip. We can use the following approach to derive trips from the activities data. Here, window functions are used with SQL to generate the trips table, as shown in the screenshot.
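As a rough illustration of the same pairing logic in pandas (assuming the stops dataset is already loaded as a DataFrame with the MAID, lat, lng, and timestamp columns from the schema above), each device's stops are sorted by start time and shifted one row to obtain the destination of each trip:

import pandas as pd

def derive_trips(stops: pd.DataFrame) -> pd.DataFrame:
    """Pair each stop with the same device's next stop to form a trip."""
    stops = stops.sort_values(["MAID", "timestamp"]).copy()
    grouped = stops.groupby("MAID")
    stops["dest_lat"] = grouped["lat"].shift(-1)
    stops["dest_lng"] = grouped["lng"].shift(-1)
    stops["dest_timestamp"] = grouped["timestamp"].shift(-1)
    # Drop each device's last stop, which has no following stop to pair with
    trips = stops.dropna(subset=["dest_lat"]).rename(
        columns={"lat": "origin_lat", "lng": "origin_lng"}
    )
    return trips[["MAID", "origin_lat", "origin_lng", "timestamp",
                  "dest_lat", "dest_lng", "dest_timestamp"]]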

After the trips table is generated, trips to a POI can be determined.
Example 1 – The following screenshot shows the top 10 stores that direct foot traffic towards the Apple Store.

Example 2 – The following screenshot shows all the trips to the Arrowhead Mall.

Example 3 – The following video shows the movement patterns inside the mall.

Example 4 – The following video shows the movement patterns outside the mall.

Catchment area analysis
We can analyze all visits to a POI and determine the catchment area.
Example 1 – The following screenshot shows all visits to the Macy’s store.

Example 2 – The following screenshot shows the top 10 home area zip codes (boundaries highlighted) from where the visits occurred.

Data quality check
We can check the daily incoming data feed for quality and detect anomalies using QuickSight dashboards and data analyses. The following screenshot shows an example dashboard.

Conclusion
Mobility data and its analysis for gaining customer insights and competitive advantage remain a niche area because it's difficult to obtain a consistent and accurate dataset. However, this data can help organizations add context to existing analyses and even produce new insights around customer movement patterns. Amazon SageMaker geospatial capabilities and geospatial Processing jobs can help implement these use cases and derive insights in an intuitive and accessible way.
In this post, we demonstrated how to use AWS services to clean the mobility data and then use Amazon SageMaker geospatial capabilities to generate derivative datasets such as stops, activities, and trips using ML models. Then we used the derivative datasets to visualize movement patterns and generate insights.
You can get started with Amazon SageMaker geospatial capabilities in two ways:

Through the Amazon SageMaker geospatial UI, as a part of Amazon SageMaker Studio UI
Through Amazon SageMaker notebooks with an Amazon SageMaker geospatial image

To learn more, visit Amazon SageMaker geospatial capabilities and Getting Started with Amazon SageMaker geospatial. Also, visit our GitHub repo, which has several example notebooks on Amazon SageMaker geospatial capabilities.

About the Authors
Jimy Matthews is an AWS Solutions Architect with expertise in AI/ML. Jimy is based out of Boston and works with enterprise customers as they transform their business by adopting the cloud, helping them build efficient and sustainable solutions. He is passionate about his family, cars, and mixed martial arts.
Girish Keshav is a Solutions Architect at AWS, helping out customers in their cloud migration journey to modernize and run workloads securely and efficiently. He works with leaders of technology teams to guide them on application security, machine learning, cost optimization and sustainability. He is based out of San Francisco, and loves traveling, hiking, watching sports, and exploring craft breweries.
Ramesh Jetty is a Senior leader of Solutions Architecture focused on helping AWS enterprise customers monetize their data assets. He advises executives and engineers to design and build highly scalable, reliable, and cost effective cloud solutions, especially focused on machine learning, data and analytics. In his free time he enjoys the great outdoors, biking and hiking with his family.

10 Marketing Agency Trends You Should Know About in 2024

Welcome to 2024, a year that’s already shaping up to be a landmark in the world of marketing. 

As seasoned pros, we’ve seen trends come and go, but what’s unfolding this year is something different. Things are moving fast and there are some truly impactful trends that will reshape the marketing landscape. 

Let’s cut through the noise and dive straight into it. We’re not just talking about the next big thing – we’re talking real changes that are reshaping our industry. From leveraging cutting-edge AI to the loss of cookies to a focus on true personalization. 

I’ll walk you through the trends that are not just buzzwords, but game changers. So, grab your coffee, and let’s break down the marketing playbook for 2024.

What Are the Top Marketing Agency Trends for 2024?

At the heart of it, we are seeing 10 key trends in 2024:

Loss of Cookies

Privacy & Data Changes

Email Constraints

Purpose-Driven Marketing

Generative AI & Content Creation

Video Content is Still Here

AI-Based Personalization

Advanced Data Analytics

Agile Marketing 

Partnership Marketing

Let’s dive a bit deeper into each of these trends.

Marketing Agency Trend #1: Loss of Third-Party Cookies

Unless you've had your head buried in the sand, you are well aware of the "cookiepocalypse."

For years we’ve been talking about the removal of third-party cookies from major tech platforms.

Well, the time has finally come. 

Apple started blocking third-party cookies in 2023, and as of January 1, 2024, Chrome has begun its phaseout. Oh, and Microsoft is also deprecating cookies in early 2024.

What does this mean for marketing agencies?

Proving your value is about to get a lot harder.

So is campaign optimization.

For marketing agencies, the focus has to be on first-party data and more innovative attribution strategies. 

It really is a pivotal time for agencies to adapt and reinvent their methodologies, focusing on genuine connections and data integrity to stay ahead.

Marketing Agency Trend #2: Data & Privacy Changes

We’ve touched on privacy a lot here at Customers.ai (see Data Privacy Changes: A Must-Read Guide for Marketing Agencies).

The reason being, there have been A LOT of privacy and data changes that are impacting how marketing agencies do business. 

Last year we had the official switch to GA4. 

iOS 17 introduced Link Tracking Protection and the removal of Click IDs.

Then we had Meta announcing Ad Free Facebook subscriptions in the EU. 

On top of it, states passed privacy regulations at record pace. It’s a lot to keep up with! 

What does this mean for marketing agencies?

For a long time, marketing agencies were able to kind of roll with the punches when it came to privacy changes. 

Big changes like GDPR and CCPA made waves but overall they were fairly easy to adjust to. 

What we are seeing now is different. 

These updates are fundamentally impacting the way marketing agencies do business and it means that adjustments need to be made much quicker and innovation is not just necessary, but key. 

Marketing Agency Trend #3: Email Constraints

As of February 1, 2024, Google and Yahoo's new email regulations will officially be in effect.

If you aren't super up to date on what this means, we did a deep dive some time ago.

The gist of it is that emails, especially from bulk senders, are about to be scrutinized much more closely, and deliverability is going to be a challenge if the right steps aren't taken.

Two main points to be aware of:

Mandatory Digital Email Signing. Senders of more than 5,000 emails per day will need to use DomainKeys Identified Mail (DKIM) for digital signing. While this is already a best practice, non-signed emails are now treated more suspiciously.

New Complaint Rate Threshold. A complaint rate over 0.3% now poses the risk of being blocked. A complaint occurs when a user marks an email as spam, not when they unsubscribe. 

Along with the rollout of new email regulations, Google also rolled out a new AI email spam filter to better capture spam.

What does this mean for marketing agencies?

Email deliverability will become essential in 2024. With email providers cracking down on spam, there must be a focus on creating truly compelling messages that resonate with audiences.

It also means moving away from cold email tactics and toward warm leads. 

Why target people who have never heard of you when you can reach those who are already interested?

That’s where Customers.ai comes in. Our Website Visitor ID X-Ray Pixel captures the names and email addresses of those visiting your website. 

Think about it – even with a cart abandonment solution, you only capture people if they’ve put in their info. With website visitor identification, you can capture 10-20x the number of people! Perfect for remarketing cart abandonment emails. 

To install the Website Visitor ID X-Ray Pixel, sign up (for FREE!), go to your dashboard, and navigate to My Automations. 

Select + New Automation and get your pixel. We have easy install options for Google Tag Manager, WordPress, and Shopify, or you can install the pixel manually.

Marketing Agency Trend #4: Purpose-Driven Marketing

According to a study by Deloitte, over 25% of consumers walked away from a brand that they interpreted as insensitive. 

Customers want to feel good about who they are doing business with and they want to do business with companies who stand for something. It’s part of the reason people love REI!

What does this mean for marketing agencies?

What makes your clients unique? How are they helping people? Once you understand that, take that service and shoot it to the moon — get micro, get focused, and amplify your "here to help" messaging for better brand loyalty.

Marketing Agency Trend #5: Generative AI & Content Creation

Just like cookie deprecation, AI is not a thing of the future…it is here. And it’s already being used by big companies and major ad agencies.

According to Reuters, WPP, the world’s biggest advertising agency, is working with consumer goods companies including Nestle and Oreo-maker Mondelez to use generative AI in advertising campaigns.

Additionally, Authority Hacker’s AI survey showed that 85.1% of AI users use the technology for article writing and content creation.

The point? AI is here and whether you agree with it or not, it’s being used to create marketing and advertising content.

Image Source: Lady Smith News

What does this mean for marketing agencies?

AI opens tons of doors for marketing agencies and can certainly help improve efficiency in the content production process. 

The key is to keep the human element at the core of the messaging and avoid falling into the trap of publishing the bland, generic content that AI tends to put out. Oh, also fact checking.

While AI can put out some great content, it’s not an expert and it requires fact checking.

At the end of the day, marketing agencies need to be experimenting with AI if they want to stay ahead of the game.

Customers.ai for Agencies

Higher retainers, more revenue streams, and happier clients.

Book a Demo

Marketing Agency Trend #6: Video Content is Still Here

Has video made every trends list for the past 10+ years? Yup. 

Do we still need to include it? Yup.

Here are a few stats from Hubspot’s Video Marketing Survey:

In 2023, people watched, on average, 17 hours of online videos per week

People are 52% more likely to share video content than any other type of content

75% of viewers watch short-form video content on their mobile devices

Video isn’t going anywhere, yet marketers still aren’t using video as much as they could be. According to Sprout Social, only 35% of marketers anticipate utilizing YouTube in the next 12 months.

What does this mean for marketing agencies?

Video presents a huge opportunity for marketers and brands. Consumers are spending their time watching online videos making it the perfect place to put your ad. 

That being said, it’s important to put your ad where it will be seen. Consumers have become really good at avoiding ads, which could result in wasted spend. 

Image Source: Marketing Charts

If you are looking to get started in video, start small. Create a video for the blog or for the help section. Create short videos for social and see how your audience responds.

Like we said earlier, video isn’t going anywhere so it’s time to figure out how to make it work for you.

Marketing Agency Trend #7: AI-Based Personalization

As a marketer, the dream is personalization – serving really specific ads to individuals. 

Unfortunately, with the privacy changes we mentioned above and previous ones not mentioned, this is becoming harder and harder, especially in the ads space. 

AI is changing that. 

According to a Twilio report, 92% of businesses surveyed are using AI-driven personalization. 

What does this mean for marketing agencies?

With AI, marketers are able to actually craft hyper-personalized content, especially in email campaigns. 

And this level of personalization goes beyond just using a customer’s name – it’s about creating content based on behaviors, preferences, purchase history, and more. 

The payoff for agencies is clear: enhanced customer engagement, higher conversion rates, and more meaningful brand interactions. 

[Check out the Customers.ai AI Email Writer] 

Marketing Agency Trend #8: Advanced Data Analytics

Thanks to the privacy changes we’ve noted a few times and Google’s shift to GA4 (boo!), 2024’s marketing scene is seeing a surge in advanced data analytics, powered largely by Generative AI. 

Marketers are being forced to get crafty when it comes to attribution – this includes using sophisticated tools to sift through data and glean deeper insights into consumer preferences and behaviors. 

Image Source: Marketing Evolution

It’s important to note, this shift is not just about adapting to new privacy norms; it’s about turning a challenge into an opportunity to understand audiences on a more nuanced level. 

What does this mean for marketing agencies?

Agencies are now able to predict trends, tailor marketing strategies, and create more relevant content thanks to these more advanced data platforms. 

For example, a retailer could use these insights to predict upcoming fashion trends, aligning their marketing campaigns more closely with consumer interests. 

The key is to not dwell on what once was but instead focus on what can be. 

Marketing Agency Trend #9: Agile Marketing

It may be a stretch to call marketing agility a trend, as at first blush it seems more like a necessary survival skill. However, marketers are uniquely positioned to leverage digital tools and technology better than most.

When the world throws you a pandemic or mass social unrest, you’ve got to make quick changes to stay relevant in the marketplace. Your consumers’ needs ebb and flow like the tide and you’ve got to be the lifeboat on these choppy seas of change. 

A great way to keep pace is by having an omnichannel approach to marketing where you can keep tabs on all of the different channels from a single dashboard or platform. 

Image Source: Triple Whale

What does this mean for marketing agencies?

We know how fast things move and that has never felt more relevant than this year. 

Marketers need to ensure they are staying up on trends and adapting to changes as quickly as possible.

It means being proactive vs. reactive and ensuring you have the tools and technology you need to be successful. 

For example, consider adding more cross-channel marketing tools to your tech stack this year to both delight your clients and improve your agency’s internal efficiency.

Marketing Agency Trend #10: Partnership Marketing

Every industry is looking towards fractional solutions in various applications. When it comes to marketing — combining forces with other successful or driven marketing agencies is better all around. 

Here at Customers.ai, we have had some really great partner-driven campaigns. For example, just this past December, we did an event with our partners at Semrush to announce our product integration:

What does this mean for marketing agencies?

Partner marketing opens up opportunities you wouldn’t have otherwise. Not only will your agency become more scalable and able to take on more work but also you can focus on your expertise. 

Find partners that complement you and share similar target markets as your business. 

Remember, there are many different “partner marketing” models, but you’re basically outsourcing anything you haven’t mastered and freeing yourself up to become the master of all things.  

What Is The Biggest Trend In Agency Marketing? 

For 2024, the big trend is going to be across many verticals — being kind, being real, and being helpful. 

It might sound mushy but we’re quickly approaching a time where brands have to take a stance. Then, they have to stick to it and beat that drum until you and your clients are excited about what you believe in. 

While you might still be reeling from all of the changes in the marketing industry in recent years — stop monkeying around and start brainstorming new ways of doing things to maintain your marketing agency’s competitive edge. 

Want an extra advantage? Try Customers.ai for free and 10x your campaign performance!

Convert Website Visitors into Real Contacts!

Identify who is visiting your site with name, email and more. Get 500 contacts for free!


Important Next Steps

See what targeted outbound marketing is all about. Capture and engage your first 500 website visitor leads with Customers.ai X-Ray website visitor identification for free.

Talk and learn about sales outreach automation with other growth enthusiasts. Join Customers.ai Island, our Facebook group of 40K marketers and entrepreneurs who are ready to support you.

Advance your marketing performance with Sales Outreach School, a free tutorial and training area for sales pros and marketers.

Marketing Agency Trend FAQs

Q. What are the latest trends in digital marketing for agencies?

Current trends include increased use of artificial intelligence, personalized marketing, and a focus on data-driven strategies.

Q. How are marketing agencies adapting to changes in social media algorithms?

Agencies are focusing on creating more engaging content, utilizing stories and live videos, and leveraging influencer partnerships.

Q. What role does data analytics play in modern marketing agencies?

Data analytics is crucial for understanding consumer behavior, measuring campaign effectiveness, and guiding decision-making processes.

Q. How are marketing agencies integrating AI into their strategies?

AI is used for predictive analysis, customer segmentation, personalization, chatbots, and automating repetitive tasks.

Q. What is the significance of video marketing in current agency strategies?

Video marketing is increasingly important for engagement, with a focus on short-form videos, live streaming, and interactive content.

Q. How important is content marketing for agencies today?

Content marketing remains a key strategy, focusing on creating valuable, relevant content to attract and retain a clearly defined audience.

Q. What are the emerging trends in email marketing for agencies?

Trends include automation, personalization, segmentation, and integrating interactive elements in emails.

Q. How are agencies using SEO in current digital marketing strategies?

Agencies focus on voice search optimization, mobile-first indexing, and user experience as key elements of SEO.

Q. What is the role of influencer marketing in today’s agency strategies?

Influencer marketing is used to build brand authenticity and reach target audiences through trusted voices in various industries.

Q. How are marketing agencies tackling privacy and data protection?

Agencies are emphasizing transparency, compliance with regulations like GDPR, and ethical use of consumer data.

Q. What trends are shaping mobile marketing in agencies?

Trends include the increasing importance of mobile optimization, location-based marketing, and mobile apps for personalized experiences.

Q. How is user experience (UX) influencing marketing agency strategies?

UX is crucial for customer satisfaction, with a focus on website design, ease of navigation, and providing a seamless user journey.

Q. What are the trends in programmatic advertising for agencies?

Trends include the use of AI for real-time bidding, increased focus on transparency, and leveraging first-party data.

Q. How is augmented reality (AR) being used by marketing agencies?

AR is used for creating immersive brand experiences, interactive advertising, and enhancing online shopping experiences.

Q. What role does sustainability play in marketing agency trends?

Agencies are increasingly focusing on sustainable practices, eco-friendly campaigns, and promoting corporate social responsibility.

Q. How are chatbots transforming agency-customer interactions?

Chatbots are used for improving customer service, providing instant responses, and automating simple tasks.

Q. What impact is voice search having on marketing agencies’ strategies?

Agencies are optimizing content for voice search, focusing on conversational keywords, and ensuring mobile compatibility.

Q. How are agencies leveraging interactive content?

Interactive content like quizzes, polls, and interactive videos are used to engage users and provide personalized experiences.

Q. What are the trends in omnichannel marketing for agencies?

Omnichannel marketing focuses on creating a seamless customer experience across all channels and touchpoints.

Q. How are agencies addressing the need for marketing automation?

Agencies are implementing automation tools for efficient campaign management, lead nurturing, and analytics.

Q. What trends are impacting local SEO strategies in marketing agencies?

Key trends include optimizing for local search queries, focusing on local listings and reviews, and localized content.

Q. How is the gig economy affecting marketing agencies?

The gig economy is leading to more freelance and contract work, providing agencies with flexible staffing options.

Q. What is the significance of micro-moments in agency marketing strategies?

Micro-moments involve targeting customers at the exact moment they need a service, focusing on intent-driven marketing.

Q. How are agencies using predictive analytics?

Predictive analytics is used for forecasting trends, customer behavior analysis, and optimizing marketing strategies.

Q. What are the challenges faced by marketing agencies in adapting to new trends?

Challenges include keeping up with rapidly evolving technology, skill gaps, and aligning new trends with client objectives.
The post 10 Marketing Agency Trends You Should Know About in 2024 appeared first on Customers.ai.

This AI Paper from Harvard Explores the Frontiers of Privacy in AI: A …

Privacy concerns have become a significant issue in AI research, particularly in the context of Large Language Models (LLMs). The SAFR AI Lab at Harvard Business School conducted a survey to explore the intricate landscape of privacy issues associated with LLMs. The researchers focused on red-teaming models to highlight privacy risks, integrating privacy into the training process, efficiently deleting data from trained models, and mitigating copyright issues. Their emphasis lies on technical research, encompassing algorithm development, theorem proofs, and empirical evaluations.

The survey highlights the challenges of distinguishing desirable “memorization” from privacy-infringing instances. The researchers discuss the limitations of verbatim memorization filters and the complexities of fair use law in determining copyright violation. They also highlight researchers’ technical mitigation strategies, such as data filtering to prevent copyright infringement.

The survey provides insights into various datasets used in LLM training, including the AG News Corpus and BigPatent-G, which consist of news articles and US patent documents. The researchers also discuss the legal discourse surrounding copyright issues in LLMs, emphasizing the need for more solutions and modifications to safely deploy these models without risking copyright violations. They acknowledge the difficulty in quantifying creative novelty and intended use, underscoring the complexities of determining copyright violation.

The researchers discuss the use of differential privacy, which adds noise to the data to prevent the identification of individual users. They also discuss federated learning, which allows models to be trained on decentralized data sources without compromising privacy. The survey also highlights machine unlearning, which involves removing sensitive data from trained models to comply with privacy regulations.
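As a toy illustration of the "adding noise" idea behind differential privacy (not code from the survey), the following sketch applies the classic Laplace mechanism to a counting query, where the noise scale is the query's sensitivity divided by the privacy budget epsilon.

import numpy as np

def laplace_count(data, predicate, epsilon=1.0):
    """Release a differentially private count of records matching a predicate.

    A count query has sensitivity 1 (changing one record changes the count by
    at most 1), so Laplace noise with scale 1/epsilon gives epsilon-DP.
    """
    true_count = sum(1 for record in data if predicate(record))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

ages = [23, 37, 41, 29, 52, 61, 33]
print(laplace_count(ages, lambda a: a > 40, epsilon=0.5))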

The researchers demonstrate the effectiveness of differential privacy in mitigating privacy risks associated with LLMs. They also show that federated learning can train models on decentralized data sources without compromising privacy. The survey highlights machine unlearning to remove sensitive data from trained models to comply with privacy regulations.

The survey provides a comprehensive overview of the privacy challenges in Large Language Models, offering technical insights and mitigation strategies. It underscores the need for continued research and development to address the intricate intersection of privacy, copyright, and AI technology. The proposed methodology offers promising solutions to mitigate privacy risks associated with LLMs, and the performance and results demonstrate the effectiveness of these solutions. The survey highlights the importance of addressing privacy concerns in LLMs to ensure these models’ safe and ethical deployment.

Check out the Paper and Github. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter. Join our 36k+ ML SubReddit, 41k+ Facebook Community, Discord Channel, and LinkedIn Group.

If you like our work, you will love our newsletter..

Don’t Forget to join our Telegram Channel
The post This AI Paper from Harvard Explores the Frontiers of Privacy in AI: A Comprehensive Survey of Large Language Models’ Privacy Challenges and Solutions appeared first on MarkTechPost.

Meet CrewAI: An Artificial Intelligence Framework for Orchestrating Rol …

In artificial intelligence, the challenge arises when multiple AI agents need to work together seamlessly to tackle complex tasks. This collaborative intelligence is essential for building intelligent assistant platforms, automated customer service ensembles, or multi-agent research teams. Existing solutions, like Autogen and ChatDev, have their strengths, but they come with limitations, such as complex programming requirements and rigidity in customization.

In AI collaboration, Autogen stands out for creating conversational agents capable of teamwork. However, it lacks a built-in concept of processes, making it challenging to orchestrate agents’ interactions efficiently, especially as tasks scale up. On the other hand, ChatDev introduces the idea of processes but is rigid and limited in customization, making it less suitable for real-world applications that require flexibility and scalability.

CrewAI is a cutting-edge framework designed to address the challenges of AI collaboration. It empowers AI agents to assume specific roles, share goals, and operate as a cohesive unit, much like a well-oiled crew. CrewAI is the solution that combines the flexibility of Autogen’s conversational agents with the structured process approach of ChatDev, offering a dynamic and adaptable framework for both development and production workflows.

One notable advantage of CrewAI is its role-based agent design, which allows the customization of agents with specific roles, goals, and tools. This enables autonomous inter-agent delegation, where agents can independently delegate tasks and communicate with each other, enhancing overall problem-solving efficiency. The framework also offers flexible task management, allowing the definition of tasks with customizable tools and dynamic assignment to agents.
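As a rough sketch of what this role-based setup looks like in code, based on CrewAI's public examples at the time of writing (parameter names and defaults may differ between versions, the roles and tasks here are invented for illustration, and an LLM is assumed to be configured, by default through the OPENAI_API_KEY environment variable):

from crewai import Agent, Task, Crew

researcher = Agent(
    role="Research Analyst",
    goal="Summarize the latest developments in AI agent frameworks",
    backstory="An analyst who scans technical sources and distills key findings.",
    allow_delegation=True,   # let this agent delegate sub-tasks to teammates
)
writer = Agent(
    role="Tech Writer",
    goal="Turn research notes into a short, readable blog section",
    backstory="A writer who favors clear, concrete explanations.",
)

research_task = Task(
    description="Collect three notable facts about multi-agent frameworks.",
    expected_output="A bullet list of three facts with one-line explanations.",
    agent=researcher,
)
writing_task = Task(
    description="Write a 150-word summary based on the research notes.",
    expected_output="A 150-word paragraph.",
    agent=writer,
)

crew = Crew(agents=[researcher, writer], tasks=[research_task, writing_task])
result = crew.kickoff()   # runs the tasks sequentially by default
print(result)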

CrewAI’s essential feature lies in its support for various processes, currently focusing on sequential task execution. While Autogen and ChatDev have their merits, CrewAI stands out in adaptability for real-world scenarios. Its methods are designed to be versatile, fitting seamlessly into various use cases.

CrewAI’s capabilities include role-based agent effectiveness, autonomous delegation efficiency, and overall task management flexibility. The framework’s ability to handle diverse roles and facilitate independent delegation contributes to improved problem-solving metrics. Moreover, its dynamic task management capabilities ensure the efficient execution of tasks, catering to the specific needs of different scenarios.

In conclusion, CrewAI is a powerful solution for orchestrating role-playing autonomous AI agents. Its flexibility, adaptability, and collaborative intelligence set it apart among AI collaboration frameworks. As technology advances, the need for efficient collaboration among AI agents becomes more critical. CrewAI stands ready to meet these challenges head-on, providing a robust foundation for sophisticated multi-agent interactions.
The post Meet CrewAI: An Artificial Intelligence Framework for Orchestrating Role-Playing, Autonomous AI Agents appeared first on MarkTechPost.

Meet PIXART-δ: The Next-Generation AI Framework in Text-to-Image Synt …

In the landscape of text-to-image models, the demand for high-quality visuals has surged. However, these models often need to grapple with resource-intensive training and slow inference, hindering their real-time applicability. In response, this paper introduces PIXART-δ, an advanced iteration that seamlessly integrates Latent Consistency Models (LCM) and a custom ControlNet module into the existing PIXART-α framework. PIXART-α, renowned for its efficient training and superior image generation quality, serves as the foundation. LCM accelerates the inference process, generating high-quality samples in just 2∼4 steps on pre-trained Latent Diffusion Models (LDMs). This enhancement allows PIXART-δ to achieve an impressive inference speed of 0.5 seconds per 1024 × 1024 image on an A100 GPU, a 7× improvement over PIXART-α.

The incorporation of ControlNet, originally designed for UNet architectures, posed a unique challenge when applied to Transformer-based models like PIXART-δ. To overcome this, the team introduces a novel ControlNet-Transformer architecture, ensuring effective integration and preserving ControlNet’s efficacy in managing control information. The proposed design involves selectively applying the ControlNet structure to the initial N base blocks of the Transformer, demonstrating significant improvements in controllability and performance.

https://arxiv.org/abs/2401.05252

https://arxiv.org/abs/2401.05252

The training process leverages Latent Consistency Distillation (LCD), a refined version of the original Consistency Distillation (CD) algorithm. The pseudo-code for PIXART-δ with Classifier-Free Guidance (CFG) is outlined in Algorithm 1, where the Teacher, Student, and EMA Model (Figure 1) function as denoisers for the ODE solver. The innovative LCD algorithm demonstrates its effectiveness, which was evaluated using FID and CLIP scores as performance benchmarks (shown in Figure 2).

Training efficiency is a key highlight, with PIXART-δ successfully undergoing the distillation process within a 32GB GPU memory constraint, supporting image resolutions up to 1024 × 1024. This efficiency allows PIXART-δ to be trained on consumer-grade GPUs, expanding its accessibility.

In terms of inference speed, PIXART-δ outperforms comparable methods like SDXL LCM-LoRA, PIXART-α, and the SDXL standard across different hardware platforms. With only four steps, PIXART-δ maintains a consistent lead in generation speed, showcasing its efficiency compared to the 14 and 25 steps required by PIXART-α and SDXL standards, respectively.

https://arxiv.org/abs/2401.05252

https://arxiv.org/abs/2401.05252

The introduction of ControlNet into PIXART-δ involves replacing the original zero-convolution with a zero linear layer tailored for Transformer architectures. The ControlNet-Transformer design, illustrated in Fig. 4 (c), selectively applies ControlNet to the initial N base blocks, achieving a seamless integration that enhances controllability and overall performance.

An ablation study on ControlNet-Transformer reveals its superiority, demonstrating faster convergence and improved performance across different scenarios. The number of copied blocks (N) is found to impact performance, with satisfactory results achieved with N = 1 for most scenarios but improved performance in challenging edge conditions as N increases (shown in Figure 5).

Analyzing the effect of training steps on ControlNet-Transformer (N = 13), the team observes rapid convergence, particularly noticeable in enhancing the quality of outline edges for human faces and bodies. The efficiency and effectiveness of ControlNet-Transformer are further emphasized, reinforcing its potential for real-time applications.

In summary, PIXART-δ represents a significant advancement in text-to-image generation, combining accelerated sampling with Latent Consistency Models and precise control through the innovative ControlNet-Transformer. Extensive experiments showcase PIXART-δ’s faster sampling and ControlNet-Transformer’s effectiveness in high-resolution and controlled image generation. This model stands at the forefront of state-of-the-art image generation, unlocking new possibilities for real-time applications.

Check out the Paper, Project, and Github. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter. Join our 36k+ ML SubReddit, 41k+ Facebook Community, Discord Channel, and LinkedIn Group.

If you like our work, you will love our newsletter..

Don’t Forget to join our Telegram Channel
The post Meet PIXART-δ: The Next-Generation AI Framework in Text-to-Image Synthesis with Unparalleled Speed and Quality appeared first on MarkTechPost.

Host the Whisper Model on Amazon SageMaker: exploring inference option …

OpenAI Whisper is an advanced automatic speech recognition (ASR) model with an MIT license. ASR technology finds utility in transcription services, voice assistants, and enhancing accessibility for individuals with hearing impairments. This state-of-the-art model is trained on a vast and diverse dataset of multilingual and multitask supervised data collected from the web. Its high accuracy and adaptability make it a valuable asset for a wide array of voice-related tasks.
In the ever-evolving landscape of machine learning and artificial intelligence, Amazon SageMaker provides a comprehensive ecosystem. SageMaker empowers data scientists, developers, and organizations to develop, train, deploy, and manage machine learning models at scale. Offering a wide range of tools and capabilities, it simplifies the entire machine learning workflow, from data pre-processing and model development to effortless deployment and monitoring. SageMaker’s user-friendly interface makes it a pivotal platform for unlocking the full potential of AI, establishing it as a game-changing solution in the realm of artificial intelligence.
In this post, we embark on an exploration of SageMaker’s capabilities, specifically focusing on hosting Whisper models. We’ll dive deep into two methods for doing this: one utilizing the Whisper PyTorch model and the other using the Hugging Face implementation of the Whisper model. Additionally, we’ll conduct an in-depth examination of SageMaker’s inference options, comparing them across parameters such as speed, cost, payload size, and scalability. This analysis empowers users to make informed decisions when integrating Whisper models into their specific use cases and systems.
Solution overview
The following diagram shows the main components of this solution.

In order to host the model on Amazon SageMaker, the first step is to save the model artifacts. These artifacts refer to the essential components of a machine learning model needed for various applications, including deployment and retraining. They can include model parameters, configuration files, pre-processing components, as well as metadata, such as version details, authorship, and any notes related to its performance. It’s important to note that Whisper models for PyTorch and Hugging Face implementations consist of different model artifacts.
Next, we create custom inference scripts. Within these scripts, we define how the model should be loaded and specify the inference process. This is also where we can incorporate custom parameters as needed. Additionally, you can list the required Python packages in a requirements.txt file. During the model’s deployment, these Python packages are automatically installed in the initialization phase.
Then we select either the PyTorch or Hugging Face deep learning containers (DLC) provided and maintained by AWS. These containers are pre-built Docker images with deep learning frameworks and other necessary Python packages. For more information, you can check this link.
With the model artifacts, custom inference scripts and selected DLCs, we’ll create Amazon SageMaker models for PyTorch and Hugging Face respectively.
Finally, the models can be deployed on SageMaker and used with the following options: real-time inference endpoints, batch transform jobs, and asynchronous inference endpoints. We’ll dive into these options in more detail later in this post.

The example notebook and code for this solution are available on this GitHub repository.

Figure 1. Overview of Key Solution Components

Walkthrough
Hosting the Whisper Model on Amazon SageMaker
In this section, we’ll explain the steps to host the Whisper model on Amazon SageMaker, using PyTorch and Hugging Face Frameworks, respectively. To experiment with this solution, you need an AWS account and access to the Amazon SageMaker service.
PyTorch framework

Save model artifacts

The first option to host the model is to use the official Whisper Python package, which can be installed using pip install openai-whisper. This package provides a PyTorch model. When saving model artifacts in the local repository, the first step is to save the model's learnable parameters, such as the weights and biases of each layer in the neural network, as a 'pt' file. You can choose from different model sizes, including 'tiny', 'base', 'small', 'medium', and 'large'. Larger models offer higher accuracy but come at the cost of longer inference latency. Additionally, you need to save the model state dictionary and dimension dictionary, a Python dictionary that maps each layer or parameter of the PyTorch model to its corresponding learnable parameters, along with other metadata and custom configurations. The code below shows how to save the Whisper PyTorch artifacts.

### PyTorch
import torch
import whisper

# Load the PyTorch model and save it in the local repo
model = whisper.load_model("base")
torch.save(
    {
        'model_state_dict': model.state_dict(),
        'dims': model.dims.__dict__,
    },
    'base.pt'
)
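
The model_data URI used when creating the SageMaker model below must point to a model.tar.gz archive in Amazon S3. The exact packaging code is in the GitHub repository; the following is only a sketch of that step, with a placeholder S3 prefix.

import tarfile

import sagemaker
from sagemaker.s3 import S3Uploader

# Package the saved checkpoint into the tar.gz layout SageMaker expects
with tarfile.open('model.tar.gz', 'w:gz') as tar:
    tar.add('base.pt')

# Upload the archive; the returned URI is later passed as model_data
sess = sagemaker.Session()
bucket = sess.default_bucket()   # placeholder bucket
prefix = 'whisper/pytorch'       # placeholder prefix
model_uri = S3Uploader.upload('model.tar.gz', f's3://{bucket}/{prefix}')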

Select DLC

The next step is to select the pre-built DLC from this link. Be careful to choose the correct image by considering the following settings: framework (PyTorch), framework version, task (inference), Python version, and hardware (i.e., GPU). It is recommended to use the latest versions of the framework and Python whenever possible, as this results in better performance and addresses known issues and bugs from previous releases.
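
Rather than copying an image URI by hand, you can also look one up with the SageMaker SDK. The snippet below is a sketch; the framework version and Python version shown are illustrative values, not necessarily the ones used in the original notebook.

import sagemaker
from sagemaker import image_uris

sess = sagemaker.Session()

# Look up a pre-built PyTorch inference DLC (version values are illustrative)
image = image_uris.retrieve(
    framework='pytorch',
    region=sess.boto_region_name,
    version='2.0.0',
    py_version='py310',
    instance_type='ml.g4dn.xlarge',  # GPU instance selects a GPU image
    image_scope='inference',
)
print(image)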

Create Amazon SageMaker models

Next, we utilize the SageMaker Python SDK to create the PyTorch model. It's important to remember to add the environment variables shown below when creating it, because by default TorchServe can only process payloads up to 6 MB, regardless of the inference type used.

# Create a PyTorchModel for deployment
from sagemaker.pytorch.model import PyTorchModel

whisper_pytorch_model = PyTorchModel(
    model_data=model_uri,
    image_uri=image,
    role=role,
    entry_point="inference.py",
    source_dir='code',
    name=model_name,
    env={
        'TS_MAX_REQUEST_SIZE': '100000000',
        'TS_MAX_RESPONSE_SIZE': '100000000',
        'TS_DEFAULT_RESPONSE_TIMEOUT': '1000'
    }
)

The following table shows the settings for different PyTorch versions:

Framework: PyTorch 1.8 (based on TorchServe)
Environment variables: 'TS_MAX_REQUEST_SIZE': '100000000', 'TS_MAX_RESPONSE_SIZE': '100000000', 'TS_DEFAULT_RESPONSE_TIMEOUT': '1000'

Framework: PyTorch 1.4 (based on MMS)
Environment variables: 'MMS_MAX_REQUEST_SIZE': '1000000000', 'MMS_MAX_RESPONSE_SIZE': '1000000000', 'MMS_DEFAULT_RESPONSE_TIMEOUT': '900'

Define the model loading method in inference.py

In the custom inference.py script, we first check for the availability of a CUDA-capable GPU. If such a GPU is available, then we assign the ‘cuda’ device to the DEVICE variable; otherwise, we assign the ‘cpu’ device. This step ensures that the model is placed on the available hardware for efficient computation. We load the PyTorch model using the Whisper Python package.

### PyTorch
import os

import torch
import whisper

DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

def model_fn(model_dir):
    """
    Load and return the model
    """
    model = whisper.load_model(os.path.join(model_dir, 'base.pt'))
    model = model.to(DEVICE)
    return model
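
model_fn only loads the model; the request itself is handled by further hooks in the same script, and the authoritative versions live in the GitHub repository. Purely as a sketch, assuming the request body carries the raw bytes of an audio file, a transform_fn for the PyTorch model could look like this:

import json
import tempfile

def transform_fn(model, request_body, request_content_type, accept):
    """
    Sketch: write the incoming audio bytes to a temporary file and let
    Whisper transcribe it, returning the text as JSON.
    """
    with tempfile.NamedTemporaryFile(suffix='.wav') as tmp:
        tmp.write(request_body)
        tmp.flush()
        result = model.transcribe(tmp.name)
    return json.dumps({'text': result['text']})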

Hugging Face framework

Save model artifacts

The second option is to use Hugging Face’s Whisper implementation. The model can be loaded using the AutoModelForSpeechSeq2Seq transformers class. The learnable parameters are saved in a binary (bin) file using the save_pretrained method. The tokenizer and preprocessor also need to be saved separately to ensure the Hugging Face model works properly. Alternatively, you can deploy a model on Amazon SageMaker directly from the Hugging Face Hub by setting two environment variables: HF_MODEL_ID and HF_TASK. For more information, please refer to this webpage.

### Hugging Face
from transformers import WhisperTokenizer, WhisperProcessor, AutoModelForSpeechSeq2Seq

# Load the pre-trained model
model_name = "openai/whisper-base"
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_name)
tokenizer = WhisperTokenizer.from_pretrained(model_name)
processor = WhisperProcessor.from_pretrained(model_name)

# Define a directory where you want to save the model
save_directory = "./model"

# Save the model to the specified directory
model.save_pretrained(save_directory)
tokenizer.save_pretrained(save_directory)
processor.save_pretrained(save_directory)

Select DLC

Similar to the PyTorch framework, you can choose a pre-built Hugging Face DLC from the same link. Make sure to select a DLC that supports the latest Hugging Face transformers and includes GPU support.

Create Amazon SageMaker models

Similarly, we utilize the SageMaker Python SDK to create Hugging Face models. The Hugging Face Whisper model has a default limitation where it can only process audio segments up to 30 seconds. To address this limitation, you can include the chunk_length_s parameter in the environment variable when creating the Hugging Face model, and later pass this parameter into the custom inference script when loading the model. Lastly, set the environment variables to increase payload size and response timeout for the Hugging Face container.

# Create a HuggingFaceModel for deployment
from sagemaker.huggingface.model import HuggingFaceModel

whisper_hf_model = HuggingFaceModel(
    model_data=model_uri,
    role=role,
    image_uri=image,
    entry_point="inference.py",
    source_dir='code',
    name=model_name,
    env={
        'chunk_length_s': '30',
        'MMS_MAX_REQUEST_SIZE': '2000000000',
        'MMS_MAX_RESPONSE_SIZE': '2000000000',
        'MMS_DEFAULT_RESPONSE_TIMEOUT': '900'
    }
)

Framework: Hugging Face Inference Container (based on MMS)
Environment variables: 'MMS_MAX_REQUEST_SIZE': '2000000000', 'MMS_MAX_RESPONSE_SIZE': '2000000000', 'MMS_DEFAULT_RESPONSE_TIMEOUT': '900'

Define the model loading method in inference.py

When creating the custom inference script for the Hugging Face model, we use a pipeline, which allows us to pass chunk_length_s as a parameter. This parameter enables the model to efficiently process long audio files during inference.

### Hugging Face
import os

import torch
from transformers import pipeline

DEVICE = "cuda:0" if torch.cuda.is_available() else "cpu"
chunk_length_s = int(os.environ.get('chunk_length_s'))

def model_fn(model_dir):
    """
    Load and return the model
    """
    model = pipeline(
        "automatic-speech-recognition",
        model=model_dir,
        chunk_length_s=chunk_length_s,
        device=DEVICE,
    )
    return model
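
As with the PyTorch script, request handling lives in additional hooks of the same file (see the GitHub repository for the full version). A minimal sketch, assuming the request body is the raw bytes of an audio file that the pipeline can decode directly, might be:

import json

def transform_fn(model, request_body, request_content_type, accept):
    """
    Sketch: the ASR pipeline accepts raw audio bytes and returns a dict
    containing the transcription, which is serialized to JSON.
    """
    result = model(request_body)
    return json.dumps(result)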

Exploring different inference options on Amazon SageMaker
The steps for selecting inference options are the same for both PyTorch and Hugging Face models, so we won’t differentiate between them below. However, it’s worth noting that, at the time of writing this post, the serverless inference option from SageMaker doesn’t support GPUs, and as a result, we exclude this option for this use-case.

Real-time inference

We can deploy the model as a real-time endpoint, providing responses in milliseconds. However, it’s important to note that this option is limited to processing inputs under 6 MB. We define the serializer as an audio serializer, which is responsible for converting the input data into a suitable format for the deployed model. We utilize a GPU instance for inference, allowing for accelerated processing of audio files. The inference input is an audio file that is from the local repository.

from sagemaker.serializers import DataSerializer
from sagemaker.deserializers import JSONDeserializer

# Define serializers and deserializer
audio_serializer = DataSerializer(content_type="audio/x-audio")
deserializer = JSONDeserializer()

# Deploy the model for real-time inference
endpoint_name = f'whisper-real-time-endpoint-{id}'

real_time_predictor = whisper_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
    endpoint_name=endpoint_name,
    serializer=audio_serializer,
    deserializer=deserializer
)

# Perform real-time inference
audio_path = "sample_audio.wav"
response = real_time_predictor.predict(data=audio_path)

Batch transform job

The second inference option is the batch transform job, which can process input payloads up to 100 MB. However, this method incurs a latency of a few minutes: each instance handles only one batch request at a time, and instance initiation and shutdown also take a few minutes. The inference results are saved in an Amazon Simple Storage Service (Amazon S3) bucket upon completion of the batch transform job.
When configuring the batch transformer, be sure to include max_payload = 100 to handle larger payloads effectively. The inference input should be the Amazon S3 path to an audio file or an Amazon S3 prefix containing a list of audio files, each smaller than 100 MB.
Batch transform partitions the Amazon S3 objects in the input by key and maps each object to an instance. For example, when you have multiple audio files, one instance might process input1.wav while another processes input2.wav, which improves scalability. Batch transform also allows you to configure max_concurrent_transforms to increase the number of HTTP requests made to each individual transformer container. However, the value of (max_concurrent_transforms * max_payload) must not exceed 100 MB.

# Create a transformer
whisper_transformer = whisper_model.transformer(
    instance_count=1,
    instance_type="ml.g4dn.xlarge",
    output_path="s3://{}/{}/batch-transform/".format(bucket, prefix),
    max_payload=100
)

# Start batch transform job
whisper_transformer.transform(data=data, job_name=job_name, wait=False)
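
For context, data in the call above is an Amazon S3 URI. A sketch of preparing it, with placeholder bucket and prefix names, is shown below; once the job finishes, the transcription for each input object is written to output_path as a corresponding .out object.

from sagemaker.s3 import S3Uploader

# Upload a local audio file and point the transform job at the S3 prefix
input_prefix = f's3://{bucket}/{prefix}/batch-input'   # placeholder location
S3Uploader.upload('sample_audio.wav', input_prefix)
data = input_prefix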

Asynchronous inference

Finally, Amazon SageMaker Asynchronous Inference is ideal for processing multiple requests concurrently, offering moderate latency and supporting input payloads of up to 1 GB. This option provides excellent scalability, enabling the configuration of an autoscaling group for the endpoint. When a surge of requests occurs, it automatically scales up to handle the traffic, and once all requests are processed, the endpoint scales down to 0 to save costs.
Using asynchronous inference, the results are automatically saved to an Amazon S3 bucket. In the AsyncInferenceConfig, you can configure notifications for successful or failed completions. The input path points to an Amazon S3 location of the audio file. For additional details, please refer to the code on GitHub.

from sagemaker.async_inference import AsyncInferenceConfig

# Create an AsyncInferenceConfig object
async_config = AsyncInferenceConfig(
    output_path=f"s3://{bucket}/{prefix}/output",
    max_concurrent_invocations_per_instance=4,
    # notification_config = {
    #     "SuccessTopic": "arn:aws:sns:us-east-2:123456789012:MyTopic",
    #     "ErrorTopic": "arn:aws:sns:us-east-2:123456789012:MyTopic",
    # },  # Notification configuration
)

# Deploy the model for async inference
endpoint_name = f'whisper-async-endpoint-{id}'
async_predictor = whisper_model.deploy(
    async_inference_config=async_config,
    initial_instance_count=1,
    instance_type='ml.g4dn.xlarge',
    endpoint_name=endpoint_name
)

# Perform async inference
initial_args = {'ContentType': "audio/x-audio"}
response = async_predictor.predict_async(initial_args=initial_args, input_path=input_path)
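
predict_async returns a response object pointing at the output location in Amazon S3. A sketch of waiting for and reading the result is shown below; the waiter settings are illustrative.

from sagemaker.async_inference.waiter_config import WaiterConfig

# Block until the transcription appears in the S3 output path (or time out)
waiter = WaiterConfig(max_attempts=60, delay=15)
result = response.get_result(waiter)
print(result)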

Optional: As mentioned earlier, we have the option to configure an autoscaling group for the asynchronous inference endpoint, which allows it to handle a sudden surge in inference requests. A code example is provided in this GitHub repository. In the following diagram, you can observe a line chart displaying two metrics from Amazon CloudWatch: ApproximateBacklogSize and ApproximateBacklogSizePerInstance. Initially, when 1000 requests were triggered, only one instance was available to handle the inference. For three minutes, the backlog size consistently exceeded three (please note that these numbers can be configured), and the autoscaling group responded by spinning up additional instances to efficiently clear out the backlog. This resulted in a significant decrease in the ApproximateBacklogSizePerInstance, allowing backlog requests to be processed much faster than during the initial phase.
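
The authoritative autoscaling setup is in the GitHub repository; the sketch below only illustrates the general shape of such a policy using the Application Auto Scaling API, and the capacity limits, target value, and cooldowns are illustrative choices rather than the values used to produce the chart.

import boto3

autoscaling = boto3.client('application-autoscaling')
resource_id = f'endpoint/{endpoint_name}/variant/AllTraffic'

# Register the endpoint variant as a scalable target (0 instances when idle)
autoscaling.register_scalable_target(
    ServiceNamespace='sagemaker',
    ResourceId=resource_id,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    MinCapacity=0,
    MaxCapacity=5,
)

# Scale on the backlog of queued asynchronous requests per instance
autoscaling.put_scaling_policy(
    PolicyName='whisper-async-backlog-scaling',
    ServiceNamespace='sagemaker',
    ResourceId=resource_id,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    PolicyType='TargetTrackingScaling',
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 3.0,
        'CustomizedMetricSpecification': {
            'MetricName': 'ApproximateBacklogSizePerInstance',
            'Namespace': 'AWS/SageMaker',
            'Dimensions': [{'Name': 'EndpointName', 'Value': endpoint_name}],
            'Statistic': 'Average',
        },
        'ScaleInCooldown': 300,
        'ScaleOutCooldown': 60,
    },
)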

Figure 2. Line chart illustrating the temporal changes in Amazon CloudWatch metrics

Comparative analysis for the inference options
The comparisons for different inference options are based on common audio processing use cases. Real-time inference offers the fastest inference speed but restricts payload size to 6 MB. This inference type is suitable for audio command systems, where users control or interact with devices or software using voice commands or spoken instructions. Voice commands are typically small in size, and low inference latency is crucial to ensure that transcribed commands can promptly trigger subsequent actions. Batch Transform is ideal for scheduled offline tasks, when each audio file’s size is under 100 MB, and there is no specific requirement for fast inference response times. Asynchronous inference allows for uploads of up to 1 GB and offers moderate inference latency. This inference type is well-suited for transcribing movies, TV series, and recorded conferences where larger audio files need to be processed.
Both real-time and asynchronous inference options provide autoscaling capabilities, allowing the endpoint instances to automatically scale up or down based on the volume of requests. In cases with no requests, autoscaling removes unnecessary instances, helping you avoid costs associated with provisioned instances that aren’t actively in use. However, for real-time inference, at least one persistent instance must be retained, which could lead to higher costs if the endpoint operates continuously. In contrast, asynchronous inference allows instance volume to be reduced to 0 when not in use. When configuring a batch transform job, it’s possible to use multiple instances to process the job and adjust max_concurrent_transforms to enable one instance to handle multiple requests. Therefore, all three inference options offer great scalability.
Cleaning up
Once you have finished using the solution, be sure to delete the SageMaker endpoints to prevent incurring additional costs. You can use the following code to delete the real-time and asynchronous inference endpoints, respectively.

# Delete real-time inference endpoint
real_time_predictor.delete_endpoint()

# Delete asynchronous inference endpoint
async_predictor.delete_endpoint()

Conclusion
Deploying machine learning models for audio processing has become increasingly important across industries. In this post, we used the Whisper model as an example to demonstrate how to host open-source ASR models on Amazon SageMaker with either the PyTorch or the Hugging Face approach. The exploration covered the various inference options on Amazon SageMaker, offering insights into efficiently handling audio data, making predictions, and managing costs effectively. This post aims to help researchers, developers, and data scientists leverage the Whisper model for audio-related tasks and make informed decisions on inference strategies.
For more detailed information on deploying models on SageMaker, please refer to this Developer guide. Additionally, the Whisper model can be deployed using SageMaker JumpStart. For additional details, kindly check the Whisper models for automatic speech recognition now available in Amazon SageMaker JumpStart post.
Feel free to check out the notebook and code for this project on GitHub and share your comment with us.

About the Author
Ying Hou, PhD, is a Machine Learning Prototyping Architect at AWS. Her primary areas of interest encompass Deep Learning, with a focus on GenAI, Computer Vision, NLP, and time series data prediction. In her spare time, she relishes spending quality moments with her family, immersing herself in novels, and hiking in the national parks of the UK.

Meet Lightning Attention-2: The Groundbreaking Linear Attention Mechan …

In sequence processing, one of the biggest challenges lies in optimizing attention mechanisms for computational efficiency. Linear attention has proven to be an efficient attention mechanism thanks to its ability to process tokens with linear computational complexity, and it has recently emerged as a promising alternative to conventional softmax attention. This theoretical advantage allows it to handle sequences of unlimited length while maintaining a constant training speed and fixed memory consumption. However, a crucial roadblock arises from cumulative summation (cumsum), which hinders current linear attention algorithms from demonstrating their promised efficiency in a causal setting.

The existing research leverages the "kernel trick" to speed up attention matrix computation, computing the product of keys and values before the n×n matrix multiplication. Lightning Attention-1 employs the FlashAttention-1/2 approach to address slow computation in linear attention by segmenting inputs and computing the attention output block by block. Significant approaches include the 1 + elu activation, cosine function approximation, and sampling strategies to emulate the softmax operation. IO-aware Attention focuses on system-level optimizations to efficiently implement the standard attention operator on GPU platforms. Some works attempt to directly increase context window sizes, such as Position Interpolation (PI) and StreamingLLM, to extend sequence length in LLMs.

A team of researchers has introduced Lightning Attention-2, an efficient linear attention mechanism for handling unlimited-length sequences without compromising speed. It uses tiling to divide computation into intra-block and inter-block components, optimizing linear attention’s computational characteristics. The research addresses the limitations of current linear attention algorithms, particularly the challenges associated with cumulative summation, and provides a breakthrough for large language models that require processing long sequences.
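
Lightning Attention-2's actual kernels are implemented in Triton and described in the paper; purely to illustrate the intra-block/inter-block split, here is a plain PyTorch sketch of causal linear attention computed tile by tile, with shapes and block size chosen for illustration and no feature map or normalization applied.

import torch

def blockwise_causal_linear_attention(q, k, v, block_size=64):
    """
    Sketch of tiled causal linear attention.
    q, k, v: (batch, seq_len, dim); seq_len is assumed divisible by block_size.
    Intra-block: masked quadratic attention inside each tile.
    Inter-block: a running sum of k^T v carries information from earlier tiles.
    """
    b, n, d = q.shape
    out = torch.zeros_like(q)
    kv_state = torch.zeros(b, d, d, dtype=q.dtype, device=q.device)
    mask = torch.tril(torch.ones(block_size, block_size, device=q.device))

    for start in range(0, n, block_size):
        end = start + block_size
        qb, kb, vb = q[:, start:end], k[:, start:end], v[:, start:end]

        # Intra-block: causal attention restricted to the current tile
        intra = ((qb @ kb.transpose(1, 2)) * mask) @ vb
        # Inter-block: contribution of all previous tiles via the running state
        inter = qb @ kv_state

        out[:, start:end] = intra + inter
        kv_state = kv_state + kb.transpose(1, 2) @ vb  # update running k^T v

    return out

# Tiny usage example with illustrative shapes
q = torch.randn(2, 256, 32)
k = torch.randn(2, 256, 32)
v = torch.randn(2, 256, 32)
print(blockwise_causal_linear_attention(q, k, v).shape)  # torch.Size([2, 256, 32])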

Various experiments conducted on different model sizes and sequence lengths validate the performance and computational advantages of Lightning Attention-2. Implementing Lightning Attention-2 in Triton makes it IO-aware and hardware-friendly, enhancing its efficiency. The algorithm exhibits consistent training and inference speeds across varied sequence lengths. It even surpasses other attention mechanisms in speed and accuracy, addressing the challenges of cumulative summation and offering a breakthrough for large language models processing long sequences.

Conclusively, the research introduces Lightning Attention-2, an implementation of linear attention that overcomes computational challenges in the causal setting. Utilizing “divide and conquer” and tiling techniques, this approach impressively tackles the current limitations of linear attention algorithms, especially cumsum challenges. Demonstrating unwavering training speeds and even surpassing existing attention mechanisms, Lightning Attention-2 holds immense potential for advancing large language models, especially those managing extended sequences. Future endeavors involve incorporating sequence parallelism to train exceptionally long sequences, overcoming prevailing hardware constraints.


Valence Labs Introduces LOWE: An LLM-Orchestrated Workflow Engine for …

Drug discovery is an essential process with applications across various scientific domains, but it is also very complex and time-consuming. Traditional drug discovery approaches require extensive collaboration among teams over many years, with scientists from many scientific fields working together to identify new drugs that can benefit medicine.

Consequently, there have been recent efforts to use artificial intelligence in this field. Valence Labs researchers have recently developed an LLM-Orchestrated Workflow Engine (LOWE). It is their latest advancement in the Recursion Operating System (OS). It allows scientists to use vast quantities of proprietary data and sophisticated computational tools for drug discovery. The system condenses various functionalities into a unified platform operated via natural language commands and helps reduce resource allocation and accelerate the progress of early discovery programs.

Earlier, the drug discovery process required multi-disciplinary collaboration between teams of chemists and biologists. LOWE can integrate the diverse steps and instruments needed in drug discovery, from recognizing connections within Recursion's unique Maps of Biology and Chemistry to constructing novel compounds and arranging them for fabrication and examination. Its integration with the Recursion OS is at the core of its functionality: LOWE can navigate and assess relationships within Recursion's PhenoMap data, using MatchMaker to identify drug-target interactions. This allows LOWE to perform multiple steps in drug discovery, such as detecting prospective therapeutic targets.

LOWE also has a user-friendly interface driven by natural language commands and interactive graphics. The researchers emphasize that these features let drug discovery scientists harness state-of-the-art AI tools without formal training in machine learning. In addition, LOWE provides data visualization tools that help scientists efficiently parse query output.

Further, LOWE can identify new therapeutic targets, help predict ADMET properties, and streamline the process of procuring commercial compounds. These capabilities have immediate use in R&D projects and a potentially large impact on discovering new and effective medicines. The researchers emphasize that LOWE's ability to streamline complex workflows significantly advances drug discovery.

In conclusion, LOWE is a big step in drug discovery using LLM-based workflow engines. It showed that AI can help enhance efficiency and drive drug discovery. Its capacity to identify new therapeutic targets showcases its potential impact on navigating the discovery of new and effective medicines. Also, Valence Labs’ commitment to revolutionizing drug discovery has simplified workflows and democratized access to advanced AI tools, inspiring more scientific advancements.

Valence Labs announced LOWE on January 8, 2024: "Introducing LOWE: an LLM-Orchestrated Workflow Engine for executing complex drug discovery workflows using natural language. The next evolution of the @RecursionPharma OS, LOWE has access to proprietary data, integrates with Recursion's labs, and more."
