Researchers at Stanford University Expose Systemic Biases in AI Language Models

In a new AI research paper, a team of researchers from Stanford Law School has investigated biases present in state-of-the-art large language models (LLMs), including GPT-4, focusing particularly on disparities related to race and gender. The paper highlights the potential harm caused by biases encoded in these models, especially when they provide advice across various scenarios, such as car purchase negotiations or election outcome predictions. It aims to shed light on the systemic nature of biases in LLMs and to propose methods to mitigate their harmful effects on marginalized communities.

Current methods struggle to address biases in LLMs, particularly the ones related to race and gender. While some efforts have been made to mitigate biases by avoiding explicit references to sensitive attributes, such as race or gender, researchers found that biases can still manifest through features strongly correlated with these attributes, such as names. To address this issue, the researchers propose an audit design that directly prompts LLMs with scenarios involving named individuals, varying the names to assess biases across racial and gender associations.

The proposed audit design involves structuring scenarios across multiple domains where LLMs provide advice to users, such as purchasing decisions or election predictions. By varying the names associated with individuals in these scenarios, the researchers aim to identify and quantify biases in the model’s responses. They employ three levels of contextual detail in the prompts: low context, high context, and numeric context, to evaluate the impact of additional information on bias mitigation. Through this approach, the study gathers quantitative data on disparities across different racial and gender associations, revealing systematic biases in LLM outputs.
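
To make the audit design concrete, here is a purely illustrative Python sketch (not the authors' code) of how such prompts could be generated: one advice scenario is instantiated with different names and with the three context levels, so that responses can later be compared across name groups. The names, scenario wording, and context details are placeholder examples.

from itertools import product

# Placeholder name-to-group mapping; the study relies on a larger, validated name list.
names = {
    "white_male": "Todd",
    "white_female": "Allison",
    "black_male": "DaShawn",
    "black_female": "Keisha",
}

scenario = "I am negotiating to buy a used car from {name}. What opening offer should I make?"

# Three context levels, mirroring the low/high/numeric design described above.
contexts = {
    "low": "",
    "high": " The car is a well-maintained 2015 sedan with 60,000 miles.",
    "numeric": " Comparable cars in the area list for around $12,000.",
}

# Cross every name with every context level to build the audit prompt set.
prompts = [
    {"group": group, "context": level, "prompt": scenario.format(name=name) + suffix}
    for (group, name), (level, suffix) in product(names.items(), contexts.items())
]

for p in prompts[:3]:
    print(p["group"], p["context"], p["prompt"])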

The data indicate that names strongly associated with racial minorities and women received consistently worse outcomes across a variety of contexts. Providing qualitative context has mixed effects on biases, whereas a numeric anchor effectively eliminates the disparities in most circumstances. The paper also investigates intersectional biases, showing that names associated with Black women are especially disadvantaged, and it compares several LLMs, revealing similar trends across models and demonstrating how prevalent these biases are in cutting-edge language models.

In conclusion, the paper highlights the pervasive biases present in state-of-the-art LLMs, particularly concerning race and gender disparities. The proposed audit design provides a method to identify and quantify these biases, emphasizing the importance of conducting audits at LLMs’ deployment and implementation stages. While qualitative context may not consistently mitigate biases, numeric anchors offer a promising strategy for bias reduction.

Check out the Paper and the Stanford Blog. All credit for this research goes to the researchers of this project.


Achieve DevOps maturity with BMC AMI zAdviser Enterprise and Amazon Bedrock

In software engineering, there is a direct correlation between team performance and building robust, stable applications. The data community aims to adopt the rigorous engineering principles commonly used in software development into their own practices, which includes systematic approaches to design, development, testing, and maintenance. This requires carefully combining applications and metrics to provide complete awareness, accuracy, and control. It means evaluating all aspects of a team’s performance, with a focus on continuous improvement, and it applies just as much to mainframe as it does to distributed and cloud environments—maybe more.
This is achieved through practices like infrastructure as code (IaC) for deployments, automated testing, application observability, and complete application lifecycle ownership. Through years of research, the DevOps Research and Assessment (DORA) team has identified four key metrics that indicate the performance of a software development team:

Deployment frequency – How often an organization successfully releases to production
Lead time for changes – The amount of time it takes a commit to get into production
Change failure rate – The percentage of deployments causing a failure in production
Time to restore service – How long it takes an organization to recover from a failure in production

These metrics provide a quantitative way to measure the effectiveness and efficiency of DevOps practices. Although much of the focus around analysis of DevOps is on distributed and cloud technologies, the mainframe still maintains a unique and powerful position, and it can use the DORA 4 metrics to further its reputation as the engine of commerce.
This blog post discusses how BMC Software added AWS Generative AI capabilities to its product BMC AMI zAdviser Enterprise. The zAdviser uses Amazon Bedrock to provide summarization, analysis, and recommendations for improvement based on the DORA metrics data.
Challenges of tracking DORA 4 metrics
Tracking DORA 4 metrics means putting the numbers together and placing them on a dashboard. However, measuring productivity is essentially measuring the performance of individuals, which can make them feel scrutinized. This situation might necessitate a shift in organizational culture to focus on collective achievements and emphasize that automation tools enhance the developer experience.
It’s also vital to avoid focusing on irrelevant metrics or excessively tracking data. The essence of DORA metrics is to distill information into a core set of key performance indicators (KPIs) for evaluation. Mean time to restore (MTTR) is often the simplest KPI to track—most organizations use tools like BMC Helix ITSM or others that record events and issue tracking.
Capturing lead time for changes and change failure rate can be more challenging, especially on mainframes. The lead time for changes and change failure rate KPIs aggregate data from code commits, log files, and automated test results. Using a Git-based SCM pulls these insights together seamlessly. Mainframe teams using BMC's Git-based DevOps platform, AMI DevX, can collect this data as easily as distributed teams can.
Solution overview
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon via a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.
BMC AMI zAdviser Enterprise provides a wide range of DevOps KPIs to optimize mainframe development and enable teams to proactively identify and resolve issues. Using machine learning, AMI zAdviser monitors mainframe build, test, and deploy functions across DevOps toolchains and then offers AI-led recommendations for continuous improvement. In addition to capturing and reporting on development KPIs, zAdviser captures data on how the BMC DevX products are adopted and used. This includes the number of programs that were debugged, the outcome of testing efforts using the DevX testing tools, and many other data points. These additional data points can provide deeper insight into the development KPIs, including the DORA metrics, and may be used in future generative AI efforts with Amazon Bedrock.
The following architecture diagram shows the final implementation of zAdviser Enterprise utilizing generative AI to provide summarization, analysis, and recommendations for improvement based on the DORA metrics KPI data.

The solution workflow includes the following steps:

Create the aggregation query to retrieve the metrics from Elasticsearch.
Extract the stored mainframe metrics data from zAdviser, which is hosted in Amazon Elastic Compute Cloud (Amazon EC2) and deployed in AWS.
Aggregate the data retrieved from Elasticsearch and form the prompt for the generative AI Amazon Bedrock API call.
Pass the generative AI prompt to Amazon Bedrock, using Anthropic's Claude 2 model (a minimal sketch of this call follows the list).
Store the response from Amazon Bedrock (an HTML-formatted document) in Amazon Simple Storage Service (Amazon S3).
Trigger the KPI email process via AWS Lambda:

The HTML-formatted email is extracted from Amazon S3 and added to the body of the email.
The PDF for customer KPIs is extracted from zAdviser and attached to the email.
The email is sent to subscribers.
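
As a minimal, illustrative sketch of steps 4 and 5 (not the actual BMC implementation), the following Python code sends an aggregated DORA metrics prompt to Anthropic's Claude 2 on Amazon Bedrock and stores the HTML response in Amazon S3. The bucket name, metric values, and prompt wording are placeholders.

import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
s3 = boto3.client("s3")

# Step 4: form the prompt from the aggregated DORA metrics and call Claude 2.
prompt = (
    "\n\nHuman: Summarize the following DORA metrics and recommend improvements. "
    "Return the summary as an HTML document.\n"
    "Deployment frequency: 14 per month\n"
    "Lead time for changes: 3.2 days\n"
    "Change failure rate: 4%\n"
    "Time to restore service: 45 minutes"
    "\n\nAssistant:"
)
response = bedrock.invoke_model(
    modelId="anthropic.claude-v2",
    body=json.dumps({"prompt": prompt, "max_tokens_to_sample": 1024}),
)
html_report = json.loads(response["body"].read())["completion"]

# Step 5: persist the HTML-formatted report so the Lambda email process can pick it up.
s3.put_object(Bucket="example-kpi-reports", Key="dora-summary.html", Body=html_report)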

The following screenshot shows the LLM summarization of DORA metrics generated using Amazon Bedrock and sent as an email to the customer, with a PDF attachment that contains the DORA metrics KPI dashboard report by zAdviser.

Key takeaways
In this solution, you don’t need to worry about your data being exposed on the internet when sent to an AI client. The API call to Amazon Bedrock doesn’t contain any personally identifiable information (PII) or any data that could identify a customer. The only data transmitted consists of numerical values in the form of the DORA metric KPIs and instructions for the generative AI’s operations. Importantly, the generative AI client does not retain, learn from, or cache this data.
The zAdviser engineering team implemented this feature within a short time span. The rapid progress was facilitated by zAdviser's substantial investment in AWS services and, importantly, the ease of using Amazon Bedrock via API calls. This underscores the transformative power of generative AI technology embodied in the Amazon Bedrock API. This API, combined with the industry-specific knowledge repository in zAdviser Enterprise and customized with continuously collected organization-specific DevOps metrics, demonstrates the potential of AI in this field.
Generative AI has the potential to lower the barrier to entry to build AI-driven organizations. Large language models (LLMs) in particular can bring tremendous value to enterprises seeking to explore and use unstructured data. Beyond chatbots, LLMs can be used in a variety of tasks, such as classification, editing, and summarization.
Conclusion
This post discussed the transformational impact of generative AI technology in the form of Amazon Bedrock APIs equipped with the industry-specific knowledge that BMC zAdviser possesses, tailored with organization-specific DevOps metrics collected on an ongoing basis.
Check out the BMC website to learn more and set up a demo.

About the Authors
Sunil Bemarkar is a Sr. Partner Solutions Architect at Amazon Web Services. He works with various Independent Software Vendors (ISVs) and Strategic customers across industries to accelerate their digital transformation journey and cloud adoption.
Vij Balakrishna is a Senior Partner Development manager at Amazon Web Services. She helps independent software vendors (ISVs) across industries to accelerate their digital transformation journey.
Spencer Hallman is the Lead Product Manager for the BMC AMI zAdviser Enterprise. Previously, he was the Product Manager for BMC AMI Strobe and BMC AMI Ops Automation for Batch Thruput. Prior to Product Management, Spencer was the Subject Matter Expert for Mainframe Performance. His diverse experience over the years has also included programming on multiple platforms and languages as well as working in the Operations Research field. He has a Master of Business Administration with a concentration in Operations Research from Temple University and a Bachelor of Science in Computer Science from the University of Vermont. He lives in Devon, PA and when he’s not attending virtual meetings, enjoys walking his dogs, riding his bike and spending time with his family.

Fine-tune your Amazon Titan Image Generator G1 model using Amazon Bedrock model customization

Amazon Titan Image Generator G1 is a cutting-edge text-to-image model, available via Amazon Bedrock, that is able to understand prompts describing multiple objects in various contexts and captures these relevant details in the images it generates. It is available in the US East (N. Virginia) and US West (Oregon) AWS Regions and can perform advanced image editing tasks such as smart cropping, in-painting, and background changes. However, users would like to adapt the model to unique characteristics in custom datasets that the model is not already trained on. Custom datasets can include highly proprietary data that is consistent with your brand guidelines or specific styles such as a previous campaign. To address these use cases and generate fully personalized images, you can fine-tune Amazon Titan Image Generator with your own data using custom models for Amazon Bedrock.
From generating images to editing them, text-to-image models have broad applications across industries. They can enhance employee creativity and provide the ability to imagine new possibilities simply with textual descriptions. For example, it can aid design and floor planning for architects and allow faster innovation by providing the ability to visualize various designs without the manual process of creating them. Similarly, it can aid in design across various industries such as manufacturing, fashion design in retail, and game design by streamlining the generation of graphics and illustrations. Text-to-image models also enhance your customer experience by allowing for personalized advertising as well as interactive and immersive visual chatbots in media and entertainment use cases.
In this post, we guide you through the process of fine-tuning the Amazon Titan Image Generator model to learn two new categories: Ron the dog and Smila the cat, our favorite pets. We discuss how to prepare your data for the model fine-tuning task and how to create a model customization job in Amazon Bedrock. Finally, we show you how to test and deploy your fine-tuned model with Provisioned Throughput.

Ron the dog
Smila the cat

Evaluating model capabilities before starting a fine-tuning job
Foundation models are trained on large amounts of data, so it is possible that your model will work well enough out of the box. That’s why it’s good practice to check if you actually need to fine-tune your model for your use case or if prompt engineering is sufficient. Let’s try to generate some images of Ron the dog and Smila the cat with the base Amazon Titan Image Generator model, as shown in the following screenshots.

As expected, the out-of-the-box model does not know Ron and Smila yet, and the generated outputs show different dogs and cats. With some prompt engineering, we can provide more details to get closer to the look of our favorite pets.

Although the generated images are more similar to Ron and Smila, we see that the model is not able to reproduce the full likeness of them. Let’s now start a fine-tuning job with the photos from Ron and Smila to get consistent, personalized outputs.
Fine-tuning Amazon Titan Image Generator
Amazon Bedrock provides you with a serverless experience for fine-tuning your Amazon Titan Image Generator model. You only need to prepare your data and select your hyperparameters, and AWS will handle the heavy lifting for you.
When you fine-tune the Amazon Titan Image Generator model, a copy of this model is created in the AWS model development account, owned and managed by AWS, and a model customization job is created. This job then accesses the fine-tuning data from a VPC, and the Amazon Titan model has its weights updated. The new model is then saved to an Amazon Simple Storage Service (Amazon S3) bucket located in the same model development account as the pre-trained model. It can now be used for inference only by your account and is not shared with any other AWS account. When running inference, you access this model via provisioned capacity or directly, using batch inference for Amazon Bedrock. Independent of the inference modality chosen, your data remains in your account and is not copied to any AWS-owned account or used to improve the Amazon Titan Image Generator model.
The following diagram illustrates this workflow.

Data privacy and network security
Your data used for fine-tuning, including prompts, as well as the custom models, remain private in your AWS account. They are not shared or used for model training or service improvements, and they aren't shared with third-party model providers. All the data used for fine-tuning is encrypted in transit and at rest. The data remains in the same Region where the API call is processed. You can also use AWS PrivateLink to create a private connection between the AWS account where your data resides and the VPC.
Data preparation
Before you can create a model customization job, you need to prepare your training dataset. The format of your training dataset depends on the type of customization job you are creating (fine-tuning or continued pre-training) and the modality of your data (text-to-text, text-to-image, or image-to-embedding). For the Amazon Titan Image Generator model, you need to provide the images that you want to use for the fine-tuning and a caption for each image. Amazon Bedrock expects your images to be stored on Amazon S3 and the pairs of images and captions to be provided in a JSONL format with multiple JSON lines.
Each JSON line is a sample containing an image-ref, the S3 URI for an image, and a caption that includes a textual prompt for the image. Your images must be in JPEG or PNG format. The following code shows an example of the format:
{"image-ref": "s3://bucket/path/to/image001.png", "caption": "<prompt text>"}
{"image-ref": "s3://bucket/path/to/image002.png", "caption": "<prompt text>"}
{"image-ref": "s3://bucket/path/to/image003.png", "caption": "<prompt text>"}
Because "Ron" and "Smila" are names that could also be used in other contexts, such as a person's name, we add the identifiers "Ron the dog" and "Smila the cat" when creating the prompts to fine-tune our model. Although it's not a requirement for the fine-tuning workflow, this additional information provides more contextual clarity for the model when it is being customized for the new classes and avoids confusing "Ron the dog" with a person called Ron and "Smila the cat" with the city of Smila in Ukraine. Using this logic, the following images show a sample of our training dataset.

Ron the dog laying on a white dog bed
Ron the dog sitting on a tile floor
Ron the dog laying on a car seat

Smila the cat lying on a couch
Smila the cat staring at the camera laying on a couch
Smila the cat laying in a pet carrier

When transforming our data to the format expected by the customization job, we get the following sample structure:
{"image-ref": "<S3_BUCKET_URL>/ron_01.jpg", "caption": "Ron the dog laying on a white dog bed"}
{"image-ref": "<S3_BUCKET_URL>/ron_02.jpg", "caption": "Ron the dog sitting on a tile floor"}
{"image-ref": "<S3_BUCKET_URL>/ron_03.jpg", "caption": "Ron the dog laying on a car seat"}
{"image-ref": "<S3_BUCKET_URL>/smila_01.jpg", "caption": "Smila the cat lying on a couch"}
{"image-ref": "<S3_BUCKET_URL>/smila_02.jpg", "caption": "Smila the cat sitting next to the window next to a statue cat"}
{"image-ref": "<S3_BUCKET_URL>/smila_03.jpg", "caption": "Smila the cat lying on a pet carrier"}
After we have created our JSONL file, we need to store it in an S3 bucket to start our customization job. Amazon Titan Image Generator G1 fine-tuning jobs work with 5–10,000 images. For the example discussed in this post, we use 60 images: 30 of Ron the dog and 30 of Smila the cat. In general, providing more varieties of the style or class you are trying to learn will improve the accuracy of your fine-tuned model. However, the more images you use for fine-tuning, the more time will be required for the fine-tuning job to complete. The number of images used also influences the pricing of your fine-tuning job. Refer to Amazon Bedrock Pricing for more information.
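For reference, a small, hypothetical helper like the following could write the image and caption pairs as a JSONL manifest and upload it to Amazon S3; the bucket name, prefixes, and file names are placeholders.

import json
import boto3

bucket = "example-fine-tuning-bucket"
samples = [
    ("ron_01.jpg", "Ron the dog laying on a white dog bed"),
    ("smila_01.jpg", "Smila the cat lying on a couch"),
]

# Write one JSON line per image/caption pair, matching the format shown above.
with open("train.jsonl", "w") as f:
    for image_name, caption in samples:
        record = {"image-ref": f"s3://{bucket}/images/{image_name}", "caption": caption}
        f.write(json.dumps(record) + "\n")

# Upload the manifest so the customization job can reference it.
boto3.client("s3").upload_file("train.jsonl", bucket, "training/train.jsonl")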
Fine-tuning Amazon Titan Image Generator
Now that we have our training data ready, we can begin a new customization job. This can be done via either the Amazon Bedrock console or the APIs. To use the Amazon Bedrock console, complete the following steps:

On the Amazon Bedrock console, choose Custom models in the navigation pane.
On the Customize model menu, choose Create fine-tuning job.
For Fine-tuned model name, enter a name for your new model.
For Job configuration, enter a name for the training job.
For Input data, enter the S3 path of the input data.
In the Hyperparameters section, provide values for the following:

Number of steps – The number of times the model is exposed to each batch.
Batch size – The number of samples processed before updating the model parameters.
Learning rate – The rate at which the model parameters are updated after each batch. The choice of these parameters depends on a given dataset. As a general guideline, we recommend you start by fixing the batch size to 8, the learning rate to 1e-5, and set the number of steps according to the number of images used, as detailed in the following table.

Number of images provided     Number of steps recommended
8                             1,000
32                            4,000
64                            8,000
1,000                         10,000
10,000                        12,000

If the results of your fine-tuning job are not satisfactory, consider increasing the number of steps if you don’t observe any signs of the style in generated images, and decreasing the number of steps if you observe the style in the generated images but with artifacts or blurriness. If the fine-tuned model fails to learn the unique style in your dataset even after 40,000 steps, consider increasing the batch size or the learning rate.

In the Output data section, enter the S3 output path where the validation outputs, including the periodically recorded validation loss and accuracy metrics, are stored.
In the Service access section, generate a new AWS Identity and Access Management (IAM) role or choose an existing IAM role with the necessary permissions to access your S3 buckets.

This authorization enables Amazon Bedrock to retrieve input and validation datasets from your designated bucket and store validation outputs seamlessly in your S3 bucket.

Choose Fine-tune model.

With the correct configurations set, Amazon Bedrock will now train your custom model.
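If you prefer the API route over the console, a hypothetical boto3 sketch of the same job follows. The job name, model name, role ARN, S3 URIs, and hyperparameter key names are illustrative assumptions; check the Amazon Bedrock documentation for the exact identifiers and keys your account and model version expect.

import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

job = bedrock.create_model_customization_job(
    jobName="titan-image-pets-finetune",
    customModelName="titan-image-ron-smila",
    roleArn="arn:aws:iam::111122223333:role/BedrockCustomizationRole",
    baseModelIdentifier="amazon.titan-image-generator-v1",  # assumed base model ID
    customizationType="FINE_TUNING",
    trainingDataConfig={"s3Uri": "s3://example-bucket/pets/train.jsonl"},
    outputDataConfig={"s3Uri": "s3://example-bucket/pets/output/"},
    hyperParameters={  # assumed key names; values follow the guidance above
        "stepCount": "8000",
        "batchSize": "8",
        "learningRate": "0.00001",
    },
)
print(job["jobArn"])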
Deploy the fine-tuned Amazon Titan Image Generator with Provisioned Throughput
After you create a custom model, Provisioned Throughput allows you to allocate a predetermined, fixed rate of processing capacity to it. This allocation provides a consistent level of performance and capacity for handling production workloads. The second advantage of Provisioned Throughput is cost control, because standard token-based pricing with the on-demand inference mode can be difficult to predict at large scales.
When the fine-tuning of your model is complete, the model will appear on the Custom models page on the Amazon Bedrock console.
To purchase Provisioned Throughput, select the custom model that you just fine-tuned and choose Purchase Provisioned Throughput.

This prepopulates the selected model for which you want to purchase Provisioned Throughput. To test your fine-tuned model before deployment, set the model units to 1 and set the commitment term to No commitment. This lets you quickly start testing your model with custom prompts and check whether the training is adequate. Moreover, when new fine-tuned models and versions become available, you can update the Provisioned Throughput, as long as the update points to another version of the same model.
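Once the Provisioned Throughput is active, a quick test might look like the following minimal sketch. The provisioned model ARN and prompt are placeholders, and the request body follows the general Amazon Titan Image Generator G1 text-to-image format.

import base64
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Placeholder ARN of the Provisioned Throughput purchased for the custom model.
provisioned_model_arn = "arn:aws:bedrock:us-east-1:111122223333:provisioned-model/example"

body = {
    "taskType": "TEXT_IMAGE",
    "textToImageParams": {"text": "Ron the dog wearing a superhero cape"},
    "imageGenerationConfig": {"numberOfImages": 1, "height": 1024, "width": 1024, "cfgScale": 8.0},
}

response = bedrock_runtime.invoke_model(
    modelId=provisioned_model_arn,
    body=json.dumps(body),
)
images = json.loads(response["body"].read())["images"]

# Each entry is a base64-encoded image; save the first one to disk.
with open("ron_superhero.png", "wb") as f:
    f.write(base64.b64decode(images[0]))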

Fine-tuning results
For our task of customizing the model on Ron the dog and Smila the cat, experiments showed that the best hyperparameters were 5,000 steps with a batch size of 8 and a learning rate of 1e-5.
The following are some examples of the images generated by the customized model.

Ron the dog wearing a superhero cape
Ron the dog on the moon
Ron the dog in a swimming pool with sunglasses

Smila the cat on the snow
Smila the cat in black and white staring at the camera
Smila the cat wearing a Christmas hat

Conclusion
In this post, we discussed when to use fine-tuning instead of engineering your prompts for better-quality image generation. We showed how to fine-tune the Amazon Titan Image Generator model and deploy the custom model on Amazon Bedrock. We also provided general guidelines on how to prepare your data for fine-tuning and set optimal hyperparameters for more accurate model customization.
As a next step, you can adapt the following example to your use case to generate hyper-personalized images using Amazon Titan Image Generator.

About the Authors
Maira Ladeira Tanke is a Senior Generative AI Data Scientist at AWS. With a background in machine learning, she has over 10 years of experience architecting and building AI applications with customers across industries. As a technical lead, she helps customers accelerate their achievement of business value through generative AI solutions on Amazon Bedrock. In her free time, Maira enjoys traveling, playing with her cat Smila, and spending time with her family someplace warm.
Dani Mitchell is an AI/ML Specialist Solutions Architect at Amazon Web Services. He is focused on computer vision use cases and helping customers across EMEA accelerate their ML journey.
Bharathi Srinivasan is a Data Scientist at AWS Professional Services, where she loves to build cool things on Amazon Bedrock. She is passionate about driving business value from machine learning applications, with a focus on responsible AI. Outside of building new AI experiences for customers, Bharathi loves to write science fiction and challenge herself with endurance sports.
Achin Jain is an Applied Scientist with the Amazon Artificial General Intelligence (AGI) team. He has expertise in text-to-image models and is focused on building the Amazon Titan Image Generator.

Catching Up with Customers.ai – What’s New in Q1

Can you believe it’s almost the end of Q1? We most certainly cannot. The past few months have been an exciting whirlwind for us at Customers.ai. 

As we get prepped for an even more exciting Q2 (can't say too much, but we're pumped), we wanted to take a moment and reflect. 

We’ve launched tons of exceptional new features designed to help marketers overcome the challenges they’re facing now and will face in the future. 

Without further ado, let’s run through the greatest hits over the last few months! 

Supercharge your Meta Ads with Restore

Customer Journey Tracking

Klaviyo Revenue Tracking

Email Deliverability Scoring

Direct Integration with Sendlane

Direct Integration with HighLevel


Supercharge your Meta Ads with Restore

It’s been a difficult few years for digital advertisers (to say the least). Sometimes it seemed like you couldn’t open your email without learning that one tool or another was going to stop working how it used to. 

These already choppy waves are going to get even choppier in the coming months and years. 

Signal Degradation is a major challenge. But companies that overcome major challenges thrive while their competitors falter. And that’s why we’re so excited to have launched our Restore tool. 

Retargeting is a rock-solid marketing concept: reach out to the people who are already interested in you. Those people are 2-3x more likely to convert! 

The problem wasn’t the strategy, but that the data had gotten so much less accurate and reliable. 

Restore takes care of that! Our Website Visitor ID X-Ray pixel identifies the name and contact info of 20% or more of your visitors. 

Our direct integration with Facebook means that, once you connect your accounts, you can port these retargeting audiences over in a snap. 

Customer Journey Tracking

We would never pick a favorite of our new features but if we did have to choose, our Customer Journey feature might take the cake. 

Why? Because of how it opens up so many different avenues for marketers of all types. 

Huge companies like Amazon, Wal-Mart, and Target have had an edge on their smaller competitors for years because they have access to both a higher volume of data and more accurate, reliable data. 

They put that data to work for targeted and personalized ads. This reaped huge benefits! 

And now, with Customer Journey, you can reap those benefits too. 

The contacts you generate with our Website Visitor ID X-Ray pixel come with a full picture of how that customer has interacted with your site! 

What does this mean in practice? Your ability to target and personalize marketing outreach is only limited by your creativity! 

With our abundant integrations, you’ve got the ability to smoothly nurture leads into sales no matter how your organization is set up. 

Effectively segmenting your audience is fundamental to long-term sustainable marketing success and Customer Journey empowers you to do that like never before. 

This information is easy to find in the My Leads tab within your account. Build audiences based on whatever markers are relevant to your business and market away! 

Klaviyo Revenue Tracking

Attribution is a strangely tough part of marketing. 

Your goal isn’t to make one sale, it’s to make tons of sales. So understanding how each sale happened is crucial. You’ve got to know what works so that you know what to keep doing. And you’ve definitely got to know what doesn’t work so you can stop wasting money. 

That’s why we were so excited to launch a new tool within our Klaviyo integration: 

You can now track how much revenue the contacts identified by our Website Visitor ID X-Ray pixel are generating through Klaviyo right in your Integrations tab. 

Email Deliverability Scoring

Maintaining strong email deliverability has always been crucial but it’s only gotten more important in the first few months of 2024. 

Google announced that large senders who don’t abide by a strict set of guidelines will have their emails blocked before they can even reach the spam folder. 

That’s very bad news (if you’re unprepared). Thanks to this new feature, our customers weren’t unprepared: an email deliverability score right in the product. 

All you’ve got to do is connect your work email and within a day or so you’ll receive an assessment of your email deliverability! 

This score, which customers can update at any time, is a critical insight into your performance. Keeping an eye on it empowers you to take action the moment a negative signal appears and take care of it before small problems turn into massive ones. 


Sendlane Integration

Sendlane is a really innovative, exciting marketing automation platform. Launching our direct integration with their platform was a highlight of our first quarter. 

Setting it up takes only a few simple steps: 

Just drop in your API key, pick which audiences you want to send, and where you want them to go! 

Check out this conversation between our Founder & CEO, Larry Kim, and Sendlane’s Founder & CEO, Jimmy Kim on how to Maximize eCommerce ROI to get valuable insights: 

HighLevel Integration

Another integration we’re thrilled about is our new partnership with HighLevel. Nearly every marketing agency already knows and loves HighLevel for how easy they make white-labeling a sprawling marketing suite in one user-friendly interface. 

Now they can love HighLevel even more because leads generated for your clients by Customers.ai can go straight to them. Clients will love the engaged new contacts, agencies love the pricing and simplicity. A win-win-win. 

Hop into your Integrations tab and click “Add a HighLevel Account” to link your platforms. 

And then check out this Spotlight Session our Founder & CEO Larry Kim did with HighLevel! 

Conclusion

The most energizing thing about all we accomplished in Q1? That we’re just getting started. We hope you come along for the ride!


Revolutionizing Information Retrieval: How the FollowIR Dataset Enhances Models’ Ability to Understand and Follow Complex Instructions

Information Retrieval (IR) involves technologies and models that allow users to extract relevant information from large datasets. This field has evolved significantly with modern computational techniques, facilitating more efficient and precise search capabilities across vast digital information landscapes. Despite advancements, a prevailing issue within IR is the limited interaction models between users and retrieval systems. 

Existing IR systems rely heavily on models such as BM25, E5, and various neural network architectures, focusing primarily on enhancing semantic search capabilities through keyword-based queries and short sentences. Despite incorporating more sophisticated models like LLaMA 7B and deploying Large Language Models (LLMs) for semantic understanding, these strategies often overlook the potential of detailed user instructions for refining search outcomes. Consequently, they miss an opportunity to fully exploit the advanced capabilities of LLMs in understanding and executing complex search intents.

Researchers from Johns Hopkins University, the Allen Institute for AI, the University of Glasgow, and Yale University have introduced "FOLLOWIR," a novel dataset and benchmark to enhance IR models' capacity to interpret and follow detailed user instructions. This method leverages rich instruction sets derived from the TREC conferences, enabling IR models to better grasp and execute more complex search criteria as specified by users.

FOLLOWIR integrates three TREC collections: TREC News 2021, TREC Robust 2004, and TREC Common Core 2017. Expert annotators refine TREC instructions, focusing on documents initially marked relevant, effectively halving the pool of relevant documents for select queries from TREC Robust 2004 and TREC News 2021. Instruction-following is assessed using standard retrieval metrics alongside a novel metric, p-MRR, designed to gauge rank-wise shifts between queries. This approach factors in document ranking, offering a comprehensive score range. Results are averaged per query and across the dataset, with data presented in 400-word segments, adhering to the MTEB framework for distribution.

The evaluation encompassed models such as BM25, BGE-base, E5-base-v2, TART-Contriever, and INSTRUCTOR-XL, segmented into categories based on their training with no instructions, instructions in IR, API models, and instruction-tuned LLMs. Large models and those tuned for instruction adherence exhibited notable success in instruction following. However, while strong in standard IR metrics, API models faltered in following instructions. Instruction-tuned LLMs, particularly FOLLOWIR-7B, demonstrated positive outcomes, underscoring their adeptness at instruction-based tasks. Ablation studies revealed that models optimized for keyword search struggled with instruction length, suggesting a gap in handling detailed directives. This was consistent across various datasets, indicating a broader trend of challenges in instruction comprehension.

To conclude, the research introduces “FOLLOWIR,” a benchmark to assess instruction-following in IR models. It reveals that most, except for large or instruction-tuned LLMs, struggle with following detailed instructions. The creation of FOLLOWIR-7B, an instruction-tuned model, illustrates the potential for significant improvement in standard retrieval metrics and instruction adherence. Despite limitations like reranking versus full retrieval challenges and potential annotation errors, this research paves the way for developing advanced IR models capable of adapting to complex user instructions through natural language.

Check out the Paper. All credit for this research goes to the researchers of this project.


Enhancing Graph Neural Networks for Heterophilic Graphs: McGill University Researchers Introduce Directional Graph Attention Networks (DGAT)

Graph neural networks (GNNs) have revolutionized how researchers analyze and learn from data structured in complex networks. These models capture the intricate relationships inherent in graphs, which are omnipresent in social networks, molecular structures, and communication networks, to name a few areas. Central to their success is the ability to effectively process and learn from graph data, which is fundamentally non-Euclidean. Among various GNN architectures, Graph Attention Networks (GATs) stand out for their innovative use of attention mechanisms. These mechanisms assign varying levels of importance to neighboring nodes, allowing the model to focus on more relevant information during the learning process.

However, traditional GATs face significant challenges in heterophilic graphs, where connections are more likely between dissimilar nodes. The core issue lies in their inherent design, which optimizes for homophily, limiting their effectiveness in scenarios where understanding diverse connections is crucial. This limitation hampers the model’s ability to capture long-range dependencies and global structures within the graph, leading to decreased performance on tasks where such information is vital.

Researchers from McGill University and Mila-Quebec Artificial Intelligence Institute have introduced the Directional Graph Attention Network (DGAT), a novel framework designed to enhance GATs by incorporating global directional insights and feature-based attention mechanisms. DGAT’s key innovation lies in integrating a new class of Laplacian matrices, which allows for a more controlled diffusion process. This control enables the model to effectively prune noisy connections and add beneficial ones, improving the network’s ability to learn from long-range neighborhood information.

DGAT’s topology-guided neighbor pruning and edge addition strategies are particularly noteworthy. DGAT selectively refines the graph’s structure for more efficient message passing by leveraging the spectral properties of the newly proposed Laplacian matrices. It introduces a global directional attention mechanism that utilizes topological information to enhance the model’s ability to focus on certain parts of the graph. This sophisticated approach to managing the graph’s structure and attention mechanism significantly advances the field.

Empirical evaluations of DGAT have demonstrated its superior performance across various benchmarks, particularly in handling heterophilic graphs. The research team reported that DGAT outperforms traditional GAT models and other state-of-the-art methods in several node classification tasks. On six of seven real-world benchmark datasets, DGAT achieved remarkable improvements, highlighting its practical effectiveness in enhancing graph representation learning in heterophilic contexts.

In conclusion, DGAT emerges as a powerful tool for graph representation learning, bridging the gap between the theoretical potential of GNNs and their practical application in heterophilic graph scenarios. Its development underscores the importance of tailoring models to the specific data characteristics they are designed to process. With DGAT, researchers and practitioners have a more robust and versatile framework for extracting valuable insights from complex networked information.

Check out the Paper. All credit for this research goes to the researchers of this project.


OpenAI Sets Sight on Voice Assistant Market with New ‘Voice Engine’ Trademark

In a bold move that signals a potential shift in the digital voice assistant market, OpenAI, the maker of ChatGPT, has filed a trademark application for a tool named “Voice Engine.” This strategic step could position OpenAI as a tough competitor to established tech giants like Apple, Amazon, and Google, whose products, Siri, Alexa, and Google Assistant, currently dominate the market.

OpenAI's foray into the voice technology arena with Voice Engine suggests a focused initiative to extend its prowess in artificial intelligence into the realm of digital voice assistants. The trademark application, submitted to the U.S. Patent and Trademark Office, outlines a comprehensive suite of voice-related technologies, highlighting OpenAI's ambitious plans to innovate beyond its current capabilities.

This suite includes software designed for creating digital voice assistants, processing voice commands, generating audio from text prompts, and supporting multilingual speech recognition and translation. Such advancements build upon OpenAI’s existing technological base, including the text-to-speech API and the Whisper speech recognition model, marking a significant push towards offering a fully integrated virtual voice assistant for consumer use.

The introduction of the Read Aloud feature in ChatGPT, which can articulate responses in 37 languages, underscores OpenAI’s dedication to improving user interaction through voice. This feature, different from Whisper’s focus on understanding and responding to speech, combines both written and spoken communication, offering users a more holistic and hands-free experience. This development caters especially well to those who multitask or prefer auditory learning.

Sam Altman, CEO of OpenAI, hints at “many different things” being released this year, with speculation around Sora, the AI video tool, and potentially a new AI voice system. Despite the lack of concrete details about Voice Engine or its productization, OpenAI’s trademark filing speaks volumes about its intentions. Beyond consumer applications, Voice Engine could signify an enterprise play, enabling companies to enhance efficiency in call centers with advanced speech systems.

OpenAI’s move into digital voice assistants has its challenges. The company has encountered regulatory hurdles, such as the denial of the “GPT” trademark, but it continues its efforts to secure trademarks for future iterations like GPT-5, GPT-6, and GPT-7. With GPT-5’s release anticipated this summer, OpenAI remains at the forefront of AI innovation.

The venture into voice technology by filing a trademark for “Voice Engine” not only expands OpenAI’s technological ecosystem but also envisions a future where AI assistants are more integral to daily life. By prioritizing voice as a primary mode of interaction, OpenAI aims to facilitate seamless communication, bridging the gap between human intention and machine understanding.

Key Takeaways:

OpenAI has filed a trademark for “Voice Engine,” signaling a move to compete in the digital voice assistant market against giants like Apple, Amazon, and Google.

The Voice Engine initiative encompasses a suite of technologies aimed at creating comprehensive virtual voice assistants, leveraging OpenAI’s existing AI capabilities.

The introduction of the Read Aloud feature in ChatGPT, which offers vocalized responses in multiple languages, represents a step towards enhancing user experiences through voice.

OpenAI’s approach to voice technology is both consumer and enterprise-focused, potentially transforming how companies interact with customers.

Despite regulatory challenges, OpenAI continues to innovate, with developments like GPT-5 on the horizon, underscoring its commitment to pioneering the next generation of AI technologies.


Build a receipt and invoice processing pipeline with Amazon Textract

In today's business landscape, organizations are constantly seeking ways to optimize their financial processes, enhance efficiency, and drive cost savings. One area that holds significant potential for improvement is accounts payable. At a high level, the accounts payable process includes receiving and scanning invoices, extraction of the relevant data from scanned invoices, validation, approval, and archival. The second step (extraction) can be complex. Each invoice and receipt looks different. The labels are imperfect and inconsistent. The most important pieces of information, such as price, vendor name, vendor address, and payment terms, are often not explicitly labeled and have to be interpreted based on context. The traditional approach of using human reviewers to extract the data is time-consuming, error-prone, and not scalable.
In this post, we show how to automate the accounts payable process using Amazon Textract for data extraction. We also provide a reference architecture to build an invoice automation pipeline that enables extraction, verification, archival, and intelligent search.
Solution overview
The following architecture diagram shows the stages of a receipt and invoice processing workflow. It starts with a document capture stage to securely collect and store scanned invoices and receipts. The next stage is the extraction phase, where you pass the collected invoices and receipts to the Amazon Textract AnalyzeExpense API to extract financially related relationships between text such as vendor name, invoice receipt date, order date, amount due, amount paid, and so on. In the next stage, you use predefined expense rules to determine if you should automatically approve or reject the receipt. Approved and rejected documents go to their respective folders within the Amazon Simple Storage Service (Amazon S3) bucket. For approved documents, you can search all the extracted fields and values using Amazon OpenSearch Service. You can visualize the indexed metadata using OpenSearch Dashboards. Approved documents are also set up to be moved to Amazon S3 Intelligent-Tiering for long-term retention and archival using S3 lifecycle policies.

The following sections take you through the process of creating the solution.
Prerequisites
To deploy this solution, you must have the following:

An AWS account.
An AWS Cloud9 environment. AWS Cloud9 is a cloud-based integrated development environment (IDE) that lets you write, run, and debug your code with just a browser. It includes a code editor, debugger, and terminal.

To create the AWS Cloud9 environment, provide a name and description. Keep everything else as default. Choose the IDE link on the AWS Cloud9 console to navigate to the IDE. You're now ready to use the AWS Cloud9 environment.
Deploy the solution
To set up the solution, you use the AWS Cloud Development Kit (AWS CDK) to deploy an AWS CloudFormation stack.

In your AWS Cloud9 IDE terminal, clone the GitHub repository and install the dependencies. Run the following commands to deploy the InvoiceProcessor stack:

git clone https://github.com/aws-samples/amazon-textract-invoice-processor.git
pip install -r requirements.txt
cdk bootstrap
cdk deploy

The deployment takes around 25 minutes with the default configuration settings from the GitHub repo. Additional output information is also available on the AWS CloudFormation console.

After the AWS CDK deployment is complete, create expense validation rules in an Amazon DynamoDB table. You can use the same AWS Cloud9 terminal to run the following commands:

aws dynamodb execute-statement --statement "INSERT INTO "$(aws cloudformation list-exports --query 'Exports[?Name==`InvoiceProcessorWorkflow-RulesTableName`].Value' --output text)" VALUE {'ruleId': 1, 'type': 'regex', 'field': 'INVOICE_RECEIPT_ID', 'check': '(?i)[0-9]{3}[a-z]{3}[0-9]{3}$', 'errorTxt': 'Receipt number is not valid. It is of the format: 123ABC456'}"
aws dynamodb execute-statement --statement "INSERT INTO "$(aws cloudformation list-exports --query 'Exports[?Name==`InvoiceProcessorWorkflow-RulesTableName`].Value' --output text)" VALUE {'ruleId': 2, 'type': 'regex', 'field': 'PO_NUMBER', 'check': '(?i)[a-z0-9]+$', 'errorTxt': 'PO number is not present'}"

In the S3 bucket that starts with invoiceprocessorworkflow-invoiceprocessorbucketf1-*, create an uploads folder.

In Amazon Cognito, you should already have an existing user pool called OpenSearchResourcesCognitoUserPool*. We use this user pool to create a new user.

On the Amazon Cognito console, navigate to the user pool OpenSearchResourcesCognitoUserPool*.
Create a new Amazon Cognito user.
Provide a user name and password of your choice and note them for later use.
Upload the documents random_invoice1 and random_invoice2 to the S3 uploads folder to start the workflows.

Now let’s dive into each of the document processing steps.
Document Capture
Customers handle invoices and receipts in a multitude of formats from different vendors. These documents are received through channels like hard copies, scanned copies uploaded to file storage, or shared storage devices. In the document capture stage, you store all scanned copies of receipts and invoices in highly scalable storage, such as an S3 bucket.

Extraction
The next stage is the extraction phase, where you pass the collected invoices and receipts to the Amazon Textract AnalyzeExpense API to extract financially related relationships between text such as Vendor Name, Invoice Receipt Date, Order Date, Amount Due/Paid, etc.
AnalyzeExpense is an API dedicated to processing invoice and receipt documents. It is available as both a synchronous and an asynchronous API. The synchronous API allows you to send images in bytes format, and the asynchronous API allows you to send files in JPG, PNG, TIFF, and PDF formats. The AnalyzeExpense API response consists of three distinct sections (a minimal example call follows the list):

Summary fields – This section includes both normalized keys and the explicitly mentioned keys along with their values. AnalyzeExpense normalizes the keys for contact-related information such as vendor name and vendor address, tax ID-related keys such as tax payer ID, payment-related keys such as amount due and discount, and general keys such as invoice ID, delivery date, and account number. Keys that are not normalized still appear in the summary fields as key-value pairs. For a complete list of supported expense fields, refer to Analyzing Invoices and Receipts.
Line items – This section includes normalized line item keys such as item description, unit price, quantity, and product code.
OCR block – The block contains the raw text extract from the invoice page. The raw text extract can be used for postprocessing and identifying information that is not covered as part of the summary and line item fields.
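
For reference, the following minimal sketch calls the synchronous AnalyzeExpense API with boto3 and prints the normalized summary fields; the input file name is a placeholder.

import boto3

textract = boto3.client("textract")

with open("invoice.png", "rb") as f:
    response = textract.analyze_expense(Document={"Bytes": f.read()})

# Walk the summary fields of each detected expense document.
for doc in response["ExpenseDocuments"]:
    for field in doc["SummaryFields"]:
        field_type = field["Type"]["Text"]  # e.g., VENDOR_NAME, INVOICE_RECEIPT_DATE, AMOUNT_DUE
        value = field.get("ValueDetection", {}).get("Text", "")
        print(f"{field_type}: {value}")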

This post uses the Amazon Textract IDP CDK constructs (AWS CDK components to define infrastructure for intelligent document processing (IDP) workflows), which allow you to build use case-specific, customizable IDP workflows. The constructs and samples are a collection of components that enable the definition of IDP processes on AWS; they are published on GitHub. The main concepts used are the AWS CDK constructs, the actual AWS CDK stacks, and AWS Step Functions.
The following figure shows the Step Functions workflow.

The extraction workflow includes the following steps:

InvoiceProcessor-Decider – An AWS Lambda function that verifies if the input document format is supported by Amazon Textract. For more details about supported formats, refer to Input Documents.
DocumentSplitter – A Lambda function that generates 2,500-page (max) chunks from documents and can process large multi-page documents.
Map State – A Step Functions Map state that processes each chunk in parallel.
TextractAsync – This task calls Amazon Textract using the asynchronous API following best practices with Amazon Simple Notification Service (Amazon SNS) notifications and uses OutputConfig to store the Amazon Textract JSON output to the S3 bucket you created earlier. It consists of two Lambda functions: one to submit the document for processing and one that is triggered on the SNS notification.
TextractAsyncToJSON2 – Because the TextractAsync task can produce multiple paginated output files, the TextractAsyncToJSON2 process combines them into one JSON file.

We discuss the details of the next three steps in the following sections.
Verification and approval
For the verification stage, the SetMetaData Lambda function verifies whether the uploaded file is a valid expense according to the rules configured previously in the DynamoDB table. For this post, you use the following sample rules (a minimal sketch of this check follows the list):

Verification is successful if INVOICE_RECEIPT_ID is present and matches the regex (?i)[0-9]{3}[a-z]{3}[0-9]{3}$ and if PO_NUMBER is present and matches the regex (?i)[a-z0-9]+$
Verification is unsuccessful if either PO_NUMBER or INVOICE_RECEIPT_ID is incorrect or missing from the document.
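
A minimal, illustrative sketch of this check, reusing the same regular expressions outside of DynamoDB, could look like the following.

import re

def validate_expense(summary_fields):
    """Return a list of error messages; an empty list means the expense is approved."""
    errors = []
    receipt_id = summary_fields.get("INVOICE_RECEIPT_ID", "")
    po_number = summary_fields.get("PO_NUMBER", "")
    if not re.fullmatch(r"(?i)[0-9]{3}[a-z]{3}[0-9]{3}", receipt_id):
        errors.append("Receipt number is not valid. It is of the format: 123ABC456")
    if not re.fullmatch(r"(?i)[a-z0-9]+", po_number):
        errors.append("PO number is not present")
    return errors

print(validate_expense({"INVOICE_RECEIPT_ID": "123ABC456", "PO_NUMBER": "PO7531"}))  # []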

After the files are processed, the expense verification function moves the input files to either approved or declined folders in the same S3 bucket.

For the purposes of this solution, we use DynamoDB to store the expense validation rules. However, you can modify this solution to integrate with your own or commercial expense validation or management solutions.
Intelligent index and search
With the OpenSearchPushInvoke Lambda function, the extracted expense metadata is pushed to an OpenSearch Service index and is available for search.
The final TaskOpenSearchMapping step clears the context, which otherwise could exceed the Step Functions quota of maximum input or output size for a task, state, or workflow run.
After the OpenSearch Service index is created, you can search for keywords from the extracted text via OpenSearch Dashboards.

Archival, audit, and analytics
To manage the lifecycle and archival of invoices and receipts, you can configure S3 lifecycle rules to transition S3 objects from Standard to Intelligent-Tiering storage classes. S3 Intelligent-Tiering monitors access patterns and automatically moves objects to the Infrequent Access tier when they haven’t been accessed for 30 consecutive days. After 90 days of no access, the objects are moved to the Archive Instant Access tier without performance impact or operational overhead.
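
As an illustration, a lifecycle rule like the following hypothetical sketch transitions approved documents to S3 Intelligent-Tiering; the bucket name, prefix, and transition timing are placeholders.

import boto3

s3 = boto3.client("s3")

# Move objects under the approved/ prefix into Intelligent-Tiering shortly after upload.
s3.put_bucket_lifecycle_configuration(
    Bucket="invoiceprocessor-example-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-approved-invoices",
                "Filter": {"Prefix": "approved/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 0, "StorageClass": "INTELLIGENT_TIERING"},
                ],
            }
        ]
    },
)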
For auditing and analytics, this solution uses OpenSearch Service for running analytics on invoice requests. OpenSearch Service enables you to effortlessly ingest, secure, search, aggregate, view, and analyze data for a number of use cases, such as log analytics, application search, enterprise search, and more.
Log in to OpenSearch Dashboards and navigate to Stack Management, Saved objects, then choose Import. Choose the invoices.ndjson file from the cloned repository and choose Import. This prepopulates indexes and builds the visualization.

Refresh the page and navigate to Home, Dashboard, and open Invoices. You can now select and apply filters and expand the time window to explore past invoices.

Clean up
When you’re finished evaluating Amazon Textract for processing receipts and invoices, we recommend cleaning up any resources that you might have created. Complete the following steps:

Delete all content from the S3 bucket invoiceprocessorworkflow-invoiceprocessorbucketf1-*.
In AWS Cloud9, run the following commands to delete Amazon Cognito resources and CloudFormation stacks:

cognito_user_pool=$(aws cloudformation list-exports --query 'Exports[?Name==`InvoiceProcessorWorkflow-CognitoUserPoolId`].Value' --output text)
echo $cognito_user_pool
cdk destroy
aws cognito-idp delete-user-pool --user-pool-id $cognito_user_pool

Delete the AWS Cloud9 environment that you created from the AWS Cloud9 console.

Conclusion
In this post, we provided an overview of how we can build an invoice automation pipeline using Amazon Textract for data extraction and create a workflow for validation, archival, and search. We provided code samples on how to use the AnalyzeExpense API for extraction of critical fields from an invoice.
To get started, sign in to the Amazon Textract console to try this feature. To learn more about Amazon Textract capabilities, refer to the Amazon Textract Developer Guide or Textract Resources. To learn more about IDP, refer to the IDP with AWS AI services Part 1 and Part 2 posts.

About the Authors
Sushant Pradhan is a Sr. Solutions Architect at Amazon Web Services, helping enterprise customers. His interests and experience include containers, serverless technology, and DevOps. In his spare time, Sushant enjoys spending time outdoors with his family.
Shibin Michaelraj is a Sr. Product Manager with the AWS Textract team. He is focused on building AI/ML-based products for AWS customers.
Suprakash Dutta is a Sr. Solutions Architect at Amazon Web Services. He focuses on digital transformation strategy, application modernization and migration, data analytics, and machine learning. He is part of the AI/ML community at AWS and designs intelligent document processing solutions.
Maran Chandrasekaran is a Senior Solutions Architect at Amazon Web Services, working with our enterprise customers. Outside of work, he loves to travel and ride his motorcycle in Texas Hill Country.

B2B Marketing in 2024: What’s in & What’s Out

If there’s one thing we know for sure, it’s that the only constant in life is change. 

That’s especially true in the world of B2B marketing. 

From privacy updates to ad platform changes to new regulations and more, there are always changes afoot.

To help our customers (and ourselves!) stay on top of this crazy industry, we sat down with five experts to talk about what’s in and what’s out this year when it comes to B2B marketing and advertising.

To say we got some great tips would be an understatement! 

From email to MQLs to LinkedIn and even design, our panelists absolutely brought it.

While we aren’t going to dive into every tip our panelists provided (you can watch the webinar for that), we are going to give you a few of the highlights. 

So, let’s dive into these expert tips and kick your B2B marketing and advertising strategies into high gear.

B2B Email Marketing

Larry Kim, CEO & Founder, Customers.ai

Email is a complicated channel. There is so much that goes into it and honestly, we could do 100 webinars on the topic and still not cover everything you need to know!

That being said, when we look at email marketing in the B2B space, it’s pretty clear what is out and what is in.

Out: Outbound Emailing

With the recent changes to Google and Yahoo’s spam algorithms, outbound email campaigns are only getting more complicated. 

As you can see in the image above (and in our B2B Spam Complaint Rate Study), spam complaint ranges are nowhere near the new thresholds set by the platforms, leaving B2B marketers in a precarious position.

The result? We can’t keep relying on outbound.

While cold emails have been a workhorse for decades in the B2B space, they have a higher chance of being marked as spam and, as a result, a higher chance of getting your domain penalized. 

In: Intent-Driven Outbound Emailing

If outbound emails are out, that must mean inbound emails are in…right? Sort of.

What’s in is treating outbound the same as you would inbound. 

What makes inbound work? Warm leads!

People who have heard of you are less likely to mark you as spam. 

Using Customers.ai, you can start identifying high-intent visitors on your website who haven’t made a purchase, haven’t filled out a form, or haven’t previously received a message from you.

These are warm leads and these are the perfect people to consider for your outbound emailing. 


Paid Advertising Spend

Molly Staats, Director of Strategic Partnerships, Lucky Orange

Paid ad spend has been on the rise for the last few years and, with the demise of third-party cookies and click IDs, it isn’t likely to get any better without some significant changes. 

That certainly doesn’t mean that there aren’t options. What it means is that advertisers have to rethink how they are currently doing things and evolve past third-party cookies. 

Out: Rising Ad Costs & Third-Party Cookies

Unless you’ve been under a rock, you know third-party cookies are slated to be removed from Chrome by the end of 2024. 

You also know that Apple removed click IDs with iOS17 and that since iOS14, Facebook advertising hasn’t been the same. 

All of this, combined with demand has led to rising ad costs across platforms like Google, Facebook, and LinkedIn. 

As we head into the remainder of 2024, it’s time to say enough and start adapting to what’s in. 

In: First-Party Data & Owned Media 

Companies need to focus on collecting first-party data. Heck, even Google agrees with that! 

First-party data allows you to create better lookalike audiences, custom audience segments, and reach buyers in a much more cost-effective and efficient way. 

First-party data also allows you to focus on high-intent audiences – those who are truly interested in your business. These are the people who can be turned into subscribers and the people you can build real relationships with. 

For 2024, B2B marketers need to rely less on the platforms and focus on owned assets and media. First-party data is how we get there.

LinkedIn Advertising

AJ Wilcox, Founder, B2Linked.com

For B2B advertisers, LinkedIn has been a staple in the toolbox for years and they continue to get better. 

While it’s true we’ve seen rising ad costs there as well, their targeting capabilities have far outweighed that of Facebook and Google for B2B in recent years. 

LinkedIn continues to innovate and as a result, we are seeing new opportunities come in, and old ones head out the door.

Out: Single Image Ads & Corporate Videos

For years, LinkedIn advertisers have used single image ads, which, despite being straightforward to create, pretty much blend into the background. 

And with an average cost per click ranging from $10 to $16 and a click-through rate of less than half a percent, they aren’t exactly the low-cost ads they once were. They are out.

The same goes for video. 

While it’s great to see marketers evolving to video (we know the impact of video ads can be extremely positive), it’s time to move away from those boring corporate videos. 

People want to feel a connection and traditional corporate videos tend to lack that emotional engagement. 

In: Thought Leadership Ads & Personal Videos

What do these two things have in common?

They feature real people!

That’s right. LinkedIn Thought Leader ads allow you to promote posts from actual employees (and coming in April 2024 – anyone). 

People are more likely to click on a post from a person than a company. This gives you an opportunity to connect with your target audience in a more effective way and guess what? It costs less! These ads perform 10x better than company posts and carry a much lower CPC.

As for video, the same thing applies – be a human! 

Focus on UGC or personal video posts to reach your customers. 

Personal videos can help you forge a genuine connection with your audience and they’re proven to be cost-effective and significantly impactful.

ABM Campaigns

Anastasia Warren, Senior Director of Paid Media, Walker Sands

ABM has been the hot buzzword for a few years and with good reason – as targeting capabilities became worse and worse, people needed to figure out a way to reach their audience. 

The focus? Accounts. 

But here’s the thing – it’s time to move past account-based marketing and focus on the people themselves (we call this contact-based marketing). 

Out: Siloed Campaigns & Generic Retargeting Lists

Single-phase campaigns, reliance solely on retargeting without segmentation, and broad, undifferentiated messaging are officially outdated.

People want personalization and they want to feel a connection to the brands they care about.

This means you can’t treat everyone the same and you must build cohesive campaigns across channels. 

The time for generalization is over. 

In: Full-Funnel Campaigns & Segmented Audiences

We talk a lot about audience segmentation here and there is a reason for that – it’s a marketing must for B2B and B2C advertisers alike. 

With the right segmentation strategies in place, you can build really strong campaigns that target buyers throughout the funnel with personalized messaging and content that resonates. 

Look, with businesses needing upwards of 200 touchpoints to close a deal, you must ensure you are creating a full-funnel strategy.

Advanced segmentation and custom audiences are the way to do just that.  

B2B Messaging

Nica Latto, Content Marketing Strategist, Semrush

Whether you’re in B2B, B2C, DTC, or any other acronym, you know how difficult it can be to find the right messaging. 

You need messaging that resonates with your audience, captivates them, and gets them on board with your brand. 

Easier said than done. 

The right messaging also requires staying up to date with common vernacular, adapting to your audience, and evolving as language evolves. 

Out: Too Much Focus on Product Features 

The use of complex jargon, an overemphasis on product specifications without context, and the prevalent “Corporate Memphis” design style have got to go! 

These methods fail to resonate with modern B2B audiences seeking clarity and connection and are relics of a past world where B2B was treated differently than B2C.

B2B buyers aren’t the stiff, boring people we tend to treat them like. 

They are humans and they need to feel the same connection as any other customer out there.

In: Value to Customer

The key is to focus on relatable copy that emphasizes storytelling, highlights the value of products to customers rather than dry specifications, and incorporates unique design elements and colors. 

It can really be broken down into four key components:

Shift to Customer-Centric Messaging: Focus on what the customer gains from the product vs. the brand itself.

Humanize the Brand: Use real customer stories and UGC to add authenticity.

Incorporate Humor: Don’t shy away from adding humor to resonate more deeply with the audience.

Testing and Adaptation: Continuously test different aspects of your campaigns to discover what resonates best with your audience.

At the end of the day, it’s about the customer and that’s what your messaging needs to reflect.

A Shifting B2B Marketing Landscape

The B2B marketing world is not immune to changes happening across the board. 

Technology platforms are putting a hard focus on privacy, customer expectations have shifted and continue to shift, and what worked last year (or even six months ago) no longer works as effectively.

All of our panelists gave us some amazing tips for moving forward. From a focus on first-party data to better outbounding to more efficient ads and messaging, there is plenty of work that can be done.

These aren’t all the tips though!

Sign up for the webinar replay and you will get not only the video itself, but you’ll also get more insights into our new B2B consumer directory.

Important Next Steps

See what targeted outbound marketing is all about. Capture and engage your first 500 website visitor leads with Customers.ai X-Ray website visitor identification for free.

Talk and learn about sales outreach automation with other growth enthusiasts. Join Customers.ai Island, our Facebook group of 40K marketers and entrepreneurs who are ready to support you.

Advance your marketing performance with Sales Outreach School, a free tutorial and training area for sales pros and marketers.

The post B2B Marketing in 2024: What’s in & What’s Out appeared first on Customers.ai.

Enhancing User Agency in Generative Language Models: Algorithmic Recou …

Generative Language Models (GLMs) are being increasingly integrated into various sectors, including customer service and content creation, which necessitates maintaining a balance between moderation and freedom of expression. Hence, the need for a sophisticated approach to moderating potentially harmful content while preserving linguistic diversity and inclusiveness has never been more critical. 

Toxicity scoring systems, designed to filter out offensive or harmful language, do help but often struggle with false positives, especially concerning language used by marginalized communities. This issue restricts access to relevant information and stifles cultural expression and language reclamation efforts, where communities reclaim pejorative terms as a form of empowerment. Current moderation methods predominantly rely on fixed thresholds for toxicity scoring, leading to rigid and often biased content filtering. This one-size-fits-all approach fails to account for language’s nuanced and dynamic nature, particularly how it is used in diverse communities.

Researchers from Google DeepMind and UC San Diego have introduced a novel concept: dynamic thresholding for toxicity scoring in GLMs. The proposed algorithmic recourse mechanism allows users to override toxicity thresholds for specific phrases while protecting them from unnecessary exposure to toxic language. Users can specify and interact with content within their tolerances of “toxicity” and provide feedback to inform future user-specific norms or models for toxicity.

Users are first allowed to preview content flagged by the model’s initial toxicity assessment. They can decide whether such content should bypass automatic filters in future interactions. This interactive process fosters a sense of agency among users and tailors the GLM’s responses to align more closely with individual and societal norms. The implementation of this model was rigorously tested through a pilot study involving 30 participants. This study was designed to gauge the usability and effectiveness of the proposed mechanism in real-world scenarios.
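
To make the mechanism concrete, the idea of per-user, per-phrase overrides on top of a base toxicity score can be sketched roughly as follows. The function names, default threshold value, and override store are illustrative assumptions, not the implementation described in the paper.

# Illustrative sketch of dynamic, user-specific toxicity thresholds.
# The default threshold, override store, and function are hypothetical.
DEFAULT_THRESHOLD = 0.7

def should_filter(phrase: str, toxicity_score: float, user_overrides: dict[str, float]) -> bool:
    """Filter a phrase unless the user has raised their tolerance for it."""
    threshold = user_overrides.get(phrase, DEFAULT_THRESHOLD)
    return toxicity_score >= threshold

# A user previews a flagged phrase and chooses to allow it in future interactions.
overrides = {}
overrides["reclaimed term"] = 0.95   # user raises the threshold for this specific phrase

print(should_filter("reclaimed term", 0.8, overrides))   # False: the user override applies
print(should_filter("other phrase", 0.8, overrides))     # True: the default threshold applies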

The dynamic thresholding mechanism demonstrated effectiveness by securing an average System Usability Scale score of 66.8. This outcome, coupled with the study’s participants’ positive feedback, underscores the dynamic system’s superiority over the traditional fixed-threshold model. Participants expressed significant appreciation for the enhanced control and engagement facilitated by the dynamic thresholding, as it allowed for a more tailored interaction experience by enabling adjustments to content filtering based on individual user preferences.

In conclusion, exploring dynamic thresholding for toxicity scoring in GLMs offers promising insights into the future of user experience and agency. It represents a significant step towards more inclusive and flexible technology that respects the evolving nature of language and the diverse needs of its users. However, further research is needed to fully understand the implications of this method and how it can be optimized for various applications.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter. Join our Telegram Channel, Discord Channel, and LinkedIn Group.

If you like our work, you will love our newsletter..

Don’t Forget to join our 39k+ ML SubReddit
The post Enhancing User Agency in Generative Language Models: Algorithmic Recourse for Toxicity Filtering appeared first on MarkTechPost.

This AI Paper Introduces SafeEdit: A New Benchmark to Investigate Deto …

As Large Language Models (LLMs) like ChatGPT, LLaMA, and Mistral continue to advance, concerns about their susceptibility to harmful queries have intensified, prompting the need for robust safeguards. Approaches such as supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), and direct preference optimization (DPO) have been widely adopted to enhance the safety of LLMs, enabling them to reject harmful queries. 

However, despite these advancements, aligned models may still be vulnerable to sophisticated attack prompts, raising questions about the precise modification of toxic regions within LLMs to achieve detoxification. Recent studies have demonstrated that previous approaches, such as DPO, may only suppress the activations of toxic parameters without effectively addressing underlying vulnerabilities, underscoring the importance of developing precise detoxification methods.

In response to these challenges, recent years have seen significant progress in knowledge editing methods tailored for LLMs, allowing for post-training adjustments without compromising overall performance. Leveraging knowledge editing to detoxify LLMs appears intuitive; however, existing datasets and evaluation metrics have focused on specific harmful issues, overlooking the threat posed by attack prompts and neglecting generalizability to various malicious inputs. 

To address this gap, researchers at Zhejiang University have introduced SafeEdit, a comprehensive benchmark designed to evaluate detoxification tasks via knowledge editing. SafeEdit covers nine unsafe categories with powerful attack templates and extends evaluation metrics to include defense success, defense generalization, and general performance, providing a standardized framework for assessing detoxification methods.

Several knowledge editing approaches, including MEND and Ext-Sub, have been explored on LLaMA and Mistral models, demonstrating the potential to detoxify LLMs efficiently with minimal impact on general performance. However, existing methods primarily target factual knowledge and may struggle to identify toxic regions in response to complex adversarial inputs spanning multiple sentences. 

To address these challenges, researchers have proposed a novel knowledge editing baseline, Detoxifying with Intraoperative Neural Monitoring (DINM), which aims to diminish toxic regions within LLMs while minimizing side effects. Extensive experiments on LLaMA and Mistral models have shown that DINM outperforms traditional SFT and DPO methods in detoxifying LLMs, demonstrating stronger detoxification performance, efficiency, and the importance of accurately locating toxic regions.

In conclusion, the findings underscore the significant potential of knowledge editing for detoxifying LLMs, with the introduction of SafeEdit providing a standardized framework for evaluation. The efficient and effective DINM method represents a promising step towards addressing the challenge of detoxifying LLMs, shedding light on future applications of supervised fine-tuning, direct preference optimization, and knowledge editing in enhancing the safety and robustness of large language models.

Check out the Paper and Github. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter. Join our Telegram Channel, Discord Channel, and LinkedIn Group.

If you like our work, you will love our newsletter..

Don’t Forget to join our 39k+ ML SubReddit
The post This AI Paper Introduces SafeEdit: A New Benchmark to Investigate Detoxifying LLMs via Knowledge Editing appeared first on MarkTechPost.

Researchers from Imperial College and GSK AI Introduce RAmBLA: A Machi …

As advanced models, large Language Models (LLMs) are tasked with interpreting complex medical texts, offering concise summaries, and providing accurate, evidence-based responses. The high stakes associated with medical decision-making underscore the paramount importance of these models’ reliability and accuracy. Amidst the increasing integration of LLMs in this sector, a pivotal challenge arises: ensuring these virtual assistants can navigate the intricacies of biomedical information without faltering.

Tackling this issue requires moving away from traditional AI evaluation methods, often focusing on narrow, task-specific benchmarks. While instrumental in gauging AI performance on discrete tasks like identifying drug interactions, these conventional approaches scarcely capture the multifaceted nature of biomedical inquiries. Such inquiries often demand the identification and the synthesis of complex data sets, requiring a nuanced understanding and the generation of comprehensive, contextually relevant responses.

Reliability AssessMent for Biomedical LLM Assistants (RAmBLA) is an innovative framework proposed by Imperial College London and GSK.ai researchers to rigorously assess LLM reliability within the biomedical domain. RAmBLA emphasizes criteria crucial for practical application in biomedicine, including the models’ resilience to diverse input variations, ability to recall pertinent information thoroughly, and proficiency in generating responses devoid of inaccuracies or fabricated information. This holistic evaluation approach represents a significant stride toward harnessing LLMs’ potential as dependable assistants in biomedical research and healthcare.

RAmBLA distinguishes itself by simulating real-world biomedical research scenarios to test LLMs. The framework exposes models to the breadth of challenges they would encounter in actual biomedical settings through meticulously designed tasks ranging from parsing complex prompts to accurately recalling and summarizing medical literature. One notable aspect of RAmBLA’s assessment is its focus on reducing hallucinations, where models generate plausible but incorrect or unfounded information, a critical reliability measure in medical applications.

The study underscored the superior performance of larger LLMs across several tasks, including a notable proficiency in semantic similarity measures, where GPT-4 showcased an impressive 0.952 accuracy in freeform QA tasks within biomedical queries. Despite these advancements, the analysis also highlighted areas needing refinements, such as the propensity for hallucinations and varying recall accuracy. Specifically, while larger models demonstrated a commendable ability to refrain from answering when presented with irrelevant context, achieving a 100% success rate in the ‘I don’t know’ task, smaller models like Llama and Mistral showed a drop in performance, underscoring the need for targeted improvements.

In conclusion, the study candidly addresses the challenges to fully realizing LLMs’ potential as reliable biomedical research tools. The introduction of RAmBLA offers a comprehensive framework that assesses LLMs’ current capabilities and guides enhancements to ensure these models can serve as invaluable, dependable assistants in the quest to advance biomedical science and healthcare.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter. Join our Telegram Channel, Discord Channel, and LinkedIn Group.

If you like our work, you will love our newsletter..

Don’t Forget to join our 39k+ ML SubReddit
The post Researchers from Imperial College and GSK AI Introduce RAmBLA: A Machine Learning Framework for Evaluating the Reliability of LLMs as Assistants in the Biomedical Domain appeared first on MarkTechPost.

Best practices for building secure applications with Amazon Transcribe

Amazon Transcribe is an AWS service that allows customers to convert speech to text in either batch or streaming mode. It uses machine learning–powered automatic speech recognition (ASR), automatic language identification, and post-processing technologies. Amazon Transcribe can be used for transcription of customer care calls, multiparty conference calls, and voicemail messages, as well as subtitle generation for recorded and live videos, to name just a few examples. In this blog post, you will learn how to power your applications with Amazon Transcribe capabilities in a way that meets your security requirements.
Some customers entrust Amazon Transcribe with data that is confidential and proprietary to their business. In other cases, audio content processed by Amazon Transcribe may contain sensitive data that needs to be protected to comply with local laws and regulations. Examples of such information are personally identifiable information (PII), personal health information (PHI), and payment card industry (PCI) data. In the following sections of the blog, we cover different mechanisms Amazon Transcribe has to protect customer data both in transit and at rest. We share the following seven security best practices to build applications with Amazon Transcribe that meet your security and compliance requirements:

Use data protection with Amazon Transcribe
Communicate over a private network path
Redact sensitive data if needed
Use IAM roles for applications and AWS services that require Amazon Transcribe access
Use tag-based access control
Use AWS monitoring tools
Enable AWS Config

The following best practices are general guidelines and don’t represent a complete security solution. Because these best practices might not be appropriate or sufficient for your environment, use them as helpful considerations rather than prescriptions.
Best practice 1 – Use data protection with Amazon Transcribe
Amazon Transcribe conforms to the AWS shared responsibility model, which differentiates AWS responsibility for security of the cloud from customer responsibility for security in the cloud.
AWS is responsible for protecting the global infrastructure that runs all of the AWS Cloud. As the customer, you are responsible for maintaining control over your content that is hosted on this infrastructure. This content includes the security configuration and management tasks for the AWS services that you use. For more information about data privacy, see the Data Privacy FAQ.
Protecting data in transit
Data encryption is used to make sure that data communication between your application and Amazon Transcribe remains confidential. The use of strong cryptographic algorithms protects data while it is being transmitted.
Amazon Transcribe can operate in one of two modes:

Streaming transcriptions allow media stream transcription in real time.
Batch transcription jobs allow transcription of audio files using asynchronous jobs.

In streaming transcription mode, client applications open a bidirectional streaming connection over HTTP/2 or WebSockets. An application sends an audio stream to Amazon Transcribe, and the service responds with a stream of text in real time. Both HTTP/2 and WebSockets streaming connections are established over Transport Layer Security (TLS), which is a widely accepted cryptographic protocol. TLS provides authentication and encryption of data in transit using AWS certificates. We recommend using TLS 1.2 or later.
In batch transcription mode, an audio file first needs to be put in an Amazon Simple Storage Service (Amazon S3) bucket. Then a batch transcription job referencing the S3 URI of this file is created in Amazon Transcribe. Both Amazon Transcribe in batch mode and Amazon S3 use HTTP/1.1 over TLS to protect data in transit.
All requests to Amazon Transcribe over HTTP and WebSockets must be authenticated using AWS Signature Version 4. It is recommended to use Signature Version 4 to authenticate HTTP requests to Amazon S3 as well, although authentication with older Signature Version 2 is also possible in some AWS Regions. Applications must have valid credentials to sign API requests to AWS services.
Protecting data at rest
Amazon Transcribe in batch mode uses S3 buckets to store both the input audio file and the output transcription file. Customers use an S3 bucket to store the input audio file, and it is highly recommended to enable encryption on this bucket. Amazon Transcribe supports the following S3 encryption methods:

Server-Side Encryption with Amazon S3 Managed Keys (SSE-S3)
Server-Side Encryption with KMS keys Stored in AWS Key Management Service (SSE-KMS)

Both methods encrypt customer data as it is written to disks and decrypt it when you access it using one of the strongest block ciphers available: 256-bit Advanced Encryption Standard (AES-256) GCM. When using SSE-S3, encryption keys are managed and regularly rotated by the Amazon S3 service. For additional security and compliance, SSE-KMS provides customers with control over encryption keys via AWS Key Management Service (AWS KMS). AWS KMS gives additional access controls because you must have permissions to use the appropriate KMS keys in order to encrypt and decrypt objects in S3 buckets configured with SSE-KMS. Also, SSE-KMS provides customers with an audit trail capability that keeps records of who used your KMS keys and when.
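
As a brief sketch, default SSE-KMS encryption can be turned on for the input bucket with a call like the following; the bucket name and KMS key ARN are placeholders.

# Sketch: enable default SSE-KMS encryption on the S3 bucket that stores audio input.
# Bucket name and KMS key ARN are placeholders.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_encryption(
    Bucket="my-transcribe-input-bucket",
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "arn:aws:kms:us-east-1:123456789012:key/11111111-2222-3333-4444-555555555555",
                },
                "BucketKeyEnabled": True,
            }
        ]
    },
)
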
The output transcription can be stored in the same or a different customer-owned S3 bucket. In this case, the same SSE-S3 and SSE-KMS encryption options apply. Another option for Amazon Transcribe output in batch mode is using a service-managed S3 bucket. Then output data is put in a secure S3 bucket managed by Amazon Transcribe service, and you are provided with a temporary URI that can be used to download your transcript.
Amazon Transcribe uses encrypted Amazon Elastic Block Store (Amazon EBS) volumes to temporarily store customer data during media processing. The customer data is cleaned up for both complete and failure cases.
Best practice 2 – Communicate over a private network path
Many customers rely on encryption in transit to securely communicate with Amazon Transcribe over the Internet. However, for some applications, data encryption in transit may not be sufficient to meet security requirements. In some cases, data is required to not traverse public networks such as the internet. Also, there may be a requirement for the application to be deployed in a private environment not connected to the internet. To meet these requirements, use interface VPC endpoints powered by AWS PrivateLink.
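
A sketch of creating such an interface endpoint is shown below; the VPC, subnet, and security group IDs are placeholders, and note that streaming transcription is served by a separate endpoint service name (transcribestreaming) from batch transcription (transcribe).

# Sketch: create an interface VPC endpoint (AWS PrivateLink) for Amazon Transcribe.
# VPC, subnet, and security group IDs are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
response = ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.transcribe",  # batch; use ...transcribestreaming for streaming
    SubnetIds=["subnet-0123456789abcdef0"],
    SecurityGroupIds=["sg-0123456789abcdef0"],
    PrivateDnsEnabled=True,
)
print(response["VpcEndpoint"]["VpcEndpointId"])
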
The following architectural diagram demonstrates a use case where an application is deployed on Amazon EC2. The EC2 instance that is running the application does not have access to the internet and is communicating with Amazon Transcribe and Amazon S3 via interface VPC endpoints.

In some scenarios, the application that is communicating with Amazon Transcribe may be deployed in an on-premises data center. There may be additional security or compliance requirements that mandate that data exchanged with Amazon Transcribe must not transit public networks such as the internet. In this case, private connectivity via AWS Direct Connect can be used. The following diagram shows an architecture that allows an on-premises application to communicate with Amazon Transcribe without any connectivity to the internet.

Best practice 3 – Redact sensitive data if needed
Some use cases and regulatory environments may require the removal of sensitive data from transcripts and audio files. Amazon Transcribe supports identifying and redacting personally identifiable information (PII) such as names, addresses, Social Security numbers, and so on. This capability can be used to enable customers to achieve payment card industry (PCI) compliance by redacting PII such as credit or debit card number, expiration date, and three-digit card verification code (CVV). Transcripts with redacted information will have PII replaced with placeholders in square brackets indicating what type of PII was redacted. Streaming transcriptions support the additional capability to only identify PII and label it without redaction. The types of PII redacted by Amazon Transcribe vary between batch and streaming transcriptions. Refer to Redacting PII in your batch job and Redacting or identifying PII in a real-time stream for more details.
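
For batch jobs, redaction is enabled by adding the ContentRedaction parameter when the job is created. The following is a minimal sketch; the job name, S3 URIs, and output bucket are placeholders.

# Sketch: a batch transcription job with PII redaction enabled.
# Job name, S3 URIs, and output bucket are placeholders.
import boto3

transcribe = boto3.client("transcribe")
transcribe.start_transcription_job(
    TranscriptionJobName="call-2024-03-28-redacted",
    LanguageCode="en-US",
    Media={"MediaFileUri": "s3://my-transcribe-input-bucket/calls/call-2024-03-28.wav"},
    OutputBucketName="my-transcribe-output-bucket",
    ContentRedaction={
        "RedactionType": "PII",
        "RedactionOutput": "redacted",  # use "redacted_and_unredacted" to keep both transcripts
    },
)
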
The specialized Amazon Transcribe Call Analytics APIs have a built-in capability to redact PII in both text transcripts and audio files. This API uses specialized speech-to-text and natural language processing (NLP) models trained specifically to understand customer service and sales calls. For other use cases, you can use this solution to redact PII from audio files with Amazon Transcribe.
Additional Amazon Transcribe security best practices
Best practice 4 – Use IAM roles for applications and AWS services that require Amazon Transcribe access. When you use a role, you don’t have to distribute long-term credentials, such as passwords or access keys, to an EC2 instance or AWS service. IAM roles can supply temporary permissions that applications can use when they make requests to AWS resources.
Best practice 5 – Use tag-based access control. You can use tags to control access within your AWS accounts. In Amazon Transcribe, tags can be added to transcription jobs, custom vocabularies, custom vocabulary filters, and custom language models.
Best practice 6 – Use AWS monitoring tools. Monitoring is an important part of maintaining the reliability, security, availability, and performance of Amazon Transcribe and your AWS solutions. You can monitor Amazon Transcribe using AWS CloudTrail and Amazon CloudWatch.
Best practice 7 – Enable AWS Config. AWS Config enables you to assess, audit, and evaluate the configurations of your AWS resources. Using AWS Config, you can review changes in configurations and relationships between AWS resources, investigate detailed resource configuration histories, and determine your overall compliance against the configurations specified in your internal guidelines. This can help you simplify compliance auditing, security analysis, change management, and operational troubleshooting.
Compliance validation for Amazon Transcribe
Applications that you build on AWS may be subject to compliance programs, such as SOC, PCI, FedRAMP, and HIPAA. AWS uses third-party auditors to evaluate its services for compliance with various programs. AWS Artifact allows you to download third-party audit reports.
To find out if an AWS service is within the scope of specific compliance programs, refer to AWS Services in Scope by Compliance Program. For additional information and resources that AWS provides to help customers with compliance, refer to Compliance validation for Amazon Transcribe and AWS compliance resources.
Conclusion
In this post, you have learned about various security mechanisms, best practices, and architectural patterns available for you to build secure applications with Amazon Transcribe. You can protect your sensitive data both in transit and at rest with strong encryption. PII redaction can be used to enable removal of personal information from your transcripts if you do not want to process and store it. VPC endpoints and Direct Connect allow you to establish private connectivity between your application and the Amazon Transcribe service. We also provided references that will help you validate compliance of your application using Amazon Transcribe with programs such as SOC, PCI, FedRAMP, and HIPAA.
As next steps, check out Getting started with Amazon Transcribe to quickly start using the service. Refer to Amazon Transcribe documentation to dive deeper into the service details. And follow Amazon Transcribe on the AWS Machine Learning Blog to keep up to date with new capabilities and use cases for Amazon Transcribe.

About the Author

Alex Bulatkin is a Solutions Architect at AWS. He enjoys helping communication service providers build innovative solutions in AWS that are redefining the telecom industry. He is passionate about working with customers on bringing the power of AWS AI services into their applications. Alex is based in the Denver metropolitan area and likes to hike, ski, and snowboard.

Apple Researchers Propose a Multimodal AI Approach to Device-Directed …

Virtual assistant technology aims to create seamless and intuitive human-device interactions. However, the need for a specific trigger phrase or button press to initiate a command interrupts the fluidity of natural dialogue. Recognizing this challenge, Apple researchers have embarked on a groundbreaking study to enhance the intuitiveness of these interactions. Their solution eliminates the need for trigger phrases, allowing users to interact with devices more spontaneously.

The heart of the challenge lies in accurately identifying when a spoken command is intended for the device amidst a stream of background noise and speech. This problem is markedly more complex than simple wake-word detection because it involves discerning the user’s intent without explicit cues. Previous attempts to address this issue have utilized acoustic signals and linguistic information. However, these methods often falter in noisy environments or ambiguous speech scenarios, highlighting a gap that this new research aims to bridge.

Apple’s research team introduces an innovative multimodal approach that leverages the synergy between acoustic data, linguistic cues, and outputs from automatic speech recognition (ASR) systems. This method’s core is using a large language model (LLM), which, due to its state-of-the-art text comprehension capabilities, can integrate diverse types of data to improve the accuracy of detecting device-directed speech. This approach utilizes the individual strengths of each input type and explores how their combination can offer a more nuanced understanding of user intent.

From a technical standpoint, the researchers’ methodology involves training classifiers using purely acoustic information extracted from audio waveforms. The decoder outputs of an ASR system, including hypotheses and lexical features, are then used as inputs to the LLM. The final step merges these acoustic and lexical features with ASR decoder signals into a multimodal system that inputs into an LLM, creating a robust framework for understanding and categorizing speech directed at a device.

The efficacy of this multimodal system is demonstrated through its performance metrics, which show significant improvements over traditional models. Specifically, the system achieves equal error rate (EER) reductions of up to 39% and 61% over text-only and audio-only models, respectively. Furthermore, by increasing the size of the LLM and applying low-rank adaptation techniques, the research team pushed these EER reductions even further, up to 18% on their dataset.

Apple’s groundbreaking research paves the way for more natural interactions with virtual assistants and sets a new benchmark for the field. By achieving an EER of 7.95% with the Whisper audio encoder and 7.45% with the CLAP backbone, the research showcases the potential of combining text, audio, and decoder signals from an ASR system. These results signify a leap towards the realization of virtual assistants that can understand and respond to user commands without the need for explicit trigger phrases, moving closer to a future where technology understands us just as well as we understand it.

Apple’s research has resulted in significant improvements in human-device interaction. By combining the capabilities of multimodal information and advanced processing powered by LLMs, the research team has paved the way for the next generation of virtual assistants. This technology aims to make our interactions with devices more intuitive, similar to human-to-human communication. It has the potential to change our relationship with technology fundamentally.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter. Join our Telegram Channel, Discord Channel, and LinkedIn Group.

If you like our work, you will love our newsletter..

Don’t Forget to join our 39k+ ML SubReddit
The post Apple Researchers Propose a Multimodal AI Approach to Device-Directed Speech Detection with Large Language Models appeared first on MarkTechPost.

Meta AI Proposes Reverse Training: A Simple and Effective Artificial I …

Large language models have revolutionized natural language processing, providing machines with human-like language abilities. However, despite their prowess, these models grapple with a crucial issue: the Reversal Curse. This term encapsulates their struggle to comprehend logical reversibility, where they often fail to deduce that if ‘A has a feature B,’ it logically implies ‘B is a feature of A.’ This limitation poses a significant challenge in the pursuit of truly intelligent systems.

At FAIR, Meta’s AI research division, scientists have delved into this issue, recognizing that the Reversal Curse is not just an academic concern. It’s a practical problem that hampers the effective use of LLMs in various applications, from automated reasoning to natural language understanding tasks. Although LLMs are remarkably effective at absorbing vast amounts of data, traditional one-directional training methods fall short of teaching them the reversible nature of relationships within that data.

In response to this challenge, the Meta team has proposed a novel training strategy: reverse training. This approach ingeniously doubles the data’s utility by presenting information in both original and reversed forms. For instance, alongside the standard training phrase ‘A has a feature B,’ the model would also encounter ‘B is a feature of A,’ effectively teaching it the concept of reversibility. This technique is akin to introducing a new language to the model, expanding its understanding and flexibility in handling language-based tasks.
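
The paper operates on training strings themselves (for example, reversing word order while keeping entity names intact), but the core idea of doubling the data can be sketched roughly as follows. The naive word-level reversal here is a simplification, not the authors' method.

# Rough sketch of the "reverse training" idea: train on each example in both its
# original and a reversed form. The naive word-level reversal is a simplification
# of the paper's entity-preserving reversal, not the actual method.
def reverse_words(text: str) -> str:
    """Reverse word order as a stand-in for the paper's reversal transformation."""
    return " ".join(reversed(text.split()))

def build_reverse_training_set(examples: list[str]) -> list[str]:
    """Double the data: every example appears forward and reversed."""
    return examples + [reverse_words(x) for x in examples]

corpus = ["A has a feature B", "Paris is the capital of France"]
print(build_reverse_training_set(corpus))
# Prints each original sentence plus its word-reversed counterpart.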

The reverse training method was rigorously tested against traditional models in tasks designed to evaluate the understanding of reversible relationships. The results were telling. In experiments where models were tasked with identifying relationships in both directions, reverse-trained models displayed superior performance. For example, in the reversal task of connecting celebrities to their parents based on the training data, reverse-trained models achieved an accuracy improvement, registering a significant 10.4% accuracy in the more challenging “parent to celebrity” direction, as opposed to 1.6% accuracy seen in models trained using conventional methods. Furthermore, these models enhanced performance in standard tasks, underscoring the versatility and efficiency of the reverse training approach.

This innovative methodology overcomes the Reversal Curse by training language models to recognize and interpret information in forward and backward formats. This breakthrough enhances their reasoning abilities, making them more adept at understanding and interacting with the world. The Meta team’s work exemplifies innovative thinking that pushes the boundaries of what machines can understand and achieve, contributing to the advancement of language modeling techniques.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter. Join our Telegram Channel, Discord Channel, and LinkedIn Group.

If you like our work, you will love our newsletter..

Don’t Forget to join our 39k+ ML SubReddit
The post Meta AI Proposes Reverse Training: A Simple and Effective Artificial Intelligence Training Method to Help Remedy the Reversal Curse in LLMs appeared first on MarkTechPost.