Unlocking the Future of Mathematics with AI: Meet InternLM-Math, the Groundbreaking Language Model for Advanced Math Reasoning and Problem-Solving

The integration of artificial intelligence in mathematical reasoning marks a pivotal advancement in our quest to understand and utilize the very language of the universe. Mathematics, a discipline that stretches from the rudimentary principles of arithmetic to the complexities of algebra and calculus, serves as the bedrock for innovation across various fields, including science, engineering, and technology. The challenge, however, has always been to move beyond mere computation to achieve a level of reasoning and proof akin to human capability.

Significant advancements have been made in the field of large language models (LLMs) to confront this challenge head-on. Through their extensive training on diverse datasets, these models have demonstrated an ability to compute, reason, infer, and even prove mathematical theorems. This evolution from computation to reasoning represents a significant leap forward, offering new tools for solving some of mathematics’ most enduring problems.

InternLM-Math, a state-of-the-art model developed by Shanghai AI Laboratory in collaboration with prestigious academic institutions such as Tsinghua University, Fudan University, and the University of Southern California, is at the forefront of this evolution. An offspring of the foundational InternLM2 model, it represents a paradigm shift in mathematical reasoning. It incorporates a suite of advanced features, including chain-of-thought reasoning, reward modeling, formal reasoning, and data augmentation, all within a unified sequence-to-sequence (seq2seq) framework. This comprehensive approach has positioned InternLM-Math as a frontrunner in the field, capable of tackling a wide range of mathematical tasks with unprecedented accuracy and depth.

The methodology behind InternLM-Math is as innovative as it is effective. The team has significantly enhanced the model’s reasoning capabilities by continuing the pre-training of InternLM2 on mathematical data. The inclusion of chain-of-thought reasoning, in particular, allows InternLM-Math to approach problems step by step, mirroring the human thought process. Coding integration further bolsters this through the reasoning interleaved with coding (RICO) technique, enabling the model to solve complex problems and generate proofs more naturally and intuitively.
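For readers who want to experiment with this style of step-by-step prompting, a minimal sketch using the Hugging Face transformers library might look like the following. The repository ID and prompt wording are assumptions for illustration; check the project’s GitHub for the recommended usage.

```python
# Minimal sketch: chain-of-thought style prompting of InternLM-Math via transformers.
# The model ID and prompt wording below are assumptions; consult the official repo for exact usage.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "internlm/internlm2-math-7b"  # assumed Hugging Face repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True
)

# Ask for explicit step-by-step reasoning before the final answer.
prompt = (
    "Solve the problem and show your reasoning step by step.\n"
    "Problem: If 3x + 7 = 22, what is x?\nSolution:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```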

The performance of InternLM-Math speaks volumes about its capabilities. On various benchmarks, including GSM8K, MATH, and MiniF2F, InternLM-Math has consistently outperformed existing models. Notably, it scored 30.3 on the MiniF2F test set without any fine-tuning, a testament to its robust pre-training and innovative methodology. Furthermore, the model’s ability to use LEAN for solving and proving mathematical statements showcases its versatility and potential as a tool for both research and education.

The implications of InternLM-Math’s achievements are far-reaching. By providing a model capable of verifiable reasoning and proof, Shanghai AI Laboratory has not only advanced the field of artificial intelligence but has also opened new avenues for exploration in mathematics. InternLM-Math’s ability to synthesize new problems, verify solutions, and even improve itself through data augmentation positions it as a pivotal tool in the ongoing quest to deepen our understanding of mathematics.

In summary, InternLM-Math represents a significant milestone in achieving human-like reasoning in mathematics through artificial intelligence. Its development by Shanghai AI Laboratory and academic collaborators marks an important step forward in our ability to solve, reason, and prove mathematical concepts, promising a future where AI-driven tools augment our understanding and exploration of the mathematical world.

Check out the Paper and GitHub.

Huawei Researchers Introduce a Novel and Adaptively Adjustable Loss Function for Weak-to-Strong Supervision

The progress and development of artificial intelligence (AI) heavily rely on human evaluation, guidance, and expertise. In computer vision, convolutional networks acquire a semantic understanding of images through extensive labeling provided by experts, such as delineating object boundaries in datasets like COCO or categorizing images in ImageNet. 

Similarly, in robotics, reinforcement learning often relies on human-defined reward functions to guide machines toward optimal performance. In Natural Language Processing (NLP), recurrent neural networks and Transformers can learn the intricacies of language from vast amounts of unsupervised text generated by humans. This symbiotic relationship highlights how AI models advance by leveraging human intelligence, tapping into the depth and breadth of human expertise to enhance their capabilities and understanding.

Researchers from Huawei introduced the concept of “superalignment” to address the challenge of effectively leveraging human expertise to supervise superhuman AI models. Superalignment aims to align superhuman models to maximize their learning from human input. A seminal concept in this area is Weak-to-Strong Generalization (WSG), which explores using weaker models to supervise stronger ones.

WSG research has shown that stronger models can surpass their weaker counterparts in performance through simple supervision, even with incomplete or flawed labels. This approach has demonstrated effectiveness in natural language processing and reinforcement learning.

Researchers extend their idea to “vision superalignment,” specifically examining the application of Weak-to-Strong Generalization (WSG) within the context of vision foundation models. Multiple scenarios in computer vision, including few-shot learning, transfer learning, noisy label learning, and traditional knowledge distillation settings, were meticulously designed and examined. 

Their approach’s effectiveness stems from its capacity to blend direct learning from the weak model with the strong model’s inherent capability to comprehend and interpret visual data. By leveraging the guidance provided by the weak model while capitalizing on the advanced capabilities of the strong model, this method enables the strong model to transcend the constraints of the weak model, thereby enhancing its predictions.

However, because weak models cannot provide precise guidance and strong models sometimes produce incorrect labels, a smarter method than simply mixing these labels is needed. Since it is hard to know how accurate each label is, the researchers plan to use confidence as a measure for picking the label most likely to be correct. By weighing confidence levels in this way, the best labels can be chosen more effectively, making the model’s predictions more accurate and reliable overall.
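The paper’s exact loss formulation is not reproduced here, but the confidence-based idea can be sketched in a few lines of PyTorch: for each example, compare the weak teacher’s label confidence with the strong student’s own prediction confidence and supervise with whichever source looks more trustworthy. The function and selection rule below are illustrative simplifications, not the authors’ implementation.

```python
# Illustrative sketch of confidence-based weak-to-strong supervision (not the paper's exact loss).
import torch
import torch.nn.functional as F

def weak_to_strong_loss(student_logits, weak_logits):
    """Supervise the strong student with whichever label source is more confident."""
    weak_probs = F.softmax(weak_logits, dim=-1)
    student_probs = F.softmax(student_logits.detach(), dim=-1)

    weak_conf, weak_labels = weak_probs.max(dim=-1)            # weak teacher's hard labels
    student_conf, student_labels = student_probs.max(dim=-1)   # student's own pseudo-labels

    # Per example: trust the student's pseudo-label when it is more confident than the teacher.
    targets = torch.where(student_conf > weak_conf, student_labels, weak_labels)
    return F.cross_entropy(student_logits, targets)

# Usage: loss = weak_to_strong_loss(strong_model(x), weak_model(x)); loss.backward()
```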

Check out the Paper and GitHub.

CREMA by UNC-Chapel Hill: A Modular AI Framework for Efficient Multimodal Video Reasoning

In artificial intelligence, integrating multimodal inputs for video reasoning stands as a frontier, challenging yet ripe with potential. Researchers increasingly focus on leveraging diverse data types – from visual frames and audio snippets to more complex 3D point clouds – to enrich AI’s understanding and interpretation of the world. This endeavor aims to mimic human sensory integration and surpass it in depth and breadth, enabling machines to make sense of complex environments and scenarios with unprecedented clarity.

At the heart of this challenge is the problem of efficiently and effectively fusing these varied modalities. Traditional approaches have often fallen short, either lacking the flexibility to accommodate new data types or demanding prohibitive computational resources. Thus, the quest is for a solution that not only embraces the diversity of sensory data but does so with agility and scalability.

Current methodologies in multimodal learning have shown promise but are hampered by their computational intensity and inflexibility. These systems typically require substantial parameter updates or dedicated modules for each new modality, making the integration of new data types cumbersome and resource-intensive. Such limitations hinder the adaptability and scalability of AI systems in dealing with the richness of real-world inputs.

CREMA, a groundbreaking framework proposed by UNC-Chapel Hill researchers, is designed to revolutionize how AI systems handle multimodal inputs for video reasoning. This innovative approach introduces a modular, efficient system for fusing different modalities, such as optical flow, 3D point clouds, and audio, without requiring extensive parameter updates or bespoke modules for each data type. At its core, CREMA utilizes a query transformer architecture that integrates diverse sensory data, paving the way for a more nuanced and comprehensive AI understanding of complex scenarios.

CREMA’s methodology is notable for its efficiency and adaptability. Employing a set of parameter-efficient modules allows the framework to project diverse modality features into a common embedding space, facilitating seamless integration without overhauling the underlying model architecture. This approach conserves computational resources and ensures the model’s future-proofing, ready to accommodate new modalities as they become relevant.
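The paper’s code is not reproduced here, but the core idea of projecting each modality into a shared embedding space and fusing the result with learnable query tokens can be sketched roughly as follows. The module names, dimensions, and attention-based fusion are illustrative assumptions rather than CREMA’s actual implementation.

```python
# Rough sketch of query-token fusion over multiple modalities (illustrative, not CREMA's code).
import torch
import torch.nn as nn

class ModalityFusion(nn.Module):
    def __init__(self, modality_dims, d_model=768, num_queries=32):
        super().__init__()
        # One lightweight projector per modality maps its features into a shared embedding space.
        self.projectors = nn.ModuleDict(
            {name: nn.Linear(dim, d_model) for name, dim in modality_dims.items()}
        )
        self.queries = nn.Parameter(torch.randn(num_queries, d_model))  # learnable query tokens
        self.attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)

    def forward(self, features):  # features: {modality_name: (batch, tokens_i, dim_i)}
        projected = [self.projectors[name](feats) for name, feats in features.items()]
        tokens = torch.cat(projected, dim=1)                   # concatenate all modality tokens
        q = self.queries.unsqueeze(0).expand(tokens.size(0), -1, -1)
        fused, _ = self.attn(q, tokens, tokens)                # queries attend over all modalities
        return fused                                           # (batch, num_queries, d_model)

fusion = ModalityFusion({"video": 1024, "audio": 512, "flow": 256})
out = fusion({"video": torch.randn(2, 16, 1024),
              "audio": torch.randn(2, 8, 512),
              "flow": torch.randn(2, 16, 256)})
```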

CREMA’s performance has been rigorously validated across various benchmarks, demonstrating superior or equivalent results compared to existing multimodal learning models with a fraction of the trainable parameters. This efficiency does not come at the cost of effectiveness; CREMA adeptly balances the inclusion of new modalities, ensuring that each contributes meaningfully to the reasoning process without overwhelming the system with redundant or irrelevant information.

In conclusion, CREMA represents a significant leap forward in multimodal video reasoning. Its innovative fusion of diverse data types into a coherent, efficient framework not only addresses the challenges of flexibility and computational efficiency but also sets a new standard for future developments in the field. The implications of this research are profound, promising to enhance AI’s ability to interpret and interact with the world in a more nuanced and intelligent way.

Check out the Paper.

Maximizing ROI with Advanced Facebook Ad Strategies

With all the changes happening in the world of Facebook advertising, staying ahead requires more than just a basic understanding of ad placements and targeting. 

As technology evolves and privacy concerns continue to intensify, advertisers tasked with maximizing ROI must continuously refine their approaches. 

This means going beyond the basics and focusing on things like advanced segmentation and targeting, AI and machine learning for optimization, creative testing, and more. 

The thing is, we know how much potential there is in the Facebook ecosystem; it’s why we are an official Meta partner. When done right, advertisers can see really exceptional results.

So let’s dive in and look at how we can maximize performance and take our Facebook ad strategies to the next level. 


Advanced Segmentation and Targeting 

We are big fans of segmentation here at Customers.ai. 

After all, detailed audience segmentation allows advertisers to create and deliver highly relevant content to specific audiences. By looking at demographics, interests, behaviors, and more, you can significantly increase engagement rates and improve overall campaign effectiveness. 

Seems like a good thing right?

There are several ways you can add advanced segmentation strategies to your ad campaigns. Let’s look at three: Custom Audiences, Lookalike Audiences & Advantage+ Audiences.

Custom Audiences

Custom audiences are created from existing data on your customers (think email lists or users who have previously interacted with your content).

While Facebook is pushing advertisers to Advantage+, we think custom audiences are way more valuable. After all, they are YOUR audience. 

And with the right data, you can create fantastic segments with highly targeted creative. This includes:

Website Traffic: Create custom audiences from users who have visited specific pages of your website. For example, with the Customers.ai Website Visitor ID X-Ray pixel, you can identify over 20% of site visitors, even if they don’t give you their information. That data can then be used to retarget individuals on Facebook. With Restore, you can get an even higher match rate (check out this post).

Engage Past Customers: No one wants a one and done buyer. Target users who have previously made a purchase with ads for related products or exclusive offers. Is it time for a refill? Is there an update on the product they purchased? Get creative and make your segments work for you.

Segment by Intent: We know that not every buyer is ready to make a purchase. That’s ok! That’s what ads are for. The ads for visitors who hit the shopping cart should be much different than the ads for visitors who hit the T-Shirts page. By being able to track which pages your users visit, you can create really specific segments based on intent. 

Lucky for you, our tools make all of this not just possible but easy, including our customer journey mapping feature, which shows you all the pages your buyer visited.

There’s also our Facebook integration, which allows you to send audiences directly to your Facebook campaigns.

Lookalike & Advantage+ Audiences 

Lookalike audiences allow you to reach new people whose interests and behaviors are similar to those of your existing customers while Advantage+ audiences use Meta’s advanced AI to build your campaign audience.

The thing with both of these audiences is that they are only as good as the data you populate them with. 

So whether you are still using lookalike audiences or have made the switch to Advantage+, you want to keep the following in mind:

Start with a High-Quality Source Audience: You want to reach those most likely to buy, right? That means your source audience must be people who…you guessed it, bought! Use your best-performing customer segments, such as those with high engagement or conversion rates, as the basis for creating lookalike audiences or informing your Advantage+ audiences.

Specify Audience Size and Similarity: In the case of lookalike audiences, size matters. While Facebook generally recommends a source audience of between 1,000 and 5,000 people, smaller audiences can actually perform better. For Advantage+ audiences, the more data you can “train” Facebook’s AI with, the better.

Update and Refine: Like anything in marketing, it’s important to continuously update and refine your source audiences to ensure your segments remain relevant and effective. 

Whatever kind of audience you decide on (maybe all of them!), segmentation not only elevates the efficiency of your ad spend but also drives superior conversion rates by aligning your content with the distinct needs and interests of each audience segment.


Leveraging AI and Machine Learning for Optimization

AI and machine learning are revolutionizing how Facebook ad campaigns are optimized. 

We’re talking automated bidding processes, campaign data analysis, and real-time campaign optimization. AI algorithms can predict outcomes based on historical performance and real-time data, adjusting bids in milliseconds to maximize ad visibility and engagement. 

These dynamic capabilities ensure that advertisers not only reach their target audience more effectively but also optimize their ad spend, reducing CPA and increasing ROI. 

Here are a few examples of how AI is being used (and can be used) in Facebook advertising:

Dynamic Creative Optimization: AI can be used to automatically test different combinations of ad elements such as images, videos, headlines, descriptions, and CTAs. By analyzing the performance data in real-time, the system identifies the most effective combination for each segment of the audience. Even if you aren’t doing dynamic ad creation, AI tools can be really helpful for creative. 

Predictive Analytics for Audience Targeting: Facebook uses AI to analyze data on things like user behavior, preferences, and engagement patterns. This analysis helps predict which users are most likely to take a specific action (think making a purchase, clicking on a link, or engaging with content) and ensures ads are shown to these particular people. 

Audience Expansion: We already touched on how AI is driving audience expansion above but it’s worth noting just how big of an impact this is now having in ads. With iOS 14 and now the loss of cookies, Facebook audiences have shrunk dramatically. AI is helping Facebook (and you) to build back those audiences and ensure targeting capabilities don’t completely fade away. 

AI is here to stay, and as an advertiser, you can use it to your advantage or you can let your competitors use it to theirs.

Creative Testing and Iteration

Testing isn’t exactly an “advanced” technique in the general sense. However, the process of creative testing can be. 

Testing allows you to understand which elements of your ads resonate most and it allows you to see results quicker. 

Let’s look at a few creative testing strategies:

A/B Testing: At its core, A/B testing is simple – compare two versions of an ad and see which performs better. The key here is variable isolation; change one element at a time—be it the image, headline, ad copy, or CTA—while keeping all other variables constant. By doing it this way, you can get clear insights into which specific changes improve ad performance. (A quick way to check whether the winning variation is statistically meaningful is sketched just after this list.)

Multivariate Testing: For a more comprehensive analysis, multivariate testing lets you test multiple variations of several elements at once. While it can be a bit more challenging, this approach can show how different elements interact with one another and their combined effect on ad performance. The key to multivariate testing is a large audience and statistically significant results.

Sequential Testing: Sequential testing involves giving your audience a series of ad variations over time. While more time-consuming than A/B testing, this strategy can be particularly useful for understanding how changes in creative elements impact ad fatigue and engagement over longer campaigns. The goal – figuring out the optimal frequency and timing for refreshing ad creatives.
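As promised above, here is a minimal sketch of checking whether an A/B result is statistically significant, using a two-proportion z-test on click-through rates. The numbers are made up for illustration; most ad platforms and analytics tools run an equivalent test for you.

```python
# Two-proportion z-test for an ad A/B test (illustrative numbers).
from math import sqrt, erfc

def ab_test_p_value(clicks_a, impressions_a, clicks_b, impressions_b):
    p_a, p_b = clicks_a / impressions_a, clicks_b / impressions_b
    p_pool = (clicks_a + clicks_b) / (impressions_a + impressions_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / impressions_a + 1 / impressions_b))
    z = (p_b - p_a) / se
    return p_a, p_b, erfc(abs(z) / sqrt(2))  # two-sided p-value under a normal approximation

p_a, p_b, p_value = ab_test_p_value(clicks_a=210, impressions_a=10_000,
                                    clicks_b=260, impressions_b=10_000)
print(f"CTR A: {p_a:.2%}, CTR B: {p_b:.2%}, p-value: {p_value:.3f}")
# A p-value below roughly 0.05 suggests the difference is unlikely to be random noise.
```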

In the analysis phase, having the right tools in place is key. Whether it’s Facebook’s analytics tools or third-party platforms, you will need something to understand ad performance and test results. 

By prioritizing creative testing and iteration, advertisers can significantly enhance the effectiveness of their Facebook ads. This cycle of analysis and iteration ensures that ad strategies remain dynamic, targeted, and continually optimized.

Making Facebook Ads Work for You  

As we said earlier, there is so much potential in Facebook Ads. I think sometimes we forget there is actually a whole Meta ecosystem of Facebook, Instagram, WhatsApp, and Messenger, giving us multiple places to reach our audience.

But we won’t be successful if we rely on old strategies. 

In order to make Facebook ads work for you, you must go beyond the basics. We need advanced segmentation and targeting, we need to capitalize on AI technology, and we certainly need to remember to test, test, test.

And if you really want to go above and beyond, you need Customers.ai. 

Our Restore product can skyrocket your Facebook reach and allow you to target website visitors you had no idea existed. 

Want to learn more? Try it free or contact our sales team for more information.


Important Next Steps

See what targeted outbound marketing is all about. Capture and engage your first 500 website visitor leads with Customers.ai X-Ray website visitor identification for free.

Talk and learn about sales outreach automation with other growth enthusiasts. Join Customers.ai Island, our Facebook group of 40K marketers and entrepreneurs who are ready to support you.

Advance your marketing performance with Sales Outreach School, a free tutorial and training area for sales pros and marketers.


Meet Guardrails: An Open-Source Python Package for Specifying Structure and Type, Validating and Correcting the Outputs of Large Language Models (LLMs)

In the vast world of artificial intelligence, developers face a common challenge – ensuring the reliability and quality of outputs generated by large language models (LLMs). The outputs, like generated text or code, must be accurate, structured, and aligned with specified requirements. These outputs may contain biases, bugs, or other usability issues without proper validation.

While developers often rely on LLMs to generate various outputs, there is a need for a tool that can add a layer of assurance, validating and correcting the results. Existing solutions are limited, often requiring manual intervention or lacking a comprehensive approach to ensure both structure and type guarantees in the generated content. This gap in the existing tools prompted the development of Guardrails, an open-source Python package designed to address these challenges.

Guardrails introduces the concept of a “rail spec,” a human-readable file format (.rail) that allows users to define the expected structure and types of LLM outputs. This spec also includes quality criteria, such as checking for biases in generated text or bugs in code. The tool utilizes validators to enforce these criteria and takes corrective actions, such as reasking the LLM when validation fails.

One of Guardrails’ notable features is its compatibility with various LLMs, including popular ones like OpenAI’s GPT and Anthropic’s Claude, as well as any language model available on Hugging Face. This flexibility allows developers to integrate Guardrails seamlessly into their existing workflows.

To showcase its capabilities, Guardrails offers Pydantic-style validation, ensuring that the outputs conform to the specified structure and predefined variable types. The tool goes beyond simple structuring, allowing developers to set up corrective actions when the output fails to meet the specified criteria. For example, if a generated pet name exceeds the defined length, Guardrails triggers a reask to the LLM, prompting it to generate a new, valid name.
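Guardrails’ own API is not reproduced here, but the validate-then-reask pattern the post describes can be sketched generically in plain Python. The call_llm function and the length rule below are hypothetical stand-ins for illustration, not the library’s actual interface.

```python
# Generic sketch of the validate-and-reask pattern; not Guardrails' actual API.
# `call_llm` is a hypothetical stand-in for any chat-completion call.

def generate_pet_name(call_llm, max_len=10, max_retries=2):
    prompt = f"Suggest a pet name of at most {max_len} characters. Reply with the name only."
    for _ in range(max_retries + 1):
        name = call_llm(prompt).strip()
        if 0 < len(name) <= max_len:            # structural / quality check
            return name
        # Validation failed: reask with corrective context appended.
        prompt = (
            f"The previous answer '{name}' was invalid because it exceeds "
            f"{max_len} characters. Try again, replying with the name only."
        )
    raise ValueError("The LLM did not produce a valid output after reasking.")
```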

Guardrails also supports streaming, enabling users to receive validations in real-time without waiting for the entire process to complete. This enhances efficiency and provides a dynamic way to interact with the LLM during the generation process.

In conclusion, Guardrails addresses a crucial aspect of AI development by providing a reliable solution to validate and correct the outputs of LLMs. Its rail spec, Pydantic-style validation, and corrective actions make it a valuable tool for developers striving to enhance AI-generated content’s accuracy, relevance, and quality. With Guardrails, developers can navigate the challenges of ensuring reliable AI outputs with greater confidence and efficiency.

Cornell Researchers Introduce Graph Mamba Networks (GMNs): A General Framework for a New Class of Graph Neural Networks Based on Selective State Space Models

Graph-based machine learning is undergoing a significant transformation, largely propelled by the introduction of Graph Neural Networks (GNNs). These networks have been pivotal in harnessing the complexity of graph-structured data, offering innovative solutions across various domains. Despite their initial success, traditional GNNs, particularly those relying on local message-passing mechanisms, face critical challenges. They struggle to manage long-range dependencies within graphs and often encounter the issue of over-squashing, where information from distant nodes is compressed excessively as it passes through the network layers.

Graph Mamba Networks (GMNs) by researchers from Cornell University emerge as a groundbreaking solution to these challenges. By integrating the principles of State Space Models (SSMs), widely celebrated for their efficiency and effectiveness across different data modalities, GMNs offer a novel approach to graph learning. This innovative framework is designed to overcome the limitations of both traditional GNNs and their more recent advancements, such as Graph Transformers, which, despite their promise, grapple with scalability due to their quadratic computational requirements.

At the heart of GMNs lies a meticulously crafted architecture that embraces neighborhood tokenization, token ordering, and a bidirectional selective SSM encoder, among other features. This structure enhances the network’s ability to capture and model long-range dependencies effectively and addresses the computational and structural constraints that have hampered previous models. GMNs adopt a selective approach to SSM application on graph data, enabling more nuanced and efficient handling of the inherent complexities of graph-structured information.
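The paper’s selective SSM layers are not reproduced here, but the overall flow of scanning ordered neighborhood tokens in both directions can be sketched as follows. The simple linear state-space recurrence below is only a stand-in for GMN’s selective SSM block, and all names and dimensions are illustrative.

```python
# Illustrative sketch of a bidirectional scan over ordered neighborhood tokens.
# SimpleSSM is a minimal (non-selective) stand-in for GMN's selective SSM block.
import torch
import torch.nn as nn

class SimpleSSM(nn.Module):
    """Minimal linear state-space recurrence: h_t = a * h_{t-1} + B x_t, y_t = C h_t."""
    def __init__(self, d_model, d_state=16):
        super().__init__()
        self.a = nn.Parameter(torch.full((d_state,), 0.9))
        self.B = nn.Linear(d_model, d_state, bias=False)
        self.C = nn.Linear(d_state, d_model, bias=False)

    def forward(self, x):                                  # x: (batch, length, d_model)
        h = torch.zeros(x.size(0), self.a.numel(), device=x.device)
        outs = []
        for t in range(x.size(1)):
            h = self.a * h + self.B(x[:, t])
            outs.append(self.C(h))
        return torch.stack(outs, dim=1)

class BidirectionalSSMEncoder(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.fwd, self.bwd = SimpleSSM(d_model), SimpleSSM(d_model)

    def forward(self, tokens):                             # ordered neighborhood tokens
        return self.fwd(tokens) + self.bwd(tokens.flip(1)).flip(1)  # combine both directions

encoder = BidirectionalSSMEncoder(d_model=64)
node_repr = encoder(torch.randn(8, 20, 64)).mean(dim=1)    # pool scanned tokens per node
```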

The introduction of GMNs into the landscape of graph-based machine learning is not without empirical validation. Rigorous testing across a spectrum of benchmarks reveals that GMNs excel in tasks requiring modeling long-range interactions within graphs. This exceptional performance is not just a testament to the architectural ingenuity of GMNs but also highlights the strategic leverage of SSMs’ strengths in a graph-learning context. GMNs distinguish themselves through their computational efficiency, setting a new standard in the field.

GMNs stand out as a beacon of progress. They signify a major leap in our capacity to learn from graph-structured data and open up a myriad of possibilities for exploration and application. From analyzing complex social networks to deciphering the intricate molecular structures that define life, GMNs offer a robust and efficient framework for understanding how data connects and interacts.

In conclusion, the advent of Graph Mamba Networks marks a pivotal moment in graph-based machine learning:

GMNs adeptly incorporate state space models to address the limitations of traditional GNNs and Graph Transformers, paving the way for more efficient graph learning.

The unique architecture of GMNs, featuring neighborhood tokenization and a bidirectional selective SSM encoder, enables the nuanced handling of graph-structured data.

Demonstrated through extensive benchmarks, GMNs excel in capturing long-range dependencies within graphs, showcasing superior performance and remarkable computational efficiency.

GMNs open new avenues for research and application across various domains by enhancing our ability to model and understand graph-structured data.

Check out the Paper.

LAION Presents BUD-E: An Open-Source Voice Assistant that Runs on a Gaming Laptop with Low Latency without Requiring an Internet Connection

In the fast-paced world of technology, where innovation often outpaces human interaction, LAION and its collaborators at the ELLIS Institute Tübingen, Collabora, and the Tübingen AI Center are taking a giant leap towards revolutionizing how we converse with artificial intelligence. Their brainchild, BUD-E (Buddy for Understanding and Digital Empathy), seeks to break down the barriers of stilted, mechanical responses that have long hindered our immersive experiences with AI voice assistants.

The journey began with a mission to create a baseline voice assistant that not only responded in real time but also embraced natural voices, empathy, and emotional intelligence. The team recognized the shortcomings of existing models, focusing on reducing latency and enhancing the overall conversational quality. The result? A carefully evaluated model boasting response times as low as 300 to 500 ms, setting the stage for a more seamless and responsive interaction.

However, the developers acknowledge that the road to a truly empathic and natural voice assistant is still in progress. Their open-source initiative invites contributions from a global community, emphasizing the need to tackle immediate problems and work towards a shared vision.

One key area of focus is the reduction of latency and system requirements. The team aims to achieve response times below 300 ms through sophisticated quantization techniques and fine-tuning streaming models, even with larger models. This dedication to real-time interaction lays the groundwork for an AI companion that mirrors the fluidity of human conversation.

The quest for naturalness extends to speech and responses. Leveraging a dataset of natural human dialogues, the developers are fine-tuning BUD-E to respond similarly to humans, incorporating interruptions, affirmations, and thinking pauses. The goal is to create an AI voice assistant that not only understands language but also mirrors the nuances of human expression.

BUD-E’s memory is another remarkable feature in development. With tools like Retrieval Augmented Generation (RAG) and Conversation Memory, the model aims to keep track of conversations over extended periods, unlocking a new level of context familiarity.
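BUD-E’s implementation is not shown here, but the basic retrieval-augmented memory idea (embed past turns, then retrieve the most relevant ones for the current query) can be sketched as follows. The embed function is a placeholder for any sentence-embedding model, and the class is an illustrative simplification.

```python
# Minimal sketch of retrieval-augmented conversation memory (illustrative only).
# `embed` is a placeholder for any sentence-embedding model, e.g. a sentence-transformers encoder.
import numpy as np

class ConversationMemory:
    def __init__(self, embed):
        self.embed = embed
        self.turns, self.vectors = [], []

    def add(self, turn: str):
        self.turns.append(turn)
        self.vectors.append(self.embed(turn))

    def retrieve(self, query: str, k: int = 3):
        if not self.turns:
            return []
        q = self.embed(query)
        sims = [np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)) for v in self.vectors]
        top = np.argsort(sims)[::-1][:k]          # highest cosine similarity first
        return [self.turns[i] for i in top]

# Retrieved turns would be prepended to the prompt so the assistant "remembers" older context.
```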

The developers are not stopping there. BUD-E is envisioned to be a multi-modal assistant, incorporating visual input through a lightweight vision encoder. The incorporation of webcam images to evaluate user emotions adds a layer of emotional intelligence, bringing the AI voice assistant closer to understanding and responding to human feelings.

Building a user-friendly interface is also a priority. The team plans to implement LLamaFile for easy cross-platform installation and deployment, introducing an animated avatar akin to Meta’s Audio2Photoreal. A chat-based interface capturing conversations in writing and providing ways to capture user feedback aims to make the interaction intuitive and enjoyable.

Furthermore, BUD-E is not limited by language or the number of speakers. The developers are extending streaming Speech-to-Text to more languages, including low-resource ones, and plan to accommodate multi-speaker environments seamlessly.

In conclusion, the development of BUD-E represents a collective effort to create AI voice assistants that engage in natural, intuitive, and empathetic conversations. The future of conversational AI looks promising as BUD-E stands as a beacon, lighting the way for the next era of human-technology interaction.

Check out the Code and Blog.

How to Use Restore by Customers.ai to Boost Your Ad Campaigns

Digital marketing’s getting trickier by the day, isn’t it? Between all the new privacy laws, the end of those handy third-party cookies, and the ever-annoying bots tricking us into wasting our ad budgets, it feels like we’re running an obstacle course. And let’s not even get started on trying to keep up with how fast everything changes, including what our audiences are looking for. It’s a lot.

But, imagine if we didn’t have to stress so much about these things. What if there was a way to get ahead of the game, to really understand and reach the people we want to talk to, without stepping on any privacy landmines? And what if we could make sure our hard-earned ad dollars were actually going towards real, live human beings interested in what we’ve got to offer?

Enter Restore. It’s our new tool that’s all about making your life a whole lot easier. We’ve figured out a way to track what matters—real interest and intent—without relying on those disappearing cookies. And bots? We can help you spot those so you’re not throwing your budget into the digital void.

In this blog post, we’re diving deep into what’s been making digital advertising such a headache lately and how Restore is changing the game. It’s not just about keeping up anymore; it’s about setting the pace. We’ll talk about how Restore’s tech can give you a clearer view of who’s actually interested in what you’re selling, all while playing nice with privacy rules and keeping your ad spend in check.



Common Digital Advertising Problems

Let’s take a step back and explore how the digital advertising landscape has evolved, leading us to navigate through some challenging waters today. It’s been quite a journey from the simpler times of online advertising to the complex scenario we’re dealing with now.

The Rise of Privacy Concerns

One of the most significant shifts has been the heightened focus on privacy. With regulations like the GDPR in Europe and the CCPA in California coming into effect, the importance of respecting user privacy and handling data responsibly has taken center stage. These changes have fundamentally altered how we can collect and use data, emphasizing the need for transparency and user consent. It’s a necessary evolution, ensuring that trust and respect form the foundation of our interactions online.

Navigating a World Without Cookies

The gradual elimination of third-party cookies is another pivotal change. Major browsers are moving away from cookies, pushing the industry towards more privacy-friendly ways of understanding user behavior. This transition challenges us to innovate and find new methods to reach our audience without compromising on privacy.

The Challenge of Bots and Fraud

Bots have also become a more pronounced issue, creating a landscape where ad fraud can easily eat into budgets without delivering any real engagement. Identifying and mitigating the impact of bots is crucial in ensuring that our advertising efforts reach real people and drive genuine interactions.

Ad Saturation and Consumer Fatigue

With the digital space becoming increasingly crowded, capturing and maintaining audience attention has become more difficult. Consumers are bombarded with ads, leading to ad fatigue and making it harder for messages to stand out. This saturation calls for more creative and engaging approaches to advertising, prioritizing quality and relevance.

The Increasing Complexity of Digital Advertising

Finally, the sheer complexity of digital advertising today, with its myriad platforms, formats, and strategies, demands a more sophisticated approach. Staying ahead requires not just keeping up with current trends but anticipating future shifts, all while maintaining an ethical and responsible stance towards privacy and data use.

In this evolving landscape, the need for tools that can navigate these complexities while upholding the principles of privacy and engagement has never been clearer. Enter a solution designed to meet these modern challenges with integrity and innovation. Let’s dive into how we can adapt and thrive in this new era of digital advertising, making every connection count.

Navigating the current digital advertising landscape requires a blend of innovation, strategy, and a keen understanding of the evolving challenges. As we’ve explored the shifts and turns in this environment, the next step is to delve into effective strategies that can help advertisers thrive amidst these changes. Three key strategies stand out for their ability to enhance engagement, optimize spend, and ultimately drive better results: accurate retargeting and lookalike audiences, understanding the full customer journey, and avoiding wasteful spending on bots.

Overcoming Challenges with Restore by Customers.ai

We’ve designed our Restore tool to help our customers vault over these pernicious hurdles. Here’s how: 

Accurate Retargeting and Lookalike Audiences

Retargeting and lookalike audiences are the two most crucial audience building tools in your digital advertising arsenal. 

But because Facebook’s pixel is blocked by so many browsers, these crucial targeting audiences are built with bad data, meaning your campaigns won’t be reaching who they should. That’s bad! 

Our Restore tool gathers data from our Website Visitor ID X-Ray Pixel, which isn’t blocked by any browsers. That means that you can build your retargeting and lookalike audience off of people who are actually visiting your site. 

Identifying High-Intent Users

Zooming in on high-intent users is a game-changer. It’s all about spotting those who are just a nudge away from making a purchase or signing up. By tracking key behaviors—like how often they visit your site, what they’re checking out, or how long they linger on certain pages—you get a clearer picture of who’s really interested. This isn’t just about following the crowd; it’s about understanding who’s ready to take the leap. With smarter analytics, you can tailor your messages to catch them at just the right moment, turning warm leads into solid conversions. This strategy sharpens your focus, ensuring your ad dollars target the people most likely to act, making every penny count.

Our Restore tool allows you to target people who’ve visited specific high-intent pages or viewed the same product several times. This means that you don’t waste money on badly targeted campaigns to low-intent visitors! 

Creating these audiences is very simple! 

In your Customers.ai account, navigate to your My Leads tab and click on “Audiences.” 

You’ll see a big list of all your contacts. Then select “Add Filter.”

In the Attribute drop-down, select “Landing Page URL” and in the Operator drop-down select “Equals.” Paste the landing page URL in the Value section. 

Then just save it as an audience and you’re good to send it to your Facebook account! 

Understanding the Full Customer Journey

The digital touchpoints a customer interacts with on their journey are like pieces of a puzzle. Having visibility into the entire picture is crucial for crafting campaigns that not only reach the customer at the right time but also with the right message. This holistic view goes beyond the last click, acknowledging that earlier interactions, though they may not directly lead to a conversion, play a significant role in influencing the customer’s decision-making process.

Our Customer Journey tools allow you to understand your customers’ journeys across devices and retarget ads to them more effectively. 

Avoiding Spending on Bots

The digital ad space is fraught with inefficiencies, one of the most glaring being the expenditure on non-human traffic, namely bots. Bots can skew analytics, drain budgets, and dilute the effectiveness of campaigns. Implementing strategies to identify and exclude bot traffic is not just about saving money; it’s about ensuring that every dollar spent is an investment in reaching real, engaged users.

Because our audiences are built on who actually visits your site–and we verify your contacts–you don’t have to worry about chasing a bot who’ll never purchase from you (because they don’t really exist!). 

Interested in getting started? See how many contacts we could pull for your ad audiences here: 

Convert Website Visitors into Real Contacts!

Identify who is visiting your site with name, email and more. Get 500 contacts for free!




Google AI Introduces ScreenAI: A Vision-Language Model for User Interfaces (UI) and Infographics Understanding

The capacity of infographics to strategically arrange and use visual signals to clarify complicated concepts has made them essential for efficient communication. Infographics include various visual elements such as charts, diagrams, illustrations, maps, tables, and document layouts. This has been a long-standing technique that makes the material easier to understand. User interfaces (UIs) on desktop and mobile platforms share design concepts and visual languages with infographics in the modern digital world. 

Though there is a lot of overlap between UIs and infographics, creating a cohesive model is made more difficult by the complexity of each. It is difficult to develop a single model that can efficiently analyze and interpret the visual information encoded in pixels because of the intricacy required in understanding, reasoning, and engaging with the various aspects of infographics and user interfaces.

To address this, a team of researchers at Google Research recently proposed ScreenAI as a solution. ScreenAI is a Vision-Language Model (VLM) that can fully comprehend both UIs and infographics. Its scope includes tasks like graphical question-answering (QA), which may involve charts, pictures, maps, and more.

The team has shared that ScreenAI can manage jobs like element annotation, summarization, navigation, and additional UI-specific QA. To accomplish this, the model combines the flexible patching method taken from Pix2struct with the PaLI architecture, which allows it to tackle vision-related tasks by converting them into text or image-to-text problems.
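The exact Pix2Struct patching code is not reproduced here, but the idea of choosing a patch grid that preserves a screenshot’s aspect ratio within a fixed patch budget can be sketched as follows. This is a simplified approximation for illustration, not ScreenAI’s implementation.

```python
# Simplified sketch of aspect-ratio-preserving patching (not ScreenAI's actual implementation).
import math

def flexible_patch_grid(height, width, patch_size=16, max_patches=1024):
    """Find the largest uniform scale whose patch grid stays within the patch budget."""
    best = (1, 1)
    lo, hi = 1e-4, 10.0
    for _ in range(50):                          # binary search over the scale factor
        scale = (lo + hi) / 2
        rows = max(1, math.ceil(height * scale / patch_size))
        cols = max(1, math.ceil(width * scale / patch_size))
        if rows * cols <= max_patches:
            best, lo = (rows, cols), scale       # feasible: try a larger scale
        else:
            hi = scale
    return best

# A wide 800x2400 screenshot keeps its aspect ratio instead of being squashed to a square grid.
print(flexible_patch_grid(800, 2400))            # roughly three times as many columns as rows
```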

Several tests have been carried out to demonstrate how these design decisions affect the model’s functionality. Upon evaluation, ScreenAI produced new state-of-the-art results on tasks like Multipage DocVQA, WebSRC, MoTIF, and Widget Captioning with under 5 billion parameters. It achieved remarkable performance on tasks including DocVQA, InfographicVQA, and Chart QA, outperforming models of comparable size. 

The team has made available three additional datasets: Screen Annotation, ScreenQA Short, and Complex ScreenQA. One of these datasets specifically focuses on the screen annotation task for future research, while the other two datasets are focused on question-answering, thus further expanding the resources available to advance the field. 

The team has summarized their primary contributions as follows:

The Vision-Language Model (VLM) ScreenAI concept is a step towards a holistic solution that focuses on infographic and user interface comprehension. By utilizing the common visual language and sophisticated design of these components, ScreenAI offers a comprehensive method for understanding digital material.

One significant advancement is the development of a textual representation for UIs. During the pretraining stage, this representation has been used to teach the model how to comprehend user interfaces, improving its capacity to comprehend and process visual data.

To automatically create training data at scale, ScreenAI has used LLMs and the new UI representation, making training more effective and comprehensive.

Three new datasets, Screen Annotation, ScreenQA Short, and Complex ScreenQA, have been released. These datasets allow for thorough model benchmarking for screen-based question answering and the suggested textual representation.

Even with only 4.6 billion parameters, ScreenAI has outperformed models more than ten times its size on four public infographics QA benchmarks.

Check out the Paper.

What is Fine Tuning and Best Methods for Large Language Model (LLM) Fine-Tuning

Large Language Models (LLMs) such as GPT, PaLM, and LLaMA have made major advancements in the field of Artificial Intelligence (AI) and Natural Language Processing (NLP) by enabling machines to comprehend and produce content that is similar to that of humans. These models possess an extensive comprehension of language and its subtleties, having been trained on massive amounts of data. However, their generalist character frequently proves inadequate for specialized tasks or domains. This is where finetuning, a crucial procedure that greatly improves the model’s performance on such tasks, enters the picture.

What is Fine Tuning?

Finetuning is a way to adapt a language model that has already been trained so that it performs well in a specific area. Even though LLMs have remarkable comprehension and production skills, they are not naturally suited to tackling specialized tasks accurately. By retraining the model on a more manageable, domain-specific dataset, finetuning overcomes this constraint and enables the model to acquire the nuances and distinctive features of the intended field.

A pre-trained model with a broad grasp of language is the starting point for finetuning. This model is finetuned by subjecting it to a carefully selected dataset. Through this exposure, the model modifies its internal parameters, such as weights and biases, to better match the data’s characteristics. This specialized training phase helps the model grasp the domain’s intricacies, vocabulary, and context, greatly enhancing its performance on tasks linked to that domain.

Fine Tuning Approaches

1. Parameter-Efficient Fine-Tuning (PEFT)

The main notion underlying PEFT is that reducing the number of trainable parameters in a neural network makes the training process more computationally efficient. LoRA and QLoRA are two prominent PEFT approaches.

a) LoRA 

Low-Rank Adaptation, or LoRA, is a PEFT method that operates as an adapter-based strategy. LoRA simply adds new parameters during the training phase, never permanently changing the model architecture. This method enables parameter-efficient finetuning without adding more parameters to the model overall.

LoRA achieves parameter efficiency by factorizing the weight update matrix into two smaller matrices, A and B, whose size is controlled by a rank parameter ‘r.’ The weight update matrix has the same shape as the original weight matrix being finetuned and represents the modifications learned through backpropagation; instead of learning it directly, only the two small factors are trained, using standard backpropagation.
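To make the factorization concrete, here is a minimal sketch of a LoRA-augmented linear layer in PyTorch. It is a simplified illustration; production implementations such as the peft library add dropout, merging logic, and per-module targeting.

```python
# Minimal LoRA linear layer: y = W x + (alpha / r) * B(A(x)), with only A and B trainable.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        for p in self.base.parameters():
            p.requires_grad_(False)                       # freeze the pre-trained weight
        self.A = nn.Linear(in_features, r, bias=False)    # down-projection (r x in_features)
        self.B = nn.Linear(r, out_features, bias=False)   # up-projection (out_features x r)
        nn.init.zeros_(self.B.weight)                     # start with a zero update, so W' = W
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * self.B(self.A(x))

layer = LoRALinear(1024, 1024, r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 16,384 trainable parameters (2 * r * 1024) versus ~1.05M frozen
```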

b) QLoRA

Quantized LoRA, often known as QLoRA, is an improvement on LoRA that combines low-precision storage with high-precision computation techniques. The goal of this combination is to maintain good accuracy and performance while keeping the model small.

To accomplish its objectives, QLoRA introduces two crucial concepts: 4-bit NormalFloat, in which weight values are stored using a 4-bit normal-float representation, and double quantization, which quantizes the quantization constants themselves to save additional memory.
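In practice these ideas are usually applied through the bitsandbytes and peft libraries; a common configuration looks roughly like the following. Treat this as a sketch of the usual pattern rather than copy-paste-ready code, since the library APIs evolve and the model ID is only an example.

```python
# Typical QLoRA-style setup with transformers + peft + bitsandbytes (sketch; APIs vary by version).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # 4-bit NormalFloat storage
    bnb_4bit_use_double_quant=True,        # quantize the quantization constants too
    bnb_4bit_compute_dtype=torch.bfloat16, # high-precision compute
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                         target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```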

2. Supervised finetuning

Supervised finetuning is a method of optimizing LLMs using task-specific labeled datasets. The foundation of this approach is the idea that every input data point in these datasets is paired with an accurate label or response, which acts as ground truth for the model to follow during its learning phase. Through supervised fine-tuning, the model is encouraged to adjust its internal parameters so that it predicts these labels with high accuracy. This builds on the huge knowledge base the model gathered from large datasets during its initial pre-training phase and refines it to the particulars and demands of the intended task.

a) Basic Hyperparameter Tuning

Using this fundamental method, the model’s hyperparameters, the key variables that control the training process, such as the learning rate, batch size, and number of training epochs, are carefully adjusted. The essence of basic hyperparameter tuning is finding the combination of these settings that enables the model to learn from the task-specific data most effectively. This significantly increases learning efficacy, improving the model’s task-specific performance while reducing the likelihood of overfitting.
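With the Hugging Face Trainer, for example, these knobs are exposed directly as training arguments. The values below are arbitrary starting points for illustration, not recommendations, and the API may differ slightly across library versions.

```python
# Basic hyperparameters exposed through Hugging Face TrainingArguments (illustrative values).
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="finetuned-model",
    learning_rate=2e-5,                  # step size for weight updates
    per_device_train_batch_size=16,      # examples per gradient step per device
    num_train_epochs=3,                  # passes over the task-specific dataset
    weight_decay=0.01,                   # mild regularization against overfitting
)

# trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train()
```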

b) Transfer Learning

Transfer learning is particularly useful when there is a shortage of task-specific data. It begins with a pre-trained model on a large-scale, widely-used dataset. The smaller, task-specific dataset is then used to refine this model. Utilizing the model’s previously gained, broad information and tailoring it to the new task is the essence of transfer learning. In addition to saving time and training resources, this method frequently produces better outcomes than creating a model from scratch.

c) Few-shot learning

Few-shot learning enables a model to rapidly adjust to a new task using the least amount of task-specific data possible. By utilizing the model’s vast pre-trained knowledge base, it can understand the new task in a few instances. This approach is helpful when gathering a sizable labeled dataset for the new task is not feasible. The foundation of few-shot learning is the idea that a limited number of examples given during inference can successfully direct the model’s comprehension and execution of the novel job.

3. Reinforcement Learning from Human Feedback (RLHF) 

RLHF is an approach to language model training that integrates human evaluation skills and sophisticated comprehension into machine learning. This technology allows language models to be dynamically improved, resulting in outputs that are accurate, socially and contextually suitable. The key to RLHF is its capacity to combine the algorithmic learning powers of models with the subjective assessments of human feedback, allowing the models to develop more naturally and more responsively.

a) Reward modeling

In reward modeling, the model is prompted to produce a range of possible responses, which are then assessed through human evaluation. A variety of factors, such as appropriateness, coherence, and relevance, are taken into consideration by the evaluators when rating or ranking these outputs. A separate reward model is then trained on this human input, learning to predict the reward for a given output based on the human evaluations. The language model uses this learned reward function as a guide, modifying its outputs over time to maximize the predicted reward.
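The standard way to train such a reward model on ranked pairs is a pairwise (Bradley-Terry style) loss, sketched below. This is a simplified illustration of the common formulation rather than any specific implementation.

```python
# Pairwise reward-model loss on (chosen, rejected) response pairs (simplified illustration).
import torch
import torch.nn.functional as F

def reward_model_loss(reward_chosen, reward_rejected):
    """Encourage the reward model to score human-preferred responses higher."""
    # -log sigmoid(r_chosen - r_rejected): loss shrinks as the preferred response's margin grows.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

r_chosen = torch.tensor([1.2, 0.3])     # scores for responses humans preferred
r_rejected = torch.tensor([0.1, 0.6])   # scores for the rejected alternatives
print(reward_model_loss(r_chosen, r_rejected))
```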

b) Proximal Policy Optimisation

Within the RLHF paradigm, Proximal Policy Optimisation (PPO) is a more technical step that focuses on iteratively improving the model’s decision-making policy in order to raise the expected reward. The key to PPO’s effectiveness is its deliberate approach to policy updates, which makes meaningful but cautiously incremental changes to the model’s policy to prevent dramatic shifts that could upset the learning trajectory.

This is accomplished through an objective function that incorporates a clipping mechanism to control how far each policy update can move. In this way, PPO ensures that updates remain significant enough to contribute to learning while not deviating too much from the previous policy iteration. This constraint mechanism is essential to PPO’s effectiveness because it fosters a steady and balanced learning process that is less vulnerable to the dangers of unpredictable policy changes.
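The clipped surrogate objective at the heart of PPO can be written compactly; below is a sketch of the per-sample loss as it is commonly implemented. Advantage estimation and the KL penalty typically used in RLHF are omitted for brevity.

```python
# PPO clipped surrogate loss (sketch; omits advantage estimation and RLHF's KL penalty).
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    ratio = torch.exp(logp_new - logp_old)                         # r_t = pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()                   # maximize the clipped objective

loss = ppo_clip_loss(
    logp_new=torch.tensor([-1.0, -0.5]),
    logp_old=torch.tensor([-1.2, -0.4]),
    advantages=torch.tensor([0.7, -0.3]),
)
print(loss)
```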

References 

https://www.turing.com/resources/finetuning-large-language-models

Parameter-Efficient Fine-Tuning of Large Language Models with LoRA and QLoRA

https://medium.com/@sujathamudadla1213/difference-between-qlora-and-lora-for-fine-tuning-llms-0ea35a195535

A Comprehensive Guide to Fine-Tuning Large Language Models

https://www.signalfire.com/blog/comparing-llm-fine-tuning-methods


Unlocking AI’s Potential: A Comprehensive Survey of Prompt Engineering Techniques

Prompt engineering has burgeoned into a pivotal technique for augmenting the capabilities of large language models (LLMs) and vision-language models (VLMs), utilizing task-specific instructions or prompts to amplify model efficacy without altering core model parameters. These prompts range from natural language instructions that provide context to guide the model to learned vector representations that activate relevant knowledge, fostering success in myriad applications like question-answering and commonsense reasoning. Despite its burgeoning use, however, a systematic organization and understanding of the diverse prompt engineering methods has been lacking.

This survey by researchers from the Indian Institute of Technology Patna, Stanford University, and Amazon AI endeavors to bridge this gap by offering a structured overview of the recent advancements in prompt engineering, categorized by application area. It meticulously analyzes over 29 distinct techniques, delving into their methodologies, applications, models involved, and datasets utilized. This examination extends from foundational methods like zero-shot and few-shot prompting to more intricate approaches such as chain of code prompting, showcasing the field’s breadth and depth.

The survey highlights the transformative impact of prompt engineering on the adaptability of LLMs and VLMs, enabling these models to excel across diverse tasks and domains with a finesse previously unattainable through traditional model training paradigms. Prompt engineering pushes the boundaries of AI by sidestepping the need for model retraining or extensive fine-tuning, paving the way for a future teeming with possibilities.

The survey underscores the importance of prompt engineering in steering model responses, thus enhancing the adaptability and applicability of LLMs across various sectors. It presents a comprehensive taxonomy and summarizes key points, datasets, models, and the critical features of each prompting technique, providing a clearer understanding of this rapidly developing field. This systematic analysis aims to illuminate open challenges and opportunities for prompt engineering, facilitating future research in this dynamic arena.

In conclusion, prompt engineering has emerged as a transformative force in artificial intelligence, unlocking the vast potential of LLMs. This survey serves as a foundational resource, categorizing distinct prompt engineering techniques based on their functionalities, inspiring further research, and empowering innovators in the evolving landscape of prompt engineering. Despite its successes, challenges such as biases, factual inaccuracies, and interpretability gaps persist, necessitating continued investigation and mitigation strategies. With emerging trends like meta-learning and hybrid prompting architectures, the future of prompt engineering holds immense potential, yet ethical considerations remain paramount to ensure its responsible development and deployment.

Check out the Paper. All credit for this research goes to the researchers of this project.

Streamline diarization using AI as an assistive technology: ZOO Digita …

ZOO Digital provides end-to-end localization and media services to adapt original TV and movie content to different languages, regions, and cultures. It makes globalization easier for the world’s best content creators. Trusted by the biggest names in entertainment, ZOO Digital delivers high-quality localization and media services at scale, including dubbing, subtitling, scripting, and compliance.
Typical localization workflows require manual speaker diarization, wherein an audio stream is segmented based on the identity of the speaker. This time-consuming process must be completed before content can be dubbed into another language. With manual methods, a 30-minute episode can take between 1 and 3 hours to localize. Through automation, ZOO Digital aims to achieve localization in under 30 minutes.
In this post, we discuss deploying scalable machine learning (ML) models for diarizing media content using Amazon SageMaker, with a focus on the WhisperX model.
Background
ZOO Digital’s vision is to provide a faster turnaround of localized content. This goal is bottlenecked by the manually intensive nature of the exercise, compounded by the small workforce of skilled people who can localize content manually. ZOO Digital works with over 11,000 freelancers and localized over 600 million words in 2022 alone. However, the supply of skilled people is being outstripped by the increasing demand for content, requiring automation to assist with localization workflows.
With an aim to accelerate the localization of content workflows through machine learning, ZOO Digital engaged AWS Prototyping, an investment program by AWS to co-build workloads with customers. The engagement focused on delivering a functional solution for the localization process, while providing hands-on training to ZOO Digital developers on SageMaker, Amazon Transcribe, and Amazon Translate.
Customer challenge
After a title (a movie or an episode of a TV series) has been transcribed, speakers must be assigned to each segment of speech so that they can be correctly assigned to the voice artists that are cast to play the characters. This process is called speaker diarization. ZOO Digital faces the challenge of diarizing content at scale while being economically viable.
Solution overview
In this prototype, we stored the original media files in a specified Amazon Simple Storage Service (Amazon S3) bucket. This S3 bucket was configured to emit an event when new files are detected within it, triggering an AWS Lambda function. For instructions on configuring this trigger, refer to the tutorial Using an Amazon S3 trigger to invoke a Lambda function. Subsequently, the Lambda function invoked the SageMaker endpoint for inference using the Boto3 SageMaker Runtime client.
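
As a rough sketch of that invocation path, the following illustrative Lambda handler forwards the S3 location of a newly uploaded media file to an asynchronous SageMaker endpoint using the Boto3 SageMaker Runtime client; the endpoint name and environment variable are hypothetical placeholders, and the prototype’s actual code may differ.

import json
import os
import urllib.parse

import boto3

sagemaker_runtime = boto3.client("sagemaker-runtime")
ENDPOINT_NAME = os.environ.get("ENDPOINT_NAME", "whisperx-async-endpoint")  # hypothetical name

def lambda_handler(event, context):
    # Triggered by S3: read the bucket and key of the newly uploaded media file
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["object"]["key"])

    # Asynchronous endpoints take an S3 location rather than the payload itself
    response = sagemaker_runtime.invoke_endpoint_async(
        EndpointName=ENDPOINT_NAME,
        InputLocation=f"s3://{bucket}/{key}",
        ContentType="application/octet-stream",
    )
    return {"statusCode": 200, "body": json.dumps({"outputLocation": response["OutputLocation"]})}
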
The WhisperX model, based on OpenAI’s Whisper, performs transcriptions and diarization for media assets. It’s built upon the Faster Whisper reimplementation, offering up to four times faster transcription with improved word-level timestamp alignment compared to Whisper. Additionally, it introduces speaker diarization, not present in the original Whisper model. WhisperX utilizes the Whisper model for transcriptions, the Wav2Vec2 model to enhance timestamp alignment (ensuring synchronization of transcribed text with audio timestamps), and the pyannote model for diarization. FFmpeg is used for loading audio from source media, supporting various media formats. The transparent and modular model architecture allows flexibility, because each component of the model can be swapped out as needed in the future. However, it’s essential to note that WhisperX lacks full management features and isn’t an enterprise-level product. Without maintenance and support, it may not be suitable for production deployment.
In this collaboration, we deployed and evaluated WhisperX on SageMaker, using an asynchronous inference endpoint to host the model. SageMaker asynchronous endpoints support upload sizes up to 1 GB and incorporate auto scaling features that efficiently mitigate traffic spikes and save costs during off-peak times. Asynchronous endpoints are particularly well-suited for processing large files, such as movies and TV series in our use case.
The following diagram illustrates the core elements of the experiments we conducted in this collaboration.

In the following sections, we delve into the details of deploying the WhisperX model on SageMaker, and evaluate the diarization performance.
Download the model and its components
WhisperX is a system that includes multiple models for transcription, forced alignment, and diarization. For smooth SageMaker operation without the need to fetch model artifacts during inference, it’s essential to pre-download all model artifacts. These artifacts are then loaded into the SageMaker serving container during initiation. Because these models aren’t directly accessible, we offer descriptions and sample code from the WhisperX source, providing instructions on downloading the model and its components.
WhisperX uses six models:

A Faster Whisper model
A Voice Activity Detection (VAD) model
A Wav2Vec2 model
pyannote’s Speaker Diarization model
pyannote’s Segmentation model
SpeechBrain’s Speaker Embedding model

Most of these models can be obtained from Hugging Face using the huggingface_hub library. We use the following download_hf_model() function to retrieve these model artifacts. An access token from Hugging Face, generated after accepting the user agreements for the following pyannote models, is required:

Speaker Diarization
Segmentation
Voice Activity Detection

import huggingface_hub
import yaml
import torchaudio
import urllib.request
import os

CONTAINER_MODEL_DIR = "/opt/ml/model"
WHISPERX_MODEL = "guillaumekln/faster-whisper-large-v2"
VAD_MODEL_URL = "https://whisperx.s3.eu-west-2.amazonaws.com/model_weights/segmentation/0b5b3216d60a2d32fc086b47ea8c67589aaeb26b7e07fcbe620d6d0b83e209ea/pytorch_model.bin"
WAV2VEC2_MODEL = "WAV2VEC2_ASR_BASE_960H"
DIARIZATION_MODEL = "pyannote/speaker-diarization"

def download_hf_model(model_name: str, hf_token: str, local_model_dir: str) -> str:
    """
    Fetches the provided model from HuggingFace and returns the subdirectory it is downloaded to
    :param model_name: HuggingFace model name (and an optional version, appended with @[version])
    :param hf_token: HuggingFace access token authorized to access the requested model
    :param local_model_dir: The local directory to download the model to
    :return: The subdirectory within local_model_dir that the model is downloaded to
    """
    model_subdir = model_name.split('@')[0]
    huggingface_hub.snapshot_download(model_subdir, token=hf_token, local_dir=f"{local_model_dir}/{model_subdir}", local_dir_use_symlinks=False)
    return model_subdir

The VAD model is fetched from Amazon S3, and the Wav2Vec2 model is retrieved from the torchaudio.pipelines module. Based on the following code, we can retrieve all the models’ artifacts, including those from Hugging Face, and save them to the specified local model directory:
def fetch_models(hf_token: str, local_model_dir="./models"):
    """
    Fetches all required models to run WhisperX locally without downloading models every time
    :param hf_token: A huggingface access token to download the models
    :param local_model_dir: The directory to download the models to
    """
    # Fetch Faster Whisper's Large V2 model from HuggingFace
    download_hf_model(model_name=WHISPERX_MODEL, hf_token=hf_token, local_model_dir=local_model_dir)

    # Fetch WhisperX's VAD Segmentation model from S3
    vad_model_dir = "whisperx/vad"
    if not os.path.exists(f"{local_model_dir}/{vad_model_dir}"):
        os.makedirs(f"{local_model_dir}/{vad_model_dir}")

    urllib.request.urlretrieve(VAD_MODEL_URL, f"{local_model_dir}/{vad_model_dir}/pytorch_model.bin")

    # Fetch the Wav2Vec2 alignment model
    torchaudio.pipelines.__dict__[WAV2VEC2_MODEL].get_model(dl_kwargs={"model_dir": f"{local_model_dir}/wav2vec2/"})

    # Fetch pyannote's Speaker Diarization model from HuggingFace
    download_hf_model(model_name=DIARIZATION_MODEL,
                      hf_token=hf_token,
                      local_model_dir=local_model_dir)

    # Read in the Speaker Diarization model config to fetch models and update with their local paths
    with open(f"{local_model_dir}/{DIARIZATION_MODEL}/config.yaml", 'r') as file:
        diarization_config = yaml.safe_load(file)

    embedding_model = diarization_config['pipeline']['params']['embedding']
    embedding_model_dir = download_hf_model(model_name=embedding_model,
                                            hf_token=hf_token,
                                            local_model_dir=local_model_dir)
    diarization_config['pipeline']['params']['embedding'] = f"{CONTAINER_MODEL_DIR}/{embedding_model_dir}"

    segmentation_model = diarization_config['pipeline']['params']['segmentation']
    segmentation_model_dir = download_hf_model(model_name=segmentation_model,
                                               hf_token=hf_token,
                                               local_model_dir=local_model_dir)
    diarization_config['pipeline']['params']['segmentation'] = f"{CONTAINER_MODEL_DIR}/{segmentation_model_dir}/pytorch_model.bin"

    with open(f"{local_model_dir}/{DIARIZATION_MODEL}/config.yaml", 'w') as file:
        yaml.safe_dump(diarization_config, file)

    # Read in the Speaker Embedding model config to update it with its local path
    speechbrain_hyperparams_path = f"{local_model_dir}/{embedding_model_dir}/hyperparams.yaml"
    with open(speechbrain_hyperparams_path, 'r') as file:
        speechbrain_hyperparams = file.read()

    speechbrain_hyperparams = speechbrain_hyperparams.replace(embedding_model_dir, f"{CONTAINER_MODEL_DIR}/{embedding_model_dir}")

    with open(speechbrain_hyperparams_path, 'w') as file:
        file.write(speechbrain_hyperparams)

Select the appropriate AWS Deep Learning Container for serving the model
After the model artifacts are saved using the preceding sample code, you can choose pre-built AWS Deep Learning Containers (DLCs) from the following GitHub repo. When selecting the Docker image, consider the following settings: framework (Hugging Face), task (inference), Python version, and hardware (for example, GPU). We recommend using the following image:

763104351884.dkr.ecr.[REGION].amazonaws.com/huggingface-pytorch-inference:2.0.0-transformers4.28.1-gpu-py310-cu118-ubuntu20.04

This image has all the necessary system packages pre-installed, such as ffmpeg. Remember to replace [REGION] with the AWS Region you are using.
For other required Python packages, create a requirements.txt file with a list of packages and their versions. These packages will be installed when the AWS DLC is built. The following are the additional packages needed to host the WhisperX model on SageMaker:

faster-whisper==0.7.1
git+https://github.com/m-bain/whisperx.git@1b092de19a1878a8f138f665b1467ca21b076e7e
ffmpeg-python

Create an inference script to load the models and run inference
Next, we create a custom inference.py script to outline how the WhisperX model and its components are loaded into the container and how the inference process should be run. The script contains two functions: model_fn and transform_fn. The model_fn function is invoked to load the models from their respective locations. Subsequently, these models are passed to the transform_fn function during inference, where transcription, alignment, and diarization processes are performed. The following is a code sample for inference.py:
import io
import json
import logging
import tempfile
import time

import torch
import whisperx

DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

def model_fn(model_dir: str) -> dict:
    """
    Deserialize and return the models
    """
    logging.info("Loading WhisperX model")
    model = whisperx.load_model(whisper_arch=f"{model_dir}/guillaumekln/faster-whisper-large-v2",
                                device=DEVICE,
                                language="en",
                                compute_type="float16",
                                vad_options={'model_fp': f"{model_dir}/whisperx/vad/pytorch_model.bin"})

    logging.info("Loading alignment model")
    align_model, metadata = whisperx.load_align_model(language_code="en",
                                                      device=DEVICE,
                                                      model_name="WAV2VEC2_ASR_BASE_960H",
                                                      model_dir=f"{model_dir}/wav2vec2")

    logging.info("Loading diarization model")
    diarization_model = whisperx.DiarizationPipeline(model_name=f"{model_dir}/pyannote/speaker-diarization/config.yaml",
                                                     device=DEVICE)

    return {
        'model': model,
        'align_model': align_model,
        'metadata': metadata,
        'diarization_model': diarization_model
    }

def transform_fn(model: dict, request_body: bytes, request_content_type: str, response_content_type="application/json") -> (str, str):
    """
    Load in audio from the request, transcribe and diarize, and return JSON output
    """

    # Start a timer so that we can log how long inference takes
    start_time = time.time()

    # Unpack the models
    whisperx_model = model['model']
    align_model = model['align_model']
    metadata = model['metadata']
    diarization_model = model['diarization_model']

    # Load the media file (the request_body as bytes) into a temporary file, then use WhisperX to load the audio from it
    logging.info("Loading audio")
    with io.BytesIO(request_body) as file:
        tfile = tempfile.NamedTemporaryFile(delete=False)
        tfile.write(file.read())
        audio = whisperx.load_audio(tfile.name)

    # Run transcription
    logging.info("Transcribing audio")
    result = whisperx_model.transcribe(audio, batch_size=16)

    # Align the outputs for better timings
    logging.info("Aligning outputs")
    result = whisperx.align(result["segments"], align_model, metadata, audio, DEVICE, return_char_alignments=False)

    # Run diarization
    logging.info("Running diarization")
    diarize_segments = diarization_model(audio)
    result = whisperx.assign_word_speakers(diarize_segments, result)

    # Calculate the time it took to perform the transcription and diarization
    end_time = time.time()
    elapsed_time = end_time - start_time
    logging.info(f"Transcription and Diarization took {int(elapsed_time)} seconds")

    # Return the results to be stored in S3
    return json.dumps(result), response_content_type

Within the model’s directory, alongside the requirements.txt file, ensure the presence of inference.py in a code subdirectory. The models directory should resemble the following:

models
├── code
│ ├── inference.py
│ └── requirements.txt
├── guillaumekln
│ └── faster-whisper-large-v2
├── pyannote
│ ├── segmentation
│ │ └── …
│ └── speaker-diarization
│ └── …
├── speechbrain
│ └── spkrec-ecapa-voxceleb
│ └── …
├── wav2vec2
│ └── …
└── whisperx
└── vad
└── …

Create a tarball of the models
After you create the models and code directories, you can use the following command lines to compress the model into a tarball (.tar.gz file) and upload it to Amazon S3. At the time of writing, using the faster-whisper Large V2 model, the resulting tarball representing the SageMaker model is 3 GB in size. For more information, refer to Model hosting patterns in Amazon SageMaker, Part 2: Getting started with deploying real time models on SageMaker.

# Save the model artifacts to the 'model' directory and create a tarball
tar cvzf model.tar.gz -C model/ .
# Upload the model to S3
aws s3 cp model.tar.gz s3://<target_bucket>

Create a SageMaker model and deploy an endpoint with an asynchronous predictor
Now you can create the SageMaker model, endpoint config, and asynchronous endpoint with AsyncPredictor using the model tarball created in the previous step. For instructions, refer to Create an Asynchronous Inference Endpoint.
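
As an illustrative sketch of that step with the SageMaker Python SDK (not the exact configuration used in this prototype), the snippet below registers the model tarball and deploys it behind an asynchronous endpoint; the bucket, Region, and instance type are placeholder assumptions.

import sagemaker
from sagemaker.async_inference import AsyncInferenceConfig
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

# Placeholder values: substitute your own bucket, Region, and image URI
model = HuggingFaceModel(
    model_data="s3://<target_bucket>/model.tar.gz",
    image_uri="763104351884.dkr.ecr.<REGION>.amazonaws.com/huggingface-pytorch-inference:2.0.0-transformers4.28.1-gpu-py310-cu118-ubuntu20.04",
    role=role,
)

# With an async_inference_config, deploy() provisions an asynchronous endpoint
# and returns a predictor that exposes predict_async()
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # hypothetical GPU instance choice
    async_inference_config=AsyncInferenceConfig(output_path="s3://<target_bucket>/whisperx-output"),
)

# Hypothetical invocation with a media file already uploaded to S3
response = predictor.predict_async(input_path="s3://<target_bucket>/media/episode1.mp4")
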
Evaluate diarization performance
To assess the diarization performance of the WhisperX model in various scenarios, we selected three episodes each from two English titles: one drama title consisting of 30-minute episodes, and one documentary title consisting of 45-minute episodes. We utilized pyannote’s metrics toolkit, pyannote.metrics, to calculate the diarization error rate (DER). In the evaluation, manually transcribed and diarized transcripts provided by ZOO served as the ground truth.
We defined the DER as follows:

DER = (False Alarm + Miss + Error) / Total

Total is the length of the ground truth video. FA (False Alarm) is the length of segments that are considered speech in the prediction but not in the ground truth. Miss is the length of segments that are considered speech in the ground truth but not in the prediction. Error, also called Confusion, is the length of segments that are assigned to different speakers in the prediction and the ground truth. All units are measured in seconds. Typical DER values vary depending on the specific application, the dataset, and the quality of the diarization system. Note that DER can be larger than 1.0. A lower DER is better.
To calculate the DER for a piece of media, a ground truth diarization is required, as well as the WhisperX transcribed and diarized outputs. Both must be parsed into lists of tuples containing a speaker label, a speech segment start time, and a speech segment end time for each segment of speech in the media. The speaker labels don’t need to match between the WhisperX and ground truth diarizations, because the metric is based primarily on the timing of the segments. pyannote.metrics takes these tuples of ground truth diarizations and output diarizations (referred to in the pyannote.metrics documentation as reference and hypothesis) to calculate the DER. The following table summarizes our results.
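
The following is a minimal sketch of that calculation with pyannote.metrics, assuming the parsed tuples are already available; the segments and speaker labels are invented purely for illustration.

from pyannote.core import Annotation, Segment
from pyannote.metrics.diarization import DiarizationErrorRate

def to_annotation(segments):
    """Convert (speaker, start_seconds, end_seconds) tuples into a pyannote Annotation."""
    annotation = Annotation()
    for speaker, start, end in segments:
        annotation[Segment(start, end)] = speaker
    return annotation

# Invented ground truth (reference) and WhisperX output (hypothesis) segments
reference = to_annotation([("ALICE", 0.0, 10.0), ("BOB", 10.0, 20.0)])
hypothesis = to_annotation([("SPEAKER_00", 0.0, 9.0), ("SPEAKER_01", 9.0, 20.0)])

metric = DiarizationErrorRate()
der = metric(reference, hypothesis)
print(f"DER: {der:.3f}")  # labels need not match; only segment timings and groupings matter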

Video Type   | DER   | Correct | Miss   | Error  | False Alarm
Drama        | 0.738 | 44.80%  | 21.80% | 33.30% | 18.70%
Documentary  | 1.29  | 94.50%  | 5.30%  | 0.20%  | 123.40%
Average      | 0.901 | 71.40%  | 13.50% | 15.10% | 61.50%

These results reveal a significant performance difference between the drama and documentary titles, with the model achieving notably better results (using DER as an aggregate metric) for the drama episodes compared to the documentary title. A closer analysis of the titles provides insights into potential factors contributing to this performance gap. One key factor could be the frequent presence of background music overlapping with speech in the documentary title. Although preprocessing media to enhance diarization accuracy, such as removing background noise to isolate speech, was beyond the scope of this prototype, it opens avenues for future work that could potentially enhance the performance of WhisperX.
Conclusion
In this post, we explored the collaborative partnership between AWS and ZOO Digital, employing machine learning techniques with SageMaker and the WhisperX model to enhance the diarization workflow. The AWS team played a pivotal role in assisting ZOO in prototyping, evaluating, and understanding the effective deployment of custom ML models, specifically designed for diarization. This included incorporating auto scaling for scalability using SageMaker.
Harnessing AI for diarization will lead to substantial savings in both cost and time when generating localized content for ZOO. By aiding transcribers in swiftly and precisely creating transcripts and identifying speakers, this technology addresses the traditionally time-consuming and error-prone nature of the task. The conventional process often involves multiple passes through the video and additional quality control steps to minimize errors. The adoption of AI for diarization enables a more targeted and efficient approach, thereby increasing productivity within a shorter timeframe.
We’ve outlined key steps to deploy the WhisperX model on the SageMaker asynchronous endpoint, and encourage you to try it yourself using the provided code. For further insights into ZOO Digital’s services and technology, visit ZOO Digital’s official site. For details on deploying the OpenAI Whisper model on SageMaker and various inference options, refer to Host the Whisper Model on Amazon SageMaker: exploring inference options. Feel free to share your thoughts in the comments.

About the Authors
Ying Hou, PhD, is a Machine Learning Prototyping Architect at AWS. Her primary areas of interest encompass Deep Learning, with a focus on GenAI, Computer Vision, NLP, and time series data prediction. In her spare time, she relishes spending quality moments with her family, immersing herself in novels, and hiking in the national parks of the UK.
Ethan Cumberland is an AI Research Engineer at ZOO Digital, where he works on using AI and Machine Learning as assistive technologies to improve workflows in speech, language, and localisation. He has a background in software engineering and research in the security and policing domain, focusing on extracting structured information from the web and leveraging open-source ML models for analysing and enriching collected data.
Gaurav Kaila leads the AWS Prototyping team for UK & Ireland. His team works with customers across diverse industries to ideate & co-develop business critical workloads with a mandate to accelerate adoption of AWS services.

The Evolution of Email Deliverability: From Basics to AI-Driven Insights

Did you know you can get a free email deliverability audit from Customers.ai? Start your free trial and get an automatic deliverability score!

Achieving that coveted spot in the inbox requires more than just sending messages; it requires navigating spam filters and privacy laws, all while writing copy that drives opens and keeps engagement high. No easy feat.

Email deliverability has long been a challenge and unfortunately, without the help of technology, it’s not going to get any easier. 

That’s where AI comes in. 

AI isn’t just helping to improve email deliverability. It’s turning the concept on its head, offering new strategies to help marketers deal with new changes and reach the inbox. 

Let’s explore the transformation from basic deliverability tactics to cutting-edge, AI-enhanced strategies that are setting new benchmarks in email marketing success.

Looking Back at Email Deliverability

In the early days of email marketing, deliverability hinged on one thing – avoiding the spam folder. 

There weren’t promotions tabs or super-sophisticated filters. You didn’t have to worry about having too many links in your message or using characters incorrectly and ending up in the spam folder.

The biggest challenges were crafting non-spammy subject lines and managing bounce rates, while the strategies were straightforward: focus on list hygiene and mass distribution without much nuance.

Ah, simpler times. 

But as the saying goes, all good things must come to an end.

As email became ubiquitous, the landscape began to grow more complicated. 

The introduction of Sender Policy Framework (SPF), DomainKeys Identified Mail (DKIM), and Domain-based Message Authentication, Reporting, and Conformance (DMARC) marked significant milestones and new challenges for email marketers.

These authentication protocols, designed to verify the sender’s identity and combat phishing, also made deliverability much more difficult. 
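
For readers who want to see what these protocols look like in practice, here is a small illustrative Python sketch (assuming the dnspython package is installed) that looks up a domain’s published SPF and DMARC TXT records; the domain is a placeholder, and DKIM is omitted because its record name depends on a provider-specific selector.

import dns.resolver  # pip install dnspython

def get_txt_records(name: str) -> list[str]:
    """Return all TXT record strings published for a DNS name."""
    try:
        answers = dns.resolver.resolve(name, "TXT")
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return []
    return [b"".join(rdata.strings).decode() for rdata in answers]

domain = "example.com"  # placeholder domain
spf = [r for r in get_txt_records(domain) if r.startswith("v=spf1")]
dmarc = [r for r in get_txt_records(f"_dmarc.{domain}") if r.startswith("v=DMARC1")]
print("SPF:", spf or "none published")
print("DMARC:", dmarc or "none published")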

At the same time as new protocols were being put in place, ISPs were evolving and adopting more sophisticated algorithms to filter out spam. 

All of this together led to one thing – the need for an email deliverability strategy.

These changes really forced email marketers to rethink how they got messages to the inbox and placed a greater emphasis on sender reputation and engagement metrics. 

What began as a simple battle to reach the inbox has become a complex endeavor to ensure emails are welcomed by recipients and trusted by email providers.

Contemporary Challenges in Email Deliverability

With the glory days of email marketing behind us, today’s email marketers have to navigate a whole new set of challenges to achieve deliverability. 

The sophistication of spam filters has reached unprecedented levels, employing AI and machine learning to scrutinize every aspect of an email, from technology to content to sender behavior. 

These filters aren’t fooled by basic tactics. They demand genuine engagement and relevant content. 

They also demand a clean sender reputation and adherence to privacy guidelines. 

Sender Reputation

Sender reputation is a huge part of deliverability.

ISPs and email services meticulously score senders on various metrics, including open rates, click-through rates, and spam complaints. 

This scrutiny means that maintaining a clean, engaged email list is more crucial than ever. 

A drop in sender reputation can lead to emails being sent to the spam folder, or worse, blocked entirely.

Privacy Regulations

GDPR and CCPA have really reshaped the email marketing landscape by mandating compliance.

These regulations don’t just suggest consent for data collection, they require it. In fact, violations risk hefty fines.

With these new challenges, email marketers must employ more nuanced and sophisticated strategies. 

It’s a balancing act – ensuring compliance with privacy laws, maintaining a positive sender reputation, navigating the intricate algorithms of spam filters, and requiring innovation and adaptability.

The Role of AI in Enhancing Email Deliverability

AI has ushered in a new era for email marketing.

From predictive analytics to personalization, AI is helping email marketers navigate all of the crazy changes taking place.

Predictive Analytics

I think we can all agree that predictive analytics has revolutionized email marketing, particularly when it comes to send time and frequency. 

Instead of having to manually analyze data and test assumptions, AI can take your data and predict the optimal moment for email engagement. The results? Higher open rates and improved deliverability.

Personalization

Personalization is another area where AI is making its mark. 

Through NLP, AI email writers can help craft email content that resonates with readers while steering clear of spam filters. 

They can analyze the effectiveness of subject lines, call-to-actions, and overall content, suggesting improvements that enhance engagement rates. 

Sender Reputation Management

The ability to predict and adapt content for better engagement and compliance is invaluable not just for getting your messages delivered but also for maintaining a healthy sender reputation.

Machine learning models can forecast potential deliverability issues before they even happen. 

By analyzing email interactions, AI can identify patterns that may lead to blacklisting or spam complaints, allowing marketers to adjust strategies proactively.

Way better than being blacklisted, right?

AI is already making waves when it comes to email deliverability and it’s only going to get better. 

Marketers leveraging AI-driven platforms report significant improvements in open rates, reduced spam complaints, and enhanced overall campaign performance. 

In fact, during a 2023 survey carried out among email marketers from the United States, the United Kingdom, and other European countries, it was found that ~51% of respondents believed that AI-supported email marketing was more effective than traditional email marketing approaches.

These advancements are not just about overcoming challenges; they’re about setting new benchmarks in email marketing effectiveness.

AI for Improved Email Deliverability 

The evolution of email deliverability, from its simplest forms to the sophisticated, AI-driven landscape of today, shows us that email marketing is in a constant state of flux. 

The challenges that once seemed impossible have given way to innovative solutions and opportunities for growth and engagement.

The role of AI in this evolution cannot be overstated. 

It is now a pivotal tool, giving marketers the ability to navigate the many, many complexities of modern email marketing with unprecedented precision. 

AI-driven insights and technologies are not just enhancing deliverability, they are reshaping the very foundations of email marketing strategies, offering a glimpse into a future where personalization, engagement, and compliance are seamlessly integrated.

For strategic email marketers, the message is clear – embracing AI-driven insights and technologies is no longer an option but a necessity for staying ahead. 

The future of email deliverability lies in the ability to adapt, innovate, and harness the full potential of AI to create meaningful, engaging, and successful email campaigns.

Important Next Steps

See what targeted outbound marketing is all about. Capture and engage your first 500 website visitor leads with Customers.ai X-Ray website visitor identification for free.

Talk and learn about sales outreach automation with other growth enthusiasts. Join Customers.ai Island, our Facebook group of 40K marketers and entrepreneurs who are ready to support you.

Advance your marketing performance with Sales Outreach School, a free tutorial and training area for sales pros and marketers.

Checkmate with Scale: Google DeepMind’s Revolutionary Leap in Chess AI

The intersection of artificial intelligence and the ancient game of chess has long captivated researchers, offering a fertile ground for testing the limits of computational strategy and intelligence. The journey from IBM’s Deep Blue, which in 1997 famously defeated the reigning world champion, to today’s highly sophisticated engines like Stockfish and AlphaZero underscores a continuous quest to refine and redefine machine intellect. These advancements have primarily been anchored in explicit search algorithms and intricate heuristics tailored to dissect and dominate the chessboard.

In an era where AI’s prowess is increasingly measured by its capacity to learn and adapt, a groundbreaking study shifts the narrative by harnessing the power of large-scale data and advanced neural architectures. This research by Google DeepMind revolves around a bold experiment: training a transformer model equipped with 270 million parameters, purely through supervised learning techniques, on an extensive dataset comprised of 10 million chess games. This model stands apart by not leaning on the conventional crutches of domain-specific adaptations or the explicit navigation of the decision tree that chess inherently represents.

Rather than concocting a labyrinth of search paths and handcrafted heuristics, the model learns to predict the most advantageous moves directly from the positions on the chessboard. This methodological pivot is not just a departure from tradition but a testament to the transformative potential of large-scale attention-based learning. By annotating each game state with action values derived from the formidable Stockfish 16 engine, the research taps into a deep well of strategic insight, distilling this knowledge into a neural network capable of grandmaster-level decision-making.

The performance metrics of this transformer model are nothing short of revolutionary. Achieving a Lichess blitz Elo rating of 2895 not only sets a new benchmark in human-computer chess confrontations but also demonstrates a remarkable proficiency in solving intricate chess puzzles that have historically been the domain of the most advanced search-based engines. A comparative analysis with existing field giants further underscores this performance leap. The model not only outperforms the policy and value networks of AlphaZero, the program that had itself redefined AI’s approach to chess through self-play and deep learning, but also eclipses the capabilities of GPT-3.5-turbo-instruct in understanding and executing chess strategy.

This paradigm-shifting success story is underpinned by meticulously examining the factors contributing to AI excellence in chess. The study delineates a direct correlation between the scale of the training data and the model’s effectiveness, revealing that the depth of strategic understanding and the ability to generalize across unseen board configurations only emerge at a certain magnitude of dataset and model complexity. This insight reinforces the significance of scale in AI’s conquest of intellectual domains and illustrates the nuanced balance between data diversity and computational heuristics.

In conclusion, this research not only redefines the boundaries of AI in chess but also illuminates a path forward for artificial intelligence. The key takeaways include:

The feasibility of achieving grandmaster-level chess play without explicit search algorithms, relying solely on the predictive power of transformer models trained on large-scale datasets.

This demonstrates that the traditional reliance on complex heuristics and domain-specific adjustments can be bypassed, paving the way for more generalized and scalable approaches to AI problem-solving.

The critical role of dataset and model size in unlocking the full potential of AI suggests a broader applicability of these findings beyond the chessboard.

These revelations propel further exploration into the capabilities of neural networks, suggesting that the future of AI may well lie in its ability to distill complex patterns and strategies from vast oceans of data across diverse domains without the need for explicitly programmed guidance.

Check out the Paper. All credit for this research goes to the researchers of this project.

Huawei Researchers Tries to Rewrite the Rules with PanGu-π Pro: The Dawn of Ultra-Efficient, Tiny Language Models Is Here!

A groundbreaking study conducted by researchers from Huawei Noah’s Ark Lab, in collaboration with Peking University and Huawei Consumer Business Group, presents a transformative approach to developing tiny language models (TLMs) suitable for mobile devices. Despite their reduced size, these compact models aim to deliver performance on par with their larger counterparts, addressing the crucial need for efficient AI applications in resource-constrained environments.

The research team tackled the pressing challenge of optimizing language models for mobile deployment. Traditional large language models, while powerful, are impractical for mobile use due to their substantial computational and memory requirements. This study introduces an innovative tiny language model, PanGu-π Pro, which leverages a meticulously designed architecture and advanced training methodologies to achieve remarkable efficiency and effectiveness.

At the core of their methodology is a strategic optimization of the model’s components. The team embarked on a series of empirical studies to dissect the impact of various elements on the model’s performance. A notable innovation is the compression of the tokenizer, significantly reducing the model’s size without compromising its ability to understand and generate language. Furthermore, architectural adjustments were made to streamline the model, including parameter inheritance from larger models and a multi-round training strategy that enhances learning efficiency.

The introduction of PanGu-π Pro in 1B and 1.5B parameter versions marks a significant leap forward. Following the newly established optimization protocols, the models were trained on a 1.6T multilingual corpus. The results were astounding; PanGu-π-1B Pro demonstrated an average improvement of 8.87 on benchmark evaluation sets. More impressively, PanGu-π-1.5B Pro surpassed several state-of-the-art models with larger sizes, establishing new benchmarks for performance in compact language models.

The implications of this research extend far beyond the realm of mobile devices. By achieving such a delicate balance between size and performance, the Huawei team has opened new avenues for deploying AI technologies in various scenarios where computational resources are limited. Their work not only paves the way for more accessible AI applications but also sets a precedent for future research in optimizing language models.

This study’s findings are a testament to the possibilities inherent in AI, showcasing how innovative approaches can overcome the limitations of current technologies. The Huawei team’s contributions are poised to revolutionize how we think about and interact with AI, making it more ubiquitous and integrated into our daily lives. As we progress, the principles and methodologies developed in this research will undoubtedly influence the evolution of AI technologies, making them more adaptable, efficient, and accessible to all.

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.