Unleashing the power of generative AI: Verisk’s journey to an Instant Insight Engine

This post is co-written with Tom Famularo, Abhay Shah and Nicolette Kontor from Verisk.
Verisk (Nasdaq: VRSK) is a leading data analytics and technology partner for the global insurance industry. Through advanced analytics, software, research, and industry expertise across over 20 countries, Verisk helps build resilience for individuals, communities, and businesses. The company is committed to ethical and responsible AI development, with human oversight and transparency. Verisk is using generative artificial intelligence (AI) to enhance operational efficiencies and profitability for insurance clients while adhering to its ethical AI principles.
Verisk’s FAST platform is a leader in the life insurance and retirement sector, providing enhanced efficiency and flexible, easily upgradable architecture. FAST has earned a fourth consecutive leader ranking in the 2024 ISG Provider Lens report for its seamless integration with Verisk’s data, analytics, and claims tools. The software as a service (SaaS) platform offers out-of-the-box solutions for life, annuity, employee benefits, and institutional annuity providers. With preconfigured components and platform configurability, FAST enables carriers to reduce product time-to-market by 75% and launch new offerings in as little as 2 months.
Conversational AI assistants are rapidly transforming customer and employee support. Verisk has embraced this technology and developed its own Instant Insight Engine, or AI companion, that provides an enhanced self-service capability for the FAST platform. In this post, we describe how the customer support process in FAST incorporates generative AI, along with the data, the architecture, and the evaluation of the results.
The Opportunity
Verisk FAST’s initial foray into using AI was driven by the immense breadth and complexity of the platform. With hundreds of thousands of hours spent on customer support every year, it became abundantly clear the team needed help to scale their efforts and meet their objectives. Verisk’s talented teams were overloaded handling common inquiries, leaving less time for the kind of innovation that would allow them to maintain their pole position as an insurance technology provider.
Verisk FAST’s AI companion aims to alleviate this burden by not only providing 24/7 support for business processing and configuration questions related to FAST, but also tapping into the immense knowledge base to provide an in-depth, tailored response. It is designed to be deeply integrated into the FAST platform and use all of Verisk’s documentation, training materials, and collective expertise. It relies on a Retrieval Augmented Generation (RAG) approach and a mix of AWS services and proprietary configuration to instantly answer most user questions about the Verisk FAST platform’s extensive capabilities.
When the AI companion is rolled out at scale, it will allow Verisk’s staff to focus more time on complex problems, critical initiatives, and innovation while delivering a better customer experience. As part of the build-out, Verisk came across several considerations, key findings, and decisions worth sharing for any enterprise looking to take the first step in tapping into generative AI’s potential.
The Approach
When building an interactive agent with large language models (LLMs), there are often two techniques that can be used: RAG and fine-tuning. The choice between these approaches depends on the use case and available dataset. Verisk FAST started building a RAG pipeline for their AI companion and have iteratively enhanced this solution. The following are some of the reasons why continuing with a RAG architecture made sense to Verisk:

Access to Dynamic Data – The FAST platform is a constantly evolving platform adding both business functionality and technical capabilities. Verisk needed to make sure their responses were always based on the most up-to-date information. The RAG approach allows for accessing frequently updated data, enabling responses using the most recent information without frequent retraining of the model.
Multiple Data Sources – In addition to recency of data, another important aspect was the ability to tap into multiple different data sources to retrieve the right context. These data sources may be both internal and external to provide a more holistic response. The ease of expanding the knowledge domain without the need to fine-tune with new data sources makes the solution extensible.
Reduce Hallucination – Retrieval reduces the risk of hallucination compared to free-form text generation because responses derive directly from the provided excerpts.
LLM Linguistics – Although appropriate context can be retrieved from enterprise data sources, the underlying LLM handles linguistics and fluency.
Transparency – Verisk wants to continuously improve the AI companion’s ability to generate responses. A RAG architecture gave them the transparency needed into the context retrieval process, information that would ultimately be used for generating user responses. Having that transparency helped Verisk identify areas of the system where their documents were lacking and needed some restructuring.
Data governance – With a wide variety of users accessing the platform and with different users having access to different data, data governance and isolation was paramount. Verisk injected controls into the RAG pipeline that restricted access to data based on user access controls, making sure responses were highly tuned to the user.

Although both RAG and fine-tuning have trade-offs, RAG was the optimal approach for building an AI companion on the FAST platform given their requirements for real-time accuracy, explainability, and configurability. The pipeline architecture allows for iterative enhancement as Verisk FAST’s use cases evolve.
Solution Overview
The following diagram presents a high-level architectural data flow highlighting several of the AWS services used in building the solution. Verisk’s solution represents a compound AI system, involving multiple interacting components and making numerous calls to the LLM to furnish responses to the user. Using the FAST platform for orchestrating these diverse components proved to be an intuitive choice, circumventing certain challenges encountered with alternative frameworks such as LangChain.

The key components are as follows:

Amazon Comprehend
Amazon Kendra
Amazon Bedrock
Amazon Rekognition
Amazon Transcribe
A prompt template warehouse

Amazon Comprehend
To bolster security, Verisk aimed to block the submission of personally identifiable information (PII) within user questions. Although PII isn’t typically necessary for interactions with the AI companion, Verisk employed Amazon Comprehend to detect any potential PII within queries.
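The post doesn’t share Verisk’s implementation, but a minimal sketch of this PII screening step with the Amazon Comprehend DetectPiiEntities API might look like the following; the confidence threshold and language code are illustrative assumptions:

```python
import boto3

comprehend = boto3.client("comprehend")

def contains_pii(question: str, threshold: float = 0.5) -> bool:
    """Return True if Amazon Comprehend detects likely PII in the user question.

    The threshold and language code are illustrative choices, not production settings.
    """
    response = comprehend.detect_pii_entities(Text=question, LanguageCode="en")
    return any(entity["Score"] >= threshold for entity in response["Entities"])

if contains_pii("My SSN is 123-45-6789, can you update my record?"):
    print("Please remove personal information before submitting your question.")
```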
Amazon Kendra
In designing an effective RAG solution, one of the most critical steps is the context retrieval from enterprise documentation. Although many options exist to store embeddings, Verisk FAST opted to use Amazon Kendra due to its powerful out-of-the-box semantic search capabilities. As a fully managed service, Verisk took advantage of its deep-learning search models without additional provisioning. Verisk compared using Amazon OpenSearch Serverless with several embedding approaches and Amazon Kendra, and saw better retrieval results with Amazon Kendra. As you’ll see further in the post, Verisk incorporated the Retrieve API and the Query API to retrieve semantically relevant passages for their queries to further improve generation by the LLM.
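As a rough illustration of this two-pass retrieval (also discussed in the multi-step query section later in this post), the following sketch combines the Amazon Kendra Retrieve and Query APIs; the index ID and the choice of top three results are placeholders, not Verisk’s configuration:

```python
import boto3

kendra = boto3.client("kendra")

# Hypothetical index ID -- replace with your own Amazon Kendra index.
INDEX_ID = "00000000-0000-0000-0000-000000000000"

def get_context(question: str, top_k: int = 3) -> list[str]:
    """Pull semantically relevant passages with the Retrieve API, then make a
    second pass with the Query API to pick up shorter documents."""
    passages = []

    # First pass: longer semantically relevant passages.
    retrieved = kendra.retrieve(IndexId=INDEX_ID, QueryText=question)
    passages += [item["Content"] for item in retrieved["ResultItems"][:top_k]]

    # Second pass: shorter excerpts that the Retrieve call may have missed.
    queried = kendra.query(IndexId=INDEX_ID, QueryText=question)
    passages += [
        item["DocumentExcerpt"]["Text"]
        for item in queried["ResultItems"][:top_k]
        if item.get("DocumentExcerpt")
    ]
    return passages
```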
Amazon Bedrock
Anthropic Claude, available in Amazon Bedrock, played various roles within Verisk’s solution:

Response Generation – When building their AI companion, Verisk thoroughly evaluated the LLM options from leading providers, using their dataset to test each model’s comprehension and response quality. After this extensive testing, Verisk found Anthropic’s Claude model consistently outperformed across key criteria. Claude demonstrated superior language understanding in Verisk’s complex business domain, allowing more pertinent responses to user questions. It also did exceedingly well at SQL generation, better than any other model they tested. Given Claude’s standout results across Verisk FAST’s use cases, it was the clear choice to power their AI companion’s natural language capabilities.
Preprocessing of Images and Videos – The outputs from Amazon Rekognition and Amazon Transcribe were fed into Claude. Claude demonstrated remarkable capabilities in generating natural language descriptions, which could be effectively used for indexing purposes with Amazon Kendra. Additionally, Claude excelled at summarizing video transcriptions into concise segments corresponding to specific time intervals, enabling the display of videos at precise points. This combination of AWS services and Claude’s language processing capabilities facilitated a more intuitive and user-friendly experience for media exploration and navigation.
Relevance Ranking – Although Amazon Kendra returned confidence scores on search results, Verisk needed to further tune the search results for Query API calls for a few scenarios. Verisk was able to use Claude to rank the relevance of search results from Amazon Kendra, further improving the results returned to the user.
Tool Identification – Verisk used Claude to determine the most suitable techniques, whether API calls or SQL queries, for retrieving data from the operational database based on user requests. Furthermore, Claude generated SQL queries tailored to the provided schemas, enabling efficient data retrieval.
Conversation Summarization – When a user asks a follow-up question, the AI companion can continue the conversational thread. To enable this, Verisk used Claude to summarize the dialogue and update the context retrieved from Amazon Kendra. The full conversation summary and new excerpts are input to the LLM to generate the next response (a minimal invocation sketch follows this list). This conversational flow allows the AI companion to answer follow-up questions and hold a more natural, contextual dialogue, bringing Verisk FAST closer to having a true AI assistant that can engage in useful back-and-forth conversations with users.
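As referenced above, the following is a minimal sketch of how a conversation summary and retrieved excerpts could be assembled into a prompt and sent to Claude on Amazon Bedrock; the prompt wording, inference parameters, and model ID are illustrative assumptions, not Verisk’s production template:

```python
import json

import boto3

bedrock = boto3.client("bedrock-runtime")

def generate_answer(conversation_summary: str, excerpts: list[str], question: str) -> str:
    """Build a grounded prompt and invoke Claude on Amazon Bedrock."""
    prompt = (
        "You are a support assistant for the FAST platform.\n"
        f"Conversation so far (summarized): {conversation_summary}\n"
        "Relevant documentation excerpts:\n- " + "\n- ".join(excerpts) + "\n"
        f"Question: {question}\n"
        "Answer using only the excerpts above."
    )
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "temperature": 0.5,  # lower temperature reduces randomness, as discussed later in this post
        "messages": [{"role": "user", "content": prompt}],
    }
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        body=json.dumps(body),
    )
    return json.loads(response["body"].read())["content"][0]["text"]
```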

Amazon Rekognition
Amazon Rekognition was used primarily to process images containing text and process flow diagrams; its pre-trained features facilitated information extraction. The extracted data was then passed to Claude for transformation into a more natural language format suitable for indexing within Amazon Kendra.
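A minimal sketch of this extraction step using the Amazon Rekognition DetectText API follows; the S3 bucket and object names are hypothetical:

```python
import boto3

rekognition = boto3.client("rekognition")

# Hypothetical bucket and key -- replace with the image to process.
response = rekognition.detect_text(
    Image={"S3Object": {"Bucket": "fast-docs-media", "Name": "process-flow.png"}}
)

# Keep only full lines of detected text; these can then be rephrased by Claude
# into natural language before indexing into Amazon Kendra.
lines = [d["DetectedText"] for d in response["TextDetections"] if d["Type"] == "LINE"]
print("\n".join(lines))
```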
Amazon Transcribe
Similar to Amazon Rekognition, Amazon Transcribe was employed to preprocess videos and generate transcripts, with a notable feature being the masking of sensitive information. The verbose transcripts, along with timestamps, were condensed using Claude before being indexed into Amazon Kendra.
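A minimal sketch of such a transcription job with PII redaction enabled is shown below; the job name, media location, and output bucket are hypothetical:

```python
import boto3

transcribe = boto3.client("transcribe")

# Hypothetical job name, media file, and output bucket -- replace with your own.
transcribe.start_transcription_job(
    TranscriptionJobName="fast-training-video-001",
    Media={"MediaFileUri": "s3://fast-docs-media/training/intro.mp4"},
    MediaFormat="mp4",
    LanguageCode="en-US",
    OutputBucketName="fast-docs-transcripts",
    # Mask sensitive information in the transcript output.
    ContentRedaction={"RedactionType": "PII", "RedactionOutput": "redacted"},
)
```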
Prompt Template Warehouse
Central to the solution was the dynamic selection of templates to create prompts based on question classification. Substantial effort was invested in developing and continuously improving these prompt templates.
Throughout Verisk’s journey, they worked closely with the AWS Solutioning team to brainstorm concrete suggestions to enhance the overall solution.
Data Harvesting
Before Verisk started building anything in the platform, they spent weeks amassing information, initially in the form of questions and answers. Verisk FAST’s initial dataset comprised 10,000 questions and their corresponding answers, meticulously collected and vetted to confirm accuracy and relevance. However, they understood that this was not a one-and-done effort. Verisk needed to continually expand its knowledge base by identifying new data sources across the business.
Driven by this, Verisk diligently added 15,000 more questions, making sure they covered less frequently encountered scenarios. Verisk also added user guides, technical documentation, and other text-based information. This data spanned several categories, from business processing to configuration to their delivery approach. This enriched the AI companion’s knowledge and understanding of diverse user queries, enabling it to provide more accurate and insightful responses.
The Verisk FAST team also recognized the necessity of exploring additional modalities. Videos and images, particularly those illustrating process flows and information sharing videos, proved to be invaluable sources of data. During the initial rollout phase, it became evident that certain inquiries demanded real-time data retrieval from their operational data store. Through some slick prompt engineering and using Claude’s latest capabilities to invoke APIs, Verisk seamlessly accessed their database to procure real-time information.
Structuring and Retrieving the Data
An essential element in developing the AI companion’s knowledge base was properly structuring and effectively querying the data to deliver accurate answers. Verisk explored various techniques to optimize both the organization of the content and the methods to extract the most relevant information:

Chunking – One key step in preparing the accumulated questions and answers was splitting the data into individual documents to facilitate indexing into Amazon Kendra. Rather than uploading a single large file containing all 10,000 question-answer pairs, Verisk chunked the data into 10,000 separate text documents, with each document containing one question-answer pair. By splitting the data into small, modular documents focused on a single question-answer pair, Verisk could more easily index each document and had greater success in pulling back the correct context. Chunking the data also enabled straightforward updating and reindexing of the knowledge base over time. Verisk applied the same technique to other data sources as well (a minimal indexing sketch follows this list).
Selecting the Right Number of Results – Verisk tested configuring Amazon Kendra to return different numbers of results for each question query. Returning too few results ran the risk of not capturing the best answer, whereas too many results made it more difficult to identify the right response. Verisk found returning the top three matching results from Amazon Kendra optimized both accuracy and performance.
Multi-step Query – To further improve accuracy, Verisk implemented a multi-step query process. First, they used the Amazon Kendra Retrieve API to get multiple relevant passages and excerpts based on keyword search. Next, they took a second pass at getting excerpts through the Query API, to find any additional shorter documents that might have been missed. Combining these two query types enabled Verisk to reliably identify the correct documentation and excerpts to generate a response.
Relevance Parameters – Verisk also tuned relevance parameters in Amazon Kendra to weigh their most up-to-date documentation higher than others. This improved results over just generic text search.
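As referenced in the chunking item above, one way to index each question-answer pair as its own document is the Amazon Kendra BatchPutDocument API. The following sketch assumes a hypothetical index ID and Q&A structure; Verisk’s actual ingestion pipeline isn’t described in this post:

```python
import boto3

kendra = boto3.client("kendra")

# Hypothetical index ID and Q&A data -- replace with your own.
INDEX_ID = "00000000-0000-0000-0000-000000000000"
qa_pairs = [
    {"id": "qa-0001", "question": "How do I configure a new rider?", "answer": "..."},
    # ... one entry per question-answer pair
]

# Index each question-answer pair as its own small document. BatchPutDocument
# accepts up to 10 documents per call, so submit the pairs in batches.
for i in range(0, len(qa_pairs), 10):
    batch = qa_pairs[i : i + 10]
    kendra.batch_put_document(
        IndexId=INDEX_ID,
        Documents=[
            {
                "Id": qa["id"],
                "Title": qa["question"],
                "Blob": f"Q: {qa['question']}\nA: {qa['answer']}".encode("utf-8"),
                "ContentType": "PLAIN_TEXT",
            }
            for qa in batch
        ],
    )
```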

By thoroughly experimenting and optimizing both the knowledge base powering their AI companion and the queries to extract answers from it, Verisk was able to achieve very high answer accuracy during the proof of concept, paving the way for further development. The techniques they explored—multi-stage querying, tuning relevance, enriching data—became core elements of their approach for extracting quality automated answers.
LLM Parameters and Models
Experimenting with prompt structure, length, temperature, role-playing, and context was key to improving the quality and accuracy of the AI companion’s Claude-powered responses. The prompt design guidelines provided by Anthropic were incredibly helpful.
Verisk crafted prompts that provided Claude with clear context and set roles for answering user questions. Setting the temperature to 0.5 helped reduce randomness and repetition in the generated responses.
Verisk also experimented with different models to improve the efficiency of the overall solution. Although Claude 3 models like Sonnet and Haiku did a great job at generating responses, as part of the overall solution, Verisk didn’t always need the LLM to generate text. For scenarios that required identification of tools, Claude Instant was a better suited model due to its quicker response times.
Metrics, Data Governance, and Accuracy
A critical component of Verisk FAST’s AI companion and its usefulness is their rigorous evaluation of its performance and the accuracy of its generated responses.
As part of the proof of concept with the AWS Generative AI Innovation Center, Verisk came up with 100 questions to evaluate the accuracy and performance of the AI companion. Central to this process was crafting questions that assess the bot’s ability to comprehend and respond effectively across a diverse range of topics, scenarios, and difficulty levels. Verisk wanted to make sure their AI companion provided accurate responses to frequently asked questions and could also handle nuanced, less predictable inquiries. The results provided invaluable insights into the RAG pipeline’s strengths and areas for improvement, guiding Verisk’s future efforts to refine and enhance its capabilities.
After Verisk integrated their AI companion into the platform and began testing it with real-world scenarios, their accuracy rate was approximately 40%. However, within a few months, it rapidly increased to over 70% because of all the data harvesting work, and the accuracy continues to steadily improve each day.
Contributing to the AI companion’s rising accuracy is Verisk’s evaluation heat map. It provides a visual representation of the documentation available across 20 topics that comprehensively cover the Verisk FAST platform’s capabilities, compared against the volume of inquiries within each topic segment and the health of the generated responses in each.
This visualization allows the Verisk FAST team to effortlessly identify gaps: they can quickly see which capabilities the AI companion currently struggles with and where user questions are most concentrated. The team can then prioritize expanding the AI companion’s knowledge in these areas through additional documentation, training data, research materials, and testing.

Business Impact
Verisk initially rolled out the AI companion to one beta customer to demonstrate real-world performance and impact. Supporting a customer in this way is a stark contrast to how Verisk has historically engaged with and supported customers, where a team would typically be allocated to interact with the customer directly. Now only a fraction of the time a person would usually spend is needed to review submissions and adjust responses. Verisk FAST’s AI companion has helped them cost-effectively scale while still providing high-quality assistance.
In analyzing this early usage data, Verisk uncovered additional areas they can drive business value for their customers. As they collect additional information, this data will help them uncover what will be needed to improve results and prepare for a wider rollout.
Ongoing development will focus on expanding these capabilities, prioritized based on the collected questions. Most exciting, though, are the new possibilities on the horizon with generative AI. Verisk knows this technology is rapidly advancing, and they are eager to harness innovations to bring even more value to their customers. As new models and techniques emerge, Verisk plans to adapt their AI companion to take advantage of the latest capabilities. Although the AI companion currently focuses on responding to user questions, this is only the starting point. Verisk plans to quickly expand its capabilities to proactively make suggestions and configure functionality directly in the system itself. The Verisk FAST team is inspired by the challenge of pushing the boundaries of generative AI and is excited to test the limits of what’s possible.
Conclusion
Verisk’s journey in developing an AI companion for their FAST platform showcases the immense potential of generative AI to transform customer support and drive operational efficiencies. By meticulously harvesting, structuring, and retrieving data, and leveraging large language models, semantic search capabilities, and rigorous evaluation processes, Verisk has created a robust solution that provides accurate, real-time responses to user inquiries. As Verisk continues to expand the AI companion’s capabilities while adhering to ethical and responsible AI development practices, they are poised to unlock greater value for customers, enable staff to focus on innovation, and set new standards for customer support in the insurance industry.
For more information, see the following resources:

Explore generative AI on AWS
Learn about Unlocking the business value of Generative AI
Learn more about Anthropic Claude 3 models on Amazon Bedrock
Learn about Amazon Bedrock and how to build and scale generative AI applications with foundation models
Generative AI Quickstart POCs

About the Authors
Tom Famularo was Co-Founder/CEO of FAST and leads Verisk Life Solutions, based in NJ. Tom is responsible for platform strategy, data/analytics, AI, and Verisk’s life/annuity customers. His focus and passion are teaching customers and team members how to let technology enable business outcomes with far less human effort. Outside of work, he’s an avid fan of his son’s baseball and football teams.
Abhay Shah leads engineering efforts for the FAST Platform at Verisk – Life Solutions, where he offers guidance on architecture and provides technical leadership for Customer Implementations and Product Development. With over two decades of experience in the technology sector, Abhay helps insurance carriers maximize the value of their ecosystem through modern technology and is excited by the opportunities that AI provides. Beyond his professional passion, he enjoys reading, traveling, and coaching the middle school robotics team.
Nicolette Kontor is a technology enthusiast who thrives on helping customers embrace digital transformation. In her current role at Verisk – Life Solutions, she spearheads the application of artificial intelligence to the FAST Platform, which she finds tremendously rewarding and exciting. With over 10 years of experience in major customer implementations and product development, Nicolette is driven to deliver innovative solutions that unlock value for insurance carriers. Beyond her professional pursuits, Nicolette is an avid traveler, having explored 39 countries to date. She enjoys winning trivia, reading mystery novels, and learning new languages.
Ryan Doty is a Sr. Solutions Architect at AWS, based out of New York. He helps enterprise customers in the Northeast U.S. accelerate their adoption of the AWS Cloud by providing architectural guidance to design innovative and scalable solutions. Coming from a software development and sales engineering background, he is excited by the possibilities that the cloud can bring to the world.
Tarik Makota is a Senior Principal Solutions Architect with Amazon Web Services. He provides technical guidance, design advice, and thought leadership to AWS’ customers across the US Northeast. He holds an M.S. in Software Development and Management from Rochester Institute of Technology.
Dom Bavaro is a Senior Solutions Architect for Financial Services. While providing technical guidance to customers across many use cases, he is focused on helping customers build and productionize generative AI solutions and workflows.

Establishing an AI/ML center of excellence

The rapid advancements in artificial intelligence and machine learning (AI/ML) have made these technologies a transformative force across industries. According to a McKinsey study, generative AI is projected to deliver over $400 billion (5% of industry revenue) in productivity benefits across the financial services industry (FSI). According to Gartner, more than 80% of enterprises will have AI deployed by 2026. At Amazon, we believe innovation (rethink and reinvent) drives improved customer experiences and efficient processes, leading to increased productivity. Generative AI is a catalyst for business transformation, making it imperative for FSI organizations to determine where generative AI’s current capabilities could deliver the biggest value for FSI customers.
Organizations across industries face numerous challenges implementing generative AI, such as the lack of a clear business case, difficulty scaling beyond proof of concept, lack of governance, and a shortage of the right talent. An effective approach that addresses a wide range of these issues is the establishment of an AI/ML center of excellence (CoE). An AI/ML CoE is a dedicated unit, either centralized or federated, that coordinates and oversees all AI/ML initiatives within an organization, bridging business strategy to value delivery. As observed by Harvard Business Review, an AI/ML CoE is already established in 37% of large companies in the US. For organizations to be successful in their generative AI journey, coordinated collaboration across lines of business and technical teams is increasingly important.
This post, along with the Cloud Adoption Framework for AI/ML and the Well-Architected Machine Learning Lens, serves as a guide for implementing an effective AI/ML CoE with the objective of capturing generative AI’s possibilities. This includes guiding practitioners to define the CoE mission, form a leadership team, integrate ethical guidelines, qualify and prioritize use cases, upskill teams, implement governance, create infrastructure, embed security, and enable operational excellence.
What is an AI/ML CoE?
The AI/ML CoE is responsible for partnering with lines of business and end-users in identifying AI/ML use cases aligned to business and product strategy, recognizing common reusable patterns from different business units (BUs), implementing a company-wide AI/ML vision, and deploying an AI/ML platform and workloads on the most appropriate combination of computing hardware and software. The CoE team synergizes business acumen with profound technical AI/ML proficiency to develop and implement interoperable, scalable solutions throughout the organization. They establish and enforce best practices encompassing design, development, processes, and governance operations, thereby mitigating risks and making sure robust business, technical, and governance frameworks are consistently upheld. For ease of consumption, standardization, scalability, and value delivery, the outputs of an AI/ML CoE fall into two types: guidance (published best practices, lessons learned, and tutorials) and capabilities (people skills, tools, technical solutions, and reusable templates).
The following are benefits of establishing an AI/ML CoE:

Faster time to market through a clear path to production
Maximized return on investments through delivering on the promise of generative AI business outcomes
Optimized risk management
Structured upskilling of teams
Sustainable scaling with standardized workflows and tooling
Better support and prioritization of innovation initiatives

The following figure illustrates the key components for establishing an effective AI/ML CoE.

In the following sections, we discuss each numbered component in detail.
1. Sponsorship and mission
The foundational step in setting up an AI/ML CoE is securing sponsorship from senior leadership, establishing and empowering a leadership team, and defining the CoE’s mission and objectives.
Establish sponsorship
Establish clear leadership roles and structure to provide decision-making processes, accountability, and adherence to ethical and legal standards:

Executive sponsorship – Secure support from senior leadership to champion AI/ML initiatives
Steering committee – Form a committee of key stakeholders to oversee the AI/ML CoE’s activities and strategic direction
Ethics board – Create a board to address ethical and responsible AI considerations in AI/ML development and deployment

Define the mission
Making the mission customer- or product-focused and aligned with the organization’s overall strategic goals helps outline the AI/ML CoE’s role in achieving them. This mission, usually set by the executive sponsor in alignment with the heads of business units, serves as a guiding principle for all CoE activities, and contains the following:

Mission statement – Clearly articulate the purpose of the CoE in advancing customer and product outcomes by applying AI/ML technologies
Strategic objectives – Outline tangible and measurable AI/ML goals that align with the organization’s overall strategic goals
Value proposition – Quantify the expected business value using key performance indicators (KPIs) such as cost savings, revenue gains, user satisfaction, time savings, and time-to-market

2. People
According to a Gartner report, 53% of business, functional, and technical teams rate their technical acumen on generative AI as “Intermediate” and 64% of senior leadership rate their skill as “Novice.” By developing customized solutions tailored to the specific and evolving needs of the business, you can foster a culture of continuous growth and learning and cultivate a deep understanding of AI and ML technologies, including generative AI skill development and enablement.
Training and enablement
To help educate employees on AI/ML concepts, tools, and techniques, the AI/ML CoE can develop training programs, workshops, certification programs, and hackathons. These programs can be tailored to different levels of expertise and designed to help employees understand how to use AI/ML to solve business problems. Additionally, the CoE could provide a mentoring platform to employees who are interested in further enhancing their AI/ML skills, develop certification programs to recognize employees who have achieved a certain level of proficiency in AI/ML, and provide ongoing training to keep the team updated with the latest technologies and methodologies.
Dream team
Cross-functional engagement is essential to achieve well-rounded AI/ML solutions. A multidisciplinary AI/ML CoE that combines industry, business, technical, compliance, and operational expertise helps drive innovation and harnesses the full, 360-degree potential of AI in achieving a company’s strategic business goals. Such a diverse team with AI/ML expertise may include roles such as:

Product strategists – Make sure all products, features, and experiments are cohesive with the overall transformation strategy
AI researchers – Employ experts in the field to drive innovation and explore cutting-edge techniques such as generative AI
Data scientists and ML engineers – Develop capabilities for data preprocessing, model training, and validation
Domain experts – Collaborate with professionals from business units who understand the specific applications and business needs
Operations – Develop KPIs, demonstrate value delivery, and manage machine learning operations (MLOps) pipelines
Project managers – Appoint project managers to implement projects efficiently

Knowledge sharing
By fostering collaboration within the CoE, internal stakeholders, business unit teams, and external stakeholders, you can enable knowledge sharing and cross-disciplinary teamwork. Encourage knowledge sharing, establish a knowledge repository, and facilitate cross-functional projects to maximize the impact of AI/ML initiatives. Some example key actions to foster knowledge sharing are:

Cross-functional collaborations – Promote teamwork between experts in generative AI and business unit domain-specific professionals to innovate on cross-functional use cases
Strategic partnerships – Investigate partnerships with research institutions, universities, and industry leaders specializing in generative AI to take advantage of their collective expertise and insights

3. Governance
Establish governance that enables the organization to scale value delivery from AI/ML initiatives while managing risk, compliance, and security. Additionally, pay special attention to the changing nature of the risk and cost associated with developing and scaling AI.
Responsible AI
Organizations can navigate potential ethical dilemmas associated with generative AI by incorporating considerations such as fairness, explainability, privacy and security, robustness, governance, and transparency. To provide ethical integrity, an AI/ML CoE helps integrate robust guidelines and safeguards across the AI/ML lifecycle in collaboration with stakeholders. By taking a proactive approach, the CoE not only supports ethical compliance but also builds trust, enhances accountability, and mitigates potential risks such as veracity, toxicity, data misuse, and intellectual property concerns.
Standards and best practices
Continuing its stride towards excellence, the CoE helps define common standards, industry-leading practices, and guidelines. These encompass a holistic approach, covering data governance, model development, ethical deployment, and ongoing monitoring, reinforcing the organization’s commitment to responsible and ethical AI/ML practices. Examples of such standards include:

Development framework – Establishing standardized frameworks for AI development, deployment, and governance provides consistency across projects, making it easier to adopt and share best practices.
Repositories – Centralized code and model repositories facilitate the sharing of best practices and industry standard solutions in coding standards, enabling teams to adhere to consistent coding conventions for better collaboration, reusability, and maintainability.
Centralized knowledge hub – A central repository housing datasets and research discoveries to serve as a comprehensive knowledge center.
Platform – A central platform such as Amazon SageMaker for creation, training, and deployment. It helps manage and scale central policies and standards.
Benchmarking and metrics – Defining standardized metrics and benchmarking to measure and compare the performance of AI models, and the business value derived.

Data governance
Data governance is a crucial function of an AI/ML CoE: making sure data is collected, used, and shared in a responsible and trustworthy manner. Data governance is essential for AI applications, because these applications often use large amounts of data, and the quality and integrity of this data are critical to the accuracy and fairness of AI-powered decisions. The AI/ML CoE helps define best practices and guidelines for data preprocessing, model development, training, validation, and deployment. The CoE should make sure that data is accurate, complete, and up to date; that data is protected from unauthorized access, use, or disclosure; and that data governance policies demonstrate adherence to regulatory and internal compliance.
Model oversight
Model governance is a framework that determines how a company implements policies, controls access to models, and tracks their activity. The CoE helps make sure that models are developed and deployed in a safe, trustworthy, and ethical fashion. Additionally, it can confirm that model governance policies demonstrate the organization’s commitment to transparency, fostering trust with customers, partners, and regulators. It can also provide safeguards customized to your application requirements and make sure responsible AI policies are implemented using services such as Guardrails for Amazon Bedrock.
Value delivery
Manage the AI/ML initiative return on investment, platform and services expenses, efficient and effective use of resources, and ongoing optimization. This requires monitoring and analyzing use case-based value KPIs and expenditures related to data storage, model training, and inference. This includes assessing the performance of various AI models and algorithms to identify cost-effective, resource-optimal solutions such as using AWS Inferentia for inference and AWS Trainium for training. Setting KPIs and metrics is pivotal to gauge effectiveness. Some example KPIs are:

Return on investment (ROI) – Evaluating financial returns against investments justifies resource allocation for AI projects
Business impact – Measuring tangible business outcomes like revenue uplift or enhanced customer experiences validates AI’s value
Project delivery time – Tracking time from project initiation to completion showcases operational efficiency and responsiveness

4. Platform
The AI/ML CoE, in collaboration with the business and technology teams, can help build an enterprise-grade and scalable AI platform, enabling organizations to operate AI-enabled services and products across business units. It can also help develop custom AI solutions and help practitioners adapt to change in AI/ML development.
Data and engineering architecture
The AI/ML CoE helps set up the right data flows and engineering infrastructure, in collaboration with the technology teams, to accelerate the adoption and scaling of AI-based solutions:

High-performance computing resources – Powerful GPUs such as Amazon Elastic Compute Cloud (Amazon EC2) instances, powered by the latest NVIDIA H100 Tensor Core GPUs, are essential for training complex models.
Data storage and management – Implement robust data storage, processing, and management systems such as AWS Glue and Amazon OpenSearch Service.
Platform – Cloud platforms provide flexibility and scalability for AI/ML projects. Services such as SageMaker provide end-to-end ML capability across generative AI experimentation, data preparation, model training, deployment, and monitoring, helping accelerate generative AI workloads from experimentation to production. Amazon Bedrock is an easier way to build and scale generative AI applications with foundation models (FMs); as a fully managed service, it offers a choice of high-performing FMs from leading AI companies including AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon.
Development tools and frameworks – Use industry-standard AI/ML frameworks and tools such as Amazon CodeWhisperer, Apache MXNet, PyTorch, and TensorFlow.
Version control and collaboration tools – Git repositories, project management tools, and collaboration platforms can facilitate teamwork, such as AWS CodePipeline and Amazon CodeGuru.
Generative AI frameworks – Utilize state-of-the-art foundation models, tools, agents, knowledge bases, and guardrails available on Amazon Bedrock.
Experimentation platforms – Deploy platforms for experimentation and model development, allowing for reproducibility and collaboration, such as Amazon SageMaker JumpStart.
Documentation – Emphasize the documentation of processes, workflows, and best practices within the platform to facilitate knowledge sharing among practitioners and teams.

Lifecycle management
Within the AI/ML CoE, the emphasis on scalability, availability, reliability, performance, and resilience is fundamental to the success and adaptability of AI/ML initiatives. Implementation and operationalization of a lifecycle management system such as MLOps can help automate deployment and monitoring, resulting in improved reliability, time to market, and observability. Using tools like Amazon SageMaker Pipelines for workflow management, Amazon SageMaker Experiments for managing experiments, and Amazon Elastic Kubernetes Service (Amazon EKS) for container orchestration enables adaptable deployment and management of AI/ML applications, fostering scalability and portability across various environments. Similarly, employing serverless architectures such as AWS Lambda empowers automatic scaling based on demand, reducing operational complexity while offering flexibility in resource allocation.
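To make this concrete, the following is a minimal, hedged sketch of a one-step Amazon SageMaker Pipeline; the IAM role, S3 locations, and algorithm choice are placeholders rather than a recommended production setup:

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.steps import TrainingStep

# Hypothetical role and data location -- replace with your own resources.
ROLE_ARN = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"
TRAIN_DATA = "s3://my-ml-bucket/train/"

session = PipelineSession()

# A built-in XGBoost container keeps the sketch small; processing, evaluation,
# and model registration steps can be added to the same pipeline the same way.
estimator = Estimator(
    image_uri=sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.7-1"),
    role=ROLE_ARN,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=session,
)
estimator.set_hyperparameters(objective="reg:squarederror", num_round=50)

train_step = TrainingStep(name="TrainModel", step_args=estimator.fit({"train": TRAIN_DATA}))

pipeline = Pipeline(name="coe-mlops-demo", steps=[train_step], sagemaker_session=session)
# pipeline.upsert(role_arn=ROLE_ARN)   # register or update the pipeline definition
# pipeline.start()                     # kick off an execution
```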
Strategic alliances in AI services
The decision to buy or build solutions involves trade-offs. Buying offers speed and convenience by using pre-built tools, but may lack customization. On the other hand, building provides tailored solutions but demands time and resources. The balance hinges on the project scope, timeline, and long-term needs, achieving optimal alignment with organizational goals and technical requirements. The decision, ideally, can be based on a thorough assessment of the specific problem to be solved, the organization’s internal capabilities, and the area of the business targeted for growth. For example, if the business system helps establish uniqueness, build to differentiate in the market; if the business system supports a standard, commoditized business process, buy to save.
By partnering with third-party AI service providers, such as AWS Generative AI Competency Partners, the CoE can use their expertise and experience to accelerate the adoption and scaling of AI-based solutions. These partnerships can help the CoE stay up to date with the latest AI/ML research and trends, and can provide access to cutting-edge AI/ML tools and technologies. Additionally, third-party AI service providers can help the CoE identify new use cases for AI/ML and can provide guidance on how to implement AI/ML solutions effectively.
5. Security
Emphasize, assess, and implement security and privacy controls across the organization’s data, AI/ML, and generative AI workloads. Integrate security measures across all aspects of AI/ML to identify, classify, remediate, and mitigate vulnerabilities and threats.
Holistic vigilance
Based on how your organization is using generative AI solutions, scope the security efforts, design resiliency of the workloads, and apply relevant security controls. This includes employing encryption techniques, multifactor authentication, threat detection, and regular security audits to make sure data and systems remain protected against unauthorized access and breaches. Regular vulnerability assessments and threat modeling are crucial to address emerging threats. Strategies such as model encryption, using secure environments, and continuous monitoring for anomalies can help protect against adversarial attacks and malicious misuse. To monitor for threats, you can use tools like Amazon GuardDuty. With Amazon Bedrock, you have full control over the data you use to customize the foundation models for your generative AI applications. Data is encrypted in transit and at rest, and user inputs and model outputs are not shared with any model providers, keeping your data and applications secure and private.
End-to-end assurance
Securing the three critical components of any AI system (inputs, model, and outputs) is essential. Establishing clearly defined roles, security policies, standards, and guidelines across the lifecycle can help manage the integrity and confidentiality of the system. This includes implementing industry best practices and frameworks such as NIST, OWASP-LLM, OWASP-ML, and MITRE ATLAS. Furthermore, evaluate and implement requirements such as Canada’s Personal Information Protection and Electronic Documents Act (PIPEDA) and the European Union’s General Data Protection Regulation (GDPR). You can use tools such as Amazon Macie to discover and protect your sensitive data.
Infrastructure (data and systems)
Given the sensitivity of the data involved, exploring and implementing access and privacy-preserving techniques is vital. This involves techniques such as least privilege access, data lineage, keeping only the data relevant to the use case, and identifying and classifying sensitive data to enable collaboration without compromising individual data privacy. It’s essential to embed these techniques within the AI/ML development lifecycle workflows, maintain a secure data and modeling environment, stay in compliance with privacy regulations, and protect sensitive information. By integrating security-focused measures into the AI/ML CoE’s strategies, the organization can better mitigate risks associated with data breaches, unauthorized access, and adversarial attacks, thereby providing integrity, confidentiality, and availability for its AI assets and sensitive information.
6. Operations
The AI/ML CoE needs to focus on optimizing the efficiency and growth potential of implementing generative AI within the organization’s framework. In this section, we discuss several key aspects aimed at driving successful integration while upholding workload performance.
Performance management
Setting KPIs and metrics is pivotal to gauge effectiveness. Regular assessment of these metrics allows you to track progress, identify trends, and foster a culture of continual improvement within the CoE. Reporting on these insights provides alignment with organizational objectives and informs decision-making processes for enhanced AI/ML practices. Solutions such as Amazon Bedrock’s integration with Amazon CloudWatch help track and manage usage metrics and build customized dashboards for auditing.
An example KPI is model accuracy: assessing models against benchmarks provides reliable and trustworthy AI-generated outcomes.
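As one hedged example of tracking such usage metrics, the sketch below pulls daily Amazon Bedrock invocation counts from Amazon CloudWatch; it assumes the AWS/Bedrock namespace publishes an Invocations metric with a ModelId dimension, and the model ID shown is illustrative:

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")

# Daily invocation counts for one model over the past week. Swap in the
# model IDs your CoE actually tracks.
now = datetime.now(timezone.utc)
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/Bedrock",
    MetricName="Invocations",
    Dimensions=[{"Name": "ModelId", "Value": "anthropic.claude-3-sonnet-20240229-v1:0"}],
    StartTime=now - timedelta(days=7),
    EndTime=now,
    Period=86400,          # one datapoint per day
    Statistics=["Sum"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"].date(), int(point["Sum"]))
```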
Incident management
AI/ML solutions need ongoing control and observation to manage any anomalous activities. This requires establishing processes and systems across the AI/ML platform, ideally automated. A standardized incident response strategy needs to be developed and implemented in alignment with the chosen monitoring solution. This includes elements such as formalized roles and responsibilities, data sources and metrics to be monitored, systems for monitoring, and response actions such as mitigation, escalation, and root cause analysis.
Continuous improvement
Define rigorous processes for generative AI model development, testing, and deployment. Streamline the development of generative AI models by defining and refining robust processes. Regularly evaluate the AI/ML platform performance and enhance generative AI capabilities. This involves incorporating feedback loops from stakeholders and end-users and dedicating resources to exploratory research and innovation in generative AI. These practices drive continual improvement and keep the CoE at the forefront of AI innovation. Furthermore, implement generative AI initiatives seamlessly by adopting agile methodologies, maintaining comprehensive documentation, conducting regular benchmarking, and implementing industry best practices.
7. Business
The AI/ML CoE helps drive business transformation by continuously identifying priority pain points and opportunities across business units. Aligning business challenges and opportunities to customized AI/ML capabilities, the CoE drives rapid development and deployment of high-value solutions. This alignment to real business needs enables step-change value creation through new products, revenue streams, productivity, optimized operations, and customer satisfaction.
Envision an AI strategy
With the objective to drive business outcomes, establish a compelling multi-year vision and strategy on how the adoption of AI/ML and generative AI techniques can transform major facets of the business. This includes quantifying the tangible value at stake from AI/ML in terms of revenues, cost savings, customer satisfaction, productivity, and other vital performance indicators over a defined strategic planning timeline, such as 3–5 years. Additionally, the CoE must secure buy-in from executives across business units by making the case for how embracing AI/ML will create competitive advantages and unlock step-change improvements in key processes or offerings.
Use case management
To identify, qualify, and prioritize the most promising AI/ML use cases, the CoE facilitates an ongoing discovery dialogue with all business units to surface their highest-priority challenges and opportunities. Each complex business issue or opportunity must be articulated by the CoE, in collaboration with business unit leaders, as a well-defined problem and opportunity statement that lends itself to an AI/ML-powered solution. These opportunities establish clear success metrics tied to business KPIs and outline the potential value impact vs. implementation complexity. A prioritized pipeline of high-potential AI/ML use cases can then be created, ranking opportunities based on expected business benefit and feasibility.
Proof of concept
Before undertaking full production development, prototype proposed solutions for high-value use cases through controlled proof of concept (PoC) projects focused on demonstrating initial viability. Rapid feedback loops during these PoC phases allow for iteration and refinement of approaches at a small scale prior to wider deployment. The CoE establishes clear success criteria for PoCs, in alignment with business unit leaders, that map to business metrics and KPIs for ultimate solution impact. Furthermore, the CoE can engage to share expertise, reusable assets, best practices, and standards.
Executive alignment
To provide full transparency, business unit executive stakeholders must be aligned with AI/ML initiatives and receive regular reporting on them. This way, any challenges that need to be escalated can be quickly resolved with executives who are familiar with the initiatives.
8. Legal
The legal landscape of AI/ML and generative AI is complex and evolving, presenting a myriad of challenges and implications for organizations. Issues such as data privacy, intellectual property, liability, and bias require careful consideration within the AI/ML CoE. As regulations struggle to keep pace with technological advancements, the CoE must partner with the organization’s legal team to navigate this dynamic terrain to enforce compliance and responsible development and deployment of these technologies. The evolving landscape demands that the CoE, working in collaboration with the legal team, develops comprehensive AI/ML governance policies covering the entire AI/ML lifecycle. This process involves business stakeholders in decision-making processes and regular audits and reviews of AI/ML systems to validate compliance with governance policies.
9. Procurement
The AI/ML CoE needs to work with partners, both independent software vendors (ISVs) and system integrators (SIs), to support the buy and build strategies. They need to partner with the procurement team to develop a selection, onboarding, management, and exit framework. This includes acquiring technologies, algorithms, and datasets: sourcing reliable datasets is crucial for training ML models, and acquiring cutting-edge algorithms and generative AI tools enhances innovation. This helps accelerate development of the capabilities the business needs. Procurement strategies must prioritize ethical considerations, data security, and ongoing vendor support to provide sustainable, scalable, and responsible AI integration.
10. Human Resources
Partner with Human Resources (HR) on AI/ML talent management and pipeline. This involves cultivating talent to understand, develop, and implement these technologies. HR can help bridge the technical and non-technical divide, fostering interdisciplinary collaboration, building a path for onboarding new talent, training them, and growing them both professionally and in their skills. HR can also address ethical concerns through compliance training, upskill employees on the latest emerging technologies, and manage the impact on job roles that are critical for continued success.
11. Regulatory and compliance
The regulatory landscape for AI/ML is rapidly evolving, with governments worldwide racing to establish governance regimes for the increasing adoption of AI applications. The AI/ML CoE needs a focused approach to stay updated, derive actions, and implement regulatory legislation such as Brazil’s General Personal Data Protection Law (LGPD), Canada’s Personal Information Protection and Electronic Documents Act (PIPEDA), and the European Union’s General Data Protection Regulation (GDPR), as well as frameworks such as ISO 31700, ISO 29100, ISO 27701, Federal Information Processing Standards (FIPS), and the NIST Privacy Framework. In the US, regulatory actions include mitigating risks posed by the increased adoption of AI, protecting workers affected by generative AI, and providing stronger consumer protections. The EU AI Act includes new assessment and compliance requirements.
As AI regulations continue to take shape, organizations are advised to establish responsible AI as a C-level priority, set and enforce clear governance policies and processes around AI/ML, and involve diverse stakeholders in decision-making processes. The evolving regulations emphasize the need for comprehensive AI governance policies that cover the entire AI/ML lifecycle, and regular audits and reviews of AI systems to address biases, transparency, and explainability in algorithms. Adherence to standards fosters trust, mitigates risks, and promotes responsible deployment of these advanced technologies.
Conclusion
The journey to establishing a successful AI/ML center of excellence is a multifaceted endeavor that requires dedication and strategic planning, while operating with agility and collaborative spirit. As the landscape of artificial intelligence and machine learning continues to evolve at a rapid pace, the creation of an AI/ML CoE represents a necessary step towards harnessing these technologies for transformative impact. By focusing on the key considerations, from defining a clear mission to fostering innovation and enforcing ethical governance, organizations can lay a solid foundation for AI/ML initiatives that drive value. Moreover, an AI/ML CoE is not just a hub for technological innovation; it’s a beacon for cultural change within the organization, promoting a mindset of continuous learning, ethical responsibility, and cross-functional collaboration.
Stay tuned as we continue to explore the AI/ML CoE topics in our upcoming posts in this series. If you need help establishing an AI/ML Center of Excellence, please reach out to a specialist.

About the Authors
Ankush Chauhan is a Sr. Manager, Customer Solutions at AWS based in New York, US. He helps Capital Markets customers optimize their cloud journey, scale adoption, and realize the transformative value of building and inventing in the cloud. In addition, he is focused on enabling customers on their AI/ML journeys, including generative AI. Beyond work, you can find Ankush running, hiking, or watching soccer.
Ava Kong is a Generative AI Strategist at the AWS Generative AI Innovation Center, specializing in the financial services sector. Based in New York, Ava has worked closely with a variety of financial institutions on a range of use cases, combining the latest in generative AI technology with strategic insights to enhance operational efficiency, drive business outcomes, and demonstrate the broad and impactful application of AI technologies.
Vikram Elango is a Sr. AI/ML Specialist Solutions Architect at AWS, based in Virginia, US. He is currently focused on generative AI, LLMs, prompt engineering, large model inference optimization, and scaling ML across enterprises. Vikram helps financial and insurance industry customers with design and thought leadership to build and deploy machine learning applications at scale. In his spare time, he enjoys traveling, hiking, cooking, and camping with his family.
Rifat Jafreen is a Generative AI Strategist at the AWS Generative AI Innovation Center, where her focus is to help customers realize business value and operational efficiency using generative AI. She has worked across the telecom, finance, healthcare, and energy industries and has onboarded machine learning workloads for numerous customers. Rifat is also deeply involved in MLOps, FMOps, and responsible AI.
The authors would like to extend special thanks to Arslan Hussain, David Ping, Jarred Graber, and Raghvender Arni for their support, expertise, and guidance.

Deep Learning Techniques for Autonomous Driving: An Overview

Over the past decade, advancements in deep learning and artificial intelligence have driven significant strides in self-driving vehicle technology. These technologies have revolutionized computer vision, robotics, and natural language processing and played a pivotal role in the autonomous driving revolution. From basic driver assistance to fully autonomous vehicles (AVs) capable of navigating without human intervention, the progression is evident through the SAE Levels of vehicle automation. Although most scenarios can be solved with traditional methods, unresolved corner cases highlight the necessity for AI-driven solutions. With sensors enabling perception and communication technologies like 5G aiding extended perception, AVs promise safer, more efficient transportation, albeit with challenges such as sensor reliability and integration.

Deep Learning-based Decision-Making Architectures for Self-Driving Cars:

Self-driving cars rely on complex decision-making systems that analyze data from various sensors to navigate autonomously. These systems can be modular, with distinct components for perception, path planning, behavior arbitration, and motion control, each designed using AI or classical methods and each overseen by safety monitors that ensure its reliability. Alternatively, an End2End learning approach directly maps sensory data to control outputs. In both cases, understanding the environment, planning paths, arbitrating behaviors, and controlling motion are the essential tasks, and deep learning and AI technologies play crucial roles alongside the classical methodologies that are also explored.

Overview of Deep Learning Technologies:

Deep learning plays an important role in autonomous driving, with CNNs being crucial for processing spatial information like images, replacing traditional handcrafted features with learned representations. Mimicking aspects of the mammalian visual cortex, CNNs efficiently detect image features, aiding in object recognition. RNNs excel in processing temporal sequences such as video streams or text. Unlike conventional networks, RNNs possess a time-dependent feedback loop, allowing them to capture temporal dependencies. Long Short-Term Memory (LSTM) networks mitigate the vanishing gradient problem encountered in basic RNNs, enabling the modeling of longer-term dependencies in sequences.
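To ground these ideas, here is a minimal, self-contained PyTorch sketch (our illustration, not drawn from any production driving stack) in which a small CNN extracts per-frame spatial features and an LSTM models their temporal evolution; all layer sizes and the three-output control head are arbitrary placeholders.

import torch
import torch.nn as nn

class FrameSequenceEncoder(nn.Module):
    """Toy CNN + LSTM: per-frame spatial features feed a temporal model."""
    def __init__(self, hidden_size=128, num_outputs=3):
        super().__init__()
        # The CNN replaces handcrafted features with learned spatial representations
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (batch*time, 32, 1, 1)
        )
        # The LSTM captures temporal dependencies across the frame sequence
        self.lstm = nn.LSTM(input_size=32, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_outputs)  # e.g., steering/brake/throttle

    def forward(self, frames):  # frames: (batch, time, 3, H, W)
        b, t, c, h, w = frames.shape
        feats = self.cnn(frames.reshape(b * t, c, h, w)).reshape(b, t, 32)
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])  # prediction from the last time step

# Example: a batch of 2 clips, 8 frames each, 64x64 RGB
print(FrameSequenceEncoder()(torch.randn(2, 8, 3, 64, 64)).shape)  # torch.Size([2, 3])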

DRL presents a paradigm for autonomous driving, employing the Partially Observable Markov Decision Process formalism. In this framework, an agent, like a self-driving car, navigates an environment based on observed sensory data, taking actions to maximize cumulative future rewards. DRL models, such as Deep Q-Networks (DQN), estimate optimal action policies by training neural networks to approximate the maximum expected future rewards. Extensions to the base DQN algorithm, like Double Q Learning and Prioritized replay, enhance its performance, offering promising avenues for autonomous driving applications. However, challenges remain in adapting DRL to real-world driving conditions.
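The core of the DQN update described above can be written in a few lines. The following generic PyTorch sketch (not tied to any driving benchmark; the state and action dimensions are placeholders) computes the one-step temporal-difference target from a target network and the corresponding regression loss.

import torch
import torch.nn as nn

gamma = 0.99
q_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))        # online network
target_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))   # periodically synced copy

# A dummy batch of transitions (state, action, reward, next_state, done)
s = torch.randn(32, 8)
a = torch.randint(0, 4, (32,))
r = torch.randn(32)
s2 = torch.randn(32, 8)
done = torch.randint(0, 2, (32,)).float()

with torch.no_grad():
    # DQN target: r + gamma * max_a' Q_target(s', a'), zeroed at terminal states
    target = r + gamma * (1.0 - done) * target_net(s2).max(dim=1).values

q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a) for the actions taken
loss = nn.functional.mse_loss(q_sa, target)            # minimized by gradient descent
loss.backward()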

Deep Learning for Driving Scene Perception and Localization:

Autonomous vehicles rely on perceiving their surroundings to navigate safely. The methods involve deep learning, particularly for object detection, recognition, and scene understanding. The debate between camera and LiDAR sensors persists, each having advantages and limitations. While LiDAR offers precise 3D data but is costly and susceptible to weather, cameras are cost-efficient but lack depth perception. Researchers aim to bridge this gap by generating LiDAR-like point clouds from visual depth estimation. Deep learning architectures are employed for object detection, semantic segmentation, and localization, leveraging camera and LiDAR data for comprehensive scene understanding essential for autonomous driving.

Safety of Deep Learning in Autonomous Driving:

Ensuring the safety of autonomous driving systems that utilize deep learning is a multifaceted challenge. Safety hinges on understanding potential failures and the system’s context, and on defining safe behavior. Different definitions of safety exist, from risk reduction to minimizing harm from unwanted outcomes. Existing standards like ISO 26262 provide a framework, but adapting them for deep learning is complex. Deep learning introduces unique hazards and uncertainties, requiring new fault detection and mitigation approaches. While machine learning techniques are becoming more reliable, comprehensive safety assurance for deep learning in safety-critical systems remains an ongoing endeavor, necessitating the development of tailored safety standards.

Conclusion:

In the realm of autonomous driving, several open challenges persist, which researchers aim to address with the help of deep learning and AI:

Perception: Deep learning enhances object detection and recognition accuracy, but future systems should aim for increased detail recognition and improved camera and LiDAR data integration.

Short- to middle-term reasoning: AI and deep learning are crucial for path planning, particularly in local trajectory estimation and planning.

Availability of training data: Deep learning’s efficacy relies heavily on data quality, with simulation environments bridging the gap between real-world data scarcity and training requirements.

Learning corner cases: Deep learning algorithms need enhanced generalization power to handle rare driving scenarios, necessitating the development of one-shot and low-shot learning methods.

Learning-based control methods: Deep learning can adaptively learn control parameters, improving autonomous vehicle performance by approximating true system models.

Functional safety: Integrating deep learning into safety-critical systems poses challenges, particularly in meeting existing safety standards and ensuring the explainability, stability, and robustness of neural networks.

Real-time computing and communication: Meeting real-time processing requirements for large sensor data volumes and high-speed communication lines requires advances in hardware and communication networks.


TRAMBA: A Novel Hybrid Transformer and Mamba-based Architecture for Sp …

Wearables have transformed human-technology interaction, facilitating continuous health monitoring. The wearables market is projected to surge from 70 billion USD in 2023 to 230 billion USD by 2032, with head-worn devices, including earphones and glasses, experiencing rapid growth (71 billion USD in 2023 to 172 billion USD by 2030). This growth is propelled by the rising significance of wearables, augmented reality (AR), and virtual reality (VR). Head-worn wearables uniquely capture speech signals, traditionally collected by over-the-air (OTA) microphones near or on the head, converting air pressure fluctuations into electrical signals for various applications. However, OTA microphones, typically located near the mouth, easily capture background noise, potentially compromising speech quality, particularly in noisy environments.

Various studies have tackled the challenge of separating speech from background noise through denoising, sound source separation, and speech enhancement techniques. However, this approach is hindered by the model’s inability to anticipate the diverse types of background noises and the prevalence of noisy environments, such as bustling cafeterias or construction sites. Unlike OTA microphones, bone conduction microphones (BCM) placed directly on the head are resilient to ambient noise, detecting vibrations from the skin and skull during speech. Although BCMs offer noise robustness, vibration-based methods suffer from frequency attenuation, affecting speech intelligibility. Some research endeavors explore vibration and bone-conduction super-resolution methods to reconstruct higher frequencies for improved speech quality, yet practical implementation for real-time wearable systems faces challenges. These include the heavy processing demands of state-of-the-art speech super-resolution models like generative adversarial networks (GANs), which require substantial memory and computational resources, resulting in performance gaps compared to smaller footprint methods. Optimization considerations, such as sampling rate and deployment strategies, remain crucial for enhancing real-time system efficiency.

Researchers from Northwestern University and Columbia University introduced TRAMBA, a hybrid Transformer and Mamba architecture for enhancing acoustic and bone conduction speech on mobile and wearable platforms. Previously, adopting bone conduction speech enhancement on such platforms faced challenges due to labor-intensive data collection and performance gaps between models. TRAMBA addresses this by pre-training with widely available audio speech datasets and fine-tuning with a small amount of bone conduction data. It reconstructs intelligible speech from a single wearable accelerometer and generalizes across multiple acoustic modalities. Integrated into wearable and mobile platforms, TRAMBA enables real-time speech super-resolution and a significant reduction in power consumption. This is also the first study to sense intelligible speech using only a single head-worn accelerometer.

At a macro level, the TRAMBA architecture integrates a modified U-Net structure with self-attention in the downsampling and upsampling layers, along with Mamba in the narrow bottleneck layer. TRAMBA operates on 512 ms windows of single-channel audio and preprocesses acceleration data from an accelerometer. Each downsampling block consists of a 1D convolutional layer with LeakyReLU activations, followed by a robust conditioning layer called Scale-only Attention-based Feature-wise Linear Modulation (SAFiLM). SAFiLM utilizes a multi-head attention mechanism to learn scaling factors that enhance feature representations. The bottleneck layer employs Mamba, known for its efficient memory usage and attention-like sequence modeling. However, due to gradient vanishing issues, transformers are retained only in the downsampling and upsampling blocks. Residual connections are employed to facilitate gradient flow and optimize deeper networks, enhancing training efficiency.
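The paper’s exact SAFiLM and U-Net implementation is not reproduced here, but the following rough PyTorch sketch illustrates the general idea of scale-only, attention-derived feature modulation; the dimensions, the sigmoid squashing, and the pooling over time are our own simplifications rather than the authors’ design.

import torch
import torch.nn as nn

class ScaleOnlyAttentionModulation(nn.Module):
    """Rough sketch: derive per-channel scale factors from multi-head attention
    and multiply them onto the feature map (scale-only, FiLM-style conditioning)."""
    def __init__(self, channels=64, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=channels, num_heads=num_heads, batch_first=True)
        self.to_scale = nn.Linear(channels, channels)

    def forward(self, x):                              # x: (batch, channels, time)
        tokens = x.transpose(1, 2)                     # (batch, time, channels)
        ctx, _ = self.attn(tokens, tokens, tokens)     # self-attention over the sequence
        scale = torch.sigmoid(self.to_scale(ctx.mean(dim=1)))  # one scale vector per item
        return x * scale.unsqueeze(-1)                 # scale-only modulation, no shift term

x = torch.randn(2, 64, 256)                            # e.g., features of a 512 ms audio window
print(ScaleOnlyAttentionModulation()(x).shape)         # torch.Size([2, 64, 256])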

TRAMBA exhibits superior performance across various metrics and sampling rates compared to other models, including U-Net architectures. Although the Aero GAN method slightly outperforms TRAMBA on the LSD metric, TRAMBA excels on perceptual and noise metrics such as SNR, PESQ, and STOI. This highlights the effectiveness of integrating Transformers and Mamba in enhancing local speech formants compared to traditional architectures. Transformer- and Mamba-based models also demonstrate superior performance over state-of-the-art GANs with significantly reduced memory and inference time requirements. Notably, TRAMBA’s efficient processing allows for real-time operation, unlike Aero GAN, whose inference time exceeds the audio window size, making it impractical for real-time applications. Comparisons with the top-performing U-Net architecture (TUNet) are also made.

In conclusion, this study presents TRAMBA, a hybrid architecture combining Transformer and Mamba elements for speech super-resolution and enhancement on mobile and wearable platforms. It surpasses existing methods across various acoustic modalities while maintaining a compact memory footprint of only 19.7 MB, in contrast to GANs that require at least hundreds of MBs. Integrated into real mobile and head-worn wearable systems, TRAMBA delivers superior speech quality in noisy environments compared to traditional denoising approaches. It also extends battery life by up to 160% by reducing the resolution of the audio that needs to be sampled and transmitted. TRAMBA represents a crucial advancement for integrating speech enhancement into practical mobile and wearable platforms.


What Are The Dimensions For Creating Retrieval Augmented Generation (R …

In the dynamic realm of Artificial Intelligence, Natural Language Processing (NLP), and Information Retrieval, advanced architectures like Retrieval Augmented Generation (RAG) have gained significant attention. However, most data science researchers advise against leaping into sophisticated RAG models until the evaluation pipeline is completely reliable and robust.

Carefully assessing RAG pipelines is vital, but it is frequently overlooked in the rush to incorporate cutting-edge features. Researchers and practitioners should strengthen their evaluation setup as a top priority before tackling intricate model improvements.

Understanding the assessment nuances of RAG pipelines is critical because these models depend on both retrieval quality and generation capability. The dimensions fall into two important categories, as follows.

1. Retrieval Dimensions

a. Context Precision: It determines whether every ground-truth relevant item in the retrieved context is ranked higher than the irrelevant items.

b. Context Recall: It assesses the degree to which the retrieved context covers the ground-truth response. It depends on both the retrieved context and the ground truth.

c. Context Relevance: It evaluates how relevant the retrieved context is to the question being asked.

d. Context Entity Recall: It calculates the recall of the retrieved context by comparing the entities present in both the ground truth and the context against the entities present in the ground truth alone.

e. Noise Robustness: It assesses the model’s ability to handle noise documents that are related to the question but contain little useful information.

2. Generation Dimensions

a. Faithfulness: It evaluates the factual consistency of the generated response with respect to the given context.

b. Answer Relevance: It calculates how well the generated response addresses the given question. Lower scores are assigned to answers that are incomplete or contain redundant information, and vice versa.

c. Negative Rejection: It assesses the model’s capacity to withhold a response when the retrieved documents don’t include enough information to answer the query.

d. Information Integration: It evaluates how well the model can integrate data from different documents to provide answers to complex questions.

e. Counterfactual Robustness: It assesses the model’s ability to recognize and disregard known factual errors in retrieved documents, even when it has been warned about possible disinformation.

Here are some frameworks that implement these dimensions; they can be accessed through the following links.

1. Ragas – https://docs.ragas.io/en/stable/

2. TruLens – https://www.trulens.org/

3. ARES – https://ares-ai.vercel.app/

4. DeepEval – https://docs.confident-ai.com/docs/getting-started

5. Tonic Validate – https://docs.tonic.ai/validate

6. LangFuse – https://langfuse.com/
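As a small illustration of how several of the dimensions above can be computed in practice, the following sketch uses Ragas, the first framework listed. The sample data is invented, Ragas calls an LLM judge under the hood (so model credentials must be configured), and the metric and column names can differ between Ragas versions, so treat this as a starting point rather than a definitive recipe.

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import context_precision, context_recall, faithfulness, answer_relevancy

# Tiny hand-made evaluation set: one question, the retrieved contexts,
# the generated answer, and the ground-truth reference answer.
data = {
    "question": ["What does RAG stand for?"],
    "contexts": [["RAG stands for Retrieval Augmented Generation."]],
    "answer": ["RAG stands for Retrieval Augmented Generation."],
    "ground_truth": ["Retrieval Augmented Generation"],
}

# Note: evaluate() uses an LLM and embeddings behind the scenes, so API
# credentials (for example, an OpenAI key) must be set in the environment.
results = evaluate(
    Dataset.from_dict(data),
    metrics=[context_precision, context_recall, faithfulness, answer_relevancy],
)
print(results)  # per-metric scores between 0 and 1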

This article is inspired by this LinkedIn post.

Build a Hugging Face text classification model in Amazon SageMaker Jum …

Amazon SageMaker JumpStart provides a suite of built-in algorithms, pre-trained models, and pre-built solution templates to help data scientists and machine learning (ML) practitioners get started on training and deploying ML models quickly. You can use these algorithms and models for both supervised and unsupervised learning. They can process various types of input data, including image, text, and tabular.
This post introduces using the text classification and fill-mask models available on Hugging Face in SageMaker JumpStart for text classification on a custom dataset. We also demonstrate performing real-time and batch inference for these models. This supervised learning algorithm supports transfer learning for all pre-trained models available on Hugging Face. It takes a piece of text as input and outputs the probability for each of the class labels. You can fine-tune these pre-trained models using transfer learning even when a large corpus of text isn’t available. It’s available in the SageMaker JumpStart UI in Amazon SageMaker Studio. You can also use it through the SageMaker Python SDK, as demonstrated in the example notebook Introduction to SageMaker HuggingFace – Text Classification.
Solution overview
Text classification with Hugging Face in SageMaker provides transfer learning on all pre-trained models available on Hugging Face. According to the number of class labels in the training data, a classification layer is attached to the pre-trained Hugging Face model. Then either the whole network, including the pre-trained model, or only the top classification layer can be fine-tuned on the custom training data. In this transfer learning mode, training can be achieved even with a smaller dataset.
In this post, we demonstrate how to do the following:

Use the new Hugging Face text classification algorithm
Perform inference with the Hugging Face text classification algorithm
Fine-tune the pre-trained model on a custom dataset
Perform batch inference with the Hugging Face text classification algorithm

Prerequisites
Before you run the notebook, you must complete some initial setup steps. Let’s set up the SageMaker execution role so it has permissions to run AWS services on your behalf:

!pip install sagemaker --upgrade --quiet

import sagemaker, boto3, json
from sagemaker.session import Session
sagemaker_session = Session()
aws_role = sagemaker_session.get_caller_identity_arn()
aws_region = boto3.Session().region_name
sess = sagemaker.Session()

Run inference on the pre-trained model
SageMaker JumpStart supports inference for any text classification model available through Hugging Face. The model can be hosted for inference and supports text as the application/x-text content type. This not only allows you to use a set of pre-trained models, but also enables you to choose other classification tasks.
The output contains the probability values, class labels for all classes, and the predicted label corresponding to the class index with the highest probability encoded in JSON format. The model processes a single string per request and outputs only one line. The following is an example of a JSON format response:

accept: application/json;verbose
{"probabilities": [prob_0, prob_1, prob_2, ...],
"labels": [label_0, label_1, label_2, ...],
"predicted_label": predicted_label}

If accept is set to application/json, then the model only outputs probabilities. For more details on training and inference, see the sample notebook.
You can run inference on the text classification model by passing the model_id in the environment variable while creating the object of the Model class. See the following code:

from sagemaker.jumpstart.model import JumpStartModel

hub = {}
HF_MODEL_ID = 'distilbert-base-uncased-finetuned-sst-2-english'  # Pass any other HF_MODEL_ID from https://huggingface.co/models?pipeline_tag=text-classification&sort=downloads
hub['HF_MODEL_ID'] = HF_MODEL_ID
hub['HF_TASK'] = 'text-classification'

# infer_model_id is the JumpStart model ID defined earlier in the notebook (for example, "huggingface-tc-models")
model = JumpStartModel(model_id=infer_model_id, env=hub, enable_network_isolation=False)
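To try the model end to end, you can deploy it to a real-time endpoint and send it a sample sentence. The following is a minimal sketch; the instance type and input text are arbitrary, and JumpStart resolves the appropriate serializer and content type for the model, so adjust as needed.

# Deploy the JumpStart model to a real-time endpoint and run a quick test
predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")

response = predictor.predict("simply amazing and worth every minute")
print(response)  # probabilities, labels, and predicted_label, as described earlier

# Clean up when finished to stop incurring charges
predictor.delete_endpoint()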

Fine-tune the pre-trained model on a custom dataset
You can fine-tune each of the pre-trained fill-mask or text classification models to any given dataset made up of text sentences with any number of classes. The pretrained model attaches a classification layer to the text embedding model and initializes the layer parameters to random values. The output dimension of the classification layer is determined based on the number of classes detected in the input data. The objective is to minimize classification errors on the input data. Then you can deploy the fine-tuned model for inference.
The following are the instructions for how the training data should be formatted for input to the model:

Input – A directory containing a data.csv file. Each row of the first column should contain an integer class label between 0 and one less than the number of classes. Each row of the second column should contain the corresponding text data.
Output – A fine-tuned model that can be deployed for inference or further trained using incremental training.

The following is an example of an input CSV file. The file should not have any header. The file should be hosted in an Amazon Simple Storage Service (Amazon S3) bucket with a path similar to the following: s3://bucket_name/input_directory/. The trailing / is required.

|0 |hide new secretions from the parental units|
|0 |contains no wit , only labored gags|
|1 |that loves its characters and communicates something rather beautiful about human nature|
|…|…|

The algorithm also supports transfer learning for Hugging Face pre-trained models. Each model is identified by a unique model_id. The following example shows how to fine-tune a BERT base model identified by model_id=huggingface-tc-bert-base-cased on a custom training dataset. The pre-trained model tarballs have been pre-downloaded from Hugging Face and saved with the appropriate model signature in S3 buckets, such that the training job runs in network isolation.
For transfer learning on your custom dataset, you might need to change the default values of the training hyperparameters. You can fetch a Python dictionary of these hyperparameters with their default values by calling hyperparameters.retrieve_default, update them as needed, and then pass them to the Estimator class. The hyperparameter train_only_top_layer defines which model parameters change during the fine-tuning process. If train_only_top_layer is True, only the parameters of the classification layer change and the rest of the parameters remain constant during fine-tuning. If train_only_top_layer is False, all parameters of the model are fine-tuned. See the following code:

from sagemaker import hyperparameters

# Retrieve the default hyperparameters for fine-tuning the model
hyperparameters = hyperparameters.retrieve_default(model_id=model_id, model_version=model_version)

# [Optional] Override default hyperparameters with custom values
hyperparameters["epochs"] = "5"

For this use case, we provide SST2 as a default dataset for fine-tuning the models. The dataset contains positive and negative movie reviews. It has been downloaded from TensorFlow under the Apache 2.0 License. The following code provides the default training dataset hosted in S3 buckets:

# Sample training data is available in this bucket
training_data_bucket = f"jumpstart-cache-prod-{aws_region}"
training_data_prefix = "training-datasets/SST/"

training_dataset_s3_path = f"s3://{training_data_bucket}/{training_data_prefix}"

We create an Estimator object by providing the model_id and hyperparameters values as follows:

# Create SageMaker Estimator instance
from sagemaker.jumpstart.estimator import JumpStartEstimator

tc_estimator = JumpStartEstimator(
    hyperparameters=hyperparameters,
    model_id=dropdown.value,
    instance_type=training_instance_type,
    metric_definitions=training_metric_definitions,
    output_path=s3_output_location,
    enable_network_isolation=False if model_id == "huggingface-tc-models" else True,
)

To launch the SageMaker training job for fine-tuning the model, call .fit on the object of the Estimator class, while passing the S3 location of the training dataset:

# Launch a SageMaker training job by passing the S3 path of the training data
tc_estimator.fit({"training": training_dataset_s3_path}, logs=True)

You can view performance metrics such as training loss and validation accuracy/loss through Amazon CloudWatch while training. You can also fetch these metrics and analyze them using TrainingJobAnalytics:

from sagemaker import TrainingJobAnalytics

df = TrainingJobAnalytics(training_job_name=training_job_name).dataframe()  # Produces a DataFrame with the collected metrics
df.head(10)

The following graph shows different metrics collected from the CloudWatch log using TrainingJobAnalytics.

For more information about how to use the new SageMaker Hugging Face text classification algorithm for transfer learning on a custom dataset, deploy the fine-tuned model, run inference on the deployed model, and deploy the pre-trained model as is without first fine-tuning on a custom dataset, see the following example notebook.
Fine-tune any Hugging Face fill-mask or text classification model
SageMaker JumpStart supports the fine-tuning of any pre-trained fill-mask or text classification Hugging Face model. You can download the required model from the Hugging Face hub and perform the fine-tuning. To use these models, the model_id is provided in the hyperparameters as hub_key. See the following code:

HF_MODEL_ID = "distilbert-base-uncased"  # Specify the HF_MODEL_ID here from https://huggingface.co/models?pipeline_tag=fill-mask&sort=downloads or https://huggingface.co/models?pipeline_tag=text-classification&sort=downloads
hyperparameters["hub_key"] = HF_MODEL_ID

Now you can construct an object of the Estimator class by passing the updated hyperparameters. You call .fit on the object of the Estimator class while passing the S3 location of the training dataset to perform the SageMaker training job for fine-tuning the model.
Fine-tune a model with automatic model tuning
SageMaker automatic model tuning (AMT), also known as hyperparameter tuning, finds the best version of a model by running many training jobs on your dataset using the algorithm and ranges of hyperparameters that you specify. It then chooses the hyperparameter values that result in a model that performs the best, as measured by a metric that you choose. In the following code, you use a HyperparameterTuner object to interact with the SageMaker hyperparameter tuning APIs:

from sagemaker.tuner import ContinuousParameter

# Define the objective metric based on which the best model will be selected
amt_metric_definitions = {
    "metrics": [{"Name": "val_accuracy", "Regex": "'eval_accuracy': ([0-9\.]+)"}],
    "type": "Maximize",
}

# You can select from the hyperparameters supported by the model and configure ranges of values
# to be searched for training the optimal model
# (https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-define-ranges.html)
hyperparameter_ranges = {
    "learning_rate": ContinuousParameter(0.00001, 0.0001, scaling_type="Logarithmic")
}

# Increase the total number of training jobs run by AMT for increased accuracy (and training time)
max_jobs = 6

# Change the number of parallel training jobs run by AMT to reduce total training time,
# constrained by your account limits. If max_jobs equals max_parallel_jobs, the Bayesian search turns into random search.
max_parallel_jobs = 2

After you have defined the arguments for the HyperparameterTuner object, you pass it the Estimator and start the training. This will find the best-performing model.
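Those final steps might look like the following sketch, which reuses the estimator, metric definitions, and ranges defined earlier; note that the objective metric name must match the regular expression in amt_metric_definitions.

from sagemaker.tuner import HyperparameterTuner

tuner = HyperparameterTuner(
    estimator=tc_estimator,
    objective_metric_name="val_accuracy",
    hyperparameter_ranges=hyperparameter_ranges,
    metric_definitions=amt_metric_definitions["metrics"],
    objective_type=amt_metric_definitions["type"],
    max_jobs=max_jobs,
    max_parallel_jobs=max_parallel_jobs,
)

# Runs up to max_jobs training jobs and keeps the best one by val_accuracy
tuner.fit({"training": training_dataset_s3_path})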
Perform batch inference with the Hugging Face text classification algorithm
If the goal of inference is to generate predictions from a trained model on a large dataset where minimizing latency isn’t a concern, then batch inference is the most straightforward, scalable, and appropriate option.
Batch inference is useful in the following scenarios:

Preprocess datasets to remove noise or bias that interferes with training or inference
Get inference from large datasets
Run inference when you don’t need a persistent endpoint
Associate input records with inferences to assist the interpretation of results

For running batch inference in this use case, you first download the SST2 dataset locally. Remove the class label from it and upload it to Amazon S3 for batch inference. You create the object of Model class without providing the endpoint and create the batch transformer object from it. You use this object to provide batch predictions on the input data. See the following code:

batch_transformer = model.transformer(
    instance_count=1,
    instance_type=inference_instance_type,
    output_path=output_path,
    assemble_with="Line",
    accept="text/csv",
)

batch_transformer.transform(
    input_path, content_type="text/csv", split_type="Line"
)

batch_transformer.wait()

After you run batch inference, you can compare the prediction accuracy on the SST2 dataset.
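That comparison can be as simple as joining the transform output with the labels you held back before uploading the data. The following sketch assumes you have already downloaded the output from Amazon S3 and parsed the predicted labels into a CSV; the file names are hypothetical placeholders, and the exact output format depends on the accept type you chose.

import pandas as pd

# Hypothetical local copies: the labels removed before upload, and the parsed predictions
labels = pd.read_csv("sst2_labels.csv", header=None, names=["label"])
predictions = pd.read_csv("sst2_predictions.csv", header=None, names=["predicted_label"])

# Fraction of rows where the predicted class matches the held-back label
accuracy = (labels["label"] == predictions["predicted_label"]).mean()
print(f"Batch inference accuracy on SST2: {accuracy:.3f}")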
Conclusion
In this post, we discussed the SageMaker Hugging Face text classification algorithm. We provided example code to perform transfer learning on a custom dataset using a pre-trained model in network isolation using this algorithm. We also provided the functionality to use any Hugging Face fill-mask or text classification model for inference and transfer learning. Lastly, we used batch inference to run inference on large datasets. For more information, check out the example notebook.

About the authors
Hemant Singh is an Applied Scientist with experience in Amazon SageMaker JumpStart. He got his master’s from Courant Institute of Mathematical Sciences and B.Tech from IIT Delhi. He has experience in working on a diverse range of machine learning problems within the domain of natural language processing, computer vision, and time series analysis.
Rachna Chadha is a Principal Solutions Architect AI/ML in Strategic Accounts at AWS. Rachna is an optimist who believes that the ethical and responsible use of AI can improve society in the future and bring economic and social prosperity. In her spare time, Rachna likes spending time with her family, hiking, and listening to music.
Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker built-in algorithms and helps develop machine learning algorithms. He got his PhD from University of Illinois Urbana-Champaign. He is an active researcher in machine learning and statistical inference, and has published many papers in NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP conferences.

How Dialog Axiata used Amazon SageMaker to scale ML models in producti …

The telecommunications industry is more competitive than ever before. With customers able to easily switch between providers, reducing customer churn is a crucial priority for telecom companies who want to stay ahead. To address this challenge, Dialog Axiata has pioneered a cutting-edge solution called the Home Broadband (HBB) Churn Prediction Model.
This post explores the intricacies of Dialog Axiata’s approach, from the meticulous creation of nearly 100 features across 10 distinct areas to the implementation of two essential models using Amazon SageMaker:

A base model powered by CatBoost, an open source implementation of the Gradient Boosting Decision Tree (GBDT) algorithm
An ensemble model, taking advantage of the strengths of multiple machine learning (ML) models

About Dialog Axiata
Dialog Axiata PLC (part of the Axiata Group Berhad) is one of Sri Lanka’s largest quad-play telecommunications service providers and the country’s largest mobile network operator with 17.1 million subscribers, which amounts to 57% of the Sri Lankan mobile market. Dialog Axiata provides a variety of services, such as fixed-line, home broadband, mobile, television, payment apps, and financial services in Sri Lanka.
In 2022, Dialog Axiata made significant progress in their digital transformation efforts, with AWS playing a key role in this journey. They focused on improving customer service using data with artificial intelligence (AI) and ML and saw positive results, with their Group AI Maturity increasing from 50% to 80%, according to the TM Forum’s AI Maturity Index.
Dialog Axiata runs some of their business-critical telecom workloads on AWS, including Charging Gateway, Payment Gateway, Campaign Management System, SuperApp, and various analytics tasks. They use a variety of AWS services, such as Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Elastic Kubernetes Service (Amazon EKS) for computing, Amazon Relational Database Service (Amazon RDS) for databases, Amazon Simple Storage Service (Amazon S3) for object storage, Amazon OpenSearch Service for search and analytics, SageMaker for ML, and AWS Glue for data integration. This strategic use of AWS services delivers efficiency and scalability for their operations, as well as the implementation of advanced AI/ML applications.
For more about how Axiata uses AWS services, see Axiata Selects AWS as its Primary Cloud Provider to Drive Innovation in the Telecom Industry.
Challenges with understanding customer churn
The Sri Lankan telecom market has high churn rates due to several factors. Multiple mobile operators provide similar services, making it easy for customers to switch between providers. Prepaid services dominate the market, and multi-SIM usage is widespread. These conditions lead to a lack of customer loyalty and high churn rates.
In addition to its core business of mobile telephony, Dialog Axiata also offers a number of services, including broadband connections and Dialog TV. However, customer churn is a common issue in the telecom industry. Therefore, Dialog Axiata needs to find ways to reduce their churn rate and retain more of their existing home broadband customers. Potential solutions could involve improving customer satisfaction, enhancing value propositions, analyzing reasons for churn, or implementing customer retention initiatives. The key is for Dialog Axiata to gain insights into why customers are leaving and take meaningful actions to increase customer loyalty and satisfaction.
Solution overview
To reduce customer churn, Dialog Axiata used SageMaker to build a predictive model that assigns each customer a churn risk score. The model was trained on demographic, network usage, and network outage data from across the organization. By predicting churn 45 days in advance, Dialog Axiata is able to proactively retain customers and significantly reduce customer churn.
Dialog Axiata’s churn prediction approach is built on a robust architecture involving two distinct pipelines: one dedicated to training the models, and the other for inference or making predictions. The training pipeline is responsible for developing the base model, which is a CatBoost model trained on a comprehensive set of features. To further enhance the predictive capabilities, an ensemble model is also trained to identify potential churn instances that may have been missed by the base model. This ensemble model is designed to capture additional insights and patterns that the base model alone may not have effectively captured.
The integration of the ensemble model alongside the base model creates a synergistic effect, resulting in a more comprehensive and accurate inference process. By combining the strengths of both models, Dialog Axiata’s churn prediction system gains an enhanced overall predictive capability, providing a more robust and reliable identification of customers at risk of churning.
Both the training and inference pipelines are run three times per month, aligning with Dialog Axiata’s billing cycle. This regular schedule makes sure that the models are trained and updated with the latest customer data, enabling timely and accurate churn predictions.
In the training process, features are sourced from Amazon SageMaker Feature Store, which houses nearly 100 carefully curated features. Because real-time inference is not a requirement for this specific use case, an offline feature store is used to store and retrieve the necessary features efficiently. This approach allows for batch inference, significantly reducing daily expenses to under $0.50 while processing batch sizes averaging around 100,000 customers within a reasonable runtime of approximately 50 minutes.
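For reference, pulling features for such a batch run from an offline feature group typically looks like the following sketch, which uses the SageMaker Python SDK’s Athena integration; the feature group name and S3 output location are hypothetical placeholders, not Dialog Axiata’s actual resources.

import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()
feature_group = FeatureGroup(name="hbb-churn-features", sagemaker_session=session)  # hypothetical name

# The offline feature store is queryable through Athena
query = feature_group.athena_query()
query.run(
    query_string=f'SELECT * FROM "{query.table_name}"',
    output_location="s3://example-bucket/athena-results/",  # hypothetical bucket
)
query.wait()
features_df = query.as_dataframe()  # features for batch training or inference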
Dialog Axiata has meticulously selected instance types to strike a balance between optimal resource utilization and cost-effectiveness. However, should the need arise for faster pipeline runtime, larger instance types can be recommended. This flexibility allows Dialog Axiata to adjust the pipeline’s performance based on specific requirements, while considering the trade-off between speed and cost considerations.
After the predictions are generated separately using both the base model and the ensemble model, Dialog Axiata takes action to retain the customers identified as potential churn risks. The customers predicted to churn by the base model, along with those exclusively identified by the ensemble model, are targeted with personalized retention campaigns. By excluding any overlapping customers between the two models, Dialog Axiata ensures a focused and efficient outreach strategy.
The following figure illustrates the output predictions and churn probabilities generated by the base model and the ensemble model.

The first table is the output from the base model, which provides valuable insights into each customer’s churn risk. The columns in this table include a customer identifier (Cx), a Churn Reason column that highlights potential reasons for churn, such as Daily Usage or ARPU Drop (Average Revenue Per User), and a Churn Probability column that quantifies the likelihood of each customer churning.
The second table presents the output from the ensemble model, a complementary approach designed to capture additional churn risks that may have been missed by the base model. This table has two columns: the customer identifier (Cx) and a binary Churn column that indicates whether the customer is predicted to churn (1) or not (0).
The arrows connecting the two tables visually represent the process Dialog Axiata employs to comprehensively identify customers at risk of churning.
The following figure showcases the comprehensive output of this analysis, where customers are meticulously segmented, scored, and classified according to their propensity to churn or discontinue their services. The analysis delves into various factors, such as customer profiles, usage patterns, and behavioral data, to accurately identify those at a higher risk of churning. With this predictive model, Dialog Axiata can pinpoint specific customer segments that require immediate attention and tailored retention efforts.

With this powerful information, Dialog Axiata develops targeted retention strategies and campaigns specifically designed for high-risk customer groups. These campaigns may include personalized offers, as shown in the following figure, incentives, or customized communication aimed at addressing the unique needs and concerns of at-risk customers.

These personalized campaigns, tailored to each customer’s needs and preferences, aim to proactively address their concerns and provide compelling reasons for them to continue their relationship with Dialog Axiata.
Methodologies
This solution uses the following methodologies:

Comprehensive analysis of customer data – The foundation of the solution’s success lies in the comprehensive analysis of more than 100 features spanning demographic, usage, payment, network, package, geographic (location), quad-play, customer experience (CX) status, complaint, and other related data. This meticulous approach allows Dialog Axiata to gain valuable insights into customer behavior, enabling them to predict potential churn events with remarkable accuracy.
Dual-model strategy (base and ensemble models) – What sets Dialog Axiata’s approach apart is the use of two essential models. The base model, powered by CatBoost, provides a solid foundation for churn prediction. The threshold probability to define churn is calculated by considering ROC optimization and business requirements. Concurrently, the ensemble model strategically combines the strengths of various algorithms. This combination enhances the robustness and accuracy of the predictions. The models are developed considering precision as the evaluation parameter.
Actionable insights shared with business units – The insights derived from the models are not confined to the technical realm. Dialog Axiata ensures that these insights are effectively communicated and put into action by sharing the models separately with the business units. This collaborative approach means that the organization is better equipped to proactively address customer churn.
Proactive measures with two action types – Equipped with insights from the models, Dialog Axiata has implemented two main action types: network issue-based and non-network issue-based. During the inference phase, the churn status and churn reason are predicted. The top five features that contribute most to the churn reason are selected using SHAP (SHapley Additive exPlanations). The selected features associated with the churn reason are then classified into two categories: network issue-based and non-network issue-based. If there are features related to network issues, those users are categorized as network issue-based users. The resultant categorization, along with the predicted churn status for each user, is then transmitted for campaign purposes. This information is valuable for scheduling targeted campaigns based on the identified churn reasons, enhancing the precision and effectiveness of the overall campaign strategy. A minimal sketch of this SHAP-based selection follows this list.
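The following is a rough sketch of that per-customer SHAP selection step using a CatBoost classifier; the feature names, dummy data, and network-issue grouping are invented for illustration and are not Dialog Axiata’s actual features or logic.

import numpy as np
import pandas as pd
import shap
from catboost import CatBoostClassifier

# Dummy stand-in data; in practice these are the ~100 engineered churn features
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 6)),
                 columns=["daily_usage", "arpu_drop", "tenure_months",
                          "payment_delays", "outage_count", "avg_signal_strength"])
y = (rng.random(500) > 0.8).astype(int)

model = CatBoostClassifier(iterations=50, verbose=False).fit(X, y)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)            # one row of attributions per customer

NETWORK_FEATURES = {"outage_count", "avg_signal_strength"}   # hypothetical grouping

# Top-5 churn drivers for the first customer, then the network / non-network split
top5_idx = np.argsort(np.abs(shap_values[0]))[::-1][:5]
top5 = [X.columns[j] for j in top5_idx]
category = "network" if NETWORK_FEATURES & set(top5) else "non-network"
print(top5, category)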

Dialog Axiata’s AI Factory
Dialog Axiata built the AI Factory to facilitate running all AI/ML workloads on a single platform with multiple capabilities across various building blocks. To tackle technical aspects and challenges related to continuous integration and continuous delivery (CI/CD) and cost-efficiency, Dialog Axiata turned to the AI Factory framework. Using the power of SageMaker as the platform, they implemented separate SageMaker pipelines for model training and inference, as shown in the following diagram.

A primary advantage lies in cost reduction through the implementation of CI/CD pipelines. By conducting experiments within these automated pipelines, significant cost savings could be achieved. It also helps maintain an experiment version tracking system. Additionally, the integration of AI Factory components contributes to a reduction in time to production and overall workload by reducing repetitive tasks through the use of reusable artifacts. The incorporation of an experiment tracking system facilitates the monitoring of performance metrics, enabling a data-driven approach to decision-making.
Furthermore, the deployment of alerting systems enhances the proactive identification of failures, allowing for immediate actions to resolve issues. Data drift and model drift are also monitored. This streamlined process makes sure that any issues are addressed promptly, minimizing downtime and optimizing system reliability. By developing this project under the AI Factory framework, Dialog Axiata could overcome the aforementioned challenges.
Furthermore, the AI Factory framework provides a robust security framework to govern confidential user data and access permissions. It offers solutions to optimize AWS costs, including lifecycle configurations, alerting systems, and monitoring dashboards. These measures contribute to enhanced data security and cost-effectiveness, aligning with Dialog Axiata’s objectives and resulting in the efficient operation of AI initiatives.
Dialog Axiata’s MLOps process
The following diagram illustrates Dialog Axiata’s MLOps process.

The following key components are used in the process:

SageMaker as the ML Platform – Dialog Axiata uses SageMaker as their core ML platform to perform feature engineering, and train and deploy models in production.
SageMaker Feature Store – By using a centralized repository for ML features, SageMaker Feature Store enhances data consumption and facilitates experimentation with validation data. Instead of directly ingesting data from the data warehouse, the required features for training and inference steps are taken from the feature store. With SageMaker Feature Store, Dialog Axiata could reduce the time for feature creation because they could reuse the same features.
Amazon SageMaker Pipelines – Amazon SageMaker Pipelines is a CI/CD service for ML. These workflow automation components helped the Dialog Axiata team effortlessly scale their ability to build, train, test, and deploy multiple models in production; iterate faster; reduce errors due to manual orchestration; and build repeatable mechanisms.
Reusable components – Employing containerized environments, such as Docker images, and custom modules promoted the bring your own code approach within Dialog Axiata’s ML pipelines.
Monitoring and alerting – Monitoring tools and alert systems provided ongoing success by keeping track of the model and pipeline status.

Business outcomes
The churn prediction solution implemented by Dialog Axiata has yielded remarkable business outcomes, exemplifying the power of data-driven decision-making and strategic deployment of AI/ML technologies. Within a relatively short span of 5 months, the company witnessed a substantial reduction in month-over-month gross churn rates, a testament to the effectiveness of the predictive model and the actionable insights it provides.
This outstanding achievement not only underscores the robustness of the solution, it also highlights its pivotal role in fortifying Dialog Axiata’s position as a leading player in Sri Lanka’s highly competitive telecommunications landscape. By proactively identifying and addressing potential customer churn risks, the company has reinforced its commitment to delivering exceptional service and fostering long-lasting customer relationships.
Conclusion
Dialog Axiata’s journey in overcoming telecom churn challenges showcases the power of innovative solutions and the seamless integration of AI technologies. By using the AI Factory framework and SageMaker, Dialog Axiata not only addressed complex technical challenges, but also achieved tangible business benefits. This success story emphasizes the crucial role of predictive analytics in staying ahead in the competitive telecom industry, demonstrating the transformative impact of advanced AI models.
We appreciate you reading this post, and we hope you learned something new and useful. Please don’t hesitate to leave your feedback in the comments section.
Thank you Nilanka S. Weeraman, Sajani Jayathilaka, and Devinda Liyanage for your valuable contributions to this blog post.

About the Authors

Senthilvel (Vel) Palraj is a Senior Solutions Architect at AWS with over 15 years of IT experience. In this role, he helps customers in the telco, media, and entertainment industries across India and SAARC countries transition to the cloud. Before joining AWS India, Vel worked as a Senior DevOps Architect with AWS ProServe North America, supporting major Fortune 500 corporations in the United States. He is passionate about generative AI and AI/ML and leverages his deep knowledge to provide strategic guidance to companies looking to adopt and optimize AWS services. Outside of work, Vel enjoys spending time with his family and mountain biking on rough terrain.
Chamika Ramanayake is the Head of AI Platforms at Dialog Axiata PLC, Sri Lanka’s leading telecommunications company. He leverages his 7 years of experience in the telecommunication industry when leading his team to design and set the foundation to operationalize the end-to-end AI/ML system life cycle in the AWS cloud environment. He holds an MBA from PIM, University of Sri Jayawardenepura, and a B.Sc. Eng (Hons) in Electronics and Telecommunication Engineering from the University of Moratuwa.

Amazon SageMaker now integrates with Amazon DataZone to streamline mac …

Amazon SageMaker is a fully managed machine learning (ML) service that provides a range of tools and features for building, training, and deploying ML models. Amazon DataZone is a data management service that makes it faster and easier for customers to catalog, discover, share, and govern data stored across AWS, on-premises, and third-party sources.
Today, we are excited to announce an integration between Amazon SageMaker and Amazon DataZone to help you set up infrastructure with security controls, collaborate on machine learning (ML) projects, and govern access to data and ML assets.
When solving a business problem with ML, you create ML models from training data and integrate those models with business applications to make predictive decisions. For example, you could use an ML model for loan application processing to make decisions such as approving or denying a loan. When deploying such ML models, effective ML governance helps build trust in ML-powered applications, minimize risks, and promote responsible AI practices.
A comprehensive governance strategy spans infrastructure, data, and ML. ML governance requires implementing policies, procedures, and tools to identify and mitigate the various risks associated with ML use cases. Applying governance practices at every stage of the ML lifecycle is essential for maximizing value for the organization. For example, when building an ML model for a loan application processing use case, you can align model development and deployment with your organization’s overall governance policies and controls to create effective loan approval workflows.
However, it might be challenging and time-consuming to apply governance across an ML lifecycle because it typically requires custom workflows and integration of several tools. With the new built-in integration between SageMaker and Amazon DataZone, you can streamline setting up ML governance across infrastructure, collaborate on business initiatives, and govern data and ML assets in just a few clicks.
For governing ML use cases, this new integration offers the following capabilities:

Business project management – You can create, edit, and view projects, as well as add users to start collaborating on the shared business objective
Infrastructure management – You can create multiple project environments and deploy infrastructure resources with embedded security controls to meet the enterprise needs
Asset governance – Users can search, discover, request access to, and publish data and ML assets along with business metadata to the enterprise business catalog

In this post, we dive deep into how to set up and govern ML use cases. We discuss the end-to-end journey for setup and configuration of the SageMaker and Amazon DataZone integration. We also discuss how you can use self-service capabilities to discover, subscribe, consume, and publish data and ML assets as you work through your ML lifecycle.
Solution overview
With Amazon DataZone, administrators and data stewards who oversee an organization’s data assets can manage and govern access to data. These controls are designed to enforce access with the right level of privileges and context. Amazon DataZone makes it effortless for engineers, data scientists, product managers, analysts, and business users to access data throughout an organization so that they can discover, use, and collaborate to derive data-driven insights. The following diagram illustrates a sample architecture of Amazon DataZone and Amazon SageMaker integration.

With this integration, you can deploy SageMaker infrastructure using blueprints. The new SageMaker blueprint provides a well-architected infrastructure template. With this template, ML administrators can build a SageMaker environment profile with appropriate controls from services such as Amazon Virtual Private Cloud (Amazon VPC), AWS Key Management Service (AWS KMS), and AWS Identity and Access Management (IAM), and enable ML builders to use this environment profile to deploy a SageMaker domain in minutes. When you create a SageMaker environment using the SageMaker environment profile, Amazon DataZone provisions a data and ML asset catalog, Amazon SageMaker Studio, and IAM roles for managing Amazon DataZone project permissions. The following diagram shows how the SageMaker environment fits in with the existing environments in Amazon DataZone projects.

To facilitate data and ML asset governance from SageMaker Studio, we extended SageMaker Studio to incorporate the following component:

Asset – A data or ML resource that can be published to a catalog or project inventory, discovered, and shared. Amazon Redshift tables and AWS Glue tables are original Amazon DataZone assets. With this integration, we introduce two more asset types: SageMaker Feature Groups and Model Package Groups.
Owned assets – A collection of project inventory assets discoverable only by project members. These are the staging assets in the project inventory that are not available to Amazon DataZone domain users until they are explicitly published to the Amazon DataZone business catalog.
Asset catalog – A collection of published assets in the Amazon DataZone business catalog discoverable across your organization with business context, thereby enabling everyone in your organization to find assets quickly for their use case.
Subscribed assets – A collection of assets from the Amazon DataZone business catalog that the subscriber has been approved to access. Owners of those assets have to approve the request for access before the subscriber can consume them.

The following diagram shows an example of an ML asset like Customer-Churn-Model lifecycle with the described components.

In the following sections, we show you the user experience of the SageMaker and Amazon DataZone integration with an example. We demonstrate how to set up Amazon DataZone, including a domain, project, and SageMaker environment, and how to perform asset management using SageMaker Studio. The following diagram illustrates our workflow.

Set up an Amazon DataZone domain, project, and SageMaker environment
On the Amazon DataZone console, administrators create an Amazon DataZone domain, get access to the Amazon DataZone data portal, and provision a new project with access to specific data and users.
Administrators use the SageMaker blueprint, which has enterprise-level security controls, to set up the SageMaker environment profile. The SageMaker infrastructure with appropriate organizational boundaries can then be deployed in minutes so that ML builders can start using it for their ML use cases.
In the Amazon DataZone data portal, ML builders can create or join a project to collaborate on the business problem being solved. To start their ML use case in SageMaker, they use the SageMaker environment profile made by the administrators to create a SageMaker environment or use an existing one.
ML builders can then seamlessly federate into SageMaker Studio from the Amazon DataZone data portal with just a few clicks. The following actions can happen in SageMaker Studio:

Subscribe – SageMaker allows you to find, access, and consume the assets in the Amazon DataZone business catalog. When you find an asset in the catalog that you want to access, you need to subscribe to the asset, which creates a subscription request to the asset owner.
Publish – SageMaker allows you to publish your assets and their metadata as an owner of the asset to the Amazon DataZone business catalog so that others in the organization can subscribe and consume in their ML use cases.

Perform asset management using SageMaker Studio
In SageMaker Studio, ML builders can search, discover, and subscribe to data and ML assets in their business catalog. They can consume these assets for ML workflows such as data preparation, model training, and feature engineering in SageMaker Studio and SageMaker Canvas. Upon completing the ML tasks, ML builders can publish data, models, and feature groups to the business catalog for governance and discoverability.
Search and discover assets
After ML builders are federated into SageMaker Studio, they can view the Assets option in the navigation pane.
On the Assets page, ML builders can search and discover data assets and ML assets without additional administrator overhead.
The search result displays all the assets corresponding to the search criteria, including a name and description. ML builders can further filter by the type of asset to narrow down their results. The following screenshot is an example of available assets from a search result.

Subscribe to assets
After ML builders discover the asset from their search results, they can choose the asset to see details such as schema or metadata to understand whether the asset is useful for their use case.
To gain access to the asset, choose Subscribe to initiate the request for access from the asset owner. This action allows data governance for the asset owners to determine which members of the organization can access their assets.
The owner of the asset will be able to see the request in the Incoming subscription requests section on the Assets page. The asset owners can approve or reject the request with justifications. ML builders will also be able to see the corresponding action on the Assets page in the Outgoing subscription requests section. The following screenshot shows an example of managing asset requests and the Subscribed assets tab. In the next steps, we demonstrate how a subscribed data asset like mkt_sls_table and an ML asset like Customer-Churn-Model are used within SageMaker.

Consume subscribed assets
After ML builders are approved to access the subscribed assets, they can choose to use Amazon SageMaker Canvas or JupyterLab within SageMaker Studio. In this section, we explore the scenarios in which ML builders can consume the subscribed assets.
Consume a subscribed Model Package Group in SageMaker Studio
ML builders can see all the subscribed Model Package Groups in SageMaker Studio by choosing Open in Model Registry on the asset details page. ML builders are also able to consume the subscribed model by deploying the model to an endpoint for prediction. The following screenshot shows an example of opening a subscribed model asset.

Consume a subscribed data asset in SageMaker Canvas
When ML builders open the SageMaker Canvas app from SageMaker Studio, they are able to use Amazon SageMaker Data Wrangler and datasets. ML builders can view their subscribed data assets to perform experimentation and build models. As part of this integration, ML builders can view their subscribed assets under sub_db and publish their assets via pub_db. The created models can then be registered in the Amazon SageMaker Model Registry from SageMaker Canvas. The following screenshot is an example of the subscribed asset mkt_sls_table for data preparation in SageMaker Canvas.

Consume a subscribed data asset in JupyterLab notebooks
ML builders can navigate to JupyterLab in SageMaker Studio to open a notebook and start their data experimentation. In JupyterLab notebooks, ML builders are able to see the subscribed data assets to query in their notebook and consume for experimentation and model building. The following screenshot is an example of the subscribed asset mkt_sls_table for data preparation in SageMaker Studio.

Publish assets
After experimentation and analysis, ML builders can share assets with the rest of the organization by publishing them to the Amazon DataZone business catalog. They can also make assets available only to project members by publishing them to the project inventory instead. ML builders can accomplish these tasks by using the SageMaker SDK or by publishing directly from SageMaker Studio.
You can publish ML assets by navigating to the specific asset tab and choosing Publish to asset catalog or Publish to inventory. The following screenshot shows how you can publish a feature group to the asset catalog.

The following screenshot shows how you can also publish a model group to the asset catalog or the project inventory.

On the Assets page, you can use the data source feature to publish data assets like an AWS Glue table or Redshift table.

Conclusion
Governance is a multi-faceted discipline that encompasses controls across infrastructure management, data management, model management, access management, policy management, and more. ML governance plays a key role for organizations to successfully scale their ML usage across a wide range of use cases and also mitigate technical and operational risks.
The new SageMaker and Amazon DataZone integration enables your organization to streamline infrastructure controls and permissions, in addition to data and ML asset governance in ML projects. The provisioned ML environment is secure, scalable, and reliable for your teams to access data and ML assets, and build and train ML models.
We would like to hear how this new capability is helping your ML governance use cases. Be on the lookout for more data and ML governance blog posts. Try out this new SageMaker integration for ML governance and leave your feedback in the comments section.

About the authors
Siamak Nariman is a Senior Product Manager at AWS. He is focused on AI/ML technology, digital transformation, and enabling automation to improve overall organizational efficiency and productivity. He has over 7 years of automation experience deploying various technologies. In his spare time, Siamak enjoys exploring the outdoors, long-distance running, and playing sports.
Kareem Syed-Mohammed is a Product Manager at AWS. He is focused on ML Observability and ML Governance. Prior to this, at Amazon QuickSight, he led embedded analytics, and developer experience. In addition to QuickSight, he has been with AWS Marketplace and Amazon retail as a Product Manager. Kareem started his career as a developer for call center technologies, Local Expert and Ads for Expedia, and management consultant at McKinsey.
Dr. Sokratis Kartakis is a Principal Machine Learning and Operations Specialist Solutions Architect at AWS. Sokratis focuses on enabling enterprise customers to industrialize their machine learning (ML) and generative AI solutions by exploiting AWS services and shaping their operating model (MLOps/FMOps/LLMOps foundations) and transformation roadmap using development best practices. He has spent more than 15 years inventing, designing, leading, and implementing innovative end-to-end production-level ML and AI solutions in domains such as energy, retail, health, finance, and motorsports.
Ram Vittal is a Principal ML Solutions Architect at AWS. He has over 3 decades of experience architecting and building distributed, hybrid, and cloud applications. He is passionate about building secure and scalable AI/ML and big data solutions to help enterprise customers with their cloud adoption and optimization journey to improve their business outcomes. In his spare time, he rides his motorcycle and walks with his 3-year-old Sheepadoodle.

Bayesian Optimization for Preference Elicitation with Large Language M …

Imagine you’re trying to help a friend find their favorite movie to watch, but they’re not quite sure what they’re in the mood for. You could list random movie titles and see if any pique their interest, but that’s pretty inefficient, right? The researchers behind this work had a similar problem – they wanted to build conversational recommender systems that can quickly learn a user’s preferences for items (like movies, restaurants, etc.) through natural language dialogues without needing any prior data about those preferences.

The traditional approach would be to have the user rate or compare items directly. But that’s not feasible when the user is unfamiliar with most of the items. Large language models (LLMs) like GPT-3 are a potential solution: because these powerful AI models can understand and generate human-like text, they could, in theory, engage in back-and-forth conversations to intuitively elicit someone’s preferences.

However, the researchers realized that simply prompting an LLM with a bunch of item descriptions and telling it to have a preference-eliciting conversation has some major limitations. For one, feeding the LLM detailed descriptions of every item is computationally expensive. More importantly, monolithic LLMs lack the strategic reasoning to actively guide the conversation toward exploring the most relevant preferences while avoiding getting stuck on irrelevant tangents.

Reference: https://arxiv.org/pdf/2405.00981

So, what did the researchers do? They developed a novel algorithm called PEBOL (Preference Elicitation with Bayesian Optimization Augmented LLMs) that combines the language understanding capabilities of LLMs with a principled Bayesian optimization framework for efficient preference elicitation. Here’s a high-level overview of how it works (shown in Figure 2):

1. Modeling User Preferences: PEBOL starts by assuming there’s some hidden “utility function” that determines how much a user would prefer each item based on its description. It uses probability distributions (specifically, Beta distributions) to model the uncertainty in these utilities.

2. Natural Language Queries: At each conversation turn, PEBOL uses decision-theoretic strategies like Thompson Sampling and Upper Confidence Bound to select one item description. It then prompts the LLM to generate a short, aspect-based query about that item (e.g., “Are you interested in movies with patriotic themes?”).

3. Inferring Preferences via NLI: When the user responds (e.g., “Yes” or “No”), PEBOL doesn’t take that at face value. Instead, it uses a Natural Language Inference model to predict how likely it is that the user’s response implies a preference for (or against) each item description.

4. Bayesian Belief Updates: Using these predicted preferences as observations, PEBOL updates its probabilistic beliefs about the user’s utilities for each item. This allows it to systematically explore unfamiliar preferences while exploiting what it’s already learned.

5. Repeat: The process repeats, with PEBOL generating new queries focused on the items/aspects it’s most uncertain about, ultimately aiming to identify the user’s most preferred items.

The key innovation here is using LLMs for natural query generation while leveraging Bayesian optimization to strategically guide the conversational flow. This approach reduces the context needed for each LLM prompt and provides a principled way to balance the exploration-exploitation trade-off.
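To make the Bayesian part of this loop concrete, the following is a minimal Python sketch of Beta-distributed utility beliefs, Thompson Sampling for item selection, and the soft belief update driven by an NLI-style preference probability. The item descriptions, the LLM-generated question, and the NLI scorer are placeholders, and all parameter choices are illustrative rather than the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative item catalog; in PEBOL these would be real item descriptions.
items = ["patriotic war drama", "romantic comedy", "space documentary"]

# Beta(alpha, beta) belief over each item's hidden utility in [0, 1].
alpha = np.ones(len(items))
beta = np.ones(len(items))

def nli_preference_probability(user_response, item_description):
    # Placeholder for the NLI model that scores how strongly the user's
    # response implies a preference for this item (0 = against, 1 = for).
    return 0.8 if user_response.lower().startswith("yes") else 0.2

for turn in range(10):
    # Thompson Sampling: draw one utility sample per item and ask about the best-looking item.
    samples = rng.beta(alpha, beta)
    focus_item = items[int(np.argmax(samples))]

    # In PEBOL, an LLM would turn focus_item into a short aspect-based question here,
    # and the user would answer in natural language; we stub the reply.
    user_response = "Yes"

    # Bayesian belief update for every item using the NLI-predicted preference signal.
    for i, description in enumerate(items):
        p = nli_preference_probability(user_response, description)
        alpha[i] += p          # soft "success" evidence
        beta[i] += 1.0 - p     # soft "failure" evidence

# Rank items by posterior mean utility.
ranking = np.argsort(-(alpha / (alpha + beta)))
print([items[i] for i in ranking])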

The researchers evaluated PEBOL through simulated preference elicitation dialogues across three datasets: MovieLens25M, Yelp, and Recipe-MPR. They compared it against a monolithic GPT-3.5 baseline (MonoLLM) prompted with full item descriptions and dialogue history.

For fair comparison, they limited the item set size to 100 due to context constraints. Performance was measured by Mean Average Precision at 10 (MAP@10) over 10 conversational turns with simulated users.

In their experiments, PEBOL achieved MAP@10 improvements of 131% on Yelp, 88% on MovieLens, and 55% on Recipe-MPR over MonoLLM after just 10 turns. While MonoLLM exhibited major performance drops (e.g., on Recipe-MPR between turns 4-5), PEBOL’s incremental belief updates made it more robust against such catastrophic errors. PEBOL also consistently outperformed MonoLLM under simulated user noise conditions. On Yelp and MovieLens, MonoLLM was the worst performer across all noise levels, while on Recipe-MPR, it trailed behind PEBOL’s UCB, Greedy, and Entropy Reduction acquisition policies.

While PEBOL is a promising first step, the researchers acknowledge there’s still more work to be done. For example, future versions could explore generating contrastive multi-item queries or integrating this preference elicitation approach into broader conversational recommendation systems. But overall, by combining the strengths of LLMs and Bayesian optimization, PEBOL offers an intriguing new paradigm for building AI systems that can converse with users in natural language to understand their preferences better and provide personalized recommendations.

Check out the Paper. All credit for this research goes to the researchers of this project.

The post Bayesian Optimization for Preference Elicitation with Large Language Models appeared first on MarkTechPost.

LLMClean: An AI Approach for the Automated Generation of Context Model …

The burgeoning expansion of the data landscape, propelled by the Internet of Things (IoT), presents a pressing challenge: ensuring data quality amidst the deluge of information. With IoT devices increasingly interconnected and data acquisition costs declining, enterprises are capitalizing on this wealth of data to inform strategic decisions. 

However, the quality of that data is paramount, especially given the escalating reliance on Machine Learning (ML) across various industries. Poor-quality training data can introduce biases and inaccuracies, undermining the efficacy of ML applications. Real-world data often harbors inaccuracies such as duplications, null entries, anomalies, and inconsistencies, posing significant obstacles to data quality.

Efforts to mitigate data quality issues have led to the development of automated data cleaning tools. However, many of these tools lack the context awareness that is crucial for effectively cleaning data within ML workflows. Contextual information elucidates the data’s meaning, relevance, and relationships, ensuring alignment with real-world phenomena.

Context-aware data cleaning tools offer promise, leveraging Ontological Functional Dependencies (OFDs) extracted from context models. OFDs provide an advanced mechanism for capturing semantic relationships between attributes, enhancing error detection and correction precision.

Despite the efficacy of OFD-based cleaning tools, manual construction of context models presents practical challenges, particularly for real-time applications. The labor-intensive nature of manual methods, coupled with the need for domain expertise and scalability concerns, underscores the necessity for automation. 

In response, the proposed solution, LLMClean, leverages large language models (LLMs) to automatically generate context models from real-world data, obviating the need for supplementary meta-information. By automating this process, LLMClean addresses the scalability, adaptability, and consistency challenges inherent in manual methods.

LLMClean encompasses a three-stage architectural framework, integrating LLM models, context models, and data-cleaning tools to effectively identify erroneous instances in tabular data. The method includes dataset classification, model extraction or mapping, and context model generation.

By leveraging automatically generated OFDs, LLMClean provides a robust data cleaning and analytical framework tailored to the evolving nature of real-world data, including IoT datasets. Additionally, LLMClean introduces Sensor Capability Dependencies and Device-Link Dependencies, which are critical for precise error detection.
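To illustrate the kind of dependency-based error detection that OFDs enable (independently of LLMClean's own implementation, which is not shown here), the following sketch checks a simple functional dependency of the form sensor_id → unit on a toy IoT table and flags the rows that violate it. The column names and data are hypothetical.

import pandas as pd

# Toy IoT readings; in a real pipeline this would come from the ingested dataset.
df = pd.DataFrame({
    "sensor_id": ["s1", "s1", "s2", "s2", "s2"],
    "unit":      ["C",  "C",  "F",  "F",  "C"],   # the last row is inconsistent
    "reading":   [21.5, 22.0, 71.1, 70.8, 23.0],
})

def fd_violations(frame, lhs, rhs):
    # Return rows that break the functional dependency lhs -> rhs, i.e. rows whose
    # rhs value differs from the majority rhs value observed for their lhs group.
    expected = (
        frame.groupby(lhs)[rhs]
        .agg(lambda s: s.mode().iloc[0])
        .rename("expected")
    )
    joined = frame.join(expected, on=lhs)
    return joined[joined[rhs] != joined["expected"]]

print(fd_violations(df, "sensor_id", "unit"))  # flags the s2 row recorded in Celsius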

Check out the Paper. All credit for this research goes to the researchers of this project.

The post LLMClean: An AI Approach for the Automated Generation of Context Models Utilizing Large Language Models to Analyze and Understand Various Datasets appeared first on MarkTechPost.

Meet ZleepAnlystNet: A Novel Deep Learning Model for Automatic Sleep S …

Sleep studies have long been vital to understanding human health, providing insights into how rest affects mental and physical well-being. Polysomnography, which is the standard for diagnosing sleep disorders, utilizes an array of sensors to measure signals during sleep, such as brain waves (EEG), eye movements (EOG), and muscle activity (EMG). Despite its importance, the traditional approach to analyzing these data, manual sleep stage classification, is labor-intensive and prone to inconsistencies due to human error.

Researchers have turned to automated methods to improve accuracy and reduce the burden on sleep technicians. Current computerized systems employ machine learning techniques, from shallow learning that relies on hand-crafted features to more advanced deep learning models that extract features directly from raw EEG data. These technologies aim to mimic the precision of human analysts while surpassing their speed and endurance.

Researchers from Mahidol University introduced ZleepAnlystNet, a sophisticated deep learning framework designed specifically for sleep stage classification. The model uses a “separating training” method, in which individual components are trained separately to enhance their specific abilities to recognize sleep stages. The system incorporates fifteen convolutional neural networks (CNNs) for feature extraction, each tailored to capture different aspects of the EEG signals, and a bidirectional long short-term memory (BiLSTM) network for sequence classification.
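The following is a heavily simplified PyTorch sketch of that two-part design: a 1D CNN that extracts features from a 30-second, single-channel EEG epoch, followed by a BiLSTM that classifies a sequence of epochs into the five sleep stages. The layer sizes, the assumed 100 Hz sampling rate, and the use of a single feature extractor (instead of the paper's fifteen) are illustrative choices, not the published architecture.

import torch
import torch.nn as nn

class EpochEncoder(nn.Module):
    # CNN feature extractor for one 30 s EEG epoch (assumed 100 Hz, so 3,000 samples).
    def __init__(self, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=50, stride=6), nn.ReLU(),
            nn.MaxPool1d(8),
            nn.Conv1d(32, 64, kernel_size=8), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.proj = nn.Linear(64, feat_dim)

    def forward(self, x):                 # x: (batch, 1, 3000)
        h = self.net(x).squeeze(-1)       # (batch, 64)
        return self.proj(h)               # (batch, feat_dim)

class SleepStager(nn.Module):
    # BiLSTM sequence classifier over consecutive epoch features (5 stages: W, N1, N2, N3, REM).
    def __init__(self, feat_dim=128, n_stages=5):
        super().__init__()
        self.encoder = EpochEncoder(feat_dim)
        self.bilstm = nn.LSTM(feat_dim, 64, batch_first=True, bidirectional=True)
        self.head = nn.Linear(64 * 2, n_stages)

    def forward(self, x):                         # x: (batch, seq_len, 1, 3000)
        b, t = x.shape[:2]
        feats = self.encoder(x.view(b * t, 1, -1)).view(b, t, -1)
        out, _ = self.bilstm(feats)
        return self.head(out)                     # per-epoch stage logits

logits = SleepStager()(torch.randn(2, 20, 1, 3000))
print(logits.shape)  # torch.Size([2, 20, 5])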

The efficacy of ZleepAnlystNet is notable: the model achieves an overall accuracy of 87.02%, a macro F1 score (MF1) of 82.09%, and a kappa coefficient of 0.8221, indicating excellent agreement with standard sleep stage scoring. This is a significant improvement over previous models, which often struggled with specific stages like N1, for which ZleepAnlystNet manages a per-class F1 score of 54.23%. The model’s ability to consistently identify the other stages, Wake (W), N2, N3, and rapid eye movement (REM), with F1 scores of 90.34%, 89.53%, 88.96%, and 87.40%, respectively, also stands out.

Cross-dataset validation further illustrates the model’s robustness, showing strong performance metrics even when applied to external datasets, demonstrating its potential for widespread clinical use. The training approach, which isolates and optimizes different model components, has proven crucial in achieving these results. This method also allows for precise adjustments to the model’s architecture, ensuring each part performs optimally without compromising the system’s overall effectiveness.

In conclusion, ZleepAnlystNet represents an advancement in sleep research, offering a powerful tool for accurately and efficiently classifying sleep stages. Its development marks a step forward in the automation of sleep analysis and sets a new standard for integrating deep learning technologies in medical diagnostics. By reducing dependency on manual scoring and increasing reliability, this model paves the way for better understanding and treatment of sleep-related disorders, promising to profoundly impact the field of sleep medicine.

Check out the Paper. All credit for this research goes to the researchers of this project.

The post Meet ZleepAnlystNet: A Novel Deep Learning Model for Automatic Sleep Stage Scoring based on Single-Channel Raw EEG Data Using Separating Training appeared first on MarkTechPost.

Boost employee productivity with automated meeting summaries using Ama …

The prevalence of virtual business meetings in the corporate world, largely accelerated by the COVID-19 pandemic, is here to stay. Based on a survey conducted by American Express in 2023, 41% of business meetings are expected to take place in hybrid or virtual format by 2024. Attending multiple meetings daily and keeping track of all ongoing topics becomes increasingly difficult to manage over time. This can have a negative impact in many ways, from delayed project timelines to loss of customer trust. Writing meeting summaries is the usual remedy, but doing so disrupts the focus required to listen to the ongoing conversation.
A more efficient way to manage meeting summaries is to create them automatically at the end of a call through the use of generative artificial intelligence (AI) and speech-to-text technologies. This allows attendees to focus solely on the conversation, knowing that a transcript will be made available automatically at the end of the call.
This post presents a solution to automatically generate a meeting summary from a recorded virtual meeting (for example, using Amazon Chime) with several participants. The recording is transcribed to text using Amazon Transcribe and then processed using Amazon SageMaker Hugging Face containers to generate the meeting summary. The Hugging Face containers host a large language model (LLM) from the Hugging Face Hub.
If you prefer to generate post-call recording summaries with Amazon Bedrock rather than Amazon SageMaker, check out this Bedrock sample solution. For a generative AI-powered Live Meeting Assistant that creates post-call summaries and also provides live transcripts, translations, and contextual assistance based on your own company knowledge base, see our new LMA solution.
Solution overview
The entire infrastructure of the solution is provisioned using the AWS Cloud Development Kit (AWS CDK), which is an infrastructure as code (IaC) framework to programmatically define and deploy AWS resources. The framework provisions resources in a safe, repeatable manner, allowing for a significant acceleration of the development process.
Amazon Transcribe is a fully managed service that seamlessly runs automatic speech recognition (ASR) workloads in the cloud. The service allows for simple audio data ingestion, easy-to-read transcript creation, and accuracy improvement through custom vocabularies. Amazon Transcribe’s new ASR foundation model supports 100+ language variants. In this post, we use the speaker diarization feature, which enables Amazon Transcribe to differentiate between a maximum of 10 unique speakers and label a conversation accordingly.
Hugging Face is an open-source machine learning (ML) platform that provides tools and resources for the development of AI projects. Its key offering is the Hugging Face Hub, which hosts a vast collection of over 200,000 pre-trained models and 30,000 datasets. The AWS partnership with Hugging Face allows a seamless integration through SageMaker with a set of Deep Learning Containers (DLCs) for training and inference, and Hugging Face estimators and predictors for the SageMaker Python SDK.
Generative AI CDK Constructs, an open-source extension of AWS CDK, provides well-architected multi-service patterns to quickly and efficiently create repeatable infrastructure required for generative AI projects on AWS. For this post, we illustrate how it simplifies the deployment of foundation models (FMs) from Hugging Face or Amazon SageMaker JumpStart with SageMaker real-time inference, which provides persistent and fully managed endpoints to host ML models. They are designed for real-time, interactive, and low-latency workloads and provide auto scaling to manage load fluctuations. For all languages that are supported by Amazon Transcribe, you can find FMs from Hugging Face that support summarization in the corresponding languages.
The following diagram depicts the automated meeting summarization workflow.

The workflow consists of the following steps:

The user uploads the meeting recording as an audio or video file to the project’s Amazon Simple Storage Service (Amazon S3) bucket, in the /recordings folder.
Every time a new recording is uploaded to this folder, an AWS Lambda Transcribe function is invoked and initiates an Amazon Transcribe job that converts the meeting recording into text (a minimal sketch of this step follows the workflow list). Transcripts are then stored in the project’s S3 bucket under /transcriptions/TranscribeOutput/.
This upload triggers the Inference Lambda function, which preprocesses the transcript file into a format suitable for ML inference, stores it in the project’s S3 bucket under the prefix /summaries/InvokeInput/processed-TranscribeOutput/, and invokes a SageMaker endpoint. The endpoint hosts the Hugging Face model that summarizes the processed transcript. The summary is loaded into the S3 bucket under the prefix /summaries. Note that the prompt template used in this example includes a single instruction; for more sophisticated requirements, the template can easily be extended to tailor the solution to your own use case.
This S3 event triggers the Notification Lambda function, which pushes the summary to an Amazon Simple Notification Service (Amazon SNS) topic.
All subscribers of the SNS topic (such as meeting attendees) receive the summary in their email inbox.
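The following minimal sketch illustrates step 2, in which a new recording triggers a transcription job through the boto3 Amazon Transcribe API with speaker diarization enabled. The folder prefixes follow the conventions described in this post, while the job-name pattern and language code are assumptions rather than the solution's exact code.

import json
import time
import urllib.parse

import boto3

transcribe = boto3.client("transcribe")

def lambda_handler(event, context):
    # Triggered by S3 PUT events on the /recordings folder; starts a transcription job.
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

    job_name = f"meeting-summary-{int(time.time())}"  # assumed naming convention
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": f"s3://{bucket}/{key}"},
        LanguageCode="en-US",                       # assumed language
        OutputBucketName=bucket,
        OutputKey="transcriptions/TranscribeOutput/",
        Settings={
            "ShowSpeakerLabels": True,              # speaker diarization
            "MaxSpeakerLabels": 10,
        },
    )
    return {"statusCode": 200, "body": json.dumps({"job": job_name})}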

In this post, we deploy the Mistral 7B Instruct, an LLM available in the Hugging Face Model Hub, to a SageMaker endpoint to perform the summarization tasks. Mistral 7B Instruct is developed by Mistral AI. It is equipped with over 7 billion parameters, enabling it to process and generate text based on user instructions. It has been trained on a wide-ranging corpus of text data to understand various contexts and nuances of language. The model is designed to perform tasks such as answering questions, summarizing information, and creating content, among others, by following specific prompts given by users. Its effectiveness is measured through metrics like perplexity, accuracy, and F1 score, and it is fine-tuned to respond to instructions with relevant and coherent text outputs.
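As a sketch of how such a model can be hosted, the following shows one way to deploy Mistral 7B Instruct from the Hugging Face Hub to a SageMaker real-time endpoint using the SageMaker Python SDK's Hugging Face LLM container. The container version, instance type, prompt, and generation parameters are illustrative assumptions, not the exact values used by the AWS CDK constructs in this solution.

import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()  # or an explicit IAM role ARN

# Hugging Face LLM (TGI) inference container; the version is an assumption.
image_uri = get_huggingface_llm_image_uri("huggingface", version="1.4.2")

model = HuggingFaceModel(
    image_uri=image_uri,
    role=role,
    env={
        "HF_MODEL_ID": "mistralai/Mistral-7B-Instruct-v0.1",
        "SM_NUM_GPUS": "1",
        "MAX_INPUT_LENGTH": "7000",
        "MAX_TOTAL_TOKENS": "8000",
    },
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # illustrative GPU instance type
)

prompt = "<s>[INST] Summarize the following meeting transcript: ... [/INST]"
response = predictor.predict({
    "inputs": prompt,
    "parameters": {"max_new_tokens": 512, "temperature": 0.1},
})
print(response[0]["generated_text"])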
Prerequisites
To follow along with this post, you should have the following prerequisites:

Python version greater than 3.9
AWS CDK version 2.0

Deploy the solution
To deploy the solution in your own AWS account, refer to the GitHub repository to access the full source code of the AWS CDK project in Python:

git clone https://github.com/aws-samples/audio-conversation-summary-with-hugging-face-and-transcribe.git
cd audio-conversation-summary-with-hugging-face-and-transcribe/infrastructure
pip install -r requirements.txt

If you are deploying AWS CDK assets for the first time in your AWS account and the AWS Region you specified, you need to run the bootstrap command first. It sets up the baseline AWS resources and permissions required for AWS CDK to deploy AWS CloudFormation stacks in a given environment:

cdk bootstrap aws://<ACCOUNT_ID>/<AWS_REGION>

Finally, run the following command to deploy the solution. Specify the summary’s recipient mail address in the SubscriberEmailAddress parameter:

cdk deploy --parameters SubscriberEmailAddress="<SUBSCRIBER_MAIL_ADDRESS>"

Test the solution
We have provided a few sample meeting recordings in the data folder of the project repository. You can upload the test.mp4 recording into the project’s S3 bucket under the /recordings folder. The summary will be saved in Amazon S3 and sent to the subscriber. The end-to-end duration is approximately 2 minutes given an input of approximately 250 tokens.
The following figure shows the input conversation and output summary.

Limitations
This solution has the following limitations:

The model provides high-accuracy completions for the English language. You can use other languages such as Spanish, French, or Portuguese, but the quality of the completions may degrade. You can find other Hugging Face models that are better suited for those languages.
The model used in this post is limited by a context length of approximately 8,000 tokens, which equates to approximately 6,000 words. If a larger context length is required, you can replace the model by referencing the new model ID in the respective AWS CDK construct.
Like other LLMs, Mistral 7B Instruct may hallucinate, generating content that strays from factual reality or includes fabricated information.
The format of the recordings must be either .mp4, .mp3, or .wav.

Clean up
To delete the deployed resources and stop incurring charges, run the following command:

cdk destroy

Alternatively, to use the AWS Management Console, complete the following steps:

On the AWS CloudFormation console, choose Stacks in the navigation pane.
Select the stack called Text-summarization-Infrastructure-stack and choose Delete.

Conclusion
In this post, we proposed an architecture pattern to automatically transform your meeting recordings into insightful conversation summaries. This workflow showcases how the AWS Cloud and Hugging Face can help you accelerate your generative AI application development by orchestrating a combination of managed AI services such as Amazon Transcribe and externally sourced ML models from the Hugging Face Hub, such as those from Mistral AI.
If you are eager to learn more about how conversation summaries can apply to a contact center environment, you can deploy this technique in our suite of solutions for Live Call Analytics and Post Call Analytics.
References
Mistral 7B release post, by Mistral AI
Our team
This post has been created by AWS Professional Services, a global team of experts that can help realize desired business outcomes when using the AWS Cloud. We work together with your team and your chosen member of the AWS Partner Network (APN) to implement your enterprise cloud computing initiatives. Our team provides assistance through a collection of offerings that help you achieve specific outcomes related to enterprise cloud adoption. We also deliver focused guidance through our global specialty practices, which cover a variety of solutions, technologies, and industries.

About the Authors
Gabriel Rodriguez Garcia is a Machine Learning engineer at AWS Professional Services in Zurich. In his current role, he has helped customers achieve their business goals on a variety of ML use cases, ranging from setting up MLOps inference pipelines to developing a fraud detection application. Whenever he is not working, he enjoys doing physical activities, listening to podcasts, or reading books.
Jahed Zaïdi is an AI & Machine Learning specialist at AWS Professional Services in Paris. He is a builder and trusted advisor to companies across industries, helping businesses innovate faster and on a larger scale with technologies ranging from generative AI to scalable ML platforms. Outside of work, you will find Jahed discovering new cities and cultures, and enjoying outdoor activities.
Mateusz Zaremba is a DevOps Architect at AWS Professional Services. Mateusz supports customers at the intersection of machine learning and DevOps specialization, helping them to bring value efficiently and securely. Beyond tech, he is an aerospace engineer and avid sailor.
Kemeng Zhang is currently working at AWS Professional Services in Zurich, Switzerland, with a specialization in AI/ML. She has been part of multiple NLP projects, from behavioral change in digital communication to fraud detection. Apart from that, she is interested in UX design and playing cards.

How Veritone uses Amazon Bedrock, Amazon Rekognition, Amazon Transcrib …

This post is co-written with Tim Camara, Senior Product Manager at Veritone.
Veritone is an artificial intelligence (AI) company based in Irvine, California. Founded in 2014, Veritone empowers people with AI-powered software and solutions for various applications, including media processing, analytics, advertising, and more. It offers solutions for media transcription, facial recognition, content summarization, object detection, and other AI capabilities to solve the unique challenges professionals face across industries.
Veritone began its journey with its foundational AI operating system, aiWARETM, solving industry and brand-specific challenges by building applications on top of this powerful technology. Growing in the media and entertainment space, Veritone solves media management, broadcast content, and ad tracking issues. Alongside these applications, Veritone offers media services including AI-powered audio advertising and influencer marketing, content licensing and media monetization services, and professional services to build bespoke AI solutions.
With a decade of enterprise AI experience, Veritone supports the public sector, working with US federal government agencies, state and local government, law enforcement agencies, and legal organizations to automate and simplify evidence management, redaction, person-of-interest tracking, and eDiscovery. Veritone has also expanded into the talent acquisition space, serving HR teams worldwide with its powerful programmatic job advertising platform and distribution network.
Using generative AI and new multimodal foundation models (FMs) could be very strategic for Veritone and the businesses they serve, because it would significantly improve media indexing and retrieval based on contextual meaning—a critical first step to eventually generating new content. Building enhanced semantic search capabilities that analyze media contextually would lay the groundwork for creating AI-generated content, allowing customers to produce customized media more efficiently.
Veritone’s current media search and retrieval system relies on keyword matching of metadata generated from ML services, including information related to faces, sentiment, and objects. With recent advances in large language models (LLMs), Veritone has updated its platform with these powerful new AI capabilities. Looking ahead, Veritone wants to take advantage of new advanced FM techniques to improve the quality of media search results in its Digital Media Hub (DMH) and grow the number of users by achieving a better user experience.
In this post, we demonstrate how to build enhanced video search capabilities that enable semantic retrieval of videos based on text queries. We match the most relevant videos to text-based search queries by incorporating new multimodal embedding models like Amazon Titan Multimodal Embeddings to encode all visual, visual-meta, and transcription data. The primary focus is building a robust text search that goes beyond traditional word-matching algorithms, as well as an interface for comparing search algorithms. Additionally, we explore narrowing retrieval to specific shots within videos (a shot is a series of interrelated consecutive pictures taken contiguously by a single camera representing a continuous action in time and space). Overall, we aim to improve video search through cutting-edge semantic matching, providing an efficient way to find videos relevant to your rich textual queries.
Solution overview
We use the following AWS services to implement the solution:

Amazon Bedrock and the Amazon Titan Multimodal Embeddings and Amazon Titan Text models
Amazon Comprehend
AWS Lambda
Amazon OpenSearch Service
Amazon Rekognition
Amazon Simple Storage Service (Amazon S3)
Amazon Transcribe

Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral, Stability AI, and Amazon within a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.
The current architecture consists of three components:

Metadata generation – This component generates metadata from a video archive, processes it, and creates embeddings for search indexing. The videos from Amazon S3 are retrieved and converted to H264 vcodec format using the FFmpeg library. The processed videos are sent to AWS services like Amazon Rekognition, Amazon Transcribe, and Amazon Comprehend to generate metadata at shot level and video level. We use the Amazon Titan Text and Multimodal Embeddings models to embed the metadata and the video frames and index them in OpenSearch Service. We use AWS Step Functions to orchestrate the entire pipeline.
Search – A UI-based video search pipeline takes in the user query as input and retrieves relevant videos. The user query invokes a Lambda function. Based on the search method selected, you either perform a text- or keyword-based search or an embedding-based search. The search body is sent to OpenSearch Service to retrieve video results at the shot level, which are displayed to the user.
Evaluation – The UI enables you to perform qualitative evaluation against different search settings. You enter a query and, based on the search settings, video results are retrieved from OpenSearch. You can view the results and provide feedback by voting for the winning setting.

The following diagram illustrates the solution architecture.

The high-level takeaways from this work are the following:

Using an Amazon Rekognition API to detect shots and index them achieved better retrieval recall (at least 50% improvement) than performing the same indexing at the video level
Incorporating the Amazon Titan Text Embeddings model to semantically retrieve the video results instead of using raw text generated by Amazon Rekognition and Amazon Transcribe boosted the recall performance by 52%
The Amazon Titan Multimodal Embeddings model showed high capability to encode visual information of video image frames and achieved the best performance when combined with text embeddings of Amazon Rekognition and Amazon Transcribe text metadata, improving on baseline metrics by up to three times
The A/B evaluation UI that we developed to test new search methods and features proved to be effective

Detailed quantitative analysis of these conclusions is discussed later in this post.
Metadata generation pipeline
The video metadata generation pipeline consists of processing video files using AWS services such as Amazon Transcribe, Amazon Rekognition, and Amazon Comprehend, as shown in the following diagram. The metadata is generated at the shot level for a video.

In this section, we discuss the details of each service and the workflow in more detail.
Amazon Transcribe
The transcription for the entire video is generated using the StartTranscriptionJob API. When the job is complete, you can obtain the raw transcript data using GetTranscriptionJob. The GetTranscriptionJob returns a TranscriptFileUri, which can be processed to get the speakers and transcripts based on a timestamp. The file formats supported by Amazon Transcribe are AMR, FLAC (recommended), M4A, MP3, MP4, Ogg, WebM, and WAV (recommended).
The raw transcripts are further processed to be stored using timestamps, as shown in the following example.

Amazon Rekognition
Amazon Rekognition requires the video to be encoded using the H.264 codec and formatted to either MPEG-4 or MOV. We used FFmpeg to format the videos in Amazon S3 to the required vcodec. FFmpeg is a free and open-source software project in the form of a command line tool designed for processing video, audio, and other multimedia files and streams. Python provides a wrapper library around the tool called ffmpeg-python.
The solution runs Amazon Rekognition APIs for label detection, text detection, celebrity detection, and face detection on videos. The metadata generated for each video by the APIs is processed and stored with timestamps. The videos are then segmented into individual shots. With Amazon Rekognition, you can detect the start, end, and duration of each shot as well as the total shot count for a content piece. The video shot detection job starts with the StartSegmentDetection API, which returns a jobId that can be used to monitor status with the GetSegmentDetection API. When the video segmentation status changes to Succeeded, for each shot, you parse the previously generated Amazon Rekognition API metadata using the shot’s timestamp. You then append this parsed metadata to the shot record. Similarly, the full transcript from Amazon Transcribe is segmented using the shot start and end timestamps to create shot-level transcripts.
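A condensed sketch of the shot detection calls described above is shown below, using the boto3 Amazon Rekognition API. The bucket and key names are placeholders, and production code would typically rely on an Amazon SNS completion notification instead of polling with a fixed sleep.

import time
import boto3

rekognition = boto3.client("rekognition")

# Start shot-level segment detection on a video stored in Amazon S3 (names are placeholders).
start = rekognition.start_segment_detection(
    Video={"S3Object": {"Bucket": "my-video-bucket", "Name": "videos/sample.mp4"}},
    SegmentTypes=["SHOT"],
)
job_id = start["JobId"]

# Poll until the job finishes (simplified; use SNS notifications in production).
while True:
    result = rekognition.get_segment_detection(JobId=job_id)
    if result["JobStatus"] in ("SUCCEEDED", "FAILED"):
        break
    time.sleep(10)

# Each segment carries start/end timestamps used to slice the other metadata by shot.
for segment in result.get("Segments", []):
    shot = segment["ShotSegment"]
    print(shot["Index"], segment["StartTimestampMillis"], segment["EndTimestampMillis"])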
Amazon Comprehend
The temporal transcripts are then processed by Amazon Comprehend to detect entities and sentiments using the DetectEntities, DetectSentiment, and DetectTargetedSentiment APIs. The following code gives more details on the API requests and responses used to generate metadata by using sample shot-level metadata generated for a video:
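As a minimal illustration of these calls, the following sketch applies the three boto3 Amazon Comprehend APIs to a hypothetical shot-level transcript and keeps only high-confidence targeted sentiment mentions (the 0.9 threshold mirrors the processing rule described in the next section). The sample text and the response parsing are simplified and not the solution's exact code.

import boto3

comprehend = boto3.client("comprehend")

# Hypothetical shot-level transcript produced by the segmentation step above.
shot_transcript = "Speaker 0: Welcome back to the show. Tonight we talk about the playoffs."

entities = comprehend.detect_entities(Text=shot_transcript, LanguageCode="en")
sentiment = comprehend.detect_sentiment(Text=shot_transcript, LanguageCode="en")
targeted = comprehend.detect_targeted_sentiment(Text=shot_transcript, LanguageCode="en")

shot_comprehend_metadata = {
    "entities": [e["Text"] for e in entities["Entities"]],
    "sentiment": sentiment["Sentiment"],
    # Keep only targeted-sentiment mentions with a high-confidence score (> 0.9).
    "targeted_sentiment": [
        mention["Text"]
        for entity in targeted["Entities"]
        for mention in entity["Mentions"]
        if max(mention["MentionSentiment"]["SentimentScore"].values()) > 0.9
    ],
}
print(shot_comprehend_metadata)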

Metadata processing
The shot-level metadata generated by the pipeline is processed to stage it for embedding generation. The goal of this processing is to aggregate useful information and remove null or less significant information that wouldn’t add value for embedding generation.
The processing algorithm is as follows:
rekognition_metadata
  - shot_metadata: extract StartFrameNumber and EndFrameNumber
  - celeb_metadata: extract celeb_metadata
  - label_metadata: extract unique labels
  - text_metadata: extract unique text labels if there are more than 3 words (this data comes in noisy, with "-", "null", and other values)
  - face_analysis_metadata: extract unique list of AgeRange, Emotions, Gender
  We combine all Rekognition text data into the `rek_text_metadata` string.
transcribe_metadata
  - transcribe_metadata: check the word count of the conversation across all speakers; if it is more than 50 words, mark it for a summarization task with Amazon Bedrock
comprehend_metadata
  - comprehend_metadata: extract sentiment
  - comprehend_metadata: extract targeted sentiment scores for words with score > 0.9
Large transcript summarization
Large transcripts from the processed metadata are summarized using the Anthropic Claude 2 model. After summarizing the transcript, we extract the names of the key characters mentioned in the summary as well as the important keywords.
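For illustration, the following minimal sketch invokes Anthropic Claude 2 through the Amazon Bedrock runtime to summarize a long shot-level transcript and list key characters and keywords. The prompt wording and generation parameters are assumptions, not the exact prompt used in the solution.

import json
import boto3

bedrock = boto3.client("bedrock-runtime")

long_transcript = "..."  # concatenated shot-level transcript marked for summarization

# Claude 2 on Bedrock uses the Human/Assistant text-completion format.
prompt = (
    "\n\nHuman: Summarize the following video transcript in a few sentences, "
    "then list the key characters mentioned and the most important keywords.\n\n"
    f"{long_transcript}\n\nAssistant:"
)

response = bedrock.invoke_model(
    modelId="anthropic.claude-v2",
    body=json.dumps({
        "prompt": prompt,
        "max_tokens_to_sample": 500,
        "temperature": 0.2,
    }),
)
summary = json.loads(response["body"].read())["completion"]
print(summary)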
Embeddings generation
In this section, we discuss the details for generating shot-level and video-level embeddings.
Shot-level embeddings
We generate two types of embeddings: text and multimodal. To understand which metadata and service contributes to the search performance and by how much, we create a varying set of embeddings for experimental analysis.
We implement the following with Amazon Titan Multimodal Embeddings:

Embed image:

TMM_shot_img_embs – We sample the middle frame from every shot and embed them. We assume the middle frame in the shot captures the semantic nuance in the entire shot. You can also experiment with embedding all the frames and averaging them.
TMM_rek_text_shot_emb – We sample the middle frame from every shot and embed it along with Amazon Rekognition text data.
TMM_transcribe_shot_emb – We sample the middle frame from every shot and embed it along with Amazon Transcribe text data.

Embed text (to compare if the text data is represented well with the LLM or multimodal model, we also embed them with Amazon Titan Multimodal):

TMM_rek_text_emb – We embed the Amazon Rekognition text as multimodal embeddings without the images.
TMM_transcribe_emb – We embed the Amazon Transcribe text as multimodal embeddings without the images.

We implement the following with the Amazon Titan Text Embeddings model:

Embed text:

TT_rek_text_emb – We embed the Amazon Rekognition text as text embeddings
TT_transcribe_emb – We embed the Amazon Transcribe text as text embeddings

Video-level embeddings
If a video has only one shot (a small video capturing a single action), the embeddings will be the same as shot-level embeddings.
For videos that have more than one shot, we implement the following using the Amazon Titan Multimodal Embeddings model (a minimal sketch of a single embedding call follows at the end of this section):

Embed image:

TMM_shot_img_embs – We sample K images with replacement across all the shot-level metadata, generate embeddings, and average them
TMM_rek_text_shot_emb – We sample K images with replacement across all the shot-level metadata, embed it along with Amazon Rekognition text data, and average them.
TMM_transcribe_shot_emb – We sample K images with replacement across all the shot-level metadata, embed it along with Amazon Transcribe text data, and average them

Embed text:

TMM_rek_text_emb – We combine all the Amazon Rekognition text data and embed it as multimodal embeddings without the images
TMM_transcribe_emb – We combine all the Amazon Transcribe text data and embed it as multimodal embeddings without the images

We implement the following using the Amazon Titan Text Embeddings model:

Embed text:

TT_rek_text_emb – We combine all the Amazon Rekognition text data and embed it as text embeddings
TT_transcribe_emb – We combine all the Amazon Transcribe text data and embed it as text embeddings
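The following is a minimal sketch of how one of the shot-level multimodal embeddings listed above (a middle frame plus its associated text) might be generated with the Amazon Titan Multimodal Embeddings model on Amazon Bedrock. The file path, sample text, and output embedding length are illustrative.

import base64
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

# Middle frame of the shot, previously extracted with FFmpeg (path is illustrative).
with open("shot_0042_middle_frame.jpg", "rb") as f:
    frame_b64 = base64.b64encode(f.read()).decode("utf-8")

shot_text = "crowd cheering at a stadium"  # e.g., Amazon Rekognition labels or transcript text

response = bedrock.invoke_model(
    modelId="amazon.titan-embed-image-v1",
    body=json.dumps({
        "inputText": shot_text,
        "inputImage": frame_b64,
        "embeddingConfig": {"outputEmbeddingLength": 1024},
    }),
)
embedding = json.loads(response["body"].read())["embedding"]
print(len(embedding))  # vector indexed as a knn_vector field in OpenSearch Service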

Search pipeline
In this section, we discuss the components of the search pipeline.
Search index creation
We use an OpenSearch cluster (OpenSearch Service domain) with t3.medium.search to store and retrieve indexes for our experimentation with text, knn_vector, and Boolean fields indexed. We recommend exploring Amazon OpenSearch Serverless for production deployment for indexing and retrieval. OpenSearch Serverless can index billions of records and has expanded its auto scaling capabilities to efficiently handle tens of thousands of query transactions per minute.
The following screenshots are examples of the text, Boolean, and embedding fields that we created.

Query flow
The following diagram illustrates the query workflow.

You can use a user query to compare the video records using text or semantic (embedding) search for retrieval.
For text-based retrieval, we use the search query as input to retrieve results from OpenSearch Service using the search fields transcribe_metadata, transcribe_summary, transcribe_keyword, transcribe_speakers, and rek_text_metadata:
OpenSearch Input
search_fields = [
    "transcribe_metadata",
    "transcribe_summary",
    "transcribe_keyword",
    "transcribe_speakers",
    "rek_text_metadata",
]
search_body = {
    "query": {
        "multi_match": {
            "query": search_query,
            "fields": search_fields
        }
    }
}
For semantic retrieval, the query is embedded using the amazon.titan-embed-text-v1 or amazon.titan-embed-image-v1 model, and the resulting embedding is then used as an input to retrieve results from OpenSearch Service using the search field name, which corresponds to the metadata embedding of choice:
OpenSearch Input
search_body = {
    "size": <number of top results>,
    "fields": ["name"],
    "query": {
        "knn": {
            vector_field: {"vector": <embedding>, "k": <number of nearest neighbors to return>}
        }
    },
}
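In either case, the assembled search body is sent to the OpenSearch Service domain. The following minimal sketch runs such a query with the opensearch-py client; the domain endpoint, authentication details, and index name are placeholders.

from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],  # placeholder endpoint
    http_auth=("user", "password"),  # placeholder; use IAM/SigV4 auth in practice
    use_ssl=True,
)

search_body = {
    "query": {
        "multi_match": {
            "query": "candlelit dinner",
            "fields": ["transcribe_metadata", "transcribe_summary", "rek_text_metadata"],
        }
    }
}

results = client.search(index="video-shots", body=search_body)  # index name is a placeholder
for hit in results["hits"]["hits"]:
    print(hit["_source"].get("name"), hit["_score"])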

Search results combination
Exact match and semantic search have their own benefits depending on the application. Users who search for a specific celebrity or movie name would benefit from an exact match search, whereas users looking for thematic queries like “summer beach vibes” and “candlelit dinner” would find semantic search results more applicable. To enable the best of both, we combine the results from both types of searches. Additionally, different embeddings could capture different semantics (for example, Amazon Transcribe text embedding vs. image embedding with a multimodal model). Therefore, we also explore combining different semantic search results.
To combine search results from different search methods and different score ranges, we used the following logic:

Normalize the scores from each results list independently to a common 0–1 range using rank_norm.
Sum the weighted normalized scores for each result video from all the search results.
Sort the results based on the score.
Return the top K results.

We use the rank_norm method, where the score is calculated based on the rank of each video in the list. The following is the Python implementation of this method, followed by a sketch of the weighted combination step:
def rank_norm(results):
    n_results = len(results)
    normalized_results = {}
    for i, doc_id in enumerate(results.keys()):
        normalized_results[doc_id] = 1 - (i / n_results)
    ranked_normalized_results = sorted(
        normalized_results.items(), key=lambda x: x[1], reverse=True
    )
    return dict(ranked_normalized_results)
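Building on rank_norm, the following sketch sums weighted, rank-normalized scores across results lists and returns the top K videos, following the combination logic listed above. The method names, example scores, and weights are hypothetical.

from collections import defaultdict

def rank_norm(results):
    # Same rank-based normalization as above, repeated so this example runs standalone.
    n_results = len(results)
    return {doc_id: 1 - (i / n_results) for i, doc_id in enumerate(results)}

def combine_results(results_by_method, weights, top_k=10):
    # Sum weighted, rank-normalized scores per video and return the top K.
    combined = defaultdict(float)
    for method, results in results_by_method.items():
        for doc_id, score in rank_norm(results).items():
            combined[doc_id] += weights.get(method, 1.0) * score
    return sorted(combined.items(), key=lambda x: x[1], reverse=True)[:top_k]

# Hypothetical scores from a keyword search and a semantic (embedding) search.
text_results = {"video_a": 12.3, "video_b": 9.1, "video_c": 7.5}
semantic_results = {"video_c": 0.91, "video_a": 0.88, "video_d": 0.75}
print(combine_results(
    {"text": text_results, "semantic": semantic_results},
    weights={"text": 0.4, "semantic": 0.6},
))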

Evaluation pipeline
In this section, we discuss the components of the evaluation pipeline.
Search and evaluation UI
The following diagram illustrates the architecture of the search and evaluation UI.

The UI webpage is hosted in an S3 bucket and served using an Amazon CloudFront distribution. The current approach uses an API key for authentication. This can be enhanced by using Amazon Cognito and registering users. The user can perform two actions on the webpage:

Search – Enter the query to retrieve video content
Feedback – Based on the results displayed for a query, vote for the winning method

We create two API endpoints using Amazon API Gateway: GET /search and POST /feedback. The following screenshot illustrates our UI with two retrieval methods that have been anonymized for the user for a bias-free evaluation.

GET /search
We pass two QueryStringParameters with this API call:

query – The user input query
method – The method the user is evaluating

This API is created with a proxy integration to a Lambda function. The Lambda function processes the query and, based on the method used, retrieves results from OpenSearch Service. The results are then processed to retrieve videos from the S3 bucket and displayed on the webpage. In the search UI, we use a specific method (search setting) to retrieve results:
Request: ?query=<>&method=<>
Response:
{
    "results": [
        {"name": <video-name>, "score": <score>},
        {"name": <video-name>, "score": <score>},
        ...
    ]
}

The following is a sample request:
?query=candlelit dinner&method=MethodB
The following screenshot shows our results.

POST /feedback
Given a query, each method will have video content and the video name displayed on the webpage. Based on the relevance of the results, the user can vote on whether a particular method performs better than the other (win or lose) or whether the methods are tied. The API has a proxy integration to a Lambda function, which stores these results in an S3 bucket. In the evaluation UI, you can analyze the method search results to find the best search configuration setting. The request body uses the following syntax:
Request Body
{
    "result": <winning method>,
    "searchQuery": <query>,
    "sessionId": <current-session-id>,
    "Method<>": {
        "methodType": <Type of method used>,
        "results": "[{\"name\": <video-name>, \"score\": <score>}]"
    },
    "Method<>": {
        "methodType": <Type of method used>,
        "results": "[{\"name\": \"1QT426_s01\", \"score\": 1.5053753}]"
    }
}
The following screenshot shows a sample request.

Experiments and results
In this section, we discuss the datasets used in our experiments and the quantitative and qualitative evaluations based on the results.
Short videos dataset
This dataset includes 500 videos with an average length of 20 seconds. Each video has manually written metadata such as keywords and descriptions. In general, the videos in this dataset are related to travel, vacation, and restaurant topics.
The majority of videos are shorter than 20 seconds, and the maximum length is 400 seconds, as illustrated in the following figure.

Long videos dataset
The second dataset has 300 high-definition videos with a video length ranging from 20–160 minutes, as illustrated in the following figure.

Quantitative evaluation
We use the following metrics in our quantitative evaluation:

Mean reciprocal rank – Mean reciprocal rank (MRR) measures the inverse of the position number of the most relevant item in search results.
Recall@topK – We measure recall at top K as the percentage of correctly retrieved videos out of the desired video search results (ground truth). For example:

If A, B, and C are the related videos (ground truth) and A, D, N, M, and G are the top K retrieved videos, then Recall@Top5 = 1/3.
We compute these metrics using a ground truth dataset provided by Veritone that had mappings of search query examples to relevant video IDs.
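The following is a minimal sketch of how these two metrics can be computed from such a ground truth mapping; the query and video IDs are hypothetical.

def mrr(ground_truth, retrieved):
    # Mean reciprocal rank of the first relevant video per query.
    total = 0.0
    for query, relevant in ground_truth.items():
        ranks = [i + 1 for i, vid in enumerate(retrieved[query]) if vid in relevant]
        total += 1.0 / ranks[0] if ranks else 0.0
    return total / len(ground_truth)

def recall_at_k(ground_truth, retrieved, k=10):
    # Average fraction of relevant videos found in the top K results per query.
    total = 0.0
    for query, relevant in ground_truth.items():
        hits = len(set(retrieved[query][:k]) & set(relevant))
        total += hits / len(relevant)
    return total / len(ground_truth)

# Hypothetical example mirroring the Recall@Top5 = 1/3 illustration above.
ground_truth = {"candlelit dinner": ["A", "B", "C"]}
retrieved = {"candlelit dinner": ["A", "D", "N", "M", "G"]}
print(mrr(ground_truth, retrieved))             # 1.0 (first result is relevant)
print(recall_at_k(ground_truth, retrieved, 5))  # 0.333...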
The following table summarizes the top three retrieval methods from the long videos dataset (% improvement over baseline).

Methods | Video-level MRR vs. video-level baseline MRR | Shot-level MRR vs. video-level baseline MRR | Video-level Recall@top10 vs. video-level baseline Recall@top10 | Shot-level Recall@top10 vs. video-level baseline Recall@top10
Raw Text: Amazon Transcribe + Amazon Rekognition | Baseline comparison | N/A | . | .
Semantic: Amazon Transcribe + Amazon Rekognition | 0.84% | 52.41% | 19.67% | 94.00%
Semantic: Amazon Transcribe + Amazon Rekognition + Amazon Titan Multimodal | 37.31% | 81.19% | 71.00% | 93.33%
Semantic: Amazon Transcribe + Amazon Titan Multimodal | 15.56% | 58.54% | 61.33% | 121.33%

The following are our observations on the MRR and recall results:

Overall shot-level retrieval outperforms the video-level retrieval baseline across both MRR and recall metrics.
Raw text has lower MRR and recall scores than embedding-based search on both video and shot level. All three semantic methods show improvement in MRR and recall.
Combining semantic (Amazon Transcribe + Amazon Rekognition + Amazon Titan Multimodal) yields the best improvement across video MRR, shot MRR, and video recall metrics.

The following table summarizes the top three retrieval methods from the short videos dataset (% improvement over baseline).

Methods | Video-level MRR vs. video-level baseline MRR | Shot-level MRR vs. video-level baseline MRR | Video-level Recall@top10 vs. video-level baseline Recall@top10 | Shot-level Recall@top10 vs. video-level baseline Recall@top10
Raw Text: Amazon Transcribe + Amazon Rekognition | Baseline | N/A | Baseline | N/A
Semantic: Amazon Titan Multimodal | 226.67% | 226.67% | 373.57% | 382.61%
Semantic: Amazon Transcribe + Amazon Rekognition + Amazon Titan Multimodal | 100.00% | 60.00% | 299.28% | 314.29%
Semantic: Amazon Transcribe + Amazon Titan Multimodal | 53.33% | 53.33% | 307.21% | 312.77%

We made the following observations on the MRR and recall results:

Encoding the videos using the Amazon Titan Multimodal Embeddings model alone yields the best result, compared to combining it with just Amazon Transcribe, with Amazon Transcribe + Amazon Rekognition, or with Amazon Transcribe + Amazon Rekognition + Amazon Titan Multimodal Embeddings (due to the lack of dialogue and scene changes in these short videos)
All semantic retrieval methods (2, 3, and 4) show at least a 53% improvement over the baseline
Although Amazon Titan Multimodal Embeddings alone works well for this data, other metadata such as Amazon Transcribe output, Amazon Rekognition output, and pre-existing human labels can be combined with Amazon Titan Multimodal Embeddings as semantic representations for retrieval to improve performance, depending on the nature of the data

Qualitative evaluation
We evaluated the quantitative results from our pipeline to find matches with the ground truth shared by Veritone. However, there could be other relevant videos in the retrieved results from our pipeline that are not part of the ground truth, which could further improve some of these metrics. Therefore, to qualitatively evaluate our pipeline, we used an A/B testing framework, where a user can view results from two anonymized methods (the metadata used by the method is not exposed to reduce any bias) and rate which results were more aligned with the query entered.
The aggregated results across the method comparisons were used to calculate the win rate and select the final embedding method for the search pipeline.
The following methods were shortlisted based on Veritone’s interest in reducing the number of comparison methods.

Method Name (Exposed to User) | Retrieval Type (Not Exposed to User)
Method E | Just semantic Amazon Transcribe retrieval results
Method F | Fusion of semantic Amazon Transcribe + Amazon Titan Multimodal retrieval results
Method G | Fusion of semantic Amazon Transcribe + semantic Amazon Rekognition + Amazon Titan Multimodal retrieval results

The following table summarizes the quantitative results and winning rate.
 

Experiment | Method E wins | Method F wins | Tie
Method E vs. Method F | 10% | 85% | 5%

Experiment | Method F wins | Method G wins | Tie
Method F vs. Method G | 30% | 60% | 10%

Based on the results, we see that adding Amazon Titan Multimodal Embeddings to the transcription method (Method F) performs better than using semantic transcription retrieval alone (Method E). Adding Amazon Rekognition-based retrieval results (Method G) improves further over Method F.
Takeaways
We had the following key takeaways:

Enabling vector search indexing and retrieval, instead of relying only on text matching against AI-generated text metadata, improves search recall.
Indexing and retrieving videos at the shot level can boost performance and improve customer experience. Users can efficiently find precise clips matching their query rather than sifting through entire videos.
Multimodal representation of queries and metadata, using models trained on both images and text, performs better than single-modality representation from models trained only on textual data.
The fusion of text and visual cues significantly improves search relevance by capturing semantic alignments between queries and clips more accurately and better capturing the user’s search intent.
Enabling direct human comparison between retrieval models through A/B testing allows for inspecting and selecting the optimal approach. This can boost the confidence to ship new features or search methods to production.

Security best practices
We recommend the following security guidelines for building secure applications on AWS:

Building secure machine learning environments with Amazon SageMaker
Control root access to a SageMaker notebook instance
Amazon S3 security
Data protection in Amazon Cognito

Conclusion
In this post, we showed how Veritone upgraded their classical search pipelines with Amazon Titan Multimodal Embeddings in Amazon Bedrock through a few API calls. We showed how videos can be indexed using different representations (text, text embeddings, and multimodal embeddings) and how those representations can be evaluated to produce a robust search based on the data characteristics and use case.
If you are interested in working with the AWS Generative AI Innovation Center, please reach out to the GenAIIC.

About the Authors

Tim Camara is a Senior Product Manager on the Digital Media Hub team at Veritone. With over 15 years of experience across a range of technologies and industries, he’s focused on finding ways to use emerging technologies to improve customer experiences.

Mohamad Al Jazaery is an Applied Scientist at the Generative AI Innovation Center. As a scientist and tech lead, he helps AWS customers envision and build GenAI solutions to address their business challenges in different domains such as Media and Entertainment, Finance, and Lifestyle.

Meghana Ashok is a Machine Learning Engineer at the Generative AI Innovation Center. She collaborates closely with customers, guiding them in developing secure, cost-efficient, and resilient solutions and infrastructure tailored to their generative AI needs.

Divya Bhargavi is a Senior Applied Scientist Lead at the Generative AI Innovation Center, where she solves high-value business problems for AWS customers using generative AI methods. She works on image/video understanding and retrieval, knowledge graph augmented large language models, and personalized advertising use cases.

Vidya Sagar Ravipati is a Science Manager at the Generative AI Innovation Center, where he uses his vast experience in large-scale distributed systems and his passion for machine learning to help AWS customers across different industry verticals accelerate their AI and cloud adoption.

Information extraction with LLMs using Amazon SageMaker JumpStart

Large language models (LLMs) have unlocked new possibilities for extracting information from unstructured text data. Although much of the current excitement is around LLMs for generative AI tasks, many of the key use cases that you might want to solve have not fundamentally changed. Tasks such as routing support tickets, recognizing customer intents from a chatbot conversation session, extracting key entities from contracts, invoices, and other types of documents, and analyzing customer feedback are examples of long-standing needs.
What makes LLMs so transformative, however, is their ability to achieve state-of-the-art results on these common tasks with minimal data and simple prompting, and their ability to multitask. Rather than requiring extensive feature engineering and dataset labeling, LLMs can be fine-tuned on small amounts of domain-specific data to quickly adapt to new use cases. By handling most of the heavy lifting, services like Amazon SageMaker JumpStart remove the complexity of fine-tuning and deploying these models.
SageMaker JumpStart is a machine learning (ML) hub with foundation models (FMs), built-in algorithms, and prebuilt ML solutions that you can deploy with just a few clicks. With SageMaker JumpStart, you can evaluate, compare, and select FMs quickly based on predefined quality and responsibility metrics to perform tasks like article summarization and image generation.
This post walks through examples of building information extraction use cases by combining LLMs with prompt engineering and frameworks such as LangChain. We also examine the uplift from fine-tuning an LLM for a specific extractive task. Whether you’re looking to classify documents, extract keywords, detect and redact personally identifiable information (PIIs), or parse semantic relationships, you can start ideating your use case and use LLMs for your natural language processing (NLP).
Prompt engineering
Prompt engineering enables you to instruct LLMs to generate suggestions, explanations, or completions of text in an interactive way. Prompt engineering relies on large pretrained language models that have been trained on massive amounts of text data. At first glance, there might not be one best way to design a prompt, and different LLMs might work better or worse with different prompts. Therefore, prompts are often iteratively refined through trial and error to produce better results. As a starting point, you can refer to the model documentation which typically includes recommendations and best practices for prompting the model, and examples provided in SageMaker JumpStart.
In the following sections, we focus on the prompt engineering techniques required for extractive use cases. They help unlock the power of LLMs by providing helpful constraints and guide the model toward its intended behavior. We discuss the following use cases:

Sensitive information detection and redaction
Entity extraction; generic and specific entities with structured formats
Classification, using prompt engineering and fine-tuning

Before we explore these use cases, we need to set up our development environment.
Prerequisites
The source code accompanying this example is available in this GitHub repo. It consists of several Jupyter notebooks and a utils.py module. The utils.py module houses the shared code that is used throughout the notebooks.
The simplest way to run this example is by using Amazon SageMaker Studio with the Data Science 3.0 kernel or an Amazon SageMaker notebook instance with the conda_python3 kernel. For the instance type, you can choose the default settings.
In this example, we use ml.g5.2xlarge and ml.g5.48xlarge instances for endpoint usage, and ml.g5.24xlarge for training job usage. Use the Service Quotas console to make sure you have sufficient quotas for these instances in the Region where you’re running this example.
We use Jupyter notebooks throughout this post. Before we explore the examples, it’s crucial to confirm that you have the latest version of the SageMaker Python SDK. This SDK offers a user-friendly interface for training and deploying models on SageMaker. To install or upgrade to the latest version, run the following command in the first cell of your Jupyter notebook:
%pip install --quiet --upgrade sagemaker
Deploy Llama-2-70b-chat using SageMaker JumpStart
There are many LLMs available in SageMaker JumpStart to choose from. In this example, we use Llama-2-70b-chat, but you might use a different model depending on your use case. To explore the list of SageMaker JumpStart models, see JumpStart Available Model Table.
To deploy a model from SageMaker JumpStart, you can use either APIs, as demonstrated in this post, or use the SageMaker Studio UI. After the model is deployed, you can test it by asking a question from the model:
from sagemaker.jumpstart.model import JumpStartModel

model_id, model_version = "meta-textgeneration-llama-2-70b-f", "2.*"
endpoint_name = model_id
instance_type = "ml.g5.48xlarge"

# role_arn is the SageMaker execution role ARN (for example, from utils.get_role_arn())
model = JumpStartModel(
    model_id=model_id, model_version=model_version, role=role_arn
)
predictor = model.deploy(
    endpoint_name=endpoint_name, instance_type=instance_type
)
If no instance_type is provided, the SageMaker JumpStart SDK will select the default type. In this example, you explicitly set the instance type to ml.g5.48xlarge.
Sensitive data extraction and redaction
LLMs show promise for identifying sensitive information so that it can be redacted. Prompt engineering techniques, such as priming the model to understand the redaction task and providing examples, can improve performance. For example, priming the model by stating “redact sensitive information” and demonstrating a few examples of redacting names, dates, and locations can help the LLM infer the rules of the task.
More in-depth forms of priming the model include providing positive and negative examples, demonstrations of common errors, and in-context learning to teach the nuances of proper redaction. With careful prompt design, LLMs can learn to redact information while maintaining readability and utility of the document. In real-life applications, however, additional evaluation is often necessary to improve the reliability and safety of LLMs for handling confidential data. This is often achieved through the inclusion of human review, because no automated approach is entirely foolproof.
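For instance, a few-shot variant of a redaction prompt might embed a short demonstration directly in the system message. The following is an illustrative sketch; the wording is hypothetical and is not the prompt used in the rest of this post:
few_shot_system = """
Your task is to redact Personally Identifiable Information (PII) from the provided text.
Replace every name, date, and location with exactly four asterisks (****).

EXAMPLE INPUT:
On March 3rd, 2022, Maria from Spain emailed us about her invoice.
EXAMPLE OUTPUT:
On ****, **** from **** emailed us about her invoice.

Only write the masked text in the response.
"""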
The following are a few examples of using prompt engineering for the extraction and redaction of PII. The prompt consists of multiple parts: the report_sample, which includes the text that you want to identify and mask the PII data within, and instructions (or guidance) passed on to the model as the system message.
report_sample = """
This month at AnyCompany, we have seen a significant surge in orders from a diverse clientele. On November 5th, 2023, customer Alice from US placed an order with total of $2190. Following her, on Nov 7th, Bob from UK ordered a bulk set of twenty-five ergonomic keyboards for his office setup with total of $1000. The trend continued with Jane from Australia, who on Nov 12th requested a shipment of ten high-definition monitors with total of $9000, emphasizing the need for environmentally friendly packaging. On the last day of that month, customer John, located in Singapore, finalized an order for fifteen USB-C docking stations, aiming to equip his design studio with the latest technology for total of $3600.
"""

system = """
Your task is to precisely identify Personally Identifiable Information (PII) and identifiable details, including name, address, and the person's country, in the provided text. Replace these details with exactly four asterisks (****) as the masking characters. Use '****' for masking text of any length. Only write the masked text in the response.
"""
In the following example, you define the llama2_chat function that encapsulates sending the prompt to the Llama-2 model. You reuse this function throughout the examples.
def llama2_chat(
    predictor,
    user,
    temperature=0.1,
    max_tokens=512,
    top_p=0.9,
    system=None,
):
    """Constructs the payload for the llama2 model, sends it to the endpoint,
    and returns the response."""

    inputs = []
    if system:
        inputs.append({"role": "system", "content": system})
    if user:
        inputs.append({"role": "user", "content": user})

    payload = {
        "inputs": [inputs],
        "parameters": {
            "max_new_tokens": max_tokens,
            "top_p": top_p,
            "temperature": temperature,
        },
    }
    response = predictor.predict(payload, custom_attributes="accept_eula=true")
    return response

Use the following code to call the function, passing your parameters:
response = utils.llama2_chat(
    predictor,
    system=system,
    user=report_sample,
)
print(utils.llama2_parse_output(response))

You get the following output:

This month at AnyCompany, we have seen a significant surge in orders from a diverse clientele. On November 5th, 2023, customer ***** from ***** placed an order with total of $2190. Following her, on Nov 7th, ***** from ***** ordered a bulk set of twenty-five ergonomic keyboards for his office setup with total of $1000. The trend continued with ***** from *****, who on Nov 12th requested a shipment of ten high-definition monitors with total of $9000, emphasizing the need for environmentally friendly packaging. On the last day of that month, customer *****, located in *****, finalized an order for fifteen USB-C docking stations, aiming to equip his design studio with the latest technology for total of $3600.

Entity extraction
Entity extraction is the process of identifying and extracting key information entities from unstructured text. This technique helps create structured data from unstructured text and provides useful contextual information for many downstream NLP tasks. Common applications for entity extraction include building a knowledge base, extracting metadata to use for personalization or search, and improving user inputs and conversation understanding within chatbots.
You can effectively use LLMs for entity extraction tasks through careful prompt engineering. With a few examples of extracting entities from text, explanatory prompts, and the desired output format, the model can learn to identify and extract entities such as people, organizations, and locations from new input texts. In the following examples, we demonstrate a few different entity extraction tasks ranging from simpler to more complex using prompt engineering with the Llama-2-70b-chat model you deployed earlier.
Extract generic entities
Use the following code to extract generic entities:
email_sample = "Hello, My name is John. Your AnyCompany Financial Services, LLC credit card account 1111-0000-1111-0008 has a minimum payment of $24.53 that is due by July 31st. Based on your autopay settings, we will withdraw your payment on the due date from your bank account number XXXXXX1111 with the routing number XXXXX0000. Customer feedback for Sunshine Spa, 123 Main St, Anywhere. Send comments to Alice at alice_aa@anycompany.com and Bob at bob_bb@anycompany.com. I enjoyed visiting the spa. It was very comfortable but it was also very expensive. The amenities were ok but the service made the spa a great experience."

system = """
Your task is to precisely identify any email addresses from the given text and then write them, one per line. Remember to ONLY write an email address if it's precisely spelled out in the input text. If there are no email addresses in the text, write "N/A". DO NOT write anything else.
"""

result = utils.llama2_chat(predictor, system=system, user=email_sample)
print(utils.llama2_parse_output(result))

You get the following output:
alice_aa@anycompany.com
bob_bb@anycompany.com

Extract specific entities in a structured format
Using the previous sample report, you can extract more complex information in a structured manner. This time, you provide a JSON template for the model to use and return the output in JSON format.
With LLMs generating JSON documents as output, you can effortlessly parse them into a range of other data structures. This enables simple conversions to dictionaries, YAML, or even Pydantic models using third-party libraries, such as LangChain’s PydanticOutputParser. You can see the implementation in the GitHub repo.
import json

system = """
Your task is to precisely extract information from the text provided, and format it according to the given JSON schema delimited with triple backticks. Only include the JSON output in your response. If a specific field has no available data, indicate this by writing `null` as the value for that field in the output JSON. In cases where there is no data available at all, return an empty JSON object. Avoid including any other statements in the response.

```
{json_schema}
```
"""

json_schema = """
{
    "orders":
        [
            {
                "name": "<customer_name>",
                "location": "<customer_location>",
                "order_date": "<order_date in format YYYY-MM-DD>",
                "order_total": "<order_total>",
                "order_items": [
                    {
                        "item_name": "<item_name>",
                        "item_quantity": "<item_quantity>"
                    }
                ]
            }
        ]
}
"""

response = utils.llama2_chat(
    predictor,
    system=system.format(json_schema=json_schema),
    user=report_sample,
)
json_str = utils.llama2_parse_output(response)
print(json_str)
You get the following output:
{
    "orders": [
        {
            "name": "Alice",
            "location": "US",
            "order_date": "2023-11-05",
            "order_total": 2190,
            "order_items": [
                {
                    "item_name": null,
                    "item_quantity": null
                }
            ]
        },
        {
            "name": "Bob",
            "location": "UK",
            "order_date": "2023-11-07",
            "order_total": 1000,
            "order_items": [
                {
                    "item_name": "ergonomic keyboards",
                    "item_quantity": 25
                }
            ]
        },
        {
            "name": "Jane",
            "location": "Australia",
            "order_date": "2023-11-12",
            "order_total": 9000,
            "order_items": [
                {
                    "item_name": "high-definition monitors",
                    "item_quantity": 10
                }
            ]
        },
        {
            "name": "John",
            "location": "Singapore",
            "order_date": "2023-11-30",
            "order_total": 3600,
            "order_items": [
                {
                    "item_name": "USB-C docking stations",
                    "item_quantity": 15
                }
            ]
        }
    ]
}
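As mentioned earlier, JSON output like this can be converted into typed objects. The following is a minimal sketch that validates the response with Pydantic v2; the OrderItem, Order, and Orders model names are illustrative and are not part of the accompanying repository:
from typing import List, Optional
from pydantic import BaseModel

class OrderItem(BaseModel):
    item_name: Optional[str] = None
    item_quantity: Optional[int] = None

class Order(BaseModel):
    name: str
    location: str
    order_date: str
    order_total: float
    order_items: List[OrderItem] = []

class Orders(BaseModel):
    orders: List[Order]

# json_str is the model response printed above (Pydantic v2 API)
orders = Orders.model_validate_json(json_str)
print(orders.orders[1].name, orders.orders[1].order_total)  # Bob 1000.0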
Classification using prompt engineering
LLMs can be a useful tool for information extraction tasks such as text classification. Common applications include classifying the intents of user interactions via channels such as email, chatbots, or voice, and categorizing documents to route requests to downstream systems. The initial step involves identifying the intent or class of the user’s request or the document. These intents or classes can take many forms, from short single words to thousands of hierarchical classes and sub-classes.
In the following examples, we demonstrate prompt engineering on synthetic conversation data to extract intents. Additionally, we show how pre-trained models can be assessed to determine if fine-tuning is needed.
Let’s start with the following example. You have a list of customer interactions with an imaginary health and life insurance company. To start, use the Llama-2-70b-chat model you deployed in the previous section:
inference_instance_type = "ml.g5.48xlarge"

# Llama-2-70b chat
model_id, model_version = "meta-textgeneration-llama-2-70b-f", "2.*"
endpoint_name = model_id

predictor = utils.get_predictor(
    endpoint_name=endpoint_name,
    model_id=model_id,
    model_version=model_version,
    inference_instance_type=inference_instance_type,
)
The get_predictor function is a helper function that creates a predictor object from a model ID and version. If the specified endpoint doesn’t exist, it creates a new endpoint and deploys the model to it. If the endpoint already exists, it reuses the existing endpoint.
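The helper lives in the utils.py module of the sample repository; the following is only a sketch of what that reuse-or-deploy logic might look like (an assumption for illustration, not the repository code):
import boto3
from botocore.exceptions import ClientError
from sagemaker.jumpstart.model import JumpStartModel
from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

def get_predictor(endpoint_name, model_id, model_version, inference_instance_type):
    """Return a predictor for an existing endpoint, or deploy the model first."""
    try:
        # If the endpoint already exists, attach a predictor to it
        boto3.client("sagemaker").describe_endpoint(EndpointName=endpoint_name)
        return Predictor(
            endpoint_name=endpoint_name,
            serializer=JSONSerializer(),
            deserializer=JSONDeserializer(),
        )
    except ClientError:
        # Otherwise, deploy the JumpStart model to a new endpoint
        model = JumpStartModel(model_id=model_id, model_version=model_version)
        return model.deploy(
            endpoint_name=endpoint_name,
            instance_type=inference_instance_type,
            accept_eula=True,
        )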
customer_interactions = [
    """Hello, I've recently moved to a new state and I need to update my address for my health insurance policy.
    Can you assist me with that?
    """,
    """Good afternoon! I'm interested in adding dental coverage to my existing health plan.
    Could you provide me the options and prices?
    """,
    """I had a disappointing experience with the customer service yesterday regarding my claim.
    I want to file a formal complaint and speak with a supervisor.
    """,
]

system = """
Your task is to identify the customer intent from their interactions with the support bot in the provided text. The intent output must not be more than 4 words. If the intent is not clear, please provide a fallback intent of "unknown".
"""

def get_intent(system, customer_interactions):
    for customer_interaction in customer_interactions:
        response = utils.llama2_chat(
            predictor,
            system=system,
            user=customer_interaction,
        )
        content = utils.llama2_parse_output(response)
        print(content)

get_intent(system, customer_interactions)
You get the following output:
Update Address
Intent: Informational
Intent: Escalate issue
Looking at the output, these seem reasonable as the intents. However, the format and style of the intents can vary depending on the language model. Another limitation of this approach is that intents are not confined to a predefined list, which means the language model might generate and word the intents differently each time you run it.
To address this, you can use the in-context learning technique in prompt engineering to steer the model towards selecting from a predefined set of intents, or class labels, that you provide. In the following example, alongside the customer conversation, you include a list of potential intents and ask the model to choose from this list:
system = """
Your task is to identify the intent from the customer interaction with the support bot. Select from the intents provided in the following list delimited with ####. If the intent is not clear, please provide a fallback intent of "unknown". ONLY write the intent.

####
- information change
- add coverage
- complaint
- portal navigation
- free product upgrade
####
"""

get_intent(system, customer_interactions)
You get the following output:
information change
add coverage
complaint
Reviewing the results, it’s evident that the language model performs well in selecting the appropriate intent in the desired format.
Sub-intents and intent trees
In many real-life use cases, intents are organized into a large number of categories, often in a hierarchical fashion, which makes the classification task more challenging for the model. To handle this, you can further improve and modify your prompt by providing examples to the model, also known as n-shot, k-shot, or few-shot learning.
The following is the intent tree to use in this example. You can find its source code in the utils.py file in the code repository.
INTENTS = [
    {
        "main_intent": "profile_update",
        "sub_intents": [
            "contact_info",
            "payment_info",
            "members",
        ],
    },
    {
        "main_intent": "health_cover",
        "sub_intents": [
            "add_extras",
            "add_hospital",
            "remove_extras",
            "remove_hospital",
            "new_policy",
            "cancel_policy",
        ],
    },
    {
        "main_intent": "life_cover",
        "sub_intents": [
            "new_policy",
            "cancel_policy",
            "beneficiary_info",
        ],
    },
    {
        "main_intent": "customer_retention",
        "sub_intents": [
            "complaint",
            "escalation",
            "free_product_upgrade",
        ],
    },
    {
        "main_intent": "technical_support",
        "sub_intents": [
            "portal_navigation",
            "login_issues",
        ],
    },
]
Using the following prompt (which includes the intents), you can ask the model to pick from the provided list of intents:
system = """
Your task is to identify the intent from the customer interaction with the support bot. Identify the intent of the provided text using the list of provided intent tree delimited with ####. The intents are defined in classes and sub-classes. Write the intention with this format: <main-intent>:<sub-intent>. ONLY write the intent.

OUTPUT EXAMPLE:
profile_update:contact_info

OUTPUT EXAMPLE:
customer_retention:complaint

####
{intents}
####
"""

intents_json = json.dumps(utils.INTENTS, indent=4)
system = system.format(intents=intents_json)
get_intent(system, customer_interactions)
You get the following output:
profile_update:contact_info
health_cover:add_extras
customer_retention:complaint
Although LLMs can often correctly identify intent from a list of possible intents, they may sometimes produce additional outputs or fail to adhere to the exact intent structure and output schema. There are also scenarios where intents are not as straightforward as they initially seem or are highly specific to a business domain context that the model doesn’t fully comprehend.
As an example, in the following sample interaction, the customer ultimately wants to change their coverage, but their immediate question and interaction intent is to get help with portal navigation. Similarly, in the second interaction, the more appropriate intent is “free product upgrade” which the customer is requesting. However, the model is unable to detect these nuanced intents as accurately as desired.
customer_interactions = [
    "I want to change my coverage plan. But I'm not seeing where to do this on the online website. Could you please point me to it?",
    "I'm unhappy with the current benefits of my plan and I'm considering canceling unless there are better alternatives. What can you offer?",
]

get_intent(system, customer_interactions)
You get the following output:
profile_update:contact_info
customer_retention:complaint
Prompt engineering can often successfully extract specific intents from text. However, for some use cases, relying solely on prompt engineering has limitations. Scenarios where additional techniques beyond prompt engineering may be needed include:

Conversations with a large number of intent classes or long contexts that exceed the language model’s context window size, or that make queries computationally expensive
Desired outputs in specific formats that the model struggles to adopt
Enhancing model understanding of the domain or task to boost performance

In the following section, we demonstrate how fine-tuning can boost the accuracy of the LLM for the intent classification task attempted earlier.
Fine-tuning an LLM for classification
The following sections detail the fine-tuning process for the FlanT5-XL and Mistral 7B models using SageMaker JumpStart. We use these two models to compare their accuracy. Both are significantly smaller than Llama-2-70b-Chat. The goal is to determine whether smaller models can achieve state-of-the-art performance on specific tasks after they’re fine-tuned.
We fine-tuned both the Mistral 7B and FlanT5-XL models. You can see the details of the Mistral 7B fine-tuning in the code repository. In the following sections, we outline the steps for fine-tuning and evaluating FlanT5-XL.
Initially, you deploy (or reuse) the FlanT5 endpoint as the base_predictor, which represents the base model prior to any fine-tuning. Subsequently, you assess the performance of the models by comparing them after the fine-tuning process.
inference_instance_type = "ml.g5.2xlarge"

model_id, model_version = "huggingface-text2text-flan-t5-xl", "2.0.0"
base_endpoint_name = model_id

base_predictor = utils.get_predictor(
    endpoint_name=base_endpoint_name,
    model_id=model_id,
    model_version=model_version,
    inference_instance_type=inference_instance_type,
)
Prepare training data for fine-tuning
Preparing for fine-tuning requires organizing several files, including the dataset and template files. The dataset is structured to align with the required input format for fine-tuning. For example, each record in our training dataset adheres to the following structure:
{"query": "customer query", "response": "main-intent:sub-intent"}
In this example, you use a synthesized dataset comprising customer interactions with a fictional insurance company. To learn more about the data and gain access to it, refer to the source code.
intent_dataset_file = "data/intent_dataset.jsonl"
intent_dataset_train_file = "data/intent_dataset_train.jsonl"
intent_dataset_test_file = "data/intent_dataset_test.jsonl"
ft_template_file = "data/template.json"
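As an aside, the train and test files above can be produced from the full dataset with a simple split. The following sketch assumes the 90/10 split described later in this post; the repository may implement this differently:
import json
import random

def split_jsonl(source_file, train_file, test_file, train_fraction=0.9, seed=42):
    """Shuffle the JSONL records and write a train/test split."""
    with open(source_file) as f:
        records = [json.loads(line) for line in f if line.strip()]

    random.Random(seed).shuffle(records)
    cut = int(len(records) * train_fraction)

    for path, subset in [(train_file, records[:cut]), (test_file, records[cut:])]:
        with open(path, "w") as out:
            for record in subset:
                out.write(json.dumps(record) + "\n")

split_jsonl(intent_dataset_file, intent_dataset_train_file, intent_dataset_test_file)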
The following is the prompt for fine-tuning. The prompt has the query parameter, which is set during the fine-tuning using the SageMaker JumpStart SDK.
FT_PROMPT = """Identify the intent classes from the given user query, delimited with ####. Intents are categorized into two levels: main intent and sub intent. In your response, provide only ONE set of main and sub intents that is most relevant to the query. Write your response ONLY in this format <main-intent>:<sub-intent>. ONLY Write the intention.

OUTPUT EXAMPLE:
profile_update:contact_info

OUTPUT EXAMPLE:
technical_support:portal_navigation

#### QUERY:
{query}
####
"""
The following creates a template file that will be used by the SageMaker JumpStart framework to fine-tune the model. The template has two fields, prompt and completion. These fields are used to pass labeled data to the model for the fine-tuning process.
template = {
    "prompt": utils.FT_PROMPT,
    "completion": "{response}",
}

with open(ft_template_file, "w") as f:
    json.dump(template, f)
The training data is uploaded to an Amazon Simple Storage Service (Amazon S3) bucket, setting the stage for the actual fine-tuning process.
train_data_location = utils.upload_train_and_template_to_s3(
    bucket_prefix="intent_dataset_flant5",
    train_path=intent_dataset_train_file,
    template_path=ft_template_file,
)
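The helper’s exact implementation is in the utils.py module; conceptually it needs to do little more than stage both files under one S3 prefix. The following is a sketch of that idea (an assumption for illustration, not the repository code):
import sagemaker

def upload_train_and_template_to_s3(bucket_prefix, train_path, template_path):
    """Upload the training JSONL file and template.json under one S3 prefix."""
    session = sagemaker.Session()
    bucket = session.default_bucket()
    for local_path in (train_path, template_path):
        session.upload_data(path=local_path, bucket=bucket, key_prefix=bucket_prefix)
    # The returned prefix is passed to the estimator as the training channel
    return f"s3://{bucket}/{bucket_prefix}"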
Fine-tune the model
Configure the JumpStartEstimator, specifying your chosen model and other parameters like instance type and hyperparameters (in this example, you use five epochs for the training). This estimator drives the fine-tuning process.
from sagemaker.jumpstart.estimator import JumpStartEstimator

estimator = JumpStartEstimator(
    model_id=model_id,
    disable_output_compression=True,
    instance_type="ml.g5.24xlarge",
    role=utils.get_role_arn(),
)

estimator.set_hyperparameters(
    instruction_tuned="True", epochs="5", max_input_length="1024"
)

estimator.fit({"training": train_data_location})
Deploy the fine-tuned model
After fine-tuning, deploy the fine-tuned model:
finetuned_endpoint_name = "flan-t5-xl-ft-infoext"
finetuned_model_name = finetuned_endpoint_name

# Deploy the fine-tuned model to an endpoint
finetuned_predictor = estimator.deploy(
    endpoint_name=finetuned_endpoint_name,
    model_name=finetuned_model_name,
)
Use the following code to test the fine-tuned model against its base model with ambiguous queries, which you saw in the previous section:
ambiguous_queries = [
    {
        "query": "I want to change my coverage plan. But I'm not seeing where to do this on the online site. Could you please show me how?",
        "main_intent": "techincal_support",
        "sub_intent": "portal_navigation",
    },
    {
        "query": "I'm unhappy with the current benefits of my plan and I'm considering canceling unless there are better alternatives. What can you offer?",
        "main_intent": "customer_retention",
        "sub_intent": "free_product_upgrade",
    },
]

for query in ambiguous_queries:
    question = query["query"]
    print("query:", question, "\n")
    print(
        "expected intent: ", f"{query['main_intent']}:{query['sub_intent']}"
    )

    prompt = utils.FT_PROMPT.format(query=question)
    response = utils.flant5(base_predictor, user=prompt, max_tokens=13)
    print("base model: ", utils.parse_output(response))

    response = utils.flant5(finetuned_predictor, user=prompt, max_tokens=13)
    print("finetuned model: ", utils.parse_output(response))
    print("-" * 80)
You get the following output:
query: I want to change my coverage plan. But I’m not seeing where to do this on the online site. Could you please show me how?
expected intent: techincal_support:portal_navigation
base model: main_intent>:sub_intent> change
finetuned model: technical_support:portal_navigation
——————————————————————————–
query: I’m unhappy with the current benefits of my plan and I’m considering canceling unless there are better alternatives. What can you offer?

expected intent: customer_retention:free_product_upgrade
base model: main_intent>:sub_intent> cancel
finetuned model: customer_retention:free_product_upgrade
——————————————————————————–
As shown in this example, the fine-tuned model is able to classify the ambiguous queries correctly.
In evaluations, fine-tuned models performed better in identifying the correct class for both clear and ambiguous intents. The following section details the benchmark’s performance overall, and against each intent.
Performance comparisons and considerations
In this section, we have gathered the evaluation results and performance benchmarks for each model, before and after fine-tuning, as well as a comparison between the prompt engineering and fine-tuning the LLM. The dataset consists of 7,824 examples, with a 90% split for training (including validation) and 10% for testing.

Model | Overall Accuracy | Fine-tuning Duration (minutes) | Notes
Mistral-7b (fine-tuned, five epochs, without classes in the prompt) | 98.97% | 720 | Given Mistral-7b's nature as a text generation model, parsing its output to extract intent can be challenging due to tendencies for character repetition and generation of additional characters. Performance improved with more epochs: 98% accuracy for five epochs compared to 92% for one epoch.
Flan-T5-XL (fine-tuned, one epoch, without classes in the prompt) | 98.46% | 150 | Marginal improvement in accuracy with increased epochs: from 97.5% (one epoch) to 98.46% (five epochs).
Llama-2-70b-chat (with classes in the prompt) | 78.42% | N/A | Low accuracy in ambiguous scenarios.
Llama-2-70b-chat (without classes in the prompt) | 10.85% | N/A |
Flan-T5-XL (base model, without classes in the prompt) | 0.0% | N/A | Unable to identify any of the intent classes with the expected format.
Mistral-7b (base model, without classes in the prompt) | 0.0% | N/A | Unable to identify any of the intent classes with the expected format.

The following table contains a breakdown of models’ accuracy for each intent class.

Main Intent | Sub-intent | Example Count | Llama2-70b (without classes in prompt) | Llama2-70b (with classes in prompt) | Flant5-XL Fine-tuned | Mistral-7b Fine-tuned
Customer Retention | Complaint | 63 | 7.94% | 44.44% | 98.41% | 98.41%
Customer Retention | Escalation | 49 | 91.84% | 100% | 100% | 100%
Customer Retention | Free Product Upgrade | 50 | 0.00% | 64.00% | 100% | 100%
Health Cover | Add Extras | 38 | 0.00% | 100% | 97.37% | 100%
Health Cover | Add Hospital | 44 | 0.00% | 81.82% | 100% | 97.73%
Health Cover | Cancel Policy | 43 | 0.00% | 100% | 100% | 97.67%
Health Cover | New Policy | 41 | 0.00% | 82.93% | 100% | 100%
Health Cover | Remove Extras | 47 | 0.00% | 85.11% | 100% | 100%
Health Cover | Remove Hospital | 53 | 0.00% | 84.90% | 100% | 100%
Life Cover | Beneficiary Info | 45 | 0.00% | 100% | 97.78% | 97.78%
Life Cover | Cancel Policy | 47 | 0.00% | 55.32% | 100% | 100%
Life Cover | New Policy | 40 | 0.00% | 90.00% | 92.50% | 100%
Profile Update | Contact Info | 45 | 35.56% | 95.56% | 95.56% | 95.56%
Profile Update | Members | 52 | 0.00% | 36.54% | 98.08% | 98.08%
Profile Update | Payment Info | 47 | 40.43% | 97.87% | 100% | 100%
Technical Support | Login Issues | 39 | 0.00% | 92.31% | 97.44% | 100%
Technical Support | Portal Navigation | 40 | 0.00% | 45.00% | 95.00% | 97.50%

This comparative analysis illustrates the trade-offs between fine-tuning time and model accuracy. It highlights the ability of models like Mistral-7b and FlanT5-XL to achieve higher classification accuracy through fine-tuning. Additionally, it shows how smaller models can match or surpass the performance of larger models on specific tasks when fine-tuned, contrasted with using prompt engineering alone on the larger models.
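For reference, an exact-match benchmark of this kind can be computed with a short loop over the held-out test records. The following is a sketch that mirrors the helper names used earlier in this post and assumes each test record has the query/response structure shown in the fine-tuning section; the repository's evaluation code may differ:
import json

def evaluate_exact_match(predictor, test_file, max_tokens=13):
    """Score <main-intent>:<sub-intent> predictions against the labeled test set."""
    correct, total = 0, 0
    with open(test_file) as f:
        for line in f:
            record = json.loads(line)
            prompt = utils.FT_PROMPT.format(query=record["query"])
            response = utils.flant5(predictor, user=prompt, max_tokens=max_tokens)
            prediction = utils.parse_output(response).strip()
            correct += int(prediction == record["response"].strip())
            total += 1
    return correct / total

accuracy = evaluate_exact_match(finetuned_predictor, intent_dataset_test_file)
print(f"Overall accuracy: {accuracy:.2%}")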
Clean up
Complete the following steps to clean up your resources (a code sketch for the endpoint cleanup follows the list):

Delete the SageMaker endpoints, configuration, and models.
Delete the S3 bucket created for this example.
Delete the SageMaker notebook instance (if you used one to run this example).
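For the endpoint cleanup in the first step, the predictors created in this example can be removed programmatically; a minimal sketch:
# Delete the endpoints (and their endpoint configurations) and the associated models
for p in (predictor, base_predictor, finetuned_predictor):
    p.delete_model()
    p.delete_endpoint()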

Summary
Large language models have revolutionized information extraction from unstructured text data. These models excel in tasks such as classifying information and extracting key entities from various documents, achieving state-of-the-art results with minimal data.
This post demonstrated the use of large language models for information extraction through prompt engineering and fine-tuning. While effective, relying solely on prompt engineering can have limitations for complex tasks that require rigid output formats or a large number of classes. In these scenarios, fine-tuning even smaller models on domain-specific data can significantly improve performance beyond what prompt engineering alone can achieve.
The post included practical examples highlighting how fine-tuned smaller models can surpass prompt engineering with larger models for such complex use cases. Although prompt engineering is a good starting point for simpler use cases, fine-tuning offers a more robust solution for complex information extraction tasks, ensuring higher accuracy and adaptability to specific use cases. SageMaker JumpStart tools and services facilitate this process, making it accessible for individuals and teams across all levels of ML expertise.
Additional reading
You can read more on using SageMaker JumpStart for intelligent document processing, fine-tuning, and evaluation of LLMs in the following resources:

Enhancing AWS intelligent document processing with generative AI
Fine-tune and Deploy Mistral 7B with Amazon SageMaker JumpStart
Domain-adaptation Fine-tuning of Foundation Models in Amazon SageMaker JumpStart on Financial data
Instruction fine-tuning for FLAN T5 XL with Amazon SageMaker Jumpstart
Evaluate large language models for quality and responsibility

About the Authors
Pooya Vahidi  is a Senior Solutions Architect at AWS, passionate about computer science, artificial intelligence, and cloud computing. As an AI professional, he is an active member of the AWS AI/ML Area-of-Depth team. With a background spanning over two decades of expertise in leading the architecture and engineering of large-scale solutions, he helps customers on their transformative journeys through cloud and AI/ML technologies.
Dr. Romina Sharifpour is a Senior Machine Learning and Artificial Intelligence Solutions Architect at Amazon Web Services (AWS). She has spent over 10 years leading the design and implementation of innovative end-to-end solutions enabled by advancements in ML and AI. Romina’s areas of interest are natural language processing, large language models, and MLOps.

Self-Play Preference Optimization (SPPO): An Innovative Machine Learning Approach to Finetuning Large Language Models (LLMs) from Human/AI Feedback

Large Language Models (LLMs) have demonstrated remarkable abilities in generating human-like text, answering questions, and coding. However, they face hurdles in applications that require high reliability, safety, and ethical adherence. Reinforcement Learning from Human Feedback (RLHF), also known as Preference-based Reinforcement Learning (PbRL), has emerged as a promising solution. This framework has shown significant success in fine-tuning LLMs to align with human preferences, enhancing their usefulness.

Existing RLHF approaches, like InstructGPT, rely on explicit or implicit reward models, e.g., the Bradley-Terry model. Recent research explores direct preference probabilities to better represent human preferences. Some researchers formulate RLHF as finding Nash equilibriums in constant-sum games, proposing mirror descent and Self-play Preference Optimization (SPO) methods. Direct Nash Optimization (DNO) was also introduced based on win rate gaps, yet its practical implementation still relies on iterative DPO frameworks.

Researchers from the University of California, Los Angeles and Carnegie Mellon University introduce a robust self-play framework, Self-Play Preference Optimization (SPPO), for language model alignment that addresses RLHF challenges. It offers provable guarantees for solving two-player constant-sum games and scalability for large language models. By formulating RLHF as such a game, the objective is to identify the Nash equilibrium policy, ensuring consistently preferred responses. They propose an adaptive algorithm based on multiplicative weights, employing a self-play mechanism where the policy fine-tunes itself on synthetic data annotated by the preference model.

The self-play framework aims to solve two-player constant-sum games efficiently and at scale for large language models. It adopts an iterative framework based on multiplicative weight updates and a self-play mechanism. The algorithm asymptotically converges to the optimal policy, identifying the Nash equilibrium. Theoretical analysis ensures convergence, providing provable guarantees. Compared to existing methods like DPO and IPO, SPPO demonstrates improved convergence and addresses data sparsity issues efficiently.
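Schematically, a multiplicative-weights self-play update of this kind can be written as follows (notation simplified; the exact objective, learning rate, and normalization follow the paper):

\pi_{t+1}(y \mid x) \;\propto\; \pi_t(y \mid x)\,\exp\!\big(\eta \,\mathbb{P}(y \succ \pi_t \mid x)\big)

where \(\mathbb{P}(y \succ \pi_t \mid x)\) is the preference model's probability that response \(y\) is preferred over a response drawn from the current policy \(\pi_t\) for prompt \(x\), and \(\eta\) controls the step size; iterating this update pushes the policy toward the Nash equilibrium described above.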

The researchers evaluate models using GPT-4 for automatic evaluation, presenting results on AlpacaEval 2.0 and MT-Bench. SPPO models consistently improve across iterations, with SPPO Iter3 showing the highest win rate. Compared to DPO and IPO, SPPO achieves superior performance and effectively controls output length. Test-time reranking with the PairRM reward model consistently improves model performance without over-optimization. SPPO outperforms many state-of-the-art chatbots on AlpacaEval 2.0 and remains competitive with GPT-4 on MT-Bench. 

To conclude, the paper introduces Self-Play Preference Optimization (SPPO), a robust method for fine-tuning LLMs using Human/AI Feedback. By employing self-play in a two-player game and a preference-based learning objective, SPPO significantly improves over existing methods like DPO and IPO across various benchmarks. Integrating a preference model and batched estimation, SPPO aligns LLMs closely with human preferences, addressing issues like “length bias” reward hacking. These findings suggest SPPO’s potential for enhancing generative AI system alignment, advocating for its broader adoption in LLMs and beyond.

Check out the Paper. All credit for this research goes to the researchers of this project.
The post Self-Play Preference Optimization (SPPO): An Innovative Machine Learning Approach to Finetuning Large Language Models (LLMs) from Human/AI Feedback appeared first on MarkTechPost.