How Clearwater Analytics is revolutionizing investment management with …

This post was written with Darrel Cherry, Dan Siddall, and Rany ElHousieny of Clearwater Analytics.
As global trading volumes rise rapidly each year, capital markets firms are facing the need to manage large and diverse datasets to stay ahead. These datasets aren’t just expansive in volume; they’re critical in driving strategy development, enhancing execution, and streamlining risk management. The explosion of data creation and utilization, paired with the increasing need for rapid decision-making, has intensified competition and unlocked opportunities within the industry. To remain competitive, capital markets firms are adopting Amazon Web Services (AWS) Cloud services across the trade lifecycle to rearchitect their infrastructure, remove capacity constraints, accelerate innovation, and optimize costs.
Generative AI, alongside broader AI and machine learning (ML), is playing a vital role in helping capital markets firms speed up revenue generation, deliver new products, mitigate risk, and innovate on behalf of their customers. A great example of such innovation is our customer Clearwater Analytics and their use of large language models (LLMs) hosted on Amazon SageMaker JumpStart, which has propelled asset management productivity and delivered AI-powered investment management solutions to their customers.
In this post, we explore Clearwater Analytics’ foray into generative AI, how they’ve architected their solution with Amazon SageMaker, and dive deep into how Clearwater Analytics is using LLMs to take advantage of more than 18 years of experience within the investment management domain while optimizing model cost and performance.
About Clearwater Analytics
Clearwater Analytics (NYSE: CWAN) stands at the forefront of investment management technology. Founded in 2004 in Boise, Idaho, Clearwater has grown into a global software-as-a-service (SaaS) powerhouse, providing automated investment data reconciliation and reporting for over $7.3 trillion in assets across thousands of accounts worldwide. With a team of more than 1,600 professionals and a long-standing relationship with AWS dating back to 2008, Clearwater has consistently pushed the boundaries of financial technology innovation.
In May 2023, Clearwater embarked on a journey into the realm of generative AI, starting with a private, secure generative AI chat-based assistant for their internal workforce, enhancing client inquiries through Retrieval Augmented Generation (RAG). As a result, Clearwater was able to increase assets under management (AUM) over 20% without increasing operational headcount. By September of the same year, Clearwater unveiled its generative AI customer offerings at the Clearwater Connect User Conference, marking a significant milestone in their AI-driven transformation.
About SageMaker JumpStart
Amazon SageMaker JumpStart is an ML hub that can help you accelerate your ML journey. With SageMaker JumpStart, you can evaluate, compare, and select foundation models (FMs) quickly based on predefined quality and responsibility metrics to perform tasks such as article summarization and image generation. Pre-trained models are fully customizable for your use case with your data, and you can effortlessly deploy them into production with the user interface or AWS SDK. You can also share artifacts, including models and notebooks, within your organization to accelerate model building and deployment, and admins can control which models are visible to users within their organization.
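As a brief illustration of that workflow, the following sketch deploys and invokes a JumpStart model with the SageMaker Python SDK. The model ID matches the GPT-J-6B model discussed later in this post; the request payload format varies by model, so the example request is an assumption to adapt to your chosen model.

from sagemaker.jumpstart.model import JumpStartModel

# Deploy a pre-trained JumpStart model to a real-time endpoint. The request
# payload format is model-specific, so treat the example below as an assumption
# and check the model's documentation in SageMaker JumpStart.
model = JumpStartModel(model_id="huggingface-textgeneration1-gpt-j-6b")
predictor = model.deploy()

response = predictor.predict({"text_inputs": "What does the Clearwater Compliance module do?"})
print(response)

# Delete the endpoint when you're done to avoid ongoing charges
predictor.delete_endpoint()
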
Clearwater’s generative AI solution architecture
Clearwater Analytics’ generative AI architecture supports a wide array of vertical solutions by merging extensive functional capabilities through the LangChain framework, domain knowledge through RAG, and customized LLMs hosted on Amazon SageMaker. This integration has resulted in a potent asset for both Clearwater customers and their internal teams.
The following image illustrates the solution architecture.

As of September 2024, the AI solution supports three core applications:

Clearwater Intelligent Console (CWIC) – Clearwater’s customer-facing AI application. This assistant framework is built upon three pillars:

Knowledge awareness – Using RAG, CWIC compiles and delivers comprehensive knowledge that is crucial for customers from intricate calculations of book value to period-end reconciliation processes.
Application awareness – Transforming novice users into power users instantly, CWIC guides clients to inquire about Clearwater’s applications and receive direct links to relevant investment reports. For instance, if a client needs information on their yuan exposure, CWIC employs its tool framework to identify and provide links to the appropriate currency exposure reports.
Data awareness – Digging deep into portfolio data, CWIC adeptly manages complex queries, such as validating book yield tie-outs, by accessing customer-specific data and performing real-time calculations. The following image shows a snippet of the generative AI assistant within CWIC.

Crystal – Clearwater’s advanced AI assistant with expanded capabilities that empower internal teams’ operations. Crystal shares CWIC’s core functionalities but benefits from broader data sources and API access. Enhancements driven by Crystal have achieved efficiency gains between 25% and 43%, improving Clearwater’s ability to manage substantial increases in AUM without increases in staffing.
CWIC Specialists – Clearwater's most recent solution, CWIC Specialists, comprises domain-specific generative AI agents equipped to handle nuanced investment tasks, from accounting to regulatory compliance. These agents can work in single-agent or multi-agent workflows to answer questions, perform complex operations, and collaborate to solve various investment-related tasks. These specialists assist both internal teams and customers in domain-specific areas, such as investment accounting, regulatory requirements, and compliance information. Each specialist is underpinned by thousands of pages of domain documentation, which feeds into the RAG system and is used to train smaller, specialized models with Amazon SageMaker JumpStart. This approach enhances cost-effectiveness and performance to promote high-quality interactions.

In the next sections, we dive deep into how Clearwater Analytics is using Amazon SageMaker JumpStart to fine-tune models for productivity improvement and to deliver new AI services.
Clearwater's use of LLMs hosted on Amazon SageMaker JumpStart
Clearwater employs a two-pronged strategy for using LLMs. This approach addresses both high-complexity scenarios requiring powerful language models and domain-specific applications demanding rapid response times.

Advanced foundation models – For tasks involving intricate reasoning or creative output, Clearwater uses state-of-the-art pre-trained models such as Anthropic’s Claude or Meta’s Llama. These models excel in handling complex queries and generating innovative solutions.
Fine-tuned models for specialized knowledge – In cases where domain-specific expertise or swift responses are crucial, Clearwater uses fine-tuned models. These customized LLMs are optimized for industries or tasks that require accuracy and efficiency.

Fine-tuned models through domain adaptation with Amazon SageMaker JumpStart
Although general LLMs are powerful, their accuracy can be put to the test in specialized domains. This is where domain adaptation, also known as continued pre-training, comes into play. Domain adaptation is a sophisticated form of transfer learning that allows a pre-trained model to be fine-tuned for optimal performance in a different, yet related, target domain. This approach is particularly valuable when there’s a scarcity of labeled data in the target domain but an abundance in a related source domain.
These are some of the key benefits of domain adaptation:

Cost-effectiveness – Creating a curated set of questions and answers for instruction fine-tuning can be prohibitively expensive and time-consuming. Domain adaptation eliminates the need for thousands of manually created Q&As.
Comprehensive learning – Unlike instruction tuning, which only learns from provided questions, domain adaptation extracts information from entire documents, resulting in a more thorough understanding of the subject matter.
Efficient use of expertise – Domain adaptation frees up human experts from the time-consuming task of generating questions so they can focus on their primary responsibilities.
Faster deployment – With domain adaptation, specialized AI models can be developed and deployed more quickly, accelerating time to market for AI-powered solutions.

AWS has been at the forefront of domain adaptation, creating a framework for building powerful, specialized AI models. Using this framework, Clearwater has been able to train smaller, faster models tailored to specific domains without the need for extensive labeled datasets. This approach allows Clearwater to power digital specialists with a finely tuned model trained on a particular domain. The result? More responsive LLMs that form the backbone of their cutting-edge generative AI services.
The evolution of fine-tuning with Amazon SageMaker JumpStart
Clearwater is collaborating with AWS to enhance their fine-tuning processes. Amazon SageMaker JumpStart offered them a framework for domain adaptation, and over the past year Clearwater has seen significant improvements in the user interface and the ease of fine-tuning with SageMaker JumpStart.
For instance, the code required to set up and fine-tune a GPT-J-6B model has been drastically streamlined. Previously, a data scientist had to write over 100 lines of code in an Amazon SageMaker notebook to identify and retrieve the proper container image, set the right training script, and import the right hyperparameters. Now, with SageMaker JumpStart and advancements in the field, the process has been streamlined to a few lines of code:

from sagemaker.jumpstart.estimator import JumpStartEstimator

# model_id identifies the JumpStart model to fine-tune, for example
# "huggingface-textgeneration1-gpt-j-6b" for GPT-J-6B
estimator = JumpStartEstimator(
    model_id=model_id,
    hyperparameters={"epoch": "3", "per_device_train_batch_size": "4"},
)

# initiate the training process with the S3 paths of the data
estimator.fit(
    {"train": training_dataset_s3_path, "validation": validation_dataset_s3_path}, logs=True
)
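
After training completes, deploying and testing the fine-tuned model is similarly concise. The following sketch assumes the estimator above has finished training; the request format is an assumption for the GPT-J text generation container, so adjust it to the schema documented for your model.

# Deploy the fine-tuned model to a real-time endpoint and run a quick smoke test
predictor = estimator.deploy()
print(predictor.predict({"text_inputs": "What does the Clearwater Compliance module do?"}))

# Remove the endpoint when finished to avoid ongoing charges
predictor.delete_endpoint()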

A fine-tuning example: Clearwater’s approach
For Clearwater's AI, the team successfully fine-tuned a GPT-J-6B model (JumpStart model ID huggingface-textgeneration1-gpt-j-6b) with domain adaptation using Amazon SageMaker JumpStart. The following are the concrete steps used in the fine-tuning process, which can serve as a blueprint for others implementing similar strategies (a condensed code sketch follows the list). A detailed tutorial can be found in the amazon-sagemaker-examples repo.

Document assembly – Gather all relevant documents that will be used for training. This includes help content, manuals, and other domain-specific text. The data Clearwater used for training this model is public help content that contains no client data. Clearwater uses client data, with the client's collaboration and approval, only to fine-tune a model dedicated solely to that specific client. Curation, cleaning, and de-identification of data are necessary for training and subsequent tuning operations.
Test set creation – Develop a set of questions and answers that will be used to evaluate the model's performance before and after fine-tuning. Clearwater has implemented a sophisticated model evaluation system to further assess the performance of open source and commercial models. This is covered in more detail in the Model evaluation and optimization section later in this post.
Pre-trained model deployment – Deploy the original, pre-trained GPT-J-6B model.
Baseline testing – Use the question set to test the pre-trained model, establishing a performance baseline.
Pre-trained model teardown – Remove the pre-trained model to free up resources.
Data preparation – Upload the assembled documents to an S3 bucket, making sure they’re in a format suitable for the fine-tuning process.
Fine-tuning – Train the new model using the uploaded documents, adjusting hyperparameters as needed.
Fine-tuned model testing – Evaluate the fine-tuned model using the same question set used for the baseline.
Fine-tuned model teardown – If not immediately needed, tear down the fine-tuned model to optimize resource usage.
RAG comparison – Test a RAG-based system using the same question set for an additional point of comparison.
Performance evaluation – Analyze the results from all tests to assess the effectiveness of the fine-tuning process.
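
The following condensed sketch shows how steps 3 through 9 of this blueprint might look with the SageMaker JumpStart Python classes. It is a sketch under assumptions: the question list and payload format are placeholders, the S3 path variables are those defined in the earlier fine-tuning snippet, and the notebook in the amazon-sagemaker-examples repo remains the authoritative walkthrough.

from sagemaker.jumpstart.model import JumpStartModel
from sagemaker.jumpstart.estimator import JumpStartEstimator

model_id = "huggingface-textgeneration1-gpt-j-6b"
questions = ["What does Clearwater Compliance module do?"]  # placeholder test set

# Steps 3-5: deploy the pre-trained model, record baseline answers, tear it down
baseline_predictor = JumpStartModel(model_id=model_id).deploy()
baseline_answers = [baseline_predictor.predict({"text_inputs": q}) for q in questions]
baseline_predictor.delete_endpoint()

# Steps 6-7: fine-tune on the domain documents previously uploaded to Amazon S3
estimator = JumpStartEstimator(
    model_id=model_id,
    hyperparameters={"epoch": "3", "per_device_train_batch_size": "4"},
)
estimator.fit({"train": training_dataset_s3_path, "validation": validation_dataset_s3_path})

# Steps 8-9: evaluate the fine-tuned model on the same questions, then tear it down
tuned_predictor = estimator.deploy()
tuned_answers = [tuned_predictor.predict({"text_inputs": q}) for q in questions]
tuned_predictor.delete_endpoint()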

The following shows how the results look before tuning an open source model (GPT-J-6B), after tuning, and using a commercial model with a larger context window (Claude-3.5-Sonnet).
Prompt
What does Clearwater Compliance module do?

Reference Answer
The Clearwater Compliance module supports client investment policies, providing daily updates, including compliance warnings and violations. Clearwater's Compliance module monitors various portfolio characteristics, including asset and security type allocation, sector exposure, investment time horizon, and credit quality. The Compliance module enables users to establish and update investment policies, monitor changes to accounts, and provide audit reporting. Rules provide flexibility to evaluate data from the individual security-level to aggregate values at the portfolio-level. User-identified investments can also be prohibited. Multiple investment policies can be applied to individual accounts and aggregates of accounts.

GPT-J-6B pre-trained
The Clearwater Compliance module is an integrated set of controls you can use to manage your compliance with the federal and state requirements for the handling and transportation of hazardous materials.

GPT-J-6B fine-tuned
The Clearwater Compliance module is a suite of tools designed to help clients monitor and manage their compliance policies and procedures.
· Compliance Overview dashboard – Provides a high-level view of compliance policies and a history of compliance policy changes.
· Compliance Settings tool – Users can update compliance settings, such as the frequency at which compliance policies are checked and the level of compliance required to access a compliance-restricted area of the website.
· Compliance Audit report – Provides a detailed list of policies and rules that weren't met. The report includes a breakdown of rule violations and the actions required to bring the policies into compliance.
· Compliance Notifications tool – Sends notifications to designated recipients, such as the client's compliance officer, when a compliance rule is about to expire or has been violated.
· Compliance Documents tool – Users can upload and manage compliance policies and rules.

Claude-3.5-Sonnet with RAG
The Clearwater Compliance module is a sophisticated tool designed to support and monitor client investment policies. Here's what it does:
· Daily Monitoring – Provides daily updates, including compliance warnings and violations, to foster ongoing adherence to investment policies.
· Portfolio characteristic tracking – Monitors various aspects of portfolios, including asset and security type allocation, sector exposure, investment time horizon, and credit quality.
· Policy management – Users can establish, update, and manage investment policies through the module.
· Flexible rule application – Allows for the evaluation of compliance rules at both individual security and portfolio levels.
· Custom policy creation – Clearwater works with clients to create compliance rules based on their specific investment policies and interpretations.
· Multi-account support – Multiple investment policies can be applied to individual accounts or aggregates of accounts.
· Reporting and auditing – Generates audit reports and provides tools for monitoring account changes.
· Risk mitigation – Offers integrated, customized daily reports on the Clearwater Reporting Site, to proactively manage risk.
· Regulatory compliance – Includes external and regulatory policies when provided by the client, providing broader compliance coverage.
· Violation tracking – Identifies compliance rule violations, tracks the number of days in violation, and allows for resolution statements.

Although the Compliance module is comprehensive, it has some limitations. It only supports objective rules using data available within the Clearwater system and doesn't cover areas such as pre-trade compliance, investments without publicly available data, or scenario testing.
Overall, the Clearwater Compliance module serves as a robust tool for maintaining investment policy compliance, offering daily monitoring, customization, and reporting features to support effective risk management and regulatory adherence.
Model evaluation and optimization
Clearwater employs a sophisticated evaluation system to assess the performance of new models available on Amazon SageMaker JumpStart, so that only models demonstrating superior capabilities are integrated into the production environment.
Clearwater’s LLM operations (LLMOps) pipeline plays a crucial role in this process, automating the evaluation and seamless integration of new models. This commitment to using the most effective LLMs for each unique task with cutting-edge technology and optimal performance is the cornerstone of Clearwater’s approach.
The evaluation phase is crucial for determining the success of the fine-tuning process. As you determine the evaluation process and framework to use, make sure they fit the criteria of your domain. At Clearwater, we designed our own internal evaluation framework to meet the specific needs of our investment management and accounting domains.
Here are key considerations:

Performance comparison – The fine-tuned model should outperform the pre-trained model on domain-specific tasks. If it doesn’t, it might indicate that the pre-trained model already had significant knowledge in this area.
RAG benchmark – Compare the fine-tuned model's performance against a RAG system using a pre-trained model. If the fine-tuned model doesn't at least match RAG performance, troubleshooting is necessary (a minimal comparison sketch follows this list).
Troubleshooting checklist:

Data format suitability for fine-tuning
Completeness of the training dataset
Hyperparameter optimization
Potential overfitting or underfitting
Cost-benefit analysis – Estimate the operational costs of using a RAG system with a pre-trained model (for example, Claude-3.5 Sonnet) compared with deploying the fine-tuned model at production scale.
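
As a minimal illustration of the performance and RAG comparisons above, the following sketch scores each system's answers against reference answers using a simple string-similarity metric from the Python standard library. It is only a placeholder: Clearwater's internal evaluation framework is more sophisticated, and the answer lists shown are assumed to be plain-text responses collected from each system.

from difflib import SequenceMatcher

def similarity(answer: str, reference: str) -> float:
    # Crude lexical similarity in [0, 1]; substitute your own domain-specific metric
    return SequenceMatcher(None, answer.lower(), reference.lower()).ratio()

def average_score(answers, references):
    return sum(similarity(a, r) for a, r in zip(answers, references)) / len(references)

# baseline_answers, tuned_answers, rag_answers, and reference_answers are
# placeholders for plain-text answers collected on the same question set
for name, answers in [
    ("GPT-J-6B pre-trained", baseline_answers),
    ("GPT-J-6B fine-tuned", tuned_answers),
    ("RAG with a pre-trained model", rag_answers),
]:
    print(f"{name}: {average_score(answers, reference_answers):.3f}")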

Advanced considerations:

Iterative fine-tuning – Consider multiple rounds of fine-tuning, gradually introducing more specific or complex data.
Multi-task learning – If applicable, fine-tune the model on multiple related domains simultaneously to improve its versatility.
Continual learning – Implement strategies to update the model with new information over time without full retraining.

Conclusion
For businesses and organizations seeking to harness the power of AI in specialized domains, domain adaptation presents significant opportunities. Whether you’re in healthcare, finance, legal services, or any other specialized field, adapting LLMs to your specific needs can provide a significant competitive advantage.
By following this comprehensive approach with Amazon SageMaker, organizations can effectively adapt LLMs to their specific domains, achieving better performance and potentially more cost-effective solutions than generic models with RAG systems. However, the process requires careful monitoring, evaluation, and optimization to achieve the best results.
As we've observed with Clearwater's success, partnering with an experienced AI provider such as AWS can help navigate the complexities of domain adaptation and unlock its full potential. By embracing this technology, you can create AI solutions that are not just powerful, but also truly tailored to your unique requirements and expertise.
The future of AI isn’t just about bigger models, but smarter, more specialized ones. Domain adaptation is paving the way for this future, and those who harness its power will emerge as leaders in their respective industries.
Get started with Amazon SageMaker JumpStart on your fine-tuning LLM journey today.

About the Authors
Darrel Cherry is a Distinguished Engineer with over 25 years of experience leading organizations to create solutions for complex business problems. With a passion for emerging technologies, he has architected large cloud and data processing solutions, including machine learning and deep learning AI applications. Darrel holds 19 US patents and has contributed to various industry publications. In his current role at Clearwater Analytics, Darrel leads technology strategy for AI solutions, as well as Clearwater’s overall enterprise architecture. Outside the professional sphere, he enjoys traveling, auto racing, and motorcycling, while also spending quality time with his family.
Dan Siddall, a Staff Data Scientist at Clearwater Analytics, is a seasoned expert in generative AI and machine learning, with a comprehensive understanding of the entire ML lifecycle from development to production deployment. Recognized for his innovative problem-solving skills and ability to lead cross-functional teams, Dan leverages his extensive software engineering background and strong communication abilities to bridge the gap between complex AI concepts and practical business solutions.
Rany ElHousieny is an Engineering Leader at Clearwater Analytics with over 30 years of experience in software development, machine learning, and artificial intelligence. He has held leadership roles at Microsoft for two decades, where he led the NLP team at Microsoft Research and Azure AI, contributing to advancements in AI technologies. At Clearwater, Rany continues to leverage his extensive background to drive innovation in AI, helping teams solve complex challenges while maintaining a collaborative approach to leadership and problem-solving.
Pablo Redondo is a Principal Solutions Architect at Amazon Web Services. He is a data enthusiast with over 18 years of FinTech and healthcare industry experience and is a member of the AWS Analytics Technical Field Community (TFC). Pablo has been leading the AWS Gain Insights Program to help AWS customers achieve better insights and tangible business value from their data analytics and AI/ML initiatives. In his spare time, Pablo enjoys quality time with his family and plays pickleball in his hometown of Petaluma, CA.
Prashanth Ganapathy is a Senior Solutions Architect in the Small Medium Business (SMB) segment at AWS. He enjoys learning about AWS AI/ML services and helping customers meet their business outcomes by building solutions for them. Outside of work, Prashanth enjoys photography, travel, and trying out different cuisines.

This AI Paper Introduces A Maximum Entropy Inverse Reinforcement Learn …

Diffusion models are closely linked to imitation learning because they generate samples by gradually refining random noise into meaningful data. This process is guided by behavioral cloning, a common imitation learning approach where the model learns to copy an expert's actions step by step. For diffusion models, the predefined process transforms noise into a final sample, and following this process ensures high-quality results in various tasks. However, behavioral cloning also makes generation slow: the model is trained to follow a detailed path of many small, computationally expensive steps, often requiring hundreds or thousands of network evaluations, and reducing the number of steps degrades sample quality.

Current methods optimize the sampling process without changing the model, such as tuning noise schedules, improving differential equation solvers, and using non-Markovian methods. Others enhance sample quality by training neural networks for short-run sampling. Distillation techniques show promise but usually perform below their teacher models, whereas adversarial or reinforcement learning (RL) methods may surpass them. RL updates the diffusion model based on reward signals using policy gradients or different value functions.

To solve this, researchers from the Korea Institute for Advanced Study, Seoul National University, University of Seoul, Hanyang University, and Saige Research proposed two advancements in diffusion models. The first approach, called Diffusion by Maximum Entropy Inverse Reinforcement Learning (DxMI), combined two methods: diffusion and Energy-Based Models (EBM). In this method, EBM used rewards to measure how good the results were. The goal was to adjust the reward and entropy (uncertainty) in the diffusion model to make training more stable and ensure that both models performed well with the data. The second advancement, Diffusion by Dynamic Programming (DxDP), introduced a reinforcement learning algorithm that simplified entropy estimation by optimizing an upper bound of the objective and eliminated the need for back-propagation through time by framing the task as an optimal control problem, applying dynamic programming for faster and more efficient convergence.

The experiments demonstrated DxMI's effectiveness in training diffusion and energy-based models (EBMs) for tasks like image generation and anomaly detection. For 2D synthetic data, DxMI improved sample quality and energy function accuracy with a proper entropy regularization parameter. The experiments also showed that pre-training with DDPM is useful but not necessary for DxMI to function. For image generation, DxMI fine-tuned models such as DDPM and EDM to use fewer generation steps while remaining competitive in quality. In anomaly detection, the energy function learned by DxMI performed better in detecting and localizing anomalies on the MVTec-AD dataset. Entropy maximization improved performance by promoting exploration and increasing model diversity.

In summary, the proposed method greatly advances the efficiency and quality of diffusion generative models by using the DxMI approach. It solves the issues of previous methods, such as slow generation speeds and degraded sample quality. However, it is not directly suitable for training single-step generators, but a diffusion model fine-tuned by DxMI can be converted into one. DxMI lacks the flexibility to use different generation steps during testing. This method can be used for upcoming research in this domain and serve as a baseline, making a significant difference!

Check out the Paper. All credit for this research goes to the researchers of this project.


Researchers at Stanford Introduce UniTox: A Unified Dataset of 2,418 F …

Drug-induced toxicity is a major challenge in drug development, contributing significantly to the failure of clinical trials. While efficacy issues account for most failures, safety concerns are the second leading cause, at 24%. Toxicities can affect various organ systems, including the heart, liver, kidneys, and lungs, and even approved drugs may face withdrawal due to unforeseen toxic effects in post-market surveillance. Current toxicity datasets, often derived from labor-intensive expert analyses of FDA drug labels, are typically small and limited to specific organ systems. These documents, which detail a drug’s indications, risks, and clinical trial results, are critical but time-consuming to curate, often exceeding 100 pages per drug. Consequently, there is a pressing need for predictive models to identify safer drug candidates early in development.

Efforts to build comprehensive toxicity datasets have faced several limitations. Existing databases, such as SIDER, LiverTox, and PNEUMOTOX, are often organ-specific or rely on in vitro assays, which may not accurately predict in vivo effects. Annotation efforts are time-intensive, and methodologies for toxicity evaluation vary widely, leading to inconsistencies across datasets. For instance, the FDA’s renal toxicity database, DIRIL, integrates conflicting sources with over 30% disagreement on certain drugs. Large language models (LLMs) like askFDALabel offer promise by streamlining data extraction from FDA labels, achieving up to 78% agreement with human evaluations for cardiotoxicity. However, despite advancements, challenges in dataset scalability, annotation consistency, and comprehensive coverage persist, limiting the effectiveness of ML models trained on these datasets.

Researchers from Stanford University and Genmab introduced UniTox, a comprehensive dataset of 2,418 FDA-approved drugs, summarizing and rating drug-induced toxicities using GPT-4o to process FDA drug labels. Covering eight toxicity types, including cardiotoxicity, liver toxicity, and infertility, UniTox is the largest systematic in vivo database and the first to encompass nearly all non-combination FDA-approved drugs for these toxicities. Clinicians validated a subset of GPT-4o annotations, with concordance rates of 85–96%. Benchmarks of machine learning models trained on UniTox demonstrated its utility for predicting molecular toxicity, achieving up to 93% accuracy on existing datasets and surpassing askFDALabel’s performance.

To develop UniTox, researchers curated a dataset of 2,418 FDA-approved drugs by filtering and deduplicating drug labels from the FDALabel database, including biologics. Using GPT-4o and a two-step chain-of-thought prompting system, the team generated toxicity summaries and ratings for eight toxicity types. The model categorized toxicity using ternary (No, Less, Most) and binary (Yes, No) scales. The validation included comparisons with existing FDA datasets (DICTrank, DILIrank, DIRIL) and clinician reviews, achieving strong concordance. Clinicians evaluated a subset for toxicity types lacking prior data, scoring the model’s outputs based on factual accuracy and alignment with expert knowledge.
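
The two-step prompting scheme described above can be sketched roughly as follows using the OpenAI Python SDK. The prompts, rating-scale wording, and function name are illustrative assumptions, not the authors' actual prompts or code.

from openai import OpenAI

client = OpenAI()

def summarize_and_rate(label_text: str, toxicity_type: str) -> tuple[str, str]:
    """Illustrative two-step pipeline: summarize an FDA label, then rate toxicity."""
    # Step 1: summarize the lengthy FDA drug label with respect to one toxicity type
    summary = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": f"Summarize the {toxicity_type} findings in this FDA drug label:\n{label_text}",
        }],
    ).choices[0].message.content

    # Step 2: rate the toxicity on a ternary scale based on the summary
    rating = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": f"Based on this summary, rate the drug's {toxicity_type} as No, Less, or Most:\n{summary}",
        }],
    ).choices[0].message.content
    return summary, rating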

The UniTox dataset, encompassing 2,418 drugs and eight toxicity types, offers a comprehensive resource for toxicity analysis. It includes GPT-4o-generated toxicity summaries, ternary and binary classifications, and Structured Product Labeling (SPL) IDs. Summaries condense lengthy drug labels into 297 words on average, aiding quick comprehension and enabling their use as ground truth for training toxicity predictors. The dataset reveals toxicity correlations, with liver and hematological toxicity showing the highest correlation (0.45). UniTox also provides insights into toxicity patterns across drug classes based on WHO-ATC classifications, highlighting variations linked to FDA risk tolerance for different therapeutic categories.

In conclusion, the study highlights the use of GPT-4o for efficiently summarizing complex drug labels, producing accurate toxicity ratings across eight types, including liver, renal, and cardiotoxicity. These ratings showed strong agreement with datasets like DILIrank and with clinical reviewers, enabling the training of molecular classifiers with predictive value. The UniTox dataset, comprising 2,418 FDA-approved drugs, is the largest of its kind and bridges gaps in toxicity evaluation across multiple organ systems. Despite challenges like nuanced toxicity translation and limited applicability to failed drugs, UniTox demonstrates the value of LLMs in creating detailed datasets, advancing drug toxicity prediction, and supporting future research efforts.

Check out the Paper and Dataset. All credit for this research goes to the researchers of this project.


Meet Ivy-VL: A Lightweight Multimodal Model with Only 3 Billion Parame …

The ongoing advancement in artificial intelligence highlights a persistent challenge: balancing model size, efficiency, and performance. Larger models often deliver superior capabilities but require extensive computational resources, which can limit accessibility and practicality. For organizations and individuals without access to high-end infrastructure, deploying multimodal AI models that process diverse data types, such as text and images, becomes a significant hurdle. Addressing these challenges is crucial to making AI solutions more accessible and efficient.

Ivy-VL, developed by AI-Safeguard, is a compact multimodal model with 3 billion parameters. Despite its small size, Ivy-VL delivers strong performance across multimodal tasks, balancing efficiency and capability. Unlike traditional models that prioritize performance at the expense of computational feasibility, Ivy-VL demonstrates that smaller models can be both effective and accessible. Its design focuses on addressing the growing demand for AI solutions in resource-constrained environments without compromising quality.

Leveraging advancements in vision-language alignment and parameter-efficient architecture, Ivy-VL optimizes performance while maintaining a low computational footprint. This makes it an appealing option for industries like healthcare and retail, where deploying large models may not be practical.

Technical Details

Ivy-VL is built on an efficient transformer architecture, optimized for multimodal learning. It integrates vision and language processing streams, enabling robust cross-modal understanding and interaction. By using advanced vision encoders alongside lightweight language models, Ivy-VL achieves a balance between interpretability and efficiency.

Key features include:

Resource Efficiency: With 3 billion parameters, Ivy-VL requires less memory and computation compared to larger models, making it cost-effective and environmentally friendly.

Performance Optimization: Ivy-VL delivers strong results across multimodal tasks, such as image captioning and visual question answering, without the overhead of larger architectures.

Scalability: Its lightweight nature allows deployment on edge devices, broadening its applicability in areas such as IoT and mobile platforms.

Fine-tuning Capability: Its modular design simplifies fine-tuning for domain-specific tasks, facilitating quick adaptation to different use cases.

Results and Insights

Ivy-VL’s performance across various benchmarks underscores its effectiveness. For instance, it achieves a score of 81.6 on the AI2D benchmark and 82.6 on MMBench, showcasing its robust multimodal capabilities. In the ScienceQA benchmark, Ivy-VL achieves a high score of 97.3, demonstrating its ability to handle complex reasoning tasks. Additionally, it performs well in RealWorldQA and TextVQA, with scores of 65.75 and 76.48, respectively.

These results highlight Ivy-VL’s ability to compete with larger models while maintaining a lightweight architecture. Its efficiency makes it well-suited for real-world applications, including those requiring deployment in resource-limited environments.

Conclusion

Ivy-VL represents a promising development in lightweight, efficient AI models. With just 3 billion parameters, it provides a balanced approach to performance, scalability, and accessibility. This makes it a practical choice for researchers and organizations seeking to deploy AI solutions in diverse environments.

As AI becomes increasingly integrated into everyday applications, models like Ivy-VL play a key role in enabling broader access to advanced technology. Its combination of technical efficiency and strong performance sets a benchmark for the development of future multimodal AI systems.

Check out the Model on Hugging Face. All credit for this research goes to the researchers of this project.


Accelerate your ML lifecycle using the new and improved Amazon SageMak …

In Part 1 of this series, we introduced the newly launched ModelTrainer class on the Amazon SageMaker Python SDK and its benefits, and showed you how to fine-tune a Meta Llama 3.1 8B model on a custom dataset. In this post, we look at the enhancements to the ModelBuilder class, which lets you seamlessly deploy a model from ModelTrainer to a SageMaker endpoint, and provides a single interface for multiple deployment configurations.
In November 2023, we launched the ModelBuilder class (see Package and deploy models faster with new tools and guided workflows in Amazon SageMaker and Package and deploy classical ML and LLMs easily with Amazon SageMaker, part 1: PySDK Improvements), which reduced the complexity of the initial setup for a SageMaker endpoint (such as creating an endpoint configuration, choosing the container, and handling serialization and deserialization) and helps you create a deployable model in a single step. The recent update enhances the usability of the ModelBuilder class for a wide range of use cases, particularly in the rapidly evolving field of generative AI. In this post, we dive deep into the enhancements made to the ModelBuilder class, and show you how to seamlessly deploy the fine-tuned model from Part 1 to a SageMaker endpoint.
Improvements to the ModelBuilder class
We’ve made the following usability improvements to the ModelBuilder class:

Seamless transition from training to inference – ModelBuilder now integrates directly with SageMaker training interfaces to make sure that the correct file path to the latest trained model artifact is automatically computed, simplifying the workflow from model training to deployment.
Unified inference interface – Previously, the SageMaker SDK offered separate interfaces and workflows for different types of inference, such as real-time, batch, serverless, and asynchronous inference. To simplify the model deployment process and provide a consistent experience, we have enhanced ModelBuilder to serve as a unified interface that supports multiple inference types.
Ease of development, testing, and production handoff – We are adding support for local mode testing with ModelBuilder, so you can debug and test your processing and inference scripts locally without building a container, as well as a new function that returns the latest container image for a given framework, so you don't have to update your code each time a new LMI release comes out.
Customizable inference preprocessing and postprocessing – ModelBuilder now allows you to customize preprocessing and postprocessing steps for inference. By enabling scripts to filter content and remove personally identifiable information (PII), this integration streamlines the deployment process, encapsulating the necessary steps within the model configuration for better management and deployment of models with specific inference requirements.
Benchmarking support – The new benchmarking support in ModelBuilder empowers you to evaluate deployment options—like endpoints and containers—based on key performance metrics such as latency and cost. With the introduction of a Benchmarking API, you can test scenarios and make informed decisions, optimizing your models for peak performance before production. This enhances efficiency and provides cost-effective deployments.

In the following sections, we discuss these improvements in more detail and demonstrate how to customize, test, and deploy your model.
Seamless deployment from ModelTrainer class
ModelBuilder integrates seamlessly with the ModelTrainer class; you can simply pass the ModelTrainer object that was used for training the model directly to ModelBuilder in the model parameter. In addition to the ModelTrainer, ModelBuilder also supports the Estimator class and the result of the SageMaker Core TrainingJob.create() function, and automatically parses the model artifacts to create a SageMaker Model object. With resource chaining, you can build and deploy the model as shown in the following example. If you followed Part 1 of this series to fine-tune a Meta Llama 3.1 8B model, you can pass the model_trainer object as follows:

from sagemaker.serve.builder.model_builder import ModelBuilder

# set container URI
image_uri = "763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.3.0-tgi2.2.0-gpu-py310-cu121-ubuntu22.04-v2.0"

model_builder = ModelBuilder(
    model=model_trainer,  # ModelTrainer object passed onto ModelBuilder directly
    role_arn=role,  # IAM execution role from your session
    image_uri=image_uri,
    inference_spec=inf_spec,  # custom InferenceSpec, shown in the next section
    instance_type="ml.g5.2xlarge"
)
# deploy the model
model_builder.build().deploy()

Customize the model using InferenceSpec
The InferenceSpec class allows you to customize the model by providing custom logic to load and invoke the model, and specify any preprocessing logic or postprocessing logic as needed. For SageMaker endpoints, preprocessing and postprocessing scripts are often used as part of the inference pipeline to handle tasks that are required before and after the data is sent to the model for predictions, especially in the case of complex workflows or non-standard models. The following example shows how you can specify the custom logic using InferenceSpec:

import json

from sagemaker.serve.spec.inference_spec import InferenceSpec

class CustomerInferenceSpec(InferenceSpec):
    def load(self, model_dir):
        from transformers import AutoModel
        # HF_TEI_MODEL is the Hugging Face model ID you want to serve
        return AutoModel.from_pretrained(HF_TEI_MODEL, trust_remote_code=True)

    def invoke(self, x, model):
        return model.encode(x)

    def preprocess(self, input_data):
        return json.loads(input_data)["inputs"]

    def postprocess(self, predictions):
        assert predictions is not None
        return predictions

Test using local and in process mode
Deploying a trained model to a SageMaker endpoint involves creating a SageMaker model and configuring the endpoint. This includes the inference script, any serialization or deserialization required, the model artifact location in Amazon Simple Storage Service (Amazon S3), the container image URI, the right instance type and count, and more. The machine learning (ML) practitioners need to iterate over these settings before finally deploying the endpoint to SageMaker for inference. The ModelBuilder offers two modes for quick prototyping:

In process mode – In this case, the inferences are made directly within the same inference process. This is highly useful in quickly testing the inference logic provided through InferenceSpec and provides immediate feedback during experimentation.
Local mode – The model is deployed and run as a local container. This is achieved by setting the mode to LOCAL_CONTAINER when you build the model. This is helpful to mimic the same environment as the SageMaker endpoint. Refer to the following notebook for an example.

The following code is an example of running inference in process mode, with a custom InferenceSpec:

from sagemaker.serve.spec.inference_spec import InferenceSpec
from transformers import pipeline
from sagemaker.serve import Mode
from sagemaker.serve.builder.schema_builder import SchemaBuilder
from sagemaker.serve.builder.model_builder import ModelBuilder

value: str = "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:"
schema = SchemaBuilder(value,
            {"generated_text": "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron: Hi, Daniel. I was just thinking about how magnificent giraffes are and how they should be worshiped by all.\nDaniel: You and I think alike, Girafatron. I think all animals should be worshipped! But I guess that could be a bit impractical...\nGirafatron: That's true. But the giraffe is just such an amazing creature and should always be respected!\nDaniel: Yes! And the way you go on about giraffes, I could tell you really love them.\nGirafatron: I'm obsessed with them, and I'm glad to hear you noticed!\nDaniel: I'"})

# custom inference spec with hugging face pipeline
class MyInferenceSpec(InferenceSpec):
    def load(self, model_dir: str):
        ...
    def invoke(self, input, model):
        ...
    def preprocess(self, input_data):
        ...
    def postprocess(self, predictions):
        ...

inf_spec = MyInferenceSpec()

# Build ModelBuilder object in IN_PROCESS mode
builder = ModelBuilder(inference_spec=inf_spec,
                       mode=Mode.IN_PROCESS,
                       schema_builder=schema
                      )

# Build and deploy the model
model = builder.build()
predictor = model.deploy()

# make predictions
predictor.predict("How are you today?")

As a next step, you can test the model in local container mode, as shown in the following code, by adding the image_uri. You need to include the model_server argument when you specify the image_uri.

image_uri = '763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-inference:2.0.0-transformers4.28.1-gpu-py310-cu118-ubuntu20.04'

builder = ModelBuilder(inference_spec=inf_spec,
                       mode=Mode.LOCAL_CONTAINER,  # you can change it to Mode.SAGEMAKER_ENDPOINT for endpoint deployment
                       schema_builder=schema,
                       image_uri=image_uri,
                       model_server=ModelServer.TORCHSERVE
                      )

model = builder.build()
predictor = model.deploy()

predictor.predict("How are you today?")

Deploy the model
When testing is complete, you can deploy the model to a real-time endpoint for predictions by updating the mode to Mode.SAGEMAKER_ENDPOINT and providing an instance type and size:

sm_predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    mode=Mode.SAGEMAKER_ENDPOINT,
    role=execution_role,
)

sm_predictor.predict("How is the weather?")

In addition to real-time inference, SageMaker supports serverless inference, asynchronous inference, and batch inference modes for deployment. You can also use InferenceComponents to abstract your models and assign CPU, GPU, accelerators, and scaling policies per model. To learn more, see Reduce model deployment costs by 50% on average using the latest features of Amazon SageMaker.
After you have the ModelBuilder object, you can deploy to any of these options simply by adding the corresponding inference configurations when deploying the model. By default, if the mode is not provided, the model is deployed to a real-time endpoint. The following are examples of other configurations:

Deploy the model to a serverless endpoint:

from sagemaker.serverless.serverless_inference_config import ServerlessInferenceConfig
predictor = model_builder.deploy(
    endpoint_name="serverless-endpoint",
    inference_config=ServerlessInferenceConfig(memory_size_in_mb=2048))

Deploy the model to an asynchronous endpoint:

from sagemaker.async_inference.async_inference_config import AsyncInferenceConfig
from sagemaker.s3_utils import s3_path_join

predictor = model_builder.deploy(
    endpoint_name="async-endpoint",
    inference_config=AsyncInferenceConfig(
        output_path=s3_path_join("s3://", bucket, "async_inference/output")))

Run a batch transform job for offline inference on a dataset:

from sagemaker.batch_inference.batch_transform_inference_config import BatchTransformInferenceConfig

transformer = model_builder.deploy(
    endpoint_name="batch-transform-job",
    inference_config=BatchTransformInferenceConfig(
        instance_count=1,
        instance_type='ml.m5.large',
        output_path=s3_path_join("s3://", bucket, "batch_inference/output"),
        test_data_s3_path=s3_test_path
    ))
print(transformer)

Deploy a multi-model endpoint using InferenceComponent:

from sagemaker.compute_resource_requirements.resource_requirements import ResourceRequirements

predictor = model_builder.deploy(
    endpoint_name="multi-model-endpoint",
    inference_config=ResourceRequirements(
        requests={
            "num_cpus": 0.5,
            "memory": 512,
            "copies": 2,
        },
        limits={},
))

Clean up
If you created any endpoints while following this post, you will incur charges while they are up and running. As a best practice, delete any endpoints that are no longer required, either using the AWS Management Console or the following code:

predictor.delete_model()
predictor.delete_endpoint()

Conclusion
In this two-part series, we introduced the ModelTrainer and the ModelBuilder enhancements in the SageMaker Python SDK. Both classes aim to reduce the complexity and cognitive overhead for data scientists, providing you with a straightforward and intuitive interface to train and deploy models, both locally on your SageMaker notebooks and to remote SageMaker endpoints.
We encourage you to try out the SageMaker SDK enhancements (SageMaker Core, ModelTrainer, and ModelBuilder) by referring to the SDK documentation and sample notebooks on the GitHub repo, and let us know your feedback in the comments!

About the Authors
Durga Sury is a Senior Solutions Architect on the Amazon SageMaker team. Over the past 5 years, she has worked with multiple enterprise customers to set up a secure, scalable AI/ML platform built on SageMaker.
Shweta Singh is a Senior Product Manager in the Amazon SageMaker Machine Learning (ML) platform team at AWS, leading SageMaker Python SDK. She has worked in several product roles in Amazon for over 5 years. She has a Bachelor of Science degree in Computer Engineering and a Masters of Science in Financial Engineering, both from New York University.

Accelerate your ML lifecycle using the new and improved Amazon SageMak …

Amazon SageMaker has redesigned its Python SDK to provide a unified object-oriented interface that makes it straightforward to interact with SageMaker services. The new SDK is designed with a tiered user experience in mind, where the new lower-level SDK (SageMaker Core) provides access to the full breadth of SageMaker features and configurations, allowing for greater flexibility and control for ML engineers. The higher-level abstracted layer is designed for data scientists with limited AWS expertise, offering a simplified interface that hides complex infrastructure details.
In this two-part series, we introduce the abstracted layer of the SageMaker Python SDK that allows you to train and deploy machine learning (ML) models by using the new ModelTrainer and the improved ModelBuilder classes.
In this post, we focus on the ModelTrainer class for simplifying the training experience. The ModelTrainer class provides significant improvements over the current Estimator class, which are discussed in detail in this post. We show you how to use the ModelTrainer class to train your ML models, which includes executing distributed training using a custom script or container. In Part 2, we show you how to build a model and deploy to a SageMaker endpoint using the improved ModelBuilder class.
Benefits of the ModelTrainer class
The new ModelTrainer class has been designed to address usability challenges associated with the Estimator class. Moving forward, ModelTrainer will be the preferred approach for model training, bringing significant enhancements that greatly improve the user experience. This evolution marks a step towards achieving a best-in-class developer experience for model training. The following are the key benefits:

Improved intuitiveness – The ModelTrainer class reduces complexity by consolidating configurations into just a few core parameters. This streamlining minimizes cognitive overload, allowing users to focus on model training rather than configuration intricacies. Additionally, it employs intuitive config classes for straightforward platform interactions.
Simplified script mode and BYOC – Transitioning from local development to cloud training is now seamless. The ModelTrainer automatically maps source code, data paths, and parameter specifications to the remote execution environment, eliminating the need for special handshakes or complex setup processes.
Simplified distributed training – The ModelTrainer class provides enhanced flexibility for users to specify custom commands and distributed training strategies, allowing you to directly provide the exact command you want to run in your container through the command parameter in the SourceCode class. This approach decouples distributed training strategies from the training toolkit and framework-specific estimators.
Improved hyperparameter contracts – The ModelTrainer class passes the training job's hyperparameters as a single environment variable, allowing you to load the hyperparameters using a single SM_HPS variable.

To further explain each of these benefits, we demonstrate with examples in the following sections, and finally show you how to set up and run distributed training for the Meta Llama 3.1 8B model using the new ModelTrainer class.
Launch a training job using the ModelTrainer class
The ModelTrainer class simplifies the experience by letting you customize the training job, including providing a custom script, directly providing a command to run the training job, supporting local mode, and much more. However, you can spin up a SageMaker training job in script mode by providing minimal parameters—the SourceCode and the training image URI.
The following example illustrates how you can launch a training job with your own custom script by providing just the script and the training image URI (in this case, PyTorch), and an optional requirements file. Additional parameters such as the instance type and instance size are automatically set by the SDK to preset defaults, and parameters such as the AWS Identity and Access Management (IAM) role and SageMaker session are automatically detected from the current session and user’s credentials. Admins and users can also overwrite the defaults using the SDK defaults configuration file. For the detailed list of pre-set values, refer to the SDK documentation.

from sagemaker.modules.train import ModelTrainer
from sagemaker.modules.configs import SourceCode, InputData

# image URI for the training job
pytorch_image = "763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training:2.0.0-cpu-py310"
# you can find all available images here
# https://docs.aws.amazon.com/sagemaker/latest/dg-ecr-paths/sagemaker-algo-docker-registry-paths.html

# define the script to be run
source_code = SourceCode(
    source_dir="basic-script-mode",
    requirements="requirements.txt",
    entry_script="custom_script.py",
)

# define the ModelTrainer
model_trainer = ModelTrainer(
    training_image=pytorch_image,
    source_code=source_code,
    base_job_name="script-mode",
)

# pass the input data
input_data = InputData(
    channel_name="train",
    data_source=training_input_path,  # S3 path where training data is stored
)

# start the training job
model_trainer.train(input_data_config=[input_data], wait=False)

With purpose-built configurations, you can now reuse these objects to create multiple training jobs with different hyperparameters, for example, without having to re-define all the parameters.
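
For example, a second training job that only changes the hyperparameters can reuse the SourceCode object and training image defined above; the hyperparameter values below are illustrative.

# Reuse the SourceCode object and training image for a second job,
# changing only the hyperparameters (values shown are illustrative)
model_trainer_sweep = ModelTrainer(
    training_image=pytorch_image,
    source_code=source_code,
    base_job_name="script-mode-lr-sweep",
    hyperparameters={"learning_rate": 3e-5, "epochs": 4},
)
model_trainer_sweep.train(input_data_config=[input_data], wait=False)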
Run the job locally for experimentation
To run the preceding training job locally, you can simply set the training_mode parameter as shown in the following code:

from sagemaker.modules.train.model_trainer import Mode

model_trainer = ModelTrainer(
    training_image=pytorch_image,
    source_code=source_code,
    base_job_name="script-mode-local",
    training_mode=Mode.LOCAL_CONTAINER,
)
model_trainer.train()

The training job runs locally because training_mode is set to Mode.LOCAL_CONTAINER. If not explicitly set, the ModelTrainer runs a remote SageMaker training job by default; remote execution can also be enforced explicitly by setting the value to Mode.SAGEMAKER_TRAINING_JOB. For a full list of the available configs, including compute and networking, refer to the SDK documentation.
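
For example, this short sketch (ours, reusing the objects defined above) makes the remote default explicit:

# Explicitly request a remote SageMaker training job (this is also the default behavior).
model_trainer = ModelTrainer(
    training_image=pytorch_image,
    source_code=source_code,
    base_job_name="script-mode-remote",
    training_mode=Mode.SAGEMAKER_TRAINING_JOB,
)
model_trainer.train(input_data_config=[input_data], wait=False)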
Read hyperparameters in your custom script
The ModelTrainer supports multiple ways to read the hyperparameters that are passed to a training job. In addition to the existing support for reading the hyperparameters as command line arguments in your custom script, ModelTrainer also supports reading the hyperparameters as individual environment variables, prefixed with SM_HP_<hyperparameter-key>, or as a single environment variable dictionary, SM_HPS.
Suppose the following hyperparameters are passed to the training job:

hyperparams = {
    "learning_rate": 1e-5,
    "epochs": 2,
}

model_trainer = ModelTrainer(
    ...
    hyperparameters=hyperparams,
    ...
)

You have the following options:

Option 1 – Load the hyperparameters into a single JSON dictionary using the SM_HPS environment variable in your custom script:

import json
import os

def main():
    hyperparams = json.loads(os.environ["SM_HPS"])
    learning_rate = hyperparams.get("learning_rate")
    epochs = hyperparams.get("epochs", 1)
    ...

Option 2 – Read the hyperparameters as individual environment variables, prefixed by SM_HP_, as shown in the following code (you need to explicitly convert these variables to the correct input type):

import os

def main():
    learning_rate = float(os.environ.get("SM_HP_LEARNING_RATE", 3e-5))
    epochs = int(os.environ.get("SM_HP_EPOCHS", 1))
    ...

Option 3 – Read the hyperparameters as command line arguments parsed with argparse:

import argparse

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--learning_rate", type=float, default=3e-5)
    parser.add_argument("--epochs", type=int, default=1)

    args = parser.parse_args()

    learning_rate = args.learning_rate
    epochs = args.epochs

Run distributed training jobs
SageMaker supports distributed training for deep learning tasks such as natural language processing and computer vision, so you can run secure and scalable data parallel and model parallel jobs. This is usually achieved by providing the right set of parameters when using an Estimator. For example, to use torchrun, you would define the distribution parameter in the PyTorch Estimator and set it to {"torch_distributed": {"enabled": True}}.
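
For reference, a minimal sketch of that Estimator-based approach might look like the following; the script name, source directory, role, and framework versions here are placeholders for illustration, not values from the original example:

from sagemaker.pytorch import PyTorch

# Legacy Estimator-style setup: the distribution strategy is tied to the framework estimator.
estimator = PyTorch(
    entry_point="fine_tune.py",                  # placeholder script name
    source_dir="distributed-training-scripts",
    role="<your-sagemaker-execution-role>",      # placeholder IAM role
    framework_version="2.2.0",
    py_version="py310",
    instance_count=1,
    instance_type="ml.g5.12xlarge",
    distribution={"torch_distributed": {"enabled": True}},
)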
The ModelTrainer class provides enhanced flexibility for users to specify custom commands directly through the command parameter in the SourceCode class, and supports torchrun, torchrun smp, and the MPI strategies. This capability is particularly useful when you need to launch a job with a custom launcher command that is not supported by the training toolkit.
In the following example, we show how to fine-tune the latest Meta Llama 3.1 8B model using torchrun with the default launcher script, on a custom dataset that’s preprocessed and saved in an Amazon Simple Storage Service (Amazon S3) location:

from sagemaker.modules.train import ModelTrainer
from sagemaker.modules.distributed import Torchrun
from sagemaker.modules.configs import Compute, SourceCode, InputData

# provide the image URI - update the URI if you're in a different region
pytorch_image = "763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training:2.2.0-gpu-py310"

# Define the source code configuration for the distributed training job
source_code = SourceCode(
    source_dir="distributed-training-scripts",
    requirements="requirements.txt",
    entry_script="fine_tune.py",
)

torchrun = Torchrun()

hyperparameters = {
    ...
}

# Compute configuration for the training job
compute = Compute(
    instance_count=1,
    instance_type="ml.g5.12xlarge",
    volume_size_in_gb=96,
    keep_alive_period_in_seconds=3600,
)

# Initialize the ModelTrainer with the specified configurations
model_trainer = ModelTrainer(
    training_image=pytorch_image,
    source_code=source_code,
    compute=compute,
    distributed_runner=torchrun,
    hyperparameters=hyperparameters,
)

# pass the input data
input_data = InputData(
    channel_name="dataset",
    data_source="s3://your-bucket/your-prefix",  # S3 path where the processed data is stored
)

# Start the training job
model_trainer.train(input_data_config=[input_data], wait=False)

If you want to customize your torchrun launcher script, you can also directly provide the command using the command parameter:

# Define the source code configuration for the distributed training job
source_code = SourceCode(
    source_dir="distributed-training-scripts",
    requirements="requirements.txt",
    # Custom command for the distributed training launcher script
    command="torchrun --nnodes 1 "
            "--nproc_per_node 4 "
            "--master_addr algo-1 "
            "--master_port 7777 "
            "fine_tune_llama.py",
)

# Initialize the ModelTrainer with the specified configurations
model_trainer = ModelTrainer(
    training_image=pytorch_image,
    source_code=source_code,
    compute=compute,
)

# Start the training job, reusing the input data channel defined earlier
model_trainer.train(input_data_config=[input_data], wait=False)

For more examples and end-to-end ML workflows using the SageMaker ModelTrainer, refer to the GitHub repo.
Conclusion
The newly launched SageMaker ModelTrainer class simplifies the user experience by reducing the number of parameters, introducing intuitive configurations, and supporting complex setups like bringing your own container and running distributed training. Data scientists can also seamlessly transition from local training to remote training and training on multiple nodes using the ModelTrainer.
We encourage you to try out the ModelTrainer class by referring to the SDK documentation and sample notebooks on the GitHub repo. The ModelTrainer class is available from the SageMaker SDK v2.x onwards, at no additional charge. In Part 2 of this series, we show you how to build a model and deploy to a SageMaker endpoint using the improved ModelBuilder class.

About the Authors
Durga Sury is a Senior Solutions Architect on the Amazon SageMaker team. Over the past 5 years, she has worked with multiple enterprise customers to set up a secure, scalable AI/ML platform built on SageMaker.
Shweta Singh is a Senior Product Manager on the Amazon SageMaker Machine Learning (ML) platform team at AWS, leading the SageMaker Python SDK. She has worked in several product roles at Amazon for over 5 years. She has a Bachelor of Science degree in Computer Engineering and a Master of Science in Financial Engineering, both from New York University.

Amazon Q Apps supports customization and governance of generative AI-p …

We are excited to announce new features for Amazon Q Apps, a capability within Amazon Q Business that allows you to create generative AI-powered apps based on your organization’s data. These features enable the creation of more powerful apps while giving administrators more governance control, and they expand the customization options that let business users tailor solutions to their specific individual or organizational requirements. We have introduced new governance features for administrators to endorse user-created apps with app verification, and to organize app libraries with customizable label categories that reflect their organizations. App creators can now share apps privately and build data collection apps that collate inputs across multiple users. These additions are designed to improve how companies use generative AI in their daily operations by focusing on admin controls and capabilities that unlock new use cases.
In this post, we examine how these features enhance the capabilities of Amazon Q Apps. We explore the new customization options, detailing how these advancements make Amazon Q Apps more accessible and applicable to a wider range of enterprise customers. We focus on key features such as custom labels, verified apps, private sharing, and data collection apps (preview).
Endorse quality apps and customize labels in the app library
To help with discoverability of published Amazon Q Apps and address questions about quality of user-created apps, we have launched verified apps. Verified apps are endorsed by admins, indicating they have undergone approval based on your company’s standards. Admins can endorse published Amazon Q Apps by updating their status from Default to Verified directly on the Amazon Q Business console. Admins can work closely with their business stakeholders to determine the criteria for verifying apps, based on their organization’s specific needs and policies. This admin-led labeling capability is a reactive approach to endorsing published apps, without gating the publishing process for app creators.
When users access the library, they will see a distinct blue checkmark icon on any apps that have been marked as Verified by admins (as shown in the following screenshot). Additionally, verified apps are automatically surfaced to the top of the app list within each category, making them easily discoverable. To learn more about verifying apps, refer to Understanding and managing Verified Amazon Q Apps.

The next feature we discuss is custom labels. Admins can create custom category labels for app users to organize and classify apps in the library to reflect their team functions or organizational structure. This feature enables admins to create and manage these labels on the Amazon Q Business console, and end-users can use them at app creation and to discover relevant apps in the library. Admins can update the category labels at any time to tailor towards specific business needs depending on their use cases. For example, admins that manage Amazon Q Business app environments for marketing organizations might add labels like Product Marketing, PR, Ads, or Sales solely for the users on the marketing team to use (see the following screenshot).

Users on the marketing team who create apps can use the custom labels to slot their app in the right category, which will help other users discover apps in the library based on their focus area (as shown in the following screenshot). To learn more about custom labels, see Custom labels for Amazon Q Apps.

Share your apps with select users
App creators can now use advanced sharing options to create more granular controls over apps and facilitate collaboration within their organizations. With private sharing, you have the option to share an app with select individuals or with all app users (which was previously possible). Sharing of any extent will still display the app in the library, but with private sharing, it will only be visible to app users with whom it has been shared. This means the library continues to be the place where users discover apps that they have access to. This feature unlocks the ability to enable apps only to the intended audience and helps reduce “noise” in the library from apps that aren’t necessarily relevant for all users. App creators have the ability to test updates before they are ready to publish changes, helping make sure app iterations and refinements aren’t shared before they are ready to widely publish the revised version.
To share an app with specific users, creators can add each user using their full email address (see the following screenshot). Users are only added after the email address match is found, making sure creators don’t unknowingly give access to someone who doesn’t have access to that Amazon Q Business app environment. To learn more about private sharing, see Sharing Amazon Q Apps.

Unlock new use cases with data collection
The last feature we share in this post is data collection apps (preview), a new capability that allows you to record inputs provided by other app users, resulting in a new genre of Amazon Q Apps such as team surveys and project retrospectives. This enhancement enables you to collate data across multiple users within your organization, further enhancing the collaborative quality of Amazon Q Apps for various business needs. These apps can further use generative AI to analyze the collected data, identify common themes, summarize ideas, and provide actionable insights.
After publishing a data collection app to the library, creators can share the unique link to invite their colleagues to participate. You must share the unique link to get submissions for your specific data collection. When app users open the data collection app from the library, it triggers a fresh data collection with its own unique shareable link, for which they are the designated owner. As the owner of a data collection, you can start new rounds and manage controls to start and stop accepting new data submissions, as well as reveal or hide the collected data. To learn more about data collection apps, see Data collection in Amazon Q Apps.

Conclusion
In this post, we discussed how these new features for Amazon Q Apps in Amazon Q Business make generative AI more customizable and governable for enterprise users. From custom labels and verified apps to private sharing and data collection capabilities, these innovations enable organizations to create, manage, and share AI-powered apps that align with their specific business needs while maintaining appropriate controls.
For more information, see Creating purpose-built Amazon Q Apps.

About the Author
Tiffany Myers is a Product Manager at AWS, where she leads bringing in new capabilities while maintaining the simplicity of Amazon Q Business and Amazon Q Apps, drawing inspiration from the adaptive intelligence of amphibians in nature to help customers transform and evolve their businesses through generative AI.

How to Re-Signal Your Klaviyo Return Visitors for 25x ROI

Here’s a fact to chew on. Return visitors are 2x more likely to convert than first-timers. 

When someone comes back to your site, they’re raising their hand and saying, “Hey, I’m interested.” 

The problem? If you’re relying on Klaviyo to track those visitors, you’re leaving money on the table.

Klaviyo’s tracking is tied to a 7-day cookie window. Once that timer runs out, return visitors look like strangers, and your chance to engage them vanishes. 

Don’t let Klaviyo cookies expiring in 7 days knock out your welcome back and abandoned cart flows for Black Friday! Read about how to send Klaviyo signals of existing contacts in our timely tutorial. Then get Klaviyo Signal set up on your site ASAP! — CustomersAI (@CustomersAI), November 21, 2024

It’s a huge blind spot! Especially for ecommerce brands trying to make every interaction count.

This is where Signal by Customers.ai steps in. 

Signal extends your reach by identifying return visitors beyond that seven-day cutoff, giving you the power to re-engage high-intent shoppers when they’re most ready to buy. 

The result? More sales, better segmentation, and a serious boost to your ROI.

Let’s break down why re-signaling is the tactic your Klaviyo campaigns have been missing.

Unlock High-Intent Leads Hiding on Your Site

Book a demo of Customers.ai’s U.S. website visitor identification, customer journey insights and remarketing platform to skyrocket conversions and sales.

Book a Demo

Why Re-Signaling Klaviyo Return Visitors Matters More Than You Think

First-time visitors are like window shoppers. They’re browsing, getting a feel for your vibe, and maybe saving something to their cart. 

Return visitors? They’re a whole other story.

When someone comes back to your site, they’re not “just looking.” They’re warmed up, curious, and way more likely to convert. 

In fact, when someone comes back to your site, they’re 60% more likely to make a purchase than someone who’s visiting for the first time. That’s huge!

Stop ignoring your return visitors. Look what happened when our customer added Signal to only their abandoned cart flows: a 30% increase in conversions, a 34% increase in conversion rate, and a 29% increase in revenue per recipient. — CustomersAI (@CustomersAI), December 5, 2024

Recognizing these visitors is key to scaling your business. 

They’re already familiar with your brand, they’ve shown intent, and they’re primed to take action. However, if you don’t know who they are when they return, you can’t re-engage them effectively.

For ecommerce brands, the ability to identify and act on this renewed interest is essential. And that’s why re-signaling matters.

The Klaviyo Return Visitor Identification Problem 

Klaviyo is great, don’t get us wrong (we legit built full integrations because we love it), but its cookie-based tracking has a major flaw: its cookies expire after seven days.

That means if someone visits your site, leaves, and comes back a week later, Klaviyo has no idea it’s the same person. 

It’s like they’re hitting the reset button on every returning visitor.

Here’s why that’s a big deal:

You miss segmentation opportunities: That visitor could’ve been added to a high-intent audience for targeted campaigns. Instead, they’re lumped in with cold traffic.

No personalization: Without knowing it’s a returning customer, you can’t tailor your messaging to their specific behaviors or interests.

Engagement falls flat: You’re treating a warm lead like a stranger, which is a surefire way to lose their interest.

And the worst part? 

This costs you revenue. 

Returning visitors are some of the most valuable traffic you’ll get, but if you’re not recognizing them, you’re missing out. 

Plus, your ad spend takes a hit too. Retargeting campaigns aimed at warm audiences miss the mark because Klaviyo isn’t keeping up.

The good news? 

This isn’t a lost cause. Tools like Signal make sure those return visitors don’t slip through the cracks. More on that next.

Enter Signal: The Game-Changer for Klaviyo Return Visitor Identification

Here’s where Signal by Customers.ai swoops in to save the day. 

One of our customers made $60K in just 30 days using Signal! It tracks return visitors and sends them into the right Klaviyo flow—even after Klaviyo cookies expire. — CustomersAI (@CustomersAI), October 9, 2024

Signal takes Klaviyo’s already-powerful platform and fills in its biggest blind spot – identifying return visitors after that 7-day cookie window shuts down.

What it is

Signal is an advanced tracking and re-identification tool that integrates seamlessly with Klaviyo to extend its capabilities. 

Think of it as the missing link that ensures no return visitor slips through the cracks. It goes beyond cookie-based tracking by using smarter technology to recognize visitors who return after days, weeks, or even months.

With Signal, you get:

Real-Time Recognition: Know exactly who’s coming back to your site, even after Klaviyo’s tracking stops.

Data Sync: Visitor data is automatically updated in Klaviyo, so you’re always ready to act.

Stronger Campaigns: With better data, your engagement strategies are sharper, more relevant, and more effective.

Why It Matters

More Accurate Data: No more gaps in tracking. You get the full picture of who’s visiting your site, even after weeks or months.

Better Engagement: Knowing who your return visitors are means you can tailor your campaigns with pinpoint precision, delivering the right message at the right time and improving your Klaviyo email flow open rates.

Increased ROI: More engagement and smarter targeting equal higher conversions and better returns on your marketing spend.

Signal is an upgrade for your entire Klaviyo strategy. 

It ensures you never lose sight of high-intent visitors, keeping your marketing sharp and your revenue flowing.

How Re-Signaling Klaviyo Return Visitors Works

Signal transforms how you track, identify, and engage high-intent customers.

Here’s how the re-signaling process works:

Step 1: A Visitor Returns After 7 Days

Let’s say a potential customer checks out your site, leaves, and then comes back two weeks later.

Klaviyo’s cookies expired, so it has no idea who they are. Without Signal, this visitor would be treated as brand new—wasting a golden opportunity to re-engage with a warm lead.

Step 2: Signal Identifies Them

Signal steps up and recognizes the returning visitor, pulling in their data from previous interactions.

Whether they browsed products, abandoned a cart, or subscribed to an email list, Signal knows their history and syncs it back to Klaviyo.

Step 3: Syncs Data to Klaviyo

Now armed with fresh insights, Klaviyo can re-engage that visitor like they never left.

Their behavior, preferences, and past interactions are back in play, ready for personalized marketing.

Step 4: Trigger Targeted Emails or Campaigns

With Signal’s data, you can trigger campaigns tailored to their actions.

Abandoned cart email? Done.

A personalized product recommendation? Easy.

A timely promo that speaks to their browsing history? You got it.

Why This Tactic Gets Overlooked

Marketers assume Klaviyo’s built-in tracking is “good enough” but that 7-day cookie expiration creates a major gap. Most brands don’t realize how much valuable traffic they’re losing because they don’t know these visitors are returning.

By extending Klaviyo’s reach, Signal makes re-signaling a powerful, yet often ignored, tactic. 

Get the full breakdown of setting up triggers and flows for new vs. return visitors in Klaviyo.

The Benefits of Re-Signaling Your Klaviyo Return Visitors

Let me be clear, re-signaling isn’t just a “nice-to-have” tactic. It is a huge opportunity that most businesses don’t even know exists.

Here’s what you unlock when you bring Signal into your Klaviyo strategy:

1. Boost Conversions

Return visitors aren’t just browsing. They’re back because they’re interested. 

Re-signaling these contacts makes sure you catch them at the right moment with the right message, giving you the best shot at converting.

Example: A customer browses your skincare line but doesn’t buy. Two weeks later, they return. Signal recognizes them, syncs with Klaviyo, and triggers a personalized email featuring the exact product they viewed, along with a limited-time discount. Sale made.

2. Improve Segmentation

Stop lumping returning visitors in with cold traffic! 

Re-signaling lets you add these high-intent users to more precise audience segments. Whether it’s a “visited twice in 30 days” segment or a group of shoppers browsing a specific category, better data means better targeting.

Example: A visitor repeatedly checks out your running shoes category. Signal syncs this behavior, and Klaviyo adds them to a “running enthusiasts” segment. You then send them a campaign showcasing your new trail runners. Engagement skyrockets.

3. Increase Retention

Customers don’t always buy on the first visit or even the second. With Signal, you stay on their radar, reminding them why they came to you in the first place.

Example: A returning customer checks out your subscription page but doesn’t commit. Signal flags this activity, and Klaviyo sends them an email breaking down the benefits of subscribing, paired with a 10% off offer. They’re back in the game.

4. Optimize Ad Spend

Reaching warm audiences costs less and converts better than spraying ads at cold traffic. Signal ensures you’re spending your budget where it matters – on people who’ve already shown interest and are ready to take action.

Example: Instead of targeting a general retargeting pool, you use Signal to focus your Meta Ads on returning visitors who recently viewed your high-ticket items. The result? A higher ROAS (return on ad spend) with less wasted budget.

5. Enhance Lifecycle Marketing

Re-signaling fuels your lifecycle campaigns with fresh, relevant data. By keeping track of return visitors, you can craft messaging that aligns perfectly with where they are in their customer journey.

Example: A visitor who returned to your site after viewing baby gear gets synced into a Klaviyo flow tailored for new parents. Over time, they receive a series of helpful content, product suggestions, and time-sensitive offers that lead to a big purchase—and keep them coming back.

The takeaway? Re-signaling isn’t just about spotting return visitors. It’s about sharper targeting, stronger customer relationships, and giving your ROI an extra oomph. 

How to Get Started with Signal and Klaviyo

Re-signaling isn’t just a cool idea. It’s the tactic that’ll transform how you connect with return visitors and drive sales. 

If you’re serious about squeezing every drop of ROI out of your Klaviyo campaigns, this is the move you can’t afford to skip.

Signal bridges Klaviyo’s 7-day cookie gap, making sure no high-intent visitor slips through the cracks. 

It keeps your data sharp, your campaigns smarter, and your conversions climbing.

So, what are you waiting for? 

Start using Signal today and watch your Klaviyo strategy hit a whole new level. Let’s go!

Get a free Klaviyo signal audit and see how we can help 25x your return visitor performance!

See Who Is On Your Site Right Now!

Get names, emails, phone numbers & more.

Try it Free, No Credit Card Required

Start Your Free Trial

Important Next Steps

See what targeted outbound marketing is all about. Capture and engage your first 500 website visitor leads with Customers.ai X-Ray website visitor identification for free.

Talk and learn about sales outreach automation with other growth enthusiasts. Join Customers.ai Island, our Facebook group of 40K marketers and entrepreneurs who are ready to support you.

Advance your marketing performance with Sales Outreach School, a free tutorial and training area for sales pros and marketers.

The post How to Re-Signal Your Klaviyo Return Visitors for 25x ROI appeared first on Customers.ai.

Faceless Marketing: The Low-Key Power Move for Bold Ecommerce Brands

Scroll through any ecommerce brand’s Instagram, and you’ll spot a pattern – influencer takeovers, founder vlogs, and personal stories galore. It’s like the face of the brand is doing all the heavy lifting.

And sure, that works – until it doesn’t. 

Influencers lose their shine, founders step back, and suddenly the “face” of your brand is gone. Then what?

That’s where faceless marketing comes in. 

It’s not about staying anonymous or hiding behind your logo. It’s about building a brand so strong it speaks for itself…no celebrity endorsements or smiling founder selfies required.

And while faceless marketing has become a buzzword in the last few months, it isn’t just a trendy strategy. It’s a game plan for bold ecommerce brands ready to stand out on their own terms. 

So if you’re tired of leaning on others to tell your story, it’s time to flip the script and faceless marketing is your power move. Let’s break it down.


What Is Faceless Marketing (And Why Should You Care?)

Faceless marketing is exactly what it sounds like – a strategy where your brand isn’t tied to a specific face, founder, or influencer. 

Instead, it’s built around your product, your values, and the experience you deliver to customers. It’s branding that speaks for itself. No smiling spokesperson or curated “day in the life” reels.

Take ecommerce brands like Glossier, which skyrocketed by focusing on sleek, product-first branding and building a community of real users instead of propping up a single face. 

Or think about other brands leveraging UGC – their customers become the stars, showcasing real experiences without needing an official brand “voice”.

How Faceless Marketing Came About

The bottom line? Influencers are over it. For years, ecommerce brands have leaned on creators to build hype, craft narratives, and carry their entire marketing strategy on their backs. 

But as influencer burnout sets in and audiences grow skeptical of overly polished ads, brands are realizing they need something more sustainable.

For ecommerce brands, this approach is a dream come true. 

Instead of chasing big names or trying to craft a founder-as-celebrity persona, you get to focus on what really matters – your product.

Why Faceless Marketing Is Popping Off

If your ecommerce brand is pulling in under $5 million a year, you’re likely juggling tight budgets, stiff competition, and the constant hustle to make your brand stand out. 

Here’s why it works so well for ambitious ecommerce brands:

Scalable: No need to reinvent the wheel for every campaign. Faceless marketing grows with your business.

Risk-Free: No one person’s scandal, burnout, or PR disaster can derail your brand.

Product-First: It’s all about what you’re selling, not who’s selling it.

This approach is like a safety net for growing brands as it protects your image while allowing you to put the spotlight on your amazing products.


Why Faceless Marketing Works for Ecommerce

Ecommerce success often comes down to one thing – trust. And nothing builds trust faster than a brand that keeps it simple and delivers on its promises. 

That’s exactly where faceless marketing shines. Here’s why it works:

1. Focus on the Product

When you’re not busy hyping up a spokesperson, you can put the spotlight on your product. 

Showing off the quality, benefits, and real-world impact of what you sell builds credibility fast. Customers care about how your product fits into their life, not how great your founder’s dog looks on Instagram (although let’s be honest, an occasional dog photo won’t hurt).

2. Inclusive Appeal

No face, no problem. 

Faceless marketing works because it’s relatable to everyone. Without tying your brand to one person, you’re free to connect with diverse audiences on their terms. 

It’s like an open invitation that says, “This brand is for you” because that brand can be anything or anyone.

3. Scalability

Let’s talk growth. 

Faceless marketing doesn’t care what timezone you’re in, what language you speak, or whether your audience vibes with a specific celebrity. 

It works globally without you having to localize a single personality. 

Think about Gymshark. They built an empire by focusing on fitness and community, not a single influencer’s abs.

4. Privacy Concerns

People want brands that don’t feel invasive and faceless marketing respects boundaries. 

Instead of digging through data to over-personalize every touchpoint, you’re showing up as a brand that’s confident enough to let the product speak for itself. 

And customers? They notice.

When you center your marketing on the product, the community, and the experience, you’re building trust. 

How to Build a Faceless Brand That Pops

If you’re going the faceless route, you’ve got to bring the heat. 

No founder selfies or influencer cameos to lean on means your brand has to do the talking. And it better have something cool to say! 

Here’s how to make it pop:

Nail Your Brand Identity

You need a vibe and it better be strong. 

Think clean visuals, a logo that’s impossible to forget, and a message that hits every time. What does your brand stand for? Why should customers care? Make it clear, make it consistent, and make it unforgettable. 

Your brand needs to scream, “This is who we are!” without over-explaining itself. Think bold, consistent, and instantly recognizable.

Define your core values: What do you stand for, and why should your customers care?

Build a visual identity: Pick a color palette, font, and logo that pop and stick.

Craft a tone of voice: Friendly? Edgy? Luxe? Be consistent across all channels.

Example: Liquid Death. A water brand with a rebellious vibe that’s all about “murdering your thirst.” No founder spotlight, just killer branding and a tone you can’t ignore.

Leverage Community Power

Your customers are your co-creators. 

User-generated content (UGC), glowing reviews, and authentic customer stories are the real MVPs here. 

Why pay influencers when your actual users can show off your product in action? Let them do the talking while you amplify their voices.

Use UGC: Showcase real customers using your product in their element.

Highlight reviews and testimonials: Create social proof that doesn’t feel scripted.

Encourage tagging: Run campaigns that invite users to share and tag their content.

Example: Lush Cosmetics. They’ve built a cult-like following by spotlighting customer reviews, Instagram stories, and authentic photos of their products in action, all without needing a spokesperson.

Let the Product Speak

This is where your product becomes the star of the show. 

Show it off in ways that matter – demo reels, before-and-after pics, or testimonials that feel legit. 

Your audience doesn’t need a backstory, they need to see how your product fits into their life. Your product isn’t shy, so don’t be afraid to let it take center stage.

Demo reels: Show your product solving problems or transforming lives.

Tutorials: Teach your audience how to use your product in creative ways.

Case studies: Share real-world stories about how your product made a difference.

Example: Oura Ring. A wellness brand that focuses entirely on its sleek product and what it does for the customer, with product-focused campaigns that make the tech the star.

Automate Like a Pro

Faceless doesn’t mean soulless and automation can feel personal when done right. 

Use chatbots to deliver 24/7 customer service, programmatic ads to reach the right people at scale, and smart email workflows that feel personal without a face attached. 

It’s how you stay lean while growing big.

Set up chatbots for 24/7 support that actually helps (no robotic vibes).

Run email campaigns with personalized workflows based on behavior.

Use programmatic ads to scale your reach with pinpoint precision.

Example: Chewy. They automate personalized emails, birthday cards, helpful reminders, and product recommendations while making every interaction feel tailored and thoughtful.

When you get the branding, community, product focus, and automation right, your faceless brand becomes a magnet for attention and loyalty.


The Faceless Marketing Playbook: Tactics to Try Right Now

Ready to put faceless marketing to work? 

Here’s your rapid-fire cheat sheet of tactics that’ll make your ecommerce brand unforgettable. No face required:

1. Design Visually Iconic Campaigns

Your visuals are your face, so make them impossible to ignore.

Build a recognizable logo that pops in any size or format.

Create a cohesive color palette and stick to it like glue.

Use product-focused imagery that highlights features and benefits, not people.

Invest in professional product photography or 3D renders for a polished look.

2. Run Social Campaigns Centered on Customer Experiences

Your customers are the stars of your show. Use social proof to tell your story.

Encourage tagged photos and reviews on Instagram or TikTok.

Share customer testimonials in creative formats (like video reels or carousels).

Use polls, Q&As, and UGC challenges to engage your audience.

Highlight real-life use cases for your products, showing how they solve problems.

3. Leverage Email Marketing to Tell Your Brand’s Story

Email is still your go-to tactic for creating connection without needing a face.

Use storytelling to show the “why” behind your brand (without oversharing).

Focus on what your product does for the customer, not who’s behind it.

Personalize campaigns based on purchase behavior, but keep the tone brand-focused.

Send regular updates showcasing new arrivals, tutorials, or tips.

4. Create Evergreen Content Hubs

Your blog or content library should be a go-to resource that oozes expertise.

Publish guides, how-tos, and case studies related to your niche.

Focus on SEO-driven topics that solve customer problems.

Avoid attaching content to a personality; let the knowledge stand on its own.

Update content regularly to keep it fresh and relevant.

5. Dominate the Packaging Game

Your packaging is part of your brand story—make it memorable.

Use branded, eco-friendly packaging that’s share-worthy (hello, unboxing videos).

Include inserts like thank-you notes or mini product guides that reflect your brand’s vibe.

Add QR codes that link to product tutorials, customer stories, or special offers.

6. Collaborate Without Co-Branding

Team up with other brands or creators, but keep the focus on your product, not personalities.

Host giveaways with complementary ecommerce brands.

Collaborate on limited-edition products that spotlight your shared audiences.

Share your product with micro-influencers but let the reviews come naturally.

7. Make Your FAQ Page a Selling Machine

Your FAQ isn’t just for answering questions—it’s a goldmine for building trust.

Write answers in your brand voice to keep it consistent.

Include visuals like diagrams, gifs, or quick how-to videos.

Highlight unique product features or use cases customers might not know about.

Common Faceless Marketing Pitfalls (And How to Avoid Them)

So, you’ve got the rapid-fire cheat sheet, and you’re ready to dive into faceless marketing like the pro you are. 

But hold up! There are some mistakes that even the boldest brands can make. Let’s make sure you don’t trip up on these common pitfalls:

1. Overly Generic Branding

Faceless doesn’t mean boring. If your branding feels too cookie-cutter, your audience won’t connect.

Avoid generic templates or stock imagery—invest in custom visuals.

Infuse your brand with personality through tone, design, and storytelling.

Make your values clear and unique—what do you stand for?

How to avoid it: Look at brands like Everlane, which uses minimalist designs but still makes sustainability and transparency the core of their identity.

2. Lack of Emotional Connection

No face doesn’t mean no feels. Without emotional resonance, your brand might fall flat.

Use storytelling to create a sense of belonging or shared values.

Highlight real customer experiences that tug at the heartstrings.

Build community through social media engagement and interactive content.

How to avoid it: Take notes from Yeti, which focuses on rugged, adventurous lifestyles that resonate with their outdoorsy customers.

3. Ignoring Community Power

If you’re not leveraging your community, you’re leaving money on the table. Customers want to feel like they’re part of something bigger.

Don’t just post—interact. Respond to comments, DMs, and tags.

Encourage customers to share their stories and amplify their voices.

Reward loyalty with exclusive perks or shoutouts.

How to avoid it: Look at Hydro Flask, a brand that thrives on UGC and a loyal, adventure-loving community.

Faceless marketing isn’t just about what you do, it’s about how you make your audience feel. Avoid these mistakes and you’ll build a brand that’s as magnetic as it is memorable.

The Bold Future of Faceless Marketing for Ecommerce

Faceless marketing isn’t about hiding in the shadows or being anonymous. It’s about stepping into the spotlight in a way that’s bigger than any one person. 

It’s a strategy that lets your brand, your product, and your community shine, creating something memorable that resonates far beyond a single spokesperson.

For ecommerce brands, this approach is scalable, inclusive, and built for long-term trust in a privacy-conscious time.

So, are you ready to ditch the old-school playbook and start building a faceless strategy that turns heads and drives sales? Awesome. Then let’s make your ecommerce brand unforgettable – no face required.



Faceless Marketing FAQs

1. How is faceless marketing different from traditional marketing?

Faceless marketing skips the personality-driven approach and focuses solely on the brand or product. Traditional marketing often leans on faces—founders, influencers, or spokespeople—to humanize a brand. Faceless marketing instead builds trust and engagement through shared values, customer stories, and powerful visuals.

2. Why is faceless marketing effective for ecommerce?

Ecommerce thrives on scalability and inclusivity, and faceless marketing delivers both. It allows you to appeal to a diverse audience without tying your brand to a specific identity. Plus, it puts the spotlight on your product, making it the hero of your campaigns, which drives conversions and builds trust.

3. What are the key components of faceless marketing?

Brand Identity: A strong logo, cohesive visuals, and consistent messaging.

Customer-Driven Content: UGC, testimonials, and authentic reviews.

Storytelling: Focused on product benefits or shared values rather than personalities.

Automation: Tools like chatbots and programmatic ads for scalability.

4. What types of businesses can benefit from faceless marketing?

Faceless marketing is ideal for ecommerce brands, SaaS companies, and even DTC brands that want to scale globally. It’s particularly effective for startups with limited budgets who need to focus on product quality and trust-building rather than personal branding.

5. How does faceless marketing handle personalization?

Faceless marketing personalizes through data, not faces. By analyzing customer behavior, you can deliver tailored emails, product recommendations, and retargeting ads. It’s all about making the customer feel seen, not showcasing a brand personality.

6. Is faceless marketing harder to pull off than traditional strategies?

Not necessarily. While it requires a strong focus on design, storytelling, and community engagement, it skips the complexity of managing personalities or maintaining a public-facing figure. The key is nailing your brand identity and letting the product and community take the lead.

7. What are some stats that show faceless marketing works?

UGC-based ads drive 4x higher click-through rates than traditional ads.

90% of consumers say authenticity is important in deciding which brands to support.

Brands with consistent visuals and messaging see up to 23% higher revenue.

8. How can brands build trust without a “face”?

Brands can build trust through transparency, quality, and customer experiences. Showcase product benefits, share authentic user stories, and highlight your values. Trust comes from delivering on promises, not necessarily from a spokesperson.

9. Can faceless marketing be combined with influencer marketing?

Yes, but in a different way. Instead of making the influencer the “face” of your brand, use them to amplify product features or showcase real-life use cases. The focus remains on your product while influencers act as additional touchpoints.

10. How do you create a strong brand identity in faceless marketing?

Define Your Values: What does your brand stand for?

Visual Consistency: Use cohesive colors, fonts, and logo designs.

Voice and Tone: Whether it’s playful or professional, keep it consistent.

Storytelling: Let your product and customers drive the narrative.

11. Are there risks to faceless marketing?

The main risk is being too generic. Without a face, it’s easy to blend in with competitors if your brand identity isn’t strong. Avoid this by creating a unique voice, engaging visuals, and focusing on customer connection.

12. How does faceless marketing appeal to privacy-conscious consumers?

By avoiding overly personal tracking or invasive ads, faceless marketing respects consumer boundaries. It builds trust by focusing on shared values and authentic engagement instead of aggressive, hyper-targeted campaigns.

13. What are some examples of successful faceless marketing campaigns?

Liquid Death: Focused on bold visuals and edgy messaging rather than spokespeople.

Lush Cosmetics: Highlights customer stories and product benefits with zero reliance on a face.

Oura Ring: Let the product’s sleek design and features do the talking, backed by data-driven campaigns.

14. How do you measure success in faceless marketing?

Engagement Metrics: Click-through rates, likes, and shares of product-focused content.

Customer Retention: How well customers connect with your product and return for repeat purchases.

Conversion Rates: The effectiveness of campaigns in turning interest into sales.

15. Why should ecommerce startups consider faceless marketing?

For startups, faceless marketing is scalable, cost-effective, and builds credibility. It puts the product front and center, which is essential when budgets are tight, and trust is still being earned. It also avoids the risk of tying the brand to a single personality that may not resonate with every audience.
The post Faceless Marketing: The Low-Key Power Move for Bold Ecommerce Brands appeared first on Customers.ai.

Microsoft Research Introduces AI-Powered Carbon Budgeting Method: A Re …

Since the Industrial Revolution, burning fossil fuels and changes in land use, especially deforestation, have driven the rise in atmospheric carbon dioxide (CO2). While terrestrial vegetation and oceans serve as natural carbon sinks, absorbing some of this CO2, emissions have consistently outpaced their annual capacity. This imbalance has continuously increased atmospheric CO2 concentrations, fueling global warming and extreme weather events. Understanding the carbon budget—how CO2 is sourced and absorbed—has become essential in combating climate change, especially as countries strive for carbon neutrality.

The primary challenge lies in accurately estimating the carbon budget and its environmental impact. The carbon budget measures the balance between emissions from fossil fuels, cement production, land use changes, and natural sources of CO2 against the absorption capacity of carbon sinks. Addressing the growing climate crisis requires accurate and timely data on CO2 levels and carbon sinks, which is difficult to obtain. Existing methods fail to track shifts in global carbon sinks quickly enough, especially when environmental disturbances—such as wildfires or El Niño—alter carbon dynamics unpredictably.

Traditional methods for carbon budgeting typically rely on numerical simulations of the Earth’s carbon cycle. While these models can simulate complex Earth system processes, they often face significant delays. For instance, the Global Carbon Budget 2023 report, which uses data until the end of 2022, illustrates the one-year lag in carbon budget information. This delay limits the effectiveness of current models in providing timely climate data that can guide real-world actions. Researchers need a faster and more reliable way to capture sudden carbon dynamics shifts affecting global warming.

To address these limitations, researchers from Microsoft Research Asia, in collaboration with Tsinghua University, the French Laboratory for Climate and Environmental Sciences, and other global research organizations, introduced an AI-powered method for near-real-time carbon budgeting. By integrating satellite data, dynamic global vegetation models, and ocean model emulators, the research team developed a near-instantaneous carbon sink model capable of predicting carbon budgets with unprecedented speed and accuracy. This model harnesses the power of convolutional neural networks (CNNs) and semi-supervised learning techniques to deliver low-latency results.

The proposed AI-based model utilizes environmental variable observations and historical data to predict global carbon sink levels. The model integrates 12 months of historical data, monthly features, and target outputs. CNNs process this data to compute predictions, while semi-supervised learning provides an unsupervised loss function to improve prediction accuracy. The model processes environmental data from ocean and land sinks and satellite fire emissions to provide real-time updates on CO2 sinks. This methodology ensures that predictions are made with a margin of error of less than 2%, offering a fast, responsive alternative to traditional carbon budgeting methods.
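
To make the general idea concrete, here is a minimal, illustrative sketch of a CNN that consumes 12 months of environmental features and combines a supervised loss with a simple consistency-style unsupervised term. The feature count, layer sizes, and the specific unsupervised loss are our assumptions for illustration only; they are not the architecture published by the researchers.

import torch
import torch.nn as nn

class CarbonSinkCNN(nn.Module):
    # Placeholder architecture: 1D convolutions over a 12-month feature history.
    def __init__(self, n_features: int = 16, n_months: int = 12):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(64, 1)  # predicted monthly carbon sink value

    def forward(self, x):  # x: (batch, n_features, n_months)
        return self.head(self.encoder(x).squeeze(-1))

model = CarbonSinkCNN()
x_labeled = torch.randn(8, 16, 12)      # synthetic labeled batch
y_labeled = torch.randn(8, 1)
x_unlabeled = torch.randn(8, 16, 12)    # synthetic unlabeled batch

supervised = nn.functional.mse_loss(model(x_labeled), y_labeled)
# One common semi-supervised choice: a consistency loss between predictions on an
# unlabeled sample and a lightly perturbed copy of it (an assumption, not the paper's loss).
unsupervised = nn.functional.mse_loss(
    model(x_unlabeled), model(x_unlabeled + 0.01 * torch.randn_like(x_unlabeled))
)
loss = supervised + 0.1 * unsupervised
loss.backward()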

The results of this near-real-time carbon sink model showed promising accuracy. In particular, the model was able to track a dramatic decline in the land carbon sink in 2023. The Amazon rainforest, severely affected by drought, showed a carbon sink loss of 0.31 ± 0.19 GtC. The model also accurately predicted carbon emissions from the 2023 wildfires in North America, contributing 0.58 ± 0.10 GtC to atmospheric CO2. In addition, the model detected a shift from La Niña to a moderate El Niño phase, significantly impacting global carbon dynamics. These findings highlight the effectiveness of the AI model in capturing dynamic environmental changes and producing actionable data in near real-time.

In conclusion, the rapid decline in land carbon sinks poses a serious threat to the effectiveness of global carbon neutrality efforts. The AI-based carbon budget model introduced by the research team from Microsoft Research Asia, Tsinghua University, and the French Laboratory for Climate and Environmental Sciences provides an innovative solution to the challenges of carbon budget estimation. This model’s ability to produce real-time predictions and track environmental shifts more accurately than traditional methods is a crucial step forward in global efforts to combat climate change. By reducing the delay in carbon data updates, this approach enables more effective climate action and policymaking in response to urgent environmental threats.

Check out the Details here. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. Don’t Forget to join our 60k+ ML SubReddit.

[Must Subscribe]: Subscribe to our newsletter to get trending AI research and dev updates
The post Microsoft Research Introduces AI-Powered Carbon Budgeting Method: A Real-Time Approach to Tracking Global Carbon Sinks and Emission appeared first on MarkTechPost.

Google AI Releases Gemini 2.0 Flash: A New AI Model that is 2x Faster …

Google AI Research introduces Gemini 2.0 Flash, the latest iteration of its Gemini AI model. This release focuses on performance improvements, notably a significant increase in speed and expanded multimodal functionality.

A key development in Gemini 2.0 Flash is its enhanced processing speed. Google reports that the new model operates at twice the speed of its predecessor, Gemini 1.5 Pro, while also demonstrating improved performance across various benchmarks. This speed enhancement translates to more efficient processing and faster response times for users.

Gemini 2.0 Flash expands its capabilities in handling diverse data types. The model now includes a Multimodal Live API, enabling real-time processing of audio and video streams. This addition allows developers to create applications that utilize dynamic audio and visual input. Furthermore, native image generation is now integrated, allowing users to create and modify images using conversational text prompts.
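
As a rough illustration of how a developer might call the model, the following minimal sketch uses the google-genai Python SDK; the package import, client methods, and the "gemini-2.0-flash-exp" model identifier are assumptions on our part and should be verified against Google's official documentation:

# Minimal sketch (assumed SDK surface and model id; verify against official docs).
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key
response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Summarize the key features of Gemini 2.0 Flash in three bullet points.",
)
print(response.text)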

Beyond these core advancements, Gemini 2.0 Flash incorporates several other enhancements. Native multilingual audio output is now available with eight distinct voices, increasing accessibility for a broader user base. Improvements to tool and agentic support allow the model to interact more effectively with external tools and systems, facilitating more complex task completion.

In software engineering tasks, Gemini 2.0 Flash achieved a 51.8% score on SWE-bench Verified, a benchmark designed to evaluate coding proficiency. This result indicates the model’s potential for assisting developers with code generation, debugging, and optimization processes.

Google is integrating Gemini 2.0 Flash into its own development tools. Jules, a new AI-powered code agent, utilizes Gemini 2.0 Flash to provide assistance to developers within Google Colaboratory. This integration showcases practical applications of the model within a development environment.

Gemini 2.0 Flash also includes features related to responsible AI development. Support for 109 languages expands the model’s accessibility globally. The integration of SynthID watermarking for all generated image and audio outputs provides a mechanism for tracking provenance and addressing potential issues related to AI-generated content.

The release of Gemini 2.0 Flash represents a further step in the development of Google’s AI models. The focus on increased speed, expanded multimodal capabilities, and improved tool interaction contributes to a more versatile and capable AI system.

As Google continues to develop the Gemini family of models, further refinements and expansions of capabilities are anticipated. Gemini 2.0 Flash contributes to the ongoing advancement of AI technology and its potential applications across various fields.

Check out the Details here. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. Don’t Forget to join our 60k+ ML SubReddit.

[Must Subscribe]: Subscribe to our newsletter to get trending AI research and dev updates
The post Google AI Releases Gemini 2.0 Flash: A New AI Model that is 2x Faster than Gemini 1.5 Pro appeared first on MarkTechPost.

LG AI Research Releases EXAONE 3.5: Three Open-Source Bilingual Fronti …

LG AI Research has released EXAONE 3.5, a family of bilingual models proficient in English and Korean, as open source, following the success of its predecessor, EXAONE 3.0. The research team has expanded the EXAONE 3.5 lineup to include three models designed for specific use cases:

The 2.4B model is an ultra-lightweight version optimized for on-device use. It can operate on low-spec GPUs and in environments with limited infrastructure. 

A lightweight 7.8B model offers improved performance over its predecessor, the EXAONE-3.0-7.8B-Instruct model while maintaining versatility for general-purpose use. 

The 32B model represents a frontier-level high-performance option for demanding applications, catering to users who prioritize computational power.

The EXAONE 3.5 models demonstrate exceptional performance and cost-efficiency, achieved through LG AI Research’s innovative R&D methodologies. The hallmark feature of EXAONE 3.5 is its support for long-context processing, allowing the handling of up to 32,768 tokens. This capability makes it effective in addressing the demands of real-world use cases and Retrieval-Augmented Generation (RAG) scenarios, where extended textual inputs are common. Each model in the EXAONE 3.5 series has demonstrated state-of-the-art performance in real-world applications and tasks requiring long-context understanding.
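
As a quick illustration of how these open models are typically consumed, the following minimal sketch loads one of them with the Hugging Face transformers library; the repository ID "LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct" and the trust_remote_code requirement are our assumptions and should be checked against the official model cards:

# Minimal sketch (assumed Hugging Face repo id; verify against the official model card).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, device_map="auto")

prompt = "Explain Retrieval-Augmented Generation in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))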

Training Methodologies and Architectural Innovations of EXAONE 3.5

Training EXAONE 3.5 language models involves a blend of advanced configurations, pre-training strategies, and post-training refinements to maximize performance and usability. The models are built on a state-of-the-art decoder-only Transformer architecture, with configurations varying based on model size. While structurally similar to EXAONE 3.0 7.8B, the EXAONE 3.5 models introduce improvements such as extended context length, supporting up to 32,768 tokens, a significant increase from the previous 4,096 tokens. The architecture incorporates advanced features like SwiGLU non-linearities, Grouped Query Attention (GQA), and Rotary Position Embeddings (RoPE), ensuring efficient processing and enhanced bilingual support for English and Korean. All models share a vocabulary of 102,400 tokens, evenly divided between the two languages.

The pre-training phase of EXAONE 3.5 was conducted in two stages. The first stage focused on diverse data sources to enhance general domain performance, while the second stage targeted specific domains requiring improved capabilities, such as long-context understanding. During the second stage, a replay-based method was employed to address catastrophic forgetting, allowing the model to retain knowledge from the initial training phase. Computational resources were optimized during pre-training; for example, the 32B model achieved high performance with significantly lower computation requirements than other models of similar size. A rigorous decontamination process was applied to eliminate contaminated examples in the training data, ensuring the reliability of benchmark evaluations.

Post-training, the models underwent supervised fine-tuning (SFT) to enhance their ability to respond effectively to varied instructions. This involved creating an instruction-response dataset from a taxonomy of knowledge derived from web corpora. The dataset was designed to include a range of complexities, enabling the model to generalize well across tasks. Preference optimization was then employed using Direct Preference Optimization (DPO) and other algorithms to align the models with human preferences. This process included multiple training stages to prevent over-optimization and improve output alignment with user expectations. LG AI Research conducted extensive reviews to address potential legal risks like copyright infringement and personal information protection to ensure data compliance. Steps were taken to de-identify sensitive data and ensure that all datasets met strict ethical and legal standards.

Benchmark Evaluations: Unparalleled Performance of EXAONE 3.5 Bilingual Models

The evaluation benchmarks of EXAONE 3.5 Models were categorized into three groups: real-world use cases, long-context processing, and general domain tasks. Real-world benchmarks evaluated the models’ ability to understand and respond to user queries in practical scenarios. Long-context benchmarks assessed the models’ capability to process and retrieve information from extended textual inputs, which is critical for RAG applications. General domain benchmarks tested the models’ proficiency in mathematics, coding, and knowledge-based tasks. EXAONE 3.5 models consistently performed well across all benchmark categories. The 32B and 7.8B models excelled in real-world use cases and long-context scenarios, often surpassing baseline models of similar size. For example, the 32B model achieved an average score of 74.3 in real-world use cases, significantly outperforming competitors like Qwen 2.5 32B and Gemma 2 27B.

Image. Performance Comparison Results of EXAONE 3.5 – On Four Benchmarks Representing Long Context Scenarios. (Excluded from results if the model does not support context lengths longer than 16K) | Image Source: LG AI Research Blog (https://www.lgresearch.ai/blog/view?seq=507)

Similarly, in long-context benchmarks, the models demonstrated a superior ability to process and understand extended contexts in both English and Korean. On tests like Needle-in-a-Haystack (NIAH), all three models achieved near-perfect retrieval accuracy, showcasing their robust performance in tasks requiring detailed context comprehension. The 2.4B model was an efficient option for resource-constrained environments, outperforming baseline models of similar size in all categories. Despite its smaller size, it delivered competitive results in general domain tasks, such as solving mathematical problems and writing source code. For example, the 2.4B model scored an average of 63.3 across nine benchmarks in general scenarios, surpassing larger models like Gemma 2 9B in multiple metrics. Real-world use case evaluations incorporated benchmarks like MT-Bench, KoMT-Bench, and LogicKor, where EXAONE 3.5 models were judged on multi-turn responses. They achieved high scores in both English and Korean, highlighting their bilingual proficiency. For instance, the 32B model achieved top-tier results in MT-Bench with a score of 8.51, generating accurate and contextually relevant responses.

Image 3. Performance Comparison Results of EXAONE 3.5 – On Seven Benchmarks Representing Real-world Use Case Scenarios | Image Source: LG AI Research Blog (https://www.lgresearch.ai/blog/view?seq=507)

In the long-context category, EXAONE 3.5 models were evaluated using benchmarks like LongBench and LongRAG and in-house tests like Ko-WebRAG. The models demonstrated exceptional long-context processing capabilities, consistently outperforming baselines in retrieving and reasoning over extended texts. The 32B model, for example, scored 71.1 on average across long-context benchmarks, cementing its status as a leader in this domain. General domain evaluations included benchmarks for mathematics, coding, and parametric knowledge. The EXAONE 3.5 models delivered competitive performance compared to peers. The 32B model achieved an average score of 74.8 across nine benchmarks, while the 7.8B model scored 70.2.

Image Source: LG AI Research Blog (https://www.lgresearch.ai/blog/view?seq=507)

Responsible AI Development: Ethical and Transparent Practices

The development of EXAONE 3.5 models adhered to LG AI Research’s Responsible AI Development Framework, prioritizing data governance, ethical considerations, and risk management. Recognizing these models’ open nature and potential for widespread use across various domains, the framework aims to maximize social benefits while maintaining fairness, safety, accountability, and transparency. This commitment aligns with the LG AI Ethics Principles, which guide AI technologies’ ethical use and deployment. EXAONE 3.5 models benefit the AI community by addressing feedback from the EXAONE 3.0 release.

However, releasing open models like EXAONE 3.5 also entails potential risks, including inequality, misuse, and the unintended generation of harmful content. LG AI Research conducted an AI ethical impact assessment to mitigate these risks, identifying challenges such as bias, privacy violations, and regulatory compliance. Legal risk assessments were performed on all datasets, and sensitive information was removed through de-identification processes. Bias in training data was addressed through pre-processing documentation and evaluation, ensuring high data quality and fairness. To ensure safe and responsible use of the models, LG AI Research verified the open-source libraries employed and committed to monitoring AI regulations across different jurisdictions. Efforts to enhance the explainability of AI inferences were also prioritized to build trust among users and stakeholders. While fully explaining AI reasoning remains challenging, ongoing research aims to improve transparency and accountability. The safety of EXAONE 3.5 models was assessed using a third-party dataset provided by the Ministry of Science and ICT of the Republic of Korea. This evaluation tested the models’ ability to filter out harmful content, with results showing some effectiveness but highlighting the need for further improvement.

Key Takeaways, Real-World Applications, and Business Partnerships of EXAONE 3.5

Exceptional Long Context Understanding: EXAONE 3.5 models stand out for their robust long-context processing capabilities, which make them well suited to RAG workloads. Each model can effectively handle 32K tokens. Unlike models that only claim a theoretical long-context capacity, EXAONE 3.5 has an “Effective Context Length” of 32K, making it highly functional for practical applications. Its bilingual proficiency ensures top-tier performance in processing complex English and Korean contexts.

Superior Instruction Following Capabilities: EXAONE 3.5 excels in usability-focused tasks, delivering the highest average scores across seven benchmarks representing real-world use cases. This demonstrates its ability to enhance productivity and efficiency in industrial applications. All three models performed significantly better than global models of similar sizes in English and Korean.

Strong General Domain Performance: EXAONE 3.5 models deliver excellent results across nine benchmarks in general domains, particularly in mathematics and programming. The 2.4B model ranks first in average scores among models of similar size, showcasing its efficiency for resource-constrained environments. Meanwhile, the 7.8B and 32B models achieve competitive scores, demonstrating EXAONE 3.5’s versatility in handling various tasks.

Commitment to Responsible AI Development: LG AI Research has prioritized ethical considerations and transparency in the development of EXAONE 3.5. An AI ethical impact assessment identified and addressed potential risks such as inequality, harmful content, and misuse. The models excel at filtering out hate speech and illegal content, although the 2.4B model requires improvement in addressing regional and occupational biases. Transparent disclosure of evaluation results underscores LG AI Research’s commitment to fostering ethical AI development and encouraging further research into responsible AI.

Practical Applications and Business Partnerships: EXAONE 3.5 is being integrated into real-world applications through partnerships with companies like Polaris Office and Hancom. These collaborations aim to incorporate EXAONE 3.5 into software solutions, enhancing efficiency and productivity for both corporate and public sectors. A Proof of Concept (PoC) project with Hancom highlights the potential for AI-driven innovations to transform government and public institution workflows, showcasing the model’s practical business value.

Conclusion: A New Standard in Open-Source AI

In conclusion, LG AI Research has set a new benchmark with the release of EXAONE 3.5, a three-model series of open-source LLMs. Combining advanced instruction-following capabilities and unparalleled long-context understanding, EXAONE 3.5 is designed to meet the diverse needs of researchers, businesses, and industries. Its versatile range of models, 2.4B, 7.8B, and 32B, offers tailored solutions for resource-constrained environments and high-performance applications. The open-source, three-model series can be accessed on Hugging Face. Users can stay connected by following LG AI Research’s LinkedIn page and the LG AI Research website for the latest updates, insights, and opportunities to engage with their latest advancements.

Sources

LG AI Research LinkedIn Page

EXAONE 3.5 Blog

EXAONE 3.5 Technical Report

EXAONE 3.5 on Hugging Face

EXAONE 3.5 on GitHub

Thanks to the LG AI Research team for the thought leadership and resources for this article. The LG AI Research team has supported us with this content.
The post LG AI Research Releases EXAONE 3.5: Three Open-Source Bilingual Frontier AI-level Models Delivering Unmatched Instruction Following and Long Context Understanding for Global Leadership in Generative AI Excellence appeared first on MarkTechPost.

How AWS sales uses Amazon Q Business for customer engagement

Earlier this year, we published the first in a series of posts about how AWS is transforming our seller and customer journeys using generative AI. In addition to planning considerations when building an AI application from the ground up, it focused on our Account Summaries use case, which allows account teams to quickly understand the state of a customer account, including recent trends in service usage, opportunity pipeline, and recommendations to help customers maximize the value they receive from AWS.
In the same spirit of using generative AI to equip our sales teams to most effectively meet customer needs, this post reviews how we’ve delivered an internally-facing conversational sales assistant using Amazon Q Business. We discuss how our sales teams are using it today, compare the benefits of Amazon Q Business as a managed service to the do-it-yourself option, review the data sources available and high-level technical design, and talk about some of our future plans.
Introducing Field Advisor
In April 2024, we launched our AI sales assistant, which we call Field Advisor, making it available to AWS employees in the Sales, Marketing, and Global Services organization, powered by Amazon Q Business. Since that time, thousands of active users have asked hundreds of thousands of questions through Field Advisor, which we have embedded in our customer relationship management (CRM) system, as well as through a Slack application. The following screenshot shows an example of an interaction with Field Advisor.

Field Advisor serves four primary use cases:

AWS-specific knowledge search – With Amazon Q Business, we’ve made internal data sources as well as public AWS content available in Field Advisor’s index. This enables sales teams to interact with our internal sales enablement collateral, including sales plays and first-call decks, as well as customer references, customer- and field-facing incentive programs, and content on the AWS website, including blog posts and service documentation.
Document upload – When users need to provide context of their own, the chatbot supports uploading multiple documents during a conversation. We’ve seen our sales teams use this capability to do things like consolidate meeting notes from multiple team members, analyze business reports, and develop account strategies. For example, an account manager can upload a document representing their customer’s account plan, and use the assistant to help identify new opportunities with the customer.
General productivity – Amazon Q Business specializes in Retrieval Augmented Generation (RAG) over enterprise and domain-specific datasets, and can also perform general knowledge retrieval and content generation tasks. Our sales, marketing, and operations teams use Field Advisor to brainstorm new ideas, as well as generate personalized outreach that they can use with their customers and stakeholders.
Notifications and recommendations – To complement the conversational capabilities provided by Amazon Q, we’ve built a mechanism that allows us to deliver alerts, notifications, and recommendations to our field team members. These push-based notifications are available in our assistant’s Slack application, and we’re planning to make them available in our web experience as well. Example notifications we deliver include field-wide alerts in support of AWS summits like AWS re:Invent, reminders to generate an account summary when there’s an upcoming customer meeting, AI-driven insights around customer service usage and business data, and cutting-edge use cases like autonomous prospecting, which we’ll talk more about in an upcoming post.

Based on an internal survey, our field teams estimate that roughly a third of their time is spent preparing for their customer conversations, and another 20% (or more) is spent on administrative tasks. This time adds up individually, but also collectively at the team and organizational level. Using our AI assistant built on Amazon Q, team members are saving hours of time each week. Not only that, but our sales teams devise action plans that they otherwise might have missed without AI assistance.
Here’s a sampling of what some of our more active users had to say about their experience with Field Advisor:
“I use Field Advisor to review executive briefing documents, summarize meetings and outline actions, as well as analyze dense information into key points with prompts. Field Advisor continues to enable me to work smarter, not harder.” – Sales Director
“When I prepare for onsite customer meetings, I define which advisory packages to offer to the customer. We work backward from the customer’s business objectives, so I download an annual report from the customer website, upload it in Field Advisor, ask about the key business and tech objectives, and get a lot of valuable insights. I then use Field Advisor to brainstorm ideas on how to best position AWS services. Summarizing the business objectives alone saves me between 4–8 hours per customer, and we have around five customer meetings to prepare for per team member per month.” – AWS Professional Services, EMEA
“I benefit from getting notifications through Field Advisor that I would otherwise not be aware of. My customer’s Savings Plans were expiring, and the notification helped me kick off a conversation with them at the right time. I asked Field Advisor to improve the content and message of an email I needed to send their executive team, and it only took me a minute. Thank you!” – Startup Account Manager, North America
Amazon Q Business underpins this experience, reducing the time and effort it takes for internal teams to have productive conversations with their customers that drive them toward the best possible outcomes on AWS.
The rest of this post explores how we’ve built our AI assistant for sales teams using Amazon Q Business, and highlights some of our future plans.
Putting Amazon Q Business into action
We started our journey in building this sales assistant before Amazon Q Business was available as a fully managed service. AWS provides the primitives needed for building new generative AI applications from the ground up: services like Amazon Bedrock to provide access to several leading foundation models, several managed vector database options for semantic search, and patterns for using Amazon Simple Storage Service (Amazon S3) as a data lake to host knowledge bases that can be used for RAG. This approach works well for teams like ours with builders experienced in these technologies, as well as for teams who need deep control over every component of the tech stack to meet their business objectives.
When Amazon Q Business became generally available in April 2024, we quickly saw an opportunity to simplify our architecture, because the service was designed to meet the needs of our use case—to provide a conversational assistant that could tap into our vast (sales) domain-specific knowledge bases. By moving our core infrastructure to Amazon Q, we no longer needed to choose a large language model (LLM) and optimize our use of it, manage Amazon Bedrock agents, a vector database and semantic search implementation, or custom pipelines for data ingestion and management. In just a few weeks, we were able to cut over to Amazon Q and significantly reduce the complexity of our service architecture and operations. Not only that, we expected this move to pay dividends—and it has—as the Amazon Q Business service team has continued to add new features (like automatic personalization) and enhance performance and result accuracy.
The following diagram illustrates Field Advisor’s high-level architecture:

Solution overview
We built Field Advisor using the built-in capabilities of Amazon Q Business. This includes how we configured data sources that comprise our knowledge base, indexing documents and relevancy tuning, security (authentication, authorization, and guardrails), and Amazon Q’s APIs for conversation management and custom plugins. We deliver our chatbot experience through a custom web frontend, as well as through a Slack application.
Data management
As mentioned earlier in this post, our initial knowledge base comprises all of our internal sales enablement materials, as well as publicly available content including the AWS website, blog posts, and service documentation. Amazon Q Business provides a number of out-of-the-box connectors to popular data sources like relational databases, content management systems, and collaboration tools. In our case, where we have several applications built in-house, as well as third-party software backed by Amazon S3, we make heavy use of the Amazon Q connector for Amazon S3, as well as custom connectors we’ve written. Using the service’s built-in source connectors standardizes and simplifies the work needed to maintain data quality and manage the overall data lifecycle. Amazon Q gives us a templatized way to filter source documents when generating responses on a particular topic, making it straightforward for the application to produce a higher quality response. Not only that, but each time Amazon Q provides an answer using the knowledge base we’ve connected, it automatically cites sources, enabling our sellers to verify the authenticity of the information. Previously, we had to build and maintain custom logic to handle these tasks.
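As a rough sketch of what a custom connector can look like (not AWS’s internal implementation), the following example pushes documents from an in-house system into an Amazon Q Business index with the BatchPutDocument API via boto3. The application ID, index ID, document fields, and the _source_uri attribute shown here are placeholders and assumptions.

# Illustrative custom-connector sketch: indexing documents with Amazon Q Business.
import boto3

qbusiness = boto3.client("qbusiness")

response = qbusiness.batch_put_document(
    applicationId="your-q-business-application-id",  # placeholder
    indexId="your-q-business-index-id",              # placeholder
    documents=[
        {
            "id": "sales-play-001",
            "title": "Example sales play",
            "contentType": "PLAIN_TEXT",
            "content": {"blob": b"Body text of the sales play goes here."},
            "attributes": [
                {
                    "name": "_source_uri",  # assumed reserved attribute used for citation links
                    "value": {"stringValue": "https://example.internal/sales-play-001"},
                }
            ],
        }
    ],
)
print("Documents that failed to index:", response.get("failedDocuments", []))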
Security
Amazon Q Business provides capabilities for authentication, authorization, and access control out of the box. For authentication, we use AWS IAM Identity Center for enterprise single sign-on (SSO), using our internal identity provider called Amazon Federate. After going through a one-time setup for identity management that governs access to our sales assistant application, Amazon Q is aware of the users and roles across our sales teams, making it effortless for our users to access Field Advisor across multiple delivery channels, like the web experience embedded in our CRM, as well as the Slack application.
Also, with our multi-tenant AI application serving thousands of users across multiple sales teams, it’s critical that end-users are only interacting with data and insights that they should be seeing. Like any large organization, we have information firewalls between teams that help us properly safeguard customer information and adhere to privacy and compliance rules. Amazon Q Business provides the mechanisms for protecting each individual document in its knowledge base, simplifying the work required to make sure we’re respecting permissions on the underlying content that’s accessible to a generative AI application. This way, when a user asks a question of the tool, the answer will be generated using only information that the user is permitted to access.
Web experience
As noted earlier, we built a custom web frontend rather than using the Amazon Q built-in web experience. The Amazon Q experience works great, with features like conversation history, sample quick prompts, and Amazon Q Apps. Amazon Q Business makes these features available through the service API, allowing for a customized look and feel on the frontend. We chose this path to have a more fluid integration with our other field-facing tools, control over branding, and sales-specific contextual hints that we’ve built into the experience. As an example, we’re planning to use Amazon Q Apps as the foundation for an integrated prompt library that is personalized for each user and field-facing role.
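As a rough illustration of the kind of call a custom frontend makes (this is not the Field Advisor implementation), the following sketch uses the Amazon Q Business ChatSync API through boto3; the application ID and message are placeholders, and in an identity-aware setup the call would be made with credentials tied to the end user.

# Illustrative sketch: a custom web frontend calling the ChatSync API.
import boto3

qbusiness = boto3.client("qbusiness")

response = qbusiness.chat_sync(
    applicationId="your-q-business-application-id",  # placeholder
    userMessage="Summarize the latest sales plays for analytics workloads.",
)

print(response["systemMessage"])
# Each response carries source attributions that the frontend can render as citations.
for source in response.get("sourceAttributions", []):
    print("-", source.get("title"), source.get("url"))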
A look at what’s to come
Field Advisor has seen early success, but it’s still just the beginning, or Day 1 as we like to say here at Amazon. We’re continuing to work on bringing our field-facing teams and field support functions more generative AI across the board. With Amazon Q Business, we no longer need to manage each of the infrastructure components required to deliver a secure, scalable conversational assistant—instead, we can focus on the data, insights, and experience that benefit our salesforce and help them make our customers successful on AWS. As Amazon Q Business adds features, capabilities, and improvements (which we often have the privilege of being able to test in early access) we automatically reap the benefits.
The team that built this sales assistant has been focused on developing—and will be launching soon—deeper integration with our CRM. This will enable teams across all roles to ask detailed questions about their customer and partner accounts, territories, leads and contacts, and sales pipeline. With an Amazon Q custom plugin that uses our internal natural language to SQL (NL2SQL) library, the same library that powers generative SQL capabilities across some AWS database services like Amazon Redshift, we will provide the ability to aggregate and slice-and-dice the opportunity pipeline and trends in product consumption conversationally. Finally, a common request we get is to use the assistant to generate more hyper-personalized customer-facing collateral—think of a first-call deck about AWS products and solutions that’s specific to an individual customer, localized in their language, that draws from the latest available service options, competitive intelligence, and the customer’s existing usage in the AWS Cloud.
Conclusion
In this post, we reviewed how we’ve made a generative AI assistant available to AWS sales teams, powered by Amazon Q Business. As new capabilities land and usage continues to grow, we’re excited to see how our field teams use this, along with other AI solutions, to help customers maximize their value on the AWS Cloud.
The next post in this series will dive deeper into another recent generative AI use case and how we applied this to autonomous sales prospecting. Stay tuned for more, and reach out to us with any questions about how you can drive growth with AI at your business.

About the authors
Joe Travaglini is a Principal Product Manager on the AWS Field Experiences (AFX) team who focuses on helping the AWS salesforce deliver value to AWS customers through generative AI. Prior to AFX, Joe led the product management function for Amazon Elastic File System, Amazon ElastiCache, and Amazon MemoryDB.
Jonathan Garcia is a Sr. Software Development Manager based in Seattle with over a decade of experience at AWS. He has worked on a variety of products, including data visualization tools and mobile applications. He is passionate about serverless technologies, mobile development, leveraging Generative AI, and architecting innovative high-impact solutions. Outside of work, he enjoys golfing, biking, and exploring the outdoors.
Umesh Mohan is a Software Engineering Manager at AWS, where he has been leading a team of talented engineers for over three years. With more than 15 years of experience in building data warehousing products and software applications, he is now focusing on the use of generative AI to drive smarter and more impactful solutions. Outside of work, he enjoys spending time with his family and playing tennis.

Discover insights from your Amazon Aurora PostgreSQL database using th …

Amazon Aurora PostgreSQL-Compatible Edition is a fully managed, PostgreSQL-compatible, ACID-compliant relational database engine that combines the speed, reliability, and manageability of Amazon Aurora with the simplicity and cost-effectiveness of open source databases. Aurora PostgreSQL-Compatible is a drop-in replacement for PostgreSQL and makes it simple and cost-effective to set up, operate, and scale your new and existing PostgreSQL deployments, freeing you to focus on your business and applications.
Effective data management and performance optimization are critical aspects of running robust and scalable applications. Aurora PostgreSQL-Compatible, a managed relational database service, has become an indispensable part of many organizations’ infrastructure to maintain the reliability and efficiency of their data-driven applications. However, extracting valuable insights from the vast amount of data stored in Aurora PostgreSQL-Compatible often requires manual efforts and specialized tooling. Users such as database administrators, data analysts, and application developers need to be able to query and analyze data to optimize performance and validate the success of their applications. Generative AI provides the ability to take relevant information from a data source and deliver well-constructed answers back to the user.
Building a generative AI-based conversational application that is integrated with the data sources that contain relevant content requires time, money, and people. You first need to build connectors to the data sources. Next, you need to index this data to make it available for a Retrieval Augmented Generation (RAG) approach, where relevant passages are delivered with high accuracy to a large language model (LLM). To do this, you need to select an index that provides the capabilities to index the content for semantic and vector search, build the infrastructure to retrieve and rank the answers, and build a feature-rich web application. You also need to hire and staff a large team to build, maintain, and manage such a system.
Amazon Q Business is a fully managed generative AI-powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. Amazon Q Business can help you get fast, relevant answers to pressing questions, solve problems, generate content, and take action using the data and expertise found in your company’s information repositories, code, and enterprise systems (such as an Aurora PostgreSQL database, among others). Amazon Q provides out-of-the-box data source connectors that can index content into a built-in retriever and uses an LLM to provide accurate, well-written answers. A data source connector is a component of Amazon Q that helps integrate and synchronize data from multiple repositories into one index.
Amazon Q Business offers multiple prebuilt connectors to a large number of data sources, including Aurora PostgreSQL-Compatible, Atlassian Confluence, Amazon Simple Storage Service (Amazon S3), Microsoft SharePoint, and Salesforce, and helps you create your generative AI solution with minimal configuration. For a full list of Amazon Q Business supported data source connectors, see Amazon Q Business connectors.
In this post, we walk you through configuring and integrating Amazon Q Business with Aurora PostgreSQL-Compatible to enable your database administrators, data analysts, application developers, leadership, and other teams to quickly get accurate answers to their questions related to the content stored in Aurora PostgreSQL databases.
Use cases
After you integrate Amazon Q Business with Aurora PostgreSQL-Compatible, users can ask questions directly from the database content. This enables the following use cases:

Natural language search – Users can search for specific data, such as records or entries, using conversational language. This makes it straightforward to find the necessary information without needing to remember exact keywords or filters.
Summarization – Users can request a concise summary of the data matching their search query, helping them quickly understand key points without manually reviewing each record.
Query clarification – If a user’s query is ambiguous or lacks sufficient context, Amazon Q Business can engage in a dialogue to clarify the intent, making sure the user receives the most relevant and accurate results.

Overview of the Amazon Q Business Aurora (PostgreSQL) connector
A data source connector is a mechanism for integrating and synchronizing data from multiple repositories into a single index. Amazon Q Business offers multiple data source connectors that can connect to your data sources and help you create your generative AI solution with minimal configuration.
A data source is a data repository or location that Amazon Q Business connects to in order to retrieve your data stored in the database. After the PostgreSQL data source is set up, you can create one or multiple data sources within Amazon Q Business and configure them to start indexing data from your Aurora PostgreSQL database. When you connect Amazon Q Business to a data source and initiate the sync process, Amazon Q Business crawls and adds documents from the data source to its index.
Types of documents
Let’s look at what are considered as documents in the context of the Amazon Q Business Aurora (PostgreSQL) connector. A document is a collection of information that consists of a title, the content (or the body), metadata (data about the document), and access control list (ACL) information to make sure answers are provided from documents that the user has access to.
The Amazon Q Business Aurora (PostgreSQL) connector supports crawling of the following entities as a document:

Table data in a single database
View data in a single database

Each row in a table and view is considered a single document.
The Amazon Q Business Aurora (PostgreSQL) connector also supports field mappings. Field mappings allow you to map document attributes from your data sources to fields in your Amazon Q index. This includes reserved (default) field mappings created automatically by Amazon Q, as well as custom field mappings that you can create and edit.
Refer to Aurora (PostgreSQL) data source connector field mappings for more information.
ACL crawling
Amazon Q Business supports crawling ACLs for document security by default. Turning off ACLs and identity crawling is no longer supported. In preparation for connecting Amazon Q Business applications to AWS IAM Identity Center, enable ACL indexing and identity crawling for secure querying and re-sync your connector. After you turn ACL and identity crawling on, you won’t be able to turn them off.
If you want to index documents without ACLs, make sure the documents are marked as public in your data source.
When you connect a database data source to Amazon Q, Amazon Q crawls user and group information from a column in the source table. You specify this column on the Amazon Q console or using the configuration parameter as part of the CreateDataSource operation.
If you activate ACL crawling, you can use that information to filter chat responses to your end-user’s document access level.
The following are important considerations for a database data source:

You can only specify an allow list for a database data source. You can’t specify a deny list.
You can only specify groups. You can’t specify individual users for the allow list.
The database column should be a string containing a semicolon delimited list of groups.

Refer to How Amazon Q Business connector crawls Aurora (PostgreSQL) ACLs for more information.
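As a hypothetical illustration of the ACL format described above, the following sketch creates a source table whose allowed_groups column stores a semicolon-delimited allow list, matching the movies example used later in this post. The connection details, table name, column names, and credentials are all placeholders.

# Hypothetical source table with a semicolon-delimited group allow list column.
import psycopg2

conn = psycopg2.connect(
    host="database-1.cluster-example.us-east-1.rds.amazonaws.com",  # placeholder endpoint
    port=5432,
    dbname="postgres",
    user="qbusiness_reader",
    password="REPLACE_WITH_PASSWORD",
)
with conn, conn.cursor() as cur:
    cur.execute(
        """
        CREATE TABLE IF NOT EXISTS movies (
            movie_id       integer PRIMARY KEY,
            title          text,
            description    text,
            updated_at     timestamp,
            allowed_groups text  -- e.g. 'analysts;marketing;support'
        )
        """
    )
    cur.execute(
        "INSERT INTO movies (movie_id, title, description, updated_at, allowed_groups) "
        "VALUES (%s, %s, %s, now(), %s) ON CONFLICT (movie_id) DO NOTHING",
        (1, "Example Movie", "A family-friendly adventure.", "analysts;marketing"),
    )
conn.close()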
Solution overview
In the following sections, we demonstrate how to set up the Amazon Q Business Aurora (PostgreSQL) connector. This connector allows you to query your Aurora PostgreSQL database using Amazon Q using natural language. Then we provide examples of how to use the AI-powered chat interface to gain insights from the connected data source.
After the configuration is complete, you can configure how often Amazon Q Business should synchronize with your Aurora PostgreSQL database to keep up to date with the database content. This enables you to perform complex searches and retrieve relevant information quickly and efficiently, leading to intelligent insights and informed decision-making. By centralizing search functionality and seamlessly integrating with other AWS services, the connector enhances operational efficiency and productivity, while enabling organizations to use the full capabilities of the AWS landscape for data management, analytics, and visualization.
Prerequisites
For this walkthrough, you should have the following prerequisites:

An AWS account where you can follow the instructions mentioned below
An Amazon Aurora PostgreSQL database.
Your Aurora PostgreSQL-Compatible authentication credentials stored in an AWS Secrets Manager secret (a minimal example of creating one follows this list).
Your Aurora PostgreSQL database user name and password. As a best practice, provide Amazon Q with read-only database credentials.
Your database host URL, port, and instance. You can find this information on the Amazon RDS console.
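If you prefer to create the Secrets Manager secret programmatically, the following is a minimal sketch using boto3. The secret name and JSON keys are placeholders, so align them with what your connector configuration expects, and follow the read-only credentials best practice noted above.

# Minimal sketch: storing read-only database credentials in AWS Secrets Manager.
import json
import boto3

secretsmanager = boto3.client("secretsmanager")

secretsmanager.create_secret(
    Name="aurora-postgresql-qbusiness-credentials",  # placeholder secret name
    SecretString=json.dumps(
        {
            "username": "qbusiness_reader",      # read-only database user
            "password": "REPLACE_WITH_PASSWORD",
        }
    ),
)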

Create an Amazon Q Business application
In this section, we walk through the configuration steps for the Amazon Q Business Aurora (PostgreSQL) connector. For more information, see Creating an Amazon Q Business application environment. Complete the following steps to create your application:

On the Amazon Q Business console, choose Applications in the navigation pane.
Choose Create application.

For Application name, enter a name (for example, aurora-connector).
For Access management method, select AWS IAM Identity Center.
For Advanced IAM Identity Center settings, select Enable cross-region calls to allow Amazon Q Business to connect to an AWS IAM Identity Center instance that exists in an AWS Region not already supported by Amazon Q Business. For more information, see Creating a cross-region IAM Identity Center integration.
Then, you will see the following options based on whether you have an IAM Identity Center instance already configured, or need to create one.

If you don’t have an IAM Identity Center instance configured, you see the following:

The Region your Amazon Q Business application environment is in.
Specify tags for IAM Identity Center – Add tags to keep track of your IAM Identity Center instance.
Create IAM Identity Center – Select to create an IAM Identity Center instance. Depending on your setup, you may be prompted to create an account instance or an organization instance, or both. The console will display an ARN for your newly created resource after it’s created.

If you have both an IAM Identity Center organization instance and an account instance configured, your instances will be auto-detected, and you see the following options:

Organization instance of IAM Identity Center – Select this option to manage access to Amazon Q Business by assigning users and groups from the IAM Identity Center directory for your organization. If you have an IAM Identity Center organization instance configured, your organization instance will be auto-detected.
Account instance of IAM Identity Center – Select this option to manage access to Amazon Q Business by assigning existing users and groups from your IAM Identity Center directory. If you have an IAM Identity Center account instance configured, your account instance will be auto-detected.
The Region your Amazon Q Business application environment is in.
IAM Identity Center – The ARN for your IAM Identity Center instance.

If your IAM Identity Center instance is configured in a Region Amazon Q Business isn’t available in, and you haven’t activated cross-Region IAM Identity Center calls, you will see a message saying that a connection is unavailable with an option to Switch Region. When you allow a cross-Region connection between Amazon Q Business and IAM Identity Center using Advanced IAM Identity Center settings, your cross-Region IAM Identity Center instance will be auto-detected by Amazon Q Business.

Keep everything else as default and choose Create.

Create an Amazon Q Business retriever
After you create the application, you can create a retriever. Complete the following steps:

On the application page, choose Data sources in the navigation pane.

Choose Select retriever.

For Retrievers, select your type of retriever. For this post, we select Native.
For Index provisioning, select your index type. For this post, we select Enterprise.
For Number of units, enter a number of index units. For this post, we use 1 unit, which can read up to 20,000 documents. This limit applies to the connectors you configure for this retriever.
Choose Confirm.

Connect data sources
After you create the retriever, complete the following steps to add a data source:

On the Data sources page, choose Add data source.

Choose your data source. For this post, we choose Aurora (PostgreSQL).

You can configure up to 50 data sources per application.

Under Name and description, enter a data source name. Your name can include hyphens (-) but not spaces. The name has a maximum of 1,000 alphanumeric characters.
Under Source, enter the following information:

For Host, enter the database host endpoint (without an http:// prefix), for example instance-url.region.rds.amazonaws.com.
For Port, enter the database port, for example 5432.
For Instance, enter the name of the database that you want to connect with and where tables and views are created, for example postgres.

If you enable SSL Certificate Location, enter the Amazon S3 path to your SSL certificate file.
For Authorization, Amazon Q Business crawls ACL information by default to make sure responses are generated only from documents your end-users have access to. See Authorization for more details.
Under Authentication, if you have an existing Secrets Manager secret that has the database user name and password, you can use it; otherwise, enter the following information for your new secret:

For Secret name, enter a name for your secret.
For Database user name and Password, enter the authentication credentials you copied from your database.
Choose Save.

For Configure VPC and security group, choose whether you want to use a virtual private cloud (VPC). For more information, see Virtual private cloud. If you do, enter the following information:

For Virtual Private Cloud (VPC), choose the VPC where Aurora PostgreSQL-Compatible is present.
For Subnets, choose up to six repository subnets that define the subnets and IP ranges the repository instance uses in the selected VPC.
For VPC security groups, choose up to 10 security groups that allow access to your data source.

Make sure the security group allows incoming traffic from Amazon Elastic Compute Cloud (Amazon EC2) instances and devices outside your VPC. For database data sources, a security group is required.

Keep the default setting for IAM role (Create a new service role) and a new role name is generated automatically. For more information, see IAM role for Aurora (PostgreSQL) connector.

Under Sync scope, enter the following information:

For SQL query, enter SQL query statements such as SELECT and JOIN operations. SQL queries must be less than 1,000 characters and must not contain any semicolons (;). Amazon Q will crawl database content that matches your query (see the example values after this list).
For Primary key column, enter the primary key for the database table. This identifies a table row within your database table. Each row in a table and view is considered a single document.
For Title column, enter the name of the document title column in your database table.
For Body column, enter the name of the document body column in your database table.
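The following illustrative values show what the Sync scope fields might contain for the hypothetical movies table used later in this post; adjust the query and column names to your own schema. Note that the query contains no semicolon and stays well under the 1,000-character limit.

# Illustrative Sync scope values for a hypothetical movies table.
sync_scope = {
    "sql_query": (
        "SELECT movie_id, title, description, genre, updated_at, allowed_groups "
        "FROM movies WHERE release_year >= 2000"
    ),
    "primary_key_column": "movie_id",
    "title_column": "title",
    "body_column": "description",
}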

Under Additional configuration, configure the following settings:

For Change-detecting columns, enter the names of the columns that Amazon Q will use to detect content changes. Amazon Q will re-index content when there is a change in these columns.
For Users’ IDs column, enter the name of the column that contains user IDs to be allowed access to content.
For Groups column, enter the name of the column that contains groups to be allowed access to content.
For Source URLs column, enter the name of the column that contains source URLs to be indexed.
For Timestamp column, enter the name of the column that contains timestamps. Amazon Q uses timestamp information to detect changes in your content and sync only changed content.
For Timestamp format of table, enter the name of the column that contains timestamp formats to use to detect content changes and re-sync your content.
For Database time zone, enter the name of the column that contains time zones for the content to be crawled.

Under Sync mode, choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Q for the first time, content is synced by default. For more details, see Sync mode.

New, modified, or deleted content sync – Sync and index new, modified, or deleted content only.
New or modified content sync – Sync and index new or modified content only.
Full sync – Sync and index content regardless of previous sync status.

Under Sync run schedule, for Frequency, choose how often Amazon Q will sync with your data source. For more details, see Sync run schedule.
Under Tags, add tags to search and filter your resources or track your AWS costs. See Tags for more details.
Under Field mappings, you can list data source document attributes to map to your index fields. Add the fields from the Data source details page after you finish adding your data source. For more information, see Field mappings. You can choose from two types of fields:

Default – Automatically created by Amazon Q on your behalf based on common fields in your data source. You can’t edit these.
Custom – Automatically created by Amazon Q on your behalf based on common fields in your data source. You can edit these. You can also create and add new custom fields.

When you’re done, choose Add data source.

When the data source state is Active, choose Sync now.
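If you prefer to start the sync programmatically instead of choosing Sync now, the following minimal boto3 sketch calls the StartDataSourceSyncJob API; all IDs shown are placeholders for the values from your own application, index, and data source.

# Optional: start the data source sync programmatically.
import boto3

qbusiness = boto3.client("qbusiness")

sync_job = qbusiness.start_data_source_sync_job(
    applicationId="your-q-business-application-id",      # placeholder
    indexId="your-q-business-index-id",                   # placeholder
    dataSourceId="your-aurora-postgresql-data-source-id", # placeholder
)
print("Sync job execution ID:", sync_job["executionId"])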

Add groups and users
After you add the data source, you can add users and groups in the Amazon Q Business application to query the data ingested from data source. Complete the following steps:

On your application page, choose Manage user access.

Choose to add new users or assign existing users:

Select Add new users to create new users in IAM Identity Center.
Select Assign existing users and groups if you already have users and groups in IAM Identity Center. For this post, we select this option.

Choose Next.

Search for the users or groups you want to assign and choose Assign to add them to the application.

After the users are added, choose Change subscription to assign either the Business Lite or Business Pro subscription plan.

Choose Confirm to confirm your subscription choice.

Test the solution
To access the Amazon Q Business Web Experience, navigate to the Web experience settings tab and choose the link for Deployed URL.

You will need to authenticate with the IAM Identity Center user details before you’re redirected to the chat interface.

Our data source is the Aurora PostgreSQL database, which contains a Movie table. We have indexed this to our Amazon Q Business application, and we will ask questions related to this data. The following screenshot shows a sample of the data in this table.

For the first query, we ask Amazon Q Business to provide recommendations for kids’ movies in natural language, and it queries the indexed data to provide the response shown in the following screenshot.

For the second query, we ask Amazon Q Business to provide more details of a specific movie in natural language. It uses the indexed data from the columns of our table to provide the response.

Frequently asked questions
In this section, we provide guidance to frequently asked questions.
Amazon Q Business is unable to answer your questions
If you get the response “Sorry, I could not find relevant information to complete your request,” this may be due to a few reasons:

No permissions – ACLs applied to your account don’t allow you to query certain data sources. If this is the case, reach out to your application administrator to make sure your ACLs are configured to access the data sources. You can go to the Sync History tab to view the sync history, and then choose the View Report link, which opens an Amazon CloudWatch Logs Insights query that provides additional details like the ACL list, metadata, and other useful information that might help with troubleshooting. For more details, see Introducing document-level sync reports: Enhanced data sync visibility in Amazon Q Business.
Data connector sync failed – Your data connector may have failed to sync information from the source to the Amazon Q Business application. Verify the data connector’s sync run schedule and sync history to confirm the sync is successful.

If none of these reasons apply to your use case, open a support case and work with your technical account manager to get this resolved.
How to generate responses from authoritative data sources
If you want Amazon Q Business to only generate responses from authoritative data sources, you can configure this using the Amazon Q Business application global controls under Admin controls and guardrails.

Log in to the Amazon Q Business console as an Amazon Q Business application administrator.
Navigate to the application and choose Admin controls and guardrails in the navigation pane.
Choose Edit in the Global controls section to set these options.

For more information, refer to Admin controls and guardrails in Amazon Q Business.

Amazon Q Business responds using old (stale) data even though your data source is updated
Each Amazon Q Business data connector can be configured with a unique sync run schedule frequency. Verifying the sync status and sync schedule frequency for your data connector reveals when the last sync ran successfully. Your data connector’s sync run schedule could be set to sync at a scheduled time of day, week, or month. If it’s set to run on demand, the sync has to be manually invoked. When the sync run is complete, verify the sync history to make sure the run has successfully synced the new content. Refer to Sync run schedule for more information about each option.

Using different IdPs such as Okta, Entra ID, or Ping Identity
For more information about how to set up Amazon Q Business with other identity providers (IdPs) as your SAML 2.0-compliant IdP, see Creating an Amazon Q Business application using Identity Federation through IAM.
Limitations
For more details about the limitations of the Amazon Q Business Aurora (PostgreSQL) connector, see Known limitations for the Aurora (PostgreSQL) connector.
Clean up
To avoid incurring future charges and to clean up unused roles and policies, delete the resources you created:

If you created a Secrets Manager secret to store the database password, delete the secret.
Delete the data source IAM role. You can find the role ARN on the data source page.

Delete the Amazon Q application:

On the Amazon Q console, choose Applications in the navigation pane.
Select your application and on the Actions menu, choose Delete.
To confirm deletion, enter delete in the field and choose Delete.
Wait until you get the confirmation message; the process can take up to 15 minutes.

Delete your IAM Identity Center instance.

Conclusion
Amazon Q Business unlocks powerful generative AI capabilities, allowing you to gain intelligent insights from your Aurora PostgreSQL-Compatible data through natural language querying and generation. By following the steps outlined in this post, you can seamlessly connect your Aurora PostgreSQL database to Amazon Q Business and empower your developers and end-users to interact with structured data in a more intuitive and conversational manner.
To learn more about the Amazon Q Business Aurora (PostgreSQL) connector, refer to Connecting Amazon Q Business to Aurora (PostgreSQL) using the console.

About the Authors
Moumita Dutta is a Technical Account Manager at Amazon Web Services. With a focus on financial services industry clients, she delivers top-tier enterprise support, collaborating closely with them to optimize their AWS experience. Additionally, she is a member of the AI/ML community and serves as a generative AI expert at AWS. In her leisure time, she enjoys gardening, hiking, and camping.
Manoj CS is a Solutions Architect at AWS, based in Atlanta, Georgia. He specializes in assisting customers in the telecommunications industry to build innovative solutions on the AWS platform. With a passion for generative AI, he dedicates his free time to exploring this field. Outside of work, Manoj enjoys spending quality time with his family, gardening, and traveling.
Gopal Gupta is a Software Development Engineer at Amazon Web Services. With a passion for software development and expertise in this domain, he designs and develops highly scalable software solutions.

How Tealium built a chatbot evaluation platform with Ragas and Auto-In …

This post was co-written with Varun Kumar from Tealium.
Retrieval Augmented Generation (RAG) pipelines are popular for generating domain-specific outputs based on external data that’s fed in as part of the context. However, there are challenges with evaluating and improving such systems. Two open-source libraries, Ragas (a library for RAG evaluation) and Auto-Instruct, were used with Amazon Bedrock to power a framework that evaluates and improves upon RAG.
In this post, we illustrate the importance of generative AI in the collaboration between Tealium and the AWS Generative AI Innovation Center (GenAIIC) team by automating the following:

Evaluating the retriever and the generated answer of a RAG system based on the Ragas Repository powered by Amazon Bedrock.
Generating improved instructions for each question-and-answer pair using an automatic prompt engineering technique based on the Auto-Instruct Repository. An instruction refers to a general direction or command given to the model to guide generation of a response. These instructions were generated using Anthropic’s Claude on Amazon Bedrock.
Providing a UI for a human-based feedback mechanism that complements an evaluation system powered by Amazon Bedrock.

Amazon Bedrock is a fully managed service that makes popular foundation models (FMs) available through an API, so you can choose from a wide range of FMs to find the model that’s best suited for your use case. Because Amazon Bedrock is serverless, you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications without having to manage any infrastructure.
Tealium background and use case
Tealium is a leader in real-time customer data integration and management. They empower organizations to build a complete infrastructure for collecting, managing, and activating customer data across channels and systems. Tealium uses AI capabilities to integrate data and derive customer insights at scale. Their AI vision is to provide their customers with an active system that continuously learns from customer behaviors and optimizes engagement in real time.
Tealium has built a question and answer (QA) bot using a RAG pipeline to help identify common issues and answer questions about using the platform. The bot is expected to act as a virtual assistant to answer common questions, identify and solve issues, monitor platform health, and provide best practice suggestions, all aimed at helping Tealium customers get the most value from their customer data platform.
The primary goal of this solution with Tealium was to evaluate and improve the RAG solution that Tealium uses to power their QA bot. This was achieved by building the following:

Evaluation pipeline.
Error correction mechanism to semi-automatically improve upon the metrics generated from evaluation. In this engagement, automatic prompt engineering was the only technique used, but others such as different chunking strategies and using semantic instead of hybrid search can be explored depending on your use case.
A human-in-the-loop feedback system allowing a human to approve or disapprove RAG outputs.

Amazon Bedrock was vital in powering an evaluation pipeline and error correction mechanism because of its flexibility in choosing a wide range of leading FMs and its ability to customize models for various tasks. This allowed for testing of many types of specialized models on specific data to power such frameworks. The value of Amazon Bedrock in text generation for automatic prompt engineering and text summarization for evaluation helped tremendously in the collaboration with Tealium. Lastly, Amazon Bedrock allowed for more secure generative AI applications, giving Tealium full control over their data while also encrypting it at rest and in transit.
Solution prerequisites
To test the Tealium solution, start with the following:

Get access to an AWS account.
Create a SageMaker domain instance.
Obtain access to the following models on Amazon Bedrock: Anthropic’s Claude Instant, Claude v2, Claude 3 Haiku, and Titan Embeddings G1 – Text. The evaluation using Ragas can be performed using any foundation model (FM) that’s available on Amazon Bedrock. Automatic prompt engineering must use Anthropic’s Claude v2, v2.1, or Claude Instant.
Obtain a golden set of question and answer pairs. Specifically, you need to provide examples of questions that you will ask the RAG bot and their expected ground truths.
Clone automatic prompt engineering and human-in-the-loop repositories. If you want access to a Ragas repository with prompts favorable towards Anthropic Claude models available on Amazon Bedrock, clone and navigate through this repository and this notebook.

The code repositories allow for flexibility of various FMs and customized models with minimal updates, illustrating Amazon Bedrock’s value in this engagement.
Solution overview
The following diagram illustrates a sample solution architecture that includes an evaluation framework, error correction technique (Auto-Instruct and automatic prompt engineering), and human-in-the-loop. As you can see, generative AI is an important part of the evaluation pipeline and the automatic prompt engineering pipeline.

The workflow consists of the following steps:

You first enter a query into the Tealium RAG QA bot. The RAG solution uses FAISS to retrieve an appropriate context for the specified query. Then, it outputs a response.
Ragas takes in this query, context, answer, and a ground truth that you input, and calculates faithfulness, context precision, context recall, answer correctness, answer relevancy, and answer similarity. Ragas can be integrated with Amazon Bedrock (see the Ragas section of the notebook link), which illustrates how Amazon Bedrock can be plugged into different frameworks.
If any of the metrics fall below a certain threshold, the specific question and answer pair is run through the Auto-Instruct library, which generates candidate instructions using Amazon Bedrock. Various FMs can be used for this text generation use case.
The new instructions are appended to the original query, which is then sent back through the Tealium RAG QA bot.
The revised output is evaluated again to determine whether improvements have been made. Steps 3 and 4 can be repeated until all metrics exceed the threshold, and you can set a maximum number of iterations to prevent an infinite loop (a simplified sketch of this loop follows the list).
A human-in-the-loop UI allows a subject matter expert (SME) to provide their own evaluation of given model outputs. This also acts as a guardrail for a system powered by generative AI.

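The following is a minimal sketch of the evaluate-and-correct loop described in the preceding steps. The three helper functions are placeholder stubs standing in for the Tealium RAG QA bot, the Ragas evaluation, and the Auto-Instruct call, and the threshold and iteration cap are assumptions:

```python
# Minimal sketch of the evaluate-and-correct loop.
# The three helpers below are placeholder stubs; replace them with calls to
# your RAG bot, your Ragas evaluation, and your Auto-Instruct prompt generation.

THRESHOLD = 0.7      # assumed per-metric threshold
MAX_ITERATIONS = 3   # cap on correction passes to prevent an infinite loop


def run_rag_bot(query: str) -> tuple[str, list[str]]:
    # Placeholder: return (answer, retrieved_contexts) from your RAG solution.
    return "placeholder answer", ["placeholder context"]


def evaluate_with_ragas(query, contexts, answer, ground_truth) -> dict:
    # Placeholder: return per-metric scores from Ragas (see the Ragas section).
    return {"faithfulness": 0.5, "answer_relevancy": 0.5}


def generate_instruction(query, contexts, answer) -> str:
    # Placeholder: return a candidate instruction generated with Amazon Bedrock.
    return "Answer concisely using only the provided context."


def correct_rag_output(query: str, ground_truth: str) -> dict:
    current_query = query
    answer, metrics = None, {}
    for _ in range(MAX_ITERATIONS):
        answer, contexts = run_rag_bot(current_query)                       # get RAG output
        metrics = evaluate_with_ragas(current_query, contexts, answer,
                                      ground_truth)                         # score it
        if all(score >= THRESHOLD for score in metrics.values()):
            break                                                           # all metrics pass
        instruction = generate_instruction(query, contexts, answer)         # new instruction
        current_query = f"{instruction}\n\n{query}"                         # append to query
    return {"query": current_query, "answer": answer, "metrics": metrics}
```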
In the following sections, we discuss how an example question, its context, its answer (the RAG output), and its ground truth (the expected answer) can be evaluated and revised for a more ideal output. The evaluation is done using Ragas, a RAG evaluation library. Then, prompts and instructions are automatically generated based on their relevance to the question and answer. Lastly, you can approve or disapprove the RAG outputs based on the specific instruction generated from the automatic prompt engineering step.
Out-of-scope
Error correction and human-in-the-loop are two important aspects of this post. However, for each component, the following items are out of scope but can be improved upon in future iterations of the solution:
Error correction mechanism

Automatic prompt engineering is the only method used to correct the RAG solution. This engagement didn't cover other techniques to improve the RAG solution, such as using Amazon Bedrock to find optimal chunking strategies, vector stores, models, and semantic or hybrid search. Further testing needs to be done to evaluate whether FMs from Amazon Bedrock can be a good decision maker for such parameters of a RAG solution.
Based on the technique presented for automatic prompt engineering, there might be opportunities to optimize cost, but this wasn't analyzed during the engagement. Disclaimer: The technique described in this post might not be the most cost-optimal approach.

Human-in-the-loop

SMEs provide their evaluation of the RAG solution by approving or disapproving FM outputs. This feedback is stored in the user's file directory. There is an opportunity to improve the model based on this feedback, but that isn't covered in this post.

Ragas – Evaluation of RAG pipelines
Ragas is a framework that helps evaluate a RAG pipeline. In general, RAG is a natural language processing technique that uses external data to augment an FM's context. Therefore, this framework evaluates the bot's ability to retrieve relevant context as well as output an accurate response to a given question. The collaboration between the AWS GenAIIC and the Tealium team showed that Amazon Bedrock could be integrated with Ragas with minimal changes.
The inputs to Ragas include a set of questions, ground truths, answers, and contexts. For each question, an expected answer (ground truth), the LLM output (answer), and a list of contexts (retrieved chunks) are provided. Context recall, context precision, answer relevancy, faithfulness, answer similarity, and answer correctness were evaluated using Anthropic's Claude on Amazon Bedrock (any version). For your reference, here are the metrics that have been successfully calculated using Amazon Bedrock (a usage sketch follows the list):

Faithfulness – This measures the factual consistency of the generated answer against the given context, so it requires the answer and retrieved context as input. This is a two-step prompt: the generated answer is first broken down into multiple standalone statements and propositions, and then the evaluation LLM validates the attribution of each generated statement to the context. If the attribution can't be validated, it's assumed that the statement is at risk of hallucination. The score is scaled to a 0–1 range; the higher, the better.
Context precision – This evaluates the relevancy of the context to the answer, or in other words, the retriever's ability to capture the best context to answer your query. An LLM verifies whether the information in the given context is directly relevant to the question with a single "Yes" or "No" response. The context is passed in as a list, so if the list has size one (one chunk), then context precision is either 0 (the context isn't relevant to the question) or 1 (it is relevant). If the context list contains multiple chunks, then context precision is a value between 0–1 computed as a weighted average precision, where earlier chunks are weighted more heavily than later ones, so the metric takes into account the order in which the chunks are returned as context.
Context recall – This measures the alignment between the context and the expected RAG output, the ground truth. Similar to faithfulness, each statement in the ground truth is checked to see if it is attributed to the context (thereby evaluating the context).
Answer similarity – This assesses the semantic similarity between the RAG output (answer) and expected answer (ground truth), with a range between 0–1. A higher score signifies better performance. First, the embeddings of answer and ground truth are created, and then a score between 0–1 is predicted, representing the semantic similarity of the embeddings using a cross encoder Tiny BERT model.
Answer relevance – This focuses on how pertinent the generated RAG output (answer) is to the question. A lower score is assigned to answers that are incomplete or contain redundant information. To calculate this score, the LLM is asked to generate multiple questions from a given answer. Then, using an Amazon Titan Embeddings model, embeddings are generated for the generated questions and the actual question. The metric is the mean cosine similarity between all the generated questions and the actual question.
Answer correctness – This is the accuracy between the generated answer and the ground truth. It is calculated from the semantic similarity between the answer and the ground truth in addition to a factual similarity obtained by looking at the context. A threshold value is used if you want to employ a binary 0 or 1 answer correctness score; otherwise, a value between 0–1 is generated.

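As an illustration, the following sketch shows one way to run these metrics with Ragas using Anthropic's Claude and Amazon Titan Embeddings on Amazon Bedrock through the LangChain integrations. Exact import paths, column names, and wrapper requirements vary by Ragas and LangChain version, and the model IDs are assumptions, so treat this as a sketch rather than the configuration used in the engagement:

```python
# Illustrative Ragas evaluation backed by Amazon Bedrock (import paths, column
# names, and model IDs are assumptions; they vary by Ragas/LangChain version).
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    context_precision,
    context_recall,
    answer_relevancy,
    answer_similarity,
    answer_correctness,
)
from langchain_aws import ChatBedrock, BedrockEmbeddings

# One evaluation sample: question, RAG answer, retrieved chunks, and ground truth
data = {
    "question": ["<question asked of the RAG bot>"],
    "answer": ["<answer returned by the RAG bot>"],
    "contexts": [["<retrieved chunk 1>", "<retrieved chunk 2>"]],
    "ground_truth": ["<expected answer from the golden set>"],
}

llm = ChatBedrock(model_id="anthropic.claude-3-haiku-20240307-v1:0")      # assumed model
embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1")     # assumed model

result = evaluate(
    Dataset.from_dict(data),
    metrics=[
        faithfulness,
        context_precision,
        context_recall,
        answer_relevancy,
        answer_similarity,
        answer_correctness,
    ],
    llm=llm,
    embeddings=embeddings,
)
print(result)  # per-metric scores between 0 and 1
```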
AutoPrompt – Automatically generate instructions for RAG
Next, generative AI services were shown to successfully generate and select instructions for prompting FMs. In a nutshell, an FM generates the instructions that best map a question and context to the RAG QA bot answer, following a certain style. This process was done using the Auto-Instruct library. The approach harnesses the ability of FMs to produce candidate instructions, which are then ranked using a scoring model to determine the most effective prompts.
First, you ask an Anthropic Claude model on Amazon Bedrock to generate an instruction for a set of inputs (question and context) that map to an output (answer). The FM is asked to generate a specific type of instruction, such as a one-paragraph instruction, a one-sentence instruction, or a step-by-step instruction, so that many candidate instructions are generated. Look at the generate_candidate_prompts() function to see the logic in code.
Then, the resulting candidate instructions are tested against each other using an evaluation FM. To do this, first, each instruction is compared against all other instructions. Then, the evaluation FM is used to evaluate the quality of the prompts for a given task (query plus context to answer pairs). The evaluation logic for a sample pair of candidate instructions is shown in the test_candidate_prompts() function.
This outputs the most ideal prompt generated by the framework. For each question-and-answer pair, the output includes the best instruction, second best instruction, and third best instruction.
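To make the idea concrete, the following is a simplified sketch of generating candidate instructions and comparing one pair of them with an FM on Amazon Bedrock using the Converse API. The prompts and model ID are assumptions, and the actual generate_candidate_prompts() and test_candidate_prompts() implementations in the repository differ in detail:

```python
# Simplified sketch of candidate-instruction generation and a single pairwise
# comparison with Amazon Bedrock. Prompts and the model ID are assumptions; the
# repository's generate_candidate_prompts() and test_candidate_prompts() differ.
import boto3

bedrock = boto3.client("bedrock-runtime")
MODEL_ID = "anthropic.claude-instant-v1"  # assumed; any Claude model that supports Converse works


def ask(prompt: str) -> str:
    # Send a single-turn prompt to the FM and return its text output.
    response = bedrock.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]


def generate_candidates(question: str, context: str, answer: str) -> list[str]:
    # Ask for instructions in several styles, yielding multiple candidates.
    styles = ["one-sentence", "one-paragraph", "step-by-step"]
    return [
        ask(
            f"Write a {style} instruction that, given the question and context "
            f"below, would lead a model to produce the given answer.\n\n"
            f"Question: {question}\nContext: {context}\nAnswer: {answer}"
        )
        for style in styles
    ]


def compare_pair(question: str, context: str, inst_a: str, inst_b: str) -> str:
    # Use an evaluation FM to pick the better of two candidate instructions.
    verdict = ask(
        "You are ranking two candidate instructions for answering the question "
        "below from the context. Reply with only 'A' or 'B'.\n\n"
        f"Question: {question}\nContext: {context}\n\n"
        f"Instruction A: {inst_a}\nInstruction B: {inst_b}"
    )
    return inst_a if "A" in verdict.upper()[:3] else inst_b
```

In the actual solution, ranking follows the Auto-Instruct approach of aggregating many pairwise comparisons across all candidates; this sketch compares only a single pair to keep the idea visible.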
For a demonstration of performing automatic prompt engineering (and calling Ragas):

Navigate through the following notebook.
Code snippets for how candidate prompts are generated and evaluated are included in this source file with their associated prompts included in this config file.

You can review the full repository for automatic prompt engineering using FMs from Amazon Bedrock.
Human-in-the-loop evaluation
So far, you have learned about the applications of FMs in generating quantitative metrics and prompts. However, depending on the use case, these systems need to be aligned with human evaluators' preferences for you to have full confidence in them. This section presents a human-in-the-loop (HITL) web UI (Streamlit) demonstration, showing a side-by-side comparison of instructions, question inputs, and RAG outputs. This is shown in the following image:

The structure of the UI is:

On the left, select an FM and two instruction templates (as marked by the index number) to test. After you choose Start, you will see the instructions on the main page.
The top text box on the main page is the query.
The text box below that is the first instruction sent to the LLM as chosen by the index number in the first bullet point.
The text box below the first instruction is the second instruction sent to the LLM as chosen by the index number in the first bullet point.
Then comes the model output for Prompt A, which is the output when the first instruction and the query are sent to the LLM. This is compared against the model output for Prompt B, which is the output when the second instruction and the query are sent to the LLM.
You can give your feedback for the two outputs, as shown in the following image.

After you input your results, they’re saved in a file in your directory. These can be used for further enhancement of the RAG solution.
Follow the instructions in this repository to run your own human-in-the-loop UI.
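For orientation, the following is a heavily simplified Streamlit sketch of such a side-by-side comparison with feedback saved to a local file. The call_llm helper, instruction templates, and file name are hypothetical placeholders; the UI in the repository is more complete:

```python
# Heavily simplified sketch of a side-by-side human-in-the-loop comparison.
# call_llm, the instruction templates, and the output file are placeholders.
import json
import streamlit as st


def call_llm(instruction: str, query: str) -> str:
    # Placeholder: send the instruction plus query to your FM and return its output.
    return f"[model output for instruction {instruction!r} and query {query!r}]"


# Hypothetical instruction templates from the automatic prompt engineering step
instructions = ["Instruction template 0", "Instruction template 1", "Instruction template 2"]

with st.sidebar:
    idx_a = st.selectbox("First instruction (index)", range(len(instructions)))
    idx_b = st.selectbox("Second instruction (index)", range(len(instructions)), index=1)
    if st.button("Start"):
        st.session_state["started"] = True

if st.session_state.get("started"):
    query = st.text_area("Query", "<question to send to the RAG bot>")
    st.text_area("Instruction A", instructions[idx_a], key="inst_a")
    st.text_area("Instruction B", instructions[idx_b], key="inst_b")

    col_a, col_b = st.columns(2)
    col_a.write(call_llm(instructions[idx_a], query))  # model output for Prompt A
    col_b.write(call_llm(instructions[idx_b], query))  # model output for Prompt B

    feedback = st.radio("Which output do you approve?", ["Prompt A", "Prompt B", "Neither"])
    if st.button("Save feedback"):
        record = {"query": query, "instruction_a": idx_a, "instruction_b": idx_b,
                  "preferred": feedback}
        with open("hitl_feedback.jsonl", "a") as f:
            f.write(json.dumps(record) + "\n")
        st.success("Feedback saved to hitl_feedback.jsonl")
```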
Chatbot live evaluation metrics
Amazon Bedrock has been used to continuously analyze the bot's performance. The following are the latest results using Ragas:

Statistic            Context Utilization   Faithfulness   Answer Relevancy
Count                714                   704            714
Mean                 0.85014               0.856887       0.7648831
Standard Deviation   0.357184              0.282743       0.304744
Min                  0                     0              0
25%                  1                     1              0.786385
50%                  1                     1              0.879644
75%                  1                     1              0.923229
Max                  1                     1              1

The Amazon Bedrock-based chatbot with Amazon Titan embeddings achieved 85% context utilization, 86% faithfulness, and 76% answer relevancy.
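Summary statistics like these can be produced directly from per-sample Ragas scores. A minimal sketch, using placeholder scores where in practice the DataFrame would come from the Ragas result (for example, via its to_pandas() helper), is shown below; the column names depend on which metrics you run:

```python
import pandas as pd

# Placeholder per-sample scores; in practice, scores_df would come from the
# Ragas result (for example, scores_df = evaluation_result.to_pandas()).
scores_df = pd.DataFrame({
    "context_utilization": [1.0, 0.0, 1.0],
    "faithfulness": [1.0, 0.5, 1.0],
    "answer_relevancy": [0.88, 0.79, 0.92],
})

# describe() yields count, mean, std, min, 25%, 50%, 75%, and max per metric,
# which is how the summary table above can be produced.
print(scores_df.describe())
```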
Conclusion
Overall, the AWS team was able to use various FMs on Amazon Bedrock with the Ragas library to evaluate Tealium's RAG QA bot given a query, the RAG response, the retrieved context, and the expected ground truth. It did this by determining whether:

The RAG response is attributed to the context.
The context is attributed to the query.
The ground truth is attributed to the context.
The RAG response is relevant to the question and similar to the ground truth.

Therefore, it was able to evaluate a RAG solution’s ability to retrieve relevant context and answer the sample question accurately.
In addition, an FM was able to generate multiple instructions from a question-and-answer pair and rank them based on the quality of the responses. After instructions were generated, it was able to slightly reduce errors in the LLM response. The human-in-the-loop demonstration provides a side-by-side view of outputs for different prompts and instructions. This was an enhanced thumbs up/thumbs down approach to further improve inputs to the RAG bot based on human feedback.
Some next steps with this solution include the following:

Improving RAG performance using different models or different chunking strategies based on specific metrics
Testing out different strategies to optimize the cost (number of FM calls) to evaluate generated instructions in the automatic prompt engineering phase
Allowing SME feedback in the human evaluation step to automatically improve upon ground truth or instruction templates

The value of Amazon Bedrock was shown throughout the collaboration with Tealium. The flexibility of Amazon Bedrock in choosing from a wide range of leading FMs and the ability to customize models for specific tasks allow Tealium to power the solution in specialized ways, with minimal updates needed in the future. Amazon Bedrock's strengths in text generation and evaluation were demonstrated in this engagement, giving Tealium the potential and flexibility to build on the solution. Its emphasis on security allows Tealium to be confident in building and delivering more secure applications.
As stated by Matt Gray, VP of Global Partnerships at Tealium,

“In collaboration with the AWS Generative AI Innovation Center, we have developed a sophisticated evaluation framework and an error correction system, utilizing Amazon Bedrock, to elevate the user experience. This initiative has resulted in a streamlined process for assessing the performance of the Tealium QA bot, enhancing its accuracy and reliability through advanced technical metrics and error correction methodologies. Our partnership with AWS and Amazon Bedrock is a testament to our dedication to delivering superior outcomes and continuing to innovate for our mutual clients.”

This is just one of the ways AWS enables builders to deliver generative AI-based solutions. You can get started with Amazon Bedrock and see how it can be integrated in example code bases today. If you're interested in working with the AWS generative AI services, reach out to the GenAIIC.

About the authors
Suren Gunturu is a Data Scientist working in the Generative AI Innovation Center, where he works with various AWS customers to solve high-value business problems. He specializes in building ML pipelines using large language models, primarily through Amazon Bedrock and other AWS Cloud services.
Varun Kumar is a Staff Data Scientist at Tealium, leading its research program to provide high-quality data and AI solutions to its customers. He has extensive experience in training and deploying deep learning and machine learning models at scale. Additionally, he is accelerating Tealium’s adoption of foundation models in its workflow including RAG, agents, fine-tuning, and continued pre-training.
Vidya Sagar Ravipati is a Science Manager at the Generative AI Innovation Center, where he leverages his vast experience in large-scale distributed systems and his passion for machine learning to help AWS customers across different industry verticals accelerate their AI and cloud adoption.