Best practices to build generative AI applications on AWS

Generative AI applications driven by foundation models (FMs) are delivering significant business value to organizations in customer experience, productivity, process optimization, and innovation. However, adopting these FMs involves addressing some key challenges, including output quality, data privacy, security, integration with organizational data, cost, and the skills required to deliver.
In this post, we explore different approaches you can take when building applications that use generative AI. With the rapid advancement of FMs, it’s an exciting time to harness their power, but also crucial to understand how to properly use them to achieve business outcomes. We provide an overview of key generative AI approaches, including prompt engineering, Retrieval Augmented Generation (RAG), and model customization. When applying these approaches, we discuss key considerations around potential hallucination, integration with enterprise data, output quality, and cost. By the end, you will have solid guidelines and a helpful flow chart for determining the best method to develop your own FM-powered applications, grounded in real-life examples. Whether creating a chatbot or summarization tool, you can shape powerful FMs to suit your needs.
Generative AI with AWS
The emergence of FMs is creating both opportunities and challenges for organizations looking to use these technologies. A key challenge is ensuring high-quality, coherent outputs that align with business needs, rather than hallucinations or false information. Organizations must also carefully manage data privacy and security risks that arise from processing proprietary data with FMs. The skills needed to properly integrate, customize, and validate FMs within existing systems and data are in short supply. Building large language models (LLMs) from scratch or customizing pre-trained models requires substantial compute resources, expert data scientists, and months of engineering work. The computational cost alone can easily run into the millions of dollars to train models with hundreds of billions of parameters on massive datasets using thousands of GPUs or TPUs. Beyond hardware, data cleaning and processing, model architecture design, hyperparameter tuning, and training pipeline development demand specialized machine learning (ML) skills. The end-to-end process is complex, time-consuming, and prohibitively expensive for most organizations without the requisite infrastructure and talent investment. Organizations that fail to adequately address these risks can face negative impacts to their brand reputation, customer trust, operations, and revenues.
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon via a single API. With the Amazon Bedrock serverless experience, you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using AWS tools without having to manage any infrastructure. Amazon Bedrock is HIPAA eligible, and you can use Amazon Bedrock in compliance with the GDPR. With Amazon Bedrock, your content is not used to improve the base models and is not shared with third-party model providers. Your data in Amazon Bedrock is always encrypted in transit and at rest, and you can optionally encrypt resources using your own keys. You can use AWS PrivateLink with Amazon Bedrock to establish private connectivity between your FMs and your VPC without exposing your traffic to the internet. With Knowledge Bases for Amazon Bedrock, you can give FMs and agents contextual information from your company’s private data sources for RAG to deliver more relevant, accurate, and customized responses. You can privately customize FMs with your own data through a visual interface without writing any code. As a fully managed service, Amazon Bedrock offers a straightforward developer experience to work with a broad range of high-performing FMs.
Launched in 2017, Amazon SageMaker is a fully managed service that makes it straightforward to build, train, and deploy ML models. More and more customers are building their own FMs using SageMaker, including Stability AI, AI21 Labs, Hugging Face, Perplexity AI, Hippocratic AI, LG AI Research, and Technology Innovation Institute. To help you get started quickly, Amazon SageMaker JumpStart offers an ML hub where you can explore, train, and deploy a wide selection of public FMs, such as Mistral models, LightOn models, RedPajama, Mosaic MPT-7B, FLAN-T5/UL2, GPT-J-6B/Neox-20B, and Bloom/BloomZ, using purpose-built SageMaker tools such as experiments and pipelines.
Common generative AI approaches
In this section, we discuss common approaches to implement effective generative AI solutions. We explore popular prompt engineering techniques that allow you to achieve more complex and interesting tasks with FMs. We also discuss how techniques like RAG and model customization can further enhance FMs’ capabilities and overcome challenges like limited data and computational constraints. With the right technique, you can build powerful and impactful generative AI solutions.
Prompt engineering
Prompt engineering is the practice of carefully designing prompts to efficiently tap into the capabilities of FMs. It involves the use of prompts, which are short pieces of text that guide the model to generate more accurate and relevant responses. With prompt engineering, you can improve the performance of FMs and make them more effective for a variety of applications. In this section, we explore techniques like zero-shot and few-shot prompting, which rapidly adapt FMs to new tasks with just a few examples, and chain-of-thought prompting, which breaks down complex reasoning into intermediate steps. These methods demonstrate how prompt engineering can make FMs more effective on complex tasks without requiring model retraining.
Zero-shot prompting
The zero-shot prompt technique asks an FM to generate an answer without any explicit examples of the desired behavior, relying solely on its pre-training. The following screenshot shows an example of a zero-shot prompt with the Anthropic Claude 2.1 model on the Amazon Bedrock console.

In these instructions, we didn’t provide any examples. However, the model can understand the task and generate appropriate output. Zero-shot prompts are the most straightforward prompt technique to begin with when evaluating an FM for your use case. However, although FMs are remarkable with zero-shot prompts, they may not always yield accurate or desired results for more complex tasks. When zero-shot prompts fall short, we recommend providing a few examples in the prompt (few-shot prompts).
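To make this concrete, the following is a minimal sketch of sending a zero-shot prompt to Anthropic Claude 2.1 through the Amazon Bedrock runtime API with boto3. The model ID, prompt format, and inference parameters are assumptions based on the Claude text-completion interface available at the time of writing; adjust them to the model you are evaluating.

```python
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Zero-shot: the prompt states the task only, with no examples of the desired output.
prompt = "\n\nHuman: Summarize the following text in one sentence:\n<text>...</text>\n\nAssistant:"

response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-v2:1",  # Claude 2.1 on Amazon Bedrock (assumed model ID)
    body=json.dumps({
        "prompt": prompt,
        "max_tokens_to_sample": 300,
        "temperature": 0.2,
    }),
)
print(json.loads(response["body"].read())["completion"])
```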
Few-shot prompting
The few-shot prompt technique allows FMs to do in-context learning from the examples in the prompts and perform the task more accurately. With just a few examples, you can rapidly adapt FMs to new tasks without large training sets and guide them towards the desired behavior. The following is an example of a few-shot prompt with the Cohere Command model on the Amazon Bedrock console.

In the preceding example, the FM was able to identify entities from the input text (reviews) and extract the associated sentiments. Few-shot prompts are an effective way to tackle complex tasks by providing a few examples of input-output pairs. For straightforward tasks, you can give one example (1-shot), whereas for more difficult tasks, you should provide three (3-shot) to five (5-shot) examples. Min et al. (2022) published findings about in-context learning that can enhance the performance of the few-shot prompting technique. You can use few-shot prompting for a variety of tasks, such as sentiment analysis, entity recognition, question answering, translation, and code generation.
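As an illustration, the following sketch assembles a 3-shot prompt for review sentiment extraction. The reviews, labels, and output format are hypothetical, and the resulting string can be sent to an FM with the same invoke_model call shown in the zero-shot example.

```python
# Hypothetical labeled examples that demonstrate the desired input-output behavior.
examples = [
    ("The battery dies within two hours.", "battery | negative"),
    ("Delivery was fast and the packaging was great.", "delivery, packaging | positive"),
    ("Screen is sharp but the speakers are tinny.", "screen | positive; speakers | negative"),
]

task = "Extract the entities mentioned in the review and their sentiment."
few_shot_prompt = task + "\n\n" + "\n\n".join(
    f"Review: {review}\nEntities and sentiment: {label}" for review, label in examples
)
# Append the new input the model should complete.
few_shot_prompt += "\n\nReview: The camera is excellent but the app keeps crashing.\nEntities and sentiment:"
print(few_shot_prompt)
```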
Chain-of-thought prompting
Despite its potential, few-shot prompting has limitations, especially when dealing with complex reasoning tasks (such as arithmetic or logical tasks). These tasks require breaking the problem down into steps and then solving it. Wei et al. (2022) introduced the chain-of-thought (CoT) prompting technique to solve complex reasoning problems through intermediate reasoning steps. You can combine CoT with few-shot prompting to improve results on complex tasks. The following is an example of a reasoning task using few-shot CoT prompting with the Anthropic Claude 2 model on the Amazon Bedrock console.

Kojima et al. (2022) introduced the idea of zero-shot CoT, which uses FMs’ untapped zero-shot capabilities. Their research indicates that zero-shot CoT, using the same single-prompt template, significantly outperforms zero-shot FM performance on diverse benchmark reasoning tasks. You can use zero-shot CoT prompting for simple reasoning tasks by adding “Let’s think step by step” to the original prompt.
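The following minimal sketch contrasts a plain zero-shot prompt with its zero-shot CoT variant by appending the trigger phrase from Kojima et al. (2022); the word problem itself is a made-up example.

```python
question = (
    "A warehouse has 128 boxes. Each truck carries 24 boxes. "
    "How many trucks are needed to move all the boxes?"
)

# Zero-shot prompt: ask for the answer directly.
zero_shot = question

# Zero-shot chain-of-thought: append the trigger phrase so the model
# writes out intermediate reasoning steps before the final answer.
zero_shot_cot = question + "\nLet's think step by step."
```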
ReAct
CoT prompting can enhance FMs’ reasoning capabilities, but it still depends on the model’s internal knowledge and doesn’t consider any external knowledge base or environment to gather more information, which can lead to issues like hallucination. The ReAct (reasoning and acting) approach addresses this gap by extending CoT and allowing dynamic reasoning using an external environment (such as Wikipedia).
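The following is a simplified sketch of a ReAct loop in which the model alternates Thought, Action, and Observation steps and the observations come from an external tool. The llm callable, the search_wikipedia tool, and the action-parsing logic are hypothetical placeholders; in practice you would typically use an agent framework such as LangChain or Agents for Amazon Bedrock.

```python
def search_wikipedia(query: str) -> str:
    """Hypothetical external tool that returns a short text snippet for a query."""
    ...

def react_loop(question: str, llm, max_steps: int = 5) -> str:
    prompt = (
        "Answer the question by alternating Thought, Action, and Observation steps.\n"
        "Actions must look like: Search[<query>] or Finish[<answer>].\n"
        f"Question: {question}\n"
    )
    for _ in range(max_steps):
        step = llm(prompt + "Thought:")           # model proposes reasoning and an action
        prompt += "Thought:" + step + "\n"
        if "Finish[" in step:                      # model decided it has enough information
            return step.split("Finish[")[1].rstrip("]")
        if "Search[" in step:                      # ground the next step with external knowledge
            query = step.split("Search[")[1].split("]")[0]
            prompt += f"Observation: {search_wikipedia(query)}\n"
    return "No answer found within the step budget."
```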
Integration
FMs have the ability to comprehend questions and provide answers using their pre-trained knowledge. However, they lack the capacity to respond to queries requiring access to an organization’s private data or the ability to autonomously carry out tasks. RAG and agents are methods to connect these generative AI-powered applications to enterprise datasets, empowering them to give responses that account for organizational information and enable running actions based on requests.
Retrieval Augmented Generation
Retrieval Augmented Generation (RAG) allows you to customize a model’s responses when you want the model to consider new knowledge or up-to-date information. When your data changes frequently, like inventory or pricing, it’s not practical to fine-tune and update the model while it’s serving user queries. To equip the FM with up-to-date proprietary information, organizations turn to RAG, a technique that involves fetching data from company data sources and enriching the prompt with that data to deliver more relevant and accurate responses.
There are several use cases where RAG can help improve FM performance:

Question answering – RAG models help question answering applications locate and integrate information from documents or knowledge sources to generate high-quality answers. For example, a question answering application could retrieve passages about a topic before generating a summarizing answer.
Chatbots and conversational agents – RAG allows chatbots to access relevant information from large external knowledge sources. This makes the chatbot’s responses more knowledgeable and natural.
Writing assistance – RAG can suggest relevant content, facts, and talking points to help you write documents such as articles, reports, and emails more efficiently. The retrieved information provides useful context and ideas.
Summarization – RAG can find relevant source documents, passages, or facts to augment a summarization model’s understanding of a topic, allowing it to generate better summaries.
Creative writing and storytelling – RAG can pull plot ideas, characters, settings, and creative elements from existing stories to inspire AI story generation models. This makes the output more interesting and grounded.
Translation – RAG can find examples of how certain phrases are translated between languages. This provides context to the translation model, improving translation of ambiguous phrases.
Personalization – In chatbots and recommendation applications, RAG can pull personal context like past conversations, profile information, and preferences to make responses more personalized and relevant.

There are several advantages in using a RAG framework:

Reduced hallucinations – Retrieving relevant information helps ground the generated text in facts and real-world knowledge, rather than hallucinating text. This promotes more accurate, factual, and trustworthy responses.
Coverage – Retrieval allows an FM to cover a broader range of topics and scenarios beyond its training data by pulling in external information. This helps address limited coverage issues.
Efficiency – Retrieval lets the model focus its generation on the most relevant information, rather than generating everything from scratch. This improves efficiency and allows larger contexts to be used.
Safety – Retrieving the information from required and permitted data sources can improve governance and control over harmful and inaccurate content generation. This supports safer adoption.
Scalability – Indexing and retrieving from large corpora allows the approach to scale better compared to using the full corpus during generation. This enables you to adopt FMs in more resource-constrained environments.

RAG produces high-quality results because it augments the prompt with use case-specific context pulled directly from vectorized data stores. Compared to prompt engineering alone, it produces substantially better results with a much lower chance of hallucination. You can build RAG-powered applications on your enterprise data using Amazon Kendra. RAG has higher complexity than prompt engineering because you need coding and architecture skills to implement the solution. However, Knowledge Bases for Amazon Bedrock provides a fully managed RAG experience and the most straightforward way to get started with RAG in Amazon Bedrock. Knowledge Bases for Amazon Bedrock automates the end-to-end RAG workflow, including ingestion, retrieval, and prompt augmentation, eliminating the need for you to write custom code to integrate data sources and manage queries. Session context management is built in, so your application can support multi-turn conversations. Knowledge base responses come with source citations to improve transparency and minimize hallucinations. The most straightforward way to build a generative AI-powered assistant is to use Amazon Q, which has a built-in RAG system.
RAG has the highest degree of flexibility when it comes to changes in the architecture. You can change the embedding model, vector store, and FM independently with minimal-to-moderate impact on other components. To learn more about the RAG approach with Amazon OpenSearch Service and Amazon Bedrock, refer to Build scalable and serverless RAG workflows with a vector engine for Amazon OpenSearch Serverless and Amazon Bedrock Claude models. To learn about how to implement RAG with Amazon Kendra, refer to Harnessing the power of enterprise data with generative AI: Insights from Amazon Kendra, LangChain, and large language models.
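For example, the following sketch queries a knowledge base with the fully managed RetrieveAndGenerate API, which performs retrieval, prompt augmentation, and citation handling on your behalf. The knowledge base ID, model ARN, and Region are placeholders, and the request shape reflects the bedrock-agent-runtime API at the time of writing, so verify it against the current API reference.

```python
import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = bedrock_agent_runtime.retrieve_and_generate(
    input={"text": "What is our parental leave policy?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "<YOUR_KNOWLEDGE_BASE_ID>",  # placeholder
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2:1",
        },
    },
)

print(response["output"]["text"])               # generated answer grounded in retrieved passages
for citation in response.get("citations", []):  # source attributions returned by the service
    print(citation)
```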
Agents
FMs can understand and respond to queries based on their pre-trained knowledge. However, they are unable to complete any real-world tasks, like booking a flight or processing a purchase order, on their own. This is because such tasks require organization-specific data and workflows that typically need custom programming. Frameworks like LangChain and certain FMs such as Claude models provide function-calling capabilities to interact with APIs and tools. However, Agents for Amazon Bedrock, a new and fully managed AI capability from AWS, aims to make it more straightforward for developers to build applications using next-generation FMs. With just a few clicks, it can automatically break down tasks and generate the required orchestration logic, without needing manual coding. Agents can securely connect to company databases via APIs, ingest and structure the data for machine consumption, and augment it with contextual details to produce more accurate responses and fulfill requests. Because it handles integration and infrastructure, Agents for Amazon Bedrock allows you to fully harness generative AI for business use cases. Developers can now focus on their core applications rather than routine plumbing. The automated data processing and API calling also enable FMs to deliver updated, tailored answers and perform actual tasks using proprietary knowledge.
Model customization
Foundation models are extremely capable and enable some great applications, but what will help drive your business is generative AI that knows what’s important to your customers, your products, and your company. And that’s only possible when you supercharge models with your data. Data is the key to moving from generic applications to customized generative AI applications that create real value for your customers and your business.
In this section, we discuss different techniques and benefits of customizing your FMs. We cover how model customization involves further training and changing the weights of the model to enhance its performance.
Fine-tuning
Fine-tuning is the process of taking a pre-trained FM, such as Llama 2, and further training it on a downstream task with a dataset specific to that task. The pre-trained model provides general linguistic knowledge, and fine-tuning allows it to specialize and improve performance on a particular task like text classification, question answering, or text generation. With fine-tuning, you provide labeled datasets—which are annotated with additional context—to train the model on specific tasks. You can then adapt the model parameters for the specific task based on your business context.
You can implement fine-tuning on FMs with Amazon SageMaker JumpStart and Amazon Bedrock. For more details, refer to Deploy and fine-tune foundation models in Amazon SageMaker JumpStart with two lines of code and Customize models in Amazon Bedrock with your own data using fine-tuning and continued pre-training.
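As a sketch of what a customization job looks like programmatically, the following call starts a fine-tuning job on Amazon Bedrock with boto3. The job name, IAM role, S3 URIs, base model identifier, and hyperparameter keys are placeholders that depend on the base model you choose, and the same API can be pointed at continued pre-training by changing the customization type.

```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

bedrock.create_model_customization_job(
    jobName="my-fine-tuning-job",                       # placeholder names
    customModelName="my-custom-model",
    roleArn="arn:aws:iam::<ACCOUNT_ID>:role/<BedrockCustomizationRole>",
    baseModelIdentifier="<base-model-id>",              # a Bedrock model that supports fine-tuning
    customizationType="FINE_TUNING",                    # or "CONTINUED_PRE_TRAINING"
    trainingDataConfig={"s3Uri": "s3://<bucket>/train.jsonl"},
    outputDataConfig={"s3Uri": "s3://<bucket>/output/"},
    hyperParameters={"epochCount": "2"},                # supported keys vary by base model
)
```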
Continued pre-training
Continued pre-training in Amazon Bedrock enables you to further train a previously trained model on additional data similar to its original training data. It enables the model to gain more general linguistic knowledge rather than focus on a single application. With continued pre-training, you can use your unlabeled datasets, or raw data, to improve the accuracy of the foundation model for your domain by adjusting the model parameters. For example, a healthcare company can continue to pre-train its model using medical journals, articles, and research papers to make it more knowledgeable about industry terminology. For more details, refer to Amazon Bedrock Developer Experience.
Benefits of model customization
Model customization has several advantages and can help organizations with the following:

Domain-specific adaptation – You can use a general-purpose FM, and then further train it on data from a specific domain (such as biomedical, legal, or financial). This adapts the model to that domain’s vocabulary, style, and so on.
Task-specific fine-tuning – You can take a pre-trained FM and fine-tune it on data for a specific task (such as sentiment analysis or question answering). This specializes the model for that particular task.
Personalization – You can customize an FM on an individual’s data (emails, texts, documents they’ve written) to adapt the model to their unique style. This can enable more personalized applications.
Low-resource language tuning – You can retrain only the top layers of a multilingual FM on a low-resource language to better adapt it to that language.
Fixing flaws – If certain unintended behaviors are discovered in a model, customizing on appropriate data can help update the model to reduce those flaws.

Model customization helps overcome the following FM adoption challenges:

Adaptation to new domains and tasks – FMs pre-trained on general text corpora often need to be fine-tuned on task-specific data to work well for downstream applications. Fine-tuning adapts the model to new domains or tasks it wasn’t originally trained on.
Overcoming bias – FMs may exhibit biases from their original training data. Customizing a model on new data can reduce unwanted biases in the model’s outputs.
Improving computational efficiency – Pre-trained FMs are often very large and computationally expensive. Model customization can allow downsizing the model by pruning unimportant parameters, making deployment more feasible.
Dealing with limited target data – In some cases, there is limited real-world data available for the target task. Model customization uses the pre-trained weights learned on larger datasets to overcome this data scarcity.
Improving task performance – Fine-tuning almost always improves performance on target tasks compared to using the original pre-trained weights. This optimization of the model for its intended use allows you to deploy FMs successfully in real applications.

Model customization has higher complexity than prompt engineering and RAG because the model’s weights and parameters are being changed via tuning scripts, which requires data science and ML expertise. However, Amazon Bedrock makes it straightforward by providing you a managed experience to customize models with fine-tuning or continued pre-training. Model customization provides highly accurate results, with output quality comparable to RAG. Because you’re updating model weights on domain-specific data, the model produces more contextual responses. Compared to RAG, the quality might be marginally better depending on the use case. Therefore, it’s important to conduct a trade-off analysis between the two techniques. You can potentially implement RAG with a customized model.
Retraining or training from scratch
Building your own foundation AI model rather than solely using pre-trained public models allows for greater control, improved performance, and customization to your organization’s specific use cases and data. Investing in creating a tailored FM can provide better adaptability, upgrades, and control over capabilities. Distributed training enables the scalability needed to train very large FMs on massive datasets across many machines. This parallelization makes models with hundreds of billions of parameters trained on trillions of tokens feasible. Larger models have greater capacity to learn and generalize.
Training from scratch can produce high-quality results: because the model trains on use case-specific data from the start, hallucinations are rare and output accuracy can be among the highest. However, if your dataset is constantly evolving, you can still run into hallucination issues. Training from scratch has the highest implementation complexity and cost. It requires the most effort because it involves collecting a vast amount of data, curating and processing it, and training a fairly large FM, which requires deep data science and ML expertise. This approach is time-consuming (it can typically take weeks to months).
You should consider training an FM from scratch when none of the other approaches work for you, and you have the ability to build an FM with a large amount of well-curated tokenized data, a sophisticated budget, and a team of highly skilled ML experts. AWS provides the most advanced cloud infrastructure to train and run LLMs and other FMs powered by GPUs and the purpose-built ML training chip, AWS Trainium, and ML inference accelerator, AWS Inferentia. For more details about training LLMs on SageMaker, refer to Training large language models on Amazon SageMaker: Best practices and SageMaker HyperPod.
Selecting the right approach for developing generative AI applications
When developing generative AI applications, organizations must carefully consider several key factors before selecting the most suitable model to meet their needs. A variety of aspects should be considered, such as cost (to ensure the selected model aligns with budget constraints), quality (to deliver coherent and factually accurate output), seamless integration with current enterprise platforms and workflows, and reducing hallucinations or generating false information. With many options available, taking the time to thoroughly evaluate these aspects will help organizations choose the generative AI model that best serves their specific requirements and priorities. You should examine the following factors closely:

Integration with enterprise systems – For FMs to be truly useful in an enterprise context, they need to integrate and interoperate with existing business systems and workflows. This could involve accessing data from databases, enterprise resource planning (ERP), and customer relationship management (CRM), as well as triggering actions and workflows. Without proper integration, the FM risks being an isolated tool. Enterprise systems like ERP contain key business data (customers, products, orders). The FM needs to be connected to these systems to use enterprise data rather than work off its own knowledge graph, which may be inaccurate or outdated. This ensures accuracy and a single source of truth.
Hallucinations – Hallucinations are when an AI application generates false information that appears factual. These need to be carefully addressed before FMs are widely adopted. For example, a medical chatbot designed to provide diagnosis suggestions could hallucinate details about a patient’s symptoms or medical history, leading it to propose an inaccurate diagnosis. Preventing harmful hallucinations like these through technical solutions and dataset curation will be critical to making sure these FMs can be trusted for sensitive applications like healthcare, finance, and legal. Thorough testing and transparency about an FM’s training data and remaining flaws will need to accompany deployments.
Skills and resources – The successful adoption of FMs will depend heavily on having the proper skills and resources to use the technology effectively. Organizations need employees with strong technical skills to properly implement, customize, and maintain FMs to suit their specific needs. They also require ample computational resources like advanced hardware and cloud computing capabilities to run complex FMs. For example, a marketing team wanting to use an FM to generate advertising copy and social media posts needs skilled engineers to integrate the system, creatives to provide prompts and assess output quality, and sufficient cloud computing power to deploy the model cost-effectively. Investing in developing expertise and technical infrastructure will enable organizations to gain real business value from applying FMs.
Output quality – The quality of the output produced by FMs will be critical in determining their adoption and use, particularly in consumer-facing applications like chatbots. If chatbots powered by FMs provide responses that are inaccurate, nonsensical, or inappropriate, users will quickly become frustrated and stop engaging with them. Therefore, companies looking to deploy chatbots need to rigorously test the FMs that drive them to ensure they consistently generate high-quality responses that are helpful, relevant, and appropriate to provide a good user experience. Output quality encompasses factors like relevance, accuracy, coherence, and appropriateness, which all contribute to overall user satisfaction and will make or break the adoption of FMs like those used for chatbots.
Cost – The high computational power required to train and run large AI models like FMs can incur substantial costs. Many organizations may lack the financial resources or cloud infrastructure necessary to use such massive models. Additionally, integrating and customizing FMs for specific use cases adds engineering costs. The considerable expenses required to use FMs could deter widespread adoption, especially among smaller companies and startups with limited budgets. Evaluating potential return on investment and weighing the costs vs. benefits of FMs is critical for organizations considering their application and utility. Cost-efficiency will likely be a deciding factor in determining if and how these powerful but resource-intensive models can be feasibly deployed.

Design decision
As we covered in this post, many different AI techniques are currently available, such as prompt engineering, RAG, and model customization. This wide range of choices makes it challenging for companies to determine the optimal approach for their particular use case. Selecting the right set of techniques depends on various factors, including access to external data sources, real-time data feeds, and the domain specificity of the intended application. To aid in identifying the most suitable technique based on the use case and considerations involved, we walk through the following flow chart, which outlines recommendations for matching specific needs and constraints with appropriate methods.

To gain a clear understanding, let’s go through the design decision flow chart using a few illustrative examples:

Enterprise search – An employee is looking to request leave from their organization. To provide a response aligned with the organization’s HR policies, the FM needs more context beyond its own knowledge and capabilities. Specifically, the FM requires access to external data sources that provide relevant HR guidelines and policies. Given this scenario of an employee request that requires referring to external domain-specific data, the recommended approach according to the flow chart is prompt engineering with RAG. RAG will help in providing the relevant data from the external data sources as context to the FM.
Enterprise search with organization-specific output – Suppose you have engineering drawings and you want to extract the bill of materials from them, formatting the output according to industry standards. To do this, you can use a technique that combines prompt engineering with RAG and a fine-tuned language model. The fine-tuned model would be trained to produce bills of materials when given engineering drawings as input. RAG helps find the most relevant engineering drawings from the organization’s data sources to feed in the context for the FM. Overall, this approach extracts bills of materials from engineering drawings and structures the output appropriately for the engineering domain.
General search – Imagine you want to find the identity of the 30th President of the United States. You could use prompt engineering to get the answer from an FM. Because these models are trained on many data sources, they can often provide accurate responses to factual questions like this.
General search with recent events – If you want to determine the current stock price for Amazon, you can use the approach of prompt engineering with an agent. The agent will provide the FM with the most recent stock price so it can generate the factual response.

Conclusion
Generative AI offers tremendous potential for organizations to drive innovation and boost productivity across a variety of applications. However, successfully adopting these emerging AI technologies requires addressing key considerations around integration, output quality, skills, costs, and potential risks like harmful hallucinations or security vulnerabilities. Organizations need to take a systematic approach to evaluating their use case requirements and constraints to determine the most appropriate techniques for adapting and applying FMs. As highlighted in this post, prompt engineering, RAG, and efficient model customization methods each have their own strengths and weaknesses that suit different scenarios. By mapping business needs to AI capabilities using a structured framework, organizations can overcome hurdles to implementation and start realizing benefits from FMs while also building guardrails to manage risks. With thoughtful planning grounded in real-world examples, businesses in every industry stand to unlock immense value from this new wave of generative AI. Learn about generative AI on AWS.

About the Authors
Jay Rao is a Principal Solutions Architect at AWS. He focuses on AI/ML technologies with a keen interest in Generative AI and Computer Vision. At AWS, he enjoys providing technical and strategic guidance to customers and helping them design and implement solutions that drive business outcomes. He is a book author (Computer Vision on AWS), regularly publishes blogs and code samples, and has delivered talks at tech conferences such as AWS re:Invent.
Babu Kariyaden Parambath is a Senior AI/ML Specialist at AWS. At AWS, he enjoys working with customers in helping them identify the right business use case with business value and solve it using AWS AI/ML solutions and services. Prior to joining AWS, Babu was an AI evangelist with 20 years of diverse industry experience delivering AI driven business value for customers.

Revolutionizing Fibrosis Treatment: AI-Driven Discovery of TNIK Inhibitor INS018_055 Unveils New Horizons in Therapeutics

Idiopathic Pulmonary Fibrosis (IPF) and renal fibrosis present significant challenges in drug development due to their complex pathogenesis and lack of effective treatments. Despite extensive research, potential drug targets, such as TGF-β signaling pathways, have not successfully translated into viable therapies for clinical use. IPF, characterized by fibroblast proliferation and extracellular matrix deposition, remains particularly lethal, with limited treatment options like nintedanib and pirfenidone. Renal fibrosis, associated with chronic kidney disease, also lacks specific inhibitors despite its increasing global prevalence. Addressing these unmet clinical needs requires innovative approaches to identify and develop effective anti-fibrotic medicines.

Researchers from multiple institutions, including Insilico Medicine, have identified TNIK as a promising anti-fibrotic target using AI. They have developed INS018_055, a TNIK inhibitor showing favorable drug properties and anti-fibrotic effects across various organs in vivo via different administration routes. The compound also exhibits anti-inflammatory effects, which have been validated in multiple animal studies. Phase I clinical trials confirmed its safety, tolerability, and pharmacokinetics in healthy individuals. This AI-driven drug discovery process, spanning from target identification to clinical validation, took approximately 18 months, demonstrating the efficacy of their approach in addressing unmet medical needs in fibrosis treatment.

The study explores the use of overexpression, knockouts, and mutations to determine the relevance of pathways and the interactome in a heterogeneous graph walk. It also utilizes matrix factorization and machine learning models to optimize compounds. The study involves human tissue and clinical trials, with all tissues obtained with informed consent and in adherence to HIPAA regulations. Written consent was obtained from the humans participating in the clinical trials, and the study follows the Declaration of Helsinki. The study mentions positive regulation of the canonical Wnt signaling pathway, NF-kappaB transcription factor activity, and the cellular response to transforming growth factor.

The study utilized predictive AI to identify TNIK as an anti-fibrotic target. An AI-driven drug discovery pipeline, incorporating pathway analysis and multiomics data, generated INS018_055, a TNIK inhibitor. Its anti-fibrotic effects were assessed through various administration routes in vivo and validated for safety in clinical trials with healthy participants. The research involved analyzing multiomics datasets, biological networks, and scientific literature to prioritize potential targets. Experimental conditions, including temperature, humidity, and gas levels, were rigorously controlled, with real-time monitoring during experiments to ensure accuracy.

Utilizing PandaOmics, an AI-driven platform, anti-fibrotic targets were discovered by integrating multiomics datasets, biological network analysis, and text data. TNIK emerged as the top candidate, unrecognized in IPF therapy, with potential implications for fibrosis and aging-related conditions. Transparency analysis revealed its involvement in crucial fibrosis-related processes and tight connection with IPF-associated genes. Single-cell expression data confirmed elevated TNIK expression in fibrotic tissue, particularly in key cell types. Simulation studies demonstrated that TNIK inhibition primarily activates Hippo signaling, suggesting its significance in regulating IPF pathogenesis. These findings underscore TNIK’s promise as a therapeutic target for fibrosis, supported by diverse AI-driven analyses.

In conclusion, researchers leveraging generative AI identified TNIK as a promising anti-fibrotic target, addressing the challenge of limited understanding in fibrotic reprogramming. Small-molecule inhibitor INS018_055 effectively mitigated fibrosis in lung, kidney, and skin models in vitro and in vivo, notably improving lung function in murine lung fibrosis. Preclinical validation and phase I trials demonstrated its safety and tolerability, with ongoing phase II trials for IPF. Integrating AI-driven target discovery and drug design approach offers a swift path to potent anti-fibrotic therapies with potential applications in COVID-19-related complications and chronic kidney disease.


Cohere AI Unleashes Command-R: The Ultimate 35 Billion-Parameter Revolution in AI Language Processing, Setting New Standards for Multilingual Generation and Reasoning Capabilities!

The demand for advanced, scalable, and versatile tools is ever-growing in software development. Developers constantly seek efficient ways to handle complex tasks such as reasoning, summarization, and multilingual question answering. Identifying and meeting these demands requires innovative solutions adapting to various use cases and language nuances.

The challenges associated with developing such tools are significant. They include handling vast amounts of data, ensuring model performance across different languages, and providing a flexible, user-friendly interface for diverse applications. This broad problem set demands a solution that is scalable, versatile, and accessible to a wide range of users. Current approaches to address these challenges have led to the development of large language models. However, these models often lack sufficient language support, scalability, and the ability to integrate seamlessly with other tools or services. Moreover, the need for models that can perform well across various tasks, including those requiring reasoning and summarization, has been increasingly recognized.

The research community has introduced C4AI Command-R, a groundbreaking tool designed to tackle these challenges head-on. Developed by Cohere and Cohere For AI, Command-R is a 35-billion parameter generative model that sets new standards for performance and flexibility.  C4AI Command-R stands out for its unique features. It offers open weights and optimization for multiple use cases, including reasoning, summarization, and question-answering. Notably, it supports generation in 10 languages and boasts impressive RAG (Retrieval-Augmented Generation) capabilities. Its architecture enables efficient and accurate processing of input and generation of responses, thanks to its quantized versions through bitsandbytes, offering 8-bit and 4-bit precision.

Performance assessments of C4AI Command-R demonstrate its exceptional results across its intended use cases. Its ability to support a context length of 128K and its specialized training for conversational tool use underscore its innovative approach to model design and functionality. 

C4AI Command-R represents a significant leap forward in the development of generative models. Its comprehensive approach to addressing common challenges in language model development—ranging from multilingual support to advanced reasoning and summarization capabilities—sets a new benchmark for what’s possible in this space. The dedication and innovation of the development team are evident in the model’s design and performance, indicating a promising future for similar endeavors.


Unveiling the Hidden Complexities of Cosine Similarity in High-Dimensional Data: A Deep Dive into Linear Models and Beyond

In data science and artificial intelligence, embedding entities into vector spaces is a pivotal technique, enabling the numerical representation of objects like words, users, and items. This method facilitates the quantification of similarities among entities, where vectors closer in space are considered more similar. Cosine similarity, which measures the cosine of the angle between two vectors, is a favored metric for this purpose. It’s heralded for its ability to capture the semantic or relational proximity between entities within these transformed vector spaces.

Researchers from Netflix Inc. and Cornell University challenge the reliability of cosine similarity as a universal metric. Their investigation unveils that, contrary to common belief, cosine similarity can sometimes produce arbitrary and even misleading results. This revelation prompts a reevaluation of its application, especially in contexts where embeddings are derived from models subjected to regularization, a mathematical technique used to simplify the model to prevent overfitting.

The study delves into the underpinnings of embeddings created from regularized linear models. It uncovers that the semblance derived from cosine similarity can be significantly arbitrary. For example, in certain linear models, the similarities produced are not inherently unique and can be manipulated by the model’s regularization parameters. This indicates a stark discrepancy in what is conventionally understood about the metric’s capacity to reflect the true semantic or relational similarity between entities.

Further exploration into the methodological aspects of the study highlights the substantial impact of different regularization strategies on the cosine similarity outcomes. Regularization, a method employed to enhance the model’s generalization by penalizing complexity, inadvertently shapes the embeddings in ways that can skew the perceived similarities. The researchers’ analytical approach demonstrates how cosine similarities, under the influence of regularization, can become opaque and arbitrary, distorting the perceived relationships between entities.

The simulated data clearly illustrates the potential for cosine similarity to obscure or inaccurately represent the semantic relationships among entities. This underscores the need for caution and a more nuanced approach to employing this metric. These findings are not just interesting but crucial, as they highlight the variabilities in cosine similarity outcomes based on model specifics and regularization techniques, showcasing the metric’s potential to yield divergent results that may not accurately reflect true similarities.

In conclusion, this research is a reminder of the complexities underlying seemingly straightforward metrics like cosine similarity. It underscores the necessity of critically evaluating the methods and assumptions in data science practices, especially those as fundamental as measuring similarity. Key takeaways from this research include:

The reliability of cosine similarity as a measure of semantic or relational proximity is conditional on the embedding model and its regularization strategy.

Arbitrary and opaque results from cosine similarity, influenced by regularization, challenge its universal applicability.

Alternative approaches or modifications to the traditional use of cosine similarity are necessary to ensure more accurate and meaningful similarity assessments.


Gemma is now available in Amazon SageMaker JumpStart 

Today, we’re excited to announce that the Gemma model is now available for customers using Amazon SageMaker JumpStart. Gemma is a family of language models based on Google’s Gemini models, trained on up to 6 trillion tokens of text. The Gemma family consists of two sizes: a 7 billion parameter model and a 2 billion parameter model. Now, you can use Gemma 2B and Gemma 7B pretrained and instruction-tuned models within SageMaker JumpStart. JumpStart is the machine learning (ML) hub of SageMaker that provides access to foundation models in addition to built-in algorithms and end-to-end solution templates to help you quickly get started with ML.
In this post, we walk through how to deploy the Gemma model and fine tune it for your use cases in SageMaker JumpStart. The complete notebook is available on GitHub.
Gemma model
Gemma is a family of lightweight, state-of-the-art models built from the same research and technology used to create the Gemini models. Developed by Google DeepMind and other teams across Google, Gemma is inspired by Gemini. Gemma exhibits strong generalist capabilities in text domains and state-of-the-art understanding and reasoning skills at scale. It achieves better performance compared to other publicly available models of similar or larger scales across different domains, including question answering, commonsense reasoning, mathematics and science, and coding. Google released the Gemma model weights to support developer innovation using Gemma models. Gemma was launched with a new Responsible Generative AI Toolkit that provides guidance and essential tools for creating safer AI applications with Gemma.
Foundation models in SageMaker
JumpStart provides access to a range of models from popular model hubs including Hugging Face, PyTorch Hub, and TensorFlow Hub, which you can use within your ML development workflow in SageMaker. Recent advances in ML have given rise to a new class of models known as foundation models, which typically contain billions of parameters and are adaptable to a wide category of use cases, such as text summarization, generating digital art, and language translation. Because these models are expensive to train, customers want to use existing pre-trained foundation models and fine-tune them as needed, rather than train these models themselves. SageMaker provides a curated list of models that you can choose from on the SageMaker console.
You can now find foundation models from different model providers within JumpStart, enabling you to get started with foundation models quickly. You can find foundation models based on different tasks or model providers, and review model characteristics and usage terms. You can also try these models using a test UI widget. When you want to use a foundation model at scale, you can do so without leaving SageMaker by using pre-built notebooks from model providers. Because the models are hosted and deployed on AWS, your data, whether used for evaluating the model or using it at scale, is never shared with third parties.
Let’s explore how you can use the Gemma model in JumpStart.
Explore the Gemma model in JumpStart
You can access Gemma foundation models through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we go over how to discover the models in SageMaker Studio.
SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all ML development steps, from preparing data to building, training, and deploying your ML models. For more details on how to get started and set up SageMaker Studio, see Amazon SageMaker Studio.
In the AWS Management Console for SageMaker Studio, go to SageMaker JumpStart under Prebuilt and automated solutions. JumpStart contains pre-trained models, notebooks, and prebuilt solutions.

On the SageMaker JumpStart landing page, you can find the Gemma model by searching for Gemma.
You can then select from a variety of Gemma model variants, including Gemma 2B, Gemma 7B, Gemma 2B instruct, and Gemma 7B instruct.

Choose the model card to view details about the model such as the license, data used to train, and how to use the model. You will also find a Deploy button, which takes you to a landing page where you can test inference with an example payload.

Deploy Gemma with SageMaker Python SDK
You can find the code showing the deployment of Gemma on JumpStart and an example of how to use the deployed model in this GitHub notebook.
Start by selecting the SageMaker Model Hub model ID and model version to use when deploying Gemma.

model_id, model_version = "huggingface-llm-gemma-7b-instruct", "*"

Choose a model ID from the following table, which details the default configuration options for the JumpStart deployment. Because of the large vocabulary size of 256 thousand tokens, Gemma 7B can only fit on a single A10G GPU when supporting a 1,000-token context length. For this reason, JumpStart uses a larger default instance for Gemma 7B.

| Model ID | Default inference instance | Tensor parallel degree | Supported context length |
|---|---|---|---|
| huggingface-llm-gemma-2b | ml.g5.xlarge | 1 | 8k |
| huggingface-llm-gemma-2b-instruct | ml.g5.xlarge | 1 | 8k |
| huggingface-llm-gemma-7b | ml.g5.12xlarge | 4 | 8k |
| huggingface-llm-gemma-7b-instruct | ml.g5.12xlarge | 4 | 8k |

You can now deploy the model using SageMaker JumpStart. The following code uses the default instance ml.g5.12xlarge for the inference endpoint. You can deploy the model on other instance types by passing instance_type in the JumpStartModel class. The deployment might take 5-10 minutes.

from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id=model_id, model_version=model_version)
predictor = model.deploy(accept_eula=False)  # manually accept EULA here!

For successful deployment, you must manually change the accept_eula argument in the model’s deploy method to True. This model is deployed using the text-generation-inference (TGI) deep learning container.
Invoke endpoint
You can programmatically retrieve example payloads from the JumpStartModel object. This will help you get started by observing pre-formatted instruction prompts that Gemma can ingest.

example_payloads = model.retrieve_all_examples()

for payload in example_payloads:
    response = predictor.predict(payload.body)
    print("Input:\n", payload.body[payload.prompt_key])
    print("Output:\n", response[0]["generated_text"].strip())
    print("\n===============\n")

Before we look at specific prompts, let’s consider the chat template for Gemma Instruct models.

<bos><start_of_turn>user\n[USER_PROMPT]<end_of_turn>\n<start_of_turn>model

Here, you place your prompt in the [USER_PROMPT] location. There’s no support for a system instruction; instead, you can prepend the desired instruction to the user prompt. Additionally, if you have a multi-turn conversation, then the model prompt can alternate between user and assistant as needed.
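A small helper such as the following, shown as an assumed convenience rather than an official utility, makes the single-turn and multi-turn formatting explicit:

```python
def format_gemma_prompt(turns):
    """Build a Gemma Instruct prompt from a list of (role, text) turns.

    Roles alternate between "user" and "model"; the string ends with an open
    model turn so the endpoint generates the assistant's next reply.
    """
    prompt = "<bos>"
    for role, text in turns:
        prompt += f"<start_of_turn>{role}\n{text}<end_of_turn>\n"
    return prompt + "<start_of_turn>model"

payload = {
    "inputs": format_gemma_prompt([("user", "Write a hello world module in Python")]),
    "parameters": {"max_new_tokens": 256},
}
```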
Now consider a few instruction example prompts. Here, you ask Gemma to write a Hello World program.

payload = {
    "inputs": "<bos><start_of_turn>user\nWrite a hello world module in Python<end_of_turn>\n<start_of_turn>model",
    "parameters": {"max_new_tokens": 256},
}
predictor.predict(payload)

The following is the expected output:

```python
# This is a Python module called hello.py

def hello():
    """Prints a hello message."""
    print("Hello, world!")

if __name__ == "__main__":
    hello()
```

**Usage:**

To use the hello.py module, you can import it and call the hello function like this:

```python
import hello

hello.hello()
```

**Output:**

```
Hello, world!
```

**Explanation:**

* The `hello.py` module defines a function called `hello()`.
* The `if __name__ == "__main__"` block checks if the script is being run directly or imported. If it is run directly, the `hello()` function is called.
* The `print("Hello, world!")` statement prints the hello message to the console.

**Note:**

* You can save the code above in a file named `hello.py` in your current directory.
* To run the script, you can use the following command:
```
python hello.py
```
* The output will be printed in the console:
```
Hello, world!
```

Next, invoke Gemma for the creative task of writing a poem.

payload = {
    "inputs": "<bos><start_of_turn>user\nWrite me a poem about Machine Learning.<end_of_turn>\n<start_of_turn>model",
    "parameters": {"max_new_tokens": 256},
}
predictor.predict(payload)

The following is the output:

**Machine Learning**

In the realm of data, a tale unfolds,
Where algorithms dance and stories are told.
With every click, a model takes flight,
Unveiling patterns, hidden in the night.

Data whispers secrets, a treasure to find,
Machine learning listens, with a curious mind.
It sifts through terabytes, with lightning speed,
Uncovering insights, a hidden creed.

From images to text, it learns to see,
To understand the complexities of reality.
It paints a canvas, with insights deep,
Unveiling patterns, secrets to keep.

In the field of medicine, it takes a stand,
Diagnosing diseases, guiding a hand.
It analyzes data, with unmatched grace,
Bringing hope to those in a troubled place.

But with great power comes great responsibility,
As algorithms can bias, with malicious intent.
It’s a journey of learning, a constant fight,
To ensure fairness, day and night.

So let us harness the power of this tool,
To create a future, where insights bloom.
With wisdom and caution, we must guide,
The path of machine learning, side by side.

This looks pretty good!
Now, let’s look at latency and throughput performance benchmarking for model serving with the default JumpStart deployment configuration. Here, we show how model performance might differ for your typical endpoint workload. In the following tables, you can observe that small-sized queries (256 input words and 256 output tokens) are quite performant under a large number of concurrent users, reaching token throughput on the order of one thousand to two thousand tokens per second. However, as the number of input words approaches Gemma’s maximum supported context length of eight thousand tokens, the endpoint saturates its batching capacity—the number of concurrent requests allowed to be processed simultaneously—due to instance memory-bound constraints.
For more information on how to consider this information and adjust deployment configurations for your specific use case, see Benchmark and optimize endpoint deployment in Amazon SageMaker JumpStart.

Throughput (tokens/s) by number of concurrent users:

| Model | Instance type | Input words | Output tokens | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| gemma-2b-instruct | ml.g5.xlarge | 256 | 256 | 73 | 137 | 262 | 486 | 829 | 1330 | 1849 | 1834 |
| gemma-2b-instruct | ml.g5.xlarge | 2048 | 256 | 69 | 126 | 227 | 373 | 537 | 704 | 764 | . |
| gemma-2b-instruct | ml.g5.xlarge | 7936 | 256 | 60 | 100 | 147 | 195 | 226 | 230 | . | . |
| gemma-7b-instruct | ml.g5.12xlarge | 256 | 256 | 62 | 119 | 227 | 413 | 601 | 811 | 937 | 962 |
| gemma-7b-instruct | ml.g5.12xlarge | 2048 | 256 | 56 | 100 | 172 | 245 | 267 | 273 | . | . |
| gemma-7b-instruct | ml.g5.12xlarge | 7936 | 256 | 44 | 67 | 77 | 77 | 78 | . | . | . |


P50 latency (ms/token) by number of concurrent users:

| Model | Instance type | Input words | Output tokens | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| gemma-2b-instruct | ml.g5.xlarge | 256 | 256 | 13 | 14 | 15 | 16 | 19 | 23 | 33 | 49 |
| gemma-2b-instruct | ml.g5.xlarge | 2048 | 256 | 14 | 15 | 17 | 20 | 28 | 43 | 79 | . |
| gemma-2b-instruct | ml.g5.xlarge | 7936 | 256 | 16 | 19 | 26 | 39 | 68 | 136 | . | . |
| gemma-7b-instruct | ml.g5.12xlarge | 256 | 256 | 16 | 16 | 17 | 19 | 26 | 38 | 57 | 110 |
| gemma-7b-instruct | ml.g5.12xlarge | 2048 | 256 | 17 | 19 | 23 | 32 | 52 | 119 | . | . |
| gemma-7b-instruct | ml.g5.12xlarge | 7936 | 256 | 22 | 29 | 45 | 105 | 197 | . | . | . |


Fine-tune Gemma using SageMaker Python SDK
Next, we show you how to fine-tune the Gemma 7B instruct model on a conversational-formatted dataset using the QLoRA technique. As mentioned previously, due to the large vocabulary size of 256 thousand tokens and the 8 thousand token context length, JumpStart offers the following default configurations for QLoRA fine-tuning.

| Model ID | Default training instance | Maximum input sequence length | Per device training batch size | Gradient accumulation steps |
|---|---|---|---|---|
| huggingface-llm-gemma-2b | ml.g5.2xlarge | 1024 | 1 | 4 |
| huggingface-llm-gemma-2b-instruct | ml.g5.2xlarge | 1024 | 1 | 4 |
| huggingface-llm-gemma-7b | ml.g5.12xlarge | 2048 | 1 | 4 |
| huggingface-llm-gemma-7b-instruct | ml.g5.12xlarge | 2048 | 1 | 4 |

Let’s load and process the dataset in conversational format. The example dataset for this demonstration is OpenAssistant’s TOP-1 Conversation Threads.

from datasets import load_dataset

# Load the dataset
dataset = load_dataset("OpenAssistant/oasst_top1_2023-08-25")

The training data should be formulated in JSON lines (.jsonl) format, where each line is a dictionary representing a set of conversations. One example within the JSON lines file is shown below. For details on how to process the dataset, see the notebook in GitHub.

{"dialog": [
    {"content": "what is the height of the empire state building",
     "role": "user"},
    {"content": "381 meters, or 1,250 feet, is the height of the Empire State Building. If you also account for the antenna, it brings up the total height to 443 meters, or 1,454 feet",
     "role": "assistant"},
    {"content": "Some people need to pilot an aircraft above it and need to know.\nSo what is the answer in feet?",
     "role": "user"},
    {"content": "1454 feet", "role": "assistant"}]
}
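As a rough illustration of the data preparation the notebook performs (the exact conversion logic lives in the GitHub notebook), the following sketch writes a list of such dialog dictionaries to a train.jsonl file and uploads it to Amazon S3. The dialogs list and the bucket name are placeholders you would replace with your processed dataset and your own bucket.

```
import json
import boto3

# Placeholder: a list of {"dialog": [...]} records like the example above,
# produced from the loaded dataset by your own processing code.
dialogs = [
    {"dialog": [
        {"content": "what is the height of the empire state building", "role": "user"},
        {"content": "381 meters, or 1,250 feet.", "role": "assistant"},
    ]}
]

with open("train.jsonl", "w") as f:
    for record in dialogs:
        f.write(json.dumps(record) + "\n")

# Upload to the S3 prefix you will pass to estimator.fit(); bucket and key are placeholders.
boto3.client("s3").upload_file("train.jsonl", "<your-training-bucket>", "gemma/train/train.jsonl")
```

The S3 location holding this file is what you pass to the training channel in estimator.fit in the next step.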
import os
import boto3
from sagemaker.session import Session
from sagemaker.jumpstart.estimator import JumpStartEstimator

model_id = "huggingface-llm-gemma-7b-instruct"

estimator = JumpStartEstimator(
    model_id=model_id, environment={"accept_eula": "false"}  # change to "true" to manually accept the EULA
)

# For the other hyperparameters, see the GitHub notebook attached in this blog.
estimator.set_hyperparameters(chat_dataset="True", peft_type="lora", max_input_length="2048", epoch="3")
estimator.fit({"training": <your_S3_bucket_hosting_the_train_data>})

Under the hood, the JumpStart training scripts use the Hugging Face SFTTrainer with QLoRA and FlashAttention. FlashAttention enables scaling efficiency, leading to faster training and inference. Besides chat-based fine-tuning, JumpStart also supports instruction and domain adaptation fine-tuning with QLoRA. For details, see the notebook in GitHub.
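For readers curious what such a setup looks like outside of JumpStart, the following is a minimal sketch of an SFTTrainer run with QLoRA and FlashAttention. The Hugging Face model ID, hyperparameter values, the text column name, and the trl version-specific arguments are assumptions; the JumpStart training scripts remain the authoritative implementation.

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTTrainer

model_id = "google/gemma-7b-it"  # assumption: Hugging Face Hub ID for Gemma 7B instruct

# QLoRA: load the base weights in 4-bit NF4 and compute in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # FlashAttention for faster training
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

peft_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")

# trl <= 0.8 style arguments; newer releases move some of these into SFTConfig.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset["train"],   # the conversational dataset loaded earlier
    dataset_text_field="text",        # assumption about the processed column name
    peft_config=peft_config,
    max_seq_length=2048,
)
trainer.train()
```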
After the fine-tuning, you can deploy the fine-tuned model to a SageMaker endpoint.

fine_tuned_predictor = estimator.deploy()

As an alternative approach, you can deploy the model fine-tuned on conversation data to a SageMaker endpoint with the Hugging Face Messages API feature enabled. The benefit of the Messages API is that it allows users to pass query examples in raw format, as shown in the following table, and the chat_template within the endpoint converts them into the proper format before sending them to the model. For details, see the deployment section after fine-tuning in the notebook.
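As an illustration (not code taken from the notebook), a raw Messages API request against the fine-tuned endpoint might look like the following; the message content is a placeholder and the parameter names mirror the OpenAI-style messages format that the endpoint's chat_template handles.

```
# Hypothetical request shape for an endpoint deployed with the Messages API enabled.
payload = {
    "messages": [
        {"role": "user", "content": "What is the height of the Empire State Building?"}
    ],
    "max_tokens": 512,
    "temperature": 0.4,
    "top_p": 0.9,
}
response = fine_tuned_predictor.predict(payload)
print(response)
```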
After deploying the fine-tuned model to an endpoint, you can compare output responses from the pre-trained and fine-tuned Gemma models. The examples are collected from test data, which is a hold-out set from the training data. The inference parameters are max_tokens=512, top_p=0.9, temperature=0.4, and top_k=50.
As the fine-tuning results show, the fine-tuned model's answers are more accurate.
To further evaluate your fine-tuned models, you can explore FM evaluation with Amazon SageMaker Clarify to see how your fine-tuned model performs in comparison to the base model.

For each of the following examples, the four blocks appear in this order: the user inputs, the ground truth response, the response from the base model, and the response from the fine-tuned model.

Example 1
User: I want to work on my vocabulary, can you give me some simple ways that I can find new words to use to increase the diversity of my sentences? Assistant: Certainly! Improving your vocabulary is an ongoing process, but here are some simple ways to find new words and increase the diversity of your sentences: 1. Read widely: Reading books, articles, and blogs from different genres and subjects exposes you to new words and phrases. Note down any unfamiliar words you encounter and look up their meanings. 2. Use a thesaurus: Whenever you write or read, use a thesaurus to find synonyms for commonly used words. This not only helps you learn new words but also enhances your understanding of their subtle differences in meaning. 3. Word-a-day calendars or apps: Subscribe to a word-a-day email or download a vocabulary-building app. These tools introduce you to a new word each day, along with its meaning, pronunciation, and usage examples. 4. Join a book club or discussion group: Engaging in conversations and discussions about books, articles, or any other topic exposes you to different perspectives and new vocabulary. Listen attentively to others and try using new words in context. 5. Play word games: Games like Scrabble, Boggle, and crossword puzzles challenge your vocabulary skills and help you learn new words in an enjoyable way. 6. Watch movies, TV shows, or documentaries: Visual media often use diverse language, slang, and jargon. Watching movies, TV shows, or documentaries with subtitles can help you better understand and remember new words. 7. Practice writing: Regularly writing stories, essays, or journal entries encourages you to use new words and phrases. Experiment with different writing styles and genres to expand your vocabulary. 8. Learn a new word’s etymology: Understanding the origin and history of a word can make it easier to remember and use correctly. Researching the etymology also exposes you to related words and concepts. 9. Use flashcards or spaced repetition software: Create flashcards with new words, their meanings, and example sentences. Review them regularly or use spaced repetition software like Anki to optimize your learning. 10. Set personal goals: Set achievable targets, such as learning five new words per week. Keep track of your progress and reward yourself for meeting your goals. By consistently incorporating these strategies into your daily routine, you’ll gradually build a diverse and robust vocabulary. User: Hm, i really like the idea of reading. Can you recomend me a cool science-fiction thriller which i could read?
Assistant: Certainly! Here are a few recommendations for science-fiction thrillers that you might enjoy: 1. “The Three-Body Problem” by Liu Cixin – This award-winning novel explores the first contact between humanity and an alien civilization, and the political and personal ramifications that result. It’s a gripping and thought-provoking read that delves into deep questions about the nature of existence and the universe. 2. “Annihilation” by Jeff Vandermeer – This novel follows a group of scientists as they venture into a mysterious area known as “Area X,” where they encounter strange and unsettling phenomena. As they delve deeper into the area, they begin to uncover dark secrets about the nature of reality and the world around them. 3. “Dark Matter” by Blake Crouch – This novel follows a brilliant physicist named Jason Dessen, who is abducted and transported to a parallel universe. As he tries to find his way back to his own reality, he must confront his own choices and the consequences of his actions. 4. “The Andromeda Strain” by Michael Crichton – This classic novel follows a team of scientists as they try to contain and study a deadly extraterrestrial microbe that has landed on Earth. As they race against time to find a way to neutralize the microbe, they must also contend with political intrigue and personal conflicts. 5. “Ready Player One” by Ernest Cline – This novel is set in a dystopian future where the majority of the population spends their time in a virtual reality world known as the OASIS. When the creator of the OASIS dies and leaves behind a challenge to find a hidden Easter egg, a young protagonist sets out on a thrilling quest that will change his life forever. I hope one of these recommendations sparks your interest!
Assistant: The text does not contain information about your reading preferences or interests, therefore I cannot provide a book recommendation.
Assistant: Sure, here are a few science-fiction thriller recommendations that you might enjoy: – “Dune” by Frank Herbert: This epic sci-fi novel is set on the desert planet of Arrakis, where a young prince named Paul Atreides must face the challenges of a hostile environment and a treacherous political landscape. – “The Martian” by Andy Weir: This gripping sci-fi thriller follows the story of Mark Watney, an astronaut stranded on Mars after a dust storm, as he uses his ingenuity and resourcefulness to survive and eventually make his way back to Earth. – “The Time Machine” by H.G. Wells: This classic sci-fi novel takes a philosophical approach to the concept of time travel, exploring the consequences of altering the past and the future. – “The Blade Runner” by Philip K. Dick: This neo-noir sci-fi novel, set in a futuristic Los Angeles, explores the intersection of human and artificial intelligence, and its influence on society and the nature of reality. – “The Androids” by Isaac Asimov: This sci-fi novel explores the concept of artificial intelligence and its potential consequences, particularly when humans develop advanced androids that challenge traditional human values and beliefs. These are just a few suggestions, and ultimately the best book for you will depend on your personal preferences and what kind of experience you’re looking for in a sci-fi thriller. But I hope these recommendations give you a good starting point for exploring the genre. Please let me know if you have any further questions or if there’s anything I can help you with. user Thank you for recommending these books to me! I’m excited to dive into one of them soon! model You’re welcome! I’m glad I could help.

Example 2
User: Write a Hello World program in VB.NET Assistant: Here you go: “`vbnet Public Module HelloWorld Public Sub Main() System.Console.WriteLine(“Hello, World!”) End Sub End Module “` User: Make it take input from the $ and then say “Hello, $” instead of “Hello, World!”
Assistant: Public Module HelloWorld Public Sub Main() System.Console.WriteLine(“What is your name?”) Dim name As String = System.Console.ReadLine() System.Console.WriteLine(“Hello, ” & name & “!”) End Sub End Module
Assistant: “`vbnet Public Module HelloWorld Public Sub Main() Dim name As String = Console.ReadLine() Console.WriteLine(“Hello, ” & name) End Sub End Module
Assistant: Sure! Here is an example: “`vbnet Public Module HelloWorld Public Sub Main() Dim userName As String = System.Console.ReadLine() System.Console.WriteLine(“Hello, ” & userName & “!”) End Sub End Module “`

Example 3
User: Give me a list of to do items when starting a project to train a large language model that can understanding and respond to human questions Assistant: The following steps are a condensed list of steps to take when starting a project to train a large language model capable of comprehending and responding to human inquiries: 1.  Establish project scope and objectives, target audience. 2.  Gather and preprocess a comprehensive/diverse text dataset. 3.  Select an appropriate model architecture, adjust hyperparameters for accuracy/speed. 4.  Divide dataset into training, validation, testing subsets, implement data augmentation. 5.  Train model, monitor progress on validation set, adjust as needed. 6.  Evaluate model on test set, compare to benchmarks, analyze errors and biases. 7.  Fine-tune for specific tasks or domains through transfer learning or fine-tuning. 8.  Create interface for user input and model response. 9.  Test in controlled environment, gather feedback for improvement. 10.  Continuously monitor and improve performance through retraining and fine-tuning. This should give you a basic idea of what the required steps are. Should I go into more detail on one of the points? User: These tasks need to be broken down further, could you please provide an updated to-do list with a more detailed breakdown of the tasks ahead
Assistant: Here is a more detailed breakdown of each step: 1. Establish project scope and objectives: It is important to clearly define the goals of the project and what the target audience is. This will help you make decisions about what type of data to gather and what model architecture is appropriate. 2. Gather and preprocess data: This is an important step as the quality and diversity of the data will directly impact the performance of the model. Preprocessing may involve cleaning the data, tokenizing text, and converting words to numerical representations. 3. Select model architecture: There are many different types of models to choose from, including recurrent neural networks (RNNs), transformer models, and convolutional neural networks (CNNs). The choice will depend on the specific requirements of the project and the amount of data available. 4. Divide dataset and implement data augmentation: Splitting the data into training, validation, and testing subsets is important for monitoring the performance of the model during training. Data augmentation can be used to artificially increase the size of the dataset and improve the robustness of the model. 5. Train model and monitor progress: This step involves training the model on the training data, using the validation set to monitor progress and adjust hyperparameters as needed. It may be necessary to repeat this process several times until the model is performing well on the validation set. 6. Evaluate model on test set: Once the model is trained, it’s important to evaluate its performance on a separate test set to get an idea of its generalization ability. This can be done by comparing the model’s predictions to the actual outcomes and analyzing the errors. 7. Fine-tune for specific tasks or domains: This step involves further training the model on task-specific data to improve its performance for a particular domain. This can be done through transfer learning, where a pre-trained model is fine-tuned on new data, or through fine-tuning, where the model is trained from scratch on the new data. 8. Create an interface for user input: This involves designing an interface that allows users to input questions or requests, and receive responses from the model. 9. Test in controlled environment: Before deploying the model in a real-world setting, it’s important to test it in a controlled environment and gather feedback from users. This will help identify any errors or biases in the model and provide opportunities for improvement. 10. Continuously monitor and improve performance: Finally, it is important to continuously monitor the performance of the model and make updates and improvements as needed. This may involve retraining the model on new data, fine-tuning hyperparameters, or updating the architecture.
Assistant: ## To-Do List for Training a Large Language Model for Question Answering **1. Project Scope & Objectives:** * Define project goals and desired functionalities. * Identify target audience and their specific needs. * Determine the desired level of accuracy and response quality. **2. Data Acquisition & Preprocessing:** * Identify sources for training data and gather a diverse text dataset. * Clean and preprocess the text data to ensure consistency and quality. * Implement data augmentation techniques to increase the training data volume. **3. Model Selection & Configuration:** * Select an appropriate model architecture based on the task complexity and desired performance. * Fine-tune hyperparameters for model performance optimization. * Choose the most suitable framework and tools for model implementation. **4. Training & Evaluation:** * Divide the training data into training, validation, and testing subsets. * Train the model using the selected framework and hyperparameters. * Monitor training progress on the validation set and adjust model parameters as needed. * Evaluate the model performance on the test set against benchmarks and desired objectives. **5. Fine-Tuning & Interface Development:** * Fine-tune the model for specific tasks or domains through transfer learning. * Develop an interface for user input and model response, considering usability and accessibility. * Integrate the model with the interface and ensure seamless interaction. **6. Testing & Improvement:** * Test the model in a controlled environment with various input scenarios. * Gather feedback from users and experts to identify areas for improvement. * Continuously monitor and improve the model performance through retraining and fine-tuning. **Additional Considerations:** * Set a timeline and budget for the project. * Identify potential challenges and develop contingency plans. * Document the process and findings for future reference and collaboration. **Note:** This list is not exhaustive and may need to be adapted based on the specific project requirements.
Assistant: Sure, here is a more detailed breakdown of the tasks involved in training a large language model to understand and respond to human questions: **1. Establish Project Scope and Objectives** – Define the goals and objectives of the project, including the target audience and the desired functionality of the model. – Identify the specific tasks that the model will need to perform. **2. Gather and Preprocess a Comprehensive/Diverse Text Dataset** – Identify sources of text data that are relevant to the target audience and tasks. – Cleanse and preprocess the text data to ensure consistency and quality. – Create a balanced dataset that reflects the diversity of human language and thought. **3. Select an Appropriate Model Architecture, Adjust Hyperparameters for Accuracy/Speed** – Choose a model architecture that is well-suited for the tasks and data. – Fine-tune the model’s hyperparameters to optimize performance and speed. – Monitor the model’s performance on validation data to ensure that it is generalizable. **4. Divide Dataset into Training, Validation, Testing Subsets, Implement Data Augmentation** – Divide the dataset into training, validation, and testing subsets. – Implement data augmentation techniques to increase the diversity and size of the training data. **5. Train Model, Monitor Progress on Validation Set, Adjust as Needed** – Train the model using the training data. – Monitor the model’s progress on validation data to identify areas for improvement. – Adjust the model’s hyperparameters or architecture as needed to optimize performance. **6. Evaluate Model on Test Set, Compare to Benchmarks, Analyze Errors and Biases** – Evaluate the model’s performance on the test data. – Compare the model’s performance to benchmarks and industry standards. – Analyze errors and biases to identify areas for improvement. **7. Fine-Tune for Specific Tasks or Domains Through Transfer Learning or Fine-Tuning** – Fine-tune the model for specific tasks or domains by transferring knowledge from a pre-trained model. – Use transfer learning to leverage existing knowledge and improve the model’s performance. **8. Create Interface for User Input and Model Response** – Create an interface that allows users to input questions and receive responses from the model. – Design the interface to be user-friendly and intuitive. **9. Test in Controlled Environment, Gather Feedback for Improvement** – Test the model in a controlled environment to ensure that it is functioning as expected. – Gather feedback from users to identify areas for improvement. **10. Continuously Monitor and Improve Performance Through Retraining and Fine-Tuning** – Continuously monitor the model’s performance. – Retrain and fine-tune the model as needed to improve its performance. These are the steps involved in training a large language model to understand and respond to human questions. Please let me know if you have question.

Clean up
After you have tested the endpoints, delete the SageMaker inference endpoints and the model to avoid incurring charges.
Conclusion
In this post, we showed you how to get started with Gemma in SageMaker Studio and deploy the model for inference. We also showed you how you can fine-tune Gemma models on SageMaker JumpStart.
Because foundation models are pre-trained, they can help lower training and infrastructure costs and enable customization for your use case. Visit SageMaker JumpStart in SageMaker Studio now to get started.
This guidance is for informational purposes only. You should still perform your own independent assessment, and take measures to ensure that you comply with your own specific quality control practices and standards, and the local rules, laws, regulations, licenses and terms of use that apply to you, your content, and the third-party model referenced in this guidance. AWS has no control or authority over the third-party model referenced in this guidance, and does not make any representations or warranties that the third-party model is secure, virus-free, operational, or compatible with your production environment and standards. AWS does not make any representations, warranties or guarantees that any information in this guidance will result in a particular outcome or result.

About the authors
Dr. Kyle Ulrich is an Applied Scientist with the Amazon SageMaker built-in algorithms team. His research interests include scalable machine learning algorithms, computer vision, time series, Bayesian non-parametrics, and Gaussian processes. His PhD is from Duke University and he has published papers in NeurIPS, Cell, and Neuron.
Dr. Xin Huang is a Senior Applied Scientist for Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms. He focuses on developing scalable machine learning algorithms. His research interests are in the area of natural language processing, explainable deep learning on tabular data, and robust analysis of non-parametric space-time clustering. He has published many papers in ACL, ICDM, KDD conferences, and Royal Statistical Society: Series A.
Rachna Chadha is a Principal Solution Architect AI/ML in Strategic Accounts at AWS. Rachna is an optimist who believes that ethical and responsible use of AI can improve society in future and bring economical and social prosperity. In her spare time, Rachna likes spending time with her family, hiking, and listening to music.
Evan Kravitz is a software engineer at Amazon Web Services, working on SageMaker JumpStart. He enjoys cooking and going on runs in New York City.
Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker built-in algorithms and helps develop machine learning algorithms. He got his PhD from University of Illinois Urbana-Champaign. He is an active researcher in machine learning and statistical inference, and has published many papers in NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP conferences.

Moderate audio and text chats using AWS AI services and LLMs

Online gaming and social communities offer voice and text chat functionality for their users to communicate. Although voice and text chat often support friendly banter, it can also lead to problems such as hate speech, cyberbullying, harassment, and scams. Today, many companies rely solely on human moderators to review toxic content. However, verifying violations in chat is time-consuming, error-prone, and challenging to scale.
In this post, we introduce solutions that enable audio and text chat moderation using various AWS services, including Amazon Transcribe, Amazon Comprehend, Amazon Bedrock, and Amazon OpenSearch Service.
Social platforms seek an off-the-shelf moderation solution that is straightforward to initiate, but they also require customization for managing diverse policies. Latency and cost are also critical factors that must be taken into account. By orchestrating toxicity classification with large language models (LLMs) using generative AI, we offer a solution that balances simplicity, latency, cost, and flexibility to satisfy various requirements.
The sample code for this post is available in the GitHub repository.
Audio chat moderation workflow
An audio chat moderation workflow could be initiated by a user reporting other users on a gaming platform for policy violations such as profanity, hate speech, or harassment. This represents a passive approach to audio moderation. The system records all audio conversations without immediate analysis. When a report is received, the workflow retrieves the related audio files and initiates the analysis process. A human moderator then reviews the reported conversation, investigating its content to determine if it violates platform policy.

Alternatively, the workflow could be triggered proactively. For instance, in a social audio chat room, the system could record all conversations and apply analysis.

Both passive and proactive approaches can trigger the following pipeline for audio analysis.
The audio moderation workflow involves the following steps:

The workflow begins with receiving the audio file and storing it in an Amazon Simple Storage Service (Amazon S3) bucket for Amazon Transcribe to access.
The Amazon Transcribe StartTranscriptionJob API is invoked with Toxicity Detection enabled. Amazon Transcribe converts the audio into text, providing additional information about toxicity analysis. For more information about toxicity analysis, refer to Flag harmful language in spoken conversations with Amazon Transcribe Toxicity Detection.
If the toxicity analysis returns a toxicity score exceeding a certain threshold (for example, 50%), we can use Knowledge Bases for Amazon Bedrock to evaluate the message against customized policies using LLMs (see the sketch after this list).
The human moderator receives a detailed audio moderation report highlighting the conversation segments considered toxic and in violation of policy, allowing them to make an informed decision.
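The following sketch illustrates steps 1-3 of this workflow: it starts a transcription job with Toxicity Detection enabled and gates the LLM-based policy evaluation on a toxicity threshold. The bucket names, job name, and evaluate_with_policy helper are placeholders, and the transcript field names are simplified relative to the sample application; verify them against your own transcript output.

```
import json
import boto3

transcribe = boto3.client("transcribe")
s3 = boto3.client("s3")

# Steps 1-2: start a transcription job with Toxicity Detection enabled.
transcribe.start_transcription_job(
    TranscriptionJobName="reported-chat-001",                               # placeholder
    Media={"MediaFileUri": "s3://<audio-bucket>/reported-chat-001.wav"},    # placeholder
    MediaFormat="wav",
    LanguageCode="en-US",
    OutputBucketName="<transcript-bucket>",                                 # placeholder
    ToxicityDetection=[{"ToxicityCategories": ["ALL"]}],
)

# Step 3 (simplified): once the job completes, read the transcript JSON and
# escalate only the segments whose toxicity score exceeds the threshold.
TOXICITY_THRESHOLD = 0.5

def evaluate_with_policy(text):
    """Placeholder for the Knowledge Bases for Amazon Bedrock policy evaluation."""
    print(f"Escalating for LLM policy evaluation: {text}")

transcript = json.loads(
    s3.get_object(Bucket="<transcript-bucket>", Key="reported-chat-001.json")["Body"].read()
)
for segment in transcript.get("results", {}).get("toxicity_detection", []):
    if segment.get("toxicity", 0) >= TOXICITY_THRESHOLD:
        evaluate_with_policy(segment.get("text", ""))
```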

The following screenshot shows a sample application displaying toxicity analysis for an audio segment. It includes the original transcription, the results from the Amazon Transcribe toxicity analysis, and the analysis conducted using an Amazon Bedrock knowledge base through the Amazon Bedrock Anthropic Claude V2 model.
The LLM analysis provides a violation result (Y or N) and explains the rationale behind the model’s decision regarding policy violation. Furthermore, the knowledge base includes the referenced policy documents used by the evaluation, providing moderators with additional context.

Amazon Transcribe Toxicity Detection
Amazon Transcribe is an automatic speech recognition (ASR) service that makes it straightforward for developers to add speech-to-text capability to their applications. The audio moderation workflow uses Amazon Transcribe Toxicity Detection, which is a machine learning (ML)-powered capability that uses audio and text-based cues to identify and classify voice-based toxic content across seven categories, including sexual harassment, hate speech, threats, abuse, profanity, insults, and graphic language. In addition to analyzing text, Toxicity Detection uses speech cues such as tones and pitch to identify toxic intent in speech.
The audio moderation workflow activates the LLM’s policy evaluation only when the toxicity analysis exceeds a set threshold. This approach reduces latency and optimizes costs by selectively applying LLMs, filtering out a significant portion of the traffic.
Use LLM prompt engineering to accommodate customized policies
The pre-trained Toxicity Detection models from Amazon Transcribe and Amazon Comprehend provide a broad toxicity taxonomy, commonly used by social platforms for moderating user-generated content in audio and text formats. Although these pre-trained models efficiently detect issues with low latency, you may need a solution to detect violations against your specific company or business domain policies, which the pre-trained models alone can’t achieve.
Additionally, detecting violations in contextual conversations, such as identifying child sexual grooming conversations, requires a customizable solution that considers the chat messages along with context outside of them, such as the user's age, gender, and conversation history. This is where LLMs can offer the flexibility needed to meet these requirements.
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies. These solutions use Anthropic Claude v2 from Amazon Bedrock to moderate audio transcriptions and text chat messages using a flexible prompt template, as outlined in the following code:

Human: You are a Trust & Safety expert. Your job is to review the user chat message and decide if it violates the policy.
You will find the chat message in the <message> tag, the policy in the <policy> tag, and additional rules in the <rule> tag to assist your decision.

<policy>{policy}</policy>
<message>{message}</message>
<rule>{rule}</rule>

Does the chat message violate the policy? Provide your analysis in the <analysis> tag, breaking down each rule in the rule section, and keep the analysis within 100 words. Respond in the <answer> tag with either 'Y' or 'N'. 'Y' indicates that the message violates the policy, while 'N' means the content is safe and does not violate the policy.

Assistant:

The template contains placeholders for the policy description, the chat message, and additional rules that assist the moderation decision. The Anthropic Claude V2 model delivers responses in the instructed format (Y or N), along with an analysis explaining why it thinks the message violates the policy. This approach allows you to define flexible moderation categories and articulate your policies in human language.
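As a sketch of how such a prompt might be sent to Anthropic Claude v2 on Amazon Bedrock (the orchestration in the sample repository differs; the policy, message, and rule values here are placeholders):

```
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

# Compact version of the template shown above; the values are placeholders.
prompt = (
    "\n\nHuman: You are a Trust & Safety expert. Review the chat message in the <message> tag "
    "against the policy in the <policy> tag and the rules in the <rule> tag.\n"
    "<policy>No harassment or personal attacks are allowed.</policy>\n"
    "<message>You are useless and everyone hates you.</message>\n"
    "<rule>Treat indirect insults and sarcasm as violations.</rule>\n"
    "Provide your analysis in the <analysis> tag (under 100 words) and answer in the "
    "<answer> tag with 'Y' or 'N'.\n\nAssistant:"
)

response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-v2",
    body=json.dumps({"prompt": prompt, "max_tokens_to_sample": 300, "temperature": 0}),
)
print(json.loads(response["body"].read())["completion"])
```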
The traditional method of training an in-house classification model involves cumbersome processes such as data annotation, training, testing, and model deployment, requiring the expertise of data scientists and ML engineers. LLMs, in contrast, offer a high degree of flexibility. Business users can modify prompts in human language, leading to enhanced efficiency and reduced iteration cycles in ML model training.
Amazon Bedrock knowledge bases
Although prompt engineering is efficient for customizing policies, injecting lengthy policies and rules directly into LLM prompts for each message may introduce latency and increase cost. To address this, we use Amazon Bedrock knowledge bases as a managed Retrieval Augmented Generation (RAG) system. This enables you to manage the policy document flexibly, allowing the workflow to retrieve only the relevant policy segments for each input message. This minimizes the number of tokens sent to the LLMs for analysis.
You can use the AWS Management Console to upload the policy documents to an S3 bucket and then index the documents to a vector database for efficient retrieval. The following is a conceptual workflow managed by an Amazon Bedrock knowledge base that retrieves documents from Amazon S3, splits the text into chunks, and invokes the Amazon Bedrock Titan text embeddings model to convert the text chunks into vectors, which are then stored in the vector database.

In this solution, we use Amazon OpenSearch Service as the vector store. OpenSearch is a scalable, flexible, and extensible open source software suite for search, analytics, security monitoring, and observability applications, licensed under the Apache 2.0 license. OpenSearch Service is a fully managed service that makes it straightforward to deploy, scale, and operate OpenSearch in the AWS Cloud.
After the document is indexed in OpenSearch Service, the audio and text moderation workflow sends chat messages, triggering the following query flow for customized policy evaluation.

The process is similar to the initiation workflow. First, the text message is converted to text embeddings using the Amazon Bedrock Titan Text Embedding API. These embeddings are then used to perform a vector search against the OpenSearch Service database, which has already been populated with document embeddings. The database returns policy chunks with the highest matching score, relevant to the input text message. We then compose prompts containing both the input chat message and the policy segment, which are sent to Anthropic Claude V2 for evaluation. The LLM model returns an analysis result based on the prompt instructions.
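A sketch of the retrieval step using the Knowledge Bases for Amazon Bedrock Retrieve API follows; the knowledge base ID and the chat message are placeholders, and the sample application composes the full evaluation prompt from these results.

```
import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

# Retrieve the policy chunks most relevant to the incoming chat message.
response = bedrock_agent_runtime.retrieve(
    knowledgeBaseId="<your-knowledge-base-id>",                               # placeholder
    retrievalQuery={"text": "You are useless and everyone hates you."},       # placeholder
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 3}},
)

policy_chunks = [result["content"]["text"] for result in response["retrievalResults"]]
# These chunks fill the {policy} placeholder in the moderation prompt shown earlier.
print(policy_chunks)
```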
For detailed instructions on how to create a new instance with your policy document in an Amazon Bedrock knowledge base, refer to Knowledge Bases now delivers fully managed RAG experience in Amazon Bedrock.
Text chat moderation workflow
The text chat moderation workflow follows a similar pattern to audio moderation, but it uses Amazon Comprehend toxicity analysis, which is tailored for text moderation. The sample app supports an interface for uploading bulk text files in CSV or TXT format and provides a single-message interface for quick testing. The following diagram illustrates the workflow.

The text moderation workflow involves the following steps:

The user uploads a text file to an S3 bucket.
Amazon Comprehend toxicity analysis is applied to the text message.
If the toxicity analysis returns a toxicity score exceeding a certain threshold (for example, 50%), we use an Amazon Bedrock knowledge base to evaluate the message against customized policies using the Anthropic Claude V2 LLM.
A policy evaluation report is sent to the human moderator.

Amazon Comprehend toxicity analysis
In the text moderation workflow, we use Amazon Comprehend toxicity analysis to assess the toxicity level of the text messages. Amazon Comprehend is a natural language processing (NLP) service that uses ML to uncover valuable insights and connections in text. The Amazon Comprehend toxicity detection API assigns an overall toxicity score to text content, ranging from 0–1, indicating the likelihood of it being toxic. It also categorizes text into the following categories and provides a confidence score for each: hate_speech, graphic, harassment_or_abuse, sexual, violence_or_threat, insult, and profanity.
In this text moderation workflow, Amazon Comprehend toxicity analysis plays a crucial role in identifying whether the incoming text message contains toxic content. Similar to the audio moderation workflow, it includes a condition to activate the downstream LLM policy evaluation only when the toxicity analysis returns a score exceeding a predefined threshold. This optimization helps reduce overall latency and cost associated with LLM analysis.
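The gating logic can be sketched as follows; the threshold value and the evaluate_with_policy placeholder mirror the workflow described above and are assumptions rather than the sample application's exact code.

```
import boto3

comprehend = boto3.client("comprehend")

TOXICITY_THRESHOLD = 0.5  # example threshold from the workflow description

def evaluate_with_policy(message):
    """Placeholder for the Amazon Bedrock knowledge base policy evaluation."""
    print(f"Escalating for LLM policy evaluation: {message}")

def moderate_text(message):
    result = comprehend.detect_toxic_content(
        TextSegments=[{"Text": message}],
        LanguageCode="en",
    )
    toxicity = result["ResultList"][0]["Toxicity"]  # overall score between 0 and 1
    if toxicity >= TOXICITY_THRESHOLD:
        # Only now do we pay for the LLM-based policy evaluation.
        evaluate_with_policy(message)

moderate_text("You are useless and everyone hates you.")  # placeholder message
```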
Summary
In this post, we introduced solutions for audio and text chat moderation using AWS services, including Amazon Transcribe, Amazon Comprehend, Amazon Bedrock, and OpenSearch Service. These solutions use pre-trained models for toxicity analysis and are orchestrated with generative AI LLMs to achieve the optimal balance in accuracy, latency, and cost. They also empower you to flexibly define your own policies.
You can experience the sample app by following the instructions in the GitHub repo.

About the author
Lana Zhang is a Senior Solutions Architect at AWS WWSO AI Services team, specializing in AI and ML for Content Moderation, Computer Vision, Natural Language Processing and Generative AI. With her expertise, she is dedicated to promoting AWS AI/ML solutions and assisting customers in transforming their business solutions across diverse industries, including social media, gaming, e-commerce, media, advertising & marketing.

Set up cross-account Amazon S3 access for Amazon SageMaker notebooks i …

Advancements in artificial intelligence (AI) and machine learning (ML) are revolutionizing the financial industry for use cases such as fraud detection, credit worthiness assessment, and trading strategy optimization. To develop models for such use cases, data scientists need access to various datasets like credit decision engines, customer transactions, risk appetite, and stress testing. Managing appropriate access control for these datasets among the data scientists working on them is crucial to meet stringent compliance and regulatory requirements. Typically, these datasets are aggregated in a centralized Amazon Simple Storage Service (Amazon S3) location from various business applications and enterprise systems. Data scientists across business units working on model development using Amazon SageMaker are granted access to relevant data, which can lead to the requirement of managing prefix-level access controls. As the number of use cases and datasets grows, managing cross-account access per application requires bucket policy statements that become too complex and long for a single bucket policy to accommodate.
Amazon S3 Access Points simplify managing and securing data access at scale for applications using shared datasets on Amazon S3. You can create unique hostnames using access points to enforce distinct and secure permissions and network controls for any request made through the access point.
S3 Access Points simplify the management of access permissions specific to each application accessing a shared dataset. They enable secure, high-speed data copy between same-Region access points using AWS internal networks and VPCs. S3 Access Points can restrict access to VPCs, enabling you to firewall data within private networks, test new access control policies without impacting existing access points, and configure VPC endpoint policies to restrict access to S3 buckets owned by specific account IDs.
This post walks through the steps involved in configuring S3 Access Points to enable cross-account access from a SageMaker notebook instance.
Solution overview
For our use case, we have two accounts in an organization: Account A (111111111111), which is used by data scientists to develop models using a SageMaker notebook instance, and Account B (222222222222), which has required datasets in the S3 bucket test-bucket-1. The following diagram illustrates the solution architecture.

To implement the solution, complete the following high-level steps:

Configure Account A, including VPC, subnet security group, VPC gateway endpoint, and SageMaker notebook.
Configure Account B, including S3 bucket, access point, and bucket policy.
Configure AWS Identity and Access Management (IAM) permissions and policies in Account A.

You should repeat these steps for each SageMaker account that needs access to the shared dataset from Account B.
The names for each resource mentioned in this post are examples; you can replace them with other names as per your use case.
Configure Account A
Complete the following steps to configure Account A:

Create a VPC called DemoVPC.
Create a subnet called DemoSubnet in the VPC DemoVPC.
Create a security group called DemoSG.
Create a VPC S3 gateway endpoint called DemoS3GatewayEndpoint.
Create the SageMaker execution role.
Create a notebook instance called DemoNotebookInstance, following the security guidelines outlined in How to configure security in Amazon SageMaker.

Specify the SageMaker execution role you created.
For the notebook network settings, specify the VPC, subnet, and security group you created.
Make sure that Direct Internet access is disabled.

You assign permissions to the role in subsequent steps after you create the required dependencies.

Configure Account B
To configure Account B, complete the following steps:

In Account B, create an S3 bucket called test-bucket-1 following Amazon S3 security guidance.
Upload your file to the S3 bucket.
Create an access point called test-ap-1 in Account B.

Don’t change or edit any Block Public Access settings for this access point (all public access should be blocked).

Attach the following policy to your access point:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::111111111111:role/demo"
            },
            "Action": ["s3:GetObject", "s3:GetObjectVersion", "s3:PutObject", "s3:PutObjectAcl"],
            "Resource": [
                "arn:aws:s3:us-east-1:222222222222:accesspoint/test-ap-1",
                "arn:aws:s3:us-east-1:222222222222:accesspoint/test-ap-1/object/*"
            ]
        }
    ]
}

The actions defined in the preceding code are sample actions for demonstration purposes. You can define the actions as per your requirements or use case.

Add the following bucket policy permissions to access the access point:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::111111111111:role/demo"
            },
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": ["arn:aws:s3:::test-bucket-1", "arn:aws:s3:::test-bucket-1/*"],
            "Condition": {
                "StringEquals": {
                    "s3:DataAccessPointAccount": "222222222222"
                }
            }
        }
    ]
}

The preceding actions are examples. You can define the actions as per your requirements.
Configure IAM permissions and policies
Complete the following steps in Account A:

Confirm that the SageMaker execution role has the AmazonSageMakerFullAccess policy and a custom IAM inline policy that looks like the following code:

{
    "Sid": "VisualEditor2",
    "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:GetObjectVersion", "s3:PutObject", "s3:PutObjectAcl"],
    "Resource": [
        "arn:aws:s3:us-east-1:222222222222:accesspoint/test-ap-1",
        "arn:aws:s3:us-east-1:222222222222:accesspoint/test-ap-1/object/*",
        "arn:aws:s3:::test-bucket-1",
        "arn:aws:s3:::test-bucket-1/*"
    ]
}

The actions in the policy code are sample actions for demonstration purposes.

Go to the DemoS3GatewayEndpoint endpoint you created and add the following permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowCrossAccountAccessThroughAccessPoint",
            "Effect": "Allow",
            "Principal": "*",
            "Action": [
                "s3:Get*",
                "s3:List*",
                "s3:Put*"
            ],
            "Resource": [
                "arn:aws:s3:us-east-1:222222222222:accesspoint/test-ap-1",
                "arn:aws:s3:us-east-1:222222222222:accesspoint/test-ap-1/object/*",
                "arn:aws:s3:::test-bucket-1",
                "arn:aws:s3:::test-bucket-1/*"
            ]
        }
    ]
}

To get a prefix list, run the AWS Command Line Interface (AWS CLI) describe-prefix-lists command:

aws ec2 describe-prefix-lists

In Account A, go to the security group DemoSG for the target SageMaker notebook instance.
Under Outbound rules, create an outbound rule with All traffic or All TCP, and then specify the destination as the prefix list ID you retrieved.

This completes the setup in both accounts.
Test the solution
To validate the solution, go to the SageMaker notebook instance terminal and enter the following commands to list the objects through the access point:

To list the objects successfully through S3 access point test-ap-1:

aws s3 ls arn:aws:s3:us-east-1:222222222222:accesspoint/test-ap-1

To get the objects successfully through S3 access point test-ap-1:

aws s3api get-object --bucket arn:aws:s3:us-east-1:222222222222:accesspoint/test-ap-1 --key sample2.csv test2.csv
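From the same notebook, an equivalent check in Python with boto3 looks like the following sketch; access point ARNs are accepted wherever a bucket name is expected, and the object key assumes the sample file uploaded earlier.

```
import boto3

s3 = boto3.client("s3")

# S3 access point ARNs can be passed in place of a bucket name.
access_point_arn = "arn:aws:s3:us-east-1:222222222222:accesspoint/test-ap-1"

response = s3.get_object(Bucket=access_point_arn, Key="sample2.csv")
print(response["Body"].read().decode("utf-8")[:200])  # print the first 200 characters
```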

Clean up
When you’re done testing, delete any S3 access points and S3 buckets. Also, delete any SageMaker notebook instances to stop incurring charges.
Conclusion
In this post, we showed how S3 Access Points enables cross-account access to large, shared datasets from SageMaker notebook instances, bypassing size constraints imposed by bucket policies while configuring at-scale access management on shared datasets.
To learn more, refer to Easily Manage Shared Data Sets with Amazon S3 Access Points.

About the authors
Kiran Khambete is a Senior Technical Account Manager at Amazon Web Services (AWS). As a TAM, Kiran plays the role of technical expert and strategic guide, helping enterprise customers achieve their business goals.
Ankit Soni, with 14 years of total experience, is a Principal Engineer at NatWest Group, where he has served as a Cloud Infrastructure Architect for the past six years.
Kesaraju Sai Sandeep is a Cloud Engineer specializing in Big Data Services at AWS.

AddShoppers vs. Customers.ai: An Honest Assessment

All successful businesses have one thing in common: they know how to reach potential customers in a way that convinces them to become actual customers. Because this is so fundamentally important, there are lots of platforms and tools that claim to be the best tool for the job. 

It can be difficult to tell from the outside what features are important and which tools are actually most effective. 

When comparing two different companies and their platforms, important questions to keep in mind are: 

How many potential customers can it identify? 

Do you own the information once it’s generated? 

Do emails come from your domain? 

Are the campaigns targeted and personalized?

Ideally, you want the answer to each of these questions to be yes. If the answer to even one of these is no, you’re losing out on a lot of leverage and flexibility that could be the key to sustainable, long-term success. 

Now that you know what matters, we can move on to the crucial question of the day: how do AddShoppers and Customers.ai stack up? Dig into the blogpost to find out. 

Unlock High-Intent Leads Hiding on Your Site

Book a demo of Customers.ai’s U.S. website visitor identification, customer journey insights and remarketing platform to skyrocket conversions and sales.

Book a Demo

Customers.ai vs. AddShoppers: How Many Potential Customers Can Each Identify? 

For most companies, this will be the most important question. You’re trying to add more contacts to your marketing funnels, right? So, how many contacts will the tools actually identify for you? 

Customers.ai’s Website Visitor ID X-Ray pixel identifies between 20% and 25% of visitors to your website! 

AddShoppers, on the other hand, identifies 12%. 

That’s a pretty big difference! 

Curious to see how many contacts you could generate? Drop your site in here and Customers.ai’s tool will analyze your site traffic and show you your potential! 

The X-Ray pixel is easy to install and works like a charm. Here’s a breakdown of its function! 

Convert Website Visitors into Real Contacts!

Identify who is visiting your site with name, email and more. Get 500 contacts for free!


Customers.ai vs. AddShoppers: Do you Own the Information Once it’s Generated? 

Identifying visitors is a great first step. But identifying anonymous visitors is just the first step in turning them into customers! 

Learning who reaches your site is valuable but the real value depends on what you’re able to do with that information after the fact. What exactly the best next steps are depends on several factors but you want to be sure that no matter what those next steps are, you’re able to actually implement them. 

The only way to implement them is to own the data yourself. 

When you’ve got their contact information, you can put them into an existing email nurture campaign that’s already effective, put them in a remarketing audience to build recognition and trust, or anything else that makes sense for you! 

When you don’t own the information, you’re at the mercy of the data source. And that means you’re constricted by their technology and limitations, whether that’s best for your business or not. 

With Customers.ai: You own the data you generate! 

With AddShoppers: They own the data! They control how you reach your potential customers. They send a couple emails and if they don’t convert, oh well! 

With Customers.ai, you’re able to use the full force of all your marketing funnels as you see fit. 

This is just one example of how you can use your data and our integrations to supercharge your existing flows and make the most of your new contacts. 

Customers.ai vs. AddShoppers: Do Emails Come from Your Domain? 

Marketing to potential customers is a delicate process. You’re not just trying to keep your name in their head, you’re trying to gain their trust. 

Once they trust you, they’re much more likely to go from a potential customer to a customer-for-life! 

One crucial element of building trust is getting your contacts to feel like you, personally, are reaching out to them, personally. This is especially important as people get increasingly aware of phishing campaigns and other nefarious communications that might come their way. 

To that end, perhaps the most significant signal in email marketing is this: does the domain of the email address match the person you’re supposedly receiving an email from? In other words, if you get an email from Mary’s Jewelry, you’re much more likely to read it and click a link if the sender is from marysjewelry.com. 

Emails that you send through Customers.ai come from your own domain. 

Emails that you send through AddShoppers come from their domain. That means that those contacts won’t feel like they’re hearing from you. Even worse, they may think of you as a company that is sending them untargeted spam or has sold their email address. 

Instead of creating trust and building a relationship with your contacts, you risk alienating them and creating a negative association with your brand. 

Customers.ai vs. AddShoppers: Are the Campaigns Targeted and Personalized? 

Nurturing is maybe the most important word in marketing. 

The concept is simple: not all potential customers are alike and so you have to market to them differently. 

Smart marketers use a variety of different signals to segment their audience into different nurture campaigns. Those intent signals include: 

Personal Details

 Personal Touch in Communication: Make your messages stand out by incorporating your leads’ names and other personal details. This approach of personalizing communication, such as using their names in emails, significantly enhances engagement levels.

Visitor Interests on Your Site 

Segmentation Based on Interests: Organize your leads according to the specific landing pages they have shown interest in. This segmentation allows you to identify their particular needs or the solutions they are searching for, enabling you to customize your content and offers to align with their interests.

User Activity and Engagement 

Segmenting by Engagement: Pay attention to leads that navigate through several pages or invest time in exploring your site thoroughly. These individuals demonstrate a higher level of engagement. Focus your efforts on these engaged users by offering them more in-depth information or exclusive deals.

Time Investment on Content 

Gauging Interest Through Time Spent: When a lead spends a considerable amount of time on a particular page, it indicates a deep interest in the subject. Categorize these users separately to provide them with more detailed content or tailored offers that correlate with their demonstrated interests.

Geographical Considerations 

Localization of Marketing Efforts: Customize your marketing strategies based on the geographical locations of your leads. This could mean offering region-specific promotions, organizing local events, or creating content that aligns with local trends and preferences.

You use the intent signals that matter most to you to segment your nurture campaigns so that you maximize effectiveness without having to build hundreds and hundreds of different campaigns.

Strategies for Refined Segmentation

Integrating Multiple Segments: Enhance the precision of your targeting by combining various data points, such as focusing on leads from a particular industry, of a certain company size, who have shown significant interest in a specific topic.

Continuous Optimization: Implement A/B testing to identify which segments are more responsive to certain content or offers. Use this insight to continuously refine your segmentation for better results.

Adaptability is Key: Remember that as leads progress through the sales funnel, their interests and requirements may evolve. Maintain flexibility in your segmentation and adjust your strategies to stay relevant to their current needs.

Building smartly segmented, personalized campaigns is the best way to get new customers. 

With Customers.ai, you can build personalized and targeted campaigns. This includes everything from digital ad retargeting to abandoned cart flows. Contacts come with the intent information you need to execute on these campaigns: what pages they looked at, how long they looked at them, and more! 

With AddShoppers, you cannot build personalized and targeted campaigns. They own the data and contact; you don’t. They send a couple generic emails and if the contact doesn’t convert, that information is lost forever. 

Customers.ai vs. AddShoppers: Conclusion

To sum things up, here’s where Customers.ai and AddShoppers stand on the crucial questions: 

How many potential customers can it identify? 

Customers.ai can identify 20-25% of your anonymous website visitors. 

AddShoppers can identify 12% of your anonymous website visitors. 

Do you own the information once it’s generated? 

You own the contacts Customers.ai generates from your website. 

You do not own the contacts AddShoppers generates from your website. 

Do emails come from your domain? 

When you send emails from Customers.ai’s system, they come from your domain. You can also export the contacts to MailChimp, Sendgrid, or other email systems and reach your new contacts that way. 

Are the campaigns targeted and personalized?

You can customize your campaigns to contacts generated by Customers.ai to your heart’s content! 

You cannot customize and personalize the campaigns sent from AddShoppers. 

Interested in revolutionizing your approach to marketing? See how many contacts Customers.ai could generate for you and get started today! 

See Who Is On Your Site Right Now!

Turn anonymous visitors into genuine contacts.

Try it Free, No Credit Card Required

Get The X-Ray Pixel

Important Next Steps

See what targeted outbound marketing is all about. Capture and engage your first 500 website visitor leads with Customers.ai X-Ray website visitor identification for free.

Talk and learn about sales outreach automation with other growth enthusiasts. Join Customers.ai Island, our Facebook group of 40K marketers and entrepreneurs who are ready to support you.

Advance your marketing performance with Sales Outreach School, a free tutorial and training area for sales pros and marketers.

The post AddShoppers vs. Customers.ai: An Honest Assessment appeared first on Customers.ai.

Retrieval Augmented Thoughts (RAT): An AI Prompting Strategy that Syne …

The quest for models that can think, reason, and generate outputs similar to a human’s capacity for complex problem-solving has been paramount. Large language models (LLMs) are at the forefront, designed to mimic human-like understanding and articulation of ideas. Despite remarkable achievements, these models often grapple with the challenge of maintaining factual accuracy over extended reasoning tasks, leading to what is known as hallucinations – generating plausible but factually incorrect information. This phenomenon is particularly pronounced in scenarios requiring a series of logical steps, highlighting a gap in the LLMs’ ability to reason with precision and context awareness over longer horizons.

The endeavor to bridge this gap has led researchers to propose various methodologies aimed at refining the reasoning process of LLMs. Earlier approaches have explored the integration of external information retrieval with model-generated content, attempting to anchor the models’ outputs in factual accuracy. However, these methods typically fall short in dynamically refining the reasoning process, often producing results that, while improved, still fall short of the desired level of contextual understanding and accuracy.

Researchers from Peking University, the University of California Los Angeles, and the Beijing Institute for General Artificial Intelligence proposed the Retrieval Augmented Thoughts (RAT) method, which directly addresses the challenge of maintaining factual accuracy in LLMs. RAT is a novel approach emphasizing the iterative revision of the model’s generated thoughts. RAT effectively mitigates the issue of hallucinations by harnessing external information relevant not just to the initial query but also to the evolving context of the model’s reasoning process. This is achieved by revising each step of the model’s generated chain of thoughts with pertinent information retrieved from vast databases, ensuring that each reasoning step is grounded in accuracy and relevance.
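To make the iterative revision idea concrete, here is a schematic sketch of the loop described above; every helper function is a hypothetical placeholder standing in for an LLM or a retriever, not the authors' implementation.

```
# Schematic sketch of Retrieval Augmented Thoughts (RAT); all helpers are placeholders.
def draft_chain_of_thought(task):
    """Ask the LLM for an initial, possibly flawed, step-by-step draft."""
    return [f"step 1 for {task}", f"step 2 for {task}"]

def retrieve(query):
    """Fetch passages relevant to the task plus the thoughts revised so far."""
    return [f"document relevant to: {query}"]

def revise_step(task, revised_so_far, step, evidence):
    """Ask the LLM to rewrite one step, grounded in the retrieved evidence."""
    return f"{step} (revised using {len(evidence)} retrieved passages)"

def rat(task):
    revised = []
    for step in draft_chain_of_thought(task):
        # The retrieval query includes the evolving context, not just the original task.
        evidence = retrieve(" ".join([task] + revised + [step]))
        revised.append(revise_step(task, revised, step, evidence))
    return revised

print(rat("write a binary search function"))
```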

The RAT method excels across long-horizon generation tasks, from generating complex code to solving intricate mathematical problems, crafting creative narratives, and planning functions in simulated environments. RAT consistently enhances the performance of LLMs, and the improvements are significant: it has led to an average increase of 13.63% in rating scores for code generation tasks, a 16.96% increase for mathematical reasoning, 19.2% for creative writing, and 42.78% for embodied task planning. These achievements underscore RAT’s efficacy and its potential as a universally applicable solution for enhancing LLM reasoning capabilities.

RAT’s implementation reveals the potential for LLMs to achieve a more human-like ability to reason and generate responses. By iteratively refining the thought process with contextually relevant information, the method advances the frontier of what LLMs can achieve, setting new standards for accuracy, reliability, and context awareness in AI-generated content.

In conclusion, the Retrieval Augmented Thoughts (RAT) method can be presented in the following points:

Bridges the gap in LLMs’ ability to maintain factual accuracy over extended reasoning tasks.

Mitigates hallucinations by revising each reasoning step with pertinent, retrieved information, ensuring contextually aware outputs.

Demonstrates versatility across various tasks, including code generation, mathematical reasoning, creative writing, and task planning, showcasing universal applicability.

Sets new benchmarks for the performance, accuracy, and reliability of LLM outputs, paving the way for future advancements in AI reasoning capabilities. 

Check out the Paper. All credit for this research goes to the researchers of this project.
The post Retrieval Augmented Thoughts (RAT): An AI Prompting Strategy that Synergies Chain of Thought (CoT) Prompting and Retrieval Augmented Generation (RAG) to Address the Challenging Long-Horizon Reasoning and Generation Tasks appeared first on MarkTechPost.

Meet Modeling Collaborator: A Novel Artificial Intelligence Framework …

The field of computer vision has traditionally focused on recognizing objectively agreed-upon concepts such as animals, vehicles, or specific objects. However, many practical, real-world applications require identifying subjective concepts that may vary significantly among individuals, such as predicting emotions, assessing aesthetic appeal, or moderating content. 

For example, what constitutes “unsafe” content may differ based on individual perspectives, and a food critic’s definition of “gourmet” may not align with others. There is a growing need for user-centric training frameworks to address this challenge that allow anyone to train subjective vision models tailored to their specific criteria.

Agile Modeling recently introduced a user-in-the-loop framework to formalize the transformation of any visual concept into a vision model. However, existing approaches often require significant manual effort and lack efficiency. For instance, their active learning algorithm requires users to label many training images iteratively, which can be tedious and time-consuming. This limitation underscores the need for more efficient methods that leverage human capabilities while minimizing manual effort.

One key capability humans possess is the ability to decompose complex subjective concepts into more manageable and objective components using first-order logic. By breaking down subjective concepts into objective clauses, individuals can define complex ideas in a non-laborious and cognitively effortless manner. The Modeling Collaborator harnesses this cognitive process. This tool empowers users to build classifiers by decomposing subjective concepts into their constituent sub-components, significantly reducing manual effort and increasing efficiency.

Modeling Collaborator employs advancements in large language models (LLMs) and vision-language models (VLMs) to facilitate training. The system streamlines the process of defining and classifying subjective concepts by utilizing an LLM to break down concepts into digestible questions for a Visual Question Answering (VQA) model. Users are only required to manually label a small validation set of 100 images, significantly reducing the annotation burden.
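
As a rough illustration of that decomposition workflow (not the released system), the sketch below treats llm and vqa as hypothetical callables for a language model and a visual question answering model.

def classify_subjective_concept(image, concept, llm, vqa, threshold=0.5):
    """Sketch: decompose a subjective concept into objective yes/no questions."""
    # Ask the LLM to break the subjective concept into objective sub-questions.
    questions = llm(
        f"List short, objective yes/no questions whose answers determine "
        f"whether an image shows '{concept}'. One question per line."
    ).splitlines()

    # Answer each sub-question on the image with the VQA model.
    votes = [vqa(image, q).strip().lower().startswith("yes")
             for q in questions if q.strip()]

    # Simple aggregation: the concept is present if enough sub-questions agree.
    score = sum(votes) / max(len(votes), 1)
    return score >= threshold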

Moreover, Modeling Collaborator stands out from existing zero-shot methods on subjective concepts, particularly on more challenging tasks. Compared to previous approaches like Agile Modeling, Modeling Collaborator not only surpasses the quality of crowd-raters on difficult concepts but also significantly reduces the need for manual ground-truth annotation by orders of magnitude. By lowering the barriers to developing classification models, Modeling Collaborator empowers users to translate their ideas into reality more rapidly, paving the way for a new wave of end-user applications in computer vision.

Furthermore, by providing a more accessible and efficient approach to building subjective vision models, Modeling Collaborator can potentially revolutionize the development of AI applications. With reduced manual effort and costs, a broader range of users, including those without extensive technical expertise, can participate in creating customized vision models tailored to their specific needs and preferences. This democratization of AI development can lead to the emergence of innovative applications across various domains, including healthcare, education, entertainment, and more. Ultimately, by empowering users to rapidly convert their ideas into reality, Modeling Collaborator contributes to the democratization of AI and fosters a more inclusive and diverse landscape of AI-powered solutions.

Check out the Paper. All credit for this research goes to the researchers of this project.
The post Meet Modeling Collaborator: A Novel Artificial Intelligence Framework that Allows Anyone to Train Vision Models Using Natural Language Interactions and Minimal Effort appeared first on MarkTechPost.

From Text to Visuals: How AWS AI Labs and University of Waterloo Are C …

In human-computer interaction, multimodal systems that utilize text and images promise a more natural and engaging way for machines to communicate with humans. Such systems, however, are heavily dependent on datasets that combine these elements meaningfully. Traditional methods for creating these datasets have often fallen short, relying on static image databases with limited variety or raising significant privacy and quality concerns when sourcing images from the real world.

Introducing MAGID (Multimodal Augmented Generative Images Dialogues), a groundbreaking framework born out of the collaborative efforts of researchers from the esteemed University of Waterloo and the innovative AWS AI Labs. This cutting-edge approach is set to redefine the creation of multimodal dialogues by seamlessly integrating diverse and high-quality synthetic images with text dialogues. The essence of MAGID lies in its ability to transform text-only conversations into rich, multimodal interactions without the pitfalls of traditional dataset augmentation techniques.

At the heart of MAGID is a meticulously designed pipeline consisting of three core components:

An LLM-based scanner

A diffusion-based image generator

A comprehensive quality assurance module

The process begins with the scanner identifying text utterances within dialogues that would benefit from visual augmentation. This selection is critical, as it determines the contextual relevance of the images to be generated.

Following the selection, the diffusion model takes center stage, generating images that complement the chosen utterances and enrich the overall dialogue. This model excels at producing varied and contextually aligned images, drawing from various visual concepts to ensure the generated dialogues reflect the diversity of real-world conversations.

However, the generation of images is only part of the equation. MAGID incorporates a comprehensive quality assurance module to ensure the utility and integrity of the augmented dialogues. This module evaluates the generated images on several fronts, including their alignment with the corresponding text, aesthetic quality, and adherence to safety standards. It ensures that each image matches the text in context and content, meets high visual standards, and avoids inappropriate content.
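
Taken together, the three components form a generate-then-filter loop. The sketch below only mirrors the pipeline as described above; scanner_llm, diffusion_model, and the scoring helpers are hypothetical placeholders, not the released code.

def augment_dialogue(utterances, scanner_llm, diffusion_model,
                     alignment_score, aesthetic_score, is_safe,
                     min_alignment=0.25, min_aesthetic=5.0, max_retries=3):
    """Sketch of the MAGID pipeline: scan -> generate -> quality-check."""
    augmented = []
    for text in utterances:
        # 1. LLM-based scanner: decide whether this utterance benefits from an image.
        answer = scanner_llm(f"Would an image help illustrate: '{text}'? Answer yes or no.")
        if answer.strip().lower() != "yes":
            augmented.append({"text": text, "image": None})
            continue

        image = None
        for _ in range(max_retries):
            # 2. Diffusion-based generator: propose an image for the utterance.
            candidate = diffusion_model(prompt=text)

            # 3. Quality assurance: check image-text alignment, aesthetics, and safety.
            if (alignment_score(text, candidate) >= min_alignment
                    and aesthetic_score(candidate) >= min_aesthetic
                    and is_safe(candidate)):
                image = candidate
                break

        augmented.append({"text": text, "image": image})
    return augmented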

The efficacy of MAGID was rigorously tested against state-of-the-art baselines and through comprehensive human evaluations. The results were nothing short of remarkable, with MAGID not only matching but often surpassing other methods in creating multimodal dialogues that were engaging, informative, and aesthetically pleasing. Specifically, human evaluators consistently rated MAGID-generated dialogues as superior, particularly noting the relevance and quality of the images when compared to those produced by retrieval-based methods. Including diverse and contextually aligned images significantly enhanced the dialogues’ realism and engagement, as evidenced by MAGID’s favorable comparison to real datasets in human evaluation metrics.

MAGID offers a powerful solution to the longstanding challenges in multimodal dataset generation through its sophisticated blend of generative models and quality assurance. By eschewing reliance on static image databases and mitigating privacy concerns associated with real-world images, MAGID paves the way for creating rich, diverse, and high-quality multimodal dialogues. This advancement is not just a technical achievement but a stepping stone toward realizing the full potential of multimodal interactive systems. As these systems become increasingly integral to our digital lives, frameworks like MAGID ensure they can evolve in ways that are both innovative and aligned with the nuanced dynamics of human conversation.

In summary, the introduction of MAGID by the team from the University of Waterloo and AWS AI Labs marks a significant leap forward in AI and human-computer interaction. By addressing the critical need for high-quality, diverse multimodal datasets, MAGID enables the development of more sophisticated and engaging multimodal systems. Its ability to generate synthetic dialogues that are virtually indistinguishable from real human conversations underscores the immense potential of AI to bridge the gap between humans and machines, making interactions more natural, enjoyable, and, ultimately, human.

Check out the Paper. All credit for this research goes to the researchers of this project.
The post From Text to Visuals: How AWS AI Labs and University of Waterloo Are Changing the Game with MAGID appeared first on MarkTechPost.

Run an audience overlap analysis in AWS Clean Rooms

Advertisers, publishers, and advertising technology providers are actively seeking efficient ways to collaborate with their partners to generate insights about their collective datasets. One common reason to engage in data collaboration is to run an audience overlap analysis, which is frequently used when planning media buys and evaluating new partnerships.
In this post, we explore what an audience overlap analysis is, discuss the current technical approaches and their challenges, and illustrate how you can run secure audience overlap analysis using AWS Clean Rooms.
Audience overlap analysis
Audience overlap is the percentage of users in your audience who are also present in another dataset (calculated as the number of users present in both your audience and another dataset divided by the total number of users in your audience). In the digital media planning process, audience overlaps are often conducted to compare an advertiser’s first-party dataset with a media partner’s (publisher) dataset. The analysis helps determine how much of the advertiser’s audience can be reached by a given media partner. By evaluating the overlap, advertisers can determine whether a media partner provides unique reach or if the media partner’s audience predominantly overlaps with the advertiser’s existing audience.
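The calculation itself is a simple ratio; the small helper below is only an illustration (the counts would come from your overlap queries, not from any AWS Clean Rooms API):

def audience_overlap_pct(advertiser_count, overlap_count):
    """Percentage of the advertiser's audience also present in the partner dataset.

    advertiser_count: distinct users in the advertiser's first-party dataset.
    overlap_count: distinct users present in both datasets.
    """
    return 100.0 * overlap_count / advertiser_count

# Example: 120,000 overlapping users in a 400,000-user audience -> 30.0% overlap
print(audience_overlap_pct(advertiser_count=400_000, overlap_count=120_000))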
Current approaches and challenges
Advertisers, publishers, third-party data providers, and other entities often share their data when running audience overlaps or match tests. Common methods for sharing data, such as using pixels and SFTP transfers, can carry risk because they involve moving sensitive customer information. Sharing this data with another party can be time consuming and increases the risk of potential data breaches or unauthorized access. If the receiving party mishandles the data, it could violate privacy regulations, resulting in legal risks. Also, any perceived misuse or exposure of customer data can erode consumer trust, leading to reputational damage and potential loss of business.
Solution overview
AWS Clean Rooms can help you and your partners effortlessly and securely collaborate on and analyze your collective datasets—without copying each other’s underlying data. With AWS Clean Rooms, you can create a data clean room in minutes and collaborate with your partners to generate unique insights. AWS Clean Rooms allows you to run an audience overlap analysis and generate valuable insights while avoiding risks associated with other current approaches.
The following are key concepts and prerequisites to use AWS Clean Rooms:

Each party in the analysis (collaboration member) needs to have an AWS account.
One member invites the other member to the AWS Clean Rooms collaboration. It doesn’t matter which member creates the invitation. The collaboration creator uses the invitee’s AWS account ID as input to send invitations.
Only one member can query in the collaboration, and only one member can receive results from the collaboration. The abilities of each member are defined when the collaboration is created.
Each collaboration member stores datasets in their respective Amazon Simple Storage Service (Amazon S3) bucket and catalogs them (creates a schema with column names and data types) in the AWS Glue Data Catalog. You can also create the Data Catalog definition using the Amazon Athena create database and create table statements.
Collaborators need to have their S3 buckets and Data Catalog tables in the same AWS Region.
Collaborators can use the AWS Clean Rooms console, APIs, or AWS SDKs to set up a collaboration.
AWS Clean Rooms enables you to use any column as a join key, for example hashed MAIDs, emails, IP addresses, and RampIDs.
Each collaboration member associates their own data to the collaboration.

Let’s look at a scenario in which an advertiser collaborates with a publisher to identify the audience overlap. In this example, the publisher creates the collaboration, invites the advertiser, and designates the advertiser as the member who can query and receive results.
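
Because collaborations can also be set up with the APIs or AWS SDKs, the same setup can be scripted. The following boto3 sketch is only an outline of that flow: the account ID and display names are placeholders, and you should confirm the exact parameter names and allowed values against the AWS Clean Rooms API reference for your SDK version.

import boto3

cleanrooms = boto3.client("cleanrooms")

# The publisher creates the collaboration and designates the advertiser
# (placeholder account ID) as the member who can query and receive results.
response = cleanrooms.create_collaboration(
    name="audience-overlap-analysis",
    description="Advertiser and publisher audience overlap analysis",
    creatorDisplayName="Publisher",
    creatorMemberAbilities=[],  # the publisher only contributes data
    members=[
        {
            "accountId": "111122223333",  # advertiser AWS account ID (placeholder)
            "displayName": "Advertiser",
            "memberAbilities": ["CAN_QUERY", "CAN_RECEIVE_RESULTS"],
        }
    ],
    queryLogStatus="ENABLED",  # queries are logged to Amazon CloudWatch
)
print(response["collaboration"]["id"])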
Prerequisites
To invite another person to a collaboration, you need their AWS account ID. In our use case, the publisher needs the AWS account ID of the advertiser.
Create a collaboration
In our use case, the publisher creates a collaboration using the AWS Clean Rooms console and invites the advertiser.
To create a collaboration, complete the following steps:

On the AWS Clean Rooms console, choose Collaborations in the navigation pane.
Choose Create collaboration.
For Name, enter a name for the collaboration.
In the Members section, enter the AWS account ID of the account you want to invite (in this case, the advertiser).
In the Member abilities section, choose the member who can query and receive results (in this case, the advertiser).
For Query logging, decide if you want query logging turned on. The queries are logged to Amazon CloudWatch.
For Cryptographic computing, decide if you want to turn on support for cryptographic computing (pre-encrypt your data before associating it). AWS Clean Rooms will then run queries on the encrypted data.
Choose Next.
On the Configure membership page, choose if you want to create the membership and collaboration now, or create the collaboration but activate your membership later.
For Query results settings defaults, choose if you want to keep the default settings to receive results.
For Log storage in Amazon CloudWatch Logs, specify your log settings.
Specify any tags and who is paying for queries.
Choose Next.
Review the configuration and choose to either create the collaboration and membership now, or just the collaboration.

The publisher sends an invitation to the advertiser. The advertiser reviews the collaboration settings and creates a membership.
Create a configured table and set analysis rules
The publisher creates a configured table from the AWS Glue table (which represents the metadata definition of the S3 data, including location, so it can be read by AWS Clean Rooms when the query is run).
Complete the following steps:

On the AWS Clean Rooms console, choose Configured tables in the navigation pane.
Choose Configure new table.
In the Choose AWS Glue table section, choose your database and table.
In the Columns allowed in collaboration section, choose which of the existing table columns to allow for querying in the collaboration.
In the Configured table details section, enter a name and optional description for the configured table.
Choose Configure new table.
Choose the analysis rule type that matches the type of queries you want to allow on the table. To allow an aggregation analysis, such as finding the size of the audience overlap, choose the aggregation analysis rule type.
In the Aggregate functions section, choose COUNT DISTINCT as the aggregate function.
In the Join controls section, choose whether your collaborator is required to join a table with yours. Because this is an audience overlap use case, select No, only overlap can be queried.
Select the operators to allow for matching (for this example, select AND and OR).
In the Dimension controls section, choose if you want to make any columns available as dimensions.
In the Scalar functions section, choose if you want to limit the scalar functions allowed.
Choose Next.
In the Aggregation constraints section, choose the minimum aggregation constraint for the configured table.

This allows you to filter out rows that don’t meet a certain minimum threshold of users (for example, if the threshold is set to 10, rows that aggregate fewer than 10 users are filtered out).

Choose Next.
Review the settings and create the table.

Associate the table to the collaboration
AWS Clean Rooms requires access to read the table in order to run the query submitted by the advertiser. Complete the following steps to associate the table:

On the AWS Clean Rooms console, navigate to your collaboration.
Choose Associate table.
For Configured table name, choose the name of your configured table.
In the Table association details section, enter a name and optional description for the table.
In the Service access section, you can use the default settings to create an AWS Identity and Access Management (IAM) service role for AWS Clean Rooms automatically, or you can use an existing role. IAM permissions are required to create or modify the role and pass the role to AWS Clean Rooms.
Choose Associate table.

The advertiser also completes the steps detailed in the preceding sections to create a configured table and associate it to the collaboration.
Run queries in the query editor
The advertiser can now navigate to the Queries tab for the collaboration and review the tables available to query and their analysis rules. You can specify the S3 bucket where the output of the overlap query will go.
The advertiser can now write and run an overlap query. You can use a hashed email as a join key for the query (you have the option to use any column as the join key and can also use multiple columns for multiple join keys). You can also use the Analysis Builder no-code option to have AWS Clean Rooms generate SQL on your behalf. For our use case, we run the following queries:

-- Query 1: count of overlapping users between the advertiser and publisher datasets

SELECT COUNT(DISTINCT advertiser.hashed_email)
FROM consumer AS advertiser
INNER JOIN impressions AS publisher
ON advertiser.hashed_email = publisher.hashed_email

-- Query 2: count of users in the advertiser dataset

SELECT COUNT(DISTINCT advertiser.hashed_email)
FROM consumer AS advertiser

The query results are sent to the advertiser’s S3 bucket, as shown in the following screenshot.

Clean up
It’s a best practice to delete resources that are no longer being used. The advertiser and publisher should clean up their respective resources:

Advertiser – The advertiser deletes their configured table associations and collaboration membership. However, they don’t have to delete their configured table because it’s reusable across collaborations.
Publisher – The publisher deletes their configured table associations and the collaboration. They don’t have to delete their configured table because it’s reusable across collaborations.

Conclusion
In this post, we demonstrated how to set up an audience overlap collaboration using AWS Clean Rooms for media planning and partnership evaluation using a hashed email as a join key between datasets. Advertisers are increasingly turning to AWS Clean Rooms to conduct audience overlap analyses with their media partners, aiding their media investment decisions. Furthermore, audience overlaps help you accelerate your partnership evaluations by identifying the extent of overlap you share with potential partners.
To learn more about AWS Clean Rooms, watch the video Getting Started with AWS Clean Rooms, and refer to the following additional resources:

AWS Clean Rooms Now Generally Available — Collaborate with Your Partners without Sharing Raw Data
AWS on Air: AWS Clean Rooms is now available for General Availability
Introducing AWS for Advertising & Marketing: Helping customers reinvent the industry with purpose-built services, solutions, and partners
Introducing four new solutions that help customers integrate AWS Clean Rooms into their advertising workflows
AWS Clean Rooms User Guide

About the Authors
Eric Saccullo is a Senior Business Development Manager for AWS Clean Rooms at Amazon Web Services. He is focused on helping customers collaborate with their partners in privacy-enhanced ways to gain insights and improve business outcomes.
Shamir Tanna is a Senior Technical Product Manager at Amazon Web Services.
Ryan Malecky is a Senior Solutions Architect at Amazon Web Services. He is focused on helping customers gain insights from their data, especially with AWS Clean Rooms.

Large language model inference over confidential data using AWS Nitro …

This post is co-written with Justin Miles, Liv d’Aliberti, and Joe Kovba from Leidos. 
Leidos is a Fortune 500 science and technology solutions leader working to address some of the world’s toughest challenges in the defense, intelligence, homeland security, civil, and healthcare markets. In this post, we discuss how Leidos worked with AWS to develop an approach to privacy-preserving large language model (LLM) inference using AWS Nitro Enclaves.
LLMs are designed to understand and generate human-like language, and are used in many industries, including government, healthcare, financial, and intellectual property. LLMs have broad applicability, including chatbots, content generation, language translation, sentiment analysis, question answering systems, search engines, and code generation. Introducing LLM-based inference into a system also has the potential to introduce privacy threats, including model exfiltration, data privacy violations, and unintended LLM-based service manipulation. Technical architectures need to be implemented in order to make sure that LLMs don’t expose sensitive information during inference.
This post discusses how Nitro Enclaves can help protect LLM model deployments, specifically those that use personally identifiable information (PII) or protected health information (PHI). This post is for educational purposes only and should not be used in production environments without additional controls.
Overview of LLMs and Nitro Enclaves
A potential use case is an LLM-based sensitive query chatbot designed to carry out a question and answering service containing PII and PHI. Most current LLM chatbot solutions explicitly inform users that they should not include PII or PHI when inputting questions due to security concerns. To mitigate these concerns and protect customer data, service owners rely primarily on user protections such as the following:

Redaction – The process of identifying and obscuring sensitive information like PII in documents, texts, or other forms of content. Redaction can be applied to input data before it is sent to a model, or an LLM can be trained to redact sensitive information from its responses automatically.
Multi-factor authentication – A security process that requires users to provide multiple authentication methods to verify their identity to gain access to the LLM.
Transport Layer Security (TLS) – A cryptographic protocol that provides secure communication that enhances data privacy in transit between users and the LLM service.

Although these practices enhance the security posture of the service, they are not sufficient to safeguard all sensitive user information and other data that can persist without the user’s knowledge.
In our example use case, an LLM service is designed to answer employee healthcare benefit questions or provide a personal retirement plan. Let’s analyze the following sample architecture and identify data privacy risk areas.

Figure 1 – Data Privacy Risk Areas Diagram

The potential risk areas are as follows:

Privileged users have access to the instance that houses the server. Unintentional or unauthorized changes to the service could result in sensitive data being exposed in unintended ways.
Users must trust the service will not expose or retain sensitive information in application logs.
Changes to application packages can cause changes to the service, resulting in the exposure of sensitive data.
Privileged users with access to the instance have unrestricted access to the LLM used by the service. Changes to the LLM may result in incorrect or inaccurate information being returned to users.

Nitro Enclaves provides additional isolation for your Amazon Elastic Compute Cloud (Amazon EC2) instance, safeguarding data in use from unauthorized access, including admin-level users. In the preceding architecture, it’s possible for an unintentional change to result in sensitive data persisting in plaintext and accidentally being revealed to a user who may not need to access that data. With Nitro Enclaves, you create an isolated environment from your EC2 instance, permitting you to allocate CPU and memory resources to the enclave. This enclave is a highly restrictive virtual machine. By running code that handles sensitive data within the enclave, none of the parent instance’s processes will be able to view enclave data.
Nitro Enclaves offers the following benefits:

Memory and CPU Isolation – It relies on the Nitro Hypervisor to isolate the CPU and memory of the enclave from users, applications, and libraries on the parent instance. This feature helps isolate the enclave and your software, and significantly reduces the surface area for unintended events.
Separate virtual machine – Enclaves are separated virtual machines attached to an EC2 instance to further protect and securely process highly sensitive data.
No interactive access – Enclaves provide only secure local socket connectivity with their parent instance. They have no persistent storage, interactive access, or external networking.
Cryptographic attestation – Nitro Enclaves offers cryptographic attestation, a process used to prove the identity of an enclave and verify that only authorized code is running in your enclave.
AWS integration – Nitro Enclaves is integrated with AWS Key Management Service (AWS KMS), allowing you to decrypt files that have been encrypted using AWS KMS inside the enclave. AWS Certificate Manager (ACM) for Nitro Enclaves allows you to use public and private SSL/TLS certificates with your web applications and servers running on EC2 instances with Nitro Enclaves.

You can use these features provided by Nitro Enclaves to help mitigate risks associated with PII and PHI data. We recommend including Nitro Enclaves in an LLM service when handling sensitive user data.
Solution overview
Let’s examine the architecture of the example service, now including Nitro Enclaves. By incorporating Nitro Enclaves, as shown in the following figure, the LLM becomes a more secure chatbot for handling PHI or PII data.

Figure 2 – Solution Overview Diagram

User data, including PII, PHI, and questions, remains encrypted throughout the request-response process when the application is hosted within an enclave. The steps carried out during the inference are as follows:

The chatbot app generates temporary AWS credentials and asks the user to input a question. The question, which may contain PII or PHI, is then encrypted via AWS KMS. The encrypted user input is combined with the temporary credentials to create the encrypted request.
The encrypted data is sent to an HTTP server hosted by Flask as a POST request. Before accepting sensitive data, this endpoint should be configured for HTTPS.
The client app receives the POST request and forwards it through a secure local channel (for example, vsock) to the server app running inside Nitro Enclaves.
The Nitro Enclaves server app uses the temporary credentials to decrypt the request, queries the LLM, and generates the response. The model-specific settings are stored within the enclaves and are protected with cryptographic attestation.
The server app uses the same temporary credentials to encrypt the response.
The encrypted response is returned to the chatbot app through the client app as a response to the POST request.
The chatbot app decrypts the response using its KMS key and displays the plaintext to the user.

Prerequisites
Before we get started, you need the following prerequisites to deploy the solution:

AWS Identity and Access Management (IAM) user
KMS symmetric key
AWS Cloud9 environment

Configure an EC2 instance
Complete the following steps to configure an EC2 instance:

Launch an r5.8xlarge EC2 instance using the amzn2-ami-kernel-5.10-hvm-2.0.20230628.0-x86_64-gp2 AMI with Nitro Enclaves enabled.
Install the Nitro Enclaves CLI to build and run Nitro Enclaves applications:

sudo amazon-linux-extras install aws-nitro-enclaves-cli -y
sudo yum install aws-nitro-enclaves-cli-devel -y

Verify the installation of the Nitro Enclaves CLI:

nitro-cli --version
The version used in this post is 1.2.2.

Install Git and Docker to build Docker images and download the application from GitHub. Add your instance user to the Docker group (<USER> is your IAM instance user):

sudo yum install git -y
sudo usermod -aG ne <USER>
sudo usermod -aG docker <USER>
sudo systemctl start docker && sudo systemctl enable docker

Start and enable the Nitro Enclaves allocator and vsock proxy services:

sudo systemctl start nitro-enclaves-allocator.service && sudo systemctl enable nitro-enclaves-allocator.service
sudo systemctl start nitro-enclaves-vsock-proxy.service && sudo systemctl enable nitro-enclaves-vsock-proxy.service

Nitro Enclaves uses a local socket connection called vsock to create a secure channel between the parent instance and the enclave.
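As a generic illustration of that parent-to-enclave hop (not the project’s client app), Python’s standard socket module can open an AF_VSOCK connection on Linux; the CID and port below are placeholders that must match how you run the enclave and what the in-enclave server listens on.

import json
import socket

def send_to_enclave(payload, enclave_cid=16, port=5000):
    """Send a JSON payload to the enclave over vsock and return its JSON reply.

    enclave_cid must match the --enclave-cid passed to `nitro-cli run-enclave`;
    the port is whatever the server inside the enclave listens on (placeholder).
    """
    with socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM) as s:
        s.connect((enclave_cid, port))
        s.sendall(json.dumps(payload).encode())
        s.shutdown(socket.SHUT_WR)  # signal that the request is complete
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
    return json.loads(b"".join(chunks).decode())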
After all the services are started and enabled, restart the instance to verify that all of the user groups and services are running correctly:
sudo shutdown -r now
Configure the Nitro Enclaves allocator service
Nitro Enclaves is an isolated environment that designates a portion of the instance CPU and memory to run the enclave. With the Nitro Enclaves allocator service, you can indicate how many CPUs and how much memory will be taken from the parent instance to run the enclave.
Modify the enclave’s reserved resources using a text editor (for our solution, we allocate 8 CPU and 70,000 MiB memory to provide enough resources):
vi /etc/nitro_enclaves/allocator.yaml

Figure 3 – AWS Nitro Enclaves Allocator Service Configuration

Clone the project
After you configure the EC2 instance, you can download the code to run the sensitive chatbot with an LLM inside of Nitro Enclaves.
You need to update the server.py file with the appropriate KMS key ID that you created in the beginning to encrypt the LLM response.

Clone the GitHub project:

cd ~/ && git clone https://<THE_REPO.git>

Navigate to the project folder to build the enclave_base Docker image that contains the Nitro Enclaves Software Development Kit (SDK) for cryptographic attestation documents from the Nitro Hypervisor (this step can take up to 15 minutes):

cd /nitro_llm/enclave_base
docker build ./ -t "enclave_base"

Save the LLM in the EC2 Instance
We are using the open-source Bloom 560m LLM for natural language processing to generate responses. This model is not fine-tuned to PII and PHI, but demonstrates how an LLM can live inside of an enclave. The model also needs to be saved on the parent instance so that it can be copied into the enclave via the Dockerfile.

Navigate to the project:

cd /nitro_llm

Install the necessary requirements to save the model locally:

pip3 install -r requirements.txt

Run the save_model.py app to save the model within the /nitro_llm/enclave/bloom directory:

python3 save_model.py

Build and run the Nitro Enclaves image
To run Nitro Enclaves, you need to create an enclave image file (EIF) from a Docker image of your application. The Dockerfile located in the enclave directory contains the files, code, and LLM that will run inside of the enclave.
Building and running the enclave will take multiple minutes to complete.

Navigate to the root of the project:

cd /nitro_llm

Build the enclave image file as enclave.eif:

nitro-cli build-enclave --docker-uri enclave:latest --output-file enclave.eif

Figure 4 – AWS Nitro Enclaves Build Result

When the enclave is built, a series of unique hashes and platform configuration registers (PCRs) will be created. The PCRs are cryptographic measurements used to prove the identity of the enclave image and its application. These PCRs are required for cryptographic attestation and are used during the KMS key policy update step.
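
If you redirect the JSON that nitro-cli build-enclave prints into a file (the file name below is ours, not part of the tooling), you can pull the PCR values out programmatically instead of copying them by hand:

import json

# enclave_measurements.json is assumed to hold the JSON output of
# `nitro-cli build-enclave` (for example, redirected with `> enclave_measurements.json`).
with open("enclave_measurements.json") as f:
    measurements = json.load(f)["Measurements"]

# Print the condition entries to paste into the KMS key policy.
for pcr in ("PCR0", "PCR1", "PCR2"):
    print(f'"kms:RecipientAttestation:{pcr}": "{measurements[pcr]}"')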

Run the enclave with the resources from the allocator.service (adding the --attach-console argument at the end will run the enclave in debug mode):

nitro-cli run-enclave --cpu-count 8 --memory 70000 --enclave-cid 16 --eif-path enclave.eif

You need to allocate enclave memory of at least four times the EIF file size. This can be modified in the allocator.service from the previous steps.

Verify the enclave is running with the following command:

nitro-cli describe-enclaves

Figure 5 – AWS Nitro Enclave Describe Command

Update the KMS key policy
Complete the following steps to update your KMS key policy:

On the AWS KMS console, choose Customer managed keys in the navigation pane.
Search for the key that you generated as a prerequisite.
Choose Edit on the key policy.
Update the key policy with the following information:

Your account ID
Your IAM user name
The updated Cloud9 environment instance role
Actions kms:Encrypt and kms:Decrypt
Enclave PCRs (for example, PCR0, PCR1, PCR2) to your key policy with a condition statement

See the following key policy code:

{
  "Version": "2012-10-17",
  "Id": "key-default-1",
  "Statement": [
    {
      "Sid": "Enable User permissions",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam:::user/"
      },
      "Action": [
        "kms:CreateAlias",
        "kms:CreateKey",
        "kms:DeleteAlias",
        "kms:Describe*",
        "kms:GenerateRandom",
        "kms:Get*",
        "kms:List*",
        "kms:TagResource",
        "kms:UntagResource",
        "iam:ListGroups",
        "iam:ListRoles",
        "iam:ListUsers"
      ],
      "Resource": "*"
    },
    {
      "Sid": "Enable Enclave permissions",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam:::role/"
      },
      "Action": [
        "kms:Encrypt",
        "kms:Decrypt"
      ],
      "Resource": "*",
      "Condition": {
        "StringEqualsIgnoreCase": {
          "kms:RecipientAttestation:PCR0": "",
          "kms:RecipientAttestation:PCR1": "",
          "kms:RecipientAttestation:PCR2": ""
        }
      }
    }
  ]
}

Save the chatbot app
To mimic a sensitive query chatbot application that lives outside of the AWS account, you need to save the chatbot.py app and run it inside the Cloud9 environment. Your Cloud9 environment will use its instance role for temporary credentials to disassociate permissions from the EC2 instance running the enclave. Complete the following steps:

On the Cloud9 console, open the environment you created.
Copy the following code into a new file named chatbot.py in the main directory.
Install the required modules:

pip install boto3
pip install requests

On the Amazon EC2 console, note the IP associated with your Nitro Enclaves instance.
Update the url variable in the code to http://<ec2instanceIP>:5001.

"""
Modules for a basic chatbot-like application and AWS communications
"""
import base64
import requests
import boto3

def get_identity_document():
    """
    Get identity document for current EC2 host
    """
    identity_doc = requests.get(
        "http://169.254.169.254/latest/dynamic/instance-identity/document", timeout=30)
    return identity_doc

def get_region(identity):
    """
    Get region of current instance identity
    """
    region = identity.json()["region"]
    return region

def get_account(identity):
    """
    Get account of current instance identity
    """
    account = identity.json()["accountId"]
    return account

def set_identity():
    """
    Set region and account for KMS
    """
    identity = get_identity_document()
    region = get_region(identity)
    account = get_account(identity)
    return region, account

def prepare_server_request(ciphertext):
    """
    Get the AWS credentials from EC2 instance metadata
    """
    instance_prof = requests.get(
        "http://169.254.169.254/latest/meta-data/iam/security-credentials/", timeout=30)
    instance_profile_name = instance_prof.text

    instance_prof_json = requests.get(
        f"http://169.254.169.254/latest/meta-data/iam/security-credentials/{instance_profile_name}",
        timeout=30)
    response = instance_prof_json.json()

    credential = {
        'access_key_id': response['AccessKeyId'],
        'secret_access_key': response['SecretAccessKey'],
        'token': response['Token'],
        'region': REGION,
        'ciphertext': ciphertext
    }
    return credential

def get_user_input():
    """
    Start chatbot to collect user input
    """
    print("Chatbot: Hello! How can I assist you?")
    user_input = input('Your Question: ')
    return user_input.lower()

def encrypt_string(user_input, alias, kms):
    """
    Encrypt user input using AWS KMS
    """
    file_contents = user_input
    encrypted_file = kms.encrypt(KeyId=f'alias/{alias}', Plaintext=file_contents)
    encrypted_file_contents = encrypted_file['CiphertextBlob']
    encrypted_file_contents_base64 = base64.b64encode(encrypted_file_contents)
    return encrypted_file_contents_base64.decode()

def decrypt_data(encrypted_data, kms):
    """
    Decrypt the LLM response using AWS KMS
    """
    try:
        ciphertext_blob = base64.b64decode(encrypted_data)
        response = kms.decrypt(CiphertextBlob=ciphertext_blob)
        decrypted_data = response['Plaintext'].decode()
        return decrypted_data
    except Exception as e_decrypt:  # broad catch so a failed decrypt is reported, not raised
        print("Decryption failed:", e_decrypt)
        return None

REGION, ACCOUNT = set_identity()

def main():
    """
    Main function to encrypt/decrypt data and send/receive with parent instance
    """
    kms = boto3.client('kms', region_name=REGION)
    alias = "ncsnitro"
    user_input = get_user_input()
    encrypted_input = encrypt_string(user_input, alias, kms)
    server_request = prepare_server_request(encrypted_input)
    url = 'http://<EC2 Instance Private IP>:5001'
    x = requests.post(url, json=server_request)
    response_body = x.json()
    llm_response = decrypt_data(response_body["EncryptedData"], kms)
    print(llm_response)

if __name__ == '__main__':
    main()

Run the chatbot application:

python3 chatbot.py

When it’s running, the terminal will ask for the user input and follow the architectural diagram from earlier to generate a secure response.
Run the private question and answer chatbot
Now that Nitro Enclaves is up and running on the EC2 instance, you can more securely ask your chatbot PHI and PII questions. Let’s look at an example.
Within the Cloud9 environment, we ask our chatbot a question and provide our user name.

Figure 6 – Asking the Chat Bot a Question

AWS KMS encrypts the question, which looks like the following screenshot.

Figure 7 – Encrypted Question

It is then sent to the enclave and asked of the secured LLM. The question and response of the LLM will look like the following screenshot (the result and encrypted response are visible inside the enclave only in debug mode).

Figure 8 – Response from LLM

The result is then encrypted using AWS KMS and returned to the Cloud9 environment to be decrypted.

Figure 9 – Final Decrypted Response

Clean up
Complete the following steps to clean up your resources:

Stop the EC2 instance created to house your enclave.
Delete the Cloud9 environment.
Delete the KMS key.
Remove the EC2 instance role and IAM user permissions.

Conclusion
In this post, we showcased how to use Nitro Enclaves to deploy an LLM question and answering service that more securely sends and receives PII and PHI information. This was deployed on Amazon EC2, and the enclaves are integrated with AWS KMS restricting access to a KMS key, so only Nitro Enclaves and the end-user are allowed to use the key and decrypt the question.
If you’re planning to scale this architecture to support larger workloads, make sure the model selection process matches your model requirements with EC2 resources. Additionally, you must consider the maximum request size and what impact that will have on the HTTP server and inference time against the model. Many of these parameters are customizable through the model and HTTP server settings.
The best way to determine the specific settings and requirements for your workload is through testing with a fine-tuned LLM. Although this post only included natural language processing of sensitive data, you can modify this architecture to support alternate LLMs supporting audio, computer vision, or multi-modalities. The same security principles highlighted here can be applied to data in any format. The resources used to build this post are available on the GitHub repo.
Share how you are going to adapt this solution for your environment in the comments section.

About the Authors
Justin Miles is a cloud engineer within the Leidos Digital Modernization Sector under the Office of Technology. In his spare time, he enjoys golfing and traveling.
Liv d’Aliberti is a researcher within the Leidos AI/ML Accelerator under the Office of Technology. Their research focuses on privacy-preserving machine learning.
Chris Renzo is a Sr. Solution Architect within the AWS Defense and Aerospace organization. Outside of work, he enjoys a balance of warm weather and traveling.
Joe Kovba is a Vice President within the Leidos Digital Modernization Sector. In his free time, he enjoys refereeing football games and playing softball.

Can AI Help Overcome the Loss of Third-Party Cookies? Google Thinks So

In June of 2021, Google announced they would sunset cookies come 2023. Sure, they delayed it a few times, but let’s be honest: as marketers, we’ve had plenty of time to adapt and adjust to a web without cookies.

Yet, here we are in 2024 and people still aren’t ready. 

In fact, according to Dan Taylor, Google’s VP of Global Advertising Strategies, only 30% of marketers feel prepared for third-party cookie deprecation and both advertisers and publishers have been reluctant to invest in cookie alternatives.

Well, guess what? There is no more time and Google’s latest announcement is a wake-up call for marketers. 

There will be no more delays, third-party cookies are on their way out, and marketers had better adapt. 

But how? 

While we don’t often find ourselves nodding our heads so enthusiastically at statements from Google, we can’t help but smile at this one. 

Per Taylor, advertisers can overcome signal loss by prioritizing the following:

AI investment

Refined targeting

Measurement and ad delivery strategies

We feel the same way, Google. These are the exact things Customers.ai is solving for!

So let’s jump in and look at how marketers can start moving toward a cookieless future. 

1. Hop on the AI Train

I think at this point we can all agree AI isn’t just another trend. It’s big and it’s here to stay.

For marketers, AI opens the doors to not only a whole new level of scalability but also a new level of audience engagement. Imagine being able to predict what your customers want before they even know it themselves. AI can help do that!

But that’s just one power. 

Here at Customers.ai, we are using AI to capture first-party data, create personalized messaging at scale, build and refine ad audiences, and so much more. 

If you are just starting to experiment with AI technology or have started testing out new tools, here are a few quick tips to consider: 

Lean into AI tools that can sift through data and uncover insights faster than any human could.

Personalize your approach. Use AI to tailor your marketing messages so that they resonate deeply with each individual in your audience.

Experiment and Learn. Start with a mindset geared towards experimentation. Start small with AI-powered tools for insights or automation, and learn as you go. It’s about trial, error, and ultimately, discovery.

2. Collect First-Party Data Like It’s Gold (Because It Is)

For years we’ve relied on third-party data to help us make decisions, validate theories, distribute ad budgets, and so on and so forth. 

That was all well and good when there were loads of data coming in, but now, with less and less third-party data available, we can’t rely on it to make important decisions.

First-party data on the other hand is your ace in the hole. 

Names, emails, preferences, demographic data, psychographic data, website activity: this info comes straight from your audience, and it’s as valuable as it gets.

That’s what we are doing here. With the Customers.ai Website Visitor ID X-Ray pixel, you can capture exactly what we just mentioned without the need for pop-ups or widgets.

First-party data is also essential for creating custom audiences that can be used across your marketing channels.

After all, a person who abandoned their cart should not be treated the same as someone who merely read a blog. 

As Dan Taylor mentioned, refined targeting is a must, and to do it effectively, you need first-party data. 

Want to start capturing website visitors for free? 

To install the Website Visitor ID X-Ray Pixel, sign up, go to your dashboard, and navigate to My Automations. 

Select + New Automation and get your pixel. We have easy install options for Google Tag Manager, WordPress, and Shopify, or you can install the pixel manually.

3. Turning Insights Into Action

The loss of data really means a loss of insights and a loss of accountability when it comes to marketing performance. 

But with AI and first-party data, there is so much we can do! 

It’s important to remember that AI is your ally. It’s not about replacing human insight but enhancing it.

Take the insights AI and first-party data provide and turn them into actionable tactics and strategies. Here are a few examples:

Get Personal. One of the best things about AI and first-party data is the ability to get granular and create specific audience segments. For example, Customers.ai not only tells you who is on your site but also enriches the data, giving you more insights into each individual. These custom audiences allow you to craft messages that don’t just sell but connect on a human level.

Make Data-Driven Decisions. Use first-party data to make informed decisions about your marketing strategies. This data tells you what your audience loves (and doesn’t), helping you to refine your approach continuously. Plus, with customer journey tracking, you can actually see which channels are working and what is driving the path to sale.

Test. Test. Test. Combine AI with your data to supercharge your testing. AI can help identify trends and patterns you might miss, enabling you to optimize campaigns in real-time.

Let’s Redefine Marketing with AI & First-Party Data

When it comes to dealing with the loss of third-party cookies, marketers need to adapt now. There is no more time. 

By embracing AI and valuing first-party data, you can not only adapt to change but you can also redefine your marketing strategy as a whole. 

At the end of the day, the departure from third-party cookies may feel like a huge loss but we need to look at it as an opportunity. An opportunity to redefine digital marketing, to build deeper connections with our prospects and customers, to create more personalized experiences, and to focus on the data we have – the data that we own. 

So why don’t we simply embrace this new era with open arms and innovative minds? The future of marketing is exciting and AI is leading the way!


The post Can AI Help Overcome the Loss of Third-Party Cookies? Google Thinks So appeared first on Customers.ai.

Enhancing Tool Usage in Large Language Models: The Path to Precision w …

The development of large language models (LLMs) such as OpenAI’s GPT series marks a transformative era in artificial intelligence, bringing profound impacts across various sectors. These sophisticated models have become cornerstones for generating contextually rich and coherent text outputs, facilitating applications from automated content creation to nuanced customer service interactions. However, when integrated with external tools, their capabilities extend beyond text generation.

Despite the exciting prospects, integrating LLMs with external tools reveals a pivotal challenge: the precision with which these models utilize tools still needs to be improved. This gap is significant; for LLMs to truly extend their utility and application, they must access various tools and employ them with high accuracy. Current statistics, including those from groundbreaking models like GPT-4, show a tool usage correctness rate that falls short of the mark, emphasizing the necessity for enhanced methodologies in tool-augmented LLM applications. 

Studies have previously concentrated on expanding the toolset available to LLMs and simplifying the integration of new tools. But they scarcely scratch the surface of the underlying issue: the accuracy of tool utilization. This aspect is crucial; as LLMs venture into executing tasks with tangible impacts, the stakes of accurate tool usage escalate, especially in scenarios where incorrect actions could lead to adverse outcomes. The quest for a solution brings us to an innovative approach inspired by nature’s learning mechanisms.

Researchers from Ohio State University and Microsoft Semantic Machines have introduced Simulated Trial and Error (STE), a method inspired by the cognitive learning processes observed in humans and other intelligent organisms. This pioneering approach seeks to refine LLMs’ mastery over tools through a process reminiscent of human learning, combining the elements of imagination, trial and error, and memory. LLMs can use tools iteratively, learning from each interaction’s feedback to hone their approach and significantly improve accuracy. This method embodies a shift from a static understanding of tool operation towards a dynamic, interactive learning model that mirrors biological processes.

At the center of STE lies a dual-memory system consisting of short-term and long-term components designed to enhance the exploration capabilities of LLMs. The short-term memory focuses on the immediate past, allowing LLMs to learn from recent trials and refine their tool usage strategies accordingly. In contrast, the long-term memory component builds a reservoir of past experiences, guiding LLMs in their long-term learning trajectory and enabling them to draw upon knowledge for future interactions. This sophisticated memory framework underpins the STE method, fostering LLMs’ more nuanced and effective tool usage.
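
A highly simplified view of that trial-and-error loop is sketched below; llm and tool are hypothetical callables, and the paper’s actual exploration, memory distillation, and fine-tuning procedure is more involved.

def simulated_trial_and_error(api_name, llm, tool, num_episodes=10, trials_per_episode=3):
    """Sketch of Simulated Trial and Error (STE) for learning tool use."""
    long_term_memory = []  # distilled lessons carried across episodes

    for _ in range(num_episodes):
        # Imagination: invent a plausible user query that would need this tool.
        query = llm(f"Imagine a realistic user request that requires the {api_name} API. "
                    f"Avoid repeating: {long_term_memory[-5:]}")

        short_term_memory = []  # raw trials within the current episode
        for _ in range(trials_per_episode):
            # Trial: propose an API call given recent attempts and past lessons.
            call = llm(f"Query: {query}\nPrevious attempts: {short_term_memory}\n"
                       f"Known pitfalls: {long_term_memory[-5:]}\n"
                       f"Propose a {api_name} API call:")
            feedback = tool(call)  # execute the call and observe the result or error
            short_term_memory.append((call, feedback))

        # Memory: distill the episode into a reusable lesson for future episodes.
        long_term_memory.append(
            llm(f"Summarize what worked and what failed: {short_term_memory}"))

    return long_term_memory  # later used for in-context examples or fine-tuning data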

The efficacy of STE has been rigorously tested on the ToolBench platform, revealing remarkable improvements in tool usage accuracy among LLMs. Models augmented with STE not only surpassed traditional benchmarks, including GPT-4, but also demonstrated superior performance across both in-context learning and fine-tuning scenarios. These findings underscore STE’s transformative potential in enhancing tool-augmented LLMs’ operational efficiency, propelling them towards more reliable and effective tool usage in practical applications.

In conclusion, integrating LLMs with external tools, powered by the innovative STE method, heralds a new chapter in artificial intelligence. This approach not only rectifies the pressing issue of tool usage accuracy but also paves the way for broader and more impactful applications of LLMs across diverse sectors. With its biologically inspired learning mechanisms, the STE method assists in the evolution of LLMs.

Check out the Paper. All credit for this research goes to the researchers of this project.
The post Enhancing Tool Usage in Large Language Models: The Path to Precision with Simulated Trial and Error appeared first on MarkTechPost.