How to Overcome Ecommerce Challenges with Customers.ai and Sendlane

In the ever-evolving world of e-commerce, 2024 has emerged as a year of unparalleled challenges and opportunities. As the digital landscape continues to shift beneath our feet, businesses are grappling with how to stay ahead in a market that demands innovation, personalization, and a deep respect for consumer privacy. From the looming shadow of third-party cookie deprecation to the rising expectations of tech-savvy consumers for seamless online experiences, e-commerce companies find themselves at a critical juncture. In this dynamic environment, the integration between Customers.ai and Sendlane presents a beacon of hope.

Customers.ai, with its pioneering Website Visitor Identification X-Ray pixel, has revolutionized the way businesses identify and understand their website visitors. By unlocking the ability to recognize up to 30% of anonymous traffic, Customers.ai provides invaluable insights into the name, email, and behavioral intentions of potential customers. On the other side, Sendlane has been at the forefront of e-commerce marketing automation, empowering brands to send highly personalized, targeted emails that resonate with their audience’s unique needs and preferences.

Together, these two platforms offer a synergistic solution that not only addresses the pressing challenges of 2024’s e-commerce landscape but also paves the way for businesses to forge deeper connections with their customers. In this blog post, we’ll delve into the complexities facing today’s e-commerce businesses and explore how leveraging the combined strengths of Customers.ai and Sendlane can help overcome these hurdles, transforming challenges into stepping stones for success.

The Ecommerce Landscape in 2024

Introduction to Customers.ai and Sendlane

Overcoming Challenges with Customers.ai and Sendlane

Setting Up your Customers.ai and Sendlane Integration

The Ecommerce Landscape in 2024

The ecommerce sector is undergoing rapid transformation, with 2024 presenting a unique set of challenges that are reshaping the way businesses operate online. These challenges not only demand immediate attention but also offer a glimpse into the future of retail in the digital age.

Higher Consumer Expectations

The e-commerce experience is no longer just about the convenience of shopping from home. Today’s consumers expect a seamless integration of online and in-store experiences. They desire personalized interactions that make shopping more engaging and tailored to their individual preferences. This shift is largely driven by the success of giants like Amazon, which have set a new standard for what customers consider a satisfactory shopping experience. The challenge for e-commerce businesses is to match these expectations by leveraging technology to create a cohesive, personalized shopping journey.

Privacy and Data Concerns

As digital privacy becomes a hot-button issue globally, e-commerce businesses are navigating a privacy-first world where consumers are increasingly wary of how their data is collected and used. Regulatory frameworks such as the GDPR in Europe and various state-level laws in the U.S. reflect a growing demand for greater control over personal information. This changing landscape requires e-commerce companies to be more transparent and conscientious in their data practices, ensuring customer trust is not only maintained but strengthened.

The End of Third-party Cookies

The anticipated demise of third-party cookies on major web browsers poses a significant challenge for e-commerce marketers accustomed to relying on these cookies for targeted advertising and understanding customer behavior. With Google Chrome set to phase out third-party cookies by 2024, businesses must explore new strategies for collecting and leveraging consumer data without infringing on privacy. This shift underscores the need for innovative solutions in data collection and analysis, with a focus on first-party data and privacy-compliant methods.

The Rise of AI and Automation

Artificial Intelligence (AI) and automation are at the forefront of transforming the e-commerce industry. From personalized product recommendations to automated customer service chatbots, these technologies are making it possible for businesses to offer unique and efficient shopping experiences. However, integrating AI and automation into e-commerce operations is not without its challenges. Businesses must ensure these technologies enhance the customer experience without compromising personal touch or leading to privacy concerns.

In the face of these challenges, the integration between Customers.ai and Sendlane emerges as a potent solution. By combining the strengths of advanced visitor identification with sophisticated email marketing automation, these platforms are uniquely positioned to help e-commerce businesses navigate the complexities of the 2024 landscape. In the next section, we will delve into how this integration offers a path forward for e-commerce companies seeking to overcome these hurdles and achieve success in a rapidly changing market.

Introduction to Customers.ai and Sendlane

In an e-commerce ecosystem that’s increasingly complex and competitive, the integration between Customers.ai and Sendlane stands out as a beacon for businesses striving to navigate the turbulent waters of 2024. This section introduces these two groundbreaking platforms and their core capabilities, setting the stage for understanding how their combined forces can address the challenges discussed earlier.

Customers.ai: Revolutionizing Visitor Identification

At the heart of Customers.ai lies the innovative Website Visitor Identification X-Ray pixel, a tool that fundamentally changes the game for e-commerce businesses. This advanced technology has the capability to identify up to 30% of anonymous website visitors, providing not just names and email addresses but also critical behavioral intent information.

In a world where understanding and engaging with your audience is paramount, Customers.ai offers a way to bridge the gap between anonymous browsing and meaningful engagement. By shedding light on who is visiting your site and what they are looking for, Customers.ai enables businesses to tailor their marketing strategies with unprecedented precision.

Sendlane: Personalizing the E-commerce Experience

Sendlane elevates the e-commerce marketing game through its sophisticated automation platform, designed to deliver personalized, targeted emails to potential clients. This platform shines in its ability to craft highly customized communication paths for various customer actions and stages in the buyer’s journey.

From abandoned cart emails that nudge shoppers to complete their purchases to restock notifications that bring customers back to the site, Sendlane ensures that every message is relevant, timely, and likely to convert. Sendlane doesn’t just help businesses increase their conversion rates; it helps them build lasting relationships with customers by showing them that their needs and preferences are understood and valued.

A Synergistic Integration

The integration of Customers.ai and Sendlane creates a powerhouse for e-commerce businesses. By combining deep insights into website visitor behavior with sophisticated email marketing automation, this partnership offers a comprehensive solution for personalized customer engagement. In the following section, we will explore how this integration specifically addresses the challenges of the 2024 e-commerce landscape, demonstrating its potential to not only meet but exceed consumer expectations in an increasingly digital world.

Overcoming Challenges with Customers.ai and Sendlane Integration

The integration of Customers.ai and Sendlane offers a compelling solution to the challenges faced by e-commerce businesses in 2024. By leveraging the unique strengths of both platforms, companies can enhance their marketing strategies, foster deeper customer relationships, and navigate the evolving e-commerce landscape with confidence.

Seamless Customer Experiences

This integration meets the expectation of blended in-store and online shopping experiences head-on. Customers.ai’s ability to identify and understand the behavior of website visitors enables businesses to gather rich, actionable data, which Sendlane’s targeted email automation then uses to create a seamless customer journey from the first website visit through to post-purchase follow-up. This personalized approach ensures customers feel valued and understood, significantly enhancing their shopping experience.

Navigating Privacy and Data Changes

In a privacy-conscious world, the ability to collect and use data responsibly is paramount. Customers.ai collects first-party data directly from website interactions, ensuring businesses have access to customer information that is both valuable and compliant with privacy regulations. Sendlane’s use of this data for personalized email marketing further emphasizes a commitment to respecting user privacy, by providing customers with relevant and desired content without overstepping boundaries.

Adapting to the Cookieless Future

With third-party cookies becoming obsolete, the integration provides an alternative path for understanding customer behavior and preferences. Customers.ai’s technology does not rely on third-party cookies, instead offering a solution that respects user privacy while still delivering valuable insights. These insights fuel Sendlane’s automated emails, ensuring that marketing efforts remain effective and targeted, even in a post-cookie landscape.

Leveraging AI for Personalization

AI and automation are key to delivering the personalized experiences customers expect. Sendlane leverages AI to automate and personalize email campaigns, using data from Customers.ai to make these communications as relevant as possible. Whether it’s recommending products based on previous browsing behavior or reminding customers of items they viewed but didn’t purchase, the integration ensures that every interaction adds value to the customer’s experience.

Real-World Benefits

Increased Conversion Rates: By identifying anonymous visitors and engaging them with personalized emails, businesses can significantly increase their conversion rates.

Enhanced Customer Loyalty: Personalized experiences foster a sense of understanding and care, which in turn builds customer loyalty and trust.

Efficient Use of Marketing Resources: Targeted emails ensure that marketing resources are used efficiently, reaching out to individuals with high engagement potential.

Compliance with Privacy Regulations: Using first-party data and respecting user consent ensures that businesses remain compliant with evolving privacy laws.

The integration between Customers.ai and Sendlane not only addresses the immediate challenges of 2024 but also sets the stage for sustainable growth and success in the future of e-commerce. By focusing on personalized engagement and privacy-compliant practices, businesses can create meaningful connections with their customers, driving both satisfaction and sales.

Setting Up your Customers.ai and Sendlane Integration

Are you itching to take advantage of all these great new avenues for growth?

To get it running, all you’ve got to do is:

Log in to your Customers.ai account

Navigate to your “Integrations” tab

Select “Sendlane”

Drop in your API Key

Pick which audience you’d like to sync

And voila!

Check out how many leads Customers.ai could generate from your storefront and hit the ground running!


Important Next Steps

See what targeted outbound marketing is all about. Capture and engage your first 500 website visitor leads with Customers.ai X-Ray website visitor identification for free.

Talk and learn about sales outreach automation with other growth enthusiasts. Join Customers.ai Island, our Facebook group of 40K marketers and entrepreneurs who are ready to support you.

Advance your marketing performance with Sales Outreach School, a free tutorial and training area for sales pros and marketers.


Microsoft Introduces AutoDev: A Fully Automated Artificial Intelligence-Driven Software Development Framework

The software development sector stands at the dawn of a transformation powered by artificial intelligence (AI), where AI agents perform development tasks. This transformation is not just about incremental enhancements but a radical reimagining of how software engineering tasks are approached, executed, and delivered. Central to this shift is introducing AI-driven frameworks that transcend traditional code assistance tools, marking a leap toward more autonomous, efficient, and secure software development methodologies.

The integration of AI in software development has largely been confined to providing code suggestions and aiding in file manipulation. This approach, while beneficial, barely scratches the surface of what is technologically feasible. AI-powered tools operate within a constrained scope, missing out on the vast capabilities of Integrated Development Environments (IDEs), such as comprehensive code building, testing, and version control operations. This limitation underscores a critical gap in the software development toolkit, where the potential for AI to contribute more profoundly to the development lifecycle remains largely untapped.

Microsoft researchers present AutoDev, which empowers AI agents to tackle a broad spectrum of software engineering tasks autonomously, from intricate code editing and comprehensive testing to advanced git operations. This framework is designed to focus on autonomy, efficiency, and security. By housing operations within Docker containers, AutoDev ensures that development processes are streamlined and secure, safeguarding user privacy and project integrity through meticulously designed guardrails.

AutoDev’s approach is underpinned by its capacity to delegate complex software engineering objectives to AI agents. These agents, equipped with diverse tools and operations, navigate through tasks with remarkable autonomy. Whether it involves editing files, compiling code, or executing tests, AutoDev’s AI agents manage these operations seamlessly, providing a comprehensive solution that addresses the multifaceted needs of modern software development. This level of automation introduces a new paradigm in software engineering, where AI takes on a more central role, enabling developers to concentrate on higher-level strategic tasks.

A rigorous evaluation of AutoDev using the HumanEval dataset showcases its impressive capabilities. The framework demonstrated exceptional performance in automating software engineering tasks, achieving a Pass@1 success rate of 91.5% for code generation and 87.8% for test generation. These results affirm AutoDev’s effectiveness in enhancing the development process and highlight its potential to redefine the standards of AI-driven software engineering.

In conclusion, AutoDev embodies a significant advancement in software development, shifting towards a more intelligent, efficient, and secure approach to software engineering by extending the capabilities of AI beyond simple code suggestions to encompass a wide array of autonomous software engineering tasks. Some key takeaways include:

Traditional Limitations: Past AI integrations in development focused on basic tasks, overlooking broader IDE capabilities.

AutoDev Introduction: Microsoft’s AutoDev expands AI’s role, autonomously handling complex engineering tasks.

Enhanced Autonomy and Security: AutoDev ensures secure and efficient task execution through Docker containers.

Impressive Capabilities: Evaluation via HumanEval demonstrated AutoDev’s high success in code and test generation tasks.

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.


Anthropic and Google Cloud Partner to Bring Advanced Claude 3 AI Models to Vertex AI

Anthropic has reached a significant milestone in artificial intelligence by announcing the general availability of Claude 3 Haiku and Claude 3 Sonnet on Google Cloud’s Vertex AI platform. This development marks a step forward in making advanced AI technologies more accessible to enterprises, leveraging Google Cloud’s infrastructure to offer intelligent, efficient, and cost-effective solutions. The partnership underscores a shared commitment to data privacy and security, aiming to empower organizations to easily deploy AI solutions while managing their data within Google Cloud’s environment.

The integration of Claude into Quora’s Poe app is a testament to the transformative potential of Anthropic’s AI. By enabling users to engage in natural conversations with AI, Poe has revolutionized how people interact with artificial intelligence. This real-world application underscores the versatility and adaptability of Anthropic’s AI models, which provide detailed, comprehensible answers across various use cases.

Looking ahead, Anthropic plans to introduce Claude 3 Opus, described as its most capable and intelligent model yet, to Vertex AI in the near future. This forthcoming addition promises to further elevate the capabilities available to developers and enterprises, advancing innovation and creativity in AI applications.

As the partnership between Anthropic and Google Cloud flourishes, the general availability of Claude 3 models on Vertex AI stands as a testament to the evolving landscape of AI technologies. It signals a future where AI is not only more accessible but also more integrated into the fabric of digital solutions, driving advancements that are both groundbreaking and beneficial to society at large.

Key Takeaways:

Anthropic announces the general availability of Claude 3 Haiku and Claude 3 Sonnet on Google Cloud’s Vertex AI, with Claude 3 Opus set to join soon.

This collaboration enhances the accessibility of advanced AI technologies backed by Google Cloud’s infrastructure, emphasizing data privacy and security.

Quora’s successful integration of Claude into its Poe app showcases the practical applications and benefits of Anthropic’s AI models in enhancing user experiences.

The upcoming release of Claude 3 Opus is anticipated to further expand the potential for innovative and intelligent AI solutions on the cloud platform.


From Science Fiction to Reality: NVIDIA’s Project GR00T Redefines Human-Robot Interaction

NVIDIA’s unveiling of Project GR00T, a unique foundation model for humanoid robots, together with its commitment to the Isaac Robotics Platform and the Robot Operating System (ROS), heralds a significant leap in the development and application of AI in robotics. This project promises to revolutionize how robots understand and interact with the world around them, equipping them with unprecedented levels of coordination, dexterity, and adaptability. It achieves this by focusing on natural language processing and emulating human movements. This advancement is underpinned by NVIDIA’s new computing platform, Jetson Thor, which leverages the Blackwell GPU architecture to provide the computational power necessary for these complex tasks.

NVIDIA’s announcement at the GTC developer conference also highlighted its role as a founding platinum member of the Open Source Robotics Alliance (OSRA). This initiative aims to support the ROS ecosystem, ensuring its long-term viability and fostering greater community involvement and governance. Such collaboration between major tech companies and the open-source community is indicative of a broader trend toward democratizing technology development and application.

Furthermore, the significance of NVIDIA’s partnerships with leading roboticists and companies like Agility Robotics, Boston Dynamics, and XPENG Robotics cannot be overstated. These collaborations underscore the industry-wide push towards creating more capable and versatile robotic systems. They play a crucial role in refining the GR00T model and ensuring its applicability across various domains, highlighting the collaborative nature of this groundbreaking project.

In conclusion, NVIDIA’s recent announcements represent a pivotal moment in the convergence of AI and robotics. With GR00T and its support for ROS, NVIDIA is not just advancing the state of robotics technology but also shaping the future of human-robot interaction. The implications of these developments are vast, promising to revolutionize industries ranging from manufacturing to healthcare and setting the stage for a future where robots are integral to everyday life, underlining the potential benefits for various sectors.

Key Takeaways:

Project GR00T introduces a foundation model for humanoid robots, aiming to enhance their understanding and interaction with the world.

NVIDIA’s Jetson Thor platform and Blackwell GPU are critical technological underpinnings for GR00T, providing the necessary computational power.

The company’s commitment to the Open Source Robotics Alliance (OSRA) and the Robot Operating System (ROS) underscores a broader trend toward open-source collaboration in technology development.

Collaborations with leading companies and roboticists will be essential for refining GR00T and ensuring its widespread applicability and success.


Unlock the potential of generative AI in industrial operations

In the evolving landscape of manufacturing, the transformative power of AI and machine learning (ML) is evident, driving a digital revolution that streamlines operations and boosts productivity. However, this progress introduces unique challenges for enterprises navigating data-driven solutions. Industrial facilities grapple with vast volumes of unstructured data, sourced from sensors, telemetry systems, and equipment dispersed across production lines. Real-time data is critical for applications like predictive maintenance and anomaly detection, yet developing custom ML models for each industrial use case with such time series data demands considerable time and resources from data scientists, hindering widespread adoption.
Generative AI using large pre-trained foundation models (FMs) such as Claude can rapidly generate a variety of content from conversational text to computer code based on simple text prompts, known as zero-shot prompting. This eliminates the need for data scientists to manually develop specific ML models for each use case, and therefore democratizes AI access, benefitting even small manufacturers. Workers gain productivity through AI-generated insights, engineers can proactively detect anomalies, supply chain managers optimize inventories, and plant leadership makes informed, data-driven decisions.
Nevertheless, standalone FMs face limitations in handling complex industrial data with context size constraints (typically less than 200,000 tokens), which poses challenges. To address this, you can use the FM’s ability to generate code in response to natural language queries (NLQs). Agents like PandasAI come into play, running this code on high-resolution time series data and handling errors using FMs. PandasAI is a Python library that adds generative AI capabilities to pandas, the popular data analysis and manipulation tool.
However, complex NLQs, such as time series data processing, multi-level aggregation, and pivot or join table operations, may yield inconsistent Python script accuracy with a zero-shot prompt.
To enhance code generation accuracy, we propose dynamically constructing multi-shot prompts for NLQs. Multi-shot prompting provides additional context to the FM by showing it several examples of desired outputs for similar prompts, boosting accuracy and consistency. In this post, multi-shot prompts are retrieved from an embedding containing successful Python code run on a similar data type (for example, high-resolution time series data from Internet of Things devices). The dynamically constructed multi-shot prompt provides the most relevant context to the FM, and boosts the FM’s capability in advanced math calculation, time series data processing, and data acronym understanding. This improved response facilitates enterprise workers and operational teams in engaging with data, deriving insights without requiring extensive data science skills.
Beyond time series data analysis, FMs prove valuable in various industrial applications. Maintenance teams assess asset health, capture images for Amazon Rekognition-based functionality summaries, and perform anomaly root cause analysis using intelligent searches with Retrieval Augmented Generation (RAG). To simplify these workflows, AWS has introduced Amazon Bedrock, enabling you to build and scale generative AI applications with state-of-the-art pre-trained FMs like Claude v2. With Knowledge Bases for Amazon Bedrock, you can simplify the RAG development process to provide more accurate anomaly root cause analysis for plant workers. Our post showcases an intelligent assistant for industrial use cases powered by Amazon Bedrock, addressing NLQ challenges, generating part summaries from images, and enhancing FM responses for equipment diagnosis through the RAG approach.
Solution overview
The following diagram illustrates the solution architecture.

The workflow includes three distinct use cases:
Use case 1: NLQ with time series data
The workflow for NLQ with time series data consists of the following steps:

We use a condition monitoring system with ML capabilities for anomaly detection, such as Amazon Monitron, to monitor industrial equipment health. Amazon Monitron is able to detect potential equipment failures from the equipment’s vibration and temperature measurements.
We collect time series data by processing Amazon Monitron data through Amazon Kinesis Data Streams and Amazon Data Firehose, converting it into a tabular CSV format and saving it in an Amazon Simple Storage Service (Amazon S3) bucket.
The end-user can start chatting with their time series data in Amazon S3 by sending a natural language query to the Streamlit app.
The Streamlit app forwards user queries to the Amazon Bedrock Titan text embedding model to embed this query, and performs a similarity search within an Amazon OpenSearch Service index, which contains prior NLQs and example codes.
After the similarity search, the top similar examples, including NLQ questions, data schema, and Python codes, are inserted in a custom prompt.
PandasAI sends this custom prompt to the Amazon Bedrock Claude v2 model.
The app uses the PandasAI agent to interact with the Amazon Bedrock Claude v2 model, generating Python code for Amazon Monitron data analysis and NLQ responses.
After the Amazon Bedrock Claude v2 model returns the Python code, PandasAI runs the Python query on the Amazon Monitron data uploaded from the app, collecting code outputs and addressing any necessary retries for failed runs.
The Streamlit app collects the response via PandasAI, and provides the output to users. If the output is satisfactory, the user can mark it as helpful, saving the NLQ and Claude-generated Python code in OpenSearch Service.
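
To make the code-generation steps above more concrete, the following minimal sketch shows the general pattern of asking Claude v2 on Amazon Bedrock to write pandas code from a natural language question. It is a simplified stand-in for what the PandasAI agent does in the app; the file name, column handling, and example question are illustrative assumptions, not the application's actual code.

import json

import boto3
import pandas as pd

bedrock_runtime = boto3.client("bedrock-runtime")   # assumes AWS credentials and region are configured

df = pd.read_csv("monitron_data.csv")                # hypothetical CSV exported from Amazon Monitron
question = "How many unique sensors are currently in Alarm state?"   # example NLQ

prompt = (
    "\n\nHuman: You are a data analyst. A pandas DataFrame named `df` has these columns: "
    + ", ".join(df.columns)
    + f". Write Python code that answers the question: {question} Return only the code."
    + "\n\nAssistant:"
)

body = json.dumps({"prompt": prompt, "max_tokens_to_sample": 500, "temperature": 0})
response = bedrock_runtime.invoke_model(modelId="anthropic.claude-v2", body=body)
generated_code = json.loads(response["body"].read())["completion"]

# PandasAI would execute code like this against df, collect the output, and retry on errors
print(generated_code)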

Use case 2: Summary generation of malfunctioning parts
Our summary generation use case consists of the following steps:

After the user knows which industrial asset shows anomalous behavior, they can upload images of the malfunctioning part to identify if there is something physically wrong with this part according to its technical specification and operation condition.
The user can use the Amazon Rekognition DetectText API to extract text data from these images.
The extracted text data is included in the prompt for the Amazon Bedrock Claude v2 model, enabling the model to generate a 200-word summary of the malfunctioning part. The user can use this information to perform further inspection of the part.
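
As a rough sketch of the text-extraction and summarization steps above, the snippet below reads label text from a local image with the Rekognition DetectText API and asks Claude v2 on Amazon Bedrock for a 200-word summary. The file name and prompt wording are assumptions for illustration, not the application's exact implementation.

import json

import boto3

rekognition = boto3.client("rekognition")
bedrock_runtime = boto3.client("bedrock-runtime")

with open("malfunctioning_part.jpg", "rb") as f:     # hypothetical image of the part
    image_bytes = f.read()

# Pull the text printed on the part's label out of the image
detections = rekognition.detect_text(Image={"Bytes": image_bytes})
label_text = " ".join(
    d["DetectedText"] for d in detections["TextDetections"] if d["Type"] == "LINE"
)

prompt = (
    "\n\nHuman: Using this label text, write a roughly 200-word summary of the part's "
    f"technical specification and operating condition: {label_text}\n\nAssistant:"
)
body = json.dumps({"prompt": prompt, "max_tokens_to_sample": 400})
response = bedrock_runtime.invoke_model(modelId="anthropic.claude-v2", body=body)
print(json.loads(response["body"].read())["completion"])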

Use case 3: Root cause diagnosis
Our root cause diagnosis use case consists of the following steps:

The user obtains enterprise data in various document formats (PDF, TXT, and so on) related to the malfunctioning assets, and uploads them to an S3 bucket.
A knowledge base of these files is generated in Amazon Bedrock with a Titan text embeddings model and a default OpenSearch Service vector store.
The user poses questions related to the root cause diagnosis for malfunctioning equipment. Answers are generated through the Amazon Bedrock knowledge base with a RAG approach.
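
A minimal sketch of that question-answering step, assuming the knowledge base has already been created and synced, might use the RetrieveAndGenerate API as one straightforward way to query a Bedrock knowledge base with a RAG approach. The knowledge base ID and model ARN below are placeholders.

import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

response = agent_runtime.retrieve_and_generate(
    input={"text": "My actuator travels slow, what might be the issue?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "XXXXXXXXXX",   # placeholder: your knowledge base ID
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2",
        },
    },
)

print(response["output"]["text"])    # generated answer
print(response["citations"])         # source excerpts the answer was grounded on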

Prerequisites
To follow along with this post, you should meet the following prerequisites:

You need an AWS account with an AWS Identity and Access Management (IAM) role with admin permissions to manage resources created as part of the solution. For details, refer to Step 1: Create your AWS account.
For this tutorial, you need a bash terminal with Python 3.9 or higher installed on Linux, Mac, or Windows Subsystem for Linux, and an AWS account. We also recommend using an Amazon Elastic Compute Cloud (Amazon EC2) instance (Ubuntu Server 22.04 LTS).
Install or update the AWS Command Line Interface (AWS CLI) on either your PC or EC2 instance.
Request access to the Amazon Bedrock model.

Deploy the solution infrastructure
To set up your solution resources, complete the following steps:

Deploy the AWS CloudFormation template opensearchsagemaker.yml, which creates an OpenSearch Service collection and index, Amazon SageMaker notebook instance, and S3 bucket. You can name this AWS CloudFormation stack genai-sagemaker.
Open the SageMaker notebook instance in JupyterLab. You will find the following GitHub repo already downloaded on this instance: unlocking-the-potential-of-generative-ai-in-industrial-operations.
Run the notebook from the following directory in this repository: unlocking-the-potential-of-generative-ai-in-industrial-operations/SagemakerNotebook/nlq-vector-rag-embedding.ipynb. This notebook will load the OpenSearch Service index using the SageMaker notebook to store key-value pairs from the existing 23 NLQ examples.
Upload documents from the data folder assetpartdoc in the GitHub repository to the S3 bucket listed in the CloudFormation stack outputs.

Next, you create the knowledge base for the documents in Amazon S3.

On the Amazon Bedrock console, choose Knowledge base in the navigation pane.
Choose Create knowledge base.
For Knowledge base name, enter a name.
For Runtime role, select Create and use a new service role.
For Data source name, enter the name of your data source.
For S3 URI, enter the S3 path of the bucket where you uploaded the root cause documents.
Choose Next. The Titan embeddings model is automatically selected.
Select Quick create a new vector store.
Review your settings and create the knowledge base by choosing Create knowledge base.
After the knowledge base is successfully created, choose Sync to sync the S3 bucket with the knowledge base.
After you set up the knowledge base, you can test the RAG approach for root cause diagnosis by asking questions like “My actuator travels slow, what might be the issue?”

The next step is to deploy the app with the required library packages on either your PC or an EC2 instance (Ubuntu Server 22.04 LTS).

Set up your AWS credentials with the AWS CLI on your local PC. For simplicity, you can use the same admin role you used to deploy the CloudFormation stack. If you’re using Amazon EC2, attach a suitable IAM role to the instance.
Clone GitHub repo:

git clone https://github.com/aws-samples/unlocking-the-potential-of-generative-ai-in-industrial-operations

Change the directory to unlocking-the-potential-of-generative-ai-in-industrial-operations/src and run the setup.sh script in this folder to install the required packages, including LangChain and PandasAI:

cd unlocking-the-potential-of-generative-ai-in-industrial-operations/src
chmod +x ./setup.sh
./setup.sh

Run the Streamlit app with the following command:

source monitron-genai/bin/activate
python3 -m streamlit run app_bedrock.py <REPLACE WITH YOUR BEDROCK KNOWLEDGEBASE ARN>

Provide the Amazon Bedrock knowledge base ARN that you created in the previous step.
Chat with your asset health assistant
After you complete the end-to-end deployment, you can access the app via localhost on port 8501, which opens a browser window with the web interface. If you deployed the app on an EC2 instance, allow port 8501 access via the security group inbound rule. You can navigate to different tabs for various use cases.
Explore use case 1
To explore the first use case, choose Data Insight and Chart. Begin by uploading your time series data. If you don’t have an existing time series data file to use, you can upload the following sample CSV file with anonymous Amazon Monitron project data. If you already have an Amazon Monitron project, refer to Generate actionable insights for predictive maintenance management with Amazon Monitron and Amazon Kinesis to stream your Amazon Monitron data to Amazon S3 and use your data with this application.
When the upload is complete, enter a query to initiate a conversation with your data. The left sidebar offers a range of example questions for your convenience. The following screenshots illustrate the response and Python code generated by the FM when inputting a question such as “Tell me the unique number of sensors for each site shown as Warning or Alarm respectively?” (a hard-level question) or “For sensors shown temperature signal as NOT Healthy, can you calculate the time duration in days for each sensor shown abnormal vibration signal?” (a challenge-level question). The app will answer your question, and will also show the Python script of data analysis it performed to generate such results.

If you’re satisfied with the answer, you can mark it as Helpful, saving the NLQ and Claude-generated Python code to an OpenSearch Service index.

Explore use case 2
To explore the second use case, choose the Captured Image Summary tab in the Streamlit app. You can upload an image of your industrial asset, and the application will generate a 200-word summary of its technical specification and operation condition based on the image information. The following screenshot shows the summary generated from an image of a belt motor drive. To test this feature, if you lack a suitable image, you can use the following example image.

“Hydraulic elevator motor label” by Clarence Risher is licensed under CC BY-SA 2.0.

Explore use case 3
To explore the third use case, choose the Root cause diagnosis tab. Input a query related to your broken industrial asset, such as, “My actuator travels slow, what might be the issue?” As depicted in the following screenshot, the application delivers a response with the source document excerpt used to generate the answer.

Use case 1: Design details
In this section, we discuss the design details of the application workflow for the first use case.
Custom prompt building
The user’s natural language query comes with different difficulty levels: easy, hard, and challenge.
Straightforward questions may include the following requests:

Select unique values
Count total numbers
Sort values

For these questions, PandasAI can directly interact with the FM to generate Python scripts for processing.
Hard questions require basic aggregation operation or time series analysis, such as the following:

Select value first and group results hierarchically
Perform statistics after initial record selection
Timestamp count (for example, min and max)

For hard questions, a prompt template with detailed step-by-step instructions assists FMs in providing accurate responses.
Challenge-level questions need advanced math calculation and time series processing, such as the following:

Calculate anomaly duration for each sensor
Calculate the number of anomalous sensors for each site on a monthly basis
Compare sensor readings under normal operation and abnormal conditions

For these questions, you can use multi-shots in a custom prompt to enhance response accuracy. Such multi-shots show examples of advanced time series processing and math calculation, and will provide context for the FM to perform relevant inference on similar analysis. Dynamically inserting the most relevant examples from an NLQ question bank into the prompt can be a challenge. One solution is to construct embeddings from existing NLQ question samples and save these embeddings in a vector store like OpenSearch Service. When a question is sent to the Streamlit app, the question will be vectorized by BedrockEmbeddings. The top N most-relevant embeddings to that question are retrieved using opensearch_vector_search.similarity_search and inserted into the prompt template as a multi-shot prompt.
The following diagram illustrates this workflow.

The embedding layer is constructed using three key tools:

Embeddings model – We use Amazon Titan Embeddings available through Amazon Bedrock (amazon.titan-embed-text-v1) to generate numerical representations of textual documents.
Vector store – For our vector store, we use OpenSearch Service via the LangChain framework, streamlining the storage of embeddings generated from NLQ examples in this notebook.
Index – The OpenSearch Service index plays a pivotal role in comparing input embeddings to document embeddings and facilitating the retrieval of relevant documents. Because the Python example codes were saved as a JSON file, they were indexed in OpenSearch Service as vectors via an OpenSearchVectorSearch.from_texts API call.
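
To illustrate how these three pieces fit together at query time, here is a hedged sketch of retrieving the top-k similar NLQ examples with LangChain and folding them into a multi-shot prompt. The index name, endpoint, and prompt wording are assumptions, and the authentication and connection details will differ depending on whether you use an OpenSearch Serverless collection or a managed domain.

import boto3
from requests_aws4auth import AWS4Auth
from langchain.embeddings import BedrockEmbeddings
from langchain.vectorstores import OpenSearchVectorSearch

credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(
    credentials.access_key, credentials.secret_key, "us-east-1", "aoss",   # "aoss" for a serverless collection, "es" for a managed domain
    session_token=credentials.token,
)

embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1")
opensearch_vector_search = OpenSearchVectorSearch(
    opensearch_url="https://<collection-endpoint>",   # placeholder endpoint
    index_name="nlq-examples",                        # hypothetical index name
    embedding_function=embeddings,
    http_auth=awsauth,
)

question = "Calculate the anomaly duration in days for each sensor."
examples = opensearch_vector_search.similarity_search(question, k=3)   # top 3 most similar saved examples

multi_shot_context = "\n\n".join(doc.page_content for doc in examples)
custom_prompt = f"{multi_shot_context}\n\nNow write Python code for this new question:\n{question}"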

Continuous collection of human-audited examples via Streamlit
At the outset of app development, we began with only 23 saved examples in the OpenSearch Service index as embeddings. As the app goes live in the field, users start inputting their NLQs via the app. However, due to the limited examples available in the template, some NLQs may not find similar prompts. To continuously enrich these embeddings and offer more relevant user prompts, you can use the Streamlit app for gathering human-audited examples.
Within the app, the following function serves this purpose. When end-users find the output helpful and select Helpful, the application follows these steps:

Use the callback method from PandasAI to collect the Python script.
Reformat the Python script, input question, and CSV metadata into a string.
Check whether this NLQ example already exists in the current OpenSearch Service index using opensearch_vector_search.similarity_search_with_score.
If there’s no similar example, this NLQ is added to the OpenSearch Service index using opensearch_vector_search.add_texts.

In the event that a user selects Not Helpful, no action is taken. This iterative process makes sure that the system continually improves by incorporating user-contributed examples.
def addtext_opensearch(input_question, generated_chat_code, df_column_metadata, opensearch_vector_search, similarity_threshold, kexamples, indexname):
    # Build the input question and generated code in the same format as the existing OpenSearch index
    reconstructed_json = {}
    reconstructed_json["question"] = input_question
    reconstructed_json["python_code"] = str(generated_chat_code)
    reconstructed_json["column_info"] = df_column_metadata
    json_str = ""
    for key, value in reconstructed_json.items():
        json_str += key + ":" + value
    reconstructed_raw_text = []
    reconstructed_raw_text.append(json_str)

    results = opensearch_vector_search.similarity_search_with_score(str(reconstructed_raw_text[0]), k=kexamples)  # return the k most relevant docs
    if (dumpd(results[0][1]) < similarity_threshold):  # no similar embedding exists, so add this example to the index
        response = opensearch_vector_search.add_texts(texts=reconstructed_raw_text, engine="faiss", index_name=indexname)
    else:
        response = "A similar embedding already exists, no action."

    return response

By incorporating human auditing, the quantity of examples in OpenSearch Service available for prompt embedding grows as the app gains usage. This expanded embedding dataset results in enhanced search accuracy over time. Specifically, for challenging NLQs, the FM’s response accuracy reaches approximately 90% when dynamically inserting similar examples to construct custom prompts for each NLQ question. This represents a notable 28% increase compared to scenarios without multi-shot prompts.
Use case 2: Design details
On the Streamlit app’s Captured Image Summary tab, you can directly upload an image file. This initiates the Amazon Rekognition API (detect_text API), extracting text from the image label detailing machine specifications. Subsequently, the extracted text data is sent to the Amazon Bedrock Claude model as the context of a prompt, resulting in a 200-word summary.
From a user experience perspective, enabling streaming functionality for a text summarization task is paramount, allowing users to read the FM-generated summary in smaller chunks rather than waiting for the entire output. Amazon Bedrock facilitates streaming via its API (bedrock_runtime.invoke_model_with_response_stream).
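
A simplified sketch of streaming the summary back could look like the following, assuming a Claude v2 prompt body and that each streamed chunk carries a completion field; the label text is a placeholder.

import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

label_text = "placeholder text extracted by the Amazon Rekognition detect_text API"
body = json.dumps({
    "prompt": f"\n\nHuman: Summarize this machine label in about 200 words: {label_text}\n\nAssistant:",
    "max_tokens_to_sample": 400,
})

response = bedrock_runtime.invoke_model_with_response_stream(
    modelId="anthropic.claude-v2", body=body
)

# Render the summary as it is generated instead of waiting for the full output
for event in response["body"]:
    chunk = json.loads(event["chunk"]["bytes"])
    print(chunk.get("completion", ""), end="", flush=True)
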
Use case 3: Design details
In this scenario, we’ve developed a chatbot application focused on root cause analysis, employing the RAG approach. This chatbot draws from multiple documents related to bearing equipment to facilitate root cause analysis. This RAG-based root cause analysis chatbot uses knowledge bases for generating vector text representations, or embeddings. Knowledge Bases for Amazon Bedrock is a fully managed capability that helps you implement the entire RAG workflow, from ingestion to retrieval and prompt augmentation, without having to build custom integrations to data sources or manage data flows and RAG implementation details.
When you’re satisfied with the knowledge base response from Amazon Bedrock, you can integrate the root cause response from the knowledge base to the Streamlit app.
Clean up
To save costs, delete the resources you created in this post:

Delete the knowledge base from Amazon Bedrock.
Delete the OpenSearch Service index.
Delete the genai-sagemaker CloudFormation stack.
Stop the EC2 instance if you used an EC2 instance to run the Streamlit app.

Conclusion
Generative AI applications have already transformed various business processes, enhancing worker productivity and skill sets. However, the limitations of FMs in handling time series data analysis have hindered their full utilization by industrial clients. This constraint has impeded the application of generative AI to the predominant data type processed daily.
In this post, we introduced a generative AI application designed to alleviate this challenge for industrial users. This application uses an open source agent, PandasAI, to strengthen an FM’s time series analysis capability. Rather than sending time series data directly to FMs, the app employs PandasAI to generate Python code for the analysis of unstructured time series data. To enhance the accuracy of Python code generation, a custom prompt generation workflow with human auditing has been implemented.
Empowered with insights into their asset health, industrial workers can fully harness the potential of generative AI across various use cases, including root cause diagnosis and part replacement planning. With Knowledge Bases for Amazon Bedrock, the RAG solution is straightforward for developers to build and manage.
The trajectory of enterprise data management and operations is unmistakably moving towards deeper integration with generative AI for comprehensive insights into operational health. This shift, spearheaded by Amazon Bedrock, is significantly amplified by the growing robustness and potential of LLMs like Amazon Bedrock Claude 3 to further elevate solutions. To learn more, consult the Amazon Bedrock documentation and get hands-on with the Amazon Bedrock workshop.

About the authors
Julia Hu is a Sr. AI/ML Solutions Architect at Amazon Web Services. She is specialized in Generative AI, Applied Data Science and IoT architecture. Currently she is part of the Amazon Q team, and an active member/mentor in Machine Learning Technical Field Community. She works with customers, ranging from start-ups to enterprises, to develop AWSome generative AI solutions. She is particularly passionate about leveraging Large Language Models for advanced data analytics and exploring practical applications that address real-world challenges.
Sudeesh Sasidharan is a Senior Solutions Architect at AWS, within the Energy team. Sudeesh loves experimenting with new technologies and building innovative solutions that solve complex business challenges. When he is not designing solutions or tinkering with the latest technologies, you can find him on the tennis court working on his backhand.
Neil Desai is a technology executive with over 20 years of experience in artificial intelligence (AI), data science, software engineering, and enterprise architecture. At AWS, he leads a team of Worldwide AI services specialist solutions architects who help customers build innovative Generative AI-powered solutions, share best practices with customers, and drive product roadmap. In his previous roles at Vestas, Honeywell, and Quest Diagnostics, Neil has held leadership roles in developing and launching innovative products and services that have helped companies improve their operations, reduce costs, and increase revenue. He is passionate about using technology to solve real-world problems and is a strategic thinker with a proven track record of success.

Enhance performance of generative language models with self-consistency prompting on Amazon Bedrock

Generative language models have proven remarkably skillful at solving logical and analytical natural language processing (NLP) tasks. Furthermore, the use of prompt engineering can notably enhance their performance. For example, chain-of-thought (CoT) is known to improve a model’s capacity for complex multi-step problems. To additionally boost accuracy on tasks that involve reasoning, a self-consistency prompting approach has been suggested, which replaces greedy with stochastic decoding during language generation.
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models from leading AI companies and Amazon via a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. With the batch inference API, you can use Amazon Bedrock to run inference with foundation models in batches and get responses more efficiently. This post shows how to implement self-consistency prompting via batch inference on Amazon Bedrock to enhance model performance on arithmetic and multiple-choice reasoning tasks.
Overview of solution
Self-consistency prompting of language models relies on the generation of multiple responses that are aggregated into a final answer. In contrast to single-generation approaches like CoT, the self-consistency sample-and-marginalize procedure creates a range of model completions that lead to a more consistent solution. The generation of different responses for a given prompt is possible due to the use of a stochastic, rather than greedy, decoding strategy.
The following figure shows how self-consistency differs from greedy CoT in that it generates a diverse set of reasoning paths and aggregates them to produce the final answer.

Decoding strategies for text generation
Text generated by decoder-only language models unfolds word by word, with the subsequent token being predicted on the basis of the preceding context. For a given prompt, the model computes a probability distribution indicating the likelihood of each token to appear next in the sequence. Decoding involves translating these probability distributions into actual text. Text generation is mediated by a set of inference parameters that are often hyperparameters of the decoding method itself. One example is the temperature, which modulates the probability distribution of the next token and influences the randomness of the model’s output.
Greedy decoding is a deterministic decoding strategy that at each step selects the token with the highest probability. Although straightforward and efficient, the approach risks falling into repetitive patterns, because it disregards the broader probability space. Setting the temperature parameter to 0 at inference time essentially equates to implementing greedy decoding.
Sampling introduces stochasticity into the decoding process by randomly selecting each subsequent token based on the predicted probability distribution. This randomness results in greater output variability. Stochastic decoding proves more adept at capturing the diversity of potential outputs and often yields more imaginative responses. Higher temperature values introduce more fluctuations and increase the creativity of the model’s response.
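
As a toy illustration of the difference, the sketch below samples a next token from a made-up five-token distribution: with temperature 0 it reduces to greedy decoding, while higher temperatures flatten the distribution and produce varied choices. The logits are invented for demonstration only.

import numpy as np

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.5, 0.3, 0.1, -1.0])   # toy next-token scores over a 5-token vocabulary

def sample_next_token(logits, temperature):
    """Greedy when temperature is 0, stochastic sampling otherwise."""
    if temperature == 0:
        return int(np.argmax(logits))            # always pick the most likely token
    scaled = logits / temperature                # higher temperature flattens the distribution
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))

print([sample_next_token(logits, 0) for _ in range(5)])    # deterministic: [0, 0, 0, 0, 0]
print([sample_next_token(logits, 0.7) for _ in range(5)])  # varied tokens enable diverse outputs
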
Prompting techniques: CoT and self-consistency
The reasoning ability of language models can be augmented via prompt engineering. In particular, CoT has been shown to elicit reasoning in complex NLP tasks. One way to implement a zero-shot CoT is via prompt augmentation with the instruction to “think step by step.” Another is to expose the model to exemplars of intermediate reasoning steps in few-shot prompting fashion. Both scenarios typically use greedy decoding. CoT leads to significant performance gains compared to simple instruction prompting on arithmetic, commonsense, and symbolic reasoning tasks.
Self-consistency prompting is based on the assumption that introducing diversity in the reasoning process can be beneficial to help models converge on the correct answer. The technique uses stochastic decoding to achieve this goal in three steps:

Prompt the language model with CoT exemplars to elicit reasoning.
Replace greedy decoding with a sampling strategy to generate a diverse set of reasoning paths.
Aggregate the results to find the most consistent answer in the response set.
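
The final aggregation step amounts to a simple majority vote over the parsed answers. The snippet below sketches this for a handful of hypothetical completions, assumed to end with a sentence like “The answer is 72.” (the pattern seen in the example output later in this post); the completions themselves are made up for illustration.

import re
from collections import Counter

# Hypothetical final sentences from five sampled reasoning paths for one question
completions = [
    "... The answer is 72.",
    "... The answer is 72.",
    "... The answer is 68.",
    "... The answer is 72.",
    "... The answer is 70.",
]

def extract_answer(text):
    match = re.search(r"The answer is\s+(-?\d+)", text)
    return match.group(1) if match else None

votes = Counter(a for a in map(extract_answer, completions) if a is not None)
final_answer = votes.most_common(1)[0][0]   # majority vote; ties can be broken at random
print(final_answer)                         # '72'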

Self-consistency is shown to outperform CoT prompting on popular arithmetic and commonsense reasoning benchmarks. A limitation of the approach is its larger computational cost.
This post shows how self-consistency prompting enhances performance of generative language models on two NLP reasoning tasks: arithmetic problem-solving and multiple-choice domain-specific question answering. We demonstrate the approach using batch inference on Amazon Bedrock:

We access the Amazon Bedrock Python SDK in JupyterLab on an Amazon SageMaker notebook instance.
For arithmetic reasoning, we prompt Cohere Command on the GSM8K dataset of grade school math problems.
For multiple-choice reasoning, we prompt AI21 Labs Jurassic-2 Mid on a small sample of questions from the AWS Certified Solutions Architect – Associate exam.

Prerequisites
This walkthrough assumes the following prerequisites:

An AWS account with an ml.t3.medium notebook instance hosted in SageMaker.
An AWS Identity and Access Management (IAM) SageMaker execution role with attached AmazonBedrockFullAccess and iam:PassRole policies to run Jupyter inside the SageMaker notebook instance.
An IAM BedrockBatchInferenceRole role for batch inference with Amazon Bedrock with Amazon Simple Storage Service (Amazon S3) access and sts:AssumeRole trust policies. For more information, refer to Set up permissions for batch inference.
Access to models hosted on Amazon Bedrock. Choose Manage model access on the Amazon Bedrock console and choose among the list of available options. We use Cohere Command and AI21 Labs Jurassic-2 Mid for this demo.

The estimated cost to run the code shown in this post is $100, assuming you run self-consistency prompting one time with 30 reasoning paths using one value for the temperature-based sampling.
Dataset to probe arithmetic reasoning capabilities
GSM8K is a dataset of human-assembled grade school math problems featuring a high linguistic diversity. Each problem takes 2–8 steps to solve and requires performing a sequence of elementary calculations with basic arithmetic operations. This data is commonly used to benchmark the multi-step arithmetic reasoning capabilities of generative language models. The GSM8K train set comprises 7,473 records. The following is an example:
{"question": "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?", "answer": "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\nNatalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n#### 72"}
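
For later accuracy computations, the ground-truth label can be read off the "#### " marker at the end of each record. A small parsing sketch using the record above might look like this.

import re

answer_field = (
    "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\n"
    "Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n"
    "#### 72"
)

# The numeric label follows the "#### " marker in GSM8K records
ground_truth = re.search(r"####\s*(-?[\d,]+)", answer_field).group(1).replace(",", "")
print(ground_truth)   # '72'
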
Set up to run batch inference with Amazon Bedrock
Batch inference allows you to run multiple inference calls to Amazon Bedrock asynchronously and improve the performance of model inference on large datasets. The service is in preview as of this writing and only available through the API. Refer to Run batch inference to access batch inference APIs via custom SDKs.
After you have downloaded and unzipped the Python SDK in a SageMaker notebook instance, you can install it by running the following code in a Jupyter notebook cell:

# Install preview SDK packages
!pip install -q $(ls ./bedrock-python-sdk-reinvent/botocore-*.whl | head -1)
!pip install -q $(ls ./bedrock-python-sdk-reinvent/boto3-*.whl | head -1)

Format and upload input data to Amazon S3
Input data for batch inference needs to be prepared in JSONL format with recordId and modelInput keys. The latter should match the body field of the model to be invoked on Amazon Bedrock. In particular, some supported inference parameters for Cohere Command are temperature for randomness, max_tokens for output length, and num_generations to generate multiple responses, all of which are passed together with the prompt as modelInput:

data = [
    {
        "recordId": "1",
        "modelInput": {
            "prompt": prompt,
            "temperature": temperature,
            "max_tokens": max_tokens,
            "num_generations": n,
        },
    },
    ...,
]

See Inference parameters for foundation models for more details, including other model providers.
Our experiments on arithmetic reasoning are performed in the few-shot setting without customizing or fine-tuning Cohere Command. We use the same set of eight few-shot exemplars from the chain-of-thought (Table 20) and self-consistency (Table 17) papers. Prompts are created by concatenating the exemplars with each question from the GSM8K train set.
We set max_tokens to 512 and num_generations to 5, the maximum allowed by Cohere Command. For greedy decoding, we set temperature to 0 and for self-consistency, we run three experiments at temperatures 0.5, 0.7, and 1. Each setting yields different input data according to the respective temperature values. Data is formatted as JSONL and stored in Amazon S3.
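
Putting the two previous paragraphs together, a sketch of building the per-temperature records could look like the following. The exemplar text and question are truncated placeholders, since the actual eight CoT exemplars come from the referenced papers, and the Q:/A: framing follows the style of those exemplars and is an assumption here.

import json

# Placeholders: the eight CoT exemplars and the GSM8K questions are loaded elsewhere
exemplars = "Q: ...\nA: ... The answer is 6.\n\nQ: ...\nA: ... The answer is 5."
questions = ["Natalia sold clips to 48 of her friends in April, ..."]

temperature = 0.7   # one batch job per setting: 0 for greedy, 0.5 / 0.7 / 1.0 for self-consistency
data = [
    {
        "recordId": str(i + 1),
        "modelInput": {
            "prompt": f"{exemplars}\n\nQ: {question}\nA:",
            "temperature": temperature,
            "max_tokens": 512,
            "num_generations": 5,
        },
    }
    for i, question in enumerate(questions)
]
print(json.dumps(data[0], indent=2))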

import json
import uuid

import boto3

# Set up S3 client
session = boto3.Session()
s3 = session.client("s3")

# Create S3 bucket with unique name to store input/output data
suffix = str(uuid.uuid4())[:8]
bucket = f"bedrock-self-consistency-{suffix}"
s3.create_bucket(
    Bucket=bucket, CreateBucketConfiguration={"LocationConstraint": session.region_name}
)

# Process data and output to new lines as JSONL
input_key = f"gsm8k/T{temperature}/input.jsonl"
s3_data = ""
for row in data:
    s3_data += json.dumps(row) + "\n"
s3.put_object(Body=s3_data, Bucket=bucket, Key=input_key)

Create and run batch inference jobs in Amazon Bedrock
Batch inference job creation requires an Amazon Bedrock client. We specify the S3 input and output paths and give each invocation job a unique name:

# Create Bedrock client
bedrock = boto3.client("bedrock")

# Input and output config
input_config = {"s3InputDataConfig": {"s3Uri": f"s3://{bucket}/{input_key}"}}
output_config = {"s3OutputDataConfig": {"s3Uri": f"s3://{bucket}/{output_key}"}}

# Create a unique job name
suffix = str(uuid.uuid4())[:8]
job_name = f"command-batch-T{temperature}-{suffix}"

Jobs are created by passing the IAM role, model ID, job name, and input/output configuration as parameters to the Amazon Bedrock API:

response = bedrock.create_model_invocation_job(
    roleArn=f"arn:aws:iam::{account_id}:role/BedrockBatchInferenceRole",
    modelId="cohere.command-text-v14",
    jobName=job_name,
    inputDataConfig=input_config,
    outputDataConfig=output_config,
)
job_arn = response["jobArn"]

Listing, monitoring, and stopping batch inference jobs are supported by their respective API calls. On creation, jobs appear first as Submitted, then as InProgress, and finally as Stopped, Failed, or Completed.

# Get job details
job_details = bedrock.get_model_invocation_job(jobIdentifier=job_arn)
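
For example, you can wait for completion by polling the job status until it reaches one of the terminal states mentioned above (a minimal sketch; the polling interval is an arbitrary choice):

import time

# Poll until the batch inference job reaches a terminal state
while True:
    status = bedrock.get_model_invocation_job(jobIdentifier=job_arn)["status"]
    if status in ("Completed", "Failed", "Stopped"):
        break
    time.sleep(60)  # check once per minute
print(f"Job finished with status: {status}")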

If a job completes successfully, the generated content can be retrieved from Amazon S3 at its unique output location.

# Get the output file key
s3_prefix = f"s3://{bucket}/"
output_path = job_details["outputDataConfig"]["s3OutputDataConfig"]["s3Uri"].replace(
    s3_prefix, ""
)
output_folder = job_details["jobArn"].split("/")[1]
output_file = (
    f'{job_details["inputDataConfig"]["s3InputDataConfig"]["s3Uri"].split("/")[-1]}.out'
)
result_key = f"{output_path}{output_folder}/{output_file}"

# Get output data
obj = s3.get_object(Bucket=bucket, Key=result_key)
content = obj["Body"].read().decode("utf-8").strip().split("\n")

# Show answer to the first question
print(json.loads(content[0])["modelOutput"]["generations"][0]["text"])

[Out]: 'Natalia sold 48 * 1/2 = 24 clips less in May. This means she sold 48 + 24 = 72 clips in April and May. The answer is 72.'
Self-consistency enhances model accuracy on arithmetic tasks
Self-consistency prompting of Cohere Command outperforms a greedy CoT baseline in terms of accuracy on the GSM8K dataset. For self-consistency, we sample 30 independent reasoning paths at three different temperatures, with topP and topK set to their default values. Final solutions are aggregated by choosing the most consistent occurrence via majority voting. In case of a tie, we randomly choose one of the majority responses. We compute accuracy and standard deviation values averaged over 100 runs.
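A minimal sketch of this aggregation step might look as follows (it assumes each completion ends with a numeric final answer and simply takes the last number in the text, which is an implementation choice for illustration rather than the exact parsing used in the experiments):

import random
import re
from collections import Counter

def majority_vote(completions):
    """Aggregate sampled reasoning paths by voting on their final numeric answers."""
    answers = []
    for text in completions:
        numbers = re.findall(r"-?\d+(?:\.\d+)?", text)
        if numbers:
            answers.append(numbers[-1])  # treat the last number as the final answer
    if not answers:
        return None
    counts = Counter(answers)
    top = max(counts.values())
    # Break ties by randomly choosing one of the most frequent answers
    winners = [answer for answer, count in counts.items() if count == top]
    return random.choice(winners)
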
The following figure shows the accuracy on the GSM8K dataset from Cohere Command prompted with greedy CoT (blue) and self-consistency at temperature values 0.5 (yellow), 0.7 (green), and 1.0 (orange) as a function of the number of sampled reasoning paths.

The preceding figure shows that self-consistency enhances arithmetic accuracy over greedy CoT with as few as three sampled paths. Performance increases consistently with additional reasoning paths, confirming the importance of introducing diversity into the thought generation. Cohere Command solves the GSM8K question set with 51.7% accuracy when prompted with CoT vs. 68% with 30 self-consistent reasoning paths at T=1.0. All three surveyed temperature values yield similar results, with lower temperatures being comparatively more performant at fewer sampled paths.
Practical considerations on efficiency and cost
Self-consistency is limited by the increased response time and cost incurred when generating multiple outputs per prompt. As a practical illustration, batch inference for greedy generation with Cohere Command on 7,473 GSM8K records finished in less than 20 minutes. The job took 5.5 million tokens as input and generated 630,000 output tokens. At current Amazon Bedrock inference prices, the total cost incurred was around $9.50.
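As a back-of-the-envelope check, the figure can be reproduced from the token counts (the per-1,000-token prices below are illustrative assumptions, not quoted Amazon Bedrock rates; refer to the pricing page for current values):

# Rough cost estimate for the greedy batch inference job (prices assumed for illustration)
input_tokens = 5_500_000
output_tokens = 630_000
price_per_1k_input = 0.0015   # assumed USD per 1,000 input tokens
price_per_1k_output = 0.0020  # assumed USD per 1,000 output tokens

cost = input_tokens / 1000 * price_per_1k_input + output_tokens / 1000 * price_per_1k_output
print(f"Estimated cost: ${cost:.2f}")  # ~$9.51, consistent with the figure above
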
For self-consistency with Cohere Command, we use the inference parameter num_generations to create multiple completions per prompt. As of this writing, Amazon Bedrock allows a maximum of five generations and three concurrent Submitted batch inference jobs. Jobs proceed to the InProgress status sequentially; sampling more than five paths therefore requires multiple invocations.
The following figure shows the runtimes for Cohere Command on the GSM8K dataset. Total runtime is shown on the x axis and runtime per sampled reasoning path on the y axis. Greedy generation runs in the shortest time but incurs a higher time cost per sampled path.

Greedy generation completes in less than 20 minutes for the full GSM8K set and samples a unique reasoning path. Self-consistency with five samples requires about 50% longer to complete and costs around $14.50, but produces five times as many paths in that time. Total runtime and cost increase step-wise with every extra five sampled paths. A cost-benefit analysis suggests that 1–2 batch inference jobs with 5–10 sampled paths are the recommended setting for a practical implementation of self-consistency, achieving enhanced model performance while keeping cost and latency at bay.
Self-consistency enhances model performance beyond arithmetic reasoning
A crucial question to prove the suitability of self-consistency prompting is whether the method succeeds across further NLP tasks and language models. As an extension to an Amazon-related use case, we perform a small-sized analysis on sample questions from the AWS Solutions Architect Associate Certification. This is a multiple-choice exam on AWS technology and services that requires domain knowledge and the ability to reason and decide among several options.
We prepare a dataset from SAA-C01 and SAA-C03 sample exam questions. From the 20 available questions, we use the first 4 as few-shot exemplars and prompt the model to answer the remaining 16. This time, we run inference with the AI21 Labs Jurassic-2 Mid model and generate a maximum of 10 reasoning paths at temperature 0.7. Results show that self-consistency enhances performance: although greedy CoT produces 11 correct answers, self-consistency succeeds on 2 more.
The following table shows the accuracy results for 5 and 10 sampled paths averaged over 100 runs.

                       Greedy decoding    Self-consistency (T = 0.7)
# sampled paths: 5     68.6               74.1 ± 0.7
# sampled paths: 10    68.6               78.9 ± 0.3

In the following table, we present two exam questions that are incorrectly answered by greedy CoT while self-consistency succeeds, highlighting in each case the correct (green) or incorrect (red) reasoning traces that led the model to produce correct or incorrect responses. Although not every sampled path generated by self-consistency is correct, the majority converges on the true answer as the number of sampled paths increases. We observe that 5–10 paths are typically enough to improve over the greedy results, with diminishing returns in terms of efficiency past those values.

Question 1
A web application allows customers to upload orders to an S3 bucket. The resulting Amazon S3 events trigger a Lambda function that inserts a message to an SQS queue. A single EC2 instance reads messages from the queue, processes them, and stores them in a DynamoDB table partitioned by unique order ID. Next month traffic is expected to increase by a factor of 10 and a Solutions Architect is reviewing the architecture for possible scaling problems. Which component is MOST likely to need re-architecting to be able to scale to accommodate the new traffic? A. Lambda function B. SQS queue C. EC2 instance D. DynamoDB table

Correct answer: C – A single EC2 instance will not scale and is a single point of failure in the architecture. A much better solution would be to have EC2 instances in an Auto Scaling group across 2 availability zones read messages from the queue. The other responses are all managed services that can be configured to scale or will scale automatically.

Greedy decoding: Scaling the EC2 instance directly may not be possible, as it may not be possible to resize the instance. Scaling the Lambda function may not be sufficient, as Lambda functions have a 6 GB of memory allocation limit. Scaling the SQS queue may not be sufficient, as SQS has a 10,000 message limit. DynamoDB tables can be scaled by provisioning more capacity units. The correct answer is D.

Sampled path #1: Scaling the EC2 instance to 10 times its current size may not be feasible, either from a cost or performance perspective. It would require a larger instance, which can be expensive. A more cost-effective approach might be to use a swarm of smaller instances (e.g., Auto Scaling Group) to share the load. In this case, the EC2 instance would likely need to be re-architected. The correct answer is C.

Sampled path #2: Scaling a single EC2 instance is more difficult and problematic than scaling the other service components. Amazon SQS, Lambda, and DynamoDB can all scale automatically to cope with the increased load. The correct answer is C.

Question 2
An application running on AWS uses an Amazon Aurora Multi-AZ DB cluster deployment for its database. When evaluating performance metrics, a solutions architect discovered that the database reads are causing high I/O and adding latency to the write requests against the database. What should the solutions architect do to separate the read requests from the write requests? A. Enable read-through caching on the Aurora database. B. Update the application to read from the Multi-AZ standby instance. C. Create an Aurora replica and modify the application to use the appropriate endpoints. D. Create a second Aurora database and link it to the primary database as a read replica.

Correct answer: C – Aurora Replicas provide a way to offload read traffic. Aurora Replicas share the same underlying storage as the main database, so lag time is generally very low. Aurora Replicas have their own endpoints, so the application will need to be configured to direct read traffic to the new endpoints.

Greedy decoding: Modifying the application to read from the Multi-AZ standby instance (option B) is the best approach. The standby instance is asynchronously replicated from the primary instance, so read operations will not interfere with write operations. Enabling read-through caching (option A) or creating an Aurora replica (option C) will not solve the I/O contention issue. Creating a second Aurora database and linking it as a read replica (option D) will not solve the I/O contention issue either. The correct answer is B.

Sampled path #1: Modifying the application to use the appropriate endpoints is not a practical solution as it would require changes to the application code. The correct answer is C: Create an Aurora replica and modify the application to use the appropriate endpoints.

Sampled path #2: (C) Create an Aurora replica and modify the application to use the appropriate endpoints. By configuring an Aurora Replica, you can separate read traffic from write traffic. The Aurora Replicas use different endpoint URLs, allowing you to direct read traffic to the replica instead of the primary database. The replica can process read requests in parallel with write requests to the primary database, reducing I/O and latency.

Clean up
Running batch inference in Amazon Bedrock is subject to charges according to the Amazon Bedrock Pricing. When you complete the walkthrough, delete your SageMaker notebook instance and remove all data from your S3 buckets to avoid incurring future charges.
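A minimal cleanup sketch for the S3 data created in this walkthrough might look like the following (it assumes the s3 client and bucket variables from the earlier steps; the notebook instance itself is deleted from the SageMaker console):

# Empty and delete the S3 bucket used for batch inference input/output
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket):
    for obj in page.get("Contents", []):
        s3.delete_object(Bucket=bucket, Key=obj["Key"])
s3.delete_bucket(Bucket=bucket)
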
Considerations
Although the demonstrated solution shows improved performance of language models when prompted with self-consistency, it’s important to note that the walkthrough is not production-ready. Before you deploy to production, you should adapt this proof of concept to your own implementation, keeping in mind the following requirements:

Access restriction to APIs and databases to prevent unauthorized usage.
Adherence to AWS security best practices regarding IAM role access and security groups.
Validation and sanitization of user input to prevent prompt injection attacks.
Monitoring and logging of triggered processes to enable testing and auditing.

Conclusion
This post shows that self-consistency prompting enhances performance of generative language models in complex NLP tasks that require arithmetic and multiple-choice logical skills. Self-consistency uses temperature-based stochastic decoding to generate various reasoning paths. This increases the ability of the model to elicit diverse and useful thoughts to arrive at correct answers.
With Amazon Bedrock batch inference, the language model Cohere Command is prompted to generate self-consistent answers to a set of arithmetic problems. Accuracy improves from 51.7% with greedy decoding to 68% with self-consistency sampling 30 reasoning paths at T=1.0. Sampling five paths already enhances accuracy by 7.5 percentage points. The approach is transferable to other language models and reasoning tasks, as demonstrated by the results of the AI21 Labs Jurassic-2 Mid model on an AWS Certification exam. In a small-sized question set, self-consistency with five sampled paths increases accuracy by 5 percentage points over greedy CoT.
We encourage you to implement self-consistency prompting for enhanced performance in your own applications with generative language models. Learn more about Cohere Command and AI21 Labs Jurassic models available on Amazon Bedrock. For more information about batch inference, refer to Run batch inference.
Acknowledgements
The author thanks technical reviewers Amin Tajgardoon and Patrick McSweeney for helpful feedback.

About the Author

Lucía Santamaría is a Sr. Applied Scientist at Amazon’s ML University, where she’s focused on raising the level of ML competency across the company through hands-on education. Lucía has a PhD in astrophysics and is passionate about democratizing access to tech knowledge and tools.

3 Innovative Tactics for Converting Organic Traffic to Revenue

Having been in the SEO space for quite some time, I can tell you the hardest part about SEO isn’t growing organic traffic. The hardest part is understanding where that organic traffic is in the funnel and converting those visitors into actual customers. 

This is reflected in conversion rate data. 

With average organic conversion rates hovering around 2.35% across industries, it’s clear a significant portion of organic traffic is just passing through, often too top-of-the-funnel, or landing on pages without a clear path to conversion. 

Unfortunately, without the right technology or tools in place, the majority of those visitors are lost to the ether, never to return again. 

But fear not, this is where innovation comes in. 

By weaving together creativity, technology, and data-driven strategies, there’s a huge opportunity to transform these visitors into loyal customers. 

Let’s look at three ways you can convert organic traffic into real revenue.

1. Website Visitor Identification

One of the biggest hurdles in converting organic traffic is the anonymity of website visitors. 

Without knowing who’s visiting your site, tailoring the user experience to meet their needs and interests becomes a shot in the dark. 

Herein lies the power of website visitor identification technology—a game-changer when it comes to personalized engagement and boosting conversion rates.

With website visitor identification, you can get data on each individual visitor – names, emails, domains, etc. 

By connecting these visitors to your CRM, you can start to create (or expand upon) a customer profile. 

We can see if they are a return visitor, if they’ve downloaded an asset, if they’ve made a purchase, etc.

With this information in hand, we can market to them in a more effective way. 

Example:

Let’s say someone comes to your site through Google and lands on a blog post featuring tips for running in the heat. 

We have no record of them in our CRM but we can see the page they landed on and exited.

We can then add them to our retargeting audience list and show them ads featuring equipment for running in warm weather.

This is amazing stuff!

It also lends perfectly into the next section – customer journey tracking.

[Did you know we integrate with Semrush?]

2. Customer Journey Tracking

If you remember the old days of using Google Analytics to track the customer journey, you know how impactful it was.

The ability to understand how users interacted with your content and how they navigated through your site was immensely helpful.

Unfortunately, with the rollout of GA4, this is much more difficult to see and almost impossible to understand.

Instead, you need a tool that does this for you. 

And…here’s our shameless plug (hey, it’s our blog, we can plug if we want).

With Customers.ai, you can not only see who is visiting your site, but you can also track their journey.

We can see what pages they visited, what emails they opened, what products they were interested in, if they abandoned a cart or form.

THIS IS GOLD!!!

Look, customer journey tracking is not just a tool for understanding your audience, it’s critical for increasing revenue from organic traffic. 

By meticulously mapping the journey from the first touchpoint to conversion, we can unlock real insights and build strategies that directly influence the bottom line.

The Power of Tracking in Conversion Optimization

Let’s look at how customer journey tracking can help optimize conversions:

Optimize Each Touchpoint: By identifying what content and which channels are most effective, you can direct more resources towards what truly works, increasing efficiency and ROI.

Reduce Friction, Boost Conversions: Identifying where potential customers hesitate or abandon their journey allows for targeted interventions. Whether it’s streamlining the checkout process or clarifying product information, small tweaks can lead to significant increases in conversion rates.

Tailor the Conversion Path: Use insights from journey tracking to customize the path to purchase for different segments of your organic traffic. Personalized content and offers that resonate with the specific needs and interests of these segments can dramatically improve conversion rates.

Segment Your Audience: Adjust your marketing and content strategies to address the unique needs of different audience segments.

Improve Attribution: Organic doesn’t always get the credit it deserves. With customer journey tracking, we can see what is successful and allocate marketing spend more effectively.

Customer journey tracking is key to turning organic traffic into revenue by understanding and optimizing the conversion path. By focusing on the journey, marketers can not only attract but convert and retain customers more effectively, directly impacting revenue growth.

3. Advanced Segmentation and Targeted Messaging

When it comes to converting organic traffic to revenue, advanced segmentation and targeted messaging stand out as key tactics. 

With website identification and journey tracking already in place, advanced segmentation becomes a breeze. 

You can create segments based on pages visited, demographic data, intent, and so much more.

With these audiences in place, you can create custom messaging throughout the rest of their journey.

Example:

Let’s look at another example using the fitness company we mentioned above.

You created a page on dealing with runner’s knee. 

The page ranks exceptionally well and drives a ton of organic traffic. 

However, since visitors are likely looking for information vs. looking to make a purchase, the conversion rate on that page is very low. 

With website visitor identification in place, we can see exactly who is visiting the page and whether they’ve had any previous interactions with your brand. 

For those who have, we can send them into an email automation, offering them key products for dealing with runner’s knee and personalizing the email based on their information (ex: men’s sneakers vs. women’s sneakers).

For those who haven’t, we can create targeted ads that touch on the same subject and take them back to a page featuring products to help with runner’s knee.

The messaging is targeted, it’s relevant, and when done right, it’s helpful!

If you are looking to create advanced segments and implement targeted messaging, here are a few things to consider:

Leverage Behavioral Data: Use your first-party data to track how different segments interact with your website and content. This data forms the basis of your targeted messaging, enabling you to address the specific needs and interests of each segment wherever they are.

Personalize the Experience: From email marketing to website content to retargeting ads, ensure that every touchpoint is personalized. Use the insights gained from deep segmentation to tailor your messaging, whether it’s offering a solution to a common problem or highlighting a product that a particular segment shows interest in.

Test and Optimize: Continuously test different messages with your segments to see what resonates best. A/B testing can reveal invaluable insights into preferences and behaviors, allowing you to refine your approach and boost conversion rates.

Advanced segmentation and targeted messaging are not just tactics, they are components of a sophisticated strategy to convert organic traffic into revenue. 

By understanding the needs and behaviors of your audience segments, you can craft messages that resonate deeply, engage effectively, and ultimately drive conversions. 

Converting Organic Traffic to Revenue

At the end of the day, capturing organic traffic is only the first step. The real challenge, and opportunity, lies in converting these visitors into customers. 

Using innovative technologies that support website visitor identification, customer journey tracking, and advanced segmentation, marketers can better reach their audience and improve overall conversion rates. 

Ready to get started? Get a demo from the team and learn how Customers.ai can help you convert that coveted organic traffic into real revenue.

Unlock High-Intent Leads Hiding on Your Site

Book a demo of Customers.ai’s U.S. website visitor identification, customer journey insights and remarketing platform to skyrocket conversions and sales.

Book a Demo

Important Next Steps

See what targeted outbound marketing is all about. Capture and engage your first 500 website visitor leads with Customers.ai X-Ray website visitor identification for free.

Talk and learn about sales outreach automation with other growth enthusiasts. Join Customers.ai Island, our Facebook group of 40K marketers and entrepreneurs who are ready to support you.

Advance your marketing performance with Sales Outreach School, a free tutorial and training area for sales pros and marketers.

The post 3 Innovative Tactics for Converting Organic Traffic to Revenue appeared first on Customers.ai.

Griffon v2: A Unified High-Resolution Artificial Intelligence Model De …

Recently, Large Vision Language Models (LVLMs) have demonstrated remarkable performance in tasks requiring both text and image comprehension. Particularly in region-level tasks like Referring Expression Comprehension (REC), this progress has become noticeable after image-text understanding and reasoning developments. Models such as Griffon have demonstrated remarkable performance in tasks such as object detection, suggesting a major advancement in perception inside LVLMs. This development has spurred additional research into the use of flexible references outside of textual descriptions to improve user interfaces.

Despite tremendous progress in fine-grained object perception, LVLMs are unable to outperform task-specific specialists in complex scenarios due to the constraint of image resolution. This restriction limits their capacity to efficiently refer to objects with both textual and visual cues, especially in areas like GUI agents and counting tasks. 

To overcome this, a team of researchers has introduced Griffon v2, a unified high-resolution model designed to provide flexible object referring via textual and visual cues. In order to tackle the problem of effectively increasing image resolution, a straightforward and lightweight downsampling projector has been presented. The goal of this projector’s design is to get over the limitations placed by Large Language Models’ input tokens. 

This approach greatly improves multimodal perception abilities by keeping fine features and entire contexts, especially for little things that lower-resolution models can miss. The team has built on this base using a plug-and-play visual tokenizer and has augmented Griffon v2 with visual-language co-referring capabilities. This feature makes it possible to interact with a variety of inputs in an easy-to-use manner, such as coordinates, free-form text, and flexible target pictures. 

Griffon v2 has proven to be effective in a variety of tasks, such as Referring Expression Generation (REG), phrase grounding, and Referring Expression Comprehension (REC), according to experimental data. The model has performed better in object detection and object counting than expert models. 

The team has summarized their primary contributions as follows:

High-Resolution Multimodal Perception Model: By eliminating the requirement to split images, the model offers a unique method for multimodal perception that improves local understanding. The model’s capacity to capture small details has been improved by its ability to handle resolutions up to 1K. 

Visual-Language Co-Referring Structure: To extend the model’s utility and enable many interaction modes, a co-referring structure has been presented that combines language and visual inputs. This feature makes more adaptable and natural communication between users and the model possible.

Extensive experiments have been conducted to verify the effectiveness of the model on a variety of localization tasks. In phrase grounding, Referring Expression Generation (REG), and Referring Expression Comprehension (REC), state-of-the-art performance has been obtained. The model has outperformed expert models in both quantitative and qualitative object counting, demonstrating its superiority in perception and comprehension.

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.

The post Griffon v2: A Unified High-Resolution Artificial Intelligence Model Designed to Provide Flexible Object Referring Via Textual and Visual Cues appeared first on MarkTechPost.

RA-ISF: An Artificial Intelligence Framework Designed to Enhance Retri …

Developing and refining large language models (LLMs) have marked a revolutionary stride toward machines that understand and generate human-like text. Despite their significant advances, these models grapple with the inherent challenge of their knowledge being fixed at the point of their training. This limitation confines their adaptability and restricts their ability to assimilate new, updated information post-training, posing a critical bottleneck for applications requiring up-to-the-minute data.

Current research has ventured into retrieval-augmented generation (RAG) techniques to bridge the divide between static knowledge bases and dynamic information needs. RAG methods empower models to fetch and incorporate external information, broadening their horizons beyond the original dataset. This capability is pivotal, especially in scenarios where the relevance and timeliness of information can significantly influence the accuracy and reliability of model outputs.

Researchers from Zhejiang University, Southeast University, and Massachusetts Institute of Technology propose the Retrieval Augmented Iterative Self-Feedback (RA-ISF) framework. RA-ISF innovates by combining the model’s internal knowledge assessment with a strategic retrieval of external data while employing an iterative feedback mechanism to refine its understanding and application of this information. The framework operates through a series of meticulously designed submodules that tackle distinct facets of the information retrieval and integration process. This includes initial self-assessment to determine a question’s answerability based on existing knowledge, followed by a relevance check of external information and, if necessary, decomposition of complex queries into more manageable sub-questions. Each of these steps is crucial for ensuring that the model accesses the most pertinent information and interprets and utilizes it correctly.

Its unique iterative self-feedback loop sets RA-ISF apart from conventional RAG methods. This loop enables the model to continuously refine its search and comprehension processes, leading to more accurate and relevant responses. Such a design amplifies the model’s ability to tackle complex queries with higher precision and significantly reduces errors and hallucinations, instances where models generate misleading or entirely fabricated information. This reduction in inaccuracies is a pivotal improvement, as it enhances the trustworthiness and reliability of the model’s outputs, making them more usable in real-world applications.

Empirical evaluations across various benchmarks and datasets underscore RA-ISF’s superior performance. By systematically enhancing the interaction between the model’s inherent knowledge base and external data sources, RA-ISF remarkably improves answering complex questions. This is evidenced by its ability to outperform existing benchmarks, showcasing its potential to redefine the capabilities of LLMs. Moreover, its success across different models, including GPT3.5 and Llama2, highlights its adaptability and robustness, further establishing its significance in the landscape of AI research. These practical results confirm RA-ISF’s potential to enhance the performance of AI systems in real-world applications.

In conclusion, RA-ISF embodies a significant stride toward resolving the long-standing challenge of integrating dynamic, external knowledge with the static data repositories of LLMs. By facilitating a more nuanced and refined approach to information retrieval and utilization, RA-ISF elevates the model’s performance and broadens its applicability across a spectrum of real-world scenarios. Its ability to iteratively refine and adjust its processes ensures that the model remains relevant and accurate, marking a paradigm shift in how the future of intelligent systems is envisioned. With its innovative structure and proven efficacy, this framework sets a new benchmark for developing more intelligent, adaptable, and reliable artificial intelligence systems.

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.

The post RA-ISF: An Artificial Intelligence Framework Designed to Enhance Retrieval Augmentation Effects and Improve Performance in Open-Domain Question Answering appeared first on MarkTechPost.

This Machine Learning Research from ServiceNow Proposes WorkArena and …

In the digital age, the interfaces through which individuals engage with software form the backbone of interaction with technology. Despite significant strides toward user-friendly design, individuals frequently struggle with the complexity or repetitiveness of certain tasks. This presents a substantial barrier to efficiency and inclusivity in the digital workspace, highlighting the critical need for innovative solutions to streamline these interactions, making technology more accessible and intuitive for everyone.

Central to the digital workspace’s challenges is the issue of software systems prioritizing comprehensive functionality at the expense of user experience. Such environments often lead to steep learning curves and decreased productivity, especially within enterprise software. The necessity for a solution becomes apparent, a solution that not only simplifies the execution of repetitive tasks but also makes the digital workspace accessible to a wider audience, including those with disabilities.

Automating tasks within software systems has relied heavily on Application Programming Interfaces (APIs). While these have facilitated some programmatic interaction with software, they often fall short in transparency and universal accessibility. This gap in the automation landscape calls for a paradigm shift towards automated assistants that engage directly with user interfaces (UIs), offering a more transparent and flexible approach to automation.

Researchers from ServiceNow Research, Mila-Quebec AI Research Institute, Polytechnique Montreal, McGill University, and Universite de Montreal have introduced WorkArena and BrowserGym, two platforms that harness the power of large language models (LLMs) to automate web-based tasks. WorkArena sets a new standard with its benchmark of 29 diverse tasks on the widely-used ServiceNow platform, providing a robust framework for evaluating the effectiveness of UI assistants. On the other hand, BrowserGym is a unique environment tailored for developing and assessing web agents. It has many actions and multimodal observations to support complex web interactions, making it a game-changer in the field.

The true power of this new approach lies in the assistants’ direct manipulation of UIs. This strategy not only enhances transparency and adaptability but also puts control in the hands of the users. They can now dictate the level of automation, ranging from simple assistance to full task execution. This level of versatility is akin to the varying degrees of automation seen in autonomous vehicles, highlighting the transformative potential of UI assistants in reshaping the landscape of knowledge work.

While current agents have shown promise in preliminary evaluations, achieving comprehensive task automation remains a formidable challenge. The performance gap highlighted in complex UI interaction tasks underscores the need for continued research and innovation. This ongoing commitment is crucial for unlocking UI assistants’ full potential and revolutionizing how individuals interact with enterprise software.

In conclusion, integrating UI assistants into the fabric of digital workspaces is poised to revolutionize interaction with technology. WorkArena and BrowserGym are two innovative platforms introduced to leverage LLMs to automate web-based tasks. By automating mundane tasks, these tools promise to boost productivity, improve the user experience, and ensure greater accessibility. This summary encapsulates the research’s exploration of the challenges, proposed solutions, and the promising yet demanding journey toward fully automated digital workspaces.

Check out the Paper and Project. All credit for this research goes to the researchers of this project.

The post This Machine Learning Research from ServiceNow Proposes WorkArena and BrowserGym: A Leap Towards Automating Daily Workflows with AI appeared first on MarkTechPost.

Optimize price-performance of LLM inference on NVIDIA GPUs using the A …

NVIDIA NIM microservices now integrate with Amazon SageMaker, allowing you to deploy industry-leading large language models (LLMs) and optimize model performance and cost. You can deploy state-of-the-art LLMs in minutes instead of days using technologies such as NVIDIA TensorRT, NVIDIA TensorRT-LLM, and NVIDIA Triton Inference Server on NVIDIA accelerated instances hosted by SageMaker.
NIM, part of the NVIDIA AI Enterprise software platform listed on AWS marketplace, is a set of inference microservices that bring the power of state-of-the-art LLMs to your applications, providing natural language processing (NLP) and understanding capabilities, whether you’re developing chatbots, summarizing documents, or implementing other NLP-powered applications. You can use pre-built NVIDIA containers to host popular LLMs that are optimized for specific NVIDIA GPUs for quick deployment or use NIM tools to create your own containers.
In this post, we provide a high-level introduction to NIM and show how you can use it with SageMaker.
An introduction to NVIDIA NIM
NIM provides optimized and pre-generated engines for a variety of popular models for inference. These microservices support a variety of LLMs, such as Llama 2 (7B, 13B, and 70B), Mistral-7B-Instruct, Mixtral-8x7B, NVIDIA Nemotron-3 22B Persona, and Code Llama 70B, out of the box using pre-built NVIDIA TensorRT engines tailored for specific NVIDIA GPUs for maximum performance and utilization. These models are curated with the optimal hyperparameters for model-hosting performance for deploying applications with ease.
If your model is not in NVIDIA’s set of curated models, NIM offers essential utilities such as the Model Repo Generator, which facilitates the creation of a TensorRT-LLM-accelerated engine and a NIM-format model directory through a straightforward YAML file. Furthermore, an integrated community backend of vLLM provides support for cutting-edge models and emerging features that may not have been seamlessly integrated into the TensorRT-LLM-optimized stack.
In addition to creating optimized LLMs for inference, NIM provides advanced hosting technologies such as optimized scheduling techniques like in-flight batching, which can break down the overall text generation process for an LLM into multiple iterations on the model. With in-flight batching, rather than waiting for the whole batch to finish before moving on to the next set of requests, the NIM runtime immediately evicts finished sequences from the batch. The runtime then begins running new requests while other requests are still in flight, making the best use of your compute instances and GPUs.
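To make the scheduling idea concrete, the following toy loop mimics the behavior described above (purely illustrative pseudologic with placeholder request objects and a placeholder step function, not NIM’s actual runtime):

from collections import deque

def inflight_batching(requests, max_batch_size, step_fn):
    """Toy scheduler: finished sequences are evicted immediately and queued
    requests are admitted without waiting for the whole batch to finish."""
    queue = deque(requests)
    batch, finished = [], []
    while queue or batch:
        # Admit new requests into any free batch slots
        while queue and len(batch) < max_batch_size:
            batch.append(queue.popleft())
        # Run one generation iteration on the current batch
        batch = step_fn(batch)
        # Evict sequences that completed during this iteration
        finished.extend(r for r in batch if r.done)
        batch = [r for r in batch if not r.done]
    return finished
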
Deploying NIM on SageMaker
NIM integrates with SageMaker, allowing you to host your LLMs with performance and cost optimization while benefiting from the capabilities of SageMaker. When you use NIM on SageMaker, you can use capabilities such as scaling out the number of instances to host your model, performing blue/green deployments, and evaluating workloads using shadow testing—all with best-in-class observability and monitoring with Amazon CloudWatch.
Conclusion
Using NIM to deploy optimized LLMs can be a great option for both performance and cost. It also helps make deploying LLMs effortless. In the future, NIM will also allow for Parameter-Efficient Fine-Tuning (PEFT) customization methods like LoRA and P-tuning. NIM also plans to broaden its LLM support by adding Triton Inference Server, TensorRT-LLM, and vLLM backends.
We encourage you to learn more about NVIDIA microservices and how to deploy your LLMs using SageMaker and try out the benefits available to you. NIM is available as a paid offering as part of the NVIDIA AI Enterprise software subscription available on AWS Marketplace.
In the near future, we will post an in-depth guide for NIM on SageMaker.

About the authors
James Park is a Solutions Architect at Amazon Web Services. He works with Amazon.com to design, build, and deploy technology solutions on AWS, and has a particular interest in AI and machine learning. In his spare time he enjoys seeking out new cultures, new experiences, and staying up to date with the latest technology trends. You can find him on LinkedIn.
Saurabh Trikande is a Senior Product Manager for Amazon SageMaker Inference. He is passionate about working with customers and is motivated by the goal of democratizing machine learning. He focuses on core challenges related to deploying complex ML applications, multi-tenant ML models, cost optimizations, and making deployment of deep learning models more accessible. In his spare time, Saurabh enjoys hiking, learning about innovative technologies, following TechCrunch, and spending time with his family.
Qing Lan is a Software Development Engineer in AWS. He has been working on several challenging products in Amazon, including high-performance ML inference solutions and high-performance logging systems. Qing’s team successfully launched the first billion-parameter model in Amazon Advertising with very low latency requirements. Qing has in-depth knowledge of infrastructure optimization and deep learning acceleration.
Nikhil Kulkarni is a software developer with AWS Machine Learning, focusing on making machine learning workloads more performant on the cloud, and is a co-creator of AWS Deep Learning Containers for training and inference. He’s passionate about distributed Deep Learning Systems. Outside of work, he enjoys reading books, fiddling with the guitar, and making pizza.
Harish Tummalacherla is a Software Engineer with the Deep Learning Performance team at SageMaker. He works on performance engineering for serving large language models efficiently on SageMaker. In his spare time, he enjoys running, cycling and ski mountaineering.
Eliuth Triana Isaza is a Developer Relations Manager at NVIDIA empowering Amazon’s AI MLOps, DevOps, Scientists and AWS technical experts to master the NVIDIA computing stack for accelerating and optimizing Generative AI Foundation models spanning from data curation, GPU training, model inference and production deployment on AWS GPU instances. In addition, Eliuth is a passionate mountain biker, skier, tennis and poker player.
Jiahong Liu is a Solution Architect on the Cloud Service Provider team at NVIDIA. He assists clients in adopting machine learning and AI solutions that leverage NVIDIA accelerated computing to address their training and inference challenges. In his leisure time, he enjoys origami, DIY projects, and playing basketball.
Kshitiz Gupta is a Solutions Architect at NVIDIA. He enjoys educating cloud customers about the GPU AI technologies NVIDIA has to offer and assisting them with accelerating their machine learning and deep learning applications. Outside of work, he enjoys running, hiking and wildlife watching.

Fine-tune Code Llama on Amazon SageMaker JumpStart

Today, we are excited to announce the capability to fine-tune Code Llama models by Meta using Amazon SageMaker JumpStart. The Code Llama family of large language models (LLMs) is a collection of pre-trained and fine-tuned code generation models ranging in scale from 7 billion to 70 billion parameters. Fine-tuned Code Llama models provide better accuracy and explainability over the base Code Llama models, as evident on its testing against HumanEval and MBPP datasets. You can fine-tune and deploy Code Llama models with SageMaker JumpStart using the Amazon SageMaker Studio UI with a few clicks or using the SageMaker Python SDK. Fine-tuning of Llama models is based on the scripts provided in the llama-recipes GitHub repo from Meta using PyTorch FSDP, PEFT/LoRA, and Int8 quantization techniques.
In this post, we walk through how to fine-tune Code Llama pre-trained models via SageMaker JumpStart through a one-click UI and SDK experience available in the following GitHub repository.
What is SageMaker JumpStart
With SageMaker JumpStart, machine learning (ML) practitioners can choose from a broad selection of publicly available foundation models. ML practitioners can deploy foundation models to dedicated Amazon SageMaker instances from a network isolated environment and customize models using SageMaker for model training and deployment.
What is Code Llama
Code Llama is a code-specialized version of Llama 2 that was created by further training Llama 2 on its code-specific datasets and sampling more data from that same dataset for longer. Code Llama features enhanced coding capabilities. It can generate code and natural language about code, from both code and natural language prompts (for example, “Write me a function that outputs the Fibonacci sequence”). You can also use it for code completion and debugging. It supports many of the most popular programming languages used today, including Python, C++, Java, PHP, Typescript (JavaScript), C#, Bash, and more.
Why fine-tune Code Llama models
Meta published Code Llama performance benchmarks on HumanEval and MBPP for common coding languages such as Python, Java, and JavaScript. Code Llama Python models demonstrated varying performance on HumanEval across different coding languages and tasks, ranging from 38% for the 7B Python model to 57% for the 70B Python model. In addition, fine-tuned Code Llama models on the SQL programming language have shown better results, as evident in SQL evaluation benchmarks. These published benchmarks highlight the potential benefits of fine-tuning Code Llama models, enabling better performance, customization, and adaptation to specific coding domains and tasks.
No-code fine-tuning via the SageMaker Studio UI
To start fine-tuning your Llama models using SageMaker Studio, complete the following steps:

On the SageMaker Studio console, choose JumpStart in the navigation pane.

You will find listings of over 350 models, ranging from open source to proprietary models.

Search for Code Llama models.

If you don’t see Code Llama models, you can update your SageMaker Studio version by shutting down and restarting. For more information about version updates, refer to Shut down and Update Studio Apps. You can also find other model variants by choosing Explore all Code Generation Models or searching for Code Llama in the search box.

SageMaker JumpStart currently supports instruction fine-tuning for Code Llama models. The following screenshot shows the fine-tuning page for the Code Llama 2 70B model.

For Training dataset location, you can point to the Amazon Simple Storage Service (Amazon S3) bucket containing the training and validation datasets for fine-tuning.
Set your deployment configuration, hyperparameters, and security settings for fine-tuning.
Choose Train to start the fine-tuning job on a SageMaker ML instance.

We discuss the dataset format you need to prepare for instruction fine-tuning in the next section.

After the model is fine-tuned, you can deploy it using the model page on SageMaker JumpStart.

The option to deploy the fine-tuned model will appear when fine-tuning is finished, as shown in the following screenshot.

Fine-tune via the SageMaker Python SDK
In this section, we demonstrate how to fine-tune Code Llama models using the SageMaker Python SDK on an instruction-formatted dataset. Specifically, the model is fine-tuned for a set of natural language processing (NLP) tasks described using instructions. This helps improve the model’s performance for unseen tasks with zero-shot prompts.
Complete the following steps to complete your fine-tuning job. You can get the entire fine-tuning code from the GitHub repository.
First, let’s look at the dataset format required for the instruction fine-tuning. The training data should be formatted in a JSON lines (.jsonl) format, where each line is a dictionary representing a data sample. All training data must be in a single folder. However, it can be saved in multiple .jsonl files. The following is a sample in JSON lines format:

{
'system_prompt': 'a chat',
'question': 'Please focus on the efficiency of this problem and provide code in python:\nYou are given two strings `s` and `t` consisting of only lowercase English letters.\n\nReturn _the minimum number of characters that need to be appended to the end of_ `s` _so that_ `t` _becomes a **subsequence** of_ `s`.\n\nA **subsequence** is a string that can be derived from another string by deleting some or no characters without changing the order of the remaining characters.\n\n**Example 1:**\n\n**Input:** s = "coaching ", t = "coding "\n**Output:** 4\n**Explanation:** Append the characters "ding " to the end of s so that s = "coachingding ".\nNow, t is a subsequence of s ( "**co**aching**ding** ").\nIt can be shown that appending any 3 characters to the end of s will never make t a subsequence.\n\n**Example 2:**\n\n**Input:** s = "abcde ", t = "a "\n**Output:** 0\n**Explanation:** t is already a subsequence of s ( "**a**bcde ").\n\n**Example 3:**\n\n**Input:** s = "z ", t = "abcde "\n**Output:** 5\n**Explanation:** Append the characters "abcde " to the end of s so that s = "zabcde ".\nNow, t is a subsequence of s ( "z**abcde** ").\nIt can be shown that appending any 4 characters to the end of s will never make t a subsequence.\n\n**Constraints:**\n\n* `1 <= s.length, t.length <= 105`\n* `s` and `t` consist only of lowercase English letters.',
'response': '### Explanation\n The algorithm maintains two pointers, `i` and `j`, to iterate through the given strings `s` and `t`. The pointer `i` iterates through the string `s` and the pointer `j` iterates through the string `t`. While iterating through `s`, if we find a character that matches the current character in `t`, we increment the `j` pointer. This is because the matched character can be a part of the subsequence we are looking for. The iteration stops when either `i` reaches the end of `s` or `j` reaches the end of `t`.\n\nAt the end, we subtract the index `j` from the length of `t`. This gives us the minimum number of characters that need to be appended to the end of `s` so that `t` becomes a subsequence of `s`.\n\n### Pseudocode:\n```\nMIN-ADDITION(s, t)\n s_len = LENGTH(s)\n t_len = LENGTH(t)\n i = 1\n j = 1\n WHILE i <= s_len AND j <= t_len\n IF s[i] == t[j]\n j = j + 1\n ENDIF\n i = i + 1\n ENDWHILE\n RETURN t_len - j + 1\nENDMIN-ADDITION\n```\n\n### Code\n```python\ndef min_addition(s: str, t: str) -> int:\n s_len = len(s)\n t_len = len(t)\n i, j = 0, 0\n while i < s_len and j < t_len:\n if s[i] == t[j]:\n j += 1\n i += 1\n return t_len - j\n```\n'
}

The training folder can contain a template.json file describing the input and output formats. The following is an example template:

{
    "prompt": "{system_prompt} ### Input: {question}",
    "completion": "{response}"
}

To match the template, each sample in the JSON lines files must include system_prompt, question, and response fields. In this demonstration, we use the Dolphin Coder dataset from Hugging Face.
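A minimal sketch of this preparation step might look like the following (the Hugging Face dataset identifier, column names, and bucket name are assumptions for illustration, based on the sample shown above):

import json

import boto3
from datasets import load_dataset

# Load the Dolphin Coder dataset and write it out as JSON lines
dataset = load_dataset("cognitivecomputations/dolphin-coder", split="train")  # assumed dataset ID
with open("train.jsonl", "w") as f:
    for sample in dataset:
        f.write(json.dumps({
            "system_prompt": sample["system_prompt"],
            "question": sample["question"],
            "response": sample["response"],
        }) + "\n")

# Upload the training file and the template to an S3 bucket of your choice
s3 = boto3.client("s3")
bucket = "your-own-bucket-hosting-training-data"  # placeholder bucket name
s3.upload_file("train.jsonl", bucket, "dolphin-coder/train.jsonl")
s3.upload_file("template.json", bucket, "dolphin-coder/template.json")
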
After you prepare the dataset and upload it to the S3 bucket, you can start fine-tuning using the following code:

from sagemaker.jumpstart.estimator import JumpStartEstimator

model_id = "meta-textgeneration-llama-codellama-7b"
model_version = "*"
train_data_location = f"s3://{your_own_bucket_hosting_training_data}/"  # training data in s3 bucket

estimator = JumpStartEstimator(
    model_id=model_id,
    model_version=model_version,
    hyperparameters=hyperparameters,
    environment={
        "accept_eula": "false"
    },  # please change `accept_eula` to be `true` to accept EULA.
)

estimator.fit({"training": train_data_location})

You can deploy the fine-tuned model directly from the estimator, as shown in the following code. For details, see the notebook in the GitHub repository.

finetuned_predictor = estimator.deploy()

Fine-tuning techniques
Language models such as Llama are more than 10 GB or even 100 GB in size. Fine-tuning such large models requires instances with significantly high CUDA memory. Furthermore, training these models can be very slow due to the size of the model. Therefore, for efficient fine-tuning, we use the following optimizations:

Low-Rank Adaptation (LoRA) – This is a type of parameter-efficient fine-tuning (PEFT) for efficient fine-tuning of large models. With this method, you freeze the whole model and only add a small set of adjustable parameters or layers into the model. For instance, instead of training all 7 billion parameters for Llama 2 7B, you can fine-tune less than 1% of the parameters. This significantly reduces the memory requirement because you only need to store gradients, optimizer states, and other training-related information for 1% of the parameters. Furthermore, this helps reduce both training time and cost. For more details on this method, refer to LoRA: Low-Rank Adaptation of Large Language Models. A conceptual sketch of the LoRA setup follows this list.
Int8 quantization – Even with optimizations such as LoRA, models such as Llama 70B are still too big to train. To decrease the memory footprint during training, you can use Int8 quantization during training. Quantization typically reduces the precision of floating point data types. Although this decreases the memory required to store model weights, it degrades the performance due to loss of information. Int8 quantization uses only a quarter of the precision but doesn’t incur degradation of performance because it doesn’t simply drop the bits. It rounds the data from one type to another. To learn about Int8 quantization, refer to LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale.
Fully Sharded Data Parallel (FSDP) – This is a type of data-parallel training algorithm that shards the model’s parameters across data parallel workers and can optionally offload part of the training computation to the CPUs. Although the parameters are sharded across different GPUs, computation of each microbatch is local to the GPU worker. It shards parameters more uniformly and achieves optimized performance via communication and computation overlapping during training.
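
As a conceptual illustration of the LoRA idea mentioned above, here is a minimal sketch using the Hugging Face peft library (the base model and target modules are placeholder choices; SageMaker JumpStart applies its own configuration under the hood):

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Load a base model and attach small trainable low-rank adapters to it
base_model = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")
lora_config = LoraConfig(
    r=8,                # rank of the adapter matrices (lora_r)
    lora_alpha=32,      # scaling factor (lora_alpha)
    lora_dropout=0.05,  # dropout applied to adapter inputs (lora_dropout)
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters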

The following table summarizes the details of each model with different settings.

Model               Default Setting           LoRA + FSDP    LoRA + No FSDP    Int8 Quantization + LoRA + No FSDP
Code Llama 2 7B     LoRA + FSDP               Yes            Yes               Yes
Code Llama 2 13B    LoRA + FSDP               Yes            Yes               Yes
Code Llama 2 34B    Int8 + LoRA + No FSDP     No             No                Yes
Code Llama 2 70B    Int8 + LoRA + No FSDP     No             No                Yes

Fine-tuning of Llama models is based on the scripts provided in the llama-recipes GitHub repo.
Supported hyperparameters for training
Code Llama 2 fine-tuning supports a number of hyperparameters, each of which can impact the memory requirement, training speed, and performance of the fine-tuned model (a sample configuration is sketched after the list):

epoch – The number of passes that the fine-tuning algorithm takes through the training dataset. Must be an integer greater than 1. Default is 5.
learning_rate – The rate at which the model weights are updated after working through each batch of training examples. Must be a positive float greater than 0. Default is 1e-4.
instruction_tuned – Whether to instruction-train the model or not. Must be True or False. Default is False.
per_device_train_batch_size – The batch size per GPU core/CPU for training. Must be a positive integer. Default is 4.
per_device_eval_batch_size – The batch size per GPU core/CPU for evaluation. Must be a positive integer. Default is 1.
max_train_samples – For debugging purposes or quicker training, truncate the number of training examples to this value. Value -1 means using all of the training samples. Must be a positive integer or -1. Default is -1.
max_val_samples – For debugging purposes or quicker training, truncate the number of validation examples to this value. Value -1 means using all of the validation samples. Must be a positive integer or -1. Default is -1.
max_input_length – Maximum total input sequence length after tokenization. Sequences longer than this will be truncated. If -1, max_input_length is set to the minimum of 1024 and the maximum model length defined by the tokenizer. If set to a positive value, max_input_length is set to the minimum of the provided value and the model_max_length defined by the tokenizer. Must be a positive integer or -1. Default is -1.
validation_split_ratio – If the validation channel is none, the ratio of the train-validation split from the train data. Must be between 0 and 1. Default is 0.2.
train_data_split_seed – If validation data is not present, this fixes the random splitting of the input training data to training and validation data used by the algorithm. Must be an integer. Default is 0.
preprocessing_num_workers – The number of processes to use for preprocessing. If None, the main process is used for preprocessing. Default is None.
lora_r – Lora R. Must be a positive integer. Default is 8.
lora_alpha – Lora Alpha. Must be a positive integer. Default is 32.
lora_dropout – Lora Dropout. Must be a positive float between 0 and 1. Default is 0.05.
int8_quantization – If True, the model is loaded with 8-bit precision for training. Default for 7B and 13B is False. Default for 70B is True.
enable_fsdp – If True, training uses FSDP. Default for 7B and 13B is True. Default for 70B is False. Note that int8_quantization is not supported with FSDP.
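
For example, the hyperparameters variable passed to the JumpStartEstimator earlier could be assembled as follows (the values shown are illustrative choices, not recommendations):

# Illustrative hyperparameter settings for instruction fine-tuning (assumed values)
hyperparameters = {
    "instruction_tuned": "True",
    "epoch": "3",
    "learning_rate": "1e-4",
    "per_device_train_batch_size": "4",
    "max_input_length": "1024",
    "lora_r": "8",
    "lora_alpha": "32",
    "lora_dropout": "0.05",
    "int8_quantization": "False",
    "enable_fsdp": "True",
}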

When choosing the hyperparameters, consider the following:

Setting int8_quantization=True decreases the memory requirement and leads to faster training.
Decreasing per_device_train_batch_size and max_input_length reduces the memory requirement and therefore can be run on smaller instances. However, setting very low values may increase the training time.
If you’re not using Int8 quantization (int8_quantization=False), use FSDP (enable_fsdp=True) for faster and more efficient training. A minimal sketch of passing these hyperparameters through the SageMaker Python SDK follows this list.
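As a reference point, the following is a minimal sketch of launching a fine-tuning job with the SageMaker Python SDK and these hyperparameters. The model ID, S3 path, and hyperparameter values shown are placeholders/assumptions for illustration; see the notebook in the GitHub repository for the exact identifiers used in this post.

from sagemaker.jumpstart.estimator import JumpStartEstimator

# Placeholder identifiers; substitute the Code Llama 2 model ID and your own S3 path.
model_id = "meta-textgeneration-llama-codellama-7b"            # assumed model ID
train_data_location = "s3://your-bucket/dolphin-coder/train/"  # assumed S3 path

estimator = JumpStartEstimator(
    model_id=model_id,
    environment={"accept_eula": "true"},  # Code Llama models require accepting the EULA
    instance_type="ml.g5.12xlarge",       # default training instance type for the 7B model
)

# Hyperparameters from the list above; the values here are examples, not recommendations.
estimator.set_hyperparameters(
    epoch="3",
    learning_rate="1e-4",
    instruction_tuned="True",
    per_device_train_batch_size="4",
    enable_fsdp="True",
    int8_quantization="False",
)

estimator.fit({"training": train_data_location})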

Supported instance types for training
The following table summarizes the supported instance types for training different models.

| Model | Default Instance Type | Supported Instance Types |
|---|---|---|
| Code Llama 2 7B | ml.g5.12xlarge | ml.g5.12xlarge, ml.g5.24xlarge, ml.g5.48xlarge, ml.p3dn.24xlarge, ml.g4dn.12xlarge |
| Code Llama 2 13B | ml.g5.12xlarge | ml.g5.24xlarge, ml.g5.48xlarge, ml.p3dn.24xlarge, ml.g4dn.12xlarge |
| Code Llama 2 70B | ml.g5.48xlarge | ml.g5.48xlarge, ml.p4d.24xlarge |

When choosing the instance type, consider the following:

G5 instances provide the most efficient training among the instance types supported. Therefore, if you have G5 instances available, you should use them.
Training time largely depends on the number of GPUs and the amount of CUDA memory available. Therefore, training on instances with the same number of GPUs (for example, ml.g5.2xlarge and ml.g5.4xlarge) takes roughly the same time, so you can use the cheaper instance for training (ml.g5.2xlarge).
When using p3 instances, training will be done with 32-bit precision because bfloat16 is not supported on these instances. Therefore, the training job will consume double the amount of CUDA memory when training on p3 instances compared to g5 instances.

To learn about the cost of training per instance, refer to Amazon EC2 G5 Instances.
Evaluation
Evaluation is an important step to assess the performance of fine-tuned models. We present both qualitative and quantitative evaluations to show the improvement of fine-tuned models over non-fine-tuned ones. In the qualitative evaluation, we show an example response from both a fine-tuned and a non-fine-tuned model. In the quantitative evaluation, we use HumanEval, a test suite developed by OpenAI for generating Python code, to test the ability to produce correct and accurate results. The HumanEval repository is under an MIT license. We fine-tuned the Python variants of all Code Llama model sizes (Code Llama Python 7B, 13B, 34B, and 70B) on the Dolphin Coder dataset and present the evaluation results in the following sections.
Qualitative evaluation
With your fine-tuned model deployed, you can start using the endpoint to generate code. In the following example, we present responses from both the base and fine-tuned Code Llama 34B Python variants on a test sample from the Dolphin Coder dataset:

import json

alpaca_format = """{system_prompt}

### Input:
{question}

### Response:\n"""

system = "You are a coding AI"
prompt = """Please explain the reasoning behind the following solution and
provide code in python: Given a string `s`, return the longest palindromic
substring in `s`. **Example 1:** **Input:** s = "babad " **Output:** "bab "
**Explanation:** "aba " is also a valid answer. **Example 2:**
**Input:** s = "cbbd " **Output:** "bb "
**Constraints:** * `1 <= s.length <= 1000` * `s` consist of only digits
and English letters."""

payload = json.dumps({
    "inputText": alpaca_format.format(system_prompt=system, question=prompt),
    "textGenerationConfig": {
        "maxTokenCount": 2048,
        "temperature": 0.2
    }
})

# finetuned_predictor is the predictor returned when the fine-tuned model was deployed
finetuned_response = finetuned_predictor.predict(payload)
finetuned_response["generated_text"]

The fine-tuned Code Llama model, in addition to providing the code for the preceding query, generates a detailed explanation of the approach and a pseudo code.
Code Llama 34B Python Non-Fine-Tuned Response:

### Code
def longestPalindrome(s):
    # Fill this in.
    longest = ""
    for i in range(len(s)):
        for j in range(len(s), i, -1):
            if len(longest) >= j-i:
                break
            elif s[i:j] == s[i:j][::-1]:
                longest = s[i:j]
                break
    return longest

print(longestPalindrome("abaxyzzyxf"))
# axyzzyx

### Reference:
- [https://www.geeksforgeeks.org/longest-palindrome-substring-set-1/](https://www.geeksforgeeks.org/longest-palindrome-substring-set-1/)

Code Llama 34B Python Fine-Tuned Response

### Explanation
The algorithm uses dynamic programming to find the longest palindromic substring.
It initializes a 2D boolean table `dp` with the same dimensions as the input string `s`.
The `dp[i][j]` value is set to `true` if the substring from index `i` to index `j` is a palindrome.

The algorithm then iterates through the string in a nested loop, starting from the first character to the last.
It checks if the substring from index `i` to index `j` is a palindrome.
If it is, it sets the `dp[i][j]` value to `true`.
If the substring is of length 2 or 3, it sets the `dp[i][j]` value to `true` as well.

After the nested loop, the algorithm iterates through the `dp` table to find the longest palindromic substring.
It keeps track of the maximum length and the starting and ending indices of the substring.

Finally, the algorithm returns the substring from the starting index to the ending index.

### Code
```python
def longestPalindrome(s: str) -> str:
    n = len(s)
    dp = [[False] * n for _ in range(n)]

    for i in range(n):
        dp[i][i] = True
        if i + 1 < n:
            dp[i][i + 1] = s[i] == s[i + 1]

    max_len = 1
    start = 0
    end = 0

    for i in range(n - 1):
        for j in range(i + 1, n):
            if j - i == 1 or dp[i + 1][j - 1]:
                dp[i][j] = s[i] == s[j]
                if j - i + 1 > max_len:
                    max_len = j - i + 1
                    start = i
                    end = j

    return s[start:end + 1]
```

Ground Truth

### Explanation
1. Initialize `start` and `maxLength` for result substring.
2. Iterate through the given string `s` using the index `i`.
3. For each index `i`, create two pointers `l` and `r` starting at `i`.
4. Check if there’s a consecutive sequence of identical characters, increment the right pointer `r` until the end of the sequence is reached.
5. Update the index `i` to the current value of `r`.
6. Expand the pointers `l` and `r` outwards to find the longest palindromic substring, checking that characters on both sides are equal.
7. If the current length of the substring is greater than `maxLength`, update `start` and `maxLength`.
8. Return the longest palindromic substring using the `start` and `maxLength`.

### Code
```python
def longest_palindromic_substring(s: str) -> str:
    n = len(s)
    if n == 0:
        return ""

    start, max_length = 0, 1
    for i in range(n):
        l, r = i, i
        while r < n - 1 and s[r] == s[r + 1]:
            r += 1
        i = r
        while l > 0 and r < n - 1 and s[l - 1] == s[r + 1]:
            l -= 1
            r += 1
        length = r - l + 1
        if length > max_length:
            start, max_length = l, length
    return s[start:start + max_length]
```

Interestingly, our fine-tuned version of Code Llama 34B Python provides a dynamic programming-based solution to the longest palindromic substring problem, which differs from the solution given in the ground truth for this test example. The fine-tuned model reasons through and explains the dynamic programming approach in detail. The non-fine-tuned model, on the other hand, hallucinates an output right after the print statement (shown in the non-fine-tuned response above): axyzzyx is not even a palindrome, and the correct output for this input is xyzzyx. In terms of time complexity, the dynamic programming solution runs in O(n^2), where n is the length of the input string, whereas the brute-force approach from the non-fine-tuned model checks every candidate substring with a slice-and-reverse comparison and is less optimized.
This looks promising! Remember, we only fine-tuned the Code Llama Python variant with 10% of the Dolphin Coder dataset. There is a lot more to explore!
Despite the thorough explanation in the response, we still need to examine the correctness of the Python code provided in the solution. Next, we use an evaluation framework called HumanEval to run integration tests on the generated responses from Code Llama to systematically examine their quality.
Quantitative evaluation with HumanEval
HumanEval is an evaluation harness for evaluating an LLM’s problem-solving capabilities on Python-based coding problems, as described in the paper Evaluating Large Language Models Trained on Code. Specifically, it consists of 164 original Python-based programming problems that assess a language model’s ability to generate code based on provided information like function signature, docstring, body, and unit tests.
For each Python-based programming question, we send it to a Code Llama model deployed on a SageMaker endpoint to get k responses. Next, we run each of the k responses against the integration tests in the HumanEval repository. If any of the k responses passes the integration tests, we count that test case as a success; otherwise, it is counted as a failure. We then repeat the process and report the ratio of successful cases as the final evaluation score, named pass@k. Following standard practice, we set k to 1 in our evaluation, so we generate only one response per question and test whether it passes the integration tests.
The following is sample code that uses the HumanEval repository. You can access the dataset and generate a single response using a SageMaker endpoint. For details, see the notebook in the GitHub repository.
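To make the metric concrete, the following is a small, hypothetical illustration of the pass@1 bookkeeping described above (the helper function and its inputs are assumptions for this example, not part of the HumanEval package):

def pass_at_1(passed_by_task):
    """passed_by_task maps task_id -> whether the single generated response passed its tests."""
    return sum(passed_by_task.values()) / len(passed_by_task)

# Hypothetical example: if 100 of the 164 HumanEval problems are solved,
# pass@1 is about 0.61 (reported as 61.0 when scaled to a percentage).
example = {f"HumanEval/{i}": i < 100 for i in range(164)}
print(round(pass_at_1(example) * 100, 1))  # 61.0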

%pip install human_eval

import json
from tqdm import tqdm

from human_eval.data import write_jsonl, read_problems
from human_eval.evaluation import evaluate_functional_correctness

problems = read_problems()

num_samples_per_task = 1  # value k: number of responses generated for each question

# generate_one_completion() queries the deployed SageMaker endpoint for a single
# completion; it is defined in the notebook (a sketch is shown after this block).
samples = [
    dict(task_id=task_id, completion=generate_one_completion(problems[task_id]["prompt"]))
    for task_id in tqdm(problems)
    for _ in range(num_samples_per_task)
]
write_jsonl("samples.jsonl", samples)

evaluate_functional_correctness("./samples.jsonl")
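The generate_one_completion function used above is defined in the notebook. As a rough, hedged sketch of what such a helper might look like against a deployed SageMaker endpoint (the endpoint name, payload schema, and response keys here are assumptions that depend on the deployed model container; refer to the notebook for the exact version):

from sagemaker.deserializers import JSONDeserializer
from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONSerializer

# Assumed endpoint name; use the endpoint created when you deployed the fine-tuned model.
predictor = Predictor(
    endpoint_name="your-codellama-finetuned-endpoint",
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

def generate_one_completion(prompt: str) -> str:
    # The payload schema and response keys depend on the deployed model container;
    # the parameters mirror the inference settings used in this post.
    response = predictor.predict({
        "inputs": prompt,
        "parameters": {"max_new_tokens": 384, "temperature": 0.2},
    })
    result = response[0] if isinstance(response, list) else response
    return result["generated_text"]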

The following table shows the improvements of the fine-tuned Code Llama Python models over the non-fine-tuned models across different model sizes. To ensure correctness, we also deploy the non-fine-tuned Code Llama models on SageMaker endpoints and run them through the HumanEval evaluation. The pass@1 numbers (the first row in the following table) match the numbers reported in the Code Llama research paper. The inference parameters are consistently set as "parameters": {"max_new_tokens": 384, "temperature": 0.2}.
As we can see from the results, all the fine-tuned Code Llama Python variants show significant improvement over the non-fine-tuned models. In particular, Code Llama Python 70B outperforms its non-fine-tuned counterpart by approximately 12%.

| | 7B Python | 13B Python | 34B | 34B Python | 70B Python |
|---|---|---|---|---|---|
| Pre-trained model performance (pass@1) | 38.4 | 43.3 | 48.8 | 53.7 | 57.3 |
| Fine-tuned model performance (pass@1) | 45.12 | 45.12 | 59.1 | 61.5 | 69.5 |

Now you can try fine-tuning Code Llama models on your own dataset.
Clean up
If you decide that you no longer want to keep the SageMaker endpoint running, you can delete it using AWS SDK for Python (Boto3), AWS Command Line Interface (AWS CLI), or SageMaker console. For more information, see Delete Endpoints and Resources. Additionally, you can shut down the SageMaker Studio resources that are no longer required.
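For example, if the SageMaker Python SDK predictor used earlier is still in scope, a minimal clean-up sketch looks like the following (the Boto3 variant assumes you know the endpoint and endpoint config names, which are placeholders here):

# Using the SageMaker Python SDK predictor created when the model was deployed
finetuned_predictor.delete_model()
finetuned_predictor.delete_endpoint()

# Or, using Boto3 directly (the names below are placeholders)
import boto3

sagemaker_client = boto3.client("sagemaker")
sagemaker_client.delete_endpoint(EndpointName="your-finetuned-endpoint-name")
sagemaker_client.delete_endpoint_config(EndpointConfigName="your-finetuned-endpoint-config-name")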
Conclusion
In this post, we discussed fine-tuning Meta’s Code Llama 2 models using SageMaker JumpStart. We showed that you can use the SageMaker JumpStart console in SageMaker Studio or the SageMaker Python SDK to fine-tune and deploy these models. We also discussed the fine-tuning technique, instance types, and supported hyperparameters. In addition, we outlined recommendations for optimized training based on various tests we carried out. As the evaluation results show, fine-tuning on the Dolphin Coder dataset improves the code generation performance of these models compared to their non-fine-tuned counterparts. As a next step, you can try fine-tuning these models on your own dataset using the code provided in the GitHub repository to test and benchmark the results for your use cases.

About the Authors
Dr. Xin Huang is a Senior Applied Scientist for Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms. He focuses on developing scalable machine learning algorithms. His research interests are in the area of natural language processing, explainable deep learning on tabular data, and robust analysis of non-parametric space-time clustering. He has published many papers in ACL, ICDM, KDD conferences, and Royal Statistical Society: Series A.
Vishaal Yalamanchali is a Startup Solutions Architect working with early-stage generative AI, robotics, and autonomous vehicle companies. Vishaal works with his customers to deliver cutting-edge ML solutions and is personally interested in reinforcement learning, LLM evaluation, and code generation. Prior to AWS, Vishaal was an undergraduate at UCI, focused on bioinformatics and intelligent systems.
Meenakshisundaram Thandavarayan works for AWS as an AI/ ML Specialist. He has a passion to design, create, and promote human-centered data and analytics experiences. Meena focuses on developing sustainable systems that deliver measurable, competitive advantages for strategic customers of AWS. Meena is a connector and design thinker, and strives to drive businesses to new ways of working through innovation, incubation, and democratization.
Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker built-in algorithms and helps develop machine learning algorithms. He got his PhD from University of Illinois Urbana-Champaign. He is an active researcher in machine learning and statistical inference, and has published many papers in NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP conferences.

This AI Paper from the University of Oxford Proposes Magi: A Machine Learning Tool to Make Manga Accessible to the Visually Impaired

In storytelling, Japanese comics, known as Manga, have carved out a significant niche, captivating audiences worldwide with their intricate plots and distinctive art style. Despite their global appeal, a crucial segment of potential readers remains largely underserved: individuals with visual impairments. For them, the visual-centric nature of Manga creates an inaccessible realm despite the rich narratives within these pages.

The primary challenge lies in translating visually rich content into a format accessible to those who cannot see it. Manga relies heavily on intertwined visual elements and text, making the experience inherently visual. This reliance means that individuals with visual impairments often cannot engage with the stories, characters, and worlds created by Manga artists.

Current solutions to make Manga accessible are far from ideal, primarily because they rely on manual transcriptions or audio descriptions, which are labor-intensive and cannot scale effectively. This gap highlights a critical need for a more efficient, automated method to unlock Manga’s potential for all audiences, irrespective of their visual capabilities.

A research team at the University of Oxford has developed an advanced tool named Magi, representing a breakthrough in making Manga accessible to visually impaired readers. Magi is a gateway to stories previously locked behind visual barriers, offering all readers a new level of engagement.

The research method can be summarized in the following points:

Magi’s Approach: At its core, Magi utilizes a comprehensive model to navigate the Manga pages intelligently. It identifies and interprets components such as panels, characters, and text blocks.

Character Clustering: A remarkable feature of Magi is its ability to recognize and cluster characters, distinguishing them by identity across the narrative.

Dialogue Association: Beyond character recognition, Magi adeptly associates dialogues with their respective speakers, preserving the narrative’s integrity.

Reading Order: It orders text boxes to reflect the correct sequence, mirroring the intended reading experience and ensuring the story is delivered coherently.

Through rigorous testing, Magi demonstrated superior capabilities in detecting and clustering characters and associating text with the correct speakers, outperforming existing methods. This efficiency showcases the tool’s precision and its potential to transform Manga reading into an inclusive activity that visually impaired individuals can enjoy.

This research and development effort underscores a significant advancement in accessibility technologies. By leveraging sophisticated algorithms and machine learning, Magi opens up a previously inaccessible world of Manga to those who cannot see. The implications of this innovation extend beyond Manga. It sets a precedent for how technology can bridge gaps in entertainment, making it universally accessible.

In conclusion, the development of Magi helps democratize access to cultural and entertainment content. It underscores a shift toward inclusivity, where barriers to enjoyment are dismantled and stories become universally accessible. This research not only highlights the potential of artificial intelligence in enhancing accessibility but also serves as a call to action for further innovation in this field. As technology evolves, the hope is that more doors will open, allowing everyone to explore the vast and varied landscapes of entertainment and culture regardless of physical limitations. The journey of Magi from concept to implementation illuminates the path toward a more inclusive world where the joy of stories knows no bounds.


LocalMamba: Revolutionizing Visual Perception with Innovative State Space Models for Enhanced Local Dependency Capture

In recent years, the field of computer vision has witnessed remarkable progress, pushing the boundaries of how machines interpret complex visual information. One pivotal challenge in this domain is precisely interpreting intricate image details, which demands a nuanced understanding of both global and local visual cues. Traditional models, including Convolutional Neural Networks (CNNs) and Vision Transformers, have made significant progress, yet they often struggle to effectively balance detailed local content with the broader image context, an essential aspect of tasks requiring fine-grained visual discrimination.

Researchers from SenseTime Research, The University of Sydney, and the University of Science and Technology of China presented LocalMamba, which was designed to refine visual data processing. By adopting a unique scanning strategy that divides images into distinct windows, LocalMamba allows for a more focused examination of local details while maintaining an awareness of the image’s overall structure. This strategic division enables the model to navigate through the complexities of visual data more efficiently, ensuring that both broad and minute details are captured with equal precision.

LocalMamba’s innovative methodology extends beyond traditional scanning techniques by integrating a dynamic scanning direction search. This search optimizes the model’s focus, allowing it to highlight crucial features within each window adaptively. Such adaptability ensures that LocalMamba understands the intricate relationships between image elements, setting it apart from conventional methods. The superiority of LocalMamba is underscored through rigorous testing across various benchmarks, where it demonstrates marked performance improvements. LocalMamba significantly surpasses existing models in image classification tasks, showcasing its ability to deliver nuanced and comprehensive image analysis.

LocalMamba’s versatility is evident across a spectrum of practical applications, from object detection to semantic segmentation. In each of these areas, LocalMamba sets new standards of accuracy and efficiency. Its success lies in harmonizing the capture of local image features with a global understanding of the image. This balance is crucial for applications requiring detailed recognition capabilities, such as autonomous driving, medical imaging, and content-based image retrieval.

LocalMamba’s approach opens up new avenues for future research in visual state space models, highlighting the untapped potential of optimizing scanning directions. By effectively leveraging local scanning within distinct windows, LocalMamba enhances the model’s capacity to interpret visual data, offering insights into how machines can better mimic human visual perception. This breakthrough suggests new avenues for exploration in the quest to develop more intelligent and capable visual processing systems.

In conclusion, LocalMamba marks a significant leap forward in the evolution of computer vision models. Its core innovation lies in the ability to intricately analyze visual data by emphasizing local details without compromising the global context. This dual focus ensures a comprehensive understanding of images, facilitating superior performance across various tasks. The research team’s contributions extend beyond the immediate benefits of improved accuracy and efficiency. They offer a blueprint for future advancements in the field, demonstrating the critical role of scanning mechanisms in enhancing the capabilities of visual processing models. LocalMamba sets new benchmarks in computer vision and inspires continued innovation toward more intelligent machine vision systems.


The Dawn of Grok-1: A Leap Forward in AI Accessibility

In an era where the democratization of artificial intelligence technology stands as a pivotal turning point for innovation across industries, xAI has embarked on a monumental stride. Today marks the open release of Grok-1, a behemoth in the landscape of AI, wielding a staggering 314 billion parameters. This Mixture-of-Experts model, which emerged from the fervent efforts of xAI’s dedicated team, represents not just a leap in AI capabilities but a testament to the commitment towards open science and technology.

Unveiling Grok-1: The Architectural Marvel

At the heart of Grok-1 lies a Mixture-of-Experts (MoE) architecture, a paradigm designed to harness the collective intelligence of an ensemble of models, each with its unique expertise. With 314 billion parameters, Grok-1 stands among the largest language models ever constructed. Unlike traditional models that utilize every parameter for each task, Grok-1 operates on a principle where only 25% of its weights are active for a given token, ensuring unparalleled efficiency and agility in processing vast swathes of data.
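To illustrate the general idea of sparse activation in a Mixture-of-Experts layer (a generic, simplified PyTorch sketch for intuition only; it is not Grok-1’s architecture or code, and the layer sizes and top-2 routing choice are assumptions), a router can select a small subset of experts per token so that only a fraction of the layer’s parameters participates in each forward pass:

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    """Generic top-k Mixture-of-Experts layer (illustrative only, not Grok-1)."""

    def __init__(self, d_model: int = 64, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(num_experts)
        )
        self.router = nn.Linear(d_model, num_experts)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). The router picks top_k experts per token, so only
        # a fraction of the expert parameters is active for any given token.
        scores = self.router(x)                             # (tokens, num_experts)
        weights, indices = scores.topk(self.top_k, dim=-1)  # per-token expert choices
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

With 8 experts and top-2 routing, for instance, only about a quarter of the expert parameters are touched per token, which is the flavor of sparsity behind the figure quoted above.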

Crafted from Scratch: The xAI Odyssey

The journey to Grok-1’s inception is a narrative of innovation and perseverance. Trained from scratch by xAI, this colossal model was meticulously assembled using a custom training stack built upon JAX and Rust, a testament to the cutting-edge approach adopted by xAI’s engineers. Completed in October 2023, Grok-1’s training phase was a rigorous process that not only sculpted its capabilities but set a new benchmark for AI development.

Open Science Ethos: Empowering the Community

In an unprecedented move, xAI has released both the weights and the architecture of Grok-1 under the Apache 2.0 license, a gesture that underscores a profound commitment to open science. This decision paves the way for researchers, developers, and entrepreneurs worldwide to explore, enhance, and tailor Grok-1 to a myriad of applications. From academic research to industry innovation, the potential for Grok-1 to catalyze progress is boundless.

Navigating the Future with Grok-1

As we stand on the cusp of a new frontier in AI, Grok-1 emerges not just as a tool but as a beacon of open innovation. Its release invites us to reimagine the possibilities of language models in solving complex challenges, understanding human language with unprecedented depth, and driving forward the engines of creativity and efficiency. For those eager to embark on this journey, the path has been laid clear, with instructions available on GitHub for anyone ready to wield the power of Grok-1.

In the unfolding narrative of artificial intelligence, the release of Grok-1 by xAI marks a chapter where the walls guarding knowledge and technology crumble, heralding an age of openness and collaboration. As we explore the vast expanse of possibilities unlocked by Grok-1, one thing is clear: the future of AI is not just about the machines we build but the community we foster around them.