Migrate Amazon SageMaker Data Wrangler flows to Amazon SageMaker Canvas

Amazon SageMaker Data Wrangler provides a visual interface to streamline and accelerate data preparation for machine learning (ML), which is often the most time-consuming and tedious task in ML projects. Amazon SageMaker Canvas is a low-code/no-code visual interface for building and deploying ML models without writing code. Based on customer feedback, we have brought the advanced ML-specific data preparation capabilities of SageMaker Data Wrangler into SageMaker Canvas, providing users with an end-to-end, no-code workspace for preparing data and for building and deploying ML models.
By abstracting away much of the complexity of the ML workflow, SageMaker Canvas enables you to prepare data, then build or use a model to generate highly accurate business insights without writing code. Additionally, preparing data in SageMaker Canvas offers many enhancements, such as page loads up to 10 times faster, a natural language interface for data preparation, the ability to view the data size and shape at every step, and improved replace and reorder transforms to iterate on a data flow. Finally, you can create a model with a single click in the same interface, or create a SageMaker Canvas dataset to fine-tune foundation models (FMs).
This post demonstrates how you can bring your existing SageMaker Data Wrangler flows—the instructions created when building data transformations—from SageMaker Studio Classic to SageMaker Canvas. We provide an example of moving files from SageMaker Studio Classic to Amazon Simple Storage Service (Amazon S3) as an intermediate step before importing them into SageMaker Canvas.
Solution overview
The high-level steps are as follows:

Open a terminal in SageMaker Studio and copy the flow files to Amazon S3.
Import the flow files into SageMaker Canvas from Amazon S3.

Prerequisites
In this example, we use a folder called data-wrangler-classic-flows as a staging folder for migrating flow files to Amazon S3. It is not necessary to create a migration folder, but in this example, the folder was created using the file system browser portion of SageMaker Studio Classic. After you create the folder, take care to move and consolidate relevant SageMaker Data Wrangler flow files together. In the following screenshot, three flow files necessary for migration have been moved into the folder data-wrangler-classic-flows, as seen in the left pane. One of these files, titanic.flow, is opened and visible in the right pane.

Copy flow files to Amazon S3
To copy the flow files to Amazon S3, complete the following steps:

To open a new terminal in SageMaker Studio Classic, on the File menu, choose Terminal.
With a new terminal open, you can supply the following commands to copy your flow files to the Amazon S3 location of your choosing (replacing NNNNNNNNNNNN with your AWS account number):

cd data-wrangler-classic-flows
target="s3://sagemaker-us-west-2-NNNNNNNNNNNN/data-wrangler-classic-flows/"
aws s3 sync . $target --exclude "*.*" --include "*.flow"

The following screenshot shows an example of what the Amazon S3 sync process should look like. You will get a confirmation after all files are uploaded. You can adjust the preceding code to meet your unique input folder and Amazon S3 location needs. If you don’t want to create a folder, when you enter the terminal, simply skip the change directory (cd) command, and all flow files on your entire SageMaker Studio Classic file system will be copied to Amazon S3, regardless of origin folder.

After you upload the files to Amazon S3, you can validate that they have been copied using the Amazon S3 console. In the following screenshot, we see the original three flow files, now in an S3 bucket.
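If you prefer to script this check rather than use the console, a minimal boto3 sketch like the following lists the uploaded flow files; it assumes the same example bucket and prefix used in the sync command above.

import boto3

# List the .flow files copied to the staging prefix
s3 = boto3.client("s3")
response = s3.list_objects_v2(
    Bucket="sagemaker-us-west-2-NNNNNNNNNNNN",  # replace with your bucket
    Prefix="data-wrangler-classic-flows/",
)
for obj in response.get("Contents", []):
    if obj["Key"].endswith(".flow"):
        print(obj["Key"], obj["Size"])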

Import Data Wrangler flow files into SageMaker Canvas
To import the flow files into SageMaker Canvas, complete the following steps:

On the SageMaker Studio console, choose Data Wrangler in the navigation pane.
Choose Import data flows.
For Select a data source, choose Amazon S3.
For Input S3 endpoint, enter the Amazon S3 location you used earlier to copy files from SageMaker Studio to Amazon S3, then choose Go. You can also navigate to the Amazon S3 location using the browser below.
Select the flow files to import, then choose Import.

After you import the files, the SageMaker Data Wrangler page will refresh to show the newly imported files, as shown in the following screenshot.

Use SageMaker Canvas for data transformation with SageMaker Data Wrangler
Choose one of the flows (for this example, we choose titanic.flow) to launch the SageMaker Data Wrangler transformation.

Now you can add analyses and transformations to the data flow using a visual interface (Accelerate data preparation for ML in Amazon SageMaker Canvas) or natural language interface (Use natural language to explore and prepare data with a new capability of Amazon SageMaker Canvas).
When you’re happy with the data, choose the plus sign and choose Create model, or choose Export to export the dataset to build and use ML models.

Alternate migration method
This post has provided guidance on using Amazon S3 to migrate SageMaker Data Wrangler flow files from a SageMaker Studio Classic environment. Phase 3: (Optional) Migrate data from Studio Classic to Studio provides a second method that uses your local machine to transfer the flow files. Furthermore, you can download single flow files from the SageMaker Studio tree control to your local machine, then import them manually in SageMaker Canvas. Choose the method that suits your needs and use case.
Clean up
When you’re done, shut down any running SageMaker Data Wrangler applications in SageMaker Studio Classic. To save costs, you can also remove any flow files from the SageMaker Studio Classic file browser, which is an Amazon Elastic File System (Amazon EFS) volume. You can also delete any of the intermediate files in Amazon S3. After the flow files are imported into SageMaker Canvas, the files copied to Amazon S3 are no longer needed.
You can log out of SageMaker Canvas when you’re done, then relaunch it when you’re ready to use it again.

Conclusion
Migrating your existing SageMaker Data Wrangler flows to SageMaker Canvas is a straightforward process that allows you to use the advanced data preparations you’ve already developed while taking advantage of the end-to-end, low-code no-code ML workflow of SageMaker Canvas. By following the steps outlined in this post, you can seamlessly transition your data wrangling artifacts to the SageMaker Canvas environment, streamlining your ML projects and enabling business analysts and non-technical users to build and deploy models more efficiently.
Start exploring SageMaker Canvas today and experience the power of a unified platform for data preparation, model building, and deployment!

About the Authors
Charles Laughlin is a Principal AI Specialist at Amazon Web Services (AWS). Charles holds an MS in Supply Chain Management and a PhD in Data Science. Charles works in the Amazon SageMaker service team where he brings research and voice of the customer to inform the service roadmap. In his work, he collaborates daily with diverse AWS customers to help transform their businesses with cutting-edge AWS technologies and thought leadership.
Dan Sinnreich is a Sr. Product Manager for Amazon SageMaker, focused on expanding no-code / low-code services. He is dedicated to making ML and generative AI more accessible and applying them to solve challenging problems. Outside of work, he can be found playing hockey, scuba diving, and reading science fiction.
Huong Nguyen is a Sr. Product Manager at AWS. She is leading the ML data preparation for SageMaker Canvas and SageMaker Data Wrangler, with 15 years of experience building customer-centric and data-driven products.
Davide Gallitelli is a Specialist Solutions Architect for AI/ML in the EMEA region. He is based in Brussels and works closely with customers throughout Benelux. He has been a developer since he was very young, starting to code at the age of 7. He started learning AI/ML during his later years of university, and has been in love with it ever since.

Use IP-restricted presigned URLs to enhance security in Amazon SageMaker Ground Truth

Amazon SageMaker Ground Truth significantly reduces the cost and time required for labeling data by integrating human annotators with machine learning to automate the labeling process. You can use SageMaker Ground Truth to create labeling jobs, which are workflows where data objects (such as images, videos, or documents) need to be annotated by human workers. These labeling jobs are distributed among a workteam—a group of workers assigned to perform the annotations. To access the data objects they need to label, workers are provided with Amazon S3 presigned URLs.
A presigned URL is a temporary URL that grants time-limited access to an Amazon Simple Storage Service (Amazon S3) object. In the context of SageMaker Ground Truth, these presigned URLs are generated using the grant_read_access Liquid filter and embedded into the task templates. Workers can then use these URLs to directly access the necessary files, such as images or documents, in their web browsers for annotation purposes.
While presigned URLs offer a convenient way to grant temporary access to S3 objects, sharing these URLs with people outside of the workteam can lead to unintended access of those objects. To mitigate this risk and enhance the security of SageMaker Ground Truth labeling tasks, we have introduced a new feature that adds an additional layer of security by restricting access to the presigned URLs to the worker’s IP address or virtual private cloud (VPC) endpoint from which they access the labeling task. In this blog post, we show you how to enable this feature, allowing you to enhance your data security as needed, and outline the success criteria for this feature, including the scenarios where it will be most beneficial.
Prerequisites
Before you get started configuring IP-restricted presigned URLs, the following resources can help you understand the background concepts:

Amazon S3 presigned URL: This documentation covers the use of Amazon S3 presigned URLs, which provide temporary access to objects. Understanding how presigned URLs work will be beneficial.
Use Amazon SageMaker Ground Truth to label data: This guide explains how to use SageMaker Ground Truth for data labeling tasks, including setting up workteams and workforces. Familiarity with these concepts will be helpful when configuring IP restrictions for your workteams.

Introducing IP-restricted presigned URLs
Working closely with our customers, we recognized the need for enhanced security posture and stricter access controls to presigned URLs. So, we introduced a new feature that uses AWS global condition context keys aws:SourceIp and aws:VpcSourceIp to allow customers to restrict presigned URL access to specific IP addresses or VPC endpoints. By incorporating AWS Identity and Access Management (IAM) policy constraints, you can now restrict presigned URLs to only be accessible from an IP address or VPC endpoint of your choice. This IP-based access control effectively locks down the presigned URL to the worker’s location, mitigating the risk of unauthorized access or unintended sharing.
Benefits of the new feature
This update brings several significant security benefits to SageMaker Ground Truth:

Enhanced data privacy: These IP restrictions restrict presigned URLs to only be accessible from customer-approved locations, such as corporate VPNs, workers’ home networks, or designated VPC endpoints. Although the presigned URLs are pre-authenticated, this feature adds an additional layer of security by verifying the access location and locking the URL to that location until the task is completed.
Reduced risk of unauthorized access: Enforcing IP-based access controls minimizes the risk of data being accessed from unauthorized locations and mitigates the risk of data sharing outside the worker’s approved access network. This is particularly important when dealing with sensitive or confidential data.
Flexible security options: You can apply these restrictions in either VPC or non-VPC settings, allowing you to tailor security measures to your organization’s specific needs.
Auditing and compliance: By locking down presigned URLs to specific IP addresses or VPC endpoints, you can more easily track and audit access to your organization’s data, helping achieve compliance with internal policies and external regulations.
Seamless integration: This new feature seamlessly integrates with existing SageMaker Ground Truth workflows, providing enhanced security without disrupting established labeling processes or requiring significant changes to existing infrastructure.

By introducing IP-Restricted presigned URLs, SageMaker Ground Truth empowers you with greater control over data access, so sensitive information remains accessible only to authorized workers within approved locations.
Configuring IP-restricted presigned URLs for SageMaker Ground Truth
The new IP restriction feature for presigned URLs in SageMaker Ground Truth can be enabled through the SageMaker API or the AWS Command Line Interface (AWS CLI). Before we go into the configuration of this new feature, let’s look at how you can create and update workteams today using the AWS CLI. You can also perform these operations through the SageMaker API using the AWS SDK.
Here’s an example of creating a new workteam using the create-workteam command:
aws sagemaker create-workteam \
    --description "A team for image labeling tasks" \
    --workforce-name "default" \
    --workteam-name "MyWorkteam" \
    --member-definitions '[{
        "CognitoMemberDefinition": {
            "ClientId": "exampleclientid",
            "UserGroup": "sagemaker-groundtruth-user-group",
            "UserPool": "us-west-2_examplepool"
        }
    }]'

To update an existing workteam, you use the update-workteam command:
aws sagemaker update-workteam \
    --workteam-name "MyWorkteam" \
    --description "Updated description for image labeling tasks"

Note that these examples only show a subset of the available parameters for the create-workteam and update-workteam APIs. You can find detailed documentation and examples in the SageMaker Ground Truth Developer Guide.
Enabling IP restrictions for presigned URLs
With the new IP restriction feature, you can now configure IP-based access constraints specific to each workteam when creating a new workteam or modifying an existing one. Here’s how you can enable these restrictions:

When creating or updating a workteam, you can specify a WorkerAccessConfiguration object, which defines access constraints for the workers in that workteam.
Within the WorkerAccessConfiguration, you can include an S3Presign object, which allows you to set access configurations for the presigned URLs used by the workers. Currently, only IamPolicyConstraints can be added to the S3Presign object. SageMaker Ground Truth provides two Liquid filters that you can use in your custom worker task templates to generate presigned URLs:

grant_read_access: This filter generates a presigned URL for the specified S3 object, granting temporary read access. The command will look like:

<!-- Using grant_read_access filter -->
<img src="{{ s3://bucket-name/path/to/image.jpg | grant_read_access }}"/>

s3_presign: This new filter serves the same purpose as grant_read_access but makes it clear that the generated URL is subject to the S3Presign configuration defined for the workteam. The command will look like:

<!-- Using s3_presign filter (equivalent) -->
<img src="{{ s3://bucket-name/path/to/image.jpg | s3_presign }}"/>

The S3Presign object supports IamPolicyConstraints, where you can enable or disable the SourceIp and VpcSourceIp constraints:

SourceIp: When enabled, workers can access presigned URLs only from the specified IP addresses or ranges.
VpcSourceIp: When enabled, workers can access presigned URLs only from the specified VPC endpoints within your AWS account.

You can call the SageMaker ListWorkteams or DescribeWorkteam APIs to view workteams’ metadata, including the WorkerAccessConfiguration.
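For example, the following is a minimal boto3 sketch of that check; the workteam name matches the example used below, and the WorkerAccessConfiguration field appears in the response only when your SDK version includes this feature.

import boto3

sagemaker = boto3.client("sagemaker")

# Inspect a workteam and print its worker access configuration, if present
workteam = sagemaker.describe_workteam(WorkteamName="exampleworkteam")["Workteam"]
print(workteam.get("WorkerAccessConfiguration"))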
Let’s say you want to create or update a workteam so that presigned URLs will be restricted to the public IP address of the worker who originally accessed it.
Create workteam:
aws sagemaker create-workteam \
    --description "An example workteam with S3 presigned URLs restricted" \
    --workforce-name "default" \
    --workteam-name "exampleworkteam" \
    --member-definitions '[{
        "CognitoMemberDefinition": {
            "ClientId": "exampleclientid",
            "UserGroup": "sagemaker-groundtruth-user-group",
            "UserPool": "us-west-2_examplepool"
        }
    }]' \
    --worker-access-configuration '{
        "S3Presign": {
            "IamPolicyConstraints": {
                "SourceIp": "Enabled",
                "VpcSourceIp": "Disabled"
            }
        }
    }'

Update workteam:
aws sagemaker update-workteam \
    --workteam-name "existingworkteam" \
    --worker-access-configuration '{
        "S3Presign": {
            "IamPolicyConstraints": {
                "SourceIp": "Enabled",
                "VpcSourceIp": "Disabled"
            }
        }
    }'
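If you manage workteams from Python rather than the AWS CLI, the equivalent update is a single boto3 call; this sketch assumes your installed SDK version already exposes the WorkerAccessConfiguration parameter.

import boto3

sagemaker = boto3.client("sagemaker")

# Restrict presigned URLs to the public IP address from which each worker opens the task
sagemaker.update_workteam(
    WorkteamName="existingworkteam",
    WorkerAccessConfiguration={
        "S3Presign": {
            "IamPolicyConstraints": {
                "SourceIp": "Enabled",
                "VpcSourceIp": "Disabled",
            }
        }
    },
)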

Success criteria
While the IP-restricted presigned URLs feature provides enhanced security, there are scenarios where it might not be suitable. Understanding these limitations can help you make an informed decision about using the feature and verify that it aligns with your organization’s security needs and network configurations.
IP-restricted presigned URLs are effective in scenarios where there’s a consistent IP address used by the worker accessing SageMaker Ground Truth and the S3 object. For example, if a worker accesses labeling tasks from a stable public IP address, such as an office network with a fixed IP address, the IP restriction will provide access with enhanced security. Similarly, when a worker accesses both SageMaker Ground Truth and S3 objects through the same VPC endpoint, the IP restriction will verify that the presigned URL is only accessible from within this VPC. In both scenarios, the consistent IP address enables the IP-based access controls to function correctly, providing an additional layer of security.
Scenarios where IP-restricted presigned URLs aren’t effective

Scenario: Asymmetric VPC endpoints
Description: SageMaker Ground Truth is accessed through a public internet connection while Amazon S3 is accessed through a VPC endpoint, or vice versa.
Example: A worker accesses SageMaker Ground Truth through the public internet but S3 through a VPC endpoint.
Exit criteria: Verify that both SageMaker Ground Truth and S3 are accessed either entirely through the public internet or entirely through the same VPC endpoint.

Scenario: Network Address Translation (NAT) layers
Description: NAT layers can alter the source IP address of requests, causing IP mismatches. Issues can arise from dynamically assigned IP addresses or asymmetric configurations.
Examples:
N-to-M IP translation, where multiple internal IP addresses are translated to multiple public IP addresses.
A NAT gateway with multiple public IP addresses assigned to it, which can cause requests to appear to come from different IP addresses.
Shared IP addresses, where multiple users' traffic is routed through a single public IP address, making it difficult to enforce IP-based restrictions effectively.
Exit criteria: Verify that the NAT gateway is configured to preserve the source IP address. Validate the NAT configuration for consistency when accessing both SageMaker Ground Truth and S3 resources.

Scenario: Use of VPNs
Description: VPNs change the outgoing IP address, leading to potential access issues with IP-restricted presigned URLs.
Example: A worker uses a split-tunnel VPN that presents different IP addresses for requests to Ground Truth and S3, so access might be denied.
Exit criteria: Disable the VPN or use a full-tunnel VPN that provides a consistent IP address for all requests.

Interface endpoints aren’t supported by the grant_read_access feature because of their inability to resolve public DNS names. This limitation is orthogonal to the IP restrictions and should be considered when configuring your network setup for accessing S3 objects with presigned URLs. In such cases, use the S3 Gateway endpoint when accessing S3 to verify compatibility with the public DNS names generated by grant_read_access.
Using S3 access logs for debugging
To debug issues related to IP-restricted presigned URLs, S3 access logs can provide valuable insights. By enabling access logging for your S3 bucket, you can track every request made to your S3 objects, including the IP addresses from which the requests originate. This can help you identify:

Mismatches between expected and actual IP addresses
Dynamic IP addresses or VPNs causing access issues
Unauthorized access from unexpected locations

To debug using S3 access logs, follow these steps:

Enable S3 access logging: Configure your bucket to deliver access logs to another bucket or a logging service such as Amazon CloudWatch Logs.
Review log files: Analyze the log files to identify patterns or anomalies in IP addresses, request timestamps, and error codes.
Look for IP address changes: If you observe frequent changes in IP addresses within the logs, it might indicate that the worker’s IP address is dynamic or altered by a VPN or proxy.
Check for NAT layer modifications: See if NAT layers are modifying the source IP address by checking the x-forwarded-for header in the log files.
Verify authorized access: Confirm that requests are coming from approved and consistent IP addresses by checking the Remote IP field in the log files.

By following these steps and analyzing the S3 access logs, you can validate that the presigned URLs are accessed only from approved and consistent IP addresses.
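As a starting point for this analysis, the following Python sketch tallies the remote IP for each request from downloaded copies of the log files; it assumes the standard space-delimited S3 server access log format, in which the remote IP follows the bracketed timestamp.

import glob
from collections import Counter

remote_ips = Counter()
for path in glob.glob("access-logs/*"):  # local copies of the S3 access log objects
    with open(path) as f:
        for line in f:
            fields = line.split()
            # fields: bucket owner, bucket, [timestamp, +0000], remote IP, requester, ...
            if len(fields) > 4:
                remote_ips[fields[4]] += 1

for ip, count in remote_ips.most_common():
    print(ip, count)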
Conclusion
The introduction of IP-restricted presigned URLs in Amazon SageMaker Ground Truth significantly enhances the security of data accessed through the service. By allowing you to restrict access to specific IP addresses or VPC endpoints, this feature helps facilitate more fine-tuned control of presigned URLs. It provides organizations with added protection for their sensitive data, offering a valuable option for those with stringent security requirements. We encourage you to explore this new security feature to protect your organization’s data and enhance the overall security of your labeling workflows. To get started with SageMaker Ground Truth, visit Getting Started. To implement IP restrictions on presigned URLs as part of your workteam setup, refer to the CreateWorkteam and UpdateWorkteam API documentation. Follow the guidance provided in this blog to configure these security measures effectively. For more information or assistance, contact your AWS account team or visit the SageMaker community forums.

About the Authors
Sundar Raghavan is an AI/ML Specialist Solutions Architect at AWS, helping customers build scalable and cost-efficient AI/ML pipelines with Human in the Loop services. In his free time, Sundar loves traveling, sports and enjoying outdoor activities with his family.
Michael Borde is a lead software engineer at Amazon AI, where he has been for seven years. He previously studied mathematics and computer science at the University of Chicago. Michael is passionate about cloud computing, distributed systems design, and digital privacy & security. After work, you can often find Michael putzing around the local powerlifting gym in Capitol Hill.
Jacky Shum is a Software Engineer at AWS in the SageMaker Ground Truth team. He works to help AWS customers leverage machine learning applications, including prior work on ML-based fraud detection with Amazon Fraud Detector.
Rohith Kodukula is a Software Development Engineer on the SageMaker Ground Truth team. In his free time he enjoys staying active and reading up on anything that he finds mildly interesting (most things really).
Abhinay Sandeboina is an Engineering Manager at AWS Human In The Loop (HIL). He has been at AWS for over 2 years and his teams are responsible for managing ML platform services. He has a decade of experience in software/ML engineering, building infrastructure platforms at scale. Prior to AWS, he worked in various engineering management roles at Zillow and Capital One.

Unlock the power of structured data for enterprises using natural language

One of the most common applications of generative artificial intelligence (AI) and large language models (LLMs) in an enterprise environment is answering questions based on the enterprise’s knowledge corpus. Pre-trained foundation models (FMs) excel at natural language understanding (NLU) tasks, including summarization, text generation, and question answering across a wide range of topics. However, they often struggle to provide accurate answers without hallucinations and fall short when addressing questions about content that wasn’t included in their training data. Furthermore, FMs are trained with a point-in-time snapshot of data and have no inherent ability to access fresh data at inference time; therefore, they might provide responses that are incorrect or inadequate.
We face a fundamental challenge with enterprise data—overcoming the disconnect between natural language and structured data. Natural language is ambiguous and imprecise, whereas data adheres to rigid schemas. For example, SQL queries can be complex and unintuitive for non-technical users. Handling complex queries involving multiple tables, joins, and aggregations makes it difficult to interpret user intent and translate it into correct SQL operations. Domain-specific terminology further complicates the mapping process. Another challenge is accommodating the linguistic variations users employ to express the same requirement. Effectively managing synonyms, paraphrases, and alternative phrasings is important. The inherent ambiguity of natural language can also result in multiple interpretations of a single query, making it difficult to accurately understand the user’s precise intent.
To bridge this gap, you need advanced natural language processing (NLP) to map user queries to database schema, tables, and operations. In this architecture, Amazon Q Business acts as an intermediary, translating natural language into precise SQL queries. You can simply ask questions like “What were the sales for outdoor gear in Q3 2023?” Amazon Q Business analyzes intent, accesses data sources, and generates the SQL query. This simplifies data access for your non-technical users and streamlines workflows for professionals, allowing them to focus on higher-level tasks.
In this post, we discuss an architecture to query structured data using Amazon Q Business, and build out an application to query cost and usage data in Amazon Athena with Amazon Q Business. Amazon Q Business can create SQL queries to your data sources when provided with the database schema, additional metadata describing the columns and tables, and prompting instructions. You can extend this architecture to use additional data sources, query validation, and prompting techniques to cover a wider range of use cases.
Solution overview
The following figure represents the high-level architecture of the proposed solution. Steps 3 and 4 augment the AWS IAM Identity Center integration with Amazon Q Business for an authorization flow. In this architecture, we use Amazon Cognito for user authentication and as a trusted token issuer for IAM Identity Center. You can also use your own identity provider as a trusted token issuer as long as it supports OpenID Connect (OIDC).

The workflow includes the following steps:

The user initiates the interaction with the Streamlit application, which is accessible through an Application Load Balancer, acting as the entry point.
The application prompts the user to authenticate using their Amazon Cognito credentials, maintaining secure access.
The application exchanges the token obtained from Amazon Cognito for an IAM Identity Center token, granting the necessary scope to interact with Amazon Q Business.
Using the IAM Identity Center token, the application assumes an AWS Identity and Access Management (IAM) role and retrieves an AWS session from AWS Security Token Service (AWS STS), enabling authorized communication with Amazon Q Business.
Based on the user’s natural language query, the application formulates relevant prompts and metadata, which are then submitted to the chat_sync API of Amazon Q Business. In response, Amazon Q Business provides an appropriate Athena query to run.
The application runs the Athena query received from Amazon Q Business, and the resulting data is displayed on the web application’s UI.

Querying Amazon Q Business LLMs directly
As explained in the response settings for Amazon Q Business, there are different options to generate responses that allow you to either use your enterprise data, use LLMs directly, or fall back on the LLMs if the answer is not found in your enterprise data. Along with the global controls for response settings, you need to specify which chatMode you want to use based on your specific use case. If you want to bypass Retrieval Augmented Generation (RAG) and use plain text in the context window, you should use CREATOR_MODE. Alternatively, RAG is also bypassed when you upload files directly in the context window.
If you just use text in the context window and call Amazon Q Business APIs without switching to CREATOR_MODE, that may break your use case in the future if you add content to the index (RAG). In this use case, because we’re not indexing any data and using schemas as attachments in the API call to Amazon Q Business, RAG is automatically bypassed and the response is generated directly from the LLMs. Another reason to use attachments for this use case is that for the chatSync API, userMessage has a maximum length of 7,000, which can be surpassed depending on how large your text is in the context window.
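As an illustration, a call from the application to Amazon Q Business could look like the following boto3 sketch; the application ID, schema file name, and prompt text are placeholders, and you should confirm the exact parameters against the ChatSync API reference for your SDK version.

import boto3

qbusiness = boto3.client("qbusiness")

# Send the prompt, attaching the schema and data dictionary so RAG is bypassed
with open("cur_schema.txt", "rb") as f:
    schema_bytes = f.read()

response = qbusiness.chat_sync(
    applicationId="your-q-business-application-id",  # placeholder
    userMessage="Write a SQL query for: What was the total spend for ElasticSearch last year?",
    attachments=[{"name": "cur_schema.txt", "data": schema_bytes}],
)
print(response["systemMessage"])  # the generated SQL returned by Amazon Q Business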
Data query workflow
Let’s look at the prompts, query generation, and Athena query in detail. We use Athena as the data store in this post. Users enter natural language questions into a web application built with Streamlit. Amazon Q Business converts the natural language questions to valid SQL for Athena using the prompting instructions, the database schema, and data dictionary that are provided as context to the LLM. The generated SQL is sent to Athena to run as a query, and the returned data is displayed to the user in the Streamlit application. The following diagram illustrates this workflow.

These are the various components to this data flow, as numbered in the diagram:

User intent
Prompt builder
SQL query generator
Running the query
Query results

In the following sections, we look at each component in more detail.
User intent
The user intent or your inquiry is the starting point of the process. It can be in natural language, such as “What was the total spend for ElasticSearch last year?” The user’s input serves as the basis for the subsequent steps in the workflow.
Prompt builder
The prompt builder component plays a crucial role in bridging the gap between your natural language input and the structured data format required for SQL querying. It augments your question with relevant information from the table schema and data dictionary to provide context for the query generation process. This step involves the following sub-tasks:

Natural language processing – NLP techniques are employed to analyze and understand your questions. This includes steps like tokenization and dependency parsing to extract the intent and relevant entities from the natural language input.
Entity recognition – Named entity recognition (NER) is used to identify and classify relevant entities mentioned in your question, such as product names, dates, or region. This step helps map your input to the corresponding data elements in the database schema.
Intent mapping – The prompt builder maps your intent, extracted from the NLP analysis, to the appropriate data structures and operations required to fulfill the query. This mapping process uses the table schema and data dictionary to establish connections between your natural language questions and the database elements. The output of the prompt builder is a structured representation of your question, augmented with the necessary context from the database schema and data dictionary. This structured representation serves as input for the next step, SQL query generation.

The following is an example prompt for “What was the total spend for ElasticSearch last year?”

You will not respond to gibberish, random character sequences, or prompts that do not make logical sense.
If the input does not make sense or is outside the scope of the provided context, do not respond with SQL
but respond with - I do not know about this. Please fix your input.
You are an expert SQL developer. Only return the sql query. Do not include any verbiage.
You are required to return SQL queries based on the provided schema and the service mappings for common services and
their synonyms. The table with the provided schema is the only source of data. Do not use joins. Assume product,
service are synonyms for product_servicecode and price,cost,spend are synonyms for line_item_unblended_cost. Use the
column names from the provided schema while creating queries. Do not use preceding zeroes for the column month when
creating the query. Only use predicates when asked. For your reference, current date is June 01, 2024. write a sql
query for this task – What was the total spend for ElasticSearch last year?

SQL query generation
Based on the prompt generated from the prompt builder and your original question, Amazon Q Business generates the corresponding SQL query. The SQL query is tailored to retrieve the relevant data and perform the desired analysis or calculations to accurately answer the user’s question. This step may involve techniques such as:

Mapping your intent and entities to SQL clauses (SELECT, FROM, WHERE, JOIN, and so on)
Handling complex queries involving aggregations, subqueries, or predicates
Incorporating domain-specific knowledge or business rules into the query generation process

Running the query
In this step, the generated SQL query is run against the chosen data store, which could be a relational database, data warehouse, NoSQL database, or an object store like Amazon Simple Storage Service (Amazon S3). The data store serves as the repository for the data required to answer the user’s question. Depending on the architecture and requirements, the data store query may involve additional components or processes, such as:

Query optimization and indexing strategies
Materialized views for complex queries
Real-time data ingestion and updates
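To make this step concrete, here is a minimal boto3 sketch of running a generated query against Athena; the database name and query result location are placeholders for your environment.

import time
import boto3

athena = boto3.client("athena")

def run_athena_query(sql: str) -> list:
    # Start the query and poll until it finishes
    execution = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": "cur_database"},  # placeholder database
        ResultConfiguration={"OutputLocation": "s3://your-athena-results-bucket/"},  # placeholder bucket
    )
    query_id = execution["QueryExecutionId"]
    while True:
        state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query ended in state {state}")
    # Rows include a header row followed by the result data
    return athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]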

Query results
The query engine runs the generated SQL query against the data store and returns the query results. These results contain the insights or answers to the original user question. The presentation of the query results can take various forms, depending on the requirements of the application or UI:

Tabular data – The results can be displayed as a table or spreadsheet, suitable for structured data analysis
Visualizations – The query results can be rendered as charts, graphs, or other visual representations, providing a more intuitive way to understand and explore the data
Natural language responses – In some cases, the query results can be translated back into natural language statements or summaries, making the insights more accessible to non-technical users

In the following sections, we walk through the steps to deploy the web application and test the solution.
Prerequisites
Complete the following prerequisite steps:

Set up IAM Identity Center and add users that you intend to give access to in your Amazon Q Business application.
Have an existing, working Amazon Q Business application and give access to the users created in the previous step to the application.
Make sure AWS Cost and Usage Reports (AWS CUR) data is available in Athena. If you already have CUR data, you can skip the following steps for CUR data setup. If not, you have a few options to set up CUR data:

To set up sample CUR data, refer to the following lab and follow the instructions.
You also need to set up an AWS Glue crawler to make the data available in Athena.

If you already have an SSL certificate, you can skip this step; otherwise, generate a private certificate.
Import the certificate into AWS Certificate Manager (ACM). For more details, refer to Importing a certificate.

Set up the application
Complete the following steps to set up the application:

From your terminal, clone the GitHub repository:

git clone https://github.com/aws-samples/data-insights-with-amazon-q-business.git

Go to the project directory:

cd data-insights-with-amazon-q-business

Based on your CUR table, update the CUR schema under app/schemas/cur_schema.txt. Review the prompts under app/qb_config.py. The schema looks similar to the following code:

Review the data dictionary under app/schemas/service_mappings.csv. You can modify the mappings according to your dataset. A sample data dictionary for CUR might look like the following screenshot.

Zip up the code repository and upload it to an S3 bucket.
Follow the steps in the GitHub repo to deploy the Streamlit application.

Access the web application
As part of the deployment steps, you launched an AWS CloudFormation stack. On the AWS CloudFormation console, navigate to the Outputs tab for the stack and find the URL to access the Streamlit application. When you open the URL in a browser, you’ll see a login screen like the following screenshot. Sign up to create a user in the Amazon Cognito user pool. After you’re validated, you can use the same credentials to log in to the web application.

Query your cost and usage data
Start with a simple query like “What was the total spend for ElasticSearch this year?” A relevant prompt will be created and sent to Amazon Q Business. It will respond with the corresponding SQL query. Notice the predicate where product_servicecode = 'AmazonES'. Amazon Q Business is able to formulate the query because it has the schema and the data dictionary in context. It understands that ElasticSearch is an AWS service represented by a column named product_servicecode in the CUR data schema and its corresponding value of 'AmazonES'. Next, the query is run against Athena and you get the results back.
The sample dataset used in this post is from 2023. If you’re using the sample dataset, natural language queries referring to the current year will not return results. Modify your queries to 2023 or mention the year in the user intent.
The following figure highlights the steps as explained in the data flow.

You can also try complex queries like “Give me a list of the top 3 products by total spend last year. For each of these products, what percentage of the overall spend is from this product?” Because the prompt builder has schema and product (AWS services) information in its context, Amazon Q Business creates the corresponding query. In this case, you’ll see a query similar to the following:

SELECT
    product_servicecode,
    SUM(line_item_unblended_cost) AS total_spend,
    ROUND(SUM(line_item_unblended_cost) * 100.0 / (SELECT SUM(line_item_unblended_cost)
        FROM cur_daily WHERE year = '2023'), 2) AS percentage_of_total
FROM cur_daily
WHERE year = '2023'
GROUP BY product_servicecode
ORDER BY total_spend DESC
LIMIT 3;

When the query is run against Athena, you’ll see similar results corresponding to your data.
Along with the data, you can also see a summary and trend analysis of your data on the Description tab of your Streamlit app.

The prompts used in the application are open domain and you’re free to update them in the code. For example, the following is a prompt used for a summary task:

You are an AI assistant. You are required to return a summary based on the provided data in attachment. Use at least
100 words. The spend is in dollars. The unit of measurement is dollars. Give trend analysis too. Start your response
with - Here is your summary..

The following screenshot shows the results.

Feedback loop
You also have the option of capturing feedback for the generated queries with the thumbs up/down icon on the web application. Currently, the feedback is captured in a local file under /app/feedback. You can change this implementation to write to a database of your choice and have it serve as a query validation mechanism after your testing, to allow only validated queries to run.
Clean up
To clean up your resources, delete the CloudFormation stack, Amazon Q Business application, and Athena tables.
Conclusion
In this post, we demonstrated how Amazon Q Business can effectively bridge the gap between users and data, enabling you to extract valuable insights from various data stores using natural language queries, without the need for extensive technical knowledge or SQL expertise. The natural language understanding capabilities of Amazon Q Business can accurately interpret user intent, extract relevant entities, and generate SQL to translate the user’s query into executable data operations. You can now empower a wider range of enterprise users to unlock the full value of your organization’s data assets. By democratizing data access and analysis using natural language queries, you can foster data-driven decision-making, drive innovation, and unlock new opportunities for growth and success.
In Part 2 of this series, we demonstrate how to integrate this architecture with LangChain using Amazon Q Business as a custom model. We also cover query validation and accuracy measurement.

About the Authors
Vishal Karlupia is a Senior Technical Account Manager/Lead at Amazon Web Services, Toronto. He specializes in generative AI applications and helps customers build and scale their AI/ML workloads on AWS. Outside of work, he enjoys being outdoors and keeping bonfires alive.
Srinivas Ganapathi is a Principal Technical Account Manager at Amazon Web Services. He is based in Toronto, Canada, and works with games customers to run efficient workloads on AWS.

Microsoft Released SuperBench: A Groundbreaking Proactive Validation System to Enhance Cloud AI Infrastructure Reliability and Mitigate Hidden Performance Degradations

Cloud AI infrastructure is vital to modern technology, providing the backbone for various AI workloads and services. Ensuring the reliability of these infrastructures is crucial, as any failure can lead to widespread disruption, particularly in large-scale distributed systems where AI workloads are synchronized across numerous nodes. This synchronization means that a failure in one node can have cascading effects, magnifying the impact and causing significant downtime or performance degradation. The complexity and scale of these systems make it essential to have robust mechanisms in place to maintain their smooth operation and minimize incidents that could affect the quality of service provided to users.

One of the primary challenges in maintaining cloud AI infrastructure is addressing hidden degradations due to hardware redundancies. These subtle failures, often termed “gray failures,” do not cause immediate, catastrophic problems but gradually degrade performance over time. These issues are particularly problematic because they are not easily detectable with conventional monitoring tools, typically designed to identify more apparent binary failure states. The insidious nature of gray failures complicates the task of root cause analysis, making it difficult for cloud providers to identify and rectify the underlying problems before they escalate into more significant issues that could impact the entire system.

Cloud providers have traditionally relied on hardware redundancies to mitigate these hidden issues and ensure system reliability. Redundant components, such as extra GPU compute units or over-provisioned networking links, are intended to act as fail-safes. However, these redundancies can inadvertently introduce their own set of problems. Over time, continuous and repetitive use of these redundant components can lead to gradual performance degradation. For example, in Azure A100 clusters, where InfiniBand top-of-rack (ToR) switches have multiple redundant uplinks, the loss of some of these links can lead to throughput regression, particularly under certain traffic patterns. This type of gradual degradation often goes unnoticed until it significantly impacts AI workloads, at which point it becomes much more challenging to address.

A team of researchers from Microsoft Research and Microsoft introduced SuperBench, a proactive validation system designed to enhance cloud AI infrastructure’s reliability by addressing the hidden degradation problem. SuperBench performs a comprehensive evaluation of hardware components under realistic AI workloads. The system includes two main components: a Validator, which learns benchmark criteria to identify defective components, and a Selector, which optimizes the timing and scope of the validation process to ensure it is both effective and efficient. SuperBench can run diverse benchmarks representing most real AI workloads, allowing it to detect subtle performance regressions that might otherwise go unnoticed.

The technology behind SuperBench is sophisticated and tailored to address the unique challenges cloud AI infrastructures pose. The Validator component of SuperBench conducts a series of benchmarks on specified nodes, learning to distinguish between normal and defective performance by analyzing the cumulative distribution of benchmark results. This approach ensures that even slight deviations in performance, which could indicate a potential problem, are detected early. Meanwhile, the Selector component balances the trade-off between validation time and the possible impact of incidents. Using a probability model to predict the likelihood of incidents, the Selector determines the optimal time to run specific benchmarks. This ensures that validation is performed when it is most likely to prevent issues.

The effectiveness of SuperBench is demonstrated by its deployment in Azure’s production environment, where it has been used to validate hundreds of thousands of GPUs. Through rigorous testing, SuperBench has been shown to increase the mean time between incidents (MTBI) by up to 22.61 times. By reducing the time required for validation and focusing on the most critical components, SuperBench has decreased the cost of validation time by 92.07% while simultaneously increasing user GPU hours by 4.81 times. These impressive results highlight the system’s ability to detect and prevent performance issues before they impact end-to-end workloads.

In conclusion, SuperBench, by focusing on the early detection and resolution of hidden degradations, offers a robust solution to the complex challenge of ensuring the continuous and reliable operation of large-scale AI services. The system’s ability to identify subtle performance regressions and optimize the validation process makes it an invaluable tool for cloud service providers looking to enhance the reliability of their AI infrastructures. With SuperBench, Microsoft has set a new standard for cloud infrastructure maintenance, ensuring that AI workloads can be executed with minimal disruption and maximum efficiency, thus maintaining high-performance standards in a rapidly evolving technological landscape.

Check out the Paper. All credit for this research goes to the researchers of this project.

The post Microsoft Released SuperBench: A Groundbreaking Proactive Validation System to Enhance Cloud AI Infrastructure Reliability and Mitigate Hidden Performance Degradations appeared first on MarkTechPost.

Improving Robustness Against Bias in Social Science Machine Learning: The Promise of Instruction-Based Models

Language models (LMs) have gained significant prominence in computational text analysis, offering enhanced accuracy and versatility. However, a critical challenge persists: ensuring the validity of measurements derived from these models. Researchers face the risk of misinterpreting results, potentially measuring unintended factors such as incumbency instead of ideology, or party names rather than populism. This discrepancy between intended and actual measurements can lead to substantially flawed conclusions, undermining the credibility of research outcomes.

The fundamental question of measurement validity looms large in the field of computational social science. Despite the increasing sophistication of language models, concerns about the gap between the ambitions of these tools and the validity of their outputs remain. This issue has been a longstanding focus of computational social scientists, who have consistently warned about the challenges associated with validity in text analysis methods. The need to address this gap has become increasingly urgent as language models continue to evolve and expand their applications across various domains of research.

This study by researchers from Communication Science, Vrije Universiteit Amsterdam and Department of Politics, IR and Philosophy, Royal Holloway University of London addresses the critical issue of measurement validity in supervised machine learning for social science tasks, particularly focusing on how biases in fine-tuning data impact validity. The researchers aim to bridge the gap in social science literature by empirically investigating three key research questions: the extent of bias impact on validity, the robustness of different machine learning approaches against these biases, and the potential of meaningful instructions for language models to reduce bias and increase validity.

The study draws inspiration from the natural language processing (NLP) fairness literature, which suggests that language models like BERT or GPT may reproduce spurious patterns from their training data rather than truly understanding the concepts they are intended to measure. The researchers adopt a group-based definition of bias, considering a model biased if it performs unequally across social groups. This approach is particularly relevant for social science research, where complex concepts often need to be measured across diverse social groups using real-world training data that is rarely perfectly representative.

To tackle these challenges, the paper proposes and investigates instruction-based models as a potential solution. These models receive explicit, verbalized instructions for their tasks in addition to fine-tuning data. The researchers theorize that this approach might help models learn tasks more robustly and reduce reliance on spurious group-specific language patterns from the fine-tuning data, thereby potentially improving measurement validity across different social groups.

The proposed study addresses measurement validity in supervised machine learning for social science tasks, focusing on group-based biases in training data. Drawing from Adcock and Collier’s (2001) framework, the researchers emphasize robustness against group-specific patterns as crucial for validity. They highlight how standard machine learning models can become “stochastic parrots,” reproducing biases from training data without truly understanding concepts. To mitigate this, the study proposes investigating instruction-based models that receive explicit, verbalized task instructions alongside fine-tuning data. This approach aims to create a stronger link between the scoring process and the systematized concept, potentially reducing measurement error and enhancing validity across diverse social groups.

The proposed study investigates the robustness of different supervised machine learning approaches against biases in fine-tuning data, focusing on three main classifier types: logistic regression, BERT-base (DeBERTa-v3-base), and BERT-NLI (instruction-based). The study design involves training these models on four datasets across nine types of groups, comparing performance under biased and random training conditions.

Key aspects of the methodology include:

1. Training models on texts sampled from only one group (biased condition) and randomly across all groups (random condition).

2. Testing on a representative held-out test set to measure the “bias penalty” – the performance difference between biased and random conditions.

3. Using 500 texts with balanced classes for training to eliminate class imbalance as an intervening variable.

4. Conducting multiple training runs across six random seeds to reduce the influence of randomness.

5. Employing binomial mixed-effects regression to analyze classification errors, considering classifier type and whether test texts come from the same group as training data.

6. Testing the impact of meaningful instructions by comparing BERT-NLI performance with both meaningful and meaningless instructions.

This comprehensive approach aims to provide insights into the extent of bias impact on validity, the robustness of different classifiers against biases, and the potential of meaningful instructions to reduce bias and increase validity in supervised machine learning for social science tasks.

This study investigates the impact of group-based biases in machine learning training data on measurement validity across various classifiers, datasets, and social groups. The researchers found that all classifier types learn group-based biases, but the effects are generally small. Logistic regression showed the largest performance drop (2.3% F1 macro) when trained on biased data, followed by BERT-base (1.7% drop), while BERT-NLI demonstrated the smallest decrease (0.4% drop). Error probabilities on unseen groups increased for all models, with BERT-NLI showing the least increase. The study attributes BERT-NLI’s robustness to its algorithmic structure and ability to incorporate task definitions as plain text instructions, reducing dependence on group-specific language patterns. These findings suggest that instruction-based models like BERT-NLI may offer improved measurement validity in supervised machine learning for social science tasks.

Check out the Paper. All credit for this research goes to the researchers of this project.

The post Improving Robustness Against Bias in Social Science Machine Learning: The Promise of Instruction-Based Models appeared first on MarkTechPost.

KOALA (K-layer Optimized Adversarial Learning Architecture): An Orthogonal Technique for Draft Head Optimization

As LLMs become increasingly complex and powerful, their inference process, i.e., generating text given a prompt, becomes computationally expensive and time-consuming. Many applications, such as real-time translation, dialogue systems, or interactive content generation, require quick responses. Additionally, slow inference consumes substantial computational resources, leading to higher operational costs. 

Researchers from the Dalian University of Technology, China have addressed the challenge of high inference latency in Large Language Models (LLMs) caused by their autoregressive decoding nature, which requires tokens to be generated sequentially. Current methods like speculative decoding (an approach that involves a draft model predicting multiple future tokens for verification by the target LLM) have been introduced to mitigate this latency. Still, its full potential has yet to be fully explored. Specifically, the single-layer draft head used in speculative decoding has a performance gap due to limited parameter count and inadequate training methods, resulting in inefficient acceleration of LLM inference.
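For readers unfamiliar with the draft-then-verify cycle that KOALA builds on, the following is a highly simplified conceptual sketch in Python with greedy acceptance; draft_head.propose and target_llm.verify are hypothetical interfaces used only to show the control flow, not KOALA's actual implementation.

def speculative_decode(prompt_ids, draft_head, target_llm, k=4, max_new_tokens=128):
    # draft_head.propose(context, k): k candidate next tokens (hypothetical interface)
    # target_llm.verify(context, draft): the target's greedy token at each draft position
    # plus one extra, computed in a single forward pass (hypothetical interface)
    tokens = list(prompt_ids)
    while len(tokens) - len(prompt_ids) < max_new_tokens:
        draft = draft_head.propose(tokens, k)
        target_preds = target_llm.verify(tokens, draft)  # length k + 1
        n_accepted = 0
        for d, t in zip(draft, target_preds):
            if d != t:
                break
            n_accepted += 1
        # Keep the matching prefix, then append one token from the target itself
        tokens.extend(draft[:n_accepted])
        tokens.append(target_preds[n_accepted])
    return tokens

The fewer mismatches per cycle, the fewer target-model passes are needed, which is why a more accurate draft head, as KOALA provides, translates directly into faster inference.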

Researchers introduce KOALA (K-layer Optimized Adversarial Learning Architecture), a novel approach that optimizes the draft head for speculative decoding. KOALA enhances the traditional single-layer draft head by expanding it into a multi-layer architecture, thereby reducing the performance gap with the target LLM. Additionally, KOALA integrates adversarial learning into the training process, encouraging the draft head to better capture the token generation process of the target LLM, thus improving prediction accuracy. The multi-layer structure, and adversarial learning, allow KOALA to generate more accurate tokens per draft-then-verify cycle, reducing the number of iterations needed for decoding and consequently enhancing LLM inference speed.

KOALA is evaluated through comprehensive experiments with Medusa and EAGLE as non-autoregressive and autoregressive draft heads, respectively, with Vicuna models (7B, 13B, 33B) as target LLMs. Evaluations conducted on the MT-bench demonstrate that KOALA achieves a latency speedup ratio improvement of 0.24x-0.41x, which translates to being 10.57%-14.09% faster than the original draft heads. These results underscore KOALA’s ability to enhance the efficiency of speculative decoding across various LLM sizes and tasks, with the multi-layer architecture and adversarial learning both contributing to these gains.

In conclusion, KOALA presents a significant advancement in optimizing draft heads for speculative decoding in LLMs. By introducing a multi-layer structure and incorporating adversarial learning into the training process, KOALA reduces the performance gap between draft heads and target LLMs, leading to faster inference. The experimental results validate KOALA’s efficacy, showing observable improvements in latency speedup ratios. Although KOALA causes a slight increase in drafting overhead, this is outweighed by the substantial acceleration of LLM inference, making KOALA a promising technique for enhancing the efficiency of LLMs in real-world applications.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our newsletter.

The post KOALA (K-layer Optimized Adversarial Learning Architecture): An Orthogonal Technique for Draft Head Optimization appeared first on MarkTechPost.

Cohere Rerank 3 Nimble now generally available on Amazon SageMaker JumpStart

The Cohere Rerank 3 Nimble foundation model (FM) is now generally available in Amazon SageMaker JumpStart. This model is the newest FM in Cohere’s Rerank model series, built to enhance enterprise search and Retrieval Augmented Generation (RAG) systems.
In this post, we discuss the benefits and capabilities of this new model with some examples.
Overview of Cohere Rerank models
Cohere’s Rerank family of models is designed to enhance existing enterprise search systems and RAG systems. Rerank models improve search accuracy over both keyword-based and embedding-based search systems. Cohere Rerank 3 reorders documents retrieved by initial search algorithms based on their relevance to a given query. A reranking model, also known as a cross-encoder, takes a query and document pair and outputs a similarity score. For FMs, words, sentences, or entire documents are often encoded as dense vectors in a semantic space. By calculating the cosine of the angle between these vectors, you can quantify their semantic similarity and output it as a single similarity score. You can use this score to reorder the documents by relevance to your query.
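
As a minimal illustration of the scoring idea (with made-up vectors, not Cohere’s embeddings), cosine similarity between two dense vectors can be computed as follows:

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine of the angle between the two vectors: 1.0 means identical direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query_vec = np.array([0.2, 0.7, 0.1])
doc_vec = np.array([0.25, 0.6, 0.05])
print(cosine_similarity(query_vec, doc_vec))  # close to 1.0, i.e., semantically similar
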
Cohere Rerank 3 Nimble is the newest model in Cohere’s Rerank family, designed to improve on the speed and efficiency of its predecessor, Cohere Rerank 3. According to Cohere’s benchmark tests, including BEIR (Benchmarking IR) for accuracy and internal benchmarking datasets, Cohere Rerank 3 Nimble maintains high accuracy while being approximately 3–5 times faster than Cohere Rerank 3. The speed improvement makes it well suited for enterprises looking to enhance their search capabilities without sacrificing performance.
The following diagram represents the two-stage retrieval of a RAG pipeline and illustrates where Cohere Rerank 3 Nimble is incorporated into the search pipeline.

In the first stage of retrieval in the RAG architecture, a set of candidate documents are returned based on the knowledge base that’s relevant to the query. In the second stage, Cohere Rerank 3 Nimble analyzes the semantic relevance between the query and each retrieved document, reordering them from most to least relevant. The top-ranked documents augment the original query with additional context. This process improves search result quality by identifying the most pertinent documents. Integrating Cohere Rerank 3 Nimble into a RAG system enables users to send fewer but higher-quality documents to the language model for grounded generation. This results in improved accuracy and relevance of search results without adding latency.
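
The two-stage pattern can be sketched as follows; vector_search, rerank, and llm are hypothetical stand-ins for your first-stage retriever, the deployed reranker endpoint, and the generation model:

def answer_with_rag(query, knowledge_base, vector_search, rerank, llm, top_n=2):
    # Stage 1: broad recall from the knowledge base.
    candidates = vector_search(query, knowledge_base, k=50)
    # Stage 2: precision reordering; only the top_n documents reach the LLM.
    top_docs = rerank(query=query, documents=candidates, top_n=top_n)
    context = "\n".join(doc["Content"] for doc in top_docs)
    return llm(f"Answer using this context:\n{context}\n\nQuestion: {query}")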
Overview of SageMaker JumpStart
SageMaker JumpStart offers access to a broad selection of publicly available FMs. These pre-trained models serve as powerful starting points that can be deeply customized to address specific use cases. You can now use state-of-the-art model architectures, such as language models, computer vision models, and more, without having to build them from scratch.
Amazon SageMaker is a comprehensive, fully managed machine learning (ML) platform that revolutionizes the entire ML workflow. It offers an unparalleled suite of tools that cater to every stage of the ML lifecycle, from data preparation to model deployment and monitoring. Data scientists and developers can use the SageMaker integrated development environment (IDE) to access a vast array of pre-built algorithms, customize their own models, and seamlessly scale their solutions. The platform’s strength lies in its ability to abstract away the complexities of infrastructure management, allowing you to focus on innovation rather than operational overhead. The automated ML capabilities of SageMaker, including automated machine learning (AutoML) features, democratize ML by enabling even non-experts to build sophisticated models. Furthermore, its robust governance features help organizations maintain control and transparency over their ML projects, addressing critical concerns around regulatory compliance.
Prerequisites
Make sure your SageMaker AWS Identity and Access Management (IAM) service role has the AmazonSageMakerFullAccess permission policy attached.
To deploy Cohere Rerank 3 Nimble successfully, confirm one of the following:

Make sure your IAM role has the following permissions and you have the authority to make AWS Marketplace subscriptions in the AWS account used:

aws-marketplace:ViewSubscriptions
aws-marketplace:Unsubscribe
aws-marketplace:Subscribe

Alternatively, confirm that your AWS account already has a subscription to the model. If so, you can skip the subscription steps that follow and proceed directly to deployment.

Deploy Cohere Rerank 3 Nimble on SageMaker JumpStart
You can access the Cohere Rerank 3 family of models using SageMaker JumpStart in Amazon SageMaker Studio, as shown in the following screenshot.

Deployment starts when you choose Deploy, and you may be prompted to subscribe to this model through AWS Marketplace. If you are already subscribed, you can choose Deploy again to deploy the model. After deployment finishes, you will see that an endpoint is created. You can test the endpoint by passing a sample inference request payload or by selecting the testing option using the SDK.

Subscribe to the model package
To subscribe to the model package, complete the following steps:

Depending on the model you want to deploy, open the model package listing page for cohere-rerank-nimble-english or cohere-rerank-nimble-multilingual.
On the AWS Marketplace listing, choose Continue to subscribe.
On the Subscribe to this software page, review and choose Accept Offer if you and your organization agree with the EULA, pricing, and support terms.
Choose Continue to configuration and then choose an AWS Region.

A product ARN will be displayed. This is the model package ARN that you need to specify while creating a deployable model using Boto3.
Deploy Cohere Rerank 3 Nimble using the SDK
To deploy the model using the SDK, copy the product ARN from the previous step and specify it in the model_package_arn in the following code:

from cohere_aws import Client
import boto3
region = boto3.Session().region_name

model_package_arn = "Specify the model package ARN here"

After you specify the model package ARN, you can create the endpoint, as shown in the following code. Specify the name of the endpoint, the instance type, and the number of instances being used. Make sure you have the account-level service limit for using ml.g5.xlarge for endpoint usage as one or more instances. To request a service quota increase, refer to AWS service quotas.

co = Client(region_name=region)
co.create_endpoint(
    arn=model_package_arn,
    endpoint_name="cohere-rerank-3/cohere-rerank-nimble-multilingual",
    instance_type="ml.g5.xlarge",
    n_instances=1,
)

If the endpoint is already created, you just need to connect to it with the following code:

co.connect_to_endpoint(endpoint_name="cohere-rerank-3/cohere-rerank-nimble-multilingual-v3")

Follow a similar process as detailed earlier to deploy Cohere Rerank 3 on SageMaker JumpStart.
Inference example with Cohere Rerank 3 Nimble
Cohere Rerank 3 Nimble offers robust multilingual support. The model is available in both English and multilingual versions supporting over 100 languages.
The following code example illustrates how to perform real-time inference using Cohere Rerank 3 Nimble-English:

documents = [
    {"Title": "Incorrect Password", "Content": "Hello, I have been trying to access my account for the past hour and it keeps saying my password is incorrect. Can you please help me?"},
    {"Title": "Confirmation Email Missed", "Content": "Hi, I recently purchased a product from your website but I never received a confirmation email. Can you please look into this for me?"},
    {"Title": "Questions about Return Policy", "Content": "Hello, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective."},
    {"Title": "Customer Support is Busy", "Content": "Good morning, I have been trying to reach your customer support team for the past week but I keep getting a busy signal. Can you please help me?"},
    {"Title": "Received Wrong Item", "Content": "Hi, I have a question about my recent order. I received the wrong item and I need to return it."},
    {"Title": "Customer Service is Unavailable", "Content": "Hello, I have been trying to reach your customer support team for the past hour but I keep getting a busy signal. Can you please help me?"},
    {"Title": "Return Policy for Defective Product", "Content": "Hi, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective."},
    {"Title": "Wrong Item Received", "Content": "Good morning, I have a question about my recent order. I received the wrong item and I need to return it."},
    {"Title": "Return Defective Product", "Content": "Hello, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective."}
]

In the following code, the top_n inference parameter for Cohere Rerank 3 and Rerank 3 Nimble specifies the number of top-ranked results to return after reranking the input documents. It allows you to control how many of the most relevant documents are included in the final output. To determine an optimal value for top_n, consider factors such as the diversity of your document set, the complexity of your queries, and the desired balance between precision and latency for enterprise search or RAG.

response = co.rerank(documents=documents, query="What emails have been about returning items?", rank_fields=["Title", "Content"], top_n=2)

The following is the output from Cohere Rerank 3 Nimble-English:

Documents: [RerankResult<document: {'Title': 'Received Wrong Item', 'Content': 'Hi, I have a question about my recent order. I received the wrong item and I need to return it.'}, index: 4, relevance_score: 0.0068771075>, RerankResult<document: {'Title': 'Wrong Item Received', 'Content': 'Good morning, I have a question about my recent order. I received the wrong item and I need to return it.'}, index: 7, relevance_score: 0.0064131636>]

Cohere Rerank 3 Nimble multilingual support
The multilingual capabilities of Cohere Rerank 3 Nimble-Multilingual enable global organizations to provide consistent, improved search experiences to users across different Regions and language preferences.
In the following example, we create an input payload for a list of emails in multiple languages. We can take the same set of emails from earlier and translate them to different languages. These examples are available under the SageMaker JumpStart model card and are randomly generated for this example.

documents = [
    {"Title": "Contraseña incorrecta", "Content": "Hola, llevo una hora intentando acceder a mi cuenta y sigue diciendo que mi contraseña es incorrecta. ¿Puede ayudarme, por favor?"},
    {"Title": "Confirmation Email Missed", "Content": "Hi, I recently purchased a product from your website but I never received a confirmation email. Can you please look into this for me?"},
    {"Title": "أسئلة حول سياسة الإرجاع", "Content": "مرحبًا، لدي سؤال حول سياسة إرجاع هذا المنتج. لقد اشتريته قبل بضعة أسابيع وهو معيب"},
    {"Title": "Customer Support is Busy", "Content": "Good morning, I have been trying to reach your customer support team for the past week but I keep getting a busy signal. Can you please help me?"},
    {"Title": "Falschen Artikel erhalten", "Content": "Hallo, ich habe eine Frage zu meiner letzten Bestellung. Ich habe den falschen Artikel erhalten und muss ihn zurückschicken."},
    {"Title": "Customer Service is Unavailable", "Content": "Hello, I have been trying to reach your customer support team for the past hour but I keep getting a busy signal. Can you please help me?"},
    {"Title": "Return Policy for Defective Product", "Content": "Hi, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective."},
    {"Title": "收到错误物品", "Content": "早上好,关于我最近的订单,我有一个问题。我收到了错误的商品,需要退货。"},
    {"Title": "Return Defective Product", "Content": "Hello, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective."}
]

Use the following code to perform real-time inference using Cohere Rerank 3 Nimble-Multilingual:

response = co.rerank(documents=documents, query="What emails have been about returning items?", rank_fields=["Title", "Content"], top_n=2)
print(f"Documents: {response}")

The following is the output from Cohere Rerank 3 Nimble-Multilingual:

Documents: [RerankResult<document: {'Title': '收到错误物品', 'Content': '早上好,关于我最近的订单,我有一个问题。我收到了错误的商品,需要退货。'}, index: 7, relevance_score: 0.034553625>, RerankResult<document: {'Title': 'أسئلة حول سياسة الإرجاع', 'Content': 'مرحبًا، لدي سؤال حول سياسة إرجاع هذا المنتج. لقد اشتريته قبل بضعة أسابيع وهو معيب'}, index: 2, relevance_score: 0.00037263767>]

The output translated to English is as follows:

Documents: [RerankResult<document: {'Title': 'Received Wrong Item', 'Content': 'Good morning, I have a question about my recent order. I received the wrong item and need to return it.'}, index: 7, relevance_score: 0.034553625>, RerankResult<document: {'Title': 'Questions about Return Policy', 'Content': "Hello, I have a question about the return policy for this product. I bought it a few weeks ago and it's defective"}, index: 2, relevance_score: 0.00037263767>]

In both examples, the relevance scores are normalized to be in the range [0, 1]. Scores close to 1 indicate a high relevance to the query, and scores closer to 0 indicate low relevance.
Use cases suitable for Cohere Rerank 3 Nimble
The Cohere Rerank 3 Nimble model provides an option that prioritizes efficiency. The model is ideal for enterprises looking to enable their customers to accurately search complex documentation, build applications that understand over 100 languages, and retrieve the most relevant information from various data stores. In industries such as retail, where website drop-off increases with every 100 milliseconds added to search response time, having a faster AI model like Cohere Rerank 3 Nimble powering the enterprise search system translates to higher conversion rates.
Conclusion
Cohere Rerank 3 and Rerank 3 Nimble are now available on SageMaker JumpStart. To get started, refer to Train, deploy, and evaluate pretrained models with SageMaker JumpStart.
Interested in diving deeper? Check out the Cohere on AWS GitHub repo.

About the Authors
Breanne Warner is an Enterprise Solutions Architect at Amazon Web Services supporting healthcare and life sciences (HCLS) customers. She is passionate about helping customers use generative AI on AWS and evangelizing model adoption. Breanne is also on the Women@Amazon board as co-director of Allyship, with the goal of fostering an inclusive and diverse culture at Amazon. Breanne holds a Bachelor of Science in Computer Engineering from the University of Illinois Urbana-Champaign (UIUC).
Nithin Vijeaswaran is a Solutions Architect at AWS. His area of focus is generative AI and AWS AI Accelerators. He holds a Bachelor’s degree in Computer Science and Bioinformatics. He works closely with the Generative AI GTM team to enable AWS customers on multiple fronts and accelerate their adoption of generative AI. He’s an avid fan of the Dallas Mavericks and enjoys collecting sneakers.
Karan Singh is a Generative AI Specialist for third-party models at AWS, where he works with top-tier third-party foundation model providers to define and run joint GTM motions that help customers train, deploy, and scale foundation models. Karan holds a Bachelor of Science in Electrical and Instrumentation Engineering from Manipal University and a Master of Science in Electrical Engineering from Northwestern University, and is currently an MBA candidate at the Haas School of Business at the University of California, Berkeley.

Meet Decisional AI: An AI Agent for Financial Analysts

Tasks like extracting data, creating market maps, and sorting through transcripts and board packs prevent analysts from applying first-principles thinking to generate alpha. Airtable, Dropbox, and email are just a few examples of the internal data silos they face, while external sources include websites, SEC filings, and private data feeds from companies like S&P. Balancing deep financial work with clerical cross-referencing is difficult and time-consuming, and the data silos and varied information sources force continual context switching.

Meet Decisional, an AI Financial Analyst tool developed to read and comprehend data from many public and private sources. Envision a future where an AI analyst is always available, able to read data rooms, understand emails, and navigate the web. Never again will you have to manually copy a table from a PDF to Excel since Decisional eliminates data silos and handles all the tedious work for you.

Decisional compiles data from your company and other external sources into a knowledge graph, which fuels an intelligent AI agent available around the clock. It aids in discovering previously unseen but critically important facts and patterns. Magic Tables by Decisional arrange critical metrics, public comps, and the competitive landscape by adding external data such as financing amounts, stage, region, and sector.

With the help of your documents, analysts can create “AI memos” and extract valuable information. Using Decisional, you won’t have to wait 24–48 hours for an offshore team to finish menial tasks like data extraction. An AI agent assists in drafting deep, live documents by automatically retrieving and organizing data from underlying sources, ensuring that all citations are properly included.

Key Features

To improve the accuracy and efficiency of financial analysis, Decisional AI provides features such as:

Custom knowledge engines
AI-powered data analysis
Streamlined workflows
Quick model setup
Secure data handling
Fact verification
Document storage
Flexible query options
Support for multiple models

In Conclusion

Developed to help financial analysts, Decisional AI is an AI-powered platform that automates data collection, analysis, and report preparation, simplifying workflows. By handling the tedious parts of financial analysis, it improves efficiency and accuracy and lets analysts concentrate on higher-value work, such as making strategic decisions.
The post Meet Decisional AI: An AI Agent for Financial Analysts appeared first on MarkTechPost.

FlexEval: An Open-Source AI Tool for Chatbot Performance Evaluation and Dialogue Analysis

A Large Language Model (LLM) is an advanced type of artificial intelligence designed to understand and generate human-like text. It’s trained on vast amounts of data, enabling it to perform various natural language processing tasks, such as answering questions, summarizing content, and engaging in conversation.

LLMs are revolutionizing education by serving as chatbots that enrich learning experiences. They offer personalized tutoring, instant answers to students’ queries, aid in language learning, and simplify complex topics. By emulating human-like interactions, these chatbots democratize learning, making it more accessible and engaging. They empower students to learn at their own pace and cater to their individual needs.

However, evaluating educational chatbots powered by LLMs is challenging due to their open-ended, conversational nature. Unlike traditional models with predefined correct responses, educational chatbots are assessed on their ability to engage students, use supportive language, and avoid harmful content. The evaluation focuses on how well these chatbots align with specific educational goals, like guiding problem-solving without directly giving answers. Flexible, automated tools are essential for efficiently assessing and improving these chatbots, ensuring they meet their intended educational objectives.

To resolve the challenges cited above, a new paper was recently published introducing FlexEval, an open-source tool designed to simplify and customize the evaluation of LLM-based systems. FlexEval allows users to rerun conversations that led to undesirable behavior, apply custom metrics, and evaluate new and historical interactions. It provides a user-friendly interface for creating and using rubrics, integrates with various LLMs, and safeguards sensitive data by running evaluations locally. FlexEval addresses the complexities of evaluating conversational systems in educational settings by streamlining the process and making it more flexible.

Concretely, FlexEval is designed to reduce the complexity of automated testing by allowing developers to increase visibility into system behavior before and after product releases. It provides editable files in a single directory: `evals.yaml` for test suite specifications, `function_metrics.py` for custom Python metrics, `rubric_metrics.yaml` for machine-graded rubrics, and `completion_functions.py` for defining completion functions. FlexEval supports evaluating new and historical conversations and storing results locally in an SQLite database. It integrates with various LLMs and configures user needs, facilitating system evaluation without compromising sensitive educational data.
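
As an example of the kind of custom metric that could live in function_metrics.py (the exact signature FlexEval expects is an assumption here, so treat this as a sketch rather than the tool's documented interface):

import re

def gives_direct_answer(turn_text: str) -> int:
    # Flags tutor turns that hand over a bare numeric answer instead of guiding
    # the student; returns 1 if such a pattern is found, 0 otherwise.
    return int(bool(re.search(r"the answer is\s*-?\d+", turn_text.lower())))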

To check the effectiveness of FlexEval, two example evaluations were conducted. The first tested model safety using the Bot Adversarial Dialogue (BAD) dataset to determine whether pre-release models agreed with or produced harmful statements. Results were evaluated using the OpenAI Moderation API and a rubric to detect the Yeasayer Effect. The second evaluation involved historical conversations between students and a math tutor from the NCTE dataset, where FlexEval classified tutor utterances as on or off task using LLM-graded rubrics. Metrics such as harassment and F1 scores were calculated, demonstrating FlexEval’s utility in model evaluation.

To conclude, this article presented FlexEval, which was recently proposed in a new paper. FlexEval addresses the challenges of evaluating LLM-based systems by simplifying the process and increasing visibility into model behavior. It offers a flexible, customizable solution that safeguards sensitive data and integrates easily with other tools. As LLM-powered products continue to grow in educational settings, FlexEval is important for ensuring these systems reliably serve their intended purpose. Future developments aim to further improve ease of use and broaden the tool’s application.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our newsletter.

The post FlexEval: An Open-Source AI Tool for Chatbot Performance Evaluation and Dialogue Analysis appeared first on MarkTechPost.

USC Researchers Present Safer-Instruct: A Novel Pipeline for Automatically Constructing Large-Scale Preference Data

Language model alignment is critically important, particularly for the family of RLHF methods applied to strengthen the safety and competence of AI systems. Language models are deployed in many applications today, and their outputs can be harmful or biased. Aligning them with human preferences through RLHF ensures that their behavior is ethical and socially acceptable. This process is critical to avoid spreading misinformation and harmful content and to ensure that AI is developed for the betterment of society.

The main difficulty of RLHF lies in the fact that preference data must be annotated through a resource-intensive, creativity-demanding process. Researchers struggle to gather diverse, high-quality data for training models that represent human preferences accurately. Traditional methods, such as manually crafting prompts and responses, are inherently narrow and introduce bias, complicating the scaling of effective data annotation. This challenge hinders the development of safe AI that can understand nuanced human interactions.

Current methods for preference data generation depend heavily on human annotation or a handful of automatic generation techniques. Most of these methods rely on authored scenarios or seed instructions and are therefore likely to be low in diversity, introducing subjectivity into the data. It is also time-consuming and expensive to elicit human evaluators’ preferences for both preferred and dispreferred responses. Moreover, many expert models used to generate data have strong safety filters, making it very hard to produce the dispreferred responses necessary for building comprehensive safety preference datasets.

Against this backdrop, researchers from the University of Southern California introduced SAFER-INSTRUCT, a new pipeline for automatically constructing large-scale preference data. It applies reversed instruction tuning, instruction induction, and expert-model evaluation to generate high-quality preference data without human annotators. Because the process is automated, SAFER-INSTRUCT can create more diverse and contextually relevant data, enhancing the safety and alignment of language models. The method simplifies data annotation and extends its applicability to different domains, making it a versatile tool for AI development.

The pipeline starts with reversed instruction tuning, in which a model is trained to generate instructions based on responses, essentially performing instruction induction. This makes it possible to produce a wide variety of instructions on specific topics such as hate speech or self-harm without manually crafted prompts. The generated instructions are filtered for quality, an expert model then generates the preferred responses, and those responses are filtered again according to human preferences. The result of this process is a comprehensive preference dataset for fine-tuning language models to be safe and effective.
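
A rough sketch of that flow is shown below; induce_instruction, quality_filter, expert_model, and preference_filter are hypothetical stand-ins rather than the authors’ code, and the harvested responses play the role of the dispreferred side of each pair:

def build_preference_pairs(responses, induce_instruction, quality_filter,
                           expert_model, preference_filter):
    pairs = []
    for dispreferred in responses:                      # harvested responses, e.g., unsafe text
        instruction = induce_instruction(dispreferred)  # reversed instruction tuning step
        if not quality_filter(instruction):
            continue
        preferred = expert_model(instruction)           # expert writes the preferred response
        if preference_filter(instruction, preferred, dispreferred):
            pairs.append({"instruction": instruction,
                          "chosen": preferred,
                          "rejected": dispreferred})
    return pairs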

The SAFER-INSTRUCT framework was tested by evaluating an Alpaca model fine-tuned on the generated safety preference dataset. The results were striking: the model outperformed other Alpaca-based models on harmlessness, with large improvements in safety metrics. Specifically, the model trained on SAFER-INSTRUCT data achieved a 94.7% harmlessness rate when evaluated with Claude 3, significantly higher than the 86.3% of models fine-tuned on human-annotated data. It remained conversational and competitive on downstream tasks, indicating that the safety improvements did not come at the cost of other capabilities. This performance demonstrates how effective SAFER-INSTRUCT is at creating safer yet more capable AI systems.

In summary, the researchers from the University of Southern California tackle one of the thorny issues of RLHF, preference data annotation, by introducing SAFER-INSTRUCT. The pipeline automates the construction of large-scale preference data, improving safety and alignment without sacrificing performance, and its versatility positions the framework to serve AI development for years to come, helping ensure that language models remain safe and effective across many applications.

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our newsletter.

The post USC Researchers Present Safer-Instruct: A Novel Pipeline for Automatically Constructing Large-Scale Preference Data appeared first on MarkTechPost.

Researchers from UCI and Cisco Propose ‘CrystalBall’: A Novel AI Method for Automated Attack Graph Generation Using Retriever-Augmented Large Language Models

Cybersecurity is a fast-paced field in which understanding and mitigating threats is essential. The attack graph is one tool that security analysts rely on to chart all possible attacker paths to the exploitation of vulnerabilities within a system. The challenge of managing vulnerabilities and threats has grown with the increased complexity of modern systems. Traditional methods of attack graph generation, most of which are manual and strongly reliant on expert knowledge, need revisiting. Given the fast-growing complexity of such systems and the dynamics of threats, there is a clear demand for more efficient and adaptive approaches to threat modeling and attack graph generation.

One of the major problems in cybersecurity today is that the vulnerability landscape keeps changing. New vulnerabilities are continuously discovered, and attackers develop new exploitation methods. Static rules, heuristics, and manual curation constrain classic attack graph generation methods. These approaches are time-consuming and usually cannot provide the coverage needed, leaving systems exposed to emerging threats that static models fail to capture. Keeping up with the rapidly changing threat environment therefore requires a much more dynamic approach.

Currently, attack graphs are created through manual curation and computational algorithms. Formal definitions and model-checking algorithms form the basis of existing techniques, but these techniques are normally domain-specific and inflexible when new types of attacks are introduced. Conventional methods involve extensive manual entry of vulnerability information, which is problematic given that new vulnerabilities are found almost daily, and they often rely only on static formal definitions of an attack that cannot be applied dynamically to new attack vectors. All of this underscores the need for an approach that can adapt dynamically as new information arrives.

A research team from the University of California, Irvine and Cisco Research has proposed CrystalBall, a new approach to automated attack graph generation that uses retriever-augmented LLMs built on GPT-4. The approach automates the chaining of CVEs according to their preconditions and postconditions, making attack graph generation dynamic and scalable. It is designed to process large volumes of structured and unstructured data and fits modern cybersecurity environments. The team focused in particular on integrating LLMs with a retriever model that improves the accuracy and relevance of the generated attack graphs.

Under the hood, CrystalBall applies retrieval-augmented generation (RAG): given system information supplied by the user, a retriever selects the most relevant CVEs from a large dataset stored in a relational database that supports semantic search, enabling the system to chain vulnerabilities with a high degree of accuracy. The retrieved CVEs are passed to the LLM-based system, which is treated as a black box and generates the attack graphs. This approach keeps the generated graphs comprehensive and relevant to the security context in which they are applied.
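
The chaining idea itself is simple to illustrate; the toy CVE entries and condition strings below are invented for the example and are not from the paper:

cves = {
    "CVE-A": {"pre": "network access", "post": "user shell"},
    "CVE-B": {"pre": "user shell", "post": "root privileges"},
}

def chain_cves(cves, start_condition):
    # Greedily link CVEs whenever one vulnerability's postcondition satisfies
    # another's precondition.
    path, condition = [], start_condition
    progress = True
    while progress:
        progress = False
        for cve_id, info in cves.items():
            if cve_id not in path and info["pre"] == condition:
                path.append(cve_id)
                condition = info["post"]
                progress = True
                break
    return path

print(chain_cves(cves, "network access"))  # ['CVE-A', 'CVE-B']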

CrystalBall’s performance has been rigorously tested and compared against other methods. Using LLMs, especially GPT-4, increased both the efficiency and the accuracy of attack graph generation. For instance, the system processed threat reports and generated attack graphs with high accuracy, covering 95% of relevant vulnerabilities and chaining them into coherent attack paths. Compared with other models, GPT-4 performed best on detail and cross-device vulnerability chaining, generating the most contextually relevant and accurate graphs. This addresses a major deficiency of past techniques, which often missed important contextual links between vulnerabilities.

These results matter for applying large language models to cybersecurity tasks such as attack graph generation: CrystalBall improves not only the efficiency of graph generation but also the accuracy and real-time relevance of the graphs produced. The important caveat is that, while LLMs perform well in most scenarios, the approach still has limitations. Lacking domain-specific expertise, LLMs sometimes generate graphs that need further refinement or validation by a human expert. There are also ethical concerns in developing machine learning models for cybersecurity tasks because of the potential for misuse.

In conclusion, the research provides a strong solution to a modern cybersecurity challenge. CrystalBall harnesses the power of large language models like GPT-4 to offer a dynamic, scalable, and highly accurate method for generating attack graphs, overcoming the shortcomings of previous methods and keeping pace with the fast-changing landscape of vulnerabilities and threats. Many challenges remain open, but the potential benefits of this line of work make it a promising direction for further research and application in cybersecurity.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our newsletter.

The post Researchers from UCI and Cisco Propose ‘CrystalBall’: A Novel AI Method for Automated Attack Graph Generation Using Retriever-Augmented Large Language Models appeared first on MarkTechPost.

Efficient and Robust Controllable Generation: ControlNeXt Revolutionizes Image and Video Creation

The research paper titled “ControlNeXt: Powerful and Efficient Control for Image and Video Generation” addresses a significant challenge in generative models, particularly in the context of image and video generation. As diffusion models have gained prominence for their ability to produce high-quality outputs, the need for fine-grained control over these generated results has become increasingly important. Traditional methods, such as ControlNet and Adapters, have attempted to enhance controllability by integrating additional architectures. However, these approaches often lead to substantial increases in computational demands, particularly in video generation, where the processing of each frame can double GPU memory consumption. This paper highlights the limitations of existing methods, which suffer from high resource requirements and weak control, and introduces ControlNeXt as a more efficient and robust solution for controllable visual generation.

Existing architectures typically rely on parallel branches or adapters to incorporate control information, which can significantly inflate the model’s complexity and training requirements. For instance, ControlNet employs additional layers to process control conditions alongside the main generation process. However, this architecture can lead to increased latency and training difficulties, particularly due to the introduction of zero convolution layers that complicate convergence. In contrast, the proposed ControlNeXt method aims to streamline this process by replacing heavy additional branches with a more straightforward, efficient architecture. This design minimizes the computational burden while maintaining the ability to integrate with other low-rank adaptation (LoRA) weights, allowing for style alterations without necessitating extensive retraining.

Delving deeper into the proposed method, ControlNeXt introduces a novel architecture that reduces the number of learnable parameters by up to 90% compared with its predecessors. This is achieved using a lightweight convolutional network to extract conditional control features rather than relying on a parallel control branch. The architecture is designed to maintain compatibility with existing diffusion models while enhancing efficiency. Furthermore, the introduction of Cross Normalization (CN) replaces zero convolution, addressing the slow convergence and training challenges typically associated with standard methods. Cross Normalization aligns the data distributions of new and pre-trained parameters, facilitating a more stable training process. This approach shortens training time and enhances the model’s overall performance across various tasks.
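
As a very rough sketch of the cross-normalization idea described above (rescaling the control branch's features to the statistics of the pre-trained branch before merging them; this is an illustration of the concept, not the paper's exact formulation):

import numpy as np

def cross_normalize(control_feat, main_feat, eps=1e-5):
    # Standardize the control features, then rescale them to the mean and
    # standard deviation of the pre-trained branch's features.
    mu, sigma = main_feat.mean(), main_feat.std()
    standardized = (control_feat - control_feat.mean()) / (control_feat.std() + eps)
    return standardized * sigma + mu

main = np.random.randn(4, 8) * 3.0 + 1.0     # features from the pre-trained branch
control = np.random.randn(4, 8) * 0.1        # features from the lightweight control network
merged = main + cross_normalize(control, main)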

The performance of ControlNeXt has been rigorously evaluated through a series of experiments involving different base models for image and video generation. The results demonstrate that ControlNeXt effectively retains the original model’s architecture while introducing only a minimal number of auxiliary components. This lightweight design allows seamless integration as a plug-and-play module with existing systems. The experiments reveal that ControlNeXt achieves remarkable efficiency, with significantly reduced latency overhead and parameter counts compared to traditional methods. The ability to fine-tune large pre-trained models with minimal additional complexity positions ControlNeXt as a robust solution for a wide range of generative tasks, enhancing the quality and controllability of generated outputs.

In conclusion, the research paper presents ControlNeXt as a powerful and efficient method for image and video generation that addresses the critical issues of high computational demands and weak control in existing models. By simplifying the architecture and introducing Cross Normalization, the authors provide a solution that not only enhances performance but also maintains compatibility with established frameworks. ControlNeXt stands out as a significant advancement in the field of controllable generative models, promising to facilitate more precise and efficient generation of visual content.

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our newsletter.

The post Efficient and Robust Controllable Generation: ControlNeXt Revolutionizes Image and Video Creation appeared first on MarkTechPost.

Cracking the Code of AI Alignment: This AI Paper from the University of Washington and Meta FAIR Unveils Better Alignment with Instruction Back-and-Forth Translation

Effectively aligning large language models (LLMs) with human instructions is a critical challenge in the field of AI. Current LLMs often struggle to generate responses that are both accurate and contextually relevant to user instructions, particularly when relying on synthetic data. Traditional methods, such as model distillation and human-annotated datasets, have their own limitations, including scalability issues and a lack of data diversity. Addressing these challenges is essential for enhancing the performance of AI systems in real-world applications, where they must interpret and execute a wide range of user-defined tasks.

Current approaches to instruction alignment primarily rely on human-annotated datasets and synthetic data generated through model distillation. While human-annotated data is high in quality, it is expensive and difficult to scale. On the other hand, synthetic data, often produced via distillation from larger models, tends to lack diversity and may lead to models that overfit to specific types of tasks, thereby limiting their ability to generalize to new instructions. These limitations, including high costs and the “false promise” of distillation, hinder the development of robust, versatile LLMs capable of handling a broad spectrum of tasks.

A team of researchers from the University of Washington and Meta FAIR proposes a novel method known as “instruction back-and-forth translation.” This approach enhances the generation of synthetic instruction-response pairs by integrating backtranslation with response rewriting. Initially, instructions are generated from pre-existing responses extracted from large-scale web corpora. These responses are then refined by an LLM, which rewrites them to better align with the generated instructions. This method leverages the rich diversity of information available on the web while ensuring high-quality, instruction-following data, marking a significant advancement in the field.

The approach involves fine-tuning a base LLM on seed data to create instructions that match web-scraped responses. The Dolma corpus, a large-scale open-source dataset, provides the source of these responses. After generating the initial instruction-response pairs, a filtering step retains only the highest quality pairs. An aligned LLM, such as Llama-2-70B-chat, then rewrites the responses to further enhance their quality. Nucleus sampling is employed for response generation, with a focus on both filtering and rewriting to ensure data quality. Testing against several baseline datasets reveals superior performance for models fine-tuned on synthetic data generated through this technique.
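
For reference, nucleus (top-p) sampling keeps the smallest set of tokens whose cumulative probability reaches p and samples from that renormalized set; the generic sketch below is not the authors' code:

import numpy as np

def nucleus_sample(probs, p=0.9, rng=np.random.default_rng(0)):
    order = np.argsort(probs)[::-1]                    # tokens from most to least probable
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, p)) + 1   # smallest prefix covering mass p
    kept = order[:cutoff]
    kept_probs = probs[kept] / probs[kept].sum()       # renormalize over the nucleus
    return int(rng.choice(kept, p=kept_probs))

print(nucleus_sample(np.array([0.5, 0.3, 0.15, 0.05]), p=0.9))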

This new method achieves significant improvements in model performance across various benchmarks. Models fine-tuned using the Dolma + filtering + rewriting dataset attain a win rate of 91.74% on the AlpacaEval benchmark, surpassing models trained on other prevalent datasets such as OpenOrca and ShareGPT. Additionally, it outperforms previous approaches using data from ClueWeb, demonstrating its effectiveness in generating high-quality, diverse instruction-following data. The enhanced performance underscores the success of the back-and-forth translation technique in producing better-aligned and more accurate large language models.

In conclusion, the introduction of this new method for generating high-quality synthetic data marks a significant advancement in aligning LLMs with human instructions. By combining back-translation with response rewriting, researchers have developed a scalable and effective approach that improves the performance of instruction-following models. This advancement is crucial for the AI field, offering a more efficient and accurate solution for instruction alignment, which is essential for deploying LLMs in practical applications.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our newsletter.

The post Cracking the Code of AI Alignment: This AI Paper from the University of Washington and Meta FAIR Unveils Better Alignment with Instruction Back-and-Forth Translation appeared first on MarkTechPost.

Answer.AI Releases answerai-colbert-small: A Proof of Concept for Smaller, Faster, Modern ColBERT Models

AnswerAI has unveiled a robust model called answerai-colbert-small-v1, showcasing the potential of multi-vector models when combined with advanced training techniques. This proof-of-concept model, developed using the innovative JaColBERTv2.5 training recipe and additional optimizations, demonstrates remarkable performance despite its compact size of just 33 million parameters. The model’s efficiency is particularly noteworthy, as it achieves these results while maintaining a footprint comparable to MiniLM.

In a surprising turn of events, answerai-colbert-small-v1 has surpassed the performance of all previous models of similar size on common benchmarks. Even more impressively, it has outperformed much larger and widely used models, including e5-large-v2 and bge-base-en-v1.5. This achievement underscores the potential of AnswerAI’s approach in pushing the boundaries of what’s possible with smaller, more efficient AI models.

Multi-vector retrievers, introduced through the ColBERT model architecture, offer a unique approach to document representation. Unlike traditional methods that create a single vector per document, ColBERT generates multiple smaller vectors, each representing a single token. This technique addresses the information loss often associated with single-vector representations, particularly in out-of-domain generalization tasks. The architecture also incorporates query augmentation, using masked language modeling to enhance retrieval performance.

ColBERT’s innovative MaxSim scoring mechanism calculates the similarity between query and document tokens, summing the highest similarities for each query token. While this approach consistently improves out-of-domain generalization, it initially faced challenges with in-domain tasks and required significant memory and storage resources. ColBERTv2 addressed these issues by introducing a more modern training recipe, including in-batch negatives and knowledge distillation, along with a unique indexing approach that reduced storage requirements.
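
A minimal MaxSim sketch with toy tensors follows (real ColBERT uses normalized token embeddings, so the dot products behave like cosine similarities):

import numpy as np

def maxsim(query_tokens, doc_tokens):
    # (num_query_tokens, dim) @ (dim, num_doc_tokens) -> pairwise similarity matrix;
    # for each query token take its best-matching document token, then sum.
    sims = query_tokens @ doc_tokens.T
    return float(sims.max(axis=1).sum())

query = np.random.randn(5, 32)    # 5 query token embeddings
doc = np.random.randn(80, 32)     # 80 document token embeddings
print(maxsim(query, doc))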

In the Japanese language context, JaColBERTv1 and v2 have demonstrated even greater success than their English counterparts. JaColBERTv1, following the original ColBERT training recipe, became the strongest monolingual Japanese retriever of its time. JaColBERTv2, built on the ColBERTv2 recipe, further improved performance and currently stands as the strongest out-of-domain retriever across all existing Japanese benchmarks, though it still faces some challenges in large-scale retrieval tasks like MIRACL.

The answerai-colbert-small-v1 model has been specifically designed with future compatibility in mind, particularly for the upcoming RAGatouille overhaul. This forward-thinking approach ensures that the model will remain relevant and useful as new technologies emerge. Despite its future-oriented design, the model maintains broad compatibility with recent ColBERT implementations, offering users flexibility in their choice of tools and frameworks.

For those interested in utilizing this innovative model, there are two primary options available. Users can opt for the Stanford ColBERT library, which is a well-established and widely-used implementation. Alternatively, they can choose RAGatouille, which may offer additional features or optimizations. The installation process for either or both of these libraries is straightforward, requiring a simple command execution to get started.
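
One possible way to try the model with RAGatouille is sketched below; the calls follow RAGatouille's quick-start pattern (pip install ragatouille), but verify them against the current documentation before relying on them:

from ragatouille import RAGPretrainedModel

rag = RAGPretrainedModel.from_pretrained("answerdotai/answerai-colbert-small-v1")
rag.index(
    collection=["ColBERT stores one small vector per token.",
                "Single-vector retrievers compress a whole document into one embedding."],
    index_name="demo",
)
print(rag.search(query="How does ColBERT represent documents?", k=1))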

Image source: https://huggingface.co/answerdotai/answerai-colbert-small-v1

The results of the answerai-colbert-small-v1 model demonstrate its exceptional performance when compared to single-vector models.

Image source: https://huggingface.co/answerdotai/answerai-colbert-small-v1

AnswerAI’s answerai-colbert-small-v1 model represents a significant advancement in multi-vector retrieval systems. Despite its compact 33 million parameters, it outperforms larger models like e5-large-v2 and bge-base-en-v1.5. Built on the ColBERT architecture and enhanced by the JaColBERTv2.5 training recipe, it excels in out-of-domain generalization. The model’s success stems from its multi-vector approach, query augmentation, and MaxSim scoring mechanism. Designed for future compatibility, particularly with the upcoming RAGatouille overhaul, it remains compatible with recent ColBERT implementations. Users can easily implement it using either the Stanford ColBERT library or RAGatouille, showcasing AnswerAI’s potential to reshape AI efficiency and performance.

Check out the Model Card and Details. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our newsletter.

The post Answer.AI Releases answerai-colbert-small: A Proof of Concept for Smaller, Faster, Modern ColBERT Models appeared first on MarkTechPost.

Neural Magic Releases LLM Compressor: A Novel Library to Compress LLMs for Faster Inference with vLLM

Neural Magic has released the LLM Compressor, a state-of-the-art tool for large language model optimization that enables far quicker inference through much more advanced model compression. The tool is an important building block in Neural Magic’s pursuit of making high-performance open-source solutions available to the deep learning community, especially inside the vLLM framework.

LLM Compressor addresses the difficulties that arose from the previously fragmented landscape of model compression tools, in which users had to rely on multiple bespoke libraries such as AutoGPTQ, AutoAWQ, and AutoFP8 to apply particular quantization and compression algorithms. LLM Compressor folds these fragmented tools into one library that makes it easy to apply state-of-the-art compression algorithms like GPTQ, SmoothQuant, and SparseGPT. These algorithms produce compressed models that offer reduced inference latency while maintaining high accuracy, which is critical for production environments.

A second key technical advancement the LLM Compressor brings is support for activation and weight quantization. Activation quantization, in particular, is important for utilizing the INT8 and FP8 tensor cores that are optimized for high-performance computing on newer NVIDIA GPU architectures such as Ada Lovelace and Hopper. This capability accelerates compute-bound workloads by easing the computational bottleneck with lower-precision arithmetic units. By quantizing activations and weights, the LLM Compressor allows up to a twofold increase in performance for inference tasks, mainly under high server loads. Large models like Llama 3.1 70B attest to this: using the LLM Compressor, the model achieves latency performance very close to that of an unquantized version running on four GPUs while using just two.
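
To illustrate what weight quantization does at its simplest, here is a generic symmetric INT8 example (an illustration of the idea, not the LLM Compressor API):

import numpy as np

def quantize_int8(weights):
    # Map float weights onto int8 with a single per-tensor scale factor.
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
print(np.abs(w - dequantize(q, scale)).max())  # small reconstruction error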

Besides activation quantization, the LLM Compressor supports state-of-the-art 2:4 structured sparsity weight pruning with SparseGPT. This pruning selectively removes redundant parameters, dropping 50% of the model’s weights while minimizing the loss in accuracy. In addition to accelerating inference, the combination of quantization and pruning minimizes the memory footprint and enables deployment of LLMs on resource-constrained hardware.

The LLM Compressor was designed to integrate easily into any open-source ecosystem, particularly the Hugging Face model hub, so compressed models can be loaded and run within vLLM without friction. The tool also supports a variety of quantization schemes with fine-grained control, such as per-tensor or per-channel quantization for weights and per-tensor or per-token quantization for activations. This flexibility allows the quantization strategy to be tuned to the performance and accuracy demands of different models and deployment scenarios.

Technically, the LLM Compressor is designed to work with various model architectures and to be extensible. Neural Magic has an aggressive roadmap for the tool, including extending support to MoE models, vision-language models, and non-NVIDIA hardware platforms. Other areas of the roadmap due for development include advanced quantization techniques such as AWQ and tools for creating non-uniform quantization schemes, which are expected to extend model efficiency further.

In conclusion, the LLM Compressor is an important tool for researchers and practitioners alike when optimizing LLMs for production deployment. It is open source and has state-of-the-art features, making it easier to compress models and obtain substantial performance improvements without affecting model integrity. As AI continues to scale, the LLM Compressor and similar tools will play an important role in efficiently deploying large models on diverse hardware environments, making them more accessible for a wide range of applications.

Check out the GitHub Page and Details. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our newsletter.

The post Neural Magic Releases LLM Compressor: A Novel Library to Compress LLMs for Faster Inference with vLLM appeared first on MarkTechPost.