Nous: An Open-Source TypeScript Platform for Building Autonomous AI Agents and LLM Workflows

Building and managing autonomous AI systems requires specialized knowledge because of the intricate interactions between their components. The AI landscape is also fragmented, with disparate tools and libraries that lead to integration challenges and inconsistencies. This fragmentation hinders the creation of standardized, interoperable, and reusable AI components, making development arduous and less accessible to a broader audience. Researchers have addressed the complexity and fragmentation of developing autonomous AI agents and Large Language Model (LLM) workflows by releasing Nous, an open-source TypeScript platform.

Current methods for developing autonomous AI agents and LLM workflows often involve specialized tools and libraries, each serving different purposes like data processing, model training, inference, and decision-making. However, these tools are often not standardized, making integration difficult and leading to inefficiencies in the development process. The proposed solution, Nous, is an open-source TypeScript platform that aims to streamline the creation and management of these complex AI systems. Nous provides a unified framework to simplify development by offering standardized tools and promoting interoperability among AI components. It empowers developers to build sophisticated AI systems without needing extensive expertise in every aspect of AI development.

Nous is built on a component-based architecture that allows developers to create and combine reusable modules for various AI tasks. This modularity promotes flexibility and scalability, enabling the platform to handle large-scale AI applications. The platform emphasizes declarative programming, where developers specify the desired outcomes rather than the exact steps to achieve them. This approach simplifies the development process and makes it easier to reason about the system’s behavior. Nous also integrates seamlessly with popular AI libraries and frameworks such as TensorFlow, PyTorch, and Hugging Face Transformers, making it an extensible and adaptable tool for diverse AI workflows. Although Nous has not yet been benchmarked quantitatively against existing approaches, its efficient design optimizes resource utilization and minimizes latency. It also prioritizes reliability and robustness, ensuring that AI systems built on the platform are dependable and resilient.

In conclusion, Nous offers a promising solution to the challenges of AI development by providing a standardized and efficient platform that simplifies the creation and management of autonomous AI agents and LLM workflows. By addressing the complexity and fragmentation in the AI landscape, Nous has the potential to accelerate innovation, improve accessibility to AI technologies, and foster collaboration among developers and researchers. The platform’s modularity, declarative programming approach, and integration with existing tools make it a powerful and versatile tool for building sophisticated AI systems, ultimately contributing to the advancement of artificial intelligence.

The post Nous: An Open-Source TypeScript Platform for Building Autonomous AI Agents and LLM Workflows appeared first on MarkTechPost.

FalconMamba 7B Released: The World’s First Attention-Free AI Model with 5500GT Training Data and 7 Billion Parameters

The Technology Innovation Institute (TII) in Abu Dhabi has recently unveiled the FalconMamba 7B, a groundbreaking artificial intelligence model. This model, the first strong attention-free 7B model, is designed to overcome many of the limitations existing AI architectures face, particularly in handling large data sequences. The FalconMamba 7B is released under the TII Falcon License 2.0. It is available as an open-access model within the Hugging Face ecosystem, making it accessible for researchers and developers globally.

FalconMamba 7B is built on the Mamba architecture, originally proposed in the paper “Mamba: Linear-Time Sequence Modeling with Selective State Spaces.” This architecture diverges from the traditional transformer models that dominate the AI landscape today. Transformers, while powerful, have a fundamental limitation in processing long sequences due to their reliance on attention mechanisms, whose compute and memory costs grow with sequence length. FalconMamba 7B overcomes these limitations with an architecture that includes extra RMS normalization layers to ensure stable training at scale. This enables the model to process sequences of arbitrary length without any increase in memory, allowing it to fit on a single 24GB A10 GPU.


One of the standout features of FalconMamba 7B is its constant token generation time, irrespective of the context size. This is a significant advantage over traditional models, where generation time typically grows with context length because each new token must attend to all previous tokens. The Mamba architecture addresses this by storing only its recurrent state, avoiding the linear scaling of memory requirements and per-token generation time with context length.
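
To make the constant-memory claim concrete, here is a minimal NumPy sketch of a plain (non-selective) state-space recurrence; the matrices, dimensions, and inputs are illustrative assumptions, not FalconMamba internals.

```python
import numpy as np

# Toy linear state-space recurrence (a plain SSM, not Mamba's selective-scan kernel).
# The generator keeps only a fixed-size state h, so per-token compute and memory
# stay constant no matter how many tokens have already been generated.
d_state, d_model = 16, 8
rng = np.random.default_rng(0)
A = np.eye(d_state) * 0.9                        # state transition (kept stable)
B = rng.standard_normal((d_state, d_model)) * 0.1
C = rng.standard_normal((d_model, d_state)) * 0.1

h = np.zeros(d_state)                            # constant-size recurrent state
for x_t in rng.standard_normal((100_000, d_model)):
    h = A @ h + B @ x_t                          # O(1) state update per token
    y_t = C @ h                                  # output for the current token
# A transformer, by contrast, would hold a KV cache covering all 100,000 previous tokens.
```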

The training of FalconMamba 7B involved approximately 5500GT (gigatokens, roughly 5.5 trillion tokens), primarily composed of RefinedWeb data, supplemented with high-quality technical and code data from public sources. The model was trained using a constant learning rate for most of the process, followed by a short learning rate decay stage. A small portion of high-quality curated data was added during this final stage to further enhance the model’s performance.


In terms of benchmarks, FalconMamba 7B has demonstrated impressive results across various evaluations. For example, the model scored 33.36 in the MATH benchmark, while in the MMLU-IFEval and BBH benchmarks, it scored 19.88 and 3.63, respectively. These results highlight the model’s strong performance compared to other state-of-the-art models, particularly in tasks requiring long sequence processing.

FalconMamba 7B’s architecture also enables it to fit larger sequences in a single 24GB A10 GPU compared to transformer models. It maintains a constant generation throughput without any increase in CUDA peak memory. This efficiency in handling large sequences makes FalconMamba 7B a highly versatile tool for applications requiring extensive data processing.

FalconMamba 7B is compatible with the Hugging Face transformers library (version >4.45.0). It supports features like bitsandbytes quantization, which allows the model to run within smaller GPU memory budgets. This makes it accessible to many users, from academic researchers to industry professionals.
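
As a rough sketch of that setup, the snippet below loads the model with 4-bit bitsandbytes quantization through the transformers API. The repo id tiiuae/falcon-mamba-7b and the generation settings are assumptions to check against the model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "tiiuae/falcon-mamba-7b"  # assumed repo id; confirm on the Hugging Face Hub

# 4-bit quantization via bitsandbytes keeps the 7B model within a small GPU's memory.
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)

inputs = tokenizer("State space models are", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same loading pattern applies to the instruction-tuned variant mentioned below, and wrapping the loaded model with torch.compile(model) is one way to obtain the faster inference described there.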

TII has introduced an instruction-tuned version of FalconMamba, fine-tuned with an additional 5 billion tokens of supervised fine-tuning data. This version enhances the model’s ability to perform instructional tasks more precisely and effectively. Users can also benefit from faster inference using torch.compile, further increasing the model’s utility in real-world applications.

In conclusion, with its innovative architecture, strong benchmark performance, and accessibility through the Hugging Face ecosystem, the Technology Innovation Institute’s FalconMamba 7B is poised to make a substantial impact across various sectors.

Check out the Model and Details. All credit for this research goes to the researchers of this project.

The post FalconMamba 7B Released: The World’s First Attention-Free AI Model with 5500GT Training Data and 7 Billion Parameters appeared first on MarkTechPost.

This AI Paper by Apple Introduces Matryoshka Diffusion Models: A Hierarchical Approach for Efficient High-Resolution Image Generation

Diffusion models have set new benchmarks for generating realistic, intricate images and videos. However, scaling these models to handle high-resolution outputs remains a formidable challenge. The primary issues revolve around the significant computational power and complex optimization processes required, which make it difficult to implement these models efficiently in practical applications.

One of the central problems in high-resolution image and video generation is the inefficiency and resource intensity of current diffusion models. These models must repeatedly reprocess entire high-resolution inputs, which is time-consuming and computationally demanding. Moreover, the need for deep architectures with attention blocks to manage high-resolution data further complicates the optimization process, making achieving the desired output quality even more challenging.

Traditional methods for generating high-resolution images typically involve a multi-stage process. Cascaded models, for example, create pictures at lower resolutions first and then enhance them through additional stages, resulting in a high-resolution output. Another common approach is using latent diffusion models, which operate in a downsampled latent space and depend on auto-encoders to generate high-resolution images. However, these methods come with challenges, such as increased complexity and a potential drop in quality due to the inherent compression in the latent space.

Researchers from Apple have introduced a groundbreaking approach known as Matryoshka Diffusion Models (MDM) to address these challenges in high-resolution image and video generation. MDM stands out by integrating a hierarchical structure into the diffusion process, eliminating the need for separate stages that complicate training and inference in traditional models. This innovative method enables the generation of high-resolution content more efficiently and with greater scalability, marking a significant advancement in the field of AI-driven visual content creation.

The MDM methodology is built on a NestedUNet architecture, where the features and parameters for smaller-scale inputs are embedded within those of larger scales. This nesting allows the model to handle multiple resolutions simultaneously, significantly improving training speed and resource efficiency. The researchers also introduced a progressive training schedule that starts with low-resolution inputs and gradually increases the resolution as training progresses. This approach speeds up the training process and enhances the model’s ability to optimize for high-resolution outputs. The architecture’s hierarchical nature ensures that computational resources are allocated efficiently across different resolution levels, leading to more effective training and inference.
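
The following toy PyTorch module is a hedged sketch of the nesting idea only: a small inner network denoises the low-resolution input, and its features are upsampled and reused by the outer high-resolution path so both scales are predicted jointly. It is not Apple’s NestedUNet implementation, and the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyNestedDenoiser(nn.Module):
    """Minimal nested two-scale denoiser (illustrative only)."""
    def __init__(self, channels: int = 32):
        super().__init__()
        self.inner = nn.Conv2d(3, channels, 3, padding=1)              # low-resolution path
        self.outer = nn.Conv2d(3 + channels, channels, 3, padding=1)   # high-resolution path
        self.head_low = nn.Conv2d(channels, 3, 1)
        self.head_high = nn.Conv2d(channels, 3, 1)

    def forward(self, x_low: torch.Tensor, x_high: torch.Tensor):
        h_low = F.silu(self.inner(x_low))                              # process the small image first
        h_up = F.interpolate(h_low, size=x_high.shape[-2:], mode="nearest")
        h_high = F.silu(self.outer(torch.cat([x_high, h_up], dim=1)))  # reuse low-res features
        return self.head_low(h_low), self.head_high(h_high)            # joint multi-resolution output

model = ToyNestedDenoiser()
eps_low, eps_high = model(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 256, 256))
```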

The performance of MDM is noteworthy, particularly in its ability to achieve high-quality results with less computational overhead compared to existing models. The research team from Apple demonstrated that MDM could train high-resolution models up to 1024×1024 pixels using the CC12M dataset, which contains 12 million images. Despite the relatively small size of the dataset, MDM achieved strong zero-shot generalization, meaning it performed well on new data without the need for extensive fine-tuning. The model’s efficiency is further highlighted by its ability to produce high-resolution images with Frechet Inception Distance (FID) scores that are competitive with state-of-the-art methods. For instance, MDM achieved a FID score of 6.62 on ImageNet 256×256 and 13.43 on MS-COCO 256×256, demonstrating its capability to generate high-quality images efficiently.

In conclusion, the introduction of Matryoshka Diffusion Models by researchers at Apple represents a significant step forward in high-resolution image and video generation. By leveraging a hierarchical structure and a progressive training schedule, MDM offers a more efficient and scalable solution than traditional methods. This advancement addresses the inefficiencies and complexities of existing diffusion models and paves the way for more practical and resource-efficient applications of AI-driven visual content creation. As a result, MDM holds great potential for future developments in the field, providing a robust framework for generating high-quality images and videos with reduced computational demands.

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.

The post This AI Paper by Apple Introduces Matryoshka Diffusion Models: A Hierarchical Approach for Efficient High-Resolution Image Generation appeared first on MarkTechPost.

Harness the power of AI and ML using Splunk and Amazon SageMaker Canvas

As the scale and complexity of data handled by organizations increase, traditional rules-based approaches to analyzing the data alone are no longer viable. Instead, organizations are increasingly looking to take advantage of transformative technologies like machine learning (ML) and artificial intelligence (AI) to deliver innovative products, improve outcomes, and gain operational efficiencies at scale. Furthermore, the democratization of AI and ML through AWS and AWS Partner solutions is accelerating its adoption across all industries.
For example, a health-tech company may be looking to improve patient care by predicting the probability that an elderly patient may become hospitalized by analyzing both clinical and non-clinical data. This will allow them to intervene early, personalize the delivery of care, and make the most efficient use of existing resources, such as hospital bed capacity and nursing staff.
AWS offers the broadest and deepest set of AI and ML services and supporting infrastructure, such as Amazon SageMaker and Amazon Bedrock, to help you at every stage of your AI/ML adoption journey, including adoption of generative AI. Splunk, an AWS Partner, offers a unified security and observability platform built for speed and scale.
As the diversity and volume of data increase, it is vital to understand how that data can be harnessed at scale using the complementary capabilities of the two platforms. For organizations looking beyond the use of out-of-the-box Splunk AI/ML features, this post explores how Amazon SageMaker Canvas, a no-code ML development service, can be used in conjunction with data collected in Splunk to drive actionable insights. We also demonstrate how to use the generative AI capabilities of SageMaker Canvas to speed up your data exploration and help you build better ML models.
Use case overview
In this example, a health-tech company offering remote patient monitoring is collecting operational data from wearables using Splunk. These device metrics and logs are ingested into and stored in a Splunk index, a repository of incoming data. Within Splunk, this data is used to fulfill context-specific security and observability use cases by Splunk users, such as monitoring the security posture and uptime of devices and performing proactive maintenance of the fleet.
Separately, the company uses AWS data services, such as Amazon Simple Storage Service (Amazon S3), to store data related to patients, such as patient information, device ownership details, and clinical telemetry data obtained from the wearables. These could include exports from customer relationship management (CRM), configuration management database (CMDB), and electronic health record (EHR) systems. In this example, they have access to an extract of patient information and hospital admission records that reside in an S3 bucket.
The following table illustrates the different data explored in this example use case.

| Description | Feature Name | Storage | Example Source |
| --- | --- | --- | --- |
| Age of patient | age | AWS | EHR |
| Units of alcohol consumed by patient every week | alcohol_consumption | AWS | EHR |
| Tobacco usage by patient per week | tabacco_use | AWS | EHR |
| Average systolic blood pressure of patient | avg_systolic | AWS | Wearables |
| Average diastolic blood pressure of patient | avg_diastolic | AWS | Wearables |
| Average resting heart rate of patient | avg_resting_heartrate | AWS | Wearables |
| Patient admission record | admitted | AWS | EHR |
| Number of days the device has been active over a period | num_days_device_active | Splunk | Wearables |
| Average end of the day battery level over a period | avg_eod_device_battery_level | Splunk | Wearables |

This post describes an approach with two key components:

The two data sources are stored alongside each other using a common AWS data engineering pipeline. Data is presented to the personas that need access using a unified interface.
An ML model to predict hospital admissions (admitted) is developed using the combined dataset and SageMaker Canvas. Professionals without a background in ML are empowered to analyze the data using no-code tooling.

The solution allows custom ML models to be developed from a broader variety of clinical and non-clinical data sources to cater for different real-life scenarios. For example, it can be used to answer questions such as “If patients have a propensity to have their wearables turned off and there is no clinical telemetry data available, can the likelihood that they are hospitalized still be accurately predicted?”
AWS data engineering pipeline
The adaptable approach detailed in this post starts with an automated data engineering pipeline to make data stored in Splunk available to a wide range of personas, including business intelligence (BI) analysts, data scientists, and ML practitioners, through a SQL interface. This is achieved by using the pipeline to transfer data from a Splunk index into an S3 bucket, where it will be cataloged.
The approach is shown in the following diagram.

Figure 1: Architecture overview of data engineering pipeline

The automated AWS data pipeline consists of the following steps:

Data from wearables is stored in a Splunk index where it can be queried by users, such as security operations center (SOC) analysts, using the Splunk search processing language (SPL). Splunk’s out-of-the-box AI/ML capabilities, such as the Splunk Machine Learning Toolkit (Splunk MLTK) and purpose-built models for security and observability use cases (for example, for anomaly detection and forecasting), can be utilized inside the Splunk Platform. Using these Splunk ML features allows you to derive contextualized insights quickly without the need for additional AWS infrastructure or skills.
Some organizations may look to develop custom, differentiated ML models, or want to build AI-enabled applications using AWS services for their specific use cases. To facilitate this, an automated data engineering pipeline is built using AWS Step Functions. The Step Functions state machine is configured with an AWS Lambda function to retrieve data from the Splunk index using the Splunk Enterprise SDK for Python. The SPL query requested through this REST API call is scoped to only retrieve the data of interest.

Lambda supports container images. This solution uses a Lambda function that runs a Docker container image. This allows larger data manipulation libraries, such as pandas and PyArrow, to be included in the deployment package.
If a large volume of data is being exported, the code may need to run for longer than the maximum possible duration, or require more memory than supported by Lambda functions. If this is the case, Step Functions can be configured to directly run a container task on Amazon Elastic Container Service (Amazon ECS).

For authentication and authorization, the Splunk bearer token is securely retrieved from AWS Secrets Manager by the Lambda function before calling the Splunk /search REST API endpoint. This bearer authentication token lets users access the REST endpoint using an authenticated identity.
Data retrieved by the Lambda function is transformed (if required) and uploaded to the designated S3 bucket alongside other datasets. This data is partitioned, compressed, and stored in the storage- and performance-optimized Apache Parquet file format; a hedged code sketch of this retrieval-and-upload step appears after this list.
As its last step, the Step Functions state machine runs an AWS Glue crawler to infer the schema of the Splunk data residing in the S3 bucket, and catalogs it for wider consumption as tables using the AWS Glue Data Catalog.
Wearable data exported from Splunk is now available to users and applications through the Data Catalog as a table. Analytics tooling such as Amazon Athena can now be used to query the data using SQL.
As data stored in your AWS environment grows, it is essential to have centralized governance in place. AWS Lake Formation allows you to simplify permissions management and data sharing to maintain security and compliance.
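
The sketch below is a minimal, hedged version of the export step (Secrets Manager lookup, scoped SPL search, Parquet upload). The Splunk host, index, secret id, and bucket names are placeholders, and it calls the Splunk search REST endpoint with plain requests for brevity, whereas the solution itself uses the Splunk Enterprise SDK for Python.

```python
import io
import json

import boto3
import pandas as pd   # pandas and PyArrow are available via the Lambda container image
import requests

def lambda_handler(event, context):
    # Retrieve the Splunk bearer token from AWS Secrets Manager
    # (assumes the secret is a JSON object with a "token" field).
    secrets = boto3.client("secretsmanager")
    token = json.loads(
        secrets.get_secret_value(SecretId="splunk/bearer-token")["SecretString"]
    )["token"]

    # Run a scoped SPL search against the Splunk REST API.
    response = requests.post(
        "https://splunk.example.com:8089/services/search/jobs/export",
        headers={"Authorization": f"Bearer {token}"},
        data={"search": "search index=wearables earliest=-7d | table *",
              "output_mode": "json"},
        timeout=300,
    )
    response.raise_for_status()
    rows = [json.loads(line)["result"]
            for line in response.text.splitlines() if '"result"' in line]

    # Transform (if required) and upload as compressed Parquet alongside other datasets.
    buffer = io.BytesIO()
    pd.DataFrame(rows).to_parquet(buffer, compression="snappy", index=False)
    boto3.client("s3").put_object(
        Bucket="example-healthtech-data-bucket",
        Key="splunk_ops_data/wearables_export.parquet",
        Body=buffer.getvalue(),
    )
    return {"rows_exported": len(rows)}
```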

An AWS Serverless Application Model (AWS SAM) template is available to deploy all AWS resources required by this solution. This template can be found in the accompanying GitHub repository.
Refer to the README file for required prerequisites, deployment steps, and the process to test the data engineering pipeline solution.
AWS AI/ML analytics workflow
After the data engineering pipeline’s Step Functions state machine successfully completes and wearables data from Splunk is accessible alongside patient healthcare data using Athena, we use an example approach based on SageMaker Canvas to drive actionable insights.
SageMaker Canvas is a no-code visual interface that empowers you to prepare data, build, and deploy highly accurate ML models, streamlining the end-to-end ML lifecycle in a unified environment. You can prepare and transform data through point-and-click interactions and natural language, powered by Amazon SageMaker Data Wrangler. You can also tap into the power of automated machine learning (AutoML) and automatically build custom ML models for regression, classification, time series forecasting, natural language processing, and computer vision, supported by Amazon SageMaker Autopilot.
In this example, we use the service to classify whether a patient is likely to be admitted to a hospital over the next 30 days based on the combined dataset.
The approach is shown in the following diagram.

Figure 2: Architecture overview of ML development

The solution consists of the following steps:

An AWS Glue crawler crawls the data stored in the S3 bucket. The Data Catalog exposes the data found in the folder structure as tables.
Athena provides a query engine that allows people and applications to interact with the tables using SQL (see the hedged query sketch after this list).
SageMaker Canvas uses Athena as a data source to allow the data stored in the tables to be used for ML model development.
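
As a hedged illustration of the Athena step, the snippet below runs the patient/device join as SQL through the Athena API with boto3. The database name and results location are placeholders; the table names and join keys match those used later in this post.

```python
import time

import boto3

athena = boto3.client("athena")

query = """
SELECT p.age, p.admitted, s.num_days_device_active, s.avg_eod_device_battery_level
FROM patient_data p
JOIN splunk_ops_data s ON p.id = s.user_id
"""

execution = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "healthtech_catalog"},                 # placeholder database
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},   # placeholder bucket
)
query_id = execution["QueryExecutionId"]

# Poll until the query finishes, then read the result set.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    print(f"Fetched {len(rows) - 1} data rows (the first row holds the column headers).")
```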

Solution overview
SageMaker Canvas allows you to build a custom ML model using a dataset that you have imported. In the following sections, we demonstrate how to create, explore, and transform a sample dataset, use natural language to query the data, check for data quality, create additional steps for the data flow, and build, test, and deploy an ML model.
Prerequisites
Before proceeding, refer to Getting started with using Amazon SageMaker Canvas to make sure you have the required prerequisites in place. Specifically, validate that the AWS Identity and Access Management (IAM) role your SageMaker domain is using has a policy attached with sufficient permissions to access Athena, AWS Glue, and Amazon S3 resources.
Create the dataset
SageMaker Canvas supports Athena as a data source. Data from wearables and patient healthcare data residing across your S3 bucket is accessed using Athena and the Data Catalog. This allows this tabular data to be directly imported into SageMaker Canvas to start your ML development.
To create your dataset, complete the following steps:

On the SageMaker Canvas console, choose Data Wrangler in the navigation pane.
On the Import and prepare dropdown menu, choose Tabular as the dataset type to denote that the imported data consists of rows and columns.

Figure 3: Importing tabular data using SageMaker Data Wrangler

For Select a data source, choose Athena.

On this page, you will see your Data Catalog database and tables listed, named patient_data and splunk_ops_data.

Join (inner join) the tables together using the user_id and id to create one overarching dataset that can be used during ML model development.
Under Import settings, enter unprocessed_data for Dataset name.
Choose Import to complete the process.

Figure 4: Joining data using SageMaker Data Wrangler

The combined dataset is now available to explore and transform using SageMaker Data Wrangler.
Explore and transform the dataset
SageMaker Data Wrangler enables you to transform and analyze the source dataset through data flows while still maintaining a no-code approach.
The previous step automatically created a data flow in the SageMaker Canvas console which we have renamed to data_prep_data_flow.flow. Additionally, two steps are automatically generated, as listed in the following table.

| Step | Name | Description |
| --- | --- | --- |
| 1 | Athena Source | Sets the unprocessed_data dataset as the source of the data flow |
| 2 | Data types | Sets column types of the unprocessed_data dataset |

Before we create additional transform steps, let’s explore two SageMaker Canvas features that can help us focus on the right actions.
Use natural language to query the data
SageMaker Data Wrangler also provides generative AI capabilities called Chat for data prep powered by a large language model (LLM). This feature allows you to explore your data using natural language without any background in ML or SQL. Furthermore, any contextualized recommendations returned by the generative AI model can be introduced directly back into the data flow without writing any code.
In this section, we present some example prompts to demonstrate this in action. These examples have been selected to illustrate the art of the possible. We recommend that you experiment with different prompts to gain the best results for your particular use cases.
Example 1: Identify Splunk default fields
In this first example, we want to know whether there are Splunk default fields that we could potentially exclude from our dataset prior to ML model development.

In SageMaker Data Wrangler, open your data flow.
Choose Step 2 Data types, and choose Chat for data prep.
In the Chat for data prep pane, you can enter prompts in natural language to explore and transform the data. For example:

Prompt: This data has been exported from a Splunk index. Which columns are likely to be Splunk default fields?
Response: _bkt, _cd, _indextime, _raw, _si, _sourcetype, _time are likely Splunk default fields based on the Summary section showing they are mostly metadata/index fields.
Prompt: OK. Let’s drop _bkt, _cd, _indextime, _raw, _si, _sourcetype, and _time.
Response: The code drops the columns ‘_bkt’, ‘_cd’, ‘_indextime’, ‘_raw’, ‘_si’, ‘_sourcetype’ and ‘_time’ from the dataframe df using the drop() method.

In this example, the generative AI LLM has correctly identified Splunk default fields that could be safely dropped from the dataset.

Choose Add to steps to add this identified transformation to the data flow.

Figure 5: Using SageMaker Data Wrangler’s chat for data prep to identify Splunk’s default fields

Example 2: Identify additional columns that could be dropped
We now want to identify any further columns that could be dropped without being too specific about what we’re looking for. We want the LLM to make the suggestions based on the data, and provide us with the rationale. For example:

Prompt: List out the names of other columns that we should drop before ML development. Give me a list of columns, along with the reasons why.
Response: The code drops the timestamp, punct, id, index and linecount columns from the dataset. These columns do not contain useful information for machine learning and would add noise to the models.

In addition to the Splunk default fields identified earlier, the generative AI model is now proposing the removal of columns such as timestamp, punct, id, index, and linecount that don’t appear to be conducive to ML model development.

Figure 6: Using SageMaker Data Wrangler’s chat for data prep to identify additional fields that can be dropped

Example 3: Calculate average age column in dataset
You can also use the generative AI model to perform Text2SQL tasks in which you can simply ask questions of the data using natural language. This is useful if you want to validate the content of the dataset.
In this example, we want to know what the average patient age value is within the dataset:

Prompt: What is the average age column of the dataset?
Response: The average age in the dataset is 67 years.

By expanding View code, you can see what SQL statements the LLM has constructed using its Text2SQL capabilities. This gives you full visibility into how the results are being returned.

Figure 7: Using SageMaker Data Wrangler’s chat for data prep to run SQL statements

Check for data quality
SageMaker Canvas also provides exploratory data analysis (EDA) capabilities that allow you to gain deeper insights into the data prior to the ML model build step. With EDA, you can generate visualizations and analyses to validate whether you have the right data, and whether your ML model build is likely to yield results that are aligned to your organization’s expectations.
Example 1: Create a Data Quality and Insights Report
Complete the following steps to create a Data Quality and Insights Report:

While in the data flow step, choose the Analyses tab.
For Analysis type, choose Data Quality and Insights Report.
For Target column, choose admitted.
For Problem type, choose Classification.

This performs an analysis of the data that you have and provides information such as the number of missing values and outliers.

Figure 8: Running SageMaker Data Wrangler’s data quality and insights report

Refer to Get Insights On Data and Data Quality for details on how to interpret the results of this report.
Example 2: Create a Quick Model
In this second example, choose Quick Model for Analysis type and for Target column, choose admitted. The Quick Model estimates the expected predicted quality of the model.
By running the analysis, the estimated F1 score (a measure of predictive performance) of the model and feature importance scores are displayed.

Figure 9: Running SageMaker Data Wrangler’s quick model feature to assess the potential accuracy of the model

SageMaker Canvas supports many other analysis types. By reviewing these analyses in advance of your ML model build, you can continue to engineer the data and features to gain sufficient confidence that the ML model will meet your business objectives.
Create additional steps in the data flow
In this example, we have decided to update our data_prep_data_flow.flow data flow to implement additional transformations. The following table summarizes these steps.

| Step | Transform | Description |
| --- | --- | --- |
| 3 | Chat for data prep | Removes the Splunk default fields identified. |
| 4 | Chat for data prep | Removes additional fields identified as being unhelpful to ML model development. |
| 5 | Group by | Groups the rows by user_id and calculates an average of the time-ordered numerical fields from Splunk. This converts the ML problem type from time series forecasting into a simple two-category prediction of the target feature (admitted) using averages of the input values over a given time period. Alternatively, SageMaker Canvas also supports time series forecasting. |
| 6 | Drop column (manage columns) | Drops remaining columns that are unnecessary for our ML development, such as columns with high cardinality (for example, user_id). |
| 7 | Parse column as type | Converts numerical value types, for example from Float to Long. This makes sure values, such as those in units of days, remain integers after calculations. |
| 8 | Parse column as type | Converts additional columns that need to be parsed (each column requires a separate step). |
| 9 | Drop duplicates (manage rows) | Drops duplicate rows to avoid overfitting. |

To create a new transform, view the data flow, then choose Add transform on the last step.

Figure 10: Using SageMaker Data Wrangler to add a transform to a data flow

Choose Add transform, and proceed to choose a transform type and its configuration.

Figure 11: Using SageMaker Data Wrangler to add a transform to a data flow

The following screenshot shows our newly updated end-to-end data flow featuring multiple steps. In this example, we ran the analyses at the end of the data flow.

Figure 12: Showing the end-to-end SageMaker Canvas Data Wrangler data flow

If you want to incorporate this data flow into a productionized ML workflow, SageMaker Canvas can create a Jupyter notebook that exports your data flow to Amazon SageMaker Pipelines.
Develop the ML model
To get started with ML model development, complete the following steps:

Choose Create model directly from the last step of the data flow.

Figure 13: Creating a model from the SageMaker Data Wrangler data flow

For Dataset name, enter a name for your transformed dataset (for example, processed_data).
Choose Export.

Figure 14: Naming the exported dataset to be used by the model in SageMaker Data Wrangler

This step will automatically create a new dataset.

After the dataset has been created successfully, choose Create model to begin the ML model creation.

Figure 15: Creating the model in SageMaker Data Wrangler

For Model name, enter a name for the model (for example, my_healthcare_model).
For Problem type, select Predictive analysis.
Choose Create.

Figure 16: Naming the model in SageMaker Canvas and selecting the predictive analysis type

You are now ready to progress through the Build, Analyze, Predict, and Deploy stages to develop and operationalize the ML model using SageMaker Canvas.

On the Build tab, for Target column, choose the column you want to predict (admitted).
Choose Quick build to build the model.

The Quick build option has a shorter build time, but the Standard build option generally yields higher accuracy.

Figure 17: Selecting the target column to predict in SageMaker Canvas

After a few minutes, on the Analyze tab, you will be able to view the accuracy of the model, along with column impact, scoring, and other advanced metrics. For example, we can see that a feature from the wearables data captured in Splunk—average_num_days_device_active—has a strong impact on whether the patient is likely to be admitted or not, along with their age. As such, the health-tech company may proactively reach out to elderly patients who tend to keep their wearables off to minimize the risk of their hospitalization.

Figure 18: Displaying the results from the model quick build in SageMaker Canvas

When you’re happy with the results from the Quick build, repeat the process with a Standard build to make sure you have an ML model with higher accuracy that can be deployed.
Test the ML model
Our ML model has now been built. If you’re satisfied with its accuracy, you can make predictions using this ML model using net new data on the Predict tab. Predictions can be performed either using batch (list of patients) or for a single entry (one patient).
Experiment with different values and choose Update prediction. The ML model will respond with a prediction for the new values that you have entered.
In this example, the ML model has identified a 64.5% probability that this particular patient will be admitted to hospital in the next 30 days. The health-tech company will likely want to prioritize the care of this patient.

Figure 19: Displaying the results from a single prediction using the model in SageMaker Canvas

Deploy the ML model
It is now possible for the health-tech company to build applications that can use this ML model to make predictions. ML models developed in SageMaker Canvas can be operationalized using a broader set of SageMaker services. For example:

You can register the model in Amazon SageMaker Model Registry so that data scientists and ML practitioners can continue to tune its performance, after which they can deploy it using machine learning operations (MLOps) practices. SageMaker Model Registry allows you to manage your ML models, compare versions, and visualize metrics. For more information, see Register and Deploy Models with Model Registry.
You can directly deploy the model as a SageMaker endpoint so that applications can call it to invoke the ML model and make predictions; a hedged invocation sketch follows this list. After a SageMaker endpoint is deployed, it runs 24/7. Refer to Amazon SageMaker pricing for details on on-demand pricing.
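
For the second option, here is a hedged sketch of invoking such an endpoint from application code. The endpoint name, feature order, and CSV payload format are illustrative assumptions; check the deployed model’s input schema before use.

```python
import boto3

runtime = boto3.client("sagemaker-runtime")

# One record in CSV form; the columns must match the features the Canvas model
# was trained on (illustrative values only).
payload = "82,4,10,131,84,76,12,61.5"

response = runtime.invoke_endpoint(
    EndpointName="canvas-my-healthcare-model",   # hypothetical endpoint name
    ContentType="text/csv",
    Body=payload,
)
print(response["Body"].read().decode("utf-8"))   # predicted class and probability
```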

To deploy the ML model, complete the following steps:

On the Deploy tab, choose Create Deployment.
Specify Deployment name, Instance type, and Instance count.
Choose Deploy to make the ML model available as a SageMaker endpoint.

In this example, we reduced the instance type to ml.m5.4xlarge and instance count to 1 before deployment.

Figure 20: Deploying the model using SageMaker Canvas

At any time, you can directly test the endpoint from SageMaker Canvas on the Test deployment tab of the deployed endpoint listed under Operations on the SageMaker Canvas console.
Refer to the Amazon SageMaker Canvas Developer Guide for detailed steps to take your ML model development through its full development lifecycle and build applications that can consume the ML model to make predictions.
Clean up
Refer to the instructions in the README file to clean up the resources provisioned for the AWS data engineering pipeline solution.
SageMaker Canvas bills you for the duration of the session, and we recommend logging out of SageMaker Canvas when you are not using it. Refer to Logging out of Amazon SageMaker Canvas for more details. Furthermore, if you deployed a SageMaker endpoint, make sure you have deleted it.
Conclusion
This post explored a no-code approach involving SageMaker Canvas that can drive actionable insights from data stored across both Splunk and AWS platforms using AI/ML techniques. We also demonstrated how you can use the generative AI capabilities of SageMaker Canvas to speed up your data exploration and build ML models that are aligned to your business’s expectations.
Learn more about AI on Splunk and ML on AWS.

About the Authors

Alan Peaty is a Senior Partner Solutions Architect, helping Global Systems Integrators (GSIs), Global Independent Software Vendors (GISVs), and their customers adopt AWS services. Prior to joining AWS, Alan worked as an architect at systems integrators such as IBM, Capita, and CGI. Outside of work, Alan is a keen runner who loves to hit the muddy trails of the English countryside, and is an IoT enthusiast.

Brett Roberts is the Global Partner Technical Manager for AWS at Splunk, leading the technical strategy to help customers better secure and monitor their critical AWS environments and applications using Splunk. Brett was a member of the Splunk Trust and holds several Splunk and AWS certifications. Additionally, he co-hosts a community podcast and blog called Big Data Beard, exploring trends and technologies in the analytics and AI space.

Arnaud Lauer is a Principal Partner Solutions Architect in the Public Sector team at AWS. He enables partners and customers to understand how to best use AWS technologies to translate business needs into solutions. He brings more than 18 years of experience in delivering and architecting digital transformation projects across a range of industries, including public sector, energy, and consumer goods.

Supercharge Email & Ad Remarketing Reach with Customers.ai Shopify App

Hey DTC marketers! We’ve got some news that’s going to make your life a whole lot easier and a lot more profitable if your store is on Shopify. 

At Customers.ai, we help DTC businesses like yours grow, scale, and crush your sales goals. 

So while you’re busy building your brand and driving customer acquisition and retention initiatives, Customers.ai’s Shopify app for email and ad remarketing boosts your store’s sales by increasing the reach and efficiency of your highest performing marketing channels.

Introducing the Customers.ai Shopify App—your all-in-one visitor identification and retargeting solution for supercharging your Shopify store right from the platform you already know and love.

What’s New?

You’re probably thinking, “Wait, doesn’t Customers.ai already integrate with Shopify?” And you’re right! But we’ve taken things to the next level. 

Our new Shopify App is not just an integration, it’s a full-blown app that you can add directly from the Shopify App Store, making the setup faster, easier, and more seamless than ever before.

Why This App is a Game-Changer for Your Ecommerce Business

Ecommerce is competitive—there’s no getting around that. You need every advantage you can get to stand out, attract customers and, most importantly, keep them coming back. 

That’s where the Customers.ai Shopify Marketing App comes in.

1. Easy Access Right from Shopify

We know your time is valuable, so we’ve made it easier than ever to integrate Customers.ai into your Shopify store. 

The app serves as a gateway, allowing you to seamlessly connect your store with Customers.ai. This setup gives you the best of both worlds: the convenience of accessing Customers.ai through Shopify and the full suite of tools and insights available in our dedicated platform. 

Whether you’re running advanced audience segmentation, launching targeted campaigns, or analyzing customer behavior, you’ll find it all just a click away from your Shopify admin panel.

2. Scale Your Business with First-Party Visitor Data

We all know the best data is the kind that comes directly from your customers. 

With privacy concerns on the rise and third-party cookies becoming a thing of the past, relying on first-party data is more crucial than ever. That’s where the Customers.ai Shopify App shines.

We empower you to collect and utilize first-party data from your Shopify store seamlessly. You’ll gain valuable insights directly from your customers, allowing you to understand who is visiting your site along with their behaviors, preferences, and purchasing patterns. 

This means you can create highly personalized marketing campaigns that resonate with your audience, driving more engagement and conversions.

Not only does this give you a competitive edge, but it also ensures that you’re building trust with your customers by respecting their privacy and using their data responsibly. 

3. Boost Your Sales with Powerful Retargeting

One of the biggest challenges ecommerce marketers face is turning visitors into buyers, and even harder—getting those buyers to come back. 

Customers.ai makes this easy by helping you re-engage lost visitors and upsell to existing customers, by identifying those most likely to convert, based on their interactions with your store.

Instead of just creating ads, we equip you with the tools to connect with the right audience through multi-channel retargeting, whether it’s via Facebook, Instagram, Google, or even Messenger and SMS.

4. Seamless Customer Journey Mapping

Understanding your customer’s journey is key to optimizing your marketing efforts. 

With Customers.ai, you can easily map out the entire customer journey from the first click to the final purchase and beyond. 

You’ll have access to data-driven insights that allow you to fine-tune your strategies and maximize your ROI.

Plus, with the customer journey data all in one place, you’ll save time on manual tasks, allowing you to focus on what you do best: growing your brand.

Get Started Today!

Ready to see the Customers.ai Shopify App in action?

Head over to the Shopify App Store and download the Customers.ai Shopify Marketing App today. Setup is quick and easy, and our support team is always here to help you get the most out of the app.

Whether you’re just starting out or you’re an ecommerce veteran, the Customers.ai Shopify App is designed to help you achieve your goals faster and more effectively.

Don’t just take our word for it—try it out and experience the difference for yourself. 

Happy selling!

The post Supercharge Email & Ad Remarketing Reach with Customers.ai Shopify App appeared first on Customers.ai.

Integrating Stereoelectronic Effects into Molecular Graphs: A Novel Approach for Enhanced Machine Learning Representations and Molecular Property Predictions

Traditional molecular representations, primarily focused on covalent bonds, have neglected crucial aspects like delocalization and non-covalent interactions. Existing machine learning models have utilized information-sparse representations, limiting their ability to capture molecular complexity. While computational chemistry has developed robust quantum-mechanical methods, their application in machine learning has been constrained by calculation challenges for complex systems. Graph-based representations have provided some topological information but lack quantum-chemical priors.

The increasing complexity of prediction tasks has highlighted the need for higher-fidelity representations. This work addresses these gaps by introducing stereoelectronics-infused molecular graphs (SIMGs), which incorporate quantum-chemical interactions. SIMGs aim to enhance the interpretability and performance of machine learning models in molecular property prediction, overcoming the limitations of previous approaches and providing a more comprehensive understanding of molecular behavior.

Molecular representation is crucial for understanding chemical reactions and designing new materials. Traditional models use information-sparse representations, which are inadequate for complex tasks. This paper introduces stereoelectronics-infused molecular graphs (SIMGs), incorporating quantum-chemical information into molecular graphs. SIMGs enhance traditional representations by adding nodes for bond orbitals and lone pairs, addressing the neglect of essential interactions like delocalization and non-covalent forces. This approach aims to provide a more comprehensive understanding of molecular interactions, improving machine learning algorithms’ performance in predicting molecular properties and enabling evaluation of previously intractable systems, such as entire proteins.

The researchers employed Q-Chem 6.0.1 and NBO 7.0 for calculations using a high-throughput workflow infrastructure. They conducted Natural Bond Orbital analysis to quantify localized electron information, excluding Rydberg orbitals. The team introduced stereoelectronics-infused molecular graphs (SIMGs), incorporating stereoelectronic effects and representing donor-acceptor interactions. Their model architecture stacks multiple graph neural network blocks with graph attention layers and ReLU activation, addressing over-smoothing issues in multi-layer networks. Performance evaluation focused on lone pair classification and bond-related prediction tasks, demonstrating high accuracy and a 98% reconstruction rate of ground-truth extended graphs.
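
A hedged, minimal sketch of the kind of architecture described here (stacked graph neural network blocks with graph attention layers and ReLU, plus a node-level head) could look like the following in PyTorch Geometric; the layer sizes and the lone-pair head are illustrative, not the authors’ configuration.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv

class ToySIMGNet(torch.nn.Module):
    """Stacked graph-attention blocks with a node-level head (illustrative only)."""
    def __init__(self, in_dim: int, hidden_dim: int = 64,
                 num_layers: int = 4, num_lone_pair_classes: int = 5):
        super().__init__()
        dims = [in_dim] + [hidden_dim] * num_layers
        self.convs = torch.nn.ModuleList(
            GATConv(d_in, d_out) for d_in, d_out in zip(dims[:-1], dims[1:])
        )
        self.node_head = torch.nn.Linear(hidden_dim, num_lone_pair_classes)

    def forward(self, x, edge_index):
        for conv in self.convs:
            x = F.relu(conv(x, edge_index))   # one graph-attention block + ReLU
        return self.node_head(x)              # e.g., lone-pair count/type logits per node
```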

The model demonstrated exceptional performance across various prediction tasks, achieving high accuracy in classifying lone pair quantities and types. It successfully reconstructed the ground-truth extended graph in 98% of cases. Node-level tasks showed remarkable performance, with atom-related predictions achieving excellent R² scores and low MAEs and RMSEs. Lone pair predictions, especially for s and p-character, achieved excellent scores, while d-prediction tasks showed slightly lower performance due to limited data.

Bond-related task predictions were favorable, particularly for hybridization characters and polarizations. Performance positively correlated with interaction sample abundance. The F1 score ensured unbiased measurements for imbalanced classifications, highlighting the model’s effectiveness in capturing long-range interactions. These results underscore the successful integration of stereoelectronic effects into molecular graphs, significantly enhancing the model’s predictive capabilities across various molecular properties while also addressing challenges associated with d-character predictions. 

The study concludes that incorporating stereoelectronic interactions into molecular graphs significantly enhances machine-learning model performance, enabling a detailed understanding of molecular properties and behaviors. This approach allows predictions for previously inaccessible molecules, including complex biological structures. The new representation facilitates high-throughput Natural Bond Orbital analysis, potentially accelerating theoretical chemistry research. The tailored double-graph neural network workflow enables the broad application of learned representations. These findings suggest further exploration of stereoelectronic effects could lead to more sophisticated models, expanding applications in drug discovery and materials science. The study demonstrates the potential for advanced molecular representations to revolutionize predictive capabilities in chemistry and related fields.

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.

The post Integrating Stereoelectronic Effects into Molecular Graphs: A Novel Approach for Enhanced Machine Learning Representations and Molecular Property Predictions appeared first on MarkTechPost.

Revolutionizing AI with Mamba: A Survey of Its Capabilities and Future Directions

Deep learning has revolutionized various domains, with Transformers emerging as a dominant architecture. However, Transformers struggle to process lengthy sequences due to their quadratic computational complexity. Recently, a novel architecture named Mamba has shown promise in building foundation models with abilities comparable to Transformers while maintaining near-linear scalability with sequence length. This survey aims to comprehensively understand this emerging model by consolidating existing Mamba-empowered studies.

Transformers have empowered numerous advanced models, especially large language models (LLMs) comprising billions of parameters. Despite their impressive achievements, Transformers still face inherent limitations, particularly time-consuming inference resulting from the quadratic computational complexity of attention. To address these challenges, Mamba, inspired by classical state space models, has emerged as a promising alternative for building foundation models. Mamba delivers modeling abilities comparable to Transformers while preserving near-linear scalability with respect to sequence length, making it a potential game-changer in deep learning.

Mamba’s architecture is a unique blend of concepts from recurrent neural networks (RNNs), Transformers, and state space models. This hybrid approach allows Mamba to harness the strengths of each architecture while mitigating their weaknesses. The innovative selection mechanism within Mamba is particularly noteworthy; it parameterizes the state space model based on the input, enabling the model to dynamically adjust its focus on relevant information. This adaptability is crucial for handling diverse data types and maintaining performance across various tasks.
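
To illustrate the selection mechanism in isolation, here is a toy single-channel NumPy scan in which the step size and the B/C projections are computed from each input token; it is illustrative only, not the official Mamba kernel or parameterization.

```python
import numpy as np

# Toy single-channel selective scan (illustrative only, not the official Mamba kernel).
# The "selection mechanism": the step size dt and the projections B_t, C_t all depend
# on the current input x_t, letting the recurrence decide per token what to retain.
rng = np.random.default_rng(0)
d_state = 16
A = -np.exp(rng.standard_normal(d_state))      # diagonal state matrix (negative => stable)
W_B = rng.standard_normal(d_state)
W_C = rng.standard_normal(d_state)
w_dt = rng.standard_normal()

x = rng.standard_normal(1_000)                 # a length-1000 scalar input sequence
h = np.zeros(d_state)                          # fixed-size recurrent state
y = np.empty_like(x)
for t, x_t in enumerate(x):
    dt = np.log1p(np.exp(w_dt * x_t))          # input-dependent step size (softplus)
    B_t, C_t = W_B * x_t, W_C * x_t            # input-dependent input/output projections
    h = np.exp(A * dt) * h + dt * B_t * x_t    # discretized, input-gated state update
    y[t] = C_t @ h                             # readout for token t
```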

Mamba’s performance is a standout feature, demonstrating remarkable efficiency. It achieves up to three times faster computation on A100 GPUs compared to traditional Transformer models. This speedup is attributed to its ability to compute recurrently with a scanning method, which reduces the overhead associated with attention calculations. Moreover, Mamba’s near-linear scalability means that as the sequence length increases, the computational cost does not grow exponentially. This feature makes it feasible to process long sequences without incurring prohibitive resource demands, opening new avenues for deploying deep learning models in real-time applications.

Moreover, Mamba’s architecture has been shown to retain powerful modeling capabilities for complex sequential data. By effectively capturing long-range dependencies and managing memory through its selection mechanism, Mamba can outperform traditional models in tasks requiring deep contextual understanding. This performance is particularly evident in applications such as text generation and image processing, where maintaining context over long sequences is paramount. As a result, Mamba stands out as a promising foundation model that not only addresses the limitations of Transformers but also paves the way for future advancements in deep learning applications across various domains.

This survey comprehensively reviews recent Mamba-associated studies, covering advancements in Mamba-based models, techniques for adapting Mamba to diverse data, and applications where Mamba can excel. Mamba’s powerful modeling capabilities for complex and lengthy sequential data and near-linear scalability make it a promising alternative to Transformers. The survey also discusses current limitations and explores promising research directions to provide deeper insights for future investigations. As Mamba continues to evolve, it holds great potential to significantly impact various fields and push the boundaries of deep learning.

Check out the Paper. All credit for this research goes to the researchers of this project.

The post Revolutionizing AI with Mamba: A Survey of Its Capabilities and Future Directions appeared first on MarkTechPost.

Understanding Language Model Distillation

Knowledge Distillation (KD) has become a key technique in the field of Artificial Intelligence, especially in the context of Large Language Models (LLMs), for transferring the capabilities of proprietary models, like GPT-4, to open-source alternatives like LLaMA and Mistral. In addition to improving the performance of open-source models, this procedure is essential for compressing them and increasing their efficiency without significantly sacrificing their functionality. KD also helps open-source models become better versions of themselves by empowering them to become their own instructors.

In recent research, a thorough analysis of KD’s function in LLMs has been presented, highlighting the significance of transferring advanced knowledge to smaller, less resource-intensive models. The study is structured around three primary pillars: verticalisation, skill, and algorithm. Each pillar embodies a distinct facet of knowledge distillation, from the fundamental workings of the employed algorithms, to the augmentation of particular cognitive capacities within the models, to the real-world application of these methods in specific domains.

A Twitter user has elaborated on the study in a recent tweet. Within language models, distillation describes a process that condenses a vast and intricate model, referred to as the teacher model, into a more manageable and effective model, referred to as the student model. The main objective is to transfer the teacher’s knowledge to the student to enable the learner to perform at a level that is comparable to the teacher’s while utilizing a lot less processing power.

This is accomplished by teaching the student model to behave in a way that resembles that of the instructor, either by mirroring the teacher’s output distributions or by matching the teacher’s internal representations. Techniques like logit-based distillation and hidden states-based distillation are frequently used in the distillation process.
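
As a concrete reference point for logit-based distillation, here is a minimal PyTorch sketch of the classic soft-target loss (a generic formulation, not tied to any particular model or toolkit mentioned here); hidden states-based variants instead match intermediate representations, typically with an MSE term.

```python
import torch
import torch.nn.functional as F

def logit_distillation_loss(student_logits, teacher_logits, labels,
                            temperature: float = 2.0, alpha: float = 0.5):
    """Blend a softened KL term against the teacher with the usual hard-label loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)                       # rescale so gradients match the hard loss
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage: batch of 4 examples, vocabulary of 10 tokens.
loss = logit_distillation_loss(
    torch.randn(4, 10), torch.randn(4, 10), torch.randint(0, 10, (4,))
)
```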

The principal advantage of distillation lies in its substantial decrease in both model size and computational needs, enabling the deployment of models in resource-constrained environments. Even at its reduced size, the student model can frequently retain a high level of performance, closely approaching the larger teacher model's capabilities. When memory and processing power are limited, as they are in embedded systems and mobile devices, this efficiency is critical.

Distillation also allows freedom in the choice of the student model's architecture. A considerably smaller model, such as StableLM-2-1.6B, can be created using the knowledge of a much bigger model, such as Llama-3.1-70B, making the larger model's capabilities available in settings where deploying it directly would not be feasible. Compared with conventional training, distillation techniques such as those offered by Arcee AI's DistillKit can deliver significant performance gains, frequently without the need for extra training data.

In conclusion, this study is a useful resource for researchers, providing a thorough summary of state-of-the-art approaches in knowledge distillation and recommending possible directions for further investigation. By bridging the gap between proprietary and open-source LLMs, this work highlights the potential for creating AI systems that are more powerful, accessible, and efficient.

Check out the Related Paper. All credit for this research goes to the researchers of this project.

The post Understanding Language Model Distillation appeared first on MarkTechPost.

Crab Framework Released: An AI Framework for Building LLM Agent Benchmark Environments in a Python-Centric Way

The development of autonomous agents capable of performing complex tasks across various environments has gained significant traction in artificial intelligence research. These agents are designed to interpret and execute natural language instructions within graphical user interface (GUI) environments, such as websites, desktop operating systems, and mobile devices. The ability of these agents to seamlessly navigate and perform tasks in these diverse environments is crucial for advancing human-computer interaction, allowing machines to handle increasingly intricate functions that span multiple platforms and systems.

A major challenge in this area is the development of reliable benchmarks that can accurately assess the performance of these agents in real-world scenarios. Traditional benchmarks often fail to meet this need due to limitations, such as a narrow focus on single-environment tasks, reliance on static datasets, and simplistic evaluation methods that do not reflect the dynamic nature of real-world applications. For example, existing benchmarks evaluate agents based on whether they achieve a final goal without considering the incremental progress made during the task or the multiple valid approaches an agent might take. This results in a less comprehensive evaluation that may not accurately capture the agent’s capabilities.

Researchers from KAUST, Eigent.AI, UTokyo, CMU, Stanford, Harvard, Tsinghua, SUSTech, and Oxford have developed the Crab framework, a novel benchmarking tool designed to evaluate cross-environment tasks. This framework stands out by supporting functions that span multiple devices and platforms, such as desktops and mobile phones, and by incorporating a graph-based evaluation method that offers a more detailed and nuanced assessment of an agent’s performance. Unlike traditional benchmarks, the Crab framework allows for the simultaneous operation of agents across different environments, making it more reflective of the complexities agents face in real-world scenarios.

The Crab framework introduces an innovative approach to task evaluation by decomposing complex tasks into smaller, manageable sub-tasks, each represented as nodes in a directed acyclic graph (DAG). This graph-based structure enables the sequential and parallel execution of sub-tasks, evaluated at multiple points rather than just at the end. This approach allows for assessing an agent’s performance at each task step, providing a more accurate picture of how well the agent functions across different environments. The flexibility of this method also accommodates multiple valid pathways to completing a task, ensuring a fairer and more comprehensive evaluation.
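
As a loose illustration of this graph-based idea (not the actual Crab API), the sketch below represents sub-tasks as nodes with prerequisites, marks a node complete only when its prerequisites are complete and its checker passes, and reports the fraction of completed nodes as partial credit. The task names, checkers, and environment state are hypothetical.

# Illustrative sketch of graph-based task evaluation; sub-task names and checkers are hypothetical.
subtasks = {
    "open_settings": {"requires": [], "checker": lambda state: state.get("settings_open", False)},
    "enable_wifi": {"requires": ["open_settings"], "checker": lambda state: state.get("wifi_on", False)},
    "send_email": {"requires": [], "checker": lambda state: state.get("email_sent", False)},
    "confirm_on_phone": {"requires": ["enable_wifi", "send_email"], "checker": lambda state: state.get("phone_confirmed", False)},
}

def evaluate(state):
    completed = set()
    changed = True
    while changed:  # propagate completion through the DAG until no new nodes pass
        changed = False
        for name, node in subtasks.items():
            if name in completed:
                continue
            prereqs_done = all(dep in completed for dep in node["requires"])
            if prereqs_done and node["checker"](state):
                completed.add(name)
                changed = True
    completion_ratio = len(completed) / len(subtasks)
    return completed, completion_ratio

state = {"settings_open": True, "wifi_on": True, "email_sent": False}
done, cr = evaluate(state)
print(done, f"completion ratio = {cr:.2f}")  # partial credit even though the final goal is unmet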

In the Crab Benchmark-v0, the researchers implemented a set of 100 real-world tasks that span both cross-environment and single-environment challenges. These tasks are designed to reflect common real-world applications, such as managing calendars, sending emails, navigating maps, and interacting with web browsers and terminal commands. The benchmark includes 29 tasks for Android devices, 53 tasks for Ubuntu desktops, and 18 tasks that require interaction between both environments. This comprehensive set of functions allows for a rigorous assessment of how well agents can perform across different platforms, simulating real-world conditions as closely as possible.

The research team tested the Crab framework using four advanced multimodal language models (MLMs): GPT-4o, GPT-4 Turbo, Claude 3 Opus, and Gemini 1.5 Pro. The agents were evaluated in single-agent and multi-agent configurations, with nine different agent settings tested. The results revealed that the single-agent setup using the GPT-4o model achieved the highest task completion ratio of 35.26%, indicating its superior ability to handle cross-environment tasks. In contrast, other models and configurations showed varying effectiveness, with multi-agent structures generally performing slightly lower than single-agent setups. The performance metrics introduced by the Crab framework, such as Completion Ratio (CR), Execution Efficiency (EE), and Cost Efficiency (CE), successfully differentiated between the methods, highlighting the strengths & weaknesses of each model.

The framework also provided insights into why tasks were not completed, with the termination reasons categorized as False Completion, Reach Step Limit, and Invalid Action. For instance, multi-agent structures were more likely to produce invalid actions or incorrectly complete tasks due to potential miscommunication between agents. This analysis underlined the importance of improving communication protocols within multi-agent systems to enhance their overall performance.

In conclusion, the Crab framework introduces a detailed graph-based evaluation method and supports cross-environment tasks, offering a more dynamic and accurate assessment of agent performance. The benchmark’s rigorous testing with advanced MLMs such as GPT-4o and GPT-4 Turbo has provided valuable insights into the capabilities & challenges of current autonomous agents, paving the way for future research and development in this field. The framework’s ability to closely mirror real-world conditions makes it a critical tool for advancing the state of autonomous agent research.

Check out the Paper, GitHub, and Project Page. All credit for this research goes to the researchers of this project.

The post Crab Framework Released: An AI Framework for Building LLM Agent Benchmark Environments in a Python-Centric Way appeared first on MarkTechPost.

Parler-TTS Released: A Fully Open-Sourced Text-to-Speech Model with Advanced Speech Synthesis for Complex and Lightweight Applications

Parler-TTS has emerged as a robust text-to-speech (TTS) library, offering two powerful models: Parler-TTS Large v1 and Parler-TTS Mini v1. Both models are trained on an impressive 45,000 hours of audio data, enabling them to generate high-quality, natural-sounding speech with remarkable control over various features. Users can manipulate aspects such as gender, background noise, speaking rate, pitch, and reverberation through simple text prompts, providing unprecedented flexibility in speech generation.

Image source: https://huggingface.co/spaces/parler-tts/parler_tts

The Parler-TTS Large v1 model boasts 2.2 billion parameters, making it a formidable tool for complex speech synthesis tasks. On the other hand, Parler-TTS Mini v1 serves as a lightweight alternative, offering similar capabilities in a more compact form. Both models are part of the broader Parler-TTS project, which aims to provide the community with comprehensive TTS training resources and dataset pre-processing code, fostering innovation and development in the field of speech synthesis.

One of the standout features of both Parler-TTS models is their ability to ensure speaker consistency across generations. The models have been trained on 34 distinct speakers, each characterized by name (e.g., Jon, Lea, Gary, Jenna, Mike, Laura). This feature allows users to specify a particular speaker in their text descriptions, enabling the generation of consistent voice outputs across multiple instances. For example, users can create a description like “Jon’s voice is monotone yet slightly fast in delivery” to maintain a specific speaker’s characteristics.

Image source: https://huggingface.co/spaces/parler-tts/parler_tts

The Parler-TTS project stands out from other TTS models due to its commitment to open-source principles. All datasets, pre-processing tools, training code, and model weights are released publicly under permissive licenses. This approach enables the community to build upon and extend the work, fostering the development of even more powerful TTS models. The project’s ecosystem includes the Parler-TTS repository for model training and fine-tuning, the Data-Speech repository for dataset annotation, and the Parler-TTS organization for accessing annotated datasets and future checkpoints.

To optimize the quality and characteristics of generated speech, Parler-TTS offers several useful tips for users. One key technique is to include specific terms in the text description to control audio clarity. For instance, incorporating the phrase “very clear audio” will prompt the model to generate the highest quality audio output. Conversely, using “very noisy audio” will introduce higher levels of background noise, allowing for more diverse and realistic speech environments when needed.

Punctuation plays a crucial role in controlling the prosody of generated speech. Users can utilize this feature to add nuance and natural pauses to the output. For example, strategically placing commas in the input text will result in small breaks in the generated speech, mimicking the natural rhythm and flow of human conversation. This simple yet effective method allows for greater control over the pacing and emphasis of the generated audio.

The remaining speech features, such as gender, speaking rate, pitch, and reverberation, can be directly manipulated through the text prompt. This level of control allows users to fine-tune the generated speech to match specific requirements or preferences. By carefully crafting the input description, users can achieve a wide range of voice characteristics, from a slow, deep masculine voice to a rapid, high-pitched feminine one, with varying degrees of reverberation to simulate different acoustic environments.
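
To make this prompt-based control concrete, the sketch below follows the project's published usage pattern; the checkpoint id, speaker description, and output path are illustrative and worth verifying against the Parler-TTS documentation.

# Sketch based on the Parler-TTS project's published usage; checkpoint id and description are illustrative.
import torch
import soundfile as sf
from transformers import AutoTokenizer
from parler_tts import ParlerTTSForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler-tts-mini-v1").to(device)
tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-tts-mini-v1")

prompt = "Hey, how are you doing today?"
description = "Jon's voice is monotone yet slightly fast in delivery, with very clear audio."

# The description conditions speaker identity, pace, pitch, and recording quality;
# the prompt is the text to be spoken.
input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio = generation.cpu().numpy().squeeze()
sf.write("parler_tts_out.wav", audio, model.config.sampling_rate)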

Parler-TTS emerges as a cutting-edge text-to-speech library, featuring two models: Large v1 and Mini v1. Trained on 45,000 hours of audio, these models generate high-quality speech with controllable features. The library offers speaker consistency across 34 voices and embraces open-source principles, fostering community innovation. Users can optimize output by specifying audio clarity, using punctuation for prosody control, and manipulating speech characteristics through text prompts. With its comprehensive ecosystem and user-friendly approach, Parler-TTS represents a significant advancement in speech synthesis technology, providing powerful tools for both complex tasks and lightweight applications.

Check out the GitHub and Demo. All credit for this research goes to the researchers of this project.

The post Parler-TTS Released: A Fully Open-Sourced Text-to-Speech Model with Advanced Speech Synthesis for Complex and Lightweight Applications appeared first on MarkTechPost.

Unraveling Human Reward Learning: A Hybrid Approach Combining Reinforcement Learning with Advanced Memory Architectures

Human reward-guided learning is often modeled using simple reinforcement learning (RL) algorithms that summarize past experiences into key variables like Q-values, which represent expected rewards. However, recent findings suggest that these models oversimplify the complexity of human memory and decision-making. For instance, individual events and global reward statistics can significantly influence behavior, indicating that memory involves more than just summary statistics. Artificial neural networks (ANNs), particularly recurrent neural networks (RNNs), offer a more expressive alternative by capturing long-term dependencies and intricate learning mechanisms, though they are typically less interpretable than traditional RL models.

Researchers from institutions including Google DeepMind, University of Oxford, Princeton University, and University College London studied human reward-learning behavior using a hybrid approach combining RL models with ANNs. Their findings suggest that human behavior cannot be adequately explained by algorithms that only incrementally update choice variables. Instead, human reward learning relies on a flexible memory system that forms complex representations of past events over multiple timescales. By iteratively replacing components of a classic RL model with ANNs, they uncovered insights into how experiences shape memory and guide decision-making.

A dataset was gathered from a reward-learning task involving 880 participants. In this task, participants repeatedly chose between four actions, each rewarded based on noisy, drifting reward magnitudes. After filtering, the study included 862 participants and 617,871 valid trials. Most participants learned the task, consistently choosing actions with higher rewards. This extensive dataset allowed RNNs and hybrid models to capture substantially more behavioral variance than basic RL models, better reflecting human decision-making patterns.

The data was initially modeled using a traditional RL model (Best RL) and a flexible Vanilla RNN. Best RL, identified as the most effective among incremental-update models, employed a reward module to update Q-values and an action module for action perseverance. However, its simplicity limited its expressivity. The Vanilla RNN, which processes actions, rewards, and latent states together, predicted choices more accurately (68.3% vs. 58.9%). Further hybrid models like RL-ANN and Context-ANN, while improving upon Best RL, still fell short of Vanilla RNN. Memory-ANN, incorporating recurrent memory representations, matched Vanilla RNN’s performance, suggesting that detailed memory use was key to participants’ learning in the task.
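
For intuition, the incremental-update family that models like Best RL belong to reduces to a delta-rule update of Q-values plus a perseverance bonus for repeating the previous action. The sketch below is a generic illustration of that baseline family, not the paper's exact parameterization; the reward means, learning rate, and other constants are assumptions.

import numpy as np

# Generic incremental-update (delta-rule) baseline; constants are illustrative, not the paper's.
rng = np.random.default_rng(0)
n_actions, alpha, beta, persev = 4, 0.3, 3.0, 0.2
Q = np.zeros(n_actions)  # running estimate of expected reward per action
last_action = None

def choose(Q, last_action):
    logits = beta * Q.copy()
    if last_action is not None:
        logits[last_action] += persev  # action perseverance: tendency to repeat the previous choice
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return rng.choice(n_actions, p=p)

for t in range(1000):
    a = choose(Q, last_action)
    reward = rng.normal(loc=[0.2, 0.5, 0.8, 0.4][a], scale=0.1)  # simplified stand-in for drifting rewards
    Q[a] += alpha * (reward - Q[a])  # delta rule: only a summary statistic of history is kept
    last_action = a

print(np.round(Q, 2))  # Q summarizes history; no episodic memory of individual trials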

The study reveals that traditional RL models, which rely solely on incrementally updated decision variables, fall short in predicting human choices compared to a novel model incorporating memory-sensitive decision-making. This new model distinguishes between decision variables that drive choices and memory variables that modulate how these decision variables are updated based on past rewards. Unlike RL models, where decision and learning variables are intertwined, this approach separates them, providing a clearer understanding of how learning influences choices. The model suggests that human learning is shaped by compressed memories of task history, reflecting both short- and long-term reward and action histories, which modulate learning independently of how they are implemented.

Memory-ANN, the proposed modular cognitive architecture, separates reward-based learning from action-based learning, supported by evidence from computational models and neuroscience. The architecture comprises a “surface” level of decision rules that process observable data and a “deep” level that handles complex, context-rich representations. This dual-layer system allows for flexible, context-driven decision-making, suggesting that human reward learning involves both simple surface-level processes and deeper memory-based mechanisms. These findings support the view that models with rich representations are needed to capture the full spectrum of human behavior, particularly in learning tasks. The insights gained here could extend to various learning tasks and broader questions in cognitive science.

Check out the Paper. All credit for this research goes to the researchers of this project.

The post Unraveling Human Reward Learning: A Hybrid Approach Combining Reinforcement Learning with Advanced Memory Architectures appeared first on MarkTechPost.

MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models (LMMs) for Integrated Capabilities

Large Multimodal Models (LMMs) are developing rapidly and proving capable of handling complicated tasks that require a blend of integrated skills, such as GUI navigation, converting images to code, and understanding video. A number of benchmarks, including MME, MMBench, SEEDBench, MMMU, and MM-Vet, have been established to comprehensively evaluate the performance of LMMs. MM-Vet, in particular, concentrates on assessing LMMs according to their capacity to integrate fundamental capabilities.

MM-Vet has established itself as one of the most popular benchmarks for evaluating LMMs, particularly through its use of open-ended vision-language questions designed to assess integrated capabilities. The benchmark specifically assesses six fundamental vision-language (VL) capabilities: numeracy (math), recognition, knowledge, spatial awareness, language generation, and optical character recognition (OCR). These skills underpin many real-world applications that depend on comprehending and combining written and visual information cohesively.

However, there is a limitation with the original MM-Vet format: it only supports questions built around a single image-text pair. This fails to capture the intricacy of real-world situations, where information is frequently presented as interleaved sequences of text and images, and where a model is tested in a more sophisticated and practical way by having to comprehend and interpret varied textual and visual information in context.

To get around this restriction, MM-Vet has been upgraded to MM-Vet v2, which adds ‘image-text sequence understanding’ as a seventh VL capability. This capability assesses a model’s ability to process sequences that interleave text and images, which is more representative of the tasks LMMs encounter in real-world scenarios. With this addition, MM-Vet v2 offers a more thorough evaluation of an LMM’s overall effectiveness and its capacity to manage intricate, interconnected tasks.

MM-Vet v2 also aims to increase the size of the evaluation set while preserving the high quality of the assessment samples. This ensures that the benchmark remains rigorous and trustworthy even as it expands to cover more difficult and varied tasks. Benchmarking multiple LMMs with MM-Vet v2 showed that Claude 3.5 Sonnet achieved the highest score (71.8), marginally outperforming GPT-4o (71.0) on the challenging tasks assessed by the benchmark. With a competitive score of 68.4, InternVL2-Llama3-76B stood out as the top open-weight model, demonstrating the strength of open-weight alternatives.

In conclusion, MM-Vet v2 is a major step forward in the evaluation of LMMs. It provides a more comprehensive and realistic assessment of their abilities by adding the capacity to comprehend and process image-text sequences, as well as increasing the evaluation set’s quality and scope.

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.

The post MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models (LMMs) for Integrated Capabilities appeared first on MarkTechPost.

Idefics3-8B-Llama3 Released: An Open Multimodal Model that Accepts Arbitrary Sequences of Image and Text Inputs and Produces Text Outputs

Machine learning models integrating text and images have become pivotal in advancing capabilities across various applications. These multimodal models are designed to process and understand combined textual and visual data, which enhances tasks such as answering questions about images, generating descriptions, or creating content based on multiple images. They are crucial for improving document comprehension and visual reasoning, especially in complex scenarios involving diverse data formats.

The core challenge in multimodal document processing involves handling and integrating large volumes of text and image data to deliver accurate and efficient results. Traditional models often struggle with latency and accuracy when managing these complex data types simultaneously, which can lead to suboptimal performance in real-time applications where quick and precise responses are essential.

Existing techniques for processing multimodal inputs generally involve separate analyses of text and images, followed by a fusion of the results. These methods can be resource-intensive and do not always yield the best outcomes due to the intricate nature of combining different data forms. Systems such as Apache Kafka and Apache Flink are used to manage the underlying data streams, but they often require extensive resources and can become unwieldy for large-scale applications.

To overcome these limitations, HuggingFace Researchers have developed Idefics3-8B-Llama3, a cutting-edge multimodal model designed for enhanced document question answering. This model integrates the SigLip vision backbone with the Llama 3.1 text backbone, supporting text and image inputs with up to 10,000 context tokens. The model, licensed under Apache 2.0, represents a significant advancement over previous versions by combining improved document QA capabilities with a robust multimodal approach.

Idefics3-8B-Llama3 utilizes a novel architecture that effectively merges textual and visual information to generate accurate text outputs. The model’s 8.5 billion parameters enable it to handle diverse inputs, including complex documents that feature text and images. The enhancements include better handling of visual tokens by encoding images into 169 visual tokens and incorporating extended fine-tuning datasets like Docmatix. This approach aims to refine document understanding and improve overall performance in multimodal tasks.
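
As a rough usage sketch, assuming the checkpoint id HuggingFaceM4/Idefics3-8B-Llama3 and a transformers release recent enough to load it through AutoModelForVision2Seq, querying the model about a document image could look like the following; the image file and question are illustrative.

# Sketch of querying Idefics3 through Hugging Face transformers; checkpoint id, image file,
# and question are assumptions for illustration.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceM4/Idefics3-8B-Llama3"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

image = Image.open("invoice_page.png")  # hypothetical document image
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What is the total amount due on this invoice?"},
    ]},
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])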

Performance evaluations show that Idefics3-8B-Llama3 marks a substantial improvement over its predecessors. The model achieves a remarkable 87.7% accuracy in DocVQA and a 55.9% score in MMStar, compared to Idefics2’s 49.5% in DocVQA and 45.2% in MMMU. These results indicate significant enhancements in handling document-based queries and visual reasoning. The new model’s ability to manage up to 10,000 tokens of context and its integration with advanced technologies contribute to these performance gains.

In conclusion, Idefics3-8B-Llama3 represents a major advancement in multimodal document processing. By addressing previous limitations and delivering improved accuracy and efficiency, this model provides a valuable tool for applications requiring sophisticated text and image data integration. The document QA and visual reasoning improvements underscore its potential for many use cases, making it a significant step forward in the field.

Check out the Model. All credit for this research goes to the researchers of this project.

The post Idefics3-8B-Llama3 Released: An Open Multimodal Model that Accepts Arbitrary Sequences of Image and Text Inputs and Produces Text Outputs appeared first on MarkTechPost.

This AI Paper from OpenAI Introduces the GPT-4o System Card: A Framework for Safe and Responsible AI Development

Multimodal models are designed to make human-computer interaction more intuitive and natural, enabling machines to understand and respond to human inputs in ways that closely mirror human communication. This progress is crucial for advancing applications across various industries, including healthcare, education, and entertainment.

One of the main challenges in AI development is ensuring these powerful models’ safe and ethical use. As AI systems become more sophisticated, the risks associated with their misuse—such as spreading misinformation, reinforcing biases, and generating harmful content—increase. It is vital to address these issues to ensure AI advancements benefit society rather than worsen existing social problems. Balancing AI capabilities with necessary safeguards is essential to prevent unintended consequences.

Existing methods to mitigate these risks include curated datasets, safety filters, and moderation tools designed to detect and block harmful content. However, these methods often fall short when dealing with the complexities of multimodal AI systems. For instance, models trained on text data may struggle to interpret and generate accurate responses for audio or visual inputs. Furthermore, these approaches may only partially account for the diverse range of human interactions, such as different languages, accents, and cultural nuances, highlighting the need for more advanced solutions to ensure the safe deployment of AI technologies.

To address these challenges, OpenAI introduced the GPT-4o System Card, offering a comprehensive overview of GPT-4o’s capabilities, limitations, and safety evaluations. This document outlines the preparedness framework for assessing the model’s safety, including evaluations of its speech-to-speech capabilities, text and image processing, and potential societal impacts. The System Card marks a step forward in transparency and safety for AI models, providing detailed insights into the safeguards and evaluations that underpin the deployment of GPT-4o. It serves as a guide to how GPT-4o operates and the measures taken to ensure alignment with ethical standards and safety protocols.

The GPT-4o System Card details the model’s methodology, which employs an autoregressive approach to generate outputs based on a sequence of inputs, including text, audio, and images. The model was trained on a diverse dataset comprising public web data, proprietary data from partnerships, and multimodal data such as images and videos. This extensive training process enabled GPT-4o to effectively interpret and generate data across various formats, making it particularly adept at handling complex inputs. Additionally, OpenAI implemented post-training safety filters and moderation tools to detect and block harmful content, ensuring the model’s outputs are safe and aligned with human preferences. The System Card emphasizes the importance of these safety measures, particularly in managing sensitive content and preventing misuse.

The performance of GPT-4o, as highlighted in the System Card, is remarkable for its speed and accuracy in processing multimodal data. The model can respond to audio inputs with human-like speed, averaging response times between 232 to 320 milliseconds, comparable to human conversation. GPT-4o also significantly improves non-English language processing, surpassing previous models in tasks involving text generation and code understanding. For example, the model achieved a 19% completion rate for high-school-level tasks. However, it still faced challenges in more advanced scenarios, such as collegiate and professional-level tasks, where completion rates were lower. These results highlight the model’s potential for practical applications while also indicating areas for further improvement.

The System Card also provides detailed evaluations of GPT-4o’s safety features, including its ability to refuse requests for generating unauthorized or harmful content. The model was trained to reject requests for copyrighted material, including audio and music, and uses classifiers to detect and block inappropriate outputs. GPT-4o successfully avoided generating harmful content during testing in over 95% of evaluated cases. Furthermore, the model was assessed for its ability to handle diverse user voices, including different accents, without significant variation in performance. This consistency is crucial for ensuring the model can be deployed in various real-world settings without introducing biases or disparities in service quality.

Overall, the introduction of the GPT-4o System Card represents a significant advancement in the transparency and safety of AI models. The research conducted by OpenAI underscores the importance of continuous evaluation and improvement to mitigate risks while maximizing AI’s benefits. The System Card provides a comprehensive framework for understanding and assessing GPT-4o’s capabilities, offering a more robust solution for the safe deployment of advanced AI systems. This development is a promising step toward achieving powerful and responsible AI, ensuring its benefits are widely accessible without compromising safety or ethical standards.

Check out the Paper and Details. All credit for this research goes to the researchers of this project.

The post This AI Paper from OpenAI Introduces the GPT-4o System Card: A Framework for Safe and Responsible AI Development appeared first on MarkTechPost.

How Deltek uses Amazon Bedrock for question and answering on government solicitation documents

This post is co-written by Kevin Plexico and Shakun Vohra from Deltek.
Question and answering (Q&A) using documents is a commonly used application in various use cases like customer support chatbots, legal research assistants, and healthcare advisors. Retrieval Augmented Generation (RAG) has emerged as a leading method for using the power of large language models (LLMs) to interact with documents in natural language.
This post provides an overview of a custom solution developed by the AWS Generative AI Innovation Center (GenAIIC) for Deltek, a globally recognized standard for project-based businesses in both government contracting and professional services. Deltek serves over 30,000 clients with industry-specific software and information solutions.
In this collaboration, the AWS GenAIIC team created a RAG-based solution for Deltek to enable Q&A on single and multiple government solicitation documents. The solution uses AWS services including Amazon Textract, Amazon OpenSearch Service, and Amazon Bedrock. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) and LLMs from leading artificial intelligence (AI) companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.
Deltek is continuously working on enhancing this solution to better align it with their specific requirements, such as supporting file formats beyond PDF and implementing more cost-effective approaches for their data ingestion pipeline.
What is RAG?
RAG is a process that optimizes the output of LLMs by allowing them to reference authoritative knowledge bases outside of their training data sources before generating a response. This approach addresses some of the challenges associated with LLMs, such as presenting false, outdated, or generic information, or creating inaccurate responses due to terminology confusion. RAG enables LLMs to generate more relevant, accurate, and contextual responses by cross-referencing an organization’s internal knowledge base or specific domains, without the need to retrain the model. It provides organizations with greater control over the generated text output and offers users insights into how the LLM generates the response, making it a cost-effective approach to improve the capabilities of LLMs in various contexts.
The main challenge
Applying RAG for Q&A on a single document is straightforward, but applying the same across multiple related documents poses some unique challenges. For example, when using question answering on documents that evolve over time, it is essential to consider the chronological sequence of the documents if the question is about a concept that has transformed over time. Not considering the order could result in providing an answer that was accurate at a past point but is now outdated based on more recent information across the collection of temporally aligned documents. Properly handling temporal aspects is a key challenge when extending question answering from single documents to sets of interlinked documents that progress over the course of time.
Solution overview
As an example use case, we describe Q&A on two temporally related documents: a long draft request-for-proposal (RFP) document, and a related subsequent government response to a request-for-information (RFI response), providing additional and revised information.
The solution develops a RAG approach in two steps.
The first step is data ingestion, as shown in the following diagram. This includes a one-time processing of PDF documents. The application component here is a user interface with minor processing such as splitting text and calling the services in the background. The steps are as follows:

The user uploads documents to the application.
The application uses Amazon Textract to get the text and tables from the input documents.
The text embedding model processes the text chunks and generates embedding vectors for each text chunk.
The embedding representations of text chunks along with related metadata are indexed in OpenSearch Service.

The second step is Q&A, as shown in the following diagram. In this step, the user asks a question about the ingested documents and expects a response in natural language. The application component here is a user interface with minor processing such as calling different services in the background. The steps are as follows:

The user asks a question about the documents.
The application retrieves an embedding representation of the input question.
The application uses the question’s embedding, which maps the question from text to a space of numeric representations, to perform a semantic search on OpenSearch Service and retrieve the relevant text chunks from the documents (also called the context). The retrieved context and the query are then passed to Amazon Bedrock to generate a response.
The question and context are combined and fed as a prompt to the LLM. The language model generates a natural language response to the user’s question.

We used Amazon Textract in our solution, which can convert PDFs, PNGs, JPEGs, and TIFFs into machine-readable text. It also formats complex structures like tables for easier analysis. In the following sections, we provide an example to demonstrate Amazon Textract’s capabilities.
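As a rough sketch of that extraction step (the file name is illustrative), a single call to Textract’s AnalyzeDocument API returns both text lines and table blocks:

# Minimal sketch of calling Amazon Textract for text and tables; the file name is illustrative.
import boto3

textract = boto3.client("textract")

with open("draft_rfp_page.png", "rb") as f:
    document_bytes = f.read()

response = textract.analyze_document(
    Document={"Bytes": document_bytes},
    FeatureTypes=["TABLES"],  # also extract table structure, not just lines of text
)

lines = [b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE"]
tables = [b for b in response["Blocks"] if b["BlockType"] == "TABLE"]
print(f"{len(lines)} text lines, {len(tables)} tables detected")
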
OpenSearch is an open source and distributed search and analytics suite derived from Elasticsearch. It uses a vector database structure to efficiently store and query large volumes of data. OpenSearch Service currently has tens of thousands of active customers with hundreds of thousands of clusters under management processing hundreds of trillions of requests per month. We used OpenSearch Service and its underlying vector database to do the following:

Index documents into the vector space, allowing related items to be located in proximity for improved relevancy
Quickly retrieve related document chunks at the question answering step using approximate nearest neighbor search across vectors

The vector database inside OpenSearch Service enabled efficient storage and fast retrieval of related data chunks to power our question answering system. By modeling documents as vectors, we could find relevant passages even without explicit keyword matches.
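For illustration, a k-NN enabled index along the following lines could back such a setup; the endpoint and index name are placeholders, and the mapping mirrors the field names used later in this post:

# Sketch of creating a k-NN enabled index in OpenSearch Service; endpoint, auth, and index name are placeholders.
from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],
    use_ssl=True,  # add AWS authentication as appropriate for your domain
)

index_name = "rfp-documents"
index_settings = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "embedding_vector": {"type": "knn_vector", "dimension": 1536},  # matches the Titan embedding size
            "text_chunk": {"type": "text"},
            "document_name": {"type": "text"},
            "section_name": {"type": "text"},
            "release_date": {"type": "text"},  # stored as text here, per the ingestion description below
        }
    },
}
client.indices.create(index=index_name, body=index_settings)
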
Text embedding models are machine learning (ML) models that map words or phrases from text to dense vector representations. Text embeddings are commonly used in information retrieval systems like RAG for the following purposes:

Document embedding – Embedding models are used to encode the document content and map them to an embedding space. It is common to first split a document into smaller chunks such as paragraphs, sections, or fixed size chunks.
Query embedding – User queries are embedded into vectors so they can be matched against document chunks by performing semantic search.

For this post, we used the Amazon Titan model, Amazon Titan Embeddings G1 – Text v1.2, which intakes up to 8,000 tokens and outputs a numerical vector of 1,536 dimensions. The model is available through Amazon Bedrock.
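A call along the following lines retrieves an embedding through the Bedrock runtime API; the model id shown is the identifier commonly used for Titan Embeddings G1 – Text, and the Region and sample text are illustrative:

# Sketch of retrieving a Titan text embedding through the Amazon Bedrock runtime API.
# The model id is the commonly used identifier for Titan Embeddings G1 - Text; verify it in your account.
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

def embed_text(text: str) -> list:
    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
        contentType="application/json",
        accept="application/json",
    )
    payload = json.loads(response["body"].read())
    return payload["embedding"]  # 1,536-dimensional vector

vector = embed_text("Section 3.2 describes the evaluation criteria for project sizes.")
print(len(vector))
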
Amazon Bedrock provides ready-to-use FMs from top AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon. It offers a single interface to access these models and build generative AI applications while maintaining privacy and security. We used Anthropic Claude v2 on Amazon Bedrock to generate natural language answers given a question and a context.
In the following sections, we look at the two stages of the solution in more detail.
Data ingestion
First, the draft RFP and RFI response documents are processed to be used at the Q&A time. Data ingestion includes the following steps:

Documents are passed to Amazon Textract to be converted into text.
To better enable our language model to answer questions about tables, we created a parser that converts tables from the Amazon Textract output into CSV format. Transforming tables into CSV improves the model’s comprehension. For instance, the following figures show part of an RFI response document in PDF format, followed by its corresponding extracted text. In the extracted text, the table has been converted to CSV format and sits among the rest of the text.
For long documents, the extracted text may exceed the LLM’s input size limit. In these cases, we can divide the text into smaller, overlapping chunks. The chunk sizes and overlap proportions may vary depending on the use case. We apply section-aware chunking (chunking is performed independently on each document section), which we discuss in our example use case later in this post; a minimal chunking sketch follows the index body snippet below.
Some classes of documents may follow a standard layout or format. This structure can be used to optimize data ingestion. For example, RFP documents tend to have a certain layout with defined sections. Using the layout, each document section can be processed independently. Also, if a table of contents exists but is not relevant, it can potentially be removed. We provide a demonstration of detecting and using document structure later in this post.
The embedding vector for each text chunk is retrieved from an embedding model.
At the last step, the embedding vectors are indexed into an OpenSearch Service database. In addition to the embedding vector, the text chunk and document metadata such as the document name, document section name, or document release date are also added to the index as text fields. The document release date is useful metadata when documents are related chronologically, so that the LLM can identify the most up-to-date information. The following code snippet shows the index body:

index_body = {
    "embedding_vector": <embedding vector of a text chunk>,
    "text_chunk": <text chunk>,
    "document_name": <document name>,
    "section_name": <document section name>,
    "release_date": <document release date>,
    # more metadata can be added
}
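
Returning to the chunking step mentioned earlier, a minimal sketch of fixed-size chunking with overlap might look like the following; a real pipeline would count tokens with the embedding model's tokenizer rather than splitting on whitespace, and the section names shown are hypothetical.

# Minimal sketch of fixed-size chunking with overlap; token counting here is word-based for simplicity.
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list:
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunk = words[start:start + chunk_size]
        if chunk:
            chunks.append(" ".join(chunk))
        if start + chunk_size >= len(words):
            break
    return chunks

# Section-aware chunking: chunk each document section independently so no chunk
# straddles a section boundary. Section names and contents are hypothetical.
sections = {"Scope of Work": "...", "Evaluation Criteria": "..."}
chunks = {name: chunk_text(body) for name, body in sections.items()}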

Q&A
In the Q&A phase, users can submit a natural language question about the draft RFP and RFI response documents ingested in the previous step. First, semantic search is used to retrieve text chunks relevant to the user’s question. Then, the question is augmented with the retrieved context to create a prompt. Finally, the prompt is sent to Amazon Bedrock for an LLM to generate a natural language response. The detailed steps are as follows:

An embedding representation of the input question is retrieved from the Amazon Titan embedding model on Amazon Bedrock.
The question’s embedding vector is used to perform semantic search on OpenSearch Service and find the top K relevant text chunks. The following is an example of a search body passed to OpenSearch Service. For more details see the OpenSearch documentation on structuring a search query.

search_body = {
    "size": top_K,
    "query": {
        "script_score": {
            "query": {
                "match_all": {},  # skip full text search
            },
            "script": {
                "lang": "knn",
                "source": "knn_score",
                "params": {
                    "field": "embedding_vector",  # must match the vector field name used at indexing time
                    "query_value": question_embedding,
                    "space_type": "cosinesimil",
                },
            },
        }
    },
}

Any retrieved metadata, such as section name or document release date, is used to enrich the text chunks and provide more information to the LLM, such as the following:

def opensearch_result_to_context(os_res: dict) -> str:
    """
    Convert OpenSearch results to a context string.
    Args:
        os_res (dict): Amazon OpenSearch Service results
    Returns:
        context (str): Context to be included in the LLM's prompt
    """
    data = os_res["hits"]["hits"]
    context = []
    for item in data:
        text = item["_source"]["text_chunk"]
        doc_name = item["_source"]["document_name"]
        section_name = item["_source"]["section_name"]
        release_date = item["_source"]["release_date"]
        context.append(
            f"<<Context>>: [Document name: {doc_name}, Section name: {section_name}, Release date: {release_date}] {text}"
        )
    context = "\n\n------\n\n".join(context)  # separator between retrieved chunks
    return context

The input question is combined with retrieved context to create a prompt. In some cases, depending on the complexity or specificity of the question, an additional chain-of-thought (CoT) prompt may need to be added to the initial prompt in order to provide further clarification and guidance to the LLM. The CoT prompt is designed to walk the LLM through the logical steps of reasoning and thinking that are required to properly understand the question and formulate a response. It lays out a type of internal monologue or cognitive path for the LLM to follow in order to comprehend the key information within the question, determine what kind of response is needed, and construct that response in an appropriate and accurate way. We use the following CoT prompt for this use case:

“””
Context below includes a few paragraphs from draft RFP and RFI response documents:

Context: {context}

Question: {question}

Think step by step:

1- Find all the paragraphs in the context that are relevant to the question.
2- Sort the paragraphs by release date.
3- Use the paragraphs to answer the question.

Note: Pay attention to the updated information based on the release dates.
“””

The prompt is passed to an LLM on Amazon Bedrock to generate a response in natural language. We use the following inference configuration for the Anthropic Claude V2 model on Amazon Bedrock. The temperature parameter is usually set to zero for reproducibility and to reduce the likelihood of LLM hallucination. For regular RAG applications, top_k and top_p are usually set to 250 and 1, respectively. Set max_tokens_to_sample to the maximum number of tokens expected to be generated (1 token is approximately 3/4 of a word). See Inference parameters for more details.

{
    "temperature": 0,
    "top_k": 250,
    "top_p": 1,
    "max_tokens_to_sample": 300,
    "stop_sequences": ["\n\nHuman:\n\n"]
}
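
Putting the pieces together, a sketch of the generation call could look like the following; the prompt placeholder stands in for the filled-in CoT template and retrieved context described above, and anthropic.claude-v2 is the Bedrock model identifier commonly used for Claude v2.

# Sketch of generating the answer with Anthropic Claude v2 on Amazon Bedrock.
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Placeholder: in the real pipeline this is the CoT prompt template filled in with
# the retrieved context and the user's question, as described above.
prompt = "..."

body = json.dumps({
    "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",  # Claude v2 text-completion format
    "temperature": 0,
    "top_k": 250,
    "top_p": 1,
    "max_tokens_to_sample": 300,
    "stop_sequences": ["\n\nHuman:\n\n"],
})
response = bedrock_runtime.invoke_model(modelId="anthropic.claude-v2", body=body)
answer = json.loads(response["body"].read())["completion"]
print(answer)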

Example use case
As a demonstration, we describe an example of Q&A on two related documents: a draft RFP document in PDF format with 167 pages, and an RFI response document in PDF format with 6 pages released later, which includes additional information and updates to the draft RFP.
The following is an example question asking if the project size requirements have changed, given the draft RFP and RFI response documents:
Have the original scoring evaluations changed? If yes, what are the new project sizes?
The following figure shows the relevant sections of the draft RFP document that contain the answers.

The following figure shows the relevant sections of the RFI response document that contain the answers.

For the LLM to generate the correct response, the retrieved context from OpenSearch Service should contain the tables shown in the preceding figures, and the LLM should be able to infer the order of the retrieved contents from metadata, such as release dates, and generate a readable response in natural language.
The following are the data ingestion steps:

The draft RFP and RFI response documents are passed to Amazon Textract to extract the text and tables. Additionally, we used regular expressions to identify document sections and the table of contents (see the following figures, respectively). The table of contents can be removed for this use case because it doesn’t contain any relevant information.
We split each document section independently into smaller chunks with some overlap. For this use case, we used a chunk size of 500 tokens with an overlap of 100 tokens (1 token is approximately 3/4 of a word). We used a BPE tokenizer, where each token corresponds to about 4 bytes.
An embedding representation of each text chunk is obtained using the Amazon Titan Embeddings G1 – Text v1.2 model on Amazon Bedrock.
Each text chunk is stored into an OpenSearch Service index along with metadata such as section name and document release date.

The Q&A steps are as follows:

The input question is first transformed to a numeric vector using the embedding model. The vector representation is then used for semantic search and retrieval of relevant context in the next step.
The top K relevant text chunks and their metadata are retrieved from OpenSearch Service.
The opensearch_result_to_context function and the prompt template (defined earlier) are used to create the prompt given the input question and retrieved context.
The prompt is sent to the LLM on Amazon Bedrock to generate a response in natural language. The following is the response generated by Anthropic Claude v2, which matched the information presented in the draft RFP and RFI response documents. The question was “Have the original scoring evaluations changed? If yes, what are the new project sizes?” Using CoT prompting, the model can correctly answer the question.

Key features
The solution contains the following key features:

Section-aware chunking – Identify document sections and split each section independently into smaller chunks with some overlaps to optimize data ingestion.
Table to CSV transformation – Convert tables extracted by Amazon Textract into CSV format to improve the language model’s ability to comprehend and answer questions about tables.
Adding metadata to index – Store metadata such as section name and document release date along with text chunks in the OpenSearch Service index. This allowed the language model to identify the most up-to-date or relevant information.
CoT prompt – Design a chain-of-thought prompt to provide further clarification and guidance to the language model on the logical steps needed to properly understand the question and formulate an accurate response.

These contributions helped improve the accuracy and capabilities of the solution for answering questions about documents. In fact, based on Deltek’s subject matter experts’ evaluations of LLM-generated responses, the solution achieved a 96% overall accuracy rate.
Conclusion
This post outlined an application of generative AI for question answering across multiple government solicitation documents. The solution discussed was a simplified presentation of a pipeline developed by the AWS GenAIIC team in collaboration with Deltek. We described an approach to enable Q&A on lengthy documents published separately over time. Using Amazon Bedrock and OpenSearch Service, this RAG architecture can scale for enterprise-level document volumes. Additionally, a prompt template was shared that uses CoT logic to guide the LLM in producing accurate responses to user questions. Although this solution is simplified, this post aimed to provide a high-level overview of a real-world generative AI solution for streamlining review of complex proposal documents and their iterations.
Deltek is actively refining and optimizing this solution to ensure it meets their unique needs. This includes expanding support for file formats other than PDF, as well as adopting more cost-efficient strategies for their data ingestion pipeline.
Learn more about prompt engineering and generative AI-powered Q&A in the Amazon Bedrock Workshop. For technical support or to contact AWS generative AI specialists, visit the GenAIIC webpage.
Resources
To learn more about Amazon Bedrock, see the following resources:

Amazon Bedrock Workshop
Amazon Bedrock User Guide

To learn more about OpenSearch Service, see the following resources:

Amazon OpenSearch Service Documentation
Amazon OpenSearch Workshop

See the following links for RAG resources on AWS:

Retrieval augmented generation (RAG)
Knowledge Bases for Amazon Bedrock

About the Authors
Kevin Plexico is Senior Vice President of Information Solutions at Deltek, where he oversees research, analysis, and specification creation for clients in the Government Contracting and AEC industries. He leads the delivery of GovWin IQ, providing essential government market intelligence to over 5,000 clients, and manages the industry’s largest team of analysts in this sector. Kevin also heads Deltek’s Specification Solutions products, producing premier construction specification content including MasterSpec® for the AIA and SpecText.
Shakun Vohra is a distinguished technology leader with over 20 years of expertise in Software Engineering, AI/ML, Business Transformation, and Data Optimization. At Deltek, he has driven significant growth, leading diverse, high-performing teams across multiple continents. Shakun excels in aligning technology strategies with corporate goals, collaborating with executives to shape organizational direction. Renowned for his strategic vision and mentorship, he has consistently fostered the development of next-generation leaders and transformative technological solutions.
Amin Tajgardoon is an Applied Scientist at the AWS Generative AI Innovation Center. He has an extensive background in computer science and machine learning. In particular, Amin’s focus has been on deep learning and forecasting, prediction explanation methods, model drift detection, probabilistic generative models, and applications of AI in the healthcare domain.
Anila Joshi has more than a decade of experience building AI solutions. As an Applied Science Manager at AWS Generative AI Innovation Center, Anila pioneers innovative applications of AI that push the boundaries of possibility and accelerate the adoption of AWS services with customers by helping customers ideate, identify, and implement secure generative AI solutions.
Yash Shah and his team of scientists, specialists and engineers at AWS Generative AI Innovation Center, work with some of AWS most strategic customers on helping them realize art of the possible with Generative AI by driving business value. Yash has been with Amazon for more than 7.5 years now and has worked with customers across healthcare, sports, manufacturing and software across multiple geographic regions.
Jordan Cook is an accomplished AWS Sr. Account Manager with nearly two decades of experience in the technology industry, specializing in sales and data center strategy. Jordan leverages his extensive knowledge of Amazon Web Services and deep understanding of cloud computing to provide tailored solutions that enable businesses to optimize their cloud infrastructure, enhance operational efficiency, and drive innovation.