This AI Paper Proposes LLM-Grounder: A Zero-Shot, Open-Vocabulary Approach to 3D Visual Grounding for Next-Gen Household Robots

Understanding their surroundings in three dimensions (3D vision) is essential for domestic robots to perform tasks like navigation, manipulation, and answering queries. At the same time, current methods often struggle with complicated language queries or rely excessively on large amounts of labeled data.

ChatGPT and GPT-4 are just two examples of large language models (LLMs) with impressive language understanding and reasoning abilities, such as planning and tool use. By breaking large problems down into smaller ones and learning when, what, and how to employ a tool to finish sub-tasks, LLMs can be deployed as agents to solve complicated problems. 3D visual grounding with complex natural language queries requires parsing the compositional language into smaller semantic constituents, interacting with tools and the environment to collect feedback, and reasoning with spatial and commonsense knowledge to iteratively ground the language to the target object.

Nikhil Madaan and researchers from the University of Michigan and New York University present LLM-Grounder, a novel zero-shot LLM-agent-based 3D visual grounding process that uses an open vocabulary. While a visual grounder excels at grounding basic noun phrases, the team hypothesizes that an LLM can help mitigate the “bag-of-words” limitation of a CLIP-based visual grounder by taking on the challenging language deconstruction, spatial, and commonsense reasoning tasks itself.

LLM-Grounder relies on an LLM to coordinate the grounding procedure. After receiving a natural language query, the LLM breaks it down into its semantic constituents, such as the type of object sought, its properties (including color, shape, and material), landmarks, and spatial relationships. To locate each concept in the scene, these sub-queries are sent to a visual grounder tool supported by OpenScene or LERF, both of which are CLIP-based open-vocabulary 3D visual grounding approaches. The visual grounder suggests a few bounding boxes based on where the most promising candidates for a concept are located in the scene. The visual grounder tools compute spatial information, such as object volumes and distances to landmarks, and feed that data back to the LLM agent, allowing the latter to make a more well-rounded assessment of the situation in terms of spatial relations and common sense and ultimately choose a candidate that best matches all criteria in the original query. The LLM agent continues to cycle through these steps until it reaches a decision. The researchers take a step beyond existing neural-symbolic methods by using the surrounding context in their analysis.
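At a high level, this agent loop can be sketched as follows; the function and key names are illustrative placeholders rather than the paper's actual interface:

def llm_grounder(query, scene, llm, visual_grounder, spatial_stats, max_rounds=5):
    # Illustrative agent loop: the LLM decomposes the query, a CLIP-based visual
    # grounder proposes candidate boxes for each noun phrase, and the LLM reasons
    # over the returned spatial feedback until it commits to a final box.
    plan = llm({'task': 'decompose', 'query': query})

    observations = []
    for _ in range(max_rounds):
        # Ground each noun-phrase sub-query with the visual grounder tool
        # (for example, OpenScene or LERF) and collect spatial feedback such as
        # candidate box volumes and distances to landmarks.
        for sub_query in plan['sub_queries']:
            boxes = visual_grounder(scene, sub_query)
            observations.append((sub_query, boxes, spatial_stats(scene, boxes)))

        # The LLM reasons over the feedback with spatial and commonsense knowledge
        # and either selects a final candidate or refines its plan and loops again.
        decision = llm({'task': 'reason', 'query': query, 'observations': observations})
        if decision.get('final'):
            return decision['bounding_box']
        plan = decision['refined_plan']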

The team highlights that the method doesn’t require labeled data for training. Given the semantic variety of 3D settings and the scarcity of 3D-text labeled data, its open-vocabulary and zero-shot generalization to novel 3D scenes and arbitrary text queries is an attractive feature. The researchers evaluate LLM-Grounder on the ScanRefer benchmark, which assesses 3D vision-language grounding by the ability to interpret compositional visual referential expressions. The results show that the method achieves state-of-the-art zero-shot grounding accuracy on ScanRefer without any labeled data and enhances the grounding capacity of open-vocabulary approaches like OpenScene and LERF. Their ablation study shows that the LLM improves grounding accuracy more as the language query grows more complex. These findings demonstrate the effectiveness of LLM-Grounder for 3D vision-language problems, making it well suited to robotics applications where awareness of context and the ability to quickly and accurately react to changing queries are crucial.

Check out the Paper and Demo. All Credit For This Research Goes To the Researchers on This Project.


Build a crop segmentation machine learning model with Planet data and Amazon SageMaker geospatial capabilities

This guest post is co-written by Lydia Lihui Zhang, Business Development Specialist, and Mansi Shah, Software Engineer/Data Scientist, at Planet Labs. The analysis that inspired this post was originally written by Jennifer Reiber Kyle.
Amazon SageMaker geospatial capabilities combined with Planet’s satellite data can be used for crop segmentation, and there are numerous applications and potential benefits of this analysis to the fields of agriculture and sustainability. In late 2023, Planet announced a partnership with AWS to make its geospatial data available through Amazon SageMaker.
Crop segmentation is the process of splitting up a satellite image into regions of pixels, or segments, that have similar crop characteristics. In this post, we illustrate how to use a segmentation machine learning (ML) model to identify crop and non-crop regions in an image.
Identifying crop regions is a core step towards gaining agricultural insights, and the combination of rich geospatial data and ML can lead to insights that drive decisions and actions. For example:

Making data-driven farming decisions – By gaining better spatial understanding of the crops, farmers and other agricultural stakeholders can optimize the use of resources, from water to fertilizer to other chemicals across the season. This sets the foundation for reducing waste, improving sustainable farming practices wherever possible, and increasing productivity while minimizing environmental impact.
Identifying climate-related stresses and trends – As climate change continues to affect global temperature and rainfall patterns, crop segmentation can be used to identify areas that are vulnerable to climate-related stress for climate adaptation strategies. For example, satellite imagery archives can be used to track changes in a crop growing region over time. These could be the physical changes in size and distribution of croplands. They could also be the changes in soil moisture, soil temperature, and biomass, derived from the different spectral index of satellite data, for deeper crop health analysis.
Assessing and mitigating damage – Finally, crop segmentation can be used to quickly and accurately identify areas of crop damage in the event of a natural disaster, which can help prioritize relief efforts. For example, after a flood, high-cadence satellite images can be used to identify areas where crops have been submerged or destroyed, allowing relief organizations to assist affected farmers more quickly.

In this analysis, we use a K-nearest neighbors (KNN) model to conduct crop segmentation, and we compare these results with ground truth imagery on an agricultural region. Our results reveal that the classification from the KNN model is more accurately representative of the state of the current crop field in 2017 than the ground truth classification data from 2015. These results are a testament to the power of Planet’s high-cadence geospatial imagery. Agricultural fields change often, sometimes multiple times a season, and having high-frequency satellite imagery available to observe and analyze this land can provide immense value to our understanding of agricultural land and quickly-changing environments.
Planet and AWS’s partnership on geospatial ML
SageMaker geospatial capabilities empower data scientists and ML engineers to build, train, and deploy models using geospatial data. SageMaker geospatial capabilities allow you to efficiently transform or enrich large-scale geospatial datasets, accelerate model building with pre-trained ML models, and explore model predictions and geospatial data on an interactive map using 3D-accelerated graphics and built-in visualization tools. With SageMaker geospatial capabilities, you can process large datasets of satellite imagery and other geospatial data to create accurate ML models for various applications, including crop segmentation, which we discuss in this post.
Planet Labs PBC is a leading Earth-imaging company that uses its large fleet of satellites to capture imagery of the Earth’s surface on a daily basis. Planet’s data is therefore a valuable resource for geospatial ML. Its high-resolution satellite imagery can be used to identify various crop characteristics and their health over time, anywhere on Earth.
The partnership between Planet and SageMaker enables customers to easily access and analyze Planet’s high-frequency satellite data using AWS’s powerful ML tools. Data scientists can bring their own data or conveniently find and subscribe to Planet’s data without switching environments.
Crop segmentation in an Amazon SageMaker Studio notebook with a geospatial image
In this example geospatial ML workflow, we look at how to bring Planet’s data along with the ground truth data source into SageMaker, and how to train, infer, and deploy a crop segmentation model with a KNN classifier. Finally, we assess the accuracy of our results and compare this to our ground truth classification.
The KNN classifier was trained in an Amazon SageMaker Studio notebook with a geospatial image, which provides a flexible and extensible notebook kernel for working with geospatial data.
The Amazon SageMaker Studio notebook with geospatial image comes pre-installed with commonly used geospatial libraries such as GDAL, Fiona, GeoPandas, Shapely, and Rasterio, which allow the visualization and processing of geospatial data directly within a Python notebook environment. Common ML libraries such as OpenCV and scikit-learn, also installed in the geospatial kernel, are used to perform the crop segmentation with KNN classification.
Data selection
The agricultural area we zoom into is located in usually sunny Sacramento County, California.
Why Sacramento? The area and time selection for this type of problem is primarily defined by the availability of ground truth data, and such data, in the form of crop type and boundary data, is not easy to come by. The 2015 Sacramento County Land Use DWR Survey dataset is a publicly available dataset covering Sacramento County in that year and provides hand-adjusted boundaries.
The primary satellite imagery we use is Planet’s 4-band PSScene product, which contains the Blue, Green, Red, and Near-IR bands and is radiometrically corrected to at-sensor radiance. The coefficients for converting to at-sensor reflectance are provided in the scene metadata, which further improves the consistency between images taken at different times.
Planet’s Dove satellites that produced this imagery were launched on February 14, 2017 (news release), so they didn’t image Sacramento County back in 2015. However, they have been capturing daily imagery of the area since launch. In this example, we settle for the imperfect 2-year gap between the ground truth data and the satellite imagery, although lower-resolution Landsat 8 imagery could have been used as a bridge between 2015 and 2017.
Access Planet data
To help users get accurate and actionable data faster, Planet has also developed the Planet Software Development Kit (SDK) for Python. This is a powerful tool for data scientists and developers who want to work with satellite imagery and other geospatial data. With this SDK, you can search and access Planet’s vast collection of high-resolution satellite imagery, as well as data from other sources like OpenStreetMap. The SDK provides a Python client to Planet’s APIs, as well as a no-code command line interface (CLI) solution, making it easy to incorporate satellite imagery and geospatial data into Python workflows. This example uses the Python client to identify and download imagery needed for the analysis.
You can install the Planet Python client in the SageMaker Studio notebook with geospatial image using a simple command:

%pip install planet

You can use the client to query relevant satellite imagery and retrieve a list of available results based on the area of interest, time range, and other search criteria. In the following example, we start by asking how many PlanetScope scenes (Planet’s daily imagery) cover the area of interest (AOI) that we defined earlier from the ground truth data in Sacramento, given a time range between June 1 and October 1, 2017, and a desired maximum cloud coverage of 10%:

# create a request using the SDK from the search specifications of the data
# imports assumed for the Planet SDK for Python (v2)
from datetime import datetime
from planet import Session, data_filter

item_type = ['PSScene']

geom_filter_train = data_filter.geometry_filter(aoi_train)
date_range_filter = data_filter.date_range_filter("acquired", gt=datetime(month=6, day=1, year=2017), lt=datetime(month=10, day=1, year=2017))
cloud_cover_filter = data_filter.range_filter('cloud_cover', lt=0.10)

combined_filter_train = data_filter.and_filter([geom_filter_train, date_range_filter, cloud_cover_filter])

# Run a quick search for our TRAIN data
async with Session() as sess:
    cl = sess.client('data')
    results = cl.search(name='temp_search_train', search_filter=combined_filter_train, item_types=item_type)
    train_result_list = [i async for i in results]

print("Number of train scene results: ", len(train_result_list))

The returned results show the number of matching scenes overlapping with our area of interest. It also contains each scene’s metadata, its image ID, and a preview image reference.
After a particular scene has been selected, with its scene ID, item type, and product bundle specified (see the reference documentation), you can use the following code to download the image and its metadata:

# imports assumed for the Planet SDK for Python (v2)
from planet import Session, order_request, reporting

train_scene_id = '20170601_180425_0f35'
item_type = 'PSScene'
bundle_type = 'analytic_sr_udm2'

# define the order request
products = [order_request.product([train_scene_id], bundle_type, item_type)]
request = order_request.build_request('train_dataset', products=products)

# download the training data
async with Session() as sess:
    cl = sess.client('orders')
    # use "reporting" to manage polling for order status
    with reporting.StateBar(state='creating') as bar:
        # perform the order with the prior created order request
        order = await cl.create_order(request)
        bar.update(state='created', order_id=order['id'])

        # wait via polling until the order is processed
        await cl.wait(order['id'], callback=bar.update_state)

    # download the actual asset
    await cl.download_order(order_id=order['id'], directory=download_directory, progress_bar=True, overwrite=True)

This code downloads the corresponding satellite image to the Amazon Elastic File System (Amazon EFS) volume for SageMaker Studio.
Model training
After the data has been downloaded with the Planet Python client, the segmentation model can be trained. In this example, a combination of KNN classification and image segmentation techniques is used to identify crop area and create georeferenced geojson features.
The Planet data is loaded and preprocessed using the built-in geospatial libraries and tools in SageMaker to prepare it for training the KNN classifier. The ground truth data for training is the Sacramento County Land Use DWR Survey dataset from 2015, and the Planet data from 2017 is used for testing the model.
Convert ground truth features to contours
To train the KNN classifier, the class of each pixel as either crop or non-crop needs to be identified. The class is determined by whether the pixel is associated with a crop feature in the ground truth data or not. To make this determination, the ground truth data is first converted into OpenCV contours, which are then used to separate crop from non-crop pixels. The pixel values and their classification are then used to train the KNN classifier.
To convert the ground truth features to contours, the features must first be projected to the coordinate reference system of the image. Then, the features are transformed into image space, and finally converted into contours. To ensure the accuracy of the contours, they are visualized overlaid on the input image, as shown in the following example.
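The projection and conversion code isn't reproduced in this post; a minimal sketch of the step described above, assuming the ground truth is a vector file readable by GeoPandas and the Planet scene is a GeoTIFF (the helper name and details are illustrative), could look like:

import geopandas as gpd
import numpy as np
import rasterio

def ground_truth_to_contours(ground_truth_filename, pl_filename):
    # Read the crop features and the satellite scene.
    features = gpd.read_file(ground_truth_filename)
    with rasterio.open(pl_filename) as src:
        # 1. Project the features to the coordinate reference system of the image.
        features = features.to_crs(src.crs)
        # 2. Prepare the world-to-pixel (image space) transform.
        to_pixel = ~src.transform

    # 3. Convert each polygon exterior into an OpenCV-style integer contour.
    contours = []
    for geom in features.geometry:
        for poly in getattr(geom, 'geoms', [geom]):  # handle MultiPolygons
            pts = [to_pixel * (x, y) for x, y in poly.exterior.coords]
            contours.append(np.array(pts, dtype=np.int32).reshape(-1, 1, 2))
    return contours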

To train the KNN classifier, crop and non-crop pixels are separated using the crop feature contours as a mask.

The input to the KNN classifier consists of two datasets: X, a 2D array that provides the features to be classified on, and y, a 1D array that provides the classes (example). Here, a single classified band is created from the non-crop and crop datasets, where the band’s values indicate the pixel class. The band and the underlying image pixel band values are then converted to the X and y inputs for the classifier fit function.
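The helper functions referenced in the training and prediction code that follows (load_refl_bands, to_X, to_y, and classified_band_from_y) aren't shown in this post; a simplified sketch of what they might look like, assuming each band is a 2D masked NumPy array, is below (masked pixels are filled with 0 rather than dropped, which the original analysis may handle differently):

import numpy as np

def to_X(bands):
    # Stack per-band pixel values into a 2D (n_pixels, n_bands) feature array.
    return np.stack([np.ma.filled(b, 0).ravel() for b in bands], axis=1)

def to_y(class_band):
    # Flatten the single classified band into a 1D label array (1 = crop, 0 = non-crop).
    return np.ma.filled(class_band, 0).ravel()

def classified_band_from_y(mask, y):
    # Reshape predicted labels back into image space and re-apply the original mask.
    return np.ma.masked_array(y.reshape(mask.shape), mask=mask)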

Train the classifier on crop and non-crop pixels
The KNN classification is performed with the scikit-learn KNeighborsClassifier. The number of neighbors, a parameter that greatly affects the estimator’s performance, is tuned using cross-validation (see KNN cross-validation). The classifier is then trained using the prepared datasets and the tuned number of neighbors. See the following code:

# scikit-learn import assumed for the classifier
from sklearn import neighbors

def fit_classifier(pl_filename, ground_truth_filename, metadata_filename, n_neighbors):
    weights = 'uniform'
    clf = neighbors.KNeighborsClassifier(n_neighbors, weights=weights)
    train_class_band = create_contour_classified_band(pl_filename, ground_truth_filename)
    X = to_X(load_refl_bands(pl_filename, metadata_filename))
    y = to_y(train_class_band)
    clf.fit(X, y)
    return clf

clf = fit_classifier(train_scene_filename,
                     train_ground_truth_filename,
                     train_metadata_filename,
                     n_neighbors)
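The tuning code itself isn't shown; a minimal sketch of how the number of neighbors might be selected with cross-validation, assuming a feature matrix X and label vector y prepared with the same helpers (the grid values, fold count, and subsample size are arbitrary choices for illustration), could look like:

import numpy as np
from sklearn import neighbors
from sklearn.model_selection import GridSearchCV

# Subsample the training pixels to keep the cross-validated search tractable.
rng = np.random.default_rng(0)
sample_idx = rng.choice(len(y), size=min(100_000, len(y)), replace=False)

# Search a small grid of neighbor counts with 3-fold cross-validation.
search = GridSearchCV(
    neighbors.KNeighborsClassifier(weights='uniform'),
    param_grid={'n_neighbors': [3, 5, 9, 15]},
    cv=3,
    scoring='accuracy',
    n_jobs=-1,
)
search.fit(X[sample_idx], y[sample_idx])
n_neighbors = search.best_params_['n_neighbors']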

To assess the classifier’s performance on its input data, the pixel class is predicted using the pixel band values. The classifier’s performance is mainly based on the accuracy of the training data and the clear separation of the pixel classes based on the input data (pixel band values). The classifier’s parameters, such as the number of neighbors and the distance weighting function, can be adjusted to compensate for any inaccuracies in the latter. See the following code:

def predict(pl_filename, metadata_filename, clf):
    bands = load_refl_bands(pl_filename, metadata_filename)
    X = to_X(bands)
    y = clf.predict(X)
    return classified_band_from_y(bands[0].mask, y)

train_predicted_class_band = predict(train_scene_filename, train_metadata_filename, clf)

Evaluate model predictions
The trained KNN classifier is utilized to predict crop regions in the test data. This test data consists of regions that were not exposed to the model during training. In other words, the model has no knowledge of the area prior to its analysis and therefore this data can be used to objectively evaluate the model’s performance. We start by visually inspecting several regions, beginning with a region that is comparatively noisier.

The visual inspection reveals that the predicted classes are mostly consistent with the ground truth classes. There are a few regions of deviation, which we inspect further.

Upon further investigation, we discovered that some of the noise in this region was due to the ground truth data lacking the detail that is present in the classified image (top right compared to top left and bottom left). A particularly interesting finding is that the classifier identifies trees along the river as non-crop, whereas the ground truth data mistakenly identifies them as crop. This difference between the two segmentations may be due to the trees shading the crops beneath them.
Following this, we inspect another region that was classified differently by the two methods. These highlighted regions were marked as non-crop in the 2015 ground truth data (top right) but appear clearly as cropland in the 2017 PlanetScope scenes (top left and bottom left). They were also classified largely as cropland by the classifier (bottom right).

Again, we see the KNN classifier presents a more granular result than the ground truth class, and it also successfully captures the change happening in the cropland. This example also speaks to the value of daily refreshed satellite data because the world often changes much faster than annual reports, and a combined method with ML like this can help us pick up the changes as they happen. Being able to monitor and discover such changes via satellite data, especially in the evolving agricultural fields, provides helpful insights for farmers to optimize their work and any agricultural stakeholder in the value chain to get a better pulse of the season.
Model evaluation
The visual comparison of the images of the predicted classes to the ground truth classes can be subjective and can’t be generalized for assessing the accuracy of the classification results. To obtain a quantitative assessment, we obtain classification metrics by using scikit-learn’s classification_report function:

# train dataset
from sklearn.metrics import classification_report  # import assumed

print(classification_report(to_y(create_contour_classified_band(train_scene_filename,
                                                                 train_ground_truth_filename)),
                            to_y(train_predicted_class_band),
                            target_names=['crop', 'non-crop']))

precision recall f1-score support

crop 0.89 0.86 0.87 2641818
non-crop 0.83 0.86 0.84 2093907

accuracy 0.86 4735725
macro avg 0.86 0.86 0.86 4735725
weighted avg 0.86 0.86 0.86 4735725

# test dataset
print(classification_report(to_y(create_contour_classified_band(test_scene_filename,
                                                                test_ground_truth_filename)),
                            to_y(test_predicted_class_band),
                            target_names=['crop', 'non-crop']))

precision recall f1-score support

crop 0.94 0.73 0.82 1959630
non-crop 0.32 0.74 0.44 330938

accuracy 0.73 2290568
macro avg 0.63 0.74 0.63 2290568
weighted avg 0.85 0.73 0.77 2290568

The pixel classification is used to create a segmentation mask of crop regions, making both precision and recall important metrics, and the F1 score a good overall measure for predicting accuracy. Our results give us metrics for both crop and non-crop regions in the train and test dataset. However, to keep things simple, let’s take a closer look at these metrics in the context of the crop regions in the test dataset.
Precision is a measure of how accurate our model’s positive predictions are. In this case, a precision of 0.94 for crop regions indicates that our model is very successful at correctly identifying areas that are indeed crop regions, where false positives (actual non-crop regions incorrectly identified as crop regions) are minimized. Recall, on the other hand, measures the completeness of positive predictions. In other words, recall measures the proportion of actual positives that were identified correctly. In our case, a recall value of 0.73 for crop regions means that 73% of all true crop region pixels are correctly identified, minimizing the number of false negatives.
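To make these definitions concrete, here is a tiny worked example with hypothetical confusion counts chosen so that the resulting metrics roughly match the crop-class numbers reported above:

# Hypothetical confusion counts for the crop class (illustrative only).
tp, fp, fn = 90, 6, 33  # true positives, false positives, false negatives

precision = tp / (tp + fp)                          # ~0.94
recall = tp / (tp + fn)                             # ~0.73
f1 = 2 * precision * recall / (precision + recall)  # ~0.82
print(precision, recall, f1)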
Ideally, high values of both precision and recall are preferred, although this can depend largely on the application of the case study. For example, if we were examining these results for farmers looking to identify crop regions for agriculture, we would want to favor higher recall over precision to minimize the number of false negatives (areas identified as non-crop regions that are actually crop regions) and make the most use of the land. The F1-score serves as an overall accuracy metric combining precision and recall and measuring the balance between the two. A high F1-score, such as ours for crop regions (0.82), indicates a good balance between precision and recall and a high overall classification accuracy. Although the F1-score drops between the train and test datasets, this is expected because the classifier was trained on the train dataset. An overall weighted average F1 score of 0.77 is promising and adequate enough to try segmentation schemes on the classified data.
Create a segmentation mask from the classifier
The creation of a segmentation mask using the predictions from the KNN classifier on the test dataset involves cleaning up the predicted output to avoid small segments caused by image noise. To remove speckle noise, we use the OpenCV median blur filter. This filter preserves road delineations between crops better than the morphological open operation.

To apply binary segmentation to the denoised output, we first need to convert the classified raster data to vector features using the OpenCV findContours function.

Finally, the actual segmented crop regions can be computed using the segmented crop outlines.
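The post-processing code isn't included here; a minimal sketch of the median blur and contour extraction described above (the kernel size and minimum-area threshold are arbitrary illustrative values) might look like:

import cv2
import numpy as np

# Predicted class band as an 8-bit binary image (1 = crop, 0 = non-crop).
class_band = np.ma.filled(test_predicted_class_band, 0).astype(np.uint8)

# Remove speckle noise while preserving road delineations between crops.
denoised = cv2.medianBlur(class_band, ksize=5)

# Convert the classified raster to vector features (crop outlines).
contours, _ = cv2.findContours(denoised, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Keep only reasonably large segments and rasterize them into the final mask.
crop_mask = np.zeros_like(denoised)
large_segments = [c for c in contours if cv2.contourArea(c) > 100]
cv2.drawContours(crop_mask, large_segments, -1, color=1, thickness=cv2.FILLED)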

The segmented crop regions produced from the KNN classifier allow for precise identification of crop regions in the test dataset. These segmented regions can be used for various purposes, such as field boundary identification, crop monitoring, yield estimation, and resource allocation. The achieved F1 score of 0.77 is good and provides evidence that the KNN classifier is an effective tool for crop segmentation in remote sensing images. These results can be used to further improve and refine crop segmentation techniques, potentially leading to increased accuracy and efficiency in crop analysis.
Conclusion
This post demonstrated how you can use the combination of Planet’s high cadence, high-resolution satellite imagery and SageMaker geospatial capabilities to perform crop segmentation analysis, unlocking valuable insights that can improve agricultural efficiency, environmental sustainability, and food security. Accurately identifying crop regions enables further analysis on crop growth and productivity, monitoring of land use changes, and detection of potential food security risks.
Moreover, the combination of Planet data and SageMaker offers a wide range of use cases beyond crop segmentation. In agriculture alone, the insights can enable data-driven decisions on crop management, resource allocation, and policy planning. With different data and ML models, the combined offering could also expand into other industries and use cases for digital transformation, sustainability transformation, and security.
To start using SageMaker geospatial capabilities, see Get started with Amazon SageMaker geospatial capabilities.
To learn more about Planet’s imagery specifications and developer reference materials, visit Planet Developer’s Center. For documentation on Planet’s SDK for Python, see Planet SDK for Python. For more information about Planet, including its existing data products and upcoming product releases, visit https://www.planet.com/.
Planet Labs PBC Forward-Looking Statements
Except for the historical information contained herein, the matters set forth in this blog post are forward-looking statements within the meaning of the “safe harbor” provisions of the Private Securities Litigation Reform Act of 1995, including, but not limited to, Planet Labs PBC’s ability to capture market opportunity and realize any of the potential benefits from current or future product enhancements, new products, or strategic partnerships and customer collaborations. Forward-looking statements are based on Planet Labs PBC’s management’s beliefs, as well as assumptions made by, and information currently available to them. Because such statements are based on expectations as to future events and results and are not statements of fact, actual results may differ materially from those projected. Factors which may cause actual results to differ materially from current expectations include, but are not limited to the risk factors and other disclosures about Planet Labs PBC and its business included in Planet Labs PBC’s periodic reports, proxy statements, and other disclosure materials filed from time to time with the Securities and Exchange Commission (SEC) which are available online at www.sec.gov, and on Planet Labs PBC’s website at www.planet.com. All forward-looking statements reflect Planet Labs PBC’s beliefs and assumptions only as of the date such statements are made. Planet Labs PBC undertakes no obligation to update forward-looking statements to reflect future events or circumstances.

About the authors
Lydia Lihui Zhang is the Business Development Specialist at Planet Labs PBC, where she helps connect space for the betterment of earth across various sectors and a myriad of use cases. Previously, she was a data scientist at McKinsey ACRE, an agriculture-focused solution. She holds a Master of Science from MIT Technology Policy Program, focusing on space policy. Geospatial data and its broader impact on business and sustainability have been her career focus.
Mansi Shah is a software engineer, data scientist, and musician whose work explores the spaces where artistic rigor and technical curiosity collide. She believes data (like art!) imitates life, and is interested in the profoundly human stories behind the numbers and notes.
Xiong Zhou is a Senior Applied Scientist at AWS. He leads the science team for Amazon SageMaker geospatial capabilities. His current area of research includes computer vision and efficient model training. In his spare time, he enjoys running, playing basketball, and spending time with his family.
Janosch Woschitz is a Senior Solutions Architect at AWS, specializing in geospatial AI/ML. With over 15 years of experience, he supports customers globally in leveraging AI and ML for innovative solutions that capitalize on geospatial data. His expertise spans machine learning, data engineering, and scalable distributed systems, augmented by a strong background in software engineering and industry expertise in complex domains such as autonomous driving.
Shital Dhakal is a Sr. Program Manager with the SageMaker geospatial ML team based in the San Francisco Bay Area. He has a background in remote sensing and Geographic Information System (GIS). He is passionate about understanding customers’ pain points and building geospatial products to solve them. In his spare time, he enjoys hiking, traveling, and playing tennis.

This AI Paper Dives into Embodied Evaluations: Unveiling the Tong Test as a Novel Benchmark for Progress Toward Artificial General Intelligence

Unlike narrow or specialized AI systems designed for specific tasks, Artificial General Intelligence (AGI) can perform a wide range of functions, aiming to replicate the broad cognitive abilities and adaptability of human intelligence. AGI can function autonomously, making decisions and taking actions independently, and it can comprehend ambiguous or incomplete information.

Achieving AGI is a complex and challenging endeavor, as it requires solving numerous difficult problems in machine learning, natural language processing, robotics, and other AI-related fields.

Researchers at the National Key Laboratory of General Artificial Intelligence propose a new way of evaluating AGI by introducing the Tong Test. “Tong” corresponds to the Chinese character for “general” in AGI.

They propose that AGI evaluation should be rooted in scenarios with the complex environments of dynamic embodied physical and social interactions (DEPSI). They say that only through evaluations within DEPSI can the human-like abilities of AGI, such as commonsense reasoning, intention inference in social interactions, trust, and self-awareness, be properly assessed. The Tong test offers a new perspective on AGI evaluation by emphasizing the importance of DEPSI and an ability- and value-oriented rather than task-oriented evaluation.

The Tong test is a benchmark and evaluation system focusing on essential features such as infinite tasks, self-driven task generation, value alignment, and causal understanding. Their proposed virtual platform could also support embodied AI in training and testing. Embodied AI agents acquire information within this platform and continue to learn and finetune their values and abilities interactively. 

To support infinite tasks, they follow a compositional graphical model as a basic form of knowledge representation that parses any given scene’s spatial, temporal, and causal relations. They define a fluent space for the time-varying variables; these represent all possible scene configurations that can be represented within a continuous DEPSI environment space. 

The Tong test spans two domains called the U–V dual system. The U-system describes the agent’s understanding of extrinsic physical or social rules. In contrast, the V-system comprises the agent’s intrinsic values, defined as a set of value functions upon which the self-driven behaviors of the agent are built. The Tong test platform has modules for intermediate data visualization and a panel that displays the model’s performance, indicating how well the tested model performed.

Thus, the proposed Tong test, based on DEPSI, defines five levels of abilities and values and provides a practical pathway and theoretical guidance for developing AGI algorithms.

Check out the Paper. All Credit For This Research Goes To the Researchers on This Project.


CMU Researchers Introduce AdaTest++: Enhancing the Auditing of Large Language Models through Advanced Human-AI Collaboration Techniques

Auditing Large Language Models (LLMs) has become a paramount concern as these models are increasingly integrated into various applications. Ensuring their ethical, unbiased, and responsible behavior is essential. However, the traditional auditing process can be time-consuming, lacks systematicity, and may not uncover all potential issues. Researchers have introduced AdaTest++, an advanced auditing tool that revolutionizes the LLM auditing landscape to address these challenges.

Auditing LLMs is a complex and demanding task. It involves manually testing these models to uncover biases, errors, or undesirable outputs. This process can be highly labor-intensive, lacks structure, and may not effectively reveal all potential issues. Consequently, there is a pressing need for an improved auditing framework that streamlines the process, enhances sensemaking, and facilitates communication between auditors and LLMs.

Traditional methods for auditing LLMs often rely on ad hoc testing. Auditors interact with the model, attempting to uncover issues through a trial-and-error approach. While this approach can identify some problems, it lacks a systematic and comprehensive framework for auditing LLMs effectively.

Researchers have introduced AdaTest++, an innovative auditing tool designed to overcome the limitations of current methods. AdaTest++ is built upon a sensemaking framework, which guides auditors through four key stages: Surprise, Schemas, Hypotheses, and Assessment.

AdaTest++ incorporates several critical features to enhance the auditing process:

Prompt Templates: AdaTest++ provides auditors with a library of prompt templates. These templates enable auditors to translate their hypotheses about model behavior into precise and reusable prompts. This feature streamlines the process of formulating specific queries for the LLM, making it easier to test and validate hypotheses related to bias, accuracy, or appropriateness of model responses.

Organizing Tests: The tool includes features for systematically organizing tests into meaningful schemas. This functionality empowers auditors to categorize and group tests based on common themes or model behavior patterns. By improving the organization of test cases, AdaTest++ enhances the efficiency of the auditing process and simplifies the tracking and analysis of model responses.

Top-Down and Bottom-Up Exploration: AdaTest++ accommodates top-down and bottom-up auditing approaches. Auditors can initiate the process with predefined hypotheses and use prompt templates to guide their queries. Alternatively, they can commence the exploration from scratch, relying on the tool to generate test suggestions that reveal unexpected model behaviors.

Validation and Refinement: In the final stage, auditors can validate their hypotheses by generating tests that provide supporting evidence or counter-evidence. AdaTest++ enables users to refine their mental models of the LLM’s behavior through iterative testing and hypothesis modification. Auditors can create new tests or adapt existing ones to understand the model’s capabilities and limitations better.
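For intuition, a reusable prompt template in this spirit might look something like the following; this is purely illustrative and not drawn from the AdaTest++ paper:

# Purely illustrative template an auditor might reuse to probe one hypothesized
# failure mode; the placeholders in braces are filled in for each test.
template = (
    "Write test inputs where a user asks about {topic} "
    "phrased in {style}, for which the correct model response is {expected_behavior}."
)

test_generation_prompt = template.format(
    topic="refund eligibility",
    style="informal slang",
    expected_behavior="a polite, factually correct answer",
)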

AdaTest++ has demonstrated remarkable effectiveness in assisting auditors throughout the auditing process. Users have reported significant improvements in their ability to uncover unexpected model behaviors, systematically organize their findings, and refine their comprehension of LLMs. This collaborative approach between auditors and LLMs, facilitated by AdaTest++, fosters transparency and trust in AI systems.

In conclusion, AdaTest++ offers a compelling solution to the challenges associated with auditing Large Language Models. By providing auditors with a powerful and systematic tool, AdaTest++ empowers them to assess model behavior comprehensively, uncover potential biases or errors, and refine their understanding. This tool significantly contributes to the responsible deployment of LLMs in various domains, promoting transparency and accountability in AI systems.

As the utilization of LLMs continues to expand, tools like AdaTest++ play an indispensable role in ensuring these models align with ethical and safety standards. Auditors can rely on AdaTest++ to navigate the intricate landscape of LLM behavior, ultimately benefiting society by promoting the responsible use of AI technology.

Check out the Paper and CMU Article. All Credit For This Research Goes To the Researchers on This Project.


This AI Paper Introduces Quilt-1M: Harnessing YouTube to Create the Largest Vision-Language Histopathology Dataset

In response to the scarcity of comprehensive datasets in the field of histopathology, a research team has introduced a groundbreaking solution known as QUILT-1M. This new framework aims to leverage the wealth of information available on YouTube, particularly in the form of educational histopathology videos. By curating a massive dataset from these videos, QUILT-1M comprises an impressive 1 million paired image-text samples, making it the largest vision-language histopathology dataset to date.

The scarcity of such datasets has hindered progress in the field of histopathology, where dense, interconnected representations are essential for capturing the complexity of various disease subtypes. QUILT-1M offers several advantages. First, it does not overlap with existing data sources, ensuring a unique contribution to histopathology knowledge. Second, the rich textual descriptions extracted from expert narrations within educational videos provide comprehensive information. Lastly, multiple sentences per image offer diverse perspectives and a thorough understanding of each histopathological image.

The research team used a combination of models, algorithms, and human knowledge databases to curate this dataset. They also expanded QUILT by adding data from other sources, including Twitter, research papers, and PubMed. The dataset’s quality is evaluated through various metrics, including ASR error rates, precision of language model corrections, and sub-pathology classification accuracy.

In terms of results, models trained on QUILT-1M outperform existing models, including BiomedCLIP, in zero-shot, linear probing, and cross-modal retrieval tasks across various sub-pathology types. QUILTNET performs better than the out-of-domain CLIP baseline and state-of-the-art histopathology models across 12 zero-shot tasks covering 8 different sub-pathologies. The research team emphasizes the potential of QUILT-1M to benefit both computer scientists and histopathologists.

In conclusion, QUILT-1M represents a significant advancement in the field of histopathology by providing a large, diverse, and high-quality vision-language dataset. It opens new possibilities for research and the development of more effective histopathology models.

Check out the Paper, Project, and GitHub. All Credit For This Research Goes To the Researchers on This Project.


Accenture creates a Knowledge Assist solution using generative AI services on AWS

This post is co-written with Ilan Geller and Shuyu Yang from Accenture.
Enterprises today face major challenges when it comes to using their information and knowledge bases for both internal and external business operations. With constantly evolving operations, processes, policies, and compliance requirements, it can be extremely difficult for employees and customers to stay up to date. At the same time, the unstructured nature of much of this content makes it time consuming to find answers using traditional search.
Internally, employees can often spend countless hours hunting down information they need to do their jobs, leading to frustration and reduced productivity. And when they can’t find answers, they have to escalate issues or make decisions without complete context, which can create risk.
Externally, customers can also find it frustrating to locate the information they are seeking. Although enterprise knowledge bases have, over time, improved the customer experience, they can still be cumbersome and difficult to use. Whether seeking answers to a product-related question or needing information about operating hours and locations, a poor experience can lead to frustration, or worse, a customer defection.
In either case, as knowledge management becomes more complex, generative AI presents a game-changing opportunity for enterprises to connect people to the information they need to perform and innovate. With the right strategy, these intelligent solutions can transform how knowledge is captured, organized, and used across an organization.
To help tackle this challenge, Accenture collaborated with AWS to build an innovative generative AI solution called Knowledge Assist. By using AWS generative AI services, the team has developed a system that can ingest and comprehend massive amounts of unstructured enterprise content.
Rather than traditional keyword searches, users can now ask questions and extract precise answers in a straightforward, conversational interface. Generative AI understands context and relationships within the knowledge base to deliver personalized and accurate responses. As it fields more queries, the system continuously improves its language processing through machine learning (ML) algorithms.
Since launching this AI assistance framework, companies have seen dramatic improvements in employee knowledge retention and productivity. By providing quick and precise access to information and enabling employees to self-serve, this solution reduces training time for new hires by over 50% and cuts escalations by up to 40%.
With the power of generative AI, enterprises can transform how knowledge is captured, organized, and shared across the organization. By unlocking their existing knowledge bases, companies can boost employee productivity and customer satisfaction. As Accenture’s collaboration with AWS demonstrates, the future of enterprise knowledge management lies in AI-driven systems that evolve through interactions between humans and machines.
Accenture is working with AWS to help clients deploy Amazon Bedrock, utilize the most advanced foundational models such as Amazon Titan, and deploy industry-leading technologies such as Amazon SageMaker JumpStart and Amazon Inferentia alongside other AWS ML services.
This post provides an overview of an end-to-end generative AI solution developed by Accenture for a production use case using Amazon Bedrock and other AWS services.
Solution overview
A large public health sector client serves millions of citizens every day, and they demand easy access to up-to-date information in an ever-changing health landscape. Accenture has integrated this generative AI functionality into an existing FAQ bot, allowing the chatbot to provide answers to a broader array of user questions. Increasing the ability for citizens to access pertinent information in a self-service manner saves the department time and money, lessening the need for call center agent interaction. Key features of the solution include:

Hybrid intent approach – Uses generative and pre-trained intents
Multi-lingual support – Converses in English and Spanish
Conversational analysis – Reports on user needs, sentiment, and concerns
Natural conversations – Maintains context with human-like natural language processing (NLP)
Transparent citations – Guides users to the source information

Accenture’s generative AI solution provides the following advantages over existing or traditional chatbot frameworks:

Generates accurate, relevant, and natural-sounding responses to user queries quickly
Remembers the context and answers follow-up questions
Handles queries and generates responses in multiple languages (such as English and Spanish)
Continuously learns and improves responses based on user feedback
Is easily integrable with your existing web platform
Ingests a vast repository of enterprise knowledge base
Responds in a human-like manner
Keeps up with the continuous evolution of the knowledge base with minimal to no effort
Uses a pay-as-you-use model with no upfront costs

The high-level workflow of this solution involves the following steps:

Users create a simple integration with existing web platforms.
Data is ingested into the platform as a bulk upload on day 0 and then incremental uploads day 1+.
User queries are processed in real time with the system scaling as required to meet user demand.
Conversations are saved in application databases (Amazon DynamoDB) to support multi-round conversations.
The Anthropic Claude foundation model is invoked via Amazon Bedrock to generate query responses based on the most relevant content.
The Anthropic Claude foundation model is used to translate queries as well as responses from English to other desired languages to support multi-language conversations.
The Amazon Titan foundation model is invoked via Amazon Bedrock to generate vector embeddings (a minimal sketch of this call follows this list).
Content relevance is determined through the similarity of the raw content embeddings and the user query embedding, using a Pinecone vector database.
The context along with the user’s question is appended to create a prompt, which is provided as input to the Anthropic Claude model. The generated response is provided back to the user via the web platform.
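As a rough illustration of the embedding step in this workflow, generating a vector with the Amazon Titan embeddings model through the Amazon Bedrock runtime API might look like the following sketch; the model ID and query text are assumptions for illustration, not the production configuration:

import json
import boto3

bedrock = boto3.client('bedrock-runtime')

def embed_text(text):
    # Invoke the Amazon Titan embeddings model via Amazon Bedrock.
    response = bedrock.invoke_model(
        modelId='amazon.titan-embed-text-v1',
        body=json.dumps({'inputText': text}),
    )
    return json.loads(response['body'].read())['embedding']

# The resulting vector can then be compared against pre-computed content embeddings
# (for example, stored in a Pinecone index) to find the most relevant passages.
query_vector = embed_text('What are the current vaccination requirements?')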

The following diagram illustrates the solution architecture.

The architecture flow can be understood in two parts:

Offline data loading to Amazon Kendra
End-user online flow

In the following sections, we discuss different aspects of the solution and its development in more detail.
Model selection
The process for model selection included rigorous testing of various models available in Amazon Bedrock, including the AI21 Labs, Cohere, Anthropic, and Amazon foundation models. We checked for supported use cases, model attributes, maximum tokens, cost, accuracy, performance, and languages. Based on this, we selected Claude-2 as best suited for this use case.
Data source
We created an Amazon Kendra index and added a data source using web crawler connectors with a root web URL and directory depth of two levels. Several webpages were ingested into the Amazon Kendra index and used as the data source.
GenAI chatbot request and response process
Steps in this process consist of an end-to-end interaction with a request from Amazon Lex and a response from a large language model (LLM):

The user submits the request to the conversational front-end application hosted in an Amazon Simple Storage Service (Amazon S3) bucket through Amazon Route 53 and Amazon CloudFront.
Amazon Lex understands the intent and directs the request to the orchestrator hosted in an AWS Lambda function.
The orchestrator Lambda function performs the following steps:

The function interacts with the application database, which is hosted in Amazon DynamoDB. The database stores the session ID and user ID for conversation history.
Another request is sent to the Amazon Kendra index to get the top five relevant search results to build the relevant context. Using this context, the modified prompt required by the LLM is constructed (a simplified sketch of this retrieval-and-prompting flow follows these steps).
The connection is established between Amazon Bedrock and the orchestrator. A request is posted to the Amazon Bedrock Claude-2 model to get the response from the LLM model selected.

The data from the LLM response is post-processed and a response is sent to the user.
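A simplified sketch of this orchestration, retrieving context from Amazon Kendra and invoking the Claude-2 model through Amazon Bedrock, is shown below; the index ID, prompt wording, and parameter values are placeholders rather than the production configuration:

import json
import boto3

kendra = boto3.client('kendra')
bedrock = boto3.client('bedrock-runtime')

def answer_question(question, index_id):
    # Retrieve the top relevant passages from the Amazon Kendra index.
    results = kendra.retrieve(IndexId=index_id, QueryText=question, PageSize=5)
    context = '\n'.join(item['Content'] for item in results['ResultItems'])

    # Append the context to the user's question and invoke Claude-2 via Amazon Bedrock.
    prompt = (
        f"\n\nHuman: Answer the question using only the following context:\n{context}\n\n"
        f"Question: {question}\n\nAssistant:"
    )
    response = bedrock.invoke_model(
        modelId='anthropic.claude-v2',
        body=json.dumps({'prompt': prompt, 'max_tokens_to_sample': 500}),
    )
    return json.loads(response['body'].read())['completion']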

Online reporting
The online reporting process consists of the following steps:

End-users interact with the chatbot via a CloudFront CDN front-end layer.
Each request/response interaction is facilitated by the AWS SDK and sends network traffic to Amazon Lex (the NLP component of the bot).
Metadata about the request/response pairings is logged to Amazon CloudWatch.
The CloudWatch log group is configured with a subscription filter that sends logs into Amazon OpenSearch Service.
Once available in OpenSearch Service, logs can be used to generate reports and dashboards using Kibana.

Conclusion
In this post, we showcased how Accenture is using AWS generative AI services to implement an end-to-end approach towards digital transformation. We identified the gaps in traditional question answering platforms and augmented them with generative intelligence for faster response times and a system that continuously improves while engaging with users across the globe. Reach out to the Accenture Center of Excellence team to dive deeper into the solution and deploy it for your clients.
This Knowledge Assist platform can be applied to different industries, including but not limited to health sciences, financial services, manufacturing, and more. The platform provides natural, human-like responses to questions using securely stored knowledge, and it enables efficiency, productivity, and more accurate actions for its users.
The joint effort builds on the 15-year strategic relationship between the companies and uses the same proven mechanisms and accelerators built by the Accenture AWS Business Group (AABG).
Connect with the AABG team at accentureaws@amazon.com to drive business outcomes by transforming to an intelligent data enterprise on AWS.
For further information about generative AI on AWS using Amazon Bedrock or Amazon SageMaker, we recommend the following resources:

Generative AI on AWS: Technology
Get started with generative AI on AWS using Amazon SageMaker JumpStart

You can also sign up for the AWS generative AI newsletter, which includes educational resources, blogs, and service updates.

About the Authors
Ilan Geller is a Managing Director at Accenture with a focus on artificial intelligence, helping clients scale artificial intelligence applications, and is the Global GenAI COE Partner Lead for AWS.
Shuyu Yang is the Generative AI and Large Language Model Delivery Lead and also leads the Accenture AI (AWS DevOps professional) CoE (Center of Excellence) teams.
Shikhar Kwatra is an AI/ML specialist solutions architect at Amazon Web Services, working with a leading Global System Integrator. He has earned the title of one of the Youngest Indian Master Inventors with over 500 patents in the AI/ML and IoT domains. Shikhar aids in architecting, building, and maintaining cost-efficient, scalable cloud environments for the organization, and supports the GSI partner in building strategic industry solutions on AWS.
Jay Pillai is a Principal Solution Architect at Amazon Web Services. In this role, he functions as the Global Generative AI Lead Architect and also the Lead Architect for Supply Chain Solutions with AABG. As an Information Technology Leader, Jay specializes in artificial intelligence, data integration, business intelligence, and user interface domains. He holds 23 years of extensive experience working with several clients across supply chain, legal technologies, real estate, financial services, insurance, payments, and market research business domains.
Karthik Sonti leads a global team of Solutions Architects focused on conceptualizing, building, and launching horizontal, functional, and vertical solutions with Accenture to help our joint customers transform their business in a differentiated manner on AWS.

Speed up your time series forecasting by up to 50 percent with Amazon SageMaker Canvas UI and AutoML APIs

We’re excited to announce that Amazon SageMaker Canvas now offers a quicker and more user-friendly way to create machine learning models for time-series forecasting. SageMaker Canvas is a visual point-and-click service that enables business analysts to generate accurate machine learning (ML) models without requiring any machine learning experience or having to write a single line of code.
SageMaker Canvas supports a number of use cases, including time-series forecasting used for inventory management in retail, demand planning in manufacturing, workforce and guest planning in travel and hospitality, revenue prediction in finance, and many other business-critical decisions where highly-accurate forecasts are important. As an example, time-series forecasting allows retailers to predict future sales demand and plan for inventory levels, logistics, and marketing campaigns. Time-series forecasting models in SageMaker Canvas use advanced technologies to combine statistical and machine learning algorithms, and deliver highly accurate forecasts.
In this post, we describe the enhancements to the forecasting capabilities of SageMaker Canvas and guide you on using its user interface (UI) and AutoML APIs for time-series forecasting. While the SageMaker Canvas UI offers a code-free visual interface, the APIs empower developers to interact with these features programmatically. Both can be accessed from the SageMaker console.
Improvements in forecasting experience
With today’s launch, SageMaker Canvas has upgraded its forecasting capabilities using AutoML, delivering up to 50 percent faster model building performance and up to 45 percent quicker predictions on average compared to previous versions across various benchmark datasets. This reduces the average model training duration from 186 to 73 minutes and the average prediction time from 33 to 18 minutes for a typical batch of 750 time series with data size up to 100 MB. Users can now also programmatically access model construction and prediction functions through Amazon SageMaker Autopilot APIs,  which come with model explainability and performance reports.
Previously, introducing incremental data required retraining the entire model, which was time-consuming and caused operational delays. Now, in SageMaker Canvas, you can add recent data to generate future forecasts without retraining the entire model. Just input your incremental data to your model to use the latest insights for upcoming forecasts. Eliminating retraining accelerates the forecasting process, allowing you to more quickly apply those results to your business processes.
With SageMaker Canvas now using AutoML for forecasting, you can harness model building and prediction functions through SageMaker Autopilot APIs, ensuring consistency across the UI and APIs. For example, you can start with building models in the UI, then switch to using APIs for generating predictions. This updated modeling approach also enhances model transparency in several ways:

Users can access an explainability report that offers clearer insights into factors influencing predictions. This is valuable for risk, compliance teams, and external regulators. The report elucidates how dataset attributes influence specific time series forecasts. It employs impact scores to measure each attribute’s relative effect, indicating whether they amplify or reduce forecast values.
You can now access the trained models and deploy them to SageMaker Inference or your preferred infrastructure for predictions.
A performance report is available, granting deeper insights into optimal models chosen by AutoML for specific time series and the hyperparameters used during training.

Generate time-series forecasts using the SageMaker Canvas UI
The SageMaker Canvas UI lets you seamlessly integrate data sources from the cloud or on-premises, merge datasets effortlessly, train precise models, and make predictions with emerging data—all without coding. Let’s explore generating a time-series forecast using this UI.
First, you import data into SageMaker Canvas from various sources, including local files from your computer, Amazon Simple Storage Service (Amazon S3) buckets, Amazon Athena, Snowflake, and over 40 other data sources. After importing data, you can explore and visualize it to get additional insights, such as with scatterplots or bar charts. After you’re ready to create a model, you can do so with just a few clicks after configuring necessary parameters, such as selecting a target column to forecast and specifying how many days into the future you want to forecast. The following screenshots show an example visualization of predicting product demand based on historical weekly demand data for specific products in different store locations:

The following image shows weekly forecasts for a specific product in different store locations:

For a comprehensive guide on how to use the SageMaker Canvas UI for forecasting, check out this blog post.
If you need an automated workflow or direct ML model integration into apps, our forecasting functions are accessible through APIs. In the following section, we provide a sample solution detailing how to employ our APIs for automated forecasting.
Generate time-series forecast using APIs
Let’s dive into how to use the APIs to train the model and generate predictions. For this demonstration, consider a situation where a company needs to predict product stock levels at various stores to meet customer demand. At a high level, the API interactions break down into the following steps:

Prepare the dataset.
Create a SageMaker Autopilot job.
Evaluate the Autopilot job:

Explore the model accuracy metrics and backtest results.
Explore the model explainability report.

Generate predictions from the model:

Use the real-time inference endpoint created as part of the Autopilot job; or
Use a batch transform job.

Sample Amazon SageMaker Studio notebook showcasing forecasting with APIs
We’ve provided a sample SageMaker Studio notebook on GitHub to help accelerate your time-to-market when your business prefers to orchestrate forecasting through programmatic APIs. The notebook offers a sample synthetic dataset available through a public S3 bucket and guides you through all the steps outlined in the workflow above. While the notebook provides a basic framework, you can tailor the code sample to fit your specific use case, including modifying it to match your data schema, time resolution, forecasting horizon, and other parameters needed to achieve your desired results.
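To give a sense of what the programmatic path in the preceding steps looks like, the following is a minimal sketch of creating a time-series AutoML job with the boto3 create_auto_ml_job_v2 API. The bucket paths, column names, role ARN, and forecast settings are hypothetical placeholders; refer to the sample notebook and the API reference for the exact configuration your dataset requires.

import boto3

sm = boto3.client("sagemaker")

# Hypothetical names, columns, and S3 paths for illustration only.
sm.create_auto_ml_job_v2(
    AutoMLJobName="canvas-ts-forecast-demo",
    AutoMLJobInputDataConfig=[
        {
            "ChannelType": "training",
            "ContentType": "text/csv;header=present",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": "s3://my-bucket/demand/train/",
                }
            },
        }
    ],
    OutputDataConfig={"S3OutputPath": "s3://my-bucket/demand/output/"},
    AutoMLProblemTypeConfig={
        "TimeSeriesForecastingJobConfig": {
            "ForecastFrequency": "W",
            "ForecastHorizon": 8,
            "ForecastQuantiles": ["p10", "p50", "p90"],
            "TimeSeriesConfig": {
                "TargetAttributeName": "demand",
                "TimestampAttributeName": "timestamp",
                "ItemIdentifierAttributeName": "product_id",
                "GroupingAttributeNames": ["store_id"],
            },
        }
    },
    RoleArn="arn:aws:iam::111122223333:role/MySageMakerRole",
)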
Conclusion
SageMaker Canvas democratizes time-series forecasting by offering a user-friendly, code-free experience that empowers business analysts to create highly accurate machine learning models. With today’s AutoML upgrades, it delivers up to 50 percent faster model building, up to 45 percent quicker predictions, and introduces API access for both model construction and prediction functions, enhancing its transparency and consistency. The unique ability of SageMaker Canvas to seamlessly handle incremental data without retraining ensures swift adaptation to ever-changing business demands.
Whether you prefer the intuitive UI or versatile APIs, SageMaker Canvas simplifies data integration, model training, and prediction, making it a pivotal tool for data-driven decision-making and innovation across industries.
To learn more, review the documentation or explore the notebook available in our GitHub repository. Pricing information for time-series forecasting using SageMaker Canvas is available on the SageMaker Canvas Pricing page, and for SageMaker training and inference pricing when using SageMaker Autopilot APIs, see the SageMaker Pricing page.
These capabilities are available in all AWS Regions where SageMaker Canvas and SageMaker Autopilot are publicly accessible. For more information about Region availability, see AWS Services by Region.

About the Authors
Nirmal Kumar is Sr. Product Manager for the Amazon SageMaker service. Committed to broadening access to AI/ML, he steers the development of no-code and low-code ML solutions. Outside work, he enjoys travelling and reading non-fiction.
Charles Laughlin is a Principal AI/ML Specialist Solution Architect who works on the Amazon SageMaker service team at AWS. He helps shape the service roadmap and collaborates daily with diverse AWS customers to help transform their businesses using cutting-edge AWS technologies and thought leadership. Charles holds a M.S. in Supply Chain Management and a Ph.D. in Data Science.
Ridhim Rastogi is a Software Development Engineer on the Amazon SageMaker service team at AWS. He is passionate about building scalable distributed systems with a focus on solving real-world problems through AI/ML. In his spare time, he likes to solve puzzles, read fiction, and explore his surroundings.
Ahmed Raafat is a Principal Solutions Architect at AWS, with 20 years of field experience, including 5 years focused on the AWS ecosystem. He specializes in AI/ML solutions. His experience spans various industry verticals, making him a trusted advisor to numerous enterprise customers as they navigate and accelerate their cloud journey.
John Oshodi is a Senior Solutions Architect at Amazon Web Services based in London, UK. He specializes in data and analytics and serves as a technical advisor for numerous AWS enterprise customers, supporting and accelerating their cloud journey. Outside of work, he enjoys travelling to new places and experiencing new cultures with his family.

Robust time series forecasting with MLOps on Amazon SageMaker

In the world of data-driven decision-making, time series forecasting is key in enabling businesses to use historical data patterns to anticipate future outcomes. Whether you are working in asset risk management, trading, weather prediction, energy demand forecasting, vital sign monitoring, or traffic analysis, the ability to forecast accurately is crucial for success.
In these applications, time series data can have heavy-tailed distributions, where the tails represent extreme values. Accurate forecasting in these regions is important in determining how likely an extreme event is and whether to raise an alarm. However, these outliers significantly impact the estimation of the base distribution, making robust forecasting challenging. Financial institutions rely on robust models to predict outliers such as market crashes. In energy, weather, and healthcare sectors, accurate forecasts of infrequent but high-impact events such as natural disasters and pandemics enable effective planning and resource allocation. Neglecting tail behavior can lead to losses, missed opportunities, and compromised safety. Prioritizing accuracy at the tails helps lead to reliable and actionable forecasts. In this post, we train a robust time series forecasting model capable of capturing such extreme events using Amazon SageMaker.
To effectively train this model, we establish an MLOps infrastructure to streamline the model development process by automating data preprocessing, feature engineering, hyperparameter tuning, and model selection. This automation reduces human error, improves reproducibility, and accelerates the model development cycle. With a training pipeline, businesses can efficiently incorporate new data and adapt their models to evolving conditions, which helps ensure that forecasts remain reliable and up to date.
After the time series forecasting model is trained, deploying it within an endpoint grants real-time prediction capabilities. This empowers you to make well-informed and responsive decisions based on the most recent data. Furthermore, deploying the model in an endpoint enables scalability, because multiple users and applications can access and utilize the model simultaneously. By following these steps, businesses can harness the power of robust time series forecasting to make informed decisions and stay ahead in a rapidly changing environment.
Overview of solution
This solution showcases the training of a time series forecasting model, specifically designed to handle outliers and variability in data using a Temporal Convolutional Network (TCN) with a Spliced Binned Pareto (SBP) distribution. For more information about a multimodal version of this solution, refer to The science behind NFL Next Gen Stats’ new passing metric. To further illustrate the effectiveness of the SBP distribution, we compare it with the same TCN model but using a Gaussian distribution instead.
This process significantly benefits from the MLOps features of SageMaker, which streamline the data science workflow by harnessing the powerful cloud infrastructure of AWS. In our solution, we use Amazon SageMaker Automatic Model Tuning for hyperparameter search, Amazon SageMaker Experiments for managing experiments, Amazon SageMaker Model Registry to manage model versions, and Amazon SageMaker Pipelines to orchestrate the process. We then deploy our model to a SageMaker endpoint to obtain real-time predictions.
The following diagram illustrates the architecture of the training pipeline.

The following diagram illustrates the inference pipeline.

You can find the complete code in the GitHub repo. To implement the solution, run the cells in SBP_main.ipynb.
SageMaker pipeline
SageMaker Pipelines offers a user-friendly Python SDK to create integrated machine learning (ML) workflows. These workflows, represented as Directed Acyclic Graphs (DAGs), consist of steps with various types and dependencies. With SageMaker Pipelines, you can streamline the end-to-end process of training and evaluating models, enhancing efficiency and reproducibility in your ML workflows.
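As a rough illustration of how such a DAG is assembled, the following is a minimal sketch using the SageMaker Python SDK; the step objects (data_step, tuning_step, eval_step, condition_step) are hypothetical placeholders for the ProcessingStep, TuningStep, and ConditionStep instances built in the notebook.

from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.pipeline_context import PipelineSession

# data_step, tuning_step, eval_step, and condition_step are hypothetical step
# objects (ProcessingStep, TuningStep, ConditionStep, ...) defined elsewhere.
pipeline = Pipeline(
    name="sbp-forecasting-pipeline",
    steps=[data_step, tuning_step, eval_step, condition_step],
    sagemaker_session=PipelineSession(),
)

pipeline.upsert(role_arn="arn:aws:iam::111122223333:role/MySageMakerRole")  # create or update the DAG
execution = pipeline.start()  # run the pipeline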
The training pipeline begins with generating a synthetic dataset that is split into training, validation, and test sets. The training set is used to train two TCN models, one utilizing Spliced Binned-Pareto distribution and the other employing Gaussian distribution. Both models go through hyperparameter tuning using the validation set to optimize each model. Afterward, an evaluation against the test set is conducted to determine the model with the lowest root mean squared error (RMSE). The model with the best accuracy metric is uploaded to the model registry.
The following diagram illustrates the pipeline steps.

Let’s discuss the steps in more detail.
Data generation
The first step in our pipeline generates a synthetic dataset, which is characterized by a sinusoidal waveform and asymmetric heavy-tailed noise. The data was created using a number of parameters, such as degrees of freedom, a noise multiplier, and a scale parameter. These elements influence the shape of the data distribution, modulate the random variability in our data, and adjust the spread of our data distribution, respectively.
This data processing job is accomplished using a PyTorchProcessor, which runs PyTorch code (generate_data.py) within a container managed by SageMaker. Data and other relevant artifacts for debugging are located in the default Amazon Simple Storage Service (Amazon S3) bucket associated with the SageMaker account. Logs for each step in the pipeline can be found in Amazon CloudWatch.
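For illustration, a minimal sketch of this kind of generation is shown below; the exact parameter values and noise construction in generate_data.py may differ.

import numpy as np

rng = np.random.default_rng(42)

# Hypothetical parameter values; the pipeline exposes similar knobs.
n_steps = 2000
degrees_of_freedom = 3    # controls how heavy the noise tails are
noise_multiplier = 0.5    # modulates the random variability
scale = 2.0               # adjusts the spread of the distribution

t = np.arange(n_steps)
signal = np.sin(2 * np.pi * t / 50)                  # sinusoidal base waveform
noise = rng.standard_t(degrees_of_freedom, n_steps)  # heavy-tailed noise
noise = np.where(noise > 0, 2 * noise, noise)        # make the tails asymmetric
series = scale * signal + noise_multiplier * noise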
The following figure is a sample of the data generated by the pipeline.

You can replace the input with a wide variety of time series data, such as symmetric, asymmetric, light-tailed, heavy-tailed, or multimodal distribution. The model’s robustness allows it to be applicable to a broad range of time series problems, provided sufficient observations are available.
Model training
After data generation, we train two TCNs: one using the SBP distribution and the other using a Gaussian distribution. The SBP distribution employs a discrete binned distribution as its predictive base, where the real axis is divided into discrete bins, and the model predicts the likelihood of an observation falling within each bin. This methodology enables the capture of asymmetries and multiple modes because the probability of each bin is independent. An example of the binned distribution is shown in the following figure.

The predictive binned distribution on the left is robust to extreme events because the log-likelihood is not dependent on the distance between the predicted mean and the observed point, differing from parametric distributions like the Gaussian or Student’s t. Therefore, the extreme event represented by the red dot will not bias the learned mean of the distribution. However, the extreme event will have zero probability. To capture extreme events, we form an SBP distribution by defining the lower tail at the 5th quantile and the upper tail at the 95th quantile, replacing both tails with weighted Generalized Pareto Distributions (GPD), which can quantify the likelihood of such events. The TCN outputs the parameters for the binned distribution base and the GPD tails.
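To make the robustness argument concrete, the following is a minimal sketch of a binned negative log-likelihood in PyTorch; it is not the authors’ implementation, and the bin edges and tensor shapes are assumptions. Because only the index of the bin containing the observation enters the loss, an extreme observation does not pull the learned distribution toward it the way a squared-distance term would.

import torch

def binned_nll(bin_logits, bin_edges, y):
    # bin_logits: (batch, n_bins) unnormalized scores from the network
    # bin_edges:  (n_bins + 1,) fixed boundaries over the real axis
    # y:          (batch,) observed values
    log_probs = torch.log_softmax(bin_logits, dim=-1)   # per-bin log probability
    idx = torch.bucketize(y, bin_edges[1:-1])           # bin index of each observation
    return -log_probs.gather(-1, idx.unsqueeze(-1)).squeeze(-1).mean()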
Hyperparameter search
For optimal output, we use automatic model tuning to find the best version of a model through hyperparameter tuning. This step is integrated into SageMaker Pipelines and allows for the parallel run of multiple training jobs, employing various methods and predefined hyperparameter ranges. The result is the selection of the best model based on the specified model metric, which is RMSE. In our pipeline, we specifically tune the learning rate and number of training epochs to optimize our model’s performance. With the hyperparameter tuning capability in SageMaker, we increase the likelihood that our model achieves optimal accuracy and generalization for the given task.
Due to the synthetic nature of our data, we are keeping Context Length and Lead Time as static parameters. Context Length refers to the number of historical time steps inputted into the model, and Lead Time represents the number of time steps in our forecast horizon. For the sample code, we are only tuning Learning Rate and the number of epochs to save on time and cost.
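A minimal sketch of this tuning setup with the SageMaker Python SDK is shown below; the estimator, metric regex, and S3 URIs are hypothetical placeholders for the PyTorch estimator and channels defined in the notebook.

from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter

tuner = HyperparameterTuner(
    estimator=estimator,                       # hypothetical PyTorch estimator for the TCN script
    objective_metric_name="rmse",
    objective_type="Minimize",
    metric_definitions=[{"Name": "rmse", "Regex": "rmse: ([0-9\\.]+)"}],
    hyperparameter_ranges={
        "learning_rate": ContinuousParameter(1e-4, 1e-2),
        "epochs": IntegerParameter(10, 50),
    },
    max_jobs=10,
    max_parallel_jobs=2,
)
tuner.fit({"train": train_s3_uri, "validation": validation_s3_uri})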
SBP-specific parameters are kept constant based on extensive testing by the authors of the original paper across different datasets:

Number of Bins (100) – This parameter determines the number of bins used to model the base of the distribution. It is kept at 100, which has proven to be most effective across multiple industries.
Percentile Tail (0.05) – This denotes the size of the generalized Pareto distributions at the tail. Like the previous parameter, this has been exhaustively tested and found to be most effective.

Experiments
The hyperparameter tuning process is integrated with SageMaker Experiments, which helps organize, analyze, and compare iterative ML experiments, providing insights and facilitating tracking of the best-performing models. Machine learning is an iterative process involving numerous experiments encompassing data variations, algorithm choices, and hyperparameter tuning. These experiments serve to incrementally refine model accuracy. However, the large number of training runs and model iterations can make it challenging to identify the best-performing models and make meaningful comparisons between current and past experiments. SageMaker Experiments addresses this by automatically tracking our hyperparameter tuning jobs and allowing us to gain further details and insight into the tuning process, as shown in the following screenshot.

Model evaluation
The models undergo training and hyperparameter tuning, and are subsequently evaluated via the evaluate.py script. This step utilizes the test set, distinct from the hyperparameter tuning stage, to gauge the model’s real-world accuracy. RMSE is used to assess the accuracy of the predictions.
For distribution comparison, we employ a probability-probability (P-P) plot, which assesses the fit between the actual vs. predicted distributions. The closeness of the points to the diagonal indicates a perfect fit. Our comparisons between SBP’s and Gaussian’s predicted distributions against the actual distribution show that SBP’s predictions align more closely with the actual data.
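As an illustration, a minimal sketch of these two checks is shown below, assuming arrays of actual values and sampled predictions; the evaluate.py script in the repository may compute them differently.

import numpy as np
import matplotlib.pyplot as plt

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def pp_plot(actual, predicted, n_points=100):
    # Empirical CDFs of both samples evaluated on a shared grid;
    # points close to the diagonal indicate a good distributional fit.
    grid = np.linspace(min(actual.min(), predicted.min()),
                       max(actual.max(), predicted.max()), n_points)
    cdf_actual = np.array([(actual <= g).mean() for g in grid])
    cdf_pred = np.array([(predicted <= g).mean() for g in grid])
    plt.plot(cdf_actual, cdf_pred, ".", label="model")
    plt.plot([0, 1], [0, 1], "k--", label="perfect fit")
    plt.xlabel("actual CDF")
    plt.ylabel("predicted CDF")
    plt.legend()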

As we can observe, SBP has a lower RMSE on the base, lower tail, and upper tail. The SBP distribution improved on the Gaussian distribution’s RMSE by 61% on the base, 56% on the lower tail, and 30% on the upper tail. Overall, the SBP distribution delivers significantly better results.
Model selection
We use a condition step in SageMaker Pipelines to analyze model evaluation reports, opting for the model with the lowest RMSE for improved distribution accuracy. The selected model is converted into a SageMaker model object, readying it for deployment. This involves creating a model package with crucial parameters and packaging it into a ModelStep.
Model registry
The selected model is then uploaded to SageMaker Model Registry, which plays a critical role in managing models ready for production. It stores models, organizes model versions, captures essential metadata and artifacts such as container images, and governs the approval status of each model. By using the registry, we can efficiently deploy models to accessible SageMaker environments and establish a foundation for continuous integration and continuous deployment (CI/CD) pipelines.
Inference
Upon completion of our training pipeline, our model is deployed using SageMaker hosting services, which enables the creation of an inference endpoint for real-time predictions. This endpoint allows seamless integration with applications and systems, providing on-demand access to the model’s predictive capabilities through a secure HTTPS interface. Real-time predictions can be used in scenarios such as stock price and energy demand forecasting. Our endpoint provides a single-step forecast for the provided time series data, presented as percentiles and the median, as shown in the following figure and table.

1st percentile: 1.12
5th percentile: 3.16
Median: 4.70
95th percentile: 7.40
99th percentile: 9.41
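For reference, a minimal sketch of calling such an endpoint with boto3 is shown below; the endpoint name and payload schema are hypothetical and depend on your inference script.

import json
import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="sbp-forecasting-endpoint",            # hypothetical endpoint name
    ContentType="application/json",
    Body=json.dumps({"context": recent_observations}),  # last context_length values
)
forecast = json.loads(response["Body"].read())
print(forecast)  # e.g. {"p01": 1.12, "p05": 3.16, "p50": 4.70, "p95": 7.40, "p99": 9.41}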

Clean up
After you run this solution, make sure you clean up any unnecessary AWS resources to avoid unexpected costs. You can clean up these resources using the SageMaker Python SDK, which can be found at the end of the notebook. By deleting these resources, you prevent further charges for resources you are no longer using.
Conclusion
Having an accurate forecast can highly impact a business’s future planning and can also provide solutions to a variety of problems in different industries. Our exploration of robust time series forecasting with MLOps on SageMaker has demonstrated a method to obtain an accurate forecast and the efficiency of a streamlined training pipeline.
Our model, powered by a Temporal Convolutional Network with Spliced Binned Pareto distribution, has shown accuracy and adaptability to outliers by improving the RMSE by 61% on the base, 56% on the lower tail, and 30% on the upper tail over the same TCN with Gaussian distribution. These figures make it a reliable solution for real-world forecasting needs.
The pipeline demonstrates the value of automating MLOps features. This can reduce manual human effort, enable reproducibility, and accelerate model deployment. SageMaker features such as SageMaker Pipelines, automatic model tuning, SageMaker Experiments, SageMaker Model Registry, and endpoints make this possible.
Our solution employs a compact TCN with a limited number of layers and only a few tuned hyperparameters, which is sufficient to demonstrate the model’s performance. For more complex use cases, consider using PyTorch or other PyTorch-based libraries to construct a more customized TCN that aligns with your specific needs. Additionally, it would be beneficial to explore other SageMaker features to enhance your pipeline’s functionality further. To fully automate the deployment process, you can use the AWS Cloud Development Kit (AWS CDK) or AWS CloudFormation.
For more information on time series forecasting on AWS, refer to the following:

Amazon Forecast
Deep demand forecasting with Amazon SageMaker
Hierarchical Forecasting using Amazon SageMaker
Build a cold start time series forecasting engine using AutoGluon

Feel free to leave a comment with any thoughts or questions!

About the Authors
Nick Biso is a Machine Learning Engineer at AWS Professional Services. He solves complex organizational and technical challenges using data science and engineering. In addition, he builds and deploys AI/ML models on the AWS Cloud. His passion extends to his proclivity for travel and diverse cultural experiences.
Alston Chan is a Software Development Engineer at Amazon Ads. He builds machine learning pipelines and recommendation systems for product recommendations on the Detail Page. Outside of work, he enjoys game development and rock climbing.
Maria Masood specializes in building data pipelines and data visualizations at AWS Commerce Platform. She has expertise in Machine Learning, covering natural language processing, computer vision, and time-series analysis. A sustainability enthusiast at heart, Maria enjoys gardening and playing with her dog during her downtime.

What is Model Merging?

Model merging refers to the process of combining multiple distinct models, each designed to perform separate tasks or solve different problems, into a single unified model without requiring additional training. Depending on the specific technique and goal, merging models can also be called ensemble learning, model blending, or model stacking. This technique aims to create a more versatile and comprehensive Machine Learning model capable of handling various tasks simultaneously. 

In the context of LLMs, model merging can involve combining LLMs with different initializations, architectures, or training on different tasks. The primary goal is to leverage the strengths of each individual model and create a multi-task LLM that can address a broader range of tasks. This approach can significantly improve performance and efficiency by allowing the combined model to benefit from the knowledge and capabilities of each constituent model.

Why merge ML models?

Combining Machine Learning models offers several benefits, such as reducing prediction variability and bias through averaging or voting among diverse models. Leveraging complex patterns and features from various data sources and models can enhance prediction accuracy and adaptability. Moreover, model merging can improve prediction diversity and reliability by reducing reliance on a single dataset or algorithm.

Model merging results in better performance, improved efficiency, and broader applicability, making it a valuable strategy for leveraging the strengths of different AI models without the need for extensive additional training.

Strategies for combining LLMs

One common approach is to combine models by averaging their weights or parameters. This can result in a fused model that benefits from the knowledge and expertise embedded in each original model. Model merging may also involve the integration of features from each model. This is particularly useful when the models have learned task-specific features that are valuable for the overall performance of the merged model.
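As a simple illustration of weight averaging, the following sketch merges the state dictionaries of PyTorch models that share an architecture; it is a toy example under that assumption rather than a production merging method.

import torch

def average_state_dicts(models, weights=None):
    # Element-wise weighted average of identically keyed and shaped state dicts.
    weights = weights or [1.0 / len(models)] * len(models)
    state_dicts = [m.state_dict() for m in models]
    return {
        key: sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
        for key in state_dicts[0]
    }

# Usage: load the averaged weights into a fresh instance of the same architecture.
# merged_model.load_state_dict(average_state_dicts([model_a, model_b]))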

Some model merging techniques allow for merging models up to a specified layer, creating a multi-head model. This approach can be beneficial when different models specialize in different aspects of a task.

Some Recent Research Papers on Model Merging

Fusing fine-tuned models for better pretraining

In this research, the authors acknowledge that pretrained models are widely used as a starting point for natural language processing tasks but can be expensive to create. They propose a novel approach of fusing multiple existing fine-tuned models into one, using an average of their weights. This fused model consistently outperforms pretrained models and is often superior to intertraining, where a base model is fine-tuned on another task. The fusion process is less dependent on the target task and remains effective even with weight decay, providing a more cost-effective and resource-efficient method for improving model initialization in NLP.

Resolving Interference When Merging Models

Transfer learning, which involves further fine-tuning pre-trained models for downstream tasks, offers improved performance, faster convergence, and sample efficiency. However, task-specific fine-tuned models often cannot collaborate effectively. Model merging methods have emerged to address this, but they frequently neglect interference between parameters from different models, causing performance drops. In response, the authors propose TIES-MERGING, which resolves interference issues by resetting parameters, resolving sign conflicts, and merging only compatible parameters. TIES-MERGING outperforms existing methods across diverse settings, emphasizing the importance of addressing interference in model merging for enhanced performance and versatility.

ZipIt! Merging Models from Different Tasks without Training 

This research addresses the challenge of merging distinct models with different initializations, each trained for a separate task, into a single multi-task model without additional training. While previous model merging methods work for models trained on the same task, they fall short when combining models trained for different tasks. The authors introduce “ZipIt,” a general merging method for arbitrary models with the same architecture to overcome this limitation. ZipIt incorporates two key strategies: first, it allows for merging features within each model to account for non-shared features, and second, it supports partial merging up to a specified layer, creating a multi-head model. These innovations result in a significant 20-60% improvement over previous methods, enabling the effective merging of models trained on disparate tasks.


References:

https://insight.factset.com/finding-the-perfect-blend-merging-large-language-models-with-classic-machine-learning-techniques

https://arxiv.org/pdf/2204.03044.pdf

https://arxiv.org/pdf/2306.01708.pdf

https://www.linkedin.com/advice/1/how-do-you-merge-machine-learning-models-skills-machine-learning

https://arxiv.org/pdf/2305.03053.pdf


Unveiling the Secrets of Multimodal Neurons: A Journey from Molyneux t …

Transformers could be one of the most important innovations in the artificial intelligence domain. These neural network architectures, introduced in 2017, have revolutionized how machines understand and generate human language. 

Unlike their predecessors, transformers rely on self-attention mechanisms to process input data in parallel, enabling them to capture hidden relationships and dependencies within sequences of information. This parallel processing capability not only accelerated training times but also opened the way for the development of models with significant levels of sophistication and performance, like the famous ChatGPT. 

Recent years have shown us how capable artificial neural networks have become across a variety of tasks, transforming language tasks, vision tasks, and more. But the real potential lies in crossmodal tasks, where they integrate various sensory modalities, such as vision and text. These models have been augmented with additional sensory inputs and have achieved impressive performance on tasks that require understanding and processing information from different sources.

In 1688, a philosopher named William Molyneux presented a fascinating riddle to John Locke that would continue to captivate the minds of scholars for centuries. The question he posed was simple yet profound: If a person blind from birth were suddenly to gain their sight, would they be able to recognize objects they had previously only known through touch and other non-visual senses? This intriguing inquiry, known as the Molyneux Problem, not only delves into the realms of philosophy but also holds significant implications for vision science.

In 2011, vision neuroscientists started a mission to answer this age-old question. They found that immediate visual recognition of previously touch-only objects is not feasible. However, the important revelation was that our brains are remarkably adaptable. Within days of sight-restoring surgery, individuals could rapidly learn to recognize objects visually, bridging the gap between different sensory modalities.

Does a similar phenomenon hold for multimodal neurons? Let’s look at the answer.

Multimodal neurons in transformer MLPs activate on specific features. Source: https://arxiv.org/pdf/2308.01544.pdf

We find ourselves in the middle of a comparable technological revolution: artificial neural networks trained primarily on language are being augmented with visual inputs and applied to crossmodal tasks.

One common approach in these vision-language models involves using an image-conditioned form of prefix-tuning. In this setup, a separate image encoder is aligned with a text decoder, often with the help of a learned adapter layer. While several methods have employed this strategy, they have usually relied on image encoders, such as CLIP, trained alongside language models. 

However, a recent study, LiMBeR, introduced a unique scenario that mirrors the Molyneux Problem in machines. They used a self-supervised image network, BEIT, which had never seen any linguistic data and connected it to a language model, GPT-J, using a linear projection layer trained on an image-to-text task. This intriguing setup raises fundamental questions: Does the translation of semantics between modalities occur within the projection layer, or does the alignment of vision and language representations happen inside the language model itself?

Top five multimodal neurons for a sample image from 6 COCO supercategories. Source: https://arxiv.org/pdf/2308.01544.pdf

The research presented by the authors at MIT seeks to answer this centuries-old mystery in its machine form and shed light on how these multimodal models work.

First, they found that image prompts transformed into the transformer’s embedding space do not encode interpretable semantics. Instead, the translation between modalities occurs within the transformer.

Second, multimodal neurons, capable of processing both image and text information with similar semantics, are discovered within the text-only transformer MLPs. These neurons play a crucial role in translating visual representations into language.

The final and perhaps the most important finding is that these multimodal neurons have a causal effect on the model’s output. Modulating these neurons can lead to the removal of specific concepts from image captions, highlighting their significance in the multimodal understanding of content.

This investigation into the inner workings of individual units within deep networks uncovers a wealth of information. Just as convolutional units in image classifiers can detect colors and patterns, and later units can recognize object categories, multimodal neurons are found to emerge in transformers. These neurons are selective for images and text with similar semantics.

Furthermore, multimodal neurons can emerge even when vision and language are learned separately. They can effectively convert visual representations into coherent text. This ability to align representations across modalities has wide-reaching implications, making language models powerful tools for various tasks that involve sequential modeling, from game strategy prediction to protein design.

Check out the Paper and Project.

Researchers from MIT and CUHK Propose LongLoRA (Long Low-Rank Adaptati …

The introduction of Large language models (LLMs) has brought a significant level of advancement in the field of Artificial Intelligence. Based on the concepts of Natural Language Processing (NLP), Natural Language Understanding (NLU), and Natural Language Generation (NLG), LLMs have taken over the world with their incredible capabilities. The well-known models, such as LLaMA and LLaMA2, have been very effective tools for understanding and producing natural language.

However, they come with fixed limits, such as a maximum context size of 2048 tokens for LLaMA and 4096 tokens for LLaMA2. Because of this restriction, they struggle with tasks that call for digesting lengthy documents or long queries. Training or fine-tuning LLMs on longer sequences is one method for extending the context window, but this presents computational difficulties and can be prohibitively expensive.

Low-rank adaptation (LoRA) is a straightforward candidate for extending the context window. LoRA alters the linear projection layers in self-attention blocks using low-rank matrices, which are computationally efficient and limit the number of trainable parameters. However, empirical studies show that training long-context models with plain low-rank adaptation is not very effective: with the standard self-attention mechanism, it yields high perplexity for extended contexts and loses effectiveness as the context size increases.

To overcome the limitations, a team of researchers has introduced LongLoRA, an efficient fine-tuning approach for extending the context sizes of pre-trained large language models without incurring excessive computational costs. LongLoRA has been developed for effectively increasing the context window of pretrained LLMs like LLaMA2. It accelerates the process of expanding the context of LLMs in two important ways.

First, LongLoRA makes effective context extension during fine-tuning possible by utilizing shift short attention (S2-Attn). While dense global attention is still required for LLMs to perform well during inference, the fine-tuning process can be carried out effectively and quickly by employing sparse local attention. Compared with fine-tuning using conventional attention techniques, S2-Attn enables context extension with significant computational savings. It is easy to integrate, requiring only about two lines of code during training, and it is optional at inference time.
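To convey the idea, the following is a conceptual sketch of the shift behind S2-Attn, not the authors’ implementation; the tensor shapes and the exact shift pattern are assumptions based on the paper’s description.

import torch

def shift_half_heads(x, group_size):
    # x: (batch, seq_len, num_heads, head_dim) attention inputs.
    # Half of the heads are shifted by half a group along the sequence so that,
    # when attention is computed within local groups of group_size tokens,
    # information can still flow between neighboring groups.
    half = x.shape[2] // 2
    x = x.clone()
    x[:, :, half:] = torch.roll(x[:, :, half:], shifts=-group_size // 2, dims=1)
    return x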

Second, LongLoRA reconsiders the fine-tuning procedure with an emphasis on parameter-efficient context expansion techniques. The team discovered that LoRA performs admirably for context extension, provided the model has trainable embedding and normalization layers. This realization is key to successfully extending the context without substantially increasing the computing burden.

With LLaMA2 models ranging in size from 7B/13B to 70B, LongLoRA has presented remarkable empirical results across a variety of tasks. On a single machine with 8 A100 GPUs, the method increases the context of these models from 4k tokens to 100k tokens for LLaMA2 7B or up to 32k tokens for LLaMA2 70B. It achieves this expanded context while maintaining the original model architectures, making it compatible with existing methods and tools such as FlashAttention-2.

A dataset called LongQA has also been developed for supervised fine-tuning to support the practical use of LongLoRA. It contains more than 3,000 question-answer pairs with long contexts. The availability of this dataset expands LongLoRA’s usefulness for researchers and practitioners looking to extend the capabilities of LLMs.

Check out the Paper and GitHub.

A generative AI-powered solution on Amazon SageMaker to help Amazon EU …

The Amazon EU Design and Construction (Amazon D&C) team is the engineering team designing and constructing Amazon Warehouses across Europe and the MENA region. The design and deployment processes of projects involve many types of Requests for Information (RFIs) about engineering requirements regarding Amazon and project-specific guidelines. These requests range from simple retrieval of baseline design values, to review of value engineering proposals, to analysis of reports and compliance checks. Today, these are addressed by a Central Technical Team, comprised of subject matter experts (SMEs) who can answer such highly technical specialized questions, and provide this service to all stakeholders and teams throughout the project lifecycle. The team is looking for a generative AI question answering solution to quickly get information and proceed with their engineering design. Notably, these use cases are not limited to the Amazon D&C team alone but are applicable to the broader scope of Global Engineering Services involved in project deployment. The entire range of stakeholders and teams engaged in the project lifecycle can benefit from a generative AI question-answering solution, as it will enable quick access to critical information, streamlining the engineering design and project management processes.
The existing generative AI solutions for question answering are mainly based on Retrieval Augmented Generation (RAG). RAG searches documents through large language model (LLM) embedding and vectoring, creates the context from search results through clustering, and uses the context in an augmented prompt to run inference on a foundation model and get the answer. This method is less effective for the highly technical documents from Amazon D&C, which contain significant unstructured data such as Excel sheets, tables, lists, figures, and images. In this case, the question answering task works better by fine-tuning the LLM with the documents. Fine-tuning adjusts and adapts the weights of the pre-trained LLM to improve the model quality and accuracy.
To address these challenges, we present a new framework with RAG and fine-tuned LLMs. The solution uses Amazon SageMaker JumpStart as the core service for the model fine-tuning and inference. In this post, we not only provide the solution, but also discuss the lessons learned and best practices when implementing the solution in real-world use cases. We compare and contrast how different methodologies and open-source LLMs performed in our use case and discuss how to find the trade-off between model performance and compute resource costs.
Solution overview
The solution has the following components, as shown in the architecture diagram:

Content repository – The D&C contents include a wide range of human-readable documents with various formats, such as PDF files, Excel sheets, wiki pages, and more. In this solution, we stored these contents in an Amazon Simple Storage Service (Amazon S3) bucket and used them as a knowledge base for information retrieval as well as inference. In the future, we will build integration adapters to access the contents directly from where they live.
RAG framework with a fine-tuned LLM – This consists of the following subcomponents:

RAG framework – This retrieves the relevant data from documents, augments the prompts by adding the retrieved data in context, and passes it to a fine-tuned LLM to generate outputs.
Fine-tuned LLM – We constructed the training dataset from the documents and contents and conducted fine-tuning on the foundation model. After the tuning, the model learned the knowledge from the D&C contents, and therefore can respond to the questions independently.
Prompt validation module – This measures the semantic match between the user’s prompt and the dataset used for fine-tuning. If the LLM has been fine-tuned to answer this question, then you can run inference on the fine-tuned model for a response. If not, you can use RAG to generate the response.
LangChain – We use LangChain to build a workflow to respond to the incoming questions.

End-user UI – This is the chatbot UI to capture users’ questions and queries, and present the answer from the RAG and LLM response.

In the next sections, we demonstrate how to create the RAG workflow and build the fine-tuned models.
RAG with foundation models by SageMaker JumpStart
RAG combines the powers of pre-trained dense retrieval and sequence-to-sequence (seq2seq) foundation models. For question answering from Amazon D&C documents, we need to prepare the following in advance:

Embedding and indexing the documents using an LLM embedding model – We split the multiple documents into small chunks based on the document chapter and section structure, tested with the GPT-J-6B embedding model on SageMaker JumpStart to generate the indexes, and stored the indexes in a FAISS vector store
A pre-trained foundation model to generate responses from prompts – We tested with Flan-T5 XL, Flan-T5 XXL, and Falcon-7B models on SageMaker JumpStart

The question answering process is implemented by LangChain, which is a framework for developing applications powered by language models. The workflow in the chain contains the following steps:

Get a question from the user.
Perform semantic search on the indexed documents through FAISS to get the top K most-relevant document chunks.
Define the prompt template, such as

"""Answer based on context:\n\n{context}\n\n{question}"""

Augment the retrieved document chunks as the {context} and the user question as the {question} in the prompt.
Prompt the foundation model with the constructed zero-shot prompt.
Return the model output to the user.
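A minimal sketch of this chain is shown below; embeddings and llm stand in for the GPT-J-6B embedding model and the Flan-T5 or Falcon endpoint wrappers, and the index name is a placeholder.

from langchain.vectorstores import FAISS
from langchain.prompts import PromptTemplate

index = FAISS.load_local("dc_documents_index", embeddings)  # embeddings: placeholder model wrapper

template = PromptTemplate(
    input_variables=["context", "question"],
    template="""Answer based on context:\n\n{context}\n\n{question}""",
)

def answer(question, k=3):
    chunks = index.similarity_search(question, k=k)       # top K most relevant chunks
    context = "\n\n".join(c.page_content for c in chunks)
    prompt = template.format(context=context, question=question)
    return llm(prompt)                                     # llm: placeholder foundation model wrapper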

We tested 125 questions on Amazon D&C requirements and specifications, and RAG returned a good response for some questions. In the following example, RAG with the Flan-T5-XXL model provided an accurate response from the right sections of the documents.

Question
What are the PLUMBING requirements for Drinking Fountains and Water Coolers? (Section 22 47 00)

Expected answer
The Developer shall provide water dispensing bars, floor standing at the Warehouse wall space immediately adjacent to the entrance of each toilet block serving the Warehouse and in the equivalent location adjacent to the Lockers Room toilet block. Units shall be in accordance with Amazon standard. Final numbers and placement to be agreed with Amazon prior to installation. Minimum of two units dispensing cooled and room temperature water to be provided at each location. Water Coolers or Water Bars – The Developer shall provide potable water supplies as required for remote water coolers or water bars throughout the facility. The Developer shall provide water points, drainage and any electrical services required in the Break Rooms for the Water Bars which will be provided by the Catering Contractor.

RAG answer (GPT-J-6B embedding with Flan-T5-XXL)
The Developer shall provide water dispensing bars, floor standing at the Warehouse wall space immediately adjacent to the entrance of each toilet block serving the Warehouse and in the equivalent location adjacent to the Lockers Room toilet block. Units shall be in accordance with Amazon standard. Final numbers and placement to be agreed with Amazon prior to installation. Minimum of two units dispensing cooled and room temperature water to be provided at each location. Water Coolers or Water Bars – The Developer shall provide potable water supplies as required for remote water coolers or water bars throughout the facility. The Developer shall provide water points, drainage and any electrical services required in the Break Rooms for the Water Bars which will be provided by the Catering Contractor.

However, we observed that RAG doesn’t provide clear and crisp answers to the questions that are more specific and highly technical, such as guidelines, numbers, and figures. As shown in the following example, when a technical dimension output is expected, RAG can’t provide the specific answer, mainly because the embedding and searching process can’t return a concise context.

Question
What is the Floor Heights requirement for P2 First Floor level Concrete?

Expected answer
7,50 m AFFL

RAG answer (GPT-J-6B embedding with Flan-T5-XXL)
P2 First Floor level Concrete

This needs to be improved because hallucination is not tolerated due to the criticality of the consequences in this use case.
Fine-tune LLMs on SageMaker
To address this challenge and improve the response quality, we take a new approach to fine-tune the LLM model using the documents for a question answering task. The model will be trained to learn the corresponding knowledge from the documents directly. Unlike RAG, it’s not dependent on whether the documents are properly embedded and indexed, and whether the semantic search algorithm is effective enough to return the most relevant contents from the vector database.
To prepare the training dataset for fine-tuning, we extract the information from the D&C documents and construct the data in the following format:

Instruction – Describes the task and provides partial prompt
Input – Provides further context to be consolidated into the prompt
Response – The output of the model

During the training process, we add an instruction key, input key, and response key to each part, combine them into the training prompt, and tokenize it. Then the data is fed to a trainer in SageMaker to generate the fine-tuned model.
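A minimal sketch of this assembly step is shown below; the exact prompt template and tokenizer settings used for the D&C dataset are assumptions.

# Hypothetical prompt template combining the three keys into one training prompt.
PROMPT_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{response}"
)

def build_training_example(record, tokenizer, max_length=1024):
    prompt = PROMPT_TEMPLATE.format(**record)      # record has instruction/input/response keys
    return tokenizer(prompt, truncation=True, max_length=max_length)

# Usage with a Hugging Face tokenizer:
# tokenized = [build_training_example(r, tokenizer) for r in training_records]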
To accelerate the training process and reduce the cost of compute resources, we employed Parameter Efficient Fine-Tuning (PEFT) with the Low-Rank Adaptation (LoRA) technique. PEFT allows us to only fine-tune a small number of extra model parameters, and LoRA represents the weight updates with two smaller matrices through low-rank decomposition. With PEFT and LoRA on 8-bit quantization (a compression operation that further reduces the memory footprint of the model and accelerates the training and inference performance), we are able to fit the training of 125 question-answer pairs within a g4dn.x instance with a single GPU.
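For illustration, the following is a minimal sketch of an 8-bit LoRA setup with the Hugging Face peft library; the model ID, target module names, and LoRA hyperparameters are assumptions, and depending on your peft version the preparation helper may be named prepare_model_for_int8_training.

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "tiiuae/falcon-7b"                      # one of the tested model families
model = AutoModelForCausalLM.from_pretrained(
    model_id, load_in_8bit=True, device_map="auto", trust_remote_code=True
)
model = prepare_model_for_kbit_training(model)     # stabilizes training on the 8-bit model

lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["query_key_value"],            # module names depend on the architecture
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()                 # only a few percent of the full weights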
To prove the effectiveness of the fine-tuning, we tested with multiple LLMs on SageMaker. We selected five small-size models: Bloom-7B, Flan-T5-XL, GPT-J-6B, and Falcon-7B on SageMaker JumpStart, and Dolly-3B from Hugging Face on SageMaker.
Through 8-bit LoRA-based training, we are able to reduce the trainable parameters to no more than 5% of the full weights of each model. The training takes 10–20 epochs to converge, as shown in the following figure. For each model, the fine-tuning processes can fit on a single GPU of a g4dn.x instance, which optimized the costs of compute resources.

Inference the fine-tuned model deployed on SageMaker
We deployed the fine-tuned model along with the RAG framework in a single GPU g4dn.x node on SageMaker and compared the inference results for the 125 questions. The model performance is measured by two metrics. One is the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) score, a popular natural language processing (NLP) evaluation metric that computes the ratio of matching words to the total number of words in the reference sentence. The other is the semantic (textual) similarity score, which measures how close the meanings of two pieces of text are by using a transformer model to encode the sentences into embeddings and then computing their cosine similarity. From the experiments, we can see these two metrics are fairly consistent in reflecting the quality of answers to the questions.
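The following minimal sketch shows how these two metrics can be computed; the ROUGE variant and the sentence encoder shown here are assumptions for illustration.

from rouge_score import rouge_scorer
from sentence_transformers import SentenceTransformer, util

scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)
encoder = SentenceTransformer("all-MiniLM-L6-v2")   # any sentence encoder works

def score_answer(reference, prediction):
    rouge_recall = scorer.score(reference, prediction)["rouge1"].recall
    embeddings = encoder.encode([reference, prediction], convert_to_tensor=True)
    similarity = util.cos_sim(embeddings[0], embeddings[1]).item()
    return rouge_recall, similarity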
In the following table and figure, we can see that the fine-tuned Falcon-7B model provides the best quality of answering, and the Flan-T5-XL and Dolly-3B models also generate accurate responses.

RAG_GPT-J-6B_Flan-T5-XL: ROUGE 0.1938, Semantic Similarity 0.4121
RAG_GPT-J-6B_Flan-T5-XXL: ROUGE 0.2747, Semantic Similarity 0.4786
Fine-tuned Bloom-7B1: ROUGE 0.3914, Semantic Similarity 0.5666
Fine-tuned Flan-T5-XL: ROUGE 0.4977, Semantic Similarity 0.7205
Fine-tuned GPT-J-6B: ROUGE 0.3740, Semantic Similarity 0.5414
Fine-tuned Dolly-3B: ROUGE 0.6991, Semantic Similarity 0.8147
Fine-tuned Falcon-7B: ROUGE 0.9459, Semantic Similarity 0.9774

The following is one example answer from the models. RAG with the Flan-T5-XL model provided a highly hallucinated response, whereas the response from RAG with the larger Flan-T5-XXL model was more relevant but still not able to cover the full expected message. Comparatively, all the fine-tuned models (Flan-T5-XL, Dolly-3B, and Falcon-7B) were able to provide the most accurate response as the expected answer.

Question
What are the requirements for Post Grinding Sealer/Hardener? (TS0004 – section 2.8)

Expected answer
A Lithium based sealer/hardener will be used post any grinding/sanding procedures undertaken on the AR Floor. Please refer to TGC /0007 AR Floor SCA Procedures. Alternatives will be considered but must be approved by Amazon and the Flooring Consultant.

Answer from RAG (GPT-J-6B embedding with Flan-T5-XL)
The Penetrating Concrete Hardener must be discussed at the pre-pour meeting and must be approved by the Flooring Consultant and Amazon

Answer from RAG (GPT-J-6B embedding with Flan-T5-XXL)
The proposed compound must be discussed at the pre – pour meeting and must be approved by the Flooring Consultant and Amazon. Alternatives will be considered but must be approved by Amazon and the Flooring Consultant.

Answer from fine-tuned Flan-T5-XL
A Lithium based sealer/hardener will be used post any grinding/sanding procedures undertaken on the AR Floor. Please refer to TGC /0007 AR Floor SCA Procedures. Alternatives will be considered but must be approved by Amazon and the Flooring Consultant.

Answer from fine-tuned Dolly-3B
A Lithium based sealer/hardener will be used post any grinding/sanding procedures undertaken on the AR Floor. Please refer to TGC /0007 AR Floor SCA Procedures. Alternatives will be considered but must be approved by Amazon and the Flooring Consultant.

Answer from fine-tuned Falcon-7B
A Lithium based sealer/hardener will be used post any grinding/sanding procedures undertaken on the AR Floor. Please refer to TGC /0007 AR Floor SCA Procedures. Alternatives will be considered but must be approved by Amazon and the Flooring Consultant.

Solution prototype and outcome
We developed a prototype based on the presented architecture and conducted a proof of concept to demonstrate the outcome. To take advantage of both the RAG framework and the fine-tuned LLM, and also to reduce hallucination, we first semantically validate the incoming question. If the question is among the training data for the fine-tuning (meaning the fine-tuned model already has the knowledge to provide a high-quality answer), then we direct the question to the fine-tuned model for inference. Otherwise, the question goes through LangChain and gets the response from RAG. The following diagram illustrates this workflow.
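A minimal sketch of this routing logic is shown below; embed is a placeholder for the embedding model, training_questions stands for the 125 fine-tuning questions, and the similarity threshold is an assumption to be tuned on your data.

import numpy as np

train_embeddings = np.array([embed(q) for q in training_questions])  # embed: placeholder

def route(question, threshold=0.85):
    q = np.array(embed(question))
    sims = train_embeddings @ q / (
        np.linalg.norm(train_embeddings, axis=1) * np.linalg.norm(q) + 1e-9
    )
    if sims.max() >= threshold:
        return "fine-tuned"   # the fine-tuned LLM already covers this question
    return "rag"              # otherwise fall back to the RAG chain via LangChain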

We tested the architecture with a test dataset of 166 questions, which contains the 125 questions used to fine-tune the model and an additional 41 questions that the fine-tuned model wasn’t trained with. The RAG framework with the embedding model and fine-tuned Falcon-7B model provided high-quality results with a ROUGE score of 0.7898 and a semantic similarity score of 0.8781. As shown in the following examples, the framework is able to generate responses to users’ questions that are well matched with the D&C documents.
The following image is our first example document.

The following screenshot shows the bot output.

The bot is also able to respond with data from a table or list and display figures for the corresponding questions. For example, we use the following document.

The following screenshot shows the bot output.

We can also use a document with a figure, as in the following example.

The following screenshot shows the bot output with text and the figure.

The following screenshot shows the bot output with just the figure.

Lessons learned and best practices
Through the solution design and experiments with multiple LLMs, we learned how to ensure the quality and performance for the question answering task in a generative AI solution. We recommend the following best practices when you apply the solution to your question answering use cases:

RAG provides reasonable responses to engineering questions. The performance is heavily dependent on document embedding and indexing. For highly unstructured documents, you may need some manual work to properly split and augment the documents before LLM embedding and indexing.
The index search is important to determine the RAG final output. You should properly tune the search algorithm to achieve a good level of accuracy and ensure RAG generates more relevant responses.
Fine-tuned LLMs are able to learn additional knowledge from highly technical and unstructured documents, and possess the knowledge within the model with no dependency on the documents after training. This is especially useful for use cases where hallucination is not tolerated.
To ensure the quality of model response, the training dataset format for fine-tuning should utilize a properly defined, task-specific prompt template. The inference pipeline should follow the same template in order to generate human-like responses.
LLMs are large and demand considerable compute resources, which can quickly drive up costs. You can use PEFT, LoRA, and quantization techniques to reduce the demand for compute power and avoid high training and inference costs.
SageMaker JumpStart provides easy-to-access pre-trained LLMs for fine-tuning, inference, and deployment. It can significantly accelerate your generative AI solution design and implementation.

Conclusion
With the RAG framework and fine-tuned LLMs on SageMaker, we are able to provide human-like responses to users’ questions and prompts, thereby enabling users to efficiently retrieve accurate information from a large volume of highly unstructured and unorganized documents. We will continue to develop the solution, such as providing a higher level of contextual response from previous interactions, and further fine-tuning the models from human feedback.
Your feedback is always welcome; please leave your thoughts and questions in the comments section.

About the authors
Yunfei Bai is a Senior Solutions Architect at AWS. With a background in AI/ML, data science, and analytics, Yunfei helps customers adopt AWS services to deliver business results. He designs AI/ML and data analytics solutions that overcome complex technical challenges and drive strategic objectives. Yunfei has a PhD in Electronic and Electrical Engineering. Outside of work, Yunfei enjoys reading and music.
Burak Gozluklu is a Principal ML Specialist Solutions Architect located in Boston, MA. Burak has over 15 years of industry experience in simulation modeling, data science, and ML technology. He helps global customers adopt AWS technologies and specifically AI/ML solutions to achieve their business objectives. Burak has a PhD in Aerospace Engineering from METU, an MS in Systems Engineering, and a post-doc in system dynamics from MIT in Cambridge, MA. Burak is passionate about yoga and meditation.
Elad Dwek is a Construction Technology Manager at Amazon. With a background in construction and project management, Elad helps teams adopt new technologies and data-based processes to deliver construction projects. He identifies needs and solutions, and facilitates the development of the bespoke attributes. Elad has an MBA and a BSc in Structural Engineering. Outside of work, Elad enjoys yoga, woodworking, and traveling with his family.

MDaudit uses AI to improve revenue outcomes for healthcare customers

MDaudit provides a cloud-based billing compliance and revenue integrity software as a service (SaaS) platform to more than 70,000 healthcare providers and 1,500 healthcare facilities, ensuring healthcare customers maintain regulatory compliance and retain revenue. Working with the top 60+ US healthcare networks, MDaudit needs to be able to scale its artificial intelligence (AI) capabilities to improve end-user productivity to meet growing demand and adapt to the changing healthcare landscape. MDaudit recognized that in order to meet its healthcare customers’ unique business challenges, it would benefit from automating its external auditing workflow (EAW) using AI to reduce dependencies on legacy IT frameworks and reduce manual activities needed to manage external payer audits. The end goal was to empower its customers to quickly respond to a large volume of external audit requests and improve revenue outcomes with AI-driven automation. MDaudit also recognized the opportunity to evolve its existing architecture into a solution that could scale with the growing demand for its EAW module.
In this post, we discuss MDaudit’s solution to this challenge, the benefits for their customers, and the architecture involved.
Solution overview
MDaudit built an intelligent document processing (IDP) solution, SmartScan.ai. The solution automates the extraction and formatting of data elements from unstructured PDFs that are part of the Additional Documentation Requests (ADR) service for Payment Review that customers of MDaudit receive from commercial and federal payers across the country.
Designed with client-level isolation at the document level, MDaudit customers start by uploading their ADR documents via a web portal to Amazon Simple Storage Service (Amazon S3).

This prompts an AWS Lambda function to initiate Amazon Textract. Using Amazon Textract for optical character recognition (OCR) to convert text images into machine-readable text, MDaudit’s SmartScan.ai can process scanned PDFs without manual review. The solution also uses Amazon Comprehend, which uses natural language processing (NLP) to identify and extract key entities from the ADR documents, such as name, date of birth, and date of service. The OCR output from Amazon Textract and the output from Amazon Comprehend are then compared against preexisting configurations of data objects stored in Amazon DynamoDB. If the format isn’t recognized, the solution conducts a generalized search to extract relevant data points from the uploaded PDFs. The new configuration is then sent for human-in-the-loop review using Amazon Augmented AI (Amazon A2I). After the configuration has been approved, it’s stored and made available for future scans, so documents in the same format can be processed automatically going forward. By using Amazon CloudWatch in the solution, MDaudit monitors metrics, events, and logs throughout the end-to-end solution.
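
The following is a minimal sketch, in Python with boto3, of the kind of Lambda handler that could sit behind this flow. It is not MDaudit’s actual code; the bucket handling, the text truncation, and the use of the synchronous Textract call (suitable only for single-page documents) are simplifying assumptions.

```python
# Minimal sketch (not MDaudit's actual code) of a Lambda handler for this flow:
# an S3 upload triggers OCR with Amazon Textract and entity extraction with
# Amazon Comprehend. The synchronous Textract call shown here handles single-page
# documents; multi-page PDFs would use start_document_text_detection instead.
import boto3

textract = boto3.client("textract")
comprehend = boto3.client("comprehend")

def handler(event, context):
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    # OCR: convert the uploaded document image into machine-readable text.
    ocr = textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    text = "\n".join(
        block["Text"] for block in ocr["Blocks"] if block["BlockType"] == "LINE"
    )

    # NLP: pull out key entities such as names and dates from the extracted text.
    entities = comprehend.detect_entities(Text=text[:5000], LanguageCode="en")

    return {
        "document": f"s3://{bucket}/{key}",
        "entities": [
            {"Type": e["Type"], "Text": e["Text"], "Score": e["Score"]}
            for e in entities["Entities"]
        ],
    }
```
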
Benefits
In the post-pandemic era, the healthcare sector is still grappling with financial hardship: thin margins driven by staffing shortages, reduced patient volumes, and rising inflation. At the same time, payers’ post-payment recovery audits have skyrocketed by more than 900%, and revenue cycle management (RCM) workforce reductions of 50-70% have left providers in a precarious position to defend against the overwhelming impact of these audits. The external audit workflow offered by MDaudit streamlines the management of and response to external audits through automated workflows, successfully safeguarding millions of dollars in revenue. With the integration of AI-driven capabilities using AWS AI/ML services, their SmartScan.ai solution introduces further time savings and enhanced data accuracy by automatically extracting pertinent patient information from lengthy audit letters, which can range from tens to hundreds of pages. As a result, customers can now manage a much higher volume of demand letters from payers, increasing their productivity by an estimated tenfold. These advancements lead to improved efficiency, significant cost savings, faster responses to external audits, and timely retention of revenue.
Initial adoption statistics indicate that the average processing time for an ADR letter is approximately 40 seconds, with accuracy rates approaching 90%. Within the first couple of months of launching SmartScan.ai, MDaudit’s customers have successfully responded to audit requests and safeguarded approximately $3 million in revenue.
“Our approach to innovation centers on collaboration with our ecosystem partners, and AWS has proven to be a valuable strategic ally in our healthcare transformation mission,” says Nisheet Goenka, VP of Engineering at MDaudit. “Our close cooperation with AWS and our extended account team not only expedited the development process but also spared us four months of dedicated engineering efforts. This has resulted in the creation of a solution that provides us with meaningful data to support our healthcare customers.”
Summary
This post discussed the unique business challenges faced by customers in the healthcare industry. We also reviewed how MDaudit is solving those challenges, the architecture MDaudit used, and how AI and machine learning played a part in their solution. To start exploring ML and AI today, refer to Machine Learning on AWS, and see where it can help you in your next solution.

About the Authors

Jake Bernstein is a Solutions Architect at Amazon Web Services with a passion for modernization and serverless-first architecture, and a focus on helping customers optimize their architecture and accelerate their cloud journey.

Guy Loewy is a Senior Solutions Architect at Amazon Web Services with a focus on serverless and event-driven architecture.

Justin Leto is a Senior Solutions Architect at Amazon Web Services with a focus on machine learning and analytics.

Revolutionizing Panoptic Segmentation with FC-CLIP: A Unified Single-Stage Artificial Intelligence AI Framework

Image segmentation is a fundamental computer vision task where an image is divided into meaningful parts or regions. It’s like dividing a picture into different pieces so a computer can identify and understand distinct objects or areas within the image. This process is crucial for various applications, from medical image analysis to autonomous vehicles, as it enables computers to interpret and interact with the visual world much like humans do.

Segmentation is broadly divided into two subtasks: semantic segmentation and instance segmentation. Semantic segmentation labels each pixel in an image with the type of object it belongs to, while instance segmentation separates individual objects of the same type, even if they’re close together.

Then, there is the king of segmentation: panoptic segmentation. It combines the challenges of both semantic segmentation and instance segmentation, aiming to predict non-overlapping masks, each paired with its corresponding class label.
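
To make that output format concrete, here is a small sketch using the common per-pixel segment-id convention; the image, segment ids, and class names are made up for illustration.

```python
# Minimal sketch of the panoptic output format: every pixel belongs to exactly
# one segment, and every segment carries a class label. Values are illustrative.
import numpy as np

# A tiny 4x6 "image": each entry is a segment id (non-overlapping by construction).
panoptic_map = np.array([
    [1, 1, 2, 2, 0, 0],
    [1, 1, 2, 2, 0, 0],
    [3, 3, 3, 3, 0, 0],
    [3, 3, 3, 3, 0, 0],
])

# Each segment id maps to a class; "thing" classes (car, person) have countable
# instances, while "stuff" classes (sky, road) do not.
segments = {
    0: {"class": "sky", "is_thing": False},
    1: {"class": "car", "is_thing": True},
    2: {"class": "car", "is_thing": True},   # a second, distinct car instance
    3: {"class": "road", "is_thing": False},
}

for seg_id, info in segments.items():
    area = int((panoptic_map == seg_id).sum())
    print(seg_id, info["class"], "thing" if info["is_thing"] else "stuff", area)
```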

Over the years, researchers have made significant strides in improving the performance of panoptic segmentation models, with a primary focus on panoptic quality (PQ). However, a fundamental challenge has limited the application of these models in real-world scenarios: the restriction on the number of semantic classes due to the high cost of annotating fine-grained datasets.

This is a significant problem, as you can imagine. It is extremely time-consuming to go over thousands of images and mark every single object inside them. What if we could somehow automate this process? What if we could have a unified approach for this? Time to meet FC-CLIP.

FC-CLIP is a unified single-stage framework that addresses the aforementioned limitation. It holds the potential to revolutionize panoptic segmentation and extend its applicability to open-vocabulary scenarios.

To overcome the challenges of closed-vocabulary segmentation, the computer vision community has explored the realm of open-vocabulary segmentation. In this paradigm, text embeddings of category names represented in natural language are used as label embeddings. This approach enables models to classify objects from a wider vocabulary, significantly enhancing their ability to handle a broader range of categories. Pretrained text encoders are often employed to ensure that meaningful embeddings are provided, allowing models to capture the semantic nuances of words and phrases crucial for open-vocabulary segmentation.
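
The following minimal sketch shows this core mechanism: CLIP text embeddings of category names act as classifier weights, so the label set can be swapped at inference time. The checkpoint, the prompt wording, and the image crop path are illustrative assumptions, not part of any specific method discussed here.

```python
# Minimal sketch: CLIP text embeddings of category names serve as classifier
# weights, so the label set can be changed freely at inference time.
# The checkpoint, prompts, and image path are illustrative assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Arbitrary category names, phrased as natural-language prompts.
categories = ["a photo of a dog", "a photo of a traffic light", "a photo of grass"]
image = Image.open("region_crop.jpg")  # e.g., a crop corresponding to a mask proposal

inputs = processor(text=categories, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Cosine similarity between the image embedding and each text embedding serves
# as the classification logits over the open vocabulary.
probs = outputs.logits_per_image.softmax(dim=-1)
for name, p in zip(categories, probs[0].tolist()):
    print(f"{name}: {p:.3f}")
```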

Both ViT-based and CNN-based CLIP produce semantically meaningful features. Source: https://arxiv.org/pdf/2308.02487.pdf

Multi-modal models, such as CLIP and ALIGN, have shown great promise in open-vocabulary segmentation. These models leverage their ability to learn aligned image-text feature representations from vast amounts of internet data. Recent methods like SimBaseline and OVSeg have adapted CLIP for open-vocabulary segmentation, utilizing a two-stage framework.

While these two-stage approaches have shown considerable success, they inherently suffer from inefficiency and ineffectiveness. The need for separate backbones for mask generation and CLIP classification increases the model size and computational costs. Additionally, these methods often perform mask segmentation and CLIP classification at different input scales, leading to suboptimal results.

This raises a critical question: Can we unify the mask generator and CLIP classifier into a single-stage framework for open-vocabulary segmentation? Such a unified approach could potentially streamline the process, making it more efficient and effective.

Overview of FC-CLIP. Source: https://arxiv.org/pdf/2308.02487.pdf

The answer to this question lies in FC-CLIP. This pioneering single-stage framework seamlessly integrates mask generation and CLIP classification on top of a shared Frozen Convolutional CLIP backbone. FC-CLIP’s design builds upon some smart observations:

1. Pre-trained Alignment: The frozen CLIP backbone ensures that the pre-trained image-text feature alignment remains intact, allowing for out-of-vocabulary classification.

2. Strong Mask Generator: The CLIP backbone can serve as a robust mask generator with the addition of a lightweight pixel decoder and mask decoder.

3. Generalization with Resolution: Convolutional CLIP exhibits better generalization abilities as the input size scales up, making it an ideal choice for dense prediction tasks.

The adoption of a single frozen convolutional CLIP backbone results in an elegantly simple yet highly effective design. FC-CLIP is not only simpler in design but also boasts a substantially lower computational cost. Compared to previous state-of-the-art models, FC-CLIP requires significantly fewer parameters and shorter training times, making it highly practical.
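
To make the single-stage idea concrete, here is a hypothetical PyTorch skeleton (not the authors’ implementation) in which one frozen convolutional CLIP backbone feeds both a lightweight pixel/mask decoder and an open-vocabulary classifier operating in CLIP’s embedding space. Module names, shapes, and the decoder design are simplified assumptions.

```python
# Hypothetical PyTorch skeleton (not the authors' code) of the single-stage idea:
# a shared, frozen convolutional CLIP backbone drives both mask generation and
# open-vocabulary classification against CLIP text embeddings.
import torch
import torch.nn as nn

class SingleStageOpenVocabSegmenter(nn.Module):
    def __init__(self, clip_visual_backbone, embed_dim=640, num_queries=100):
        super().__init__()
        self.backbone = clip_visual_backbone        # frozen convolutional CLIP encoder
        for p in self.backbone.parameters():
            p.requires_grad_(False)                 # keep image-text alignment intact

        self.pixel_decoder = nn.Conv2d(embed_dim, 256, kernel_size=1)  # lightweight decoder
        self.queries = nn.Embedding(num_queries, 256)                  # learnable mask queries
        self.mask_head = nn.Linear(256, 256)
        self.class_proj = nn.Linear(256, embed_dim)  # project queries into CLIP space

    def forward(self, images, text_embeddings):
        # Dense features from the frozen CLIP backbone: (B, embed_dim, H, W)
        feats = self.backbone(images)
        pixel_feats = self.pixel_decoder(feats)      # (B, 256, H, W)

        q = self.queries.weight.unsqueeze(0).expand(images.size(0), -1, -1)  # (B, Q, 256)
        mask_embed = self.mask_head(q)                                        # (B, Q, 256)
        masks = torch.einsum("bqc,bchw->bqhw", mask_embed, pixel_feats)       # mask logits

        # Classify each query against text embeddings of arbitrary category names.
        q_clip = nn.functional.normalize(self.class_proj(q), dim=-1)          # (B, Q, D)
        t = nn.functional.normalize(text_embeddings, dim=-1)                  # (C, D)
        class_logits = q_clip @ t.t()                                         # (B, Q, C)
        return masks, class_logits
```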

Check out the Paper and GitHub. All credit for this research goes to the researchers on this project.


Meet OpenCopilot: Create Custom AI Copilots for Your Own SaaS Product (like Shopify Sidekick)

An AI Copilot is an artificial intelligence system that assists developers, programmers, or other professionals in tasks related to software development, coding, or content creation. AI Copilots can help programmers by providing code suggestions, identifying errors, and offering code snippets that align with the developer’s coding style. AI Copilots can work within integrated development environments (IDEs), assist in collaborative coding projects, and help with content generation using LLMs.

AI Copilots can learn from the developer’s coding patterns and adapt to their preferences over time, which enhances the assistance they provide. Well-known examples include GitHub Copilot and OpenAI’s GPT-3. AI Copilots leverage a combination of techniques, including natural language processing (NLP), machine learning, and code analysis, and they are often updated regularly to incorporate new programming languages, frameworks, and best practices, ensuring they remain valuable to developers as technology evolves.

Now, a team of researchers has designed OpenCopilot. It is the user’s own AI copilot, trained specifically for their product and their requirements. Unlike generic AI models, OpenCopilot’s primary function is to integrate deeply with a product’s underlying APIs and execute API calls whenever required. It uses LLMs to determine whether the user’s request requires calling an API endpoint. It stands as a tool that can significantly improve efficiency and reduce the manual work involved in interfacing with APIs.

OpenCopilot can call your underlying APIs and transform the responses into meaningful text. It can also automatically produce certain request payload fields based on the context. Users need to provide their API/backend definitions as well as their public endpoints so OpenCopilot can call them. Users can also embed OpenCopilot’s chat bubble into their SaaS applications. OpenCopilot validates the provided schema to produce optimized results.
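
As a rough illustration of this flow (not OpenCopilot’s actual implementation, which is embedded via JS), the sketch below shows the core loop: give an LLM a summary of the available endpoints, let it decide whether an API call is needed and with what payload, make the call, and turn the response back into text. The llm() helper and the endpoint definition are hypothetical placeholders.

```python
# Simplified sketch (not OpenCopilot's implementation) of the core loop: an LLM
# decides from an endpoint catalog whether the user's message needs an API call,
# proposes a payload, and then summarizes the API response as text.
# The llm() helper and the endpoint definition are hypothetical placeholders.
import json
import requests

ENDPOINTS = {
    "create_discount": {
        "method": "POST",
        "url": "https://api.example-shop.com/discounts",  # placeholder URL
        "description": "Create a discount code. Fields: code (str), percent (int).",
    },
}

def llm(prompt: str) -> str:
    """Placeholder for a chat-completion call to whichever LLM backs the copilot."""
    raise NotImplementedError

def handle_message(user_message: str) -> str:
    catalog = "\n".join(f"- {name}: {ep['description']}" for name, ep in ENDPOINTS.items())
    plan = llm(
        "You can call these endpoints:\n" + catalog +
        f"\nUser message: {user_message}\n"
        'Reply with JSON: {"endpoint": <name or null>, "payload": <object>}'
    )
    decision = json.loads(plan)

    if decision["endpoint"] is None:
        # No API call needed; answer conversationally.
        return llm(f"Answer the user directly: {user_message}")

    ep = ENDPOINTS[decision["endpoint"]]
    resp = requests.request(ep["method"], ep["url"], json=decision["payload"], timeout=10)
    # Transform the raw API response into a human-readable reply.
    return llm(f"Summarize this API response for the user: {resp.text}")
```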

However, the limitations of this product as of now are that it cannot call multiple endpoints simultaneously and is not designed for large or complex APIs. It doesn’t retain chat history and treats each message as a standalone interaction. 

Users can create unlimited copilots and embed them into their SaaS product using standard JS calls. They need to provide Swagger definitions for their APIs, which are run through the built-in validator and recommender. Users can also add chat memory and vector DB support for large Swagger files.

Their future work will include making the platform more versatile by introducing a plugin system catering to various authentication methods. They also plan on incorporating offline LLMs as they can process sensitive or confidential information without the need to transmit data over the internet. This will reduce the risk of data breaches and unauthorized access. They are also working on expanding OpenCopilot’s data ingestion capabilities with plans to support a range of formats, from texts and PDFs to websites and other data sources. 

Check out the GitHub and Documentation. All credit for this research goes to the researchers on this project.
