Meet ToolQA: A New Dataset that Evaluates the Ability of Large Language Models (LLMs) to Use External Tools for Question Answering

Large Language Models (LLMs) have proven highly effective in Natural Language Processing (NLP) and Natural Language Understanding (NLU). Well-known LLMs such as GPT, BERT, and PaLM are being used by researchers across domains ranging from education and social media to finance and healthcare. Trained on massive datasets, these models capture a vast amount of knowledge and have demonstrated strong performance in question answering, content generation, text summarization, language translation, and more. Despite these impressive capabilities, LLMs still tend to produce plausible but ungrounded information (hallucinations) and remain weak at numerical reasoning.

Recent research has shown that augmenting LLMs with external tools, such as retrieval modules, math tools, and code interpreters, is a promising way to overcome these challenges. Evaluating the effectiveness of such tool use is difficult, however, because current evaluation methodologies struggle to determine whether a model is merely recalling pre-trained knowledge or genuinely using external tools to solve a problem. To address this, a team of researchers from the College of Computing at the Georgia Institute of Technology in Atlanta, GA, has introduced ToolQA, a question-answering benchmark that assesses how well LLMs use external resources.

ToolQA consists of data from eight domains and defines 13 types of tools that can acquire information from external reference corpora. Each ToolQA instance includes a question, an answer, reference corpora, and a list of available tools. What makes ToolQA unique is that all of its questions can be answered only by using the appropriate tools to extract information from the reference corpora, which minimizes the chance that an LLM answers from internal knowledge alone and allows a faithful evaluation of its tool-use abilities.

ToolQA is built in three automated phases: Reference Data Collection, Human-Guided Question Generation, and Programmatic Answer Generation. In the first phase, various public corpora, including text, tables, and graphs, are gathered from different domains and serve as the reference corpora for tool-based question answering. In the second phase, questions are created that cannot be answered from an LLM's internal knowledge and instead require tools to extract information from the reference corpora. This is accomplished with a template-based method that combines human-guided template production and validation with question instantiation using tool attributes. In the third phase, operators corresponding to the tools are implemented, and accurate answers for the generated questions are obtained programmatically from the reference corpora.
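
To make the pipeline concrete, here is a minimal, hypothetical sketch of the template-based idea: a human-validated question template is instantiated with attribute values drawn from a reference table, and the ground-truth answer is computed programmatically from that same table rather than taken from an LLM. The flights.csv file, its columns, and the tool names listed are illustrative assumptions, not ToolQA's actual schema.

import pandas as pd

# Hypothetical reference corpus: a tabular flight record (not ToolQA's actual data).
flights = pd.read_csv("flights.csv")  # columns assumed: date, origin, dest, delay_minutes

# Human-written template, validated once and then instantiated many times.
TEMPLATE = "What was the average departure delay of flights from {origin} on {date}?"

def generate_qa(origin: str, date: str) -> dict:
    """Instantiate the template and compute the ground-truth answer programmatically."""
    subset = flights[(flights["origin"] == origin) & (flights["date"] == date)]
    answer = round(subset["delay_minutes"].mean(), 2)
    return {
        "question": TEMPLATE.format(origin=origin, date=date),
        "answer": answer,
        "reference_corpus": "flights.csv",
        "tools": ["LoadDB", "FilterDB", "Calculate"],  # tool names are illustrative
    }

print(generate_qa("ATL", "2023-01-15"))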

The team evaluated both standard LLMs and tool-augmented LLMs on ToolQA. LLMs that rely only on internal knowledge, such as ChatGPT and Chain-of-Thought prompting, achieved low success rates: about 5% on easy questions and 2% on hard ones. Tool-augmented LLMs such as Chameleon and ReAct performed better by using external tools, with the best tool-augmented results reaching 43.15% on easy questions and 8.2% on hard questions.

The results and error analysis show that ToolQA is a challenging benchmark for current tool-augmented LLM approaches, especially on hard questions that require more complex compositional reasoning over tools, making it a promising addition to ongoing developments in AI.

Check out the Paper and GitHub repo for more details.

Meet FastSAM: The Breakthrough Real-Time Solution Achieving High-Performance Segmentation with Minimal Computational Load

The Segment Anything Model (SAM) is a recent proposal in the field: a vision foundation model that has been hailed as a breakthrough. It can use a variety of user interaction prompts to accurately segment any object in an image. Built on a Transformer model trained extensively on the SA-1B dataset, SAM handles a wide range of scenes and objects; in other words, segmenting anything is now possible thanks to SAM. Because of this generalizability, the task has the potential to serve as a foundation for many future vision challenges.

Despite these improvements and the promising results of SAM and subsequent models on the segment anything task, practical deployment remains difficult. The primary challenge is the high computational cost of SAM's Transformer (ViT) architecture compared with convolutional counterparts. Growing demand from commercial applications inspired a team of researchers from China to create a real-time answer to the segment anything problem, which they call FastSAM.

To solve this problem, the researchers split the segment anything task into two stages: all-instance segmentation and prompt-guided selection. The first stage uses a detector based on a Convolutional Neural Network (CNN) to generate segmentation masks for every instance in the image. The second stage then selects the region of interest corresponding to the prompt. The authors show that a real-time segment-anything model is feasible thanks to the computational efficiency of CNNs, and they believe this approach could pave the way for widespread use of this fundamental segmentation capability in commercial settings.

The proposed FastSAM is built on YOLOv8-seg, an object detector that uses the YOLACT approach for instance segmentation, and is trained on SAM's SA-1B dataset. Despite being trained on only 2% (1/50) of SA-1B, this CNN detector achieves performance on par with SAM while requiring significantly less computation and fewer resources, enabling real-time application. The researchers also demonstrate its generalization by applying it to several downstream segmentation tasks.
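
The two-stage design can be approximated with off-the-shelf components: a YOLOv8 segmentation model produces masks for every instance, and a simple selector then picks the mask that contains a user's point prompt. The sketch below uses the public ultralytics package as a rough stand-in for the authors' released FastSAM code; the checkpoint name, test image, and selection rule are assumptions.

from typing import Optional

import numpy as np
from ultralytics import YOLO

# Stage 1: all-instance segmentation with a CNN-based detector.
model = YOLO("yolov8n-seg.pt")      # small public checkpoint, not the FastSAM weights
result = model("street.jpg")[0]     # any local image path

# Stage 2: prompt-guided selection, here with a single point prompt.
def select_by_point(result, x: int, y: int) -> Optional[np.ndarray]:
    """Return the first mask whose pixel at (x, y) is foreground."""
    if result.masks is None:
        return None
    for mask in result.masks.data.cpu().numpy():  # (N, H, W) binary masks
        # Masks may be at a lower resolution than the original image; a real
        # implementation would resize them to the image size first.
        h, w = mask.shape
        if mask[min(y, h - 1), min(x, w - 1)] > 0.5:
            return mask
    return None

selected = select_by_point(result, x=320, y=240)
print("mask found at prompt point" if selected is not None else "no mask at that point")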

A real-time segment-anything model has wide practical value in industry. The suggested method offers a novel, deployable solution for a wide variety of vision tasks at very high speed, often tens or hundreds of times faster than conventional approaches, and it provides a fresh perspective on large model architectures for general vision problems. The research suggests that there are still cases where specialized models offer the best efficiency-accuracy balance, and it demonstrates the viability of a route that, by introducing an artificial prior into the model structure, can greatly reduce the computational cost of running the model.

The team summarizes their main contributions as follows:

The segment anything task is addressed with a novel, real-time CNN-based method that drastically decreases processing requirements without sacrificing performance.

This work presents the first study of applying a CNN detector to the segment anything task, offering insights into the potential of lightweight CNN models for complicated vision tasks.

The merits and shortcomings of the suggested method in the segment anything domain are revealed through a comparison with SAM on various benchmarks.

Overall, the proposed FastSAM matches the performance of SAM while running roughly 50x to 170x faster. This speed makes it well suited to industrial applications such as road obstacle identification, video instance tracking, and picture editing, and FastSAM can even produce higher-quality masks for large objects in some images. FastSAM fulfills the real-time segment anything task by robustly and efficiently selecting objects of interest from a segmented image. The team conducted an empirical study comparing FastSAM to SAM on four zero-shot tasks: edge detection, object proposal generation, instance segmentation, and localization with text prompts. The results show that FastSAM is 50 times faster than SAM-ViT-H in running time and can efficiently handle many downstream tasks in real time.

Check out the Paper and GitHub repo for more details.

Meet SDFStudio: A Unified and Modular Framework for Neural Implicit Surface Reconstruction Built on Top of the Nerfstudio Project

Over the past few years, several areas of computer vision and computer graphics have advanced rapidly, and surface reconstruction is a prime example. The primary goal of this evolving field within 3D scanning is to efficiently recreate surfaces from point clouds while meeting specific quality criteria. Such algorithms estimate the underlying geometry of a scanned object’s surface from the given point cloud data, and the resulting surface can then be used for visualization, virtual reality, computer-aided design, medical imaging, and more. Well-known approaches to surface reconstruction include self-organizing maps, Bayesian reconstruction, and Poisson reconstruction. Because surface reconstruction is such a crucial aspect of 3D scanning, there is intense ongoing research into techniques for reconstructing surfaces from 3D scans using unsupervised machine learning.

Taking a step in this direction, a group of researchers from the University of Tübingen, ETH Zurich, and Czech Technical University in Prague has collaborated to develop SDFStudio, a unified and versatile tool for Neural Implicit Surface Reconstruction (NISR). The framework is built on top of the nerfstudio project, which provides APIs that streamline the process of creating, training, and visualizing Neural Radiance Fields (NeRF). As part of its implementation, the developers support three major surface reconstruction methods: UniSurf, VolSDF, and NeuS. UniSurf reconstructs smooth surfaces by unifying implicit surface models and radiance fields within a single volume-rendering framework. VolSDF (Volumetric Signed Distance Field) models the scene’s volume density as a function of a signed distance field. NeuS (Neural Implicit Surfaces) trains a neural signed distance representation with volume rendering, combining the strengths of implicit surface representations and learning-based approaches.

To support a range of scene representations and surface reconstruction techniques, SDFStudio uses the Signed Distance Function (SDF) as its key representation, defining the surface as an iso-surface of the implicit function. To estimate the SDF, SDFStudio uses techniques such as Multi-Layer Perceptrons (MLPs), tri-planes, and multi-resolution feature grids, which combine neural networks with feature grids to estimate signed distance or occupancy values at different locations in the scene. To further enhance accuracy and efficiency, the tool incorporates multiple point-sampling strategies: surface-guided sampling, inspired by the UniSurf method, and voxel-surface guided sampling, derived from the NeuralReconW method, which uses information from voxel grids to guide the sampling process so that generated points are more likely to lie on the object’s surface. These sampling techniques ensure that the point samples are representative of the underlying surface, improving the quality and accuracy of the reconstructed surfaces.

One of the standout characteristics of SDFStudio is its unified and modular implementation, which provides a convenient framework for transferring ideas and techniques between the different methods in the tool. For example, Mono-NeuS applies ideas from MonoSDF to NeuS, and Geo-VolSDF incorporates the idea from Geo-NeuS into VolSDF. This ability to transfer ideas between methods promotes advances in surface reconstruction by letting researchers experiment with different combinations, taking inspiration from one approach and integrating it into another. To get started with SDFStudio quickly, follow the setup instructions in its GitHub repository.
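
As a concrete illustration of the core idea, the short PyTorch sketch below defines a toy coordinate MLP that maps a 3D point to a signed distance, so the surface is implicitly the set of points where the output is zero. This is a generic toy network written for this article, not SDFStudio’s actual model classes; the layer sizes and activation choice are assumptions.

import torch
import torch.nn as nn

class TinySDF(nn.Module):
    """Toy MLP mapping a 3D point to a signed distance value."""
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, 1),
        )

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        # Convention: negative inside the object, positive outside.
        return self.net(xyz).squeeze(-1)

sdf = TinySDF()
points = torch.rand(1024, 3) * 2 - 1   # random points in [-1, 1]^3
distances = sdf(points)                # the surface is where this value is ~0
print(distances.shape)                 # torch.Size([1024])
# To extract a mesh, one would evaluate the SDF on a dense grid and run marching cubes.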

Check out the Project Page and GitHub repo for more details.

Web-Scale Training Unleashed: DeepMind Introduces OWLv2 and OWL-ST, the Game-Changing Tools for Open-Vocabulary Object Detection, Powered by Unprecedented Self-Training Techniques

Open-vocabulary object detection is a critical aspect of various real-world computer vision tasks. However, the limited availability of detection training data and the fragility of pre-trained models often lead to subpar performance and scalability issues.

To tackle this challenge, the DeepMind research team introduces the OWLv2 model in their latest paper, “Scaling Open-Vocabulary Object Detection.” This optimized architecture improves training efficiency and is paired with the OWL-ST self-training recipe, substantially enhancing detection performance and achieving state-of-the-art results on the open-vocabulary detection task.

The primary objective of this work is to optimize the label space, annotation filtering, and training efficiency for the open-vocabulary detection self-training approach, ultimately achieving robust and scalable open-vocabulary performance with limited labeled data.

The proposed self-training approach consists of three key steps:

The team uses an existing open-vocabulary detector, OWL-ViT CLIP-L/14, to annotate WebLI, a large-scale dataset of web image-text pairs, with bounding-box pseudo-annotations (see the sketch after this list).

They then self-train new detectors on these pseudo-annotations at web scale.

Finally, they fine-tune the self-trained model on human-annotated detection data, further refining its performance.
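
For intuition, the pseudo-annotation step can be approximated with the publicly released OWL-ViT checkpoints in Hugging Face Transformers: run the detector over a web image with a set of text queries and keep boxes above a score threshold as pseudo-labels. This is a simplified, single-image stand-in for the paper’s WebLI-scale pipeline; the image path, query list, and threshold are assumptions.

import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

processor = OwlViTProcessor.from_pretrained("google/owlvit-large-patch14")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-large-patch14")

image = Image.open("web_image.jpg")                  # stand-in for a WebLI image
queries = ["a cat", "a bicycle", "a traffic light"]  # in practice, derived from web text

inputs = processor(text=[queries], images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

target_sizes = torch.tensor([image.size[::-1]])      # (height, width)
detections = processor.post_process_object_detection(
    outputs=outputs, threshold=0.3, target_sizes=target_sizes
)[0]

# Keep the surviving boxes as pseudo-annotations for self-training.
pseudo_boxes = [
    {"label": queries[int(l)], "box": b.tolist(), "score": float(s)}
    for b, s, l in zip(detections["boxes"], detections["scores"], detections["labels"])
]
print(len(pseudo_boxes), "pseudo-box annotations")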

Notably, the researchers employ a variant of the OWL-ViT architecture to train more effective detectors. This architecture leverages contrastively trained image-text models to initialize image and text encoders while the detection heads are randomly initialized.

During training, the team employs the same losses as OWL-ViT and augments the queries with “pseudo-negatives,” optimizing training efficiency to maximize the use of the available labeled images.

They also incorporate previously proposed practices for large-scale Transformer training to further improve efficiency. As a result, the OWLv2 model reduces training FLOPs by approximately 50% and doubles training throughput compared to the original OWL-ViT model.

The team compares their proposed approach with previous state-of-the-art open-vocabulary detectors in their empirical study. The OWL-ST technique improves Average Precision (AP) on LVIS rare classes from 31.2% to 44.6%. Moreover, combining the OWL-ST recipe with the OWLv2 architecture achieves new state-of-the-art performance.

Overall, the OWL-ST recipe presented in this paper significantly improves detection performance by leveraging weak supervision from large-scale web data, enabling web-scale training for open-world localization. This approach addresses the limitations posed by the scarcity of labeled detection data and demonstrates the potential for achieving robust open-vocabulary object detection in a scalable manner.

Check out the Paper for more details.

Democratize computer vision defect detection for manufacturing quality …

Cost of poor quality is top of mind for manufacturers. Quality defects increase scrap and rework costs, decrease throughput, and can impact customers and company reputation. Quality inspection on the production line is crucial for maintaining quality standards. In many cases, human visual inspection is used to assess the quality and detect defects, which can limit the throughput of the line due to limitations of human inspectors.
The advent of machine learning (ML) and artificial intelligence (AI) brings additional visual inspection capabilities using computer vision (CV) ML models. Complementing human inspection with CV-based ML can reduce detection errors, speed up production, reduce the cost of quality, and positively impact customers. Building CV ML models typically requires expertise in data science and coding, which are often rare resources in manufacturing organizations. Now, quality engineers and others on the shop floor can build and evaluate these models using no-code ML services, which can accelerate exploration and adoption of these models more broadly in manufacturing operations.
Amazon SageMaker Canvas is a visual interface that enables quality, process, and production engineers to generate accurate ML predictions on their own—without requiring any ML experience or having to write a single line of code. You can use SageMaker Canvas to create single-label image classification models for identifying common manufacturing defects using your own image datasets.
In this post, you will learn how to use SageMaker Canvas to build a single-label image classification model to identify defects in manufactured magnetic tiles based on their image.
Solution overview
This post assumes the viewpoint of a quality engineer exploring CV ML inspection, and you will work with sample data of magnetic tile images to build an image classification ML model to predict defects in the tiles for the quality check. The dataset contains more than 1,200 images of magnetic tiles, which have defects such as blowhole, break, crack, fray, and uneven surface. The following images provide an example of single-label defect classification, with a cracked tile on the left and a tile free of defects on the right.

In a real-world example, you can collect such images from the finished products in the production line. In this post, you use SageMaker Canvas to build a single-label image classification model that will predict and classify defects for a given magnetic tile image.
SageMaker Canvas can import image data from a local disk file or Amazon Simple Storage Service (Amazon S3). For this post, multiple folders have been created (one per defect type such as blowhole, break, or crack) in an S3 bucket, and magnetic tile images are uploaded to their respective folders. The folder called Free contains defect-free images.
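
If you stage the images yourself, a small script can create the expected layout: one S3 prefix per label, with defect-free images under Free. The following boto3 sketch assumes a hypothetical bucket name and a local folder per label; you can also arrange the folders through the S3 console.

import os
import boto3

s3 = boto3.client("s3")
bucket = "my-magnetic-tiles-bucket"  # assumed bucket name
labels = ["Blowhole", "Break", "Crack", "Fray", "Uneven", "Free"]

for label in labels:
    local_dir = os.path.join("magnetic_tiles", label)  # assumed local layout: one folder per label
    for filename in os.listdir(local_dir):
        if filename.lower().endswith((".jpg", ".jpeg", ".png")):
            s3.upload_file(
                Filename=os.path.join(local_dir, filename),
                Bucket=bucket,
                Key=f"magnetic-tiles/{label}/{filename}",  # folder name doubles as the label
            )
print("Upload complete")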

There are four steps involved in building the ML model using SageMaker Canvas:

Import the dataset of the images.
Build and train the model.
Analyze the model insights, such as accuracy.
Make predictions.

Prerequisites
Before starting, you need to set up and launch SageMaker Canvas. This setup is performed by an IT administrator and involves three steps:

Set up an Amazon SageMaker domain.
Set up the users.
Set up permissions to use specific features in SageMaker Canvas.

Refer to Getting started with using Amazon SageMaker Canvas and Setting Up and Managing Amazon SageMaker Canvas (for IT Administrators) to configure SageMaker Canvas for your organization.
When SageMaker Canvas is set up, the user can navigate to the SageMaker console, choose Canvas in the navigation pane, and choose Open Canvas to launch SageMaker Canvas.

The SageMaker Canvas application is launched in a new browser window.

After the SageMaker Canvas application is launched, you start the steps of building the ML model.
Import the dataset
Importing the dataset is the first step when building an ML model with SageMaker Canvas.

In the SageMaker Canvas application, choose Datasets in the navigation pane.
On the Create menu, choose Image.
For Dataset name, enter a name, such as Magnetic-Tiles-Dataset.
Choose Create to create the dataset.

After the dataset is created, you need to import images in the dataset.

On the Import page, choose Amazon S3 (the magnetic tiles images are in an S3 bucket).

You have the choice to upload the images from your local computer as well.

Select the folder in the S3 bucket where the magnetic tile images are stored and choose Import Data.

SageMaker Canvas starts importing the images into the dataset. When the import is complete, you can see the image dataset created with 1,266 images.

You can choose the dataset to check the details, such as a preview of the images and their label for the defect type. Because the images were organized in folders and each folder was named with the defect type, SageMaker Canvas automatically completed the labeling of the images based on the folder names. As an alternative, you can import unlabeled images, add labels, and perform labeling of the individual images at a later point of time. You can also modify the labels of the existing labeled images.

The image import is complete, and you now have an image dataset in SageMaker Canvas. You can move to the next step to build an ML model to predict defects in the magnetic tiles.
Build and train the model
You train the model using the imported dataset.

Choose the dataset (Magnetic-tiles-Dataset) and choose Create a model.
For Model name, enter a name, such as Magnetic-Tiles-Defect-Model.
Select Image analysis for the problem type and choose Create to configure the model build.

On the model’s Build tab, you can see various details about the dataset, such as label distribution, count of labeled vs. unlabeled images, and also model type, which is single-label image prediction in this case. If you have imported unlabeled images or you want to modify or correct the labels of certain images, you can choose Edit dataset to modify the labels.

You can build the model in two ways: Quick build and Standard build. The Quick build option prioritizes speed over accuracy. It trains the model in 15–30 minutes. The model can be used for prediction but can’t be shared. It’s a good option to quickly check the feasibility and accuracy of training a model with a given dataset. The Standard build chooses accuracy over speed, and model training can take between 2–4 hours.
For this post, you train the model using the Standard build option.

Choose Standard build on the Build tab to start training the model.

The model training starts instantly. You can see the expected build time and training progress on the Analyze tab.

Wait until the model training is complete, and then you can analyze the model’s performance and accuracy.
Analyze the model
In this case, it took less than an hour to complete the model training. When the model training is complete, you can check model accuracy on the Analyze tab to determine if the model can accurately predict defects. The overall model accuracy is 97.7% in this case. You can also check the model accuracy for each individual label or defect type; for instance, 100% for Fray and Uneven but approximately 95% for Blowhole. This level of accuracy is encouraging, so we can continue the evaluation.

To better understand and trust the model, enable Heatmap to see the areas of interest in the image that the model uses to differentiate the labels. It’s based on the class activation map (CAM) technique. You can use the heatmap to identify patterns from your incorrectly predicted images, which can help improve the quality of your model.

On the Scoring tab, you can check precision and recall for the model for each of the labels (or classes or defect types). Precision and recall are evaluation metrics used to measure the performance of binary and multiclass classification models. Precision measures how often the model’s predictions of a specific class (defect type, in this example) are correct. Recall measures how many instances of a specific class the model is able to detect.
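
For intuition, the same per-label precision and recall that SageMaker Canvas reports can be computed from lists of true and predicted labels, for example with scikit-learn. The label values below are made up purely for illustration.

from sklearn.metrics import precision_score, recall_score

# Hypothetical ground-truth and predicted defect labels for a handful of tiles.
y_true = ["Crack", "Free", "Blowhole", "Crack", "Free", "Uneven"]
y_pred = ["Crack", "Free", "Crack",    "Crack", "Free", "Uneven"]

labels = ["Blowhole", "Crack", "Free", "Uneven"]
precision = precision_score(y_true, y_pred, labels=labels, average=None, zero_division=0)
recall = recall_score(y_true, y_pred, labels=labels, average=None, zero_division=0)

for label, p, r in zip(labels, precision, recall):
    print(f"{label:>8}: precision={p:.2f} recall={r:.2f}")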

Model analysis helps you understand the accuracy of the model before you use it for prediction.
Make predictions
After the model analysis, you can now make predictions using this model to identify defects in the magnetic tiles.
On the Predict tab, you can choose Single prediction or Batch prediction. In a single prediction, you import a single image from your local computer or S3 bucket to make a prediction about the defect. In batch prediction, you can make predictions for multiple images that are stored in a SageMaker Canvas dataset; you can create a separate dataset in SageMaker Canvas with the test or inference images for this purpose. For this post, we use both single and batch prediction.
For single prediction, on the Predict tab, choose Single prediction, then choose Import image to upload the test or inference image from your local computer.

After the image is imported, the model makes a prediction about the defect. For the first inference, it might take a few minutes because the model is loading for the first time. After the model is loaded, it makes instant predictions about the images. You can see the image and the confidence level of the prediction for each label type. For instance, in this case, the magnetic tile image is predicted to have an uneven surface defect (the Uneven label) and the model is 94% confident about it.

Similarly, you can use other images or a dataset of images to make predictions about the defect.
For batch prediction, create a dataset of unlabeled images called Magnetic-Tiles-Test-Dataset by uploading 12 test images from your local computer.

On the Predict tab, choose Batch prediction and choose Select dataset.

Select the Magnetic-Tiles-Test-Dataset dataset and choose Generate predictions.

It will take some time to generate the predictions for all the images. When the status is Ready, choose the dataset link to see the predictions.

You can see predictions for all the images with confidence levels. You can choose any of the individual images to see image-level prediction details.

You can download the prediction in CSV or .zip file format to work offline. You can also verify the predicted labels and add them to your training dataset. To verify the predicted labels, choose Verify prediction.
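
If you want to review the downloaded batch results offline, you can load the CSV with pandas. The column names used below are hypothetical placeholders; adjust them to match the header of the file Canvas actually exports.

import pandas as pd

# Hypothetical file and column names; check the actual header of the exported file.
predictions = pd.read_csv("canvas_batch_predictions.csv")

# Flag low-confidence predictions for manual review before adding them to the training set.
low_confidence = predictions[predictions["confidence"] < 0.80]
print(low_confidence[["image_name", "predicted_label", "confidence"]])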

In the prediction dataset, you can update labels of the individual images if you don’t find the predicted label correct. When you have updated the labels as required, choose Add to trained dataset to merge the images into your training dataset (in this example, Magnetic-Tiles-Dataset).

This updates the training dataset, which includes both your existing training images and the new images with predicted labels. You can train a new model version with the updated dataset and potentially improve the model’s performance. The new model version won’t be an incremental training, but a new training from scratch with the updated dataset. This helps keep the model refreshed with new sources of data.
Clean up
After you have completed your work with SageMaker Canvas, choose Log out to close the session and avoid any further cost.

When you log out, your work such as datasets and models remains saved, and you can launch a SageMaker Canvas session again to continue the work later.
SageMaker Canvas creates an asynchronous SageMaker endpoint for generating the predictions. To delete the endpoint, endpoint configuration, and model created by SageMaker Canvas, refer to Delete Endpoints and Resources.
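
If you prefer to script the cleanup, the same resources can be removed with boto3 once you know the names Canvas created (visible on the SageMaker console); the names below are placeholders.

import boto3

sm = boto3.client("sagemaker")

endpoint_name = "canvas-magnetic-tiles-endpoint"       # placeholder: use the actual name from the console
endpoint_config_name = "canvas-magnetic-tiles-config"  # placeholder
model_name = "canvas-magnetic-tiles-model"             # placeholder

sm.delete_endpoint(EndpointName=endpoint_name)
sm.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
sm.delete_model(ModelName=model_name)
print("SageMaker resources deleted")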
Conclusion
In this post, you learned how to use SageMaker Canvas to build an image classification model to predict defects in manufactured products, to complement and improve the visual inspection quality process. You can use SageMaker Canvas with different image datasets from your manufacturing environment to build models for use cases like predictive maintenance, package inspection, worker safety, goods tracking, and more. SageMaker Canvas gives you the ability to use ML to generate predictions without needing to write any code, accelerating the evaluation and adoption of CV ML capabilities.
To get started and learn more about SageMaker Canvas, refer to the following resources:

Amazon SageMaker Canvas Developer Guide
Announcing Amazon SageMaker Canvas – a Visual, No Code Machine Learning Capability for Business Analysts

About the authors
Brajendra Singh is a solutions architect at Amazon Web Services, working with enterprise customers. He has a strong developer background and is a keen enthusiast of data and machine learning solutions.
Danny Smith is Principal, ML Strategist for Automotive and Manufacturing Industries, serving as a strategic advisor for customers. His career focus has been on helping key decision-makers leverage data, technology and mathematics to make better decisions, from the board room to the shop floor. Lately most of his conversations are on democratizing machine learning and generative AI.
Davide Gallitelli is a Specialist Solutions Architect for AI/ML in the EMEA region. He is based in Brussels and works closely with customers throughout Benelux. He has been a developer since he was very young, starting to code at the age of 7. He started learning AI/ML at university, and has fallen in love with it since then.